[jira] [Commented] (LIVY-684) Livy server support zookeeper service discovery

2019-09-19 Thread Yiheng Wang (Jira)


[ https://issues.apache.org/jira/browse/LIVY-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933110#comment-16933110 ]

Yiheng Wang commented on LIVY-684:
--

This is a duplicate of https://issues.apache.org/jira/browse/LIVY-616

> Livy server support zookeeper service discovery
> ---
>
> Key: LIVY-684
> URL: https://issues.apache.org/jira/browse/LIVY-684
> Project: Livy
>  Issue Type: New Feature
>Reporter: Zhefeng Wang
>Priority: Minor
>
> Livy server does not yet support service discovery, which is widely used in
> high-availability deployments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (LIVY-684) Livy server support zookeeper service discovery

2019-09-19 Thread Zhefeng Wang (Jira)


[ https://issues.apache.org/jira/browse/LIVY-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933114#comment-16933114 ]

Zhefeng Wang commented on LIVY-684:
---

[~yihengw] Thanks, I found it. This can be closed.

> Livy server support zookeeper service discovery
> ---
>
> Key: LIVY-684
> URL: https://issues.apache.org/jira/browse/LIVY-684
> Project: Livy
>  Issue Type: New Feature
>Reporter: Zhefeng Wang
>Priority: Minor
>
> Livy server does not yet support service discovery, which is widely used in
> high-availability deployments.





[jira] [Comment Edited] (LIVY-667) Support query a lot of data.

2019-09-19 Thread runzhiwang (Jira)


[ https://issues.apache.org/jira/browse/LIVY-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930215#comment-16930215 ]

runzhiwang edited comment on LIVY-667 at 9/20/19 3:13 AM:
--

Hi [~jerryshao], [~mgaido]. There are four possible designs to support querying
a large amount of data. What's your opinion?

1. Merge the result RDD into one partition and save it as a single file in
HDFS. Livy then reads the file directly, line by line. Cons: reading line by
line is slow.
2. Repartition the RDD into partitions of a fixed size and save them in HDFS.
Livy then reads them with toLocalIterator, which loads one partition into
memory at a time. Cons: there will be a lot of files in HDFS if the partition
size is too small.
3. Cache the RDD and read each partition in batches. Cons: a shortage of memory
and disk will cause the RDD to be recomputed, which may be time-consuming.
4. Save the RDD to HDFS without repartitioning and read each partition in
batches. The code for "read each partition in batches" is just like the code
in the PR.
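As a rough illustration of the "read each partition by batch" idea in option 4, here is a minimal Python sketch. It is not Livy or Spark code and not the code from the PR: partitions are modeled as plain Python iterables, and the names `read_in_batches`, `read_all`, and `batch_size` are hypothetical. The point is only that the reader holds at most `batch_size` rows at once, regardless of how large a single partition is.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_batches(partition: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size rows from one partition."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rows = iter(partition)
    while True:
        # islice pulls at most batch_size rows from the underlying iterator.
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def read_all(partitions: Iterable[Iterable[str]], batch_size: int) -> Iterator[List[str]]:
    """Walk the partitions in order, batching within each one."""
    for partition in partitions:
        yield from read_in_batches(partition, batch_size)


if __name__ == "__main__":
    # Stand-in for per-partition files; the sizes are deliberately uneven.
    partitions = [[f"row-{p}-{i}" for i in range(n)] for p, n in enumerate([5, 1, 12])]
    for batch in read_all(partitions, batch_size=4):
        print(len(batch), batch[0])
```

Batches never span two partitions in this sketch, so the last batch of a partition may be shorter than `batch_size`.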


was (Author: runzhiwang):
Hi [~jerryshao], [~mgaido]. There are several possible designs to support
querying a large amount of data. What's your opinion?

1. Merge the result RDD into one partition and save it as a single file in
HDFS. Livy then reads the file directly, line by line. Cons: reading line by
line is slow.
2. Repartition the RDD into partitions of a fixed size and save them in HDFS.
Livy then reads them with toLocalIterator, which loads one partition into
memory at a time. Cons: there will be a lot of files in HDFS if the partition
size is too small.
3. Cache the RDD and read each partition in batches. Cons: a shortage of memory
and disk will cause the RDD to be recomputed, which may be time-consuming.
4. Save the RDD to HDFS without repartitioning and read each partition in
batches. The code for "read each partition in batches" is just like the code
in the PR.

> Support query a lot of data.
> 
>
> Key: LIVY-667
> URL: https://issues.apache.org/jira/browse/LIVY-667
> Project: Livy
>  Issue Type: Bug
>  Components: Thriftserver
>Affects Versions: 0.6.0
>Reporter: runzhiwang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When livy.server.thrift.incrementalCollect is enabled, the thrift server uses
> toLocalIterator to load one partition at a time instead of the whole RDD, to
> avoid OutOfMemory errors. However, if the largest partition is too big,
> OutOfMemory still occurs.
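
The problem in the description above can be made concrete with a small back-of-the-envelope sketch. It assumes, purely for illustration, that memory use is proportional to the number of rows held at once; the function names and the partition sizes are hypothetical and do not come from Livy.

```python
from typing import Sequence


def peak_rows_full_collect(partition_sizes: Sequence[int]) -> int:
    """Rows held at once when the whole result is collected in one go."""
    return sum(partition_sizes)


def peak_rows_partition_at_a_time(partition_sizes: Sequence[int]) -> int:
    """Rows held at once when one whole partition is loaded at a time."""
    return max(partition_sizes, default=0)


if __name__ == "__main__":
    # Hypothetical skewed result: one partition dominates the total.
    sizes = [1_000, 2_000, 9_000_000]
    print("full collect:", peak_rows_full_collect(sizes))
    print("partition at a time:", peak_rows_partition_at_a_time(sizes))
```

With a skewed result like this one, loading a partition at a time barely lowers the peak, which is why the largest partition can still trigger OutOfMemory.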


