Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-22 Thread Saisai Shao
Nan, I think Meisam already had a PR about this; maybe you can discuss it with him on GitHub based on the proposed code. Sorry, I didn't follow the long discussion thread, but I think PayPal's solution sounds simpler. On Wed, Aug 23, 2017 at 12:07 AM, Nan Zhu wrote:

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-22 Thread Nan Zhu
Based on this result, I think we should follow the bulk operation pattern. Shall we move forward with the PR from PayPal? Best, Nan On Mon, Aug 21, 2017 at 12:21 PM, Meisam Fathi wrote: > Bottom line up front: > 1. The cost of calling 1 individual REST calls is

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
Hi Marcelo, > I'm not really familiar with how multi-node HA was implemented (I > stopped at session recovery), but why isn't a single server doing the > update and storing the results in ZK? Unless it's actually doing > load-balancing, it seems like that would avoid multiple servers having > to

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
> Just an FYI, Apache mailing lists can't share attachments. If you could > please upload the files to another file sharing site and include links > instead. > Thanks for the information. I added the files to the JIRA ticket and put the contents of the previous email as a comment. Here are the

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
I forgot to attach the first chart. Sorry about that. [image: transfer_time_bar_plot.png] Thanks, Meisam On Mon, Aug 21, 2017 at 12:21 PM Meisam Fathi wrote: > Bottom line up front: > 1. The cost of calling 1 individual REST calls is about two orders of > magnitude

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-21 Thread Meisam Fathi
Bottom line up front: 1. The cost of calling 1 individual REST calls is about two orders of magnitude higher than calling a single batch REST call (1 * 0.05 seconds vs. 1.4 seconds). 2. Time to complete a batch REST call plateaus at about 10,000 application reports per call. Full story: I

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
I like that approach on paper, although I currently don't have much time to actually be able to review the PR and provide decent feedback. I think that regardless of the approach, one goal should be to probably separate what is being monitored from how it's being monitored; that way you can later

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Prabhu Kasinathan
As Meisam highlighted, in our case we have Livy Multi-Node HA, i.e., Livy running on 6 servers for each cluster, load-balanced, sharing Livy metadata on ZooKeeper, and running thousands of applications. With the changes below, we are seeing good improvements due to batching the requests (one per livy

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
Hi Nan, In the highlighted line > > https://github.com/apache/incubator-livy/pull/36/files#diff-a3f879755cfe10a678cc08ddbe60a4d3R75 > > I assume that it will get the reports of all applications in YARN, even > they are finished? That's right. That line will return reports for all Spark

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 12:57 PM, Nan Zhu wrote: > yes, we finally converge on the idea > > how large the reply can be? if I have only one running applications and I > still need to fetch 1000 > > on the other side > > I have 1000 running apps, what's the cost of sending

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
Yes, we finally converge on the idea. How large can the reply be? If I have only one running application, I still need to fetch 1000. On the other side, if I have 1000 running apps, what's the cost of sending 1000 requests, even if the thread pool and YARN client are shared? On Wed, Aug 16, 2017

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 12:27 PM, Nan Zhu wrote: > I am using your words *current*. What's the definition of "current" in > livy? I think that's all application which still keep some records in the > livy's process's memory space There are two views of what is current:

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 11:34 AM, Nan Zhu wrote: > Yes, I know there is such an API, what I don't understand is what I should > pass in the filtering API you mentioned, say we query YARN for every 5 > tickets > > 0: Query and get App A is running > > 4: App A is done > >

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
Yes, I know there is such an API. What I don't understand is what I should pass to the filtering API you mentioned. Say we query YARN every 5 ticks. 0: Query and get that App A is running. 4: App A is done. 5: Query... so what should I fill in as filtering parameters at 5 to capture the changes of

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
yes, it is going to be Akka if moving forward (at least not going to introduce an actor framework to livy) On Wed, Aug 16, 2017 at 11:24 AM, Meisam Fathi wrote: > That is true, but I was under the impression that this will be implemented > with Akka (maybe because it is

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Meisam Fathi
That is true, but I was under the impression that this will be implemented with Akka (maybe because it is mentioned in the design doc). On Wed, Aug 16, 2017 at 11:21 AM Marcelo Vanzin wrote: > On Wed, Aug 16, 2017 at 11:16 AM, Meisam Fathi > wrote:

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 11:16 AM, Meisam Fathi wrote: > I do agree that actor based design is cleaner and more maintainable. But we > had to discard it because it adds more dependencies to Livy. I've been reading "actor system" as a design pattern, not as introducing a

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Nan Zhu
> What I proposed is having a single request to YARN to get all applications' statuses, if that's possible. You'd still have multiple application handles that are independent of each other. They'd all be updated separately from that one thread talking to YARN. This has nothing to do with a "shared
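
A minimal sketch of the pattern described in the quoted proposal above: one request fetches all application statuses, and independent per-application handles are updated from that single result. It is written in Python for brevity rather than in Livy's Scala, and every name in it (AppHandle, BulkMonitor, fetch_all, the app IDs) is hypothetical. fetch_all stands in for whatever bulk query to YARN is used; it is not Livy's or YARN's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AppHandle:
    """Hypothetical per-application handle; holds only the last known state."""
    app_id: str
    state: str = "UNKNOWN"


class BulkMonitor:
    """Issues one bulk status request per poll, then fans results out to handles."""

    def __init__(self, fetch_all: Callable[[], Dict[str, str]]):
        # fetch_all is an assumed stand-in for a single bulk call that
        # returns a mapping of application id -> state.
        self._fetch_all = fetch_all
        self._handles: Dict[str, AppHandle] = {}

    def register(self, app_id: str) -> AppHandle:
        handle = AppHandle(app_id)
        self._handles[app_id] = handle
        return handle

    def poll_once(self) -> None:
        # One request, regardless of how many handles are registered.
        reports = self._fetch_all()
        for app_id, handle in self._handles.items():
            if app_id in reports:
                handle.state = reports[app_id]


# Usage with a fake bulk endpoint that counts how often it is called.
calls = {"count": 0}


def fake_fetch_all() -> Dict[str, str]:
    calls["count"] += 1
    return {"app_1": "RUNNING", "app_2": "FINISHED"}


monitor = BulkMonitor(fake_fetch_all)
a = monitor.register("app_1")
b = monitor.register("app_2")
c = monitor.register("app_3")  # not present in the bulk reply
monitor.poll_once()
```

In this sketch the handles never talk to the cluster manager themselves, so the number of requests per poll stays at one no matter how many applications are tracked. A handle whose application is missing from the reply simply keeps its previous state.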

Re: resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-16 Thread Marcelo Vanzin
On Wed, Aug 16, 2017 at 9:06 AM, Nan Zhu wrote: >> I'm not really sure what you're talking about here, since I did not > suggest a "shared data structure", and I'm not really sure what that > means in this context. > > What you claimed is just monitoring/updating the state

resolve the scalability problem caused by app monitoring in livy with an actor-based design

2017-08-14 Thread Nan Zhu
Hi all, In HDInsight, we (Microsoft) use Livy as the Spark job submission service. We keep seeing customers run into problems when they submit many concurrent applications to the system, or recover Livy from a state with many concurrent applications. By looking at the code and the