Re: shutdown vs kill API is Mesos

2018-01-20 Thread Mohit Jaggi
Thanks Stephan. Please read inline. On Sat, Jan 20, 2018 at 5:03 AM, Stephan Erb wrote: > Q1: Does Aurora use COMMAND or DEFAULT executor? > > > Aurora is currently using neither. In Mesos terms Thermos is a CUSTOM > executor. On top, Aurora supports alternative custom executors [1] such as > th

Re: shutdown vs kill API is Mesos

2018-01-20 Thread Stephan Erb
> Q1: Does Aurora use COMMAND or DEFAULT executor? Aurora is currently using neither. In Mesos terms Thermos is a CUSTOM executor. On top, Aurora supports alternative custom executors [1] such as the Docker compose executor [2]. Mesos seems to be betting on the new DEFAULT executor. It should b

Re: shutdown vs kill API is Mesos

2018-01-17 Thread Mohit Jaggi
FYI, I had a quick chat with Vinod from the Mesos team. I have some questions for Aurora users inline: *Originally the default was the COMMAND executor. In this world the scheduler has no visibility into the command executor.* *More recently, we added a DEFAULT executor which is used by framewo

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Mohit Jaggi
So that is pretty much what I proposed... If the method signature has to change, we can keep the executorId as it is, unless we want to take this opportunity to clean that up. I will check if the SHUTDOWN works in non-executor cases also. On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner wrote: > We

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Bill Farner
> > We still need "Agent ID" for the shutdown call. Darn. In that case, how about we change the method signature in Driver to accept agentId and ignore that param in MesosSchedulerDriver. But do we really need the command line option? Aurora can run tasks without an executor. I'm assuming th
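
The signature change suggested here might look roughly like the sketch below. All type and method names are hypothetical stand-ins, not the real Aurora `Driver` or Mesos `MesosSchedulerDriver`, and the calls return strings instead of talking to Mesos. It also assumes task ID == executor ID, which this thread notes is not the case today.

```java
// Hypothetical stand-in for Aurora's Driver abstraction (not the real interface).
interface AgentAwareDriver {
    // agentId is part of the signature even though only SHUTDOWN needs it.
    String killTask(String taskId, String agentId);
}

// Stand-in for the driver without SHUTDOWN: it ignores the agentId parameter.
final class LegacyKillDriver implements AgentAwareDriver {
    @Override
    public String killTask(String taskId, String agentId) {
        // agentId is intentionally unused: KILL only needs the task ID.
        return "KILL task=" + taskId;
    }
}

// Stand-in for the driver on the new API, which can issue SHUTDOWN.
final class ShutdownDriver implements AgentAwareDriver {
    @Override
    public String killTask(String taskId, String agentId) {
        // Assumption: the executor ID equals the task ID.
        return "SHUTDOWN executor=" + taskId + " agent=" + agentId;
    }
}
```

With this shape, callers always pass both IDs and never need to know which API is in use.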

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Mohit Jaggi
We still need "Agent ID" for the shutdown call. On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi wrote: > Sounds good. But do we really need the command line option? One can use an > older Driver if KILL is preferred for some reason. > > On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner wrote: > >> This

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Mohit Jaggi
Sounds good. But do we really need the command line option? One can use an older Driver if KILL is preferred for some reason. On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner wrote: > This situation is much simpler if task ID == executor ID. I can't come up > with a good reason why this is not the

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Bill Farner
This situation is much simpler if task ID == executor ID. I can't come up with a good reason why this is not the case today. Our executor IDs originally included a static prefix, though I do not recall any rationale for this. When Renan added custom executor support, this static prefix was made co

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Mohit Jaggi
To do this in a backward compatible manner, one way is:
```
void destroy(taskId, executorId, agentId) {
  if (driver instanceof Versioned)
    (Versioned...)driver.shutdown(executorId, agentId)
  else
    driver.kill(taskId)
}
```
Any other opinions? On Tue, Jan 16, 2018 at 11:12 AM, David McLau
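
For readers who want to compile the idea, here is a minimal sketch of the `instanceof` dispatch proposed above. Every type here (`SketchDriver`, `VersionedSketchDriver`, `CallbackDriver`, `HttpDriver`, `Destroyer`) is a hypothetical stand-in, not an Aurora or Mesos class, and the methods return a description of the call instead of contacting a Mesos master.

```java
// Hypothetical stand-in for a driver that can only KILL.
interface SketchDriver {
    String kill(String taskId);
}

// Hypothetical stand-in for the "Versioned" driver, which also exposes SHUTDOWN.
interface VersionedSketchDriver extends SketchDriver {
    String shutdown(String executorId, String agentId);
}

// Driver without SHUTDOWN support.
final class CallbackDriver implements SketchDriver {
    @Override
    public String kill(String taskId) {
        return "KILL task=" + taskId;
    }
}

// Driver on the new API: supports both calls.
final class HttpDriver implements VersionedSketchDriver {
    @Override
    public String kill(String taskId) {
        return "KILL task=" + taskId;
    }

    @Override
    public String shutdown(String executorId, String agentId) {
        return "SHUTDOWN executor=" + executorId + " agent=" + agentId;
    }
}

final class Destroyer {
    private Destroyer() {}

    // Prefer SHUTDOWN when the driver supports it; otherwise fall back to KILL.
    static String destroy(SketchDriver driver, String taskId,
                          String executorId, String agentId) {
        if (driver instanceof VersionedSketchDriver) {
            return ((VersionedSketchDriver) driver).shutdown(executorId, agentId);
        }
        return driver.kill(taskId);
    }
}
```

The trade-off in this sketch is that callers must always supply the executor and agent IDs, even when the fallback path only uses the task ID.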

Re: shutdown vs kill API is Mesos

2018-01-16 Thread David McLaughlin
Nope, I support getting SHUTDOWN in for users of the new API. On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi wrote: > Are you suggesting that we delay the switch to SHUTDOWN call until this > working group can resolve the API perf issue? > > On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin > wr

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Mohit Jaggi
Are you suggesting that we delay the switch to SHUTDOWN call until this working group can resolve the API perf issue? On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin wrote: > We are working with Mesos folks to resolve it. There is a Mesos > performance working group that folks can join if they

Re: shutdown vs kill API is Mesos

2018-01-15 Thread David McLaughlin
We are working with Mesos folks to resolve it. There is a Mesos performance working group that folks can join if they'd like to contribute: http://mesos.apache.org/blog/performance-working-group-progress-report/ I'm not sure what you mean by branch. Everything we used to scale test is on master.

Re: shutdown vs kill API is Mesos

2018-01-15 Thread Meghdoot bhattacharya
David, should Twitter try against Mesos 1.5 to see if things are better with the new API instead of libmesos? This is going to be a drift over time that will stop us from adopting new features. If it was some time back it would be good to rerun the tests and open a ticket in Mesos if issues exis

Re: shutdown vs kill API is Mesos

2018-01-12 Thread Mohit Jaggi
I understand. You don't agree with the second point of the summary. What about this: I change Driver.kill into a method Driver.destroy that calls either KILL or SHUTDOWN as follows: void destroy(taskId, executorId, agentId) { if (driver instanceof Versioned) driver.shutdown(execut

Re: shutdown vs kill API is Mesos

2018-01-12 Thread David McLaughlin
I'm not sure I agree with the summary. Bill's proposal was using shutdown only when using the new API. I would also support this if it's possible. On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi wrote: > Summary so far: > - Bill supports making this change > - This change cannot be made in a backw

Re: shutdown vs kill API is Mesos

2018-01-12 Thread Mohit Jaggi
Summary so far:
- Bill supports making this change
- This change cannot be made in a backward compatible manner
- David (Twitter) does not want to use HTTP APIs due to performance concerns. I conclude that folks from Twitter don't support this change
Question:
- Are there other users that want thi

Re: shutdown vs kill API is Mesos

2018-01-11 Thread Renan DelValle
Sorry, I guess referring to it as the libmesos way of talking to the Mesos master is a bit misleading. And I stand corrected, the V0 is only an adaptor to the V1 interface which still uses the undocumented RPC way of talking to the master ( https://github.com/apache/mesos/blob/master/src/java/jni/

Re: shutdown vs kill API is Mesos

2018-01-11 Thread Mohit Jaggi
David,
- LCD makes sense. Does that mean that Twitter is using the SCHEDULER_DRIVER version?
- I don't see Bill's proposal on this thread.

Re: shutdown vs kill API is Mesos

2018-01-11 Thread David McLaughlin
Sorry, the other approach outlined by Bill would in theory work too, but it sounds like in practice it also needs more changes on the Mesos side. On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin wrote: > Right. In order to keep the current abstraction in Aurora (both APIs), we > obviously have

Re: shutdown vs kill API is Mesos

2018-01-11 Thread David McLaughlin
Right. In order to keep the current abstraction in Aurora (both APIs), we obviously have to bind to the lowest common denominator API methods. So the only way to integrate with shutdown will be to fix the performance issues so we can switch to the new API. The performance issue we ran into at Twitt

Re: shutdown vs kill API is Mesos

2018-01-11 Thread Renan DelValle
The HTTP API is what is used under the hood for V0 and V1 (instead of libmesos), I believe that's what David was referencing when he mentioned the HTTP performance issues. Here's a better explanation from the original patch submitted by Zameer: https://github.com/apache/aurora/commit/705dbc7cd7c3ff

Re: shutdown vs kill API is Mesos

2018-01-11 Thread Mohit Jaggi
Thanks Renan. I saw that code. "Driver" interface does not have SHUTDOWN...so it is not "compatible". I was trying to change to VersionedSchedulerDriverService all over the code (that wreaks havoc across the tests!) but Mesos's Java wrapper https://github.com/apache/mesos/tree/72752fc6deb8ebcbfbd54

Re: shutdown vs kill API is Mesos

2018-01-11 Thread Renan DelValle
https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedule

Re: shutdown vs kill API is Mesos

2018-01-09 Thread Mohit Jaggi
David, Where can I find this code? Mohit. On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin wrote: > The new API is present in Aurora in a compatibility layer, but the HTTP > performance issues still exist so we can't make it the default. > > On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner wrote: >

Re: shutdown vs kill API is Mesos

2017-12-09 Thread Mohit Jaggi
Filed https://issues.apache.org/jira/browse/AURORA-1960 On Sat, Dec 9, 2017 at 4:45 PM, Bill Farner wrote: > The new API is present in Aurora in a compatibility layer > > > Aha! I had not explored that code >

Re: shutdown vs kill API is Mesos

2017-12-09 Thread Bill Farner
> > The new API is present in Aurora in a compatibility layer Aha! I had not explored that code yet. It does seem that

Re: shutdown vs kill API is Mesos

2017-12-09 Thread David McLaughlin
The new API is present in Aurora in a compatibility layer, but the HTTP performance issues still exist so we can't make it the default. On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner wrote: > Aurora pre-dates SHUTDOWN by several years, so the option was not > present. Additionally, the SHUTDOWN ca

Re: shutdown vs kill API is Mesos

2017-12-09 Thread Bill Farner
Aurora pre-dates SHUTDOWN by several years, so the option was not present. Additionally, the SHUTDOWN call is not available in the API used by Aurora. Last I knew, Aurora could not use the "new" API because of performance issues in the implementation, but I do not know where that stands today. ht

shutdown vs kill API is Mesos

2017-12-09 Thread Mohit Jaggi
Folks, Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for killing tasks. As Aurora has an executor per task, won't SHUTDOWN work better? It will avoid zombie executors. Mohit.