Re: shutdown vs kill API is Mesos

2018-01-20 Thread Stephan Erb
> Q1: Does Aurora use COMMAND or DEFAULT executor? 

Aurora is currently using neither. In Mesos terms, Thermos is a CUSTOM executor. 
On top of that, Aurora supports alternative custom executors [1] such as the 
Docker Compose executor [2].
Mesos seems to be betting on the new DEFAULT executor. It should be possible to 
make Thermos fit the DEFAULT executor model (which supports task groups), but I 
have no real estimate of how much refactoring this would require. 

> Q2: I think that this is ok as Aurora's reconciliation will still work... 
> Right?

Aurora assumes a correspondence of one task per executor, so I believe this is 
correct.

> Q3: Does thermos executor need any changes to respond to SHUTDOWN or does it 
> already handle that?

I have never tried it, but I believe it should work out of the box [3].

[1] https://github.com/apache/aurora/blob/master/docs/features/custom-executors.md
[2] https://github.com/mesos/docker-compose-executor
[3] https://github.com/apache/aurora/blob/8af269f52f162faa36cd2778979626eefcbe8181/src/main/python/apache/aurora/executor/aurora_executor.py#L301-L313



Best regards,
Stephan


On Wed, 2018-01-17 at 16:45 -0800, Mohit Jaggi wrote:
> FYI, I had a quick chat with Vinod from the Mesos team. I have some 
> questions for Aurora users inline:
> 
> Originally the default was the COMMAND executor. In this world the scheduler 
> has no visibility into the command executor.
> More recently, we added a DEFAULT executor which is used by frameworks when 
> they want to launch pod like task groups
> The SHUTDOWN executor call is only applicable if a scheduler uses CUSTOM or 
> DEFAULT executor *and* uses v1 scheduler API.
> 
> 
> Q1: Does Aurora use COMMAND or DEFAULT executor? 
> 
> 
> note that SHUTDOWN is not as robust as you might think :slightly_smiling_face:
> for one, there is no reconciliation API for the executor state. it is very 
> much best effort. 
> KILL is more robust for killing tasks, because task status updates are 
> reliably delivered and there is reconciliation API
> 
> Q2: I think that this is ok as Aurora's reconciliation will still work as we 
> don't have "executor state". "task state" will be a good and correct proxy 
> for that. Aurora will send SHUTDOWN again and again until it succeeds in the 
> same way as it does now with KILL. Right?
> 
> Q3: Does thermos executor need any changes to respond to SHUTDOWN or does it 
> already handle that?
> 
> 
> 
> 
> On Tue, Jan 16, 2018 at 4:48 PM, Mohit Jaggi  wrote:
> > So that is pretty much what I proposed...
> > If the method signature has to change, we can keep the executorId as it is, 
> > unless we want to take this opportunity to clean that up. I will check if 
> > the SHUTDOWN works in non-executor cases also.
> > 
> > On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner  wrote:
> > > > We still need "Agent ID" for the shutdown call.
> > > 
> > > Darn.  In that case, how about we change the method signature in Driver 
> > > to accept agentId and ignore that param in MesosSchedulerDriver.
> > > > But do we really need the command line option?
> > > 
> > > Aurora can run tasks without an executor.  I'm assuming the shutdown call 
> > > is incompatible with that mode.
> > > 
> > > On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi  wrote:
> > > > We still need "Agent ID" for the shutdown call.
> > > > 
> > > > On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi  
> > > > wrote:
> > > > > Sounds good. But do we really need the command line option? One can 
> > > > > use an older Driver if KILL is preferred for some reason.
> > > > > 
> > > > > On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner  
> > > > > wrote:
> > > > > > This situation is much simpler if task ID == executor ID.  I can't 
> > > > > > come up with a good reason why this is not the case today.  Our 
> > > > > > executor IDs originally included static prefix, though i do not 
> > > > > > recall any rationale for this.  When Renan added custom executor 
> > > > > > support, this static prefix was made configurable.  Again, i do not 
> > > > > > believe there was any rationale for the utility of executor IDs.
> > > > > > I propose the following:
> > > > > > - Change relevant code in MesosTaskFactory to 
> > > > > > setExecutorId(task.getTaskId())
> > > > > > - Add a command line parameter (default false) to toggle use of 
> > > > > > executor shutdown in VersionedSchedulerDriverService.killTask
> > > > > > 
> > > > > > 
> > > > > > Does anyone see an issue with this approach?
> > > > > > 
> > > > > > On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi 
> > > > > >  wrote:
> > > > > > > To do this in a backward compatible manner, one way is :
> > > > > > > ```
> > > > > > > void destroy(taskId, executorId, agentId) {
> > > > > > > 
> > > > > > > 
> > > > > > > if(driver instanceOf Versioned)   
> > > > > > > (Versioned...)driver.shutdown(executorId, 

Re: shutdown vs kill API is Mesos

2018-01-17 Thread Mohit Jaggi
FYI, I had a quick chat with Vinod from the Mesos team. I have some
questions for Aurora users inline:


*Originally the default was the COMMAND executor. In this world the
scheduler has no visibility into the command executor.*
*More recently, we added a DEFAULT executor which is used by frameworks
when they want to launch pod like task groups*

*The SHUTDOWN executor call is only applicable if a scheduler uses CUSTOM
or DEFAULT executor *and* uses v1 scheduler API.*

Q1: Does Aurora use COMMAND or DEFAULT executor?


*note that SHUTDOWN is not as robust as you might think
:slightly_smiling_face:*
*for one, there is no reconciliation API for the executor state. it is very
much best effort.*
*KILL is more robust for killing tasks, because task status updates are
reliably delivered and there is a reconciliation API*

Q2: I think that this is ok as Aurora's reconciliation will still work as
we don't have "executor state". "task state" will be a good and correct
proxy for that. Aurora will send SHUTDOWN again and again until it succeeds
in the same way as it does now with KILL. Right?

Q3: Does thermos executor need any changes to respond to SHUTDOWN or does
it already handle that?
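
The retry behaviour described in Q2 could look roughly like the sketch
below. It is only an illustration under stated assumptions: the class, the
TaskState enum and the callbacks are hypothetical stand-ins, not Aurora or
Mesos APIs. It uses task state as the proxy for executor state and re-sends
SHUTDOWN until the task is no longer reported as running, with an attempt
cap added so the loop terminates.

```java
import java.util.function.Supplier;

// Illustrative only: all names here are hypothetical, not Aurora or Mesos APIs.
final class ShutdownRetrySketch {
  enum TaskState { RUNNING, KILLED }

  /**
   * Re-sends SHUTDOWN while the task is still reported as RUNNING,
   * up to maxAttempts times. Returns the number of SHUTDOWN calls sent.
   */
  static int shutdownUntilTerminal(
      Runnable sendShutdown, Supplier<TaskState> taskState, int maxAttempts) {
    int attempts = 0;
    while (taskState.get() == TaskState.RUNNING && attempts < maxAttempts) {
      sendShutdown.run(); // best effort: no acknowledgement is assumed
      attempts++;
    }
    return attempts;
  }

  public static void main(String[] args) {
    int[] sent = {0};
    // Simulated agent: the task is reported KILLED after the third SHUTDOWN.
    int attempts = shutdownUntilTerminal(
        () -> sent[0]++,
        () -> sent[0] >= 3 ? TaskState.KILLED : TaskState.RUNNING,
        10);
    System.out.println("SHUTDOWN calls sent: " + attempts);
  }
}
```

A real scheduler would presumably space the retries out and rely on task
status updates and reconciliation rather than a tight loop; the sketch only
shows why no separate "executor state" is needed for the retry decision.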




On Tue, Jan 16, 2018 at 4:48 PM, Mohit Jaggi  wrote:

> So that is pretty much what I proposed...
>
> If the method signature has to change, we can keep the executorId as it
> is, unless we want to take this opportunity to clean that up. I will check
> if the SHUTDOWN works in non-executor cases also.
>
> On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner  wrote:
>
>> We still need "Agent ID" for the shutdown call.
>>
>>
>> Darn.  In that case, how about we change the method signature in Driver
>> to accept agentId and ignore that param in MesosSchedulerDriver.
>>
>> But do we really need the command line option?
>>
>>
>> Aurora can run tasks without an executor.  I'm assuming the shutdown call
>> is incompatible with that mode.
>>
>> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi 
>> wrote:
>>
>>> We still need "Agent ID" for the shutdown call.
>>>
>>> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi 
>>> wrote:
>>>
 Sounds good. But do we really need the command line option? One can use
 an older Driver if KILL is preferred for some reason.

 On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner 
 wrote:

> This situation is much simpler if task ID == executor ID.  I can't
> come up with a good reason why this is not the case today.  Our executor
> IDs originally included static prefix, though i do not recall any 
> rationale
> for this.  When Renan added custom executor support, this static prefix 
> was
> made configurable.  Again, i do not believe there was any rationale for 
> the
> utility of executor IDs.
>
> I propose the following:
> - Change relevant code in MesosTaskFactory to
> setExecutorId(task.getTaskId())
> - Add a command line parameter (default false) to toggle use of
> executor shutdown in VersionedSchedulerDriverService.killTask
>
> Does anyone see an issue with this approach?
>
> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi 
> wrote:
>
>> To do this in a backward compatible manner, one way is :
>>
>> ```
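>> // Pseudocode: "Versioned..." stands for the versioned (v1) driver type.
>> void destroy(taskId, executorId, agentId) {
>>   if (driver instanceof Versioned) {
>>     // v1 API available: shut down the executor on the given agent.
>>     ((Versioned...) driver).shutdown(executorId, agentId);
>>   } else {
>>     // Legacy driver: fall back to killing the task.
>>     driver.kill(taskId);
>>   }
>> }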
>> ```
>>
>> Any other opinions?
>>
>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>> dmclaugh...@apache.org> wrote:
>>
>>> Nope, I support getting SHUTDOWN in for users of the new API.
>>>
>>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi 
>>> wrote:
>>>
 Are you suggesting that we delay the switch to SHUTDOWN call until
 this working group can resolve the API perf issue?

 On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
 dmclaugh...@apache.org> wrote:

> We are working with Mesos folks to resolve it. There is a Mesos
> performance working group that folks can join if they'd like to 
> contribute:
> http://mesos.apache.org/blog/performance-working-group-progress-report/
>
> I'm not sure what you mean by branch. Everything we used to scale
> test is on master.
>
> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
> meghdoo...@yahoo.com> wrote:
>
>> David, should twitter try against mesos 1.5 to see if things are
>> better with the new api instead of libmesos. This is going to be a 
>> drift
>> over time that will stop us from adopting new features.
>>
>> If it was sometime back it 

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Mohit Jaggi
So that is pretty much what I proposed...

If the method signature has to change, we can keep the executorId as it is,
unless we want to take this opportunity to clean that up. I will also check
whether SHUTDOWN works in non-executor cases.

On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner  wrote:

> We still need "Agent ID" for the shutdown call.
>
>
> Darn.  In that case, how about we change the method signature in Driver to
> accept agentId and ignore that param in MesosSchedulerDriver.
>
> But do we really need the command line option?
>
>
> Aurora can run tasks without an executor.  I'm assuming the shutdown call
> is incompatible with that mode.
>
> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi  wrote:
>
>> We still need "Agent ID" for the shutdown call.
>>
>> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi 
>> wrote:
>>
>>> Sounds good. But do we really need the command line option? One can use
>>> an older Driver if KILL is preferred for some reason.
>>>
>>> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner  wrote:
>>>
 This situation is much simpler if task ID == executor ID.  I can't come
 up with a good reason why this is not the case today.  Our executor IDs
 originally included static prefix, though i do not recall any rationale for
 this.  When Renan added custom executor support, this static prefix was
 made configurable.  Again, i do not believe there was any rationale for the
 utility of executor IDs.

 I propose the following:
 - Change relevant code in MesosTaskFactory to
 setExecutorId(task.getTaskId())
 - Add a command line parameter (default false) to toggle use of
 executor shutdown in VersionedSchedulerDriverService.killTask

 Does anyone see an issue with this approach?

 On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi 
 wrote:

> To do this in a backward compatible manner, one way is :
>
> ```
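> // Pseudocode: "Versioned..." stands for the versioned (v1) driver type.
> void destroy(taskId, executorId, agentId) {
>   if (driver instanceof Versioned) {
>     // v1 API available: shut down the executor on the given agent.
>     ((Versioned...) driver).shutdown(executorId, agentId);
>   } else {
>     // Legacy driver: fall back to killing the task.
>     driver.kill(taskId);
>   }
> }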
> ```
>
> Any other opinions?
>
> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
> dmclaugh...@apache.org> wrote:
>
>> Nope, I support getting SHUTDOWN in for users of the new API.
>>
>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi 
>> wrote:
>>
>>> Are you suggesting that we delay the switch to SHUTDOWN call until
>>> this working group can resolve the API perf issue?
>>>
>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>> dmclaugh...@apache.org> wrote:
>>>
 We are working with Mesos folks to resolve it. There is a Mesos
 performance working group that folks can join if they'd like to 
 contribute:
 http://mesos.apache.org/blog/performance-working-group-progress-report/

 I'm not sure what you mean by branch. Everything we used to scale
 test is on master.

 On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
 meghdoo...@yahoo.com> wrote:

> David, should twitter try against mesos 1.5 to see if things are
> better with the new api instead of libmesos. This is going to be a 
> drift
> over time that will stop us from adopting new features.
>
> If it was sometime back it would be good to rerun the tests and
> open a ticket in Mesos if issues exist. All aurora users can then 
> push for
> resolution.
>
> Also details on branch etc that has the api integration?
>
> Thx
>
> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
> dmclaugh...@apache.org> wrote:
>
> I'm not sure I agree with the summary. Bill's proposal was using
> shutdown only when using the new API. I would also support this if 
> it's
> possible.
>
> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <
> mohit.ja...@uber.com> wrote:
>
>> Summary so far:
>> - Bill supports making this change
>> - This change cannot be made in a backward compatible manner
>> - David (Twitter) does not want to use HTTP APIs due to
>> performance concerns. I conclude that folks from Twitter don't 
>> support this
>> change
>>
>> Question:
>> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Bill Farner
> We still need "Agent ID" for the shutdown call.

Darn.  In that case, how about we change the method signature in Driver to
accept agentId and ignore that param in MesosSchedulerDriver?

> But do we really need the command line option?

Aurora can run tasks without an executor.  I'm assuming the shutdown call
is incompatible with that mode.
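
A minimal sketch of the signature change suggested above, under loud
assumptions: Driver, LegacyDriver and VersionedDriver are simplified,
hypothetical stand-ins (string IDs, recorded calls) and not Aurora's real
interfaces. It also assumes task ID == executor ID, as proposed earlier in
the thread. The point is only that one method can accept agentId, with the
legacy implementation ignoring it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical, simplified stand-ins for Aurora's driver types.
final class DriverSignatureSketch {
  /** Common interface: every implementation accepts agentId. */
  interface Driver {
    void killTask(String taskId, String agentId);
  }

  /** Stand-in for the legacy (libmesos) driver: agentId is accepted but ignored. */
  static final class LegacyDriver implements Driver {
    final List<String> calls = new ArrayList<>();

    @Override
    public void killTask(String taskId, String agentId) {
      calls.add("KILL " + taskId);
    }
  }

  /** Stand-in for the versioned (v1) driver: SHUTDOWN needs executor ID and agent ID. */
  static final class VersionedDriver implements Driver {
    final List<String> calls = new ArrayList<>();

    @Override
    public void killTask(String taskId, String agentId) {
      // Assumes task ID == executor ID, so the task ID doubles as the executor ID.
      calls.add("SHUTDOWN " + taskId + "@" + agentId);
    }
  }

  public static void main(String[] args) {
    LegacyDriver legacy = new LegacyDriver();
    VersionedDriver versioned = new VersionedDriver();
    legacy.killTask("task-1", "agent-7");
    versioned.killTask("task-1", "agent-7");
    System.out.println(legacy.calls);
    System.out.println(versioned.calls);
  }
}
```

Callers then pass the agent ID unconditionally and do not need to know which
driver is in use.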

On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi  wrote:

> We still need "Agent ID" for the shutdown call.
>
> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi  wrote:
>
>> Sounds good. But do we really need the command line option? One can use
>> an older Driver if KILL is preferred for some reason.
>>
>> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner  wrote:
>>
>>> This situation is much simpler if task ID == executor ID.  I can't come
>>> up with a good reason why this is not the case today.  Our executor IDs
>>> originally included static prefix, though i do not recall any rationale for
>>> this.  When Renan added custom executor support, this static prefix was
>>> made configurable.  Again, i do not believe there was any rationale for the
>>> utility of executor IDs.
>>>
>>> I propose the following:
>>> - Change relevant code in MesosTaskFactory to
>>> setExecutorId(task.getTaskId())
>>> - Add a command line parameter (default false) to toggle use of executor
>>> shutdown in VersionedSchedulerDriverService.killTask
>>>
>>> Does anyone see an issue with this approach?
>>>
>>> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi 
>>> wrote:
>>>
 To do this in a backward compatible manner, one way is :

 ```
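 // Pseudocode: "Versioned..." stands for the versioned (v1) driver type.
 void destroy(taskId, executorId, agentId) {
   if (driver instanceof Versioned) {
     // v1 API available: shut down the executor on the given agent.
     ((Versioned...) driver).shutdown(executorId, agentId);
   } else {
     // Legacy driver: fall back to killing the task.
     driver.kill(taskId);
   }
 }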
 ```

 Any other opinions?

 On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
 dmclaugh...@apache.org> wrote:

> Nope, I support getting SHUTDOWN in for users of the new API.
>
> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi 
> wrote:
>
>> Are you suggesting that we delay the switch to SHUTDOWN call until
>> this working group can resolve the API perf issue?
>>
>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>> dmclaugh...@apache.org> wrote:
>>
>>> We are working with Mesos folks to resolve it. There is a Mesos
>>> performance working group that folks can join if they'd like to 
>>> contribute:
>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>
>>> I'm not sure what you mean by branch. Everything we used to scale
>>> test is on master.
>>>
>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>> meghdoo...@yahoo.com> wrote:
>>>
 David, should twitter try against mesos 1.5 to see if things are
 better with the new api instead of libmesos. This is going to be a 
 drift
 over time that will stop us from adopting new features.

 If it was sometime back it would be good to rerun the tests and
 open a ticket in Mesos if issues exist. All aurora users can then push 
 for
 resolution.

 Also details on branch etc that has the api integration?

 Thx

 On Jan 12, 2018, at 11:39 AM, David McLaughlin <
 dmclaugh...@apache.org> wrote:

 I'm not sure I agree with the summary. Bill's proposal was using
 shutdown only when using the new API. I would also support this if it's
 possible.

 On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi  wrote:

> Summary so far:
> - Bill supports making this change
> - This change cannot be made in a backward compatible manner
> - David (Twitter) does not want to use HTTP APIs due to
> performance concerns. I conclude that folks from Twitter don't 
> support this
> change
>
> Question:
> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Mohit Jaggi
We still need "Agent ID" for the shutdown call.

On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi  wrote:

> Sounds good. But do we really need the command line option? One can use an
> older Driver if KILL is preferred for some reason.
>
> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner  wrote:
>
>> This situation is much simpler if task ID == executor ID.  I can't come
>> up with a good reason why this is not the case today.  Our executor IDs
>> originally included static prefix, though i do not recall any rationale for
>> this.  When Renan added custom executor support, this static prefix was
>> made configurable.  Again, i do not believe there was any rationale for the
>> utility of executor IDs.
>>
>> I propose the following:
>> - Change relevant code in MesosTaskFactory to
>> setExecutorId(task.getTaskId())
>> - Add a command line parameter (default false) to toggle use of executor
>> shutdown in VersionedSchedulerDriverService.killTask
>>
>> Does anyone see an issue with this approach?
>>
>> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi 
>> wrote:
>>
>>> To do this in a backward compatible manner, one way is :
>>>
>>> ```
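>>> // Pseudocode: "Versioned..." stands for the versioned (v1) driver type.
>>> void destroy(taskId, executorId, agentId) {
>>>   if (driver instanceof Versioned) {
>>>     // v1 API available: shut down the executor on the given agent.
>>>     ((Versioned...) driver).shutdown(executorId, agentId);
>>>   } else {
>>>     // Legacy driver: fall back to killing the task.
>>>     driver.kill(taskId);
>>>   }
>>> }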
>>> ```
>>>
>>> Any other opinions?
>>>
>>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>>> dmclaugh...@apache.org> wrote:
>>>
 Nope, I support getting SHUTDOWN in for users of the new API.

 On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi 
 wrote:

> Are you suggesting that we delay the switch to SHUTDOWN call until
> this working group can resolve the API perf issue?
>
> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
> dmclaugh...@apache.org> wrote:
>
>> We are working with Mesos folks to resolve it. There is a Mesos
>> performance working group that folks can join if they'd like to 
>> contribute:
>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>
>> I'm not sure what you mean by branch. Everything we used to scale
>> test is on master.
>>
>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>> meghdoo...@yahoo.com> wrote:
>>
>>> David, should twitter try against mesos 1.5 to see if things are
>>> better with the new api instead of libmesos. This is going to be a drift
>>> over time that will stop us from adopting new features.
>>>
>>> If it was sometime back it would be good to rerun the tests and open
>>> a ticket in Mesos if issues exist. All aurora users can then push for
>>> resolution.
>>>
>>> Also details on branch etc that has the api integration?
>>>
>>> Thx
>>>
>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
>>> dmclaugh...@apache.org> wrote:
>>>
>>> I'm not sure I agree with the summary. Bill's proposal was using
>>> shutdown only when using the new API. I would also support this if it's
>>> possible.
>>>
>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi 
>>> wrote:
>>>
 Summary so far:
 - Bill supports making this change
 - This change cannot be made in a backward compatible manner
 - David (Twitter) does not want to use HTTP APIs due to performance
 concerns. I conclude that folks from Twitter don't support this change

 Question:
 - Are there other users that want this change?




Re: shutdown vs kill API is Mesos

2018-01-16 Thread Mohit Jaggi
Sounds good. But do we really need the command line option? One can use an
older Driver if KILL is preferred for some reason.

On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner  wrote:

> This situation is much simpler if task ID == executor ID.  I can't come up
> with a good reason why this is not the case today.  Our executor IDs
> originally included static prefix, though i do not recall any rationale for
> this.  When Renan added custom executor support, this static prefix was
> made configurable.  Again, i do not believe there was any rationale for the
> utility of executor IDs.
>
> I propose the following:
> - Change relevant code in MesosTaskFactory to
> setExecutorId(task.getTaskId())
> - Add a command line parameter (default false) to toggle use of executor
> shutdown in VersionedSchedulerDriverService.killTask
>
> Does anyone see an issue with this approach?
>
> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi 
> wrote:
>
>> To do this in a backward compatible manner, one way is :
>>
>> ```
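>> // Pseudocode: "Versioned..." stands for the versioned (v1) driver type.
>> void destroy(taskId, executorId, agentId) {
>>   if (driver instanceof Versioned) {
>>     // v1 API available: shut down the executor on the given agent.
>>     ((Versioned...) driver).shutdown(executorId, agentId);
>>   } else {
>>     // Legacy driver: fall back to killing the task.
>>     driver.kill(taskId);
>>   }
>> }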
>> ```
>>
>> Any other opinions?
>>
>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>> dmclaugh...@apache.org> wrote:
>>
>>> Nope, I support getting SHUTDOWN in for users of the new API.
>>>
>>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi 
>>> wrote:
>>>
 Are you suggesting that we delay the switch to SHUTDOWN call until this
 working group can resolve the API perf issue?

 On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
 dmclaugh...@apache.org> wrote:

> We are working with Mesos folks to resolve it. There is a Mesos
> performance working group that folks can join if they'd like to 
> contribute:
> http://mesos.apache.org/blog/performance-working-group-progress-report/
>
> I'm not sure what you mean by branch. Everything we used to scale test
> is on master.
>
> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
> meghdoo...@yahoo.com> wrote:
>
>> David, should twitter try against mesos 1.5 to see if things are
>> better with the new api instead of libmesos. This is going to be a drift
>> over time that will stop us from adopting new features.
>>
>> If it was sometime back it would be good to rerun the tests and open
>> a ticket in Mesos if issues exist. All aurora users can then push for
>> resolution.
>>
>> Also details on branch etc that has the api integration?
>>
>> Thx
>>
>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
>> dmclaugh...@apache.org> wrote:
>>
>> I'm not sure I agree with the summary. Bill's proposal was using
>> shutdown only when using the new API. I would also support this if it's
>> possible.
>>
>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi 
>> wrote:
>>
>>> Summary so far:
>>> - Bill supports making this change
>>> - This change cannot be made in a backward compatible manner
>>> - David (Twitter) does not want to use HTTP APIs due to performance
>>> concerns. I conclude that folks from Twitter don't support this change
>>>
>>> Question:
>>> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

2018-01-16 Thread Bill Farner
This situation is much simpler if task ID == executor ID.  I can't come up
with a good reason why this is not the case today.  Our executor IDs
originally included a static prefix, though I do not recall any rationale for
this.  When Renan added custom executor support, this static prefix was
made configurable.  Again, I do not believe there was any rationale for the
utility of executor IDs.

I propose the following:
- Change relevant code in MesosTaskFactory to
  setExecutorId(task.getTaskId())
- Add a command line parameter (default false) to toggle use of executor
  shutdown in VersionedSchedulerDriverService.killTask

Does anyone see an issue with this approach?
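
The two bullets above can be sketched as follows. This is a rough
illustration, not the actual Aurora change: executorIdFor, KillService and
the useExecutorShutdown flag are hypothetical names, and the Mesos calls are
replaced by recorded strings so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simplified, hypothetical stand-ins for the Aurora classes named above.
final class ShutdownToggleSketch {
  /** First bullet: the executor ID is simply the task ID (no prefix). */
  static String executorIdFor(String taskId) {
    return taskId;
  }

  /** Second bullet: a flag (default false) selects executor SHUTDOWN over task KILL. */
  static final class KillService {
    final List<String> calls = new ArrayList<>();
    private final boolean useExecutorShutdown;

    KillService(boolean useExecutorShutdown) {
      this.useExecutorShutdown = useExecutorShutdown;
    }

    void killTask(String taskId) {
      if (useExecutorShutdown) {
        // The executor ID can be derived from the task ID alone.
        calls.add("SHUTDOWN executor=" + executorIdFor(taskId));
      } else {
        calls.add("KILL task=" + taskId);
      }
    }
  }

  public static void main(String[] args) {
    KillService byDefault = new KillService(false);
    KillService withShutdown = new KillService(true);
    byDefault.killTask("task-1");
    withShutdown.killTask("task-1");
    System.out.println(byDefault.calls);
    System.out.println(withShutdown.calls);
  }
}
```

With the flag off, behaviour is unchanged, which is what keeps the proposal
backward compatible for users who stay on KILL.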

On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi  wrote:

> To do this in a backward compatible manner, one way is :
>
> ```
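> // Pseudocode: "Versioned..." stands for the versioned (v1) driver type.
> void destroy(taskId, executorId, agentId) {
>   if (driver instanceof Versioned) {
>     // v1 API available: shut down the executor on the given agent.
>     ((Versioned...) driver).shutdown(executorId, agentId);
>   } else {
>     // Legacy driver: fall back to killing the task.
>     driver.kill(taskId);
>   }
> }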
> ```
>
> Any other opinions?
>
> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin wrote:
>
>> Nope, I support getting SHUTDOWN in for users of the new API.
>>
>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi 
>> wrote:
>>
>>> Are you suggesting that we delay the switch to SHUTDOWN call until this
>>> working group can resolve the API perf issue?
>>>
>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>> dmclaugh...@apache.org> wrote:
>>>
 We are working with Mesos folks to resolve it. There is a Mesos
 performance working group that folks can join if they'd like to contribute:
 http://mesos.apache.org/blog/performance-working-group-progress-report/

 I'm not sure what you mean by branch. Everything we used to scale test
 is on master.

 On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
 meghdoo...@yahoo.com> wrote:

> David, should twitter try against mesos 1.5 to see if things are
> better with the new api instead of libmesos. This is going to be a drift
> over time that will stop us from adopting new features.
>
> If it was sometime back it would be good to rerun the tests and open a
> ticket in Mesos if issues exist. All aurora users can then push for
> resolution.
>
> Also details on branch etc that has the api integration?
>
> Thx
>
> On Jan 12, 2018, at 11:39 AM, David McLaughlin 
> wrote:
>
> I'm not sure I agree with the summary. Bill's proposal was using
> shutdown only when using the new API. I would also support this if it's
> possible.
>
> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi 
> wrote:
>
>> Summary so far:
>> - Bill supports making this change
>> - This change cannot be made in a backward compatible manner
>> - David (Twitter) does not want to use HTTP APIs due to performance
>> concerns. I conclude that folks from Twitter don't support this change
>>
>> Question:
>> - Are there other users that want this change?

Re: shutdown vs kill API is Mesos

2018-01-15 Thread David McLaughlin
We are working with Mesos folks to resolve it. There is a Mesos performance
working group that folks can join if they'd like to contribute:
http://mesos.apache.org/blog/performance-working-group-progress-report/

I'm not sure what you mean by branch. Everything we used to scale test is
on master.

On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
meghdoo...@yahoo.com> wrote:

> David, should twitter try against mesos 1.5 to see if things are better
> with the new api instead of libmesos. This is going to be a drift over time
> that will stop us from adopting new features.
>
> If it was sometime back it would be good to rerun the tests and open a
> ticket in Mesos if issues exist. All aurora users can then push for
> resolution.
>
> Also details on branch etc that has the api integration?
>
> Thx
>
> On Jan 12, 2018, at 11:39 AM, David McLaughlin 
> wrote:
>
> I'm not sure I agree with the summary. Bill's proposal was using shutdown
> only when using the new API. I would also support this if it's possible.
>
> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi 
> wrote:
>
>> Summary so far:
>> - Bill supports making this change
>> - This change cannot be made in a backward compatible manner
>> - David (Twitter) does not want to use HTTP APIs due to performance
>> concerns. I conclude that folks from Twitter don't support this change
>>
>> Question:
>> - Are there other users that want this change?


Re: shutdown vs kill API is Mesos

2018-01-15 Thread Meghdoot bhattacharya
David, should Twitter try against Mesos 1.5 to see if things are better with 
the new API instead of libmesos? Otherwise this is going to be a drift over 
time that will stop us from adopting new features.

If the tests were run some time back, it would be good to rerun them and open a 
ticket in Mesos if issues exist. All Aurora users can then push for resolution.

Also, are there details on the branch etc. that has the API integration?

Thx

> On Jan 12, 2018, at 11:39 AM, David McLaughlin  wrote:
> 
> I'm not sure I agree with the summary. Bill's proposal was using shutdown 
> only when using the new API. I would also support this if it's possible.  
> 
>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi  wrote:
>> Summary so far:
>> - Bill supports making this change
>> - This change cannot be made in a backward compatible manner
>> - David (Twitter) does not want to use HTTP APIs due to performance 
>> concerns. I conclude that folks from Twitter don't support this change
>> 
>> Question:
>> - Are there other users that want this change?


Re: shutdown vs kill API is Mesos

2018-01-12 Thread David McLaughlin
I'm not sure I agree with the summary. Bill's proposal was using shutdown
only when using the new API. I would also support this if it's possible.

On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi  wrote:

> Summary so far:
> - Bill supports making this change
> - This change cannot be made in a backward compatible manner
> - David (Twitter) does not want to use HTTP APIs due to performance
> concerns. I conclude that folks from Twitter don't support this change
>
> Question:
> - Are there other users that want this change?


Re: shutdown vs kill API is Mesos

2018-01-12 Thread Mohit Jaggi
Summary so far:
- Bill supports making this change
- This change cannot be made in a backward compatible manner
- David (Twitter) does not want to use HTTP APIs due to performance
concerns. I conclude that folks from Twitter don't support this change

Question:
- Are there other users that want this change?


Re: shutdown vs kill API is Mesos

2018-01-11 Thread Renan DelValle
Sorry, I guess referring to it as the libmesos way of talking to the Mesos
master is a bit misleading.

And I stand corrected, the V0 is only an adaptor to the V1 interface which
still uses the undocumented RPC way of talking to the master (
https://github.com/apache/mesos/blob/master/src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp)
while using V1 versioned protobufs.

V1, on the other hand, talks to Mesos via a well-defined HTTP API.
There's still a dependency on libmesos because the implementation of the
code that handles the HTTP requests is made available via JNI. The big
difference here is that someone else can implement their own Java-only
version of the driver, and the dependency on libmesos would be gone.

Apologies for the confusion.

On Thu, Jan 11, 2018 at 2:03 PM, Mohit Jaggi  wrote:

> David,
> - LCD makes sense. Does that mean that Twitter is using the
>   SCHEDULER_DRIVER version?
> - I don't see Bill's proposal on this thread. Did I miss it?
>
> Renan,
> VersionedDriverFactory's comments indicate that libmesos is still used.
> What am I missing?
>
> BTW, with the patch for Thermos (from Stephan I think), the need for
> switching to SHUTDOWN is reduced.
> Mohit.
>
> On Thu, Jan 11, 2018 at 2:01 PM, David McLaughlin 
> wrote:
>
>> Sorry, the other approach outlined by Bill would in theory work too, but
>> it sounds like in practice it also needs more changes on the Mesos side.
>>
>> On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin wrote:
>>
>>> Right. In order to keep the current abstraction in Aurora (both APIs),
>>> we obviously have to bind to the lower common denominator API methods. So
>>> the only way to integrate with shutdown will be to fix the performance
>>> issues so we can switch to the new API.
>>>
>>> The performance issue we ran into at Twitter was that with status
>>> updates that were similar to our production volume, they started to get
>>> dropped and tasks end up being LOST and unnecessarily killed. So it's a
>>> definite blocker for us to adopt in its current state. We have someone who
>>> has fixing this on the Mesos side in their backlog, but it's currently not
>>> the highest priority for us.
>>>
>>> On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle <
>>> renanidelva...@gmail.com> wrote:
>>>
 The HTTP API is what is used under the hood for V0 and V1 (instead of
 libmesos), I believe that's what David was referencing when he mentioned
 the HTTP performance issues. Here's a better explanation from the original
 patch submitted by Zameer: https://github.com/apa
 che/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#d
 iff-75bd5a98db87502a2332e9110d2eafc6

 I'm not sure about the Shutdown call, as you mentioned, the versioned
 driver seems to have the method but the driver interface does not. This
 might get tricky from here on in since Mesos has V1 only compatible calls.

 On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi 
 wrote:

> Thanks Renan. I saw that code. "Driver" interface does not have
> SHUTDOWN...so it is not "compatible". I was trying to change to
> VersionedSchedulerDriverService all over the code (that wreaks havoc
> across the tests!) but Mesos's Java wrapper doesn't seem to have that
> call either. Perhaps, that is why David referred to the HTTP API.
>
> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <
> renanidelva...@gmail.com> wrote:
>
>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
>> d07b1f029150e245de/src/main/java/org/apache/aurora/scheduler
>> /mesos/SchedulerDriverModule.java
>>
>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
>> d07b1f029150e245de/src/main/java/org/apache/aurora/scheduler
>> /mesos/VersionedSchedulerDriverService.java#L50
>>
>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi 
>> wrote:
>>
>>> David,
>>> Where can I find this code?
>>>
>>> Mohit.
>>>
>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
>>> dmclaugh...@apache.org> wrote:
>>>
 The new API is present in Aurora in a compatibility layer, but the
 HTTP performance issues still exist so we can't make it the default.

 On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner 
 wrote:

> Aurora pre-dates SHUTDOWN by several years, so the option was not
> present.  Additionally, the SHUTDOWN call is not available in the API 
> used
> by Aurora. 

Re: shutdown vs kill API is Mesos

2018-01-11 Thread Mohit Jaggi
David,
- LCD makes sense. Does that mean that Twitter is using the SCHEDULER_DRIVER
version?
- I don't see Bill's proposal on this thread. Did I miss it?

Renan,
VersionedDriverFactory's comments indicate that libmesos is still used. What
am I missing?

BTW, with the patch for Thermos (from Stephan I think), the need for
switching to SHUTDOWN is reduced.
Mohit.

On Thu, Jan 11, 2018 at 2:01 PM, David McLaughlin 
wrote:

> Sorry, the other approach outlined by Bill would in theory work too, but
> it sounds like in practice it also needs more changes on the Mesos side.
>
> On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin 
> wrote:
>
>> Right. In order to keep the current abstraction in Aurora (both APIs), we
>> obviously have to bind to the lower common denominator API methods. So the
>> only way to integrate with shutdown will be to fix the performance issues
>> so we can switch to the new API.
>>
>> The performance issue we ran into at Twitter was that with status updates
>> that were similar to our production volume, they started to get dropped and
>> tasks end up being LOST and unnecessarily killed. So it's a definite
>> blocker for us to adopt in its current state. We have someone who has
>> fixing this on the Mesos side in their backlog, but it's currently not the
>> highest priority for us.
>>
>> On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle wrote:
>>
>>> The HTTP API is what is used under the hood for V0 and V1 (instead of
>>> libmesos), I believe that's what David was referencing when he mentioned
>>> the HTTP performance issues. Here's a better explanation from the original
>>> patch submitted by Zameer: https://github.com/apa
>>> che/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#d
>>> iff-75bd5a98db87502a2332e9110d2eafc6
>>>
>>> I'm not sure about the Shutdown call, as you mentioned, the versioned
>>> driver seems to have the method but the driver interface does not. This
>>> might get tricky from here on in since Mesos has V1 only compatible calls.
>>>
>>> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi 
>>> wrote:
>>>
 Thanks Renan. I saw that code. "Driver" interface does not have
 SHUTDOWN...so it is not "compatible". I was trying to change to
 VersionedSchedulerDriverService all over the code (that wreaks havoc
 across the tests!) but Mesos's Java wrapper doesn't seem to have that
 call either. Perhaps, that is why David referred to the HTTP API.

 On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <
 renanidelva...@gmail.com> wrote:

> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
> d07b1f029150e245de/src/main/java/org/apache/aurora/scheduler
> /mesos/SchedulerDriverModule.java
>
> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
> d07b1f029150e245de/src/main/java/org/apache/aurora/scheduler
> /mesos/VersionedSchedulerDriverService.java#L50
>
> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi 
> wrote:
>
>> David,
>> Where can I find this code?
>>
>> Mohit.
>>
>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
>> dmclaugh...@apache.org> wrote:
>>
>>> The new API is present in Aurora in a compatibility layer, but the
>>> HTTP performance issues still exist so we can't make it the default.
>>>
>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner 
>>> wrote:
>>>
 Aurora pre-dates SHUTDOWN by several years, so the option was not
 present.  Additionally, the SHUTDOWN call is not available in the API 
 used
 by Aurora.  Last i knew, Aurora could not use the "new" API because of
 performance issues in the implementation, but i do not know where that
 stands today.

 https://mesos.apache.org/documentation/latest/scheduler-http
 -api/#shutdown

> NOTE: This is a new call that was not present in the old API


 On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi 
 wrote:

> Folks,
> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN
> for killing tasks. As Aurora has an executor per task, won't SHUTDOWN 
> work
> better? It will avoid zombie executors.
>
> Mohit.
>


>>>
>>
>

>>>
>>
>


Re: shutdown vs kill API is Mesos

2018-01-11 Thread David McLaughlin
Sorry, the other approach outlined by Bill would in theory work too, but it
sounds like in practice it also needs more changes on the Mesos side.

On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin 
wrote:

> Right. In order to keep the current abstraction in Aurora (both APIs), we
> obviously have to bind to the lower common denominator API methods. So the
> only way to integrate with shutdown will be to fix the performance issues
> so we can switch to the new API.
>
> The performance issue we ran into at Twitter was that with status updates
> that were similar to our production volume, they started to get dropped and
> tasks end up being LOST and unnecessarily killed. So it's a definite
> blocker for us to adopt in its current state. We have someone who has
> fixing this on the Mesos side in their backlog, but it's currently not the
> highest priority for us.
>
> On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle 
> wrote:
>
>> The HTTP API is what is used under the hood for V0 and V1 (instead of
>> libmesos), I believe that's what David was referencing when he mentioned
>> the HTTP performance issues. Here's a better explanation from the original
>> patch submitted by Zameer: https://github.com/apa
>> che/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#
>> diff-75bd5a98db87502a2332e9110d2eafc6
>>
>> I'm not sure about the Shutdown call, as you mentioned, the versioned
>> driver seems to have the method but the driver interface does not. This
>> might get tricky from here on in since Mesos has V1 only compatible calls.
>>
>> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi 
>> wrote:
>>
>>> Thanks Renan. I saw that code. "Driver" interface does not have
>>> SHUTDOWN...so it is not "compatible". I was trying to change to
>>> VersionedSchedulerDriverService all over the code (that wreaks havoc
>>> across the tests!) but Mesos's Java wrapper doesn't seem to have that
>>> call either. Perhaps, that is why David referred to the HTTP API.
>>>
>>> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <
>>> renanidelva...@gmail.com> wrote:
>>>
 https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
 d07b1f029150e245de/src/main/java/org/apache/aurora/scheduler
 /mesos/SchedulerDriverModule.java

 https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
 d07b1f029150e245de/src/main/java/org/apache/aurora/scheduler
 /mesos/VersionedSchedulerDriverService.java#L50

 On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi 
 wrote:

> David,
> Where can I find this code?
>
> Mohit.
>
> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
> dmclaugh...@apache.org> wrote:
>
>> The new API is present in Aurora in a compatibility layer, but the
>> HTTP performance issues still exist so we can't make it the default.
>>
>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner 
>> wrote:
>>
>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>> present.  Additionally, the SHUTDOWN call is not available in the API 
>>> used
>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>> performance issues in the implementation, but i do not know where that
>>> stands today.
>>>
>>> https://mesos.apache.org/documentation/latest/scheduler-http
>>> -api/#shutdown
>>>
 NOTE: This is a new call that was not present in the old API
>>>
>>>
>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi 
>>> wrote:
>>>
 Folks,
 Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
 killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
 better? It will avoid zombie executors.

 Mohit.

>>>
>>>
>>
>

>>>
>>
>


Re: shutdown vs kill API is Mesos

2018-01-11 Thread David McLaughlin
Right. In order to keep the current abstraction in Aurora (both APIs), we
obviously have to bind to the lowest common denominator API methods. So the
only way to integrate with shutdown will be to fix the performance issues
so we can switch to the new API.

The performance issue we ran into at Twitter was that with status updates
that were similar to our production volume, they started to get dropped and
tasks ended up being LOST and unnecessarily killed. So it's a definite
blocker for us to adopt in its current state. We have someone who has
fixing this on the Mesos side in their backlog, but it's currently not the
highest priority for us.
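
The "lowest common denominator" constraint can be illustrated roughly as
follows. This is a hypothetical sketch, not Aurora's actual classes: Driver,
LegacyDriver, V1Driver and stop are invented names, and the real scheduler
code is Java rather than Python.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical common interface: only calls every backend supports."""

    @abstractmethod
    def kill_task(self, task_id):
        ...


class LegacyDriver(Driver):
    """Stand-in for the old driver, which has no SHUTDOWN call."""

    def kill_task(self, task_id):
        return ("KILL", task_id)


class V1Driver(Driver):
    """Stand-in for a driver backed by the V1 HTTP API."""

    def kill_task(self, task_id):
        return ("KILL", task_id)

    def shutdown_executor(self, executor_id):
        # Exists only here, so code written against Driver cannot rely on it.
        return ("SHUTDOWN", executor_id)


def stop(driver, task_id, executor_id):
    """Use SHUTDOWN when the backend offers it, otherwise fall back to KILL."""
    if isinstance(driver, V1Driver):
        return driver.shutdown_executor(executor_id)
    return driver.kill_task(task_id)
```

Code that only knows about Driver can call kill_task and nothing more, which
is why using SHUTDOWN everywhere would mean dropping the old backend or
special-casing the new one as stop does here.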

On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle 
wrote:

> The HTTP API is what is used under the hood for V0 and V1 (instead of
> libmesos), I believe that's what David was referencing when he mentioned
> the HTTP performance issues. Here's a better explanation from the original
> patch submitted by Zameer: https://github.com/apache/aurora/commit/
> 705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-
> 75bd5a98db87502a2332e9110d2eafc6
>
> I'm not sure about the Shutdown call, as you mentioned, the versioned
> driver seems to have the method but the driver interface does not. This
> might get tricky from here on in since Mesos has V1 only compatible calls.
>
> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi  wrote:
>
>> Thanks Renan. I saw that code. "Driver" interface does not have
>> SHUTDOWN...so it is not "compatible". I was trying to change to
>> VersionedSchedulerDriverService all over the code (that wreaks havoc
>> across the tests!) but Mesos's Java wrapper doesn't seem to have that
>> call either. Perhaps, that is why David referred to the HTTP API.
>>
>> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle wrote:
>>
>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
>>> d07b1f029150e245de/src/main/java/org/apache/aurora/scheduler
>>> /mesos/SchedulerDriverModule.java
>>>
>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
>>> d07b1f029150e245de/src/main/java/org/apache/aurora/scheduler
>>> /mesos/VersionedSchedulerDriverService.java#L50
>>>
>>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi 
>>> wrote:
>>>
 David,
 Where can I find this code?

 Mohit.

 On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
 dmclaugh...@apache.org> wrote:

> The new API is present in Aurora in a compatibility layer, but the
> HTTP performance issues still exist so we can't make it the default.
>
> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner 
> wrote:
>
>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>> present.  Additionally, the SHUTDOWN call is not available in the API 
>> used
>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>> performance issues in the implementation, but i do not know where that
>> stands today.
>>
>> https://mesos.apache.org/documentation/latest/scheduler-http
>> -api/#shutdown
>>
>>> NOTE: This is a new call that was not present in the old API
>>
>>
>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi 
>> wrote:
>>
>>> Folks,
>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>> better? It will avoid zombie executors.
>>>
>>> Mohit.
>>>
>>
>>
>

>>>
>>
>


Re: shutdown vs kill API is Mesos

2018-01-11 Thread Renan DelValle
The HTTP API is what is used under the hood for V0 and V1 (instead of
libmesos), I believe that's what David was referencing when he mentioned
the HTTP performance issues. Here's a better explanation from the original
patch submitted by Zameer:
https://github.com/apache/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-75bd5a98db87502a2332e9110d2eafc6

I'm not sure about the Shutdown call, as you mentioned, the versioned
driver seems to have the method but the driver interface does not. This
might get tricky from here on in since Mesos has V1 only compatible calls.

On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi  wrote:

> Thanks Renan. I saw that code. "Driver" interface does not have
> SHUTDOWN...so it is not "compatible". I was trying to change to
> VersionedSchedulerDriverService all over the code (that wreaks havoc
> across the tests!) but Mesos's Java wrapper doesn't seem to have that
> call either. Perhaps, that is why David referred to the HTTP API.
>
> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle 
> wrote:
>
>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
>> d07b1f029150e245de/src/main/java/org/apache/aurora/
>> scheduler/mesos/SchedulerDriverModule.java
>>
>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982e
>> d07b1f029150e245de/src/main/java/org/apache/aurora/
>> scheduler/mesos/VersionedSchedulerDriverService.java#L50
>>
>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi  wrote:
>>
>>> David,
>>> Where can I find this code?
>>>
>>> Mohit.
>>>
>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin wrote:
>>>
 The new API is present in Aurora in a compatibility layer, but the HTTP
 performance issues still exist so we can't make it the default.

 On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner  wrote:

> Aurora pre-dates SHUTDOWN by several years, so the option was not
> present.  Additionally, the SHUTDOWN call is not available in the API used
> by Aurora.  Last i knew, Aurora could not use the "new" API because of
> performance issues in the implementation, but i do not know where that
> stands today.
>
> https://mesos.apache.org/documentation/latest/scheduler-http
> -api/#shutdown
>
>> NOTE: This is a new call that was not present in the old API
>
>
> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi 
> wrote:
>
>> Folks,
>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>> better? It will avoid zombie executors.
>>
>> Mohit.
>>
>
>

>>>
>>
>


Re: shutdown vs kill API is Mesos

2018-01-11 Thread Mohit Jaggi
Thanks Renan. I saw that code. "Driver" interface does not have
SHUTDOWN...so it is not "compatible". I was trying to change to
VersionedSchedulerDriverService all over the code (that wreaks havoc across
the tests!) but Mesos's Java wrapper
<https://github.com/apache/mesos/tree/72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/java/src/org/apache/mesos/v1/scheduler>
doesn't seem to have that call either. Perhaps, that is why David referred
to the HTTP API.

On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle 
wrote:

> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f02
> 9150e245de/src/main/java/org/apache/aurora/scheduler/mesos/
> SchedulerDriverModule.java
>
> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f02
> 9150e245de/src/main/java/org/apache/aurora/scheduler/mesos/
> VersionedSchedulerDriverService.java#L50
>
> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi  wrote:
>
>> David,
>> Where can I find this code?
>>
>> Mohit.
>>
>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin 
>> wrote:
>>
>>> The new API is present in Aurora in a compatibility layer, but the HTTP
>>> performance issues still exist so we can't make it the default.
>>>
>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner  wrote:
>>>
 Aurora pre-dates SHUTDOWN by several years, so the option was not
 present.  Additionally, the SHUTDOWN call is not available in the API used
 by Aurora.  Last i knew, Aurora could not use the "new" API because of
 performance issues in the implementation, but i do not know where that
 stands today.

 https://mesos.apache.org/documentation/latest/scheduler-http
 -api/#shutdown

> NOTE: This is a new call that was not present in the old API


 On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi 
 wrote:

> Folks,
> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
> better? It will avoid zombie executors.
>
> Mohit.
>


>>>
>>
>


Re: shutdown vs kill API is Mesos

2018-01-11 Thread Renan DelValle
https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java

https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50

On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi  wrote:

> David,
> Where can I find this code?
>
> Mohit.
>
> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin 
> wrote:
>
>> The new API is present in Aurora in a compatibility layer, but the HTTP
>> performance issues still exist so we can't make it the default.
>>
>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner  wrote:
>>
>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>> performance issues in the implementation, but i do not know where that
>>> stands today.
>>>
>>> https://mesos.apache.org/documentation/latest/scheduler-http
>>> -api/#shutdown
>>>
 NOTE: This is a new call that was not present in the old API
>>>
>>>
>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi 
>>> wrote:
>>>
 Folks,
 Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
 killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
 better? It will avoid zombie executors.

 Mohit.

>>>
>>>
>>
>


Re: shutdown vs kill API is Mesos

2018-01-09 Thread Mohit Jaggi
David,
Where can I find this code?

Mohit.

On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin 
wrote:

> The new API is present in Aurora in a compatibility layer, but the HTTP
> performance issues still exist so we can't make it the default.
>
> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner  wrote:
>
>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>> present.  Additionally, the SHUTDOWN call is not available in the API used
>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>> performance issues in the implementation, but i do not know where that
>> stands today.
>>
>> https://mesos.apache.org/documentation/latest/scheduler-
>> http-api/#shutdown
>>
>>> NOTE: This is a new call that was not present in the old API
>>
>>
>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi  wrote:
>>
>>> Folks,
>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>> better? It will avoid zombie executors.
>>>
>>> Mohit.
>>>
>>
>>
>


Re: shutdown vs kill API is Mesos

2017-12-09 Thread Mohit Jaggi
Filed https://issues.apache.org/jira/browse/AURORA-1960

On Sat, Dec 9, 2017 at 4:45 PM, Bill Farner  wrote:

> The new API is present in Aurora in a compatibility layer
>
>
> Aha!  I had not explored that code yet.  It does seem that SHUTDOWN
> provides the behavior that we aim for
> when killing tasks.  The global executor shutdown timeout (
> --executor_shutdown_grace_period) potentially interferes with our
> graceful_shutdown_wait_secs job-level configuration.  However, an
> operator could use the former as an upper limit to the latter.
>
> From what i see, i'd support a patch to switch to SHUTDOWN when using
> DriverKind.V0_DRIVER or DriverKind.V1_DRIVER.
>
> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin 
> wrote:
>
>> The new API is present in Aurora in a compatibility layer, but the HTTP
>> performance issues still exist so we can't make it the default.
>>
>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner  wrote:
>>
>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>> present.  Additionally, the SHUTDOWN call is not available in the API used
>>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>>> performance issues in the implementation, but i do not know where that
>>> stands today.
>>>
>>> https://mesos.apache.org/documentation/latest/scheduler-http
>>> -api/#shutdown
>>>
 NOTE: This is a new call that was not present in the old API
>>>
>>>
>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi 
>>> wrote:
>>>
 Folks,
 Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
 killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
 better? It will avoid zombie executors.

 Mohit.

>>>
>>>
>>
>


Re: shutdown vs kill API is Mesos

2017-12-09 Thread Bill Farner
>
> The new API is present in Aurora in a compatibility layer


Aha!  I had not explored that code yet.  It does seem that SHUTDOWN provides
the behavior that we aim for when
killing tasks.  The global executor shutdown timeout (
--executor_shutdown_grace_period) potentially interferes with our
graceful_shutdown_wait_secs job-level configuration.  However, an operator
could use the former as an upper limit to the latter.

From what I see, I'd support a patch to switch to SHUTDOWN when using
DriverKind.V0_DRIVER or DriverKind.V1_DRIVER.
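
The "upper limit" relationship between the two timeouts could be sketched as
below. This is a minimal sketch, assuming the agent enforces
--executor_shutdown_grace_period no matter what the job asks for; the
function name and the plain-seconds arguments are hypothetical and do not
come from Aurora or Mesos code.

```python
def effective_graceful_wait(job_wait_secs, executor_grace_secs):
    """Cap the job-level wait at the agent-wide executor shutdown grace period.

    job_wait_secs stands in for graceful_shutdown_wait_secs and
    executor_grace_secs for --executor_shutdown_grace_period, both in seconds.
    """
    if job_wait_secs < 0 or executor_grace_secs < 0:
        raise ValueError("durations must be non-negative")
    # The job never gets longer than the global limit allows.
    return min(job_wait_secs, executor_grace_secs)
```

With a global grace period of 60 seconds, a job asking for 30 seconds keeps
its 30, while a job asking for 120 is effectively cut off at 60.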

On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin 
wrote:

> The new API is present in Aurora in a compatibility layer, but the HTTP
> performance issues still exist so we can't make it the default.
>
> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner  wrote:
>
>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>> present.  Additionally, the SHUTDOWN call is not available in the API used
>> by Aurora.  Last i knew, Aurora could not use the "new" API because of
>> performance issues in the implementation, but i do not know where that
>> stands today.
>>
>> https://mesos.apache.org/documentation/latest/scheduler-
>> http-api/#shutdown
>>
>>> NOTE: This is a new call that was not present in the old API
>>
>>
>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi  wrote:
>>
>>> Folks,
>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN for
>>> killing tasks. As Aurora has an executor per task, won't SHUTDOWN work
>>> better? It will avoid zombie executors.
>>>
>>> Mohit.
>>>
>>
>>
>