FYI: I had a quick chat with Vinod from the Mesos team. His comments are
in italics below, and I have some questions for Aurora users inline:


*Originally the default was the COMMAND executor. In this world the
scheduler has no visibility into the command executor.*
*More recently, we added a DEFAULT executor, which is used by frameworks
when they want to launch pod-like task groups.*

*The SHUTDOWN executor call is only applicable if a scheduler uses CUSTOM
or DEFAULT executor *and* uses v1 scheduler API.*

Q1: Does Aurora use COMMAND or DEFAULT executor?


*Note that SHUTDOWN is not as robust as you might think :)*
*For one, there is no reconciliation API for the executor state. It is very
much best effort.*
*KILL is more robust for killing tasks, because task status updates are
reliably delivered and there is a reconciliation API.*

Q2: I think this is OK: Aurora's reconciliation will still work because we
don't track "executor state", and "task state" is a good and correct proxy
for it. Aurora will send SHUTDOWN again and again until it succeeds, the
same way it does now with KILL. Right?
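
To make Q2 concrete, here is a minimal sketch of the retry behaviour I am assuming, with task state used as the proxy for executor state. Every name in it (`ShutdownRetrier`, `ShutdownSender`, the `taskState` lookup, the set of terminal states) is hypothetical and is not Aurora or Mesos API; it only illustrates re-sending a best-effort call until the task is observed as terminal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

public class ShutdownRetrier {
  /** Hypothetical stand-in for whatever actually sends the SHUTDOWN call. */
  public interface ShutdownSender {
    void shutdown(String executorId, String agentId);
  }

  // Task states treated as terminal in this sketch (an assumption, not an exhaustive list).
  private static final Set<String> TERMINAL = new HashSet<>(
      Arrays.asList("TASK_KILLED", "TASK_FINISHED", "TASK_FAILED", "TASK_LOST"));

  /**
   * Re-sends SHUTDOWN until the task state is terminal or maxAttempts is reached.
   * Returns the number of SHUTDOWN calls sent. A real scheduler would wait or
   * back off between attempts; that is omitted here.
   */
  public static int shutdownUntilTerminal(
      String taskId,
      String executorId,
      String agentId,
      ShutdownSender sender,
      Function<String, String> taskState,
      int maxAttempts) {
    int sent = 0;
    while (sent < maxAttempts && !TERMINAL.contains(taskState.apply(taskId))) {
      sender.shutdown(executorId, agentId);
      sent++;
    }
    return sent;
  }
}
```

The point of the sketch is that the loop never asks about executor state; it only looks at task state, which is what reconciliation already gives us.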

Q3: Does the Thermos executor need any changes to respond to SHUTDOWN, or
does it already handle it?




On Tue, Jan 16, 2018 at 4:48 PM, Mohit Jaggi <mohit.ja...@uber.com> wrote:

> So that is pretty much what I proposed...
>
> If the method signature has to change, we can keep the executorId as it
> is, unless we want to take this opportunity to clean that up. I will check
> if the SHUTDOWN works in non-executor cases also.
>
> On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner <wfar...@apache.org> wrote:
>
>> We still need "Agent ID" for the shutdown call.
>>
>>
>> Darn.  In that case, how about we change the method signature in Driver
>> to accept agentId and ignore that param in MesosSchedulerDriver.
>>
>> But do we really need the command line option?
>>
>>
>> Aurora can run tasks without an executor.  I'm assuming the shutdown call
>> is incompatible with that mode.
>>
>> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mohit.ja...@uber.com>
>> wrote:
>>
>>> We still need "Agent ID" for the shutdown call.
>>>
>>> On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <mohit.ja...@uber.com>
>>> wrote:
>>>
>>>> Sounds good. But do we really need the command line option? One can use
>>>> an older Driver if KILL is preferred for some reason.
>>>>
>>>> On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <wfar...@apache.org>
>>>> wrote:
>>>>
>>>>> This situation is much simpler if task ID == executor ID.  I can't
>>>>> come up with a good reason why this is not the case today.  Our executor
>>>>> IDs originally included a static prefix, though I do not recall any
>>>>> rationale for this.  When Renan added custom executor support, this
>>>>> static prefix was made configurable.  Again, I do not believe there was
>>>>> any rationale for the utility of executor IDs.
>>>>>
>>>>> I propose the following:
>>>>> - Change relevant code in MesosTaskFactory to
>>>>> setExecutorId(task.getTaskId())
>>>>> - Add a command line parameter (default false) to toggle use of
>>>>> executor shutdown in VersionedSchedulerDriverService.killTask
>>>>>
>>>>> Does anyone see an issue with this approach?
>>>>>
>>>>> On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi <mohit.ja...@uber.com>
>>>>> wrote:
>>>>>
>>>>>> To do this in a backward compatible manner, one way is :
>>>>>>
>>>>>> ```
>>>>>> void destroy(taskId, executorId, agentId) {
>>>>>>   // Use SHUTDOWN only when the driver is the versioned (v1 API) one.
>>>>>>   if (driver instanceof Versioned....) {
>>>>>>     ((Versioned...) driver).shutdown(executorId, agentId);
>>>>>>   } else {
>>>>>>     driver.kill(taskId);
>>>>>>   }
>>>>>> }
>>>>>> ```
>>>>>>
>>>>>> Any other opinions?
>>>>>>
>>>>>> On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin <
>>>>>> dmclaugh...@apache.org> wrote:
>>>>>>
>>>>>>> Nope, I support getting SHUTDOWN in for users of the new API.
>>>>>>>
>>>>>>> On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi <mohit.ja...@uber.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Are you suggesting that we delay the switch to SHUTDOWN call until
>>>>>>>> this working group can resolve the API perf issue?
>>>>>>>>
>>>>>>>> On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin <
>>>>>>>> dmclaugh...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> We are working with Mesos folks to resolve it. There is a Mesos
>>>>>>>>> performance working group that folks can join if they'd like to
>>>>>>>>> contribute:
>>>>>>>>> http://mesos.apache.org/blog/performance-working-group-progress-report/
>>>>>>>>>
>>>>>>>>> I'm not sure what you mean by branch. Everything we used to scale
>>>>>>>>> test is on master.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya <
>>>>>>>>> meghdoo...@yahoo.com> wrote:
>>>>>>>>>
>>>>>>>>>> David, should Twitter try against Mesos 1.5 to see if things are
>>>>>>>>>> better with the new API instead of libmesos? This is going to be a
>>>>>>>>>> drift over time that will stop us from adopting new features.
>>>>>>>>>>
>>>>>>>>>> If the tests were run some time back, it would be good to rerun them
>>>>>>>>>> and open a ticket in Mesos if issues exist. All Aurora users can then
>>>>>>>>>> push for resolution.
>>>>>>>>>>
>>>>>>>>>> Also, can you share details on the branch etc. that has the API
>>>>>>>>>> integration?
>>>>>>>>>>
>>>>>>>>>> Thx
>>>>>>>>>>
>>>>>>>>>> On Jan 12, 2018, at 11:39 AM, David McLaughlin <
>>>>>>>>>> dmclaugh...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> I'm not sure I agree with the summary. Bill's proposal was using
>>>>>>>>>> shutdown only when using the new API. I would also support this if
>>>>>>>>>> it's possible.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi <
>>>>>>>>>> mohit.ja...@uber.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Summary so far:
>>>>>>>>>>> - Bill supports making this change
>>>>>>>>>>> - This change cannot be made in a backward compatible manner
>>>>>>>>>>> - David (Twitter) does not want to use HTTP APIs due to
>>>>>>>>>>> performance concerns. I conclude that folks from Twitter don't
>>>>>>>>>>> support this change
>>>>>>>>>>>
>>>>>>>>>>> Question:
>>>>>>>>>>> - Are there other users that want this change?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
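
For reference, the proposal quoted in the thread (task ID == executor ID, a toggle that defaults to false, and a driver method that also accepts the agent ID but ignores it on the older driver) could look roughly like the sketch below. The class and method names (`KillOrShutdown`, `Driver`, `LegacyDriver`, `VersionedDriver`, `useExecutorShutdown`) are hypothetical stand-ins, not the actual Aurora classes, and the drivers only record the call they would make instead of talking to Mesos.

```java
import java.util.ArrayList;
import java.util.List;

public class KillOrShutdown {
  /** Hypothetical driver abstraction; the real Aurora interface may differ. */
  public interface Driver {
    void killTask(String taskId, String agentId);
  }

  /** Older-style driver: accepts agentId for signature compatibility but ignores it. */
  public static class LegacyDriver implements Driver {
    public final List<String> calls = new ArrayList<>();

    @Override
    public void killTask(String taskId, String agentId) {
      calls.add("KILL " + taskId);
    }
  }

  /** Versioned (v1 API) driver: uses SHUTDOWN only when the toggle is on. */
  public static class VersionedDriver implements Driver {
    public final List<String> calls = new ArrayList<>();
    private final boolean useExecutorShutdown;

    public VersionedDriver(boolean useExecutorShutdown) {
      this.useExecutorShutdown = useExecutorShutdown;
    }

    @Override
    public void killTask(String taskId, String agentId) {
      if (useExecutorShutdown) {
        // Assumes executor ID == task ID, as proposed in the thread.
        String executorId = taskId;
        calls.add("SHUTDOWN " + executorId + " on " + agentId);
      } else {
        calls.add("KILL " + taskId);
      }
    }
  }
}
```

With the toggle off, both drivers behave the same and only KILL is issued, which is the backward-compatible default discussed above.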
