> Q1: Does Aurora use COMMAND or DEFAULT executor?

Aurora is currently using neither. In Mesos terms, Thermos is a CUSTOM executor. On top of that, Aurora supports alternative custom executors [1] such as the Docker Compose executor [2].

Mesos seems to be betting on the new DEFAULT executor. It should be possible to make Thermos fit the DEFAULT executor model (as it supports task groups), but I have no real estimate of how much refactoring this would require.

> Q2: I think that this is ok as Aurora's reconciliation will still work...
> Right?

Aurora assumes a correspondence of one task per executor, so I believe this is correct.

> Q3: Does thermos executor need any changes to respond to SHUTDOWN or does it
> already handle that?

I have never tried it, but I believe it should work out of the box [3].

[1] https://github.com/apache/aurora/blob/master/docs/features/custom-executors.md
[2] https://github.com/mesos/docker-compose-executor
[3] https://github.com/apache/aurora/blob/8af269f52f162faa36cd2778979626eefcbe8181/src/main/python/apache/aurora/executor/aurora_executor.py#L301-L313

Best regards,
Stephan

On Wed, 2018-01-17 at 16:45 -0800, Mohit Jaggi wrote:
> FYI....I had a quick chat with Vinod from the Mesos team. I have some
> questions for Aurora users inline:
>
> Originally the default was the COMMAND executor. In this world the scheduler
> has no visibility into the command executor.
> More recently, we added a DEFAULT executor which is used by frameworks when
> they want to launch pod like task groups
> The SHUTDOWN executor call is only applicable if a scheduler uses CUSTOM or
> DEFAULT executor *and* uses v1 scheduler API.
>
> Q1: Does Aurora use COMMAND or DEFAULT executor?
>
> note that SHUTDOWN is not as robust as you might think :)
> for one, there is no reconciliation API for the executor state. it is very
> much best effort.
> KILL is more robust for killing tasks, because task status updates are
> reliably delivered and there is reconciliation API
>
> Q2: I think that this is ok as Aurora's reconciliation will still work as we
> don't have "executor state". "task state" will be a good and correct proxy
> for that. Aurora will send SHUTDOWN again and again until it succeeds in the
> same way as it does now with KILL. Right?
>
> Q3: Does thermos executor need any changes to respond to SHUTDOWN or does it
> already handle that?
>
> On Tue, Jan 16, 2018 at 4:48 PM, Mohit Jaggi <[email protected]> wrote:
> > So that is pretty much what I proposed...
> > If the method signature has to change, we can keep the executorId as it is,
> > unless we want to take this opportunity to clean that up. I will check if
> > the SHUTDOWN works in non-executor cases also.
> >
> > On Tue, Jan 16, 2018 at 3:03 PM, Bill Farner <[email protected]> wrote:
> > > > We still need "Agent ID" for the shutdown call.
> > >
> > > Darn. In that case, how about we change the method signature in Driver
> > > to accept agentId and ignore that param in MesosSchedulerDriver.
> > >
> > > > But do we really need the command line option?
> > >
> > > Aurora can run tasks without an executor. I'm assuming the shutdown call
> > > is incompatible with that mode.
> > >
> > > On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <[email protected]> wrote:
> > > > We still need "Agent ID" for the shutdown call.
> > > >
> > > > On Tue, Jan 16, 2018 at 1:57 PM, Mohit Jaggi <[email protected]> wrote:
> > > > > Sounds good. But do we really need the command line option? One can
> > > > > use an older Driver if KILL is preferred for some reason.
> > > > >
> > > > > On Tue, Jan 16, 2018 at 1:51 PM, Bill Farner <[email protected]> wrote:
> > > > > > This situation is much simpler if task ID == executor ID. I can't
> > > > > > come up with a good reason why this is not the case today. Our
> > > > > > executor IDs originally included static prefix, though i do not
> > > > > > recall any rationale for this. When Renan added custom executor
> > > > > > support, this static prefix was made configurable. Again, i do not
> > > > > > believe there was any rationale for the utility of executor IDs.
> > > > > >
> > > > > > I propose the following:
> > > > > > - Change relevant code in MesosTaskFactory to
> > > > > >   setExecutorId(task.getTaskId())
> > > > > > - Add a command line parameter (default false) to toggle use of
> > > > > >   executor shutdown in VersionedSchedulerDriverService.killTask
> > > > > >
> > > > > > Does anyone see an issue with this approach?
> > > > > >
> > > > > > On Tue, Jan 16, 2018 at 11:15 AM, Mohit Jaggi
> > > > > > <[email protected]> wrote:
> > > > > > > To do this in a backward compatible manner, one way is:
> > > > > > > ```
> > > > > > > void destroy(taskId, executorId, agentId) {
> > > > > > >   // Use SHUTDOWN only when the driver speaks the versioned (v1) API.
> > > > > > >   if (driver instanceof Versioned....)
> > > > > > >     ((Versioned...) driver).shutdown(executorId, agentId)
> > > > > > >   else
> > > > > > >     driver.kill(taskId)
> > > > > > > }
> > > > > > > ```
> > > > > > >
> > > > > > > Any other opinions?
> > > > > > >
> > > > > > > On Tue, Jan 16, 2018 at 11:12 AM, David McLaughlin
> > > > > > > <[email protected]> wrote:
> > > > > > > > Nope, I support getting SHUTDOWN in for users of the new API.
> > > > > > > >
> > > > > > > > On Tue, Jan 16, 2018 at 11:06 AM, Mohit Jaggi
> > > > > > > > <[email protected]> wrote:
> > > > > > > > > Are you suggesting that we delay the switch to SHUTDOWN call
> > > > > > > > > until this working group can resolve the API perf issue?
> > > > > > > > >
> > > > > > > > > On Mon, Jan 15, 2018 at 3:55 PM, David McLaughlin
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > We are working with Mesos folks to resolve it. There is a
> > > > > > > > > > Mesos performance working group that folks can join if
> > > > > > > > > > they'd like to contribute:
> > > > > > > > > > http://mesos.apache.org/blog/performance-working-group-progress-report/
> > > > > > > > > >
> > > > > > > > > > I'm not sure what you mean by branch. Everything we used to
> > > > > > > > > > scale test is on master.
> > > > > > > > > >
> > > > > > > > > > On Mon, Jan 15, 2018 at 10:08 AM, Meghdoot bhattacharya
> > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > > David, should twitter try against mesos 1.5 to see if
> > > > > > > > > > > things are better with the new api instead of libmesos.
> > > > > > > > > > > This is going to be a drift over time that will stop us
> > > > > > > > > > > from adopting new features.
> > > > > > > > > > >
> > > > > > > > > > > If it was sometime back it would be good to rerun the
> > > > > > > > > > > tests and open a ticket in Mesos if issues exist. All
> > > > > > > > > > > aurora users can then push for resolution.
> > > > > > > > > > >
> > > > > > > > > > > Also details on branch etc that has the api integration?
> > > > > > > > > > >
> > > > > > > > > > > Thx
> > > > > > > > > > >
> > > > > > > > > > > On Jan 12, 2018, at 11:39 AM, David McLaughlin
> > > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I'm not sure I agree with the summary. Bill's proposal
> > > > > > > > > > > > was using shutdown only when using the new API. I would
> > > > > > > > > > > > also support this if it's possible.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 12, 2018 at 11:14 AM, Mohit Jaggi
> > > > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > > > > Summary so far:
> > > > > > > > > > > > > - Bill supports making this change
> > > > > > > > > > > > > - This change cannot be made in a backward compatible
> > > > > > > > > > > > >   manner
> > > > > > > > > > > > > - David (Twitter) does not want to use HTTP APIs due
> > > > > > > > > > > > >   to performance concerns. I conclude that folks from
> > > > > > > > > > > > >   Twitter don't support this change
> > > > > > > > > > > > >
> > > > > > > > > > > > > Question:
> > > > > > > > > > > > > - Are there other users that want this change?
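
For reference, the dispatch discussed in this thread (the destroy() pseudocode combined with the proposed command line toggle) could be sketched in Java roughly as follows. This is only a sketch under assumptions, not Aurora's actual code: `Driver`, `VersionedDriver`, `TaskDestroyer`, `shutdownExecutor` and the two recording drivers are hypothetical names invented for illustration, and the real Aurora and Mesos interfaces may look different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a legacy driver that can only KILL a task. */
interface Driver {
  void killTask(String taskId);
}

/** Hypothetical stand-in for a v1-API driver that can also SHUTDOWN an executor. */
interface VersionedDriver extends Driver {
  void shutdownExecutor(String executorId, String agentId);
}

/** Chooses between SHUTDOWN and KILL, as proposed in the thread. */
final class TaskDestroyer {
  private final Driver driver;
  // Assumed equivalent of the proposed command line parameter (default false).
  private final boolean useExecutorShutdown;

  TaskDestroyer(Driver driver, boolean useExecutorShutdown) {
    this.driver = driver;
    this.useExecutorShutdown = useExecutorShutdown;
  }

  void destroy(String taskId, String executorId, String agentId) {
    // SHUTDOWN is used only when it is enabled and the driver supports it.
    if (useExecutorShutdown && driver instanceof VersionedDriver) {
      ((VersionedDriver) driver).shutdownExecutor(executorId, agentId);
    } else {
      // Backward compatible path: fall back to KILL.
      driver.killTask(taskId);
    }
  }
}

/** Test double that records the calls a legacy driver receives. */
final class RecordingLegacyDriver implements Driver {
  final List<String> calls = new ArrayList<>();

  @Override
  public void killTask(String taskId) {
    calls.add("KILL " + taskId);
  }
}

/** Test double that records the calls a versioned driver receives. */
final class RecordingVersionedDriver implements VersionedDriver {
  final List<String> calls = new ArrayList<>();

  @Override
  public void killTask(String taskId) {
    calls.add("KILL " + taskId);
  }

  @Override
  public void shutdownExecutor(String executorId, String agentId) {
    calls.add("SHUTDOWN " + executorId + "@" + agentId);
  }
}
```

With task ID == executor ID, as suggested above, a caller would pass the same value for both, e.g. `new TaskDestroyer(driver, true).destroy("task-1", "task-1", "agent-1")`. Leaving the toggle off keeps today's KILL behaviour even on a versioned driver.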
