David,

- LCD makes sense. Does that mean that Twitter is using the SCHEDULER_DRIVER <https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java#L72> version?
- I don't see Bill's proposal on this thread. Did I miss it?
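A rough illustration of the "lowest common denominator" point: if one driver abstraction has to be satisfiable by both the old (V0) API and the V1 HTTP API, it can only expose calls that both APIs provide, so a V1-only call such as SHUTDOWN cannot appear in it. This is a minimal, hypothetical Python sketch. The names `CommonDriver`, `V0Driver`, `V1Driver`, `kill_task`, `shutdown_executor` and `stop_task` are illustrative assumptions, not Aurora's or Mesos's actual classes or methods.

```python
from abc import ABC, abstractmethod


class CommonDriver(ABC):
    """Hypothetical shared driver interface: only calls that BOTH the
    old (V0) API and the V1 HTTP API support can live here."""

    @abstractmethod
    def kill_task(self, task_id: str, agent_id: str) -> str:
        """KILL exists in both APIs, so it belongs in the shared interface."""


class V0Driver(CommonDriver):
    """Hypothetical stand-in for a driver backed by the old API."""

    def kill_task(self, task_id: str, agent_id: str) -> str:
        return f"V0 kill {task_id} on {agent_id}"


class V1Driver(CommonDriver):
    """Hypothetical stand-in for a driver backed by the V1 HTTP API."""

    def kill_task(self, task_id: str, agent_id: str) -> str:
        return f"V1 KILL {task_id} on {agent_id}"

    def shutdown_executor(self, executor_id: str, agent_id: str) -> str:
        # SHUTDOWN is V1-only: it is reachable only by code that knows it
        # holds a V1Driver, never through the CommonDriver interface.
        return f"V1 SHUTDOWN {executor_id} on {agent_id}"


def stop_task(driver: CommonDriver, task_id: str, agent_id: str) -> str:
    # Code written against the shared interface can only issue KILL.
    return driver.kill_task(task_id, agent_id)


for driver in (V0Driver(), V1Driver()):
    print(stop_task(driver, "task-1", "agent-1"))
```

The design trade-off is the one described in the thread: keeping both backends behind one interface is what keeps SHUTDOWN out of reach until the V1 path can become the only (or default) implementation.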
Renan,

VersionedDriverFactory <https://github.com/apache/aurora/blob/2e1ca42887bc8ea1e8c6cddebe9d1cf29268c714/src/main/java/org/apache/aurora/scheduler/mesos/VersionedDriverFactory.java#L24>'s comments indicate that libmesos is still used. What am I missing?

BTW, with the patch for Thermos (from Stephan, I think), the need for switching to SHUTDOWN is reduced.

Mohit.

On Thu, Jan 11, 2018 at 2:01 PM, David McLaughlin <dmclaugh...@apache.org> wrote:

> Sorry, the other approach outlined by Bill would in theory work too, but
> it sounds like in practice it also needs more changes on the Mesos side.
>
> On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin <dmclaugh...@apache.org>
> wrote:
>
>> Right. In order to keep the current abstraction in Aurora (both APIs), we
>> obviously have to bind to the lowest common denominator API methods. So the
>> only way to integrate with shutdown will be to fix the performance issues
>> so we can switch to the new API.
>>
>> The performance issue we ran into at Twitter was that with status update
>> volumes similar to our production volume, updates started to get dropped and
>> tasks ended up being LOST and unnecessarily killed. So it's a definite
>> blocker for us to adopt in its current state. We have someone who has
>> fixing this on the Mesos side in their backlog, but it's currently not the
>> highest priority for us.
>>
>> On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle <renanidelva...@gmail.com>
>> wrote:
>>
>>> The HTTP API is what is used under the hood for V0 and V1 (instead of
>>> libmesos); I believe that's what David was referencing when he mentioned
>>> the HTTP performance issues.
>>> Here's a better explanation from the original
>>> patch submitted by Zameer:
>>> https://github.com/apache/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-75bd5a98db87502a2332e9110d2eafc6
>>>
>>> I'm not sure about the Shutdown call. As you mentioned, the versioned
>>> driver seems to have the method but the driver interface does not. This
>>> might get tricky from here on, since Mesos has V1-only calls.
>>>
>>> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi <mohit.ja...@uber.com>
>>> wrote:
>>>
>>>> Thanks Renan. I saw that code. The "Driver" interface does not have
>>>> SHUTDOWN, so it is not "compatible". I was trying to change to
>>>> VersionedSchedulerDriverService all over the code (that wreaks havoc
>>>> across the tests!) but Mesos's Java wrapper doesn't seem to have that
>>>> call either. Perhaps that is why David referred to the HTTP API.
>>>>
>>>> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <
>>>> renanidelva...@gmail.com> wrote:
>>>>
>>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java
>>>>>
>>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50
>>>>>
>>>>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <mohit.ja...@uber.com>
>>>>> wrote:
>>>>>
>>>>>> David,
>>>>>> Where can I find this code?
>>>>>>
>>>>>> Mohit.
>>>>>>
>>>>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
>>>>>> dmclaugh...@apache.org> wrote:
>>>>>>
>>>>>>> The new API is present in Aurora in a compatibility layer, but the
>>>>>>> HTTP performance issues still exist, so we can't make it the default.
>>>>>>>
>>>>>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <wfar...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>>>>>>> present.
>>>>>>>> Additionally, the SHUTDOWN call is not available in the API used
>>>>>>>> by Aurora. Last I knew, Aurora could not use the "new" API because of
>>>>>>>> performance issues in the implementation, but I do not know where that
>>>>>>>> stands today.
>>>>>>>>
>>>>>>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>>>>>>
>>>>>>>>> NOTE: This is a new call that was not present in the old API
>>>>>>>>
>>>>>>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <mohit.ja...@uber.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Folks,
>>>>>>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN
>>>>>>>>> for killing tasks. As Aurora has an executor per task, won't SHUTDOWN
>>>>>>>>> work better? It will avoid zombie executors.
>>>>>>>>>
>>>>>>>>> Mohit.
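For reference, the two calls debated in this thread target different things in the Mesos V1 scheduler HTTP API: KILL names a task, while SHUTDOWN names an executor. Per the Mesos documentation linked in Bill's message, an executor that receives a shutdown event is expected to kill all of its tasks and terminate, which is why, with one executor per task, SHUTDOWN would also clean up the executor itself. The sketch below only builds the JSON bodies, following the field layout shown in that documentation. The helper names `kill_call` and `shutdown_call` and all IDs are placeholders of my own, and nothing is sent. A real scheduler would POST these bodies to the master's `/api/v1/scheduler` endpoint along with the `Mesos-Stream-Id` header it received when it subscribed.

```python
import json


def kill_call(framework_id: str, task_id: str, agent_id: str) -> str:
    """JSON body for a V1 scheduler KILL call: targets a single task."""
    return json.dumps({
        "framework_id": {"value": framework_id},
        "type": "KILL",
        "kill": {
            "task_id": {"value": task_id},
            "agent_id": {"value": agent_id},
        },
    })


def shutdown_call(framework_id: str, executor_id: str, agent_id: str) -> str:
    """JSON body for a V1 scheduler SHUTDOWN call: targets an executor.

    SHUTDOWN was not present in the old (V0) API."""
    return json.dumps({
        "framework_id": {"value": framework_id},
        "type": "SHUTDOWN",
        "shutdown": {
            "executor_id": {"value": executor_id},
            "agent_id": {"value": agent_id},
        },
    })


# Placeholder IDs only; these bodies are printed, not sent to a master.
print(kill_call("fw-1", "task-1", "agent-1"))
print(shutdown_call("fw-1", "executor-1", "agent-1"))
```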