Sorry, I guess referring to it as the libmesos way of talking to the Mesos
master is a bit misleading.

And I stand corrected: V0 is only an adaptor to the V1 interface. It
still uses the undocumented RPC way of talking to the master (
https://github.com/apache/mesos/blob/master/src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp)
while using V1 versioned protobufs.

V1, on the other hand, talks to Mesos via a well-defined HTTP API.
There's still a dependency on libmesos because the implementation of the
code that handles the HTTP requests is made available via JNI. The big
difference is that someone else could implement their own Java-only
version of the driver, and the dependency on libmesos would be gone.
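
To make the HTTP point a bit more concrete, here is a rough sketch (in Python
for brevity) of what a client that bypasses libmesos might send for a SHUTDOWN
call. The body layout and the Mesos-Stream-Id header follow my reading of the
scheduler HTTP API docs. The helper names, the master URL, and the IDs are
made up for illustration, and none of this is Aurora or Mesos code. The
request is only built, never sent.

```python
import json
import urllib.request

# Hypothetical endpoint; a real master address would differ.
MASTER_URL = "http://mesos-master.example:5050/api/v1/scheduler"


def build_shutdown_call(framework_id, executor_id, agent_id):
    """Return the JSON-serializable body of a SHUTDOWN call (illustrative helper)."""
    return {
        "framework_id": {"value": framework_id},
        "type": "SHUTDOWN",
        "shutdown": {
            "executor_id": {"value": executor_id},
            "agent_id": {"value": agent_id},
        },
    }


def build_request(call, stream_id, url=MASTER_URL):
    """Wrap the call in an HTTP POST object without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(call).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Stream ID is handed out by the master when the scheduler subscribes.
            "Mesos-Stream-Id": stream_id,
        },
        method="POST",
    )


# Placeholder IDs, purely for illustration.
call = build_shutdown_call("framework-1", "executor-1", "agent-1")
request = build_request(call, stream_id="stream-1")
print(request.get_method(), request.full_url)
print(json.dumps(call, indent=2))
```

A Java-only driver would do the equivalent with its own HTTP client, plus the
long-lived SUBSCRIBE stream for events, which this sketch leaves out.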

Apologies for the confusion.

On Thu, Jan 11, 2018 at 2:03 PM, Mohit Jaggi <[email protected]> wrote:

> David,
> - LCD makes sense. Does that mean that Twitter is using the SCHEDULER_DRIVER
> <https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java#L72>
> version?
> - I don't see Bill's proposal on this thread. Did I miss it?
>
> Renan,
> VersionedDriverFactory
> <https://github.com/apache/aurora/blob/2e1ca42887bc8ea1e8c6cddebe9d1cf29268c714/src/main/java/org/apache/aurora/scheduler/mesos/VersionedDriverFactory.java#L24>'s
> comments indicate that libmesos is still used. What am I missing?
>
> BTW, with the patch for Thermos (from Stephan I think), the need for
> switching to SHUTDOWN is reduced.
> Mohit.
>
> On Thu, Jan 11, 2018 at 2:01 PM, David McLaughlin <[email protected]>
> wrote:
>
>> Sorry, the other approach outlined by Bill would in theory work too, but
>> it sounds like in practice it also needs more changes on the Mesos side.
>>
>> On Thu, Jan 11, 2018 at 1:55 PM, David McLaughlin <[email protected]>
>> wrote:
>>
>>> Right. In order to keep the current abstraction in Aurora (both APIs),
>>> we obviously have to bind to the lowest common denominator API methods. So
>>> the only way to integrate with shutdown will be to fix the performance
>>> issues so we can switch to the new API.
>>>
>>> The performance issue we ran into at Twitter was that with status
>>> updates that were similar to our production volume, they started to get
>>> dropped and tasks ended up being LOST and unnecessarily killed. So it's a
>>> definite blocker for us to adopt in its current state. We have someone who
>>> has fixing this on the Mesos side in their backlog, but it's currently not
>>> the highest priority for us.
>>>
>>> On Thu, Jan 11, 2018 at 1:45 PM, Renan DelValle <
>>> [email protected]> wrote:
>>>
>>>> The HTTP API is what is used under the hood for V0 and V1 (instead of
>>>> libmesos). I believe that's what David was referencing when he mentioned
>>>> the HTTP performance issues. Here's a better explanation from the original
>>>> patch submitted by Zameer:
>>>> https://github.com/apache/aurora/commit/705dbc7cd7c3ff477bcf766cdafe49a68ab47dee#diff-75bd5a98db87502a2332e9110d2eafc6
>>>>
>>>> I'm not sure about the Shutdown call. As you mentioned, the versioned
>>>> driver seems to have the method but the driver interface does not. This
>>>> might get tricky from here on in since Mesos has calls that are only
>>>> compatible with V1.
>>>>
>>>> On Thu, Jan 11, 2018 at 1:24 PM, Mohit Jaggi <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks Renan. I saw that code. "Driver" interface does not have
>>>>> SHUTDOWN...so it is not "compatible". I was trying to change to
>>>>> VersionedSchedulerDriverService all over the code (that wreaks havoc
>>>>> across the tests!) but Mesos's Java wrapper doesn't seem to have that
>>>>> call either. Perhaps, that is why David referred to the HTTP API.
>>>>>
>>>>> On Thu, Jan 11, 2018 at 1:14 PM, Renan DelValle <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java
>>>>>>
>>>>>> https://github.com/apache/aurora/blob/aae2b0dc73b7534c66982ed07b1f029150e245de/src/main/java/org/apache/aurora/scheduler/mesos/VersionedSchedulerDriverService.java#L50
>>>>>>
>>>>>> On Tue, Jan 9, 2018 at 1:21 PM, Mohit Jaggi <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> David,
>>>>>>> Where can I find this code?
>>>>>>>
>>>>>>> Mohit.
>>>>>>>
>>>>>>> On Sat, Dec 9, 2017 at 4:27 PM, David McLaughlin <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> The new API is present in Aurora in a compatibility layer, but the
>>>>>>>> HTTP performance issues still exist so we can't make it the default.
>>>>>>>>
>>>>>>>> On Sat, Dec 9, 2017 at 4:24 PM, Bill Farner <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Aurora pre-dates SHUTDOWN by several years, so the option was not
>>>>>>>>> present.  Additionally, the SHUTDOWN call is not available in the API
>>>>>>>>> used by Aurora.  Last I knew, Aurora could not use the "new" API
>>>>>>>>> because of performance issues in the implementation, but I do not
>>>>>>>>> know where that stands today.
>>>>>>>>>
>>>>>>>>> https://mesos.apache.org/documentation/latest/scheduler-http-api/#shutdown
>>>>>>>>>
>>>>>>>>>> NOTE: This is a new call that was not present in the old API
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Dec 9, 2017 at 4:11 PM, Mohit Jaggi <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Folks,
>>>>>>>>>> Our Mesos team is wondering why Aurora chose KILL over SHUTDOWN
>>>>>>>>>> for killing tasks. As Aurora has an executor per task, won't
>>>>>>>>>> SHUTDOWN work better? It will avoid zombie executors.
>>>>>>>>>>
>>>>>>>>>> Mohit.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
