I was able to figure this out in my C++ framework. In the launchTask()
callback I create a new thread (pthread_create), detach it so the executor
doesn't block waiting on it (pthread_detach), and then send a TASK_RUNNING
status update to the framework. Thanks for the explanation!


On Tue, Jul 1, 2014 at 2:51 PM, Vinod Kone <[email protected]> wrote:

> What Sharma said.
>
> Both the scheduler and executor drivers are single threaded, i.e., you will
> only get one callback at a time. IOW, until you return from one callback
> you won't get the next one.
>
>
> On Tue, Jul 1, 2014 at 10:03 AM, Sharma Podila <[email protected]>
> wrote:
>
>> Hi Asim,
>>
>> I am using (developing) a Java executor. I see a similar strategy in the
>> Mesos-Hadoop executor.
>>
>>
>> https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/MesosExecutor.java
>>
>> After the executor successfully launches the task (asynchronously), it
>> usually immediately sends a TaskState.TASK_RUNNING status update to the
>> driver. It can then return from the launchTask method, but the executor
>> process shouldn't exit; it has to keep running for at least the duration
>> of the task. Upon completion of the task, the executor must notify Mesos
>> of its completion. Mesos will report the task as lost if the executor
>> exits prematurely.
>>
>> My explanation is from my understanding of Mesos as a user and framework
>> developer. Someone from the Mesos dev team may have a better way to explain
>> this. I suspect framework callbacks, at least in the executor, aren't
>> invoked concurrently. I haven't looked into the details of why/how/etc.
>>
>>
>> On Tue, Jul 1, 2014 at 7:48 AM, Asim <[email protected]> wrote:
>>
>>> Thanks for your response!
>>>
>>> Yes, the executor (launchTask) only gets one task, which it executes
>>> synchronously to completion. Since launchTask is a callback, my intuition
>>> is that the scheduler should launch these tasks in parallel (even within a
>>> single machine) after calculating the resources required. I could create a
>>> new thread in the launchTask() callback and return immediately, but that
>>> would cause a lost slave, since the scheduler would assume the task is
>>> finished while a zombie thread is still around. Hence, I am not completely
>>> sure creating new threads will solve this issue.
>>>
>>> I am using the C++ framework. Is there an example of how this is
>>> accomplished in existing frameworks? I looked at Spark, and it does not
>>> seem to be doing anything special in its callbacks to ensure that multiple
>>> tasks on a single machine execute in parallel.
>>>
>>> Thanks,
>>> Asim
>>>
>>> On Mon, Jun 30, 2014 at 4:48 PM, Sharma Podila <[email protected]>
>>> wrote:
>>>
>>>> A likely scenario is that your executor is running the task
>>>> synchronously inside the callback to launchTask(). If you make it instead
>>>> run the task asynchronously (e.g., in a separate thread), that should
>>>> resolve it.
>>>>
>>>>
>>>> On Mon, Jun 30, 2014 at 12:48 PM, Asim <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to launch multiple tasks (t) across multiple machines (m), with
>>>>> t >> m, that can run simultaneously. Currently, I find that every
>>>>> machine processes its tasks serially, one after another.
>>>>>
>>>>> I have written a framework with a scheduler and an executor. The
>>>>> scheduler launches a task list on a bunch of machines (that show up as
>>>>> offers). When I send a task list to run with
>>>>> driver->launchTasks(offers[i].id(), tasks[i]), I find that every
>>>>> machine picks up one task at a time (and then goes to the next). This
>>>>> happens even though the offer can easily accommodate more than one task
>>>>> from the list.
>>>>>
>>>>> Is there something that I am missing?
>>>>>
>>>>> Thanks,
>>>>> Asim
>>>>>
>>>>>
>>>>
>>>
>>
>