What Sharma said.

Both the scheduler and executor drivers are single-threaded, i.e., you will
only get one callback at a time. In other words, until you return from one
callback you won't get the next one.
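
Since callbacks are serialized on one driver thread, the usual pattern is to
hand the task body to a worker thread and return from launchTask right away.
Below is a minimal, self-contained Java sketch of that pattern; it uses a
plain ExecutorService and an in-memory event log as stand-ins for the real
Mesos ExecutorDriver and its sendStatusUpdate() calls, and the names
AsyncExecutorSketch / runDemo are hypothetical, not part of the Mesos API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stand-in for the pattern discussed in this thread (NOT the real Mesos API):
// callbacks arrive one at a time on a single driver thread, so each
// "launchTask" logs RUNNING, hands the work to a worker thread, and returns
// immediately. A real executor would call ExecutorDriver.sendStatusUpdate()
// with TASK_RUNNING / TASK_FINISHED instead of appending to a log.
public class AsyncExecutorSketch {

    public static List<String> runDemo() throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService workers = Executors.newCachedThreadPool();
        CountDownLatch done = new CountDownLatch(2);

        // Two launchTask callbacks delivered back to back on one thread:
        for (String taskId : new String[] {"task-1", "task-2"}) {
            log.add(taskId + " RUNNING");       // analogous to TASK_RUNNING
            workers.submit(() -> {              // run the work off the callback thread
                try { Thread.sleep(200); }      // simulated task work
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                log.add(taskId + " FINISHED");  // analogous to TASK_FINISHED
                done.countDown();
            });
            // returning here frees the driver thread for the next callback
        }
        done.await();
        workers.shutdown();
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

Because launchTask returns immediately, both RUNNING events are logged before
either FINISHED event, i.e., the two tasks overlap instead of running one
after the other.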


On Tue, Jul 1, 2014 at 10:03 AM, Sharma Podila <[email protected]> wrote:

> Hi Asim,
>
> I am using (developing) a Java executor. I see a similar strategy in the
> Mesos-Hadoop executor.
>
>
> https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/MesosExecutor.java
>
> After the executor successfully launches the task (asynchronously), it
> usually sends a TaskState.TASK_RUNNING status update to the driver right
> away. It can then return from the launchTask method, but the executor
> process shouldn't exit; it has to keep running for at least the duration of
> the task. Upon completion of the task, the executor must notify Mesos with
> another status update (e.g., TASK_FINISHED). Mesos will report a TASK_LOST
> status if the executor exits prematurely.
>
> My explanation is from understanding Mesos as a user and framework
> developer. Someone from the Mesos dev team may have a better way to explain
> this.
> I suspect framework callbacks, at least at the executor, aren't invoked
> concurrently. I haven't looked into the details of why/how/etc.
>
>
>
>
>
> On Tue, Jul 1, 2014 at 7:48 AM, Asim <[email protected]> wrote:
>
>> Thanks for your response!
>>
>> Yes, the executor's launchTask only gets one task, which it executes
>> synchronously to completion. Since launchTask is a callback, my intuition
>> is that the scheduler should launch these tasks in parallel (even within a
>> single machine) after calculating the resources required. I can create a
>> new thread in the launchTask() callback and return immediately, but that
>> causes a lost slave, since the scheduler assumes the task is finished
>> while a zombie thread is still around. Hence, I am not completely sure
>> creating new threads will solve this issue.
>>
>> I am using the C++ framework. Is there an example of how this is
>> accomplished in current frameworks? I looked at Spark, and it does not
>> seem to be doing anything special in its callbacks to ensure that multiple
>> tasks on a single machine execute in parallel.
>>
>> Thanks,
>> Asim
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jun 30, 2014 at 4:48 PM, Sharma Podila <[email protected]>
>> wrote:
>>
>>> A likely scenario is that your executor is running the task
>>> synchronously inside the callback to launchTask(). If you make it instead
>>> run the task asynchronously (e.g., in a separate thread), that should
>>> resolve it.
>>>
>>>
>>> On Mon, Jun 30, 2014 at 12:48 PM, Asim <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to launch multiple tasks on multiple machines (t >> m) that can
>>>> run simultaneously. Currently, I find that every machine processes its
>>>> tasks serially, one after another.
>>>>
>>>> I have written a framework with a scheduler and an executor. The
>>>> scheduler launches a task list on a bunch of machines (that show up as
>>>> offers). When I send a task list to run with
>>>> driver->launchTasks(offers[i].id(), tasks[i]), I find that every machine
>>>> picks up one task at a time (and then moves on to the next). This
>>>> happens even though the offer can easily accommodate more than one task
>>>> from this task list.
>>>>
>>>> Is there something that I am missing?
>>>>
>>>> Thanks,
>>>> Asim
>>>>
>>>>
>>>
>>
>
