What Sharma said. Both the scheduler and executor drivers are single-threaded, i.e., you will only get one callback at a time. In other words, until you return from one callback you won't get the next one.
On Tue, Jul 1, 2014 at 10:03 AM, Sharma Podila <[email protected]> wrote:

> Hi Asim,
>
> I am using (developing) a Java executor. I see a similar strategy in the
> Mesos-Hadoop executor:
>
> https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/MesosExecutor.java
>
> The executor's successful (asynchronous) launch of the task is usually
> followed immediately by a TaskState.TASK_RUNNING status message to the
> driver. It can then return from the launchTask method, but the executor
> process shouldn't exit; it has to remain running for at least the
> duration of the task. Upon completion of the task, the executor must
> notify Mesos of its completion. Mesos will report a task-lost status if
> the executor exits prematurely.
>
> My explanation comes from understanding Mesos as a user and framework
> developer. Someone from the Mesos dev team may have a better way to
> explain this. I suspect framework callbacks, at least at the executor,
> aren't made concurrently. I haven't looked into the details of
> why/how/etc.
>
> On Tue, Jul 1, 2014 at 7:48 AM, Asim <[email protected]> wrote:
>
>> Thanks for your response!
>>
>> Yes, the executor (launchTask) only gets one task, which it executes
>> synchronously and finishes. Since launchTask is a callback, my intuition
>> is that the scheduler should launch these tasks in parallel (even within
>> a single machine) after calculating the resources required. I can create
>> a new thread in the launchTask() callback and return immediately, but
>> that will cause a lost slave, since the scheduler assumes it is finished
>> while there is a zombie thread still around. Hence, I am not completely
>> sure creating new threads will solve this issue.
>>
>> I am using the C++ framework. Is there an example of how this is
>> accomplished in current frameworks? I looked at Spark, and it does not
>> seem to be doing anything special in its callbacks to ensure that
>> multiple tasks on a single machine execute in parallel.
>>
>> Thanks,
>> Asim
>>
>> On Mon, Jun 30, 2014 at 4:48 PM, Sharma Podila <[email protected]>
>> wrote:
>>
>>> A likely scenario is that your executor is running the task
>>> synchronously inside the callback to launchTask(). If you instead run
>>> the task asynchronously (e.g., in a separate thread), that should
>>> resolve it.
>>>
>>> On Mon, Jun 30, 2014 at 12:48 PM, Asim <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to launch multiple tasks on multiple machines (t >> m) that
>>>> can run simultaneously. Currently, I find that every machine
>>>> processes the tasks in a serial fashion, one after another.
>>>>
>>>> I have written a framework with a scheduler and an executor. The
>>>> scheduler launches a task list on a bunch of machines (that show up
>>>> as offers). When I send a task list to run with
>>>> driver->launchTasks(offers[i].id(), tasks[i]), I find that every
>>>> machine picks up one task at a time (and then goes on to the next).
>>>> This happens even though the offer can easily accommodate more than
>>>> one task from this task list.
>>>>
>>>> Is there something that I am missing?
>>>>
>>>> Thanks,
>>>> Asim

