I was able to figure this out with my C++ framework. I created a new thread (pthread_create), detached it so the executor doesn't need to join it (pthread_detach), and then sent a TASK_RUNNING status update to the framework. Thanks for the explanation!
On Tue, Jul 1, 2014 at 2:51 PM, Vinod Kone <[email protected]> wrote:

> What Sharma said.
>
> Both the scheduler and executor drivers are single threaded, i.e., you will
> only get one callback at a time. IOW, unless you return from one callback
> you won't get the next callback.
>
> On Tue, Jul 1, 2014 at 10:03 AM, Sharma Podila <[email protected]> wrote:
>
>> Hi Asim,
>>
>> I am using (developing) a Java executor. I see a similar strategy in the
>> Mesos-Hadoop executor:
>>
>> https://github.com/mesos/hadoop/blob/master/src/main/java/org/apache/hadoop/mapred/MesosExecutor.java
>>
>> The executor's successful (asynchronous) launching of the task is usually
>> followed immediately by a TaskState.TASK_RUNNING status message to the
>> driver. It can then return from the launchTask method, but the executor
>> process shouldn't exit; it has to remain running for at least the
>> duration of the task. Upon completion of the task, the executor must
>> notify Mesos. Mesos will report a task lost status if the executor exits
>> prematurely.
>>
>> My explanation is from understanding Mesos as a user and framework
>> developer. Someone from the Mesos dev team may have a better way to
>> explain this.
>> I suspect framework callbacks, at least at the executor, aren't done
>> concurrently. I haven't looked into the details of why/how.
>>
>> On Tue, Jul 1, 2014 at 7:48 AM, Asim <[email protected]> wrote:
>>
>>> Thanks for your response!
>>>
>>> Yes, the executor (launchTask) only gets one task, which it executes
>>> synchronously and finishes. Since launchTask is a callback, my intuition
>>> is that the scheduler should launch these tasks in parallel (even within
>>> a single machine) after calculating the resources required. I can create
>>> a new thread in the launchTask() callback and return immediately, but
>>> that will cause a lost slave, since the scheduler assumes the task is
>>> finished while a zombie thread is still around. Hence, I am not
>>> completely sure creating new threads will solve this issue.
>>>
>>> I am using the C++ framework. Is there an example of how this is
>>> accomplished in current frameworks? I looked at Spark, and it does not
>>> seem to be doing anything special for its callbacks to ensure that
>>> multiple tasks on a single machine execute in parallel.
>>>
>>> Thanks,
>>> Asim
>>>
>>> On Mon, Jun 30, 2014 at 4:48 PM, Sharma Podila <[email protected]> wrote:
>>>
>>>> A likely scenario is that your executor is running the task
>>>> synchronously inside the callback to launchTask(). If you instead run
>>>> the task asynchronously (e.g., in a separate thread), that should
>>>> resolve it.
>>>>
>>>> On Mon, Jun 30, 2014 at 12:48 PM, Asim <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to launch multiple tasks on multiple machines (t >> m) that can
>>>>> run simultaneously. Currently, I find that every machine processes the
>>>>> tasks serially, one after another.
>>>>>
>>>>> I have written a framework with a scheduler and an executor. The
>>>>> scheduler launches a task list on a bunch of machines (that show up as
>>>>> offers). When I send a task list to run with
>>>>> driver->launchTasks(offers[i].id(), tasks[i]), I find that every
>>>>> machine picks up one task at a time (and then goes to the next). This
>>>>> happens even though the offer can easily accommodate more than one
>>>>> task from this task list.
>>>>>
>>>>> Is there something that I am missing?
>>>>>
>>>>> Thanks,
>>>>> Asim

