I've filed a ticket to immediately re-offer recovered resources from terminal tasks / executors:
https://issues.apache.org/jira/browse/MESOS-3078

On Fri, Jul 17, 2015 at 2:24 PM, Philip Weaver <philip.wea...@gmail.com> wrote:

> Your advice worked and made a huge difference. With
> allocation_interval=50ms, the 1000 tasks now execute in 21s instead of
> 120s. Thanks.
>
> On Fri, Jul 17, 2015 at 2:20 PM, Philip Weaver <philip.wea...@gmail.com> wrote:
>
>> Ok, thanks!
>>
>> On Fri, Jul 17, 2015 at 2:18 PM, Alexander Gallego <agall...@concord.io> wrote:
>>
>>> I use a similar pattern.
>>>
>>> I have my own scheduler, as you have. I deploy my own executor, which
>>> downloads a tar from some storage and effectively `execvp(...)` a
>>> process. It monitors the child process and reports the child pid's
>>> exit status.
>>>
>>> Check out the Marathon code if you are writing in Scala. It is an
>>> excellent example of both scheduler and executor templates.
>>>
>>> -ag
>>>
>>> On Fri, Jul 17, 2015 at 5:06 PM, Philip Weaver <philip.wea...@gmail.com> wrote:
>>>
>>>> Awesome, I suspected that was the case, but hadn't discovered the
>>>> --allocation_interval flag, so I will use that.
>>>>
>>>> I installed from the Mesosphere RPMs and didn't change any flags from
>>>> there. I will try to find some logs that provide some insight into
>>>> the execution times.
>>>>
>>>> I am using a command task. I haven't looked into executors yet; I had
>>>> a hard time finding examples in my language (Scala).
>>>>
>>>> On Fri, Jul 17, 2015 at 2:00 PM, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
>>>>
>>>>> One other thing: do you use an executor to run many tasks? Or are
>>>>> you using a command task?
>>>>>
>>>>> On Fri, Jul 17, 2015 at 1:54 PM, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
>>>>>
>>>>>> Currently, recovered resources are not immediately re-offered, as
>>>>>> you noticed, and the default allocation interval is 1 second. I'd
>>>>>> recommend lowering that (e.g.
>>>>>> --allocation_interval=50ms); that should improve the second bullet
>>>>>> you listed. Although, in your case it would be better to
>>>>>> immediately re-offer recovered resources (feel free to file a
>>>>>> ticket for supporting that).
>>>>>>
>>>>>> For the first bullet, mind providing some more information? E.g.
>>>>>> master flags, slave flags, scheduler logs, master logs, slave logs,
>>>>>> executor logs? We would need to trace through a task launch to see
>>>>>> where the latency is being introduced.
>>>>>>
>>>>>> On Fri, Jul 17, 2015 at 12:26 PM, Philip Weaver <philip.wea...@gmail.com> wrote:
>>>>>>
>>>>>>> I'm trying to understand the behavior of Mesos: whether what I am
>>>>>>> observing is typical or I'm doing something wrong, and what
>>>>>>> options I have for improving the performance of how offers are
>>>>>>> made and how tasks are executed for my particular use case.
>>>>>>>
>>>>>>> I have written a Scheduler that has a queue of very small tasks
>>>>>>> (for testing, they are "echo hello world", but in production many
>>>>>>> of them won't be much more expensive than that). Each task is
>>>>>>> configured to use 1 cpu resource. When resourceOffers is called, I
>>>>>>> launch as many tasks as I can in the given offers; that is, one
>>>>>>> call to driver.launchTasks for each offer, with a list of tasks
>>>>>>> that has one task for each cpu in that offer.
>>>>>>>
>>>>>>> On a cluster of 3 nodes with 4 cores each (12 total cores), it
>>>>>>> takes 120s to execute 1000 tasks out of the queue. We are
>>>>>>> evaluating Mesos because we want to use it to replace our current
>>>>>>> homegrown cluster controller, which can execute 1000 tasks in far
>>>>>>> less than 120s.
>>>>>>>
>>>>>>> I am seeing two things that concern me:
>>>>>>>
>>>>>>> - The time between driver.launchTasks and receiving a callback to
>>>>>>>   statusUpdate when the task completes is typically 200-500ms, and
>>>>>>>   sometimes even as high as 1000-2000ms.
>>>>>>> - The time between when a task completes and when I get an offer
>>>>>>>   for the newly freed resource is another 500ms or so.
>>>>>>>
>>>>>>> These latencies explain why I can only execute tasks at a rate of
>>>>>>> about 8/s.
>>>>>>>
>>>>>>> It looks like my offers always include all 4 cores on each
>>>>>>> machine, which would indicate that Mesos doesn't like to send an
>>>>>>> offer as soon as a single resource is available, and prefers to
>>>>>>> delay and send an offer with more resources in it. Is this true?
>>>>>>>
>>>>>>> Thanks in advance for any advice you can offer!
>>>>>>>
>>>>>>> - Philip
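For readers following the thread: the per-offer launch logic Philip describes (one call to driver.launchTasks per offer, with one task per CPU in that offer) can be sketched as plain list bookkeeping. This is illustrative Python, not the Mesos API; the function name and list-based queue are made up for the example:

```python
def tasks_for_offer(offer_cpus, task_queue, cpus_per_task=1):
    """Take as many queued tasks as the offer's CPUs allow
    (one task per CPU, as described in the thread), returning
    the tasks to launch now and the tasks still queued."""
    n = min(int(offer_cpus // cpus_per_task), len(task_queue))
    return task_queue[:n], task_queue[n:]

# An offer with 4 cpus drains 4 tasks from a 10-task queue:
launch, remaining = tasks_for_offer(4, ["t%d" % i for i in range(10)])
print(launch)          # ['t0', 't1', 't2', 't3']
print(len(remaining))  # 6
```

With 12 cores total and roughly 0.5-1.5s between launch and the next usable offer, this packing alone cannot exceed the ~8 tasks/s Philip measured, which is why the allocation interval dominates here.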
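Alexander's custom-executor pattern earlier in the thread (exec a downloaded binary, monitor the child, report its exit status) reduces to something like the following sketch. It is illustrative Python rather than a real Mesos executor, which would deliver the result to the scheduler as a status update (e.g. TASK_FINISHED / TASK_FAILED) instead of returning a string:

```python
import subprocess

def run_and_report(command):
    """Launch the task's command as a child process, wait for it,
    and map the exit status to a task state. A real executor would
    send this state back to the scheduler via a status update."""
    exit_code = subprocess.call(command)
    return "TASK_FINISHED" if exit_code == 0 else "TASK_FAILED"

print(run_and_report(["echo", "hello world"]))  # prints "hello world", then TASK_FINISHED
```

The fetch-a-tar-then-exec step is omitted; the essential loop is just launch, wait, and report the child's exit status.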