Your advice worked and made a huge difference. With allocation_interval=50ms, the 1000 tasks now execute in 21s instead of 120s. Thanks.
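
For anyone finding this thread in the archive, here is a rough sketch of what those two numbers mean per core. It is plain arithmetic on the figures quoted in this thread (1000 tasks, 3 nodes with 4 cores each, 120s before and 21s after) and does not call Mesos. The helper names are made up for illustration, and the per-slot figure assumes all 12 cores were kept equally busy.

```python
# Back-of-the-envelope numbers taken from this thread; nothing here talks to Mesos.
TASKS = 1000
SLOTS = 3 * 4  # 3 nodes x 4 cores, each task uses 1 cpu


def throughput(total_seconds: float) -> float:
    """Tasks completed per second over the whole run."""
    return TASKS / total_seconds


def cycle_time(total_seconds: float) -> float:
    """Average seconds one slot spends per task (launch, run, wait for re-offer).

    Assumes every slot is kept equally busy for the whole run.
    """
    return SLOTS / throughput(total_seconds)


if __name__ == "__main__":
    runs = [("default 1s interval", 120.0), ("allocation_interval=50ms", 21.0)]
    for label, seconds in runs:
        print(f"{label}: {throughput(seconds):.1f} tasks/s, "
              f"{cycle_time(seconds):.2f} s per task per slot")
    # prints:
    # default 1s interval: 8.3 tasks/s, 1.44 s per task per slot
    # allocation_interval=50ms: 47.6 tasks/s, 0.25 s per task per slot
```

So each core went from roughly 1.4s per "echo hello world" to roughly 0.25s, which suggests most of the original time was spent waiting rather than running.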
On Fri, Jul 17, 2015 at 2:20 PM, Philip Weaver <philip.wea...@gmail.com> wrote:

> Ok, thanks!
>
> On Fri, Jul 17, 2015 at 2:18 PM, Alexander Gallego <agall...@concord.io>
> wrote:
>
>> I use a similar pattern.
>>
>> I have my own scheduler as you have. I deploy my own executor which
>> downloads a tar from some storage and effectively `execvp(...)` a
>> proc. It monitors the child proc and reports status of child pid exit
>> status.
>>
>> Check out the Marathon code if you are writing in scala. It is an
>> excellent example for both scheduler and executor templates.
>>
>> -ag
>>
>> On Fri, Jul 17, 2015 at 5:06 PM, Philip Weaver <philip.wea...@gmail.com>
>> wrote:
>>
>>> Awesome, I suspected that was the case, but hadn't discovered the
>>> --allocation_interval flag, so I will use that.
>>>
>>> I installed from the mesosphere RPMs and didn't change any flags from
>>> there. I will try to find some logs that provide some insight into the
>>> execution times.
>>>
>>> I am using a command task. I haven't looked into executors yet; I had a
>>> hard time finding some examples in my language (Scala).
>>>
>>> On Fri, Jul 17, 2015 at 2:00 PM, Benjamin Mahler <
>>> benjamin.mah...@gmail.com> wrote:
>>>
>>>> One other thing, do you use an executor to run many tasks? Or are you
>>>> using a command task?
>>>>
>>>> On Fri, Jul 17, 2015 at 1:54 PM, Benjamin Mahler <
>>>> benjamin.mah...@gmail.com> wrote:
>>>>
>>>>> Currently, recovered resources are not immediately re-offered as you
>>>>> noticed, and the default allocation interval is 1 second. I'd recommend
>>>>> lowering that (e.g. --allocation_interval=50ms), that should improve the
>>>>> second bullet you listed. Although, in your case it would be better to
>>>>> immediately re-offer recovered resources (feel free to file a ticket for
>>>>> supporting that).
>>>>>
>>>>> For the first bullet, mind providing some more information? E.g.
>>>>> master flags, slave flags, scheduler logs, master logs, slave logs,
>>>>> executor logs? We would need to trace through a task launch to see where
>>>>> the latency is being introduced.
>>>>>
>>>>> On Fri, Jul 17, 2015 at 12:26 PM, Philip Weaver <
>>>>> philip.wea...@gmail.com> wrote:
>>>>>
>>>>>> I'm trying to understand the behavior of mesos, and if what I am
>>>>>> observing is typical or if I'm doing something wrong, and what options I
>>>>>> have for improving the performance of how offers are made and how tasks
>>>>>> are executed for my particular use case.
>>>>>>
>>>>>> I have written a Scheduler that has a queue of very small tasks (for
>>>>>> testing, they are "echo hello world", but in production many of them
>>>>>> won't be much more expensive than that). Each task is configured to use
>>>>>> 1 cpu resource. When resourceOffers is called, I launch as many tasks as
>>>>>> I can in the given offers; that is, one call to driver.launchTasks for
>>>>>> each offer, with a list of tasks that has one task for each cpu in that
>>>>>> offer.
>>>>>>
>>>>>> On a cluster of 3 nodes and 4 cores each (12 total cores), it takes
>>>>>> 120s to execute 1000 tasks out of the queue. We are evaluating mesos
>>>>>> because we want to use it to replace our current homegrown cluster
>>>>>> controller, which can execute 1000 tasks in way less than 120s.
>>>>>>
>>>>>> I am seeing two things that concern me:
>>>>>>
>>>>>> - The time between driver.launchTasks and receiving a callback to
>>>>>> statusUpdate when the task completes is typically 200-500ms, and
>>>>>> sometimes even as high as 1000-2000ms.
>>>>>> - The time between when a task completes and when I get an offer
>>>>>> for the newly freed resource is another 500ms or so.
>>>>>>
>>>>>> These latencies explain why I can only execute tasks at a rate of
>>>>>> about 8/s.
>>>>>>
>>>>>> It looks like my offers always include all 4 cores on each machine,
>>>>>> which would indicate that mesos doesn't like to send an offer as soon as
>>>>>> a single resource is available, and prefers to delay and send an offer
>>>>>> with more resources in it. Is this true?
>>>>>>
>>>>>> Thanks in advance for any advice you can offer!
>>>>>>
>>>>>> - Phllip
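
On the second bullet in the original message (about 500ms between a task finishing and the freed cpu being offered again), the figure is roughly what you would expect from a 1 second allocation interval. The sketch below is only a model, not Mesos code. It assumes recovered resources are re-offered only on a fixed periodic tick, as described earlier in the thread, and that tasks finish at times spread evenly between ticks. The function name and sampling approach are illustrative.

```python
# Model only: assumes re-offers happen on a fixed tick and that task
# completions are spread evenly across the interval between two ticks.
def mean_reoffer_wait(allocation_interval_s: float, samples: int = 10_000) -> float:
    """Average seconds from a task finishing to the next allocation tick."""
    total = 0.0
    for i in range(samples):
        # Completion times evenly spaced through one interval (midpoint rule).
        finished_at = (i + 0.5) / samples * allocation_interval_s
        total += allocation_interval_s - finished_at
    return total / samples


if __name__ == "__main__":
    for interval in (1.0, 0.05):
        wait_ms = mean_reoffer_wait(interval) * 1000
        print(f"interval {interval}s -> mean wait {wait_ms:.0f} ms")
    # prints:
    # interval 1.0s -> mean wait 500 ms
    # interval 0.05s -> mean wait 25 ms
```

Under these assumptions the average wait is half the interval, so going from 1s to 50ms removes most of the re-offer delay. It does nothing for the launchTasks-to-statusUpdate latency in the first bullet.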