Re: High latency when scheduling and executing many tiny tasks.

Alexander Gallego Fri, 17 Jul 2015 14:25:49 -0700

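I take back what I said about the executor in Scala. I just looked at the
source, and both PathExecutor and CommandExecutor hand Mesos a CommandInfo
built by TaskBuilder.commandInfo (set directly on the task, or on an
ExecutorInfo in the PathExecutor case):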



    executor match {
      case CommandExecutor() =>
        // Command task: the CommandInfo is set directly on the TaskInfo.
        builder.setCommand(
          TaskBuilder.commandInfo(app, Some(taskId), host, ports, envPrefix))
        containerProto.foreach(builder.setContainer)

      case PathExecutor(path) =>
        val executorId = f"marathon-${taskId.getValue}" // Fresh executor
        val executorPath = s"'$path'" // TODO: Really escape this.
        val cmd = app.cmd orElse app.args.map(_ mkString " ") getOrElse ""
        // Make the executor executable, then exec it with the app's command.
        val shell = s"chmod ug+rx $executorPath && exec $executorPath $cmd"
        val command = TaskBuilder.commandInfo(app, Some(taskId), host, ports, envPrefix)
          .toBuilder.setValue(shell)

        // Custom executor: the CommandInfo goes on an ExecutorInfo, and the
        // app definition is serialized and sent along as the task data.
        val info = ExecutorInfo.newBuilder()
          .setExecutorId(ExecutorID.newBuilder().setValue(executorId))
          .setCommand(command)
        containerProto.foreach(info.setContainer)
        builder.setExecutor(info)
        val binary = new ByteArrayOutputStream()
        mapper.writeValue(binary, app)
        builder.setData(ByteString.copyFrom(binary.toByteArray))
    }
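The TODO in the PathExecutor branch is about quoting: wrapping the path in
single quotes breaks as soon as the path itself contains a single quote. A
minimal sketch of POSIX sh single-quote escaping, written in C++ like the
Mesos excerpt in this message; `shell_quote` and `executor_shell_command`
are hypothetical helper names, not part of Marathon or Mesos.

```cpp
#include <string>

// Hypothetical helper: wrap an arbitrary string in single quotes for POSIX sh.
// Inside single quotes nothing is special except the quote itself, so each
// embedded ' is emitted as '\'' (close quote, escaped quote, reopen quote).
std::string shell_quote(const std::string& s) {
  std::string out = "'";
  for (char c : s) {
    if (c == '\'') {
      out += "'\\''";
    } else {
      out += c;
    }
  }
  out += "'";
  return out;
}

// Hypothetical helper: same shape of command line as the PathExecutor branch,
// i.e. make the executor executable, then exec it with the app's command.
// `cmd` is appended unquoted, as in the Scala code, because the shell is
// meant to parse it.
std::string executor_shell_command(const std::string& path,
                                   const std::string& cmd) {
  const std::string quoted = shell_quote(path);
  return "chmod ug+rx " + quoted + " && exec " + quoted + " " + cmd;
}
```

For example, a path of /opt/it's/exec becomes '/opt/it'\''s/exec', which the
shell reads back as the original path.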


The execvp pattern is still what I use, and in fact it is what Mesos itself
uses:

        if (task.command().shell()) {
          execl(
              "/bin/sh",
              "sh",
              "-c",
              task.command().value().c_str(),
              (char*) NULL);
        } else {
          execvp(task.command().value().c_str(), argv);
        }
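For reference, a minimal sketch of the launch-and-monitor loop described in
this thread: fork, exec the command through one of the two branches of the
Mesos excerpt above, wait for the child, and recover its exit status.
`run_and_wait` is a hypothetical name; sending the status update back to
Mesos is left out, and error handling is kept to a minimum.

```cpp
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper mirroring the two branches of the Mesos excerpt:
//   shell == true  -> /bin/sh -c <value>
//   shell == false -> execvp(value, args), where args is the full argv
//                     (args[0] is conventionally the program name).
// Returns the child's exit status, 127 if exec failed in the child,
// or -1 if fork/waitpid failed or the child was killed by a signal.
int run_and_wait(bool shell, const std::string& value,
                 const std::vector<std::string>& args) {
  // Build argv before forking so the child only calls exec or _exit.
  std::vector<char*> argv;
  for (const std::string& a : args) {
    argv.push_back(const_cast<char*>(a.c_str()));
  }
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    if (shell) {
      execl("/bin/sh", "sh", "-c", value.c_str(), static_cast<char*>(nullptr));
    } else {
      execvp(value.c_str(), argv.data());
    }
    _exit(127);  // Only reached if exec failed.
  }

  // Parent: wait for the child, retrying if interrupted by a signal.
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) {
      return -1;
    }
  }
  if (WIFEXITED(status)) {
    return WEXITSTATUS(status);
  }
  return -1;  // Terminated by a signal (see WIFSIGNALED / WTERMSIG).
}
```

A common policy (assumed here, not shown in the excerpt) is for the executor
to report TASK_FINISHED when this returns 0 and TASK_FAILED otherwise.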


Sorry for the misinformation about the executor in Marathon.



On Fri, Jul 17, 2015 at 5:20 PM, Philip Weaver wrote:

> Ok, thanks!
>
> On Fri, Jul 17, 2015 at 2:18 PM, Alexander Gallego wrote:
>
>> I use a similar pattern.
>>
>> I have my own scheduler, as you do. I deploy my own executor, which
>> downloads a tarball from some storage and effectively launches a process
>> with execvp(...). It monitors the child process and reports the child's
>> exit status.
>>
>> Check out the Marathon code if you are writing in Scala. It is an
>> excellent example for both scheduler and executor templates.
>>
>> -ag
>>
>> On Fri, Jul 17, 2015 at 5:06 PM, Philip Weaver wrote:
>>
>>> Awesome, I suspected that was the case, but hadn't discovered the
>>> --allocation_interval flag, so I will use that.
>>>
>>> I installed from the Mesosphere RPMs and didn't change any flags from
>>> there. I will try to find some logs that provide some insight into the
>>> execution times.
>>>
>>> I am using a command task. I haven't looked into executors yet; I had a
>>> hard time finding some examples in my language (Scala).
>>>
>>> On Fri, Jul 17, 2015 at 2:00 PM, Benjamin Mahler wrote:
>>>
>>>> One other thing, do you use an executor to run many tasks? Or are you
>>>> using a command task?
>>>>
>>>> On Fri, Jul 17, 2015 at 1:54 PM, Benjamin Mahler wrote:
>>>>
>>>>> Currently, as you noticed, recovered resources are not immediately
>>>>> re-offered, and the default allocation interval is 1 second. I'd
>>>>> recommend lowering that (e.g. --allocation_interval=50ms); that should
>>>>> improve the second bullet you listed. That said, in your case it would be
>>>>> better to immediately re-offer recovered resources (feel free to file a
>>>>> ticket for supporting that).
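As a rough illustration of why the allocation interval matters here, consider
a simplified model (an assumption, not a description of the allocator's
internals): a freed CPU is only re-offered at the next batch allocation, and
batches run every `interval_ms`. Averaged over free times spread evenly across
one interval, the extra wait is half the interval: roughly 500ms at the
1 second default, which is in line with the ~500ms re-offer delay Philip
reports, and roughly 25ms at 50ms. The function names are hypothetical.

```cpp
#include <cmath>

// Simplified model: allocation ticks happen at 0, interval, 2*interval, ...
// A resource freed at `freed_ms` waits until the next tick to be re-offered.
double wait_until_next_tick(double freed_ms, double interval_ms) {
  double next_tick = std::ceil(freed_ms / interval_ms) * interval_ms;
  return next_tick - freed_ms;
}

// Average wait over `samples` free times placed at the midpoints of equal
// slices of one interval; under this model the result is interval_ms / 2.
double average_wait(double interval_ms, int samples) {
  double total = 0.0;
  for (int i = 0; i < samples; ++i) {
    double freed_ms = interval_ms * (i + 0.5) / samples;
    total += wait_until_next_tick(freed_ms, interval_ms);
  }
  return total / samples;
}
```

Real completions are not evenly spread and the allocator does other work per
batch, so this only bounds the expectation, not the tail.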
>>>>>
>>>>> For the first bullet, mind providing some more information? E.g.
>>>>> master flags, slave flags, scheduler logs, master logs, slave logs,
>>>>> executor logs? We would need to trace through a task launch to see where
>>>>> the latency is being introduced.
>>>>>
>>>>> On Fri, Jul 17, 2015 at 12:26 PM, Philip Weaver wrote:
>>>>>
>>>>>> I'm trying to understand the behavior of Mesos: whether what I am
>>>>>> observing is typical or I'm doing something wrong, and what options I
>>>>>> have for improving how offers are made and how tasks are executed for
>>>>>> my particular use case.
>>>>>>
>>>>>> I have written a Scheduler that has a queue of very small tasks (for
>>>>>> testing, they are "echo hello world", but in production many of them
>>>>>> won't be much more expensive than that). Each task is configured to use
>>>>>> 1 cpu resource. When resourceOffers is called, I launch as many tasks as
>>>>>> I can in the given offers; that is, one call to driver.launchTasks for
>>>>>> each offer, with a list of tasks that has one task for each cpu in that
>>>>>> offer.
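The launch strategy described here (one launchTasks call per offer, one task
per CPU until the queue is empty) can be sketched as a pure planning
function. `Offer` is a hypothetical stand-in carrying only a CPU count, not
the Mesos protobuf, and `plan_launches` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical, simplified offer: only the fields this sketch needs.
struct Offer {
  std::string id;
  double cpus;
};

// Returns, for each offer in order, how many queued tasks to launch in it:
// floor(cpus / cpus_per_task) tasks, capped by what is left in the queue.
std::vector<int> plan_launches(const std::vector<Offer>& offers, int queued,
                               double cpus_per_task) {
  std::vector<int> counts;
  counts.reserve(offers.size());
  for (const Offer& offer : offers) {
    // Small epsilon guards against values like 3.9999999 from scalar math.
    int fit = static_cast<int>(std::floor(offer.cpus / cpus_per_task + 1e-9));
    int n = std::max(0, std::min(fit, queued));
    counts.push_back(n);
    queued -= n;
  }
  return counts;
}
```

With three 4-CPU offers and 1000 queued tasks this yields {4, 4, 4}. An offer
that ends up with 0 tasks would normally be declined
(SchedulerDriver::declineOffer) rather than held.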
>>>>>>
>>>>>> On a cluster of 3 nodes with 4 cores each (12 cores total), it takes
>>>>>> 120s to execute 1000 tasks from the queue. We are evaluating Mesos
>>>>>> because we want to use it to replace our current homegrown cluster
>>>>>> controller, which can execute 1000 tasks in well under 120s.
>>>>>>
>>>>>> I am seeing two things that concern me:
>>>>>>
>>>>>>    - The time between driver.launchTasks and receiving a callback to
>>>>>>      statusUpdate when the task completes is typically 200-500ms, and
>>>>>>      sometimes even as high as 1000-2000ms.
>>>>>>    - The time between when a task completes and when I get an offer
>>>>>>      for the newly freed resource is another 500ms or so.
>>>>>>
>>>>>> These latencies explain why I can only execute tasks at a rate of
>>>>>> about 8/s.
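The arithmetic behind that rate can be made explicit with a back-of-envelope
model (an assumption: every core runs one task per cycle, where a cycle is
the launch-to-status latency plus the wait for the freed CPU to be
re-offered). 1000 tasks in 120s is about 8.3 tasks/s, which across 12 cores
implies an average cycle of about 1.44s per task per core. That is longer
than the typical 200-500ms plus ~500ms listed above, so presumably the slower
1000-2000ms launches account for part of the difference. The function names
are hypothetical.

```cpp
// Back-of-envelope model (an assumption, not measured Mesos behavior):
// each of `cores` CPUs completes one task every `cycle_ms` milliseconds.
double tasks_per_second(int cores, double cycle_ms) {
  return cores * 1000.0 / cycle_ms;
}

// Inverse: the average per-core cycle implied by `tasks` finishing in
// `seconds` on `cores` CPUs.
double implied_cycle_ms(int cores, int tasks, double seconds) {
  return cores * seconds * 1000.0 / tasks;
}
```

For instance, implied_cycle_ms(12, 1000, 120.0) gives 1440ms, while
tasks_per_second(12, 700.0) (200ms launch plus 500ms re-offer) gives about
17 tasks/s.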
>>>>>>
>>>>>> It looks like my offers always include all 4 cores on each machine,
>>>>>> which would indicate that Mesos doesn't like to send an offer as soon as
>>>>>> a single resource is available, and prefers to delay and send an offer
>>>>>> with more resources in it. Is this true?
>>>>>>
>>>>>> Thanks in advance for any advice you can offer!
>>>>>>
>>>>>> - Philip
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>


-- 

Sincerely,
Alexander Gallego
Co Founder & CTO
