Thanks Tomas.

We're still quite far from the 10k-20k machine limit :-)

Currently, our framework scheduler generates millions of mostly small
tasks (some run for ~100ms, others for a few seconds).
I understand that the network is the main bottleneck, but we sometimes
experience lost tasks, and I sometimes see master logs indicating that the
master is unable to talk to the ZooKeeper service (which runs on the same
host). I was wondering whether this is related to the CPU/RAM of the master
machine.
Is 1 CPU enough? 2? 4?
Is 1GiB of RAM enough? 4GiB? 8GiB?
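
For a rough sense of the load this implies, here is a back-of-envelope
sketch of the task launch rate needed to keep the cluster busy. The numbers
are illustrative assumptions, not measurements: 1000 CPUs, one task per CPU
at a time, and a hypothetical helper name (required_launch_rate).

```python
def required_launch_rate(cpus: int, task_seconds: float) -> float:
    """Tasks per second needed to keep every CPU busy.

    Assumes one task occupies one CPU for task_seconds and ignores
    scheduling and network latency (a simplification).
    """
    if task_seconds <= 0:
        raise ValueError("task_seconds must be positive")
    return cpus / task_seconds


if __name__ == "__main__":
    # Illustrative durations: ~100ms tasks up to tasks of a few seconds.
    for duration in (0.1, 1.0, 5.0):
        rate = required_launch_rate(1000, duration)
        print(f"{duration}s tasks on 1000 CPUs: {rate:,.0f} launches/s")
```

Under these assumptions the required rate grows in inverse proportion to
task duration, so the ~100ms tasks dominate whatever the master and
scheduler have to process.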

On Thu, Jan 8, 2015 at 5:00 PM, Tomas Barton <[email protected]> wrote:

> Hi Itamar,
>
> there is definitely a limit to the number of machines a Mesos master can
> handle. That limit is between 10,000 and 20,000 (the number reported by
> Twitter). The bottleneck is the event loop that handles communication
> on the master.
>
> With hundreds of machines you should be fine. You might run into
> problems only if your framework scheduler demands too many resources
> for computing allocations.
>
>> How does the strength of the master & scheduler machines affect the
>> overall cluster performance?
>
>
> I would say that the network is usually the main bottleneck. Adding extra
> RAM won't improve mesos-master performance. Of course, if there's high
> CPU load on the master you might observe a performance regression. This
> also depends on the granularity of your tasks: whether you have a few
> long-running tasks or many short tasks (which run for just hundreds of
> ms).
>
> Tomas
>
>
> On 6 January 2015 at 10:12, Itamar Ostricher <[email protected]> wrote:
>
>> Are there recommendations regarding master / scheduler machine resources
>> as a function of cluster size?
>>
>> Say I have a cluster with hundreds of slave machines and thousands of
>> CPUs, with a single framework that will schedule millions of tasks.
>> How does the strength of the master & scheduler machines affect the
>> overall cluster performance?
>>
>> Thanks,
>> - Itamar.
>>
>
>
