Re: Recommended resources for master / scheduler machines

Tomas Barton Thu, 08 Jan 2015 11:34:23 -0800

Is ZooKeeper running in distributed mode?

ZooKeeper periodically writes all data to disk (transaction log), so the
bottleneck could be ZooKeeper's disk I/O rather than a lack of CPUs.
ZooKeeper limits each key (znode) to 1MB; typically 512MB of RAM should
be enough for ZooKeeper (though even 4GB might not be enough, depending
on your use case).
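
One quick way to answer both questions (distributed mode, and whether
ZooKeeper itself is slow) is to ask the server with the "mntr"
four-letter command. Below is a rough Python sketch, not a finished tool.
The assumptions are mine: ZooKeeper 3.4 or newer, the default client port
2181 on localhost, and (on newer releases) "mntr" being enabled in the
four-letter-word whitelist. The helper names are made up for the example.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command (e.g. 'mntr') and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn the tab-separated 'key<TAB>value' lines of mntr into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def summarize(stats):
    """Pick out the fields relevant to this thread."""
    return {
        # 'standalone' means a single server, i.e. not distributed mode
        "distributed": stats.get("zk_server_state") in ("leader", "follower", "observer"),
        "avg_latency_ms": float(stats.get("zk_avg_latency", "0")),
        "outstanding_requests": int(stats.get("zk_outstanding_requests", "0")),
    }


# Example use against a local server (host and port are assumptions):
#   print(summarize(parse_mntr(four_letter_word("localhost", 2181, "mntr"))))
```

If the average latency or the number of outstanding requests climbs while
the master reports ZooKeeper trouble, that points at ZooKeeper (usually
its disk) rather than at the master's CPU.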


From the ZooKeeper docs:

ZooKeeper's transaction log must be on a dedicated device. (A dedicated
partition is not enough.) ZooKeeper writes the log sequentially, without
seeking. Sharing your log device with other processes can cause seeks and
contention, which in turn can cause multi-second delays.

 In particular, you should not create a situation in which ZooKeeper swaps
to disk. The disk is death to ZooKeeper. Everything is ordered, so if
processing one request swaps the disk, all other queued requests will
probably do the same. DON'T SWAP.
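
Both points from the docs can be checked roughly on the host itself.
Here is a small Python sketch under my own assumptions: a Linux machine
(it reads /proc/meminfo), and example paths that you would replace with
your actual dataDir and dataLogDir. The function names are made up for
illustration. Note the limits: swap usage is system-wide, not specific to
ZooKeeper, and two different device IDs only show different filesystems,
which may still be partitions of the same physical disk.

```python
import os


def swap_used_kb(meminfo_text):
    """Return SwapTotal - SwapFree (in kB) from /proc/meminfo contents."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # first token is the number
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def same_device(path_a, path_b):
    """True if both paths live on the same filesystem device (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Example use (the paths are hypothetical, adjust to your zoo.cfg):
#   with open("/proc/meminfo") as f:
#       print("swap in use (kB):", swap_used_kb(f.read()))
#   print("log shares device with snapshots:",
#         same_device("/var/lib/zookeeper/data", "/var/lib/zookeeper/log"))
```

If same_device() returns True, the transaction log is definitely sharing
a device. If it returns False, you still need to confirm that the two
filesystems sit on separate physical disks.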


On 8 January 2015 at 16:47, Itamar Ostricher <ita...@yowza3d.com> wrote:

> Thanks Tomas.
>
> We're still quite far from the 10k-20k machines limit :-)
>
> Currently, our framework scheduler generates many (millions of) mostly
> small tasks (some around ~100ms, some lasting a few seconds).
> I understand that the network is the main bottleneck, but we sometimes
> experience lost tasks, and sometimes I see master logs indicating that the
> master is unable to talk with the zookeeper service (which is on the same
> host), and I was wondering if it's related to CPU/RAM of the master machine.
> Is 1 CPU enough? 2? 4?
> 1GiB RAM? 4? 8?
>
> On Thu, Jan 8, 2015 at 5:00 PM, Tomas Barton <barton.to...@gmail.com>
> wrote:
>
>> Hi Itamar,
>>
>> there's definitely a limit on the number of machines a Mesos master can
>> handle. This limit is between 10 000 and 20 000 (that's the number
>> reported by Twitter). The bottleneck is caused by the event loop which
>> handles communication at the master.
>>
>> With hundreds of machines you should be fine. You might encounter
>> problems only if your framework scheduler demands too many resources
>> for computing allocations.
>>
>>> How does the strength of the master & scheduler machines affect the
>>> overall cluster performance?
>>
>>
>> I would say that the network is usually the main bottleneck. Adding extra
>> RAM won't improve mesos-master
>> performance. Of course, if there's high CPU load on the master you might
>> observe degraded performance. This also
>> depends on the granularity of your tasks: whether you have a few
>> long-running tasks or many short tasks (which run
>> for just hundreds of ms).
>>
>> Tomas
>>
>>
>> On 6 January 2015 at 10:12, Itamar Ostricher <ita...@yowza3d.com> wrote:
>>
>>> Are there recommendations regarding master / scheduler machine
>>> resources as a function of cluster size?
>>>
>>> Say I have a cluster with hundreds of slave machines and thousands of
>>> CPUs, with a single framework that will schedule millions of tasks.
>>> How does the strength of the master & scheduler machines affect the
>>> overall cluster performance?
>>>
>>> Thanks,
>>> - Itamar.
>>>
>>
>>
>
