Hi there!

On a related matter:
May I ask how you perform your measurements, especially for capturing RAM and CPU usage? I also want to run some performance tests, and I would be thankful to hear how you handled that ;)

Regards,
Sonja

On 13.07.2015 at 10:56, Arjun Sharma wrote:
Hi,

Many of the discussions on this forum suggest using one worker per physical machine and increasing the number of threads per worker, rather than using multiple workers per physical machine with fewer threads each. This does not seem to hold in my experiments.

The cluster I am using has 12 physical machines (used exclusively for workers), each with 64 GB of RAM and 12 cores. I experimented with two setups:

Setup 1 runs 72 workers (i.e., 6 workers per machine) with 72*72 partitions, which is the default, and 8 threads per worker.

Setup 2 tries to simulate Setup 1, but using threads instead of workers. It has 12 workers (1 worker per machine) and 72*72 partitions (set using numUserPartitions). Since the number of parallel tasks per machine in Setup 1 is 6 workers * 8 threads = 48, the numbers of compute, input, and output threads are each set to 48.

In both cases, 56 GB of RAM per machine is divided equally among the workers on that machine (either all of it is given to the 1 worker on that machine, or it is split among the 6 of them).
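
To make the comparison concrete, here is a small back-of-the-envelope sketch of the two setups. It uses only the numbers stated above; the class and field names are made up for illustration and are not Giraph options, and the partitions-per-worker figure assumes partitions are spread evenly across workers.

```python
from dataclasses import dataclass

# Cluster figures taken from the description above.
MACHINES = 12
CORES_PER_MACHINE = 12
WORKER_RAM_GB_PER_MACHINE = 56
PARTITIONS = 72 * 72


@dataclass(frozen=True)
class Setup:
    """Hypothetical helper; names are illustrative, not Giraph settings."""
    name: str
    workers_per_machine: int
    threads_per_worker: int

    @property
    def total_workers(self) -> int:
        return MACHINES * self.workers_per_machine

    @property
    def threads_per_machine(self) -> int:
        return self.workers_per_machine * self.threads_per_worker

    @property
    def threads_per_core(self) -> float:
        return self.threads_per_machine / CORES_PER_MACHINE

    @property
    def ram_per_worker_gb(self) -> float:
        return WORKER_RAM_GB_PER_MACHINE / self.workers_per_machine

    @property
    def partitions_per_worker(self) -> float:
        # Assumption: partitions are distributed evenly over all workers.
        return PARTITIONS / self.total_workers


setup1 = Setup("Setup 1", workers_per_machine=6, threads_per_worker=8)
setup2 = Setup("Setup 2", workers_per_machine=1, threads_per_worker=48)

for s in (setup1, setup2):
    print(
        f"{s.name}: {s.total_workers} workers, "
        f"{s.threads_per_machine} threads/machine "
        f"({s.threads_per_core:.0f} per core), "
        f"{s.ram_per_worker_gb:.1f} GB/worker, "
        f"{s.partitions_per_worker:.0f} partitions/worker"
    )
```

Both setups come out at 48 threads per machine on 12 cores, so what differs is the memory per worker (roughly 9.3 GB versus 56 GB) and the number of partitions each worker holds (72 versus 432).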

In my case, Setup 1 performs significantly better (faster) than Setup 2, which seems counterintuitive and disagrees with the suggestions to use fewer workers and more threads. Is there anything I am missing here? Is there any tuning or configuration parameter that could make Setup 2 outperform Setup 1?

Thanks!
