Thanks for the info!

On 13.07.2015 at 11:22, Arjun Sharma wrote:
I am not measuring RAM or CPU usage, only the overall time the job takes to finish on a large input. To assign RAM to the workers, I pass the job parameters -Dmapreduce.map.memory.mb=9300 -Dmapreduce.map.java.opts="-Xms9G -Xmx9G" (I am running on YARN).
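The 9300 MB container size is consistent with 56 GB per machine split across 6 workers. A minimal sketch of that derivation, assuming the 56 GB is counted as 56000 MB and the per-worker share is rounded down to a multiple of 100 MB (both are assumptions chosen to reproduce 9300; `worker_memory_flags` is a hypothetical helper, not part of Hadoop or Giraph). It also reports how much of the container lies outside the 9 GB heap.

```python
from typing import List, Tuple


def worker_memory_flags(machine_budget_mb: int, workers_per_machine: int,
                        heap_gb: int) -> Tuple[List[str], int]:
    """Return the -D flags for one worker and the container headroom in MB.

    Hypothetical convention: the per-machine budget is split evenly and the
    share is rounded down to a multiple of 100 MB.
    """
    container_mb = (machine_budget_mb // workers_per_machine) // 100 * 100
    heap_mb = heap_gb * 1024  # -Xmx9G means 9 * 1024 MB of heap
    if heap_mb > container_mb:
        raise ValueError(f"{heap_gb}G heap does not fit in {container_mb} MB")
    flags = [
        f"-Dmapreduce.map.memory.mb={container_mb}",
        f'-Dmapreduce.map.java.opts="-Xms{heap_gb}G -Xmx{heap_gb}G"',
    ]
    # Memory left in the container for everything the JVM uses beyond the heap.
    return flags, container_mb - heap_mb


flags, headroom_mb = worker_memory_flags(56000, 6, 9)
print(" ".join(flags))
print(f"headroom outside the heap: {headroom_mb} MB")
```

With these numbers only 84 MB of the 9300 MB container lies outside the heap. Since a JVM also uses memory beyond -Xmx (thread stacks, metaspace, direct buffers), a margin this tight may be worth keeping in mind when comparing setups.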

On Mon, Jul 13, 2015 at 2:05 AM, Sonja Koenig wrote:

    Hi there!

    On a related matter:
    May I ask how you perform your measurements, especially for
    capturing RAM and CPU usage?
    I also want to run some performance tests and would be grateful
    to hear how you approached that ;)

    Regards,
    Sonja


    On 13.07.2015 at 10:56, Arjun Sharma wrote:

        Hi,

        Many of the discussions on this forum suggest using one worker
        per physical machine and increasing the number of threads per
        worker, rather than running multiple workers per physical
        machine with fewer threads each. My experiments do not seem to
        bear this out.

        The cluster I am using has 12 physical machines (used
        exclusively for workers), each with 64 GB of RAM and 12 cores.
        I experimented with two setups:

        Setup 1 runs 72 workers (i.e., 6 workers per machine), 72*72
        partitions, which is the default, and 8 threads per worker.

        Setup 2 tries to simulate Setup 1 using threads instead of
        workers. It therefore has 12 workers (1 worker per machine)
        and 72*72 partitions (set via numUserPartitions). Since Setup 1
        runs 6 workers * 8 threads = 48 parallel tasks per machine, the
        number of compute, input, and output threads is set to 48.
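The arithmetic behind the two setups can be checked with a small sketch. It only restates the numbers given above (12 machines with 12 cores each, 72*72 partitions, 6x8 versus 1x48 threads); the `Setup` class and its methods are hypothetical and nothing here calls Giraph.

```python
from dataclasses import dataclass

MACHINES = 12           # physical machines used for workers
CORES_PER_MACHINE = 12  # cores per machine


@dataclass
class Setup:
    name: str
    workers_per_machine: int
    threads_per_worker: int
    partitions: int

    def total_workers(self) -> int:
        return self.workers_per_machine * MACHINES

    def threads_per_machine(self) -> int:
        return self.workers_per_machine * self.threads_per_worker

    def partitions_per_worker(self) -> float:
        return self.partitions / self.total_workers()

    def partitions_per_thread(self) -> float:
        return self.partitions_per_worker() / self.threads_per_worker

    def threads_per_core(self) -> float:
        return self.threads_per_machine() / CORES_PER_MACHINE


setup1 = Setup("Setup 1", workers_per_machine=6, threads_per_worker=8,
               partitions=72 * 72)
setup2 = Setup("Setup 2", workers_per_machine=1, threads_per_worker=48,
               partitions=72 * 72)

for s in (setup1, setup2):
    print(f"{s.name}: {s.total_workers()} workers, "
          f"{s.threads_per_machine()} threads/machine, "
          f"{s.partitions_per_worker():g} partitions/worker, "
          f"{s.partitions_per_thread():g} partitions/thread, "
          f"{s.threads_per_core():g} threads/core")
```

Both setups come out to 48 threads per machine (4 per core) and 9 partitions per thread, so, at least on paper, they expose the same amount of parallelism; Setup 2 simply concentrates 432 partitions in each worker instead of 72.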

        In both cases, 56 GB of RAM per machine is divided equally
        among the workers on that machine (either all given to the
        single worker or split among the 6 of them).

        In my case, Setup 1 is significantly faster than Setup 2,
        which seems counterintuitive and contradicts the common advice
        to use fewer workers with more threads. Is there anything I am
        missing here? Is there any tuning or configuration parameter
        setting that could make Setup 2 outperform Setup 1?

        Thanks!
