RE: Optimal number of Workers

chadi jaber Wed, 16 Apr 2014 03:47:21 -0700

Thanks! It's clear now.

From: [email protected]
To: [email protected]
Subject: RE: Optimal number of Workers
Date: Wed, 16 Apr 2014 13:58:15 +0530





Giraph uses threads for compute, the Netty server, the Netty client on workers,
execution pools, input, output, etc. You can see most of these options in
org.apache.giraph.conf.GiraphConstants, for instance:
  /** Netty client threads */
  IntConfOption NETTY_CLIENT_THREADS =
      new IntConfOption("giraph.nettyClientThreads", 4, "Netty client threads");

  /** Netty server threads */
  IntConfOption NETTY_SERVER_THREADS =
      new IntConfOption("giraph.nettyServerThreads", 16,
          "Netty server threads");

  /** Number of threads for vertex computation */
  IntConfOption NUM_COMPUTE_THREADS =
      new IntConfOption("giraph.numComputeThreads", 1,
          "Number of threads for vertex computation");

  /** Number of threads for input split loading */
  IntConfOption NUM_INPUT_THREADS =
      new IntConfOption("giraph.numInputThreads", 1,
          "Number of threads for input split loading");
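As a rough illustration of how such options might be overridden for a single job, the sketch below renders a set of key/value pairs as "-ca key=value" custom arguments, the form GiraphRunner accepts on its command line. The option keys are the ones quoted above; the class name, the helper names and the reduced values are hypothetical and chosen only for the example, so treat this as a starting point and not as recommended settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class GiraphThreadFlags {

    /** Renders each option as a "-ca key=value" pair, in insertion order. */
    static String toCustomArgs(Map<String, Integer> options) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, Integer> entry : options.entrySet()) {
            joiner.add("-ca").add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    /** Hypothetical reduced values for a single-machine run (assumption). */
    static Map<String, Integer> reducedThreadOptions() {
        Map<String, Integer> options = new LinkedHashMap<>();
        options.put("giraph.nettyClientThreads", 1);
        options.put("giraph.nettyServerThreads", 2);
        options.put("giraph.numComputeThreads", 1);
        options.put("giraph.numInputThreads", 1);
        return options;
    }

    public static void main(String[] args) {
        // Prints the flags to append to the GiraphRunner command line.
        System.out.println(toCustomArgs(reducedThreadOptions()));
    }
}
```

The printed string is meant to be appended to the usual GiraphRunner invocation; the same keys could also be set programmatically on the job configuration.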

The idea is that if you run your job on a cluster of 5 machines, typically 1
machine is the master and 4 of them are "workers", which load the graph and
compute on it. Each worker is a separate machine, and to maximize its
utilization we can use as many threads as it can handle.
However, if you are running in pseudo mode, all workers run on the same
machine and each one still tries to launch the default number of threads set
in the config. Although each worker is now a thread (instead of a machine), it
still launches all these other threads regardless. In any case, you can
configure the number of threads spawned by each worker to reduce the overall
number of threads launched on your one machine.
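
To make the arithmetic concrete, here is a small sketch that adds up only the four thread pools quoted above and multiplies by the number of workers sharing one machine. It is a lower bound under stated assumptions: Giraph, Netty and the JVM start other threads as well (the original question reports 100 threads per worker process), and the class name, method names and reduced values are hypothetical.

```java
final class ThreadEstimate {

    /** Sum of the four configurable pools for one worker. */
    static int perWorker(int nettyClient, int nettyServer, int compute, int input) {
        return nettyClient + nettyServer + compute + input;
    }

    /** Total across all workers that share a single machine. */
    static int onOneMachine(int workers, int threadsPerWorker) {
        return workers * threadsPerWorker;
    }

    public static void main(String[] args) {
        int workers = 10; // worker count from the original question

        // Defaults quoted from GiraphConstants in this thread: 4, 16, 1, 1.
        int withDefaults = perWorker(4, 16, 1, 1);

        // Hypothetical reduced settings, for illustration only.
        int withReduced = perWorker(1, 2, 1, 1);

        System.out.println("defaults: " + withDefaults + " per worker, "
            + onOneMachine(workers, withDefaults) + " on one machine");
        System.out.println("reduced:  " + withReduced + " per worker, "
            + onOneMachine(workers, withReduced) + " on one machine");
    }
}
```

With the quoted defaults this counts 22 configured threads per worker, or 220 for 10 workers on one machine; the hypothetical reduced values give 5 and 50.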
From: [email protected]
To: [email protected]
Subject: Optimal number of Workers
Date: Tue, 15 Apr 2014 13:34:53 +0200




Hello! Can anybody explain how threads are used by workers in Giraph? For
which purposes? How does a worker determine the number of threads to use?
I often get the following error:
org.apache.hadoop.mapred.Child: Error running child :
java.lang.OutOfMemoryError: unable to create new native thread
A check on the number of threads per worker shows child processes with 100
threads per worker process (10 workers on a 12-processor machine), which seems
too large to me. If I reduce the number of workers, the number of threads
decreases. How should we choose the number of workers?
Thanks in advance.
Chadi
