dfs.namenode.handler.count and dfs.datanode.handler.count control how many 
concurrent server threads the namenode and datanodes, respectively, use to 
handle incoming RPC requests. The default values should be fine for smaller 
clusters, but if you have a lot of simultaneous HDFS operations, you may see 
performance gains by increasing these numbers. Just make sure you have the 
memory to spare and adjust your heap sizes accordingly.
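For example, raising both counts is just a matter of adding something like 
this to hdfs-site.xml on the relevant nodes (the values below are purely 
illustrative, not recommendations; tune them for your own workload):

```xml
<!-- hdfs-site.xml: illustrative values only, defaults noted in comments -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value> <!-- default: 10 -->
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value> <!-- default: 3 -->
</property>
```

Both daemons need a restart to pick up the change.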

dfs.heartbeat.interval and dfs.blockreport.intervalMsec will affect performance 
in larger clusters. Datanodes send a message to the namenode saying they are 
still alive every dfs.heartbeat.interval seconds, and after 
dfs.namenode.stale.datanode.interval milliseconds without a heartbeat, the 
namenode will mark that datanode as stale. Similarly, the datanode will send a 
list of all the blocks it has every dfs.blockreport.intervalMsec milliseconds. 
For a cluster of 30 machines, that means the namenode receives a heartbeat, on 
average, every 0.1 seconds, and a block report every 2 minutes, which should be 
a negligible load and worth the extra reliability. If your block reports are 
taking too long, that's a sign that you have too many small files and should 
look into archiving or consolidating them somehow. Personally, I ran into 
trouble around 1 million blocks/datanode.
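As a quick sanity check on those per-node message rates, here is the 
back-of-the-envelope arithmetic in plain Python, using the default intervals 
quoted in the original message and assuming datanodes fire independently:

```python
# Estimate how often the namenode hears from *some* datanode,
# using the defaults quoted below (3 s heartbeats, 1 h block reports).
heartbeat_interval_s = 3             # dfs.heartbeat.interval (seconds)
blockreport_interval_ms = 3_600_000  # dfs.blockreport.intervalMsec

datanodes = 30

# With N datanodes reporting independently, the namenode sees one
# message roughly every interval / N on average.
avg_heartbeat_gap_s = heartbeat_interval_s / datanodes
avg_blockreport_gap_min = (blockreport_interval_ms / 1000 / 60) / datanodes

print(avg_heartbeat_gap_s)       # seconds between heartbeats, on average
print(avg_blockreport_gap_min)   # minutes between block reports, on average
```

This prints 0.1 and 2.0 — a trivial load for any namenode, as long as each 
individual block report stays small.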

dfs.namenode.decommission.interval is only used when removing datanodes from 
the cluster. You can safely ignore it.

Regards,
Marcos

On 11-04-2013 07:19, Dibyendu Karmakar wrote:

Hi everyone,
I am testing hadoop performance. I have come across the following parameters:
1. dfs.replication
2. dfs.block.size
3. dfs.heartbeat.interval   (default: 3)
4. dfs.blockreport.intervalMsec   (default: 3600000)
5. dfs.namenode.handler.count   (default: 10)
6. dfs.datanode.handler.count   (default: 3)
7. dfs.replication.interval    (default: 3)
8. dfs.namenode.decommission.interval    (default: 300)

I have successfully tested parameters 1 and 2. But the rest of the
parameters, starting from dfs.heartbeat.interval, are confusing me a lot.

If I increase those parameters, will hadoop perform better? (considering
read and write operations separately)...
OR, do I have to decrease those parameters to make hadoop perform better?

Can anyone please help? If possible, please explain
dfs.namenode.handler.count and dfs.datanode.handler.count, i.e. what do
these two parameters do?

Thank you
