RE: Doubts: Deployment and Configuration of YARN cluster

German Florez-Larrahondo Wed, 15 Jan 2014 05:51:39 -0800

Nirmal


-A good summary of memory configuration settings and best practices can
be found here. Note that in YARN, the way you configure resource limits
dictates the number of containers on each node and in the cluster:
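To illustrate that point, here is a minimal sketch. It assumes memory is the only resource being counted (vcores ignored), that every container requests the same amount, and that the scheduler rounds each request up to a multiple of {yarn.scheduler.minimum-allocation-mb}. Under those assumptions a node fits roughly its {yarn.nodemanager.resource.memory-mb} divided by the rounded container size. The node names and sizes are hypothetical.

```python
# Minimal sketch: how per-node YARN memory limits translate into container counts.
# Assumptions: memory-only accounting (vcores ignored), every container requests
# the same amount, and requests are rounded up to a multiple of the minimum
# allocation. Node names and sizes are hypothetical.


def normalize_request(request_mb: int, min_allocation_mb: int) -> int:
    """Round a container request up to the next multiple of the minimum allocation."""
    multiples = -(-request_mb // min_allocation_mb)  # ceiling division
    return max(1, multiples) * min_allocation_mb


def containers_per_node(node_memory_mb: int, request_mb: int, min_allocation_mb: int) -> int:
    """How many equally sized containers fit in yarn.nodemanager.resource.memory-mb."""
    return node_memory_mb // normalize_request(request_mb, min_allocation_mb)


def cluster_containers(node_memory_mb: dict, request_mb: int, min_allocation_mb: int):
    """Return (per-node counts, cluster total) for a possibly heterogeneous cluster."""
    counts = {
        node: containers_per_node(mem, request_mb, min_allocation_mb)
        for node, mem in node_memory_mb.items()
    }
    return counts, sum(counts.values())


if __name__ == "__main__":
    # Values stand in for each node's yarn.nodemanager.resource.memory-mb.
    nodes = {"nm1": 40960, "nm2": 20480}
    per_node, total = cluster_containers(nodes, request_mb=1500, min_allocation_mb=1024)
    print(per_node, total)  # {'nm1': 20, 'nm2': 10} 30
```

With a 1024 MB minimum allocation, a 1500 MB request is rounded up to 2048 MB, so the 40GB node hosts 20 containers and the 20GB node hosts 10.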

http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html

 

-A good introduction to YARN configuration is this one:

http://www.thecloudavenue.com/2012/01/getting-started-with-nextgen-mapreduce_11.html

 

Regards

.g

From: Nirmal Kumar [mailto:[email protected]] 
Sent: Wednesday, January 15, 2014 7:22 AM
To: [email protected]
Subject: Doubts: Deployment and Configuration of YARN cluster

 

All,

 

I am new to YARN and have some questions about deploying and configuring
YARN on a cluster.

 

As I understand it, to deploy Hadoop 2.x with YARN on a cluster we need to
distribute the following files to all the slave nodes:

- conf/core-site.xml
- conf/hdfs-site.xml
- conf/yarn-site.xml
- conf/mapred-site.xml
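Distributing these files is usually scripted. The sketch below only builds the rsync commands (it does not run them), assuming passwordless SSH, rsync installed on every host, and the hypothetical hostnames and install paths shown; none of these come from the thread.

```python
# Minimal sketch: build (but do not run) the commands that would push the shared
# config files to every slave node. Assumptions: passwordless SSH, rsync on all
# hosts, and the hypothetical hostnames/paths used below.
import shlex

CONFIG_FILES = [
    "conf/core-site.xml",
    "conf/hdfs-site.xml",
    "conf/yarn-site.xml",
    "conf/mapred-site.xml",
]


def build_sync_commands(slaves, local_root, remote_root):
    """Return one rsync argument list per slave node."""
    sources = [f"{local_root.rstrip('/')}/{path}" for path in CONFIG_FILES]
    commands = []
    for host in slaves:
        # Trailing slash on the destination: copy the files into that directory.
        dest = f"{host}:{remote_root.rstrip('/')}/conf/"
        commands.append(["rsync", "-av", *sources, dest])
    return commands


if __name__ == "__main__":
    for cmd in build_sync_commands(["slave1", "slave2"], "/opt/hadoop", "/opt/hadoop"):
        # Pass cmd to subprocess.run(cmd, check=True) to actually copy the files.
        print(shlex.join(cmd))
```

Any node-specific edits (for example to hdfs-site.xml) would need to be made after this copy, or kept out of the synced set, so they are not overwritten.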

 

Also, the ONLY file we need to change on each slave node is:

- conf/hdfs-site.xml

where we need to set the {dfs.datanode.data.dir} value.
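In Hadoop 2.x the DataNode storage directories are set with {dfs.datanode.data.dir}, a comma-separated list. A minimal sketch that renders a node-specific hdfs-site.xml; the directory paths are hypothetical and would differ per node.

```python
# Minimal sketch: render a node-specific hdfs-site.xml that sets the DataNode
# storage directories. The paths are hypothetical; the property name is
# dfs.datanode.data.dir (comma-separated list of directories).
import xml.etree.ElementTree as ET


def hdfs_site_xml(data_dirs):
    """Return hdfs-site.xml content with dfs.datanode.data.dir set to data_dirs."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "dfs.datanode.data.dir"
    # Listing one directory per physical disk lets the DataNode spread blocks across them.
    ET.SubElement(prop, "value").text = ",".join(f"file://{d}" for d in data_dirs)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hdfs_site_xml(["/data/1/dfs/dn", "/data/2/dfs/dn"]))
```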

 

Do we need to change any other config file on the slave nodes?

Can I set a different {yarn.nodemanager.resource.memory-mb} value for each
NodeManager (NM) running on the slave nodes?

I ask because I might have a *heterogeneous environment*, i.e. nodes with
different amounts of memory and numbers of cores. For example, NM1 might have
40GB of memory and another node only 20GB.
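Each NodeManager reads {yarn.nodemanager.resource.memory-mb} from its own local yarn-site.xml, so per-node values can be generated from each machine's RAM. A minimal sketch; the 4GB reserved for the OS and other daemons is an assumed figure, not a Hadoop default.

```python
# Minimal sketch: compute and render a per-node yarn.nodemanager.resource.memory-mb.
# RESERVED_MB is a hypothetical headroom for the OS, DataNode and NodeManager JVMs.

RESERVED_MB = 4096


def nodemanager_memory_mb(physical_mb: int, reserved_mb: int = RESERVED_MB) -> int:
    """Memory a node should offer to YARN containers."""
    if physical_mb <= reserved_mb:
        raise ValueError("node has no memory left for containers")
    return physical_mb - reserved_mb


def yarn_site_fragment(memory_mb: int) -> str:
    """The <property> element to put in that node's local yarn-site.xml."""
    return (
        "<property>\n"
        "  <name>yarn.nodemanager.resource.memory-mb</name>\n"
        f"  <value>{memory_mb}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    for node, physical_gb in {"NM1": 40, "NM2": 20}.items():
        print(node)
        print(yarn_site_fragment(nodemanager_memory_mb(physical_gb * 1024)))
```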

 

Also,

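{mapreduce.map.memory.mb}   specifies the memory requested for each map
task's container; as I understand it, the *max. virtual memory* allowed is
this value multiplied by {yarn.nodemanager.vmem-pmem-ratio}.

{mapreduce.map.java.opts}         specifies the JVM options for the map task,
including -Xmx, the *max. heap space* of the allocated JVM. If the task
exceeds the max. heap size, the JVM throws an OutOfMemoryError (OOM).

{mapreduce.reduce.memory.mb} and {mapreduce.reduce.java.opts} are the
corresponding settings for reduce tasks.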
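A minimal sketch of how these per-task numbers relate. It assumes the heap given via -Xmx is about 80% of the container size (a commonly cited rule of thumb that leaves room for non-heap JVM memory) and the default {yarn.nodemanager.vmem-pmem-ratio} of 2.1; both constants are assumptions to adjust.

```python
# Minimal sketch: derive heap and virtual-memory limits from a task container size.
# Assumptions: heap ~80% of the container (rule of thumb) and a
# yarn.nodemanager.vmem-pmem-ratio of 2.1 (the Hadoop 2.x default).

HEAP_FRACTION = 0.8
VMEM_PMEM_RATIO = 2.1


def task_memory_settings(container_mb: int) -> dict:
    heap_mb = int(container_mb * HEAP_FRACTION)
    return {
        # mapreduce.{map,reduce}.memory.mb: physical-memory size of the container;
        # exceeding it may get the container killed by the NodeManager.
        "container_mb": container_mb,
        # mapreduce.{map,reduce}.java.opts: exceeding -Xmx raises OutOfMemoryError.
        "java_opts": f"-Xmx{heap_mb}m",
        # Virtual-memory limit enforced when the NodeManager's vmem check is enabled.
        "vmem_limit_mb": int(container_mb * VMEM_PMEM_RATIO),
    }


if __name__ == "__main__":
    # Hypothetical example: 4GB map containers, 8GB reduce containers.
    for role, container_mb in (("map", 4096), ("reduce", 8192)):
        print(role, task_memory_settings(container_mb))
```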

Do the above properties apply to all Map/Reduce tasks in general (from
different MapReduce applications), regardless of which slave node they run
on?

Or can I change them for a particular slave node? For example, map tasks on
SlaveNode1 could run with 4GB and map tasks on SlaveNode2 with 8GB, and
likewise for reduce tasks.

 

I need to understand how to *configure processing capacity* in the cluster:
container size, number of containers, and number of mappers/reducers.
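As a starting point, here is a rough sketch modeled on the sizing rule of thumb in the Hortonworks memory-configuration guide linked in the reply above, as I understand it: containers = min(2 × cores, ⌈1.8 × disks⌉, available RAM / minimum container size), then RAM per container = max(minimum container size, available RAM / containers). The minimum-container-size table, the doubling for reducers and the 0.8 heap factor are assumptions to verify against the guide; the node figures in the example are hypothetical.

```python
# Rough sketch of a per-node sizing heuristic (modeled on the HDP guide; the
# constants below are assumptions to check against that guide).
import math


def min_container_size_mb(total_ram_gb: float) -> int:
    """Recommended minimum container size by total node RAM (assumed table)."""
    if total_ram_gb < 4:
        return 256
    if total_ram_gb < 8:
        return 512
    if total_ram_gb < 24:
        return 1024
    return 2048


def size_node(cores: int, disks: int, total_ram_gb: float, available_ram_mb: int) -> dict:
    """Suggest container count/size and derived YARN/MapReduce settings for one node."""
    min_mb = min_container_size_mb(total_ram_gb)
    containers = int(min(2 * cores, math.ceil(1.8 * disks), available_ram_mb // min_mb))
    if containers < 1:
        raise ValueError("not enough resources for a single container")
    ram = max(min_mb, available_ram_mb // containers)
    return {
        "containers": containers,
        "yarn.nodemanager.resource.memory-mb": containers * ram,
        "yarn.scheduler.minimum-allocation-mb": ram,
        "yarn.scheduler.maximum-allocation-mb": containers * ram,
        "mapreduce.map.memory.mb": ram,
        "mapreduce.reduce.memory.mb": 2 * ram,
        "mapreduce.map.java.opts": f"-Xmx{int(0.8 * ram)}m",
        "mapreduce.reduce.java.opts": f"-Xmx{int(0.8 * 2 * ram)}m",
    }


if __name__ == "__main__":
    # Hypothetical node: 12 cores, 12 disks, 48GB RAM, 40GB left for YARN.
    for key, value in size_node(12, 12, 48, 40960).items():
        print(f"{key} = {value}")
```

In this example memory is the binding limit (20 containers of 2048 MB); on a node with few disks, the disk term would cap the container count instead.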

 

Thanks,

-Nirmal

 
