On 16 December 2011 12:49, Hans Drexler <[email protected]> wrote:
> We are using Whirr to set up a Rackspace cluster to run Hadoop jobs. We use
> the Cloudera Hadoop distribution. Below is our hadoop.properties:
>
> whirr.cluster-name=our_cluster
> whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,6 hadoop-datanode+hadoop-tasktracker
> whirr.provider=cloudservers-us
> whirr.identity=${env:RACKSPACE_USERNAME}
> whirr.credential=${env:RACKSPACE_API_KEY}
> whirr.hardware-id=6
> whirr.image=49
> whirr.login-user=user
> whirr.private-key-file=/home/user/.ssh/id_rsa_whirr
> whirr.public-key-file=/home/user/.ssh/id_rsa_whirr.pub
> whirr.hadoop-install-function=install_cdh_hadoop
> whirr.hadoop-configure-function=configure_cdh_hadoop
>
> All is working fine. But now I want to change the Hadoop configuration files
> on the nodes. Specifically, we want to increase the amount of heap space
> available to Hadoop (HADOOP_HEAPSIZE), so we want to change the
> hadoop-env.sh file on each node.
>
> My question is: how can I do that? Do I need to open
> lib/whirr-cdh-0.6.0-incubating.jar, tweak its contents, and then
> repackage the jar?
>
> I hope somebody can share some knowledge on this. Thanks!
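You should not have to unpack and repackage the whirr-cdh jar for this. Recent Whirr releases let you put Hadoop configuration overrides straight into hadoop.properties using prefixed keys (hadoop-common., hadoop-hdfs. and hadoop-mapreduce. entries are written into core-site.xml, hdfs-site.xml and mapred-site.xml when the cluster is launched). Whether 0.6.0-incubating supports this, and whether a hadoop-env. prefix for hadoop-env.sh is honoured by the CDH configure function, I am not certain, so treat the sketch below as something to verify against your Whirr version rather than a definitive recipe:

# sketch only - check that your Whirr release understands the hadoop-env. prefix
hadoop-env.HADOOP_HEAPSIZE=2000

# per-task child JVM heap, written into mapred-site.xml via the hadoop-mapreduce. prefix
hadoop-mapreduce.mapred.child.java.opts=-Xmx500m

If that works, the configuration files are regenerated on each node when you launch the cluster, so no jar surgery is needed.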
The HADOOP_HEAPSIZE environment variable in hadoop-env.sh controls how much heap space each daemon (datanode, tasktracker, etc.) is given. In addition, the tasktracker launches separate child JVMs to run the map and reduce tasks in, and each of these child JVMs gets 200MB of maximum heap space by default. You can control that with these properties:

mapred.map.child.java.opts=-Xmx500m
mapred.reduce.child.java.opts=-Xmx500m

You could also use mapred.child.java.opts, but it didn't work for me.

I hope this helps.

MD
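P.S. If you end up checking or editing the files on the nodes by hand, the equivalent settings would look roughly like this (a sketch only; 2000 and 500m are example values, not recommendations):

In hadoop-env.sh (daemon heap, in MB):

  export HADOOP_HEAPSIZE=2000

In mapred-site.xml (per-task child JVM heap):

  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx500m</value>
  </property>
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xmx500m</value>
  </property>

hadoop-env.sh is only read when a daemon starts, so restart the affected daemons (or relaunch the cluster through Whirr) after changing it.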
