On 16 December 2011 12:49, Hans Drexler <[email protected]> wrote:
> We are using Whirr to set up a Rackspace cluster to run Hadoop jobs. We use
> the Cloudera Hadoop distribution. Below is our hadoop.properties:
>
> whirr.cluster-name=our_cluster
> whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,6 hadoop-datanode+hadoop-tasktracker
> whirr.provider=cloudservers-us
> whirr.identity=${env:RACKSPACE_USERNAME}
> whirr.credential=${env:RACKSPACE_API_KEY}
> whirr.hardware-id=6
> whirr.image=49
> whirr.login-user=user
> whirr.private-key-file=/home/user/.ssh/id_rsa_whirr
> whirr.public-key-file=/home/user/.ssh/id_rsa_whirr.pub
> whirr.hadoop-install-function=install_cdh_hadoop
> whirr.hadoop-configure-function=configure_cdh_hadoop
>
> All is working fine. But now I want to change the Hadoop configuration files
> on the nodes. Specifically, we want to increase the amount of heap space
> available to Hadoop (HADOOP_HEAPSIZE), so we want to change the
> hadoop-env.sh file on each node.
>
> My question is: how can I do that? Do I need to open
> lib/whirr-cdh-0.6.0-incubating.jar, tweak its contents, and then
> repackage the jar?
>
> I hope somebody can share some knowledge on this. Thanks!
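You should not have to unpack and repackage the whirr-cdh jar for this. Recent Whirr releases let you put Hadoop configuration overrides straight into hadoop.properties using prefixed keys (hadoop-common., hadoop-hdfs. and hadoop-mapreduce. entries are written into core-site.xml, hdfs-site.xml and mapred-site.xml when the cluster is launched). Whether 0.6.0-incubating supports this, and whether a hadoop-env. prefix for hadoop-env.sh is honoured by the CDH configure function, I am not certain, so treat the sketch below as something to verify against your Whirr version rather than a definitive recipe:

# sketch only - check that your Whirr release understands the hadoop-env. prefix
hadoop-env.HADOOP_HEAPSIZE=2000

# per-task child JVM heap, written into mapred-site.xml via the hadoop-mapreduce. prefix
hadoop-mapreduce.mapred.child.java.opts=-Xmx500m

If that works, the configuration files are regenerated on each node when you launch the cluster, so no jar surgery is needed.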
The HADOOP_HEAPSIZE environment variable in hadoop-env.sh controls how much heap space each daemon (datanode, tasktracker, etc.) is given. In addition, the tasktracker launches separate child JVMs to run the map and reduce tasks in, and each of these child JVMs gets 200MB of maximum heap space by default. You can control that with these properties:

mapred.map.child.java.opts=-Xmx500m
mapred.reduce.child.java.opts=-Xmx500m

You could also use mapred.child.java.opts, but it didn't work for me.

I hope this helps.

MD
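P.S. If you end up checking or editing the files on the nodes by hand, the equivalent settings would look roughly like this (a sketch only; 2000 and 500m are example values, not recommendations):

In hadoop-env.sh (daemon heap, in MB):

  export HADOOP_HEAPSIZE=2000

In mapred-site.xml (per-task child JVM heap):

  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx500m</value>
  </property>
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xmx500m</value>
  </property>

hadoop-env.sh is only read when a daemon starts, so restart the affected daemons (or relaunch the cluster through Whirr) after changing it.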
