Thanks for your reply. We installed munin on the nodes and noticed that Hadoop was not using all of the available memory, so we think we can make the jobs faster by allocating more memory to it. But maybe we have been changing the wrong parameters.
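To see where the memory actually goes, a quick check along these lines on one of the nodes lists the maximum heap (-Xmx) that each running Hadoop JVM was started with. This is an illustrative sketch, not part of our setup:

# Print the -Xmx flag of every running Java process on this node.
# Writing [j]ava keeps grep from matching its own command line.
ps aux | grep '[j]ava' | grep -o -e '-Xmx[0-9]*[kmgKMG]'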
By the way, we did technically succeed in increasing the HADOOP_HEAPSIZE setting by editing the contents of the whirr-cdh-0.6.0-incubating.jar and repackaging it. But I still have the feeling we are doing it the "wrong" way.

PPS. We use this home-grown shell script to deploy munin on all Hadoop nodes. It reads a file cluster-nodes.txt that must contain, one node per line, the IP address (or full host name) and the root password of that node. Munin helps us keep tabs on what goes on at the cluster nodes. Maybe somebody can use it. Any remarks welcome (yes, I know sshpass is dirty!):

#!/bin/sh
# Deploy munin-node on every host listed in cluster-nodes.txt and register
# each node in the local munin master configuration.
while read line; do
    set -- $line
    host=$1
    pass=$2
    # Make the host name safe for use in a file name, e.g. 10.0.0.1 -> 10-0-0-1
    hostname=`echo $host | tr '.' '-'`

    echo "Installing munin node on $host"
    # < /dev/null stops ssh from swallowing the rest of cluster-nodes.txt
    sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no \
        /usr/bin/aptitude -y install munin-node < /dev/null
    # Allow the munin master (50.57.191.88) to poll this node
    sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no \
        '/bin/echo "allow ^50\.57\.191\.88$" >> /etc/munin/munin-node.conf' < /dev/null
    sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no \
        '/usr/sbin/service munin-node restart' < /dev/null

    echo "Adding $hostname to local config"
    rm -f "/etc/munin/munin-conf.d/conf-$hostname.conf"
    echo "[$hostname.localdomain]" > "/etc/munin/munin-conf.d/conf-$hostname.conf"
    # printf instead of echo, so the \t is expanded portably under any /bin/sh
    printf '\taddress %s\n' "$host" >> "/etc/munin/munin-conf.d/conf-$hostname.conf"
    printf '\tuse_node_name yes\n' >> "/etc/munin/munin-conf.d/conf-$hostname.conf"
    echo "Done installing on $host"
done < cluster-nodes.txt

Kind regards,
Hans

-----Original Message-----
From: Marco Didonna [mailto:[email protected]]
Sent: Friday, 16 December 2011 15:52
To: [email protected]
Subject: Re: Setting Hadoop heap size

On 16 December 2011 12:49, Hans Drexler <[email protected]> wrote:
> We are using Whirr to set up a Rackspace cluster to run Hadoop jobs. We use
> the Cloudera Hadoop. Below is our hadoop.properties:
>
> whirr.cluster-name=our_cluster
> whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,6 hadoop-datanode+hadoop-tasktracker
> whirr.provider=cloudservers-us
> whirr.identity=${env:RACKSPACE_USERNAME}
> whirr.credential=${env:RACKSPACE_API_KEY}
> whirr.hardware-id=6
> whirr.image=49
> whirr.login-user=user
> whirr.private-key-file=/home/user/.ssh/id_rsa_whirr
> whirr.public-key-file=/home/user/.ssh/id_rsa_whirr.pub
> whirr.hadoop-install-function=install_cdh_hadoop
> whirr.hadoop-configure-function=configure_cdh_hadoop
>
> All is working fine. But now I want to change the Hadoop configuration file
> on the nodes. Actually, we want to increase the amount of heap space
> available to Hadoop (HADOOP_HEAPSIZE). So we want to change the
> hadoop-env.sh file on each node.
>
> My question is: how can I do that? Do I need to open the
> lib/whirr-cdh-0.6.0-incubating.jar and tweak the contents of that jar,
> then repackage it?
>
> I hope somebody can share some knowledge on this. Thanks!

The HADOOP_HEAPSIZE environment variable in hadoop-env.sh controls how much
heap space each daemon (datanode, tasktracker, etc.) is assigned. In addition,
the tasktracker launches separate child JVMs to run the map and reduce tasks
in, and each of these child JVMs is given a maximum heap of 200 MB by default.
You can control that limit by setting

mapred.map.child.java.opts=-Xmx500m
mapred.reduce.child.java.opts=-Xmx500m

You could also use mapred.child.java.opts, but it didn't work for me.

I hope this helps.

MD
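The two properties above would normally be set in mapred-site.xml on the cluster nodes. As an alternative sketch: if a job's driver class goes through ToolRunner (so that generic -D options are parsed), the same settings can also be passed per job, without touching the cluster configuration. The jar name, class name, and paths below are placeholders:

# Per-job override of the child-task heap. This only works when the driver
# uses ToolRunner/GenericOptionsParser, which is what parses the -D options.
# our-job.jar, com.example.OurJob, input-dir and output-dir are placeholders.
hadoop jar our-job.jar com.example.OurJob \
    -D mapred.map.child.java.opts=-Xmx500m \
    -D mapred.reduce.child.java.opts=-Xmx500m \
    input-dir output-dir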
