Thanks for your reply. We installed munin on the nodes and noticed that Hadoop 
was not using all of the available memory. So we think we can make it faster by 
allocating more memory. But maybe we have been changing the wrong parameters.

By the way, we did technically succeed in increasing the HADOOP_HEAPSIZE 
setting by changing the contents of the whirr-cdh-0.6.0-incubating.jar file. But 
I still have the feeling we are doing it the "wrong" way.
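
For the record, this is roughly what we did (the path of the script inside the 
jar is from memory, so treat this as a sketch rather than a recipe):

  # pull the CDH configure script out of the jar
  jar xf whirr-cdh-0.6.0-incubating.jar functions/configure_cdh_hadoop.sh
  # edit functions/configure_cdh_hadoop.sh so that it appends
  # "export HADOOP_HEAPSIZE=2048" to hadoop-env.sh on each node,
  # then put the edited script back into the jar
  jar uf whirr-cdh-0.6.0-incubating.jar functions/configure_cdh_hadoop.sh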

PS. We use the home-grown shell script below to deploy munin on all Hadoop 
nodes. It reads a file cluster-nodes.txt that must contain the IP address (or 
full host name) and root password of each node. Munin helps us keep tabs on 
what goes on at the cluster nodes. Maybe somebody can use it. Any remarks are 
welcome (yes, I know sshpass is dirty!).
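
For example, cluster-nodes.txt could look like this (dummy values, one node 
per line):

  10.1.2.3 somepassword
  node2.example.com otherpassword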



#!/bin/sh
# Deploy munin-node to every Hadoop node listed in cluster-nodes.txt.
# Each line of that file holds: <ip-or-hostname> <root password>
while read host pass; do
  # Turn dots into dashes so the host name is usable in a config file name.
  hostname=$(echo "$host" | tr '.' '-')
  echo "Installing munin node on $host"
  # The redirects from /dev/null stop ssh from swallowing the loop's stdin.
  sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no \
    '/usr/bin/aptitude -y install munin-node' < /dev/null
  # Allow our munin master (50.57.191.88) to poll this node.
  sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no \
    '/bin/echo "allow ^50\.57\.191\.88$" >> /etc/munin/munin-node.conf' < /dev/null
  sshpass -p "$pass" ssh "root@$host" -o StrictHostKeyChecking=no \
    '/usr/sbin/service munin-node restart' < /dev/null
  echo "Adding $hostname to local config"
  conf="/etc/munin/munin-conf.d/conf-$hostname.conf"
  rm -f "$conf"    # -f: do not fail if the file does not exist yet
  {
    echo "[$hostname.localdomain]"
    # printf handles the tab escape portably under /bin/sh, unlike echo.
    printf '\taddress %s\n' "$host"
    printf '\tuse_node_name yes\n'
  } > "$conf"
  echo "Done installing on $host"
done < cluster-nodes.txt
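
We just run it as root on the munin master, with cluster-nodes.txt in the same 
directory:

  sh deploy-munin.sh

(deploy-munin.sh being whatever name you save the script under.)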



Kind regards, 

Hans

-----Original Message-----
From: Marco Didonna [mailto:[email protected]] 
Sent: Friday, 16 December 2011 15:52
To: [email protected]
Subject: Re: Setting Hadoop heap size

On 16 December 2011 12:49, Hans Drexler <[email protected]> wrote:
> We are using Whirr to setup a rackspace cluster to run Hadoop jobs. We use
> the Cloudera Hadoop. Below is our hadoop.properties
>
> whirr.cluster-name=our_cluster
> whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,6 hadoop-datanode+hadoop-tasktracker
> whirr.provider=cloudservers-us
> whirr.identity=${env:RACKSPACE_USERNAME}
> whirr.credential=${env:RACKSPACE_API_KEY}
> whirr.hardware-id=6
> whirr.image=49
> whirr.login-user=user
> whirr.private-key-file=/home/user/.ssh/id_rsa_whirr
> whirr.public-key-file=/home/user/.ssh/id_rsa_whirr.pub
> whirr.hadoop-install-function=install_cdh_hadoop
> whirr.hadoop-configure-function=configure_cdh_hadoop
>
> All is working fine. But now I want to change the hadoop configuration file
> on the nodes. Actually, we want to increase the amount of heap space
> available to Hadoop (HADOOP_HEAPSIZE). So we want to change the
> hadoop-env.sh file on each node.
>
> My Question is: How can I do that? Do I need to open the
> lib/whirr-cdh-0.6.0-incubating.jar and tweak the contents of that jar, then
> repackage it?
>
> I hope somebody can share some knowledge on this. Thanks!


The HADOOP_HEAPSIZE environment variable in hadoop-env.sh controls how
much heap space each daemon (datanode, tasktracker, etc.) is assigned.
In addition, the tasktracker launches separate child JVMs to run map
and reduce tasks in. Each of these child JVMs is given a maximum heap
of 200MB by default. You can control this parameter by modifying

mapred.map.child.java.opts=-Xmx500m
mapred.reduce.child.java.opts=-Xmx500m

You could also use mapred.child.java.opts, but it didn't work for me.
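
If you set these in mapred-site.xml rather than as generic Hadoop
properties, that would look roughly like this (the -Xmx500m value is
only an illustration; size it to your nodes):

<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx500m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx500m</value>
</property>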

I hope this helps.

MD
