Hi, I configured it following the Whirr documentation:

$ cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.whirr
$ rm -f /etc/hadoop-0.20/conf.whirr/*-site.xml
$ cp ~/.whirr/myhadoopcluster/hadoop-site.xml /etc/hadoop-0.20/conf.whirr
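One thing I am still unsure about: do I also need to point the local client at that directory explicitly before running any hadoop commands, for example something like the following (just a guess on my side, HADOOP_CONF_DIR being the standard Hadoop way of selecting a config directory), or does the packaged CDH setup pick up conf.whirr automatically?

$ # guess: tell the local hadoop client to use the Whirr-generated config
$ export HADOOP_CONF_DIR=/etc/hadoop-0.20/conf.whirr
$ hadoop fs -ls /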
Just another (naive) question: can you explain to me the role of the local Hadoop installation here, i.e. why do we need to configure the local instance to work with the cluster? And what is the main advantage of Whirr over the EC2 scripts in Hadoop's src/contrib/ec2?

Thanks for your reply.

On Mon, Apr 16, 2012 at 3:02 PM, Huanchen Zhang <[email protected]> wrote:

> Hi,
>
> How did you configure the local hadoop?
>
> I just simply copied ~/.whirr/myhadoopcluster/hadoop-site.xml to my local
> hadoop config folder, and it works.
>
> Best,
> Huanchen
>
> On Apr 16, 2012, at 12:12 AM, Đỗ Hoàng Khiêm wrote:
>
> Hi, I am new to Whirr and I'm trying to set up a Hadoop cluster on EC2 with
> Whirr. I have followed the tutorial on Cloudera:
> https://ccp.cloudera.com/display/CDHDOC/Whirr+Installation
>
> Before installing Whirr, I installed Hadoop (0.20.2-cdh3u3), then installed
> Whirr (0.5.0-cdh3u3) on my local machine (running Linux Mint 11).
>
> Here's my cluster config file:
>
> whirr.cluster-name=large-cluster
> whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,1 hadoop-datanode+hadoop-tasktracker
> whirr.provider=aws-ec2
> whirr.identity=XXXXXXXXXXXXXXX
> whirr.credential=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
> whirr.hadoop-install-function=install_cdh_hadoop
> whirr.hadoop-configure-function=configure_cdh_hadoop
> whirr.hardware-id=m1.large
> whirr.image-id=us-east-1/ami-da0cf8b3
> whirr.location-id=us-east-1
>
> The cluster launching looks normal:
>
> khiem@master ~ $ whirr launch-cluster --config large-hadoop.properties
> Bootstrapping cluster
> Configuring template
> Starting 1 node(s) with roles [hadoop-datanode, hadoop-tasktracker]
> Configuring template
> Starting 1 node(s) with roles [hadoop-jobtracker, hadoop-namenode]
> Nodes started: [[id=us-east-1/i-9aa01dfd, providerId=i-9aa01dfd,
> group=large-cluster, name=null, location=[id=us-east-1a, scope=ZONE,
> description=us-east-1a, parent=us-east-1, iso3166Codes=[US-VA], metadata={}],
> uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu,
> version=10.04, arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, privateAddresses=[10.196.142.64],
> publicAddresses=[107.20.64.97], hardware=[id=m1.large, providerId=m1.large,
> name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null,
> type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc,
> durable=false, isBootDevice=false]], supportsImage=is64Bit()],
> loginUser=ubuntu, userMetadata={}]]
> Nodes started: [[id=us-east-1/i-0aa31e6d, providerId=i-0aa31e6d,
> group=large-cluster, name=null, location=[id=us-east-1a, scope=ZONE,
> description=us-east-1a, parent=us-east-1, iso3166Codes=[US-VA], metadata={}],
> uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu,
> version=10.04, arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, privateAddresses=[10.85.130.43],
> publicAddresses=[50.17.128.123],
> hardware=[id=m1.large, providerId=m1.large,
> name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null,
> type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc,
> durable=false, isBootDevice=false]], supportsImage=is64Bit()],
> loginUser=ubuntu, userMetadata={}]]
> Authorizing firewall ingress to [Instance{roles=[hadoop-jobtracker,
> hadoop-namenode], publicIp=50.17.128.123, privateIp=10.85.130.43,
> id=us-east-1/i-0aa31e6d, nodeMetadata=[id=us-east-1/i-0aa31e6d,
> providerId=i-0aa31e6d, group=large-cluster, name=null,
> location=[id=us-east-1a, scope=ZONE, description=us-east-1a,
> parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null,
> imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04,
> arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, privateAddresses=[10.85.130.43],
> publicAddresses=[50.17.128.123], hardware=[id=m1.large, providerId=m1.large,
> name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null,
> type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc,
> durable=false, isBootDevice=false]], supportsImage=is64Bit()],
> loginUser=ubuntu, userMetadata={}]}] on ports [50070, 50030] for
> [116.96.138.41/32]
> Authorizing firewall ingress to [Instance{roles=[hadoop-jobtracker,
> hadoop-namenode], publicIp=50.17.128.123, privateIp=10.85.130.43,
> id=us-east-1/i-0aa31e6d, nodeMetadata=[id=us-east-1/i-0aa31e6d,
> providerId=i-0aa31e6d, group=large-cluster, name=null,
> location=[id=us-east-1a, scope=ZONE, description=us-east-1a,
> parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null,
> imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04,
> arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, privateAddresses=[10.85.130.43],
> publicAddresses=[50.17.128.123], hardware=[id=m1.large, providerId=m1.large,
> name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null,
> type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc,
> durable=false, isBootDevice=false]], supportsImage=is64Bit()],
> loginUser=ubuntu, userMetadata={}]}] on ports [8020, 8021] for
> [50.17.128.123/32]
> Running configuration script
> Configuration script run completed
> Running configuration script
> Configuration script run completed
> Completed configuration of large-cluster
> Namenode web UI available at
> http://ec2-50-17-128-123.compute-1.amazonaws.com:50070
> Jobtracker web UI available at
> http://ec2-50-17-128-123.compute-1.amazonaws.com:50030
> Wrote Hadoop site file /home/khiem/.whirr/large-cluster/hadoop-site.xml
> Wrote Hadoop proxy script /home/khiem/.whirr/large-cluster/hadoop-proxy.sh
> Wrote instances file /home/khiem/.whirr/large-cluster/instances
> Started cluster of 2 instances
> Cluster{instances=[Instance{roles=[hadoop-datanode, hadoop-tasktracker],
> publicIp=107.20.64.97, privateIp=10.196.142.64,
> id=us-east-1/i-9aa01dfd,
> nodeMetadata=[id=us-east-1/i-9aa01dfd, providerId=i-9aa01dfd,
> group=large-cluster, name=null, location=[id=us-east-1a, scope=ZONE,
> description=us-east-1a, parent=us-east-1, iso3166Codes=[US-VA], metadata={}],
> uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu,
> version=10.04, arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, privateAddresses=[10.196.142.64],
> publicAddresses=[107.20.64.97], hardware=[id=m1.large, providerId=m1.large,
> name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null,
> type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc,
> durable=false, isBootDevice=false]], supportsImage=is64Bit()],
> loginUser=ubuntu, userMetadata={}]}, Instance{roles=[hadoop-jobtracker,
> hadoop-namenode], publicIp=50.17.128.123, privateIp=10.85.130.43,
> id=us-east-1/i-0aa31e6d, nodeMetadata=[id=us-east-1/i-0aa31e6d,
> providerId=i-0aa31e6d, group=large-cluster, name=null,
> location=[id=us-east-1a, scope=ZONE, description=us-east-1a,
> parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null,
> imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04,
> arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, privateAddresses=[10.85.130.43],
> publicAddresses=[50.17.128.123], hardware=[id=m1.large, providerId=m1.large,
> name=null, processors=[[cores=2.0, speed=2.0]], ram=7680, volumes=[[id=null,
> type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc,
> durable=false, isBootDevice=false]], supportsImage=is64Bit()],
> loginUser=ubuntu, userMetadata={}]}],
> configuration={hadoop.job.ugi=root,root,
> mapred.job.tracker=ec2-50-17-128-123.compute-1.amazonaws.com:8021,
> hadoop.socks.server=localhost:6666,
> fs.s3n.awsAccessKeyId=AKIAIGXAURLAB7CQE77A,
> fs.s3.awsSecretAccessKey=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,
> fs.s3.awsAccessKeyId=AZIAIGXIOPLAB7CQE77A,
> hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory,
> fs.default.name=hdfs://ec2-50-17-128-123.compute-1.amazonaws.com:8020/,
> fs.s3n.awsSecretAccessKey=dWDRq2z0EQegdPrbbL8Dab3eCu98O32r3gOrIbOK}}
>
> I've also started the proxy and updated the local Hadoop configuration
> following the Cloudera tutorial, but when I tried to test HDFS with
> hadoop fs -ls /
>
> the terminal prints a connection error:
>
> 12/04/12 11:54:43 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found
> in the classpath. Usage of hadoop-site.xml is deprecated. Instead use
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 12/04/12 11:54:43 INFO security.UserGroupInformation: JAAS Configuration
> already set up for Hadoop, not re-installing.
> 12/04/12 11:54:45 INFO ipc.Client: Retrying connect to server:
> ec2-50-17-128-123.compute-1.amazonaws.com/50.17.128.123:8020. Already tried 0
> time(s).
> 12/04/12 11:54:46 INFO ipc.Client: Retrying connect to server:
> ec2-50-17-128-123.compute-1.amazonaws.com/50.17.128.123:8020.
> Already tried 1 time(s).
> 12/04/12 11:54:48 INFO ipc.Client: Retrying connect to server:
> ec2-50-17-128-123.compute-1.amazonaws.com/50.17.128.123:8020. Already tried 2
> time(s).
> 12/04/12 11:54:49 INFO ipc.Client: Retrying connect to server:
> ec2-50-17-128-123.compute-1.amazonaws.com/50.17.128.123:8020. Already tried 3
> time(s)
>
> In the proxy terminal:
>
> Running proxy to Hadoop cluster at ec2-50-17-128-123.compute-1.amazonaws.com.
> Use Ctrl-c to quit.
> Warning: Permanently added
> 'ec2-50-17-128-123.compute-1.amazonaws.com,50.17.128.123' (RSA) to the list
> of known hosts.
> channel 2: open failed: connect failed: Connection refused
> channel 2: open failed: connect failed: Connection refused
> channel 2: open failed: connect failed: Connection refused
> channel 2: open failed: connect failed: Connection refused
> channel 2: open failed: connect failed: Connection refused
>
> The namenode web UI (port 50070) is also not available. I can ssh to the
> namenode, but inside the namenode it looks like there is no Hadoop or Java
> installation at all; isn't that strange?
>
> Any comment is appreciated.
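P.S. For reference, this is roughly how I checked on the namenode in the quoted mail above (the hostname comes from the launch output; treat the exact commands as a sketch rather than a recipe):

$ ssh -i ~/.ssh/id_rsa [email protected]
$ # ...then, on the master node:
$ which java hadoop              # is anything installed at all?
$ sudo jps                       # NameNode/JobTracker should be listed if they started (jps needs a JDK)
$ sudo netstat -tlnp | grep 8020 # is the namenode RPC port listening?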
