Hi all,

I'm trying to launch an EC2 cluster using the spark-ec2 script, and it seems that the script fails to configure HDFS properly. What's most puzzling is that it worked perfectly on Sunday. Here's the command I'm using:
    ./spark-1.1.0-bin-hadoop2.4/ec2/spark-ec2 \
      -k xxxxxxxxxxxxxx \
      -i xxxxxxxxxxxxxx \
      -s 5 --instance-type c3.xlarge --spot-price=0.15 \
      --spark-version=1.0.0 --region us-west-2 \
      launch xxxxxxxxxxxxxx

When I used this command on Sunday (the only difference was the number of slaves), it ran for a few minutes without any intervention and created a cluster with working HDFS set up across all nodes. Today the results are quite different.

Firstly, I can see an error about JAVA_HOME in the script output when the ephemeral HDFS is launched. Here's the relevant part of the output:

    RSYNC'ing /root/ephemeral-hdfs/conf to slaves...
    xxxxxxxxxxxxxx
    xxxxxxxxxxxxxx
    Formatting ephemeral HDFS namenode...
    14/10/08 09:48:09 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = ip-xxxxxxxxxxxxxx
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.0.4
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
    ************************************************************/
    14/10/08 09:48:09 INFO util.GSet: VM type       = 64-bit
    14/10/08 09:48:09 INFO util.GSet: 2% max memory = 17.78 MB
    14/10/08 09:48:09 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    14/10/08 09:48:09 INFO util.GSet: recommended=2097152, actual=2097152
    14/10/08 09:48:09 INFO namenode.FSNamesystem: fsOwner=root
    14/10/08 09:48:09 INFO namenode.FSNamesystem: supergroup=supergroup
    14/10/08 09:48:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
    14/10/08 09:48:09 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    14/10/08 09:48:09 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    14/10/08 09:48:09 INFO namenode.NameNode: Caching file names occuring more than 10 times
    14/10/08 09:48:09 INFO common.Storage: Image file of size 110 saved in 0 seconds.
    14/10/08 09:48:09 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
    14/10/08 09:48:09 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at ip-xxxxxxxxxxxxxx
    ************************************************************/
    Starting ephemeral HDFS...
    ./ephemeral-hdfs/setup.sh: line 31: /root/ephemeral-hdfs/sbin/start-dfs.sh: No such file or directory
    starting namenode, logging to /root/ephemeral-hdfs/libexec/../logs/hadoop-root-namenode-.out
    localhost: starting datanode, logging to /root/ephemeral-hdfs/libexec/../logs/hadoop-root-datanode-ip-xxxxxxxxxxxxxx.out
    localhost: Error: JAVA_HOME is not set.
    localhost: starting secondarynamenode, logging to /root/ephemeral-hdfs/libexec/../logs/hadoop-root-secondarynamenode-ip-xxxxxxxxxxxxxx.out
    localhost: Error: JAVA_HOME is not set.

(Unfortunately I don't have the log from Sunday to compare...)
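In case it helps: these are the checks I intend to run on the master for the JAVA_HOME error (I'm assuming the Hadoop 1.x start scripts pick up JAVA_HOME from conf/hadoop-env.sh; the paths are taken from the output above):

    # Does the rsync'ed Hadoop config set JAVA_HOME anywhere?
    grep -n 'JAVA_HOME' /root/ephemeral-hdfs/conf/hadoop-env.sh

    # Is JAVA_HOME set in the shell environment on the master?
    echo "JAVA_HOME=${JAVA_HOME:-<unset>}"

    # setup.sh also complained about a missing start-dfs.sh;
    # see which start scripts actually exist in this Hadoop layout
    ls /root/ephemeral-hdfs/bin /root/ephemeral-hdfs/sbin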
Secondly, right after that I'm asked about formatting persistent HDFS:

    Setting up persistent-hdfs
    ~/spark-ec2/persistent-hdfs ~/spark-ec2
    Pseudo-terminal will not be allocated because stdin is not a terminal.
    Pseudo-terminal will not be allocated because stdin is not a terminal.
    RSYNC'ing /root/persistent-hdfs/conf to slaves...
    ec2-54-68-238-63.us-west-2.compute.amazonaws.com
    ec2-54-68-15-31.us-west-2.compute.amazonaws.com
    Formatting persistent HDFS namenode...
    14/10/08 10:11:05 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = ip-172-31-5-156/172.31.5.156
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.0.4
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
    ************************************************************/
    Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N)

I wasn't asked this question at all when I ran the script earlier! Regardless of what I answer, the end result is that HDFS in the cluster is not working correctly: it seems that instead of being set up across all nodes, it points to a local directory on the master node. For example, when I ssh to the cluster master and run

    /root/ephemeral-hdfs/bin/hadoop fs -du .

I should see paths like

    hdfs://namenode-ip:9000/folder-on-hdfs
    hdfs://namenode-ip:9000/file-on-hdfs

but instead I see file:/root/..., and the entries listed there seem to be identical to the contents of the /root directory on the local filesystem.

Why could the script be behaving differently than before, and what can I do to fix this?

best,
Jan Warchoł
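P.S. Since the listing shows file:/root/... paths, my guess is that the default filesystem never got pointed at the namenode in the generated config (in Hadoop 1.0.4 that would be fs.default.name in core-site.xml; the exact file locations below are my assumption, based on the conf directories rsync'ed above):

    # Is fs.default.name set to an hdfs:// URI in the generated configs?
    grep -A 1 'fs.default' /root/ephemeral-hdfs/conf/core-site.xml
    grep -A 1 'fs.default' /root/persistent-hdfs/conf/core-site.xml

    # Bypass the default filesystem by giving a full HDFS URI
    # (<namenode-ip> is a placeholder for the master's address).
    # If this lists HDFS contents while plain 'fs -du .' lists /root,
    # then the default-filesystem config is the problem.
    /root/ephemeral-hdfs/bin/hadoop fs -du hdfs://<namenode-ip>:9000/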