Hi all,

I'm trying to launch an EC2 cluster using the spark-ec2 script, and it seems
that the script fails to configure HDFS properly.  What's most puzzling is
that it worked perfectly on Sunday.  Here's the command I'm using:

./spark-1.1.0-bin-hadoop2.4/ec2/spark-ec2 \
-k xxxxxxxxxxxxxx \
-i xxxxxxxxxxxxxx \
-s 5 --instance-type c3.xlarge --spot-price=0.15 \
--spark-version=1.0.0 --region us-west-2 \
launch xxxxxxxxxxxxxx

When I used this command on Sunday (the only difference being the number of
slaves), it ran for a few minutes without any intervention and created a
cluster with working HDFS across all nodes.  Today the results are
quite different:

First, I can see an error about JAVA_HOME in the script output when
launching ephemeral HDFS; here's the relevant part of the output:

RSYNC'ing /root/ephemeral-hdfs/conf to slaves...
xxxxxxxxxxxxxx
xxxxxxxxxxxxxx
Formatting ephemeral HDFS namenode...
14/10/08 09:48:09 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ip-xxxxxxxxxxxxxx
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
14/10/08 09:48:09 INFO util.GSet: VM type       = 64-bit
14/10/08 09:48:09 INFO util.GSet: 2% max memory = 17.78 MB
14/10/08 09:48:09 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/10/08 09:48:09 INFO util.GSet: recommended=2097152, actual=2097152
14/10/08 09:48:09 INFO namenode.FSNamesystem: fsOwner=root
14/10/08 09:48:09 INFO namenode.FSNamesystem: supergroup=supergroup
14/10/08 09:48:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/10/08 09:48:09 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/10/08 09:48:09 INFO namenode.FSNamesystem: isAccessTokenEnabled=false
accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/10/08 09:48:09 INFO namenode.NameNode: Caching file names occuring more
than 10 times
14/10/08 09:48:09 INFO common.Storage: Image file of size 110 saved in 0
seconds.
14/10/08 09:48:09 INFO common.Storage: Storage directory
/tmp/hadoop-root/dfs/name has been successfully formatted.
14/10/08 09:48:09 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-xxxxxxxxxxxxxx
************************************************************/
Starting ephemeral HDFS...
./ephemeral-hdfs/setup.sh: line 31: /root/ephemeral-hdfs/sbin/start-dfs.sh:
No such file or directory
starting namenode, logging to
/root/ephemeral-hdfs/libexec/../logs/hadoop-root-namenode-.out
localhost: starting datanode, logging to
/root/ephemeral-hdfs/libexec/../logs/hadoop-root-datanode-ip-xxxxxxxxxxxxxx.out
localhost: Error: JAVA_HOME is not set.
localhost: starting secondarynamenode, logging to
/root/ephemeral-hdfs/libexec/../logs/hadoop-root-secondarynamenode-ip-xxxxxxxxxxxxxx.out
localhost: Error: JAVA_HOME is not set.

(Unfortunately I don't have the log from Sunday to compare...)
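In case it helps with diagnosis: as far as I know, Hadoop's start scripts
read JAVA_HOME from conf/hadoop-env.sh on each node, and in the Hadoop 1.x
layout start-dfs.sh lives under bin/ rather than sbin/.  So a couple of
checks on the master would look like this (paths assumed from the standard
spark-ec2 layout, so treat it as a sketch):

# is JAVA_HOME exported in the Hadoop environment file?
grep JAVA_HOME /root/ephemeral-hdfs/conf/hadoop-env.sh

# which layout does this Hadoop build actually have?
ls /root/ephemeral-hdfs/bin/start-dfs.sh /root/ephemeral-hdfs/sbin/start-dfs.sh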

Second, right after that I'm asked about formatting persistent HDFS:

Setting up persistent-hdfs
~/spark-ec2/persistent-hdfs ~/spark-ec2
Pseudo-terminal will not be allocated because stdin is not a terminal.
Pseudo-terminal will not be allocated because stdin is not a terminal.
RSYNC'ing /root/persistent-hdfs/conf to slaves...
ec2-54-68-238-63.us-west-2.compute.amazonaws.com
ec2-54-68-15-31.us-west-2.compute.amazonaws.com
Formatting persistent HDFS namenode...
14/10/08 10:11:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ip-172-31-5-156/172.31.5.156
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N)

I wasn't asked this question at all when running the script earlier!
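For completeness, in case I need to get past this prompt non-interactively,
I assume piping the answer in would work (an untested sketch; note that
re-formatting wipes any existing HDFS metadata):

# answer the Y/N prompt up front (path assumed from the spark-ec2 layout)
echo Y | /root/persistent-hdfs/bin/hadoop namenode -format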

Regardless of what I answer, the end result is that HDFS in the cluster
is not working correctly: instead of being set up across all nodes, it
seems to point to a local directory on the master node.  For example,
when I ssh to the cluster master and run `/root/ephemeral-hdfs/bin/hadoop fs
-du .`, I should see paths like

hdfs://namenode-ip:9000/folder-on-hdfs
hdfs://namenode-ip:9000/file-on-hdfs
etc.

but I see

file:/root/....

and the entries listed there appear to be identical to the contents of the
/root directory on the local filesystem.
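As far as I understand Hadoop 1.x, `hadoop fs` falls back to the local
file:/// scheme when fs.default.name isn't configured, so checking
core-site.xml on the master seems like the obvious next step.  A minimal
check, with the expected value being an assumption on my part based on the
usual spark-ec2 setup:

# fs.default.name should point at the namenode, e.g.
#   <name>fs.default.name</name>
#   <value>hdfs://<namenode-private-ip>:9000</value>
grep -A 1 fs.default.name /root/ephemeral-hdfs/conf/core-site.xml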

Why could the script be behaving differently than it did before, and what
can I do to fix this?

best,
Jan Warchoł

-- 
*Jan Warchoł*
*Data Engineer*


-----------------------------------------
M: +48 509 078 203
E: jan.warc...@codilime.com
-----------------------------------------
