Hello,
I have Spark 1.3.1 running well on EC2 with ephemeral HDFS, launched with the
spark-ec2 script, and I'm quite happy with it.
I want to switch to persistent-hdfs so that data survives cluster
stop/starts. Unfortunately, spark-ec2 stop/start causes Spark to revert from
persistent back to ephemeral HDFS: it resets the HDFS_URL environment
variable, and several others, to the ephemeral values.
I managed to get Spark running against persistent-hdfs once by grepping all
the config files for the ephemeral HDFS address and port and changing them to
the persistent ones (9000 to 9010). Everything worked. Then I stopped and
restarted the cluster, and now I can't get persistent-hdfs to work anymore.
Here are some of the configuration values I've set up:
env: HDFS_HOME=/root/persistent-hdfs
env: HDFS_URL=hdfs://xxx.ec2.internal:9010
mapreduce/conf/core-site.xml: hdfs://ec2-xx.compute-1.amazonaws.com:9010
persistent-hdfs/conf/core-site.xml: hdfs://ec2-xxx.compute-1.amazonaws.com:9010
spark/conf/core-site.xml: hdfs://ec2-xxx.compute-1.amazonaws.com:9010
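In each of those core-site.xml files the URL is set through the standard
Hadoop 1.x property (quoting from memory, hostname again a placeholder):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://ec2-xxx.compute-1.amazonaws.com:9010</value>
    </property>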
I've restarted the daemons with persistent-hdfs/bin/stop-all.sh followed by
persistent-hdfs/bin/start-all.sh.
I can use the "hadoop" command to interact with persistent-hdfs: "hadoop fs
-ls" works, as do the other "hadoop fs" commands.
However, when I start the Python or Scala shell and try to access HDFS, I
run into the following issue:
    Py4JJavaError: An error occurred while calling o25.load.
    : java.lang.RuntimeException: java.net.ConnectException: Call to
      xxx.compute-1.amazonaws.com/12.12.12.133:9000 failed on connection
      exception: java.net.ConnectException: Connection refused
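For what it's worth, the failing call is nothing exotic; from the PySpark
shell it is along these lines (the path is a placeholder):

    # sqlContext is the SQLContext the 1.3.x PySpark shell creates for you.
    # A bare path like this gets resolved against the default filesystem,
    # which is presumably where the wrong port comes from.
    df = sqlContext.load("/data/some-table", "parquet")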
Note the port: it's 9000, as in ephemeral HDFS, instead of 9010 for
persistent.
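From inside the shell I can at least inspect which default filesystem the
running context picked up, and try sidestepping it with a fully qualified
URI. Roughly (the hostname is a placeholder, and sc._jsc is PySpark's
internal handle to the underlying JavaSparkContext):

    # Show which default FS the shell's SparkContext actually resolved.
    # fs.default.name is the Hadoop 1.x property name.
    print(sc._jsc.hadoopConfiguration().get("fs.default.name"))

    # Sidestep the default-FS lookup by spelling out the full URI:
    rdd = sc.textFile("hdfs://ec2-xxx.compute-1.amazonaws.com:9010/some/path")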
Any ideas? What configuration am I missing to get the PySpark and Scala
shells to use persistent-hdfs instead of ephemeral?
Best,
Tony