Hello, I have Spark 1.3.1 running well on EC2 with ephemeral HDFS, launched using the spark-ec2 script, and I'm quite happy with it.
I want to switch to persistent-hdfs so that data survives cluster stop/starts. Unfortunately, spark-ec2 stop/start causes Spark to revert from persistent to ephemeral HDFS: it changes the HDFS_URL environment variable and several others back to ephemeral.

I managed to get Spark running with persistent-hdfs once by grep'ing for the ephemeral HDFS address and ports in all files and changing them to the persistent port (9000 to 9010). All was working. Then I stopped the cluster and started it again, and now I can't get persistent to work anymore...

Here are some of the configurations I've set up:

  env: HDFS_HOME=/root/persistent-hdfs
  env: HDFS_URL=hdfs://xxx.ec2.internal:9010
  mapreduce/conf/core-site.xml: <value>hdfs://ec2-xx.compute-1.amazonaws.com:9010</value>
  persistent-hdfs/conf/core-site.xml: <value>hdfs://ec2-xxx.compute-1.amazonaws.com:9010</value>
  spark/conf/core-site.xml: <value>hdfs://ec2-xxx.compute-1.amazonaws.com:9010</value>

I've restarted the daemons using persistent-hdfs/bin/stop-all.sh ; start-all.sh . I can use the "hadoop" command to interact with persistent-hdfs: "hadoop fs -ls" works, as do other hadoop fs commands.

However, when I start the Python or Scala shell and try to access HDFS, I run into the following error:

  Py4JJavaError: An error occurred while calling o25.load.
  : java.lang.RuntimeException: java.net.ConnectException: Call to xxx.compute-1.amazonaws.com/12.12.12.133:9000 failed on connection exception: java.net.ConnectException: Connection refused

Note the port: it's 9000, as in ephemeral HDFS, instead of 9010 for persistent.

Any ideas? What configuration am I missing to get the Python / Scala shell to use persistent instead of ephemeral HDFS?

Best,
Tony

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Required-settings-for-permanent-HDFS-Spark-on-EC2-tp22860.html
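P.S. For anyone who wants to repeat the check, the grep step described above (looking for leftover references to the ephemeral port) could be sketched roughly as below. This is only an illustration, not the exact commands I ran: the function name find_port_references, the list of file suffixes, and the use of Python 3 are all assumptions, and the search root (e.g. /root) is just where the spark-ec2 directories appear to live in the paths quoted above.

```python
import os
import re


def find_port_references(root, port, suffixes=(".xml", ".sh", ".conf", ".properties")):
    """Return (path, line_number, line) for every config line mentioning ':<port>'.

    Hypothetical helper: walks `root` and scans files whose names end in one of
    `suffixes` for a host:port style reference to the given port.
    """
    # \b keeps ':9000' from also matching a longer number such as ':90001'.
    pattern = re.compile(r":%d\b" % port)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if pattern.search(line):
                            hits.append((path, number, line.strip()))
            except OSError:
                # Unreadable file (permissions, broken symlink): skip it.
                continue
    return hits


# Example use (the root directory is an assumption):
#   for path, number, line in find_port_references("/root", 9000):
#       print("%s:%d: %s" % (path, number, line))
```

Any file this reports after the edits would still point at the ephemeral namenode on 9000, which would at least narrow down where the shells might be picking up the old address.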
