Re: Required settings for permanent HDFS Spark on EC2

2015-06-05 Thread Nicholas Chammas
If your problem is that stopping/starting the cluster resets configs, then
you may be running into this issue:

https://issues.apache.org/jira/browse/SPARK-4977
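
Until that's resolved, one stopgap - a rough sketch, assuming the stock
spark-ec2 layout on the master and its copy-dir helper script (both are
assumptions here) - is to re-apply the persistent-hdfs core-site.xml after
each start:

#!/usr/bin/env python
# Sketch: re-apply the persistent-hdfs config after "spark-ec2 start"
# regenerates the ephemeral one. Paths assume the standard spark-ec2
# layout on the master node.
import shutil
import subprocess

PERSISTENT_CONF = "/root/persistent-hdfs/conf/core-site.xml"
SPARK_CONF_DIR = "/root/spark/conf"

# Overwrite Spark's core-site.xml with the persistent-hdfs one (port 9010).
shutil.copy(PERSISTENT_CONF, SPARK_CONF_DIR + "/core-site.xml")

# Push the refreshed conf directory out to the slaves with the copy-dir
# helper that spark-ec2 places on the master.
subprocess.check_call(["/root/spark-ec2/copy-dir", SPARK_CONF_DIR])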

Nick

On Thu, Jun 4, 2015 at 2:46 PM barmaley  wrote:

> Hi - I'm having a similar problem switching from ephemeral to persistent
> HDFS - it always looks for port 9000 regardless of the options I set for
> persistent HDFS on port 9010. Have you figured out a solution? Thanks


Re: Required settings for permanent HDFS Spark on EC2

2015-06-04 Thread barmaley
Hi - I'm having a similar problem switching from ephemeral to persistent
HDFS - it always looks for port 9000 regardless of the options I set for
persistent HDFS on port 9010. Have you figured out a solution? Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Required-settings-for-permanent-HDFS-Spark-on-EC2-tp22860p23157.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Required settings for permanent HDFS Spark on EC2

2015-05-12 Thread darugar
Hello,

I have Spark 1.3.1 running well on EC2 with ephemeral HDFS using the
spark-ec2 script, and I'm quite happy with it.

I want to switch to persistent-hdfs in order to maintain data between
cluster stops/starts. Unfortunately, spark-ec2 stop/start causes Spark to
revert from persistent back to ephemeral HDFS - it changes the HDFS_URL
environment variable and several others back to their ephemeral values.

I managed to get Spark with persistent-hdfs running once by grep'ing for
the ephemeral HDFS address and port in all files and changing them to the
persistent port (9000 to 9010). All was working. Then I stopped the cluster
and started it again, and now I can't get persistent HDFS to work anymore...

Here are some of the configurations I've set up:

env: HDFS_HOME=/root/persistent-hdfs
env: HDFS_URL=hdfs://xxx.ec2.internal:9010
mapreduce/conf/core-site.xml:       hdfs://ec2-xx.compute-1.amazonaws.com:9010
persistent-hdfs/conf/core-site.xml: hdfs://ec2-xxx.compute-1.amazonaws.com:9010
spark/conf/core-site.xml:           hdfs://ec2-xxx.compute-1.amazonaws.com:9010
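
For reference, a small sketch of the grep step described above - the
directory list is an assumption based on the stock spark-ec2 layout - that
flags any lingering references to the ephemeral port:

#!/usr/bin/env python
# Sketch: scan the config directories for leftover ":9000" references
# after switching to persistent HDFS on port 9010.
import os

CONF_DIRS = ["/root/spark/conf", "/root/mapreduce/conf",
             "/root/persistent-hdfs/conf", "/root/ephemeral-hdfs/conf"]

for conf_dir in CONF_DIRS:
    for root, _dirs, files in os.walk(conf_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path) as fh:
                for lineno, line in enumerate(fh, 1):
                    if ":9000" in line:
                        print("%s:%d: %s" % (path, lineno, line.strip()))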

I've restarted the daemons using persistent-hdfs/bin/stop-all.sh followed
by persistent-hdfs/bin/start-all.sh.

I can use the "hadoop" command to interact with persistent-hdfs - "hadoop fs
-ls" works, as do other hadoop fs commands.

However, when I start the Python or Scala shell and try to access HDFS, I
run into the following issue:

Py4JJavaError: An error occurred while calling o25.load.
: java.lang.RuntimeException: java.net.ConnectException: Call to
xxx.compute-1.amazonaws.com/12.12.12.133:9000 failed on connection
exception: java.net.ConnectException: Connection refused

Note the port: it's 9000, the ephemeral HDFS port, instead of 9010 for
persistent HDFS.
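
A per-session workaround - just a sketch run from the pyspark shell, where
sc already exists; the hostname is a placeholder, and sc._jsc is PySpark's
internal JavaSparkContext handle rather than a supported API - is to bypass
the bad default filesystem entirely:

# Option 1: spell out the persistent-hdfs URI, port and all, so the
# default (port 9000) is never consulted.
rdd = sc.textFile("hdfs://ec2-xxx.compute-1.amazonaws.com:9010/some/path")

# Option 2: override the default filesystem for this session before
# creating any RDDs. fs.default.name is the Hadoop 1.x key and
# fs.defaultFS its Hadoop 2.x successor; setting both is harmless.
hconf = sc._jsc.hadoopConfiguration()
hconf.set("fs.default.name", "hdfs://ec2-xxx.compute-1.amazonaws.com:9010")
hconf.set("fs.defaultFS", "hdfs://ec2-xxx.compute-1.amazonaws.com:9010")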

Any ideas? What configuration am I missing to get pyshell / scala shell to
use persistent instead of ephemeral?

Best,

Tony



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Required-settings-for-permanent-HDFS-Spark-on-EC2-tp22860.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org