The plan is to create an EC2 cluster and run PySpark on it. Input data comes
from S3, and output data goes to HBase in a separate, persistent cluster (also on EC2).
My questions are:

1. I need to install some software packages on all the workers (sudo apt-get
install ...). Is there a better way to do this than logging in to every node
and installing them manually? (A rough sketch of what I have in mind is below,
after the questions.)

2. I assume Spark can access the HBase instance even though it lives in a
different cluster. Am I correct? If so, how? (See the PySpark write sketch
below.)
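For question 1, this is roughly what I mean by avoiding manual installs: a
small script that pushes the same apt-get command to every worker over SSH.
The workers file path and package names are just placeholders for my setup:

    import subprocess

    # Placeholder: a file listing worker hostnames, one per line
    # (spark-ec2 keeps a similar list, but the exact path may differ)
    WORKERS_FILE = "/root/spark-ec2/slaves"
    # Placeholder packages I want on every node
    PACKAGES = ["libsnappy-dev", "python-numpy"]

    def install_on_workers():
        with open(WORKERS_FILE) as f:
            workers = [line.strip() for line in f if line.strip()]
        for host in workers:
            # Run the same install command on each worker over SSH
            subprocess.check_call(
                ["ssh", host, "sudo apt-get install -y " + " ".join(PACKAGES)])

    if __name__ == "__main__":
        install_on_workers()

Is there a more standard way than scripting it like this?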
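For question 2, this is what I am picturing for the write path, modelled on
the PySpark HBase output-format example that ships with Spark. The ZooKeeper
quorum, table name, and record layout are placeholders for my persistent
cluster, and I assume the HBase client jars plus the Spark examples jar (for
the Python converters) would have to be on the classpath:

    from pyspark import SparkContext

    sc = SparkContext(appName="HBaseWriteSketch")

    # Placeholders: ZooKeeper quorum of the persistent HBase cluster
    # and the destination table
    zk_quorum = "zk-host-of-hbase-cluster"
    table = "my_output_table"

    conf = {
        "hbase.zookeeper.quorum": zk_quorum,
        "hbase.mapred.outputtable": table,
        "mapreduce.outputformat.class":
            "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
        "mapreduce.job.output.key.class":
            "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "mapreduce.job.output.value.class":
            "org.apache.hadoop.io.Writable",
    }

    # Each record: (row_key, [row_key, column_family, qualifier, value])
    rows = sc.parallelize([("row1", ["row1", "cf", "col", "value1"])])

    rows.saveAsNewAPIHadoopDataset(
        conf=conf,
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "StringToImmutableBytesWritableConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "StringListToPutConverter")

Would pointing hbase.zookeeper.quorum at the other cluster like this be
enough, assuming the security groups allow the traffic between the two
clusters?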

Thanks!
