The plan is to create an EC2 cluster and run PySpark on it. The input data comes from S3, and the output goes to HBase in a separate persistent cluster (also on EC2). My questions are:
1. I need to install some software packages on all the workers (sudo apt-get install ...). Is there a better way to do this than logging in to every node and installing them manually? (A rough sketch of what I have in mind is below.)
2. I assume Spark can access HBase even though it lives in a different cluster. Is that correct? If so, how? (See the second sketch below.)

Thanks!
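For question 1, the only alternative I can think of is scripting the install from the master instead of logging in to each node. A sketch only: it assumes a plain workers.txt file listing the worker hostnames (one per line) and passwordless SSH from the master, which spark-ec2 clusters normally set up.

#!/usr/bin/env python
"""Push an apt-get install to every worker over SSH.

Assumes workers.txt lists one worker hostname per line and that
passwordless SSH from this machine to the workers already works.
"""
import subprocess

PACKAGES = ["python-numpy"]  # placeholder; whatever packages are needed

with open("workers.txt") as f:
    workers = [line.strip() for line in f if line.strip()]

for host in workers:
    # Run the install remotely on each worker, one node at a time.
    cmd = ["ssh", "-o", "StrictHostKeyChecking=no", host,
           "sudo apt-get -y install " + " ".join(PACKAGES)]
    print("installing on %s" % host)
    subprocess.check_call(cmd)

Is there a more standard way people use for this (some spark-ec2 option, or a configuration-management tool)?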
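For question 2, my current understanding (based on the hbase_outputformat.py example that ships with Spark) is that writing to HBase comes down to pointing hbase.zookeeper.quorum at the ZooKeeper quorum of the HBase cluster, which does not have to be the cluster Spark runs on. A sketch of what I mean follows; the host and table names are placeholders, and the converter classes come from the Spark examples jar, which would have to be on the classpath along with the HBase client jars.

from pyspark import SparkContext

sc = SparkContext(appName="WriteToRemoteHBase")

# ZooKeeper quorum of the HBase cluster, not the Spark EC2 cluster.
# Hostname and table name are placeholders.
conf = {
    "hbase.zookeeper.quorum": "zk-host-of-hbase-cluster",
    "hbase.mapred.outputtable": "my_table",
    "mapreduce.outputformat.class":
        "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
    "mapreduce.job.output.key.class":
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "mapreduce.job.output.value.class":
        "org.apache.hadoop.io.Writable",
}

# Converter classes from the Spark examples jar.
keyConv = ("org.apache.spark.examples.pythonconverters."
           "StringToImmutableBytesWritableConverter")
valueConv = ("org.apache.spark.examples.pythonconverters."
             "StringListToPutConverter")

# Each element: (row key, [row key, column family, qualifier, value]).
rdd = sc.parallelize([("row1", ["row1", "cf", "col1", "value1"])])

rdd.saveAsNewAPIHadoopDataset(conf=conf,
                              keyConverter=keyConv,
                              valueConverter=valueConv)

Is that the right way to do it, and does anything else need to be open between the two clusters (security groups for the ZooKeeper and region server ports, for example)?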