Many thanks for the replies. The way I currently have my setup is as follows: 6 nodes running Hadoop, with each node holding approximately 5 GB of data. I launched a Spark master (and Shark via ./shark) on one of the Hadoop nodes and launched 5 Spark workers on the remaining 5 Hadoop nodes.
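For reference, here is a minimal Scala sketch of how a job would be pointed at that standalone master and read from HDFS. The host names, port, and path (spark-master, namenode:8020, /data/myfile) are placeholders for whatever your cluster actually uses:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumed host names: the standalone master on one Hadoop node,
    // the NameNode reachable at hdfs://namenode:8020.
    val conf = new SparkConf()
      .setAppName("locality-check")
      .setMaster("spark://spark-master:7077")   // standalone master URL
    val sc = new SparkContext(conf)

    // textFile reads HDFS blocks; since each worker runs on a DataNode,
    // the scheduler can place tasks on the node that holds each block.
    val lines = sc.textFile("hdfs://namenode:8020/data/myfile")
    println(lines.count())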
So I'm assuming the setup above constitutes a standalone deployment? From reading the documentation, it's best to have Spark as close as possible to HDFS, hence my choice of this setup. Is that the best way to set up Spark to ensure data locality? And would there be any benefit to running Mesos on top of that setup as well?

Thanks,
Majd
