Many thanks for the replies.
The way I currently have my setup is as follows:
6 nodes running Hadoop with each node having approximately 5GB of data.
Launched a Spark Master (and Shark via ./shark) on one of the Hadoop nodes
and launched 5 worker Spark nodes on the remaining 5 Hadoop nodes.
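For reference, the launch sequence described above can be sketched roughly as below. This is only a sketch: the script locations and arguments vary across Spark versions (older releases kept these scripts under bin/, newer ones under sbin/), and "master-host" is a placeholder for your actual master's hostname.

```shell
# On the master node: start the standalone Spark master.
# It logs a URL of the form spark://master-host:7077.
./bin/start-master.sh

# On each of the 5 remaining Hadoop nodes: start a worker
# and point it at the master's URL.
./bin/start-slave.sh spark://master-host:7077

# Then launch Shark on the master node (assuming the master URL
# is configured, e.g. via conf/shark-env.sh):
./shark
```
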
So, if you intend to run Hadoop MapReduce and Spark on the same cluster
concurrently, and you have enough memory on the JobTracker master, then you can
run the Spark master (for standalone mode, as Raymond mentions) on the same node.
This is not necessary, but it is convenient because you only have to ssh into
one machine.
I'm not sure what you aim to solve. When you mention the Spark master, I guess you
probably mean Spark standalone mode? In that case the Spark cluster does not
necessarily need to be coupled with the Hadoop cluster. However, if you aim to
achieve better data locality, then yes, running a Spark worker on each HDFS
DataNode might help.