Hi,

I have HDFS and MapReduce running on 20 nodes, and an experimental Spark cluster running on a subset of those HDFS nodes (say 8 of them).

If some ETL is done with MapReduce, the output data will most likely be spread (and replicated) across all 20 nodes, assuming I used all of them. Is it a good idea to run the Spark cluster on all 20 nodes where HDFS is running, so that the RDDs are data-local and bulk data transfer is minimized?

Thanks,
Deb
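For context, here is a rough sketch (Scala) of how I was planning to check locality once the data is on HDFS: Spark gets each block's replica hosts from HDFS and prefers scheduling tasks there, so the question is whether every one of those hosts also runs a Spark worker. The master URL and the HDFS path below are just placeholders for my setup, not real values.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalityCheck {
  def main(args: Array[String]): Unit = {
    // Connect to the existing standalone cluster; master URL is a placeholder.
    val conf = new SparkConf()
      .setAppName("locality-check")
      .setMaster("spark://master-host:7077")
    val sc = new SparkContext(conf)

    // Placeholder path for the MR job's ETL output on HDFS.
    val rdd = sc.textFile("hdfs:///user/deb/etl-output")

    // Each partition's preferred locations come from the HDFS block replica hosts;
    // a task can only run NODE_LOCAL if a Spark worker is on one of those hosts.
    rdd.partitions.take(5).foreach { p =>
      println(s"partition ${p.index} -> ${rdd.preferredLocations(p).mkString(", ")}")
    }

    sc.stop()
  }
}
```

My expectation is that with workers on only 8 of the 20 nodes, many of the preferred hosts printed above would have no local worker, and that is where the bulk data transfer would come from.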
