Spark uses HDFS locality: the workers pull their data directly from HDFS, a queue, etc., and the scheduler tries to run tasks on nodes that already hold the blocks. The exception is when you use parallelize; in that case the data is sent from the driver (typically running on the master) out to the worker nodes.
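A minimal sketch of the two cases (the HDFS path and partition count are just placeholders for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object DistributionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("distribution-example"))

    // Reading from HDFS: each executor reads its own input splits directly from
    // the DataNodes; the driver only schedules tasks, preferring block-local nodes.
    val fromHdfs = sc.textFile("hdfs:///data/input.txt")   // hypothetical path

    // parallelize: the collection lives in the driver JVM, so its partitions are
    // shipped from the driver to the executors when a job runs.
    val fromDriver = sc.parallelize(1 to 1000000, numSlices = 8)

    println(fromHdfs.count())
    println(fromDriver.count())

    sc.stop()
  }
}

So parallelize is fine for small test collections, but large inputs should come from a distributed source so the driver doesn't become the bottleneck.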
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Tue, Jun 24, 2014 at 11:51 AM, srujana <srujana...@persistent.co.in> wrote:
> Hi,
>
> I am working on an auto-scaling Spark cluster. I would like to know in detail how the master
> distributes the data to the slaves for processing.
>
> Any information on this would be helpful.
>
> Thanks,
> Srujana
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-data-is-distributed-while-processing-in-spark-cluster-tp8160.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.