That depends. By default, tasks are launched with a location preference, so if no free slot is currently available on Node 1, Spark will wait for one. However, with delay scheduling (see the config property spark.locality.wait), it may instead launch the tasks on other machines that have free slots and pull the data over the network.
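To make the trade-off concrete, here is a rough sketch of the decision for a single task. It is only a toy model, not Spark's scheduler code: the function name pick_node, its parameters, and the 3000 ms value in the usage lines are made up for illustration, with locality_wait_ms standing in for spark.locality.wait.

```python
from typing import Optional, Set


def pick_node(preferred: str,
              free_nodes: Set[str],
              waited_ms: int,
              locality_wait_ms: int) -> Optional[str]:
    """Toy model of delay scheduling for one task (hypothetical helper).

    Returns the node to launch on, or None if the task should keep waiting.
    """
    # A slot is free where the RDD partition is cached: run there and
    # read the data from local memory.
    if preferred in free_nodes:
        return preferred
    # No local slot yet and the locality wait has not expired: keep waiting.
    if waited_ms < locality_wait_ms:
        return None
    # The wait has expired: fall back to any node with a free slot.
    # The task then has to pull the cached data over the network.
    if free_nodes:
        return min(free_nodes)  # deterministic choice, just for the sketch
    return None


# Node 1 holds the cached partition but is busy; Node 2 has a free slot.
print(pick_node("node1", {"node2"}, waited_ms=1000, locality_wait_ms=3000))
print(pick_node("node1", {"node2"}, waited_ms=3000, locality_wait_ms=3000))
```

In this model a larger wait favors reading the cached copy locally, while a smaller wait favors starting the task sooner at the cost of a network transfer.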
On Tue, Jan 28, 2014 at 7:07 PM, Annamalai, Sai IN BLR STS <[email protected]> wrote:
> RDD's are cached, say RDD1 is cached in NODE 1. It was discussed in the
> RDD paper that distributed shared memory was compared against.
>
> So is it that if NODE 2 is free with slot and worker in NODE 2 can
> directly access mem copy of RDD1 at NODE 1 or is a transfer via network is
> inevitable???
>
> Regards,
> Sai Prasanna.
> Siemens Corporate Research Technology, Bangalore.
