Can you please elaborate? I didn't get what you intended for me to read in that link.
Regards.

On Mon, Oct 20, 2014 at 7:03 PM, Saurabh Wadhawan <saurabh.wadha...@guavus.com> wrote:

> What about:
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201310.mbox/%3CCAF_KkPwk7iiQVD2JzOwVVhQ_U2p3bPVM=-bka18v4s-5-lp...@mail.gmail.com%3E
>
> Regards
> - Saurabh Wadhawan
>
> On 20-Oct-2014, at 4:56 pm, Kamal Banga <banga.ka...@gmail.com> wrote:
>
> 1. All RDD operations are executed in workers. So reading a text file
>    or executing val x = 1 will happen on worker. (link
>    <http://stackoverflow.com/questions/24637312/spark-driver-in-apache-spark>)
>
> 2. a. Without broadcast: Let's say you have 'n' nodes. You can set Hadoop's
>       replication factor to n and it will replicate that data across all nodes.
>    b. With broadcast: using sc.broadcast() should do it. (link
>       <http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables>)
>
> On Mon, Oct 20, 2014 at 1:18 AM, Saurabh Wadhawan <saurabh.wadha...@guavus.com> wrote:
>
>> Any response for this?
>>
>> 1. How do I know what statements will be executed on worker side out of
>>    the Spark script in a stage?
>>    e.g. if I have
>>       val x = 1 (or any other code)
>>    in my driver code, will the same statements be executed on the worker
>>    side in a stage?
>>
>> 2. How can I do a map side join in Spark:
>>    a. without broadcast (i.e. by reading a file once in each executor)
>>    b. with broadcast but by broadcasting complete RDD to each executor
>>
>> Regards
>> - Saurabh Wadhawan
>>
>> On 19-Oct-2014, at 1:54 am, Saurabh Wadhawan <saurabh.wadha...@guavus.com> wrote:
>>
>> Hi,
>>
>> I have the following questions:
>>
>> 1. When I write a Spark script, how do I know what part runs on the
>>    driver side and what runs on the worker side?
>>    So let's say I write code to read a plain text file.
>>    Will it run on driver side only or will it run on server side only
>>    or on both sides?
>>
>> 2. If I want each worker to load a file for, let's say, a join, and the file
>>    is pretty huge, let's say in GBs, so that I don't want to broadcast it, then
>>    what's the best way to do it?
>>    Another way to say the same thing would be: how do I load a data
>>    structure for fast lookup (and not an RDD) on each worker node in the
>>    executor?
>>
>> Regards
>> - Saurabh
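
For reference, here is a minimal sketch of the broadcast-based map-side join asked about in question 2b, which also touches on question 1. It is not taken from any of the messages above. The object name, the helper joinPartition, the sample data, and the local[*] master are all hypothetical and chosen only for illustration. It assumes the smaller data set fits in memory on the driver and on each executor. The comments mark which parts run on the driver and which run on the executors: statements outside the functions passed to RDD operations (including val x = 1) are evaluated on the driver, while the functions themselves are shipped to the executors.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits needed on older Spark 1.x releases

object BroadcastJoinSketch {
  // Hypothetical helper. Called inside mapPartitions, so it runs on the
  // executors: it joins one partition of the large data set against the
  // in-memory lookup table and drops keys that have no match.
  def joinPartition(
      rows: Iterator[(Int, String)],
      lookup: scala.collection.Map[Int, String]): Iterator[(Int, (String, String))] =
    rows.flatMap { case (k, v) => lookup.get(k).map(w => (k, (v, w))) }

  def main(args: Array[String]): Unit = {
    // Everything in main outside an RDD closure runs on the driver.
    val conf = new SparkConf().setAppName("broadcast-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val x = 1 // plain statement in driver code: evaluated on the driver only

    // Illustrative data; in practice these would come from sc.textFile(...) or similar.
    val small = sc.parallelize(Seq(1 -> "one", 2 -> "two"))
    val large = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))

    // Pull the small side to the driver as a Map, then broadcast it so each
    // executor receives one read-only copy.
    val lookup = sc.broadcast(small.collectAsMap())

    // The function passed to mapPartitions is serialized and run on the executors.
    val joined = large.mapPartitions(rows => joinPartition(rows, lookup.value))

    joined.collect().foreach(println)
    sc.stop()
  }
}
```

Because collectAsMap() materializes the small side on the driver before broadcasting, this approach only suits a lookup table that fits comfortably in memory. It does not cover the multi-GB case raised in question 2 of the original message.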