Re: What executes on worker and what executes on driver side

Kamal Banga Tue, 28 Oct 2014 07:24:02 -0700

Can you please elaborate? I didn't get what you intended for me to read in
that link.


Regards.

On Mon, Oct 20, 2014 at 7:03 PM, Saurabh Wadhawan <
saurabh.wadha...@guavus.com> wrote:

>  What about:
>
>
> http://mail-archives.apache.org/mod_mbox/spark-user/201310.mbox/%3CCAF_KkPwk7iiQVD2JzOwVVhQ_U2p3bPVM=-bka18v4s-5-lp...@mail.gmail.com%3E
>
>
> Regards
>  - Saurabh Wadhawan
>
>
>
>  On 20-Oct-2014, at 4:56 pm, Kamal Banga <banga.ka...@gmail.com> wrote:
>
>  1.  The functions you pass to RDD operations are executed on the workers,
> so the actual reading of a text file happens there. A plain statement such
> as val x = 1 outside those functions runs on the driver. (link
> <http://stackoverflow.com/questions/24637312/spark-driver-in-apache-spark>)
>
>
>  2.
> a. Without broadcast: Let's say you have 'n' nodes. You can set Hadoop's
> replication factor to n, and it will replicate that data across all nodes.
> b. With broadcast: using sc.broadcast() should do it. (link
> <http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables>
> )
>
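
To make point 1 concrete, here is a minimal sketch of which statements run where. It assumes a local master and a hypothetical input.txt; neither comes from the thread. The rule of thumb it illustrates: code inside a function passed to an RDD operation runs on executors, everything else runs on the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverVsExecutor {
  def main(args: Array[String]): Unit = {
    // Driver: ordinary statements outside any RDD closure run here, once.
    val conf = new SparkConf().setAppName("driver-vs-executor").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val x = 1

    // Driver: textFile only records where the data lives; nothing is read yet.
    val lines = sc.textFile("input.txt") // hypothetical path

    // Executors: the function passed to map runs once per record inside tasks.
    // x is captured by the closure, serialized, and shipped along with each task.
    val lengths = lines.map(line => line.length + x)

    // reduce is an action, so it triggers the job: the file is read and the
    // per-partition sums are computed on executors, then merged on the driver.
    val total = lengths.reduce(_ + _)

    // Driver: the result is now a plain Int in the driver JVM.
    println(s"total = $total")
    sc.stop()
  }
}
```

So val x = 1 itself is never re-executed on a worker; only its value travels there, because the closure refers to it.
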
> On Mon, Oct 20, 2014 at 1:18 AM, Saurabh Wadhawan <
> saurabh.wadha...@guavus.com> wrote:
>
>> Any response for this?
>>
>>  1. How do I know which statements in the Spark script will be executed
>> on the worker side in a stage?
>>     e.g. if I have
>>     val x = 1 (or any other code)
>>     in my driver code, will the same statements be executed on the worker
>> side in a stage?
>>
>> 2. How can I do a map-side join in Spark:
>>    a. without broadcast (i.e. by reading a file once in each executor)
>>    b. with broadcast, but by broadcasting the complete RDD to each executor
>>
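
For option (b), a rough sketch of a map-side join with a broadcast variable. An RDD is only a handle to distributed data, so the usual approach is to collect the small side to the driver as a Map and broadcast that. The two datasets and all names below are made up for illustration, and the small side is assumed to fit in memory on the driver and on every executor.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark 1.2 and earlier

object BroadcastJoin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-join").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical data: a small dimension table and a larger fact table.
    val small = sc.parallelize(Seq(1 -> "alice", 2 -> "bob"))
    val large = sc.parallelize(Seq(1 -> 10.0, 2 -> 20.0, 1 -> 5.0, 3 -> 7.0))

    // Collect the small side to the driver, then ship one read-only copy
    // to each executor instead of one copy per task.
    val lookup = sc.broadcast(small.collectAsMap())

    // The join happens inside the map-side function, so the large RDD is
    // never shuffled. Keys missing from the lookup are dropped (inner join).
    val joined = large.flatMap { case (id, amount) =>
      lookup.value.get(id).map(name => (id, name, amount)).toList
    }

    joined.collect().foreach(println)
    sc.stop()
  }
}
```

If the "small" side is too large to collect on the driver, this approach stops being practical and a regular join of the two RDDs is the safer fallback.
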
>> Regards
>>  - Saurabh Wadhawan
>>
>>
>>
>>  On 19-Oct-2014, at 1:54 am, Saurabh Wadhawan <
>> saurabh.wadha...@guavus.com> wrote:
>>
>>  Hi,
>>
>>   I have the following questions:
>>
>>  1. When I write a Spark script, how do I know which part runs on the
>> driver side and which runs on the worker side?
>>      So let's say I write code to read a plain text file.
>>      Will it run on the driver side only, on the worker side only,
>> or on both sides?
>>
>>  2. If I want each worker to load a file for, let's say, a join, and the
>> file is pretty huge (let's say in GBs) so that I don't want to broadcast
>> it, then what's the best way to do it?
>>       Another way to say the same thing would be: how do I load a data
>> structure for fast lookup (and not an RDD) on each worker node in the
>> executor?
>>
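
One possible way to get a lookup structure loaded once per executor without broadcasting is a lazily initialized singleton. This is only a sketch under assumptions not stated in the thread: the file is a tab-separated key/value file, it is present at the same local path on every worker node, and the resulting Map fits in each executor's heap. The paths and object names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.io.Source

// A Scala object is initialized at most once per JVM, so each executor
// builds the table the first time one of its tasks touches it.
object Lookup {
  lazy val table: Map[String, String] = {
    // Hypothetical path; it must exist on every worker node.
    val src = Source.fromFile("/data/lookup.tsv")
    try {
      src.getLines()
        .map(_.split("\t", 2))
        .collect { case Array(k, v) => k -> v } // skip malformed lines
        .toMap
    } finally src.close()
  }
}

object PerExecutorLookup {
  def main(args: Array[String]): Unit = {
    // No master is set here; it is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("per-executor-lookup"))

    val keys = sc.textFile("hdfs:///path/to/keys") // hypothetical input

    val resolved = keys.mapPartitions { iter =>
      // Runs on the executor. Lookup is not serialized with the closure;
      // it is resolved in the executor JVM and loaded there on first use.
      val table = Lookup.table
      iter.map(k => k -> table.getOrElse(k, "unknown"))
    }

    resolved.saveAsTextFile("hdfs:///path/to/output") // hypothetical output
    sc.stop()
  }
}
```

The driver never holds the table in this arrangement. If the file is too big for executor memory, loading both sides as RDDs and doing an ordinary join is probably the better route.
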
>>  Regards
>> - Saurabh
>>
>>
>>
>
>
