Oh, you meant the main driver. Yes, correct.

On Sun, Jan 5, 2014 at 1:36 AM, Archit Thakur <[email protected]>wrote:

> Yeah, I believed that too.
>
> "The last being the JVM in which your driver runs"??? Isn't that one of
> the 3 worker daemons we have already considered?
>
>
> On Sun, Jan 5, 2014 at 1:28 AM, Roshan Nair <[email protected]> wrote:
>
>> I missed this. It's actually 1+3+3+1, the last being the JVM in which your
>> driver runs.
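>>
>> To spell that count out (under the one-executor-JVM-per-worker-per-app
>> assumption explained below): 1 master daemon + 3 worker daemons + 3
>> executor JVMs (one per worker for the app) + 1 driver JVM = 8 JVMs total.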
>>
>> Roshan
>> On Jan 5, 2014 1:24 AM, "Roshan Nair" <[email protected]> wrote:
>>
>>> Hi Archit,
>>>
>>> I believe it's the last case - 1+3+3.
>>>
>>> From what I've seen, it's one JVM per worker per Spark application.
>>>
>>> You will have multiple threads within a worker JVM working on different
>>> partitions concurrently. The number of partitions that a worker handles
>>> concurrently appears to be determined by the number of cores you've set
>>> the worker (or app) to use.
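>>>
>>> A minimal sketch of capping that core count (assuming a Spark 0.9-style
>>> SparkConf on a standalone cluster; the master URL here is hypothetical,
>>> and on Spark 0.8 you'd use System.setProperty instead):
>>>
>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>
>>>   // "spark.cores.max" caps the total cores this app may claim across
>>>   // the cluster, which in turn bounds how many partitions it can
>>>   // process concurrently.
>>>   val conf = new SparkConf()
>>>     .setMaster("spark://master:7077") // hypothetical master URL
>>>     .setAppName("CoresDemo")
>>>     .set("spark.cores.max", "6")      // e.g. 2 cores on each of 3 workers
>>>   val sc = new SparkContext(conf)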
>>>
>>> If a fresh JVM were launched per stage, you'd have to save an RDD to disk
>>> and reload it into memory between stages, which is why Spark won't do
>>> that.
>>>
>>> Roshan
>>> On Jan 5, 2014 1:06 AM, "Archit Thakur" <[email protected]>
>>> wrote:
>>>
>>>> A JVM reuse doubt.
>>>> Let's say I have a job which has 5 stages.
>>>> Each stage has 10 tasks (10 partitions), and each task has 3
>>>> transformations.
>>>> My cluster is of size 4 (1 Master, 3 Workers). How many JVMs will be
>>>> launched?
>>>>
>>>> 1 Master daemon, 3 Worker daemons.
>>>> JVMs = 1+3+10*3*5 (where 10 tasks execute in parallel on 3 machines at a
>>>> time, but transformations run sequentially, launching a JVM for every
>>>> transformation in each stage)
>>>> OR
>>>> 1+3+5*10 (where 10 tasks execute in parallel on 3 machines at a time,
>>>> but each stage runs in a different set of JVMs)
>>>> OR
>>>> 1+3+5*3 (so a JVM is reused for different partitions on a single
>>>> machine, but each stage runs in a different set of JVMs)
>>>> OR
>>>> 1+3+3 (so one JVM per Worker in any case)
>>>> OR
>>>> none
>>>>
>>>> Thx,
>>>> Archit_Thakur.
>>>>
>>>>
>>>>
>
