Yeah, I believed that too.

"The last being the jvm in which your driver runs" — isn't that one of the 3
worker daemons we have already counted?
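For what it's worth, Roshan's 1+3+3+1 tally can be sketched as simple arithmetic. This assumes standalone mode with the driver running client-side in its own JVM (which is exactly the point in question here), and the helper name `jvm_count` is just illustrative:

```python
# Sketch of the 1+3+3+1 count from this thread, assuming the driver
# runs in its own JVM rather than inside a worker daemon.
def jvm_count(masters, workers, apps=1, driver_jvms=1):
    # master daemons + worker daemons
    # + one executor JVM per worker per running application
    # + the driver's own JVM
    return masters + workers + workers * apps + driver_jvms

print(jvm_count(masters=1, workers=3))  # 1 + 3 + 3 + 1 = 8
```

If the driver instead ran inside one of the worker daemons, the last term would drop and the total would be 1+3+3 = 7, which is the disagreement above.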


On Sun, Jan 5, 2014 at 1:28 AM, Roshan Nair <[email protected]> wrote:

> I missed this. It's actually 1+3+3+1. The last being the JVM in which your
> driver runs.
>
> Roshan
> On Jan 5, 2014 1:24 AM, "Roshan Nair" <[email protected]> wrote:
>
>> Hi Archit,
>>
>> I believe it's the last case - 1+3+3.
>>
>> From what I've seen, it's one JVM per worker per Spark application.
>>
>> You will have multiple threads within a worker JVM working on different
>> partitions concurrently. The number of partitions that a worker handles
>> concurrently appears to be determined by the number of cores you've set the
>> worker (or app) to use.
>>
>> Otherwise you'd have to save an RDD to disk and reload it into memory
>> between stages, which is why Spark won't do that.
>>
>> Roshan
>> On Jan 5, 2014 1:06 AM, "Archit Thakur" <[email protected]>
>> wrote:
>>
>>> A JVM reuse doubt.
>>> Let's say I have a job which has 5 stages.
>>> Each stage has 10 tasks (10 partitions), and each task has 3
>>> transformations. My cluster is of size 4 (1 master, 3 workers). How many
>>> JVMs will be launched?
>>>
>>> 1-Master Daemon 3-Worker Daemon
>>> JVM = 1+3+10*3*5 (where 10 tasks at a time are executed in parallel on 3
>>> machines, but transformations are done sequentially, launching a JVM for
>>> every transformation in each stage.)
>>> OR
>>> 1+3+5*10 (where 10 tasks at a time are executed in parallel on 3
>>> machines, but each stage runs in a different set of JVMs)
>>> OR
>>> 1+3+5*3 (so a JVM will be reused for different partitions on a single
>>> machine, but each stage runs in a different set of JVMs)
>>> OR
>>> 1+3+3 (so, one JVM per worker in any case).
>>> OR
>>> none
>>>
>>> Thx,
>>> Archit_Thakur.
>>>
>>>
>>>
