I missed this. It's actually 1+3+3+1, the last being the JVM in which your
driver runs.
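
If it helps, here's a minimal sketch (Scala, assuming a standalone cluster;
the app name and the partition count of 30 are arbitrary) that reports how
many distinct executor JVMs actually ran tasks for an application, by having
each task record the pid@host name of the JVM it runs in:

import java.lang.management.ManagementFactory
import org.apache.spark.{SparkConf, SparkContext}

object CountExecutorJvms {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("count-executor-jvms"))
    // Each task records the "pid@hostname" of the JVM executing it.
    val jvms = sc.parallelize(1 to 30, 30)
      .map(_ => ManagementFactory.getRuntimeMXBean.getName)
      .distinct()
      .collect()
    println("Tasks ran in " + jvms.length + " executor JVM(s): " + jvms.mkString(", "))
    sc.stop()
  }
}

On a 3-worker standalone cluster you should typically see 3 distinct names,
i.e. one executor JVM per worker for the application, regardless of how many
partitions or stages the job has.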

Roshan
On Jan 5, 2014 1:24 AM, "Roshan Nair" <[email protected]> wrote:

> Hi Archit,
>
> I believe it's the last case: 1+3+3.
>
> From what I've seen, it's one JVM per worker per Spark application.
>
> You will have multiple threads within a worker JVM working on different
> partitions concurrently. The number of partitions a worker handles
> concurrently appears to be determined by the number of cores you've set the
> worker (or app) to use.
>
> If each stage ran in its own JVM, you'd have to save an RDD to disk and
> reload it into memory between stages, which is why Spark won't do that.
>
> Roshan
> On Jan 5, 2014 1:06 AM, "Archit Thakur" <[email protected]> wrote:
>
>> A JVM reuse doubt.
>> Let's say I have a job which has 5 stages.
>> Each stage has 10 tasks (10 partitions), and each task has 3 transformations.
>> My cluster is of size 4 (1 master, 3 workers). How many JVMs will be
>> launched?
>>
>> 1 = master daemon, 3 = worker daemons.
>> JVMs = 1+3+10*3*5 (where 10 tasks execute in parallel on the 3 machines at a
>> time, but transformations run sequentially, launching a new JVM for every
>> transformation in each stage)
>> OR
>> 1+3+5*10 (where 10 tasks execute in parallel on the 3 machines at a time,
>> but each stage runs in a different set of JVMs)
>> OR
>> 1+3+5*3 (so a JVM is reused for different partitions on a single machine,
>> but each stage runs in a different set of JVMs)
>> OR
>> 1+3+3 (so one JVM per worker in any case).
>> OR
>> none
>>
>> Thx,
>> Archit_Thakur.
>>
>>
>>
