Hi Archit, I believe it's the last case: 1+3+3.
From what I've seen, it's one JVM per worker per Spark application. You will have multiple threads within that worker JVM working on different partitions concurrently, and the number of partitions a worker handles concurrently appears to be determined by the number of cores you've set the worker (or app) to use. If Spark launched a new JVM per stage or per transformation, it would have to save each RDD to disk and reload it into memory between stages, which is why Spark doesn't do that. (A rough code sketch follows after the quoted message below.)

Roshan

On Jan 5, 2014 1:06 AM, "Archit Thakur" <[email protected]> wrote:

> A JVM reuse doubt.
> Let's say I have a job which has 5 stages.
> Each stage has 10 tasks (10 partitions). Each task has 3 transformations.
> My cluster is size 4 (1 master, 3 workers). How many JVMs will be launched?
>
> 1 master daemon, 3 worker daemons
> JVM = 1+3+10*3*5 (where at a time 10 will be executed in parallel on 3
> machines, but transformations are done sequentially, launching a JVM for
> every transformation in each stage)
> OR
> 1+3+5*10 (where at a time 10 will be executed in parallel on 3 machines,
> but each stage in a different set of JVMs)
> OR
> 1+3+5*3 (so a JVM will be reused for different partitions on a single
> machine, but each stage in a different set of JVMs)
> OR
> 1+3+3 (so one JVM per worker in any case)
> OR
> none
>
> Thx,
> Archit_Thakur.
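Here is a minimal Scala sketch of the setup discussed above. It uses the standard SparkConf/SparkContext API; the master URL, partition count, and class name are just illustrative assumptions, not something from Archit's job. "local[3]" simulates 3 cores in a single JVM; on a standalone cluster you would instead get one executor JVM per worker for the application.

// Minimal sketch, assuming spark-core is on the classpath.
import org.apache.spark.{SparkConf, SparkContext}

object JvmReuseSketch {
  def main(args: Array[String]): Unit = {
    // local[3] = one JVM with 3 task threads, i.e. up to 3 partitions
    // processed concurrently; analogous to giving a worker 3 cores.
    val conf = new SparkConf()
      .setAppName("jvm-reuse-sketch")
      .setMaster("local[3]")
    val sc = new SparkContext(conf)

    // 10 partitions, 3 chained narrow transformations: these are pipelined
    // inside the same task threads, so no new JVM is launched per
    // transformation or per stage.
    val rdd = sc.parallelize(1 to 100, 10)
      .map(_ * 2)
      .filter(_ % 3 == 0)
      .map(_.toString)

    println(rdd.count())
    sc.stop()
  }
}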
