I missed this. It's actually 1+3+3+1, the last being the JVM in which your driver runs.
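If you want to see this for yourself, here's a rough sketch you can paste into spark-shell (assumes a standalone cluster; the variable names and the modulo key are just for illustration). It tags each task with the JVM (pid@host) and thread that ran it, across two stages separated by a shuffle, so you can count the distinct JVMs yourself:

    import java.lang.management.ManagementFactory

    // Stage 1: record which JVM and thread processed each of the 10 partitions.
    val stage1 = sc.parallelize(1 to 100, 10).mapPartitionsWithIndex { (part, it) =>
      Iterator((part, ManagementFactory.getRuntimeMXBean.getName,
                Thread.currentThread.getName, it.size))
    }

    // Force a shuffle so the next map runs in a new stage, and tag it again.
    val stage2 = stage1
      .map { case (part, jvm, thread, n) => (part % 3, jvm) }
      .groupByKey()
      .map { case (k, jvms) =>
        (k, jvms.toSet, ManagementFactory.getRuntimeMXBean.getName)
      }

    // Distinct pid@host values seen in each stage.
    stage1.collect().map(_._2).distinct.foreach(println)
    stage2.collect().foreach(println)

You should see one distinct pid@host per worker machine in both stages (plus the driver's own JVM locally), with several thread names per pid - i.e. one executor JVM per worker per application, tasks running as threads inside it.

Roshan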
On Jan 5, 2014 1:24 AM, "Roshan Nair" <[email protected]> wrote:

> Hi Archit,
>
> I believe it's the last case - 1+3+3.
>
> From what I've seen, it's one JVM per worker per Spark application.
>
> You will have multiple threads within a worker JVM working on different
> partitions concurrently. The number of partitions that a worker handles
> concurrently appears to be determined by the number of cores you've set the
> worker (or app) to use.
>
> You'd otherwise have to save an RDD to disk and reload it into memory
> between stages, which is why Spark won't do that.
>
> Roshan
>
> On Jan 5, 2014 1:06 AM, "Archit Thakur" <[email protected]> wrote:
>
>> A JVM reuse doubt.
>> Let's say I have a job which has 5 stages:
>> Each stage has 10 tasks (10 partitions). Each task has 3 transformations.
>> My cluster is of size 4 (1 master, 3 workers). How many JVMs will be
>> launched?
>>
>> 1 master daemon, 3 worker daemons
>> JVMs = 1+3+10*3*5 (where at a time 10 will be executed in parallel on 3
>> machines, but transformations are done sequentially, launching a JVM for
>> every transformation in each stage)
>> OR
>> 1+3+5*10 (where at a time 10 will be executed in parallel on 3 machines,
>> but different stages in different sets of JVMs)
>> OR
>> 1+3+5*3 (so a JVM will be reused for different partitions on a single
>> machine, but different stages in different sets of JVMs)
>> OR
>> 1+3+3 (so, one JVM per worker in any case)
>> OR
>> none
>>
>> Thx,
>> Archit_Thakur.
