Yeah, I believed that too. About "the last being the JVM in which your driver runs": isn't that one of the 3 worker daemons we have already considered?
On Sun, Jan 5, 2014 at 1:28 AM, Roshan Nair <[email protected]> wrote:

> I missed this. It's actually 1+3+3+1. The last being the JVM in which your
> driver runs.
>
> Roshan
>
> On Jan 5, 2014 1:24 AM, "Roshan Nair" <[email protected]> wrote:
>
>> Hi Archit,
>>
>> I believe it's the last case - 1+3+3.
>>
>> From what I've seen, it's one JVM per worker per Spark application.
>>
>> You will have multiple threads within a worker JVM working on different
>> partitions concurrently. The number of partitions that a worker handles
>> concurrently appears to be determined by the number of cores you've set the
>> worker (or app) to use.
>>
>> You'd have to save to disk and reload an RDD into memory between stages,
>> which is why Spark won't do that.
>>
>> Roshan
>>
>> On Jan 5, 2014 1:06 AM, "Archit Thakur" <[email protected]> wrote:
>>
>>> A JVM reuse doubt.
>>> Let's say I have a job which has 5 stages.
>>> Each stage has 10 tasks (10 partitions). Each task has 3 transformations.
>>> My cluster is size 4 (1 master, 3 workers). How many JVMs will be launched?
>>>
>>> 1 master daemon, 3 worker daemons.
>>> JVM = 1+3+10*3*5 (where at a time 10 will be executed in parallel on 3
>>> machines, but transformations are done sequentially, launching a new JVM
>>> for every transformation in each stage)
>>> OR
>>> 1+3+5*10 (where at a time 10 will be executed in parallel on 3 machines,
>>> but each stage in a different set of JVMs)
>>> OR
>>> 1+3+5*3 (so a JVM will be reused for different partitions on a single
>>> machine, but each stage in a different set of JVMs)
>>> OR
>>> 1+3+3 (so one JVM per worker in any case)
>>> OR
>>> none
>>>
>>> Thx,
>>> Archit_Thakur.
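Roshan's point in the quoted reply (one long-lived JVM per worker per application, with multiple threads inside it handling partitions concurrently, up to the configured core count) can be sketched with a small illustration. This is plain Python, not Spark code; `CORES` and `PARTITIONS` are made-up numbers standing in for the worker's core setting and a stage's partition count:

```python
# Analogy only (not Spark code): a worker "JVM" is modeled as ONE
# process, and tasks for different partitions run as threads inside
# it, rather than each task launching its own process.
import os
import threading
from concurrent.futures import ThreadPoolExecutor

CORES = 3        # hypothetical: cores assigned to the worker/app
PARTITIONS = 10  # hypothetical: tasks (partitions) in one stage

def run_task(partition_id):
    # Record which process and which thread executed this "task".
    return (partition_id, os.getpid(), threading.current_thread().name)

with ThreadPoolExecutor(max_workers=CORES) as pool:
    results = list(pool.map(run_task, range(PARTITIONS)))

pids = {pid for _, pid, _ in results}
threads = {name for _, _, name in results}

print(len(pids))     # 1: every task ran inside the same process
# At most CORES threads existed, i.e. at most CORES tasks at a time.
print(len(threads) <= CORES)
```

All 10 "tasks" share a single process id, and no more than `CORES` of them run at once, which mirrors the 1+3+3 answer: new stages reuse the same per-worker JVM instead of launching fresh ones per task, stage, or transformation.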
