Oh, you meant the main driver. Yes, correct.
On Sun, Jan 5, 2014 at 1:36 AM, Archit Thakur <[email protected]> wrote:

> Yeah, I believed that too.
>
> "The last being the JVM in which your driver runs."??? Isn't it in the 3
> worker daemons we have already considered?
>
> On Sun, Jan 5, 2014 at 1:28 AM, Roshan Nair <[email protected]> wrote:
>
>> I missed this. It's actually 1+3+3+1, the last being the JVM in which
>> your driver runs.
>>
>> Roshan
>>
>> On Jan 5, 2014 1:24 AM, "Roshan Nair" <[email protected]> wrote:
>>
>>> Hi Archit,
>>>
>>> I believe it's the last case: 1+3+3.
>>>
>>> From what I've seen, it's one JVM per worker per Spark application.
>>>
>>> You will have multiple threads within a worker JVM working on different
>>> partitions concurrently. The number of partitions that a worker handles
>>> concurrently appears to be determined by the number of cores you've set
>>> the worker (or app) to use.
>>>
>>> Otherwise, you'd have to save an RDD to disk and reload it into memory
>>> between stages, which is why Spark won't launch new JVMs per stage.
>>>
>>> Roshan
>>>
>>> On Jan 5, 2014 1:06 AM, "Archit Thakur" <[email protected]> wrote:
>>>
>>>> A JVM reuse doubt.
>>>> Let's say I have a job with 5 stages. Each stage has 10 tasks (10
>>>> partitions), and each task has 3 transformations. My cluster has 4
>>>> machines (1 master, 3 workers). How many JVMs will be launched?
>>>>
>>>> 1 master daemon, 3 worker daemons, and then one of:
>>>>
>>>> JVMs = 1+3+10*3*5 (10 tasks execute in parallel on 3 machines, but
>>>> transformations run sequentially, launching a new JVM per
>>>> transformation in each stage)
>>>> OR
>>>> 1+3+5*10 (10 tasks execute in parallel on 3 machines, but each stage
>>>> runs in a different set of JVMs)
>>>> OR
>>>> 1+3+5*3 (a JVM is reused across partitions on a single machine, but
>>>> each stage runs in a different set of JVMs)
>>>> OR
>>>> 1+3+3 (one JVM per worker in any case)
>>>> OR
>>>> none of the above.
>>>>
>>>> Thx,
>>>> Archit_Thakur.
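[Editor's note: a minimal sketch to observe the behavior described above. It assumes a 3-worker standalone cluster; the object name JvmPerWorkerCheck and the app name are illustrative. Each task reports the pid@host of the JVM it ran in; with one executor JVM per worker per application, the 10 tasks should yield only 3 distinct values, and the driver itself is a fourth JVM, hence 1+3+3+1.]

```scala
import java.lang.management.ManagementFactory
import org.apache.spark.{SparkConf, SparkContext}

object JvmPerWorkerCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("jvm-per-worker-check")
    val sc = new SparkContext(conf)

    // 10 partitions, so 10 tasks; each task records the JVM it ran in.
    // getRuntimeMXBean.getName returns a string like "12345@worker-host".
    val jvmNames = sc.parallelize(1 to 10, 10)
      .map(_ => ManagementFactory.getRuntimeMXBean.getName)
      .collect()
      .distinct

    // With 3 workers and one executor JVM per worker per application,
    // this prints 3 distinct pid@host values rather than 10.
    jvmNames.foreach(println)
    sc.stop()
  }
}
```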
