OK, thanks Bejoy. So it only helps in certain scenarios, like the one you mentioned: far more map tasks than available map slots.
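To put rough numbers on your example, here is a minimal back-of-the-envelope sketch in Python. It assumes one JVM per task when reuse is off and, with unlimited reuse, one JVM per occupied map slot. The function names are mine, not anything from Hadoop, and it ignores task failures and speculative execution.

```python
import math


def waves(map_tasks: int, map_slots: int) -> int:
    """Number of sequential rounds needed to run all map tasks."""
    return math.ceil(map_tasks / map_slots)


def jvm_launches(map_tasks: int, map_slots: int, reuse: bool) -> int:
    """Rough count of task JVMs started for one job's map phase.

    Assumption: without reuse every task gets a fresh JVM; with
    unlimited reuse each occupied slot keeps a single JVM and runs
    its tasks one after another.
    """
    if not reuse:
        return map_tasks
    return min(map_tasks, map_slots)


tasks, slots = 500, 100  # the figures from the example in this thread
saved = jvm_launches(tasks, slots, reuse=False) - jvm_launches(tasks, slots, reuse=True)
print(f"waves per slot: {waves(tasks, slots)}")
print(f"JVM launches avoided with reuse: {saved}")
```

With 50 tasks on the same 100 slots the saving drops to zero, which is the case I was originally worried about.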
Regards,
Rahul

On Tue, Apr 16, 2013 at 2:40 PM, Bejoy Ks <bejoy.had...@gmail.com> wrote:

> Hi Rahul
>
> If you look at larger clusters and jobs that involve larger input data
> sets, the data would be spread across the whole cluster, and a single node
> might hold several blocks of that data set. Imagine you have a
> cluster with 100 map slots and your job has 500 map tasks. In that case
> there will be multiple map tasks on a single TaskTracker, based on slot
> availability.
>
> Here, if you enable JVM reuse, all tasks related to a job on a single
> TaskTracker would use the same JVM. The benefit is just the time you
> save in spawning and cleaning up a JVM for each individual task.
>
> On Tue, Apr 16, 2013 at 2:04 PM, Rahul Bhattacharjee <
> rahul.rec....@gmail.com> wrote:
>
>> Hi,
>>
>> I have a question related to VM reuse in Hadoop. I now understand the
>> purpose of VM reuse, but I am wondering how it is useful.
>>
>> For example, for VM reuse to kick in and be effective, we need more than
>> one mapper task to be submitted to a single node (for the same job). Hadoop
>> prefers to spawn mappers on the nodes that actually contain the data,
>> so it might rarely happen that multiple mappers are allocated to a single
>> TaskTracker. And even if a single node gets to run multiple mappers,
>> they might as well run in parallel in multiple VMs rather than
>> sequentially in a single VM.
>>
>> I am sure I am missing some link here; please help me find it.
>>
>> Thanks,
>> Rahul