*How many mapper/reducers are running per node for this job?*
I am running 7-8 mappers per node. The spike is seen in the mapper phase, so no reducers were running at that point.

*Also how many mappers are running as data local mappers?*
How do I determine this?

*Is your load/data equally distributed?*
Yes, as we use presplit hash keys in our HBase cluster and the data is pretty evenly distributed.

Thanks,
Rahul

On Wed, May 13, 2015 at 10:25 AM, Anil Gupta <[email protected]> wrote:

> How many mapper/reducers are running per node for this job?
> Also how many mappers are running as data local mappers?
> Is your load/data equally distributed?
>
> Your disk, cpu ratio looks ok.
>
> > On May 13, 2015, at 10:12 AM, rahul malviya <[email protected]> wrote:
> >
> > *The High CPU may be WAIT IOs, which would mean that your CPU is waiting
> > for reads from the local disks.*
> >
> > Yes, I think that's what is going on, but I am trying to understand why it
> > happens only in the case of snapshot MR; if I run the same job without
> > using a snapshot, everything is normal. What is the difference in the
> > snapshot version that can cause such a spike? I am looking through the
> > code for the snapshot version to see if I can find something.
> >
> > cores / disks == 24 / 12 or 40 / 12.
> >
> > We are using 10K SATA drives on our datanodes.
> >
> > Rahul
> >
> > On Wed, May 13, 2015 at 10:00 AM, Michael Segel <[email protected]> wrote:
> >
> >> Without knowing your exact configuration…
> >>
> >> The High CPU may be WAIT IOs, which would mean that your CPU is waiting
> >> for reads from the local disks.
> >>
> >> What’s the ratio of cores (physical) to disks?
> >> What type of disks are you using?
> >>
> >> That’s going to be the most likely culprit.
> >>
> >>> On May 13, 2015, at 11:41 AM, rahul malviya <[email protected]> wrote:
> >>>
> >>> Yes.
> >>>
> >>>> On Wed, May 13, 2015 at 9:40 AM, Ted Yu <[email protected]> wrote:
> >>>>
> >>>> Have you enabled short circuit read?
> >>>>
> >>>> Cheers
> >>>>
> >>>> On Wed, May 13, 2015 at 9:37 AM, rahul malviya <[email protected]> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I have recently started running MR on HBase snapshots, but when the MR
> >>>>> is running there is pretty high CPU usage on the datanodes and I start
> >>>>> seeing IO wait messages in the datanode logs. As soon as I kill the MR
> >>>>> on the snapshot, everything comes back to normal.
> >>>>>
> >>>>> What could be causing this?
> >>>>>
> >>>>> I am running the CDH 5.2.0 distribution.
> >>>>>
> >>>>> Thanks,
> >>>>> Rahul
