Inline.

On Wed, May 13, 2015 at 10:31 AM, rahul malviya <[email protected]> wrote:
> *How many mapper/reducers are running per node for this job?*
> I am running 7-8 mappers per node. The spike is seen in the mapper phase,
> so no reducers were running at that point in time.
>
> *Also how many mappers are running as data local mappers?*
> How do I determine this?
>

On the counter web page of your job. Look for the "Data-local map tasks"
counter.

> *Is your load/data equally distributed?*
> Yes, as we use presplit hash keys in our HBase cluster and data is pretty
> evenly distributed.
>
> Thanks,
> Rahul
>
> On Wed, May 13, 2015 at 10:25 AM, Anil Gupta <[email protected]> wrote:
>
> > How many mapper/reducers are running per node for this job?
> > Also how many mappers are running as data local mappers?
> > Is your load/data equally distributed?
> >
> > Your disk, cpu ratio looks ok.
> >
> > Sent from my iPhone
> >
> > > On May 13, 2015, at 10:12 AM, rahul malviya <[email protected]> wrote:
> > >
> > > *The High CPU may be WAIT IOs, which would mean that your CPU is
> > > waiting for reads from the local disks.*
> > >
> > > Yes, I think that's what is going on, but I am trying to understand
> > > why it happens only in the case of snapshot MR; if I run the same job
> > > without using a snapshot, everything is normal. What is the difference
> > > in the snapshot version which can cause such a spike? I am looking
> > > through the code for the snapshot version to see if I can find
> > > something.
> > >
> > > cores / disks == 24 / 12 or 40 / 12.
> > >
> > > We are using 10K SATA drives on our datanodes.
> > >
> > > Rahul
> > >
> > > On Wed, May 13, 2015 at 10:00 AM, Michael Segel <[email protected]> wrote:
> > >
> > >> Without knowing your exact configuration…
> > >>
> > >> The High CPU may be WAIT IOs, which would mean that your CPU is
> > >> waiting for reads from the local disks.
> > >>
> > >> What’s the ratio of cores (physical) to disks?
> > >> What type of disks are you using?
> > >>
> > >> That’s going to be the most likely culprit.
> > >>
> > >>> On May 13, 2015, at 11:41 AM, rahul malviya <[email protected]> wrote:
> > >>>
> > >>> Yes.
> > >>>
> > >>>> On Wed, May 13, 2015 at 9:40 AM, Ted Yu <[email protected]> wrote:
> > >>>>
> > >>>> Have you enabled short circuit read?
> > >>>>
> > >>>> Cheers
> > >>>>
> > >>>> On Wed, May 13, 2015 at 9:37 AM, rahul malviya <[email protected]> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I have recently started running MR on HBase snapshots, but when the
> > >>>>> MR is running there is pretty high CPU usage on the datanodes and I
> > >>>>> start seeing IO wait messages in the datanode logs. As soon as I
> > >>>>> kill the MR on the snapshot, everything comes back to normal.
> > >>>>>
> > >>>>> What could be causing this?
> > >>>>>
> > >>>>> I am running the cdh5.2.0 distribution.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Rahul

--
Thanks & Regards,
Anil Gupta
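
P.S. One rough way to check the wait-IO theory raised in this thread is to compare two samples of the aggregate "cpu" line from /proc/stat on a datanode while the snapshot MR job is running. The sketch below assumes a Linux host; the function names (parse_cpu_line, iowait_fraction, cores_per_disk) are hypothetical helpers made up for illustration and are not part of HBase, Hadoop, or CDH. It also reproduces the cores/disks arithmetic quoted above.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts.

    Field order on Linux: user nice system idle iowait irq softirq steal
    guest guest_nice.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two 'cpu' line samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    # Only the first eight fields are summed: guest and guest_nice are
    # already accounted for inside user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total  # index 4 is the iowait column


def cores_per_disk(cores, disks):
    """Ratio of physical cores to data disks (hypothetical helper)."""
    return cores / disks


# Made-up sample values, for illustration only (not measured on any cluster).
sample_before = "cpu  100 0 100 700 100 0 0 0 0 0"
sample_after = "cpu  200 0 150 1000 550 0 0 0 0 0"
print(iowait_fraction(sample_before, sample_after))

# The two hardware profiles mentioned in the thread: 24 / 12 and 40 / 12.
print(cores_per_disk(24, 12), cores_per_disk(40, 12))
```

On a real node the two samples would come from reading the first line of /proc/stat a few seconds apart. A fraction that is high only while the snapshot job runs would be consistent with the mappers waiting on local disk reads, though it does not by itself explain why the snapshot path differs from the regular one.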
