First a side comment: if you send an email to a mailing list like this one and don't get an answer within a few hours, sending another one right away usually won't help. It's just bad etiquette.
Now I'm reading over the whole thread and things are really not that clear to me.

- You say you have 1 region server and 3 datanodes. Is there an intersection? If not, you can't enable local reads and you take a big performance hit, although if you didn't enable them for your unit test then it's just something you might want to look at later.
- What's the machine that runs the unit test like?
- How many disks per datanode? JBOD SATA or fancier?
- Where are the mappers running? One task tracker per datanode? Or is it per regionserver (e.g. 1)?
- You say you have 8 concurrent mappers running, so I don't know if they are all on the same machine or not (see my previous question), but since you have 7 regions it means by default you can only have 7 mappers running. Where's the 8th one coming from?
- When the MR job is running, how are the disks performing (via iostat)? Again, knowing whether or not the RS is colocated with a DN would help a lot.
- Is the data set the same in the unit test and in the MR test?

Thx,

J-D

On Mon, Feb 20, 2012 at 5:42 PM, Himanish Kushary <[email protected]> wrote:
> Could somebody help me figure out whats the difference while running
> through map-reduce..is it just the concurrency that causing the issue.Will
> increasing the number of region servers help ?
>
> BTW, the master is also on the same server as the regionserver.Is it just a
> environment issue or there is some other configuration that me improve the
> read performance from within the mapper.
>
> Thanks
> Himanish
