Re: Batch Get performance degrades from within Mapreduce

Jean-Daniel Cryans Tue, 21 Feb 2012 11:49:39 -0800

First, a side comment: if you send an email to a mailing list like this
one and don't get an answer within a few hours, sending another one
right away usually won't help. It's just bad etiquette.


Now I'm reading over the whole thread and things are really not that
clear to me.

- You say you have 1 region server and 3 datanodes. Is there an
intersection, i.e. does the region server run on one of the datanode
machines? If not, you miss out on local reads and take a big
performance hit. If you didn't enable them for your unit test either,
though, it's just something you might want to look at later.

- What's the machine that runs the unit test like?

- How many disks per datanode? JBOD SATA or fancier?

- Where are the mappers running? One task tracker per datanode? Or is
it per regionserver (i.e. 1)?

- You say you have 8 concurrent mappers running... I don't know if
they are all on the same machine or not (see my previous question),
but since you have 7 regions, by default you can only have 7 mappers
running. Where's the 8th one coming from?

- When the MR job is running, how are the disks performing (via
iostat)? Again, knowing whether or not the RS is colocated with a DN
would help a lot.

- Is the data set the same in the unit test and in the MR test?

Thx,

J-D

On Mon, Feb 20, 2012 at 5:42 PM, Himanish Kushary <[email protected]> wrote:
> Could somebody help me figure out what the difference is when running
> through map-reduce? Is it just the concurrency that's causing the issue? Will
> increasing the number of region servers help?
>
> BTW, the master is also on the same server as the regionserver. Is it just an
> environment issue, or is there some other configuration that may improve the
> read performance from within the mapper?
>
> Thanks
> Himanish
