Re: MR sharded Scans giving poor performance..

Ryan Rawson Mon, 26 Jul 2010 15:02:08 -0700

Hey,

A few questions:


- Sharded scan: are you not using TableInputFormat?
- 1 MB block size: which block size, HDFS or HFile?  You probably
shouldn't set the HDFS block size to 1 MB; it just causes more
NameNode traffic.
- Tests a year ago indicated that increasing the HFile block size
beyond 64k or so didn't really improve speed.
- Run more maps/machine... one map task per disk probably?
- Try setting the client cache to an in-between level, 2-6 perhaps.
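
As a rough back-of-the-envelope check on the caching suggestion, the sketch below estimates how much data one scanner round trip carries at different caching levels. It assumes the ~30 KB rows and 1 MB block size from the original post and ignores key/value and RPC overhead; the function names are made up for illustration and are not part of any HBase API.

```python
# Rough sizing sketch: bytes returned per scanner round trip for a given
# client caching value. Figures come from the original post; per-row
# key/value overhead and RPC framing are deliberately ignored.

ROW_BYTES = 30 * 1024        # ~30 KB per row (from the post)
BLOCK_BYTES = 1024 * 1024    # the 1 MB block size mentioned in the post


def payload_per_next_call(caching_rows, row_bytes=ROW_BYTES):
    """Approximate bytes fetched by one round trip when caching_rows rows
    are requested at a time (hypothetical helper)."""
    return caching_rows * row_bytes


def rows_per_block(block_bytes=BLOCK_BYTES, row_bytes=ROW_BYTES):
    """Whole rows that fit in one block of the given size (hypothetical helper)."""
    return block_bytes // row_bytes


if __name__ == "__main__":
    for caching in (1, 2, 6, 20):
        kb = payload_per_next_call(caching) / 1024
        print(f"caching={caching:>2}: ~{kb:.0f} KB per round trip")
    print(f"rows per 1 MB block: {rows_per_block()}")
```

Under these assumptions, a caching value of 2-6 keeps each round trip in the 60-180 KB range, while 20 rows pulls roughly 600 KB per call.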

Let us know about those other questions and we can go from there.
-ryan

On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman
<[email protected]> wrote:
> I am trying to assess the performance of Scans on a 100TB db on 180 nodes 
> running Hbase 0.20.5..
>
> I run a sharded scan (each Map task runs a scan on a specific range: 
> speculative execution is turned false so that there is no duplication in 
> tasks) on a fully compacted table...
>
> 1 MB block size, Block cache enabled.. Max of 2 tasks per node..  Each row is 
> 30 KB in size: 1 big column family with just one field..
> Region lease timeout is set to an hour.. And I don't get any socket timeout 
> exceptions so I have not reassigned the write socket timeout...
>
> I ran experiments on the following cases:
>
>  1.  The client level cache is set to 1 (default: got the number using 
> getCaching): The MR tasks take around 13 hours to finish on average.. 
> Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish 
> the entire job)...
>  2.  Client cache set to 20 rows: this is much worse than the previous case: 
> we get around a super low 1MBps per node...
>
>         Question: Should I set it to a value such that the block size is a 
> multiple of the above said cache size? Or the cache size to a much lower 
> value?
>
> I find that these numbers are much less than the ones I get when it's running 
> with just a few nodes..
>
> Can you guys help me with this problem?
>
> Thank you
> Vidhya
>
