I did not use a TableInputFormat: I ran my own scans on specific ranges (mainly for more control over tuning the ranges on my side, and for the ease of running it as a Hadoop streaming job).
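Roughly, each map task does something like the following (the table name, row-key boundaries and caching value here are just placeholders, not the exact values from the job, and this is from memory against the 0.20 client API):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RangeScanTask {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();   // 0.20-style config object
        HTable table = new HTable(conf, "bigtable");          // placeholder table name

        // Each task gets its own [start, stop) slice of the key space.
        Scan scan = new Scan(Bytes.toBytes(args[0]), Bytes.toBytes(args[1]));
        scan.setCaching(1);          // rows fetched per RPC; this is the knob under discussion
        scan.setCacheBlocks(true);   // block cache enabled, as in the runs described below

        ResultScanner scanner = table.getScanner(scan);
        try {
          long rows = 0;
          for (Result r : scanner) {
            rows++;                  // the real job emits the row for the streaming consumer
          }
          System.out.println("rows scanned: " + rows);
        } finally {
          scanner.close();
        }
      }
    }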
1 MB is the HFile block size, not the HDFS block size. I increased it since I didn't care too much about random-read performance; the HDFS block size is left at the default. (I have a related question, then: does the HFile block size influence only the size of the index and the efficiency of random reads? I don't see an effect on scans.)

I had previously run 5 tasks per machine with the cache at 20 rows, but that resulted in scanner expiries (UnknownScannerException) and DFS socket timeouts, so that's why I reduced the number of tasks. Let me decrease the number of rows and see.

Just to make sure: the client uses ZooKeeper only for locating -ROOT- when it performs scans, right? So scans shouldn't face any master/ZK bottlenecks when we scale up the number of nodes, am I right?

Thank you
Vidhya

On 7/26/10 3:01 PM, "Ryan Rawson" <[email protected]> wrote:

Hey,

A few questions:

- Sharded scan: are you not using TableInputFormat?
- 1 MB block size: what block size? You probably shouldn't set the HDFS block size to 1 MB, it just causes more NN traffic.
- Tests a year ago indicated that HFile block size really didn't improve speed when you went beyond 64k or so.
- Run more maps/machine... one map task per disk, probably?
- Try setting the client cache to an in-between level, 2-6 perhaps.

Let us know about those other questions and we can go from there.

-ryan

On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman <[email protected]> wrote:
> I am trying to assess the performance of scans on a 100 TB db on 180 nodes
> running HBase 0.20.5.
>
> I run a sharded scan (each map task runs a scan on a specific range;
> speculative execution is turned off so that there is no duplication of
> tasks) on a fully compacted table.
>
> 1 MB block size, block cache enabled. Max of 2 tasks per node. Each row is
> 30 KB in size: one big column family with just one field.
> Region lease timeout is set to an hour, and I don't get any socket timeout
> exceptions, so I have not reassigned the write socket timeout.
>
> I ran experiments on the following cases:
>
> 1. The client-level cache is set to 1 (the default; I got the number using
> getCaching): the MR tasks take around 13 hours to finish on average,
> which gives around 13.17 MBps per node. The worst case is 34 hours (to
> finish the entire job).
> 2. Client cache set to 20 rows: this is much worse than the previous case;
> we get around a super-low 1 MBps per node.
>
> Question: Should I set it to a value such that the block size is a multiple
> of the above-mentioned cache size? Or set the cache size to a much lower
> value?
>
> I find that these numbers are much lower than the ones I get when it's
> running with just a few nodes.
>
> Can you guys help me with this problem?
>
> Thank you
> Vidhya
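PS: for completeness, the 1 MB HFile block size was set on the column family at table-creation time, along these lines (table and family names are placeholders, and the setter names are from memory and may not match the 0.20.5 API exactly):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateBigTable {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();

        HColumnDescriptor family = new HColumnDescriptor("cf");
        family.setBlocksize(1024 * 1024);     // 1 MB HFile block size; HDFS block size stays at default
        family.setBlockCacheEnabled(true);    // block cache on, as in the scan runs

        HTableDescriptor desc = new HTableDescriptor("bigtable");
        desc.addFamily(family);

        new HBaseAdmin(conf).createTable(desc);
      }
    }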
