On Wed, Jan 25, 2012 at 6:21 AM, Tim Robertson <[email protected]> wrote:
> Hi all,
>
Hey Tim.

> This gave me 32 regions across 2 of our 3 region servers (we have HDFS
> across 17 nodes but only machines running 3 RS).
>

Did the balancer run? I'd expect it to balance the regions across the three servers. Is something stuck in transition and stopping the balancer from running? (See the master log.)

> And then the following to scan:
> $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5
>

So it sounds like we're going against only two of the three servers.

> The output of the scan is:
> 12/01/25 15:11:02 INFO mapred.JobClient: ROWS=5242850
> 12/01/25 15:11:02 INFO mapred.JobClient: ELAPSED_TIME=1624832
> (job took 52 secs in reality)
>
> Can anyone elaborate on how I am meant to interpret these numbers
> please? Looks like 3.2 rows per <timeunit>
>

Your MR job scanned 5M rows. It looks like you had 5 clients, so you should have had 5 mappers running. ELAPSED_TIME is supposed to be the sum of the elapsed times of all the mappers. The above looks way wrong to me.

> [I am trying to benchmark because our real data of 340M rows (215G on
> HDFS) takes 60 mins to scan which seems a lot]
>

Three servers? Scanning in sequence? What rate are you seeing per server, Tim? What kind of servers are they? (I think you've posted your profile to the list before, but ... it was a while back (smile).) What size are the rows being returned?

Thanks,
St.Ack
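The arithmetic behind the counters and the real-data scan quoted above can be checked with a short Python sketch. It rests on assumptions not confirmed in the thread: that ELAPSED_TIME is in milliseconds and summed over the 5 mappers, that "215G" means GiB, and that the 340M-row scan is spread evenly over 3 region servers. The function names are hypothetical.

```python
# Rough sanity check of the numbers quoted in the thread.
# ASSUMPTIONS (not confirmed in the thread): ELAPSED_TIME is in milliseconds and
# summed across all mappers; "215G" is GiB; the real scan is spread evenly over 3 RS.

ROWS = 5_242_850          # ROWS counter from the PerformanceEvaluation scan job
ELAPSED_MS = 1_624_832    # ELAPSED_TIME counter (assumed ms, summed over mappers)
CLIENTS = 5               # "scan 5" => 5 clients, so 5 mappers expected
WALL_CLOCK_S = 52         # observed job duration


def pe_scan_summary():
    """Hypothetical helper: derive rates from the PE scan counters."""
    rows_per_ms = ROWS / ELAPSED_MS                  # the "3.2 rows per <timeunit>"
    per_mapper_s = ELAPSED_MS / CLIENTS / 1000       # implied average mapper runtime
    wall_rows_per_s = ROWS / WALL_CLOCK_S            # throughput from real job time
    return rows_per_ms, per_mapper_s, wall_rows_per_s


def real_scan_summary(rows=340_000_000, bytes_total=215 * 2**30,
                      minutes=60, servers=3):
    """Hypothetical helper: rates for the 340M-row, 215G, 60-minute scan."""
    seconds = minutes * 60
    rows_per_s = rows / seconds
    return {
        "rows_per_s_total": rows_per_s,
        "rows_per_s_per_server": rows_per_s / servers,
        "mib_per_s_total": bytes_total / 2**20 / seconds,
        "avg_row_bytes": bytes_total / rows,
    }


if __name__ == "__main__":
    print(pe_scan_summary())
    print(real_scan_summary())
```

Under these assumptions the counters imply roughly 325 s per mapper, which does not square with the 52 s the job actually took; that mismatch is why the ELAPSED_TIME figure looks wrong. The real-data figures work out to roughly 31k rows/s per server and about 680 bytes per row on average.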
