Hi all, I am trying to sanity-check our setup, using PerformanceEvaluation as the basis for the check.
To do this, I ran the following to load it up:

  $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 5

This gave me 32 regions across 2 of our 3 RegionServers. We have HDFS across 17 nodes, but only 3 machines run RegionServers.

Then I ran the following to scan:

  $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5

The output of the scan is:

  12/01/25 15:11:02 INFO mapred.JobClient: ROWS=5242850
  12/01/25 15:11:02 INFO mapred.JobClient: ELAPSED_TIME=1624832

The job itself took 52 seconds of wall-clock time.

Can anyone explain how I am meant to interpret these numbers? It looks like 3.2 rows per <timeunit>.

I am trying to benchmark because scanning our real data, 340M rows (215G on HDFS), takes 60 minutes, which seems like a lot.

Thanks for any pointers you can give on benchmarking scans,
Tim
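For reference, the arithmetic behind the numbers above fits in a few lines of Python. This is a minimal sketch under stated assumptions, not a confirmed reading of the counters. It assumes ELAPSED_TIME is in milliseconds and is summed over all map tasks, so it is not the job's wall-clock time. It also treats the 52 seconds and the 60 minutes as observed job durations. The constant and function names are illustrative only.

```python
# Back-of-the-envelope scan throughput from the numbers in this post.
# ASSUMPTION: ELAPSED_TIME is in milliseconds and is summed over all map
# tasks, so it is NOT the same as the job's wall-clock time.

PE_ROWS = 5_242_850          # ROWS counter from the PerformanceEvaluation scan job
PE_ELAPSED_MS = 1_624_832    # ELAPSED_TIME counter (assumed ms, summed over tasks)
PE_WALL_CLOCK_S = 52         # observed job duration in seconds

REAL_ROWS = 340_000_000      # rows in the real table
REAL_WALL_CLOCK_S = 60 * 60  # observed full-scan time (60 minutes)


def rows_per_second(rows: int, seconds: float) -> float:
    """Aggregate throughput in rows per second."""
    return rows / seconds


def main() -> dict:
    summed_task_s = PE_ELAPSED_MS / 1000
    results = {
        # Rows per millisecond of summed task time (the "3.2" figure, if the unit is ms).
        "pe_rows_per_summed_ms": PE_ROWS / PE_ELAPSED_MS,
        # Cluster-wide rate based on wall-clock time.
        "pe_rows_per_s": rows_per_second(PE_ROWS, PE_WALL_CLOCK_S),
        # Average number of tasks running at once (summed task time / wall time).
        "pe_avg_concurrency": summed_task_s / PE_WALL_CLOCK_S,
        # Cluster-wide rate for the real table.
        "real_rows_per_s": rows_per_second(REAL_ROWS, REAL_WALL_CLOCK_S),
    }
    for name, value in results.items():
        print(f"{name}: {value:,.1f}")
    return results


if __name__ == "__main__":
    main()
```

With these inputs, the PE job comes out at about 3.2 rows per millisecond of summed task time and about 100,824 rows/s of wall-clock throughput. The real table comes out at about 94,444 rows/s. Row sizes in the two tables are not necessarily comparable.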
