Hi all,

I am trying to sanity-check our setup, using PerformanceEvaluation
as a baseline.

To do this, I ran the following to load it up:
$HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 5
This gave me 32 regions across 2 of our 3 region servers (we have HDFS
across 17 nodes, but only 3 machines running an RS).

And then the following to scan:
$HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5

The output of the scan is:
12/01/25 15:11:02 INFO mapred.JobClient:     ROWS=5242850
12/01/25 15:11:02 INFO mapred.JobClient:     ELAPSED_TIME=1624832
(job took 52 secs in reality)

Can anyone explain how I am meant to interpret these numbers, please?
ROWS / ELAPSED_TIME works out to about 3.2 rows per <timeunit>, but I
don't know what that time unit is.
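
For reference, here is a rough sketch of the arithmetic behind that figure, using only the two counters and the wall-clock time quoted above. The reading of ELAPSED_TIME as milliseconds summed over all map tasks is an assumption on my part, not something I have confirmed, and the variable names are just for illustration.

```python
# Counters copied from the job output above.
rows = 5_242_850             # ROWS
elapsed_counter = 1_624_832  # ELAPSED_TIME, unit unknown
wall_clock_secs = 52         # what the job actually took

# Ratio of the two counters, in rows per unknown time unit.
rows_per_counter_unit = rows / elapsed_counter

# Throughput as seen from outside the job.
rows_per_wall_sec = rows / wall_clock_secs

# ASSUMPTION (unconfirmed): the counter is milliseconds summed across
# all map tasks, which would make this rows per task-second.
rows_per_task_sec = rows / (elapsed_counter / 1000.0)

print(f"rows per counter unit: {rows_per_counter_unit:.2f}")
print(f"rows per wall-clock second: {rows_per_wall_sec:.0f}")
print(f"rows per task-second (if ms): {rows_per_task_sec:.0f}")
```

If that assumption holds, the counter would be far larger than the 52 seconds of wall-clock time simply because it adds up time spent in tasks that ran in parallel.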

[I am trying to benchmark because our real data of 340M rows (215G on
HDFS) takes 60 mins to scan which seems a lot]
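
To put the real scan in comparable terms, here is the same kind of back-of-the-envelope calculation for our data. It uses only the figures above and assumes "215G" means GiB and that the scan reads the whole table once; both are assumptions for the sake of the sketch.

```python
# Figures for our real table, as quoted above.
real_rows = 340_000_000
real_size_gib = 215          # ASSUMPTION: "215G" taken as GiB
scan_secs = 60 * 60          # 60 minutes

# Aggregate scan rate over the whole job.
real_rows_per_sec = real_rows / scan_secs
real_mib_per_sec = real_size_gib * 1024 / scan_secs

# Average on-disk footprint per row.
bytes_per_row = real_size_gib * 1024**3 / real_rows

print(f"rows/sec: {real_rows_per_sec:.0f}")
print(f"MiB/sec: {real_mib_per_sec:.1f}")
print(f"bytes/row: {bytes_per_row:.0f}")
```

These are aggregate rates across the whole cluster, not per region server.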

Thanks for any pointers you might provide to help benchmark scanning,
Tim
