Norbert,

It would probably be best if you wrote a quick MapReduce job that iterates over those records and emits the byte size of each one. Then you could compute general descriptive statistics from that output.
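A rough sketch of what that job could look like (untested; the table name, class names, and output path below are placeholders you'd swap for your own):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RowSizeJob {

  // Mapper: for each row, sum the lengths of its KeyValues
  // (key + value + per-cell metadata) and emit one size per row.
  static class RowSizeMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      long bytes = 0;
      for (KeyValue kv : value.raw()) {
        bytes += kv.getLength();
      }
      // Assumes printable row keys; hex-encode them if they're binary.
      context.write(new Text(key.get()), new LongWritable(bytes));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "row-size");
    job.setJarByClass(RowSizeJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch rows in batches per RPC
    scan.setCacheBlocks(false);  // don't pollute the block cache on a full scan

    TableMapReduceUtil.initTableMapperJob(
        "mytable",               // placeholder: your table name
        scan,
        RowSizeMapper.class,
        Text.class,
        LongWritable.class,
        job);
    job.setNumReduceTasks(0);    // map-only: one "rowkey <tab> bytes" line per row
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With reducers disabled you get one row-key/size pair per output line, which you can feed into Pig, R, or a quick script to get min/max/mean/percentiles. Note this measures logical KeyValue sizes, not on-disk HFile usage, so it won't capture block index or compression effects.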
Cheers,
-Xavier

On 1/24/11 9:37 AM, Norbert Burger wrote:
> Hi folks - is there a recommended way of estimating HBase HDFS usage for a
> new environment?
>
> We have a DEV HBase cluster in place, and from this, I'm trying to estimate
> the specs of our not-yet-built PROD environment. One of the variables we're
> considering is HBase usage of HDFS. What I've just tried is to calculate an
> average bytes/record ratio by using "hadoop dfs -du /hbase", and dividing by
> the number of records/table. But this ignores any kind of fixed overhead,
> so I have concerns about it.
>
> Is there a better way?
>
> Norbert
>
