Good idea. But it seems like this approach would give me the size of just the raw data itself, ignoring the containers (like HFiles) used to store the data. Ideally, what I'd like is to get an idea of the fixed cost (in bytes) for each of my tables, and then understand how to calculate a variable bytes/record cost on top of that.
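In other words, I'm thinking of each table's footprint as roughly size(N) = fixed + perRecord * N. Here's a rough, untested sketch of the two-point fit I have in mind, where the byte counts would come from "hadoop dfs -du /hbase/<table>" taken at two known record counts (the class name and arguments are just placeholders):

public class TableCostFit {
  public static void main(String[] args) {
    // args: <records1> <bytes1> <records2> <bytes2>
    long n1 = Long.parseLong(args[0]);      // record count at measurement 1
    long bytes1 = Long.parseLong(args[1]);  // dfs -du result at measurement 1
    long n2 = Long.parseLong(args[2]);      // record count at measurement 2
    long bytes2 = Long.parseLong(args[3]);  // dfs -du result at measurement 2

    // Slope of the line is the variable cost; intercept is the fixed overhead.
    double perRecord = (double) (bytes2 - bytes1) / (n2 - n1);
    double fixed = bytes1 - perRecord * n1;

    System.out.printf("bytes/record ~= %.1f, fixed overhead ~= %.0f bytes%n",
        perRecord, fixed);
  }
}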
Is this feasible?

Norbert

On Mon, Jan 24, 2011 at 1:16 PM, Xavier Stevens <[email protected]> wrote:

> Norbert,
>
> It would probably be best if you wrote a quick MapReduce job that
> iterates over those records and outputs the sum of bytes for each one.
> Then you could use that output and get some general descriptive
> statistics based on it.
>
> Cheers,
>
> -Xavier
>
> On 1/24/11 9:37 AM, Norbert Burger wrote:
> > Hi folks - is there a recommended way of estimating HBase HDFS usage
> > for a new environment?
> >
> > We have a DEV HBase cluster in place, and from this, I'm trying to
> > estimate the specs of our not-yet-built PROD environment. One of the
> > variables we're considering is HBase usage of HDFS. What I've just
> > tried is to calculate an average bytes/record ratio by using
> > "hadoop dfs -du /hbase", and dividing by the number of records/table.
> > But this ignores any kind of fixed overhead, so I have concerns
> > about it.
> >
> > Is there a better way?
> >
> > Norbert
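For concreteness, Xavier, is something like this roughly the kind of job you have in mind? A minimal, untested sketch against the TableMapper / TableMapReduceUtil API; the class name and arguments are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RowSizeJob {

  static class RowSizeMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      long bytes = 0;
      // Sum the serialized key + value length of every cell in this row.
      for (KeyValue kv : result.raw()) {
        bytes += kv.getLength();
      }
      context.write(new Text(Bytes.toStringBinary(row.get())), new LongWritable(bytes));
    }
  }

  public static void main(String[] args) throws Exception {
    // args[0] = table name, args[1] = HDFS output dir (both placeholders)
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "row-size");
    job.setJarByClass(RowSizeJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner batches for a full-table scan
    scan.setCacheBlocks(false);  // don't churn the block cache while scanning

    TableMapReduceUtil.initTableMapperJob(args[0], scan, RowSizeMapper.class,
        Text.class, LongWritable.class, job);
    job.setNumReduceTasks(0);    // map-only: emit one (rowkey, bytes) line per row
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

One caveat I can already see: KeyValue.getLength() counts the uncompressed key and value (plus the KeyValue's own framing), so this should give me the variable bytes/record piece, but it still won't reflect HFile block indexes, bloom filters, or compression, which I guess is the fixed/container overhead I'd have to estimate separately (e.g., from hadoop dfs -du).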
