Thanks Xavier.  I'll give that a shot.

Norbert

On Mon, Jan 24, 2011 at 1:33 PM, Xavier Stevens <[email protected]> wrote:

> Not sure if there is a way to do that.  You could get a really rough
> estimate if you did the job I described and subtracted the total bytes
> calculated for the records from the "hadoop fs -dus /hbase/<table_name>"
> bytes.  Then that would give an idea of the amount of overhead.  I have
> a feeling it is negligible in the grand scheme of things.
>
> -Xavier
>
> On 1/24/11 10:23 AM, Norbert Burger wrote:
> > Good idea.  But it seems like this approach would give me the size of just
> > the raw data itself, ignoring any kind of container (like HFiles) that is
> > used to store the data.  What I'd like ideally is to get an idea of what the
> > fixed cost (in terms of bytes) is for each of my tables, and then understand
> > how I can calculate a variable bytes/record cost.
> >
> > Is this feasible?
> >
> > Norbert
> >
> > On Mon, Jan 24, 2011 at 1:16 PM, Xavier Stevens <[email protected]> wrote:
> >
> >> Norbert,
> >>
> >> It would probably be best if you wrote a quick MapReduce job that
> >> iterates over those records and outputs the sum of bytes for each one.
> >> Then you could use that output and get some general descriptive
> >> statistics based on it.
> >>
> >> Cheers,
> >>
> >>
> >> -Xavier
> >>
> >>
> >> On 1/24/11 9:37 AM, Norbert Burger wrote:
> >>> Hi folks - is there a recommended way of estimating HBase HDFS usage for a
> >>> new environment?
> >>>
> >>> We have a DEV HBase cluster in place, and from this, I'm trying to estimate
> >>> the specs of our not-yet-built PROD environment.  One of the variables we're
> >>> considering is HBase usage of HDFS.  What I've just tried is to calculate an
> >>> average bytes/record ratio by using "hadoop dfs -du /hbase", and dividing by
> >>> the number of records/table.  But this ignores any kind of fixed overhead,
> >>> so I have concerns about it.
> >>>
> >>> Is there a better way?
> >>>
> >>> Norbert
> >>>
>
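
The subtraction described in the thread can be sketched in a few lines of
Python. This is only a rough illustration, not anything posted by the
participants: the function name, its parameters, and the sample numbers are
all hypothetical. It assumes you already have the on-disk total reported by
"hadoop fs -dus /hbase/<table_name>", the raw byte total summed by a job over
the records, and the record count for the table.

```python
def estimate_overhead(dus_bytes, raw_bytes, num_records):
    """Rough storage-overhead estimate for one table.

    dus_bytes   -- on-disk total for the table directory (hypothetical input,
                   e.g. taken from the "hadoop fs -dus" output)
    raw_bytes   -- sum of record sizes produced by a separate job
    num_records -- number of records in the table
    """
    if dus_bytes <= 0 or num_records <= 0:
        raise ValueError("dus_bytes and num_records must be positive")

    # Whatever is on disk beyond the raw record bytes is treated as overhead.
    overhead = dus_bytes - raw_bytes
    return {
        "overhead_bytes": overhead,
        "overhead_fraction": overhead / dus_bytes,
        "raw_bytes_per_record": raw_bytes / num_records,
        "on_disk_bytes_per_record": dus_bytes / num_records,
    }


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    for name, value in estimate_overhead(1200, 1000, 100).items():
        print(name, value)
```

A single measurement like this cannot separate a fixed per-table cost from a
per-record cost; it only shows how much larger the on-disk footprint is than
the raw data. Repeating it at a few different record counts would give a
better sense of how the overhead scales.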
