The fastest might be to use local mode, and avoid even the first map-only job :)
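
For reference, a minimal sketch of such a local-mode run, assuming the STORE script has been saved to a file (the filename here is just a placeholder):

    pig -x local store_to_hbase.pig

In local mode Pig executes everything in a single local process, so there is no cluster job-submission overhead at all.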
You are right, for 10 keys it does not really matter. Even doing 1000s of
updates to the same row in #2 is still an in-memory update for HBase. The
actual cost of the HBase put() is probably slightly higher for #2, but it is a
negligible part of the rest of the overhead.

On Tue, Mar 6, 2012 at 10:24 AM, Norbert Burger <[email protected]> wrote:

> Hi folks --
>
> For a very sparse HBase table (2 column families, 1000s of columns), what's
> the expected performance difference in using HBaseStorage with the
> following two STORE methods? Note that in our use case, there are only a
> handful of unique rowkeys (approx 10).
>
> 1) GROUP BY the 1000s of columns by rowkey, and write only 10 very wide
> rows into HBase
> 2) Skip the GROUP BY, and just write the raw data as is. Conceptually,
> this seems like a rewrite on the 10 rowkeys, but we're writing a different
> column each time.
>
> Originally our processing was using approach #1, but I just modified it to
> use method #2, and I'm seeing a decent performance increase. I think much
> of the difference is the overhead of launching another Hadoop job, since
> GROUP BY is a blocking operator. Any thoughts?
>
> Norbert
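
For readers following along, here is a minimal Pig Latin sketch of the two approaches being compared. It is simplified to two columns (the real table is sparse with 1000s of columns), and the table name, input path, field names, and column mapping are illustrative, not taken from the original thread:

    -- Hypothetical input: many records per rowkey, each filling in
    -- different columns.
    raw = LOAD 'input_data' AS (rowkey:chararray, col_a:chararray, col_b:chararray);

    -- Approach #2: store records as-is (map-only job). HBaseStorage treats
    -- the first field as the row key, so repeated rowkeys become repeated
    -- puts against the same row, each writing different columns.
    STORE raw INTO 'hbase://mytable'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:col_a cf1:col_b');

    -- Approach #1: GROUP BY rowkey first (adds a blocking reduce phase),
    -- then emit one wide tuple per rowkey. MAX is only a stand-in here for
    -- collapsing each sparse bag down to its non-null value per column.
    grouped = GROUP raw BY rowkey;
    wide = FOREACH grouped GENERATE group AS rowkey,
               MAX(raw.col_a) AS col_a,
               MAX(raw.col_b) AS col_b;
    STORE wide INTO 'hbase://mytable'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf1:col_a cf1:col_b');

The structural difference is what the thread is discussing: approach #2 stays a single map-only job, while approach #1 pays for an extra shuffle/reduce before anything reaches HBase.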
