Why don't you aggregate these data in a preprocessing step... like a map-reduce job? You can then load the output of that work directly into HBase.
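A minimal sketch of what such a preprocessing job could look like, assuming
plain comma-separated input like your sample; the class name and the
input/output paths are placeholders, not anything HBase ships with:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AggregateAmounts {

  // Emits (product-id, amount) for each input line such as "P1, 1000".
  public static class AmountMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      if (fields.length == 2) {
        context.write(new Text(fields[0].trim()),
            new LongWritable(Long.parseLong(fields[1].trim())));
      }
    }
  }

  // The shuffle delivers every amount for a given product-id to the same
  // reducer call, so the sum is computed without any shared HTable state.
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
        Context context) throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      context.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "aggregate-amounts");
    job.setJarByClass(AggregateAmounts.class);
    job.setMapperClass(AmountMapper.class);
    job.setCombinerClass(SumReducer.class); // safe: summing is associative
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the grouping happens in the shuffle, no mapper ever needs to GET from
the live table, which sidesteps the race condition entirely. The output is one
pre-aggregated line per product-id, which you can then bulk load the usual way
(e.g. importtsv, or a second job writing HFiles via HFileOutputFormat plus
completebulkload).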
-n

On Wed, Nov 28, 2012 at 5:37 AM, Narayanan K <[email protected]> wrote:
> Hi all,
>
> I have a scenario where I need to do aggregation while bulk loading into
> HBase.
>
> Say, for example, I have the following rows in my flat file, each with 2
> fields - product-id, amount. Values as below:
>
> P1, 1000
> P2, 200
> P3, 2500
> P1, 1500
> P2, 300
>
> My rowkey is product-id and I have a column: details:amount=<val>
>
> What I want is, after the bulk load of the above file, the table must have
> the following rows and column values:
>
> P1 -- details:amount=2500
> P2 -- details:amount=500
> P3 -- details:amount=2500
>
> My understanding of Bulk Load is that, when the map function gets a row
> from the file, it can do some transformation, prepare the rowkey and
> columns, and then write to the HBase table.
>
> But in our case, we would need an instance of the HTable in the Mapper, do
> a GET operation to check whether the rowkey already exists, add up the
> column amounts, and then write back. But in that case, all parallel
> mappers will open a connection to the same table and the GETs will not be
> synchronized - leading to race conditions, right?
>
> Is this the right way to do it? If not, what are the other ways by which
> this can be achieved?
>
> Thanks in advance,
> Narayanan K
