Thank you, Paul. I was just thinking that I could add a reducer to the step that prepares the data, to build custom logic around multiple entries that produce the same rowkey. What do you think?
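Something along these lines, roughly. This is just a sketch of the reducer logic in Python (all names here are made up for illustration; the real job would be in Java and emit HBase KeyValues/Puts into HFileOutputFormat): for each rowkey, keep only the newest N versions per column, so the bulk load never writes more versions than the column family is configured to retain.

```python
from collections import defaultdict

MAX_VERSIONS = 10  # would mirror the column family's VERSIONS setting

def reduce_rowkey(rowkey, records, max_versions=MAX_VERSIONS):
    """Reducer-style dedup for one rowkey.

    `records` is an iterable of (column, timestamp, value) tuples,
    the way the shuffle would group them by rowkey. Returns at most
    `max_versions` cells per column, newest first.
    """
    by_column = defaultdict(list)
    for column, ts, value in records:
        by_column[column].append((ts, value))

    out = []
    for column, cells in by_column.items():
        # Sort newest-first and truncate, emulating what a major
        # compaction would eventually do server-side anyway.
        cells.sort(key=lambda c: c[0], reverse=True)
        for ts, value in cells[:max_versions]:
            out.append((rowkey, column, ts, value))
    return out
```

For example, with max_versions=2, three entries for the same rowkey/column would come out as the two with the highest timestamps, and any ties or custom merge rules could be handled explicitly in that loop instead of being left to compaction.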
Sent from my iPhone

On 03/10/2012, at 17:12, Paul Mackles <[email protected]> wrote:

> Keys in hbase are a combination of rowkey/column/timestamp.
>
> Two records with the same rowkey but different column will result in two different cells with the same rowkey, which is probably what you expect.
>
> For two records with the same rowkey and same column, the timestamp will normally differentiate them, but in the case of a bulk load the timestamp could be the same, so it may actually be a tie and both will be stored. There are no updates in bulk loads.
>
> All 20 versions will get loaded, but the 10 oldest will be deleted during the next major compaction.
>
> I would definitely recommend setting up small-scale tests for all of the above scenarios to confirm.
>
> On 10/3/12 3:35 PM, "Juan P." <[email protected]> wrote:
>
>> Hi guys,
>> I've been reading up on bulk load using MapReduce jobs and I wanted to validate something.
>>
>> If the input I wanted to load into HBase produced the same key for several lines, how would HBase handle that?
>>
>> I understand the MapReduce job will create StoreFiles which the region servers just pick up and make available to the users. But is there a validation to treat the first as an insert and the rest as updates?
>>
>> What about the limit on the number of versions of a key HBase can have? If I want to have 10 versions but the bulk load has 20 values for the same key, will it only keep the last 10?
>>
>> Thanks,
>> Juan
