Re: [ANN] HBaseHUT - HBase High Update Throughput

Alex Baranau Wed, 01 Dec 2010 10:18:52 -0800

Well, the implementation is quite simple (for now): there are only a few
classes in the sources. Please take a look; feedback would be great!

Briefly (and simplified), the idea is the following:

Writing new data:
When new data arrives (in the form of a Put), it is stored as a separate
record in HBase. A magic postfix is appended to the key so that the new
record doesn't overwrite the existing record with the same key (the one
that is meant to be updated).
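
To make the write path a bit more concrete, here is a minimal sketch. It is
not taken from the HBaseHUT sources: a sorted in-memory map stands in for
the HBase table, and the class name, method names and the "#<sequence>"
postfix layout are all assumptions made for illustration.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a sorted in-memory map stands in for an HBase table.
// The "#<sequence>" postfix is an assumed layout, not HBaseHUT's actual key format.
final class HutWriteSketch {
    private final TreeMap<String, String> table = new TreeMap<>();
    private final AtomicLong sequence = new AtomicLong();

    /** Stores the value as a new record; nothing is read and nothing is overwritten. */
    String put(String originalKey, String value) {
        String storedKey = String.format("%s#%010d", originalKey, sequence.incrementAndGet());
        table.put(storedKey, value);
        return storedKey;
    }

    /** Recovers the key the caller used by stripping the postfix. */
    static String originalKey(String storedKey) {
        return storedKey.substring(0, storedKey.lastIndexOf('#'));
    }

    /** Number of stored records, pending updates included. */
    int size() {
        return table.size();
    }
}
```

Two puts for the same original key end up as two separate records that
sort next to each other, which is what lets a later scan pick them up
together. No Get is needed at write time.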

Reading data:
When a user fetches data using a scan, pending updates (if there are any)
are processed on the fly, so the user always gets the latest data.
The user can also process updates separately, e.g. on a periodic basis, to
avoid too much work during data scans. Currently updates can be processed
for a particular record or for a set of records (the set is defined by a
simple Scan).
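
A rough sketch of the read side, under the same kind of assumptions: the
table is a sorted in-memory map, stored keys look like "original#sequence"
(an assumed layout), and a plain merge function stands in for the
user-defined processing logic. None of these names come from the HBaseHUT
sources.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.function.BinaryOperator;

// Hypothetical sketch: merges pending updates while scanning, without touching storage.
final class HutScanSketch {
    /** Returns one merged value per original key, visiting records in key order. */
    static Map<String, Long> scan(SortedMap<String, Long> table, BinaryOperator<Long> merge) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : table.entrySet()) {
            String storedKey = entry.getKey();
            int cut = storedKey.lastIndexOf('#');
            // Records without a postfix are treated as already being original keys.
            String originalKey = cut < 0 ? storedKey : storedKey.substring(0, cut);
            result.merge(originalKey, entry.getValue(), merge);
        }
        return result;
    }
}
```

With counters as values, calling scan(table, Long::sum) gives the caller
one up-to-date total per original key, no matter how many pending records
are still stored for it.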

Updates Processing internals:
When updates processing is triggered, the records with the same original
key are passed as an iterable list to an UpdateProcessor (UP)
implementation, which is defined by the user. The UP outputs the updated
record, which is stored back in HBase (if this was requested) with an
updated postfix, and the processed records are scheduled for removal. The
new postfix tells which data has already been processed; this preserves
read consistency, since writing the updated record and deleting the
processed ones is not atomic.
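
One way such a postfix could work is sketched below. This is an
illustration under assumptions, not the actual HBaseHUT code: the stored
key is assumed to be "original#first-last" (the range of write sequence
numbers a record covers), the table is a sorted in-memory map, and the
UpdateProcessor interface here is a simplified stand-in for the real one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and key layout are not taken from the HBaseHUT sources.
final class HutProcessSketch {
    /** Stand-in for the user-defined UpdateProcessor: merges records of one original key. */
    interface UpdateProcessor { long process(Iterable<Long> records); }

    // Assumed key layout: original#first-last, the range of write sequence numbers covered.
    static String key(String original, long first, long last) {
        return String.format("%s#%04d-%04d", original, first, last);
    }

    static long[] range(String storedKey) {
        String[] parts = storedKey.substring(storedKey.lastIndexOf('#') + 1).split("-");
        return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
    }

    /** Records of one original key, minus those covered by a wider (already merged) record. */
    static TreeMap<String, Long> visible(TreeMap<String, Long> table, String original) {
        // '$' is the character right after '#', so this selects every key starting with original#.
        TreeMap<String, Long> group = new TreeMap<>(table.subMap(original + "#", original + "$"));
        TreeMap<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : group.entrySet()) {
            long[] r = range(e.getKey());
            boolean covered = false;
            for (String other : group.keySet()) {
                long[] o = range(other);
                boolean wider = (o[1] - o[0]) > (r[1] - r[0]);
                if (wider && o[0] <= r[0] && r[1] <= o[1]) covered = true;
            }
            if (!covered) result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Step 1: store the merged record. Returns the processed keys, to be deleted in step 2. */
    static List<String> writeMerged(TreeMap<String, Long> table, String original, UpdateProcessor up) {
        TreeMap<String, Long> pending = visible(table, original);
        if (pending.size() < 2) return new ArrayList<>();
        long first = range(pending.firstKey())[0];
        long last = range(pending.lastKey())[1];
        table.put(key(original, first, last), up.process(pending.values()));
        return new ArrayList<>(pending.keySet());
    }

    /** Step 2: delete the processed records; readers stay correct if this lags behind step 1. */
    static void deleteProcessed(TreeMap<String, Long> table, List<String> processed) {
        processed.forEach(table::remove);
    }
}
```

The point of the range in the key: if a reader comes in between step 1 and
step 2, visible() already hides the old records because the merged record
covers them, so nothing is counted twice even though the two steps are not
atomic.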

I hope this gives a rough picture. I'd encourage you to use
https://groups.google.com/group/hbasehut to discuss details. Thank you!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, Dec 1, 2010 at 6:51 PM, Renaud Delbru <[email protected]> wrote:

>  Hi,
>
> could you describe the logic behind your implementation? I haven't found
> any related documents on your wiki.
>
> cheers
> --
> Renaud Delbru
>
>
> On 30/11/10 13:40, Alex Baranau wrote:
>
>> Hello,
>>
>> Let me introduce a new effort around HBase: HBaseHUT.
>> It offers a solution to a problem mentioned many times on this mailing
>> list, "do a Get on every Put operation to update a record" (which causes
>> bad write performance), and is suitable for many use-cases.
>>
>> Sources are available here: http://github.com/sematext/HBaseHUT
>> Wiki with some idea/usage details:
>> https://github.com/sematext/HBaseHUT/wiki
>>
>> It would be great to receive feedback on the overall idea.
>>
>> Thank you!
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> HBase
>>
>>
>
