Well, the implementation is quite simple for now: there are only a few classes in the sources. Please take a look; any feedback would be great!
Briefly (and simplified), the idea is the following:

Writing new data: When new data arrives (in the form of a Put), it is stored as a separate record in HBase. A magic postfix is added to the key so that the new record doesn't overwrite the existing record with the same key, which is the one meant to be updated.

Reading data: When the user fetches data using a scan, pending updates (if there are any) are processed. Thus the user always gets the latest data. The user can also run the processing separately, e.g. on some periodic basis, to avoid too much work during data scans. Currently the user can process updates for a particular record or for a set of records (the set is determined by a simple Scan).

Updates processing internals: When updates processing is fired, the records with the same original key are passed to an UpdateProcessor (UP) implementation (defined by the user) as an iterable list. The UP outputs the updated record, which is stored back in HBase (if this was requested) with an updated postfix, and the processed records are scheduled for removal. The new postfix tells which data has been processed, to preserve read consistency (as writing the updated record and deleting the processed ones is not atomic).

Hope this gives some picture. I'd encourage you to use https://groups.google.com/group/hbasehut to discuss details.

Thank you!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, Dec 1, 2010 at 6:51 PM, Renaud Delbru <[email protected]> wrote:

> Hi,
>
> could you describe the logic behind your implementation. I haven't found
> any related documents on your wiki.
>
> cheers
> --
> Renaud Delbru
>
>
> On 30/11/10 13:40, Alex Baranau wrote:
>
>> Hello,
>>
>> Let me introduce new effort around HBase: HBaseHUT.
>> It suggests solution to mentioned many times on this mailing list problem
>> "do Get on every Put operation to update record" (which causes bad write
>> performance) and suitable for many use-cases.
>>
>> Sources available here: http://github.com/sematext/HBaseHUT
>> Wiki with some idea/usage details:
>> https://github.com/sematext/HBaseHUT/wiki
>>
>> It would be great to receive the feedback on the overall idea.
>>
>> Thank you!
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> HBase
>>
>>
>
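
P.S. For those who prefer code to prose, here is a minimal in-memory sketch of the write/read/process flow described in the reply. It is only an illustration under stated assumptions and is not the HBaseHUT API: the class HutSketch, its put/scan/rowCount methods, the SEP separator and the counter-based postfix are all hypothetical names invented for this example. A sorted TreeMap stands in for the HBase table, and a plain BinaryOperator stands in for the user-defined UpdateProcessor. The sketch does not model the consistency handling needed because writing the merged record and deleting the processed ones is not atomic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Toy model of deferred updates. Hypothetical names; NOT the HBaseHUT API.
class HutSketch {
    // Assumed separator between the original key and the postfix.
    static final char SEP = '\u0000';

    // Stands in for an HBase table: rows kept sorted by key.
    private final TreeMap<String, Long> table = new TreeMap<>();
    private long nextPostfix = 0;

    // Write path: no Get before the Put, every update becomes its own row.
    void put(String key, long value) {
        String postfix = String.format("%019d", nextPostfix++);
        table.put(key + SEP + postfix, value);
    }

    // Read path: fold all rows that share an original key with the
    // user-supplied processor; optionally store the merged row back.
    Map<String, Long> scan(BinaryOperator<Long> processor, boolean storeBack) {
        Map<String, Long> merged = new TreeMap<>();
        Map<String, List<String>> processed = new TreeMap<>();
        for (Map.Entry<String, Long> row : table.entrySet()) {
            String rowKey = row.getKey();
            String original = rowKey.substring(0, rowKey.indexOf(SEP));
            merged.merge(original, row.getValue(), processor);
            processed.computeIfAbsent(original, k -> new ArrayList<>()).add(rowKey);
        }
        if (storeBack) {
            for (Map.Entry<String, List<String>> e : processed.entrySet()) {
                List<String> rows = e.getValue();
                if (rows.size() < 2) {
                    continue; // a single row has nothing to compact
                }
                // The merged row takes the postfix of the last processed row,
                // then the older processed rows are removed.
                String last = rows.get(rows.size() - 1);
                table.put(last, merged.get(e.getKey()));
                for (String old : rows.subList(0, rows.size() - 1)) {
                    table.remove(old);
                }
            }
        }
        return merged;
    }

    // Number of physical rows currently stored.
    int rowCount() {
        return table.size();
    }
}
```

With this sketch, three calls to put("a", ...) followed by scan(Long::sum, true) return the summed value for "a" and leave a single stored row for that key, while the writes themselves never read anything.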
