A lot of your design depends on your read/write rates and on how much duplication there is in your inserts. For example, if your read rate is really low, your write rate is really high, and few of your inserts are duplicates, you could try:

  Row = USER_ID
  Column Qualifier = PRODUCT_ID
  MAX_VERSIONS = 1

Setting MAX_VERSIONS to 1 for the column family lets the dedupe kick in and effectively treats your column qualifiers as a set. Putting the data in the column qualifier instead of the value field means you dedupe lazily, when the data is read or compacted, instead of doing a read-modify-write (RMW) on every insert. That said, RMW works better with high duplication or a high read rate, because otherwise you write unnecessary duplicate values on every flush. Also, with RMW, consider using bloom filters if you have a high miss rate: when the key usually doesn't exist, a bloom filter check on a really large file is much cheaper than actually searching it. We used this approach to store unique email thread UUIDs for our messaging application.

I'm guessing this might be a little too advanced for your question if you're just getting up and running. I'm mostly trying to help you see that you should think about what your read/write/rewrite/modify data flow is going to look like, because HBase has a lot of knobs for optimizing a wide variety of data flows.

Nicolas

On 2/10/12 4:45 AM, "weichao" <[email protected]> wrote:

>Maybe you can build an index table, like
>
>rowkey:[USER_ID/ProductID] = { rk => main-table's rowkey}
>
>When a product is viewed, check the index, find the rk, and use the rk to
>get the row from the main table. Delete that row, write the new view, and
>update the index table's rk.
>
>Of course, using a coprocessor to handle this may make it simpler...
>
>
>2012/2/9 Mark <[email protected]>
>
>> We would like to maintain a history of all product views by a given
>> user. We are currently using a row key like USER_ID/TIMESTAMP. This
>> works; however, we would like to maintain a unique list of each user's
>> product views.
>>
>> So if I have rows like:
>>
>> mark/1328731167014262 = { data => 'Product 123' }
>> mark/1328731162502304 = { data => 'Product 456' }
>> mark/1328731157711375 = { data => 'Product 789' }
>>
>> and I view Product 789 again, I want it to look like:
>>
>> mark/1328731292355173 = { data => 'Product 789' }
>> mark/1328731167014262 = { data => 'Product 123' }
>> mark/1328731162502304 = { data => 'Product 456' }
>>
>> So it basically replaces the old value. How can this be accomplished?
>>
>> Thanks
>>
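A minimal sketch of the MAX_VERSIONS = 1 schema Nicolas suggests, assuming a toy in-memory stand-in for the column family rather than the real HBase client API. Each row is a USER_ID, each qualifier is a PRODUCT_ID, and only the newest cell per qualifier survives, so a repeat view just supersedes the old one. The names SingleVersionFamily and recent_views are hypothetical.

```python
"""Toy model of an HBase column family configured with MAX_VERSIONS = 1.

Assumption: this mimics only the visible semantics (one surviving version per
row/qualifier, newest timestamp wins); it is not the HBase client API.
"""
from collections import defaultdict


class SingleVersionFamily:
    """Row -> {qualifier -> (timestamp, value)}, keeping only the newest cell."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row, qualifier, timestamp, value=b""):
        cells = self._rows[row]
        current = cells.get(qualifier)
        # With one version kept, the cell with the highest timestamp wins,
        # so re-viewing a product replaces its older cell instead of adding one.
        if current is None or timestamp >= current[0]:
            cells[qualifier] = (timestamp, value)

    def get_row(self, row):
        """Return {qualifier: timestamp} for the surviving cells of a row."""
        return {q: ts for q, (ts, _) in self._rows.get(row, {}).items()}


def recent_views(family, user_id):
    """Unique product IDs for a user, most recently viewed first."""
    cells = family.get_row(user_id)
    # Order by cell timestamp, newest first (done client-side).
    return sorted(cells, key=cells.get, reverse=True)


if __name__ == "__main__":
    views = SingleVersionFamily()
    views.put("mark", "789", 1328731157711375)
    views.put("mark", "456", 1328731162502304)
    views.put("mark", "123", 1328731167014262)
    views.put("mark", "789", 1328731292355173)  # Product 789 viewed again
    print(recent_views(views, "mark"))
```

Replaying Mark's four views yields 789, 123, 456, the order he asks for. One caveat: a real HBase Get returns a row's cells sorted by qualifier, not by timestamp, so the recency ordering has to happen on the client, as recent_views does here.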
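A toy illustration of guarding a read-modify-write insert with a bloom filter, as Nicolas suggests for a high miss rate. This is a sketch under stated assumptions: the dict stands in for a large store file, and BloomFilter and DedupingStore are hypothetical names, not HBase classes. In HBase itself the filter is a per-column-family setting (BLOOMFILTER => 'ROW' or 'ROWCOL') applied to each store file.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes independent positions by salting the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class DedupingStore:
    """Hypothetical RMW store that skips the expensive read when the
    bloom filter proves the key is absent."""

    def __init__(self):
        self.data = {}        # stands in for a large on-disk file
        self.bloom = BloomFilter()
        self.disk_reads = 0   # lookups the filter could not avoid

    def _read(self, key):
        self.disk_reads += 1
        return self.data.get(key)

    def insert_if_new(self, key, value):
        # Pay for a real read only when the filter answers "maybe present".
        if self.bloom.might_contain(key) and self._read(key) is not None:
            return False  # duplicate: nothing written
        self.data[key] = value
        self.bloom.add(key)
        return True
```

Because a bloom filter never gives false negatives, a "definitely absent" answer safely skips the read; only "maybe present" answers (true hits plus the occasional false positive) cost a lookup, which is why the check pays off when most keys are new.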

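A minimal sketch of weichao's index-table idea, assuming plain dicts in place of the two HBase tables (ViewHistory and its methods are hypothetical names). The main table stays keyed by USER_ID/TIMESTAMP; the index table maps USER_ID/PRODUCT_ID to that product's current main-table row key, so each view deletes the stale row, writes a new one, and repoints the index.

```python
class ViewHistory:
    """Hypothetical toy model: dicts stand in for the main and index tables."""

    def __init__(self):
        self.main = {}   # "user/timestamp" -> product
        self.index = {}  # "user/product" -> current main-table row key

    def record_view(self, user, product, timestamp):
        index_key = f"{user}/{product}"
        old_rk = self.index.get(index_key)
        if old_rk is not None:
            # Step 1: delete the stale main-table row for this product.
            self.main.pop(old_rk, None)
        # Step 2: write the view under a fresh USER_ID/TIMESTAMP row key.
        new_rk = f"{user}/{timestamp}"
        self.main[new_rk] = product
        # Step 3: point the index at the new main-table row key.
        self.index[index_key] = new_rk

    def history(self, user):
        """Main-table rows for a user, newest first."""
        prefix = f"{user}/"
        keys = [k for k in self.main if k.startswith(prefix)]
        # Compare the timestamp part of the row key numerically.
        keys.sort(key=lambda k: int(k[len(prefix):]), reverse=True)
        return [(k, self.main[k]) for k in keys]


if __name__ == "__main__":
    h = ViewHistory()
    h.record_view("mark", "Product 789", 1328731157711375)
    h.record_view("mark", "Product 456", 1328731162502304)
    h.record_view("mark", "Product 123", 1328731167014262)
    h.record_view("mark", "Product 789", 1328731292355173)
    for row_key, product in h.history("mark"):
        print(row_key, product)
```

Replaying Mark's example leaves exactly the three rows he wants. Keep in mind that HBase only guarantees atomicity within a single row, so a failure between these steps can leave the two tables out of sync.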