2010/6/22 N Kapshoo
> Is there any querying value in separating out values tied to each
> other vs. keeping them in a serialized object? I am guessing the
> second option would be much faster considering it is one composite
> value on the disk, but I would like to know if there are any specific
> advantages to doing things the other way. Thanks.
> The values themselves are very small, basic information in String.
>
> Eg:
>
> DocInfo: <docId><type> = value1
> DocInfo: <docId><priority> = value2
> DocInfo: <docId><etcetc> = value3
>
>
> Vs
>
> DocInfo: docId = value (JSON(type, priority, etcetc))
>
> Thank you.
>
This mostly depends on your usage pattern.

1. Every value in storage carries its full key (row key/family/qualifier/timestamp), so splitting an object into one cell per field increases the total KeyValue size (although compression can largely negate this). The serialized form will therefore be smaller, take less disk I/O, and can be faster.
2. The second option gives you atomic updates (i.e. all the data comes as one "piece"). With the first option you can have concurrent updates of individual fields, and each field keeps its own version history, whereas a serialized object only has history for the object as a whole.
3. With the serialized form you can't use server-side filters out of the box (you would have to patch HBase to support custom filters that deserialize the object or apply JSONPath to its serialized form), whereas with the first option you can.
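To make point 1 concrete, here is a rough back-of-the-envelope sketch of the uncompressed size of both layouts. It assumes the classic HBase KeyValue layout without tags (4-byte key length, 4-byte value length, then a key made of a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type, followed by the value). It ignores block indexes, bloom filters and compression. The row key `row1`, docId `doc42` and the `owner` field are hypothetical example values, not part of the original question.

```python
import json

# Assumed classic KeyValue layout (no tags):
# 4-byte key length + 4-byte value length around the key/value...
KV_LENGTH_FIELDS = 4 + 4
# ...and inside the key: 2-byte row length, 1-byte family length,
# 8-byte timestamp, 1-byte key type.
KEY_FIXED = 2 + 1 + 8 + 1


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate uncompressed size of a single KeyValue (cell)."""
    key_len = KEY_FIXED + len(row) + len(family) + len(qualifier)
    return KV_LENGTH_FIELDS + key_len + len(value)


def per_column_size(row: bytes, family: bytes, doc_id: str, fields: dict) -> int:
    """Option 1: one cell per field, qualifier = <docId><field>.
    Row, family and fixed overhead are repeated in every cell."""
    return sum(
        keyvalue_size(row, family, (doc_id + name).encode(), value.encode())
        for name, value in fields.items()
    )


def serialized_size(row: bytes, family: bytes, doc_id: str, fields: dict) -> int:
    """Option 2: a single cell, qualifier = <docId>, value = compact JSON."""
    blob = json.dumps(fields, separators=(",", ":")).encode()
    return keyvalue_size(row, family, doc_id.encode(), blob)


if __name__ == "__main__":
    # Hypothetical document with three small string fields.
    fields = {"type": "pdf", "priority": "high", "owner": "alice"}
    print(per_column_size(b"row1", b"DocInfo", "doc42", fields))  # 137
    print(serialized_size(b"row1", b"DocInfo", "doc42", fields))  # 84
```

With small string values the per-cell key overhead dominates, which is why the serialized form comes out smaller here; block compression removes much of that repetition on disk, and the relative gap shrinks as the values themselves grow.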
