Not sure there is a right/wrong way. You should probably just do what you're most comfortable with / what makes the most sense to you.
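To gauge what the `<docId><byte>` scheme in the quoted question actually saves, here is a minimal Python sketch. It assumes the docId is packed as an 8-byte big-endian long and that each field gets a 1-byte code. The `FIELD_CODES` mapping and the helper names are hypothetical and are not part of HBase or of the original schema.

```python
import struct
from typing import Tuple

# Hypothetical, application-defined mapping from field name to a 1-byte code.
# Once data has been written, these codes must never be reassigned.
FIELD_CODES = {"author": 0x01, "status": 0x02}
FIELD_NAMES = {code: name for name, code in FIELD_CODES.items()}


def qualifier_with_code(doc_id: int, field: str) -> bytes:
    """<docId><byte>: 8-byte big-endian docId followed by a 1-byte field code."""
    return struct.pack(">qB", doc_id, FIELD_CODES[field])


def qualifier_with_name(doc_id: int, field: str) -> bytes:
    """<docId><name>: 8-byte big-endian docId followed by the UTF-8 field name."""
    return struct.pack(">q", doc_id) + field.encode("utf-8")


def parse_coded_qualifier(qualifier: bytes) -> Tuple[int, str]:
    """Split a 9-byte <docId><byte> qualifier back into (docId, field name)."""
    doc_id, code = struct.unpack(">qB", qualifier)
    return doc_id, FIELD_NAMES[code]


if __name__ == "__main__":
    print(len(qualifier_with_code(123213, "author")))  # 9
    print(len(qualifier_with_name(123213, "author")))  # 14
    print(parse_coded_qualifier(qualifier_with_code(123213, "status")))
```

Big-endian packing keeps qualifiers for non-negative docIds sorted numerically, because HBase compares qualifiers as raw bytes. The saving is a few bytes per cell (5 for "author") before compression. Compression can absorb much of that, which is why either layout is defensible.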
> -----Original Message-----
> From: N Kapshoo [mailto:[email protected]]
> Sent: Monday, June 21, 2010 3:23 PM
> To: [email protected]
> Subject: Re: composite value vs composite qualifier
>
> Does it still make sense to follow the previous id generation we
> talked about? (for performance reasons, instead of storing an entire
> string?)
>
> <docId><byte1> = value1
> <docId><byte2> = value2
>
> instead of
> <docId><author> = value1
> <docId><status> = value2
> etc?
>
>
> On Mon, Jun 21, 2010 at 5:19 PM, N Kapshoo <[email protected]> wrote:
> > Aha. That makes sense (both atomic writes and Filters).
> >
> > I am definitely only looking to filter within a given user, so it looks
> > like what you describe below might work for me.
> >
> > Thanks so much for all your help, Jonathan. You have saved me (at
> > least) 2 weeks of tinkering and poking around!
> >
> > On Mon, Jun 21, 2010 at 5:10 PM, Jonathan Gray <[email protected]> wrote:
> >> It would be inefficient to run that query against this schema if
> >> you're talking about finding all documents with a given author
> >> across all users. In that case you'd want an additional table that
> >> uses authors as row keys.
> >>
> >> If you want to search for documents with a specific author within a
> >> given user's documents (a single row), then you could use filters.
> >> As Andrey said, it would be simpler if the fields were broken up into
> >> individual qualifiers, but it could also be done with a custom filter
> >> that reads the serialized value.
> >>
> >> To answer your question, you'd want a QualifierFilter that matches
> >> qualifiers of the form <anylong><author> and then a ValueFilter that
> >> matches the value against the specific author you're looking for.
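The QualifierFilter/ValueFilter combination can be illustrated without a cluster. The pure-Python sketch below is not the HBase client API. It models one user's row as a dict from qualifier bytes to value bytes (an assumption) and keeps a cell only when both checks pass. In the Java client, the two filters would typically be combined in a FilterList with MUST_PASS_ALL. `doc_qualifier` and `docs_by_author` are hypothetical helper names.

```python
import struct
from typing import Dict, List

AUTHOR_SUFFIX = b"author"  # field name in the <anylong><author> qualifiers


def doc_qualifier(doc_id: int, field: bytes) -> bytes:
    """Build a <docId><field> qualifier: 8-byte big-endian long + field name."""
    return struct.pack(">q", doc_id) + field


def docs_by_author(row: Dict[bytes, bytes], author: str) -> List[int]:
    """Return the docIds in one row whose author cell equals `author`.

    The qualifier check plays the role of the QualifierFilter and the
    value check plays the role of the ValueFilter; a cell must pass both.
    """
    wanted = author.encode("utf-8")
    matches = []
    # HBase returns a row's cells in qualifier byte order; sorting mimics that.
    for qualifier, value in sorted(row.items()):
        is_author_cell = (
            len(qualifier) == 8 + len(AUTHOR_SUFFIX)
            and qualifier[8:] == AUTHOR_SUFFIX
        )
        if is_author_cell and value == wanted:
            (doc_id,) = struct.unpack(">q", qualifier[:8])
            matches.append(doc_id)
    return matches
```

This only inspects the cells of a single row, which is why it fits the "within a given user" case. Finding an author across all users would still call for the separate author-keyed table.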
> >>
> >> JG
> >>
> >>> -----Original Message-----
> >>> From: N Kapshoo [mailto:[email protected]]
> >>> Sent: Monday, June 21, 2010 2:59 PM
> >>> To: [email protected]
> >>> Subject: Re: composite value vs composite qualifier
> >>>
> >>> I am not sure how to use filters in my case since I do not know the
> >>> column name.
> >>> E.g.:
> >>> DocInfo: 123213+author = "abc"
> >>>
> >>> 123213 is the docId. If I want to look for authors named 'abc' in all
> >>> docs, how would I go about specifying a filter?
> >>>
> >>> Thanks.
> >>>
> >>> On Mon, Jun 21, 2010 at 4:20 PM, Andrey Stepachev <[email protected]> wrote:
> >>> > 2010/6/22 N Kapshoo <[email protected]>
> >>> >
> >>> >> Is there any querying value in separating out values tied to each
> >>> >> other vs. keeping them in a serialized object? I am guessing the
> >>> >> second option would be much faster since it is one composite
> >>> >> value on disk, but I would like to know if there are any specific
> >>> >> advantages to doing things the other way. Thanks.
> >>> >> The values themselves are very small, basic String information.
> >>> >>
> >>> >> E.g.:
> >>> >>
> >>> >> DocInfo: <docId><type> = value1
> >>> >> DocInfo: <docId><priority> = value2
> >>> >> DocInfo: <docId><etcetc> = value3
> >>> >>
> >>> >>
> >>> >> Vs
> >>> >>
> >>> >> DocInfo: docId = value (JSON(type, priority, etcetc))
> >>> >>
> >>> >> Thank you.
> >>> >>
> >>> >
> >>> > This mostly depends on the usage pattern.
> >>> >
> >>> > 1. Each value in storage carries its full key (row key/family/
> >>> > qualifier/timestamp), so the KeyValue size grows (though compression
> >>> > can offset this). So the serialized form will be smaller, take less
> >>> > disk I/O, and can be faster.
> >>> >
> >>> > 2. The second option gives you atomic updates (i.e. all the data
> >>> > comes as one "piece"), while with the first option you can have
> >>> > concurrent updates of individual fields (and, of course, per-field
> >>> > history, as opposed to a serialized object, which only has history
> >>> > for the object as a whole).
> >>> >
> >>> > 3. With the serialized form you can't use server-side filters out of
> >>> > the box (you would have to patch HBase to support custom filters that
> >>> > deserialize the object or apply JSONPath to its serialized form), but
> >>> > with the first option you can.
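Andrey's first point can be made concrete with a back-of-the-envelope calculation. The sketch below assumes the uncompressed HBase KeyValue layout: 4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, then the value. It ignores compression and HFile block and index overhead. The row key, field names, and values are hypothetical.

```python
import json
import struct

# Fixed bytes per KeyValue: key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1).
KV_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate uncompressed size of one KeyValue."""
    return KV_FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def per_field_size(row: bytes, family: bytes, doc_id: int, fields: dict) -> int:
    """One cell per field: qualifier is <docId><fieldName>."""
    prefix = struct.pack(">q", doc_id)
    return sum(
        keyvalue_size(row, family, prefix + name.encode("utf-8"), value.encode("utf-8"))
        for name, value in fields.items()
    )


def serialized_size(row: bytes, family: bytes, doc_id: int, fields: dict) -> int:
    """One cell per document: qualifier is <docId>, value is compact JSON."""
    value = json.dumps(fields, separators=(",", ":")).encode("utf-8")
    return keyvalue_size(row, family, struct.pack(">q", doc_id), value)


if __name__ == "__main__":
    fields = {"type": "memo", "priority": "high", "author": "abc"}
    print(per_field_size(b"user42", b"DocInfo", 123213, fields))   # 152
    print(serialized_size(b"user42", b"DocInfo", 123213, fields))  # 89
```

The per-field layout repeats the row key, family, docId, and fixed overhead for every field, which is the growth Andrey describes. Those repeated prefixes also compress well, so enabling compression narrows the gap.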
