Aha. That makes sense (both atomic writes and Filters). I am definitely only looking to filter within a given user, so it looks like what you describe below might work for me.
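To check my understanding of the two-step match within a single user's row, here is a minimal sketch in plain Java. It is only an in-memory model, not the HBase client API: the row is a map of qualifier to value, the qualifier test stands in for the QualifierFilter, and the value test stands in for the ValueFilter. The class name, the method name, and the "<docId>+author" string layout are my own assumptions, taken from the example in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of one user's row; an illustration only, not the HBase API. */
public class AuthorFilterSketch {

    // Assumed qualifier layout from the thread's example: "<docId>+author".
    static final String AUTHOR_SUFFIX = "+author";

    /** Returns the docIds in this row whose author cell equals wantedAuthor. */
    public static List<String> docsByAuthor(Map<String, String> row, String wantedAuthor) {
        List<String> docIds = new ArrayList<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            String qualifier = cell.getKey();
            // Step 1 (the qualifier-filter role): keep only "<docId>+author" cells.
            if (!qualifier.endsWith(AUTHOR_SUFFIX)) {
                continue;
            }
            // Step 2 (the value-filter role): keep only cells holding the wanted author.
            if (!cell.getValue().equals(wantedAuthor)) {
                continue;
            }
            // Strip the suffix to recover the docId part of the qualifier.
            docIds.add(qualifier.substring(0, qualifier.length() - AUTHOR_SUFFIX.length()));
        }
        return docIds;
    }

    public static void main(String[] args) {
        // One user's row: qualifier -> value, in insertion order.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("123213+author", "abc");
        row.put("123213+type", "report");
        row.put("555001+author", "xyz");
        row.put("777002+author", "abc");
        System.out.println(docsByAuthor(row, "abc")); // prints [123213, 777002]
    }
}
```

Both tests have to pass for a cell to be returned, which is why a non-author cell that happens to hold "abc" would not match.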
Thanks so much for all your help, Jonathan. You have saved me (at least) 2 weeks of tinkering and poking around!

On Mon, Jun 21, 2010 at 5:10 PM, Jonathan Gray <[email protected]> wrote:
> It would be inefficient to run that query against this schema, if you're
> talking about finding all documents with a given author across all users. In
> that case you'd want to use an additional table that had row keys as authors.
>
> If you want to search for documents with a specific author within a given
> user's documents (single row) then you could use filters, and as Andrey said,
> it would be simpler if it was broken up into individual qualifiers but could
> also be done with a custom filter to read the serialized value.
>
> To answer your question, you'd want a QualifierFilter that matched against
> qualifiers of the form <anylong><author> and then a ValueFilter which matched
> the value against the specific author you're looking for.
>
> JG
>
>> -----Original Message-----
>> From: N Kapshoo [mailto:[email protected]]
>> Sent: Monday, June 21, 2010 2:59 PM
>> To: [email protected]
>> Subject: Re: composite value vs composite qualifier
>>
>> I am not sure how to use filters in my case since I do not know the
>> column name.
>> Eg:
>> DocInfo: 123213+author = "abc"
>>
>> 123213 is the docId. If I want to look for authors named 'abc' in all
>> docs, how would I go about specifying a filter?
>>
>> Thanks.
>>
>> On Mon, Jun 21, 2010 at 4:20 PM, Andrey Stepachev <[email protected]>
>> wrote:
>> > 2010/6/22 N Kapshoo <[email protected]>
>> >
>> >> Is there any querying value in separating out values tied to each
>> >> other vs. keeping them in a serialized object? I am guessing the
>> >> second option would be much faster considering it is one composite
>> >> value on the disk, but I would like to know if there are any specific
>> >> advantages to doing things the other way. Thanks.
>> >> The values themselves are very small, basic information in String.
>> >>
>> >> Eg:
>> >>
>> >> DocInfo: <docId><type> = value1
>> >> DocInfo: <docId><priority> = value2
>> >> DocInfo: <docId><etcetc> = value3
>> >>
>> >>
>> >> Vs
>> >>
>> >> DocInfo: docId = value (JSON(type, priority, etcetc))
>> >>
>> >> Thank you.
>> >>
>> >
>> > This mostly depends on the usage pattern.
>> >
>> > 1. Each value in storage has its full key (key/family/qualifier/timestamp),
>> > so the keyvalue size increases (but this negative effect can be negated by
>> > using compression). So the serialised form will be smaller, take less disk
>> > IO, and can be faster.
>> >
>> > 2. The second option gives you atomic updates (i.e. all data comes as one
>> > "piece"), and with the first option you can have concurrent updates of the
>> > fields (and of course individual history, as opposed to the serialized
>> > object, which will have history for the whole object).
>> >
>> > 3. In serialised form you can't use server-side filters (out of the box;
>> > you would have to patch hbase to support custom filters, which would
>> > deserialise the object or use jsonpath on its serialised form), but with
>> > the first option you can.
>> >
