Does it still make sense to follow the id generation scheme we talked
about previously, i.e. using a single byte per field in the qualifier
instead of storing the entire field name as a string, for performance
reasons?

<docId><byte1> = value1
<docId><byte2> = value2

instead of
<docId><author> = value1
<docId><status> = value2
etc?
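
To make the comparison concrete, here is a minimal sketch of the compact
form, assuming the docId is a long (as in the <anylong><author> suggestion
quoted below) and that each field gets a one-byte code. The class name,
method names, and the particular code values are hypothetical and not part
of any HBase API; only java.nio is used.

```java
import java.nio.ByteBuffer;

public class CompactQualifier {
    // Hypothetical one-byte field codes standing in for <byte1>, <byte2>.
    static final byte FIELD_AUTHOR = 1;
    static final byte FIELD_STATUS = 2;

    /** Builds a qualifier: 8-byte big-endian docId followed by a 1-byte field code. */
    static byte[] encode(long docId, byte field) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(field).array();
    }

    /** Reads the docId back from the first 8 bytes of the qualifier. */
    static long docIdOf(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    /** Reads the field code from the last byte of the qualifier. */
    static byte fieldOf(byte[] qualifier) {
        return qualifier[Long.BYTES];
    }

    public static void main(String[] args) {
        byte[] q = encode(123213L, FIELD_AUTHOR);
        System.out.println(q.length + " bytes, docId=" + docIdOf(q) + ", field=" + fieldOf(q));
    }
}
```

With this layout every qualifier is 9 bytes, versus 8 bytes plus the length
of the field name (14 bytes for "author") in the string form. Whether that
saving matters once compression is enabled is something to measure rather
than assume.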


On Mon, Jun 21, 2010 at 5:19 PM, N Kapshoo <[email protected]> wrote:
> Aha. That makes sense (both atomic writes and Filters).
>
> I am definitely only looking to filter within a given user, so looks
> like what you describe below might work for me.
>
> Thanks so much for all your help, Jonathan. You have saved me (at
> least) 2 weeks of tinkering and poking around!
>
> On Mon, Jun 21, 2010 at 5:10 PM, Jonathan Gray <[email protected]> wrote:
>> It would be inefficient to run that query against this schema, if you're 
>> talking about finding all documents with a given author across all users.  
>> In that case you'd want to use an additional table that had row keys as 
>> authors.
>>
>> If you want to search for documents with a specific author within a given 
>> user's documents (single row) then you could use filters. As Andrey said, 
>> it would be simpler if the value were broken up into individual qualifiers, 
>> but it could also be done with a custom filter that reads the serialized value.
>>
>> To answer your question, you'd want a QualifierFilter that matched against 
>> qualifiers of the form <anylong><author> and then a ValueFilter which 
>> matched the value against the specific author you're looking for.
>>
>> JG
>>
>>> -----Original Message-----
>>> From: N Kapshoo [mailto:[email protected]]
>>> Sent: Monday, June 21, 2010 2:59 PM
>>> To: [email protected]
>>> Subject: Re: composite value vs composite qualifier
>>>
>>> I am not sure how to use filters in my case since I do not know the
>>> column name.
>>> Eg:
>>> DocInfo: 123213+author = "abc"
>>>
>>> 123213 is the docId. If I want to look for authors named 'abc' in all
>>> docs, how would I go about specifying a filter?
>>>
>>> Thanks.
>>>
>>> On Mon, Jun 21, 2010 at 4:20 PM, Andrey Stepachev <[email protected]>
>>> wrote:
>>> > 2010/6/22 N Kapshoo <[email protected]>
>>> >
>>> >> Is there any querying value in separating out values tied to each
>>> >> other vs. keeping them in a serialized object? I am guessing the
>>> >> second option would be much faster considering it is one composite
>>> >> value on the disk, but I would like to know if there are any
>>> >> specific advantages to doing things the other way. Thanks.
>>> >> The values themselves are very small, basic information in String.
>>> >>
>>> >> Eg:
>>> >>
>>> >> DocInfo: <docId><type> = value1
>>> >> DocInfo: <docId><priority> = value2
>>> >> DocInfo: <docId><etcetc> = value3
>>> >>
>>> >>
>>> >> Vs
>>> >>
>>> >> DocInfo: docId = value (JSON(type, priority, etcetc))
>>> >>
>>> >> Thank you.
>>> >>
>>> >
>>> > This mostly depends on the usage pattern.
>>> >
>>> > 1. Each value in storage carries its full key
>>> > (key/family/qualifier/timestamp), so the KeyValue size increases
>>> > (though this negative effect can be offset by using compression). So
>>> > the serialised form will be smaller, take less disk IO, and can be
>>> > faster.
>>> >
>>> > 2. The second option gives you atomic updates (i.e. all data comes as
>>> > one "piece"), while with the first option you can have concurrent
>>> > updates of the fields (and of course individual history, as opposed
>>> > to a serialised object, which will have history for the whole
>>> > object).
>>> >
>>> > 3. In serialised form you can't use server-side filters out of the
>>> > box (you would have to patch HBase to support a custom filter that
>>> > deserialises the object or runs JSONPath on its serialised form),
>>> > but with the first option you can.
>>> >
>>
>
