I am not sure how to use filters in my case since I do not know the
column name.
Eg:
DocInfo: 123213+author = "abc"

123213 is the docId. If I want to look for authors named 'abc' in all
docs, how would I go about specifying a filter?

Thanks.

On Mon, Jun 21, 2010 at 4:20 PM, Andrey Stepachev <[email protected]> wrote:
> 2010/6/22 N Kapshoo <[email protected]>
>
>> Is there any querying value in separating out values tied to each
>> other vs. keeping them in a serialized object? I am guessing the
>> second option would be much faster considering it is one composite
>> value on the disk, but I would like to know if there are any specific
>> advantages to doing things the other way. Thanks.
>> The values themselves are very small, basic information in String.
>>
>> Eg:
>>
>> DocInfo: <docId><type> = value1
>> DocInfo: <docId><priority> = value2
>> DocInfo: <docId><etcetc> = value3
>>
>>
>> Vs
>>
>> DocInfo: docId = value (JSON(type, priority, etcetc))
>>
>> Thank you.
>>
>
> This mostly depends on the usage pattern.
>
> 1. Each value in storage carries its full key (row key/family/qualifier/timestamp), so
> the KeyValue size increases
> (though compression can offset this overhead). So the
> serialised form will be smaller, take less disk IO, and can be faster.
>
> 2. The second option gives you atomic updates (i.e. all data comes as one
> "piece"), while with the first option you
> can have concurrent updates of individual fields (and of course per-field history,
> as opposed to the serialised object, which keeps history only for the whole
> object).
>
> 3. In serialised form you can't use server-side filters out of the box (you
> would have to patch HBase to support custom filters that deserialise the object
> or apply JSONPath to its serialised form), but with the first option you can.
>
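
To make point 3 and the question at the top concrete, here is a minimal sketch in plain Python (not the HBase client API) of what a filter has to do in each layout. It assumes qualifiers of the form <docId>+<field>, as in the 123213+author example; the other sample rows and the function names are hypothetical. In the first layout the match is a pattern on the qualifier plus a comparison on the value, which is the kind of check a server-side qualifier/value filter pair (I believe QualifierFilter with a RegexStringComparator, combined with a ValueFilter, but check the API of your version) could do without knowing the docId. In the second layout every value has to be deserialised first.

```python
import json
import re

# Layout 1: one cell per field, qualifier is "<docId>+<field>".
# Only 123213+author = "abc" comes from the thread; the rest is made-up sample data.
cells = {
    "123213+author": "abc",
    "123213+type": "report",
    "456789+author": "xyz",
    "777001+author": "abc",
}

# Layout 2: one serialised JSON value per docId (same made-up data).
docs = {
    "123213": json.dumps({"author": "abc", "type": "report"}),
    "456789": json.dumps({"author": "xyz"}),
    "777001": json.dumps({"author": "abc"}),
}


def doc_ids_by_field(cells, field, wanted):
    """Match on the qualifier suffix and the cell value; the docId is not needed up front."""
    pattern = re.compile(r"^(\d+)\+" + re.escape(field) + r"$")
    hits = []
    for qualifier, value in cells.items():
        match = pattern.match(qualifier)
        if match and value == wanted:
            hits.append(match.group(1))  # captured docId prefix
    return sorted(hits)


def doc_ids_by_json_field(docs, field, wanted):
    """Every value must be deserialised before the field can be compared."""
    return sorted(
        doc_id
        for doc_id, blob in docs.items()
        if json.loads(blob).get(field) == wanted
    )


authors_per_cell = doc_ids_by_field(cells, "author", "abc")
authors_json = doc_ids_by_json_field(docs, "author", "abc")
```

Both calls find the same documents; the difference is where the work can happen. The per-cell check only inspects qualifier and value bytes, so it can run on the region server, while the JSON check needs code that understands the serialised format.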
