I should mention that the current 'status' is part of the JSON. But I am
trying to separate it out, since there will be a 'count' and an
updateStatus call on it, and I do not want to keep reading and writing
the entire JSON just for this one flag.
Correct me if you think I am going down the wrong path...
Thanks.
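Below is a minimal sketch of the neighbouring-qualifier layout Jonathan Gray suggests in the quoted reply, applied to the Users/DocInfo schema. It assumes the docIds are non-negative, which holds for ids handed out by incrementColumnValue when the counter starts at 0. Each qualifier is the docId as an 8-byte big-endian long followed by one type byte: 0 for the static per-user JSON and 1 for the status flag. The class `DocQualifiers` and its methods are hypothetical. The comparator only imitates the unsigned lexicographic byte order in which HBase stores qualifiers. It uses plain Java so it runs without an HBase client.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DocQualifiers {
    // Hypothetical type tags appended after the docId.
    static final byte TYPE_JSON = 0;
    static final byte TYPE_STATUS = 1;

    // <docId as 8-byte big-endian long><1 type byte>; assumes docId >= 0.
    static byte[] qualifier(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    static byte[] jsonQualifier(long docId) {
        return qualifier(docId, TYPE_JSON);
    }

    static byte[] statusQualifier(long docId) {
        return qualifier(docId, TYPE_STATUS);
    }

    // Unsigned lexicographic comparison, imitating HBase's qualifier order.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Decodes a qualifier back into a readable "docId/type" label.
    static String describe(byte[] q) {
        ByteBuffer buf = ByteBuffer.wrap(q);
        long docId = buf.getLong();
        byte type = buf.get();
        return docId + (type == TYPE_JSON ? "/json" : "/status");
    }

    public static void main(String[] args) {
        List<byte[]> qualifiers = new ArrayList<>();
        // Written out of order: the sorted order does not depend on write order.
        for (long id : new long[] {300, 2, 1000, 9}) {
            qualifiers.add(statusQualifier(id));
            qualifiers.add(jsonQualifier(id));
        }
        qualifiers.sort(DocQualifiers::compareUnsigned);
        for (byte[] q : qualifiers) {
            System.out.println(describe(q));
        }
    }
}
```

Running `main` prints each docId's JSON qualifier immediately followed by its status qualifier, in numeric docId order (2, 9, 300, 1000), regardless of write order. Because the status is its own small cell, toggling it between Y and N rewrites only that cell and leaves the JSON untouched.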

On Mon, Jun 21, 2010 at 12:25 PM, N Kapshoo <[email protected]> wrote:
> Thanks for the quick reply.
>
> I have a schema design based on ids because I actually have the ids as
> row keys in another table. This is to avoid data redundancy: we
> might have a big doc referenced by millions of users, and we don't want
> to store a copy for every user. So,
>
> Table: Docs
> Row: docId (long generated by incrementColumnValue)
> ColFamily: Data
>
> Table: Users
> Row: UserId
> ColFamily: DocInfo
> Qualifier: docId
> Value: More information per user (JSON)
>
> Now in addition:
> ColFamily: DocInfo
> Qualifier: docId_status
> Value: Status
>
> Now I want a status on each doc for each user. This status might
> change several times.
> The first column, docInfo, is static: its value doesn't change once
> inserted. However, the status can be toggled back and forth (between Y
> and N).
>
> The docs per user should always be sorted by docId.
>
> How would you design it? I am not sure how to encode the qualifiers so
> that they are always sorted by docId. Thank you.
>
> On Mon, Jun 21, 2010 at 12:12 PM, Jonathan Gray <[email protected]> wrote:
>> Can you describe your schema a bit more?  Could you use versioning instead 
>> of incrementing IDs on the qualifiers?
>>
>> Also, you could consider having a composite value, so id1_asLong would have 
>> a value that contained both val1 and val5 in your example.  You could use 
>> any number of serialization strategies (comma-separated, JSON, 
>> Thrift/protobuf, Writable, etc).
>>
>> If you want them as two columns, I would recommend that things you want to 
>> retrieve together be neighboring.  For example, you might make the 
>> qualifiers a composite type of <id_as_long><qf_type>, so <id1_asLong><0byte> 
>> for the existing stuff and <id1_asLong><1byte> for status?  That way they 
>> are stored sequentially so optimally efficient at read time.
>>
>> JG
>>
>>> -----Original Message-----
>>> From: N Kapshoo [mailto:[email protected]]
>>> Sent: Monday, June 21, 2010 9:59 AM
>>> To: [email protected]
>>> Subject: Long vs String for qualifier
>>>
>>> I have a 'long' number that I get by using
>>> HTable.incrementColumnValue. This long is used as the qualifier id
>>> on a column family.
>>>
>>> Now I need to add a prefix 'status' so that I can store another value
>>> in the same family.
>>>
>>> How should I think about String vs. long sorting?
>>>
>>> So right now:
>>>
>>> colFamily: id1_asLong = val1
>>> colFamily: id2_asLong = val2
>>> colFamily: id3_asLong = val3
>>> colFamily: id4_asLong = val4
>>>
>>> and in addition
>>>
>>> colFamily: status_id1_asString = val5
>>> colFamily: status_id2_asString = val6
>>> colFamily: status_id3_asString = val7
>>> colFamily: status_id4_asString = val8
>>>
>>> To make sure that 'id' values are sorted and accessed sequentially,
>>> should I change my design so that the id1_asLong is stored as
>>> id1_asString?
>>> When I do my Get, I always fetch id1_asLong and status_id1_asString
>>> together.
>>>
>>> Thanks.
>>
>
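This small illustration addresses the original String vs. long question under the same assumption of non-negative ids. Qualifiers are compared as raw unsigned bytes. Ids written as decimal strings therefore sort lexicographically, so 10 and 100 land before 2, whereas 8-byte big-endian longs sort numerically. The example also shows why a `status_` prefix works against reading each pair together. The letter `s` (0x73) is greater than the leading zero byte of any id below 2^56, so every `status_...` column sorts after all of the id columns. `QualifierOrder` and its helpers are hypothetical names. HBase's own `Bytes` utilities are deliberately not used, to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class QualifierOrder {
    // 8-byte big-endian encoding of a (non-negative) id.
    static byte[] asLong(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // UTF-8 bytes of a string qualifier such as "10" or "status_1".
    static byte[] asString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic byte comparison, imitating HBase's qualifier order.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Stores each id as a decimal string and returns them in byte order.
    static List<String> sortAsStrings(long... ids) {
        List<byte[]> qs = new ArrayList<>();
        for (long id : ids) {
            qs.add(asString(Long.toString(id)));
        }
        qs.sort(QualifierOrder::compareUnsigned);
        List<String> out = new ArrayList<>();
        for (byte[] q : qs) {
            out.add(new String(q, StandardCharsets.UTF_8));
        }
        return out;
    }

    // Stores each id as an 8-byte long and returns them in byte order.
    static List<Long> sortAsLongs(long... ids) {
        List<byte[]> qs = new ArrayList<>();
        for (long id : ids) {
            qs.add(asLong(id));
        }
        qs.sort(QualifierOrder::compareUnsigned);
        List<Long> out = new ArrayList<>();
        for (byte[] q : qs) {
            out.add(ByteBuffer.wrap(q).getLong());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortAsStrings(1, 2, 9, 10, 100)); // [1, 10, 100, 2, 9]
        System.out.println(sortAsLongs(1, 2, 9, 10, 100));   // [1, 2, 9, 10, 100]
        // id 100's data column sorts before id 1's "status_" column.
        System.out.println(compareUnsigned(asLong(100), asString("status_1")) < 0); // true
    }
}
```

Fixed-width, zero-padded decimal strings would also sort correctly. The binary long, however, is shorter (8 bytes) and matches the long that incrementColumnValue hands back.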
