Thanks for the quick reply.

My schema is based on ids because those ids are the row keys of
another table. This avoids data redundancy: a big doc might be
referenced by millions of users, and we don't want to store a copy for
every user. So,

Table: Docs
Row: docId (long generated by incrementColumnValue)
ColFamily: Data

Table: Users
Row: UserId
ColFamily: DocInfo
Qualifier: docId
Value: More information per user (JSON)

Now in addition:
ColFamily: DocInfo
Qualifier: docId_status
Value: Status

Now I want a status on each doc for each user. This status might
change several times.
The first column (qualifier docId) is static; its value doesn't change
once inserted. The status, however, can be toggled back and forth
(between Y and N).

The docs per user should always be sorted by docId.
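
To make the sorting concern concrete, here is a minimal sketch in plain
Java (no HBase dependency). It assumes qualifiers are ordered by comparing
their raw bytes lexicographically, which is my understanding of how HBase
sorts columns, and that docIds are non-negative. The class and method names
are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class QualifierOrder {
    // 8-byte big-endian encoding of a long (ByteBuffer's default byte order).
    static byte[] longBytes(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Decimal string encoding of the same id.
    static byte[] stringBytes(long id) {
        return Long.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison that treats each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Fixed-width long bytes keep numeric order: 9 sorts before 10.
        System.out.println(compareBytes(longBytes(9), longBytes(10)) < 0);     // true
        // Unpadded strings do not: "9" sorts after "10".
        System.out.println(compareBytes(stringBytes(9), stringBytes(10)) < 0); // false
    }
}
```

So unpadded string ids would not stay in numeric order, while fixed-width
long bytes would (as long as the ids are not negative).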

How would you design it? I am not sure how to encode the docId and the
status into the qualifiers so that the columns always stay sorted by
docId. Thank you.
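
For reference, this is how I read the composite-qualifier suggestion
quoted below (<id_as_long><qf_type>), as a rough sketch in plain Java with
no HBase dependency. The type tags INFO and STATUS, the class name, and the
helper names are my own assumptions, and the comparator only imitates
byte-wise qualifier ordering.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class CompositeQualifier {
    static final byte INFO = 0;   // hypothetical tag: static per-user JSON
    static final byte STATUS = 1; // hypothetical tag: Y/N status

    // Qualifier layout: 8-byte big-endian docId followed by a 1-byte type tag.
    static byte[] qualifier(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    static long docId(byte[] q) {
        return ByteBuffer.wrap(q).getLong();
    }

    static byte type(byte[] q) {
        return q[Long.BYTES];
    }

    // Lexicographic comparison that treats each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Builds both qualifiers per doc, sorts them byte-wise, and labels them.
    static List<String> sortedLabels(long... docIds) {
        List<byte[]> qualifiers = new ArrayList<>();
        for (long id : docIds) {
            qualifiers.add(qualifier(id, STATUS));
            qualifiers.add(qualifier(id, INFO));
        }
        qualifiers.sort(CompositeQualifier::compareBytes);
        List<String> labels = new ArrayList<>();
        for (byte[] q : qualifiers) {
            labels.add(docId(q) + (type(q) == INFO ? ":info" : ":status"));
        }
        return labels;
    }

    public static void main(String[] args) {
        // prints [2:info, 2:status, 10:info, 10:status, 300:info, 300:status]
        System.out.println(sortedLabels(10, 2, 300));
    }
}
```

With this layout the info and status columns for one doc sit next to each
other, and the pairs stay ordered by docId for non-negative ids.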

On Mon, Jun 21, 2010 at 12:12 PM, Jonathan Gray <[email protected]> wrote:
> Can you describe your schema a bit more?  Could you use versioning instead of 
> incrementing IDs on the qualifiers?
>
> Also, you could consider having a composite value, so id1_asLong would have a 
> value that contained both val1 and val5 in your example.  You could use any 
> number of serialization strategies (comma-separated, JSON, Thrift/protobuf, 
> Writable, etc).
>
> If you want them as two columns, I would recommend that things you want to 
> retrieve together be neighboring.  For example, you might make the qualifiers 
> a composite type of <id_as_long><qf_type>, so <id1_asLong><0byte> for the 
> existing stuff and <id1_asLong><1byte> for status?  That way they are stored 
> sequentially so optimally efficient at read time.
>
> JG
>
>> -----Original Message-----
>> From: N Kapshoo [mailto:[email protected]]
>> Sent: Monday, June 21, 2010 9:59 AM
>> To: [email protected]
>> Subject: Long vs String for qualifier
>>
>> I have a 'long' number that I get by using
>> HTable.'incrementColumnValue'. This long is used as the qualifier id
>> on a columnFamily.
>>
>> Now I need to add a prefix 'status' so that I can store another value
>> in the same family.
>>
>> How should I consider String vs long sorting?
>>
>> So right now:
>>
>> colFamily: id1_asLong = val1
>> colFamily: id2_asLong = val2
>> colFamily: id3_asLong = val3
>> colFamily: id4_asLong = val4
>>
>> and in addition
>>
>> colFamily: status_id1_asString = val5
>> colFamily: status_id2_asString = val6
>> colFamily: status_id3_asString = val7
>> colFamily: status_id4_asString = val8
>>
>> To make sure that 'id' values are sorted and accessed sequentially,
>> should I change my design so that the id1_asLong is stored as
>> id1_asString?
>> When I do my Get, I always get id1_asLong and status_id1_asString
>> together.
>>
>> Thanks.
>
