Now here is my conundrum:
I would be doing both queries very often. The UI shows count(status=Y)
on the very first page and then, depending on whether the user does a
listing, shows all the other info and the status info per doc.
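For reference, here is roughly what the count(status=Y) read involves if status stays in DocInfo under a composite qualifier. This is a minimal plain-Java sketch, not HBase client code. The row is modelled as a TreeMap ordered the way HBase orders qualifiers (unsigned bytes), and the layout (8-byte docId plus one type byte, with TYPE_STATUS = 1) is an assumption borrowed from the suggestion further down the thread. The count has to walk every column in the family, JSON values included, and that is the cost a separate status family is meant to avoid.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class StatusCount {
    // Hypothetical type bytes appended after the 8-byte docId.
    static final byte TYPE_INFO = 0;
    static final byte TYPE_STATUS = 1;

    // 8-byte big-endian docId followed by the type byte.
    static byte[] qualifier(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    // Counts status columns whose value is "Y" in one user's DocInfo family,
    // given as qualifier -> value (roughly what a Get on that row returns).
    static int countStatusY(Map<byte[], byte[]> docInfo) {
        int count = 0;
        for (Map.Entry<byte[], byte[]> e : docInfo.entrySet()) {
            byte[] q = e.getKey();
            boolean isStatus = q.length == Long.BYTES + 1 && q[Long.BYTES] == TYPE_STATUS;
            if (isStatus && "Y".equals(new String(e.getValue(), StandardCharsets.UTF_8))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<byte[], byte[]> row = new TreeMap<byte[], byte[]>(Arrays::compareUnsigned);
        row.put(qualifier(1, TYPE_INFO), "{}".getBytes(StandardCharsets.UTF_8));
        row.put(qualifier(1, TYPE_STATUS), "Y".getBytes(StandardCharsets.UTF_8));
        row.put(qualifier(2, TYPE_INFO), "{}".getBytes(StandardCharsets.UTF_8));
        row.put(qualifier(2, TYPE_STATUS), "N".getBytes(StandardCharsets.UTF_8));
        System.out.println(countStatusY(row)); // prints 1
    }
}
```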

Is it a bad idea to store the status both in a new ColumnFamily and in a
qualifier? The same data would be in 2 places, but it would help read
performance for both queries, right?

When you say append a byte, I assume something like this, am I right?

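byte[] arr = Bytes.toBytes(docId);
// arr is exactly 8 bytes, so arr[arr.length] would be out of bounds;
// Bytes.add copies both arrays into a new 9-byte qualifier instead.
byte[] qualifier = Bytes.add(arr, new byte[] { 0 });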

Thanks so much for your help.

On Mon, Jun 21, 2010 at 12:33 PM, Jonathan Gray <[email protected]> wrote:
> Got it.
>
> Well, you could do what you're describing below, appending something at the 
> end of the docId to notate that it's the status column.  You wouldn't need to 
> use a "_status" string, could be as simple as appending an additional byte of 
> type information.
>
> Another option is to break status into a separate column family.
>
> What are the most common queries and which query is most critical 
> performance-wise?
>
> Are you most interested in "give me all docs and their statuses for user X" 
> or more like "give me the info for doc Y" or "give me status for doc Z"?
>
> If the first one, then it seems like adding a type byte after the docId
> would make the most sense and be the most efficient.
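A rough sketch of why the type byte suits the "all docs and their statuses for user X" query: because qualifiers sort by docId first and type byte second, a single pass over the family sees each doc's info and status back to back. Plain Java again, with the row modelled as an unsigned-byte-ordered TreeMap; the type values 0 (info) and 1 (status) and the Doc holder class are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DocListing {
    // Hypothetical type bytes appended after the 8-byte docId.
    static final byte TYPE_INFO = 0;
    static final byte TYPE_STATUS = 1;

    // One entry of the listing: a doc, its per-user JSON and its status.
    static final class Doc {
        final long docId;
        String info;
        String status;

        Doc(long docId) {
            this.docId = docId;
        }

        @Override
        public String toString() {
            return docId + " " + info + " " + status;
        }
    }

    static byte[] qualifier(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    // Walks the sorted qualifiers once, starting a new Doc whenever the
    // docId prefix changes and filling info/status from the type byte.
    static List<Doc> list(TreeMap<byte[], String> docInfo) {
        List<Doc> docs = new ArrayList<>();
        Doc current = null;
        for (Map.Entry<byte[], String> e : docInfo.entrySet()) {
            ByteBuffer q = ByteBuffer.wrap(e.getKey());
            long docId = q.getLong();
            byte type = q.get();
            if (current == null || current.docId != docId) {
                current = new Doc(docId);
                docs.add(current);
            }
            if (type == TYPE_INFO) {
                current.info = e.getValue();
            } else if (type == TYPE_STATUS) {
                current.status = e.getValue();
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> row = new TreeMap<byte[], String>(Arrays::compareUnsigned);
        row.put(qualifier(7, TYPE_STATUS), "N");
        row.put(qualifier(3, TYPE_INFO), "{\"a\":1}");
        row.put(qualifier(7, TYPE_INFO), "{\"b\":2}");
        row.put(qualifier(3, TYPE_STATUS), "Y");
        System.out.println(list(row)); // [3 {"a":1} Y, 7 {"b":2} N]
    }
}
```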
>
> JG
>
>> -----Original Message-----
>> From: N Kapshoo [mailto:[email protected]]
>> Sent: Monday, June 21, 2010 10:26 AM
>> To: [email protected]
>> Subject: Re: Long vs String for qualifier
>>
>> Thanks for the quick reply.
>>
>> I have a schema design based on ids because I actually have the ids as
>> rowids in another table. This is to avoid data redundancy: we might
>> have a big doc referenced by millions of users, and we don't want to
>> store a copy for every user. So,
>>
>> Table: Docs
>> Row: docId (long generated by incrementColumnValue)
>> ColFamily: Data
>>
>> Table: Users
>> Row: UserId
>> ColFamily: DocInfo
>> Qualifier: docId
>> Value: More information per user (JSON)
>>
>> Now in addition:
>> ColFamily: DocInfo
>> Qualifier: docId_status
>> Value: Status
>>
>> Now I want a status on each doc for each user. This status might
>> change several times.
>> The first column, docInfo, is static; its value doesn't change once
>> inserted. However, the status can be toggled back and forth (between Y
>> and N).
>>
>> The docs per user should always be sorted by docId.
>>
>> How would you design it? I am not sure how to put these values into
>> qualifiers while keeping them always sorted by docId. Thank you.
>>
>> On Mon, Jun 21, 2010 at 12:12 PM, Jonathan Gray <[email protected]>
>> wrote:
>> > Can you describe your schema a bit more?  Could you use versioning
>> instead of incrementing IDs on the qualifiers?
>> >
>> > Also, you could consider having a composite value, so id1_asLong
>> would have a value that contained both val1 and val5 in your example.
>>  You could use any number of serialization strategies (comma-separated,
>> JSON, Thrift/protobuf, Writable, etc).
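A minimal sketch of the composite-value idea, using the simplest possible serialization rather than any of the formats named above. The layout (status as byte 0, the per-user JSON as UTF-8 after it) and the CompositeValue helper are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CompositeValue {
    // Hypothetical layout: byte 0 holds the status ('Y' or 'N'),
    // the remaining bytes hold the per-user JSON, UTF-8 encoded.
    static byte[] encode(char status, String json) {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 1];
        out[0] = (byte) status;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    static char status(byte[] value) {
        return (char) value[0];
    }

    static String json(byte[] value) {
        return new String(value, 1, value.length - 1, StandardCharsets.UTF_8);
    }

    // Toggling the status produces a new value that still carries the JSON,
    // so every toggle rewrites the whole cell.
    static byte[] withStatus(byte[] value, char status) {
        byte[] out = Arrays.copyOf(value, value.length);
        out[0] = (byte) status;
        return out;
    }

    public static void main(String[] args) {
        byte[] v = encode('Y', "{\"note\":\"x\"}");
        byte[] toggled = withStatus(v, 'N');
        System.out.println(status(toggled) + " " + json(toggled)); // N {"note":"x"}
    }
}
```

Since the thread says the JSON never changes while the status toggles often, each toggle under this layout rewrites the static JSON as well; that trade-off is worth weighing against the two-column approach.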
>> >
>> > If you want them as two columns, I would recommend that things you
>> want to retrieve together be neighboring.  For example, you might make
>> the qualifiers a composite type of <id_as_long><qf_type>, so
>> <id1_asLong><0byte> for the existing stuff and <id1_asLong><1byte> for
>> status?  That way they are stored sequentially so optimally efficient
>> at read time.
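A small sketch of the <id_as_long><qf_type> qualifier and the order it produces. It builds the 8-byte big-endian long by hand with ByteBuffer (the same byte layout HBase's Bytes.toBytes(long) produces) and sorts with unsigned byte comparison, which is how HBase orders qualifiers. The class and constant names are hypothetical, and it assumes docIds are non-negative, as counters from incrementColumnValue normally are; negative longs would sort after positive ones under this encoding.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class CompositeQualifier {
    // Hypothetical type bytes: 0 for the existing info, 1 for status.
    static final byte TYPE_INFO = 0;
    static final byte TYPE_STATUS = 1;

    // 8-byte big-endian docId followed by one type byte (9 bytes total).
    static byte[] qualifier(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    static long docId(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    static byte type(byte[] qualifier) {
        return qualifier[Long.BYTES];
    }

    public static void main(String[] args) {
        // Insert out of order; the map keeps unsigned lexicographic byte order.
        TreeMap<byte[], String> row = new TreeMap<byte[], String>(Arrays::compareUnsigned);
        row.put(qualifier(10, TYPE_STATUS), "N");
        row.put(qualifier(2, TYPE_STATUS), "Y");
        row.put(qualifier(10, TYPE_INFO), "{\"doc\":10}");
        row.put(qualifier(2, TYPE_INFO), "{\"doc\":2}");
        for (Map.Entry<byte[], String> e : row.entrySet()) {
            System.out.println(docId(e.getKey()) + "/" + type(e.getKey()) + " = " + e.getValue());
        }
        // 2/0 = {"doc":2}
        // 2/1 = Y
        // 10/0 = {"doc":10}
        // 10/1 = N
    }
}
```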
>> >
>> > JG
>> >
>> >> -----Original Message-----
>> >> From: N Kapshoo [mailto:[email protected]]
>> >> Sent: Monday, June 21, 2010 9:59 AM
>> >> To: [email protected]
>> >> Subject: Long vs String for qualifier
>> >>
>> >> I have a 'long' number that I get by using
>> >> HTable.incrementColumnValue. This long is used as the qualifier id
>> >> in a column family.
>> >>
>> >> Now I need to add a prefix 'status' so that I can store another
>> >> value in the same family.
>> >>
>> >> How should I think about String vs. long sorting?
>> >>
>> >> So right now:
>> >>
>> >> colFamily: id1_asLong = val1
>> >> colFamily: id2_asLong = val2
>> >> colFamily: id3_asLong = val3
>> >> colFamily: id4_asLong = val4
>> >>
>> >> and in addition
>> >>
>> >> colFamily: status_id1_asString = val5
>> >> colFamily: status_id2_asString = val6
>> >> colFamily: status_id3_asString = val7
>> >> colFamily: status_id4_asString = val8
>> >>
>> >> To make sure that 'id' values are sorted and accessed sequentially,
>> >> should I change my design so that the id1_asLong is stored as
>> >> id1_asString?
>> >> When I do my Get, I always get id1_asLong and status_id1_asString
>> >> together.
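The String vs. long sorting question can be checked without a cluster. This sketch sorts the same ids by the unsigned byte order of two encodings: decimal strings, and 8-byte big-endian longs (the layout of HBase's Bytes.toBytes(long)). The class and method names are hypothetical, and the ids are assumed non-negative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class QualifierSortOrder {
    // Decimal string encoding, e.g. 10 -> bytes '1','0'.
    static byte[] asString(long id) {
        return Long.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    // 8-byte big-endian encoding.
    static byte[] asLong(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Sorts ids by the unsigned lexicographic order of their encoded bytes,
    // which is the order HBase keeps qualifiers in.
    static List<Long> sortedBy(Function<Long, byte[]> encoding, long... ids) {
        List<Long> out = new ArrayList<>();
        for (long id : ids) {
            out.add(id);
        }
        out.sort((a, b) -> Arrays.compareUnsigned(encoding.apply(a), encoding.apply(b)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedBy(QualifierSortOrder::asString, 9, 10, 100, 2)); // [10, 100, 2, 9]
        System.out.println(sortedBy(QualifierSortOrder::asLong, 9, 10, 100, 2));   // [2, 9, 10, 100]
    }
}
```

Plain decimal strings break numeric order, while the fixed-width long keeps it. Zero-padding the strings to a fixed width would also sort correctly, but takes more bytes than the 8-byte long.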
>> >>
>> >> Thanks.
>> >
>
