Now here is my conundrum: I would be doing both queries very often. The UI shows count(status=Y) on the very first page and then, depending on whether the user asks for a listing, shows all the other info and the status per doc.
Is it a bad idea to have both a new ColumnFamily and store it in a qualifier as well? Same data in 2 places, but it would help the read performance in both queries right?

When you say append a byte, I assume something like this, am I right?

    byte[] arr = Bytes.toBytes(docId);
    arr[arr.length] = '0';

Thanks so much for your help.

On Mon, Jun 21, 2010 at 12:33 PM, Jonathan Gray <[email protected]> wrote:
> Got it.
>
> Well, you could do what you're describing below, appending something at the
> end of the docId to notate that it's the status column. You wouldn't need to
> use a "_status" string, could be as simple as appending an additional byte of
> type information.
>
> Another option is to break status into a separate column family.
>
> What are the most common queries and which query is most critical
> performance-wise?
>
> Are you most interested in "give me all docs and their statuses for user X"
> or more like "give me the info for doc Y" or "give me status for doc Z"?
>
> If the first one, then seems like adding a type byte after the docId would
> make the most sense and be most optimal.
>
> JG
>
>> -----Original Message-----
>> From: N Kapshoo [mailto:[email protected]]
>> Sent: Monday, June 21, 2010 10:26 AM
>> To: [email protected]
>> Subject: Re: Long vs String for qualifier
>>
>> Thanks for the quick reply.
>>
>> I have a schema design based on ids because I actually have the ids as
>> rowids in another table. This is to avoid data redundancy since we
>> might have a big doc referenced by millions of users, but we dont want
>> to store a copy for every user. So,
>>
>> Table: Docs
>> Row: docId (long generated by incrementColumnValue)
>> ColFamily: Data
>>
>> Table: Users
>> Row: UserId
>> ColFamily: DocInfo
>> Qualifier: docId
>> Value: More information per user (JSON)
>>
>> Now in addition:
>> ColFamily: DocInfo
>> Qualifier: docId_status
>> Value: Status
>>
>> Now I want a status on each doc for each user. This status might
>> change several times.
>> The first column, docInfo is static, its value doesnt change once
>> inserted. However the status can be toggled back and forth (between Y
>> and N).
>>
>> The docs per user should always be sorted by docId.
>>
>> How would you design it? I am not sure how I can get the values into
>> the qualifiers when it should be sorted by docId always. Thank you.
>>
>> On Mon, Jun 21, 2010 at 12:12 PM, Jonathan Gray <[email protected]> wrote:
>> > Can you describe your schema a bit more? Could you use versioning
>> > instead of incrementing IDs on the qualifiers?
>> >
>> > Also, you could consider having a composite value, so id1_asLong
>> > would have a value that contained both val1 and val5 in your example.
>> > You could use any number of serialization strategies (comma-separated,
>> > JSON, Thrift/protobuf, Writable, etc).
>> >
>> > If you want them as two columns, I would recommend that things you
>> > want to retrieve together be neighboring. For example, you might make
>> > the qualifiers a composite type of <id_as_long><qf_type>, so
>> > <id1_asLong><0byte> for the existing stuff and <id1_asLong><1byte> for
>> > status? That way they are stored sequentially so optimally efficient
>> > at read time.
>> >
>> > JG
>> >
>> >> -----Original Message-----
>> >> From: N Kapshoo [mailto:[email protected]]
>> >> Sent: Monday, June 21, 2010 9:59 AM
>> >> To: [email protected]
>> >> Subject: Long vs String for qualifier
>> >>
>> >> I have a 'long' number that I get by using
>> >> HTable.'incrementColumnValue'. This long is used as the qualifier id
>> >> on a columnFamily.
>> >>
>> >> Now I need to add a prefix 'status' so that I can store another value
>> >> in the same family.
>> >>
>> >> How should I consider String vs long sorting?
>> >>
>> >> So right now:
>> >>
>> >> colFamily: id1_asLong = val1
>> >> colFamily: id2_asLong = val2
>> >> colFamily: id3_asLong = val3
>> >> colFamily: id4_asLong = val4
>> >>
>> >> and in addition
>> >>
>> >> colFamily: status_id1_asString = val5
>> >> colFamily: status_id2_asString = val6
>> >> colFamily: status_id3_asString = val7
>> >> colFamily: status_id4_asString = val8
>> >>
>> >> To make sure that 'id' values are sorted and accessed sequentially,
>> >> should I change my design so that the id1_asLong is stored as
>> >> id1_asString?
>> >> When I do my Get, I always get id1_asLong and status_id1_asString
>> >> together.
>> >>
>> >> Thanks.
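P.S. A note on the two-line snippet in my question: as written it would throw ArrayIndexOutOfBoundsException, because Java arrays are fixed-size and arr[arr.length] is one past the end; also '0' is the character 48, not the byte 0. Below is a minimal sketch of what I understand the <id_as_long><type byte> qualifier to be. The class name and the TYPE_INFO / TYPE_STATUS constants are made up for illustration, and it uses java.nio.ByteBuffer rather than HBase's Bytes so that it runs on its own (Bytes.toBytes(long) gives the same 8-byte big-endian prefix; with HBase on the classpath I believe Bytes.add(Bytes.toBytes(docId), new byte[] { type }) would build the same array). It assumes docIds are non-negative, since negative longs would sort after positive ones in unsigned byte order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class QualifierSketch {
    // Hypothetical type markers; the thread only suggests "a 0 byte" and "a 1 byte".
    static final byte TYPE_INFO = 0;
    static final byte TYPE_STATUS = 1;

    // Build a 9-byte qualifier: 8-byte big-endian docId followed by one type byte.
    static byte[] qualifier(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    // Read the docId back from the first 8 bytes.
    static long docIdOf(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    // Read the trailing type byte.
    static byte typeOf(byte[] qualifier) {
        return qualifier[Long.BYTES];
    }

    public static void main(String[] args) {
        byte[][] quals = {
            qualifier(2L, TYPE_STATUS), qualifier(1L, TYPE_STATUS),
            qualifier(2L, TYPE_INFO), qualifier(1L, TYPE_INFO)
        };
        // Sort as unsigned lexicographic bytes, the order HBase keeps qualifiers in.
        Arrays.sort(quals, (a, b) -> Arrays.compareUnsigned(a, b));
        for (byte[] q : quals) {
            System.out.println("docId=" + docIdOf(q) + " type=" + typeOf(q));
        }
    }
}
```

Sorted this way, the info and status qualifiers for the same docId end up next to each other (docId 1 info, docId 1 status, docId 2 info, docId 2 status), which is the "neighboring" layout suggested above, and the docs stay ordered by docId.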
