I should mention that the current 'status' is part of the JSON. But I am trying to separate it out, since there will be a 'count' and an updateStatus call on it, and I do not want to keep reading and rewriting the entire JSON just for this one flag. Correct me if you think I am going down the wrong path... Thanks.
On Mon, Jun 21, 2010 at 12:25 PM, N Kapshoo <[email protected]> wrote:
> Thanks for the quick reply.
>
> I have a schema design based on ids because I actually have the ids as
> rowids in another table. This is to avoid data redundancy, since we
> might have a big doc referenced by millions of users, but we don't want
> to store a copy for every user. So,
>
> Table: Docs
> Row: docId (long generated by incrementColumnValue)
> ColFamily: Data
>
> Table: Users
> Row: UserId
> ColFamily: DocInfo
> Qualifier: docId
> Value: More information per user (JSON)
>
> Now in addition:
> ColFamily: DocInfo
> Qualifier: docId_status
> Value: Status
>
> Now I want a status on each doc for each user. This status might
> change several times.
> The first column, docInfo, is static; its value doesn't change once
> inserted. However, the status can be toggled back and forth (between Y
> and N).
>
> The docs per user should always be sorted by docId.
>
> How would you design it? I am not sure how I can get the values into
> the qualifiers when they should always be sorted by docId. Thank you.
>
> On Mon, Jun 21, 2010 at 12:12 PM, Jonathan Gray <[email protected]> wrote:
>> Can you describe your schema a bit more? Could you use versioning instead
>> of incrementing IDs on the qualifiers?
>>
>> Also, you could consider having a composite value, so id1_asLong would have
>> a value that contained both val1 and val5 in your example. You could use
>> any number of serialization strategies (comma-separated, JSON,
>> Thrift/protobuf, Writable, etc).
>>
>> If you want them as two columns, I would recommend that things you want to
>> retrieve together be neighboring. For example, you might make the
>> qualifiers a composite type of <id_as_long><qf_type>, so <id1_asLong><0byte>
>> for the existing stuff and <id1_asLong><1byte> for status? That way they
>> are stored sequentially and are optimally efficient at read time.
>>
>> JG
>>
>>> -----Original Message-----
>>> From: N Kapshoo [mailto:[email protected]]
>>> Sent: Monday, June 21, 2010 9:59 AM
>>> To: [email protected]
>>> Subject: Long vs String for qualifier
>>>
>>> I have a 'long' number that I get by using
>>> HTable.incrementColumnValue. This long is used as the qualifier id
>>> on a columnFamily.
>>>
>>> Now I need to add a prefix 'status' so that I can store another value
>>> in the same family.
>>>
>>> How should I think about String vs long sorting?
>>>
>>> So right now:
>>>
>>> colFamily: id1_asLong = val1
>>> colFamily: id2_asLong = val2
>>> colFamily: id3_asLong = val3
>>> colFamily: id4_asLong = val4
>>>
>>> and in addition
>>>
>>> colFamily: status_id1_asString = val5
>>> colFamily: status_id2_asString = val6
>>> colFamily: status_id3_asString = val7
>>> colFamily: status_id4_asString = val8
>>>
>>> To make sure that 'id' values are sorted and accessed sequentially,
>>> should I change my design so that id1_asLong is stored as
>>> id1_asString?
>>> When I do my Get, I always get id1_asLong and status_id1_asString
>>> together.
>>>
>>> Thanks.
>>
>
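On the original Long vs String question: HBase orders qualifiers within a family by comparing their raw bytes left to right as unsigned values. The sketch below is plain Java with no HBase dependency, and the class and method names are made up for illustration. It assumes the ids are written as 8-byte big-endian longs (the layout `Bytes.toBytes(long)` produces) and that the incrementColumnValue counter never goes negative. Under those assumptions the long qualifiers sort numerically, while decimal strings such as `status_9` and `status_10` do not, unless they are zero-padded to a fixed width.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QualifierSortDemo {

    // 8-byte big-endian encoding; the same layout HBase's Bytes.toBytes(long) produces.
    static byte[] longQualifier(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Decimal string encoding with a prefix, e.g. "status_10".
    static byte[] stringQualifier(String prefix, long id) {
        return (prefix + id).getBytes(StandardCharsets.UTF_8);
    }

    static long decodeLong(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    // Orders byte arrays lexicographically, treating each byte as unsigned,
    // which is how HBase orders qualifiers within a column family.
    static List<byte[]> sortLikeHBase(List<byte[]> qualifiers) {
        List<byte[]> sorted = new ArrayList<>(qualifiers);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));
        return sorted;
    }

    public static void main(String[] args) {
        long[] ids = {100, 9, 1, 10, 2};
        List<byte[]> asLongs = new ArrayList<>();
        List<byte[]> asStrings = new ArrayList<>();
        for (long id : ids) {
            asLongs.add(longQualifier(id));
            asStrings.add(stringQualifier("status_", id));
        }

        List<String> longOrder = new ArrayList<>();
        for (byte[] q : sortLikeHBase(asLongs)) {
            longOrder.add(Long.toString(decodeLong(q)));
        }
        List<String> stringOrder = new ArrayList<>();
        for (byte[] q : sortLikeHBase(asStrings)) {
            stringOrder.add(new String(q, StandardCharsets.UTF_8));
        }

        // Prints: long:   1 2 9 10 100
        System.out.println("long:   " + String.join(" ", longOrder));
        // Prints: string: status_1 status_10 status_100 status_2 status_9
        System.out.println("string: " + String.join(" ", stringOrder));
    }
}
```

Mixing the two encodings in one family also keeps the data and status columns apart rather than interleaved: for any id below 2^56 the big-endian long starts with a 0x00 byte, so every id1_asLong-style qualifier sorts before every qualifier that starts with the ASCII `s` (0x73) of `status_`.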
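Jonathan's composite-qualifier suggestion can be sketched the same way. This assumes the layout `<8-byte big-endian docId><1 type byte>`, with 0 for the existing DocInfo JSON and 1 for the status flag; the class and constant names are hypothetical. Sorting these 9-byte qualifiers as unsigned bytes puts each doc's status cell directly after its DocInfo cell, and the docs stay in docId order.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompositeQualifier {

    // Hypothetical type tags stored in the trailing byte of the qualifier.
    static final byte TYPE_DOC_INFO = 0; // the existing per-user JSON
    static final byte TYPE_STATUS = 1;   // the Y/N status flag

    // Builds <8-byte big-endian docId><1 type byte>, 9 bytes in total.
    static byte[] encode(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    static long docId(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    static byte type(byte[] qualifier) {
        return qualifier[Long.BYTES];
    }

    public static void main(String[] args) {
        List<byte[]> qualifiers = new ArrayList<>();
        // Insert in scrambled order to show that the byte ordering does the work.
        for (long id : new long[] {10, 2, 1}) {
            qualifiers.add(encode(id, TYPE_STATUS));
            qualifiers.add(encode(id, TYPE_DOC_INFO));
        }

        // Unsigned lexicographic byte order, as HBase uses for qualifiers.
        qualifiers.sort((a, b) -> Arrays.compareUnsigned(a, b));

        // Prints, one per line: 1 docInfo, 1 status, 2 docInfo, 2 status, 10 docInfo, 10 status
        for (byte[] q : qualifiers) {
            String kind = type(q) == TYPE_STATUS ? "status" : "docInfo";
            System.out.println(docId(q) + " " + kind);
        }
    }
}
```

Because the status lives in its own small cell, toggling it between Y and N would be a Put on one qualifier rather than a rewrite of the JSON value, which is the concern raised at the top of this thread. In real client code, `Bytes.toBytes(docId)` with one byte appended should yield the same qualifier bytes as `encode`; that equivalence is the only HBase-specific assumption here.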
