Sounds reasonable.
> -----Original Message-----
> From: N Kapshoo [mailto:[email protected]]
> Sent: Monday, June 21, 2010 10:31 AM
> To: [email protected]
> Subject: Re: Long vs String for qualifier
>
> I should mention that current 'status' is part of the JSON. But I am
> trying to separate it out since there will be a 'count' and
> updateStatus call on this and I do not want to keep reading/writing
> the entire json just for this 1 flag.
> Correct me if you think I am going down the wrong path...
> Thanks.
>
> On Mon, Jun 21, 2010 at 12:25 PM, N Kapshoo <[email protected]> wrote:
> > Thanks for the quick reply.
> >
> > I have a schema design based on ids because I actually have the ids as
> > rowids in another table. This is to avoid data redundancy since we
> > might have a big doc referenced by millions of users, but we dont want
> > to store a copy for every user. So,
> >
> > Table: Docs
> > Row: docId (long generated by incrementColumnValue)
> > ColFamily: Data
> >
> > Table: Users
> > Row: UserId
> > ColFamily: DocInfo
> > Qualifier: docId
> > Value: More information per user (JSON)
> >
> > Now in addition:
> > ColFamily: DocInfo
> > Qualifier: docId_status
> > Value: Status
> >
> > Now I want a status on each doc for each user. This status might
> > change several times.
> > The first column, docInfo is static, its value doesnt change once
> > inserted. However the status can be toggled back and forth (between Y
> > and N).
> >
> > The docs per user should always be sorted by docId.
> >
> > How would you design it? I am not sure how I can get the values into
> > the qualifiers when it should be sorted by docId always. Thank you.
> >
> > On Mon, Jun 21, 2010 at 12:12 PM, Jonathan Gray <[email protected]> wrote:
> >> Can you describe your schema a bit more? Could you use versioning
> >> instead of incrementing IDs on the qualifiers?
> >>
> >> Also, you could consider having a composite value, so id1_asLong
> >> would have a value that contained both val1 and val5 in your example.
> >> You could use any number of serialization strategies (comma-separated,
> >> JSON, Thrift/protobuf, Writable, etc).
> >>
> >> If you want them as two columns, I would recommend that things you
> >> want to retrieve together be neighboring. For example, you might make
> >> the qualifiers a composite type of <id_as_long><qf_type>, so
> >> <id1_asLong><0byte> for the existing stuff and <id1_asLong><1byte> for
> >> status? That way they are stored sequentially so optimally efficient
> >> at read time.
> >>
> >> JG
> >>
> >>> -----Original Message-----
> >>> From: N Kapshoo [mailto:[email protected]]
> >>> Sent: Monday, June 21, 2010 9:59 AM
> >>> To: [email protected]
> >>> Subject: Long vs String for qualifier
> >>>
> >>> I have a 'long' number that I get by using
> >>> HTable.'incrementColumnValue'. This long is used as the qualifier id
> >>> on a columnFamily.
> >>>
> >>> Now I need to add a prefix 'status' so that I can store another value
> >>> in the same family.
> >>>
> >>> How should I consider String vs long sorting?
> >>>
> >>> So right now:
> >>>
> >>> colFamily: id1_asLong = val1
> >>> colFamily: id2_asLong = val2
> >>> colFamily: id3_asLong = val3
> >>> colFamily: id4_asLong = val4
> >>>
> >>> and in addition
> >>>
> >>> colFamily: status_id1_asString = val5
> >>> colFamily: status_id2_asString = val6
> >>> colFamily: status_id3_asString = val7
> >>> colFamily: status_id4_asString = val8
> >>>
> >>> To make sure that 'id' values are sorted and accessed sequentially,
> >>> should I change my design so that the id1_asLong is stored as
> >>> id1_asString?
> >>> When I do my Get, I always get id1_asLong and status_id1_asString
> >>> together.
> >>>
> >>> Thanks.
> >>
> >
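
For reference, here is a rough sketch of the <id_as_long><qf_type> qualifier layout suggested in the quoted thread. It uses only the JDK, not the HBase client, so the class name CompositeQualifier, the constants TYPE_INFO and TYPE_STATUS, and the helper methods are all made up for illustration. It assumes docIds are non-negative (as they would be when generated by incrementColumnValue starting from zero) and that qualifiers are ordered by unsigned byte-wise comparison, which is how HBase sorts them.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these names come from HBase or from the thread.
final class CompositeQualifier {
    // Hypothetical type tags: 0 for the per-user doc info, 1 for the status flag.
    static final byte TYPE_INFO = 0;
    static final byte TYPE_STATUS = 1;

    // Qualifier layout: 8-byte big-endian docId followed by a 1-byte type tag.
    // ByteBuffer writes big-endian by default.
    static byte[] encode(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    static long docId(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    static byte type(byte[] qualifier) {
        return qualifier[Long.BYTES];
    }

    // Unsigned lexicographic byte comparison, written out by hand so the
    // sketch does not depend on a particular JDK version.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        List<byte[]> qualifiers = new ArrayList<>();
        for (long id : new long[] {10L, 9L, 2L}) {
            qualifiers.add(encode(id, TYPE_STATUS));
            qualifiers.add(encode(id, TYPE_INFO));
        }
        qualifiers.sort(CompositeQualifier::compare);
        // Each docId's info and status qualifiers end up next to each other,
        // and docIds come out in numeric order.
        for (byte[] q : qualifiers) {
            System.out.println(docId(q) + " type=" + type(q));
        }
        // Decimal strings do not sort numerically unless zero-padded:
        // "10" compares lower than "9".
        System.out.println("\"10\".compareTo(\"9\") < 0: " + ("10".compareTo("9") < 0));
    }
}
```

With fixed-width binary longs the byte order matches the numeric order for non-negative ids, so there is no need to switch the ids to strings. A "status_" string prefix, by contrast, would group all status columns together, away from the docId columns they belong to.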
