RE: Long vs String for qualifier

Jonathan Gray Mon, 21 Jun 2010 10:34:15 -0700

Sounds reasonable.


> -----Original Message-----
> From: N Kapshoo [mailto:[email protected]]
> Sent: Monday, June 21, 2010 10:31 AM
> To: [email protected]
> Subject: Re: Long vs String for qualifier
> 
> I should mention that currently 'status' is part of the JSON. But I am
> trying to separate it out since there will be 'count' and
> updateStatus calls on it, and I do not want to keep reading/writing
> the entire JSON just for this one flag.
> Correct me if you think I am going down the wrong path...
> Thanks.
> 
> On Mon, Jun 21, 2010 at 12:25 PM, N Kapshoo <[email protected]> wrote:
> > Thanks for the quick reply.
> >
> > I have a schema design based on ids because I actually have the ids as
> > rowids in another table. This is to avoid data redundancy since we
> > might have a big doc referenced by millions of users, but we don't want
> > to store a copy for every user. So,
> >
> > Table: Docs
> > Row: docId (long generated by incrementColumnValue)
> > ColFamily: Data
> >
> > Table: Users
> > Row: UserId
> > ColFamily: DocInfo
> > Qualifier: docId
> > Value: More information per user (JSON)
> >
> > Now in addition:
> > ColFamily: DocInfo
> > Qualifier: docId_status
> > Value: Status
> >
> > Now I want a status on each doc for each user. This status might
> > change several times.
> > The first column, docInfo, is static; its value doesn't change once
> > inserted. However, the status can be toggled back and forth (between Y
> > and N).
> >
> > The docs per user should always be sorted by docId.
> >
> > How would you design it? I am not sure how I can get the values into
> > the qualifiers when it should be sorted by docId always. Thank you.
> >
> > On Mon, Jun 21, 2010 at 12:12 PM, Jonathan Gray <[email protected]> wrote:
> >> Can you describe your schema a bit more?  Could you use versioning
> >> instead of incrementing IDs on the qualifiers?
> >>
> >> Also, you could consider having a composite value, so id1_asLong
> >> would have a value that contained both val1 and val5 in your example.
> >> You could use any number of serialization strategies (comma-separated,
> >> JSON, Thrift/protobuf, Writable, etc).
> >>
> >> If you want them as two columns, I would recommend that things you
> >> want to retrieve together be neighboring.  For example, you might make
> >> the qualifiers a composite type of <id_as_long><qf_type>, so
> >> <id1_asLong><0byte> for the existing stuff and <id1_asLong><1byte> for
> >> status?  That way they are stored sequentially, which is optimally
> >> efficient at read time.
> >>
> >> JG
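
The <id_as_long><qf_type> layout suggested above might look roughly like
the sketch below. It is plain Java with no HBase client dependency, and
the class name, the method names, and the two tag constants are all
hypothetical. It assumes the id is written as 8 big-endian bytes (the
same layout Bytes.toBytes(long) produces) and that qualifiers are
compared byte by byte as unsigned values.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CompositeQualifier {
    // Hypothetical one-byte tags; the thread only says "0 byte" and "1 byte".
    static final byte TYPE_INFO = 0;    // the per-user JSON column
    static final byte TYPE_STATUS = 1;  // the status flag column

    // Build a 9-byte qualifier: 8 big-endian bytes of docId, then the type tag.
    static byte[] qualifier(long docId, byte type) {
        return ByteBuffer.allocate(Long.BYTES + 1).putLong(docId).put(type).array();
    }

    // Recover the docId from the first 8 bytes of a qualifier.
    static long docIdOf(byte[] q) {
        return ByteBuffer.wrap(q).getLong();
    }

    // Recover the type tag from the last byte of a qualifier.
    static byte typeOf(byte[] q) {
        return q[Long.BYTES];
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[][] qualifiers = {
            qualifier(2L, TYPE_STATUS), qualifier(1L, TYPE_STATUS),
            qualifier(2L, TYPE_INFO), qualifier(1L, TYPE_INFO)
        };
        // Sort the way the qualifiers would be ordered within a row.
        Arrays.sort(qualifiers, CompositeQualifier::compare);
        for (byte[] q : qualifiers) {
            System.out.println("docId=" + docIdOf(q) + " type=" + typeOf(q));
        }
    }
}
```

Sorted this way, each doc's info column is immediately followed by its
status column, and docs stay in docId order as long as the ids are
non-negative.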
> >>
> >>> -----Original Message-----
> >>> From: N Kapshoo [mailto:[email protected]]
> >>> Sent: Monday, June 21, 2010 9:59 AM
> >>> To: [email protected]
> >>> Subject: Long vs String for qualifier
> >>>
> >>> I have a 'long' number that I get by using
> >>> HTable.'incrementColumnValue'. This long is used as the qualifier id
> >>> on a columnFamily.
> >>>
> >>> Now I need to add a prefix 'status' so that I can store another value
> >>> in the same family.
> >>>
> >>> How should I consider String vs long sorting?
> >>>
> >>> So right now:
> >>>
> >>> colFamily: id1_asLong = val1
> >>> colFamily: id2_asLong = val2
> >>> colFamily: id3_asLong = val3
> >>> colFamily: id4_asLong = val4
> >>>
> >>> and in addition
> >>>
> >>> colFamily: status_id1_asString = val5
> >>> colFamily: status_id2_asString = val6
> >>> colFamily: status_id3_asString = val7
> >>> colFamily: status_id4_asString = val8
> >>>
> >>> To make sure that 'id' values are sorted and accessed sequentially,
> >>> should I change my design so that the id1_asLong is stored as
> >>> id1_asString?
> >>> When I do my Get, I always get id1_asLong and status_id1_asString
> >>> together.
> >>>
> >>> Thanks.
> >>
> >
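
On the original long-vs-String question, a small sketch of how the two
encodings order may help. It is plain Java rather than HBase client
code, and the class and method names are made up for illustration. It
assumes qualifiers sort as unsigned bytes, that the long is stored as 8
big-endian bytes (what Bytes.toBytes(long) produces), and that the
String form is the decimal text of the id with no zero padding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class QualifierOrdering {
    // Encode the id as 8 big-endian bytes (fixed width).
    static byte[] longBytes(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Encode the id as its unpadded decimal string in UTF-8 (variable width).
    static byte[] stringBytes(long id) {
        return Long.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Fixed-width longs keep numeric order for non-negative ids: 9 before 10.
        System.out.println(compare(longBytes(9), longBytes(10)) < 0);
        // Unpadded strings do not: "10" sorts before "9".
        System.out.println(compare(stringBytes(9), stringBytes(10)) > 0);
    }
}
```

Both lines print true. So storing ids as unpadded Strings would break
the docId ordering, while 8-byte longs keep it for the non-negative
values a counter normally yields. A String would need zero padding to a
fixed width to sort numerically.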
