Why is that? AFAIK everything is just a byte sequence. What prevents non-printable characters from being used in CF/table names?
- Nasron

On Thu, Nov 7, 2013 at 8:39 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> This is fine for the key. Just so you are aware, you can not use this for
> table name and CF name since they need to be printable characters only.
>
> JM
>
>
> 2013/11/6 Nasron Cheong <[email protected]>
>
> > Yes, after some digging around, the key is to store integers as byte
> > representation, but more importantly to store them as big-endian so that
> > the lexicographical sequence is maintained.
> >
> > Thanks!
> >
> > - Nasron
> >
> >
> > On Tue, Nov 5, 2013 at 8:28 PM, Premal Shah <[email protected]> wrote:
> >
> > > you can store the byte representation of the integer (fixed length)
> > > instead of the integer (which will be stored as strings of variable
> > > length) and will also be sorted.
> > >
> > >
> > > On Tue, Nov 5, 2013 at 1:58 PM, Nasron Cheong <[email protected]> wrote:
> > >
> > > > Yes, its limited in the sense that we have to precalculate the number
> > > > of digits required so we don't run out, and if we overestimate, then
> > > > our row keys end up taking up more space than we'd care to.
> > > >
> > > > We can probably live with this approach for now, but I wonder if
> > > > there's a better way.
> > > >
> > > > - Nasron
> > > >
> > > >
> > > > On Tue, Nov 5, 2013 at 12:28 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > >
> > > > > Hi Nasron,
> > > > >
> > > > > Why are you saying that it's a limited way? Does it achieve your
> > > > > needs?
> > > > >
> > > > >
> > > > > 2013/11/4 Nasron Cheong <[email protected]>
> > > > >
> > > > > > An example query would be the following, say the column qualifier
> > > > > > was of the form
> > > > > >
> > > > > > <bucket #>:<msg type>
> > > > > >
> > > > > > where <bucket #> should be an integer value, and msg type is a
> > > > > > string. E.g.
> > > > > >
> > > > > > 1:abc
> > > > > > 1000:abc
> > > > > > 2: abc
> > > > > >
> > > > > > would appear in the above sequence, which is out of order when
> > > > > > doing prefix filtering. Zero padding could fix this:
> > > > > >
> > > > > > 0001:abc
> > > > > > 0002:abc
> > > > > > 1000: abc
> > > > > >
> > > > > > But is a limited way of ensuring the sequence of CQ (column
> > > > > > qualifiers) is correct, in order for prefix filtering to work. Are
> > > > > > there other options?
> > > > > >
> > > > > > - Nasron
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 31, 2013 at 9:19 PM, Nasron Cheong <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm trying to determine the best way to serialize a sequence of
> > > > > > > integers/strings that represent a hierarchy for a column
> > > > > > > qualifier, which would be compatible with the ColumnPrefixFilters,
> > > > > > > and BinaryComparators.
> > > > > > >
> > > > > > > However, due to the lexicographical sorting, it's awkward to
> > > > > > > serialize the sequence of values needed to get it to work.
> > > > > > >
> > > > > > > What are the typical solutions to this? Do people just zero pad
> > > > > > > integers to make sure they sort correctly? Or do I have to
> > > > > > > implement my own QualifierFilter - which seems expensive since I'd
> > > > > > > be deserializing every byte array just to compare.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > - Nasron
> > >
> > > --
> > > Regards,
> > > Premal Shah.
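
For anyone reading this thread later, the fixed-length big-endian idea discussed above can be sketched roughly as follows. This is only an illustration in plain Python, not HBase client code: `encode_qualifier` and `bucket_prefix` are hypothetical helper names, the 4-byte unsigned width is an assumption, and negative bucket numbers are not handled (they would need extra care to sort correctly).

```python
import struct

def bucket_prefix(bucket: int) -> bytes:
    """Hypothetical helper: encode a non-negative bucket number as 4 big-endian bytes."""
    if not 0 <= bucket <= 0xFFFFFFFF:
        raise ValueError("bucket must fit in an unsigned 32-bit integer")
    # ">I" means big-endian, unsigned 32-bit: most significant byte first.
    return struct.pack(">I", bucket)

def encode_qualifier(bucket: int, msg_type: str) -> bytes:
    """Hypothetical helper: <4-byte bucket>:<msg type> as a byte string."""
    return bucket_prefix(bucket) + b":" + msg_type.encode("utf-8")

buckets = [1, 1000, 2]

# Decimal text qualifiers compare byte by byte, so they do not sort numerically.
as_text = sorted(f"{b}:abc".encode("utf-8") for b in buckets)
text_order = [int(q.split(b":")[0]) for q in as_text]

# Fixed-length big-endian qualifiers sort in numeric order under byte comparison.
as_bytes = sorted(encode_qualifier(b, "abc") for b in buckets)
byte_order = [struct.unpack(">I", q[:4])[0] for q in as_bytes]

# Little-endian breaks the ordering: 256 would sort before 1.
little_endian_breaks = struct.pack("<I", 256) < struct.pack("<I", 1)

# All qualifiers for one bucket share the same 4-byte prefix,
# which is what a prefix-style filter would match on.
shares_prefix = encode_qualifier(1000, "abc").startswith(bucket_prefix(1000))
```

Compared with zero padding, the width is fixed by the integer type rather than by a guessed number of decimal digits, at the cost of qualifiers that are no longer human-readable.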
