Re: Sorting question

Chris Tarnas Mon, 22 Aug 2011 19:21:19 -0700

That has worked well for me when I don't care about using printable characters.
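
To see why the delimiter's byte value matters, here is a minimal sketch in Python, used only as a stand-in for HBase's byte-wise row key comparison. The helper names make_key and sorted_queries are made up for illustration, and the keys are the two quoted further down in this thread.

```python
# Row keys are compared byte by byte, so the delimiter's byte value decides
# whether "foo" or "foo bar" comes first.

def make_key(query: str, ts: int, delim: bytes) -> bytes:
    """Build a composite row key: prefix, query and timestamp joined by delim."""
    parts = [b"idx_query", query.encode("utf-8"), str(ts).encode("ascii")]
    return delim.join(parts)

def sorted_queries(delim: bytes) -> list:
    """Sort the two example keys and return just their query parts, in order."""
    keys = [
        make_key("foo bar", 9223372035540718511, delim),
        make_key("foo", 9223372035540718648, delim),
    ]
    # Python orders bytes objects lexicographically by unsigned byte value,
    # which is the same rule HBase applies to row keys.
    return [k.split(delim)[1].decode("utf-8") for k in sorted(keys)]

slash_order = sorted_queries(b"/")    # "/" is 0x2f, greater than space (0x20)
tab_order = sorted_queries(b"\t")     # tab is 0x09, less than space
nul_order = sorted_queries(b"\x00")   # 0x00 is less than every other byte

print(slash_order)  # ['foo bar', 'foo']
print(tab_order)    # ['foo', 'foo bar']
print(nul_order)    # ['foo', 'foo bar']
```

With "/" the key for "foo bar" sorts first because space (0x20) is less than "/" (0x2f). With tab or 0x00 the delimiter is less than space, so the shorter "foo" sorts first.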


-chris
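
On the start/stop key question further down the thread: one common approach, offered here as an assumption rather than something confirmed in this thread, is to use the prefix (including its trailing delimiter) as the start row and the same prefix with its last byte incremented as the stop row, since HBase treats the stop row as exclusive. A rough Python sketch, with prefix_scan_bounds as a hypothetical helper and 0x00 assumed as the delimiter:

```python
def prefix_scan_bounds(prefix: bytes) -> tuple:
    """Return (start_row, stop_row) covering every key that starts with prefix.

    The stop row is the prefix with trailing 0xff bytes dropped and the last
    remaining byte incremented. The stop row itself is excluded from the scan.
    """
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:
        stop.pop()  # 0xff cannot be incremented, so carry into the previous byte
    if not stop:
        # No byte left to increment: an empty stop row means "scan to the end".
        return prefix, b""
    stop[-1] += 1
    return prefix, bytes(stop)

# Assumes keys shaped like idx_query<0x00>query<0x00>timestamp.
start_row, stop_row = prefix_scan_bounds(b"idx_query\x00foo\x00")

def in_range(key: bytes) -> bool:
    """Mimic a scan's range check: start row inclusive, stop row exclusive."""
    return start_row <= key < stop_row

print(start_row)  # b'idx_query\x00foo\x00'
print(stop_row)   # b'idx_query\x00foo\x01'
```

Here the key for "foo" falls inside the range while the key for "foo bar" does not, because space (0x20) is greater than the incremented delimiter (0x01).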

On Aug 22, 2011, at 6:06 PM, Mark <[email protected]> wrote:

> How about an empty byte (0x00)?
> 
> On 8/22/11 6:03 PM, Chris Tarnas wrote:
>> Generally you want your delimiters to be less than any valid character. For 
>> normal character data I've found tab (0x09) works well; it's pretty much the 
>> first option. Forward slash (0x2f) is less reliable, depending on what other 
>> non-alphanumeric characters are allowed, since any allowed character below it 
>> (space, 0x20, for example) will sort ahead of the delimiter.
>> 
>> -chris
>> 
>> 
>> 
>> On Aug 22, 2011, at 5:04 PM, Mark wrote:
>> 
>>> I have another question though ;)
>>> 
>>> Is there a better separator I could use to accomplish natural sorting? Also 
>>> what is the preferred way to use start and stop keys when scanning? For 
>>> example: STARTROW =>  "foo", ENDROW =>  "foo#{what should go here?}".
>>> 
>>> Thanks
>>> 
>>> On 8/22/11 4:59 PM, Mark wrote:
>>>> After further investigation it turns out it is my use case.
>>>> 
>>>> My keys are actually in the form of:
>>>> "idx_query/foo bar/9223372035540718511"
>>>> "idx_query/foo/9223372035540718648"
>>>> 
>>>> Now that I look at it, it makes perfect sense why "foo bar" comes before 
>>>> "foo/"
>>>> 
>>>> Sorry for the confusion.
>>>> 
>>>> On 8/22/11 9:16 AM, Chris Tarnas wrote:
>>>>> Good point on the sorting issues with Thrift - what client language are 
>>>>> you using? Using Perl I have not seen inconsistencies in ordering.
>>>>> 
>>>>> Do your strings have any particular terminator that is being included but 
>>>>> not seen in your output? Can you send out the rowkeys from scans in the 
>>>>> HBase shell? That would help narrow it down.
>>>>> 
>>>>> -chris
>>>>> 
>>>>> 
>>>>> 
>>>>> On Aug 22, 2011, at 10:55 AM, Jesse Hutton wrote:
>>>>> 
>>>>>> I don't use the thrift API, but my suspicion is that it doesn't return
>>>>>> results in the correct order. You're not the only one I've seen report
>>>>>> strange things about results ordering recently, and IIRC they were using
>>>>>> thrift as well.
>>>>>> 
>>>>>> Can you verify that the results sort the same using the Java API or even
>>>>>> by looking at it in the HBase shell?
>>>>>> 
>>>>>> Jesse
>>>>>> 
>>>>>> On Mon, Aug 22, 2011 at 11:28 AM, Mark <[email protected]> wrote:
>>>>>> 
>>>>>>> I'm still also confused about how "foo " is less than "foo". Aren't their
>>>>>>> respective bytes [102, 111, 111, 32] , and [102, 111, 111] ?
>>>>>>> 
>>>>>>> 
>>>>>>> On 8/22/11 7:33 AM, Mark wrote:
>>>>>>> 
>>>>>>>> Is there any way around this to achieve natural ordering? Thanks
>>>>>>>> 
>>>>>>>> On 8/21/11 10:17 PM, Chris Tarnas wrote:
>>>>>>>> 
>>>>>>>>> HBase doesn't use localized sorting rules; it sorts on the byte
>>>>>>>>> value. Space is ASCII 32, a value less than any alphanumeric
>>>>>>>>> character.
>>>>>>>>> 
>>>>>>>>> -chris
>>>>>>>>> 
>>>>>>>>> On Aug 21, 2011, at 8:11 PM, Mark <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> FYI I am using the openScannerWithPrefix Thrift API call
>>>>>>>>>> 
>>>>>>>>>> On 8/21/11 6:47 PM, Mark wrote:
>>>>>>>>>> 
>>>>>>>>>>> Why when scanning do I see the following sort order?
>>>>>>>>>>> 
>>>>>>>>>>> "foo  bar"
>>>>>>>>>>> "foo bar"
>>>>>>>>>>> "foo"
>>>>>>>>>>> 
>>>>>>>>>>> I thought that "foo" would be sorted before "foo bar" since this is
>>>>>>>>>>> natural ordering. Why am I seeing these results?
>>>>>>>>>>>
