It looks like converting an ASCII-8BIT byte array to a UTF-8 string
causes this decoding error.
See: http://stackoverflow.com/a/11162470

I found a workaround: explicitly convert the strings to byte arrays
in the HBase shell. For example, executing the following command

split 'tsdb-test'.to_java_bytes, "\x00\x00\xC0".to_java_bytes

which will call the method:

void HBaseAdmin.split(byte[] tableNameOrRegionName, byte[] splitPoint)

rather than

void HBaseAdmin.split(String tableNameOrRegionName, String splitPoint)

works correctly.
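
To illustrate why the String overload could corrupt the key, here is a
minimal standalone Java sketch. It assumes the binary key ends up being
decoded and re-encoded as UTF-8 somewhere on the String path; I have not
traced the exact place in the shell or client where that happens. The class
and method names (SplitKeyRoundTrip, roundTripUtf8, hex) are made up for
this example and are not part of HBase.

```java
import java.nio.charset.StandardCharsets;

public class SplitKeyRoundTrip {
    // Hypothetical helper: mimics a binary key being turned into a
    // java.lang.String (decoded as UTF-8) and then back into bytes.
    static byte[] roundTripUtf8(byte[] raw) {
        // Malformed input is replaced with U+FFFD by this constructor.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes in the same \xNN style the HBase shell uses.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xC3 is a UTF-8 lead byte with no continuation byte after it.
        byte[] key = {0x00, 0x00, (byte) 0xC3};
        System.out.println("original:   " + hex(key));
        System.out.println("round trip: " + hex(roundTripUtf8(key)));
    }
}
```

The lone 0xC3 byte is not valid UTF-8, so it is decoded to U+FFFD, which
encodes back to EF BF BD. That matches the split point I observed. Passing
byte[] directly skips the String step entirely.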

I'm wondering whether it would be more reasonable to make this the default
behavior. It would work for both plain strings and arbitrary byte arrays.



On Thu, Jul 4, 2013 at 7:10 PM, Ding Haifeng <[email protected]> wrote:

> It seems that the JRuby-based HBase shell does not handle ASCII-8BIT
> correctly. Is there any workaround for this?
>
> My locale settings are all en_US.
>
> LANG=en_US
> LC_CTYPE="en_US"
> LC_NUMERIC="en_US"
> LC_TIME="en_US"
> LC_COLLATE="en_US"
> LC_MONETARY="en_US"
> LC_MESSAGES="en_US"
> LC_PAPER="en_US"
> LC_NAME="en_US"
> LC_ADDRESS="en_US"
> LC_TELEPHONE="en_US"
> LC_MEASUREMENT="en_US"
> LC_IDENTIFICATION="en_US"
> LC_ALL=
>
>
> On Thu, Jul 4, 2013 at 4:00 PM, Ding Haifeng <[email protected]> wrote:
>
>> @stack: Thanks for the explanation. I understand the difference between
>> single quotes and double quotes. Having single quotes interpret the string
>> literally is not the behavior I want. I want the exact bytes
>> represented by the escaped hexadecimal strings.
>>
>> @Ted: I filed a JIRA issue at
>> https://issues.apache.org/jira/browse/HBASE-8865 . I also added some
>> more observations there.
>>
>>
>>
>> On Thu, Jul 4, 2013 at 1:54 AM, Stack <[email protected]> wrote:
>>
>>> Try single quotes.  The shell (ruby) may be trying to 'help you' by
>>> interpreting your hex.
>>>
>>> hbase(main):018:0> print "\x20\n"
>>>
>>> hbase(main):019:0> print '\x20\n'
>>> \x20\nhbase(main):020:0>
>>>
>>> See how with double quotes it prints a space and a newline, whereas when I
>>> single-quote it, it prints out the literal?
>>>
>>> At the end of the shell help it says:
>>>
>>> "If you are using binary keys or values and need to enter them in the
>>> shell, use double-quote'd hexadecimal representation. For example:
>>>
>>>   hbase> get 't1', "key\x03\x3f\xcd"
>>>   hbase> get 't1', "key\003\023\011"
>>>   hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
>>> ..."
>>>
>>> Looks like we need to add a line which says if you are using hex, to avoid
>>> ruby's interpreting your intent, single-quote.
>>>
>>> St.Ack
>>>
>>>
>>> On Wed, Jul 3, 2013 at 4:30 AM, Ding Haifeng <[email protected]>
>>> wrote:
>>>
>>> > Hi, all.
>>> >
>>> > When I tried to do a manual region split from the HBase shell, I found
>>> > that the split command acts incorrectly with hex split keys.
>>> >
>>> > For example, I executed
>>> >
>>> > hbase(main):003:0> split 'tsdb', "\x00\x00\xC3"
>>> >
>>> > but table 'tsdb' actually split at "\x00\x00\xEF\xBF\xBD"
>>> >
>>> > I'm running HBase 0.94.8, r1485407, both server-side and client-side.
>>> >
>>> > Any help would be appreciated. Thanks.
>>> >
>>> >
>>> >
>>> > --
>>> > Ding Haifeng
>>> >
>>>
>>
>>
>>
>> --
>> Ding Haifeng
>>
>
>
>
> --
> Ding Haifeng
>



-- 
Ding Haifeng
