Hi!

Thanks. So it isn't a fixed width of 2 bytes per character in general,
but rather depends on the characters? If so, I think this means I don't
have to worry about it at all?

Thanks,
Thomas

-----Original Message-----
From: Joey Echeverria [mailto:[email protected]] 
Sent: Tuesday, July 26, 2011 18:36
To: [email protected]
Subject: Re: Encoding when using Bytes.toBytes(String)?

Bytes.toBytes(String) encodes using UTF-8 [1]. If all of your characters
are ASCII, then you'll use only one byte per character. Characters
outside the ASCII range (for example, the accented letters in ANSI code
pages such as Windows-1252) take two or more bytes in UTF-8.
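
If you want to check this for your own rowkeys, here is a minimal sketch
using only the JDK. It assumes that Bytes.toBytes(String) produces the
same bytes as String.getBytes with UTF-8, as the Javadoc [1] describes.
The class name, method name, and sample strings are made up for
illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII only: 11 characters, 11 bytes.
        System.out.println(utf8Length("rowkey-0001"));
        // U+00E4 (a with umlaut): 1 character, 2 bytes.
        System.out.println(utf8Length("\u00e4"));
        // U+20AC (euro sign): 1 character, 3 bytes.
        System.out.println(utf8Length("\u20ac"));
    }
}
```

If the byte count equals the character count for your keys, they are
pure ASCII and converting them beforehand would not save any space.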

-Joey

[1]
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toBytes(java.lang.String)

On Tue, Jul 26, 2011 at 6:37 AM, Steinmaurer Thomas
<[email protected]> wrote:
> Hello,
>
>
>
> we are currently running tests with respect to disk space usage when
> inserting records into our table. I just want to be sure: does
> Bytes.toBytes(String) encode each character with 2 bytes (Unicode)?
>
>
>
> As we only have ANSI characters in the rowkey (~ 48 characters) and
> qualifier values, I wonder if we could save disk space by converting
> them to an ANSI string before sending them to the server?
>
>
>
> Thanks,
>
> Thomas
>
>
>
>



--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
