Correct.

On Tue, Jul 26, 2011 at 11:07 PM, Steinmaurer Thomas <[email protected]> wrote:
> Hi!
>
> Thanks. So, it isn't a fixed width of 2 bytes in general, but rather
> depends on the characters? If yes, I think this means I don't have to
> worry about it at all?
>
> Thanks,
> Thomas
>
> -----Original Message-----
> From: Joey Echeverria [mailto:[email protected]]
> Sent: Tuesday, 26 July 2011 18:36
> To: [email protected]
> Subject: Re: Encoding when using Bytes.toBytes(String)?
>
> Bytes.toBytes(String) encodes using UTF-8 [1]. If all of your characters
> are ASCII, then you'll use only one byte per character. I think some
> ANSI characters will map to multibyte characters in UTF-8.
>
> -Joey
>
> [1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toBytes(java.lang.String)
>
> On Tue, Jul 26, 2011 at 6:37 AM, Steinmaurer Thomas
> <[email protected]> wrote:
>> Hello,
>>
>> we are currently running tests with respect to disk space usage when
>> inserting records into our table. Just want to be sure: does
>> Bytes.toBytes(String) encode a character with 2 bytes (Unicode)?
>>
>> As we only have ANSI characters in the rowkey (~ 48 characters) and
>> qualifier values, I wonder if we could save disk space by converting
>> stuff to an ANSI string before sending it to the server?
>>
>> Thanks,
>>
>> Thomas
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
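
P.S. If you want to check the byte counts yourself, here is a minimal sketch. It assumes only what the thread states, namely that Bytes.toBytes(String) encodes as UTF-8, and it uses the plain JDK call String.getBytes(StandardCharsets.UTF_8) as a stand-in so it runs without HBase on the classpath. The class name Utf8Lengths, the helper utf8Length, and the sample strings are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HBase.
class Utf8Lengths {

    // Number of bytes the string occupies when encoded as UTF-8.
    // Assumption: this matches what Bytes.toBytes(String) would produce,
    // since the thread states that it encodes using UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "row-0001";     // ASCII only: one byte per character
        String umlaut = "M\u00fcller"; // U+00FC (u with diaeresis) needs two bytes
        String euro = "\u20ac";        // U+20AC (euro sign) needs three bytes

        System.out.println(ascii.length() + " chars -> " + utf8Length(ascii) + " bytes");
        System.out.println(umlaut.length() + " chars -> " + utf8Length(umlaut) + " bytes");
        System.out.println(euro.length() + " chars -> " + utf8Length(euro) + " bytes");
    }
}
```

For a rowkey that really is pure ASCII, the byte count equals the character count, so converting to a single-byte encoding first should not save any space. Only characters outside ASCII grow to two or more bytes.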
