I read the following on Text Manager and am just wondering if this means that
"strings are always maintained in UTF-8 format while characters are always
in 2 bytes".

No. The encoding of the characters in a string depends on the device's character encoding. Currently no device uses UTF-8; the officially supported character encodings are Latin, Shift-JIS, and EUC-CN (aka GB).


"Character variables are always two bytes long. However, when you add a
character to a string, the operating system may shrink it down to a single
byte if it's a low ASCII character. Thus, any string that you work with may
contain a mix of single-byte and multi-byte characters, up to four bytes."

Here's an example of what happens on a Japanese device that uses the Shift-JIS character encoding, first using the character 'A' (0x41):


// buffer is assumed to be a Char array with room for at least two bytes.
WChar myChar = 0x0041;  // sizeof(myChar) == 2.
UInt16 charSize = TxtSetNextChar(buffer, 0, myChar);    // charSize == 1

After the call to TxtSetNextChar, buffer[0] will contain 0x41.

Now an example (also using Japanese) of what happens when you set the first character in the buffer string to be a full-width (double-byte) space (0x8140):

WChar myChar = 0x8140;  // sizeof(myChar) == 2.
UInt16 charSize = TxtSetNextChar(buffer, 0, myChar);    // charSize == 2

After this call, buffer[0] will contain 0x81, and buffer[1] will contain 0x40.
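
If you want to see the byte layout without a device or the SDK, here's a rough stand-alone sketch in plain C. Note that set_next_char is a hypothetical stand-in I made up for illustration, not the real TxtSetNextChar; it simply assumes that on a Shift-JIS device any character value above 0xFF is stored as two bytes, high byte first, and anything else as one byte.

```c
#include <stdint.h>

/* Hypothetical stand-in for TxtSetNextChar on a Shift-JIS device.
   Writes the 16-bit character ch into buf starting at offset and
   returns the number of bytes it occupies in the string. */
uint16_t set_next_char(char *buf, uint32_t offset, uint16_t ch)
{
    unsigned char *p = (unsigned char *)buf + offset;

    if (ch <= 0xFF) {
        /* Single-byte character: only the low byte is stored. */
        p[0] = (unsigned char)ch;
        return 1;
    }

    /* Double-byte character: high byte first, then low byte. */
    p[0] = (unsigned char)(ch >> 8);
    p[1] = (unsigned char)(ch & 0xFF);
    return 2;
}
```

With this sketch, passing 0x0041 returns 1 and stores 0x41, while passing 0x8140 returns 2 and stores 0x81 followed by 0x40, which matches the two cases above.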

So in a string, a single "character" can occupy one or two bytes when the device's character encoding is Shift-JIS. If and when a device is released that uses UTF-8, each character will occupy between one and four bytes in the string.
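
The practical consequence is that the byte length of a string is not its character count, so you can't step through it one byte at a time. As an illustration only, here's a minimal sketch in plain C that counts characters in a Shift-JIS string. The function names are hypothetical, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are my assumption about the encoding; on a real device you should let the Text Manager walk the string for you rather than hard-coding ranges like this.

```c
#include <stddef.h>

/* Assumed Shift-JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC. */
int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Counts characters (not bytes) in a NUL-terminated Shift-JIS string. */
size_t count_sjis_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != 0) {
        /* A lead byte plus the byte after it form one character;
           a lead byte at the very end is counted as a lone byte. */
        if (is_sjis_lead_byte(*p) && p[1] != 0)
            p += 2;
        else
            p += 1;
        count++;
    }
    return count;
}
```

For example, the four bytes 0x41 0x81 0x40 0x42 ('A', full-width space, 'B') count as three characters.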

-- Ken
--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200
