I read the following in the Text Manager documentation and am wondering whether this means that "strings are always maintained in UTF-8 format while characters are always 2 bytes".
No. The encoding of characters in a string depends on the device's character encoding. Currently no device uses UTF-8; the officially supported character encodings are Latin, Shift-JIS, and EUC-CN (aka GB).
"Character variables are always two bytes long. However, when you add a character to a string, the operating system may shrink it down to a single byte if it's a low ASCII character. Thus, any string that you work with may contain a mix of single-byte and multi-byte characters, up to four bytes."
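To see what "a mix of single-byte and multi-byte characters" means for code that walks a string, here is a minimal C sketch that counts characters (not bytes) in a Shift-JIS string. It assumes the common CP932-style lead-byte ranges (0x81-0x9F and 0xE0-0xFC), and sjis_is_lead_byte and sjis_char_count are hypothetical helpers, not Palm OS calls. On a real device you would let the Text Manager (for example TxtGetNextChar) do this, since it knows the device's actual encoding.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if b can start a two-byte Shift-JIS
   character (CP932-style lead-byte ranges assumed). */
static int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: count characters in a NUL-terminated Shift-JIS
   string where single-byte and double-byte characters may be mixed.
   A lead byte with nothing after it is counted as one character. */
static size_t sjis_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        if (sjis_is_lead_byte(*p) && p[1] != '\0')
            p += 2;   /* lead byte + trail byte = one character */
        else
            p += 1;   /* ASCII or half-width katakana */
        count++;
    }
    return count;
}
```

For example, sjis_char_count("A\x81\x40" "B") returns 3 even though the string is 4 bytes long (the literal is split so that the \x40 escape does not swallow the B).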
Here's an example of what happens on a Japanese device that uses the Shift-JIS character encoding, first using the character 'A' (0x41):
WChar myChar = 0x0041; // sizeof(myChar) == 2
UInt16 charSize = TxtSetNextChar(buffer, 0, myChar); // charSize = 1
After the call to TxtSetNextChar, buffer[0] will contain 0x41.
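A rough model of what TxtSetNextChar is doing here, written as plain C so it can run off-device. This is a hypothetical sjis_set_next_char, not the Text Manager implementation; it assumes any value above 0xFF is a two-byte Shift-JIS code stored high byte first, and it does no validation of the Shift-JIS ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TxtSetNextChar on a Shift-JIS device:
   write ch at buf[offset] and return the number of bytes used.
   Values up to 0xFF are stored as one byte; larger values are stored
   as two bytes, high (lead) byte first. No range validation. */
static uint16_t sjis_set_next_char(char *buf, size_t offset, uint16_t ch)
{
    if (ch <= 0xFF) {
        buf[offset] = (char)ch;
        return 1;
    }
    buf[offset]     = (char)(ch >> 8);    /* lead byte, e.g. 0x81 */
    buf[offset + 1] = (char)(ch & 0xFF);  /* trail byte, e.g. 0x40 */
    return 2;
}
```

With ch = 0x0041 it writes the single byte 0x41 and returns 1, matching the TxtSetNextChar result above.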
Now an example (again on a Japanese device) of what happens when you set the first character in the buffer to a full-width (double-byte) space (0x8140):
WChar myChar = 0x8140; // sizeof(myChar) == 2
UInt16 charSize = TxtSetNextChar(buffer, 0, myChar); // charSize = 2
After this call, buffer[0] will contain 0x81, and buffer[1] will contain 0x40.
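Going the other way, a reader has to look at the first byte to decide whether one or two bytes make up the character. This is a hypothetical sjis_get_next_char, loosely modeled on the shape of the Text Manager's TxtGetNextChar (text, offset, out-parameter, returns the byte count) but not its implementation; it again assumes CP932-style lead-byte ranges and does not check the trail byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: decode the character starting at buf[offset],
   store it in *out, and return how many bytes it occupied.
   Assumes CP932-style lead bytes (0x81-0x9F, 0xE0-0xFC) and does not
   verify that a valid trail byte follows. */
static uint16_t sjis_get_next_char(const char *buf, size_t offset, uint16_t *out)
{
    unsigned char b0 = (unsigned char)buf[offset];

    if ((b0 >= 0x81 && b0 <= 0x9F) || (b0 >= 0xE0 && b0 <= 0xFC)) {
        unsigned char b1 = (unsigned char)buf[offset + 1];
        *out = (uint16_t)((b0 << 8) | b1);  /* e.g. 0x81,0x40 -> 0x8140 */
        return 2;
    }
    *out = b0;                               /* e.g. 0x41 -> 0x0041 */
    return 1;
}
```

Reading the buffer from the 0x8140 example at offset 0 gives back 0x8140 with a size of 2.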
So in a string, a single "character" can occupy one or two bytes when the device's character encoding is Shift-JIS. If and when a device is released that uses UTF-8, each character will occupy between one and four bytes in the string.
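For comparison, here is how character size varies under UTF-8, following the standard encoding rules (RFC 3629). This is generic C, not anything Palm OS provides, and utf8_encode is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a Unicode code point as UTF-8 into out (4 bytes available)
   and return the number of bytes written, or 0 for values that are not
   Unicode scalar values (UTF-16 surrogates or anything above 0x10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                        /* 1 byte: ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                       /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                            /* surrogates are not encodable */
    if (cp <= 0xFFFF) {                      /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                    /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+0041 ('A') takes 1 byte, U+3000 (the ideographic space, which Shift-JIS encodes as 0x8140) takes 3 bytes (E3 80 80), and U+1F600 takes 4.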
-- Ken
--
Ken Krugler
TransPac Software, Inc. <http://www.transpac.com>
+1 530-470-9200
-- For information on using the Palm Developer Forums, or to unsubscribe, please see http://www.palmos.com/dev/support/forums/
