Re: Unicode Chinese Mac

Dar Scott Tue, 10 May 2005 16:12:43 -0700


On May 10, 2005, at 1:53 PM, Dar Scott wrote:

You can't use =, is a number, contains, line, item, foundchunk, filter (except for a trick), find, +, -, /, *, add, subtract, offset (except with extra scripting), or just about anything else.

But as was pointed out earlier, you gain something by using htmlText instead of unicodeText.

Also, UTF8 will work OK for words (usually), items and lines. Not for chars, though: every character outside the ASCII range is represented by two or more bytes. The cool thing is that no byte inside those multi-byte sequences can have an ASCII value; they are all 0x80 or above. All of the syntactically significant characters for words, items and lines are ASCII, so a delimiter byte can never turn up inside an encoded character.
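A rough sketch of why items and lines survive, written in Python rather than Transcript, so it only illustrates the byte-level idea. It treats UTF-8 text as raw bytes, splits on the ASCII comma and linefeed the way item and line chunking would, and decodes each piece afterwards. The helper name is hypothetical; the check it performs (every byte of a non-ASCII character is 0x80 or above) is the property described above.

```python
# Illustration in Python, not Transcript: split raw UTF-8 bytes on ASCII
# delimiters without decoding the whole text first.

def multibyte_bytes_are_non_ascii(text: str) -> bool:
    """True if every byte of every non-ASCII character is 0x80 or above."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) >= 0x80
        for b in ch.encode("utf-8")
    )

raw = "北京,上海,Zürich\n東京,Paris".encode("utf-8")

# Lines first, then items; decoding each piece would raise
# UnicodeDecodeError if a split had cut through a character.
table = [
    [item.decode("utf-8") for item in line.split(b",")]
    for line in raw.split(b"\n")
]
print(table)  # [['北京', '上海', 'Zürich'], ['東京', 'Paris']]
```

Words are the weaker case ("usually") because what counts as a word separator can be broader than a single ASCII byte; the sketch sticks to items and lines.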

You can use (null-free) UTF8 as a key in arrays. You can use it with '=', offset and 'contains', I think, as long as the strings are valid UTF8. If caseSensitive applies only to ASCII characters, then it can be either true or false.
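Here is a sketch of those byte-level operations, again in Python as a stand-in for Transcript. It assumes both strings are valid UTF-8; the function names are hypothetical. A valid UTF-8 needle starts with a non-continuation byte, so a byte match can only begin on a character boundary. The byte position that a byte-oriented offset returns can therefore be turned into a character position by decoding the prefix. The ASCII-only case folding uses `bytes.lower()`, which changes only A to Z and leaves multi-byte sequences alone.

```python
# Illustration in Python, not Transcript: '=', 'contains', offset and
# array keys on valid UTF-8 bytes, plus ASCII-only case folding.

def utf8_offset(needle: bytes, haystack: bytes) -> int:
    """1-based character offset of needle in haystack (valid UTF-8), 0 if absent."""
    if not needle:
        return 0
    pos = haystack.find(needle)  # byte position, like a byte-oriented offset
    if pos < 0:
        return 0
    # The match starts on a character boundary, so the prefix decodes cleanly.
    return len(haystack[:pos].decode("utf-8")) + 1

def contains_ascii_ci(haystack: bytes, needle: bytes) -> bool:
    """Case-insensitive 'contains' that folds only ASCII letters."""
    return needle.lower() in haystack.lower()

city = "上海".encode("utf-8")
text = "北京,上海".encode("utf-8")

# Array (dict) keyed by null-free UTF-8 strings.
population = {"北京".encode("utf-8"): "Beijing", city: "Shanghai"}
print(population[city])               # Shanghai
print(city == "上海".encode("utf-8"))  # True ('=')
print(city in text)                   # True ('contains')
print(utf8_offset(city, text))        # 4 (character, not byte, position)

# Non-ASCII letters stay case-sensitive under ASCII-only folding.
print(contains_ascii_ci("Zürich".encode("utf-8"), "zü".encode("utf-8")))  # True
print(contains_ascii_ci("Zürich".encode("utf-8"), "zÜ".encode("utf-8")))  # False
```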

But since each char is 1 to 4 bytes, the easiest way to get the char count is to assume BMP (no surrogates), convert to UTF16 (2 bytes per BMP character) and halve the length.
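A minimal Python sketch of the UTF16 trick, next to an alternative that is not from the post: counting the bytes that are not UTF-8 continuation bytes (bit pattern 10xxxxxx). Both helper names are hypothetical. The second method does not need the BMP assumption, which the emoji example makes visible.

```python
# Illustration in Python, not Transcript: two ways to count characters
# in UTF-8 data.

def char_count_via_utf16(utf8_data: bytes) -> int:
    """Convert to UTF-16 and halve the byte length.

    Assumes BMP only: a character outside the BMP becomes a surrogate
    pair and is counted twice.
    """
    utf16 = utf8_data.decode("utf-8").encode("utf-16-le")  # '-le' adds no BOM
    return len(utf16) // 2

def char_count_direct(utf8_data: bytes) -> int:
    """Count bytes that are not continuation bytes (10xxxxxx)."""
    return sum(1 for b in utf8_data if (b & 0xC0) != 0x80)

sample = "北京 Zürich".encode("utf-8")
print(len(sample))                   # 14 bytes
print(char_count_via_utf16(sample))  # 9
print(char_count_direct(sample))     # 9

emoji = "😀".encode("utf-8")         # U+1F600, outside the BMP
print(char_count_via_utf16(emoji))   # 2: surrogate pair counted twice
print(char_count_direct(emoji))      # 1
```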

UTF8 has no byte order (its code unit is a single byte), so it can move among OSes without any BOM (byte order mark) consideration.
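A small Python illustration of the byte-order point. The same two characters give different UTF-16 bytes on big-endian and little-endian hosts, while UTF-8 has a single form. The host examples in the comments are only typical cases, not a statement about how Revolution stores unicodeText on any given platform.

```python
# Illustration in Python, not Transcript: UTF-16 bytes depend on byte
# order, UTF-8 bytes do not.

text = "中文"

utf16_be = text.encode("utf-16-be")  # big-endian host (e.g. a PowerPC Mac)
utf16_le = text.encode("utf-16-le")  # little-endian host (e.g. x86 Windows)
utf8 = text.encode("utf-8")          # one form everywhere

print(utf16_be.hex())  # 4e2d6587
print(utf16_le.hex())  # 2d4e8765
print(utf8.hex())      # e4b8ade69687

# Reading UTF-16 with the wrong byte order silently yields other characters.
print(utf16_le.decode("utf-16-be") == text)  # False
print(utf8.decode("utf-8") == text)          # True
```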

So, for some types of processing, using UTF8 might be better than host-byte-order UTF16.

Dar

--
**********************************************
    DSC (Dar Scott Consulting & Dar's Lab)
    http://www.swcp.com/dsc/
    Programming and software
**********************************************

_______________________________________________
use-revolution mailing list
[email protected]
http://lists.runrev.com/mailman/listinfo/use-revolution
