Re: Unicode sorting

Dar Scott Fri, 02 Jun 2006 13:12:39 -0700


On Jun 2, 2006, at 9:45 AM, Devin Asay wrote:

  replace "Ж" with "ж" in lList

I didn't know you could do that with the current editor. I had been suggesting a way to do that kind of thing using UTF-8 and was hoping a script editor publisher would pick up on it.

However, the 2.7.1 editor uses host-order UTF-16, which is pretty silly since you can end up with problems like this:

replace ""&quote with "т" in lList -- uppercase Russian T has #0022 as byte 2 (= ASCII quote char)


And that workaround isn't quite right, and it isn't even close on other platforms, where the host byte order differs.
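To see why the quote byte turns up, here is a small sketch in Python rather than Transcript. It only illustrates the byte layout of UTF-16 in both byte orders; it assumes nothing about how the 2.7.1 editor stores scripts beyond the "host order UTF-16" described above, and the variable names are made up for the example.

```python
# Illustration only: Python stands in for a byte-level view of the script text.
upper_te = "\u0422"  # CYRILLIC CAPITAL LETTER TE

big_endian = upper_te.encode("utf-16-be")     # bytes 0x04 0x22
little_endian = upper_te.encode("utf-16-le")  # bytes 0x22 0x04

QUOTE = ord('"')  # 0x22, the ASCII double quote

# The quote byte is present in both encodings, but at a different position,
# so a workaround written for one byte order does not carry over to the other.
print(big_endian.hex(), little_endian.hex())
print(big_endian.index(QUOTE), little_endian.index(QUOTE))
```

On a big-endian host the quote is the second byte of the character; on a little-endian host it is the first.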

Not only that, but strings like "Ж is zhe" are garbled. Who knows what happens with characters in the high range of Rev's traditional host character encoding.

The right way to do this until we get full Unicode is to make this UTF-8. The bad news is that some folks might already be using this and assuming Unicode and, where it does not work, adding lots of ad hoc fixes.


UTF-8!

Why? There are no hidden ASCII chars in UTF-8. I mean true 7-bit ASCII. If it looks like an ASCII char, it is. All non-ASCII chars are represented by a sequence of bytes with the high bit set. With a few minor exceptions that can be taken care of (>= single char, format(), etc.), this means that UTF-8 with Unicode in comments and quoted literals will parse OK. There might be a surprise, of course.
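The "no hidden ASCII" property can be checked mechanically. The following is a minimal sketch in Python, not Transcript; `hidden_ascii` is a hypothetical helper written for this illustration. It lists every byte below 0x80 that appears inside the encoding of a non-ASCII character.

```python
def hidden_ascii(text, encoding):
    """Return (character, byte) pairs where a non-ASCII character's
    encoded form contains a byte in the 7-bit ASCII range."""
    found = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # a genuine ASCII character, nothing hidden
        for b in ch.encode(encoding):
            if b < 0x80:
                found.append((ch, b))
    return found

sample = "\u0416\u0436\u0422\u0442"  # Ж ж Т т

utf8_hits = hidden_ascii(sample, "utf-8")       # empty: every byte has the high bit set
utf16_hits = hidden_ascii(sample, "utf-16-be")  # every byte of these chars is below 0x80

print(utf8_hits)
print(utf16_hits)
```

For this sample the UTF-8 list is empty, while the UTF-16 list includes the pair for Т and 0x22, the quote byte from the example earlier in this message.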

This is also why item and line parsing works fine with UTF-8. There are no hidden commas and line ends.
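A rough sketch of that point, again in Python as a stand-in: splitting raw bytes on a comma or line feed plays the role of byte-oriented item and line chunking. The sample strings are invented for the illustration.

```python
# UTF-8: split the raw bytes on line feed and comma, then decode each piece.
data = "\u0416 is zhe,\u0442 is te\n\u0434\u043e\u043c,house".encode("utf-8")
rows = [line.split(b",") for line in data.split(b"\n")]
decoded = [[item.decode("utf-8") for item in row] for row in rows]
print(decoded)  # every item comes back intact

# UTF-16 (big-endian here): some characters contain a comma or line-feed byte.
soft_sign = "\u042c".encode("utf-16-be")  # Ь is 0x04 0x2C, and 0x2C is ","
nje = "\u040a".encode("utf-16-be")        # Њ is 0x04 0x0A, and 0x0A is line feed

comma_split = soft_sign.split(b",")  # one character wrongly cut into two "items"
line_split = nje.split(b"\n")        # one character wrongly cut into two "lines"
print(comma_split, line_split)
```

With UTF-8 the delimiters found are always real delimiters, so the pieces decode cleanly; with UTF-16 a single character can be cut in half.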


Dar



_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
