On Jun 2, 2006, at 9:45 AM, Devin Asay wrote:

  replace "Ж" with "ж" in lList

I didn't know you could do that with the current editor. I had been suggesting a way to do that kind of thing using UTF-8 and was hoping a script editor publisher would pick up on it.

However, the 2.7.1 editor uses host order UTF-16, which is pretty silly since you can end up with problems like this:

replace ""&quote with "т" in lList --uppercase Russian Т (U+0422) has 0x22 as byte 2 (= ASCII quote char)

And that workaround isn't quite right, and it isn't even close on other platforms.
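
To make the byte-level problem concrete, here is a minimal sketch in Python (not Revolution script, and only an illustration of the encoding, not of what the editor actually does internally). It assumes the character in question is the uppercase Russian Т, U+0422, and shows that either byte order of UTF-16 contains a byte equal to the ASCII quote, while UTF-8 does not.

```python
# Illustration only: Python, not Revolution script.
ch = "\u0422"  # CYRILLIC CAPITAL LETTER TE (uppercase Russian T)

QUOTE = 0x22   # the ASCII double-quote byte

be = ch.encode("utf-16-be")  # bytes 04 22: the quote byte is byte 2
le = ch.encode("utf-16-le")  # bytes 22 04: the quote byte is byte 1
u8 = ch.encode("utf-8")      # bytes d0 a2: both have the high bit set

# A byte-oriented parser sees a stray quote in either UTF-16 byte order...
utf16_has_hidden_quote = QUOTE in be and QUOTE in le

# ...but not in UTF-8.
utf8_has_hidden_quote = QUOTE in u8
```

Because the position of the quote byte depends on byte order, a workaround written for one host order will not carry over to a platform with the other.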

Not only that, but strings like "Ж is zhe" are garbled. Who knows what happens with characters in the high range of Rev's traditional host character encoding?

The right way to do this until we get full Unicode is to make this UTF-8. The bad news is that some folks might already be using this, assuming Unicode, and adding lots of ad hoc fixes where it does not work.

UTF-8!

Why? There are no hidden ASCII chars in UTF-8. I mean true 7-bit ASCII. If it looks like an ASCII char, it is. All non-ASCII chars are represented by a sequence of bytes with the high bit set. With a few minor exceptions that can be taken care of (>= single char, format(), etc.), this means that UTF-8 with Unicode in comments and quoted literals will parse OK. There might be a surprise or two, of course.

This is also why item and line parsing work fine with UTF-8. There are no hidden commas or line ends.
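
A minimal sketch of that claim, again in Python rather than Revolution script: the item list and the soft-sign contrast are my own examples, chosen only for illustration. It checks that every byte of a non-ASCII character in UTF-8 has the high bit set, so splitting the raw bytes on a comma recovers the items intact, whereas UTF-16 can hide a comma byte inside a character.

```python
# Illustration only: Python, not Revolution script.
items = ["Ж is zhe", "т", "plain"]  # hypothetical item list
data = ",".join(items).encode("utf-8")

# Every byte of a non-ASCII character has the high bit set in UTF-8,
# so none of them can be mistaken for a comma, quote, or line end.
non_ascii = [c for c in "".join(items) if ord(c) > 0x7F]
all_high_bit = all(b >= 0x80 for c in non_ascii for b in c.encode("utf-8"))

# Splitting the raw bytes on the ASCII comma therefore recovers the items.
decoded = [part.decode("utf-8") for part in data.split(b",")]

# Contrast: in UTF-16, U+042C (uppercase soft sign) contains the byte 0x2C,
# which is the ASCII comma.
soft_sign_utf16 = "\u042c".encode("utf-16-be")
utf16_has_hidden_comma = 0x2C in soft_sign_utf16
```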

Dar


