Wow!  Great news for sorting Unicode!

On May 30, 2006, at 5:08 PM, Devin Asay wrote:

I got your code to work by making some simple changes in the sortCodeFromRussian function:

Devin, I've been processing some bits of UTF-8, and something dawned on me that is probably already well known to the Unicode experts.

**** A lexical byte sort of well-formed UTF-8 will result in a Unicode code point sort! *****
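A quick way to convince yourself of this is to sort the same strings two ways and compare. This is a minimal sketch in Python rather than Transcript. The helper names (utf8_key, code_point_key, random_string) are made up for illustration. It relies on the fact that UTF-8 encodes larger code points with larger lead bytes, so bytewise order and code point order agree:

```python
import random

def utf8_key(s):
    """Sort key: the UTF-8 bytes of s, compared byte by byte."""
    return s.encode("utf-8")

def code_point_key(s):
    """Sort key: the sequence of Unicode code points in s."""
    return [ord(c) for c in s]

def random_string(rng, length):
    # Draw from all Unicode scalar values, skipping surrogates (U+D800..U+DFFF),
    # which can never appear in well-formed UTF-8.
    chars = []
    while len(chars) < length:
        cp = rng.randrange(0, 0x110000)
        if 0xD800 <= cp <= 0xDFFF:
            continue
        chars.append(chr(cp))
    return "".join(chars)

rng = random.Random(2006)
words = [random_string(rng, rng.randrange(0, 6)) for _ in range(2000)]
words += ["apple", "Äpfel", "ёлка", "ель", "жук", "яблоко", "😀"]

by_bytes = sorted(words, key=utf8_key)
by_code_points = sorted(words, key=code_point_key)
print(by_bytes == by_code_points)  # True
```

Note that "code point order" is not "alphabetical order": ё (U+0451) still sorts after я (U+044F), which is why character conversions are still needed for Russian.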

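That avoids the NUL problem in sort, because well-formed UTF-8 never contains a zero byte except for the character U+0000 itself. It also means that russianLex() can return the UTF-8 encoding of the string with your character conversions applied.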
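The zero-byte point is easy to check. Here is a small Python sketch that assumes the "NUL problem" refers to the 0x00 bytes that UTF-16 text contains (every ASCII character becomes two bytes, one of them zero). The has_nul helper is hypothetical:

```python
def has_nul(data):
    """True if the byte string contains a 0x00 byte."""
    return b"\x00" in data

sample = "Sort me: ёлка"
utf8 = sample.encode("utf-8")
utf16 = sample.encode("utf-16-le")

print(has_nul(utf8))   # False
print(has_nul(utf16))  # True: "S" alone encodes as b"S\x00"

# Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a zero
# byte in UTF-8 can only come from U+0000 itself.
for ch in "ёлка€😀":
    assert all(b >= 0x80 for b in ch.encode("utf-8"))
```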

I think the replace command will work on UTF-8, so you can even avoid a character loop: all you need is 34 replace commands and then a return. OK, that might actually be slower than a character loop.
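Both approaches can be sketched side by side in Python. The conversion table below is hypothetical and is not the actual rule set in sortCodeFromRussian. It only folds ё/Ё onto е/Е so that words with ё sort next to their е neighbours instead of after я. The function names are likewise made up. One version chains a replace per rule; the other walks the characters once. Both return UTF-8 bytes to use as the sort key:

```python
# Hypothetical conversion table, for illustration only.
CONVERSIONS = [
    ("Ё", "Е"),
    ("ё", "е"),
]

def russian_lex_by_replace(s):
    """Apply one replace per rule, then return the UTF-8 bytes as the key."""
    for old, new in CONVERSIONS:
        s = s.replace(old, new)
    return s.encode("utf-8")

def russian_lex_by_loop(s):
    """Same result from a single pass over the characters."""
    table = dict(CONVERSIONS)
    return "".join(table.get(ch, ch) for ch in s).encode("utf-8")

words = ["ёж", "жук", "ель", "яблоко", "ёлка"]
print(sorted(words))
# ['ель', 'жук', 'яблоко', 'ёж', 'ёлка']   (plain code point order)
print(sorted(words, key=russian_lex_by_replace))
# ['ёж', 'ёлка', 'ель', 'жук', 'яблоко']
```

The replace chain makes one pass over the string per rule, while the loop makes a single pass. Which is faster in Revolution would have to be measured.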

Dar
Unicode Sophomore


_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
