On May 25, 2006, at 4:19 PM, Devin Asay wrote:

I need to sort long lists of Cyrillic Unicode text according to Russian alphabet order. Before I start writing my own routine, has anyone figured out how to sort Unicode text lists?

Here are some hints:

1.
Trick: If you are sorting strings whose characters all come from the same 256-character range, then byte order doesn't matter when doing a lexical sort. For example, if all your characters are in the Cyrillic range U+0400 to U+04FF, you can use an ordinary byte-character sort. However, if the strings contain spaces (U+0020), you will need to replace them with something else for sorting, or otherwise make sure you control where they land in the order.

2.
If the high byte of the Unicode characters never looks like a digit, then you can compare with < (probably not important if using 'sort').

3.
The basic alphabet of a language is typically coded in roughly the order needed for sorting. That rough order may be just fine for your need.

4.
Conversion from lower to upper or upper to lower for sorting is often just a bit-logic operation. However, since you usually have to do range checking anyway, adding or subtracting an offset works fine, too. If you know you have only basic upper- and lowercase letters, doing the bit op every time is probably faster. This should work for a rough sort.

5.
The basic alphabet of a language in Unicode might include characters you don't use. That is OK as long as the ones you do use are coded in the right order. The holes don't matter.
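
To make hints 1, 3, and 4 concrete, here is a rough sketch in Python (not Revolution code). It reads "byte order" in hint 1 as UTF-16 big-endian versus little-endian, which is an assumption on my part, and the function names are made up for illustration. It folds case with an offset rather than a bit mask: in the basic Russian range the upper/lower difference is 0x20, but that is not a single-bit flip for every letter (compare U+0440 with U+0420). Note that ё (U+0451) and Ё (U+0401) sit outside the basic run, so this only gives the rough order described above.

```python
# Illustrative sketch; the function names are hypothetical, not from any library.

def utf16_sort(words, byte_order):
    """Sort by raw UTF-16 bytes, the way a byte-oriented sort would see them."""
    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    return sorted(words, key=lambda w: w.encode(codec))

def fold_basic_russian(ch):
    """Map basic lowercase U+0430..U+044F to uppercase U+0410..U+042F by offset.

    Anything outside that range (including U+0451) is returned unchanged.
    """
    cp = ord(ch)
    if 0x0430 <= cp <= 0x044F:
        return chr(cp - 0x20)
    return ch

def rough_russian_key(word):
    """Sort key: code points after case folding (hints 3 and 4)."""
    return [ord(fold_basic_russian(ch)) for ch in word]

words = ["яблоко", "Борщ", "арбуз", "Волга"]

# Hint 1: every character is in U+0400..U+04FF, so both byte orders agree.
same_block = utf16_sort(words, "big") == utf16_sort(words, "little")

# Hints 3 and 4: code point order after folding is a rough alphabetical order.
rough_sorted = sorted(words, key=rough_russian_key)

# The caveat in hint 1: a space (U+0020) comes from a different 256-character
# range, so the two byte orders can now disagree.
mixed = ["а я", "аА"]
space_breaks_it = utf16_sort(mixed, "big") != utf16_sort(mixed, "little")

print(same_block, space_breaks_it)
print(rough_sorted)
```

If the space caveat matters for your data, one option is to replace spaces with a character from the same range before building the sort key, as suggested in hint 1.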

Dar