Thanks, Dar. These tips will come in handy, and help confirm some of
the things I was already thinking. A 'sort lines' command, after
converting upper case to lower, works fairly well, except that,
curiously, a space sorts *after* all Cyrillic chars. I'm sure that's
because Rev is really doing an ASCII sort on the Unicode text, and
the first byte of each Unicode character is < #0020. What we really
need is a sort ... unicode option to go along with sort ... text and
sort ... numeric.
Devin
On May 26, 2006, at 12:07 AM, Dar Scott wrote:
On May 25, 2006, at 4:19 PM, Devin Asay wrote:
I have a need to sort long lists of Cyrillic unicode text
according to Russian alphabet order. Before I start writing my own
routine, has anyone figured out how to sort unicode text lists?
Here are some hints:
1.
Trick: If you are sorting strings with only characters from the
same 256 character range, then byte-order doesn't matter when doing
a lexical sort. For example, if all your characters are in the
Cyrillic range of U+0400 to U+04FF, then you can use an ordinary
byte character sort. However, if you have spaces (U+0020) then you
will need to replace them with something else for sorting or make
sure you have control over order.
2.
If the high byte of the Unicode characters never looks like a digit
then you can compare with < (probably not important if using 'sort').
3.
The basic alphabet of a language is typically coded in roughly the
order needed for sorting. That rough order may be just fine for
your need.
4.
Conversion from lower to upper or upper to lower for sorting is
often just a bit-logic operation. However, since you usually have
to do range checking anyway, adding or subtracting an offset works
fine, too. If you know you have only basic upper and lower
letters, doing the bit op every time is probably faster. This
should work for a rough sort.
5.
The basic alphabet of a language in Unicode might include
characters you don't use. That is OK as long as the ones you do
use are coded in the right order. The holes don't matter.
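
The hints above can be sketched roughly as follows. This is Python rather
than Transcript, and it is only an illustration: the names fold_case,
russian_sort_key and sort_russian are made up for this example and are not
Revolution built-ins. It assumes the text uses only the basic Russian
letters (U+0410 to U+044F) plus Ё/ё and spaces. Case folding uses a
range-checked offset of 0x20 rather than a single bit operation, because
U+0420 to U+042F already have that bit set. Comparison is on code points,
not bytes, so a space (U+0020) sorts before every letter without being
replaced. The letter ё is coded outside the basic block, so the key places
it right after е; that placement is an assumption about the order wanted.

```python
# Sketch only: all names here are hypothetical, chosen for illustration.

UPPER_FIRST, UPPER_LAST = 0x0410, 0x042F  # А .. Я
CASE_OFFSET = 0x20                        # А (U+0410) -> а (U+0430)
YO_UPPER, YO_LOWER = 0x0401, 0x0451       # Ё and ё sit outside the basic block
IE_LOWER = 0x0435                         # е


def fold_case(cp):
    """Map an upper-case Russian code point to lower case (hint 4)."""
    if UPPER_FIRST <= cp <= UPPER_LAST:   # range check, then add the offset
        return cp + CASE_OFFSET
    if cp == YO_UPPER:
        return YO_LOWER
    return cp


def russian_sort_key(text):
    """Build a key that follows code order (hint 3), with ё right after е."""
    key = []
    for ch in text:
        cp = fold_case(ord(ch))
        if cp == YO_LOWER:
            key.append((IE_LOWER, 1))     # sorts just after е, before ж
        else:
            key.append((cp, 0))           # code-point order; space is lowest
    return key


def sort_russian(lines):
    """Return the lines sorted with the rough Russian key above."""
    return sorted(lines, key=russian_sort_key)


if __name__ == "__main__":
    for line in sort_russian(["яблоко", "Ёж", "ель", "дом", "еда"]):
        print(line)
```

Letters in the basic block that a given text never uses simply leave gaps
in the key values, which does not affect the resulting order (hint 5).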
Dar
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University