Jim Allan wrote:
> But in the Unicode Collation Algorithm diacritics are only used for
> secondary level sorting. See again http://unicode.org/reports/tr10 .

I thought that the standards listed some languages as being excluded from that set of rules.

> That's what people normally want in sorts, as you can see by looking at
> dictionaries for languages which contain numerous diacritics.

That really depends upon the language.

> Your forms are being sorted properly, if you recognize that diacritics
> are a secondary element in the sort, only taken into account for forms
> that are identical save for diacritics.

For !Kung and related languages, diacritic marks are primary elements. In other words, if the diacritic marks are mishandled, the result is an unsorted list.

> The innate Unicode value of the symbol is only used in generating a possible
> fourth level of collation, usually not employed.

For African languages like these, it should be the first level of collation, not the fourth.

> Modern sorting technology has moved past considering the character value of a character at all.

In other words, it can recognize that a precomposed character (a letter with a diacritic mark) and a base letter followed by a combining diacritic mark have equal value. That means it should be easy to set up a sequence for sorting !Kung and related languages.

[I'm using !Kung as an example because I can spell it correctly. The other person is starting their project with other Namibian languages that are slightly easier to speak and spell. Not that !Kung is that hard to speak.]

xan jonathon
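A minimal Python sketch of such a sequence follows. It assumes a hypothetical alphabet: the `HYPOTHETICAL_ORDER` list and the `primary_key` helper are illustrative only and are not an actual !Kung or Namibian orthography. The input is normalized to NFC, so a precomposed letter and a base letter plus a combining mark get the same key. Each accented letter then receives its own primary weight. Under the UCA default, "áa" sorts before "ab" because the acute accent is only a secondary difference. With this key, "ab" sorts first.

```python
import unicodedata

# HYPOTHETICAL primary order for illustration only; NOT a real !Kung
# or Namibian orthography. Each entry is one sortable unit, so an
# accented letter is a distinct primary letter, not a secondary variant.
HYPOTHETICAL_ORDER = ["!", "a", "\u00e1", "\u00e3", "b", "e", "\u00e9",
                      "g", "k", "n", "u", "\u0169"]

# Normalize the table to NFC so it matches NFC-normalized input.
_RANK = {unicodedata.normalize("NFC", unit): i
         for i, unit in enumerate(HYPOTHETICAL_ORDER)}
_MAX_LEN = max(len(unit) for unit in _RANK)


def primary_key(word: str) -> tuple:
    """Return the primary weights of `word` (hypothetical helper).

    The word is lowercased and NFC-normalized, so a precomposed letter
    and a base letter + combining mark produce the same key. Units are
    matched longest-first so multi-character units (e.g. digraphs)
    could be added to the table.
    """
    s = unicodedata.normalize("NFC", word.lower())
    key = []
    i = 0
    while i < len(s):
        for n in range(min(_MAX_LEN, len(s) - i), 0, -1):
            unit = s[i:i + n]
            if unit in _RANK:
                key.append(_RANK[unit])
                i += n
                break
        else:
            raise ValueError(f"{s[i]!r} is not in the hypothetical alphabet")
    return tuple(key)


# The last word spells "áa" with a combining acute (U+0301).
words = ["ba", "\u00e3b", "ab", "\u00e1b", "a\u0301a"]
print(sorted(words, key=primary_key))
```

In practice, a collation tailoring in ICU/CLDR rule syntax produces the same effect without hand-written code. In that syntax, `<` marks a primary difference, as in `&a < á < ã`.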
