Jim Allan wrote:

> But in the Unicode Collation Algorithm diacritics are only used for
> secondary level sorting. See again http://unicode.org/reports/tr10 .

I thought that the standards listed some languages as being
excluded from that set of rules.

> That's what people normally want in sorts, as you can see by looking at
> dictionaries for languages which contain numerous diacritics.

That really depends upon the language.

> Your forms are being sorted properly, if you recognize that diacritics
> are a secondary element in the sort, only taken into account for forms
> that are identical save for diacritics.

For !Kung and related languages, diacritic marks are primary
elements. IOW, if diacritic marks are mishandled, the result
is an unsorted list.
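
To make the difference concrete, here is a rough sketch in Python. It is
not the real Unicode Collation Algorithm, only a two-level approximation
of it, and the word list is made up for illustration (these are not
!Kung words). The function names are mine.

```python
import unicodedata

def secondary_key(word):
    """Diacritics as a tie-breaker: compare base letters first, marks second.
    A simplified stand-in for UCA-style secondary weighting."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return (base, marks)

def primary_key(word):
    """Diacritics as part of the letter: each base letter plus its marks
    is treated as one distinct unit of the alphabet."""
    decomposed = unicodedata.normalize("NFD", word)
    units = []
    for c in decomposed:
        if unicodedata.combining(c) and units:
            units[-1] += c      # attach the mark to the preceding letter
        else:
            units.append(c)     # start a new letter
    return units

# Made-up test strings, not real vocabulary.
words = ["ab", "\u00e1a", "aa", "\u00e1b"]

as_secondary = sorted(words, key=secondary_key)
as_primary = sorted(words, key=primary_key)
print(as_secondary)
print(as_primary)
```

With the first key the accented forms are interleaved with the plain
ones; with the second, every word starting with the accented letter
comes after every word starting with the plain one. For a language where
the accented letter is a letter in its own right, only the second order
looks sorted to a reader.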

> The innate Unicode value of the symbol is only used in generating a possible 
> fourth level of collation, usually not employed.

For African languages it should be the first level of
collation, not fourth level.

> Modern sorting technology has moved past considering the
> character value of a character at all.

IOW, they are able to recognize that a single precomposed character
for a letter with a diacritic mark, and a letter followed by a
combining diacritic mark, have equal value.
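
A small sketch of that equivalence, using Python's standard unicodedata
module. The helper name canonically_equal is my own; the example letter
is just e with an acute accent.

```python
import unicodedata

def canonically_equal(a, b):
    """True if two strings are the same after canonical normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combined = "e\u0301"     # e followed by COMBINING ACUTE ACCENT, two code points

print(precomposed == combined)                    # raw code points differ
print(canonically_equal(precomposed, combined))   # equal once normalized
```

Any sort routine that normalizes its input first will treat the two
spellings as the same letter.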

Which means that it should be easy to set up a sequence to
use for sorting !Kung and related languages.  [I'm using
!Kung as an example, because I can spell it correctly. The
other person is starting their project with other Namibian
languages that are slightly easier to speak and spell.  Not
that !Kung is that hard to speak.]
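
Something along these lines is what I have in mind for such a sequence.
This is only a sketch: the ALPHABET list below is hypothetical and is
not the real order for !Kung or any Namibian language, and tailored_key
is a name I made up. The real list would come from whoever knows the
orthography.

```python
import unicodedata

# HYPOTHETICAL letter order, for illustration only: a, a-tilde, b,
# then two click letters (U+01C3 and U+01C2).
ALPHABET = ["a", "a\u0303", "b", "\u01c3", "\u01c2"]

# Rank each letter by its position, keyed on the decomposed (NFD) form.
RANK = {unicodedata.normalize("NFD", letter): i
        for i, letter in enumerate(ALPHABET)}
MAX_LEN = max(len(unit) for unit in RANK)

def tailored_key(word):
    """Turn a word into a list of ranks from the custom alphabet."""
    text = unicodedata.normalize("NFD", word)
    key = []
    i = 0
    while i < len(text):
        # Try the longest listed unit first, so a letter plus its mark
        # wins over the bare letter.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            unit = text[i:i + size]
            if unit in RANK:
                key.append(RANK[unit])
                i += size
                break
        else:
            # Characters not in the list sort after the whole alphabet.
            key.append(len(RANK) + ord(text[i]))
            i += 1
    return key

# Made-up test strings, not real vocabulary.
words = ["\u01c3a", "ba", "\u00e3b", "ab"]
in_order = sorted(words, key=tailored_key)
print(in_order)
```

Because the key is built from the list rather than from code point
values, reordering the alphabet is a matter of editing one line.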

xan

jonathon
