On 9/18/2016 3:26 AM, Janusz S. Bien wrote:
Quote - Christoph Päper <christoph.pae...@crissov.de> (Fri, 16 Sep 2016, 23:51:38):

Janusz S. Bień <jsb...@mimuw.edu.pl>:

1. Graphemes, if I understand correctly, are language dependent, …

That’s true in linguistic terminology, at least within the more popular schools of thought, but not in technical (i.e. Unicode) jargon.

From the Unicode glossary:

Grapheme. (1) A minimally distinctive unit of writing in the context of a particular writing system.[...] (2) What a user thinks of as a character.

"writing system" is vague enough to cover variations that might be regional or language dependent.

As for (2), cf.

User-Perceived Character. What everyone thinks of as a character in their script.

So we have "a user" versus "everyone...in their script" - is the difference intentional? Probably not. Anyway the definitions are language/locale dependent.

The "everyone" here aims at a shared understanding.

This becomes tricky in the case of abugidas. There's certainly a shared understanding that the "unit of writing" is the syllable, rather than an individual mark, but the individual marks do have well-understood identities, not least for teaching. That's perhaps the reason why there's the handwaving about "minimally distinctive".

In some scripts like that, users can enter multiple sequences of characters that resolve (for all practical purposes) into the same syllable. (A big part of that, in some scripts, is that Unicode does not always provide a means to normalize the order of subsidiary signs and marks, typically combining marks.)
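To make the normalization point concrete, here is a small Python sketch using the standard unicodedata module. The choice of characters is my own assumption for illustration: two Latin combining marks for the case where canonical reordering works, and the Devanagari vowel sign AA (U+093E) plus anusvara (U+0902) for the case where it does not. Both Devanagari signs have canonical combining class 0, so normalization never reorders them. Whether a given renderer displays the two Devanagari orders identically is not something this code checks.

```python
import unicodedata

def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Marks with different non-zero combining classes: canonical reordering
# sorts them, so both typing orders normalize to the same string.
latin_1 = "a\u0301\u0323"  # a + COMBINING ACUTE ACCENT (ccc 230) + COMBINING DOT BELOW (ccc 220)
latin_2 = "a\u0323\u0301"  # same marks, other order

# Marks with combining class 0: normalization leaves the order alone,
# so the two spellings stay distinct strings.
deva_1 = "\u0915\u093E\u0902"  # KA + VOWEL SIGN AA + SIGN ANUSVARA
deva_2 = "\u0915\u0902\u093E"  # KA + SIGN ANUSVARA + VOWEL SIGN AA

print(same_after_nfc(latin_1, latin_2))
print(same_after_nfc(deva_1, deva_2))
print([unicodedata.combining(c) for c in deva_1])
```

The first comparison succeeds and the second does not, which is the gap any equivalence rule layered on top of Unicode would have to fill.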

For some tasks it would be great to have only well-formed syllables; but to do that, you would need to add additional interpretation on top of the Unicode definitions of a grapheme cluster.

If you just wrap the raw combining sequences into textels, then some tasks might not actually get simpler. Instead of a simple rule that determines which alternate orderings of marks are equivalent (to account for users not typing them in the preferred order), you would have to exhaustively list all combinations and set up equivalence tables.
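The contrast between a rule and a table can be sketched roughly as follows. Everything here is hypothetical: PREFERRED_RANK is an invented ordering for three Devanagari signs, chosen only for illustration and not taken from Unicode data, and canonical_key and equivalence_table are made-up helpers. The rule-based approach sorts the marks after the base into the preferred order; the table-based approach has to enumerate every ordering of every mark combination.

```python
from itertools import permutations

# Hypothetical preferred order (lower rank comes first). Not Unicode data.
PREFERRED_RANK = {
    "\u093C": 0,  # DEVANAGARI SIGN NUKTA
    "\u093E": 1,  # DEVANAGARI VOWEL SIGN AA
    "\u0902": 2,  # DEVANAGARI SIGN ANUSVARA
}

def canonical_key(cluster: str) -> str:
    """Rule-based: keep the base, sort the trailing marks by preferred rank."""
    base, marks = cluster[0], cluster[1:]
    return base + "".join(sorted(marks, key=lambda m: PREFERRED_RANK[m]))

def equivalence_table(base: str, marks: str) -> dict:
    """Table-based: map every ordering of the marks to the preferred spelling."""
    preferred = canonical_key(base + marks)
    return {base + "".join(p): preferred for p in permutations(marks)}

marks = "\u093C\u093E\u0902"
table = equivalence_table("\u0915", marks)  # base is DEVANAGARI LETTER KA

# One small rule handles any ordering of the known marks ...
print(canonical_key("\u0915\u0902\u093E\u093C") == canonical_key("\u0915" + marks))
# ... while the table needs one entry per ordering, per base, per mark set.
print(len(table))
```

With three distinct marks the table already needs 3! = 6 entries for a single base, and that count has to be multiplied across every base and every mark combination, whereas the rule stays the same size.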

