On Sun, Sep 18 2016 at 22:02 CEST, asm...@ix.netcom.com writes:
> On 9/18/2016 3:26 AM, Janusz S. Bien wrote:
[...]
>> From the Unicode glossary:
>>
>> Grapheme. (1) A minimally distinctive unit of writing in the context
>> of a particular writing system.[...] (2) What a user thinks of as a
>> character.
>
> "writing system" is vague enough to cover variations that might be
> regional or language dependent.

That is obvious to me.

>>
>> As for (2), cf.
>>
>> User-Perceived Character. What everyone thinks of as a character in
>> their script.
>>
>> So we have "a user" versus "everyone...in their script" - is the
>> difference intentional? Probably not. Anyway the definitions are
>> language/locale dependent.
>
> The "everyone" here aims at a shared understanding.

That's also quite obvious to me. "A user" in Grapheme (2) is strange,
to say the least.

>
> This becomes tricky in the case of Abugidas. There's certainly a
> shared understanding that the "unit of writing" is the syllable,
> rather than an individual mark, but the latter do have well-understood
> identities, not least for teaching. That's perhaps the reason why
> there's the handwaving about "minimally distinctive".
>
> In some scripts like that, users can enter multiple sequences of
> characters that resolve (for all practical purposes) into the same
> syllable. (A big part of that in some scripts is that Unicode does not
> always provide a means to normalize the order of subsidiary signs and
> marks, typically combining marks)
>
> For some tasks it would be great to have only well-formed syllables;
> but to do that, you would need to add additional interpretation on top
> of the Unicode definitions of a grapheme cluster.
>
> If you just wrap the raw combining sequences into textels, then some
> tasks might not actually get simpler. Instead of a simple rule that
> determines which alternate orderings of marks are equivalent (to
> account for users not typing them in the preferred order) you would
> have to exhaustively list all combinations and set up equivalent
> tables.

I would like to know how Swift is handling this.
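
To make the quoted point about mark ordering concrete, here is a minimal
sketch in Python, using only the standard unicodedata module. It is not a
description of what Swift does. The helper name canonically_equal and the
two sample sequences are hypothetical and chosen only for illustration.
It compares two typing orders of the same marks, once for a Latin base
letter and once for a Devanagari consonant, and reports whether Unicode
normalization treats them as the same text.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Return True if a and b have the same NFD form (canonical equivalence)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Latin: COMBINING DOT BELOW (ccc 220) and COMBINING ACUTE ACCENT (ccc 230)
# have different combining classes, so normalization reorders them into one
# canonical sequence regardless of typing order.
latin_1 = "a\u0323\u0301"
latin_2 = "a\u0301\u0323"

# Devanagari (illustrative input, assumed for this sketch): VOWEL SIGN AA
# (U+093E) and SIGN ANUSVARA (U+0902) both have combining class 0, so
# normalization leaves the typed order as it is.
deva_1 = "\u0915\u093E\u0902"
deva_2 = "\u0915\u0902\u093E"

for name, x, y in [("Latin", latin_1, latin_2), ("Devanagari", deva_1, deva_2)]:
    classes = [unicodedata.combining(ch) for ch in x]
    print(name, "combining classes:", classes,
          "equivalent after normalization:", canonically_equal(x, y))
```

The Latin pair comes out equivalent and the Devanagari pair does not,
because canonical reordering only applies to marks with non-zero, differing
combining classes. Any system that compares text by canonical equivalence
alone would therefore still need an extra, script-specific rule to treat
the two Devanagari sequences as the same syllable.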
I still have a feeling that the Swift characters are almost exactly my
textels.

Best regards

Janusz

-- 
Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/