2014-03-23 1:16 GMT+01:00 Richard Wordingham < richard.wording...@ntlworld.com>:
> On Sat, 22 Mar 2014 23:37:49 +0100
> Philippe Verdy <verd...@wanadoo.fr> wrote:
>
> > 2014-03-22 20:50 GMT+01:00 Richard Wordingham <
> > richard.wording...@ntlworld.com>:
> >
> > > > But it won't apply to "diacritics" (combining characters or joiner
> > > > controls like CGJ, ZWJ and ZWNJ, and possibly even some other
> > > > format controls) that have combining class 0 because their encoding
> > > > order is significant to you know where to stop the effect of
> > > > Backspace.
> > >
> > > Your approach recommends input methods that separate combining
> > > marks of different combining classes by CGJ for easier editing!
> >
> > NO. I certainly do not recommend it ! This is a false assertion.
>
> If one takes your approach to handling input, then one needs CGJ to ease
> the correction of diacritics. I am not saying that you recommend the
> use of CGJ.
>
> > > > I see absolutely no reason why Backspace would arbitrarily delete
> > > > only the last encoded character when users cannot even count them
> > > > and may not have input them separately, or could expect them to
> > > > have been typed in a different order.
> > > >
> > > > So yes, entering:
> > > > <CEDILLA DEADKEY, ACUTE DEADKEY, C, BACKSPACE>, or
> > > > <ACUTE DEADKEY, CEDILLA DEADKEY, C, BACKSPACE>, or
> > > > <ACUTE DEADKEY, C WITH CEDILLA, BACKSPACE>, or
> > > > <CEDILLA DEADKEY, C WITH ACUTE, BACKSPACE>
> > > > should all result in keeping only the letter C in the backing
> > > > store.
> > > >
> > > > And with an IME supporting a Compose key this will also be true:
> > > >
> > > > <COMPOSE, C, CEDILLA, ACUTE, BACKSPACE>, or
> > > > <COMPOSE, C, ACUTE, CEDILLA, BACKSPACE>, or
> > > > <COMPOSE, C WITH CEDILLA, ACUTE, BACKSPACE>, or
> > > > <COMPOSE, C WITH ACUTE, CEDILLA, BACKSPACE>
> > >
> > > Your input methods suggest that there is something unitary about the
> > > result - which makes sense if their output is U+1E08 LATIN CAPITAL
> > > LETTER C WITH CEDILLA AND ACUTE.
> > > Would you make the same arguments
> > > if 'C' were replaced with 'S'? There is no character LATIN CAPITAL
> > > LETTER S WITH CEDILLA AND ACUTE.
> >
> > I have NOT said that there existed such character (look at the
> > separating commas).
>
> I looked at the names. Dead keys are effectively modifiers applied
> beforehand rather than simultaneously, so there is no more reason for
> the dead key sequences to generate more than one character than there
> is for an ordinary key to generate multiple characters.
>
> The use of 'COMPOSE' indicates that one is not simply entering a
> sequence of characters. 'COMPOSE, C, CEDILLA, ACUTE' should mean
> an input process different to simply 'C, COMBINING CEDILLA, COMBINING
> ACUTE'.

Here again you reinterpret what I did not say. When I used DEADKEY or COMPOSE, I was evidently referring to keystrokes, not characters. I did not imply any particular encoding of characters (I was clear enough in saying that these sequences of keystrokes are allowed to generate any canonically equivalent encoding). Instead I described the input (on a keyboard or IME) and the expected output (encoded text that should be canonically equivalent).

I have NOWHERE intended to force the use of CGJ. You seem to imply that these keys will generate separate combining diacritics or joiners, one or two for each key. This is wrong: the IME or keyboard driver tracks the state of the keystrokes, whether you use a COMPOSE key or a DEAD KEY, and it won't feed characters into the encoded text as long as the state is not complete.

In fact this input with a compose key does not work:

  COMPOSE, C, CEDILLA, ACUTE

simply because the compose sequence is already terminated after the cedilla modifier key. So when you then type the acute modifier key, it is not associated with the sequence. That's another reason why dead keys work: the state is not complete as long as you have not *finally* input the base letter.
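To make the point about driver state concrete, here is a minimal Python sketch of a dead-key driver. It is only an illustration under assumptions: the key names, the `DeadKeyDriver` class and the `type_keys` helper are hypothetical, and emitting NFC is just one of the canonically equivalent outputs a real driver could choose.

```python
import unicodedata

# Hypothetical dead-key names mapped to the combining marks they stand for.
DEAD_KEYS = {"ACUTE": "\u0301", "CEDILLA": "\u0327"}


class DeadKeyDriver:
    """Toy driver: dead keys only change state; text is emitted on the base letter."""

    def __init__(self):
        self.pending = []   # marks waiting for a base letter
        self.output = ""    # the encoded text (backing store)

    def press(self, key):
        if key in DEAD_KEYS:
            # State is incomplete: remember the mark, emit nothing yet.
            self.pending.append(DEAD_KEYS[key])
            return
        # The base letter completes the state. NFC puts the marks in
        # canonical order and precomposes where possible (an assumed choice).
        self.output += unicodedata.normalize("NFC", key + "".join(self.pending))
        self.pending.clear()


def type_keys(keys):
    """Feed a list of key names to a fresh driver and return the encoded text."""
    driver = DeadKeyDriver()
    for key in keys:
        driver.press(key)
    return driver.output
```

With this sketch, `type_keys(["ACUTE", "CEDILLA", "C"])` and `type_keys(["CEDILLA", "ACUTE", "C"])` give the same encoded text, and nothing is emitted while only dead keys are pending.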
But let's suppose that the driver must generate something. Then for the ACUTE key it would need to output the combining character, possibly with a preceding CGJ if the intent is to fix the order of the acute accent relative to the cedilla (this is very unusual). In most usages, by far, diacritics never need a preceding CGJ to preserve their relative ordering: it is almost never needed for diacritics that have distinct non-zero combining classes. The rare cases occur, however, in classical pointed Hebrew.

For this reason the keyboard driver will likely include a separate key mapping for the CGJ, either
- as a base key entered after the diacritic dead key, to force the output of the CGJ+diacritic characters; or
- as a sequence of COMPOSE + diacritic key, without any key for the intermediate base letter, to produce the same output.

In the first case (driver with dead keys), you need a single keyboard mapping for the CGJ working as a dead key. In the second case (driver with compose key), you use the COMPOSE key mapping only, but you still need to map positions for the second base key (in the 3-key compose sequence) meant to represent diacritics.

The effect of Backspace entered just after it would be to delete the CGJ and the diacritic simultaneously. It does not need to depend on the input state of the driver or the IME.

In all cases, nothing in the keyboard mapping or IME will generate an isolated CGJ character: it will always be followed by something.

But what would happen if you typed the compose sequence generating CGJ and forgot to press the initial base letter, or typed COMPOSE after the base letter?

  C, COMPOSE, ACUTE

You get the characters <C, CGJ, combining ACUTE>. You cannot type another CEDILLA after it without pressing COMPOSE again before it, to get <C, CGJ, combining ACUTE, CGJ, combining CEDILLA>.
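The Backspace behaviour described here can be sketched in Python as follows. This is one possible reading, not a specification: the `backspace` function is hypothetical, and it assumes that Backspace removes the trailing run of marks with non-zero combining class, together with a CGJ standing immediately before that run, working only on the stored text and not on any driver state.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0


def backspace(text: str) -> str:
    """Hypothetical Backspace working only on the encoded text."""
    end = len(text)
    # Walk back over the trailing marks whose combining class is non-zero.
    while end > 0 and unicodedata.combining(text[end - 1]) != 0:
        end -= 1
    if end == len(text):
        # No trailing marks: fall back to deleting one code point.
        return text[:-1]
    # A CGJ placed just before the deleted marks goes away with them.
    if end > 0 and text[end - 1] == CGJ:
        end -= 1
    return text[:end]
```

Under these assumptions, Backspace after <C, CGJ, combining ACUTE, CGJ, combining CEDILLA> leaves <C, CGJ, combining ACUTE>, and Backspace after <C, combining CEDILLA, combining ACUTE> leaves only the letter C.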
The result clearly abuses CGJ, when the output should just be canonically equivalent to <C, combining ACUTE, combining CEDILLA> (i.e. without any CGJ at all).

Your system would be even less meaningful: it would break in most renderers and spell checkers. It would break in IDNA domain names. It would not match in plain-text search unless the search is tuned so that its collators discard the CGJs to look for fuzzy matches (fuzzy matches would also look for strings that are compatibility equivalent under NFKD, or could search at collation level 2, or at collation level 1 ignoring all diacritics and CGJ wherever they are).

So compose keys cause more confusion to native users than dead keys, which are smarter: they can record more internal state and also allow an arbitrary order of input for unordered diacritics. Take acute plus cedilla: you can press their dead keys in either order, and the IME or driver handles the case and generates them, preferably in canonical order with increasing combining classes. The driver or IME also generates them in an input state where it knows the base letter to output, so it can precombine the diacritics: it will output C WITH CEDILLA followed by COMBINING ACUTE, as expected, and still without needing any CGJ.
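The canonical equivalence argument can be checked with Python's standard `unicodedata` module. The variable names below are only illustrative; the combining classes (202 for the cedilla, 230 for the acute, 0 for CGJ) come from the Unicode character database.

```python
import unicodedata

C, ACUTE, CEDILLA, CGJ = "C", "\u0301", "\u0327", "\u034F"


def nfd(s):
    return unicodedata.normalize("NFD", s)


acute_first = C + ACUTE + CEDILLA
cedilla_first = C + CEDILLA + ACUTE
with_cgj = C + CGJ + ACUTE + CGJ + CEDILLA

# Either typing order is canonically equivalent: NFD sorts the marks by
# increasing combining class (cedilla 202 before acute 230).
same_without_cgj = nfd(acute_first) == nfd(cedilla_first)

# C WITH CEDILLA followed by COMBINING ACUTE is equivalent too.
precombined_ok = nfd("\u00C7" + ACUTE) == nfd(acute_first)

# CGJ has combining class 0, so it blocks the reordering: the sequence
# with CGJs is NOT canonically equivalent to the plain one.
cgj_differs = nfd(with_cgj) != nfd(acute_first)

print(same_without_cgj, precombined_ok, cgj_differs)
```

All three checks print `True`: the two typing orders and the precombined form are canonically equivalent, while the sequence with CGJs is a different string under normalization, which is why it would fail an exact normalized match.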
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode