On Tue, 5 Jun 2018 19:48:53 -0700 Manish Goregaokar via Unicode <unicode@unicode.org> wrote:
> Following up from my previous email
> <https://www.unicode.org/mail-arch/unicode-ml/y2018-m06/0007.html>,
> one of the ideas that was brought up was that if we're going to
> consider NFKC forms equivalent, we should require things to be typed
> in NFKC.
>
> I'm a bit wary of this. As Richard brought up in that thread, some
> Thai NFKC forms are untypable. I *suspect* there are Hangul keyboards
> (perhaps physical non-IME based ones) that have this problem.
>
> Do folks have other examples? Interested in both:

I don't know of any different problems for NFKC, but there are
problems with getting people to enter normalised data.

> - Words (as in, real things people will want to type) where a
> keyboard/IME does not type the NFKC form

There are problems with insisting that users type normalised text.
Vietnamese is probably a real issue here; the standard keyboard is set
up to enter vowels (some of which are accented) and tone marks
separately. Indeed, with the nặng tone (as in the vowel of its name),
one is likely to find the code point sequence <U+0103 LATIN SMALL
LETTER A WITH BREVE, U+0323 COMBINING DOT BELOW>, which is not NFC
(that would be U+1EB7 LATIN SMALL LETTER A WITH BREVE AND DOT BELOW),
not NFD and not even FCD.

> - Words where the NFKC form is *visually* distinct enough that it
> will look weird to native speakers

There may be issues with BMP CJK compatibility ideographs. I don't know
how far they've been replaced by variation sequences requesting the
same appearance.

> - Words where a keyboard/IME *can* type the NFKC form but users are
> not used to it

Well, typing Tai Khuen in normalised form is hideously
counter-intuitive, but at present the USE (Universal Shaping Engine)
makes displaying correctly spelt text a struggle for a font. The
problem there is that the usual way of typing a closed syllable with a
tone mark gets normalised at the end to <SAKOT, tone mark,
final_consonant>; that normalisation broke early, pre-USE
OpenType-based fonts as databases caught up with Unicode 5.2.
That problem was promptly cured by HarfBuzz tweaking its internal
normalisation, until USE unintentionally outlawed correct spelling.

A universal keyboard for entering large swathes of the Latin script is
not a very big problem, but entering text with diacritics in form NFC
is a real pain. This problem might arise when editing a Hungarian
program without a Hungarian keyboard. The program development
environment would have to provide a normalisation tool.

Richard.
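The Vietnamese example above, and the kind of normalisation tool an editor might offer, can be checked with a minimal Python sketch (not from the thread). It assumes Python 3.8+ for `unicodedata.is_normalized`; `codepoints`, `is_fcd` and `normalise_source` are hypothetical helper names written for this illustration, and `is_fcd` follows the FCD definition from Unicode Technical Note #5 using Python's own Unicode data.

```python
import unicodedata


def codepoints(text: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def is_fcd(text: str) -> bool:
    """Return True if `text` is in FCD form.

    Each character is replaced by its canonical decomposition, without
    reordering across character boundaries; the text is FCD if the result
    is still canonically ordered, i.e. no non-starter immediately follows
    a character with a higher canonical combining class.
    """
    prev_trailing_ccc = 0
    for ch in text:
        # NFD of a single character is its full canonical decomposition,
        # which is already canonically ordered internally, so only the
        # boundaries between characters need checking.
        decomposition = unicodedata.normalize("NFD", ch)
        leading_ccc = unicodedata.combining(decomposition[0])
        if leading_ccc != 0 and prev_trailing_ccc > leading_ccc:
            return False
        prev_trailing_ccc = unicodedata.combining(decomposition[-1])
    return True


def normalise_source(text: str, form: str = "NFC") -> str:
    """Hypothetical editor helper: return `text` in the requested form."""
    return unicodedata.normalize(form, text)


if __name__ == "__main__":
    # a-breve (U+0103) typed first, then the dot-below tone mark (U+0323).
    typed = "\u0103\u0323"
    print("typed:   ", codepoints(typed))
    print("NFC?     ", unicodedata.is_normalized("NFC", typed))
    print("NFD?     ", unicodedata.is_normalized("NFD", typed))
    print("FCD?     ", is_fcd(typed))
    print("NFC form:", codepoints(normalise_source(typed)))
```

Run as a script, it should report False for all three checks and show U+1EB7 as the NFC form. The sequence fails FCD because U+0103 decomposes to <a, U+0306> and U+0306 (combining class 230) is then followed by U+0323 (combining class 220). A language that compares identifiers under NFKC could pass `form="NFKC"` to the same hypothetical helper.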