On Fri, 1 Feb 2013 23:51:34 +0000 (GMT) Julian Bradfield <[email protected]> wrote:
> On 2013-02-01, Costello, Roger L. <[email protected]> wrote:
> > So why would one ever generate text in decomposed form (NFD)?
>
> Text that I type is quite likely to be in decomposed (or at least not
> composed) form, because I find it a lot easier to have a few
> keystrokes for combining accents than to set up compose key sequences
> for all the possible composed characters.
> For example,
> ǂhèẽ-ǂhèẽ ǃn̥à̰ĩ-ǃn̥à̰ĩ
> was part of the title of a talk. Is there a composed form of à̰? I
> don't know, and don't want to!
> Much easier to do searches and other text processing on it, too.
> (The current dictionary project for this language uses NFD in its data
> files, too.)

But if you use a member of the Keyman family of input methods (I've been
using Keyman for Linux (KMFL)), you can set up a keyboard so you just
enter that using XSAMPA keystrokes, e.g.

=\he_Le~-=\he_Le~ !\n_0a_L_ki~-!\n_0a_L_ki~

and get ǂhèẽ-ǂhèẽ ǃn̥à̰ĩ-ǃn̥à̰ĩ. The keyboard mapping definition determines
whether the combining grave from ‘_L’ composes.

The only problem is that to get NFC you have to remember to type a_L_k
to get the NFC form à̰ rather than a_k_L, which delivers the NFD form à̰,
but do you not have to remember the order of diacritics anyway? Simple
codepoint-sequence based searching only works if diacritics are in the
correct order.

Having set up an NFC-delivering XSAMPA-based keyboard so that it had
rules O => ɔ, O\ => ʘ, O\\ => O, I’ve found it would be a lot more
useful if I’d been a lot less puristic and set it up so that I had
O => O, O\ => ɔ, O\\ => ʘ. I use multiple backslashes to get some
additional characters and recover ASCII, an idea I got from Martin
Hosken’s IPA keyboard. I’m currently pondering how to maintain puristic
and ‘practical’ versions from the same source files. Ideally I’d also
merge in the related Emacs keyboard definition.

However, as you say, processing is a lot simpler if the text is
guaranteed to be in NFD.

Richard.
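To make the point about diacritic order and searching concrete, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration, not part of any Keyman or KMFL keyboard: the variable names and the codepoints() helper are made up for this example, and it assumes that ‘_L’ maps to U+0300 COMBINING GRAVE ACCENT and ‘_k’ to U+0330 COMBINING TILDE BELOW, with no composition done by the keyboard itself.

```python
import unicodedata

GRAVE = "\u0300"        # COMBINING GRAVE ACCENT (canonical combining class 230)
TILDE_BELOW = "\u0330"  # COMBINING TILDE BELOW (canonical combining class 220)

# The same letter entered with the diacritics in the two possible orders
# (assumed keystroke-to-codepoint mapping, see the text above).
grave_first = "a" + GRAVE + TILDE_BELOW   # as if typed a_L_k
tilde_first = "a" + TILDE_BELOW + GRAVE   # as if typed a_k_L


def codepoints(s):
    """Hypothetical helper: show a string as a list of U+XXXX codepoints."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Normalisation reorders the marks canonically, so both inputs agree.
nfd_a = unicodedata.normalize("NFD", grave_first)
nfd_b = unicodedata.normalize("NFD", tilde_first)
nfc_a = unicodedata.normalize("NFC", grave_first)
nfc_b = unicodedata.normalize("NFC", tilde_first)

print("NFD:", codepoints(nfd_a))
print("NFC:", codepoints(nfc_a))

# A raw codepoint-sequence search misses the other ordering ...
text = "n\u0325" + tilde_first + "\u0129"
raw_hit = grave_first in text

# ... but finds it once both the text and the pattern are normalised.
normalised_hit = (
    unicodedata.normalize("NFD", grave_first)
    in unicodedata.normalize("NFD", text)
)
print("raw search:", raw_hit, "| normalised search:", normalised_hit)
```

Normalising both the pattern and the text to the same form before comparing is what makes the simple substring search dependable, whichever order the diacritics were typed in.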

