On 2/15/2016 3:32 PM, Mats Blakstad wrote:
[…]
> I now wonder, generally, is it best to add new precomposed characters to 
> Unicode? Should there be a unicode symbol for each combination used? What is 
> best practise? I ask because I see some unicodes are precomposed characters, 
> I'm not sure why they are useful, but if they are maybe we also should add 
> these?
[…]

On Mon, 15 Feb 2016 20:46:28 -0800, Asmus Freytag (t)  answered :
[…]
> However, precomposing these is simply out. Unicode locked that door and threw 
> away the key (short answer). The long answer will come along shortly.

Existing precomposed characters were proposed before the deadline, i.e. in 
the past millennium, and encoded for backwards compatibility. Therefore, the 
scripts of many Latin-writing countries, including Vietnam, can be represented 
both in NFD *and* in NFC, but this is purely fortuitous. Since the well-known 
Unicode encoding scheme is based on _combining diacritics_, part of 
implementing it consists in supporting these at all stages of data processing, 
including input.
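
To make the NFD/NFC point concrete, here is a minimal sketch using Python's 
standard unicodedata module. The two sample letters are my own illustrative 
choices, not taken from the thread: Vietnamese ệ, which has a precomposed code 
point, and open e with grave, which has none and therefore stays a combining 
sequence even in NFC.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Vietnamese e with circumflex and dot below: a precomposed code point exists,
# so NFD and NFC give different representations of the same letter.
viet = "\u1EC7"
viet_nfd = unicodedata.normalize("NFD", viet)
viet_nfc = unicodedata.normalize("NFC", viet_nfd)

# Open e with grave (illustrative choice): no precomposed code point exists,
# so NFC has to keep the base letter followed by the combining mark.
open_e_grave = "\u025B\u0300"
open_e_nfc = unicodedata.normalize("NFC", open_e_grave)

print(code_points(viet_nfd))
print(code_points(viet_nfc))
print(code_points(open_e_nfc))
```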

The big problem that you stumbled upon is that Windows keyboard layout 
drivers, as opposed to Linux, cannot generate more than a single UTF-16 code 
unit by dead keys. Supposedly this is due to a gap in keyboard standardization. 
When ISO/IEC 9995 was published in 1994, after a decade of work, and after 
Unicode had been thriving for a couple of years, the standard provided nothing 
to cater for Unicode implementation. A bit later, the Windows keyboard APIs 
were frozen, for backwards compatibility.
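
A small sketch of what that limit means in practice, assuming the constraint is 
exactly as described above (one UTF-16 code unit per dead key result). The 
helper names are hypothetical, and the sample letters are illustrative only.

```python
def utf16_code_units(text):
    """Count the UTF-16 code units needed to encode text."""
    # Every UTF-16 code unit is two bytes; "utf-16-le" adds no byte order mark.
    return len(text.encode("utf-16-le")) // 2

def fits_single_code_unit(text):
    """True if text could be a dead key result under a one-code-unit limit."""
    return utf16_code_units(text) == 1

# Precomposed e with acute: one code unit, so a dead key can produce it.
print(fits_single_code_unit("\u00E9"))

# Open e followed by combining grave: two code units, so it cannot.
print(fits_single_code_unit("\u025B\u0300"))
```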

Indeed there _is_ a problem. But there are solutions.

On Tue, 16 Feb 2016 09:00:26 +0100, Philippe Verdy  answered :
[…]
> Keyboard layouts MUST generate the combining sequence.
[…]

Indeed Unicode states that «it is straightforward to adapt such a system» of 
dead keys to output combining sequences as well, and that was the idea when 
ISO/IEC 9995-11 was added last year. That most recent part of the standard 
specifies the algorithm of an IME that uses the NormalizeString function or 
the String.Normalize method provided by the OS. You may wish to look up the 
long description on French Wikipedia [1].
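
The idea can be sketched roughly as follows. This is not the text of the 
ISO/IEC 9995-11 algorithm, only an illustration of the principle: the dead key 
contributes a combining mark, the mark is appended to the base letter, and 
normalization decides whether a precomposed character comes out. Python's 
unicodedata.normalize stands in here for the OS normalization function, and 
resolve_dead_key is a hypothetical name.

```python
import unicodedata

def resolve_dead_key(combining_mark, base):
    """Combine a dead key's mark with the base letter typed after it.

    The mark is placed after the base, then the result is normalized to NFC,
    so a precomposed character is used when one exists and the combining
    sequence is kept when none exists.
    """
    return unicodedata.normalize("NFC", base + combining_mark)

# Acute dead key, then e: a precomposed character exists.
print(len(resolve_dead_key("\u0301", "e")))

# Grave dead key, then open e: the combining sequence is output as is.
print(len(resolve_dead_key("\u0300", "\u025B")))
```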

On Windows there is however no need for a *new*, ISO/IEC-conformant IME, as 
Keyman keyboard layouts are already able to generate whatever sequence is 
required, from whatever input is specified, with dead keys or visible on 
screen. If you have checked the Pan Africa (Deadkeys) layout that is suitable 
for Togo and many other African countries, as well as the official SIL Pan 
Africa keyboard, and they don’t match your requirements (because diacritics 
are entered _after_ the base letter, even to get existing precomposed letters 
output), you may wish to use Keyman Developer to create a layout that outputs 
combining sequences entered by dead keys.

Experience shows however that training on dead key layouts, as used for 
French, can be extended to the use of combining diacritics entered after the 
base letter, with an appropriate keyboard layout driver. Since these combining 
characters are actually the most useful form of most diacritics, it is 
recommended that they be generated when the space bar is hit after a dead key, 
if dead keys are present. More obviously, all needed diacritics are allocated 
to key positions, so that they can be added to any letter by means of a single 
keystroke. One example is the keyboard layout for Bamanankan and French on the 
/Mali Pense/ site that Don Osbornʼs /Beyond Niamey/ blog links to [2]. Anyway, 
entering diacritics _after_ the base letter is the most up-to-date way to input 
composed characters, because it is very intuitive, and because it realizes the 
spirit of the character representation scheme of Unicode.
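
As a rough illustration of that behaviour, here is a toy keystroke processor. 
Everything in it is an assumption made for the example: the key names, the 
two-entry dead key table, and the rule that a second dead key flushes the 
first as a combining mark. It does not describe any existing layout or driver.

```python
import unicodedata

# Hypothetical dead key table: key name -> combining mark it stands for.
DEAD_KEYS = {"<grave>": "\u0300", "<acute>": "\u0301"}

def type_keys(keys):
    """Turn a list of keystrokes into text.

    A dead key followed by a letter puts the mark on that letter. A dead key
    followed by space emits the bare combining mark, which then attaches to
    the letter typed before the dead key.
    """
    out = []
    pending = None  # combining mark of a dead key waiting for the next key
    for key in keys:
        if key in DEAD_KEYS:
            if pending is not None:
                out.append(pending)  # simplification: flush the earlier mark
            pending = DEAD_KEYS[key]
        elif pending is not None:
            # Space after a dead key yields the combining mark alone.
            out.append(pending if key == " " else key + pending)
            pending = None
        else:
            out.append(key)
    # A dead key left pending at the end of input is dropped in this sketch.
    return unicodedata.normalize("NFC", "".join(out))

# Dead key first, as on a French layout.
print(type_keys(["<acute>", "e"]))

# Base letter first, then dead key plus space to add the combining mark.
print(type_keys(["\u025B", "<grave>", " "]))
```

Both input orders end up in the same normalized form, so a user trained on 
dead keys and a user typing the diacritic after the letter produce identical 
text.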

I hope that helps too.

Best regards,

Marcel

[1] 
https://fr.wikipedia.org/wiki/ISO/CEI_9995#ISO.2FCEI_9995-11_-_Les_touches_mortes

[2] Don Osborn. Beyond Niamey: Writing Bambara right. (2014, November 25). 
Retrieved October 22, 2015, from 
http://niamey.blogspot.fr/2014/11/writing-bambara-right.html
