Iʼve got a problem with the word joiner and would like to ask whether things
could be changed. After two examples, Iʼll state the issue.
To do traditional French typography on the PC, a justifying no-break space is
needed along with the colon, because this punctuation must be placed in the
middle between the word it belongs to and the following word. According to the
Standard, page 799 (§ 23.2), such a space is obtained by bracketing a white
space with word joiners: U+2060 U+0020 U+2060. To make this colon readily
available on the keyboard, I should therefore program the sequence:
{VK_OEM_2 /*T34 B09*/ ,3 ,0x2060 ,' ' ,0x2060 ,':' ,NONE ,NONE ,NONE ,NONE
,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE ,NONE }
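
For illustration only, the character sequence this key is meant to produce can
be spelled out in a few lines of Python. The names WJ, JUSTIFYING_NBSP and
french_colon are made up for this sketch; it only builds the string and says
nothing about how a given renderer or line breaker actually treats it.

```python
# Hypothetical names; only the code points come from the text above.
WJ = "\u2060"     # WORD JOINER
SPACE = "\u0020"  # SPACE

# Justifying no-break space: a plain space bracketed by word joiners.
JUSTIFYING_NBSP = WJ + SPACE + WJ


def french_colon(word):
    """Return `word` followed by the justifying no-break space and a colon."""
    return word + JUSTIFYING_NBSP + ":"


if __name__ == "__main__":
    text = french_colon("Exemple")
    # List the code points of the last four characters.
    print([f"U+{ord(c):04X}" for c in text[-4:]])
```
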
Still in French, the letter apostrophe, when used as the ordinary apostrophe,
prevents the following word from being identified as a word because of the
missing word boundary and, consequently, prevents autoexpand from working.
This can be fixed by adding a word joiner after the apostrophe, thanks to an
autocorrect entry that replaces U+02BC inserted by default in typographic mode,
with U+02BC U+2060. (About why to use U+02BC, even in French, please refer to
the preceding thread ‘A new take on the English Apostrophe in Unicode’. Iʼll
just add that unless apostrophes and closing quotes are disambiguated, any
search for quotations, e.g. to mark them up, using the wildcard * bracketed
like ‘*’ must fail, because results are cut off at the next apostrophe instead
of extending to the closing quote.)
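
The search problem can be reproduced with a small sketch. It assumes that the
wildcard search ‘*’ behaves like a shortest-match regular expression, which may
not hold for every word processor; the sample sentences and the name
find_quotations are invented for the example.

```python
import re

# Shortest-match equivalent of the wildcard search: U+2018, anything, U+2019.
QUOTED = re.compile("\u2018(.*?)\u2019")


def find_quotations(text):
    """Return the text found between single curly quotes."""
    return QUOTED.findall(text)


# Apostrophe typed as U+2019, the same character as the closing quote.
ambiguous = "Il a dit \u2018l\u2019heure est venue\u2019."
# Apostrophe typed as U+02BC, distinct from the closing quote.
distinct = "Il a dit \u2018l\u02BCheure est venue\u2019."

if __name__ == "__main__":
    print(find_quotations(ambiguous))  # match stops at the apostrophe
    print(find_quotations(distinct))   # match runs to the closing quote
```

With the ambiguous input the match ends after the single letter before the
apostrophe; with U+02BC the whole quotation is returned.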
However, despite the word joiner having been encoded and recommended since
version 3.2 of the Standard, it is still not implemented on Windows 7.
Therefore I must use the traditional zero width no-break space U+FEFF instead.
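
As a rough sketch of what such an autocorrect entry does, the replacement can
be written with the joiner as a parameter, so that U+FEFF can stand in where
U+2060 is not supported. The function name autocorrect_apostrophe and its
joiner parameter are assumptions made for this example, not part of any real
autocorrect facility.

```python
LETTER_APOSTROPHE = "\u02BC"  # MODIFIER LETTER APOSTROPHE
WORD_JOINER = "\u2060"        # preferred by the Standard
ZWNBSP = "\uFEFF"             # fallback where U+2060 is unsupported


def autocorrect_apostrophe(text, joiner=WORD_JOINER):
    """Append `joiner` after every U+02BC not already followed by it."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        # text[i + 1:i + 2] is the next character, or "" at the end of text.
        if ch == LETTER_APOSTROPHE and text[i + 1:i + 2] != joiner:
            out.append(joiner)
    return "".join(out)


if __name__ == "__main__":
    for joiner in (WORD_JOINER, ZWNBSP):
        fixed = autocorrect_apostrophe("l\u02BCheure", joiner)
        print([f"U+{ord(c):04X}" for c in fixed[:3]])
```

Checking for an existing joiner keeps the replacement from stacking up when it
is applied twice to the same text.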
In TUS, sections 23.2 (page 799) and 23.8 (pages 821 sqq), we are taught that
for the semantics of word joining, U+2060 is strongly preferred, but U+FEFF
must still be supported for backward compatibility. It also follows from §
23.8 that in careful text processing, U+FEFF occurs only at the very
beginning of text files, where it serves as a byte order mark (page 822), while
applications in which Unicode has been carefully implemented are expected to
always declare the charset and the transformation format the files are written
in, and donʼt need U+FEFF as a BOM. Therefore, it seems that U+FEFF can still
be used as a ZWNBSP in *new* text files, despite its use being strongly
discouraged and U+2060 being preferred.
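
The distinction drawn here, U+FEFF as a BOM at the very start and as a ZWNBSP
anywhere else, can be sketched as follows. The helper names split_bom and
interior_zwnbsp_positions are hypothetical; the "utf-8-sig" codec is Pythonʼs
standard codec that drops one leading BOM when decoding.

```python
BOM = "\uFEFF"  # same code point as ZERO WIDTH NO-BREAK SPACE


def split_bom(text):
    """Return (has_bom, body), treating a leading U+FEFF as a byte order mark."""
    if text.startswith(BOM):
        return True, text[1:]
    return False, text


def interior_zwnbsp_positions(text):
    """Return indexes in the body (after any BOM) where U+FEFF acts as ZWNBSP."""
    _, body = split_bom(text)
    return [i for i, ch in enumerate(body) if ch == BOM]


if __name__ == "__main__":
    raw = "\uFEFFl\u02BC\uFEFFheure".encode("utf-8")
    # "utf-8-sig" removes only the leading BOM; interior U+FEFF is kept.
    decoded = raw.decode("utf-8-sig")
    print(interior_zwnbsp_positions(decoded))
```
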
Supposing that Microsoft chose not to implement U+2060 WJ because abandoning
U+FEFF ZWNBSP appeared needless and would have brought much trouble for little
or no benefit, may I ask whether Unicode could follow Microsoft once again and
remove the recommendation of U+2060?
Most people just *canʼt* use this character, and keyboard implementations
*must* avoid it.
Best regards,
Marcel Schneider