This message originally contained one pasted screenshot and several attached
ones, which prevented it from being forwarded to the List. I removed them all
and suggest that, for the screenshots, readers refer to the links I added in the
e-mail I resent today to Khaled Hosny.
On Tue, Jun 30, 2015, Doug Ewell wrote:
> Khaled Hosny wrote:
>
> >> On my netbook, which is running Windows 7 Starter, U+2060 is not a
> >> part of any of the shipped fonts.
> >
> > It is a control character, it does not need to have a glyph in the
> > font to be properly supported.
Thank you, Khaled. I will respond to you soon after this.
> The problem is the word "supported." Marcel is seeing a visible glyph (a
> .notdef box) for what is supposed to be an invisible, zero-width
> character, and that is leading him to conclude that Windows doesn't
> "support" this character.
The .notdef box is exactly what I see sometimes in Notepad, and every time in
Word's dialog boxes, when I use U+2060. In the Word document itself, however, I
see a particular glyph: a tall, full-height empty box followed by a wide space,
even though the font is proportional. In Notepad I see the same box without the
space. Only when I switch to the font you mention below does the word joiner
display correctly in my version of Microsoft Word. Please see the attached
screenshots (I had wanted to paste them into this e-mail).
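Independent of any font, the character database itself says what these code
points are. A minimal check with Python's standard `unicodedata` module (the
helper name `describe` is just for illustration) shows that U+2060 is a format
character (general category Cf), so it is meant to be invisible; a visible box
is a font/rendering matter, not a property of the character.

```python
import unicodedata

# The three characters discussed in this thread.
CHARS = ["\u2060", "\ufeff", "\u00a0"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (category)' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"

for ch in CHARS:
    print(describe(ch))
# U+2060 WORD JOINER (Cf)
# U+FEFF ZERO WIDTH NO-BREAK SPACE (Cf)
# U+00A0 NO-BREAK SPACE (Zs)

text = "one\u2060two"  # Doug's test string
print(len(text))       # 7 code points, although it should render as "onetwo"
```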
> On my Win 7 machine at work, when I enter the string "onetwo"
> ("one\u2060two") and click on either word, both words are selected. That
> is exactly what I would expect WJ to do. This works on the built-in
> Notepad as well as Notepad++ and BabelPad (but not on GoDaddy's
> Web-based email client).
Double-click selection corresponds to what Richard did with quick cursor
movement. These are text-editing features that give little evidence of the
presence or absence of word boundaries. So I redid your test using the search
tool instead, with the "Whole words only" option enabled. This shows how the
application perceives words as entities, or rather, what results its developers
expect users to expect from a search. In other words, what we see is normally
what we are expected to expect. Personally, I wouldn't like to have only part of
a compound selected when I most probably want to mark it up as a whole, and
neither would you, Doug. This is why double-clicking anywhere in the sequence
selects the whole sequence. By contrast, since we took care to insert word
joiners where we normally aren't expected to (simply typing the words one after
the other, with nothing between them, is enough to make them *one* word), the
software engineers assume we want to join what must remain a sequence of
separate words. Consequently, the built-in search engine recognizes each word as
a word in its own right.
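One simple way an application might implement "Whole words only" is with
regular-expression word boundaries. This is only an assumption about how such
tools work, not a description of Word's or LibreOffice's actual code, but it
reproduces the behavior described above: in Python's `re`, U+2060 is not a word
character, so `\b` matches on either side of it. The function name
`whole_word_hits` is hypothetical.

```python
import re

def whole_word_hits(word: str, text: str) -> int:
    r"""Count whole-word matches of `word` in `text` using regex word boundaries.

    For str patterns, \w is Unicode-aware, but U+2060 WORD JOINER is a format
    character, not a word character, so \b matches right next to it.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))

print(whole_word_hits("two", "one\u2060two"))  # 1: found as a separate word
print(whole_word_hits("two", "onetwo"))        # 0: part of one longer word
```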
This is where good software shows its benefits. Some software does not
recognize ZWNBSP or NBSP (I don't know which one, or whether both) as marking a
word boundary, and therefore does not work correctly. This also depends on the
PDF conversion tool. Please check the screenshots (I switched the UIs to English
wherever possible, that is, in LibreOffice). [This e-mail was blocked because it
contained several attached screenshots, so I am resending it without the
attached images.]
> But out of more than 500 fonts on that machine, the only stock Microsoft
> fonts that show WJ with zero-width, instead of a .notdef glyph, are
> Javanese Text, Myanmar Text, and Segoe UI Symbol. So while it's
> inaccurate to extrapolate this to "Microsoft doesn't support WJ," the
> font support is definitely lacking.
I want to thank you personally, Doug, for this very valuable hint. Indeed, in
Microsoft Word 2010 Starter on Windows 7 Starter, the WJ is not displayed
correctly unless the font is switched to Segoe UI Symbol (the one of the three
that shipped with my OS). If the Segoe typeface is not appropriate for the
document, we can ask Word to find all instances of U+2060 and replace them with
the same character formatted in Segoe UI Symbol. This may be what Word users are
expected to do every time, even if that isn't really what we expect of a
productivity suite. Perhaps, or most probably, this problem does not occur in
other high-end software such as Microsoft Publisher (this needs to be
confirmed). But anyone who buys Microsoft Office Premium or Professional should
be safe from this malfunction. So should everybody using Microsoft software, in
fact.
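The find-and-replace workaround amounts to giving every U+2060 its own
formatting run. The sketch below models that idea outside Word with plain
(text, font) pairs; it is not Word's object model, and the helper `font_runs`
and the body font "Calibri" in the usage line are hypothetical examples.

```python
from itertools import groupby

WJ = "\u2060"
FALLBACK_FONT = "Segoe UI Symbol"  # the font that displayed WJ correctly here

def font_runs(text: str, body_font: str) -> list[tuple[str, str]]:
    """Split text into (segment, font) runs, giving each U+2060 the fallback font."""
    runs = []
    # groupby collects consecutive characters that are (or are not) word joiners.
    for is_wj, group in groupby(text, key=lambda ch: ch == WJ):
        runs.append(("".join(group), FALLBACK_FONT if is_wj else body_font))
    return runs

print(font_runs("one\u2060two", "Calibri"))
# [('one', 'Calibri'), ('\u2060', 'Segoe UI Symbol'), ('two', 'Calibri')]
```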
> The bit about characters being converted to other characters, of course,
> has nothing to do with Windows and everything to do with particular
> applications.
Based on this hint, I did more tests and found that, for a proper conversion to
plain text, any segment containing U+00A0, U+FEFF, or other format characters
that is copied from a Microsoft Word document must first be pasted into a
LibreOffice document, then copied again, and only then pasted into the text
editor. I had better not vent further about this issue and should wait for
official comments; I simply suppose that there is an algorithm (presumably part
of Microsoft Word) that detects where the clipboard content is being pasted and,
in some cases, destroys the format characters. I'll let everybody guess for what
purpose...
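To verify whether such characters survived a copy and paste, one can list them
by code point before and after. A minimal sketch, assuming the text is available
as a Python string; the helper `special_chars` and the sample strings are
hypothetical, and the "pasted" string only illustrates a lossy result.

```python
import unicodedata

def special_chars(text: str) -> list[str]:
    """List format (Cf) characters and non-ASCII space separators (Zs) as U+XXXX."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Cf"
        or (unicodedata.category(ch) == "Zs" and ch != " ")
    ]

copied = "one\u00a0two\ufeffthree\u2060four"  # as typed in the source document
pasted = "one twothreefour"                    # hypothetical lossy paste result

print(special_chars(copied))  # ['U+00A0', 'U+FEFF', 'U+2060']
print(special_chars(pasted))  # []
```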
Thanks a lot!
Marcel
[originally one pasted screenshot]