2011/7/4 Andreas Prilop <[email protected]>:
> On Sun, 3 Jul 2011, Jukka K. Korpela wrote:
>
>>> You're wrong, it DOES. I just tested it (in Microsoft Word 2010 for
>>> Windows 7) within a random long word (aaaaaaaaaa....) and the SHY
>>> is recognized to generate the intended hyphenation break.
>>
>> That’s good news, if your analysis is correct, but the problem still
>> exists in all Word versions up and including Word 2007.
>
> Philippe Verdy does not understand the difference between U+001F
> and U+00AD. Even MS Word 2010 continues to use U+001F as soft hyphen
> but does not recognize U+00AD as soft hyphen.

I do know the difference, thanks. I have not said anything about U+001F, and I have not even tested it. In any case it has no defined meaning here, and certainly not that of a soft hyphen, except possibly in old legacy word-processing formats converted to Word; it is just an ASCII control character with unspecified behavior, not suitable for plain-text interchange.

I entered TRUE soft hyphens as U+00AD in a plain-text document and opened it in Word, and this works as expected. I could also copy and paste a SHY from a plain-text document or from the Charmap utility, or type it from my keyboard, and that works as well. Saving the document back in XML format confirms that the character remains U+00AD.

U+001F can only be a legacy from the past. It would certainly not pass XML validation, and current Word formats are XML-based. (I don't know what Word uses in its older binary format compatible with Word 6, but that binary format does not have to obey the same rules, as it is clearly not plain text. The same remark applies to the legacy RTF format, still supported and used in Windows Write/WordPad, which contains lots of legacy hacks and was not designed for Unicode conformance.)
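
To make the distinction between the two code points concrete, here is a minimal Python sketch. It reports the Unicode properties of U+001F and U+00AD and checks each against the Char production of XML 1.0 (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The helper names is_xml10_char and describe are hypothetical, chosen only for this illustration, and the sketch says nothing about how any particular Word version behaves.

```python
import unicodedata


def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def describe(ch: str) -> dict:
    """Collect a few basic facts about a single character (hypothetical helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Control characters have no formal Unicode name, so supply a default.
        "name": unicodedata.name(ch, "<unnamed control>"),
        "category": unicodedata.category(ch),
        "xml10_char": is_xml10_char(ch),
    }


if __name__ == "__main__":
    for ch in ("\u001F", "\u00AD"):
        print(describe(ch))
```

Under these assumptions, U+00AD comes out as SOFT HYPHEN, a format character (category Cf) that XML 1.0 allows, while U+001F is a control character (category Cc) that falls outside the XML 1.0 Char production.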

