Re: SHY, CGJ, etc. (was: unicode Digest V12 #108)

2011-07-08 Thread Philippe Verdy
After more tests, it seems that Word effectively changes a SOFT HYPHEN (U+00AD) on input into control US (U+001F), which it uses not as a regular soft hyphen but as an optional hyphen. This is then changed back to a regular soft hyphen in the clipboard when copying it there in a rich text format

Re: unicode Digest V12 #108

2011-07-08 Thread Philippe Verdy
2011/7/6 Asmus Freytag asm...@ix.netcom.com: On 7/3/2011 6:31 AM, Philippe Verdy wrote: Regarding the previous comment about the Danish aa, Sorry, most of that discussion missed the mark. Modern Danish can have AA for two reasons. Accidental occurrence, as in dataanalyse which is composed

Re: unicode Digest V12 #108

2011-07-06 Thread Asmus Freytag
On 7/3/2011 6:31 AM, Philippe Verdy wrote: Regarding the previous comment about the Danish aa, Sorry, most of that discussion missed the mark. Modern Danish can have AA for two reasons. Accidental occurrence, as in dataanalyse which is composed of two words which just happens to put two A

Re: unicode Digest V12 #108

2011-07-06 Thread Jukka K. Korpela
2011-07-06 9:25, Asmus Freytag wrote: Because accidental digraphs (in Danish) happen at word boundaries in a compound, the SHY is an elegant way to mark them. It may often be a practical trick, given the current repertoire of characters in Unicode and the way they are handled in different

Re: unicode Digest V12 #108

2011-07-06 Thread Asmus Freytag
On 7/6/2011 12:16 AM, Jukka K. Korpela wrote: Allowing word division just to say that some characters do not constitute a digraph (or trigraph…) is not practical e.g. when the text has otherwise no word divisions, for one reason or another, or when the particular word division point is

Re: unicode Digest V12 #108

2011-07-06 Thread Ken Whistler
On 7/6/2011 11:18 AM, Asmus Freytag wrote: The Danes, over a decade ago, when they made the official recommendation to use SHY appear to have come to the conclusion that AA can never occur accidentally, except at word division in compounds. Not really a safe conclusion. :)

SHY, CGJ, etc. (was: unicode Digest V12 #108)

2011-07-04 Thread Andreas Prilop
On Sun, 3 Jul 2011, Jukka K. Korpela wrote: You're wrong, it DOES. I just tested it (in Microsoft Word 2010 for Windows 7) within a random long word (aa) and the SHY is recognized to generate the intended hyphenation break. That’s good news, if your analysis is correct, but the

Re: SHY, CGJ, etc. (was: unicode Digest V12 #108)

2011-07-04 Thread Philippe Verdy
2011/7/4 Andreas Prilop prilop4...@trashmail.net: On Sun, 3 Jul 2011, Jukka K. Korpela wrote: You're wrong, it DOES. I just tested it (in Microsoft Word 2010 for Windows 7) within a random long word (aa) and the SHY is recognized to generate the intended hyphenation break.

Re: unicode Digest V12 #108

2011-07-03 Thread André Szabolcs Szelp
On Sat, Jul 2, 2011 at 20:56, Jukka K. Korpela jkorp...@cs.tut.fi wrote: I may have missed some parts of the discussion, but I don’t see why you couldn’t just use the zero-width non-joiner. Using it may cause risks of its own, but at least you would be dealing with risks related to the

Re: unicode Digest V12 #108

2011-07-03 Thread Leo Broukhis
2011/7/3 André Szabolcs Szelp a.sz.sz...@gmail.com: I would also think that ZWNJ is safer and more appropriate than CGJ. Szabolcs How would it help? If I don't like typographic ligatures in principle, I would be within my right to put ZWNJs between every pair of letters and it must have

Re: unicode Digest V12 #108

2011-07-03 Thread Philippe Verdy
2011/7/2 Jukka K. Korpela jkorp...@cs.tut.fi: And there is really no guarantee that programs support the soft hyphen. For one, Microsoft Word doesn’t—it treats it as just another printable character. You're wrong, it DOES. I just tested it (in Microsoft Word 2010 for Windows 7) within a random

Re: unicode Digest V12 #108

2011-07-03 Thread Jukka K. Korpela
Philippe Verdy wrote: 2011/7/2 Jukka K. Korpela jkorp...@cs.tut.fi: And there is really no guarantee that programs support the soft hyphen. For one, Microsoft Word doesn’t—it treats it as just another printable character. You're wrong, it DOES. I just tested it (in Microsoft Word 2010 for

Re: unicode Digest V12 #108

2011-07-02 Thread Andrew Miller
From: Philippe Verdy verd...@wanadoo.fr Date: Sat, 2 Jul 2011 15:59:18 +0200 Subject: Re: ch ligature in a monospace font 2011/7/1 Richard Wordingham richard.wording...@ntlworld.com: I wonder if anyone has some statistics on the use of CGJ.  Its revised intended use was to disrupt

Re: unicode Digest V12 #108

2011-07-02 Thread Philippe Verdy
2011/7/2 Andrew Miller a.j.mil...@bcs.org.uk: The ng in Llangollen is not the digram ng but two separate letters (unlike the ll in the name which is the digram). Why not simply use a soft hyphen between n and g in this case? Soft hyphens are normally recognized as such by smart correctors
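
A minimal Python sketch of the soft-hyphen suggestion in the excerpt above, assuming the text is handled as ordinary Unicode strings. The helper name `strip_shy` is hypothetical and used only for illustration; whether a given spell checker, search engine, or collator actually ignores U+00AD is not shown here.

```python
# SOFT HYPHEN (U+00AD), written as an escape so it stays visible in source.
SHY = "\u00ad"

# Place the soft hyphen between "n" and "g", as proposed for "Llangollen".
marked = "Llan" + SHY + "gollen"


def strip_shy(text: str) -> str:
    """Remove every soft hyphen (hypothetical helper for search/comparison)."""
    return text.replace(SHY, "")


# In the stored string, "n" and "g" are no longer adjacent code points.
has_adjacent_ng = "ng" in marked

# Stripping the soft hyphen recovers the plain spelling.
plain = strip_shy(marked)
```

Software that compares or indexes such text would need a step like `strip_shy` (or an equivalent collation rule) for the marked and unmarked spellings to match.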

Re: unicode Digest V12 #108

2011-07-02 Thread Asmus Freytag
On 7/2/2011 8:59 AM, Philippe Verdy wrote: 2011/7/2 Andrew Miller a.j.mil...@bcs.org.uk: The ng in Llangollen is not the digram ng but two separate letters (unlike the ll in the name which is the digram). Why not simply use a soft hyphen between n and g in this case? Soft hyphens are

Re: unicode Digest V12 #108

2011-07-02 Thread Jukka K. Korpela
Asmus Freytag wrote: On 7/2/2011 8:59 AM, Philippe Verdy wrote: [...] Why not simply use a soft hyphen between n and g in this case? Soft hyphens are normally recognized as such by smart correctors and as well by search engines or collators. It seems enough for me to indicate that this is

SHY, CGJ, etc. (was: Re: unicode Digest V12 #108)

2011-07-02 Thread doug
...@bcs.org.uk; unicode@unicode.org Subject: Re: unicode Digest V12 #108 On 7/2/2011 8:59 AM, Philippe Verdy wrote: 2011/7/2 Andrew Miller a.j.mil...@bcs.org.uk: The ng in Llangollen is not the digram ng but two separate letters (unlike the ll in the name which is the digram). Why not simply use