I'm a bit concerned about the implication that correctly encoded Unicode text 
in Breton, Welsh, etc. needs to be sprinkled liberally with SHY, CGJ, or other 
invisible formatting characters to resolve every possible ambiguity in these 
languages' orthographies.  This is like saying English text needs a SHY at 
every potential hyphenation point so that text processors don't have to use a 
dictionary to hyphenate.

I can easily see this thread being misinterpreted or taken out of context by 
newcomers, or reposted or blogged by someone eager to make a point about 
unneeded complexity in Unicode.  Really, for 99.9% of applications, shouldn't 
we just write the letters?

--Doug

-----Original Message-----
From: Asmus Freytag <asm...@ix.netcom.com>
Sender: unicode-bou...@unicode.org
Date: Sat, 02 Jul 2011 10:02:03 
To: <verd...@wanadoo.fr>
Cc: Andrew Miller<a.j.mil...@bcs.org.uk>; <unicode@unicode.org>
Subject: Re: unicode Digest V12 #108

On 7/2/2011 8:59 AM, Philippe Verdy wrote:
> 2011/7/2 Andrew Miller<a.j.mil...@bcs.org.uk>:
>> The "ng" in Llangollen is not the digram "ng" but two separate letters
>> (unlike the "ll" in the name which is the digram).
> Why not simply use a soft hyphen between "n" and "g" in this case?
> Soft hyphens are normally recognized as such by smart correctors and
> as well by search engines or collators. It seems enough to me to
> indicate that this is not the Welsh digram "ng"; CGJ anyway is
> certainly not the correct disjoiner in your case.
>
>
This solution works well if the word can split between the n and the g.

In fact, if such a split is possible, I would call it the preferred 
solution for indicating an "accidental" digraph.

An example:

The Danish digraph "aa", normally spelled "å" in modern orthography but 
retained in names etc., can occur "accidentally" in compound nouns such 
as "dataanalyse". Adding a SHY between the two letters is the preferred 
method to indicate that the "aa" is accidental.
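
To make the "dataanalyse" case concrete, here is a minimal Python sketch. 
The helper names (mark_accidental_digraph, strip_shy, has_digraph_aa) are 
hypothetical, and the plain substring test is only an assumed stand-in for 
whatever a real collator or search engine does; it is not meant to describe 
any actual implementation.

```python
SHY = "\u00AD"  # U+00AD SOFT HYPHEN

def mark_accidental_digraph(word: str, index: int) -> str:
    """Insert a SHY before position `index` (hypothetical helper)."""
    return word[:index] + SHY + word[index:]

def strip_shy(text: str) -> str:
    """Remove every SHY, recovering the visible letters only."""
    return text.replace(SHY, "")

def has_digraph_aa(text: str) -> bool:
    """Naive check (an assumption): adjacent 'a' + 'a' counts as the digraph."""
    return "aa" in text.casefold()

plain = "dataanalyse"
marked = mark_accidental_digraph(plain, 4)  # split as data + analyse

# The unmarked compound looks as if it contains the digraph;
# the marked one does not, yet stripping SHY gives back the same letters.
print(has_digraph_aa(plain))       # True
print(has_digraph_aa(marked))      # False
print(strip_shy(marked) == plain)  # True
```

A name such as "Aabenraa" carries no SHY, so the same naive check still 
treats its "aa" as the digraph.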

Other characters may have the same effect of breaking the digraph, but 
their use might require an *additional* SHY to be inserted if and when a 
line-break opportunity needs to be marked manually (say, for an unusual 
compound not recognized by the automatic hyphenator). It would be bad to 
need *two* invisible characters at that location.


