Arthur Reutenauer wrote:
> there is a concept in Unicode, that of grapheme cluster
> (http://unicode.org/glossary/#grapheme_cluster)
I am intrigued to know why Unicode defines a grapheme cluster in terms
of a /horizontally/ segmentable unit of text:
> A grapheme cluster represents a
> So let's take an example: let's say I want to consider œ as just one
> unit in order to simplify the problem. What I want in the general case
> is left/righthyphenmin=2, so what I want to get is
>
> œ́di-pus
> œdi-pus
>
> and not
>
> œ́-di-pus
>
> How can I achieve that? These
On 25/05/2016 21:09, Arthur Reutenauer wrote:
>> I think the main point is that it is not treated the same way as œ. For
>> instance with right/lefthyphenmin = 2, we have
>>
>> œ́-di-pus
>> œdi-pus
>
> If that’s what the patterns do, they should be fixed,
To be clear: the patterns produce
> I think the main point is that it is not treated the same way as œ. For
> instance with right/lefthyphenmin = 2, we have
>
> œ́-di-pus
> œdi-pus
If that’s what the patterns do, they should be fixed, and \lefthyphenmin and
\righthyphenmin set to something more useful. That’s what we have for
> The sequence <œ, combining acute> is seen as weird because it’s the
> first one in a Latin-script language that cannot be input as a single
> Unicode character.
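
The claim quoted above can be checked with a minimal sketch, assuming Python's standard unicodedata module (the helper name nfc_length is only illustrative): under NFC, e + combining acute composes to the single character é, while œ + combining acute stays two code points because Unicode has no precomposed form for it.

```python
import unicodedata

def nfc_length(text):
    """Return the number of code points in text after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))

e_acute = "e\u0301"        # e + COMBINING ACUTE ACCENT
oe_acute = "\u0153\u0301"  # œ + COMBINING ACUTE ACCENT

print(nfc_length(e_acute))   # 1: composes to U+00E9
print(nfc_length(oe_acute))  # 2: no precomposed character exists
```

So normalization alone cannot turn the accented œ into one character; anything that counts characters has to deal with the combining mark itself.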
I think the main point is that it is not treated the same way as œ. For
instance with right/lefthyphenmin = 2, we have
œ́-di-pus
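
To illustrate the difference being discussed, here is a rough sketch in Python. It is not TeX's algorithm: the candidate break positions are hypothetical input standing in for what the patterns would propose, the function names are made up for the example, and the cluster splitting is a simplification (base character plus following combining marks), not full Unicode text segmentation. It only shows how the minima behave when counted in code points versus in clusters.

```python
import unicodedata

def clusters(word):
    """Simplified grapheme clusters: a base character plus any combining marks after it."""
    out = []
    for ch in word:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the combining mark to the previous base
        else:
            out.append(ch)
    return out

def hyphenate(word, breaks, leftmin=2, rightmin=2, by_cluster=True):
    """Keep the candidate breaks (code point offsets, hypothetical pattern output)
    that leave at least leftmin units before and rightmin units after."""
    units = clusters(word) if by_cluster else list(word)
    n = len(units)
    allowed = set()
    pos = 0
    for i, unit in enumerate(units[:-1], start=1):
        pos += len(unit)  # code point offset after the i-th unit
        if i >= leftmin and n - i >= rightmin:
            allowed.add(pos)
    parts, prev = [], 0
    for b in sorted(breaks):
        if b in allowed:
            parts.append(word[prev:b])
            prev = b
    parts.append(word[prev:])
    return "-".join(parts)

word = "\u0153\u0301dipus"  # œ + combining acute + "dipus"
print(hyphenate(word, [2, 4], by_cluster=False))  # œ́-di-pus
print(hyphenate(word, [2, 4], by_cluster=True))   # œ́di-pus
print(hyphenate("\u0153dipus", [1, 3]))           # œdi-pus
```

Counting code points lets the break after the accented œ through, because the combining mark is counted as a second character; counting clusters treats it like plain œ.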
> Except the huge amount of time one would have to spend on that... We
> have 26000 hyphenated words, but that's really not much, especially for
> Latin, and I don't think it's enough to use patgen...
What do you mean?
Arthur
> It’s one grapheme; if you want to treat it as two characters, for
> example in order to allow a break after œ and œ́ at the start of a
> word, set \lefthyphenmin to 1 and write the patterns in such a way
> that they prohibit a break after any other single-character word
> start. There’s nothing
> I agree that "œ+combining acute" should be treated the same as œ, but
> should it count as 1 or two characters (o+e) for the
> right/lefthyphenmin? I'm not sure about that, but maybe some would
> treat it as two characters
It’s one grapheme; if you want to treat it as two characters, for
On 5/25/2016 10:00 AM, Mojca Miklavec wrote:
Dear Eric,
On 20 May 2016 at 18:21, Muller, Eric wrote:
A few questions:
1. the hyphenation patterns are meant to work on text that has been
"normalized" in some way;
In the early days of TeX it was sufficient if it worked with 8-bit
fonts and