Re: [tex-hyphen] String preparation

2016-05-25 Thread Philip Taylor
Arthur Reutenauer wrote: > there is a concept in Unicode, that of grapheme cluster > (http://unicode.org/glossary/#grapheme_cluster) I am intrigued to know why Unicode defines a grapheme cluster in terms of a /horizontally/ segmentable unit of text : > A grapheme cluster represents a

Re: [tex-hyphen] String preparation

2016-05-25 Thread Arthur Reutenauer
> So let's take an example : let's say I want to consider œ as just one > unit in order to simplify the problem. What I want in the general case > is left/righthyphenmin=2, so what I want to get is > > œ́di-pus > œdi-pus > > and not > > œ́-di-pus > > How can I achieve that? These
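
The behaviour asked for in the quoted question (œ́ counted as one unit, so that left/righthyphenmin=2 gives œ́di-pus rather than œ́-di-pus) can be illustrated outside TeX. The following is only a sketch, not how TeX or any hyphenation library works: the helper names `clusters` and `filter_breaks` are hypothetical, the candidate break offsets are supplied by hand rather than computed from patterns, and "base character plus following combining marks" is a simplification of Unicode grapheme clusters.

```python
import unicodedata

def clusters(word):
    """Group each base character with the combining marks that follow it.

    Simplified stand-in for grapheme clusters (assumption: not full UAX #29).
    """
    out = []
    for ch in word:
        if out and unicodedata.combining(ch):
            out[-1] += ch          # attach the mark to the preceding base
        else:
            out.append(ch)         # start a new unit
    return out

def filter_breaks(word, breaks, left_min=2, right_min=2):
    """Keep break offsets (in code points) that leave at least left_min
    units before the break and right_min units after it."""
    units = clusters(word)
    # Map each code point offset on a unit boundary to the unit count before it.
    units_before = {}
    pos = 0
    for i, unit in enumerate(units):
        units_before[pos] = i
        pos += len(unit)
    units_before[pos] = len(units)
    return [
        b for b in breaks
        if b in units_before
        and left_min <= units_before[b] <= len(units) - right_min
    ]

# Hand-written candidate breaks: after the first vowel and after "di".
with_accent = "\u0153\u0301dipus"   # œ + combining acute + "dipus"
plain = "\u0153dipus"               # œ + "dipus"
accent_breaks = filter_breaks(with_accent, [2, 4])
plain_breaks = filter_breaks(plain, [1, 3])
```

Counted this way, both words keep only the break after "di". Counting code points instead would accept offset 2 in the accented word, since two code points precede it, which corresponds to the unwanted œ́-di-pus.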

Re: [tex-hyphen] String preparation

2016-05-25 Thread Élie Roux
On 25/05/2016 21:09, Arthur Reutenauer wrote: >> I think the main point is that it is not treated the same way as œ. For >> instance with right/lefthyphenmin = 2, we have >> >> œ́-di-pus >> œdi-pus > > If that’s what the patterns do, they should be fixed, To be clear: the patterns produce

Re: [tex-hyphen] String preparation

2016-05-25 Thread Arthur Reutenauer
> I think the main point is that it is not treated the same way as œ. For > instance with right/lefthyphenmin = 2, we have > > œ́-di-pus > œdi-pus If that’s what the patterns do, they should be fixed, and \left and \righthyphenmin set to something more useful. That’s what we have for

Re: [tex-hyphen] String preparation

2016-05-25 Thread Élie Roux
> The sequence <œ, combining acute> is seen as weird because it’s the > first one in a Latin-script language that cannot be input as a single > Unicode character. I think the main point is that it is not treated the same way as œ. For instance with right/lefthyphenmin = 2, we have œ́-di-pus
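
The quoted remark about <œ, combining acute> can be checked with Unicode normalization. A minimal sketch using Python's standard `unicodedata` module follows; the helper name `nfc_length` is introduced here purely for illustration. NFC composes e + combining acute into the single character é, but leaves œ + combining acute as two code points because no precomposed character exists for it.

```python
import unicodedata

E_ACUTE = "e\u0301"        # e + combining acute accent
OE_ACUTE = "\u0153\u0301"  # œ + combining acute accent

def nfc_length(s):
    """Number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", s))

e_len = nfc_length(E_ACUTE)    # composes to U+00E9
oe_len = nfc_length(OE_ACUTE)  # stays as two code points
```

So a rule that counts code points, even on NFC-normalized input, sees é as one character and œ́ as two.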

Re: [tex-hyphen] String preparation

2016-05-25 Thread Arthur Reutenauer
> Except the huge amount of time one would have to spend on that... We > have 26000 hyphenated words, but that's really not much, especially for > Latin, and I don't think it's enough to use patgen... What do you mean? Arthur

Re: [tex-hyphen] String preparation

2016-05-25 Thread Élie Roux
> It’s one grapheme; if you want to treat it as two characters, for > example in order to allow a break after œ and œ́ at the start of a > word, set \lefthyphenmin to 1 and write the patterns in such a way > that they prohibit a break after any other single-character word > start. There’s nothing

Re: [tex-hyphen] String preparation

2016-05-25 Thread Arthur Reutenauer
> I agree that "œ+combining acute" should be treated the same as œ, but > should it count as 1 or two characters (o+e) for the > right/left/hyphenmin? I'm not sure about that, but maybe some would > treat it as two characters It’s one grapheme; if you want to treat it as two characters, for

Re: [tex-hyphen] String preparation

2016-05-25 Thread Hans Hagen
On 5/25/2016 10:00 AM, Mojca Miklavec wrote: Dear Eric, On 20 May 2016 at 18:21, Muller, Eric wrote: A few questions: 1. the hyphenation patterns are meant to work on text that has been "normalized" in some way; In the early days of TeX it was sufficient if it worked with 8-bit fonts and