Here are some ideas about how to implement the needed features in the easiest possible way.
We need exactly two kinds of potential hyphenation points: one where a ZWJ has to be inserted at the beginning of the next line, and one where a ZWNJ can be inserted there instead.

We should avoid changing the way patterns are stored in tables; otherwise, too many changes would be needed. Ideally, the solution should be a generalization of the existing format.

Looking at §921 of TeX: The Program, I see that hyf_num is an array of small_number, and according to §101, a small_number is a number in the range 0..63. Since pattern levels are single digits (0 to 9), the values near 63 are never used, so I suggest treating 63, 62, 61, etc. as -1, -2, -3, etc. In the hyphenation patterns we will need an additional notation to distinguish "positive" from "negative" patterns (for example, a * in front of the number): "positive" values will be stored as usual, and "negative" ones will be stored as the high values 63, 62, etc.

We need the new primitives \neghyphenchar, \prehyphenchar, \posthyphenchar, \preneghyphenchar and \postneghyphenchar, where the \post* ones give the character nodes inserted at the beginning of the next line.

Then we change the code of §923: it will first look for ordinary hyphen locations and then for "negative" ones. Both are handled in the same way (with the high values 63, 62, ... replaced by 1, 2, ...), but \prehyphenchar, \hyphenchar and \posthyphenchar are applied in the first case, and \preneghyphenchar, \neghyphenchar and \postneghyphenchar in the second. The rest is unchanged.

I think this solves the Uyghur issue with a minimal number of changes.

> On 27 Feb 2021 at 00:33, Yannis Haralambous wrote:
>
> Oops, you’re right. We do need two kinds of hyphenchars, with and without
> ^^^^200d.
>
> Sent from my iPhone
>
>> On 26 Feb 2021 at 23:52, Jonathan Kew wrote:
>>
>> On 26/02/2021 22:44, Yannis Haralambous wrote:
>>>>> On 26 Feb 2021 at 23:37, Jonathan Kew wrote:
>>>>
>>>> On 26/02/2021 22:00, Yannis Haralambous wrote:
>>>>> Dear TeX-hyphen members,
>>>>> I'm new to this list (although not necessarily new to TeX hyphenation :-)
>>>>> Here is the problem: we are preparing hyphenation patterns for Uyghur,
>>>>> written in Arabic script.
>>>>> As letters must be in initial/medial form before the hyphen and in
>>>>> medial/final form at the beginning of the next line,
>>>>> I was wondering if we could change TeX internals so that instead of one,
>>>>> three hyphenchars are used:
>>>>> ^^^^200d and `-' on the upper line and ^^^^200d on the lower line, in
>>>>> order to obtain the equivalent
>>>>> of \discretionary{^^^^200d-}{^^^^200d}{}
>>> Hi Jonathan,
>>>> The problem with this is that it wouldn't be the appropriate
>>>> \discretionary in the case where the letter before the hyphenation
>>>> position is a right- (rather than dual-) joining character.
>>> Sorry, I don't understand what you mean. You mean when it is a biform
>>> character like the waw or the ra? In that case the ZWJ will do no harm. It
>>> is an invisible character that does not affect glyphs of biform characters.
>>
>> Yes, the ZWJ before the hyphen on the first line would be harmless. But the
>> ZWJ after the break (at the beginning of the second line) will cause the
>> following character to take on a medial or final form, whereas it should
>> remain initial or medial when it's after alef/dal/re/waw.
>>
>> JK

<http://www.imt-atlantique.fr/>
Yannis HARALAMBOUS
Professor
Computer Science Department
UMR CNRS 6285 Lab-STICC
<http://perso.telecom-bretagne.eu/yannisharalambous/>
<https://twitter.com/y_haralambous>
<https://www.linkedin.com/in/yannis-haralambous-5529073?trk=hp-identity-name>
Technopôle Brest-Iroise
CS 83818
29238 Brest Cedex 3, France
A school of IMT <http://www.imt.fr/>

The history of linguistics is largely a history of misreadings, of failed communication between authors and readers, exacerbated by the illusion that communication has successfully occurred. (John E. Joseph)
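To make the proposal at the top of this message more concrete, here is a minimal Python sketch (not TeX code) of the value encoding and the two-pass scan. It assumes pattern levels 0..9, stores a "negative" level k as 64 - k (so -1 becomes 63), and starts from an already computed hyf array. The function names, the HyphenChars record and the ZWJ/ZWNJ assignment (ordinary breaks get ZWJ on the next line, "negative" breaks get ZWNJ) are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# small_number in TeX is 0..63 (§101); pattern levels are single digits 0..9.
SMALL_NUMBER_MAX = 63
MAX_LEVEL = 9
NEG_BASE = SMALL_NUMBER_MAX + 1  # a "negative" level k is stored as 64 - k


def encode_level(level: int, negative: bool) -> int:
    """Store a pattern level as proposed: negative 1 -> 63, negative 2 -> 62, ..."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError(f"pattern level out of range: {level}")
    if negative and level > 0:
        return NEG_BASE - level
    return level  # ordinary levels, and a "negative" 0, are stored unchanged


def decode_level(value: int) -> Tuple[str, int]:
    """Map a stored value to ("pos", k) or ("neg", k); e.g. 63 -> ("neg", 1)."""
    if 0 <= value <= MAX_LEVEL:
        return ("pos", value)
    if NEG_BASE - MAX_LEVEL <= value <= SMALL_NUMBER_MAX:
        return ("neg", NEG_BASE - value)
    raise ValueError(f"unexpected stored value: {value}")


@dataclass(frozen=True)
class HyphenChars:
    """Hypothetical character settings for one kind of break."""
    pre: int     # like \prehyphenchar / \preneghyphenchar
    hyphen: int  # like \hyphenchar / \neghyphenchar
    post: int    # like \posthyphenchar / \postneghyphenchar (next line)


# Assumed Uyghur settings: ZWJ after ordinary breaks, ZWNJ after "negative" ones.
ORDINARY = HyphenChars(pre=0x200D, hyphen=ord("-"), post=0x200D)
NEGATIVE = HyphenChars(pre=0x200D, hyphen=ord("-"), post=0x200C)


def find_breaks(hyf: List[int]) -> Dict[int, str]:
    """First pass: ordinary locations; second pass: "negative" ones.
    In both passes an odd decoded level allows a break, as in TeX."""
    breaks: Dict[int, str] = {}
    for wanted in ("pos", "neg"):
        for j, value in enumerate(hyf):
            kind, level = decode_level(value)
            if kind == wanted and level % 2 == 1:
                breaks[j] = kind
    return breaks


def discretionary(kind: str) -> Tuple[str, str]:
    """Return (pre-break, post-break) text, like \\discretionary{pre}{post}{}."""
    chars = ORDINARY if kind == "pos" else NEGATIVE
    return chr(chars.pre) + chr(chars.hyphen), chr(chars.post)
```

For instance, find_breaks([0, 1, 63, 2, 62, 3]) reports ordinary breaks at indices 1 and 5 and a "negative" break at index 2 (63 decodes to negative level 1, which is odd), while 62 (negative level 2) is even and gives no break. The sketch does not settle how §924, which keeps the maximum hyf_num value at each position, should combine ordinary and "negative" levels at the same position: with the raw encoding, any "negative" value would win over every ordinary level.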
