On Wed, May 21, 2008 at 6:06 PM, Filippou, Dimitrios (RTIT) wrote:
>
> Now, I have almost completed the work on UTF-8 patterns for ancient Greek
> (see also attached), but I have come across a small problem and I need your
> help: I noticed that words ending with a consonant (e.g., λ) followed by a
> right guillemet (») get hyphenated before the consonant, which is wrong,
> despite the patterns that prohibit hyphenation before final consonants
> (e.g., 4λ.). To work around the problem I added extra prohibitive patterns
> with the right guillemet after the final consonant (e.g., 4λ».). But I think
> that the real problem is just the \lccode of the character ». What do you
> think?
>
> As soon as I fix that problem with the right guillemets, I will release the
> revised patterns on CTAN.
>
> Many thanks for your interest in my patterns and your help in debugging them.

Hello Dimitrios,

Could you please check whether the attached patch (by Jonathan) fixes the
faulty hyphenation you saw in XeTeX and, if it does, remove the
guillemets from the patterns again?
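
One quick way to see whether the patch took effect is to print the
character codes of » from a format rebuilt with the patched xelatex.ini.
The following is only a diagnostic sketch: the file name check.tex is a
placeholder, and it assumes the suspicion above is right, namely that a
non-zero \lccode makes TeX treat » as part of the word, so that the final
consonant is no longer final for a pattern such as 4λ.

```tex
% check.tex (hypothetical file name); run with: xelatex check.tex
\documentclass{article}
\begin{document}
% U+00BB is the right guillemet. The values are written to the terminal
% and to the log file.
\typeout{lccode of U+00BB:  \the\lccode"00BB}
\typeout{catcode of U+00BB: \the\catcode"00BB}
Some text, so that a page is produced.
\end{document}
```

If the reset loop in the patch does what its comments say, the \lccode
reported here should be 0, because U+00BB does not appear among the \L and
\l lines that follow the loop.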

You don't need to release them under a new version. The patterns that
will be included in TeX Live 2008 will come from here:
http://www.tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns
and we will submit the files to CTAN and to TeX Live.

It would be best if you could simply prepare versions of the patterns
called hyph-el-monoton.tex, hyph-el-polyton.tex and hyph-grc.tex, with
the following constraints:
- no TeX macros in the patterns except for \patterns and \hyphenation; in
  particular: no \message, no \lccode, no \begingroup, no \endinput
- pure UTF-8
- if you need special characters, let us know (they will be set up in
  loadhyph-el-*.tex)
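
To illustrate the split described in the list above, here is a rough
sketch of the two kinds of file. The names loadhyph-xx.tex and hyph-xx.tex
are placeholders, the choice of U+1FBD is only an example, and the real
loaders may well look different. Both files are meant to be read while a
format is being generated, since \patterns is not accepted afterwards.

```tex
% ---- loadhyph-xx.tex (hypothetical loader: all macros live here) ----
\begingroup
% Assumption: a special character must count as a letter while the
% patterns are read, so it gets a non-zero \lccode in the loader
% instead of in the pattern file.
\lccode"1FBD="1FBD % GREEK KORONIS
\input hyph-xx.tex % hypothetical pattern file, shown below
\endgroup

% ---- hyph-xx.tex (hypothetical pattern file) ----
% Pure UTF-8; nothing but \patterns and, optionally, \hyphenation.
\patterns{
4λ.
}
```

Pattern data is stored globally, so the group around \input only undoes
the temporary \lccode assignment and leaves the loaded patterns in place.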

I assume that it should be rather easy to fix a few lines in your Perl code.

These UTF-8 patterns will then be considered upstream, and I will try
to convince the authors of other patterns to switch to the new loading
mechanism (i.e., to update these new patterns).

Of course we can do the "conversion" as well, but it makes more sense
if you as the author can fix the patterns whenever you need to;
otherwise the changes may go unnoticed by us and it would mean
unnecessary double work.

You may ask Karl for an SVN account so that you can access and modify
your files in the repository if you want. I have put a copy of your
script in the repository as well, but it would be really really really
great if you could take care of all the tools you need to generate the
patterns in this repository.

Thanks a lot,
Mojca

PS: here are some radical comments by Hans. Please defend
yourself/explain when these are needed :) :) :)

>> MtxRun | checking language ??, file hyph-grc.tex
>> MtxRun | invalid character (0x0009) in patterns of language ??,
>> file hyph-grc.tex, n=17
>> MtxRun | invalid character » (0x00BB) in patterns of language ??, file
>> hyph-grc.tex, n=19
>
> rightguillemot ... get rid of it
>
>> MtxRun | invalid character ᾿ (0x1FBF) in patterns of language ??, file
>> hyph-grc.tex, n=78
>
>> MtxRun | invalid character ᾽ (0x1FBD) in patterns of language ??, file
>> hyph-grc.tex, n=78
>
> GREEK PSILI, GREEK KORONIS (category Sk)
>
> maybe ask Thomas S if that stuff makes sense
>
>> MtxRun | invalid character ʼ (0x02BC) in patterns of language ??, file
>> hyph-grc.tex, n=78
>
> MODIFIER LETTER APOSTROPHE
>
>> MtxRun | invalid character ' (0x0027) in patterns of language ??, file
>> hyph-grc.tex, n=78
>
> APOSTROPHE
>
>> MtxRun | invalid character ’ (0x2019) in patterns of language ??, file
>> hyph-grc.tex, n=78
>> MtxRun | there are errors that need to be fixed
>
> RIGHT SINGLE QUOTATION MARK
>
> maybe that's all related to some funny input encoding
>
>> Arthur, same question about 0x0009.
>> » needs to be removed from patterns (Jonathan will hopefully fix the ini
>> file).
>> I guess that the other 5 characters are needed. lccodes are set in
>> loadhyph for those five characters.
>
> i think they should all go away
>
>> MtxRun | checking language ??, file hyph-cop.tex
>> MtxRun | invalid character ̀ (0x0300) in patterns of language ??, file
>> hyph-cop.tex, n=72
>> MtxRun | invalid character ̈ (0x0308) in patterns of language ??, file
>> hyph-cop.tex, n=5
>> MtxRun | there are errors that need to be fixed
>
> COMBINING GRAVE ACCENT
> COMBINING DIAERESIS
>
> kick out those lines
>
>> No idea. Done by Jonathan. Germans say: "Ich verstehe Bahnhof." (or:
>> "It's all Greek to me.") But these are combining characters as far as
>> I can see.
>>
>>
>> There are some other characters, mostly apostrophes. I suspect (don't
>> know, only suspect) that the functionality of patterns in such cases
>> changes if apostrophe is remapped to "single right quotation
>> character" with mapping=tex-text, and I suspect that people might be
>> using different characters in composed-words, but I might be wrong.
>
> wipe'm out ... either make patterns really clever (so, all kind of
> combinations) or not .. probably much of this dates from the 8 bit times

Index: /Volumes/Nenya/texlive/Master/texmf-dist/tex/latex/latexconfig/xelatex.ini
===================================================================
--- /Volumes/Nenya/texlive/Master/texmf-dist/tex/latex/latexconfig/xelatex.ini	(revision 8778)
+++ /Volumes/Nenya/texlive/Master/texmf-dist/tex/latex/latexconfig/xelatex.ini	(working copy)
@@ -1,8 +1,93 @@
% xelatex.ini
% jonathan kew
-% updated: 18 May 2006
+% updated: 16 June 2008
% Public domain
\XeTeXuseglyphmetrics=1
\input unicode-letters
+% disable the \dump in latex.ltx
+\expandafter\let\csname saved-dump-cs\endcsname\dump
+\let\dump=\relax
\input latex.ltx
+% Because latex.ltx sets up character code tables for T1 encoding by default,
+% we need to reset values from unicode-letters that may have been overridden
+\begingroup
[EMAIL PROTECTED] [EMAIL PROTECTED] % reset chars "80-"FF to category "other", no case mapping
+\loop \ifnum\count@<256
+ [EMAIL PROTECTED] [EMAIL PROTECTED]
+ [EMAIL PROTECTED] [EMAIL PROTECTED]
+ \advance\count@ by 1 \repeat
+\def\C #1 #2 #3 {\global\uccode"#1="#2 \global\lccode"#1="#3 } % case mappings (non-letter)
+\def\L #1 #2 #3 {\global\catcode"#1=11 % category: letter
+ \C #1 #2 #3 % with case mappings
+ \ifnum"#1="#3 \else \global\sfcode"#1=999 \fi % uppercase letters have sfcode=999
+ \global\XeTeXmathcode"#1="7"01"#1 % BMP letters default to class 7 (var), fam 1
+ }
+\def\l #1 {\L #1 #1 #1 } % letter without case mappings
+\l 00AA
+\L 00B5 039C 00B5
+\l 00BA
+\L 00C0 00C0 00E0
+\L 00C1 00C1 00E1
+\L 00C2 00C2 00E2
+\L 00C3 00C3 00E3
+\L 00C4 00C4 00E4
+\L 00C5 00C5 00E5
+\L 00C6 00C6 00E6
+\L 00C7 00C7 00E7
+\L 00C8 00C8 00E8
+\L 00C9 00C9 00E9
+\L 00CA 00CA 00EA
+\L 00CB 00CB 00EB
+\L 00CC 00CC 00EC
+\L 00CD 00CD 00ED
+\L 00CE 00CE 00EE
+\L 00CF 00CF 00EF
+\L 00D0 00D0 00F0
+\L 00D1 00D1 00F1
+\L 00D2 00D2 00F2
+\L 00D3 00D3 00F3
+\L 00D4 00D4 00F4
+\L 00D5 00D5 00F5
+\L 00D6 00D6 00F6
+\L 00D8 00D8 00F8
+\L 00D9 00D9 00F9
+\L 00DA 00DA 00FA
+\L 00DB 00DB 00FB
+\L 00DC 00DC 00FC
+\L 00DD 00DD 00FD
+\L 00DE 00DE 00FE
+\l 00DF
+\L 00E0 00C0 00E0
+\L 00E1 00C1 00E1
+\L 00E2 00C2 00E2
+\L 00E3 00C3 00E3
+\L 00E4 00C4 00E4
+\L 00E5 00C5 00E5
+\L 00E6 00C6 00E6
+\L 00E7 00C7 00E7
+\L 00E8 00C8 00E8
+\L 00E9 00C9 00E9
+\L 00EA 00CA 00EA
+\L 00EB 00CB 00EB
+\L 00EC 00CC 00EC
+\L 00ED 00CD 00ED
+\L 00EE 00CE 00EE
+\L 00EF 00CF 00EF
+\L 00F0 00D0 00F0
+\L 00F1 00D1 00F1
+\L 00F2 00D2 00F2
+\L 00F3 00D3 00F3
+\L 00F4 00D4 00F4
+\L 00F5 00D5 00F5
+\L 00F6 00D6 00F6
+\L 00F8 00D8 00F8
+\L 00F9 00D9 00F9
+\L 00FA 00DA 00FA
+\L 00FB 00DB 00FB
+\L 00FC 00DC 00FC
+\L 00FD 00DD 00FD
+\L 00FE 00DE 00FE
+\L 00FF 0178 00FF
+\endgroup
+\expandafter\let\expandafter\dump\csname saved-dump-cs\endcsname
\dump
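
The tokens shown as [EMAIL PROTECTED] in the patch above were mangled by
the list archive, which hides anything containing an @ sign. Judging from
the surviving comments and the loop skeleton, the affected lines probably
looked something like the following. This is a reconstruction, not a copy
of Jonathan's file, and the \sfcode line in particular is an assumption.

```tex
% Reconstruction sketch of the garbled reset loop (not the original text).
% It is meant for the ini file, where the \L and \l lines that follow
% restore the real letters; do not run it in an ordinary document.
\begingroup
  \catcode`\@=11   % allow @ in control sequence names such as \count@
  \count@=128      % start at "80
  \loop \ifnum\count@<256
    \global\uccode\count@=0  \global\lccode\count@=0  % no case mapping
    \global\catcode\count@=12                         % category "other"
    \global\sfcode\count@=1000                        % assumed: default space factor
    \advance\count@ by 1 \repeat
\endgroup
```

With a reset of this kind, » (U+00BB) ends up with \lccode 0 unless a
later line turns it into a letter again, which would explain why the
extra 4λ». patterns are no longer needed once the format is rebuilt.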