On 16 Jun 2008, at 5:02 pm, Apostolos Syropoulos wrote:

  Hello,

   I have checked the proposal, and it is obvious that it depends on
unicode-letters.tex for the correct \catcode, \lccode and \uccode
values. But it seems that this file is buggy.

Not really. It is intended to initialize the TeX code tables according to the Unicode standard, and it does so correctly.

Take, for example, the following entry:

\L 1F24 1F2C 1F24

This means that the \uccode for ETA WITH PSILI AND OXIA is CAPITAL ETA
WITH PSILI AND OXIA, which is absolutely, totally wrong!

But it is defined as such by the standard:

1F24;GREEK SMALL LETTER ETA WITH PSILI AND OXIA;Ll;0;L;1F20 0301;;;;N;;;1F2C;;1F2C

(see http://unicode.org/Public/UNIDATA/UnicodeData.txt)
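To make the link between the two concrete, here is a minimal sketch of how an \L entry could be derived from a UnicodeData.txt record. It assumes the field order "\L <code> <uccode> <lccode>" suggested by the example above, and that an empty mapping field means the character maps to itself; the function name is hypothetical, and this is not the actual script that generates unicode-letters.tex.

```python
def unicode_letters_entry(record: str) -> str:
    """Hypothetical helper: build an \\L line from one UnicodeData.txt record."""
    fields = record.split(";")
    code = fields[0]
    # Field 12 is the simple uppercase mapping, field 13 the simple lowercase
    # mapping; an empty field means the character maps to itself.
    upper = fields[12] or code
    lower = fields[13] or code
    return f"\\L {code} {upper} {lower}"


record = ("1F24;GREEK SMALL LETTER ETA WITH PSILI AND OXIA;Ll;0;L;"
          "1F20 0301;;;;N;;;1F2C;;1F2C")
print(unicode_letters_entry(record))  # prints: \L 1F24 1F2C 1F24
```

The \uccode of 1F2C comes straight from field 12, so the entry faithfully reproduces the standard's simple uppercase mapping.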

In Greek, when one capitalizes letters, they all lose their accents.
The only things that remain are the dieresis (if a letter has both a
dieresis and an accent, the accent simply goes away) and the
YPOGEGRAMMENI, which becomes a PROSGEGRAMMENI. The correct codes can be
found in the attached file.
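The capitalization rule described above can be approximated mechanically. The sketch below is an assumption about how such a table might be computed, not the contents of the attached file: it fully decomposes a lowercase Greek letter, drops every combining mark except the diaeresis (U+0308) and ypogegrammeni (U+0345), uppercases the base letter, and recomposes, so a retained U+0345 yields the precomposed capital WITH PROSGEGRAMMENI. Treating the breathings (psili, dasia) as accents to be dropped is also an assumption, and greek_caps_uccode is a hypothetical name.

```python
import unicodedata

# Combining marks that survive capitalization under the rule quoted above.
KEEP = {"\u0308", "\u0345"}  # COMBINING DIAERESIS, COMBINING GREEK YPOGEGRAMMENI


def greek_caps_uccode(cp: int) -> int:
    """Hypothetical helper: accentless uppercase code point for a lowercase
    Greek letter. Accents and breathings are both dropped (an assumption)."""
    # Fully decompose, e.g. U+1F24 -> U+03B7 U+0313 U+0301.
    decomposed = unicodedata.normalize("NFD", chr(cp))
    base, marks = decomposed[0], decomposed[1:]
    # Keep only the diaeresis and ypogegrammeni, in their canonical order.
    kept = "".join(m for m in marks if m in KEEP)
    # Uppercase the base letter, then recompose; capital + U+0345 composes
    # to the precomposed "WITH PROSGEGRAMMENI" character.
    composed = unicodedata.normalize("NFC", base.upper() + kept)
    # A \uccode must be a single code point; otherwise leave it unchanged.
    return ord(composed) if len(composed) == 1 else cp


for cp in (0x1F24, 0x0390, 0x1FB3):
    print(f"{cp:04X} -> {greek_caps_uccode(cp):04X}")
# 1F24 -> 0397
# 0390 -> 03AA
# 1FB3 -> 1FBC
```

This deliberately departs from the simple uppercase mapping in UnicodeData.txt (1F24 -> 1F2C), which is the crux of the disagreement in this thread.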

While I can understand your concern to improve the behavior for Greek, I'm a bit reluctant to make such changes at this level. My feeling is that the "built-in" behavior in xe(la)tex, provided by the default formats and unicode-letters.tex, should follow the defined international standards as closely as possible, even though this is not always ideal for particular languages.

To customize or override the Unicode definitions for Greek, I would suggest providing a LaTeX package (to make it easy to use) that loads the xgrcodes settings; or perhaps this can be integrated with François's polyglossia. This approach keeps the default as "clean" and standards-based as possible (and simplifies maintenance by keeping your Greek-specific support separate from the files that are auto-generated from the Unicode database), while making it easy for users who want those features to load them in their documents.

Jonathan
