Arthur Reutenauer wrote:
        Mojca,

  I'm still not getting the algorithm you suggest.  In particular:

I need a definition for command
   \def\mycommand#1#2{...}
that I could call as
    % A3 is code of ccaron in EC
    % č is two tokens: ^^c4^^8d
    \mycommand{č}{^^a3}
    % no idea what is Tau in greek encoding (don't care)
    % but it's only a single token
    \mycommand{Τ}{^^ff}

The pseudocode:
  - test if #1 is one or two tokens (use the same trick as Taco suggested)
  - if it's interpreted as two tokens, ignore
  - if it's interpreted as one token (like Tau),
    make that letter \active and define it to generate #2

  That won't be enough.  Because, if I understand Z. R.'s explanations
correctly, you could have the following situation:

  (Assuming pTeX is in EUC-JP mode)

  1. The input is “ši” (U+0161, U+0069).  It's reencoded as 0xB2, 0x69
in the EC font encoding, which is not a valid EUC-JP code, hence the
first byte is interpreted as a character, and so is the second byte.

  2. The input is “šč” (U+0161, U+010D).  It's reencoded as 0xB2, 0xA3
in EC, which *is* a valid EUC-JP code (corresponding to Unicode
character U+6A2A, as it happens), hence that two-byte sequence is
interpreted as a single Japanese character, and the original input is
simply lost.

  I don't see how we could solve the situation by considering each
character individually (like we currently do in UTF-8), given pTeX's
behaviour.
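
  To make the two cases concrete, here is a small Python sketch (not
TeX, and not pTeX's actual reader).  It assumes that the input is
scanned left to right and that any two bytes forming a valid EUC-JP
code are merged into one character; that rule is only my reading of the
explanation above.  The names is_euc_jp_pair and segment are made up
for the illustration, and validity is checked with Python's euc_jp
codec.

```python
from typing import List

# EC (T1) slots taken from the examples above.
EC_SCARON = 0xB2  # š
EC_CCARON = 0xA3  # č


def is_euc_jp_pair(pair: bytes) -> bool:
    """True if the two bytes decode to exactly one EUC-JP character."""
    try:
        return len(pair) == 2 and len(pair.decode("euc_jp")) == 1
    except UnicodeDecodeError:
        return False


def segment(data: bytes) -> List[bytes]:
    """Split a byte string into 'characters' under the assumed rule:
    a valid two-byte EUC-JP code is one unit, anything else is one byte."""
    units = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if is_euc_jp_pair(pair):
            units.append(pair)
            i += 2
        else:
            units.append(data[i:i + 1])
            i += 1
    return units


si = bytes([EC_SCARON, ord("i")])   # “ši” after reencoding to EC
sc = bytes([EC_SCARON, EC_CCARON])  # “šč” after reencoding to EC

print(len(segment(si)), "unit(s) for ši")
print(len(segment(sc)), "unit(s) for šč")
print("šč bytes read as EUC-JP: U+%04X" % ord(sc.decode("euc_jp")))
```

  Under that assumption “ši” stays as two separate bytes, while “šč”
collapses into a single unit before any per-character macro gets a
chance to look at it.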

I also read that explanation (but not very thoroughly). My impression
was: pTeX understands any kind of input as long as it is a valid
Japanese character, and produces random 8-bit stuff otherwise (which
could coincide with an 8-bit font encoding for Western Europe, but only
if you are both careful and lucky).

I cannot imagine how it would be possible to work around those input
restrictions dynamically.

Best wishes,
Taco
