Re: [tex-hyphen] ptex-specific patterns

Arthur Reutenauer Mon, 31 May 2010 08:20:28 -0700

        Mojca,

  I still don't understand the algorithm you suggest.  In particular:


> I need a definition for command
>    \def\mycommand#1#2{...}
> that I could call as
>     % A3 is code of ccaron in EC
>     % č is two tokens: ^^c4^^8d
>     \mycommand{č}{^^a3}
>     % no idea what Tau is in the Greek encoding (don't care)
>     % but it's only a single token
>     \mycommand{Τ}{^^ff}
> 
> The pseudocode:
>   - test if #1 is one or two tokens (use the same trick as Taco suggested)
>   - if it's interpreted as two tokens, ignore
>   - if it's interpreted as one token (like Tau),
>     make that letter \active and define it to generate #2
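  To make step 1 concrete at the byte level, here is a rough Python model of how pTeX in EUC-JP mode might split the UTF-8 bytes of č and Τ into tokens.  It is not TeX and not Taco's trick.  It assumes that pTeX merges any two consecutive bytes in the range 0xA1..0xFE into one Japanese character and ignores EUC-JP's 0x8E/0x8F single-shift prefixes.  The function names are made up for illustration.

```python
# Rough model of pTeX in EUC-JP mode (an assumption, not pTeX's source):
# two consecutive bytes that both lie in 0xA1..0xFE are read as ONE
# Japanese character token; every other byte is a token of its own.

def is_euc_jp_pair(lead: int, trail: int) -> bool:
    """True if the two bytes would form one JIS X 0208 character in EUC-JP."""
    return 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE


def token_count(char: str) -> int:
    """Number of tokens the UTF-8 bytes of `char` would become under the model.

    Only meant for characters whose UTF-8 form is two bytes, like the
    c-caron and Greek Tau examples in the quoted pseudocode.
    """
    raw = char.encode("utf-8")
    if len(raw) == 2 and is_euc_jp_pair(raw[0], raw[1]):
        return 1
    return len(raw)


if __name__ == "__main__":
    for ch in ("\u010d", "\u03a4"):  # c-caron, Greek capital Tau
        print(ch, ch.encode("utf-8").hex(), token_count(ch))
```

  Under this model č stays two tokens (0xC4 could start a pair, but 0x8D is not a valid second byte), while Τ (0xCE 0xA4) collapses into a single token, which is what the pseudocode expects.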

  That won't be enough because, if I understand Z. R.'s explanations
correctly, the following situation can arise:

  (Assuming pTeX is in EUC-JP mode)

  1. The input is “ši” (U+0161, U+0069).  It is re-encoded as 0xB2, 0x69
in the EC font encoding.  That pair is not a valid EUC-JP code, so each
byte is interpreted as a separate character.

  2. The input is “šč” (U+0161, U+010D).  It is re-encoded as 0xB2, 0xA3
in EC, which *is* a valid EUC-JP code (corresponding, as it happens, to
the Unicode character U+6A2A), so that two-character sequence is
interpreted as a single Japanese character, and the original input is
simply lost.

  Given pTeX's behaviour, I don't see how we could solve this by
considering each character individually, as we currently do for UTF-8:
whether a letter survives depends on the letter that follows it.

>>  In this case, you don't get infinite recursion, but the system blows
>> up in your face in some other way I can't remember; I ran into it two
>> years ago when we were converting the patterns, at some early stage.
>> There may be a workaround, though.
> 
> Let's just assume that this won't happen. If it does, we'll deal with it
> later.

  You can't just assume that.  On the contrary, we need to know now
whether it is going to happen, so that we can prevent it.

> Maybe because I have no idea how the Russians use it. It would be fine
> with me to change it, but we need to do that exclusively in
> cooperation with the author.

  And I think we should do that.  Again, it has been our intention for
two years anyway.

> I would worry much more about the craziness of the last-minute addition
> of luatex-specific loading :) :) :)

  I don't see what there is to worry about.  It's a new feature, we don't
have the burden of backward compatibility, and it's LuaTeX, so users
expect it to be experimental.

        Arthur
