Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

Matthias Andree Wed, 04 Jun 2014 12:04:02 -0700

On 04.06.2014 19:48, Arnt Gulbrandsen wrote:
> Compliant SMTP servers only accept mail to/from EAI addresses if the
> SMTP client uses the SMTPUTF8 form of the MAIL FROM command. The SMTP
> client, in turn, only uses that form if the origin too used it.
> 
> The purpose of this feature is to guarantee that EAI messages don't land
> in the mailboxes of incompatible recipients. The relevant effect of this
> feature is that in order to send mail to a unicode address, the _sender_
> must declare that the message uses EAI. Having 8-bit clean relays on the
> way is not enough.
> 
>> Thus an EAI domain name may show up as xn--mumble in HELO commands.
> 
> Yes. I think it's a bad idea to do that. The chance that some SMTP
> server's gethostbyname() will return the UTF8 form and the SMTP server
> then complain about EHLO/PTR mismatch is too great. But it can happen.
> 
>> There will be more. I'll just document them and fix them, so I
>> don't have to spend a lot of time reviewing another version.
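
A minimal Python sketch of the SMTPUTF8 rule Arnt describes, assuming a simplified envelope model: a server rejects EAI (non-ASCII) addresses unless the client declared SMTPUTF8 on MAIL FROM, and a relaying client reuses that form only if the origin used it. `Envelope`, `accept_envelope` and `next_hop_mail_from` are hypothetical names for illustration, not Postfix code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Hypothetical, simplified SMTP envelope."""
    mail_from: str
    rcpt_to: list[str] = field(default_factory=list)
    smtputf8: bool = False  # client sent "MAIL FROM:<...> SMTPUTF8"


def is_eai(address: str) -> bool:
    """Treat an address as EAI if it contains any non-ASCII character."""
    return not address.isascii()


def accept_envelope(env: Envelope) -> bool:
    """Server side: EAI addresses are only acceptable with SMTPUTF8.

    8-bit clean transport alone is not enough; the declaration decides.
    """
    addresses = [env.mail_from, *env.rcpt_to]
    if any(is_eai(a) for a in addresses):
        return env.smtputf8
    return True


def next_hop_mail_from(env: Envelope, next_hop_offers_smtputf8: bool) -> str | None:
    """Client side: use the SMTPUTF8 form only if the origin used it.

    Returns None when the origin declared SMTPUTF8 but the next hop does
    not offer it; bouncing or downgrading is not modelled here.
    """
    if not env.smtputf8:
        return f"MAIL FROM:<{env.mail_from}>"
    if not next_hop_offers_smtputf8:
        return None
    return f"MAIL FROM:<{env.mail_from}> SMTPUTF8"
```

For example, `accept_envelope(Envelope("alice@example.org", ["bøb@example.org"]))` is False: the recipient is non-ASCII and the sender never declared SMTPUTF8.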


I'm late to the game and haven't checked the relevant RFCs or Arnt's patch
in detail, but here are a few thoughts on this (perhaps you can answer
"all dealt with"):

* It reminds me a bit of the 8BITMIME feature that was under discussion in
the late 1990s/early 2000s.  I think The World™ never agreed on how to
deal with all that; the outcome depended on how radically a given piece
of software implemented its policies.  Meaning: do we need this?  Is
Microsoft going to implement it?  What about IBM's Lotus Domino/Notes
suites on the client end?


* My bigger concern is that Unicode opens up ambiguities at various
levels, for instance when doing table lookups (especially for policies,
such as access control):

  + IDN Punycode (xn--blech-rassel) vs. the UTF-8 form, as mentioned above.

  + Unicode normalization forms: are these handled consistently?
    <http://www.unicode.org/reports/tr15/>
    I searched the patch for the word fragment "normal", no hits.
    I find that worrisome.

  + Characters that are different but have similar-looking glyphs
    (homoglyphs), for instance between the Greek, Cyrillic and Latin scripts.
    Latin A, Cyrillic A and Greek A are three distinct code points for one
    indistinguishable character. A А Α <- in what order are these?
    Hint:
    0000000: 4120 d090 20ce 910a                      A .. ...
    or U+0041 U+0020 U+0410 U+0020 U+0391

    Is there a consistent policy for treating them that does not open up
    loopholes, ratholes, pitfalls, barn doors and all sorts of other
    unfortunate openings for unwary or malicious parties?

  + How does the patch make Postfix handle lookups in tables that don't
    go through postmap and cannot be normalized?
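
To make the lookup ambiguities above concrete, here is a minimal Python sketch of one possible canonical lookup key, assuming NFC for the local part and the ASCII (xn--) form for the domain, plus a crude mixed-script check for homoglyphs. `lookup_key`, `scripts_in` and `is_mixed_script` are hypothetical helpers, not anything in the patch or in Postfix; the script guess uses the first word of each character's Unicode name, which is a heuristic and not the confusable detection of UTS #39.

```python
from __future__ import annotations

import unicodedata


def lookup_key(address: str) -> str:
    """Hypothetical canonical key for table lookups.

    The local part is NFC-normalized; the domain is converted to its
    ASCII (xn--) form so that UTF-8 and Punycode spellings collide.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' in {address!r}")
    local = unicodedata.normalize("NFC", local)
    # Python's built-in "idna" codec implements IDNA 2003 (RFC 3490), not
    # IDNA 2008; pure-ASCII labels pass through unchanged, hence lower().
    ascii_domain = domain.encode("idna").decode("ascii").lower()
    return f"{local}@{ascii_domain}"


def scripts_in(text: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name
    (LATIN, CYRILLIC, GREEK, ...). A heuristic, not UTS #39."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in text if ch.isalpha()}


def is_mixed_script(text: str) -> bool:
    """Flag strings that mix scripts, a common sign of homoglyph spoofing."""
    return len(scripts_in(text)) > 1
```

With this, `user@bücher.example` and `user@xn--bcher-kva.example` map to the same key, and a decomposed `e` + U+0301 matches a precomposed `é`. NFC is an assumption here; NFKC would also fold compatibility variants such as fullwidth letters. Note that the script check only catches mixing within one string: a label written entirely in Cyrillic lookalikes passes it.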

I don't want to create artificial adoption obstacles here, but I think
there is some room for nasty surprises, and that space needs to be
explored and solutions found.  This is not just a security question, but
also one of reliability.

(Perhaps Unicode handling requires homoglyph tables and case-mapping
tables, or they exist and I missed them...)
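
On the case-mapping point, a minimal Python sketch of canonical caseless matching as defined in section 3.13 of the Unicode Standard, NFD(casefold(NFD(s))), shows why ASCII-style lowercasing is not enough. Python's `str.casefold()` implements Unicode default (locale-independent) case folding, so Turkic dotted/dotless i rules are not applied; `canonical_caseless` and `caseless_equal` are hypothetical names.

```python
import unicodedata


def canonical_caseless(s: str) -> str:
    """Canonical caseless form: NFD(casefold(NFD(s))).

    str.casefold() uses Unicode default case folding, without the
    Turkic-specific mappings.
    """
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def caseless_equal(a: str, b: str) -> bool:
    """True if a and b match under canonical caseless matching."""
    return canonical_caseless(a) == canonical_caseless(b)
```

For instance, "Straße".lower() differs from "STRASSE".lower(), yet `caseless_equal("Straße", "STRASSE")` is True, while Latin "A" and Cyrillic "А" remain unequal: case folding does nothing about homoglyphs.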

I think Wietse's expectation that the established behaviour of released
versions must not change is clear, and I've always known I can rely on
Postfix's compatibility.  (Not to say that Postfix's compatibility is
exemplary, as in "good example", but I digress.)
