On 04.06.2014 19:48, Arnt Gulbrandsen wrote:
> Compliant SMTP servers only accept mail to/from EAI addresses if the
> SMTP client uses the SMTPUTF8 form of the MAIL FROM command. The SMTP
> client, in turn, only uses that form if the origin too used it.
>
> The purpose of this feature is to guarantee that EAI messages don't land
> in the mailboxes of incompatible recipients. The relevant effect of this
> feature is that in order to send mail to a unicode address, the _sender_
> must declare that the message uses EAI. Having 8-bit clean relays on the
> way is not enough.
>
>> Thus an EAI domain name may show up as xn--mumble in HELO commands.
>
> Yes. I think it's a bad idea to do that. The chance that some SMTP
> server's gethostbyname() will return the UTF8 form and the SMTP server
> then complain about EHLO/PTR mismatch is too great. But it can happen.
>
>> There will be more. I'll just document them and fix them, so I
>> don't have to spend a lot of time reviewing another version.
I'm late to the game and haven't checked the relevant RFCs or Arnt's
patch, but here are a few thoughts on this -- perhaps you can answer
"all dealt with" -- so here we go:

* It reminds me a bit of the 8BITMIME feature that was under discussion
  in the late 1990s/early 2000s. I think The World™ never agreed on how
  to deal with all that; it depended on how radically a given piece of
  software implemented its policies. Meaning: do we need this? Is
  Microsoft going to implement it? IBM's Lotus Domino/Notes suites on
  the client end?

* My bigger concern is that Unicode opens up ambiguities at various
  levels, for instance when doing table lookups (especially for
  policies, such as access control):

  + IDN punycode (xn--blech-rassel), as mentioned above.

  + Unicode normalization forms: are these handled consistently?
    <http://www.unicode.org/reports/tr15/>
    I searched the patch for the word fragment "normal", no hits.
    I find that worrisome.

  + Characters that are different but use similar-looking glyphs
    (homoglyphs), for instance between the Greek, Cyrillic and Latin
    scripts. Latin A, Cyrillic A and Greek A are three code points for
    an indistinguishable character.

    A А Α   <- in what order are these?

    Hint:
    0000000: 4120 d090 20ce 910a   A .. ...
    or U+0041 U+0020 U+0410 U+0020 U+0391

    Is there a consistent policy for treating them that does not open
    up loop- and ratholes and pitfalls and barndoors and all other
    sorts of unfortunate openings for unaware/malicious parties?

  + How does the patch make Postfix deal with table lookups for tables
    that don't go through postmap and cannot be normalized?

I don't want to create artificial adoption obstacles here, but I think
there is some room for nasty surprises, and that space needs exploration
and solutions. That's not just a security discussion, but also one about
reliability. (Perhaps Unicode requires - or I missed - homoglyph tables
and case mapping tables...)
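To make the lookup concern concrete, here is a minimal Python sketch. It is not taken from the patch and says nothing about how Postfix behaves. canonical_key() is a hypothetical helper, and the policy it encodes (NFC for the local part, lowercase A-label for the domain) is only an assumption for illustration. The address and the bücher.example domain are made up.

```python
import unicodedata


def canonical_key(address: str) -> str:
    """Hypothetical lookup-key policy: NFC local part, lowercase A-label domain."""
    local, _, domain = address.rpartition("@")
    local = unicodedata.normalize("NFC", local)
    # Python's built-in "idna" codec implements IDNA 2003 (nameprep + punycode).
    # Pure-ASCII labels pass through unchanged, hence the explicit lower().
    domain = domain.encode("idna").decode("ascii").lower()
    return f"{local}@{domain}"


# Same visible address, different code point sequences and domain forms.
composed = "jos\u00e9@B\u00fccher.example"        # precomposed e-acute, U-label domain
decomposed = "jose\u0301@xn--bcher-kva.example"  # e + combining acute, A-label domain

raw_match = composed == decomposed
canonical_match = canonical_key(composed) == canonical_key(decomposed)

# The three look-alike capitals from the hexdump: Latin, Cyrillic, Greek.
lookalikes = ["A", "\u0410", "\u0391"]
names = [unicodedata.name(c) for c in lookalikes]
# Even compatibility normalization plus case folding keeps them apart.
folded = {unicodedata.normalize("NFKC", c).casefold() for c in lookalikes}

if __name__ == "__main__":
    print("raw strings equal:    ", raw_match)
    print("canonical keys equal: ", canonical_match)
    print("code point names:     ", names)
    print("distinct after NFKC + casefold:", len(folded))
```

The raw comparison fails while the canonical keys match, which is the table-lookup problem in miniature: a key stored in one form is not found under the other unless both sides are normalized the same way. The three capitals stay distinct under NFKC and case folding, so normalization alone does nothing for homoglyphs; that would need a separate confusables policy.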
I think Wietse's expectation that established behaviour of release
versions must not change is clear, and I've always known I can rely on
Postfix's compatibility. (Not to say that Postfix's compatibility is
exemplary, as in "good example", but I digress.)