On Wednesday, June 4, 2014 3:23:24 PM CEST, Wietse Venema wrote:
Arnt Gulbrandsen:
On Wednesday, June 4, 2014 12:55:18 PM CEST, Wietse Venema wrote:
 ...

Yes. We must maintain compatibility with existing practice. Postfix
has always passed 8-bit headers and envelopes (localparts) for the
past 15 years.  It would be an unacceptable compatibility break if,
for example, a corporate perimeter MTA were to start bouncing inbound
mail just because 1) some up-stream client is changed to flag that
email as SMTPUTF8, but 2) some down-stream internal server doesn't
announce SMTPUTF8.

I think you're right. The two code blocks that return 5.6.7 should perhaps be included later, but definitely not included now.

Thus, the SMTP client, cleanup daemon, and other daemon programs
MUST NOT engage in any EAI-related stuff unless a message is
flagged as EAI-enabled.  I will add a guard around that code.

The smtputf8 flag in the queue file acts as such a guard.

No it doesn't.

OK: It's meant to act as such a guard.
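
A minimal sketch of the guard as discussed above, assuming it reduces to two booleans: the smtputf8 flag stored with the message and whether the next-hop server announced SMTPUTF8. The names `Action` and `choose_action` are hypothetical and are not taken from the patch; the sketch has no bounce branch, in line with leaving out the 5.6.7 code for now.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the two outcomes discussed in the thread.
    SEND_WITH_SMTPUTF8 = "send MAIL FROM with the SMTPUTF8 parameter"
    SEND_LEGACY_8BIT = "send as before, passing 8-bit data through unchanged"


def choose_action(msg_smtputf8: bool, server_announces_smtputf8: bool) -> Action:
    """Decide how to hand a message to the next hop.

    EAI handling is only engaged when the message is flagged AND the
    server announces SMTPUTF8. In every other case the message is sent
    the way it always has been, so existing 8-bit mail keeps flowing.
    """
    if msg_smtputf8 and server_announces_smtputf8:
        return Action.SEND_WITH_SMTPUTF8
    return Action.SEND_LEGACY_8BIT
```

With this shape, a flagged message going to an internal server that does not announce SMTPUTF8 falls through to the legacy path rather than being rejected.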

Example: ORCPT handling in the cleanup Milter client
and in the SMTP client is unconditional on the smtputf8 flag.
However, given that UTF8 addresses use a special encoding, I suspect
that it is better to decode them properly (the alternative would
be to not decode them at all and just pass them on, but that requires
some extra code to handle existing queue files that contain decoded
attributes).

You'll see some other code like that in the DSN generation, when it chooses quoting format. I didn't find an alternative I really liked.

It's not clear to me that UTF8 addresses always use that special encoding. They probably should, but I found 6533 rather confusing. The niceties of UTF8 addresses in SMTPUTF8 messages vs. UTF8 addresses in other settings aren't as simple as I wish they were.

The ORCPT code in Milter/SMTP expects that all 8-bit addresses are SMTPUTF8 addresses that have somehow escaped into ASCIIland, so they should be encoded as RFC6533 says in ORCPT. That's based on my reading of RFC6533. I don't entirely like it, but I don't see any real alternative either. If you see localpart "jøran" and don't know whether it's just-send-8 or escaped EAI, should you follow EAI's quoting rules or extrapolate from RFC1891?

And what should you do if you receive an ORCPT using EAI-style quoting even though the MAIL FROM did not declare SMTPUTF8? Should that ORCPT be reencoded using 1891 encoding or keep its EAI encoding? Icky.
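
To make the two candidate quoting styles concrete, here is a small sketch that encodes the same address both ways. It assumes the classic DSN xtext rule (escape each byte outside printable ASCII, plus "+" and "=", as "+HH") and the RFC 6533 utf-8-addr-xtext rule (escape each such code point, plus "+", "\" and "=", as "\x{HEX}"). The function names are hypothetical, and this is one reading of the RFCs rather than the code in the patch.

```python
def xtext_encode(addr: str) -> str:
    """Classic DSN xtext, applied to the UTF-8 bytes of the address."""
    out = []
    for b in addr.encode("utf-8"):
        # Printable ASCII passes through, except "+" (0x2B) and "=" (0x3D).
        if 0x21 <= b <= 0x7E and b not in (0x2B, 0x3D):
            out.append(chr(b))
        else:
            out.append("+%02X" % b)
    return "".join(out)


def utf8_addr_xtext_encode(addr: str) -> str:
    """EAI-style utf-8-addr-xtext, applied per Unicode code point."""
    out = []
    for ch in addr:
        cp = ord(ch)
        # Printable ASCII passes through, except "+", "\" and "=".
        if 0x21 <= cp <= 0x7E and ch not in "+\\=":
            out.append(ch)
        else:
            out.append("\\x{%X}" % cp)
    return "".join(out)


if __name__ == "__main__":
    print(xtext_encode("jøran@example.com"))
    print(utf8_addr_xtext_encode("jøran@example.com"))
```

The byte-wise form yields `j+C3+B8ran@example.com` and the code-point form yields `j\x{F8}ran@example.com`, which is why a receiver that cannot tell the two situations apart has no obviously right choice.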

Have you given any thought to what happens when a company installs
Postfix-EAI on the perimeter, and WANTS TO FORWARD THE MAIL TO THEIR
INTERNAL SYSTEMS that may or may not have EAI support?

Yes.
...
Outgoing mail from that company to unicode addresses may begin to work, depending on whether the internal origin server supports EAI.

Incorrect. This does not require any EAI support in the SMTP client.
The SMTP client simply hands the mail to the gateway without any
transformation of the recipient domain.

If the best MX for the unicode recipient obeys RFC6531 section 3.4, then the SMTP client on the gateway has to use the SMTPUTF8 MAIL FROM parameter, i.e. support EAI. By extension the origin server has to do the same.

Incoming mail to that company from unicode addresses still doesn't work.

This has worked for 15 years, at least with UTF8 localparts.

Sorry about the sloppy writing. I meant unicode domains. You're right, it has worked with 8-bit localparts in ASCII domains.

We
must maintain compatibility with existing practice. It would be an
unacceptable compatibility break if Postfix were to suddenly start
rejecting such mail.

OK.

Is there a possibility that the same domain name may exist as a UTF8
string in some contexts and as xn--mumble elsewhere?  If this is a
problem then it will affect many database lookups.

As far as I can tell the xn-- mumble is never used outside the DNS lookups, neither in the RFCs nor in practice. The EAI RFCs say to use the xn-- form for MX lookups, to use an ASCII domain name for the EHLO argument, and otherwise don't discuss xn--.

In particular they don't say that the email address foo@xn--bar is equivalent to foo@bär. They also don't say it's different.

I chose to make them essentially different. If a site admin chooses to add xn--bar to mydestination, that admin has to configure the rest so it works. I chose that mostly because I think xn-- is a phisher's dream. People won't recognize their own domains. But the choice also makes life simpler for table/database lookups.
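
A short sketch of that choice, assuming the xn-- form is produced only when building a DNS query and that local-domain matching is a plain (case-folded) string comparison. `dns_query_name`, `is_local` and `local_domains` are hypothetical stand-ins, and Python's built-in "idna" codec (IDNA 2003) is used purely for illustration.

```python
def dns_query_name(domain: str) -> str:
    """Convert a mail domain to its ASCII (xn--) form for DNS lookups only."""
    return domain.encode("idna").decode("ascii")


def is_local(domain: str, local_domains: set) -> bool:
    """Table-style lookup: compare the name as written, no xn-- mapping."""
    return domain.lower() in local_domains


if __name__ == "__main__":
    local_domains = {"bär.example"}          # what the admin configured
    ace = dns_query_name("bär.example")      # what goes into the MX query
    print(is_local("bär.example", local_domains))  # listed form matches
    print(is_local(ace, local_domains))            # xn-- form does not
```

Under these assumptions the xn-- spelling only becomes a local domain if the admin lists it explicitly, and table lookups never need to know about the conversion.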

How do UTF8 domain names interact with DNS RHSBL lists? Do they
expect the UTF8 form or the xn--mumble form?

Unknown as yet. I expect it'll have to be xn-- mumble, but that's really just my guesswork. As far as I could tell none of the RHSBL operators have considered that matter yet.

How do UTF8 domain names interact with reject_unknown_sender_domain,
reject_unknown_recipient_domain, etc.? It looks like you are passing
the UTF8 domain name in DNS queries.

I added a new function, valid_mail_domain(), which is essentially like the old valid_hostname() except that it takes UTF8 and converts to xn--mumble, then I inspected each caller to decide whether it should call valid_hostname() or valid_mail_domain(). If you want I'll list each caller and my rationale for the decision.
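
The relationship between the two validators can be sketched roughly as follows. This is a Python analogue under simplifying assumptions, not the C code from the patch: the hostname rules are reduced to letter/digit/hyphen labels of 1 to 63 characters, and the UTF8-to-xn-- step uses Python's built-in "idna" codec.

```python
import re

# Simplified label rule: letters, digits, hyphens; no leading/trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def valid_hostname(name: str) -> bool:
    """ASCII-only check, for things like EHLO arguments and mail hosts."""
    if not name or len(name) > 255:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def valid_mail_domain(name: str) -> bool:
    """Accept UTF8, convert to the xn-- form, then apply the hostname check."""
    try:
        ascii_form = name.encode("idna").decode("ascii")
    except UnicodeError:
        # Conversion failed (empty or overlong label, prohibited characters).
        return False
    return valid_hostname(ascii_form)
```

A caller that handles sender or recipient domains would use `valid_mail_domain()`, while a caller that handles EHLO arguments or next-hop hosts would keep using `valid_hostname()`, so a name such as `bär.example` passes the first check and fails the second.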

First, all Postfix table lookups are case-insensitive by default.
You may have missed that.

Indeed I did. The mydestination handling will need more work, at least. I'll look at it.

Second, not all lookup tables may support UTF8.  What does the POSIX
standard have to say about this for regular expressions?  This
affects the regexp: table.

Third, in database queries, strings that contain UTF8 may require
special treatment when the default locale is not unicode-based.
We must maintain compatibility with existing practice: Postfix
currently passes 8bit strings as if they are in the default locale.
It would be an unacceptable compatibility break if Postfix suddenly
starts to fail those queries just because they aren't well-formed
UTF8.

Are you saying that at present, Postfix treats other people's 8bit as though it were case-insensitive in the server's locale? And that Postfix requires tables to be case-insensitive and silently expects them to use the right locale?

The pgsql table, at least, appears to use the locale that was chosen while creating the database, not the system locale on the Postfix server.

Being compatible with that will require a bit of luck.
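
One possible way to square case-insensitive lookups with legacy 8-bit keys is sketched below. It is purely an assumption for discussion, not existing Postfix behaviour: keys that are well-formed UTF8 are lowercased as Unicode, and anything else gets ASCII-only folding so the other bytes pass through untouched. The name `fold_lookup_key` is hypothetical.

```python
def fold_lookup_key(raw: bytes) -> bytes:
    """Fold a lookup key to lower case without rejecting legacy 8-bit input."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not well-formed UTF8: bytes.lower() folds ASCII letters only,
        # so bytes >= 0x80 are passed on exactly as received.
        return raw.lower()
    # Well-formed UTF8: fold the whole string as Unicode text.
    return text.lower().encode("utf-8")
```

For example, the UTF8 key `JØRAN` folds to `jøran`, while the Latin-1 bytes `J\xd8RAN` fold to `j\xd8ran` instead of causing a failure. Whether a database behind the table applies the same folding is a separate question, as noted above.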

So it looks like all the work on the database interface still
needs to be done.

Finally, you appear to have broken the valid_hostname(3) abstraction.
This module enforces RFC rules for hostnames (and domain names) in
calls of infrastructure functions such as getaddrinfo(), getnameinfo()
and functions at lower levels in the stack.
Unless the EAI RFCs say otherwise, the hostname in HELO commands
cannot be a UTF8 string, therefore it cannot be treated as if it
is a recipient domain.

I agree (and I think I said as much in the README). That's why I call valid_hostname() in many cases and valid_mail_domain() in others.

Recipient domains require a validator that is specific for recipient
domains, and that validator does not belong in the valid_hostname(3)
module. I think this also requires a different version of the
host_port() function that is specific for recipient addresses and
that has a flag indicating whether or not UTF8 functionality is enabled.

Moving valid_mail_domain() into its own file is fine. The purpose of valid_mail_domain() is precisely to validate recipient (and sender) domains.

I think host_port() had better not be split. It's used for mail hosts, and those are like EHLO arguments, they have to be ASCII even when sender/recipient domains can be unicode. So in /etc/postfix/transport the LHS can use unicode but the RHS cannot for the foreseeable future.

More later, after I have reviewed the rest of the code, and after
I have checked it against the RFCs for compliance and completeness.

You've already found at least two holes, perhaps four. You told me earlier I needn't bother about minor improvements. Are these big enough that you'd prefer me to submit a new patch?

Thanks for your responses; I hope I haven't disturbed your vacation too much.

Arnt
