Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-06 Thread Arnt Gulbrandsen

On Friday, June 6, 2014 1:55:37 AM CEST, Wietse Venema wrote:

Postfix is meant to be used by human operators anywhere on the
Internet. Therefore, the postqueue/postmap/etc.  tools will have
to accept non-ASCII domain names from a human operator in either
UTF-8 form or xn--mumble form, and they will have to convert those
forms into their stored form. Those tools will also have to render
non-ASCII domain names in their stored form, or convert them into
UTF-8 or xn--mumble form on request by the human operator.


Makes sense; patch coming. Will take a few days.

Arnt


Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Arnt Gulbrandsen

On Wednesday, June 4, 2014 11:16:51 PM CEST, Wietse Venema wrote:

* Postfix table queries are case-insensitive. I don't see any attempt
  to implement that for UTF8 addresses. This leaves an ambiguity.


I looked at this now.

As I read the code, tables mostly map to lower case and then do a binary 
comparison. The mysql and pgsql tables may additionally use the database 
server's ilike operation. Finally, lowercase() maps U to u, but leaves 0xC0 
as 0xC0, even if the Postfix server runs in a locale where the lowercase 
form of that is 0xE0.


Is that correct?

I can provide a supplementary patch that provides case insensitivity for 
unicode. It's easy, but there are several ways to do it, and I don't know 
which you prefer.


1. Toupper/tolower in Postfix, with the usual table. This adds the bulk of 
a table and is language-independent but imperfect. The well-known problem 
is i/ı. (The lowercase(I) equivalent is ı in Turkish and a handful of
other locales.)


2a. Toupper/tolower that call out to ICU if EAI is enabled and there's any 
non-ASCII in the argument. This slows down toupper()/tolower() but
Postfix escapes having the table and ICU devotes considerable effort to 
correctness. It's easy to compose the string, too (composition means to use 
å instead of a+ring above).


2b. Ditto, but calling a language-sensitive function in ICU, so that i is 
equal to İ if the Postfix server runs in one of those locales. I'm unhappy 
about this alternative — a Swiss service provider may well service both 
Kazakh and Korean users and how should the service provider's Postfix be
configured?


3. Switching to titlecase. A bigger change. Titlecase is a form in which
case differences are erased and in principle it's neither equal to
uppercase nor to lowercase. It's only usable for implementing 
case-insensitive comparison/lookup using fast binary comparison.


In my opinion the change to titlecase isn't worth it. There aren't enough 
problems with lowercase() to justify such a sweeping change. Also keeping 
lower case allows compiled tables to survive upgrades/downgrades.


I'm neutral regarding 1 and 2a. If you'll tell me what you prefer I'll 
write a patch and test that it matches another implementation.
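A minimal Python sketch of the i/ı problem behind alternatives 1 and 2b. It uses Python's locale-independent `str.casefold()` as a stand-in for "the usual table"; that choice, and the helper name `same_ignoring_case`, are assumptions for illustration and not part of any Postfix patch.

```python
# Locale-independent Unicode case folding, standing in for "the usual table".
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after default (non-Turkic) Unicode case folding."""
    return a.casefold() == b.casefold()


DOTLESS_I = "\u0131"  # ı, LATIN SMALL LETTER DOTLESS I

# Plain ASCII behaves as everyone expects.
print(same_ignoring_case("Info", "INFO"))

# Default folding keeps dotless ı distinct from i ...
print(same_ignoring_case(DOTLESS_I + "nfo", "info"))

# ... even though uppercasing ı yields the ASCII letter I.
print(DOTLESS_I.upper() == "I")
```

With the default mapping, two names that differ only in ı/i stay distinct, which is the language-independent but imperfect behaviour described under alternative 1.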


Arnt



Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Arnt Gulbrandsen

On Thursday, June 5, 2014 10:36:49 AM CEST, Arnt Gulbrandsen wrote:
In my opinion the change to titlecase isn't worth it. There 
aren't enough problems with lowercase() to justify such a 
sweeping change. Also keeping lower case allows compiled tables 
to survive upgrades/downgrades.


Worse: There are likely user-supplied tables that depend on lowercase 
input. Both the mysql and pgsql tables make it easy to configure 
case-sensitive queries, which switching to titlecase would break. I think 
titlecase is definitely out.


(Btw, I wrote tolower() instead of lowercase() once or twice in the 
previous message. Sorry.)


Arnt



Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Wietse Venema
Arnt Gulbrandsen:
 On Wednesday, June 4, 2014 11:16:51 PM CEST, Wietse Venema wrote:
  * Postfix table queries are case-insensitive. I don't see any attempt
to implement that for UTF8 addresses. This leaves an ambiguity.
 
 I looked at this now.
 
 As I read the code, tables mostly map to lower case and then do a binary 
 comparison. The mysql and pgsql tables may additionally use the database 
 server's ilike operation. Finally, lowercase() maps U to u, but leaves 0xC0 
 as 0xC0, even if the Postfix server runs in a locale where the lowercase 
 form of that is 0xE0.
 
 Is that correct?

That question is not applicable.  Postfix locale is C, and
lowercase() only translates ASCII characters.
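The behaviour described here can be sketched in a few lines of Python. This is an illustration only, not Postfix's C implementation; `ascii_lowercase` is a hypothetical name.

```python
def ascii_lowercase(data: bytes) -> bytes:
    """Lowercase only the ASCII letters A-Z; leave every other octet alone."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)


# "U" followed by octet 0xC0 (which is "À" in ISO 8859-1).
latin1_sample = bytes([0x55, 0xC0])
# A UTF-8 address with a non-ASCII capital letter in the localpart.
utf8_sample = "JØRAN@Example.ORG".encode("utf-8")

print(ascii_lowercase(latin1_sample))                # b'u\xc0'
print(ascii_lowercase(utf8_sample).decode("utf-8"))  # jØran@example.org
```

Because every octet of a multi-byte UTF-8 sequence is 0x80 or above, an ASCII-only mapping never corrupts UTF-8 input; it simply leaves non-ASCII letters case-sensitive.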

 I can provide a supplementary patch that provides case insensitivity for 
 unicode. It's easy, but there are several ways to do it, and I don't know 
 which you prefer.
 
 1. Toupper/tolower in Postfix, with the usual table. This adds the bulk of 
 a table and is language-independent but imperfect. The well-known problem 
 is i/ı. (The lowercase(I) equivalent is ı in Turkish and a handful of
 other locales.)
 
 2a. Toupper/tolower that call out to ICU if EAI is enabled and there's any 
 non-ASCII in the argument. This slows down toupper()/tolower() but
 Postfix escapes having the table and ICU devotes considerable effort to 
 correctness. It's easy to compose the string, too (composition means to use 
 å instead of a+ring above).
 
 2b. Ditto, but calling a language-sensitive function in ICU, so that i is 
 equal to İ if the Postfix server runs in one of those locales. I'm unhappy
 about this alternative - a Swiss service provider may well service both
 Kazakh and Korean users and how should the service provider's Postfix be
 configured?
 
 3. Switching to titlecase. A bigger change. Titlecase is a form in which
 case differences are erased and in principle it's neither equal to
 uppercase nor to lowercase. It's only usable for implementing 
 case-insensitive comparison/lookup using fast binary comparison.
 
 In my opinion the change to titlecase isn't worth it. There aren't enough 
 problems with lowercase() to justify such a sweeping change. Also keeping 
 lower case allows compiled tables to survive upgrades/downgrades.
 
 I'm neutral regarding 1 and 2a. If you'll tell me what you prefer I'll 
 write a patch and test that it matches another implementation.

This will require further research. If case canonicalization is as
complex as you describe then the correct result is likely to
differ from what real people expect. That is a security hole.

Wietse


Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Arnt Gulbrandsen

This will require further research. If case canonicalization is as
complex as you describe then the correct result is likely to
differ from what real people expect. That is a security hole.


That was the case in the nineties, but by now the case folding algorithms 
in unicode have won. They've been used so much that people have come to
expect that they're right. There are problems, but lowercase() escapes all 
but ı/i.


But ı is nasty. I have even found two domains that differ only in ı/i, so 
Postfix cannot treat them as equal.


Composition (the other part of canonicalization) is a worse matter. You're
right, that might lead to security problems. It can lead to table lookup 
misses, and I'm sure that table misses can lead to several kinds of 
security problems. For example forgetting mandatory TLS.


The safest alternative is to fully compose table lookup keys. (Or fully 
decompose, but fully compose is usually faster.) I'll provide a patch to do 
the 2a alternative. It'll take a few days.
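A small Python sketch of the table-miss problem and of fully composing lookup keys. It uses the standard `unicodedata` module rather than ICU, and the table contents and the `lookup_key` helper are hypothetical.

```python
import unicodedata


def lookup_key(key: str) -> str:
    """Fully compose a lookup key (Unicode normalization form NFC)."""
    return unicodedata.normalize("NFC", key)


composed = "bl\u00e5b\u00e6r"      # å as a single code point
decomposed = "bla\u030ab\u00e6r"   # a followed by COMBINING RING ABOVE

# A hypothetical policy table keyed on the composed form.
table = {lookup_key(composed): "enforce-tls"}

print(decomposed in table)              # the raw decomposed key misses
print(lookup_key(decomposed) in table)  # the normalized key hits
```

The two strings render identically, so an operator cannot see why the first lookup fails; normalizing both the stored keys and the query keys removes that class of miss.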


Arnt



Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Viktor Dukhovni
On Thu, Jun 05, 2014 at 02:24:38PM +0200, Arnt Gulbrandsen wrote:

 But ı is nasty. I have even found two domains that differ only in ı/i, so
 Postfix cannot treat them as equal.

Domains passed to lookup tables and match lists need to be in
a-label form.  The remaining surprises with domains and case-insensitive
comparisons vs. unicode will be with header/body checks, likely OK.

-- 
Viktor.


Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Arnt Gulbrandsen

On Thursday, June 5, 2014 4:32:52 PM CEST, Viktor Dukhovni wrote:

Domains passed to lookup tables and match lists need to be in
a-label form.


That would make pcre almost impossible and mysql and pgsql lookups rather 
inconvenient.


The a-label form of blåbærsyltetøy is
xn--blbrsyltety-y8ao3x. Matching the PCRE /.*syltetøy.*/ in a-label form 
would be inconvenient, perhaps impossible.
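The point can be illustrated with Python's built-in `idna` codec. The codec implements the older IDNA 2003 rules, which is assumed to be adequate for this lowercase example; the variable names are for illustration only.

```python
import re

u_label = "blåbærsyltetøy"
# Convert to the ASCII-compatible (a-label) form.
a_label = u_label.encode("idna").decode("ascii")

pattern = re.compile(r".*syltetøy.*")

print(a_label)
print(bool(pattern.match(u_label)))  # the pattern matches the UTF-8 form
print(bool(pattern.match(a_label)))  # but not the a-label form
```

Punycode moves the non-ASCII characters out of their original positions and encodes them in the suffix, so a pattern written against the readable name has nothing to match in the a-label.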


Postgres and Mysql have builtin support for UTF8 strings so mysql/pgsql 
tables can use e.g. the ilike operator, but they do not support strings 
composed from a-labels. Here's a pgsql concoction to match usernames,
optionally with subaddresses:


  select id from addresses where localpart='%u' or localpart ilike '%u-%'

I cannot imagine any way to implement that if %u is in a-label form.

Arnt



Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Viktor Dukhovni
On Thu, Jun 05, 2014 at 05:18:48PM +0200, Arnt Gulbrandsen wrote:

 On Thursday, June 5, 2014 4:32:52 PM CEST, Viktor Dukhovni wrote:
 Domains passed to lookup tables and match lists need to be in
 a-label form.
 
 That would make pcre almost impossible and mysql and pgsql lookups rather
 inconvenient.

What's the problem with the canonical representation of the domain exactly
as it appears on the wire in DNS, in certificate DNS altnames, ...

 The a-label form of blåbærsyltetøy is
 xn--blbrsyltety-y8ao3x. Matching the PCRE /.*syltetøy.*/ in a-label form
 would be inconvenient, perhaps impossible.

Regular expressions on partial DNS labels are not that useful anyway.
Generally one just wants all the sub-domains of a particular domain.
Sometimes one wants to filter cable-modem/DSL PTR records, otherwise
I'm not losing sleep over partial DNS label regexps.

 Postgres and Mysql have builtin support for UTF8 strings so mysql/pgsql
 tables can use e.g. the ilike operator, but they do not support strings
 composed from a-labels. Here's a pgsql concoction to match usernames,
 optionally with subaddresses:

Nothing lost when the domain name is a-label form.  The localpart
remains unicode, and one still needs some sort of UTF-8 -> utf-8
lower-case operator that operates correctly on ASCII.  Frankly
applying lowercase() to just the ASCII octets works fine in this
situation, provided the domain is in a-label form already.  Unicode
email address localparts would be case-sensitive in their non-ASCII
octets, not the end of the world.
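A rough Python sketch of the lookup-key form proposed here: domain in a-label form, localpart left as UTF-8, and only ASCII letters lowercased. The function names are hypothetical, and Python's built-in `idna` codec (IDNA 2003) stands in for whatever conversion library Postfix would use.

```python
def ascii_lower(text: str) -> str:
    """Lowercase A-Z only; non-ASCII characters are left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def canonical_key(address: str) -> str:
    """Build a table lookup key: UTF-8 localpart, a-label domain."""
    localpart, _, domain = address.rpartition("@")
    a_domain = domain.encode("idna").decode("ascii")
    return f"{ascii_lower(localpart)}@{ascii_lower(a_domain)}"


print(canonical_key("Jøran@Bücher.example"))
print(canonical_key("JØRAN@Example.ORG"))
```

The second call shows the trade-off named above: the non-ASCII capital in the localpart survives, so that part of the address stays case-sensitive.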

-- 
Viktor.


Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-05 Thread Wietse Venema
Viktor Dukhovni:
 Not too many people in Russia read Hebrew (right to left) or can
 even cut and paste it reliably into a left to right context.

Postfix is meant to be used by human operators anywhere on the
Internet. Therefore, the postqueue/postmap/etc.  tools will have
to accept non-ASCII domain names from a human operator in either
UTF-8 form or xn--mumble form, and they will have to convert those
forms into their stored form. Those tools will also have to render
non-ASCII domain names in their stored form, or convert them into
UTF-8 or xn--mumble form on request by the human operator.

This way, human operators can manage domain names that are in the
operator's native script, but they can fall back to ASCII when the
domain is in some alien script.

So it does not matter what the stored form is (and in the case of
the mail queue, the stored form is controlled by the sender anyway).
What matters is that Postfix management tools allow humans to use
the stored form effectively. In other words, the tools must allow
the human operator to choose how to enter a non-ASCII domain name,
and how to render it.
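One way a management tool could offer both input and display forms is sketched below in Python. The `convert` helper and its `form` argument are hypothetical, not an existing postqueue or postmap option, and the built-in `idna` codec (IDNA 2003) is assumed to be good enough for the example.

```python
def to_ascii_form(domain: str) -> str:
    """Return the xn-- (a-label) form; ASCII input passes through unchanged."""
    return domain.encode("idna").decode("ascii")


def to_utf8_form(domain: str) -> str:
    """Return the UTF-8 (u-label) form; non-ASCII input passes through unchanged."""
    return domain.encode("ascii").decode("idna") if domain.isascii() else domain


def convert(domain: str, form: str) -> str:
    """Convert a domain, entered in either form, to the requested form."""
    if form == "ascii":
        return to_ascii_form(domain)
    if form == "utf8":
        return to_utf8_form(domain)
    raise ValueError(f"unknown form: {form!r}")


# The operator may type either form; the tool stores and renders as asked.
print(convert("bücher.example", "ascii"))
print(convert("xn--bcher-kva.example", "utf8"))
```

With conversions in both directions, the stored form becomes an internal detail: input is converted on the way in and rendered in whichever form the operator can read.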

See also my longer, previous, post in this thread.

Wietse


Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-04 Thread Arnt Gulbrandsen

On Wednesday, June 4, 2014 3:23:24 PM CEST, Wietse Venema wrote:

Arnt Gulbrandsen:

On Wednesday, June 4, 2014 12:55:18 PM CEST, Wietse Venema wrote:

 ...

Yes. We must maintain compatibility with existing practice. Postfix
has always passed 8-bit headers and envelopes (localparts) for the
past 15 years.  It would be an unacceptable compatibility break if,
for example, a corporate perimeter MTA were to start bouncing inbound
mail just because 1) some up-stream client is changed to flag that
email as SMTPUTF8, but 2) some down-stream internal server doesn't
announce SMTPUTF8.


I think you're right. The two code blocks that return 5.6.7 should perhaps 
be included later, but definitely not included now.



Thus, the SMTP client, cleanup daemon, and other daemon programs
MUST NOT engage in any EAI-related stuff unless a message is
flagged as EAI-enabled.  I will add a guard around that code.


The smtputf8 flag in the queue file acts as such a guard.


No it doesn't.


OK: It's meant to act as such a guard.


Example: ORCPT handling in the cleanup Milter client
and in the SMTP client is unconditional on the smtputf8 flag.
However, given that UTF8 addresses use a special encoding, I suspect
that it is better to decode them properly (the alternative would
be to not decode them at all and just pass them on, but that requires
some extra code to handle existing queue files that contain decoded
attributes).


You'll see some other code like that in the DSN generation, when it chooses 
quoting format. I didn't find an alternative I really liked.


It's not clear to me that UTF8 addresses always use that special encoding. 
They probably should, but I found 6533 rather confusing. The niceties of 
UTF8 addresses in SMTPUTF8 messages vs. UTF8 addresses in other settings 
aren't as simple as I wish they were.


The ORCPT code in Milter/SMTP expects that all 8-bit addresses are SMTPUTF8 
addresses that have somehow escaped into ASCIIland, so they should be 
encoded as RFC6533 says in ORCPT. That's based on my reading of RFC6533. I 
don't entirely like it, but I don't see any real alternative either. If you 
see localpart jøran and don't know whether it's just-send-8 or escaped 
EAI, should you follow EAI's quoting rules or extrapolate from RFC1984?


And what should you do if you receive an ORCPT using EAI-style quoting even 
though the MAIL FROM did not declare SMTPUTF8? Should that ORCPT be 
reencoded using 1984 encoding or keep its EAI encoding? Icky.



Have you given any thought of what happens when a company installs
Postfix-EAI on the perimeter, and WANTS TO FORWARD THE MAIL TO THEIR
INTERNAL SYSTEMS that may or may not have EAI support?


Yes.

...
Outgoing mail from that company to unicode addresses may begin to work, 
depending on whether the internal origin server supports EAI.


Incorrect. This does not require any EAI support in the SMTP client.
The SMTP client simply hands the mail to the gateway without any
transformation of the recipient domain.


If the best MX for the unicode recipient obeys RFC6531 section 3.4, then 
the SMTP client on the gateway has to use the SMTPUTF8 MAIL FROM parameter, 
ie. support EAI. By extension the origin server has to do the same.


Incoming mail to that company from unicode addresses still doesn't work. 


This has worked for 15 years, at least with UTF8 localparts.


Sorry about the sloppy writing. I meant unicode domains. You're right, it 
has worked with 8-bit localparts in ASCII domains.



We
must maintain compatibility with existing practice. It would be an
unacceptable compatibility break if Postfix were to suddenly start
rejecting such mail.


OK.


Is there a possibility that the same domain name may exist as a UTF8
string in some contexts and as xn--mumble elsewhere?  If this is a
problem then it will affect many database lookups.


As far as I can tell the xn-- mumble is never used outside the DNS lookups, 
neither in the RFCs nor in practice. The EAI RFCs say to use the xn-- form 
for MX lookups, to use an ASCII domain name for the EHLO argument, and 
otherwise don't discuss xn--.


In particular they don't say that the email address foo@xn--bar is 
equivalent to foo@bär. They also don't say it's different.


I chose to make them essentially different. If a site admin chooses to add 
xn--bar to mydestination, that user has to configure the rest so it works.
I chose that mostly because I think xn-- is a phisher's dream. People won't 
recognize their own domains. But the choice also makes life simpler for 
table/database lookups.



How do UTF8 domain names interact with DNS RHSBL lists? Do they
expect the UTF8 form or the xn--mumble form?


Unknown as yet. I expect it'll have to be xn-- mumble, but that's really 
just my guesswork. As far as I could tell none of the RHSBL operators have 
considered that matter yet.



How do UTF8 domain names interact with reject_unknown_sender_domain,
reject_unknown_recipient_domain, etc.? It looks like you are passing
the 

Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-04 Thread Arnt Gulbrandsen

On Wednesday, June 4, 2014 6:58:43 PM CEST, Viktor Dukhovni wrote:

My impression is that UTF-8 domain names are an MUA display
format issue.


There was a tremendously tedious discussion of the approach you suggest, and
of many others. There was even a set of experimental RFCs issued. In the
end the experimental RFCs were discarded. Here's how RFC 6532 sums up the
final message format changes:


  The preceding changes mean that the following constructs now allow
  UTF-8:

  1.  Unstructured text, used in header fields like Subject: or
  Content-description:.

  2.  Any construct that uses atoms, including but not limited to the
  local parts of addresses and Message-IDs.  This includes
  addresses in the for clauses of Received: header fields.

  3.  Quoted strings.

  4.  Domains.

6531 references 6532, and the MAIL FROM/RCPT TO syntax allows UTF8. 6855 
makes corresponding changes to IMAP (no more mUTF7, hurray), 6856 to POP, 
etc.


Arnt



Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-04 Thread Arnt Gulbrandsen

On Wednesday, June 4, 2014 8:38:49 PM CEST, Wietse Venema wrote:

I'll read the RFCs carefully and see where it allows UTF8 in SMTP
command parameters and replies.


You'll do that, but I'll tell you anyway: The client may use it once the 
server has issued an EHLO response containing SMTPUTF8, and the server may 
use it once the client has issued a MAIL FROM, VRFY or EXPN command with 
the SMTPUTF8 parameter.
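The client side of that rule might look like the Python sketch below. It is a simplification: it decides on the sender address alone, whereas a real client would also consider recipients and headers. The function names and reply lines are made up for illustration.

```python
def server_offers_smtputf8(ehlo_reply_lines):
    """Return True if any EHLO reply line carries the SMTPUTF8 keyword."""
    for line in ehlo_reply_lines:
        # Reply lines look like "250-KEYWORD args" or "250 KEYWORD args".
        words = line[4:].split()
        if words and words[0].upper() == "SMTPUTF8":
            return True
    return False


def mail_from_command(sender, ehlo_reply_lines):
    """Build a MAIL FROM line, adding SMTPUTF8 only when needed and allowed."""
    if sender.isascii():
        return f"MAIL FROM:<{sender}>"
    if not server_offers_smtputf8(ehlo_reply_lines):
        raise ValueError("server did not announce SMTPUTF8")
    return f"MAIL FROM:<{sender}> SMTPUTF8"


ehlo = ["250-mail.example.org", "250-8BITMIME", "250 SMTPUTF8"]
print(mail_from_command("æ@æ.example", ehlo))
print(mail_from_command("user@example.org", ehlo))
```

The server's permission to reply in UTF-8 is the mirror image: it exists only after a command carrying the SMTPUTF8 parameter has been received.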


Postfix (both with and without my patch) violates that. If a client tells 
Postfix:


 MAIL FROM:<æ@æ.æ>

then Postfix may conceivably answer that æ@æ.æ is not a legal sender 
address, since æ.æ isn't a valid domain. 6531 says that that response 
should be ASCII-only, since the client hasn't given permission to use UTF8 
in responses. My viewpoint is that no matter what RFC6531 says, the client 
must accept hearing its own arguments in the SMTP reply. Postfix is right 
and 6531 is wrongish, so I followed Postfix' reply style rather than comply 
with 6531.



However even without reading those RFCs it is clear that UTF8 cannot
be used in 220 server greetings or in EHLO commands or replies,
because at that time the server/client have not agreed to use UTF8.


Right.


Thus, myhostname (or equivalent) must be ASCII, as it always must
have been.  There is no need to use valid_mail_domain() in
reject_non_fqdn_hostname etc.


Right. I made some mistakes. I wish I were perfect, but know I am not.

Arnt



Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-04 Thread Matthias Andree
Am 04.06.2014 19:48, schrieb Arnt Gulbrandsen:
 Compliant SMTP servers only accept mail to/from EAI addresses if the
 SMTP client uses the SMTPUTF8 form of the MAIL FROM command. The SMTP
 client, in turn, only uses that form if the origin too used it.
 
 The purpose of this feature is to guarantee that EAI messages don't land
 in the mailboxes of incompatible recipients. The relevant effect of this
 feature is that in order to send mail to a unicode address, the _sender_
 must declare that the message uses EAI. Having 8-bit clean relays on the
 way is not enough.
 
 Thus an EAI domain name may show up as xn--mumble in HELO commands.
 
 Yes. I think it's a bad idea to do that. The chance that some SMTP
 server's gethostbyname() will return the UTF8 form and the SMTP server
 then complain about EHLO/PTR mismatch is too great. But it can happen.
 
 There will be more. I'll just document them and fix them, so I
 don't have to spend a lot of time reviewing another version.

I'm late to the game, haven't checked the relevant RFCs or Arnt's patch,
but a few thoughts on this -- perhaps you can answer "all dealt with" --
but here we go:

* It reminds me a bit of the 8BITMIME feature that was in discussion in
the late 1990's/early 2000's.  I think The World™ never consented on how
to deal with all that depending on how radical a certain software
implemented its policies.  Meaning: do we need this?  Is Microsoft going
to implement it?  IBM's Lotus Domino/Notes suites on the client end?


* My bigger concern is that UNICODE opens up ambiguities at various
levels, for instance when doing table lookups (especially for policies,
such as access control):

  + IDN punycode (xn--blech-rassel), as mentioned above.

  + Unicode normalization forms, are these handled consistently?
http://www.unicode.org/reports/tr15/
I searched the patch for the word fragment "normal", no hits.
I find that worrisome.

  + Characters that are different but use similar-looking glyphs,
(homoglyphs), for instance, between Greek/Cyrillic/Latin scripts.
Latin A, Cyrillic A, Greek A are three code points for an
indistinguishable character. A А Α - in what order are these?
Hint:
000: 4120 d090 20ce 910a  A .. ...
or U+0041 U+0020 U+0410 U+0020 U+0391

Is there a consistent policy for treating them that does not open up
loop- and ratholes and pitfalls and barndoors and all other sorts of
unfortunate openings for unaware/malicious parties?

  + How does the patch make Postfix deal with table lookups for tables
that don't go through postmap and cannot be normalized?
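The homoglyph example can be reproduced with a short Python sketch. The `scripts_used` helper is a crude heuristic invented for this illustration (it takes the first word of each letter's Unicode character name as its script); it is not a proposal for how Postfix should detect mixed scripts.

```python
import unicodedata


def describe(text):
    """List (code point, UTF-8 hex, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", ch.encode("utf-8").hex(), unicodedata.name(ch))
        for ch in text
    ]


def scripts_used(text):
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


# Latin A, Cyrillic A, Greek Alpha: three code points, one glyph shape.
for row in describe("A\u0410\u0391"):
    print(*row)

# A localpart that looks like plain ASCII but contains a Cyrillic "а".
print(scripts_used("p\u0430ypal"))
```

Normalization does not help here: the three letters are distinct characters in every normalization form, so any policy for them has to be a separate decision from composition and case handling.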

I don't want to create artificial adoption obstacles here, but I think
there is some room for nasty surprises, and that space needs exploration
and solutions.  That's not just security discussion, but also reliability.

(Perhaps Unicode requires - or I missed - homoglyph tables, and case
mapping tables...)

I think Wietse's expectation on how not to change established behaviour
of release versions is clear, and I've always known I can rely on
Postfix's compatibility.  (Not to say that Postfix's compatibility is
exemplary, as in good example, but I digress.)



Re: Patch: Unicode email support (RFC 6531, 6532, 6533)

2014-06-04 Thread Wietse Venema
Arnt Gulbrandsen:
 De/composition are pushed to the DNS. The SMTP part just says: Convert to
 IDNA a-labels in order to do the MX lookup, and otherwise don't mess with
 the bytes you received. (My patch uses ICU to convert to a-labels.)

That is a mis-conception.

DNS is not the only interface that requires xn--mumble names. Like
a cancer, EAI has the potential to infect many aspects of address
handling and policy lookup. This is why I estimated that SMTPUTF8
would be a major project.

* The form xn--mumble will also be required in server greetings and
  EHLO commands, when an MTA host- or domain name contains non-ASCII
  characters. This means that Postfix must convert myhostname into
  xn--mumble form in those contexts that require ASCII text.

* With multiple forms for the same domain name, xn--mumble in
  HELO/EHLO (and perhaps other SMTP commands) and UTF8 in
  MAIL/RCPT/ETRN/VRFY, Postfix lookup tables must either contain
  multiple lookup keys for the same domain name, or Postfix must
  convert all domain/email-address lookup keys into one canonical
  form. That is, either convert all UTF8 domain names into xn--mumble,
  or convert all xn--mumble domain names into UTF8.  Having only
  one lookup key per domain in Postfix lookup tables will be more
  secure but it will be a royal pain to implement (and there is no
  way to do that with header/body_checks).

* I am not sure that we can rely on the postmap table query or
  create map commands to normalize domain names in lookup keys.
  Also, LDAP/*SQL*/etc.  databases aren't created with postmap
  commands.  All this could be another argument to use only xn--mumble
  or to use only UTF8 forms in databases. Again, more secure but a
  royal pain to implement, because postmap doesn't really know if
  a lookup key is a user, a domain, or something else.

* If xn--mumble were to become the canonical form for table lookup,
  then Postfix parent-domain matching will not be broken: where
  buecher.com becomes xn--bcher-kva.com, foo.buecher.com becomes
  foo.xn--bcher-kva.com.
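The last point can be checked with a short Python sketch, taking bücher.com as the UTF-8 name behind xn--bcher-kva.com. The key order below is a simplification and does not claim to match Postfix's actual parent-domain search order; `parent_domain_keys` is a hypothetical helper, and the built-in `idna` codec implements IDNA 2003.

```python
def parent_domain_keys(domain: str):
    """Return the a-label form of a domain followed by its parent domains."""
    a_form = domain.encode("idna").decode("ascii").lower()
    labels = a_form.split(".")
    # Conversion works label by label, so the dots stay where they were.
    return [".".join(labels[i:]) for i in range(len(labels))]


print(parent_domain_keys("foo.bücher.com"))
# ['foo.xn--bcher-kva.com', 'xn--bcher-kva.com', 'com']
```

Because each label is converted independently, stripping leading labels from the a-label form yields the same parents as stripping them from the UTF-8 form and converting afterwards.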

Other things:

* Postfix table queries are case-insensitive. I don't see any attempt
  to implement that for UTF8 addresses. This leaves an ambiguity.

Wietse