On Wednesday, June 4, 2014 11:16:51 PM CEST, Wietse Venema wrote:
* Postfix table queries are case-insensitive. I don't see any attempt
  to implement that for UTF8 addresses. This leaves an ambiguity.

I looked at this now.

As I read the code, tables mostly map to lower case and then do a binary comparison. The mysql and pgsql tables may additionally use the database server's ilike operation. Finally, lowercase() maps U to u, but leaves 0xC0 as 0xC0, even if the Postfix server runs in a locale where the lowercase form of that is 0xE0.
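A minimal sketch of the behaviour described above, assuming an ASCII-only fold. The function name ascii_lowercase() is hypothetical and this is not Postfix's actual lowercase() code; it only illustrates why 'U' becomes 'u' while the byte 0xC0 passes through unchanged, regardless of locale.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Postfix's lowercase()): fold ASCII A-Z to a-z
 * in place and leave every other byte, including 0x80-0xFF, untouched.
 */
static char *ascii_lowercase(char *s)
{
    unsigned char *cp;

    assert(s != NULL);
    for (cp = (unsigned char *) s; *cp != 0; cp++)
        if (*cp >= 'A' && *cp <= 'Z')
            *cp += 'a' - 'A';
    return (s);
}
```

Applied to the two-byte string 'U', 0xC0 this yields 'u', 0xC0: the ASCII letter is folded and the high byte is not.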

Is that correct?

I can provide a supplementary patch that provides case insensitivity for unicode. It's easy, but there are several ways to do it, and I don't know which you prefer.

1. Toupper/tolower in Postfix, with the usual table. This adds the bulk of a table and is language-independent but imperfect. The well-known problem is i/ı. (The lowercase("I") equivalent is "ı" in Turkish and a handful of other locales.)

2a. Toupper/tolower that call out to ICU if EAI is enabled and there's any non-ASCII in the argument. This slows down toupper()/tolower(), but Postfix escapes having the table, and ICU devotes considerable effort to correctness. It's easy to compose the string, too (composition means using å instead of "a" + "ring above").

2b. Ditto, but calling a language-sensitive function in ICU, so that i is equal to İ if the Postfix server runs in one of those locales. I'm unhappy about this alternative: a Swiss service provider may well service both Kazakh and Korean users, and how should that service provider's Postfix be configured?

3. Switching to titlecase. A bigger change. Titlecase is a form in which case differences are erased; in principle it's equal to neither uppercase nor lowercase. It's only usable for implementing case-insensitive comparison/lookup using fast binary comparison.
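As a rough illustration of option 1, a table-driven, language-independent lowercase mapping over decoded code points could look like the sketch below. The names case_pair, case_table and simple_tolower() are made up for this example, and the table is a three-entry excerpt; a real one would be generated from the Unicode data files and be far larger.

```c
#include <stddef.h>
#include <stdint.h>

/* One simple upper-to-lower mapping between two code points. */
struct case_pair {
    uint32_t upper;
    uint32_t lower;
};

/* Illustrative excerpt only, sorted by .upper for binary search. */
static const struct case_pair case_table[] = {
    {0x00C0, 0x00E0},           /* À -> à */
    {0x00C5, 0x00E5},           /* Å -> å */
    {0x0410, 0x0430},           /* Cyrillic А -> а */
};

/*
 * Map one code point to lower case without consulting the locale.
 * ASCII takes a fast path; everything else is looked up in the table,
 * and code points with no table entry are returned unchanged.
 */
static uint32_t simple_tolower(uint32_t cp)
{
    size_t  lo = 0;
    size_t  hi = sizeof(case_table) / sizeof(case_table[0]);

    if (cp < 0x80)
        return ((cp >= 'A' && cp <= 'Z') ? cp + ('a' - 'A') : cp);
    while (lo < hi) {
        size_t  mid = lo + (hi - lo) / 2;

        if (case_table[mid].upper == cp)
            return (case_table[mid].lower);
        if (case_table[mid].upper < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (cp);
}
```

Because the mapping ignores the locale, "I" always becomes "i" and never "ı" (U+0131), which is exactly the imperfection noted under option 1.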

In my opinion the change to titlecase isn't worth it. There aren't enough problems with lowercase() to justify such a sweeping change. Also keeping lower case allows compiled tables to survive upgrades/downgrades.

I'm neutral regarding 1 and 2a. If you'll tell me what you prefer I'll write a patch and test that it matches another implementation.

Arnt
