On Thu, Jun 05, 2014 at 05:18:48PM +0200, Arnt Gulbrandsen wrote:

> On Thursday, June 5, 2014 4:32:52 PM CEST, Viktor Dukhovni wrote:
> >Domains passed to lookup tables and match lists need to be in
> >a-label form.
>
> That would make pcre almost impossible and mysql and pgsql lookups rather
> inconvenient.

What's the problem with the canonical representation of the domain,
exactly as it appears on the wire in DNS, in certificate DNS altnames, ...?

> The a-label form of blåbærsyltetøy is
> xn--blbrsyltety-y8ao3x. Matching the PCRE /.*syltetøy.*/ in a-label form
> would be inconvenient, perhaps impossible.

Regular expressions on partial DNS labels are not that useful anyway.
Generally one just wants all the sub-domains of a particular domain.
Sometimes one wants to filter cable-modem/DSL PTR records; otherwise
I'm not losing sleep over partial DNS label regexps.

> Postgres and Mysql have builtin support for UTF8 strings so mysql/pgsql
> tables can use e.g. the ilike operator, but they do not support strings
> composed from a-labels. Here's a pgsql concoction to match usernames,
> optionally with subaddresses:

Nothing is lost when the domain name is in a-label form. The localpart
remains Unicode, and one still needs some sort of UTF-8 -> UTF-8
lower-case operator that operates correctly on ASCII. Frankly, applying
lowercase() to just the ASCII octets works fine in this situation,
provided the domain is already in a-label form. Unicode email address
localparts would then be case-sensitive in their non-ASCII octets, which
is not the end of the world.

-- 
	Viktor.
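
A minimal sketch of the normalization described above, in Python, may make
the idea concrete. It is an illustration only, not Postfix code: the function
names are hypothetical, and it assumes Python's built-in "idna" codec, which
implements the older IDNA 2003 rules, is an acceptable way to produce
a-labels. The domain is converted to a-label form, and only the ASCII octets
of the localpart are lower-cased.

```python
def ascii_lower(text: str) -> str:
    """Lower-case only the ASCII letters of a UTF-8 string.

    bytes.lower() maps A-Z to a-z and leaves every octet >= 0x80 alone,
    so non-ASCII characters keep their original case.
    """
    return text.encode("utf-8").lower().decode("utf-8")


def to_alabel(domain: str) -> str:
    """Convert a domain to a-label form (assumption: IDNA 2003 via the
    built-in "idna" codec). Pure-ASCII labels pass through the codec
    unchanged, so the result is lower-cased afterwards."""
    return domain.encode("idna").decode("ascii").lower()


def normalize_address(address: str) -> str:
    """Hypothetical helper: normalize localpart@domain for table lookups."""
    # Split on the last "@" so a quoted "@" in the localpart is not used.
    localpart, domain = address.rsplit("@", 1)
    return ascii_lower(localpart) + "@" + to_alabel(domain)
```

With this approach the lookup key for the domain is plain ASCII, so ordinary
case-insensitive matching in PCRE, MySQL or PostgreSQL works on it, while
the localpart stays case-sensitive in its non-ASCII characters.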