Date: Sun, 1 Nov 2015 23:45:36 -0800
From: Erik Fair <f...@netbsd.org>
Message-ID: <56dd2b5f-26b1-40b7-bd84-103d3f255...@netbsd.org>

| So, what did we do by default: allow "_" in hostnames when that's
| explicitly against standard, or not?

Which standard?

There's RFC952, which specifies the format of HOSTS.TXT (from 1985). (The update in Hosts Requirements isn't material, one way or the other, here.)

What else?

In fact, aside from being listed in HOSTS.TXT, what exactly is a hostname? Is it defined somewhere?

As far as the DNS goes, there is exactly one syntax requirement on DNS names, and that pertains to their length. Aside from that, anything goes. There's a recommended syntax to use, but it is just that: recommended.

For particular uses, such as the domain name in an e-mail address, there are more restrictive requirements. Other protocols that pass names as data likely have syntax limitations as well (just what those are tends to depend upon the protocol.)

Observing the DNS recommendation for names is very likely to make the name suitable for any of those uses. That's why it exists.

But the DNS can also be used for all kinds of other purposes -- some of the labels used there are purely internal DNS labels, and have no other use at all (such as the labels found in the data of NS records.) There are no syntax requirements (aside from lengths) on those labels. None.

If you're considering a name from the domain name part of an e-mail address, then it makes perfect sense to syntax check it, according to the rules in rfc5322 (or 2822, or 822, or 733, or whatever even older version you're using...)

If you're sending a name in an HTTP transaction, you'll need to syntax check it to make sure it meets the applicable rules (whatever those are.)

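To make the distinction above concrete, here is a minimal sketch in Python of the two kinds of check: the DNS length limits (63 octets per label and 255 octets for the whole name in wire format, as given in RFC 1035), and the recommended letters-digits-hyphen syntax from RFC952 as relaxed by Hosts Requirements. The function names are made up for illustration, and the sketch assumes plain dot-separated text labels with no escape sequences.

```python
import re

MAX_LABEL_OCTETS = 63   # RFC 1035 limit on a single label
MAX_NAME_OCTETS = 255   # RFC 1035 limit on the whole name in wire format

# Recommended ("LDH") label: letters, digits, hyphens, no hyphen at either end.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def _strip_root_dot(name: str) -> str:
    """Drop one trailing dot (the explicit root), if present."""
    return name[:-1] if name.endswith(".") else name


def dns_length_ok(name: str) -> bool:
    """Hypothetical helper: check only the DNS length limits.

    No character is ever rejected here; only label and name lengths matter.
    """
    if name == ".":
        return True  # the root name itself
    labels = [label.encode("utf-8") for label in _strip_root_dot(name).split(".")]
    if any(not 1 <= len(label) <= MAX_LABEL_OCTETS for label in labels):
        return False
    # Wire format: one length octet per label, plus the terminating root label.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    return wire_length <= MAX_NAME_OCTETS


def follows_recommended_syntax(name: str) -> bool:
    """Hypothetical helper: check the recommended hostname syntax.

    This is the stricter, use-specific check that a caller might apply;
    it is not something the DNS itself requires.
    """
    if name == "." or not dns_length_ok(name):
        return False
    labels = _strip_root_dot(name).split(".")
    return all(_LDH_LABEL.fullmatch(label) for label in labels)
```

With these, a name such as "_dmarc.example.org" passes dns_length_ok() but fails follows_recommended_syntax(), which is the whole point: the second check belongs to whichever application knows what the name is for, not to the lookup.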
But library functions, like the resolver library, and getaddrinfo(), and the older variants, have no idea why the DNS is being consulted. They don't know, generally, what rules are intended to apply, so they cannot possibly legitimately object to anything. (getaddrinfo(), when given a port as well, might guess what rules might apply, but even then it cannot know for sure: if the port is "http" (or 80) or "https" (or 443), then the name being looked up might be the hostname from a URL, which has some syntax constraints, or it might be the name configured as the local proxy server, which does not; it is just a key to use to extract an address from the DNS.)

There is no rational way that those functions can ever validate name syntax, and get it correct.

If you believe any of this is incorrect, please be explicit, and quote the standard that says so. Without knowing just what this mythical standard is that an underscore in a hostname apparently explicitly violates, it is very difficult to refute.

But this requirement is truly an old wives' tale, an urban myth: people tell each other there is such a rule, it sounds plausible, so they believe it, and pass it on to others.

kre

ps: do go read section 11 of rfc2181 while you're pondering all of this.