Date:        Sun, 1 Nov 2015 23:45:36 -0800
    From:        Erik Fair <f...@netbsd.org>
    Message-ID:  <56dd2b5f-26b1-40b7-bd84-103d3f255...@netbsd.org>

  | So, what did we do by default: allow "_" in hostnames when that's
  | explicitly against standard, or not?

Which standard?

There's RFC 952, which specifies the format of HOSTS.TXT (from 1985).
(The update in Hosts Requirements, RFC 1123, isn't material here, one way
or the other.)

What else?   In fact, aside from being listed in HOSTS.TXT, what exactly
is a hostname?   Is it defined somewhere?

As far as the DNS goes, there is exactly one syntax requirement on DNS
names, and that pertains to their length.   Aside from that, anything goes.
There's a recommended syntax to use, but it is just that: recommended.
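As an illustration, here is a minimal Python sketch of what that single requirement amounts to. It follows RFC 1035: each label can be at most 63 octets, and the whole name can be at most 255 octets in wire form, counting the length octets and the terminating root octet. The helper name dns_length_ok is hypothetical. The sketch assumes the name arrives as plain dotted text with no backslash escapes, and that labels are encoded as UTF-8. The DNS itself treats labels as arbitrary octets.

```python
def dns_length_ok(name: str) -> bool:
    """Hypothetical helper: True if `name` meets the only syntax limits
    the DNS imposes (RFC 1035): labels of 1..63 octets and a wire-form
    name of at most 255 octets.  Assumes plain dotted text, no escapes."""
    # A trailing dot just marks the root label; drop it before splitting.
    if name.endswith("."):
        name = name[:-1]
    labels = name.split(".") if name else []
    # Assumption: UTF-8 octets; the DNS itself allows any octet values.
    encoded = [label.encode("utf-8") for label in labels]
    # A zero-length label may only appear as the root, so "a..b" is invalid.
    if any(len(label) == 0 for label in encoded):
        return False
    if any(len(label) > 63 for label in encoded):
        return False
    # Wire form: one length octet plus the label octets for each label,
    # then one zero octet for the root.
    wire_length = sum(1 + len(label) for label in encoded) + 1
    return wire_length <= 255
```

This check has nothing to say about which characters appear, so a name like "_sip._tcp.example.com" passes it without complaint.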

For particular uses, such as the domain name in an e-mail address,
there are more restrictive requirements.   Other protocols that pass names
as data likely have syntax limitations as well (just what those are
tends to depend upon the protocol.)   Observing the DNS recommendation for
names is very likely to make the name suitable for any of those uses.
That's why it exists.
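For comparison, here is a sketch of the recommended syntax. It follows the "preferred name syntax" of RFC 1035 section 2.3.1, with the RFC 1123 section 2.1 relaxation that lets a label begin with a digit. Under that syntax, labels contain only letters, digits and interior hyphens. The name follows_preferred_syntax is hypothetical, and the sketch checks only the shape of each label, not the total length. A name that fails this check, such as one containing "_", is still a perfectly good DNS name. It simply falls outside the recommendation.

```python
import re

# One label under RFC 1035 section 2.3.1 "preferred name syntax", relaxed
# per RFC 1123 section 2.1 so a label may start with a digit: letters,
# digits and interior hyphens, 1 to 63 characters, no leading/trailing "-".
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def follows_preferred_syntax(name: str) -> bool:
    """Hypothetical helper: True if every label of `name` matches the
    recommended letter/digit/hyphen form.  Total length is not checked."""
    if name.endswith("."):
        name = name[:-1]
    if not name:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))
```

Whether a caller should apply this check depends on what the name is for, and that is exactly the information a generic lookup routine lacks.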

But the DNS can also be used for all kinds of other purposes -- some
of the labels used there are purely internal DNS labels, and have no other
use at all (such as the labels found in the data of NS records.)   There
are no syntax requirements (aside from lengths) on those labels.   None.

If you're considering a name from the domain name part of an e-mail address,
then it makes perfect sense to syntax check it, according to the rules in
rfc5322 (or 2822, or 822, or 733, or whatever even older version you're
using...)   If you're sending a name in an HTTP transaction, you'll need to
syntax check it to make sure it meets the applicable rules (whatever those
are.)

But library functions, like the resolver library, getaddrinfo(), and the
older variants, have no idea why the DNS is being consulted.   They don't
generally know what rules are intended to apply, so they cannot possibly
legitimately object to anything.   When also given a port, getaddrinfo()
might guess what rules apply, but even then it cannot know for sure.   If
the port is "http" (or 80) or "https" (or 443), the name being looked up
might be the hostname from a URL, which has some syntax constraints, or it
might be the name configured as the local proxy server, which does not: it
is just a key used to extract an address from the DNS.

There is no rational way that those functions can ever validate name
syntax, and get it correct.

If you believe any of this is incorrect, please be explicit and quote the
standard that says so.   Without knowing just what this mythical standard is
that an underscore in a hostname apparently explicitly violates, it is very
difficult to refute.

But this requirement is truly an old wives' tale - an urban myth - people
tell each other there is such a rule, it sounds plausible, so they believe
it, and pass it on to others.

kre

ps: do go read section 11 of rfc2181 while you're pondering all of this.
