Sounds good. Thanks for researching this!

On Tue, Apr 29, 2025, 4:37 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
> > Nathan Long <he...@nathanmlong.com> writes:
> >> At least in the case of `inet`, another reason is for accurate
> >> comparison. IPv4 and IPv6 both have shorthand textual representations;
> >> eg `127.1` = `127.0.0.1`. Text storage would consider these unequal.
> >
> > I'm not sure how much we want to press that point, because AFAICS
> > the code we use does not have the same abbreviation rules you are
> > expecting. Notably, it thinks '127.1' means 127.1.0.0.
> >
> > (We lifted this logic from BIND 20+ years ago, so while it might
> > not entirely agree with practice elsewhere, it has a respectable
> > pedigree and I'm hesitant to mess with it.)
>
> I spent a little while researching this. BIND stopped including the
> relevant code at all sometime in the past 10 years, apparently feeling
> that POSIX standardization means the libc versions of inet_pton()
> behave sufficiently alike everywhere. You can still find copies
> of their code at, eg,
>
> https://users.isc.org/~each/doxygen/bind9/inet__pton_8c-source.html
>
> and there are also versions in the NetBSD source tree and probably
> elsewhere. As far as I can find, none of these will interpret '127.1'
> as 127.0.0.1. Some will reject it (which is what the POSIX spec for
> the function says to do) and some will interpret it as 127.1.0.0.
>
> Where 127.1 => 127.0.0.1 seems to come from is inet_addr (in POSIX)
> and inet_aton (not in POSIX), which are legacy IPv4-only functions.
> They say (quoting POSIX here):
>
>     Values specified using IPv4 dotted decimal notation take one of
>     the following forms:
>
>     a.b.c.d
>         When four parts are specified, each shall be interpreted as a
>         byte of data and assigned, from left to right, to the four
>         bytes of an Internet address.
>
>     a.b.c
>         When a three-part address is specified, the last part shall be
>         interpreted as a 16-bit quantity and placed in the rightmost
>         two bytes of the network address. This makes the three-part
>         address format convenient for specifying Class B network
>         addresses as "128.net.host".
>
>     a.b
>         When a two-part address is supplied, the last part shall be
>         interpreted as a 24-bit quantity and placed in the rightmost
>         three bytes of the network address. This makes the two-part
>         address format convenient for specifying Class A network
>         addresses as "net.host".
>
>     a
>         When only one part is given, the value shall be stored
>         directly in the network address without any byte
>         rearrangement.
>
>     All numbers supplied as parts in IPv4 dotted decimal notation may
>     be decimal, octal, or hexadecimal.
>
> Frankly, I don't think we want to support this. Classful network
> addresses have gone the way of the dodo. And the fact that it'd be
> inconsistent with our traditional interpretation for some non-error
> cases such as '127.1/16'::inet is really problematic.
> Moreover, the option to allow octal input is a true disaster, not
> least because there is plenty of code out there that is willing to
> print IPv4 addresses with zero-padded *decimal* byte values.
>
> So at this point I'm very unexcited about touching the behavior of
> inet_in. Maybe in another universe it would have acted differently,
> but we have too many years of history with the current behavior.
>
> I do take your point about the inet types helping to standardize
> comparison behavior, but I think we should probably limit the text
> to talking about IPv6 abbreviations. Maybe like
>
>      these types offer input error checking and specialized
>      operators and functions (see <xref linkend="functions-net"/>).
> +    They also simplify comparisons of inconsistently-written addresses,
> +    such as abbreviated and unabbreviated IPv6 addresses.
>     </para>
>
> regards, tom lane
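
For anyone reading this thread later, here is a rough Python sketch of the legacy inet_addr rules quoted above, since the two-part and octal cases are easier to see in code. It is an illustration only: it is not PostgreSQL's inet_in, not BIND's inet_pton, and not any libc's implementation. The function name `legacy_inet_addr` is made up for this example, and the parsing details (how prefixes and out-of-range parts are rejected) are my own assumptions about a reasonable reading of the quoted text.

```python
def legacy_inet_addr(text: str) -> str:
    """Parse text per the POSIX inet_addr() rules quoted above (illustrative only)."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1 to 4 parts, got {len(parts)}")

    def parse_part(part: str) -> int:
        # C-style bases: a 0x prefix is hexadecimal, a leading 0 is octal, else decimal.
        if part[:2].lower() == "0x":
            digits, base = part[2:], 16
        elif len(part) > 1 and part.startswith("0"):
            digits, base = part[1:], 8
        else:
            digits, base = part, 10
        allowed = "0123456789abcdef"[:base]
        if not digits or any(ch not in allowed for ch in digits.lower()):
            raise ValueError(f"invalid part: {part!r}")
        return int(digits, base)

    values = [parse_part(p) for p in parts]
    # Every part but the last is one byte; the last part fills the remaining bytes.
    tail_bytes = 4 - (len(values) - 1)
    if any(v > 0xFF for v in values[:-1]) or values[-1] >= 1 << (8 * tail_bytes):
        raise ValueError(f"part out of range in {text!r}")

    address = 0
    for v in values[:-1]:
        address = (address << 8) | v
    address = (address << (8 * tail_bytes)) | values[-1]
    # Render the 32-bit result as ordinary four-part dotted decimal.
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(legacy_inet_addr("127.1"))      # 127.0.0.1
print(legacy_inet_addr("128.1.258"))  # 128.1.1.2
print(legacy_inet_addr("010.0.0.1"))  # 8.0.0.1
```

Under these rules '127.1' comes out as 127.0.0.1, whereas the thread above notes that inet_in reads it as 127.1.0.0. The last call shows the octal hazard Tom describes: a zero-padded decimal byte such as 010 is silently read as 8.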