Sounds good. Thanks for researching this!

On Tue, Apr 29, 2025, 4:37 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> I wrote:
> > Nathan Long <he...@nathanmlong.com> writes:
> >> At least in the case of `inet`, another reason is for accurate
> >> comparison.
> >> IPv4 and IPv6 both have shorthand textual representations; eg `127.1` =
> >> `127.0.0.1`. Text storage would consider these unequal.
>
> > I'm not sure how much we want to press that point, because AFAICS
> > the code we use does not have the same abbreviation rules you are
> > expecting.  Notably, it thinks '127.1' means 127.1.0.0.
> > (We lifted this logic from BIND 20+ years ago, so while it might
> > not entirely agree with practice elsewhere, it has a respectable
> > pedigree and I'm hesitant to mess with it.)
>
> I spent a little while researching this.  BIND stopped including the
> relevant code at all sometime in the past 10 years, apparently feeling
> that POSIX standardization means the libc versions of inet_pton()
> behave sufficiently alike everywhere.  You can still find copies
> of their code at, eg,
>
> https://users.isc.org/~each/doxygen/bind9/inet__pton_8c-source.html
>
> and there are also versions in the NetBSD source tree and probably
> elsewhere.  As far as I can find, none of these will interpret '127.1'
> as 127.0.0.1.  Some will reject it (which is what the POSIX spec for
> the function says to do) and some will interpret it as 127.1.0.0.
>
> Where 127.1 => 127.0.0.1 seems to come from is inet_addr (in POSIX)
> and inet_aton (not in POSIX), which are legacy IPv4-only functions.
> They say (quoting POSIX here):
>
>     Values specified using IPv4 dotted decimal notation take one of
>     the following forms:
>
>     a.b.c.d
>         When four parts are specified, each shall be interpreted as a
>         byte of data and assigned, from left to right, to the four
>         bytes of an Internet address.
>
>     a.b.c
>         When a three-part address is specified, the last part shall be
>         interpreted as a 16-bit quantity and placed in the rightmost
>         two bytes of the network address. This makes the three-part
>         address format convenient for specifying Class B network
>         addresses as "128.net.host".
>
>     a.b
>         When a two-part address is supplied, the last part shall be
>         interpreted as a 24-bit quantity and placed in the rightmost
>         three bytes of the network address. This makes the two-part
>         address format convenient for specifying Class A network
>         addresses as "net.host".
>
>     a
>         When only one part is given, the value shall be stored
>         directly in the network address without any byte
>         rearrangement.
>
>     All numbers supplied as parts in IPv4 dotted decimal notation may
>     be decimal, octal, or hexadecimal.
>
> Frankly, I don't think we want to support this.  Classful network
> addresses have gone the way of the dodo.  And the fact that it'd be
> inconsistent with our traditional interpretation for some non-error
> cases such as '127.1/16'::inet is really problematic.
> Moreover, the option to allow octal input is a true disaster, not
> least because there is plenty of code out there that is willing to
> print IPv4 addresses with zero-padded *decimal* byte values.
>
> So at this point I'm very unexcited about touching the behavior of
> inet_in.  Maybe in another universe it would have acted differently,
> but we have too many years of history with the current behavior.
>
> I do take your point about the inet types helping to standardize
> comparison behavior, but I think we should probably limit the text
> to talking about IPv6 abbreviations.  Maybe like
>
>     these types offer input error checking and specialized
>     operators and functions (see <xref linkend="functions-net"/>).
> +   They also simplify comparisons of inconsistently-written addresses,
> +   such as abbreviated and unabbreviated IPv6 addresses.
>    </para>
>
>                         regards, tom lane
>
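
For anyone following along, the inet_addr forms quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted POSIX rules, not PostgreSQL's inet_in or any libc's actual code. The names parse_part and inet_addr_sketch are made up for this example, and treating a single part as a plain 32-bit big-endian number is my assumption about what "without any byte rearrangement" amounts to in practice.

```python
def parse_part(text):
    """Parse one part as hex (0x prefix), octal (leading 0), or decimal."""
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def inet_addr_sketch(text):
    """Hypothetical helper: apply the quoted a / a.b / a.b.c / a.b.c.d rules."""
    parts = [parse_part(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected 1 to 4 parts")
    *leading, last = parts
    # Each leading part fills one byte; the last part fills the remaining bytes.
    last_bits = 8 * (4 - len(leading))
    if any(not 0 <= p <= 0xFF for p in leading):
        raise ValueError("leading part out of range")
    if not 0 <= last < (1 << last_bits):
        raise ValueError("last part out of range")
    value = last
    for i, p in enumerate(leading):
        value |= p << (24 - 8 * i)
    # Render the 32-bit value as ordinary four-part dotted decimal.
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(inet_addr_sketch("127.1"))      # two parts: last part is a 24-bit quantity
print(inet_addr_sketch("128.1.258"))  # three parts: last part is a 16-bit quantity
print(inet_addr_sketch("010.1.1.1"))  # a leading zero means octal under these rules
```

Under these rules '127.1' comes out as 127.0.0.1, and '010.1.1.1' comes out as 8.1.1.1, which is the octal hazard mentioned above for zero-padded decimal output.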

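The IPv6 comparison point in the proposed doc text can be illustrated outside the database as well. The sketch below uses Python's standard ipaddress module as a stand-in for a typed address column. It is an analogy only, not the PostgreSQL inet type, and the example addresses come from the 2001:db8::/32 documentation range.

```python
import ipaddress

# The same IPv6 address written unabbreviated and abbreviated.
full = "2001:0db8:0000:0000:0000:0000:0000:0001"
short = "2001:db8::1"

# Compared as text, the two spellings differ.
text_equal = full == short

# Compared as parsed addresses, they are the same value.
parsed_equal = ipaddress.ip_address(full) == ipaddress.ip_address(short)

print(text_equal, parsed_equal)
```

Storing the address in a dedicated type gives the second kind of comparison without callers having to normalize the text first.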