I spent some more time digging into this to make sure I was not overlooking 
something fundamental in the resolver behavior here.

I agree that POSIX does not specify getaddrinfo() as a DNS RRset API. It is a 
socket address selection API, and the specification leaves room for address 
family filtering, ordering, mapped addresses, AI_ADDRCONFIG, and other system 
policy decisions [1].

But I think there is an important distinction between:

- getaddrinfo() is not specified to preserve exact DNS semantics
- getaddrinfo() will normally lose arbitrary DNS answers

The second conclusion does not seem to follow from the first one.

In the specific A/AAAA case discussed here, the Linux libc implementations I 
checked generally do expose the full usable RRset returned by the resolver 
unless there is an explicit policy reason not to (AI_ADDRCONFIG, v4-mapped 
handling, requested address family, resolver policy, etc.).
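
A quick way to check this on a given system is to walk the whole list that 
getaddrinfo() returns and compare it with what the resolver reports. The 
following is only a minimal sketch: list_addresses() is a hypothetical helper 
name, not anything in libpq, and it assumes a POSIX system with IPv4/IPv6 
address families only.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/*
 * list_addresses() is a hypothetical helper, not a libpq function.
 * It copies up to max textual addresses returned by getaddrinfo() into out
 * and returns how many it stored, or -1 if resolution failed.
 */
int
list_addresses(const char *host, int flags,
               char out[][INET6_ADDRSTRLEN], int max)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *ai;
    int         n = 0;

    assert(host != NULL && out != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* accept both A and AAAA results */
    hints.ai_socktype = SOCK_STREAM;    /* avoid one entry per socket type */
    hints.ai_flags = flags;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    /* Walk the full linked list; nothing here stops at the first entry. */
    for (ai = res; ai != NULL && n < max; ai = ai->ai_next)
    {
        const void *addr;

        if (ai->ai_family == AF_INET)
            addr = &((struct sockaddr_in *) ai->ai_addr)->sin_addr;
        else if (ai->ai_family == AF_INET6)
            addr = &((struct sockaddr_in6 *) ai->ai_addr)->sin6_addr;
        else
            continue;           /* skip families this sketch cannot print */

        if (inet_ntop(ai->ai_family, addr, out[n], INET6_ADDRSTRLEN) != NULL)
            n++;
    }

    freeaddrinfo(res);
    return n;
}
```

Calling this with flags set to 0 for a multi-address host name and comparing 
the count with the A and AAAA records shown by a DNS lookup tool would show 
whether any addresses are being dropped on that system.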

This behavior is also consistent with DNS resolver semantics themselves. 
RFC 1035 defines the TC (truncation) bit, and RFC 1123 says a resolver should 
retry over TCP when a response is truncated [2][3]. In the ordinary DNS case, I 
would therefore not expect a conforming resolver stack to silently hand libc an 
arbitrary partial RRset.
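
For reference, the truncation check itself is tiny. The sketch below assumes a 
raw DNS message in wire format as laid out in RFC 1035 section 4.1.1; 
dns_response_truncated() is a hypothetical name used only for illustration, 
not a function from any resolver library.

```c
#include <assert.h>
#include <stddef.h>

#define DNS_HEADER_LEN  12      /* fixed header size in RFC 1035 */
#define DNS_FLAG_TC     0x02    /* TC bit within the third header byte */

/*
 * Hypothetical helper: returns 1 if the TC bit is set, 0 if it is clear,
 * and -1 if the buffer is too short to contain a DNS header.
 */
int
dns_response_truncated(const unsigned char *msg, size_t len)
{
    assert(msg != NULL);

    if (len < DNS_HEADER_LEN)
        return -1;

    /* Byte 2 holds QR, Opcode, AA, TC and RD, from high bit to low bit. */
    return (msg[2] & DNS_FLAG_TC) ? 1 : 0;
}
```

A resolver that sees a result of 1 on a UDP response is expected to repeat the 
query over TCP rather than pass the partial answer upward.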

The cases where getaddrinfo() may legitimately omit addresses are mostly the 
same cases where connection behavior is already policy-sensitive anyway:

- mixed IPv4/IPv6 environments
- AI_ADDRCONFIG filtering
- v4-mapped handling
- resolver policy rules
- non-DNS NSS sources

Those are not really random losses of DNS data. They are explicit host 
resolution and connectivity policy decisions.
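
The policy nature of these omissions can be seen without involving DNS at all. 
The sketch below uses a numeric literal, so no lookup happens, and varies only 
the caller's hints. resolve_literal() is a hypothetical helper name, and the 
behavior described is what I would expect from the POSIX text, not something 
verified on every libc.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/*
 * Hypothetical helper: resolve a numeric address literal with the given
 * family and extra flags. Returns the getaddrinfo() result code; on success
 * the first returned address is written to out in textual form.
 */
int
resolve_literal(const char *literal, int family, int flags,
                char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    const void *addr;
    int         rc;

    assert(literal != NULL && out != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;               /* caller's family policy */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | flags;    /* never query DNS */

    rc = getaddrinfo(literal, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    if (res->ai_family == AF_INET)
        addr = &((struct sockaddr_in *) res->ai_addr)->sin_addr;
    else if (res->ai_family == AF_INET6)
        addr = &((struct sockaddr_in6 *) res->ai_addr)->sin6_addr;
    else
    {
        freeaddrinfo(res);
        return EAI_FAMILY;      /* family this sketch cannot print */
    }

    if (inet_ntop(res->ai_family, addr, out, (socklen_t) outlen) == NULL)
        rc = EAI_SYSTEM;        /* output buffer too small, see errno */

    freeaddrinfo(res);
    return rc;
}
```

With the same IPv4 literal, asking for AF_INET6 alone should fail, while adding 
AI_V4MAPPED should yield a v4-mapped IPv6 address. The input is identical in 
both calls; only the requested policy differs.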

What concerns me more about introducing a DNS client inside libpq is that we 
would no longer be following the same resolver path as the rest of the system. 
That is user-visible behavior, not merely an implementation detail.

For example, it risks bypassing or changing behavior around:

- /etc/hosts
- nsswitch.conf
- mDNS
- LDAP integration
- systemd-resolved policy
- split DNS
- VPN-specific resolver routing
- container/runtime-specific resolution

The current behavior may not be theoretically perfect from a DNS abstraction 
perspective, but it is operationally well understood and consistent with 
existing Unix networking expectations.

My concern is that we may be trading a relatively narrow theoretical weakness 
in the getaddrinfo() contract for a much broader compatibility and behavioral 
change in existing deployments.

[1] https://pubs.opengroup.org/onlinepubs/9799919799/functions/getaddrinfo.html
[2] https://datatracker.ietf.org/doc/html/rfc1035
[3] https://datatracker.ietf.org/doc/html/rfc1123#section-6.1.3.2
