On Thu, 2026-03-05 at 14:59 +0000, Evgeny Kuzin wrote:
> We run PostgreSQL clusters with streaming replication. After a failover,
> the old primary becomes a standby and vice versa. The challenge is: how do
> clients find the new primary?
>
> Current options:
>    1. Update DNS on every failover - operationally complex, TTL delays,
>       requires automation

Your proposal would also suffer from TTL delays in the case of a cluster 
reconfiguration.

>    2. Consul/etcd - adds operational complexity and another failure domain
>    3. Multiple hosts in connection string - requires application changes
>       when cluster topology changes (e.g., adding a new standby)
>
> The proposed approach:
>  * Single A-record (db.internal) pointing to all cluster member IPs
>  * Clients connect with 
>    host=db.internal target_session_attrs=read-write
>  * libpq tries each IP until it finds the primary
>
> IIUC this is how JDBC's targetServerType=primary works - it iterates
> through all resolved addresses. The "useless connection attempts" are
> actually the feature: it's probing to find the right server, same as when
> you specify multiple hosts explicitly. The only difference from
> host=pg1,pg2,pg3 is that DNS provides the list instead of the connection
> string. From libpq's perspective, why should it matter where the address
> list came from?

I see the point of your proposal.
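
For concreteness, the probing described above could be sketched roughly as
follows. This is a minimal Python illustration, not libpq code: the names
resolve_all, find_primary and is_primary are hypothetical, and is_primary
stands in for "connect, then check that the session is read-write".

```python
import socket
from typing import Callable, Optional


def resolve_all(host: str, port: int) -> list[str]:
    """Return every distinct IP address the name resolves to, in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen: list[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen


def find_primary(
    host: str,
    port: int,
    is_primary: Callable[[str, int], bool],
    resolve: Callable[[str, int], list[str]] = resolve_all,
) -> Optional[str]:
    """Probe each resolved address in turn; return the first read-write one."""
    for ip in resolve(host, port):
        # is_primary is a hypothetical callback, not a libpq function.
        if is_primary(ip, port):
            return ip
    return None
```

Every address that is probed before the primary is found costs one
connection attempt, which is the price of not listing the hosts explicitly.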

One example of what Tom worries about is "localhost" resolving to both
"127.0.0.1" and "::1", a very common case.  With the proposed change, any
connection attempt to "localhost" that fails would now take twice as long
to fail.  Also, if the problem is authentication, the server would perform
two authentication attempts.  That is a clear regression that may affect
many people.
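
A toy model of that cost, in Python.  It only counts attempts under the two
policies and is not libpq's actual control flow; failed_attempts and its
try_all_addresses parameter are made up for illustration.

```python
def failed_attempts(addresses, try_all_addresses):
    """Count attempts made when every address rejects the connection.

    try_all_addresses=False models giving up on a host name after the first
    address that rejects the client; True models continuing through every
    address the name resolved to.
    """
    attempts = 0
    for _ip in addresses:
        attempts += 1  # one connection (and possibly authentication) attempt
        if not try_all_addresses:
            break
    return attempts


localhost = ["127.0.0.1", "::1"]
print(failed_attempts(localhost, try_all_addresses=False))  # 1
print(failed_attempts(localhost, try_all_addresses=True))   # 2
```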

The question is whether the overall benefits of your proposal (which
certainly makes sense in a setup like you describe) would be worth a
performance and resource usage regression like the one I described above.
Or can you see a way to modify your approach so that that problem can be
avoided?

Yours,
Laurenz Albe

