On Wed, 30 Oct 2024, 18:45 Jelte Fennema-Nio, <postg...@jeltef.nl> wrote:
>
> On Wed, 30 Oct 2024 at 18:18, Ants Aasma <ants.aa...@cybertec.at> wrote:
> > The idea is great, I have been wanting something like this for a long
> > time. For future proofing it might be a good idea to not require the
> > communicated-waited value to be a LSN.
>
> Yours and Matthias' feedback make total sense I think. From an
> implementation perspective I think there are a few things necessary to
> enable these wider usecases:
> 1. The token should be considered opaque for clients (should be documented)

I disagree. It is critical that a consumer knows what to do with the
output; blindly passing it around is not a valid strategy. In my example
of keeping track of replication slots, the client also has to keep track
of every cluster ID to make it work correctly, as every postgres instance
may only know about a subset of the other PG instances. A client would
have to know how to tell the returned [cluster_id, LSN] pairs apart and
how to merge them into its own view of global progress.

Say you connect to cluster A, which receives changes from clusters X and
Y; cluster B, which receives from X and Z; and cluster C, which receives
from all of X, Y, and Z. Cluster B should ignore [Y_ID, LSN], as keeping
a [cluster_id, LSN] pair around for a cluster it does not know would
leave it open to resource attacks. The client, however, will have to
merge the response from that cluster into its own state, to make sure it
doesn't accidentally "go back in time" when it switches from cluster A or
B to another cluster with the "wait for this minimal replication state"
'token'.

> > Even without sharding LSN might not be a final choice. Right now on
> > the primary the visibility order is not LSN order. So if a connection
> > does synchronous_commit = off commit, the write location is not even
> > going to see the commit. By publishing the end of the commit record it
> > would be better. But I assume at some point we would like to have a
> > consistent visibility order, which quite likely means using something
> > other than LSN as the logical clock.

Or have CSN=LSN-based snapshots on the primary, too, as that would also
solve the unordered visibility issue on the primary, as well as the
unacknowledged read issue.

> I was going to say that the default could probably still be LSN, but
> this makes me doubt that. Is there some other token that we can send
> now that we could "wait" on instead of the LSN, which would work for.
> If not, I think LSN is still probably a good choice as the default.
> Or maybe only as a default in case synchronous_commit != off.

I don't see how we can have anything but LSN as 'wait-for-this'
condition, as everything else could appear out-of-order in the WAL (we
don't allow the record to be modified during
XLogInsert()/ReserveXLogInsertLocation()), and WAL is our one source of
truth for change capture.

PS. I have other complaints about timestamp-based replication/snapshots,
but unless someone thinks otherwise and/or it is made relevant I'll
consider that off-topic.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
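
To make the client-side merge of [cluster_id, LSN] pairs discussed above
more concrete, here is a rough sketch in Python. It is illustration only
and not part of any proposed protocol. The function names, the
dict-of-integers representation, and the idea of filtering the token per
server are all assumptions made for this example. The only PostgreSQL
fact it relies on is that an LSN is a 64-bit position printed as two hex
numbers separated by a slash.

```python
from typing import Dict, Set

# Hypothetical client-side state: cluster ID -> highest LSN observed,
# with each LSN stored as a plain 64-bit integer.
Progress = Dict[str, int]


def parse_lsn(text: str) -> int:
    """Convert the textual 'hi/lo' hex form of an LSN to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def merge_progress(known: Progress, reported: Progress) -> Progress:
    """Take the per-cluster maximum so no origin ever moves backwards."""
    merged = dict(known)  # copy, so the caller's state is not mutated
    for cluster_id, lsn in reported.items():
        merged[cluster_id] = max(merged.get(cluster_id, 0), lsn)
    return merged


def wait_target_for(progress: Progress, known_origins: Set[str]) -> Progress:
    """Keep only the pairs a given server knows about (assumed policy)."""
    return {cid: lsn for cid, lsn in progress.items() if cid in known_origins}


# Cluster A reports progress for X and Y, cluster B for X and Z.
client: Progress = {}
client = merge_progress(client, {"X": parse_lsn("0/3000"), "Y": parse_lsn("0/1500")})
client = merge_progress(client, {"X": parse_lsn("0/2000"), "Z": parse_lsn("0/900")})
# client now holds X at 0/3000, Y at 0/1500 and Z at 0/900: B's older
# view of X did not move the client "back in time".

# What the client would ask cluster B to wait for (B only knows X and Z).
target_for_b = wait_target_for(client, {"X", "Z"})
```

The point of the sketch is that the merge needs the pairs to be readable
by the client: taking a per-cluster maximum is not possible on a token
the client has to treat as opaque.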