Re: sequences vs. synchronous replication

Tomas Vondra Thu, 23 Dec 2021 10:50:44 -0800

On 12/23/21 15:42, Fujii Masao wrote:

On 2021/12/23 3:49, Tomas Vondra wrote:
Attached is a patch tweaking WAL logging - in wal_level=minimal we dothe same thing as now, in higher levels we log every sequence fetch.
Thanks for the patch!

With the patch, I found that the regression test for sequences failed.

+            fetch = log = fetch;

This should be "log = fetch"?
On second thought, originally a sequence doesn't guarantee that thevalue already returned by nextval() will never be returned by subsequentnextval() after the server crash recovery. That is, nextval() may returnthe same value across crash recovery. Is this understanding right? Forexample, this case can happen if the server crashes after nextval()returned the value but before WAL for the sequence was flushed to thepermanent storage.

I think the important step is commit. We don't guarantee anything forchanges in uncommitted transactions. If you do nextval in a transactionand the server crashes before the WAL gets flushed before COMMIT, thenyes, nextval may generate the same nextval again. But after commit thatis not OK - it must not happen.

So it's not a bug that sync standby may return the same value as
already returned in the primary because the corresponding WAL has not
been replicated yet, isn't it?

No, I don't think so. Once the COMMIT happens (and gets confirmed by thesync standby), it should be possible to failover to the sync replicawithout losing any data in committed transaction. Generating duplicatevalues is a clear violation of that.

IMHO the fact that we allow a transaction to commit (even just locally)without flushing all the WAL it depends on is clearly a data loss bug.

BTW, if the returned value is stored in database, the same value isguaranteed not to be returned again after the server crash or by syncstandby. Because in that case the WAL of the transaction storing thatvalue is flushed and replicated.

True, assuming the table is WAL-logged etc. I agree the issue may beaffecting a fairly small fraction of workloads, because most people usesequences to generate data for inserts etc.

So I think this makes it acceptable / manageable. Of course, thismeans the values are much less monotonous (across backends), but Idon't think we really promised that. And I doubt anyone is reallyusing sequences like this (just nextval) in performance critical usecases.
I think that this approach is not acceptable to some users. So, if weactually adopt WAL-logging every sequence fetch, also how about exposingSEQ_LOG_VALS as reloption for a sequence? If so, those who want to logevery sequence fetch can set this SEQ_LOG_VALS reloption to 0. OTOH,those who prefer the current behavior in spite of the risk we'rediscussing at this thread can set the reloption to 32 like it is fornow, for example.


I think it'd be worth explaining why you think it's not acceptable?

I've demonstrated the impact on regular workloads (with other changesthat write stuff to WAL) is not measurable, and enabling sequencecaching eliminates most of the overhead for the rare corner caseworkloads if needed. It does generate a bit more WAL, but the sequenceWAL records are pretty tiny.

I'm opposed to adding relooptions that affect correctness - it justseems like a bad idea to me. Moreover setting the CACHE for a sequencedoes almost the same thing - if you set CACHE 32, we only generate WALonce every 32 increments. The only difference is that this cache is notshared between backends, so one backend will generate 1,2,3,... andanother backend will generate 33,34,35,... etc. I don't think that's aproblem, because if you want strictly monotonous / gap-less sequencesyou can't use our sequences anyway. Yes, with short-lived backends thismay consume the sequences faster, but well - short-lived backends areexpensive anyway and overflowing bigserial is still unlikely.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: sequences vs. synchronous replication

Reply via email to