On 12/23/21 15:42, Fujii Masao wrote:


> On 2021/12/23 3:49, Tomas Vondra wrote:
>> Attached is a patch tweaking WAL logging - in wal_level=minimal we do the same thing as now, in higher levels we log every sequence fetch.
>
> Thanks for the patch!
>
> With the patch, I found that the regression test for sequences failed.
>
> +            fetch = log = fetch;
>
> This should be "log = fetch"?
>
> On second thought: a sequence has never guaranteed that a value already returned by nextval() will not be returned by a subsequent nextval() after crash recovery. That is, nextval() may return the same value across crash recovery. Is this understanding right? For example, this can happen if the server crashes after nextval() returned the value but before the WAL for the sequence was flushed to permanent storage.

I think the important step is commit. We don't guarantee anything for changes in uncommitted transactions. If you do nextval in a transaction and the server crashes before COMMIT flushes the WAL, then yes, nextval may generate the same value again. But after commit that is not OK - it must not happen.
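
To illustrate, a minimal sketch of that scenario (hypothetical sequence "s", assuming the default pre-fetch of SEQ_LOG_VALS = 32 values per WAL record):

    CREATE SEQUENCE s;
    SELECT nextval('s');  -- returns 1; writes a WAL record reserving
                          -- values up to 33, but that record may still
                          -- sit in wal_buffers rather than on disk
    -- server crashes before the WAL record is flushed; after recovery:
    SELECT nextval('s');  -- may return 1 again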

> So it's not a bug that a sync standby may return the same value as already returned in the primary because the corresponding WAL has not been replicated yet, isn't it?


No, I don't think so. Once the COMMIT happens (and gets confirmed by the sync standby), it should be possible to fail over to the sync replica without losing any data from committed transactions. Generating duplicate values is a clear violation of that.

IMHO the fact that we allow a transaction to commit (even just locally) without flushing all the WAL it depends on is clearly a data loss bug.

> BTW, if the returned value is stored in the database, the same value is guaranteed not to be returned again after a server crash or by the sync standby, because in that case the WAL of the transaction storing that value is flushed and replicated.


True, assuming the table is WAL-logged etc. I agree the issue may affect only a fairly small fraction of workloads, because most people use sequences to generate values for inserts etc.
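
For example, a sketch of the pattern that is safe (hypothetical table and sequence names):

    BEGIN;
    INSERT INTO orders (id)                -- hypothetical table
      VALUES (nextval('orders_id_seq'));   -- hypothetical sequence
    COMMIT;  -- flushing WAL up to the commit record also flushes the
             -- earlier sequence WAL record (it has a lower LSN), and a
             -- sync standby confirms it, so the value can't repeat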

>> So I think this makes it acceptable / manageable. Of course, this means the values are much less monotonic (across backends), but I don't think we really promised that. And I doubt anyone is really using sequences like this (just nextval) in performance-critical use cases.
>
> I think that this approach is not acceptable to some users. So, if we actually adopt WAL-logging every sequence fetch, how about also exposing SEQ_LOG_VALS as a reloption for sequences? Then those who want to log every sequence fetch can set the reloption to 0, while those who prefer the current behavior, in spite of the risk we're discussing in this thread, can set it to 32 as it is now, for example.
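
For reference, that proposal would amount to something like this (hypothetical syntax - no such reloption exists today):

    ALTER SEQUENCE s SET (seq_log_vals = 0);   -- hypothetical: WAL-log every fetch
    ALTER SEQUENCE s SET (seq_log_vals = 32);  -- hypothetical: keep current batching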


I think it'd be worth explaining why you think it's not acceptable?

I've demonstrated that the impact on regular workloads (i.e. with other changes that write stuff to WAL) is not measurable, and enabling sequence caching eliminates most of the overhead for the rare corner-case workloads if needed. It does generate a bit more WAL, but sequence WAL records are pretty tiny.

I'm opposed to adding reloptions that affect correctness - it just seems like a bad idea to me. Moreover, setting CACHE for a sequence does almost the same thing - if you set CACHE 32, we only generate WAL once every 32 increments. The only difference is that this cache is not shared between backends, so one backend will generate 1, 2, 3, ... and another backend will generate 33, 34, 35, ... etc. I don't think that's a problem, because if you want strictly monotonic / gap-less sequences you can't use our sequences anyway. Yes, with short-lived backends this may consume sequence values faster, but short-lived backends are expensive anyway and overflowing a bigserial is still unlikely.
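
A minimal sketch of that alternative (existing syntax; the interleaving assumes two concurrent backends on an example sequence "s"):

    ALTER SEQUENCE s CACHE 32;  -- each backend pre-allocates 32 values
    -- backend A: SELECT nextval('s');  returns 1, 2, 3, ...
    -- backend B: SELECT nextval('s');  returns 33, 34, 35, ...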

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

