On Mon, Apr 4, 2016 at 6:41 AM, Simon Riggs <si...@2ndquadrant.com> wrote: > I'm not very excited about this patch. Too much code for so little benefit > and fragile too. > > I'm not even sure what definition of "meaningful progress" is useful. If we > commit this, a similar bug could be filed for many similar WAL record types.
Up to now, "progress" means the addition of WAL records by backends, "meaningful" refers to records that impact the decision of skipping checkpoints or not. > Since we zero out WAL files specifically to allow them to be compressed in > the archive, this patch saves a few bytes in the archive. Seems like we > could fake up some WAL files if needed for restore with an external tool, if > we really care. This assumes that most deployments compress the segments when archiving them for simplicity. I would believe that the reverse is true, most of archive command processes do not compress them. And I am pretty sure that some cloud deployments have as well pricing models depending on the number of accesses done on an instance. > Since we agree it wouldn't be backpatched, its pretty much saying we don't > care. Seeing the flow of this thread, that's not completely true. Andres has been arguing for no backpatch, Amit and I not, because the checkpoint skip logic is incorrect. The logic to decide is a checkpoint should be skipped or not is still broken on back-branches for wal_level = hot_standby. That's related to the former complain reported here. > A promising approach would be to just skip logging the RUNNING_XACTS record > if there are no xacts running, which is the case we care about here (no > progress => no xacts). HS startup can only happen at a checkpoint, so if > there is no WAL file there is no checkpoint, so we don't need the > RUNNING_XACTS record to be written. That's a short patch. I am not sure if that's actually simpler... Do you mean here that we should save the latest transaction ID at the moment a snapshot is taken and skip the next snapshot if TXID has not progressed? Say in LogStandbySnapshot() when a snapshot is taken, we allocate the last XID with ReadNewTransactionId() in XlogCtlData, then compare it the next time a standby snapshot is taken. With such a logic, you need to do something like that: - Save the last transaction XID when RUNNING_XACTS has been logged in shmem - Save in shmem as well if the previous RUNNING_XACTS has overflowed its subxids or not. If it has overflowed, we should log RUNNING_XACTS next time LogStandbySnapshot is logged to prevent the case of pending snapshots at recovery. AFAIK, standby snapshots with no xacts running are still useful at recovery for two things to accelerate a hot standby initialization: - If the previous snapshot overflowed its subxids, the next empty snapshot can help out clear the situation. That's the case where the bgwriter snapshot is helpful actually, because there is no need to wait for the next checkpoint record to be replayed to initialize a hot standby. - An empty snapshot after checkpoint is mandatory, even if it is empty, no? Knowing that, the next checkpoint cannot be skipped because the logic mentioned above in xlog.c to decide if a checkpoint should be skipped or not ignores the fact that a standby snapshot has been logged. It bases its logic on the sole presence of a checkpoint record, so this will never work. I mean this check of course: if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_FORCE)) == 0) { if (prevPtr == ControlFile->checkPointCopy.redo && prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE) { WALInsertLockRelease(); LWLockRelease(CheckpointLock); END_CRIT_SECTION(); return; } } The approach introducing the concept of WAL progress addresses the problem of unnecessary checkpoints and of unnecessary standby snapshots. If we take the problem only from the angle of RUNNING_XACTS the current logic used to check if a checkpoint should be skipped or not is not addressed. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers