On 9/14/21, 7:23 AM, "Dipesh Pandit" <[email protected]> wrote:
> I agree that when we are creating a .ready file we should compare
> the current .ready file with the last .ready file to check if this file is
> created out of order. We can store the state of the last .ready file
> in shared memory and compare it with the current .ready file. I
> believe that archiver specific shared memory area can be used
> to store the state of the last .ready file unless I am missing
> something and this needs to be stored in a separate shared
> memory area.
>
> With this change, we have the flexibility to move the current archiver
> state out of shared memory and keep it local to archiver. I have
> incorporated these changes and updated a new patch.
I wonder if this can be simplified even further. If we don't bother
trying to catch out-of-order .ready files in XLogArchiveNotify() and
just depend on the per-checkpoint/restartpoint directory scans, we can
probably remove lastReadySegNo from archiver state completely.
+ /* Force a directory scan if we are archiving anything but a regular
+ * WAL file or if this WAL file is being created out-of-order.
+ */
+ if (!IsXLogFileName(xlog))
+ PgArchForceDirScan();
+ else
+ {
+ TimeLineID tli;
+ XLogSegNo last_segno;
+ XLogSegNo this_segno;
+
+ last_segno = PgArchGetLastReadySegNo();
+ XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size);
+
+ /*
+ * Force a directory scan in case if this .ready file created
out of
+ * order.
+ */
+ last_segno++;
+ if (last_segno != this_segno)
+ PgArchForceDirScan();
+
+ PgArchSetLastReadySegNo(this_segno);
+ }
This is an interesting idea, but the "else" block here seems prone to
race conditions. I think we'd have to hold arch_lck to prevent that.
But as I mentioned above, if we are okay with depending on the
fallback directory scans, I think we can remove the "else" block
completely.
+ /* Initialize the current state of archiver */
+ xlogState.lastSegNo = MaxXLogSegNo;
+ xlogState.lastTli = MaxTimeLineID;
It looks like we have two ways to force a directory scan. We can
either set force_dir_scan to true, or lastSegNo can be set to
MaxXLogSegNo. Why not just set force_dir_scan to true here so that we
only have one way to force a directory scan?
+ /*
+ * Failed to archive, make sure that
archiver performs a
+ * full directory scan in the next
cycle to avoid missing
+ * the WAL file which could not be
archived due to some
+ * failure in current cycle.
+ */
+ PgArchForceDirScan();
Don't we also need to force a directory scan in the other cases we
return early from pgarch_ArchiverCopyLoop()? We will have already
advanced the archiver state in pgarch_readyXlog(), so I think we'd end
up skipping files if we didn't. For example, if archive_command isn't
set, we'll just return, and the next call to pgarch_readyXlog() might
return the next file.
+ /* Continue directory scan until we find a regular WAL
file */
+ SpinLockAcquire(&PgArch->arch_lck);
+ PgArch->force_dir_scan = true;
+ SpinLockRelease(&PgArch->arch_lck);
nitpick: I think we should just call PgArchForceDirScan() here.
Nathan