On Tue, Oct 30, 2018 at 6:16 AM Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> Magnus cornered me at pgconf.eu and asked me whether I could prototype
> the "barriers" I'd been talking about in the online checksumming thread.
>
> The problem there was to make sure that all processes, backends and
> auxiliary processes have seen the new state of checksums being enabled,
> and aren't currently in the process of writing a new page out.
>
> The current prototype solves that by requiring a restart, but that
> strikes me as a far too large hammer.
>
> The attached patch introduces "global barriers" (name was invented in a
> overcrowded hotel lounge, so ...), which allow to wait for such a change
> to be absorbed by all backends.
>
> I've only tested the code with gdb, but that seems to work:
>
> p WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM))
>
> waits until all backends (including bgwriter, checkpointers, walwriters,
> bgworkers, ...) have accepted interrupts at least once.  Multiple such
> requests are coalesced.
>
> I decided to wait until interrupts are actually process, rather than
> just the signal received, because that means the system is in a well
> defined state. E.g. there's no pages currently being written out.
>
> For the checksum enablement patch you'd do something like;
>
> EnableChecksumsInShmemWithLock();
> WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
>
> and after that you should be able to set it to a perstistent mode.
>
>
> I chose to use procsignals to send the signals, a global uint64
> globalBarrierGen, and per-backend barrierGen, barrierFlags, with the
> latter keeping track which barriers have been requested. There likely
> seem to be other usecases.
>
>
> The patch definitely is in a prototype stage. At the very least it needs
> a high-level comment somewhere, and some of the lower-level code needs
> to be cleaned up.
>
> One thing I wasn't happy about is how checksum internals have to absorb
> barrier requests - that seems unavoidable, but I'd hope for something
> more global than just BufferSync().
>
>
> Comments?
>
>

Finally getting back to this one.

In re-reading this, I notice there are a lot of references to Intterrupt
(with two t). I'm guessing this is just a spelling error, and not something
that actually conveys some meaning?

Can you elaborate on what you mean with:
+       /* XXX: need a more principled approach here */

Is that the thing you refer to above about "checksum internals"?

Also in checking we figured it'd be nice to have a wait event for this,
since a process can potentially get stuck in an infinite loop waiting for
some other process if it's misbehaving. Kind of like the attached?

//Magnus
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a84042a4ea..58b360d225 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3765,6 +3765,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_GLOBAL_BARRIER:
+			event_name = "GlobalBarrier";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATING:
 			event_name = "Hash/Batch/Allocating";
 			break;
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4e3e4a4893..9aed52df4a 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -21,6 +21,7 @@
 #include "access/twophase.h"
 #include "commands/async.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/ipc.h"
@@ -368,6 +369,7 @@ EmitGlobalBarrier(GlobalBarrierKind kind)
 void
 WaitForGlobalBarrier(uint64 generation)
 {
+	pgstat_report_wait_start(WAIT_EVENT_GLOBAL_BARRIER);
 	for (int i = 0; i < (MaxBackends + max_prepared_xacts); i++)
 	{
 		PGPROC *proc = &ProcGlobal->allProcs[i];
@@ -389,6 +391,7 @@ WaitForGlobalBarrier(uint64 generation)
 			oldval = pg_atomic_read_u64(&proc->barrierGen);
 		}
 	}
+	pgstat_report_wait_end();
 }
 
 /*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2b656a8168..0d35b19420 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -827,6 +827,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_GLOBAL_BARRIER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATING,
 	WAIT_EVENT_HASH_BATCH_ELECTING,
 	WAIT_EVENT_HASH_BATCH_LOADING,

Reply via email to