Attached patch adds some logging for each individual fsync call made during a checkpoint, along with a summary at the end. You need to have the following to see all of the detail:

log_checkpoints=on
log_min_messages=debug1

And here's a sample:

LOG:  checkpoint starting: immediate force wait
DEBUG:  checkpoint sync: file=1 time=1.946000 msec
DEBUG:  checkpoint sync: file=2 time=0.666000 msec
DEBUG:  checkpoint sync: file=3 time=0.004000 msec
LOG:  checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec
LOG: checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s, total=0.003 s

I think that it's reasonable for the sort of people who turn log_checkpoints on to also get the sync summary line, thus it being logged at LOG level. The detail on individual syncs might go to DEBUG2 or lower instead of DEBUG1 where I put it, that part I don't have a strong opinion on. It's at DEBUG1 to make testing the patch without a gigantic amount of log data also coming in easier.

Right now the code is written such that all the calls that grab timing information are wrapped around "ifdef DEBUG_FSYNC", which is a variable set to 1 that could be a compile-time option like DEBUG_DEADLOCK, to allow turning this code path off at build time. I personally think that if you're already making an fsync call and have log_checkpoints on, the additional overhead of also timing that fsync is minimal even on platforms where timing is slow (I don't have such a system to test that assumption however). And I've seen enough value in troubleshooting nasty checkpoint sync problems using this patch to feel it's worth having even if it does add some overhead.

I'm a little concerned about log_checkpoints changing on me in the middle of the execution of a checkpoint, which would cause some problems here. Not sure if that's something I should explicitly code for, given that all I think it will do is screw up one of the timing results. It does seem a risk from the last minute self-review I just did of the code.

I'll give a sample program that stresses the system, generating slow timing results and other types of bad behavior, along with the next patch I submit here shortly.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us


diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 1219fcf..c72da72 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -20,6 +20,7 @@
 
 #include "catalog/catalog.h"
 #include "miscadmin.h"
+#include "portability/instr_time.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
 #include "storage/bufmgr.h"
@@ -29,6 +30,11 @@
 #include "utils/memutils.h"
 #include "pg_trace.h"
 
+/*
+ * For now just leave this on all the time, eventually this will
+ * need to be a full compiler flag like DEBUG_DEADLOCK
+ */
+#define DEBUG_FSYNC	1
 
 /* interval for calling AbsorbFsyncRequests in mdsync */
 #define FSYNCS_PER_ABSORB		10
@@ -927,6 +933,15 @@ mdsync(void)
 	PendingOperationEntry *entry;
 	int			absorb_counter;
 
+#ifdef DEBUG_FSYNC
+	/* Statistics on sync times */
+	int processed = 0;
+	instr_time sync_start,sync_end,diff;
+	double elapsed;
+	double longest = 0;
+	double total_elapsed = 0;
+#endif
+
 	/*
 	 * This is only called during checkpoints, and checkpoints should only
 	 * occur in processes that have created a pendingOpsTable.
@@ -1069,9 +1084,32 @@ mdsync(void)
 				seg = _mdfd_getseg(reln, entry->tag.forknum,
 							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
 								   false, EXTENSION_RETURN_NULL);
+
+#ifdef DEBUG_FSYNC
+				if (log_checkpoints)
+					INSTR_TIME_SET_CURRENT(sync_start);
+#endif
 				if (seg != NULL &&
 					FileSync(seg->mdfd_vfd) >= 0)
+				{
+
+#ifdef DEBUG_FSYNC
+					if (log_checkpoints)
+					{
+						INSTR_TIME_SET_CURRENT(sync_end);
+						diff=sync_end;
+						INSTR_TIME_SUBTRACT(diff, sync_start);
+						elapsed = (double) INSTR_TIME_GET_MICROSEC(diff) / 1000.0;
+						if (elapsed > longest)
+							longest = elapsed;
+						total_elapsed+=elapsed;
+						processed++;
+						/* Reduce the log level here to DEBUG2 in final patch? */
+						elog(DEBUG1, "checkpoint sync: file=%d time=%f msec", processed, elapsed);
+					}
+#endif
 					break;		/* success; break out of retry loop */
+					}
 
 				/*
 				 * XXX is there any point in allowing more than one retry?
@@ -1113,6 +1151,13 @@ mdsync(void)
 			elog(ERROR, "pendingOpsTable corrupted");
 	}							/* end loop over hashtable entries */
 
+#ifdef DEBUG_FSYNC
+	/* Report on sync performance metrics */
+	if (log_checkpoints && (processed > 0))
+		elog(LOG, "checkpoint sync: files=%d longest=%f msec average=%f msec",
+			processed, longest, total_elapsed / processed);
+#endif
+
 	/* Flag successful completion of mdsync */
 	mdsync_in_progress = false;
 }
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to