[HACKERS] Instrument checkpoint sync calls

Greg Smith Sun, 14 Nov 2010 13:48:00 -0800

Attached patch adds some logging for each individual fsync call madeduring a checkpoint, along with a summary at the end. You need to havethe following to see all of the detail:


log_checkpoints=on
log_min_messages=debug1


And here's a sample:

LOG:  checkpoint starting: immediate force wait
DEBUG:  checkpoint sync: file=1 time=1.946000 msec
DEBUG:  checkpoint sync: file=2 time=0.666000 msec
DEBUG:  checkpoint sync: file=3 time=0.004000 msec
LOG:  checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec

LOG: checkpoint complete: wrote 3 buffers (0.1%); 0 transaction logfile(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s,total=0.003 s

I think that it's reasonable for the sort of people who turnlog_checkpoints on to also get the sync summary line, thus it beinglogged at LOG level. The detail on individual syncs might go to DEBUG2or lower instead of DEBUG1 where I put it, that part I don't have astrong opinion on. It's at DEBUG1 to make testing the patch without agigantic amount of log data also coming in easier.

Right now the code is written such that all the calls that grab timinginformation are wrapped around "ifdef DEBUG_FSYNC", which is a variableset to 1 that could be a compile-time option like DEBUG_DEADLOCK, toallow turning this code path off at build time. I personally think thatif you're already making an fsync call and have log_checkpoints on, theadditional overhead of also timing that fsync is minimal even onplatforms where timing is slow (I don't have such a system to test thatassumption however). And I've seen enough value in troubleshootingnasty checkpoint sync problems using this patch to feel it's worthhaving even if it does add some overhead.

I'm a little concerned about log_checkpoints changing on me in themiddle of the execution of a checkpoint, which would cause some problemshere. Not sure if that's something I should explicitly code for, giventhat all I think it will do is screw up one of the timing results. Itdoes seem a risk from the last minute self-review I just did of the code.

I'll give a sample program that stresses the system, generating slowtiming results and other types of bad behavior, along with the nextpatch I submit here shortly.


--
Greg Smith   2ndQuadrant US    [email protected]   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 1219fcf..c72da72 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -20,6 +20,7 @@
 
 #include "catalog/catalog.h"
 #include "miscadmin.h"
+#include "portability/instr_time.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
 #include "storage/bufmgr.h"
@@ -29,6 +30,11 @@
 #include "utils/memutils.h"
 #include "pg_trace.h"
 
+/*
+ * For now just leave this on all the time, eventually this will
+ * need to be a full compiler flag like DEBUG_DEADLOCK
+ */
+#define DEBUG_FSYNC	1
 
 /* interval for calling AbsorbFsyncRequests in mdsync */
 #define FSYNCS_PER_ABSORB		10
@@ -927,6 +933,15 @@ mdsync(void)
 	PendingOperationEntry *entry;
 	int			absorb_counter;
 
+#ifdef DEBUG_FSYNC
+	/* Statistics on sync times */
+	int processed = 0;
+	instr_time sync_start,sync_end,diff;
+	double elapsed;
+	double longest = 0;
+	double total_elapsed = 0;
+#endif
+
 	/*
 	 * This is only called during checkpoints, and checkpoints should only
 	 * occur in processes that have created a pendingOpsTable.
@@ -1069,9 +1084,32 @@ mdsync(void)
 				seg = _mdfd_getseg(reln, entry->tag.forknum,
 							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
 								   false, EXTENSION_RETURN_NULL);
+
+#ifdef DEBUG_FSYNC
+				if (log_checkpoints)
+					INSTR_TIME_SET_CURRENT(sync_start);
+#endif
 				if (seg != NULL &&
 					FileSync(seg->mdfd_vfd) >= 0)
+				{
+
+#ifdef DEBUG_FSYNC
+					if (log_checkpoints)
+					{
+						INSTR_TIME_SET_CURRENT(sync_end);
+						diff=sync_end;
+						INSTR_TIME_SUBTRACT(diff, sync_start);
+						elapsed = (double) INSTR_TIME_GET_MICROSEC(diff) / 1000.0;
+						if (elapsed > longest)
+							longest = elapsed;
+						total_elapsed+=elapsed;
+						processed++;
+						/* Reduce the log level here to DEBUG2 in final patch? */
+						elog(DEBUG1, "checkpoint sync: file=%d time=%f msec", processed, elapsed);
+					}
+#endif
 					break;		/* success; break out of retry loop */
+					}
 
 				/*
 				 * XXX is there any point in allowing more than one retry?
@@ -1113,6 +1151,13 @@ mdsync(void)
 			elog(ERROR, "pendingOpsTable corrupted");
 	}							/* end loop over hashtable entries */
 
+#ifdef DEBUG_FSYNC
+	/* Report on sync performance metrics */
+	if (log_checkpoints && (processed > 0))
+		elog(LOG, "checkpoint sync: files=%d longest=%f msec average=%f msec",
+			processed, longest, total_elapsed / processed);
+#endif
+
 	/* Flag successful completion of mdsync */
 	mdsync_in_progress = false;
 }

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Instrument checkpoint sync calls

Reply via email to