Re: [PATCHES] Logging checkpoints and other slowdown causes
Greg Smith wrote:
> I'll take another stab at refining this can of worms I opened. The one
> thing I noticed on a quick review is that it's almost possible to skip all
> the calls to gettimeofday if log_checkpoints is off now. I'd like to make
> that a specific goal, because that will make me feel better that adding
> this code has almost no performance impact relative to now unless you turn
> the feature on.

Saving a couple of gettimeofday calls on an event that happens as infrequently as checkpoints is not going to make any difference, especially if you compare it to all the other work that's done during a checkpoint.

> I agree with Simon that tracking create/drop separately is unnecessary.
>
> As for why all the timing info is in ms, given the scale of the numbers
> typically encountered I found it easier to work with. I originally wanted
> resolution down to 0.1 ms if the underlying OS supports it, which means 4
> figures to the right of the decimal point if the unit were switched to
> seconds. Quite often the times reported are less than 100 ms, so you'll
> normally be dealing with the fractional part of a second. If we take
> Heikki's example:
>
> LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms sync=32.0 ms
>
> And switch it to seconds:
>
> LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=0.0968 s sync=0.0320 s
>
> I don't find that as easy to work with. The only way a timing in seconds
> would look OK is if the resolution of the whole thing is reduced to ms,
> which then makes 3 decimal points--easy to read as ms instead. Having
> stared at a fair amount of this data now, that's probably fine; I'll
> collect up some more data on it from a fast server this week to confirm
> whether it's worthless precision or worth capturing.

The checkpoint will take at least a couple of seconds on any interesting system, so 0.1 s resolution should be enough IMHO.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Logging checkpoints and other slowdown causes
On Tue, 29 May 2007, Heikki Linnakangas wrote:
> The checkpoint will take at least a couple of seconds on any interesting
> system, so 0.1 s resolution should be enough IMHO.

You may be underestimating the resources some interesting systems are willing to put into speeding up checkpoints. I'm sometimes dumping into a SAN whose cache is bigger than the shared_buffers cache in the server, and 0.1 s isn't really enough resolution in that situation. A second is a really long checkpoint there. Since even that's limited by fiber-channel speeds, I know it's possible to do better than what I'm seeing with something like a PCIe host adapter having on-board cache in the GB range (which isn't that expensive nowadays).

Also, even if the checkpoint as a whole takes seconds, much of that is in the sync phase; the write time can still be in the small number of ms range, and I wouldn't want to see that truncated too much.

Anyway, I have a bunch of data on this subject being collected at this moment, and I'll rescale the results based on what I see after analyzing that this week.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PATCHES] Logging checkpoints and other slowdown causes
Here's a work-in-progress updated version of this patch. It's not completed, but I've settled on:

- a log_checkpoints GUC variable to control whether anything is printed or not. This is in line with log_connections, log_autovacuum etc.

- when checkpoints are logged, LOG-level is used

- the following log output:

  LOG: checkpoint starting (CHECKPOINT command)
  LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms sync=32.0 ms

  where "checkpoint starting" is obviously printed when the checkpoint starts, and "checkpoint complete" when the checkpoint is fully complete.

- all information on the progress of the checkpoint is collected into a global struct. A global struct sounds a bit ugly, but it avoids the ugliness of littering all functions with multiple return values to pass back information to the caller that's only used to print it out. This makes it easy to add more data points. MoveOfflineLogs should probably be changed to use that as well for the sake of consistency, instead of returning nsegsremoved and nsegsrecycled through pointers.

- RequestCheckpoint and CreateCheckpoint were changed to take a CheckpointReason enum as argument, instead of boolean flags. This allows printing a different message depending on the cause of the checkpoint, and is also cleaner IMHO.

I'm posting now to allow others, and Greg Smith in particular, to review the changes this far, and take over from here.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.268
diff -c -r1.268 xlog.c
*** src/backend/access/transam/xlog.c	30 Apr 2007 21:01:52 -0000	1.268
--- src/backend/access/transam/xlog.c	20 May 2007 15:21:49 -0000
***************
*** 71,76 ****
--- 71,78 ----
  bool XLOG_DEBUG = false;
  #endif

+ bool log_checkpoints = false;
+
  /*
   * XLOGfileslop is used in the code as the allowed fuzz in the number of
   * preallocated XLOG segments --- we try to have at least XLOGfiles advance
***************
*** 120,125 ****
--- 122,130 ----
  static TimestampTz recoveryStopTime;
  static bool recoveryStopAfter;

+ CheckpointProgressData checkpointProgress;
+
+
  /*
   * During normal operation, the only timeline we care about is ThisTimeLineID.
   * During recovery, however, things are more complicated. To simplify life
***************
*** 1595,1601 ****
  	if (XLOG_DEBUG)
  		elog(LOG, "time for a checkpoint, signaling bgwriter");
  #endif
! 	RequestCheckpoint(false, true);
  	}
  }
  }
--- 1600,1606 ----
  	if (XLOG_DEBUG)
  		elog(LOG, "time for a checkpoint, signaling bgwriter");
  #endif
! 	RequestCheckpoint(false, CHKP_SEGMENTS);
  	}
  }
  }
***************
*** 5067,5073 ****
  	 * the rule that TLI only changes in shutdown checkpoints, which
  	 * allows some extra error checking in xlog_redo.
  	 */
! 	CreateCheckPoint(true, true);

  	/*
  	 * Close down recovery environment
--- 5072,5078 ----
  	 * the rule that TLI only changes in shutdown checkpoints, which
  	 * allows some extra error checking in xlog_redo.
  	 */
! 	CreateCheckPoint(CHKP_SHUTDOWN);	/* XXX startup rather than shutdown... */

  	/*
  	 * Close down recovery environment
***************
*** 5340,5346 ****
  	ereport(LOG,
  			(errmsg("shutting down")));
! 	CreateCheckPoint(true, true);
  	ShutdownCLOG();
  	ShutdownSUBTRANS();
  	ShutdownMultiXact();
--- 5345,5351 ----
  	ereport(LOG,
  			(errmsg("shutting down")));
! 	CreateCheckPoint(CHKP_SHUTDOWN);
  	ShutdownCLOG();
  	ShutdownSUBTRANS();
  	ShutdownMultiXact();
***************
*** 5356,5362 ****
   * activity has occurred since the last one.
   */
  void
! CreateCheckPoint(bool shutdown, bool force)
  {
  	CheckPoint checkPoint;
  	XLogRecPtr recptr;
--- 5361,5367 ----
   * activity has occurred since the last one.
   */
  void
! CreateCheckPoint(CheckpointReason reason)
  {
  	CheckPoint checkPoint;
  	XLogRecPtr recptr;
***************
*** 5370,5375 ****
--- 5375,5435 ----
  	int nsegsrecycled = 0;
  	TransactionId *inCommitXids;
  	int nInCommit;
+ 	struct timeval start_t;
+ 	int elapsed_secs;
+
+ 	/* Log beginning of checkpoint */
+
+ 	gettimeofday(&start_t, NULL);
+
+ 	if (log_checkpoints)
+ 	{
+ 		switch (reason)
+ 		{
+ 			case CHKP_SHUTDOWN:
+ 				/* ereport(LOG, (errmsg("checkpoint starting (shutdown)"))); */
+ 				break;
+ 			case CHKP_COMMAND:
+ 				ereport(LOG, (errmsg("checkpoint starting (CHECKPOINT command)")));
+ 				break;
+ 			case CHKP_BACKUP:
+ 				ereport(LOG, (errmsg("checkpoint starting (starting backup)")));
+ 				break;
+ 			case CHKP_TIMEOUT:
+ 				ereport(LOG, (errmsg("checkpoint starting (checkpoint_timeout reached)")));
+ 				break;
+ 			case CHKP_SEGMENTS:
+ 				ereport(LOG, (errmsg("checkpoint starting (checkpoint_segments reached)")));
+
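The diff above declares the global (CheckpointProgressData checkpointProgress;) but the struct definition itself fell outside the quoted hunks. As a rough sketch of what such a struct might hold--every field below is a guess inferred from the summary line the patch prints, not code from the patch:

#include <sys/time.h>

/* Hypothetical shape for the global progress struct described above;
 * only its existence is confirmed by the diff, the fields are guesses. */
typedef struct CheckpointProgressData
{
    struct timeval start_t;         /* when the checkpoint began */
    double         write_ms;        /* elapsed time in the buffer-write phase */
    double         sync_ms;         /* elapsed time in the fsync phase */
    int            buffers_written; /* dirty buffers flushed */
} CheckpointProgressData;

extern CheckpointProgressData checkpointProgress;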
Re: [PATCHES] Logging checkpoints and other slowdown causes
On Mon, 2007-05-28 at 21:25 +0100, Heikki Linnakangas wrote:
> - RequestCheckpoint and CreateCheckpoint were changed to take a
> CheckpointReason enum as argument, instead of boolean flags. This allows
> printing a different message depending on the cause of the checkpoint, and
> is also cleaner IMHO.

Looks good. I'd be inclined to make the reason code for creating/dropping a database the same and just say "Cluster DDL" or something, since for analysis purposes we just want to know it was DDL.

I'd rather see the timings in seconds rather than ms. Counting significant digits will be a pain...

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Logging checkpoints and other slowdown causes
I'll take another stab at refining this can of worms I opened. The one thing I noticed on a quick review is that it's almost possible to skip all the calls to gettimeofday if log_checkpoints is off now. I'd like to make that a specific goal, because that will make me feel better that adding this code has almost no performance impact relative to now unless you turn the feature on.

I agree with Simon that tracking create/drop separately is unnecessary.

As for why all the timing info is in ms, given the scale of the numbers typically encountered I found it easier to work with. I originally wanted resolution down to 0.1 ms if the underlying OS supports it, which means 4 figures to the right of the decimal point if the unit were switched to seconds. Quite often the times reported are less than 100 ms, so you'll normally be dealing with the fractional part of a second. If we take Heikki's example:

LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms sync=32.0 ms

And switch it to seconds:

LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=0.0968 s sync=0.0320 s

I don't find that as easy to work with. The only way a timing in seconds would look OK is if the resolution of the whole thing is reduced to ms, which then makes 3 decimal points--easy to read as ms instead. Having stared at a fair amount of this data now, that's probably fine; I'll collect up some more data on it from a fast server this week to confirm whether it's worthless precision or worth capturing.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
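For reference, the ms figures being debated are plain wall-clock deltas between two gettimeofday() calls; a minimal standalone sketch of that arithmetic (illustrative only, not code from the patch):

#include <stdio.h>
#include <sys/time.h>

int
main(void)
{
    struct timeval start_t, end_t;
    double elapsed_ms;

    gettimeofday(&start_t, NULL);
    /* ... the phase being timed, e.g. writing out dirty buffers ... */
    gettimeofday(&end_t, NULL);

    /* microsecond-resolution delta, reported in ms as the patch does */
    elapsed_ms = (end_t.tv_sec - start_t.tv_sec) * 1000.0 +
                 (end_t.tv_usec - start_t.tv_usec) / 1000.0;
    printf("write=%.1f ms\n", elapsed_ms);
    return 0;
}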
Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)
Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
> > If you know when the checkpoint ended, and you know how long each of the
> > pieces took, you can reconstruct the other times easily. The way you
> > describe this is true--that the summary is redundant given the
> > detail--but if you put yourself in the shoes of a log file parser the
> > other way around is easier to work with. Piecing together log entries is
> > a pain, splitting them is easy. If I had to only keep one line out of
> > this, it would be the one with the summary. It would be nice to have it
> > logged at INFO.
>
> Yeah, if we have the summary line we don't need the other lines and vice
> versa. I have sympathy for parsing log files, I've done that a lot in the
> past and I can see what you mean. Having the individual lines is nice when
> you're monitoring a running system; you don't get the summary line until
> the checkpoint is finished. I suppose we can have both the individual
> lines and the summary, the extra lines shouldn't hurt anyone, and you
> won't get them unless you turn on the new log_checkpoints parameter
> anyway.

Not to beat a dead horse, but do we really want to force folks to be parsing logs for performance monitoring? Especially if that log parsing is just going to result in data being inserted into a table anyway? I know there's concern about performance of the stats system and maybe that needs to be addressed, but pushing users to log parsing is a lot of extra effort, non-standard, likely to be overlooked, and doesn't play well with other tools. It also conflicts with all the existing statistics framework.

--
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Re: [PATCHES] Logging checkpoints and other slowdown causes
Greg Smith wrote:
> On Tue, 3 Apr 2007, Peter Eisentraut wrote:
> > Something that is aimed at a user should not be enabled at a debug
> > level. Debug levels are for debugging, not for very high verbosity.
>
> I asked for feedback about where to log when I initially sent the first
> version of this in and didn't hear anything back on that part, so I pushed
> these in line with other log messages I saw. The messages for when
> checkpoints start and stop were both logged at DEBUG2, so I put progress
> reports on the other significant phases of the process there as well.

I agree that debug levels are not suitable for this. I'm thinking of INFO, NOTICE or LOG. The user manual says about LOG:

  LOG: Reports information of interest to administrators, e.g., checkpoint activity.

But looking at the code, all the checkpoint-related messages are at DEBUG levels; nothing gets printed at LOG-level. Printing the messages at LOG-level would bring the code in line with the documentation, but I don't think we want to fill the log with checkpoint chatter unless the DBA explicitly asks for that.

How about INFO? It seems like the best level for information on normal activity of the system. The documentation however has this to say about it:

  INFO: Provides information implicitly requested by the user, e.g., during VACUUM VERBOSE.

We should adjust the documentation, but INFO seems like the best level to me. Or we could add a GUC variable similar to log_connections or log_statement to control if the messages are printed or not, and use LOG.

> I don't expect these messages will be helpful for a normal user--that's
> what the new data in pg_stat_bgwriter is for. The main purpose of this
> patch is debugging checkpoint-related performance issues at a level I'd
> expect only a developer to work at; they're also helpful for someone
> writing benchmark code.

I disagree. They would be helpful to an administrator chasing down checkpoint-related problems: e.g. checkpoints taking too long, occurring too often (though we already have checkpoint_warning for that), or identifying whether transient performance problems that users complain about coincide with checkpoints. And at least I like to have messages like that in the log just to comfort me that everything is going ok.

> There are several patches in process floating around that aim to adjust
> either the background writer or the checkpoint process to reduce the
> impact of checkpoints. This logging allows grading their success at that.
> As my tests with this patch in place suggest this problem is far from
> solved with any of the current suggestions, I'd like to get other
> developers looking at the problem the same way I have been; that's why
> I'd like to see some standardization on how checkpoints are instrumented.
> The fact that really advanced users might also use this for
> troubleshooting I consider a bonus rather than the main focus here.

Agreed.

Looking at the patch, I think we want one line in the log when the checkpoint starts, one when buffer flushing starts (or maybe not, if we assume that checkpointing clog, subtrans and multixact doesn't take long), one when the sync phase starts, and one when the checkpoint is done. We don't need to print the times elapsed in each phase on a separate line; that's just information derived from the other lines, unless we use different log-levels for detail lines (like you did in your patch).

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Logging checkpoints and other slowdown causes
On 5/11/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
> I agree that debug levels are not suitable for this.

Squid (the (reverse) proxy) has a nice feature for debugging purposes (admin and developers): it's possible to have a different logging level for different sections of the code. For example, if you want to debug a particular part, you can have "ALL,1 33,2" in your config file, which gives you more verbose debug output for section 33 of the code (33 is ACL). It's really useful because you can even debug a particular code path or a misconfiguration in production without generating a huge amount of logs.

--
Guillaume
Re: [PATCHES] Logging checkpoints and other slowdown causes
Guillaume Smet wrote:
> Squid (the (reverse) proxy) has a nice feature for debugging purposes
> (admin and developers): it's possible to have a different logging level
> for different sections of the code. For example, if you want to debug a
> particular part, you can have "ALL,1 33,2" in your config file, which
> gives you more verbose debug output for section 33 of the code (33 is
> ACL). It's really useful because you can even debug a particular code
> path or a misconfiguration in production without generating a huge amount
> of logs.

Yes, that'd be really nice, though using numeric codes is a bit archaic. In the Java world, a logging framework called log4j has become the de facto standard for logging. It's basically the same idea: you have different log levels, and you can enable different levels of logging for different modules. I'm sure there are similar libraries for C; that's something to think about in future releases.

8.3 will add some new log_* GUC variables, log_autovacuum and log_temp_files, so it seems we're headed in that direction already. I'm now thinking that's the approach we should take for checkpoints as well: add a new GUC variable, log_checkpoints, and print the messages at LOG-level if that's enabled.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
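As a rough illustration of the per-module idea being discussed, in C (all names here are hypothetical; no such facility exists in PostgreSQL, and only log_min_messages is a real GUC):

#include <stdbool.h>

/* Hypothetical module identifiers */
typedef enum
{
    LOGMOD_CHECKPOINT,
    LOGMOD_AUTOVACUUM,
    LOGMOD_WAL,
    LOGMOD_COUNT
} LogModule;

/* 0 means "no per-module override; fall back to the global setting" */
static int module_min_level[LOGMOD_COUNT];

extern int log_min_messages;    /* the real global GUC, declared in elog.h */

static bool
module_will_log(LogModule mod, int elevel)
{
    int threshold = module_min_level[mod] != 0 ?
                    module_min_level[mod] : log_min_messages;

    /* simplified: assumes elog levels sort numerically, DEBUG5 lowest */
    return elevel >= threshold;
}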
Re: [PATCHES] Logging checkpoints and other slowdown causes
On Fri, 11 May 2007, Heikki Linnakangas wrote:
> Printing the messages at LOG-level would bring the code in line with the
> documentation, but I don't think we want to fill the log with checkpoint
> chatter unless the DBA explicitly asks for that. How about INFO?

In putting that together, I pushed everything toward the lowest DEBUG level it could run at without the messages I wanted to see being completely lost. I didn't want this to get rejected just on the basis of being too chatty for regular users. Putting a single line in for each checkpoint at INFO, with all the individual progress points at DEBUG1 or 2, would be close to optimal as far as I'm concerned. More on this below.

> Or we could add a GUC variable similar to log_connections or log_statement
> to control if the messages are printed or not, and use LOG.

If the levels for the individual messages are adjusted usefully, the ability to control whether they show up falls out of the existing log level adjustments; I haven't ever felt a strong need for a GUC when using this code. If, as you say, there's already a move in this direction, then fine--another GUC would be better.

> We don't need to print the times elapsed in each phase on a separate line,
> that's just derived information from the other lines, unless we use
> different log-levels for detail lines

Let me turn this around for a second and ask you this: what do you think people are going to do with this information? I can tell you what I do. I parse the logs and look for that single line with the summary information. I then take the three pieces of information it provides (MB written, time to write, time to fsync) and save them into a table; from there I generate stats, scatter plots, all sorts of useful stuff (a sketch of such a parser follows this message).

If you know when the checkpoint ended, and you know how long each of the pieces took, you can reconstruct the other times easily. The way you describe this is true--that the summary is redundant given the detail--but if you put yourself in the shoes of a log file parser the other way around is easier to work with. Piecing together log entries is a pain, splitting them is easy. If I had to only keep one line out of this, it would be the one with the summary. It would be nice to have it logged at INFO.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
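A minimal sketch of the kind of summary-line parser described above, assuming the exact line format shown earlier in the thread (the format string is an assumption based on that example output, not a stable interface):

#include <stdio.h>

int
main(void)
{
    const char *line = "LOG: checkpoint complete; buffers written=3.1 MB "
                       "(9.6%) write=96.8 ms sync=32.0 ms";
    double mb, pct, write_ms, sync_ms;

    /* Split the single summary line into the figures worth saving */
    if (sscanf(line, "LOG: checkpoint complete; buffers written=%lf MB "
                     "(%lf%%) write=%lf ms sync=%lf ms",
               &mb, &pct, &write_ms, &sync_ms) == 4)
        printf("wrote %.1f MB, write=%.1f ms, sync=%.1f ms\n",
               mb, write_ms, sync_ms);
    return 0;
}

From there the three values would go into a table for the stats and scatter plots mentioned above.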
Re: [PATCHES] Logging checkpoints and other slowdown causes
Greg Smith wrote:
> On Fri, 11 May 2007, Heikki Linnakangas wrote:
> > Or we could add a GUC variable similar to log_connections or
> > log_statement to control if the messages are printed or not, and use
> > LOG.
>
> If the levels for the individual messages are adjusted usefully, the
> ability to control whether they show up falls out of the existing log
> level adjustments; I haven't ever felt a strong need for a GUC when using
> this code. If, as you say, there's already a move in this direction, then
> fine--another GUC would be better.

The need for a GUC becomes apparent when you start adding logging to other modules as well, like we did for autovacuum a while back. If there wasn't a log_autovacuum parameter, you'd have to choose between logging checkpoints and autovacuum actions, or neither.

> > We don't need to print the times elapsed in each phase on a separate
> > line, that's just derived information from the other lines, unless we
> > use different log-levels for detail lines
>
> Let me turn this around for a second and ask you this: what do you think
> people are going to do with this information? I can tell you what I do. I
> parse the logs and look for that single line with the summary information.
> I then take the three pieces of information it provides (MB written, time
> to write, time to fsync) and save them into a table. From there I generate
> stats, scatter plots, all sorts of useful stuff.
>
> If you know when the checkpoint ended, and you know how long each of the
> pieces took, you can reconstruct the other times easily. The way you
> describe this is true--that the summary is redundant given the detail--but
> if you put yourself in the shoes of a log file parser the other way around
> is easier to work with. Piecing together log entries is a pain, splitting
> them is easy. If I had to only keep one line out of this, it would be the
> one with the summary. It would be nice to have it logged at INFO.

Yeah, if we have the summary line we don't need the other lines and vice versa. I have sympathy for parsing log files; I've done that a lot in the past and I can see what you mean. Having the individual lines is nice when you're monitoring a running system; you don't get the summary line until the checkpoint is finished. I suppose we can have both the individual lines and the summary; the extra lines shouldn't hurt anyone, and you won't get them unless you turn on the new log_checkpoints parameter anyway.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] Logging checkpoints and other slowdown causes
Greg Smith wrote:
> This patch puts entries in the log file around some backend events that
> regularly cause client pauses, most of which are visible only when
> running with log_min_messages = debug2.

Something that is aimed at a user should not be enabled at a debug level. Debug levels are for debugging, not for very high verbosity.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [PATCHES] Logging checkpoints and other slowdown causes
On Tue, 3 Apr 2007, Peter Eisentraut wrote:
> Something that is aimed at a user should not be enabled at a debug level.
> Debug levels are for debugging, not for very high verbosity.

I asked for feedback about where to log when I initially sent the first version of this in and didn't hear anything back on that part, so I pushed these in line with other log messages I saw. The messages for when checkpoints start and stop were both logged at DEBUG2, so I put progress reports on the other significant phases of the process there as well.

I don't expect these messages will be helpful for a normal user--that's what the new data in pg_stat_bgwriter is for. The main purpose of this patch is debugging checkpoint-related performance issues at a level I'd expect only a developer to work at; they're also helpful for someone writing benchmark code.

There are several patches in process floating around that aim to adjust either the background writer or the checkpoint process to reduce the impact of checkpoints. This logging allows grading their success at that. As my tests with this patch in place suggest this problem is far from solved with any of the current suggestions, I'd like to get other developers looking at the problem the same way I have been; that's why I'd like to see some standardization on how checkpoints are instrumented. The fact that really advanced users might also use this for troubleshooting I consider a bonus rather than the main focus here.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PATCHES] Logging checkpoints and other slowdown causes
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews and approves it.

---------------------------------------------------------------------------

Greg Smith wrote:
> This patch puts entries in the log file around some backend events that
> regularly cause client pauses, most of which are visible only when running
> with log_min_messages = debug2. The main focus is regularly occurring
> checkpoints, from either running out of segments or hitting a timeout.
> This is an example of what gets written to the logs at checkpoint time:
>
> DEBUG2: checkpoint required (timeout passed)
> DEBUG2: checkpoint starting
> DEBUG2: checkpoint flushing buffer pool
> DEBUG2: checkpoint database fsync starting
> DEBUG2: checkpoint database fsync complete
> DEBUG1: checkpoint buffers dirty=16.7 MB (13.0%) write=174.8 ms sync=3716.5 ms
> DEBUG2: checkpoint complete; 0 transaction log file(s) added, 0 removed, 0 recycled
>
> The idea here is that if you want to really watch what your system is
> doing, either to track down a problem or for benchmarking, you log at
> DEBUG2 and timestamp the logs with something like log_line_prefix =
> '[%m]:'. Then line up your timestamped transactions with the logs to
> figure out what happened during bad periods. This is very easy to do with
> the timestamped pgbench latency logs, for example, so you can put them in
> time sequence order and see something like "oh, those long transactions
> all came in between the fsync start and end". The summary line with the
> write/sync timing information has a broader use and suggests when you
> need to go deeper, which is why I put that one at DEBUG1.
>
> I also adjust some other log messages to make it easier to run the system
> at DEBUG2 without going completely crazy, like moving individual WAL
> segment recycles to DEBUG3 and standardizing the format of the fork/exit
> messages so they're easier to filter out.
>
> My main concern about this code is how it adds several gettimeofday calls
> and complicated string building inside FlushBufferPool, all of which is
> wasted if this data isn't actually going to get logged. I'd like to have
> something that tells me whether I should even bother, but I didn't want
> to poke into the elog variables for fear of adding a dependency on its
> current structure here. Ideally, it would be nice to call something like
> WillLog(DEBUG2) and get a boolean back saying whether something at that
> log level will be output anywhere; I don't know enough about the logging
> code to add such a thing to it myself.
>
> [ Attachment, skipping... ]

--
Bruce Momjian [EMAIL PROTECTED] http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
[PATCHES] Logging checkpoints and other slowdown causes
This patch puts entries in the log file around some backend events that regularly cause client pauses, most of which are visible only when running with log_min_messages = debug2. The main focus is regularly occurring checkpoints, from either running out of segments or hitting a timeout. This is an example of what gets written to the logs at checkpoint time:

DEBUG2: checkpoint required (timeout passed)
DEBUG2: checkpoint starting
DEBUG2: checkpoint flushing buffer pool
DEBUG2: checkpoint database fsync starting
DEBUG2: checkpoint database fsync complete
DEBUG1: checkpoint buffers dirty=16.7 MB (13.0%) write=174.8 ms sync=3716.5 ms
DEBUG2: checkpoint complete; 0 transaction log file(s) added, 0 removed, 0 recycled

The idea here is that if you want to really watch what your system is doing, either to track down a problem or for benchmarking, you log at DEBUG2 and timestamp the logs with something like log_line_prefix = '[%m]:'. Then line up your timestamped transactions with the logs to figure out what happened during bad periods. This is very easy to do with the timestamped pgbench latency logs, for example, so you can put them in time sequence order and see something like "oh, those long transactions all came in between the fsync start and end". The summary line with the write/sync timing information has a broader use and suggests when you need to go deeper, which is why I put that one at DEBUG1.

I also adjust some other log messages to make it easier to run the system at DEBUG2 without going completely crazy, like moving individual WAL segment recycles to DEBUG3 and standardizing the format of the fork/exit messages so they're easier to filter out.

My main concern about this code is how it adds several gettimeofday calls and complicated string building inside FlushBufferPool, all of which is wasted if this data isn't actually going to get logged. I'd like to have something that tells me whether I should even bother, but I didn't want to poke into the elog variables for fear of adding a dependency on its current structure here. Ideally, it would be nice to call something like WillLog(DEBUG2) and get a boolean back saying whether something at that log level will be output anywhere; I don't know enough about the logging code to add such a thing to it myself (a sketch of the idea follows the patch below).

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.265
diff -c -r1.265 xlog.c
*** src/backend/access/transam/xlog.c	3 Mar 2007 20:02:26 -0000	1.265
--- src/backend/access/transam/xlog.c	1 Apr 2007 04:55:19 -0000
***************
*** 1591,1600 ****
  	if (new_highbits != old_highbits ||
  		new_segno >= old_segno + (uint32) (CheckPointSegments-1))
  	{
! #ifdef WAL_DEBUG
! 		if (XLOG_DEBUG)
! 			elog(LOG, "time for a checkpoint, signaling bgwriter");
! #endif
  		RequestCheckpoint(false, true);
  	}
  }
--- 1591,1598 ----
  	if (new_highbits != old_highbits ||
  		new_segno >= old_segno + (uint32) (CheckPointSegments-1))
  	{
! 		ereport(DEBUG2, (errmsg(
! 				"checkpoint required (wrote checkpoint_segments)")));
  		RequestCheckpoint(false, true);
  	}
  }
***************
*** 1858,1863 ****
--- 1856,1863 ----
  			(errcode_for_file_access(),
  			 errmsg("could not create file \"%s\": %m", tmppath)));

+ 	ereport(DEBUG2, (errmsg("WAL creating and filling new file on disk")));
+
  	/*
  	 * Zero-fill the file. We have to do this the hard way to ensure that all
  	 * the file space has really been allocated --- on platforms that allow
***************
*** 2565,2571 ****
  						   true, max_advance, true))
  	{
! 		ereport(DEBUG2, (errmsg("recycled transaction log file \"%s\"", xlde->d_name)));
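For illustration, the WillLog() helper wished for above could be as simple as the following sketch. It checks only the server-log threshold and ignores elog.c details such as the special sort position of LOG and client_min_messages, so it is an assumption about shape, not the real API:

#include <stdbool.h>

extern int log_min_messages;    /* the real GUC backing variable from elog.h */

/* Sketch only: assumes elog levels sort numerically with DEBUG5 lowest,
 * so a message is emitted to the server log when it reaches the
 * log_min_messages threshold. */
static bool
WillLog(int elevel)
{
    return elevel >= log_min_messages;
}

FlushBufferPool could then guard its gettimeofday calls and string building with if (WillLog(DEBUG2)) and skip all of that work when nothing would be logged.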