Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-29 Thread Heikki Linnakangas

Greg Smith wrote:
I'll take another stab at refining this can of worms I opened.  The one 
thing I noticed on a quick review is that it's almost possible to skip 
all the calls to gettimeofday if log_checkpoints is off now.  I'd like 
to make that a specific goal, because that will make me feel better that 
adding this code has almost no performance impact relative to now unless 
you turn the feature on.


Saving a couple of gettimeofday calls on an event that happens as 
infrequently as checkpoints is not going to make any difference. 
Especially if you compare it to all the other work that's done on 
checkpoint.


I agree with Simon that tracking create/drop separately is unnecessary. 
As for why all the timing info is in ms, given the scale of the numbers 
typically encountered I found it easier to work with.  I originally 
wanted resolution down to 0.1ms if the underlying OS supports it, which 
means 4 figures to the right of the decimal point if the unit was 
switched to seconds.  Quite often the times reported are less than 
100ms, so you'll normally be dealing with the fractional part of a second.  
If we take Heikki's example:


LOG:  checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms 
sync=32.0 ms


And switch it to seconds:

LOG:  checkpoint complete; buffers written=3.1 MB (9.6%) write=0.0968 s 
sync=0.0320 s


I don't find that as easy to work with.  The only way a timing in 
seconds would look OK is if the resolution of the whole thing is reduced 
to ms, which then makes 3 decimal points--easy to read as ms instead.  
Having stared at a fair amount of this data now, that's probably fine; 
I'll collect up some more data on it from a fast server this week to 
confirm whether it's worthless precision or worth capturing.


The checkpoint will take at least a couple of seconds on any interesting 
system, so 0.1 s resolution should be enough IMHO.
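
For scale: the millisecond figures being rounded here come from pairs of 
gettimeofday() readings, and converting such a pair to milliseconds keeps the 
sub-second detail the thread is arguing over. A minimal sketch (the helper 
name is mine, not the patch's):

```c
#include <assert.h>
#include <sys/time.h>

/* Elapsed time between two gettimeofday() readings, in milliseconds.
 * tv_sec holds whole seconds and tv_usec microseconds, so a double
 * result preserves sub-millisecond resolution. */
static double
elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_usec - start->tv_usec) / 1000.0;
}
```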


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-29 Thread Greg Smith

On Tue, 29 May 2007, Heikki Linnakangas wrote:

The checkpoint will take at least a couple of seconds on any interesting 
system, so 0.1 s resolution should be enough IMHO.


You may be underestimating the resources some interesting systems are 
willing to put into speeding up checkpoints.  I'm sometimes dumping into a 
SAN whose cache is bigger than the shared_buffer cache in the server, and 
0.1s isn't really enough resolution in that situation.  A second is a 
really long checkpoint there.  Since even that's limited by Fibre Channel 
speeds, I know it's possible to do better than what I'm seeing with 
something like a PCIe host adapter having on-board cache in the GB range 
(which isn't that expensive nowadays).


Also, even if the checkpoint total takes seconds, much of that is in the 
sync phase; the write time can still be in the small number of ms range, 
and I wouldn't want to see that truncated too much.


Anyway, I have a bunch of data on this subject being collected at this 
moment, and I'll rescale the results based on what I see after analyzing 
that this week.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-28 Thread Heikki Linnakangas
Here's a work-in-progress updated version of this patch. It's not 
complete, but I've settled on:


- a log_checkpoints GUC variable to control whether anything is printed 
or not. This is in line with log_connections, log_autovacuum etc.


- when checkpoints are logged, LOG-level is used

- the following log output:

LOG:  checkpoint starting (CHECKPOINT command)
LOG:  checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms 
sync=32.0 ms


where "checkpoint starting" is obviously printed when the checkpoint starts, 
and "checkpoint complete" when the checkpoint is fully complete.


- all information on the progress of the checkpoint is collected into a 
global struct. A global struct sounds a bit ugly, but it avoids the ugliness of 
littering all functions with multiple return values to pass back 
information to the caller that's only used to print it out. This makes 
it easy to add more data points. MoveOfflineLogs should probably be 
changed to use that as well for the sake of consistency, instead of 
returning nsegsremoved and nsegsrecycled through pointers.


- RequestCheckpoint and CreateCheckpoint were changed to take a 
CheckpointReason enum as argument, instead of boolean flags. This allows 
printing a different message depending on the cause of checkpoint, and 
is also cleaner IMHO.
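
In outline, the change looks roughly like the following. The CHKP_* 
enumerator names appear in the posted diff, but the struct fields here are 
my illustration, not the patch's actual layout:

```c
#include <assert.h>

/* One enumerator per checkpoint cause, replacing the old
 * (shutdown, force) boolean pair; these names are from the diff. */
typedef enum CheckpointReason
{
    CHKP_SHUTDOWN,   /* shutdown checkpoint */
    CHKP_COMMAND,    /* explicit CHECKPOINT command */
    CHKP_BACKUP,     /* starting a backup */
    CHKP_TIMEOUT,    /* checkpoint_timeout elapsed */
    CHKP_SEGMENTS    /* checkpoint_segments worth of WAL written */
} CheckpointReason;

/* Illustrative progress struct: each phase fills in its part as it
 * runs, and the "checkpoint complete" line is printed from it at the
 * end, instead of threading values back through return parameters. */
typedef struct CheckpointProgressData
{
    CheckpointReason reason;
    int     buffers_written;   /* dirty buffers flushed */
    double  write_ms;          /* buffer-write phase */
    double  sync_ms;           /* fsync phase */
} CheckpointProgressData;
```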


I'm posting now to allow others, and Greg Smith in particular, to review 
the changes this far, and take over from here.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.268
diff -c -r1.268 xlog.c
*** src/backend/access/transam/xlog.c	30 Apr 2007 21:01:52 -	1.268
--- src/backend/access/transam/xlog.c	20 May 2007 15:21:49 -
***
*** 71,76 
--- 71,78 
  bool		XLOG_DEBUG = false;
  #endif
  
+ bool log_checkpoints = false;
+ 
  /*
   * XLOGfileslop is used in the code as the allowed fuzz in the number of
   * preallocated XLOG segments --- we try to have at least XLOGfiles advance
***
*** 120,125 
--- 122,130 
  static TimestampTz recoveryStopTime;
  static bool recoveryStopAfter;
  
+ CheckpointProgressData checkpointProgress;
+ 
+ 
  /*
   * During normal operation, the only timeline we care about is ThisTimeLineID.
   * During recovery, however, things are more complicated.  To simplify life
***
*** 1595,1601 
  		if (XLOG_DEBUG)
  			elog(LOG, "time for a checkpoint, signaling bgwriter");
  #endif
! 		RequestCheckpoint(false, true);
  	}
  }
  			}
--- 1600,1606 
  		if (XLOG_DEBUG)
  			elog(LOG, "time for a checkpoint, signaling bgwriter");
  #endif
! 		RequestCheckpoint(false, CHKP_SEGMENTS);
  	}
  }
  			}
***
*** 5067,5073 
  		 * the rule that TLI only changes in shutdown checkpoints, which
  		 * allows some extra error checking in xlog_redo.
  		 */
! 		CreateCheckPoint(true, true);
  
  		/*
  		 * Close down recovery environment
--- 5072,5078 
  		 * the rule that TLI only changes in shutdown checkpoints, which
  		 * allows some extra error checking in xlog_redo.
  		 */
! 		CreateCheckPoint(CHKP_SHUTDOWN); /* XXX startup rather than shutdown... */
  
  		/*
  		 * Close down recovery environment
***
*** 5340,5346 
  	ereport(LOG,
  			(errmsg(shutting down)));
  
! 	CreateCheckPoint(true, true);
  	ShutdownCLOG();
  	ShutdownSUBTRANS();
  	ShutdownMultiXact();
--- 5345,5351 
  	ereport(LOG,
  			(errmsg(shutting down)));
  
! 	CreateCheckPoint(CHKP_SHUTDOWN);
  	ShutdownCLOG();
  	ShutdownSUBTRANS();
  	ShutdownMultiXact();
***
*** 5356,5362 
   * activity has occurred since the last one.
   */
  void
! CreateCheckPoint(bool shutdown, bool force)
  {
  	CheckPoint	checkPoint;
  	XLogRecPtr	recptr;
--- 5361,5367 
   * activity has occurred since the last one.
   */
  void
! CreateCheckPoint(CheckpointReason reason)
  {
  	CheckPoint	checkPoint;
  	XLogRecPtr	recptr;
***
*** 5370,5375 
--- 5375,5435 
  	int			nsegsrecycled = 0;
  	TransactionId *inCommitXids;
  	int			nInCommit;
+ 	struct timeval start_t;
+ 	int			elapsed_secs;
+ 
+ 	/* Log beginning of checkpoint */
+ 
+ 	gettimeofday(&start_t, NULL);
+ 
+ 	if (log_checkpoints)
+ 	{
+ 		switch(reason)
+ 		{
+ 			case CHKP_SHUTDOWN:
+ 				/* ereport(LOG, (errmsg("checkpoint starting (shutdown)"))); */
+ 				break;
+ 			case CHKP_COMMAND:
+ 				ereport(LOG, (errmsg("checkpoint starting (CHECKPOINT command)")));
+ 				break;
+ 			case CHKP_BACKUP:
+ 				ereport(LOG, (errmsg("checkpoint starting (starting backup)")));
+ 				break;
+ 			case CHKP_TIMEOUT:
+ 				ereport(LOG, (errmsg("checkpoint starting (checkpoint_timeout reached)")));
+ 				break;
+ 			case CHKP_SEGMENTS:
+ 				ereport(LOG, (errmsg("checkpoint starting (checkpoint_segments reached)")));
+ 

Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-28 Thread Simon Riggs
On Mon, 2007-05-28 at 21:25 +0100, Heikki Linnakangas wrote:

 - RequestCheckpoint and CreateCheckpoint were changed to take a 
 CheckpointReason enum as argument, instead of boolean flags. This allows 
 printing a different message depending on the cause of checkpoint, and 
 is also cleaner IMHO.

Looks good.

I'd be inclined to make the reason code for creating/dropping database
the same and just say Cluster DDL or something, since for analysis
purposes we just want to know it was DDL.

I'd rather see the timings in seconds rather than ms. Counting
significant digits will be a pain...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-28 Thread Greg Smith
I'll take another stab at refining this can of worms I opened.  The one 
thing I noticed on a quick review is that it's almost possible to skip all 
the calls to gettimeofday if log_checkpoints is off now.  I'd like to make 
that a specific goal, because that will make me feel better that adding 
this code has almost no performance impact relative to now unless you turn 
the feature on.


I agree with Simon that tracking create/drop separately is unnecessary. 
As for why all the timing info is in ms, given the scale of the numbers 
typically encountered I found it easier to work with.  I originally wanted 
resolution down to 0.1ms if the underlying OS supports it, which means 4 
figures to the right of the decimal point if the unit was switched to 
seconds.  Quite often the times reported are less than 100ms, so you'll 
normally be dealing with the fractional part of a second.  If we take Heikki's 
example:


LOG:  checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms 
sync=32.0 ms


And switch it to seconds:

LOG:  checkpoint complete; buffers written=3.1 MB (9.6%) write=0.0968 s 
sync=0.0320 s


I don't find that as easy to work with.  The only way a timing in seconds 
would look OK is if the resolution of the whole thing is reduced to ms, 
which then makes 3 decimal points--easy to read as ms instead.  Having 
stared at a fair amount of this data now, that's probably fine; I'll 
collect up some more data on it from a fast server this week to confirm 
whether it's worthless precision or worth capturing.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)

2007-05-12 Thread Jim C. Nasby
Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
 If you know when the checkpoint ended, and you know how long each of the 
 pieces took, you can reconstruct the other times easily.  The way you 
 describe this it is true--that the summary is redundant given the 
 detail--but if you put yourself in the shoes of a log file parser the 
 other way around is easier to work with.  Piecing together log entries 
 is a pain, splitting them is easy.
 
 If I had to only keep one line out of this, it would be the one with the 
 summary.  It would be nice to have it logged at INFO.
 
 Yeah, if we have the summary line we don't need the other lines and vice 
 versa. I have sympathy for parsing log files, I've done that a lot in 
 the past and I can see what you mean. Having the individual lines is 
 nice when you're monitoring a running system; you don't get the summary 
 line until the checkpoint is finished. I suppose we can have both the 
 individual lines and the summary, the extra lines shouldn't hurt anyone, 
 and you won't get them unless you turn on the new log_checkpoints 
 parameter anyway.

Not to beat a dead horse, but do we really want to force folks to be
parsing logs for performance monitoring? Especially if that log parsing
is just going to result in data being inserted into a table anyway?

I know there's concern about performance of the stats system and maybe
that needs to be addressed, but pushing users to log parsing is a lot of
extra effort, non-standard, likely to be overlooked, and doesn't play
well with other tools. It also conflicts with all the existing
statistics framework.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-11 Thread Heikki Linnakangas

Greg Smith wrote:

On Tue, 3 Apr 2007, Peter Eisentraut wrote:


Something that is aimed at a user should not be enabled at a debug
level.  Debug levels are for debugging, not for very high verbosity.


I asked for feedback about where to log at when I initially sent the 
first version of this in and didn't hear anything back on that part, so 
I pushed these in line with other log messages I saw.  The messages for 
when checkpoints start and stop were both logged at DEBUG2, so I put 
progress reports on the other significant phases of the process there as 
well.


I agree that debug levels are not suitable for this.

I'm thinking of INFO, NOTICE or LOG. The user manual says about LOG:

LOG

Reports information of interest to administrators, e.g., checkpoint 
activity.


But looking at the code, all the checkpoint related messages are at 
DEBUG-levels, nothing gets printed at LOG-level. Printing the messages 
at LOG-level would bring the code in line with the documentation, but I 
don't think we want to fill the log with checkpoint chatter unless the 
DBA explicitly asks for that.


How about INFO? It seems like the best level for information on normal 
activity of the system. The documentation however has this to say about it:


INFO

Provides information implicitly requested by the user, e.g., during 
VACUUM VERBOSE.


We should adjust the documentation, but INFO seems like the best level 
to me. Or we could add a GUC variable similar to log_connections or 
log_statement to control if the messages are printed or not, and use LOG.


I don't expect these messages will be helpful for a normal user--that's 
what the new data in pg_stats_bgwriter is for.  The main purpose of 
this patch is debugging checkpoint-related performance issues at a level 
I'd expect only a developer to work at; they're also helpful for someone 
writing benchmark code.


I disagree. They would be helpful to an administrator chasing down 
checkpoint related problems. E.g. checkpoints taking too long, occurring 
too often (though we already have log_checkpoint_warning for that), or 
to identify if transient performance problems that users complain about 
coincide with checkpoints. And at least I like to have messages like 
that in the log just to comfort me that everything is going ok.


There are several patches in process floating around that aim to adjust 
either the background writer or the checkpoint process to reduce the 
impact of checkpoints.  This logging allows grading their success at 
that. As my tests with this patch in place suggest this problem is far 
from solved with any of the current suggestions, I'd like to get other 
developers looking at that problem the same way I have been; that's why 
I'd like to see some standardization on how checkpoints are 
instrumented. The fact that really advanced users might also use this 
for troubleshooting I consider a bonus rather than the main focus here.


Agreed.


Looking at the patch, I think we want one line in the log when 
checkpoint starts, one when buffer flushing starts (or maybe not, if we 
assume that checkpointing clog, subtrans and multixact don't take long), 
one when sync-phase starts and when the checkpoint is done. We don't 
need to print the times elapsed in each phase on a separate line, that's 
just derived information from the other lines, unless we use different 
log-levels for detail lines (like you did in your patch).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-11 Thread Guillaume Smet

On 5/11/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

I agree that debug levels are not suitable for this.


Squid (the (reverse) proxy) has a nice feature for debugging purposes
(admin and developers). It's possible to have different logging level
for different sections of the code. For example, if you want to debug
a particular part, you can have "ALL,1 33,2" in your config file which
allows you to have a more verbose debug on the section 33 of the code
(33 is ACL).
It's really useful because you can even debug a particular code path
or a misconfiguration in production without generating a huge amount
of logs.

--
Guillaume



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-11 Thread Heikki Linnakangas

Guillaume Smet wrote:

On 5/11/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

I agree that debug levels are not suitable for this.


Squid (the (reverse) proxy) has a nice feature for debugging purposes
(admin and developers). It's possible to have different logging level
for different sections of the code. For example, if you want to debug
a particular part, you can have "ALL,1 33,2" in your config file which
allows you to have a more verbose debug on the section 33 of the code
(33 is ACL).
It's really useful because you can even debug a particular code path
or a misconfiguration in production without generating a huge amount
of logs.


Yes, that'd be really nice, though using numeric codes is a bit archaic. 
In Java-world, a logging framework called log4j has become the de facto 
standard to implement logging. It's basically the same idea, you have 
different log levels, and you can have enable different levels of 
logging for different modules. I'm sure there are similar libraries for C, 
that's something to think about in future releases.


8.3 will add some new log_* GUC variables, log_autovacuum and 
log_temp_files, so it seems we're headed to that direction already. I'm 
now thinking that's the approach we should take for checkpoints as well: 
add a new GUC variable, log_checkpoints, and print the messages at 
LOG-level if that's enabled.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-11 Thread Greg Smith

On Fri, 11 May 2007, Heikki Linnakangas wrote:

Printing the messages at LOG-level would bring the code in line with the 
documentation, but I don't think we want to fill the log with checkpoint 
chatter unless the DBA explicitly asks for that.  How about INFO?


In putting that together, I pushed everything toward the lowest DEBUG 
level that it was possible for it to run at without the messages I wanted 
to see being completely lost.  I didn't want this to get rejected just on 
the basis of being too chatty for regular users.  Putting a single line in 
for each checkpoint at INFO, with all the individual progress points being 
at DEBUG1 or 2, would be close to optimal as far as I'm concerned.  More 
on this below.


Or we could add a GUC variable similar to log_connections or log_statement to 
control if the messages are printed or not, and use LOG.


If the levels for the individual messages are adjusted usefully the 
ability to control whether they show up or not falls out of the existing 
log level adjustments; I haven't ever felt a strong need for GUC when 
using this code.  If, as you say, there's already a move in this 
direction, then fine--another GUC would be better.


We don't need to print the times elapsed in each phase on a separate 
line, that's just derived information from the other lines, unless we 
use different log-levels for detail lines


Let me turn this around for a second and ask you this:  what do you think 
people are going to do with this information?  I can tell you what I do. 
I parse the logs and look for that single line with the summary 
information.  I then take the three pieces of information it provides (MB 
written, time to write, time to fsync) and save them into a table.  From 
there I generate stats, scatter plots, all sorts of useful stuff.


If you know when the checkpoint ended, and you know how long each of the 
pieces took, you can reconstruct the other times easily.  The way you 
describe this it is true--that the summary is redundant given the 
detail--but if you put yourself in the shoes of a log file parser the 
other way around is easier to work with.  Piecing together log entries is 
a pain, splitting them is easy.
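
To make the parsing argument concrete: the summary line splits apart with a 
single sscanf(), whereas recovering the same numbers from separate 
start/flush/sync lines means correlating timestamps across log entries. A 
sketch against the example message in this thread (illustrative, not hardened 
parsing code):

```c
#include <assert.h>
#include <stdio.h>

/* Pull MB written, write time, and sync time out of a line like:
 *   checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms sync=32.0 ms
 * Returns 1 on a full match, 0 otherwise. */
static int
parse_checkpoint_summary(const char *line,
                         double *mb, double *write_ms, double *sync_ms)
{
    double pct;                 /* percentage of buffer pool; parsed, then ignored */
    return sscanf(line,
                  "checkpoint complete; buffers written=%lf MB (%lf%%) "
                  "write=%lf ms sync=%lf ms",
                  mb, &pct, write_ms, sync_ms) == 4;
}
```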


If I had to only keep one line out of this, it would be the one with the 
summary.  It would be nice to have it logged at INFO.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-05-11 Thread Heikki Linnakangas

Greg Smith wrote:

On Fri, 11 May 2007, Heikki Linnakangas wrote:
Or we could add a GUC variable similar to log_connections or 
log_statement to control if the messages are printed or not, and use LOG.


If the levels for the individual messages are adjusted usefully the 
ability to control whether they show up or not falls out of the existing 
log level adjustments; I haven't ever felt a strong need for GUC when 
using this code.  If, as you say, there's already a move in this 
direction, then fine--another GUC would be better.


The need for GUC becomes apparent when you start adding logging to other 
modules as well, like we did for autovacuum a while back. If there 
wasn't a log_autovacuum parameter, you'd have to choose between logging 
checkpoints and autovacuum actions, or neither.


We don't need to print the times elapsed in each phase on a separate 
line, that's just derived information from the other lines, unless we 
use different log-levels for detail lines


Let me turn this around for a second and ask you this:  what do you 
think people are going to do with this information?  I can tell you what 
I do. I parse the logs and look for that single line with the summary 
information.  I then take the three pieces of information it provides 
(MB written, time to write, time to fsync) and save them into a table.  
 From there I generate stats, scatter plots, all sorts of useful stuff.


If you know when the checkpoint ended, and you know how long each of the 
pieces took, you can reconstruct the other times easily.  The way you 
describe this it is true--that the summary is redundant given the 
detail--but if you put yourself in the shoes of a log file parser the 
other way around is easier to work with.  Piecing together log entries 
is a pain, splitting them is easy.


If I had to only keep one line out of this, it would be the one with the 
summary.  It would be nice to have it logged at INFO.


Yeah, if we have the summary line we don't need the other lines and vice 
versa. I have sympathy for parsing log files, I've done that a lot in 
the past and I can see what you mean. Having the individual lines is 
nice when you're monitoring a running system; you don't get the summary 
line until the checkpoint is finished. I suppose we can have both the 
individual lines and the summary, the extra lines shouldn't hurt anyone, 
and you won't get them unless you turn on the new log_checkpoints 
parameter anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-04-03 Thread Peter Eisentraut
Greg Smith wrote:
 This patch puts entries in the log file around some backend events
 that regularly cause client pauses, most of which are visible only
 when running with log_min_messages = debug2.

Something that is aimed at a user should not be enabled at a debug 
level.  Debug levels are for debugging, not for very high verbosity.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-04-03 Thread Greg Smith

On Tue, 3 Apr 2007, Peter Eisentraut wrote:


Something that is aimed at a user should not be enabled at a debug
level.  Debug levels are for debugging, not for very high verbosity.


I asked for feedback about where to log at when I initially sent the first 
version of this in and didn't hear anything back on that part, so I pushed 
these in line with other log messages I saw.  The messages for when 
checkpoints start and stop were both logged at DEBUG2, so I put progress 
reports on the other significant phases of the process there as well.


I don't expect these messages will be helpful for a normal user--that's 
what the new data in pg_stats_bgwriter is for.  The main purpose of this 
patch is debugging checkpoint-related performance issues at a level I'd 
expect only a developer to work at; they're also helpful for someone 
writing benchmark code.


There are several patches in process floating around that aim to adjust 
either the background writer or the checkpoint process to reduce the 
impact of checkpoints.  This logging allows grading their success at that. 
As my tests with this patch in place suggest this problem is far from 
solved with any of the current suggestions, I'd like to get other 
developers looking at that problem the same way I have been; that's why 
I'd like to see some standardization on how checkpoints are instrumented. 
The fact that really advanced users might also use this for 
troubleshooting I consider a bonus rather than the main focus here.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] Logging checkpoints and other slowdown causes

2007-04-02 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Greg Smith wrote:
 This patch puts entries in the log file around some backend events that 
 regularly cause client pauses, most of which are visible only when running 
  with log_min_messages = debug2.  The main focus is regularly occurring 
 checkpoints, from either running out of segments or hitting a timeout. 
 This is an example of what gets written to the logs at checkpoint time:
 
 DEBUG2:  checkpoint required (timeout passed)
 DEBUG2:  checkpoint starting
 DEBUG2:  checkpoint flushing buffer pool
 DEBUG2:  checkpoint database fsync starting
 DEBUG2:  checkpoint database fsync complete
 DEBUG1:  checkpoint buffers dirty=16.7 MB (13.0%) write=174.8 ms 
 sync=3716.5 ms
 DEBUG2:  checkpoint complete; 0 transaction log file(s) added, 0 removed, 
 0 recycled
 
 The idea here is that if you want to really watch what your system is 
 doing, either to track down a problem or for benchmarking, you log at 
 DEBUG2 and timestamp the logs with something like log_line_prefix = 
 '[%m]:'.  Then line up your timestamped transactions with the logs to 
 figure out what happened during bad periods.  This is very easy to do with 
 the timestamped pgbench latency logs for example, so you can put them in 
 time sequence order and see something like "oh, those long transactions 
 all came in between the fsync start and end".  The summary line with the 
 write/sync timing information has a broader use and suggests when you need 
 to go deeper, which is why I put that one at DEBUG1.
 
 I also adjust some other log messages to make it easier to run the system 
 at DEBUG2 without going completely crazy, like moving individual WAL 
 segment recycles to DEBUG3 and standardizing the format of the fork/exit 
 messages so they're easier to filter out.
 
 My main concern about this code is how it adds several gettimeofday calls 
 and complicated string building inside FlushBufferPool, all of which is 
 wasted if this data isn't actually going to get logged.  I'd like to have 
 something that tells me whether I should even bother, but I didn't want to 
 poke into the elog variables for fear of adding a dependency on its 
 current structure to here.  Ideally, it would be nice to call something 
 like WillLog(DEBUG2) and get a boolean back saying whether something at 
 that log level will be output anywhere; I don't know enough about the 
 logging code to add such a thing to it myself.
 
 --
 * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

[ Attachment, skipping... ]

 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[PATCHES] Logging checkpoints and other slowdown causes

2007-04-01 Thread Greg Smith
This patch puts entries in the log file around some backend events that 
regularly cause client pauses, most of which are visible only when running 
with log_min_messages = debug2.  The main focus is regularly occurring 
checkpoints, from either running out of segments or hitting a timeout. 
This is an example of what gets written to the logs at checkpoint time:


DEBUG2:  checkpoint required (timeout passed)
DEBUG2:  checkpoint starting
DEBUG2:  checkpoint flushing buffer pool
DEBUG2:  checkpoint database fsync starting
DEBUG2:  checkpoint database fsync complete
DEBUG1:  checkpoint buffers dirty=16.7 MB (13.0%) write=174.8 ms 
sync=3716.5 ms
DEBUG2:  checkpoint complete; 0 transaction log file(s) added, 0 removed, 
0 recycled


The idea here is that if you want to really watch what your system is 
doing, either to track down a problem or for benchmarking, you log at 
DEBUG2 and timestamp the logs with something like log_line_prefix = 
'[%m]:'.  Then line up your timestamped transactions with the logs to 
figure out what happened during bad periods.  This is very easy to do with 
the timestamped pgbench latency logs for example, so you can put them in 
time sequence order and see something like "oh, those long transactions 
all came in between the fsync start and end".  The summary line with the 
write/sync timing information has a broader use and suggests when you need 
to go deeper, which is why I put that one at DEBUG1.


I also adjust some other log messages to make it easier to run the system 
at DEBUG2 without going completely crazy, like moving individual WAL 
segment recycles to DEBUG3 and standardizing the format of the fork/exit 
messages so they're easier to filter out.


My main concern about this code is how it adds several gettimeofday calls 
and complicated string building inside FlushBufferPool, all of which is 
wasted if this data isn't actually going to get logged.  I'd like to have 
something that tells me whether I should even bother, but I didn't want to 
poke into the elog variables for fear of adding a dependency on its 
current structure to here.  Ideally, it would be nice to call something 
like WillLog(DEBUG2) and get a boolean back saying whether something at 
that log level will be output anywhere; I don't know enough about the 
logging code to add such a thing to it myself.
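
The WillLog() helper wished for above reduces to comparing the proposed 
message level against the active output thresholds before doing any expensive 
work. A hypothetical sketch: no such function exists in the tree, and the 
level values below are illustrative, not elog.h's actual codes:

```c
#include <assert.h>

/* Illustrative severity ordering; higher is more likely to be emitted. */
enum { DEBUG2_LVL = 12, DEBUG1_LVL = 13, LOG_LVL = 15 };

static int log_min_messages    = LOG_LVL;  /* server-log threshold */
static int client_min_messages = LOG_LVL;  /* client threshold */

/* Would a message at 'elevel' reach the server log or the client?
 * If not, the caller can skip the gettimeofday() calls and string
 * building entirely. */
static int
WillLog(int elevel)
{
    return elevel >= log_min_messages || elevel >= client_min_messages;
}
```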


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.265
diff -c -r1.265 xlog.c
*** src/backend/access/transam/xlog.c   3 Mar 2007 20:02:26 -   1.265
--- src/backend/access/transam/xlog.c   1 Apr 2007 04:55:19 -
***
*** 1591,1600 
if (new_highbits != old_highbits ||
new_segno >= old_segno + 
(uint32) (CheckPointSegments-1))
{
! #ifdef WAL_DEBUG
!   if (XLOG_DEBUG)
!   elog(LOG, time for a 
checkpoint, signaling bgwriter);
! #endif
RequestCheckpoint(false, true);
}
}
--- 1591,1598 
if (new_highbits != old_highbits ||
new_segno >= old_segno + 
(uint32) (CheckPointSegments-1))
{
!   ereport(DEBUG2, (errmsg(
!   "checkpoint required (wrote checkpoint_segments)")));
RequestCheckpoint(false, true);
}
}
***
*** 1858,1863 
--- 1856,1863 
(errcode_for_file_access(),
 errmsg(could not create file \%s\: %m, 
tmppath)));
  
+   ereport(DEBUG2, (errmsg("WAL creating and filling new file on disk")));
+ 
/*
 * Zero-fill the file.  We have to do this the hard way to ensure that 
all
 * the file space has really been allocated --- on platforms that allow
***
*** 2565,2571 

   true, max_advance,

   true))
{
!   ereport(DEBUG2,
!           (errmsg("recycled transaction log file \"%s\"",
!                   xlde->d_name)));