Re: [HACKERS] Count backend self-sync calls

2010-11-15 Thread Robert Haas
On Sun, Nov 14, 2010 at 8:31 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith g...@2ndquadrant.com wrote:
 But if this is generating a lot of log data or adding a lot of
 overhead, then you have bigger problems anyway:

 +               elog(DEBUG1, "Unable to forward fsync request, executing
 directly");


 The argument against this log line even existing is that if the field is
 added to pg_stat_bgwriter, that's probably how you want to monitor this data
 anyway.

 I'll remove it if you really want it gone, but personally I think it's
 useful to have.  I've more than once had to debug a problem given a
 PostgreSQL log file with the debug level cranked up and not a whole
 lot else.  Rare events that cause performance to tank are worth
 logging, IMHO.

 I started out touching code that called it just sync, but then crossed to
 other code that called it fsync, and made the external UI use that name.
  No objections to sorting that out within my patch so it's consistent.

 OK, I'll do that before I commit it.

Committed with (I think) all the changes discussed, plus a catversion bump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Count backend self-sync calls

2010-11-15 Thread Jeff Janes
On Sun, Nov 14, 2010 at 2:07 PM, Greg Smith g...@2ndquadrant.com wrote:
 The attached patch adds a new field to pg_stat_bgwriter, counting the number
 of times backends execute their own fsync calls.  Normally, when a backend
 needs to fsync data, it passes a request to the background writer, which
 then absorbs the call into its own queue of work to do.  However, under some
 types of heavy system load, the associated queue can fill.  When this
 happens, backends are forced to do their own fsync call.  This is
 potentially much worse than when they do a regular write.

 The really nasty situation is when the background writer is busy because
 it's executing a checkpoint.  In that case, it's possible for the backend
 fsync calls to start competing with the ones the background writer is trying
 to get done,


Do you know where this competition is happening?  Is it on the
platters, or is it in the hard drive write cache (I thought high-end
hardware had tagged writes to avoid that), or in the kernel?

...

 DEBUG:  Absorbing 4096 fsync requests
 DEBUG:  Absorbing 150 fsync requests
 DEBUG:  Unable to forward fsync request, executing directly
 CONTEXT:  writing block 158638 of relation base/16385/16398

 Here 4096 is the most entries the BGW will ever absorb at once, and all 90
 of the missed sync calls are logged so you can see what files they came
 from.

Looking in src/backend/postmaster/bgwriter.c line 1071:

 * Note: we presently make no attempt to eliminate duplicate requests
 * in the requests[] queue.  The bgwriter will have to eliminate dups
 * internally anyway, so we may as well avoid holding the lock longer
 * than we have to here.

This makes sense if we just need to append to a queue.  But once the
queue is full and we are about to do a backend fsync, might it make
sense to do a little more work to look for dups?
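
To make the idea concrete, here is a rough sketch of the kind of
compaction I have in mind; the type and function names are made up for
illustration and this is not code from the patch or from bgwriter.c:

#include <stdbool.h>

/* Hypothetical queue entry: which relation file and which segment of it. */
typedef struct
{
    unsigned int rel_id;
    unsigned int seg_no;
} FsyncRequest;

/*
 * Remove duplicate (rel_id, seg_no) entries from a full, fixed-size queue,
 * in place, and return the new entry count.  If this frees at least one
 * slot, the backend could retry queueing instead of doing its own fsync.
 */
static int
compact_fsync_queue(FsyncRequest *req, int n)
{
    int kept = 0;

    for (int i = 0; i < n; i++)
    {
        bool dup = false;

        for (int j = 0; j < kept; j++)
        {
            if (req[j].rel_id == req[i].rel_id &&
                req[j].seg_no == req[i].seg_no)
            {
                dup = true;
                break;
            }
        }
        if (!dup)
            req[kept++] = req[i];
    }
    return kept;
}

The scan is O(n^2) while holding the queue lock, so presumably it would
only run in the queue-full case, where we were about to pay for a
backend fsync anyway.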

Cheers,

Jeff



Re: [HACKERS] Count backend self-sync calls

2010-11-15 Thread Greg Smith

Jeff Janes wrote:

Do you know where this competition is happening?  Is it on the
platters, or is it in the hard drive write cache (I thought high-end
hardware had tagged writes to avoid that), or in the kernel?
  


Kernel.  Linux systems with lots of memory will happily queue up 
gigabytes of dirty data in their write cache, only getting serious about 
writing it out to disk when forced to by fsync.




This makes sense if we just need to append to a queue.  But once the
queue is full and we are about to do a backend fsync, might it make
sense to do a little more work to look for dups?
  


One of the paths I'd like to follow is experimenting with both sorting 
writes by file and looking for duplication in the queues.  I think a 
basic, simple sync spreading approach needs to get finished first, 
though; this sort of thing would then be an optimization on top of it.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us





[HACKERS] Count backend self-sync calls

2010-11-14 Thread Greg Smith
The attached patch adds a new field to pg_stat_bgwriter, counting the 
number of times backends execute their own fsync calls.  Normally, when 
a backend needs to fsync data, it passes a request to the background 
writer, which then absorbs the call into its own queue of work to do.  
However, under some types of heavy system load, the associated queue can 
fill.  When this happens, backends are forced to do their own fsync 
call.  This is potentially much worse than when they do a regular write.
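
To sketch the decision each backend makes (a simplified C sketch for 
illustration only; the function and counter names here are invented, not 
the actual storage manager routines):

#include <stdbool.h>

/* Illustrative stand-ins, not actual PostgreSQL functions. */
extern bool queue_fsync_request_to_bgwriter(unsigned int rel_id,
                                            unsigned int seg_no);
extern void fsync_segment_directly(unsigned int rel_id, unsigned int seg_no);
extern void count_backend_fsync(void);     /* feeds buffers_backend_sync */

static void
remember_dirty_segment(unsigned int rel_id, unsigned int seg_no)
{
    /* Preferred path: hand the request to the background writer's queue. */
    if (queue_fsync_request_to_bgwriter(rel_id, seg_no))
        return;

    /*
     * The queue was full, so this backend has to fsync the file itself.
     * That is the event the new counter tracks.
     */
    count_backend_fsync();
    fsync_segment_directly(rel_id, seg_no);
}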


The really nasty situation is when the background writer is busy because 
it's executing a checkpoint.  In that case, it's possible for the 
backend fsync calls to start competing with the ones the background 
writer is trying to get done, causing the checkpoint sync phase to 
execute slower than it should.  I've seen the sync phase take over 45 
minutes on a really busy server once it got into this condition, where 
hundreds of clients doing their own backend fsync calls were fighting 
against the checkpoint fsync work.  With this patch, you can observe 
that happening as an upwards spike in 
pg_stat_bgwriter.buffers_backend_sync, which as documented is an 
inclusive subset of the total shown in buffers_backend.


While it takes a busier system than I can usefully simulate here to 
show a really bad situation, I'm able to see some of these unabsorbed 
backend fsync calls when initializing a pgbench database, which proves 
they happen in the lab.  The attached test program takes a pgbench 
scale factor as its input.  It then creates a pgbench database 
(deleting any existing pgbench database, so watch out for that) and 
shows the values accumulated in pg_stat_bgwriter during that period.  
Here's an example, using the script's default scale of 100 on a server 
with 8GB of RAM and fake fsync (the hard drives lie about it completing):


-[ RECORD 1 ]+-
now  | 2010-11-14 16:08:41.36421-05
...
Initializing pgbench
-[ RECORD 1 ]+--
now  | 2010-11-14 16:09:46.713693-05
checkpoints_timed| 0
checkpoints_req  | 0
buffers_checkpoint   | 0
buffers_clean| 0
maxwritten_clean | 0
buffers_backend  | 654716
buffers_backend_sync | 90
buffers_alloc| 803

This is with default sizing for memory structures.  As you increase 
shared_buffers, one of the queues involved here increases 
proportionately, making it less likely to run into this problem.  That 
just changes it to the kind of problem I've only seen on a larger system 
with a difficult-to-simulate workload.  The production system (running a 
web application) that was getting hammered by this problem, and that 
prompted writing the patch, had shared_buffers=4GB at the time.
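
(To put rough numbers on that: if the queue gets about one slot per 
shared buffer, as it appears to, the default 32MB of shared_buffers 
works out to 32MB / 8kB = 4096 slots, which matches the 4096-request 
absorb ceiling in the log output below, while shared_buffers=4GB gives 
roughly 4GB / 8kB = 524,288 slots.  That's why the overflow gets so 
much harder to hit on a large configuration.)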


The patch also adds some logging to the innards involved here, to help 
with understanding problems in this area.  I don't think that should be 
in the committed version as is.  We may want to drop the logging level 
or disable it in regular builds, since it sits in a spot where it 
generates a lot of log data and adds overhead.  It is nice for now, as 
it lets you get an idea how much fsync work *is* being absorbed by the 
BGW, as well as showing which relation is suffering from this issue.  
Here's an example of both of those things, with the default config for 
everything except log_checkpoints (on) and log_min_messages (debug1):


DEBUG:  Absorbing 4096 fsync requests
DEBUG:  Absorbing 150 fsync requests
DEBUG:  Unable to forward fsync request, executing directly
CONTEXT:  writing block 158638 of relation base/16385/16398

Here 4096 is the most entries the BGW will ever absorb at once, and all 
90 of the missed sync calls are logged so you can see what files they 
came from.


As a high-level commentary about this patch, I'm not sure what most end 
users will ever do with this data.  At the same time, I wasn't sure what 
a typical end user would do with anything else in pg_stat_bgwriter 
either when it was added, and it turns out the answer is to wait for 
people who know the internals to write tools that monitor it.  For 
example, Magnus has updated recent versions of the Munin plug-in for 
PostgreSQL to usefully graph pg_stat_bgwriter data over time.  As I have 
some data suggesting that checkpoint problems on Linux in particular are 
getting worse as total system memory increases, I expect that having an 
easy way to instrument for this particular problem will be 
correspondingly more important in the future too.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us


diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dbf966b..f701b8f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -264,8 +264,10 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
   and the number of times the background writer stopped a cleaning scan
  

Re: [HACKERS] Count backend self-sync calls

2010-11-14 Thread Robert Haas
On Sun, Nov 14, 2010 at 5:07 PM, Greg Smith g...@2ndquadrant.com wrote:
 The patch also adds some logging to the innards involved here, to help with
 understanding problems in this area.  I don't think that should be in the
 committed version as is.  We may want to drop the logging level or disable
 it in regular builds, since it sits in a spot where it generates a lot of
 log data and adds overhead.

I think this one could be removed:

+   if (n > 0)
+   elog(DEBUG1, "Absorbing %d fsync requests", n);

But if this is generating a lot of log data or adding a lot of
overhead, then you have bigger problems anyway:

+   elog(DEBUG1, "Unable to forward fsync request, executing
directly");

I'm inclined to change that to an ereport(), but otherwise it seems
OK.  Also, how about referring to this as buffers_backend_fsync
consistently throughout, instead of dropping the f in some places?

With those changes, I think this is committable, and will do so,
barring objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Count backend self-sync calls

2010-11-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 With those changes, I think this is committable, and will do so,
 barring objections.

Obey message style guidelines, please, especially if you're going
to promote any of those to ereports.

regards, tom lane



Re: [HACKERS] Count backend self-sync calls

2010-11-14 Thread Robert Haas
On Sun, Nov 14, 2010 at 6:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 With those changes, I think this is committable, and will do so,
 barring objections.

 Obey message style guidelines, please, especially if you're going
 to promote any of those to ereports.

The only new message would be the one Greg has as:

Unable to forward fsync request, executing directly

For that, we could just go with:

could not forward fsync request

(Lower case, avoid use of "unable", telegram style with program as
implicit subject.)

It might be even better to mention that the reason why we couldn't
forward the fsync request is that the fsync request queue is full.
I'm not sure exactly how to phrase that.  I thought about:

fsync request queue is full

But that seems not to answer the "so what" question.  There is an
example like this in the docs:

could not forward fsync request (fsync request queue is full)

...but I'm not sure I like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Count backend self-sync calls

2010-11-14 Thread Greg Smith

Robert Haas wrote:

I think this one could be removed:

+   if (n > 0)
+   elog(DEBUG1, "Absorbing %d fsync requests", n);
  


Right; that one is just there to let you know the patch is working, and 
how much the background writer does for you per pass, mainly for the 
purpose of reviewing the patch.




But if this is generating a lot of log data or adding a lot of
overhead, then you have bigger problems anyway:

+   elog(DEBUG1, "Unable to forward fsync request, executing
directly");
  


The argument against this log line even existing is that if the field is 
added to pg_stat_bgwriter, that's probably how you want to monitor this 
data anyway.  I don't think there's actually much value to it, which is 
one reason I didn't worry too much about matching style guidelines, 
translation, etc.  You've found the two things that I think both need to 
go away before commit, but I left them in because I think they're 
valuable for testing that the patch itself does what it's supposed to.



Also, how about referring to this as buffers_backend_fsync
consistently throughout, instead of dropping the f in some places?
  


I started out touching code that called it just sync, but then crossed 
to other code that called it fsync, and made the external UI use that 
name.  No objections to sorting that out within my patch so it's consistent.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us





Re: [HACKERS] Count backend self-sync calls

2010-11-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 It might be even better to mention that the reason why we couldn't
 forward the fsync request is that the fsync request queue is full.
 I'm not sure exactly how to phrase that.  I thought about:

 fsync request queue is full

 But that seems not to answer the "so what" question.  There is an
 example like this in the docs:

 could not forward fsync request (fsync request queue is full)

 ...but I'm not sure I like that.

Well, that example is meant to cover cases where you have to assemble a
couple of independently created phrases.  In this case I'd suggest
        could not forward fsync request because request queue is full
or, if you think there might sometime be a need to have a strerror
variant, ie
        could not forward fsync request: %m
then maybe this would make the most sense:
        could not forward fsync request: request queue is full
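
Spelled out as an ereport() call, just as a sketch of the style rather
than a finished patch, the first of those would look something like:

    ereport(DEBUG1,
            (errmsg("could not forward fsync request because request queue is full")));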

regards, tom lane



Re: [HACKERS] Count backend self-sync calls

2010-11-14 Thread Robert Haas
On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith g...@2ndquadrant.com wrote:
 But if this is generating a lot of log data or adding a lot of
 overhead, then you have bigger problems anyway:

 +               elog(DEBUG1, "Unable to forward fsync request, executing
 directly");


 The argument against this log line even existing is that if the field is
 added to pg_stat_bgwriter, that's probably how you want to monitor this data
 anyway.

I'll remove it if you really want it gone, but personally I think it's
useful to have.  I've more than once had to debug a problem given a
PostgreSQL log file with the debug level cranked up and not a whole
lot else.  Rare events that cause performance to tank are worth
logging, IMHO.

 I started out touching code that called it just sync, but then crossed to
 other code that called it fsync, and made the external UI use that name.
  No objections to sorting that out within my patch so it's consistent.

OK, I'll do that before I commit it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Count backend self-sync calls

2010-11-14 Thread Robert Haas
On Sun, Nov 14, 2010 at 7:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 It might be even better to mention that the reason why we couldn't
 forward the fsync request is that the fsync request queue is full.
 I'm not sure exactly how to phrase that.  I thought about:

 fsync request queue is full

 But that seems not to answer the "so what" question.  There is an
 example like this in the docs:

 could not forward fsync request (fsync request queue is full)

 ...but I'm not sure I like that.

 Well, that example is meant to cover cases where you have to assemble a
 couple of independently created phrases.  In this case I'd suggest
        could not forward fsync request because request queue is full

Sounds good to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
