Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-04-05 Thread Peter Geoghegan
On 3 April 2012 12:11, Robert Haas robertmh...@gmail.com wrote:
 Well, for 9.2, I am asking that you rip all that stuff out of the
 patch altogether so we can focus on the stuff that was in the original
 patch.

Given how we're pushed for time, I'm inclined to agree that that is
the best course of action.

Attached revision does not attempt to do anything with strategy
writes/allocations.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


bgwriter_stats_2012_04_05.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-04-05 Thread Jeff Janes
On Wed, Mar 28, 2012 at 7:23 AM, Robert Haas robertmh...@gmail.com wrote:

 Although I liked the idea of separating this out, I wasn't thinking we
 should do it as part of this patch.  It seems like an independent
 project.  At any rate, I strongly agree that counting the number of
 strategy allocations is not really a viable proxy for counting the
 number of backend writes.  You can't know how many of those actually
 got dirtied.

I was thinking that it would be part of this patch, but only because
Greg mentioned the view format stabilizing.  If it is going to be
frozen, I wanted this in it.  But I guess "stabilize" and "fossilize" are
different things...

 Since any backend write is necessarily the result of that backend
 trying to allocate a buffer, I think maybe we should just count
 whether the number of times it was trying to allocate a buffer *using
 a BAS* vs. the number of times it was trying to allocate a buffer *not
 using a BAS*.  That is, decide whether or not it's a strategy write
 not based on whether the buffer came in via a strategy allocation, but
 rather based on whether it's going out as a result of a strategy
 allocation.

Yes, exactly.

Cheers,

Jeff



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-04-05 Thread Robert Haas
On Thu, Apr 5, 2012 at 9:05 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 3 April 2012 12:11, Robert Haas robertmh...@gmail.com wrote:
 Well, for 9.2, I am asking that you rip all that stuff out of the
 patch altogether so we can focus on the stuff that was in the original
 patch.

 Given how we're pushed for time, I'm inclined to agree that that is
 the best course of action.

 Attached revision does not attempt to do anything with strategy
 writes/allocations.

I have committed this after extensive revisions.  I removed the
sync_files count, since no one ever explained what that was good for.
I renamed a bunch of things so that it was clear that these stats were
referring to checkpoints rather than anything else.  I moved the
documentation to what I believe to be the correct place.  I fixed a
few other assorted things, too, and reverted a couple hunks that
didn't seem to me to be adding anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-04-03 Thread Robert Haas
On Sat, Mar 31, 2012 at 5:34 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Since any backend write is necessarily the result of that backend
 trying to allocate a buffer, I think maybe we should just count
 whether the number of times it was trying to allocate a buffer *using
 a BAS* vs. the number of times it was trying to allocate a buffer *not
 using a BAS*.  That is, decide whether or not it's a strategy write
 not based on whether the buffer came in via a strategy allocation, but
 rather based on whether it's going out as a result of a strategy
 allocation.

 I'm not quite sure I understand what you mean here. Are you suggesting
 that I produce a revision that bumps beside FlushBuffer() in
 BufferAlloc(), as a dirty page is evicted/written, while breaking the
 figure out into != BAS_NORMAL and == BAS_NORMAL figures? Would both
 figures be presented as separate columns within pg_stat_bgwriter?

Well, for 9.2, I am asking that you rip all that stuff out of the
patch altogether so we can focus on the stuff that was in the original
patch.

For 9.3, yes, that's what I'm suggesting, although I'll defer to you,
Jeff, and others on whether it's a *good* suggestion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-03-31 Thread Peter Geoghegan
On 28 March 2012 15:23, Robert Haas robertmh...@gmail.com wrote:
 At any rate, I strongly agree that counting the number of
 strategy allocations is not really a viable proxy for counting the
 number of backend writes.  You can't know how many of those actually
 got dirtied.

Sure.

 Since any backend write is necessarily the result of that backend
 trying to allocate a buffer, I think maybe we should just count
 whether the number of times it was trying to allocate a buffer *using
 a BAS* vs. the number of times it was trying to allocate a buffer *not
 using a BAS*.  That is, decide whether or not it's a strategy write
 not based on whether the buffer came in via a strategy allocation, but
 rather based on whether it's going out as a result of a strategy
 allocation.

I'm not quite sure I understand what you mean here. Are you suggesting
that I produce a revision that bumps beside FlushBuffer() in
BufferAlloc(), as a dirty page is evicted/written, while breaking the
figure out into != BAS_NORMAL and == BAS_NORMAL figures? Would both
figures be presented as separate columns within pg_stat_bgwriter?
-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-03-28 Thread Robert Haas
On Thu, Mar 22, 2012 at 6:07 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 23 January 2012 02:08, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I'm finding the backend_writes column pretty unfortunate.  The only
 use I know of for it is to determine if the bgwriter is lagging
 behind.  Yet it doesn't serve even this purpose because it lumps
 together the backend writes due to lagging background writes, and the
 backend writes "by design" due to the use of a buffer access strategy
 during bulk inserts.

 +1 for separating those.

 I decided to have a go myself. Attached patch breaks out strategy
 allocations in pg_stat_bgwriter, but not strategy writes. My thinking
 is that this may serve to approximate non-BAS_NORMAL writes, with the
 considerable advantage of not requiring that I work backwards to
 figure out strategy from some block when backend-local syncing (yes,
 syncing, not writing) a buffer to work out which strategy object
 references the buffer. The bookkeeping that that would likely entail
 seems to make it infeasible.

 Incidentally, it seems Postgres doesn't currently record backend
 writes when the buffer doesn't go on to be sync'd. That seems
 problematic to me, or at the very least a misrepresentation, since
 temporary tables will be written out by the backend for example. Not
 sure if it's worth fixing, though I've added a comment to that effect
 at the site of where backend_writes is bumped.

 I have corrected the md.c bug. This previously would have prevented
 the sync_files (number of relation segments synced) value from being
 valid in non-log_checkpoints configurations.

 I'm not currently confident that the strategy_alloc field is a very
 useful proxy for a strategy_backend_writes field. I think that rather
 than bumping the strategy allocation count analogously to the way the
 overall count is bumped (within StrategyGetBuffer()), I should have
 bumped earlier within BufferAlloc() so that it'd count if the buffer
 was requested with non-BAS_NORMAL strategy but was found in
 shared_buffers (so control never got as far as StrategyGetBuffer()).
 That might make the value line up more closely with the would-be value
 of a strategy_backend_writes column. What do you think?

Although I liked the idea of separating this out, I wasn't thinking we
should do it as part of this patch.  It seems like an independent
project.  At any rate, I strongly agree that counting the number of
strategy allocations is not really a viable proxy for counting the
number of backend writes.  You can't know how many of those actually
got dirtied.

Since any backend write is necessarily the result of that backend
trying to allocate a buffer, I think maybe we should just count
whether the number of times it was trying to allocate a buffer *using
a BAS* vs. the number of times it was trying to allocate a buffer *not
using a BAS*.  That is, decide whether or not it's a strategy write
not based on whether the buffer came in via a strategy allocation, but
rather based on whether it's going out as a result of a strategy
allocation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-03-22 Thread Peter Geoghegan
On 23 January 2012 02:08, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I'm finding the backend_writes column pretty unfortunate.  The only
 use I know of for it is to determine if the bgwriter is lagging
 behind.  Yet it doesn't serve even this purpose because it lumps
 together the backend writes due to lagging background writes, and the
 backend writes "by design" due to the use of a buffer access strategy
 during bulk inserts.

 +1 for separating those.

I decided to have a go myself. Attached patch breaks out strategy
allocations in pg_stat_bgwriter, but not strategy writes. My thinking
is that this may serve to approximate non-BAS_NORMAL writes, with the
considerable advantage of not requiring that I work backwards to
figure out strategy from some block when backend-local syncing (yes,
syncing, not writing) a buffer to work out which strategy object
references the buffer. The bookkeeping that that would likely entail
seems to make it infeasible.

Incidentally, it seems Postgres doesn't currently record backend
writes when the buffer doesn't go on to be sync'd. That seems
problematic to me, or at the very least a misrepresentation, since
temporary tables will be written out by the backend for example. Not
sure if it's worth fixing, though I've added a comment to that effect
at the site of where backend_writes is bumped.

I have corrected the md.c bug. This previously would have prevented
the sync_files (number of relation segments synced) value from being
valid in non-log_checkpoints configurations.

I'm not currently confident that the strategy_alloc field is a very
useful proxy for a strategy_backend_writes field. I think that rather
than bumping the strategy allocation count analogously to the way the
overall count is bumped (within StrategyGetBuffer()), I should have
bumped earlier within BufferAlloc() so that it'd count if the buffer
was requested with non-BAS_NORMAL strategy but was found in
shared_buffers (so control never got as far as StrategyGetBuffer()).
That might make the value line up more closely with the would-be value
of a strategy_backend_writes column. What do you think?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
new file mode 100644
index 840e54a..f714cb8
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*** SELECT pg_stat_get_backend_pid(s.backend
*** 806,811 
--- 806,818 
  the <function>pg_stat_get_buf_alloc</function> function.</entry>
  </row>
  <row>
+   <entry>buffers_strat_alloc</entry>
+   <entry><type>bigint</type></entry>
+   <entry>Number of buffers allocated with non-default buffer access strategy.
+  This value can also be returned by directly calling
+  the <function>pg_stat_get_buf_strat_alloc</function> function.</entry>
+  </row>
+  <row>
<entry>stats_reset</entry>
<entry><type>bigint</type></entry>
<entry>The last time these statistics were reset.
*** SELECT pg_stat_get_backend_pid(s.backend
*** 1703,1708 
--- 1710,1741 
</entry>
   </row>
 
+  <row>
+   <entry><literal><function>pg_stat_get_bgwriter_write_time()</function></literal></entry>
+   <entry><type>bigint</type></entry>
+   <entry>
+ 	   Total amount of time that has been spent in the part of checkpoint
+ 	   processing where files are written to disk, in milliseconds.
+   </entry>
+  </row>
+ 
+  <row>
+   <entry><literal><function>pg_stat_get_bgwriter_sync_time()</function></literal></entry>
+   <entry><type>bigint</type></entry>
+   <entry>
+ 	   Total amount of time that has been spent in the part of checkpoint
+ 	   processing where files are synchronized to disk, in milliseconds.
+   </entry>
+  </row>
+ 
+  <row>
+   <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+   <entry><type>bigint</type></entry>
+   <entry>
+ 	   Total number of files that have been synchronized to disk during
+ 	   checkpoint processing.
+   </entry>
+  </row>
 
   <row>
<entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
new file mode 100644
index ff7f521..e481be3
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** LogCheckpointStart(int flags, bool resta
*** 7492,7501 
  }
  
  /*
!  * Log end of a checkpoint.
   */
  static void
! LogCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  sync_secs,
--- 7492,7501 
  }
  
  /*
!  * Time and potentially log the end of a checkpoint.
   */
  static void
! TimeCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  sync_secs,
*** LogCheckpointEnd(bool restartpoint)
*** 7511,7519 
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
  
- 	

Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-03-09 Thread Robert Haas
On Wed, Feb 22, 2012 at 2:11 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 One beef that I have with the variable name m_write_ms is that ms
 could equally well refer to microseconds or milliseconds, and these
 mistakes are very common.

I would expect ms to mean milliseconds and us to mean microseconds.

 Details of existing comment bug: The docs show that
 pg_stat_*_functions accumulates time in units of milliseconds.
 However, a PgStat_FunctionEntry payload is commented as sending the
 time/self time to the stats collector in microseconds. So this
 comment, in the existing code, is actually wrong:

        PgStat_Counter f_time;          /* times in microseconds */
        PgStat_Counter f_time_self;
 } PgStat_StatFuncEntry;

I think that the comment is not wrong.  The code that populates it
looks like this:

m_ent->f_time = INSTR_TIME_GET_MICROSEC(entry->f_counts.f_time);
m_ent->f_time_self = INSTR_TIME_GET_MICROSEC(entry->f_counts.f_time_self);

If that's not producing microseconds, those macros are badly misnamed.
 Note that the pg_stat_user_functions view divides the return value of
the function by 1000, which is why the view output is in milliseconds.

 As for the substance of the patch, I am in general agreement that this
 is a good idea. Storing the statistics in pg_stat_bgwriter is a more
 flexible approach than is immediately apparent.  Users can usefully
 calculate delta values, as part of the same process through which
 checkpoint tuning is performed by comparing output from select now(),
 * from pg_stat_bgwriter at different intervals. This also puts this
 information within easy reach of monitoring tools.

On further thought, I'll revise the position I took upthread and
concede that write_ms and sync_ms are useful.  Given values of the
stats counters at time A and time B, you can calculate what percentage
of that time was spent in the write phase of some checkpoint, the sync
phase of some checkpoint, and not in any checkpoint.  But I'm still
pretty sketchy on sync_files.  I can't see how you can do anything
useful with that.

 So, I'd ask Greg and/or Jaime to produce a revision of the patch with
 those concerns in mind, as well as fixing the md.c usage of
 log_checkpoints.

Seems we still are waiting for an update of this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-02-22 Thread Peter Geoghegan
On 16 January 2012 06:28, Greg Smith g...@2ndquadrant.com wrote:
 One of the most useful bits of feedback on how well checkpoint I/O is going
 is the amount of time taken to sync files to disk.  Right now the only way
 to get that is to parse the logs.  The attached patch publishes the most
 useful three bits of data you could only get from log_checkpoints before out
 to pg_stat_bgwriter.  Easiest to just show an example:

This is a preliminary review of the patch. Attached is a revision of
the patch that clears up some minor bit-rot and style issues, just as
a convenience to everyone else who might like to try it out against
master who doesn't want to repeat my work. There were oid collisions
against master, and an upstream change in the pg_proc format with the
introduction of the proleakproof column. Regression tests pass (as
they must have in the prior revision, as the original authors modified
expected/rules.out). Note that I have deliberately avoided changing
the patch further than that, though I believe that further changes
will be necessary.

One beef that I have with the variable name m_write_ms is that ms
could equally well refer to microseconds or milliseconds, and these
mistakes are very common. Therefore, you might think about making the
fields of type instr_time, which is consistent with some other such
fields used within the statistics collector - this takes care of
issues with overflows, portability and confusion over units, to as
great an extent as is possible. I'm not even confident that this is
the right thing though, because PgStat_StatFuncEntry, for example,
*does* use PgStat_Counter as the type for time values, in its case
apparently *microseconds* rather than milliseconds, even though the
pg_stat_user_functions view displays values in milliseconds (it's
incorrectly commented). Even the use of instr_time within some
pgstat.h structs sees the value converted to microseconds for
transmission to the collector. It might be that since we don't really
accumulate time values here, it's better to just directly represent
that value as PgStat_Counter, but in that case I'd still clearly mark
the value as being milliseconds. In any case, I would have the columns
in milliseconds, but not ending with *_ms  within pg_stat_bgwriter,
for consistency with pg_stat_user_functions et al.

There is no reason to cast to uint64 when accumulating milliseconds,
as the lvalue is currently a PgStat_Counter.

Details of existing comment bug: The docs show that
pg_stat_*_functions accumulates time in units of milliseconds.
However, a PgStat_FunctionEntry payload is commented as sending the
time/self time to the stats collector in microseconds. So this
comment, in the existing code, is actually wrong:

PgStat_Counter f_time;  /* times in microseconds */
PgStat_Counter f_time_self;
} PgStat_StatFuncEntry;

Can someone commit a fix for that please? It could easily result in a
user-visible bug. Mixing units, and representing them as simple
integers and totally eschewing type safety is sort of a pet peeve of
mine, but consistency with the existing code is more important.

My general impression of the patch's code quality is that it is more
or less idiomatic, and, as Greg said, simple; it generalises from the
style of existing code in LogCheckpointEnd(), as well as existing
fields within the PgStat_MsgBgWriter and PgStat_GlobalStats structs.

As for the substance of the patch, I am in general agreement that this
is a good idea. Storing the statistics in pg_stat_bgwriter is a more
flexible approach than is immediately apparent.  Users can usefully
calculate delta values, as part of the same process through which
checkpoint tuning is performed by comparing output from select now(),
* from pg_stat_bgwriter at different intervals. This also puts this
information within easy reach of monitoring tools.

So, I'd ask Greg and/or Jaime to produce a revision of the patch with
those concerns in mind, as well as fixing the md.c usage of
log_checkpoints. That variable is currently used in these places only:

src/backend/access/transam/xlog.c
80:bool log_checkpoints = false;
7694:* Note: because it is possible for log_checkpoints to change while a
7696:* log_checkpoints is currently off.
7835:   if (log_checkpoints)
8035:   if (log_checkpoints)
8229:* Note: because it is possible for log_checkpoints to change while a
8231:* log_checkpoints is currently off.
8236:   if (log_checkpoints)
8301:   if (log_checkpoints)
8305:   ereport((log_checkpoints ? LOG : DEBUG2),

src/backend/storage/smgr/md.c
1097:   if (log_checkpoints)
1105:   if (log_checkpoints && (!INSTR_TIME_IS_ZERO(sync_start)))

I also agree with separating backend write times due to lagging
background writes, and the backend writes by design, as described by
Jeff. I'm not inclined to worry too much about the usability issues
surrounding the relative values of things like 

Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-23 Thread Greg Smith

Robert Haas wrote:

On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine if the bgwriter is lagging
behind.  Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
 backend writes "by design" due to the use of a buffer access strategy
during bulk inserts.



+1 for separating those.
  


I'm tied up with some on-site things until Friday, can rev the patch 
with this in mind (and clear some of my overall review work pile) then.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-22 Thread Greg Smith

Jeff Janes wrote:

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine if the bgwriter is lagging
behind.  Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
 backend writes "by design" due to the use of a buffer access strategy
during bulk inserts.

If I'm running a tightly controlled benchmark on an otherwise unused
server and I know that no BAS is being used, then I can meaningfully
use backend_writes.  That is a pretty limiting limit.
  


I don't think it's quite that bad in general; this presumes a moderate 
amount of BAS writes relative to other activity.  But I understand your 
concern better now.  I don't think the sorts of workloads you seem to 
have a lot of were considered very carefully before here.



I think we should either create a separate column to segregate BAS
backend_writes, or just don't report them at all and report only the
non-BAS ones into pg_stat_bgwriter.
  


We can't not report them.  One of the goals of pg_stat_bgwriter is to 
account for all writes out of the buffer cache.  If there are enough BAS 
writes on your system that them being lumped together is a serious 
problem, having them go missing altogether would be even worse.  And any 
whacking around of pg_stat_bgwriter might as well fix that too.


Do you think you could put together a quick test case that's similar to 
the ones you're seeing unfair accounting for here?  This isn't quite 
buggy behavior, but a solid example I could test against showing it's a 
sketchy approach would be enough for me to incorporate a fix for it into 
this suggested redesign.





Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-22 Thread Robert Haas
On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I'm finding the backend_writes column pretty unfortunate.  The only
 use I know of for it is to determine if the bgwriter is lagging
 behind.  Yet it doesn't serve even this purpose because it lumps
 together the backend writes due to lagging background writes, and the
 backend writes "by design" due to the use of a buffer access strategy
 during bulk inserts.

+1 for separating those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-21 Thread Jeff Janes
On Sun, Jan 15, 2012 at 10:28 PM, Greg Smith g...@2ndquadrant.com wrote:

 I would like to be able to tell people they need never turn on
 log_checkpoints if they monitor pg_stat_bgwriter instead, because that sets
 a good precedent for what direction the database is going.  It would be nice
 for pg_stat_bgwriter's format to settle down for a long period too.

While on the topic of the format of pg_stat_bgwriter and it converging
on stability:

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine if the bgwriter is lagging
behind.  Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes "by design" due to the use of a buffer access strategy
during bulk inserts.

If I'm running a tightly controlled benchmark on an otherwise unused
server and I know that no BAS is being used, then I can meaningfully
use backend_writes.  That is a pretty limiting limit.

I think we should either create a separate column to segregate BAS
backend_writes, or just don't report them at all and report only the
non-BAS ones into pg_stat_bgwriter.

I know you've argued against this before
(http://archives.postgresql.org/pgsql-performance/2011-03/msg00340.php),
but I think the assumption that all writes are absorbed by the OS
without blocking is assuming an awful lot.  If the server is not
behaving well, then it very well may be blocking on writes.  If we are
confident that the OS can always efficiently cache writes, why bother
with a background writer at all?

I think the argument that it is not important to know which strategy
motivated a backend write could just as validly be reformulated as an
argument that we don't need to know how many backend writes there were
in the first place.

Knowing only an aggregation of BAS-motivated backend writes and
straggling-bgwriter-motivated backend writes just doesn't seem useful
to me.  Am I missing a use for it?  Not only is it probably not
useful, it seems to be actively harmful as I've seen several times
where it leads people down false paths (including me several times, as
about once a year I forget why it is useless and spend time
rediscovering it).

If we are not willing to separate the two types of backend writes, or
just omit the BAS ones altogether, then I would suggest we drop the
column entirely.

Would a patch to separate the backend writes into two columns be
entertained for 9.3?

Cheers,

Jeff



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Hitoshi Harada
On Sun, Jan 15, 2012 at 10:46 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 01/16/2012 01:28 AM, Greg Smith wrote:

 -I can't tell for sure if this is working properly when log_checkpoints is
 off.  This now collects checkpoint end time data in all cases, whereas
 before it ignored that work if log_checkpoints was off.


 ...and there's at least one I missed located already:  inside of md.c.  I'd
 forgotten how many spots where timing calls are optimized out are floating
 around this code path.

I think the CF app page for this patch has a missing link with a wrong message-id.

Thanks,
-- 
Hitoshi Harada



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Alvaro Herrera

Excerpts from Hitoshi Harada's message of Thu Jan 19 07:08:52 -0300 2012:
 On Sun, Jan 15, 2012 at 10:46 PM, Greg Smith g...@2ndquadrant.com wrote:
  On 01/16/2012 01:28 AM, Greg Smith wrote:
 
  -I can't tell for sure if this is working properly when log_checkpoints is
  off.  This now collects checkpoint end time data in all cases, whereas
  before it ignored that work if log_checkpoints was off.
 
 
  ...and there's at least one I missed located already:  inside of md.c.  I'd
  forgotten how many spots where timing calls are optimized out are floating
  around this code path.
 
 I think the CF app page for this patch has a missing link with a wrong message-id.

Fixed, thanks

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Robert Haas
On Mon, Jan 16, 2012 at 1:28 AM, Greg Smith g...@2ndquadrant.com wrote:
 One of the most useful bits of feedback on how well checkpoint I/O is going
 is the amount of time taken to sync files to disk.  Right now the only way
 to get that is to parse the logs.  The attached patch publishes the most
 useful three bits of data you could only get from log_checkpoints before out
 to pg_stat_bgwriter.

It's not quite clear from your email, but I gather that the way that
this is intended to work is that these values increment every time we
checkpoint?

Also, forgive me for asking this possibly-stupid question, but of what
use is this information? I can't imagine why I'd care about a running
total of the number of files fsync'd to disk.  I also can't really
imagine why I'd care about the length of the write phase, which surely
will almost always be a function of checkpoint_completion_target and
checkpoint_timeout unless I manage to overrun the number of
checkpoint_segments I've allocated.  The only number that really seems
useful to me is the time spent syncing.  I have a clear idea what to
look for there: smaller numbers are better than bigger ones.  For the
rest I'm mystified.

And, it doesn't seem like it's necessarily going to save me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be "what percentage of my
total sync time is accounted for by the longest sync?".  And so right
there I'm back to the logs.  It's not clear how such information could
be usefully exposed in pg_stat_bgwriter either, since you probably
want to know only the last few values, not a total over all time.
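Robert's point that a running total hides the last few values is what a per-checkpoint history would address.  A minimal sketch of such a history keeper, assuming a hypothetical collector that records each checkpoint's sync time (nothing like this exists in pg_stat_bgwriter; names and numbers are made up):

```python
from collections import deque

class RecentSyncTimes:
    """Keep only the last N per-checkpoint sync times instead of an
    all-time total.  Hypothetical sketch of a history collector."""

    def __init__(self, keep=10):
        # a deque with maxlen silently discards the oldest entry
        self.recent = deque(maxlen=keep)

    def record(self, sync_time_ms):
        self.recent.append(sync_time_ms)

    def longest(self):
        return max(self.recent) if self.recent else None

history = RecentSyncTimes(keep=3)
for t in [120.0, 95.0, 4800.0, 110.0]:
    history.record(t)
# only the last three checkpoints survive, and the 4800 ms spike
# is still visible without parsing the logs
```

With such a structure, "what was my longest recent sync?" is answerable from a query rather than from log files.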

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Simon Riggs
On Thu, Jan 19, 2012 at 3:52 PM, Robert Haas robertmh...@gmail.com wrote:

 And, it doesn't seem like it's necessarily going to save me a whole
 lot either, because if it turns out that my sync phases are long, the
 first question out of my mouth is going to be "what percentage of my
 total sync time is accounted for by the longest sync?".  And so right
 there I'm back to the logs.

It seems clear that the purpose of this is to quickly and easily
decide whether there is a potential problem.

If you decide there is a potential problem, you may wish to look at
more detailed information.

So there is a huge time saving from avoiding the need to look at the
detail when it's unnecessary and possibly not even enabled.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Greg Smith

On 01/19/2012 10:52 AM, Robert Haas wrote:

It's not quite clear from your email, but I gather that the way that
this is intended to work is that these values increment every time we
checkpoint?


Right--they get updated in the same atomic bump that moves up things 
like buffers_checkpoint.



Also, forgive me for asking this possibly-stupid question, but of what
use is this information? I can't imagine why I'd care about a running
total of the number of files fsync'd to disk.  I also can't really
imagine why I'd care about the length of the write phase, which surely
will almost always be a function of checkpoint_completion_target and
checkpoint_timeout unless I manage to overrun the number of
checkpoint_segments I've allocated.  The only number that really seems
useful to me is the time spent syncing.  I have a clear idea what to
look for there: smaller numbers are better than bigger ones.  For the
rest I'm mystified.


Priority #1 here is to reduce (but, admittedly, not always eliminate) 
the need for log file parsing of this particular area, so including all 
the major bits from the existing log message that can be published this 
way would include the write phase time.  You mentioned one reason why 
the write phase time might be interesting; there could be others.  One 
of the things expected here is that Munin will expand its graphing of 
values from pg_stat_bgwriter to include all these fields.  Most of the 
time the graph of time spent in the write phase will be boring and 
useless.  Making it easy for a look at a graph to spot those rare times 
when it isn't is one motivation for including it.


As for why to include the number of files being sync'd, one reason is 
again simply wanting to include everything that can easily be 
published.  A second is that it helps support ideas like my Checkpoint 
sync pause one; that's untunable in any reasonable way without some 
easy way of monitoring the number of files typically sync'd.  Sometimes 
when I'm investigating checkpoint spikes during sync, I wonder whether 
they were because more files than usual were synced, or if it's instead 
just because of more churn on a smaller number.  Making this easy to 
graph pulls that data out to where I can compare it with disk I/O 
trends.  And there's precedent now proving that an always incrementing 
number in pg_stat_bgwriter can be turned into such a graph easily by 
monitoring tools.
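The increment-only counter pattern Greg describes can be sketched in a few lines: a monitoring tool samples the view once per polling interval and differences successive snapshots.  The column names and numbers below are illustrative stand-ins, not necessarily what the patch exposes:

```python
def interval_delta(prev, curr):
    """Difference two snapshots of ever-increasing counters, the way a
    graphing tool (Munin-style) samples pg_stat_bgwriter periodically.
    Column names here are illustrative."""
    return {col: curr[col] - prev[col] for col in curr}

# two successive SELECT-style samples of cumulative totals (made-up numbers)
prev = {"checkpoint_sync_time": 51000, "checkpoint_sync_files": 840}
curr = {"checkpoint_sync_time": 53400, "checkpoint_sync_files": 866}

delta = interval_delta(prev, curr)
# 2400 ms spent syncing and 26 files synced during this interval
```

Because the counters only ever increase, the per-interval delta is always well-defined (absent a stats reset), which is what makes these fields graphable by existing tools.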



And, it doesn't seem like it's necessarily going to safe me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be what percentage of my
total sync time is accounted for by the longest sync?.  And so right
there I'm back to the logs.  It's not clear how such information could
be usefully exposed in pg_stat_bgwriter either, since you probably
want to know only the last few values, not a total over all time.


This isn't ideal yet.  I mentioned how some future performance event 
logging history collector was really needed as a place to push longest 
sync times into, and we don't have it yet.  This is the best thing to 
instrument that I'm sure is useful, and that I can stick onto with the 
existing infrastructure.


The idea is that this change makes it possible to trigger a "sync times 
are too long" alert out of a tool that's based solely on database 
queries.  When that goes off, yes you're possibly back to the logs again 
for more details about the longest individual sync time.  But the rest 
of the time, what's hopefully the normal state of things, you can ignore 
the logs and just track the pg_stat_bgwriter numbers.
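That query-driven alert amounts to comparing successive samples of the cumulative sync-time counter against a threshold.  A sketch under stated assumptions (the column name and the 5-second threshold are made up for illustration):

```python
LAST_SEEN_MS = None  # cumulative checkpoint_sync_time from the previous poll

def sync_time_alert(cumulative_ms, threshold_ms=5000):
    """Fire when the sync time accumulated since the last poll of
    pg_stat_bgwriter exceeds threshold_ms.  Hypothetical sketch;
    the column name and threshold are assumptions, not the patch's."""
    global LAST_SEEN_MS
    prev, LAST_SEEN_MS = LAST_SEEN_MS, cumulative_ms
    return prev is not None and (cumulative_ms - prev) > threshold_ms

# three successive polls of the (made-up) cumulative counter:
# 51000 -> baseline, 53400 -> +2400 ms (quiet), 61000 -> +7600 ms (alert)
```

Only when the alert fires does the admin need to drop back to log_checkpoints output for the per-file detail.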


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-15 Thread Greg Smith

On 01/16/2012 01:28 AM, Greg Smith wrote:
-I can't tell for sure if this is working properly when 
log_checkpoints is off.  This now collects checkpoint end time data in 
all cases, whereas before it ignored that work if log_checkpoints was off.


...and there's at least one spot I missed that's already been located:  
inside of md.c.  I'd forgotten how many spots where timing calls are 
optimized out are floating around this code path.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

