Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-24 Thread Bruce Momjian

Applied.

---

Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   OK, I have attached a proposed patch to improve this.  I moved the
   pg_clog mention to a new paragraph and linked it to the reason the
   default is relatively low.
  
  The references to vacuum freeze are incorrect; autovacuum does NOT
  do the equivalent of VACUUM FREEZE.  Please stop playing around with
  the perfectly good existing wording.
 
 Uh, so VACUUM FREEZE unconditionally freezes all rows, while vacuum just
 freezes rows who's xid is older than vacuum_freeze_min_age?  I saw that
 in our current docs in reference to VACUUM FREEZE:
 
   Selects aggressive freezing of tuples. Specifying FREEZE is
   equivalent to performing VACUUM with the vacuum_freeze_min_age
   parameter set to zero. The FREEZE option is deprecated and
   will be removed in a future release; set the parameter instead.
 
 Updated patch attached.
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +


 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of dom ago 22 12:51:47 -0400 2010:

 Well, the reason that value is 200 million is for pg_clog cleanup, not
 for xid wraparound protection.  The next sentence does relate to xid
 wraparound, but it seems to fit because the previous sentence ends with
 xid wraparound:
 
 Note that the system will launch autovacuum processes to
 prevent wraparound even when autovacuum is otherwise disabled.
 
 If we were worried about just xid wraparound I assume the value would be
 2 billion.
 
 Do you have a suggestion?  Reorder the items?

I'd add another para before that one saying that this value also
affects pg_clog truncation.  I agree that putting pg_clog truncation as
the first item here is not an improvement.  For most people, having
those pg_clog files there or not is going to be a wash, compared to data
size.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Bruce Momjian's message of dom ago 22 12:51:47 -0400 2010:
 Do you have a suggestion?  Reorder the items?

 I'd add another para before that one saying that this value also
 affects pg_clog truncation.  I agree that putting pg_clog truncation as
 the first item here is not an improvement.  For most people, having
 those pg_clog files there or not is going to be a wash, compared to data
 size.

I was going to suggest that the point about pg_clog should be in a
separate paragraph *after* this one, since it seems like a secondary
issue.  But anyway, I agree with putting this para back as it was and
talking about clog in a separate para.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Alvaro Herrera
Excerpts from Tom Lane's message of lun ago 23 12:40:32 -0400 2010:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Bruce Momjian's message of dom ago 22 12:51:47 -0400 2010:
  Do you have a suggestion?  Reorder the items?
 
  I'd add another para before that one saying that this value also
  affects pg_clog truncation.  I agree that putting pg_clog truncation as
  the first item here is not an improvement.  For most people, having
  those pg_clog files there or not is going to be a wash, compared to data
  size.
 
 I was going to suggest that the point about pg_clog should be in a
 separate paragraph *after* this one, since it seems like a secondary
 issue.  But anyway, I agree with putting this para back as it was and
 talking about clog in a separate para.

Sorry, yes, I was also thinking after.  I don't know what made me
write before but it wasn't clarity of thought.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Josh Berkus
All,

FYI, the system which sparked this original discussion turned out, on
extensive analysis, to have ZFS-level filesystem corruption.  The
polling issues were related to that rather than to Postgres.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Bruce Momjian
Alvaro Herrera wrote:
 Excerpts from Tom Lane's message of lun ago 23 12:40:32 -0400 2010:
  Alvaro Herrera alvhe...@commandprompt.com writes:
   Excerpts from Bruce Momjian's message of dom ago 22 12:51:47 -0400 2010:
   Do you have a suggestion?  Reorder the items?
  
   I'd add another para before that one saying that this value also
   affects pg_clog truncation.  I agree that putting pg_clog truncation as
   the first item here is not an improvement.  For most people, having
   those pg_clog files there or not is going to be a wash, compared to data
   size.
  
  I was going to suggest that the point about pg_clog should be in a
  separate paragraph *after* this one, since it seems like a secondary
  issue.  But anyway, I agree with putting this para back as it was and
  talking about clog in a separate para.
 
 Sorry, yes, I was also thinking after.  I don't know what made me
 write before but it wasn't clarity of thought.

OK, I have attached a proposed patch to improve this.  I moved the
pg_clog mention to a new paragraph and linked it to the reason the
default is relatively low.

Comments?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
Index: doc/src/sgml/config.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.307
diff -c -c -r1.307 config.sgml
*** doc/src/sgml/config.sgml23 Aug 2010 02:43:25 -  1.307
--- doc/src/sgml/config.sgml23 Aug 2010 18:55:02 -
***
*** 4150,4161 
 para
  Specifies the maximum age (in transactions) that a table's
  structnamepg_class/.structfieldrelfrozenxid/ field can
! attain before a commandVACUUM/ operation is forced to allow 
removal
! of old files from the filenamepg_clog/ subdirectory and prevent
! transaction ID wraparound within the table.  Note that the system
! will launch autovacuum processes to prevent wraparound even when
! autovacuum is otherwise disabled.
! The default is 200 million transactions.
  This parameter can only be set at server start, but the setting
  can be reduced for individual tables by
  changing storage parameters.
--- 4150,4165 
 para
  Specifies the maximum age (in transactions) that a table's
  structnamepg_class/.structfieldrelfrozenxid/ field can
! attain before a commandVACUUM/ freeze operation is forced
! to prevent transaction ID wraparound within the table.
! Note that the system will launch autovacuum processes to
! prevent wraparound even when autovacuum is otherwise disabled.
!/para
! 
!para
! Vacuum freeze also allows removal of old files from the
! filenamepg_clog/ subdirectory, which is why the default
! is a relatively low 200 million transactions.
  This parameter can only be set at server start, but the setting
  can be reduced for individual tables by
  changing storage parameters.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 OK, I have attached a proposed patch to improve this.  I moved the
 pg_clog mention to a new paragraph and linked it to the reason the
 default is relatively low.

The references to vacuum freeze are incorrect; autovacuum does NOT
do the equivalent of VACUUM FREEZE.  Please stop playing around with
the perfectly good existing wording.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of lun ago 23 14:55:55 -0400 2010:

 OK, I have attached a proposed patch to improve this.  I moved the
 pg_clog mention to a new paragraph and linked it to the reason the
 default is relatively low.
 
 Comments?

I think the new para doesn't make much sense, in context.  Why does it
say freeze?  How can we expect users to understand how that is
related to this parameter?

 --- 4150,4165 
  para
   Specifies the maximum age (in transactions) that a table's
   structnamepg_class/.structfieldrelfrozenxid/ field can
 ! attain before a commandVACUUM/ freeze operation is forced
 ! to prevent transaction ID wraparound within the table.
 ! Note that the system will launch autovacuum processes to
 ! prevent wraparound even when autovacuum is otherwise disabled.
 !/para
 ! 
 !para
 ! Vacuum freeze also allows removal of old files from the
 ! filenamepg_clog/ subdirectory, which is why the default
 ! is a relatively low 200 million transactions.
   This parameter can only be set at server start, but the setting
   can be reduced for individual tables by
   changing storage parameters.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  OK, I have attached a proposed patch to improve this.  I moved the
  pg_clog mention to a new paragraph and linked it to the reason the
  default is relatively low.
 
 The references to vacuum freeze are incorrect; autovacuum does NOT
 do the equivalent of VACUUM FREEZE.  Please stop playing around with
 the perfectly good existing wording.

Uh, so VACUUM FREEZE unconditionally freezes all rows, while vacuum just
freezes rows who's xid is older than vacuum_freeze_min_age?  I saw that
in our current docs in reference to VACUUM FREEZE:

Selects aggressive freezing of tuples. Specifying FREEZE is
equivalent to performing VACUUM with the vacuum_freeze_min_age
parameter set to zero. The FREEZE option is deprecated and
will be removed in a future release; set the parameter instead.

Updated patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
Index: doc/src/sgml/config.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.307
diff -c -c -r1.307 config.sgml
*** doc/src/sgml/config.sgml	23 Aug 2010 02:43:25 -	1.307
--- doc/src/sgml/config.sgml	23 Aug 2010 18:55:02 -
***
*** 4150,4161 
 para
  Specifies the maximum age (in transactions) that a table's
  structnamepg_class/.structfieldrelfrozenxid/ field can
! attain before a commandVACUUM/ operation is forced to allow removal
! of old files from the filenamepg_clog/ subdirectory and prevent
! transaction ID wraparound within the table.  Note that the system
! will launch autovacuum processes to prevent wraparound even when
! autovacuum is otherwise disabled.
! The default is 200 million transactions.
  This parameter can only be set at server start, but the setting
  can be reduced for individual tables by
  changing storage parameters.
--- 4150,4165 
 para
  Specifies the maximum age (in transactions) that a table's
  structnamepg_class/.structfieldrelfrozenxid/ field can
! attain before a commandVACUUM/ operation is forced
! to prevent transaction ID wraparound within the table.
! Note that the system will launch autovacuum processes to
! prevent wraparound even when autovacuum is otherwise disabled.
!/para
! 
!para
! Vacuum also allows removal of old files from the
! filenamepg_clog/ subdirectory, which is why the default
! is a relatively low 200 million transactions.
  This parameter can only be set at server start, but the setting
  can be reduced for individual tables by
  changing storage parameters.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Bruce Momjian
Alvaro Herrera wrote:
 Excerpts from Bruce Momjian's message of lun ago 23 14:55:55 -0400 2010:
 
  OK, I have attached a proposed patch to improve this.  I moved the
  pg_clog mention to a new paragraph and linked it to the reason the
  default is relatively low.
  
  Comments?
 
 I think the new para doesn't make much sense, in context.  Why does it
 say freeze?  How can we expect users to understand how that is
 related to this parameter?

I have removed the freeze mention per Tom's comment and posted an
updated version that removes the 'freeze' wording.  Are there other
changes needed?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-22 Thread Rob Wultsch
For a documentation patch should this not be back ported to all
relevant versions?

On 8/21/10, Bruce Momjian br...@momjian.us wrote:
 Josh Berkus wrote:

  On further reflection, though: since we put in the BufferAccessStrategy
  code, which was in 8.3, the background writer isn't *supposed* to be
  very much involved in writing pages that are dirtied by VACUUM.  VACUUM
  runs in a small ring of buffers and is supposed to have to clean its own
  dirt most of the time.  So it's wrong to blame this on the bgwriter not
  holding up its end.  Rather, what you need to be thinking about is how
  come vacuum seems to be making lots of pages dirty on only one of these
  machines.

 This is an anti-wraparound vacuum, so it could have something to do with
 the hint bits.  Maybe it's setting the freeze bit on every page, and
 writing them one page at a time?  Still don't understand the call to
 pollsys, even so, though.

 We often mention that we do vacuum freeze for anti-wraparound vacuum,
 but not for pg_clog file removal, which is the primary trigger for
 autovacuum vacuum freezing.  I have added the attached documentation
 patch for autovacuum_freeze_max_age;  back-patched to 9.0.

 --
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com

   + It's impossible for everything to be true. +



-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-22 Thread Bruce Momjian
Rob Wultsch wrote:
 For a documentation patch should this not be back ported to all
 relevant versions?

It is only a minor adjustment and I normally don't backpatch that.

---


 
 On 8/21/10, Bruce Momjian br...@momjian.us wrote:
  Josh Berkus wrote:
 
   On further reflection, though: since we put in the BufferAccessStrategy
   code, which was in 8.3, the background writer isn't *supposed* to be
   very much involved in writing pages that are dirtied by VACUUM.  VACUUM
   runs in a small ring of buffers and is supposed to have to clean its own
   dirt most of the time.  So it's wrong to blame this on the bgwriter not
   holding up its end.  Rather, what you need to be thinking about is how
   come vacuum seems to be making lots of pages dirty on only one of these
   machines.
 
  This is an anti-wraparound vacuum, so it could have something to do with
  the hint bits.  Maybe it's setting the freeze bit on every page, and
  writing them one page at a time?  Still don't understand the call to
  pollsys, even so, though.
 
  We often mention that we do vacuum freeze for anti-wraparound vacuum,
  but not for pg_clog file removal, which is the primary trigger for
  autovacuum vacuum freezing.  I have added the attached documentation
  patch for autovacuum_freeze_max_age;  back-patched to 9.0.
 
  --
Bruce Momjian  br...@momjian.ushttp://momjian.us
EnterpriseDB http://enterprisedb.com
 
+ It's impossible for everything to be true. +
 
 
 
 -- 
 Rob Wultsch
 wult...@gmail.com

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-22 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 We often mention that we do vacuum freeze for anti-wraparound vacuum,
 but not for pg_clog file removal, which is the primary trigger for
 autovacuum vacuum freezing.  I have added the attached documentation
 patch for autovacuum_freeze_max_age;  back-patched to 9.0.

This patch does not actually seem like an improvement.  The paragraph is
all about transaction age, but you've inserted something entirely
unrelated, and not only that but chosen to make the unrelated thing seem
like the major consequence and anti-wraparound an afterthought.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-22 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  We often mention that we do vacuum freeze for anti-wraparound vacuum,
  but not for pg_clog file removal, which is the primary trigger for
  autovacuum vacuum freezing.  I have added the attached documentation
  patch for autovacuum_freeze_max_age;  back-patched to 9.0.
 
 This patch does not actually seem like an improvement.  The paragraph is
 all about transaction age, but you've inserted something entirely
 unrelated, and not only that but chosen to make the unrelated thing seem
 like the major consequence and anti-wraparound an afterthought.

Well, the reason that value is 200 million is for pg_clog cleanup, not
for xid wraparound protection.  The next sentence does relate to xid
wraparound, but it seems to fit because the previous sentence ends with
xid wraparound:

Note that the system will launch autovacuum processes to
prevent wraparound even when autovacuum is otherwise disabled.

If we were worried about just xid wraparound I assume the value would be
2 billion.

Do you have a suggestion?  Reorder the items?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-21 Thread Bruce Momjian
Josh Berkus wrote:
 
  On further reflection, though: since we put in the BufferAccessStrategy
  code, which was in 8.3, the background writer isn't *supposed* to be
  very much involved in writing pages that are dirtied by VACUUM.  VACUUM
  runs in a small ring of buffers and is supposed to have to clean its own
  dirt most of the time.  So it's wrong to blame this on the bgwriter not
  holding up its end.  Rather, what you need to be thinking about is how
  come vacuum seems to be making lots of pages dirty on only one of these
  machines.
 
 This is an anti-wraparound vacuum, so it could have something to do with
 the hint bits.  Maybe it's setting the freeze bit on every page, and
 writing them one page at a time?  Still don't understand the call to
 pollsys, even so, though.

We often mention that we do vacuum freeze for anti-wraparound vacuum,
but not for pg_clog file removal, which is the primary trigger for
autovacuum vacuum freezing.  I have added the attached documentation
patch for autovacuum_freeze_max_age;  back-patched to 9.0.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
Index: doc/src/sgml/config.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.304
diff -c -c -r1.304 config.sgml
*** doc/src/sgml/config.sgml	17 Aug 2010 04:37:20 -	1.304
--- doc/src/sgml/config.sgml	22 Aug 2010 02:30:39 -
***
*** 4150,4156 
 para
  Specifies the maximum age (in transactions) that a table's
  structnamepg_class/.structfieldrelfrozenxid/ field can
! attain before a commandVACUUM/ operation is forced to prevent
  transaction ID wraparound within the table.  Note that the system
  will launch autovacuum processes to prevent wraparound even when
  autovacuum is otherwise disabled.
--- 4150,4157 
 para
  Specifies the maximum age (in transactions) that a table's
  structnamepg_class/.structfieldrelfrozenxid/ field can
! attain before a commandVACUUM/ operation is forced to allow removal
! of old files from the filenamepg_clog/ subdirectory and prevent
  transaction ID wraparound within the table.  Note that the system
  will launch autovacuum processes to prevent wraparound even when
  autovacuum is otherwise disabled.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus

 What I find interesting about that trace is the large proportion of
 writes.  That appears to me to indicate that it's *not* a matter of
 vacuum delays, or at least not just a matter of that.  The process seems
 to be getting involved in having to dump dirty buffers to disk.  Perhaps
 the background writer is malfunctioning?

You appear to be correct in that it's write-related.  Will be testing on
what specificially is producing it.

Note that this is one of two ostensibly duplicate servers, and the issue
has never appeared on the other server.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 What I find interesting about that trace is the large proportion of
 writes.  That appears to me to indicate that it's *not* a matter of
 vacuum delays, or at least not just a matter of that.  The process seems
 to be getting involved in having to dump dirty buffers to disk.  Perhaps
 the background writer is malfunctioning?

 You appear to be correct in that it's write-related.  Will be testing on
 what specificially is producing it.

 Note that this is one of two ostensibly duplicate servers, and the issue
 has never appeared on the other server.

On further reflection, though: since we put in the BufferAccessStrategy
code, which was in 8.3, the background writer isn't *supposed* to be
very much involved in writing pages that are dirtied by VACUUM.  VACUUM
runs in a small ring of buffers and is supposed to have to clean its own
dirt most of the time.  So it's wrong to blame this on the bgwriter not
holding up its end.  Rather, what you need to be thinking about is how
come vacuum seems to be making lots of pages dirty on only one of these
machines.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus

 On further reflection, though: since we put in the BufferAccessStrategy
 code, which was in 8.3, the background writer isn't *supposed* to be
 very much involved in writing pages that are dirtied by VACUUM.  VACUUM
 runs in a small ring of buffers and is supposed to have to clean its own
 dirt most of the time.  So it's wrong to blame this on the bgwriter not
 holding up its end.  Rather, what you need to be thinking about is how
 come vacuum seems to be making lots of pages dirty on only one of these
 machines.

This is an anti-wraparound vacuum, so it could have something to do with
the hint bits.  Maybe it's setting the freeze bit on every page, and
writing them one page at a time?  Still don't understand the call to
pollsys, even so, though.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Rather, what you need to be thinking about is how
 come vacuum seems to be making lots of pages dirty on only one of these
 machines.

 This is an anti-wraparound vacuum, so it could have something to do with
 the hint bits.  Maybe it's setting the freeze bit on every page, and
 writing them one page at a time?

That would explain all the writes, but it doesn't seem to explain why
your two servers aren't behaving similarly.

 Still don't understand the call to pollsys, even so, though.

Most likely that's the libc implementation of the select()-based sleeps
for vacuum_cost_delay.  I'm still suspicious that the writes are eating
more cost_delay points than you think.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 
 This is an anti-wraparound vacuum, so it could have something to
 do with the hint bits.  Maybe it's setting the freeze bit on
 every page, and writing them one page at a time?
 
 That would explain all the writes, but it doesn't seem to explain
 why your two servers aren't behaving similarly.
 
One was bulk-loaded from the other, or they were bulk-loaded at
different times?  Or one had some other activity that boosted the
xid count, possibly in another database?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus

 That would explain all the writes, but it doesn't seem to explain why
 your two servers aren't behaving similarly.

Well, that's why I said ostensibly identical.  There may in fact be
differences, not just in the databases but in some OS libs as well.
These servers have been in production for quite a while, and the owner
has a messy deployment process.

 Most likely that's the libc implementation of the select()-based sleeps
 for vacuum_cost_delay.  I'm still suspicious that the writes are eating
 more cost_delay points than you think.

Tested that.  It does look like if I increase vacuum_cost_limit to 1
and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
2-3 before each pollsys.  The math seems completely wrong on that,
though -- it should be 50 and 30 pages, or similar.  If I can, I'll test
a vacuum without cost_delay and make sure the pollsys() are connected to
the cost delay and not something else.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus

 Tested that.  It does look like if I increase vacuum_cost_limit to 1
 and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
 2-3 before each pollsys.  The math seems completely wrong on that,
 though -- it should be 50 and 30 pages, or similar.  If I can, I'll test
 a vacuum without cost_delay and make sure the pollsys() are connected to
 the cost delay and not something else.

Hmmm.  Looks like, at least in 8.3, running a manual vacuum on a table
doesn't prevent anti-wraparound vacuum from restarting.   So I can't do
any further testing until we can restart the server.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Most likely that's the libc implementation of the select()-based sleeps
 for vacuum_cost_delay.  I'm still suspicious that the writes are eating
 more cost_delay points than you think.

 Tested that.  It does look like if I increase vacuum_cost_limit to 1
 and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
 2-3 before each pollsys.  The math seems completely wrong on that,
 though -- it should be 50 and 30 pages, or similar.

I think there could be a lot of cost_delay points getting expended
without any effects visible at the level of strace.  Maybe try fooling
with vacuum_cost_page_hit and vacuum_cost_page_miss, too.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Josh Berkus
All,

This is something I'd swear we fixed around 8.3.2. However, I'm seeing
it again in production, and was wondering if anyone could remember what
the root cause was and how we fixed it.

The problem is that sometimes (but not the majority of times) autovaccum
with cost_delay is going into a pathological cycle where it polls the
system clock after reading every single disk page of a table. On large
tables, this results in vacuum not completing within the lifetime of the
server.  In most cases, killing autovaccuum and restarting it will cause
it to behave normally.

The below is the truss from the exhibited issue on 8.3.11 on Solaris
10u7, compiled with sun cc:

pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
lseek(39, 0x28F88000, SEEK_SET) = 0x28F88000
write(39, 0E10\0\0E0 CB5C101\001\0.., 8192)   = 8192
lseek(39, 0x28FCA000, SEEK_SET) = 0x28FCA000
read(39,  q\r\0\0 `9CD2B001\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  F0E\0\090A888 H01\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  q\r\0\0C819D3B001\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
lseek(39, 0x28F9, SEEK_SET) = 0x28F9
write(39, 0E10\0\0 0 gB7C101\001\0.., 8192)   = 8192
lseek(39, 0x28FD, SEEK_SET) = 0x28FD
read(39,  q\r\0\0 X 8D3B001\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  t0F\0\0 H +8F !01\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  q\r\0\0F0 sD3B001\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  F0E\0\0 0C888 H01\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
lseek(39, 0x28FDA000, SEEK_SET) = 0x28FDA000
read(39,  q\r\0\0C0D1D3B001\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  q\r\0\0D8F0D3B001\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  F0E\0\0800189 H01\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  q0F\0\0D0 ^A9F701\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  F0E\0\010 ?89 H01\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  q\r\0\0 x mD4B001\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  F0E\0\0 X _89 H01\001\0.., 8192)= 8192
pollsys(0xFD7FFFDF9030, 0, 0xFD7FFFDF90C0, 0x) = 0
read(39,  q\r\0\0 @ADD4B001\001\0.., 8192)= 8192

For contrast, this is normal behavior:

read(10,  }\0\0\0 X82 E301\0\0\0.., 8192)= 8192
read(10,  }\0\0\018 4 ME301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0E881 NE301\0\0\0.., 8192)= 8192
semop(16777221, 0xFD7FFFDF8FB8, 1)  = 0
read(10,  }\0\0\0 PEE \E301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0\b k ^E301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0 8E0 jE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0 P07 nE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0D885 xE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0  8D }E301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0 xD280E301\0\0\0.., 8192)= 8192
read(10,  }\0\0\010DF8CE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0E09E8EE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0C8E29CE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\080889EE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0B0 UADE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0C0E4BCE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0   !C0E301\0\0\0.., 8192)= 8192
read(10,  }\0\0\010 UCDE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0F8EBCEE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\08092DDE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0A8 QDFE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0 x cEDE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0D8 EFE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0 P15FAE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0C8C0FDE301\0\0\0.., 8192)= 8192
read(10,  }\0\0\0 PFC\tE401\0\0\0.., 8192)= 8192
read(10,  }\0\0\0 8C3\rE401\0\0\0.., 8192)= 8192
read(10,  }\0\0\0 @890FE401\0\0\0.., 8192)= 8192
read(10,  }\0\0\0   r11E401\0\0\0.., 8192)= 8192


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Joe Conway
On 08/16/2010 11:24 AM, Josh Berkus wrote:
 All,
 
 This is something I'd swear we fixed around 8.3.2. However, I'm seeing
 it again in production, and was wondering if anyone could remember what
 the root cause was and how we fixed it.

I've also recently heard a report of vacuum hanging on 8.3.x on Solaris
Sparc. Any chance you can get a backtrace from a build with debug symbols?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Josh Berkus

 I've also recently heard a report of vacuum hanging on 8.3.x on Solaris
 Sparc. Any chance you can get a backtrace from a build with debug symbols?

The problem is that we haven't been able to reproduce the bug in
testing.  Like I said, it only seems to happen occasionally ... like
maybe once in 10 or 20 (or more?) autovacuums.  We've never been seen it
with a manual vacuum at all.

And we can't rebuild the production servers.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 This is something I'd swear we fixed around 8.3.2. However, I'm seeing
 it again in production, and was wondering if anyone could remember what
 the root cause was and how we fixed it.

Hmm, I can't find anything in the 8.3-series CVS logs suggesting that
there was a post-8.3.0 fix related to vacuum delays.

 The problem is that sometimes (but not the majority of times) autovaccum
 with cost_delay is going into a pathological cycle where it polls the
 system clock after reading every single disk page of a table.

What I find interesting about that trace is the large proportion of
writes.  That appears to me to indicate that it's *not* a matter of
vacuum delays, or at least not just a matter of that.  The process seems
to be getting involved in having to dump dirty buffers to disk.  Perhaps
the background writer is malfunctioning?

Another idea that comes to mind is that you have vacuum_cost_page_dirty
set to an unreasonably large value, so that autovac is blocking whenever
it has to write even one page.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Joe Conway
On 08/16/2010 12:12 PM, Josh Berkus wrote:
 
 I've also recently heard a report of vacuum hanging on 8.3.x on Solaris
 Sparc. Any chance you can get a backtrace from a build with debug symbols?
 
 The problem is that we haven't been able to reproduce the bug in
 testing.  Like I said, it only seems to happen occasionally ... like
 maybe once in 10 or 20 (or more?) autovacuums.  We've never been seen it
 with a manual vacuum at all.
 
 And we can't rebuild the production servers.

Hmmm, well I don't know how to reproduce it on demand either -- I'll try
to get a backtrace from the wild if possible. I'll keep you posted...

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Alvaro Herrera
Excerpts from Joe Conway's message of lun ago 16 16:47:19 -0400 2010:
 On 08/16/2010 12:12 PM, Josh Berkus wrote:
  
  I've also recently heard a report of vacuum hanging on 8.3.x on Solaris
  Sparc. Any chance you can get a backtrace from a build with debug symbols?
  
  The problem is that we haven't been able to reproduce the bug in
  testing.  Like I said, it only seems to happen occasionally ... like
  maybe once in 10 or 20 (or more?) autovacuums.  We've never been seen it
  with a manual vacuum at all.
  
  And we can't rebuild the production servers.
 
 Hmmm, well I don't know how to reproduce it on demand either -- I'll try
 to get a backtrace from the wild if possible. I'll keep you posted...

FWIW there's also a report of it hanging in FreeBSD, but sadly when the
process is inspected under truss, it dies because of its parent PID
attribute changing underneath and thus triggering the safety feature
that makes it die if the parent postmaster disappears.

I suspect that the problem may lie in the cost_delay rebalance code in
autovacuum.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Alvaro Herrera
Excerpts from Alvaro Herrera's message of lun ago 16 16:58:31 -0400 2010:

 I suspect that the problem may lie in the cost_delay rebalance code in
 autovacuum.

Hmm, so we have this code:

void
AutoVacuumUpdateDelay(void)
{
if (MyWorkerInfo)
{
VacuumCostDelay = MyWorkerInfo-wi_cost_delay;
VacuumCostLimit = MyWorkerInfo-wi_cost_limit;
}
}

where the MyWorkerInfo bits come from shared memory and can be modified
by other autovac worker processes.  We could read an incomplete value
into our variables.  But this only makes sense if an int variable can
be subject to a partial read/write, which we already assume not to be so
(c.f. GetNewTransactionId).

In any case, if you happen to see this reoccur, could you please attach
GDB to the misbehaving worker and see what VacuumCostDelay and
VacuumCostLimit print out as?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Josh Berkus

 Another idea that comes to mind is that you have vacuum_cost_page_dirty
 set to an unreasonably large value, so that autovac is blocking whenever
 it has to write even one page.

Nope.  Default.  And total cost was raised to 1000.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers