[PERFORM] stats collector suddenly causing lots of IO

2010-04-20 Thread Chris
I have a lot of centos servers which are running postgres.  Postgres isn't used
that heavily on any of them, but lately, the stats collector process keeps
causing tons of IO load.  It seems to happen only on servers with centos 5.
The versions of postgres that are running are:

8.1.18
8.2.6
8.3.1
8.3.5
8.3.6
8.3.7
8.3.8
8.3.9
8.4.2
8.4.3

I've tried turning off everything under RUNTIME STATISTICS in postgresql.conf
except track_counts (since autovacuum says it needs it), but it seems to have
little effect on the IO caused by the stats collector.
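For reference, the relevant part of the 8.3-style postgresql.conf now looks
roughly like this (the 8.1/8.2 boxes spell these with the older
stats_start_collector/stats_command_string/stats_block_level/stats_row_level
names instead):

    # RUNTIME STATISTICS
    track_activities = off
    track_counts = on     # autovacuum requires this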

Has anyone else noticed this?  Have there been recent kernel changes
that could cause this that anyone knows about?  Since we haven't touched
postgres on these boxes since they were set up initially, I'm a bit baffled as
to what might be causing the problem, and why I can't make it go away short of
kill -STOP.

Any suggestions would be much appreciated!



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Josh Kupershmidt
On Thu, Apr 15, 2010 at 6:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Chris li...@deksai.com writes:
>> I have a lot of centos servers which are running postgres.  Postgres isn't
>> used that heavily on any of them, but lately, the stats collector process
>> keeps causing tons of IO load.  It seems to happen only on servers with
>> centos 5.
>
> Say, I just realized that both of you are complaining about stats
> collector overhead on centos 5 servers.  I hadn't been thinking in terms
> of OS-specific causes, but maybe that is what we need to consider.
> Can you tell me the exact kernel versions you are seeing these problems
> with?

uname -a says ... 2.6.18-92.1.13.el5 #1 SMP ... x86_64, and it's CentOS 5.2.

I'm not sure whether this is related to the stats collector problems
on this machine, but I noticed alarming table bloat in the catalog
tables pg_attribute, pg_attrdef, pg_depend, and pg_type. Perhaps this
has happened slowly over the past few months, but I discovered the
bloat when I ran the query from:
http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html

on the most-active database on this server (OID 16389 from the
pgstat.stat I sent in). See attached table_bloat.txt. The autovacuum
settings for this server haven't been tweaked from the default; they
probably should have been, given the heavy bulk updates/inserts done.
Maybe there's another cause for this extreme catalog bloat, besides
the weak autovacuum settings, though.

Table sizes, according to pg_size_pretty(pg_total_relation_size(...)):
 * pg_attribute: 145 GB
 * pg_attrdef: 85 GB
 * pg_depend: 38 GB
 * pg_type: 3465 MB
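(Those numbers are just the output of queries like:

    SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));

repeated for each of the four tables.)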

I'll try to send in strace outputs later today.

Josh
 schemaname | tablename    | reltuples | relpages | otta | tbloat   | wastedpages | wastedbytes | wastedsize | iname                           | ituples | ipages  | iotta | ibloat  | wastedipages | wastedibytes | wastedisize
------------+--------------+-----------+----------+------+----------+-------------+-------------+------------+---------------------------------+---------+---------+-------+---------+--------------+--------------+-------------
 pg_catalog | pg_attribute |         0 | 12178069 |    0 |      0.0 |    12178069 | 99762741248 | 93 GB      | pg_attribute_relid_attnam_index |       0 | 4525199 |     0 |     0.0 |      4525199 |  37070430208 | 35 GB
 pg_catalog | pg_attribute |         0 | 12178069 |    0 |      0.0 |    12178069 | 99762741248 | 93 GB      | pg_attribute_relid_attnum_index |       0 |  848842 |     0 |     0.0 |       848842 |   6953713664 | 6632 MB
 pg_catalog | pg_attrdef   |       767 |  8789472 |   30 | 292982.4 |     8789442 | 72003108864 | 67 GB      | pg_attrdef_oid_index            |     767 |  361513 |    28 | 12911.2 |       361485 |   2961285120 | 2824 MB
 pg_catalog | pg_attrdef   |       767 |  8789472 |   30 | 292982.4 |     8789442 | 72003108864 | 67 GB      | pg_attrdef_adrelid_adnum_index  |     767 |  359805 |    28 | 12850.2 |       359777 |   2947293184 | 2811 MB
 [snip one bloated user table]
 pg_catalog | pg_depend    |     14270 |  1890691 |   98 |  19292.8 |     1890593 | 15487737856 | 14 GB      | pg_depend_depender_index        |   14270 | 1649557 |    65 | 25377.8 |      1649492 |  13512638464 | 13 GB
 pg_catalog | pg_depend    |     14270 |  1890691 |   98 |  19292.8 |     1890593 | 15487737856 | 14 GB      | pg_depend_reference_index       |   14270 | 1371522 |    65 | 21100.3 |      1371457 |  11234975744 | 10 GB
 [snip some more user tables]
 pg_catalog | pg_type      |      1333 |   385445 |   31 |  12433.7 |      385414 |  3157311488 | 3011 MB    | pg_type_oid_index               |    1333 |   54965 |    24 |  2290.2 |        54941 |    450076672 | 429 MB
 pg_catalog | pg_type      |      1333 |   385445 |   31 |  12433.7 |      385414 |  3157311488 | 3011 MB    | pg_type_typname_nsp_index       |    1333 |    3129 |    24 |   130.4 |         3105 |     25436160 | 24 MB
 [snip]




Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Tom Lane
Josh Kupershmidt schmi...@gmail.com writes:
> I'm not sure whether this is related to the stats collector problems
> on this machine, but I noticed alarming table bloat in the catalog
> tables pg_attribute, pg_attrdef, pg_depend, and pg_type.

Hmm.  That makes me wonder if autovacuum is functioning properly at all.
What does pg_stat_all_tables show for the last vacuum and analyze times
of those tables?  Try something like

select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
from pg_stat_all_tables where schemaname = 'pg_catalog' order by 1;

regards, tom lane



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Tom Lane
Chris li...@deksai.com writes:
> After the file was made larger and I stopped the vacuum process, I started
> seeing the problem.  All other postgres processes were quiet, but the stats
> collector was constantly causing anywhere from 20-60% of the IO on the
> server.  Since all the other postgres processes weren't really doing
> anything, and it is a busy web server which is predominantly MySQL, I'm
> fairly curious as to what it is doing.

Yeah, the stats collector rewrites the stats file every half second, if
there have been any changes since last time --- so the bigger the file,
the more overhead.  (8.4 is smarter about this, but that doesn't help
you on 8.3.)

> I straced the stats collector process.  I wasn't sure what else to trace as
> there wasn't a single other postgres process doing anything.

That strace doesn't really prove much; it's what I'd expect.  Here's
what to do: start a PG session, and strace that session's backend *and*
the stats collector while you manually do VACUUM some-small-table.
The VACUUM command should try to send some messages to the stats collector
process.  I'm wondering if those get dropped somehow.
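(The backend PID for attaching strace is easy to get from the session itself:

    SELECT pg_backend_pid();

and the collector's PID shows up in ps as "postgres: stats collector process".)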

regards, tom lane



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Josh Kupershmidt
On Fri, Apr 16, 2010 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Josh Kupershmidt schmi...@gmail.com writes:
>> I'm not sure whether this is related to the stats collector problems
>> on this machine, but I noticed alarming table bloat in the catalog
>> tables pg_attribute, pg_attrdef, pg_depend, and pg_type.
>
> Hmm.  That makes me wonder if autovacuum is functioning properly at all.
> What does pg_stat_all_tables show for the last vacuum and analyze times
> of those tables?  Try something like
>
> select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum,
>        last_analyze, last_autoanalyze
> from pg_stat_all_tables where schemaname = 'pg_catalog' order by 1;


Output attached. Note that I ran pg_stat_reset() a few days ago.
Josh
# select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum,
         last_analyze, last_autoanalyze
  from pg_stat_all_tables where schemaname = 'pg_catalog' order by 1;

     relname      | n_live_tup | n_dead_tup | last_vacuum |        last_autovacuum        | last_analyze |       last_autoanalyze
------------------+------------+------------+-------------+-------------------------------+--------------+-------------------------------
 pg_aggregate     |          0 |          0 |             |                               |              |
 pg_am            |          0 |          0 |             |                               |              |
 pg_amop          |          0 |          0 |             |                               |              |
 pg_amproc        |          0 |          0 |             |                               |              |
 pg_attrdef       |          2 |    3527675 |             |                               |              |
 pg_attribute     |         56 |   12771286 |             |                               |              |
 pg_auth_members  |         11 |          0 |             |                               |              |
 pg_authid        |         16 |          2 |             |                               |              |
 pg_autovacuum    |          0 |          0 |             |                               |              |
 pg_cast          |          0 |          0 |             |                               |              |
 pg_class         |      30191 |     155416 |             | 2010-04-15 21:14:06.826358+00 |              | 2010-04-15 21:14:06.826358+00
 pg_constraint    |       2435 |        701 |             | 2010-04-15 16:13:33.92506+00  |              | 2010-04-15 16:13:33.92506+00
 pg_conversion    |          0 |          0 |             |                               |              |
 pg_database      |          0 |          0 |             |                               |              |
 pg_depend        |         55 |    4961774 |             |                               |              |
 pg_description   |          0 |          0 |             |                               |              |
 pg_enum          |          0 |          0 |             |                               |              |
 pg_index         |       4603 |       2272 |             | 2010-04-15 16:18:21.289279+00 |              | 2010-04-15 16:18:21.289279+00
 pg_inherits      |          2 |          0 |             |                               |              |
 pg_language      |          0 |          0 |             |                               |              |
 pg_largeobject   |          0 |          0 |             |                               |              |
 pg_listener      |         37 |     117512 |             |                               |              |
 pg_namespace     |          0 |          0 |             |                               |              |
 pg_opclass       |          0 |          0 |             |                               |              |
 pg_operator      |          0 |          0 |             |                               |              |
 pg_opfamily      |          0 |          0 |             |                               |              |
 pg_pltemplate    |          0 |          0 |             |                               |              |
 pg_proc          |          0 |          0 |             |                               |              |
 pg_rewrite       |          4 |          0 |             |                               |              |
 pg_shdepend      |        717 |     172806 |             | 2010-04-15 16:56:26.296809+00 |              | 2010-04-15 16:56:26.296809+00
 pg_shdescription |          0 |          0 |             |                               |              |
 pg_statistic     |          0 |          0 |             |                               |              |
 pg_tablespace    |          0 |          0 |             |                               |              |
 pg_trigger       |       7021 |       1232 |             | 2010-04-15 16:24:55.054294+00 | [rest of output truncated]

Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Tom Lane
Josh Kupershmidt schmi...@gmail.com writes:
> On Fri, Apr 16, 2010 at 11:23 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Hmm.  That makes me wonder if autovacuum is functioning properly at all.
>> What does pg_stat_all_tables show for the last vacuum and analyze times
>> of those tables?  Try something like
>>
>> select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum,
>>        last_analyze, last_autoanalyze
>> from pg_stat_all_tables where schemaname = 'pg_catalog' order by 1;

> Output attached. Note that I ran pg_stat_reset() a few days ago.

Wow.  Well, we have a smoking gun here: for some reason, autovacuum
isn't running, or isn't doing its job if it is.  If it's not running
at all, that would explain failure to prune the stats collector's file
too.

Is there anything in the postmaster log that would suggest autovac
difficulties?

regards, tom lane



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Tom Lane
Josh Kupershmidt schmi...@gmail.com writes:
> I made a small half-empty table like this:
>   CREATE TABLE test_vacuum (i int PRIMARY KEY);
>   INSERT INTO test_vacuum (i) SELECT a FROM generate_series(1,50) AS a;
>   DELETE FROM test_vacuum WHERE RANDOM() < 0.5;
>
> and then ran:
>   VACUUM test_vacuum;
>
> while an strace of the stats collector process was running. Then after
> a few seconds, found the PID of the VACUUM process, and ran strace on
> it. I killed them after the VACUUM finished. Outputs attached.

Huh.  The VACUUM strace clearly shows a boatload of TABPURGE messages
being sent:

sendto(7, \2\0\0\0\350\3\0\...@\0\0\366\0\0\0\324\206\24\321uC\24\320\350)\24\225\345,\24..., 1000, 0, NULL, 0) = 1000
sendto(7, \2\0\0\0\350\3\0\...@\0\0\366\0\0\0c\274?\24\365\323?\24\241n@\24\217\0309\24..., 1000, 0, NULL, 0) = 1000
sendto(7, \2\0\0\0\350\3\0\...@\0\0\366\0\0\0\375z2\24\211\f@\0241\3047\24\357mH\24..., 1000, 0, NULL, 0) = 1000
sendto(7, \2\0\0\0\350\3\0\...@\0\0\366\0\0\0\242\3529\24\234k\'\24\17\227)\24\300\22+\24..., 1000, 0, NULL, 0) = 1000

and the stats collector is receiving them:

recvfrom(7, \2\0\0\0\350\3\0\...@\0\0\366\0\0\0\324\206\24\321uC\24\320\350)\24\225\345,\24..., 1000, 0, NULL, NULL) = 1000
recvfrom(7, \2\0\0\0\350\3\0\...@\0\0\366\0\0\0c\274?\24\365\323?\24\241n@\24\217\0309\24..., 1000, 0, NULL, NULL) = 1000
recvfrom(7, \2\0\0\0\350\3\0\...@\0\0\366\0\0\0\375z2\24\211\f@\0241\3047\24\357mH\24..., 1000, 0, NULL, NULL) = 1000

So this *should* have resulted in the stats file shrinking.  Did you
happen to notice if it did, after you did this?

regards, tom lane



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Josh Kupershmidt
On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Wow.  Well, we have a smoking gun here: for some reason, autovacuum
> isn't running, or isn't doing its job if it is.  If it's not running
> at all, that would explain failure to prune the stats collector's file
> too.

Hrm, well autovacuum is at least trying to do work: it's currently
stuck on those bloated pg_catalog tables, of course. Another developer
killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
after it had been running for two weeks. See current pg_stat_activity
output attached, which shows the three autovacuum workers running plus
two manual VACUUM ANALYZEs I started yesterday.

> Is there anything in the postmaster log that would suggest autovac
> difficulties?

Yup, there are logs from April 1st which I just grepped through. I
attached the redacted output, and I see a few warnings about "[table]
contains more than max_fsm_pages pages with useful free space", as
well as "ERROR: canceling autovacuum task".

Perhaps bumping up max_fsm_pages and making autovacuum settings more
aggressive will help me? I was also planning to run a CLUSTER of those
four bloated pg_catalog tables -- is this safe, particularly for
tables like pg_attrdef which rely on OIDs?

Josh
# SELECT * FROM pg_stat_activity WHERE usename = 'postgres' AND current_query
  ILIKE '%vacuum%';

  datid  |  datname   | procpid | usesysid | usename  |                   current_query                     | waiting |          xact_start           |          query_start          |         backend_start         |  client_addr  | client_port
---------+------------+---------+----------+----------+-----------------------------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+---------------+-------------
 7651893 | master_dev |   20288 |       10 | postgres | VACUUM ANALYZE myschema.mytable;                    | f       | 2010-04-14 16:24:51.421088+00 | 2010-04-14 16:24:51.421088+00 | 2010-04-14 16:24:46.875911+00 | 192.168.24.63 |       54065
 [snip SELECT * FROM pg_stat_activity]
   16389 | master     |    9596 |       10 | postgres | autovacuum: VACUUM ANALYZE pg_catalog.pg_attrdef   | f       | 2010-04-15 21:18:41.583044+00 | 2010-04-15 21:18:41.583044+00 | 2010-04-01 11:25:53.903941+00 |               |
   16389 | master     |   18043 |       10 | postgres | autovacuum: VACUUM ANALYZE pg_catalog.pg_attribute | f       | 2010-04-15 17:18:36.071479+00 | 2010-04-15 17:18:36.071479+00 | 2010-04-15 16:51:50.230827+00 |               |
   16389 | master     |   19608 |       10 | postgres | autovacuum: VACUUM ANALYZE pg_catalog.pg_depend    | f       | 2010-04-15 16:26:41.941069+00 | 2010-04-15 16:26:41.941069+00 | 2010-04-08 17:08:07.417122+00 |               |
   16389 | master     |   20278 |       10 | postgres | VACUUM ANALYZE myschema.mytable;                    | f       | 2010-04-14 16:24:35.7458+00   | 2010-04-14 16:24:35.7458+00   | 2010-04-14 16:24:28.632431+00 | 192.168.24.63 |       54064
(6 rows)
# grep -A 2 -B 2 -i vacuum *.log
postgresql-2010-04-01_00.log-2010-04-01 06:48:40 UTC username1 1.2.3.4 master WARNING:  there is already a transaction in progress
postgresql-2010-04-01_00.log-2010-04-01 06:48:44 UTC    WARNING:  relation "pg_catalog.pg_depend" contains more than "max_fsm_pages" pages with useful free space
postgresql-2010-04-01_00.log:2010-04-01 06:48:44 UTC    HINT:  Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
postgresql-2010-04-01_00.log-2010-04-01 06:48:51 UTC username1 1.2.3.4 master WARNING:  there is already a transaction in progress
postgresql-2010-04-01_00.log-2010-04-01 06:48:52 UTC username1 1.2.3.4 master WARNING:  there is already a transaction in progress
--
[snip connection info]
postgresql-2010-04-01_00.log-2010-04-01 10:41:38 UTC    WARNING:  relation "schema.table" contains more than "max_fsm_pages" pages with useful free space
postgresql-2010-04-01_00.log:2010-04-01 10:41:38 UTC    HINT:  Consider using VACUUM FULL on this relation or increasing the configuration parameter "max_fsm_pages".
postgresql-2010-04-01_00.log-2010-04-01 10:41:46 UTC username1 1.2.3.4 master WARNING:  there is already a transaction in progress
postgresql-2010-04-01_00.log-2010-04-01 10:41:47 UTC username1 1.2.3.4 master WARNING:  there is already a transaction in progress
--
postgresql-2010-04-01_00.log-2010-04-01 10:49:33 UTC username1 1.2.3.4 master WARNING:  there is already a transaction in progress
postgresql-2010-04-01_00.log-2010-04-01 10:49:33

Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Tom Lane
I wrote:
> So this *should* have resulted in the stats file shrinking.  Did you
> happen to notice if it did, after you did this?

Oh, never mind that --- I can see that it did shrink, just from counting
the write() calls in the collector's strace.  So what we have here is a
demonstration that the tabpurge mechanism does work for you, when it's
invoked.  Which is further evidence that for some reason autovacuum is
not running for you.

What I'd suggest at this point is cranking up log_min_messages to DEBUG2
or so in postgresql.conf, restarting the postmaster, and keeping an eye
on the log to see if you can spot anything about why autovac isn't
working.

regards, tom lane



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Tom Lane
Josh Kupershmidt schmi...@gmail.com writes:
> On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Wow.  Well, we have a smoking gun here: for some reason, autovacuum
>> isn't running, or isn't doing its job if it is.  If it's not running
>> at all, that would explain failure to prune the stats collector's file
>> too.

> Hrm, well autovacuum is at least trying to do work: it's currently
> stuck on those bloated pg_catalog tables, of course. Another developer
> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
> after it had been running for two weeks. See current pg_stat_activity
> output attached, which shows the three autovacuum workers running plus
> two manual VACUUM ANALYZEs I started yesterday.

Two weeks?  What have you got the autovacuum cost delays set to?

Once you're up to three AV workers, no new ones can get launched until
one of those finishes or is killed.  So that would explain failure to
prune the stats collector's tables (the tabpurge code is only run during
AV worker launch).  So what we need to figure out is why it's taking so
obscenely long to vacuum these tables ...

regards, tom lane



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Scott Carey

On Apr 16, 2010, at 9:48 AM, Tom Lane wrote:
> Josh Kupershmidt schmi...@gmail.com writes:
>> On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Wow.  Well, we have a smoking gun here: for some reason, autovacuum
>>> isn't running, or isn't doing its job if it is.  If it's not running
>>> at all, that would explain failure to prune the stats collector's file
>>> too.
>>
>> Hrm, well autovacuum is at least trying to do work: it's currently
>> stuck on those bloated pg_catalog tables, of course. Another developer
>> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
>> after it had been running for two weeks. See current pg_stat_activity
>> output attached, which shows the three autovacuum workers running plus
>> two manual VACUUM ANALYZEs I started yesterday.
>
> Two weeks?  What have you got the autovacuum cost delays set to?
>
> Once you're up to three AV workers, no new ones can get launched until
> one of those finishes or is killed.  So that would explain failure to
> prune the stats collector's tables (the tabpurge code is only run during
> AV worker launch).  So what we need to figure out is why it's taking so
> obscenely long to vacuum these tables ...

On any large system with good I/O I have had to significantly increase the
aggressiveness of autovacuum.  Even with the below settings, it doesn't
interfere with other activity (~2200 iops random, ~900 MB/sec sequential
capable I/O).

My relevant autovacuum parameters are (from 'show all'):

 autovacuum                      | on        | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor | 0.1       | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold    | 50        | Minimum number of tuple inserts, updates or deletes prior to analyze.
 autovacuum_freeze_max_age       | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers          | 3         | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_naptime              | 1min      | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay    | 20ms      | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit    | 2000      | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor  | 0.2       | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold     | 50        | Minimum number of tuple updates or deletes prior to vacuum.




For what it is worth, I just went onto one of my systems -- one with lots of
partition tables and temp table creation/destruction -- and looked at the
system tables in question there.

Postgres 8.4, using \dt+ (result trimmed below to the interesting tables):

   Schema   |     Name      | Type  |  Owner   |  Size   | Description
------------+---------------+-------+----------+---------+-------------
 pg_catalog | pg_attrdef    | table | postgres | 195 MB  |
 pg_catalog | pg_attribute  | table | postgres | 1447 MB |
 pg_catalog | pg_class      | table | postgres | 1694 MB |
 pg_catalog | pg_constraint | table | postgres | 118 MB  |
 pg_catalog | pg_depend     | table | postgres | 195 MB  |
 pg_catalog | pg_statistic  | table | postgres | 2300 MB |
 pg_catalog | pg_type       | table | postgres | 181 MB  |


So, I did a vacuum full; reindex table; analyze; sequence on each of these.
I wish I could just CLUSTER them, but the above works.
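Spelled out, for each of the tables above that was roughly:

    VACUUM FULL pg_catalog.pg_attribute;
    REINDEX TABLE pg_catalog.pg_attribute;
    ANALYZE pg_catalog.pg_attribute;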

now the tables are:

   Schema   |     Name      | Type  |  Owner   |  Size   | Description
------------+---------------+-------+----------+---------+-------------
 pg_catalog | pg_attrdef    | table | postgres | 44 MB   |
 pg_catalog | pg_attribute  | table | postgres | 364 MB  |
 pg_catalog | pg_class      | table | postgres | 1694 MB |
 pg_catalog | pg_constraint | table | postgres | 118 MB  |
 pg_catalog | pg_depend     | table | postgres | 195 MB  |
 pg_catalog | pg_statistic  | table | postgres | 656 MB  |
 pg_catalog | pg_type       | table | postgres | 45 MB   |


I've learned to accept about 50% bloat (2x the compacted size) in postgres as 
just the way it usually is on a busy table, but the 3x and 4x bloat of 
statistic, attrdef, and attribute have me wondering.

I have had some 'idle in transaction' connections hanging out from time to time
that have caused issues on this machine that could explain the above
perma-bloat.  That is one thing that could affect the case reported here as
well.  The worst thing about those, is you can't even force kill those
connections from within postgres (pg_cancel_backend doesn't work on them, and
killing them via the OS bounces postgres ...) so you have to hunt down the
offending client.

Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Scott Carey
 
> I have had some 'idle in transaction' connections hanging out from time to
> time that have caused issues on this machine that could explain the above
> perma-bloat.  That is one thing that could affect the case reported here as
> well.  The worst thing about those, is you can't even force kill those
> connections from within postgres (pg_cancel_backend doesn't work on them, and
> killing them via the OS bounces postgres ...) so you have to hunt down the
> offending client.
 

Ooh, I just noticed pg_terminate_backend() ... maybe this will let me kill
annoying "idle in transaction" clients.  I guess this arrived in 8.4?  Hopefully
this won't cause the whole thing to bounce and close all other backends ...
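Something along these lines ought to do it (8.4 catalog column names; the
one-hour cutoff is just an example, and I haven't actually tried this yet):

    SELECT pg_terminate_backend(procpid)
      FROM pg_stat_activity
     WHERE current_query = '<IDLE> in transaction'
       AND xact_start < now() - interval '1 hour';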


 




Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Josh Kupershmidt
On Fri, Apr 16, 2010 at 12:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Josh Kupershmidt schmi...@gmail.com writes:
>> Hrm, well autovacuum is at least trying to do work: it's currently
>> stuck on those bloated pg_catalog tables, of course. Another developer
>> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
>> after it had been running for two weeks. See current pg_stat_activity
>> output attached, which shows the three autovacuum workers running plus
>> two manual VACUUM ANALYZEs I started yesterday.
>
> Two weeks?  What have you got the autovacuum cost delays set to?

SELECT name, current_setting(name), source FROM pg_settings WHERE
source != 'default' AND name ILIKE '%vacuum%';
         name         | current_setting |       source
----------------------+-----------------+--------------------
 vacuum_cost_delay    | 200ms           | configuration file
 vacuum_cost_limit    | 100             | configuration file
 vacuum_cost_page_hit | 6               | configuration file
(3 rows)

I'm guessing these values and the default autovacuum configuration
values need to be cranked significantly to make vacuum much more
aggressive :-(

> Once you're up to three AV workers, no new ones can get launched until
> one of those finishes or is killed.  So that would explain failure to
> prune the stats collector's tables (the tabpurge code is only run during
> AV worker launch).  So what we need to figure out is why it's taking so
> obscenely long to vacuum these tables ...


Hopefully changing those three vacuum_cost_* params will speed up the
manual- and auto-vacuums.. it'll take me a few days to see any
results, since I still need to do something about the bloat that's
already there.

Josh



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Greg Smith

Josh Kupershmidt wrote:
> SELECT name, current_setting(name), source FROM pg_settings WHERE
> source != 'default' AND name ILIKE '%vacuum%';
>          name         | current_setting |       source
> ----------------------+-----------------+--------------------
>  vacuum_cost_delay    | 200ms           | configuration file
>  vacuum_cost_limit    | 100             | configuration file
>  vacuum_cost_page_hit | 6               | configuration file
>
> Hopefully changing those three vacuum_cost_* params will speed up the
> manual- and auto-vacuums..


Those only impact manual VACUUM statements.  There's a different set 
with names like autovacuum_vacuum_cost_delay that control the daemon.  
You can set those to -1 in order to match the regular VACUUM, but 
that's not the default.
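You can check what the daemon will actually use with something like:

    SELECT name, setting, source
      FROM pg_settings
     WHERE name IN ('autovacuum_vacuum_cost_delay',
                    'autovacuum_vacuum_cost_limit');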


You really need to sort out the max_fsm_pages setting too, because until 
that issue goes away these tables are unlikely to ever stop growing.  
And, no, you can't use CLUSTER on the system tables to clean those up.
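One way to size it: a database-wide VACUUM VERBOSE (run as superuser) reports
in its last few lines how many page slots are actually needed, which you can
compare against the current max_fsm_pages setting:

    VACUUM VERBOSE;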


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Josh Kupershmidt
On Fri, Apr 16, 2010 at 2:14 PM, Greg Smith g...@2ndquadrant.com wrote:
> Josh Kupershmidt wrote:
>> SELECT name, current_setting(name), source FROM pg_settings WHERE
>> source != 'default' AND name ILIKE '%vacuum%';
>>          name         | current_setting |       source
>> ----------------------+-----------------+--------------------
>>  vacuum_cost_delay    | 200ms           | configuration file
>>  vacuum_cost_limit    | 100             | configuration file
>>  vacuum_cost_page_hit | 6               | configuration file
>>
>> Hopefully changing those three vacuum_cost_* params will speed up the
>> manual- and auto-vacuums..
>
> Those only impact manual VACUUM statements.  There's a different set with
> names like autovacuum_vacuum_cost_delay that control the daemon.  You can
> set those to -1 in order to match the regular VACUUM, but that's not the
> default.

It looks like I have the default autovacuum_vacuum_cost_limit = -1, which
means it's inheriting the vacuum_cost_limit of 100 I had set. I'll try
bumping vacuum_cost_limit up to 1000 or so.

> You really need to sort out the max_fsm_pages setting too, because until
> that issue goes away these tables are unlikely to ever stop growing.  And,
> no, you can't use CLUSTER on the system tables to clean those up.

I have max_fsm_pages = 524288, but from the hints in the logfiles
this obviously needs to go up much higher. And it seems the only way
to compact the pg_catalog tables is VACUUM FULL + REINDEX on 8.3 -- I
had tried the CLUSTER on my 9.0 machine and wrongly assumed it would
work on 8.3, too.

Josh



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Greg Smith

Josh Kupershmidt wrote:
> And it seems the only way
> to compact the pg_catalog tables is VACUUM FULL + REINDEX on 8.3 -- I
> had tried the CLUSTER on my 9.0 machine and wrongly assumed it would
> work on 8.3, too.

Right; that just got implemented a couple of months ago.  See the news 
from http://www.postgresql.org/community/weeklynews/pwn20100214 for a 
summary of how the code was gyrated around to support that.  This is a 
tough situation to get out of in 9.0 because VACUUM FULL is slow and 
takes an exclusive lock on the table.  That tends to lead toward an 
unpredictable window for required downtime, which is never good.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Tom Lane
Josh Kupershmidt schmi...@gmail.com writes:
>          name         | current_setting |       source
> ----------------------+-----------------+--------------------
>  vacuum_cost_delay    | 200ms           | configuration file
>  vacuum_cost_limit    | 100             | configuration file
>  vacuum_cost_page_hit | 6               | configuration file

> It looks like I have the default autovacuum_vacuum_cost_limit = -1, which
> means it's inheriting the vacuum_cost_limit of 100 I had set. I'll try
> bumping vacuum_cost_limit up to 1000 or so.

Actually I think the main problem is that cost_delay value, which is
probably an order of magnitude too high.  The way to limit vacuum's
I/O impact on other stuff is to make it take frequent short delays,
not have it run full speed and then sleep a long time.  In any case,
your current settings have got it sleeping way too much.  Two WEEKS !!!??

regards, tom lane



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-16 Thread Josh Kupershmidt
On Fri, Apr 16, 2010 at 3:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Josh Kupershmidt schmi...@gmail.com writes:
>>          name         | current_setting |       source
>> ----------------------+-----------------+--------------------
>>  vacuum_cost_delay    | 200ms           | configuration file
>>  vacuum_cost_limit    | 100             | configuration file
>>  vacuum_cost_page_hit | 6               | configuration file
>>
>> It looks like I have the default autovacuum_vacuum_cost_limit = -1, which
>> means it's inheriting the vacuum_cost_limit of 100 I had set. I'll try
>> bumping vacuum_cost_limit up to 1000 or so.
>
> Actually I think the main problem is that cost_delay value, which is
> probably an order of magnitude too high.  The way to limit vacuum's
> I/O impact on other stuff is to make it take frequent short delays,
> not have it run full speed and then sleep a long time.  In any case,
> your current settings have got it sleeping way too much.  Two WEEKS !!!??

Yup, I was going to turn vacuum_cost_delay down to 20. The two weeks
was for pg_attribute, which has bloated to 145 GB, I think. One
of those manual VACUUMs I kicked off just finished, after 48 hours --
and that table was only 25 GB or so. I wasn't the one who set up this
postgresql.conf, but I am stuck fixing things :/
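For the remaining manual VACUUMs I can at least override the file settings
per session, something like:

    SET vacuum_cost_delay = 20;    -- down from 200ms
    SET vacuum_cost_limit = 1000;  -- up from 100
    VACUUM ANALYZE pg_catalog.pg_attrdef;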



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-15 Thread Tom Lane
Chris li...@deksai.com writes:
> I have a lot of centos servers which are running postgres.  Postgres isn't
> used that heavily on any of them, but lately, the stats collector process
> keeps causing tons of IO load.  It seems to happen only on servers with
> centos 5.  The versions of postgres that are running are:
>
> 8.1.18
> 8.2.6
> 8.3.1
> 8.3.5
> 8.3.6
> 8.3.7
> 8.3.8
> 8.3.9
> 8.4.2
> 8.4.3

Do these different server versions really all show the problem to the
same extent?  I'd expect 8.4.x in particular to be cheaper than the
older branches.  Are their pgstat.stat files all of similar sizes?
(Note that 8.4.x keeps pgstat.stat under $PGDATA/pg_stat_tmp/ whereas
in earlier versions it was under $PGDATA/global/.)

If your applications create/use/drop a lot of tables (perhaps temp
tables) then bloat of the pgstat.stat file is to be expected, but
it should get cleaned up by vacuum (including autovacuum).  What is
your vacuuming policy on these servers ... do you use autovacuum?

regards, tom lane



[PERFORM] stats collector suddenly causing lots of IO

2010-04-13 Thread Chris
I have a lot of centos servers which are running postgres.  Postgres isn't used
that heavily on any of them, but lately, the stats collector process keeps
causing tons of IO load.  It seems to happen only on servers with centos 5.
The versions of postgres that are running are:

8.1.18
8.2.6
8.3.1
8.3.5
8.3.6
8.3.7
8.3.8
8.3.9
8.4.2
8.4.3

I've tried turning off everything under RUNTIME STATISTICS in postgresql.conf
except track_counts (since autovacuum says it needs it), but it seems to have
little effect on the IO caused by the stats collector.

Has anyone else noticed this?  Have there been recent kernel changes
that could cause this that anyone knows about?  Since we haven't touched
postgres on these boxes since they were set up initially, I'm a bit baffled as
to what might be causing the problem, and why I can't make it go away short of
kill -STOP.

Any suggestions would be much appreciated!




Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-13 Thread Cédric Villemain
2010/4/13 Chris li...@deksai.com:
> I have a lot of centos servers which are running postgres.  Postgres isn't
> used that heavily on any of them, but lately, the stats collector process
> keeps causing tons of IO load.  It seems to happen only on servers with
> centos 5.  The versions of postgres that are running are:
>
> 8.1.18
> 8.2.6
> 8.3.1
> 8.3.5
> 8.3.6
> 8.3.7
> 8.3.8
> 8.3.9
> 8.4.2
> 8.4.3
>
> I've tried turning off everything under RUNTIME STATISTICS in postgresql.conf
> except track_counts (since autovacuum says it needs it), but it seems to have
> little effect on the IO caused by the stats collector.
>
> Has anyone else noticed this?  Have there been recent kernel changes
> that could cause this that anyone knows about?  Since we haven't touched
> postgres on these boxes since they were set up initially, I'm a bit baffled as
> to what might be causing the problem, and why I can't make it go away short of
> kill -STOP.
>
> Any suggestions would be much appreciated!

The stats file is written to disk every 500ms (this can be changed when
building postgres), but that has been improved in 8.4: the file should now
be written only when needed.

In 8.4 you can also change the directory the stats file is written to with
the config param stats_temp_directory.  Perhaps run a test and change the
filesystem (you might want to try a ramdisk, or another fs -- ext3, XFS,
ext4, depending on your kernel) and see if that changes anything in your
IO load.

Anyway, it looks like it is centos 5 related, so what is your current
running kernel? (And what FS?)
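For example (8.4 only; the mount point here is just an illustration -- it
must exist and be writable by the postgres user, e.g. a tmpfs mount):

    # postgresql.conf
    stats_temp_directory = '/mnt/pg_stats_tmp'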







-- 
Cédric Villemain



Re: [PERFORM] stats collector suddenly causing lots of IO

2010-04-13 Thread Alvaro Herrera
Chris wrote:
> I have a lot of centos servers which are running postgres.  Postgres isn't
> used that heavily on any of them, but lately, the stats collector process
> keeps causing tons of IO load.  It seems to happen only on servers with
> centos 5.

Does this correlate to an increase in size of the pgstat.stat file?
Maybe you could try resetting stats, so that the file goes back to an
initial size and is slowly repopulated.  I'd suggest monitoring the size
of the stats file, just in case there's something abnormal with it.
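From SQL, the file can be watched with pg_stat_file (superuser only; the path
is relative to $PGDATA, and on 8.4 the file lives under pg_stat_tmp/ instead):

    SELECT size, modification
      FROM pg_stat_file('global/pgstat.stat');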

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
