Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Jakub Ouhrabka

 You might want to try setting log_autovacuum_min_duration=0 in the
 postgresql.conf

Thanks, tried it. There is nothing in the log - the actual 
vacuum/analyze commands are not run (as there is no query activity). I 
suspect that autovacuum is checking each database if it should run - and 
decides not to run. See the randomly catch process in ps 
output/pg_stat_activity mentioned in earlier mail. I suspect that this 
checking generates the load. Is it possible?


 With this many databases and this high of a statistics target

I've changed the default_statistics_target back to its default (100). No 
change, still stats collector generates load.


 You're really pushing what you can do in a VM with this many
 databases of this size.

Yes, it's a VM but on our dedicated hardware - there are few other 
containers running but they are not generating any load.


What's puzzling me is that there is no database activity (queries, 
connections) and stats collector is still eating CPU.


Kuba

Dne 16.2.2010 8:29, Greg Smith napsal(a):

Jakub Ouhrabka wrote:

I've found similar reports but with older versions of postgres:
http://old.nabble.com/100--of-CPU-utilization-postgres-process-tt27302021.html



Those all looked like a FreeBSD issue, doubt it's related to yours.


The pgstat.stat is ~20MB. There are 650 databases, 140GB total.
default_statistics_target = 1000
The system is running Proxmox linux distribution. PostgreSQL is in
OpenVZ container.


With this many databases and this high of a statistics target, running
in a VM, suspecting autovacuum seems reasonable. You might want to try
setting log_autovacuum_min_duration=0 in the postgresql.conf, restarting
or signalling (pg_ctl reload) the server, and watching just what it's
doing. You might need to reduce how aggressively that runs, or limit the
higher target to only the tables that need it, to get this under
control. You're really pushing what you can do in a VM with this many
databases of this size.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Alvaro Herrera
Jakub Ouhrabka wrote:
  You might want to try setting log_autovacuum_min_duration=0 in the
  postgresql.conf
 
 Thanks, tried it. There is nothing in the log - the actual
 vacuum/analyze commands are not run (as there is no query activity).
 I suspect that autovacuum is checking each database if it should run
 - and decides not to run. See the randomly catch process in ps
 output/pg_stat_activity mentioned in earlier mail. I suspect that
 this checking generates the load. Is it possible?

Yes.  There were some changes that needed to be done to autovacuum so
that it didn't read the stats file too often, but I don't recall if I
got around to it.

Note that autovacuum_naptime=1min (default value) means that it's
checking stats 650 times per minute (there's a throttle IIRC but still).
Maybe you should decrease naptime a bit.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Euler Taveira de Oliveira
Jakub Ouhrabka escreveu:
 These databases are archive databases, so there is no user activity - no
 connected users. But the stats collector generates load - 20-40% of
 modern 2.8GHz core all the time.
 
Did you try to set stats_temp_directory in a RAM based filesystem?

 Any clues what does it cause and how to investigate it?
 
OProfile?


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Jakub Ouhrabka

 Maybe you should decrease naptime a bit.

That did the trick, thanks!

 Yes.  There were some changes that needed to be done to autovacuum so
 that it didn't read the stats file too often, but I don't recall if I
 got around to it.

I looked at the strace output and there are *writes* to the file not 
reads. Why? Is it a consequence of this optimization?


Release notes 8.4:

Reduce I/O load of writing the statistics collection file by writing the 
file only when requested (Martin Pihlak)


Was autovacuum requesting to write this 20MB file 650x per minute?

Anyway, thank you all for the quick answer and precise answers. 
PostgreSQL is really unique in this regard!


Kuba

Dne 16.2.2010 15:10, Alvaro Herrera napsal(a):

Jakub Ouhrabka wrote:

You might want to try setting log_autovacuum_min_duration=0 in the
postgresql.conf


Thanks, tried it. There is nothing in the log - the actual
vacuum/analyze commands are not run (as there is no query activity).
I suspect that autovacuum is checking each database if it should run
- and decides not to run. See the randomly catch process in ps
output/pg_stat_activity mentioned in earlier mail. I suspect that
this checking generates the load. Is it possible?


Yes.  There were some changes that needed to be done to autovacuum so
that it didn't read the stats file too often, but I don't recall if I
got around to it.

Note that autovacuum_naptime=1min (default value) means that it's
checking stats 650 times per minute (there's a throttle IIRC but still).
Maybe you should decrease naptime a bit.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Alvaro Herrera
Jakub Ouhrabka wrote:
  Maybe you should decrease naptime a bit.
 
 That did the trick, thanks!
 
  Yes.  There were some changes that needed to be done to autovacuum so
  that it didn't read the stats file too often, but I don't recall if I
  got around to it.
 
 I looked at the strace output and there are *writes* to the file not
 reads. Why? Is it a consequence of this optimization?
 
 Release notes 8.4:
 
 Reduce I/O load of writing the statistics collection file by writing
 the file only when requested (Martin Pihlak)
 
 Was autovacuum requesting to write this 20MB file 650x per minute?

Yes, exactly.

Ideally, autovacuum would only request a new copy of the file if the one
it got was considerably out of date.  Obviously a tenth of a second is
not old enough.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Jakub Ouhrabka

 Ideally, autovacuum would only request a new copy of the file if the
 one it got was considerably out of date.  Obviously a tenth of a
 second is not old enough.

I've tried to look at it and found that's already implemented - see 
autovac_refresh_stats(). STATS_READ_DELAY which is set to 1s. Am I 
reading the code correctly? If so then 1s is not enough for big clusters.


I guess it would be feasible to crank STATS_READ_DELAY up a little bit, 
say to 10s. What do you think?


Kuba

Dne 16.2.2010 19:59, Alvaro Herrera napsal(a):

Jakub Ouhrabka wrote:

Maybe you should decrease naptime a bit.


That did the trick, thanks!


Yes.  There were some changes that needed to be done to autovacuum so
that it didn't read the stats file too often, but I don't recall if I
got around to it.


I looked at the strace output and there are *writes* to the file not
reads. Why? Is it a consequence of this optimization?

Release notes 8.4:

Reduce I/O load of writing the statistics collection file by writing
the file only when requested (Martin Pihlak)

Was autovacuum requesting to write this 20MB file 650x per minute?


Yes, exactly.

Ideally, autovacuum would only request a new copy of the file if the one
it got was considerably out of date.  Obviously a tenth of a second is
not old enough.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Alvaro Herrera
Jakub Ouhrabka wrote:
  Ideally, autovacuum would only request a new copy of the file if the
  one it got was considerably out of date.  Obviously a tenth of a
  second is not old enough.
 
 I've tried to look at it and found that's already implemented - see
 autovac_refresh_stats(). STATS_READ_DELAY which is set to 1s. Am I
 reading the code correctly? If so then 1s is not enough for big
 clusters.

Note that it says it's not used for autovacuum workers; it's only used
for the autovacuum launcher.  The workers have their own set of
problems, particularly the bit that two of them might choose to vacuum
the same table.  I don't think this is so serious a problem in 8.4, so
maybe we could take out the check that limits it to the launcher.
However, it needs some thought.

You could try removing the if line and make it work unconditionally
and see if it fixes the problem for you, even at the 1s value.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Jakub Ouhrabka wrote:
 Was autovacuum requesting to write this 20MB file 650x per minute?

 Yes, exactly.

 Ideally, autovacuum would only request a new copy of the file if the one
 it got was considerably out of date.  Obviously a tenth of a second is
 not old enough.

Wasn't it you that insisted on a short staleness criterion for autovac
in the first place?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Jakub Ouhrabka wrote:
  Was autovacuum requesting to write this 20MB file 650x per minute?
 
  Yes, exactly.
 
  Ideally, autovacuum would only request a new copy of the file if the one
  it got was considerably out of date.  Obviously a tenth of a second is
  not old enough.
 
 Wasn't it you that insisted on a short staleness criterion for autovac
 in the first place?

well, my current opinion is that we should spend some nonzero amount of
thought into figuring out what to do.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Jakub Ouhrabka

 well, my current opinion is that we should spend some nonzero amount
 of thought into figuring out what to do.

I'd suggest to do it like this:

Do autovac_refresh_stats() once per autovacuum_naptime/2 and share the 
result among all autovacuum workers.


This would guarantee that autovacuum is fired no later than 
autovacuum_naptime after the condition for the run became true.


If it's not that easy to code then don't share it among the workers and 
do it once per worker - typically there are not so many workers.


And for bigger installations document that it's highly recommend to put 
the stats file on ramdisk.


Kuba

Dne 17.2.2010 0:12, Alvaro Herrera napsal(a):

Tom Lane wrote:

Alvaro Herreraalvhe...@commandprompt.com  writes:

Jakub Ouhrabka wrote:

Was autovacuum requesting to write this 20MB file 650x per minute?



Yes, exactly.



Ideally, autovacuum would only request a new copy of the file if the one
it got was considerably out of date.  Obviously a tenth of a second is
not old enough.


Wasn't it you that insisted on a short staleness criterion for autovac
in the first place?


well, my current opinion is that we should spend some nonzero amount of
thought into figuring out what to do.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-16 Thread Jakub Ouhrabka

 This would guarantee that autovacuum is fired no later than
 autovacuum_naptime after the condition for the run became true.

Of course, this unfortunately not true... The guarantee is 1,5x 
autovacuum_naptime. But I'd be happy with it but I agree that's not what 
I'd as a user expect from this parameter.


Kuba

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Problem with 8.4 stats collector high load

2010-02-15 Thread Jakub Ouhrabka

Hi,

sorry for repost but previous message didn't get through. So I'm trying 
another list and sending without attachment which I can send privately 
upon request (strace output mentioned below).


We've migrated some of our databases to 8.4 cluster (from 8.2 and older 
versions).


These databases are archive databases, so there is no user activity - no 
connected users. But the stats collector generates load - 20-40% of 
modern 2.8GHz core all the time.


I've found similar reports but with older versions of postgres:

http://old.nabble.com/100--of-CPU-utilization-postgres-process-tt27302021.html

Any clues what does it cause and how to investigate it?

I'm attaching my findings below - I suspect autovacuum but don't know 
where the problem is exactly.


Thanks,

Kuba

Detailed report:

PostgreSQL 8.4.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc 
(Debian 4.3.2-1.1) 4.3.2, 64-bit


The pgstat.stat is ~20MB. There are 650 databases, 140GB total.

Attached is strace output of stats collector running for 5s.

Restarting postgresql and/or pg_stat_reset() doesn't help.

When I do select * from pg_stat_activity, there is every 3rd try row 
like this:


template1# select * from pg_stat_activity;

datname - some database in the cluster
procpid - changing number
usename - postgres
current_query -  command string not enabled
xact_start - null
query_start - null
backend_start - few milliseconds ago

ps shows autovacuum worker:

21323 0:04 /opt/pg/bin/postmaster -D /var/lib/postgresql/8.4/data
21325 0:00 postgres: writer process
21326 0:00 postgres: wal writer process
21327 3:01 postgres: autovacuum launcher process
21328 22:30 postgres: stats collector process
21355 0:00 postgres: autovacuum worker process name of db

There are only minor modifications to postgresql.conf:

shared_buffers = 512MB
temp_buffers = 2MB
work_mem = 32MB
maintenance_work_mem = 128MB
max_stack_depth = 1MB
fsync = off
wal_buffers = 1MB
checkpoint_segments = 100
effective_cache_size = 2GB
default_statistics_target = 1000

The system is running Proxmox linux distribution. PostgreSQL is in 
OpenVZ container. The kernel is 2.6.18-2-pve. PostgreSQL data files are 
on local xfs filesystem. We don't have much experience with this setup 
yet. But we have a smaller cluster with 8.4 running without this problem 
on other machine. And we have a big 8.2 cluster on this setup without 
this problem.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem with 8.4 stats collector high load

2010-02-15 Thread Greg Smith

Jakub Ouhrabka wrote:

I've found similar reports but with older versions of postgres:
http://old.nabble.com/100--of-CPU-utilization-postgres-process-tt27302021.html 



Those all looked like a FreeBSD issue, doubt it's related to yours.


The pgstat.stat is ~20MB. There are 650 databases, 140GB total.
default_statistics_target = 1000
The system is running Proxmox linux distribution. PostgreSQL is in 
OpenVZ container.


With this many databases and this high of a statistics target, running 
in a VM, suspecting autovacuum seems reasonable.  You might want to try 
setting log_autovacuum_min_duration=0 in the postgresql.conf, restarting 
or signalling (pg_ctl reload) the server, and watching just what it's 
doing.  You might need to reduce how aggressively that runs, or limit 
the higher target to only the tables that need it, to get this under 
control.  You're really pushing what you can do in a VM with this many 
databases of this size.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers