Re: [HACKERS] Adjust autovacuum naptime automatically

2006-08-17 Thread ITAGAKI Takahiro

Matthew T. O'Connor matthew@zeut.net wrote:

Sorry, I should have explained more.

 What is this based on?  That is, based on what information is it 
 deciding to reduce the naptime?

If there are some vacuum or analyze jobs, the naptime is shortened
(i.e., autovacuum is accelerated). And if there are no jobs, the naptime
is lengthened (autovacuum is decelerated).
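
To make the rule above concrete, here is a minimal sketch in Python. The thread does not spell out how the patch computes the new naptime, so the halving/doubling step, the bounds `min_nap` and `max_nap`, and the function name `next_naptime` are all assumptions made for illustration only.

```python
def next_naptime(current, did_work, min_nap=1.0, max_nap=60.0, factor=2.0):
    """Return the next naptime in seconds.

    Hypothetical rule: divide by `factor` when the last pass found vacuum or
    analyze work, multiply by it when the pass was idle, then clamp the
    result to the range [min_nap, max_nap].
    """
    proposed = current / factor if did_work else current * factor
    return max(min_nap, min(max_nap, proposed))


# Three busy passes followed by two idle ones, starting from 60 seconds.
nap = 60.0
history = []
for did_work in [True, True, True, False, False]:
    nap = next_naptime(nap, did_work)
    history.append(nap)
print(history)
```

The clamp is what keeps the naptime from shrinking to zero under a constant load or growing without bound on an idle system.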

 Given that we can now specify the vacuum cost delay settings for 
 autovacuum and disable tables and everything else, I'm not sure we need this 
 anymore, at least not as it was originally designed.  It sounds like 
 Itagaki is doing things a little different with his patch, but I'm not 
 sure I understand it.

I noticed that my method takes a different view from contrib/pg_autovacuum.
My concern is that autovacuum may vacuum too rarely. So if the database seems
to require frequent vacuums, I'll accelerate autovacuum, and vice versa.
If we have a small heavily-updated table and a large rarely-updated table,
we should vacuum the small one soon after the vacuum on the large one is done,
even if the large vacuum takes a long time. But hmm, in such a case it may be
better to run multiple autovacuums in the first place.


 My vision of the maintenance window has always been very simple, that 
 is, during the maintenance window the thresholds get reduced by some 
 factor (probably a GUC variable) so during the day it might take 10,000 
 updates on a table to cause a vacuum but during the naptime it might be 
 10% of that, 1000.  Is this in-line with what others were thinking?

I agree. We can use autovacuum thresholds and cost-delay parameters to
control the frequency and priority of vacuum. I don't think it is good
to control vacuums by changing naptime.
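
As a rough illustration of threshold-based control: autovacuum's documented trigger compares a table's dead tuples against a base threshold plus a scale factor times the table's row count. The sketch below puts a maintenance-window multiplier on top of that formula. The multiplier (`window_factor`), the `in_maintenance_window` flag, and both function names are hypothetical; nothing in this thread defines such a setting.

```python
def vacuum_threshold(reltuples, base_threshold, scale_factor,
                     in_maintenance_window=False, window_factor=0.1):
    """Dead-tuple count above which a table would be vacuumed.

    The first line follows the documented autovacuum formula. The
    maintenance-window scaling is an assumption for illustration.
    """
    threshold = base_threshold + scale_factor * reltuples
    if in_maintenance_window:
        threshold *= window_factor  # e.g. 10% of the daytime threshold
    return threshold


def needs_vacuum(dead_tuples, threshold):
    """True when the table has accumulated more dead tuples than allowed."""
    return dead_tuples > threshold


# Example values chosen so the daytime threshold comes out near 10,000.
day = vacuum_threshold(reltuples=47_500, base_threshold=500, scale_factor=0.2)
window = vacuum_threshold(reltuples=47_500, base_threshold=500,
                          scale_factor=0.2, in_maintenance_window=True)
print(day, window)
print(needs_vacuum(2_000, day), needs_vacuum(2_000, window))
```

With these numbers a table carrying 2,000 dead tuples is left alone during the day but becomes a vacuum candidate inside the window, without touching naptime at all.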


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Adjust autovacuum naptime automatically

2006-08-17 Thread Matthew T. O'Connor

ITAGAKI Takahiro wrote:

Matthew T. O'Connor matthew@zeut.net wrote:
What is this based on?  That is, based on what information is it 
deciding to reduce the naptime?


If there are some vacuum or analyze jobs, the naptime is shortened
(i.e., autovacuum is accelerated). And if there are no jobs, the naptime
is lengthened (autovacuum is decelerated).


Yeah, I looked through the patch after I sent this email.  It's an 
interesting perspective, but I want to see some performance numbers or 
significant bloat reduction before I agree this is a good idea.  Again, 
when a table is busy, constant vacuuming will help keep down bloat, but 
at the expense of throughput.



I noticed that my method takes a different view from contrib/pg_autovacuum.
My concern is that autovacuum may vacuum too rarely. So if the database seems
to require frequent vacuums, I'll accelerate autovacuum, and vice versa.
If we have a small heavily-updated table and a large rarely-updated table,
we should vacuum the small one soon after the vacuum on the large one is done,
even if the large vacuum takes a long time. But hmm, in such a case it may be
better to run multiple autovacuums in the first place.


Yes, I think we are heading in this direction.  As of 8.2 PostgreSQL 
will allow multiple vacuums at the same time (just not on the same 
table).  Autovacuum hasn't been trained on this yet, but I think it will 
be eventually.



I agree. We can use autovacuum thresholds and cost-delay parameters to
control the frequency and priority of vacuum. I don't think it is good
to control vacuums by changing naptime.


Now I'm confused.  Are you now saying that you don't like the concept 
behind your patch?  Or am I misunderstanding?  I think your idea might 
be a good one, I'm just not sure yet.


Matt



Re: [HACKERS] Adjust autovacuum naptime automatically

2006-08-17 Thread Jim C. Nasby
On Thu, Aug 17, 2006 at 03:00:00PM +0900, ITAGAKI Takahiro wrote:
 
 Matthew T. O'Connor matthew@zeut.net wrote:
 
 Sorry, I should have explained more.
 
  What is this based on?  That is, based on what information is it 
  deciding to reduce the naptime?
 
 If there are some vacuum or analyze jobs, the naptime is shortened
 (i.e., autovacuum is accelerated). And if there are no jobs, the naptime
 is lengthened (autovacuum is decelerated).

IMO, the only reason at all for naptime is because there is a
non-trivial cost associated with checking a database to see if any
vacuuming is needed.

One problem that I've run across is that in a cluster with a lot of
databases it can take a very long time to cycle through all of them.

Perhaps a better idea would be to check a number of databases on each
pass. That way you won't bog the server down while checking, but it
won't take as long to get to all the databases.

Also, autovac should immediately continue checking databases after it
finishes vacuuming one. The reason for this is that while vacuuming,
the vacuum_cost_delay settings will almost certainly be in effect, which
will prevent autovac from hammering the system. Since the system won't
be hammered during the vacuum, it's ok to check more databases
immediately after finishing vacuuming on one.
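
One way to read the two suggestions above, sketched in Python. The round-robin queue, the `batch_size` budget, and the rule that a vacuumed database does not count against that budget are assumptions made for illustration; `run_pass`, `needs_vacuum`, and `vacuum` are hypothetical names, not autovacuum internals.

```python
from collections import deque


def run_pass(queue, needs_vacuum, vacuum, batch_size):
    """Run one pass over a round-robin queue of database names.

    Returns the databases checked in this pass. A database that gets
    vacuumed does not use up the budget, so checking continues right after
    the vacuum instead of napping. Each database is visited at most once
    per pass.
    """
    checked = []
    budget = batch_size
    for _ in range(len(queue)):
        if budget <= 0:
            break
        db = queue[0]
        queue.rotate(-1)  # move this database to the back of the queue
        checked.append(db)
        if needs_vacuum(db):
            vacuum(db)    # cost-delay throttling is assumed to apply here
        else:
            budget -= 1   # only idle checks consume the per-pass budget
    return checked


queue = deque(["db1", "db2", "db3", "db4", "db5"])
dirty = {"db2"}
vacuumed = []


def vacuum_db(db):
    vacuumed.append(db)
    dirty.discard(db)


first = run_pass(queue, lambda db: db in dirty, vacuum_db, batch_size=2)
second = run_pass(queue, lambda db: db in dirty, vacuum_db, batch_size=2)
print(first, second, vacuumed)
```

Here two passes cover all five databases, and the pass that had to vacuum `db2` still checks two idle databases before napping.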

Does anyone have any info on how much load there actually is when
checking databases to see if they need vacuuming?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Adjust autovacuum naptime automatically

2006-08-17 Thread Matthew T. O'Connor

Jim C. Nasby wrote:

On Thu, Aug 17, 2006 at 03:00:00PM +0900, ITAGAKI Takahiro wrote:
  
IMO, the only reason at all for naptime is because there is a
non-trivial cost associated with checking a database to see if any
vacuuming is needed.


This cost is reduced significantly in the integrated version as compared 
to the contrib version, but yes still not zero.



One problem that I've run across is that in a cluster with a lot of
databases it can take a very long time to cycle through all of them.

Perhaps a better idea would be to check a number of databases on each
pass. That way you won't bog the server down while checking, but it
won't take as long to get to all the databases.

Also, autovac should immediately continue checking databases after it
finishes vacuuming one. The reason for this is that while vacuuming,
the vacuum_cost_delay settings will almost certainly be in effect, which
will prevent autovac from hammering the system. Since the system won't
be hammered during the vacuum, it's ok to check more databases
immediately after finishing vacuuming on one.
  


This is basically what Itagaki's patch does. 


Does anyone have any info on how much load there actually is when
checking databases to see if they need vacuuming?
  


I haven't.

