Re: [HACKERS] Still recommending daily vacuum...

2007-07-07 Thread Jim Nasby

On Jul 3, 2007, at 3:36 PM, Tom Lane wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:

On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:

Is there a reason to say anything beyond use autovac?
There is; I know that things like web session tables aren't handled very
well by autovacuum if there are any moderately large tables (anything
that will take more than a few minutes to vacuum). Eventually we should
be able to accommodate that case with multiple workers, but we'll need a
mechanism to ensure that at least one worker doesn't get tied up in
large vacuums.


And which part of that do you think isn't resolved in 8.3?


It's still possible to tie up all autovac workers in large tables,  
though of course it's now far less likely.


BTW, +1 to dropping the thresholds to a very low value. 0 might be  
pushing it, but 10 or 20 certainly doesn't sound absurd.

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Still recommending daily vacuum...

2007-07-06 Thread Alvaro Herrera
Matthew T. O'Connor wrote:
 Alvaro Herrera wrote:
 Jim C. Nasby wrote:
 FWIW, I normally go with the 8.2 defaults, though I could see dropping
 vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
 could be decreased further, maybe divide by 10.
 
 How about pushing thresholds all the way down to 0?
 
 As long as it handles small (or zero row) tables ok then yes.  The 
 base_threshold in the original contrib autovacuum was just an easy way 
 to not vacuum really small tables too often.  If a table has only 10 
 rows, it's going to get vacuumed every time one row is updated.  I guess 
 that's not a big problem with a table that small but still seems excessive.

Well, if a table has 10 rows, and we keep the current threshold of 1000
rows, then this table must have 1002 dead tuples (99% dead tuples, 1002
dead + 10 live) before being vacuumed.  This seems wasteful because long
before that point, say with 500 dead tuples and only 10 live tuples,
each scan must already wade through all the dead tuples.

Another small table with 100 tuples will be vacuumed on every iteration
as well, even if there are just two dead tuples.  So you are right --
maybe dropping it all the way to 0 is too much.  But a small value of 10
is reasonable?  That will make the 10 tuple table be vacuumed when there
are 10 dead tuples (50% of dead tuples), and the 100 tuple table when
there are 11 (11% of dead tuples).  It decreases quickly to the scale
factor (2%, or do we want to decrease it to 1%?)

Does this sound acceptable?
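[Editor's note: the arithmetic above can be checked with a quick sketch of the autovacuum formula (dead tuples > threshold + reltuples * scale_factor); threshold=10 and scale_factor=0.02 are the values proposed in this message, and the function name is made up for illustration.]

```python
# Sketch of the autovacuum trigger being discussed:
# vacuum when dead tuples exceed threshold + reltuples * scale_factor.
# threshold=10 and scale_factor=0.02 are the proposed values.

def vacuum_trigger(reltuples, threshold=10, scale_factor=0.02):
    """Dead-tuple count at which autovacuum would vacuum the table."""
    return threshold + reltuples * scale_factor

for live in (10, 100):
    trigger = vacuum_trigger(live)
    dead_fraction = trigger / (trigger + live)
    print(f"{live:>4} live rows: vacuum at ~{trigger:.0f} dead tuples "
          f"(~{dead_fraction:.0%} of the table dead)")
```

This roughly reproduces the figures quoted: the 10-row table is vacuumed when about half its tuples are dead, the 100-row table at roughly 11%.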

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this.   (Fotis)
   (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)



Re: [HACKERS] Still recommending daily vacuum...

2007-07-06 Thread Alvaro Herrera
Kevin Grittner wrote:
  On Tue, Jul 3, 2007 at  5:34 PM, in message
 [EMAIL PROTECTED], Alvaro Herrera
 [EMAIL PROTECTED] wrote: 
  Kevin Grittner wrote:
  
  Autovacuum is enabled with very aggressive settings, to cover small
  tables, including one with about 75 rows that can be updated 100 or more
  times per second.  Even with these settings there is zero chance of any
  table of even moderate size hitting the autovacuum threshold between our
  scheduled vacuums.
  
  Sounds like you would be served by setting those specific tables to a
  lower vacuum scale factor (keeping a more normal default for the rest of
  the tables), and having a non-zero vacuum delay setting (to avoid
  excessive I/O consumption).  Have you tried that?
  
 I did play with that, but it doesn't seem to make sense in our environment.
 We have about 100 databases, most of them scattered around the state, and
 any extra maintenance like that has a cost, particularly with the daily
 cluster changing the oid.  Both from doing the math and from experience,
 I can say that the autovacuum only affects the small, frequently updated
 tables, so I could see no benefit.  Am I missing something?  (I can't see
 where this causes any extra I/O.)

There seems to be a misunderstanding here.

1. Cluster does not change the OID.  It only changes the relfilenode.
The file on disk is named something else, but the OID used in the
database remains unchanged.  So if you insert something into
pg_autovacuum it continues to work after a CLUSTER, you don't need to
update the OID.

2. The point of autovacuum is to get rid of maintenance burden, not add
to it.  If you know which tables are small and frequently updated, then
configure those to specific settings that you've found to be optimal,
and then you don't need to worry about vacuuming them any longer.

You already know this but: autovacuum uses a formula to determine which
tables to vacuum.  The formula is based on the number of dead tuples,
the size of the table and two factors that you can configure per table
as well as globally.  If you didn't tune it to match specific tables,
most likely your biggest tables never met the formula's condition, which
is why you were seeing it affecting only the small tables (which met the
condition under the values you configured server-wide).

The extra I/O I was talking about would come from vacuuming one of your
biggest tables, which could cause the amount of I/O to swamp everything
else the server was doing at the time.  Since it never actually touched
the big tables this hasn't happened to you yet.  Do note that autovacuum
uses the vacuum_cost_delay if autovacuum_vacuum_cost_delay is set to the
default value of -1.

 Our tables tend to fall into one of four categories, small tables with high
 update rates, medium tables (millions or tens of millions of rows) with
 thousands or tens of thousands of updates per day, static tables of various
 sizes that are only modified as part of a software release, and big honking
 tables (100s of GB) which are either insert-only or are insert with
 periodic purge of old rows.  Only the first group has a chance of being
 autovacuumed in normal operations.  Even the purges don't cause it to kick
 in.

It could certainly vacuum all your tables.  But one thing to keep in
mind is that as of 8.2, only one autovacuum process can be running.  So
if and when it decides to vacuum the big tables, it will be a long time
before it is able to go back and check the small tables.

This is fixed in 8.3.


  Oh, the tiny, high-update tables occasionally bloat to hundreds or
  thousands of pages because of long-running transactions, so we schedule
  a daily cluster on those, just to keep things tidy.
  
  If you can afford the cluster then there's no problem.  I don't expect
  that to change in 8.3.
  
 Here also we're talking 10 to 20 milliseconds.  I understand that in 8.2
 that leaves a chance of an error, but we seem to have dodged that bullet
 so far.  Has that gotten any safer in 8.3?

Yes, it did, assuming I understood which error you are talking about
(cluster not leaving dead tuples possibly seen by concurrent
transactions).

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Still recommending daily vacuum...

2007-07-06 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Matthew T. O'Connor wrote:
Well, if a table has 10 rows, and we keep the current threshold of 1000
rows, then this table must have 1002 dead tuples (99% dead tuples, 1002
dead + 10 live) before being vacuumed.  This seems wasteful because long
before that point, say with 500 dead tuples and only 10 live tuples,
each scan must already wade through all the dead tuples.

Another small table with 100 tuples will be vacuumed on every iteration
as well, even if there are just two dead tuples.  So you are right --
maybe dropping it all the way to 0 is too much.  But a small value of 10
is reasonable?  That will make the 10 tuple table be vacuumed when there
are 10 dead tuples (50% of dead tuples), and the 100 tuple table when
there are 11 (11% of dead tuples).  It decreases quickly to the scale
factor (2%, or do we want to decrease it to 1%?)


I think it's probably fine. I think the optimal number for the 
base_threshold is probably dependent on the width of the row: for a very 
narrow row, where you might have many on the same page, 20 or 50 might be 
right, but for a very wide table a smaller number might be optimal. 
However, I think it probably doesn't matter much anyway.


Reducing the default to 10 seems fine, and perhaps even removing it as a 
tuning knob.  I think there are too many autovacuum knobs and it 
confuses people.  Is it too late to possibly remove this GUC altogether?





Re: [HACKERS] Still recommending daily vacuum...

2007-07-06 Thread Kevin Grittner
 On Fri, Jul 6, 2007 at  2:19 PM, in message
[EMAIL PROTECTED], Alvaro Herrera
[EMAIL PROTECTED] wrote: 
 Kevin Grittner wrote:

 2. The point of autovacuum is to get rid of maintenance burden, not add
 to it.  If you know which tables are small and frequently updated, then
 configure those to specific settings that you've found to be optimal,
 and then you don't need to worry about vacuuming them any longer.
 
We have 72 counties using the same schema, which has over 300 tables.
(Each has their own server, located in their county, with their data.)
Rather than trying to fine-tune autovacuum for each table in all locations,
we find it more convenient to use general settings which are aggressive
enough for the small, high-update tables, but lax enough to let the big
ones go until a nightly database vacuum.  (That time is pretty slack
anyway, so why not off-load the overhead of the vacuum to those hours?)
 
 If you didn't tune it to match specific tables,
 most likely your biggest tables never met the formula's condition, which
 is why you were seeing it affecting only the small tables (which met the
 condition under the values you configured server-wide).
 
 The extra I/O I was talking about would come from vacuuming one of your
 biggest tables, which could cause the amount of I/O to swamp everything
 else the server was doing at the time.
 
Is there something better about having the autovacuum compete with load
during the week, rather than doing a database vacuum during otherwise
idle weekend hours, immediately after the weekly delete of almost 2% of
the rows?  At the time we run the database vacuum analyze, there is
nothing else running to be swamped.
 
 Our tables tend to fall into one of four categories, small tables with high
 update rates, medium tables (millions or tens of millions of rows) with
 thousands or tens of thousands of updates per day, static tables of various
 sizes that are only modified as part of a software release, and big honking
 tables (100s of GB) which are either insert-only or are insert with
 periodic purge of old rows.  Only the first group has a chance of being
  autovacuumed in normal operations.  Even the purges don't cause it to kick
 in.
 
 It could certainly vacuum all your tables.
 
Well, sure, if we weren't doing a nightly database vacuum.  (Weekly for the
database containing the largest tables, mentioned above.)
 
This all started with the question about whether the documentation should
say anything about vacuum schedules other than enable autovacuum.
My point was that I have a use case where I think that a scheduled vacuum
will be better than leaving everything to autovacuum.  I may not be the only
one, so I'm thinking the documentation should discuss where an explicit
schedule might be useful.
 
The changes sound good, but I don't see the point of having any vacuum
activity during the work week on the big tables in the database I was
describing.  It seems to me that it would result in at least some
performance degradation for the interactive users, and bloat the table,
since we might start inserting before the post-delete vacuum.
 
-Kevin
 




Re: [HACKERS] Still recommending daily vacuum...

2007-07-06 Thread Alvaro Herrera
Kevin Grittner wrote:

 This all started with the question about whether the documentation should
 say anything about vacuum schedules other than enable autovacuum.
 My point was that I have a use case where I think that a scheduled vacuum
 will be better than leaving everything to autovacuum.  I may not be the only
 one, so I'm thinking the documentation should discuss where an explicit
 schedule might be useful.

Fair enough, you are correct that this scenario may be a useful one to
describe in the docs.

There are plans to add some sort of scheduling to autovacuum so that it
chooses different settings based on time/date.  It may be useful for you
as well (though of course they are 8.4 material at best ...), and we may
consider shrinking the docs (or at least reshaping them) a bit at that
time.


 The changes sound good, but I don't see the point of having any vacuum
 activity during the work week on the big tables in the database I was
 describing.

Of course.

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 18.1, W 73º 13' 56.4
A wizard is never late, Frodo Baggins, nor is he early.
 He arrives precisely when he means to.  (Gandalf, en LoTR FoTR)



Re: [HACKERS] Still recommending daily vacuum...

2007-07-05 Thread Michael Paesold

Alvaro Herrera wrote:

So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway.  So why not let the threshold be 0 and be done with it?


For very small tables, setting a threshold of 0 could mean a vacuum 
after every single row update (or every other row). I think that is just 
burning cycles. What about a threshold of 10 or 50, to have at least 
some sanity limit? Even though the cost of vacuum of a small table is 
low, it is still not free, IMHO, no?


Best Regards
Michael Paesold




Re: [HACKERS] Still recommending daily vacuum...

2007-07-05 Thread Florian G. Pflug

Michael Paesold wrote:

Alvaro Herrera wrote:

So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway.  So why not let the threshold be 0 and be done with it?


For very small tables, setting a threshold of 0 could mean a vacuum 
after every single row update (or every other row). I think that is just 
burning cycles. What about a threshold of 10 or 50, to have at least 
some sanity limit? Even though the cost of vacuum of a small table is 
low, it is still not free, IMHO, no?


A bit off-topic (because probably not realistic in an 8.3 timeframe) -
but maybe the threshold should be specified in terms of the expected
number of pages to be freed, instead of specifying a bias for the number
of modified rows as is done now. Then 1 would probably be a reasonable
default, because a vacuum that won't free at least one page seems not
really worth the effort - it won't save any future I/O bandwidth.

Just an idea I got while following this thread...
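[Editor's note: Florian's page-based threshold might look something like the sketch below. The function names and the crude pages-freed estimate are illustrative assumptions, not anything in PostgreSQL.]

```python
# Illustrative sketch of a page-based vacuum threshold: vacuum only when
# the dead tuples are expected to free at least one whole page.

def expected_pages_freed(dead_tuples, tuples_per_page):
    # Crude estimate: every `tuples_per_page` dead tuples reclaim ~1 page.
    return dead_tuples // tuples_per_page

def should_vacuum(dead_tuples, tuples_per_page, page_threshold=1):
    return expected_pages_freed(dead_tuples, tuples_per_page) >= page_threshold

# A table with 20 tuples per page: 5 dead tuples don't yet free a page,
# 40 dead tuples are expected to free about two.
print(should_vacuum(5, tuples_per_page=20))   # False
print(should_vacuum(40, tuples_per_page=20))  # True
```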

greetings, Florian Pflug



Re: [HACKERS] Still recommending daily vacuum...

2007-07-05 Thread Kevin Grittner
 On Tue, Jul 3, 2007 at  5:34 PM, in message
[EMAIL PROTECTED], Alvaro Herrera
[EMAIL PROTECTED] wrote: 
 Kevin Grittner wrote:
 
 Autovacuum is enabled with very aggressive settings, to cover small
 tables, including one with about 75 rows that can be updated 100 or more
 times per second.  Even with these settings there is zero chance of any
 table of even moderate size hitting the autovacuum threshold between our
 scheduled vacuums.
 
 Sounds like you would be served by setting those specific tables to a
 lower vacuum scale factor (keeping a more normal default for the rest of
 the tables), and having a non-zero vacuum delay setting (to avoid
 excessive I/O consumption).  Have you tried that?
 
I did play with that, but it doesn't seem to make sense in our environment.
We have about 100 databases, most of them scattered around the state, and
any extra maintenance like that has a cost, particularly with the daily
cluster changing the oid.  Both from doing the math and from experience,
I can say that the autovacuum only affects the small, frequently updated
tables, so I could see no benefit.  Am I missing something?  (I can't see
where this causes any extra I/O.)
 
Our tables tend to fall into one of four categories, small tables with high
update rates, medium tables (millions or tens of millions of rows) with
thousands or tens of thousands of updates per day, static tables of various
sizes that are only modified as part of a software release, and big honking
tables (100s of GB) which are either insert-only or are insert with
periodic purge of old rows.  Only the first group has a chance of being
autovacuumed in normal operations.  Even the purges don't cause it to kick
in.
 
 In terms of our autovacuum settings, we have several different types of
 databases, and in all of them we seem to do well with these changes from
 the 8.2 defaults, combined with (except for the above configuration) a
 nightly database vacuum:
  
 autovacuum_naptime = 10s
 
 Another change in 8.3 is that the naptime is per-database, i.e. the time
 between two consecutive autovac runs on a database.  So with a setting
 of 10s, if you have 10 databases there will be one autovac run per
 second, whereas on 8.2 there would be one autovac each 10 seconds
 (unless you run out of worker slots).
 
That's fine.  We actually want it every ten seconds in a production
database.  When you can have more updates per second than there are rows
in a small table, frequent vacuums are good.  As long as the table doesn't
bloat too badly, the vacuum is typically 10 to 20 milliseconds.  I'm sure
that part of it is that the table tends to remain fully cached. When these
tables were vacuumed once per minute, we ran into performance problems.
 
 Oh, the tiny, high-update tables occasionally bloat to hundreds or
 thousands of pages because of long-running transactions, so we schedule
 a daily cluster on those, just to keep things tidy.
 
 If you can afford the cluster then there's no problem.  I don't expect
 that to change in 8.3.
 
Here also we're talking 10 to 20 milliseconds.  I understand that in 8.2
that leaves a chance of an error, but we seem to have dodged that bullet
so far.  Has that gotten any safer in 8.3?
 
-Kevin
 




Re: [HACKERS] Still recommending daily vacuum...

2007-07-04 Thread Gregory Stark

Alvaro Herrera [EMAIL PROTECTED] writes:

 FWIW, I normally go with the 8.2 defaults, though I could see dropping
 vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
 could be decreased further, maybe divide by 10.

 How about pushing thresholds all the way down to 0?

My intuition is that the thresholds should be lowered to about 5%.

I may be biased by the TPC-C schema where the largest table, stock, gets a
little over 20 records per page so 5% represents an average of one update per
page. But there's nothing unusual about a table like that. Waiting until 20%
of the table is potentially dead --four dead tuples out of 20 per page in the
stock table case-- seems extravagantly wasteful.
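[Editor's note: the per-page arithmetic above checks out; 20 tuples per page is the rough figure quoted for the stock table.]

```python
# Dead tuples per page implied by a given scale factor, at ~20 tuples/page.
tuples_per_page = 20  # rough figure quoted for the TPC-C stock table

for scale_factor in (0.05, 0.20):
    dead_per_page = tuples_per_page * scale_factor
    print(f"scale factor {scale_factor:.0%}: "
          f"~{dead_per_page:g} dead tuple(s) per page")
```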

I find the idea of lowering the thresholds to 0 sort of intriguing though.
That makes the vacuum delay parameters the primary method to control how
frequently vacuum runs.

Unfortunately vacuum delay settings are hard to get right. The admin needs to
observe how much of an effect the settings have on i/o throughput which varies
from system to system. And using them to control how frequently vacuum runs
would be even harder.

In an ideal world autovacuum would be able to set the delay settings based on
how many updates had happened since the last run started. If more than 5% of
the table was cleaned by vacuum then decrease the delay settings to get this
vacuum to finish sooner and allow fewer updates. If less than 5% of the table
was cleaned by vacuum then increase the delay settings to reduce the
unnecessary impact of vacuum. But that just leaves us back where we started.
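[Editor's note: the feedback loop being described might be sketched like this. The 5% target, the step size, and the function name are illustrative assumptions; nothing like this exists in autovacuum.]

```python
# Hypothetical controller: tune the vacuum cost delay so each run cleans
# roughly a target fraction of the table.

TARGET_FRACTION = 0.05  # aim for ~5% of the table cleaned per vacuum

def adjust_delay(delay_ms, fraction_cleaned, step_ms=2, lo=0, hi=100):
    if fraction_cleaned > TARGET_FRACTION:
        # Garbage accumulated faster than we cleaned: speed vacuum up.
        return max(lo, delay_ms - step_ms)
    if fraction_cleaned < TARGET_FRACTION:
        # Vacuum found little to do: slow down to reduce its I/O impact.
        return min(hi, delay_ms + step_ms)
    return delay_ms

print(adjust_delay(20, 0.08))  # cleaned 8% -> shorten delay to 18
print(adjust_delay(20, 0.01))  # cleaned 1% -> lengthen delay to 22
```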

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] Still recommending daily vacuum...

2007-07-04 Thread Alvaro Herrera
Gregory Stark wrote:
 
 Alvaro Herrera [EMAIL PROTECTED] writes:
 
  FWIW, I normally go with the 8.2 defaults, though I could see dropping
  vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
  could be decreased further, maybe divide by 10.
 
  How about pushing thresholds all the way down to 0?
 
 My intuition is that the thresholds should be lowered to about 5%.
 
 I may be biased by the TPC-C schema where the largest table, stock, gets a
 little over 20 records per page so 5% represents an average of one update per
 page. But there's nothing unusual about a table like that. Waiting until 20%
 of the table is potentially dead --four dead tuples out of 20 per page in the
 stock table case-- seems extravagantly wasteful.
 
 I find the idea of lowering the thresholds to 0 sort of intriguing though.
 That makes the vacuum delay parameters the primary method to control how
 frequently vacuum runs.

I think you are mixing thresholds with scale factors.

vacuum tuples = threshold + reltuples * scale factor

If dead tuples are more than vacuum tuples, autovac does a vacuum.

So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations.  In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway.  So why not let the threshold be 0 and be done with it?
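[Editor's note: plugging illustrative numbers into that formula shows why the flat threshold is nearly irrelevant for a big table but dominant for a small one. The 0.2 scale factor is the 8.2 default; the table sizes are made up.]

```python
# vacuum tuples = threshold + reltuples * scale_factor

def vacuum_point(reltuples, threshold, scale_factor=0.2):
    return threshold + reltuples * scale_factor

for reltuples in (10_000_000, 100):
    for threshold in (0, 1000):
        print(f"{reltuples:>10,} rows, threshold {threshold:>4}: vacuum at "
              f"{vacuum_point(reltuples, threshold):>12,.0f} dead tuples")
```

For the 10M-row table the trigger moves from 2,000,000 to 2,001,000 dead tuples; for the 100-row table it moves from 20 to 1,020.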

I, too, find a 0.2 scale factor a bit high.  But since I don't run any
database, I fear I would be picking numbers out of thin air.

 In an ideal world autovacuum would be able to set the delay settings based on
 how many updates had happened since the last run started. If more than 5% of
 the table was cleaned by vacuum then decrease the delay settings to get this
 vacuum to finish sooner and allow fewer updates. If less than 5% of the table
 was cleaned by vacuum then increase the delay settings to reduce the
 unnecessary impact of vacuum. But that just leaves us back where we started.

Maybe we can construct some smarts based on something like this.  The
equations we currently use are just inherited from contrib autovac,
which didn't have access to much other info.  Integrated autovac can do
much better, I think.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Michael Paesold

Joshua D. Drake wrote:

Alvaro Herrera wrote:

Joshua D. Drake wrote:
Did we change the default autovac parameters for 8.3 (beyond turning 
it on?) because on any reasonably used database, they are way too 
conservative.


We're still on time to change them ...  Any concrete proposals?


I could provide numbers from production high use databases. We could 
probably back those down a little and make more reasonable numbers.


Please do so. Perhaps others can also tell their typical settings.

Best Regards
Michael Paesold



Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Jim C. Nasby
On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
 Joshua D. Drake wrote:
 Alvaro Herrera wrote:
 Joshua D. Drake wrote:
 Did we change the default autovac parameters for 8.3 (beyond turning 
  it on?) because on any reasonably used database, they are way too 
 conservative.
 
 We're still on time to change them ...  Any concrete proposals?
 
 I could provide numbers from production high use databases. We could 
 probably back those down a little and make more reasonable numbers.
 
 Please do so. Perhaps others can also tell their typical settings.

FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Jim C. Nasby
On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
 
 Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
 text now.
 
 Is there a reason to say anything beyond use autovac?

There is; I know that things like web session tables aren't handled very
well by autovacuum if there are any moderately large tables (anything
that will take more than a few minutes to vacuum). Eventually we should
be able to accommodate that case with multiple workers, but we'll need a
mechanism to ensure that at least one worker doesn't get tied up in
large vacuums.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
 Is there a reason to say anything beyond use autovac?

 There is; I know that things like web session tables aren't handled very
 well by autovacuum if there are any moderately large tables (anything
 that will take more than a few minutes to vacuum). Eventually we should
 be able to accommodate that case with multiple workers, but we'll need a
 mechanism to ensure that at least one worker doesn't get tied up in
 large vacuums.

And which part of that do you think isn't resolved in 8.3?

regards, tom lane



Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Alvaro Herrera
Jim C. Nasby wrote:
 On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
  Joshua D. Drake wrote:
  Alvaro Herrera wrote:
  Joshua D. Drake wrote:
  Did we change the default autovac parameters for 8.3 (beyond turning 
   it on?) because on any reasonably used database, they are way too 
  conservative.
  
  We're still on time to change them ...  Any concrete proposals?
  
  I could provide numbers from production high use databases. We could 
  probably back those down a little and make more reasonable numbers.
  
  Please do so. Perhaps others can also tell their typical settings.
 
 FWIW, I normally go with the 8.2 defaults, though I could see dropping
 vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
 could be decreased further, maybe divide by 10.

How about pushing thresholds all the way down to 0?

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"To live and to cease living are imaginary solutions.
 Existence is elsewhere" (Andre Breton)



Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Kevin Grittner
 On Tue, Jul 3, 2007 at  3:36 PM, in message [EMAIL PROTECTED],
Tom Lane [EMAIL PROTECTED] wrote: 
 Jim C. Nasby [EMAIL PROTECTED] writes:
 On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
 Is there a reason to say anything beyond use autovac?
 
 There is; I know that things like web session tables aren't handled very
 well by autovacuum if there are any moderately large tables (anything
 that will take more than a few minutes to vacuum). Eventually we should
 be able to accommodate that case with multiple workers, but we'll need a
 mechanism to ensure that at least one worker doesn't get tied up in
 large vacuums.
 
 And which part of that do you think isn't resolved in 8.3?
 
We have a 406GB table where 304GB is in one table.  The next two tables
are 57GB and 40GB.  Inserts to these three tables are constant during the
business day, along with inserts, updates, and very few deletes to the
other tables.  Database modifications are few and scattered at night and
on weekends.  Virtually all queries are during the business day.  The
large tables are insert only except for a weekend delete of the oldest
one week of data, to keep a rolling set of just over a year.  (No, we
really don't want to go to weekly partitions, if it can be avoided.)
 
Autovacuum is enabled with very aggressive settings, to cover small
tables, including one with about 75 rows that can be updated 100 or more
times per second.  Even with these settings there is zero chance of any
table of even moderate size hitting the autovacuum threshold between our
scheduled vacuums.  When we tried doing a nightly vacuum analyze starting
at the end of business day, it ran well into the next day, and the users
complained of slowness until it stopped.  We changed to a weeknight vacuum
analyze of the volatile tables which aren't in the big three, and a vacuum
analyze of the entire database right after the weekly delete.
 
Isn't this a use case where we don't want to count on autovacuum, both
from a table bloat perspective and the user impact perspective, even under
8.3?
 
In terms of our autovacuum settings, we have several different types of
databases, and in all of them we seem to do well with these changes from
the 8.2 defaults, combined with (except for the above configuration) a
nightly database vacuum:
 
autovacuum_naptime = 10s
autovacuum_vacuum_threshold = 1
autovacuum_analyze_threshold = 1
 
Oh, the tiny, high-update tables occasionally bloat to hundreds or
thousands of pages because of long-running transactions, so we schedule
a daily cluster on those, just to keep things tidy.
 
-Kevin
 




Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Kevin Grittner
 On Tue, Jul 3, 2007 at  5:17 PM, in message
[EMAIL PROTECTED], Kevin Grittner
[EMAIL PROTECTED] wrote: 
  
 We have a 406GB table where 304GB is in one table.  The next two tables

It's probably obvious, but I meant a 406GB database.  Sorry.





Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Alvaro Herrera
Kevin Grittner wrote:

 We have a 406GB table where 304GB is in one table.  The next two tables
 are 57GB and 40GB.  Inserts to these three tables are constant during the
 business day, along with inserts, updates, and very few deletes to the
 other tables.  Database modifications are few and scattered at night and
 on weekends.  Virtually all queries are during the business day.  The
 large tables are insert only except for a weekend delete of the oldest
 one week of data, to keep a rolling set of just over a year.  (No, we
 really don't want to go to weekly partitions, if it can be avoided.)
  
 Autovacuum is enabled with very aggressive settings, to cover small
 tables, including one with about 75 rows that can be updated 100 or more
 times per second.  Even with these settings there is zero chance of any
 table of even moderate size hitting the autovacuum threshold between our
 scheduled vacuums.  When we tried doing a nightly vacuum analyze starting
 at the end of business day, it ran well into the next day, and the users
 complained of slowness until it stopped.  We changed to a weeknight vacuum
 analyze of the volatile tables which aren't in the big three, and a vacuum
 analyze of the entire database right after the weekly delete.

Sounds like you would be served by setting those specific tables to a
lower vacuum scale factor (keeping a more normal default for the rest of
the tables), and having a non-zero vacuum delay setting (to avoid
excessive I/O consumption).  Have you tried that?
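For reference, on 8.2 per-table overrides like these are stored in the
pg_autovacuum system catalog.  A hedged sketch follows; the table name and
values are invented for illustration, and the exact column list should be
checked against your minor version (8.2 also has freeze-age columns):

```sql
-- Illustrative 8.2-style per-table autovacuum override; -1 means
-- "fall back to the corresponding GUC default".  Check the catalog
-- definition (\d pg_autovacuum) for the exact columns on your version.
INSERT INTO pg_autovacuum
       (vacrelid, enabled,
        vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor,
        vac_cost_delay, vac_cost_limit)
VALUES ('big_insert_only_table'::regclass, true,
        1000, 0.01,     -- vacuum at roughly 1% dead tuples
        500,  0.005,    -- analyze somewhat more often
        20,   -1);      -- 20ms cost delay to throttle vacuum I/O
```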

The problem you would still have with 8.2 is that while one of these
tables is being vacuumed the rest won't be vacuumed at all.  In 8.3 the
other tables can still be vacuumed regularly with the big vacuum still
running (a feature I dubbed "multiple workers", but we're still waiting
to hear what name the marketing guys are gonna use).

 In terms of our autovacuum settings, we have several different types of
 databases, and in all of them we seem to do well with these changes from
 the 8.2 defaults, combined with (except for the above configuration) a
 nightly database vacuum:
  
 autovacuum_naptime = 10s

Another change in 8.3 is that the naptime is per-database, i.e. the time
between two consecutive autovac runs on a database.  So with a setting
of 10s, if you have 10 databases there will be one autovac run per
second, whereas on 8.2 there would be one autovac run every 10 seconds
(unless you run out of worker slots).
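The per-database naptime arithmetic above can be sketched as follows (an
illustrative sketch only, not the actual scheduler logic):

```python
# In 8.3, autovacuum_naptime is the interval between two consecutive
# autovacuum runs on the SAME database, so across the whole cluster a
# run starts roughly every naptime / n_databases seconds.

def seconds_between_cluster_runs(naptime_s, n_databases):
    """Approximate interval between autovac run starts, cluster-wide."""
    return naptime_s / n_databases

# naptime = 10s with 10 databases -> a run starts about once per second,
# versus one run every 10 seconds on 8.2 (worker slots permitting).
```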


 Oh, the tiny, high-update tables occasionally bloat to hundreds or
 thousands of pages because of long-running transactions, so we schedule
 a daily cluster on those, just to keep things tidy.

If you can afford the cluster then there's no problem.  I don't expect
that to change in 8.3.

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are.  -- Charles J. Sykes' advice to teenagers



Re: [HACKERS] Still recommending daily vacuum...

2007-07-03 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Jim C. Nasby wrote:

FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.


How about pushing thresholds all the way down to 0?



As long as it handles small (or zero-row) tables OK, then yes.  The 
base_threshold in the original contrib autovacuum was just an easy way 
to avoid vacuuming really small tables too often.  If a table has only 10 
rows, it's going to get vacuumed every time one row is updated.  I guess 
that's not a big problem with a table that small, but it still seems excessive.
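The base_threshold/scale_factor interplay being debated can be sketched as
follows (a hedged illustration of the trigger rule from the docs; the
parameter values are left open since this thread is about changing them):

```python
# Sketch of the autovacuum trigger rule: a table is vacuumed once its
# dead tuples exceed base_threshold + scale_factor * reltuples.

def vacuum_threshold(reltuples, base_threshold, scale_factor):
    """Dead tuples required before autovacuum picks up the table."""
    return base_threshold + scale_factor * reltuples

def needs_vacuum(dead_tuples, reltuples, base_threshold, scale_factor):
    return dead_tuples > vacuum_threshold(reltuples, base_threshold,
                                          scale_factor)

# With base_threshold pushed down to 0 and a 0.2 scale factor, a 10-row
# table hits the threshold after just 2 dead tuples.
```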


If you think this isn't a problem with the current autovacuum, then sure, 
turn it down to zero, and perhaps we can even get rid of it altogether 
in another release or two.






Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Robert Treat
On Monday 02 July 2007 17:52, Jim C. Nasby wrote:
 From
 http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

 Recommended practice for most sites is to schedule a database-wide
 VACUUM once a day at a low-usage time of day, supplemented by more
 frequent vacuuming of heavily-updated tables if necessary. (Some
 installations with extremely high update rates vacuum their busiest
 tables as often as once every few minutes.) If you have multiple
 databases in a cluster, don't forget to VACUUM each one; the program
 vacuumdb  might be helpful.

 Do we still want that to be our formal recommendation? ISTM it would be
 more logical to recommend a combination of autovac, daily vacuumdb -a if
 you can afford it and have a quiet period, and frequent manual vacuuming
 of things like web session tables.

 I'm happy to come up with a patch, but I figure there should be
 consensus first...

I generally recommend trying autovacuum first, augmented by 
vacuum/analyze/reindex if you find trouble.  I won't say there aren't 
workloads that autovacuum won't handle, but in most cases it does fine, and I 
expect that to increase with 8.3. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.

Is there a reason to say anything beyond use autovac?

regards, tom lane



Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Joshua D. Drake

Tom Lane wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:

http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :


Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.

Is there a reason to say anything beyond use autovac?


Did we change the default autovac parameters for 8.3 (beyond turning it 
on)?  Because on any reasonably used database, they are way too conservative.


Joshua D. Drake



regards, tom lane





--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Alvaro Herrera
Joshua D. Drake wrote:
 Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
 http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
 
 Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
 text now.
 
 Is there a reason to say anything beyond use autovac?
 
 Did we change the default autovac parameters for 8.3 (beyond turning it 
 on)?  Because on any reasonably used database, they are way too conservative.

We're still on time to change them ...  Any concrete proposals?

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
When the proper man does nothing (wu-wei),
his thought is felt ten thousand miles. (Lao Tse)



Re: [HACKERS] Still recommending daily vacuum...

2007-07-02 Thread Joshua D. Drake

Alvaro Herrera wrote:

Joshua D. Drake wrote:

Tom Lane wrote:

Jim C. Nasby [EMAIL PROTECTED] writes:

http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :

Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.

Is there a reason to say anything beyond use autovac?
Did we change the default autovac parameters for 8.3 (beyond turning it 
on)?  Because on any reasonably used database, they are way too conservative.


We're still on time to change them ...  Any concrete proposals?


I could provide numbers from production high-use databases. We could 
probably back those down a little and arrive at more reasonable numbers.


Joshua D. Drake





