Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Robert Haas
On Fri, Apr 30, 2010 at 6:50 PM, Josh Berkus j...@agliodbs.com wrote:
 Which is the opposite of my experience; currently we have several
 clients who have issues which required more-frequent analyzes on
 specific tables.

That's all fine, but probably not too relevant to the original
complaint - the OP backed off the default settings by several orders
of magnitude, which might very well cause a problem with both VACUUM
and ANALYZE.

I don't have a stake in the ground on what the right settings are, but
I think it's fair to say that if you vacuum OR analyze much less
frequently than what we recommend by default, it might break.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Scott Marlowe
On Wed, Apr 28, 2010 at 8:20 AM, Thomas Kellerer spam_ea...@gmx.net wrote:
 Rick, 22.04.2010 22:42:

 So, in a large table, the scale_factor is the dominant term. In a
 small table, the threshold is the dominant term. But both are taken into
 account.

 The default values are set for small tables; it is not being run for
 large tables.

 With 8.4 you can adjust the autovacuum settings per table...

You can with 8.3 as well, but it's done not with ALTER TABLE but with
entries in the pg_autovacuum system table.
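A sketch of both approaches (the table name and values here are just
placeholders, and the 8.3 column list should be double-checked against your
own catalog before use):

```sql
-- 8.4+: per-table autovacuum settings as storage parameters
ALTER TABLE hot_table SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 200
);

-- 8.3: insert a row into the pg_autovacuum catalog instead
-- (columns: vacrelid, enabled, vac_base_thresh, vac_scale_factor,
--  anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit,
--  freeze_min_age, freeze_max_age; -1 falls back to the global setting)
INSERT INTO pg_autovacuum
VALUES ('hot_table'::regclass, true, 200, 0.05, 100, 0.02, -1, -1, -1, -1);
```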



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Scott Marlowe
On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus j...@agliodbs.com wrote:
 Which is the opposite of my experience; currently we have several
 clients who have issues which required more-frequent analyzes on
 specific tables.   Before 8.4, vacuuming more frequently, especially on
 large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
 with 8.4 it's not something you want to increase without thinking about
 the tradeoff

Actually I would think that statement would be that before 8.3
vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
had to do with moving FSM to disk, making seldom vacuumed tables
easier to keep track of, and making autovac work better in the
presence of long running transactions.  The ability to tune IO load
etc was basically unchanged in 8.4.
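(For reference, the cost-based I/O throttling knobs in question, present in
both releases; the values shown are the shipped defaults as I recall them,
so verify against your own postgresql.conf:)

```
#vacuum_cost_delay = 0ms                # manual VACUUM: no throttling
#vacuum_cost_limit = 200                # work credits before sleeping
#autovacuum_vacuum_cost_delay = 20ms    # autovacuum sleeps 20ms per burst
#autovacuum_vacuum_cost_limit = -1      # -1 = use vacuum_cost_limit
```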



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Greg Smith

Robert Haas wrote:

I don't have a stake in the ground on what the right settings are, but
I think it's fair to say that if you vacuum OR analyze much less
frequently than what we recommend by default, it might break.
  


I think the default settings are essentially minimum recommended 
frequencies.  They aren't too terrible for the giant data warehouse case 
Josh was suggesting they came from--waiting until there's 20% worth of 
dead stuff before kicking off an intensive vacuum is OK when vacuum is 
expensive and you're mostly running big queries anyway.  And for smaller 
tables, the threshold helps it kick in a little earlier.  It's unlikely 
anyone wants to *increase* those, so that autovacuum runs even less; out 
of the box it's not tuned to run very often at all.


If anything, I'd expect people to want to increase how often it runs, 
for tables where much less than 20% dead is a problem.  The most common 
situation I've seen where that's the case is when you have a hotspot of 
heavily updated rows in a large table, and this may match some of the 
situations that Robert was alluding to seeing.  Let's say you have a big 
table where 0.5% of the users each update their respective records 
heavily, averaging 30 times each.  That's only going to result in 15% 
dead rows, so no autovacuum.  But latency for those users will suffer 
greatly, because they might have to do lots of seeking around to get 
their little slice of the data.
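(A rough way to spot that situation is the stats collector; this query is a
sketch against pg_stat_user_tables, whose n_live_tup/n_dead_tup counters are
available in 8.3 and 8.4:)

```sql
-- Dead-row percentage per table; tables that sit well under the 20%
-- scale factor never get picked up by autovacuum at the defaults.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1)
           AS pct_dead
FROM pg_stat_user_tables
ORDER BY pct_dead DESC NULLS LAST;
```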


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 If anything, I'd expect people to want to increase how often it runs, 
 for tables where much less than 20% dead is a problem.  The most common 
 situation I've seen where that's the case is when you have a hotspot of 
 heavily updated rows in a large table, and this may match some of the 
 situations that Robert was alluding to seeing.  Let's say you have a big 
 table where 0.5% of the users each update their respective records 
 heavily, averaging 30 times each.  That's only going to result in 15% 
 dead rows, so no autovacuum.  But latency for those users will suffer 
 greatly, because they might have to do lots of seeking around to get 
 their little slice of the data.

With a little luck, HOT will alleviate that case, since HOT updates can
be reclaimed without running vacuum per se.  I agree there's a risk
there though.

Now that partial vacuum is available, it'd be a real good thing to
revisit these numbers.
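(One way to check whether HOT is in fact absorbing such a workload is the
update counters in pg_stat_user_tables; a sketch, using the n_tup_hot_upd
column added in 8.3:)

```sql
-- Fraction of updates that were HOT, i.e. reclaimable without vacuum
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS pct_hot
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;
```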

regards, tom lane



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Robert Haas
On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus j...@agliodbs.com wrote:
 Which is the opposite of my experience; currently we have several
 clients who have issues which required more-frequent analyzes on
 specific tables.   Before 8.4, vacuuming more frequently, especially on
 large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
 with 8.4 it's not something you want to increase without thinking about
 the tradeoff

 Actually I would think that statement would be that before 8.3
 vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
 had to do with moving FSM to disk, making seldom vacuumed tables
 easier to keep track of, and making autovac work better in the
 presence of long running transactions.  The ability to tune IO load
 etc was basically unchanged in 8.4.

What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

...Robert



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Scott Marlowe
On Sat, May 1, 2010 at 1:08 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus j...@agliodbs.com wrote:
 Which is the opposite of my experience; currently we have several
 clients who have issues which required more-frequent analyzes on
 specific tables.   Before 8.4, vacuuming more frequently, especially on
 large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
 with 8.4 it's not something you want to increase without thinking about
 the tradeoff

 Actually I would think that statement would be that before 8.3
 vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
 had to do with moving FSM to disk, making seldom vacuumed tables
 easier to keep track of, and making autovac work better in the
 presence of long running transactions.  The ability to tune IO load
 etc was basically unchanged in 8.4.

 What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

That really only has an effect no tables that aren't updated very
often.  Unless you've got a whole bunch of those, it's not that big of
a deal.



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Scott Marlowe
On Sat, May 1, 2010 at 1:17 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Sat, May 1, 2010 at 1:08 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, May 1, 2010 at 12:13 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Fri, Apr 30, 2010 at 4:50 PM, Josh Berkus j...@agliodbs.com wrote:
 Which is the opposite of my experience; currently we have several
 clients who have issues which required more-frequent analyzes on
 specific tables.   Before 8.4, vacuuming more frequently, especially on
 large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
 with 8.4 it's not something you want to increase without thinking about
 the tradeoff

 Actually I would think that statement would be that before 8.3
 vacuum was much more expensive.  The changes to vacuum for 8.4 mostly
 had to do with moving FSM to disk, making seldom vacuumed tables
 easier to keep track of, and making autovac work better in the
 presence of long running transactions.  The ability to tune IO load
 etc was basically unchanged in 8.4.

 What about http://www.postgresql.org/docs/8.4/static/storage-vm.html ?

 That really only has an effect no tables that aren't updated very
 often.  Unless you've got a whole bunch of those, it's not that big of
 a deal.

sigh, s/ no / on /

Anyway, my real point was that the big improvements that made vacuum
so much better came in 8.3, with HOT updates and multi-threaded vacuum
(that might have shown up in 8.2, even). 8.3 was a huge improvement and
a compelling upgrade from 8.1 for me.



Re: [PERFORM] autovacuum strategy / parameters

2010-05-01 Thread Robert Haas
On Sat, May 1, 2010 at 1:11 PM, Greg Smith g...@2ndquadrant.com wrote:
 Robert Haas wrote:

 I don't have a stake in the ground on what the right settings are, but
 I think it's fair to say that if you vacuum OR analyze much less
 frequently than what we recommend by default, it might break.


 I think the default settings are essentially minimum recommended
 frequencies.  They aren't too terrible for the giant data warehouse case
 Josh was suggesting they came from--waiting until there's 20% worth of dead
 stuff before kicking off an intensive vacuum is OK when vacuum is expensive
 and you're mostly running big queries anyway.  And for smaller tables, the
 threshold helps it kick in a little earlier.  It's unlikely anyone wants to
 *increase* those, so that autovacuum runs even less; out of the box it's not
 tuned to run very often at all.

 If anything, I'd expect people to want to increase how often it runs, for
 tables where much less than 20% dead is a problem.  The most common
 situation I've seen where that's the case is when you have a hotspot of
 heavily updated rows in a large table, and this may match some of the
 situations that Robert was alluding to seeing.  Let's say you have a big
 table where 0.5% of the users each update their respective records heavily,
 averaging 30 times each.  That's only going to result in 15% dead rows, so
 no autovacuum.  But latency for those users will suffer greatly, because
 they might have to do lots of seeking around to get their little slice of
 the data.

For me it's more that my applications are typically really fast, and
when they run at half-speed people think oh, it's slow today but
they can still work and attribute the problem to their computer, or
the network, or something.  When they slow down by like 10x then they
file a bug.  I'm typically dealing with a situation where the whole
database can be easily cached in RAM and the CPU is typically 90%
idle, which cushions the blow quite a bit.

A few months ago someone reported that the portal was slow and the
problem turned out to be that the database was bloated in excess of
a factor of 10 due to having blown out the free space map.  I
wasn't a regular user of that system at that time, so I hadn't had the
opportunity to notice it myself.

...Robert



Re: [PERFORM] autovacuum strategy / parameters

2010-04-30 Thread Josh Berkus

 My guess is that the reason we run ANALYZE more frequently than vacuum
 (with the default settings) is that ANALYZE is pretty cheap.  In many
 cases, if the statistical distribution of the data hasn't changed
 much, then it's not really necessary, but it doesn't cost much either.
  And for certain types of usage patterns, like time series (where the
 maximum value keeps increasing) it's REALLY important to analyze
 frequently.
 
 But having said that, on the systems I've worked with, I've only
 rarely seen a problem caused by not analyzing frequently enough.  On
 the other hand, I've seen MANY problems caused by not vacuuming
 enough. 

Which is the opposite of my experience; currently we have several
clients who have issues which required more-frequent analyzes on
specific tables.   Before 8.4, vacuuming more frequently, especially on
large tables, was very costly; vacuum takes a lot of I/O and CPU.  Even
with 8.4 it's not something you want to increase without thinking about
the tradeoffs.

Since I'm responsible for the current defaults, I thought I'd explain the
reasoning behind them.  I developed and tested them while at Greenplum,
so they are *not* designed for small databases.

#autovacuum_vacuum_threshold = 50
#autovacuum_analyze_threshold = 50

These two are set to the minimum threshold to avoid having small tables
get vacuumed/analyzed continuously, but to make sure that small tables do
get vacuumed & analyzed sometimes.

#autovacuum_vacuum_scale_factor = 0.2

This is set because in my experience, 20% bloat is about the level at
which bloat starts affecting performance; thus, we want to vacuum at
that level but not sooner.  This does mean that very large tables which
never have more than 10% updates/deletes don't get vacuumed at all until
freeze_age; this is a *good thing*. VACUUM on large tables is expensive;
you don't *want* to vacuum a billion-row table which has only 100,000
updates.

#autovacuum_analyze_scale_factor = 0.1

The 10% threshold for analyze is there because (a) analyze is cheap, and
(b) 10% changes to a table can result in very bad plans if the changes
are highly skewed towards a specific range, such as additions onto the
end of a time-based table.
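To make the arithmetic behind those four settings concrete, here is a sketch
of the trigger points they produce (reltuples is only the planner's row
estimate, so treat the numbers as approximate):

```sql
-- vacuum fires at  50 + 0.2 * live rows; analyze at 50 + 0.1 * live rows.
-- e.g. a 1,000-row table: vacuum at ~250 dead rows, analyze at ~150;
--      a 1,000,000,000-row table: vacuum at ~200 million dead rows.
SELECT relname,
       reltuples::bigint              AS est_live_rows,
       (50 + 0.2 * reltuples)::bigint AS vacuum_trigger,
       (50 + 0.1 * reltuples)::bigint AS analyze_trigger
FROM pg_class
WHERE relkind = 'r';
```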

The current postgres defaults were tested on DBT2 as well as pgbench,
and in my last 2 years of consulting I've seldom found reason to touch
them except on *specific* tables.  So I still feel that they are good
defaults.

It would be worth doing a DBT2/DBT5 test run with different autovac
settings post-8.4 to see if we should specifically change the vacuum
threshold.  Pending that, though, I think the current defaults are good
enough.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] autovacuum strategy / parameters

2010-04-30 Thread Alvaro Herrera
Josh Berkus escribió:

 #autovacuum_vacuum_scale_factor = 0.2
 
 This is set because in my experience, 20% bloat is about the level at
 which bloat starts affecting performance; thus, we want to vacuum at
 that level but not sooner.  This does mean that very large tables which
 never have more than 10% updates/deletes don't get vacuumed at all until
 freeze_age; this is a *good thing*. VACUUM on large tables is expensive;
 you don't *want* to vacuum a billion-row table which has only 100,000
 updates.

Hmm, now that we have partial vacuum, perhaps we should revisit this.


 It would be worth doing a DBT2/DBT5 test run with different autovac
 settings post-8.4 to see if we should specifically change the vacuum
 threshold.

Right.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] autovacuum strategy / parameters

2010-04-28 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Rick richard.bran...@ca.com wrote:

 Since vacuum just recovers space, that doesn't seem to be nearly
 as critical for performance?
 
 That doesn't really match my experience.  Without regular
 vacuuming, tables and indices end up being larger than they ought
 to be and contain large amounts of dead space that slows things
 down.  How much of an impact that ends up having depends on how
 badly bloated they are and what you're trying to do, but it can
 get very ugly.
 
That has been my experience, too.  When we first started using
PostgreSQL, we noticed a performance hit when some small tables
which were updated very frequently were vacuumed.  Our knee-jerk
reaction was to tune autovacuum to be less aggressive, so that we
didn't get hit with the pain as often.  Of course, things just got
worse, because every access to that table, when vacuum hadn't been
run recently, had to examine all versions of the desired row, and
test visibility for each version, to find the current one.  So
performance fell off even worse.  So we went to much more aggressive
settings for autovacuum (although only slightly more aggressive than
what has since become the default) and the problems disappeared.
 
Basically, as long as small tables are not allowed to bloat,
vacuuming them is so fast that you never notice it.
 
 8.3 and higher are better about this because of an optimization
 called HOT, but there can still be problems.
 
Agreed on both counts.
 
-Kevin



Re: [PERFORM] autovacuum strategy / parameters

2010-04-28 Thread akp geek
Hi -
   I don't want to sidetrack the discussion. We have 8.4; which of the
autovacuum parameters can be set for an individual table?  I ran into
bloat with a small table only. Now the issue is being resolved.

Regards
On Wed, Apr 28, 2010 at 10:20 AM, Thomas Kellerer spam_ea...@gmx.net wrote:

 Rick, 22.04.2010 22:42:


 So, in a large table, the scale_factor is the dominant term. In a
 small table, the threshold is the dominant term. But both are taken into
 account.

 The default values are set for small tables; it is not being run for
 large tables.


 With 8.4 you can adjust the autovacuum settings per table...








Re: [PERFORM] autovacuum strategy / parameters

2010-04-28 Thread Kenneth Marshall
Check out the manual:

http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#AUTOVACUUM

Cheers,
Ken

On Wed, Apr 28, 2010 at 10:37:35AM -0400, akp geek wrote:
 Hi -
don't want to side track the discussion. We have 8.4, which of
 AUTOVACUUM PARAMETERS can be set to handle individual table?  I ran into
 bloat with small table only. Now the issue is being resolved.
 
 Regards
 On Wed, Apr 28, 2010 at 10:20 AM, Thomas Kellerer spam_ea...@gmx.net wrote:
 
  Rick, 22.04.2010 22:42:
 
 
  So, in a large table, the scale_factor is the dominant term. In a
  small table, the threshold is the dominant term. But both are taken into
  account.
 
  The default values are set for small tables; it is not being run for
  large tables.
 
 
  With 8.4 you can adjust the autovacuum settings per table...
 
 
 
 
 
 



Re: [PERFORM] autovacuum strategy / parameters

2010-04-28 Thread Thomas Kellerer

akp geek, 28.04.2010 16:37:

We have 8.4, which of AUTOVACUUM PARAMETERS can be set to handle individual 
table?


All documented here:
http://www.postgresql.org/docs/current/static/sql-createtable.html
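(Per that page, the same storage parameters can also be given at creation
time; a sketch, with illustrative names and values:)

```sql
CREATE TABLE hot_table (
    id   integer PRIMARY KEY,
    data text
) WITH (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```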





Re: [PERFORM] autovacuum strategy / parameters

2010-04-27 Thread Robert Haas
On Thu, Apr 22, 2010 at 4:42 PM, Rick richard.bran...@ca.com wrote:
 On Apr 22, 2:55 pm, robertmh...@gmail.com (Robert Haas) wrote:
 On Wed, Apr 21, 2010 at 11:06 AM, Rick richard.bran...@ca.com wrote:
  I have a DB with small and large tables that can go up to 15G.
  For performance benefits, it appears that analyze has much less cost
  than vacuum, but the same benefits?

 Err, no.  ANALYZE gathers statistics for the query planner; VACUUM
 clears out old, dead tuples so that space can be reused by the
 database system.

  I can’t find any clear recommendations for frequencies and am
  considering these parameters:

  Autovacuum_vacuum_threshold = 5
  Autovacuum_analyze_threshold = 1
  Autovacuum_vacuum_scale_factor = 0.01
  Autovacuum_analyze_scale_factor = 0.005

  This appears it will result in table analyzes occurring around 10,000
  to 85,000 dead tuples and vacuum occurring around 50,000 to 200,000,
  depending on the table sizes.

  Can anyone comment on whether this is the right strategy and targets
  to use?

 I'm not that familiar with tuning these parameters but increasing the
 default thresholds by a thousand-fold doesn't seem like a good idea.
 Small tables will never get vacuumed or analyzed at all.

 ...Robert


 The problem is with the autovacuum formula:

 In a loop, autovacuum checks to see if the number of dead tuples >
 ((number of live tuples * autovacuum_vacuum_scale_factor) +
 autovacuum_vacuum_threshold), and if so, it runs VACUUM. If not, it
 sleeps. It works the same way for ANALYZE.

 So, in a large table, the scale_factor is the dominant term. In a
 small table, the threshold is the dominant term. But both are taken
 into account.

 The default values are set for small tables; it is not being run for
 large tables.
 The question boils down to exactly what is the max number of dead
 tuples that should be allowed to accumulate before running analyze?
 Since vacuum just recovers space, that doesn't seem to be nearly as
 critical for performance?

That doesn't really match my experience.  Without regular vacuuming,
tables and indices end up being larger than they ought to be and
contain large amounts of dead space that slows things down.  How much
of an impact that ends up having depends on how badly bloated they are
and what you're trying to do, but it can get very ugly.

My guess is that the reason we run ANALYZE more frequently than vacuum
(with the default settings) is that ANALYZE is pretty cheap.  In many
cases, if the statistical distribution of the data hasn't changed
much, then it's not really necessary, but it doesn't cost much either.
 And for certain types of usage patterns, like time series (where the
maximum value keeps increasing) it's REALLY important to analyze
frequently.

But having said that, on the systems I've worked with, I've only
rarely seen a problem caused by not analyzing frequently enough.  On
the other hand, I've seen MANY problems caused by not vacuuming
enough.  Someone runs a couple of big queries that rewrite a large
portion of a table several times over and, boom, problems.  8.3 and
higher are better about this because of an optimization called HOT,
but there can still be problems.

Other people's experiences may not match mine, but the bottom line is
that you need to do both of these things, and you need to make sure
they happen regularly.  In most cases, the CPU and I/O time they
consume will be amply repaid in improved query performance.

...Robert



Re: [PERFORM] autovacuum strategy / parameters

2010-04-26 Thread Rick
On Apr 22, 2:55 pm, robertmh...@gmail.com (Robert Haas) wrote:
 On Wed, Apr 21, 2010 at 11:06 AM, Rick richard.bran...@ca.com wrote:
  I have a DB with small and large tables that can go up to 15G.
  For performance benefits, it appears that analyze has much less cost
  than vacuum, but the same benefits?

 Err, no.  ANALYZE gathers statistics for the query planner; VACUUM
 clears out old, dead tuples so that space can be reused by the
 database system.

  I can’t find any clear recommendations for frequencies and am
  considering these parameters:

  Autovacuum_vacuum_threshold = 5
  Autovacuum_analyze_threshold = 1
  Autovacuum_vacuum_scale_factor = 0.01
  Autovacuum_analyze_scale_factor = 0.005

  This appears it will result in table analyzes occurring around 10,000
  to 85,000 dead tuples and vacuum occurring around 50,000 to 200,000,
  depending on the table sizes.

  Can anyone comment on whether this is the right strategy and targets
  to use?

 I'm not that familiar with tuning these parameters but increasing the
 default thresholds by a thousand-fold doesn't seem like a good idea.
 Small tables will never get vacuumed or analyzed at all.

 ...Robert


The problem is with the autovacuum formula:

In a loop, autovacuum checks to see if the number of dead tuples >
((number of live tuples * autovacuum_vacuum_scale_factor) +
autovacuum_vacuum_threshold), and if so, it runs VACUUM. If not, it
sleeps. It works the same way for ANALYZE.

So, in a large table, the scale_factor is the dominant term. In a
small table, the threshold is the dominant term. But both are taken
into account.

The default values are set for small tables; with them, autovacuum is
effectively never triggered for large tables.
The question boils down to exactly what is the max number of dead
tuples that should be allowed to accumulate before running analyze?
Since vacuum just recovers space, that doesn't seem to be nearly as
critical for performance?

-Rick



Re: [PERFORM] autovacuum strategy / parameters

2010-04-26 Thread Alvaro Herrera
Rick wrote:

 So, in a large table, the scale_factor is the dominant term. In a
 small
 table, the threshold is the dominant term. But both are taken into
 account.

Correct.

 The default values are set for small tables; it is not being run for
 large tables.

So decrease the scale factor and leave threshold alone.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[PERFORM] autovacuum strategy / parameters

2010-04-22 Thread Rick
I have a DB with small and large tables that can go up to 15G.
For performance benefits, it appears that analyze has much less cost
than vacuum, but the same benefits?
I can’t find any clear recommendations for frequencies and am
considering these parameters:

Autovacuum_vacuum_threshold = 5
Autovacuum_analyze_threshold = 1
Autovacuum_vacuum_scale_factor = 0.01
Autovacuum_analyze_scale_factor = 0.005

This appears it will result in table analyzes occurring around 10,000
to 85,000 dead tuples and vacuum occurring around 50,000 to 200,000,
depending on the table sizes.

Can anyone comment on whether this is the right strategy and targets
to use?



Re: [PERFORM] autovacuum strategy / parameters

2010-04-22 Thread Robert Haas
On Wed, Apr 21, 2010 at 11:06 AM, Rick richard.bran...@ca.com wrote:
 I have a DB with small and large tables that can go up to 15G.
 For performance benefits, it appears that analyze has much less cost
 than vacuum, but the same benefits?

Err, no.  ANALYZE gathers statistics for the query planner; VACUUM
clears out old, dead tuples so that space can be reused by the
database system.

 I can’t find any clear recommendations for frequencies and am
 considering these parameters:

 Autovacuum_vacuum_threshold = 5
 Autovacuum_analyze_threshold = 1
 Autovacuum_vacuum_scale_factor = 0.01
 Autovacuum_analyze_scale_factor = 0.005

 This appears it will result in table analyzes occurring around 10,000
 to 85,000 dead tuples and vacuum occurring around 50,000 to 200,000,
 depending on the table sizes.

 Can anyone comment on whether this is the right strategy and targets
 to use?

I'm not that familiar with tuning these parameters but increasing the
default thresholds by a thousand-fold doesn't seem like a good idea.
Small tables will never get vacuumed or analyzed at all.

...Robert
