Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread David Fetter
On Tue, Sep 16, 2008 at 08:59:08PM -0400, Alvaro Herrera wrote:
 Simon Riggs wrote:
  Disabling autovacuum can have catastrophic effects, since it disables
  the ANALYZing of tables.
  
  Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?
 
 You mean something like
 autovacuum = on / off / analyze ?
 
 We can certainly do that, but is there buy-in?

+1

Having autovacuum on during bulk loads can really tank performance,
but having autoanalyze on is good :)

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Heikki Linnakangas

David Fetter wrote:

On Tue, Sep 16, 2008 at 08:59:08PM -0400, Alvaro Herrera wrote:

Simon Riggs wrote:

Disabling autovacuum can have catastrophic effects, since it disables
the ANALYZing of tables.

Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?

You mean something like
autovacuum = on / off / analyze ?

We can certainly do that, but is there buy-in?


+1

Having autovacuum on during bulk loads can really tank performance,
but having autoanalyze on is good :)


Isn't autoanalyze a waste of time during a bulk load? Seems better to 
run ANALYZE manually at the end.


Adding that option feels natural to me, but it is a rather blunt 
instrument. You can already do that with pg_autovacuum, though that 
interface isn't very user-friendly. I whole-heartedly support the idea 
of controlling autovacuum with storage options, e.g ALTER TABLE ... 
WITH (autoanalyze = on).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Simon Riggs

On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:
 David Fetter wrote:
  On Tue, Sep 16, 2008 at 08:59:08PM -0400, Alvaro Herrera wrote:
  Simon Riggs wrote:
  Disabling autovacuum can have catastrophic effects, since it disables
  the ANALYZing of tables.
 
  Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?
  You mean something like
  autovacuum = on / off / analyze ?
 
  We can certainly do that, but is there buy-in?
  
  +1
  
  Having autovacuum on during bulk loads can really tank performance,
  but having autoanalyze on is good :)
 
 Isn't autoanalyze a waste of time during a bulk load? Seems better to 
 run ANALYZE manually at the end.

Its not a waste of time because it catches tables immediately they have
been loaded, not just at the end of the bulk load. Running ANALYZE is a
waste of time if autoanalyze has already caught it, which is why that's
never been added onto the end of a pg_dump script. But currently this is
true only when we have both autoVACUUM and autoANALYZE enabled.

 Adding that option feels natural to me, but it is a rather blunt 
 instrument. You can already do that with pg_autovacuum, though that 
 interface isn't very user-friendly. I whole-heartedly support the idea 
 of controlling autovacuum with storage options, e.g ALTER TABLE ... 
 WITH (autoanalyze = on).

Yes, have that option also, since it is fine tuning.

I definitely want a blunt instrument! I don't want to have to run ALTER
TABLE on *every* table. Even if you think that's possible, it won't work
in conjunction with interfaces submitting standard SQL, plus it won't
work if I forget either.

This request comes from a real situation where a dump was reloaded
during the day when autovacuum was off and so ANALYZE was missed. Not my
mistake, but it took time to resolve that could have been avoided by the
new option suggested here.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Alvaro Herrera
Simon Riggs wrote:
 
 On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:

  Isn't autoanalyze a waste of time during a bulk load? Seems better to 
  run ANALYZE manually at the end.
 
 Its not a waste of time because it catches tables immediately they have
 been loaded, not just at the end of the bulk load. Running ANALYZE is a
 waste of time if autoanalyze has already caught it, which is why that's
 never been added onto the end of a pg_dump script. But currently this is
 true only when we have both autoVACUUM and autoANALYZE enabled.

Hmm, one of the first complaints about defaulting autovacuum to on was
that it made restores so much longer *because* it was choosing to do
autoanalyzes on the tables as they were imported.  It was then that the
auto-cancel mechanism was introduced.

http://pgsql.markmail.org/message/rqyjkafuw43426xy

Why doesn't this new request conflict with that one?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Simon Riggs wrote:

On Wed, 2008-09-17 at 10:09 +0300, Heikki Linnakangas wrote:
Isn't autoanalyze a waste of time during a bulk load? Seems better to 
run ANALYZE manually at the end.

Its not a waste of time because it catches tables immediately they have
been loaded, not just at the end of the bulk load. Running ANALYZE is a
waste of time if autoanalyze has already caught it, which is why that's
never been added onto the end of a pg_dump script. But currently this is
true only when we have both autoVACUUM and autoANALYZE enabled.


Hmm, one of the first complaints about defaulting autovacuum to on was
that it made restores so much longer *because* it was choosing to do
autoanalyzes on the tables as they were imported.  It was then that the
auto-cancel mechanism was introduced.

http://pgsql.markmail.org/message/rqyjkafuw43426xy

Why doesn't this new request conflict with that one?


The problem back then was that a CREATE INDEX was waiting on the 
autoanalyze to finish, and the autoanalyze took a long time to finish 
because of vacuum_cost_delay. Now that we have the auto-cancel 
mechanism, that's not a problem.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Alvaro Herrera wrote:
 Why doesn't this new request conflict with that one?

 The problem back then was that a CREATE INDEX was waiting on the 
 autoanalyze to finish, and the autoanalyze took a long time to finish 
 because of vacuum_cost_delay. Now that we have the auto-cancel 
 mechanism, that's not a problem.

Define not a problem.  With auto-cancel, what will happen is that
whatever work the autoanalyze does will be wasted.  It seems to me
that the current complaint is about background autovacuum/autoanalyze
wasting cycles during a bulk load, and there's certainly no purer waste
than an analyze cycle that gets aborted.

I tend to agree with Alvaro that there's not very much of a use case for
an analyze-only autovacuum mode.  Assuming that we get to the point of
having a parallelizing pg_restore, it would be interesting to give it an
option to include ANALYZE for each table it's loaded among the tasks
that it schedules.  (I'm visualizing these commands as being made up by
pg_restore itself, *not* added to the pg_dump output.)  Then you could
have a reasonably optimal total workflow, whereas allowing autovacuum
to try to schedule the ANALYZEs can't be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Simon Riggs
On Wed, 2008-09-17 at 10:52 -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Alvaro Herrera wrote:
  Why doesn't this new request conflict with that one?
 
  The problem back then was that a CREATE INDEX was waiting on the 
  autoanalyze to finish, and the autoanalyze took a long time to finish 
  because of vacuum_cost_delay. Now that we have the auto-cancel 
  mechanism, that's not a problem.
 
 Define not a problem.  With auto-cancel, what will happen is that
 whatever work the autoanalyze does will be wasted.  It seems to me
 that the current complaint is about background autovacuum/autoanalyze
 wasting cycles during a bulk load, and there's certainly no purer waste
 than an analyze cycle that gets aborted.

OK, but that's an argument against auto-anything, not just against
splitting out autoanalyze and autovacuum.

 I tend to agree with Alvaro that there's not very much of a use case for
 an analyze-only autovacuum mode. 

Did he say that? I thought he said we could do that, what did that mean 
Alvaro?

I have a customer saying this would be a good thing and I agree. The
roles of Autovacuum and autoanalyze are not exactly matched, so why do
we force them to be run together or not at all? Why not allow
the user to specify whether they want both or not? It's an option, we're
not forcing anyone to do it that way if they don't want to.

  Assuming that we get to the point of
 having a parallelizing pg_restore, it would be interesting to give it an
 option to include ANALYZE for each table it's loaded among the tasks
 that it schedules.  (I'm visualizing these commands as being made up by
 pg_restore itself, *not* added to the pg_dump output.)  Then you could
 have a reasonably optimal total workflow, whereas allowing autovacuum
 to try to schedule the ANALYZEs can't be.

That doesn't solve all problems, just ones with pg_restore. That's nice
and I won't turn it away, but what will we do about plain pg_dump and
about other table creations and loads?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-17 Thread Robert Haas
 I tend to agree with Alvaro that there's not very much of a use case for
 an analyze-only autovacuum mode.  Assuming that we get to the point of
 having a parallelizing pg_restore, it would be interesting to give it an
 option to include ANALYZE for each table it's loaded among the tasks
 that it schedules.  (I'm visualizing these commands as being made up by
 pg_restore itself, *not* added to the pg_dump output.)  Then you could
 have a reasonably optimal total workflow, whereas allowing autovacuum
 to try to schedule the ANALYZEs can't be.

In Simon's original email, he suggested forcing an automatic ANALYZE
on the server side after CREATE TABLE AS.  I objected on the grounds
that this won't fix anything for people who are doing bulk data loads
using any other mechanism.  Here, you're proposing the exact same
thing, except instead of restricting it to people who use CREATE TABLE
AS, you're restricting it to people who use a hypothetical
parallelized implementation of pg_restore.

While either of these is better than doing nothing, ISTM it would be
far better to give the database some smarts about what constitutes a
bulk data load (a whole bunch of insert operations on a newly created
table) and what to do about it (synchronous analyze just before the
first operation on the table that isn't an insert - and perhaps not
before).

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-16 Thread Robert Haas
 Disabling autovacuum can have catastrophic effects, since it disables
 the ANALYZing of tables.

 Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?

 ANALYZE times are fairly bounded because of the way we do sampling.
 VACUUM times are not bounded at all, and typically  O(n). So it makes
 sense to switch off the VACUUM at certain times, but never good to
 switch off ANALYZE.

This seems reasonable.

 While we're there, it would be useful if CREATE TABLE AS SELECT was
 followed by an automatic ANALYZE. Especially important for temp tables.

This seems like the wrong solution.  There is a general problem that
bulk data loads on an empty table tend to result in horrible query
plans, but not all of those will be created using CREATE TABLE AS
SELECT.  Someone may easily do a COPY or just a bunch of INSERTs.
Maybe something like: If the table has never been ANALYZEd, force an
immediate ANALYZE before planning the first SELECT, UPDATE, or DELETE.

And maybe also do the same thing if the table has grown significantly
(not sure what the threshold should be) since the last ANALYZE.

I'm not sure exactly what is practical here but it would certainly be
nice to have some solution.  This has got to be my #1 cause of
extremely slow queries.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-16 Thread Tom Lane
Robert Haas [EMAIL PROTECTED] writes:
 This seems like the wrong solution.  There is a general problem that
 bulk data loads on an empty table tend to result in horrible query
 plans,

Please provide some specifics.  It's been a very long time since the
planner was completely unaware of the size of such a table.  Lack of
stats is certainly a handicap, but I'm not convinced it should result
in horrible plans.  Maybe a more appropriate answer to this type of
issue is to tweak some of the default selectivity numbers.

 And maybe also do the same thing if the table has grown significantly
 (not sure what the threshold should be) since the last ANALYZE.

Autovacuum already does this type of thing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-16 Thread Robert Haas
 Please provide some specifics.  It's been a very long time since the
 planner was completely unaware of the size of such a table.  Lack of
 stats is certainly a handicap, but I'm not convinced it should result
 in horrible plans.  Maybe a more appropriate answer to this type of
 issue is to tweak some of the default selectivity numbers.

Sure.  See attached output.  This is from 8.2.9, but the behavior on
HEAD is similar.  The first query executed before and then again after
the ANALYZE is OK, but the second, which involves an additional join
condition, is 6X slower prior to the ANALYZE.

I don't see how you're going to fix this problem by tweaking the
selectivity estimates.  If it were possible to generate good query
plans without selectivity estimates derived from the actual table
contents, we wouldn't need ANALYZE in the first place.

 And maybe also do the same thing if the table has grown significantly
 (not sure what the threshold should be) since the last ANALYZE.

 Autovacuum already does this type of thing.

It's asynchronous, though.  Frequently, you want to load a bunch of
data into a table and then immediately execute a query against it, or
possibly several queries.  It's pretty annoying to have to write logic
that says - ok, if the number of rows that we just inserted was really
big relative to what was already in the table, then do an ANALYZE on
the table before issuing the SELECT, otherwise skip it.

I would be happy enough if we could recognize CREATE TABLE ... insert
a bunch of data ... SELECT as a case where we need to force a
synchronous ANALYZE - because in my experience you almost always do.
Recognizing the case where the table has grown a lot since the last
ANALYZE is probably harder, and a bit less important, but would surely
be nice if it could be done.

...Robert
portal=# create table bulk_data (a integer, b integer, primary key (a, b)); 
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index bulk_data_pkey 
for table bulk_data
CREATE TABLE
portal=# insert into bulk_data select 1,generate_series(1,100); 
INSERT 0 100
portal=# insert into bulk_data select 2,generate_series(1,100);
INSERT 0 100
portal=# explain analyze select * from bulk_data where a = 1 limit 100; 
   QUERY PLAN   
-
 Limit  (cost=0.00..272.33 rows=100 width=8) (actual time=0.461..0.875 rows=100 
loops=1)
   -  Index Scan using bulk_data_pkey on bulk_data  (cost=0.00..25898.89 
rows=9510 width=8) (actual time=0.454..0.630 rows=100 loops=1)
 Index Cond: (a = 1)
 Total runtime: 1.094 ms
(4 rows)

portal=# explain analyze select * from bulk_data where a = 1 and (b % 1000) = 0 
limit 100;
 QUERY PLAN 

 Limit  (cost=183.81..10439.27 rows=48 width=8) (actual time=409.110..494.701 
rows=100 loops=1)
   -  Bitmap Heap Scan on bulk_data  (cost=183.81..10439.27 rows=48 width=8) 
(actual time=409.103..494.228 rows=100 loops=1)
 Recheck Cond: (a = 1)
 Filter: ((b % 1000) = 0)
 -  Bitmap Index Scan on bulk_data_pkey  (cost=0.00..183.79 rows=9510 
width=0) (actual time=404.616..404.616 rows=100 loops=1)
   Index Cond: (a = 1)
 Total runtime: 495.186 ms
(7 rows)

portal=# analyze bulk_data;
ANALYZE
portal=# explain analyze select * from bulk_data where a = 1 limit 100; 
 QUERY PLAN 

 Limit  (cost=0.00..3.50 rows=100 width=8) (actual time=0.037..0.435 rows=100 
loops=1)
   -  Seq Scan on bulk_data  (cost=0.00..34804.20 rows=995341 width=8) (actual 
time=0.031..0.184 rows=100 loops=1)
 Filter: (a = 1)
 Total runtime: 0.676 ms
(4 rows)

portal=# explain analyze select * from bulk_data where a = 1 and (b % 1000) = 0 
limit 100;
QUERY PLAN  
---
 Limit  (cost=0.00..900.23 rows=100 width=8) (actual time=1.261..76.754 
rows=100 loops=1)
   -  Seq Scan on bulk_data  (cost=0.00..44804.28 rows=4977 width=8) (actual 
time=1.255..76.381 rows=100 loops=1)
 Filter: ((a = 1) AND ((b % 1000) = 0))
 Total runtime: 77.084 ms
(4 rows)

portal=#

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum and Autoanalyze

2008-09-16 Thread Alvaro Herrera
Simon Riggs wrote:
 Disabling autovacuum can have catastrophic effects, since it disables
 the ANALYZing of tables.
 
 Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?

You mean something like
autovacuum = on / off / analyze ?

We can certainly do that, but is there buy-in?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Autovacuum and Autoanalyze

2008-09-15 Thread Simon Riggs
Disabling autovacuum can have catastrophic effects, since it disables
the ANALYZing of tables.

Can we have a mode where we disable autoVACUUM yet enable autoANALYZE?

ANALYZE times are fairly bounded because of the way we do sampling.
VACUUM times are not bounded at all, and typically  O(n). So it makes
sense to switch off the VACUUM at certain times, but never good to
switch off ANALYZE.

While we're there, it would be useful if CREATE TABLE AS SELECT was
followed by an automatic ANALYZE. Especially important for temp tables.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers