Re: [HACKERS] Potential autovacuum optimization: new tables

2013-01-25 Thread Bruce Momjian
On Mon, Oct 15, 2012 at 12:06:27AM +0100, Greg Stark wrote:
 On Sat, Oct 13, 2012 at 3:13 AM, Stephen Frost sfr...@snowman.net wrote:
  Josh's concern is about autovacuum causing lots of stats churn, which is
  understandable, we don't want it constantly rescanning a table
 
 I don't think rescanning the table is a big concern. autovacuum will
 only scan as often as it feels like in the first place and these are
 by definition small tables anyways.
 
 Josh's stated concern was about the churn in the stats table. That
 could cause extra vacuums on the stats table which could be a fairly
 substantial table. Hopefully HOT updates and the visibility map
 would protect against that being too bad though.

Added to TODO:

Improve autovacuum tuning

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-15 Thread Robert Haas
On Sat, Oct 13, 2012 at 3:49 PM, Joshua Berkus j...@agliodbs.com wrote:
 For my part, over the last 3 years of consulting and dealing with 
 postgresql.conf settings for more than 140 clients:

 * only 10% of them ever touched the autoanalyze settings at all
 * of the ~~ 14 who did:
* 1 improved the tuning of their database
* 3 of them messed up autoanalyze, causing stats and vacuum issues
* ~~ 10 had no measurable effect

 ... so you'll understand when I say that I don't think ease of knob-twiddling 
 is a priority for autoanalyze design.  In fact, I'd say that removing the 
 knobs entirely is a design goal.

Yeah.  My experience is shorter in time frame, but similar in composition.

 I've been going over the notes and email archives from the period where Matt 
 O'Connor and I arrived at the current settings.  All of our testing was 
 devoted to autovacuum, not autoanalyze.  The threshold+scale_factor design 
 works pretty well for autovacuum; it prevents us from constantly vacuuming 
 small tables, or large tables with less than 20% dead rows.  And I did 
 extensive testing using DBT2 on OSDL to set the current defaults.

However, I disagree with this.  I think that things have changed a lot
in 8.4+, because of the visibility map.  Extra vacuuming is not nearly
so expensive as it used to be, and essentially 100% of the vacuum
problems I see are caused by not vacuuming frequently enough, either
because (1) when vacuum does eventually run it imposes a gigantic
server load for a really long time or (2) too much bloat builds up
between vacuum runs.   If these settings were adjusted in an exactly
middle-of-the-road fashion, I ought to see 50% of the problems from
vacuuming too often and the other 50% from not vacuuming often enough.
 The reality is nothing like that; it's all on one side.

As I've said before (and I believe Simon has said similar things), I
think we should be vacuuming the heap much more often but only doing
index vac when we accumulate enough dead tuples to justify the cost of
the index scan.  Pruning the heap is cheap and very effective.

 Our mistake was assuming that the same formula which worked well for vacuum 
 would work well for analyze.  And since the DBT2 database has entirely 
 medium-sized tables full of random data, no shortcomings in this thinking 
 showed up in the tests.  Since the only counterproposal at the time was to 
 have a flat percentage without a threshold, we got the current defaults.

 So, problem #1 is coming up with a mathematical formula.  My initial target 
 values are in terms of # of rows in the table vs. # of writes before analyze 
 is triggered:

 1 : 3
 10 : 5
 100 : 10
 1000 : 100
 100000 : 2000
 1000000 : 5000
 10000000 : 25000
 100000000 : 100000

  etc.  So problem #1 is a mathematical formula which gives this kind of 
 curve.  I've tried some solution-seeking software, but I don't know how to 
 use it well enough to get something useful.

That's a pretty funny-looking curve, because it doubles between 10 and
100 but then increases 10x between 100 and 1000.  It's similarly
erratic further on.  But I do agree that some kind of log scale might
be appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-15 Thread Josh Berkus
Robert,

 As I've said before (and I believe Simon has said similar things), I
 think we should be vacuuming the heap much more often but only doing
 index vac when we accumulate enough dead tuples to justify the cost of
 the index scan.  Pruning the heap is cheap and very effective.

You are probably correct, especially since the testing which set the
current thresholds was pre-HOT, even.

We don't have a vacuum method which allows us to vacuum the heap but not
the indexes, though, do we?

Note that the only time I encounter chronic autovacuum issues, it's for
problem tables like queue tables, and no default is going to cope with
those.

Anyway, big thing is, we need to be able to test this.

 That's a pretty funny-looking curve, because it doubles between 10 and
 100 but then increases 10x between 100 and 1000.  It's similarly
 erratic further on.  But I do agree that some kind of log scale might
 be appropriate.

Yeah, it's mainly the endpoints I'm sure of based on experience.  The
middle should be a smooth curve between them, if possible.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-15 Thread Christopher Browne
On Sat, Oct 13, 2012 at 3:49 PM, Joshua Berkus j...@agliodbs.com wrote:
 So, problem #1 is coming up with a mathematical formula.  My initial target 
 values are in terms of # of rows in the table vs. # of writes before analyze 
 is triggered:

 1 : 3
 10 : 5
 100 : 10
 1000 : 100
 100000 : 2000
 1000000 : 5000
 10000000 : 25000
 100000000 : 100000

Do we necessarily care about smoothness?

If we don't at all, then this would be fine:

func powerlaw (tuples int) int {
    if tuples < 10 {
        return 3
    }
    if tuples < 100 {
        return 5
    }
    if tuples < 1000 {
        return 10
    }
    if tuples < 100000 {
        return 100
    }
    if tuples < 1000000 {
        return 2000
    }
    if tuples < 10000000 {
        return 5000
    }
    if tuples < 100000000 {
        return 25000
    }
    return 100000
}

If we want smoothness within the ranges, this is a piecewise linear
representation of your table:

func powerlaw2 (tuples int) int {
    // linear interpolation from the lower to the upper target value
    // within each bracket of the table
    if tuples < 10 {
        return 3
    }
    if tuples < 100 {
        return 5 + 5 * (tuples - 10)/90
    }
    if tuples < 1000 {
        return 10 + 90 * (tuples - 100)/900
    }
    if tuples < 100000 {
        return 100 + 1900 * (tuples - 1000)/99000
    }
    if tuples < 1000000 {
        return 2000 + 3000 * (tuples - 100000)/900000
    }
    if tuples < 10000000 {
        return 5000 + 20000 * (tuples - 1000000)/9000000
    }
    if tuples < 100000000 {
        return 25000 + 75000 * (tuples - 10000000)/90000000
    }
    return 100000
}

That's in Go, but there shouldn't be anything too unfamiliar looking
about it :-).

It would be nice to have a simpler functional representation, but the
above is by no means heinous, and it's not verbose beyond reason.
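
For what it's worth, a somewhat simpler and fully smooth alternative would be
to interpolate the same table in log-log space; a rough sketch (the package
and function names are purely illustrative):

package sketch

import "math"

// logLogThreshold interpolates the target table linearly in log10/log10
// space, giving one smooth curve through every target point instead of
// seven hand-written brackets.
func logLogThreshold(tuples int) int {
    rows := []float64{1, 10, 100, 1000, 1e5, 1e6, 1e7, 1e8}
    vals := []float64{3, 5, 10, 100, 2000, 5000, 25000, 1e5}
    t := float64(tuples)
    if t <= rows[0] {
        return int(vals[0])
    }
    for i := 1; i < len(rows); i++ {
        if t <= rows[i] {
            frac := (math.Log10(t) - math.Log10(rows[i-1])) /
                (math.Log10(rows[i]) - math.Log10(rows[i-1]))
            lv := math.Log10(vals[i-1]) + frac*(math.Log10(vals[i])-math.Log10(vals[i-1]))
            return int(math.Round(math.Pow(10, lv)))
        }
    }
    return int(vals[len(vals)-1])
}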
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-14 Thread Greg Stark
On Sat, Oct 13, 2012 at 3:13 AM, Stephen Frost sfr...@snowman.net wrote:
 Josh's concern is about autovacuum causing lots of stats churn, which is
 understandable, we don't want it constantly rescanning a table

I don't think rescanning the table is a big concern. autovacuum will
only scan as often as it feels like in the first place and these are
by definition small tables anyways.

Josh's stated concern was about the churn in the stats table. That
could cause extra vacuums on the stats table which could be a fairly
substantial table. Hopefully HOT updates and the visibility map
would protect against that being too bad though.



-- 
greg




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-13 Thread Joshua Berkus

 For my part, while that's certainly an interesting idea, it's far
 more
 complicated than even providing GUCs and the idea is to make PG just
 do
 it right, not to offer the user more ways to get it wrong...

Yes, please let's not replace the existing too-simplistic knobs with giant 
complicated gadgets nobody, including us, understands.

For my part, over the last 3 years of consulting and dealing with 
postgresql.conf settings for more than 140 clients:

* only 10% of them ever touched the autoanalyze settings at all
* of the ~~ 14 who did:
   * 1 improved the tuning of their database
   * 3 of them messed up autoanalyze, causing stats and vacuum issues
   * ~~ 10 had no measurable effect

... so you'll understand when I say that I don't think ease of knob-twiddling 
is a priority for autoanalyze design.  In fact, I'd say that removing the knobs 
entirely is a design goal.

I've been going over the notes and email archives from the period where Matt 
O'Connor and I arrived at the current settings.  All of our testing was devoted 
to autovacuum, not autoanalyze.  The threshold+scale_factor design works pretty 
well for autovacuum; it prevents us from constantly vacuuming small tables, or 
large tables with less than 20% dead rows.  And I did extensive testing using 
DBT2 on OSDL to set the current defaults.
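
For reference, the threshold+scale_factor rule being discussed is roughly the
following (a sketch of the documented condition, not the actual autovacuum.c
code; names are illustrative):

package sketch

// needsAutoanalyze: analyze once the tuples changed since the last analyze
// exceed autovacuum_analyze_threshold +
// autovacuum_analyze_scale_factor * reltuples.
func needsAutoanalyze(changedTuples, relTuples, baseThreshold int64, scaleFactor float64) bool {
    return float64(changedTuples) > float64(baseThreshold)+scaleFactor*float64(relTuples)
}

With the stock defaults (threshold 50, scale factor 0.1), a brand-new table
that only ever receives 45 rows never crosses the threshold, which is the
scenario that started this thread.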

Our mistake was assuming that the same formula which worked well for vacuum 
would work well for analyze.  And since the DBT2 database has entirely 
medium-sized tables full of random data, no shortcomings in this thinking 
showed up in the tests.  Since the only counterproposal at the time was to have 
a flat percentage without a threshold, we got the current defaults.

So, problem #1 is coming up with a mathematical formula.  My initial target 
values are in terms of # of rows in the table vs. # of writes before analyze is 
triggered:

1 : 3
10 : 5
100 : 10
1000 : 100
100000 : 2000
1000000 : 5000
10000000 : 25000
100000000 : 100000

 etc.  So problem #1 is a mathematical formula which gives this kind of 
curve.  I've tried some solution-seeking software, but I don't know how to use 
it well enough to get something useful.

Second problem is actually testing the result.  At this point, we don't have 
any performance tests which create anything other than fairly randomly 
distributed data, which doesn't tend to show up any issues in analyze.  We 
really need a performance test where new data is skewed and unbalanced, 
including tables of radically different sizes, and where we're set up to 
measure the level of inaccuracy in query statistics.  

Hmmm.  Actually, for measuring the inaccuracy, I have some tools thanks to 
David Wheeler.  But not to generate the test in the first place.

--Josh Berkus




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-13 Thread Jeff Janes
On Sat, Oct 13, 2012 at 12:49 PM, Joshua Berkus j...@agliodbs.com wrote:

 So, problem #1 is coming up with a mathematical formula.  My initial target 
 values are in terms of # of rows in the table vs. # of writes before analyze 
 is triggered:

 1 : 3
 10 : 5
 100 : 10
 1000 : 100
 100000 : 2000
 1000000 : 5000
 10000000 : 25000
 100000000 : 100000

  etc.  So problem #1 is a mathematical formula which gives this kind of 
 curve.  I've tried some solution-seeking software, but I don't know how to 
 use it well enough to get something useful.

That is close to a power law, where best fit is about threshold = 1.5
* (rows ** 0.6)

rows        yours       powerfit
1.00E+00    3.00E+00    1.50E+00
1.00E+01    5.00E+00    5.97E+00
1.00E+02    1.00E+01    2.38E+01
1.00E+03    1.00E+02    9.46E+01
1.00E+05    2.00E+03    1.50E+03
1.00E+06    5.00E+03    5.97E+03
1.00E+07    2.50E+04    2.38E+04
1.00E+08    1.00E+05    9.46E+04

If you want something more natural, reduce the exponent from 0.6 to
0.5 so it becomes the square root.

I have no opinion on the suitability of this, I'm just crunching the
numbers for you.
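
In case it helps to eyeball the two curves side by side, a throwaway sketch
(plain Go, nothing PostgreSQL-specific):

package main

import (
    "fmt"
    "math"
)

func main() {
    // Compare the best-fit power law (1.5 * rows^0.6) with the square-root
    // variant (1.5 * rows^0.5) at the row counts from the table above.
    for _, rows := range []float64{1, 10, 100, 1000, 1e5, 1e6, 1e7, 1e8} {
        fmt.Printf("%8.0e  powerfit=%8.0f  sqrtfit=%8.0f\n",
            rows, 1.5*math.Pow(rows, 0.6), 1.5*math.Sqrt(rows))
    }
}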

Cheers,

Jeff




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-13 Thread Tom Lane
Joshua Berkus j...@agliodbs.com writes:
 I've been going over the notes and email archives from the period
 where Matt O'Connor and I arrived at the current settings.  All of our
 testing was devoted to autovacuum, not autoanalyze.
 Our mistake was assuming that the same formula which worked well for
 vacuum would work well for analyze.

Ah.  Okay, maybe we can agree that that wasn't a good idea.

 So, problem #1 is coming up with a mathematical formula.  My initial target 
 values are in terms of # of rows in the table vs. # of writes before analyze 
 is triggered:

 1 : 3
 10 : 5
 100 : 10
 1000 : 100
 100000 : 2000
 1000000 : 5000
 10000000 : 25000
 100000000 : 100000

I don't really see that we need to bend over backwards to exactly match
some data points that you made up out of thin air.  How about
ceil(sqrt(N)) to start with?

regards, tom lane




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-13 Thread Joshua Berkus

 Ah.  Okay, maybe we can agree that that wasn't a good idea.

Oh, I'd say there's no question it was a mistake.  We just didn't have the data 
at the time to realize it.

 I don't really see that we need to bend over backwards to exactly
 match
 some data points that you made up out of thin air.  How about
 ceil(sqrt(N)) to start with?

We can start with anything, including Jeff Janes's equation (for my part, I 
think sqrt(N) will result in analyzing very large tables a bit too often).  The 
tough part will be coming up with some way to test it.

--Josh




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread Josh Berkus
All,

 1. User creates new table
 2. User inserts 45 records into new table.
 3. Time passes.
 4. User creates a query which joins against new table.
 5. Planner uses estimate of 1000 rows for the new table.
 6. User gets very bad query plan.

Because Snowman asked me for an example:

Before ANALYZE on the new table:

http://explain.depesz.com/s/36D

After ANALYZE on the new table:

http://explain.depesz.com/s/851t

That's a 25X difference in execution time.  This is not the first time
I've seen this issue.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Now, I look at this, and ask myself: why didn't autoanalyze kick in at
 step 3?  After all, this was a table which had 0 rows, we inserted 45
 rows, making the table infinitely larger.  It should have got on the
 autoanalyze list, no?

 Well, no.  It seems that any table with less than
 autovacuum_analyze_threshold rows will NEVER be autoanalyzed.  Ever.

Yeah ...

 This seems easy to fix.  If a table has no stats and has any write stats
 at all, it should automatically go on the autoanalyze list.  Or if it's
 easier, one where last_autoanalyze is null.

No, it's not that easy.  The question you have to ask is "when has that
initial write burst stopped?".  As an example, if autovacuum happened to
see that table in the instant after CREATE, it might autovacuum it while
it's still empty, and then this rule fails to trigger any further effort.

Personally I've always thought that autovacuum's rules should be based
on a percentage of rows changed, not an absolute threshold (or maybe in
addition to an absolute threshold).  This way, if you create a table and
insert 10 rows, that would make it subject to analyze on-sight, even if
autovac had managed to pass by while it was still empty, because the
percentage-changed is infinite.  Then, if you insert the other 35 rows
you meant to insert, it's *again* subject to autoanalyze on the next
pass, because the percentage-changed is still 350%.
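
To make the arithmetic concrete, the quantity such a rule would test is just
the following (a sketch, names illustrative):

package sketch

// percentChanged: tuples changed since the last analyze, as a percentage of
// the rows that existed at that analyze.
func percentChanged(changedSinceAnalyze, rowsAtLastAnalyze float64) float64 {
    return 100 * changedSinceAnalyze / rowsAtLastAnalyze
}

With the numbers above, percentChanged(10, 0) is +Inf, so the table is
analyzed on sight, and percentChanged(35, 10) is 350%, so it qualifies again
on the next pass.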

I remember having got voted down on the percentage approach back when
we first put AV into core, but I remain convinced that decision was a
bad one.

regards, tom lane




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
 Because Snowman asked me for an example:

Thanks. :)

 That's a 25X difference in execution time.  This is not the first time
 I've seen this issue.

If we can figure out an 'easy' solution to this, I'd definitely vote for
it being back-patched.  Having a table simply never get analyze'd
strikes me as a very bad thing.

Thanks again,

Stephen




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread Josh Berkus

 No, it's not that easy.  The question you have to ask is "when has that
 initial write burst stopped?".  As an example, if autovacuum happened to
 see that table in the instant after CREATE, it might autovacuum it while
 it's still empty, and then this rule fails to trigger any further effort.

Well, frankly, it would be useful to know it's empty too.  If you really
wanna see that 1000-row default estimate bite you on the tuchas, try a
JOIN against an empty new table.

 Personally I've always thought that autovacuum's rules should be based
 on a percentage of rows changed, not an absolute threshold (or maybe in
 addition to an absolute threshold).  This way, if you create a table and
 insert 10 rows, that would make it subject to analyze on-sight, even if
 autovac had managed to pass by while it was still empty, because the
 percentage-changed is infinite.  Then, if you insert the other 35 rows
 you meant to insert, it's *again* subject to autoanalyze on the next
 pass, because the percentage-changed is still 350%

 I remember having got voted down on the percentage approach back when
 we first put AV into core, but I remain convinced that decision was a
 bad one.

Yeah, I was one of the ones voting against you.  The reason not to have
percentage-only is for small tables.  Imagine that you have a table with
18 rows, and analyze_threshold is 0 and analyze_scale_factor is 0.1.

Add two rows -- ANALYZE
UPDATE two rows -- ANALYZE
UPDATE three more rows -- ANALYZE
DELETE three rows -- ANALYZE

Without a threshold, any table under 100 rows which gets UPDATEs would
be continually in the autoanalyze queue, which would not be beneficial;
the churn in pg_statistic alone would be detrimental.

I guess the real issue is that we tried to solve the issue of ANALYZEing
tables of radically different sizes with a simple heuristic of
threshold+scale_factor, and that's proving too simple for actual
production sites.  The other end where autoanalyze often falls down is
the high end (tables with a million rows).

Can anyone think of a new heuristic which doesn't involve adding 2-6 new
GUCS knobs?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I remember having got voted down on the percentage approach back when
 we first put AV into core, but I remain convinced that decision was a
 bad one.

 Yeah, I was one of the ones voting against you.  The reason not to have
 percentage-only is for small tables.  Imagine that you have a table with
 18 rows, and analyze_threshold is 0 and analyze_scale_factor is 0.1.

[ shrug... ]  You're attacking a straw man, or more precisely putting
words into my mouth about what the percentage-based thresholds might be.
Notice the examples I gave involved update percentages quite far north
of 100%.  It's possible and maybe likely that we need a sliding scale.

Also, I don't necessarily accept the conclusion you seem to be drawing,
that it's okay to have complete turnover of a small table and not redo
its stats.  If you don't like the current behavior when there's no
stats, why would you like the behavior when there are some stats but
they no longer have the remotest relationship to reality?

 Can anyone think of a new heuristic which doesn't involve adding 2-6 new
 GUCS knobs?

The increased number of knobs may be a problem, but I don't think we can
avoid having more.  Your own complaint is that the current design is too
simplistic.  Replacing it with a different but equally simplistic design
probably won't help much.

regards, tom lane




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread Josh Berkus

 [ shrug... ]  You're attacking a straw man, or more precisely putting
 words into my mouth about what the percentage-based thresholds might be.
 Notice the examples I gave involved update percentages quite far north
 of 100%.  It's possible and maybe likely that we need a sliding scale.

Yes, or a logarithmic one.

 Also, I don't necessarily accept the conclusion you seem to be drawing,
 that it's okay to have complete turnover of a small table and not redo
 its stats.  

I'm not drawing that conclusion.  I'm explaining the logic of
autovacuum_analyze_threshold.   That logic actually works pretty well
for tables between 200 rows and 200,000 rows.  It's outside of those
boundaries where it starts to break down.

 The increased number of knobs may be a problem, but I don't think we can
 avoid having more.  Your own complaint is that the current design is too
 simplistic.  Replacing it with a different but equally simplistic design
 probably won't help much.

Well, we could do something which involves no GUCS at all, which would
be my favorite approach.  For example, Frost and I were discussing this
on IRC.  Imagine if autovac threshold were set according to a simple log
function, resulting in very small tables getting analyzed after 100%
changes, and very large tables getting analyzed after 0.1% changes, and
everyone else between?
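
A minimal sketch of that kind of rule, using the two endpoints from the
sentence above (100% for tiny tables, 0.1% for very large ones) and sliding
between them on a log scale; the constants and names are illustrative only:

package sketch

import "math"

// effectiveScaleFactor slides from 1.0 (analyze after 100% of the table has
// changed) for tables of up to 10 rows down to 0.001 (0.1%) at 100 million
// rows, interpolating on log10 of the row count.
func effectiveScaleFactor(relTuples float64) float64 {
    const (
        smallRows, smallFactor = 10.0, 1.0
        largeRows, largeFactor = 1e8, 0.001
    )
    if relTuples <= smallRows {
        return smallFactor
    }
    if relTuples >= largeRows {
        return largeFactor
    }
    frac := (math.Log10(relTuples) - math.Log10(smallRows)) /
        (math.Log10(largeRows) - math.Log10(smallRows))
    return math.Pow(10, math.Log10(smallFactor)+frac*(math.Log10(largeFactor)-math.Log10(smallFactor)))
}

A table would then be autoanalyzed once its changed-row count exceeds
relTuples * effectiveScaleFactor(relTuples), with no new GUCs involved.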

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 [ shrug... ]  You're attacking a straw man, or more precisely putting
 words into my mouth about what the percentage-based thresholds might be.
 Notice the examples I gave involved update percentages quite far north
 of 100%.  It's possible and maybe likely that we need a sliding scale.

I was just discussing such a sliding scale approach w/ Josh on IRC, my
thinking was that we could use a logarithmic approach based on table
size.

 Also, I don't necessarily accept the conclusion you seem to be drawing,
 that it's okay to have complete turnover of a small table and not redo
 its stats.  If you don't like the current behavior when there's no
 stats, why would you like the behavior when there are some stats but
 they no longer have the remotest relationship to reality?

Josh's concern is about autovacuum causing lots of stats churn, which is
understandable, we don't want it constantly rescanning a table, but
perhaps we could use some kind of threshold for preventing autovac from
rescanning a table it just scanned?  Note that I did *not* say 'GUC',
but I don't know what the 'right' answer is for how frequently is
good-but-not-too-frequent.  I'd also like to try and avoid adding GUCs.

Thanks,

Stephen




Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread David Johnston
On Oct 12, 2012, at 22:13, Stephen Frost sfr...@snowman.net wrote:

 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 [ shrug... ]  You're attacking a straw man, or more precisely putting
 words into my mouth about what the percentage-based thresholds might be.
 Notice the examples I gave involved update percentages quite far north
 of 100%.  It's possible and maybe likely that we need a sliding scale.
 
 I was just discussing such a sliding scale approach w/ Josh on IRC, my
 thinking was that we could use a logarithmic approach based on table
 size.
 
 Also, I don't necessarily accept the conclusion you seem to be drawing,
 that it's okay to have complete turnover of a small table and not redo
 its stats.  If you don't like the current behavior when there's no
 stats, why would you like the behavior when there are some stats but
 they no longer have the remotest relationship to reality?
 
 Josh's concern is about autovacuum causing lots of stats churn, which is
 understandable, we don't want it constantly rescanning a table, but
 perhaps we could use some kind of threshold for preventing autovac from
 rescanning a table it just scanned?  Note that I did *not* say 'GUC',
 but I don't know what the 'right' answer is for how frequently is
 good-but-not-too-frequent.  I'd also like to try and avoid adding GUCs.
 
 

Instead of a global rule, could you attach an interface function to the table and have 
the auto-analyzer call that function to basically ask the table whether it 
needs to be analyzed?  Still need to deal with defaults and provide a decent 
supply of built-in algorithms but at least the system can be made tunable.  The 
default algorithm could maybe just handoff to a table size specific handler.  
The create table and alter table commands could be used to change the assigned 
algorithm if desired and new ones could be supplied via extensions.
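
A rough sketch of the shape such a hook might take (entirely hypothetical
names, just to illustrate the idea):

package sketch

// AnalyzeDecider is the hypothetical per-table hook: autoanalyze would call
// NeedsAnalyze with the table's current counters instead of applying one
// global formula.
type AnalyzeDecider interface {
    NeedsAnalyze(liveTuples, changedSinceAnalyze int64) bool
}

// defaultDecider hands off to a size-specific rule, as suggested above.
type defaultDecider struct{}

func (defaultDecider) NeedsAnalyze(liveTuples, changedSinceAnalyze int64) bool {
    if liveTuples < 1000 {
        // small tables: any change is significant
        return changedSinceAnalyze > 0
    }
    // larger tables: something like the existing threshold + scale factor rule
    return float64(changedSinceAnalyze) > 50+0.1*float64(liveTuples)
}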

The 1000 row default seems unusual at first glance and contributes to the 
problem described.

It is likely that the first INSERT following the create table is going to be a 
bulk load if the table is going to have many rows.  In the case where rows are 
inserted individually it is likely that the expected row count will be closer 
to 1 than 1000.

One useful algorithm to provide the user is "analyze on insert" and, though maybe 
less so, "analyze on update", so that any insert/update causes the table to be 
re-analyzed.  Not a good default but, combined with delayed analyze logic to 
establish a minimum frequency, is a possible option for some use cases.

Temporary table creation should be given special attention if changes are 
going to be made here.

Another idea is to have system after [command] trigger(s) that can be used to 
call analyze without waiting for the auto-vacuum process.  Provide some way for 
CREATE/ALTER TABLE and maybe auto-vacuum to enable and disable the trigger.

David J.





Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread Stephen Frost
* David Johnston (pol...@yahoo.com) wrote:
 Instead of global could you attach an interface function to the table and 
 have the auto-analyzer call that function to basically ask the table whether 
 it needs to be analyzed?  Still need to deal with defaults and provide a 
 decent supply of built-in algorithms but at least the system can be made 
 tunable.  The default algorithm could maybe just handoff to a table size 
 specific handler.  The create table and alter table commands could be used to 
 change the assigned algorithm if desired and new ones could be supplied via 
 extensions.

For my part, while that's certainly an interesting idea, it's far more
complicated than even providing GUCs and the idea is to make PG just do
it right, not to offer the user more ways to get it wrong...

Thanks,

Stephen

