Re: [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Shridhar Daithankar
Josh Berkus wrote:

Shridhar,
 However I do not agree with this logic entirely. It pegs the next vacuum
w.r.t current table size which is not always a good thing.


No, I think the logic's fine, it's the numbers which are wrong.   We want to 
vacuum when updates reach between 5% and 15% of total rows.   NOT when 
updates reach 110% of total rows ... that's much too late.
Well, looks like thresholds below 1 should be norm rather than exception.

Hmmm ... I also think the threshold level needs to be lowered; I guess the 
purpose was to prevent continuous re-vacuuuming of small tables?  
Unfortunately, in the current implementation, the result is tha small tables 
never get vacuumed at all.

So for defaults, I would peg -V at 0.1 and -v at 100, so our default 
calculation for a table with 10,000 rows is:

100 +  ( 0.1 * 10,000 ) = 1100 rows.
I would say -V 0.2-0.4 could be great as well. Fact to emphasize is that 
thresholds less than 1 should be used.

Furthermore analyze threshold depends upon inserts+updates. I think it
should also depends upon deletes for obvious reasons.
Yes.  Vacuum threshold is counting deletes, I hope?
It does.

My comment about the frequency of vacuums vs. analyze is that currently the 
*default* is to analyze twice as often as you vacuum.Based on my 
experiece as a PG admin on a variety of databases, I believe that the default 
should be to analyze half as often as you vacuum.
OK.

I am all for experimentation. If you have real life data to play with, I
can give you some patches to play around.
I will have real data very soon .
I will submit a patch that would account deletes in analyze threshold. Since you 
want to delay the analyze, I would calculate analyze count as

n=updates + inserts *-* deletes

Rather than current n = updates + inserts. Also update readme about examples 
and analyze frequency.

What does statistics gather BTW? Just number of rows or something else as well? 
I think I would put that on Hackers separately.

I am still wary of inverting vacuum analyze frequency. You think it is better to 
set inverted default rather than documenting it?

 Shridhar

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] More detail on settings for pgavd?

2003-11-20 Thread Josh Berkus
Shridhar,

 I would say -V 0.2-0.4 could be great as well. Fact to emphasize is that
 thresholds less than 1 should be used.

Yes, but not thresholds, scale factors of less than 1.0.  Thresholds should 
still be in the range of 100 to 1000.

 I will submit a patch that would account deletes in analyze threshold.
 Since you want to delay the analyze, I would calculate analyze count as

 n=updates + inserts *-* deletes

I'm not clear on how this is a benefit.  Deletes affect the statistics, too.

 What does statistics gather BTW? Just number of rows or something else as
 well? I think I would put that on Hackers separately.

Number of tuples, degree of uniqueness, some sample values, and high/low 
values.   Just query your pg_statistics view for an example.

 I am still wary of inverting vacuum analyze frequency. You think it is
 better to set inverted default rather than documenting it?

See my post to Matthew.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] More detail on settings for pgavd?

2003-11-19 Thread Josh Berkus
Shridhar,

 Will look into it. Give me a day or so. I am planning couple of other
 patches as well. May be over week end.

Thanks, appreciated.   As I said, I don't think the settings themselves are 
wrong, I think the documentation is.

What are you patching?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] More detail on settings for pgavd?

2003-11-19 Thread Shridhar Daithankar
Josh Berkus wrote:

Shridhar,

I was looking at the -V/-v and -A/-a settings in pgavd, and really don't 
understand how the calculation works.   According to the readme, if I set -v 
to 1000 and -V to 2 (the defaults) for a table with 10,000 rows, pgavd would 
only vacuum after 21,000 rows had been updated.   This seems wrong.
No. that is correct.

It is calculated as

threshold = base + scale*numebr of current rows

Which translates to

21,000 = 1000 + 2*1000

However I do not agree with this logic entirely. It pegs the next vacuum w.r.t 
current table size which is not always a good thing.

I would rather vacuum the table at 2000 updates, which is what you probably want.

Furthermore analyze threshold depends upon inserts+updates. I think it should 
also depends upon deletes for obvious reasons.

Can you clear this up a little?   I'd like to tweak these settings but can't 
without being better aquainted with the calculation.
What did you expected in above example? It is not difficult to tweak 
pg_autovacuum calculations. For testing we can play around.

Also, you may want to reverse your default ratio for Vacuum/analyze frequency.  
True, analyze is a less expensive operation than Vacuum, but it's also needed 
less often -- only when the *distribution* of data changes.I've seen 
databases where the optimal vacuum/analyze frequency was every 10 min/once 
per day.
OK vacuum and analyze thresholds are calculated with same formula as shown above 
 but with different parameters as follows.

vacthresh = vacbase + vacscale*ntuples
anathresh = anabase + anascale*ntuples
What you are asking for is

vacthresh = vacbase*vacscale
anathresh = anabase + anascale*ntuples
Would that tilt the favour the way you want? i.e. an analyze is triggered when a 
fixed *percentage* of table changes but a vacuum is triggered when a fixed 
*number of rows* are changed.

I am all for experimentation. If you have real life data to play with, I can 
give you some patches to play around.

And BTW, this is all brain child of Mathew O.Connor(Correct? I am not good at 
either names or spellings). The way I wrote pgavd originally, each table got to 
get separate threshold..:-). That was rather a brute force approach.

 Shridhar





---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] More detail on settings for pgavd?

2003-11-19 Thread Josh Berkus
Shridhar,

 However I do not agree with this logic entirely. It pegs the next vacuum
 w.r.t current table size which is not always a good thing.

No, I think the logic's fine, it's the numbers which are wrong.   We want to 
vacuum when updates reach between 5% and 15% of total rows.   NOT when 
updates reach 110% of total rows ... that's much too late.

Hmmm ... I also think the threshold level needs to be lowered; I guess the 
purpose was to prevent continuous re-vacuuuming of small tables?  
Unfortunately, in the current implementation, the result is tha small tables 
never get vacuumed at all.

So for defaults, I would peg -V at 0.1 and -v at 100, so our default 
calculation for a table with 10,000 rows is:

100 +  ( 0.1 * 10,000 ) = 1100 rows.

 I would rather vacuum the table at 2000 updates, which is what you probably
 want.

Not necessarily.  This would be painful if the table has 10,000,000 rows.   It 
*should* be based on a % of rows.

 Furthermore analyze threshold depends upon inserts+updates. I think it
 should also depends upon deletes for obvious reasons.

Yes.  Vacuum threshold is counting deletes, I hope?

 What did you expected in above example? It is not difficult to tweak
 pg_autovacuum calculations. For testing we can play around.

Can I set the settings to decimals, or are they integers?

 vacthresh = vacbase*vacscale
 anathresh = anabase + anascale*ntuples

Nope, see above.

My comment about the frequency of vacuums vs. analyze is that currently the 
*default* is to analyze twice as often as you vacuum.Based on my 
experiece as a PG admin on a variety of databases, I believe that the default 
should be to analyze half as often as you vacuum.

 I am all for experimentation. If you have real life data to play with, I
 can give you some patches to play around.

I will have real data very soon .

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend