Re: [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote: Shridhar, However I do not agree with this logic entirely. It pegs the next vacuum w.r.t current table size which is not always a good thing. No, I think the logic's fine, it's the numbers which are wrong. We want to vacuum when updates reach between 5% and 15% of total rows. NOT when updates reach 110% of total rows ... that's much too late. Well, looks like thresholds below 1 should be norm rather than exception. Hmmm ... I also think the threshold level needs to be lowered; I guess the purpose was to prevent continuous re-vacuuuming of small tables? Unfortunately, in the current implementation, the result is tha small tables never get vacuumed at all. So for defaults, I would peg -V at 0.1 and -v at 100, so our default calculation for a table with 10,000 rows is: 100 + ( 0.1 * 10,000 ) = 1100 rows. I would say -V 0.2-0.4 could be great as well. Fact to emphasize is that thresholds less than 1 should be used. Furthermore analyze threshold depends upon inserts+updates. I think it should also depends upon deletes for obvious reasons. Yes. Vacuum threshold is counting deletes, I hope? It does. My comment about the frequency of vacuums vs. analyze is that currently the *default* is to analyze twice as often as you vacuum.Based on my experiece as a PG admin on a variety of databases, I believe that the default should be to analyze half as often as you vacuum. OK. I am all for experimentation. If you have real life data to play with, I can give you some patches to play around. I will have real data very soon . I will submit a patch that would account deletes in analyze threshold. Since you want to delay the analyze, I would calculate analyze count as n=updates + inserts *-* deletes Rather than current n = updates + inserts. Also update readme about examples and analyze frequency. What does statistics gather BTW? Just number of rows or something else as well? I think I would put that on Hackers separately. I am still wary of inverting vacuum analyze frequency. You think it is better to set inverted default rather than documenting it? Shridhar ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] More detail on settings for pgavd?
Shridhar, I would say -V 0.2-0.4 could be great as well. Fact to emphasize is that thresholds less than 1 should be used. Yes, but not thresholds, scale factors of less than 1.0. Thresholds should still be in the range of 100 to 1000. I will submit a patch that would account deletes in analyze threshold. Since you want to delay the analyze, I would calculate analyze count as n=updates + inserts *-* deletes I'm not clear on how this is a benefit. Deletes affect the statistics, too. What does statistics gather BTW? Just number of rows or something else as well? I think I would put that on Hackers separately. Number of tuples, degree of uniqueness, some sample values, and high/low values. Just query your pg_statistics view for an example. I am still wary of inverting vacuum analyze frequency. You think it is better to set inverted default rather than documenting it? See my post to Matthew. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] More detail on settings for pgavd?
Shridhar, Will look into it. Give me a day or so. I am planning couple of other patches as well. May be over week end. Thanks, appreciated. As I said, I don't think the settings themselves are wrong, I think the documentation is. What are you patching? -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] More detail on settings for pgavd?
Josh Berkus wrote: Shridhar, I was looking at the -V/-v and -A/-a settings in pgavd, and really don't understand how the calculation works. According to the readme, if I set -v to 1000 and -V to 2 (the defaults) for a table with 10,000 rows, pgavd would only vacuum after 21,000 rows had been updated. This seems wrong. No. that is correct. It is calculated as threshold = base + scale*numebr of current rows Which translates to 21,000 = 1000 + 2*1000 However I do not agree with this logic entirely. It pegs the next vacuum w.r.t current table size which is not always a good thing. I would rather vacuum the table at 2000 updates, which is what you probably want. Furthermore analyze threshold depends upon inserts+updates. I think it should also depends upon deletes for obvious reasons. Can you clear this up a little? I'd like to tweak these settings but can't without being better aquainted with the calculation. What did you expected in above example? It is not difficult to tweak pg_autovacuum calculations. For testing we can play around. Also, you may want to reverse your default ratio for Vacuum/analyze frequency. True, analyze is a less expensive operation than Vacuum, but it's also needed less often -- only when the *distribution* of data changes.I've seen databases where the optimal vacuum/analyze frequency was every 10 min/once per day. OK vacuum and analyze thresholds are calculated with same formula as shown above but with different parameters as follows. vacthresh = vacbase + vacscale*ntuples anathresh = anabase + anascale*ntuples What you are asking for is vacthresh = vacbase*vacscale anathresh = anabase + anascale*ntuples Would that tilt the favour the way you want? i.e. an analyze is triggered when a fixed *percentage* of table changes but a vacuum is triggered when a fixed *number of rows* are changed. I am all for experimentation. If you have real life data to play with, I can give you some patches to play around. And BTW, this is all brain child of Mathew O.Connor(Correct? I am not good at either names or spellings). The way I wrote pgavd originally, each table got to get separate threshold..:-). That was rather a brute force approach. Shridhar ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] More detail on settings for pgavd?
Shridhar, However I do not agree with this logic entirely. It pegs the next vacuum w.r.t current table size which is not always a good thing. No, I think the logic's fine, it's the numbers which are wrong. We want to vacuum when updates reach between 5% and 15% of total rows. NOT when updates reach 110% of total rows ... that's much too late. Hmmm ... I also think the threshold level needs to be lowered; I guess the purpose was to prevent continuous re-vacuuuming of small tables? Unfortunately, in the current implementation, the result is tha small tables never get vacuumed at all. So for defaults, I would peg -V at 0.1 and -v at 100, so our default calculation for a table with 10,000 rows is: 100 + ( 0.1 * 10,000 ) = 1100 rows. I would rather vacuum the table at 2000 updates, which is what you probably want. Not necessarily. This would be painful if the table has 10,000,000 rows. It *should* be based on a % of rows. Furthermore analyze threshold depends upon inserts+updates. I think it should also depends upon deletes for obvious reasons. Yes. Vacuum threshold is counting deletes, I hope? What did you expected in above example? It is not difficult to tweak pg_autovacuum calculations. For testing we can play around. Can I set the settings to decimals, or are they integers? vacthresh = vacbase*vacscale anathresh = anabase + anascale*ntuples Nope, see above. My comment about the frequency of vacuums vs. analyze is that currently the *default* is to analyze twice as often as you vacuum.Based on my experiece as a PG admin on a variety of databases, I believe that the default should be to analyze half as often as you vacuum. I am all for experimentation. If you have real life data to play with, I can give you some patches to play around. I will have real data very soon . -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend