Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
Amit Kapila writes: > On Sun, Jan 19, 2014 at 5:59 AM, Tom Lane wrote: >> It strikes me that there may be an obvious way to improve the number >> further, based on the observation in this thread that nkeep doesn't need >> to be scaled up because VACUUM should have scanned every page that could >> contain dead tuples. Namely, that we're arriving at new_rel_tuples by >> scaling up num_tuples linearly --- but perhaps we should only scale up >> the live-tuples fraction of that count, not the dead-tuples fraction. >> By scaling up dead tuples too, we are presumably still overestimating >> new_rel_tuples somewhat, and the behavior that I'm seeing with this test >> script seems to confirm that. > After reading your analysis, first thought occurred to me is that we can > directly subtract nkeep from num_tuples to account for better scaling > of live tuples, but I think the scaling routine vac_estimate_reltuples() > is expecting scanned_tuples and this routine is shared by both > Analyze and Vacuum where the mechanism to calculate the live > and dead tuples seems to be bit different, so may be directly passing > a subtract of num_tuples and nkeep to this routine might create some > problem. However I think this idea is definitely worth pursuing to > improve the estimates of live tuples in Vacuum. Yeah, it seemed like it would require some rethinking of the way vac_estimate_reltuples() works. It's probably not that hard, but it looked like it'd require more thought than I wanted to put into it on a Saturday ;-) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Sun, Jan 19, 2014 at 5:59 AM, Tom Lane wrote: > Amit Kapila writes: >> I am marking this (based on patch vacuum_fix_v7_nkeep.patch) as Ready >> For Committer. > > I've reviewed and committed this patch, with one significant change. > If you look at the way that vacuumlazy.c computes new_rel_tuples, it's > based on scanned_tuples (lazy_scan_heap's num_tuples), which is the total > number of surviving tuples *including* the recently-dead ones counted in > nkeep. This is the number that we want to put into pg_class.reltuples, > I think, but it's wrong for the pgstats stuff to use it as n_live_tuples > if we're going to count the recently-dead ones as dead. That is, if we're > improving the approximation that n_dead_tuples is zero after a vacuum, > the fix should involve reducing the n_live_tuples estimate as well as > increasing the n_dead_tuples estimate. > > Using your test script against the unpatched code, it's easy to see that > there's a large (and wrong) value of n_live_tup reported by an autovacuum, > which gets corrected by the next autoanalyze. For instance note these > successive readouts from the pg_stat_all_tables query: > > n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup > +--+---+ > 497365 |9 | 8 |4958346 > 497365 |9 | 8 |5458346 > 1186555 | 10 | 8 | 0 > 1186555 | 10 | 8 | 50 > 499975 | 10 | 9 |2491877 > > Since we know the true number of live tuples is always exactly 50 > in this test, that jump is certainly wrong. With the committed patch, > the behavior is significantly saner: > > n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup > +--+---+ > 483416 |2 | 2 |5759861 > 483416 |2 | 2 |6259861 > 655171 |3 | 2 | 382306 > 655171 |3 | 2 | 882306 > 553942 |3 | 3 |3523744 > > Still some room for improvement, but it's not so silly anymore. > > It strikes me that there may be an obvious way to improve the number > further, based on the observation in this thread that nkeep doesn't need > to be scaled up because VACUUM should have scanned every page that could > contain dead tuples. Namely, that we're arriving at new_rel_tuples by > scaling up num_tuples linearly --- but perhaps we should only scale up > the live-tuples fraction of that count, not the dead-tuples fraction. > By scaling up dead tuples too, we are presumably still overestimating > new_rel_tuples somewhat, and the behavior that I'm seeing with this test > script seems to confirm that. After reading your analysis, first thought occurred to me is that we can directly subtract nkeep from num_tuples to account for better scaling of live tuples, but I think the scaling routine vac_estimate_reltuples() is expecting scanned_tuples and this routine is shared by both Analyze and Vacuum where the mechanism to calculate the live and dead tuples seems to be bit different, so may be directly passing a subtract of num_tuples and nkeep to this routine might create some problem. However I think this idea is definitely worth pursuing to improve the estimates of live tuples in Vacuum. > I haven't time to pursue this idea at the moment, but perhaps someone else > would like to. I think this idea is worth to be added in Todo list. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
Amit Kapila writes: > I am marking this (based on patch vacuum_fix_v7_nkeep.patch) as Ready > For Committer. I've reviewed and committed this patch, with one significant change. If you look at the way that vacuumlazy.c computes new_rel_tuples, it's based on scanned_tuples (lazy_scan_heap's num_tuples), which is the total number of surviving tuples *including* the recently-dead ones counted in nkeep. This is the number that we want to put into pg_class.reltuples, I think, but it's wrong for the pgstats stuff to use it as n_live_tuples if we're going to count the recently-dead ones as dead. That is, if we're improving the approximation that n_dead_tuples is zero after a vacuum, the fix should involve reducing the n_live_tuples estimate as well as increasing the n_dead_tuples estimate. Using your test script against the unpatched code, it's easy to see that there's a large (and wrong) value of n_live_tup reported by an autovacuum, which gets corrected by the next autoanalyze. For instance note these successive readouts from the pg_stat_all_tables query: n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup +--+---+ 497365 |9 | 8 |4958346 497365 |9 | 8 |5458346 1186555 | 10 | 8 | 0 1186555 | 10 | 8 | 50 499975 | 10 | 9 |2491877 Since we know the true number of live tuples is always exactly 50 in this test, that jump is certainly wrong. With the committed patch, the behavior is significantly saner: n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup +--+---+ 483416 |2 | 2 |5759861 483416 |2 | 2 |6259861 655171 |3 | 2 | 382306 655171 |3 | 2 | 882306 553942 |3 | 3 |3523744 Still some room for improvement, but it's not so silly anymore. It strikes me that there may be an obvious way to improve the number further, based on the observation in this thread that nkeep doesn't need to be scaled up because VACUUM should have scanned every page that could contain dead tuples. Namely, that we're arriving at new_rel_tuples by scaling up num_tuples linearly --- but perhaps we should only scale up the live-tuples fraction of that count, not the dead-tuples fraction. By scaling up dead tuples too, we are presumably still overestimating new_rel_tuples somewhat, and the behavior that I'm seeing with this test script seems to confirm that. I haven't time to pursue this idea at the moment, but perhaps someone else would like to. The n_dead_tup values seem to still be on the high side (not the low side) when I run this test. Not too sure why that is. Also, I don't see any particularly meaningful change in the rate of autovacuuming or autoanalyzing when using default postgresql.conf settings. I see that your test case changes the autovac parameters quite a bit, but I didn't bother installing those values here. This may or may not mean much; the fix is clearly correct on its own terms. On the whole, this patch is not really addressing the question of accounting for transactions changing the table concurrently with VACUUM; it's only fixing the impedance mismatch between pgstats wanting to count live and dead tuples separately while VACUUM wasn't telling it that. That's a good thing to do, but I think we still have some issues here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Thu, Jan 16, 2014 at 10:24 PM, tirtho wrote: > Is this now fixed? If so, where do I find the patch for postgres 9.2.2. This is still not fixed, the patch is in Ready For Committer stage. You can confirm the status here: https://commitfest.postgresql.org/action/patch_view?id=1258 I had verified that it still applies on HEAD, so moved it to current Commit Fest. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
Is this now fixed? If so, where do I find the patch for postgres 9.2.2. Thanks - Tirthankar -- View this message in context: http://postgresql.1045698.n5.nabble.com/Heavily-modified-big-table-bloat-even-in-auto-vacuum-is-running-tp5774274p5787477.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Thu, Dec 12, 2013 at 12:24 PM, Haribabu kommi wrote: > On 06 December 2013 11:57 Amit Kapila wrote: > > A simplified test and updated patch by taking care the above comment are > attached in the mail. > I am not able to reduce the test duration but changed as the test > automatically exists after 45 mins run. > Please check vacuum_test.sh file more details for running the test. > > Auto vacuum count Bloat size > Master 15220MB > Patched_nkeep18213MB I ran the attached test and the numbers are as below: Auto vacuum count Bloat size Master 16222MB Patched_nkeep 23216MB Here by Bloat size, it means the table_size after the test finished and by Auto vacuum count, it means number of times Auto Vacuum is triggered during test run. It clearly shows that by setting number of dead tuples at end of Vacuum improves the situation. I am marking this (based on patch vacuum_fix_v7_nkeep.patch) as Ready For Committer. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Mon, Dec 16, 2013 at 6:51 PM, Peter Eisentraut wrote: > Amit, if you think this patch is ready now, please mark it as Ready for > Committer. Otherwise, I encourage your to continue the discussion and > possibly resubmit the patch to the next commitfest. Only one test is pending from myside to conclude. I am planing to complete it tomorrow. Is it okay? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
Amit, if you think this patch is ready now, please mark it as Ready for Committer. Otherwise, I encourage your to continue the discussion and possibly resubmit the patch to the next commitfest. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 06 December 2013 11:57 Amit Kapila wrote: > On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi > wrote: > > On 29 November 2013 12:00 Amit Kapila wrote: > >> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi > >> wrote: > >> Few questions about your latest patch: > >> a. Is there any reason why you are doing estimation of dead tuples > >> only for Autovacuum and not for Vacuum. > > > > No, changed. > > > >> /* clear and get the new stats for calculating proper dead tuples */ > >> pgstat_clear_snapshot(); tabentry = > >> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel)); > >> b. In the above code, to get latest data you are first clearing > >> snapshot and then calling pgstat function. It will inturn perform > I/O > >> (read of stats file) and send/receive message from stats > >> collector to ensure it can read latest data. I think it will add > overhead > >> to Vacuum, especially if 'nkeep' calculated in function > >> lazy_scan_heap() can serve the purpose. In my simple test[1], I > >> observed > >> that value of keep can serve the purpose. > >> > >> Can you please once try the test on 'nkeep' approach patch. > > > > Using the nkeep and snapshot approach, I ran the test for 40 mins > with > > a high analyze_threshold and results are below. > > > > Auto vacuum count Bloat size > > Master 11 220MB > > Patched_nkeep 14 215MB > > Patched_snapshot 18 198MB > > > > Both the approaches are showing good improvement in the test. > > Updated patches, test script and configuration is attached in the > mail. > > I think updating dead tuple count using nkeep is good idea as similar > thing is done for Analyze as well in acquire_sample_rows(). > One minor point, I think it is better to log dead tuples is below error > message: > ereport(LOG, > (errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n" > "pages: %d removed, %d remain\n" > "tuples: %.0f removed, %.0f remain\n" > > "tuples: %.0f removed, %.0f remain, %.0f dead\n" > > > About your test, how to collect the data by running this script, are > you manually stopping it after 40 mins, because I ran it for more than > an hour, the final result didn't came. > As I mentioned you last time, please simplify your test, for other > person in its current form, it is difficult to make meaning out of it. > Write comments on top of it in steps form to explain what exactly it is > doing and how to take data using it (for example, do I need to wait, > till script ends; how long this test can take to complete). A simplified test and updated patch by taking care the above comment are attached in the mail. I am not able to reduce the test duration but changed as the test automatically exists after 45 mins run. Please check vacuum_test.sh file more details for running the test. Auto vacuum count Bloat size Master 15220MB Patched_nkeep18213MB Please let me know your suggestions. Regards, Hari babu. vacuum_fix_v7_nkeep.patch Description: vacuum_fix_v7_nkeep.patch test_script.tar.gz Description: test_script.tar.gz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi wrote: > On 29 November 2013 12:00 Amit Kapila wrote: >> On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi >> wrote: >> Few questions about your latest patch: >> a. Is there any reason why you are doing estimation of dead tuples only >> for Autovacuum and not for Vacuum. > > No, changed. > >> /* clear and get the new stats for calculating proper dead tuples */ >> pgstat_clear_snapshot(); tabentry = >> pgstat_fetch_stat_tabentry(RelationGetRelid(onerel)); >> b. In the above code, to get latest data you are first clearing >> snapshot and then calling pgstat function. It will inturn perform I/O >> (read of stats file) and send/receive message from stats collector >> to ensure it can read latest data. I think it will add overhead >> to Vacuum, especially if 'nkeep' calculated in function >> lazy_scan_heap() can serve the purpose. In my simple test[1], I >> observed >> that value of keep can serve the purpose. >> >> Can you please once try the test on 'nkeep' approach patch. > > Using the nkeep and snapshot approach, I ran the test for 40 mins with a > high analyze_threshold and results are below. > > Auto vacuum count Bloat size > Master 11 220MB > Patched_nkeep 14 215MB > Patched_snapshot 18 198MB > > Both the approaches are showing good improvement in the test. > Updated patches, test script and configuration is attached in the mail. I think updating dead tuple count using nkeep is good idea as similar thing is done for Analyze as well in acquire_sample_rows(). One minor point, I think it is better to log dead tuples is below error message: ereport(LOG, (errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n" "pages: %d removed, %d remain\n" "tuples: %.0f removed, %.0f remain\n" "tuples: %.0f removed, %.0f remain, %.0f dead\n" About your test, how to collect the data by running this script, are you manually stopping it after 40 mins, because I ran it for more than an hour, the final result didn't came. As I mentioned you last time, please simplify your test, for other person in its current form, it is difficult to make meaning out of it. Write comments on top of it in steps form to explain what exactly it is doing and how to take data using it (for example, do I need to wait, till script ends; how long this test can take to complete). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 29 November 2013 12:00 Amit Kapila wrote: > On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi > wrote: > > On 25 November 2013 10:43 Amit Kapila wrote: > >> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi > >> wrote: > >> > On 19 November 2013 10:33 Amit Kapila wrote: > >> >> If I understood correctly, then your patch's main intention is to > >> >> correct the estimate of dead tuples, so that it can lead to > Vacuum > >> >> cleaning the table/index which otherwise is not happening as per > >> >> configuration value (autovacuum_vacuum_threshold) in some of the > >> >> cases, also it is not reducing the complete bloat (Unpatched - > >> 1532MB > >> >> ~Patched - 1474MB), as the main reason of bloat is extra space > in > >> >> index which can be reclaimed by reindex operation. > >> >> > >> >> So if above is correct then this patch has 3 advantages: > >> >> a. Extra Vacuum on table/index due to better estimation of dead > >> tuples. > >> >> b. Space reclaim due to this extra vacuum c. may be some > >> >> performance advantage as it will avoid the delay in cleaning dead > >> >> tuples > >> >> > >> >> I think better way to test the patch is to see how much benefit > is > >> >> there due to above (a and b points) advantages. Different values > >> >> of autovacuum_vacuum_threshold can be used to test. > >> > > >> > > >> > The performance effect of the patch is not much visible as I think > >> the > >> > analyze on the table estimates the number of dead tuples of the > >> > table > >> with some estimation. > >> > >>Yes, that seems to be the reason why you are not seeing any > >> performance benefit, but still I think this is useful optimization > to > >> do, as > >>analyze updates both the livetuples and dead tuples and similarly > >> vacuum should also update both the counts. Do you see any reason > >>why Vacuum should only update live tuples and not deadtuples? > > > > As vacuum touches all the pages where the dead tuples are present. > > This is not the Same with analyzer. Because of this reason, the > analyzer estimates the dead tuples also. > > With the proposed patch the vacuum also estimates the dead tuples. > > Few questions about your latest patch: > a. Is there any reason why you are doing estimation of dead tuples only > for Autovacuum and not for Vacuum. No, changed. > /* clear and get the new stats for calculating proper dead tuples */ > pgstat_clear_snapshot(); tabentry = > pgstat_fetch_stat_tabentry(RelationGetRelid(onerel)); > b. In the above code, to get latest data you are first clearing > snapshot and then calling pgstat function. It will inturn perform I/O > (read of stats file) and send/receive message from stats collector > to ensure it can read latest data. I think it will add overhead > to Vacuum, especially if 'nkeep' calculated in function > lazy_scan_heap() can serve the purpose. In my simple test[1], I > observed > that value of keep can serve the purpose. > > Can you please once try the test on 'nkeep' approach patch. Using the nkeep and snapshot approach, I ran the test for 40 mins with a high analyze_threshold and results are below. Auto vacuum count Bloat size Master 11 220MB Patched_nkeep 14 215MB Patched_snapshot 18 198MB Both the approaches are showing good improvement in the test. Updated patches, test script and configuration is attached in the mail. Regards, Hari babu. vacuum_fix_v6_nkeep.patch Description: vacuum_fix_v6_nkeep.patch test_script.tar.gz Description: test_script.tar.gz vacuum_fix_v6_snapshot.patch Description: vacuum_fix_v6_snapshot.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi wrote: > On 25 November 2013 10:43 Amit Kapila wrote: >> On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi >> wrote: >> > On 19 November 2013 10:33 Amit Kapila wrote: >> >> If I understood correctly, then your patch's main intention is to >> >> correct the estimate of dead tuples, so that it can lead to Vacuum >> >> cleaning the table/index which otherwise is not happening as per >> >> configuration value (autovacuum_vacuum_threshold) in some of the >> >> cases, also it is not reducing the complete bloat (Unpatched - >> 1532MB >> >> ~Patched - 1474MB), as the main reason of bloat is extra space in >> >> index which can be reclaimed by reindex operation. >> >> >> >> So if above is correct then this patch has 3 advantages: >> >> a. Extra Vacuum on table/index due to better estimation of dead >> tuples. >> >> b. Space reclaim due to this extra vacuum c. may be some performance >> >> advantage as it will avoid the delay in cleaning dead tuples >> >> >> >> I think better way to test the patch is to see how much benefit is >> >> there due to above (a and b points) advantages. Different values of >> >> autovacuum_vacuum_threshold can be used to test. >> > >> > >> > The performance effect of the patch is not much visible as I think >> the >> > analyze on the table estimates the number of dead tuples of the table >> with some estimation. >> >>Yes, that seems to be the reason why you are not seeing any >> performance benefit, but still I think this is useful optimization to >> do, as >>analyze updates both the livetuples and dead tuples and similarly >> vacuum should also update both the counts. Do you see any reason >>why Vacuum should only update live tuples and not deadtuples? > > As vacuum touches all the pages where the dead tuples are present. This is > not the > Same with analyzer. Because of this reason, the analyzer estimates the dead > tuples also. > With the proposed patch the vacuum also estimates the dead tuples. Few questions about your latest patch: a. Is there any reason why you are doing estimation of dead tuples only for Autovacuum and not for Vacuum. /* clear and get the new stats for calculating proper dead tuples */ pgstat_clear_snapshot(); tabentry = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel)); b. In the above code, to get latest data you are first clearing snapshot and then calling pgstat function. It will inturn perform I/O (read of stats file) and send/receive message from stats collector to ensure it can read latest data. I think it will add overhead to Vacuum, especially if 'nkeep' calculated in function lazy_scan_heap() can serve the purpose. In my simple test[1], I observed that value of keep can serve the purpose. Can you please once try the test on 'nkeep' approach patch. >> > Because of this reason not much performance improvement is not >> visible >> > as the missed dead tuple calculation in vacuum is covered by the >> analyze. >> >>Yeah, so might be we can check once by configuring >> analyze_threshold/scalefactor in a way that analyze doesn't get trigger >> during your test. > > I ran the test for one hour with a high analyze_threshold and results are > below. > > Auto vacuum count Bloat size > Master 15 155MB > Patched 23 134MB > > Updated test script and configuration is attached in the mail. I just had a brief look on your test, please check if you can simplify your script file and make the test results to come in 15~20 mins. Don't put too much effort on it, if you can do it easily then it is okay. [1] Simple test case to verify the value of dead tuples: Session-1 - a. Create table t1(c1 int); b. insert into t1 values(generate_series(1,1000)); c. delete from t1; d. Vacuum t1; -- here I stopped in debugger, after fetching dead tuple count first time (line 235, vacuumlazy.c, after applying your patch) as per your code (modified a bit so that I can get the value for Vacuum) Session-2 - a. insert into t1 values (generate_series(1000,1500)); b. delete from t1; Session -1 - b. Verified the value of nkeep in lazy_scan_heap(), it is 501 which is what we expect. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 25 November 2013 10:43 Amit Kapila wrote: > On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi > wrote: > > On 19 November 2013 10:33 Amit Kapila wrote: > >> If I understood correctly, then your patch's main intention is to > >> correct the estimate of dead tuples, so that it can lead to Vacuum > >> cleaning the table/index which otherwise is not happening as per > >> configuration value (autovacuum_vacuum_threshold) in some of the > >> cases, also it is not reducing the complete bloat (Unpatched - > 1532MB > >> ~Patched - 1474MB), as the main reason of bloat is extra space in > >> index which can be reclaimed by reindex operation. > >> > >> So if above is correct then this patch has 3 advantages: > >> a. Extra Vacuum on table/index due to better estimation of dead > tuples. > >> b. Space reclaim due to this extra vacuum c. may be some performance > >> advantage as it will avoid the delay in cleaning dead tuples > >> > >> I think better way to test the patch is to see how much benefit is > >> there due to above (a and b points) advantages. Different values of > >> autovacuum_vacuum_threshold can be used to test. > > > > > > The performance effect of the patch is not much visible as I think > the > > analyze on the table estimates the number of dead tuples of the table > with some estimation. > >Yes, that seems to be the reason why you are not seeing any > performance benefit, but still I think this is useful optimization to > do, as >analyze updates both the livetuples and dead tuples and similarly > vacuum should also update both the counts. Do you see any reason >why Vacuum should only update live tuples and not deadtuples? As vacuum touches all the pages where the dead tuples are present. This is not the Same with analyzer. Because of this reason, the analyzer estimates the dead tuples also. With the proposed patch the vacuum also estimates the dead tuples. > > Because of this reason not much performance improvement is not > visible > > as the missed dead tuple calculation in vacuum is covered by the > analyze. > >Yeah, so might be we can check once by configuring > analyze_threshold/scalefactor in a way that analyze doesn't get trigger > during your test. I ran the test for one hour with a high analyze_threshold and results are below. Auto vacuum count Bloat size Master 15 155MB Patched 23 134MB Updated test script and configuration is attached in the mail. Regards, Hari babu. test_script.tar.gz Description: test_script.tar.gz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi wrote: > On 19 November 2013 10:33 Amit Kapila wrote: >> If I understood correctly, then your patch's main intention is to >> correct the estimate of dead tuples, so that it can lead to Vacuum >> cleaning the table/index which otherwise is not happening as per >> configuration value (autovacuum_vacuum_threshold) in some of the cases, >> also it is not reducing the complete bloat (Unpatched - 1532MB >> ~Patched - 1474MB), as the main reason of bloat is extra space in >> index which can be reclaimed by reindex operation. >> >> So if above is correct then this patch has 3 advantages: >> a. Extra Vacuum on table/index due to better estimation of dead tuples. >> b. Space reclaim due to this extra vacuum c. may be some performance >> advantage as it will avoid the delay in cleaning dead tuples >> >> I think better way to test the patch is to see how much benefit is >> there due to above (a and b points) advantages. Different values of >> autovacuum_vacuum_threshold can be used to test. > > > The performance effect of the patch is not much visible as I think the analyze > on the table estimates the number of dead tuples of the table with some > estimation. Yes, that seems to be the reason why you are not seeing any performance benefit, but still I think this is useful optimization to do, as analyze updates both the livetuples and dead tuples and similarly vacuum should also update both the counts. Do you see any reason why Vacuum should only update live tuples and not deadtuples? > Because of this reason not much performance improvement is not visible as the > missed dead tuple calculation in vacuum is covered by the analyze. Yeah, so might be we can check once by configuring analyze_threshold/scalefactor in a way that analyze doesn't get trigger during your test. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 19 November 2013 10:33 Amit Kapila wrote: > If I understood correctly, then your patch's main intention is to > correct the estimate of dead tuples, so that it can lead to Vacuum > cleaning the table/index which otherwise is not happening as per > configuration value (autovacuum_vacuum_threshold) in some of the cases, > also it is not reducing the complete bloat (Unpatched - 1532MB > ~Patched - 1474MB), as the main reason of bloat is extra space in > index which can be reclaimed by reindex operation. > > So if above is correct then this patch has 3 advantages: > a. Extra Vacuum on table/index due to better estimation of dead tuples. > b. Space reclaim due to this extra vacuum c. may be some performance > advantage as it will avoid the delay in cleaning dead tuples > > I think better way to test the patch is to see how much benefit is > there due to above (a and b points) advantages. Different values of > autovacuum_vacuum_threshold can be used to test. I modified the test and did a performance test with following configuration changes, autovacuum_vacuum_threshold as 3 times the number of records in the table. Autovacuum_nap_time - 30s The test script will generate the configured vacuum threshold data in 45sec. After 180sec test, sleeps for 2 min. After one hour test run the patched approach shown one autovacuum is more than the master code. Not sure as this difference also may not be visible in long runs. The performance effect of the patch is not much visible as I think the analyze on the table estimates the number of dead tuples of the table with some estimation. Because of this reason not much performance improvement is not visible as the missed dead tuple calculation in vacuum is covered by the analyze. Please correct me if anything missed in my analysis. Updated patch and test scripts are attached in the mail. Regards, Hari babu. vacuum_fix_v5.patch Description: vacuum_fix_v5.patch Table_and_functions.sql Description: Table_and_functions.sql script.sh Description: script.sh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Fri, Nov 15, 2013 at 10:52 AM, Haribabu kommi wrote: > On 15 November 2013 10:00 Amit Kapila wrote: >> On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi >> wrote: >> > >> > With the simulated bloat test run for 1 hour the bloat occurred as >> > below, >> > >> > Unpatched - 1532MB >> > Patched - 1474MB >> >> In your test run, have you checked what happen if after heavy update >> (or once bloat occurs), if you keep the system idle (or just have read >> load on system) for some time, what is the result? > > In the simulated test run which is shared in the previous mail, after a heavy > update > the system is idle for 15 mins. > > With the master code, the vacuum is not triggered during this idle time as it > is > Not satisfies the vacuum threshold, because it doesn't consider the dead > tuples occurred > During vacuum operation. > > With the fix the one extra vacuum can gets triggered compared to master code > after two or three > heavy updates because of accumulation of dead tuples. > >> You haven't answered one of my questions in previous mail ( >With the >> fix of setting the dead tuples properly, the bloat is reduced and by >> changing the vacuum cost Parameters the bloat is avoided. >> Which fix here you are referring to?) > > The bloat reduced is same with initial and latest patch. > The vacuum cost parameters change (which doesn't contain any fix) is avoided > the bloat. If I understood correctly, then your patch's main intention is to correct the estimate of dead tuples, so that it can lead to Vacuum cleaning the table/index which otherwise is not happening as per configuration value (autovacuum_vacuum_threshold) in some of the cases, also it is not reducing the complete bloat (Unpatched - 1532MB ~Patched - 1474MB), as the main reason of bloat is extra space in index which can be reclaimed by reindex operation. So if above is correct then this patch has 3 advantages: a. Extra Vacuum on table/index due to better estimation of dead tuples. b. Space reclaim due to this extra vacuum c. may be some performance advantage as it will avoid the delay in cleaning dead tuples I think better way to test the patch is to see how much benefit is there due to above (a and b points) advantages. Different values of autovacuum_vacuum_threshold can be used to test. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 15 November 2013 10:00 Amit Kapila wrote: > On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi > wrote: > > On 12 November 2013 08:47 Amit Kapila wrote: > >> On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi > >> wrote: > >> > On 08 November 2013 18:35 Amit Kapila wrote: > >> >> On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi > >> >> wrote: > >> >> > On 07 November 2013 09:42 Amit Kapila wrote: > >> >> > 1. Taking a copy of n_dead_tuples before VACUUM starts and then > >> >> subtract it once it is done. > >> >> >This approach doesn't include the tuples which are remains > >> >> > during > >> >> the vacuum operation. > > > > Patch is modified as take a copy of n_dead_tuples during vacuum start > > and use the same while calculating the new dead tuples at end of > vacuum. > > > >> >> By the way, do you have test case or can you try to write a test > >> case > >> >> which can show this problem and then after fix, you can verify if > >> the > >> >> problem is resolved. > >> > > >> > The simulated index bloat problem can be generated using the > >> > attached > >> script and sql. > >> > With the fix of setting the dead tuples properly, > >> > >>Which fix here you are referring to, is it the one which you have > >> proposed with your initial mail? > >> > >> > the bloat is reduced and by changing the vacuum cost Parameters > the > >> > bloat is avoided. > > > > With the simulated bloat test run for 1 hour the bloat occurred as > > below, > > > > Unpatched - 1532MB > > Patched - 1474MB > > In your test run, have you checked what happen if after heavy update > (or once bloat occurs), if you keep the system idle (or just have read > load on system) for some time, what is the result? In the simulated test run which is shared in the previous mail, after a heavy update the system is idle for 15 mins. With the master code, the vacuum is not triggered during this idle time as it is Not satisfies the vacuum threshold, because it doesn't consider the dead tuples occurred During vacuum operation. With the fix the one extra vacuum can gets triggered compared to master code after two or three heavy updates because of accumulation of dead tuples. > You haven't answered one of my questions in previous mail ( >With the > fix of setting the dead tuples properly, the bloat is reduced and by > changing the vacuum cost Parameters the bloat is avoided. > Which fix here you are referring to?) The bloat reduced is same with initial and latest patch. The vacuum cost parameters change (which doesn't contain any fix) is avoided the bloat. Regards, Hari babu. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi wrote: > On 12 November 2013 08:47 Amit Kapila wrote: >> On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi >> wrote: >> > On 08 November 2013 18:35 Amit Kapila wrote: >> >> On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi >> >> wrote: >> >> > On 07 November 2013 09:42 Amit Kapila wrote: >> >> > 1. Taking a copy of n_dead_tuples before VACUUM starts and then >> >> subtract it once it is done. >> >> >This approach doesn't include the tuples which are remains >> >> > during >> >> the vacuum operation. > > Patch is modified as take a copy of n_dead_tuples during vacuum start and use > the same while calculating the new dead tuples at end of vacuum. > >> >> By the way, do you have test case or can you try to write a test >> case >> >> which can show this problem and then after fix, you can verify if >> the >> >> problem is resolved. >> > >> > The simulated index bloat problem can be generated using the attached >> script and sql. >> > With the fix of setting the dead tuples properly, >> >>Which fix here you are referring to, is it the one which you have >> proposed with your initial mail? >> >> > the bloat is reduced and by changing the vacuum cost Parameters the >> > bloat is avoided. > > With the simulated bloat test run for 1 hour the bloat occurred as below, > > Unpatched - 1532MB > Patched - 1474MB In your test run, have you checked what happen if after heavy update (or once bloat occurs), if you keep the system idle (or just have read load on system) for some time, what is the result? You haven't answered one of my questions in previous mail ( >With the fix of setting the dead tuples properly, the bloat is reduced and by changing the vacuum cost Parameters the bloat is avoided. Which fix here you are referring to?) With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 12 November 2013 08:47 Amit Kapila wrote: > On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi > wrote: > > On 08 November 2013 18:35 Amit Kapila wrote: > >> On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi > >> wrote: > >> > On 07 November 2013 09:42 Amit Kapila wrote: > >> > 1. Taking a copy of n_dead_tuples before VACUUM starts and then > >> subtract it once it is done. > >> >This approach doesn't include the tuples which are remains > >> > during > >> the vacuum operation. Patch is modified as take a copy of n_dead_tuples during vacuum start and use the same while calculating the new dead tuples at end of vacuum. > >> By the way, do you have test case or can you try to write a test > case > >> which can show this problem and then after fix, you can verify if > the > >> problem is resolved. > > > > The simulated index bloat problem can be generated using the attached > script and sql. > > With the fix of setting the dead tuples properly, > >Which fix here you are referring to, is it the one which you have > proposed with your initial mail? > > > the bloat is reduced and by changing the vacuum cost Parameters the > > bloat is avoided. With the simulated bloat test run for 1 hour the bloat occurred as below, Unpatched - 1532MB Patched - 1474MB With this patched approach the bloat is reduced. Regards, Hari babu. vacuum_fix_v4.patch Description: vacuum_fix_v4.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi wrote: > On 08 November 2013 18:35 Amit Kapila wrote: >> On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi >> wrote: >> > On 07 November 2013 09:42 Amit Kapila wrote: >> >> I am not sure whether the same calculation as done for >> new_rel_tuples >> >> works for new_dead_tuples, you can once check it. >> > >> > I didn't find any way to calculate new_dead_tuples like >> new_rel_tuples. >> > I will check it. >> > >> > The two approaches calculations are approximation values only. >> > >> > 1. Taking a copy of n_dead_tuples before VACUUM starts and then >> subtract it once it is done. >> >This approach doesn't include the tuples which are remains during >> the vacuum operation. >> >> Wouldn't next or future vacuum's will make the estimate more >> appropraite? > > Possible only when nkeep counter value (tuples not cleaned) is very less > value. Do you really expect too many dead tuples during Vacuum? >> > 2. nkeep counter contains the tuples which are still visible to other >> transactions. >> >This approach doesn't include tuples which are deleted on pages >> where vacuum operation is already finished. >> > >> > In my opinion the second approach gives the value nearer to the >> actual >> > value, because it includes some of the new dead tuples also. Please >> correct me if anything wrong in my analysis. >>I think main problem in nkeep logic is to come up with an estimation >> algorithm similar to live tuples. >> >> By the way, do you have test case or can you try to write a test case >> which can show this problem and then after fix, you can verify if the >> problem is resolved. > > The simulated index bloat problem can be generated using the attached script > and sql. > With the fix of setting the dead tuples properly, Which fix here you are referring to, is it the one which you have proposed with your initial mail? > the bloat is reduced and by changing the vacuum cost > Parameters the bloat is avoided. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 08 November 2013 18:35 Amit Kapila wrote: > On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi > wrote: > > On 07 November 2013 09:42 Amit Kapila wrote: > >> I am not sure whether the same calculation as done for > new_rel_tuples > >> works for new_dead_tuples, you can once check it. > > > > I didn't find any way to calculate new_dead_tuples like > new_rel_tuples. > > I will check it. > > > >> I am thinking that if we have to do estimation anyway, then wouldn't > >> it be better to do the way Tom had initially suggested (Maybe we > >> could have VACUUM copy the n_dead_tuples value as it exists when > >> VACUUM starts, and then send that as the value to subtract when it's > >> done?) > >> > >> I think the reason you gave that due to tuple visibility check the > >> number of dead tuples calculated by above logic is not accurate is > >> right but still it will make the value of dead tuples more > >> appropriate than it's current value. > >> > >> You can check if there is a way to do estimation of dead tuples > >> similar to new tuples, and it will be as solid as current logic of > >> vac_estimate_reltuples(), then it's okay, otherwise use the other > >> solution (using the value of n_dead_tuples at start of Vacuum) to > >> solve the problem. > > > > The two approaches calculations are approximation values only. > > > > 1. Taking a copy of n_dead_tuples before VACUUM starts and then > subtract it once it is done. > >This approach doesn't include the tuples which are remains during > the vacuum operation. > > Wouldn't next or future vacuum's will make the estimate more > appropraite? Possible only when nkeep counter value (tuples not cleaned) is very less value. > > 2. nkeep counter contains the tuples which are still visible to other > transactions. > >This approach doesn't include tuples which are deleted on pages > where vacuum operation is already finished. > > > > In my opinion the second approach gives the value nearer to the > actual > > value, because it includes some of the new dead tuples also. Please > correct me if anything wrong in my analysis. >I think main problem in nkeep logic is to come up with an estimation > algorithm similar to live tuples. > > By the way, do you have test case or can you try to write a test case > which can show this problem and then after fix, you can verify if the > problem is resolved. The simulated index bloat problem can be generated using the attached script and sql. With the fix of setting the dead tuples properly, the bloat is reduced and by changing the vacuum cost Parameters the bloat is avoided. The advantage with the fix is observed is the more number of times the auto vacuum is triggered on The bloated table, as it satisfies the vacuum criteria because of proper dead tuples compared to the original code. Regards, Hari babu. Table_and_functions.sql Description: Table_and_functions.sql script.sh Description: script.sh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi wrote: > On 07 November 2013 09:42 Amit Kapila wrote: >> I am not sure whether the same calculation as done for new_rel_tuples >> works for new_dead_tuples, you can once check it. > > I didn't find any way to calculate new_dead_tuples like new_rel_tuples. > I will check it. > >> I am thinking that if we have to do estimation anyway, then wouldn't it >> be better to do the way Tom had initially suggested (Maybe we could >> have VACUUM copy the n_dead_tuples value as it exists when VACUUM >> starts, and then send that as the value to subtract when it's done?) >> >> I think the reason you gave that due to tuple visibility check the >> number of dead tuples calculated by above logic is not accurate is >> right but still it will make the value of dead tuples more appropriate >> than it's current value. >> >> You can check if there is a way to do estimation of dead tuples similar >> to new tuples, and it will be as solid as current logic of >> vac_estimate_reltuples(), then it's okay, otherwise use the other >> solution (using the value of n_dead_tuples at start of Vacuum) to solve >> the problem. > > The two approaches calculations are approximation values only. > > 1. Taking a copy of n_dead_tuples before VACUUM starts and then subtract it > once it is done. >This approach doesn't include the tuples which are remains during the > vacuum operation. Wouldn't next or future vacuum's will make the estimate more appropraite? > 2. nkeep counter contains the tuples which are still visible to other > transactions. >This approach doesn't include tuples which are deleted on pages where > vacuum operation is already finished. > > In my opinion the second approach gives the value nearer to the actual value, > because it includes some of the new dead tuples also. Please correct me if > anything wrong in my analysis. I think main problem in nkeep logic is to come up with an estimation algorithm similar to live tuples. By the way, do you have test case or can you try to write a test case which can show this problem and then after fix, you can verify if the problem is resolved. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 07 November 2013 09:42 Amit Kapila wrote: > On Tue, Oct 22, 2013 at 2:09 PM, Haribabu kommi > wrote: > > On 22 October 2013 10:15 Amit Kapila wrote: > >>>On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi > wrote: > >>> > >>Actually what I had in mind was to use nkeep to estimate > n_dead_tuples > >>similar to how num_tuples is used to estimate n_live_tuples. I think > >>it will match what Tom had pointed in his response (What > >>>would make more sense to me is for VACUUM to estimate the number > >>of remaining dead tuples somehow and send that in its message. > >>However, since the whole point here is that we aren't accounting > >>for transactions that commit while VACUUM runs, it's not very > >>clear how to do that.) > > > > I changed the patch as passing the "nkeep" counter data as the new > dead tuples in the relation to stats like the new_rel_tuples. > > The "nkeep" counter is an approximation of dead tuples data of a > relation. > > Instead of resetting dead tuples stats as zero, used this value to > set n_dead_tuples same as n_live_tuples. > > Directly using nkeep might not work as it is not guaranteed that Vacuum > will scan all the pages, we need to estimate the value similar to > new_rel_tuples, something like as done in below function: > > /* now we can compute the new value for pg_class.reltuples */ > vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false, > > nblocks, > > vacrelstats->scanned_pages, > > num_tuples); > > I am not sure whether the same calculation as done for new_rel_tuples > works for new_dead_tuples, you can once check it. I didn't find any way to calculate new_dead_tuples like new_rel_tuples. I will check it. > I am thinking that if we have to do estimation anyway, then wouldn't it > be better to do the way Tom had initially suggested (Maybe we could > have VACUUM copy the n_dead_tuples value as it exists when VACUUM > starts, and then send that as the value to subtract when it's done?) > > I think the reason you gave that due to tuple visibility check the > number of dead tuples calculated by above logic is not accurate is > right but still it will make the value of dead tuples more appropriate > than it's current value. > > You can check if there is a way to do estimation of dead tuples similar > to new tuples, and it will be as solid as current logic of > vac_estimate_reltuples(), then it's okay, otherwise use the other > solution (using the value of n_dead_tuples at start of Vacuum) to solve > the problem. The two approaches calculations are approximation values only. 1. Taking a copy of n_dead_tuples before VACUUM starts and then subtract it once it is done. This approach doesn't include the tuples which are remains during the vacuum operation. 2. nkeep counter contains the tuples which are still visible to other transactions. This approach doesn't include tuples which are deleted on pages where vacuum operation is already finished. In my opinion the second approach gives the value nearer to the actual value, because it includes some of the new dead tuples also. Please correct me if anything wrong in my analysis. Regards, Hari babu. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Tue, Oct 22, 2013 at 2:09 PM, Haribabu kommi wrote: > On 22 October 2013 10:15 Amit Kapila wrote: >>>On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi >>>wrote: >>> >>Actually what I had in mind was to use nkeep to estimate n_dead_tuples >>similar to how num_tuples is used to estimate n_live_tuples. I think it will >>match what Tom had pointed in his response (What >>>would make more sense to me is for VACUUM to estimate the number >>of remaining dead tuples somehow and send that in its message. >>However, since the whole point here is that we aren't accounting for >>transactions that commit while VACUUM runs, it's not very clear how >>to do that.) > > I changed the patch as passing the "nkeep" counter data as the new dead > tuples in the relation to stats like the new_rel_tuples. > The "nkeep" counter is an approximation of dead tuples data of a relation. > Instead of resetting dead tuples stats as zero, used this value to set > n_dead_tuples same as n_live_tuples. Directly using nkeep might not work as it is not guaranteed that Vacuum will scan all the pages, we need to estimate the value similar to new_rel_tuples, something like as done in below function: /* now we can compute the new value for pg_class.reltuples */ vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false, nblocks, vacrelstats->scanned_pages, num_tuples); I am not sure whether the same calculation as done for new_rel_tuples works for new_dead_tuples, you can once check it. I am thinking that if we have to do estimation anyway, then wouldn't it be better to do the way Tom had initially suggested (Maybe we could have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts, and then send that as the value to subtract when it's done?) I think the reason you gave that due to tuple visibility check the number of dead tuples calculated by above logic is not accurate is right but still it will make the value of dead tuples more appropriate than it's current value. You can check if there is a way to do estimation of dead tuples similar to new tuples, and it will be as solid as current logic of vac_estimate_reltuples(), then it's okay, otherwise use the other solution (using the value of n_dead_tuples at start of Vacuum) to solve the problem. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 22 October 2013 10:15 Amit Kapila wrote: >>On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi >>wrote: >> >> Yes, it's correct. "nkeep" counter have the dead tuples which are recently >> dead and are not vacuumed. The removal of tuples vacuumed from dead tuples >> should be the same as "nkeep" counter. >> So if we remove the nkeep from num_tuples which gives us the proper live >> tuples. How about following statement at the end scan for all blocks. >> >> num_tuples -= nkeep; > >Actually what I had in mind was to use nkeep to estimate n_dead_tuples similar >to how num_tuples is used to estimate n_live_tuples. I think it will match >what Tom had pointed in his response (What >>would make more sense to me is for VACUUM to estimate the number >of remaining dead tuples somehow and send that in its message. >However, since the whole point here is that we aren't accounting for >transactions that commit while VACUUM runs, it's not very clear how >to do that.) I changed the patch as passing the "nkeep" counter data as the new dead tuples in the relation to stats like the new_rel_tuples. The "nkeep" counter is an approximation of dead tuples data of a relation. Instead of resetting dead tuples stats as zero, used this value to set n_dead_tuples same as n_live_tuples. Patch is attached in the mail. Please let me know if any changes are required. Regards, Hari Babu. vacuum_fix_v3.patch Description: vacuum_fix_v3.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi wrote: > On 20 October 2013 12:06 Amit Kapila wrote: >>On Tue, Oct 15, 2013 at 3:37 PM, Haribabu kommi >>wrote: >>> On 12 October 2013 11:30 Tom Lane wrote: Haribabu kommi writes: >>> Another way to look at it is that we want to keep any increments to n_dead_tuples that occur after VACUUM takes its snapshot. Maybe we could have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts, and then send that as the value to subtract when it's done? >>> >>> Taking of n_dead_tuples copy and pass the same at the vacuum end to >>> subtract from table stats may not be correct, as vacuum may not be cleaned >>> all the dead tuples because of tuple visibility To other transactions. How >>> about resets the n_dead_tuples as zero if it goes negative because of >>> errors? > >>Wouldn't the way you are planing to change n_dead_tuples create inconsistency >>for n_live_tuples and n_dead_tuples, because it would have counted non >>deleted tuples as n_live_tuples as per below code: >> >>if (tupgone) >>{ >>.. >>tups_vacuumed += 1; >>has_dead_tuples = true; >>} >>else >>{ >>num_tuples += 1; >>hastup = true; >>.. >>} >> >>So now if we just subtract tuples_deleted from n_dead_tuples, it will count >>the tuples deleted during vacuum both as live tuples and dead tuples. >>There is one statistics for dead row version's that cannot be removed >>(nkeep), if we could use that to estimate total remaining dead tuples, then >>the solution can be inline with Tom's suggestion (What >>would make more sense to me is for VACUUM to estimate the number of remaining >>dead tuples somehow and send that in its message.). > > Yes, it's correct. "nkeep" counter have the dead tuples which are recently > dead and are not vacuumed. The removal of tuples vacuumed from dead tuples > should be the same as "nkeep" counter. > So if we remove the nkeep from num_tuples which gives us the proper live > tuples. How about following statement at the end scan for all blocks. > > num_tuples -= nkeep; Actually what I had in mind was to use nkeep to estimate n_dead_tuples similar to how num_tuples is used to estimate n_live_tuples. I think it will match what Tom had pointed in his response (What would make more sense to me is for VACUUM to estimate the number of remaining dead tuples somehow and send that in its message. However, since the whole point here is that we aren't accounting for transactions that commit while VACUUM runs, it's not very clear how to do that.) With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 20 October 2013 12:06 Amit Kapila wrote: >On Tue, Oct 15, 2013 at 3:37 PM, Haribabu kommi >wrote: >> On 12 October 2013 11:30 Tom Lane wrote: >>>Haribabu kommi writes: To handle the above case instead of directly resetting the dead tuples as zero, how if the exact dead tuples are removed from the table stats. With this approach vacuum gets triggered frequently thus it reduces the bloat. >> >>>This does not seem like a very good idea as-is, because it will mean >>>that n_dead_tuples can diverge arbitrarily far from reality over time, as a >>>result of accumulation of errors. It also doesn't seem like a very good >>>idea that VACUUM sets n_live_tuples while only adjusting n_dead_tuples >>>incrementally; ideally those counters should move in the same fashion. >>>In short, I think this patch will create at least as many problems as it >>>fixes. >> >>>What would make more sense to me is for VACUUM to estimate the number >>>of remaining dead tuples somehow and send that in its message. However, >>>since the whole point here is that we aren't accounting for transactions >>>that commit while VACUUM runs, it's not very clear how to do that. >> >>>Another way to look at it is that we want to keep any increments to >>>n_dead_tuples that occur after VACUUM takes its snapshot. Maybe we could >>>have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts, >>>and then send that as the value to subtract when it's done? >> >> Taking of n_dead_tuples copy and pass the same at the vacuum end to >> subtract from table stats may not be correct, as vacuum may not be cleaned >> all the dead tuples because of tuple visibility To other transactions. How >> about resets the n_dead_tuples as zero if it goes negative because of errors? >Wouldn't the way you are planing to change n_dead_tuples create inconsistency >for n_live_tuples and n_dead_tuples, because it would have counted non deleted >tuples as n_live_tuples as per below code: > >if (tupgone) >{ >.. >tups_vacuumed += 1; >has_dead_tuples = true; >} >else >{ >num_tuples += 1; >hastup = true; >.. >} > >So now if we just subtract tuples_deleted from n_dead_tuples, it will count >the tuples deleted during vacuum both as live tuples and dead tuples. >There is one statistics for dead row version's that cannot be removed (nkeep), >if we could use that to estimate total remaining dead tuples, then the >solution can be inline with Tom's suggestion (What >would make more sense to me is for VACUUM to estimate the number of remaining >dead tuples somehow and send that in its message.). Yes, it's correct. "nkeep" counter have the dead tuples which are recently dead and are not vacuumed. The removal of tuples vacuumed from dead tuples should be the same as "nkeep" counter. So if we remove the nkeep from num_tuples which gives us the proper live tuples. How about following statement at the end scan for all blocks. num_tuples -= nkeep; please let me know if any corrections are required. Patch with the above implementation is attached in the mail. Regards, Hari babu. vacuum_fix_v2.patch Description: vacuum_fix_v2.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On Tue, Oct 15, 2013 at 3:37 PM, Haribabu kommi wrote: > On 12 October 2013 11:30 Tom Lane wrote: >>Haribabu kommi writes: >>> To handle the above case instead of directly resetting the dead tuples >>> as zero, how if the exact dead tuples are removed from the table stats. >>> With this approach vacuum gets triggered frequently thus it reduces the >>> bloat. > >>This does not seem like a very good idea as-is, because it will mean that >>n_dead_tuples can diverge arbitrarily far from reality over time, as a result >>of accumulation of errors. It also doesn't seem >>like a very good idea that VACUUM sets n_live_tuples while only adjusting >>n_dead_tuples incrementally; ideally those counters should move in the same >>fashion. >>In short, I think this patch will create at least as many problems as it >>fixes. > >>What would make more sense to me is for VACUUM to estimate the number of >>remaining dead tuples somehow and send that in its message. However, since >>the whole point here is that we aren't accounting for >>transactions that commit while VACUUM runs, it's not very clear how to do >>that. > >>Another way to look at it is that we want to keep any increments to >>n_dead_tuples that occur after VACUUM takes its snapshot. Maybe we could >>have VACUUM copy the n_dead_tuples value as it exists when >>VACUUM starts, and then send that as the value to subtract when it's done? > > Taking of n_dead_tuples copy and pass the same at the vacuum end to subtract > from table stats may not be correct, as vacuum may not be cleaned all the > dead tuples because of tuple visibility > To other transactions. How about resets the n_dead_tuples as zero if it goes > negative because of errors? Wouldn't the way you are planing to change n_dead_tuples create inconsistency for n_live_tuples and n_dead_tuples, because it would have counted non deleted tuples as n_live_tuples as per below code: if (tupgone) { .. tups_vacuumed += 1; has_dead_tuples = true; } else { num_tuples += 1; hastup = true; .. } So now if we just subtract tuples_deleted from n_dead_tuples, it will count the tuples deleted during vacuum both as live tuples and dead tuples. There is one statistics for dead row version's that cannot be removed (nkeep), if we could use that to estimate total remaining dead tuples, then the solution can be inline with Tom's suggestion (What would make more sense to me is for VACUUM to estimate the number of remaining dead tuples somehow and send that in its message.). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
On 12 October 2013 11:30 Tom Lane wrote: >Haribabu kommi writes: >> To handle the above case instead of directly resetting the dead tuples >> as zero, how if the exact dead tuples are removed from the table stats. With >> this approach vacuum gets triggered frequently thus it reduces the bloat. >This does not seem like a very good idea as-is, because it will mean that >n_dead_tuples can diverge arbitrarily far from reality over time, as a result >of accumulation of errors. It also doesn't seem >like a very good idea that VACUUM sets n_live_tuples while only adjusting >n_dead_tuples incrementally; ideally those counters should move in the same >fashion. >In short, I think this patch will create at least as many problems as it fixes. >What would make more sense to me is for VACUUM to estimate the number of >remaining dead tuples somehow and send that in its message. However, since >the whole point here is that we aren't accounting for >transactions that commit while VACUUM runs, it's not very clear how to do that. >Another way to look at it is that we want to keep any increments to >n_dead_tuples that occur after VACUUM takes its snapshot. Maybe we could have >VACUUM copy the n_dead_tuples value as it exists when >VACUUM starts, and then send that as the value to subtract when it's done? Taking of n_dead_tuples copy and pass the same at the vacuum end to subtract from table stats may not be correct, as vacuum may not be cleaned all the dead tuples because of tuple visibility To other transactions. How about resets the n_dead_tuples as zero if it goes negative because of errors? Regards, Hari babu. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
Haribabu kommi writes: > To handle the above case instead of directly resetting the dead tuples as > zero, how if the exact dead tuples > are removed from the table stats. With this approach vacuum gets triggered > frequently thus it reduces the bloat. This does not seem like a very good idea as-is, because it will mean that n_dead_tuples can diverge arbitrarily far from reality over time, as a result of accumulation of errors. It also doesn't seem like a very good idea that VACUUM sets n_live_tuples while only adjusting n_dead_tuples incrementally; ideally those counters should move in the same fashion. In short, I think this patch will create at least as many problems as it fixes. What would make more sense to me is for VACUUM to estimate the number of remaining dead tuples somehow and send that in its message. However, since the whole point here is that we aren't accounting for transactions that commit while VACUUM runs, it's not very clear how to do that. Another way to look at it is that we want to keep any increments to n_dead_tuples that occur after VACUUM takes its snapshot. Maybe we could have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts, and then send that as the value to subtract when it's done? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running
Haribabu kommi wrote: > To handle the above case instead of directly resetting the dead > tuples as zero, how if the exact dead tuples are removed from the > table stats. With this approach vacuum gets triggered frequently > thus it reduces the bloat. > Patch for the same is attached in the mail. Please add this to the open CommitFest to ensure that it gets reviewed: https://commitfest.postgresql.org/action/commitfest_view/open For more information about the process, see: http://wiki.postgresql.org/wiki/CommitFest You may also want to reveiw: http://wiki.postgresql.org/wiki/Developer_FAQ -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Heavily modified big table bloat even in auto vacuum is running
vacuum is not happening on a heavily modified big table even if the dead tuples are more than configured threshold. This is because during at the end of vacuum, the number of dead tuples of the table is reset as zero, because of this reason the dead tuples which are occurred during the vacuum operation are lost. Thus to trigger a next vacuum on the same table, the configured threshold number of dead tuples needs to be occurred. The next vacuum operation is taking more time because of more number of dead tuples, like this it continues and it leads to Table and index bloat. To handle the above case instead of directly resetting the dead tuples as zero, how if the exact dead tuples are removed from the table stats. With this approach vacuum gets triggered frequently thus it reduces the bloat. Patch for the same is attached in the mail. please let me know is there any problem in this approach. Regards, Hari babu. vacuum_fix_v1.patch Description: vacuum_fix_v1.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers