Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2014-01-18 Thread Tom Lane
Amit Kapila amit.kapil...@gmail.com writes:
 I am marking this (based on patch vacuum_fix_v7_nkeep.patch) as Ready
 For Committer.

I've reviewed and committed this patch, with one significant change.
If you look at the way that vacuumlazy.c computes new_rel_tuples, it's
based on scanned_tuples (lazy_scan_heap's num_tuples), which is the total
number of surviving tuples *including* the recently-dead ones counted in
nkeep.  This is the number that we want to put into pg_class.reltuples,
I think, but it's wrong for the pgstats stuff to use it as n_live_tuples
if we're going to count the recently-dead ones as dead.  That is, if we're
improving the approximation that n_dead_tuples is zero after a vacuum,
the fix should involve reducing the n_live_tuples estimate as well as
increasing the n_dead_tuples estimate.
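In code terms, the committed change is roughly of this shape (a simplified
sketch of the idea, not the literal committed diff; it assumes
pgstat_report_vacuum() carries a dead-tuples count in addition to the live
count, and new_dead_tuples is the nkeep count carried out of
lazy_scan_heap()):

/*
 * Report a live-tuple count that excludes the recently-dead tuples, and
 * report those recently-dead tuples as the remaining dead count, instead
 * of letting pgstats assume n_dead_tuples is zero after the vacuum.
 */
new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
if (new_live_tuples < 0)
    new_live_tuples = 0;        /* just in case the estimates cross */

pgstat_report_vacuum(RelationGetRelid(onerel),
                     onerel->rd_rel->relisshared,
                     new_live_tuples,
                     vacrelstats->new_dead_tuples);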

Using your test script against the unpatched code, it's easy to see that
there's a large (and wrong) value of n_live_tup reported by an autovacuum,
which gets corrected by the next autoanalyze.  For instance note these
successive readouts from the pg_stat_all_tables query:

 n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup
------------+------------------+-------------------+------------
     497365 |                9 |                 8 |    4958346
     497365 |                9 |                 8 |    5458346
    1186555 |               10 |                 8 |          0
    1186555 |               10 |                 8 |         50
     499975 |               10 |                 9 |    2491877

Since we know the true number of live tuples is always exactly 500000
in this test, that jump is certainly wrong.  With the committed patch,
the behavior is significantly saner:

 n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup
------------+------------------+-------------------+------------
     483416 |                2 |                 2 |    5759861
     483416 |                2 |                 2 |    6259861
     655171 |                3 |                 2 |     382306
     655171 |                3 |                 2 |     882306
     553942 |                3 |                 3 |    3523744

Still some room for improvement, but it's not so silly anymore.

It strikes me that there may be an obvious way to improve the number
further, based on the observation in this thread that nkeep doesn't need
to be scaled up because VACUUM should have scanned every page that could
contain dead tuples.  Namely, that we're arriving at new_rel_tuples by
scaling up num_tuples linearly --- but perhaps we should only scale up
the live-tuples fraction of that count, not the dead-tuples fraction.
By scaling up dead tuples too, we are presumably still overestimating
new_rel_tuples somewhat, and the behavior that I'm seeing with this test
script seems to confirm that.  I haven't time to pursue this idea at the
moment, but perhaps someone else would like to.
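A rough sketch of that idea, in terms of the existing lazy_scan_heap()
variables (hypothetical and untested; whether vac_estimate_reltuples()
behaves sensibly when handed only the live fraction is exactly the part
that needs more thought):

/* extrapolate only the live tuples seen in the scanned pages */
live_seen = num_tuples - nkeep;
live_est = vac_estimate_reltuples(onerel, false,
                                  nblocks,
                                  vacrelstats->scanned_pages,
                                  live_seen);

/* nkeep is not scaled up: every page that could hold dead tuples was visited */
vacrelstats->new_rel_tuples = live_est + nkeep;
vacrelstats->new_dead_tuples = nkeep;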

The n_dead_tup values seem to still be on the high side (not the low
side) when I run this test.  Not too sure why that is.

Also, I don't see any particularly meaningful change in the rate of
autovacuuming or autoanalyzing when using default postgresql.conf
settings.  I see that your test case changes the autovac parameters quite
a bit, but I didn't bother installing those values here.  This may or may
not mean much; the fix is clearly correct on its own terms.

On the whole, this patch is not really addressing the question of
accounting for transactions changing the table concurrently with
VACUUM; it's only fixing the impedance mismatch between pgstats wanting
to count live and dead tuples separately while VACUUM wasn't telling
it that.  That's a good thing to do, but I think we still have some
issues here.

regards, tom lane




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2014-01-18 Thread Amit Kapila
On Sun, Jan 19, 2014 at 5:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Kapila amit.kapil...@gmail.com writes:
 I am marking this (based on patch vacuum_fix_v7_nkeep.patch) as Ready
 For Committer.

 I've reviewed and committed this patch, with one significant change.
 If you look at the way that vacuumlazy.c computes new_rel_tuples, it's
 based on scanned_tuples (lazy_scan_heap's num_tuples), which is the total
 number of surviving tuples *including* the recently-dead ones counted in
 nkeep.  This is the number that we want to put into pg_class.reltuples,
 I think, but it's wrong for the pgstats stuff to use it as n_live_tuples
 if we're going to count the recently-dead ones as dead.  That is, if we're
 improving the approximation that n_dead_tuples is zero after a vacuum,
 the fix should involve reducing the n_live_tuples estimate as well as
 increasing the n_dead_tuples estimate.

 Using your test script against the unpatched code, it's easy to see that
 there's a large (and wrong) value of n_live_tup reported by an autovacuum,
 which gets corrected by the next autoanalyze.  For instance note these
 successive readouts from the pg_stat_all_tables query:

  n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup
 ------------+------------------+-------------------+------------
      497365 |                9 |                 8 |    4958346
      497365 |                9 |                 8 |    5458346
     1186555 |               10 |                 8 |          0
     1186555 |               10 |                 8 |         50
      499975 |               10 |                 9 |    2491877

 Since we know the true number of live tuples is always exactly 500000
 in this test, that jump is certainly wrong.  With the committed patch,
 the behavior is significantly saner:

  n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup
 ------------+------------------+-------------------+------------
      483416 |                2 |                 2 |    5759861
      483416 |                2 |                 2 |    6259861
      655171 |                3 |                 2 |     382306
      655171 |                3 |                 2 |     882306
      553942 |                3 |                 3 |    3523744

 Still some room for improvement, but it's not so silly anymore.

 It strikes me that there may be an obvious way to improve the number
 further, based on the observation in this thread that nkeep doesn't need
 to be scaled up because VACUUM should have scanned every page that could
 contain dead tuples.  Namely, that we're arriving at new_rel_tuples by
 scaling up num_tuples linearly --- but perhaps we should only scale up
 the live-tuples fraction of that count, not the dead-tuples fraction.
 By scaling up dead tuples too, we are presumably still overestimating
 new_rel_tuples somewhat, and the behavior that I'm seeing with this test
 script seems to confirm that.

After reading your analysis, my first thought was that we could directly
subtract nkeep from num_tuples to get better scaling of live tuples.
However, the scaling routine vac_estimate_reltuples() expects
scanned_tuples, and it is shared by both Analyze and Vacuum, whose
mechanisms for calculating live and dead tuples seem to differ a bit, so
directly passing num_tuples minus nkeep to that routine might create some
problem.  Still, I think this idea is definitely worth pursuing to
improve Vacuum's estimate of live tuples.
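For reference, the call being discussed is the one quoted further down in
this thread; its last argument is the raw count of tuples seen in the
scanned pages, which is why simply passing num_tuples minus nkeep needs a
closer look:

vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel,
                                                     false,   /* is_analyze */
                                                     nblocks,
                                                     vacrelstats->scanned_pages,
                                                     num_tuples);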

 I haven't time to pursue this idea at the moment, but perhaps someone else 
 would like to.

I think this idea is worth adding to the Todo list.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2014-01-18 Thread Tom Lane
Amit Kapila amit.kapil...@gmail.com writes:
 On Sun, Jan 19, 2014 at 5:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It strikes me that there may be an obvious way to improve the number
 further, based on the observation in this thread that nkeep doesn't need
 to be scaled up because VACUUM should have scanned every page that could
 contain dead tuples.  Namely, that we're arriving at new_rel_tuples by
 scaling up num_tuples linearly --- but perhaps we should only scale up
 the live-tuples fraction of that count, not the dead-tuples fraction.
 By scaling up dead tuples too, we are presumably still overestimating
 new_rel_tuples somewhat, and the behavior that I'm seeing with this test
 script seems to confirm that.

 After reading your analysis, first thought occurred to me is that we can
 directly subtract nkeep from num_tuples to account for better scaling
 of live tuples, but I think the scaling routine vac_estimate_reltuples()
 is expecting scanned_tuples and this routine is shared by both
 Analyze and Vacuum where the mechanism to calculate the live
 and dead tuples seems to be bit different, so may be directly passing
 a subtract of num_tuples and nkeep to this routine might create some
 problem. However I think this idea is definitely worth pursuing to
 improve the estimates of live tuples in Vacuum.

Yeah, it seemed like it would require some rethinking of the way
vac_estimate_reltuples() works.  It's probably not that hard, but it
looked like it'd require more thought than I wanted to put into it on
a Saturday ;-)

regards, tom lane




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2014-01-16 Thread tirtho
Is this now fixed? If so, where do I find the patch for Postgres 9.2.2?

Thanks

- Tirthankar







Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2014-01-16 Thread Amit Kapila
On Thu, Jan 16, 2014 at 10:24 PM, tirtho tir...@hotmail.com wrote:
 Is this now fixed? If so, where do I find the patch for postgres 9.2.2.

   This is not fixed yet; the patch is in the Ready For Committer stage.

   You can confirm the status here:
   https://commitfest.postgresql.org/action/patch_view?id=1258

   I verified that it still applies on HEAD, so I moved it to the current
   Commit Fest.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-16 Thread Peter Eisentraut
Amit, if you think this patch is ready now, please mark it as Ready for
Committer.  Otherwise, I encourage you to continue the discussion and
possibly resubmit the patch to the next commitfest.




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-16 Thread Amit Kapila
On Mon, Dec 16, 2013 at 6:51 PM, Peter Eisentraut pete...@gmx.net wrote:
 Amit, if you think this patch is ready now, please mark it as Ready for
 Committer.  Otherwise, I encourage your to continue the discussion and
 possibly resubmit the patch to the next commitfest.

Only one test is pending from my side before I can conclude.  I am
planning to complete it tomorrow.
Is that okay?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-16 Thread Amit Kapila
On Thu, Dec 12, 2013 at 12:24 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 06 December 2013 11:57 Amit Kapila wrote:

 A simplified test and an updated patch taking care of the above comment are
 attached in the mail.
 I am not able to reduce the test duration, but I changed it so that the test
 automatically exits after a 45-minute run.
 Please check the vacuum_test.sh file for more details on running the test.

                  Auto vacuum count   Bloat size
  Master                 15             220MB
  Patched_nkeep          18             213MB


I ran the attached test and the numbers are as below:

                 Auto vacuum count   Bloat size
 Master                 16             222MB
 Patched_nkeep          23             216MB

Here, Bloat size means the table size after the test finished, and Auto
vacuum count means the number of times autovacuum was triggered during the
test run.

This clearly shows that setting the number of dead tuples at the end of
Vacuum improves the situation.
I am marking this (based on patch vacuum_fix_v7_nkeep.patch) as Ready
For Committer.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-11 Thread Haribabu kommi
On 06 December 2013 11:57 Amit Kapila wrote:
 On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 29 November 2013 12:00 Amit Kapila wrote:
  On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
  haribabu.ko...@huawei.com wrote:
  Few questions about your latest patch:
  a. Is there any reason why you are doing estimation of dead tuples
  only for Autovacuum and not for Vacuum.
 
  No, changed.
 
  /* clear and get the new stats for calculating proper dead tuples */
  pgstat_clear_snapshot(); tabentry =
  pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
  b. In the above code, to get latest data you are first clearing
  snapshot and then calling pgstat function. It will inturn perform
 I/O
  (read of stats file) and send/receive message from stats
  collector to ensure it can read latest data. I think it will add
 overhead
  to Vacuum, especially if 'nkeep' calculated in function
  lazy_scan_heap() can serve the purpose. In my simple test[1], I
  observed
  that value of keep can serve the purpose.
 
  Can you please once try the test on 'nkeep' approach patch.
 
  Using the nkeep and snapshot approach, I ran the test for 40 mins
 with
  a high analyze_threshold and results are below.
 
  Auto vacuum count   Bloat size
  Master   11  220MB
  Patched_nkeep  14  215MB
  Patched_snapshot   18  198MB
 
  Both the approaches are showing good improvement in the test.
  Updated patches, test script and configuration is attached in the
 mail.
 
 I think updating dead tuple count using nkeep is good idea as similar
 thing is done for Analyze as well in acquire_sample_rows().
 One minor point, I think it is better to log dead tuples is below error
 message:
 ereport(LOG,
 (errmsg(automatic vacuum of table \%s.%s.%s\: index scans: %d\n
 pages: %d removed, %d remain\n
 tuples: %.0f removed, %.0f remain\n
 
 tuples: %.0f removed, %.0f remain, %.0f dead\n
 
 
 About your test, how to collect the data by running this script, are
 you manually stopping it after 40 mins, because I ran it for more than
 an hour, the final result didn't came.
 As I mentioned you last time, please simplify your test, for other
 person in its current form, it is difficult to make meaning out of it.
 Write comments on top of it in steps form to explain what exactly it is
 doing and how to take data using it (for example, do I need to wait,
 till script ends; how long this test can take to complete).

A simplified test and an updated patch taking care of the above comment are
attached in the mail.
I am not able to reduce the test duration, but I changed it so that the test
automatically exits after a 45-minute run.
Please check the vacuum_test.sh file for more details on running the test.

                 Auto vacuum count   Bloat size
 Master                 15             220MB
 Patched_nkeep          18             213MB

Please let me know your suggestions.

Regards,
Hari babu.



Attachments: vacuum_fix_v7_nkeep.patch, test_script.tar.gz



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-05 Thread Amit Kapila
On Fri, Nov 29, 2013 at 6:55 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 29 November 2013 12:00 Amit Kapila wrote:
 On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
 Few questions about your latest patch:
 a. Is there any reason why you are doing estimation of dead tuples only
 for Autovacuum and not for Vacuum.

 No, changed.

 /* clear and get the new stats for calculating proper dead tuples */
 pgstat_clear_snapshot(); tabentry =
 pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
 b. In the above code, to get latest data you are first clearing
 snapshot and then calling pgstat function. It will inturn perform I/O
 (read of stats file) and send/receive message from stats collector
 to ensure it can read latest data. I think it will add overhead
 to Vacuum, especially if 'nkeep' calculated in function
 lazy_scan_heap() can serve the purpose. In my simple test[1], I
 observed
 that value of keep can serve the purpose.

 Can you please once try the test on 'nkeep' approach patch.

 Using the nkeep and snapshot approach, I ran the test for 40 mins with a
 high analyze_threshold and results are below.

 Auto vacuum count   Bloat size
 Master   11  220MB
 Patched_nkeep  14  215MB
 Patched_snapshot   18  198MB

 Both the approaches are showing good improvement in the test.
 Updated patches, test script and configuration is attached in the mail.

I think updating the dead tuple count using nkeep is a good idea, as a
similar thing is done for Analyze in acquire_sample_rows().

One minor point: I think it is better to also log dead tuples in the log
message below, i.e. change

ereport(LOG,
        (errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
                "pages: %d removed, %d remain\n"
                "tuples: %.0f removed, %.0f remain\n"
                ...

so that the tuples line becomes

                "tuples: %.0f removed, %.0f remain, %.0f dead\n"

About your test: how do I collect the data by running this script?  Are
you manually stopping it after 40 mins?  I ran it for more than an hour
and the final result didn't come.
As I mentioned last time, please simplify your test; in its current form
it is difficult for another person to make sense of it.
Write comments at the top of it, in step form, to explain what exactly it
is doing and how to take data using it (for example, do I need to wait
till the script ends; how long can this test take to complete).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-29 Thread Haribabu kommi
On 29 November 2013 12:00 Amit Kapila wrote: 
 On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 25 November 2013 10:43 Amit Kapila wrote:
  On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
  haribabu.ko...@huawei.com wrote:
   On 19 November 2013 10:33 Amit Kapila wrote:
   If I understood correctly, then your patch's main intention is to
   correct the estimate of dead tuples, so that it can lead to
 Vacuum
   cleaning the table/index which otherwise is not happening as per
   configuration value (autovacuum_vacuum_threshold) in some of the
   cases, also it is not reducing the complete bloat (Unpatched -
  1532MB
   ~Patched   - 1474MB), as the main reason of bloat is extra space
 in
   index which can be reclaimed by reindex operation.
  
   So if above is correct then this patch has 3 advantages:
   a. Extra Vacuum on table/index due to better estimation of dead
  tuples.
   b. Space reclaim due to this extra vacuum c. may be some
   performance advantage as it will avoid the delay in cleaning dead
   tuples
  
   I think better way to test the patch is to see how much benefit
 is
   there due to above (a and b points) advantages. Different values
   of autovacuum_vacuum_threshold can be used to test.
  
  
   The performance effect of the patch is not much visible as I think
  the
   analyze on the table estimates the number of dead tuples of the
   table
  with some estimation.
 
 Yes, that seems to be the reason why you are not seeing any
  performance benefit, but still I think this is useful optimization
 to
  do, as
 analyze updates both the livetuples and dead tuples and similarly
  vacuum should also update both the counts. Do you see any reason
 why Vacuum should only update live tuples and not deadtuples?
 
  As vacuum touches all the pages where the dead tuples are present.
  This is not the Same with analyzer. Because of this reason, the
 analyzer estimates the dead tuples also.
  With the proposed patch the vacuum also estimates the dead tuples.
 
 Few questions about your latest patch:
 a. Is there any reason why you are doing estimation of dead tuples only
 for Autovacuum and not for Vacuum.

No, changed.
 
 /* clear and get the new stats for calculating proper dead tuples */
 pgstat_clear_snapshot(); tabentry =
 pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
 b. In the above code, to get latest data you are first clearing
 snapshot and then calling pgstat function. It will inturn perform I/O
 (read of stats file) and send/receive message from stats collector
 to ensure it can read latest data. I think it will add overhead
 to Vacuum, especially if 'nkeep' calculated in function
 lazy_scan_heap() can serve the purpose. In my simple test[1], I
 observed
 that value of keep can serve the purpose.
 
 Can you please once try the test on 'nkeep' approach patch.

Using the nkeep and snapshot approaches, I ran the test for 40 mins with a
high analyze_threshold, and the results are below.

                    Auto vacuum count   Bloat size
Master                     11             220MB
Patched_nkeep              14             215MB
Patched_snapshot           18             198MB

Both approaches show a good improvement in the test.
Updated patches, the test script, and the configuration are attached in the mail.

Regards,
Hari babu.



Attachments: vacuum_fix_v6_nkeep.patch, test_script.tar.gz, vacuum_fix_v6_snapshot.patch



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-28 Thread Amit Kapila
On Tue, Nov 26, 2013 at 7:26 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 25 November 2013 10:43 Amit Kapila wrote:
 On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 19 November 2013 10:33 Amit Kapila wrote:
  If I understood correctly, then your patch's main intention is to
  correct the estimate of dead tuples, so that it can lead to Vacuum
  cleaning the table/index which otherwise is not happening as per
  configuration value (autovacuum_vacuum_threshold) in some of the
  cases, also it is not reducing the complete bloat (Unpatched -
 1532MB
  ~Patched   - 1474MB), as the main reason of bloat is extra space in
  index which can be reclaimed by reindex operation.
 
  So if above is correct then this patch has 3 advantages:
  a. Extra Vacuum on table/index due to better estimation of dead
 tuples.
  b. Space reclaim due to this extra vacuum c. may be some performance
  advantage as it will avoid the delay in cleaning dead tuples
 
  I think better way to test the patch is to see how much benefit is
  there due to above (a and b points) advantages. Different values of
  autovacuum_vacuum_threshold can be used to test.
 
 
  The performance effect of the patch is not much visible as I think
 the
  analyze on the table estimates the number of dead tuples of the table
 with some estimation.

Yes, that seems to be the reason why you are not seeing any
 performance benefit, but still I think this is useful optimization to
 do, as
analyze updates both the livetuples and dead tuples and similarly
 vacuum should also update both the counts. Do you see any reason
why Vacuum should only update live tuples and not deadtuples?

 As vacuum touches all the pages where the dead tuples are present. This is 
 not the
 Same with analyzer. Because of this reason, the analyzer estimates the dead 
 tuples also.
 With the proposed patch the vacuum also estimates the dead tuples.

A few questions about your latest patch:
a. Is there any reason why you are doing the estimation of dead tuples
only for Autovacuum and not for Vacuum?

/* clear and get the new stats for calculating proper dead tuples */
pgstat_clear_snapshot();
tabentry = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
b. In the above code, to get the latest data you first clear the
snapshot and then call the pgstat function.  That will in turn perform I/O
(reading the stats file) and send/receive messages to/from the stats
collector to ensure it can read the latest data.  I think it will add
overhead to Vacuum, especially since 'nkeep', calculated in
lazy_scan_heap(), can serve the purpose.  In my simple test [1], I
observed that the value of nkeep can serve the purpose.

Can you please try the test once with the 'nkeep' approach patch?

  Because of this reason not much performance improvement is not
 visible
  as the missed dead tuple calculation in vacuum is covered by the
 analyze.

Yeah, so might be we can check once by configuring
 analyze_threshold/scalefactor in a way that analyze doesn't get trigger
 during your test.

 I ran the test for one hour with a high analyze_threshold and results are 
 below.

 Auto vacuum count   Bloat size
 Master  15  155MB
 Patched 23  134MB

 Updated test script and configuration is attached in the mail.

I just had a brief look at your test; please check if you can simplify
your script file and make the test results come out in 15~20 mins.
Don't put too much effort into it; if you can do it easily then it is okay.

[1] Simple test case to verify the value of dead tuples:
Session-1
-
a. Create table t1(c1 int);
b. insert into t1 values(generate_series(1,1000));
c. delete from t1;
d. Vacuum t1;
-- here I stopped in the debugger after fetching the dead tuple count the
first time (line 235, vacuumlazy.c, after applying your patch) as per your
code (modified a bit so that I can get the value for Vacuum)

Session-2
-
a. insert into t1 values (generate_series(1000,1500));
b. delete from t1;

Session-1
-
b. Verified the value of nkeep in lazy_scan_heap(): it is 501 (the 501 rows
from generate_series(1000,1500) were deleted after VACUUM started, so they
are dead but not yet removable), which is what we expect.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-26 Thread Haribabu kommi
On 25 November 2013 10:43 Amit Kapila wrote:
 On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 19 November 2013 10:33 Amit Kapila wrote:
  If I understood correctly, then your patch's main intention is to
  correct the estimate of dead tuples, so that it can lead to Vacuum
  cleaning the table/index which otherwise is not happening as per
  configuration value (autovacuum_vacuum_threshold) in some of the
  cases, also it is not reducing the complete bloat (Unpatched -
 1532MB
  ~Patched   - 1474MB), as the main reason of bloat is extra space in
  index which can be reclaimed by reindex operation.
 
  So if above is correct then this patch has 3 advantages:
  a. Extra Vacuum on table/index due to better estimation of dead
 tuples.
  b. Space reclaim due to this extra vacuum c. may be some performance
  advantage as it will avoid the delay in cleaning dead tuples
 
  I think better way to test the patch is to see how much benefit is
  there due to above (a and b points) advantages. Different values of
  autovacuum_vacuum_threshold can be used to test.
 
 
  The performance effect of the patch is not much visible as I think
 the
  analyze on the table estimates the number of dead tuples of the table
 with some estimation.
 
Yes, that seems to be the reason why you are not seeing any
 performance benefit, but still I think this is useful optimization to
 do, as
analyze updates both the livetuples and dead tuples and similarly
 vacuum should also update both the counts. Do you see any reason
why Vacuum should only update live tuples and not deadtuples?

Vacuum touches all the pages where dead tuples are present; that is not
the case with the analyzer.  For that reason, the analyzer has to estimate
the dead tuples as well.
With the proposed patch, vacuum also estimates the dead tuples.

  Because of this reason not much performance improvement is not
 visible
  as the missed dead tuple calculation in vacuum is covered by the
 analyze.
 
Yeah, so might be we can check once by configuring
 analyze_threshold/scalefactor in a way that analyze doesn't get trigger
 during your test.

I ran the test for one hour with a high analyze_threshold, and the results
are below.

                Auto vacuum count   Bloat size
Master                 15             155MB
Patched                23             134MB

The updated test script and configuration are attached in the mail.

Regards,
Hari babu.


Attachment: test_script.tar.gz



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-24 Thread Amit Kapila
On Fri, Nov 22, 2013 at 12:12 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 19 November 2013 10:33 Amit Kapila wrote:
 If I understood correctly, then your patch's main intention is to
 correct the estimate of dead tuples, so that it can lead to Vacuum
 cleaning the table/index which otherwise is not happening as per
 configuration value (autovacuum_vacuum_threshold) in some of the cases,
 also it is not reducing the complete bloat (Unpatched - 1532MB
 ~Patched   - 1474MB), as the main reason of bloat is extra space in
 index which can be reclaimed by reindex operation.

 So if above is correct then this patch has 3 advantages:
 a. Extra Vacuum on table/index due to better estimation of dead tuples.
 b. Space reclaim due to this extra vacuum c. may be some performance
 advantage as it will avoid the delay in cleaning dead tuples

 I think better way to test the patch is to see how much benefit is
 there due to above (a and b points) advantages. Different values of
 autovacuum_vacuum_threshold can be used to test.


 The performance effect of the patch is not much visible as I think the analyze
 on the table estimates the number of dead tuples of the table with some 
 estimation.

   Yes, that seems to be the reason why you are not seeing any
   performance benefit, but I still think this is a useful optimization
   to do: analyze updates both the live tuples and the dead tuples, and
   similarly vacuum should also update both counts.  Do you see any
   reason why Vacuum should only update live tuples and not dead tuples?

 Because of this reason not much performance improvement is not visible as the
 missed dead tuple calculation in vacuum is covered by the analyze.

   Yeah, so maybe we can check once by configuring
   analyze_threshold/scalefactor in a way that analyze doesn't get
   triggered during your test.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-21 Thread Haribabu kommi
On 19 November 2013 10:33 Amit Kapila wrote:
 If I understood correctly, then your patch's main intention is to
 correct the estimate of dead tuples, so that it can lead to Vacuum
 cleaning the table/index which otherwise is not happening as per
 configuration value (autovacuum_vacuum_threshold) in some of the cases,
 also it is not reducing the complete bloat (Unpatched - 1532MB
 ~Patched   - 1474MB), as the main reason of bloat is extra space in
 index which can be reclaimed by reindex operation.
 
 So if above is correct then this patch has 3 advantages:
 a. Extra Vacuum on table/index due to better estimation of dead tuples.
 b. Space reclaim due to this extra vacuum c. may be some performance
 advantage as it will avoid the delay in cleaning dead tuples
 
 I think better way to test the patch is to see how much benefit is
 there due to above (a and b points) advantages. Different values of
 autovacuum_vacuum_threshold can be used to test.

I modified the test and did a performance test with the following
configuration changes:
autovacuum_vacuum_threshold set to 3 times the number of records in the table,
autovacuum_naptime set to 30s.

The test script generates the configured vacuum-threshold worth of data in
45 sec.  After a 180 sec test, it sleeps for 2 min.

After a one-hour test run, the patched approach showed one more autovacuum
than the master code.  I am not sure whether this difference would even be
visible in long runs.

The performance effect of the patch is not very visible because, I think,
the analyze on the table estimates the number of dead tuples of the table
with some estimation.
Because of this, not much performance improvement is visible, as the
missed dead tuple calculation in vacuum is covered by the analyze.  Please
correct me if anything is missed in my analysis.

The updated patch and test scripts are attached in the mail.

Regards,
Hari babu.



Attachments: vacuum_fix_v5.patch, Table_and_functions.sql, script.sh



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-18 Thread Amit Kapila
On Fri, Nov 15, 2013 at 10:52 AM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 15 November 2013 10:00 Amit Kapila wrote:
 On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
 
  With the simulated bloat test run for 1 hour the bloat occurred as
  below,
 
  Unpatched - 1532MB
  Patched   - 1474MB

 In your test run, have you checked what happen if after heavy update
 (or once bloat occurs), if you keep the system idle (or just have read
 load on system) for some time, what is the result?

 In the simulated test run which is shared in the previous mail, after a heavy 
 update
 the system is idle for 15 mins.

 With the master code, the vacuum is not triggered during this idle time as it 
 is
 Not satisfies the vacuum threshold, because it doesn't consider the dead 
 tuples occurred
 During vacuum operation.

 With the fix the one extra vacuum can gets triggered compared to master code 
 after two or three
 heavy updates because of accumulation of dead tuples.

 You haven't answered one of my questions in previous mail ( With the
 fix of setting the dead tuples properly, the bloat is reduced and by
 changing the vacuum cost Parameters the bloat is avoided.
 Which fix here you are referring to?)

 The bloat reduced is same with initial and latest patch.
 The vacuum cost parameters change (which doesn't contain any fix) is avoided 
 the bloat.

If I understood correctly, your patch's main intention is to correct the
estimate of dead tuples, so that it can lead to Vacuum cleaning the
table/index, which otherwise is not happening per the configuration value
(autovacuum_vacuum_threshold) in some of the cases; also, it is not
removing the bloat completely (Unpatched - 1532MB vs. Patched - 1474MB),
as the main cause of the bloat is extra space in the index, which can be
reclaimed by a reindex operation.

So if the above is correct, then this patch has 3 advantages:
a. Extra Vacuum on the table/index due to better estimation of dead tuples.
b. Space reclaimed due to this extra vacuum.
c. Maybe some performance advantage, as it will avoid the delay in
cleaning dead tuples.

I think a better way to test the patch is to see how much benefit there is
due to the above advantages (points a and b).  Different values of
autovacuum_vacuum_threshold can be used for the test.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-14 Thread Amit Kapila
On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 12 November 2013 08:47 Amit Kapila wrote:
 On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 08 November 2013 18:35 Amit Kapila wrote:
  On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
  haribabu.ko...@huawei.com wrote:
   On 07 November 2013 09:42 Amit Kapila wrote:
   1. Taking a copy of n_dead_tuples before VACUUM starts and then
  subtract it once it is done.
  This approach doesn't include the tuples which are remains
   during
  the vacuum operation.

 Patch is modified as take a copy of n_dead_tuples during vacuum start and use
 the same while calculating the new dead tuples at end of vacuum.

  By the way, do you have test case or can you try to write a test
 case
  which can show this problem and then after fix, you can verify if
 the
  problem is resolved.
 
  The simulated index bloat problem can be generated using the attached
 script and sql.
  With the fix of setting the dead tuples properly,

Which fix here you are referring to, is it the one which you have
 proposed with your initial mail?

  the bloat is reduced and by changing the vacuum cost Parameters the
  bloat is avoided.

 With the simulated bloat test run for 1 hour the bloat occurred as below,

 Unpatched - 1532MB
 Patched   - 1474MB

In your test run, have you checked what happens if, after the heavy update
(or once bloat occurs), you keep the system idle (or with only read load
on the system) for some time?  What is the result?

You haven't answered one of my questions in the previous mail
("With the fix of setting the dead tuples properly, the bloat is reduced
and by changing the vacuum cost parameters the bloat is avoided."
Which fix are you referring to here?)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-14 Thread Haribabu kommi
On 15 November 2013 10:00 Amit Kapila wrote:
 On Wed, Nov 13, 2013 at 12:02 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 12 November 2013 08:47 Amit Kapila wrote:
  On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi
  haribabu.ko...@huawei.com wrote:
   On 08 November 2013 18:35 Amit Kapila wrote:
   On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
   haribabu.ko...@huawei.com wrote:
On 07 November 2013 09:42 Amit Kapila wrote:
1. Taking a copy of n_dead_tuples before VACUUM starts and then
   subtract it once it is done.
   This approach doesn't include the tuples which are remains
during
   the vacuum operation.
 
  Patch is modified as take a copy of n_dead_tuples during vacuum start
  and use the same while calculating the new dead tuples at end of
 vacuum.
 
   By the way, do you have test case or can you try to write a test
  case
   which can show this problem and then after fix, you can verify if
  the
   problem is resolved.
  
   The simulated index bloat problem can be generated using the
   attached
  script and sql.
   With the fix of setting the dead tuples properly,
 
 Which fix here you are referring to, is it the one which you have
  proposed with your initial mail?
 
   the bloat is reduced and by changing the vacuum cost Parameters
 the
   bloat is avoided.
 
  With the simulated bloat test run for 1 hour the bloat occurred as
  below,
 
  Unpatched - 1532MB
  Patched   - 1474MB
 
 In your test run, have you checked what happen if after heavy update
 (or once bloat occurs), if you keep the system idle (or just have read
 load on system) for some time, what is the result?

In the simulated test run shared in the previous mail, after a heavy
update the system is idle for 15 mins.

With the master code, vacuum is not triggered during this idle time, as
the table does not satisfy the vacuum threshold, because the threshold
does not consider the dead tuples that occurred during the vacuum
operation.

With the fix, one extra vacuum can get triggered compared to the master
code after two or three heavy updates, because of the accumulation of
dead tuples.

 You haven't answered one of my questions in previous mail ( With the
 fix of setting the dead tuples properly, the bloat is reduced and by
 changing the vacuum cost Parameters the bloat is avoided.
 Which fix here you are referring to?)

The bloat reduction is the same with the initial and the latest patch.
The vacuum cost parameters change (which doesn't contain any fix) is what
avoided the bloat.

Regards,
Hari babu.





Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-12 Thread Haribabu kommi
On 12 November 2013 08:47 Amit Kapila wrote:
 On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 08 November 2013 18:35 Amit Kapila wrote:
  On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
  haribabu.ko...@huawei.com wrote:
   On 07 November 2013 09:42 Amit Kapila wrote:
   1. Taking a copy of n_dead_tuples before VACUUM starts and then
  subtract it once it is done.
  This approach doesn't include the tuples which are remains
   during
  the vacuum operation.

The patch is modified to take a copy of n_dead_tuples at vacuum start and
use it while calculating the new dead tuples at the end of vacuum.

  By the way, do you have test case or can you try to write a test
 case
  which can show this problem and then after fix, you can verify if
 the
  problem is resolved.
 
  The simulated index bloat problem can be generated using the attached
 script and sql.
  With the fix of setting the dead tuples properly,
 
Which fix here you are referring to, is it the one which you have
 proposed with your initial mail?
 
  the bloat is reduced and by changing the vacuum cost Parameters the
  bloat is avoided.

With the simulated bloat test run for 1 hour, the bloat was as below:

Unpatched - 1532MB
Patched   - 1474MB

With this patched approach the bloat is reduced.

Regards,
Hari babu.




Attachment: vacuum_fix_v4.patch



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-11 Thread Haribabu kommi
On 08 November 2013 18:35 Amit Kapila wrote:
 On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 07 November 2013 09:42 Amit Kapila wrote:
  I am not sure whether the same calculation as done for
 new_rel_tuples
  works for new_dead_tuples, you can once check it.
 
  I didn't find any way to calculate new_dead_tuples like
 new_rel_tuples.
  I will check it.
 
  I am thinking that if we have to do estimation anyway, then wouldn't
  it be better to do the way Tom had initially suggested (Maybe we
  could have VACUUM copy the n_dead_tuples value as it exists when
  VACUUM starts, and then send that as the value to subtract when it's
  done?)
 
  I think the reason you gave that due to tuple visibility check the
  number of dead tuples calculated by above logic is not accurate is
  right but still it will make the value of dead tuples more
  appropriate than it's current value.
 
  You can check if there is a way to do estimation of dead tuples
  similar to new tuples, and it will be as solid as current logic of
  vac_estimate_reltuples(), then it's okay, otherwise use the other
  solution (using the value of n_dead_tuples at start of Vacuum) to
  solve the problem.
 
  The two approaches calculations are approximation values only.
 
  1. Taking a copy of n_dead_tuples before VACUUM starts and then
 subtract it once it is done.
 This approach doesn't include the tuples which are remains during
 the vacuum operation.
 
   Wouldn't next or future vacuum's will make the estimate more
 appropraite?

Possible only when the nkeep counter value (tuples not cleaned) is very small.
 
  2. nkeep counter contains the tuples which are still visible to other
 transactions.
 This approach doesn't include tuples which are deleted on pages
 where vacuum operation is already finished.
 
  In my opinion the second approach gives the value nearer to the
 actual
  value, because it includes some of the new dead tuples also. Please
 correct me if anything wrong in my analysis.
I think main problem in nkeep logic is to come up with an estimation
 algorithm similar to live tuples.
 
 By the way, do you have test case or can you try to write a test case
 which can show this problem and then after fix, you can verify if the
 problem is resolved.

The simulated index bloat problem can be generated using the attached
script and SQL.
With the fix of setting the dead tuples properly, the bloat is reduced,
and by changing the vacuum cost parameters the bloat is avoided.

The advantage observed with the fix is that autovacuum is triggered more
often on the bloated table, as it satisfies the vacuum criteria because of
the proper dead tuple count, compared to the original code.

Regards,
Hari babu.



Attachments: Table_and_functions.sql, script.sh



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-11 Thread Amit Kapila
On Mon, Nov 11, 2013 at 3:14 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 08 November 2013 18:35 Amit Kapila wrote:
 On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 07 November 2013 09:42 Amit Kapila wrote:
  I am not sure whether the same calculation as done for
 new_rel_tuples
  works for new_dead_tuples, you can once check it.
 
  I didn't find any way to calculate new_dead_tuples like
 new_rel_tuples.
  I will check it.
 
  The two approaches calculations are approximation values only.
 
  1. Taking a copy of n_dead_tuples before VACUUM starts and then
 subtract it once it is done.
 This approach doesn't include the tuples which are remains during
 the vacuum operation.

   Wouldn't next or future vacuum's will make the estimate more
 appropraite?

 Possible only when nkeep counter value (tuples not cleaned) is very less 
 value.

   Do you really expect too many dead tuples during Vacuum?

  2. nkeep counter contains the tuples which are still visible to other
 transactions.
 This approach doesn't include tuples which are deleted on pages
 where vacuum operation is already finished.
 
  In my opinion the second approach gives the value nearer to the
 actual
  value, because it includes some of the new dead tuples also. Please
 correct me if anything wrong in my analysis.
I think main problem in nkeep logic is to come up with an estimation
 algorithm similar to live tuples.

 By the way, do you have test case or can you try to write a test case
 which can show this problem and then after fix, you can verify if the
 problem is resolved.

 The simulated index bloat problem can be generated using the attached script 
 and sql.
 With the fix of setting the dead tuples properly,

   Which fix are you referring to here?  Is it the one you proposed in
   your initial mail?

 the bloat is reduced and by changing the vacuum cost
 Parameters the bloat is avoided.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-08 Thread Amit Kapila
On Fri, Nov 8, 2013 at 10:56 AM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 07 November 2013 09:42 Amit Kapila wrote:
 I am not sure whether the same calculation as done for new_rel_tuples
 works for new_dead_tuples, you can once check it.

 I didn't find any way to calculate new_dead_tuples like new_rel_tuples.
 I will check it.

 I am thinking that if we have to do estimation anyway, then wouldn't it
 be better to do the way Tom had initially suggested (Maybe we could
 have VACUUM copy the n_dead_tuples value as it exists when VACUUM
 starts, and then send that as the value to subtract when it's done?)

 I think the reason you gave that due to tuple visibility check the
 number of dead tuples calculated by above logic is not accurate is
 right but still it will make the value of dead tuples more appropriate
 than it's current value.

 You can check if there is a way to do estimation of dead tuples similar
 to new tuples, and it will be as solid as current logic of
 vac_estimate_reltuples(), then it's okay, otherwise use the other
 solution (using the value of n_dead_tuples at start of Vacuum) to solve
 the problem.

 The two approaches calculations are approximation values only.

 1. Taking a copy of n_dead_tuples before VACUUM starts and then subtract it 
 once it is done.
This approach doesn't include the tuples which are remains during the 
 vacuum operation.

  Wouldn't the next or future vacuums make the estimate more appropriate?

 2. nkeep counter contains the tuples which are still visible to other 
 transactions.
This approach doesn't include tuples which are deleted on pages where 
 vacuum operation is already finished.

 In my opinion the second approach gives the value nearer to the actual value,
 because it includes some of the new dead tuples also. Please correct me if 
 anything wrong in my analysis.
   I think the main problem with the nkeep logic is to come up with an
estimation algorithm similar to the one for live tuples.

By the way, do you have a test case, or can you try to write a test case
which can show this problem?  Then, after the fix, you can verify whether
the problem is resolved.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-07 Thread Haribabu kommi
On 07 November 2013 09:42 Amit Kapila wrote:
 On Tue, Oct 22, 2013 at 2:09 PM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
  On 22 October 2013 10:15 Amit Kapila wrote:
 On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi
 haribabu.ko...@huawei.com wrote:
 
 Actually what I had in mind was to use nkeep to estimate
 n_dead_tuples
 similar to how num_tuples is used to estimate n_live_tuples. I think
 it will match what Tom had pointed in his response (What
 would make more sense to me is for VACUUM to estimate the number
 of remaining dead tuples somehow and send that in its message.
 However, since the whole point here is that we aren't accounting
 for transactions that commit while VACUUM runs, it's not very
 clear how to do that.)
 
  I changed the patch as passing the nkeep counter data as the new
 dead tuples in the relation to stats like the new_rel_tuples.
  The nkeep counter is an approximation of dead tuples data of a
 relation.
  Instead of resetting dead tuples stats as zero, used this value to
 set n_dead_tuples same as n_live_tuples.
 
 Directly using nkeep might not work as it is not guaranteed that Vacuum
 will scan all the pages, we need to estimate the value similar to
 new_rel_tuples, something like as done in below function:
 
 /* now we can compute the new value for pg_class.reltuples */
 vacrelstats-new_rel_tuples = vac_estimate_reltuples(onerel, false,
 
 nblocks,
 
 vacrelstats-scanned_pages,
 
 num_tuples);
 
 I am not sure whether the same calculation as done for new_rel_tuples
 works for new_dead_tuples, you can once check it.

I didn't find any way to calculate new_dead_tuples like new_rel_tuples.
I will check it.
 
 I am thinking that if we have to do estimation anyway, then wouldn't it
 be better to do the way Tom had initially suggested (Maybe we could
 have VACUUM copy the n_dead_tuples value as it exists when VACUUM
 starts, and then send that as the value to subtract when it's done?)
 
 I think the reason you gave that due to tuple visibility check the
 number of dead tuples calculated by above logic is not accurate is
 right but still it will make the value of dead tuples more appropriate
 than it's current value.
 
 You can check if there is a way to do estimation of dead tuples similar
 to new tuples, and it will be as solid as current logic of
 vac_estimate_reltuples(), then it's okay, otherwise use the other
 solution (using the value of n_dead_tuples at start of Vacuum) to solve
 the problem.

Both approaches' calculations are only approximations.

1. Take a copy of n_dead_tuples before VACUUM starts and then subtract it
   once VACUUM is done.
   This approach doesn't include the tuples which remain (cannot be
   removed) during the vacuum operation.

2. The nkeep counter contains the tuples which are still visible to other
   transactions.
   This approach doesn't include tuples which are deleted on pages where
   the vacuum operation has already finished.

In my opinion the second approach gives a value nearer to the actual
value, because it includes some of the new dead tuples as well.  Please
correct me if anything is wrong in my analysis.

Regards,
Hari babu.




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-11-06 Thread Amit Kapila
On Tue, Oct 22, 2013 at 2:09 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 22 October 2013 10:15 Amit Kapila wrote:
On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi haribabu.ko...@huawei.com 
wrote:

Actually what I had in mind was to use nkeep to estimate n_dead_tuples 
similar to how num_tuples is used to estimate n_live_tuples. I think it will 
match what Tom had pointed in his response (What
would make more sense to me is for VACUUM to estimate the number
of remaining dead tuples somehow and send that in its message.
However, since the whole point here is that we aren't accounting for
transactions that commit while VACUUM runs, it's not very clear how
to do that.)

 I changed the patch as passing the nkeep counter data as the new dead 
 tuples in the relation to stats like the new_rel_tuples.
 The nkeep counter is an approximation of dead tuples data of a relation.
 Instead of resetting dead tuples stats as zero, used this value to set 
 n_dead_tuples same as n_live_tuples.

Directly using nkeep might not work, as it is not guaranteed that Vacuum
will scan all the pages; we need to estimate the value the same way as
new_rel_tuples, something like what is done in the call below:

/* now we can compute the new value for pg_class.reltuples */
vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
                                                     nblocks,
                                                     vacrelstats->scanned_pages,
                                                     num_tuples);

I am not sure whether the same calculation as done for new_rel_tuples
works for new_dead_tuples; you can check it once.
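A hypothetical sketch of what "the same calculation" would look like for
dead tuples is below; as noted elsewhere in this thread, it is doubtful
this is right, since VACUUM visits every page that can hold dead tuples
and nkeep therefore should not need any extrapolation:

/* hypothetical: extrapolate nkeep the way num_tuples is extrapolated */
vacrelstats->new_dead_tuples = vac_estimate_reltuples(onerel, false,
                                                      nblocks,
                                                      vacrelstats->scanned_pages,
                                                      nkeep);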

I am thinking that if we have to do estimation anyway, then wouldn't it
be better to do it the way Tom had initially suggested ("Maybe we could
have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts,
and then send that as the value to subtract when it's done?").

I think the reason you gave, that due to tuple visibility checks the
number of dead tuples calculated by the above logic is not accurate, is
right, but it will still make the value of dead tuples more appropriate
than its current value.

You can check if there is a way to estimate dead tuples similarly to new
tuples that is as solid as the current logic of vac_estimate_reltuples();
if so, that's okay.  Otherwise, use the other solution (using the value of
n_dead_tuples at the start of Vacuum) to solve the problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-22 Thread Haribabu kommi
On 22 October 2013 10:15 Amit Kapila wrote:
On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi haribabu.ko...@huawei.com 
wrote:

 Yes, that's correct. The nkeep counter holds the dead tuples which are recently
 dead and were not vacuumed. The dead tuples left over after removing the
 vacuumed ones should be the same as the nkeep counter.
 So if we subtract nkeep from num_tuples, that gives us the proper live
 tuples. How about the following statement at the end of the scan over all blocks?

 num_tuples -= nkeep;

Actually what I had in mind was to use nkeep to estimate n_dead_tuples similar 
to how num_tuples is used to estimate n_live_tuples. I think it will match 
what Tom had pointed out in his response (What
would make more sense to me is for VACUUM to estimate the number
of remaining dead tuples somehow and send that in its message.  
However, since the whole point here is that we aren't accounting for 
transactions that commit while VACUUM runs, it's not very clear how 
to do that.)

I changed the patch to pass the nkeep counter to the stats as the relation's
new dead tuples, in the same way as new_rel_tuples.
The nkeep counter is an approximation of a relation's dead tuples.
Instead of resetting the dead-tuple stats to zero, this value is used to set
n_dead_tuples, the same way n_live_tuples is set.

The patch is attached to the mail. Please let me know if any changes are required.
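
Roughly, the intended effect on the stats at the end of vacuum can be
pictured as below (a simplified standalone model with invented names and
numbers, not the patch code itself):

#include <stdio.h>

/* Simplified picture of the per-table counters kept by the stats collector. */
struct table_stats
{
    long n_live_tuples;
    long n_dead_tuples;
};

int
main(void)
{
    struct table_stats s;

    /* values VACUUM would report; invented for illustration */
    long new_rel_tuples = 1000000;   /* estimated surviving tuples */
    long new_dead_tuples = 30000;    /* estimate derived from nkeep */

    /* old behaviour: dead tuples accumulated so far are simply forgotten */
    s.n_live_tuples = new_rel_tuples;
    s.n_dead_tuples = 0;
    printf("before the change: live=%ld dead=%ld\n", s.n_live_tuples, s.n_dead_tuples);

    /* with the change: the nkeep-based estimate is kept instead of zero */
    s.n_live_tuples = new_rel_tuples;
    s.n_dead_tuples = new_dead_tuples;
    printf("after the change : live=%ld dead=%ld\n", s.n_live_tuples, s.n_dead_tuples);

    return 0;
}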

Regards,
Hari Babu.



vacuum_fix_v3.patch
Description: vacuum_fix_v3.patch



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-21 Thread Amit Kapila
On Mon, Oct 21, 2013 at 10:54 AM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 20 October 2013 12:06 Amit Kapila wrote:
On Tue, Oct 15, 2013 at 3:37 PM, Haribabu kommi haribabu.ko...@huawei.com 
wrote:
 On 12 October 2013 11:30 Tom Lane wrote:
Haribabu kommi haribabu.ko...@huawei.com writes:

Another way to look at it is that we want to keep any increments to
n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could 
have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts, 
and then send that as the value to subtract when it's done?

 Taking a copy of n_dead_tuples and passing it at the end of vacuum, to be
 subtracted from the table stats, may not be correct, as vacuum may not have
 cleaned all the dead tuples because of tuple visibility to other transactions.
 How about resetting n_dead_tuples to zero if it goes negative because of such
 errors?

Wouldn't the way you are planning to change n_dead_tuples create an inconsistency
between n_live_tuples and n_dead_tuples, because it would have counted non-deleted
tuples in n_live_tuples, as per the code below:

if (tupgone)
{
    ..
    tups_vacuumed += 1;
    has_dead_tuples = true;
}
else
{
    num_tuples += 1;
    hastup = true;
    ..
}

So now if we just subtract tuples_deleted from n_dead_tuples, it will count
the tuples deleted during vacuum both as live tuples and as dead tuples.
There is one statistic for dead row versions that cannot be removed
(nkeep); if we could use that to estimate the total remaining dead tuples, then
the solution would be in line with Tom's suggestion (What
would make more sense to me is for VACUUM to estimate the number of remaining
dead tuples somehow and send that in its message.).

 Yes, that's correct. The nkeep counter holds the dead tuples which are recently
 dead and were not vacuumed. The dead tuples left over after removing the
 vacuumed ones should be the same as the nkeep counter.
 So if we subtract nkeep from num_tuples, that gives us the proper live
 tuples. How about the following statement at the end of the scan over all blocks?

 num_tuples -= nkeep;

Actually what I had in mind was to use nkeep to estimate n_dead_tuples
similar to how num_tuples is used to estimate n_live_tuples. I think
it will match what Tom had pointed out in his response (What would
make more sense to me is for VACUUM to estimate the number
of remaining dead tuples somehow and send that in its message.  However, 
since the whole point here is that we aren't accounting for transactions 
that commit while VACUUM runs, it's not very clear how to do that.)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-20 Thread Amit Kapila
On Tue, Oct 15, 2013 at 3:37 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 12 October 2013 11:30 Tom Lane wrote:
Haribabu kommi haribabu.ko...@huawei.com writes:
 To handle the above case, instead of directly resetting the dead tuples to
 zero, how about subtracting the exact number of dead tuples removed from the
 table stats? With this approach vacuum gets triggered more frequently and
 thus reduces the bloat.

This does not seem like a very good idea as-is, because it will mean that 
n_dead_tuples can diverge arbitrarily far from reality over time, as a result 
of accumulation of errors.  It also doesn't seem
like a very good idea that VACUUM sets n_live_tuples while only adjusting 
n_dead_tuples incrementally; ideally those counters should move in the same 
fashion.
In short, I think this patch will create at least as many problems as it 
fixes.

What would make more sense to me is for VACUUM to estimate the number of 
remaining dead tuples somehow and send that in its message.  However, since 
the whole point here is that we aren't accounting for
transactions that commit while VACUUM runs, it's not very clear how to do 
that.

Another way to look at it is that we want to keep any increments to 
n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could 
have VACUUM copy the n_dead_tuples value as it exists when
VACUUM starts, and then send that as the value to subtract when it's done?

 Taking a copy of n_dead_tuples and passing it at the end of vacuum, to be
 subtracted from the table stats, may not be correct, as vacuum may not have
 cleaned all the dead tuples because of tuple visibility to other transactions.
 How about resetting n_dead_tuples to zero if it goes negative because of such
 errors?

Wouldn't the way you are planning to change n_dead_tuples create an
inconsistency between n_live_tuples and n_dead_tuples, because it would
have counted non-deleted tuples in n_live_tuples, as per the code below:

if (tupgone)
{
    ..
    tups_vacuumed += 1;
    has_dead_tuples = true;
}
else
{
    num_tuples += 1;
    hastup = true;
    ..
}

So now if we just subtract tuples_deleted from n_dead_tuples, it will
count the tuples deleted during vacuum both as live tuples and as dead
tuples.
There is one statistic for dead row versions that cannot be removed
(nkeep); if we could use that to estimate the total remaining dead tuples,
then the solution would be in line with Tom's suggestion (What would make
more sense to me is for VACUUM to estimate the number of remaining
dead tuples somehow and send that in its message.).
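
To put some numbers on that accounting (all invented, and this is only a
standalone illustration, not server code): every examined tuple either goes
into tups_vacuumed (removed) or into num_tuples (kept), and the recently dead
ones counted in nkeep are part of num_tuples, so subtracting only the removed
tuples leaves them counted on both sides:

#include <stdio.h>

int
main(void)
{
    /* invented per-VACUUM counters, in the spirit of lazy_scan_heap */
    long tups_vacuumed = 400000;  /* dead tuples actually removed (tupgone branch) */
    long nkeep = 30000;           /* recently dead, kept, still counted in num_tuples */
    long truly_live = 1000000;

    long num_tuples = truly_live + nkeep;   /* everything that was not removed */

    long n_dead_before_report = 450000;     /* what pgstats had accumulated */

    /* subtracting only the removed tuples leaves the kept dead ones counted twice */
    long dead_if_subtract_removed = n_dead_before_report - tups_vacuumed;

    printf("reported live          : %ld (includes the %ld kept dead tuples)\n",
           num_tuples, nkeep);
    printf("dead after subtraction : %ld (the same %ld are still in here)\n",
           dead_if_subtract_removed, nkeep);
    return 0;
}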


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-20 Thread Haribabu kommi
On 20 October 2013 12:06 Amit Kapila wrote:
On Tue, Oct 15, 2013 at 3:37 PM, Haribabu kommi haribabu.ko...@huawei.com 
wrote:
 On 12 October 2013 11:30 Tom Lane wrote:
Haribabu kommi haribabu.ko...@huawei.com writes:
 To handle the above case, instead of directly resetting the dead tuples to
 zero, how about subtracting the exact number of dead tuples removed from the
 table stats? With this approach vacuum gets triggered more frequently and
 thus reduces the bloat.

This does not seem like a very good idea as-is, because it will mean 
that n_dead_tuples can diverge arbitrarily far from reality over time, as a 
result of accumulation of errors.  It also doesn't seem like a very good 
idea that VACUUM sets n_live_tuples while only adjusting n_dead_tuples 
incrementally; ideally those counters should move in the same fashion.
In short, I think this patch will create at least as many problems as it 
fixes.

What would make more sense to me is for VACUUM to estimate the number 
of remaining dead tuples somehow and send that in its message.  However, 
since the whole point here is that we aren't accounting for transactions 
that commit while VACUUM runs, it's not very clear how to do that.

Another way to look at it is that we want to keep any increments to 
n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could 
have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts, 
and then send that as the value to subtract when it's done?

 Taking a copy of n_dead_tuples and passing it at the end of vacuum, to be
 subtracted from the table stats, may not be correct, as vacuum may not have
 cleaned all the dead tuples because of tuple visibility to other transactions.
 How about resetting n_dead_tuples to zero if it goes negative because of such
 errors?

Wouldn't the way you are planning to change n_dead_tuples create an inconsistency
between n_live_tuples and n_dead_tuples, because it would have counted non-deleted
tuples in n_live_tuples, as per the code below:

if (tupgone)
{
    ..
    tups_vacuumed += 1;
    has_dead_tuples = true;
}
else
{
    num_tuples += 1;
    hastup = true;
    ..
}

So now if we just subtract tuples_deleted from n_dead_tuples, it will count
the tuples deleted during vacuum both as live tuples and as dead tuples.
There is one statistic for dead row versions that cannot be removed (nkeep);
if we could use that to estimate the total remaining dead tuples, then the
solution would be in line with Tom's suggestion (What
would make more sense to me is for VACUUM to estimate the number of remaining
dead tuples somehow and send that in its message.).

Yes, that's correct. The nkeep counter holds the dead tuples which are recently
dead and were not vacuumed. The dead tuples left over after removing the
vacuumed ones should be the same as the nkeep counter.
So if we subtract nkeep from num_tuples, that gives us the proper live
tuples. How about the following statement at the end of the scan over all blocks?

num_tuples -= nkeep;

Please let me know if any corrections are required.
A patch with the above implementation is attached to the mail.

Regards,
Hari babu.



vacuum_fix_v2.patch
Description: vacuum_fix_v2.patch



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-15 Thread Haribabu kommi
On 12 October 2013 11:30 Tom Lane wrote: 
Haribabu kommi haribabu.ko...@huawei.com writes:
 To handle the above case, instead of directly resetting the dead tuples to
 zero, how about subtracting the exact number of dead tuples removed from the
 table stats? With this approach vacuum gets triggered more frequently and
 thus reduces the bloat.

This does not seem like a very good idea as-is, because it will mean that 
n_dead_tuples can diverge arbitrarily far from reality over time, as a result 
of accumulation of errors.  It also doesn't seem 
like a very good idea that VACUUM sets n_live_tuples while only adjusting 
n_dead_tuples incrementally; ideally those counters should move in the same 
fashion.
In short, I think this patch will create at least as many problems as it fixes.

What would make more sense to me is for VACUUM to estimate the number of 
remaining dead tuples somehow and send that in its message.  However, since 
the whole point here is that we aren't accounting for 
transactions that commit while VACUUM runs, it's not very clear how to do that.

Another way to look at it is that we want to keep any increments to 
n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could have 
VACUUM copy the n_dead_tuples value as it exists when 
VACUUM starts, and then send that as the value to subtract when it's done?

Taking a copy of n_dead_tuples and passing it at the end of vacuum, to be
subtracted from the table stats, may not be correct, as vacuum may not have
cleaned all the dead tuples because of tuple visibility to other transactions.
How about resetting n_dead_tuples to zero if it goes negative because of such
errors?
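
The guard I have in mind is nothing more than the following (a sketch with
made-up values, not patch code):

#include <stdio.h>

int
main(void)
{
    /* invented numbers: the subtraction overshoots because of accumulated errors */
    long n_dead_tuples = 40000;
    long to_subtract = 55000;

    n_dead_tuples -= to_subtract;
    if (n_dead_tuples < 0)
        n_dead_tuples = 0;      /* reset to zero instead of going negative */

    printf("n_dead_tuples: %ld\n", n_dead_tuples);   /* prints 0 */
    return 0;
}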


Regards,
Hari babu.





Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-12 Thread Tom Lane
Haribabu kommi haribabu.ko...@huawei.com writes:
 To handle the above case, instead of directly resetting the dead tuples to
 zero, how about subtracting the exact number of dead tuples removed from the
 table stats? With this approach vacuum gets triggered more frequently and
 thus reduces the bloat.

This does not seem like a very good idea as-is, because it will mean that
n_dead_tuples can diverge arbitrarily far from reality over time, as a
result of accumulation of errors.  It also doesn't seem like a very good
idea that VACUUM sets n_live_tuples while only adjusting n_dead_tuples
incrementally; ideally those counters should move in the same fashion.
In short, I think this patch will create at least as many problems as
it fixes.

What would make more sense to me is for VACUUM to estimate the number
of remaining dead tuples somehow and send that in its message.  However,
since the whole point here is that we aren't accounting for transactions
that commit while VACUUM runs, it's not very clear how to do that.

Another way to look at it is that we want to keep any increments to
n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could
have VACUUM copy the n_dead_tuples value as it exists when VACUUM starts,
and then send that as the value to subtract when it's done?

regards, tom lane




[HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-11 Thread Haribabu kommi
Vacuum is not happening on a heavily modified big table even when the dead
tuples exceed the configured threshold.

This is because at the end of vacuum the number of dead tuples of the table is
reset to zero, so the dead tuples that accumulated during the vacuum operation
are lost. Thus, to trigger the next vacuum on the same table, the configured
threshold number of dead tuples has to accumulate all over again.
The next vacuum operation then takes more time because of the larger number of
dead tuples; this continues and leads to table and index bloat.
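
For reference, autovacuum decides to vacuum a table when the dead tuples
exceed a threshold derived from the configuration; a small example of that
documented condition (the table size and dead-tuple count below are made up):

#include <stdio.h>

int
main(void)
{
    /*
     * Default settings: autovacuum_vacuum_threshold = 50,
     * autovacuum_vacuum_scale_factor = 0.2.
     */
    double vac_base_thresh = 50;
    double vac_scale_factor = 0.2;
    double reltuples = 5000000;       /* example table size */

    double threshold = vac_base_thresh + vac_scale_factor * reltuples;
    long n_dead_tup = 900000;         /* example dead-tuple count in pgstats */

    /* vacuum is triggered only once n_dead_tup exceeds the threshold */
    printf("vacuum threshold = %.0f, trigger = %s\n",
           threshold, n_dead_tup > threshold ? "yes" : "no");
    return 0;
}

Resetting n_dead_tup to zero after each vacuum means this whole threshold has
to be reached again from scratch, even if many deletes happened while the
vacuum was running.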

To handle the above case, instead of directly resetting the dead tuples to zero,
how about subtracting the exact number of dead tuples removed from the table
stats? With this approach vacuum gets triggered more frequently and thus reduces
the bloat.
A patch for the same is attached to the mail.

Please let me know if there is any problem with this approach.
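
In rough pseudo-form the proposed change amounts to the following (a
simplified standalone sketch with invented values, not the actual patch):

#include <stdio.h>

int
main(void)
{
    long n_dead_tuples = 600000;      /* pgstats counter, invented value */
    long tuples_removed = 550000;     /* dead tuples this VACUUM actually removed */

    /* current behaviour: forget everything, including deletes during VACUUM */
    long reset_to_zero = 0;

    /* proposed behaviour: subtract only what was really removed */
    long subtract_removed = n_dead_tuples - tuples_removed;

    printf("reset to zero    : %ld\n", reset_to_zero);
    printf("subtract removed : %ld\n", subtract_removed);
    return 0;
}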

Regards,
Hari babu.


vacuum_fix_v1.patch
Description: vacuum_fix_v1.patch



Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-11 Thread Kevin Grittner
Haribabu kommi haribabu.ko...@huawei.com wrote:

 To handle the above case, instead of directly resetting the dead tuples to
 zero, how about subtracting the exact number of dead tuples removed from the
 table stats? With this approach vacuum gets triggered more frequently and
 thus reduces the bloat.
 A patch for the same is attached to the mail.

Please add this to the open CommitFest to ensure that it gets
reviewed:

https://commitfest.postgresql.org/action/commitfest_view/open

For more information about the process, see:

http://wiki.postgresql.org/wiki/CommitFest

You may also want to review:

http://wiki.postgresql.org/wiki/Developer_FAQ

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

