Please add the attached patch and this discussion to the open commit fest. The original bugs thread is here: 20180111111254.1408.8...@wrigleys.postgresql.org.
Bug reference:      15005
Logged by:          David Gould
Email address:      da...@sonic.net
PostgreSQL version: 10.1 and earlier
Operating system:   Linux

Description:

ANALYZE can make pg_class.reltuples wildly inaccurate compared to the actual
row counts for tables that are much larger than the ANALYZE sample size
(set by default_statistics_target). Example from one of a client's production
instances:

    # analyze verbose pg_attribute;
    INFO:  analyzing "pg_catalog.pg_attribute"
    INFO:  "pg_attribute": scanned 30000 of 24519424 pages, containing 6475 live rows
           and 83 dead rows; 6475 rows in sample, 800983035 estimated total rows.

This is a large complex database - pg_attribute actually has about five million
rows and needs about one hundred thousand pages. However it has become extremely
bloated and is taking 25 million pages (192GB!), about 250 times too much. This
happened despite aggressive autovacuum settings and a periodic bloat monitoring
script. Since pg_class.reltuples was 800 million, the bloat monitoring script did
not detect that this table was bloated, and autovacuum did not think it needed
vacuuming.

When reltuples is very large compared to the actual row count it causes a number
of problems:

- Bad input to the query planner.
- Prevents autovacuum from processing large bloated tables, because
  autovacuum_scale_factor * reltuples is so large that the threshold is rarely
  reached.
- Deceives bloat checking tools that rely on the relationship of relpages to
  reltuples * average_row_size.

I've tracked down how this happens and created a reproduction script and a patch.
Attached:

- analyze_reltuples_bug-v1.patch   Patch against master
- README.txt                       Instructions for testing
- analyze_reltuples_bug.sql        Reproduction script
- analyze_counts.awk               Helper for viewing results of test
- test_standard.txt                Test output for unpatched postgresql 10.1
- test_patched.txt                 Test output with patch

The patch applies cleanly, with some offsets, to 9.4.15, 9.5.10, 9.6.6 and 10.1.

Note that this is not the same as the reltuples calculation bug discussed in the
thread at 16db4468-edfa-830a-f921-39a50498e...@2ndquadrant.com. That one is mainly
concerned with vacuum, this one with analyze. The two bugs do amplify each other,
though.

Analysis:
---------

Analyze and vacuum calculate the new value for pg_class.reltuples in
vacuum.c:vac_estimate_reltuples():

    old_density = old_rel_tuples / old_rel_pages;
    new_density = scanned_tuples / scanned_pages;
    multiplier = (double) scanned_pages / (double) total_pages;
    updated_density = old_density + (new_density - old_density) * multiplier;
    return floor(updated_density * total_pages + 0.5);

The comments talk about the difference between VACUUM and ANALYZE and explain that
VACUUM probably scanned only changed pages, so the density of the scanned pages is
not representative of the rest of the unchanged table. Hence the new overall
density of the table should only be adjusted in proportion to the fraction of
pages scanned. Which makes sense. However, despite the comment noting that ANALYZE
and VACUUM are different, the code actually does the same calculation for both.

The problem is that this dilutes the impact of ANALYZE on reltuples for large
tables (see the worked example below):

- For a table of 3000000 pages an analyze can only change the reltuples value
  by 1%.
- When combined with changes in relpages due to bloat, the newly computed
  reltuples can end up far from reality.
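To make the dilution concrete, here is a rough back-of-the-envelope check that
plugs the pg_attribute numbers above into the current formula. The old
reltuples/relpages values (800 million tuples over the same 24.5 million pages)
are assumptions for illustration, not measured values:

    -- Worked example of the current vac_estimate_reltuples() calculation,
    -- using the pg_attribute ANALYZE output quoted above.
    select (old_density + (new_density - old_density) * multiplier) * total_pages
             as updated_reltuples
      from (select 800000000.0 / 24519424 as old_density,  -- ~32.6 rows/page (assumed old estimate)
                   6475.0 / 30000         as new_density,   -- ~0.22 rows/page actually observed
                   30000.0 / 24519424     as multiplier,    -- ~0.12% of pages scanned
                   24519424.0             as total_pages) p;
    -- Result: still roughly 799 million.

Each ANALYZE can move the estimate only about 0.12% of the way toward the true
count of roughly five million rows, so reltuples stays near 800 million
indefinitely.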
Reproducing the reltuples analyze estimate bug.
-----------------------------------------------

The script "analyze_reltuples_bug.sql" creates a table that is large compared to
the analyze sample size and then repeatedly updates about 10% of it, followed by
an analyze each iteration. The bug is that the calculation analyze uses to update
pg_class.reltuples will tend to increase each time even though the actual rowcount
does not change.

To run, given a postgresql 10.x server with >= 1GB of shared buffers:

    createdb test
    psql --no-psqlrc -f analyze_reltuples_bug.sql test > test_standard.out 2>&1
    awk -f analyze_counts.awk test_standard.out

To verify the fix, restart postgres with a patched binary and repeat the above.

Here are the results with an unpatched server. After 10 iterations of
"update 10% of rows; analyze", reltuples has almost doubled:

                    / estimated rows /    / pages /       /sampled rows/
  relname            current  proposed    total  scanned    live    dead
  reltuples_test    10000001  10000055   153847     3000   195000       0
  reltuples_test    10981367   9951346   169231     3000   176410   18590
  reltuples_test    11948112  10039979   184615     3000   163150   31850
  reltuples_test    12900718  10070666   200000     3000   151060   43940
  reltuples_test    13835185   9739305   215384     3000   135655   59345
  reltuples_test    14758916   9864947   230768     3000   128245   66755
  reltuples_test    15674572  10138631   246153     3000   123565   71435
  reltuples_test    16576847   9910944   261537     3000   113685   81315
  reltuples_test    17470388  10019961   276922     3000   108550   86450
  reltuples_test    18356707  10234607   292306     3000   105040   89960
  reltuples_test    19228409   9639927   307690     3000    93990  101010

-dg

--
David Gould                da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Attachment: analyze_reltuples_bug.sql (application/sql)
Attachment: analyze_reltuples_bug-v1.patch

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index cbd6e9b161..ebf03de45f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -766,10 +766,12 @@ vacuum_set_xid_limits(Relation rel,
  *
  * If we scanned the whole relation then we should just use the count of
  * live tuples seen; but if we did not, we should not trust the count
- * unreservedly, especially not in VACUUM, which may have scanned a quite
- * nonrandom subset of the table.  When we have only partial information,
- * we take the old value of pg_class.reltuples as a measurement of the
- * tuple density in the unscanned pages.
+ * unreservedly, we have only partial information.  VACUUM in particular
+ * may have scanned a quite nonrandom subset of the table, so we take
+ * the old value of pg_class.reltuples as a measurement of the tuple
+ * density in the unscanned pages.  However, ANALYZE promises that we
+ * scanned a representative random sample of the table so we should use
+ * the new density directly.
  *
  * This routine is shared by VACUUM and ANALYZE.
  */
@@ -791,45 +793,39 @@ vac_estimate_reltuples(Relation relation, bool is_analyze,
 		return scanned_tuples;
 
 	/*
-	 * If scanned_pages is zero but total_pages isn't, keep the existing value
-	 * of reltuples.  (Note: callers should avoid updating the pg_class
-	 * statistics in this situation, since no new information has been
-	 * provided.)
+	 * If scanned_pages is zero, keep the existing value of reltuples.
+	 * (Note: callers should avoid updating the pg_class statistics in
+	 * this situation, since no new information has been provided.)
 	 */
 	if (scanned_pages == 0)
 		return old_rel_tuples;
 
 	/*
+	 * For ANALYZE, the newly observed density in the pages scanned is
+	 * based on a representative sample of the whole table and can be
+	 * used as-is.
+	 */
+	new_density = scanned_tuples / scanned_pages;
+	if (is_analyze)
+		return floor(new_density * total_pages + 0.5);
+
+	/*
 	 * If old value of relpages is zero, old density is indeterminate; we
-	 * can't do much except scale up scanned_tuples to match total_pages.
+	 * can't do much except use the new_density to scale up scanned_tuples
+	 * to match total_pages.
 	 */
 	if (old_rel_pages == 0)
-		return floor((scanned_tuples / scanned_pages) * total_pages + 0.5);
+		return floor(new_density * total_pages + 0.5);
 
 	/*
-	 * Okay, we've covered the corner cases.  The normal calculation is to
-	 * convert the old measurement to a density (tuples per page), then update
-	 * the density using an exponential-moving-average approach, and finally
-	 * compute reltuples as updated_density * total_pages.
-	 *
-	 * For ANALYZE, the moving average multiplier is just the fraction of the
-	 * table's pages we scanned.  This is equivalent to assuming that the
-	 * tuple density in the unscanned pages didn't change.  Of course, it
-	 * probably did, if the new density measurement is different.  But over
-	 * repeated cycles, the value of reltuples will converge towards the
-	 * correct value, if repeated measurements show the same new density.
-	 *
-	 * For VACUUM, the situation is a bit different: we have looked at a
-	 * nonrandom sample of pages, but we know for certain that the pages we
-	 * didn't look at are precisely the ones that haven't changed lately.
-	 * Thus, there is a reasonable argument for doing exactly the same thing
-	 * as for the ANALYZE case, that is use the old density measurement as the
-	 * value for the unscanned pages.
-	 *
-	 * This logic could probably use further refinement.
+	 * For VACUUM, the situation is different: we have looked at a nonrandom
+	 * sample of pages, and we know that the pages we didn't look at are
+	 * the ones that haven't changed lately.  Thus, we use the old density
+	 * measurement for the unscanned pages and combine it with the observed
+	 * new density scaled by the ratio of scanned to unscanned pages.
 	 */
 	old_density = old_rel_tuples / old_rel_pages;
-	new_density = scanned_tuples / scanned_pages;
 	multiplier = (double) scanned_pages / (double) total_pages;
 	updated_density = old_density + (new_density - old_density) * multiplier;
 	return floor(updated_density * total_pages + 0.5);
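For anyone trying the patch, one quick sanity check (not part of the attached
tests; it just reuses the reproduction table) is to compare the post-ANALYZE
estimate against an exact count on the bloated table:

    -- With the patch, reltuples should track the actual row count even though
    -- relpages has grown from bloat.
    analyze verbose reltuples_test;
    select c.relpages,
           c.reltuples::bigint                    as estimated_rows,
           (select count(*) from reltuples_test)  as actual_rows
      from pg_class c
     where c.relname = 'reltuples_test';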
Attachment: test_standard.txt (test output, unpatched postgresql 10.1)

\set QUIET on
set default_statistics_target = :scale ;
set work_mem = '512MB';
set maintenance_work_mem = '1GB';

begin;
-- table is padded to get reasonable number of rows per page
drop table if exists reltuples_test;
create table reltuples_test as
    select id, id % 10 as ten, id % 100 as hun, id % 10000 as thou,
           id % 10000 as tenk, id % 100000 as lahk, id % 1000000 as meg,
           'some extra padding to adjust rowsize' as padding
      from generate_series(1::bigint, (:scale * 1000000)::bigint) g(id);
-- avoid extranous vacuums and analyzes and add primary key
alter table reltuples_test
    set (autovacuum_enabled = false),
    add constraint reltuples_test_pkey primary key (id);
commit;

-- vacuum verbose reltuples_test;
analyze verbose reltuples_test;
psql:reltuples_analyze_bug.sql:35: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:35: INFO:  "reltuples_test": scanned 3000 of 153847 pages, containing 195000 live rows and 0 dead rows; 3000 rows in sample, 10000001 estimated total rows

\echo "-- Starting Status --"
"-- Starting Status --"
select relpages, reltuples, (reltuples/relpages)::int as rows_per_page, relname
  from pg_class where relname ~ '^reltuples_test';
 relpages | reltuples | rows_per_page |       relname
----------+-----------+---------------+---------------------
   153847 |     1e+07 |            65 | reltuples_test
    27422 |     1e+07 |           365 | reltuples_test_pkey
(2 rows)

\echo
-- Update 1% of table at a time for :iterations.
-- Note: bug is that reltuples increases with each iteration.
select format('update reltuples_test set id = id + %s where id < %s',
              :scale * 1000000, n * :scale * 1000000 * :update_frac) as update_query,
       'analyze verbose reltuples_test' as estimate
  from generate_series(1, :iterations) g(n)
\gexec
update reltuples_test set id = id + 10000000 where id < 1000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 169231 pages, containing 176410 live rows and 18590 dead rows; 3000 rows in sample, 10981367 estimated total rows
update reltuples_test set id = id + 10000000 where id < 2000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 184615 pages, containing 163150 live rows and 31850 dead rows; 3000 rows in sample, 11948112 estimated total rows
update reltuples_test set id = id + 10000000 where id < 3000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 200000 pages, containing 151060 live rows and 43940 dead rows; 3000 rows in sample, 12900718 estimated total rows
update reltuples_test set id = id + 10000000 where id < 4000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 215384 pages, containing 135655 live rows and 59345 dead rows; 3000 rows in sample, 13835185 estimated total rows
update reltuples_test set id = id + 10000000 where id < 5000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 230768 pages, containing 128245 live rows and 66755 dead rows; 3000 rows in sample, 14758916 estimated total rows
update reltuples_test set id = id + 10000000 where id < 6000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 246153 pages, containing 123565 live rows and 71435 dead rows; 3000 rows in sample, 15674572 estimated total rows
update reltuples_test set id = id + 10000000 where id < 7000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 261537 pages, containing 113685 live rows and 81315 dead rows; 3000 rows in sample, 16576847 estimated total rows
update reltuples_test set id = id + 10000000 where id < 8000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 276922 pages, containing 108550 live rows and 86450 dead rows; 3000 rows in sample, 17470388 estimated total rows
update reltuples_test set id = id + 10000000 where id < 9000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 292306 pages, containing 105040 live rows and 89960 dead rows; 3000 rows in sample, 18356707 estimated total rows
update reltuples_test set id = id + 10000000 where id < 10000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 307690 pages, containing 93990 live rows and 101010 dead rows; 3000 rows in sample, 19228409 estimated total rows

-- final reltuples and rows per page
\echo "-- Post Test Status --"
"-- Post Test Status --"
select relpages, reltuples, (reltuples/relpages)::int as rows_per_page, relname
  from pg_class where relname ~ '^reltuples_test';
 relpages |  reltuples  | rows_per_page |       relname
----------+-------------+---------------+---------------------
   307690 | 1.92284e+07 |            62 | reltuples_test
    54839 | 1.92284e+07 |           351 | reltuples_test_pkey
(2 rows)
Attachment: test_patched.txt (test output, patched postgresql 10.1)

\set QUIET on
set default_statistics_target = :scale ;
set work_mem = '512MB';
set maintenance_work_mem = '1GB';

begin;
-- table is padded to get reasonable number of rows per page
drop table if exists reltuples_test;
create table reltuples_test as
    select id, id % 10 as ten, id % 100 as hun, id % 10000 as thou,
           id % 10000 as tenk, id % 100000 as lahk, id % 1000000 as meg,
           'some extra padding to adjust rowsize' as padding
      from generate_series(1::bigint, (:scale * 1000000)::bigint) g(id);
-- avoid extranous vacuums and analyzes and add primary key
alter table reltuples_test
    set (autovacuum_enabled = false),
    add constraint reltuples_test_pkey primary key (id);
commit;

-- vacuum verbose reltuples_test;
analyze verbose reltuples_test;
psql:reltuples_analyze_bug.sql:35: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:35: INFO:  "reltuples_test": scanned 3000 of 153847 pages, containing 195000 live rows and 0 dead rows; 3000 rows in sample, 10000055 estimated total rows

\echo "-- Starting Status --"
"-- Starting Status --"
select relpages, reltuples, (reltuples/relpages)::int as rows_per_page, relname
  from pg_class where relname ~ '^reltuples_test';
 relpages |  reltuples  | rows_per_page |       relname
----------+-------------+---------------+---------------------
   153847 | 1.00001e+07 |            65 | reltuples_test
    27422 | 1.00001e+07 |           365 | reltuples_test_pkey
(2 rows)

\echo
-- Update 1% of table at a time for :iterations.
-- Note: bug is that reltuples increases with each iteration.
select format('update reltuples_test set id = id + %s where id < %s',
              :scale * 1000000, n * :scale * 1000000 * :update_frac) as update_query,
       'analyze verbose reltuples_test' as estimate
  from generate_series(1, :iterations) g(n)
\gexec
update reltuples_test set id = id + 10000000 where id < 1000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 169231 pages, containing 176475 live rows and 18525 dead rows; 3000 rows in sample, 9955014 estimated total rows
update reltuples_test set id = id + 10000000 where id < 2000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 184615 pages, containing 162175 live rows and 32825 dead rows; 3000 rows in sample, 9979979 estimated total rows
update reltuples_test set id = id + 10000000 where id < 3000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 200000 pages, containing 149565 live rows and 45435 dead rows; 3000 rows in sample, 9971000 estimated total rows
update reltuples_test set id = id + 10000000 where id < 4000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 215384 pages, containing 138645 live rows and 56355 dead rows; 3000 rows in sample, 9953972 estimated total rows
update reltuples_test set id = id + 10000000 where id < 5000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 230768 pages, containing 128375 live rows and 66625 dead rows; 3000 rows in sample, 9874947 estimated total rows
update reltuples_test set id = id + 10000000 where id < 6000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 246153 pages, containing 121643 live rows and 73385 dead rows; 3000 rows in sample, 9980930 estimated total rows
update reltuples_test set id = id + 10000000 where id < 7000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 261537 pages, containing 113945 live rows and 81055 dead rows; 3000 rows in sample, 9933611 estimated total rows
update reltuples_test set id = id + 10000000 where id < 8000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 276922 pages, containing 110110 live rows and 84890 dead rows; 3000 rows in sample, 10163960 estimated total rows
update reltuples_test set id = id + 10000000 where id < 9000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 292306 pages, containing 104910 live rows and 90090 dead rows; 3000 rows in sample, 10221941 estimated total rows
update reltuples_test set id = id + 10000000 where id < 10000000.00
analyze verbose reltuples_test
psql:reltuples_analyze_bug.sql:50: INFO:  analyzing "public.reltuples_test"
psql:reltuples_analyze_bug.sql:50: INFO:  "reltuples_test": scanned 3000 of 307690 pages, containing 97890 live rows and 97110 dead rows; 3000 rows in sample, 10039925 estimated total rows

-- final reltuples and rows per page
\echo "-- Post Test Status --"
"-- Post Test Status --"
select relpages, reltuples, (reltuples/relpages)::int as rows_per_page, relname
  from pg_class where relname ~ '^reltuples_test';
 relpages |  reltuples  | rows_per_page |       relname
----------+-------------+---------------+---------------------
   307690 | 1.00399e+07 |            33 | reltuples_test
    54839 | 1.00399e+07 |           183 | reltuples_test_pkey
(2 rows)
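Finally, one way to see why the corrected reltuples matters for autovacuum (a
sketch, not part of the attached test output, and it assumes autovacuum is
re-enabled for the test table): autovacuum only vacuums a table once n_dead_tup
exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples,
so an inflated reltuples pushes that trigger point out of reach.

    -- Compare accumulated dead tuples with the autovacuum trigger point.
    -- With reltuples stuck at ~19 million (unpatched) the trigger point is
    -- roughly twice as high as with the corrected ~10 million estimate.
    select s.relname,
           s.n_dead_tup,
           current_setting('autovacuum_vacuum_threshold')::int
             + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
             as autovacuum_trigger_point
      from pg_stat_user_tables s
      join pg_class c on c.oid = s.relid
     where s.relname = 'reltuples_test';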