Re: Wrong example in the bloom documentation

Daniel Westermann (DWE) Mon, 05 Oct 2020 01:17:18 -0700

Hi,

any thoughts on this?


Regards
Daniel



________________________________
From: Daniel Westermann (DWE)
Sent: Sunday, September 27, 2020 17:58
To: Pg Docs <[email protected]>
Subject: Wrong example in the bloom documentation

Hi,

I've briefly discussed this with Bruce some time ago in [1].
Replaying the example referenced in the documentation does not give a Bitmap 
Heap Scan on tbloom but a parallel seq scan with the default configuration:

-- tested on head
postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 
123451;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..127220.00 rows=250 width=24) (actual 
time=2134.851..2221.836 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbloom  (cost=0.00..126195.00 rows=104 width=24) 
(actual time=1770.691..1770.692 rows=0 loops=3)
         Filter: ((i2 = 898732) AND (i5 = 123451))
         Rows Removed by Filter: 3333333
 Planning Time: 0.895 ms
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 65.512 ms, Inlining 0.000 ms, Optimization 46.328 ms, 
Emission 40.658 ms, Total 152.499 ms
 Execution Time: 2288.056 ms
(12 rows)


As bloom was introduced in 9.6 I quickly tried with 9.6.17 and indeed for this 
version the example is correct:
postgres=# select version();
                                                 version
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 
(Red Hat 8.3.1-4), 64-bit
(1 row)
postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 
123451;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=178436.06..179392.83 rows=250 width=24) 
(actual time=2279.363..2279.363 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2329
   Heap Blocks: exact=2288
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178436.00 rows=250 width=0) 
(actual time=994.406..994.406 rows=2329 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning time: 282.059 ms
 Execution time: 2286.138 ms
(8 rows)

The reason is that parallel execution is disabled by default in 9.6, and if 
that is turned on the plan changes there as well:

postgres=# set max_parallel_workers_per_gather = 2;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 
123451;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..127194.29 rows=1 width=24) (actual 
time=1148.047..1148.206 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbloom  (cost=0.00..126194.19 rows=1 width=24) 
(actual time=1039.501..1039.501 rows=0 loops=3)
         Filter: ((i2 = 898732) AND (i5 = 123451))
         Rows Removed by Filter: 3333333
 Planning time: 0.580 ms
 Execution time: 1148.247 ms
(8 rows)

Starting with PostgreSQL 10 the example in the documentation is therefore 
wrong. Attached a proposal to fix this. The new example starts with 100x 
reduced rows (as suggested by Bruce in [1] and adds a note that the behavior 
changes as soon as parallel execution is cheaper than the index access.

Thoughts?

Regards
Daniel

[1] 
https://www.postgresql.org/message-id/flat/20191105231854.GA26542%40momjian.us#7859b106ce614dd9530941196dad5bbc

Re: Wrong example in the bloom documentation

Reply via email to