Hi,

I've been looking at this issue today, and so far I don't think it's a bug in the foreign key estimation. It seems mostly that the 9.5 estimates were hopelessly bad, and the join estimation changes simply pushed it a tiny bit the wrong direction.

Although maybe there is a bug (or at least a change of behavior) in one case, but I'll get to that.

I've managed to extract a small part of Q20 that demonstrates the differences between versions quite nicely, I think. The part causing the trouble looks like this:

  explain select
      ps_suppkey
  from
      partsupp,
      (
          select
              l_partkey agg_partkey,
              l_suppkey agg_suppkey
          from
              lineitem
          where
              l_shipdate >= date '1997-01-01'
              and l_shipdate < date '1997-01-01' + interval '1' year
          group by
              l_partkey,
              l_suppkey
      ) agg_lineitem
  where
      agg_partkey = ps_partkey
      and agg_suppkey = ps_suppkey
      and ps_partkey in (
          select
              p_partkey
          from
              part
          where
              p_name like 'hot%'
      );

i.e. it aggregates the "lineitem" table, and then joins "partsupp" and "part" tables to it.

    "aggregated lineitem" <-> partsupp <-> part

I've collected estimates from four different variants of the query (see the attached exlain.sql):

1) SIMPLE
  - join directly to lineitem (without the aggregation)
  - remove the p_name LIKE pattern matching

2) SIMPLE+LIKE
  - like SIMPLE, but keep the LIKE condition

3) GROUPING
  - join to the aggregated lineitem table
  - remove the p_name LIKE pattern matching

4) GROUPING+LIKE
  - like GROUPING, but keep the LIKE condition

I've collected estimates on a 20GB data set, both from 9.5 (so without any of the FK estimation changes) and on master with different foreign keys between the tables.

no-keys  - no foreign keys between the three tables
lineitem - lineitem references partsupp
partsupp - partsupp references part
both     - both foreign keys

And the results look like this (actual row counts were collected on 9.5, but that should not matter - the results should be the same on all versions):

      branch     SIMPLE     SIMPLE+LIKE     GROUPING    GROUPING+LIKE
  --------------------------------------------------------------------
      actual  119994608         1311974     10897186           119238
         9.5       2863              35          160              160
     no-keys       2340              24          868              868
    lineitem  119994848         1229750          868              868
    partsupp       2340              24         1737               18
   both-keys  119994848         1212065         1737               18

This seems mostly sane, I guess, but let's look at various cases.

In the SIMPLE cases, the foreign key "lineitem->partsupp" makes a huge difference - the estimates are pretty exact, both with and without the LIKE condition. The "partsupp->part" key makes almost no difference, though - the minor differences (35/24 and 1229750/1212065) seem to be mostly due to minor differences in stats built by ANALYZE, particularly in histograms used by patternsel().

In the GROUPING cases, the situation is obviously much worse. The grouping makes it impossible to use the "lineitem->partsupp" foreign key, resulting in severe underestimates. The "partsupp->part" is used, but the difference is pretty negligible as it's a simple (one column) foreign key.

The change from 160 -> 868 is merely due to 84f9a35e3 changing how we estimate number of groups in a GROUP BY clause. In 9.5 we get this:

    ->  HashAggregate  (rows=1836028) (actual rows=10897186)

while since 9.6 we get this

    ->  GroupAggregate  (rows=9674242)

Not only is that much closer to the actual value than the 9.5 estimate, but it's almost exactly the factor between 160 and 868:

    9674242 / 1836028 = 5.27
    160 * 5.26 = 843

So I'd say the 160 vs. 868 is expected, although the result is still way off, of course.

Which brings me to the slightly suspicious bit. On 9.5, there's no difference between GROUP and GROUP+LIKE cases - the estimates are exactly the same in both cases. This is true too, but only without the foreign key between "partsupp" and "part", i.e. the two non-grouped relations in the join. And what's more, the difference (1737 vs. 16) is pretty much exactly 100x, which is the estimate for the LIKE condition.

So it kinda seems 9.5 does not apply this condition for semi-joins, while >=9.6 does that.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
                                                   QUERY PLAN                   
                                 
-----------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=23613912.46..24689677.34 rows=2340 width=4)
   ->  Merge Join  (cost=23613912.03..24673859.22 rows=2340 width=12)
         Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
         ->  Sort  (cost=20971852.46..21271839.58 rows=119994848 width=8)
               Sort Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Seq Scan on lineitem  (cost=0.00..3228950.48 rows=119994848 
width=8)
         ->  Materialize  (cost=2642034.49..2722034.49 rows=16000000 width=8)
               ->  Sort  (cost=2642034.49..2682034.49 rows=16000000 width=8)
                     Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
                     ->  Seq Scan on partsupp  (cost=0.00..508759.00 
rows=16000000 width=8)
   ->  Index Only Scan using part_pkey on part  (cost=0.43..6.75 rows=1 width=4)
         Index Cond: (p_partkey = lineitem.l_partkey)
(12 rows)

                                            QUERY PLAN                          
                  
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.00..1621494.68 rows=24 width=4)
   ->  Nested Loop  (cost=0.43..473219.70 rows=161616 width=12)
         ->  Seq Scan on part  (cost=0.00..131919.11 rows=40404 width=4)
               Filter: ((p_name)::text ~~ 'hot%'::text)
         ->  Index Scan using idx_partsupp_partkey on partsupp  
(cost=0.43..8.10 rows=35 width=8)
               Index Cond: (ps_partkey = part.p_partkey)
   ->  Index Only Scan using idx_lineitem_part_supp on lineitem  
(cost=0.57..7.09 rows=1 width=8)
         Index Cond: ((l_partkey = partsupp.ps_partkey) AND (l_suppkey = 
partsupp.ps_suppkey))
(8 rows)

                                                                       QUERY 
PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=5205236.43..6886133.98 rows=1737 width=4)
   Join Filter: (lineitem.l_partkey = part.p_partkey)
   ->  Merge Join  (cost=5205236.00..6885310.01 rows=1737 width=12)
         Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
         ->  GroupAggregate  (cost=5205235.57..5438183.70 rows=9674242 width=40)
               Group Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Sort  (cost=5205235.57..5250637.47 rows=18160762 width=8)
                     Sort Key: lineitem.l_partkey, lineitem.l_suppkey
                     ->  Bitmap Heap Scan on lineitem  
(cost=465856.38..2767269.81 rows=18160762 width=8)
                           Recheck Cond: ((l_shipdate >= '1997-01-01'::date) 
AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                           ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..461316.19 rows=18160762 width=0)
                                 Index Cond: ((l_shipdate >= 
'1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without 
time zone))
         ->  Index Only Scan using partsupp_pkey on partsupp  
(cost=0.43..1221995.31 rows=16000000 width=8)
   ->  Index Only Scan using part_pkey on part  (cost=0.43..0.47 rows=1 width=4)
         Index Cond: (p_partkey = partsupp.ps_partkey)
(15 rows)

                                                                    QUERY PLAN  
                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5671028.84..6050302.90 rows=18 width=4)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  GroupAggregate  (cost=5205235.57..5438183.70 rows=9674242 width=40)
         Group Key: lineitem.l_partkey, lineitem.l_suppkey
         ->  Sort  (cost=5205235.57..5250637.47 rows=18160762 width=8)
               Sort Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Bitmap Heap Scan on lineitem  (cost=465856.38..2767269.81 
rows=18160762 width=8)
                     Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                     ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..461316.19 rows=18160762 width=0)
                           Index Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
   ->  Sort  (cost=465793.27..466197.31 rows=161616 width=12)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         ->  Nested Loop  (cost=0.43..451811.70 rows=161616 width=12)
               ->  Seq Scan on part  (cost=0.00..131919.11 rows=40404 width=4)
                     Filter: ((p_name)::text ~~ 'hot%'::text)
               ->  Index Scan using idx_partsupp_partkey on partsupp  
(cost=0.43..7.57 rows=35 width=8)
                     Index Cond: (ps_partkey = part.p_partkey)
(17 rows)
                                                QUERY PLAN                      
                           
-----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=24060582.74..25120389.31 rows=119994848 width=4)
   Merge Cond: ((lineitem.l_suppkey = partsupp.ps_suppkey) AND 
(lineitem.l_partkey = partsupp.ps_partkey))
   ->  Sort  (cost=20971852.46..21271839.58 rows=119994848 width=8)
         Sort Key: lineitem.l_suppkey, lineitem.l_partkey
         ->  Seq Scan on lineitem  (cost=0.00..3228950.48 rows=119994848 
width=8)
   ->  Materialize  (cost=3088641.19..3168641.19 rows=16000000 width=12)
         ->  Sort  (cost=3088641.19..3128641.19 rows=16000000 width=12)
               Sort Key: partsupp.ps_suppkey, partsupp.ps_partkey
               ->  Hash Join  (cost=171919.20..900678.20 rows=16000000 width=12)
                     Hash Cond: (partsupp.ps_partkey = part.p_partkey)
                     ->  Seq Scan on partsupp  (cost=0.00..508759.00 
rows=16000000 width=8)
                     ->  Hash  (cost=121919.09..121919.09 rows=4000009 width=4)
                           ->  Seq Scan on part  (cost=0.00..121919.09 
rows=4000009 width=4)
(13 rows)

                                            QUERY PLAN                          
                  
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.00..1621494.68 rows=1212065 width=4)
   ->  Nested Loop  (cost=0.43..473219.70 rows=161616 width=12)
         ->  Seq Scan on part  (cost=0.00..131919.11 rows=40404 width=4)
               Filter: ((p_name)::text ~~ 'hot%'::text)
         ->  Index Scan using idx_partsupp_partkey on partsupp  
(cost=0.43..8.10 rows=35 width=8)
               Index Cond: (ps_partkey = part.p_partkey)
   ->  Index Only Scan using idx_lineitem_part_supp on lineitem  
(cost=0.57..7.09 rows=1 width=8)
         Index Cond: ((l_partkey = partsupp.ps_partkey) AND (l_suppkey = 
partsupp.ps_suppkey))
(8 rows)

                                                                       QUERY 
PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=5205236.43..6886133.98 rows=1737 width=4)
   Join Filter: (lineitem.l_partkey = part.p_partkey)
   ->  Merge Join  (cost=5205236.00..6885310.01 rows=1737 width=12)
         Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
         ->  GroupAggregate  (cost=5205235.57..5438183.70 rows=9674242 width=40)
               Group Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Sort  (cost=5205235.57..5250637.47 rows=18160762 width=8)
                     Sort Key: lineitem.l_partkey, lineitem.l_suppkey
                     ->  Bitmap Heap Scan on lineitem  
(cost=465856.38..2767269.81 rows=18160762 width=8)
                           Recheck Cond: ((l_shipdate >= '1997-01-01'::date) 
AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                           ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..461316.19 rows=18160762 width=0)
                                 Index Cond: ((l_shipdate >= 
'1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without 
time zone))
         ->  Index Only Scan using partsupp_pkey on partsupp  
(cost=0.43..1221995.31 rows=16000000 width=8)
   ->  Index Only Scan using part_pkey on part  (cost=0.43..0.47 rows=1 width=4)
         Index Cond: (p_partkey = partsupp.ps_partkey)
(15 rows)

                                                                    QUERY PLAN  
                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5671028.84..6050302.90 rows=18 width=4)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  GroupAggregate  (cost=5205235.57..5438183.70 rows=9674242 width=40)
         Group Key: lineitem.l_partkey, lineitem.l_suppkey
         ->  Sort  (cost=5205235.57..5250637.47 rows=18160762 width=8)
               Sort Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Bitmap Heap Scan on lineitem  (cost=465856.38..2767269.81 
rows=18160762 width=8)
                     Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                     ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..461316.19 rows=18160762 width=0)
                           Index Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
   ->  Sort  (cost=465793.27..466197.31 rows=161616 width=12)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         ->  Nested Loop  (cost=0.43..451811.70 rows=161616 width=12)
               ->  Seq Scan on part  (cost=0.00..131919.11 rows=40404 width=4)
                     Filter: ((p_name)::text ~~ 'hot%'::text)
               ->  Index Scan using idx_partsupp_partkey on partsupp  
(cost=0.43..7.57 rows=35 width=8)
                     Index Cond: (ps_partkey = part.p_partkey)
(17 rows)
                                                QUERY PLAN                      
                           
-----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=24060582.74..25120389.31 rows=119994848 width=4)
   Merge Cond: ((lineitem.l_suppkey = partsupp.ps_suppkey) AND 
(lineitem.l_partkey = partsupp.ps_partkey))
   ->  Sort  (cost=20971852.46..21271839.58 rows=119994848 width=8)
         Sort Key: lineitem.l_suppkey, lineitem.l_partkey
         ->  Seq Scan on lineitem  (cost=0.00..3228950.48 rows=119994848 
width=8)
   ->  Materialize  (cost=3088641.19..3168641.19 rows=16000000 width=12)
         ->  Sort  (cost=3088641.19..3128641.19 rows=16000000 width=12)
               Sort Key: partsupp.ps_suppkey, partsupp.ps_partkey
               ->  Hash Join  (cost=171919.20..900678.20 rows=16000000 width=12)
                     Hash Cond: (partsupp.ps_partkey = part.p_partkey)
                     ->  Seq Scan on partsupp  (cost=0.00..508759.00 
rows=16000000 width=8)
                     ->  Hash  (cost=121919.09..121919.09 rows=4000009 width=4)
                           ->  Seq Scan on part  (cost=0.00..121919.09 
rows=4000009 width=4)
(13 rows)

                                              QUERY PLAN                        
                       
-------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3947865.64..4839365.50 rows=1229750 width=4)
   Merge Cond: (partsupp.ps_partkey = lineitem.l_partkey)
   Join Filter: (lineitem.l_suppkey = partsupp.ps_suppkey)
   ->  Index Scan using idx_partsupp_partkey on partsupp  (cost=0.43..764250.44 
rows=16000000 width=8)
   ->  Sort  (cost=3947864.78..3950939.15 rows=1229750 width=12)
         Sort Key: lineitem.l_partkey
         ->  Hash Join  (cost=132424.16..3823475.97 rows=1229750 width=12)
               Hash Cond: (lineitem.l_partkey = part.p_partkey)
               ->  Seq Scan on lineitem  (cost=0.00..3228950.48 rows=119994848 
width=8)
               ->  Hash  (cost=131919.11..131919.11 rows=40404 width=4)
                     ->  Seq Scan on part  (cost=0.00..131919.11 rows=40404 
width=4)
                           Filter: ((p_name)::text ~~ 'hot%'::text)
(12 rows)

                                                                       QUERY 
PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=5205236.43..6886133.98 rows=868 width=4)
   Join Filter: (lineitem.l_partkey = part.p_partkey)
   ->  Merge Join  (cost=5205236.00..6885310.01 rows=1737 width=12)
         Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
         ->  GroupAggregate  (cost=5205235.57..5438183.70 rows=9674242 width=40)
               Group Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Sort  (cost=5205235.57..5250637.47 rows=18160762 width=8)
                     Sort Key: lineitem.l_partkey, lineitem.l_suppkey
                     ->  Bitmap Heap Scan on lineitem  
(cost=465856.38..2767269.81 rows=18160762 width=8)
                           Recheck Cond: ((l_shipdate >= '1997-01-01'::date) 
AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                           ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..461316.19 rows=18160762 width=0)
                                 Index Cond: ((l_shipdate >= 
'1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without 
time zone))
         ->  Index Only Scan using partsupp_pkey on partsupp  
(cost=0.43..1221995.31 rows=16000000 width=8)
   ->  Index Only Scan using part_pkey on part  (cost=0.43..0.47 rows=1 width=4)
         Index Cond: (p_partkey = partsupp.ps_partkey)
(15 rows)

                                                                    QUERY PLAN  
                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5803124.91..6191904.83 rows=868 width=4)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  GroupAggregate  (cost=5205235.57..5438183.70 rows=9674242 width=40)
         Group Key: lineitem.l_partkey, lineitem.l_suppkey
         ->  Sort  (cost=5205235.57..5250637.47 rows=18160762 width=8)
               Sort Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Bitmap Heap Scan on lineitem  (cost=465856.38..2767269.81 
rows=18160762 width=8)
                     Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                     ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..461316.19 rows=18160762 width=0)
                           Index Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
   ->  Sort  (cost=597889.34..601461.54 rows=1428880 width=12)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         ->  Nested Loop  (cost=0.43..451811.70 rows=1428880 width=12)
               ->  Seq Scan on part  (cost=0.00..131919.11 rows=40404 width=4)
                     Filter: ((p_name)::text ~~ 'hot%'::text)
               ->  Index Scan using idx_partsupp_partkey on partsupp  
(cost=0.43..7.57 rows=35 width=8)
                     Index Cond: (ps_partkey = part.p_partkey)
(17 rows)

Attachment: explain.sql
Description: application/sql

                                                   QUERY PLAN                   
                                 
-----------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=23613912.46..24689677.34 rows=2340 width=4)
   ->  Merge Join  (cost=23613912.03..24673859.22 rows=2340 width=12)
         Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
         ->  Sort  (cost=20971852.46..21271839.58 rows=119994848 width=8)
               Sort Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Seq Scan on lineitem  (cost=0.00..3228950.48 rows=119994848 
width=8)
         ->  Materialize  (cost=2642034.49..2722034.49 rows=16000000 width=8)
               ->  Sort  (cost=2642034.49..2682034.49 rows=16000000 width=8)
                     Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
                     ->  Seq Scan on partsupp  (cost=0.00..508759.00 
rows=16000000 width=8)
   ->  Index Only Scan using part_pkey on part  (cost=0.43..6.75 rows=1 width=4)
         Index Cond: (p_partkey = lineitem.l_partkey)
(12 rows)

                                              QUERY PLAN                        
                       
-------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3947865.64..4839365.50 rows=24 width=4)
   Merge Cond: (partsupp.ps_partkey = lineitem.l_partkey)
   Join Filter: (lineitem.l_suppkey = partsupp.ps_suppkey)
   ->  Index Scan using idx_partsupp_partkey on partsupp  (cost=0.43..764250.44 
rows=16000000 width=8)
   ->  Sort  (cost=3947864.78..3950939.15 rows=1229750 width=12)
         Sort Key: lineitem.l_partkey
         ->  Hash Join  (cost=132424.16..3823475.97 rows=1229750 width=12)
               Hash Cond: (lineitem.l_partkey = part.p_partkey)
               ->  Seq Scan on lineitem  (cost=0.00..3228950.48 rows=119994848 
width=8)
               ->  Hash  (cost=131919.11..131919.11 rows=40404 width=4)
                     ->  Seq Scan on part  (cost=0.00..131919.11 rows=40404 
width=4)
                           Filter: ((p_name)::text ~~ 'hot%'::text)
(12 rows)

                                                                       QUERY 
PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=5205236.43..6886133.98 rows=868 width=4)
   Join Filter: (lineitem.l_partkey = part.p_partkey)
   ->  Merge Join  (cost=5205236.00..6885310.01 rows=1737 width=12)
         Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
         ->  GroupAggregate  (cost=5205235.57..5438183.70 rows=9674242 width=40)
               Group Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Sort  (cost=5205235.57..5250637.47 rows=18160762 width=8)
                     Sort Key: lineitem.l_partkey, lineitem.l_suppkey
                     ->  Bitmap Heap Scan on lineitem  
(cost=465856.38..2767269.81 rows=18160762 width=8)
                           Recheck Cond: ((l_shipdate >= '1997-01-01'::date) 
AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                           ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..461316.19 rows=18160762 width=0)
                                 Index Cond: ((l_shipdate >= 
'1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without 
time zone))
         ->  Index Only Scan using partsupp_pkey on partsupp  
(cost=0.43..1221995.31 rows=16000000 width=8)
   ->  Index Only Scan using part_pkey on part  (cost=0.43..0.47 rows=1 width=4)
         Index Cond: (p_partkey = partsupp.ps_partkey)
(15 rows)

                                                                    QUERY PLAN  
                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5803124.91..6191904.83 rows=868 width=4)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  GroupAggregate  (cost=5205235.57..5438183.70 rows=9674242 width=40)
         Group Key: lineitem.l_partkey, lineitem.l_suppkey
         ->  Sort  (cost=5205235.57..5250637.47 rows=18160762 width=8)
               Sort Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Bitmap Heap Scan on lineitem  (cost=465856.38..2767269.81 
rows=18160762 width=8)
                     Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                     ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..461316.19 rows=18160762 width=0)
                           Index Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
   ->  Sort  (cost=597889.34..601461.54 rows=1428880 width=12)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         ->  Nested Loop  (cost=0.43..451811.70 rows=1428880 width=12)
               ->  Seq Scan on part  (cost=0.00..131919.11 rows=40404 width=4)
                     Filter: ((p_name)::text ~~ 'hot%'::text)
               ->  Index Scan using idx_partsupp_partkey on partsupp  
(cost=0.43..7.57 rows=35 width=8)
                     Index Cond: (ps_partkey = part.p_partkey)
(17 rows)
select
    ps_suppkey
from
    lineitem,
    partsupp
where
    l_partkey = ps_partkey
    and l_suppkey = ps_suppkey
    and ps_partkey in (
        select
            p_partkey
        from
            part
    );

                                                                   QUERY PLAN   
                                                                
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=23613854.76..24693102.68 rows=2863 width=4) 
(actual time=252662.703..884575.874 rows=119994608 loops=1)
   ->  Merge Join  (cost=23613854.33..24673749.14 rows=2863 width=12) (actual 
time=252662.656..458535.952 rows=119994608 loops=1)
         Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND 
(lineitem.l_suppkey = partsupp.ps_suppkey))
         ->  Sort  (cost=20971812.62..21271799.14 rows=119994608 width=8) 
(actual time=228806.169..336527.618 rows=119994608 loops=1)
               Sort Key: lineitem.l_partkey, lineitem.l_suppkey
               Sort Method: external merge  Disk: 2111136kB
               ->  Seq Scan on lineitem  (cost=0.00..3228948.08 rows=119994608 
width=8) (actual time=0.227..31541.384 rows=119994608 loops=1)
         ->  Materialize  (cost=2642034.49..2722034.49 rows=16000000 width=8) 
(actual time=23856.476..44148.011 rows=120001929 loops=1)
               ->  Sort  (cost=2642034.49..2682034.49 rows=16000000 width=8) 
(actual time=23856.465..28096.959 rows=16000000 loops=1)
                     Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
                     Sort Method: external sort  Disk: 281536kB
                     ->  Seq Scan on partsupp  (cost=0.00..508759.00 
rows=16000000 width=8) (actual time=0.013..4056.366 rows=16000000 loops=1)
   ->  Index Only Scan using part_pkey on part  (cost=0.43..6.75 rows=1 
width=4) (actual time=0.003..0.003 rows=1 loops=119994608)
         Index Cond: (p_partkey = lineitem.l_partkey)
         Heap Fetches: 119994608
 Planning time: 1.414 ms
 Execution time: 891781.476 ms
(17 rows)

select
    ps_suppkey
from
    lineitem,
    partsupp
where
    l_partkey = ps_partkey
    and l_suppkey = ps_suppkey
    and ps_partkey in (
        select
            p_partkey
        from
            part where p_name like 'hot%'
    );

                                                                        QUERY 
PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3973962.16..4902728.75 rows=35 width=4) (actual 
time=42931.190..51758.776 rows=1311974 loops=1)
   Merge Cond: (partsupp.ps_partkey = lineitem.l_partkey)
   Join Filter: (lineitem.l_suppkey = partsupp.ps_suppkey)
   Rows Removed by Join Filter: 3935922
   ->  Index Scan using idx_partsupp_partkey on partsupp  (cost=0.43..764250.44 
rows=16000000 width=8) (actual time=0.011..5189.382 rows=15998913 loops=1)
   ->  Sort  (cost=3973913.32..3977585.07 rows=1468702 width=12) (actual 
time=42930.688..43398.341 rows=5247893 loops=1)
         Sort Key: lineitem.l_partkey
         Sort Method: quicksort  Memory: 110651kB
         ->  Hash Join  (cost=132424.70..3823473.36 rows=1468702 width=12) 
(actual time=899.792..42215.343 rows=1311974 loops=1)
               Hash Cond: (lineitem.l_partkey = part.p_partkey)
               ->  Seq Scan on lineitem  (cost=0.00..3228948.08 rows=119994608 
width=8) (actual time=0.005..17835.189 rows=119994608 loops=1)
               ->  Hash  (cost=131919.64..131919.64 rows=40405 width=4) (actual 
time=899.698..899.698 rows=43703 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 7498kB
                     ->  Seq Scan on part  (cost=0.00..131919.64 rows=40405 
width=4) (actual time=0.023..886.782 rows=43703 loops=1)
                           Filter: ((p_name)::text ~~ 'hot%'::text)
                           Rows Removed by Filter: 3956297
 Planning time: 1.450 ms
 Execution time: 51809.901 ms
(18 rows)


select
    ps_suppkey
from
    partsupp,
    (
        select
            l_partkey agg_partkey,
            l_suppkey agg_suppkey,
            0.5 * sum(l_quantity) AS agg_quantity
        from
            lineitem
        where
            l_shipdate >= date '1997-01-01'
            and l_shipdate < date '1997-01-01' + interval '1' year
        group by
            l_partkey,
            l_suppkey
    ) agg_lineitem
where
    agg_partkey = ps_partkey
    and agg_suppkey = ps_suppkey
    and ps_partkey in (
        select
            p_partkey
        from
            part
    );
    
                                                                               
QUERY PLAN                                                                      
          
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3039101.38..4313205.40 rows=160 width=4) (actual 
time=49355.090..220422.294 rows=10897186 loops=1)
   ->  Hash Join  (cost=3039100.94..3101066.89 rows=918014 width=12) (actual 
time=49354.508..69808.651 rows=10897186 loops=1)
         Hash Cond: (lineitem.l_partkey = part.p_partkey)
         ->  HashAggregate  (cost=2867180.80..2885541.08 rows=1836028 width=8) 
(actual time=47150.671..53455.449 rows=10897186 loops=1)
               Group Key: lineitem.l_partkey, lineitem.l_suppkey
               ->  Bitmap Heap Scan on lineitem  (cost=470973.36..2775379.44 
rows=18360272 width=8) (actual time=7924.016..28786.995 rows=18214751 loops=1)
                     Recheck Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                     Heap Blocks: exact=1924458
                     ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..466383.29 rows=18360272 width=0) (actual time=6847.338..6847.338 
rows=18214751 loops=1)
                           Index Cond: ((l_shipdate >= '1997-01-01'::date) AND 
(l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
         ->  Hash  (cost=121919.51..121919.51 rows=4000051 width=4) (actual 
time=2196.673..2196.673 rows=4000000 loops=1)
               Buckets: 4194304 (originally 4194304)  Batches: 4 (originally 1) 
 Memory Usage: 229377kB
               ->  Seq Scan on part  (cost=0.00..121919.51 rows=4000051 
width=4) (actual time=0.012..712.926 rows=4000000 loops=1)
   ->  Index Only Scan using partsupp_pkey on partsupp  (cost=0.43..1.31 rows=1 
width=8) (actual time=0.013..0.013 rows=1 loops=10897186)
         Index Cond: ((ps_partkey = lineitem.l_partkey) AND (ps_suppkey = 
lineitem.l_suppkey))
         Heap Fetches: 10897186
 Planning time: 0.855 ms
 Execution time: 221281.967 ms
(18 rows)

select
    ps_suppkey
from
    partsupp,
    (
        select
            l_partkey agg_partkey,
            l_suppkey agg_suppkey,
            0.5 * sum(l_quantity) AS agg_quantity
        from
            lineitem
        where
            l_shipdate >= date '1997-01-01'
            and l_shipdate < date '1997-01-01' + interval '1' year
        group by
            l_partkey,
            l_suppkey
    ) agg_lineitem
where
    agg_partkey = ps_partkey
    and agg_suppkey = ps_suppkey
    and ps_partkey in (
        select
            p_partkey
        from
            part
        where 
            p_name like 'hot%'
    );

                                                                                
  QUERY PLAN                                                                    
               
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=3690996.25..3715358.15 rows=160 width=4) (actual 
time=79478.109..83008.686 rows=119238 loops=1)
   Merge Cond: ((partsupp.ps_partkey = agg_lineitem.agg_partkey) AND 
(partsupp.ps_suppkey = agg_lineitem.agg_suppkey))
   ->  Sort  (cost=596073.10..599603.57 rows=1412188 width=12) (actual 
time=1648.385..1659.393 rows=174812 loops=1)
         Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
         Sort Method: quicksort  Memory: 14339kB
         ->  Nested Loop  (cost=0.43..451821.62 rows=1412188 width=12) (actual 
time=0.071..1548.566 rows=174812 loops=1)
               ->  Seq Scan on part  (cost=0.00..131919.64 rows=40405 width=4) 
(actual time=0.047..1006.884 rows=43703 loops=1)
                     Filter: ((p_name)::text ~~ 'hot%'::text)
                     Rows Removed by Filter: 3956297
               ->  Index Scan using idx_partsupp_partkey on partsupp  
(cost=0.43..7.57 rows=35 width=8) (actual time=0.010..0.011 rows=4 loops=43703)
                     Index Cond: (ps_partkey = part.p_partkey)
   ->  Sort  (cost=3094923.15..3099513.22 rows=1836028 width=8) (actual 
time=77829.655..79681.366 rows=10896449 loops=1)
         Sort Key: agg_lineitem.agg_partkey, agg_lineitem.agg_suppkey
         Sort Method: external sort  Disk: 234352kB
         ->  Subquery Scan on agg_lineitem  (cost=2867180.80..2903901.36 
rows=1836028 width=8) (actual time=48973.005..55717.145 rows=10897186 loops=1)
               ->  HashAggregate  (cost=2867180.80..2885541.08 rows=1836028 
width=8) (actual time=48973.003..54254.198 rows=10897186 loops=1)
                     Group Key: lineitem.l_partkey, lineitem.l_suppkey
                     ->  Bitmap Heap Scan on lineitem  
(cost=470973.36..2775379.44 rows=18360272 width=8) (actual 
time=7918.595..29778.138 rows=18214751 loops=1)
                           Recheck Cond: ((l_shipdate >= '1997-01-01'::date) 
AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without time zone))
                           Heap Blocks: exact=1924458
                           ->  Bitmap Index Scan on idx_lineitem_shipdate  
(cost=0.00..466383.29 rows=18360272 width=0) (actual time=6849.743..6849.743 
rows=18214751 loops=1)
                                 Index Cond: ((l_shipdate >= 
'1997-01-01'::date) AND (l_shipdate < '1998-01-01 00:00:00'::timestamp without 
time zone))
 Planning time: 0.723 ms
 Execution time: 83096.504 ms
(24 rows)
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to