Re: [PERFORM] issue with query optimizer when joining two partitioned tables

Anish Kejariwal Sun, 10 Jul 2011 08:34:20 -0700

Thanks Tom and Heikki!  I really appreciate your help.

I went ahead and loaded all the data.  In the icream table, I now have ~175
partitions, each with 4041866 records.


The data finished loading 12 hours ago, and I then ran the same query I gave
you guys, and it took 25 seconds since it used the wrong execution plan as
expected.

 HashAggregate  (cost=27680.90..28045.81 rows=11228 width=41) (actual
time=24769.190..24817.618 rows=11028 loops=1)
   ->  Hash Join  (cost=1304.04..18901.88 rows=292634 width=41) (actual
time=3938.965..24688.718 rows=11028 loops=1)
         Hash Cond: (rc.widget_id = r.widget_id)
         ->  Append  (cost=0.00..12110.95 rows=292634 width=24) (actual
time=2854.925..22887.638 rows=309579 loops=1)
               ->  Seq Scan on icecream rc  (cost=0.00..25.60 rows=1
width=24) (actual time=0.003..0.003 rows=0 loops=1)
                     Filter: ((widgetset_id = 5) AND (dataset_id = 283))
               ->  Index Scan using icecream_part_283_widgetset_id_idx on
icecream_part_283 rc  (cost=0.00..12085.35 rows=292633 width=24) (actual
time=2854.915..2
1784.769 rows=309579 loops=1)
                     Index Cond: (widgetset_id = 5)
                     Filter: (dataset_id = 283)
         ->  Hash  (cost=1163.69..1163.69 rows=11228 width=21) (actual
time=1083.704..1083.704 rows=11028 loops=1)
               Buckets: 2048  Batches: 1  Memory Usage: 604kB
               ->  Append  (cost=4.28..1163.69 rows=11228 width=21) (actual
time=528.216..1066.659 rows=11028 loops=1)
                     ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75
rows=1 width=48) (actual time=528.017..528.017 rows=0 loops=1)
                           Recheck Cond: (widgetset_id = 5)
                           Filter: (widget_type_id = 4)
                           ->  Bitmap Index Scan on widget_widgetset_id_idx
 (cost=0.00..4.28 rows=4 width=0) (actual time=527.995..527.995 rows=0
loops=1)
                                 Index Cond: (widgetset_id = 5)
                     ->  Index Scan using
widget_part_5_widget_widget_type_id_idx on widget_part_5 r
 (cost=0.00..1150.94 rows=11227 width=21) (actual time=0.191..512.847 rows=1
1028 loops=1)
                           Index Cond: (widget_type_id = 4)
                           Filter: (widgetset_id = 5)
 Total runtime: 24844.016 ms
(21 rows)


I then changed my enable* options to force it to use a nested loop:
set enable_mergejoin=off;
set enable_hashjoin=off;
set enable_nestloop = on;


 HashAggregate  (cost=460004.79..460369.70 rows=11228 width=41) (actual
time=298.014..341.822 rows=11028 loops=1)
   ->  Nested Loop  (cost=4.28..338747.85 rows=4041898 width=41) (actual
time=0.175..248.529 rows=11028 loops=1)
         Join Filter: (r.widget_id = rc.widget_id)
         ->  Append  (cost=4.28..1163.69 rows=11228 width=21) (actual
time=0.053..42.532 rows=11028 loops=1)
               ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75 rows=1
width=48) (actual time=0.014..0.014 rows=0 loops=1)
                     Recheck Cond: (widgetset_id = 5)
                     Filter: (widget_type_id = 4)
                     ->  Bitmap Index Scan on widget_widgetset_id_idx
 (cost=0.00..4.28 rows=4 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                           Index Cond: (widgetset_id = 5)
               ->  Index Scan using widget_part_5_widget_widget_type_id_idx
on widget_part_5 r  (cost=0.00..1150.94 rows=11227 width=21) (actual
time=0.032..18.410 rows=11028 lo
ops=1)
                     Index Cond: (widget_type_id = 4)
                     Filter: (widgetset_id = 5)
         ->  Append  (cost=0.00..29.99 rows=6 width=24) (actual
time=0.009..0.012 rows=1 loops=11028)
               ->  Seq Scan on icecream rc  (cost=0.00..23.00 rows=5
width=24) (actual time=0.001..0.001 rows=0 loops=11028)
                     Filter: (rc.dataset_id = 283)
               ->  Index Scan using icecream_part_283_widget_id_idx on
icecream_part_283 rc  (cost=0.00..6.99 rows=1 width=24) (actual
time=0.004..0.006 rows=1 loo
ps=11028)
                     Index Cond: (rc.widget_id = r.widget_id)
                     Filter: (rc.dataset_id = 283)
 Total runtime: 361.180 ms
(19 rows)

The query was nice and fast as expected.

I then restored the enable* options back to default.  The query was slow
again and taking around ~19 seconds.  So, this gives us some information
about whether autoanalyze is running in the background:
-I don't think waiting is making a difference.  I loaded the data 12 hours
ago, and then ran the query and the query was very slow.  I then ran the
query again but for a different dataset and it was also slow.  It was only
once I changed the enable* parameters could I get my expected performance.
-I would have tried select * from ice-cream again, but with 176 partitions
this query is no longer feasible.  But, I do agree that select * from
icecream is causing the statistics to be updated.

So, then I followed Tom's suggestion to defeat the empty-table heuristic.

select relpages from pg_class where relname = 'icecream';

 relpages
----------
        0
(1 row)

Ok, so the planner thinks that the parent table is empty.  I then ran:
update pg_class set relpages = 1 where relname = 'icecream';


 HashAggregate  (cost=201199.27..201564.18 rows=11228 width=41) (actual
time=277.195..304.620 rows=11028 loops=1)
   ->  Nested Loop  (cost=4.28..79942.45 rows=4041894 width=41) (actual
time=0.227..231.181 rows=11028 loops=1)
         Join Filter: (r.widget_id = rc.widget_id)
         ->  Append  (cost=4.28..1163.69 rows=11228 width=21) (actual
time=0.125..40.834 rows=11028 loops=1)
               ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75 rows=1
width=48) (actual time=0.022..0.022 rows=0 loops=1)
                     Recheck Cond: (widgetset_id = 5)
                     Filter: (widget_type_id = 4)
                     ->  Bitmap Index Scan on widget_widgetset_id_idx
 (cost=0.00..4.28 rows=4 width=0) (actual time=0.019..0.019 rows=0 loops=1)
                           Index Cond: (widgetset_id = 5)
               ->  Index Scan using widget_part_5_widget_widget_type_id_idx
on widget_part_5 r  (cost=0.00..1150.94 rows=11227 width=21) (actual
time=0.100..18.964 rows=11028 lo
ops=1)
                     Index Cond: (widget_type_id = 4)
                     Filter: (widgetset_id = 5)
         ->  Append  (cost=0.00..6.99 rows=2 width=24) (actual
time=0.008..0.012 rows=1 loops=11028)
               ->  Seq Scan on icecream rc  (cost=0.00..0.00 rows=1
width=24) (actual time=0.001..0.001 rows=0 loops=11028)
                     Filter: (rc.dataset_id = 283)
               ->  Index Scan using icecream_part_283_widget_id_idx on
icecream_part_283 rc  (cost=0.00..6.99 rows=1 width=24) (actual
time=0.004..0.006 rows=1 loo
ps=11028)
                     Index Cond: (rc.widget_id = r.widget_id)
                     Filter: (rc.dataset_id = 283)
 Total runtime: 318.634 ms
(19 rows)


Wow!  that fixes it.  Thanks you so much!!!! I've been struggling with this
issue for 2-3 days.  (Also, in the past, I've seen inconsistent performance
with this query, which may be the result of the planner sometimes choosing
the wrong plan, but I'll chase that down later).

Tom said: But maybe we should reconsider the heuristic for tables that are
members of inheritance trees --- particularly parents of inheritance trees.

I agree.  I think postgres should get updated to take this into account.  I
shouldn't have to set the relpages to 1 for all the empty parent tables that
I have partitioned.  Should I file this as a bug/enhancement?

Also, do I need to worry about about autoanalyze/autovacuum setting back
relpages to zero for the parent icecream table?

thanks!!!
Anish



On Sat, Jul 9, 2011 at 10:43 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> > On 09.07.2011 00:36, Anish Kejariwal wrote:
> >> My guess as to what happened:
> >> -because the icecream parent table has zero records, the query optimizer
> >> chooses the incorrect execution plan
> >> -when I do select * from icecream, the optimizer now knows how many
> records
> >> are really in the icecream table, by knowing that the icecream table has
> >> partitions.
>
> > "select * from icecream" won't have any direct effect on the
> > optimization of subsequent queries. What probably happened is that
> > autoanalyze ran in the background while you ran that select, and
> > analyzed some of the partitions. Simply waiting a while would've had the
> > same effect.
>
> Yeah.  Also, the reason that a manual vacuum on icecream changes things
> yet again is that in 9.0 and up, we have a notion of summary stats
> across the whole inheritance tree, but autoanalyze hasn't been taught to
> gather those.  The manual command on the parent table does gather them,
> though.
>
> So what's happening here is that we suddenly have an accurate idea of
> the size of the join product as a result of having inheritance summary
> stats to estimate with, and that drives the estimated cost of the merge
> or hash join down out of the stratosphere.  The estimated cost of the
> nestloop goes down a lot too, but not as much.
>
> I experimented with a similar case here, and it seems like a lot of the
> remaining error in the nestloop estimate comes from this:
>
> >> ->   Seq Scan on icecream rc  (cost=0.00..23.00 rows=5 width=24) (actual
> time=0.002..0.002 rows=0 loops=1)
> >>        Filter: (dataset_id = 281)
>
> The indexscan on the nonempty child partition is estimated at less than
> 10 cost units, so this is a *large* fraction of what the planner sees as
> the per-outer-row cost of a nestloop.  And with more than 11000 rows on
> the other side of the join, that discourages it from using the nestloop.
> In reality of course this takes negligible time compared to examining
> the child partition.
>
> Now why is the seqscan cost estimate so large, when actually the parent
> icecream table is totally empty?  It's because the planner has been
> taught to never believe that an empty table is empty.  If memory serves,
> it's really estimating on an assumption that the table contains 10 pages
> and some corresponding number of rows.  This is a reasonable defensive
> posture when dealing with ordinary tables, I think, since most likely
> if the catalogs say the table is empty that's just a leftover from when
> it was created.  But maybe we should reconsider the heuristic for tables
> that are members of inheritance trees --- particularly parents of
> inheritance trees.
>
> I was able to defeat the empty-table heuristic here by doing
>
> update pg_class set relpages = 1 where relname = 'icecream';
>
> and then I started getting much more realistic estimates in my test
> case.  (It still wanted to use a merge join initially, but after
> knocking down random_page_cost it went to the nestloop.)  It would
> be interesting to see what sorts of results Anish gets with that.
>
>                        regards, tom lane
>

Re: [PERFORM] issue with query optimizer when joining two partitioned tables

Reply via email to