Re: [PERFORM] Regression from 9.4-9.6
On 10/8/17 3:37 PM, Tom Lane wrote: Jim Nasbywrites: On 10/8/17 2:34 PM, Tom Lane wrote: Why has this indexscan's cost estimate changed so much? Great question... the only thing that sticks out is the coalesce(). Let me see if an analyze with a higher stats target changes anything. FWIW, the 9.6 database is copied from the 9.4 one once a week and then pg_upgraded. I'm pretty sure an ANALYZE is part of that process. Hm, now that I see the SubPlan in there, I wonder whether 9.6 is accounting more conservatively for the cost of the subplan. It probably is assuming that the subplan gets run for each row fetched from the index, although the loops and rows-removed counts show that the previous filter conditions reject 99% of the fetched rows. But that code looks the same in 9.4, so I don't understand why the 9.4 estimate isn't equally large ... Besides the analyze issue, the other part of this is asdid...@graceful.hou/20106> select pg_size_pretty(pg_relation_size('bdata_forks')); pg_size_pretty 106 GB (1 row) asdid...@graceful.hou/20106> select relpages::bigint*8192/reltuples from pg_class where relname='bdata_forks'; ?column? -- 185.559397863791 (1 row) With an effective_cache_size of 200GB that's not really helping things. But it's also another example of the planner's reluctance towards index scans. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Regression from 9.4-9.6
Jim Nasbywrites: > On 10/8/17 2:34 PM, Tom Lane wrote: >> Why has this indexscan's cost estimate changed so much? > Great question... the only thing that sticks out is the coalesce(). Let > me see if an analyze with a higher stats target changes anything. FWIW, > the 9.6 database is copied from the 9.4 one once a week and then > pg_upgraded. I'm pretty sure an ANALYZE is part of that process. Hm, now that I see the SubPlan in there, I wonder whether 9.6 is accounting more conservatively for the cost of the subplan. It probably is assuming that the subplan gets run for each row fetched from the index, although the loops and rows-removed counts show that the previous filter conditions reject 99% of the fetched rows. But that code looks the same in 9.4, so I don't understand why the 9.4 estimate isn't equally large ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Regression from 9.4-9.6
On 10/8/17 3:02 PM, Jim Nasby wrote: -> Index Scan using bdata_filed_departuretime on bdata_forks (cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 rows=508 loops=1) -> Index Scan using bdata_filed_departuretime on bdata_forks (cost=0.57..14894236.06 rows=1 width=36) (actual time=892.664..3025.653 rows=508 loops=1) I think the reason it's discarding the preferable plan is that, with this huge increment in the estimated cost getting added to both alternatives, the two nestloop plans have fuzzily the same total cost, and it's picking the one you don't want on the basis of some secondary criterion. Great question... the only thing that sticks out is the coalesce(). Let me see if an analyze with a higher stats target changes anything. FWIW, the 9.6 database is copied from the 9.4 one once a week and then pg_upgraded. I'm pretty sure an ANALYZE is part of that process. Turns out that analyze is the 'problem'. On the 9.4 database, pg_stats shows that the newest date in filed_departuretime is 3/18/2017, while the 9.6 database is up-to-date. If I change the query to use 2/9/2018 instead of 7/20/2017 I get the same results. So, the larger cost estimate is theoretically more correct. If I set random_page_cost = 1 I end up with a good plan. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Regression from 9.4-9.6
On 10/8/17 2:34 PM, Tom Lane wrote: Jim Nasbywrites: I've got a query that's regressed from 9.4 to 9.6. I suspect it has something to do with the work done around bad plans from single-row estimates. Why has this indexscan's cost estimate changed so much? -> Index Scan using bdata_filed_departuretime on bdata_forks (cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 rows=508 loops=1) -> Index Scan using bdata_filed_departuretime on bdata_forks (cost=0.57..14894236.06 rows=1 width=36) (actual time=892.664..3025.653 rows=508 loops=1) I think the reason it's discarding the preferable plan is that, with this huge increment in the estimated cost getting added to both alternatives, the two nestloop plans have fuzzily the same total cost, and it's picking the one you don't want on the basis of some secondary criterion. Great question... the only thing that sticks out is the coalesce(). Let me see if an analyze with a higher stats target changes anything. FWIW, the 9.6 database is copied from the 9.4 one once a week and then pg_upgraded. I'm pretty sure an ANALYZE is part of that process. 9.4: -> Index Scan using bdata_filed_departuretime on bdata_forks (cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 rows=508 loops=1) Index Cond: ((filed_departuretime >= '2017-07-20 05:00:00'::timestamp without time zone) AND (filed_departuretime <= '2017-07-30 04:59:59'::timestamp without time zone)) Filter: (((view_www IS NULL) OR (view_www IS TRUE)) AND (sch_block_out IS NOT NULL) AND (diverted IS NOT TRUE) AND (true_cancel IS NOT TRUE) AND (sch_block_out >= '2017-07-23 05:00:00'::timestamp without time zone) AND (sch_block_out <= '2017-07-24 04:59:59'::timestamp without time zone) AND (COALESCE(actualarrivaltime, cancellation) >= actualdeparturetime) AND ((act_block_out - sch_block_out) >= '00:15:00'::interval) AND (((SubPlan 2))::text = 'KORD'::text)) Rows Removed by Filter: 2696593 SubPlan 2 -> Index Scan using bd_airport_pkey on bd_airport bd_airport_1 (cost=0.56..4.58 rows=1 width=20) (actual time=0.003..0.003 rows=1 loops=21652) Index Cond: (id = bdata_forks.origin_id) 9.6: -> Index Scan using bdata_filed_departuretime on bdata_forks (cost=0.57..14894236.06 rows=1 width=36) (actual time=892.664..3025.653 rows=508 loops=1) Index Cond: ((filed_departuretime >= '2017-07-20 05:00:00'::timestamp without time zone) AND (filed_departuretime <= '2017-07-30 04:59:59'::timestamp without time zone)) Filter: (((view_www IS NULL) OR (view_www IS TRUE)) AND (sch_block_out IS NOT NULL) AND (diverted IS NOT TRUE) AND (true_cancel IS NOT TRUE) AND (sch_block_out >= '2017-07-23 05:00:00'::timestamp without time zone) AND (sch_block_out <= '2017-07-24 04:59:59'::timestamp without time zone) AND (COALESCE(actualarrivaltime, cancellation) >= actualdeparturetime) AND ((act_block_out - sch_block_out) >= '00:15:00'::interval) AND (((SubPlan 2))::text = 'KORD'::text)) Rows Removed by Filter: 2696592 SubPlan 2 -> Index Scan using bd_airport_pkey on bd_airport bd_airport_1 (cost=0.56..4.58 rows=1 width=20) (actual time=0.004..0.004 rows=1 loops=21652) Index Cond: (id = bdata_forks.origin_id) -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Regression from 9.4-9.6
Jim Nasbywrites: > I've got a query that's regressed from 9.4 to 9.6. I suspect it has > something to do with the work done around bad plans from single-row > estimates. Why has this indexscan's cost estimate changed so much? >> -> Index Scan using bdata_filed_departuretime on bdata_forks >> (cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 >> rows=508 loops=1) >> -> Index Scan using bdata_filed_departuretime on bdata_forks >> (cost=0.57..14894236.06 rows=1 width=36) (actual time=892.664..3025.653 >> rows=508 loops=1) I think the reason it's discarding the preferable plan is that, with this huge increment in the estimated cost getting added to both alternatives, the two nestloop plans have fuzzily the same total cost, and it's picking the one you don't want on the basis of some secondary criterion. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance