Re: [PERFORM] Regression from 9.4-9.6

2017-10-08 Thread Jim Nasby

On 10/8/17 3:37 PM, Tom Lane wrote:

Jim Nasby  writes:

On 10/8/17 2:34 PM, Tom Lane wrote:

Why has this indexscan's cost estimate changed so much?



Great question... the only thing that sticks out is the coalesce(). Let
me see if an analyze with a higher stats target changes anything. FWIW,
the 9.6 database is copied from the 9.4 one once a week and then
pg_upgraded. I'm pretty sure an ANALYZE is part of that process.


Hm, now that I see the SubPlan in there, I wonder whether 9.6 is
accounting more conservatively for the cost of the subplan.  It
probably is assuming that the subplan gets run for each row fetched
from the index, although the loops and rows-removed counts show
that the previous filter conditions reject 99% of the fetched rows.

But that code looks the same in 9.4, so I don't understand why
the 9.4 estimate isn't equally large ...


Besides the analyze issue, the other part of this is

asdid...@graceful.hou/20106> select 
pg_size_pretty(pg_relation_size('bdata_forks'));

 pg_size_pretty

 106 GB
(1 row)

asdid...@graceful.hou/20106> select relpages::bigint*8192/reltuples from 
pg_class where relname='bdata_forks';

 ?column?
--
 185.559397863791
(1 row)

With an effective_cache_size of 200GB that's not really helping things. 
But it's also another example of the planner's reluctance towards index 
scans.

--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Regression from 9.4-9.6

2017-10-08 Thread Tom Lane
Jim Nasby  writes:
> On 10/8/17 2:34 PM, Tom Lane wrote:
>> Why has this indexscan's cost estimate changed so much?

> Great question... the only thing that sticks out is the coalesce(). Let 
> me see if an analyze with a higher stats target changes anything. FWIW, 
> the 9.6 database is copied from the 9.4 one once a week and then 
> pg_upgraded. I'm pretty sure an ANALYZE is part of that process.

Hm, now that I see the SubPlan in there, I wonder whether 9.6 is
accounting more conservatively for the cost of the subplan.  It
probably is assuming that the subplan gets run for each row fetched
from the index, although the loops and rows-removed counts show
that the previous filter conditions reject 99% of the fetched rows.

But that code looks the same in 9.4, so I don't understand why
the 9.4 estimate isn't equally large ...

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Regression from 9.4-9.6

2017-10-08 Thread Jim Nasby

On 10/8/17 3:02 PM, Jim Nasby wrote:


-> Index Scan using bdata_filed_departuretime on bdata_forks 
(cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 
rows=508 loops=1)


-> Index Scan using bdata_filed_departuretime on bdata_forks 
(cost=0.57..14894236.06 rows=1 width=36) (actual 
time=892.664..3025.653 rows=508 loops=1)


I think the reason it's discarding the preferable plan is that, with this
huge increment in the estimated cost getting added to both alternatives,
the two nestloop plans have fuzzily the same total cost, and it's picking
the one you don't want on the basis of some secondary criterion.


Great question... the only thing that sticks out is the coalesce(). Let 
me see if an analyze with a higher stats target changes anything. FWIW, 
the 9.6 database is copied from the 9.4 one once a week and then 
pg_upgraded. I'm pretty sure an ANALYZE is part of that process.


Turns out that analyze is the 'problem'. On the 9.4 database, pg_stats 
shows that the newest date in filed_departuretime is 3/18/2017, while 
the 9.6 database is up-to-date. If I change the query to use 2/9/2018 
instead of 7/20/2017 I get the same results.


So, the larger cost estimate is theoretically more correct. If I set 
random_page_cost = 1 I end up with a good plan.

--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Regression from 9.4-9.6

2017-10-08 Thread Jim Nasby

On 10/8/17 2:34 PM, Tom Lane wrote:

Jim Nasby  writes:

I've got a query that's regressed from 9.4 to 9.6. I suspect it has
something to do with the work done around bad plans from single-row
estimates.


Why has this indexscan's cost estimate changed so much?


->  Index Scan using bdata_filed_departuretime on bdata_forks  
(cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 rows=508 
loops=1)



->  Index Scan using bdata_filed_departuretime on bdata_forks  
(cost=0.57..14894236.06 rows=1 width=36) (actual time=892.664..3025.653 rows=508 
loops=1)


I think the reason it's discarding the preferable plan is that, with this
huge increment in the estimated cost getting added to both alternatives,
the two nestloop plans have fuzzily the same total cost, and it's picking
the one you don't want on the basis of some secondary criterion.


Great question... the only thing that sticks out is the coalesce(). Let 
me see if an analyze with a higher stats target changes anything. FWIW, 
the 9.6 database is copied from the 9.4 one once a week and then 
pg_upgraded. I'm pretty sure an ANALYZE is part of that process.


9.4:

 ->  Index Scan using bdata_filed_departuretime on bdata_forks  
(cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 rows=508 
loops=1)
   Index Cond: ((filed_departuretime >= '2017-07-20 
05:00:00'::timestamp without time zone) AND (filed_departuretime <= '2017-07-30 
04:59:59'::timestamp without time zone))
   Filter: (((view_www IS NULL) OR (view_www IS TRUE)) AND (sch_block_out IS 
NOT NULL) AND (diverted IS NOT TRUE) AND (true_cancel IS NOT TRUE) AND (sch_block_out >= 
'2017-07-23 05:00:00'::timestamp without time zone) AND (sch_block_out <= '2017-07-24 
04:59:59'::timestamp without time zone) AND (COALESCE(actualarrivaltime, cancellation) 
>= actualdeparturetime) AND ((act_block_out - sch_block_out) >= '00:15:00'::interval) 
AND (((SubPlan 2))::text = 'KORD'::text))
   Rows Removed by Filter: 2696593
   SubPlan 2
 ->  Index Scan using bd_airport_pkey on bd_airport 
bd_airport_1  (cost=0.56..4.58 rows=1 width=20) (actual time=0.003..0.003 rows=1 
loops=21652)
   Index Cond: (id = bdata_forks.origin_id)


9.6:

 ->  Index Scan using bdata_filed_departuretime on bdata_forks  
(cost=0.57..14894236.06 rows=1 width=36) (actual time=892.664..3025.653 rows=508 
loops=1)
   Index Cond: ((filed_departuretime >= '2017-07-20 
05:00:00'::timestamp without time zone) AND (filed_departuretime <= '2017-07-30 
04:59:59'::timestamp without time zone))
   Filter: (((view_www IS NULL) OR (view_www IS TRUE)) AND (sch_block_out IS 
NOT NULL) AND (diverted IS NOT TRUE) AND (true_cancel IS NOT TRUE) AND (sch_block_out >= 
'2017-07-23 05:00:00'::timestamp without time zone) AND (sch_block_out <= '2017-07-24 
04:59:59'::timestamp without time zone) AND (COALESCE(actualarrivaltime, cancellation) 
>= actualdeparturetime) AND ((act_block_out - sch_block_out) >= '00:15:00'::interval) 
AND (((SubPlan 2))::text = 'KORD'::text))
   Rows Removed by Filter: 2696592
   SubPlan 2
 ->  Index Scan using bd_airport_pkey on bd_airport 
bd_airport_1  (cost=0.56..4.58 rows=1 width=20) (actual time=0.004..0.004 rows=1 
loops=21652)
   Index Cond: (id = bdata_forks.origin_id)



--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Regression from 9.4-9.6

2017-10-08 Thread Tom Lane
Jim Nasby  writes:
> I've got a query that's regressed from 9.4 to 9.6. I suspect it has 
> something to do with the work done around bad plans from single-row 
> estimates.

Why has this indexscan's cost estimate changed so much?

>> ->  Index Scan using bdata_filed_departuretime on bdata_forks  
>> (cost=0.57..50807.51 rows=1 width=36) (actual time=979.381..3207.777 
>> rows=508 loops=1)

>> ->  Index Scan using bdata_filed_departuretime on bdata_forks  
>> (cost=0.57..14894236.06 rows=1 width=36) (actual time=892.664..3025.653 
>> rows=508 loops=1)

I think the reason it's discarding the preferable plan is that, with this
huge increment in the estimated cost getting added to both alternatives,
the two nestloop plans have fuzzily the same total cost, and it's picking
the one you don't want on the basis of some secondary criterion.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance