Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

2016-06-22 Thread Erdmann, Markus @ Bellevue

On Jun 20, 2016, at 1:36 PM, David G. Johnston 
mailto:david.g.johns...@gmail.com>> wrote:

​Please don't top-post.

Yes, using "[VACUUM] ANALYZE" is what was meant (the VACUUM doesn't hurt - and 
possibly helps - but wasn't required).  ANALYZE recomputes the statistics for 
your database.  The apparent problem was that those statistics were wrong which 
causes the planner to choose the wrong plan.

EXPLAIN ANALYZE doesn't help in that scenario - the explain and planner both 
end up using the bad statistics.  The addition of ANALYZE to EXPLAIN simply 
tells the system to not only explain your query but to execute it as well (but 
discard the results).  Aside from sharing the same 7 characters the two words 
have nothing in common.

I'm not sure how one would "...includ[e] ANALYZE in EXPLAIN ANALYZE" - you only 
get to explain one statement at a time.

David J.


Thank you, David. My confusion originated from a lack of familiarity with the 
ANALYZE command. Your responses and Tom’s response have been very enlightening.

The head-scratcher for us is that our statistics became so out of date even 
though we have the autovacuum daemon enabled in RDS, and according to the docs 
that does run ANALYZE periodically. Now we know (thanks to your help) to check 
for this issue immediately when the planner is showing a large disparity 
between the estimated and actual cost.

Markus E.


Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

2016-06-20 Thread David G. Johnston
On Fri, Jun 17, 2016 at 5:05 PM, Erdmann, Markus @ Bellevue <
markus.erdm...@cbre.com> wrote:

> Thank you Tom and David for your very helpful replies. We dumped and
> restored the RDS staging database on a local installation of pg and were
> not able to reproduce the issue in 9.5.2, which led us to try running a
> VACUUM ANALYZE and recreating indexes. After this we no longer saw a
> discrepancy between the query plan in the separate environments. Is this
> what you meant, Tom, by making sure to ANALYZE? Or did you mean including
> ANALYZE in EXPLAIN ANALYZE?
>
>
​Please don't top-post.

Yes, using "[VACUUM] ANALYZE" is what was meant (the VACUUM doesn't hurt -
and possibly helps - but wasn't required).  ANALYZE recomputes the
statistics for your database.  The apparent problem was that those
statistics were wrong which causes the planner to choose the wrong plan.

EXPLAIN ANALYZE doesn't help in that scenario - the explain and planner
both end up using the bad statistics.  The addition of ANALYZE to EXPLAIN
simply tells the system to not only explain your query but to execute it as
well (but discard the results).  Aside from sharing the same 7 characters
the two words have nothing in common.

I'm not sure how one would "...includ[e] ANALYZE in EXPLAIN ANALYZE" - you
only get to explain one statement at a time.

David J.


Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

2016-06-20 Thread Erdmann, Markus @ Bellevue
Thank you Tom and David for your very helpful replies. We dumped and
restored the RDS staging database on a local installation of pg and were
not able to reproduce the issue in 9.5.2, which led us to try running a
VACUUM ANALYZE and recreating indexes. After this we no longer saw a
discrepancy between the query plan in the separate environments. Is this
what you meant, Tom, by making sure to ANALYZE? Or did you mean including
ANALYZE in EXPLAIN ANALYZE?


On 6/17/16, 11:17 AM, "Tom Lane"  wrote:

>"Erdmann, Markus @ Bellevue"  writes:
>> We¹re trying to debug a performance issue affecting our staging
>>database, and we¹ve narrowed it down to a difference in the query
>>optimizer in 9.5.2. Upgrading to 9.5 is important for us because we need
>>the ability to import foreign schemas.
>
>I think the core of the problem is the large rowcount estimation error
>here:
>
>>   ->  Bitmap Index Scan on
>>transactions_transaction_c69e55a4  (cost=0.00..18.02 rows=161 width=0)
>>(actual time=20.153..20.153 rows=269021 loops=1)
>> Index Cond: ((date_created >=
>>'2010-01-01'::date) AND (date_created <= '2015-12-31'::date))
>
>That's a pretty simple condition and it's hard to believe that 9.5 does it
>much differently than 9.4 did.  Perhaps you forgot to ANALYZE, or were
>using a larger statistics target in the 9.4 installation?
>
>   regards, tom lane



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

2016-06-17 Thread Tom Lane
"Erdmann, Markus @ Bellevue"  writes:
> We’re trying to debug a performance issue affecting our staging database, and 
> we’ve narrowed it down to a difference in the query optimizer in 9.5.2. 
> Upgrading to 9.5 is important for us because we need the ability to import 
> foreign schemas.

I think the core of the problem is the large rowcount estimation error
here:

>   ->  Bitmap Index Scan on 
> transactions_transaction_c69e55a4  (cost=0.00..18.02 rows=161 width=0) 
> (actual time=20.153..20.153 rows=269021 loops=1)
> Index Cond: ((date_created >= 
> '2010-01-01'::date) AND (date_created <= '2015-12-31'::date))

That's a pretty simple condition and it's hard to believe that 9.5 does it
much differently than 9.4 did.  Perhaps you forgot to ANALYZE, or were
using a larger statistics target in the 9.4 installation?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

2016-06-17 Thread David G. Johnston
On Thu, Jun 16, 2016 at 8:40 PM, Erdmann, Markus @ Bellevue <
markus.erdm...@cbre.com> wrote:

> Hello,
>
> We’re trying to debug a performance issue affecting our staging database,
> and we’ve narrowed it down to a difference in the query optimizer in 9.5.2.
> Upgrading to 9.5 is important for us because we need the ability to import
> foreign schemas.
>
> This is the query we’re running:
>
> CREATE TEMP TABLE tmp_joined_transactions_9gkgptn5xcp9 ( transaction_id
> integer PRIMARY KEY );
>
> INSERT INTO tmp_joined_transactions_9gkgptn5xcp9 (transaction_id)
>

​This...
​

> SELECT DISTINCT ON ("transactions_transaction"."id")
> "transactions_transaction"."id"
>

​​DISTINCT is a code smell.  DISTINCT ON less so - it helps to avoid
self-joins - but your inclusion of ON here is pointless since the only
output column is "id".

​As written there should be no way to get duplicate "id"s into the output
result.  Or, if the tmp_joined_transactions relationship is 1-to-many you
should instead use a semi-join instead of an inner join.

FROM "transactions_transaction" , "tmp_joined_transactions_75chlsokrsev"
> WHERE
>

​Here...​

(NOT ("transactions_transaction"."id"
> IN (SELECT U0."id" AS Col1
> FROM "transactions_transaction" U0
> LEFT OUTER JOIN "transactions_commission" U1
> ON ( U0."id" = U1."transaction_id" )
> WHERE U1."id" IS NULL))
>

​Making this an anti-join (NOT EXISTS instead of NOT IN) stands out as an
easy improvement to try:

​It also makes the logic clearer since you seem to have a double-negative
here which means you really want a semi-join (which I wrote below)

WHERE EXISTS (SELECT 1 FROM transactions_commission WHERE
transactions_transaction.id = transactions_commission.transaction_id)

I won't promise this gives the same answer...I don't have enough spare
brain power or the ability to test it...but its seems correct.

AND "transactions_transaction"."date_created" >= '2010-01-01'::date
> AND "transactions_transaction"."date_created" <= '2015-12-31'::date
> AND "transactions_transaction"."deal_status" IN (1)
>

​Also...
​


> AND (transactions_transaction.id =
> tmp_joined_transactions_75chlsokrsev.transaction_id))
>

​This is style but I'm really a fan of using ANSI JOIN syntax...turning the
above into a 

ORDER BY "transactions_transaction"."id" ASC;
>
>
​The regression itself someone else would need to comment on.

David J.


[GENERAL] Regression in query optimizer when upgrading from 9.4.5 to 9.5.2?

2016-06-17 Thread Erdmann, Markus @ Bellevue
Hello,

We’re trying to debug a performance issue affecting our staging database, and 
we’ve narrowed it down to a difference in the query optimizer in 9.5.2. 
Upgrading to 9.5 is important for us because we need the ability to import 
foreign schemas.

This is the query we’re running:

CREATE TEMP TABLE tmp_joined_transactions_9gkgptn5xcp9 ( transaction_id integer 
PRIMARY KEY );

INSERT INTO tmp_joined_transactions_9gkgptn5xcp9 (transaction_id)
SELECT DISTINCT ON ("transactions_transaction"."id") 
"transactions_transaction"."id"
FROM "transactions_transaction" , "tmp_joined_transactions_75chlsokrsev"
WHERE (NOT ("transactions_transaction"."id"
IN (SELECT U0."id" AS Col1
FROM "transactions_transaction" U0
LEFT OUTER JOIN "transactions_commission" U1
ON ( U0."id" = U1."transaction_id" )
WHERE U1."id" IS NULL))
AND "transactions_transaction"."date_created" >= '2010-01-01'::date
AND "transactions_transaction"."date_created" <= '2015-12-31'::date
AND "transactions_transaction"."deal_status" IN (1)
AND (transactions_transaction.id = 
tmp_joined_transactions_75chlsokrsev.transaction_id))
ORDER BY "transactions_transaction"."id" ASC;

And this is the EXPLAIN ANALYZE output in 9.5.2:

Insert on tmp_joined_transactions_9gkgptn5xcp9  (cost=42111.29..42991.60 rows=1 
width=4) (actual time=470236.029..470236.029 rows=0 loops=1)
  ->  Unique  (cost=42111.29..42991.59 rows=1 width=4) (actual 
time=1109.636..470222.609 rows=4652 loops=1)
->  Nested Loop  (cost=42111.29..42991.59 rows=1 width=4) (actual 
time=1109.635..470219.259 rows=4652 loops=1)
  Join Filter: (transactions_transaction.id = 
tmp_joined_transactions_75chlsokrsev.transaction_id)
  Rows Removed by Join Filter: 1153472704
  ->  Index Only Scan using 
tmp_joined_transactions_75chlsokrsev_pkey on 
tmp_joined_transactions_75chlsokrsev  (cost=0.28..224.61 rows=5355 width=4) 
(actual time=0.016..4.829 rows=4652 loops=1)
Heap Fetches: 4652
  ->  Materialize  (cost=42111.01..42686.66 rows=1 width=4) (actual 
time=0.240..51.639 rows=247953 loops=4652)
->  Bitmap Heap Scan on transactions_transaction  
(cost=42111.01..42686.65 rows=1 width=4) (actual time=1109.580..1238.034 
rows=247953 loops=1)
  Recheck Cond: ((date_created >= '2010-01-01'::date) 
AND (date_created <= '2015-12-31'::date))
  Filter: ((NOT (hashed SubPlan 1)) AND (deal_status = 
1))
  Rows Removed by Filter: 21068
  Heap Blocks: exact=8073
  ->  Bitmap Index Scan on 
transactions_transaction_c69e55a4  (cost=0.00..18.02 rows=161 width=0) (actual 
time=20.153..20.153 rows=269021 loops=1)
Index Cond: ((date_created >= 
'2010-01-01'::date) AND (date_created <= '2015-12-31'::date))
  SubPlan 1
->  Hash Right Join  (cost=8799.57..42092.98 rows=1 
width=4) (actual time=852.012..1086.671 rows=3395 loops=1)
  Hash Cond: (u1.transaction_id = u0.id)
  Filter: (u1.id IS NULL)
  Rows Removed by Filter: 995248
  ->  Seq Scan on transactions_commission u1  
(cost=0.00..27020.63 rows=1039763 width=8) (actual time=0.003..221.750 
rows=1039763 loops=1)
  ->  Hash  (cost=8395.92..8395.92 rows=32292 
width=4) (actual time=153.984..153.984 rows=272663 loops=1)
Buckets: 131072 (originally 32768)  
Batches: 4 (originally 1)  Memory Usage: 3425kB
->  Seq Scan on 
transactions_transaction u0  (cost=0.00..8395.92 rows=32292 width=4) (actual 
time=0.003..71.988 rows=272663 loops=1)
Planning time: 0.410 ms
Execution time: 470237.400 ms

And this is the EXPLAIN ANALYZE output in 9.4.5:

Insert on tmp_joined_transactions_9gkgptn5xcp9  (cost=88544.31..88576.62 
rows=2154 width=4) (actual time=1356.065..1356.065 rows=0 loops=1)
  ->  Unique  (cost=88544.31..88555.08 rows=2154 width=4) (actual 
time=1347.480..1350.548 rows=4715 loops=1)
->  Sort  (cost=88544.31..88549.70 rows=2154 width=4) (actual 
time=1347.477..1348.432 rows=4715 loops=1)
  Sort Key: transactions_transaction.id
  Sort Method: quicksort  Memory: 414kB
  ->  Hash Join  (cost=72700.01..88425.06 rows=2154 width=4) 
(actual time=1107.077..1345.650 rows=4715 loops=1)
Hash Cond: (transactions_transaction.id = 
tmp_joined_transactions_75chlsokrsev.transaction_id)
->  Seq Scan on transactions_transaction  
(cost=72565.61..87199.11 rows=122287 width=4) (actual time=1104.855..1269.783 
rows=251736 loops=1)
  Filter: ((NOT (hashed SubPlan 1)) AND (date_created 
>= '2010-01-01'::date) AND (date_created <= '2015-12-31'::date) AND 
(deal_st