På søndag 07. august 2016 kl. 18:35:56, skrev Tom Lane <t...@sss.pgh.pa.us 
<mailto:t...@sss.pgh.pa.us>>:
Dean Rasheed <dean.a.rash...@gmail.com> writes:
 > On 5 August 2016 at 21:48, Tom Lane <t...@sss.pgh.pa.us> wrote:
 >> OK, thanks.  What shall we do about Andreas' request to back-patch this?
 >> I'm personally willing to do it, but there is the old bugaboo of "maybe
 >> it will destabilize a plan that someone is happy with".

 > My inclination would be to back-patch it because arguably it's a
 > bug-fix -- at the very least the old behaviour didn't match the docs
 > for stadistinct:

 Yeah.  I suspect that situations like this are pretty rare, or we'd have
 recognized the problem sooner.  And at least for Andreas, it'd be fixing
 a real problem.  I'll apply the back-patch unless I hear objections in
 the next couple of hours.

 > Additionally, I think that example is misleading because it's only
 > really true if there are no null values in the column. Perhaps it
 > would help to have a more explicit example to illustrate how nulls
 > affect stadistinct, for example:

 Good idea, will do.

 regards, tom lane
 
 
(Sorry for the HTML-formatting, I'm using a web-based client which isn't very 
good at converting the HTML to plain-text)
 
Just wanted to thank you for fixing this.
 
Take this query (pg-9.5):
 
explain analyze SELECT com.entity_id AS company_id , sum((coalesce(log
.adjusted_time,log.duration) / (3600 * 1000)::numeric) * coalesce(log
.adjusted_hourly_rate,log.hourly_rate)) as not_invoiced_hours_amount_adjusted , 
sum((log.duration / (3600 * 1000)::numeric) * log.hourly_rate) as 
not_invoiced_hours_amount_original ,sum(coalesce(log.adjusted_time, log
.duration))AS not_invoiced_hours_duration_adjusted , sum(log.duration) AS 
not_invoiced_hours_duration_originalFROM onp_crm_relation com JOIN 
onp_crm_activity_loglog ON com.entity_id = log.relation_id JOIN onp_crm_person 
logforON logfor.onp_user_id = log.logged_for AND logfor.is_resource = FALSE  
LEFT OUTER JOIN( onp_crm_activity act INNER JOIN onp_crm_project proj ON 
act.project_id = proj.id )ON log.activity_id = act.id WHERE 1 = 1 AND log
.entry_type ='TIMESHEET_ENTRY' AND log.start_date IS NOT NULL  AND log
.is_chargable =TRUE  AND log.status IN (1,2,3,5) AND log.start_date <= 
'2016-06-27' -- AND com.entity_id = 2246068 AND NOT EXISTS( SELECT * FROM 
onp_crm_calendarentry_invoice_membership cemJOIN onp_crm_invoice_line il ON 
cem.invoice_line_id = il.idJOIN onp_crm_invoice inv ON il.invoice_id = 
inv.entity_idWHERE cem.calendar_entry_id = log.id AND inv.status_key = 
'INVOICE_STATUS_INVOICED' AND inv.sent_date <= '2016-06-27' AND NOT EXISTS( 
SELECT* FROM onp_crm_invoice creditnote WHERE il.invoice_id = 
creditnote.credit_againstAND creditnote.status_key = 'INVOICE_STATUS_INVOICED' 
ANDcreditnote.sent_date <= '2016-06-27' ) ) GROUP BY com.entity_id; 
 
Before the fix, the plan was like this:
 
QUERY PLAN 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=13874.94..13894.38 rows=972 width=32) (actual time
=420188.465..420189.232rows=462 loops=1) Group Key: com.entity_id -> Nested 
Loop Left Join(cost=1710.21..13848.21 rows=972 width=32) (actual time
=1004.858..420166.736rows=3882 loops=1) -> Nested Loop Anti Join (cost
=1709.51..12990.02rows=972 width=36) (actual time=1003.001..419356.749 rows=3882
loops=1) Join Filter: (cem.calendar_entry_id = log.id) Rows Removed by Join 
Filter: 3996874113 -> Hash Join (cost=1172.69..10612.53 rows=972 width=44) 
(actualtime=29.085..333.323 rows=89431 loops=1) Hash Cond: (log.relation_id = 
com.entity_id) ->Hash Join (cost=1063.39..10489.85 rows=976 width=40) (actual 
time=27.598..261.822 rows=89431 loops=1) Hash Cond: (log.logged_for = 
logfor.onp_user_id) -> Bitmap Heap Scanon onp_crm_activity_log log (cost
=976.83..10055.99rows=90010 width=44) (actual time=27.402..223.407 rows=89486 
loops=1) Recheck Cond: ((status = ANY ('{1,2,3,5}'::integer[])) AND is_chargable
AND(start_date IS NOT NULL) AND ((entry_type)::text = 'TIMESHEET_ENTRY'::text)) 
Filter: (start_date <= '2016-06-27'::date) Rows Removed by Filter: 991 Heap 
Blocks: exact=6095 -> Bitmap Index Scan on origo_calendar_entry_status_1_idx (
cost=0.00..954.33 rows=93008 width=0) (actual time=26.339..26.339 rows=101274 
loops=1) Index Cond: (status = ANY ('{1,2,3,5}'::integer[])) -> Hash (cost
=39.46..39.46rows=3768 width=8) (actual time=0.148..0.148 rows=36 loops=1) 
Buckets:4096 Batches: 1 Memory Usage: 34kB -> Bitmap Heap Scan on 
onp_crm_person logfor (cost=3.69..39.46 rows=3768 width=8) (actual time
=0.049..0.137rows=36 loops=1) Recheck Cond: (onp_user_id IS NOT NULL) Filter: (
NOTis_resource) Rows Removed by Filter: 5 Heap Blocks: exact=11 -> Bitmap Index 
Scanon onp_crm_person_onp_id_idx (cost=0.00..2.75 rows=41 width=0) (actual time
=0.021..0.021rows=41 loops=1) -> Hash (cost=80.66..80.66 rows=2291 width=8) 
(actualtime=1.449..1.449 rows=2292 loops=1) Buckets: 4096 Batches: 1 Memory 
Usage: 122kB -> Index Only Scan using onp_crm_relation_pkey on onp_crm_relation 
com (cost=0.28..80.66 rows=2291 width=8) (actual time=0.221..1.036 rows=2292 
loops=1) Heap Fetches: 1127 -> Materialize (cost=536.82..2362.91 rows=1 width=4
) (actualtime=0.000..1.564 rows=44693 loops=89431) -> Nested Loop (cost
=536.82..2362.91rows=1 width=4) (actual time=10.480..382.816 rows=85668 loops=1
) -> NestedLoop (cost=536.53..2362.59 rows=1 width=8) (actual time
=10.437..176.495rows=88731 loops=1) -> Hash Anti Join (cost=536.24..2362.05 rows
=1 width=8) (actual time=9.914..44.329 rows=76347 loops=1) Hash Cond: 
(il.invoice_id = creditnote.credit_against) ->Index Only Scan using 
origo_invoice_line_id_invoice_idxon onp_crm_invoice_line il (cost=0.29..1530.95 
rows=78707 width=8) (actual time=0.213..20.450 rows=78787 loops=1) Heap Fetches:
3131 -> Hash (cost=390.36..390.36 rows=11647 width=8) (actual time=9.589..9.589 
rows=372 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 143kB -> Index Only 
Scanusing origo_invoice_credit_against_status_sent_idx on onp_crm_invoice 
creditnote (cost=0.29..390.36 rows=11647 width=8) (actual time=0.205..8.953 rows
=11507 loops=1) Index Cond: ((status_key = 'INVOICE_STATUS_INVOICED'::text) AND 
(sent_date <='2016-06-27'::date)) Heap Fetches: 999 -> Index Only Scan using 
onp_crm_calendarentry_invoice_invoice_line_id_calendar_entr_keyon 
onp_crm_calendarentry_invoice_membership cem (cost=0.29..0.45 rows=9 width=8) 
(actualtime=0.001..0.001 rows=1 loops=76347) Index Cond: (invoice_line_id = 
il.id) Heap Fetches:4951 -> Index Only Scan using 
origo_invoice_id_status_sent_idxon onp_crm_invoice inv (cost=0.29..0.31 rows=1 
width=8) (actual time=0.002..0.002 rows=1 loops=88731) Index Cond: ((entity_id 
= il.invoice_id)AND (status_key = 'INVOICE_STATUS_INVOICED'::text) AND 
(sent_date <='2016-06-27'::date)) Heap Fetches: 9293 -> Nested Loop (cost
=0.71..0.87rows=1 width=4) (actual time=0.207..0.207 rows=1 loops=3882) -> Index
Scanusing onp_crm_activity_pkey on onp_crm_activity act (cost=0.42..0.56 rows=1 
width=8) (actual time=0.199..0.199 rows=1 loops=3882) Index Cond: (log
.activity_id = id) ->Index Only Scan using onp_crm_project_pkey on 
onp_crm_project proj (cost=0.29..0.30 rows=1 width=4) (actual time=0.006..0.006 
rows=1 loops=3882) Index Cond: (id = act.project_id) Heap Fetches: 2782 Planning
time: 33.132 ms Execution time: 420195.041 ms (55 rows) 
 
After the fix (using pg-9.5.4), the plan is this:
 
QUERY PLAN 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=44223.20..44269.02 rows=2291 width=32) (actual time
=604.480..604.729rows=462 loops=1) Group Key: com.entity_id -> Hash Left Join (
cost=40805.80..42869.54 rows=49224 width=32) (actual time=591.912..597.284 rows=
3882loops=1) Hash Cond: (log.activity_id = act.id) -> Hash Join (cost
=24077.93..25498.04rows=49224 width=36) (actual time=309.816..313.542 rows=3882 
loops=1) Hash Cond: (log.relation_id = com.entity_id) -> Merge Join (cost
=23968.63..24711.11rows=49435 width=32) (actual time=304.949..307.474 rows=3882 
loops=1) Merge Cond: (logfor.onp_user_id = log.logged_for) -> Index Scan using 
onp_crm_person_onp_id_idxon onp_crm_person logfor (cost=0.14..41.05 rows=3768 
width=8) (actual time=0.194..1.321 rows=36 loops=1) Filter: (NOT is_resource) 
RowsRemoved by Filter: 5 -> Sort (cost=23968.49..24092.29 rows=49520 width=36) 
(actualtime=304.579..305.137 rows=3893 loops=1) Sort Key: log.logged_for Sort 
Method: quicksort Memory:401kB -> Hash Anti Join (cost=9997.74..20106.99 rows=
49520width=36) (actual time=147.213..302.609 rows=3893 loops=1) Hash Cond: (log
.id = cem.calendar_entry_id) -> Bitmap Heap Scanon onp_crm_activity_log log (
cost=976.83..10055.99 rows=90010 width=44) (actual time=28.286..158.599 rows=
89486loops=1) Recheck Cond: ((status = ANY ('{1,2,3,5}'::integer[])) AND 
is_chargableAND (start_date IS NOT NULL) AND ((entry_type)::text = 
'TIMESHEET_ENTRY'::text)) Filter: (start_date <= '2016-06-27'::date) Rows 
Removedby Filter: 991 Heap Blocks: exact=6095 -> Bitmap Index Scan on 
origo_calendar_entry_status_1_idx (cost=0.00..954.33 rows=93008 width=0) (actual
time=27.251..27.251 rows=101274 loops=1) Index Cond: (status = ANY ('{1,2,3,5}'
::integer[])) -> Hash (cost=7964.73..7964.73 rows=84494 width=4) (actual time
=118.106..118.106rows=85668 loops=1) Buckets: 131072 Batches: 1 Memory Usage: 
4036kB -> Hash Join (cost=1069.29..7964.73 rows=84494 width=4) (actual time
=15.605..102.721rows=85668 loops=1) Hash Cond: (il.invoice_id = inv.entity_id) 
->Hash Anti Join (cost=536.55..6265.54 rows=85741 width=8) (actual time
=5.467..71.124rows=88731 loops=1) Hash Cond: (il.invoice_id = 
creditnote.credit_against) -> MergeJoin (cost=0.61..4501.71 rows=89266 width=8) 
(actualtime=0.441..53.014 rows=89301 loops=1) Merge Cond: (il.id = 
cem.invoice_line_id) ->Index Only Scan using origo_invoice_line_id_invoice_idx 
ononp_crm_invoice_line il (cost=0.29..1530.95 rows=78707 width=8) (actual time
=0.226..15.901rows=78787 loops=1) Heap Fetches: 3131 -> Index Only Scan using 
onp_crm_calendarentry_invoice_invoice_line_id_calendar_entr_keyon 
onp_crm_calendarentry_invoice_membership cem (cost=0.29..1662.98 rows=89266 
width=8) (actual time=0.194..16.489 rows=89301 loops=1) Heap Fetches: 5027 -> 
Hash(cost=390.36..390.36 rows=11647 width=8) (actual time=4.785..4.785 rows=372 
loops=1) Buckets: 16384 Batches: 1 Memory Usage: 143kB -> Index Only Scan using 
origo_invoice_credit_against_status_sent_idxon onp_crm_invoice creditnote (cost
=0.29..390.36rows=11647 width=8) (actual time=0.020..4.213 rows=11507 loops=1) 
IndexCond: ((status_key = 'INVOICE_STATUS_INVOICED'::text) AND (sent_date <= 
'2016-06-27'::date)) Heap Fetches: 999 -> Hash (cost=387.15..387.15 rows=11647 
width=8) (actual time=10.063..10.063 rows=11507 loops=1) Buckets: 16384 Batches:
1Memory Usage: 578kB -> Index Only Scan using origo_invoice_id_status_sent_idx 
ononp_crm_invoice inv (cost=0.29..387.15 rows=11647 width=8) (actual time
=0.233..8.133rows=11507 loops=1) Index Cond: ((status_key = 
'INVOICE_STATUS_INVOICED'::text) AND (sent_date <= '2016-06-27'::date)) Heap 
Fetches:999 -> Hash (cost=80.66..80.66 rows=2291 width=8) (actual time
=4.831..4.831rows=2292 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 122kB -> 
Index OnlyScan using onp_crm_relation_pkey on onp_crm_relation com (cost
=0.28..80.66rows=2291 width=8) (actual time=0.235..4.370 rows=2292 loops=1) 
Heap Fetches:1127 -> Hash (cost=14012.31..14012.31 rows=217245 width=4) (actual 
time=281.015..281.015 rows=220147 loops=1) Buckets: 262144 Batches: 1 Memory 
Usage: 9788kB -> Hash Join (cost=433.74..14012.31 rows=217245 width=4) (actual 
time=29.243..235.289 rows=220147 loops=1) Hash Cond: (act.project_id = proj.id) 
-> Seq Scanon onp_crm_activity act (cost=0.00..10591.45 rows=217245 width=8) 
(actualtime=0.186..139.016 rows=220147 loops=1) -> Hash (cost=262.35..262.35 
rows=13711 width=4) (actual time=28.980..28.980 rows=13760 loops=1) Buckets: 
16384Batches: 1 Memory Usage: 612kB -> Index Only Scan using 
onp_crm_project_pkeyon onp_crm_project proj (cost=0.29..262.35 rows=13711 width=
4) (actual time=0.067..26.575 rows=13760 loops=1) Heap Fetches: 6146 Planning 
time: 41.021 ms Execution time: 605.530 ms (60 rows) 
 
That's 420195 / 605 = 695x speedup!
Thanks!
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 

Reply via email to