Turns out we had not 9.6 but 9.5. And query plan from 9.5 is:
Sort (cost=319008.18..319008.19 rows=1 width=556) (actual time=0.028..0.028 rows=0 loops=1) Sort Key: (sum(st.shows)) DESC Sort Method: quicksort Memory: 25kB CTE a -> Index Scan using adroom_active_idx on adroom (cost=0.13..5.21 rows=1 width=584) (actual time=0.004..0.004 rows=0 loops=1) Index Cond: ((now() >= start_ts) AND (now() <= stop_ts)) Filter: (((groups -> 0) ->> 'provider'::text) ~ '^target_mail_ru'::text) CTE b -> HashAggregate (cost=1.27..1.77 rows=100 width=68) (actual time=0.005..0.005 rows=0 loops=1) Group Key: a.provider, a.id, unnest(a.domain_ids) -> CTE Scan on a (cost=0.00..0.52 rows=100 width=68) (actual time=0.004..0.004 rows=0 loops=1) -> HashAggregate (cost=319001.17..319001.18 rows=1 width=556) (actual time=0.013..0.013 rows=0 loops=1) Group Key: b.provider, d.domain -> Hash Join (cost=16.55..319001.16 rows=1 width=556) (actual time=0.013..0.013 rows=0 loops=1) Hash Cond: ((st.adroom_id = b.id) AND (st.domain_id = b.domain_id)) -> Hash Join (cost=13.05..318633.29 rows=48581 width=536) (never executed) Hash Cond: (st.domain_id = d.id) -> Index Scan using adroom_stat_day_adroom_id_domain_id_url_id_is_wlabp_idx on adroom_stat st (cost=0.58..313307.30 rows=1287388 width=16) (never executed) Index Cond: ((day >= date_trunc('day'::text, (now() - '7 days'::interval))) AND (day <= date_trunc('day'::text, now()))) -> Hash (cost=11.10..11.10 rows=110 width=520) (never executed) -> Seq Scan on domains d (cost=0.00..11.10 rows=110 width=520) (never executed) -> Hash (cost=2.00..2.00 rows=100 width=40) (actual time=0.007..0.007 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> CTE Scan on b (cost=0.00..2.00 rows=100 width=40) (actual time=0.007..0.007 rows=0 loops=1) Planning time: 6.641 ms Execution time: 0.203 ms Also I prepared test case for Tom and sent it to him. Dmitry Shalashov, relap.io & surfingbird.ru 2017-11-22 18:19 GMT+03:00 Tom Lane <t...@sss.pgh.pa.us>: > Dmitry Shalashov <skau...@gmail.com> writes: > > BUT if I'll add to 3rd query one additional condition, which is basically > > 2nd query, it will ran same 12 minutes: > > SELECT adroom_id, domain_id, shows, clicks FROM adroom_stat WHERE day > > between date_trunc('day', current_timestamp - interval '1 week') and > > date_trunc('day', current_timestamp) AND domain_id IN (select distinct > > unnest(domain_ids) FROM (select title, id, groups->0->>'provider' > provider, > > domain_ids from adroom where groups->0->>'provider' ~ '^target_mail_ru' > and > > not is_paused and current_timestamp between start_ts and stop_ts) t1) > > > Plan of last query: > > Nested Loop (cost=88.63..25617.31 rows=491 width=16) (actual > > time=3.512..733248.271 rows=1442797 loops=1) > > -> HashAggregate (cost=88.06..88.07 rows=1 width=4) (actual > > time=3.380..13.561 rows=3043 loops=1) > > Group Key: (unnest(adroom.domain_ids)) > > -> HashAggregate (cost=88.03..88.04 rows=1 width=4) (actual > > time=2.199..2.607 rows=3043 loops=1) > > Group Key: unnest(adroom.domain_ids) > > -> ProjectSet (cost=0.28..87.78 rows=100 width=4) > (actual > > time=0.701..1.339 rows=3173 loops=1) > > Hm, seems like the problem is that that lower HashAggregate is estimated > as having only one row out, which is way off and doesn't sound like a > particularly bright default estimate anyway. (And then we're doing an > additional HashAggregate on top of that, which is useless --- implies > that something isn't realizing that the output of the SELECT DISTINCT > is already distinct.) > > I'm suspicious that this is breakage from the work that was done on > targetlist SRFs in v10, but that's just a guess at this point. > > Trying simple test queries involving WHERE x IN (SELECT DISTINCT > unnest(foo) FROM ...), I do not see a behavior like this, so there is some > not-very-obvious contributing factor in your situation. Can you put > together a self-contained test case that produces a bogus one-row > estimate? Extra points if it produces duplicate HashAgg steps. > > regards, tom lane >