IMHO the problems here are due to poor cardinality estimates.
For example, in the first query, the problem is here:
-> Nested Loop (cost=0.42..2.46 rows=1 width=59)
(actual time=2.431..91.330 rows=3173 loops=1)
-> CTE Scan on b (cost=0.00..0.02 rows=1 width=40)
(actual time=2.407..23.115 rows=3173 loops=1)
-> Index Scan using domains_pkey on domains d
(cost=0.42..2.44 rows=1 width=19)
(actual time=0.018..0.018 rows=1 loops=3173)
That is, the database expects the CTE to return 1 row, but it actually
returns 3173 of them, which makes the nested loop very inefficient: the
inner index scan is executed 3173 times (loops=3173) instead of once.
Similarly for the other query, where this happens:
Nested Loop (cost=88.63..25617.31 rows=491 width=16)
(actual time=3.512..733248.271 rows=1442797 loops=1)
-> HashAggregate (cost=88.06..88.07 rows=1 width=4)
(actual time=3.380..13.561 rows=3043 loops=1)
That is, roughly a 1:3000 misestimate in both cases.
Those estimation errors seem to be caused by a condition that is almost
impossible to estimate, because both queries do this:
groups->0->>'provider' ~ '^something'
That is, a regexp match on an expression. You might try creating an
index on that expression (which is the only way to add statistics for an
expression) and reformulating the condition as LIKE (which I believe we
can estimate better than regular expressions, but I haven't tried).
So something like
CREATE INDEX ON adroom ((groups->0->>'provider'));
WHERE groups->0->>'provider' LIKE 'something%';
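After creating the index, you'd also need to run ANALYZE so the planner
picks up statistics on the indexed expression, and then re-check the
estimates. A rough sketch (the table name and the 'something%' pattern
are just placeholders taken from the snippets above):

ANALYZE adroom;

EXPLAIN ANALYZE
SELECT *
  FROM adroom
 WHERE groups->0->>'provider' LIKE 'something%';

If the estimated row count in the new plan gets anywhere near the actual
one, the planner should stop choosing that nested loop on its own.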
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services