IMHO the problems here are due to poor cardinality estimates.

For example in the first query, the problem is here:

    ->  Nested Loop  (cost=0.42..2.46 rows=1 width=59)
                     (actual time=2.431..91.330 rows=3173 loops=1)
        ->  CTE Scan on b  (cost=0.00..0.02 rows=1 width=40)
                           (actual time=2.407..23.115 rows=3173 loops=1)
        ->  Index Scan using domains_pkey on domains d
            (cost=0.42..2.44 rows=1 width=19)
            (actual time=0.018..0.018 rows=1 loops=3173)

That is, the database expects the CTE to return 1 row, but it returns
3173 of them, which makes the nested loop very inefficient.
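If fixing the estimate turns out to be hard, a common workaround is to materialize the CTE into a temp table and ANALYZE it, so the planner joins with an accurate row count. A rough sketch (the CTE body and the join key are not shown in the plan, so the names here are assumed):

    -- materialize the CTE so the planner sees the real row count
    -- instead of the default estimate of 1 row
    CREATE TEMP TABLE b_tmp AS
    SELECT ... ;        -- the original CTE body goes here

    ANALYZE b_tmp;      -- collect statistics, including the row count

    -- then join against the temp table instead of the CTE,
    -- e.g. (join column assumed):
    -- SELECT ... FROM b_tmp b JOIN domains d ON d.id = b.domain_id ...

With 3173 rows known to the planner, it's likely to pick a hash or merge join instead of the nested loop.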

Similarly for the other query, where this happens:

 Nested Loop  (cost=88.63..25617.31 rows=491 width=16)
              (actual time=3.512..733248.271 rows=1442797 loops=1)
   ->  HashAggregate  (cost=88.06..88.07 rows=1 width=4)
                      (actual time=3.380..13.561 rows=3043 loops=1)

That is, about 1:3000 difference in both cases.

Those estimation errors seem to be caused by a condition that is almost
impossible to estimate, because in both queries it does this:

    groups->0->>'provider' ~ '^something'

That is, it's a regexp on an expression. You might try creating an index
on the expression (which is the only way to add statistics for an
expression), and reformulating the condition as LIKE, which I believe we
can estimate better than regular expressions (though I haven't tried it).

So something like

    CREATE INDEX ON adroom ((groups->0->>'provider'));

    WHERE groups->0->>'provider' LIKE 'something%';
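
Note that statistics on the index expression are only collected by
ANALYZE, so after creating the index you'd run something like this to
pick up the new statistics and check the estimate (the query shape is
assumed, based on the plans above):

    ANALYZE adroom;     -- gathers statistics on the index expression

    EXPLAIN
    SELECT * FROM adroom
     WHERE groups->0->>'provider' LIKE 'something%';

If the estimated row count in the top nodes gets close to the actual
~3000 rows, the planner should stop picking the nested loop.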


Tomas Vondra        
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
