The thing here is that you are effectively causing Postgres to run a
sub-select for each row of the "result" table, each time generating
either an empty list or a list with one or more identical URLs. This
is effectively forcing a nested loop. In a way, you have two
constraints where you only need one.
You can safely take out the constraint in the subquery, so it is
like this:
SELECT COUNT(*) FROM result WHERE url IN (SELECT shorturl FROM item);
This will generate equivalent results, because those rows that
didn't match the constraint wouldn't have affected the IN anyway.
However, it will alter the performance, because the subquery will
contain more results, but it will only be run once, rather than
multiple times. This is effectively forcing a hash join (kind of).
Whereas if you rewrite the query as I demonstrated earlier, then you
allow Postgres to make its own choice about which join algorithm
will work best.
Matthew
Thank you! I learned a lot today :-)
I thought the subquery will be run on every row thus I tried to make
it as fast as possible by using a where clause. I didn't try your
first query on the hole table so it could be faster than mine approach.
greetings,
moritz
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance