I consider using PostgreSQL for a project we have in
our company and, to get a better picture of the product, I started
scanning its source code and internal documentation.
Based on what I saw (and maybe I didn't see enough)
it seems that the optimizer will always decide to repeatedly scan the
whole row set returned by sub selects in the context of an IN
clause sequentially, as opposed to what I would expect it to do (which is
to create some index or hash structure to improve
performance).
For example, if I have the following
query:
Select * from a where x in (select y from b where z=7)
Select * from a where x in (select y from b where z=7)
Then I would expect an index or hash structure to be created
for b.y when it is first scanned and brought into the cache but I couldn't see
it happening in the source.
As I said, I only inferred it from reading the
source - not from actual experiments - so I may be
wrong.
1. Am I
wrong?
2. If I'm right, is
there any plan to change it (after all, in the context of an IN clause, an index
on the returned row set is all that is needed - the row set itself does not seem
to matter).
Thank
you,
Michael
Rothschild