I think it will be fast, because the "IN set", which is the result of
"SELECT processorid FROM users_processors WHERE userid=4040", is limited
to a maximum of ~500 processors which is not very big. Increasing
Postgres' RAM would be difficult for me, because I am only running a
very small server with 256MB RAM and the webserver also likes to use
some RAM.
Does Postgre cache the HASH-Table for later use? For example when the
user reloads the website.
Kind regards
Ulrich
Rusty Conover wrote:
This is what I've found with tables ranging in the millions of rows.
Using IN is better when you've got lots of rows to check against the
IN set and the IN set may be large and possibly complicated to
retrieve (i.e. lots of joins, or expensive functions).
Postgres will normally build a hash table of the IN set and just
search that hash table. It's especially fast if the entire hash table
that is built can fit into RAM. The cpu/io cost of building the IN
set can be quite large because it needs to fetch every tuple to hash
it, but this can be faster then searching tuple by tuple through
possibly many indexes and tables like EXISTS does. I like to increase
work_mem a lot (512mb and up) if I know I'm going to be doing a lot of
matches against a large IN set of rows because I'd prefer for that
hash table to never to be written to disk.
EXISTS is better when you're doing fewer matches because it will pull
the rows out one at a time from its query possibly using indexes, its
main advantage is that it doesn't pull all of the tuples before it
starts processing matches.
So in summary both are good to know how to use, but choosing which one
to use can really depend on your data set and resources.
Cheers,
Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance