I have a large table (more than 2 million rows) against which my application runs about a million queries per day. The queries are almost all of the form:
    select PRIMARY_KEY_FLD from MY_TABLE where SECONDARY_KEY_FLD = 'something';

I've optimized the table to the limits of what I can think of; now I'm looking to optimize the application. I'm really only interested in the yes/no question of whether the row exists (the returned PRIMARY_KEY_FLD value is pretty much just for debugging).

What I'm wondering is whether my application would get better (faster) results by checking several values in a single query. For example, I could build SQL that looks like:

    select PRIMARY_KEY_FLD, SECONDARY_KEY_FLD
    from MY_TABLE
    where SECONDARY_KEY_FLD in (first_val, second_val, ...);

so that the list contains some number of the values I care about (either a limited "slice" of them, or the whole pile--usually 100-200), and iterate this until I've checked them all.

Another technique would be to join against a derived "UNION" table:

    select PRIMARY_KEY_FLD, SECONDARY_KEY_FLD
    from MY_TABLE
    join (select first_val as SECONDARY_KEY_FLD
          union select second_val
          union ...) as TEMP_TABLE
    using (SECONDARY_KEY_FLD);

likewise taking my desired values either in "chunks" or all at once.

Which is likely to work better, and why? (I'm going off to write a benchmark script, but I'd like to hear some theoretical answers, too.)
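For concreteness, here's roughly how I'd drive the chunked IN-list version from the application side. This is a minimal sketch, assuming Python with psycopg2; the exists_set() helper, the connection handling, and the chunk size are illustrative, not anything I've benchmarked yet:

    # Minimal sketch of the chunked IN-list approach, assuming Python
    # and psycopg2; helper name and chunk size are illustrative.
    import psycopg2

    CHUNK_SIZE = 200  # 100-200 values per query, per the sizes above

    def exists_set(conn, values, chunk_size=CHUNK_SIZE):
        """Return the subset of `values` present in MY_TABLE.SECONDARY_KEY_FLD."""
        found = set()
        with conn.cursor() as cur:
            for i in range(0, len(values), chunk_size):
                chunk = tuple(values[i:i + chunk_size])
                # psycopg2 expands a Python tuple parameter into a
                # parenthesized value list, producing ... in ('a', 'b', ...)
                cur.execute(
                    "select SECONDARY_KEY_FLD from MY_TABLE"
                    " where SECONDARY_KEY_FLD in %s",
                    (chunk,),
                )
                found.update(row[0] for row in cur.fetchall())
        return found

The UNION variant would differ only in how the SQL string is built; returning SECONDARY_KEY_FLD rather than the primary key makes the existence check a plain set-membership test on the application side.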