Fair enough! I also turned seqscan off, so the new plan (for the NOT EXISTS) is:
Merge Anti Join (cost=0.00..212686.89 rows=1 width=313) (actual time=0.426..14921.344 rows=63836 loops=1) Merge Cond: ((d2.hwdocid)::text = (d.hwdocid)::text) -> Index Scan using hwdocid2_uniq on "Doc2" d2 (cost=0.00..19442.87 rows=96454 width=313) (actual time=0.130..1248.783 rows=96454 loops=1) -> Index Scan using hwdocid_uniq on "Document" d (cost=0.00..189665.17 rows=949272 width=12) (actual time=0.085..11158.740 rows=948336 loops=1) Total runtime: 15062.925 ms Hmm.. doesn't really seem to be such a great boost on performance. But i guess I'll be sticking to this one. So my follow-up question on the subject is this: Are there any particular semantics for the "NOT IN" statement that cause the correlated query to execute for every row of the outter query, as opposed to the "NOT EXISTS" ? Or are there any other practical reasons, related to "IN / NOT IN", for this to be happening? Or is it simply due to implementation details of each RDBMS? I guess the former (or the 2nd one), since, as you say, this is common in most databases, but I would most appreciate an answer to clarify this. Thanks again! Best regards, George 2011/1/7 Mladen Gogala <mladen.gog...@vmsinfo.com> > On 1/6/2011 9:36 PM, Γιωργος Βαλκανας wrote: > >> >> 1) Why is it taking *so* long for the first query (with the "NOT IN" ) to >> do even the simple select? >> > Because NOT IN has to execute the correlated subquery for every row and > then check whether the requested value is in the result set, usually by > doing sequential comparison. The NOT EXIST plan is also bad because there is > no index but at least it can use very fast and efficient hash algorithm. > Indexing the "hwdocid" column on the "Document" table or, ideally, making it > a primary key, should provide an additional boost to your query. If you > already do have an index, you may consider using enable_seqscan=false for > this session, so that the "hwdocid" index will be used. It's a common wisdom > that in the most cases NOT EXISTS will beat NOT IN. That is so all over the > database world. I've seen that in Oracle applications, MS SQL applications > and, of course MySQL applications. Optimizing queries is far from trivial. > > Μλαδεν Γογαλα > > -- > Mladen Gogala > Sr. Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > www.vmsinfo.com > >