Re: [PERFORM] "SELECT .. WHERE NOT IN" query running for hours

Γιωργος Βαλκανας Fri, 07 Jan 2011 01:29:53 -0800

Fair enough!

I also turned seqscan off, so the new plan (for the NOT EXISTS) is:


Merge Anti Join (cost=0.00..212686.89 rows=1 width=313) (actual
time=0.426..14921.344 rows=63836 loops=1)
 Merge Cond: ((d2.hwdocid)::text = (d.hwdocid)::text)
 -> Index Scan using hwdocid2_uniq on "Doc2" d2 (cost=0.00..19442.87
rows=96454 width=313) (actual time=0.130..1248.783 rows=96454 loops=1)
 -> Index Scan using hwdocid_uniq on "Document" d (cost=0.00..189665.17
rows=949272 width=12) (actual time=0.085..11158.740 rows=948336 loops=1)
Total runtime: 15062.925 ms

Hmm.. doesn't really seem to be such a great boost on performance. But i
guess I'll be sticking to this one.

So my follow-up question on the subject is this:

Are there any particular semantics for the "NOT IN" statement that cause the
correlated query to execute for every row of the outter query, as opposed to
the "NOT EXISTS" ? Or are there any other practical reasons, related to "IN
/ NOT IN", for this to be happening? Or is it simply due to implementation
details of each RDBMS? I guess the former (or the 2nd one), since, as you
say, this is common in most databases, but I would most appreciate an answer
to clarify this.

Thanks again!

Best regards,
George



2011/1/7 Mladen Gogala <mladen.gog...@vmsinfo.com>

> On 1/6/2011 9:36 PM, Γιωργος Βαλκανας wrote:
>
>>
>> 1) Why is it taking *so* long for the first query (with the "NOT IN" ) to
>> do even the simple select?
>>
> Because NOT IN has to execute the correlated subquery for every row and
> then check whether the requested value is in the result set, usually by
> doing sequential comparison. The NOT EXIST plan is also bad because there is
> no index but at least it can use very fast and efficient hash algorithm.
> Indexing the "hwdocid" column on the "Document" table or, ideally, making it
> a primary key, should provide an additional boost to your query. If you
> already do have an index, you may consider using enable_seqscan=false for
> this session, so that the "hwdocid" index will be used. It's a common wisdom
> that in the most cases NOT EXISTS will beat NOT IN. That is so all over the
> database world. I've seen that in Oracle applications, MS SQL applications
> and, of course MySQL applications. Optimizing queries is far from trivial.
>
> Μλαδεν Γογαλα
>
> --
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> www.vmsinfo.com
>
>

Re: [PERFORM] "SELECT .. WHERE NOT IN" query running for hours

Reply via email to