Thank you all for your suggestions. My original query not only took 4.6 hours to run, but also returned the wrong answer. The "not exists" version below returned the correct answer (as far as I can tell) in 2 minutes. I thought these were essentially the same query but apparently they are not. I have not yet tried the left join version.
SELECT * FROM parcels_esri WHERE NOT EXISTS (SELECT survey.parcelnum FROM survey WHERE parcels_esri.parcelnumb = survey.parcelnum) "Seq Scan on parcels_esri (cost=0.00..3320785.37 rows=193444 width=4537) (actual time=17.793..4154.274 rows=84489 loops=1)" " Filter: (NOT (subplan))" " SubPlan" " -> Index Scan using parcelnum_idx on survey (cost=0.00..8.35 rows=1 width=26) (actual time=0.009..0.009 rows=1 loops=386887)" " Index Cond: ($0 = parcelnum)" "Total runtime: 4200.203 ms" Running EXPLAIN ANALYZE on my original query takes more time than I am willing to wait. This seems like a very problematic formulation: SELECT * FROM parcels_esri WHERE parcels_esri.parcelnumb NOT IN (SELECT survey.parcelnum FROM survey) Thanks again for your help. I now know the huge improvements that can come from such variations. -david _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
