> > > > > > *Query1* > > > > SELECT person_id FROM person WHERE (column1=1 AND column2='62') > > > > INTERSECT > > > > SELECT person_id FROM person WHERE (column1=1 AND column2='189') > > > I get the same plan(see below) with 'sort' for 'intersect all' > operation > > too. Why is intersect not an effecient way? Is there any other way this > > query/index can be written/created so that I can get the intersect > results > > in an efficient way? > > Set operations are rather inefficient. To find the intersection of two > arbitrary sets you need to sort them and compare. A query like you > write would be better expressed as a join, something like: > > SELECT a.person_id > FROM (SELECT person_id FROM person WHERE (column1=1 AND column2='62') > a, > (SELECT person_id FROM person WHERE (column1=1 AND column2='189') b > WHERE a.person_id = b.person_id; > > or perhaps: > > SELECT a.person_id > FROM person a, person b > WHERE a.column1=1 AND a.column2='62' > AND b.column1=1 AND b.column2='189' > AND a.person_id = b.person_id; > > Which will probably generate a merge join... >
Thanks. But this query seems to be more expensive than using intersect operator. This is the explain analyse plan for this query. It took 5 1/2 minutes to generate this. I also tried to disable the mergejoin and in that case it uses hash join and still takes more than 3 minutes (intersect took only 40 sec) QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=1610648.92..10280119.99 rows=577856095 width=4) (actual time=30562.630..264534.677 rows=225145385 loops=1) Merge Cond: (a.patient_id = b.patient_id) -> Sort (cost=527974.81..528929.79 rows=381993 width=4) (actual time= 3755.361..3845.134 rows=213435 loops=1) Sort Key: a.patient_id Sort Method: quicksort Memory: 15868kB -> Bitmap Heap Scan on clinical_variable2 a (cost= 17886.42..492557.97 rows=381993 width=4) (actual time=315.753..3410.366rows=327498 loops=1) Recheck Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '62'::text)) -> Bitmap Index Scan on clinical_variable_idx_topserv (cost=0.00..17790.92 rows=381993 width=0) (actual time=277.185..277.185rows=327498 loops=1) Index Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '62'::text)) -> Materialize (cost=1082674.11..1124546.38 rows=3349781 width=4) (actual time=26807.248..99885.620 rows=225148250 loops=1) -> Sort (cost=1082674.11..1091048.57 rows=3349781 width=4) (actual time=26807.238..30343.870 rows=3429228 loops=1) Sort Key: b.patient_id Sort Method: external merge Disk: 53552kB -> Bitmap Heap Scan on clinical_variable2 b (cost= 156754.24..679555.96 rows=3349781 width=4) (actual time=2744.126..20106.160rows=3429228 loops=1) Recheck Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '189'::text)) -> Bitmap Index Scan on clinical_variable_idx_topserv (cost=0.00..155916.80 rows=3349781 width=0) (actual time=2686.456..2686.456rows=3429228 loops=1) Index Cond: ((top_parent_service_sys_id = 1) AND ((top_parent_service_code)::text = '189'::text)) Total runtime: 324646.035ms 18 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] a: 0/ms] Is there any other way you can think of to solve this problem. May be creating the indexes in a different way or something? Thanks jo