Re: [GENERAL] Recheck condition

Josh Harrison Fri, 30 Nov 2007 06:40:05 -0800

>
>
> > > > *Query1*
> > > > SELECT person_id  FROM person   WHERE (column1=1 AND column2='62')
> > > > INTERSECT
> > > > SELECT person_id  FROM person  WHERE (column1=1 AND column2='189')
>
> > I get the same plan(see below)  with 'sort'  for 'intersect all'
> operation
> > too. Why is intersect not an effecient way? Is there any other way this
> > query/index can be written/created so that I can get the intersect
> results
> > in an efficient way?
>
> Set operations are rather inefficient. To find the intersection of two
> arbitrary sets you need to sort them and compare. A query like you
> write would be better expressed as a join, something like:
>
> SELECT a.person_id
> FROM (SELECT person_id  FROM person   WHERE (column1=1 AND column2='62')
> a,
>     (SELECT person_id  FROM person  WHERE (column1=1 AND column2='189') b
> WHERE a.person_id = b.person_id;
>
> or perhaps:
>
> SELECT a.person_id
> FROM person a, person b
> WHERE a.column1=1 AND a.column2='62'
> AND b.column1=1 AND b.column2='189'
> AND a.person_id = b.person_id;
>
> Which will probably generate a merge join...
>


Thanks. But this query seems to be more expensive than using intersect
operator.
This is the explain analyse plan for this query. It took 5 1/2 minutes to
generate this. I also tried to disable the mergejoin and in that case it
uses hash join and still takes more than 3 minutes (intersect took only 40
sec)

QUERY
PLAN

 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Merge Join  (cost=1610648.92..10280119.99 rows=577856095 width=4) (actual
time=30562.630..264534.677 rows=225145385
loops=1)
   Merge Cond: (a.patient_id = b.patient_id)

   ->  Sort  (cost=527974.81..528929.79 rows=381993 width=4) (actual time=
3755.361..3845.134 rows=213435
loops=1)
         Sort Key: a.patient_id

         Sort Method:  quicksort  Memory:
15868kB

         ->  Bitmap Heap Scan on clinical_variable2 a  (cost=
17886.42..492557.97 rows=381993 width=4) (actual
time=315.753..3410.366rows=327498 loops=1)
               Recheck Cond: ((top_parent_service_sys_id = 1) AND
((top_parent_service_code)::text =
'62'::text))
               ->  Bitmap Index Scan on clinical_variable_idx_topserv
(cost=0.00..17790.92 rows=381993 width=0) (actual
time=277.185..277.185rows=327498 loops=1)
                     Index Cond: ((top_parent_service_sys_id = 1) AND
((top_parent_service_code)::text =
'62'::text))
   ->  Materialize  (cost=1082674.11..1124546.38 rows=3349781 width=4)
(actual time=26807.248..99885.620 rows=225148250
loops=1)
         ->  Sort  (cost=1082674.11..1091048.57 rows=3349781 width=4)
(actual time=26807.238..30343.870 rows=3429228
loops=1)
               Sort Key: b.patient_id

               Sort Method:  external merge  Disk:
53552kB

               ->  Bitmap Heap Scan on clinical_variable2 b  (cost=
156754.24..679555.96 rows=3349781 width=4) (actual
time=2744.126..20106.160rows=3429228 loops=1)
                     Recheck Cond: ((top_parent_service_sys_id = 1) AND
((top_parent_service_code)::text =
'189'::text))
                     ->  Bitmap Index Scan on clinical_variable_idx_topserv
(cost=0.00..155916.80 rows=3349781 width=0) (actual
time=2686.456..2686.456rows=3429228 loops=1)
                           Index Cond: ((top_parent_service_sys_id = 1) AND
((top_parent_service_code)::text =
'189'::text))
 Total runtime: 324646.035ms


 18 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] a: 0/ms]

Is there any other way you can think of to solve this problem. May be
creating the indexes in a  different way or something?

Thanks
jo

Re: [GENERAL] Recheck condition

Reply via email to