[PERFORM] Terrible plan for join to nested union

Nate Allan Sat, 07 Jul 2012 15:35:54 -0700

I have a query which joins to a nested union and I'm getting a plan which never 
returns.  Here is the query simplified as much as possible:


select 'anything' as result
               from "Attribute" as A1
                              inner join
                              (
                                             select R."TargetID" as "SourceID"
                                                            from "Relationship" 
as R
                                             union
                                             select A2."PersonID" as "SourceID"
                                                            from "Attribute" as 
A2
                              ) as X on (A1."PersonID" = X."SourceID")
               where (A1."ID" = 124791200)

(this seems like a strange query, but it is simplified to eliminate everything 
I could)

Here is the execution plan I am seeing:
http://explain.depesz.com/s/BwUd

Merge Join  (cost=229235406.73..244862067.56 rows=727 width=0)
  Output: 'anything'
  Merge Cond: (r."TargetID" = a1."PersonID")
  ->  Unique  (cost=229235336.51..233700093.63 rows=892951424 width=8)
        Output: r."TargetID"
        ->  Sort  (cost=229235336.51..231467715.07 rows=892951424 width=8)
              Output: r."TargetID"
              Sort Key: r."TargetID"
              ->  Append  (cost=0.00..23230287.48 rows=892951424 width=8)
                    ->  Seq Scan on public."Relationship" r  
(cost=0.00..5055084.88 rows=328137088 width=8)
                          Output: r."TargetID"
                    ->  Seq Scan on public."Attribute" a2  
(cost=0.00..9245688.36 rows=564814336 width=8)
                          Output: a2."PersonID"
  ->  Materialize  (cost=70.22..70.23 rows=1 width=8)
        Output: a1."PersonID"
        ->  Sort  (cost=70.22..70.23 rows=1 width=8)
              Output: a1."PersonID"
              Sort Key: a1."PersonID"
              ->  Index Scan using "UIDX_Attribute_ID" on public."Attribute" a1 
 (cost=0.00..70.21 rows=1 width=8)
                    Output: a1."PersonID"
                    Index Cond: (a1."ID" = 124791200)

As you can see, the Relationship table has ~300 million rows and Attribute has 
~500 million rows.  I could not include the explain analyze because the query 
never completes.  Going to "union all" fixes it, nesting the restriction fixes 
it, making the restriction limit X rather than A1 fixes it.  Unfortunately, 
none of these "fixes" are acceptable within the context of the complete query 
this was simplified from.

Version string: PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by gcc 
(GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit
OS: CentOS 5
RAM: 128GB
Processor: AMD Opteron(tm) 6174, 24 cores

I've not changed any configuration settings from the based EnterpriseDB 
installer besides shared_buffers.  Presently the DB is static, and I have 
executed analyze to update the stats since loading it.

Relevant schema:

CREATE TABLE "Attribute"
(
  "ID" bigint NOT NULL,
  "PersonID" bigint NOT NULL,
  "Type" character varying(5) NOT NULL
)
WITH (  OIDS=FALSE);

CREATE INDEX "IDX_Attribute_PersonID_Type"  ON "Attribute"  USING btree
  ("PersonID" , "Type" COLLATE pg_catalog."default" );

CREATE UNIQUE INDEX "UIDX_Attribute_ID"
  ON "Attribute"  USING btree  ("ID" );

CREATE TABLE "Relationship"
(
  "ID" bigint NOT NULL,
  "TargetID" bigint NOT NULL
) WITH (  OIDS=FALSE);

CREATE INDEX "IDX_Relationship_TargetID"
  ON "Relationship"  USING btree   ("TargetID" );

CREATE UNIQUE INDEX "UIDX_Relationship_ID"
  ON "Relationship"   USING btree  ("ID" );

Thanks,

-Nate

[PERFORM] Terrible plan for join to nested union

Reply via email to