I stopped this query after about 16 hours.  At the same time I ran an
EXPLAIN ANALYZE on the same query to find out why it took so long.  These
two processes generated 173 GB of temporary files in
/var/lib/postgresql/9.4/main/base/pgsql_tmp.
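
(For anyone who wants to check the temp-file numbers on their side: something
like the following should show the cumulative temporary file usage per
database since the last stats reset -- the database name here is just a
placeholder:

SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = 'mydb';
)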

COPY
  (SELECT A.ut,
          B.go AS funding_org,
          B.gn AS grant_no,
          C.gt AS thanks,
          D.au
   FROM isi.funding_text C,
        isi.rauthor D,
        isi.africa_uts A
   LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
   WHERE (C.ut IS NOT NULL
          OR B.ut IS NOT NULL)
     AND D.rart_id = C.ut
     AND C.ut = B.ut
   GROUP BY A.ut,
            GO,
            gn,
            gt,
            au
   ORDER BY funding_org) TO '/tmp/africafunding2.csv' WITH csv quote
'"' DELIMITER ',';


A modified version of this query finished in 1 min 27 sec:

COPY
  (SELECT 'UT'||A.ut,
                B.go AS funding_org,
                B.gn AS grant_no,
                C.gt AS thanks
   FROM isi.africa_uts A
   LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
   LEFT JOIN isi.funding_text C ON (A.ut = C.ut)
   WHERE (C.ut IS NOT NULL
          OR B.ut IS NOT NULL)
   GROUP BY A.ut,
            GO,
            gn,
            gt) TO '/tmp/africafunding.csv' WITH csv quote '"' DELIMITER ',';

As I said, the EXPLAIN ANALYZE of the problematic query contributed to the
173 GB of temporary files and had still not finished after about 16 hours.

Running just EXPLAIN on the query part produces this:

"Sort  (cost=4781458203.46..4798118612.44 rows=6664163593 width=390)"
"  Output: a.ut, b.go, b.gn, c.gt, (array_to_string(array_agg(d.au),
';'::text)), b.go, b.gn, d.au"
"  Sort Key: b.go"
"  ->  GroupAggregate  (cost=2293037801.73..2509623118.51
rows=6664163593 width=390)"
"        Output: a.ut, b.go, b.gn, c.gt,
array_to_string(array_agg(d.au), ';'::text), b.go, b.gn, d.au"
"        Group Key: a.ut, b.go, b.gn, c.gt, d.au"
"        ->  Sort  (cost=2293037801.73..2309698210.72 rows=6664163593
width=390)"
"              Output: a.ut, c.gt, b.go, b.gn, d.au"
"              Sort Key: a.ut, b.go, b.gn, c.gt, d.au"
"              ->  Merge Join  (cost=4384310.92..21202716.78
rows=6664163593 width=390)"
"                    Output: a.ut, c.gt, b.go, b.gn, d.au"
"                    Merge Cond: ((c.ut)::text = (d.rart_id)::text)"
"                    ->  Merge Join  (cost=635890.84..1675389.41
rows=6069238 width=412)"
"                          Output: c.gt, c.ut, a.ut, b.go, b.gn, b.ut"
"                          Merge Cond: ((c.ut)::text = (b.ut)::text)"
"                          Join Filter: ((c.ut IS NOT NULL) OR (b.ut
IS NOT NULL))"
"                          ->  Merge Join  (cost=635476.30..675071.77
rows=1150354 width=348)"
"                                Output: c.gt, c.ut, a.ut"
"                                Merge Cond: ((a.ut)::text = (c.ut)::text)"
"                                ->  Index Only Scan using
africa_ut_idx on isi.africa_uts a  (cost=0.42..19130.19 rows=628918
width=16)"
"                                      Output: a.ut"
"                                ->  Sort  (cost=632211.00..640735.23
rows=3409691 width=332)"
"                                      Output: c.gt, c.ut"
"                                      Sort Key: c.ut"
"                                      ->  Seq Scan on
isi.funding_text c  (cost=0.00..262238.91 rows=3409691 width=332)"
"                                            Output: c.gt, c.ut"
"                          ->  Index Scan using funding_org_ut_idx on
isi.funding_org b  (cost=0.56..912582.50 rows=9835492 width=64)"
"                                Output: b.id, b.ut, b.go, b.gn"
"                    ->  Materialize  (cost=0.57..17914892.46
rows=159086560 width=26)"
"                          Output: d.id, d.rart_id, d.au, d.ro, d.ln,
d.af, d.ras, d.ad, d.aa, d.em, d.ag, d.tsv"
"                          ->  Index Scan using rauthor_rart_id_idx on
isi.rauthor d  (cost=0.57..17517176.06 rows=159086560 width=26)"
"                                Output: d.id, d.rart_id, d.au, d.ro,
d.ln, d.af, d.ras, d.ad, d.aa, d.em, d.ag, d.tsv"

Any idea why adding the rauthor table to the query is so problematic?

My system:

768 GB RAM
shared_buffers: 32GB
work_mem: 4608MB
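
(work_mem above is the global setting; for this particular COPY I could of
course raise it for the session only, e.g.:

SET work_mem = '8GB';
-- run the COPY here
RESET work_mem;

though I doubt that helps much against a 6.6-billion-row estimate.)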

Regards
Johann

-- 
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)