On 5/25/17 2:26 PM, Dinesh Chandra 12108 wrote:
Hi Tomas,

Please find the below input for slow query.

(a) something about the hardware it's running on
  RAM-->64 GB, CPU->40core

(b) amounts of data in the tables / databases
Database size   :32GB
-----------------
Tables size
-----------------
Workflow.project                : 8194 byte
workflow.tool_performance       :175 MB
workflow.evidence_to_do :580 MB

(c) EXPLAIN or even better EXPLAIN ANALYZE of the query

"GroupAggregate  (cost=16583736169.63..18157894828.18 rows=5920110 width=69)"
"  ->  Sort  (cost=16583736169.63..16714893857.43 rows=52463075120 width=69)"
"        Sort Key: tool_performance.project_id, project.project_name, 
tool_performance.step_id, (date_trunc('day'::text, tool_performance.insert_time)), 
tool_performance.user_id"
"        ->  Nested Loop  (cost=2.42..787115179.07 rows=52463075120 width=69)"
"              ->  Seq Scan on evidence_to_do  (cost=0.00..119443.95 rows=558296 
width=0)"
"                    Filter: (status_id = ANY 
('{15100,15150,15200,15300,15400,15500}'::bigint[]))"
"              ->  Materialize  (cost=2.42..49843.24 rows=93970 width=69)"
"                    ->  Hash Join  (cost=2.42..49373.39 rows=93970 width=69)"
"                          Hash Cond: (tool_performance.project_id = 
project.project_id)"
"                          ->  Seq Scan on tool_performance  (cost=0.00..48078.88 
rows=93970 width=39)"
"                                Filter: ((insert_time > '2017-05-01 
00:00:00+05:30'::timestamp with time zone) AND (insert_time < '2017-05-02 
00:00:00+05:30'::timestamp with time zone))"
"                          ->  Hash  (cost=1.63..1.63 rows=63 width=38)"
"                                ->  Seq Scan on project  (cost=0.00..1.63 rows=63 
width=38)"


Are you sure this is the same query? The query you posted includes there two join conditions:

    evidence_to_do.project_id = tool_performance.project_id
    evidence_to_do.project_id = project.project_id

But the plan only seems to enforce the equality between 'project' and 'tool_performance'. So when joining the evidence_to_do, it performs a cartesian product, producing ~52B rows (estimated). That can't be fast.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to