Hello:
I've a question about the performance of a query plan that uses a nested
loop, and whose inner loop uses an index scan. Would you be so kind to
help me, please?
I'm using PostgreSQL 9.5.4 on Ubuntu 14.04 64-bit (kernel 4.8.2). I've 3
tables, which are "answers", "test_completions" and "courses". The first
one contains around ~30 million rows, whereas the others only have a few
thousands each one. The query that I'm performing is very simple,
although retrieves lots of rows:
-
SELECT answers.*
FROM answers
JOIN test_completions ON test_completions.test_completion_id =
answers.test_completion_id
JOIN courses ON courses.course_id = test_completions.course_id
WHERE courses.group_id = 2;
-
This yields the following plan:
-
Nested Loop (cost=245.92..383723.28 rows=7109606 width=38) (actual
time=1.091..2616.553 rows=8906075 loops=1)
-> Hash Join (cost=245.36..539.81 rows=3081 width=8) (actual
time=1.077..6.087 rows=3123 loops=1)
Hash Cond: (test_completions.course_id =
courses.course_id)
-> Seq Scan on test_completions (cost=0.00..214.65
rows=13065 width=16) (actual time=0.005..1.051 rows=13065 loops=1)
-> Hash (cost=204.11..204.11 rows=3300 width=8)
(actual time=1.063..1.063 rows=3300 loops=1)
Buckets: 4096 Batches: 1 Memory Usage:
161kB
-> Bitmap Heap Scan on courses
(cost=45.86..204.11 rows=3300 width=8) (actual time=0.186..0.777
rows=3300 loops=1)
Recheck Cond: (group_id = 2)
Heap Blocks: exact=117
-> Bitmap Index Scan on
fki_courses_group_id_fkey (cost=0.00..45.03 rows=3300 width=0) (actual
time=0.172..0.172 rows=3300 loops=1)
Index Cond:
(group_id = 2)
### HERE ###
-> Index Scan using fki_answers_test_completion_id_fkey on
answers (cost=0.56..96.90 rows=2747 width=38) (actual time=0.007..0.558
rows=2852 loops=3123)
### HERE ###
Index Cond: (test_completion_id =
test_completions.test_completion_id)
Planning time: 0.523 ms
Execution time: 2805.530 ms
-
My doubt is about the inner loop of the nested loop, the one that I've
delimited with ### HERE ### . This loop is the part that, obviously,
more time consumes. Because its run 3,123 times and requires lots of
accesses to multiple database pages. But, Is there anything that I can
do to reduce even more the time spent in this part? Apart of:
* Clustering the "answers" table.
* Upgrading PostgreSQL to version 9.6, to take advantage of the
index scans in parallel.
* Upgrading the hardware.
Thank you!
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance