Re: [PERFORM] Performance of a nested loop, whose inner loop uses an index scan.

2016-10-19 Thread negora

  
  
Hi Matheus:
Thanks for your prompt answer. It's for a web application. This
  part of the application allows to export the answers to a CSV
  file. So pagination isn't possible here. The user can choose among
  several filters. The group of the courses is one of them. She can
  combine as many filters as she wants. So the query that I
  presented in my previous message was one of the "broadest"
  examples. But it's the one that I'm interested in.

Really, I'm more interested in the relative time than in the
absolute time. Because I could create the file
  asynchronously, in the background, so that the user downloaded it
  at a later time. That's not the problem. My doubt is if 2.8
  seconds is the best that I can do. Is it an acceptable time?
Thank you! ;)


On 19/10/16 13:15, Matheus de Oliveira
  wrote:


  

  On Wed, Oct 19, 2016 at 8:54 AM,
negora 
wrote:

  
    Nested Loop  (cost=245.92..383723.28 rows=7109606
width=38) (actual
time=1.091..2616.553 rows=8906075 loops=1)

  
  

I wonder about the use-case for this
  query, because it returns more than 8M rows, so 2.6 seconds
  that sounds that much for so many rows. Is it for an end user
  application? Isn't there any kind of pagination?


  
  -- 
  

  Matheus de Oliveira


  

  

  


  




Re: [PERFORM] Performance of a nested loop, whose inner loop uses an index scan.

2016-10-19 Thread Matheus de Oliveira
On Wed, Oct 19, 2016 at 8:54 AM, negora  wrote:

> Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38) (actual
> time=1.091..2616.553 rows=8906075 loops=1)
>

I wonder about the use-case for this query, because it returns more than 8M
rows, so 2.6 seconds that sounds that much for so many rows. Is it for an
end user application? Isn't there any kind of pagination?


-- 
Matheus de Oliveira


[PERFORM] Performance of a nested loop, whose inner loop uses an index scan.

2016-10-19 Thread negora
Hello:

I've a question about the performance of a query plan that uses a nested
loop, and whose inner loop uses an index scan. Would you be so kind to
help me, please?

I'm using PostgreSQL 9.5.4 on Ubuntu 14.04 64-bit (kernel 4.8.2). I've 3
tables, which are "answers", "test_completions" and "courses". The first
one contains around ~30 million rows, whereas the others only have a few
thousands each one. The query that I'm performing is very simple,
although retrieves lots of rows:

-
SELECT answers.*
FROM answers
JOIN test_completions ON test_completions.test_completion_id =
answers.test_completion_id
JOIN courses ON courses.course_id = test_completions.course_id
WHERE courses.group_id = 2;
-


This yields the following plan:

-
Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38) (actual
time=1.091..2616.553 rows=8906075 loops=1)
->  Hash Join  (cost=245.36..539.81 rows=3081 width=8) (actual
time=1.077..6.087 rows=3123 loops=1)
Hash Cond: (test_completions.course_id =
courses.course_id)
->  Seq Scan on test_completions  (cost=0.00..214.65
rows=13065 width=16) (actual time=0.005..1.051 rows=13065 loops=1)
->  Hash  (cost=204.11..204.11 rows=3300 width=8)
(actual time=1.063..1.063 rows=3300 loops=1)
Buckets: 4096  Batches: 1  Memory Usage:
161kB
->  Bitmap Heap Scan on courses 
(cost=45.86..204.11 rows=3300 width=8) (actual time=0.186..0.777
rows=3300 loops=1)
Recheck Cond: (group_id = 2)
Heap Blocks: exact=117
->  Bitmap Index Scan on
fki_courses_group_id_fkey  (cost=0.00..45.03 rows=3300 width=0) (actual
time=0.172..0.172 rows=3300 loops=1)
Index Cond:
(group_id = 2)
   ### HERE ###
->  Index Scan using fki_answers_test_completion_id_fkey on
answers  (cost=0.56..96.90 rows=2747 width=38) (actual time=0.007..0.558
rows=2852 loops=3123)
   ### HERE ###
Index Cond: (test_completion_id =
test_completions.test_completion_id)
Planning time: 0.523 ms
Execution time: 2805.530 ms
-

My doubt is about the inner loop of the nested loop, the one that I've
delimited with  ### HERE ### . This loop is the part that, obviously,
more time consumes. Because its run 3,123 times and requires lots of
accesses to multiple database pages. But, Is there anything that I can
do to reduce even more the time spent in this part? Apart of:

* Clustering the "answers" table.
* Upgrading PostgreSQL to version 9.6, to take advantage of the
index scans in parallel.
* Upgrading the hardware.

Thank you!



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance