Re: [PERFORM] [pgsql-hackers-win32] Poor Performance for large queries

John Meinel Wed, 29 Sep 2004 10:12:18 -0700

Tom Lane wrote:

[ enlarging on Richard's response a bit ]

John Meinel <[EMAIL PROTECTED]> writes:

jfmeinel=> explain analyze execute myget(30000);
                                                 QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on tdata  (cost=0.00..9773.10 rows=181923 width=4)
        (actual time=1047.000..1047.000 rows=0 loops=1)
   Filter: (project_id = $1)
 Total runtime: 1047.000 ms

jfmeinel=> explain analyze select id from tdata where project_id = 30000;
                                                         QUERY PLAN

-------------------------------------------------------------------------
Index Scan using tdata_project_id_idx on tdata (cost=0.00..4.20 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops =1)
Index Cond: (project_id = 30000)
Total runtime: 0.000 ms

So notice that when doing the actual select it is able to do the index query. But for some reason with a prepared statement, it is not able to do it.

This isn't a "can't do it" situation, it's a "doesn't want to do it"
situation, and it's got nothing whatever to do with null or not null.
The issue is the estimated row count, which in the first case is so high
as to make the seqscan approach look cheaper.  So the real question here
is what are the statistics on the column that are making the planner
guess such a large number when it has no specific information about the
compared-to value.  Do you have one extremely common value in the column?
Have you done an ANALYZE recently on the table, and if so can you show
us the pg_stats row for the column?

                        regards, tom lane

The answer is "yes" that particular column has very common numbers in it. Project id is a number from 1->21. I ended up modifying my query such that I do the bulk of the work in a regular UNION SELECT so that all that can be optimized, and then I later do another query for this row in an 'EXECUTE ...' so that unless I'm actually requesting a small number, the query planner can notice that it can do an indexed query.

I'm pretty sure this is just avoiding worst case scenario. Because it is true that if I use the number 18, it will return 500,000 rows. Getting those with an indexed lookup would be very bad. But typically, I'm doing numbers in a very different range, and so the planner was able to know that it would not likely find that number.

Thanks for pointing out what the query planner was thinking, I was able to work around it.

John
=:->

signature.asc
Description: OpenPGP digital signature

Re: [PERFORM] [pgsql-hackers-win32] Poor Performance for large queries

Reply via email to