I have a query where the LIMIT clause is incorrectly reducing the cost
of the query. This results in queries with LIMIT m running much longer
when m is small (e.g. 1-3) than when m is larger. (PG 9.0) (And yes,
ANALYZE was run and, no, increasing stats_target for the columns
doesn't help.)

Notice that the total predicted cost is a small fraction of the cost
of the IndexScan. The cost is low because we assume that the LIMIT
reduces the cost of the query in a linear manner, which just ain't so.

Limit  (cost=0.00..8320.49 rows=25 width=28) (actual
time=0.024..239.044 rows=25 loops=1)
  ->  Index Scan Backward using blah_pkey on blah
(cost=0.00..4112652.70 rows=12357 width=28) (actual
time=0.022..239.009 rows=25 loops=1)
        Index Cond: (blah_id <= 72572020)
        Filter: (user_id = ANY ('{....list....}'::integer[]))

What we are assuming is that if doing the whole scan would reveal N
rows, then we only need to do an m/N fraction of the scan to emit the
m output rows. So if the Limit is very small then the cost of this
plan looks really low. Low enough that it beats the plan you might
expect to see, which is an IndexScan or a BitmapIndexScan on user_id.
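The interpolation can be sketched as follows. This is a simplified model
of the planner's arithmetic, not actual PostgreSQL source, but it
reproduces the Limit node's cost in the plan above from the child's
numbers:

```python
def limit_cost(child_startup, child_total, child_rows, m):
    """Planner-style estimate for LIMIT m over a child node: interpolate
    linearly between the child's startup and total cost by the fraction
    of its output we expect to consume."""
    fraction = min(m / child_rows, 1.0)
    return child_startup + fraction * (child_total - child_startup)

# Numbers from the plan above: the backward index scan is costed at
# 0.00..4112652.70 for an estimated 12357 rows, and the LIMIT wants 25.
print(round(limit_cost(0.00, 4112652.70, 12357, 25), 2))  # 8320.49
```

So the 4-million-cost scan is billed at ~8320 simply because 25/12357 of
it is assumed to be enough, which is exactly the linearity complained
about above.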

This inappropriate assumption is causing bad plans for the two worst
query types on this large database, so it's not just an idle annoyance.
I've also seen this before in other cases at other release levels. I'm
not reporting this because I need help finding a solution, but because
it's a clear bug, so it's not a topic for the performance list.

Other examples

Limit 1-3    Total runtime: 3394.080 ms
example plan

Limit  (cost=0.00..1719.89 rows=2 width=1757) (actual
time=3394.000..3394.000 rows=0 loops=1)
  ->  Nested Loop  (cost=0.00..769649.42 rows=895 width=1757) (actual
time=3393.998..3393.998 rows=0 loops=1)
        ->  Seq Scan on user  (cost=0.00..763051.97 rows=895
width=1607) (actual time=3393.996..3393.996 rows=0 loops=1)
              Filter: ((user_id <> ALL ('{...about 10
users...}'::integer[])) AND (thing_id = ANY ('{array of
        ->  Index Scan using auth_user_pkey on auth_user
(cost=0.00..7.36 rows=1 width=150) (never executed)
              Index Cond: (id = user.user_id)

Limit 4+   Total runtime: 2.132 ms

Limit  (cost=3052.13..3100.82 rows=4 width=1757) (actual
time=2.053..2.053 rows=0 loops=1)
  ->  Nested Loop  (cost=3052.13..13946.44 rows=895 width=1757)
(actual time=2.050..2.050 rows=0 loops=1)
        ->  Bitmap Heap Scan on user  (cost=3052.13..7348.98 rows=895
width=1607) (actual time=2.048..2.048 rows=0 loops=1)
              Recheck Cond: (thing_id = ANY ('{...lots...}'::bigint[]))
              Filter: (user_id <> ALL ('{...~10 users...}'::integer[]))
              ->  Bitmap Index Scan on user_thing_id
(cost=0.00..3051.90 rows=895 width=0) (actual time=2.026..2.026 rows=6
                    Index Cond: (thing_id = ANY ('{...lots...}'::bigint[]))
        ->  Index Scan using auth_user_pkey on auth_user
(cost=0.00..7.36 rows=1 width=150) (never executed)
              Index Cond: (id = user.user_id)

2000x slower is enough for me to call this a bug, rather than
euphemistically saying "this is sub-optimal".

So there are a few problems:

1. We assume that all values exist within the table. There is no
measure of sparsity. Any value between min and max is assumed to exist
and is assigned the averaged selectivity for a non-MFV.

2. We assume that if values do exist that they have rows uniformly
distributed across the whole table like rungs on a ladder. So the
fewer rows you want the cheaper it is to access them. This mistake is
made any time we propagate the LIMIT clause onto a SeqScan.

3. We take no account of the potential for error in the plan. The plan
chosen is very risky, but has a lower cost. It would be better to
favour less risky plans for most cases.

So (1) leads us to make an overestimate of the selectivity, and then we
use (2) to deduce that the overall cost is therefore incredibly low.
So the "worst case" selectivity actually leads us to making a
best-case assumption: that matching rows are frequent and also
uniformly distributed.
In terms of proposed solutions, (2) is the only one that seems able to
be altered with any ease. If we have a reasonably large selectivity,
it's OK to assume that we don't have to scan much of a table before we
find a matching row. But taking that assumption to extremes causes
these bad plans.

Any time we apply a LIMIT clause to a plan with a SeqScan or
unqualified IndexScan, we shouldn't assume the scan will do less than,
say, 10% of the table. It might, but it's an unsafe assumption, because
as the selectivity decreases so does the safety of the assumption that
rows are uniformly distributed. So basically: put a clamp on the
ability of this assumption to scale downwards when m/N is very small.
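As a sketch (the 10% floor is just the illustrative number from above,
not a tuned value), the clamp might look like:

```python
def clamped_fraction(m, n, floor=0.10):
    """Fraction of a SeqScan / unqualified IndexScan that LIMIT m is
    assumed to save, with a floor so the estimate cannot scale down
    without bound when m/N is tiny."""
    return max(min(m / n, 1.0), floor)

# Unclamped, the first plan assumed 25/12357 ~ 0.2% of the scan.
# Clamped, we'd cost it at 10% of 4112652.70 ~ 411265, which no longer
# beats the ~13946-cost bitmap-scan plan.
print(clamped_fraction(25, 12357))  # 0.1
```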

We could also improve on (1) for some data types. Some measure of
sparsity could easily be gained by taking NDistinct/(Max - Min) for
integer types. That is useful because integers are commonly used as
keys and so some special attention there is not unwarranted.
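A sketch of that density measure for integer keys (a hypothetical
helper, not an existing planner function; numbers below are made up
for illustration):

```python
def integer_density(n_distinct, min_val, max_val):
    """NDistinct / (Max - Min + 1) for an integer column: 1.0 means
    every value in the range exists; values near 0 mean the key space
    is sparse, so an arbitrary lookup value probably isn't present."""
    span = max_val - min_val + 1
    return min(n_distinct / span, 1.0)

print(integer_density(1000, 1, 1000))    # 1.0: dense, every value exists
print(integer_density(1000, 1, 100000))  # 0.01: sparse key space
```

This would directly attack assumption (1): a value between min and max
would no longer be presumed to exist when the column is sparse.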

Other ideas welcome.

 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
