Thanks for the rapid response. I can understand how the planner makes its guess, but in practice it will nearly always be wrong, simply because even the most-commented articles have only around 5000 or so comments. I ran the EXPLAIN ANALYZE tonight and got these results:
EXPLAIN ANALYZE SELECT _comment.id,
    (get_comment_response(_comment.id)).id AS r_id
    FROM _comment
    INNER JOIN _article
    ON _article.id = _comment.parent_id
    WHERE _comment.parent_id = '17355952'
    ORDER BY _comment.id ASC
    OFFSET 0
    LIMIT 1;

                                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3588.42 rows=1 width=8) (actual time=498597.115..498597.116 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..28524312.40 rows=7949 width=8) (actual time=498597.114..498597.114 rows=1 loops=1)
         ->  Index Scan using _comment_pkey on _comment  (cost=0.00..28448324.73 rows=7949 width=16) (actual time=498473.360..498473.360 rows=1 loops=1)
               Filter: (parent_id = 17355952::bigint)
         ->  Index Scan using _article_pkey on _article  (cost=0.00..9.55 rows=1 width=8) (actual time=63.465..63.465 rows=1 loops=1)
               Index Cond: (_article.id = 17355952::bigint)
 Total runtime: 498615.230 ms
(7 rows)

EXPLAIN ANALYZE SELECT _comment.id,
    (get_comment_response(_comment.id)).id AS r_id
    FROM _comment
    INNER JOIN _article
    ON _article.id = _comment.parent_id
    WHERE _comment.parent_id = '17355952'
    ORDER BY _comment.id ASC
    OFFSET 0
    LIMIT 1000;

                                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10261.19..10263.69 rows=1000 width=8) (actual time=127.037..127.267 rows=1000 loops=1)
   ->  Sort  (cost=10261.19..10281.06 rows=7949 width=8) (actual time=127.036..127.128 rows=1000 loops=1)
         Sort Key: _comment.id
         Sort Method:  top-N heapsort  Memory: 95kB
         ->  Nested Loop  (cost=0.00..9825.35 rows=7949 width=8) (actual time=0.472..122.986 rows=4674 loops=1)
               ->  Index Scan using _article_pkey on _article  (cost=0.00..9.55 rows=1 width=8) (actual time=0.011..0.013 rows=1 loops=1)
                     Index Cond: (id = 17355952::bigint)
               ->  Index Scan using idx_comment_parent_id on _comment  (cost=0.00..9716.44 rows=7949 width=16) (actual time=0.235..32.869 rows=4674 loops=1)
                     Index Cond: (_comment.parent_id = 17355952::bigint)
 Total runtime: 127.410 ms
(10 rows)

As you can see, the time is dramatically longer with LIMIT 1 (or, in our case, LIMIT 2).

Yannick.

2009/1/20 David Wilson <david.t.wil...@gmail.com>

> On Tue, Jan 20, 2009 at 10:45 AM, Yannick Le Guédart
> <yann...@over-blog.com> wrote:
> >
> > The second query scans the whole comment table which is very dangerous
> > for production servers.
>
> That's not quite true. The second does an index scan- the planner
> seems to be guessing that it'll fulfill the required limit early in
> the index scan; only with a pathologically bad case would it actually
> have to scan the entire thing. Basically, the second query is
> optimized to spit out the first few rows quickly, since that's all you
> asked for with the limit.
>
> Note that your first query has a final cost estimate of "Limit
> (cost=10261.19..10263.69 rows=1000 width=8)", indicating an estimated
> 10261.19 to emit the first row; the second has "Limit
> (cost=0.00..3588.42 rows=1 width=8)" estimating 0.00 (basically,
> instant) to emit the first - and only desired - row.
>
> That all said, an explain analyze would give us a better idea of
> what's going on- we can't tell if the planner is making bad estimates
> without the knowledge of what the real timing and row count results of
> plan stages were.
>
> --
> - David T. Wilson
> david.t.wil...@gmail.com
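P.S. For anyone hitting the same plan flip: one commonly suggested workaround (an untested sketch against this schema, not something I have benchmarked) is to order by an expression rather than the bare column, so the planner cannot satisfy the ORDER BY by walking _comment_pkey and instead uses idx_comment_parent_id plus a top-N sort, as in the fast LIMIT 1000 plan:

```sql
-- Untested sketch: "ORDER BY _comment.id + 0" cannot be served by the
-- _comment_pkey index scan, which steers the planner toward the
-- idx_comment_parent_id scan followed by a small top-N heapsort.
SELECT _comment.id,
       (get_comment_response(_comment.id)).id AS r_id
FROM _comment
INNER JOIN _article
        ON _article.id = _comment.parent_id
WHERE _comment.parent_id = '17355952'
ORDER BY _comment.id + 0 ASC
OFFSET 0
LIMIT 1;
```

The trade-off is that this defeats the index-ordered plan even in cases where it would have been the right choice, so it is a targeted hack rather than a general fix.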