On Fri, May 30, 2008 at 02:23:46AM +0930, Shane Ambler wrote:
Chris Shoemaker wrote:
[Attn list-queue maintainers: Please drop the earlier version
of this email that I accidentally sent from an unsubscribed address. ]
Hi,
I'm having a strange problem with a slow-running select query. The
query I use in production ends in LIMIT 1, and it runs very slowly.
But when I remove the LIMIT 1, the query runs quite quickly. This
behavior has stumped a couple smart DBAs.
Can anyone explain why such a slow plan is chosen when the LIMIT 1
is present? Is there anything I can do to speed this query up?
Thanks.
From what I know using an ORDER BY and a LIMIT can often prevent
*shortening* the query as it still needs to find all rows to perform the
order by before it limits.
That makes complete sense, of course.
The difference in plans eludes me.
production= EXPLAIN ANALYZE SELECT event_updates.*
FROM event_updates
INNER JOIN calendars ON event_updates.feed_id = calendars.id
INNER JOIN calendar_links ON calendars.id =
calendar_links.source_tracker_id
WHERE (calendar_links.calendar_group_id = 3640)
ORDER BY event_updates.id DESC
LIMIT 1;
Does removing the DESC from the order by give the same variation in plans?
Or is this only when using ORDER BY ... DESC LIMIT 1?
Except for using Index Scan instead of Index Scan Backward, the plan
is the same with ORDER BY ... or ORDER BY ... ASC as with ORDER BY
... DESC. In case you're wondering what would happen without the
ORDER BY at all:
production= EXPLAIN SELECT event_updates.*
FROM event_updates
INNER JOIN calendars ON event_updates.feed_id = calendars.id
INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id
WHERE (calendar_links.calendar_group_id = 3640)
LIMIT 1;
QUERY
PLAN
Limit (cost=0.00..3.95 rows=1 width=2752)
- Nested Loop (cost=0.00..36992.38 rows=9362 width=2752)
- Nested Loop (cost=0.00..16.55 rows=1 width=8)
- Index Scan using
index_calendar_links_on_calendar_group_id_and_source_tracker_id on
calendar_links (cost=0.00..8.27 rows=1 width=4)
Index Cond: (calendar_group_id = 3640)
- Index Scan using harvest_trackers_pkey on calendars
(cost=0.00..8.27 rows=1 width=4)
Index Cond: (calendars.id =
calendar_links.source_tracker_id)
- Index Scan using index_event_updates_on_feed_id_and_feed_type on
event_updates (cost=0.00..36858.50 rows=9386 width=2752)
Index Cond: (event_updates.feed_id = calendars.id)
(9 rows)
One thing that interests me is try -
EXPLAIN ANALYZE SELECT * FROM (
SELECT event_updates.*
FROM event_updates
INNER JOIN calendars ON event_updates.feed_id = calendars.id
INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id
WHERE (calendar_links.calendar_group_id = 3640)
ORDER BY event_updates.id DESC
) AS foo
LIMIT 1;
That's an interesting experiment. Here are the results:
QUERY PLAN
Limit (cost=16.55..91.74 rows=1 width=6027) (actual
time=490709.355..490709.357 rows=1 loops=1)
- Nested Loop (cost=16.55..703794.95 rows=9361 width=2752) (actual
time=490709.352..490709.352 rows=1 loops=1)
Join Filter: (event_updates.feed_id = calendars.id)
- Index Scan Backward using event_updates_pkey on event_updates
(cost=0.00..500211.53 rows=9047416 width=2752) (actual time=0.222..469082.071
rows=5251179 loops=1)
- Materialize (cost=16.55..16.56 rows=1 width=8) (actual
time=0.001..0.002 rows=1 loops=5251179)
- Nested Loop (cost=0.00..16.55 rows=1 width=8) (actual
time=0.240..0.246 rows=1 loops=1)
- Index Scan using
index_calendar_links_on_calendar_group_id_and_source_tracker_id on
calendar_links (cost=0.00..8.27 rows=1 width=4) (actual time=0.108..0.109
rows=1 loops=1)
Index Cond: (calendar_group_id = 3640)
- Index Scan using harvest_trackers_pkey on calendars
(cost=0.00..8.27 rows=1 width=4) (actual time=0.127..0.129 rows=1 loops=1)
Index Cond: (calendars.id =
calendar_links.source_tracker_id)
Total runtime: 490709.576 ms
(11 rows)
That is, no real change in the performance.
Still stumped