Re: [PERFORM] enable_XXX options

2005-04-01 Thread Tom Lane
Peterson, Bjorn [EMAIL PROTECTED] writes:
 I have a query in my application that takes an unreasonable amount of time
 to complete (4.5 hours execution time).   After reviewing the EXPLAIN and
 EXPLAIN ANALYZE output for that and similar queries, my colleagues and I
 determined that turning off the enable_nestloop option might help - we
 noticed dramatic speed improvements for that specific query after doing so
 (2 minutes execution time).  I was warned not to mess with the enable_XXX
 options in a production environment, but does anyone see any problem with
 turning off the enable_nestloop option right before executing my query and
 turning it back on afterwards?

That's what it's there for ... but it would be useful to look into why
the planner gets it so wrong without that hint.  Could we see EXPLAIN
ANALYZE both ways?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] enable_XXX options

2005-04-01 Thread Tom Lane
Peterson, Bjorn [EMAIL PROTECTED] writes:
 That's what it's there for ... but it would be useful to look into why
 the planner gets it so wrong without that hint.  Could we see EXPLAIN
 ANALYZE both ways?

 Below is my query and the output of EXPLAIN - I was not able to run EXPLAIN
 ANALYZE, as the query never completes unless we turn enable_nestloop off:

Well, when the point is to find out why the planner's estimates don't
match reality, it's difficult to learn anything by looking only at the
estimates and not at reality.

Given what you say about the table sizes, the planner's preferred plan
looks somewhat reasonable.  I think the weak spot is the assumption that
this index check will be fast:

   -  Index Scan using Attendance_pkey on
 Attendance a  (cost=0.00..4.37 rows=1 width=14)
 Index Cond: ((a.attendance_date = now())
 AND (a.attendance_type = 1) AND (outer.course_id = a.time_slot) AND
 (a.user_id = outer.user_id))

and the reason this seems like a weak spot is that the plan implies that
you made attendance_date be the first column in the index.  At least for
this query, it'd be far better for attendance_date to be the last
column, so that the info for any one user_id is bunched together in the
index.  For that matter I'd bet that attendance_type shouldn't be the
highest part of the key either --- either course_id or user_id should
probably be the leading key, depending on what sorts of queries you do.
It wouldn't matter for this query, but you should look to see if you
have other queries that select on only one of the two.

If you have both equalities and inequalities in an index condition, you
always want the equalities to be on the higher-order keys.  Otherwise
the scan will involve wasted scanning over index entries that match
only some of the conditions.  (Think about the ordering of a multicolumn
index to see why this is so.)  In this particular case I think the thing
will be scanning almost the whole index every time :-(

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster