Re: [PERFORM] Odd planner choice?

Gary Doades Fri, 08 Oct 2004 12:46:37 -0700

Oops, forgot to mention:

PostgreSQL 8.0 beta 2 Windows.


Thanks,
Gary.

On 8 Oct 2004 at 20:32, Gary Doades wrote:

> 
> I'm looking at one of my standard queries and have encountered some strange 
> performance 
> problems.
> 
> The query below is to search for vacant staff member date/time slots given a series 
> of target 
> date/times. The data contained in the booking_plan/staff_booking tables contain the 
> existing 
> bookings, so I'm looking for "clashing" bookings to eliminate them from a candidate 
> list.
> 
> The query is:
> 
> select distinct b.staff_id from staff_booking b, booking_plan bp, 
> t_search_reqt_dates rd
> where b.booking_id = bp.booking_id
> and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
> AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
> and rd.search_id = 13
> and rd.reqt_date between '2004-09-30' AND '2005-12-31'
> 
> There are 197877 rows in staff_booking, 573416 rows in booking_plan and 26 rows in 
> t_search_reqt_dates.
> 
> The t_search reqt_dates is a temp table created and populated with the target 
> date/times. The 
> temp table is *not* analyzed, all the other are.
> 
> The "good" query plan comes with the criteria on search_id and reqt_date given in 
> the last two 
> lines in the query. Note all the rows in the temp table are search_id = 13 and all 
> the rows are 
> between the two dates, so the whole 26 rows is always pulled out.
> 
> In this case it is doing exactly what I expect. It is pulling all rows from the 
> t_search_reqt_dates 
> table, then pulling the relevant records from the booking_plan and then hashing with 
> staff_booking. Excellent performance.
> 
> The problem is I don't need the clauses for search_id and reqt_dates as the whole 
> table is 
> always read anyway. The good plan is because the planner thinks just one row will be 
> read from 
> t_search_reqt_dates.
> 
> If I remove the redundant clauses, the planner now estimates 1000 rows returned from 
> the table, 
> not unreasonable since it has no statistics. But *why* in that case, with *more* 
> estimated rows 
> does it choose to materialize that table (26 rows) 573416 times!!!
> 
> whenever it estimates more than one row it chooses the bad plan.
> 
> I really want to remove the redundant clauses, but I can't. If I analyse the table, 
> then it knows 
> there are 26 rows and chooses the "bad" plan whatever I do.
> 
> Any ideas???
> 
> Cheers,
> Gary.
> 
> -------------------- Plans for above query ------------------------
> 
> Good QUERY PLAN
> Unique (cost=15440.83..15447.91 rows=462 width=4) (actual time=1342.000..1342.000 
> rows=110 loops=1)
>  -> Sort (cost=15440.83..15444.37 rows=7081 width=4) (actual time=1342.000..1342.000 
> rows=2173 loops=1)
>  Sort Key: b.staff_id
>  -> Hash Join (cost=10784.66..15350.26 rows=7081 width=4) (actual 
> time=601.000..1331.000 rows=2173 loops=1)
>  Hash Cond: ("outer".booking_id = "inner".booking_id)
>  -> Seq Scan on staff_booking b (cost=0.00..4233.39 rows=197877 width=8) (actual 
> time=0.000..400.000 rows=197877 loops=1)
>  -> Hash (cost=10781.12..10781.12 rows=7080 width=4) (actual 
> time=591.000..591.000 rows=0 loops=1)
>  -> Nested Loop (cost=0.00..10781.12 rows=7080 width=4) (actual 
> time=10.000..581.000 rows=2173 loops=1)
>  Join Filter: (("outer".datetime_from <= "inner".datetime_to) AND 
> ("outer".datetime_to >= "inner".datetime_from))
>  -> Seq Scan on t_search_reqt_dates rd (cost=0.00..16.50 rows=1 width=20) 
> (actual time=0.000..0.000 rows=26 loops=1)
>  Filter: ((search_id = 13) AND (reqt_date >= '2004-09-30'::date) AND 
> (reqt_date <= '2005-12-31'::date))
>  -> Index Scan using booking_plan_idx2 on booking_plan bp 
> (cost=0.00..10254.91 rows=63713 width=24) (actual time=0.000..11.538 rows=5871 
> loops=26)
>  Index Cond: ((bp.booking_date >= ("outer".reqt_date - 1)) AND 
> (bp.booking_date <= ("outer".reqt_date + 1)))
> Total runtime: 1342.000 ms
> 
> 
> Bad QUERY PLAN
> Unique (cost=7878387.29..7885466.50 rows=462 width=4) (actual 
> time=41980.000..41980.000 
> rows=110 loops=1)
>  -> Sort (cost=7878387.29..7881926.90 rows=7079211 width=4) (actual 
> time=41980.000..41980.000 rows=2173 loops=1)
>  Sort Key: b.staff_id
>  -> Nested Loop (cost=5314.32..7480762.73 rows=7079211 width=4) (actual 
> time=6579.000..41980.000 rows=2173 loops=1)
>  Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND 
> ("inner".datetime_to >= 
> "outer".datetime_from) AND ("outer".booking_date >= ("inner".reqt_date - 1)) AND 
> ("outer".booking_date <= ("inner".reqt_date + 1)))
>  -> Hash Join (cost=5299.32..26339.73 rows=573416 width=24) (actual 
> time=2413.000..7832.000 rows=573416 loops=1)
>  Hash Cond: ("outer".booking_id = "inner".booking_id)
>  -> Seq Scan on booking_plan bp (cost=0.00..7646.08 rows=573416 width=24) 
> (actual time=0.000..1201.000 rows=573416 loops=1)
>  -> Hash (cost=4233.39..4233.39 rows=197877 width=8) (actual 
> time=811.000..811.000 rows=0 loops=1)
>  -> Seq Scan on staff_booking b (cost=0.00..4233.39 rows=197877 width=8) 
> (actual time=0.000..430.000 rows=197877 loops=1)
>  -> Materialize (cost=15.00..20.00 rows=1000 width=20) (actual time=0.001..0.019 
> rows=26 loops=573416)
>  -> Seq Scan on t_search_reqt_dates rd (cost=0.00..15.00 rows=1000 width=20) 
> (actual time=0.000..0.000 rows=26 loops=1)
> Total runtime: 41980.000 ms
> 



---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Re: [PERFORM] Odd planner choice?

Reply via email to