Re: [PERFORM] Odd planner choice?

2004-10-08 Thread Gary Doades
Oops, forgot to mention:

PostgreSQL 8.0 beta 2 Windows.

Thanks,
Gary.

On 8 Oct 2004 at 20:32, Gary Doades wrote:

 
 I'm looking at one of my standard queries and have encountered some strange 
 performance 
 problems.
 
 The query below is to search for vacant staff member date/time slots given a series 
 of target 
 date/times. The data contained in the booking_plan/staff_booking tables contain the 
 existing 
 bookings, so I'm looking for clashing bookings to eliminate them from a candidate 
 list.
 
 The query is:
 
 select distinct b.staff_id from staff_booking b, booking_plan bp, 
 t_search_reqt_dates rd
 where b.booking_id = bp.booking_id
 and rd.datetime_from = bp.datetime_to and rd.datetime_to = bp.datetime_from
 AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
 and rd.search_id = 13
 and rd.reqt_date between '2004-09-30' AND '2005-12-31'
 
 There are 197877 rows in staff_booking, 573416 rows in booking_plan and 26 rows in 
 t_search_reqt_dates.
 
 The t_search reqt_dates is a temp table created and populated with the target 
 date/times. The 
 temp table is *not* analyzed, all the other are.
 
 The good query plan comes with the criteria on search_id and reqt_date given in 
 the last two 
 lines in the query. Note all the rows in the temp table are search_id = 13 and all 
 the rows are 
 between the two dates, so the whole 26 rows is always pulled out.
 
 In this case it is doing exactly what I expect. It is pulling all rows from the 
 t_search_reqt_dates 
 table, then pulling the relevant records from the booking_plan and then hashing with 
 staff_booking. Excellent performance.
 
 The problem is I don't need the clauses for search_id and reqt_dates as the whole 
 table is 
 always read anyway. The good plan is because the planner thinks just one row will be 
 read from 
 t_search_reqt_dates.
 
 If I remove the redundant clauses, the planner now estimates 1000 rows returned from 
 the table, 
 not unreasonable since it has no statistics. But *why* in that case, with *more* 
 estimated rows 
 does it choose to materialize that table (26 rows) 573416 times!!!
 
 whenever it estimates more than one row it chooses the bad plan.
 
 I really want to remove the redundant clauses, but I can't. If I analyse the table, 
 then it knows 
 there are 26 rows and chooses the bad plan whatever I do.
 
 Any ideas???
 
 Cheers,
 Gary.
 
  Plans for above query 
 
 Good QUERY PLAN
 Unique (cost=15440.83..15447.91 rows=462 width=4) (actual time=1342.000..1342.000 
 rows=110 loops=1)
  - Sort (cost=15440.83..15444.37 rows=7081 width=4) (actual time=1342.000..1342.000 
 rows=2173 loops=1)
  Sort Key: b.staff_id
  - Hash Join (cost=10784.66..15350.26 rows=7081 width=4) (actual 
 time=601.000..1331.000 rows=2173 loops=1)
  Hash Cond: (outer.booking_id = inner.booking_id)
  - Seq Scan on staff_booking b (cost=0.00..4233.39 rows=197877 width=8) (actual 
 time=0.000..400.000 rows=197877 loops=1)
  - Hash (cost=10781.12..10781.12 rows=7080 width=4) (actual 
 time=591.000..591.000 rows=0 loops=1)
  - Nested Loop (cost=0.00..10781.12 rows=7080 width=4) (actual 
 time=10.000..581.000 rows=2173 loops=1)
  Join Filter: ((outer.datetime_from = inner.datetime_to) AND 
 (outer.datetime_to = inner.datetime_from))
  - Seq Scan on t_search_reqt_dates rd (cost=0.00..16.50 rows=1 width=20) 
 (actual time=0.000..0.000 rows=26 loops=1)
  Filter: ((search_id = 13) AND (reqt_date = '2004-09-30'::date) AND 
 (reqt_date = '2005-12-31'::date))
  - Index Scan using booking_plan_idx2 on booking_plan bp 
 (cost=0.00..10254.91 rows=63713 width=24) (actual time=0.000..11.538 rows=5871 
 loops=26)
  Index Cond: ((bp.booking_date = (outer.reqt_date - 1)) AND 
 (bp.booking_date = (outer.reqt_date + 1)))
 Total runtime: 1342.000 ms
 
 
 Bad QUERY PLAN
 Unique (cost=7878387.29..7885466.50 rows=462 width=4) (actual 
 time=41980.000..41980.000 
 rows=110 loops=1)
  - Sort (cost=7878387.29..7881926.90 rows=7079211 width=4) (actual 
 time=41980.000..41980.000 rows=2173 loops=1)
  Sort Key: b.staff_id
  - Nested Loop (cost=5314.32..7480762.73 rows=7079211 width=4) (actual 
 time=6579.000..41980.000 rows=2173 loops=1)
  Join Filter: ((inner.datetime_from = outer.datetime_to) AND 
 (inner.datetime_to = 
 outer.datetime_from) AND (outer.booking_date = (inner.reqt_date - 1)) AND 
 (outer.booking_date = (inner.reqt_date + 1)))
  - Hash Join (cost=5299.32..26339.73 rows=573416 width=24) (actual 
 time=2413.000..7832.000 rows=573416 loops=1)
  Hash Cond: (outer.booking_id = inner.booking_id)
  - Seq Scan on booking_plan bp (cost=0.00..7646.08 rows=573416 width=24) 
 (actual time=0.000..1201.000 rows=573416 loops=1)
  - Hash (cost=4233.39..4233.39 rows=197877 width=8) (actual 
 time=811.000..811.000 rows=0 loops=1)
  - Seq Scan on staff_booking b (cost=0.00..4233.39 rows=197877 width=8) 
 (actual time=0.000..430.000 rows=197877 loops=1)
  - Materialize (cost=15.00..20.00 rows=1000 width=20) (actual 

Re: [PERFORM] Odd planner choice?

2004-10-08 Thread Gary Doades
On 8 Oct 2004 at 16:04, Tom Lane wrote:

 Gary Doades [EMAIL PROTECTED] writes:
  If I remove the redundant clauses, the planner now estimates 1000 rows returned 
  from 
  the table, not unreasonable since it has no statistics. But *why* in that case, 
  with *more* 
  estimated rows does it choose to materialize that table (26 rows) 573416 times!!!
 
 It isn't.  It's materializing that once and scanning it 573416 times,
 once for each row in the outer relation.  And this is not a bad plan
 given the estimates.  If it had stuck to what you call the good plan,
 and there *had* been 1000 rows in the temp table, that plan would have
 run 1000 times longer than it did.
 
 As a general rule, if your complaint is that you get a bad plan for an
 unanalyzed table, the response is going to be so analyze the table.
 

The problem is in this case is that if I *do* analyse the table I *always* get the bad 
plan. 
Bad in this case meaning the query takes a lot longer. I'm still not sure why it can't 
choose the better plan by just reading the 26 rows once and index scan the 
booking_plan table 26 times (as in the good plan).

OK, with 1000 row estimate I can see that index scanning 1000 times into the 
booking_plan table would take some time, but the even if planner estimates 5 rows it 
still 
produces the same slow query.

If I analyze the table it then knows there are 26 rows and therefore always goes slow.

This is why I am not analyzing this table, to fool the planner into thinking there is 
only 
one row and produce a much faster access plan. Not ideal I know.

Just using one redundant clause I now get:

select distinct b.staff_id  from staff_booking b, booking_plan bp, t_search_reqt_dates 
rd
where b.booking_id = bp.booking_id
and rd.datetime_from = bp.datetime_to and rd.datetime_to = bp.datetime_from
AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13

QUERY PLAN
Unique  (cost=50885.97..50921.37 rows=462 width=4) (actual 
time=35231.000..35241.000 rows=110 loops=1)
  -  Sort  (cost=50885.97..50903.67 rows=35397 width=4) (actual 
time=35231.000..35241.000 rows=2173 loops=1)
Sort Key: b.staff_id
-  Hash Join  (cost=44951.32..50351.07 rows=35397 width=4) (actual 
time=34530.000..35231.000 rows=2173 loops=1)
  Hash Cond: (outer.booking_id = inner.booking_id)
  -  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 
width=8) 
(actual time=0.000..351.000 rows=197877 loops=1)
  -  Hash  (cost=44933.62..44933.62 rows=35397 width=4) (actual 
time=34530.000..34530.000 rows=0 loops=1)
-  Nested Loop  (cost=15.50..44933.62 rows=35397 width=4) (actual 
time=8342.000..34520.000 rows=2173 loops=1)
  Join Filter: ((inner.datetime_from = outer.datetime_to) 
AND 
(inner.datetime_to = outer.datetime_from) AND (outer.booking_date = 
(inner.reqt_date - 1)) AND (outer.booking_date = (inner.reqt_date + 1)))
  -  Seq Scan on booking_plan bp  (cost=0.00..7646.08 
rows=573416 
width=24) (actual time=0.000..1053.000 rows=573416 loops=1)
  -  Materialize  (cost=15.50..15.53 rows=5 width=20) (actual 
time=0.001..0.019 rows=26 loops=573416)
-  Seq Scan on t_search_reqt_dates rd  
(cost=0.00..15.50 rows=5 
width=20) (actual time=0.000..0.000 rows=26 loops=1)
  Filter: (search_id = 13)
Total runtime: 35241.000 ms

If this is the only answer for now, then fair enough I will just have to do more 
testing.

Regards,
Gary.


---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] odd planner choice

2004-03-26 Thread Ara Anjargolian
I've run into this odd planner choice which I don't quite understand.

I have two tables articles, users and
articles.article_id and users.user_id are primary keys.

Insides articles there are two optional fields author_id1, author_id2
which all reference users.user_id.

And now the plans:
(by the way this is pg 7.4 and I set enable_seqscan to off).

jargol=# explain select user_id, first_names, last_name from articles, users
where article_id = 5027 and (articles.author_id1 = users.user_id);
QUERY PLAN

--
 Nested Loop  (cost=0.00..4.04 rows=1 width=26)
   -  Index Scan using articles_pk on articles  (cost=0.00..2.01 rows=1
width=4)
 Index Cond: (article_id = 5027)
   -  Index Scan using users_pk on users  (cost=0.00..2.01 rows=1 width=26)
 Index Cond: (outer.author_id1 = users.user_id)
(5 rows)

jargol=# explain select user_id, first_names, last_name from articles, users
where article_id = 5027 and (articles.author_id1 = users.user_id or
articles.author_id2 = users.user_id);
QUERY PLAN

---
 Nested Loop  (cost=1.00..10003.11 rows=2 width=26)
   Join Filter: ((outer.author_id1 = inner.user_id) OR
(outer.author_id2 = inner.user_id))
   -  Index Scan using articles_pk on articles  (cost=0.00..2.01 rows=1
width=8)
 Index Cond: (article_id = 5027)
   -  Seq Scan on users  (cost=1.00..10001.04 rows=4 width=26)
(5 rows)

Why does it think it MUST do a seq-scan in the second case? users.user_id is
a primary key,
so shouldn't it behave exactly as in the first case?

Any enlightenment on this problem will be much appreciated.

thanks,
Ara Anjargolian


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] odd planner choice

2004-03-26 Thread Tom Lane
Ara Anjargolian [EMAIL PROTECTED] writes:
 jargol=# explain select user_id, first_names, last_name from articles, users
 where article_id = 5027 and (articles.author_id1 = users.user_id or
 articles.author_id2 = users.user_id);

 Why does it think it MUST do a seq-scan in the second case?

There's no support for generating an OR indexscan in the context of a
join.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] odd planner choice

2004-03-26 Thread scott.marlowe
On Thu, 25 Mar 2004, Ara Anjargolian wrote:

 I've run into this odd planner choice which I don't quite understand.
 
 I have two tables articles, users and
 articles.article_id and users.user_id are primary keys.
 
 Insides articles there are two optional fields author_id1, author_id2
 which all reference users.user_id.
 
 And now the plans:
 (by the way this is pg 7.4 and I set enable_seqscan to off).
 
 jargol=# explain select user_id, first_names, last_name from articles, users
 where article_id = 5027 and (articles.author_id1 = users.user_id);
 QUERY PLAN
 
 --
  Nested Loop  (cost=0.00..4.04 rows=1 width=26)
-  Index Scan using articles_pk on articles  (cost=0.00..2.01 rows=1
 width=4)
  Index Cond: (article_id = 5027)
-  Index Scan using users_pk on users  (cost=0.00..2.01 rows=1 width=26)
  Index Cond: (outer.author_id1 = users.user_id)
 (5 rows)
 
 jargol=# explain select user_id, first_names, last_name from articles, users
 where article_id = 5027 and (articles.author_id1 = users.user_id or
 articles.author_id2 = users.user_id);
 QUERY PLAN
 
 ---
  Nested Loop  (cost=1.00..10003.11 rows=2 width=26)
Join Filter: ((outer.author_id1 = inner.user_id) OR
 (outer.author_id2 = inner.user_id))
-  Index Scan using articles_pk on articles  (cost=0.00..2.01 rows=1
 width=8)
  Index Cond: (article_id = 5027)
-  Seq Scan on users  (cost=1.00..10001.04 rows=4 width=26)
 (5 rows)
 
 Why does it think it MUST do a seq-scan in the second case? users.user_id is
 a primary key,
 so shouldn't it behave exactly as in the first case?
 
 Any enlightenment on this problem will be much appreciated.

Are articles.author_id1 and users.user_id the same type?  Have you tried 
casting one to the other's type if they're different?


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org