Re: [PERFORM] Odd planner choice?

2004-10-08 Thread Gary Doades
On 8 Oct 2004 at 16:04, Tom Lane wrote:

> "Gary Doades" <[EMAIL PROTECTED]> writes:
> > If I remove the redundant clauses, the planner now estimates 1000 rows returned 
> > from 
> > the table, not unreasonable since it has no statistics. But *why* in that case, 
> > with *more* 
> > estimated rows does it choose to materialize that table (26 rows) 573416 times!!!
> 
> It isn't.  It's materializing that once and scanning it 573416 times,
> once for each row in the outer relation.  And this is not a bad plan
> given the estimates.  If it had stuck to what you call the good plan,
> and there *had* been 1000 rows in the temp table, that plan would have
> run 1000 times longer than it did.
> 
> As a general rule, if your complaint is that you get a bad plan for an
> unanalyzed table, the response is going to be "so analyze the table".
> 

The problem is in this case is that if I *do* analyse the table I *always* get the bad 
plan. 
Bad in this case meaning the query takes a lot longer. I'm still not sure why it can't 
choose the better plan by just reading the 26 rows once and index scan the 
booking_plan table 26 times (as in the "good" plan).

OK, with 1000 row estimate I can see that index scanning 1000 times into the 
booking_plan table would take some time, but the even if planner estimates 5 rows it 
still 
produces the same slow query.

If I analyze the table it then knows there are 26 rows and therefore always goes slow.

This is why I am not analyzing this table, to fool the planner into thinking there is 
only 
one row and produce a much faster access plan. Not ideal I know.

Just using one redundant clause I now get:

select distinct b.staff_id  from staff_booking b, booking_plan bp, t_search_reqt_dates 
rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13

QUERY PLAN
Unique  (cost=50885.97..50921.37 rows=462 width=4) (actual 
time=35231.000..35241.000 rows=110 loops=1)
  ->  Sort  (cost=50885.97..50903.67 rows=35397 width=4) (actual 
time=35231.000..35241.000 rows=2173 loops=1)
Sort Key: b.staff_id
->  Hash Join  (cost=44951.32..50351.07 rows=35397 width=4) (actual 
time=34530.000..35231.000 rows=2173 loops=1)
  Hash Cond: ("outer".booking_id = "inner".booking_id)
  ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 
width=8) 
(actual time=0.000..351.000 rows=197877 loops=1)
  ->  Hash  (cost=44933.62..44933.62 rows=35397 width=4) (actual 
time=34530.000..34530.000 rows=0 loops=1)
->  Nested Loop  (cost=15.50..44933.62 rows=35397 width=4) (actual 
time=8342.000..34520.000 rows=2173 loops=1)
  Join Filter: (("inner".datetime_from <= "outer".datetime_to) 
AND 
("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >= 
("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
  ->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 
rows=573416 
width=24) (actual time=0.000..1053.000 rows=573416 loops=1)
  ->  Materialize  (cost=15.50..15.53 rows=5 width=20) (actual 
time=0.001..0.019 rows=26 loops=573416)
->  Seq Scan on t_search_reqt_dates rd  
(cost=0.00..15.50 rows=5 
width=20) (actual time=0.000..0.000 rows=26 loops=1)
  Filter: (search_id = 13)
Total runtime: 35241.000 ms

If this is the only answer for now, then fair enough I will just have to do more 
testing.

Regards,
Gary.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Odd planner choice?

2004-10-08 Thread Tom Lane
"Gary Doades" <[EMAIL PROTECTED]> writes:
> If I remove the redundant clauses, the planner now estimates 1000 rows returned from 
> the table, not unreasonable since it has no statistics. But *why* in that case, with 
> *more* 
> estimated rows does it choose to materialize that table (26 rows) 573416 times!!!

It isn't.  It's materializing that once and scanning it 573416 times,
once for each row in the outer relation.  And this is not a bad plan
given the estimates.  If it had stuck to what you call the good plan,
and there *had* been 1000 rows in the temp table, that plan would have
run 1000 times longer than it did.

As a general rule, if your complaint is that you get a bad plan for an
unanalyzed table, the response is going to be "so analyze the table".

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Odd planner choice?

2004-10-08 Thread Gary Doades
Oops, forgot to mention:

PostgreSQL 8.0 beta 2 Windows.

Thanks,
Gary.

On 8 Oct 2004 at 20:32, Gary Doades wrote:

> 
> I'm looking at one of my standard queries and have encountered some strange 
> performance 
> problems.
> 
> The query below is to search for vacant staff member date/time slots given a series 
> of target 
> date/times. The data contained in the booking_plan/staff_booking tables contain the 
> existing 
> bookings, so I'm looking for "clashing" bookings to eliminate them from a candidate 
> list.
> 
> The query is:
> 
> select distinct b.staff_id from staff_booking b, booking_plan bp, 
> t_search_reqt_dates rd
> where b.booking_id = bp.booking_id
> and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
> AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
> and rd.search_id = 13
> and rd.reqt_date between '2004-09-30' AND '2005-12-31'
> 
> There are 197877 rows in staff_booking, 573416 rows in booking_plan and 26 rows in 
> t_search_reqt_dates.
> 
> The t_search reqt_dates is a temp table created and populated with the target 
> date/times. The 
> temp table is *not* analyzed, all the other are.
> 
> The "good" query plan comes with the criteria on search_id and reqt_date given in 
> the last two 
> lines in the query. Note all the rows in the temp table are search_id = 13 and all 
> the rows are 
> between the two dates, so the whole 26 rows is always pulled out.
> 
> In this case it is doing exactly what I expect. It is pulling all rows from the 
> t_search_reqt_dates 
> table, then pulling the relevant records from the booking_plan and then hashing with 
> staff_booking. Excellent performance.
> 
> The problem is I don't need the clauses for search_id and reqt_dates as the whole 
> table is 
> always read anyway. The good plan is because the planner thinks just one row will be 
> read from 
> t_search_reqt_dates.
> 
> If I remove the redundant clauses, the planner now estimates 1000 rows returned from 
> the table, 
> not unreasonable since it has no statistics. But *why* in that case, with *more* 
> estimated rows 
> does it choose to materialize that table (26 rows) 573416 times!!!
> 
> whenever it estimates more than one row it chooses the bad plan.
> 
> I really want to remove the redundant clauses, but I can't. If I analyse the table, 
> then it knows 
> there are 26 rows and chooses the "bad" plan whatever I do.
> 
> Any ideas???
> 
> Cheers,
> Gary.
> 
>  Plans for above query 
> 
> Good QUERY PLAN
> Unique (cost=15440.83..15447.91 rows=462 width=4) (actual time=1342.000..1342.000 
> rows=110 loops=1)
>  -> Sort (cost=15440.83..15444.37 rows=7081 width=4) (actual time=1342.000..1342.000 
> rows=2173 loops=1)
>  Sort Key: b.staff_id
>  -> Hash Join (cost=10784.66..15350.26 rows=7081 width=4) (actual 
> time=601.000..1331.000 rows=2173 loops=1)
>  Hash Cond: ("outer".booking_id = "inner".booking_id)
>  -> Seq Scan on staff_booking b (cost=0.00..4233.39 rows=197877 width=8) (actual 
> time=0.000..400.000 rows=197877 loops=1)
>  -> Hash (cost=10781.12..10781.12 rows=7080 width=4) (actual 
> time=591.000..591.000 rows=0 loops=1)
>  -> Nested Loop (cost=0.00..10781.12 rows=7080 width=4) (actual 
> time=10.000..581.000 rows=2173 loops=1)
>  Join Filter: (("outer".datetime_from <= "inner".datetime_to) AND 
> ("outer".datetime_to >= "inner".datetime_from))
>  -> Seq Scan on t_search_reqt_dates rd (cost=0.00..16.50 rows=1 width=20) 
> (actual time=0.000..0.000 rows=26 loops=1)
>  Filter: ((search_id = 13) AND (reqt_date >= '2004-09-30'::date) AND 
> (reqt_date <= '2005-12-31'::date))
>  -> Index Scan using booking_plan_idx2 on booking_plan bp 
> (cost=0.00..10254.91 rows=63713 width=24) (actual time=0.000..11.538 rows=5871 
> loops=26)
>  Index Cond: ((bp.booking_date >= ("outer".reqt_date - 1)) AND 
> (bp.booking_date <= ("outer".reqt_date + 1)))
> Total runtime: 1342.000 ms
> 
> 
> Bad QUERY PLAN
> Unique (cost=7878387.29..7885466.50 rows=462 width=4) (actual 
> time=41980.000..41980.000 
> rows=110 loops=1)
>  -> Sort (cost=7878387.29..7881926.90 rows=7079211 width=4) (actual 
> time=41980.000..41980.000 rows=2173 loops=1)
>  Sort Key: b.staff_id
>  -> Nested Loop (cost=5314.32..7480762.73 rows=7079211 width=4) (actual 
> time=6579.000..41980.000 rows=2173 loops=1)
>  Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND 
> ("inner".datetime_to >= 
> "outer".datetime_from) AND ("outer".booking_date >= ("inner".reqt_date - 1)) AND 
> ("outer".booking_date <= ("inner".reqt_date + 1)))
>  -> Hash Join (cost=5299.32..26339.73 rows=573416 width=24) (actual 
> time=2413.000..7832.000 rows=573416 loops=1)
>  Hash Cond: ("outer".booking_id = "inner".booking_id)
>  -> Seq Scan on booking_plan bp (cost=0.00..7646.08 rows=573416 width=24) 
> (actual time=0.000..1201.000 rows=573416 loops=1)
>  -> Hash (cost=4233.39..4233.39 rows=197877 width=8) (actual 
> time=811.000..811.000 rows=0 loops=1)
>  -> Seq Scan

[PERFORM] Odd planner choice?

2004-10-08 Thread Gary Doades



I'm looking at one of my standard queries and have encountered some strange 
performance problems.


The query below is to search for vacant staff member date/time slots given a series of 
target date/times. The data contained in the booking_plan/staff_booking tables contain 
the existing bookings, so I'm looking for "clashing" bookings to eliminate them from a 
candidate list.


The query is:


select distinct b.staff_id  from staff_booking b, booking_plan bp, t_search_reqt_dates rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13
and rd.reqt_date between '2004-09-30' AND '2005-12-31'


There are 197877 rows in staff_booking, 573416 rows in booking_plan and 26 rows in 
t_search_reqt_dates.


The t_search reqt_dates is a temp table created and populated with the target 
date/times. The temp table is *not* analyzed, all the other are.


The "good" query plan comes with the criteria on search_id and reqt_date given in the 
last two lines in the query. Note all the rows in the temp table are search_id = 13 and all 
the rows are between the two dates, so the whole 26 rows is always pulled out.


In this case it is doing exactly what I expect. It is pulling all rows from the 
t_search_reqt_dates table, then pulling the relevant records from the booking_plan and 
then hashing with staff_booking. Excellent performance.


The problem is I don't need the clauses for search_id and reqt_dates as the whole table 
is always read anyway. The good plan is because the planner thinks just one row will be 
read from t_search_reqt_dates.


If I remove the redundant clauses, the planner now estimates 1000 rows returned from 
the table, not unreasonable since it has no statistics. But *why* in that case, with *more* 
estimated rows does it choose to materialize that table (26 rows) 573416 times!!!


whenever it estimates more than one row it chooses the bad plan.


I really want to remove the redundant clauses, but I can't. If I analyse the table, then it 
knows there are 26 rows and chooses the "bad" plan whatever I do.


Any ideas???


Cheers,
Gary.


 Plans for above query 


Good QUERY PLAN
Unique  (cost=15440.83..15447.91 rows=462 width=4) (actual time=1342.000..1342.000 
rows=110 loops=1)
  ->  Sort  (cost=15440.83..15444.37 rows=7081 width=4) (actual 
time=1342.000..1342.000 rows=2173 loops=1)
    Sort Key: b.staff_id
    ->  Hash Join  (cost=10784.66..15350.26 rows=7081 width=4) 
(actual 
time=601.000..1331.000 rows=2173 loops=1)
  Hash Cond: ("outer".booking_id 
= "inner".booking_id)
  ->  Seq Scan on staff_booking 
b  (cost=0.00..4233.39 rows=197877 width=8) 
(actual time=0.000..400.000 rows=197877 loops=1)
  ->  Hash  (cost=10781.12..10781.12 
rows=7080 width=4) (actual 
time=591.000..591.000 rows=0 loops=1)
    
->  Nested Loop  (cost=0.00..10781.12 rows=7080 width=4) (actual 
time=10.000..581.000 rows=2173 loops=1)
  
Join Filter: (("outer".datetime_from <= "inner".datetime_to) AND 
("outer".datetime_to >= "inner".datetime_from))
  
->  Seq Scan on t_search_reqt_dates rd  (cost=0.00..16.50 rows=1 
width=20) (actual time=0.000..0.000 rows=26 loops=1)
    
Filter: ((search_id = 13) AND (reqt_date >= '2004-09-30'::date) AND 
(reqt_date <= '2005-12-31'::date))
  
->  Index Scan using booking_plan_idx2 on booking_plan bp 
(cost=0.00..10254.91 rows=63713 width=24) (actual time=0.000..11.538 rows=5871 
loops=26)
    
Index Cond: ((bp.booking_date >= ("outer".reqt_date - 1)) AND 
(bp.booking_date <= ("outer".reqt_date + 1)))
Total runtime: 1342.000 ms




Bad QUERY PLAN
Unique  (cost=7878387.29..7885466.50 rows=462 width=4) (actual 
time=41980.000..41980.000 rows=110 loops=1)
  ->  Sort  (cost=7878387.29..7881926.90 rows=7079211 width=4) (actual 
time=41980.000..41980.000 rows=2173 loops=1)
    Sort Key: b.staff_id
    ->  Nested Loop  (cost=5314.32..7480762.73 rows=7079211 
width=4) (actual 
time=6579.000..41980.000 rows=2173 loops=1)
  Join Filter: (("inner".datetime_from 
<= "outer".datetime_to) AND 
("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >= 
("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
  ->  Hash Join  
(cost=5299.32..26339.73 rows=573416 width=24) (actual 
time=2413.000..7832.000 rows=573416 loops=1)
    
Hash Cond: ("outer".booking_id = "inner".booking_id)
    
->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 rows=573416 
width=24) (actual time=0.000..1201.000 rows=573416 loops=1)
    
->  Hash  (cost=4233.39..4233.39 rows=197877 width=8) (actual 
time=811.000..811.000 rows=0 loops=1)
 

Re: [PERFORM] odd planner choice

2004-03-26 Thread scott.marlowe
On Thu, 25 Mar 2004, Ara Anjargolian wrote:

> I've run into this odd planner choice which I don't quite understand.
> 
> I have two tables articles, users and
> articles.article_id and users.user_id are primary keys.
> 
> Insides articles there are two optional fields author_id1, author_id2
> which all reference users.user_id.
> 
> And now the plans:
> (by the way this is pg 7.4 and I set enable_seqscan to off).
> 
> jargol=# explain select user_id, first_names, last_name from articles, users
> where article_id = 5027 and (articles.author_id1 = users.user_id);
> QUERY PLAN
> 
> --
>  Nested Loop  (cost=0.00..4.04 rows=1 width=26)
>->  Index Scan using articles_pk on articles  (cost=0.00..2.01 rows=1
> width=4)
>  Index Cond: (article_id = 5027)
>->  Index Scan using users_pk on users  (cost=0.00..2.01 rows=1 width=26)
>  Index Cond: ("outer".author_id1 = users.user_id)
> (5 rows)
> 
> jargol=# explain select user_id, first_names, last_name from articles, users
> where article_id = 5027 and (articles.author_id1 = users.user_id or
> articles.author_id2 = users.user_id);
> QUERY PLAN
> 
> ---
>  Nested Loop  (cost=1.00..10003.11 rows=2 width=26)
>Join Filter: (("outer".author_id1 = "inner".user_id) OR
> ("outer".author_id2 = "inner".user_id))
>->  Index Scan using articles_pk on articles  (cost=0.00..2.01 rows=1
> width=8)
>  Index Cond: (article_id = 5027)
>->  Seq Scan on users  (cost=1.00..10001.04 rows=4 width=26)
> (5 rows)
> 
> Why does it think it MUST do a seq-scan in the second case? users.user_id is
> a primary key,
> so shouldn't it behave exactly as in the first case?
> 
> Any enlightenment on this problem will be much appreciated.

Are articles.author_id1 and users.user_id the same type?  Have you tried 
casting one to the other's type if they're different?


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] odd planner choice

2004-03-26 Thread Tom Lane
"Ara Anjargolian" <[EMAIL PROTECTED]> writes:
> jargol=# explain select user_id, first_names, last_name from articles, users
> where article_id = 5027 and (articles.author_id1 = users.user_id or
> articles.author_id2 = users.user_id);

> Why does it think it MUST do a seq-scan in the second case?

There's no support for generating an OR indexscan in the context of a
join.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] odd planner choice

2004-03-26 Thread Ara Anjargolian
I've run into this odd planner choice which I don't quite understand.

I have two tables articles, users and
articles.article_id and users.user_id are primary keys.

Insides articles there are two optional fields author_id1, author_id2
which all reference users.user_id.

And now the plans:
(by the way this is pg 7.4 and I set enable_seqscan to off).

jargol=# explain select user_id, first_names, last_name from articles, users
where article_id = 5027 and (articles.author_id1 = users.user_id);
QUERY PLAN

--
 Nested Loop  (cost=0.00..4.04 rows=1 width=26)
   ->  Index Scan using articles_pk on articles  (cost=0.00..2.01 rows=1
width=4)
 Index Cond: (article_id = 5027)
   ->  Index Scan using users_pk on users  (cost=0.00..2.01 rows=1 width=26)
 Index Cond: ("outer".author_id1 = users.user_id)
(5 rows)

jargol=# explain select user_id, first_names, last_name from articles, users
where article_id = 5027 and (articles.author_id1 = users.user_id or
articles.author_id2 = users.user_id);
QUERY PLAN

---
 Nested Loop  (cost=1.00..10003.11 rows=2 width=26)
   Join Filter: (("outer".author_id1 = "inner".user_id) OR
("outer".author_id2 = "inner".user_id))
   ->  Index Scan using articles_pk on articles  (cost=0.00..2.01 rows=1
width=8)
 Index Cond: (article_id = 5027)
   ->  Seq Scan on users  (cost=1.00..10001.04 rows=4 width=26)
(5 rows)

Why does it think it MUST do a seq-scan in the second case? users.user_id is
a primary key,
so shouldn't it behave exactly as in the first case?

Any enlightenment on this problem will be much appreciated.

thanks,
Ara Anjargolian


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings