Re: [PERFORM] OFFSET impact on Performance???

2005-01-26 Thread David Brown
Although larger offsets have some effect, your real problem is the sort 
(of 42693 rows).

Try:
SELECT r.id_order
FROM report r
WHERE r.id_order IN
  (SELECT id
  FROM orders
  WHERE id_status = 6
  ORDER BY 1
  LIMIT 10 OFFSET 1000)
ORDER BY 1
The subquery doesn't *have* to sort because the table is already ordered 
on the primary key.
You can still add a join to orders outside the subselect without 
significant cost.

Incidentally, I don't know how you got the first plan - it should 
include a sort as well.

Andrei Bintintan wrote:
> explain analyze
> SELECT o.id
> FROM report r
> INNER JOIN orders o ON o.id=r.id_order AND o.id_status=6
> ORDER BY 1 LIMIT 10 OFFSET 10
> 
> Limit  (cost=44.37..88.75 rows=10 width=4) (actual time=0.160..0.275 
rows=10 loops=1)
>   ->  Merge Join  (cost=0.00..182150.17 rows=41049 width=4) (actual 
time=0.041..0.260 rows=20 loops=1)
> Merge Cond: ("outer".id_order = "inner".id)
> ->  Index Scan using report_id_order_idx on report r  
(cost=0.00..157550.90 rows=42862 width=4) (actual time=0.018..0.075 
rows=20 loops=1)
> ->  Index Scan using orders_pkey on orders o  
(cost=0.00..24127.04 rows=42501 width=4) (actual time=0.013..0.078 
rows=20 loops=1)
>   Filter: (id_status = 6)
> Total runtime: 0.373 ms
>
> explain analyze
> SELECT o.id
> FROM report r
> INNER JOIN orders o ON o.id=r.id_order AND o.id_status=6
> ORDER BY 1 LIMIT 10 OFFSET 100
> Limit  (cost=31216.85..31216.85 rows=1 width=4) (actual 
time=1168.152..1168.152 rows=0 loops=1)
>   ->  Sort  (cost=31114.23..31216.85 rows=41049 width=4) (actual 
time=1121.769..1152.246 rows=42693 loops=1)
> Sort Key: o.id
> ->  Hash Join  (cost=2329.99..27684.03 rows=41049 width=4) 
(actual time=441.879..925.498 rows=42693 loops=1)
>   Hash Cond: ("outer".id_order = "inner".id)
>   ->  Seq Scan on report r  (cost=0.00..23860.62 
rows=42862 width=4) (actual time=38.634..366.035 rows=42864 loops=1)
>   ->  Hash  (cost=2077.74..2077.74 rows=42501 width=4) 
(actual time=140.200..140.200 rows=0 loops=1)
> ->  Seq Scan on orders o  (cost=0.00..2077.74 
rows=42501 width=4) (actual time=0.059..96.890 rows=42693 loops=1)
>   Filter: (id_status = 6)
> Total runtime: 1170.586 ms

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] OFFSET impact on Performance???

2005-01-20 Thread Merlin Moncure
Andrei:
Hi to all, 

I have the following 2 examples. Now, regarding on the offset if it is 
small(10) or big(>5) what is the impact on the performance of the query?? I 
noticed that if I return more data's(columns) or if I make more joins then the 
query runs even slower if the OFFSET is bigger. How can I 
somehow improve the performance on this? 

Merlin:
Offset is not suitable for traversal of large data sets.  Better not use it at 
all!

There are many ways to deal with this problem, the two most direct being the 
view approach and the cursor approach.

cursor approach:
declare report_order with hold cursor for select * from report r, order o [...]
Remember to close the cursor when you're done.  Now fetch time is proportional 
to the number of rows fetched, and should be very fast.  The major drawback to 
this approach is that cursors in postgres (currently) are always insensitive, 
so that record changes after you declare the cursor from other users are not 
visible to you.  If this is a big deal, try the view approach.

view approach:
create view report_order as select * from report r, order o [...]

and this:
prepare fetch_from_report_order(numeric, numeric, int4) as
select * from report_order where order_id >= $1 and
(order_id > $1 or report_id > $2)
order by order_id, report_id limit $3;

fetch next 1000 records from report_order:
execute fetch_from_report_order(o, f, 1000);  o and f being the last key values 
you fetched (pass in zeroes to start it off).

This is not quite as fast as the cursor approach (but it will be when we get a 
proper row constructor, heh), but it more flexible in that it is sensitive to 
changes from other users.  This is more of a 'permanent' binding whereas cursor 
is a binding around a particular task.

Good luck!
Merlin



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] OFFSET impact on Performance???

2005-01-20 Thread Andrei Bintintan




Hi to all, I have the following 2 examples. Now, 
regarding on the offset if it is small(10) or big(>5) what is the impact 
on the performance of the query?? I noticed that if I return more 
data's(columns) or if I make more joins then the query runs even 
slower if the OFFSET is bigger. How can I somehow improve the performance on 
this? 
Best regards, Andy.
explain analyzeSELECT 
o.idFROM 
report r INNER JOIN orders o ON 
o.id=r.id_order AND o.id_status=6ORDER BY 1 LIMIT 10 OFFSET 10
 
Limit  (cost=44.37..88.75 rows=10 width=4) 
(actual time=0.160..0.275 rows=10 loops=1)  ->  Merge 
Join  (cost=0.00..182150.17 rows=41049 width=4) (actual time=0.041..0.260 
rows=20 loops=1)    Merge Cond: 
("outer".id_order = "inner".id)    
->  Index Scan using report_id_order_idx on report r  
(cost=0.00..157550.90 rows=42862 width=4) (actual time=0.018..0.075 rows=20 
loops=1)    ->  Index Scan 
using orders_pkey on orders o  (cost=0.00..24127.04 rows=42501 width=4) 
(actual time=0.013..0.078 rows=20 
loops=1)  
Filter: (id_status = 6)Total runtime: 0.373 ms

explain analyzeSELECT 
o.idFROM 
report r INNER JOIN orders o ON 
o.id=r.id_order AND o.id_status=6ORDER BY 1 LIMIT 10 OFFSET 100Limit  (cost=31216.85..31216.85 rows=1 width=4) (actual 
time=1168.152..1168.152 rows=0 loops=1)  ->  Sort  
(cost=31114.23..31216.85 rows=41049 width=4) (actual time=1121.769..1152.246 
rows=42693 loops=1)    Sort Key: 
o.id    ->  Hash Join  
(cost=2329.99..27684.03 rows=41049 width=4) (actual time=441.879..925.498 
rows=42693 
loops=1)  
Hash Cond: ("outer".id_order = 
"inner".id)  
->  Seq Scan on report r  (cost=0.00..23860.62 rows=42862 width=4) 
(actual time=38.634..366.035 rows=42864 
loops=1)  
->  Hash  (cost=2077.74..2077.74 rows=42501 width=4) (actual 
time=140.200..140.200 rows=0 
loops=1)    
->  Seq Scan on orders o  (cost=0.00..2077.74 rows=42501 width=4) 
(actual time=0.059..96.890 rows=42693 
loops=1)  
Filter: (id_status = 6)Total runtime: 1170.586 
ms