Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-02-02 Thread Sean Davis
On Jan 26, 2005, at 5:36 AM, Leeuw van der, Tim wrote:
Hi,
> What you could do is create a table containing all the fields from
> your SELECT, plus a per-session unique ID. Then you can store the
> query results in there, and use SELECT with OFFSET / LIMIT on that
> table. The WHERE clause for this temp-results table only needs to
> contain the per-session unique id.

This is what I do, but I use two columns for indexing the original 
query, a user_id (not session-id) and an index to the "query_id" that 
is unique within user.  This "query_id" is a foreign key to another 
table that describes the query (often just a name).  I allow the user 
only a fixed number of "stored" queries and recycle after hitting the 
maximum.  You can timestamp your queries so that when you recycle you 
drop the oldest one first.  If you don't need multiple stored query 
results, then using the user_id is probably adequate (assuming the user 
is not logged on in several locations simultaneously).
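For concreteness, here is a minimal sketch of such a cache table and a page
fetch; all names are hypothetical, and "data" stands for the table holding
the real rows:

CREATE TABLE query_cache (
    user_id   INTEGER NOT NULL,
    query_id  INTEGER NOT NULL,   -- FK to the table describing the query
    row_num   INTEGER NOT NULL,   -- position in the original result set
    result_id INTEGER NOT NULL,   -- id of the matching row in "data"
    PRIMARY KEY (user_id, query_id, row_num)
);

-- rows 281..300 (page 15 at 20 rows/page) of stored query 3 for user 1
SELECT d.*
  FROM query_cache c
  JOIN data d ON d.id = c.result_id
 WHERE c.user_id = 1 AND c.query_id = 3
   AND c.row_num BETWEEN 281 AND 300
 ORDER BY c.row_num;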

> This of course gives you a new problem: cleaning stale data out of the
> temp-results table. And another new problem is that users will not see
> new data appear on their screen until somehow the query is re-run (...
> but that might even be desirable, actually, depending on how your
> users do their work and what their work is).

See above.  The query refresh issue remains.
> And of course better performance cannot be guaranteed until you try it.
For the standard operating procedure of perform query===>view results, 
I have found this to be a nice system.  The user is accustomed to 
queries taking a bit of time to perform, but then wants to be able to 
manipulate and view data rather quickly; this paradigm is pretty well 
served by making a separate table of results, particularly if the 
original query is costly.


> Would such a scheme give you any hope of improved performance, or
> would it be too much of a nightmare?

This question still applies.
Sean
-Original Message-
From: [EMAIL PROTECTED] on behalf of Andrei 
Bintintan
Sent: Wed 1/26/2005 11:11 AM
To: [EMAIL PROTECTED]; Greg Stark
Cc: Richard Huxton; pgsql-sql@postgresql.org; 
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???

The problem still stays open.
The thing is that I have about 20 - 30 clients that are using that SQL
query where the offset and limit are involved. So, I cannot create a temp
table, because that means that I'll have to make a temp table for each
session... which is a very bad idea. Cursors pose much the same problem.
In my application the WHERE conditions can be very different for each
user (session).

The only solution that I see at the moment is to work on the query, or to
write a more complex WHERE clause to limit the results output. So there is
no replacement for OFFSET/LIMIT.

Best regards,
Andy.



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-02-01 Thread PFC

> As I read the docs, a temp table doesn't solve our problem, as it does
> not persist between sessions.  With a web page there is no guarantee
> that you will receive the same connection between requests, so a temp
> table doesn't solve the problem.  It looks like you either have to
> create a real table (which is undesirable because it has to be
> physically synced, and TTFB will be very poor) or create an application
> tier in between the web tier and the database tier to allow data to
> persist between requests tied to a unique session id.
> Looks like the solution to this problem is not RDBMS, IMHO.
> Alex Turner
> NetEconomist
	Did you miss the proposal to store arrays of the found rows' ids in a
"cache" table? Is 4 bytes per result row still too large?

	If it's still too large, you can still implement the same cache in the
filesystem!
	If you want to fetch 100,000 rows containing just an integer, in my case
(psycopg) it's a lot faster to use an array aggregate. Time to get the
data in the application (including query):

select id from temp
=> 849 ms
select int_array_aggregate(id) as ids from temp
=> 300 ms
	So you can always fetch the whole query results (in the form of an
integer per row) and cache it in the filesystem. It won't work if you have
10 million rows though!



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Kevin Brown
PFC wrote:
> 
> Supposing your searches display results which are rows coming from one  
> specific table, you could create a cache table :
> 
> search_id serial primary key
> index_n   position of this result in the global result set
> result_id id of the resulting row.
> 
> Then, making a search with 50k results would INSERT INTO cache ... SELECT  
> FROM search query, with a way to set the index_n column, which can be a  
> temporary sequence...
> 
> Then to display your pages, SELECT from your table with index_n BETWEEN so  
> and so, and join to the data table.

This is a nice way of doing a fast materialized view.  But it looked
to me like one of the requirements of the original poster is that the
result set being displayed has to be "current" as of the page display
time.  If inserts to the original table have been committed between
the time the current page was displayed and "now", the next page view
is supposed to show them.  That basically means rerunning the query
that was used to build the cache table.

But perhaps the original poster is willing to live with the idea that
new rows won't show up in the result set, as long as updates show up
(because the cache table is just a fancy index) and deletes "work"
(because the join against the data table will show only rows that are
common between both).


-- 
Kevin Brown   [EMAIL PROTECTED]



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Greg Stark

"Merlin Moncure" <[EMAIL PROTECTED]> writes:

> what about 
> CREATE AGGREGATE array_accum (
> sfunc = array_append,
> basetype = anyelement,
> stype = anyarray,
> initcond = '{}'
> );

Huh, that is faster. It's only 14x slower than the C implementation.

For completeness, here are the fastest times I get after repeating a few times
each:

 13.97 ms   contrib/intagg C implementation
194.76 ms   aggregate using array_append
723.15 ms   aggregate with SQL state function

-- 
greg




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread PFC
> What you want is called UNNEST. It didn't get done in time for 8.0. But if
> what you have is an array of integers the int_array_enum() function I quoted
> in the other post is basically that.

	Yes, I used it, thanks. That's what I wanted. The query plans are good.

> You don't really need the int_array_aggregate function any more. You can
> write your own aggregate using the new array operators:
> Of course it's about 50x slower than the C implementation though:

	Heh. I'll keep using int_array_aggregate ;)

	Have a nice day.


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Merlin Moncure
Greg Stark wrote:
 
> test=> create or replace function array_push (anyarray, anyelement)
> returns anyarray as 'select $1 || $2' language sql immutable strict;
> CREATE FUNCTION
> test=> create aggregate array_aggregate (basetype=anyelement,
> sfunc=array_push, stype=anyarray, initcond = '{}');
> CREATE AGGREGATE

what about 
CREATE AGGREGATE array_accum (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
?
Merlin



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Greg Stark

PFC <[EMAIL PROTECTED]> writes:

>   intset(x) seems to be like array[x] ?
>   Actually what I want is the opposite. 

What you want is called UNNEST. It didn't get done in time for 8.0. But if
what you have is an array of integers the int_array_enum() function I quoted
in the other post is basically that.

> Note that int_agg_final_array() crashes my postgres, see my message in
> psql/general

You don't really need the int_array_aggregate function any more. You can write
your own aggregate using the new array operators:

test=> create or replace function array_push (anyarray, anyelement) returns 
anyarray as 'select $1 || $2' language sql immutable strict;
CREATE FUNCTION
test=> create aggregate array_aggregate (basetype=anyelement, sfunc=array_push, 
stype=anyarray, initcond = '{}');
CREATE AGGREGATE

Of course it's about 50x slower than the C implementation though:

test=> select icount(array_aggregate (foo_id)) from foo;
 icount 
--------
  15127
(1 row)

Time: 688.419 ms

test=> select icount(int_array_aggregate (foo_id)) from foo;
 icount 
--------
  15127
(1 row)

Time: 13.680 ms

(And no, that's not a caching artifact; the whole table is cached for both
trials)

-- 
greg




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread PFC

> > 	However, it seems that integer && integer[] does not exist :
> Try intset(id) && int[]. intset is an undocumented function :)
> I'm going to add intset() to README.
>
> SELECT * FROM table WHERE id && int[]

	Mm. intset(x) seems to be like array[x] ?
	Actually what I want is the opposite. I have a btree index on an integer  
column ; I wanted to use this index and not create a functional index...  
which is why I wanted to use =ANY(). If I had a gist index on an integer  
array column, I would of course use what you suggest, but this is not the  
case...

Anyway I think the SRF function solution works well, I like it.
	Note that int_agg_final_array() crashes my postgres, see my message in  
psql/general

Regards,
Pierre


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Merlin Moncure
> Actually, you can if you assume you can "temporarily materialize" that
> view.
>
> Then, you use a join on my_query to pull the bits you want:
> 
>   select [big table.details] from [big table],
>   [select * from my_query order by [something] offset 280 limit 20]
>   where [join criteria between my_query and big table]
>   order by [something];
> 

I think that's a pretty reasonable compromise between a true
materialized solution and brute force limit/offset.  Still, the
performance of a snapshot materialized view indexed around your query
simply can't be beat, although you have to pay a hefty price in
complexity, maintenance, and coherency.
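As a hedged illustration of that trade-off (all table and column names
invented), a "snapshot" materialized view is just an ordinary table plus an
index, refreshed by brute force:

create table order_snapshot as
select id, customer_id, total from orders where status = 'open';

create index order_snapshot_idx on order_snapshot (customer_id, id);

-- paging against the snapshot is cheap and stable
select * from order_snapshot
where customer_id = 42 order by id limit 20;

-- the coherency price: rebuild the snapshot to see new data
truncate order_snapshot;
insert into order_snapshot
select id, customer_id, total from orders where status = 'open';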

Merlin



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Oleg Bartunov
On Thu, 27 Jan 2005, PFC wrote:

for example, 
http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
see OPERATIONS and EXAMPLE USAGE:

	Thanks, I already know this documentation and have used intarray 
before (I find it absolutely fabulous in the right application, it has a 
great potential for getting out of tight situations which would involve huge 
unmanageable pivot or attributes tables). Its only drawback is that the gist 
index creation time is slow and sometimes just... takes forever until the 
disk is full.
> 	However, it seems that integer && integer[] does not exist :

Try intset(id) && int[]. intset is an undocumented function :)
I'm going to add intset() to README.

SELECT * FROM table WHERE id && int[]
explain analyze select * from temp t where id && ( 
'{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[] );
ERROR:  operator does not exist: integer && integer[]
HINT:  No operator matches the given name and argument types.
You may need to add explicit type casts.

	I have already used this type of intarray indexes, but you have to 
create a special gist index with the gist__int_ops on the column, and the 
column has to be an array. In my case the column is just a SERIAL PRIMARY 
KEY, and should stay this way, and I don't want to create a functional index 
in array[id] just for this feature ; so I guess I can't use the && operator. 
Am I mistaken ? My index is the standard btree here.
	It would be nice if =ANY() could use the index just like 
IN does ; besides, at planning time the length of the array is known, which 
makes it behave just like IN().

	So I'll use either an EXECUTE'd plpgsql-generated query (IN ()) , 
which I don't like because it's a kludge ; or this other solution which I 
find more elegant :

CREATE OR REPLACE FUNCTION tools.array_srf( INTEGER[] )
   RETURNS SETOF INTEGER
   RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql AS
$$
DECLARE
	_data	ALIAS FOR $1;
	_i		INTEGER;
BEGIN
	FOR _i IN 1..icount(_data) LOOP
		RETURN NEXT _data[_i];
	END LOOP;
	RETURN;
END;
$$;

---
explain analyze select * from temp t where id =ANY( 
'{1,2,3,4,5,6,7,8,9,10,11,12}' );
Seq Scan on "temp" t  (cost=0.00..5165.52 rows=65536 width=8) (actual 
time=0.030..173.319 rows=12 loops=1)
  Filter: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[]))
Total runtime: 173.391 ms

---
explain analyze select * from temp t where id IN( 1,2,3,4,5,6,7,8,9,10,11,12 
);
Index Scan using temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey, 
temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,
temp_pkey on "temp" t  (cost=0.00..36.49 rows=12 width=8) (actual 
time=0.046..0.137 rows=12 loops=1)
  Index Cond: ((id = 1) OR (id = 2) OR (id = 3) OR (id = 4) OR (id = 5) OR 
(id = 6) OR (id = 7) OR (id = 8) OR (id = 9) OR (id = 10) OR (id = 11) OR (id 
= 12))
Total runtime: 0.292 ms

---
explain analyze select * from temp t where id in (select * from 
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}'));
Nested Loop  (cost=15.00..620.20 rows=200 width=8) (actual time=0.211..0.368 
rows=12 loops=1)
  ->  HashAggregate  (cost=15.00..15.00 rows=200 width=4) (actual 
time=0.160..0.173 rows=12 loops=1)
->  Function Scan on array_srf  (cost=0.00..12.50 rows=1000 width=4) 
(actual time=0.127..0.139 rows=12 loops=1)
  ->  Index Scan using temp_pkey on "temp" t  (cost=0.00..3.01 rows=1 
width=8) (actual time=0.010..0.012 rows=1 loops=12)
Index Cond: (t.id = "outer".array_srf)
Total runtime: 0.494 ms

---
explain analyze select * from temp t, (select * from 
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}')) foo where foo.array_srf = 
t.id;

Merge Join  (cost=62.33..2824.80 rows=1000 width=12) (actual 
time=0.215..0.286 rows=12 loops=1)
  Merge Cond: ("outer".id = "inner".array_srf)
  ->  Index Scan using temp_pkey on "temp" t  (cost=0.00..2419.79 
rows=131072 width=8) (actual time=0.032..0.056 rows=13 loops=1)
  ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=0.169..0.173 
rows=12 loops=1)
Sort Key: array_srf.array_srf
->  Function Scan on array_srf  (cost=0.00..12.50 rows=1000 width=4) 
(actual time=0.127..0.139 rows=12 loops=1)
Total runtime: 0.391 ms

---
Note that the meaning is different ; the IN removes duplicates in the array 
but the join does not.


Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread PFC

for example,  
http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
see OPERATIONS and EXAMPLE USAGE:

	Thanks, I already know this documentation and have used intarray before  
(I find it absolutely fabulous in the right application, it has a great  
potential for getting out of tight situations which would involve huge  
unmanageable pivot or attributes tables). Its only drawback is that the  
gist index creation time is slow and sometimes just... takes forever until  
the disk is full.
	However, it seems that integer && integer[] does not exist :

SELECT * FROM table WHERE id && int[]
explain analyze select * from temp t where id &&  
( '{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[] );
ERROR:  operator does not exist: integer && integer[]
HINT:  No operator matches the given name and argument types.
You may need to add explicit type casts.

	I have already used this type of intarray indexes, but you have to create  
a special gist index with the gist__int_ops on the column, and the column  
has to be an array. In my case the column is just a SERIAL PRIMARY KEY,  
and should stay this way, and I don't want to create a functional index in  
array[id] just for this feature ; so I guess I can't use the && operator.  
Am I mistaken ? My index is the standard btree here.
	
	It would be nice if =ANY() could use the index just like IN does ; 
besides, at planning time the length of the array is known, which makes it 
behave just like IN().

	So I'll use either an EXECUTE'd plpgsql-generated query (IN ()) ,  
which I don't like because it's a kludge ; or this other solution which I  
find more elegant :

CREATE OR REPLACE FUNCTION tools.array_srf( INTEGER[] )
    RETURNS SETOF INTEGER
    RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql AS
$$
DECLARE
	_data	ALIAS FOR $1;
	_i		INTEGER;
BEGIN
	FOR _i IN 1..icount(_data) LOOP
		RETURN NEXT _data[_i];
	END LOOP;
	RETURN;
END;
$$;

---
explain analyze select * from temp t where id  
=ANY( '{1,2,3,4,5,6,7,8,9,10,11,12}' );
 Seq Scan on "temp" t  (cost=0.00..5165.52 rows=65536 width=8) (actual  
time=0.030..173.319 rows=12 loops=1)
   Filter: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[]))
 Total runtime: 173.391 ms

---
explain analyze select * from temp t where id  
IN( 1,2,3,4,5,6,7,8,9,10,11,12 );
 Index Scan using temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,  
temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,
temp_pkey on "temp" t  (cost=0.00..36.49 rows=12 width=8) (actual  
time=0.046..0.137 rows=12 loops=1)
   Index Cond: ((id = 1) OR (id = 2) OR (id = 3) OR (id = 4) OR (id = 5)  
OR (id = 6) OR (id = 7) OR (id = 8) OR (id = 9) OR (id = 10) OR (id = 11)  
OR (id = 12))
 Total runtime: 0.292 ms

---
explain analyze select * from temp t where id in (select * from  
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}'));
 Nested Loop  (cost=15.00..620.20 rows=200 width=8) (actual  
time=0.211..0.368 rows=12 loops=1)
   ->  HashAggregate  (cost=15.00..15.00 rows=200 width=4) (actual  
time=0.160..0.173 rows=12 loops=1)
 ->  Function Scan on array_srf  (cost=0.00..12.50 rows=1000  
width=4) (actual time=0.127..0.139 rows=12 loops=1)
   ->  Index Scan using temp_pkey on "temp" t  (cost=0.00..3.01 rows=1  
width=8) (actual time=0.010..0.012 rows=1 loops=12)
 Index Cond: (t.id = "outer".array_srf)
 Total runtime: 0.494 ms

---
explain analyze select * from temp t, (select * from  
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}')) foo where foo.array_srf =  
t.id;

 Merge Join  (cost=62.33..2824.80 rows=1000 width=12) (actual  
time=0.215..0.286 rows=12 loops=1)
   Merge Cond: ("outer".id = "inner".array_srf)
   ->  Index Scan using temp_pkey on "temp" t  (cost=0.00..2419.79  
rows=131072 width=8) (actual time=0.032..0.056 rows=13 loops=1)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual  
time=0.169..0.173 rows=12 loops=1)
 Sort Key: array_srf.array_srf
 ->  Function Scan on array_srf  (cost=0.00..12.50 rows=1000  
width=4) (actual time=0.127..0.139 rows=12 loops=1)
 Total runtime: 0.391 ms

---
Note that the meaning is different ; the IN removes duplicates in the  
array but the join does not.




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Greg Stark
Oleg Bartunov  writes:

> On Thu, 27 Jan 2005, PFC wrote:
> 
> >
> > > > beware that SELECT * FROM table WHERE id =ANY( array ) won't use an 
> > > > index,

> > > contrib/intarray provides index access to such queries.
> >
> > Can you provide an example of such a query ? I've looked at the operators
> > for intarray without finding it.
> 
> for example,
> http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
> see OPERATIONS and EXAMPLE USAGE:
> 
> SELECT * FROM table WHERE id && int[]

I don't think that helps him. He wants the join to the *other* table to use an
index. It would be nice if the IN plan used an index for =ANY(array) just like
it does for =ANY(subquery) but I'm not sure the statistics are there. It might
not be a bad plan to just assume arrays are never going to be millions of
elements long though. 

There is a way to achieve this using "int_array_enum" from another contrib
module, "intagg". My current project uses something similar to this except the
arrays are precomputed. When I went to 7.4 the new array support obsoleted
everything else I was using from the "intagg" and "array" contrib modules
except for this one instance where intagg is still necessary.

It is a bit awkward but it works:

slo=> EXPLAIN 
   SELECT * 
 FROM foo 
 JOIN (SELECT int_array_enum(foo_ids) AS foo_id 
 FROM cache 
WHERE cache_id = 1) AS x
USING (foo_id) ;
                                QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..6.40 rows=1 width=726)
   ->  Subquery Scan x  (cost=0.00..3.18 rows=1 width=4)
 ->  Index Scan using idx_cache on cache  (cost=0.00..3.17 rows=1 
width=30)
   Index Cond: (cache_id = 1)
   ->  Index Scan using foo_pkey on foo  (cost=0.00..3.21 rows=1 width=726)
 Index Cond: (foo.foo_id = "outer".foo_id)
(6 rows)


(query and plan edited for clarity and for paranoia purposes)


-- 
greg




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Oleg Bartunov
On Thu, 27 Jan 2005, PFC wrote:

> 	The best part is that you can skip the LIMIT/OFFSET entirely if you
> put page numbers in your cache table while inserting into it, via a
> temporary sequence or something. Retrieving the results will then be very
> fast, but beware that SELECT * FROM table WHERE id =ANY( array ) won't use
> an index, so
> contrib/intarray provides index access to such queries.
> 	Can you provide an example of such a query ? I've looked at the
> operators for intarray without finding it.
for example, 
http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
see OPERATIONS and EXAMPLE USAGE:

SELECT * FROM table WHERE id && int[]

Thanks.
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread PFC

> 	The best part is that you can skip the LIMIT/OFFSET entirely if you
> put page numbers in your cache table while inserting into it, via a
> temporary sequence or something. Retrieving the results will then be
> very fast, but beware that SELECT * FROM table WHERE id =ANY( array )
> won't use an index, so
> contrib/intarray provides index access to such queries.
	Can you provide an example of such a query ? I've looked at the operators  
for intarray without finding it.
	Thanks.



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread Oleg Bartunov
On Thu, 27 Jan 2005, PFC wrote:

> > That's a really good idea, just store a list of the sorted ids in the
> > temp table - small amount of data for insert... I like it!
> > Alex Turner
> > NetEconomist
>
> 	The best part is that you can skip the LIMIT/OFFSET entirely if you
> put page numbers in your cache table while inserting into it, via a temporary
> sequence or something. Retrieving the results will then be very fast, but
> beware that SELECT * FROM table WHERE id =ANY( array ) won't use an index, so

contrib/intarray provides index access to such queries.

> you'll have to trick the thing by generating a query with IN, or joining
> against a SRF returning the elements of the array one by one, which might be
> better.

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-27 Thread PFC

> That's a really good idea, just store a list of the sorted ids in the
> temp table - small amount of data for insert... I like it!
> Alex Turner
> NetEconomist
	The best part is that you can skip the LIMIT/OFFSET entirely if you put  
page numbers in your cache table while inserting into it, via a temporary  
sequence or something. Retrieving the results will then be very fast, but  
beware that SELECT * FROM table WHERE id =ANY( array ) won't use an index,  
so you'll have to trick the thing by generating a query with IN, or  
joining against a SRF returning the elements of the array one by one,  
which might be better.



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Christopher Browne
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] 
("Merlin Moncure") transmitted:
> Alex wrote:
>> How do you create a temporary view that has only a small subset of the
>> data from the DB in it?  (Links to docs are fine - I can read ;).  My
>> query isn't all that complex, and my number of records might be from
>> 10 to 2k depending on how I implement it.
>
> Well, you can't.  My point was that the traditional query/view
> approach is often more appropriate for these cases.

Actually, you can if you assume you can "temporarily materialize" that
view.

You take the initial query and materialize it into a temporary table
which can then be used to browse "detail."

Thus, suppose you've got a case where the selection criteria draw in
8000 objects/transactions, of which you only want to fit 20/page.

It's ugly and slow to process the 15th page, and you essentially
reprocess the whole set from scratch each time:

  select [details] from [big table] where [criteria]
order by [something]
offset 280 limit 20;

Instead, you might start out by doing:

  select [key fields] into temp table my_query
  from [big table] where [criteria];

  create index my_query_idx on my_query(interesting fields);

With 8000 records and 8 KB pages, the number of pages in the table will
roughly equal the number of bytes per record (8000/8192 is about 1), which
is probably pretty small.

Then, you use a join on my_query to pull the bits you want:

  select [big table.details] from [big table], 
  [select * from my_query order by [something] offset 280 limit 20]
  where [join criteria between my_query and big table]
  order by [something];

For this to be fast is predicated on my_query being compact, but that
should surely be so.

The big table may be 20 million records; for the result set to be even
vaguely browsable means that my_query ought to be relatively small so
you can pull subsets reasonably efficiently.
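To make the placeholders concrete, here is a hedged sketch with invented
names, using an "orders" table as the big table:

  select order_id, order_date into temp my_query
  from orders where status = 'open';

  create index my_query_idx on my_query(order_date);

  -- page 15, 20 rows per page
  select o.*
  from orders o,
   (select order_id from my_query
 order by order_date offset 280 limit 20) as page
  where o.order_id = page.order_id
  order by o.order_date;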

This actually has a merit over looking at a dynamic, possibly-changing
big table: you won't unexpectedly see the result set changing size.

This strikes me as a pretty slick way to handle "data warehouse-style"
browsing...
-- 
output = ("cbbrowne" "@" "gmail.com")
http://www.ntlug.org/~cbbrowne/oses.html
The first cup of coffee recapitulates phylogeny.



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Alex Turner
That's a really good idea, just store a list of the sorted ids in the
temp table - small amount of data for insert... I like it!

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 22:24:34 +0100, PFC <[EMAIL PROTECTED]> wrote:
> > The problem with this approach is TTFB (Time to first Byte).  The
> > initial query is very slow, but additional requests are fast.  In most
> > situations we do not want the user to have to wait a disproportionate
> > amount of time for the initial query.  If this is the first time using
> > the system this will be the impression that will stick with them.  I
> > guess we could experiment and see how much extra time creating a cache
> > table will take...
> 
> Do it on the second page then ;)
> 
> Seriously :
> - If you want to display the result count and page count, you'll need 
> to
> do the whole query anyway, so you might as well save the results.
> - inserting the result id's in a temp table one by one will be slow, 
> but
> you can do this :
> 
> select array_accum(id) from temp group by id/20 limit 3;
>array_accum
> ---
>   {1,2,4,8,16,17,9,18,19,5,10,11,3,6,12,13,7,14,15}
>   {32,33,34,35,36,37,38,39,20,21,22,23,24,25,26,27,28,29,30,31}
>   {40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59}
> 
> - a really big search of 131072 results :
> create table cache (id serial primary key, value integer[]);
> explain analyze insert into cache (value) select array_accum(id) from temp
> group by id/100;
>   Subquery Scan "*SELECT*"  (cost=14382.02..17986.50 rows=131072 width=32)
> (actual time=961.746..1446.630 rows=1311 loops=1)
> ->  GroupAggregate  (cost=14382.02..16020.42 rows=131072 width=4)
> (actual time=961.607..1423.803 rows=1311 loops=1)
>   ->  Sort  (cost=14382.02..14709.70 rows=131072 width=4) (actual
> time=961.181..1077.662 rows=131072 loops=1)
> Sort Key: (id / 100)
> ->  Seq Scan on "temp"  (cost=0.00..2216.40 rows=131072
> width=4) (actual time=0.032..291.652 rows=131072 loops=1)
>   Total runtime: 1493.304 ms
> 
> Note that the "SELECT..." part takes 1400 ms, and the INSERT part 
> takes
> the rest, which is really small. It's the sort which takes most of the
> time, but you'll be doing it anyway to get your results in order, so it
> comes free to you. This will generate 1000 pages with 100 results on each.
> If your searches yield say 1000 results it'll be perfectly fine and can
> target times in the sub-100 ms for caching the results (not counting the
> total query time of course !)
> 
> Using arrays is the key here, because inserting all the results as
> individual rows in the table is gonna be a whole lot slower !
> 
>



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread PFC
> The problem with this approach is TTFB (Time to first Byte).  The
> initial query is very slow, but additional requests are fast.  In most
> situations we do not want the user to have to wait a disproportionate
> amount of time for the initial query.  If this is the first time using
> the system this will be the impression that will stick with them.  I
> guess we could experiment and see how much extra time creating a cache
> table will take...
Do it on the second page then ;)
	Seriously :
	- If you want to display the result count and page count, you'll need to  
do the whole query anyway, so you might as well save the results.
	- inserting the result id's in a temp table one by one will be slow, but  
you can do this :

select array_accum(id) from temp group by id/20 limit 3;
  array_accum
---
 {1,2,4,8,16,17,9,18,19,5,10,11,3,6,12,13,7,14,15}
 {32,33,34,35,36,37,38,39,20,21,22,23,24,25,26,27,28,29,30,31}
 {40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59}
	- a really big search of 131072 results :
create table cache (id serial primary key, value integer[]);
explain analyze insert into cache (value) select array_accum(id) from temp  
group by id/100;
 Subquery Scan "*SELECT*"  (cost=14382.02..17986.50 rows=131072 width=32)  
(actual time=961.746..1446.630 rows=1311 loops=1)
   ->  GroupAggregate  (cost=14382.02..16020.42 rows=131072 width=4)  
(actual time=961.607..1423.803 rows=1311 loops=1)
 ->  Sort  (cost=14382.02..14709.70 rows=131072 width=4) (actual  
time=961.181..1077.662 rows=131072 loops=1)
   Sort Key: (id / 100)
   ->  Seq Scan on "temp"  (cost=0.00..2216.40 rows=131072  
width=4) (actual time=0.032..291.652 rows=131072 loops=1)
 Total runtime: 1493.304 ms

	Note that the "SELECT..." part takes 1400 ms, and the INSERT part takes  
the rest, which is really small. It's the sort which takes most of the  
time, but you'll be doing it anyway to get your results in order, so it  
comes free to you. This will generate 1000 pages with 100 results on each.  
If your searches yield say 1000 results it'll be perfectly fine and can  
target times in the sub-100 ms for caching the results (not counting the  
total query time of course !)

	Using arrays is the key here, because inserting all the results as  
individual rows in the table is gonna be a whole lot slower !





Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Greg Stark

Alex Turner <[EMAIL PROTECTED]> writes:

> The problem with this approach is TTFB (Time to first Byte).  The
> initial query is very slow, but additional requests are fast.  In most
> situations we do not want the user to have to wait a disproportionate
> amount of time for the initial query.  If this is the first time using
> the system this will be the impression that will stick with them.  I
> guess we could experiment and see how much extra time creating a cache
> table will take...

You could cheat and do queries with an offset of 0 directly but also start up
a background job to fetch the complete results and cache them. Queries with a
non-zero offset would have to wait until the complete cache is built. You have
to be careful about people arriving from bookmarks to non-zero offsets and
people hitting reload before the cache is finished being built.

As someone else suggested you could look into other systems for storing the
cache. If you don't need to join against other database tables and you don't
need the reliability of a database then there are faster solutions like
memcached for example. (The problem of joining against database tables is even
solvable, look up pgmemcached. No idea how it performs though.)

But I think you're running into a fundamental tension here. The feature you're
looking for: being able to jump around in an arbitrary non-indexed query
result set which can be arbitrarily large, requires a lot of work. All you can
do is shift around *when* that work is done. There's not going to be any way
to avoid doing the work entirely.

-- 
greg




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Merlin Moncure
> The problem with this approach is TTFB (Time to first Byte).  The
> initial query is very slow, but additional requests are fast.  In most
> situations we do not want the user to have to wait a disproportionate
> amount of time for the initial query.  If this is the first time using
> the system this will be the impression that will stick with them.  I
> guess we could experiment and see how much extra time creating a cache
> table will take...


Have you read this?
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Don't know your exact situation, but this is always worth considering in
those hard to optimize corner cases.  Moving this stuff into the
application space or 'middleware' is going to be a lot of pain and
aggravation.


Merlin






Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Alex Turner
The problem with this approach is TTFB (Time to first Byte).  The
initial query is very slow, but additional requests are fast.  In most
situations we do not want the user to have to wait a disproportionate
amount of time for the initial query.  If this is the first time using
the system this will be the impression that will stick with them.  I
guess we could experiment and see how much extra time creating a cache
table will take...

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 13:58:18 +0100, PFC <[EMAIL PROTECTED]> wrote:
> 
> Supposing your searches display results which are rows coming from one
> specific table, you could create a cache table :
> 
> search_id   serial primary key
> index_n position of this result in the global result set
> result_id   id of the resulting row.
> 
> Then, making a search with 50k results would INSERT INTO cache ... SELECT
>  FROM search query, with a way to set the index_n column, which can be a
> temporary sequence...
> 
> Then to display your pages, SELECT from your table with index_n BETWEEN so
> and so, and join to the data table.
> 
> If you're worried that it might take up too much space : store an integer
> array of result_id instead of just a result_id ; this way you insert fewer
> rows and save on disk space. Generate it with a custom aggregate... then
> just grab a row from this table, it contains all the id's of the rows to
> display.
> 
> 



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread PFC
Supposing your searches display results which are rows coming from one  
specific table, you could create a cache table :

search_id   serial primary key
index_n position of this result in the global result set
result_id   id of the resulting row.
Then, making a search with 50k results would INSERT INTO cache ... SELECT  
FROM search query, with a way to set the index_n column, which can be a  
temporary sequence...

Then to display your pages, SELECT from your table with index_n BETWEEN so  
and so, and join to the data table.

If you're worried that it might take up too much space : store an integer  
array of result_id instead of just a result_id ; this way you insert fewer  
rows and save on disk space. Generate it with a custom aggregate... then  
just grab a row from this table, it contains all the id's of the rows to  
display.
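	A hedged sketch of the scalar (one row per result) variant, with invented
names; the subselect guarantees that index_n follows the sort order, and
search_id here identifies the search rather than the individual row:

create temporary sequence result_pos;

insert into cache (search_id, index_n, result_id)
select 42, nextval('result_pos'), s.id
from (select id from data
  where score > 0          -- stand-in for the real search conditions
  order by score desc) s;

-- page 3, 20 results per page
select d.*
from cache c join data d on d.id = c.result_id
where c.search_id = 42 and c.index_n between 41 and 60
order by c.index_n;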




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Richard Huxton
Alex Turner wrote:
As I read the docs, a temp table doesn't solve our problem, as it does
not persist between sessions.  With a web page there is no guarantee
that you will receive the same connection between requests, so a temp
table doesn't solve the problem.  It looks like you either have to
create a real table (which is undesirable because it has to be
physically synced, and TTFB will be very poor) or create an application
tier in between the web tier and the database tier to allow data to
persist between requests tied to a unique session id.
Looks like the solution to this problem is not RDBMS, IMHO.
It's less the RDBMS than the web application. You're trying to mix a 
stateful setup (the application) with a stateless presentation layer 
(the web). If you're using PHP (which doesn't offer a "real" middle 
layer) you might want to look at memcached.

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Alex Turner
As I read the docs, a temp table doesn't solve our problem, as it does
not persist between sessions.  With a web page there is no guarantee
that you will receive the same connection between requests, so a temp
table doesn't solve the problem.  It looks like you either have to
create a real table (which is undesirable because it has to be
physically synced, and TTFB will be very poor) or create an application
tier in between the web tier and the database tier to allow data to
persist between requests tied to a unique session id.

Looks like the solution to this problem is not RDBMS, IMHO.

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 12:11:49 +0200, Andrei Bintintan <[EMAIL PROTECTED]> wrote:
> The problem still stays open.
> 
> The thing is that I have about 20 - 30 clients that are using that SQL query
> where the offset and limit are involved. So, I cannot create a temp table,
> because that means that I'll have to make a temp table for each session...
> which is a very bad idea. Cursors pose much the same problem. In my
> application the WHERE conditions can be very different for each
> user (session).
> 
> The only solution that I see at the moment is to work on the query, or to
> write a more complex WHERE clause to limit the results output. So there is
> no replacement for OFFSET/LIMIT.
> 
> Best regards,
> Andy.
> 
> 
> - Original Message -
> From: "Greg Stark" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Cc: "Richard Huxton" ; "Andrei Bintintan"
> <[EMAIL PROTECTED]>; ;
> 
> Sent: Tuesday, January 25, 2005 8:28 PM
> Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???
> 
> 
> >
> > Alex Turner <[EMAIL PROTECTED]> writes:
> >
> >> I am also very interested in this very question... Is there any way to
> >> declare a persistent cursor that remains open between pg sessions?
> >> This would be better than a temp table because you would not have to
> >> do the initial select and insert into a fresh table and incur those IO
> >> costs, which are often very heavy, and the reason why one would want
> >> to use a cursor.
> >
> > TANSTAAFL. How would such a persistent cursor be implemented if not by
> > building a temporary table somewhere behind the scenes?
> >
> > There could be some advantage if the data were stored in a temporary table
> > marked as not having to be WAL logged. Instead it could be automatically
> > cleared on every database start.
> >
> > --
> > greg
> >
> >
> 
>



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Leeuw van der, Tim
Hi,

What you could do is create a table containing all the fields from your SELECT, 
plus a per-session unique ID. Then you can store the query results in there, 
and use SELECT with OFFSET / LIMIT on that table. The WHERE clause for this 
temp-results table only needs to contain the per-session unique id.

This of course gives you a new problem: cleaning stale data out of the 
temp-results table. And another new problem is that users will not see new data 
appear on their screen until somehow the query is re-run (... but that might 
even be desirable, actually, depending on how your users do their work and what 
their work is).

And of course better performance cannot be guaranteed until you try it.


Would such a scheme give you any hope of improved performance, or would it be 
too much of a nightmare?

cheers,

--Tim




-Original Message-
From: [EMAIL PROTECTED] on behalf of Andrei Bintintan
Sent: Wed 1/26/2005 11:11 AM
To: [EMAIL PROTECTED]; Greg Stark
Cc: Richard Huxton; pgsql-sql@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???
 
The problem still stays open.

The thing is that I have about 20 - 30 clients that are using that SQL query 
where the offset and limit are involved. So, I cannot create a temp table, 
because that means that I'll have to make a temp table for each session... 
which is a very bad idea. Cursors pose much the same problem. In my 
application the WHERE conditions can be very different for each 
user (session).

The only solution that I see at the moment is to work on the query, or to 
write a more complex WHERE clause to limit the results output. So there is 
no replacement for OFFSET/LIMIT.

Best regards,
Andy.


- Original Message - 
From: "Greg Stark" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Richard Huxton" ; "Andrei Bintintan" 
<[EMAIL PROTECTED]>; ; 

Sent: Tuesday, January 25, 2005 8:28 PM
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???


>
> Alex Turner <[EMAIL PROTECTED]> writes:
>
>> I am also very interested in this very question... Is there any way to
>> declare a persistent cursor that remains open between pg sessions?
>> This would be better than a temp table because you would not have to
>> do the initial select and insert into a fresh table and incur those IO
>> costs, which are often very heavy, and the reason why one would want
>> to use a cursor.
>
> TANSTAAFL. How would such a persistent cursor be implemented if not by
> building a temporary table somewhere behind the scenes?
>
> There could be some advantage if the data were stored in a temporary table
> marked as not having to be WAL logged. Instead it could be automatically
> cleared on every database start.
>
> -- 
> greg
>
> 






Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Andrei Bintintan
The problem still stays open.
The thing is that I have about 20 - 30 clients that are using that SQL query 
where the offset and limit are involved. So, I cannot create a temp table, 
because that means that I'll have to make a temp table for each session... 
which is a very bad idea. Cursors pose much the same problem. In my 
application the WHERE conditions can be very different for each 
user (session).

The only solution that I see at the moment is to work on the query, or to 
write a more complex WHERE clause to limit the results output. So there is 
no replacement for OFFSET/LIMIT.

Best regards,
Andy.
- Original Message - 
From: "Greg Stark" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Richard Huxton" ; "Andrei Bintintan" 
<[EMAIL PROTECTED]>; ; 

Sent: Tuesday, January 25, 2005 8:28 PM
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???


Alex Turner <[EMAIL PROTECTED]> writes:
I am also very interested in this very question... Is there any way to
declare a persistent cursor that remains open between pg sessions?
This would be better than a temp table because you would not have to
do the initial select and insert into a fresh table and incur those IO
costs, which are often very heavy, and the reason why one would want
to use a cursor.
TANSTAAFL. How would such a persistent cursor be implemented if not by
building a temporary table somewhere behind the scenes?
There could be some advantage if the data were stored in a temporary table
marked as not having to be WAL logged. Instead it could be automatically
cleared on every database start.
--
greg




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-25 Thread Greg Stark

Alex Turner <[EMAIL PROTECTED]> writes:

> I am also very interested in this very question... Is there any way to
> declare a persistent cursor that remains open between pg sessions? 
> This would be better than a temp table because you would not have to
> do the initial select and insert into a fresh table and incur those IO
> costs, which are often very heavy, and the reason why one would want
> to use a cursor.

TANSTAAFL. How would such a persistent cursor be implemented if not by
building a temporary table somewhere behind the scenes?

There could be some advantage if the data were stored in a temporary table
marked as not having to be WAL logged. Instead it could be automatically
cleared on every database start.

-- 
greg




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-24 Thread Merlin Moncure
Alex wrote:
> How do you create a temporary view that has only a small subset of the
> data from the DB in it?  (Links to docs are fine - I can read ;).  My
> query isn't all that complex, and my number of records might be from
> 10 to 2k depending on how I implement it.

Well, you can't.  My point was that the traditional query/view approach
is often more appropriate for these cases.  

Cursors are really designed to provide an in-transaction working set.
Because of this, they provide the luxury of absolute addressing which is
normally impossible in SQL.  

Queries allow for relative addressing, in other words 'fetch me the next
c of x based on y'.  This is a good thing, because it forces the
application developer to consider changes that happen from other users
while browsing a dataset.  Applications that don't use transactions
should not provide any guarantees about the data in between queries like
the number of records matching a certain criteria.  This is a trap that
many developers fall into, especially when coming from flat file
databases that used to allow this.  This puts particularly nasty
constraints on web application developers who are unable to hold a
transaction between page refreshes.  However, this is just a variant of SQL
developer trap #2, which is that you are not supposed to hold a
transaction open waiting for user input.

In your particular case IMO what you really need is a materialized view.
Currently, it is possible to rig them up in a fashion with plgsql that
may or may not meet your requirements.  Given some careful thought,
mat-views can be used to solve all kinds of nasty performance related
issues (and it all boils down to performance, otherwise we'd all just
use limit/offset).  

Merlin




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-21 Thread Greg Stark

> > Now I read all the posts and I have some answers.
> > 
> > Yes, I have a web application. I HAVE to know exactly how many pages I have
> > and I have to allow the user to jump to a specific page (this is where I
> > used limit and offset). We have this feature and I cannot take it out.

I'm afraid you have a problem then. The only way postgres can know exactly how
many pages and allow users to jump to a specific point for an arbitrary query
is by doing what OFFSET and LIMIT does. 

There are ways to optimize this but they'll be lots of work. And they'll only
amount to moving around when the work is done. The work of gathering all the
records from the query will still have to be done sometime.

If the queries are relatively static you could preprocess the data so you have
all the results in a table with a sequential id. Then you can get the maximum
and jump around in the table using an index all you want.

Otherwise you could consider performing the queries on demand and storing them
in a temporary table. Then fetch the actual records for the page from the
temporary table again using an index on a sequential id to jump around. This
might make the actual performing of the initial query much slower though since
you have to wait for the entire query to be performed and the records stored.
You'll also have to deal with vacuuming this table aggressively.
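A hedged sketch of the preprocessed variant (names invented; the subselect
guarantees the sequence numbers the rows in sort order):

create sequence result_seq;

create table results as
select nextval('result_seq') as n, q.*
from (select * from big_table
   where category = 'news'   -- stand-in for the (relatively static) criteria
   order by created_at) q;

create unique index results_n_idx on results (n);

select max(n) from results;                         -- total rows => page count
select * from results where n between 281 and 300;  -- jump straight to page 15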


-- 
greg




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-21 Thread Merlin Moncure
> Now I read all the posts and I have some answers.
> 
> Yes, I have a web application.
> I HAVE to know exactly how many pages I have and I have to allow the
user
> to
> jump to a specific page(this is where I used limit and offset). We
have
> this
> feature and I cannot take it out.

If your working set is small, say a couple hundred records at the most
(web form or such), limit/offset may be ok.  However you are already
paying double because you are extracting the # of records matching your
where clause, yes?  Also, this # can change while the user is browsing,
heh.

IOW, your application code is writing expensive checks that the database
has to cash.

> >> > SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
> Now this solution looks very fast, but I cannot implement it, because I
> cannot jump from page 1 to an arbitrary page, only to page 2, because with
> this approach I only know where page 1 ended. And we have some really
> complicated where's, and about 10 tables are involved in the sql query.
> About the CURSOR I have to read more, because this is the first
> time I hear about them.
> I don't know if temporary tables are a solution; really, I don't think so,
> as there are a lot of users working at the same time on the same
> page.

Cursors are held by a connection.  If your web app keeps a persistent
connection, you can use them.  In this case, pass the where clause to a
plpgsql function which returns a composite object containing a refcursor
object and the number of rows (read the docs!).  If/when pg gets shared
cursors, this may be the way to go... but in this case you may have to
worry about closing them.
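A minimal sketch of that composite-return idea, in modern syntax and with
invented names (an "items" table); the function opens a named cursor and
hands back the cursor plus the row count:

create type search_handle as (cur refcursor, total bigint);

create function open_search(min_price numeric)
returns search_handle
language plpgsql as $$
declare
    h search_handle;
    c refcursor := 'search_cur';   -- fixed portal name, for simplicity
begin
    select count(*) into h.total from items where price > min_price;
    open c for select * from items where price > min_price order by price;
    h.cur := c;
    return h;
end;
$$;

-- all within one transaction:
--   begin;
--   select * from open_search(10.0);  -- returns (search_cur, <count>)
--   fetch 20 from search_cur;         -- first page
--   fetch 20 from search_cur;         -- next page
--   commit;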

Without a connection, you need some type of persistence on the database.
This is complex but it can be done...but it will not be faster than
limit offset for browsing relatively small sets.

Merlin



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-21 Thread Andrei Bintintan
Now I read all the posts and I have some answers.
Yes, I have a web application.
I HAVE to know exactly how many pages I have and I have to allow the user to 
jump to a specific page (this is where I used limit and offset). We have this 
feature and I cannot take it out.


> SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
Now this solution looks very fast, but I cannot implement it, because I 
cannot jump from page 1 to an arbitrary page, only to page 2, because with 
this approach I only know where page 1 ended. And we have some really 
complicated where's, and about 10 tables are involved in the sql query.

About the CURSOR I have to read more, because this is the first time 
I hear about them.
I don't know if temporary tables are a solution; really, I don't think so, 
as there are a lot of users working at the same time on the same page.

So... still DIGGING for solutions.
Andy.
- Original Message - 
From: "Ragnar Hafstað" <[EMAIL PROTECTED]>
To: 
Cc: "Andrei Bintintan" <[EMAIL PROTECTED]>; 
Sent: Thursday, January 20, 2005 9:23 PM
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???


On Thu, 2005-01-20 at 19:12 +, Ragnar Hafstað wrote:
On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote:
> The best way to do pages is not to use offset or cursors but to use an
> index. This only works if you can enumerate all the sort orders the
> application might be using and can have an index on each of them.
>
> To do this the query would look something like:
>
> SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
>
> Then you take note of the last value used on a given page and if the 
> user
> selects "next" you pass that as the starting point for the next page.

this will only work unchanged if the index is unique. imagine , for
example if you have more than 50 rows with the same value of col.
one way to fix this is to use ORDER BY col,oid
and a slightly more complex WHERE clause as well, of course
gnari




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Alex Turner
How do you create a temporary view that has only a small subset of the
data from the DB in it?  (Links to docs are fine - I can read ;).  My
query isn't all that complex, and my number of records might be from
10 to 2k depending on how I implement it.

Alex Turner
NetEconomist


On Thu, 20 Jan 2005 12:00:06 -0500, Merlin Moncure
<[EMAIL PROTECTED]> wrote:
> > I am also very interested in this very question... Is there any way to
> > declare a persistent cursor that remains open between pg sessions?
> > This would be better than a temp table because you would not have to
> > do the initial select and insert into a fresh table and incur those IO
> > costs, which are often very heavy, and the reason why one would want
> > to use a cursor.
> 
> Yes, it's called a 'view' :-)
> 
> Everything you can do with cursors you can do with a view, including
> selecting records in blocks in a reasonably efficient way.  As long as
> your # records fetched is not real small (> 10) and your query is not
> super complex, you can slide your view just like a cursor with zero real
> impact on performance.
> 
> If the query in question does not scale in time complexity with the
> amount of data returned (there is a fixed processing step which can't be
> avoided), then it's materialized view time, insofar as they can be done
> in PostgreSQL.
> 
> Now, cursors can be passed around in pl/pgsql functions which makes them
> very useful in that context.  However, for normal data processing via
> queries, they have some limitations that make them hard to use in a
> general sense.
> 
> Merlin
> 
>



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Merlin Moncure
> this will only work unchanged if the index is unique. imagine, for
> example, if you have more than 50 rows with the same value of col.
> 
> one way to fix this is to use ORDER BY col,oid

nope!  oid is
1. deprecated
2. not guaranteed to be unique even inside a (large) table.

Use a sequence instead.  

create view a_b as
select nextval('some_sequence') as k, a.*, b.* from a, b [...]


select * from a_b where k > k1 order by k limit 1000
*or*
execute fetch_a_b(k1, 1000) <-- pass limit into prepared statement for extra 
flexibility.
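
For instance, a sketch of that prepared statement (fetch_a_b and the
parameter types are assumptions, just spelling out the line above):

    PREPARE fetch_a_b(bigint, integer) AS
        SELECT * FROM a_b WHERE k > $1 ORDER BY k LIMIT $2;

    EXECUTE fetch_a_b(0, 1000);     -- first block
    EXECUTE fetch_a_b(1000, 1000);  -- next block: pass the last k seen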




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Ragnar Hafstað
On Thu, 2005-01-20 at 19:12 +0000, Ragnar Hafstað wrote:
> On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote:
> 
> > The best way to do paging is not to use offset or cursors but to use an
> > index. This only works if you can enumerate all the sort orders the
> > application might be using and can have an index on each of them.
> > 
> > To do this the query would look something like:
> > 
> > SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
> > 
> > Then you take note of the last value used on a given page and if the user
> > selects "next" you pass that as the starting point for the next page.
> 
> this will only work unchanged if the index is unique. imagine, for
> example, if you have more than 50 rows with the same value of col.
> 
> one way to fix this is to use ORDER BY col,oid

and a slightly more complex WHERE clause as well, of course
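
For example (a sketch; id stands in for whatever unique column is used
as the tie-breaker):

    SELECT * FROM tab
     WHERE col > ?
        OR (col = ? AND id > ?)  -- (col, id) of the last row on the previous page
     ORDER BY col, id
     LIMIT 50;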

gnari





Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Ragnar Hafstað
On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote:

> The best way to do paging is not to use offset or cursors but to use an
> index. This only works if you can enumerate all the sort orders the
> application might be using and can have an index on each of them.
> 
> To do this the query would look something like:
> 
> SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
> 
> Then you take note of the last value used on a given page and if the user
> selects "next" you pass that as the starting point for the next page.

this will only work unchanged if the index is unique. imagine, for
example, if you have more than 50 rows with the same value of col.

one way to fix this is to use ORDER BY col,oid

gnari





Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Spiegelberg, Greg
Isn't this a prime example of when to use a servlet or something similar
in function?  It will create the cursor, maintain it, and fetch against
it for a particular page.

Greg


-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 10:21 AM
To: Andrei Bintintan
Cc: pgsql-sql@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???


Andrei Bintintan wrote:
>> If you're using this to provide "pages" of results, could you use a 
>> cursor?
> 
> What do you mean by that? Cursor?
> 
> Yes, I'm using this to provide "pages", but if I jump to the last pages 
> it goes very slow.

DECLARE mycursor CURSOR FOR SELECT * FROM ...
FETCH FORWARD 10 IN mycursor;
CLOSE mycursor;

Repeated FETCHes would let you step through your results. That won't 
work if you have a web-app making repeated connections.

If you've got a web-application then you'll probably want to insert the 
results into a cache table for later use.

--
   Richard Huxton
   Archonet Ltd



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Greg Stark wrote:
> "Andrei Bintintan" <[EMAIL PROTECTED]> writes:
> 
> > > If you're using this to provide "pages" of results, could you use a cursor?
> > What do you mean by that? Cursor?
> > 
> > Yes, I'm using this to provide "pages", but if I jump to the last pages it goes
> > very slow.
> 
> The best way to do paging is not to use offset or cursors but to use an
> index. This only works if you can enumerate all the sort orders the
> application might be using and can have an index on each of them.
> 
> To do this the query would look something like:
> 
> SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
> 
> Then you take note of the last value used on a given page and if the user
> selects "next" you pass that as the starting point for the next page.

Greg's is the most efficient, but you need to make sure you have a 
suitable key available in the output of your select.

Also, since you are repeating the query you could get different results 
as people insert/delete rows. This might or might not be what you want.

A similar solution is to partition by date/alphabet or similar, then 
page those results. That can reduce your resultset to a manageable size.
--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Ron Mayer wrote:
> Richard Huxton wrote:
> > If you've got a web-application then you'll probably want to insert 
> > the results into a cache table for later use.
> 
> If I have quite a bit of activity like this (people selecting 1 out
> of a few million rows and paging through them in a web browser), would
> it be good to have a single table with a userid column shared by all
> users, or a separate table for each user that can be truncated/dropped?
> 
> I started out with one table; but with people doing tens of thousands
> of inserts and deletes per session, I had a pretty hard time figuring
> out a reasonable vacuum strategy.

As often as you can, and make sure your config allocates enough 
free-space-map for them. Unless, of course, you end up I/O saturated.

> Eventually I started doing a whole bunch of create table tmp_<userid>
> tables, where <userid> is a userid, and a script to drop these tables -
> but that's quite ugly in a different way.
> 
> With 8.0 I guess I'll try the single table again - perhaps what I
> want may be to always have an I/O throttled vacuum running...  hmm.

Well, there have been some tweaks, but I don't know if they'll help in 
this case.

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Merlin Moncure
> I am also very interested in this very question. Is there any way to
> declare a persistent cursor that remains open between pg sessions?
> This would be better than a temp table because you would not have to
> do the initial select and insert into a fresh table and incur those IO
> costs, which are often very heavy, and the reason why one would want
> to use a cursor.

Yes, it's called a 'view' :-)

Everything you can do with cursors you can do with a view, including
selecting records in blocks in a reasonably efficient way.  As long as
your # records fetched is not real small (> 10) and your query is not
super complex, you can slide your view just like a cursor with zero real
impact on performance.
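
For example (a sketch with invented names; "sliding" the view means
stepping a key-range window across it, block by block):

    CREATE VIEW order_lines AS
        SELECT o.order_id, o.placed_at, c.name, o.total
          FROM orders o JOIN customers c ON c.customer_id = o.customer_id;

    -- fetch the next block of 100, much like FETCH 100 on a cursor:
    SELECT * FROM order_lines
     WHERE order_id > ?  -- last order_id of the previous block
     ORDER BY order_id LIMIT 100;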

If the query in question does not scale in time complexity with the
amount of data returned (there is a fixed processing step which can't be
avoided), then it's materialized view time, insofar as they can be done
in PostgreSQL.

Now, cursors can be passed around in pl/pgsql functions which makes them
very useful in that context.  However, for normal data processing via
queries, they have some limitations that make them hard to use in a
general sense.

Merlin



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Greg Stark
"Andrei Bintintan" <[EMAIL PROTECTED]> writes:

> > If you're using this to provide "pages" of results, could you use a cursor?
> What do you mean by that? Cursor?
> 
> Yes, I'm using this to provide "pages", but if I jump to the last pages it goes
> very slow.

The best way to do paging is not to use offset or cursors but to use an
index. This only works if you can enumerate all the sort orders the
application might be using and can have an index on each of them.

To do this the query would look something like:

SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50

Then you take note of the last value used on a given page and if the user
selects "next" you pass that as the starting point for the next page.

This query takes the same amount of time no matter how many records are in the
table and no matter what page of the result set the user is on. It should
actually be instantaneous even if the user is on the hundredth page of
millions of records, because it uses an index both for finding the right
point to start and for the ordering.

It also has the advantage that it works even if the list of items changes as
the user navigates. If you use OFFSET and someone inserts a record in the
table then the "next" page will overlap the current page. Worse, if someone
deletes a record then "next" will skip a record.

The disadvantages of this are a) it's hard (but not impossible) to go
backwards, and b) it's impossible to give the user a list of pages and let
them skip around willy-nilly.


(If this is for a web page then I specifically don't recommend cursors. It
would mean you'll have to have some complex session management system that
guarantees the user will always come back to the same postgres session, and
has some garbage collection if the user disappears. And it means the URL is
only good for a limited amount of time: if they bookmark it, it'll break if
they come back the next day.)

-- 
greg




Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Alex Turner wrote:
> I am also very interested in this very question. Is there any way
> to declare a persistent cursor that remains open between pg sessions?

Not sure how this would work. What do you do with multiple connections? 
Only one can access the cursor, so which should it be?

> This would be better than a temp table because you would not have to
> do the initial select and insert into a fresh table and incur those
> IO costs, which are often very heavy, and the reason why one would
> want to use a cursor.

I'm pretty sure two things mean there's less difference than you might 
expect:
1. Temp tables don't fsync
2. A cursor will spill to disk beyond a certain size

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Ron Mayer
Richard Huxton wrote:
> If you've got a web-application then you'll probably want to insert the 
> results into a cache table for later use.

If I have quite a bit of activity like this (people selecting 1 out
of a few million rows and paging through them in a web browser), would
it be good to have a single table with a userid column shared by all
users, or a separate table for each user that can be truncated/dropped?

I started out with one table; but with people doing tens of thousands
of inserts and deletes per session, I had a pretty hard time figuring
out a reasonable vacuum strategy.

Eventually I started doing a whole bunch of create table tmp_<userid>
tables, where <userid> is a userid, and a script to drop these tables -
but that's quite ugly in a different way.

With 8.0 I guess I'll try the single table again - perhaps what I
want may be to always have an I/O throttled vacuum running...  hmm.
Any suggestions?


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Alex Turner
I am also very interested in this very question. Is there any way to
declare a persistent cursor that remains open between pg sessions?
This would be better than a temp table because you would not have to
do the initial select and insert into a fresh table and incur those IO
costs, which are often very heavy, and the reason why one would want
to use a cursor.

Alex Turner
NetEconomist


On Thu, 20 Jan 2005 15:20:59 +0000, Richard Huxton wrote:
> Andrei Bintintan wrote:
> >> If you're using this to provide "pages" of results, could you use a
> >> cursor?
> >
> > What do you mean by that? Cursor?
> >
> > Yes, I'm using this to provide "pages", but if I jump to the last pages
> > it goes very slow.
> 
> DECLARE mycursor CURSOR FOR SELECT * FROM ...
> FETCH FORWARD 10 IN mycursor;
> CLOSE mycursor;
> 
> Repeated FETCHes would let you step through your results. That won't
> work if you have a web-app making repeated connections.
> 
> If you've got a web-application then you'll probably want to insert the
> results into a cache table for later use.
> 
> --
>Richard Huxton
>Archonet Ltd
> 



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Andrei Bintintan wrote:
> > If you're using this to provide "pages" of results, could you use a 
> > cursor?
> 
> What do you mean by that? Cursor?
> 
> Yes, I'm using this to provide "pages", but if I jump to the last pages 
> it goes very slow.

DECLARE mycursor CURSOR FOR SELECT * FROM ...
FETCH FORWARD 10 IN mycursor;
CLOSE mycursor;
Repeated FETCHes would let you step through your results. That won't 
work if you have a web-app making repeated connections.

If you've got a web-application then you'll probably want to insert the 
results into a cache table for later use.
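
A sketch of such a cache table (all names invented; it caches only the
keys of the matching rows, per user, so any page is a small indexed read):

    CREATE TABLE results_cache (
        user_id  integer NOT NULL,
        pos      serial,            -- fills in insertion order
        item_id  integer NOT NULL   -- key of the real row
    );
    CREATE INDEX results_cache_idx ON results_cache (user_id, pos);

    -- run the expensive query once per search:
    DELETE FROM results_cache WHERE user_id = 42;
    INSERT INTO results_cache (user_id, item_id)
        SELECT 42, item_id FROM items WHERE price > 100 ORDER BY price;

    -- any page is then cheap, even with OFFSET:
    SELECT i.*
      FROM results_cache r JOIN items i ON i.item_id = r.item_id
     WHERE r.user_id = 42
     ORDER BY r.pos LIMIT 50 OFFSET 200;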

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Andrei Bintintan
> If you're using this to provide "pages" of results, could you use a 
> cursor?

What do you mean by that? Cursor?

Yes, I'm using this to provide "pages", but if I jump to the last pages it 
goes very slow.

Andy.
- Original Message - 
From: "Richard Huxton" 
To: "Andrei Bintintan" <[EMAIL PROTECTED]>
Cc: ; 
Sent: Thursday, January 20, 2005 2:10 PM
Subject: Re: [SQL] OFFSET impact on Performance???


Andrei Bintintan wrote:
> Hi to all,
> 
> I have the following 2 examples. Now, regarding the offset: if it
> is small (10) or big (>5), what is the impact on the performance of
> the query? I noticed that if I return more data (columns) or if I
> make more joins, then the query runs even slower if the OFFSET is
> bigger. How can I somehow improve the performance of this?

There's really only one way to do an offset of 1000 and that's to fetch 
1000 rows and then some and discard the first 1000.

If you're using this to provide "pages" of results, could you use a 
cursor?

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Andrei Bintintan wrote:
> Hi to all,
> 
> I have the following 2 examples. Now, regarding the offset: if it
> is small (10) or big (>5), what is the impact on the performance of
> the query? I noticed that if I return more data (columns) or if I
> make more joins, then the query runs even slower if the OFFSET is
> bigger. How can I somehow improve the performance of this?

There's really only one way to do an offset of 1000 and that's to fetch 
1000 rows and then some and discard the first 1000.
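
Concretely, for the 21st page at 50 rows per page:

    SELECT * FROM tab
     ORDER BY col
     LIMIT 50 OFFSET 1000;  -- scans 1050 rows in sort order, discards the first 1000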

If you're using this to provide "pages" of results, could you use a cursor?
--
  Richard Huxton
  Archonet Ltd