Re: [PERFORM] Ideal disk setup for Postgresql 7.4?

2005-01-26 Thread Steve Poe
Josh,
Thanks for your feedback, I appreciate it.
Check what I have to say at http://www.powerpostgresql.com/PerfList
 

Will do.
They're currently on a two-disk Adaptec RAID1 with Postgresql 7.4.2.
   

And you've not upgraded to 7.4.6 because  ?
 

Because the proprietary application running the business has not been 
certified on it.  Unfortunately, I am at the mercy of their support in 
case something goes wrong.

The drive array is a 7-disk fibre channel on a Qlogic 2100 controller. I
am currently testing RAID5 (sw).
   

In general, RAID 5 is not so great for databases.  See the article for more.
 

Okay, thanks. Even with 7-disks? I trust that. So, RAID 1+0 (sw) is 
probably the best option. I've run sw RAID personally for years without 
issue. I am a bit hesitant in doing sw RAID for a production server for 
a database --- probably because it's not my server. Any thoughts on sw 
RAID for Postgresql?

The main reason for moving to a drive array is the high level of context
switches we get during the day (>30K for 20 mins per hour). The OS and
database exist on the same disk but on separate partitions (which probably
makes little difference)
   

Unfortunately, the context switches are probably due to a known issue in 
PostgreSQL, and changing the drive array won't help this issue (it may help 
other issues though). Search the archives of this list, and pgsql-hackers, 
for "Context Switch Bug".

For the CS bug, the only workaround right now is to avoid the query structures 
that trigger it.
 

Okay. Darn. While I don't write the queries for the application, I do 
interact with the company frequently. They're considering moving the 
queries into the database with PL/pgSQL. Currently their queries are 
done through ProvIV development using ODBC. Will context switching be 
minimized here by using PL/pgSQL?

 

Server Info:
CentOS 3.3 (RHEL 3.x equivalent)
4GB RAM
Adaptec 2100S RAID
Qlogic QLA2100 Fibre
   

CPU?
 

Dual Xeon 2.8 CPUs with HT turned off.
Thanks again.
Steve Poe
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Possibly slow query

2005-01-26 Thread Richard Huxton
Peter Darley wrote:
Folks,
I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out 
whether a
query I have is going to be slow when I have more information in my tables.
both tables involved will likely have ~500K rows within a year or so.
Specifically I can't tell if I'm causing myself future problems with the
subquery, and should maybe re-write the query to use a join.  The reason I
went with the subquery is that I don't know whether a row in Assignments
will have a corresponding row in Assignment_Settings
The query is:
SELECT User_ID
FROM Assignments A
WHERE A.User_ID IS NOT NULL
AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND
Assignment_ID=A.Assignment_ID) IS NULL
GROUP BY User_ID;
You could always use a LEFT JOIN instead, like you say. I'd personally 
be tempted to select distinct user_id's then join, but it depends on how 
many of each.
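
Something like this, perhaps (untested sketch - it assumes Assignment_Settings 
joins to Assignments on Assignment_ID, and it isn't exactly equivalent if one 
assignment can have several 'Status' rows):

SELECT A.User_ID
FROM Assignments A
LEFT JOIN Assignment_Settings S
  ON S.Assignment_ID=A.Assignment_ID AND S.Setting='Status'
WHERE A.User_ID IS NOT NULL
  AND S.Value IS NULL   -- no 'Status' row (or a NULL Value) for this assignment
GROUP BY A.User_ID;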

You're not going to know for sure whether you'll have problems without 
testing. Generate 500k rows of plausible looking test-data and give it a 
try.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-26 Thread Hannu Krosing
On a fine day (Tuesday, 25 January 2005, 10:41 -0500), Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > Why is removing index entries essential ?
> 
> Because once you re-use the tuple slot, any leftover index entries would
> be pointing to the wrong rows.

That much I understood ;)

But can't clearing up the index be left for "later" ? 

Indexscan has to check the data tuple anyway, at least for visibility.
Would adding the check for field sameness in index and data tuples be
too big a performance hit ?

>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq
-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Possibly slow query

2005-01-26 Thread Richard Huxton
Peter Darley wrote:
Folks,
	I'm using PostgreSQL 7.4.1 on Linux
Oh, and move to the latest in the 7.4 series too.
--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Andrei Bintintan
The problem still stays open.
The thing is that I have about 20 - 30 clients that are using that SQL query 
where the offset and limit are involved. So, I cannot create a temp table, 
because that means that I'll have to make a temp table for each session... 
which is a very bad idea. Cursors are somewhat the same. In my application the 
WHERE conditions can be very different for each user (session).

The only solution that I see at the moment is to work on the query, or to 
write a more complex WHERE condition to limit the results output. So there is 
no replacement for OFFSET/LIMIT.

Best regards,
Andy.
- Original Message - 
From: "Greg Stark" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Richard Huxton" ; "Andrei Bintintan" 
<[EMAIL PROTECTED]>; ; 

Sent: Tuesday, January 25, 2005 8:28 PM
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???


Alex Turner <[EMAIL PROTECTED]> writes:
I am also very interesting in this very question.. Is there any way to
declare a persistant cursor that remains open between pg sessions?
This would be better than a temp table because you would not have to
do the initial select and insert into a fresh table and incur those IO
costs, which are often very heavy, and the reason why one would want
to use a cursor.
TANSTAAFL. How would such a persistent cursor be implemented if not by
building a temporary table somewhere behind the scenes?
There could be some advantage if the data were stored in a temporary table
marked as not having to be WAL logged. Instead it could be automatically
cleared on every database start.
--
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Leeuw van der, Tim
Hi,

What you could do is create a table containing all the fields from your SELECT, 
plus a per-session unique ID. Then you can store the query results in there, 
and use SELECT with OFFSET / LIMIT on that table. The WHERE clause for this 
temp-results table only needs to contain the per-session unique id.
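
A rough sketch of the idea (untested; the table, column and session-id values 
below are just made up for illustration):

CREATE TABLE results_cache (
    session_id  text,      -- per-session unique id
    row_num     serial,    -- position within this session's result set
    order_id    integer,   -- ... whatever fields your SELECT returns ...
    order_date  date
);
CREATE INDEX results_cache_idx ON results_cache (session_id, row_num);

-- run the expensive query once per search and store its output
-- (row_num is assigned in the SELECT's output order as the rows are inserted)
INSERT INTO results_cache (session_id, order_id, order_date)
SELECT 'abc123', id, created FROM orders WHERE id_status = 6 ORDER BY created;

-- paging is then a cheap, indexed scan of that session's slice
SELECT order_id, order_date FROM results_cache
WHERE session_id = 'abc123'
ORDER BY row_num OFFSET 40 LIMIT 20;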

This of course gives you a new problem: cleaning stale data out of the 
temp-results table. And another new problem is that users will not see new data 
appear on their screen until somehow the query is re-run (... but that might 
even be desirable, actually, depending on how your users do their work and what 
their work is).

And of course better performance cannot be guaranteed until you try it.


Would such a scheme give you any hope of improved performance, or would it be 
too much of a nightmare?

cheers,

--Tim




-Original Message-
From: [EMAIL PROTECTED] on behalf of Andrei Bintintan
Sent: Wed 1/26/2005 11:11 AM
To: [EMAIL PROTECTED]; Greg Stark
Cc: Richard Huxton; pgsql-sql@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???
 
The problems still stays open.

The thing is that I have about 20 - 30 clients that are using that SQL query 
where the offset and limit are involved. So, I cannot create a temp table, 
because that means that I'll have to make a temp table for each session... 
which is a very bad ideea. Cursors somehow the same. In my application the 
Where conditions can be very different for each user(session) apart.

The only solution that I see in the moment is to work at the query, or to 
write a more complex where function to limit the results output. So no 
replace for Offset/Limit.

Best regards,
Andy.


- Original Message - 
From: "Greg Stark" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Richard Huxton" ; "Andrei Bintintan" 
<[EMAIL PROTECTED]>; ; 

Sent: Tuesday, January 25, 2005 8:28 PM
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???


>
> Alex Turner <[EMAIL PROTECTED]> writes:
>
>> I am also very interesting in this very question.. Is there any way to
>> declare a persistant cursor that remains open between pg sessions?
>> This would be better than a temp table because you would not have to
>> do the initial select and insert into a fresh table and incur those IO
>> costs, which are often very heavy, and the reason why one would want
>> to use a cursor.
>
> TANSTAAFL. How would such a persistent cursor be implemented if not by
> building a temporary table somewhere behind the scenes?
>
> There could be some advantage if the data were stored in a temporary table
> marked as not having to be WAL logged. Instead it could be automatically
> cleared on every database start.
>
> -- 
> greg
>
> 


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Alex Turner
As I read the docs, a temp table doesn't solve our problem, as it does
not persist between sessions.  With a web page there is no guarantee
that you will receive the same connection between requests, so a temp
table doesn't solve the problem.  It looks like you either have to
create a real table (which is undesirable because it has to be
physically synced, and TTFB will be very poor) or create an application
tier in between the web tier and the database tier to allow data to
persist between requests tied to a unique session id.

Looks like the solution to this problem is not an RDBMS, IMHO.

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 12:11:49 +0200, Andrei Bintintan <[EMAIL PROTECTED]> wrote:
> The problems still stays open.
> 
> The thing is that I have about 20 - 30 clients that are using that SQL query
> where the offset and limit are involved. So, I cannot create a temp table,
> because that means that I'll have to make a temp table for each session...
> which is a very bad ideea. Cursors somehow the same. In my application the
> Where conditions can be very different for each user(session) apart.
> 
> The only solution that I see in the moment is to work at the query, or to
> write a more complex where function to limit the results output. So no
> replace for Offset/Limit.
> 
> Best regards,
> Andy.
> 
> 
> - Original Message -
> From: "Greg Stark" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Cc: "Richard Huxton" ; "Andrei Bintintan"
> <[EMAIL PROTECTED]>; ;
> 
> Sent: Tuesday, January 25, 2005 8:28 PM
> Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???
> 
> 
> >
> > Alex Turner <[EMAIL PROTECTED]> writes:
> >
> >> I am also very interesting in this very question.. Is there any way to
> >> declare a persistant cursor that remains open between pg sessions?
> >> This would be better than a temp table because you would not have to
> >> do the initial select and insert into a fresh table and incur those IO
> >> costs, which are often very heavy, and the reason why one would want
> >> to use a cursor.
> >
> > TANSTAAFL. How would such a persistent cursor be implemented if not by
> > building a temporary table somewhere behind the scenes?
> >
> > There could be some advantage if the data were stored in a temporary table
> > marked as not having to be WAL logged. Instead it could be automatically
> > cleared on every database start.
> >
> > --
> > greg
> >
> >
> 
>

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Richard Huxton
Alex Turner wrote:
As I read the docs, a temp table doesn't solve our problem, as it does
not persist between sessions.  With a web page there is no guarentee
that you will receive the same connection between requests, so a temp
table doesn't solve the problem.  It looks like you either have to
create a real table (which is undesirable becuase it has to be
physicaly synced, and TTFB will be very poor) or create an application
tier in between the web tier and the database tier to allow data to
persist between requests tied to a unique session id.
Looks like the solutions to this problem is not RDBMS IMHO.
It's less the RDBMS than the web application. You're trying to mix a 
stateful setup (the application) with a stateless presentation layer 
(the web). If you're using PHP (which doesn't offer a "real" middle 
layer) you might want to look at memcached.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-26 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes:
> But can't clearing up the index be left for "later" ? 

Based on what?  Are you going to store the information about what has to
be cleaned up somewhere else, and if so where?

> Indexscan has to check the data tuple anyway, at least for visibility.
> would adding the check for field sameness in index and data tuples be
> too big performance hit ?

It does pretty much suck, especially when you think about functional
indexes on expensive functions.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Possibly slow query

2005-01-26 Thread Peter Darley
Richard,
I tried a left join, which has to be a little weird, because there may 
or
may not be a corresponding row in Assignment_Settings for each Assignment,
and they may or may not have Setting='Status', so I came up with:

SELECT User_ID
FROM Assignments A NATURAL LEFT JOIN (SELECT * FROM Assignment_Settings
WHERE Setting='Status') ASet
WHERE A.User_ID IS NOT NULL
AND ASet.Assignment_ID IS NULL
GROUP BY User_ID;

Which explain analyze is saying takes 0.816 ms as compared to 0.163 ms 
for
my other query.  So, I'm not sure that I'm writing the best LEFT JOIN that I
can.  Also, I suspect that these ratios wouldn't hold as the data got bigger
and started using indexes, etc.  I'll mock up a couple of tables with a
bunch of data and see how things go.  It would be nice to understand WHY I
get the results I get, which I'm not sure I will.

I'm not sure what you mean by selecting a distinct User_ID first.  Since
I'm joining the tables on Assignment_ID, I'm not sure how I'd do a distinct
before the join (because I'd lose Assignment_ID).  I was also under the
impression that group by was likely to be faster than a distinct, tho I
can't really recall where I got that idea from.

Thanks for your suggestions!
Peter Darley

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 26, 2005 1:36 AM
To: Peter Darley
Cc: Pgsql-Performance
Subject: Re: [PERFORM] Possibly slow query


Peter Darley wrote:
> Folks,
>
>   I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out 
> weather
a
> query I have is going to be slow when I have more information in my
tables.
> both tables involved will likely have ~500K rows within a year or so.
>
>   Specifically I can't tell if I'm causing myself future problems with the
> subquery, and should maybe re-write the query to use a join.  The reason I
> went with the subquery is that I don't know weather a row in Assignments
> will have a corresponding row in Assignment_Settings
>
>   The query is:
> SELECT User_ID
> FROM Assignments A
> WHERE A.User_ID IS NOT NULL
>   AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND
> Assignment_ID=A.Assignment_ID) IS NULL
> GROUP BY User_ID;

You could always use a LEFT JOIN instead, like you say. I'd personally
be tempted to select distinct user_id's then join, but it depends on how
many of each.

You're not going to know for sure whether you'll have problems without
testing. Generate 500k rows of plausible looking test-data and give it a
try.

--
   Richard Huxton
   Archonet Ltd


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread PFC
Supposing your searches display results which are rows coming from one  
specific table, you could create a cache table :

search_id   serial primary key
index_n     position of this result in the global result set
result_id   id of the resulting row.
Then, making a search with 50k results would INSERT INTO cache ... SELECT  
FROM search query, with a way to set the index_n column, which can be a  
temporary sequence...

Then to display your pages, SELECT from your table with index_n BETWEEN so  
and so, and join to the data table.
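
	A rough sketch (untested ; "big_table", "category" and "created" are just 
placeholder names for your own data) :

CREATE TABLE cache (
    search_id   integer,   -- one value per search
    index_n     integer,   -- position of this result in the global result set
    result_id   integer    -- id of the resulting row in big_table
);

CREATE TEMPORARY SEQUENCE pos_seq;
INSERT INTO cache (search_id, index_n, result_id)
SELECT 42, nextval('pos_seq'), id
FROM (SELECT id FROM big_table WHERE category = 7 ORDER BY created) AS sorted;
-- (relies on the subselect's sort order being kept, which it is in practice)

-- each page is then a small range scan plus a join back to the data :
SELECT b.*
FROM cache c JOIN big_table b ON b.id = c.result_id
WHERE c.search_id = 42 AND c.index_n BETWEEN 100 AND 149
ORDER BY c.index_n;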

If you're worried that it might take up too much space : store an integer  
array of result_id instead of just a result_id ; this way you insert fewer  
rows and save on disk space. Generate it with a custom aggregate... then  
just grab a row from this table, it contains all the id's of the rows to  
display.
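
	(array_accum is not a built-in ; a definition along the lines of the 
custom-aggregate example in the PostgreSQL manual would be :)

CREATE AGGREGATE array_accum (
    basetype = anyelement,
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);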


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


[PERFORM] SQL Performance Guidelines

2005-01-26 Thread Van Ingen, Lane
Does anybody know where I can lay my hands on some guidelines to get the best 
SQL performance out of PostgreSQL? We are about to get into a project that will 
be new from the ground up (and we are using Postgres for the first time). We 
would like to share some guidelines with developers on best practices in 
Postgres. Thanks for your help.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Ideal disk setup for Postgresql 7.4?

2005-01-26 Thread Josh Berkus
Steve,

> Because the proprietary application running the business has not
> certified on it.  Unfortunately, I am at the mercy of their support in
> case something goes wrong.

FWIW, 7.4.6 is a binary, drop-in place upgrade for 7.4.2.  And 7.4.2 has known 
bugs.   However, I understand your situation.

> Okay, thanks. Even with 7-disks? I trust that. 

Well, it's less bad with 7 disks than it is with 3, certainly.   However, there 
is an obvious and quick gain to be had by splitting off the WAL logs onto 
their own disk resource ... up to 14%+ performance in some applications.

> So, RAID 1+0 (sw) is 
> probably the best option. I've run sw RAID personally for years without
> issue. I am a bit hesitant in doing sw RAID for a production server for
> a database --- probably because its not my server. Any thoughts on sw
> RAID for Postgresql?

Yes.   See my article for one.  In general, SW RAID on BSD or Linux works 
well for PostgreSQL ... UNLESS your machine is already CPU-bound, in which 
case it's a bad idea.   If you're hitting the CS bug, it's definitely a bad 
idea, because the SW RAID will increase context switching.

So if your choice, on your system, is between sw RAID 10, and hw RAID 5, and 
you're having excessive CSes, I'd stick with the HW RAID.

> Okay. Darn. While I don't write the queries for the application, I do
> interact with the company frequently. Their considering moving the
> queries into the database with PL/pgSQL. Currently their queries are
> done through ProvIV development using ODBC. Will context switching be
> minimized here by using PL/pgSQL?

Won't make a difference, actually.   Should improve performance in other ways, 
though, by reducing round-trip time on procedures.  Feel free to recommend 
the company to this list.

> Dual Xeon 2.8 CPUs with HT turned off.

Yeah, thought it was a Xeon.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] SQL Performance Guidelines

2005-01-26 Thread Van Ingen, Lane
Clarification: I am talking about SQL coding practices in Postgres (how to 
write queries for best 
results), not  tuning-related considerations (although that would be welcomed 
too).
 
-Original Message- 
From: [EMAIL PROTECTED] on behalf of Van Ingen, Lane 
Sent: Wed 1/26/2005 11:44 AM 
To: pgsql-performance@postgresql.org 
Cc: 
Subject: [PERFORM] SQL Performance Guidelines

Does anybody know where I can lay my hands on some guidelines to get best SQL 
performance
out of PostgreSQL? We are about to get into a project that will be new from the 
ground up (and we are using Postgres for the first time). Would like to share 
some guidelines with developers on best practices
in Postgres? Thanks for your help.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Alex Turner
The problem with this approach is TTFB (Time to first Byte).  The
initial query is very slow, but additional requests are fast.  In most
situations we do not want the user to have to wait a disproportionate
amount of time for the initial query.  If this is the first time using
the system this will be the impression that will stick with them.  I
guess we could experiment and see how much extra time creating a cache
table will take...

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 13:58:18 +0100, PFC <[EMAIL PROTECTED]> wrote:
> 
> Supposing your searches display results which are rows coming from one
> specific table, you could create a cache table :
> 
> search_id   serial primary key
> index_n position of this result in the global result set
> result_id   id of the resulting row.
> 
> Then, making a search with 50k results would INSERT INTO cache ... SELECT
>  FROM search query, with a way to set the index_n column, which can be a
> temporary sequence...
> 
> Then to display your pages, SELECT from your table with index_n BETWEEN so
> and so, and join to the data table.
> 
> If you're worried that it might take up too much space : store an integer
> array of result_id instead of just a result_id ; this way you insert fewer
> rows and save on disk space. Generate it with a custom aggregate... then
> just grab a row from this table, it contains all the id's of the rows to
> display.
> 
> 
> ---(end of broadcast)---
> TIP 5: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq
>

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Upgrading from from 7.4.2 to 8.0

2005-01-26 Thread James Gunzelman
Will I have to dump and reload all my databases when migrating from 7.4.2 to
8.0?

Jim Gunzelman
Senior Software Engineer

phone: 402.361.3078   fax: 402.361.3178
e-mail: JamesGunzelman[EMAIL PROTECTED]

Solutionary, Inc.
www.Solutionary.com

Making Security Manageable 24x7


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Merlin Moncure
> The problem with this approach is TTFB (Time to first Byte).  The
> initial query is very slow, but additional requests are fast.  In most
> situations we do not want the user to have to wait a disproportionate
> amount of time for the initial query.  If this is the first time using
> the system this will be the impression that will stick with them.  I
> guess we could experiment and see how much extra time creating a cache
> table will take...


Have you read this?
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

Don't know your exact situation, but this is always worth considering in
those hard to optimize corner cases.  Moving this stuff into the
application space or 'middleware' is going to be a lot of pain and
aggravation.
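
The core of that technique is simple enough; a minimal hand-rolled version
(made-up table names, refreshed from cron or a trigger rather than per request)
looks roughly like:

-- build the summary table once
CREATE TABLE order_counts_mv AS
SELECT id_status, count(*) AS n_orders
FROM orders
GROUP BY id_status;

-- periodic refresh (in one transaction so readers never see it half empty)
BEGIN;
DELETE FROM order_counts_mv;
INSERT INTO order_counts_mv
SELECT id_status, count(*) FROM orders GROUP BY id_status;
COMMIT;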


Merlin




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PG versus FreeBSD, startup and connections problems

2005-01-26 Thread Mitch Pirtle
Just a quick shout-out to Mark, as you provided the winning answer. I
found numerous mailing list discussions and web pages, but all were
either fragmented or out of date.

Again, many thanks!

-- Mitch

On Wed, 26 Jan 2005 10:08:58 +1300, Mark Kirkwood <[EMAIL PROTECTED]> wrote:
> 
> in /etc/sysctl.conf :
> kern.ipc.shmmax=1
> kern.ipc.shmall=32768
> (can be set online using sysctl -w)
> 
> Semaphores need to be set in /boot/loader.conf
> kern.ipc.semmni=256
> kern.ipc.semmns=256
> (can be typed at the loader prompt using set)
> 
> These settings should let you have ~100 connections and use about 100M
> of shared memory for shared_buffers.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-26 Thread PFC
http://borg.postgresql.org/docs/8.0/interactive/storage-page-layout.html

If you vacuum as part of the transaction it's going to be more efficient
of resources, because you have more of what you need right there (ie:
odds are that you're on the same page as the old tuple). In cases like
that it very likely makes a lot of sense to take a small hit in your
transaction time up-front, instead of a larger hit doing a vacuum down
the road.
	Some pros would be that you're going to make a disk write anyway because  
the page is modified, so why not vacuum that page while it's there. If the  
machine is CPU bound you lose, if it's IO bound you save some IO, but the  
cost of index updates has to be taken into account...

It prompted a few questions :
Note : temp contains 128k (131072) values generated from a sequence.
create table test (id serial primary key, a integer, z integer, e integer,  
r integer, t integer, y integer ) without oids;
insert into test (id,a,z,e,r,t,y) select id,0,0,0,0,0,0 from temp;
 INSERT 0 131072

explain analyze update test set y=1;
 Seq Scan on test  (cost=0.00..2226.84 rows=126284 width=30) (actual  
time=0.047..617.553 rows=131072 loops=1)
 Total runtime: 4432.509 ms

explain analyze update test set y=1;
 Seq Scan on test  (cost=0.00..4453.68 rows=252568 width=30) (actual  
time=52.198..611.594 rows=131072 loops=1)
 Total runtime: 5739.064 ms

explain analyze update test set y=1;
 Seq Scan on test  (cost=0.00..6680.52 rows=378852 width=30) (actual  
time=127.301..848.762 rows=131072 loops=1)
 Total runtime: 6548.206 ms

Gets slower as more and more dead tuples accumulate... normal as this is a  
seq scan. Note the row estimations getting bigger with the table size...

vacuum full test;
explain analyze update test set y=1;
 Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.019..779.864 rows=131072 loops=1)
 Total runtime: 5600.311 ms

vacuum full test;
explain analyze update test set y=1;
 Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.039..1021.847 rows=131072 loops=1)
 Total runtime: 5126.590 ms

-> Seems vacuum full does its job
vacuum test;
explain analyze update test set y=1;
 Seq Scan on test  (cost=0.00..3894.08 rows=196608 width=30) (actual  
time=36.491..860.135 rows=131072 loops=1)
 Total runtime: 7293.698 ms

vacuum test;
explain analyze update test set y=1;
 Seq Scan on test  (cost=0.00..3894.08 rows=196608 width=30) (actual  
time=0.044..657.125 rows=131072 loops=1)
 Total runtime: 5934.141 ms

vacuum analyze test;
explain analyze update test set y=1;
 Seq Scan on test  (cost=0.00..3894.08 rows=196608 width=30) (actual  
time=0.018..871.132 rows=131072 loops=1)
 Total runtime: 5548.053 ms

-> here vacuum is about as slow as vacuum full (which is normal as the  
whole table is updated) however the row estimation is still off even after  
ANALYZE.

 Let's create a few indices :
vacuum full test;
create index testa on test(a);
create index testz on test(z);
create index teste on test(e);
create index testr on test(r);
create index testt on test(t);
-- we don't create an index on y
vacuum full test;
explain analyze update test set a=id;
 Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.044..846.102 rows=131072 loops=1)
 Total runtime: 14998.307 ms

We see that the index updating time has made this query a lot slower. This  
is normal, but :

vacuum full test;
explain analyze update test set a=id;
 Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.045..1387.626 rows=131072 loops=1)
 Total runtime: 17644.368 ms

Now, we updated ALL rows but didn't actually change a single value.  
However it took about the same time as the first one. I guess the updates  
all really took place, even if all it did was copy the rows with new  
transaction ID's.
Now, let's update a column which is not indexed :

vacuum full test;
explain analyze update test set y=id;
 Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.046..964.590 rows=131072 loops=1)
 Total runtime: 15628.143 ms

Takes 'bout the same time : the indexes still have to be updated to  
reference the new rows after all.

So, here is something annoying with the current approach : Updating rows  
in a table bloats ALL indices, not just those whose indexed values have  
been actually updated. So if you have a table with many indexed fields and  
you often update some obscure timestamp field, all the indices will bloat,  
which will of course be corrected by VACUUM, but vacuum will have extra  
work to do.

I don't have suggestions, just questions :
	Is there a way that an update to the indices can be avoided if the  
indexed values do not change ?
	Would it depend if an updated tuple can be stored on the same page it was  
before (along with the

Re: [PERFORM] Upgrading from from 7.4.2 to 8.0

2005-01-26 Thread Doug McNaught
"James Gunzelman" <[EMAIL PROTECTED]> writes:

>Will I have to dump and reload all my databases when migrating from
>7.4.2 to 8.0?

Yes.

-Doug


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Greg Stark

Alex Turner <[EMAIL PROTECTED]> writes:

> The problem with this approach is TTFB (Time to first Byte).  The
> initial query is very slow, but additional requests are fast.  In most
> situations we do not want the user to have to wait a disproportionate
> amount of time for the initial query.  If this is the first time using
> the system this will be the impression that will stick with them.  I
> guess we could experiment and see how much extra time creating a cache
> table will take...

You could cheat and do queries with an offset of 0 directly but also start up
a background job to fetch the complete results and cache them. queries with a
non-zero offset would have to wait until the complete cache is built. You have
to be careful about people arriving from bookmarks to non-zero offsets and
people hitting reload before the cache is finished being built.

As someone else suggested you could look into other systems for storing the
cache. If you don't need to join against other database tables and you don't
need the reliability of a database then there are faster solutions like
memcached for example. (The problem of joining against database tables is even
solvable, look up pgmemcached. No idea how it performs though.)

But I think you're running into a fundamental tension here. The feature you're
looking for: being able to jump around in an arbitrary non-indexed query
result set which can be arbitrarily large, requires a lot of work. All you can
do is shift around *when* that work is done. There's not going to be any way
to avoid doing the work entirely.

-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Upgrading from from 7.4.2 to 8.0

2005-01-26 Thread Michael Fuhr
On Wed, Jan 26, 2005 at 12:51:14PM -0600, James Gunzelman wrote:

> Will I have to dump and reload all my databases when migrating from
> 7.4.2 to 8.0?

Yes -- the Release Notes mention it under "Migration to version 8.0":

http://www.postgresql.org/docs/8.0/static/release.html#RELEASE-8-0

Those unfamiliar with doing an upgrade might want to read "If You
Are Upgrading" in the "Installation Instructions" chapter of the
documentation, and "Migration Between Releases" in the "Backup and
Restore" chapter:

http://www.postgresql.org/docs/8.0/static/install-upgrading.html
http://www.postgresql.org/docs/8.0/static/migration.html

(Install or upgrade questions should probably go to pgsql-admin or
pgsql-general instead of pgsql-performance.)

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] poor performance of db?

2005-01-26 Thread PFC

Every time I tested an idea to speed it up, I got exactly the same  
loading time on a Athlon 1800+, 256Mb RAM, 20Gb PATA computer as  
compared to a Dual Opteron 246, 1Gb RAM, 70Gb WD Raptor SATA server.  
Now, why a dual opteron machine can't perform any faster than a lowly  
1800+ athlon in numerous tests is completely beyond me ... increased  
memory and RAID 0 disc configurations so far have not resulted in any  
significant performance gain in the opteron server.
How many rows does the query return ?
	Maybe a lot of time is spent, hidden in the PHP libraries, converting the  
rows returned by psql into PHP objects.

You should try that :
EXPLAIN ANALYZE SELECT your query
-> time is T1
CREATE TABLE cache AS SELECT your query
EXPLAIN ANALYZE SELECT * FROM cache
-> time is T2 (probably very small)
	Now in your PHP script replace SELECT your query by SELECT * FROM cache.  
How much does the final page time change ? This will tell you the time  
spent in the postgres engine, not in data transmission and PHPing. It will  
tell you what you can gain by optimizing the query.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Upgrading from from 7.4.2 to 8.0

2005-01-26 Thread Thomas F.O'Connell
It should be noted that users who use Slony can create a subscriber 
node running 8.0 that subscribes to a node running 7.4.x and can 
transition with only the downtime required for failover.

This obviates the need for a dump/restore.
See .
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Jan 26, 2005, at 1:51 PM, Michael Fuhr wrote:
On Wed, Jan 26, 2005 at 12:51:14PM -0600, James Gunzelman wrote:
Will I have to dump and reload all my databases when migrating from
7.4.2 to 8.0?
Yes -- the Release Notes mention it under "Migration to version 8.0":
http://www.postgresql.org/docs/8.0/static/release.html#RELEASE-8-0
Those unfamiliar with doing an upgrade might want to read "If You
Are Upgrading" in the "Installation Instructions" chapter of the
documenation, and "Migration Between Releases" in the "Backup and
Restore" chapter:
http://www.postgresql.org/docs/8.0/static/install-upgrading.html
http://www.postgresql.org/docs/8.0/static/migration.html
(Install or upgrade questions should probably go to pgsql-admin or
pgsql-general instead of pgsql-performance.)
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Ideal disk setup for Postgresql 7.4?

2005-01-26 Thread Steve Poe

FWIW, 7.4.6 is a binary, drop-in place upgrade for 7.4.2.  And 7.4.2 has known 
bugs.   However, I understand your situation.

 

As soon as we get the go-ahead, I will upgrade. I think the company is 
actually looking towards 8.0 certification.

Okay, thanks. Even with 7-disks? I trust that. 
   

Well, it's less bad with 7 disks than it is with 3, certainly.   However,there 
is an obvious and quick gain to be had by splitting off the WAL logs onto 
their own disk resource ... up to 14%+ performance in some applications.

 

Pardon my ignorance, but the WAL logs are comprised of pg_xlog and 
pg_clog? Their own disk resource, but not within the same channel of 
disks the database is on, right?

So, RAID 1+0 (sw) is 
probably the best option. I've run sw RAID personally for years without
issue. I am a bit hesitant in doing sw RAID for a production server for
a database --- probably because its not my server. Any thoughts on sw
RAID for Postgresql?
   

Yes.   See my article for one.  In generaly, SW RAID on BSD or Linux works 
well for PostgreSQL ... UNLESS your machine is already CPU-bound, in which 
case it's a bad idea.   If you're hitting the CS bug, it's definitely a bad 
idea, because the SW RAID will increase context switching.

So if your choice, on your system, is between sw RAID 10, and hw RAID 5, and 
you're having excessive CSes, I'd stick with the HW RAID.

 

Okay. In CPU-bound servers, use hw RAID. Any hw RAIDs to avoid?
Okay. Darn. While I don't write the queries for the application, I do
interact with the company frequently. Their considering moving the
queries into the database with PL/pgSQL. Currently their queries are
done through ProvIV development using ODBC. Will context switching be
minimized here by using PL/pgSQL?
   

Won't make a difference, actually.   Should improve performance in other ways, 
though, by reducing round-trip time on procedures.  Feel free to recommend 
the company to this list.

 

I think they're too busy to monitor/watch this list. Not a put-down to 
them, but I have to do my own leg work to help decide what we're going 
to do.

Dual Xeon 2.8 CPUs with HT turned off.
   

Yeah, thought it was a Xeon.
 

If we went with a single CPU, like an Athlon/Opteron 64, would CS 
storming go away?

Thanks.
Steve Poe
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] 200 times slower then MSSQL??

2005-01-26 Thread PFC

with about 8000 rows. For this table query:
SELECT MAX(MsgSeqNum),MAX(LogTimestamp) FROM ELT_tcli_MessageLog
WHERE LogTimestamp  >=  '0' AND IsFromCounterParty  =  'Y' AND
IsOutOfSequence = 'N'
 AND ConnectionName  =  'DB_BENCHMARK'
 AND LogTimestamp IN (SELECT MAX(LogTimestamp)
  FROM ELT_tcli_MessageLog
  WHERE MsgSeqNum  >  0 AND IsFromCounterParty =  
'Y'

AND IsOutOfSequence  =  'N' AND
ConnectionName  =  'DB_BENCHMARK')
	Can you explain (with words) what this query is supposed to return ? It  
is probably possible to write it in an entirely different way.
	Basically your problem is that max() in postgres does not use an index  
the way you think it should.
	"SELECT max(x) FROM t" should be written "SELECT x FROM t ORDER BY x DESC  
LIMIT 1" to use the index. Depending on additional Where conditions, you  
should add other columns to your index and also order-by clause.
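
	For instance (untested, guessing that what you really want is the latest 
row for that connection ; the index name is made up) :

CREATE INDEX msglog_conn_idx ON ELT_tcli_MessageLog
    (ConnectionName, IsFromCounterParty, IsOutOfSequence, LogTimestamp);

SELECT MsgSeqNum, LogTimestamp
FROM ELT_tcli_MessageLog
WHERE ConnectionName = 'DB_BENCHMARK'
  AND IsFromCounterParty = 'Y'
  AND IsOutOfSequence = 'N'
ORDER BY ConnectionName DESC, IsFromCounterParty DESC,
         IsOutOfSequence DESC, LogTimestamp DESC
LIMIT 1;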

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread PFC
The problem with this approach is TTFB (Time to first Byte).  The
initial query is very slow, but additional requests are fast.  In most
situations we do not want the user to have to wait a disproportionate
amount of time for the initial query.  If this is the first time using
the system this will be the impression that will stick with them.  I
guess we could experiment and see how much extra time creating a cache
table will take...
Do it on the second page then ;)
	Seriously :
	- If you want to display the result count and page count, you'll need to  
do the whole query anyway, so you might as well save the results.
	- inserting the result id's in a temp table one by one will be slow, but  
you can do this :

select array_accum(id) from temp group by id/20 limit 3;
  array_accum
---
 {1,2,4,8,16,17,9,18,19,5,10,11,3,6,12,13,7,14,15}
 {32,33,34,35,36,37,38,39,20,21,22,23,24,25,26,27,28,29,30,31}
 {40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59}
	- a really big search of 131072 results :
create table cache (id serial primary key, value integer[]);
explain analyze insert into cache (value) select array_accum(id) from temp  
group by id/100;
 Subquery Scan "*SELECT*"  (cost=14382.02..17986.50 rows=131072 width=32)  
(actual time=961.746..1446.630 rows=1311 loops=1)
   ->  GroupAggregate  (cost=14382.02..16020.42 rows=131072 width=4)  
(actual time=961.607..1423.803 rows=1311 loops=1)
 ->  Sort  (cost=14382.02..14709.70 rows=131072 width=4) (actual  
time=961.181..1077.662 rows=131072 loops=1)
   Sort Key: (id / 100)
   ->  Seq Scan on "temp"  (cost=0.00..2216.40 rows=131072  
width=4) (actual time=0.032..291.652 rows=131072 loops=1)
 Total runtime: 1493.304 ms

	Note that the "SELECT..." part takes 1400 ms, and the INSERT part takes  
the rest, which is really small. It's the sort which takes most of the  
time, but you'll be doing it anyway to get your results in order, so it  
comes free to you. This will generate 1000 pages with 100 results on each.  
If your searches yield say 1000 results it'll be perfectly fine and can  
target times in the sub-100 ms for caching the results (not counting the  
total query time of course !)

	Using arrays is the key here, because inserting all the results as  
individual rows in the table is gonna be a whole lot slower !



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Ideal disk setup for Postgresql 7.4?

2005-01-26 Thread Tom Lane
Steve Poe <[EMAIL PROTECTED]> writes:
>> Well, it's less bad with 7 disks than it is with 3, certainly.   
>> However,there 
>> is an obvious and quick gain to be had by splitting off the WAL logs onto 
>> their own disk resource ... up to 14%+ performance in some applications.
>> 
> Pardon my ignorance, but the WAL logs are comprised of pg_xlog and 
> pg_clog? Their own disk resource, but not within the same channel of 
> disks the database is on, right?

Just pg_xlog.  Ideally you don't want any other traffic on the physical
disk pg_xlog is on --- the idea is that the write heads need to stay
over the current xlog file.  I don't think it hurts too much to share a
controller channel though.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Should the optimizer see this?

2005-01-26 Thread Ron Mayer

Short summary... the second query runs faster, and I think
they should be identical queries.   Should the optimizer
have found this optimization?


I have two identical (or so I believe) queries; one where I 
explicitly add a "is not null" comparison; and one where I 
think it would implicitly only find not-null columns.

The queries are 

select *
  from rt4, rt5
 where rt4.tigerfile = rt5.tigerfile
   and feat = feat3;

and 

select *
  from (select * from rt4 where feat3 is not null) as rt4, rt5
 where rt4.tigerfile = rt5.tigerfile
   and feat = feat3;

I would have thought that the optimizer would see that
if feat3 is null (which it usually is), it doesn't need
to keep those rows and sort them -- but it seems (looking
both at explain analyze and "du" on the tmp directory)
that in the first query it is indeed sorting all the
rows --- even the ones with feat3=null.

 

The tables are the Census Tiger Line data explained in detail here:
http://www.census.gov/geo/www/tiger/tiger2003/TGR2003.pdf
I can attach the create statements for the tables if people 
think they'd help.   Basically, table rt4 has a column
called feat3 which is usually null, and table rt5 has a
column called feat which is never null.  Both tables have
a few million rows.

No indexes were used, since I'm joining everything to 
everything, they shouldn't have helped anyway.  However
vacuum analyze was run, and (as seen in the second query)
the stats did know that the column feat3 was mostly null.

=
fli=# explain analyze
fli-# select *
fli-#   from rt4, rt5
fli-#  where rt4.tigerfile = rt5.tigerfile
fli-#    and feat = feat3;

  QUERY PLAN
---
 Merge Join  (cost=1922903.02..1967385.35 rows=117698 width=100) (actual 
time=179246.872..218920.724 rows=153091 loops=1)
   Merge Cond: (("outer".feat3 = "inner".feat) AND ("outer".tigerfile = 
"inner".tigerfile))
   ->  Sort  (cost=876532.10..888964.80 rows=4973079 width=45) (actual 
time=57213.327..67313.216 rows=4971022 loops=1)
 Sort Key: rt4.feat3, rt4.tigerfile
 ->  Seq Scan on rt4  (cost=0.00..94198.79 rows=4973079 width=45) 
(actual time=0.053..10433.883 rows=4971022 loops=1)
   ->  Sort  (cost=1046370.92..1060457.95 rows=5634813 width=55) (actual 
time=122033.463..134037.127 rows=5767675 loops=1)
 Sort Key: rt5.feat, rt5.tigerfile
 ->  Seq Scan on rt5  (cost=0.00..127146.13 rows=5634813 width=55) 
(actual time=0.016..22538.958 rows=5635077 loops=1)
 Total runtime: 219632.580 ms
(9 rows)

fli=# explain analyze
fli-# select *
fli-#   from (select * from rt4 where feat3 is not null) as rt4, rt5
fli-#  where rt4.tigerfile = rt5.tigerfile
fli-#    and feat = feat3;
  QUERY PLAN 
---
 Merge Join  (cost=1152466.47..1194789.77 rows=3296 width=100) (actual 
time=125982.562..145927.220 rows=153091 loops=1)
   Merge Cond: (("outer".feat3 = "inner".feat) AND ("outer".tigerfile = 
"inner".tigerfile))
   ->  Sort  (cost=106095.56..106443.67 rows=139247 width=45) (actual 
time=11729.319..11823.006 rows=153091 loops=1)
 Sort Key: tgr.rt4.feat3, tgr.rt4.tigerfile
 ->  Seq Scan on rt4  (cost=0.00..94198.79 rows=139247 width=45) 
(actual time=32.404..10893.373 rows=153091 loops=1)
   Filter: (feat3 IS NOT NULL)
   ->  Sort  (cost=1046370.92..1060457.95 rows=5634813 width=55) (actual 
time=114253.157..126650.225 rows=5767675 loops=1)
 Sort Key: rt5.feat, rt5.tigerfile
 ->  Seq Scan 

Re: [PERFORM] Should the optimizer see this?

2005-01-26 Thread Tom Lane
Ron Mayer <[EMAIL PROTECTED]> writes:
> Should the optimizer have found this optimization?

I can't get excited about it.  Joining on a column that's mostly nulls
doesn't seem like a common thing to do.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Ideal disk setup for Postgresql 7.4?

2005-01-26 Thread Josh Berkus
Steve,

> Okay. In CPU-bound servers, use hw RAID. Any hw RAIDs to avoid?

Well, the list of ones which are good is shorter: pretty much LSI and 3Ware 
(for SATA).  You can suffer with Adaptec if you have to.

> If we went with a single CPU, like an Athlon/Opteron 64, would CS
> storming go away?

Yes.  And then you might be able to use SW Raid.   Of course, you may lose 
performance in other areas with the 1 processor.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] OFFSET impact on Performance???

2005-01-26 Thread David Brown
Although larger offsets have some effect, your real problem is the sort 
(of 42693 rows).

Try:
SELECT r.id_order
FROM report r
WHERE r.id_order IN
  (SELECT id
  FROM orders
  WHERE id_status = 6
  ORDER BY 1
  LIMIT 10 OFFSET 1000)
ORDER BY 1
The subquery doesn't *have* to sort because the table is already ordered 
on the primary key.
You can still add a join to orders outside the subselect without 
significant cost.
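
That is, roughly (untested):

SELECT r.id_order, o.id_status
FROM report r
JOIN orders o ON o.id = r.id_order
WHERE r.id_order IN
  (SELECT id
   FROM orders
   WHERE id_status = 6
   ORDER BY 1
   LIMIT 10 OFFSET 1000)
ORDER BY 1;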

Incidentally, I don't know how you got the first plan - it should 
include a sort as well.

Andrei Bintintan wrote:
> explain analyze
> SELECT o.id
> FROM report r
> INNER JOIN orders o ON o.id=r.id_order AND o.id_status=6
> ORDER BY 1 LIMIT 10 OFFSET 10
> 
> Limit  (cost=44.37..88.75 rows=10 width=4) (actual time=0.160..0.275 
rows=10 loops=1)
>   ->  Merge Join  (cost=0.00..182150.17 rows=41049 width=4) (actual 
time=0.041..0.260 rows=20 loops=1)
> Merge Cond: ("outer".id_order = "inner".id)
> ->  Index Scan using report_id_order_idx on report r  
(cost=0.00..157550.90 rows=42862 width=4) (actual time=0.018..0.075 
rows=20 loops=1)
> ->  Index Scan using orders_pkey on orders o  
(cost=0.00..24127.04 rows=42501 width=4) (actual time=0.013..0.078 
rows=20 loops=1)
>   Filter: (id_status = 6)
> Total runtime: 0.373 ms
>
> explain analyze
> SELECT o.id
> FROM report r
> INNER JOIN orders o ON o.id=r.id_order AND o.id_status=6
> ORDER BY 1 LIMIT 10 OFFSET 100
> Limit  (cost=31216.85..31216.85 rows=1 width=4) (actual 
time=1168.152..1168.152 rows=0 loops=1)
>   ->  Sort  (cost=31114.23..31216.85 rows=41049 width=4) (actual 
time=1121.769..1152.246 rows=42693 loops=1)
> Sort Key: o.id
> ->  Hash Join  (cost=2329.99..27684.03 rows=41049 width=4) 
(actual time=441.879..925.498 rows=42693 loops=1)
>   Hash Cond: ("outer".id_order = "inner".id)
>   ->  Seq Scan on report r  (cost=0.00..23860.62 
rows=42862 width=4) (actual time=38.634..366.035 rows=42864 loops=1)
>   ->  Hash  (cost=2077.74..2077.74 rows=42501 width=4) 
(actual time=140.200..140.200 rows=0 loops=1)
> ->  Seq Scan on orders o  (cost=0.00..2077.74 
rows=42501 width=4) (actual time=0.059..96.890 rows=42693 loops=1)
>   Filter: (id_status = 6)
> Total runtime: 1170.586 ms

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Alex Turner
That's a really good idea, just store a list of the sorted ids in the
temp table - small amount of data for insert... I like it!

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 22:24:34 +0100, PFC <[EMAIL PROTECTED]> wrote:
> > The problem with this approach is TTFB (Time to first Byte).  The
> > initial query is very slow, but additional requests are fast.  In most
> > situations we do not want the user to have to wait a disproportionate
> > amount of time for the initial query.  If this is the first time using
> > the system this will be the impression that will stick with them.  I
> > guess we could experiment and see how much extra time creating a cache
> > table will take...
> 
> Do it on the second page then ;)
> 
> Seriously :
> - If you want to display the result count and page count, you'll need 
> to
> do the whole query anyway, so you might as well save the results.
> - inserting the result id's in a temp table one by one will be slow, 
> but
> you can do this :
> 
> select array_accum(id) from temp group by id/20 limit 3;
>array_accum
> ---
>   {1,2,4,8,16,17,9,18,19,5,10,11,3,6,12,13,7,14,15}
>   {32,33,34,35,36,37,38,39,20,21,22,23,24,25,26,27,28,29,30,31}
>   {40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59}
> 
> - a really big search of 131072 results :
> create table cache (id serial primary key, value integer[]);
> explain analyze insert into cache (value) select array_accum(id) from temp
> group by id/100;
>   Subquery Scan "*SELECT*"  (cost=14382.02..17986.50 rows=131072 width=32)
> (actual time=961.746..1446.630 rows=1311 loops=1)
> ->  GroupAggregate  (cost=14382.02..16020.42 rows=131072 width=4)
> (actual time=961.607..1423.803 rows=1311 loops=1)
>   ->  Sort  (cost=14382.02..14709.70 rows=131072 width=4) (actual
> time=961.181..1077.662 rows=131072 loops=1)
> Sort Key: (id / 100)
> ->  Seq Scan on "temp"  (cost=0.00..2216.40 rows=131072
> width=4) (actual time=0.032..291.652 rows=131072 loops=1)
>   Total runtime: 1493.304 ms
> 
> Note that the "SELECT..." part takes 1400 ms, and the INSERT part 
> takes
> the rest, which is really small. It's the sort which takes most of the
> time, but you'll be doing it anyway to get your results in order, so it
> comes free to you. This will generate 1000 pages with 100 results on each.
> If your searches yield say 1000 results it'll be perfectly fine and can
> target times in the sub-100 ms for caching the results (not counting the
> total query time of course !)
> 
> Using arrays is the key here, because inserting all the results as
> individual rows in the table is gonna be a whole lot slower !
> 
>

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-26 Thread Christopher Browne
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] 
("Merlin Moncure") transmitted:
> Alex wrote:
>> How do you create a temporary view that has only a small subset of the
>> data from the DB init?  (Links to docs are fine - I can read ;).  My
>> query isn't all that complex, and my number of records might be from
>> 10 to 2k depending on how I implement it.
>
> Well, you can't.  My point was that the traditional query/view
> approach is often more appropriate for these cases.

Actually, you can if you assume you can "temporarily materialize" that
view.

You take the initial query and materialize it into a temporary table
which can then be used to browse "detail."

Thus, suppose you've got a case where the selection criteria draw in
8000 objects/transactions, of which you only want to fit 20/page.

It's ugly and slow to process the 15th page, and you essentially
reprocess the whole set from scratch each time:

  select [details] from [big table] where [criteria]
order by [something]
offset 280 limit 20;

Instead, you might start out by doing:

  select [key fields] into temp table my_query
  from [big table] where [criteria];

  create index my_query_idx on my_query(interesting fields);

With 8000 records, the number of pages in the table will correspond
roughly to the number of bytes per record which is probably pretty
small.

Then, you use a join on my_query to pull the bits you want:

  select [big table.details] from [big table], 
  [select * from my_query order by [something] offset 280 limit 20]
  where [join criteria between my_query and big table]
  order by [something];

For this to be fast is predicated on my_query being compact, but that
should surely be so.

The big table may be 20 million records; for the result set to be even
vaguely browsable means that my_query ought to be relatively small so
you can pull subsets reasonably efficiently.
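
With concrete (made-up) names, the whole dance looks roughly like this
(untested):

  -- snapshot just the keys for this search
  select id, created into temp table my_query
    from big_table
   where category = 42;

  create index my_query_idx on my_query (created);

  -- each page is then a small indexed probe plus a join for the details
  select b.*
    from big_table b,
         (select id from my_query order by created offset 280 limit 20) as page
   where b.id = page.id
   order by b.created;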

This actually has a merit over looking at a dynamic, possibly-changing
big table that you won't unexpectedly see the result set changing
size.

This strikes me as a pretty slick way to handle "data warehouse-style"
browsing...
-- 
output = ("cbbrowne" "@" "gmail.com")
http://www.ntlug.org/~cbbrowne/oses.html
The first cup of coffee recapitulates phylogeny.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster