[PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC



The moral of the story is that you're probably better off running a
bunch of small selects than trying to optimize things with one
gargantuan select.



Ever experiment with loading the parameters into a temp table and
joining to that?


Also, it might be worth re-testing that conclusion with PG CVS tip
(or 8.2 when it comes out).  The reimplementation of IN as = ANY that
I did a couple months ago might well change the results.


Long mail, but I think it's interesting...

	I think this is a generic problem that is often encountered : selecting  
a bunch of records based on a list of primary keys (or other indexed,  
unique field) ; said list being anything from very short to quite large.

Here are a few occurrences of this need :

	1- The application supplies a list of id's (the case of the OP of this  
thread)
	2- A query Q1 yields a list of selected objects, which we wish to use in  
several subsequent queries.
	And Q1 is a query we don't wish to do several times, either because it's  
slow, complicated (advanced search, for instance), or it acts on a  
constantly moving dataset, so the results would be different each time. So  
we store the result of Q1 in the application, or in a temp table, or in an  
array in a plpgsql variable, whatever, to reuse them.


	Then, for each of these objects, often we will make more queries to  
resolve foreign keys (get category name, owner name, from categories and  
users tables, etc).


	I have encountered both cases quite often, and they both pose a few  
problems. I think it would be a good opportunity for a new feature (see  
below).

A typical use case for point 2 :

Consider an objects table. Each object ...
	- is related to one or several rows from the categories table via an  
objects_categories link table.

- has an owner_id referencing the users table

	I do an advanced search query on objects, which returns a list of  
objects. I can join directly to users to get the owner's name, but  
joining to categories is already problematic because of the many-to-many  
relationship.


	I wish to do this : fetch all objects matching the search criteria ;  
fetch the owner users ; fetch the categories ; build in my application  
object space a clean and straightforward data representation for all this.


Also :
- I do not wish to complicate the search query.
	- The row estimates for the search query results are likely to be poor  
(because it's a complex query) ; so the joins to users and  
categories are likely to use suboptimal plans based on those poor  
estimates.
	- The rows from objects are large ; so moving them around through a lot  
of small joins hurts performance.


The obvious solution is this :

BEGIN;
CREATE TEMPORARY TABLE results ON COMMIT DROP AS SELECT * FROM advanced  
search query;

ANALYZE results;

-- get the results to the application
SELECT * FROM results;

-- get object owners info
SELECT * FROM users WHERE id IN (SELECT user_id FROM results);

-- get category info
SELECT * FROM categories WHERE id IN (SELECT category_id FROM  
objects_to_categories WHERE object_id IN (SELECT id FROM results));


-- get object/category relations (the ORM will use this to link objects in  
the application)
SELECT * FROM objects_to_categories WHERE object_id IN (SELECT id FROM  
results);

COMMIT;

You might wonder why I do it this way on the categories table.
	This is because I use an Object-Relational mapper which will instantiate  
a User or Category class object for each row I fetch from these tables. I  
do not want to fetch just the username, using a simple join, but I want  
the full object, because :
	- I want to instantiate these objects (they have useful methods to  
process rights etc)

- I do not want to mix columns from objects and users

	And I do not wish to instantiate each category more than once. This would  
waste memory, but more importantly, it is a lot cleaner to have only one  
instance per row, because my ORM then translates the foreign key relations  
into object relations (pointers). Each instantiated category will contain  
a list of Object instances ; each Object instance will contain a list of  
the categories it belongs to, and point to its owner user.


	Back to the point : I can't use the temp table method, because temp  
tables are too slow.
	Creating a temp table, filling it, analyzing it and then dropping it  
takes about 100 ms. The search query, on average, takes 10 ms.


	So I have to move this logic to the application, or to plpgsql, and jump  
through hoops and use big IN() clauses ; which has the following drawbacks  
:

- slow
- ugly
- very hard for the ORM to auto-generate

***

Feature proposal :

	A way to store query results in a named buffer and reuse them in the next  
queries.
	This should be as fast as possible, store results in RAM if possible, and  
be 

[PERFORM] Arguments Pro/Contra Software Raid

2006-05-09 Thread Hannes Dorbath

Hi,

I've just had some discussion with colleagues regarding the usage of 
hardware or software raid 1/10 for our linux based database servers.


I myself can't see much reason to spend $500 on high end controller 
cards for a simple Raid 1.


Any arguments pro or contra would be desirable.

From my experience and what I've read here:

+ Hardware RAID might be a bit easier to manage, if you have never spent a 
few hours learning the software RAID tools.


+ There are situations in which software RAID is faster, as CPU power 
has advanced dramatically in recent years and even high end controller 
cards cannot keep up with that.


+ Using SATA drives is always a bit of a risk, as some drives lie 
about whether they are caching or not.


+ With hardware controllers, the array becomes locked to a particular 
vendor. You can't switch controller vendors, as the array meta 
information is stored in a proprietary format. If the RAID is broken to a 
level the controller can't recover automatically, this might complicate 
manual recovery by specialists.


+ Even battery backed controllers can't guarantee that data written to 
the drives is consistent after a power outage, nor that the drive 
does not corrupt something during the involuntary shutdown / power 
irregularities. (This is theoretical, as any server will be UPS backed.)



--
Regards,
Hannes Dorbath



Re: [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC



You might consider just selecting your primary key or a set of
primary keys to involved relations in your search query.  If you
currently use select * this can make your result set very large.

Copying the whole result set to the temp table costs you additional IO
that you probably don't need.


	It is a bit of a catch : I need this information, because the purpose of  
the query is to retrieve these objects. I can first store the ids, then  
retrieve the objects, but it's one more query.



Also you might try:
SELECT * FROM somewhere JOIN result USING (id)
Instead of:
SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)


	Yes you're right in this case ; however the query to retrieve the owners  
needs to eliminate duplicates, which IN() does.


On the other hand if your search query runs in 10ms it seems to be fast  
enough for you to run it multiple times.  There's probably no point in  
optimizing anything in such a case.


I don't think so :
	- 10 ms is a mean time, sometimes it can take much more time, sometimes  
it's faster.
	- Repeating the query might yield different results if records were added  
or deleted in the meantime.
	- Complex search queries have imprecise rowcount estimates ; hence the  
joins that I would add to them will get suboptimal plans.


	Using a temp table is really the cleanest solution now ; but it's too  
slow so I reverted to generating big IN() clauses in the application.




Re: [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread Christian Kratzer

Hi,

On Tue, 9 May 2006, PFC wrote:
[snip]
	Back to the point : I can't use the temp table method, because temp 
tables are too slow.
	Creating a temp table, filling it, analyzing it and then dropping it 
takes about 100 ms. The search query, on average, takes 10 ms.


just some thoughts:

You might consider just selecting your primary key or a set of
primary keys to involved relations in your search query.  If you
currently use select * this can make your result set very large.

Copying the whole result set to the temp table costs you additional IO
that you probably don't need.

Also you might try:

SELECT * FROM somewhere JOIN result USING (id)

Instead of:

SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)

Joins should be a lot faster than large IN clauses.

Here it will also help if result only contains the primary keys 
and not all the other data. The join will be much faster.
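
A minimal sketch of that variant, reusing the placeholder names from the
earlier example (the results table becomes a keys-only result_keys table ;
objects and users are the tables from the use case) :

BEGIN;

-- keep only the keys ; the wide columns stay in the base tables
CREATE TEMPORARY TABLE result_keys ON COMMIT DROP AS
    SELECT id, user_id FROM (advanced search query) AS q;

ANALYZE result_keys;

-- join back to the wide table once instead of copying it into the temp table
SELECT o.* FROM objects o JOIN result_keys r USING (id);

-- owners, de-duplicated before the join
SELECT u.* FROM users u
  JOIN (SELECT DISTINCT user_id FROM result_keys) r ON u.id = r.user_id;

COMMIT;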


On the other hand if your search query runs in 10ms it seems to be fast 
enough for you to run it multiple times.  There's probably no point in 
optimizing anything in such a case.


Greetings
Christian

--
Christian Kratzer   [EMAIL PROTECTED]
CK Software GmbHhttp://www.cksoft.de/
Phone: +49 7452 889 135 Fax: +49 7452 889 136



Re: [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread Christian Kratzer

Hi,

On Tue, 9 May 2006, PFC wrote:




You might consider just selecting your primary key or a set of
primary keys to involved relations in your search query.  If you
currently use select * this can make your result set very large.

Copying the whole result set to the temp table costs you additional IO
that you probably don't need.


	It is a bit of a catch : I need this information, because the purpose 
of the query is to retrieve these objects. I can first store the ids, then 
retrieve the objects, but it's one more query.


Yes, but depending on what you really need that can be faster.

In addition, you are already transferring the whole result set multiple
times.  First you copy it to the result table, then you read it again.
Your subsequent queries will also have to read over all the unneeded
tuples just to get your primary key.


Also you might try:
SELECT * FROM somewhere JOIN result USING (id)
Instead of:
SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)


	Yes you're right in this case ; however the query to retrieve the 
owners needs to eliminate duplicates, which IN() does.


then why useth thy not the DISTINCT clause when building thy result table 
and thou shalt have no duplicates.
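
A sketch of what that looks like with the tables from the earlier example
(only the owner ids are de-duplicated here, since the results table itself
has no duplicate object rows) :

-- build a de-duplicated key list once...
CREATE TEMPORARY TABLE result_owner_ids ON COMMIT DROP AS
    SELECT DISTINCT user_id FROM results;

-- ...then a plain join needs no IN() to eliminate duplicates
SELECT u.* FROM users u JOIN result_owner_ids r ON u.id = r.user_id;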


On the other hand if your search query runs in 10ms it seems to be fast 
enough for you to run it multiple times.  There's probably no point in 
optimizing anything in such a case.


I don't think so :
	- 10 ms is a mean time, sometimes it can take much more time, 
sometimes it's faster.
	- Repeating the query might yield different results if records were 
added or deleted in the meantime.


which is a perfect reason to use a temp table.  Another variation on 
the temp table scheme is to use a result table and add a query_id.


We do something like this in our web application when users submit
complex queries.  For each query we store tuples of (query_id, result_id)
in a result table.  It's then easy for the web application to page the
result set.
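
A sketch of that scheme (hypothetical names, not Christian's actual schema) :

-- one shared result table, keyed by query_id
CREATE TABLE query_results (
    query_id  integer NOT NULL,
    ord       serial,                 -- insertion order, used for paging
    result_id integer NOT NULL        -- key of the matched row
);
CREATE INDEX query_results_page_idx ON query_results (query_id, ord);

-- store one search run under a freshly allocated query_id (42 here)
INSERT INTO query_results (query_id, result_id)
    SELECT 42, id FROM (complex search query) AS q;

-- page through it later, joining back to the wide table for one page only
SELECT o.*
FROM query_results q
JOIN objects o ON o.id = q.result_id
WHERE q.query_id = 42
ORDER BY q.ord
LIMIT 20 OFFSET 40;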

	- Complex search queries have imprecise rowcount estimates ; hence 
the joins that I would add to them will get suboptimal plans.


	Using a temp table is really the cleanest solution now ; but it's too 
slow so I reverted to generating big IN() clauses in the application.


A cleaner solution usually pays off in the long run whereas a hackish
or overly complex solution will bite you in the behind for sure as
time goes by.

Greetings
Christian

--
Christian Kratzer   [EMAIL PROTECTED]
CK Software GmbHhttp://www.cksoft.de/
Phone: +49 7452 889 135 Fax: +49 7452 889 136



Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC


In addition, you are already transferring the whole result set multiple
times.  First you copy it to the result table, then you read it again.
Your subsequent queries will also have to read over all the unneeded
tuples just to get your primary key.


	Considering that the result set is not very large and will be cached in  
RAM, this shouldn't be a problem.


then why useth thy not the DISTINCT clause when building thy result  
table and thou shalt have no duplicates.


Because the result table contains no duplicates ;)
I need to remove duplicates in this type of query :

-- get object owners info
SELECT * FROM users WHERE id IN (SELECT user_id FROM results);

	And in this case I find IN() easier to read than DISTINCT (what I posted  
was a simplification of my real use case...)


which is a perfect reason to use a temp table.  Another variation on the  
temp table scheme is use a result table and add a query_id.


	True. Doesn't solve my problem though : it's still complex, doesn't have  
good rowcount estimation, bloats a table (I only need these records for  
the duration of the transaction), etc.


We do something like this in our web application when users submit
complex queries.  For each query we store tuples of (query_id, result_id)
in a result table.  It's then easy for the web application to page the
result set.


Yes, that is about the only sane way to page big result sets.


A cleaner solution usually pays off in the long run whereas a hackish
or overly complex solution will bite you in the behind for sure as
time goes by.


	Yes, but in this case temp tables add too much overhead. I wish there  
were RAM based temp tables like in mysql. However I guess the current temp  
table slowness comes from the need to mark their existence in the system  
catalogs or something. That's why I proposed using cursors...




Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread Martijn van Oosterhout
On Tue, May 09, 2006 at 12:10:37PM +0200, PFC wrote:
 Yes, but in this case temp tables add too much overhead. I wish
 there were RAM based temp tables like in mysql. However I guess the
 current temp table slowness comes from the need to mark their existence
 in the system catalogs or something. That's why I proposed using cursors...

It would be interesting to know what the bottleneck is for temp tables
for you. They do not go via the buffer-cache, they are stored in
private memory in the backend, they are not xlogged. Nor flushed to
disk on backend exit. They're about as close to in-memory tables as
you're going to get...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [PERFORM] Arguments Pro/Contra Software Raid

2006-05-09 Thread Grega Bremec

Hannes Dorbath wrote:
 Hi,
 
 I've just had some discussion with colleagues regarding the usage of
 hardware or software raid 1/10 for our linux based database servers.
 
 I myself can't see much reason to spend $500 on high end controller
 cards for a simple Raid 1.
 
 Any arguments pro or contra would be desirable.
 

One pro and one con off the top of my head.

Hotplug. Depending on your platform, SATA may or may not be hotpluggable
(I know AHCI mode is the only one promising some kind of a hotplug,
which means ICH6+ and Silicon Image controllers last I heard). SCSI
isn't hotpluggable without the use of special hotplug backplanes and
disks. You lose that in software RAID, which effectively means you need
to shut the box down and do maintenance. Hassle.

CPU. It's cheap. Much cheaper than your average hardware RAID card. For
the 5-10% overhead usually imposed by software RAID, you can throw in a
faster CPU and never even notice it. Most cases aren't CPU-bound
anyways, or at least, most cases are I/O bound for the better part. This
does raise the question of I/O bandwidth your standard SATA or SCSI
controller comes with, though. If you're careful about that and handle
hotplug sufficiently, you're probably never going to notice you're not
running on metal.

Kind regards,
- --
Grega Bremec
gregab at p0f dot net



Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC




It would be interesting to know what the bottleneck is for temp tables
for you. They do not go via the buffer-cache, they are stored in
private memory in the backend, they are not xlogged. Nor flushed to
disk on backend exit. They're about as close to in-memory tables as
you're going to get...


Hum...
	Timings are a mean over 100 queries, including roundtrip to localhost,  
via a python script.


0.038 ms BEGIN
0.057 ms SELECT 1
0.061 ms COMMIT

0.041 ms BEGIN
0.321 ms SELECT count(*) FROM bookmarks
0.080 ms COMMIT

this test table contains about 250 rows

0.038 ms BEGIN
0.378 ms SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
0.082 ms COMMIT

the ORDER BY uses an index

0.042 ms BEGIN
0.153 ms DECLARE tmp SCROLL CURSOR WITHOUT HOLD FOR SELECT * FROM  
bookmarks ORDER BY annonce_id DESC LIMIT 20

0.246 ms FETCH ALL FROM tmp
0.048 ms MOVE FIRST IN tmp
0.246 ms FETCH ALL FROM tmp
0.048 ms CLOSE tmp
0.084 ms COMMIT

the CURSOR is about as fast as a simple query

0.101 ms BEGIN
1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT  
NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC  
LIMIT 20

0.443 ms ANALYZE tmp
0.365 ms SELECT * FROM tmp
0.310 ms DROP TABLE tmp
32.918 ms COMMIT

	CREATING the table is OK, but what happens on COMMIT ? I hear the disk  
seeking frantically.


With fsync=off, I get this :

0.090 ms BEGIN
1.103 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT  
NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP
0.439 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC  
LIMIT 20

0.528 ms ANALYZE tmp
0.364 ms SELECT * FROM tmp
0.313 ms DROP TABLE tmp
0.688 ms COMMIT

Getting closer ?
	I'm betting on system catalog updates. I get the same timings with  
ROLLBACK instead of COMMIT. Temp tables have a row in pg_class...


Another temporary table wart :

BEGIN;
CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER NOT NULL, c  
TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP;

INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20;

EXPLAIN ANALYZE SELECT * FROM tmp;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on tmp  (cost=0.00..25.10 rows=1510 width=20) (actual time=0.003..0.006 rows=20 loops=1)
 Total runtime: 0.030 ms
(2 rows)

ANALYZE tmp;
EXPLAIN ANALYZE SELECT * FROM tmp;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on tmp  (cost=0.00..1.20 rows=20 width=20) (actual time=0.003..0.008 rows=20 loops=1)
 Total runtime: 0.031 ms

	We see that the temp table has a very wrong estimated rowcount until it  
has been ANALYZED.
	However, temporary tables do not support concurrent access (obviously) ;  
and in the case of on-commit-drop tables, inserts can't be rolled back  
(obviously), so an accurate rowcount could be maintained via a simple  
counter...






Re: [PERFORM] Memory and/or cache issues?

2006-05-09 Thread mcelroy, tim

Ok, thank you all again for your help in this matter. Yes, Michael, I (the original poster) did say, or imply is perhaps the better word, that a combination of training and hands-on is the best way for one to learn PostgreSQL, or just about anything for that matter. Thank you for recognizing the true intention of my statements.

One does need some sort of basis from which to grow. I will say that nothing can replace the hands-on real-world training one can get in this business as it is the best way to learn and remember. Just my opinion. For example, I stated I was a SysAdmin for 20 years. I was then thrust into the Oracle world as a DBA about 2 years ago while still maintaining my SysAdmin responsibilities. I have yet to receive any formal Oracle training and have had to learn that on my own via manuals, Google searches and begging the Oracle Database Architect here for assistance. However, with PostgreSQL I initially started down the very same track but was fortunate enough to receive the ok for that week long PG boot camp. Although I didn't take all that much away from the boot camp it did provide an excellent base from which I continue to grow as a PG DBA and it has helped me to understand postgres a lot easier and quicker than Oracle.

So please, let's just not throw emails back and forth amongst the group. Since joining I have found the group as a whole to be a great resource of information and PG knowledge, and do not want us to get testy with each other over something I said or someone's interpretation of what I said. Case closed.

BTW - I am still working towards getting the knowledge out here about what I learned from the posts, mainly that the buffers/cache row of information from the free command is the one we need to be most concerned with.

Thank you,
Tim McElroy


-Original Message-
From:  [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Stone

Sent: Monday, May 08, 2006 5:17 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Memory and/or cache issues?


On Mon, May 08, 2006 at 03:38:23PM -0400, Vivek Khera wrote:
On May 8, 2006, at 1:30 PM, Jim C. Nasby wrote:
Yeah, I prefer my surgeons to work this way too. training is for the
birds.

I think you read too quickly past the part where Tim said he'd taken a
week-long training class.

s/training/apprenticeship/g;


Of course, the original poster did say that hands-on was the best way to 
learn. What is apprenticeship but a combination of training and 
experience? Are you just sniping for fun?


Mike Stone







Re: [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread Tom Lane
PFC [EMAIL PROTECTED] writes:
   Feature proposal :

   A way to store query results in a named buffer and reuse them in the 
 next  
 queries.

Why not just fix the speed issues you're complaining about with temp
tables?  I see no reason to invent a new concept.

(Now, "just fix" might be easier said than done, but inventing an
essentially duplicate facility would be a lot of work too.)

regards, tom lane



Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Steve Atkins


On May 9, 2006, at 2:16 AM, Hannes Dorbath wrote:


Hi,

I've just had some discussion with colleagues regarding the usage  
of hardware or software raid 1/10 for our linux based database  
servers.


I myself can't see much reason to spend $500 on high end controller  
cards for a simple Raid 1.


Any arguments pro or contra would be desirable.

From my experience and what I've read here:

+ Hardware Raids might be a bit easier to manage, if you never  
spend a few hours to learn Software Raid Tools.


+ There are situations in which Software Raids are faster, as CPU  
power has advanced dramatically in the last years and even high end  
controller cards cannot keep up with that.


+ Using SATA drives is always a bit of risk, as some drives are  
lying about whether they are caching or not.


Don't buy those drives. That's unrelated to whether you use hardware
or software RAID.



+ Using hardware controllers, the array becomes locked to a  
particular vendor. You can't switch controller vendors as the array  
meta information is stored proprietary. In case the Raid is broken  
to a level the controller can't recover automatically this might  
complicate manual recovery by specialists.


Yes. Fortunately we're using the RAID for database work, rather than file
storage, so we can use all the nice postgresql features for backing up
and replicating the data elsewhere, which avoids most of this issue.



+ Even battery backed controllers can't guarantee that data written  
to the drives is consistent after a power outage, neither that the  
drive does not corrupt something during the involuntary shutdown /  
power irregularities. (This is theoretical as any server will be  
UPS backed)


fsync of WAL log.

If you have a battery backed writeback cache then you can get the
reliability of fsyncing the WAL for every transaction, and the performance
of not needing to hit the disk for every transaction.

Also, if you're not doing that you'll need to dedicate a pair of
spindles to the WAL log if you want to get good performance, so that
there'll be no seeking on the WAL. With a writeback cache you can put
the WAL on the same spindles as the database and not lose much, if
anything, in the way of performance. If that saves you the cost of two
additional spindles, and the space on your drive shelf for them, you've
just paid for a reasonably priced RAID controller.


Given those advantages... I can't imagine speccing a large system that
didn't have a battery-backed write-back cache in it. My dev systems mostly
use software RAID, if they use RAID at all. But my production boxes all
use SATA RAID (and I tell my customers to use controllers with BB cache,
whether it be SCSI or SATA).

My usual workloads are write-heavy. If yours are read-heavy that will
move the sweet spot around significantly, and I can easily imagine that
for a read-heavy load software RAID might be a much better match.

Cheers,
  Steve




Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-09 Thread Greg Stark

PFC [EMAIL PROTECTED] writes:

 
   I really like this. It's clean, efficient, and easy to use.
 
   This would be a lot faster than using temp tables.
   Creating cursors is very fast so we can create two, and avoid doing
 twice  the same work (ie. hashing the ids from the results to grab categories
 only once).

Creating cursors for a simple plan like a single sequential scan is fast
because it's using the original data from the table. But your example was
predicated on this part of the job being a complex query. If it's a complex
query involving joins and groupings, etc, then it will have to be materialized
and there's no (good) reason for that to be any faster than a temporary table
which is effectively the same thing.

-- 
greg




Re: [PERFORM] Arguments Pro/Contra Software Raid

2006-05-09 Thread Bruno Wolff III
On Tue, May 09, 2006 at 12:10:32 +0200,
  Jean-Yves F. Barbier [EMAIL PROTECTED] wrote:
 Naa, you can find ATA | SATA ctrlrs for about EUR30 !

But those are the ones that you would generally be better off not using.

 Definitely NOT, however if your server doesn't have a heavy load, the
 software overhead can't be noticed (essentially cache managing and
 syncing)

It is fairly common for database machines to be IO, rather than CPU, bound
and so the CPU impact of software raid is low.

 Some hardware ctrlrs are able to avoid the loss of a disk if you happen
 to have some faulty sectors (by relocating them internally); software
 RAID doesn't, as sectors *must* be @ the same (linear) addresses.

That is not true. Software raid works just fine on drives that have internally
remapped sectors.



Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Joshua D. Drake


Don't buy those drives. That's unrelated to whether you use hardware
or software RAID.


Sorry that is an extremely misleading statement. SATA RAID is perfectly 
acceptable if you have a hardware raid controller with a battery backup 
controller.


And dollar for dollar, SCSI will NOT be faster nor have the hard drive 
capacity that you will get with SATA.


Sincerely,

Joshua D. Drake


--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/





Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC



Creating cursors for a simple plan like a single sequential scan is fast
because it's using the original data from the table.


I used the following query :

SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20

It's a backward index scan + limit... not a seq scan. And it's damn 
fast :

0.042 ms BEGIN
0.153 ms DECLARE tmp SCROLL CURSOR WITHOUT HOLD FOR SELECT * FROM  
bookmarks ORDER BY annonce_id DESC LIMIT 20

0.246 ms FETCH ALL FROM tmp
0.048 ms MOVE FIRST IN tmp
0.246 ms FETCH ALL FROM tmp
0.048 ms CLOSE tmp
0.084 ms COMMIT



But your example was
predicated on this part of the job being a complex query. If it's a  
complex
query involving joins and groupings, etc, then it will have to be  
materialized
and there's no (good) reason for that to be any faster than a temporary  
table

which is effectively the same thing.


	You mean the cursor's storage is in fact the same internal machinery as a  
temporary table ?


	In that case, this raises an interesting question : why is the cursor  
faster ?


	Let's try a real-life example from my website : it is a search query  
(quite complex) which is then joined to a lot of tables to resolve FKeys.
	To that query I must add an application-made join using a big IN()  
clause extracted from the data.

Timings includes the time to fetch the results into Python.
The running total column is the sum of all timings since the BEGIN.


query_time  running_total   rowsquery
0.061 ms0.061 ms-1 BEGIN
23.420 ms   23.481 ms   85 SELECT * FROM (huge query with a  
lot of joins)
4.318 ms27.799 ms   2   SELECT l.*, u.login, u.bg_color  
FROM annonces_log l, users u WHERE u.id=l.user_id AND l.annonce_id IN  
(list of ids from previous query) ORDER BY annonce_id, added

0.241 ms28.040 ms   -1  COMMIT

	(Just in case you want to hurt yourself, here's the EXPLAIN ANALYZE  
output : http://peufeu.com/temp/big_explain.txt)

Using a cursor takes about the same time.

	Also, doing just the search query takes about 12 ms, the joins take up  
the rest.


Now, I'll rewrite my query eliminating the joins and using a temp table.
	Storing the whole result in the temp table will be too slow, because  
there are too many columns.
	Therefore I will only store the primary and foreign key columns, and join  
again to the main table to get the full records.


query_time  running_total   rowsquery
0.141 ms0.141 ms-1  BEGIN

Do the search :

8.229 ms8.370 ms-1  CREATE TEMPORARY TABLE tmp AS  
SELECT id, city_id, zipcode, contact_id, contact_group_id, price/terrain  
as sort FROM (stripped down search query)

0.918 ms9.287 ms-1  ANALYZE tmp

Fetch the main data to display :

7.663 ms16.951 ms   85  SELECT a.* FROM tmp t,  
annonces_display a WHERE a.id=t.id ORDER BY t.sort


Fetch log entries associates with each row (one row to many log 
entries) :

1.021 ms17.972 ms   2   SELECT l.*, u.login, u.bg_color  
FROM annonces_log l, users u, tmp t WHERE u.id=l.user_id AND l.annonce_id  
= t.id ORDER BY annonce_id, added
3.468 ms21.440 ms   216 SELECT annonce_id,  
array_accum(list_id) AS list_ids, array_accum(COALESCE(user_id,0)) AS  
list_added_by, max(added) AS added_to_list FROM bookmarks GROUP BY  
annonce_id


Resolve foreign key relations

1.034 ms22.474 ms   37  SELECT r.annonce_id FROM  
read_annonces r, tmp t WHERE r.annonce_id = t.id
0.592 ms23.066 ms   9   SELECT * FROM cities_dist_zipcode  
WHERE zipcode IN (SELECT zipcode FROM tmp)
0.716 ms23.782 ms   11  SELECT * FROM cities_dist WHERE id  
IN (SELECT city_id FROM tmp)
1.125 ms24.907 ms   45  SELECT * FROM contacts WHERE id IN  
(SELECT contact_id FROM tmp)
0.799 ms25.705 ms   42  SELECT * FROM contact_groups WHERE  
id IN (SELECT contact_group_id FROM tmp)

0.463 ms26.169 ms   -1  DROP TABLE tmp
32.208 ms   58.377 ms   -1  COMMIT


From this we see :

	Using a temporary table is FASTER than doing the large query with all the  
joins. (26 ms versus 28 ms).

It's also nicer and cleaner.
However the COMMIT takes as much time as all the queries together !

Let's run with fsync=off :

query_time  running_total   rowsquery
0.109 ms0.109 ms-1  BEGIN
8.321 ms8.430 ms-1  CREATE TEMPORARY TABLE tmp AS  
SELECT id, city_id, zipcode, contact_id, contact_group_id, price/terrain  
as sort FROM (stripped down search query)

0.849 ms9.280 ms-1  ANALYZE tmp
7.360 ms16.640 ms   85  SELECT a.* FROM tmp t,  
annonces_display a WHERE a.id=t.id ORDER BY t.sort
1.067 ms17.707 ms   2   SELECT l.*, u.login, u.bg_color  
FROM annonces_log l, users u, 

Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC



Does the time for commit change much if you leave out the analyze?


	Yes, when I don't ANALYZE the temp table, commit time changes from 30 ms  
to about 15 ms ; but the queries get horrible plans (see below) :


	Fun thing is, the rowcount from a temp table (which is the problem here)  
should be available without ANALYZE ; as the temp table is not concurrent,  
it would be simple to inc/decrement a counter on INSERT/DELETE...


	I like the temp table approach : it can replace a large, complex query  
with a batch of smaller and easier to optimize queries...


EXPLAIN ANALYZE SELECT a.* FROM tmp t, annonces_display a WHERE a.id=t.id  
ORDER BY t.sort;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3689.88..3693.15 rows=1310 width=940) (actual time=62.327..62.332 rows=85 loops=1)
   Sort Key: t.sort
   ->  Merge Join  (cost=90.93..3622.05 rows=1310 width=940) (actual time=5.595..61.373 rows=85 loops=1)
         Merge Cond: ("outer".id = "inner".id)
         ->  Index Scan using annonces_pkey on annonces  (cost=0.00..3451.39 rows=10933 width=932) (actual time=0.012..6.620 rows=10916 loops=1)
         ->  Sort  (cost=90.93..94.20 rows=1310 width=12) (actual time=0.098..0.105 rows=85 loops=1)
               Sort Key: t.id
               ->  Seq Scan on tmp t  (cost=0.00..23.10 rows=1310 width=12) (actual time=0.004..0.037 rows=85 loops=1)
 Total runtime: 62.593 ms

EXPLAIN ANALYZE SELECT * FROM contacts WHERE id IN (SELECT contact_id FROM  
tmp);

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Hash Join  (cost=28.88..427.82 rows=200 width=336) (actual time=0.156..5.019 rows=45 loops=1)
   Hash Cond: ("outer".id = "inner".contact_id)
   ->  Seq Scan on contacts  (cost=0.00..349.96 rows=9396 width=336) (actual time=0.009..3.373 rows=9396 loops=1)
   ->  Hash  (cost=28.38..28.38 rows=200 width=4) (actual time=0.082..0.082 rows=46 loops=1)
         ->  HashAggregate  (cost=26.38..28.38 rows=200 width=4) (actual time=0.053..0.064 rows=46 loops=1)
               ->  Seq Scan on tmp  (cost=0.00..23.10 rows=1310 width=4) (actual time=0.001..0.015 rows=85 loops=1)
 Total runtime: 5.092 ms

ANALYZE tmp;
ANALYZE
annonces=> EXPLAIN ANALYZE SELECT a.* FROM tmp t, annonces_display a WHERE a.id=t.id ORDER BY t.sort;
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=508.63..508.84 rows=85 width=940) (actual time=1.830..1.832 rows=85 loops=1)
   Sort Key: t.sort
   ->  Nested Loop  (cost=0.00..505.91 rows=85 width=940) (actual time=0.040..1.188 rows=85 loops=1)
         ->  Seq Scan on tmp t  (cost=0.00..1.85 rows=85 width=12) (actual time=0.003..0.029 rows=85 loops=1)
         ->  Index Scan using annonces_pkey on annonces  (cost=0.00..5.89 rows=1 width=932) (actual time=0.003..0.004 rows=1 loops=85)
               Index Cond: (annonces.id = "outer".id)
 Total runtime: 2.053 ms
(7 rows)

annonces=> EXPLAIN ANALYZE SELECT * FROM contacts WHERE id IN (SELECT contact_id FROM tmp);
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.06..139.98 rows=36 width=336) (actual time=0.072..0.274 rows=45 loops=1)
   ->  HashAggregate  (cost=2.06..2.51 rows=45 width=4) (actual time=0.052..0.065 rows=46 loops=1)
         ->  Seq Scan on tmp  (cost=0.00..1.85 rows=85 width=4) (actual time=0.003..0.016 rows=85 loops=1)
   ->  Index Scan using contacts_pkey on contacts  (cost=0.00..3.04 rows=1 width=336) (actual time=0.003..0.004 rows=1 loops=46)
         Index Cond: (contacts.id = "outer".contact_id)
 Total runtime: 0.341 ms

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread Dawid Kuroczko

On 5/9/06, PFC [EMAIL PROTECTED] wrote:

 You might consider just selecting your primary key or a set of
 primary keys to involved relations in your search query.  If you
 currently use select * this can make your result set very large.

 Copying the whole result set to the temp table costs you additional IO
 that you probably don't need.

It is a bit of a catch : I need this information, because the purpose of
the query is to retrieve these objects. I can first store the ids, then
retrieve the objects, but it's one more query.

 Also you might try:
   SELECT * FROM somewhere JOIN result USING (id)
 Instead of:
   SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)

Yes you're right in this case ; however the query to retrieve the owners
needs to eliminate duplicates, which IN() does.


Well, you can either
 SELECT * FROM somewhere JOIN (SELECT id FROM result GROUP BY id) AS
a USING (id);
or even, for large number of ids:
 CREATE TEMPORARY TABLE result_ids AS SELECT id FROM RESULT GROUP BY id;
 SELECT * FROM somewhere JOIN result_ids USING (id);



 On the other hand if your search query runs in 10ms it seems to be fast
 enough for you to run it multiple times.  There's probably no point in
 optimizing anything in such a case.

I don't think so :
- 10 ms is a mean time, sometimes it can take much more time, sometimes
it's faster.
- Repeating the query might yield different results if records were 
added
or deleted in the meantime.


You may SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
though locking might bite you. :)
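
A minimal sketch of that approach : repeat the search query inside one
serializable transaction, so both runs see the same snapshot even if rows
are added or deleted concurrently :

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- every statement in this transaction now sees one consistent snapshot
SELECT id, user_id FROM (advanced search query) AS q;
SELECT * FROM users
 WHERE id IN (SELECT user_id FROM (advanced search query) AS q);
COMMIT;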


- Complex search queries have imprecise rowcount estimates ; hence the
joins that I would add to them will get suboptimal plans.

Using a temp table is really the cleanest solution now ; but it's too
slow so I reverted to generating big IN() clauses in the application.


A thought, haven't checked it though, but...

You might want to use PL to store values, say PLperl, or even C, say:

create or replace function perl_store(name text, val int) returns void
as $$ my $name = shift; push @{$foo{$name}}, shift; return; $$ LANGUAGE
plperl;

select perl_store('someids', id) from something group by id;
(you may need to wrap it inside count())

Then use it:

create or replace function perl_retr(name text) returns setof int as
$$ my $name = shift; return $foo{$name}; $$ LANGUAGE plperl;

select * from someother join perl_retr('someids') AS a(id) using (id);

All is in the memory.  Of course, you need to do some cleanup, test it,
etc, etc, etc. :)

Should work faster than a in-application solution :)

 Regards,
 Dawid



Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread PFC




   SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)



Well, you can either
  SELECT * FROM somewhere JOIN (SELECT id FROM result GROUP BY id) AS
a USING (id);


It's the same thing (and postgres knows it)


You might want to use PL to store values, say PLperl, or even C, say:


I tried.
	The problem is that you need a set-returning function to retrieve the  
values. SRFs don't have rowcount estimates, so the plans suck.



Should work faster than a in-application solution :)


Should, but don't, because of what I said above...

	With the version in CVS tip, supporting a fast =ANY( array ), this should  
be doable, though.
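
For reference, a minimal sketch of that = ANY(array) form, using the users
table from the earlier examples (8.2-era syntax) :

-- the whole id list is one array value instead of a big textual IN() list
SELECT * FROM users WHERE id = ANY ('{1,2,3,4}'::integer[]);

-- or, from a driver, with the array bound as a single parameter :
-- SELECT * FROM users WHERE id = ANY ($1::integer[]);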




Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Steve Atkins


On May 9, 2006, at 8:51 AM, Joshua D. Drake wrote:

(Using SATA drives is always a bit of risk, as some drives are lying  
about whether they are caching or not.)



Don't buy those drives. That's unrelated to whether you use hardware
or software RAID.


Sorry that is an extremely misleading statement. SATA RAID is  
perfectly acceptable if you have a hardware raid controller with a  
battery backup controller.


If the drive says it's hit the disk and it hasn't, then the RAID
controller will have flushed the data from its cache (or flagged it as
correctly written). At that point the only place the data is stored is
in the non battery backed cache on the drive itself. If something fails
then you'll have lost data.

You're not suggesting that a hardware RAID controller will protect
you against drives that lie about sync, are you?



And dollar for dollar, SCSI will NOT be faster nor have the hard  
drive capacity that you will get with SATA.


Yup. That's why I use SATA RAID for all my databases.

Cheers,
  Steve



Re: [PERFORM] Arguments Pro/Contra Software Raid

2006-05-09 Thread Vivek Khera


On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:

Sorry that is an extremely misleading statement. SATA RAID is  
perfectly acceptable if you have a hardware raid controller with a  
battery backup controller.


And dollar for dollar, SCSI will NOT be faster nor have the hard  
drive capacity that you will get with SATA.


Does this hold true still under heavy concurrent-write loads?  I'm  
preparing yet another big DB server and if SATA is a better option,  
I'm all (elephant) ears.






Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Douglas McNaught
Vivek Khera [EMAIL PROTECTED] writes:

 On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:

 And dollar for dollar, SCSI will NOT be faster nor have the hard
 drive capacity that you will get with SATA.

 Does this hold true still under heavy concurrent-write loads?  I'm
 preparing yet another big DB server and if SATA is a better option,
 I'm all (elephant) ears.

Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.

-Doug



Re: [PERFORM] Arguments Pro/Contra Software Raid

2006-05-09 Thread Joshua D. Drake

Vivek Khera wrote:


On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:

Sorry that is an extremely misleading statement. SATA RAID is 
perfectly acceptable if you have a hardware raid controller with a 
battery backup controller.


And dollar for dollar, SCSI will NOT be faster nor have the hard drive 
capacity that you will get with SATA.


Does this hold true still under heavy concurrent-write loads?  I'm 
preparing yet another big DB server and if SATA is a better option, I'm 
all (elephant) ears.


I didn't say better :). If you can afford it, SCSI is the way to go. 
However, SATA with a good controller (I am fond of the LSI 150 series) 
can provide some great performance.


I have not used, but have heard good things about Areca as well. Oh, and 
make sure they are SATA-II drives.


Sincerely,

Joshua D. Drake







--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/





Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Joshua D. Drake



You're not suggesting that a hardware RAID controller will protect
you against drives that lie about sync, are you?


Of course not, but which drives lie about sync that are SATA? Or more 
specifically SATA-II?


Sincerely,

Joshua D. Drake



--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/





Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Steve Atkins


On May 9, 2006, at 11:26 AM, Joshua D. Drake wrote:




You're not suggesting that a hardware RAID controller will protect
you against drives that lie about sync, are you?


Of course not, but which drives lie about sync that are SATA? Or  
more specifically SATA-II?


SATA-II, none that I'm aware of, but there's a long history of dodgy
behaviour designed to pump up benchmark results down in the
consumer drive space, and low end consumer space is where a
lot of SATA drives are. I wouldn't be surprised to see that behaviour
there still.

I was responding to the original poster's assertion that drives lying
about sync were a reason not to buy SATA drives, by telling him
not to buy drives that lie about sync. You seem to have read this
as "don't buy SATA drives", which is not what I said and not what I
meant.

Cheers,
  Steve



Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Joshua D. Drake

Douglas McNaught wrote:

Vivek Khera [EMAIL PROTECTED] writes:


On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:


And dollar for dollar, SCSI will NOT be faster nor have the hard
drive capacity that you will get with SATA.

Does this hold true still under heavy concurrent-write loads?  I'm
preparing yet another big DB server and if SATA is a better option,
I'm all (elephant) ears.


Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.


Best I have seen is 10k but if I can put 4x the number of drives in the 
array at the same cost... I don't need 15k.


Joshua D. Drake



-Doug





--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/





Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Scott Marlowe
On Tue, 2006-05-09 at 12:52, Steve Atkins wrote:
 On May 9, 2006, at 8:51 AM, Joshua D. Drake wrote:
 
 (Using SATA drives is always a bit of risk, as some drives are lying  
 about whether they are caching or not.)
 
  Don't buy those drives. That's unrelated to whether you use hardware
  or software RAID.
 
  Sorry that is an extremely misleading statement. SATA RAID is  
  perfectly acceptable if you have a hardware raid controller with a  
  battery backup controller.
 
 If the drive says it's hit the disk and it hasn't then the RAID  
 controller
 will have flushed the data from its cache (or flagged it as correctly
 written). At that point the only place the data is stored is in the non
 battery backed cache on the drive itself. If something fails then you'll
 have lost data.
 
 You're not suggesting that a hardware RAID controller will protect
 you against drives that lie about sync, are you?

Actually, in the case of the Escalades at least, the answer is yes. 
Last year (maybe a bit more) someone was testing an IDE Escalade
controller with drives that were known to lie, and it passed the power
plug pull test repeatedly.  Apparently, the Escalades tell the drives to
turn off their cache.  While almost all IDE drives and a fair number of
SATA drives lie about cache fsyncing, they all seem to turn off the cache
when you ask.

And, since a hardware RAID controller with bbu cache has its own cache,
it's not like it really needs the one on the drives anyway.



Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread Tom Lane
PFC [EMAIL PROTECTED] writes:
   Fun thing is, the rowcount from a temp table (which is the problem 
 here)  
 should be available without ANALYZE ; as the temp table is not concurrent,  
 it would be simple to inc/decrement a counter on INSERT/DELETE...

No, because MVCC rules still apply.

regards, tom lane



[PERFORM] Slow C Function

2006-05-09 Thread Adam Palmblad
 
Hi,

We've got a C function that we use here and we find that for every
connection, the first run of the function is much slower than any
subsequent runs. ( 50ms compared to 8ms)

Besides using connection pooling, are there any options to improve
performance?

By the way, we are using pg version 8.1.3.

-Adam




Re: [PERFORM] Slow C Function

2006-05-09 Thread Joshua D. Drake

Adam Palmblad wrote:
 
Hi,


We've got a C function that we use here and we find that for every
connection, the first run of the function is much slower than any
subsequent runs. ( 50ms compared to 8ms)


That is fairly standard because the data will be cached.



Besides using connection pooling, are there any options to improve
performance?


Not that I know of but then again I am not a C programer.





By the way, we are using pg version 8.1.3.

-Adam






--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/





[PERFORM] Postgres gets stuck

2006-05-09 Thread Craig A. James

I'm having a rare but deadly problem.  On our web servers, a process occasionally gets stuck, and 
can't be unstuck.  Once it's stuck, all Postgres activities cease.  kill -9 is required 
to kill it -- signals 2 and 15 don't work, and /etc/init.d/postgresql stop fails.

Here's what the process table looks like:

$ ps -ef | grep postgres
postgres 30713 1  0 Apr24 ?00:02:43 /usr/local/pgsql/bin/postmaster 
-p 5432 -D /disk3/postgres/data
postgres 25423 30713  0 May08 ?00:03:34 postgres: writer process
postgres 25424 30713  0 May08 ?00:00:02 postgres: stats buffer process
postgres 25425 25424  0 May08 ?00:00:02 postgres: stats collector 
process
postgres 11918 30713 21 07:37 ?02:00:27 postgres: production webuser 
127.0.0.1(21772) SELECT
postgres 31624 30713  0 16:11 ?00:00:00 postgres: production webuser 
[local] idle
postgres 31771 30713  0 16:12 ?00:00:00 postgres: production webuser 
127.0.0.1(12422) idle
postgres 31772 30713  0 16:12 ?00:00:00 postgres: production webuser 
127.0.0.1(12421) idle
postgres 31773 30713  0 16:12 ?00:00:00 postgres: production webuser 
127.0.0.1(12424) idle
postgres 31774 30713  0 16:12 ?00:00:00 postgres: production webuser 
127.0.0.1(12425) idle
postgres 31775 30713  0 16:12 ?00:00:00 postgres: production webuser 
127.0.0.1(12426) idle
postgres 31776 30713  0 16:12 ?00:00:00 postgres: production webuser 
127.0.0.1(12427) idle
postgres 31777 30713  0 16:12 ?00:00:00 postgres: production webuser 
127.0.0.1(12428) idle

The SELECT process is the one that's stuck.  top(1) and other indicators show that 
nothing is going on at all (no CPU usage, normal memory usage); the process seems to be 
blocked waiting for something.  (The idle processes are attached to a FastCGI 
program.)

This has happened on *two different machines*, both doing completely different 
tasks.  The first one is essentially a read-only warehouse that serves lots of 
queries, and the second one is the server we use to load the warehouse.  In 
both cases, Postgres has been running for a long time, and is issuing SELECT 
statements that it's issued millions of times before with no problems.  No 
other processes are accessing Postgres, just the web services.

This is a deadly bug, because our web site goes dead when this happens, and it 
requires an administrator to log in and kill the stuck postgres process then 
restart Postgres.  We've installed failover system so that the web site is 
diverted to a backup server, but since this has happened twice in one week, 
we're worried.

Any ideas?

Details:

   Postgres 8.0.3
   Linux 2.6.12-1.1381_FC3smp i686 i386

   Dell 2-CPU Xeon system (hyperthreading is enabled)
   4 GB memory
   2 120 GB disks (SATA on machine 1, IDE on machine 2)

Thanks,
Craig



Re: [PERFORM] Postgres gets stuck

2006-05-09 Thread Chris


This is a deadly bug, because our web site goes dead when this happens, 
and it requires an administrator to log in and kill the stuck postgres 
process then restart Postgres.  We've installed failover system so that 
the web site is diverted to a backup server, but since this has happened 
twice in one week, we're worried.


Any ideas?


Sounds like a deadlock issue.

Do you have query logging turned on?

Also, edit your postgresql.conf file and add (or uncomment):

stats_command_string = true

and restart postgresql.

then you'll be able to:

select * from pg_stat_activity;

to see what queries postgres is running and that might give you some clues.
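
If it is a lock wait rather than a hang, pg_locks will show it. A sketch,
assuming the 8.0-era catalog columns (procpid, current_query) and
stats_command_string enabled as above :

-- list backends waiting on a lock, together with what they are running
SELECT l.pid, l.relation::regclass AS locked_relation, l.mode,
       a.usename, a.query_start, a.current_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.procpid = l.pid
 WHERE NOT l.granted;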

--
Postgresql  php tutorials
http://www.designmagick.com/



Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Bruce Momjian
Scott Marlowe wrote:
 Actually, in the case of the Escalades at least, the answer is yes. 
 Last year (maybe a bit more) someone was testing an IDE escalade
 controller with drives that were known to lie, and it passed the power
 plug pull test repeatedly.  Apparently, the escalades tell the drives to
 turn off their cache.  While most all IDEs and a fair number of SATA
 drives lie about cache fsyncing, they all seem to turn off the cache
 when you ask.
 
 And, since a hardware RAID controller with bbu cache has its own cache,
 it's not like it really needs the one on the drives anyway.

You do if the controller thinks the data is already on the drives and
removes it from its cache.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] Arguments Pro/Contra Software Raid

2006-05-09 Thread William Yu

William Yu wrote:
We upgraded our disk system for our main data processing server earlier 
this year. After pricing out all the components, basically we had the 
choice of:


LSI MegaRaid 320-2 w/ 1GB RAM+BBU + 8 15K 150GB SCSI

  or

Areca 1124 w/ 1GB RAM+BBU + 24 7200RPM 250GB SATA


My mistake -- I keep doing calculations and they don't add up. So I 
looked again on pricewatch and it turns out the actual comparison was 
for 4 SCSI drives, not 8! ($600 for a 15K 145GB versus $90 for a 7200 
250GB.) No wonder our decision seemed so much more decisive back then.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Lot'sa joins - performance tip-up, please?

2006-05-09 Thread Mario Splivalo
On Wed, 2006-05-03 at 10:20 -0500, Dave Dutcher wrote:
   ->  Nested Loop  (cost=0.00..176144.30 rows=57925 width=26)
  (actual time=68.322..529472.026 rows=57925 loops=1)
 ->  Seq Scan on ticketing_codes_played
  (cost=0.00..863.25 rows=57925 width=8) (actual time=0.042..473.881
  rows=57925 loops=1)
 ->  Index Scan using ticketing_codes_pk on
  ticketing_codes  (cost=0.00..3.01 rows=1 width=18) (actual
  time=9.102..9.108 rows=1 loops=57925)
   Index Cond: (ticketing_codes.code_id =
  outer.code_id)
   Total runtime: 542000.093 ms
  (27 rows)
  
  
  I'll be more than happy to provide any additional information 
  that I may
  be able to gather. I'd be most happy if someone would scream something
  like "four joins, smells like a poor design" because the design 
  is poor, but
  the system is in production, and I have to bear with it.
 
 
 It looks like that nested loop which is joining ticketing_codes_played
 to ticketing_codes is the slow part.  I'm curious how many rows are in
 the ticketing_codes table?
 
 Four or five joins does not seem like a lot to me, but it can be slow if
 you are joining big tables with other big tables.

The ticketing_codes table has 1100 records, and it's expected to grow.

I tried playing with JOIN order as Tom suggested, but performance is the
same.

Mario


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Lot'sa joins - performance tip-up, please?

2006-05-09 Thread Mario Splivalo
On Wed, 2006-05-03 at 13:58 -0400, Tom Lane wrote:
 Mario Splivalo [EMAIL PROTECTED] writes:
  I have a quite large query that takes over a minute to run on my laptop.
 
 The EXPLAIN output you provided doesn't seem to agree with the stated
 query.  Where'd the service_id = 1102 condition come from?

I guess I copy-pasted the additional WHERE into the EXPLAIN ANALYZE query.
This is the correct one, without the WHERE:

 Hash Left Join  (cost=198628.35..202770.61 rows=121 width=264) (actual
time=998008.264..999645.322 rows=5706 loops=1)
   Hash Cond: (outer.message_id = inner.message_id)
   ->  Merge Left Join  (cost=21943.23..21950.96 rows=121 width=238)
(actual time=4375.510..4540.772 rows=5706 loops=1)
 Merge Cond: (outer.message_id = inner.message_id)
 ->  Sort  (cost=21847.62..21847.92 rows=121 width=230) (actual
time=3304.787..3378.515 rows=5706 loops=1)
   Sort Key: messages.id
   ->  Hash Join  (cost=20250.16..21843.43 rows=121
width=230) (actual time=1617.370..3102.470 rows=5706 loops=1)
 Hash Cond: (outer.message_id = inner.id)
 ->  Seq Scan on ticketing_messages
(cost=0.00..1212.37 rows=75937 width=14) (actual time=10.554..609.967
rows=75937 loops=1)
 ->  Hash  (cost=20244.19..20244.19 rows=2391
width=216) (actual time=1572.889..1572.889 rows=5706 loops=1)
   ->  Nested Loop  (cost=1519.21..20244.19
rows=2391 width=216) (actual time=385.582..1449.207 rows=5706 loops=1)
 ->  Seq Scan on services
(cost=0.00..4.20 rows=3 width=54) (actual time=20.829..20.859 rows=2
loops=1)
   Filter: (type_id = 10)
 ->  Bitmap Heap Scan on messages
(cost=1519.21..6726.74 rows=1594 width=162) (actual
time=182.346..678.800 rows=2853 loops=2)
   Recheck Cond: ((outer.id =
messages.service_id) AND (messages.receiving_time >= '2006-02-12
00:00:00+01'::timestamp with time zone) AND (messages.receiving_time <=
'2006-03-18 23:00:00+01'::timestamp with time zone))
   ->  BitmapAnd
(cost=1519.21..1519.21 rows=1594 width=0) (actual time=164.311..164.311
rows=0 loops=2)
 ->  Bitmap Index Scan on
idx_service_id  (cost=0.00..84.10 rows=14599 width=0) (actual
time=66.809..66.809 rows=37968 loops=2)
   Index Cond:
(outer.id = messages.service_id)
 ->  Bitmap Index Scan on
idx_messages_receiving_time  (cost=0.00..1434.87 rows=164144 width=0)
(actual time=192.633..192.633 rows=184741 loops=1)
   Index Cond:
((receiving_time >= '2006-02-12 00:00:00+01'::timestamp with time zone)
AND (receiving_time <= '2006-03-18 23:00:00+01'::timestamp with time
zone))
 ->  Sort  (cost=95.62..99.17 rows=1421 width=8) (actual
time=1070.678..1072.999 rows=482 loops=1)
   Sort Key: ticketing_winners.message_id
   ->  Seq Scan on ticketing_winners  (cost=0.00..21.21
rows=1421 width=8) (actual time=424.836..1061.834 rows=1421 loops=1)
   ->  Hash  (cost=176144.30..176144.30 rows=57925 width=26) (actual
time=993592.980..993592.980 rows=57925 loops=1)
 ->  Nested Loop  (cost=0.00..176144.30 rows=57925 width=26)
(actual time=1074.984..992536.243 rows=57925 loops=1)
   ->  Seq Scan on ticketing_codes_played
(cost=0.00..863.25 rows=57925 width=8) (actual time=74.479..2047.993
rows=57925 loops=1)
   ->  Index Scan using ticketing_codes_pk on
ticketing_codes  (cost=0.00..3.01 rows=1 width=18) (actual
time=17.044..17.052 rows=1 loops=57925)
 Index Cond: (ticketing_codes.code_id =
outer.code_id)
 Total runtime: 999778.981 ms


 In general, I'd suggest playing around with the join order.  Existing
 releases of PG tend to throw up their hands when faced with a mixture of
 outer joins and regular joins, and just join the tables in the order
 listed.  8.2 will be smarter about this, but for now you have to do it
 by hand ...

No luck for me there. But I found out that if I first do the join on
ticketing_codes and ticketing_codes_played, put the result into a temporary
table, and then join that temporary table with the rest of the query
(the SELECT that is in parentheses is transferred to a temporary table),
the query is almost twice as fast.
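
For reference, a rough sketch of that rewrite (the column list is guessed
from the plan above, so treat the names as assumptions):

BEGIN;

CREATE TEMPORARY TABLE tmp_played_codes ON COMMIT DROP AS
    SELECT tcp.message_id, tc.code_id, tc.code
    FROM ticketing_codes_played tcp
    JOIN ticketing_codes tc ON tc.code_id = tcp.code_id;

ANALYZE tmp_played_codes;

-- ...and the big query then joins tmp_played_codes instead of doing the
-- ticketing_codes_played / ticketing_codes join itself.

COMMIT;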

As mentioned before, ticketing_codes has 1100 records.

Mario

P.S. Is it just me, or is posting to pgsql-performance lagged quite a
bit?


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] slow variable against int??

2006-05-09 Thread Witold Strzelczyk
I have a question about my function. I must get the user rating by game result. 
This probably isn't a perfect solution, but I have one question about this query:

select into inGameRating  count(game_result)+1 from users
where game_result > inRow.game_result;

Inside the function this query runs in about 1100 ms.
inRow.game_result is an integer, 2984.
And now if I replace inRow.game_result with the integer itself:

select into inGameRating  count(game_result)+1 from users
where game_result > 2984;

the query runs in about 100 ms.

There is probably a reason for this, but can you tell me what it is, because I 
can't find one.
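
One way to see which plan each form gets (just a sketch; it assumes
PostgreSQL 7.4 or later and is not specific to this schema) is to EXPLAIN
both the parameterised and the literal versions:

-- plan it the way plpgsql does, with a parameter instead of the value:
PREPARE rating_cnt(int) AS
    select count(game_result)+1 from users where game_result > $1;
EXPLAIN EXECUTE rating_cnt(2984);

-- versus the plan chosen when the literal value is visible to the planner:
EXPLAIN select count(game_result)+1 from users where game_result > 2984;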

My function:

create or replace function ttt_result(int,int) returns setof tparent_result 
language plpgsql volatile as $$
declare 
inOffset alias for $1;
inLimit alias for $2;
inRow tparent_result%rowtype;
inGameResult int := -1;
inGameRating int := -1;
begin

for inRow in 
select 
email,wynik_gra 
from 
konkurs_uzytkownik 
order by wynik_gra desc limit inLimit offset inOffset 
loop
if inGameResult < 0 then -- only for first iteration
/* this is fast ~100 ms
select into inGameRating  
count(game_result)+1 from users
where game_result > 2984;
*/
/* even if inRow.game_result = 2984 this is very slow ~ 1100 ms!
select into inGameRating  count(game_result)+1 from users
where game_result > inRow.game_result;
*/
inGameResult  := inRow.game_result;
end if;

if inGameResult <> inRow.game_result then 
inGameRating  := inGameRating  + 1;
end if;

inRow.game_rating := inGameRating;
inGameResult   := inRow.game_result;
return next inRow;

end loop;
return;
end;
$$;
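
If the cached, parameterised plan turns out to be the problem, one common
workaround is to build the statement with the literal value so it is
re-planned for the actual number. A rough sketch (tmpRec is an extra record
variable you would have to declare; not tested against this schema):

-- instead of the "select into inGameRating ..." that uses the variable:
for tmpRec in execute
    'select count(game_result)+1 as rating from users where game_result > '
    || inRow.game_result
loop
    inGameRating := tmpRec.rating;
end loop;
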
-- 
Witold Strzelczyk
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance Issues on Opteron Dual Core

2006-05-09 Thread Gregory Stewart
I installed Ubuntu 5.10 on the production server (64-Bit version), and sure
enough the performance is what I expected. Opening up that table (320,000
records) takes 6 seconds, with CPU usage of one of the cores going up to
90% - 100% for the 6 seconds.
I assume only one core is being used per user / session / query?

Gregory


-Original Message-
From: Jim C. Nasby [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 04, 2006 12:47 PM
To: Gregory Stewart
Cc: Mark Kirkwood; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance Issues on Opteron Dual Core


All the machines I've been able to replicate this on have been SMP w2k3
machines running SP1. I've been unable to replicate it on anything not
running w2k3, but the only 'SMP' machine I've tested in that manner was
an Intel with HT enabled. I now have an intel with HT and running w2k3
sitting in my office, but I haven't had a chance to fire it up and try
it yet. Once I test that machine it should help narrow down if this
problem exists with HT machines (which someone on -hackers mentioned
they had access to and could do testing with). If it does affect HT
machines then I suspect that this is not an issue for XP...

On Tue, May 02, 2006 at 11:27:02PM -0500, Gregory Stewart wrote:
 Jim,

 Have you seen this happening only on W2k3? I am wondering if I should try
 out 2000 Pro or XP Pro.
 Not my first choice, but if it works...



 -Original Message-
 From: Jim C. Nasby [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 02, 2006 3:29 PM
 To: Mark Kirkwood
 Cc: Gregory Stewart; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Performance Issues on Opteron Dual Core


 On Sun, Apr 30, 2006 at 10:59:56PM +1200, Mark Kirkwood wrote:
  Pgadmin can give misleading times for queries that return large result
  sets over a network, due to:
 
  1/ It takes time to format the (large) result set for display.
  2/ It has to count the time spent waiting for the (large) result set to
  travel across the network.
 
  You aren't running Pgadmin off the dev server are you? If not check your
  network link to dev and prod  - is one faster than the other? (etc).
 
  To eliminate Pgadmin and the network as factors try wrapping your query
  in a 'SELECT count(*) FROM (your query here) AS a', and see if it
  changes anything!

 FWIW, I've found problems running PostgreSQL on Windows in a multi-CPU
 environment on w2k3. It runs fine for some period, and then CPU and
 throughput drop to zero. So far I've been unable to track down any more
 information than that, other than the fact that I haven't been able to
 reproduce this on any single-CPU machines.
 --
 Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
 Pervasive Software  http://pervasive.com    work: 512-231-6117
 vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461





 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461





---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Dynamically loaded C function performance

2006-05-09 Thread Adam Palmblad
Hi,
We've got a C function that we use here and we find that for every
connection, the first run of the function is much slower than any
subsequent runs.  ( 50ms compared to 8ms)

Besides using connection pooling, are there any options to improve
performance?

-Adam

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM]

2006-05-09 Thread Adam Palmblad
Hi,

We've got a C function that we use here and we find that for every
connection, the first run of the function is much slower than any
subsequent runs. ( 50ms compared to 8ms)

Besides using connection pooling, are there any options to improve
performance?

By the way, we are using pg version 8.1.3.

-Adam


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] VACUUM killing my CPU

2006-05-09 Thread blender
Hi all !

I am running PostgreSQL 7.3.2 on Linux 2.6.13...

What I see when the VACUUM process is running is:

Cpu(s):  0.0% us,  3.2% sy,  0.0% ni,  0.0% id, 93.5% wa,  3.2% hi,
0.0% si

What I am worried about is the 93.5% wa ...

Could someone explain to me what the VACUUM process is waiting for?

Best regards
David


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] PostgreSQL VACCUM killing CPU

2006-05-09 Thread blender
Hi all !

I have got the following problem.
I'm running PostgreSQL 7.3.2 on Linux 2.6.13.
What I see when VACUUM is running and killing my CPU is:

Cpu(s):  3.2% us,  0.0% sy,  0.0% ni,  0.0% id, 96.8% wa,  0.0% hi,
0.0% si

What I am worried about is the 96.8% wa. Why is it like that?

What is the process waiting for?

Could someone explain that to me, please? :)

Best regards
david


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Jean-Yves F. Barbier
Hi Hannes,

Hannes Dorbath wrote:
 Hi,
 
 I've just had some discussion with colleagues regarding the usage of
 hardware or software raid 1/10 for our linux based database servers.
 
 I myself can't see much reason to spend $500 on high end controller
 cards for a simple Raid 1.

Naa, you can find ATA | SATA controllers for about EUR30!

 Any arguments pro or contra would be desirable.
 
 From my experience and what I've read here:
 
 + Hardware Raids might be a bit easier to manage, if you never spend a
 few hours to learn Software Raid Tools.

I'd say the same (mostly because you still have to punch a command line for
most of the controllers).

 + There are situations in which Software Raids are faster, as CPU power
 has advanced dramatically in the last years and even high end controller
 cards cannot keep up with that.

Definitely NOT; however, if your server doesn't have a heavy load, the
software overhead won't be noticed (it's essentially cache management and
syncing).

For dual-core CPUs, it might be true.


 + Using SATA drives is always a bit of risk, as some drives are lying
 about whether they are caching or not.

?? Do you intend to use your server without a UPS ??

 + Using hardware controllers, the array becomes locked to a particular
 vendor. You can't switch controller vendors as the array meta
 information is stored proprietary. In case the Raid is broken to a level
 the controller can't recover automatically this might complicate manual
 recovery by specialists.

?? Do you intend not to make backups ??

 + Even battery backed controllers can't guarantee that data written to
 the drives is consistent after a power outage, neither that the drive
 does not corrupt something during the involuntary shutdown / power
 irregularities. (This is theoretical as any server will be UPS backed)

RAID's laws:

1- RAID prevents you from losing data on healthy disks, not on faulty
   disks,

1b- So format and reformat your RAID disks (whether SCSI, ATA or SATA)
several times, with destructive tests (see the -c -c option in the
mke2fs man page). It will ensure that the disks are safe, and also
make a kind of burn-in test (which might turn into... days of formatting!),

2- RAID doesn't protect you from power supply breakage or electricity
   outages, so use a (LARGE) UPS,

2b- A LARGE UPS, because HDs are the components with the highest power
consumption (a 700VA UPS gives me about 10-12 minutes on a machine
with an XP2200+, 1GB RAM and a 40GB HD; however this falls to...
less than 25 seconds with seven HDs, all ATA),

2c- Use a server box with redundant power supplies,

3- As for any sensitive data, make regular backups or you'll be a
sitting duck.

Some hardware controllers are able to avoid the loss of a disk if it turns
out to have some faulty sectors (by relocating them internally); software
RAID doesn't, as sectors *must* be at the same (linear) addresses.

BUT a hardware controller is about EUR2000 and an (ATA/SATA) 500GB HD
is ~ EUR350.

That means you have to consider:

* The server availability (time to change a power supply if there is no
   redundancy, time to exchange a non-hot-swap HD... in fact, how much
   downtime you can afford),

* The volume of the data (from which depends the size of the backup
  device),

* The backup device you'll use (tape or other HDs),

* The load of the server (and the number of simultaneous users =
  Soft|Hard, ATA/SATA|SCSI...),

* The money you can spend on such a server,

* And most importantly, the color of your boss's tie on the day you
   make the decision.

Hope it will help you

Jean-Yves


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] UNSUBSCRIBE

2006-05-09 Thread Shoaib Burq

UNSUBSCRIBE

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL VACCUM killing CPU

2006-05-09 Thread Will Reese
The wa means waiting on IO.  Vacuum is a very IO intensive  
process.  You can use tools like vmstat and iostat to see how much  
disk IO is occurring.  Also, sar is very helpful for trending these  
values over time.


-- Will Reese   http://blog.rezra.com
On May 9, 2006, at 5:19 AM, [EMAIL PROTECTED] wrote:


Hi all !

I have got the following problem.
I'm running PostgreSQL 7.3.2 on Linux 2.6.13.
What I see when VACUUM is running and killing my CPU is:

Cpu(s):  3.2% us,  0.0% sy,  0.0% ni,  0.0% id, 96.8% wa,  0.0% hi,
0.0% si

What I am worried about is the 96.8% wa. Why is it like that?

What is the process waiting for?

Could someone explain that to me, please? :)

Best regards
david


---(end of  
broadcast)---

TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] UNSUBSCRIBE

2006-05-09 Thread Chris

Shoaib Burq wrote:

UNSUBSCRIBE


To unsubscribe:

List-Unsubscribe: 
mailto:[EMAIL PROTECTED]



Email admins - Could we add this above or below the random tips that get 
appended to every email ?


--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] PostgreSQL VACCUM killing CPU

2006-05-09 Thread Michael Glaesemann



I have got the following problem.
I'm running PostgreSQL 7.3.2 on Linux 2.6.13.


Also, you should seriously consider upgrading. 8.1.3 is the current  
PostgreSQL release. If you must remain on 7.3, at least upgrade to  
7.3.14, which contains *many* bugfixes.


Michael Glaesemann
grzm seespotcode net




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Slow C Function

2006-05-09 Thread Tom Lane
Adam Palmblad [EMAIL PROTECTED] writes:
 We've got a C function that we use here and we find that for every
 connection, the first run of the function is much slower than any
 subsequent runs. ( 50ms compared to 8ms)

Perhaps that represents the time needed to load the dynamic library
into the backend?  If so, the preload_libraries parameter might
help you fix it.  Or consider pooling connections.  Or build a custom
executable with the function linked in permanently.
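
For reference, a minimal postgresql.conf sketch of the first option
('mylib' here is a placeholder; the exact entry depends on how your .so
was installed):

# load the shared library once at postmaster start so new backends
# don't pay the dlopen cost on their first call (requires a restart)
preload_libraries = '$libdir/mylib'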

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Scott Lamb

On May 9, 2006, at 11:26 AM, Joshua D. Drake wrote:
Of course not, but which drives lie about sync that are SATA? Or  
more specifically SATA-II?


I don't know the answer to this question, but have you seen this tool?

http://brad.livejournal.com/2116715.html

It attempts to experimentally determine if, with your operating  
system version, controller, and hard disk, fsync() does as claimed.  
Of course, experimentation can't prove the system is correct, but it  
can sometimes prove the system is broken.


I say it's worth running on any new model of disk, any new  
controller, or after the Linux kernel people rewrite everything (i.e.  
on every point release).


I have to admit to hypocrisy, though... I'm running on systems that  
other people ordered and installed, I doubt they were this thorough,  
and I don't have identical hardware to run tests on. So there's no real way  
to do this.


Regards,
Scott

--
Scott Lamb http://www.slamb.org/



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] VACUUM killing my CPU

2006-05-09 Thread Alan Hodgson
On May 9, 2006 02:45 am, [EMAIL PROTECTED] wrote:
 What I am worried about is the 93.5% wa ...

 Could someone explain to me what the VACUUM process is waiting for?


Disk I/O.

-- 
In a truly free society, Alcohol, Tobacco and Firearms would be a 
convenience store chain.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-09 Thread Greg Stark
Douglas McNaught [EMAIL PROTECTED] writes:

 Vivek Khera [EMAIL PROTECTED] writes:
 
  On May 9, 2006, at 11:51 AM, Joshua D. Drake wrote:
 
  And dollar for dollar, SCSI will NOT be faster nor have the hard
  drive capacity that you will get with SATA.
 
  Does this hold true still under heavy concurrent-write loads?  I'm
  preparing yet another big DB server and if SATA is a better option,
  I'm all (elephant) ears.
 
 Correct me if I'm wrong, but I've never heard of a 15kRPM SATA drive.

Well, dollar for dollar you would get the best performance from slower drives
anyways since it would give you more spindles. 15kRPM drives are *expensive*.

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] UNSUBSCRIBE

2006-05-09 Thread Tom Lane
Chris [EMAIL PROTECTED] writes:
 Email admins - Could we add this above or below the random tips that get 
 appended to every email ?

You mean like these headers that already get added to every list
message (these copied-and-pasted from your own message):

List-help: mailto:[EMAIL PROTECTED]
List-owner: mailto:[EMAIL PROTECTED]
List-subscribe: mailto:[EMAIL PROTECTED]
List-unsubscribe: mailto:[EMAIL PROTECTED]

Plus there are at least two of the random tips that deal with how to
unsubscribe.  My feeling is that the people who can't figure this out
still won't figure it out, no matter how thick the cluebat we swing at
them :-(

Maybe the real problem is at the other end of the process, ie we should
require some evidence of a greater-than-room-temp IQ to subscribe in the
first place?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] UNSUBSCRIBE

2006-05-09 Thread Chris

Tom Lane wrote:

Chris [EMAIL PROTECTED] writes:

Email admins - Could we add this above or below the random tips that get 
appended to every email ?



You mean like these headers that already get added to every list
message (these copied-and-pasted from your own message):


The headers aren't the first place you'd go looking for such info... once 
you know they are there, it's OK.



Maybe the real problem is at the other end of the process, ie we should
require some evidence of a greater-than-room-temp IQ to subscribe in the
first place?


Maybe :) The php-general list has

To unsubscribe, visit: http://www.php.net/unsub.php

at the bottom of every email, and there are still random unsubscribe 
requests..


Ah well :)

--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] UNSUBSCRIBE

2006-05-09 Thread Tom Lane
Chris [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Maybe the real problem is at the other end of the process, ie we should
 require some evidence of a greater-than-room-temp IQ to subscribe in the
 first place?

 Maybe :) The php-general list has

 To unsubscribe, visit: http://www.php.net/unsub.php

 at the bottom of every email, and there are still random unsubscribe 
 requests..

That's depressing, indeed :-(

I'm not against spending a little bandwidth to provide unsub
instructions, but somehow I can't see putting an 8x10 color glossy
photograph with circles and arrows and a paragraph on the back [1]
of every list message to do it.

regards, tom lane

[1] http://www.guntheranderson.com/v/data/alicesre.htm

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly