Re: [PERFORM] [IMPORTANT] - My application performance

2005-07-27 Thread Dawid Kuroczko
On 7/26/05, Roberto Germano Vieweg Neto <[EMAIL PROTECTED]> wrote:
> My application is using Firebird 1.5.2
> 
> My question is:
> 
> Will PostgreSQL be faster than Firebird? How much faster (in percent)?
> 
> I need about 20% to 50% more performance from my application.
> Can I get this by migrating to PostgreSQL?

The answer is: maybe.  There's nothing which stops PostgreSQL from
being faster, and likewise there is nothing that stops it from being
slower.  YMMV.

Your route should be:
 * migrate the most speed-demanding part to PostgreSQL
 * benchmark it, trying to emulate real-world load.
 * if it is slower than Firebird, post it here, together with EXPLAIN ANALYZE
   output (a minimal sketch of capturing one follows below), and ask if
   there's something you can do to speed it up.
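
For the benchmarking step, the tool of choice is EXPLAIN ANALYZE. A minimal
sketch; the table and filter here are invented for illustration:

EXPLAIN ANALYZE
SELECT o.id, o.total
  FROM orders o
 WHERE o.customer_id = 42;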

   Regards,
  Dawid



[PERFORM] Inherited Table Query Planning (fwd)

2005-07-27 Thread Jens-Wolfhard Schicke
Is there a way to make the query planner consider pulling inner appends 
outside joins?


Example:
natural_person inherits from person (obviously)

admpostgres3=# explain analyze select u.name, p.name from users u, person p 
where p.user_id = u.id and u.name = 's_ohl';

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=8.01..3350.14 rows=3 width=36) (actual time=107.391..343.657 rows=10 loops=1)
   Hash Cond: ("outer".user_id = "inner".id)
   ->  Append  (cost=0.00..2461.34 rows=117434 width=20) (actual time=0.007..264.910 rows=117434 loops=1)
         ->  Seq Scan on person p  (cost=0.00..575.06 rows=31606 width=20) (actual time=0.005..38.911 rows=31606 loops=1)
         ->  Seq Scan on natural_person p  (cost=0.00..1886.28 rows=85828 width=19) (actual time=0.003..104.338 rows=85828 loops=1)
   ->  Hash  (cost=8.01..8.01 rows=2 width=24) (actual time=0.096..0.096 rows=0 loops=1)
         ->  Index Scan using users_name_idx on users u  (cost=0.00..8.01 rows=2 width=24) (actual time=0.041..0.081 rows=10 loops=1)
               Index Cond: ((name)::text = 's_ohl'::text)
 Total runtime: 343.786 ms
(9 rows)

admpostgres3=# explain analyze select u.name, p.name from users u, only 
person p where p.user_id = u.id and u.name = 's_ohl' union all select 
u.name, p.name from users u, only natural_person p where p.user_id = u.id 
and u.name = 's_ohl';

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..28.19 rows=3 width=28) (actual time=0.197..0.366 rows=10 loops=1)
   ->  Subquery Scan "*SELECT* 1"  (cost=0.00..14.12 rows=1 width=28) (actual time=0.159..0.159 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..14.11 rows=1 width=28) (actual time=0.157..0.157 rows=0 loops=1)
               ->  Index Scan using users_name_idx on users u  (cost=0.00..8.01 rows=2 width=24) (actual time=0.039..0.075 rows=10 loops=1)
                     Index Cond: ((name)::text = 's_ohl'::text)
               ->  Index Scan using person_user_idx on person p  (cost=0.00..3.03 rows=2 width=8) (actual time=0.006..0.006 rows=0 loops=10)
                     Index Cond: (p.user_id = "outer".id)
   ->  Subquery Scan "*SELECT* 2"  (cost=0.00..14.08 rows=2 width=28) (actual time=0.036..0.193 rows=10 loops=1)
         ->  Nested Loop  (cost=0.00..14.06 rows=2 width=28) (actual time=0.033..0.171 rows=10 loops=1)
               ->  Index Scan using users_name_idx on users u  (cost=0.00..8.01 rows=2 width=24) (actual time=0.018..0.049 rows=10 loops=1)
                     Index Cond: ((name)::text = 's_ohl'::text)
               ->  Index Scan using natural_person_user_idx on natural_person p  (cost=0.00..3.01 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=10)
                     Index Cond: (p.user_id = "outer".id)
 Total runtime: 0.475 ms
(14 rows)


With kind regards,
Jens Schicke
--
Jens Schicke        [EMAIL PROTECTED]
asco GmbH           http://www.asco.de
Mittelweg 7         Tel 0531/3906-127
38106 Braunschweig  Fax 0531/3906-400



Re: [PERFORM] Inherited Table Query Planning (fwd)

2005-07-27 Thread Tom Lane
Jens-Wolfhard Schicke <[EMAIL PROTECTED]> writes:
> Is there a way to make the query planner consider pulling inner appends 
> outside joins?

Not at the moment.

regards, tom lane



Re: [PERFORM] "Vacuum Full Analyze" taking so long

2005-07-27 Thread Tomeh, Husam
Thank you all for your great input. It sure helped. 

-- 
 Husam 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jochem van
Dieten
Sent: Tuesday, July 26, 2005 2:58 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] "Vacuum Full Analyze" taking so long

Tomeh, Husam wrote:
> The other question I have: what would be the proper approach to
> rebuilding indexes? I re-index and then run vacuum/analyze. Should I
> not use the re-index approach, and instead drop the indexes, vacuum
> the tables, create the indexes, and then run analyze on the tables
> and indexes?

If you just want to rebuild indexes, drop and recreate them.

However, you are also running a VACUUM FULL, so I presume you have
deleted a significant number of rows and want to recover the space they
were using. In that scenario, it is often better to CLUSTER the table
to force a rebuild. While VACUUM FULL moves the tuples around inside
the existing file(s), CLUSTER simply creates new file(s), moves all the
non-deleted tuples there, and then swaps the old and the new files.
There can be a significant performance gain in doing so (but you
obviously need to have some free disk space).

If you CLUSTER your table, it will be ordered by the index you specify,
which can itself improve performance. If you don't want the reordering,
you can instead do a no-op ALTER TABLE, changing a column to the
datatype it already has. This too forces a rewrite of the table, but
without ordering the tuples.

So in short my recommendations:
- to rebuild indexes, just drop and recreate the indexes
- to rebuild everything because there is space that can be permanently
reclaimed, drop the indexes, cluster or alter the table, recreate the
indexes and analyze the table (see the sketch below)
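
A minimal sketch of both rewrites; the table, index, and column names
here are made up, and ALTER ... TYPE needs 8.0 or later:

-- Rebuild the table in index order; this also reclaims dead space:
CLUSTER mytable_pkey ON mytable;

-- No-op ALTER TABLE: force a rewrite without reordering by "changing"
-- a column to the type it already has:
ALTER TABLE mytable ALTER COLUMN id TYPE integer;

-- Refresh planner statistics afterwards:
ANALYZE mytable;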

Jochem






Re: [Bizgres-general] Re: [PERFORM] faster INSERT with possible

2005-07-27 Thread Josh Berkus
Luke,

> Well - now that I test it, it appears you are correct, temp table COPY
> bypasses WAL - thanks for pointing it out!

Right.  The problem is bypassing WAL for loading new "scratch" tables which 
aren't TEMPORARY tables.   We need to do this for multi-threaded ETL, since:
a) Temp tables can't be shared by several writers, and
b) you can't index a temp table.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [Bizgres-general] Re: [PERFORM] faster INSERT with possible

2005-07-27 Thread Kris Jurka


On Wed, 27 Jul 2005, Josh Berkus wrote:

> b) you can't index a temp table.
> 

jurka# create temp table t (a int);
CREATE
jurka# create index myi on t(a);
CREATE



[PERFORM] wal_buffer tests in

2005-07-27 Thread Josh Berkus
Folks,

I ran a wal_buffer test series.   It appears that increasing the 
wal_buffers is indeed very important for OLTP applications, potentially 
resulting in as much as a 15% average increase in transaction processing.  
What's interesting is that this is not just true for 8.1, it's true for 
8.0.3 as well.   
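
For anyone who wants to try this, the change is a one-liner in
postgresql.conf; the value below is illustrative, not the exact setting
from the test series:

# In 8.0, wal_buffers is a count of 8kB WAL buffers (default 8) and only
# takes effect after a server restart.
wal_buffers = 64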

More importantly, 8.1 performance is somehow back up to above-8.0 levels.  
Something was broken in June that has since been fixed (this test series is 
based on July 3 CVS) but I don't know what.  Clues?

Test results are here:
http://pgfoundry.org/docman/view.php/141/79/wal_buffer_test.pdf

As always, detailed test results are available from OSDL, just use:
http://khack.osdl.org/stp/#
where # is the test number.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Mirroring PostgreSQL database

2005-07-27 Thread Sebastian Hennebrueder
Shashi Kanth Boddula wrote:

> Hi,
> I have one customer who is using PostgreSQL 7.4.8 on Linux. He has
> some problems with database mirroring. The details are as follows.
> The customer is running PostgreSQL 7.4.8 on Linux along with JBoss
> 3.2.3. He has 2 servers: one acting as the live (primary) server and
> another acting as the fail-over (secondary) server. The secondary
> server is placed in a remote location. These servers act as an
> attendance server for daily activities. Nearly 50,000 employees
> depend on the live server.
>  
> The customer is using the DBmirror tool to mirror the database
> records from primary to secondary. The customer is complaining that
> there is a one-day (24-hour) delay between primary and secondary for
> database synchronization. They have a dedicated line and bandwidth,
> but the problem still exists.
>  
> I just want to know: for immediate data mirroring, what is the best
> way for PostgreSQL? PostgreSQL offers many mirroring tools, but which
> one is the best? Is there any other way to accomplish the task?
>  
> Thank you. Waiting for your reply.
>  
>
> Thanks & Regards,
> Shashi Kanth
> Consultant - Linux
> RHCE , LPIC-2
> Onward Novell - Bangalore
> 9886455567
>
>

For a Java-based solution you could also have a look at x-jdbc or xjdbc.

But before that, you should find out what the reason for the delay
actually is. If the backup server is too slow, it may not matter which
mirroring tool you use.


-- 
Best Regards

Sebastian Hennebrueder



http://www.laliluna.de

Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB 

Get support, education and consulting for these technologies - uncomplicated 
and cheap.




Re: [Bizgres-general] Re: [PERFORM] faster INSERT with possible

2005-07-27 Thread Simon Riggs
On Wed, 2005-07-27 at 09:29 -0700, Josh Berkus wrote:
> Luke,
> 
> > Well - now that I test it, it appears you are correct, temp table COPY
> > bypasses WAL - thanks for pointing it out!
> 
> Right.  The problem is bypassing WAL for loading new "scratch" tables which 
> aren't TEMPORARY tables.   We need to do this for multi-threaded ETL, since:
> a) Temp tables can't be shared by several writers, and
> b) you can't index a temp table.

The description of "scratch" tables might need some slight
clarification. It kind of makes it sound like temp tables.

I had in mind the extra tables that an application sometimes needs to
operate faster. Denormalisations, pre-joined tables, pre-calculated
results, aggregated data. These are not temporary tables, just part of
the application - multi-user tables that stay across shutdown/restart.

If you have gallons of GB, you will probably be looking to make use of
such tables.

You can use such tables for the style of ETL known as ELT, but that is
not the only use.

Best Regards, Simon Riggs




Re: [Bizgres-general] Re: [PERFORM] faster INSERT with possible

2005-07-27 Thread PFC




> I had in mind the extra tables that an application sometimes needs to
> operate faster. Denormalisations, pre-joined tables, pre-calculated
> results, aggregated data. These are not temporary tables, just part of
> the application - multi-user tables that stay across shutdown/restart.


	You could also add caching of search results, for easy pagination
without re-running, in its entirety on each page, the Big Slow Search
Query that every website has...
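
	A sketch of such a cache; all table and column names here are
invented for illustration:

CREATE TABLE search_cache (
    search_id  integer,   -- one id per executed search
    position   integer,   -- rank of the hit within that search
    result_id  integer,   -- id of the matching row
    PRIMARY KEY (search_id, position)
);

-- Run the Big Slow Search Query once, store its ordered ids, then
-- fetch any page cheaply:
SELECT result_id
  FROM search_cache
 WHERE search_id = 42
   AND position BETWEEN 20 AND 39
 ORDER BY position;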




[PERFORM] Performance problems testing with Spamassassin 3.1.0 Bayes module.

2005-07-27 Thread Matthew Schumacher
I'm not sure how much this has been discussed on the list, but I wasn't
able to find anything relevant in the archives.

The new Spamassassin is due out pretty soon.  They are currently testing
3.1.0pre4.  One of the things I hope to get out of this release is bayes
word stats moved to a real RDBMS.  They have separated the MySQL
BayesStore module from the PgSQL one, so now postgres can use its own
queries.

I loaded all of this stuff up on a test server and am finding that the
bayes put performance is really not good enough for any real amount of
mail load.

The performance problem seems to be when the bayes module is
inserting/updating.  This is now handled by the put_token procedure.

After playing with various indexes and what not I simply am unable to
make this procedure perform any better.  Perhaps someone on the list can
spot the bottleneck and reveal why this procedure isn't performing that
well or ways to make it better.

I put the rest of the schema up at
http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone
needs to see it too.

CREATE OR REPLACE FUNCTION put_token(integer, bytea, integer, integer,
integer) RETURNS bool AS '
DECLARE
inuserid ALIAS for $1;
intoken ALIAS for $2;
inspam_count ALIAS for $3;
inham_count ALIAS for $4;
inatime ALIAS for $5;
got_token record;
updated_atime_p bool;
BEGIN
  updated_atime_p := FALSE;
  SELECT INTO got_token spam_count, ham_count, atime
FROM bayes_token
   WHERE id = inuserid
 AND token = intoken;
   IF NOT FOUND THEN
 -- we do not insert negative counts, just return true
 IF (inspam_count < 0 OR inham_count < 0) THEN
   RETURN TRUE;
 END IF;
 INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)
 VALUES (inuserid, intoken, inspam_count, inham_count, inatime);
 IF NOT FOUND THEN
   RAISE EXCEPTION ''unable to insert into bayes_token'';
   return FALSE;
 END IF;
 UPDATE bayes_vars SET token_count = token_count + 1
  WHERE id = inuserid;
 IF NOT FOUND THEN
   RAISE EXCEPTION ''unable to update token_count in bayes_vars'';
   return FALSE;
 END IF;
 UPDATE bayes_vars SET newest_token_age = inatime
  WHERE id = inuserid AND newest_token_age < inatime;
 IF NOT FOUND THEN
   UPDATE bayes_vars
  SET oldest_token_age = inatime
WHERE id = inuserid
  AND oldest_token_age > inatime;
 END IF;
 return TRUE;
  ELSE
IF (inspam_count != 0) THEN
  -- no need to update atime if it is < the existing value
  IF (inatime < got_token.atime) THEN
UPDATE bayes_token
   SET spam_count = spam_count + inspam_count
 WHERE id = inuserid
   AND token = intoken
   AND spam_count + inspam_count >= 0;
  ELSE
UPDATE bayes_token
   SET spam_count = spam_count + inspam_count,
   atime = inatime
 WHERE id = inuserid
   AND token = intoken
   AND spam_count + inspam_count >= 0;
IF FOUND THEN
  updated_atime_p := TRUE;
END IF;
  END IF;
END IF;
IF (inham_count != 0) THEN
  -- no need to update atime if it is < the existing value or if it
  -- was already updated
  IF inatime < got_token.atime OR updated_atime_p THEN
UPDATE bayes_token
   SET ham_count = ham_count + inham_count
 WHERE id = inuserid
   AND token = intoken
   AND ham_count + inham_count >= 0;
  ELSE
UPDATE bayes_token
   SET ham_count = ham_count + inham_count,
   atime = inatime
 WHERE id = inuserid
   AND token = intoken
   AND ham_count + inham_count >= 0;
IF FOUND THEN
  updated_atime_p := TRUE;
END IF;
  END IF;
END IF;
IF updated_atime_p THEN
  UPDATE bayes_vars
 SET oldest_token_age = inatime
   WHERE id = inuserid
 AND oldest_token_age > inatime;
END IF;
return TRUE;
  END IF;
END;
' LANGUAGE 'plpgsql';
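
For reference, a call to the procedure looks like this; the argument
values here are made up:

SELECT put_token(1, 'sometoken'::bytea, 1, 0, 1122500000);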




Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0 Bayes module.

2005-07-27 Thread Josh Berkus
Matt,

> After playing with various indexes and what not I simply am unable to
> make this procedure perform any better.  Perhaps someone on the list can
> spot the bottleneck and reveal why this procedure isn't performing that
> well or ways to make it better.

Well, my first thought is that this is a pretty complicated procedure for 
something you want to perform well.  Is all this logic really necessary?   
How does it get done for MySQL?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [Bizgres-general] Re: [PERFORM] faster INSERT with possible

2005-07-27 Thread Tom Lane
Josh Berkus  writes:
> Right.  The problem is bypassing WAL for loading new "scratch" tables which 
> aren't TEMPORARY tables.   We need to do this for multi-threaded ETL, since:
> a) Temp tables can't be shared by several writers, and
> b) you can't index a temp table.

This may not matter given point (a), but: point (b) is completely wrong.

regards, tom lane



Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-27 Thread Matthew Schumacher
Josh Berkus wrote:
> Matt,
> 
> 
>>After playing with various indexes and what not I simply am unable to
>>make this procedure perform any better.  Perhaps someone on the list can
>>spot the bottleneck and reveal why this procedure isn't performing that
>>well or ways to make it better.
> 
> 
> Well, my first thought is that this is a pretty complicated procedure for 
> something you want to perform well.  Is all this logic really necessary?   
> How does it get done for MySQL?
> 

I'm not sure if it's all needed; in MySQL they have this simple schema:

===
CREATE TABLE bayes_expire (
  id int(11) NOT NULL default '0',
  runtime int(11) NOT NULL default '0',
  KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;

CREATE TABLE bayes_global_vars (
  variable varchar(30) NOT NULL default '',
  value varchar(200) NOT NULL default '',
  PRIMARY KEY  (variable)
) TYPE=MyISAM;

INSERT INTO bayes_global_vars VALUES ('VERSION','3');

CREATE TABLE bayes_seen (
  id int(11) NOT NULL default '0',
  msgid varchar(200) binary NOT NULL default '',
  flag char(1) NOT NULL default '',
  PRIMARY KEY  (id,msgid)
) TYPE=MyISAM;

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;

CREATE TABLE bayes_vars (
  id int(11) NOT NULL AUTO_INCREMENT,
  username varchar(200) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  token_count int(11) NOT NULL default '0',
  last_expire int(11) NOT NULL default '0',
  last_atime_delta int(11) NOT NULL default '0',
  last_expire_reduce int(11) NOT NULL default '0',
  oldest_token_age int(11) NOT NULL default '2147483647',
  newest_token_age int(11) NOT NULL default '0',
  PRIMARY KEY  (id),
  UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
===

Then they do this to insert the token:

INSERT INTO bayes_token (
  id,
  token,
  spam_count,
  ham_count,
  atime
) VALUES (
  ?,
  ?,
  ?,
  ?,
  ?
) ON DUPLICATE KEY
  UPDATE
spam_count = GREATEST(spam_count + ?, 0),
ham_count = GREATEST(ham_count + ?, 0),
atime = GREATEST(atime, ?)

Or update the token:

UPDATE bayes_vars SET
  $token_count_update
  newest_token_age = GREATEST(newest_token_age, ?),
  oldest_token_age = LEAST(oldest_token_age, ?)
  WHERE id = ?


I think the reason why the procedure was written for postgres was
because of the greatest and least statements performing poorly.

Honestly, I'm not real up on writing procs, I was hoping the problem
would be obvious to someone.

schu



Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0 Bayes module.

2005-07-27 Thread Josh Berkus
Matt,

> UPDATE bayes_vars SET
>   $token_count_update
>   newest_token_age = GREATEST(newest_token_age, ?),
>   oldest_token_age = LEAST(oldest_token_age, ?)
>   WHERE id = ?
>
>
> I think the reason why the procedure was written for postgres was
> because of the greatest and least statements performing poorly.

Well, it might be because we don't have a built-in GREATEST or LEAST prior to 
8.1.   However, it's pretty darned easy to construct one.
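
A minimal sketch of two-argument integer versions as plain SQL functions;
the function names are made up, and NULL handling is glossed over:

CREATE OR REPLACE FUNCTION greatest_int(integer, integer) RETURNS integer AS '
  SELECT CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
' LANGUAGE sql IMMUTABLE;

CREATE OR REPLACE FUNCTION least_int(integer, integer) RETURNS integer AS '
  SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;
' LANGUAGE sql IMMUTABLE;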

> Honestly, I'm not real up on writing procs, I was hoping the problem
> would be obvious to someone.

Well, there's the general performance tuning stuff of course (postgresql.conf) 
which, if you've not done any of it, will pretty dramatically affect your 
throughput rates.   And vacuum, analyze, indexes, etc.

You should also look at ways to make the SP simpler.  For example, you have a 
cycle that looks like:

SELECT
IF NOT FOUND
INSERT
ELSE
UPDATE

Which could be made shorter as:

UPDATE
IF NOT FOUND
INSERT

... saving you one index scan.
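
As a sketch, inside a plpgsql body (reusing the parameter names from
put_token above, not a drop-in replacement) that shorter cycle looks like:

UPDATE bayes_token
   SET spam_count = spam_count + inspam_count
 WHERE id = inuserid
   AND token = intoken;
IF NOT FOUND THEN
  INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)
  VALUES (inuserid, intoken, inspam_count, inham_count, inatime);
END IF;

Note that two concurrent callers can still race between the UPDATE and the 
INSERT, so the duplicate-key error case still needs handling.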

Also, I don't quite follow it, but the procedure seems to be doing at least 
two steps that the MySQL version isn't doing at all.  If the PG version is 
doing more things, of course it's going to take longer.

Finally, when you have a proc you're happy with, I suggest having an expert 
re-write it in C, which should double the procedure performance.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-27 Thread Matthew Schumacher
Josh Berkus wrote:
> Matt,
> 

> Well, it might be because we don't have a built-in GREATEST or LEAST prior to 
> 8.1.   However, it's pretty darned easy to construct one.

I was more talking about min() and max() but yeah, I think you knew where
I was going with it...

> 
> Well, there's the general performance tuning stuff of course 
> (postgresql.conf) 
> which, if you've not done any of it, will pretty dramatically affect your 
> throughput rates.   And vacuum, analyze, indexes, etc.

I have gone though all that.

> You should also look at ways to make the SP simpler.  For example, you have a 
> cycle that looks like:
> 
> SELECT
>   IF NOT FOUND
>   INSERT
>   ELSE
>   UPDATE
> 
> Which could be made shorter as:
> 
> UPDATE
>   IF NOT FOUND
>   INSERT
> 
> ... saving you one index scan.
> 
> Also, I don't quite follow it, but the procedure seems to be doing at least 
> two steps that the MySQL version isn't doing at all.  If the PG version is 
> doing more things, of course it's going to take longer.
> 
> Finally, when you have a proc you're happy with, I suggest having an expert 
> re-write it in C, which should double the procedure performance.
> 

Sounds like I need to completely understand what the proc is doing and
work on a rewrite.  I'll look into writing it in C, I need to do some
reading about how that works and exactly what it buys you.

Thanks for the helpful comments.

schu



[PERFORM] Left joining against two empty tables makes a query SLOW

2005-07-27 Thread Chris Travers

Hi all;

I have a customer who currently uses an application which has become 
slow.  After doing some digging, I found the slow query:


SELECT c.accno, c.description, c.link, c.category, ac.project_id,
   p.projectnumber, a.department_id, d.description AS department
FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id)
   JOIN ar a ON (a.id = ac.trans_id)
   LEFT JOIN project p ON (ac.project_id = p.id)
   LEFT JOIN department d ON (d.id = a.department_id)
WHERE a.customer_id = 11373 AND a.id IN (
   SELECT max(id) FROM ar WHERE customer_id = 11373
);

(reformatted for readability)

This is taking 10 seconds to run.

Interestingly, both the project and department tables are blank, and if 
I omit them, the query becomes:

SELECT c.accno, c.description, c.link, c.category, ac.project_id
FROM chart c JOIN acc_trans ac ON (ac.chart_id = c.id)
   JOIN ar a ON (a.id = ac.trans_id)
WHERE a.customer_id = 11373 AND a.id IN (
   SELECT max(id) FROM ar WHERE customer_id = 11373
);

This takes 139 ms, about 1% of the previous query's runtime.

The plan for the long query is:

  
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=87337.25..106344.93 rows=41 width=118) (actual time=7615.843..9850.209 rows=10 loops=1)
   Hash Cond: ("outer".trans_id = "inner".max)
   ->  Merge Right Join  (cost=86620.57..100889.85 rows=947598 width=126) (actual time=7408.830..9200.435 rows=177769 loops=1)
         Merge Cond: ("outer".id = "inner".department_id)
         ->  Index Scan using department_id_key on department d  (cost=0.00..52.66 rows=1060 width=36) (actual time=0.090..0.090 rows=0 loops=1)
         ->  Sort  (cost=86620.57..87067.55 rows=178792 width=94) (actual time=7408.709..7925.843 rows=177769 loops=1)
               Sort Key: a.department_id
               ->  Merge Right Join  (cost=45871.18..46952.83 rows=178792 width=94) (actual time=4962.122..6671.319 rows=177769 loops=1)
                     Merge Cond: ("outer".id = "inner".project_id)
                     ->  Index Scan using project_id_key on project p  (cost=0.00..49.80 rows=800 width=36) (actual time=0.007..0.007 rows=0 loops=1)
                     ->  Sort  (cost=45871.18..46318.16 rows=178792 width=62) (actual time=4962.084..5475.636 rows=177769 loops=1)
                           Sort Key: ac.project_id
                           ->  Hash Join  (cost=821.20..13193.43 rows=178792 width=62) (actual time=174.905..4295.685 rows=177769 loops=1)
                                 Hash Cond: ("outer".chart_id = "inner".id)
                                 ->  Hash Join  (cost=817.66..10508.02 rows=178791 width=20) (actual time=173.952..2840.824 rows=177769 loops=1)
                                       Hash Cond: ("outer".trans_id = "inner".id)
                                       ->  Seq Scan on acc_trans ac  (cost=0.00..3304.38 rows=181538 width=12) (actual time=0.062..537.753 rows=181322 loops=1)
                                       ->  Hash  (cost=659.55..659.55 rows=22844 width=8) (actual time=173.625..173.625 rows=0 loops=1)
                                             ->  Seq Scan on ar a  (cost=0.00..659.55 rows=22844 width=8) (actual time=0.022..101.828 rows=22844 loops=1)
                                                   Filter: (customer_id = 11373)
                                 ->  Hash  (cost=3.23..3.23 rows=123 width=50) (actual time=0.915..0.915 rows=0 loops=1)
                                       ->  Seq Scan on chart c  (cost=0.00..3.23 rows=123 width=50) (actual time=0.013..0.528 rows=123 loops=1)
   ->  Hash  (cost=716.67..716.67 rows=1 width=4) (actual time=129.037..129.037 rows=0 loops=1)
         ->  Subquery Scan "IN_subquery"  (cost=716.66..716.67 rows=1 width=4) (actual time=129.017..129.025 rows=1 loops=1)
               ->  Aggregate  (cost=716.66..716.66 rows=1 width=4) (actual time=129.008..129.011 rows=1 loops=1)
                     ->  Seq Scan on ar  (cost=0.00..659.55 rows=22844 width=4) (actual time=0.020..73.266 rows=22844 loops=1)
                           Filter: (customer_id = 11373)
 Total runtime: 9954.133 ms
(28 rows)

The shorter query's plan is:

  
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=728.42..732.96 rows=8 width=50) (actual time=130.908..131.593 rows=10 loops=1)
   Hash Cond: ("outer".id = "inner".chart_id)
   ->  Seq Scan on chart c  (cost=0.00..3.23 rows=123 width=50) (actual time=0.006..0.361 rows=123 loops=1)
   ->  Hash  (cost=728.40..728.40 rows=8 width=8) (actual time=130.841..130.841 row

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0 Bayes module.

2005-07-27 Thread Tom Lane
Matthew Schumacher <[EMAIL PROTECTED]> writes:
> After playing with various indexes and what not I simply am unable to
> make this procedure perform any better.  Perhaps someone on the list can
> spot the bottleneck and reveal why this procedure isn't performing that
> well or ways to make it better.

There's not anything obviously wrong with that procedure --- all of the
updates are on primary keys, so one would expect reasonably efficient
query plans to get chosen.  Perhaps it'd be worth the trouble to build
the server with profiling enabled and get a gprof trace to see where the
time is going.

regards, tom lane



Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-27 Thread Karim Nassar
On Wed, 2005-07-27 at 14:35 -0800, Matthew Schumacher wrote:

> I put the rest of the schema up at
> http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone
> needs to see it too.

Do you have sample data too?

-- 
Karim Nassar
Collaborative Computing Lab of NAU
Office: (928) 523 5868 -=- Mobile: (928) 699 9221
http://ccl.cens.nau.edu/~kan4

