[PERFORM] Implementation of Inheritance in Postgres

2004-07-07 Thread Ioannis Theoharis


I would like to ask where I can find information about the
implementation of the inheritance relationship in Postgres.

There are several ways to store and retrieve the instances
contained in a hierarchy.

Which clustering and buffer-replacement policies does Postgres implement?

There is a system table called pg_inherits, but how is it used during
hierarchy traversal?
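
For reference, pg_inherits simply records (child, parent) OID pairs, and the
planner uses it to expand a query on a parent table into a scan over the parent
plus all of its children. A minimal sketch of listing the direct children of a
parent table (the table name here is only a placeholder):

-- list the direct children of a hypothetical parent table 'parent_tbl'
SELECT c.relname AS child_table
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_class p ON p.oid = i.inhparent
WHERE p.relname = 'parent_tbl';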

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Terrible performance after deleting/recreating indexes

2004-07-07 Thread Bill Chandler
Hi,

Using PostgreSQL 7.4.2 on Solaris.  I'm trying to 
improve performance on some queries to my databases so
I wanted to try out various index structures.  

Since I'm going to be running my performance tests
repeatedly, I created some SQL scripts to delete and 
recreate various index configurations.  One of the
scripts contains the commands for recreating the 
'original' index configuration (i.e. the one I've 
already got some numbers for).  The only thing is, now when I
delete and recreate the original indexes and then run the query,
I find the performance has gone completely down the tubes
compared to what I originally had.  A query that used to take 5 minutes
to complete now takes hours to complete.
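
(A rebuild script of the sort described might look roughly like the sketch
below; the table, column and index names are invented for illustration, since
the actual scripts are not shown.)

-- hypothetical sketch of a drop-and-recreate script for one index configuration
BEGIN;
DROP INDEX tbl_1_x_idx;
DROP INDEX tbl_2_name_idx;
CREATE INDEX tbl_1_x_idx    ON tbl_1 (x);
CREATE INDEX tbl_2_name_idx ON tbl_2 (name);
COMMIT;
-- refresh planner statistics after rebuilding
ANALYZE tbl_1;
ANALYZE tbl_2;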

For what it's worth my query looks something like:

select * from tbl_1, tbl_2 where tbl_1.id = tbl_2.id
and tbl_2.name like 'x%y%' and tbl_1.x > 1234567890123
order by tbl_1.x;

tbl_1 is very big (> 2 million rows)
tbl_2 is relatively small (7000 or so rows)
tbl_1.x is a numeric(13)
tbl_1.id and tbl_2.id are integers
tbl_2.name is a varchar(64)

I've run 'VACUUM ANALYZE' on both tables involved in
the query.  I also used 'EXPLAIN' and observed that
the query plan is completely changed from what it 
was originally.  

Any idea why this would be?  I would have thought 
that a freshly created index would have better 
performance, not worse.  I have not done any inserts
or updates since recreating the indexes.

thanks in advance,

Bill C


---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] inserting into brand new database faster than old database

2004-07-07 Thread Missner, T. R.
Hello,

I have been a happy postgresql developer for a few years now.  Recently
I have discovered a very strange phenomenon in regards to inserting
rows.

My app inserts millions of records a day, averaging about 30 rows a
second. I use autovac to make sure my stats and indexes are up to date.
Rarely are rows ever deleted.  Each day a brand new set of tables is
created and eventually the old tables are dropped. The app calls
functions which, based on some simple logic, perform the correct inserts.


The problem I am seeing is that after a particular database gets kinda
old, say a couple of months, performance begins to degrade.  Even after
creating brand new tables, my insert speed is slow in comparison (by a
factor of 5 or more) with a brand new schema which has the exact
same tables.  I am running on an IBM 360 dual processor Linux server
with a 100 gig raid array spanning 5 scsi disks.  The machine has 1 gig
of ram of which 500 meg is dedicated to Postgresql.

Just to be clear, the question I have is why would a brand new db schema
allow inserts faster than an older schema with brand new tables?  Since
the tables are empty to start, vacuuming should not be an issue at all.
Each schema is identical in every way except the db name and creation
date.

Any ideas are appreciated.

Thanks,

T.R. Missner

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] Forcing HashAggregation prior to index scan?

2004-07-07 Thread Eugene
I have a very simple problem. I run two select statements that are identical
except for a single where condition. The first statement runs in 9 ms, while
the second runs for 4000 ms.

SQL1 - fast, 9 ms:
explain analyse select seq_ac from refseq_sequence S
where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217')

SQL2 - very slow, 4000 ms:
explain analyse select seq_ac from refseq_sequence S
where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217')
AND S.species = 'Homo sapiens'

I think the second statement is slower than the first one because the planner
is not using a HashAggregate. Can I force hash aggregation before the index scan?
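
(There is no setting that forces a HashAggregate outright, but for testing, the
planner can be steered away from the plan it is choosing with the per-session
enable_* parameters; a sketch of such an experiment, not a production fix:)

-- discourage the nested-loop IN join used by the slow plan, then recheck the plan
SET enable_nestloop = off;
explain analyse select seq_ac from refseq_sequence S
where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217')
AND S.species = 'Homo sapiens';
RESET enable_nestloop;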

Here is the full output from EXPLAIN ANALYZE

explain analyse select seq_ac from refseq_sequence S
where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217');
                                   QUERY PLAN
--------------------------------------------------------------------------------------------
 Nested Loop  (cost=169907.83..169919.88 rows=3 width=24) (actual time=1.450..8.707 rows=53 loops=1)
   ->  HashAggregate  (cost=169907.83..169907.83 rows=2 width=19) (actual time=1.192..1.876 rows=53 loops=1)
         ->  Index Scan using refseq_refseq_hits_pkey on refseq_refseq_hits  (cost=0.00..169801.33 rows=42600 width=19) (actual time=0.140..0.894 rows=54 loops=1)
               Index Cond: ((seq_ac1)::text = 'NP_001217'::text)
   ->  Index Scan using refseq_sequence_pkey on refseq_sequence s  (cost=0.00..6.01 rows=1 width=24) (actual time=0.105..0.111 rows=1 loops=53)
         Index Cond: ((s.seq_ac)::text = (outer.seq_ac2)::text)
 Total runtime: 9.110 ms



explain analyse select seq_ac from refseq_sequence S
where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217')
and S.species = 'Homo sapiens';
                                   QUERY PLAN
--------------------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=0.00..4111.66 rows=1 width=24) (actual time=504.176..3857.340 rows=30 loops=1)
   ->  Index Scan using refseq_sequence_key2 on refseq_sequence s  (cost=0.00..1516.06 rows=389 width=24) (actual time=0.352..491.107 rows=27391 loops=1)
         Index Cond: ((species)::text = 'Homo sapiens'::text)
   ->  Index Scan using refseq_refseq_hits_pkey on refseq_refseq_hits  (cost=0.00..858.14 rows=213 width=19) (actual time=0.114..0.114 rows=0 loops=27391)
         Index Cond: (((refseq_refseq_hits.seq_ac1)::text = 'NP_001217'::text) AND ((outer.seq_ac)::text = (refseq_refseq_hits.seq_ac2)::text))
 Total runtime: 3857.636 ms

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Mysterious performance of query because of plsql function in

2004-07-07 Thread CoL
hi,
Peter Alberer wrote:
Hi there,
I have a problem with a query that uses the result of a plsql function in
the where clause:
SELECT
   assignments.assignment_id,
   assignments.package_id AS package_id,
   assignments.title AS title,
   COUNT(*) AS Count
FROM
   assignments INNER JOIN submissions ON
   (assignments.assignment_id=submissions.assignment_id)
WHERE
   package_id=949589 AND
   submission_status(submissions.submission_id)='closed'
GROUP BY
   assignments.assignment_id, assignments.package_id, assignments.title
ORDER BY
   assignments.title;
Postgres seems to execute the function submission_status for every row of
the submissions table (~1500 rows).
What is submission_status, actually?
\df submission_status
Is the function submission_status declared STABLE?
C.
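
(For reference, a 7.3+ declaration marking such a function stable might look
like the sketch below; the argument type, return type and body are assumptions,
since the real function is not shown in the thread.)

-- sketch only: the actual signature and body of submission_status are unknown
CREATE OR REPLACE FUNCTION submission_status(integer) RETURNS text AS '
BEGIN
    -- ...the existing lookup logic, unchanged...
    RETURN ''closed'';
END;
' LANGUAGE plpgsql STABLE;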
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Odd sorting behaviour

2004-07-07 Thread Steinar H. Gunderson
[Please CC me on all replies, I'm not subscribed to this list]

Hi,

I'm trying to find out why one of my queries is so slow -- I'm primarily
using PostgreSQL 7.2 (Debian stable), but I don't really get much better
performance with 7.4 (Debian unstable). My prototype table looks like this:

  CREATE TABLE opinions (
prodid INTEGER NOT NULL,
uid INTEGER NOT NULL,
opinion INTEGER NOT NULL,
PRIMARY KEY ( prodid, uid )
  );

In addition, there are separate indexes on prodid and uid. I've run VACUUM
ANALYZE before all queries, and they are repeatable. (If anybody needs the
data, that could be arranged -- it's not secret or anything :-) ) My query
looks like this:

EXPLAIN ANALYZE
  SELECT o3.prodid, SUM(o3.opinion*o12.correlation) AS total_correlation FROM opinions o3
  RIGHT JOIN (
SELECT o2.uid, SUM(o1.opinion*o2.opinion)/SQRT(count(*)+0.0) AS correlation
FROM opinions o1 LEFT JOIN opinions o2 ON o1.prodid=o2.prodid
WHERE o1.uid=1355
GROUP BY o2.uid
  ) o12 ON o3.uid=o12.uid
  LEFT JOIN (
SELECT o4.prodid, COUNT(*) as num_my_comments
FROM opinions o4
WHERE o4.uid=1355
GROUP BY o4.prodid
  ) nmc ON o3.prodid=nmc.prodid
  WHERE nmc.num_my_comments IS NULL AND o3.opinion<>0 AND o12.correlation<>0
  GROUP BY o3.prodid
  ORDER BY total_correlation desc;

And produces the query plan at

  http://www.samfundet.no/~sesse/queryplan.txt

(The lines were a bit too long to include in an e-mail :-) ) Note that the
o3.opinion<>0 AND o12.correlation<>0 lines are an optimization; I can run
the query fine without them and it will produce the same results, but it
goes slower both in 7.2 and 7.4.

There are a few oddities here:

- The subquery scan o12 phase outputs 1186 rows, yet 83792 are sorted. Where
  do the other ~82000 rows come from? And why would it take ~100ms to sort the
  rows at all? (In earlier tests, this was _one full second_ but somehow that
  seems to have improved, yet without really improving the overall query time.
  shared_buffers is 4096 and sort_mem is 16384, so it should really fit into
  RAM.)
- Why does it use uid_index for an index scan on the table, when it obviously
  has no filter on it (since it returns all the rows)? Furthermore, why would
  this take half a second? (The machine is a 950MHz machine with SCSI disks.)
- Also, the outer sort (the sorting of the 58792 rows from the merge join)
  is slow. :-)
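
(One way to test the sort_mem point above is to raise it for the session and
re-run the query; a sketch, with an arbitrary value:)

-- sort_mem is given in kB in 7.2/7.4
SET sort_mem = 65536;   -- 64 MB for this session only, instead of the configured 16 MB
-- ...re-run the EXPLAIN ANALYZE query above...
RESET sort_mem;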

7.4 isn't really much better:

  http://www.samfundet.no/~sesse/queryplan74.txt

Note that this is run on a machine with almost twice the speed (in terms of
CPU speed, at least). The same oddities are mostly present (such as o12
returning 1186 rows, but 58788 rows are sorted), so I really don't understand
what's going on here. Any ideas on how to improve this?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] finding a max value

2004-07-07 Thread Edoardo Ceccarelli
This is the query:
select max(KA) from annuncio
Field KA is indexed and is int4. Explaining gives:
explain select max(KA) from annuncio;
QUERY PLAN
---
 Aggregate  (cost=21173.70..21173.70 rows=1 width=4)
   ->  Seq Scan on annuncio  (cost=0.00..20326.76 rows=338776 width=4)
(2 rows)
Wasn't it supposed to do an index scan? It takes about 1 sec to get the result.
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] inserting into brand new database faster than old database

2004-07-07 Thread Missner, T. R.
I do have one table that acts as a lookup table and grows in size as the
app runs; however, in the tests I have been doing I have dropped and
recreated all tables, including the lookup table.

I keep wondering how disk is allocated to a particular DB. Also is there
any way I could tell whether the writes to disk are the bottleneck?
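
(A rough per-table and per-index picture of disk use is available from pg_class;
a sketch, keeping in mind that relpages/reltuples are only as fresh as the last
VACUUM or ANALYZE, and that each page is normally 8 kB:)

SELECT relname, relkind, relpages, reltuples
FROM pg_class
WHERE relkind IN ('r', 'i')
ORDER BY relpages DESC
LIMIT 20;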



T.R. Missner
Level(3) Communications
SSID tools
Senior Software Engineer


-Original Message-
From: Matthew T. O'Connor [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 07, 2004 1:17 PM
To: Missner, T. R.
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] inserting into brand new database faster than old
database

I don't think I have enough detail about your app.  A couple of questions:

are there any tables that receive a lot of inserts / updates / deletes 
that are not deleted and recreated often?  If so, one possibility is 
that your FSM settings are not large enough and your table is 
actually growing despite using autovac.  Does that sound possible to
you?
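
(To check the FSM guess, a database-wide VACUUM VERBOSE in 7.3/7.4 should end
with a free-space-map summary; the postgresql.conf values below are placeholders,
not recommendations:)

VACUUM VERBOSE;
-- the tail of the output reports how many FSM pages are needed vs. stored;
-- if more are needed than fit, raise these in postgresql.conf and restart:
--   max_fsm_relations = 1000
--   max_fsm_pages     = 200000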


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] finding a max value

2004-07-07 Thread Rosser Schwarz
On Fri, 02 Jul 2004 20:50:26 +0200, Edoardo Ceccarelli [EMAIL PROTECTED] wrote:

 This is the query:
 select max(KA) from annuncio

 wasn't supposed to do an index scan? it takes about 1sec to get the result.

 TIP 5: Have you checked our extensive FAQ?

I believe this is a FAQ.

See: http://www.postgresql.org/docs/faqs/FAQ.html#4.8

max() is a generic aggregate that the planner cannot map onto an index, so it
scans the whole table; the equivalent ORDER BY ... LIMIT form can use a
backward index scan on KA instead.

Try: select KA from annuncio order by KA desc limit 1;

/rls

-- 
:wq

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] query plan wierdness?

2004-07-07 Thread Joel McGraw

Can someone explain what I'm missing here?   This query does what I
expect--it uses the foo index on the openeddatetime, callstatus,
calltype, callkey fields:

elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999'
order by openeddatetime desc, callstatus desc, calltype desc, callkey desc limit 26;

                                      QUERY PLAN
------------------------------------------------------------------------------------------
 Limit  (cost=0.00..103.76 rows=26 width=297) (actual time=0.07..0.58 rows=26 loops=1)
   ->  Index Scan Backward using foo on call  (cost=0.00..1882805.77 rows=471781 width=297) (actual time=0.06..0.54 rows=27 loops=1)
         Index Cond: ((openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
         Filter: (aspid = '123C'::bpchar)
 Total runtime: 0.66 msec
(5 rows)


However, this query performs a sequence scan on the table, ignoring the
call_idx13 index (the only difference is the addition of the aspid field
in the order by clause):

elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999'
order by aspid, openeddatetime desc, callstatus desc, calltype desc, callkey desc limit 26;

                                      QUERY PLAN
------------------------------------------------------------------------------------------
 Limit  (cost=349379.41..349379.48 rows=26 width=297) (actual time=32943.52..32943.61 rows=26 loops=1)
   ->  Sort  (cost=349379.41..350558.87 rows=471781 width=297) (actual time=32943.52..32943.56 rows=27 loops=1)
         Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
         ->  Seq Scan on call  (cost=0.00..31019.36 rows=471781 width=297) (actual time=1.81..7318.13 rows=461973 loops=1)
               Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
 Total runtime: 39353.86 msec
(6 rows)


Here's the structure of the table in question:


   Table public.call
  Column  |   Type   | Modifiers 
--+--+---
 aspid| character(4) | 
 lastmodifiedtime | timestamp with time zone | 
 moduser  | character(13)| 
 callkey  | character(13)| 
 calltype | text | 
 callqueueid  | text | 
 openeddatetime   | timestamp with time zone | 
 assigneddatetime | timestamp with time zone | 
 closeddatetime   | timestamp with time zone | 
 reopeneddatetime | timestamp with time zone | 
 openedby | text | 
 callstatus   | character(1) | 
 callpriority | text | 
 callreasontext   | text | 
 keyword1 | text | 
 keyword2 | text | 
 callername   | text | 
 custfirstname| text | 
 custlastname | text | 
 custssntin   | character(9) |
 custssnseq   | text | 
 custdbccode  | character(9) | 
 custlongname | text | 
 custtypecode | character(2) | 
 custphone| text | 
 custid   | character(9) | 
 assigneduserid   | character varying(30)| 
 historyitemcount | integer  | 
 callertype   | text | 
 callerphoneext   | text | 
 followupdate | text | 
 hpjobnumber  | character(11)| 
Indexes: call_idx1 unique btree (aspid, callkey),
         call_aspid btree (aspid),
         call_aspid_opendedatetime btree (aspid, openeddatetime),
         call_idx10 btree (aspid, keyword1, openeddatetime, callstatus, calltype, custtypecode, custid, callkey),
         call_idx11 btree (aspid, keyword2, openeddatetime, callstatus, calltype, custtypecode, custid, callkey),
         call_idx12 btree (aspid, custtypecode, custid, openeddatetime, callstatus, calltype, callkey),
         call_idx13 btree (aspid, openeddatetime, callstatus, calltype, callkey),
         call_idx14 btree (aspid, callqueueid, callstatus, callkey),
         call_idx2 btree (aspid, callqueueid, openeddatetime, custtypecode, callstatus, callkey),
         call_idx3 btree (aspid, assigneduserid, openeddatetime, custtypecode, callstatus, callkey),
         call_idx4 btree (aspid, custid, custtypecode, callkey, callstatus),
         call_idx7 

Re: [PERFORM] query plan wierdness?

2004-07-07 Thread Joel McGraw
Well, you're kind of right.  I removed the limit, and now _both_
versions of the query perform a sequence scan!

Oh, I forgot to include in my original post: this is PostgreSQL 7.3.4
(on x86 Linux and sparc Solaris 6)

-Joel

-Original Message-
From: Guido Barosio [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 07, 2004 2:46 PM
To: Joel McGraw
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] query plan wierdness?

The limit is tricking you.
I guess a sequential scan is cheaper than an index scan with the limit
of 26 there.

Am I wrong?

Greets

-- 
---
Guido Barosio
Buenos Aires, Argentina
---


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] query plan wierdness?

2004-07-07 Thread Stephan Szabo
On Wed, 7 Jul 2004, Joel McGraw wrote:

 However, this query performs a sequence scan on the table, ignoring the
 call_idx13 index (the only difference is the addition of the aspid field
 in the order by clause):

 elon2=# explain analyse select * from call where aspid='123C' and
 OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999'
 order by aspid, openeddatetime desc, callstatus desc, calltype desc, callkey desc limit 26;

                                       QUERY PLAN
 ------------------------------------------------------------------------------------------
  Limit  (cost=349379.41..349379.48 rows=26 width=297) (actual time=32943.52..32943.61 rows=26 loops=1)
    ->  Sort  (cost=349379.41..350558.87 rows=471781 width=297) (actual time=32943.52..32943.56 rows=27 loops=1)
          Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
          ->  Seq Scan on call  (cost=0.00..31019.36 rows=471781 width=297) (actual time=1.81..7318.13 rows=461973 loops=1)
                Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
  Total runtime: 39353.86 msec
 (6 rows)


Hmm, what does it say after a set enable_seqscan=off?

Also, what does it say if you use aspid desc rather than just aspid in the
order by?
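
(For reference, those two experiments amount to something like the sketch below,
built from the query quoted above; a uniformly descending ORDER BY is an ordering
that a backward scan of call_idx13 can produce without a separate sort step.)

-- experiment 1: see what plan is chosen when sequential scans are discouraged
set enable_seqscan = off;
explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999'
order by aspid, openeddatetime desc, callstatus desc, calltype desc, callkey desc limit 26;
set enable_seqscan = on;

-- experiment 2: make every ORDER BY column descending, matching call_idx13
-- (aspid, openeddatetime, callstatus, calltype, callkey) scanned backward
explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999'
order by aspid desc, openeddatetime desc, callstatus desc, calltype desc, callkey desc limit 26;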

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster