[PERFORM] Implementation of Inheritance in Postgres
I would like to ask where I can find information about the implementation of the inheritance relationship in Postgres. There are several ways to store and retrieve the instances contained in a hierarchy. Which clustering and buffer replacement policies does Postgres implement? There is a system table called pg_inherits, but how is it used when traversing hierarchies? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
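For experimenting with this yourself: pg_inherits stores one row per direct parent/child link, keyed by pg_class OIDs. A minimal sketch (the cities/capitals tables are illustrative, not from the post):

```sql
-- Hypothetical hierarchy: "cities" with a child table "capitals".
CREATE TABLE cities (name text, population int);
CREATE TABLE capitals (country text) INHERITS (cities);

-- Each direct parent/child link is one row in pg_inherits;
-- join against pg_class to resolve the OIDs to names.
SELECT child.relname  AS child_table,
       parent.relname AS parent_table,
       i.inhseqno     -- position among the child's parents (multiple inheritance)
FROM pg_inherits i
JOIN pg_class child  ON child.oid  = i.inhrelid
JOIN pg_class parent ON parent.oid = i.inhparent;
```

A scan of the parent (SELECT * FROM cities) also visits capitals; SELECT * FROM ONLY cities restricts the scan to the parent alone.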
[PERFORM] Terrible performance after deleting/recreating indexes
Hi, Using PostgreSQL 7.4.2 on Solaris. I'm trying to improve performance on some queries to my databases, so I wanted to try out various index structures. Since I'm going to be running my performance tests repeatedly, I created some SQL scripts to delete and recreate various index configurations. One of the scripts contains the commands for recreating the 'original' index configuration (i.e. the one I've already got some numbers for). Only thing is, now when I delete and recreate the original indexes and then run the query, I'm finding the performance has gone completely down the tubes compared to what I originally had. A query that used to take 5 minutes to complete now takes hours to complete. For what it's worth, my query looks something like:

select * from tbl_1, tbl_2
where tbl_1.id = tbl_2.id
  and tbl_2.name like 'x%y%'
  and tbl_1.x > 1234567890123
order by tbl_1.x;

tbl_1 is very big (> 2 million rows); tbl_2 is relatively small (7000 or so rows). tbl_1.x is a numeric(13); tbl_1.id and tbl_2.id are integers; tbl_2.name is a varchar(64). I've run 'VACUUM ANALYZE' on both tables involved in the query. I also used 'EXPLAIN' and observed that the query plan is completely changed from what it was originally. Any idea why this would be? I would have thought that a freshly created index would have better performance, not worse. I have not done any inserts or updates since recreating the indexes. thanks in advance, Bill C
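One sanity check worth running is to compare the recreated index definitions against the originals, and to refresh statistics afterwards (a sketch using the table names from the post above):

```sql
-- pg_indexes shows the reconstructed CREATE INDEX statement for each index.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('tbl_1', 'tbl_2')
ORDER BY tablename, indexname;

-- Refresh planner statistics after recreating the indexes.
ANALYZE tbl_1;
ANALYZE tbl_2;
```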
[PERFORM] inserting into brand new database faster than old database
Hello, I have been a happy postgresql developer for a few years now. Recently I have discovered a very strange phenomenon in regards to inserting rows. My app inserts millions of records a day, averaging about 30 rows a second. I use autovac to make sure my stats and indexes are up to date. Rarely are rows ever deleted. Each day a brand new set of tables is created and eventually the old tables are dropped. The app calls functions which, based on some simple logic, perform the correct inserts. The problem I am seeing is that after a particular database gets kinda old, say a couple of months, performance begins to degrade. Even after creating brand new tables, my insert speed is slow in comparison (by a factor of 5 or more) with a brand new schema which has the exact same tables. I am running on an IBM 360 dual processor Linux server with a 100 gig raid array spanning 5 scsi disks. The machine has 1 gig of ram, of which 500 meg is dedicated to Postgresql. Just to be clear, the question I have is: why would a brand new db schema allow inserts faster than an older schema with brand new tables? Since the tables are empty to start, vacuuming should not be an issue at all. Each schema is identical in every way except the db name and creation date. Any ideas are appreciated. Thanks, T.R. Missner
[PERFORM] Forcing HashAggregation prior to index scan?
I have a very simple problem. I run two select statements; they are identical except for a single where condition. The first select statement runs in 9 ms, while the second statement runs for 4000 ms.

SQL1 - fast (9 ms):
explain analyse select seq_ac from refseq_sequence S where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217')

SQL2 - very slow (4000 ms):
explain analyse select seq_ac from refseq_sequence S where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217') AND S.species = 'Homo sapiens'

I think the second sql statement is slower than the first one because the planner is not using HashAggregate. Can I force HashAggregation before the index scan? Here is the full output from EXPLAIN ANALYZE:

explain analyse select seq_ac from refseq_sequence S where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217');
QUERY PLAN
---
Nested Loop (cost=169907.83..169919.88 rows=3 width=24) (actual time=1.450..8.707 rows=53 loops=1)
  -> HashAggregate (cost=169907.83..169907.83 rows=2 width=19) (actual time=1.192..1.876 rows=53 loops=1)
       -> Index Scan using refseq_refseq_hits_pkey on refseq_refseq_hits (cost=0.00..169801.33 rows=42600 width=19) (actual time=0.140..0.894 rows=54 loops=1)
            Index Cond: ((seq_ac1)::text = 'NP_001217'::text)
  -> Index Scan using refseq_sequence_pkey on refseq_sequence s (cost=0.00..6.01 rows=1 width=24) (actual time=0.105..0.111 rows=1 loops=53)
       Index Cond: ((s.seq_ac)::text = ("outer".seq_ac2)::text)
Total runtime: 9.110 ms

explain analyse select seq_ac from refseq_sequence S where seq_ac in (select seq_ac2 from refseq_refseq_hits where seq_ac1 = 'NP_001217') and S.species = 'Homo sapiens';
QUERY PLAN
---
Nested Loop IN Join (cost=0.00..4111.66 rows=1 width=24) (actual time=504.176..3857.340 rows=30 loops=1)
  -> Index Scan using refseq_sequence_key2 on refseq_sequence s (cost=0.00..1516.06 rows=389 width=24) (actual time=0.352..491.107 rows=27391 loops=1)
       Index Cond: ((species)::text = 'Homo sapiens'::text)
  -> Index Scan using refseq_refseq_hits_pkey on refseq_refseq_hits (cost=0.00..858.14 rows=213 width=19) (actual time=0.114..0.114 rows=0 loops=27391)
       Index Cond: (((refseq_refseq_hits.seq_ac1)::text = 'NP_001217'::text) AND (("outer".seq_ac)::text = (refseq_refseq_hits.seq_ac2)::text))
Total runtime: 3857.636 ms
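There is no planner knob that forces a HashAggregate directly in 7.4, but one can often steer the plan by rewriting the IN into a derived table that must be aggregated first. A sketch under that assumption, using the same tables as above:

```sql
-- Aggregate the inner query explicitly, then join: the planner now has to
-- build the distinct hit list before it touches refseq_sequence.
SELECT s.seq_ac
FROM refseq_sequence s
JOIN (SELECT DISTINCT seq_ac2
      FROM refseq_refseq_hits
      WHERE seq_ac1 = 'NP_001217') h ON s.seq_ac = h.seq_ac2
WHERE s.species = 'Homo sapiens';
```

Whether this wins depends on how selective the species filter is relative to the hit list; comparing EXPLAIN ANALYZE output for both forms would settle it.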
Re: [PERFORM] Mysterious performance of query because of plsql function in
hi, Peter Alberer wrote: Hi there, i have a problem with a query that uses the result of a plsql function in the where clause:

SELECT assignments.assignment_id,
       assignments.package_id AS package_id,
       assignments.title AS title,
       COUNT(*) AS Count
FROM assignments INNER JOIN submissions
  ON (assignments.assignment_id = submissions.assignment_id)
WHERE package_id = 949589
  AND submission_status(submissions.submission_id) = 'closed'
GROUP BY assignments.assignment_id, assignments.package_id, assignments.title
ORDER BY assignments.title;

Postgres seems to execute the function submission_status for every row of the submissions table (~1500 rows). What is submission_status, actually? \df submission_status Is the function submission_status declared STABLE? C.
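For reference, volatility is declared when the function is created. A sketch of what a STABLE declaration looks like in the 7.x syntax (the body here is a hypothetical stand-in, not the poster's actual function):

```sql
CREATE OR REPLACE FUNCTION submission_status(integer) RETURNS text AS '
    -- hypothetical body: look up the current status of one submission
    SELECT state FROM submission_states WHERE submission_id = $1;
' LANGUAGE sql STABLE;
```

The default is VOLATILE, which forces re-evaluation on every row; STABLE promises the same result for the same argument within a single statement, which gives the planner more freedom.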
[PERFORM] Odd sorting behaviour
[Please CC me on all replies, I'm not subscribed to this list] Hi, I'm trying to find out why one of my queries is so slow -- I'm primarily using PostgreSQL 7.2 (Debian stable), but I don't really get much better performance with 7.4 (Debian unstable). My prototype table looks like this:

CREATE TABLE opinions (
    prodid INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    opinion INTEGER NOT NULL,
    PRIMARY KEY ( prodid, uid )
);

In addition, there are separate indexes on prodid and uid. I've run VACUUM ANALYZE before all queries, and they are repeatable. (If anybody needs the data, that could be arranged -- it's not secret or anything :-) ) My query looks like this:

EXPLAIN ANALYZE
SELECT o3.prodid, SUM(o3.opinion*o12.correlation) AS total_correlation
FROM opinions o3
RIGHT JOIN (
    SELECT o2.uid, SUM(o1.opinion*o2.opinion)/SQRT(count(*)+0.0) AS correlation
    FROM opinions o1
    LEFT JOIN opinions o2 ON o1.prodid=o2.prodid
    WHERE o1.uid=1355
    GROUP BY o2.uid
) o12 ON o3.uid=o12.uid
LEFT JOIN (
    SELECT o4.prodid, COUNT(*) AS num_my_comments
    FROM opinions o4
    WHERE o4.uid=1355
    GROUP BY o4.prodid
) nmc ON o3.prodid=nmc.prodid
WHERE nmc.num_my_comments IS NULL
  AND o3.opinion <> 0
  AND o12.correlation <> 0
GROUP BY o3.prodid
ORDER BY total_correlation desc;

And it produces the query plan at http://www.samfundet.no/~sesse/queryplan.txt (the lines were a bit too long to include in an e-mail :-) ). Note that the o3.opinion <> 0 AND o12.correlation <> 0 lines are an optimization; I can run the query fine without them and it will produce the same results, but it goes slower in both 7.2 and 7.4. There are a few oddities here:

- The subquery scan o12 phase outputs 1186 rows, yet 83792 are sorted. Where do the other ~82000 rows come from? And why would it take ~100ms to sort the rows at all? (In earlier tests, this was _one full second_, but somehow that seems to have improved, yet without really improving the overall query time. shared_buffers is 4096 and sort_mem is 16384, so it should really fit into RAM.)
- Why does it use uid_index for an index scan on the table, when it obviously has no filter on it (since it returns all the rows)? Furthermore, why would this take half a second? (The machine is a 950MHz machine with SCSI disks.)
- Also, the outer sort (the sorting of the 58792 rows from the merge join) is slow. :-)

7.4 isn't really much better: http://www.samfundet.no/~sesse/queryplan74.txt Note that this is run on a machine with almost twice the speed (in terms of CPU speed, at least). The same oddities are mostly present (such as o12 returning 1186 rows, but 58788 rows being sorted), so I really don't understand what's going on here. Any ideas on how to improve this? /* Steinar */ -- Homepage: http://www.sesse.net/
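For isolating the sort cost, the sort memory can be raised for a single session (a sketch, assuming the sort itself is the bottleneck; the value is arbitrary):

```sql
-- sort_mem is in kilobytes in 7.x and can be set per session;
-- raising it gives the ~84000-row sort more room to stay in memory.
SET sort_mem = 65536;
-- ...rerun the EXPLAIN ANALYZE from above here...
RESET sort_mem;
```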
[PERFORM] finding a max value
This is the query: select max(KA) from annuncio. The field KA is indexed and is int4. Explaining gives:

explain select max(KA) from annuncio;
QUERY PLAN
---
Aggregate (cost=21173.70..21173.70 rows=1 width=4)
  -> Seq Scan on annuncio (cost=0.00..20326.76 rows=338776 width=4)
(2 rows)

Wasn't it supposed to do an index scan? It takes about 1 sec to get the result.
Re: [PERFORM] inserting into brand new database faster than old database
I do have one table that acts as a lookup table and grows in size as the app runs, however in the tests I have been doing I have dropped and recreated all tables, including the lookup table. I keep wondering how disk is allocated to a particular DB. Also, is there any way I could tell whether the writes to disk are the bottleneck? T.R. Missner Level(3) Communications SSID tools Senior Software Engineer -----Original Message----- From: Matthew T. O'Connor [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 07, 2004 1:17 PM To: Missner, T. R. Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] inserting into brand new database faster than old database I don't think I have enough detail about your app. Couple of questions: are there any tables that receive a lot of inserts / updates / deletes that are not deleted and recreated often? If so, one possibility is that you don't have large enough FSM settings and your table is actually growing despite using autovac. Does that sound possible to you? Missner, T. R. wrote: Hello, I have been a happy postgresql developer for a few years now. Recently I have discovered a very strange phenomenon in regards to inserting rows. My app inserts millions of records a day, averaging about 30 rows a second. I use autovac to make sure my stats and indexes are up to date. Rarely are rows ever deleted. Each day a brand new set of tables is created and eventually the old tables are dropped. The app calls functions which based on some simple logic perform the correct inserts. The problem I am seeing is that after a particular database gets kinda old, say a couple of months, performance begins to degrade. Even after creating brand new tables my insert speed is slow in comparison ( by a magnitude of 5 or more ) with a brand new schema which has the exact same tables. I am running on an IBM 360 dual processor Linux server with a 100 gig raid array spanning 5 scsi disks. The machine has 1 gig of ram of which 500 meg is dedicated to Postgresql.
Just to be clear, the question I have is why would a brand new db schema allow inserts faster than an older schema with brand new tables? Since the tables are empty to start, vacuuming should not be an issue at all. Each schema is identical in every way except the db name and creation date. Any ideas are appreciated. Thanks, T.R. Missner
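One way to check whether that surviving lookup table has physically grown despite autovac (a sketch; 'my_lookup_table' stands in for the real name):

```sql
-- relpages/reltuples are refreshed by VACUUM and ANALYZE; a large page
-- count for relatively few live tuples suggests dead space the free
-- space map could not recycle, i.e. the FSM settings may be too small.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'my_lookup_table';
```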
Re: [PERFORM] finding a max value
On Fri, 02 Jul 2004 20:50:26 +0200, Edoardo Ceccarelli [EMAIL PROTECTED] wrote: This is the query: select max(KA) from annuncio -- wasn't it supposed to do an index scan? it takes about 1sec to get the result. I believe this is a FAQ. See: http://www.postgresql.org/docs/faqs/FAQ.html#4.8 Try: select KA from annuncio order by KA desc limit 1; /rls -- :wq
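One caveat to this rewrite, assuming KA is nullable: max() ignores NULLs, while ORDER BY KA DESC sorts NULLs first in Postgres, so the two forms can disagree unless NULLs are filtered out:

```sql
-- Equivalent to max(KA) even when KA contains NULLs,
-- and still answerable by a backward index scan.
SELECT KA FROM annuncio
WHERE KA IS NOT NULL
ORDER BY KA DESC
LIMIT 1;
```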
[PERFORM] query plan weirdness?
Can someone explain what I'm missing here? This query does what I expect -- it uses the foo index on the openeddatetime, callstatus, calltype, callkey fields:

elon2=# explain analyse select * from call where aspid='123C' and OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999' order by openeddatetime desc, callstatus desc, calltype desc, callkey desc limit 26;
QUERY PLAN
-
Limit (cost=0.00..103.76 rows=26 width=297) (actual time=0.07..0.58 rows=26 loops=1)
  -> Index Scan Backward using foo on call (cost=0.00..1882805.77 rows=471781 width=297) (actual time=0.06..0.54 rows=27 loops=1)
       Index Cond: ((openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
       Filter: (aspid = '123C'::bpchar)
Total runtime: 0.66 msec
(5 rows)

However, this query performs a sequential scan on the table, ignoring the call_idx13 index (the only difference is the addition of the aspid field in the order by clause):

elon2=# explain analyse select * from call where aspid='123C' and OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc, calltype desc, callkey desc limit 26;
QUERY PLAN
-
Limit (cost=349379.41..349379.48 rows=26 width=297) (actual time=32943.52..32943.61 rows=26 loops=1)
  -> Sort (cost=349379.41..350558.87 rows=471781 width=297) (actual time=32943.52..32943.56 rows=27 loops=1)
       Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
       -> Seq Scan on call (cost=0.00..31019.36 rows=471781 width=297) (actual time=1.81..7318.13 rows=461973 loops=1)
            Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
Total runtime: 39353.86 msec
(6 rows)

Here's the structure of the table in question:

Table public.call
Column           | Type                     | Modifiers
-----------------+--------------------------+-----------
aspid            | character(4)             |
lastmodifiedtime | timestamp with time zone |
moduser          | character(13)            |
callkey          | character(13)            |
calltype         | text                     |
callqueueid      | text                     |
openeddatetime   | timestamp with time zone |
assigneddatetime | timestamp with time zone |
closeddatetime   | timestamp with time zone |
reopeneddatetime | timestamp with time zone |
openedby         | text                     |
callstatus       | character(1)             |
callpriority     | text                     |
callreasontext   | text                     |
keyword1         | text                     |
keyword2         | text                     |
callername       | text                     |
custfirstname    | text                     |
custlastname     | text                     |
custssntin       | character(9)             |
custssnseq       | text                     |
custdbccode      | character(9)             |
custlongname     | text                     |
custtypecode     | character(2)             |
custphone        | text                     |
custid           | character(9)             |
assigneduserid   | character varying(30)    |
historyitemcount | integer                  |
callertype       | text                     |
callerphoneext   | text                     |
followupdate     | text                     |
hpjobnumber      | character(11)            |
Indexes: call_idx1 unique btree (aspid, callkey),
    call_aspid btree (aspid),
    call_aspid_opendedatetime btree (aspid, openeddatetime),
    call_idx10 btree (aspid, keyword1, openeddatetime, callstatus, calltype, custtypecode, custid, callkey),
    call_idx11 btree (aspid, keyword2, openeddatetime, callstatus, calltype, custtypecode, custid, callkey),
    call_idx12 btree (aspid, custtypecode, custid, openeddatetime, callstatus, calltype, callkey),
    call_idx13 btree (aspid, openeddatetime, callstatus, calltype, callkey),
    call_idx14 btree (aspid, callqueueid, callstatus, callkey),
    call_idx2 btree (aspid, callqueueid, openeddatetime, custtypecode, callstatus, callkey),
    call_idx3 btree (aspid, assigneduserid, openeddatetime, custtypecode, callstatus, callkey),
    call_idx4 btree (aspid, custid, custtypecode, callkey, callstatus),
    call_idx7
Re: [PERFORM] query plan weirdness?
Well, you're kind of right. I removed the limit, and now _both_ versions of the query perform a sequential scan! Oh, I forgot to include in my original post: this is PostgreSQL 7.3.4 (on x86 Linux and sparc Solaris 6). -Joel -----Original Message----- From: Guido Barosio [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 07, 2004 2:46 PM To: Joel McGraw Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] query plan weirdness? The limit is tricking you. I guess a sequential scan is cheaper than an index scan with the limit 26 found there. Am I wrong? Greets -- Guido Barosio Buenos Aires, Argentina
Re: [PERFORM] query plan weirdness?
On Wed, 7 Jul 2004, Joel McGraw wrote: However, this query performs a sequential scan on the table, ignoring the call_idx13 index (the only difference is the addition of the aspid field in the order by clause):

elon2=# explain analyse select * from call where aspid='123C' and OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc, calltype desc, callkey desc limit 26;
QUERY PLAN
Limit (cost=349379.41..349379.48 rows=26 width=297) (actual time=32943.52..32943.61 rows=26 loops=1)
  -> Sort (cost=349379.41..350558.87 rows=471781 width=297) (actual time=32943.52..32943.56 rows=27 loops=1)
       Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
       -> Seq Scan on call (cost=0.00..31019.36 rows=471781 width=297) (actual time=1.81..7318.13 rows=461973 loops=1)
            Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
Total runtime: 39353.86 msec
(6 rows)

Hmm, what does it say after a set enable_seqscan=off? Also, what does it say if you use aspid desc rather than just aspid in the order by?
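Both suggestions can be tried in one psql session (a sketch; enable_seqscan=off is a diagnostic tool rather than a production setting, and the DESC on aspid is the point: with it, every sort key matches a backward scan of call_idx13):

```sql
SET enable_seqscan = off;  -- make sequential scans look prohibitively expensive
EXPLAIN ANALYZE
SELECT * FROM call
WHERE aspid = '123C'
  AND openeddatetime BETWEEN '2000-01-01 00:00:00.0' AND '2004-06-24 23:59:59.999'
ORDER BY aspid DESC, openeddatetime DESC, callstatus DESC,
         calltype DESC, callkey DESC
LIMIT 26;
SET enable_seqscan = on;
```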