Re: [PERFORM] Slow loads when indexes added.
[EMAIL PROTECTED] mentioned:
> Try ANALYZE after loading the referenced tables, but before loading the main table. I attached a new script for creating the load file...

ANALYZE didn't help; it actually took longer to load. I set autocommit to off and put a COMMIT after every 100 inserts, chattr'd the noatime attribute recursively on PGDATA, and set fsync to off. This improved the time from 3 min 51 sec to 2 min 37 sec for the slow scenario. But I was already doing all of these things in the app that used to take 40 minutes and now takes four hours to load. Any other suggestions?

Kind Regards
Stefan

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
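One pattern often suggested for loads like this is to drop secondary indexes and FK constraints before the bulk load and recreate them afterwards, so each insert doesn't pay index-maintenance and FK-check costs. A hedged sketch; the table, index, and constraint names below are placeholders, not from the thread:

```sql
-- Hypothetical names: maintable, reftable, ix_maintable_ref, maintable_ref_fkey.
BEGIN;
DROP INDEX ix_maintable_ref;                       -- repeat per secondary index
ALTER TABLE maintable DROP CONSTRAINT maintable_ref_fkey;

COPY maintable FROM '/tmp/load_file.txt';          -- bulk load in one command

CREATE INDEX ix_maintable_ref ON maintable (ref);  -- rebuild in a single pass
ALTER TABLE maintable
    ADD CONSTRAINT maintable_ref_fkey FOREIGN KEY (ref) REFERENCES reftable;
ANALYZE maintable;
COMMIT;
```

Rebuilding an index once over the loaded data is generally much cheaper than updating it row by row during the load.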
Re: [PERFORM] cpu_tuple_cost
> > The "this day and age" argument isn't very convincing. Hard drive capacity growth has far outstripped hard drive seek time and bandwidth improvements. Random access has more penalty than ever.
>
> In point of fact, there haven't been noticeable seek time improvements for years. Transfer rates, on the other hand, have gone through the roof.

Er, yeah. I stated it wrong. The real ratio here is between seek time and throughput. Typical 7200 RPM drives have average seek times in the area of 10 ms. Typical sustained transfer rates are in the range of 40 MB/s. Postgres reads 8 kB blocks at a time. So: 800 kB/s for random access reads, and 40 MB/s for sequential reads. That's a factor of 49. I don't think anyone wants random_page_cost to be set to 50, though.

For a high-end 15k drive I see average seek times as low as 3 ms, and sustained transfer rates as high as 100 MB/s. So about 2.7 MB/s for random access reads, or a random_page_cost of about 37. Still pretty extreme. So what's going on with the empirically derived value of 4? Perhaps it's because even though Postgres is reading an entire table sequentially, it's unlikely to be the only I/O consumer? The sequential reads would occasionally be interleaved by some other I/O, forcing a seek to continue.

What about the cache memory on the disk? Even IDE disks have some 8 MB of cache today, which makes a lot of difference for fairly short scans, even if it's just read cache. That'll bring the speed of random access down to a 1:1 relationship with sequential access, assuming it all fits in the cache.

//Magnus
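The arithmetic above can be checked directly from the stated figures (10 ms average seek, one 8 kB block per random read, 40 MB/s sequential throughput):

```sql
-- One random 8 kB block per 10 ms seek vs. a 40 MB/s sequential stream.
SELECT
    8192 / 0.010        AS random_bytes_per_sec,  -- ~819 kB/s
    40e6 * 0.010 / 8192 AS seq_to_random_ratio;   -- ~48.8, the "factor of 49"
```

The same formula with 3 ms and 100 MB/s gives 8192/0.003 ≈ 2.7 MB/s and a ratio of about 37, matching the 15k-drive numbers in the post.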
Re: [PERFORM] Performance problem on delete from for 10k rows. May
Hi All,

I reran the example with the debug info turned on in PostgreSQL. As you can see, all dependent tables (that have a foreign key on table IC) are emptied before the DELETE FROM IC statement is issued. From what I understand, the performance problem seems to come from those selects that point back to IC (LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x). There are 6 of them, and I don't know where they are coming from. But if I want to delete the content of the table (~10k rows), it may take long because of those 6 selects for each deleted row. Why are those selects there? Are they really run for each deleted row?

I'm running version 7.4.5 on Cygwin. I ran the same delete from pgAdminIII and I got 945562 ms for all the deletes within the same transaction (so I was wrong saying it took less time in pgAdminIII... sorry about this). Do you have any idea why those 6 selects are there? Maybe I can drop indexes before deleting the content of the table. I didn't plan to, because the tables are quite small and it's more complicated in my environment. And tell me if I'm wrong, but if I drop indexes, do I have to reload all my stored procedures (to reset the planner-related info)? I remember having read that somewhere...
(was it in the PostgreSQL General Bits newsletter? ...anyway) Thanks for your help, I really appreciate it :-)
/David

LOG: duration: 144.000 ms
LOG: statement: DELETE FROM YN
LOG: duration: 30.000 ms
LOG: statement: DELETE FROM YO
LOG: statement: SELECT 1 FROM ONLY public.yo x WHERE yotype = $1 AND yonum = $2 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.yn x WHERE ynyotype = $1 AND ynyonum = $2 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.yo x WHERE yotype = $1 AND yonum = $2 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.yr x WHERE yryotype = $1 AND yryonum = $2 FOR UPDATE OF x
LOG: duration: 83.000 ms
LOG: connection received: host=127.0.0.1 port=2196
LOG: connection authorized: user=admin database=webCatalog
LOG: statement: set datestyle to 'ISO'; select version(), case when pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN' else getdatabaseencoding() end;
LOG: duration: 2.000 ms
LOG: statement: set client_encoding = 'UNICODE'
LOG: duration: 0.000 ms
LOG: statement: DELETE FROM IY
LOG: duration: 71.000 ms
LOG: statement: DELETE FROM IA
LOG: duration: 17.000 ms
LOG: statement: DELETE FROM IQ
LOG: duration: 384.000 ms
LOG: statement: DELETE FROM IC
LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.iq x WHERE iqicnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.ia x WHERE iaicnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.iy x WHERE iyicnumo = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.iy x WHERE iyicnumr = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.il x WHERE ilicnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x
LOG: statement: SELECT 1 FROM ONLY public.bd x WHERE bdicnum = $1 FOR UPDATE OF x
LOG: duration: 656807.000 ms

Michael Fuhr wrote:
> ---
> DELETE FROM BM; DELETE FROM BD; DELETE FROM BO; DELETE FROM IL; DELETE FROM YR; DELETE FROM YN; DELETE FROM YO; DELETE FROM IY; DELETE FROM IA; DELETE FROM IQ; DELETE FROM IC;

Michael Fuhr wrote:
> On Tue, Mar 15, 2005 at 04:24:17PM -0500, David Gagnon wrote:
> > I get this strange problem when deleting rows from a Java program. Sometimes (from what I noticed, it's not all the time) the server takes almost forever to delete rows from a table.
>
> Do other tables have foreign key references to the table you're deleting from? If so, are there indexes on the foreign key columns? Do you have triggers or rules on the table? Have you queried pg_locks during the long-lasting deletes to see if the deleting transaction is waiting for a lock on something?
>
> > I remember having tried to delete the content of my table (IC) from PgAdminIII and it took a couple of seconds!!! Not minutes.
>
> How many records did you delete in this case? If there are foreign key references, how many records were in the referencing tables? How repeatable is the disparity in delete time? A single test case might have been done under different conditions, so it might not mean much. No offense intended, but "I remember" doesn't carry as much weight as a documented example.
Re: [PERFORM] Performance problem on delete from for 10k rows. May
David Gagnon wrote:
> Hi All,
> I reran the example with the debug info turned on in PostgreSQL. As you can see, all dependent tables (that have a foreign key on table IC) are emptied before the DELETE FROM IC statement is issued. From what I understand, the performance problem seems to come from those selects that point back to IC (LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x). There are 6 of them, and I don't know where they are coming from. But if I want to delete the content of the table (~10k rows), it may take long because of those 6 selects for each deleted row. Why are those selects there? Are they really run for each deleted row?

You are using Hibernate. Hibernate is generating them to lock the tables.

> I'm running version 7.4.5 on Cygwin. I ran the same delete from pgAdminIII and I got 945562 ms for all the deletes within the same transaction (so I was wrong saying it took less time in pgAdminIII... sorry about this). Do you have any idea why those 6 selects are there?

Hibernate.

> Maybe I can drop indexes before deleting the content of the table. I didn't plan to, because the tables are quite small and it's more complicated in my environment. And tell me if I'm wrong, but if I drop indexes, do I have to reload all my stored procedures (to reset the planner-related info)? I remember having read that somewhere...
Re: [PERFORM] Performance problem on delete from for 10k rows. May
Hi,

> > I reran the example with the debug info turned on in PostgreSQL. As you can see, all dependent tables (that have a foreign key on table IC) are emptied before the DELETE FROM IC statement is issued. From what I understand, the performance problem seems to come from those selects that point back to IC (LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x). There are 6 of them, and I don't know where they are coming from.
>
> I think they come from the FK checking code. Try to run a VACUUM on the IC table just before you delete from the other tables; that should make the checking almost instantaneous (assuming the vacuuming actually empties the table, which would depend on other transactions).

I'll try a vacuum before I start the delete to see if it changes something. There is probably a good reason, but I don't understand why a foreign key check needs to check the data it points to. You delete a row from table IC and do a check for integrity on the tables that have foreign keys on IC (makes sense). But why check back IC? I'm pretty sure there is a good reason, but it seems to have a big performance impact... in this case, it means it's not really feasible to empty the content of a schema. The table has only 10k rows... with a huge table it wouldn't be feasible just because of the checks on the table itself! Can someone explain why there is this extra check? Can it be fixed or improved?
Thanks for your help
/David

It would be better to be able to use TRUNCATE to do this, but in 8.0 you can't if the tables have FKs. 8.1 is better in that regard ...
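For reference, the 8.1 improvement mentioned above: TRUNCATE accepts a list of tables, so tables linked by foreign keys can be emptied together as long as every FK-linked table appears in the same list. A sketch using the table names from the log (requires 8.1+):

```sql
-- 8.1+: all mutually FK-referencing tables must be in one TRUNCATE list.
TRUNCATE bm, bd, bo, il, yr, yn, yo, iy, ia, iq, ic;
```

Unlike DELETE, TRUNCATE reclaims the space immediately and fires no per-row FK checks, which is exactly what makes it attractive for resetting a schema.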
[PERFORM] multi-column index
Hello. I have a problem concerning multi-column indexes. I have a table containing some 250k lines.

Table "public.descriptionprodftdiclnk"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 idword      | integer | not null
 idqualifier | integer | not null
Indexes:
    "descriptionprodftdiclnk_pkey" primary key, btree (idword, idqualifier)
    "ix_descriptionprodftdiclnk_idqualif" btree (idqualifier)

When analyzing a simple query on the idword column, the query planner displays:

explain analyze select * from descriptionprodftdiclnk where idword=44;
 Seq Scan on descriptionprodftdiclnk  (cost=0.00..4788.14 rows=44388 width=8) (actual time=87.582..168.041 rows=43792 loops=1)
   Filter: (idword = 44)
 Total runtime: 195.339 ms
(3 rows)

I don't understand why the query planner would not use the default multi-column index created for the primary key. According to the PostgreSQL online documentation, it should. By setting the enable_seqscan parameter to off, I can force the planner to use the index:

explain analyze select * from descriptionprodftdiclnk where idword=44;
 Index Scan using descriptionprodftdiclnk_pkey on descriptionprodftdiclnk  (cost=0.00..36720.39 rows=44388 width=8) (actual time=0.205..73.489 rows=43792 loops=1)
   Index Cond: (idword = 44)
 Total runtime: 100.564 ms
(3 rows)

On the other hand, with a new index defined on the idword column (and enable_seqscan set back to on), the query uses the index:

create index ix_tempIndex on descriptionprodftdiclnk(idword);
CREATE INDEX
explain analyze select * from descriptionprodftdiclnk where idword=44;
 Index Scan using ix_tempindex on descriptionprodftdiclnk  (cost=0.00..916.24 rows=44388 width=8) (actual time=0.021..79.879 rows=43792 loops=1)
   Index Cond: (idword = 44)
 Total runtime: 107.081 ms
(3 rows)

Could someone provide an explanation for the planner's behaviour?

Thanks for your help,
Daniel
Re: [PERFORM] Performance problem on delete from for 10k rows. May
On Wed, 16 Mar 2005, David Gagnon wrote:

> > > From what I understand, the performance problem seems to come from those selects that point back to IC (LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x). There are 6 of them, and I don't know where they are coming from.
> >
> > I think they come from the FK checking code. Try to run a VACUUM on the IC table just before you delete from the other tables; that should make the checking almost instantaneous (assuming the vacuuming actually empties the table, which would depend on other transactions).
>
> I'll try a vacuum before I start the delete to see if it changes something. There is probably a good reason, but I don't understand why a foreign key check needs to check the data it points to. You delete a row from table IC and do a check for integrity on the tables that have foreign keys on IC (makes sense). But why check back IC?

Because in the general case there might be another row which satisfies the constraint, added between the delete and the check.
Re: [PERFORM] Performance problem on delete from for 10k rows. May
Stephan Szabo wrote:
> > There is probably a good reason, but I don't understand why a foreign key check needs to check the data it points to. You delete a row from table IC and do a check for integrity on the tables that have foreign keys on IC (makes sense). But why check back IC?
>
> Because in the general case there might be another row which satisfies the constraint, added between the delete and the check.

So it means that if I want to reset the schema with DELETE FROM table statements, I must first drop indexes, delete the data, and then recreate the indexes and reload the stored procedures. Or I can suspend the foreign key checks in the db, right? I saw something on this. Is it possible to do this from the JDBC interface? Are there any other options I can consider?

Thanks for your help!
/David
[PERFORM] Speeding up select distinct
Consider this query:

SELECT distinct owner from pictures;

 Unique  (cost=361.18..382.53 rows=21 width=4) (actual time=14.197..17.639 rows=21 loops=1)
   ->  Sort  (cost=361.18..371.86 rows=4270 width=4) (actual time=14.188..15.450 rows=4270 loops=1)
         Sort Key: owner
         ->  Seq Scan on pictures  (cost=0.00..103.70 rows=4270 width=4) (actual time=0.012..5.795 rows=4270 loops=1)
 Total runtime: 19.147 ms

I thought that 19 ms was a lot to return 20 rows out of a 4000-row table, so I added an index:

CREATE INDEX pictures_owner ON pictures (owner);

It gives a slight improvement:

 Unique  (cost=0.00..243.95 rows=21 width=4) (actual time=0.024..10.293 rows=21 loops=1)
   ->  Index Scan using pictures_owner on pictures  (cost=0.00..233.27 rows=4270 width=4) (actual time=0.022..8.227 rows=4270 loops=1)
 Total runtime: 10.369 ms

But still, it's a lot for 20 rows. I looked at other types of indexes, but they seem to either not give better performance or be irrelevant. Any ideas, apart from more or less manually maintaining a list of distinct owners in another table?

-- 
Laurent Martelli
[EMAIL PROTECTED]  Java Aspect Components  http://www.aopsys.com/  http://jac.objectweb.org
Re: [PERFORM] cpu_tuple_cost
Greg,

> So 800kB/s for random access reads. And 40Mb/s for sequential reads. That's a factor of 49. I don't think anyone wants random_page_cost to be set to 50 though. For a high end 15k drive I see average seek times get as low as 3ms. And sustained transfer rates get as high as 100Mb/s. So about 2.7Mb/s for random access reads or about a random_page_cost of 37. Still pretty extreme.

Actually, what you're demonstrating here is that there's really no point in having a random_page_cost GUC, since the seek/scan ratio is going to be high regardless. Although I can point out that you left out the fact that the disk needs to do a seek to find the beginning of the seq scan area, and even then some file fragmentation is possible. Finally, I've never seen PostgreSQL manage more than 70% of the maximum read rate, and in most cases more like 30%.

> So what's going on with the empirically derived value of 4?

It's not empirically derived; it's a value we plug into an internal-to-PostgreSQL formula. And 4 is a fairly conservative value that works for a lot of systems. Realistically, the values we should be deriving from are:

-- median file cache size for PostgreSQL files
-- average disk read throughput
-- effective processor calculation throughput
-- median I/O contention

However, working those 4 hardware facts into formulas that allow us to calculate the actual cost of a query execution plan is somebody's PhD paper.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Speeding up select distinct
Try:

SELECT owner from pictures group by owner;

> Any ideas, apart from more or less manually maintaining a list of distinct owners in another table?

That would be a good idea too, for normalizing your database.
Re: [PERFORM] multi-column index
Daniel,

> Table public.descriptionprodftdiclnk

What is this, German? ;-)

> explain analyze select * from descriptionprodftdiclnk where idword=44;
>  Seq Scan on descriptionprodftdiclnk  (cost=0.00..4788.14 rows=44388 width=8) (actual time=87.582..168.041 rows=43792 loops=1)
>    Filter: (idword = 44)
>  Total runtime: 195.339 ms
>
> explain analyze select * from descriptionprodftdiclnk where idword=44;
>  Index Scan using descriptionprodftdiclnk_pkey on descriptionprodftdiclnk  (cost=0.00..36720.39 rows=44388 width=8) (actual time=0.205..73.489 rows=43792 loops=1)
>    Index Cond: (idword = 44)
>  Total runtime: 100.564 ms
>
> create index ix_tempIndex on descriptionprodftdiclnk(idword);
> CREATE INDEX
> explain analyze select * from descriptionprodftdiclnk where idword=44;
>  Index Scan using ix_tempindex on descriptionprodftdiclnk  (cost=0.00..916.24 rows=44388 width=8) (actual time=0.021..79.879 rows=43792 loops=1)
>    Index Cond: (idword = 44)
>  Total runtime: 107.081 ms
>
> Could someone provide an explanation for the planner's behaviour?

Pretty simple, really. Look at the cost calculations for the index scan for the multi-column index. PostgreSQL believes that:

  The cost of a seq scan is 4788.14
  The cost of a 2-column index scan is 36720.39
  The cost of a 1-column index scan is 916.24

Assuming that you ran each of these queries multiple times to eliminate caching as a factor, the issue is that the cost calculations are wrong. We give you a number of GUC variables to change that: effective_cache_size, random_page_cost, cpu_tuple_cost, etc. See the RUNTIME-CONFIGURATION docs for more details.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
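A hedged sketch of what adjusting those GUCs in a session looks like; the values below are illustrative assumptions, not recommendations, and should be tested against your own hardware:

```sql
-- Illustrative values only; re-run EXPLAIN ANALYZE after each change.
SET effective_cache_size = 50000;  -- measured in 8 kB pages (~400 MB of cache)
SET random_page_cost = 2;          -- lower it if most of the table is cached
EXPLAIN ANALYZE SELECT * FROM descriptionprodftdiclnk WHERE idword = 44;
```

If the lower random_page_cost makes the planner's index-scan estimate drop below the seq-scan estimate, it will pick the primary-key index without needing enable_seqscan = off.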
Re: [PERFORM] Speeding up select distinct
On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote:
> Consider this query:
>
> SELECT distinct owner from pictures;

The performance has nothing to do with the number of rows returned, but rather with the complexity of the calculations and the amount of data to sift through in order to find them.

> Any ideas, apart from more or less manually maintaining a list of distinct owners in another table?

This would be the proper thing to do, along with adding a foreign key from pictures to the new owner structure for integrity enforcement.
Re: [PERFORM] Speeding up select distinct
> Consider this query:
>
> SELECT distinct owner from pictures;
[...]
> Any ideas, apart from more or less manually maintaining a list of distinct owners in another table?

You answered your own question: with a 20-row owners table, you should be directing your efforts there. GROUP BY is faster than DISTINCT, but both are very wasteful and essentially require a full seqscan of the detail table. With a little hacking, you can change 'manual maintenance' to 'automatic maintenance':

1. create table owner as select distinct owner from pictures;
2. alter table owner add constraint owner_pkey primary key (owner);
3. alter table pictures add constraint ri_picture_owner foreign key (owner) references owner;
4. make a little append_owner function which adds an owner to the owner table if there is not already one there. Inline this into your insert statement on pictures.

Voila!
Merlin

p.s. normalize your data always!
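Step 4 above can be sketched as a PL/pgSQL trigger rather than inlining it into every insert. This is an assumed implementation, not from the original post (function and trigger names are made up), using the pre-8.0 quoted-body syntax:

```sql
-- Hypothetical helper for step 4: insert the owner row on demand.
CREATE FUNCTION append_owner() RETURNS trigger AS '
BEGIN
    -- add the owner if it is not already in the owner table
    IF NOT EXISTS (SELECT 1 FROM owner WHERE owner = NEW.owner) THEN
        INSERT INTO owner (owner) VALUES (NEW.owner);
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER pictures_append_owner
    BEFORE INSERT OR UPDATE ON pictures
    FOR EACH ROW EXECUTE PROCEDURE append_owner();
```

Firing BEFORE the row is stored means the owner row exists by the time the FK constraint on pictures is checked, so "SELECT owner FROM owner" then replaces the seqscan-plus-aggregate over pictures.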
Re: [PERFORM] Speeding up select distinct
>>>>> "Rod" == Rod Taylor [EMAIL PROTECTED] writes:

 Rod> On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote:
 >> Consider this query:
 >> SELECT distinct owner from pictures;

 Rod> The performance has nothing to do with the number of rows
 Rod> returned, but rather the complexity of calculations and amount
 Rod> of data to sift through in order to find it.

Yes, but I thought that an index might be able to know what distinct values there are and help optimize that query very much.

-- 
Laurent Martelli
[EMAIL PROTECTED]  Java Aspect Components  http://www.aopsys.com/  http://jac.objectweb.org
Re: [PERFORM] Speeding up select distinct
Wow, what a fast response!!!

>>>>> "PFC" == PFC [EMAIL PROTECTED] writes:

 PFC> Try:
 PFC> SELECT owner from pictures group by owner;

That's a slight improvement, but there's still a seq scan on pictures:

 HashAggregate  (cost=114.38..114.38 rows=21 width=4) (actual time=7.585..7.605 rows=21 loops=1)
   ->  Seq Scan on pictures  (cost=0.00..103.70 rows=4270 width=4) (actual time=0.015..3.272 rows=4270 loops=1)
 Total runtime: 7.719 ms

-- 
Laurent Martelli
[EMAIL PROTECTED]  Java Aspect Components  http://www.aopsys.com/  http://jac.objectweb.org
Re: [PERFORM] Speeding up select distinct
On Wed, 2005-03-16 at 19:31 +0100, Laurent Martelli wrote:
> Yes, but I thought that an index might be able to know what distinct values there are and help optimize that query very much.

The index does know. You just have to visit all of the pages within the index to find out, which it does, and that's why you dropped 10 ms. But if you want a sub-ms query, you're going to have to normalize the structure.
Re: [PERFORM] Speeding up select distinct
> I just wished there was a means to fully automate all this and render it transparent to the user, just like an index.
>
> > Voila! Merlin p.s. normalize your data always!
>
> I have this:
>
> pictures(
>     PictureID serial PRIMARY KEY,
>     Owner integer NOT NULL REFERENCES users,
>     [...]);
>
> CREATE TABLE users (
>     UserID serial PRIMARY KEY,
>     Name character varying(255),
>     [...]);
>
> Isn't it normalized?

Try:

select * from users where UserID in (select Owner from pictures);
select userid from users intersect select owner from pictures;
select distinct userid, [...] from users, pictures where userid = owner;

If none of these give you what you want, then you can solve this with a new table, picture_user, using the instructions I gave previously. Not sure if your data is normalized, but ISTM you are over-using surrogate keys. It may not be possible, but consider downgrading ID columns to unique and picking a natural key. Now you get a better benefit from RI and you can sometimes remove joins from certain queries. Rule: use natural keys when you can, surrogate keys when you have to. Corollary: use domains for fields used in referential integrity.

Merlin
[PERFORM] Help to find out problem with joined tables
Hi all, Could someone explain why a query that takes about 1 sec to finish takes 17 secs to complete when I join a third table? If I join movest/natope, it's fast; if I join movest/produt, it's fast too; but when I add a third joined table, forget it, it's very slow. All tables are vacuumed by vacuumdb --full --analyze every night, and all indexes are reindexed every night.

TABLES:
Movest: +- 2 million rows, indexed
Natope: 30 rows, PK (natope_id)
Produt: +- 1400 rows, PK (codpro)

EXPLAINS:

explain analyze select a.codpro, a.datmov, a.vlrtot from movest a, natope b where a.tipmov = 'S' and a.codpro = 629001 and a.datmov between '2005-03-01' and '2005-03-31' and a.natope = b.natope_id

Merge Join (cost=35.68..36.23 rows=1 width=25) (actual time=2.613..2.840 rows=6 loops=1)
  Merge Cond: ("outer".natope = "inner"."?column2?")
  -> Sort (cost=32.02..32.04 rows=7 width=35) (actual time=1.296..1.314 rows=10 loops=1)
       Sort Key: a.natope
       -> Index Scan using ix_movest_03 on movest a (cost=0.00..31.92 rows=7 width=35) (actual time=0.507..1.215 rows=10 loops=1)
            Index Cond: ((codpro = 629001::numeric) AND (datmov >= '2005-03-01'::date) AND (datmov <= '2005-03-31'::date))
            Filter: (tipmov = 'S'::bpchar)
  -> Sort (cost=3.65..3.82 rows=66 width=4) (actual time=1.132..1.203 rows=49 loops=1)
       Sort Key: (b.natope_id)::numeric
       -> Seq Scan on natope b (cost=0.00..1.66 rows=66 width=4) (actual time=0.117..0.500 rows=66 loops=1)
Total runtime: 3.077 ms

explain analyze select a.codpro, a.datmov, a.vlrtot from movest a, natope b, produt c where a.tipmov = 'S' and a.codpro = 629001 and a.datmov between '2005-03-01' and '2005-03-31' and a.natope = b.natope_id and a.codpro = c.codpro

Nested Loop (cost=35.68..144.57 rows=2 width=25) (actual time=2838.121..17257.168 rows=6 loops=1)
  -> Merge Join (cost=35.68..36.23 rows=1 width=25) (actual time=1.808..2.280 rows=6 loops=1)
       Merge Cond: ("outer".natope = "inner"."?column2?")
       -> Sort (cost=32.02..32.04 rows=7 width=35) (actual time=0.485..0.504 rows=10 loops=1)
            Sort Key: a.natope
            -> Index Scan using ix_movest_03 on movest a (cost=0.00..31.92 rows=7 width=35) (actual time=0.135..0.390 rows=10 loops=1)
                 Index Cond: ((codpro = 629001::numeric) AND (datmov >= '2005-03-01'::date) AND (datmov <= '2005-03-31'::date))
                 Filter: (tipmov = 'S'::bpchar)
       -> Sort (cost=3.65..3.82 rows=66 width=4) (actual time=1.114..1.209 rows=49 loops=1)
            Sort Key: (b.natope_id)::numeric
            -> Seq Scan on natope b (cost=0.00..1.66 rows=66 width=4) (actual time=0.058..0.485 rows=66 loops=1)
  -> Seq Scan on produt c (cost=0.00..108.26 rows=8 width=4) (actual time=2688.356..2875.743 rows=1 loops=6)
       Filter: ((codpro)::numeric = 629001::numeric)
Total runtime: 17257.865 ms

Best Regards, Rodrigo Moreno ---(end of broadcast)--- TIP 8: explain analyze is your friend
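One thing worth checking in a case like the seq scan on produt above, where the filter casts codpro to numeric, is whether the join columns actually share a declared type; a quick way to compare them (table names from the post, query otherwise illustrative):

```sql
-- In psql, the meta-commands show the full column definitions:
\d movest
\d produt

-- Or query the information schema for just the join column:
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name IN ('movest', 'produt')
  AND column_name = 'codpro';
```

If the two data_type values differ, the planner may be unable to use produt's primary key index for the join.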
Re: [PERFORM] Performance problem on delete from for 10k rows. May
On Wed, 16 Mar 2005, David Gagnon wrote:

> Stephan Szabo wrote:
>> On Wed, 16 Mar 2005, David Gagnon wrote:
>>> Hi, I reran the example with the debug info turned on in postgresql. As you can see, all dependent tables (that have a foreign key on table IC) are emptied before the DELETE FROM IC statement is issued. From what I understand, the performance problem seems to come from those selects that point back to IC (LOG: statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 FOR UPDATE OF x). There are 6 of them. I don't know where they are coming from.
>> I think they come from the FK checking code. Try to run a VACUUM on the IC table just before you delete from the other tables; that should make the checking almost instantaneous (assuming the vacuuming actually empties the table, which would depend on other transactions).
> I'll try to vacuum first before I start the delete to see if it changes something. There is probably a good reason, but I don't understand why a foreign key check needs to check the table it points to. You delete a row from table IC and do a check for integrity on tables that have foreign keys on IC (makes sense). But why check back in IC?

Because in the general case there might be another row, added between the delete and the check, which satisfies the constraint.

> So that means if I want to reset the schema with DELETE FROM table statements I must first drop indexes, delete the data, and then recreate the indexes and reload the stored procedures. Or I can suspend the foreign key check in the db, right? I saw something on this. Is it possible to do this from the JDBC interface?

I think you can remove the constraints and re-add them afterwards, which should hopefully be fast (a vacuum on the tables after the delete and before the add might help, but I'm not sure). You could potentially defer the constraint if it were deferrable, but I don't think that would help any. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
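The drop-and-re-add approach Stephan describes might look roughly like this; the dependent table, constraint, and column names are placeholders, not from David's actual schema, and the same statements can be issued from JDBC through an ordinary Statement:

```sql
-- Sketch: suspend FK checking on IC during a bulk reset by dropping
-- and re-adding the constraint (names are hypothetical).
BEGIN;
ALTER TABLE dependent_table DROP CONSTRAINT dependent_table_icnum_fkey;
DELETE FROM dependent_table;
DELETE FROM ic;
ALTER TABLE dependent_table
    ADD CONSTRAINT dependent_table_icnum_fkey
    FOREIGN KEY (icnum) REFERENCES ic (icnum);
COMMIT;

-- VACUUM cannot run inside a transaction block, so do it afterwards:
VACUUM ANALYZE ic;
```

Re-adding the constraint validates the (now empty) tables in one pass, which avoids the per-row FK selects that dominated the original delete.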
Re: [PERFORM] cpu_tuple_cost
Josh Berkus <josh@agliodbs.com> writes:
>> So what's going on with the empirically derived value of 4?
> It's not empirically derived.

Yes it is. I ran experiments back in the late 90s to derive it. Check the archives. Disks have gotten noticeably bigger since then, but I don't think the ratio of seek time to rotation rate has changed much. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] multi-column index
David Brown <[EMAIL PROTECTED]> writes:
> Actually, I'm surprised the planner came up with such a low cost for the single column index, unless ... perhaps correlation statistics aren't used when determining costs for multi-column indexes?

The correlation calculation for multi-column indexes is pretty whacked out pre-8.0. I don't think it's that great in 8.0 either --- we really need to make ANALYZE calculate the correlation explicitly for each index, probably, rather than trying to use per-column correlations. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
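For what it's worth, the per-column correlations Tom mentions can be inspected directly in the pg_stats view once ANALYZE has run; the table name here is just an example:

```sql
-- Per-column planner statistics, including the correlation between
-- a column's values and their physical row order (+/-1 = fully ordered,
-- near 0 = random). ANALYZE must have been run on the table first.
SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'movest'
ORDER BY attname;
```

A multi-column index's cost estimate is derived from these per-column figures, which is exactly the approximation Tom says breaks down.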
Re: [PERFORM] Help to find out problem with joined tables
On Wed, Mar 16, 2005 at 05:10:17PM -0300, Rodrigo Moreno wrote:
> If I join movest/natope, it's fast, if I join movest/produt, it's fast too, but when I put a third joined table, forget it, it's very slow.

What version of PostgreSQL are you using?

> All tables are vacuumed by vacuumdb --full --analyze every night
> All indexes are reindexed every night

How many updates/deletes do the tables see between vacuums?

> Movest: +- 2 million rows, indexed
> Natope: 30 rows, PK (natope_id)
> Produt: +- 1400 rows, PK (codpro)

Could you show the table definitions, or at least the definitions for the relevant columns and indexes?

> -> Seq Scan on produt c (cost=0.00..108.26 rows=8 width=4) (actual time=2688.356..2875.743 rows=1 loops=6)
>      Filter: ((codpro)::numeric = 629001::numeric)

What type is produt.codpro? You might be missing a potential index scan here due to mismatched types. The times (2688.356..2875.743) here look odd, although I might be overlooking or misinterpreting something. I don't know what else might cause that, but one thing that can is a lot of dead tuples in the table, hence my question about how much activity the tables see between vacuums. Maybe somebody else can provide a better explanation. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
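If the types do turn out to differ, one sketch of a fix — assuming, hypothetically, that produt.codpro is integer while movest.codpro is numeric — is to cast the join key so the comparison matches produt's index, or to align the column types outright:

```sql
-- Casting the numeric side to the (assumed) integer type of produt.codpro
-- lets the planner consider produt's primary key index for the join:
SELECT a.codpro, a.datmov, a.vlrtot
FROM movest a, natope b, produt c
WHERE a.tipmov = 'S'
  AND a.codpro = 629001
  AND a.datmov BETWEEN '2005-03-01' AND '2005-03-31'
  AND a.natope = b.natope_id
  AND a.codpro::integer = c.codpro;

-- On 8.0 or later, the column type itself can be aligned instead:
-- ALTER TABLE movest ALTER COLUMN codpro TYPE integer;
```

The same reasoning applies to the natope join, where the plan above sorts natope_id through a ::numeric cast.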
Re: [PERFORM] Join method influences scan method?
[EMAIL PROTECTED] writes:
> So, it would seem like my optimal plan should have hash joins with index scans.

No. The thing you are looking at here is a nestloop join with inner index scan, which has to be understood as a unit even though EXPLAIN doesn't describe it that way. The inner indexscan is repeated once for each outer row, using a join key from the outer row as part of the index lookup. That's simply not relevant to the other kinds of joins, because they expect the inner and outer relations to be scanned independently. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
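Tom's point shows up directly in EXPLAIN output; a nestloop with inner index scan looks roughly like the commented plan shape below (the tables, index, and columns are made up for illustration):

```sql
-- Hypothetical example: customers has an index on its join key, so the
-- planner can use a nestloop whose inner index scan re-executes once per
-- outer row, feeding the current outer row's custid into the Index Cond:
--
--   Nested Loop
--     -> Seq Scan on orders o
--     -> Index Scan using customers_pkey on customers c
--          Index Cond: (custid = "outer".custid)
EXPLAIN
SELECT *
FROM orders o
JOIN customers c ON c.custid = o.custid;
```

A hash or merge join, by contrast, scans each input once on its own, so an index on the join key buys it nothing during the join itself.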