Re: [PERFORM] Use my (date) index, darn it!
John Siracusa [EMAIL PROTECTED] writes:
> Obviously the planner is making some bad choices here.

A fair conclusion ...

> I know that it is trying to avoid random seeks or other scary things
> implied by a correlation statistic that is not close to 1 or -1, but
> it is massively overestimating the hit caused by those seeks and
> seemingly not taking into account the size of the table!

You haven't given any evidence to support these conclusions, though.
Could we see some table schemas, EXPLAIN ANALYZE output, and relevant
pg_stats entries for the various cases?

			regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] optimizing Postgres queries
On Monday 05 January 2004 16:58, David Teran wrote:
> We have some tests to check the performance, and FrontBase is about 10
> times faster than Postgres. We already played around with explain
> analyze select. It seems that for large tables Postgres does not use
> an index. We often see the scan message in the query plan. Where can
> we find more hints about tuning the performance? The database is about
> 350 MB large, without BLOBs. We tried to define every important index
> for the selects, but it seems that something still goes wrong:
> FrontBase needs about 23 seconds for about 4300 selects, and Postgres
> needs 4 minutes 34 seconds.

Check
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Are you sure you are using the correct data types against your indexes?
For example, if field1 is an int2 field, then the following query would
not use an index:

  select * from table where field1=2;

However, the following will:

  select * from table where field1=2::int2;

This is called typecasting, and PostgreSQL is rather strict about it
when it comes to making a decision about index usage. I am sure the two
tips above could take care of some of the problems.

Beyond that, this kind of question needs more specific information. Can
you post EXPLAIN ANALYZE output for the queries and the database
schema?

HTH

Shridhar
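Concretely, the typecasting pitfall Shridhar describes can be sketched
as follows (the table, column, and index names here are hypothetical,
and the seq-scan behavior applies to the 7.x releases being discussed;
it was fixed in a later release):

```sql
-- Hypothetical table with a smallint (int2) key column.
CREATE TABLE t (field1 int2);
CREATE INDEX t_field1_idx ON t (field1);

-- The bare literal 2 is typed as the default integer type, so the
-- planner on these releases will not match it against the int2 index:
EXPLAIN SELECT * FROM t WHERE field1 = 2;        -- seq scan

-- Casting or quoting the literal lets the index be considered:
EXPLAIN SELECT * FROM t WHERE field1 = 2::int2;  -- index scan possible
EXPLAIN SELECT * FROM t WHERE field1 = '2';      -- index scan possible
```

The quoted form works because an unknown-type literal is coerced to the
column's type, rather than defaulting to int4.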
Re: [PERFORM] optimizing Postgres queries
Hi Shridhar,

> Are you sure you are using the correct data types against your
> indexes?

Did not know about this...

> For example, if field1 is an int2 field, then the following query
> would not use an index.

Our fk columns have the type bigint. When I try one simple select like
this:

  explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0
  WHERE t0.ID_FOREIGN_TABLE = 21110;

I see that no index is being used, whereas when I use

  explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0
  WHERE t0.ID_FOREIGN_TABLE = 21110::bigint;

an index is used. Very fine, the performance is about 10 to 100 times
faster for the single select. I am using WebObjects with JDBC. I will
now create a DB with integer instead of bigint and see how this
performs.

regards David
Re: [PERFORM] optimizing Postgres queries
On Monday 05 January 2004 17:35, David Teran wrote:
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0
> WHERE t0.ID_FOREIGN_TABLE = 21110;
>
> i see that no index is being used whereas when i use
>
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0
> WHERE t0.ID_FOREIGN_TABLE = 21110::bigint;
>
> an index is used. Very fine, the performance is about 10 to 100 times
> faster for the single select. I will now create a DB with integer
> instead of bigint and see how this performs.

The performance will likely be the same. It's just that integer happens
to be the default integer type, and hence it does not need an explicit
typecast. (I don't remember exactly which integer is the default, but
it is one of int2, int4 and int8... :-))

The performance difference is likely due to use of the index, which is
in turn due to the typecast. If you need bigint, you should use it.
Just remember to typecast whenever required.

Shridhar
Re: [PERFORM] optimizing Postgres queries
Hi,

> The performance will likely be the same. It's just that integer
> happens to be the default integer type, and hence it does not need an
> explicit typecast.

The docs say int4 is much faster than int8, but I will check this.

> The performance difference is likely due to use of the index, which
> is in turn due to the typecast. If you need bigint, you should use
> it. Just remember to typecast whenever required.

This is my bigger problem: I am using EOF (an O/R mapping tool) which
frees me more or less from writing a lot of SQL. If I need to typecast
to use an index, then I have to see how to do this with this framework.

Regards David
Re: [PERFORM] optimizing Postgres queries
On Monday 05 January 2004 17:48, David Teran wrote:
> The docs say int4 is much faster than int8, but i will check this.

Well, yes, that is correct as well. What I really meant to say is that
an index scan to pick a few int4 tuples wouldn't be much faster than an
index scan to pick the same number of tuples with an int8 definition.
The initial boost you got from converting to an index scan is probably
the best you can beat out of it. Of course, if you are scanning a few
million of them sequentially, then it is a different story.

> This is my bigger problem: i am using EOF (an O/R mapping tool) which
> frees me more or less from writing a lot of SQL. If i need to
> typecast to use an index then i have to see how to do this with this
> framework.

Well, you can direct your queries to a function rather than a table,
which would cast the argument appropriately and select. PostgreSQL
supports function overloading as well, in case you need different types
of arguments with the same name. Or you can write an INSTEAD rule on
the server side which performs the casting before touching the table. I
am not sure of the exact details it would take to make it work, but it
should work, at least in theory. That way you can preserve the effort
invested in the mapping tool.

Of course, converting everything to integer might be a simpler option
after all.. :-)

Shridhar
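The function workaround Shridhar suggests might look roughly like this
(the function name is invented for illustration; the table and column
names are taken from the thread):

```sql
-- A SQL function that performs the bigint cast internally, so callers
-- can pass a plain integer literal and still get the index scan.
CREATE FUNCTION key_values_for(integer)
RETURNS SETOF KEY_VALUE_META_DATA AS '
  SELECT * FROM KEY_VALUE_META_DATA
  WHERE ID_FOREIGN_TABLE = $1::bigint;
' LANGUAGE sql STABLE;

-- Callers then select from the function instead of the table:
SELECT * FROM key_values_for(21110);
```

An overloaded variant taking bigint could coexist under the same name,
as mentioned above.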
Re: [PERFORM] optimizing Postgres queries
> explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0
> WHERE t0.ID_FOREIGN_TABLE = 21110::bigint;
>
> an index is used. Very fine, the performance is about 10 to 100 times
> faster for the single select.

An alternative technique is to do this:

  ... t0.ID_FOREIGN_TABLE = '21110';

Chris
Re: [PERFORM] optimizing Postgres queries
David Teran [EMAIL PROTECTED] writes:
> This is my bigger problem: i am using EOF (an O/R mapping tool) which
> frees me more or less from writing a lot of SQL. If i need to
> typecast to use an index then i have to see how to do this with this
> framework.

It's worth pointing out that this problem is fixed (at long last) in
CVS tip. You probably shouldn't expend large amounts of effort on
working around a problem that will go away in 7.5.

If you don't anticipate going to production for six months or so, you
could adopt CVS tip as your development platform, with the expectation
that 7.5 will be released by the time you need a production system. I
wouldn't recommend running CVS tip as a production database, but it
should be plenty stable enough for devel purposes.

Another plan would be to use int4 columns for the time being, with the
intention of widening them to int8 when you move to 7.5. This would
depend on how soon you anticipate needing values wider than 32 bits, of
course.

			regards, tom lane
Re: [PERFORM] Use my (date) index, darn it!
On 1/5/04 1:55 AM, Tom Lane wrote:
> John Siracusa [EMAIL PROTECTED] writes:
>> Obviously the planner is making some bad choices here. ... it is
>> massively overestimating the hit caused by those seeks and seemingly
>> not taking into account the size of the table!
>
> You haven't given any evidence to support these conclusions, though.

Well, here's what I was basing that theory on: before clustering, the
correlation for the date column was around 0.3. After clustering, it
was 1, and the index was always used. Does clustering change any other
statistics besides correlation? I ran ANALYZE immediately before and
after the cluster operation.

> Could we see some table schemas, EXPLAIN ANALYZE output, and relevant
> pg_stats entries for the various cases?

Well, the table is clustered now, so I can't reproduce the situation.
Is there any way to uncluster a table? Should I just cluster it on a
different column?

-John
Re: [PERFORM] Use my (date) index, darn it!
John Siracusa [EMAIL PROTECTED] writes:
> Is there any way to uncluster a table? Should I just cluster it on a
> different column?

That should work, if you choose one that's uncorrelated with the
previous clustering attribute.

			regards, tom lane
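A sketch of what that "unclustering" might look like, using invented
table, index, and column names (the CLUSTER syntax shown is the 7.x
form, CLUSTER indexname ON tablename):

```sql
-- Re-cluster on an index over an unrelated column, which scrambles
-- the physical ordering of the date column again:
CLUSTER foo_other_col_idx ON foo;
ANALYZE foo;

-- Then check how far the date column's correlation has dropped:
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'foo' AND attname = 'date_col';
```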
Re: [PERFORM] Use my (date) index, darn it!
After a long battle with technology, [EMAIL PROTECTED] (John Siracusa),
an earthling, wrote:
> Well, here's what I was basing that theory on: before clustering, the
> correlation for the date column was around 0.3. After clustering, it
> was 1, and the index was always used. ...
> Is there any way to uncluster a table? Should I just cluster it on a
> different column?

That would presumably work...

It sounds to me as though the statistics that are being collected
aren't good enough. That tends to be a sign that the quantity of
statistics (e.g. bins in the histogram) is insufficient.

This would be resolved by changing the number of bins (default of 10)
via

  ALTER TABLE FOO ALTER COLUMN BAR SET STATISTICS 100;

(or some other value higher than 10).

Clustering rearranges the contents of the table, and perhaps makes the
histogram 'more representative.' Increasing the SET STATISTICS value
will quite likely be even more helpful, and is a lot less expensive
than clustering the table...
-- 
Christopher Browne
Re: [PERFORM] Use my (date) index, darn it!
On 1/5/04 11:45 AM, Christopher Browne wrote:
> This would be resolved by changing the number of bins (default of 10)
> via ALTER TABLE FOO ALTER COLUMN BAR SET STATISTICS 100 (or some
> other value higher than 10).

I did that, but I wasn't sure what value to use and what column to
increase. I believe I increased the date column itself to 50 or
something, but then I wasn't sure what to do next. I re-analyzed the
table with the date column set to 50, but it didn't seem to help, so I
resorted to clustering.

> Clustering rearranges the contents of the table, and perhaps makes
> the histogram 'more representative.' Increasing the SET STATISTICS
> value will quite likely be even more helpful, and is a lot less
> expensive than clustering the table...

What column(s) should I increase? Do I have to do anything after
increasing the statistics, or do I just wait for the stats collector to
do its thing?

-John
Re: [PERFORM] deferred foreign keys
On Mon, Jan 05, 2004 at 11:33:40 -0500, Vivek Khera [EMAIL PROTECTED]
wrote:
> Thanks. Then it sorta makes it moot for me to try deferred checks,
> since the primary and foreign keys never change once set. I wonder
> what is making the transactions appear to run in lockstep, then...

I think this is probably the issue with foreign key checks needing an
exclusive lock, since there is no shared lock that will prevent
deletes. This problem has been discussed a number of times on the
lists, and you should be able to find more information in the archives
if you want to confirm that this is the root cause of your problems.
Re: [PERFORM] optimizing Postgres queries
Hi Tom,

> It's worth pointing out that this problem is fixed (at long last) in
> CVS tip. You probably shouldn't expend large amounts of effort on
> working around a problem that will go away in 7.5.

We have now changed the definition to integer; this will work for some
time. We are currently evaluating, and have several production
databases we might switch in some time.

What we found out now is that a query with a single WHERE condition
works fine (the query planner uses the index), but when we have two
conditions it does not use the index anymore:

  EXPLAIN ANALYZE SELECT columns... FROM KEY_VALUE_META_DATA t0
  WHERE (t0.ID_VALUE = 14542);

performs fine, less than one millisecond.

  EXPLAIN ANALYZE SELECT columns... FROM KEY_VALUE_META_DATA t0
  WHERE (t0.ID_VALUE = 14542 OR t0.ID_VALUE = 14550);

performs badly: about 235 milliseconds. I tried to change the second
one to use IN, but this did not help at all.

Am I doing something wrong? I have an index defined like this:

  CREATE INDEX key_value_meta_data__id_value__fk_index ON
  KEY_VALUE_META_DATA USING btree (ID_VALUE);

Regards David
Re: [PERFORM] optimizing Postgres queries
David Teran [EMAIL PROTECTED] writes:
> EXPLAIN ANALYZE SELECT columns... FROM KEY_VALUE_META_DATA t0
> WHERE (t0.ID_VALUE = 14542);
>
> performs fine, less than one millisecond.
>
> EXPLAIN ANALYZE SELECT columns... FROM KEY_VALUE_META_DATA t0
> WHERE (t0.ID_VALUE = 14542 OR t0.ID_VALUE = 14550);
>
> performs badly: about 235 milliseconds.

Please, when you ask this sort of question, show the EXPLAIN ANALYZE
output. It is not a virtue to provide minimal information and see if
anyone can guess what's happening.

			regards, tom lane
Re: [PERFORM] deferred foreign keys
On Mon, 5 Jan 2004, Bruno Wolff III wrote:
> I think this is probably the issue with foreign key checks needing an
> exclusive lock, since there is no shared lock that will prevent
> deletes.

But if he's updating the fk table but not the keyed column, it should
no longer be doing the check and grabbing the locks. If he's seeing it
grab the row locks still, a full test case would be handy, because it'd
probably mean we missed something.
Re: [PERFORM] deferred foreign keys
On Jan 5, 2004, at 1:38 PM, Bruno Wolff III wrote:
> I think this is probably the issue with foreign key checks needing an
> exclusive lock, since there is no shared lock that will prevent
> deletes.

That was my original thought upon reading all the recent discussion
regarding the FK checking locks. I figured if I deferred the checks to
commit, I could save some contention time. However, if FK checks are
skipped when the field in question is not updated, what locks would
there be? Are they taken even if the checks are not performed, on some
sort of "be prepared" principle?

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [PERFORM] optimizing Postgres queries
Hi Tom,

> Please, when you ask this sort of question, show the EXPLAIN ANALYZE
> output. It is not a virtue to provide minimal information and see if
> anyone can guess what's happening.

Sorry for that; I thought this was such a trivial question that the
answer would be easy.

Explain result from the first query:

  Index Scan using key_value_meta_data__id_value__fk_index on
  KEY_VALUE_META_DATA t0  (cost=0.00..1585.52 rows=467 width=1068)
  (actual time=0.424..0.493 rows=13 loops=1)
    Index Cond: (ID_VALUE = 21094)
  Total runtime: 0.608 ms

Explain result from the second query:

  Seq Scan on KEY_VALUE_META_DATA t0  (cost=0.00..2671.16 rows=931
  width=1068) (actual time=122.669..172.179 rows=25 loops=1)
    Filter: ((ID_VALUE = 21094) OR (ID_VALUE = 21103))
  Total runtime: 172.354 ms

I found out that it's possible to disable seq scans with

  set enable_seqscan to off;

Then the second query result looks like this:

  Index Scan using key_value_meta_data__id_value__fk_index,
  key_value_meta_data__id_value__fk_index on KEY_VALUE_META_DATA t0
  (cost=0.00..3173.35 rows=931 width=1068) (actual time=0.116..0.578
  rows=25 loops=1)
    Index Cond: ((ID_VALUE = 21094) OR (ID_VALUE = 21103))
  Total runtime: 0.716 ms

But I read in the docs that it's not OK to turn this off by default. I
really wonder if this is my fault or not; from my point of view this is
such a simple select that the query plan should not result in a table
scan.

Regards David
[PERFORM] Select max(foo) and select count(*) optimization
Speaking of special cases (well, I was, on the admin list), there are
two kinds that would really benefit from some attention.

1. The query "select max(foo) from bar" where the column foo has an
index. Aren't indexes ordered? If not, an ordered index would be useful
in this situation so that this query, rather than doing a sequential
scan of the whole table, would just ask the index for the max value and
return nearly instantly.

2. The query "select count(*) from bar". Surely the total number of
rows in a table is kept somewhere convenient. If not, it would be nice
if it could be. :) Again, rather than doing a sequential scan of the
entire table, this type of query could return instantly.

I believe MySQL does both of these optimizations (which are probably a
lot easier in that product, given its data storage system). These were
the first areas where I noticed a big performance difference between
MySQL and Postgres.

Especially with very large tables, hearing the disks grind as Postgres
scans every single row in order to determine the number of rows in a
table or the max value of a column (even a primary key created from a
sequence) is pretty painful. If the implementation is not too
horrendous, this is an area where an orders-of-magnitude performance
increase can be had.

-John
Re: [PERFORM] optimizing Postgres queries
David Teran [EMAIL PROTECTED] writes:
> Explain result from the second query:
>
>   Seq Scan on KEY_VALUE_META_DATA t0  (cost=0.00..2671.16 rows=931
>   width=1068) (actual time=122.669..172.179 rows=25 loops=1)
>     Filter: ((ID_VALUE = 21094) OR (ID_VALUE = 21103))

The problem is evidently that the row estimate is so far off (931
estimated vs 25 actual). Have you done ANALYZE or VACUUM ANALYZE on
this table recently? If you have, I'd be interested to see the pg_stats
row for ID_VALUE. It might be that you need to increase the statistics
target for this table.

			regards, tom lane
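Fetching the pg_stats row Tom asks for, and raising the statistics
target, might look like this (the table and column names are from the
thread; note pg_stats reports lowercased identifiers):

```sql
-- Inspect the statistics the planner has for the column:
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'key_value_meta_data' AND attname = 'id_value';

-- If the estimate is far off, raise the statistics target for that
-- column and re-analyze so the planner sees a bigger histogram:
ALTER TABLE KEY_VALUE_META_DATA
  ALTER COLUMN ID_VALUE SET STATISTICS 100;
ANALYZE KEY_VALUE_META_DATA;
```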
Re: [PERFORM] deferred foreign keys
On Mon, 5 Jan 2004, Vivek Khera wrote:
> On Jan 5, 2004, at 1:57 PM, Stephan Szabo wrote:
>> But, if he's updating the fk table but not the keyed column, it
>> should no longer be doing the check and grabbing the locks. If he's
>> seeing it grab the row locks still, a full test case would be handy,
>> because it'd probably mean we missed something.
>
> I'm not *sure* it is taking any locks. The transactions appear to be
> running in lockstep (operating on different parts of the same pair of
> tables), and I was going to see if deferring the locks made the
> difference. It is my feeling now that it will not. However, if there
> is a way to detect if locks are being taken, I'll do that. I'd like
> to avoid dropping and recreating the foreign keys if I can, since
> that takes up some bit of time on the table with 20+ million rows.

The only way I can think of to see the locks is to do just one of the
operations and then manually attempt to SELECT ... FOR UPDATE on the
associated pk row.
Re: [PERFORM] Select max(foo) and select count(*) optimization
On 1/5/04 2:52 PM, Rod Taylor wrote:
> max(foo) optimizations require an extension to the aggregates system.
> It will likely happen within a few releases. ... A workaround can be
> accomplished today through the use of LIMIT and ORDER BY.

Wowzers, I never imagined that that'd be so much faster. Thanks! :)

-John
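For reference, the LIMIT/ORDER BY workaround mentioned above looks like
this (table and column names are generic):

```sql
-- Instead of the aggregate, which these releases can only satisfy
-- with a full scan:
SELECT max(foo) FROM bar;

-- walk the index on foo from the top and stop after one row:
SELECT foo FROM bar ORDER BY foo DESC LIMIT 1;
```

One caveat: if foo can be NULL, add WHERE foo IS NOT NULL, since NULLs
sort ahead of everything else in descending order.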
Re: [PERFORM] deferred foreign keys
On Mon, 2004-01-05 at 14:48, Stephan Szabo wrote:
> On Mon, 5 Jan 2004, Vivek Khera wrote:
>> I'm not *sure* it is taking any locks. ... However, if there is a
>> way to detect if locks are being taken, I'll do that. I'd like to
>> avoid dropping and recreating the foreign keys if I can, since that
>> takes up some bit of time on the table with 20+ million rows.
>
> The only way I can think of to see the locks is to do just one of the
> operations and then manually attempt to SELECT ... FOR UPDATE on the
> associated pk row.

When a locker runs into a row lock held by another transaction, the
locker will show a pending lock on the transaction id in pg_locks.
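A rough sketch of the check Rod describes, run from a third session
while the two transactions are in flight (column set per the pg_locks
view of this era; adjust to your release):

```sql
-- Ungranted entries are lock requests that are waiting; a blocked
-- foreign-key check shows up as a wait on the other transaction's id.
SELECT relation::regclass, transaction, pid, mode, granted
FROM pg_locks
WHERE NOT granted;
```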
Re: [PERFORM] Use my (date) index, darn it!
In the last exciting episode, [EMAIL PROTECTED] (John Siracusa) wrote:
> What column(s) should I increase? Do I have to do anything after
> increasing the statistics, or do I just wait for the stats collector
> to do its thing?

You have to ANALYZE the table again, to force in new statistics. And if
the index in question is on _just_ the date column, then it is probably
only that date column where the SET STATISTICS needs to be increased.
-- 
Christopher Browne
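Putting the two steps together, with invented table and column names:

```sql
-- Raise the histogram size for just the indexed date column, then
-- re-analyze so the planner actually sees the new statistics:
ALTER TABLE foo ALTER COLUMN date_col SET STATISTICS 100;
ANALYZE foo;

-- Confirm the effect on the planner's view of the column:
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'foo' AND attname = 'date_col';
```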
Re: [PERFORM] Select max(foo) and select count(*) optimization
Oops! [EMAIL PROTECTED] (John Siracusa) was seen spray-painting on a
wall:
> 1. The query "select max(foo) from bar" where the column foo has an
> index. ... this query, rather than doing a sequential scan of the
> whole table, would just ask the index for the max value and return
> nearly instantly.
>
> 2. The query "select count(*) from bar". Surely the total number of
> rows in a table is kept somewhere convenient. ...

These are both VERY frequently asked questions.

In the case of question #1, the optimization you suggest could be
accomplished via some Small Matter Of Programming. None of the people
that have wanted the optimization have, however, offered to actually DO
the programming.

In the case of #2, the answer is surely NOT. In MVCC databases, that
information CANNOT be stored anywhere convenient, because queries
requested by transactions started at different points in time must get
different answers.

I think we need to add these questions and their answers to the FAQ so
that the answer can be "See FAQ Item #17" rather than people having to
gratuitously explain it over and over and over again.
-- 
Christopher Browne
Re: [PERFORM] deferred foreign keys
On Mon, 5 Jan 2004, Rod Taylor wrote:
> When a locker runs into a row lock held by another transaction, the
> locker will show a pending lock on the transaction id in pg_locks.

Yeah, but AFAIR that won't let you know if it's blocking on the
particular row lock you're expecting.
Re: [PERFORM] Select max(foo) and select count(*) optimization
Not that I'm offering to do the porgramming mind you, :) but . . In the case of select count(*), one optimization is to do a scan of the primary key, not the table itself, if the table has a primary key. In a certain commercial, lesser database, this is called an index fast full scan. It would be important to scan the index in physical order (sequential physical IO) and not in key order (random physical IO) I'm guessing the payoff as well as real-world-utility of a max(xxx) optimization are much higher than a count(*) optimization tho On Mon, 2004-01-05 at 12:26, Christopher Browne wrote: Oops! [EMAIL PROTECTED] (John Siracusa) was seen spray-painting on a wall: Speaking of special cases (well, I was on the admin list) there are two kinds that would really benefit from some attention. 1. The query select max(foo) from bar where the column foo has an index. Aren't indexes ordered? If not, an ordered index would be useful in this situation so that this query, rather than doing a sequential scan of the whole table, would just ask the index for the max value and return nearly instantly. 2. The query select count(*) from bar Surely the total number of rows in a table is kept somewhere convenient. If not, it would be nice if it could be :) Again, rather than doing a sequential scan of the entire table, this type of query could return instantly. I believe MySQL does both of these optimizations (which are probably a lot easier in that product, given its data storage system). These were the first areas where I noticed a big performance difference between MySQL and Postgres. Especially with very large tables, hearing the disks grind as Postgres scans every single row in order to determine the number of rows in a table or the max value of a column (even a primary key created from a sequence) is pretty painful. If the implementation is not too horrendous, this is an area where an orders-of-magnitude performance increase can be had. 
These are both VERY frequently asked questions.

In the case of question #1, the optimization you suggest could be accomplished via some Small Matter Of Programming. None of the people who have wanted the optimization have, however, offered to actually DO the programming.

In the case of #2, the answer is surely NOT. In MVCC databases, that information CANNOT be stored anywhere convenient, because queries requested by transactions started at different points in time must get different answers.

I think we need to add these questions and their answers to the FAQ so that the answer can be "See FAQ item #17" rather than people having to gratuitously explain it over and over and over again.
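For case #1, the standard workaround given on this list at the time (not stated in the message above, but well known) was to rewrite max(foo) as an ORDER BY ... LIMIT 1 query, which the planner can satisfy by reading one entry from the end of the index instead of scanning the table. A minimal sketch of the rewrite, using SQLite in place of PostgreSQL purely to keep it self-contained (table and column names are invented):

```python
import sqlite3

# Populate a small table with an indexed column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.execute("CREATE INDEX bar_foo_idx ON bar (foo)")
conn.executemany("INSERT INTO bar (foo) VALUES (?)",
                 [(i,) for i in range(1000)])

# The rewrite: same answer as SELECT max(foo) FROM bar, but expressible
# as a single index probe rather than an aggregate over every row.
(fast_max,) = conn.execute(
    "SELECT foo FROM bar ORDER BY foo DESC LIMIT 1").fetchone()
(slow_max,) = conn.execute("SELECT max(foo) FROM bar").fetchone()
print(fast_max, slow_max)  # 999 999
```

The two queries are semantically equivalent for non-empty tables; the point is only that the ORDER BY/LIMIT form gives the planner an index-friendly shape.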
Re: [PERFORM] Select max(foo) and select count(*) optimization
Paul Tuckfield [EMAIL PROTECTED] writes: In the case of select count(*), one optimization is to do a scan of the primary key, not the table itself, if the table has a primary key. In a certain commercial, lesser database, this is called an index fast full scan. It would be important to scan the index in physical order (sequential physical I/O) and not in key order (random physical I/O).

That won't work, because you still have to hit the actual tuple to determine visibility.

-Doug
Re: [PERFORM] Select max(foo) and select count(*) optimization
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Paul Tuckfield) wrote: Not that I'm offering to do the programming, mind you :) but in the case of select count(*), one optimization is to do a scan of the primary key, not the table itself, if the table has a primary key. In a certain commercial, lesser database, this is called an index fast full scan. It would be important to scan the index in physical order (sequential physical I/O) and not in key order (random physical I/O).

The problem is that this optimization does not actually work. The index does not contain transaction visibility information, so you have to go to the pages of tuples in order to determine whether any given tuple is visible.

I'm guessing the payoff as well as real-world utility of a max(xxx) optimization are much higher than a count(*) optimization, though.

That's probably so. In many cases approximations, such as page counts, may be good enough; and pray consider that an approximation is probably all you were getting from the database systems that had an optimization to store the count in a counter.

-- let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;; http://www3.sympatico.ca/cbbrowne/linuxxian.html "No, you misunderstand. Microsoft asked some hackers how they could make their system secure - the hackers replied 'Turn it off.' So they did." -- Anthony Ord
Re: [PERFORM] Select max(foo) and select count(*) optimization
[EMAIL PROTECTED] (Rod Taylor) wrote: Especially with very large tables, hearing the disks grind as Postgres scans every single row in order to determine the number of rows in a table or the max value of a column (even a primary key created from a sequence) is pretty painful. If the implementation is not too horrendous, this is an area where an orders-of-magnitude performance increase can be had.

Actually, it's very painful. For MySQL, they've accepted the concurrency hit in order to accomplish it; PostgreSQL would require a more subtle approach. Anyway, with Rules you can force this:

ON INSERT UPDATE counter SET tablecount = tablecount + 1;
ON DELETE UPDATE counter SET tablecount = tablecount - 1;

You need to create a table counter with a single row that will keep track of the number of rows in the table. Just remember, you've now serialized all writes to the table, but in your situation it may be worthwhile.

There's a still more subtle approach that relieves the serialization constraint, at some cost...

- You add rules that _insert_ a row each time there is an insert/delete:
  ON INSERT insert into counts(table, value) values ('our_table', 1);
  ON DELETE insert into counts(table, value) values ('our_table', -1);
- The select count(*) from our_table is replaced by select sum(value) from counts where table = 'our_table'.
- Periodically, a compression process goes through and either:
  a) Deletes the rows for 'our_table' and replaces them with one row holding a conventionally-scanned count(*) value, or
  b) Computes select table, sum(value) as value from counts group by table, deletes all the existing rows in counts, and replaces them by the preceding selection, or
  c) Perhaps does something incremental that's like b), but which only processes parts of the counts table at once. Process 500 rows, then COMMIT, or something of the sort...

Note that this counts table can potentially grow _extremely_ large.
The win comes when it gets compressed: instead of scanning through 500K items, it index-scans through 27 rows, the one holding the 497000 that was the state of the table at the last compression, plus 26 singletons. A further win comes in if an INSERT that adds 50 rows can insert ('our_table', 50) into COUNTS, or a delete that eliminates 5000 rows puts in ('our_table', -5000). It's vital to run the compression reasonably often (much like VACUUM :-)) so that the COUNTS summary table stays relatively small.

-- let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];; http://www3.sympatico.ca/cbbrowne/wp.html "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian W. Kernighan
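The delta-table scheme above can be sketched end to end. The message proposes PostgreSQL rules to maintain the counts table server-side; the sketch below uses SQLite triggers instead, purely to keep the example self-contained and runnable. Table, column, and trigger names are invented for illustration.

```python
import sqlite3

# Each insert/delete on our_table appends a +1/-1 delta row to counts,
# so no single counter row is ever updated (no write serialization).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE our_table (id INTEGER PRIMARY KEY);
CREATE TABLE counts (tab TEXT, value INTEGER);
CREATE TRIGGER our_table_ins AFTER INSERT ON our_table
BEGIN INSERT INTO counts VALUES ('our_table', 1); END;
CREATE TRIGGER our_table_del AFTER DELETE ON our_table
BEGIN INSERT INTO counts VALUES ('our_table', -1); END;
""")
conn.executemany("INSERT INTO our_table (id) VALUES (?)",
                 [(i,) for i in range(500)])
conn.executemany("DELETE FROM our_table WHERE id = ?",
                 [(i,) for i in range(20)])

# Cheap "count(*)": sum the deltas instead of scanning our_table.
(total,) = conn.execute(
    "SELECT sum(value) FROM counts WHERE tab = 'our_table'").fetchone()

# Periodic compression (variant b above): collapse all deltas to one row
# so later sums scan a tiny table instead of one row per write.
conn.execute("DELETE FROM counts WHERE tab = 'our_table'")
conn.execute("INSERT INTO counts VALUES ('our_table', ?)", (total,))
print(total)  # 480
```

After compression, counts holds a single ('our_table', 480) row, matching the true row count of 500 inserts minus 20 deletes.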
Re: [PERFORM] optimizing Postgres queries
David Teran wrote:

Index Scan using key_value_meta_data__id_value__fk_index, key_value_meta_data__id_value__fk_index on KEY_VALUE_META_DATA t0 (cost=0.00..19.94 rows=14 width=75) (actual time=0.112..0.296 rows=25 loops=1)
  Index Cond: ((ID_VALUE = 21094) OR (ID_VALUE = 21103))
Total runtime: 0.429 ms

Much better. So I think I will first read more about this optimization stuff and regular maintenance things. This is something I like very much about FrontBase: no need for such things, simply start and run. But other things were not so fine ;-). Is there any hint where to start to understand more about this optimization problem?

Read the FAQ. There is an item about slow queries and indexes.

-- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [PERFORM] Select max(foo) and select count(*) optimization
On Tuesday 06 January 2004 07:16, Christopher Browne wrote: The problem is that this optimization does not actually work. The index does not contain transaction visibility information, so you have to go to the pages of tuples in order to determine if any given tuple is visible.

Adding transaction visibility information to indexes was rejected as an idea. At the time I proposed it, my idea was to vacuum tuples at the page level as postgresql pushes buffers out of the shared cache. If indexes had visibility information, they could be cleaned independently of the heap tuples. This wouldn't have eliminated vacuum entirely, but at least frequently hit data would be clean. It was rejected because of the associated overhead. Just thought it worth a mention...

Shridhar
Re: [PERFORM] Select max(foo) and select count(*) optimization
On Tuesday 06 January 2004 01:22, Rod Taylor wrote: Anyway, with Rules you can force this: ON INSERT UPDATE counter SET tablecount = tablecount + 1; ON DELETE UPDATE counter SET tablecount = tablecount - 1;

That would generate a lot of dead tuples in the counter table. How about:

select relpages, reltuples from pg_class where relname = 'tablename';

Assuming the stats are recent enough, it would be much faster, and reasonably accurate.

Shridhar
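Spelled out as a statement (the table name is a placeholder; note that relpages and reltuples are only refreshed by VACUUM and ANALYZE, so this is an estimate, not a transactionally exact count):

```sql
-- Planner statistics for one table: an estimate, not an exact count.
SELECT relpages,   -- table size in pages
       reltuples   -- estimated number of rows, as of the last VACUUM/ANALYZE
FROM pg_class
WHERE relname = 'our_table';
```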