Re: [PERFORM] anti-join chosen even when slower than old plan
2010/11/11 Tom Lane t...@sss.pgh.pa.us:
>> Robert Haas robertmh...@gmail.com writes: Yeah. For Kevin's case, it seems like we want the caching percentage to vary not so much based on which table we're hitting at the moment but on how much of it we're actually reading.
> Well, we could certainly take the expected number of pages to read and compare that to effective_cache_size. The thing that's missing in that equation is how much other stuff is competing for cache space. I've tried to avoid having the planner need to know the total size of the database cluster, but it's kind of hard to avoid that if you want to model this honestly. Would it be at all workable to have an estimate that so many megs of a table are in cache

Yes, with Linux ... at least.

> (independently of any other table), and then we could scale the cost based on the expected number of pages to read versus that number? The trick here is that DBAs really aren't going to want to set such a per-table number (at least, most of the time) so we need a formula to get to a default estimate for that number based on some simple system-wide parameters. I'm not sure if that's any easier.

My current ideas for a future POC with pgfincore revolve around what has been said in this thread. I'd like to have some maintenance process, like auto-ANALYZE, which reports table and index usage of the OS cache; it might be the % of data in cache and the distribution of data in cache (perhaps only the last 15% of the table is in cache, or perhaps 15% of blocks with a more regular -- random? -- distribution).

My current stats around the OS cache show that the OS page cache remains stable: the number of blocks in memory per object does not change much once the application has run long enough. Those are good stats for automatically adjusting random_page_cost and seq_page_cost per table or index. The DBA provides random_page_cost and seq_page_cost that are accurate for the hardware; perhaps we may want a mem_page_cost (?). Or we just adjust random_page_cost and seq_page_cost based on the average amount of data in cache. Actually I think that updating *_page_cost while keeping the current design of effective_cache_size (in costsize.c) may rock enough.

> BTW, it seems that all these variants have an implicit assumption that if you're reading a small part of the table it's probably part of the working set; which is an assumption that could be 100% wrong. I don't see a way around it without trying to characterize the data access at an unworkably fine level, though.

Exactly.

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
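(For context, a minimal sketch of the kind of measurement pgfincore enables; the function call and result columns here are assumptions from memory of pgfincore's interface and may differ between versions, so check its documentation:)

    -- how much of a relation the OS page cache currently holds
    SELECT * FROM pgfincore('my_table');
    -- typical output includes the relation's total OS pages and
    -- the number of those pages resident in memory

    -- the closest existing planner knob (9.0, if I recall correctly)
    -- lets you vary the costs per tablespace rather than per table:
    ALTER TABLESPACE fast_disks SET (random_page_cost = 2.0);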
Re: [PERFORM] anti-join chosen even when slower than old plan
2010/11/11 Robert Haas robertmh...@gmail.com:
> On Thu, Nov 11, 2010 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes: Yeah. For Kevin's case, it seems like we want the caching percentage to vary not so much based on which table we're hitting at the moment but on how much of it we're actually reading.
>> Well, we could certainly take the expected number of pages to read and compare that to effective_cache_size. The thing that's missing in that equation is how much other stuff is competing for cache space. I've tried to avoid having the planner need to know the total size of the database cluster, but it's kind of hard to avoid that if you want to model this honestly.
> I'm not sure I agree with that. I mean, you could easily have a database that is much larger than effective_cache_size, but only that much of it is hot. Or, the hot portion could move around over time. And for reasons of both technical complexity and plan stability, I don't think we want to try to model that. It seems perfectly reasonable to say that reading 25% of effective_cache_size will be more expensive *per-page* than reading 5% of effective_cache_size, independently of what the total cluster size is.
>> Would it be at all workable to have an estimate that so many megs of a table are in cache (independently of any other table), and then we could scale the cost based on the expected number of pages to read versus that number? The trick here is that DBAs really aren't going to want to set such a per-table number (at least, most of the time) so we need a formula to get to a default estimate for that number based on some simple system-wide parameters. I'm not sure if that's any easier.
> That's an interesting idea. For the sake of argument, suppose we assume that a relation which is less than 5% of effective_cache_size will be fully cached; and for anything larger we'll assume that much of it is cached. Consider a 4GB machine with effective_cache_size set to 3GB. Then we'll assume that any relation smaller than 153MB is 100% cached, a 1GB table is 15% cached, and a 3GB table is 5% cached. That doesn't seem quite right, though: the caching percentage drops off very quickly after you exceed the threshold. *thinks* I'm wondering if we could do something with a formula like 3 * amount_of_data_to_read / (3 * amount_of_data_to_read + effective_cache_size) = percentage NOT cached. That is, if we're reading an amount of data equal to effective_cache_size, we assume 25% caching, and plot a smooth curve through that point. In the examples above, we would assume that a 150MB read is 87% cached, a 1GB read is 50% cached, and a 3GB read is 25% cached.

But isn't that already the behavior of effective_cache_size usage? See index_pages_fetched() in costsize.c.

>> BTW, it seems that all these variants have an implicit assumption that if you're reading a small part of the table it's probably part of the working set; which is an assumption that could be 100% wrong. I don't see a way around it without trying to characterize the data access at an unworkably fine level, though.
> Me neither, but I think it will frequently be true, and I'm not sure it will hurt very much when it isn't. I mean, if you execute the same query repeatedly, that data will become hot soon enough. If you execute a lot of different queries that each touch a small portion of a big, cold table, we might underestimate the costs of the index probes, but so what? There's probably no better strategy for accessing that table anyway. Perhaps you can construct an example where this underestimate affects the join order in an undesirable fashion, but I'm having a hard time getting worked up about that as a potential problem case. Our current system - where we essentially assume that the caching percentage is uniform across the board - can have the same problem in less artificial cases.

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
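(As a quick sanity check of Robert's proposed curve, here his three examples are reproduced in psql, assuming effective_cache_size = 3GB = 3072MB:)

    -- percentage cached = 100 - 100 * 3d / (3d + e), with e = 3072MB
    SELECT read_mb,
           round(100 - 100.0 * 3 * read_mb / (3 * read_mb + 3072)) AS pct_cached
    FROM (VALUES (150), (1024), (3072)) AS v(read_mb);
    -- yields roughly 87, 50, and 25, matching the figures above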
Re: [PERFORM] anti-join chosen even when slower than old plan
I'd say there are two questions here:

1) Modify costs based on information about how much of the table is in cache. It would be great if this can be done, but I'd prefer to have it as admin knobs (because of plan stability). Maybe both admin and automatic ways can be followed, with some parallel (disableable) process modifying the knobs on the admin's behalf. In this case different strategies for automatically modifying the knobs can be applied.

2) Modify costs for retrieval of part of a table. Then you need to define "part". The current ways are partitioning and partial indexes. Something similar to a partial index might be created that has only a where clause and no data, but does have statistics and knobs (and maybe a personal buffer space, if those are introduced). I don't like gathering data about "the last X percent" or the like, because it works only when the data is clustered, and it's hard for the optimizer to decide whether scanning only those percents will be enough for a given query.

Best regards, Vitalii Tymchyshyn
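(For context, a minimal sketch of the two existing "part of a table" mechanisms mentioned in point 2; table and column names are hypothetical:)

    -- a partial index: only rows matching the predicate are indexed
    CREATE INDEX calls_recent_idx ON calls (placed_at)
        WHERE placed_at >= DATE '2010-11-01';

    -- partitioning via inheritance (the 8.x mechanism):
    -- recent data lives in its own child table with its own statistics
    CREATE TABLE calls_2010_11 (
        CHECK (placed_at >= DATE '2010-11-01'
           AND placed_at <  DATE '2010-12-01')
    ) INHERITS (calls);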
Re: [PERFORM] anti-join chosen even when slower than old plan
I supposed it was an answer to my mail, but I am not sure... please keep people CC'ed; it is easier to follow threads (at least for me).

2010/11/12 Vitalii Tymchyshyn tiv...@gmail.com:
> 1) Modify costs based on information about how much of the table is in cache. It would be great if this can be done, but I'd prefer to have it as admin knobs (because of plan stability). Maybe both admin and automatic ways can be followed, with some parallel (disableable) process modifying the knobs on the admin's behalf. In this case different strategies for automatically modifying the knobs can be applied.

The OS cache is usually stable enough to keep your plans stable too, I think.

> 2) Modify costs for retrieval of part of a table. Then you need to define "part". The current ways are partitioning and partial indexes. Something similar to a partial index might be created that has only a where clause and no data, but does have statistics and knobs (and maybe a personal buffer space, if those are introduced). I don't like gathering data about "the last X percent" or the like, because it works only when the data is clustered, and it's hard for the optimizer to decide whether scanning only those percents will be enough for a given query.

Modifying random_page_cost and seq_page_cost based on statistics about cached blocks can be improved if we know the distribution. It does not mean: we know we have the last 15% in cache, and we are going to request those 15%.

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [PERFORM] anti-join chosen even when slower than old plan
12.11.10 12:56, Cédric Villemain wrote:
> I supposed it was an answer to my mail, but I am not sure... please keep people CC'ed; it is easier to follow threads (at least for me).

OK

> 2010/11/12 Vitalii Tymchyshyn tiv...@gmail.com:
>> 1) Modify costs based on information about how much of the table is in cache. [...]
> The OS cache is usually stable enough to keep your plans stable too, I think.

Not if it is on the edge. There are always edge cases where data fluctuates near some threshold.

>> 2) Modify costs for retrieval of part of a table. Then you need to define "part". [...]
> Modifying random_page_cost and seq_page_cost based on statistics about cached blocks can be improved if we know the distribution. It does not mean: we know we have the last 15% in cache, and we are going to request those 15%.

You mean *_cost for the whole table, don't you? That is case (1) for me. Case (2) is when different cost values are selected based on what portion of the table is requested in the query. E.g. when we have data for the whole day in one table, data for the last hour is cached and all the other data is not. The optimizer then may use different *_cost for a query that requires all the data and for a query that requires only the last hour's data. But, as I've said, that is a much more complex task than (1).

Best regards, Vitalii Tymchyshyn
Re: [PERFORM] anti-join chosen even when slower than old plan
2010/11/12 Vitalii Tymchyshyn tiv...@gmail.com:
>> The OS cache is usually stable enough to keep your plans stable too, I think.
> Not if it is on the edge. There are always edge cases where data fluctuates near some threshold.

So far I have done some analysis on the topic with pgfincore. Tables and indexes at first have peaks and holes if you graph the % of blocks in cache after server start, but after a while it is more stable. Maybe there are applications where Linux fails to find a 'stable' page cache. If people are able to graph pgfincore results for all or part of the objects of their database, it will give us more robust analysis, especially when corner cases with the planner exist (like here).

> You mean *_cost for the whole table, don't you? That is case (1) for me.

Yes.

> Case (2) is when different cost values are selected based on what portion of the table is requested in the query. E.g. when we have data for the whole day in one table, data for the last hour is cached and all the other data is not. The optimizer then may use different *_cost for a query that requires all the data and for a query that requires only the last hour's data. But, as I've said, that is a much more complex task than (1).

I need to think about that some more.

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
[PERFORM] MVCC performance issue
This is my first post to this mailing list, and I would like to raise an issue that in my opinion is causing performance problems in PostgreSQL, especially in a transaction processing environment. In my company we have been using PostgreSQL for the last 8 years for our in-house developed billing system (telecom). In the last few months we started considering moving to another RDBMS just because of this issue.

After all these years, I believe that the biggest improvement that could be done, and that would boost overall performance especially for enterprise applications, is to improve the Multiversion Concurrency Control (MVCC) mechanism. In theory this seems to improve performance for SELECT queries, but on tables with very intensive and frequent updates even that is not fully true, because of the fragmentation of data caused by MVCC. I saw cases where a SELECT COUNT(*) on an empty (!!!) table (used as a buffer) took more than 40min to return a result! VACUUM is not a solution in my opinion, even though after the introduction of the autovacuum daemon the situation got much better.

PROBLEM DESCRIPTION
--
By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a new copy of the row in a new location. Any SELECT queries within the same session are accessing the new version of the row, and all other queries from other users are still accessing the old version. When the transaction is COMMITted, PostgreSQL marks the new version of the row as the active row and expires the old row, which remains dead; it is then up to the VACUUM procedure to recover the dead rows' space and make it available to the database engine. If the transaction is ROLLBACKed, the space reserved for the new version of the row is released. The result is huge fragmentation of the table space, unnecessary updates in all affected indexes, unnecessary costly I/O operations, poor performance of SELECTs that retrieve big record sets (i.e. reports etc.), and slower updates. As an example, consider updating the live balance of a customer for each phone call, where the entire customer record has to be duplicated again and again upon each call just for modifying a numeric value!

SUGGESTION
--
1) When a raw UPDATE is performed, store all new raw versions either in a separate temporary table space or in a reserved space at the end of each table (which can be allocated dynamically), etc.
2) Any SELECT queries within the same session will again be accessing the new version of the row.
3) Any SELECT queries from other users will still be accessing the old version.
4) When the UPDATE transaction is ROLLBACKed, just release the space used in the new temporary location.
5) When the UPDATE transaction is COMMITted, try to LOCK the old version and overwrite it at the same physical location (NO FRAGMENTATION).
6) A similar mechanism can be applied to INSERTs and DELETEs.
7) In case the transaction was COMMITted, the temporary location can be either released or archived/cleaned on a pre-scheduled basis. This would possibly allow the introduction of a TRANSACTION LOG backup mechanism as a next step.
8) After that, VACUUM will have to deal only with deletions!!!

I understand that my suggestion may be too simplified, and also that there are many implementation details and difficulties that I am not aware of. I strongly believe that the outcome of a discussion regarding this issue will be helpful.

Best Regards,

Kyriacos Kyriacou
Senior Developer/DBA
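(For readers wanting to check whether they are hitting the same dead-row bloat: on 8.3 and later the statistics collector exposes per-table dead-tuple counts. A minimal sketch; the n_dead_tup and last_autovacuum columns do not exist on 8.2:)

    -- tables with the most dead row versions
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;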
Re: [PERFORM] MVCC performance issue
On Fri, Nov 12, 2010 at 03:47:30PM +0200, Kyriacos Kyriacou wrote:
> This is my first post to this mailing list, and I would like to raise an issue that in my opinion is causing performance problems in PostgreSQL, especially in a transaction processing environment. [...]
> [... full problem description and suggestion snipped; see the original post above ...]

I cannot speak to your suggestion, but it sounds like you are not vacuuming enough, and a lot of the bloat/randomization would be helped by making use of HOT updates, in which the updates are all in the same page and are reclaimed almost immediately.

Regards, Ken
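(A minimal illustration of what qualifies as a HOT update on 8.3 and later; table and column names are hypothetical. The key requirements are that the updated column is not indexed and that the page has free room:)

    -- balance is NOT indexed, so this update can be HOT:
    -- the new version stays on the same heap page, and dead versions
    -- can be reclaimed without waiting for VACUUM
    UPDATE customers SET balance = balance - 0.05 WHERE customer_id = 42;

    -- how many updates were HOT (these columns exist in 8.3+):
    SELECT relname, n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 'customers';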
Re: [PERFORM] MVCC performance issue
On 12 November 2010 13:47, Kyriacos Kyriacou kyriac...@prime-tel.com wrote:
> This is my first post to this mailing list, and I would like to raise an issue that in my opinion is causing performance problems in PostgreSQL, especially in a transaction processing environment. [...]
> [... full problem description and suggestion snipped; see the original post above ...]

Which version of PostgreSQL are you basing this on?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
Re: [PERFORM] postmaster consuming /lots/ of memory with hash aggregate. why?
On Thu, Nov 11, 2010 at 10:38 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
> 2010/11/12 Jon Nelson jnelson+pg...@jamponi.net:
>> On Thu, Nov 11, 2010 at 10:26 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
>>> Hello, look at the EXPLAIN ANALYZE command. Probably your statistics are out of date, and then the planner can be confused. EXPLAIN ANALYZE will show it.
>> As I noted earlier, I did set statistics to 1000 and re-ran VACUUM ANALYZE, and the plan did not change.
> that change can do nothing. this is the default in the config. did you use ALTER TABLE ALTER COLUMN SET STATISTICS = ... ? and ANALYZE?

No. To be clear: are you saying that changing the value for default_statistics_target, restarting postgresql, and re-running VACUUM ANALYZE does *not* change the statistics for columns created/populated *prior* to that sequence of operations, and that one /must/ use ALTER TABLE ALTER COLUMN SET STATISTICS ... and re-ANALYZE? That does not jive with the documentation, which appears to suggest that setting a new default_statistics_target, restarting postgresql, and then re-ANALYZE'ing a table should be sufficient (provided the columns have not had a statistics target explicitly set). What other diagnostics can I provide?

>> This still doesn't answer the 4 row question, though. It seems absurd to me that the planner would give up and just use 4 rows (0.02 percent of the actual result).
> there can be some not-well-supported operation; the planner then uses some % of rows instead of a statistics-based estimation

The strange thing is that the value 4 keeps popping up in totally different contexts, with different tables, databases, etc. I tried digging through the code, and the only thing I found was that numGroups was being set to 4, but I couldn't see where.

--
Jon
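(One way to check whether a new statistics target actually took effect is to look at the pg_stats view directly; 'my_table' is a placeholder. The number of entries in histogram_bounds should roughly track the statistics target in force at the last ANALYZE:)

    -- inspect what ANALYZE actually stored for each column
    SELECT attname, n_distinct, histogram_bounds
    FROM pg_stats
    WHERE tablename = 'my_table';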
Re: [PERFORM] MVCC performance issue
On Fri, Nov 12, 2010 at 5:52 AM, Kenneth Marshall k...@rice.edu wrote:
> I cannot speak to your suggestion, but it sounds like you are not vacuuming enough, and a lot of the bloat/randomization would be helped by making use of HOT updates, in which the updates are all in the same page and are reclaimed almost immediately.
> Regards, Ken

IIRC, HOT only operates on non-indexed columns, so if your tables are heavily indexed you won't get the full benefit of HOT. I could be wrong though.
Re: [PERFORM] MVCC performance issue
On Fri, Nov 12, 2010 at 07:34:36AM -0800, bricklen wrote:
> IIRC, HOT only operates on non-indexed columns, so if your tables are heavily indexed you won't get the full benefit of HOT. I could be wrong though.

That is true, but if they are truly having as big a bloat problem as the message indicated, it would be worth designing the schema to leverage HOT for the very frequent updates.

Cheers, Ken
Re: [PERFORM] MVCC performance issue
12.11.10 15:47, Kyriacos Kyriacou wrote:
> PROBLEM DESCRIPTION
> --
> As an example, consider updating the live balance of a customer for each phone call, where the entire customer record has to be duplicated again and again upon each call just for modifying a numeric value!

Have you considered splitting the customer record into two tables: one with mostly read-only data, and one with the data that is updated often? Such a 1-1 relationship can make a huge difference to performance in your case. You can even try to simulate the old schema by using an updateable view.

Best regards, Vitalii Tymchyshyn
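(A rough sketch of that split, with hypothetical names; on 8.2 the view can be made updateable with rewrite rules, omitted here for brevity:)

    -- wide, mostly read-only data
    CREATE TABLE customer_info (
        customer_id integer PRIMARY KEY,
        name        text,
        address     text
        -- ... many more rarely-changing columns
    );

    -- narrow, hot row that gets rewritten on every call
    CREATE TABLE customer_balance (
        customer_id integer PRIMARY KEY REFERENCES customer_info,
        balance     numeric NOT NULL
    );

    -- simulate the old single-table schema for readers
    CREATE VIEW customers AS
        SELECT i.customer_id, i.name, i.address, b.balance
        FROM customer_info i
        JOIN customer_balance b USING (customer_id);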
Re: [PERFORM] postmaster consuming /lots/ of memory with hash aggregate. why?
2010/11/12 Jon Nelson jnelson+pg...@jamponi.net:
> [... earlier exchange snipped ...]
> No. To be clear: are you saying that changing the value for default_statistics_target, restarting postgresql, and re-running VACUUM ANALYZE does *not* change the statistics for columns created/populated *prior* to that sequence of operations, and that one /must/ use ALTER TABLE ALTER COLUMN SET STATISTICS ... and re-ANALYZE?

Yes. But I was wrong: the documentation is correct. The problem is elsewhere.

> That does not jive with the documentation, which appears to suggest that setting a new default_statistics_target, restarting postgresql, and then re-ANALYZE'ing a table should be sufficient (provided the columns have not had a statistics target explicitly set). What other diagnostics can I provide?
>> there can be some not-well-supported operation; the planner then uses some % of rows instead of a statistics-based estimation
> The strange thing is that the value 4 keeps popping up in totally different contexts, with different tables, databases, etc. I tried digging through the code, and the only thing I found was that numGroups was being set to 4, but I couldn't see where.

If I remember well, you can set the number of distinct values with ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...) -- maybe use it. See http://www.postgresql.org/docs/9.0/interactive/sql-altertable.html

Regards, Pavel Stehule
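(A sketch of the two per-column overrides discussed in this exchange; n_distinct requires 9.0, and the names are placeholders:)

    -- raise the sample size for one column, then re-gather stats
    ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 1000;
    ANALYZE my_table;

    -- 9.0+: hard-wire the distinct-value estimate the planner uses
    ALTER TABLE my_table ALTER COLUMN my_column SET (n_distinct = 500);
    ANALYZE my_table;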
Re: [PERFORM] MVCC performance issue
We are still using PostgreSQL 8.2.4. We are running a 24x7 system and the database size is over 200GB, so an upgrade is not an easy decision! I have it in my plans, so in the next few months I will set up new servers and upgrade to version 9.

> Which version of PostgreSQL are you basing this on?
Re: [PERFORM] MVCC performance issue
This was done already as a workaround after identifying this problem. I just gave it as an example.

-----Original Message-----
From: Vitalii Tymchyshyn [mailto:tiv...@gmail.com]
Sent: Friday, November 12, 2010 5:54 PM
To: Kyriacos Kyriacou
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] MVCC performance issue

> Have you considered splitting the customer record into two tables: one with mostly read-only data, and one with the data that is updated often? Such a 1-1 relationship can make a huge difference to performance in your case. You can even try to simulate the old schema by using an updateable view.
> Best regards, Vitalii Tymchyshyn
Re: [PERFORM] MVCC performance issue
On 11/12/2010 7:47 AM, Kyriacos Kyriacou wrote:
> SUGGESTION
> --
> 1) When a raw UPDATE is performed, store all new raw versions either in a separate temporary table space or in a reserved space at the end of each table (which can be allocated dynamically), etc.

Your use of "raw" is confusing; I'll just ignore the word. New row versions are already stored in a dynamically allocated spot, right along with the other versions of the table. You are assuming that getting to the correct version of the row is very slow? That's only going to be the case if you have lots and lots of versions. And your solution will not actually help if there are lots of versions. While the one person who is hitting the most recent version might be OK, everyone else will still have to search for theirs, just as they do now.

> 2) Any SELECT queries within the same session will again be accessing the new version of the row.

I don't see how this is different from what we currently have. The "same session" version could have been dropped from your separate table space, and then you'd have to go search through previous versions of the row... exactly like you do now. And worse, if you don't want to drop your version of the row from the separate table space until you commit/rollback, then no other user can start a transaction on that table until you're done! Oh no! You have reads and writes blocking each other.

> 3) Any SELECT queries from other users will still be accessing the old version.

Again... the same.

> 4) When the UPDATE transaction is ROLLBACKed, just release the space used in the new temporary location.

The current layout makes rollback very, very fast.

> 5) When the UPDATE transaction is COMMITted, try to LOCK the old version and overwrite it at the same physical location (NO FRAGMENTATION).

Not sure what you mean by lock, but a lock requires single-user access and slows things down. Right now we just bump the most active transaction number, which is very efficient and requires no locks. As soon as you lock anything, somebody, by definition, has to wait.

> 6) A similar mechanism can be applied to INSERTs and DELETEs.
> 7) In case the transaction was COMMITted, the temporary location can be either released or archived/cleaned on a pre-scheduled basis. This would possibly allow the introduction of a TRANSACTION LOG backup mechanism as a next step.

You are kind of assuming there will only ever be one new transaction and one old transaction. What about a case where 10 people start a transaction, and there are 10 versions of the row?

It seems to me like you are using very long transactions, which is causing lots of row versions to show up. Have you run EXPLAIN ANALYZE on your slow queries to find out the problems? Have you checked to see if you are CPU-bound or IO-bound? If you are dealing with lots of row versions, I'd assume you are CPU-bound. If you check your system, though, and see you are IO-bound, I think that might invalidate your assumptions above.

MVCC makes multi-user access very nice because readers and writers don't block each other, and there are very few locks. It does come with some kinks (gotta vacuum, keep transactions short, you must commit, etc.). SELECT COUNT(*), for example, is always going to be slow... just expect it; let's not destroy what works well about the database just to make it fast. Instead, find a better alternative so you don't have to run it. Just like any database, you have to work within MVCC's good points and try to avoid the bad spots.

-Andy
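(One common alternative to the slow SELECT COUNT(*): maintain a counter in a side table with a trigger. A hedged sketch, not from the thread; all names are made up, and note that the single counter row itself becomes an update hotspot and a serialization point for concurrent writers:)

    CREATE TABLE buffer_rowcount (n bigint NOT NULL);
    INSERT INTO buffer_rowcount VALUES (0);

    CREATE OR REPLACE FUNCTION buffer_count_trg() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE buffer_rowcount SET n = n + 1;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE buffer_rowcount SET n = n - 1;
        END IF;
        RETURN NULL;  -- AFTER trigger: return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER buffer_count
        AFTER INSERT OR DELETE ON buffer_table
        FOR EACH ROW EXECUTE PROCEDURE buffer_count_trg();

    -- instant "count(*)"
    SELECT n FROM buffer_rowcount;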
Re: [PERFORM] MVCC performance issue
Ah, this is a very old version. If you can take advantage of a version with HOT support, you should be much, much happier.

Cheers, Ken

On Fri, Nov 12, 2010 at 06:14:00PM +0200, Kyriacos Kyriacou wrote:
> We are still using PostgreSQL 8.2.4. We are running a 24x7 system and the database size is over 200GB, so an upgrade is not an easy decision! I have it in my plans, so in the next few months I will set up new servers and upgrade to version 9.
Re: [PERFORM] MVCC performance issue
On 12 November 2010 16:14, Kyriacos Kyriacou kyriac...@prime-tel.com wrote:
> We are still using PostgreSQL 8.2.4. We are running a 24x7 system and the database size is over 200GB, so an upgrade is not an easy decision! I have it in my plans, so in the next few months I will set up new servers and upgrade to version 9.

Everything changed, performance-wise, in 8.3, and there have also been improvements since then too. So rather than completely changing your database platform, at least take a look at what work has gone into Postgres since the version you're using: http://www.postgresql.org/docs/8.3/static/release-8-3.html#AEN87319

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
Re: [PERFORM] MVCC performance issue
On Nov 12, 2010, at 8:14 AM, Kyriacos Kyriacou wrote:
> We are still using PostgreSQL 8.2.4. We are running a 24x7 system and the database size is over 200GB, so an upgrade is not an easy decision!

This is why we have Slony: so you can slowly upgrade your 200GB while you're live, and then only suffer a minute or so of downtime while you switch over. Even if you only install Slony for the sake of the upgrade and then uninstall it after you're done, that seems well worth it to me rather than running on 8.2.4 for a while.

Note there were some changes between 8.2 and 8.3 in regards to casting that might make you revisit your application.
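(For the record, the casting change referred to is 8.3's removal of many implicit casts to text. A small example of code that worked on 8.2 but fails on 8.3; table and column names are hypothetical, and the exact error text is from memory, so verify against the 8.3 release notes:)

    -- 8.2: the integer column is implicitly cast to text;
    -- 8.3: fails with "operator does not exist: integer ~~ unknown"
    SELECT * FROM calls WHERE call_id LIKE '42%';

    -- 8.3: make the cast explicit
    SELECT * FROM calls WHERE call_id::text LIKE '42%';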
Re: [PERFORM] MVCC performance issue
Kyriacos Kyriacou kyriac...@prime-tel.com writes:
> We are still using PostgreSQL 8.2.4.

In that case you don't have HOT updates, so it seems to me to be a little premature to be proposing a 100% rewrite of the system to fix your problems.

regards, tom lane
Re: [PERFORM] anti-join chosen even when slower than old plan
On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote:
>> I'm wondering if we could do something with a formula like 3 * amount_of_data_to_read / (3 * amount_of_data_to_read + effective_cache_size) = percentage NOT cached. That is, if we're reading an amount of data equal to effective_cache_size, we assume 25% caching, and plot a smooth curve through that point. In the examples above, we would assume that a 150MB read is 87% cached, a 1GB read is 50% cached, and a 3GB read is 25% cached.
> But isn't that already the behavior of effective_cache_size usage?

No.

The idea of trying to know what is actually in cache strikes me as an almost certain non-starter. It can change very quickly, even as a result of the query you're actually running. And getting the information we'd need in order to do it that way would be very expensive, when it can be done at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] MVCC performance issue
To be honest, I just now read about HOT (Heap-Only Tuples), and it seems that it will help a lot. Thanks for your point.

Kyriacos

-----Original Message-----
From: Kenneth Marshall [mailto:k...@rice.edu]
Sent: Friday, November 12, 2010 6:22 PM
To: Kyriacos Kyriacou
Cc: Thom Brown; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] MVCC performance issue

> Ah, this is a very old version. If you can take advantage of a version with HOT support, you should be much, much happier.
> Cheers, Ken
Re: [PERFORM] anti-join chosen even when slower than old plan
Robert Haas robertmh...@gmail.com writes:
> On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote:
>>> I'm wondering if we could do something with a formula like 3 * amount_of_data_to_read / (3 * amount_of_data_to_read + effective_cache_size) = percentage NOT cached. [...]
>> But isn't that already the behavior of effective_cache_size usage?
> No.

I think his point is that we already have a proven formula (Mackert-Lohman) and shouldn't be inventing a new one out of thin air. The problem is to figure out what numbers to apply the M-L formula to.

I've been thinking that we ought to try to use it in the context of the query as a whole rather than for individual table scans; the current usage already has some of that flavor, but we haven't taken it to the logical conclusion.

> The idea of trying to know what is actually in cache strikes me as an almost certain non-starter.

Agreed on that point. Plan stability would go out the window.

regards, tom lane
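(For readers: the Mackert-Lohman estimate, paraphrased from memory of the comment above index_pages_fetched() in costsize.c -- verify against the source. With T = pages in the table, N = tuples fetched, and b = effective_cache_size expressed in pages, the expected number of page fetches is:)

    PF = min(2TN / (2T + N), T)                    when T <= b
    PF = 2TN / (2T + N)                            when T > b and N <= 2Tb/(2T - b)
    PF = b + (N - 2Tb/(2T - b)) * (T - b)/T        when T > b and N >  2Tb/(2T - b)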
Re: [PERFORM] MVCC performance issue
My suggestion differs in just a single point from what MVCC is currently doing (btw, I never said that MVCC is bad).

NOW
===
On COMMIT, the previous version of the record is expired, and the new version of the record (created in a new dynamically allocated spot, as you said) is set as active.

MY SUGGESTION
===
On COMMIT, write the new version's data over the same physical location the initial version was in, and release the space used to keep the new version (which was dynamically allocated).

The rest are all the same! I do not think that this breaks anything, and I still believe that it might help. I will try to plan an upgrade to the newest version as soon as possible. Reading a few words about HOT updates, it seems that it is more or less similar to what I have described and will be very helpful.

Kyriacos

-----Original Message-----
From: Andy Colson [mailto:a...@squeakycode.net]
Sent: Friday, November 12, 2010 6:22 PM
To: Kyriacos Kyriacou
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] MVCC performance issue

> [... full reply snipped; see Andy's message above ...]
Re: [PERFORM] MVCC performance issue
On Fri, Nov 12, 2010 at 9:22 AM, Thom Brown t...@linux.com wrote:
> On 12 November 2010 16:14, Kyriacos Kyriacou kyriac...@prime-tel.com wrote:
>> We are still using PostgreSQL 8.2.4. We are running a 24x7 system and the database size is over 200GB, so an upgrade is not an easy decision! I have it in my plans, so in the next few months I will set up new servers and upgrade to version 9.
> Everything changed, performance-wise, in 8.3, and there have also been improvements since then too. So rather than completely changing your database platform, at least take a look at what work has gone into Postgres since the version you're using.

Agreed. 8.3 was a colossal step forward for pg performance. 8.4 was a huge step ahead in maintenance with the on-disk FSM. If I were upgrading from 8.2 today, I would go straight to 8.4 and skip 8.3, since 8.3 is a much bigger pain in the butt to configure for FSM stuff.
Re: [PERFORM] MVCC performance issue
OK, in general you have to pay for MVCC one way or another. Many databases make you pay as you go, so to speak, by storing all the MVCC info in a log to be applied at some future date. With other databases you can pay later, by storing all the MVCC info in the table itself. Both have similar costs, but the latter can punish you harshly if you let the MVCC data stored in the database get out of hand.

8.3 and above are much more aggressive about autovacuuming, and on bigger hardware you can make it VERY aggressive and keep the bloat out while maintaining good throughput. On some servers I set up 4 or 6 or 8 autovacuum threads to keep up. On another db you might instead be adding more drives to make some other part faster.

For batch processing, storing all MVCC data in the data store can be problematic, but for more normal work where you're changing 1% of a table all the time it can be very fast. Some other databases will just run out of space to store transactions and roll back everything you've done. PostgreSQL will gladly let you shoot yourself in the foot by bloating the data store with successive whole-table updates without vacuuming in between.

Bottom line: if your hardware can't keep up, it can't keep up. If vacuum capsizes your IO and still can't keep up, then you need more disks and/or better storage subsystems. A 32-disk array with a single controller goes for ~$7k to $10k, and you can sustain some pretty amazing throughput on that kind of IO subsystem. If you're doing batch processing, you can get a big return by just making sure you vacuum after each mass update. Especially if you are on a single-use machine with no cost delays for vacuum, running a vacuum on a freshly worked table should be pretty fast.
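(A sketch of what "aggressive" might look like in postgresql.conf on 8.3 and later; the exact values are workload-dependent assumptions, not recommendations from the thread:)

    autovacuum = on
    autovacuum_max_workers = 6            # the "threads" mentioned above (8.3+)
    autovacuum_naptime = 15s              # check for work more often
    autovacuum_vacuum_scale_factor = 0.05 # vacuum after ~5% of a table changes
    autovacuum_vacuum_cost_delay = 0      # don't throttle if the IO can take it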
Re: [PERFORM] MVCC performance issue
On Fri, Nov 12, 2010 at 9:19 AM, Ben Chobot be...@silentmedia.com wrote:
> On Nov 12, 2010, at 8:14 AM, Kyriacos Kyriacou wrote:
>> We are still using PostgreSQL 8.2.4. We are running a 24x7 system and the database size is over 200GB, so an upgrade is not an easy decision!
> This is why we have Slony: so you can slowly upgrade your 200GB while you're live, and then only suffer a minute or so of downtime while you switch over. Even if you only install Slony for the sake of the upgrade and then uninstall it after you're done, that seems well worth it to me rather than running on 8.2.4 for a while. Note there were some changes between 8.2 and 8.3 in regards to casting that might make you revisit your application.

I work in a Slony shop, and we used Slony to upgrade from 8.2 to 8.3; it was a breeze. Of course, we practiced on some test machines first, but it went really smoothly. Our total downtime, due to necessary testing before going live again, was less than 20 minutes.
Re: [PERFORM] MVCC performance issue
HOT also usually requires setting FILLFACTOR to something other than the default for your table, so that there is guaranteed room in the page to modify data without allocating a new page. If you have fillfactor=75, then basically this proposal is already done -- each page has 25% temp space for updates in it. With the caveat that this is only true for updates to columns without indexes.

On Nov 12, 2010, at 7:37 AM, Kenneth Marshall wrote:
> On Fri, Nov 12, 2010 at 07:34:36AM -0800, bricklen wrote:
>> IIRC, HOT only operates on non-indexed columns, so if your tables are heavily indexed you won't get the full benefit of HOT. I could be wrong though.
> That is true, but if they are truly having as big a bloat problem as the message indicated, it would be worth designing the schema to leverage HOT for the very frequent updates.
> Cheers, Ken
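(A minimal sketch of setting that up; 'customers' and its index name are placeholders. Note the new fillfactor only applies to newly written pages, so existing data needs a rewrite, e.g. via CLUSTER, to benefit:)

    -- leave 25% of every heap page free for HOT updates
    ALTER TABLE customers SET (fillfactor = 75);

    -- rewrite existing pages so they honor the new setting (8.3+ syntax)
    CLUSTER customers USING customers_pkey;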
Re: [PERFORM] MVCC performance issue
On Nov 12, 2010, at 9:13 AM, Kyriacos Kyriacou wrote:
> My suggestion differs in just a single point from what MVCC is currently doing (btw, I never said that MVCC is bad).
> NOW: on COMMIT, the previous version of the record is expired, and the new version of the record (created in a new dynamically allocated spot, as you said) is set as active.
> MY SUGGESTION: on COMMIT, write the new version's data over the same physical location the initial version was in, and release the space used to keep the new version (which was dynamically allocated).

But what about other transactions that can still see the old version? You can't overwrite the old data if there are any other transactions open in the system at all. You have to have a mechanism to keep the old copy around for a while.

> The rest are all the same! I do not think that this breaks anything, and I still believe that it might help. I will try to plan an upgrade to the newest version as soon as possible. Reading a few words about HOT updates, it seems that it is more or less similar to what I have described and will be very helpful.
> Kyriacos
Re: [PERFORM] anti-join chosen even when slower than old plan
On Fri, Nov 12, 2010 at 11:43 AM, Tom Lane t...@sss.pgh.pa.us wrote: I think his point is that we already have a proven formula (Mackert-Lohmann) and shouldn't be inventing a new one out of thin air. The problem is to figure out what numbers to apply the M-L formula to. I'm not sure that's really measuring the same thing, although I'm not opposed to using it if it produces reasonable answers. I've been thinking that we ought to try to use it in the context of the query as a whole rather than for individual table scans; the current usage already has some of that flavor but we haven't taken it to the logical conclusion. That's got a pretty severe chicken-and-egg problem though, doesn't it? You're going to need to know how much data you're touching to estimate the costs so you can pick the best plan, but you can't know how much data will ultimately be touched until you've got the whole plan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
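For reference, a sketch of the Mackert-Lohman estimate as the comment above index_pages_fetched() in costsize.c presents it (reconstructed here, so verify against the source). With T pages and N tuples in the table, selectivity s (so Ns tuples are fetched), and b pages of cache available (derived from effective_cache_size), the expected number of page fetches PF is:

\[
PF =
\begin{cases}
\min\!\left(\dfrac{2TNs}{2T + Ns},\; T\right) & \text{when } T \le b \\[2ex]
\dfrac{2TNs}{2T + Ns} & \text{when } T > b \text{ and } Ns \le \dfrac{2Tb}{2T - b} \\[2ex]
b + \dfrac{\left(Ns - \dfrac{2Tb}{2T - b}\right)(T - b)}{T} & \text{when } T > b \text{ and } Ns > \dfrac{2Tb}{2T - b}
\end{cases}
\]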
Re: [PERFORM] questions regarding shared_buffers behavior
On Sun, Nov 7, 2010 at 10:03 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2010/11/8 Mark Rostron mrost...@ql2.com: What is the procedure that postgres uses to decide whether or not a table/index block will be left in the shared_buffers cache at the end of the operation? The only special cases are for sequential scans and VACUUM, which in some cases continuously re-use a small section of the buffer cache instead. Thanks - the part about sequential scans and the re-use of a small section of shared_buffers is the bit I was interested in. I don't suppose you would be able to tell me how large that re-useable area might be? The ring is 256KB per seqscan and 256KB per vacuum. I suggest you go read src/backend/storage/buffer/README. Note that there is a different, higher limit for the bulk write strategy when using COPY IN or CTAS. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
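As an aside, the contrib module pg_buffercache can show how much of each relation currently sits in shared_buffers. A sketch (this assumes the module is installed; on 8.x that means running the SQL script shipped in contrib):

-- top relations by number of 8KB buffers currently held in shared_buffers
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;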
Re: [PERFORM] Why dose the planner select one bad scan plan.
Thanks for your answer! And I am sorry for treating the question as a bug and sending it to the 'bugs' mailing list. But I have doubts about your answer. I think the essence of the problem is when the planner selects 'Bitmap Index Scan' and how the planner computes the cost of 'Bitmap Index Scan'. Tom Lane said “In principle a bitmap index scan should be significantly faster if the index can return the bitmap more or less natively rather than having to construct it. My recollection though is that a significant amount of work is needed to make that happen, and that there is no existing patch that tackled the problem. So I'm not sure that this report should be taken as indicating that there's no chance of a SELECT performance improvement. What it does say is that we have to do that work if we want to make bitmap indexes useful.” Okay, I want to know how the planner computes the cost of constructing the bitmap. And when the planner computes the cost of 'Bitmap Index Scan', does it consider the influence of the memory cache? When I do not clear the memory cache, I find that the 'Bitmap Index Scan' is really faster than the 'Seq Scan'. Best Regards! Asen
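A minimal sketch of how to compare the two plans here (the table and data are made up): let the planner choose, then disable bitmap scans and compare the estimated costs that EXPLAIN reports.

CREATE TABLE bitmap_demo AS
  SELECT (random() * 1000)::int AS k, repeat('x', 100) AS pad
  FROM generate_series(1, 1000000) AS g;
CREATE INDEX bitmap_demo_k_idx ON bitmap_demo (k);
ANALYZE bitmap_demo;

EXPLAIN SELECT * FROM bitmap_demo WHERE k BETWEEN 1 AND 50;
-- typically a Bitmap Index Scan feeding a Bitmap Heap Scan

SET enable_bitmapscan = off;
EXPLAIN SELECT * FROM bitmap_demo WHERE k BETWEEN 1 AND 50;
-- compare the estimated cost of whatever plan it falls back to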
[PERFORM] MVCC performance issue
This is my first post in this mailing list and I would like to raise an issue that in my opinion is causing performance problems in PostgreSQL, especially in a transaction processing environment. In my company we have been using PostgreSQL for the last 8 years for our in-house developed billing system (telecom). In the last few months we started considering moving to another RDBMS just because of this issue. After all these years, I believe that the biggest improvement that could be done, and that would boost overall performance especially for enterprise applications, would be to improve the Multiversion Concurrency Control (MVCC) mechanism. In theory MVCC seems to improve performance for SELECT queries, but on tables with very intensive and frequent updates even that is not fully true, because of the fragmentation of data caused by MVCC. I saw cases where a SELECT COUNT(*) on an empty (!!!) table (used as a buffer) took more than 40min to return a result! VACUUM is not a solution in my opinion, even though the situation got much better after the introduction of the autovacuum daemon.

PROBLEM DESCRIPTION
--
By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a new copy of the row in a new location. Any SELECT queries within the same session access the new version of the row, and all queries from other users still access the old version. When the transaction COMMITs, PostgreSQL marks the new version of the row as the active row and expires the old row, which remains dead; it is then up to the VACUUM procedure to recover the dead rows' space and make it available to the database engine again. If the transaction is ROLLBACKed, the space reserved for the new version of the row is released. The result is huge fragmentation of the table space, unnecessary updates in all affected indexes, unnecessary costly I/O operations, poor performance on SELECTs that retrieve big record sets (i.e. reports etc), and slower updates. As an example, consider updating the live balance of a customer for each phone call, where the entire customer record has to be duplicated again and again on each call just to modify a numeric value!

SUGGESTION
--
1) When a row UPDATE is performed, store all new row versions either in a separate temporary table space or in a reserved space at the end of each table (which can be allocated dynamically), etc.
2) Any SELECT queries within the same session will again access the new version of the row.
3) Any SELECT queries from other users will still access the old version.
4) When the UPDATE transaction is ROLLBACKed, just release the space used in the new temporary location.
5) When the UPDATE transaction COMMITs, try to LOCK the old version and overwrite it at the same physical location (NO FRAGMENTATION).
6) A similar mechanism can be applied to INSERTs and DELETEs.
7) If the transaction COMMITted, the temporary location can be either released or archived/cleaned on a pre-scheduled basis. This would possibly allow the introduction of a TRANSACTION LOG backup mechanism as a next step.
8) After that, VACUUM will have to deal only with deletions!!!

I understand that my suggestion may seem too simplified, and also that there are many implementation details and difficulties that I am not aware of. I strongly believe that the outcome of the discussion regarding this issue will be helpful.
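A minimal sketch of the balance-update scenario above (the names are hypothetical; pg_stat_user_tables exposes n_dead_tup as of 8.3): each update writes a new row version and leaves a dead one behind until VACUUM reclaims it.

CREATE TABLE customer_balance (
    customer_id integer PRIMARY KEY,
    balance     numeric NOT NULL
);
INSERT INTO customer_balance VALUES (1, 0);

-- one new row version is written, and one dies, per call
UPDATE customer_balance SET balance = balance + 0.10 WHERE customer_id = 1;

-- after many such updates, the statistics collector shows the dead rows
SELECT n_tup_upd, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'customer_balance';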
Best Regards, Kyriacos Kyriacou Senior Developer/DBA
[PERFORM] autovacuum blocks the operations of other manual vacuum
Hi, I have a question about the behavior of autovacuum. When I have a big table A which is being processed by autovacuum, and I also manually use (full) VACUUM to clean another table B, I find that I always get something like “found 0 removable, 14283 nonremovable row”. However, if I stop the autovacuum functionality and vacuum that big table A manually, I can clean table B (e.g. found 22615 removable, 2049 nonremovable row). Is this correct? Why do vacuum and autovacuum act differently? PS: My PostgreSQL is 8.4.
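A sketch of how to observe the concurrent autovacuum while this happens (column names as of 8.4; autovacuum workers appear in pg_stat_activity with a query tag like 'autovacuum: VACUUM ...'):

SELECT procpid, datname, current_query, xact_start
FROM pg_stat_activity
WHERE current_query LIKE 'autovacuum:%';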
Re: [PERFORM] temporary tables, indexes, and query plans
On Thu, Oct 28, 2010 at 9:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jon Nelson jnelson+pg...@jamponi.net writes: On Wed, Oct 27, 2010 at 5:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: It thinks it's faster, or there is some reason why it *can't* use the index, like a datatype mismatch. You could tell which by trying set enable_seqscan = off to see if that will make it change to another plan; if so, the estimated costs of that plan versus the original seqscan would be valuable information. When I place the index creation and ANALYZE right after the bulk update and follow it with 'set enable_seqscan = false', the next query (also an UPDATE - should be about 7 rows) results in this plan: Seq Scan on foo_table (cost=100.00..1004998.00 rows=24 width=236) OK, so it thinks it can't use the index. (The cost=100 bit is the effect of enable_seqscan = off: it's not possible to just never use seqscans, but we assign an artificially high cost to discourage the planner from selecting them if there's any other alternative.) So we're back to wondering why it can't use the index. I will say once more that we could probably figure this out quickly if you'd post an exact example instead of handwaving.

OK. This is a highly distilled example that shows the behavior. The ANALYZE doesn't appear to change anything, nor does the SET STATISTICS (followed by ANALYZE), nor does disabling sequential scans. Re-writing the table with ALTER TABLE does, though. If the initial UPDATE (the one before the index creation) is commented out, then the subsequent updates don't use sequential scans.

\timing off
BEGIN;
CREATE TEMPORARY TABLE foo AS
  SELECT x AS a, chr(x % 75 + 32) AS b, ''::text AS c
  FROM generate_series(1,500) AS x;
UPDATE foo SET c = 'foo' WHERE b = 'A';
CREATE INDEX foo_b_idx ON foo (b);
-- let's see what it looks like
EXPLAIN UPDATE foo SET c = 'bar' WHERE b = 'C';
-- does forcing a seqscan off help?
set enable_seqscan = false;
EXPLAIN UPDATE foo SET c = 'bar' WHERE b = 'C';
-- what about analyze?
ANALYZE VERBOSE foo;
EXPLAIN UPDATE foo SET c = 'bar' WHERE b = 'C';
-- what about statistics?
ALTER TABLE foo ALTER COLUMN b SET STATISTICS 1;
ANALYZE VERBOSE foo;
EXPLAIN UPDATE foo SET c = 'bar' WHERE b = 'C';
-- let's re-write the table
ALTER TABLE foo ALTER COLUMN a TYPE int;
EXPLAIN UPDATE foo SET c = 'bar' WHERE b = 'C';
ROLLBACK;

-- Jon
Re: [PERFORM] anti-join chosen even when slower than old plan
2010/11/12 Tom Lane t...@sss.pgh.pa.us: Robert Haas robertmh...@gmail.com writes: On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: I'm wondering if we could do something with a formula like 3 * amount_of_data_to_read / (3 * amount_of_data_to_read + effective_cache_size) = percentage NOT cached. That is, if we're reading an amount of data equal to effective_cache_size, we assume 25% caching, and plot a smooth curve through that point. In the examples above, we would assume that a 150MB read is 87% cached, a 1GB read is 50% cached, and a 3GB read is 25% cached. But isn't it already the behavior of effective_cache_size usage ? No. I think his point is that we already have a proven formula (Mackert-Lohmann) and shouldn't be inventing a new one out of thin air. The problem is to figure out what numbers to apply the M-L formula to. I've been thinking that we ought to try to use it in the context of the query as a whole rather than for individual table scans; the current usage already has some of that flavor but we haven't taken it to the logical conclusion. The idea of trying to know what is actually in cache strikes me as an almost certain non-starter. Agreed on that point. Plan stability would go out the window. The point is not to know the current contents of the cache, but rather, like ANALYZE, to run on a regular basis (probably triggered by something like the number of page reads/hits) a cache_analyze that reports stats the way ANALYZE does, and may be tuned per table the way auto-analyze is. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
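As a quick arithmetic check of the curve quoted above (nothing planner-specific, just the proposed formula, taking effective_cache_size as 3 GB):

-- percentage cached = 100 - 100 * 3*read / (3*read + effective_cache_size)
SELECT read_gb,
       round(100 - 100 * 3 * read_gb / (3 * read_gb + 3.0)) AS pct_cached
FROM (VALUES (0.15), (1.0), (3.0)) AS t(read_gb);
-- expected: ~87% for 0.15 GB, 50% for 1 GB, 25% for 3 GB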
Re: [PERFORM] MVCC performance issue
On 11/12/2010 02:25 AM, Kyriacos Kyriacou wrote: The result is huge fragmentation of the table space, unnecessary updates in all affected indexes, unnecessary costly I/O operations, poor performance on SELECTs that retrieve big record sets (i.e. reports etc), and slower updates. Yep. It's all about trade-offs. For some workloads the in-table MVCC storage setup works pretty darn poorly, but for most it seems to work quite well. There are various other methods of implementing relational storage with ACID properties. You can exclude all other transactions while making a change to a table, ensuring that nobody else can see old or new rows so there's no need to keep them around. You can use an out-of-line undo log (a-la Oracle). Many other methods exist, too. They all have advantages and disadvantages for different workloads. It's far from trivial to mix multiple schemes within a single database, so mixing and matching schemes for different parts of your DB isn't generally practical. 1) When a row UPDATE is performed, store all new row versions either in a separate temporary table space or in a reserved space at the end of each table (which can be allocated dynamically), etc. OK, so you want an undo log a-la Oracle? 2) Any SELECT queries within the same session will again access the new version of the row. 3) Any SELECT queries from other users will still access the old version. ... and incurring horrible random I/O penalties if the undo log doesn't fit in RAM. Again, a-la Oracle. Even read-only transactions have to hit the undo log if there's an update in progress, because rows they need may have been moved out to the undo log as they're updated in the main table storage. [snip description] I understand that my suggestion may seem too simplified, and also that there are many implementation details and difficulties that I am not aware of. It sounds like you're describing Oracle-style MVCC, using undo logs. http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/ http://en.wikipedia.org/wiki/Multiversion_concurrency_control Oracle's MVCC approach has its own costs. Like Pg's, those costs increase with update/delete frequency. Instead of table bloat, Oracle suffers from undo log growth (or undo log size management issues). Instead of increased table scan costs from dead rows, Oracle suffers from random I/O costs as it looks up the out-of-line undo log for old rows. Instead of long-running writer transactions causing table bloat, Oracle can have problems with long-running reader transactions aborting when the undo log runs out of space. Personally, I don't know enough to know which is better. I suspect they're just different, with different trade-offs. If undo logs allow you to do without write-ahead logging, that'd be interesting - but then, the WAL is useful for all sorts of replication options, and the use of linear WALs means that write ordering in the tables doesn't need to be as strict, which has performance advantages. -- Craig Ringer
Re: [PERFORM] MVCC performance issue
In reading what you are describing, don't you think PG 9 goes a long way to helping you out? On Sat, Nov 13, 2010 at 12:53 AM, Craig Ringer cr...@postnewspapers.com.au wrote: [snip - full quote of the message above]