Re: [PERFORM] Need help in performance tuning.
On 09/07/10 12:42, Tom Lane wrote:
> Samuel Gendler sgend...@ideasculptor.com writes:
>> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
>> cr...@postnewspapers.com.au wrote:
>>> If you're not using a connection pool, start using one.
>>
>> I see this issue and subsequent advice cross this list awfully
>> frequently. Is there an architectural reason why postgres itself cannot
>> pool incoming connections in order to eliminate the requirement for an
>> external pool?
>
> Perhaps not, but there's no obvious benefit either. Since there's
> More Than One Way To Do It, it seems more practical to keep that as a
> separate problem that can be solved by a choice of add-on packages.

Admittedly I'm relatively ignorant of the details, but I increasingly think PostgreSQL will need a rather big architectural change in the coming years, as the typical performance characteristics of machines change: it'll need to separate running queries from running processes, or start threading backends, so that one way or the other a single query can benefit from the capabilities of multiple CPUs. The same separation, or a move to async I/O, might be needed to get one query to concurrently read multiple partitions of a table, or otherwise get maximum benefit from high-capacity I/O subsystems when running just a few big, expensive queries.

Otherwise I'm wondering if PostgreSQL will begin really suffering in performance on workloads where queries are big and expensive but there are relatively few of them running at a time.

My point? *If* I'm not full of hot air and there's some truth to my blather above, any change like that might be accompanied by a move to separate query execution state from connection state, so that idle connections have a much lower resource cost.

OK, that's my hand-waving for the day done.

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] performance on new linux box
On 09/07/10 02:31, Ryan Wexler wrote:
> Thanks a lot for all the comments. The fact that both my Windows box and
> the old Linux box show a massive performance improvement over the new
> Linux box seems to point to hardware to me. I am not sure how to test
> the fsync issue, but I don't see how that could be it.
>
> The RAID card the server has in it is:
> 3ware 4-port 9650SE-4LPML RAID card
> Looking it up, it seems to indicate that it has a BBU.
>
> The only other difference between the boxes is the PostgreSQL version.
> The new one has 8.4-2 from the yum install instructions on the site:
> http://yum.pgrpms.org/reporpms/repoview/pgdg-centos.html
>
> Any more thoughts?

Really dumb idea: you don't happen to have the build of the RPMs that had debug enabled, do you? That resulted in a significant performance problem.

Regards

Russell
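One way to test the fsync issue mentioned above is to time small synchronous writes directly, since that is roughly what Postgres does at each commit. This is a minimal sketch, not from the thread; the thresholds in the comments are rough rules of thumb, and a real test should run on the same filesystem as the data directory:

```python
import os
import time
import tempfile

def fsync_rate(n=200, size=8192):
    """Time n small fsync'd writes. A few thousand per second suggests a
    working battery-backed write cache; on the order of 100-200/s on a
    7200 rpm disk suggests every write is reaching the platter."""
    fd, path = tempfile.mkstemp()
    try:
        buf = b"\0" * size
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, buf)
            os.fsync(fd)
        return n / (time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)

print("%.0f fsyncs/sec" % fsync_rate())
```

If the new box reports an fsync rate far below the old one, the BBU (or the controller's write-cache setting) is the likely culprit rather than the Postgres build.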
Re: [PERFORM] Need help in performance tuning.
Thank you all for the replies. I got the gist of where I should head: rely a bit on Postgres for performance, and the rest on my Tomcat application. And I will try connection pooling on the Postgres side. And if I come back with any query (related to this topic), this time it will be more precise (with real data from my testing). ;-)

Regards
haps

On Fri, Jul 9, 2010 at 1:22 PM, Craig Ringer cr...@postnewspapers.com.au wrote:
> On 09/07/10 12:42, Tom Lane wrote:
>> Samuel Gendler sgend...@ideasculptor.com writes:
>>> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
>>> cr...@postnewspapers.com.au wrote:
>>>> If you're not using a connection pool, start using one.
>>>
>>> I see this issue and subsequent advice cross this list awfully
>>> frequently. Is there an architectural reason why postgres itself
>>> cannot pool incoming connections in order to eliminate the
>>> requirement for an external pool?
>>
>> Perhaps not, but there's no obvious benefit either. Since there's
>> More Than One Way To Do It, it seems more practical to keep that as
>> a separate problem that can be solved by a choice of add-on packages.
>
> [...]
Re: [PERFORM] performance on new linux box
On Fri, Jul 9, 2010 at 2:08 AM, Russell Smith mr-r...@pws.com.au wrote:
> On 09/07/10 02:31, Ryan Wexler wrote:
>> The only other difference between the boxes is the PostgreSQL version.
>> The new one has 8.4-2 from the yum install instructions on the site:
>> http://yum.pgrpms.org/reporpms/repoview/pgdg-centos.html
>> Any more thoughts?
>
> Really dumb idea: you don't happen to have the build of the RPMs that
> had debug enabled, do you? That resulted in a significant performance
> problem.

The OP mentions that the new system underperforms on a straight dd test, so it isn't the database config or the Postgres build.
Re: [PERFORM] Need help in performance tuning.
> Otherwise I'm wondering if PostgreSQL will begin really suffering in
> performance on workloads where queries are big and expensive but there
> are relatively few of them running at a time.

Oh, I should note at this point that I'm *not* whining that someone should volunteer to do this, or that the PostgreSQL project should just make it happen. I'm fully aware that Pg is a volunteer project, and that even if these speculations are in a vaguely reasonable direction, that doesn't mean anyone has the time/skills/knowledge/interest to undertake such a major architectural change. I certainly have zero right to ask or expect anyone to; I'm very, very grateful to all those who already spend time helping out and enhancing Pg, with particular props to Tom Lane for patience on the -general list and heroic bug-fixing persistence.

Sorry for the reply-to-self; I just realized my post could've been taken as a whine about Pg's architecture and some kind of demand that someone do something about it. That couldn't be further from my intent.

--
Craig Ringer
Re: [PERFORM] Slow query with planner row strange estimation
Robert Haas wrote:
> On Wed, Jul 7, 2010 at 10:39 AM, damien hostin damien.hos...@axege.com wrote:
>> Hello again,
>> At last, I checked the same query with the same data on my desktop
>> computer. Just after loading the data, the queries were slow; I ran a
>> VACUUM ANALYZE, which collected good stats on the main table, and the
>> query became quick (~200 ms). Now one classic SATA-disk computer is
>> faster than our little monster server!!
>
> Have you tried running ANALYZE on the production server? You might
> also want to try ALTER TABLE ... SET STATISTICS to a large value on
> some of the join columns involved in the query.

Hello,

Before comparing the test case on the two machines, I ran ANALYZE on the whole database and looked at the pg_stats table to see whether anything changed for the columns, but on the production server the stats never became as good as on the desktop computer. I set the statistics target at 1 on the columns used by the join, ran ANALYZE (which takes a 300-row sample), then looked at the stats. The stats are not as good as on the desktop: the row count is nearly the same, but only 1 or 2 values are found.

The data are not balanced the same way on the two computers:
- Desktop: 12,000 rows, with 6,000 implicated in the query (50%).
- Production (actually a dev/test server): 6 million rows, with 6,000 implicated in the query (0.1%).

The columns used in the query are nullable, and in the 5,994,000 other rows that are not implicated in the query, these columns are null.

I don't know if the statistics target is a percentage or a number of values to obtain, but even set at the max (1), it didn't manage to collect good stats (for this particular query). As I don't know what more to do, my conclusion is that the data need to be better balanced to allow ANALYZE to gather better stats. But if there is a way to improve the stats/query with this badly balanced data, I'm open to it! I hope that in real production, data will never be loaded this way. If this happened, we would maybe set enable_nestloop to off, but I don't think it's a good solution; other queries would have a chance of getting slower.

Thanks for helping

--
HOSTIN Damien - Equipe R&D
Tel: +33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com
Re: [PERFORM] Need help in performance tuning.
On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote:
> Samuel Gendler sgend...@ideasculptor.com writes:
>> I see this issue and subsequent advice cross this list awfully
>> frequently. Is there an architectural reason why postgres itself
>> cannot pool incoming connections in order to eliminate the
>> requirement for an external pool?
>
> Perhaps not, but there's no obvious benefit either. Since there's
> More Than One Way To Do It, it seems more practical to keep that as a
> separate problem that can be solved by a choice of add-on packages.

This sounds similar to the approach taken with replication for years before it was moved into core. Just like replication, pooling has different approaches. I do think that in both cases, having a solution that works easily out of the box will meet the needs of most users.

There is also the issue of perception and adoption here as well. One of my colleagues mentioned that at PG East he repeatedly heard people talking (negatively) about the over-reliance on add-on packages to deal with core DB functionality.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Need help in performance tuning.
Brad Nicholson bnich...@ca.afilias.info wrote:
> Just like replication, pooling has different approaches. I do think
> that in both cases, having a solution that works easily out of the
> box will meet the needs of most users.

Any thoughts on the minimalist solution I suggested a couple of weeks ago?:

http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php

So far, there has been no comment by anyone.

-Kevin
Re: [PERFORM] Need help in performance tuning.
On Fri, 9 Jul 2010, Kevin Grittner wrote:
> Any thoughts on the minimalist solution I suggested a couple weeks ago?:
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php

Interesting idea. As far as I can see, you are suggesting solving the "too many connections" problem by allowing lots of connections, but only allowing a certain number to do anything at a time?

A proper connection pool provides the following advantages over this:

1. The pool can be on a separate machine or machines, spreading load.
2. The pool has a lightweight footprint per connection, whereas Postgres doesn't.
3. A large amount of the overhead is sometimes connection setup, which this would not solve. A pool has cheap setup.
4. This could cause Postgres backends to hold onto large amounts of memory while being prevented from doing anything, which is a bad use of resources.
5. A fair amount of the overhead is caused by context-switching between backends. The more backends, the less useful any CPU caches.
6. There are some internal workings of Postgres that involve keeping all the backends informed about something going on. The more backends, the greater this overhead is. (This was pretty bad with the sinval queue overflowing a while back, but it's a bit better now. It still causes some overhead.)
7. That lock would have a metric *($!-load of contention.

Matthew

--
Unfortunately, university regulations probably prohibit me from eating
small children in front of the lecture class.
-- Computer Science Lecturer
Re: [PERFORM] Need help in performance tuning.
In case there's any doubt, the questions below aren't rhetorical.

Matthew Wakeling matt...@flymine.org wrote:
> Interesting idea. As far as I can see, you are suggesting solving
> the "too many connections" problem by allowing lots of connections,
> but only allowing a certain number to do anything at a time?

Right.

> A proper connection pool provides the following advantages over this:
>
> 1. The pool can be on a separate machine or machines, spreading load.

Sure, but how would you do that with a built-in implementation?

> 2. The pool has a lightweight footprint per connection, whereas
> Postgres doesn't.

I haven't compared the footprint of, say, a pgpool connection on the database server to that of an idle PostgreSQL connection. Do you have any numbers?

> 3. A large amount of the overhead is sometimes connection setup,
> which this would not solve. A pool has cheap setup.

This would probably be most useful where the client holds a connection for a long time, not for the "log in for each database transaction" approach. I'm curious how often you think application software uses that approach.

> 4. This could cause Postgres backends to hold onto large amounts of
> memory while being prevented from doing anything, which is a bad use
> of resources.

Isn't this point 2 again? If not, what are you getting at? Again, do you have numbers for the comparison, assuming the connection pooler is running on the database server?

> 5. A fair amount of the overhead is caused by context-switching
> between backends. The more backends, the less useful any CPU caches.

Would this be true while a backend was blocked? Would this not be true for a connection pool's client-side connections?

> 6. There are some internal workings of Postgres that involve keeping
> all the backends informed about something going on. The more
> backends, the greater this overhead is.

Hmmm... I hadn't thought about that. Again, any numbers (e.g., profile information) on this?

> 7. That lock would have a metric *($!-load of contention.

Here I doubt you. It would be held for such short periods that I suspect collisions would be relatively infrequent compared to some of the other locks we use. As noted in the email, it may actually normally be an increment and test within an existing locked block. Also, assuming that any built-in connection pool would run on the database server, why would you think the contention for this would be worse than for whatever is monitoring connection count in the pooler?

-Kevin
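Kevin's "increment and test" idea — allow many idle connections but cap how many execute at once — can be sketched in miniature with a counting semaphore. This is an illustration of the admission-control concept only, not the proposal's actual implementation; the limit and timings are arbitrary:

```python
import threading
import time

ACTIVE_LIMIT = 4  # hypothetical cap on concurrently executing "backends"
active = threading.Semaphore(ACTIVE_LIMIT)

def run_query(conn_id, results):
    # Many connections may exist, but only ACTIVE_LIMIT of them execute
    # at any moment; the rest block here, holding only an idle slot.
    with active:
        time.sleep(0.01)  # stand-in for query execution
        results.append(conn_id)

results = []
threads = [threading.Thread(target=run_query, args=(i, results))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # all 20 "connections" eventually ran
```

The point of contention Matthew raises corresponds to the semaphore's internal counter: it is touched only briefly at the start and end of each query, which is why Kevin expects collisions to be rare.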
Re: [PERFORM] Need help in performance tuning.
If your app is running under Tomcat, connection pooling is extremely easy to set up from there: it has connection-pooling mechanisms built in. Request your DB connections using said mechanisms instead of doing it manually, make a couple of changes to server.xml, and the problem goes away. Hundreds, if not thousands, of concurrent users might end up running with fewer than 10 connections.

>>> Harpreet singh Wadhwa harpr...@openbravo.com 7/9/2010 3:55 AM >>>
> Thank you all for the replies. I got the gist of where I should head:
> rely a bit on Postgres for performance, and the rest on my Tomcat
> application. And I will try connection pooling on the Postgres side.
>
> Regards
> haps
>
> [...]
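For reference, the "couple of changes" amount to declaring a pooled DataSource in Tomcat's configuration. A minimal sketch for Tomcat 6 follows; the resource name, host, credentials, and pool sizes are placeholders, not values from this thread:

```xml
<!-- In context.xml (or inside a <Context> element in server.xml) -->
<Resource name="jdbc/mydb" auth="Container"
          type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://dbhost:5432/mydb"
          username="appuser" password="secret"
          maxActive="10" maxIdle="5" maxWait="10000"/>
```

The application then obtains connections by looking up `java:comp/env/jdbc/mydb` via JNDI instead of opening them itself, and `maxActive` caps the number of connections the database ever sees.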
Re: [PERFORM] Need help in performance tuning.
On Fri, 9 Jul 2010, Kevin Grittner wrote:
>> Interesting idea. As far as I can see, you are suggesting solving
>> the "too many connections" problem by allowing lots of connections,
>> but only allowing a certain number to do anything at a time?
>
> Right.

I think in some situations this arrangement would be an advantage. However, I do not think it will suit the majority of situations, and it could reduce performance when the user doesn't need the functionality, either because they already have a pool or because they don't have many connections. No, I don't have any numbers.

>> 1. The pool can be on a separate machine or machines, spreading load.
>
> Sure, but how would you do that with a built-in implementation?

That's my point exactly. If you have an external pool solution, you can put it somewhere else - maybe on multiple somewhere-elses.

>> 3. A large amount of the overhead is sometimes connection setup,
>> which this would not solve. A pool has cheap setup.
>
> This would probably be most useful where the client holds a
> connection for a long time, not for the "log in for each database
> transaction" approach. I'm curious how often you think application
> software uses that approach.

What you say is true. I don't know how often that is, but it seems to be those times that people come crying to the mailing list.

>> 4. This could cause Postgres backends to hold onto large amounts of
>> memory while being prevented from doing anything, which is a bad
>> use of resources.
>
> Isn't this point 2 again?

Kind of. Yes. Point 2 was simple overhead; this point was that the backend may have done a load of query-related allocation and then been stopped.

>> 7. That lock would have a metric *($!-load of contention.
>
> Here I doubt you. It would be held for such short periods that I
> suspect that collisions would be relatively infrequent compared to
> some of the other locks we use. As noted in the email, it may
> actually normally be an increment and test within an existing locked
> block.

Fair enough. It may be much less of a problem than I had previously thought.

Matthew

--
Change is inevitable, except from vending machines.
Re: [PERFORM] Need help in performance tuning.
Matthew Wakeling matt...@flymine.org wrote:
> I think in some situations this arrangement would be an advantage.
> However, I do not think it will suit the majority of situations, and
> it could reduce performance when the user doesn't need the
> functionality, either because they have a pool already, or they
> don't have many connections.

Oh, totally agreed, except that I think we can have essentially nil impact if they don't exceed a configured limit. In my experience, pooling is more effective the closer you put it to the client. I suppose the strongest argument that could be made against building in some sort of pooling is that it doesn't encourage people to look for client-side solutions. However, we seem to get a lot of posts from people who don't do this, are not able to easily manage it, and who would benefit from even a simple solution like this.

-Kevin
Re: [PERFORM] Need help in performance tuning.
If anything was built into the database to handle such connections, I'd recommend a big, bold warning recommending the use of client-side pooling if available.

For something like, say, a web server, pooling connections to the database provides a massive performance advantage regardless of how good the database is at handling way more active queries than the hardware can handle: the assignment of a connection to a thread tends to be at least an order of magnitude cheaper than establishing a new connection for each new thread and destroying it when it dies. This is especially true if the client architecture relies on relatively short-lived threads.

While there are a few cases where pooling is counterproductive, this only happens in relatively few scenarios. This is why every Java application server out there will strongly recommend using its own facilities to connect to a database: the performance is almost always better, and it causes fewer headaches for the DBAs. Now, if remote clients are accessing your database directly, setting up a pool in between might not be as straightforward or give you the same gains across the board, and that might be the only case where letting the DB do its own pooling makes sense.

>>> Kevin Grittner kevin.gritt...@wicourts.gov 7/9/2010 12:52 PM >>>
> Oh, totally agreed, except that I think we can have essentially nil
> impact if they don't exceed a configured limit.
>
> [...]
Re: [PERFORM] Need help in performance tuning.
Jorge Montero jorge_mont...@homedecorators.com wrote:
> If anything was built into the database to handle such connections,
> I'd recommend a big, bold warning recommending the use of
> client-side pooling if available.

+1

-Kevin
Re: [PERFORM] Slow query with planner row strange estimation
On Fri, Jul 9, 2010 at 6:13 AM, damien hostin damien.hos...@axege.com wrote:
>> Have you tried running ANALYZE on the production server? You might
>> also want to try ALTER TABLE ... SET STATISTICS to a large value on
>> some of the join columns involved in the query.
>
> Hello,
>
> Before comparing the test case on the two machines, I ran ANALYZE on
> the whole database and looked at the pg_stats table to see whether
> anything changed for the columns, but on the production server the
> stats never became as good as on the desktop computer.
>
> [...]
>
> I don't know if the statistics target is a percentage or a number of
> values to obtain,

It's a number of values to obtain.

> but even set at the max (1), it didn't manage to collect good stats
> (for this particular query).

I think there's a cutoff where it won't collect values unless they occur significantly more often than the average frequency. I wonder if that might be biting you here: without the actual values in the MCV table, the join selectivity estimates probably aren't too good.

> As I don't know what more to do, my conclusion is that the data need
> to be better balanced to allow ANALYZE to gather better stats. But if
> there is a way to improve the stats/query with this badly balanced
> data, I'm open to it! I hope that in real production, data will never
> be loaded this way. If this happened, we would maybe set
> enable_nestloop to off, but I don't think it's a good solution; other
> queries would have a chance of getting slower.

Yeah, that usually works out poorly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
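The advice in this exchange — raise the per-column statistics target, re-ANALYZE, then check whether the interesting values reached the MCV list — looks like the following in SQL. The table and column names are placeholders, not from the thread:

```sql
-- Raise the per-column statistics target, then re-sample.
ALTER TABLE fact_table ALTER COLUMN join_col SET STATISTICS 1000;
ANALYZE fact_table;

-- Inspect what ANALYZE collected: the null fraction, the distinct-value
-- estimate, and whether the join values made it into the MCV list.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'fact_table' AND attname = 'join_col';
```

In a case like Damien's, where 99.9% of the rows are NULL in the join column, `null_frac` will be close to 1.0 and the handful of non-NULL values may still fall below the MCV cutoff Robert describes, even at a large target.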
Re: [PERFORM] Need help in performance tuning.
On Fri, Jul 9, 2010 at 12:42 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Samuel Gendler sgend...@ideasculptor.com writes:
>> On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
>> cr...@postnewspapers.com.au wrote:
>>> If you're not using a connection pool, start using one.
>>
>> I see this issue and subsequent advice cross this list awfully
>> frequently. Is there an architectural reason why postgres itself
>> cannot pool incoming connections in order to eliminate the
>> requirement for an external pool?
>
> Perhaps not, but there's no obvious benefit either. Since there's
> More Than One Way To Do It, it seems more practical to keep that as a
> separate problem that can be solved by a choice of add-on packages.

I'm not buying it. A separate connection pooler increases overhead and management complexity, and, I believe, limits our ability to implement optimizations like parallel query execution. I'm glad there are good ones available, but the fact that they're absolutely necessary for good performance in some environments is not a feature.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
[PERFORM] Index usage with functions in where condition
I'm having trouble getting the query planner to use indexes. The situation occurs when writing a query that uses functions to define the parameters for the conditions on the indexed columns. The system I'm running is Windows Server 2003, using version 8.4.2 of PostgreSQL.

This is the table that I'm running my query against:

CREATE TABLE crs_coordinate (
  id integer NOT NULL,
  nod_id integer NOT NULL,
  value1 numeric(22,12),
  value2 numeric(22,12),
  CONSTRAINT crs_coordinate_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);

CREATE INDEX coo_value1 ON crs_coordinate USING btree (value1);
CREATE INDEX coo_value2 ON crs_coordinate USING btree (value2);

This table has 23 million rows in it and was analysed just before planning my queries. This is the query that does not use the indexes:

SELECT coo.nod_id,
       640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 175.58461)*cos(radians(-41.0618)))^2)) as distance
FROM crs_coordinate coo
WHERE coo.value1 between -41.0618-degrees(1200.0/640.0) and -41.0618+degrees(1200.0/640.0)
  and coo.value2 between 175.58461-degrees(1200.0/640.0)/(cos(radians(-41.0618))) and 175.58461+degrees(1200.0/640.0)/(cos(radians(-41.0618)));

Seq Scan on crs_coordinate coo  (cost=0.00..1039607.49 rows=592 width=28)
  Filter: (((value1)::double precision >= (-41.0725429586587)::double precision) AND ((value1)::double precision <= (-41.0510570413413)::double precision) AND ((value2)::double precision >= 175.570362072701::double precision) AND ((value2)::double precision <= 175.598857927299::double precision))

However, if I pre-evaluate the parameters for the where condition on the value1 and value2 columns, the planner chooses to use the indexes:

SELECT coo.nod_id,
       640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 175.58461)*cos(radians(-41.0618)))^2)) as distance
FROM crs_coordinate coo
WHERE coo.value1 BETWEEN -41.07254296 AND -41.05105704
  AND coo.value2 BETWEEN 175.57036207 AND 175.59885792;

Bitmap Heap Scan on crs_coordinate coo  (cost=5299.61..6705.41 rows=356 width=28)
  Recheck Cond: ((value1 >= (-41.07254296)) AND (value1 <= (-41.05105704)) AND (value2 >= 175.57036207) AND (value2 <= 175.59885792))
  ->  BitmapAnd  (cost=5299.61..5299.61 rows=356 width=0)
        ->  Bitmap Index Scan on coo_value1  (cost=0.00..1401.12 rows=54923 width=0)
              Index Cond: ((value1 >= (-41.07254296)) AND (value1 <= (-41.05105704)))
        ->  Bitmap Index Scan on coo_value2  (cost=0.00..3898.06 rows=153417 width=0)
              Index Cond: ((value2 >= 175.57036207) AND (value2 <= 175.59885792))

So why is the first query not using the indexes on the value1 and value2 columns? I'm assuming that both the COS and RADIANS functions are STRICT IMMUTABLE, so logically the evaluation of these functions in the where clause should be inlined. Looking at the query plan, this inlining does seem to be happening...

At this stage I have a workaround by putting the query into a plpgsql function and using dynamic SQL. But it is still frustrating that the planner seems to be working in a far-from-optimal fashion. Can anyone shed some light on this for me?

Thanks,
Jeremy

__ This message contains information, which is confidential and may be subject to legal privilege. If you are not the intended recipient, you must not peruse, use, disseminate, distribute or copy this message. If you have received this message in error, please notify us immediately (Phone 0800 665 463 or i...@linz.govt.nz) and destroy the original message. LINZ accepts no responsibility for changes to this email, or for any attachments, after its transmission from LINZ. Thank you. __
Re: [PERFORM] Need help in performance tuning.
Matthew Wakeling wrote:
> If you have an external pool solution, you can put it somewhere else - maybe on multiple somewhere elses.

This is the key point to observe: if you're at the point where you have so many connections that you need a pool, the last place you want to put that is on the overloaded database server itself. Therefore, it must be an external piece of software to be effective, rather than being part of the server itself. Database servers are relatively expensive computing hardware due to size/quantity/quality of disks required. You can throw a pooler (or poolers) on any cheap 1U server. This is why a built-in pooler, while interesting, is not particularly functional for how people normally scale up real-world deployments.

-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [PERFORM] Index usage with functions in where condition
Jeremy Palmer jpal...@linz.govt.nz writes:
> This is the query that does not use the indexes:
> SELECT coo.nod_id, 640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 175.58461)*cos(radians(-41.0618)))^2)) as distance FROM crs_coordinate coo WHERE coo.value1 between -41.0618-degrees(1200.0/640.0) and -41.0618+degrees(1200.0/640.0) and coo.value2 between 175.58461-degrees(1200.0/640.0)/(cos(radians(-41.0618))) and 175.58461+degrees(1200.0/640.0)/(cos(radians(-41.0618)));

Those expressions yield float8, not numeric, and numeric vs float8 isn't an indexable operator for reasons we needn't get into here. You should probably rethink whether numeric is really the best choice of datatype for your columns, if this is the sort of value you expect to work with --- you're paying a considerable price in speed and space for perhaps-illusory precision gains. But if you insist on using numeric then the solution is to cast the expression results to numeric explicitly.

BTW I wonder whether you ought not be looking into postgis rather than rolling-your-own coordinate arithmetic ...

regards, tom lane
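A sketch of the explicit cast Tom suggests, applied to Jeremy's original query (assuming the columns stay numeric; the casts simply force numeric-vs-numeric comparisons that can match the btree indexes):

```sql
SELECT coo.nod_id,
       640*radians(sqrt((coo.value1 - -41.0618)^2
           + ((coo.value2 - 175.58461)*cos(radians(-41.0618)))^2)) AS distance
FROM crs_coordinate coo
WHERE coo.value1 BETWEEN (-41.0618 - degrees(1200.0/640.0))::numeric
                     AND (-41.0618 + degrees(1200.0/640.0))::numeric
  AND coo.value2 BETWEEN (175.58461 - degrees(1200.0/640.0)/cos(radians(-41.0618)))::numeric
                     AND (175.58461 + degrees(1200.0/640.0)/cos(radians(-41.0618)))::numeric;
```

degrees(), cos() and radians() all return double precision, so without the casts the whole comparison is coerced to float8 and the numeric btree indexes no longer apply.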
Re: [Slony1-general] [PERFORM] WAL partition overloaded--by autovacuum?
Richard Yen wrote:
> I figured that pg_xlog and data/base could both be on the FusionIO drive, since there would be no latency when there are no spindles.

(Rolls eyes) Please be careful about how much SSD Kool-Aid you drink, and be skeptical of vendor claims. They don't just make latency go away, particularly on heavy write workloads where the technology is at its weakest. Also, random note: I'm seeing way too many FusionIO drive setups where people don't have any redundancy to cope with a drive failure, because the individual drives are so expensive they don't have more than one. Make sure that if you lose one of the drives, you won't have a massive data loss. Replication might help with that, if you can stand a little bit of data loss when the SSD dies. Not if--when. Even if you have a good one, they don't last forever.

> This means my pg_xlog partition should be (2 + checkpoint_completion_target) * checkpoint_segments + 1 = 41 files, or 656MB. Then, if there are more than 49 files, unneeded segment files will be deleted, but in this case all segment files are needed, so they never got deleted. Perhaps we should add in the docs that pg_xlog should be the size of the DB or larger?

Excessive write volume beyond the capacity of the hardware can end up delaying the normal checkpoint that would have cleaned up all the xlog files. There's a nasty spiral it can get into that I've seen a couple of times in similar form to what you reported. The pg_xlog should never exceed the size computed by that formula for very long, but it can burst above its normal size limits for a little bit. This is already mentioned as a possibility in the manual: If, due to a short-term peak of log output rate, there are more than 3 * checkpoint_segments + 1 segment files, the unneeded segment files will be deleted instead of recycled until the system gets back under this limit.
Autovacuum is an easy way to get the sort of activity needed to cause this problem, but I don't know if it's a necessary component to see the problem. You have to be in an unusual situation before the sum of the xlog files is anywhere close to the size of the database, though.

-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
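For what it's worth, Richard's figures are internally consistent if his server ran with checkpoint_segments = 16 and checkpoint_completion_target = 0.5 (assumed values; the post doesn't show them), with 16MB per WAL segment:

```sql
-- Hypothetical settings implied by the 41-file / 656MB / 49-file figures above
SELECT (2 + 0.5) * 16 + 1        AS normal_max_files,  -- 41
       ((2 + 0.5) * 16 + 1) * 16 AS size_mb,           -- 656
       3 * 16 + 1                AS removal_threshold; -- 49
```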
Re: [PERFORM] Index usage with functions in where condition
Hi Tom, Thanks for the help - much appreciated. Yes I'm using PostGIS, and with a simple join to a related table I could get access to the geometry for these point positions. Is using the GiST r-tree index faster than using the 2 b-tree indexes on the lat and long values? I guess this is a question for the PostGIS guys, and a quick test could tell me anyway! My memory is that the GiST r-tree index is slow for points at the moment, and that a good implementation of a kd-tree index over GiST is required for better speed.

Regards,

Jeremy Palmer
Geodetic Surveyor
National Geodetic Office
Land Information New Zealand | Toitu te whenua
160 Lambton Quay | Private Box 5501 | Wellington 6145
DDI: 64 (0)4 498 3537 | Fax: 64 (0)4 498 3837 | www.linz.govt.nz
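For anyone wanting to run the GiST-vs-btree comparison raised in this thread, a minimal sketch of the PostGIS side might look like this (column and index names are illustrative, and ST_DWithin on geometry is assumed to be available in the PostGIS version in use):

```sql
-- Hypothetical: store each position as a PostGIS point and index it with GiST.
ALTER TABLE crs_coordinate ADD COLUMN shape geometry;
UPDATE crs_coordinate
   SET shape = ST_SetSRID(ST_MakePoint(value2, value1), 4326);  -- lon, lat order
CREATE INDEX coo_shape ON crs_coordinate USING gist (shape);
ANALYZE crs_coordinate;

-- A radius-style search that can use the GiST index:
SELECT nod_id
FROM crs_coordinate
WHERE ST_DWithin(shape, ST_SetSRID(ST_MakePoint(175.58461, -41.0618), 4326), 0.02);
```

Timing this against the two-btree bitmap plan would answer the r-tree-vs-btree question directly.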
Re: [PERFORM] Need help in performance tuning.
Greg Smith g...@2ndquadrant.com wrote:
> if you're at the point where you have so many connections that you need a pool, the last place you want to put that is on the overloaded database server itself. Therefore, it must be an external piece of software to be effective, rather than being part of the server itself.

It *is* the last place you want to put it, but putting it there can be much better than not putting it *anywhere*, which is what we've often seen.

-Kevin
Re: [PERFORM] Need help in performance tuning.
On 10/07/10 00:56, Brad Nicholson wrote:
> On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote:
>> Perhaps not, but there's no obvious benefit either. Since there's More Than One Way To Do It, it seems more practical to keep that as a separate problem that can be solved by a choice of add-on packages.
>
> This sounds similar to the approach taken with replication for years before it was moved into core. Just like replication, pooling has different approaches. I do think that in both cases, having a solution that works, easily, out of the box will meet the needs of most users. There is also the issue of perception/adoption here as well. One of my colleagues mentioned that at PG East he repeatedly heard people talking (negatively) about the over-reliance on add-on packages to deal with core DB functionality.

It would be interesting to know more about what they thought an 'over-reliance' was and which packages they meant. While clearly in the case of replication something needed to be done to make it better and easier, it is not obvious that the situation with connection pools is analogous. For instance we make extensive use of PgBouncer, and it seems to do the job fine and is ridiculously easy to install and set up. So would having (something like) this in core be an improvement? Clearly if the 'in core' product is better then it is desirable... similarly if the packaged product is better... well, let's have that then! I've certainly observed a 'fear of package installation' on the part of some folk, which is often a hangover from the 'Big IT shop' mentality where it requires blood signatures and child sacrifice to get anything new installed.

regards

Mark

P.S. Also note that database vendors like pooling integrated in the core of *their* product because it is another thing to charge a license for. Unfortunately this can also become an entrenched mentality of 'must be in core' on the part of consultants etc!
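For readers who haven't tried it, a minimal PgBouncer configuration of the kind Mark describes might look like this (paths, names, ports and pool sizes are illustrative only):

```ini
; pgbouncer.ini - hypothetical minimal transaction-pooling setup
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Applications then connect to port 6432 instead of 5432, and PgBouncer multiplexes up to 1000 client connections over at most 20 real backends per database/user pair.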
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
> It *is* the last place you want to put it, but putting it there can be much better than not putting it *anywhere*, which is what we've often seen.

Well, what you proposed is an admission control mechanism, which is *different* from a connection pool, although the two overlap. A connection pool solves 4 problems when it's working:

a) limiting the number of database server processes
b) limiting the number of active concurrent queries
c) reducing response times for allocating a new connection
d) allowing management of connection routes to the database (redirection, failover, etc.)

What you were proposing is only (b). While (b) alone is better than nothing, it only solves some kinds of problems. Database backend processes are *not* free, and in general when I see users with too-many-connections failures, they are not because of too many concurrent queries, but rather because of too many idle connections (I've seen up to 1800 on a server). Simply adding (b) for crappy applications would make the problem worse, not better, because of the large number of pending queries which the developer would fail to deal with, or monitor.

So while adding (b) to core alone would be very useful for some users, ironically it's generally for the more advanced users, which are not the ones we're trying to help on this thread.

-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On 10/07/2010 9:25 AM, Josh Berkus wrote:
>> It *is* the last place you want to put it, but putting it there can be much better than not putting it *anywhere*, which is what we've often seen.
>
> Well, what you proposed is an admission control mechanism, which is *different* from a connection pool, although the two overlap. A connection pool solves 4 problems when it's working:
> a) limiting the number of database server processes
> b) limiting the number of active concurrent queries
> c) reducing response times for allocating a new connection
> d) allowing management of connection routes to the database (redirection, failover, etc.)

I agree with you: for most Pg users (a) is really, really important. As you know, in PostgreSQL each connection maintains not only general connection state (GUC settings, etc) and, if in a transaction, transaction state, but also a query executor (full backend). That gets nasty not only in memory use, but in impact on active query performance, as all those query executors have to participate in global signalling for lock management etc. So an in-server pool that solved (b) but not (a) would IMO not be particularly useful for the majority of users. That said, I don't think it follows that (a) cannot be solved in-core. How much architectural change would be required to do it efficiently enough, though...

-- Craig Ringer
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
Sent from my iPhone

On Jul 9, 2010, at 18:25, Josh Berkus j...@agliodbs.com wrote:
> So while adding (b) to core alone would be very useful for some users, ironically it's generally for the more advanced users which are not the ones we're trying to help on this thread.

It would seem from evidence presented on this thread that the more appropriate conversation would maybe be with package maintainers, to perhaps get them to include a connection pool or provide a package that comes with a pool preconfigured and installed, along with improving existing documentation so that it encourages the use of a pool as a first-class installation choice, since it seems to be something of a first-class problem for a lot of novice users.

Just to give some background on my perspective - my prior familiarity with a connection pool was entirely on the client side, where I've been using them for years to keep resource consumption down on the client. But it never occurred to me to consider one on the other end of those connections, despite the fact that I usually have a cluster of app hosts all talking to the same db. I assumed a low connection count was desirable, but not mandatory, since surely the db server limited its own resource consumption, much the way a well-written client app will. I basically assumed that the postgres devs used the same logic I did when I pooled my connections at the client side in order to minimize resource consumption there. I've got no truck with the reasons presented against doing so, since they make perfectly good sense to me. However, I suspect there are lots of engineers like myself - folks working without the benefit of a dedicated dba, or with a dba who is new to the postgres platform - who make naive assumptions that aren't immediately or obviously corrected by the docs (I may be sticking my foot in my mouth here; I haven't read the standard docs in a very long time). With this issue in particular, the fix is fairly trivial and brings other benefits as well.
But it sucks to discover it only after you've started to get errors on a running app, no matter how easy the fix. So probably this is really only a bug in communication and can be fixed there. That's great. Easier-to-fix bugs are hard to find. I have yet to contribute to postgres development, so I guess, if no one objects, I'll see what I can do about improving the documentation of this issue, both in the official docs and just making sure it gets better billing in other sources of postgres documentation. But you'll have to bear with me, as I do have a more-than-full-time other job, and no experience with the pg developer community other than a couple of weeks on the mailing lists. But I do like to contribute to projects I use. It always winds up making me a more proficient user. (For the record, if I wasn't limited to my phone at the moment I would actually check the state of existing documentation before sending this, so if I'm talking out of my ass on the lack of documentation, please go easy on me. I mean no offense.)

--sam