Re: [PERFORM] Performance Bottleneck
On Fri, 2004-08-06 at 23:18 +, Martin Foster wrote:
> Mike Benoit wrote:
>> On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote:
>>>> The queries themselves are simple, normally drawing information from one table with few conditions or, in the most complex cases, using joins on two tables or subqueries. These behave very well and always have; the problem is that these queries take place in rather large amounts due to the dumb nature of the scripts themselves.
>>> Show us the explain analyze on those queries and how many rows the tables contain; the table schema could also be useful.
>> If the queries themselves are optimized as much as they can be and, as you say, it's just the sheer amount of similar queries hitting the database, you could try using prepared queries for the ones that are most often executed, to eliminate some of the overhead. I've had relatively good success with this in the past, and it doesn't take very much code modification.
> One of the biggest problems is most probably related to the indexes. Since the performance penalty of logging the information needed to see which queries are used and which are not is a slight problem, I cannot really make use of it for now.
>
> However, I am curious how one would go about preparing a query. Is this similar to the DBI prepare statement with placeholders, simply changing the values passed on execute? Or is this something at the database level, such as a view et cetera?

Yes, always optimize your queries and GUC settings first and foremost; that's where you are likely to gain the most performance. After that, if you still want to push things even further, I would try prepared queries. I'm not familiar with DBI::Prepare at all, but I don't think it's what you're looking for. This is what you want:

http://www.postgresql.org/docs/current/static/sql-prepare.html

--
Mike Benoit [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
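[For the archives: a server-side prepared statement looks roughly like the sketch below. The table and parameter names are made up for illustration; the exact syntax is in the SQL PREPARE documentation.]

```sql
-- PREPARE parses and plans the statement once per session;
-- each EXECUTE then skips the parse/plan overhead.
PREPARE get_player (integer) AS
    SELECT name, last_seen
    FROM   players
    WHERE  player_id = $1;

EXECUTE get_player(42);
EXECUTE get_player(99);

-- Prepared statements last until the session ends,
-- or until explicitly removed:
DEALLOCATE get_player;
```

With persistent connections (Apache::DBI) the session, and therefore the prepared statement, survives across page hits, which is what makes this pay off.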
Re: [PERFORM] Performance Bottleneck
On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
> I am currently making use of Apache::DBI, which overrides the DBI::disconnect call and keeps a pool of active connections for use when need be. Since it offloads the pooling to the webserver, it seems more advantageous than pgpool, which, while able to run on an external system, adds another layer of complexity.

Apache::DBI is not the same sort of pool as pgpool. DB connections are not shared among all your Apache children (a common misconception). So if you have 300 Apache kids you can have 300 DB connections. With pgpool, connections are shared among all of them, so even though you have 300 kids you only have, say, 32 DB connections.

> Anyone had any experience with both Apache::DBI and pgpool? For my needs they seem to do essentially the same thing, simply that one is invisible to the code while the other requires adding the complexity of a proxy.

Both are invisible to the app. (With pgpool it thinks it is connecting to a regular old PG server.) And I've been running pgpool in production for months. It just sits there. Doesn't take much to set it up or configure it. Works like a champ.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
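[The distinction is easy to picture with a toy model in plain Python; this is just an illustration of the shared-pool idea, not pgpool's actual implementation. Many workers share a small, fixed set of connection handles through a blocking pool, so 300 "kids" never turn into 300 backends.]

```python
import queue
import threading

class TinyPool:
    """Share a fixed set of 'connections' among many workers,
    the way pgpool shares a few real backends among all clients."""
    def __init__(self, size):
        self._free = queue.Queue()
        for n in range(size):
            self._free.put(f"conn-{n}")   # stand-ins for real DB connections

    def run(self, work):
        conn = self._free.get()           # block until a connection is free
        try:
            return work(conn)
        finally:
            self._free.put(conn)          # hand it back for the next worker

pool = TinyPool(size=4)                   # think "32 db connections"
seen = set()
threads = [threading.Thread(target=lambda: seen.add(pool.run(lambda c: c)))
           for _ in range(300)]           # think "300 apache kids"
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(seen))                          # 4 distinct connections served all 300
```

With Apache::DBI there is no equivalent of `TinyPool`: each child owns its connection outright, so the count of connections tracks the count of children.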
Re: [PERFORM] Performance Bottleneck
On 8/8/2004 8:10 AM, Jeff wrote:
> On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
>> I am currently making use of Apache::DBI, which overrides the DBI::disconnect call and keeps a pool of active connections for use when need be. [...]
> Apache::DBI is not the same sort of pool as pgpool. DB connections are not shared among all your Apache children (a common misconception). So if you have 300 Apache kids you can have 300 DB connections. With pgpool, connections are shared among all of them, so even though you have 300 kids you only have, say, 32 DB connections.

And this is exactly where the pgpool advantage lies. Especially with TPC-W, Apache is serving a mix of PHP (or whatever CGI technique is used) and static content like images. Since the 200+ Apache kids serve any of that content at random, and the emulated browsers very much encourage ramping up to MaxClients children by using up to 4 concurrent image connections, one does end up with MaxClients DB connections that are all used relatively infrequently. In contrast, the real pgpool results in fewer, more active DB connections, which is better for performance.

>> Anyone had any experience with both Apache::DBI and pgpool? For my needs they seem to do essentially the same thing, simply that one is invisible to the code while the other requires adding the complexity of a proxy.
> Both are invisible to the app. (With pgpool it thinks it is connecting to a regular old PG server.) And I've been running pgpool in production for months. It just sits there. Doesn't take much to set it up or configure it. Works like a champ.

And it buys you an extra admin feature people like to forget about: one can shut down one pool for one web application only. That gives you instant single-user access to one database without shutting down the whole webserver or tampering with the pg_hba.conf file.

Jan

--
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
[EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
> And this is exactly where the pgpool advantage lies. Especially with TPC-W, Apache is serving a mix of PHP (or whatever CGI technique is used) and static content like images. [...] In contrast, the real pgpool results in fewer, more active DB connections, which is better for performance.

There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per Apache process, both of which work extremely well and attack the issue at its source.

1) Use a front-end caching proxy like Squid as an accelerator. Static content will be served by the accelerator 99% of the time. Additionally, large pages can be served immediately to the accelerator by Apache, which can then go on to serve another request without waiting for the end user's dial-up connection to pull the data down. Massive speedup, fewer Apache processes needed.

2) Serve static content off an entirely separate Apache server than the dynamic content, by using separate domains (e.g. 'static.foo.com').

Personally I favour number 1. Our last biggish peak saw 6000 open HTTP and HTTPS connections and only 200 Apache children, all of them nice and busy, not hanging around on street corners looking bored. During quiet times Apache drops back to its configured minimum of 40 kids. Option 2 has the advantage that you can use a leaner build for the 'dynamic' Apache server, but with RAM so plentiful these days that's a less useful property.

Basically this puts the 'pooling' back in the stateless HTTP area where it truly belongs, and can be proven not to have any peculiar side effects (especially when it comes to transaction safety). Even better, so long as you use URL parameters for searches and the like, you can have the accelerator cache those pages for a certain time too, as long as slightly stale results are OK.

I'm sure pgpool and the like have their place, but being band-aids for poorly configured websites probably isn't the best use for them.

M
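[For reference, the accelerator setup Matt describes looked something like the fragment below in a Squid 2.x-era squid.conf. The hostnames and ports are illustrative, not taken from his site.]

```conf
# Listen where the web clients connect
http_port 80

# Forward cache misses to the real Apache on a backend port
httpd_accel_host 127.0.0.1
httpd_accel_port 8080
httpd_accel_single_host on

# Pure accelerator, not a general-purpose proxy
httpd_accel_with_proxy off
httpd_accel_uses_host_header on
```

Squid then absorbs the static hits and the slow-client drain, and only cache misses for dynamic pages ever tie up an Apache child (and hence a DB connection).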
Re: [PERFORM] Performance Bottleneck
Jeff wrote:
> On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
>> I am currently making use of Apache::DBI, which overrides the DBI::disconnect call and keeps a pool of active connections for use when need be. [...]
> Apache::DBI is not the same sort of pool as pgpool. DB connections are not shared among all your Apache children (a common misconception). So if you have 300 Apache kids you can have 300 DB connections. With pgpool, connections are shared among all of them, so even though you have 300 kids you only have, say, 32 DB connections.

Seems that you are right; I never noticed that from the documentation before. I always assumed it had something to do with the long-lasting/persistent scripts that would remain in transactions for extended periods of time.

Here is an odd question: while the server runs 7.4.x, the client connects with 7.3.x. Would this in itself make a difference in performance, as the protocols are different? At least based on pgpool's documentation.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
On 8-8-2004 16:29, Matt Clark wrote:
> There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per Apache process, both of which work extremely well and attack the issue at its source.
>
> 1) Use a front-end caching proxy like Squid as an accelerator. Static content will be served by the accelerator 99% of the time. Additionally, large pages can be served immediately to the accelerator by Apache, which can then go on to serve another request without waiting for the end user's dial-up connection to pull the data down. Massive speedup, fewer Apache processes needed.

Another version of this 1) is to run with a content accelerator; our favourite is to run Tux in front of Apache. It takes over the connection-handling stuff, has a very low memory profile (compared to Apache) and very little overhead. What it does is serve up all simple content (although you can have cgi/php/perl and other languages processed by it, entirely removing the need for Apache in some cases) and forward/proxy everything it doesn't understand to an Apache or other webserver running on the same machine (on another port).

I think there are a few advantages over Squid: since it is partially done in kernel space it can be slightly faster in serving up content, apart from its simplicity, which will probably matter even more. You'll have no caching issues for pages that should not be cached or static files that change periodically (like every few seconds). AFAIK Tux can handle more than 10 times as many ab-generated requests per second as a default-compiled Apache on the same machine. And besides the speed-up, you can do any request you were able to do before, since Tux will simply forward anything it doesn't understand to Apache.

Anyway, apart from all that: reducing the number of Apache connections is nice, but not really the same as reducing the number of pooled connections using a db pool... You may even be able to run with 1000 HTTP connections, 40 Apache processes and 10 DB connections. In the non-pooled setup, you'd still have 40 DB connections.

In a simple test I did, I did feel pgpool had quite some overhead, though. So it should be well tested, to find out where the turnover point is at which it becomes a gain instead of a loss...

Best regards,

Arjen van der Meijden
Re: [PERFORM] Performance Bottleneck
Arjen van der Meijden wrote:
> On 8-8-2004 16:29, Matt Clark wrote:
>> There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per Apache process, both of which work extremely well and attack the issue at its source. [...]
> Another version of this 1) is to run with a content accelerator; our favourite is to run Tux in front of Apache. [...]
>
> Anyway, apart from all that: reducing the number of Apache connections is nice, but not really the same as reducing the number of pooled connections using a db pool... You may even be able to run with 1000 HTTP connections, 40 Apache processes and 10 DB connections. In the non-pooled setup, you'd still have 40 DB connections.
>
> In a simple test I did, I did feel pgpool had quite some overhead, though. So it should be well tested, to find out where the turnover point is at which it becomes a gain instead of a loss...

Other than images, there are very few static pages being loaded up by the user. Since they make up a very small portion of the traffic, it tends to be an optimization we can forgo for now.

I attempted to make use of pgpool. At the default 32 connections pre-forked, the webserver almost immediately tapped out the pgpool base, and content stopped being served because no new processes were being forked to make up for it. So I raised it to a higher value (256) and it immediately segfaulted and dropped core. So I am not sure exactly how to proceed, since I rather need the thing to fork additional servers as load hits, and not the other way around. Unless I had it configured oddly, it seems to work differently than an Apache server would in handling content.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
> Jeff wrote:
>> On Aug 8, 2004, at 1:29 AM, Martin Foster wrote:
>>> I am currently making use of Apache::DBI, which overrides the DBI::disconnect call and keeps a pool of active connections for use when need be. [...]
>> Apache::DBI is not the same sort of pool as pgpool. [...] With pgpool, connections are shared among all of them, so even though you have 300 kids you only have, say, 32 DB connections.
> Seems that you are right; I never noticed that from the documentation before. I always assumed it had something to do with the long-lasting/persistent scripts that would remain in transactions for extended periods of time.
>
> Here is an odd question: while the server runs 7.4.x, the client connects with 7.3.x. Would this in itself make a difference in performance, as the protocols are different? At least based on pgpool's documentation.

In this case the server falls back from the V3 protocol (employed in 7.4 or later) to the V2 protocol (employed from 6.4 to 7.3.x). As far as pgpool is concerned, the performance difference is significant. Of course, that depends on the implementation, though. FYI, here is the outline of the testing, using pgbench.

H/W: Pentium4 2.4GHz x2 / memory 1GB / HDD IDE 80GB (all PCs are the same spec)
S/W: RedHat Linux 9 / PostgreSQL 7.3.6 and 7.4.3

postgresql.conf:
  tcpip_socket = true
  max_connections = 512
  shared_buffers = 2048

host A: pgbench, host B: pgpool, host C: PostgreSQL 7.3.6 or 7.4.3
pgbench parameters: -S -c 10 -t 1000

result:
                                             TPS          ratio(7.4.3)  ratio(7.3.6)
  without pgpool                             4357.625059  100%          100%
  with pgpool (connection pool mode)         4330.290294  99.4%         94.1%
  with pgpool (replication mode)             4297.614996  98.6%         87.6%
  with pgpool (replication with strict mode) 4270.223136  98.0%         81.5%

--
Tatsuo Ishii
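[As a quick sanity check, the ratio(7.4.3) column follows directly from the reported TPS figures; the ratio(7.3.6) column is relative to a 7.3.6 baseline not shown in the message, so it cannot be recomputed here.]

```python
# Recompute the ratio(7.4.3) column from the TPS numbers above.
baseline = 4357.625059  # TPS without pgpool
modes = {
    "connection pool mode":         4330.290294,
    "replication mode":             4297.614996,
    "replication with strict mode": 4270.223136,
}
ratios = {m: round(tps / baseline * 100, 1) for m, tps in modes.items()}
for mode, pct in ratios.items():
    print(f"{mode}: {pct}%")   # 99.4%, 98.6%, 98.0%
```

So for 7.4.3, even full replication with strict mode costs only about 2% of raw SELECT throughput in this test.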
Re: [PERFORM] Performance Bottleneck
> Arjen van der Meijden wrote:
>> On 8-8-2004 16:29, Matt Clark wrote:
>>> There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per Apache process, both of which work extremely well and attack the issue at its source. [...]
>> Another version of this 1) is to run with a content accelerator; our favourite is to run Tux in front of Apache. [...]
>>
>> In a simple test I did, I did feel pgpool had quite some overhead, though. So it should be well tested, to find out where the turnover point is at which it becomes a gain instead of a loss...

I don't know what configurations you were using, but I noticed that UNIX domain sockets are preferred for the connection between clients and pgpool. When I tested using pgbench -C (which involves establishing a connection for each transaction), the with-pgpool configuration was 10 times faster than the without-pgpool configuration when using UNIX domain sockets, while there was only a 3.6 times speedup with TCP/IP sockets.

> Other than images, there are very few static pages being loaded up by the user. Since they make up a very small portion of the traffic, it tends to be an optimization we can forgo for now.
>
> I attempted to make use of pgpool. At the default 32 connections pre-forked, the webserver almost immediately tapped out the pgpool base, and content stopped being served because no new processes were being forked to make up for it. So I raised it to a higher value (256) and it immediately segfaulted and dropped core. So I am not sure exactly how to proceed, since I rather need the thing to fork additional servers as load hits, and not the other way around.

What version of pgpool did you test? I know that a certain version (actually 2.0.2) had such a problem. Can you try again with the latest version of pgpool? (It's 2.0.6.)

--
Tatsuo Ishii
[PERFORM] performance with column orders
I couldn't track down recent info in the archives, so I figured I'd ask here. Does the order of columns still have an impact on table speed? Back in the olden days, it used to be that fixed-width columns (integer, tinyint, etc.) should be the first (left) columns in the table and variable-width ones should be towards the end (right). This allowed a database to line up the columns better on disk and give you a speed boost.

So, does Postgres still care about it? And, if so, how much? The posts I found were from two years ago, and indicated that there is a minor increase, but not a lot. Incidentally, could anyone quantify that in any fashion?

Thanks,
-Jim
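[Not an authoritative answer, but the alignment effect being asked about can be sketched with a toy layout calculator. This mimics generic struct-style padding rules, not PostgreSQL's exact on-disk tuple format; the column sizes and alignments below are the usual ones for 8-, 4- and 2-byte integers.]

```python
def padded_size(cols):
    """cols: list of (size, alignment) pairs, laid out in order.
    Each column starts at the next offset that is a multiple of
    its alignment; the skipped bytes are wasted padding."""
    offset = 0
    for size, align in cols:
        offset += (-offset) % align   # pad up to the alignment boundary
        offset += size
    return offset

int8, int4, int2 = (8, 8), (4, 4), (2, 2)

# Putting the int4 first forces 4 bytes of padding before the int8:
print(padded_size([int4, int8, int2]))  # 18

# Widest-first packs the same columns with no padding at all:
print(padded_size([int8, int4, int2]))  # 14
```

Under these assumptions, reordering the same three columns saves 4 bytes per row, which is the kind of "minor increase, but not a lot" the old posts describe.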