Re: Poor man's connection pooling
Michael Peppler wrote:
> Based on preliminary tests I was able to use a 1 in 10 ratio of
> database handles per httpd child processes, which, on a large site
> would cut down on the number of connections that the database server
> needs to handle.

I'd be interested to see how this compares with Apache::DBI performance
with MaxRequestsPerChild = 100. I suspect it's negligible unless you're
using a database like Oracle (with lots of connection establishment
overhead) over a *slow* link.

I don't think there's any significant memory savings here, is there?
Things you think might share physical memory probably don't after a SQL
statement or two and some perl code gets run. Oracle certainly changes
the RAM in their connection and cursor handles, for instance.

- Barrie
Re: Poor man's connection pooling
On Wed, 6 Sep 2000, Jay Strauss wrote:
> Being a database guy but new to Mod_Perl (disclaimer: if these aspects
> have already been implemented and/or talked about please excuse me).

Before going down this road again, I suggest reading the definitive work
on the subject, which is a post from Jeffrey Baker:
http:[EMAIL PROTECTED]

- Perrin
Re: Poor man's connection pooling
Michael Peppler wrote:
> The back-end is Sybase. The actual connect time isn't the issue here
> (for me.) It's the sheer number of connections, and the potential
> issue with the number of sockets in CLOSE_WAIT or TIME_WAIT state on
> the database server. We're looking at a farm of 40 front-end servers,
> each running ~150 modperl procs. If each of the modperl procs opens
> one connection that's 6000 connections on the database side.

Thanks, that makes more sense. I was thrown off by the stated advantage
being that connections don't go down when the apache child dies. I
thought it was a performance-enhancer.

> I'm not worried about RAM usage on the web servers.

Cool, thanks.

- Barrie
Re: Poor man's connection pooling
One thing I've looked at doing is adding an LRU mechanism (or some such
discipline) to the existing Apache::DBI. I've already modified
Apache::DBI to use 'reauthenticate' if a DBD implements it, so that only
one connection needs to be maintained per child. This greatly improves
performance with a 1:1 ratio between Apache children and database
connections. However, some (most) DBDs don't implement this method, and
in fact some databases cannot even do a 'reauthenticate' on an existing
connection.

Also, there are situations where performance can be greatly improved by
a judicious choice of the number of connections maintained per child. An
example of this is where one user is used to verify authentication info
stored in a database, and then, if authenticated, a new user id is used
to actually do the work. In this case keeping 2 connections per Apache
child pins the common "authenticating" connection and rotates the other
connection, while still keeping the total number of connections quite
low.

There are other disciplines besides LRU which might work well. Keeping a
"Hot Set" of connections by counting how often each is accessed might be
a useful alternative. I'm sure there are all sorts of interesting
disciplines that could be dreamed up! I'm looking at potential changes
to Apache::DBI which would allow a choice of discipline (LRU, HotSet,
etc.), along with whether or not to use the 'reauthenticate' DBD
function, all configurable from apache config files.

I'd be interested in any input on this course of action!
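For concreteness, a per-child LRU handle cache of the kind described
above might be sketched like this. This is not the real Apache::DBI
code; the package name, capacity, and cache-key scheme are invented for
illustration, and error handling is minimal:

```perl
# Hypothetical sketch of an LRU-capped per-child handle cache.
# Not real Apache::DBI internals -- names are illustrative only.
package My::DBIPool;
use strict;
use DBI;

my %cache;   # "dsn;user" => cached database handle
my @lru;     # cache keys, least-recently-used first
my $MAX = 2; # e.g. one pinned "authenticating" handle + one rotating

sub connect {
    my ($dsn, $user, $pass) = @_;
    my $key = "$dsn;$user";

    if (my $dbh = $cache{$key}) {
        # Cache hit: move this key to the most-recently-used position.
        @lru = grep { $_ ne $key } @lru;
        push @lru, $key;
        return $dbh;
    }

    # Cache miss: evict the least-recently-used handle if at capacity.
    if (@lru >= $MAX) {
        my $old = shift @lru;
        $cache{$old}->disconnect;
        delete $cache{$old};
    }

    my $dbh = DBI->connect($dsn, $user, $pass)
        or die "connect failed: $DBI::errstr";
    $cache{$key} = $dbh;
    push @lru, $key;
    return $dbh;
}

1;
```

With $MAX = 2 and the authenticate-then-work pattern above, the
"authenticating" handle keeps being re-used (and so never falls off the
LRU list), while the various per-user work handles compete for the
second slot, just as described.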
-- Jeff Horn

- Original Message -
From: "Michael Peppler" [EMAIL PROTECTED]
To: "Barrie Slaymaker" [EMAIL PROTECTED]
Cc: "Michael Peppler" [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, September 06, 2000 12:24 PM
Subject: Re: Poor man's connection pooling

Barrie Slaymaker writes:
> Michael Peppler wrote:
> > Based on preliminary tests I was able to use a 1 in 10 ratio of
> > database handles per httpd child processes, which, on a large site
> > would cut down on the number of connections that the database
> > server needs to handle.
>
> I'd be interested to see how this compares with Apache::DBI
> performance with MaxRequestsPerChild = 100. I suspect it's negligible
> unless you're using a database like Oracle (with lots of connection
> establishment overhead) over a *slow* link.

The back-end is Sybase. The actual connect time isn't the issue here
(for me.) It's the sheer number of connections, and the potential issue
with the number of sockets in CLOSE_WAIT or TIME_WAIT state on the
database server.

We're looking at a farm of 40 front-end servers, each running ~150
modperl procs. If each of the modperl procs opens one connection that's
6000 connections on the database side. Sybase can handle this, but I'd
rather use a lower number, hence the pooling.

> I don't think there's any significant memory savings here, is there?
> Things you think might share physical memory probably don't after a
> SQL statement or two and some perl code gets run. Oracle certainly
> changes the RAM in their connection and cursor handles, for instance.

I'm not worried about RAM usage on the web servers.

Michael

--
Michael Peppler -||- Data Migrations Inc.
[EMAIL PROTECTED] -||- http://www.mbay.net/~mpeppler
Int. Sybase User Group -||- http://www.isug.com
Sybase on Linux mailing list: [EMAIL PROTECTED]
Re: Poor man's connection pooling
On Wed, 6 Sep 2000, Perrin Harkins wrote:
> On Wed, 6 Sep 2000, Stas Bekman wrote:
> > Just a small correction: You can cause pages to become unshared in
> > perl just by *writing* a variable, so it's almost certain to happen
> > sooner or later. Or for example calling pos() which modifies the
> > variable internals:
> > http://perl.apache.org/guide/performance.html#Are_My_Variables_Shared_
>
> If you read a variable in a way that causes it to be converted
> between a numerical value and a string and it hasn't happened before,
> that will change the internal structure and unshare the memory on one
> or more pages. I'm no perlguts hacker, but I think this is correct.

You are right. I've looked into it with the help of the almighty
Devel::Peek. So what actually happens is this. Consider this script:

  use Devel::Peek;

  my $numerical = 10;
  my $string    = "10";

  $| = 1;

  dump_numerical();
  read_numerical_as_numerical();
  dump_numerical();
  read_numerical_as_string();
  dump_numerical();

  dump_string();
  read_string_as_numerical();
  dump_string();
  read_string_as_string();
  dump_string();

  sub read_numerical_as_numerical {
    print "\nReading numerical as numerical: ", int($numerical), "\n";
  }
  sub read_numerical_as_string {
    print "\nReading numerical as string: ", $numerical, "\n";
  }
  sub read_string_as_numerical {
    print "\nReading string as numerical: ", int($string), "\n";
  }
  sub read_string_as_string {
    print "\nReading string as string: ", $string, "\n";
  }
  sub dump_numerical {
    print "\nDumping a numerical variable\n";
    Dump($numerical);
  }
  sub dump_string {
    print "\nDumping a string variable\n";
    Dump($string);
  }

When running it:

  Dumping a numerical variable
  SV = IV(0x80e74c0) at 0x80e482c
    REFCNT = 4
    FLAGS = (PADBUSY,PADMY,IOK,pIOK)
    IV = 10

  Reading numerical as numerical: 10

  Dumping a numerical variable
  SV = PVNV(0x810f960) at 0x80e482c
    REFCNT = 4
    FLAGS = (PADBUSY,PADMY,IOK,NOK,pIOK,pNOK)
    IV = 10
    NV = 10
    PV = 0

  Reading numerical as string: 10

  Dumping a numerical variable
  SV = PVNV(0x810f960) at 0x80e482c
    REFCNT = 4
    FLAGS = (PADBUSY,PADMY,IOK,NOK,POK,pIOK,pNOK,pPOK)
    IV = 10
    NV = 10
    PV = 0x80e78b0 "10"\0
    CUR = 2
    LEN = 28

  Dumping a string variable
  SV = PV(0x80cb87c) at 0x80e8190
    REFCNT = 4
    FLAGS = (PADBUSY,PADMY,POK,pPOK)
    PV = 0x810f518 "10"\0
    CUR = 2
    LEN = 3

  Reading string as numerical: 10

  Dumping a string variable
  SV = PVNV(0x80e78d0) at 0x80e8190
    REFCNT = 4
    FLAGS = (PADBUSY,PADMY,NOK,POK,pNOK,pPOK)
    IV = 0
    NV = 10
    PV = 0x810f518 "10"\0
    CUR = 2
    LEN = 3

  Reading string as string: 10

  Dumping a string variable
  SV = PVNV(0x80e78d0) at 0x80e8190
    REFCNT = 4
    FLAGS = (PADBUSY,PADMY,NOK,POK,pNOK,pPOK)
    IV = 0
    NV = 10
    PV = 0x810f518 "10"\0
    CUR = 2
    LEN = 3

So you can clearly see that if you want the data to stay shared (unless
some other variable on the same page happens to change its value, and
thus dirties the whole page), and there is a chance that the same
variable will be accessed both as a string and as a numerical value, you
have to access this variable in both ways, as in the above example,
before the fork happens.

_
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]
http://apachetoday.com http://jazzvalley.com http://singlesheaven.com
http://perlmonth.com perl.org apache.org
Re: Poor man's connection pooling
Perrin Harkins writes:
> On Tue, 5 Sep 2000, Michael Peppler wrote:
> > I've come across a technique that allows modperl processes to share
> > a pool of database handles. It's not something that I have seen
> > documented, so I figured I'd throw it out here. The idea is to
> > create a pool of connections during the main apache/modperl
> > startup. Because these connections end up in the code segment for
> > the child processes they are completely shareable. You populate a
> > hash in a BEGIN block with x connections, and then provide some
> > calls to grab the first available one (I use IPC::Semaphore to
> > coordinate access to each connection).
>
> People have suggested this before on the mod_perl list, and the
> objection raised was that this will fail for the same reason it fails
> to open a filehandle in the parent process and then use it from all
> the children. Basically, it becomes unshared at some point, and even
> if they don't do things simultaneously, one process will leave the
> socket in a state that the other doesn't expect and cause problems.
> You can cause pages to become unshared in perl just by reading a
> variable, so it's almost certain to happen sooner or later.

Yes, that's what I figured too. But in my tests, involving thousands of
database calls, I haven't seen any problems (yet).

> Can you try this some more and maybe throw some artificial loads
> against it to look for possible problems? It would be cool if this
> worked, but I'm very skeptical until I see it handle higher
> concurrency without any problems.

I will *definitely* throw as much load as I can in a test/stress
environment before I make use of this in a production environment. I'll
let y'all know how this goes...

Michael
Re: Poor man's connection pooling
On Wed, 6 Sep 2000, Stas Bekman wrote:
> Just a small correction: You can cause pages to become unshared in
> perl just by *writing* a variable, so it's almost certain to happen
> sooner or later. Or for example calling pos() which modifies the
> variable internals:
> http://perl.apache.org/guide/performance.html#Are_My_Variables_Shared_

If you read a variable in a way that causes it to be converted between a
numerical value and a string and it hasn't happened before, that will
change the internal structure and unshare the memory on one or more
pages. I'm no perlguts hacker, but I think this is correct.

- Perrin
Re: Poor man's connection pooling
On Tue, Sep 05, 2000 at 10:38:48AM -0700, Michael Peppler wrote:
> The idea is to create a pool of connections during the main
> apache/modperl startup.
> [...]
> This technique works with Sybase connections using either
> DBI/DBD::Sybase or Sybase::CTlib (I've not tested Sybase::DBlib, nor
> any other DBD modules).

For some drivers, like DBD::Oracle, connections generally don't work
right across forks, sadly. MySQL is okay. Not sure about others.

Tim.
Re: Poor man's connection pooling
On Tue, 5 Sep 2000, Perrin Harkins wrote:
> On Tue, 5 Sep 2000, Michael Peppler wrote:
> > I've come across a technique that allows modperl processes to share
> > a pool of database handles. It's not something that I have seen
> > documented, so I figured I'd throw it out here. The idea is to
> > create a pool of connections during the main apache/modperl
> > startup. Because these connections end up in the code segment for
> > the child processes they are completely shareable. You populate a
> > hash in a BEGIN block with x connections, and then provide some
> > calls to grab the first available one (I use IPC::Semaphore to
> > coordinate access to each connection).
>
> People have suggested this before on the mod_perl list and the
> objection raised was that this will fail for the same reason it fails
> to open a filehandle in the parent process and then use it from all
> the children. Basically, it becomes unshared at some point and even
> if they don't do things simultaneously one process will leave the
> socket in a state that the other doesn't expect and cause problems.
> You can cause pages to become unshared in perl just by reading a
> variable, so it's almost certain to happen sooner or later.

Just a small correction: You can cause pages to become unshared in perl
just by *writing* a variable, so it's almost certain to happen sooner or
later. Or for example calling pos() which modifies the variable
internals:
http://perl.apache.org/guide/performance.html#Are_My_Variables_Shared_

> Can you try this some more and maybe throw some artificial loads
> against it to look for possible problems? It would be cool if this
> worked, but I'm very skeptical until I see it handle higher
> concurrency without any problems.
> - Perrin

_
Stas Bekman
Re: Poor man's connection pooling
"MP" == Michael Peppler [EMAIL PROTECTED] writes: MP I'd be very interested in any comments, in particular if there are any MP downsides that I haven't considered (which is quite possible). Sounds quite cool. Would you consider making an Apache:: module for it? One question that pops to mind is how are the connections cleaned up if a child client dies while in the middle of processing it? That is, do yo have some way for a reaper process to notice and cleanup a connection that is being used by a particular process but orphaned? Other than that, this seems just like the connection pooling you'd get with a middle layer between mod_perl and the database, but all embedded within the mod_perl layer. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D.Khera Communications, Inc. Internet: [EMAIL PROTECTED] Rockville, MD +1-301-545-6996 GPG MIME spoken herehttp://www.khera.org/~vivek/
Re: Poor man's connection pooling
On Tue, 5 Sep 2000, Michael Peppler wrote:
> Stas Bekman writes:
> > On Tue, 5 Sep 2000, Michael Peppler wrote:
> > > The idea is to create a pool of connections during the main
> > > apache/modperl startup. Because these connections end up in the
> > > code segment for the child processes they are completely
> > > shareable. You populate a hash in a BEGIN block with x
> > > connections, and then provide some calls to grab the first
> > > available one (I use IPC::Semaphore to coordinate access to each
> > > connection).
> >
> > There is no such thing as a code segment in Perl. Everything is
> > handled as data, so unless you write your code in C and glue it to
> > Perl, you cannot ensure that some read/write variable wouldn't land
> > on the same page with the data that stores the connections.
>
> I guess it's in the shared data segment then. In any case the
> connections and the datastructures that go with them are visible from
> all the child httpd processes.

The shared segments are shared only until the moment when one of the
data structures is written to, which makes the page (1k to 4k) dirty and
unshared. See http://perl.apache.org/guide/performance.html (the shared
memory sections) for a more detailed explanation of the memory
management under Perl.

> > > The advantage of this solution is that connections don't go down
> > > when the apache child dies (for example because the MAX_REQUEST
> > > per child has been reached).
> >
> > If there were a way for a child to know its process slot as seen by
> > Apache::Status, there would be no need for any semaphoring
> > technique -- each process would access the connection by asking for
> > its slot number. This works because when a child is reaped, the new
> > process replacing it runs in the same slot. Looks a much faster way
> > to me if you can get hold of the slot number.
>
> I'll look into that - thanks.

You are very welcome. It'd be cool to have yet another neat technique to
squeeze more from mod_perl :)

_
Stas Bekman
Poor man's connection pooling
I've come across a technique that allows modperl processes to share a
pool of database handles. It's not something that I have seen
documented, so I figured I'd throw it out here.

The idea is to create a pool of connections during the main
apache/modperl startup. Because these connections end up in the code
segment for the child processes they are completely shareable. You
populate a hash in a BEGIN block with x connections, and then provide
some calls to grab the first available one (I use IPC::Semaphore to
coordinate access to each connection).

Based on preliminary tests I was able to use a 1 in 10 ratio of database
handles per httpd child processes, which, on a large site would cut down
on the number of connections that the database server needs to handle.

The advantage of this solution is that connections don't go down when
the apache child dies (for example because the MAX_REQUEST per child has
been reached). The disadvantage is that the connections can only be
restarted by restarting the webserver (apachectl stop/apachectl start).

This technique works with Sybase connections using either
DBI/DBD::Sybase or Sybase::CTlib (I've not tested Sybase::DBlib, nor any
other DBD modules).

I'd be very interested in any comments, in particular if there are any
downsides that I haven't considered (which is quite possible).

Michael

--
Michael Peppler -||- Data Migrations Inc.
[EMAIL PROTECTED] -||- http://www.mbay.net/~mpeppler
Int. Sybase User Group -||- http://www.isug.com
Sybase on Linux mailing list: [EMAIL PROTECTED]
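A stripped-down sketch of what such a startup-time pool might look like,
loaded from startup.pl before the children are forked. The package name,
DSN, pool size, and retry loop are all invented for illustration (the
original post doesn't show code), error handling is omitted, and the
objections raised later in this thread about pages becoming unshared
apply in full:

```perl
# Illustrative sketch only -- invented names, no error handling.
# Loaded at server startup, before Apache forks its children, so the
# connections and the semaphore set are inherited by every child.
package My::HandlePool;
use strict;
use DBI;
use IPC::Semaphore;
use IPC::SysV qw(IPC_PRIVATE IPC_NOWAIT S_IRWXU SEM_UNDO);

my @pool;   # the shared connections, created in the parent
my $sem;    # one SysV semaphore per connection, guards concurrent use

BEGIN {
    my $n = 5;  # e.g. roughly 1 handle per 10 httpd children
    @pool = map {
        DBI->connect('dbi:Sybase:server=MYSERVER', 'user', 'pass')
    } 1 .. $n;
    $sem = IPC::Semaphore->new(IPC_PRIVATE, $n, S_IRWXU);
    $sem->setall((1) x $n);     # every connection starts out free
}

# Block until some connection is free; return its index and handle.
sub grab {
    for (;;) {
        for my $i (0 .. $#pool) {
            # Non-blocking decrement: true only if we won this slot.
            # SEM_UNDO releases the slot if this child dies mid-request.
            return ($i, $pool[$i])
                if $sem->op($i, -1, SEM_UNDO | IPC_NOWAIT);
        }
        select(undef, undef, undef, 0.01);  # brief nap, then retry
    }
}

# Mark the slot free again once the request is done with it.
sub release {
    my ($i) = @_;
    $sem->op($i, 1, SEM_UNDO);
}

1;
```

A handler would then bracket its database work with
`my ($i, $dbh) = My::HandlePool::grab(); ...; My::HandlePool::release($i);`.
Note that SEM_UNDO only returns the semaphore slot when a child dies; it
does nothing about whatever state the dead child left on the connection
itself, which is one of the cleanup questions raised in this thread.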
Re: Poor man's connection pooling
On Tue, 5 Sep 2000, Michael Peppler wrote:
> I've come across a technique that allows modperl processes to share a
> pool of database handles. It's not something that I have seen
> documented, so I figured I'd throw it out here. The idea is to create
> a pool of connections during the main apache/modperl startup. Because
> these connections end up in the code segment for the child processes
> they are completely shareable. You populate a hash in a BEGIN block
> with x connections, and then provide some calls to grab the first
> available one (I use IPC::Semaphore to coordinate access to each
> connection).

There is no such thing as a code segment in Perl. Everything is handled
as data, so unless you write your code in C and glue it to Perl, you
cannot ensure that some read/write variable wouldn't land on the same
page with the data that stores the connections.

> Based on preliminary tests I was able to use a 1 in 10 ratio of
> database handles per httpd child processes, which, on a large site
> would cut down on the number of connections that the database server
> needs to handle. The advantage of this solution is that connections
> don't go down when the apache child dies (for example because the
> MAX_REQUEST per child has been reached).

If there were a way for a child to know its process slot as seen by
Apache::Status, there would be no need for any semaphoring technique --
each process would access the connection by asking for its slot number.
This works because when a child is reaped, the new process replacing it
runs in the same slot. Looks a much faster way to me if you can get hold
of the slot number.

> The disadvantage is that the connections can only be restarted by
> restarting the webserver (apachectl stop/apachectl start). This
> technique works with Sybase connections using either DBI/DBD::Sybase
> or Sybase::CTlib (I've not tested Sybase::DBlib, nor any other DBD
> modules). I'd be very interested in any comments, in particular if
> there are any downsides that I haven't considered (which is quite
> possible).

_
Stas Bekman
Re: Poor man's connection pooling
Stas Bekman writes:
> On Tue, 5 Sep 2000, Michael Peppler wrote:
> > The idea is to create a pool of connections during the main
> > apache/modperl startup. Because these connections end up in the
> > code segment for the child processes they are completely shareable.
> > You populate a hash in a BEGIN block with x connections, and then
> > provide some calls to grab the first available one (I use
> > IPC::Semaphore to coordinate access to each connection).
>
> There is no such thing as a code segment in Perl. Everything is
> handled as data, so unless you write your code in C and glue it to
> Perl, you cannot ensure that some read/write variable wouldn't land
> on the same page with the data that stores the connections.

I guess it's in the shared data segment then. In any case the
connections and the datastructures that go with them are visible from
all the child httpd processes.

> > The advantage of this solution is that connections don't go down
> > when the apache child dies (for example because the MAX_REQUEST per
> > child has been reached).
>
> If there were a way for a child to know its process slot as seen by
> Apache::Status, there would be no need for any semaphoring technique
> -- each process would access the connection by asking for its slot
> number. This works because when a child is reaped, the new process
> replacing it runs in the same slot. Looks a much faster way to me if
> you can get hold of the slot number.

I'll look into that - thanks.

Michael