Re: Poor man's connection pooling

2000-09-06 Thread Barrie Slaymaker

Michael Peppler wrote:
 
 Based on preliminary tests I was able to use a 1 in 10 ratio of
 database handles per httpd child processes, which, on a large site
 would cut down on the number of connections that the database server
 needs to handle.

I'd be interested to see how this compares with Apache::DBI performance with
MaxRequestsPerChild = 100.  I suspect it's negligible unless you're using
a database like Oracle (with lots of connection establishment overhead) over
a *slow* link.

I don't think there's any significant memory savings here, is there?  Things
you think might share physical memory probably don't after a SQL statement
or two and some perl code gets run.  Oracle certainly changes the RAM in
their connection and cursor handles, for instance.

- Barrie



Re: Poor man's connection pooling

2000-09-06 Thread Perrin Harkins

On Wed, 6 Sep 2000, Jay Strauss wrote:

 Being a database guy but new to Mod_Perl (disclaimer: If these aspects have
 already been implemented and/or talked about please excuse me).

Before going down this road again, I suggest reading the definitive work
on the subject, which is a post from Jeffrey Baker:
http:[EMAIL PROTECTED]

- Perrin




Re: Poor man's connection pooling

2000-09-06 Thread Barrie Slaymaker

Michael Peppler wrote:
 
 The back-end is Sybase. The actual connect time isn't the issue here
 (for me.) It's the sheer number of connections, and the potential
 issue with the number of sockets in CLOSE_WAIT or TIME_WAIT state on
 the database server. We're looking at a farm of 40 front-end servers,
 each running ~150 modperl procs. If each of the modperl procs opens
 one connection that's 6000 connections on the database side.

Thanks, that makes more sense.  I was thrown off by the stated advantage being
that connections don't go down when the apache child dies.  I thought it was
a performance-enhancer.

 I'm not worried about RAM usage on the web servers.

Cool, thanks.

- Barrie



Re: Poor man's connection pooling

2000-09-06 Thread Jeff Horn

One thing I've looked at doing is adding an LRU mechanism (or some such
discipline) to the existing Apache::DBI.

I've already modified Apache::DBI to use 'reauthenticate' if a DBD
implements it, so that only one connection needs to be maintained per
child.  This greatly improves performance while keeping a 1:1 ratio between
Apache children and database connections.

However, some (most) DBDs don't implement this method, and in fact some
databases cannot even do a 'reauthenticate' on an existing connection.
Also, there are situations where performance can be greatly improved by a
judicious choice of the number of connections maintained per child.  An
example of this is where one user id is used to verify authentication info
stored in a database and then, if authenticated, a new user id is used to
actually do the work.  In this case keeping two connections per Apache
child pins the common "authenticating" connection and rotates the other
connection, while still keeping the total number of connections quite low.
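
As a rough sketch only (not the actual Apache::DBI changes; the package
name, DSN, credentials and the per-child cap below are invented), that
two-handles-per-child arrangement could look something like this:

  # Rough per-child sketch: one pinned "auth" handle plus a small,
  # LRU-evicted set of per-user handles.  Not the real Apache::DBI code.
  package My::ChildHandles;
  use strict;
  use DBI ();

  my $MAX_USER_HANDLES = 1;    # 2 connections per child in total
  my $auth_dbh;                # pinned authentication handle
  my %user_dbh;                # user id => handle
  my %last_used;               # user id => last access time (for LRU)

  sub auth_handle {
      $auth_dbh ||= DBI->connect('dbi:Sybase:server=DB', 'webauth', 'xxx',
                                 { RaiseError => 1 });
      return $auth_dbh;
  }

  sub user_handle {
      my ($user, $pass) = @_;
      unless ($user_dbh{$user}) {
          if (keys(%user_dbh) >= $MAX_USER_HANDLES) {
              # evict the least recently used per-user handle
              my ($lru) = sort { $last_used{$a} <=> $last_used{$b} }
                          keys %user_dbh;
              delete($user_dbh{$lru})->disconnect;
              delete $last_used{$lru};
          }
          $user_dbh{$user} = DBI->connect('dbi:Sybase:server=DB',
                                          $user, $pass, { RaiseError => 1 });
      }
      $last_used{$user} = time;
      return $user_dbh{$user};
  }

  1;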

There are other disciplines besides LRU which might work well.  Keeping a
"Hot Set" of connections by counting how often each one is accessed might
be a useful alternative.  I'm sure there are all sorts of interesting
disciplines that could be dreamed up!

I'm looking at potential changes to Apache::DBI which would allow a choice
of discipline (LRU, HotSet, etc.), along with whether or not to use the
DBD 'reauthenticate' function, all configurable from the Apache config
files.  I'd be interested in any input on this course of action!
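
If that ends up being driven from the config files, the startup code might
read something like the following (the PerlSetVar names here are purely
hypothetical):

  # Hypothetical startup.pl fragment reading invented server variables,
  # e.g.  PerlSetVar DBIPoolDiscipline LRU
  #       PerlSetVar DBIPoolReauthenticate On
  use Apache ();

  my $s          = Apache->server;
  my $discipline = $s->dir_config('DBIPoolDiscipline')      || 'LRU';
  my $use_reauth = ($s->dir_config('DBIPoolReauthenticate') || 'Off') eq 'On';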

-- Jeff Horn


 - Original Message -
 From: "Michael Peppler" [EMAIL PROTECTED]
 To: "Barrie Slaymaker" [EMAIL PROTECTED]
 Cc: "Michael Peppler" [EMAIL PROTECTED]; [EMAIL PROTECTED];
 [EMAIL PROTECTED]
 Sent: Wednesday, September 06, 2000 12:24 PM
 Subject: Re: Poor man's connection pooling


  Barrie Slaymaker writes:
    Michael Peppler wrote:
   
     Based on preliminary tests I was able to use a 1 in 10 ratio of
     database handles per httpd child processes, which, on a large site
     would cut down on the number of connections that the database server
     needs to handle.
   
    I'd be interested to see how this compares with Apache::DBI performance
    with MaxRequestsPerChild = 100.  I suspect it's negligible unless you're
    using a database like Oracle (with lots of connection establishment
    overhead) over a *slow* link.
 
  The back-end is Sybase. The actual connect time isn't the issue here
  (for me.) It's the sheer number of connections, and the potential
  issue with the number of sockets in CLOSE_WAIT or TIME_WAIT state on
  the database server. We're looking at a farm of 40 front-end servers,
  each running ~150 modperl procs. If each of the modperl procs opens
  one connection that's 6000 connections on the database side.
 
  Sybase can handle this, but I'd rather use a lower number, hence the
  pooling.
 
    I don't think there's any significant memory savings here, is there?
    Things you think might share physical memory probably don't after a SQL
    statement or two and some perl code gets run.  Oracle certainly changes
    the RAM in their connection and cursor handles, for instance.
 
  I'm not worried about RAM usage on the web servers.
 
  Michael
  --
  Michael Peppler -||-  Data Migrations Inc.
  [EMAIL PROTECTED]   -||-  http://www.mbay.net/~mpeppler
  Int. Sybase User Group  -||-  http://www.isug.com
  Sybase on Linux mailing list: [EMAIL PROTECTED]
 





Re: Poor man's connection pooling

2000-09-06 Thread Stas Bekman

On Wed, 6 Sep 2000, Perrin Harkins wrote:

 On Wed, 6 Sep 2000, Stas Bekman wrote:
  Just a small correction: 
  
  You can cause pages to become unshared in perl just by writing a variable,
                                                         ^^^^^^^
  so it's almost certain to happen sooner or later.
  
  Or for example calling pos() which modifies the variable internals:
  http://perl.apache.org/guide/performance.html#Are_My_Variables_Shared_
 
 If you read a variable in a way that causes it to be converted between a
 numerical value and a string and it hasn't happened before, that will
 change the internal structure and unshare the memory on one or more
 pages.  I'm no perlguts hacker, but I think this is correct.

You are right. I've looked into it with the help of the almighty
Devel::Peek. So what actually happens is this:

Consider this script:
  
  use Devel::Peek;
  my $numerical = 10;
  my $string    = "10";
  $|=1;
  
  dump_numerical();
  read_numerical_as_numerical();
  dump_numerical();
  read_numerical_as_string();
  dump_numerical();
  
  dump_string();
  read_string_as_numerical();
  dump_string();
  read_string_as_string();
  dump_string();
  
  sub read_numerical_as_numerical {
  print "\nReading numerical as numerical: ",
  int($numerical), "\n";
  }
  sub read_numerical_as_string {
  print "\nReading numerical as string: ",
  $numerical, "\n";
  }
  sub read_string_as_numerical {
  print "\nReading string as numerical: ",
  int($string), "\n";
  }
  sub read_string_as_string {
  print "\nReading string as string: ",
  $string, "\n";
  }
  sub dump_numerical {
  print "\nDumping a numerical variable\n";
  Dump($numerical);
  }
  sub dump_string {
  print "\nDumping a string variable\n";
  Dump($string);
  }

When running it:

  Dumping a numerical variable
  SV = IV(0x80e74c0) at 0x80e482c
REFCNT = 4
FLAGS = (PADBUSY,PADMY,IOK,pIOK)
IV = 10
  
  Reading numerical as numerical: 10
  
  Dumping a numerical variable
  SV = PVNV(0x810f960) at 0x80e482c
REFCNT = 4
FLAGS = (PADBUSY,PADMY,IOK,NOK,pIOK,pNOK)
IV = 10
NV = 10
PV = 0
  
  Reading numerical as string: 10
  
  Dumping a numerical variable
  SV = PVNV(0x810f960) at 0x80e482c
REFCNT = 4
FLAGS = (PADBUSY,PADMY,IOK,NOK,POK,pIOK,pNOK,pPOK)
IV = 10
NV = 10
PV = 0x80e78b0 "10"\0
CUR = 2
LEN = 28
  
  Dumping a string variable
  SV = PV(0x80cb87c) at 0x80e8190
REFCNT = 4
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x810f518 "10"\0
CUR = 2
LEN = 3
  
  Reading string as numerical: 10
  
  Dumping a string variable
  SV = PVNV(0x80e78d0) at 0x80e8190
REFCNT = 4
FLAGS = (PADBUSY,PADMY,NOK,POK,pNOK,pPOK)
IV = 0
NV = 10
PV = 0x810f518 "10"\0
CUR = 2
LEN = 3
  
  Reading string as string: 10
  
  Dumping a string variable
  SV = PVNV(0x80e78d0) at 0x80e8190
REFCNT = 4
FLAGS = (PADBUSY,PADMY,NOK,POK,pNOK,pPOK)
IV = 0
NV = 10
PV = 0x810f518 "10"\0
CUR = 2
LEN = 3

So you can clearly see that if you want the data to stay shared (assuming
no other variable on the same page changes its value and dirties the whole
page), and there is a chance that the same variable will be accessed both
as a string and as a numerical value, you have to access the variable in
both ways, as in the above example, before the fork happens.
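
For example, in startup.pl (the variable here is invented):

  # startup.pl fragment (hypothetical variable): force both the string
  # and numeric representations now, so reads in the children won't
  # upgrade the SV later and dirty an otherwise shared page.
  our $max_rows = 500;              # invented configuration value
  my  $as_num   = $max_rows + 0;    # numeric access: IOK/NOK set here
  my  $as_str   = "$max_rows";      # string access:  POK set here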

_
Stas Bekman  JAm_pH --   Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide  http://perl.apache.org/guide 
mailto:[EMAIL PROTECTED]   http://apachetoday.com http://jazzvalley.com
http://singlesheaven.com http://perlmonth.com   perl.org   apache.org





Re: Poor man's connection pooling

2000-09-06 Thread Michael Peppler

Perrin Harkins writes:
  On Tue, 5 Sep 2000, Michael Peppler wrote:
   I've come across a technique that allows modperl processes to share a 
   pool of database handles. It's not something that I have seen
   documented, so I figured I'd throw it out here.
   
   The idea is to create a pool of connections during the main
   apache/modperl startup. Because these connections end up in the code
   segment for the child processes they are completely shareable. You
   populate a hash in a BEGIN block with x connections, and then provide
   some calls to grab the first available one (I use IPC::Semaphore to
   coordinate access to each connection).
  
  People have suggested this before on the mod_perl list and the objection
  raised was that this will fail for the same reason it fails to open a
  filehandle in the parent process and then use it from all the children.  
  Basically, it becomes unshared at some point and even if they don't do
  things simultaneously one process will leave the socket in a state that
  the other doesn't expect and cause problems.  You can cause pages to
  become unshared in perl just by reading a variable, so it's almost certain
  to happen sooner or later.

Yes, that's what I figured too. But in my tests, involving thousands
of database calls, I haven't seen any problems (yet).

  Can you try this some more and maybe throw some artificial loads against
  it to look for possible problems?  It would be cool if this worked, but
  I'm very skeptical until I see it handle higher concurrency without any
  problems.

I will *definitely* throw as much load as I can in a test/stress
environment before I make use of this in a production environment.

I'll let y'all know how this goes...

Michael
-- 
Michael Peppler -||-  Data Migrations Inc.
[EMAIL PROTECTED]   -||-  http://www.mbay.net/~mpeppler
Int. Sybase User Group  -||-  http://www.isug.com
Sybase on Linux mailing list: [EMAIL PROTECTED]



Re: Poor man's connection pooling

2000-09-06 Thread Perrin Harkins

On Wed, 6 Sep 2000, Stas Bekman wrote:
 Just a small correction: 
 
 You can cause pages to become unshared in perl just by writing a variable,
                                                        ^^^^^^^
 so it's almost certain to happen sooner or later.
 
 Or for example calling pos() which modifies the variable internals:
 http://perl.apache.org/guide/performance.html#Are_My_Variables_Shared_

If you read a variable in a way that causes it to be converted between a
numerical value and a string and it hasn't happened before, that will
change the internal structure and unshare the memory on one or more
pages.  I'm no perlguts hacker, but I think this is correct.

- Perrin




Re: Poor man's connection pooling

2000-09-06 Thread Tim Bunce

On Tue, Sep 05, 2000 at 10:38:48AM -0700, Michael Peppler wrote:
 
 The idea is to create a pool of connections during the main
 apache/modperl startup. [...]
 
 This technique works with Sybase connections using either
 DBI/DBD::Sybase or Sybase::CTlib (I've not tested Sybase::DBlib, nor
 any other DBD modules).

For some drivers, like DBD::Oracle, connections generally don't work
right across forks, sadly. MySQL is okay. Not sure about others.
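
A related DBI attribute worth knowing about here is InactiveDestroy: even
when an inherited handle can't safely be used after the fork, setting it in
the child keeps the child's cleanup from closing the parent's connection.
A minimal sketch (the DSN and credentials are invented):

  use strict;
  use DBI ();

  my $dbh = DBI->connect('dbi:Sybase:server=DB', 'user', 'password',
                         { RaiseError => 1 });

  if (my $pid = fork) {
      # parent keeps using $dbh as before
      waitpid($pid, 0);
  }
  else {
      # child: the inherited handle may not be usable with this driver,
      # but marking it inactive stops the child's DESTROY/disconnect
      # from tearing down the parent's socket on exit.
      $dbh->{InactiveDestroy} = 1;
      exit 0;
  }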

Tim.



Re: Poor man's connection pooling

2000-09-06 Thread Stas Bekman

On Tue, 5 Sep 2000, Perrin Harkins wrote:

 On Tue, 5 Sep 2000, Michael Peppler wrote:
  I've come across a technique that allows modperl processes to share a 
  pool of database handles. It's not something that I have seen
  documented, so I figured I'd throw it out here.
  
  The idea is to create a pool of connections during the main
  apache/modperl startup. Because these connections end up in the code
  segment for the child processes they are completely shareable. You
  populate a hash in a BEGIN block with x connections, and then provide
  some calls to grab the first available one (I use IPC::Semaphore to
  coordinate access to each connection).
 
 People have suggested this before on the mod_perl list and the objection
 raised was that this will fail for the same reason it fails to open a
 filehandle in the parent process and then use it from all the children.  
 Basically, it becomes unshared at some point and even if they don't do
 things simultaneously one process will leave the socket in a state that
 the other doesn't expect and cause problems.  You can cause pages to
 become unshared in perl just by reading a variable, so it's almost certain
 to happen sooner or later.

Just a small correction: 

You can cause pages to become unshared in perl just by writing a variable,
                                                        ^^^^^^^
so it's almost certain to happen sooner or later.

Or for example calling pos() which modifies the variable internals:
http://perl.apache.org/guide/performance.html#Are_My_Variables_Shared_

 Can you try this some more and maybe throw some artificial loads against
 it to look for possible problems?  It would be cool if this worked, but
 I'm very skeptical until I see it handle higher concurrency without any
 problems.
 
 - Perrin
 
 



_
Stas Bekman  JAm_pH --   Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide  http://perl.apache.org/guide 
mailto:[EMAIL PROTECTED]   http://apachetoday.com http://jazzvalley.com
http://singlesheaven.com http://perlmonth.com   perl.org   apache.org





Re: Poor man's connection pooling

2000-09-05 Thread Vivek Khera

 "MP" == Michael Peppler [EMAIL PROTECTED] writes:

MP I'd be very interested in any comments, in particular if there are any 
MP downsides that I haven't considered (which is quite possible).

Sounds quite cool.  Would you consider making an Apache:: module for
it?

One question that pops to mind is how the connections are cleaned up if a
child dies while in the middle of using one.  That is, do you have some way
for a reaper process to notice and clean up a connection that is held by a
particular process but has been orphaned?

Other than that, this seems just like the connection pooling you'd get
with a middle layer between mod_perl and the database, but all
embedded within the mod_perl layer.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]       Rockville, MD  +1-301-545-6996
GPG & MIME spoken here            http://www.khera.org/~vivek/



Re: Poor man's connection pooling

2000-09-05 Thread Stas Bekman

On Tue, 5 Sep 2000, Michael Peppler wrote:

 Stas Bekman writes:
   On Tue, 5 Sep 2000, Michael Peppler wrote:
   
The idea is to create a pool of connections during the main
apache/modperl startup. Because these connections end up in the code
segment for the child processes they are completely shareable. You
populate a hash in a BEGIN block with x connections, and then provide
some calls to grab the first available one (I use IPC::Semaphore to
coordinate access to each connection).
   
   There is no such thing as a code segment in Perl. Everything is handled
   as data, so unless you write your code in C and glue it to Perl, you
   cannot ensure that some read/write variable won't land on the same page
   as the data that stores the connections.
 
 I guess it's in the shared data segment then. In any case the
 connections and the datastructures that go with them are visible from
 all the child httpd processes.

The shared segments stay shared only until one of the data structures is
written to, which makes that page (1k to 4k) dirty and unshared. See
http://perl.apache.org/guide/performance.html (the shared memory sections)
for a more detailed explanation of memory management under Perl.
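
The guide measures this with GTop; a small standalone sketch along the same
lines (the data sizes are arbitrary and the reported numbers are
OS-dependent):

  # Sketch: measure a child's shared memory before and after writing to
  # data inherited from the parent (requires the GTop module).
  use strict;
  use GTop ();

  my @inherited = ('x' x 1024) x 2000;      # ~2MB built in the parent

  if (my $pid = fork) {
      waitpid($pid, 0);
  }
  else {
      my $gtop   = GTop->new;
      my $before = $gtop->proc_mem($$)->share;
      $_ .= 'y' for @inherited;             # write => pages go unshared
      my $after  = $gtop->proc_mem($$)->share;
      printf "child shared memory: %d -> %d bytes\n", $before, $after;
      exit 0;
  }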

The advantage of this solution is that connections don't go down when
the apache child dies (for example because the MAX_REQUEST per child
has been reached).
   
   If there were a way for a child to know its process slot as seen by
   Apache::Status, there would be no need for any semaphore technique --
   each process would access its connection by asking for its slot
   number. This works because when a child is reaped, the new process
   replacing it runs in the same slot. That looks like a much faster way
   to me, if you can get hold of the slot number.
 
 I'll look into that - thanks.

You are very welcome. It'd be cool to have yet another neat technique to
squeeze more from mod_perl :)

_
Stas Bekman  JAm_pH --   Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide  http://perl.apache.org/guide 
mailto:[EMAIL PROTECTED]   http://apachetoday.com http://jazzvalley.com
http://singlesheaven.com http://perlmonth.com   perl.org   apache.org





Poor man's connection pooling

2000-09-05 Thread Michael Peppler

I've come across a technique that allows modperl processes to share a 
pool of database handles. It's not something that I have seen
documented, so I figured I'd throw it out here.

The idea is to create a pool of connections during the main
apache/modperl startup. Because these connections end up in the code
segment for the child processes they are completely shareable. You
populate a hash in a BEGIN block with x connections, and then provide
some calls to grab the first available one (I use IPC::Semaphore to
coordinate access to each connection).
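
A minimal sketch of what such a startup file could look like, assuming
DBD::Sybase and IPC::Semaphore as above (the package name, pool size, DSN
and credentials are all invented, and this is not the actual code):

  # startup.pl -- minimal sketch of the pooling idea described above.
  # Assumes DBD::Sybase; package name, DSN and credentials are invented.
  package My::DBPool;
  use strict;
  use DBI ();
  use IPC::SysV qw(IPC_PRIVATE IPC_CREAT IPC_NOWAIT SEM_UNDO S_IRWXU);
  use IPC::Semaphore ();

  my $POOL_SIZE = 5;            # e.g. ~1 handle per 10 httpd children
  my @pool;                     # connections opened in the parent httpd
  my $sem;                      # one SysV semaphore per connection

  # This runs in the parent at server startup (PerlRequire/BEGIN time),
  # so every forked child inherits the same open connections.
  for my $i (0 .. $POOL_SIZE - 1) {
      $pool[$i] = DBI->connect('dbi:Sybase:server=MYSERVER',
                               'user', 'password', { RaiseError => 1 });
  }
  $sem = IPC::Semaphore->new(IPC_PRIVATE, $POOL_SIZE, S_IRWXU | IPC_CREAT)
      or die "semget failed: $!";
  $sem->setall((1) x $POOL_SIZE);   # all connections start out free

  # Grab the first free connection; if none is free, block on one slot.
  sub checkout {
      for my $i (0 .. $POOL_SIZE - 1) {
          return ($i, $pool[$i])
              if $sem->op($i, -1, IPC_NOWAIT | SEM_UNDO);
      }
      my $i = $$ % $POOL_SIZE;      # nothing free: wait for "our" slot
      $sem->op($i, -1, SEM_UNDO);
      return ($i, $pool[$i]);
  }

  # Release a connection back to the pool.
  sub checkin {
      my ($i) = @_;
      $sem->op($i, 1, SEM_UNDO);
  }

  1;

A handler would then wrap its database work in something like
my ($i, $dbh) = My::DBPool::checkout(); ...; My::DBPool::checkin($i);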

Based on preliminary tests I was able to use a 1 in 10 ratio of
database handles per httpd child processes, which, on a large site
would cut down on the number of connections that the database server
needs to handle.

The advantage of this solution is that connections don't go down when
the apache child dies (for example because the MAX_REQUEST per child
has been reached).

The disadvantage is that the connections can only be restarted by
restarting the webserver (apachectl stop/apachectl start).

This technique works with Sybase connections using either
DBI/DBD::Sybase or Sybase::CTlib (I've not tested Sybase::DBlib, nor
any other DBD modules).

I'd be very interested in any comments, in particular if there are any 
downsides that I haven't considered (which is quite possible).

Michael
-- 
Michael Peppler -||-  Data Migrations Inc.
[EMAIL PROTECTED]   -||-  http://www.mbay.net/~mpeppler
Int. Sybase User Group  -||-  http://www.isug.com
Sybase on Linux mailing list: [EMAIL PROTECTED]



Re: Poor man's connection pooling

2000-09-05 Thread Stas Bekman

On Tue, 5 Sep 2000, Michael Peppler wrote:

 I've come across a technique that allows modperl processes to share a 
 pool of database handles. It's not something that I have seen
 documented, so I figured I'd throw it out here.
 
 The idea is to create a pool of connections during the main
 apache/modperl startup. Because these connections end up in the code
 segment for the child processes they are completely shareable. You
 populate a hash in a BEGIN block with x connections, and then provide
 some calls to grab the first available one (I use IPC::Semaphore to
 coordinate access to each connection).

There is no such thing as a code segment in Perl. Everything is handled
as data, so unless you write your code in C and glue it to Perl, you
cannot ensure that some read/write variable won't land on the same page
as the data that stores the connections.

 Based on preliminary tests I was able to use a 1 in 10 ratio of
 database handles per httpd child processes, which, on a large site
 would cut down on the number of connections that the database server
 needs to handle.
 
 The advantage of this solution is that connections don't go down when
 the apache child dies (for example because the MAX_REQUEST per child
 has been reached).

If there were a way for a child to know its process slot as seen by
Apache::Status, there would be no need for any semaphore technique --
each process would access its connection by asking for its slot
number. This works because when a child is reaped, the new process
replacing it runs in the same slot. That looks like a much faster way
to me, if you can get hold of the slot number.

 The disadvantage is that the connections can only be restarted by
 restarting the webserver (apachectl stop/apachectl start).
 
 This technique works with Sybase connections using either
 DBI/DBD::Sybase or Sybase::CTlib (I've not tested Sybase::DBlib, nor
 any other DBD modules).
 
 I'd be very interested in any comments, in particular if there are any 
 downsides that I haven't considered (which is quite possible).
 
 Michael
 -- 
 Michael Peppler -||-  Data Migrations Inc.
 [EMAIL PROTECTED]   -||-  http://www.mbay.net/~mpeppler
 Int. Sybase User Group  -||-  http://www.isug.com
 Sybase on Linux mailing list: [EMAIL PROTECTED]
 



_
Stas Bekman  JAm_pH --   Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide  http://perl.apache.org/guide 
mailto:[EMAIL PROTECTED]   http://apachetoday.com http://jazzvalley.com
http://singlesheaven.com http://perlmonth.com   perl.org   apache.org





Re: Poor man's connection pooling

2000-09-05 Thread Michael Peppler

Stas Bekman writes:
  On Tue, 5 Sep 2000, Michael Peppler wrote:
  
   The idea is to create a pool of connections during the main
   apache/modperl startup. Because these connections end up in the code
   segment for the child processes they are completely shareable. You
   populate a hash in a BEGIN block with x connections, and then provide
   some calls to grab the first available one (I use IPC::Semaphore to
   coordinate access to each connection).
  
  There is no such thing as a code segment in Perl. Everything is handled
  as data, so unless you write your code in C and glue it to Perl, you
  cannot ensure that some read/write variable won't land on the same page
  as the data that stores the connections.

I guess it's in the shared data segment then. In any case the
connections and the datastructures that go with them are visible from
all the child httpd processes.

   The advantage of this solution is that connections don't go down when
   the apache child dies (for example because the MAX_REQUEST per child
   has been reached).
  
  If there were a way for a child to know its process slot as seen by
  Apache::Status, there would be no need for any semaphore technique --
  each process would access its connection by asking for its slot
  number. This works because when a child is reaped, the new process
  replacing it runs in the same slot. That looks like a much faster way
  to me, if you can get hold of the slot number.

I'll look into that - thanks.

Michael
-- 
Michael Peppler -||-  Data Migrations Inc.
[EMAIL PROTECTED]   -||-  http://www.mbay.net/~mpeppler
Int. Sybase User Group  -||-  http://www.isug.com
Sybase on Linux mailing list: [EMAIL PROTECTED]