Re: Recommended load balancing solutions?

2007-01-11 Thread Ofer Nave

Henri Asseily wrote:

Take the example of the pod:

$DATABASE::conf{'test'} = {
 max_retries = 2,
 db_stack = [
  [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
  [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
  [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
 ], ...

Assume his 'test' handle is a read-only handle.
The stack definition shows that you have 3 read-only servers: prod1, 
prod2 and prod3.
The particular process that uses this stack will start with prod1, and 
if it fails max_retries times, it will switch to the next one in the 
stack, prod2, and so on.
If you want to have your application to be load-balanced across all the 
read-only databases, you can do it in many ways, but ultimately it boils 
down to having your children processes hitting different databases, and 
having stacks that are as unique as possible (so that if a db server 
fails, all its dependents don't fail over to the same secondary).


Here's a complex example:
In the case above, say you have 20 children for those 3 databases.
The different stack permutations are:

1: prod1,prod2,prod3
2: prod1,prod3,prod2
3: prod2,prod1,prod3
4: prod2,prod3,prod1
5: prod3,prod1,prod2
6: prod3,prod2,prod1

Either dynamically or statically create these 6 permutations and put 
them in @all_db_stacks:
Then when one of your 20 children starts up, it will take a stack. 
Either have each take the next stack down, or randomly take a stack. One 
example under Apache if you don't want to have a shared global is to 
grab the child's process id and get its modulus by the size of 
@all_db_stack:

$stack_number = $$ % scalar(@all_db_stacks)

And so in the example config above:
db_stack = $all_db_stacks[$stack_number]


Couldn't an option be added as an alternative to db_stack that indicates 
a desire for random ordering, thereby eliminating the need to precreate 
all possible permutations?  I'm imaging something like this (modified 
version of your example):


$DATABASE::conf{'test'} = {
  max_retries = 2,
  db_pool = [
   [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
   [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
   [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
  ], ...

So, db_stack maintains ordering, while db_pool tells DBIx::HA to 
randomize the list of servers.


-ofer


load-balancing with DBIx::HA (was: Re: Recommended load balancing solutions?)

2007-01-11 Thread Mark Stosberg
Ofer Nave wrote:
 
 Couldn't an option be added as an alternative to db_stack that indicates
 a desire for random ordering, thereby eliminating the need to precreate
 all possible permutations?  I'm imaging something like this (modified
 version of your example):
 
 $DATABASE::conf{'test'} = {
   max_retries = 2,
   db_pool = [
[ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
[ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
[ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
   ], ...
 
 So, db_stack maintains ordering, while db_pool tells DBIx::HA to
 randomize the list of servers.

I like the idea of having this feature built-in, but I think the name
could clearer, perhaps call it load balance:

$DATABASE::conf{'test'} = {
   load_balance = 1,
   db_stack = [
[ 'dbi:Sybase:server=prod1;database=test','user1','pass1',$attrib ],
[ 'dbi:Sybase:server=prod2;database=test','user2','pass2',$attrib ],
   ], ...


   Mark


Re: load-balancing with DBIx::HA (was: Re: Recommended load balancing solutions?)

2007-01-11 Thread Henri Asseily

On Jan 11, 2007, at 3:18 PM, Mark Stosberg wrote:


Ofer Nave wrote:


Couldn't an option be added as an alternative to db_stack that  
indicates
a desire for random ordering, thereby eliminating the need to  
precreate

all possible permutations?  I'm imaging something like this (modified
version of your example):

$DATABASE::conf{'test'} = {
  max_retries = 2,
  db_pool = [
   [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1',  
$attrib ],
   [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2',  
$attrib ],
   [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3',  
$attrib ],

  ], ...

So, db_stack maintains ordering, while db_pool tells DBIx::HA to
randomize the list of servers.


I like the idea of having this feature built-in, but I think the name
could clearer, perhaps call it load balance:

$DATABASE::conf{'test'} = {
   load_balance = 1,
   db_stack = [
[ 'dbi:Sybase:server=prod1;database=test','user1','pass1', 
$attrib ],
[ 'dbi:Sybase:server=prod2;database=test','user2','pass2', 
$attrib ],

   ], ...


The problem is that there are too many ways to load balance, which  
gets even more confusing when you use presistent connections such as  
those in mod_perl. Unless you have a large number of client processes  
compared the the servers, when you use persistent connections you're  
rarely going to be happily load balancing in a way that is equitable.  
Certain algorithms are better for certain situations such as a lower  
ratio of clients to servers, for example.
And since the developer knows her systems better than I do, let her  
write a proper balancing routine for her needs. The trivial ones are  
unnecessary for me to include since they're trivial, and the more  
advanced ones are too specific to a situation for the general public  
(i.e. probably the 3 people using this module) to care about...


As an interesting exercise, I'd be curious to see if you and Ofer can  
agree on a load balancing routine to use. :-)


H



Re: Recommended load balancing solutions?

2007-01-11 Thread Peter J. Holzer
On 2007-01-10 15:06:56 -0800, Ofer Nave wrote:
 Henri Asseily wrote:
 Take the example of the pod:
 $DATABASE::conf{'test'} = {
  max_retries = 2,
  db_stack = [
   [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
   [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
   [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
  ], ...
[...]
 Here's a complex example:
 In the case above, say you have 20 children for those 3 databases.
 The different stack permutations are:
 1: prod1,prod2,prod3
 2: prod1,prod3,prod2
 3: prod2,prod1,prod3
 4: prod2,prod3,prod1
 5: prod3,prod1,prod2
 6: prod3,prod2,prod1
 Either dynamically or statically create these 6 permutations and put them in 
 @all_db_stacks:
[...]
 
 Couldn't an option be added as an alternative to db_stack that indicates a 
 desire for random ordering, thereby eliminating the need to precreate all 
 possible permutations?  I'm imaging something like this (modified version of 
 your example):
 
 $DATABASE::conf{'test'} = {
   max_retries = 2,
   db_pool = [
[ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
[ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
[ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
   ], ...

How about just using shuffle from the standard module List::Util?

use List::Util qw(shuffle);

$DATABASE::conf{'test'} = {
 max_retries = 2,
 db_stack = [
  shuffle (
  [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
  [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
  [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
  )
 ], ...

hp

-- 
   _  | Peter J. Holzer| If I wanted to be academically correct,
|_|_) | Sysadmin WSR   | I'd be programming in Java.
| |   | [EMAIL PROTECTED]  | I don't, and I'm not.
__/   | http://www.hjp.at/ |   -- Jesse Erlbaum on dbi-users


pgpAXpSF4OodH.pgp
Description: PGP signature


Re: Recommended load balancing solutions?

2007-01-11 Thread Ofer Nave

Peter J. Holzer wrote:

On 2007-01-10 15:06:56 -0800, Ofer Nave wrote:

Henri Asseily wrote:

Take the example of the pod:
$DATABASE::conf{'test'} = {
max_retries = 2,
db_stack = [
 [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
 [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
 [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
], ...

[...]

Here's a complex example:
In the case above, say you have 20 children for those 3 databases.
The different stack permutations are:
1: prod1,prod2,prod3
2: prod1,prod3,prod2
3: prod2,prod1,prod3
4: prod2,prod3,prod1
5: prod3,prod1,prod2
6: prod3,prod2,prod1
Either dynamically or statically create these 6 permutations and put them in 
@all_db_stacks:

[...]
Couldn't an option be added as an alternative to db_stack that indicates a 
desire for random ordering, thereby eliminating the need to precreate all 
possible permutations?  I'm imaging something like this (modified version of 
your example):


$DATABASE::conf{'test'} = {
  max_retries = 2,
  db_pool = [
   [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
   [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
   [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
  ], ...


How about just using shuffle from the standard module List::Util?

use List::Util qw(shuffle);

$DATABASE::conf{'test'} = {
 max_retries = 2,
 db_stack = [
  shuffle (
  [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ],
  [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ],
  [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ],
  )
 ], ...

hp



I like it!

-ofer


Re: Recommended load balancing solutions?

2007-01-09 Thread Henri Asseily

On Jan 8, 2007, at 10:10 PM, Mark Stosberg wrote:


Henri Asseily wrote:


I don't know if I uploaded the latest version of DBIx::HA to CPAN,  
I'll

do it asap.


The docs for DBIx::HA say

It is very simple to load-balance across read-only database servers.
Simply randomize or reorder the 'db_stack' entry in your database
configuration on a per-process basis.

I know it could be dead simple, but could an example of what this  
looks

like be provided?


Take the example of the pod:

$DATABASE::conf{'test'} = {
 max_retries = 2,
 db_stack = [
  [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1',  
$attrib ],
  [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2',  
$attrib ],
  [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3',  
$attrib ],

 ], ...

Assume his 'test' handle is a read-only handle.
The stack definition shows that you have 3 read-only servers: prod1,  
prod2 and prod3.
The particular process that uses this stack will start with prod1,  
and if it fails max_retries times, it will switch to the next one in  
the stack, prod2, and so on.
If you want to have your application to be load-balanced across all  
the read-only databases, you can do it in many ways, but ultimately  
it boils down to having your children processes hitting different  
databases, and having stacks that are as unique as possible (so that  
if a db server fails, all its dependents don't fail over to the same  
secondary).


Here's a complex example:
In the case above, say you have 20 children for those 3 databases.
The different stack permutations are:

1: prod1,prod2,prod3
2: prod1,prod3,prod2
3: prod2,prod1,prod3
4: prod2,prod3,prod1
5: prod3,prod1,prod2
6: prod3,prod2,prod1

Either dynamically or statically create these 6 permutations and put  
them in @all_db_stacks:
Then when one of your 20 children starts up, it will take a stack.  
Either have each take the next stack down, or randomly take a stack.  
One example under Apache if you don't want to have a shared global is  
to grab the child's process id and get its modulus by the size of  
@all_db_stack:

$stack_number = $$ % scalar(@all_db_stacks)

And so in the example config above:
db_stack = $all_db_stacks[$stack_number]

Assuming that Apache recycles its children pretty uniformly, you  
should have a uniform distribution across the database servers. If  
hypothetically the child process are 1 to 20, then stacks 1 and 2  
will be allocated to 4 children each, while stacks 3,4,5 and 6 will  
be allocated to 3 children each for a total of 2x4+4x3=20 children.
So when all the db servers are functioning properly, prod1 should  
have 8 clients, prod2 and 3 should have 6 each. If prod1 dies, its  
children get split evenly between prod2 and prod3.




How well has this worked in practice? So, if an Apache/mod_perl
application has 24 children, about 12 would end up talking to each  
DB as

a primary?


In the case where you have 24 children and 2 database servers, sure.  
You'll have 12 children hitting one db, and 12 hitting the other. And  
if one dies, all 24 will hit the remaining live one. There's no other  
way.




Also, DBD::Multi /does/ have this feature-- automatically randomly
selecting between to dbs with the same priority. Perhaps this feature
could be merged in.


DBIx::HA focuses on high availability. This means not only load  
balancing, but failover, especially in a mod_perl environment where  
the children are always connected. You can't just randomly select a  
db for each call for a number of reasons, the biggest one being that  
you'd need a live connection to all the dbs. In the case of 3  
database servers and 100 children, each db server would need to have  
100 connects active. Now that may not be a problem for you, but when  
you have 100 web servers with 50 children each, few database servers  
are happy with 5,000 concurrent open connections.


In such large setups, you also want good control over which servers/ 
children connect to which possible databases. In the case when you  
have 100 web servers, you'd probably want each server to have its own  
database failover stack, with all its children following the same  
pattern and using the same resources. This helps immensely the  
operations people who want to know machine and resource dependencies.


I found that using a simple low-tech stack system allowed for many  
possibilities, even though it's not very glamorous.




I do like the time out feature that DBIx::HA has.


That's another critical issue for high availability. There are too  
many ways a database server can be unavailable, and the standard  
timeouts of the database drivers are completely unreliable because  
they're 'safe'. DBIx::HA uses unsafe timeouts, so technically the  
Perl interpreter could be in a bad state after a timeout, but  
practically I've never seen this happen and being able to time out a  
query or a connect is an absolute necessity.



As to how well DBIx::HA has worked in 

Re: Recommended load balancing solutions (especially with PostgreSQL)?

2007-01-08 Thread Mark Stosberg
Henri Asseily wrote:
 
 I don't know if I uploaded the latest version of DBIx::HA to CPAN, I'll
 do it asap.

The docs for DBIx::HA say

It is very simple to load-balance across read-only database servers.
Simply randomize or reorder the 'db_stack' entry in your database
configuration on a per-process basis.

I know it could be dead simple, but could an example of what this looks
like be provided?

How well has this worked in practice? So, if an Apache/mod_perl
application has 24 children, about 12 would end up talking to each DB as
a primary?

Also, DBD::Multi /does/ have this feature-- automatically randomly
selecting between to dbs with the same priority. Perhaps this feature
could be merged in.

I do like the time out feature that DBIx::HA has.

   Mark


Re: Recommended load balancing solutions (especially with PostgreSQL)?

2007-01-05 Thread Tim Bunce
On Thu, Jan 04, 2007 at 02:54:37PM -0500, Mark Stosberg wrote:
 
 Hello,
 
 Today I've been researching database load balancing solutions to use
 with an Apache/mod_perl application and a PostgreSQL database. Right now
 I'm using Slony-I for replication, and it seems to work well enough.

 #
 
 What's beginning to appeal to me know is to solve part load balancing in
 the application by using two database handles: a read/write one, and a
 read-only one. From there, it's much easier problem to solve: The R/w
 handle always talks to the master and the Read-only handle can be load
 balanced without being so smart.

A wise approach. I believe DBIx::Class is taking that approach.

 Some further details about going this route:
 
- I would leave the default handle as read-write. That means if I
 accidently miss converting a handle to read only somewhere, it will
 still work, it just won't be load-balanced.
 
- CGI::Application::Plugin::DBH will make adding the second handle
 easy. I'll still use $self-dbh() to access the default handle, and
 $self-dbh('ro') for the new Read-only handle.
 
 - DBD::Multi looks like it could handle this kind of simple load
 balancing. As a bonus, it has some fail-over logic in it, so if the
 slave was unvailable for a bit, that would be handled transparently.
 
 Does that seem sound? What's worked for you?

Also take a look at DBIx::HA.

Tim.


Re: Recommended load balancing solutions (Thanks)

2007-01-05 Thread Mark Stosberg
Tim Bunce wrote:
 On Thu, Jan 04, 2007 at 02:54:37PM -0500, Mark Stosberg wrote:
 Hello,

 Today I've been researching database load balancing solutions to use
 with an Apache/mod_perl application and a PostgreSQL database. Right now
 I'm using Slony-I for replication, and it seems to work well enough.
 
 #

 What's beginning to appeal to me know is to solve part load balancing in
 the application by using two database handles: a read/write one, and a
 read-only one. From there, it's much easier problem to solve: The R/w
 handle always talks to the master and the Read-only handle can be load
 balanced without being so smart.
 
 A wise approach. I believe DBIx::Class is taking that approach.

You deserve some credit. I think I got the idea from you, through an
previous post I found during my research. Thanks!

   Mark