Re: Recommended load balancing solutions?
Henri Asseily wrote: Take the example of the pod: $DATABASE::conf{'test'} = { max_retries = 2, db_stack = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... Assume his 'test' handle is a read-only handle. The stack definition shows that you have 3 read-only servers: prod1, prod2 and prod3. The particular process that uses this stack will start with prod1, and if it fails max_retries times, it will switch to the next one in the stack, prod2, and so on. If you want to have your application to be load-balanced across all the read-only databases, you can do it in many ways, but ultimately it boils down to having your children processes hitting different databases, and having stacks that are as unique as possible (so that if a db server fails, all its dependents don't fail over to the same secondary). Here's a complex example: In the case above, say you have 20 children for those 3 databases. The different stack permutations are: 1: prod1,prod2,prod3 2: prod1,prod3,prod2 3: prod2,prod1,prod3 4: prod2,prod3,prod1 5: prod3,prod1,prod2 6: prod3,prod2,prod1 Either dynamically or statically create these 6 permutations and put them in @all_db_stacks: Then when one of your 20 children starts up, it will take a stack. Either have each take the next stack down, or randomly take a stack. One example under Apache if you don't want to have a shared global is to grab the child's process id and get its modulus by the size of @all_db_stack: $stack_number = $$ % scalar(@all_db_stacks) And so in the example config above: db_stack = $all_db_stacks[$stack_number] Couldn't an option be added as an alternative to db_stack that indicates a desire for random ordering, thereby eliminating the need to precreate all possible permutations? I'm imaging something like this (modified version of your example): $DATABASE::conf{'test'} = { max_retries = 2, db_pool = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... So, db_stack maintains ordering, while db_pool tells DBIx::HA to randomize the list of servers. -ofer
load-balancing with DBIx::HA (was: Re: Recommended load balancing solutions?)
Ofer Nave wrote: Couldn't an option be added as an alternative to db_stack that indicates a desire for random ordering, thereby eliminating the need to precreate all possible permutations? I'm imaging something like this (modified version of your example): $DATABASE::conf{'test'} = { max_retries = 2, db_pool = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... So, db_stack maintains ordering, while db_pool tells DBIx::HA to randomize the list of servers. I like the idea of having this feature built-in, but I think the name could clearer, perhaps call it load balance: $DATABASE::conf{'test'} = { load_balance = 1, db_stack = [ [ 'dbi:Sybase:server=prod1;database=test','user1','pass1',$attrib ], [ 'dbi:Sybase:server=prod2;database=test','user2','pass2',$attrib ], ], ... Mark
Re: load-balancing with DBIx::HA (was: Re: Recommended load balancing solutions?)
On Jan 11, 2007, at 3:18 PM, Mark Stosberg wrote: Ofer Nave wrote: Couldn't an option be added as an alternative to db_stack that indicates a desire for random ordering, thereby eliminating the need to precreate all possible permutations? I'm imaging something like this (modified version of your example): $DATABASE::conf{'test'} = { max_retries = 2, db_pool = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... So, db_stack maintains ordering, while db_pool tells DBIx::HA to randomize the list of servers. I like the idea of having this feature built-in, but I think the name could clearer, perhaps call it load balance: $DATABASE::conf{'test'} = { load_balance = 1, db_stack = [ [ 'dbi:Sybase:server=prod1;database=test','user1','pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test','user2','pass2', $attrib ], ], ... The problem is that there are too many ways to load balance, which gets even more confusing when you use presistent connections such as those in mod_perl. Unless you have a large number of client processes compared the the servers, when you use persistent connections you're rarely going to be happily load balancing in a way that is equitable. Certain algorithms are better for certain situations such as a lower ratio of clients to servers, for example. And since the developer knows her systems better than I do, let her write a proper balancing routine for her needs. The trivial ones are unnecessary for me to include since they're trivial, and the more advanced ones are too specific to a situation for the general public (i.e. probably the 3 people using this module) to care about... As an interesting exercise, I'd be curious to see if you and Ofer can agree on a load balancing routine to use. :-) H
Re: Recommended load balancing solutions?
On 2007-01-10 15:06:56 -0800, Ofer Nave wrote: Henri Asseily wrote: Take the example of the pod: $DATABASE::conf{'test'} = { max_retries = 2, db_stack = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... [...] Here's a complex example: In the case above, say you have 20 children for those 3 databases. The different stack permutations are: 1: prod1,prod2,prod3 2: prod1,prod3,prod2 3: prod2,prod1,prod3 4: prod2,prod3,prod1 5: prod3,prod1,prod2 6: prod3,prod2,prod1 Either dynamically or statically create these 6 permutations and put them in @all_db_stacks: [...] Couldn't an option be added as an alternative to db_stack that indicates a desire for random ordering, thereby eliminating the need to precreate all possible permutations? I'm imaging something like this (modified version of your example): $DATABASE::conf{'test'} = { max_retries = 2, db_pool = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... How about just using shuffle from the standard module List::Util? use List::Util qw(shuffle); $DATABASE::conf{'test'} = { max_retries = 2, db_stack = [ shuffle ( [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ) ], ... hp -- _ | Peter J. Holzer| If I wanted to be academically correct, |_|_) | Sysadmin WSR | I'd be programming in Java. | | | [EMAIL PROTECTED] | I don't, and I'm not. __/ | http://www.hjp.at/ | -- Jesse Erlbaum on dbi-users pgpAXpSF4OodH.pgp Description: PGP signature
Re: Recommended load balancing solutions?
Peter J. Holzer wrote: On 2007-01-10 15:06:56 -0800, Ofer Nave wrote: Henri Asseily wrote: Take the example of the pod: $DATABASE::conf{'test'} = { max_retries = 2, db_stack = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... [...] Here's a complex example: In the case above, say you have 20 children for those 3 databases. The different stack permutations are: 1: prod1,prod2,prod3 2: prod1,prod3,prod2 3: prod2,prod1,prod3 4: prod2,prod3,prod1 5: prod3,prod1,prod2 6: prod3,prod2,prod1 Either dynamically or statically create these 6 permutations and put them in @all_db_stacks: [...] Couldn't an option be added as an alternative to db_stack that indicates a desire for random ordering, thereby eliminating the need to precreate all possible permutations? I'm imaging something like this (modified version of your example): $DATABASE::conf{'test'} = { max_retries = 2, db_pool = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... How about just using shuffle from the standard module List::Util? use List::Util qw(shuffle); $DATABASE::conf{'test'} = { max_retries = 2, db_stack = [ shuffle ( [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ) ], ... hp I like it! -ofer
Re: Recommended load balancing solutions?
On Jan 8, 2007, at 10:10 PM, Mark Stosberg wrote: Henri Asseily wrote: I don't know if I uploaded the latest version of DBIx::HA to CPAN, I'll do it asap. The docs for DBIx::HA say It is very simple to load-balance across read-only database servers. Simply randomize or reorder the 'db_stack' entry in your database configuration on a per-process basis. I know it could be dead simple, but could an example of what this looks like be provided? Take the example of the pod: $DATABASE::conf{'test'} = { max_retries = 2, db_stack = [ [ 'dbi:Sybase:server=prod1;database=test', 'user1', 'pass1', $attrib ], [ 'dbi:Sybase:server=prod2;database=test', 'user2', 'pass2', $attrib ], [ 'dbi:Sybase:server=prod3;database=test', 'user3', 'pass3', $attrib ], ], ... Assume his 'test' handle is a read-only handle. The stack definition shows that you have 3 read-only servers: prod1, prod2 and prod3. The particular process that uses this stack will start with prod1, and if it fails max_retries times, it will switch to the next one in the stack, prod2, and so on. If you want to have your application to be load-balanced across all the read-only databases, you can do it in many ways, but ultimately it boils down to having your children processes hitting different databases, and having stacks that are as unique as possible (so that if a db server fails, all its dependents don't fail over to the same secondary). Here's a complex example: In the case above, say you have 20 children for those 3 databases. The different stack permutations are: 1: prod1,prod2,prod3 2: prod1,prod3,prod2 3: prod2,prod1,prod3 4: prod2,prod3,prod1 5: prod3,prod1,prod2 6: prod3,prod2,prod1 Either dynamically or statically create these 6 permutations and put them in @all_db_stacks: Then when one of your 20 children starts up, it will take a stack. Either have each take the next stack down, or randomly take a stack. One example under Apache if you don't want to have a shared global is to grab the child's process id and get its modulus by the size of @all_db_stack: $stack_number = $$ % scalar(@all_db_stacks) And so in the example config above: db_stack = $all_db_stacks[$stack_number] Assuming that Apache recycles its children pretty uniformly, you should have a uniform distribution across the database servers. If hypothetically the child process are 1 to 20, then stacks 1 and 2 will be allocated to 4 children each, while stacks 3,4,5 and 6 will be allocated to 3 children each for a total of 2x4+4x3=20 children. So when all the db servers are functioning properly, prod1 should have 8 clients, prod2 and 3 should have 6 each. If prod1 dies, its children get split evenly between prod2 and prod3. How well has this worked in practice? So, if an Apache/mod_perl application has 24 children, about 12 would end up talking to each DB as a primary? In the case where you have 24 children and 2 database servers, sure. You'll have 12 children hitting one db, and 12 hitting the other. And if one dies, all 24 will hit the remaining live one. There's no other way. Also, DBD::Multi /does/ have this feature-- automatically randomly selecting between to dbs with the same priority. Perhaps this feature could be merged in. DBIx::HA focuses on high availability. This means not only load balancing, but failover, especially in a mod_perl environment where the children are always connected. You can't just randomly select a db for each call for a number of reasons, the biggest one being that you'd need a live connection to all the dbs. In the case of 3 database servers and 100 children, each db server would need to have 100 connects active. Now that may not be a problem for you, but when you have 100 web servers with 50 children each, few database servers are happy with 5,000 concurrent open connections. In such large setups, you also want good control over which servers/ children connect to which possible databases. In the case when you have 100 web servers, you'd probably want each server to have its own database failover stack, with all its children following the same pattern and using the same resources. This helps immensely the operations people who want to know machine and resource dependencies. I found that using a simple low-tech stack system allowed for many possibilities, even though it's not very glamorous. I do like the time out feature that DBIx::HA has. That's another critical issue for high availability. There are too many ways a database server can be unavailable, and the standard timeouts of the database drivers are completely unreliable because they're 'safe'. DBIx::HA uses unsafe timeouts, so technically the Perl interpreter could be in a bad state after a timeout, but practically I've never seen this happen and being able to time out a query or a connect is an absolute necessity. As to how well DBIx::HA has worked in
Re: Recommended load balancing solutions (especially with PostgreSQL)?
Henri Asseily wrote: I don't know if I uploaded the latest version of DBIx::HA to CPAN, I'll do it asap. The docs for DBIx::HA say It is very simple to load-balance across read-only database servers. Simply randomize or reorder the 'db_stack' entry in your database configuration on a per-process basis. I know it could be dead simple, but could an example of what this looks like be provided? How well has this worked in practice? So, if an Apache/mod_perl application has 24 children, about 12 would end up talking to each DB as a primary? Also, DBD::Multi /does/ have this feature-- automatically randomly selecting between to dbs with the same priority. Perhaps this feature could be merged in. I do like the time out feature that DBIx::HA has. Mark
Re: Recommended load balancing solutions (especially with PostgreSQL)?
On Thu, Jan 04, 2007 at 02:54:37PM -0500, Mark Stosberg wrote: Hello, Today I've been researching database load balancing solutions to use with an Apache/mod_perl application and a PostgreSQL database. Right now I'm using Slony-I for replication, and it seems to work well enough. # What's beginning to appeal to me know is to solve part load balancing in the application by using two database handles: a read/write one, and a read-only one. From there, it's much easier problem to solve: The R/w handle always talks to the master and the Read-only handle can be load balanced without being so smart. A wise approach. I believe DBIx::Class is taking that approach. Some further details about going this route: - I would leave the default handle as read-write. That means if I accidently miss converting a handle to read only somewhere, it will still work, it just won't be load-balanced. - CGI::Application::Plugin::DBH will make adding the second handle easy. I'll still use $self-dbh() to access the default handle, and $self-dbh('ro') for the new Read-only handle. - DBD::Multi looks like it could handle this kind of simple load balancing. As a bonus, it has some fail-over logic in it, so if the slave was unvailable for a bit, that would be handled transparently. Does that seem sound? What's worked for you? Also take a look at DBIx::HA. Tim.
Re: Recommended load balancing solutions (Thanks)
Tim Bunce wrote: On Thu, Jan 04, 2007 at 02:54:37PM -0500, Mark Stosberg wrote: Hello, Today I've been researching database load balancing solutions to use with an Apache/mod_perl application and a PostgreSQL database. Right now I'm using Slony-I for replication, and it seems to work well enough. # What's beginning to appeal to me know is to solve part load balancing in the application by using two database handles: a read/write one, and a read-only one. From there, it's much easier problem to solve: The R/w handle always talks to the master and the Read-only handle can be load balanced without being so smart. A wise approach. I believe DBIx::Class is taking that approach. You deserve some credit. I think I got the idea from you, through an previous post I found during my research. Thanks! Mark