[TurboGears] Re: Scalability

Bob Ippolito Mon, 02 Oct 2006 12:20:01 -0700

On 10/2/06, Stuart Clarke <[EMAIL PROTECTED]> wrote:
>
> On Mon, 2006-10-02 at 10:45 -0700, Bob Ippolito wrote:
> > On 10/2/06, Stuart Clarke <[EMAIL PROTECTED]> wrote:
> > >
> > > On Mon, 2006-10-02 at 10:04 -0700, Bob Ippolito wrote:
> > > > On 10/2/06, Stuart Clarke <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Thanks for the reply Kevin...
> > > > >
> > > > > > > > I want to ask some questions, however, about scalability.  I'm
> > > > > > > > developing a web system (the pages of which will be customised
> > > > > > > > on a per-user basis), that may grow to be quite popular.  I
> > > > > > > > need to implement this, such that it's horizontally scalable
> > > > > > > > in an indefinite manner.
> > > > > > > >
> > > > > > > > OK, so web server replication and load balancing is easy.  My
> > > > > > > > problem is with the DB.  I can find several good-looking
> > > > > > > > master-slave DB replicators (Slony for PG, for example), but I
> > > > > > > > can't find a suitable load-balancing mechanism, especially one
> > > > > > > > that integrates with SQLObject or SQLAlchemy.
> > > > > >
> > > > > > > I'm not sure what you mean here. In what way is the ORM involved
> > > > > > > with the database replication? Do you mean from the standpoint of
> > > > > > > having some collection of web servers talk to some specific
> > > > > > > collection of database servers?
> > > > >
> > > > > *** As I see it, there are two problems in using a distributed
> > > > > master-slave arrangement for the DB: replication (i.e. mirroring data
> > > > > from the master to the slaves) and load balancing (i.e. balancing the
> > > > > "DB-read" load across the slaves).
> > > > >
> > > > > > Replication is handled by tools such as Slony.  What I need from
> > > > > > the ORM (or whatever) is a mechanism for load balancing.  I need to
> > > > > > be able to say: here's my master server (for writing) and here is
> > > > > > my list of slave servers (for reading).  Please balance the system
> > > > > > load appropriately, across these servers.  Or I need a hook where I
> > > > > > can insert code of my own to do this.
> > > > >
> > > > > > I have a sneaking suspicion that it might be possible in
> > > > > > SQLAlchemy, but I don't think it will integrate out of the box with
> > > > > > TG's Identity implementation.
> > > > >
> > > > > Plus, I would like to do it in SQLObject, so I can have Catwalk.
> > > > >
> > > > > Any suggestions?
> > > >
> > > > > Why don't you do load balancing at the DB layer with pgpool or
> > > > > something?
> > >
> > > > *** pgpool is limited to one master and one slave.  Its scalability is
> > > > therefore quite limited.
> > >
> > > I haven't found any general-purpose tools which can provide unlimited
> > > (say, >20 slaves) scalability for either MySQL or PostgreSQL.  Does
> > > anyone know of one?
> >
> > Well, you only want one master... at least for any of the free
> > PostgreSQL replication solutions. I always partition my usage between
> > read-only and read-write connections, so it's rather easy to make that
> > work.
>
> *** I'm happy with only one master, and I also wish to partition my
> usage between read-only and read-write connections.  I want to do that,
> however, within a single TurboGears "application".  Is this what you do?
> Can you provide some hints on how to do it?  Also, do you know of a way
> to pool your read-only connections to a number of slaves, thereby
> distributing the load (within either SO or SA)?


I'm not currently using any ORM, so using different SA engines for
different queries is trivial. I'm also not currently distributing
among several slaves, but if I had to I would use something like SQL
Relay rather than trying to shove load balancing into my model.
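
For illustration only (this is not code from the thread): partitioning
read-only and read-write usage, with reads rotated over several slaves, can
be sketched in plain Python roughly as follows. The class name
ReadWriteRouter, its method names, and the placeholder strings standing in
for real connections or SA engines are all assumptions made up for this
sketch.

```python
import itertools
import threading


class ReadWriteRouter:
    """Hand out the master for writes and rotate through slaves for reads."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slaves are configured.
        self._readers = itertools.cycle(list(slaves) or [master])
        self._lock = threading.Lock()

    def for_write(self):
        # All writes go to the single master.
        return self.master

    def for_read(self):
        # Round-robin over the read-only servers; the lock keeps the
        # rotation consistent when several threads ask at once.
        with self._lock:
            return next(self._readers)


# Placeholder strings stand in for real connections or engines.
router = ReadWriteRouter("master", ["slave-1", "slave-2"])
reads = [router.for_read() for _ in range(4)]
```

The application still has to decide, per query, whether it is a read or a
write; the router only picks a server once that is known. An external tool
such as SQL Relay would take the read-side rotation out of the application.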

> SQL Relay seems capable of load-balancing across a number of read-only
> DBs.  And it has a drop-in replacement API for PostgreSQL.  It doesn't
> distinguish between master and slaves, however, and so can't
> automatically manage the difference between reading and writing.  Also,
> SQL Relay load-balances on a per-connection basis, which runs contrary to
> TurboGears' persistent-connection architecture.  Which sucks.  Have you
> any experience with using SQL Relay under TurboGears?

The reason you load balance is so that you get better concurrency. For
serial requests, one database is going to do just as well as a pool (if
not better, due to cache effects). The way TG and SQL Relay
would interact is fine, because you get different connections for each
thread in the TG pool. Concurrent queries will be sent to different
servers (at the discretion of SQL Relay of course), so load balancing
still does exactly what it's supposed to.
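
A toy simulation of that interaction, again only a sketch under stated
assumptions: PerConnectionBalancer is a hypothetical stand-in for a
per-connection balancer (it is not SQL Relay's API), the backend names are
made up, and threading.local plays the role of the one persistent
connection each worker thread holds.

```python
import itertools
import threading
from collections import Counter


class PerConnectionBalancer:
    """Assign each new connection to the next backend, round-robin."""

    def __init__(self, backends):
        self._backends = itertools.cycle(backends)
        self._lock = threading.Lock()

    def connect(self):
        with self._lock:
            return next(self._backends)


balancer = PerConnectionBalancer(["db-a", "db-b"])
local = threading.local()
used = []  # backend each worker thread ended up on


def handle_request():
    # Each thread connects once, then keeps reusing that connection.
    if not hasattr(local, "conn"):
        local.conn = balancer.connect()
    return local.conn


def worker():
    first = handle_request()
    second = handle_request()
    assert first == second  # same thread, same backend
    used.append(first)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

spread = Counter(used)
```

With four worker threads and two backends, each backend ends up serving
two threads, even though every individual connection is persistent.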

No, I'm not currently using SQL Relay with TurboGears. I might if I
had to scale like that, though I haven't done all of the research to
definitively say that it's the load balancing solution I'd choose.

-bob
