Since this seems to apply to everyone ...
I don't have a problem with polling the down server. It's relatively simple
to ping or attempt a connection to a server with a short timeout; if it
times out, move on. Otherwise, you don't know when that server comes back
up.
Your random distribution was the only thing I was commenting on. Your logic
for whether to keep polling a down server has _nothing_ to do with whether
you use totally random selection or a random start followed by round-robin.
Nothing!
That said, the best way to accomplish this would be:
Store servers in an array of structs:
struct server {
    ip_addr address;
    int down;   // consecutive failed connection attempts (C has no
    int try;    // default member initializers -- zero these at alloc time)
};

struct server * servers; // malloc them, zero-initialized ...
1) pick a random start index from 0 to (num_servers - 1)
2) if servers[this_server].down > 0 then:
   - servers[this_server].try++;
   - if (try < down), move to the next server and go to (2)
   - else, set try = 0 and fall through to (3)
3) connect to the server
   - if the connection fails: servers[this_server].down++, move to the
     next server, go to (2)
4) on success: set servers[this_server].down to 0
5) if there is another message, move to the next server and go to (2)
That's off the top of my head -- but it would give a linear (rather than
exponential) back-off on down servers: the longer a server has been down,
the less often it gets polled for a connection.
----- Original Message -----
From: "David Dyer-Bennet" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, August 02, 2000 1:03 PM
Subject: Re: updated load balancing qmail-qmqpc.c mods
> JuanE <[EMAIL PROTECTED]> writes on 2 August 2000 at 16:37:36 GMT
> >
> > I agree with you both (Jay and Michael), at least partially. I agree that
> > although what Jay proposes will work, it is too much computation and that a
> > simpler round-robin (after picking initial position) would suffice.
> >
> > My comment is that in the event of a down server, the simple round-robin
> > will flood the next server in the chain with twice the load of the others.
> > Jay's solution does not do this (at a high computational cost).
> >
> > What I proposed earlier is just one of *many* solutions that addresses the
> > flood problem at a lower computational cost.
> >
> > Jay, I agree with you that selecting the same server many times in a row is
> > not an issue. This is guaranteed by the Law of Averages (for you math
> > whizzes out there, the Law of Large Numbers).
>
> Sounds like making repeated random picks is the way to go.
>
> If no server is down, your one random pick will handle the mail (same
> cost as picking random starting point for round-robin).
>
> If a server is down *and you hit it*, you pay the cost of a second
> random pick. This is slightly expensive, but you only pay it when
> you need it. It's cheaper in elapsed time than trying the next server
> and having it refuse the connection due to overload, for example. And
> it spreads the load more evenly.
>
> On the third hand, if the servers can manage their incoming
> connections intelligently (say with tcpserver :-) ), the one after a
> down one in a round-robin, while it will get hit a lot, can refuse
> some of the connections, which will then go on to the next after it.
> So you aren't really constrained to running all your servers at less
> than 50% capacity, and the one after the down one won't actually melt
> down.
> --
> Photos: http://dd-b.lighthunters.net/ Minicon: http://www.mnstf.org/minicon
> Bookworms: http://ouroboros.demesne.com/ SF: http://www.dd-b.net/dd-b
> David Dyer-Bennet / Welcome to the future! / [EMAIL PROTECTED]