Hello Viraj -
On Sun, 28 Apr 2002 23:06, Viraj Alankar wrote:
Hello,
I am wondering what's the best design for a high volume radius system. We
are looking at on the order of 100-150 requests/second (auth+acct) on
average. Does anyone here have a load balancing system setup? If so, I'd
appreciate any tips on how you set this up.
I will send you a separate mail containing a copy of a paper I wrote for one
of our customers that deals with part of this issue.
After using Radiator for quite a while, I've found that the main things that
cause slowdowns are database queries and network outages. I've noticed that
during network outages, some RASes (we have mostly Ascend) and proxy servers
start flooding the server once connectivity comes back. These appear to be
requests (mostly accounting) that were queued on those systems. In this
situation the flood pretty much kills our radius server (99% CPU), and many
times we have to run Radiator in a very basic configuration (no database, no
authentication) for some time to cool things down. Many times I've even had
to go to our firewall and block some RAS traffic.
You should have a look at a Trace 4 debug using the LogMicroseconds parameter
(requires Time::HiRes from CPAN). This will tell you how much time each
processing step takes, and consequently how many requests per second you can
deal with.
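For example, the relevant parameters go in the global section at the top of
the configuration file (a sketch only - the log directory is a placeholder,
and you should check the reference manual for your version):

```
# Global section of radius.cfg (sketch; LogDir path is a placeholder)
LogDir          /var/log/radius
Trace           4
LogMicroseconds
```

With this in place, each line of the trace is timestamped to the
microsecond, so you can see exactly which processing step is eating the
time.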
So I am just looking for some tips on how to set up a scalable system. We
have a test system with a Foundry switch load balancing to 2 Radiator
servers via round-robin. However, in our tests we are noticing that the load
balancing is not even when the source UDP port stays constant, for example
when another Radiator is forwarding requests to it. It only seems to load
balance properly when the source ports change. Does anyone have any ideas
what could be wrong here?
It sounds like the switch is using address/port pairs to determine how to
load share. Radiator has three different load balancing modules that
implement different algorithms. The most useful in this respect is usually
the AuthBy LOADBALANCE module that distributes requests according to the
response time of each target.
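As a sketch (host names and shared secrets below are placeholders, and
AuthBy LOADBALANCE inherits most of its per-Host parameters from AuthBy
RADIUS - check the reference manual for your version):

```
# Sketch: distribute requests across two targets according to their
# recent response times. Hosts, secrets and ports are placeholders.
<Realm DEFAULT>
	<AuthBy LOADBALANCE>
		<Host radius1.example.com>
			Secret xyzzy
			AuthPort 1645
			AcctPort 1646
		</Host>
		<Host radius2.example.com>
			Secret xyzzy
			AuthPort 1645
			AcctPort 1646
		</Host>
	</AuthBy>
</Realm>
```

Because the algorithm favours the faster responder, a target that is bogged
down in database work automatically receives fewer requests.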
What I was thinking was to instead set up one Radiator system that uses the
AuthBy LOADBALANCE clause instead of the Foundry switch. Any thoughts on
this versus hardware load balancing?
As mentioned above, you should check the switch to see what options you have
for selecting the load balancing algorithm.
The next issue is database slowdowns. I am thinking that the best setup
would be for the RASes to go directly to Radiators that do not have any
sort of DB dependency, and instead they proxy to respective servers that do
have DB dependencies. For example:
A
/ \
/ \
B C
/ \ / \
D E F G
A = Radiator doing AuthBy loadbalance to B and C (or hardware switch)
B/C = Radiator with only AuthBy RADIUS clauses
D/E/F/G = Radiator with DB access
The B and C trees would be identical. Does this sound like a proper setup?
Well, the problem here is that you will still have a single throttle point,
so everything still runs at the speed of the database. In other words, B and
C will still not send a reply to the NAS until the database queries
complete.
As far as the type of database access goes, we've mostly seen that
accounting is what causes problems. I believe this is due to our table
designs. For example, we have unique indexes to drop duplicate accounting,
indexed on many fields. At some point, when there is a lot of data, inserts
become slow.
I was thinking that Radiator's access to the DB should be made as fast as
possible, and that Radiator should instead use the DB as a sort of log
table for accounting (with no indexes at all), similar to writing to raw
files. Then, periodically, an external process would move this data to the
real accounting tables (with indexes, etc.). This way, DB query time is
kept to a minimum for accounting.
What you describe is a good solution - keep the processing that Radiator
itself does to an absolute minimum.
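As an illustration of the pattern (this is not Radiator code - SQLite
stands in for the real database, and all table and column names are
invented for the example):

```python
# Sketch of the "un-indexed raw log table + periodic mover" pattern.
# SQLite stands in for the real accounting database; table and column
# names are invented for this example.
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
cur = conn.cursor()

# The table Radiator inserts into: no indexes, so inserts stay cheap.
cur.execute("CREATE TABLE acct_raw (username TEXT, nasip TEXT, octets INTEGER)")

# The real accounting table: indexed for queries, duplicate dropping, etc.
cur.execute("CREATE TABLE accounting (username TEXT, nasip TEXT, octets INTEGER)")
cur.execute("CREATE INDEX accounting_user ON accounting (username)")

# Fast path: plain INSERTs into the un-indexed table.
cur.executemany("INSERT INTO acct_raw VALUES (?, ?, ?)",
                [("alice", "10.0.0.1", 100), ("bob", "10.0.0.2", 200)])

def move_batch(cur):
    """The periodic external process: copy raw rows into the real
    table, then clear the raw table, inside one transaction."""
    cur.execute("BEGIN")
    cur.execute("INSERT INTO accounting SELECT * FROM acct_raw")
    cur.execute("DELETE FROM acct_raw")
    cur.execute("COMMIT")

move_batch(cur)
```

The mover pays the indexing cost in bulk, off the request path, so the
per-request insert that Radiator waits on stays as cheap as possible.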
You might also consider running two instances of Radiator on each host - one
for authentication and the other for accounting.
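One possible arrangement, assuming your NASes let you point authentication
and accounting at different servers (the port numbers here are only
examples):

```
# radius-auth.cfg (sketch) - the instance NASes use for authentication
AuthPort 1645
AcctPort 1646
# ... authentication Handlers only

# radius-acct.cfg (sketch) - a second instance on different ports,
# which the NASes use as their accounting server
AuthPort 11645
AcctPort 11646
# ... accounting Handlers only
```

That way a burst of queued accounting after an outage cannot starve
authentication, and vice versa.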
Another problem we have is the number of Handlers. We handle requests
depending on the following:
RAS IP
RAS IP+DNIS
RAS IP+DNIS+Realm
With all of our devices, the number of handlers is getting quite large. I'm
wondering what would be an upper bound on this and if there is a better way
to handle this. We have almost 500 handlers at this point.
It is difficult to say anything sensible about your setup without seeing the
configuration file and understanding your requirements.
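One general point, though: Radiator tries Handlers in the order they appear
in the configuration file and uses the first that matches, so the most
specific ones should come first. The three levels you describe would look
roughly like this (the address, DNIS and realm are placeholders):

```
# Most specific first: RAS IP + DNIS + realm
<Handler NAS-IP-Address=10.1.1.1, Called-Station-Id=5551234, Realm=example.com>
	# ...
</Handler>
# RAS IP + DNIS
<Handler NAS-IP-Address=10.1.1.1, Called-Station-Id=5551234>
	# ...
</Handler>
# RAS IP only
<Handler NAS-IP-Address=10.1.1.1>
	# ...
</Handler>
```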
Note that we do offer consulting and design services on a contract basis and
we have done a large number of custom installations all around the world for
many of our customers.
If you are interested in this service, please contact