On 08/10/14 17:12, Curtis K. Larsen wrote:
I am trying to be proactive regarding scaling of RADIUS servers as we move to a
new load-balanced environment. The idea is to know when we are getting close
to a threshold and need to add another VM, allocate more CPU, RAM, etc.
I've used a traffic generator to simulate authentications against our
FreeRADIUS VMs, so I know the maximum number of auths/sec that a server can
handle, and I've seen that the server starts to reject clients when it can't
handle the load. So I am thinking that a dashboard that graphs the auths per
second, plus a pie chart of successful vs. failed requests with some
alerting, would allow us to preempt load/growth issues. It seems this info
wouldn't be too difficult to grab from syslog and graph on a web page somehow.
I am just wondering if any of you have already done this or something like this
that you could share before I re-invent the wheel. Let me know.
Thanks,
Curtis Larsen
University of Utah
Hi Curtis,
We use Nagios[1] to monitor all our systems, including FreeRADIUS
servers. We use a bolt-on called pnp4nagios[2] to do graphing of
performance data[3] with RRD.
FreeRADIUS itself includes a virtual status server[4]. If you enable
this then you can send a Status-Server packet to FreeRADIUS using radclient
and it will respond with a packet containing some useful status
variables, including counts of successful and failed auths.
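As a rough illustration of querying the status server, here is a minimal
Python sketch that shells out to radclient and collects the counters from the
reply. The port (18121) and secret ("adminsecret") are assumptions taken from
the sample status configuration and will differ on a real deployment; the
function names are hypothetical and this is not the plugin from [5].

```python
import re
import subprocess

# Matches reply lines such as "FreeRADIUS-Total-Access-Requests = 21287".
_COUNTER_RE = re.compile(r"^\s*(FreeRADIUS-Total-[\w-]+)\s*=\s*(\d+)\s*$")


def parse_status_reply(text):
    """Extract the FreeRADIUS-Total-* counters from radclient output."""
    counters = {}
    for line in text.splitlines():
        match = _COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def query_status(host="localhost", port=18121, secret="adminsecret"):
    """Send a Status-Server request with radclient and parse the reply.

    host, port and secret are placeholder assumptions; use the values
    from your own status virtual server configuration.
    """
    # Statistics type 1 asks for the authentication counters.
    request = "Message-Authenticator = 0x00, FreeRADIUS-Statistics-Type = 1\n"
    result = subprocess.run(
        ["radclient", "-x", f"{host}:{port}", "status", secret],
        input=request,
        capture_output=True,
        text=True,
        timeout=10,
        check=True,
    )
    return parse_status_reply(result.stdout)
```

Calling query_status() on a host where the status server is enabled should
return a dictionary of counter names to integers, which can then be fed to
whatever monitoring system you use.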
I've written a Nagios plugin to interrogate the RADIUS server and format
the data for Nagios[5]. I've also given some example config snippets to
use the plugin with Nagios. You can of course slurp the status data into
any format you like. The status packet contains only cumulative packet
counters since the FreeRADIUS daemon was started, so the plugin has to save
the results from the last check and divide the increase by the time elapsed
between checks to give a "per second" rate.
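The counter-to-rate step can be sketched in Python as below. This is only an
illustration of the general technique, not the code of the plugin in [5]; the
function names and the JSON state file are assumptions. A counter that goes
backwards is treated as a daemon restart and that sample is skipped.

```python
import json
import time
from pathlib import Path


def compute_rates(previous, current, prev_time, now):
    """Turn two snapshots of cumulative counters into per-second rates."""
    elapsed = now - prev_time
    if elapsed <= 0:
        return {}
    rates = {}
    for name, value in current.items():
        if name not in previous:
            continue
        delta = value - previous[name]
        if delta < 0:
            # Counter went backwards: the daemon was probably restarted,
            # so there is no meaningful rate for this interval.
            continue
        rates[name] = delta / elapsed
    return rates


def check(current, state_file, now=None):
    """Compare current counters with the saved ones, then save the new ones."""
    now = time.time() if now is None else now
    path = Path(state_file)
    rates = {}
    if path.exists():
        saved = json.loads(path.read_text())
        rates = compute_rates(saved["counters"], current, saved["time"], now)
    # Persist this snapshot so the next check has something to compare with.
    path.write_text(json.dumps({"time": now, "counters": current}))
    return rates
```

The first call to check() returns an empty dictionary because there is no
earlier snapshot; each later call returns the average rate over the interval
since the previous check.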
Our system gives us nice graphs like this one[6], which shows the
Access-Requests per second on the three primary RADIUS servers. The
spiky behaviour on the top two graphs (which serve campus) is due to
people moving around en masse on the hour, as lectures change over. The
third graph (which serves residences) is pretty steady, but also shows
that students never really go to sleep!
[1] http://www.nagios.org/
[2] https://docs.pnp4nagios.org/
[3] http://nagios.sourceforge.net/docs/3_0/perfdata.html
[4] http://wiki.freeradius.org/config/Status
[5] https://gist.github.com/djjudas21/cd1e7bfee44fb879855d
[6] http://imgur.com/F4xeNUm
Hope this helps,
Jonathan