Date:    Fri, 5 May 2017 14:19:47 +0000
From:    "Watters, John" <john.watt...@ua.edu>
Subject: Re: Radius Transaction Times

We have been having RADIUS problems for a while. After a lot of cussing and 
gnashing of teeth I got the RADIUS folks to build three new servers (all virtual). 
These were put into the same IP address spaces as our Cisco 8510 controllers. We 
are running MPLS with our campus divided into three areas (soon to become four, 
since we acquired 100+ acres of adjacent land that used to be the State mental 
health hospital complex). The WLCs, RADIUS servers, and APs are all in a global VRF 
in each area. In addition, these new RADIUS servers (running FreeRadius) had code 
upgrades that provided caching, which dramatically cut down their calls to our 
LDAP servers (we do not use AD for this function). We have found that the new 
RADIUS servers perform well enough for us to drastically cut our timeout and retry 
values, and they are not failing over to the other listed RADIUS servers at all. I 
have been looking at the stats, adding the results into a spreadsheet for 
comparison, and resetting the stats on a daily basis for about a week now. The 
results are very impressive compared to what they were in the past: zero failovers 
to the backup RADIUS servers. Now, the slow RADIUS performers are the few where we 
allow areas to run their own RADIUS authentication (e.g., Athletics and a 
State-funded traffic accident center).
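The caching mentioned above is what keeps repeat authentications from going back to LDAP every time. FreeRadius configures its own caching differently; the Python below is only a hypothetical sketch of the general idea, assuming a per-entry time-to-live (`ttl`) and a stand-in `lookup` function in place of the real LDAP query.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical memoizing wrapper around a slow backend lookup (e.g. LDAP).

    Illustration only; this is not FreeRadius's cache module.
    """

    def __init__(self, lookup: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.lookup = lookup          # stand-in for the real directory query
        self.ttl = ttl                # seconds an entry stays valid
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.backend_calls = 0

    def get(self, key: str) -> str:
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]             # fresh entry: no backend call needed
        self.backend_calls += 1       # miss or expired: ask the backend
        value = self.lookup(key)
        self.entries[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TTLCache(lambda user: f"attrs-for-{user}", ttl=300,
                     clock=lambda: fake_now[0])
    for _ in range(10):
        cache.get("alice")            # same user re-authenticating repeatedly
    fake_now[0] = 301.0               # entry has now expired
    cache.get("alice")
    print(cache.backend_calls)        # 2
```

Eleven authentications for the same user cost only two backend lookups here; the trade-off is that directory changes take up to `ttl` seconds to be seen.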


The following are stats for the last 24 hours for the primary RADIUS servers in 
each MPLS area. Note that our last day of finals was yesterday, so overall 
usage is down somewhat from previous days.


All controllers are 8510s running Cisco 8.0.140.0 due to a few older APs that 
we are phasing out this summer.


We also run Cisco controllers and freeradius with an Active Directory
back-end.

We had horrible horrible HORRIBLE radius performance problems back
around 8.0 code.   I forget the exact version but the version that
fixed it introduced the concept of what Cisco calls "radius queues", but
it's really just a range of UDP source ports to distribute the queries
across.
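A rough way to see why spreading queries over several source ports helps: each in-flight request needs a unique (source port, identifier) pair, and the identifier is only 8 bits, so one port allows at most 256 outstanding requests while eight ports allow 8 x 256 = 2048. The allocator below is a hypothetical sketch of that idea, not Cisco's implementation; `QueueAllocator` is an illustrative name, and the port range 32769-32776 is borrowed from the sample `show radius queue` output in this message.

```python
from itertools import cycle
from typing import Dict, Iterator, Set, Tuple

# Assumed port range, copied from the sample 'show radius queue' output.
SOURCE_PORTS = range(32769, 32777)  # 8 ports


class QueueAllocator:
    """Hypothetical sketch: spread outstanding RADIUS requests over several UDP
    source ports so each (source_port, identifier) pair stays unique while in flight."""

    def __init__(self, ports=SOURCE_PORTS):
        self.ports = list(ports)
        self.in_flight: Set[Tuple[int, int]] = set()
        self.next_id: Dict[int, int] = {p: 0 for p in self.ports}
        self._rr: Iterator[int] = cycle(self.ports)

    def capacity(self) -> int:
        return len(self.ports) * 256  # 8-bit identifier space per port

    def allocate(self) -> Tuple[int, int]:
        # Visit each port once, round robin; within a port, scan for a free id.
        for _ in range(len(self.ports)):
            port = next(self._rr)
            for _ in range(256):
                ident = self.next_id[port]
                self.next_id[port] = (ident + 1) % 256
                if (port, ident) not in self.in_flight:
                    self.in_flight.add((port, ident))
                    return port, ident
        raise RuntimeError("all (port, identifier) pairs are in flight")

    def release(self, port: int, ident: int) -> None:
        # Called when the matching reply arrives (or the request times out).
        self.in_flight.discard((port, ident))


if __name__ == "__main__":
    alloc = QueueAllocator()
    pairs = [alloc.allocate() for _ in range(alloc.capacity())]
    print(alloc.capacity(), len(set(pairs)))  # 2048 2048
```

Unlike a bare wrapping counter, this sketch refuses to hand out a pair that is still pending, so running out shows up as an explicit error rather than a silent mix-up.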


Run this command on your controller:  'show radius queue'
If you don't see multiple Source Ports, then upgrade WLC code **ASAFP**


 >show radius queue

Max Radius Queues Per Server..................... 8
 Source Port numbers used........................ 32769 32770 32771 32772 32773 32774 32775 32776

Max Radius Buffers Available..................... 4064
 Currently number of Buffers consumed............ 1

Radius Authentication Messages Stats
 Total Auth Req sent(allocated).................. 71786156
 Total Auth Resp rcvd(freed)..................... 71786155
 Total Auth Req Pkts Dropped(no buffer).......... 0

Radius Accounting Messages Stats
 Total Acct Req sent(allocated).................. 0
 Total Acct Resp rcvd(freed)..................... 0
 Total Acct Req Pkts Dropped(no buffer).......... 0
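If you have a lot of controllers, the check above can be scripted. The sketch below assumes you have already captured the text of `show radius queue` (how you collect it is up to you) and that it uses the dotted "Label....... value" layout shown here; other WLC releases may format it differently. `parse_radius_queue` is a hypothetical helper. Requests sent minus responses received gives the number still outstanding, which in this sample matches the one buffer consumed.

```python
import re

# Sample copied from the output above (port list wrapped as it often is in mail).
SAMPLE = """\
Max Radius Queues Per Server..................... 8
 Source Port numbers used........................ 32769 32770 32771 32772 32773
32774 32775 32776

Max Radius Buffers Available..................... 4064
 Currently number of Buffers consumed............ 1

Radius Authentication Messages Stats
 Total Auth Req sent(allocated).................. 71786156
 Total Auth Resp rcvd(freed)..................... 71786155
 Total Auth Req Pkts Dropped(no buffer).......... 0
"""


def parse_radius_queue(text: str) -> dict:
    # Digits and whitespace after the label, so a wrapped port list is still collected.
    m = re.search(r"Source Port numbers used\.+\s*([\d\s]+)", text)
    ports = [int(p) for p in m.group(1).split()] if m else []

    def counter(label: str) -> int:
        found = re.search(re.escape(label) + r"\.+\s*(\d+)", text)
        return int(found.group(1)) if found else 0

    sent = counter("Total Auth Req sent(allocated)")
    rcvd = counter("Total Auth Resp rcvd(freed)")
    return {
        "source_ports": ports,
        "multi_port": len(ports) > 1,  # False means: upgrade the WLC code
        "auth_outstanding": sent - rcvd,
        "auth_dropped": counter("Total Auth Req Pkts Dropped(no buffer)"),
    }


if __name__ == "__main__":
    info = parse_radius_queue(SAMPLE)
    print(info["multi_port"], len(info["source_ports"]), info["auth_outstanding"])  # True 8 1
```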



The problem that we had was, when classes changed and everyone moved
locations and then reconnected to Wifi, more than 256 login
conversations were going on at once.    This overflowed the radius_id
8-bit counter and confused the controller and radius server about which
user was being authed.

Since radius is UDP and there is no TCP session to keep track of the
conversation, the only things that tell one request from another are the
source and dest IP addresses, the UDP ports, and the radius_id 8-bit
counter.   With a single source port, the addresses and ports are always
the same, so the 8-bit counter is all you've got.
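RFC 2865 defines the RADIUS Identifier as a single octet, and servers recognize a duplicate request by the same client source IP, source UDP port, and Identifier. The toy model below is hypothetical (placeholder NAS address, simple round-robin port choice, nothing completing during the burst). It shows what happens when more than 256 requests are in flight and the counter simply wraps: with one source port, request 256 reuses identifier 0 while user 0's request is still pending, so a reply can no longer be tied to a single user. Spreading the same burst over eight ports avoids the collision until 2048 requests are outstanding.

```python
from typing import Dict, Tuple

# (NAS source IP, UDP source port, 8-bit RADIUS Identifier)
Key = Tuple[str, int, int]


def simulate(concurrent_users: int, source_ports: int) -> int:
    """Count in-flight requests whose key collides with an earlier pending request.

    Hypothetical model: request n uses port 32769 + n % source_ports and the next
    identifier on that port, wrapping at 256 with no check for reuse.
    """
    pending: Dict[Key, str] = {}
    clobbered = 0
    for n in range(concurrent_users):
        port = 32769 + n % source_ports
        ident = (n // source_ports) % 256   # naive wrapping 8-bit counter
        key = ("10.0.0.1", port, ident)     # placeholder controller address
        if key in pending:
            clobbered += 1                  # reply is now ambiguous: which user?
        pending[key] = f"user{n}"
    return clobbered


if __name__ == "__main__":
    print(simulate(256, 1), simulate(300, 1), simulate(300, 8))  # 0 44 0
```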

The controller would flush both conversations and force them to restart
auth, which cascaded out of control.   Then it would fail over to another
radius server and start spewing all the half-completed auth
conversations at the new radius server which, of course, had no
knowledge of the partially completed conversations.  Thus, this radius
server would fail out and the WLC would go on to the next.
Wifi was unusable for upwards of five or ten minutes at the top of each
hour.  Natives were gathering at the door with pitchforks and knives.
We were scared.




--
Earl Barfield -- Academic & Research Tech / Information Technology
Georgia Institute of Technology, Atlanta Georgia, 30332
Internet: earl.barfi...@oit.gatech.edu    e...@gatech.edu

**********
Participation and subscription information for this EDUCAUSE Constituent Group 
discussion list can be found at http://www.educause.edu/discuss.
