Date:    Thu, 24 Sep 2015 15:30:59 +0000
From:    "Curtis K. Larsen" <curtis.k.lar...@utah.edu>
Subject: Cisco WLC RADIUS Packet ID Bug

Hi Guys,

I have a TAC case open on this, but it looks like about once a week, when the 
perfect storm arises, we hit this one for a couple of minutes:  
CSCuo96366

---
WLC sends Radius packets with same ID without doing Radius ID check
CSCuo96366
Description
Symptom:
Clients are not able to Authenticate at Peak loads when using FreeRadius.

Conditions:
Using FreeRADIUS (most susceptible), we observe that at a high auth rate, if the 
RADIUS server does not respond to all RADIUS packets in sequence order or if the 
server is slow, the WLC does not check for an ID that is still in use when it 
wraps around the 0-255 RADIUS ID range and posts a new packet.

So essentially you have 2 packets with same ID being presented to AAA server.
---
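
To make the wrap-around problem concrete, here is a minimal sketch in Python. 
It is not Cisco's code: the class name, the `simulate` helper and the choice of 
one slow request on ID 253 are all assumptions made for illustration. It 
compares an allocator that reuses an ID blindly after wrapping with one that 
skips IDs still waiting for a reply.

```python
class RadiusIdAllocator:
    """Hands out 8-bit RADIUS identifiers (0-255) for one source port.

    Hypothetical model, not the WLC implementation.
    """

    def __init__(self, check_in_flight):
        self.check_in_flight = check_in_flight
        self.next_id = 0
        self.in_flight = set()   # IDs still waiting for a reply
        self.collisions = 0      # times an in-use ID was handed out again

    def allocate(self):
        for _ in range(256):
            pkt_id = self.next_id
            self.next_id = (self.next_id + 1) % 256
            if pkt_id not in self.in_flight:
                self.in_flight.add(pkt_id)
                return pkt_id
            if not self.check_in_flight:
                # Behaviour described in the bug: reuse the ID anyway.
                self.collisions += 1
                return pkt_id
        return None  # every ID is busy; the caller has to wait

    def complete(self, pkt_id):
        self.in_flight.discard(pkt_id)


def simulate(check_in_flight, requests=600, slow_id=253):
    """Send `requests` packets; the server answers all but `slow_id` promptly."""
    alloc = RadiusIdAllocator(check_in_flight)
    for _ in range(requests):
        pkt_id = alloc.allocate()
        if pkt_id != slow_id:
            alloc.complete(pkt_id)
    return alloc.collisions


print("unchecked collisions:", simulate(check_in_flight=False))
print("checked collisions:  ", simulate(check_in_flight=True))
```

With a single unanswered request, the unchecked allocator presents the same ID 
to the server again on the next wrap, which is the situation the bug text 
describes.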

The funny thing is that 9 of 10 WLC's are working fine against the same servers 
at the same time - the problem only happens on one WLC.  When it occurs we see 
this in the logs (Notice the same ID number 253 below)

servername radiusd[23964]: Discarding conflicting packet from client (IP of 
WLC) port 32770 - ID: 253 due to recent request 57345605.
servername radiusd[23964]: Discarding conflicting packet from client (IP of 
WLC) port 32770 - ID: 253 due to recent request 57347264
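
If you want to see how often this happens, a short Python sketch like the one 
below can tally these messages per client, source port and ID. The regular 
expression is an assumption based only on the two log lines quoted above 
(unwrapped onto single lines); the function name is made up for this example, 
and other FreeRADIUS versions may word the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the two sample lines above; treat it as an assumption.
CONFLICT_RE = re.compile(
    r"Discarding conflicting packet from client (?P<client>.+?) "
    r"port (?P<port>\d+) - ID: (?P<id>\d+)"
)


def tally_conflicts(lines):
    """Count conflict messages per (client, source port, RADIUS ID)."""
    counts = Counter()
    for line in lines:
        m = CONFLICT_RE.search(line)
        if m:
            key = (m.group("client"), int(m.group("port")), int(m.group("id")))
            counts[key] += 1
    return counts


# The two lines from this message, each joined back onto one line.
sample = [
    "servername radiusd[23964]: Discarding conflicting packet from client "
    "(IP of WLC) port 32770 - ID: 253 due to recent request 57345605.",
    "servername radiusd[23964]: Discarding conflicting packet from client "
    "(IP of WLC) port 32770 - ID: 253 due to recent request 57347264",
]

for key, n in tally_conflicts(sample).most_common():
    print(key, n)
```

Pointing `tally_conflicts` at an open log file instead of `sample` would show 
whether the conflicts cluster on one controller and one source port.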

Wondering if other Cisco WLC customers see this since I know a lot of you are 
using FreeRADIUS, or FreeRADIUS-based authentication servers.  If so, let me 
know of any solutions and/or work-arounds.



Oh, Man!   I spent 18 months waiting for Cisco to fix this, sending
packet trace after packet trace and talking to anyone who would listen.

They finally fixed this in 8.1 by using eight different UDP source
ports (hashed on client mac) to send radius requests to the freeradius
server.   This has been an absolutely HUGE improvement to our users!!!
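
Why this helps: a RADIUS server tells requests apart by client IP, source
port and ID, so each extra source port gives the controller its own set of
256 IDs. Below is a rough Python sketch of the idea. I do not know the hash
Cisco actually uses, so CRC32 is only a stand-in, and the base port and
function names are assumptions made for illustration.

```python
import zlib

BASE_PORT = 32770    # assumed for illustration; not taken from Cisco documentation
NUM_PORTS = 8        # "eight different UDP source ports"
IDS_PER_PORT = 256   # the RADIUS ID field is one byte


def source_port_for(mac):
    """Map a client MAC to one of NUM_PORTS source ports.

    CRC32 is a stand-in hash; the real WLC hash is unknown to me.
    """
    normalized = mac.lower().replace("-", ":").encode("ascii")
    return BASE_PORT + (zlib.crc32(normalized) % NUM_PORTS)


def max_outstanding():
    """Upper bound on requests in flight before any (port, ID) pair repeats."""
    return NUM_PORTS * IDS_PER_PORT


print(source_port_for("00:11:22:33:44:55"))
print(max_outstanding())
```

Hashing on the client MAC also keeps every packet of one client's EAP
conversation on the same source port.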

Previously, we would have a cascade chain reaction at almost every class
change when thousands of students would relocate and then all
authenticate to Wifi within a minute or two.

The first conflicting packet would get discarded, causing a timeout.
The second discarded conflicting packet would again cause a timeout.
The third would cause the WiSM to fail over to the other radius server
and stupidly spew all the half-completed EAP conversations to the newly
active radius server, which would ignore them.   The WiSM interpreted
this as more timeouts and failed to the tertiary radius server.

All this re-auth and failover caused utter havoc and it went on for
five minutes or so at every class change.

We added radius servers, plus dedicated AD servers to serve the radius
servers.   The only workaround that really helped before the fix in 8.1
code was to add controllers in order to keep the number of clients per
controller down.

I could talk about this forever after spending a year swimming in
radius packet decodes.   Suffice it to say: Get to 8.1 code ASAP!!!

I don't care what other bugs it may or may not have, this outweighs
them all for us.





--
Earl Barfield -- Academic & Research Tech / Information Technology
Georgia Institute of Technology, Atlanta Georgia, 30332
Internet: earl.barfi...@oit.gatech.edu    e...@gatech.edu

**********
Participation and subscription information for this EDUCAUSE Constituent Group 
discussion list can be found at http://www.educause.edu/groups/.
