RE: [WIRELESS-LAN] spurious cpi report of mass AP disassociation

Hector J Rios Tue, 12 Sep 2017 07:31:00 -0700

I can confirm Manon’s statement. In a call with Cisco TAC yesterday, they 
confirmed that the issues we experienced during the first week of classes 
(massive AP flaps) were related to bug CSCva66176. Cisco was able to recreate 
this in the lab. In our case, we had our HA hot-standby controllers defined as 
mobility group members, but because they are in hot-standby mode, their status 
is “Control and Data Path Down”. The resolution is to use the Mobility MAC 
address instead.


Please double-check with your local Cisco team, but we are pretty sure this 
caused our problems. It is supposed to be fixed in 8.3. We asked Cisco for a 
patch for 8.2; not sure if that will happen.

I know everyone has different experiences, but after the Cisco TAC fiasco we 
had, the Cisco BU and our local team were very responsive and diligent in 
addressing this particular issue for us.

Regards,

Hector Rios
Louisiana State University



From: The EDUCAUSE Wireless Issues Constituent Group Listserv 
[mailto:[email protected]] On Behalf Of Mark Duling
Sent: Monday, September 11, 2017 7:49 PM
To: [email protected]
Subject: Re: [WIRELESS-LAN] spurious cpi report of mass AP disassociation

Thanks for all the replies, everyone. I just logged into one AP on the list, and 
for the day it happened all I see are some of these:

%DOT11-4-CCMP_REPLAY .... AES-CCMP TSC replays

and two of these for a client:

%DOT11-4-FLUSH_DEAUTH: Consecutive tx fail 500+: deauth

I'm not used to looking at AP logs, but I would think that if the AP thought it 
had disassociated, it would say so. Another AP on the list shows nothing 
corresponding to the time (if I've translated the time properly), but its radio 
interface was reset during the day.

*Sep  8 00:45:20.799: %LINK-6-UPDOWN: Interface Dot11Radio1, changed state to down
*Sep  8 00:45:20.803: %LINK-5-CHANGED: Interface Dot11Radio1, changed state to reset
*Sep  8 00:45:21.807: %LINEPROTO-5-UPDOWN: Line protocol on Interface Dot11Radio1, changed state to down
*Sep  8 00:45:21.831: %DOT11-6-DFS_SCAN_START: DFS: Scanning frequency 5500 MHz for 60 seconds.
*Sep  8 00:45:21.835: %LINK-6-UPDOWN: Interface Dot11Radio1, changed state to up
*Sep  8 00:45:22.835: %LINEPROTO-5-UPDOWN: Line protocol on Interface Dot11Radio1, changed state to up
*Sep  8 00:45:35.347: %CLEANAIR-6-STATE: Slot 1 down
*Sep  8 00:45:52.167: %CLEANAIR-6-STATE: Slot 1 enabled
*Sep  8 00:46:21.947: %DOT11-6-DFS_SCAN_COMPLETE: DFS scan complete on frequency 5500 MHz
*Sep  8 01:28:39.379: %DOT11-4-CCMP_REPLAY: Client [redacted] had 1 AES-CCMP TSC replays
*Sep  8 02:03:10.883: %DOT11-4-CCMP_REPLAY: Client [redacted] had 1 AES-CCMP TSC replays
*Sep  8 21:44:55.403: %DOT11-4-CCMP_REPLAY: Client [redacted] had 46 AES-CCMP TSC replays

Not sure what to make of the logs.
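Correlating these AP entries with the CPI event (as Jeff suggests in the message quoted below) mostly comes down to translating the times. The sketch below is a hypothetical helper, not a Cisco tool: it parses lines in the format shown above, attaches a year and time zone that the caller must supply (the AP lines carry neither), and keeps only entries within a window around the CPI timestamp. The AP clock zone (UTC) and the CPI event time (17:45 PDT on Sep 7) used in the example are assumptions for illustration only.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches AP log lines such as:
# *Sep  8 00:45:20.799: %LINK-6-UPDOWN: Interface Dot11Radio1, changed state to down
LOG_RE = re.compile(
    r"^\*?(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}):\s+"
    r"%(?P<facility>[A-Z0-9_]+)-(?P<sev>\d)-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<msg>.*)$"
)


def parse_line(line, year, ap_tz):
    """Return (timestamp, facility, severity, mnemonic, message), or None.

    The AP lines carry no year or time zone, so the caller supplies both.
    Month names are parsed with %b, which assumes an English/C locale.
    """
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(
        f"{year} {m['mon']} {m['day']} {m['time']}", "%Y %b %d %H:%M:%S.%f"
    ).replace(tzinfo=ap_tz)
    return stamp, m["facility"], int(m["sev"]), m["mnemonic"], m["msg"]


def events_near(lines, event_time, window, year, ap_tz):
    """Keep parsed entries within +/- window of event_time (an aware datetime)."""
    hits = []
    for line in lines:
        entry = parse_line(line, year, ap_tz)
        if entry is not None and abs(entry[0] - event_time) <= window:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    ap_log = [
        "*Sep  8 00:45:20.799: %LINK-6-UPDOWN: Interface Dot11Radio1, changed state to down",
        "*Sep  8 00:45:21.831: %DOT11-6-DFS_SCAN_START: DFS: Scanning frequency 5500 MHz for 60 seconds.",
        "*Sep  8 01:28:39.379: %DOT11-4-CCMP_REPLAY: Client [redacted] had 1 AES-CCMP TSC replays",
    ]
    # Hypothetical: AP clock in UTC, CPI event reported at 17:45 PDT (UTC-7) on Sep 7.
    pdt = timezone(timedelta(hours=-7))
    cpi_event = datetime(2017, 9, 7, 17, 45, tzinfo=pdt)
    for stamp, fac, sev, mnem, msg in events_near(
        ap_log, cpi_event, timedelta(minutes=5), 2017, timezone.utc
    ):
        # Show each matching entry in the controller's (assumed) local time.
        print(stamp.astimezone(pdt).isoformat(), f"%{fac}-{sev}-{mnem}", msg)
```

If the AP clock is not actually in UTC, only the `ap_tz` argument needs to change; the comparison itself works across zones because both timestamps are timezone-aware.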

On Mon, Sep 11, 2017 at 2:04 PM, Jeffrey D. Sessler <[email protected]> wrote:
Did you go back and correlate the event? For example, SSH into a few of the 
WAPs and look at their logs to see what they thought happened. Did the CAPWAP 
uptime on those WAPs, and/or the hours they report being connected, actually 
change? The WAP logs tend to be very informative.

If you use DHCP to hand out IPs for the WAPs, did you have a look at your DHCP 
logs? Many years ago, I saw something similar, and it turned out to be the DHCP 
server: a mass of WAPs went to renew at the same time, the DHCP server couldn’t 
take the load, and after failing to renew, the WAPs disassociated and 
reassociated en masse.
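Jeff's DHCP scenario can be illustrated with the lease timers from RFC 2131, where a client's renewal time T1 defaults to half the lease duration. The sketch below is illustrative only: the AP names, lease start time, and one-day lease are hypothetical, and it ignores the random "fuzz" around T1 that RFC 2131 recommends to avoid exactly this synchronization (whether a given AP applies it is not known here). It shows why APs that all obtained leases at the same instant, for example after a power event, would all try to renew in the same second.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def renewal_times(lease_starts, lease_seconds):
    """Map each client to its T1 renewal time, using the RFC 2131 default
    T1 = 0.5 * lease duration (no random fuzz applied)."""
    t1 = timedelta(seconds=0.5 * lease_seconds)
    return {client: start + t1 for client, start in lease_starts.items()}


def largest_burst(renewals, bucket_seconds=1):
    """Size of the largest group of renewals falling in the same time bucket.

    Expects timezone-aware datetimes so .timestamp() is unambiguous.
    """
    buckets = Counter(int(t.timestamp()) // bucket_seconds for t in renewals.values())
    return max(buckets.values(), default=0)


if __name__ == "__main__":
    # Hypothetical: 200 APs with a one-day lease, all leased at the same instant.
    boot = datetime(2017, 9, 1, 6, 0, tzinfo=timezone.utc)
    together = {f"ap{i}": boot for i in range(200)}
    # The same APs, leased 37 seconds apart from one another.
    staggered = {f"ap{i}": boot + timedelta(seconds=37 * i) for i in range(200)}
    print(largest_burst(renewal_times(together, 86400)))   # 200
    print(largest_burst(renewal_times(staggered, 86400)))  # 1
```

This only models the timing; the DHCP server's own logs remain the place to confirm whether renewals actually failed.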

Jeff

From: "[email protected]" on behalf of "[email protected]"
Reply-To: "[email protected]"
Date: Monday, September 11, 2017 at 11:48 AM
To: "[email protected]"
Subject: [WIRELESS-LAN] spurious cpi report of mass AP disassociation

We're using a Cisco 8540 on code 8.2.151.0. Last week CPI reported a large 
number of simultaneous AP disassociations followed by reassociations. CPI shows 
all the events with exactly the same timestamp, right down to the hundredth of 
a second. It was just a single event.

But I can find no event preceding it that would cause such a thing, and no 
preceding controller errors that I can see. At least a hundred APs were on the 
list. The APs weren't the same type or in the same buildings. I can find 
nothing at all that they have in common.

No one called in to report any issues. I would think that if the APs really did 
drop, users on an affected AP would have noticed. Only one AP in the building 
housing IT was on the report, so perhaps it's not surprising that none of us 
noticed anything.

Has anyone out there seen anything like this? Aside from the unknown cause, is 
it possible for disassociation and reassociation to happen fast enough that 
users wouldn't see any serious disruption if they were only doing stateless 
stuff? I'd have trouble believing the controller would report AP drops that 
didn't happen.
********** Participation and subscription information for this EDUCAUSE 
Constituent Group discussion list can be found at 
http://www.educause.edu/discuss.
