Hi, friends. There’s been a lot of chatter on Educause and other lists related 
to ArubaOS STM issues, ARP issues, code versions, etc. I figured I’d share our 
details in case they help or, at minimum, to commiserate with you all. I’ve 
gathered experience notes from 10 other schools, and it seems we’re all 
battling a mix of similar and different issues. I’m chalking it up to another 
adverse effect of this pandemic (and the low-usage period that blinded us to 
issues that undoubtedly surfaced from past network changes).

TLDR:

  *   STM issue in stadium – lots of adjustments in place for tomorrow’s game
  *   Potential STM issue in academic campus – eliminated Central On-Prem
  *   Preemptively addressed potential ARP issues from bad clients
  *   TLS clients time out during auth after we modify any netdestination – 
workaround implemented
  *   Intermittent auth timeouts during peaks likely due to Citrix NetScaler 
(load balancers)

Detail:
1.) Our game last weekend was awful in terms of connectivity. We went into the 
game with our MMs on 8.6.0.13 (controllers at 8.6.0.10) to preemptively 
mitigate possible STM issues. With our six 7240XMs, we got to around 20,000 
clients before we started seeing issues. STM CPU never spiked, but the drops 
increased exponentially and clients were kicked offline, adding more station 
management load in a cascading way. Public safety issues ensued due to our use 
of mobile ticketing and the lack of Internet connectivity for those who had 
yet to download their ticket to their phone’s wallet. We cut all NMS monitoring 
of the controllers from the MM, Airwave, Central On-Prem, and AKiPS. We also 
disabled 802.11k to reduce STM load. Controllers were rebooted to “reset” the 
STM process cleanly and more or less start over, but this didn’t help. The game 
ended as badly as it started.

Going into tomorrow’s game, here’s what we’re doing / have done:

  *   Increased 6-node cluster supporting the stadium to 10-node
  *   Adjusted client and AP load balancing for the cluster to align with more 
controllers
  *   Disabled client-match in all ARM profiles
  *   Enabled ARP attack protection
  *   Disabled high-efficiency (i.e., disabled 802.11ax features)
  *   Continuing to block all network monitoring
  *   Preparing to disable Passpoint either ahead of the game or in case of 
emergencies
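
For a rough sense of what the larger cluster buys us, here is a 
back-of-the-envelope sketch in Python. It assumes clients spread perfectly 
evenly across cluster members (real load balancing won’t be that tidy) and 
uses the roughly 20,000 clients where last week’s trouble began. The function 
name is made up for illustration.

```python
def clients_per_controller(total_clients: int, nodes: int) -> int:
    """Clients on the busiest node, ASSUMING a perfectly even split.

    Uses integer ceiling division so a remainder counts against one node.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    if total_clients < 0:
        raise ValueError("total_clients must be non-negative")
    return (total_clients + nodes - 1) // nodes


if __name__ == "__main__":
    total = 20_000  # approximate client count where issues began last week
    for nodes in (6, 10):
        per_node = clients_per_controller(total, nodes)
        print(f"{nodes} nodes: about {per_node} clients per controller")
```

This ignores failover headroom and uneven AP placement, so treat the numbers as 
a floor on per-controller load rather than a prediction.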

2.) On the academic side, we’re getting a lot of complaints from classrooms 
about students being unable to connect at the start of class. We suspect this 
is a performance issue with our load balancers (Citrix NetScalers): when we 
point AAA on some controllers directly at ClearPass, timeouts decrease 
considerably for those controllers. We’ve opened a case with Citrix, increased 
memory on our virtual load balancers, and will be adding CPU cores this 
evening. Fingers crossed.

3.) We learned from Aruba that Central On-Prem (COP) aggressively bootstraps to 
the controllers every 5 minutes, causing considerable STM load. We put a block 
on the controllers for TCP port 4343 from COP to prevent this load.

4.) We paid attention to others’ ARP issues and proactively enabled ARP attack 
protection on the controllers to guard against badly behaving clients.

5.) We’ve battled an issue for 20 months wherein a change to a netdestination, 
seemingly unrelated, causes our TLS clients to start timing out 
authentications. This week, we discovered that an EAP packet of 1514 bytes was 
being sent but no response was received. We suspect the controller was 
blocking it, so we decreased the EAP fragmentation size on the controllers to 
1400, eliminating this issue. (Finally, a win!)
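
To illustrate what the smaller fragmentation size changes, here is a minimal 
Python sketch that splits a handshake payload into fragments of at most a 
given size. The 4,000-byte payload is a made-up figure, not a measurement from 
our network, and the sketch treats the whole fragment size as usable payload, 
ignoring the EAP and RADIUS header overhead that real EAP-TLS fragments carry.

```python
from typing import List


def split_into_fragments(payload_bytes: int, max_fragment: int) -> List[int]:
    """Return the size of each fragment when a payload is cut into
    pieces no larger than max_fragment (header overhead is ignored)."""
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    sizes = []
    remaining = payload_bytes
    while remaining > 0:
        chunk = min(remaining, max_fragment)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


if __name__ == "__main__":
    payload = 4_000  # hypothetical TLS certificate chain size in bytes
    for limit in (1514, 1400):
        sizes = split_into_fragments(payload, limit)
        print(f"fragment size {limit}: {sizes} (largest piece {max(sizes)})")
```

With the lower limit, no single piece reaches the 1514 size we saw going 
unanswered, which is the only point the sketch is meant to make.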

6.) Aruba is actively working on efficiency improvements and will likely 
provide us a c-build in a week or more. It’s unclear what exactly will make it 
into that c-build and how much it’ll improve things.

I’m happy to provide more detail for anyone that wants it.

Best of luck to everyone!!

Ryan Holland
Associate Director, Wireless & Network Engineering
The Ohio State University
Office of the Chief Information Officer Enterprise Networking
102 Telecommunications Network Center (TNC), 320 W. 8th Ave, Columbus, OH 43201
614-292-9906 Office
holland....@osu.edu / wireless.osu.edu


