Hi, friends. There’s been a lot of chatter on EDUCAUSE and other lists about ArubaOS STM issues, ARP issues, code versions, etc. I figured I’d share our details in case they help, or at minimum to commiserate with you all. I’ve gathered experience notes from 10 other schools, and it seems we’re all battling a mix of similar and different issues. I’m chalking it up to another adverse effect of this pandemic (and of the low-usage period that blinded us to issues that undoubtedly stem from past network changes).

TLDR:
* STM issue in stadium: lots of adjustments in place for tomorrow’s game
* Potential STM issue on academic campus: eliminated Central On-Prem
* Preemptively addressed potential ARP issues from bad clients
* TLS clients time out during auth after any netdestination is modified: workaround implemented
* Intermittent auth timeouts during peaks, likely due to Citrix NetScaler (load balancers)

Detail:

1.) Our game last weekend was awful in terms of connectivity. We went into the game with our MMs on 8.6.0.13 (controllers at 8.6.0.10) to preemptively mitigate possible STM issues. With our 6 7240XMs, we got to around 20,000 clients when we started seeing issues. STM CPU never spiked, but the drops increased exponentially and clients were kicked offline, causing more station management load in a cascading way. Public safety issues ensued because we use mobile ticketing and fans who had yet to download their ticket to their phone’s wallet had no Internet connectivity. We cut all NMS monitoring of the controllers from the MM, Airwave, Central On-Prem, and AKiPS. We disabled 802.11k to reduce STM load as well. Controllers were rebooted to “reset” the STM process cleanly and sort of start over, but this didn’t help. The game ended as badly as it started.

Going into tomorrow’s game, here’s what we’re doing / have done:
* Increased the 6-node cluster supporting the stadium to 10 nodes
* Adjusted client and AP load balancing for the cluster to align with more controllers
* Disabled client-match in all ARM profiles
* Enabled ARP attack protection
* Disabled high-efficiency (i.e., disabled 802.11ax features)
* Continuing to block all network monitoring
* Preparing to disable Passpoint either ahead of the game or in case of emergencies

2.) On the academic side, we’re getting a lot of complaints from classrooms about students being unable to connect at the start of class.
We suspect this is a performance problem with our load balancers (Citrix NetScalers): when we change AAA on some controllers to point directly to ClearPass, timeouts decrease considerably for those controllers. We’ve opened a case with Citrix, increased memory on our virtual load balancers, and will be adding CPU cores this evening. Fingers crossed.

3.) We learned from Aruba that Central On-Prem (COP) aggressively bootstraps to the controllers every 5 minutes, causing considerable STM load. We put a block on the controllers for TCP port 4343 from COP to prevent this load.

4.) We paid attention to others’ ARP issues and proactively enabled ARP attack protection on the controllers against badly behaving clients.

5.) We’ve battled an issue for 20 months wherein we make a change to a netdestination and that seemingly unrelated change causes our TLS clients to start timing out authentications. This week, we discovered that an EAP packet of 1514 bytes was being sent but no response was received. We suspect the controller was blocking it, so we decreased the EAP fragmentation size on the controllers to 1400, eliminating this issue. (Finally, a win!)

6.) Aruba is actively working on efficiencies and will likely provide us a c-build in the next 1+ weeks. It’s unclear what exactly will make it into that c-build and the degree to which it’ll improve things.

I’m happy to provide more detail for anyone who wants it. Best of luck to everyone!!

Ryan Holland
Associate Director, Wireless & Network Engineering
The Ohio State University
Office of the Chief Information Officer
Enterprise Networking
102 Telecommunications Network Center (TNC), 320 W. 8th Ave, Columbus, OH 43201
614-292-9906 Office
holland....@osu.edu / wireless.osu.edu
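
For anyone who wants to reason about the fragmentation change in item 5, here is a rough Python sketch of the arithmetic only. It is not taken from the controllers or from ClearPass: the function name and the 4200-byte handshake size are hypothetical, and it ignores EAP, RADIUS, and Ethernet header overhead. It simply shows how a smaller fragment limit caps the size of every piece that has to cross the path.

```python
def split_into_fragments(payload: bytes, max_fragment: int) -> list[bytes]:
    """Split a payload into consecutive chunks of at most max_fragment bytes.

    Hypothetical helper for illustration; real EAP-TLS fragments also
    carry flags and length fields that are not modeled here.
    """
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    return [payload[i:i + max_fragment]
            for i in range(0, len(payload), max_fragment)]


if __name__ == "__main__":
    # Assumed size of a TLS handshake message (e.g. a certificate chain).
    handshake = bytes(4200)
    for limit in (1500, 1400):
        fragments = split_into_fragments(handshake, limit)
        largest = max(len(f) for f in fragments)
        print(f"limit={limit}: {len(fragments)} fragments, largest={largest} bytes")
```

With a limit of 1400, no fragment can exceed 1400 bytes before headers are added, which leaves headroom that a full-size 1514-byte frame does not have.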