Rob,

When we were troubleshooting an issue last summer, I recall Aruba TAC mentioning problems with out-of-order packets. As a troubleshooting step, we removed all ECMP and reduced the LACP bundles between AP and controller to a single member to eliminate this possibility. Just a thought…

-j

--
Jason Lavoie
Director of Networking and Infrastructure
Bowdoin College

> On Dec 29, 2020, at 11:07 AM, Robert Spellman <rsp...@bates.edu> wrote:
>
> Juniper, huh? We are a Juniper shop too. Odd thing is, we haven't made any
> hardware or configuration changes this month. We have been building new
> switches for our top of rack, but until Sunday, I hadn't moved the aruba
> controllers off of the old layer 2 switches. Once the problem started, I
> moved them to the new switches, and the problem continued.
>
> I do agree that this could be a packet loss issue. Pings from the access
> points to the controllers are fine though, even when the access point
> connection is down. I've turned on ap-debug on one of the access points in
> my office, and I do see a number of messages indicating that the ap received
> a packet with the wrong sequence number, as well as messages indicating the
> retry timer is getting called.
>
> In looking at the wrong sequence number errors, it looks like I receive the
> response from the controller twice. The ap is fine with the first packet,
> but doesn't like the second packet:
>
> Dec 29 10:26:04 sap_msg_proc: received message from <controller-ip>:8222: code 16101 type PWR_EVENT_UPDATE id (<ap-ip>,00eb3380,512) length 5
> Dec 29 10:26:04 C>> sapd_proc_sap_resp: msg from <controller-ip>:8222 id (<ap-ip>,00eb3380,512) type PWR_EVENT_UPDATE result 0 size 0
> Dec 29 10:26:04 PMM: sapd_pwr_event_send_cb: is standby: 0, result OK.
> Dec 29 10:26:04 sapd_proc_sap_resp: Calling 'queued' message type 18 Send now
> Dec 29 10:26:04 C<< sapd_msg_send_cur: Sending msg ID (<ap-ip>,00eb3380,513) type PWR_EVENT_UPDATE len 182 secure 0 to <controller-ip>
> Dec 29 10:26:04 sapd_msg_send_cur: Retry Timer called for message PWR_EVENT_UPDATE, with 10 rexmit time
> Dec 29 10:26:04 sapd_svc_recv_sap_msg_output: len: 25 type: PWR_EVENT_UPDATE seq_num: 512.
>
> Dec 29 10:26:04 sap_msg_proc: received message from <controller-ip>:8222: code 16101 type PWR_EVENT_UPDATE id (<ap-ip>,00eb3380,512) length 5
> Dec 29 10:26:04 C>> sapd_proc_sap_resp: msg from <controller-ip>:8222 id (<ap-ip>,00eb3380,512) type PWR_EVENT_UPDATE result 0 size 0
> Dec 29 10:26:04 sapd_proc_sap_resp: Non-matching response received; dropped. 1 18 != 18, expecting id (<ap-ip>,00eb3380,513)
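
The duplicate response described in the quoted message (the same id arriving twice, with the second copy rejected) can be picked out mechanically from a longer ap-debug capture. Below is a minimal Python sketch, assuming the log keeps the "received message from ... type ... id (a,b,c)" shape seen in the excerpt. The function name find_duplicate_responses and the regular expression are illustrative only and are not part of any Aruba tooling.

```python
import re
from collections import Counter

# Matches the sap_msg_proc "received message" lines from the excerpt.
# The three comma-separated id fields are captured as-is; their meaning
# beyond "last field looks like a sequence number" is an assumption.
RECEIVED = re.compile(
    r"received message from (\S+) code (\d+) type (\w+) "
    r"id \(([^,]+),([0-9a-fA-F]+),(\d+)\)"
)


def find_duplicate_responses(log_text):
    """Return {(type, id_field1, id_field2, seq): count} for ids seen more than once."""
    # Drop email quote markers and undo line wrapping before matching.
    lines = [re.sub(r"^\s*>\s?", "", ln) for ln in log_text.splitlines()]
    flat = " ".join(" ".join(lines).split())
    counts = Counter(
        (m.group(3), m.group(4), m.group(5), int(m.group(6)))
        for m in RECEIVED.finditer(flat)
    )
    return {key: n for key, n in counts.items() if n > 1}
```

Run against the excerpt, this would report the PWR_EVENT_UPDATE message with sequence number 512 as received twice. It only counts repeats; it does not say whether the extra copy came from a retransmission or from the network duplicating the packet.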