Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?
Nick,

Having network roots, I imagine you've tried or evaluated all of this already, but it's still worth mentioning.

1.) From the latent node, traceroute to all the other cluster nodes (since dbrep is more of a mesh nowadays). Is it taking the path you expect, and/or the optimal one if more than one path exists?

2.) High NTP distance to a reference clock can also cause really weird behavior in CCM, as it correlates to dbreplication.

On Nov 6, 2018, at 15:54, Wes Sisk (wsisk) wrote:
> The features you describe are propagated by SDL signaling, with a dependence on database replication. At casual observation it sounds like database traffic between nodes may not be prioritized and may be delayed or dropped.
> [...]

___
cisco-voip mailing list
cisco-voip@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip
Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?
Nick,

The features you describe are propagated by SDL signaling, with a dependence on database replication. At casual observation it sounds like database traffic between nodes may not be prioritized and may be delayed or dropped.

The 80 msec limit is especially important for near real-time convergence of the distributed processes. Concurrently, database replication plays a critical role, as every process reads its local database.

Very casually:

node1: "Hey node2, RouteList5 changed"
node2: "Okay, let me read the changes from my local database"
node2: "I don't see any changes..."

In the meantime, database replication is held up in the network...

-Wes

On Nov 6, 2018, at 3:31 PM, Nick Barnett wrote:
> We think it is happening frequently WITHOUT this command being run. Weird stuff happens... like deleting a speed dial and it never goes away...
> [...]
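Wes's node1/node2 exchange can be sketched as a small timing race. This is only a toy model, not how CUCM is implemented: the function name `simulate`, the `"old-order"`/`"new-order"` values, and the delay figures are all made up for illustration. It shows that when the change signal reaches node2 before the replicated row does, node2 reads stale data from its local database.

```python
import heapq


def simulate(sdl_delay_ms, repl_delay_ms):
    """Return the RouteList5 value node2 reads when the change signal arrives.

    Toy model (an assumption, not CUCM internals): node1 commits a change at
    t=0, then a signal and the replicated row travel to node2 independently.
    """
    node2_db = {"RouteList5": "old-order"}  # node2's local replica
    events = []  # (arrival time in ms, tie-breaker, kind, payload)
    heapq.heappush(events, (repl_delay_ms, 0, "replication", "new-order"))
    heapq.heappush(events, (sdl_delay_ms, 1, "signal", None))

    seen = None
    while events:
        _, _, kind, payload = heapq.heappop(events)
        if kind == "replication":
            node2_db["RouteList5"] = payload  # replica catches up
        else:
            seen = node2_db["RouteList5"]  # node2 reads its local database
    return seen


# Replication arrives first: node2 sees the change.
print(simulate(sdl_delay_ms=20, repl_delay_ms=15))
# Replication is held up in the network: node2 reads the stale row.
print(simulate(sdl_delay_ms=20, repl_delay_ms=250))
```

In this model the outcome depends only on which of the two arrives first, which is why delayed or unprioritized database traffic could produce the "change never took effect" symptoms even when signaling itself is healthy.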
Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?
We think it is happening frequently WITHOUT this command being run. Weird stuff happens... like deleting a speed dial and it never goes away... or changing the distribution order on a route list that automatically reverts after a few seconds... or maybe the GUI shows it never reverted, but it is clearly not performing the correct algorithm.

I can duplicate the RTT issue by raising the packet size to 1200 and doing a repeat of 100 packets. It WILL give me times over 80ms. BUT, the SDL traffic is supposed to be QoS'd in a certain way, and I'm sure that the pings I'm doing are NOT being classified and queued properly.

It is very frustrating that I know what I'm talking about (enough to discuss it with them, though it has been 7 years since I was 100% router jockey) and still can't get them to pay attention to a probable network issue. I have an IP SLA running that shows average latency in the 20ms range. IP SLA is a red herring if you ask me... it only looks at an AVERAGE every 5 minutes, and if there are no issues for most of that window, of course it will look great.

Thanks,
Nick

On Tue, Nov 6, 2018 at 12:42 PM Ryan Huff wrote:
> You are able to correlate the out-of-band RTT to only when the
> dbreplication stat command is ran, or are there other times the RTT is OOB
> that isn't related to querying the replication status?
>
> Thanks,
>
> -R
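The point about averages hiding spikes can be shown with a few lines of arithmetic. The sketch below uses invented sample data (300 one-per-second probes, three of them at 150 ms); `summarize` is a hypothetical helper, and nothing here reflects how IP SLA actually samples or reports.

```python
def summarize(rtts_ms, limit_ms=80.0):
    """Summarize RTT samples: the mean, the worst case, and the count over the limit."""
    return {
        "mean_ms": sum(rtts_ms) / len(rtts_ms),
        "max_ms": max(rtts_ms),
        "over_limit": sum(1 for rtt in rtts_ms if rtt > limit_ms),
    }


# Hypothetical 5-minute window: 297 probes at 20 ms, 3 spikes at 150 ms.
window = [20.0] * 297 + [150.0] * 3
print(summarize(window))
```

With these made-up numbers the mean stays near 21 ms even though three samples are almost double the 80 ms limit, so a report that shows only the average looks healthy. Asking for the maximum (or a count of samples over the limit) per interval gives the network team something concrete to look at.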
Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?
Yes, I agree, this is a super common "discussion" between app and network teams... I'm a converted network engineer (like I bet many people are these days)... so I know all the tricks to push it back on the app :)

On Tue, Nov 6, 2018 at 12:25 PM Wes Sisk (wsisk) wrote:
> [...]
> IMHO - and I emphasize the MY in that - this is a rather classic discussion
> point between application teams and network teams.
> [...]
Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?
Not a bad idea, but they have so many more tools to do this. I'll keep this in mind though. Thanks.

On Tue, Nov 6, 2018 at 11:06 AM Matt Jacobson wrote:
> You could use the CLI packet capture with some filters to maximize the
> capture window, run the dbreplication command once or twice, and then stop
> the capture. Pop open RTMT, download the capture(s), and then see what you
> find in Wireshark.
Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?
Are you able to correlate the out-of-bounds (OOB) RTT only with runs of the dbreplication status command, or are there other times the RTT is OOB that aren't related to querying the replication status?

Thanks,
-R

From: cisco-voip on behalf of Nick Barnett
Sent: Tuesday, November 6, 2018 11:57 AM
To: Cisco VoIP Group
Subject: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

> We all know the max latency is 80ms, but ours occasionally goes over.
> [...]
Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?
Nick,

The command is invoking database commands that Cisco does not own. They are not being obtuse; they genuinely do not know.

It will cause a spike in database communication between nodes.

My first guess is very much in line with yours: the burst in traffic exceeds certain QoS queues.

IMHO - and I emphasize the MY in that - this is a rather classic discussion point between application teams and network teams.

What Matt suggests in a subsequent response is the rather data-intensive way of getting that information. Fortunately, Wireshark has graphs for round-trip time.

-Wes

On Nov 6, 2018, at 11:57 AM, Nick Barnett wrote:
> We all know the max latency is 80ms, but ours occasionally goes over.
> [...]
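The "burst exceeds the QoS queue" guess can be put into rough numbers. The sketch below is a deliberately simplified model, assuming an empty FIFO class queue, a burst that arrives all at once, tail drop, and a fixed drain rate. The function `queue_burst` and every figure passed to it are hypothetical, not measurements from this network.

```python
def queue_burst(burst_packets, packet_bytes, rate_kbps, queue_limit_packets):
    """Model a burst hitting an empty FIFO queue drained at a fixed rate.

    Returns (worst-case queuing delay in ms, packets tail-dropped).
    """
    accepted = min(burst_packets, queue_limit_packets)
    dropped = burst_packets - accepted
    # bits divided by kbit/s gives milliseconds per packet
    per_packet_ms = packet_bytes * 8 / rate_kbps
    # the last accepted packet waits behind everything queued ahead of it
    worst_delay_ms = accepted * per_packet_ms
    return worst_delay_ms, dropped


# Hypothetical: 20 packets of 1200 bytes into a 1 Mbps class, 64-packet queue.
print(queue_burst(20, 1200, 1000, 64))
# Same burst with a 10-packet queue: less delay, but half the burst is dropped.
print(queue_burst(20, 1200, 1000, 10))
```

Under these assumed values even a modest burst adds well over 80 ms for the last packet in the queue, or gets partly dropped and retransmitted, while the link looks fine on average. Plugging in the real class bandwidth and queue depth from the WAN policy would show whether the theory is plausible here.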
Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?
You could use the CLI packet capture with some filters to maximize the capture window, run the dbreplication command once or twice, and then stop the capture. Pop open RTMT, download the capture(s), and then see what you find in Wireshark.

On Tue, Nov 6, 2018 at 20:58 Nick Barnett wrote:
> We all know the max latency is 80ms, but ours occasionally goes over.
> [...]
[cisco-voip] WAN Delays > 80ms for CUCM cluster?
We all know the max latency is 80ms, but ours occasionally goes over. I'm trying to track down why, but the network team cannot find an issue. We are able to reproduce the issue repeatedly by running "utils dbreplication runtimestate." Whether this is causing the issue (I doubt it) or that command just takes long enough to run that it will eventually find a time that is > 80ms (my guess is yes)... I'm not 100% sure.

We opened a case with TAC to find out what that command is actually doing, but they won't divulge the info that our network team needs.

My theory is that it's actually calling some shell script in Red Hat under the CLI appliance layer. Has anyone investigated that? Do we know what this command is actually doing? Specifically, I want to know where it's getting those ping times... is it running a generic ping with generic datagram data? Is it sending a 1497 packet of 0x and then 0x? Basically, I'm trying to give the network team something to go on because they are saying it's not them. (Of course they could run a packet capture and tell me (mostly) what it's doing, but it's hard to get their attention when they don't think it's on their end.)

Thanks,
Nick

P.S. We have frequent DB replication issues... at least a few times per quarter. This is so annoying and I'm pretty sure it's due to this latency, but I can't get anyone to pay attention.