Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Ryan Huff
Nick,

Having network roots, I imagine you’ve tried / evaluated all of this already,
but it's still worth mentioning.

1.) From the latent node, traceroute to all the other cluster nodes (since 
dbrep is more of a mesh nowadays). Is it taking the path you expect and/or the 
most optimal if more than one path exists?

2.) High NTP distance to a reference clock can also cause really weird
behavior in CCM, as it correlates to dbreplication.

On Nov 6, 2018, at 15:54, Wes Sisk (wsisk) <ws...@cisco.com> wrote:

Nick,

The features you describe are propagated by SDL signaling and also depend
on database replication.

At casual observation it sounds like database traffic between nodes may not
be prioritized and may be delayed or dropped.

The 80 msec limit is especially important for near-real-time convergence of
the distributed processes. Concurrently, database replication plays a
critical role, as every process reads its local database.

Very casually:
node1: "Hey node2, RouteList5 changed"
node2: "Okay, let me read the changes from my local database"
node2: "I don't see any changes...."

In the meantime, database replication is held up in the network....

-Wes


On Nov 6, 2018, at 3:31 PM, Nick Barnett <nicksbarn...@gmail.com> wrote:

We think it is happening frequently WITHOUT this command being run. Weird
stuff happens... like deleting a speed dial and it never goes away... or
changing the distribution order on a route list and having it automatically
revert after a few seconds... or maybe the GUI shows it never reverted,
yet the route list is clearly not performing the correct algorithm. I can
duplicate the RTT issue by raising the packet size to 1200 and sending a
repeat of 100 packets; it WILL give me times over 80ms. BUT the SDL traffic
is supposed to get QoS treatment, and I'm sure that the pings I'm doing are
NOT being classified and queued the same way. It is very frustrating that I
know what I'm talking about (enough to discuss it with them, though it has
been 7 years since I was 100% router jockey) and still can't get them to
pay attention to a probable network issue.

I have an IP SLA running that shows average latency in the 20ms range. IP
SLA is a red herring if you ask me... it only looks at an AVERAGE every 5
minutes, so brief spikes disappear into the average and of course it looks
great.

Thanks,
Nick

On Tue, Nov 6, 2018 at 12:42 PM Ryan Huff <ryanh...@outlook.com> wrote:
Are you able to correlate the out-of-bounds RTT only to when the
dbreplication stat command is run, or are there other times the RTT is OOB
that aren't related to querying the replication status?

Thanks,

-R

From: cisco-voip <cisco-voip-boun...@puck.nether.net>
on behalf of Nick Barnett <nicksbarn...@gmail.com>
Sent: Tuesday, November 6, 2018 11:57 AM
To: Cisco VoIP Group
Subject: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

We all know the max latency is 80ms, but ours occasionally goes over. I'm
trying to track down why, but the network team cannot find an issue. We are
able to reproduce the issue repeatedly by running "utils dbreplication
runtimestate". Whether this command is causing the issue (I doubt it) or it
just takes long enough to run that it will eventually catch a time that is >
80ms (my guess is yes)... I'm not 100% sure.

We opened a case with TAC to find out what that command is actually doing,
but they won't divulge the info that our network team needs.

My theory is that it's actually calling some shell script in Red Hat under
the CLI appliance layer. Has anyone investigated that? Do we know what this
command is actually doing? Specifically, I want to know where it's getting
those ping times... is it running a generic ping with generic datagram
data? Is it sending a 1497 packet of 0x and then 0x? Basically, I'm
trying to give the network team something to go on because they are saying
it's not them. (Of course they could run a packet capture and tell me
(mostly) what it's doing, but it's hard to get their attention when they
don't think it's on their end.)

Thanks,
Nick

P.S.  We have frequent DB replication issues... at least a few times per 
quarter. This is so annoying and I'm pretty sure it's due to this latency, but 
I can't get anyone to pay attention.
___
cisco-voip mailing list
cisco-voip@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip


Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Wes Sisk (wsisk) via cisco-voip
Nick,

The features you describe are propagated by SDL signaling and also depend
on database replication.

At casual observation it sounds like database traffic between nodes may not
be prioritized and may be delayed or dropped.

The 80 msec limit is especially important for near-real-time convergence of
the distributed processes. Concurrently, database replication plays a
critical role, as every process reads its local database.

Very casually:
node1: "Hey node2, RouteList5 changed"
node2: "Okay, let me read the changes from my local database"
node2: "I don't see any changes...."

In the meantime, database replication is held up in the network....

-Wes
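
The exchange above can be sketched as a toy model in Python. This is only an
illustration of the ordering problem, not how CUCM is implemented: the Node
class, the simulate() function, and the delay values are all invented for the
example. It assumes the change notification and the replicated row travel
independently and that node2 reads only its local copy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node that only ever reads its own local database copy."""
    name: str
    local_db: dict = field(default_factory=dict)

    def read(self, key):
        return self.local_db.get(key)


def simulate(signal_delay_ms: float, replication_delay_ms: float) -> str:
    """Return what node2 reads at the moment the change notification arrives."""
    node1 = Node("node1", {"RouteList5": "old order"})
    node2 = Node("node2", {"RouteList5": "old order"})

    # The admin change lands on node1 at t = 0.
    node1.local_db["RouteList5"] = "new order"

    # Two independent deliveries to node2 (hypothetical model): the change
    # notification and the replicated row. Process them in arrival order.
    events = sorted([
        (signal_delay_ms, "notify"),
        (replication_delay_ms, "replicate"),
    ])
    seen = None
    for _, kind in events:
        if kind == "replicate":
            node2.local_db["RouteList5"] = node1.local_db["RouteList5"]
        else:
            seen = node2.read("RouteList5")
    return seen


if __name__ == "__main__":
    print(simulate(signal_delay_ms=20, replication_delay_ms=10))
    print(simulate(signal_delay_ms=20, replication_delay_ms=150))
```

When replication arrives first, node2 reads the new value; when replication
is held up longer than the notification, node2 acts on the stale row, which
matches the "change reverts" symptom described in the thread.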


On Nov 6, 2018, at 3:31 PM, Nick Barnett 
mailto:nicksbarn...@gmail.com>> wrote:

We think it is happening frequently WITHOUT this command being ran. Weird stuff 
happens... like deleting a speed dial and it never goes away... or changing the 
distribution order on a route list that auotmatically reverts back after a few 
seconds... or maybe the GUI shows it never reverted back however it is clearly 
not performing the correct algo. I can duplicate the RTT issue by raising the 
packet size to 1200 and doing a repeat 100 packets. it WILL give me times over 
80ms. BUT, the SDL traffic is supposed to be QOS in a certain way and I'm sure 
that the pings I'm doing are NOT being classified and queued properly. It is 
very frustrating that I know what I'm talking (enough to discuss with them, but 
it has been 7 years since I was 100% router jockey) about and can't get them to 
pay attention to a probable network issue.

I have an IP SLA running that shows average latency in the 20ms range. IP SLA 
is a fake red herring if you ask me... it only looks at an AVERAGE every 5 
minutes and if there are no issues, of course it will look great.

Thanks,
Nick

On Tue, Nov 6, 2018 at 12:42 PM Ryan Huff 
mailto:ryanh...@outlook.com>> wrote:
You are able to correlate the out-of-band RTT to only when the dbreplication 
stat command is ran, or are there other times the RTT is OOB that isn't related 
to querying the replication status?

Thanks,

-R

From: cisco-voip 
mailto:cisco-voip-boun...@puck.nether.net>> 
on behalf of Nick Barnett 
mailto:nicksbarn...@gmail.com>>
Sent: Tuesday, November 6, 2018 11:57 AM
To: Cisco VoIP Group
Subject: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

We all know the max latency is 80ms, but ours occasionally goes over. I'm 
trying to track down why but the network team cannot find an issue. We are able 
to reproduce the issue repeatedly by running "utils dbreplication 
runtimestate." Whether this is causing the issue (I doubt it) or that command 
just takes long enough to run that it will eventually find a time that is > 
80ms (my guess Is yes)... I'm not 100% sure.

We opened a case with TAC to find out what that command is actually doing, but 
they won't divulge the info that our network team needs.

My theory is that it's actually calling some shell script in redhat under the 
CLI appliance layer. Has anyone investigated that? Do we know what this command 
is actually doing? Specifically, i want to know where it's getting those ping 
times... is it running a generic ping with generic datagram data? Is it sending 
a 1497 packet of 0x and then 0x? Basically, I'm trying to give the 
network team something to go on because they are saying it's not them. (Of 
course they could run a packet capture and tell me (mostly) what it's doing, 
but it's hard to get their attention when they don't think it's on their end).

Thanks,
Nick

P.S.  We have frequent DB replication issues... at least a few times per 
quarter. This is so annoying and I'm pretty sure it's due to this latency, but 
I can't get anyone to pay attention.
___
cisco-voip mailing list
cisco-voip@puck.nether.net<mailto:cisco-voip@puck.nether.net>
https://puck.nether.net/mailman/listinfo/cisco-voip

___
cisco-voip mailing list
cisco-voip@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip


Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Nick Barnett
We think it is happening frequently WITHOUT this command being run. Weird
stuff happens... like deleting a speed dial and it never goes away... or
changing the distribution order on a route list and having it automatically
revert after a few seconds... or maybe the GUI shows it never reverted,
yet the route list is clearly not performing the correct algorithm. I can
duplicate the RTT issue by raising the packet size to 1200 and sending a
repeat of 100 packets; it WILL give me times over 80ms. BUT the SDL traffic
is supposed to get QoS treatment, and I'm sure that the pings I'm doing are
NOT being classified and queued the same way. It is very frustrating that I
know what I'm talking about (enough to discuss it with them, though it has
been 7 years since I was 100% router jockey) and still can't get them to
pay attention to a probable network issue.
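
One way to turn that ping test into something the network team can act on is
to check the summary line automatically. The sketch below assumes the pings
are run from a Cisco IOS device, whose summary line ends in
"round-trip min/avg/max = a/b/c ms"; the function names and the sample line
are made up for illustration.

```python
import re

# Matches the tail of an IOS ping summary, e.g.
# "Success rate is 100 percent (100/100), round-trip min/avg/max = 18/24/112 ms"
SUMMARY = re.compile(r"round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms")


def parse_ping_summary(line: str):
    """Return {'min', 'avg', 'max'} in ms, or None if the line doesn't match."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    lo, avg, hi = (int(group) for group in match.groups())
    return {"min": lo, "avg": avg, "max": hi}


def exceeds_budget(line: str, budget_ms: int = 80) -> bool:
    """True when the worst round trip in the run is over the budget."""
    stats = parse_ping_summary(line)
    return stats is not None and stats["max"] > budget_ms


if __name__ == "__main__":
    # Invented sample output, not taken from a real device.
    sample = ("Success rate is 100 percent (100/100), "
              "round-trip min/avg/max = 18/24/112 ms")
    print(parse_ping_summary(sample), exceeds_budget(sample))
```

Note that the check is on the maximum, not the average: a run can average
well under 80 ms and still contain the outliers that matter here.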

I have an IP SLA running that shows average latency in the 20ms range. IP
SLA is a red herring if you ask me... it only looks at an AVERAGE every 5
minutes, so brief spikes disappear into the average and of course it looks
great.

Thanks,
Nick
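
The averaging point is easy to show with numbers. In this Python sketch the
sample values are invented (one probe per second for five minutes, with five
slow probes); summarize() is a hypothetical helper, not part of IP SLA.

```python
from statistics import mean


def summarize(samples_ms, budget_ms=80):
    """Report the average alongside the figures an average hides."""
    return {
        "avg_ms": round(mean(samples_ms), 1),
        "max_ms": max(samples_ms),
        "over_budget": sum(1 for s in samples_ms if s > budget_ms),
    }


if __name__ == "__main__":
    # Invented data: 295 probes at 20 ms and 5 probes at 150 ms.
    window = [20] * 295 + [150] * 5
    print(summarize(window))
```

With this data the five-minute average stays in the low 20s even though five
probes were nearly double the 80 ms limit, so reporting the maximum or the
count over budget per interval is more useful than the mean.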



Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Nick Barnett
Yes, I agree, this is a super common "discussion" between app and network
teams... I'm a converted network engineer (like I bet many people are these
days)... so I know all the tricks to push it back on the app :)

On Tue, Nov 6, 2018 at 12:25 PM Wes Sisk (wsisk)  wrote:

> Nick,
>
> The command is invoking database commands that Cisco does not own. They
> are not being obtuse; they genuinely do not know.
>
> It will cause a spike in database communication between nodes.
>
> My first guess is very much in line with yours: the burst in traffic
> exceeds what certain QoS queues can hold.
>
> IMHO - and I emphasize the MY in that - this is a rather classic
> discussion point between application teams and network teams.
>
> What Matt suggests in a subsequent response is the rather data-intensive
> way of getting that information. Fortunately, Wireshark has graphs for
> round trip time.
>
> -Wes


Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Nick Barnett
Not a bad idea, but they have so many more tools to do this. I'll keep
this in mind though. Thanks.

On Tue, Nov 6, 2018 at 11:06 AM Matt Jacobson wrote:

> You could use the CLI packet capture with some filters to maximize the
> capture window, run the dbreplication command once or twice, and then stop
> the capture. Pop open RTMT, download the capture(s), and then see what you
> find in Wireshark.


Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Ryan Huff
Are you able to correlate the out-of-bounds RTT only to when the
dbreplication stat command is run, or are there other times the RTT is OOB
that aren't related to querying the replication status?


Thanks,

-R



Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Wes Sisk (wsisk) via cisco-voip
Nick,

The command is invoking database commands that Cisco does not own. They are not 
being obtuse; they genuinely do not know.

It will cause a spike in database communication between nodes.

My first guess is very much in line with yours: the burst in traffic
exceeds what certain QoS queues can hold.

IMHO - and I emphasize the MY in that - this is a rather classic discussion
point between application teams and network teams.

What Matt suggests in a subsequent response is the rather data-intensive
way of getting that information. Fortunately, Wireshark has graphs for round
trip time.

-Wes
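
To put rough numbers on the "burst exceeds the queue" guess, here is a
back-of-the-envelope Python sketch. It models a single FIFO queue that
drains at a fixed rate and receives the whole burst at once; the burst size,
packet size, class bandwidth, and queue limit are hypothetical values, not
figures from this cluster or from any Cisco documentation.

```python
def burst_queueing(n_packets, packet_bytes, drain_bps, queue_limit_packets):
    """Worst-case delay (ms) and tail drops for a burst into one FIFO queue.

    Simplified model: the burst arrives instantly, the queue is empty
    beforehand, and packets beyond the queue limit are tail-dropped.
    """
    accepted = min(n_packets, queue_limit_packets)
    dropped = n_packets - accepted
    # The last accepted packet waits for every bit ahead of it, plus itself.
    worst_delay_ms = accepted * packet_bytes * 8 * 1000 / drain_bps
    return worst_delay_ms, dropped


if __name__ == "__main__":
    # Hypothetical: 100 x 1500-byte packets into a 10 Mbit/s class.
    print(burst_queueing(100, 1500, 10_000_000, queue_limit_packets=200))
    print(burst_queueing(100, 1500, 10_000_000, queue_limit_packets=64))
```

In this model a deep queue delays the tail of the burst by 120 ms with no
loss, while a 64-packet queue keeps delay under 80 ms but drops 36 packets.
Either outcome would be visible to the application and invisible in a
five-minute latency average.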



Re: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Matt Jacobson
You could use the CLI packet capture with some filters to maximize the
capture window, run the dbreplication command once or twice, and then stop
the capture. Pop open RTMT, download the capture(s), and then see what you
find in Wireshark.
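
If the capture is exported to a list of segments, round-trip time can also
be estimated outside Wireshark by pairing each outbound data segment with
the first inbound ACK that covers it. The Python sketch below is a
simplified illustration: the Segment fields and the sample data are
invented, it handles a single TCP flow only, and it ignores sequence-number
wraparound and selective ACKs.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    time_s: float    # capture timestamp in seconds
    outbound: bool   # True if sent by the node where the capture was taken
    seq: int         # first sequence number carried (outbound data)
    length: int      # payload bytes (0 for a pure ACK)
    ack: int         # acknowledgment number (meaningful for inbound)


def rtt_samples_ms(segments: Iterable[Segment]) -> List[float]:
    """Estimate RTT per data segment from one side of a single TCP flow."""
    pending = {}  # expected ACK number -> time the data was first sent
    samples = []
    for seg in segments:
        if seg.outbound and seg.length > 0:
            # Keep the first send time so a retransmission cannot
            # make the sample look shorter than it was.
            pending.setdefault(seg.seq + seg.length, seg.time_s)
        elif not seg.outbound:
            # A cumulative ACK covers every pending segment at or below it.
            for expected in sorted(k for k in pending if k <= seg.ack):
                samples.append((seg.time_s - pending.pop(expected)) * 1000)
    return samples


if __name__ == "__main__":
    # Invented capture: two segments ACKed together, then one slow round trip.
    capture = [
        Segment(0.000, True, 1, 100, 0),
        Segment(0.001, True, 101, 100, 0),
        Segment(0.025, False, 0, 0, 201),
        Segment(0.100, True, 201, 50, 0),
        Segment(0.195, False, 0, 0, 251),
    ]
    rtts = rtt_samples_ms(capture)
    print([round(r) for r in rtts], "over 80 ms:", sum(r > 80 for r in rtts))
```

Because this measures the database traffic itself rather than a test ping,
the samples reflect whatever QoS treatment that traffic actually receives.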



[cisco-voip] WAN Delays > 80ms for CUCM cluster?

2018-11-06 Thread Nick Barnett
We all know the max latency is 80ms, but ours occasionally goes over. I'm
trying to track down why, but the network team cannot find an issue. We are
able to reproduce the issue repeatedly by running "utils dbreplication
runtimestate". Whether this command is causing the issue (I doubt it) or it
just takes long enough to run that it will eventually catch a time that is >
80ms (my guess is yes)... I'm not 100% sure.

We opened a case with TAC to find out what that command is actually doing,
but they won't divulge the info that our network team needs.

My theory is that it's actually calling some shell script in Red Hat under
the CLI appliance layer. Has anyone investigated that? Do we know what this
command is actually doing? Specifically, I want to know where it's getting
those ping times... is it running a generic ping with generic datagram
data? Is it sending a 1497 packet of 0x and then 0x? Basically, I'm
trying to give the network team something to go on because they are saying
it's not them. (Of course they could run a packet capture and tell me
(mostly) what it's doing, but it's hard to get their attention when they
don't think it's on their end.)

Thanks,
Nick

P.S.  We have frequent DB replication issues... at least a few times per
quarter. This is so annoying and I'm pretty sure it's due to this latency,
but I can't get anyone to pay attention.