Jeremy, go ahead and create an issue with all the details needed to reproduce
the problem, along with the ATS version. I’ll work on it.

John 
[email protected]



> On Apr 23, 2022, at 2:32 PM, Jeremy Payne <[email protected]> wrote:
> 
> re: ATS 9.1.2
> 
> parent policy = consistent_hash
> strategy(nexthop) policy = consistent_hash
> num-parent-rings = 2 (primary/secondary)
> num-nexthop-rings = 2 (primary/secondary)
> retry-window = 300s
> failure-threshold = 10
> parent-connection-timeout = 2s
> 
> I notice that the nexthop failure count upon a network timeout/event
> never increments beyond 2.
> With a failure threshold of 10, requests that land on this 'nexthop'
> will always incur a 2s timeout before moving on to the next nexthop in
> the list, because the threshold is never reached in that nexthop's
> failure tracking.
> 
> Failure counts using parent_select work as expected, so the failing
> parent is taken out of rotation for the parent retry window (300s).
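
To illustrate the consequence described above, here is a minimal Python
sketch of threshold-based mark-down. It is not ATS code: HostState,
simulate, and count_cap are hypothetical names, the 2.5s request spacing is
an assumption, and the cap only mimics the reported symptom (a counter that
never exceeds 2) rather than its cause. The three constants come from the
settings quoted above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FAILURE_THRESHOLD = 10   # failures before the host is marked down (from the report)
RETRY_WINDOW_S = 300     # seconds a marked-down host is skipped (from the report)
CONNECT_TIMEOUT_S = 2    # cost of each attempt against the dead host (from the report)


@dataclass
class HostState:
    fail_count: int = 0
    down_until: float = 0.0  # simulated time before which the host is skipped


def simulate(requests: int, count_cap: Optional[int] = None,
             spacing_s: float = 2.5) -> Tuple[int, int]:
    """Return (timeouts incurred, seconds lost) for requests hitting a dead host.

    count_cap is a hypothetical knob that caps the failure counter to mimic
    the reported symptom; it is not an ATS setting.
    """
    host = HostState()
    now = 0.0
    timeouts = 0
    for _ in range(requests):
        if now >= host.down_until:  # host is still in rotation, so it is tried
            timeouts += 1
            host.fail_count += 1
            if count_cap is not None:
                host.fail_count = min(host.fail_count, count_cap)
            if host.fail_count >= FAILURE_THRESHOLD:
                # Threshold reached: skip the host for the retry window.
                host.down_until = now + RETRY_WINDOW_S
                host.fail_count = 0
        now += spacing_s
    return timeouts, timeouts * CONNECT_TIMEOUT_S


print(simulate(50))               # (10, 20): marked down after 10 failures
print(simulate(50, count_cap=2))  # (50, 100): never marked down
```

With a working counter only the first 10 requests pay the 2s timeout; with
the counter stuck at 2, every request does.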
> 
> On its face this looks like a bug.
> If so, I'll submit a GitHub issue.
> If not, and I'm missing something in the nexthop/strategies
> configuration (or you have other suggestions), please enlighten me. :-)
> 
> 
> 
> next_hop
> 
> [Apr 23 01:49:27.883] [ET_NET 10] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [75] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:30.308] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [76] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:32.366] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [76] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:34.481] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [77] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:37.474] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [77] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:39.596] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [78] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:41.599] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [78] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:44.439] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [79] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:47.434] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [79] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:49.633] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [80] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:52.639] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [80] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:55.474] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [81] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 01:49:58.468] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [81] Parent fail count increased to 2 for
> 192.168.72.208
> 
> parent_select
> 
> [Apr 23 02:11:12.507] [ET_NET 15] DEBUG:
> <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
> Parent fail count increased to 2 for 192.168.72.208:80
> [Apr 23 02:11:14.595] [ET_NET 16] DEBUG:
> <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
> Parent fail count increased to 3 for 192.168.72.208:80
> [Apr 23 02:11:17.589] [ET_NET 17] DEBUG:
> <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
> Parent fail count increased to 4 for 192.168.72.208:80
> [Apr 23 02:11:20.435] [ET_NET 18] DEBUG:
> <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
> Parent fail count increased to 5 for 192.168.72.208:80
> [Apr 23 02:11:22.602] [ET_NET 19] DEBUG:
> <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
> Parent fail count increased to 6 for 192.168.72.208:80
> [Apr 23 02:11:25.587] [ET_NET 0] DEBUG: <ParentSelectionStrategy.cc:86
> (markParentDown)> (parent_select) Parent fail count increased to 7 for
> 192.168.72.208:80
> [Apr 23 02:11:28.353] [ET_NET 1] DEBUG: <ParentSelectionStrategy.cc:86
> (markParentDown)> (parent_select) Parent fail count increased to 8 for
> 192.168.72.208:80
> [Apr 23 02:11:30.795] [ET_NET 2] DEBUG: <ParentSelectionStrategy.cc:86
> (markParentDown)> (parent_select) Parent fail count increased to 9 for
> 192.168.72.208:80
> [Apr 23 02:11:33.758] [ET_NET 3] DEBUG: <ParentSelectionStrategy.cc:86
> (markParentDown)> (parent_select) Parent fail count increased to 10
> for 192.168.72.208:80
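
For anyone collecting details for the issue, the peak fail count per host can
be pulled out of debug output like the above with a short script. This is a
rough sketch that assumes the message format shown in these logs and that
lines may be hard-wrapped and ">"-quoted by the mail client; max_fail_counts
and FAIL_RE are illustrative names, not part of ATS.

```python
import re
from typing import Dict, Tuple

# Matches the debug tag and the "fail count increased" message seen in the logs.
FAIL_RE = re.compile(
    r"\((next_hop|parent_select)\).*?"
    r"Parent fail count increased to (\d+) for ([\d.]+(?::\d+)?)"
)


def max_fail_counts(log_text: str) -> Dict[Tuple[str, str], int]:
    """Return the highest fail count seen for each (debug tag, host) pair."""
    # Drop quote markers and rejoin lines that the mail client wrapped.
    joined = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    peaks: Dict[Tuple[str, str], int] = {}
    for tag, count, host in FAIL_RE.findall(joined):
        key = (tag, host)
        peaks[key] = max(peaks.get(key, 0), int(count))
    return peaks


SAMPLE = """\
> [Apr 23 01:49:27.883] [ET_NET 10] DEBUG: <NextHopHealthStatus.cc:139
> (markNextHop)> (next_hop) [75] Parent fail count increased to 2 for
> 192.168.72.208
> [Apr 23 02:11:33.758] [ET_NET 3] DEBUG: <ParentSelectionStrategy.cc:86
> (markParentDown)> (parent_select) Parent fail count increased to 10
> for 192.168.72.208:80
"""

for (tag, host), peak in max_fail_counts(SAMPLE).items():
    print(f"{tag:14} {host:20} peak fail count = {peak}")
```

Run against the full logs, a next_hop peak that stays at 2 next to a
parent_select peak of 10 makes the discrepancy easy to show in the issue.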
