Jeremy, go ahead and create an issue with all the details needed to reproduce the problem, along with the ATS version. I’ll work on it.
John [email protected]
Sent from my iPhone

> On Apr 23, 2022, at 2:32 PM, Jeremy Payne <[email protected]> wrote:
>
> re: ATS 9.1.2
>
> parent policy = consistent_hash
> strategy(nexthop) policy = consistent_hash
> num-parent-rings = 2(primary/secondary)
> num-nexthop-rings = 2(primary/secondary)
> retry-window = 300s
> failure-threshold = 10s
> parent-connection-timeout = 2s
>
> I notice that the nexthop failure count upon a network timeout/event
> never increments beyond 2.
> With a failure threshold of 10, requests that land on this 'nexthop'
> will always have to incur a 2s timeout before moving on to the nexthop
> in the list, as the threshold is never reached per nexthop's failure
> tracking.
>
> Failure counts using parent_select work as expected; thus the failing
> parent is taken out of rotation per the parent retry timer/window (300).
>
> On its face this looks like a bug.
> If so, I'll submit a github issue.
> If not, and if I'm missing something within the nexthop/strategies
> configuration (or other suggestions), then please enlighten.
> :-)
>
> next_hop
>
> [Apr 23 01:49:27.883] [ET_NET 10] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [75] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:30.308] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [76] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:32.366] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [76] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:34.481] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [77] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:37.474] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [77] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:39.596] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [78] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:41.599] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [78] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:44.439] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [79] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:47.434] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [79] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:49.633] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [80] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:52.639] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [80] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:55.474] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [81] Parent fail count increased to 2 for 192.168.72.208
> [Apr 23 01:49:58.468] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139 (markNextHop)> (next_hop) [81] Parent fail count increased to 2 for 192.168.72.208
>
> parent_select
>
> [Apr 23 02:11:12.507] [ET_NET 15] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 2 for 192.168.72.208:80
> [Apr 23 02:11:14.595] [ET_NET 16] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 3 for 192.168.72.208:80
> [Apr 23 02:11:17.589] [ET_NET 17] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 4 for 192.168.72.208:80
> [Apr 23 02:11:20.435] [ET_NET 18] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 5 for 192.168.72.208:80
> [Apr 23 02:11:22.602] [ET_NET 19] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 6 for 192.168.72.208:80
> [Apr 23 02:11:25.587] [ET_NET 0] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 7 for 192.168.72.208:80
> [Apr 23 02:11:28.353] [ET_NET 1] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 8 for 192.168.72.208:80
> [Apr 23 02:11:30.795] [ET_NET 2] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 9 for 192.168.72.208:80
> [Apr 23 02:11:33.758] [ET_NET 3] DEBUG: <ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select) Parent fail count increased to 10 for 192.168.72.208:80
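
For the issue write-up, the contrast between the two excerpts above could be summarized mechanically. The following is only a minimal sketch, assuming the debug lines have exactly the shape shown in the quoted logs ("Parent fail count increased to N for HOST" preceded by a "(next_hop)" or "(parent_select)" tag). The helper names max_fail_counts and reached_threshold are hypothetical and are not part of ATS.

```python
import re
from collections import defaultdict

# Matches the message part of the debug lines quoted above. The "[75]"-style
# id appears only in the next_hop lines, so it is optional here.
ENTRY = re.compile(
    r"\((next_hop|parent_select)\)\s*(?:\[\d+\]\s*)?"
    r"Parent fail count increased to (\d+) for ([\d.]+(?::\d+)?)"
)


def max_fail_counts(log_text):
    """Return {(debug_tag, host): highest fail count seen} for pasted logs."""
    # Strip mail quote markers ("> ") at the start of each line.
    unquoted = re.sub(r"(?m)^(?:>[ \t]?)+", "", log_text)
    # Undo mail line wrapping so each entry can be matched as one string.
    flat = re.sub(r"\s+", " ", unquoted)
    peaks = defaultdict(int)
    for tag, count, host in ENTRY.findall(flat):
        peaks[(tag, host)] = max(peaks[(tag, host)], int(count))
    return dict(peaks)


def reached_threshold(peaks, threshold):
    """Report, per (debug_tag, host), whether the logged count hit threshold."""
    return {key: peak >= threshold for key, peak in peaks.items()}
```

Feeding it the quoted excerpts with a threshold of 10 (the value discussed above) would show whether each selection path ever logged a count at the threshold. It only reads what the debug tags print and says nothing about why the counter behaves that way.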
