> On Sep 18, 2019, at 8:11 AM, Dave Barach via Lists.Fd.Io
> <[email protected]> wrote:
>
> Sounds like one of the threads hasn't finished rebuilding its graph replica
> after you crank up the ipsec tunnel. That operation occurs N-way parallel,
> with each thread rebuilding its own replica.
>
> Take a look at [aka ASSERT] *vlib_worker_threads->node_reforks_required == 0
> at the moment when you enqueue packets from the brand-new IPSEC tunnel.

It seems to be OK. I also printed this value in the debug output below, and it
is zero when the issue occurs:
2019-09-19 09:09:57.269310: 1: vlib_next_frame_change_ownership:293:
vlib_next_frame_change_ownership: in ip4-arp node forks required 0
2019-09-19 09:09:57.269380: 1: vlib_next_frame_change_ownership:298:
vlib_next_frame_change_ownership: in ip4-arp sending to error-drop: vec_len
(node->next_nodes) 3 != 2 node_runtime->n_next_nodes)
2019-09-19 09:09:57.269403: 1: vlib_next_frame_change_ownership:301:
vlib_next_frame_change_ownership: next 0: 595 error-drop
2019-09-19 09:09:57.269422: 1: vlib_next_frame_change_ownership:301:
vlib_next_frame_change_ownership: next 1: 611 UnknownEthernet1-output
2019-09-19 09:09:57.269439: 1: vlib_next_frame_change_ownership:301:
vlib_next_frame_change_ownership: next 2: 0 null-node
(The first log line comes from adding the following code to the code posted
earlier, which is quoted below.)
clib_warning ("%s: in %s node forks required %u",
__FUNCTION__, node->name,
*vlib_worker_threads->node_reforks_required);
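
To make the failing condition concrete, here is a toy model of what the
assert compares. It is only a sketch: none of the toy_* names exist in vlib,
and it assumes the runtime count is a per-thread snapshot that lags the shared
node until the thread rebuilds its replica. That is the situation Dave
describes; the node_reforks_required output above does not confirm that it is
what is actually happening here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins only; these are NOT the real vlib structures. */
#define TOY_MAX_NEXT 8

typedef struct
{
  uint32_t next_nodes[TOY_MAX_NEXT];	/* next slot -> node index */
  uint32_t n_next;		/* plays the role of vec_len (node->next_nodes) */
} toy_node_t;

typedef struct
{
  uint32_t n_next_nodes;	/* plays the role of node_runtime->n_next_nodes */
} toy_node_runtime_t;

/* Add a next arc to the shared node. Returns the new slot, or -1 if full. */
int
toy_node_add_next (toy_node_t * n, uint32_t next_node_index)
{
  if (n->n_next >= TOY_MAX_NEXT)
    return -1;
  n->next_nodes[n->n_next] = next_node_index;
  return (int) n->n_next++;
}

/* A thread refreshes its snapshot of the node (the assumed "refork" step). */
void
toy_runtime_refork (toy_node_runtime_t * rt, const toy_node_t * n)
{
  rt->n_next_nodes = n->n_next;
}

/* The comparison the ASSERT makes, returned as a boolean. */
bool
toy_runtime_in_sync (const toy_node_t * n, const toy_node_runtime_t * rt)
{
  return n->n_next == rt->n_next_nodes;
}

/* Same shape as the ASSERT in vlib_next_frame_change_ownership. */
void
toy_assert_in_sync (const toy_node_t * n, const toy_node_runtime_t * rt)
{
  assert (n->n_next == rt->n_next_nodes);
}
```

Starting from a node with two next arcs (595 and 611) and a freshly
synchronized runtime, adding a third arc (0) makes toy_runtime_in_sync ()
return false, with 3 on the node side and 2 on the runtime side, until
toy_runtime_refork () is called again. That is the same 3 != 2 shape as in
the log above.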
Thanks,
Chris.
>
> Dave
>
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Christian Hopps
> Sent: Tuesday, September 17, 2019 10:31 AM
> To: vpp-dev <[email protected]>
> Cc: [email protected]
> Subject: [vpp-dev] Assert verifying node graph construction around ip4-arp.
>
>
> Hi vpp-dev,
>
> I'm hitting an assert in vlib_next_frame_change_ownership (vlib/main.c)
>
> ASSERT (vec_len (node->next_nodes) == node_runtime->n_next_nodes);
>
> I added some code to see what was going on
>
> #ifdef CLIB_ASSERT_ENABLE
> if (vec_len (node->next_nodes) != node_runtime->n_next_nodes)
> {
> clib_warning ("%s: in %s: vec_len (node->next_nodes) %u != %u "
> "node_runtime->n_next_nodes)",
> __FUNCTION__, node->name, vec_len (node->next_nodes),
> node_runtime->n_next_nodes);
> for (int i = 0; i < vec_len (node->next_nodes); i++)
> clib_warning ("%s: next %u: %u %s", __FUNCTION__, i,
> node->next_nodes[i],
> vlib_get_node (vm, node->next_nodes[i])->name);
> }
> #endif
> ASSERT (vec_len (node->next_nodes) == node_runtime->n_next_nodes);
>
> And this was the output:
>
> 2019-09-17 12:21:18.375523: 1: vlib_next_frame_change_ownership:293:
> vlib_next_frame_change_ownership: in ip4-arp: vec_len (node->next_nodes) 3 !=
> 2 node_runtime->n_next_nodes)
> 2019-09-17 12:21:18.375600: 1: vlib_next_frame_change_ownership:296:
> vlib_next_frame_change_ownership: next 0: 595 error-drop
> 2019-09-17 12:21:18.375630: 1: vlib_next_frame_change_ownership:296:
> vlib_next_frame_change_ownership: next 1: 611 UnknownEthernet1-output
> 2019-09-17 12:21:18.375654: 1: vlib_next_frame_change_ownership:296:
> vlib_next_frame_change_ownership: next 2: 0 null-node
> 2019-09-17 12:21:18.375678: 1:
> /var/build/mb-build/openwrt-dd/build_dir/target-aarch64_cortex-a53+neon-vfpv4_glibc-2.22/vpp-19.04.2/src/vlib/main.c:300
> (vlib_next_frame_change_ownership) assertion `vec_len (node->next_nodes) ==
> node_runtime->n_next_nodes' fails
>
> The use case is that I've got an ipsec tunnel that's *just* been brought up
> (i.e., admin up). I'm immediately sending traffic on it in a worker thread
> (polling output routine) to the other end of the tunnel (and thus receiving
> from the other end which is doing the same thing). Depending on the order the
> tunnel endpoint interfaces are brought up I may or may not hit the above, but
> it happens most of the time on at least one endpoint.
>
> In any case, is this hitting some sort of race condition with node/graph
> construction? I ask because I would not expect a node's next array to be
> larger than the node's runtime count of the same, except perhaps for some
> short period of time.
>
> Thanks,
> Chris.
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
>
> View/Reply Online (#14014): https://lists.fd.io/g/vpp-dev/message/14014
> Mute This Topic: https://lists.fd.io/mt/34176568/1826170
> Group Owner: [email protected]
> Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [[email protected]]
> -=-=-=-=-=-=-=-=-=-=-=-
