> On Sep 18, 2019, at 8:11 AM, Dave Barach via Lists.Fd.Io 
> <[email protected]> wrote:
> 
> Sounds like one of the threads hasn't finished rebuilding its graph replica 
> after you crank up the ipsec tunnel. That operation occurs N-way parallel, 
> with each thread rebuilding its own replica.
> 
> Take a look at [aka ASSERT] *vlib_worker_threads->node_reforks_required == 0 
> at the moment when you enqueue packets from the brand-new IPSEC tunnel.

It seems to be OK. I also printed this value from the debug code below, and it 
is zero when the issue occurs:

2019-09-19 09:09:57.269310: 1: vlib_next_frame_change_ownership:293: 
vlib_next_frame_change_ownership: in ip4-arp node forks required 0
2019-09-19 09:09:57.269380: 1: vlib_next_frame_change_ownership:298: 
vlib_next_frame_change_ownership: in ip4-arp sending to error-drop: vec_len 
(node->next_nodes) 3 != 2 node_runtime->n_next_nodes)
2019-09-19 09:09:57.269403: 1: vlib_next_frame_change_ownership:301: 
vlib_next_frame_change_ownership: next 0: 595 error-drop
2019-09-19 09:09:57.269422: 1: vlib_next_frame_change_ownership:301: 
vlib_next_frame_change_ownership: next 1: 611 UnknownEthernet1-output
2019-09-19 09:09:57.269439: 1: vlib_next_frame_change_ownership:301: 
vlib_next_frame_change_ownership: next 2: 0 null-node

(The first log line comes from adding the following code to the code posted 
earlier, quoted below.)

      clib_warning ("%s: in %s node forks required %u",
                    __FUNCTION__, node->name,
                    *vlib_worker_threads->node_reforks_required);
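
For anyone following along, the mismatch the assert reports can be modeled in
isolation. The sketch below is a standalone toy, not VPP code: toy_node_t,
toy_node_runtime_t and the toy_* functions are hypothetical names made up for
illustration. It assumes only what the logs above show, namely that the node's
next vector has grown to 3 entries while the runtime still records 2. It does
not claim to explain why the runtime is stale.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; these are NOT vlib types. */
typedef struct
{
  const char *name;
  unsigned next_nodes[8];	/* node's view: indices of its next nodes */
  unsigned n_next;		/* number of valid entries in next_nodes */
} toy_node_t;

typedef struct
{
  unsigned n_next_nodes;	/* count captured when the runtime was built */
} toy_node_runtime_t;

/* Add a next node: only the node grows, the runtime is not touched. */
static void
toy_add_next (toy_node_t * n, unsigned next_index)
{
  assert (n->n_next < 8);
  n->next_nodes[n->n_next++] = next_index;
}

/* Rebuild the runtime from the node, bringing the two back in sync. */
static void
toy_rebuild_runtime (const toy_node_t * n, toy_node_runtime_t * rt)
{
  rt->n_next_nodes = n->n_next;
}

/* Return 1 if node and runtime agree, 0 (with a warning) otherwise. */
static int
toy_runtime_in_sync (const toy_node_t * n, const toy_node_runtime_t * rt)
{
  if (n->n_next != rt->n_next_nodes)
    {
      fprintf (stderr, "in %s: node has %u next nodes, runtime has %u\n",
	       n->name, n->n_next, rt->n_next_nodes);
      return 0;
    }
  return 1;
}
```

Starting from a node with two next nodes and a freshly built runtime, calling
toy_add_next() once makes toy_runtime_in_sync() return 0 until
toy_rebuild_runtime() is called again. That window is the shape of what the
logs show for ip4-arp.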

Thanks,
Chris.

> 
> Dave
> 
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Christian Hopps
> Sent: Tuesday, September 17, 2019 10:31 AM
> To: vpp-dev <[email protected]>
> Cc: [email protected]
> Subject: [vpp-dev] Assert verifying node graph construction around ip4-arp.
> 
> 
> Hi vpp-dev,
> 
> I'm hitting an assert in vlib_next_frame_change_ownership (vlib/main.c)
> 
>  ASSERT (vec_len (node->next_nodes) == node_runtime->n_next_nodes);
> 
> I added some code to see what was going on
> 
>  #ifdef CLIB_ASSERT_ENABLE
>    if (vec_len (node->next_nodes) != node_runtime->n_next_nodes)
>      {
>        clib_warning ("%s: in %s: vec_len (node->next_nodes) %u != %u "
>                      "node_runtime->n_next_nodes)",
>                      __FUNCTION__, node->name, vec_len (node->next_nodes),
>                      node_runtime->n_next_nodes);
>        for (int i = 0; i < vec_len (node->next_nodes); i++)
>          clib_warning ("%s: next %u: %u %s", __FUNCTION__, i,
>                        node->next_nodes[i],
>                        vlib_get_node (vm, node->next_nodes[i])->name);
>      }
>  #endif
>    ASSERT (vec_len (node->next_nodes) == node_runtime->n_next_nodes);
> 
> And this was the output:
> 
>  2019-09-17 12:21:18.375523: 1: vlib_next_frame_change_ownership:293: 
> vlib_next_frame_change_ownership: in ip4-arp: vec_len (node->next_nodes) 3 != 
> 2 node_runtime->n_next_nodes)
>  2019-09-17 12:21:18.375600: 1: vlib_next_frame_change_ownership:296: 
> vlib_next_frame_change_ownership: next 0: 595 error-drop
>  2019-09-17 12:21:18.375630: 1: vlib_next_frame_change_ownership:296: 
> vlib_next_frame_change_ownership: next 1: 611 UnknownEthernet1-output
>  2019-09-17 12:21:18.375654: 1: vlib_next_frame_change_ownership:296: 
> vlib_next_frame_change_ownership: next 2: 0 null-node
>  2019-09-17 12:21:18.375678: 1: 
> /var/build/mb-build/openwrt-dd/build_dir/target-aarch64_cortex-a53+neon-vfpv4_glibc-2.22/vpp-19.04.2/src/vlib/main.c:300
>  (vlib_next_frame_change_ownership) assertion `vec_len (node->next_nodes) == 
> node_runtime->n_next_nodes' fails
> 
> The use case is that I've got an ipsec tunnel that's *just* been brought up 
> (i.e., admin up). I'm immediately sending traffic on it in a worker thread 
> (polling output routine) to the other end of the tunnel (and thus receiving 
> from the other end which is doing the same thing). Depending on the order the 
> tunnel endpoint interfaces are brought up I may or may not hit the above, but 
> it happens most of the time on at least one endpoint.
> 
> In any case, is this hitting some sort of race condition with node/graph 
> construction? I ask because I would not expect a node's next array to be 
> larger than the node's runtime count of the same, yet that seems to be the 
> case for some short period of time.
> 
> Thanks,
> Chris.


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#14019): https://lists.fd.io/g/vpp-dev/message/14019
Mute This Topic: https://lists.fd.io/mt/34176568/21656
Group Owner: [email protected]
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-
