On Mon, Jul 12, 2021 at 10:28 AM Mark Gray <[email protected]> wrote:
>
> Statically add IPv6 neighbor MAC addresses to avoid NS messages
> evicting datapath flows causing occasional test failures.
>
> We also configure all interfaces to send only one IPv6 router
> solicitation message. These messages can cause datapath flows
> to be unexpectedly evicted causing test failures.
>
> Fixes: 7c927c0c0be1 ("ovn-northd: Fix IPv6 ECMP symmetric reply flows")
> Signed-off-by: Mark Gray <[email protected]>
> ---
>  tests/system-ovn.at | 51 +++++++++++++++++++++++++--------------------
>  1 file changed, 28 insertions(+), 23 deletions(-)
>
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 79879c6e003b..fc377bbd1a47 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -5833,17 +5833,35 @@ ovn-nbctl lr-route-add R3 fd01::/64 fd02::1
>
>  # Logical port 'alice1' in switch 'alice'.
>  ADD_NAMESPACES(alice1)
> +# Only send 1 router solicitation as any additional ones can cause
datapath
> +# flows to get evicted, causing unexpected failures below.
> +NS_CHECK_EXEC([alice1], [sysctl -w
net.ipv6.conf.default.router_solicitations=1], [0], [dnl
> +net.ipv6.conf.default.router_solicitations = 1
> +])
>  ADD_VETH(alice1, alice1, br-int, "fd01::2/64", "f0:00:00:01:02:04", \
>           "fd01::1")
>  OVS_WAIT_UNTIL([test "$(ip netns exec alice1 ip a | grep fd01::2 | grep
tentative)" = ""])
>  ovn-nbctl lsp-add alice alice1 \
>  -- lsp-set-addresses alice1 "f0:00:00:01:02:04 fd01::2"
> +# Add neighbour MAC address to avoid sending IPv6 NS messages which could
> +# cause datapath flows to be evicted
> +NS_CHECK_EXEC([alice1], [ip -6 neigh add fd01::1 lladdr
00:00:01:01:02:03 dev alice1], [0])
>
>  # Logical port 'bob1' in switch 'bob'.
>  ADD_NAMESPACES(bob1)
> +# Only send 1 router solicitation as any additional ones can cause
datapath
> +# flows to get evicted, causing unexpected failures below.
> +NS_CHECK_EXEC([bob1], [sysctl -w
net.ipv6.conf.default.router_solicitations=1], [0], [dnl
> +net.ipv6.conf.default.router_solicitations = 1
> +])
>  ADD_VETH(bob1, bob1, br-int, "fd07::1/64", "f0:00:00:01:02:06", \
>           "fd07::2")
>  OVS_WAIT_UNTIL([test "$(ip netns exec bob1 ip a | grep fd07::1 | grep
tentative)" = ""])
> +# Add neighbour MAC addresses to avoid sending IPv6 NS messages which
could
> +# cause datapath flows to be evicted
> +NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::2 lladdr 00:00:02:01:02:03
dev bob1], [0])
> +NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::3 lladdr 00:00:01:01:02:04
dev bob1], [0])
> +
>  ovn-nbctl lsp-add bob bob1 \
>  -- lsp-set-addresses bob1 "f0:00:00:01:02:06 fd07::1"
>
> @@ -5852,45 +5870,32 @@ ovn-nbctl --wait=hv sync
>
>  on_exit 'ovs-ofctl dump-flows br-int'
>
> -# Later in this test we will check for a datapath flow that matches:
> -# "ct_state(+new-est-rpl+trk).*ct(.*label=0x200000401020400000000/.*)".
Due
> -# to the way OVS generates datapath flows with wildcards, ICMPv6 NS
flows will
> -# evict this datapath flow. In order to ensure that the flow does not
> -# get evicted, we send one ping packet in order to carry out neighbor
> -# discovery. We then flush the datpath to remove the NS flows so that
the flow
> -# "ct_state(+new-est-rpl+trk).*ct(.*label=0x200000401020400000000/.*)"
will
> -# be present when we check for it.
> -NS_CHECK_EXEC([bob1], [ping -q -c 2 -i 0.3 -w 15 fd01::2 | FORMAT_PING],
\
> -[0], [dnl
> -2 packets transmitted, 2 received, 0% packet loss, time 0ms
> -])
> -ovs-appctl dpctl/del-flows
> -
>  # 'bob1' should be able to ping 'alice1' directly.
>  NS_CHECK_EXEC([bob1], [ping -q -c 20 -i 0.3 -w 15 fd01::2 |
FORMAT_PING], \
>  [0], [dnl
>  20 packets transmitted, 20 received, 0% packet loss, time 0ms
>  ])
>
> -# Ensure conntrack entry is present. We should not try to predict
> -# the tunnel key for the output port, so we strip it from the labels
> -# and just ensure that the known ethernet address is present.
> -AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \
> -sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
> -sed -e
's/labels=0x[[0-9a-f]]*00000401020400000000/labels=0x00000401020400000000/'],
[0], [dnl
>
-icmpv6,orig=(src=fd07::1,dst=fd01::2,id=<cleared>,type=128,code=0),reply=(src=fd01::2,dst=fd07::1,id=<cleared>,type=129,code=0),zone=<cleared>,labels=0x00000401020400000000
> -])
> -
>  # Ensure datapaths show conntrack states as expected
>  # Like with conntrack entries, we shouldn't try to predict
>  # port binding tunnel keys. So omit them from expected labels.
>  AT_CHECK([ovs-appctl dpctl/dump-flows | grep
'ct_state(+new-est-rpl+trk).*ct(.*label=0x200000401020400000000/.*)' -c],
[0], [dnl
>  1
>  ])
> +
>  AT_CHECK([ovs-appctl dpctl/dump-flows | grep
'ct_state(-new+est+rpl+trk).*ct_label(0x.*00000401020400000000/.*)' -c],
[0], [dnl
>  1
>  ])
>
> +# Ensure conntrack entry is present. We should not try to predict
> +# the tunnel key for the output port, so we strip it from the labels
> +# and just ensure that the known ethernet address is present.
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
> +sed -e
's/labels=0x[[0-9a-f]]*00000401020400000000/labels=0x00000401020400000000/'],
[0], [dnl
>
+icmpv6,orig=(src=fd07::1,dst=fd01::2,id=<cleared>,type=128,code=0),reply=(src=fd01::2,dst=fd07::1,id=<cleared>,type=129,code=0),zone=<cleared>,labels=0x00000401020400000000
> +])
> +
>  ovs-ofctl dump-flows br-int
>
>  OVS_APP_EXIT_AND_WAIT([ovn-controller])
> --
> 2.27.0
>

Thanks Mark for the fix! Just a minor comment regarding changing the sysctl
settings for router_solicitations. It is better to save the original value
before changing it and restore it afterwards. Maybe still a problem if the
test fails in between the settings could not be restored, but I don't have
a better idea.

This approach does fix the problem at least on my machine. However, I don't
think I fully understood the problem. I tried many times when reverting the
last NAT related northd change and the test always passes, while adding the
change it fails 100%. I understand that there are RS packets causing the
expected flows got evicted from DP. The older version passes the test
probably because the pipelines were different so that other flows got
evicted but not the ones we are checking in this test. However, what I
don't understand is that we are checking the DP flows immediately after the
20 ping packets succeed, how would the RS/NS packets always come right
after the ping tests causing the DP flows evicted? If the RS/NS packets
come at some intervals, we wouldn't be so lucky, right? Could you help
clarify my confusion?

Thanks,
Han
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to