On 5/23/24 13:03, Van Haaren, Harry wrote:
>> On 21 May 2024, at 16:13, Emma Finn wrote:
>>> The AVX implementation for calculating checksums was not
>>> handling carry-over addition correctly in some cases.
>>> This patch adds an additional shuffle to add 16-bit padding
>>> to the final part of the calculation to handle such cases.
>>> This commit also adds a unit test to fuzz test the actions
>>> autovalidator.
>>>
>>> Signed-off-by: Emma Finn <[email protected]>
>>> Reported-by: Eelco Chaudron <[email protected]>
> <snip>
>>> +AT_SETUP([datapath - Actions Autovalidator Fuzzy])
>>> +AT_SKIP_IF([! $PYTHON3 -c "import scapy"], [], [])
>>> +AT_SKIP_IF([! $PYTHON3 $srcdir/genpkts.py 10000 fuzzy > packets])
>>
>> Hi Emma,
>
> Hi Eelco, Ilya & Emma (& OVS community),
>
> As you know, checksums are a very "input sensitive" workload; that is kind
> of why they're useful (ideally, the hash ~uniquely represents the input set).
> Internally, computing a hash always involves carry bits and (expected/good)
> overflows; however, this patch fixes an unintended/bad overflow.
> Testing these corner cases & boundary conditions is hard; for me, the only
> way to be confident in its correctness is autovalidation and fuzzing.
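For context, the carry behaviour referred to above is the usual one's-complement fold from RFC 1071. A minimal scalar Python sketch (illustrative only - this is not the AVX512 implementation being fixed, and the function name is mine):

```python
def internet_checksum(data: bytes) -> int:
    """Scalar RFC 1071 one's-complement checksum (illustrative sketch)."""
    if len(data) % 2:           # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold the carry bits back into the low 16 bits.  A single fold can
    # itself produce a new carry, hence the loop -- this is exactly the
    # kind of carry-over handling that is easy to get wrong when the
    # computation is vectorized.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Whether a given input triggers an extra fold depends entirely on the data, which is why these corner cases are so hard to hit without fuzzing.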
Hi. Fuzzing is indeed a valid testing approach, and in the case of checksums
it may be one of the most effective. However, as already mentioned, it has a
few disadvantages. The main one is unreliability - the test doesn't reproduce
the issue every time it runs - and also time - it takes about 3 minutes to
run the test.

Fuzzing tests are useful, but they should not be part of the unit testing,
especially if they are not guaranteed to reproduce the issues they are
written for. They should be run separately and continuously. Once an issue is
found through fuzzing, a new unit test for the found problem can be added to
the unit test suite, so we have a guaranteed failure in case the same issue
is re-introduced. Fuzzing can't guarantee that.

This is the approach we use with oss-fuzz. The oss-fuzz project is
continuously fuzzing parts of OVS. Once they find a bug, we fix it and add a
regression test, either for that exact case or for a class of issues, if
possible.

>
>> As Ilya explained in the v2 reply, we do not like to have fuzzy tests in
>> the test suite as they are hard to reproduce.
>> (link:
>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#3313143)
>
> I understand that any randomness in CI is not desired, and understand that
> random tests can be hard to reproduce. The autovalidator tries to mitigate
> the "reproduce" issue by dumping the packet contents on failure, providing
> the erroring case to easily allow reproduction.
> This test *should* be 100% reliably passing, so no false positives are
> expected. If not, we should fix the test to be 100% reliable.

This test can't produce false failures, I agree. However, it does produce
false successes fairly regularly. For example, for me it doesn't fail the
IPv4 check in about 2 out of 5 runs. Making it more reliably reproduce the
issue would mean increasing the number of samples for an already very long
test.
>
>> So my request is to send out a v4 with a test modeled around the 'userspace
>> offload - ip csum offload' test case we have in tests/dpif-netdev.at, as
>> per Ilya’s request.
>
> I'll suggest 3 paths forward, and give my pro/cons list for each; whatever
> option is preferred by the OVS community we can send as a v4:
>
> 1) Accept the "SIMD code fix" part of the patch only (no unit test; this
>    just fixes the issue, with local fuzz-testing during development).
> 2) Build a "specific-to-this-issue unit test" only (regression tests
>    against this specific issue, but nothing else; low value in my opinion,
>    as it leaves a large input space untested).

A large input space will remain untested anyway. A fuzzy test only covers a
tiny fraction of it at a time.

There is also an advantage to the specific regression test - we can check
the expected output. In this case, we can capture the resulting packet and
check that the checksum is not only the same between the action
implementations, but also that it is correct. This adds value to the test
and makes it useful even if some implementations are not available on a
testing system. We can take the string representation of the packet,
replace the IP, re-generate the hex and compare with what we got on the
other side in the pcap.

> 3) Accept the "v3 patch as is, with fuzzing" (best test coverage and
>    confidence in current & future code changes; CI runtime is spent on
>    fuzz tests, and if failures occur, they must be flagged & fixed, but
>    they are *actual real issues* that should be fixed).
>
> I think 2) is what is being asked above, but wanted to make sure the
> tradeoffs are understood. As above, please indicate which is the preferred
> approach and a v4 will be on the way.
>

Option 4: Build a specific unit test that will test for regression of this
particular issue. Send the patch out. Then move all the existing fuzzing
tests into a separate test target, e.g. 'make check-fuzz-autovalidator'
(maybe there is a better name), and add the fuzzy actions test there.
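As a rough illustration of that "checksum is actually correct" check: given
the bare hex of a captured frame, the IPv4 header checksum can be recomputed
independently and compared with the stored value. The helper name and the
assumption of an untagged Ethernet/IPv4 frame are mine, not from the patch:

```python
def verify_ipv4_checksum(pkt_hex: str) -> bool:
    """Return True if the IPv4 header checksum in a captured frame is
    correct.  Assumes an untagged Ethernet frame given as bare hex
    (illustrative sketch, not code from the patch)."""
    frame = bytes.fromhex(pkt_hex)
    assert frame[12:14] == b"\x08\x00", "not an IPv4 frame"
    ihl = (frame[14] & 0x0F) * 4            # IPv4 header length in bytes
    header = frame[14:14 + ihl]
    # Zero out the checksum field (header bytes 10-11) and recompute
    # the RFC 1071 one's-complement sum over the header.
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    total = 0
    for i in range(0, len(zeroed), 2):
        total += (zeroed[i] << 8) | zeroed[i + 1]
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    expected = ~total & 0xFFFF
    stored = (header[10] << 8) | header[11]
    return stored == expected
```

With a check along these lines, the regression test stays meaningful even
when only the scalar action implementation is available on the testing
system, because correctness is verified against an independent computation
rather than against another implementation.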
Make some CI (realistically, Intel CI, because other CI systems do not have
AVX512) run it periodically or semi-continuously and report failures to the
mailing list, so fixes can be developed and regression tests added to the
main unit test suite.

What do you think?

FWIW, I collected a few samples that can be used for the regression test:

eth_src=8a:bf:7e:2f:05:84,eth_dst=0a:8f:39:4f:e0:73,dl_type=0x0800,
nw_src=9.77.254.180,nw_dst=216.53.194.20,nw_proto=6,nw_ttl=64,nw_frag=no,
tp_src=14303,tp_dst=1044,tcp_flags=ack

eth_src=8a:bf:7e:2f:05:84,eth_dst=0a:8f:39:4f:e0:73,dl_type=0x0800,
nw_src=108.72.150.185,nw_dst=247.95.158.114,nw_proto=6,nw_ttl=64,nw_frag=no,
tp_src=32598,tp_dst=7421,tcp_flags=ack

eth_src=8a:bf:7e:2f:05:84,eth_dst=0a:8f:39:4f:e0:73,dl_type=0x0800,
nw_src=106.187.156.70,nw_dst=117.142.150.202,nw_proto=6,nw_ttl=64,nw_frag=no,
tp_src=10122,tp_dst=1462,tcp_flags=ack

eth_src=8a:bf:7e:2f:05:84,eth_dst=0a:8f:39:4f:e0:73,dl_type=0x86dd,
ipv6_src=2f8a:2076:3926:9e7:2d47:4bc9:9c7:17f3,
ipv6_dst=7287:10dd:2fb9:41d5:3eb2:2c7a:11b0:6258,
ipv6_label=0x51ac,nw_proto=6,nw_ttl=142,nw_frag=no,
tp_src=20405,tp_dst=20662,tcp_flags=ack

eth_src=8a:bf:7e:2f:05:84,eth_dst=0a:8f:39:4f:e0:73,dl_type=0x86dd,
ipv6_src=630c:659b:2561:38e1:6989:4f2b:3386:51f6,
ipv6_dst=658e:6f2:29fc:35b:5fdf:6558:119:40bc,
ipv6_label=0x721b,nw_proto=6,nw_ttl=20,nw_frag=no,
tp_src=2632,tp_dst=15427,tcp_flags=ack

eth_src=8a:bf:7e:2f:05:84,eth_dst=0a:8f:39:4f:e0:73,dl_type=0x86dd,
ipv6_src=16bf:5603:6d40:1620:2a8e:6a83:4348:6136,
ipv6_dst=5003:74fd:47cd:6b65:353c:7fc1:5ac5:7d08,
ipv6_label=0x7661,nw_proto=6,nw_ttl=214,nw_frag=no,
tp_src=21024,tp_dst=23624,tcp_flags=ack

Feeding them into 'ovs-ofctl compose-packet --bare' produces valid packets
that trigger the issue.

Best regards,
Ilya Maximets.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
