On 12/6/21 17:54, Ilya Maximets wrote: > On 12/3/21 17:47, Van Haaren, Harry wrote: >>> -----Original Message----- >>> From: David Marchand <[email protected]> >>> Sent: Thursday, December 2, 2021 3:18 PM >>> To: Van Haaren, Harry <[email protected]> >>> Cc: Amber, Kumar <[email protected]>; [email protected]; >>> [email protected]; [email protected]; [email protected] >>> Subject: Re: [ovs-dev] [PATCH v3 2/4] system-dpdk: Use dummy-pmd port for >>> packet injection. >>> >>> On Thu, Dec 2, 2021 at 2:56 PM Van Haaren, Harry >>> <[email protected]> wrote: >>>> >>>>> -----Original Message----- >>>>> From: dev <[email protected]> On Behalf Of David >>> Marchand >>>>> Sent: Thursday, December 2, 2021 12:21 PM >>>>> To: Amber, Kumar <[email protected]> >>>>> Cc: [email protected]; [email protected]; [email protected]; >>>>> [email protected] >>>>> Subject: Re: [ovs-dev] [PATCH v3 2/4] system-dpdk: Use dummy-pmd port >>> for >>>>> packet injection. >>>>> >>>>> On Wed, Dec 1, 2021 at 3:52 PM Amber, Kumar <[email protected]> >>>>> wrote: >>>>>>> diff --git a/tests/genpkts.py b/tests/genpkts.py new file mode 100755 >>> index >>>>>>> 0000000000..f64f786ccb >>>>>>> --- /dev/null >>>>>>> +++ b/tests/genpkts.py >>>>>>> @@ -0,0 +1,56 @@ >>>>>>> +#!/usr/bin/env python3 >>>>>>> + >>>>>>> +import sys >>>>>>> + >>>>>>> +from scapy.all import RandMAC, RandIP, RandIP6, RandShort, fuzz from >>>>>>> +scapy.all import IPv6, Dot1Q, IP, Ether, UDP, TCP >>>>>>> + >>>>>>> +if len(sys.argv) < 2: >>>>>>> + print('usage: {} packets_count [fuzz]'.format(sys.argv[0])) >>>>>>> + sys.exit(1) >>>>>>> + >>>>>>> +tmpl = [] >>>>>>> + >>>>>>> +if len(sys.argv) == 2: >>>>>>> + eth = Ether(dst='ff:ff:ff:ff:ff:ff') >>>>>>> + vlan = eth / Dot1Q(vlan=1) >>>>>>> + p = eth / IP() / TCP(sport=20, dport=80, flags='SA', window=8192) >>>>>>> + tmpl += [p.build().hex()] >>>>>>> + p = eth / IP() / UDP(sport=53, dport=53) >>>>>>> + tmpl += [p.build().hex()] >>>>>>> + p = eth / IP() / TCP(sport=20, dport=80, flags='S', window=8192) >>>>>>> + tmpl += [p.build().hex()] >>>>>>> + p = eth / IP() / UDP(sport=53, dport=53) >>>>>>> + tmpl += [p.build().hex()] >>>>>>> + p = vlan / IP() / UDP(sport=53, dport=53) >>>>>>> + tmpl += [p.build().hex()] >>>>>>> + p = vlan / IP() / TCP(sport=20, dport=80, flags='S', window=8192) >>>>>>> + tmpl += [p.build().hex()] >>>>>> >>>>>> Hardcoding the values here is not preferable as we wanted to test the >>>>> optimized implementations >>>>>> with various values contained inside the header. >>>>> >>>>> Those hardcoded values comes from the pcap file that was previously used. >>>>> If you want to add more protocols, it is easier with this patch as you >>>>> only need to update some python script rather than rewrite a pcap >>>>> file. >>>> >>>> The PCAP was written by a script, which generated random MAC addresses; >>>> previously: eth = Ether(src=RandMAC(), dst=RandMAC()) >>> >>> We have 3 MFEX tests. >>> >>> OVS-DPDK - MFEX Autovalidator >>> OVS-DPDK - MFEX Autovalidator fuzzy >>> OVS-DPDK - MFEX Configuration >>> >>> In the first and 3rd ones, a pcap file tests/pcap/mfex_test.pcap >>> (committed to OVS sources) was used before this patch. >>> The hunk above from my patch is about replacing this "hardcoded" pcap >>> binary file with a python script that generates hardcoded packets. >> >> Apologies, my misunderstanding - I was under the impression the intent >> was to replace all of them. >> >>> For the 2nd test, if you look at the patch, you'll notice that the >>> fuzzy part is handled like before and generate random packets. >> >> Understood - gotcha. >> >>>> I do not see the value add here, and I'm concerned that when regressions in >>> these methods are pointed >>>> out that these are not acted on, but instead we are told "this hardcoded >>> method is better". It is not. >>> >>> Read what I wrote as: "hardcoded python" is better than "hardcoded >>> pcap" binary file. >> >> Yes, agree. >> >>>>>>> +on_exit "pkill -f -x -9 '$PYTHON3 $srcdir/genpkts.py -1 fuzz'" >>>>>>> +($PYTHON3 $srcdir/genpkts.py -1 fuzz | while read pkt; do >>>>>>> + ovs-appctl netdev-dummy/receive p1 "$pkt" || break >>>>>>> + done) & >>>>>>> + >>>>>>> AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator], >>> [0], >>>>> [dnl >>>>>>> Miniflow extract implementation set to autovalidator. >>>>>>> ]) >>>>>>> >>>>>>> -OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep -oP >>>>>>> 'rx_packets=\s*\K\d+'` -ge 100000]) >>>>>> >>>>>> We should increase the packet count to at-least 10x the current number to >>>>> have a proper fuzzy testing and we have measured it would only take 10~ to >>> 15~ >>>>> sec more. The current runtime did not catch issues when we purposely broke >>> the >>>>> implementation and by allowing to run for 10000 packets, it did catch the >>> induced >>>>> error. >>>>> >>>>> For me, the fuzzy testing does not have its place in a CI, because it >>>>> is not reproducible. >>>>> I let it in place and just made sure it would not reach the timeout. >>>> >>>> Disagree here, Fuzzing is *ideal* for CI, as it tests different inputs >>>> continuously, >>>> and each CI run improves the confidence in the system. A large number of >>> open >>>> source projects are actively doing large-scale fuzzing in CI instances. >>>> e.g.; https://google.github.io/clusterfuzz/ >>>> >>>> If the fuzzing autotests fails in CI, it still flags that there is *an >>>> issue*. In our >>>> case with AutoValidators for DPCLS and MFEX, it even prints a whole debug >>>> log >>>> of "good" miniflow, as well as "bad" results of the optimized >>>> implementation. >>>> These VLOG_ERR results would be hugely helpful in identifying & debugging. >>>> >>>> I do not understand the motivation for disabling/limiting fuzzing in CI. >>> >>> Fuzzing is sure a cool thing when run on an identified version to stress it. >>> >>> But those unit tests will be used for gating patches sent on the mailing >>> list. >>> Having non reproducible input means the test may flag a patch as >>> failing because of some previously merged change that did introduce >>> the regression. > > I agree that fuzzing is a great testing method and OVS actually uses it > via OSS-Fuzz project. But I also agree that unit/system tests should be > as stable as possible, i.e. not be flaky. Hence, fuzzing should not > really be part of them. > > Also, fuzzing is not a replacement for unit tests, because it's unreliable. > For example, I tried to generate the pcap file with mfex_fuzz script and > didn't found a single valid TCP packet in it. So, it doesn't cover even > very basic cases of valid traffic, and obviously any special cases of > valid traffic are not covered too. > > Not in this patch set, but in general, we need at least pass all the > packets generated by tests/flowgen.py through the autovalidator. > > Harry, Is that something you can work on? > >>> >>> I did not remove the fuzzing part, just made it so it runs in a >>> reasonable amount of time. > > This looks reasonable to me. See below. > >>> >>> >>> >>>> >>>>> On the system I used, this test takes 5s with 1k and timeouts with 10k. >>>>> So I guess the 10s/15s evaluation is dependent on the system. >>>>> >>>>> I prefer to stick to current value. >>>>> >>>>> >>>>> Thanks. >>>>> >>>>> -- >>>>> David Marchand >>>> >>>> Removing automated randomness from Fuzzy testing, and removing/limiting >>> fuzz-testing in CI runs >>>> are both big steps backwards in my view, please reconsider what this patch >>>> is >>> actually trying to achieve. >>> >>> This patch removes dependency on DPDK pcap to inject traffic in OVS. >>> OVS has its own mechanism and this dependency is unjustified. >>> >>> That's what this patch is about. >>> >>> What do you propose so that we can move forward? >> >> I guess this is a tradeoff, we have fuzzing quantity & validation vs time. >> >> This patch reduces the performance of the fuzzing (due to dummy-inject >> instead >> of PCAP PMD read with DPDK+PCAP dependency), and trades off the amount >> of fuzzing that will be done in the CI as a result. >> >> Given that the other MFEX fuzzing tests are still valuable for >> stress-testing using >> the DPDK PCAP PMD, I wonder is there enough value to have an option of PCAP >> or dummy-inject instead of removing the PCAP approach? >> >> For example, can we runtime-detect if OVS is DPDK-PCAP enabled? And use the >> high performance method in that case? > > I think the end goal should be to remove the DPDK dependency entirely, > so tests can be moved out of system-dpdk (patch #3), because they are
It should be "patch #4", of course. > not actual system tests (in OVS's meaning of that term). > > This move will compensate the reduced number of fuzzy runs with the > increased number of testsuite runs in general, since people who actually > has the hardware will run these tests while running generic testsuite. > I don't think that a lot of people actually runs system tests. > > For your in-house testing, Harry, I assume, you might just run mfex tests > as many times as you think you need (Assuming that packets are actually > different every time, otherwise the whole fuzzy testing concept is kind > of pointless) until the right solution for fuzzing is in place. That > should not be hard to do. > > The right solution for the fuzzing should be: implement a fuzzing target > for the OSS-Fuzz and let it do its job. All the found issues could be > added as regression tests to the generic testsuite along with other unit > tests. It's a current workflow for all fuzzing tests we have. > See tests/oss-fuzz/ and tests/fuzz-regression/ directories for details. > > Harry, Amber, can you work on that? > > Once this is done, we can remove all the fuzzing parts for autovalidator > from the current testsuite keeping only static non-random tests. > > For the current patch: Taking into account all that I said, I don't think > that reduced number of fuzzy packets should be a blocker here. > > Best regards, Ilya Maximets. > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
