On 10/26/23 14:06, Ilya Maximets wrote: > On 10/26/23 08:04, Frode Nordahl wrote: >> On Wed, Oct 25, 2023 at 11:33 PM Ilya Maximets <[email protected]> wrote: >>> >>> On 10/25/23 11:45, Simon Horman wrote: >>>> On Sat, Oct 21, 2023 at 05:04:48PM +0200, Frode Nordahl wrote: >>>>> Many system tests currently use ping with the combination of a >>>>> low packet count (-c 3), short interval between sends (-i 0.3) >>>>> and a _deadline_ of 2 seconds (-d 2). >>>>> >>>>> This combination of options may lead to a situation where more >>>>> than count packets are sent however ping will stop when count >>>>> packets are received. This results in a failed test due to how >>>>> the result is checked, for example: >>>>> >>>>> ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING >>>>> @@ -1,2 +1,2 @@ >>>>> -3 packets transmitted, 3 received, 0% packet loss, time 0ms >>>>> +4 packets transmitted, 3 received, 25% packet loss, time 0ms >>>>> >>>>> To reiterate, in the above example there is no packet loss, but >>>>> ping stops after _receiving_ 3 packets, not bothering with >>>>> waiting for the response to the fourth packet it just sent out. >>>>> >>>>> If we look at the iputils ping manual for the -w deadline option >>>>> we can read that this is expected behavior: >>>>> >>>>>> Specify a timeout, in seconds, before ping exits regardless of >>>>>> how many packets have been sent or received. In this case ping >>>>>> does not stop after count packet are sent, it waits either for >>>>>> deadline expire or until count probes are answered or for some >>>>>> error notification from network. >>>>> >>>>> To avoid these kinds of failures in checks where a response is >>>>> expected, we replace ping -w with ping -W. >>>>> >>>>> We keep ping -w for checks where it is expected to NOT get a >>>>> response. >>>>> >>>>> Signed-off-by: Frode Nordahl <[email protected]> >>>> >>>> Thanks Frode, >>>> >>>> TIL about -w and -W. >>> >>> I learned about -W as well. :) >>> >>> Thanks, Frode, for figuring out the cause of these failures! I've seen >>> them before, but didn't dig too deep to find a cause. OVN also has them >>> from time to time. >> >> yw, we run all the system tests in an automated fashion as part of the >> openvswitch package regression testing, and we see them quite >> frequently, most likely due to the load of the CI infrastructure. >> >>> Though I'm not sure if -W is the right choice. Reading the description: >>> >>> -W timeout >>> Time to wait for a response, in seconds. The >>> option affects only timeout in absence of any >>> responses, otherwise ping waits for two RTTs. >>> Real number allowed with dot as a decimal >>> separator (regardless locale setup). 0 means >>> infinite timeout. >>> >>> And I don't really like the 'in absence of ANY responses' part of it. >>> >>> So, IIUC, if we send 3 packets, first gets replied and the other two >>> are dropped somewhere, ping will ignore the timeout and will wait >>> indefinitely. Unfortunately, OVS gives the first packet a special >>> treatment, so potential for this scenario to happen is rather high. >>> This may break CI systems, getting them stuck testing one patch. And >>> it doesn't seem like we can mix -w and -W, at least the behavior is >>> not really defined in this case. >> >> It also says "otherwise ping waits for two RTTs.", so it will not wait >> indefinitely. The documentation is a bit convoluted though so I went >> to look so that we can be sure about what it will do. >> >> On arrival of the first packet, ping will gather various information >> [0] which will be used to compute the RTT [1], which is used when >> initializing the waittime [2][3]. >> >> So it appears to me -W would cover the scenario laid out above, i.e. >> if we get one reply quickly and the rest are lost, the computed RTT >> would have a ping exit within a reasonable timeframe. Even if the >> first response comes near the timeout value, the RTT would not be more >> than 6 seconds for a -W of 3. > > Looks like I misread the docs. Thanks for digging this through! > It does indeed look like it will not wait for too long. I also > tested this with the following set of OpenFlow rules that allows > exactly one ICMP packet to go through and drops the others (the > interval should be a bit higher for this to work, because it > relies on revalidation to update the flows): > > table=0 priority=100 icmp > actions=learn(table=0,priority=110,eth_type=0x0800,nw_proto=1,NXM_OF_ETH_SRC),normal > table=0 priority=0 actions=normal > > It does not wait indefinitely indeed. However, I can't reproduce > the RTT thing. In my testing it waits for an extra interval (-i) > instead. But maybe it's because RTT is much lower than the interval. > >> >> 0: >> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping.c#L1654-L1660 >> 1: >> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L761 >> 2: >> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L599 >> 3: >> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L263-L268 >> >>> Would be really nice to use fping instead that has simple and very >>> straightforward arguments without side effects, but once again RHEL >>> doesn't package it... >>> >>> Maybe we could use '$ timeout 2 ping6 -q -c 3 -i 0.3 fc00::3' instead? >> >> That would also work, either option works for me, what would be your >> preference? > > The current -W solution seems fine for now. I'll run a few more > tests with it and then apply if it passes for me. > >> >>> Another option might be to slightly reduce the deadline, so the 4th >>> packet will not be sent. But that sounds fragile. >> >> Agreed. >> >
Thanks, Frode and Simon! Applied this one and backported down to 2.17. For the future: We might want to create a macro for running ping to avoid such big changes. But we'll first need to figure out why may tests are using different configurations. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
