On 10/26/23 08:04, Frode Nordahl wrote:
> On Wed, Oct 25, 2023 at 11:33 PM Ilya Maximets <[email protected]> wrote:
>>
>> On 10/25/23 11:45, Simon Horman wrote:
>>> On Sat, Oct 21, 2023 at 05:04:48PM +0200, Frode Nordahl wrote:
>>>> Many system tests currently use ping with the combination of a
>>>> low packet count (-c 3), short interval between sends (-i 0.3)
>>>> and a _deadline_ of 2 seconds (-w 2).
>>>>
>>>> This combination of options may lead to a situation where more
>>>> than count packets are sent, but ping stops as soon as count
>>>> packets are received. This results in a failed test due to how
>>>> the result is checked, for example:
>>>>
>>>> ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING
>>>> @@ -1,2 +1,2 @@
>>>> -3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>>> +4 packets transmitted, 3 received, 25% packet loss, time 0ms
>>>>
>>>> To reiterate, in the above example there is no packet loss, but
>>>> ping stops after _receiving_ 3 packets, not bothering with
>>>> waiting for the response to the fourth packet it just sent out.
>>>>
>>>> If we look at the iputils ping manual for the -w deadline option
>>>> we can read that this is expected behavior:
>>>>
>>>>> Specify a timeout, in seconds, before ping exits regardless of
>>>>> how many packets have been sent or received. In this case ping
>>>>> does not stop after count packet are sent, it waits either for
>>>>> deadline expire or until count probes are answered or for some
>>>>> error notification from network.
>>>>
>>>> To avoid these kinds of failures in checks where a response is
>>>> expected, we replace ping -w with ping -W.
>>>>
>>>> We keep ping -w for checks where it is expected to NOT get a
>>>> response.
>>>>
>>>> Signed-off-by: Frode Nordahl <[email protected]>
>>>
>>> Thanks Frode,
>>>
>>> TIL about -w and -W.
>>
>> I learned about -W as well. :)
>>
>> Thanks, Frode, for figuring out the cause of these failures!
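The interplay of -c and -w described above can be reproduced with a toy event simulation. It is a simplified sketch, not the iputils implementation: `simulate_deadline_mode()` and `format_ping()` are hypothetical helpers, each reply is modeled as a fixed delay per probe, and the summary is printed with `time 0ms` to match the normalized output in the example. Delaying only the first reply is enough to produce the 4 transmitted / 3 received result without any real loss.

```python
import heapq
from typing import List, Optional, Tuple


def simulate_deadline_mode(count: int, interval: float, deadline: float,
                           rtts: List[Optional[float]],
                           default_rtt: float = 0.001) -> Tuple[int, int]:
    """Toy model of `ping -c COUNT -i INTERVAL -w DEADLINE`.

    rtts[k] is the reply delay of probe k (None means the reply is lost);
    probes beyond the end of the list use default_rtt.
    Returns (transmitted, received).
    """
    # Heap of (time, order, kind); order 0 makes a send sort before a reply
    # scheduled for the same instant.
    events = [(0.0, 0, "send")]
    transmitted = received = 0
    while events:
        now, _, kind = heapq.heappop(events)
        if now >= deadline:
            break  # -w: exit when the deadline expires, whatever the counts
        if kind == "send":
            rtt = rtts[transmitted] if transmitted < len(rtts) else default_rtt
            transmitted += 1
            if rtt is not None:
                heapq.heappush(events, (now + rtt, 1, "reply"))
            # With -w, ping does not stop sending after COUNT probes.
            heapq.heappush(events, (now + interval, 0, "send"))
        else:
            received += 1
            if received >= count:
                break  # ...but it exits as soon as COUNT replies are in
    return transmitted, received


def format_ping(transmitted: int, received: int) -> str:
    """Summary line with the time normalized to 0ms, as in the test output."""
    loss = (transmitted - received) * 100 // transmitted
    return (f"{transmitted} packets transmitted, {received} received, "
            f"{loss}% packet loss, time 0ms")


# Probe 0's reply is delayed, so probe 3 goes out before it arrives.
print(format_ping(*simulate_deadline_mode(3, 0.3, 2, [0.92, 0.001, 0.001, 0.05])))
# prints: 4 packets transmitted, 3 received, 25% packet loss, time 0ms
```

With every reply arriving within a millisecond (`rtts=[]`), the same model stops at 3 transmitted, 3 received, which is the line the test expects.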
>> I've seen them before, but didn't dig too deep to find a cause. OVN
>> also has them from time to time.
>
> yw, we run all the system tests in an automated fashion as part of the
> openvswitch package regression testing, and we see them quite
> frequently, most likely due to the load of the CI infrastructure.
>
>> Though I'm not sure if -W is the right choice. Reading the description:
>>
>>     -W timeout
>>         Time to wait for a response, in seconds. The
>>         option affects only timeout in absence of any
>>         responses, otherwise ping waits for two RTTs.
>>         Real number allowed with dot as a decimal
>>         separator (regardless locale setup). 0 means
>>         infinite timeout.
>>
>> And I don't really like the 'in absence of ANY responses' part of it.
>>
>> So, IIUC, if we send 3 packets, the first one gets a reply and the
>> other two are dropped somewhere, ping will ignore the timeout and will
>> wait indefinitely. Unfortunately, OVS gives the first packet a special
>> treatment, so the potential for this scenario to happen is rather high.
>> This may break CI systems, getting them stuck testing one patch. And
>> it doesn't seem like we can mix -w and -W, at least the behavior is
>> not really defined in this case.
>
> It also says "otherwise ping waits for two RTTs.", so it will not wait
> indefinitely. The documentation is a bit convoluted, though, so I went
> and looked at the code to be sure about what it will do.
>
> On arrival of the first packet, ping will gather various information
> [0] which will be used to compute the RTT [1], which is used when
> initializing the waittime [2][3].
>
> So it appears to me -W would cover the scenario laid out above, i.e.
> if we get one reply quickly and the rest are lost, the computed RTT
> would have ping exit within a reasonable timeframe. Even if the
> first response comes near the timeout value, the resulting wait of two
> RTTs would not be more than 6 seconds for a -W of 3.
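The -W exit rule discussed above can be sketched as a toy timing model. Everything here is an assumption rather than iputils code: `linger_after_last_probe()` and `ping_runtime()` are hypothetical helpers, "two RTTs" is taken to mean twice the largest RTT seen, and the wait is assumed never to drop below one interval. That floor is one reading of the code linked as [3], and it would also fit the extra interval reported later in the thread; it should be checked against the iputils sources before relying on it.

```python
from typing import List, Optional


def linger_after_last_probe(w_timeout: float, interval: float,
                            rtts: List[float]) -> float:
    """Toy model of how long `ping -W W_TIMEOUT` keeps waiting after the last probe.

    rtts holds the round-trip times of the replies received so far.
    """
    if not rtts:
        # "affects only timeout in absence of any responses"
        return w_timeout
    # "otherwise ping waits for two RTTs"; ASSUMPTION: twice the largest RTT,
    # but never less than one interval.
    return max(2 * max(rtts), interval)


def ping_runtime(count: int, interval: float, w_timeout: float,
                 rtts: List[Optional[float]]) -> float:
    """Toy model of the total runtime of `ping -c COUNT -i INTERVAL -W W_TIMEOUT`.

    rtts[k] is the reply delay of probe k, or None if that reply is dropped.
    """
    if len(rtts) != count:
        raise ValueError("expected exactly one entry per probe")
    answered = [r for r in rtts if r is not None]
    if len(answered) == count:
        # Every probe answered: ping exits when the last reply arrives.
        return max(k * interval + r for k, r in enumerate(rtts))
    last_send = (count - 1) * interval
    return last_send + linger_after_last_probe(w_timeout, interval, answered)
```

Under these assumptions a first reply near the timeout (for example `linger_after_last_probe(3, 0.3, [2.9])`) gives a wait just under 6 s, matching the bound above. With a hypothetical 0.5 s interval where only the first reply gets through, `ping_runtime(3, 0.5, 2.0, [0.0004, None, None])` returns 1.5: the last probe goes out at 1.0 s, and the sub-millisecond RTT is swamped by the one-interval floor.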
Looks like I misread the docs. Thanks for digging this through!
It does indeed look like it will not wait for too long.

I also tested this with the following set of OpenFlow rules that
allows exactly one ICMP packet to go through and drops the others
(the interval should be a bit higher for this to work, because it
relies on revalidation to update the flows):

  table=0 priority=100 icmp actions=learn(table=0,priority=110,eth_type=0x0800,nw_proto=1,NXM_OF_ETH_SRC),normal
  table=0 priority=0 actions=normal

It does not wait indefinitely indeed. However, I can't reproduce the
RTT thing. In my testing it waits for an extra interval (-i) instead.
But maybe it's because RTT is much lower than the interval.

>
> 0:
> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping.c#L1654-L1660
> 1:
> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L761
> 2:
> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L599
> 3:
> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L263-L268
>
>> Would be really nice to use fping instead that has simple and very
>> straightforward arguments without side effects, but once again RHEL
>> doesn't package it...
>>
>> Maybe we could use '$ timeout 2 ping6 -q -c 3 -i 0.3 fc00::3' instead?
>
> That would also work, either option works for me, what would be your
> preference?

The current -W solution seems fine for now. I'll run a few more
tests with it and then apply if it passes for me.

>
>> Another option might be to slightly reduce the deadline, so the 4th
>> packet will not be sent. But that sounds fragile.
>
> Agreed.
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
