On 10/25/23 11:45, Simon Horman wrote:
> On Sat, Oct 21, 2023 at 05:04:48PM +0200, Frode Nordahl wrote:
>> Many system tests currently use ping with the combination of a
>> low packet count (-c 3), short interval between sends (-i 0.3)
>> and a _deadline_ of 2 seconds (-w 2).
>>
>> This combination of options may lead to a situation where more
>> than count packets are sent, but ping stops as soon as count
>> packets are received. This results in a failed test due to how
>> the result is checked, for example:
>>
>> ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING
>> @@ -1,2 +1,2 @@
>> -3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +4 packets transmitted, 3 received, 25% packet loss, time 0ms
>>
>> To reiterate, in the above example there is no packet loss, but
>> ping stops after _receiving_ 3 packets, not bothering with
>> waiting for the response to the fourth packet it just sent out.
>>
>> If we look at the iputils ping manual for the -w deadline option
>> we can read that this is expected behavior:
>>
>>> Specify a timeout, in seconds, before ping exits regardless of
>>> how many packets have been sent or received. In this case ping
>>> does not stop after count packet are sent, it waits either for
>>> deadline expire or until count probes are answered or for some
>>> error notification from network.
>>
>> To avoid these kinds of failures in checks where a response is
>> expected, we replace ping -w with ping -W.
>>
>> We keep ping -w for checks where it is expected to NOT get a
>> response.
>>
>> Signed-off-by: Frode Nordahl <[email protected]>
>
> Thanks Frode,
>
> TIL about -w and -W.

I learned about -W as well. :)

Thanks, Frode, for figuring out the cause of these failures! I've seen
them before, but didn't dig too deep to find a cause. OVN also has them
from time to time.

Though I'm not sure if -W is the right choice. Reading the description:

    -W timeout
        Time to wait for a response, in seconds. The
        option affects only timeout in absence of any
        responses, otherwise ping waits for two RTTs.
        Real number allowed with dot as a decimal
        separator (regardless locale setup). 0 means
        infinite timeout.

And I don't really like the 'in absence of ANY responses' part of it.
So, IIUC, if we send 3 packets, the first gets a reply and the other two
are dropped somewhere, ping will ignore the timeout and will wait
indefinitely. Unfortunately, OVS gives the first packet special
treatment, so the potential for this scenario to happen is rather high.
This may break CI systems, getting them stuck testing one patch. And
it doesn't seem like we can mix -w and -W, at least the behavior is
not really defined in this case.

It would be really nice to use fping instead, which has simple and very
straightforward arguments without side effects, but once again RHEL
doesn't package it...

Maybe we could use '$ timeout 2 ping6 -q -c 3 -i 0.3 fc00::3' instead?
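
A rough sketch of what I mean, assuming GNU coreutils timeout is
available on the test hosts. The helper name bounded_ping is made up for
illustration and is not an existing test macro. timeout exits with
status 124 when the limit is hit, so a hang can be told apart from an
ordinary ping failure. I have not checked whether ping still prints its
summary line when it is killed this way.

```shell
# Hypothetical helper, not part of the existing test suite: run a command
# (ping, in our case) under a hard wall-clock limit so that a lost reply
# cannot leave the test run stuck.
# Assumes GNU coreutils 'timeout', which exits with status 124 when the
# limit is reached.
bounded_ping() {
    limit=$1; shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "command did not finish within ${limit}s" >&2
    fi
    return "$rc"
}

# Intended use in a test (options and address taken from the example
# in the commit message):
#   bounded_ping 2 ping6 -q -c 3 -i 0.3 fc00::3 | FORMAT_PING
```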

Another option might be to slightly reduce the deadline, so the 4th
packet will not be sent. But that sounds fragile.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev