On 10/26/23 08:04, Frode Nordahl wrote:
> On Wed, Oct 25, 2023 at 11:33 PM Ilya Maximets <[email protected]> wrote:
>>
>> On 10/25/23 11:45, Simon Horman wrote:
>>> On Sat, Oct 21, 2023 at 05:04:48PM +0200, Frode Nordahl wrote:
>>>> Many system tests currently use ping with the combination of a
>>>> low packet count (-c 3), short interval between sends (-i 0.3)
>>>> and a _deadline_ of 2 seconds (-d 2).
>>>>
>>>> This combination of options may lead to a situation where more
>>>> than count packets are sent however ping will stop when count
>>>> packets are received. This results in a failed test due to how
>>>> the result is checked, for example:
>>>>
>>>>     ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING
>>>>     @@ -1,2 +1,2 @@
>>>>     -3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>>>     +4 packets transmitted, 3 received, 25% packet loss, time 0ms
>>>>
>>>> To reiterate, in the above example there is no packet loss, but
>>>> ping stops after _receiving_ 3 packets, not bothering with
>>>> waiting for the response to the fourth packet it just sent out.
>>>>
>>>> If we look at the iputils ping manual for the -w deadline option
>>>> we can read that this is expected behavior:
>>>>
>>>>> Specify a timeout, in seconds, before ping exits regardless of
>>>>> how many packets have been sent or received. In this case ping
>>>>> does not stop after count packet are sent, it waits either for
>>>>> deadline expire or until count probes are answered or for some
>>>>> error notification from network.
>>>>
>>>> To avoid these kinds of failures in checks where a response is
>>>> expected, we replace ping -w with ping -W.
>>>>
>>>> We keep ping -w for checks where it is expected to NOT get a
>>>> response.
>>>>
>>>> Signed-off-by: Frode Nordahl <[email protected]>
>>>
>>> Thanks Frode,
>>>
>>> TIL about -w and -W.
>>
>> I learned about -W as well. :)
>>
>> Thanks, Frode, for figuring out the cause of these failures!  I've seen
>> them before, but didn't dig too deep to find a cause.  OVN also has them
>> from time to time.
> 
> yw, we run all the system tests in an automated fashion as part of the
> openvswitch package regression testing, and we see them quite
> frequently, most likely due to the load of the CI infrastructure.
> 
>> Though I'm not sure if -W is the right choice.  Reading the description:
>>
>>        -W timeout
>>            Time to wait for a response, in seconds. The
>>            option affects only timeout in absence of any
>>            responses, otherwise ping waits for two RTTs.
>>            Real number allowed with dot as a decimal
>>            separator (regardless locale setup). 0 means
>>            infinite timeout.
>>
>> And I don't really like the 'in absence of ANY responses' part of it.
>>
>> So, IIUC, if we send 3 packets, first gets replied and the other two
>> are dropped somewhere, ping will ignore the timeout and will wait
>> indefinitely.  Unfortunately, OVS gives the first packet a special
>> treatment, so potential for this scenario to happen is rather high.
>> This may break CI systems, getting them stuck testing one patch. And
>> it doesn't seem like we can mix -w and -W, at least the behavior is
>> not really defined in this case.
> 
> It also says "otherwise ping waits for two RTTs.", so it will not wait
> indefinitely. The documentation is a bit convoluted though so I went
> to look so that we can be sure about what it will do.
> 
> On arrival of the first packet, ping will gather various information
> [0] which will be used to compute the RTT [1], which is used when
> initializing the waittime [2][3].
> 
> So it appears to me -W would cover the scenario laid out above, i.e.
> if we get one reply quickly and the rest are lost, the computed RTT
> would have a ping exit within a reasonable timeframe. Even if the
> first response comes near the timeout value, the RTT would not be more
> than 6 seconds for a -W of 3.

Looks like I misread the docs.  Thanks for digging this through!
It does indeed look like it will not wait for too long.  I also
tested this with the following set of OpenFlow rules that allows
exactly one ICMP packet to go through and drops the others (the
interval should be a bit higher for this to work, because it
relies on revalidation to update the flows):

 table=0 priority=100 icmp 
actions=learn(table=0,priority=110,eth_type=0x0800,nw_proto=1,NXM_OF_ETH_SRC),normal
 table=0 priority=0 actions=normal

It does not wait indefinitely indeed.  However, I can't reproduce
the RTT thing.  In my testing it waits for an extra interval (-i)
instead.  But maybe it's because RTT is much lower than the interval.

> 
> 0: 
> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping.c#L1654-L1660
> 1: 
> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L761
> 2: 
> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L599
> 3: 
> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L263-L268
> 
>> Would be really nice to use fping instead that has simple and very
>> straightforward arguments without side effects, but once again RHEL
>> doesn't package it...
>>
>> Maybe we could use '$ timeout 2 ping6 -q -c 3 -i 0.3 fc00::3' instead?
> 
> That would also work, either option works for me, what would be your 
> preference?

The current -W solution seems fine for now.  I'll run a few more
tests with it and then apply if it passes for me.

> 
>> Another option might be to slightly reduce the deadline, so the 4th
>> packet will not be sent.  But that sounds fragile.
> 
> Agreed.
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to