On 10/26/23 14:06, Ilya Maximets wrote:
> On 10/26/23 08:04, Frode Nordahl wrote:
>> On Wed, Oct 25, 2023 at 11:33 PM Ilya Maximets <[email protected]> wrote:
>>>
>>> On 10/25/23 11:45, Simon Horman wrote:
>>>> On Sat, Oct 21, 2023 at 05:04:48PM +0200, Frode Nordahl wrote:
>>>>> Many system tests currently use ping with the combination of a
>>>>> low packet count (-c 3), short interval between sends (-i 0.3)
>>>>> and a _deadline_ of 2 seconds (-d 2).
>>>>>
>>>>> This combination of options may lead to a situation where more
>>>>> than count packets are sent however ping will stop when count
>>>>> packets are received. This results in a failed test due to how
>>>>> the result is checked, for example:
>>>>>
>>>>>     ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING
>>>>>     @@ -1,2 +1,2 @@
>>>>>     -3 packets transmitted, 3 received, 0% packet loss, time 0ms
>>>>>     +4 packets transmitted, 3 received, 25% packet loss, time 0ms
>>>>>
>>>>> To reiterate, in the above example there is no packet loss, but
>>>>> ping stops after _receiving_ 3 packets, not bothering with
>>>>> waiting for the response to the fourth packet it just sent out.
>>>>>
>>>>> If we look at the iputils ping manual for the -w deadline option
>>>>> we can read that this is expected behavior:
>>>>>
>>>>>> Specify a timeout, in seconds, before ping exits regardless of
>>>>>> how many packets have been sent or received. In this case ping
>>>>>> does not stop after count packet are sent, it waits either for
>>>>>> deadline expire or until count probes are answered or for some
>>>>>> error notification from network.
>>>>>
>>>>> To avoid these kinds of failures in checks where a response is
>>>>> expected, we replace ping -w with ping -W.
>>>>>
>>>>> We keep ping -w for checks where it is expected to NOT get a
>>>>> response.
>>>>>
>>>>> Signed-off-by: Frode Nordahl <[email protected]>
>>>>
>>>> Thanks Frode,
>>>>
>>>> TIL about -w and -W.
>>>
>>> I learned about -W as well. :)
>>>
>>> Thanks, Frode, for figuring out the cause of these failures!  I've seen
>>> them before, but didn't dig too deep to find a cause.  OVN also has them
>>> from time to time.
>>
>> yw, we run all the system tests in an automated fashion as part of the
>> openvswitch package regression testing, and we see them quite
>> frequently, most likely due to the load of the CI infrastructure.
>>
>>> Though I'm not sure if -W is the right choice.  Reading the description:
>>>
>>>        -W timeout
>>>            Time to wait for a response, in seconds. The
>>>            option affects only timeout in absence of any
>>>            responses, otherwise ping waits for two RTTs.
>>>            Real number allowed with dot as a decimal
>>>            separator (regardless locale setup). 0 means
>>>            infinite timeout.
>>>
>>> And I don't really like the 'in absence of ANY responses' part of it.
>>>
>>> So, IIUC, if we send 3 packets, first gets replied and the other two
>>> are dropped somewhere, ping will ignore the timeout and will wait
>>> indefinitely.  Unfortunately, OVS gives the first packet a special
>>> treatment, so potential for this scenario to happen is rather high.
>>> This may break CI systems, getting them stuck testing one patch. And
>>> it doesn't seem like we can mix -w and -W, at least the behavior is
>>> not really defined in this case.
>>
>> It also says "otherwise ping waits for two RTTs.", so it will not wait
>> indefinitely. The documentation is a bit convoluted though so I went
>> to look so that we can be sure about what it will do.
>>
>> On arrival of the first packet, ping will gather various information
>> [0] which will be used to compute the RTT [1], which is used when
>> initializing the waittime [2][3].
>>
>> So it appears to me -W would cover the scenario laid out above, i.e.
>> if we get one reply quickly and the rest are lost, the computed RTT
>> would have a ping exit within a reasonable timeframe. Even if the
>> first response comes near the timeout value, the RTT would not be more
>> than 6 seconds for a -W of 3.
> 
> Looks like I misread the docs.  Thanks for digging this through!
> It does indeed look like it will not wait for too long.  I also
> tested this with the following set of OpenFlow rules that allows
> exactly one ICMP packet to go through and drops the others (the
> interval should be a bit higher for this to work, because it
> relies on revalidation to update the flows):
> 
>  table=0 priority=100 icmp 
> actions=learn(table=0,priority=110,eth_type=0x0800,nw_proto=1,NXM_OF_ETH_SRC),normal
>  table=0 priority=0 actions=normal
> 
> It does not wait indefinitely indeed.  However, I can't reproduce
> the RTT thing.  In my testing it waits for an extra interval (-i)
> instead.  But maybe it's because RTT is much lower than the interval.
> 
>>
>> 0: 
>> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping.c#L1654-L1660
>> 1: 
>> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L761
>> 2: 
>> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L599
>> 3: 
>> https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L263-L268
>>
>>> Would be really nice to use fping instead that has simple and very
>>> straightforward arguments without side effects, but once again RHEL
>>> doesn't package it...
>>>
>>> Maybe we could use '$ timeout 2 ping6 -q -c 3 -i 0.3 fc00::3' instead?
>>
>> That would also work, either option works for me, what would be your 
>> preference?
> 
> The current -W solution seems fine for now.  I'll run a few more
> tests with it and then apply if it passes for me.
> 
>>
>>> Another option might be to slightly reduce the deadline, so the 4th
>>> packet will not be sent.  But that sounds fragile.
>>
>> Agreed.
>>
> 

Thanks, Frode and Simon!  Applied this one and backported down to 2.17.

For the future: We might want to create a macro for running ping to avoid
such big changes.  But we'll first need to figure out why may tests are
using different configurations.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to