On Wed, Oct 25, 2023 at 11:33 PM Ilya Maximets <[email protected]> wrote:
>
> On 10/25/23 11:45, Simon Horman wrote:
> > On Sat, Oct 21, 2023 at 05:04:48PM +0200, Frode Nordahl wrote:
> >> Many system tests currently use ping with the combination of a
> >> low packet count (-c 3), short interval between sends (-i 0.3)
> >> and a _deadline_ of 2 seconds (-d 2).
> >>
> >> This combination of options may lead to a situation where more
> >> than count packets are sent however ping will stop when count
> >> packets are received. This results in a failed test due to how
> >> the result is checked, for example:
> >>
> >>     ping6 -q -c 3 -i 0.3 -w 2 fc00::3 | FORMAT_PING
> >>     @@ -1,2 +1,2 @@
> >>     -3 packets transmitted, 3 received, 0% packet loss, time 0ms
> >>     +4 packets transmitted, 3 received, 25% packet loss, time 0ms
> >>
> >> To reiterate, in the above example there is no packet loss, but
> >> ping stops after _receiving_ 3 packets, not bothering with
> >> waiting for the response to the fourth packet it just sent out.
> >>
> >> If we look at the iputils ping manual for the -w deadline option
> >> we can read that this is expected behavior:
> >>
> >>> Specify a timeout, in seconds, before ping exits regardless of
> >>> how many packets have been sent or received. In this case ping
> >>> does not stop after count packet are sent, it waits either for
> >>> deadline expire or until count probes are answered or for some
> >>> error notification from network.
> >>
> >> To avoid these kinds of failures in checks where a response is
> >> expected, we replace ping -w with ping -W.
> >>
> >> We keep ping -w for checks where it is expected to NOT get a
> >> response.
> >>
> >> Signed-off-by: Frode Nordahl <[email protected]>
> >
> > Thanks Frode,
> >
> > TIL about -w and -W.
>
> I learned about -W as well. :)
>
> Thanks, Frode, for figuring out the cause of these failures!  I've seen
> them before, but didn't dig too deep to find a cause.  OVN also has them
> from time to time.

yw, we run all the system tests in an automated fashion as part of the
openvswitch package regression testing, and we see them quite
frequently, most likely due to the load of the CI infrastructure.

> Though I'm not sure if -W is the right choice.  Reading the description:
>
>        -W timeout
>            Time to wait for a response, in seconds. The
>            option affects only timeout in absence of any
>            responses, otherwise ping waits for two RTTs.
>            Real number allowed with dot as a decimal
>            separator (regardless locale setup). 0 means
>            infinite timeout.
>
> And I don't really like the 'in absence of ANY responses' part of it.
>
> So, IIUC, if we send 3 packets, first gets replied and the other two
> are dropped somewhere, ping will ignore the timeout and will wait
> indefinitely.  Unfortunately, OVS gives the first packet a special
> treatment, so potential for this scenario to happen is rather high.
> This may break CI systems, getting them stuck testing one patch. And
> it doesn't seem like we can mix -w and -W, at least the behavior is
> not really defined in this case.

It also says "otherwise ping waits for two RTTs.", so it will not wait
indefinitely. The documentation is a bit convoluted though so I went
to look so that we can be sure about what it will do.

On arrival of the first packet, ping will gather various information
[0] which will be used to compute the RTT [1], which is used when
initializing the waittime [2][3].

So it appears to me -W would cover the scenario laid out above, i.e.
if we get one reply quickly and the rest are lost, the computed RTT
would have a ping exit within a reasonable timeframe. Even if the
first response comes near the timeout value, the RTT would not be more
than 6 seconds for a -W of 3.

0: 
https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping.c#L1654-L1660
1: 
https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L761
2: 
https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L599
3: 
https://github.com/iputils/iputils/blob/0cc6da796b9a64113152c071088701cb95a72ae8/ping/ping_common.c#L263-L268

> Would be really nice to use fping instead that has simple and very
> straightforward arguments without side effects, but once again RHEL
> doesn't package it...
>
> Maybe we could use '$ timeout 2 ping6 -q -c 3 -i 0.3 fc00::3' instead?

That would also work, either option works for me, what would be your preference?

> Another option might be to slightly reduce the deadline, so the 4th
> packet will not be sent.  But that sounds fragile.

Agreed.

-- 
Frode Nordahl

> Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to