On Thu, Jan 10, 2019 at 11:44:05AM +0300, Ilya Maximets wrote:
> On 09.01.2019 23:27, Ben Pfaff wrote:
> > On Wed, Jan 09, 2019 at 08:28:54PM +0300, Ilya Maximets wrote:
> >> On 27.12.2018 20:36, Ben Pfaff wrote:
> >>> On Wed, Dec 26, 2018 at 06:23:56PM +0300, Ilya Maximets wrote:
> >>>> On some systems, when the remote is not responding, a socket can
> >>>> remain in the SYN_SENT state for a really long time without errors while
> >>>> waiting for the connection. As a result, a vconn connection hangs
> >>>> for a few minutes waiting to connect to the DOWN remote.
> >>>>
> >>>> For example, this situation is emulated by the "refuse-connection" vconn
> >>>> testcase. This leads to test failures because the Alarm signal arrives much
> >>>> faster than ETIMEDOUT from the socket:
> >>>>
> >>>>   ./vconn.at:21: ovstest test-vconn refuse-connection tcp
> >>>>   Alarm clock
> >>>>   stderr:
> >>>>   |socket_util|INFO|0:127.0.0.1: listening on port 63812
> >>>>   |poll_loop|DBG|wakeup due to 0-ms timeout
> >>>>   |poll_loop|DBG|wakeup due to 10155-ms timeout
> >>>>   |fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> >>>>   ./vconn.at:21: exit code was 142, expected 0
> >>>>   vconn.at:21: 535. tcp vconn - refuse connection (vconn.at:21): FAILED
> >>>>
> >>>> This patch allows specifying a timeout value for vconn blocking
> >>>> connections. If the connection takes more time, the socket will be closed
> >>>> with the ETIMEDOUT error code. A negative value can be used to wait
> >>>> indefinitely.
> >>>>
> >>>> Signed-off-by: Ilya Maximets <[email protected]>
> >>>
> >>> Same comments as patch 2.
> >>>
> >>> Are the timeouts only useful for the test cases?  I wonder whether just
> >>> calling alarm(10); at the beginning of the test programs would be just
> >>> as helpful.  On the other hand, it would make using a debugger on those
> >>> programs harder.
> >>
> >> I guess, we have alarms in all the test programs.
> >> The issue here is that some test apps like 'test_refuse_connection' treat
> >> connection failure as a success. But on some systems, wrong connections hang
> >> for a really long time and the alarm kills the test application. In this case
> >> we can't say for sure whether the test failed, i.e. whether it was an expected
> >> connection failure or some other random issue that forced the application to
> >> hang.
> >>
> >> Stream connection tests are even worse, because they try to sequentially
> >> establish a connection to one of 3 different remotes while only one of them
> >> is correct. The test will never try to connect to the correct one if the
> >> blocking connection to the wrong port hangs for a few minutes. It'll simply
> >> be killed by the alarm.
> > 
> > It should be possible to tell what caused the test program to exit by
> > testing the exit status.  When a program exits due to a signal, a
> > Bourne-compatible shell sets $? to 128 plus the signal number.  Usually,
> > it's good enough just to know that the process died with an unusual exit
> > status, but you can get the particular signal name back with "kill -l
> > $?", e.g. on Linux "kill -l 142" prints "ALRM".  This behavior is
> > specified by POSIX so it should be portable.
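
The exit-status convention described above can be checked with a small sketch. The helper names (shell_status_after_sigalrm, signal_name) are hypothetical and not part of the OVS tree, and the decoding assumes the common "128 plus signal number" convention used by shells such as bash and dash.

```python
import signal
import subprocess


def shell_status_after_sigalrm():
    """Return $? as seen by sh after a child is killed by SIGALRM."""
    # The inner sh sends SIGALRM to itself ($$ is protected by single quotes,
    # so it expands in the inner shell); the outer sh then prints $?.
    script = "sh -c 'kill -ALRM $$'; echo $?"
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def signal_name(status):
    """Decode a shell exit status above 128, assuming the 128 + N convention."""
    return signal.Signals(status - 128).name


if __name__ == "__main__":
    status = shell_status_after_sigalrm()
    print(status, signal_name(status))
```

This only identifies which signal ended the process; it cannot tell why the process was still running when the alarm fired.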
> 
> Yes, we can detect that the app was killed by the alarm, but we can't say
> whether it was an expected hang while connecting to the wrong port or just
> overly long execution due to a random environment issue or a bug.
> 
> Let's look at "multiple remotes" test cases. Their workflow is as follows:
> 
>   1. alarm(10)
>   2. Initialize idl with multiple remotes.
>   3. RPC: Try to connect to WRONG_PORT_1. Fail expected.
>   4. RPC: Try to connect to right port.
>   5. Perform some ovsdb transactions.
>   6. Check result.
> 
> Step 3 (the connection to WRONG_PORT_1) always hangs in the CirrusCI
> environment and the app dies there by alarm.
> We can't treat this as success because we didn't check anything useful.

Oh, I see, we have some tests where the expected behavior is to hang,
but we want to make sure that it's for the right reason.  I didn't
properly understand that.  I'll take a look at the new patches.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
