On Thu, Jan 10, 2019 at 11:44:05AM +0300, Ilya Maximets wrote:
> On 09.01.2019 23:27, Ben Pfaff wrote:
> > On Wed, Jan 09, 2019 at 08:28:54PM +0300, Ilya Maximets wrote:
> >> On 27.12.2018 20:36, Ben Pfaff wrote:
> >>> On Wed, Dec 26, 2018 at 06:23:56PM +0300, Ilya Maximets wrote:
> >>>> On some systems, when the remote is not responding, a socket can
> >>>> remain in the SYN_SENT state for a really long time without errors
> >>>> while waiting for the connection. This leads to situations where a
> >>>> vconn connection hangs for a few minutes waiting for a connection to
> >>>> the DOWN remote.
> >>>>
> >>>> For example, this situation is emulated by the "refuse-connection"
> >>>> vconn testcase. It leads to test failures because the Alarm signal
> >>>> arrives much sooner than ETIMEDOUT from the socket:
> >>>>
> >>>> ./vconn.at:21: ovstest test-vconn refuse-connection tcp
> >>>> Alarm clock
> >>>> stderr:
> >>>> |socket_util|INFO|0:127.0.0.1: listening on port 63812
> >>>> |poll_loop|DBG|wakeup due to 0-ms timeout
> >>>> |poll_loop|DBG|wakeup due to 10155-ms timeout
> >>>> |fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> >>>> ./vconn.at:21: exit code was 142, expected 0
> >>>> vconn.at:21: 535. tcp vconn - refuse connection (vconn.at:21): FAILED
> >>>>
> >>>> This patch allows specifying a timeout value for vconn blocking
> >>>> connections. If the connection takes more time, the socket will be
> >>>> closed with the ETIMEDOUT error code. A negative value can be used to
> >>>> wait indefinitely.
> >>>>
> >>>> Signed-off-by: Ilya Maximets <[email protected]>
> >>>
> >>> Same comments as patch 2.
> >>>
> >>> Are the timeouts only useful for the test cases? I wonder whether just
> >>> calling alarm(10); at the beginning of the test programs would be just
> >>> as helpful. On the other hand, it would make using a debugger on those
> >>> programs harder.
> >>
> >> I guess we have alarms in all the test programs.
> >> The issue here is that some test apps like 'test_refuse_connection' treat
> >> a connection failure as a success. But on some systems, wrong connections
> >> hang for a really long time and the alarm kills the test application. In
> >> this case we can't say for sure whether the test failed or not, i.e.
> >> whether it was the expected connection failure or some other random issue
> >> that forced the application to hang.
> >>
> >> The stream connection tests are even worse, because they try to
> >> sequentially establish a connection to one of 3 different remotes while
> >> only one of them is correct. They will never try to connect to the
> >> correct one if the blocking connection to a wrong port hangs for a few
> >> minutes; the test will simply be killed by the alarm.
> >
> > It should be possible to tell what caused the test program to exit by
> > testing the exit status. When a program exits due to a signal, a
> > Bourne-compatible shell sets $? to 128 plus the signal number. Usually,
> > it's good enough just to know that the process died with an unusual exit
> > status, but you can get the particular signal name back with "kill -l
> > $?", e.g. on Linux "kill -l 142" prints "ALRM". This behavior is
> > specified by POSIX so it should be portable.
>
> Yes, we can detect that the app was killed by the alarm, but we can't say
> whether it was the expected hang while connecting to the wrong port or just
> an overly long execution due to a random environment issue or a bug.
>
> Let's look at the "multiple remotes" test cases. Their workflow is the
> following:
>
> 1. alarm(10)
> 2. Initialize idl with multiple remotes.
> 3. RPC: Try to connect to WRONG_PORT_1. Failure expected.
> 4. RPC: Try to connect to the right port.
> 5. Perform some ovsdb transactions.
> 6. Check the result.
>
> Step 3 always hangs in the CirrusCI environment and the app dies there
> from the alarm. We can't treat this as a success because we didn't check
> anything useful.

Oh, I see, we have some tests where the expected behavior is to hang, but
we want to make sure that it's for the right reason. I didn't properly
understand that. I'll take a look at the new patches.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
