On Wed, Jan 09, 2019 at 08:28:54PM +0300, Ilya Maximets wrote:
> On 27.12.2018 20:36, Ben Pfaff wrote:
> > On Wed, Dec 26, 2018 at 06:23:56PM +0300, Ilya Maximets wrote:
> >> On some systems in case where remote is not responding, socket could
> >> remain in SYN_SENT state for a really long time without errors waiting
> >> for connection. This leads to situations where vconn connection hangs
> >> for a few minutes waiting for connection to the DOWN remote.
> >>
> >> For example, this situation emulated by "refuse-connection" vconn
> >> testcase. This leads to test failures because Alarm signal arrives much
> >> faster than ETIMEDOUT from the socket:
> >>
> >> ./vconn.at:21: ovstest test-vconn refuse-connection tcp
> >> Alarm clock
> >> stderr:
> >> |socket_util|INFO|0:127.0.0.1: listening on port 63812
> >> |poll_loop|DBG|wakeup due to 0-ms timeout
> >> |poll_loop|DBG|wakeup due to 10155-ms timeout
> >> |fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> >> ./vconn.at:21: exit code was 142, expected 0
> >> vconn.at:21: 535. tcp vconn - refuse connection (vconn.at:21): FAILED
> >>
> >> This patch allowes to specify timeout value for vconn blocking
> >> connections. If the connection takes more time, socket will be closed
> >> with ETIMEDOUT error code. Negative value could be used to wait
> >> infinitely.
> >>
> >> Signed-off-by: Ilya Maximets <[email protected]>
> >
> > Same comments as patch 2.
> >
> > Are the timeouts only useful for the test cases? I wonder whether just
> > calling alarm(10); at the beginning of the test programs would be just
> > as helpful. On the other hand, it would make using a debugger on those
> > programs harder.
>
> I guess, we have alarms in all the test programs.
> The issue here is that some test apps like 'test_refuse_connection' treats
> connection failure as a success. But on some systems, wrong connections hangs
> for a really long time and alarm kills the test application. In this case
> we can't say for sure if the test failed or not, i.e. if it was expected
> connection failure or other random issue that forced the application to hang.
>
> stream connection tests even worse, because they are trying to sequentially
> establish connection to one of 3 different remotes while only one of them is
> correct. And it will never try to connect to correct one if the blocking
> connection to wrong port will hang for a few minutes. It'll be simply killed
> by alarm.

It should be possible to tell what caused the test program to exit by
testing the exit status. When a program exits due to a signal, a
Bourne-compatible shell sets $? to 128 plus the signal number. Usually,
it's good enough just to know that the process died with an unusual exit
status, but you can get the particular signal name back with
"kill -l $?", e.g. on Linux "kill -l 142" prints "ALRM". This behavior
is specified by POSIX so it should be portable.
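
As a rough illustration of that check, here is a minimal shell sketch. The describe_status helper is a hypothetical name made up for this example and is not part of the OVS test suite, and the child shell that sends itself SIGALRM only stands in for a test program killed by its alarm.

```sh
# Hypothetical helper (not from the OVS tree): classify an exit status.
describe_status() {
    # Statuses above 128 normally mean "killed by signal (status - 128)".
    # kill -l maps such a status back to the signal name without "SIG";
    # if it cannot, fall through and report a plain failure.
    if [ "$1" -gt 128 ] && name=$(kill -l "$1" 2>/dev/null); then
        echo "killed by SIG$name"
    elif [ "$1" -eq 0 ]; then
        echo "succeeded"
    else
        echo "failed with exit status $1"
    fi
}

# Stand-in for a test program killed by its alarm: a child shell that
# sends SIGALRM to itself.
rc=0
sh -c 'kill -ALRM $$' || rc=$?
describe_status "$rc"
```

A test script could use the same comparison to separate "the watchdog alarm fired" from "the connection failed as expected", without having to parse stderr.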
