smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-608409107
 
 
   > Thanks for the patch, @smengcl. These timeouts (and especially their root cause) are the biggest problem with the integration tests right now...
   > 
   > > This helps rule out flaky long-running tests (e.g. TestRandomKeyGenerator.bigFileThan2GB) that are taking a very long time to run in GitHub Actions.
   > 
   > Did you see any long-running integration tests?
   > 
   > It seems that we have a global timeout:
   > 
   > In the main `pom.xml`:
   > 
   > ```
   >   <surefire.fork.timeout>900</surefire.fork.timeout>
   > ...
   >   <forkedProcessTimeoutInSeconds>${surefire.fork.timeout}</forkedProcessTimeoutInSeconds>
   > ```
   > 
   > Wouldn't it be easier to decrease this number? (If I understand correctly, it does the same thing, since we fork the JVM.)
   > 
   > Did you check the current execution time of all the integration tests? Is the proposed value significantly bigger than the expected time of the slowest test?
   
   Hey Marton, thanks for the comment.
   
   `TestRandomKeyGenerator.bigFileThan2GB` failed in [this](https://github.com/apache/hadoop-ozone/runs/540098578) run. That failure isn't really a timeout; the test is just flaky. I should update the description to use a different example.
   ```
   [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 333.028 s <<< FAILURE! - in org.apache.hadoop.ozone.freon.TestRandomKeyGenerator
   [ERROR] bigFileThan2GB(org.apache.hadoop.ozone.freon.TestRandomKeyGenerator)  Time elapsed: 267.469 s  <<< FAILURE!
   java.lang.AssertionError: expected:<1> but was:<0>
        at org.junit.Assert.fail(Assert.java:88)
   ```
   
   `TestOzoneRpcClient`, however, does look like a problem [here](https://github.com/apache/hadoop-ozone/runs/540098466). When it fails, the log doesn't seem to contain any useful information, and it runs for far too long:
   ```
   [INFO] Apache Hadoop Ozone Integration Tests .............. FAILURE [53:03 min]
   [INFO] Apache Hadoop Ozone Mini Ozone Chaos Tests ......... SKIPPED
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  53:04 min
   [INFO] Finished at: 2020-03-27T18:32:50Z
   [INFO] ------------------------------------------------------------------------
   [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-ozone-integration-test: There was a timeout or other error in the fork -> [Help 1]
   [ERROR] 
   [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
   [ERROR] Re-run Maven using the -X switch to enable full debug logging.
   [ERROR] 
   [ERROR] For more information about the errors and possible solutions, please read the following articles:
   [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
   org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClient
   ```
   
   I'm not sure whether those issues have already been fixed. Either way, the idea of this Jira is to add a class-level timeout that applies to EACH test method in those classes. Does `forkedProcessTimeoutInSeconds` achieve the same thing?
   If it does, the `TestOzoneRpcClient` example above suggests it doesn't work well either: the whole it-client run took ~53 min (about 3180 s), while the fork timeout quoted above is 900 s.
   It also seems that when the fork times out, we don't get useful logs to diagnose the flakiness. This is the main reason @arp7 asked me to add the timeout to all the tests.
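
   To illustrate why a per-test timeout could give better diagnostics than a fork-level one, here is a minimal plain-Java sketch. It is not code from this patch: `PerTestTimeoutSketch`, `runWithTimeout` and `TestTimedOut` are hypothetical names, and the timeout values are arbitrary. As far as I know, JUnit 4's `Timeout` rule (e.g. `@Rule public Timeout timeout = Timeout.seconds(300);`) works along similar lines, but treat that as an assumption rather than a description of its internals.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration only; not part of Ozone or JUnit. */
final class PerTestTimeoutSketch {

  /** Thrown when a single test body exceeds its time budget. */
  static final class TestTimedOut extends RuntimeException {
    TestTimedOut(String message, StackTraceElement[] stuckAt) {
      super(message);
      // Report where the test thread was stuck, not where the timeout was detected.
      setStackTrace(stuckAt);
    }
  }

  /** Runs one test body on its own thread and fails it after timeoutMillis. */
  static void runWithTimeout(String testName, long timeoutMillis, Callable<Void> body) {
    FutureTask<Void> task = new FutureTask<>(body);
    Thread worker = new Thread(task, "test-" + testName);
    worker.setDaemon(true); // a hung test must not keep the JVM alive
    worker.start();
    try {
      task.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Capture the stuck thread's stack before interrupting it.
      StackTraceElement[] stuckAt = worker.getStackTrace();
      worker.interrupt();
      throw new TestTimedOut(
          testName + " timed out after " + timeoutMillis + " ms", stuckAt);
    } catch (ExecutionException e) {
      // The test body itself failed: surface its own exception.
      Throwable cause = e.getCause();
      if (cause instanceof RuntimeException) throw (RuntimeException) cause;
      if (cause instanceof Error) throw (Error) cause;
      throw new RuntimeException(cause);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    // A fast test passes without any output.
    runWithTimeout("fastTest", 1000, () -> null);

    // A hung test fails on its own, and the report names the test
    // and shows the frames it was blocked in.
    try {
      runWithTimeout("hungTest", 200, () -> {
        Thread.sleep(60_000);
        return null;
      });
    } catch (TestTimedOut e) {
      e.printStackTrace();
    }
  }
}
```

   The point of the sketch is that the failure is attributed to one named test and carries the stack of the blocked thread, whereas killing the whole forked JVM only yields the generic "There was a timeout or other error in the fork" message shown above.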
   
   Later, we might want to further lower the timeout for tests that are known to time out intermittently.
   What do you think?
