[ 
https://issues.apache.org/jira/browse/KAFKA-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-16349.
---------------------------------
    Fix Version/s: 3.8.0
         Assignee: Greg Harris
       Resolution: Fixed

> ShutdownableThread fails build by calling Exit with race condition
> ------------------------------------------------------------------
>
>                 Key: KAFKA-16349
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16349
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.8.0
>            Reporter: Greg Harris
>            Assignee: Greg Harris
>            Priority: Minor
>             Fix For: 3.8.0
>
>
> `ShutdownableThread` calls `Exit.exit()` when the thread's operation throws 
> FatalExitError. In normal operation, this calls System.exit, and exits the 
> process. In tests, the exit procedure is masked with Exit.setExitProcedure to 
> prevent tests that encounter a FatalExitError from crashing the test JVM.
> Masking of exit procedures is usually done in BeforeEach/AfterEach 
> annotations, with the exit procedures cleaned up immediately after the test 
> finishes. If the body of the test creates a ShutdownableThread that outlives 
> the test, such as by omitting `ShutdownableThread#awaitShutdown`, by having 
> `ShutdownableThread#awaitShutdown` be interrupted by a test timeout, or by a 
> race condition between `Exit.resetExitProcedure` and `Exit.exit`, then 
> System.exit() can be called erroneously.
>  
> {noformat}
> // First, in the test thread:
> Exit.setExitProcedure(...)
> try {
>     new ShutdownableThread(...).start()
> } finally {
>     Exit.resetExitProcedure()
> }
> // Second, in the ShutdownableThread:
> try {
>     throw new FatalExitError(...)
> } catch (FatalExitError e) {
>     Exit.exit(...) // Calls real System.exit()
> }{noformat}
>  
> This can be resolved by one of the following:
>  # Eliminate FatalExitError usages in code when setExitProcedure is in-use
>  # Eliminate the Exit.exit call from ShutdownableThread, and instead 
> propagate this error to another thread to handle without a race-condition
>  # Eliminate resetExitProcedure by refactoring Exit to be non-static
> FatalExitError is in use in a small number of places, but may be difficult to 
> eliminate:
>  * FinalizedFeatureChangeListener
>  * InterBrokerSendThread
>  * TopicBasedRemoteLogMetadataManager
> There are many other places where Exit is called from a background thread, 
> including some implementations of ShutdownableThread which don't use 
> FatalExitError.
> The effect of this bug is that the build is flaky, as race 
> conditions/timeouts in tests can cause the gradle executors to exit with 
> status code 1, which has happened 26 times in the last 28 days. I have not 
> yet been able to confirm this bug is happening in other tests, but I do have 
> a deterministic reproduction case with the exact same symptoms:
> {noformat}
> Gradle Test Run :core:test > Gradle Test Executor 38 > ShutdownableThreadTest 
> > testShutdownWhenTestTimesOut(boolean) > 
> "testShutdownWhenTestTimesOut(boolean).isInterruptible=true" SKIPPED
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':core:test'.
> > Process 'Gradle Test Executor 38' finished with non-zero exit value 1
>   This problem might be caused by incorrect test process configuration.
>   For more on test execution, please refer to 
> https://docs.gradle.org/8.6/userguide/java_testing.html#sec:test_execution in 
> the Gradle documentation.{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to