Luke Chen created KAFKA-14242:
---------------------------------

             Summary: Hanging logManager in testReloadUpdatedFilesWithoutConfigChange test
                 Key: KAFKA-14242
                 URL: https://issues.apache.org/jira/browse/KAFKA-14242
             Project: Kafka
          Issue Type: Test
            Reporter: Luke Chen
            Assignee: Luke Chen


Recently, many builds have failed (and been terminated) with a core:unitTest 
failure. The failure messages look like this:
FAILURE: Build failed with an exception.
[2022-09-14T09:51:52.190Z] 
[2022-09-14T09:51:52.190Z] * What went wrong:
[2022-09-14T09:51:52.190Z] Execution failed for task ':core:unitTest'.
[2022-09-14T09:51:52.190Z] > Process 'Gradle Test Executor 128' finished with non-zero exit value 1
After investigating, I found one cause (there may be others). In the 
{{BrokerMetadataPublisherTest#testReloadUpdatedFilesWithoutConfigChange}} test, 
we create a logManager twice, but during cleanup we only close one of them, so 
a log cleaner keeps running. Meanwhile, the temp log dirs are deleted, so the 
leaked logManager calls {{Exit.halt(1)}}, which produces the error we saw in 
Gradle. This is the code that runs when we encounter an IOException in all of 
our log dirs:
fatal(s"Shutdown broker because all log dirs in ${logDirs.mkString(", ")} have failed")
Exit.halt(1)
Why does the test sometimes pass and sometimes fail? During test cluster close, 
we shut down the broker first, and then the other components, and the log 
cleaner is triggered at an interval. So if the cluster closes fast enough and 
the test finishes before the cleaner runs, the test passes. Otherwise, the 
process exits with 1.
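
The leak described above can be reproduced in miniature without any Kafka 
classes. The following is only a sketch under stated assumptions: 
FakeLogManager, wouldHalt, and LeakDemo are hypothetical names invented for 
illustration, the periodic task stands in for the log cleaner, and a flag is 
set where the real code path would call Exit.halt(1). It shows one possible 
cleanup pattern, tracking every created instance and closing all of them 
before the temp dir is deleted. It is not the actual fix applied to the test.

```scala
import java.nio.file.{Files, Path}
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a LogManager: a periodic task that needs its directory.
final class FakeLogManager(dir: Path, intervalMs: Long) extends AutoCloseable {
  // Set when the periodic task finds the directory gone.
  // The real code path described in this issue would call Exit.halt(1) here.
  val wouldHalt = new AtomicBoolean(false)
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = if (!Files.exists(dir)) wouldHalt.set(true)
  }, intervalMs, intervalMs, TimeUnit.MILLISECONDS)

  override def close(): Unit = {
    scheduler.shutdownNow()
    // Wait so that no task is still running once close() returns.
    scheduler.awaitTermination(5, TimeUnit.SECONDS)
  }
}

object LeakDemo {
  // Returns true if a still-running manager observed the deleted directory.
  def run(closeAll: Boolean): Boolean = {
    val dir = Files.createTempDirectory("fake-log-dir")
    val created = ArrayBuffer[FakeLogManager]()
    def newManager(): FakeLogManager = {
      val m = new FakeLogManager(dir, 10L)
      created += m // remember every instance, not just the latest one
      m
    }

    newManager()              // first instance
    val second = newManager() // second instance

    // Cleanup: either close everything that was created, or only one instance.
    if (closeAll) created.foreach(_.close()) else second.close()
    Files.delete(dir)

    // Give a leaked periodic task up to one second to notice the missing dir.
    val deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(1)
    while (!created.exists(_.wouldHalt.get) && System.nanoTime() < deadline)
      Thread.sleep(10)

    val halted = created.exists(_.wouldHalt.get)
    created.foreach(_.close()) // stop any leaked thread so the demo can exit
    halted
  }
}
```

Calling LeakDemo.run(closeAll = false) leaves the first instance running after 
the directory is deleted, which mirrors the failing case. With 
closeAll = true, every periodic task is stopped before the deletion, so the 
halt condition is never reached.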



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
