Luke Chen created KAFKA-12892:
---------------------------------

             Summary: InvalidACLException thrown in tests makes the Jenkins 
build unstable
                 Key: KAFKA-12892
                 URL: https://issues.apache.org/jira/browse/KAFKA-12892
             Project: Kafka
          Issue Type: Bug
            Reporter: Luke Chen
         Attachments: image-2021-06-04-21-05-57-222.png

In KAFKA-12866, we fixed the issue that Kafka required ZooKeeper root access 
even when using a chroot. But since that PR was merged (build #183), the trunk 
build keeps failing in at least one test group (mostly JDK 15 and Scala 2.13). 
The build output says nothing useful:
{code:java}
> Task :core:integrationTest FAILED
[2021-06-04T03:19:18.974Z] 
[2021-06-04T03:19:18.974Z] FAILURE: Build failed with an exception.
[2021-06-04T03:19:18.974Z] 
[2021-06-04T03:19:18.974Z] * What went wrong:
[2021-06-04T03:19:18.974Z] Execution failed for task ':core:integrationTest'.
[2021-06-04T03:19:18.974Z] > Process 'Gradle Test Executor 128' finished with 
non-zero exit value 1
[2021-06-04T03:19:18.974Z]   This problem might be caused by incorrect test 
process configuration.
[2021-06-04T03:19:18.974Z]   Please refer to the test execution section in the 
User Manual at 
https://docs.gradle.org/7.0.2/userguide/java_testing.html#sec:test_execution
{code}
 

After investigation, I found that the tests fail because many 
`InvalidACLException`s are thrown during the test run, for example:

 
{code:java}
GssapiAuthenticationTest > testServerNotFoundInKerberosDatabase() FAILED
[2021-06-04T02:25:45.419Z]     
org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = 
InvalidACL for /config/topics/__consumer_offsets
[2021-06-04T02:25:45.419Z]         at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:128)
[2021-06-04T02:25:45.419Z]         at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
[2021-06-04T02:25:45.419Z]         at 
kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:583)
[2021-06-04T02:25:45.419Z]         at 
kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1729)
[2021-06-04T02:25:45.419Z]         at 
kafka.zk.KafkaZkClient.createOrSet$1(KafkaZkClient.scala:366)
[2021-06-04T02:25:45.419Z]         at 
kafka.zk.KafkaZkClient.setOrCreateEntityConfigs(KafkaZkClient.scala:376)
[2021-06-04T02:25:45.419Z]         at 
kafka.zk.AdminZkClient.createTopicWithAssignment(AdminZkClient.scala:109)
[2021-06-04T02:25:45.419Z]         at 
kafka.zk.AdminZkClient.createTopic(AdminZkClient.scala:60)
[2021-06-04T02:25:45.419Z]         at 
kafka.utils.TestUtils$.$anonfun$createTopic$1(TestUtils.scala:357)
[2021-06-04T02:25:45.419Z]         at 
kafka.utils.TestUtils$.createTopic(TestUtils.scala:848)
[2021-06-04T02:25:45.419Z]         at 
kafka.utils.TestUtils$.createOffsetsTopic(TestUtils.scala:428)
[2021-06-04T02:25:45.419Z]         at 
kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:109)
[2021-06-04T02:25:45.419Z]         at 
kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:84)
[2021-06-04T02:25:45.419Z]         at 
kafka.server.GssapiAuthenticationTest.setUp(GssapiAuthenticationTest.scala:68)
{code}
 

The log can be found 
[here|https://ci-builds.apache.org/blue/rest/organizations/jenkins/pipelines/Kafka/pipelines/kafka/branches/trunk/runs/195/nodes/14/steps/145/log/?start=0].

Tracing back, I suspect the cause is the test added in KAFKA-12866, 
testChrootExistsAndRootIsLocked, which locks root access in ZooKeeper but 
somehow does not unlock it when the test finishes. All the tests that failed 
with InvalidACLException ran shortly after testChrootExistsAndRootIsLocked. 
For example, below testChrootExistsAndRootIsLocked completed at 02:24:30, and 
the failed test above ran at 02:25:45 (followed by more than 10 tests with the 
same InvalidACLException).
{code:java}
[2021-06-04T02:24:29.370Z] ZkClientAclTest > testChrootExistsAndRootIsLocked() 
STARTED
[2021-06-04T02:24:30.321Z] 
[2021-06-04T02:24:30.321Z] ZkClientAclTest > testChrootExistsAndRootIsLocked() 
PASSED{code}
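
To check this timing correlation across a whole build log rather than by eye, 
something like the following sketch could help. It is only an illustration: 
the function names are hypothetical, and it assumes the raw Jenkins log has one 
entry per line, each prefixed with a `[...Z]` timestamp (the excerpts in this 
issue are hard-wrapped, so the sample lines below are re-joined by hand).

```python
import re
from datetime import datetime

# Jenkins timestamp prefix, e.g. "[2021-06-04T02:24:30.321Z] rest of line".
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\]\s?(.*)$")


def parse(line):
    """Split a log line into (timestamp, text); timestamp is None if absent."""
    m = TS.match(line)
    if not m:
        return None, line.strip()
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return ts, m.group(2).strip()


def acl_failures_after(log_lines, lock_test="testChrootExistsAndRootIsLocked"):
    """List (seconds_since_lock_test_passed, text) for each
    InvalidACLException line logged after lock_test reported PASSED."""
    lock_done = None
    hits = []
    for line in log_lines:
        ts, text = parse(line)
        if ts is None:
            continue  # lines without a timestamp cannot be ordered; skip them
        if lock_test in text and text.endswith("PASSED"):
            lock_done = ts
        elif lock_done is not None and "InvalidACLException" in text:
            hits.append(((ts - lock_done).total_seconds(), text))
    return hits


# Sample lines taken from the excerpts in this issue, re-joined onto one line.
sample = [
    "[2021-06-04T02:24:29.370Z] ZkClientAclTest > testChrootExistsAndRootIsLocked() STARTED",
    "[2021-06-04T02:24:30.321Z] ZkClientAclTest > testChrootExistsAndRootIsLocked() PASSED",
    "[2021-06-04T02:25:45.419Z]     org.apache.zookeeper.KeeperException$InvalidACLException: "
    "KeeperErrorCode = InvalidACL for /config/topics/__consumer_offsets",
]

for delay, text in acl_failures_after(sample):
    print(f"+{delay:.3f}s  {text}")
```

If every InvalidACLException in a failing run shows up only after the lock test 
has passed, and none show up in runs where that test did not execute in the 
same test executor, that would support the suspicion above. It would not prove 
it, since the sketch only looks at ordering in the log.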
 

!image-2021-06-04-21-05-57-222.png|width=489,height=1111!

We should investigate further how to improve the test so that it does not 
break the build. Until then, we can disable the test. Thanks.


