[jira] [Commented] (KAFKA-6332) Kafka system tests should use nc instead of log grep to detect start-up
[ https://issues.apache.org/jira/browse/KAFKA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791592#comment-16791592 ] Attila Sasvari commented on KAFKA-6332: --- Feel free to take it over. > Kafka system tests should use nc instead of log grep to detect start-up > --- > > Key: KAFKA-6332 > URL: https://issues.apache.org/jira/browse/KAFKA-6332 > Project: Kafka > Issue Type: Bug > Reporter: Ismael Juma > Priority: Major > Labels: newbie > > [~ewencp] suggested using an {{nc -z}} check instead of grepping the logs, for a more > reliable test. This came up when the system tests were broken by a log > improvement change. > Reference: https://github.com/apache/kafka/pull/3834 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-6332) Kafka system tests should use nc instead of log grep to detect start-up
[ https://issues.apache.org/jira/browse/KAFKA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-6332: Assignee: (was: Attila Sasvari) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
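The suggested probe can be sketched as a plain TCP connect loop, which is what {{nc -z host port}} does: poll the listener port until it accepts a connection instead of grepping logs that break whenever a log message changes. Host, port, and timeout values below are illustrative, not from the Kafka patch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// nc -z style start-up probe: succeed once the TCP port accepts a connection.
public class PortProbe {
    /** Returns true once a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean waitForPort(String host, int port, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 500);
                return true;                        // port is accepting connections
            } catch (IOException e) {
                try { Thread.sleep(100); } catch (InterruptedException ie) { return false; }
            }
        }
        return false;                               // timed out: service not up yet
    }

    public static void main(String[] args) {
        // 9092 is the conventional Kafka listener port; adjust as needed.
        System.out.println(waitForPort("localhost", 9092, 2000) ? "up" : "down");
    }
}
```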
[jira] [Created] (KAFKA-7813) JmxTool throws NPE when --object-name is omitted
Attila Sasvari created KAFKA-7813: - Summary: JmxTool throws NPE when --object-name is omitted Key: KAFKA-7813 URL: https://issues.apache.org/jira/browse/KAFKA-7813 Project: Kafka Issue Type: Bug Reporter: Attila Sasvari Running the JMX tool without the --object-name parameter results in a NullPointerException: {code} $ bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi ... Exception in thread "main" java.lang.NullPointerException at kafka.tools.JmxTool$$anonfun$3.apply(JmxTool.scala:143) at kafka.tools.JmxTool$$anonfun$3.apply(JmxTool.scala:143) at scala.collection.LinearSeqOptimized$class.exists(LinearSeqOptimized.scala:93) at scala.collection.immutable.List.exists(List.scala:84) at kafka.tools.JmxTool$.main(JmxTool.scala:143) at kafka.tools.JmxTool.main(JmxTool.scala) {code} Documentation of the tool says: {code} --object-name A JMX object name to use as a query. This can contain wild cards, and this option can be given multiple times to specify more than one query. If no objects are specified all objects will be queried. {code} Running the tool with {{--object-name ''}} also results in an NPE: {code} $ bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name '' ...
Exception in thread "main" java.lang.NullPointerException at kafka.tools.JmxTool$.main(JmxTool.scala:197) at kafka.tools.JmxTool.main(JmxTool.scala) {code} Running the tool with --object-name but no argument fails with an OptionMissingRequiredArgumentException: {code} $ bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name Exception in thread "main" joptsimple.OptionMissingRequiredArgumentException: Option object-name requires an argument at joptsimple.RequiredArgumentOptionSpec.detectOptionArgument(RequiredArgumentOptionSpec.java:48) at joptsimple.ArgumentAcceptingOptionSpec.handleOption(ArgumentAcceptingOptionSpec.java:257) at joptsimple.OptionParser.handleLongOptionToken(OptionParser.java:513) at joptsimple.OptionParserState$2.handleArgument(OptionParserState.java:56) at joptsimple.OptionParser.parse(OptionParser.java:396) at kafka.tools.JmxTool$.main(JmxTool.scala:104) at kafka.tools.JmxTool.main(JmxTool.scala) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
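The documented fallback ("if no objects are specified all objects will be queried") can be implemented without dereferencing a null option value, because the JMX API already treats a null {{ObjectName}} pattern as "match everything". A minimal sketch, using the in-process platform MBean server as a stand-in for a remote JMX connection:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// When no --object-name is supplied, query all MBeans instead of NPE-ing.
public class QueryAllMBeans {
    public static Set<ObjectName> query(MBeanServerConnection conn, ObjectName pattern)
            throws java.io.IOException {
        // Per the JMX spec, a null pattern matches every registered MBean.
        return conn.queryNames(pattern, null);
    }

    public static void main(String[] args) throws Exception {
        Set<ObjectName> all = query(ManagementFactory.getPlatformMBeanServer(), null);
        System.out.println("MBeans found: " + all.size());
    }
}
```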
[jira] [Commented] (KAFKA-7752) zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended
[ https://issues.apache.org/jira/browse/KAFKA-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726830#comment-16726830 ] Attila Sasvari commented on KAFKA-7752: --- In the kafka.zk.ZkData class, ZkAclStore.securePaths contains the following paths: {code} 0 = "/kafka-acl" 1 = "/kafka-acl-changes" 2 = "/kafka-acl-extended/prefixed" 3 = "/kafka-acl-extended-changes" {code} When the migrator tool runs, ZkUtils.SecureZkRootPaths contains: {code} result = {$colon$colon@2669} "::" size = 14 0 = "/admin" 1 = "/brokers" 2 = "/cluster" 3 = "/config" 4 = "/controller" 5 = "/controller_epoch" 6 = "/isr_change_notification" 7 = "/latest_producer_id_block" 8 = "/log_dir_event_notification" 9 = "/delegation_token" 10 = "/kafka-acl" 11 = "/kafka-acl-changes" 12 = "/kafka-acl-extended/prefixed" 13 = "/kafka-acl-extended-changes" {code} The code then recursively traverses these paths and sets ACLs on all the child znodes. As a result, {{/kafka-acl-extended}} itself is missed. > zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended > - > > Key: KAFKA-7752 > URL: https://issues.apache.org/jira/browse/KAFKA-7752 > Project: Kafka > Issue Type: Bug > Components: tools > Affects Versions: 2.0.0 > Reporter: Attila Sasvari > Assignee: Attila Sasvari > Priority: Major > > Executed {{zookeeper-security-migration.sh --zookeeper.connect $(hostname > -f):2181/kafka --zookeeper.acl secure}} to secure Kafka znodes and then > {{zookeeper-security-migration.sh --zookeeper.connect $(hostname > -f):2181/kafka --zookeeper.acl unsecure}} to unsecure those. > I noticed that the tool did not remove ACLs on certain nodes: > {code} > ] getAcl /kafka/kafka-acl-extended > 'world,'anyone > : r > 'sasl,'kafka > : cdrwa > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
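One way to close the traversal gap described above is to expand the secure-path list so that every ancestor znode of each listed path is included before ACLs are changed; then {{/kafka-acl-extended/prefixed}} pulls in {{/kafka-acl-extended}} as well. This is a hypothetical helper to illustrate the idea, not the actual Kafka patch:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Expand znode paths so parents like /kafka-acl-extended are not skipped.
public class SecurePathExpander {
    public static Set<String> withAncestors(List<String> paths) {
        Set<String> out = new LinkedHashSet<>();
        for (String path : paths) {
            StringBuilder prefix = new StringBuilder();
            for (String part : path.substring(1).split("/")) {
                prefix.append('/').append(part);   // accumulate /a, then /a/b, ...
                out.add(prefix.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withAncestors(Arrays.asList("/kafka-acl-extended/prefixed")));
    }
}
```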
[jira] [Assigned] (KAFKA-7752) zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended
[ https://issues.apache.org/jira/browse/KAFKA-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-7752: Assignee: Attila Sasvari -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7752) zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended
Attila Sasvari created KAFKA-7752: - Summary: zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended Key: KAFKA-7752 URL: https://issues.apache.org/jira/browse/KAFKA-7752 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 2.0.0 Reporter: Attila Sasvari Executed {{zookeeper-security-migration.sh --zookeeper.connect $(hostname -f):2181/kafka --zookeeper.acl secure}} to secure Kafka znodes and then {{zookeeper-security-migration.sh --zookeeper.connect $(hostname -f):2181/kafka --zookeeper.acl unsecure}} to unsecure those. I noticed that the tool did not remove ACLs on certain nodes: {code} ] getAcl /kafka/kafka-acl-extended 'world,'anyone : r 'sasl,'kafka : cdrwa {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7631) NullPointerException when SCRAM is allowed but ScramLoginModule is not in broker's jaas.conf
[ https://issues.apache.org/jira/browse/KAFKA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-7631: - Assignee: Viktor Somogyi (was: Attila Sasvari) > NullPointerException when SCRAM is allowed but ScramLoginModule is not in > broker's jaas.conf > --- > > Key: KAFKA-7631 > URL: https://issues.apache.org/jira/browse/KAFKA-7631 > Project: Kafka > Issue Type: Improvement > Components: security > Affects Versions: 2.0.0 > Reporter: Andras Beni > Assignee: Viktor Somogyi > Priority: Minor > > When a user wants to use delegation tokens and lists {{SCRAM}} in > {{sasl.enabled.mechanisms}}, but does not add {{ScramLoginModule}} to the > broker's JAAS configuration, a null pointer exception is thrown on the broker > side and the connection is closed. > A meaningful error message should be logged and sent back to the client. > {code} > java.lang.NullPointerException > at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.handleSaslToken(SaslServerAuthenticator.java:376) > at > org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:262) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:127) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:489) > at org.apache.kafka.common.network.Selector.poll(Selector.java:427) > at kafka.network.Processor.poll(SocketServer.scala:679) > at kafka.network.Processor.run(SocketServer.scala:584) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
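The "meaningful error message" the report asks for amounts to a null guard before the token is handled, so the misconfiguration fails with an explanation instead of an NPE deep in {{handleSaslToken()}}. A hypothetical shape of such a guard (the real fix would live in SaslServerAuthenticator):

```java
// Fail fast with a descriptive message when the SASL server was never created
// because the matching login module is missing from the broker JAAS config.
public class SaslGuard {
    public static void checkSaslServer(Object saslServer, String mechanism) {
        if (saslServer == null) {
            throw new IllegalStateException(
                "SASL mechanism " + mechanism + " is enabled but no matching "
                + "login module (e.g. ScramLoginModule) is present in the broker JAAS config");
        }
    }

    public static void main(String[] args) {
        checkSaslServer(new Object(), "PLAIN");   // configured mechanism: no error
        checkSaslServer(null, "SCRAM-SHA-256");   // throws with an explanation
    }
}
```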
[jira] [Commented] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured broker
[ https://issues.apache.org/jira/browse/KAFKA-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708633#comment-16708633 ] Attila Sasvari commented on KAFKA-7696: --- Thanks [~omkreddy]! Resolving this as a duplicate. > kafka-delegation-tokens.sh using a config file that contains > security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to > connect to an SSL-enabled secured broker > - > > Key: KAFKA-7696 > URL: https://issues.apache.org/jira/browse/KAFKA-7696 > Project: Kafka > Issue Type: Bug > Components: tools > Affects Versions: 1.1.0, 2.0.0 > Reporter: Attila Sasvari > Assignee: Viktor Somogyi > Priority: Major > > When the command-config file of kafka-delegation-tokens contains > security.protocol=SASL_PLAINTEXT instead of SASL_SSL (e.g. due to a user > error), the process throws a java.lang.OutOfMemoryError upon a connection > attempt to a secured (i.e. Kerberized, SSL-enabled) Kafka broker. > {code} > [2018-12-03 11:27:13,221] ERROR Uncaught exception in thread > 'kafka-admin-client-thread | adminclient-1': > (org.apache.kafka.common.utils.KafkaThread) > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at > org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:407) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveKafkaResponse(SaslClientAuthenticator.java:497) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:207) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:533) >
at org.apache.kafka.common.network.Selector.poll(Selector.java:468) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535) > at > org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1125) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
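The stack trace hints at why the heap blows up rather than producing a clean handshake error: with SASL_PLAINTEXT the client reads the broker's TLS handshake bytes as if they were a plaintext Kafka response, so the leading bytes of a TLS record (0x16 0x03 ...) get interpreted by NetworkReceive as a big-endian int32 "response size", and the client tries to allocate a buffer of that size. This is an explanatory sketch of the misinterpretation, not Kafka code:

```java
import java.nio.ByteBuffer;

// Interpreting the first four bytes of a TLS record as a Kafka size prefix.
public class TlsBytesAsLength {
    public static int misreadSize(byte[] firstFour) {
        return ByteBuffer.wrap(firstFour).getInt();  // big-endian int32, as on the wire
    }

    public static void main(String[] args) {
        // 0x16 = TLS handshake record type, 0x0301 = record-layer version
        int size = misreadSize(new byte[]{0x16, 0x03, 0x01, 0x00});
        System.out.println("Misread 'response size': " + size + " bytes (~"
                + (size / (1024 * 1024)) + " MiB)");
    }
}
```

A ~352 MiB allocation request for a single "response" is what drives the heap over the edge.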
[jira] [Resolved] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured broker
[ https://issues.apache.org/jira/browse/KAFKA-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari resolved KAFKA-7696. --- Resolution: Duplicate -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured broker
[ https://issues.apache.org/jira/browse/KAFKA-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-7696: Assignee: Viktor Somogyi -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured broker
Attila Sasvari created KAFKA-7696: - Summary: kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured broker Key: KAFKA-7696 URL: https://issues.apache.org/jira/browse/KAFKA-7696 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 2.0.0, 1.1.0 Reporter: Attila Sasvari When the command-config file of kafka-delegation-tokens contains security.protocol=SASL_PLAINTEXT instead of SASL_SSL (e.g. due to a user error), the process throws a java.lang.OutOfMemoryError upon a connection attempt to a secured (i.e. Kerberized, SSL-enabled) Kafka broker. {code} [2018-12-03 11:27:13,221] ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1': (org.apache.kafka.common.utils.KafkaThread) java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30) at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112) at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:407) at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveKafkaResponse(SaslClientAuthenticator.java:497) at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:207) at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173) at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:533) at org.apache.kafka.common.network.Selector.poll(Selector.java:468) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535) at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1125) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7691) Encypt-then-MAC Delegation token metadata
[ https://issues.apache.org/jira/browse/KAFKA-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-7691: Assignee: Attila Sasvari -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7691) Encypt-then-MAC Delegation token metadata
Attila Sasvari created KAFKA-7691: - Summary: Encypt-then-MAC Delegation token metadata Key: KAFKA-7691 URL: https://issues.apache.org/jira/browse/KAFKA-7691 Project: Kafka Issue Type: Improvement Reporter: Attila Sasvari Currently delegation token metadata is stored unencrypted in Zookeeper. Kafka brokers could implement a strategy called [Encrypt-then-MAC|https://en.wikipedia.org/wiki/Authenticated_encryption#Encrypt-then-MAC_(EtM)] to encrypt sensitive metadata information about delegation tokens. For more details, please read https://cwiki.apache.org/confluence/display/KAFKA/KIP-395%3A+Encypt-then-MAC+Delegation+token+metadata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
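The Encrypt-then-MAC idea above can be sketched with standard javax.crypto primitives: encrypt the token metadata first, then compute an HMAC over the IV and ciphertext, and verify that MAC before ever decrypting. This is a simplified illustration of the construction, assuming AES-CBC and HmacSHA256 and toy key handling; it is not the KIP-395 implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Encrypt-then-MAC: the MAC covers IV || ciphertext; output is IV || ct || tag.
public class EncryptThenMac {
    public static byte[] seal(SecretKey encKey, SecretKey macKey, byte[] plaintext) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, encKey, new IvParameterSpec(iv));
        byte[] ct = c.doFinal(plaintext);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        mac.update(iv);
        byte[] tag = mac.doFinal(ct);              // MAC over IV || ciphertext
        byte[] out = new byte[16 + ct.length + 32];
        System.arraycopy(iv, 0, out, 0, 16);
        System.arraycopy(ct, 0, out, 16, ct.length);
        System.arraycopy(tag, 0, out, 16 + ct.length, 32);
        return out;
    }

    public static byte[] open(SecretKey encKey, SecretKey macKey, byte[] sealed) throws Exception {
        byte[] iv = Arrays.copyOfRange(sealed, 0, 16);
        byte[] ct = Arrays.copyOfRange(sealed, 16, sealed.length - 32);
        byte[] tag = Arrays.copyOfRange(sealed, sealed.length - 32, sealed.length);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        mac.update(iv);
        if (!MessageDigest.isEqual(tag, mac.doFinal(ct)))
            throw new SecurityException("MAC check failed: metadata tampered or wrong key");
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, encKey, new IvParameterSpec(iv));
        return c.doFinal(ct);                      // decrypt only after the MAC verifies
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey encKey = kg.generateKey();
        SecretKey macKey = new SecretKeySpec(kg.generateKey().getEncoded(), "HmacSHA256");
        byte[] sealed = seal(encKey, macKey, "tokenId=abc,owner=alice".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(open(encKey, macKey, sealed), StandardCharsets.UTF_8));
    }
}
```

Verifying the tag before decryption is what gives Encrypt-then-MAC its integrity guarantee: a tampered ciphertext is rejected without ever touching the cipher.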
[jira] [Assigned] (KAFKA-7631) NullPointerException when SCRAM is allowed but ScramLoginModule is not in broker's jaas.conf
[ https://issues.apache.org/jira/browse/KAFKA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-7631: Assignee: Attila Sasvari -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7455) JmxTool cannot connect to an SSL-enabled JMX RMI port
Attila Sasvari created KAFKA-7455: - Summary: JmxTool cannot connect to an SSL-enabled JMX RMI port Key: KAFKA-7455 URL: https://issues.apache.org/jira/browse/KAFKA-7455 Project: Kafka Issue Type: Bug Components: tools Reporter: Attila Sasvari When JmxTool tries to connect to an SSL-enabled JMX RMI port with JMXConnectorFactory.connect(), the connection attempt results in a "java.rmi.ConnectIOException: non-JRMP server at remote endpoint": {code} $ export KAFKA_OPTS="-Djavax.net.ssl.trustStore=/tmp/kafka.server.truststore.jks -Djavax.net.ssl.trustStorePassword=test" $ bin/kafka-run-class.sh kafka.tools.JmxTool --object-name "kafka.server:type=kafka-metrics-count" --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9393/jmxrmi ConnectIOException: non-JRMP server at remote endpoint]. java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: non-JRMP server at remote endpoint] at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369) at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270) at kafka.tools.JmxTool$.main(JmxTool.scala:120) at kafka.tools.JmxTool.main(JmxTool.scala) {code} The problem is that {{JmxTool}} does not specify {{SslRMIClientSocketFactory}} when it tries to connect https://github.com/apache/kafka/blob/70d90c371833b09cf934c8c2358171433892a085/core/src/main/scala/kafka/tools/JmxTool.scala#L120 {code} jmxc = JMXConnectorFactory.connect(url, null) {code} To connect to a secured RMI port, it should pass an environment map that contains a {{("com.sun.jndi.rmi.factory.socket", new SslRMIClientSocketFactory)}} entry. More info: - https://docs.oracle.com/cd/E19698-01/816-7609/security-35/index.html - https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
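The environment map the description calls for can be sketched as follows; JmxTool would pass it as the second argument of {{JMXConnectorFactory.connect()}} instead of {{null}}, so the JNDI/RMI registry lookup goes over SSL:

```java
import java.util.HashMap;
import java.util.Map;
import javax.rmi.ssl.SslRMIClientSocketFactory;

// Connection environment telling the RMI connector to use SSL sockets.
public class SslJmxEnv {
    public static Map<String, Object> sslEnv() {
        Map<String, Object> env = new HashMap<>();
        // JNDI/RMI hook: use an SSL socket factory when contacting the registry.
        env.put("com.sun.jndi.rmi.factory.socket", new SslRMIClientSocketFactory());
        return env;
    }
    // Usage (requires a reachable SSL-enabled JMX endpoint):
    //   JMXConnector jmxc = JMXConnectorFactory.connect(url, SslJmxEnv.sslEnv());
}
```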
[jira] [Commented] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands
[ https://issues.apache.org/jira/browse/KAFKA-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620538#comment-16620538 ] Attila Sasvari commented on KAFKA-7418: --- - I started a discussion thread about the KIP. - [~mrsrinivas] please request contributor permissions via an email sent to d...@kafka.apache.org (including your JIRA ID). A Kafka committer or PMC member will add you to the Kafka contributors, and then you will be able to assign the JIRA to yourself. > Add '--help' option to all available Kafka CLI commands > > > Key: KAFKA-7418 > URL: https://issues.apache.org/jira/browse/KAFKA-7418 > Project: Kafka > Issue Type: Improvement > Components: tools > Reporter: Attila Sasvari > Priority: Major > > Currently, the '--help' option is not recognized by some Kafka commands. For > example: > {code} > $ kafka-console-producer --help > help is not a recognized option > {code} > However, the '--help' option is supported by other commands: > {code} > $ kafka-verifiable-producer --help > usage: verifiable-producer [-h] --topic TOPIC --broker-list > HOST1:PORT1[,HOST2:PORT2[...]] [--max-messages MAX-MESSAGES] [--throughput > THROUGHPUT] [--acks ACKS] > [--producer.config CONFIG_FILE] > [--message-create-time CREATETIME] [--value-prefix VALUE-PREFIX] > ... > {code} > To provide a consistent user experience, it would be nice to add a '--help' > option to all Kafka commands. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands
[ https://issues.apache.org/jira/browse/KAFKA-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620470#comment-16620470 ] Attila Sasvari commented on KAFKA-7418: --- Created a KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-373%3A+Add+%27--help%27+option+to+all+available+Kafka+CLI+commands I will send out an email to the kafka-dev mailing list soon. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands
[ https://issues.apache.org/jira/browse/KAFKA-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620459#comment-16620459 ] Attila Sasvari commented on KAFKA-7418: --- [~mrsrinivas] sure, that is totally fine with me. However, a KIP will probably be needed: we are adding a new option to the commands, so their public interface changes (though it is only an addition and should be backward compatible). [~ijuma], [~hachikuji] what do you think? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands
Attila Sasvari created KAFKA-7418: - Summary: Add '--help' option to all available Kafka CLI commands Key: KAFKA-7418 URL: https://issues.apache.org/jira/browse/KAFKA-7418 Project: Kafka Issue Type: Improvement Components: tools Reporter: Attila Sasvari Currently, the '--help' option is not recognized by some Kafka commands. For example: {code} $ kafka-console-producer --help help is not a recognized option {code} However, the '--help' option is supported by other commands: {code} $ kafka-verifiable-producer --help usage: verifiable-producer [-h] --topic TOPIC --broker-list HOST1:PORT1[,HOST2:PORT2[...]] [--max-messages MAX-MESSAGES] [--throughput THROUGHPUT] [--acks ACKS] [--producer.config CONFIG_FILE] [--message-create-time CREATETIME] [--value-prefix VALUE-PREFIX] ... {code} To provide a consistent user experience, it would be nice to add a '--help' option to all Kafka commands. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
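The requested behavior boils down to recognizing {{--help}} uniformly before the option parser rejects it as unknown. A generic sketch of that check (names and usage text are illustrative, not the actual Kafka patch):

```java
// Recognize --help / -h before normal option parsing, for every CLI command.
public class HelpOption {
    public static boolean wantsHelp(String[] args) {
        for (String a : args) {
            if (a.equals("--help") || a.equals("-h")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        if (wantsHelp(args)) {
            System.out.println("usage: kafka-console-producer --topic TOPIC --broker-list HOST:PORT [...]");
            return;
        }
        // ... normal option parsing continues here ...
    }
}
```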
[jira] [Assigned] (KAFKA-7392) Allow to specify subnet for Docker containers using standard CIDR notation
[ https://issues.apache.org/jira/browse/KAFKA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-7392: Assignee: Attila Sasvari -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7392) Allow to specify subnet for Docker containers using standard CIDR notation
Attila Sasvari created KAFKA-7392: - Summary: Allow to specify subnet for Docker containers using standard CIDR notation Key: KAFKA-7392 URL: https://issues.apache.org/jira/browse/KAFKA-7392 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Attila Sasvari During Kafka system test execution, the IP range of the Docker subnet, 'ducknet' is allocated by Docker. {code} docker network inspect ducknet [ { "Name": "ducknet", "Id": "f4325c524feee777817b9cc57b91634e20de96127409c1906c2c156bfeb4beeb", "Created": "2018-09-09T11:53:40.4332613Z", "Scope": "local", "Driver": "bridge", "EnableIPv6": false, "IPAM": { "Driver": "default", "Options": {}, "Config": [ { "Subnet": "172.23.0.0/16", "Gateway": "172.23.0.1" } ] }, {code} The default bridge (docker0) can be controlled [externally|https://success.docker.com/article/how-do-i-configure-the-default-bridge-docker0-network-for-docker-engine-to-a-different-subnet] through /etc/docker/daemon.json; the subnet created for ducknet, however, is not. This can be a problem, as many businesses make extensive use of the [RFC1918|https://tools.ietf.org/html/rfc1918] private address space (such as 172.16.0.0/12 : 172.16.0.0 - 172.31.255.255) for internal networks (e.g. VPN). h4. Proposed changes: - Introduce a new subnet argument that can be used by {{ducker-ak up}} to specify the IP range using standard CIDR notation, and extend the help message with the following: {code} If --subnet is specified, the default Docker subnet is overridden by the given IP address and netmask, using standard CIDR notation. For example: 192.168.1.5/24. {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
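A --subnet argument like the one proposed would need to be validated before being handed to Docker. The sketch below is illustrative only (ducker-ak itself is a bash script, and `parse_subnet` is a hypothetical helper name); it uses Python's standard ipaddress module to check a CIDR value such as the 192.168.1.5/24 example from the proposal:

```python
import ipaddress

def parse_subnet(cidr):
    """Validate a --subnet value given in standard CIDR notation.

    Returns the network that would be passed to Docker (e.g. via
    `docker network create --subnet ...`). strict=False accepts a host
    address inside the range, as in the proposal's example 192.168.1.5/24.
    """
    try:
        return ipaddress.ip_network(cidr, strict=False)
    except ValueError as err:
        raise SystemExit("invalid --subnet %r: %s" % (cidr, err))

net = parse_subnet("192.168.1.5/24")
print(net)                # 192.168.1.0/24
print(net.num_addresses)  # 256
```

Note that strict=False is what makes the host-style example from the help text acceptable; with strict=True, 192.168.1.5/24 would be rejected because it has host bits set.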
[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication
[ https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607140#comment-16607140 ] Attila Sasvari commented on KAFKA-4544: --- Thanks [~omkreddy], I see your point. Today it is not possible, as --consumer-property is not exposed by the python wrapper (nor is KAFKA_OPTS), but my patch will make console_consumer.py smarter about this. > Add system tests for delegation token based authentication > -- > > Key: KAFKA-4544 > URL: https://issues.apache.org/jira/browse/KAFKA-4544 > Project: Kafka > Issue Type: Sub-task > Components: security >Reporter: Ashish Singh >Assignee: Attila Sasvari >Priority: Major > > Add system tests for delegation token based authentication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication
[ https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605944#comment-16605944 ] Attila Sasvari commented on KAFKA-4544: --- [~omkreddy] thanks for the info, I extended the test case to better cover the lifecycle of a delegation token based on your idea: - Create delegation token - Create a console-producer using SCRAM and delegation token and produce a message - Verify message is created (with kafka.search_data_files()) - Create a console-consumer using SCRAM and delegation token and consume 1 message - Expire the token immediately - Try producing one more message with the expired token - Verify the last message is not persisted by the broker Initially, I wanted to use console_consumer.py and verifiable clients to validate things (messages produced / consumed), but I ran into some issues: - jaas.conf / KafkaClient config cannot include more than one login module {code} Multiple LoginModule-s in JAAS Caused by: java.lang.IllegalArgumentException: JAAS config property contains 2 login modules, should be 1 module at org.apache.kafka.common.security.JaasContext.load(JaasContext.java:95) at org.apache.kafka.common.security.JaasContext.loadClientContext(JaasContext.java:84) at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:119) at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:65) at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:88) at org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:419) {code} - To request a delegation token, we need GSSAPI (and a keytab); subsequently, consumers and producers use the delegation token. So I ended up constructing the jaas.conf and client configs manually in my POC. 
- Both with and without my changes, JMX failed to start up when I tried to run {{./ducker-ak test ../kafkatest/sanity_checks/test_console_consumer.py}}: {code} Exception in thread ConsoleConsumer-0-140287252789520-worker-1: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs) File "/usr/local/lib/python2.7/dist-packages/ducktape/services/background_thread.py", line 35, in _protected_worker self._worker(idx, node) File "/opt/kafka-dev/tests/kafkatest/services/console_consumer.py", line 229, in _worker self.start_jmx_tool(idx, node) File "/opt/kafka-dev/tests/kafkatest/services/monitor/jmx.py", line 86, in start_jmx_tool wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account) File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 36, in wait_until raise TimeoutError(err_msg) TimeoutError: ducker@ducker04: Jmx tool took too long to start {code} Right now a lot of things are [hardcoded|https://github.com/asasvari/kafka/commit/edfc37012079764d2a589dbf5f24ad04505975d4#diff-3e7b2bdbd55d075bcebbbe5ba8c4e269] (using shell scripts) in my POC. It would be nice to extract common functionalities and make them easily reusable (e.g. creating python wrappers for delegation token handling). > Add system tests for delegation token based authentication > -- > > Key: KAFKA-4544 > URL: https://issues.apache.org/jira/browse/KAFKA-4544 > Project: Kafka > Issue Type: Sub-task > Components: security >Reporter: Ashish Singh >Assignee: Attila Sasvari >Priority: Major > > Add system tests for delegation token based authentication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
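Since a JAAS context may contain only one login module (the IllegalArgumentException quoted above), a common workaround is to keep GSSAPI in the static jaas.conf for obtaining the token, and give token-based clients their own single-module sasl.jaas.config property. A minimal sketch of building that property string (Python, illustrative; `scram_token_jaas_config` is a hypothetical helper name, while the tokenauth=true option of ScramLoginModule is the documented way for a client to authenticate with a delegation token):

```python
def scram_token_jaas_config(token_id, hmac):
    """Build a client-side sasl.jaas.config value for delegation-token
    authentication: exactly one login module, so it never trips the
    "contains 2 login modules, should be 1 module" check quoted above."""
    return (
        'org.apache.kafka.common.security.scram.ScramLoginModule required '
        'username="%s" password="%s" tokenauth=true;' % (token_id, hmac)
    )

# token_id and hmac are placeholders for the values printed by
# kafka-delegation-tokens.sh --create
print(scram_token_jaas_config("MTIz", "lAYYSFmLs4bTjf+lTZ1LCHR/ZZFNA=="))
```

The resulting string goes into the client properties file (sasl.jaas.config=...), leaving the broker-facing jaas.conf with its single GSSAPI module untouched.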
[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication
[ https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604530#comment-16604530 ] Attila Sasvari commented on KAFKA-4544: --- [~omkreddy] thanks a lot! - I started to work on this here: [https://github.com/asasvari/kafka/commit/6ce766c3ec17b7787415d278a9b59f15ed197c1c], can you take a quick look? - What kind of tests did you plan? We might also want to create subtasks for this ticket, as it would be easier to review the changes. - So far I have only added a new test to verify that we can create a delegation token with {{kafka-delegation-tokens.sh}}. - I believe another basic test is to start a console application (e.g. kafka-console-consumer) and test whether it can connect to the broker using the previously generated delegation token. > Add system tests for delegation token based authentication > -- > > Key: KAFKA-4544 > URL: https://issues.apache.org/jira/browse/KAFKA-4544 > Project: Kafka > Issue Type: Sub-task > Components: security >Reporter: Ashish Singh >Assignee: Attila Sasvari >Priority: Major > > Add system tests for delegation token based authentication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-4544) Add system tests for delegation token based authentication
[ https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-4544: - Assignee: Attila Sasvari (was: Manikumar) > Add system tests for delegation token based authentication > -- > > Key: KAFKA-4544 > URL: https://issues.apache.org/jira/browse/KAFKA-4544 > Project: Kafka > Issue Type: Sub-task > Components: security >Reporter: Ashish Singh >Assignee: Attila Sasvari >Priority: Major > > Add system tests for delegation token based authentication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication
[ https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598557#comment-16598557 ] Attila Sasvari commented on KAFKA-4544: --- Do you have the capacity to work on this [~omkreddy]? If there's anything I can do to help, please let me know. > Add system tests for delegation token based authentication > -- > > Key: KAFKA-4544 > URL: https://issues.apache.org/jira/browse/KAFKA-4544 > Project: Kafka > Issue Type: Sub-task > Components: security >Reporter: Ashish Singh >Assignee: Manikumar >Priority: Major > > Add system tests for delegation token based authentication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7289) Performance tools should allow user to specify output type
[ https://issues.apache.org/jira/browse/KAFKA-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580792#comment-16580792 ] Attila Sasvari commented on KAFKA-7289: --- - I will start the KIP soon. It might be enough to deal with CSV in the first run. - I have started to work on a POC that allows ProducerPerformance to print out final results to an output file in CSV format: https://github.com/asasvari/kafka/commit/82bbff649c5afb2c30f56960172319d2c380fbcd. It will probably be a subtask of this JIRA. It uses Apache Commons CSV. {{--print-metrics}} is not handled (metrics are not written to the output file of the final results - could be handled in KAFKA-1939). {code} $ bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance ... --output-with-header Print out final results to output file with headers. (default: false) --output-type OUTPUT-TYPE Format type of the output file. By default it is CSV. (default: csv) --output-path OUTPUT-PATH Write final results to the file OUTPUT-PATH. $ bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic TOPIC --num-records 1 --throughput -1 --record-size 100 --producer-props bootstrap.servers=localhost:9092 --output-path producer_stats.csv --output-with-header $ cat producer_stats.csv records sent,records/sec,MB/sec,ms avg latency,ms max latency,ms 50th,ms 95th,ms 99th,ms 99.9th 1,6.7114093959731544,6.400498767827181E-4,142.0,142.0,142,142,142,142 {code} > Performance tools should allow user to specify output type > -- > > Key: KAFKA-7289 > URL: https://issues.apache.org/jira/browse/KAFKA-7289 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 2.0.0 >Reporter: Attila Sasvari >Assignee: Attila Sasvari >Priority: Major > > Currently, org.apache.kafka.tools.ProducerPerformance and > kafka.tools.ConsumerPerformance do not provide command line options to > specify output type(s). 
> Sample output of ProducerPerformance is as follows: > {code} > 1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg > latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, > 3842 ms 99.9th. > {code} > It would be, however, nice to allow users to generate performance reports in > a machine-readable format (such as CSV and JSON). This way, performance > results could be easily processed by external applications (e.g. displayed in > charts). > It will probably require a KIP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
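The producer_stats.csv layout shown in the POC output above is straightforward to reproduce with a plain CSV writer. This sketch uses Python's stdlib csv module rather than the Apache Commons CSV library used in the POC (`stats_to_csv` is a hypothetical helper name; the header names are copied from the sample output):

```python
import csv
import io

# Column names taken verbatim from the producer_stats.csv sample above.
HEADER = ["records sent", "records/sec", "MB/sec", "ms avg latency",
          "ms max latency", "ms 50th", "ms 95th", "ms 99th", "ms 99.9th"]

def stats_to_csv(rows, with_header=True):
    """Render final performance results as CSV text, mirroring the
    --output-with-header behaviour described in the comment above."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    if with_header:
        writer.writerow(HEADER)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

# One run's final results, shaped like the sample output in the issue.
print(stats_to_csv([[1000, 48107.452807, 9.18, 3284.34, 3858.00,
                     3313, 3546, 3689, 3842]]))
```

Keeping the rendering behind a single function like this is also what makes a later --output-type switch (CSV vs. JSON) easy to add: only the renderer changes, not the measurement code.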
[jira] [Assigned] (KAFKA-7289) Performance tools should allow user to specify output type
[ https://issues.apache.org/jira/browse/KAFKA-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-7289: - Assignee: Attila Sasvari > Performance tools should allow user to specify output type > -- > > Key: KAFKA-7289 > URL: https://issues.apache.org/jira/browse/KAFKA-7289 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 2.0.0 >Reporter: Attila Sasvari >Assignee: Attila Sasvari >Priority: Major > > Currently, org.apache.kafka.tools.ProducerPerformance and > kafka.tools.ConsumerPerformance do not provide command line options to > specify output type(s). > Sample output of ProducerPerformance is as follows: > {code} > 1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg > latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, > 3842 ms 99.9th. > {code} > It would be, however, nice to allow users to generate performance reports in > a machine-readable format (such as CSV and JSON). This way, performance > results could be easily processed by external applications (e.g. displayed in > charts). > It will probably require a KIP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7289) Performance tools should allow user to specify output type
[ https://issues.apache.org/jira/browse/KAFKA-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579719#comment-16579719 ] Attila Sasvari commented on KAFKA-7289: --- It is a generalization of KAFKA-1939. > Performance tools should allow user to specify output type > -- > > Key: KAFKA-7289 > URL: https://issues.apache.org/jira/browse/KAFKA-7289 > Project: Kafka > Issue Type: Improvement > Components: tools >Affects Versions: 2.0.0 >Reporter: Attila Sasvari >Priority: Major > > Currently, org.apache.kafka.tools.ProducerPerformance and > kafka.tools.ConsumerPerformance do not provide command line options to > specify output type(s). > Sample output of ProducerPerformance is as follows: > {code} > 1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg > latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, > 3842 ms 99.9th. > {code} > It would be, however, nice to allow users to generate performance reports in > a machine-readable format (such as CSV and JSON). This way, performance > results could be easily processed by external applications (e.g. displayed in > charts). > It will probably require a KIP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7289) Performance tools should allow user to specify output type
Attila Sasvari created KAFKA-7289: - Summary: Performance tools should allow user to specify output type Key: KAFKA-7289 URL: https://issues.apache.org/jira/browse/KAFKA-7289 Project: Kafka Issue Type: Improvement Components: tools Affects Versions: 2.0.0 Reporter: Attila Sasvari Currently, org.apache.kafka.tools.ProducerPerformance and kafka.tools.ConsumerPerformance do not provide command line options to specify output type(s). Sample output of ProducerPerformance is as follows: {code} 1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, 3842 ms 99.9th. {code} It would be, however, nice to allow users to generate performance reports in a machine-readable format (such as CSV and JSON). This way, performance results could be easily processed by external applications (e.g. displayed in charts). It will probably require a KIP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7134) KafkaLog4jAppender - Appender exceptions are propagated to caller
[ https://issues.apache.org/jira/browse/KAFKA-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558325#comment-16558325 ] Attila Sasvari commented on KAFKA-7134: --- [~venkatpotru] please note that if the underlying producer cannot connect to a Kafka broker (because the broker is not running), {{send()}} will fail and throw a {{TimeoutException}}. The producer will notice this and try to log it: https://github.com/apache/kafka/blob/1d9a427225c64e7629a4eb2e2d129d5551185049/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L876 However, if the {{rootLogger}} in {{log4j.properties}} is set to the {{KafkaLog4jAppender}}, it will try to log this message and send it to Kafka, creating an infinite loop. > KafkaLog4jAppender - Appender exceptions are propagated to caller > - > > Key: KAFKA-7134 > URL: https://issues.apache.org/jira/browse/KAFKA-7134 > Project: Kafka > Issue Type: Bug > Components: clients >Reporter: venkata praveen >Assignee: Andras Katona >Priority: Major > > KafkaLog4jAppender exceptions are propagated to caller when Kafka is > down/slow/other, it may cause the application crash. Ideally appender should > print and ignore the exception > or should provide option to ignore/throw the exceptions like > 'ignoreExceptions' property of > https://logging.apache.org/log4j/2.x/manual/appenders.html#KafkaAppender -- This message was sent by Atlassian JIRA (v7.6.3#76005)
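The loop described above is the classic re-entrancy problem of a logging appender that itself logs. A language-neutral way out is a guard flag plus swallowed send failures, which is roughly what log4j2's ignoreExceptions option amounts to. A small Python analogy (illustrative only: KafkaLog4jAppender is Java/log4j, and `send_to_kafka` below is a hypothetical stand-in for the producer call):

```python
import logging
import threading

class GuardedKafkaHandler(logging.Handler):
    """Logging handler that (a) never re-enters itself and (b) never
    propagates send failures to the caller -- the two properties whose
    absence causes the crash and the infinite loop described above."""

    def __init__(self, send_to_kafka):
        super().__init__()
        self._send = send_to_kafka          # hypothetical producer call
        self._state = threading.local()     # per-thread re-entrancy flag

    def emit(self, record):
        if getattr(self._state, "active", False):
            return                          # break the emit -> log -> emit cycle
        self._state.active = True
        try:
            self._send(self.format(record))
        except Exception:
            pass                            # ignoreExceptions-style: swallow
        finally:
            self._state.active = False
```

With this shape, a TimeoutException from the producer neither reaches the application that called log.info() nor re-triggers the handler through the producer's own error logging.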
[jira] [Resolved] (KAFKA-7159) mark configuration files in confluent-kafka RPM SPEC file
[ https://issues.apache.org/jira/browse/KAFKA-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari resolved KAFKA-7159. --- Resolution: Won't Fix > mark configuration files in confluent-kafka RPM SPEC file > - > > Key: KAFKA-7159 > URL: https://issues.apache.org/jira/browse/KAFKA-7159 > Project: Kafka > Issue Type: Improvement > Components: packaging >Affects Versions: 1.1.0 > Environment: RHEL7 >Reporter: Robert Fabisiak >Priority: Trivial > Labels: rpm > > All configuration files in kafka RPM SPEC file should be marked with %config > prefix in %files section. > This would prevent overwrites during install/upgrade and uninstall operations > [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/rpm_packaging_guide/index#files] > It's especially important to save configuration during package upgrades. > Section to change in SPEC file: > {code:java} > %files > %config(noreplace) %{_sysconfdir}/kafka/*.conf > %config(noreplace) %{_sysconfdir}/kafka/*.properties > {code} > It would also be good to mark documentation files with %doc -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-7159) mark configuration files in confluent-kafka RPM SPEC file
[ https://issues.apache.org/jira/browse/KAFKA-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550735#comment-16550735 ] Attila Sasvari commented on KAFKA-7159: --- - Are you sure you filed the JIRA to the proper project? There is no code matching '%files' in apache/kafka - https://github.com/apache/kafka/search?q=%25files_q=%25files - However, Apache Bigtop has some code for packaging kafka, but it was not updated for a while: https://github.com/apache/bigtop/blob/master/bigtop-packages/src/rpm/kafka/SPECS/kafka.spec - {{confluent-kafka RPM SPEC}} indicates you might have wanted to report this to Confluent > mark configuration files in confluent-kafka RPM SPEC file > - > > Key: KAFKA-7159 > URL: https://issues.apache.org/jira/browse/KAFKA-7159 > Project: Kafka > Issue Type: Improvement > Components: packaging >Affects Versions: 1.1.0 > Environment: RHEL7 >Reporter: Robert Fabisiak >Priority: Trivial > Labels: rpm > > All configuration files in kafka RPM SPEC file should be marked with %config > prefix in %files section. > This would prevent overwrites during install/upgrade and uninstall operations > [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/rpm_packaging_guide/index#files] > It's especially important to save configuration during package upgrades. > Section to change in SPEC file: > {code:java} > %files > %config(noreplace) %{_sysconfdir}/kafka/*.conf > %config(noreplace) %{_sysconfdir}/kafka/*.properties > {code} > It would also be good to mark documentation files with %doc -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-6960) Remove the methods from the internal Scala AdminClient that are provided by the new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-6960: - Assignee: Attila Sasvari > Remove the methods from the internal Scala AdminClient that are provided by > the new AdminClient > --- > > Key: KAFKA-6960 > URL: https://issues.apache.org/jira/browse/KAFKA-6960 > Project: Kafka > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Attila Sasvari >Assignee: Attila Sasvari >Priority: Major > > This is a follow-up task of KAFKA-6884. > We should remove all the methods from the internal Scala AdminClient that are > provided by the new AdminClient. To "safe delete" them (i.e. > {{deleteConsumerGroups, describeConsumerGroup, listGroups, listAllGroups, > listAllGroupsFlattened}}), related tests need to be reviewed and adjusted > (for example: the tests in core_tests and streams_test). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549082#comment-16549082 ] Attila Sasvari commented on KAFKA-6884: --- pull request was merged, thanks for the reviews! > ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > Fix For: 2.1.0 > > > Now that we have KIP-222, we should update ConsumerGroupCommand to use the > new AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-7173) Update BrokerApiVersionsCommand to use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-7173: - Assignee: Attila Sasvari > Update BrokerApiVersionsCommand to use new AdminClient > -- > > Key: KAFKA-7173 > URL: https://issues.apache.org/jira/browse/KAFKA-7173 > Project: Kafka > Issue Type: Improvement > Components: admin >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > Labels: needs-kip > > This tool seems to be the last use of the deprecated scala AdminClient. This > may require a KIP to fix since the new AdminClient doesn't seem to expose > broker API versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6960) Remove the methods from the internal Scala AdminClient that are provided by the new AdminClient
Attila Sasvari created KAFKA-6960: - Summary: Remove the methods from the internal Scala AdminClient that are provided by the new AdminClient Key: KAFKA-6960 URL: https://issues.apache.org/jira/browse/KAFKA-6960 Project: Kafka Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Attila Sasvari This is a follow-up task of KAFKA-6884. We should remove all the methods from the internal Scala AdminClient that are provided by the new AdminClient. To "safe delete" them (i.e. {{deleteConsumerGroups, describeConsumerGroup, listGroups, listAllGroups, listAllGroupsFlattened}}), related tests need to be reviewed and adjusted (for example: the tests in core_tests and streams_test). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari updated KAFKA-6884: -- Description: Now that we have KIP-222, we should update ConsumerGroupCommand to use the new AdminClient. (was: Now that we have KIP-222, we should update consumer-groups.sh to use the new AdminClient.) > ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > > Now that we have KIP-222, we should update ConsumerGroupCommand to use the > new AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476020#comment-16476020 ] Attila Sasvari commented on KAFKA-6884: --- - tests that verify timeouts pass after making some changes in DescribeConsumerGroupTest FROM {code} case _: TimeoutException => // OK {code} TO: {code} case e: ExecutionException => assert(e.getMessage.contains("TimeoutException")) // OK {code} - [~andrasbeni] noticed that the Empty state was missing from ConsumerGroupState.java in the pull request https://github.com/apache/kafka/pull/4980 . I added it and now tests that exercise groups with no members pass > ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > > Now that we have KIP-222, we should update consumer-groups.sh to use the new > AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475751#comment-16475751 ] Attila Sasvari commented on KAFKA-6884: --- Thanks [~hachikuji]. I have seen the pull request was updated, so I also updated [my work in progress patch|https://github.com/asasvari/kafka/tree/KAFKA-6884_ConsumerGroupCommand_should_use_new_AdminClient]. Right now 9/31 tests fail in DescribeConsumerGroupTest: - Four of them are related to the change in behaviour of handling timeouts. In particular, an ExecutionException is now thrown that carries the information. For example, {{testDescribeGroupOffsetsWithShortInitializationTimeout, testDescribeGroupMembersWithShortInitializationTimeout, testDescribeGroupStateWithShortInitializationTimeout, testDescribeGroupWithShortInitializationTimeout}} fail with: {code} java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. {code} (TimeoutException is embedded in ExecutionException) - The rest of the failures are related to assignments / group membership. As I see it, handling of the assignment information in ConsumerGroupDescription in KafkaAdminClient is different from the previous version. If the consumer group is not in state "Stable", an empty list is returned, see [AdminClient.scala|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/AdminClient.scala#L336]. 
Because of the change, when state is unknown, KafkaAdminClient returns : {code} java.lang.AssertionError: Expected no active member in describe group results, state: Some(Unknown), assignments: Some(List(PartitionAssignmentState(test.group,Some(localhost:58637 (id: 0 rack: null)),Some(foo),Some(0),Some(0),Some(0),Some(-),Some(-),Some(-),Some(0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at kafka.admin.DescribeConsumerGroupTest.testDescribeOffsetsOfExistingGroupWithNoMembers(DescribeConsumerGroupTest.scala:339) ... {code} I believe that is why {{testDescribeExistingGroupWithNoMembers, testDescribeOffsetsOfExistingGroupWithNoMembers, testDescribeSimpleConsumerGroup, testDescribeMembersOfExistingGroupWithNoMembers, testDescribeStateOfExistingGroupWithNoMembers}} fail, but I am still investigating. > ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > > Now that we have KIP-222, we should update consumer-groups.sh to use the new > AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
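The FROM/TO test change quoted earlier in this thread boils down to: the timeout no longer surfaces directly but arrives wrapped, so the test must inspect the wrapper's message or cause. The same pattern in a self-contained Python sketch (an illustrative analogy only: the real tests are Scala/JUnit, and the ExecutionException class below is hand-rolled to mimic java.util.concurrent.ExecutionException, with TimeoutError standing in for Kafka's TimeoutException):

```python
class ExecutionException(Exception):
    """Mimics java.util.concurrent.ExecutionException: wraps the real
    cause and includes its type in the message, like Java's toString()."""
    def __init__(self, cause):
        super().__init__("%s: %s" % (type(cause).__name__, cause))
        self.cause = cause

def describe_group():
    # Stand-in for a KafkaAdminClient call that times out waiting
    # for a node assignment.
    raise ExecutionException(
        TimeoutError("Timed out waiting for a node assignment."))

# Old style (no longer matches): `except TimeoutError: pass  # OK`
# New style: catch the wrapper and check what it carries.
try:
    describe_group()
except ExecutionException as e:
    assert "TimeoutError" in str(e)           # message names the cause
    assert isinstance(e.cause, TimeoutError)  # or inspect the cause itself
```

Checking the cause type (rather than substring-matching the message, as the quoted Scala change does) is the more robust of the two assertions, since messages can change between versions.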
[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472041#comment-16472041 ] Attila Sasvari commented on KAFKA-6884: --- It seems assignmentStrategy is also needed. Added some new review comments to the pull request. > ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > > Now that we have KIP-222, we should update consumer-groups.sh to use the new > AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471932#comment-16471932 ] Attila Sasvari commented on KAFKA-6884: --- [~hachikuji] Thanks, I downloaded the pull request as a diff & applied it. Using the new KafkaAdminClient, I am wondering how I can retrieve the coordinator / partition leader of the internal offset topic for a consumer group. [Earlier|https://github.com/apache/kafka/blob/c3921d489f4da80aad6f387158c33ec2e4bca52d/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L571] it was returned in the [ConsumerGroupSummary|https://github.com/apache/kafka/blob/c3921d489f4da80aad6f387158c33ec2e4bca52d/core/src/main/scala/kafka/admin/AdminClient.scala#L300] object. > ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > > Now that we have KIP-222, we should update consumer-groups.sh to use the new > AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470680#comment-16470680 ] Attila Sasvari commented on KAFKA-6884: --- [~hachikuji] thanks for the additional information! - I replaced the old, Scala based AdminClient with KafkaAdminClient in ConsumerGroupCommand, and started to resolve the issues. - As I see, currently AdminClient.scala fetches [metadata state information of the groups|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/AdminClient.scala#L315] that is used in [ConsumerGroupCommand|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L153] at multiple places (such as [collectGroupOffsets|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L587], [collectGroupMembers |https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L593], and so on). - KIP-222's DescribeConsumerGroupsResult returned by the new AdminClient's describeConsumerGroups does not include this information, see for example [the member fields |https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/DescribeConsumerGroupsResult.java] and [ConsumerGroupDescription|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/ConsumerGroupDescription.java]. - Without state information we cannot say, for example, whether a consumer group is being rebalanced. I believe the problem is that KafkaAdminClient does not extract metadata state from [DescribeGroupsResponse.GroupMetadata|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L2396] when constructing the DescribeConsumerGroupsResult. Do I understand it correctly? What do you think: shall I fix it here or in a separate JIRA? 
> ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > > Now that we have KIP-222, we should update consumer-groups.sh to use the new > AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468881#comment-16468881 ] Attila Sasvari commented on KAFKA-6884: --- [~hachikuji] consumer-groups.sh currently uses {{kafka.admin.ConsumerGroupCommand}}. Shall I modify {{ConsumerGroupCommand}} class so that it imports KafkaAdminClient and uses the new methods introduced by KIP-222? I believe we don't want to add a main method to KafkaAdminClient and adjust the shell script itself. What is the target version? If it is 2.0, then shall I also remove support for --zookeeper (old consumer)? > ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > > Now that we have KIP-222, we should update consumer-groups.sh to use the new > AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient
[ https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-6884: - Assignee: Attila Sasvari > ConsumerGroupCommand should use new AdminClient > --- > > Key: KAFKA-6884 > URL: https://issues.apache.org/jira/browse/KAFKA-6884 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Attila Sasvari >Priority: Major > > Now that we have KIP-222, we should update consumer-groups.sh to use the new > AdminClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6883) KafkaShortnamer should allow to convert Kerberos principal name to upper case user name
[ https://issues.apache.org/jira/browse/KAFKA-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467207#comment-16467207 ] Attila Sasvari commented on KAFKA-6883: --- HADOOP-13984 is a similar issue. > KafkaShortnamer should allow to convert Kerberos principal name to upper case > user name > --- > > Key: KAFKA-6883 > URL: https://issues.apache.org/jira/browse/KAFKA-6883 > Project: Kafka > Issue Type: Improvement >Reporter: Attila Sasvari >Priority: Major > > KAFKA-5764 implemented support to convert Kerberos principal name to lower > case Linux user name via auth_to_local rules. > As a follow-up, KafkaShortnamer could be further extended to allow converting > principal names to uppercase by appending /U to the rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6883) KafkaShortnamer should allow to convert Kerberos principal name to upper case user name
Attila Sasvari created KAFKA-6883: - Summary: KafkaShortnamer should allow to convert Kerberos principal name to upper case user name Key: KAFKA-6883 URL: https://issues.apache.org/jira/browse/KAFKA-6883 Project: Kafka Issue Type: Improvement Reporter: Attila Sasvari KAFKA-5764 implemented support to convert Kerberos principal name to lower case Linux user name via auth_to_local rules. As a follow-up, KafkaShortnamer could be further extended to allow converting principal names to uppercase by appending /U to the rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
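For concreteness, this is roughly what such a rule could look like. The /L form exists since KAFKA-5764; the /U suffix below is only the proposed extension, not a shipped feature, and the realm and regexp are made up for illustration:

```properties
# Existing behavior (KAFKA-5764): strip the realm and lower-case the result,
# e.g. MyUser@EXAMPLE.COM -> myuser
sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*//L,DEFAULT

# Proposed: appending /U would upper-case the short name instead,
# e.g. MyUser@EXAMPLE.COM -> MYUSER
sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*//U,DEFAULT
```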
[jira] [Updated] (KAFKA-6822) Kafka Consumer 0.10.2.1 can not normally read data from Kafka 1.0.0
[ https://issues.apache.org/jira/browse/KAFKA-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari updated KAFKA-6822: -- Component/s: streams > Kafka Consumer 0.10.2.1 can not normally read data from Kafka 1.0.0 > --- > > Key: KAFKA-6822 > URL: https://issues.apache.org/jira/browse/KAFKA-6822 > Project: Kafka > Issue Type: Bug > Components: consumer, streams >Reporter: Phil Mikhailov >Priority: Major > > We faced the problem when StateStore (0.10.2.1) stuck in loading data during > start of microservice. > Our configuration is Kafka 1.0.0 but microservices are built with Kafka > Streams 0.10.2.1. > We had to reset the stream offsets in order unblock microservices 'cause > restarts didn't help. > We faced the problem only once and didn't have a chance to reproduce it, so > we're sorry in advance for maybe poor explanations. > Below are details that we've managed to collect that time: > Kafka consumer 0.10.2.1 calculates offsets like this: > Fetcher:524 > {code:java} > long nextOffset = partRecords.get(partRecords.size() - 1).offset() + 1; > {code} > Get the latest offset from records (which were got from {{poll}}) plus 1. > So the next offset is estimated. > In Kafka 1.0.0 otherwise broker explicitly provides the next offset to fetch: > {code:java} > long nextOffset = partitionRecords.nextFetchOffset; > {code} > It returns the actual next offset but not estimated. > > That said, we had a situation when StateStore (0.10.2.1) stuck in loading > data. The reason was in {{ProcessorStateManager.restoreActiveState:245}} > which kept spinning in consumer loop 'cause this condition never happened: > {code:java} > } else if (restoreConsumer.position(storePartition) == endOffset) { > break; > } > {code} > > We assume that consumer 0.10.2.1 estimates endoffset incorrectly in a case of > compaction. 
> Or there is an inconsistency in offset calculation between 0.10.2.1 and > 1.0.0 which prevents using a 0.10.2.1 consumer with a 1.0.0 broker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
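The estimate-vs-end-offset mismatch described above can be sketched in plain Java. With a compacted partition, the last record returned by poll() can sit well below the log-end offset, so the 0.10.2.1-style estimate {{lastOffset + 1}} never equals endOffset and the restore loop spins. The offsets below are made up purely for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class OffsetEstimateDemo {
    // 0.10.2.1-style estimate: offset of the last polled record, plus one
    // (see Fetcher:524 in the description above).
    static long estimatedNextOffset(List<Long> polledOffsets) {
        return polledOffsets.get(polledOffsets.size() - 1) + 1;
    }

    public static void main(String[] args) {
        // A compacted partition: records with offsets 7..9 were compacted
        // away, but the log-end offset is still 10.
        List<Long> remaining = Arrays.asList(0L, 2L, 5L, 6L);
        long endOffset = 10;

        long estimate = estimatedNextOffset(remaining);
        // The estimate (7) never reaches endOffset (10), so a loop that
        // breaks only on position == endOffset never terminates.
        System.out.println(estimate + " vs endOffset " + endOffset); // prints "7 vs endOffset 10"
    }
}
```

A broker-provided nextFetchOffset (as in 1.0.0) avoids the problem because it already accounts for the compacted gap.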
[jira] [Commented] (KAFKA-6822) Kafka Consumer 0.10.2.1 can not normally read data from Kafka 1.0.0
[ https://issues.apache.org/jira/browse/KAFKA-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451772#comment-16451772 ] Attila Sasvari commented on KAFKA-6822: --- [~phil.mikhailov] I wonder if it is related to [KAFKA-6367|https://issues.apache.org/jira/browse/KAFKA-6367] - Fix StateRestoreListener To Use Correct Batch Ending Offset which is fixed in 1.0.1 & 1.1.0. Can you re-test with 1.0.1? > Kafka Consumer 0.10.2.1 can not normally read data from Kafka 1.0.0 > --- > > Key: KAFKA-6822 > URL: https://issues.apache.org/jira/browse/KAFKA-6822 > Project: Kafka > Issue Type: Bug > Components: consumer >Reporter: Phil Mikhailov >Priority: Major > > We faced the problem when StateStore (0.10.2.1) stuck in loading data during > start of microservice. > Our configuration is Kafka 1.0.0 but microservices are built with Kafka > Streams 0.10.2.1. > We had to reset the stream offsets in order unblock microservices 'cause > restarts didn't help. > We faced the problem only once and didn't have a chance to reproduce it, so > we're sorry in advance for maybe poor explanations. > Below are details that we've managed to collect that time: > Kafka consumer 0.10.2.1 calculates offsets like this: > Fetcher:524 > {code:java} > long nextOffset = partRecords.get(partRecords.size() - 1).offset() + 1; > {code} > Get the latest offset from records (which were got from {{poll}}) plus 1. > So the next offset is estimated. > In Kafka 1.0.0 otherwise broker explicitly provides the next offset to fetch: > {code:java} > long nextOffset = partitionRecords.nextFetchOffset; > {code} > It returns the actual next offset but not estimated. > > That said, we had a situation when StateStore (0.10.2.1) stuck in loading > data. 
The reason was in {{ProcessorStateManager.restoreActiveState:245}} > which kept spinning in consumer loop 'cause this condition never happened: > {code:java} > } else if (restoreConsumer.position(storePartition) == endOffset) { > break; > } > {code} > > We assume that consumer 0.10.2.1 estimates endoffset incorrectly in a case of > compaction. > Or there is inconsistency between offsets calculation between 0.10.2.1 and > 1.0.0 which doesn't allow to use 0.10.2.1 consumer with 1.0.0 broker. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6799) Consumer livelock during consumer group rebalance
[ https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari resolved KAFKA-6799. --- Resolution: Information Provided > Consumer livelock during consumer group rebalance > - > > Key: KAFKA-6799 > URL: https://issues.apache.org/jira/browse/KAFKA-6799 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 1.0.0, 0.11.0.2, 1.1.0 >Reporter: Pierre-Henri Dezanneau >Assignee: Attila Sasvari >Priority: Critical > > We have the following environment: > * 1 kafka cluster with 3 brokers > * 1 topic with 3 partitions > * 1 producer > * 1 consumer group with 3 consumers > From this setup, we remove one broker from the cluster, the hard way, by > simply killing it. Quite often, we see that the consumer group is not > rebalanced correctly. By that I mean that all 3 consumers stop consuming and > get stuck in a loop, forever. > The thread dump shows that the consumer threads aren't blocked but run > forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due > to the {{synchonized}} keyword on the calling method. Heartbeat threads are > blocked, waiting for the consumer threads to release the lock. This situation > prevents all consumers from consuming any more record. > We build a simple project which seems to reliably demonstrate this: > {code:sh} > $ mkdir -p /tmp/sandbox && cd /tmp/sandbox > $ git clone https://github.com/phdezann/helloworld-kafka-livelock > $ cd helloworld-kafka-livelock && ./spin.sh > ... 
> livelock detected > {code} > {code:sh|title=Consumer thread|borderStyle=solid} > "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable > java.lang.Thread.State: RUNNABLE >blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728 > at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl) > - locked <0x2a16> (a java.util.Collections$UnmodifiableSet) > - locked <0x2a17> (a sun.nio.ch.Util$3) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at org.apache.kafka.common.network.Selector.select(Selector.java:684) > at org.apache.kafka.common.network.Selector.poll(Selector.java:408) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228) > - locked <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:279) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149) > at > 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115) > at > org.helloworld.kafka.bus.HelloWorldKafkaListener.lambda$createConsumerInDedicatedThread$0(HelloWorldKafkaListener.java:45) > at > org.helloworld.kafka.bus.HelloWorldKafkaListener$$Lambda$42.1776656466.run(Unknown > Source:-1) > at java.lang.Thread.run(Thread.java:748) > {code} > {code:sh|title=Heartbeat thread|borderStyle=solid} > "kafka-coordinator-heartbeat-thread | helloWorldGroup@10728" daemon prio=5 > tid=0x36 nid=NA waiting for monitor entry > java.lang.Thread.State: BLOCKED >waiting for kafka-consumer-1@10733 to release lock on <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at java.lang.Object.wait(Object.java:-1) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:955) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
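The consumer/heartbeat interaction in the thread dumps above boils down to one thread holding a monitor inside a synchronized method while the other blocks trying to enter it. A minimal stand-alone sketch of that pattern (thread names and timings are illustrative only; this is not Kafka code):

```java
public class LockHoldDemo {
    private static final Object coordinatorLock = new Object();

    // Models the dump above: the "consumer" thread holds the coordinator
    // monitor (the synchronized ensureCoordinatorReady() path) while the
    // "heartbeat" thread blocks waiting for the same monitor.
    static Thread.State raceOnce() {
        try {
            Thread consumer = new Thread(() -> {
                synchronized (coordinatorLock) {       // consumer takes the monitor...
                    try { Thread.sleep(1000); }        // ...and holds it, like a long poll
                    catch (InterruptedException ignored) { }
                }
            }, "kafka-consumer-1");
            consumer.start();
            Thread.sleep(100);                         // let the consumer lock first
            Thread heartbeat = new Thread(() -> {
                synchronized (coordinatorLock) { }     // heartbeat needs the same monitor
            }, "kafka-coordinator-heartbeat-thread");
            heartbeat.start();
            while (heartbeat.getState() != Thread.State.BLOCKED && consumer.isAlive()) {
                Thread.sleep(10);                      // wait until it parks on the monitor
            }
            Thread.State observed = heartbeat.getState();
            consumer.join();
            heartbeat.join();
            return observed;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(raceOnce());                // prints BLOCKED
    }
}
```

In the reported scenario the consumer never leaves the monitor because coordinator discovery never succeeds, which is why the heartbeat thread stays BLOCKED indefinitely.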
[jira] [Assigned] (KAFKA-6799) Consumer livelock during consumer group rebalance
[ https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari reassigned KAFKA-6799: - Assignee: Attila Sasvari > Consumer livelock during consumer group rebalance > - > > Key: KAFKA-6799 > URL: https://issues.apache.org/jira/browse/KAFKA-6799 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 1.0.0, 0.11.0.2, 1.1.0 >Reporter: Pierre-Henri Dezanneau >Assignee: Attila Sasvari >Priority: Critical > > We have the following environment: > * 1 kafka cluster with 3 brokers > * 1 topic with 3 partitions > * 1 producer > * 1 consumer group with 3 consumers > From this setup, we remove one broker from the cluster, the hard way, by > simply killing it. Quite often, we see that the consumer group is not > rebalanced correctly. By that I mean that all 3 consumers stop consuming and > get stuck in a loop, forever. > The thread dump shows that the consumer threads aren't blocked but run > forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due > to the {{synchonized}} keyword on the calling method. Heartbeat threads are > blocked, waiting for the consumer threads to release the lock. This situation > prevents all consumers from consuming any more record. > We build a simple project which seems to reliably demonstrate this: > {code:sh} > $ mkdir -p /tmp/sandbox && cd /tmp/sandbox > $ git clone https://github.com/phdezann/helloworld-kafka-livelock > $ cd helloworld-kafka-livelock && ./spin.sh > ... 
> livelock detected > {code} > {code:sh|title=Consumer thread|borderStyle=solid} > "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable > java.lang.Thread.State: RUNNABLE >blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728 > at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl) > - locked <0x2a16> (a java.util.Collections$UnmodifiableSet) > - locked <0x2a17> (a sun.nio.ch.Util$3) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at org.apache.kafka.common.network.Selector.select(Selector.java:684) > at org.apache.kafka.common.network.Selector.poll(Selector.java:408) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228) > - locked <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:279) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149) > at > 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115) > at > org.helloworld.kafka.bus.HelloWorldKafkaListener.lambda$createConsumerInDedicatedThread$0(HelloWorldKafkaListener.java:45) > at > org.helloworld.kafka.bus.HelloWorldKafkaListener$$Lambda$42.1776656466.run(Unknown > Source:-1) > at java.lang.Thread.run(Thread.java:748) > {code} > {code:sh|title=Heartbeat thread|borderStyle=solid} > "kafka-coordinator-heartbeat-thread | helloWorldGroup@10728" daemon prio=5 > tid=0x36 nid=NA waiting for monitor entry > java.lang.Thread.State: BLOCKED >waiting for kafka-consumer-1@10733 to release lock on <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at java.lang.Object.wait(Object.java:-1) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:955) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6799) Consumer livelock during consumer group rebalance
[ https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445529#comment-16445529 ] Attila Sasvari commented on KAFKA-6799: --- [~phdezann] do you have any update? I have been running the test since yesterday (with the replication factor set to 3 for the __consumer_offsets topic), and did not encounter the issue anymore. I would like to resolve the issue with "Information Provided" or "Not a Bug". Some more information: - I saw the following displayed on my stdout many times: {code} could not reproduce the issue this time, re-create the cluster from scratch Stops containers and removes containers, networks, volumes, and images created by `up`. {code} - When I have only 3 producers, it is also printed: {code} {"timestamp":1524215171045,"status":500,"error":"Internal Server Error","exception":"org.apache.kafka.common.errors.InvalidReplicationFactorException","message":"Replication factor: 3 larger than available brokers: 2.","path":"/kafka/topic/create"}Producer started {code} - I also tested with 4 brokers, and it went well too. > Consumer livelock during consumer group rebalance > - > > Key: KAFKA-6799 > URL: https://issues.apache.org/jira/browse/KAFKA-6799 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 1.0.0, 0.11.0.2, 1.1.0 >Reporter: Pierre-Henri Dezanneau >Priority: Critical > > We have the following environment: > * 1 kafka cluster with 3 brokers > * 1 topic with 3 partitions > * 1 producer > * 1 consumer group with 3 consumers > From this setup, we remove one broker from the cluster, the hard way, by > simply killing it. Quite often, we see that the consumer group is not > rebalanced correctly. By that I mean that all 3 consumers stop consuming and > get stuck in a loop, forever. 
> The thread dump shows that the consumer threads aren't blocked but run > forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due > to the {{synchonized}} keyword on the calling method. Heartbeat threads are > blocked, waiting for the consumer threads to release the lock. This situation > prevents all consumers from consuming any more record. > We build a simple project which seems to reliably demonstrate this: > {code:sh} > $ mkdir -p /tmp/sandbox && cd /tmp/sandbox > $ git clone https://github.com/phdezann/helloworld-kafka-livelock > $ cd helloworld-kafka-livelock && ./spin.sh > ... > livelock detected > {code} > {code:sh|title=Consumer thread|borderStyle=solid} > "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable > java.lang.Thread.State: RUNNABLE >blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728 > at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl) > - locked <0x2a16> (a java.util.Collections$UnmodifiableSet) > - locked <0x2a17> (a sun.nio.ch.Util$3) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at org.apache.kafka.common.network.Selector.select(Selector.java:684) > at org.apache.kafka.common.network.Selector.poll(Selector.java:408) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228) > - locked <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:279) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115) > at > org.helloworld.kafka.bus.HelloWorldKafkaListener.lambda$createConsumerInDedicatedThread$0(HelloWorldKafkaListener.java:45) > at >
[jira] [Comment Edited] (KAFKA-6799) Consumer livelock during consumer group rebalance
[ https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443921#comment-16443921 ] Attila Sasvari edited comment on KAFKA-6799 at 4/19/18 3:00 PM: [~phdezann] thanks for reporting this issue and creating the docker environment to reproduce it. - I had to set the environment variable M2_REPOSITORY before running the shell script: {{M2_REPOSITORY=/root/.m2/ ./spin.sh}} Then I saw the issue you described in the description. - I looked around a bit in the helloworld-kafka-1 docker container, and noticed that the replication factor for the internal topic was set to 1:
{code}
root@b6e9218f1761:/install/kafka# bin/kafka-topics.sh --describe -zookeeper 172.170.0.80:2181
Topic: __consumer_offsets   PartitionCount: 50   ReplicationFactor: 1   Configs: segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets   Partition: 0   Leader: 103   Replicas: 103   Isr: 103
    Topic: __consumer_offsets   Partition: 1   Leader: 101   Replicas: 101   Isr: 101
    Topic: __consumer_offsets   Partition: 2   Leader: -1    Replicas: 102   Isr: 102
{code}
In this situation, the consumer cannot contact the leader of __consumer_offsets partition 2, because that broker was killed by the test, so it cannot commit offsets for that partition. - I changed the replication factor to 3 for {{__consumer_offsets}} and then did not see the issue. - Can you add the following to {{docker/entrypoint.sh}} and re-test? 
{code}
cat >>config/server.properties <<EOF
offsets.topic.replication.factor=3
EOF
{code}
> Consumer livelock during consumer group rebalance > - > > Key: KAFKA-6799 > URL: https://issues.apache.org/jira/browse/KAFKA-6799 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 1.0.0, 0.11.0.2, 1.1.0 >Reporter: Pierre-Henri Dezanneau >Priority: Critical > > We have the following environment: > * 1 kafka cluster with 3 brokers > * 1 topic with 3 partitions > * 1 producer > * 1 consumer group with 3 consumers > From this setup, we remove one broker from the cluster, the hard way, by > simply killing it. Quite often, we see that the consumer group is not > rebalanced correctly. By that I mean that all 3 consumers stop consuming and > get stuck in a loop, forever. > The thread dump shows that the consumer threads aren't blocked but run > forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due > to the {{synchonized}} keyword on the calling method. Heartbeat threads are > blocked, waiting for the consumer threads to release the lock. This situation > prevents all consumers from consuming any more record. > We build a simple project which seems to reliably demonstrate this: > {code:sh} > $ mkdir -p /tmp/sandbox && cd /tmp/sandbox > $ git clone https://github.com/phdezann/helloworld-kafka-livelock > $ cd helloworld-kafka-livelock && ./spin.sh > ... 
> livelock detected > {code} > {code:sh|title=Consumer thread|borderStyle=solid} > "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable > java.lang.Thread.State: RUNNABLE >blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728 > at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl) > - locked <0x2a16> (a java.util.Collections$UnmodifiableSet) > - locked <0x2a17> (a sun.nio.ch.Util$3) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at org.apache.kafka.common.network.Selector.select(Selector.java:684) > at org.apache.kafka.common.network.Selector.poll(Selector.java:408) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228) > - locked <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at >
[jira] [Commented] (KAFKA-6799) Consumer livelock during consumer group rebalance
[ https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443921#comment-16443921 ] Attila Sasvari commented on KAFKA-6799: --- [~phdezann] thanks for reporting this issue and creating the docker environment to reproduce it. - I had to set the environment variable M2_REPOSITORY before running the shell script: {{M2_REPOSITORY=/root/.m2/ ./spin.sh}} Then I saw the issue you described in the description. - I looked around a bit in the helloworld-kafka-1 docker container, and noticed that the replication factor for the internal topic was set to 1:
{code}
root@b6e9218f1761:/install/kafka# bin/kafka-topics.sh --describe -zookeeper 172.170.0.80:2181
Topic: __consumer_offsets   PartitionCount: 50   ReplicationFactor: 1   Configs: segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets   Partition: 0   Leader: 103   Replicas: 103   Isr: 103
    Topic: __consumer_offsets   Partition: 1   Leader: 101   Replicas: 101   Isr: 101
    Topic: __consumer_offsets   Partition: 2   Leader: -1    Replicas: 102   Isr: 102
{code}
In this situation, the consumer cannot contact the leader of __consumer_offsets partition 2, because that broker was killed by the test, so it cannot commit offsets for that partition. - I changed the replication factor to 3 for {{__consumer_offsets}} and then did not see the issue. - Can you add the following to {{docker/entrypoint.sh}} and re-test? 
{code}
cat >>config/server.properties <<EOF
offsets.topic.replication.factor=3
EOF
{code}
> Consumer livelock during consumer group rebalance > - > > Key: KAFKA-6799 > URL: https://issues.apache.org/jira/browse/KAFKA-6799 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 1.0.0, 0.11.0.2, 1.1.0 >Reporter: Pierre-Henri Dezanneau >Priority: Critical > > We have the following environment: > * 1 kafka cluster with 3 brokers > * 1 topic with 3 partitions > * 1 producer > * 1 consumer group with 3 consumers > From this setup, we remove one broker from the cluster, the hard way, by > simply killing it. Quite often, we see that the consumer group is not > rebalanced correctly. By that I mean that all 3 consumers stop consuming and > get stuck in a loop, forever. > The thread dump shows that the consumer threads aren't blocked but run > forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due > to the {{synchonized}} keyword on the calling method. Heartbeat threads are > blocked, waiting for the consumer threads to release the lock. This situation > prevents all consumers from consuming any more record. > We build a simple project which seems to reliably demonstrate this: > {code:sh} > $ mkdir -p /tmp/sandbox && cd /tmp/sandbox > $ git clone https://github.com/phdezann/helloworld-kafka-livelock > $ cd helloworld-kafka-livelock && ./spin.sh > ... 
> livelock detected > {code} > {code:sh|title=Consumer thread|borderStyle=solid} > "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable > java.lang.Thread.State: RUNNABLE >blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728 > at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl) > - locked <0x2a16> (a java.util.Collections$UnmodifiableSet) > - locked <0x2a17> (a sun.nio.ch.Util$3) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at org.apache.kafka.common.network.Selector.select(Selector.java:684) > at org.apache.kafka.common.network.Selector.poll(Selector.java:408) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228) > - locked <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at >
[jira] [Resolved] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader
[ https://issues.apache.org/jira/browse/KAFKA-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari resolved KAFKA-6703. --- Resolution: Not A Bug It turned out I forgot to set a proper replication factor for {{__consumer_offsets}}. The default replication factor is 1, and the consumer group coordinator (determined by {{partitionFor(group)}} in GroupCoordinator) was down. After changing the replication factor to 3, I did not experience the issue. I still saw a couple of messages like {code:java} [2018-03-23 16:45:52,298] DEBUG [Consumer clientId=2-1, groupId=2] Leader for partition testR1P3-2 is unavailable for fetching offset (org.apache.kafka.clients.consumer.internals.Fetcher){code} but messages in other topics matched by the whitelist regexp were fetched by MirrorMaker. > MirrorMaker cannot make progress when any matched topic from a whitelist > regexp has -1 leader > - > > Key: KAFKA-6703 > URL: https://issues.apache.org/jira/browse/KAFKA-6703 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Attila Sasvari >Priority: Major > > Scenario: > - MM whitelist regexp matches multiple topics > - destination cluster has 5 brokers with multiple topics, replication factor 3 > - without partition reassignment, shut down 2 brokers > - suppose a topic has no leader any more because it was out of sync and the > leader and the rest of the replicas are hosted on the downed brokers. 
> - so we have 1 topic with some partitions with leader -1 > - the rest of the matching topics has 3 replicas with leaders > MM will not produce into any of the matched topics until: > - the "orphaned" topic removed or > - the partition reassign carried out from the downed brokers (suppose you > can turn these back on) > In the MirrorMaker logs, there are a lot of messages like the following ones: > {code} > [2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Coordinator discovery failed, refreshing > metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Sending metadata request > (type=MetadataRequest, topics=) to node 192.168.1.102:9092 (id: 0 rack: > null) (org.apache.kafka.clients.NetworkClient) > [2018-03-22 19:55:32,525] DEBUG Updated cluster metadata version 10 to > Cluster(id = Y-qtoFP-RMq2uuVnkEKAAw, nodes = [192.168.1.102:9092 (id: 0 rack: > null)], partitions = [Partition(topic = testR1P2, partition = 1, leader = > none, replicas = [42], isr = [], offlineReplicas = [42]), Partition(topic = > testR1P1, partition = 0, leader = 0, replicas = [0], isr = [0], > offlineReplicas = []), Partition(topic = testAlive, partition = 0, leader = > 0, replicas = [0], isr = [0], offlineReplicas = []), Partition(topic = > testERRR, partition = 0, leader = 0, replicas = [0], isr = [0], > offlineReplicas = []), Partition(topic = testR1P2, partition = 0, leader = 0, > replicas = [0], isr = [0], offlineReplicas = [])]) > (org.apache.kafka.clients.Metadata) > [2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Sending FindCoordinator request to broker > 192.168.1.102:9092 (id: 0 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Received 
FindCoordinator response > ClientResponse(receivedTimeMs=1521744932525, latencyMs=0, disconnected=false, > requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, > clientId=consumer-1, correlationId=19), > responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', > error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 19:55:32,526] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Group coordinator lookup failed: The > coordinator is not available. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > {code} > Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer > properties file, then an OldConsumer is created, and it can make progress. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
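The resolution above hinges on {{partitionFor(group)}}: each consumer group maps to exactly one partition of {{__consumer_offsets}}, and the broker leading that partition acts as the group's coordinator. A minimal Python sketch of that mapping, assuming Kafka's documented scheme (Java `String.hashCode` with the sign bit masked, modulo the offsets-topic partition count, default 50); the group name used below is illustrative:

```python
def java_string_hash(s: str) -> int:
    """Java String.hashCode(): h = 31*h + c, with 32-bit signed wraparound."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def partition_for(group_id: str, num_offsets_partitions: int = 50) -> int:
    # Mask the sign bit and take the result modulo offsets.topic.num.partitions.
    return (java_string_hash(group_id) & 0x7FFFFFFF) % num_offsets_partitions

# Every group maps to one __consumer_offsets partition; if that partition's
# leader is offline, FindCoordinator keeps failing for this group only.
coordinator_partition = partition_for("console-consumer-43054")
```

This is why raising the replication factor of {{__consumer_offsets}} fixed the symptom: with replicas on surviving brokers, the group's partition can elect a new leader.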
[jira] [Commented] (KAFKA-6699) When one of two Kafka nodes are dead, streaming API cannot handle messaging
[ https://issues.apache.org/jira/browse/KAFKA-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411454#comment-16411454 ] Attila Sasvari commented on KAFKA-6699: --- [~habdank] just a quick question: is the replication factor set to 2 for the internal topic {{__consumer_offsets}} too? Please note if you change the application id, you will not be able to continue consuming messages from the last processed offset (unless you do some magic). > When one of two Kafka nodes are dead, streaming API cannot handle messaging > --- > > Key: KAFKA-6699 > URL: https://issues.apache.org/jira/browse/KAFKA-6699 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.2 >Reporter: Seweryn Habdank-Wojewodzki >Priority: Major > > Dears, > I am observing quite often, when Kafka Broker is partly dead(*), then > application, which uses streaming API are doing nothing. > (*) Partly dead in my case it means that one of two Kafka nodes are out of > order. > Especially when disk is full on one machine, then Broker is going in some > strange state, where streaming API goes vacations. It seems like regular > producer/consumer API has no problem in such a case. > Can you have a look on that matter? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader
[ https://issues.apache.org/jira/browse/KAFKA-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari updated KAFKA-6703: -- Description: Scenario: - MM whitelabel regexp matches multiple topics - destination cluster has 5 brokers with multiple topics replication factor 3 - without partition reassign shut down 2 brokers - suppose a topic has no leader any more because it was off-sync and the leader and the rest of the replicas are hosted on the downed brokers. - so we have 1 topic with some partitions with leader -1 - the rest of the matching topics has 3 replicas with leaders MM will not produce into any of the matched topics until: - the "orphaned" topic removed or - the partition reassign carried out from the downed brokers (suppose you can turn these back on) In the MirrorMaker logs, there are a lot of messages like the following ones: {code} [2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, groupId=console-consumer-43054] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, groupId=console-consumer-43054] Sending metadata request (type=MetadataRequest, topics=) to node 192.168.1.102:9092 (id: 0 rack: null) (org.apache.kafka.clients.NetworkClient) [2018-03-22 19:55:32,525] DEBUG Updated cluster metadata version 10 to Cluster(id = Y-qtoFP-RMq2uuVnkEKAAw, nodes = [192.168.1.102:9092 (id: 0 rack: null)], partitions = [Partition(topic = testR1P2, partition = 1, leader = none, replicas = [42], isr = [], offlineReplicas = [42]), Partition(topic = testR1P1, partition = 0, leader = 0, replicas = [0], isr = [0], offlineReplicas = []), Partition(topic = testAlive, partition = 0, leader = 0, replicas = [0], isr = [0], offlineReplicas = []), Partition(topic = testERRR, partition = 0, leader = 0, replicas = [0], isr = [0], offlineReplicas = []), Partition(topic = testR1P2, partition = 0, leader = 0, 
replicas = [0], isr = [0], offlineReplicas = [])]) (org.apache.kafka.clients.Metadata) [2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, groupId=console-consumer-43054] Sending FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, groupId=console-consumer-43054] Received FindCoordinator response ClientResponse(receivedTimeMs=1521744932525, latencyMs=0, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=consumer-1, correlationId=19), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 19:55:32,526] DEBUG [Consumer clientId=consumer-1, groupId=console-consumer-43054] Group coordinator lookup failed: The coordinator is not available. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) {code} Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer properties file, then an OldConsumer is created, and it can make progress. was: Scenario: - MM whitelabel regexp matches multiple topics - destination cluster has 5 brokers with multiple topics replication factor 3 - without partition reassign shut down 2 brokers - suppose a topic has no leader any more because it was off-sync and the leader and the rest of the replicas are hosted on the downed brokers. 
- so we have 1 topic with some partitions with leader -1 - the rest of the matching topics has 3 replicas with leaders MM will not produce into any of the matched topics until: - the "orphaned" topic removed or - the partition reassign carried out from the downed brokers (suppose you can turn these back on) In the MirrorMaker logs, there are a lot of messages like the following ones: {code} [2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-1, groupId=1] Sending FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-0, groupId=1] Sending FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Received FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, latencyMs=1, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=1-0, correlationId=71), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22
[jira] [Commented] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader
[ https://issues.apache.org/jira/browse/KAFKA-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410078#comment-16410078 ] Attila Sasvari commented on KAFKA-6703: --- There is a call to [ensureCoordinatorReady()|https://github.com/apache/kafka/blob/f0a29a693548efe539cba04807e21862c8dfc1bf/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L279] in {{ConsumerCoordinator}}. It polls for the coordinator in a [loop|https://github.com/apache/kafka/blob/f0a29a693548efe539cba04807e21862c8dfc1bf/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L217-L219]. When the lookup does not succeed, the client retries the connection, and all other matched topics are ignored until the failing coordinator becomes healthy. > MirrorMaker cannot make progress when any matched topic from a whitelist > regexp has -1 leader > - > > Key: KAFKA-6703 > URL: https://issues.apache.org/jira/browse/KAFKA-6703 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Attila Sasvari >Priority: Major > > Scenario: > - MM whitelist regexp matches multiple topics > - destination cluster has 5 brokers with multiple topics replication factor 3 > - without partition reassign shut down 2 brokers > - suppose a topic has no leader any more because it was off-sync and the > leader and the rest of the replicas are hosted on the downed brokers. 
> - so we have 1 topic with some partitions with leader -1 > - the rest of the matching topics has 3 replicas with leaders > MM will not produce into any of the matched topics until: > - the "orphaned" topic removed or > - the partition reassign carried out from the downed brokers (suppose you > can turn these back on) > In the MirrorMaker logs, there are a lot of messages like the following ones: > {code} > [2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-1, groupId=1] Sending > FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-0, groupId=1] Sending > FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Received > FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, > latencyMs=1, disconnected=false, > requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, > clientId=1-0, correlationId=71), > responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', > error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Received > FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, > latencyMs=1, disconnected=false, > requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, > clientId=1-1, correlationId=71), > responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', > error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Group > coordinator lookup failed: The coordinator is not available. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Group > coordinator lookup failed: The coordinator is not available. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] > Coordinator discovery failed, refreshing metadata > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] > Coordinator discovery failed, refreshing metadata > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > {code} > Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer > properties file, then an OldConsumer is created, and it can make progress. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
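The blocking behavior discussed in the comment above can be sketched as follows. This is hypothetical Python, not the actual client code; the `timeout_s` parameter is added purely for illustration, since the real loop has no overall timeout, which is exactly why one leaderless {{__consumer_offsets}} partition stalls MirrorMaker entirely:

```python
import time

def ensure_coordinator_ready(find_coordinator, backoff_s=0.1, timeout_s=None):
    """Sketch of AbstractCoordinator.ensureCoordinatorReady: poll for a
    coordinator in a loop, backing off between attempts, until one is found.
    Returns the coordinator node, or None if the illustrative timeout expires.
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while deadline is None or time.monotonic() < deadline:
        # FindCoordinator request; None models COORDINATOR_NOT_AVAILABLE.
        node = find_coordinator()
        if node is not None:
            return node
        time.sleep(backoff_s)  # refresh metadata, back off, retry
    return None
```

With a coordinator that never becomes available, the loop spins forever (here, until the illustrative timeout), and consumption of every matched topic waits behind it.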
[jira] [Created] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader
Attila Sasvari created KAFKA-6703: - Summary: MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader Key: KAFKA-6703 URL: https://issues.apache.org/jira/browse/KAFKA-6703 Project: Kafka Issue Type: Bug Affects Versions: 1.1.0 Reporter: Attila Sasvari Scenario: - MM whitelabel regexp matches multiple topics - destination cluster has 5 brokers with multiple topics replication factor 3 - without partition reassign shut down 2 brokers - suppose a topic has no leader any more because it was off-sync and the leader and the rest of the replicas are hosted on the downed brokers. - so we have 1 topic with some partitions with leader -1 - the rest of the matching topics has 3 replicas with leaders MM will not produce into any of the matched topics until: - the "orphaned" topic removed or - the partition reassign carried out from the downed brokers (suppose you can turn these back on) In the MirrorMaker logs, there are a lot of messages like the following ones: {code} [2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-1, groupId=1] Sending FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-0, groupId=1] Sending FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Received FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, latencyMs=1, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=1-0, correlationId=71), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Received 
FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, latencyMs=1, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=1-1, correlationId=71), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Group coordinator lookup failed: The coordinator is not available. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Group coordinator lookup failed: The coordinator is not available. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) {code} Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer properties file, then an OldConsumer is created, and it can make progress. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6332) Kafka system tests should use nc instead of log grep to detect start-up
[ https://issues.apache.org/jira/browse/KAFKA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369119#comment-16369119 ] Attila Sasvari commented on KAFKA-6332: --- [~hachikuji] many thanks! I created a PR and all checks have passed. > Kafka system tests should use nc instead of log grep to detect start-up > --- > > Key: KAFKA-6332 > URL: https://issues.apache.org/jira/browse/KAFKA-6332 > Project: Kafka > Issue Type: Bug >Reporter: Ismael Juma >Assignee: Attila Sasvari >Priority: Major > Labels: newbie > > [~ewencp] suggested using nc -z test instead of grepping the logs for a more > reliable test. This came up when the system tests were broken by a log > improvement change. > Reference: https://github.com/apache/kafka/pull/3834 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6332) Kafka system tests should use nc instead of log grep to detect start-up
[ https://issues.apache.org/jira/browse/KAFKA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368328#comment-16368328 ] Attila Sasvari commented on KAFKA-6332: --- [~ijuma] I'd like to work on this task, can you please assign it to me? In {{kafka.py}}, we could add something similar to the [listening() function in zookeeper.py|https://github.com/apache/kafka/blob/ee352be9c88663b95b1096b7a294e61857857380/tests/kafkatest/services/zookeeper.py#L87]:
{code}
def listening(self, node, port):
    try:
        cmd = "nc -z %s %s" % (node.account.hostname, port)
        node.account.ssh_output(cmd, allow_fail=False)
        self.logger.debug("Kafka server started accepting connections at: '%s:%s'", node.account.hostname, port)
        return True
    except (RemoteCommandError, ValueError):
        return False
{code}
and call it in {{start_node}}:
{code}
def start_node(self, node):
    ...
    self.logger.debug("Attempting to start KafkaService on %s with command: %s" % (str(node.account), cmd))
    node.account.ssh(cmd)
    wait_until(lambda: self.listening(node, self.jmx_port), timeout_sec=30, backoff_sec=.25, err_msg="Kafka server didn't finish startup")
{code}
I have a [work in progress|https://github.com/asasvari/kafka/commit/916ab0b8291c7db2d16d84fc847630972628e345], executed a few tests ({{./tests/kafkatest/tests/core/consumer_group_command_test.py}}, {{./tests/kafkatest/tests/core/simple_consumer_shell_test.py}}) and they passed. However, {{listening()}} could be more general, so it might go into the {{utils}} module. I see there are references to {{monitor.wait_until}} grepping for messages in server logs at other places (like {{minikdc.py}}, {{connect.py}}, {{trogdor.py}}, {{streams.py}}) - we need to know the exact ports. 
> Kafka system tests should use nc instead of log grep to detect start-up > --- > > Key: KAFKA-6332 > URL: https://issues.apache.org/jira/browse/KAFKA-6332 > Project: Kafka > Issue Type: Bug >Reporter: Ismael Juma >Priority: Major > Labels: newbie > > [~ewencp] suggested using nc -z test instead of grepping the logs for a more > reliable test. This came up when the system tests were broken by a log > improvement change. > Reference: https://github.com/apache/kafka/pull/3834 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
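If {{nc}} is not guaranteed to exist on the worker hosts, the same liveness probe can be done from Python directly with a TCP connect. A sketch of what `nc -z host port` checks (the function name is illustrative, not part of the Kafka test framework):

```python
import socket

def port_open(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Equivalent of `nc -z host port`: True iff a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A helper like this could back the proposed {{listening()}} check without shelling out to {{nc}} at all, at the cost of probing from the test driver rather than from the node itself.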