[jira] [Commented] (KAFKA-6332) Kafka system tests should use nc instead of log grep to detect start-up

2019-03-13 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6332?focusedCommentId=16791592&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16791592
 ] 

Attila Sasvari commented on KAFKA-6332:
---

Feel free to take it over.

> Kafka system tests should use nc instead of log grep to detect start-up
> ---
>
> Key: KAFKA-6332
> URL: https://issues.apache.org/jira/browse/KAFKA-6332
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Priority: Major
>  Labels: newbie
>
> [~ewencp] suggested using an {{nc -z}} test instead of grepping the logs for a 
> more reliable start-up check. This came up when the system tests were broken by 
> a log improvement change.
> Reference: https://github.com/apache/kafka/pull/3834
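The suggested probe is easy to sketch without nc as well. A minimal, hypothetical readiness check (host, port, and timeout are placeholders, not values from the system tests) that gives the same signal as {{nc -z}}:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true once something is listening on host:port -- the same
    // signal `nc -z host port` gives -- without depending on log contents.
    static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A test harness would poll this in a loop with a short sleep until it returns true, instead of waiting for a particular start-up log line.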



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-6332) Kafka system tests should use nc instead of log grep to detect start-up

2019-03-13 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-6332:
-

Assignee: (was: Attila Sasvari)

> Kafka system tests should use nc instead of log grep to detect start-up
> ---
>
> Key: KAFKA-6332
> URL: https://issues.apache.org/jira/browse/KAFKA-6332
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Priority: Major
>  Labels: newbie
>
> [~ewencp] suggested using an {{nc -z}} test instead of grepping the logs for a 
> more reliable start-up check. This came up when the system tests were broken by 
> a log improvement change.
> Reference: https://github.com/apache/kafka/pull/3834





[jira] [Created] (KAFKA-7813) JmxTool throws NPE when --object-name is omitted

2019-01-11 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-7813:
-

 Summary: JmxTool throws NPE when --object-name is omitted
 Key: KAFKA-7813
 URL: https://issues.apache.org/jira/browse/KAFKA-7813
 Project: Kafka
  Issue Type: Bug
Reporter: Attila Sasvari


Running the JMX tool without the --object-name parameter results in a 
NullPointerException:
{code}
$ bin/kafka-run-class.sh kafka.tools.JmxTool  --jmx-url 
service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi
...
Exception in thread "main" java.lang.NullPointerException
at kafka.tools.JmxTool$$anonfun$3.apply(JmxTool.scala:143)
at kafka.tools.JmxTool$$anonfun$3.apply(JmxTool.scala:143)
at 
scala.collection.LinearSeqOptimized$class.exists(LinearSeqOptimized.scala:93)
at scala.collection.immutable.List.exists(List.scala:84)
at kafka.tools.JmxTool$.main(JmxTool.scala:143)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code} 

Documentation of the tool says:
{code}
--object-name  A JMX object name to use as a query.  
   This can contain wild cards, and
   this option can be given multiple   
   times to specify more than one  
   query. If no objects are specified  
   all objects will be queried.
{code}

Running the tool with {{--object-name ''}} also results in an NPE:
{code}
$ bin/kafka-run-class.sh kafka.tools.JmxTool  --jmx-url 
service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name ''
...
Exception in thread "main" java.lang.NullPointerException
at kafka.tools.JmxTool$.main(JmxTool.scala:197)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code}

Running the tool with --object-name but without an argument fails with an 
OptionMissingRequiredArgumentException:
{code}
$ bin/kafka-run-class.sh kafka.tools.JmxTool  --jmx-url 
service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name 

Exception in thread "main" joptsimple.OptionMissingRequiredArgumentException: 
Option object-name requires an argument
at 
joptsimple.RequiredArgumentOptionSpec.detectOptionArgument(RequiredArgumentOptionSpec.java:48)
at 
joptsimple.ArgumentAcceptingOptionSpec.handleOption(ArgumentAcceptingOptionSpec.java:257)
at joptsimple.OptionParser.handleLongOptionToken(OptionParser.java:513)
at 
joptsimple.OptionParserState$2.handleArgument(OptionParserState.java:56)
at joptsimple.OptionParser.parse(OptionParser.java:396)
at kafka.tools.JmxTool$.main(JmxTool.scala:104)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code}
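The documented fallback (query everything when no name is given) maps naturally onto a null ObjectName pattern. A minimal sketch against the local platform MBean server, illustrating the null-safe default the tool could use; this is an assumption about the fix, not the actual JmxTool change:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class QueryAllMBeans {
    // A null ObjectName pattern matches every registered MBean, which is
    // the fallback the --object-name documentation promises; guarding the
    // missing-option case this way would avoid the NPE above.
    static Set<ObjectName> queryAll(MBeanServer server) {
        return server.queryNames(null, null);
    }

    public static void main(String[] args) {
        Set<ObjectName> names = queryAll(ManagementFactory.getPlatformMBeanServer());
        System.out.println("MBeans found: " + names.size());
    }
}
```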






[jira] [Commented] (KAFKA-7752) zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended

2018-12-21 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7752?focusedCommentId=16726830&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16726830
 ] 

Attila Sasvari commented on KAFKA-7752:
---

In the kafka.zk.ZkData class, ZkAclStore.securePaths contains the following 
paths:
{code}
0 = "/kafka-acl"
1 = "/kafka-acl-changes"
2 = "/kafka-acl-extended/prefixed"
3 = "/kafka-acl-extended-changes"
{code}

When the migrator tool runs, ZkUtils.SecureZkRootPaths contains:
{code}
result = {$colon$colon@2669} "::" size = 14
 0 = "/admin"
 1 = "/brokers"
 2 = "/cluster"
 3 = "/config"
 4 = "/controller"
 5 = "/controller_epoch"
 6 = "/isr_change_notification"
 7 = "/latest_producer_id_block"
 8 = "/log_dir_event_notification"
 9 = "/delegation_token"
 10 = "/kafka-acl"
 11 = "/kafka-acl-changes"
 12 = "/kafka-acl-extended/prefixed"
 13 = "/kafka-acl-extended-changes"
{code}

Then the code recursively traverses these paths and sets ACLs on all the child 
znodes. Because the list contains {{/kafka-acl-extended/prefixed}} rather than 
its parent, {{/kafka-acl-extended}} itself is missed.
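The effect can be shown with a toy model of the walk (the znode names below are illustrative, not read from ZooKeeper): each configured root is visited together with its descendants, so a root of {{/kafka-acl-extended/prefixed}} never touches its parent.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SecurePathTraversal {
    // Toy model of the migration walk: starting from each configured root,
    // visit that root and every descendant. Since the configured root is
    // "/kafka-acl-extended/prefixed", the parent znode is never visited.
    static Set<String> visited(List<String> roots, List<String> allZnodes) {
        Set<String> out = new LinkedHashSet<>();
        for (String root : roots) {
            for (String znode : allZnodes) {
                if (znode.equals(root) || znode.startsWith(root + "/")) {
                    out.add(znode);
                }
            }
        }
        return out;
    }
}
```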

> zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended
> -
>
> Key: KAFKA-7752
> URL: https://issues.apache.org/jira/browse/KAFKA-7752
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.0.0
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>Priority: Major
>
> Executed {{zookeeper-security-migration.sh --zookeeper.connect $(hostname 
> -f):2181/kafka --zookeeper.acl secure}} to secure the Kafka znodes, and then 
> {{zookeeper-security-migration.sh --zookeeper.connect $(hostname 
> -f):2181/kafka --zookeeper.acl unsecure}} to unsecure them.
> I noticed that the tool did not remove ACLs on certain nodes: 
> {code}
> ] getAcl /kafka/kafka-acl-extended
> 'world,'anyone
> : r
> 'sasl,'kafka
> : cdrwa
> {code}





[jira] [Assigned] (KAFKA-7752) zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended

2018-12-21 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-7752:
-

Assignee: Attila Sasvari

> zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended
> -
>
> Key: KAFKA-7752
> URL: https://issues.apache.org/jira/browse/KAFKA-7752
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.0.0
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>Priority: Major
>
> Executed {{zookeeper-security-migration.sh --zookeeper.connect $(hostname 
> -f):2181/kafka --zookeeper.acl secure}} to secure the Kafka znodes, and then 
> {{zookeeper-security-migration.sh --zookeeper.connect $(hostname 
> -f):2181/kafka --zookeeper.acl unsecure}} to unsecure them.
> I noticed that the tool did not remove ACLs on certain nodes: 
> {code}
> ] getAcl /kafka/kafka-acl-extended
> 'world,'anyone
> : r
> 'sasl,'kafka
> : cdrwa
> {code}





[jira] [Created] (KAFKA-7752) zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended

2018-12-18 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-7752:
-

 Summary: zookeeper-security-migration.sh does not remove ACL on 
kafka-acl-extended
 Key: KAFKA-7752
 URL: https://issues.apache.org/jira/browse/KAFKA-7752
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 2.0.0
Reporter: Attila Sasvari


Executed {{zookeeper-security-migration.sh --zookeeper.connect $(hostname 
-f):2181/kafka --zookeeper.acl secure}} to secure the Kafka znodes, and then 
{{zookeeper-security-migration.sh --zookeeper.connect $(hostname -f):2181/kafka 
--zookeeper.acl unsecure}} to unsecure them.

I noticed that the tool did not remove ACLs on certain nodes: 
{code}
] getAcl /kafka/kafka-acl-extended
'world,'anyone
: r
'sasl,'kafka
: cdrwa
{code}





[jira] [Assigned] (KAFKA-7631) NullPointerException when SCRAM is allowed but ScramLoginModule is not in broker's jaas.conf

2018-12-13 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-7631:
-

Assignee: Viktor Somogyi  (was: Attila Sasvari)

> NullPointerException when SCRAM is allowed but ScramLoginModule is not in 
> broker's jaas.conf
> ---
>
> Key: KAFKA-7631
> URL: https://issues.apache.org/jira/browse/KAFKA-7631
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.0.0
>Reporter: Andras Beni
>Assignee: Viktor Somogyi
>Priority: Minor
>
> When a user wants to use delegation tokens and lists {{SCRAM}} in 
> {{sasl.enabled.mechanisms}} but does not add {{ScramLoginModule}} to the 
> broker's JAAS configuration, a NullPointerException is thrown on the broker 
> side and the connection is closed.
> A meaningful error message should be logged and sent back to the client.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.handleSaslToken(SaslServerAuthenticator.java:376)
> at 
> org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:262)
> at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:127)
> at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:489)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:427)
> at kafka.network.Processor.poll(SocketServer.scala:679)
> at kafka.network.Processor.run(SocketServer.scala:584)
> at java.lang.Thread.run(Thread.java:748)
> {code}
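A possible shape of the fix: {{Sasl.createSaslServer}} returns null (rather than throwing) when no registered factory supports the requested mechanism, which is analogous to SCRAM being enabled without ScramLoginModule. Checking for null surfaces a clear error instead of the NPE above. A standalone sketch; the mechanism, protocol, and server name are placeholders, and this is not the actual Kafka patch:

```java
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

public class SaslGuard {
    // Sasl.createSaslServer returns null when no SaslServerFactory can
    // produce a server for the mechanism. Failing fast with a descriptive
    // message avoids dereferencing the null later.
    static SaslServer createOrFail(String mechanism) throws SaslException {
        SaslServer server = Sasl.createSaslServer(mechanism, "kafka", "localhost", null, null);
        if (server == null) {
            throw new SaslException("No SaslServer factory for mechanism " + mechanism
                + "; is the matching login module configured in jaas.conf?");
        }
        return server;
    }
}
```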





[jira] [Commented] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured b

2018-12-04 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7696?focusedCommentId=16708633&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16708633
 ] 

Attila Sasvari commented on KAFKA-7696:
---

Thanks [~omkreddy]! Resolving this as a duplicate

> kafka-delegation-tokens.sh using a config file that contains 
> security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to 
> connect to an SSL-enabled secured broker
> -
>
> Key: KAFKA-7696
> URL: https://issues.apache.org/jira/browse/KAFKA-7696
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Attila Sasvari
>Assignee: Viktor Somogyi
>Priority: Major
>
> When the command-config file of kafka-delegation-tokens contains 
> security.protocol=SASL_PLAINTEXT instead of SASL_SSL (e.g. due to a user 
> error), the process throws a java.lang.OutOfMemoryError upon a connection 
> attempt to a secured (i.e. Kerberized, SSL-enabled) Kafka broker.
> {code}
> [2018-12-03 11:27:13,221] ERROR Uncaught exception in thread 
> 'kafka-admin-client-thread | adminclient-1': 
> (org.apache.kafka.common.utils.KafkaThread)
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:407)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveKafkaResponse(SaslClientAuthenticator.java:497)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:207)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:533)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:468)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1125)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
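The heap-space error is consistent with a plaintext-protocol client parsing the first bytes of the broker's TLS handshake as a 4-byte size prefix. A sketch of that misread (the exact handshake bytes are illustrative; real TLS records start with 0x16 followed by version bytes):

```java
import java.nio.ByteBuffer;

public class TlsMisread {
    // A plaintext client reads a 4-byte big-endian length prefix first.
    // If the broker is actually speaking TLS, the opening bytes of the
    // handshake record (0x16 0x03 ...) are parsed as that length, yielding
    // a request to allocate hundreds of megabytes -> OutOfMemoryError.
    static int misreadLength() {
        byte[] tlsRecordStart = {0x16, 0x03, 0x01, 0x00};
        return ByteBuffer.wrap(tlsRecordStart).getInt();
    }
}
```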





[jira] [Resolved] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured br

2018-12-04 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari resolved KAFKA-7696.
---
Resolution: Duplicate

> kafka-delegation-tokens.sh using a config file that contains 
> security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to 
> connect to an SSL-enabled secured broker
> -
>
> Key: KAFKA-7696
> URL: https://issues.apache.org/jira/browse/KAFKA-7696
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Attila Sasvari
>Assignee: Viktor Somogyi
>Priority: Major
>
> When the command-config file of kafka-delegation-tokens contains 
> security.protocol=SASL_PLAINTEXT instead of SASL_SSL (e.g. due to a user 
> error), the process throws a java.lang.OutOfMemoryError upon a connection 
> attempt to a secured (i.e. Kerberized, SSL-enabled) Kafka broker.
> {code}
> [2018-12-03 11:27:13,221] ERROR Uncaught exception in thread 
> 'kafka-admin-client-thread | adminclient-1': 
> (org.apache.kafka.common.utils.KafkaThread)
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:407)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveKafkaResponse(SaslClientAuthenticator.java:497)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:207)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:533)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:468)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1125)
>   at java.lang.Thread.run(Thread.java:745)
> {code}





[jira] [Assigned] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured br

2018-12-03 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-7696:
-

Assignee: Viktor Somogyi

> kafka-delegation-tokens.sh using a config file that contains 
> security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to 
> connect to an SSL-enabled secured broker
> -
>
> Key: KAFKA-7696
> URL: https://issues.apache.org/jira/browse/KAFKA-7696
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Attila Sasvari
>Assignee: Viktor Somogyi
>Priority: Major
>
> When the command-config file of kafka-delegation-tokens contains 
> security.protocol=SASL_PLAINTEXT instead of SASL_SSL (e.g. due to a user 
> error), the process throws a java.lang.OutOfMemoryError upon a connection 
> attempt to a secured (i.e. Kerberized, SSL-enabled) Kafka broker.
> {code}
> [2018-12-03 11:27:13,221] ERROR Uncaught exception in thread 
> 'kafka-admin-client-thread | adminclient-1': 
> (org.apache.kafka.common.utils.KafkaThread)
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:407)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveKafkaResponse(SaslClientAuthenticator.java:497)
>   at 
> org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:207)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:533)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:468)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1125)
>   at java.lang.Thread.run(Thread.java:745)
> {code}





[jira] [Created] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured bro

2018-12-03 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-7696:
-

 Summary: kafka-delegation-tokens.sh using a config file that 
contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries 
to connect to an SSL-enabled secured broker
 Key: KAFKA-7696
 URL: https://issues.apache.org/jira/browse/KAFKA-7696
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 2.0.0, 1.1.0
Reporter: Attila Sasvari


When the command-config file of kafka-delegation-tokens contains 
security.protocol=SASL_PLAINTEXT instead of SASL_SSL (e.g. due to a user 
error), the process throws a java.lang.OutOfMemoryError upon a connection 
attempt to a secured (i.e. Kerberized, SSL-enabled) Kafka broker.

{code}
[2018-12-03 11:27:13,221] ERROR Uncaught exception in thread 
'kafka-admin-client-thread | adminclient-1': 
(org.apache.kafka.common.utils.KafkaThread)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at 
org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:407)
at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveKafkaResponse(SaslClientAuthenticator.java:497)
at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:207)
at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:533)
at org.apache.kafka.common.network.Selector.poll(Selector.java:468)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535)
at 
org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1125)
at java.lang.Thread.run(Thread.java:745)
{code}





[jira] [Assigned] (KAFKA-7691) Encrypt-then-MAC Delegation token metadata

2018-11-30 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-7691:
-

Assignee: Attila Sasvari

> Encrypt-then-MAC Delegation token metadata
> -
>
> Key: KAFKA-7691
> URL: https://issues.apache.org/jira/browse/KAFKA-7691
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>Priority: Major
>
> Currently, delegation token metadata is stored unencrypted in ZooKeeper.
> Kafka brokers could implement a strategy called 
> [Encrypt-then-MAC|https://en.wikipedia.org/wiki/Authenticated_encryption#Encrypt-then-MAC_(EtM)]
>  to encrypt sensitive metadata information about delegation tokens.
> For more details, please read 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-395%3A+Encypt-then-MAC+Delegation+token+metadata
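For reference, the Encrypt-then-MAC composition can be sketched as follows. The algorithms, key sizes, and byte layout here are illustrative assumptions, not what KIP-395 specifies:

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EncryptThenMac {
    // Encrypt-then-MAC: encrypt first, then MAC the IV and ciphertext.
    // A reader verifies the tag before decrypting, so tampered metadata
    // is rejected without ever touching the cipher.
    static byte[] seal(byte[] plain, SecretKeySpec encKey, SecretKeySpec macKey) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, encKey, new IvParameterSpec(iv));
        byte[] ct = cipher.doFinal(plain);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        mac.update(iv);
        byte[] tag = mac.doFinal(ct);               // 32-byte HMAC-SHA256 tag
        byte[] out = new byte[16 + ct.length + 32]; // layout: iv || ct || tag
        System.arraycopy(iv, 0, out, 0, 16);
        System.arraycopy(ct, 0, out, 16, ct.length);
        System.arraycopy(tag, 0, out, 16 + ct.length, 32);
        return out;
    }

    static byte[] open(byte[] sealed, SecretKeySpec encKey, SecretKeySpec macKey) throws Exception {
        byte[] iv = Arrays.copyOfRange(sealed, 0, 16);
        byte[] ct = Arrays.copyOfRange(sealed, 16, sealed.length - 32);
        byte[] tag = Arrays.copyOfRange(sealed, sealed.length - 32, sealed.length);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        mac.update(iv);
        if (!MessageDigest.isEqual(tag, mac.doFinal(ct))) {
            throw new SecurityException("MAC verification failed"); // reject before decrypting
        }
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, encKey, new IvParameterSpec(iv));
        return cipher.doFinal(ct);
    }
}
```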





[jira] [Created] (KAFKA-7691) Encrypt-then-MAC Delegation token metadata

2018-11-30 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-7691:
-

 Summary: Encrypt-then-MAC Delegation token metadata
 Key: KAFKA-7691
 URL: https://issues.apache.org/jira/browse/KAFKA-7691
 Project: Kafka
  Issue Type: Improvement
Reporter: Attila Sasvari


Currently, delegation token metadata is stored unencrypted in ZooKeeper.

Kafka brokers could implement a strategy called 
[Encrypt-then-MAC|https://en.wikipedia.org/wiki/Authenticated_encryption#Encrypt-then-MAC_(EtM)]
 to encrypt sensitive metadata information about delegation tokens.

For more details, please read 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-395%3A+Encypt-then-MAC+Delegation+token+metadata





[jira] [Assigned] (KAFKA-7631) NullPointerException when SCRAM is allowed but ScramLoginModule is not in broker's jaas.conf

2018-11-15 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-7631:
-

Assignee: Attila Sasvari

> NullPointerException when SCRAM is allowed but ScramLoginModule is not in 
> broker's jaas.conf
> ---
>
> Key: KAFKA-7631
> URL: https://issues.apache.org/jira/browse/KAFKA-7631
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.0.0
>Reporter: Andras Beni
>Assignee: Attila Sasvari
>Priority: Minor
>
> When a user wants to use delegation tokens and lists {{SCRAM}} in 
> {{sasl.enabled.mechanisms}} but does not add {{ScramLoginModule}} to the 
> broker's JAAS configuration, a NullPointerException is thrown on the broker 
> side and the connection is closed.
> A meaningful error message should be logged and sent back to the client.
> {code}
> java.lang.NullPointerException
> at 
> org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.handleSaslToken(SaslServerAuthenticator.java:376)
> at 
> org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:262)
> at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:127)
> at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:489)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:427)
> at kafka.network.Processor.poll(SocketServer.scala:679)
> at kafka.network.Processor.run(SocketServer.scala:584)
> at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Created] (KAFKA-7455) JmxTool cannot connect to an SSL-enabled JMX RMI port

2018-09-28 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-7455:
-

 Summary: JmxTool cannot connect to an SSL-enabled JMX RMI port
 Key: KAFKA-7455
 URL: https://issues.apache.org/jira/browse/KAFKA-7455
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Attila Sasvari


When JmxTool tries to connect to an SSL-enabled JMX RMI port with 
JMXConnectorFactory.connect(), the connection attempt results in a 
"java.rmi.ConnectIOException: non-JRMP server at remote endpoint":

{code}
$ export 
KAFKA_OPTS="-Djavax.net.ssl.trustStore=/tmp/kafka.server.truststore.jks 
-Djavax.net.ssl.trustStorePassword=test"

$ bin/kafka-run-class.sh kafka.tools.JmxTool --object-name 
"kafka.server:type=kafka-metrics-count"  --jmx-url 
service:jmx:rmi:///jndi/rmi://localhost:9393/jmxrmi

ConnectIOException: non-JRMP server at remote endpoint].
java.io.IOException: Failed to retrieve RMIServer stub: 
javax.naming.CommunicationException [Root exception is 
java.rmi.ConnectIOException: non-JRMP server at remote endpoint]
at 
javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
at 
javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at kafka.tools.JmxTool$.main(JmxTool.scala:120)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code}

The problem is that {{JmxTool}} does not specify {{SslRMIClientSocketFactory}} 
when it tries to connect:
https://github.com/apache/kafka/blob/70d90c371833b09cf934c8c2358171433892a085/core/src/main/scala/kafka/tools/JmxTool.scala#L120
{code}  
  jmxc = JMXConnectorFactory.connect(url, null)
{code}
To connect to a secured RMI port, it should pass an environment map that 
contains a {{("com.sun.jndi.rmi.factory.socket", new 
SslRMIClientSocketFactory)}} entry.
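A sketch of what that environment map could look like; this is a proposed shape, not the actual JmxTool patch:

```java
import java.util.HashMap;
import java.util.Map;
import javax.rmi.ssl.SslRMIClientSocketFactory;

public class SecureJmxEnv {
    // Build the JMX environment map whose socket-factory entry makes the
    // RMI registry lookup use SSL client sockets. JmxTool would then call
    // JMXConnectorFactory.connect(url, sslEnv()) instead of connect(url, null).
    static Map<String, Object> sslEnv() {
        Map<String, Object> env = new HashMap<>();
        env.put("com.sun.jndi.rmi.factory.socket", new SslRMIClientSocketFactory());
        return env;
    }
}
```

The truststore settings from KAFKA_OPTS above still apply; the map only tells the RMI layer which socket factory to use.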

More info:
- https://docs.oracle.com/cd/E19698-01/816-7609/security-35/index.html
- https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html





[jira] [Commented] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands

2018-09-19 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7418?focusedCommentId=16620538&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16620538
 ] 

Attila Sasvari commented on KAFKA-7418:
---

- I started a discussion thread about the KIP.
- [~mrsrinivas] please request contributor permissions via an email sent to 
d...@kafka.apache.org (including your JIRA ID). A Kafka committer or PMC member 
will add you to Kafka contributors, and then you will be able to assign the 
JIRA to yourself. 

> Add '--help' option to all available Kafka CLI commands 
> 
>
> Key: KAFKA-7418
> URL: https://issues.apache.org/jira/browse/KAFKA-7418
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Attila Sasvari
>Priority: Major
>
> Currently, the '--help' option is not recognized by some Kafka commands. For 
> example:
> {code}
> $ kafka-console-producer --help
> help is not a recognized option
> {code}
> However, the '--help' option is supported by other commands:
> {code}
> $ kafka-verifiable-producer --help
> usage: verifiable-producer [-h] --topic TOPIC --broker-list 
> HOST1:PORT1[,HOST2:PORT2[...]] [--max-messages MAX-MESSAGES] [--throughput 
> THROUGHPUT] [--acks ACKS]
>[--producer.config CONFIG_FILE] 
> [--message-create-time CREATETIME] [--value-prefix VALUE-PREFIX]
> ...
> {code} 
> To provide a consistent user experience, it would be nice to add a '--help' 
> option to all Kafka commands.
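A minimal, standalone sketch of the consistent behavior being asked for; the real tools use joptsimple (where the idiomatic equivalent is an option marked with forHelp()), so this helper is only illustrative:

```java
public class HelpFlag {
    // Recognize --help/-h up front so every CLI entry point can print usage
    // instead of dying with "help is not a recognized option".
    static boolean wantsHelp(String[] args) {
        for (String arg : args) {
            if (arg.equals("--help") || arg.equals("-h")) {
                return true;
            }
        }
        return false;
    }
}
```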





[jira] [Commented] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands

2018-09-19 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7418?focusedCommentId=16620470&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16620470
 ] 

Attila Sasvari commented on KAFKA-7418:
---

- Created a KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-373%3A+Add+%27--help%27+option+to+all+available+Kafka+CLI+commands
I will send out an email to the kafka-dev mailing list soon.

> Add '--help' option to all available Kafka CLI commands 
> 
>
> Key: KAFKA-7418
> URL: https://issues.apache.org/jira/browse/KAFKA-7418
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Attila Sasvari
>Priority: Major
>
> Currently, the '--help' option is not recognized by some Kafka commands. For 
> example:
> {code}
> $ kafka-console-producer --help
> help is not a recognized option
> {code}
> However, the '--help' option is supported by other commands:
> {code}
> $ kafka-verifiable-producer --help
> usage: verifiable-producer [-h] --topic TOPIC --broker-list 
> HOST1:PORT1[,HOST2:PORT2[...]] [--max-messages MAX-MESSAGES] [--throughput 
> THROUGHPUT] [--acks ACKS]
>[--producer.config CONFIG_FILE] 
> [--message-create-time CREATETIME] [--value-prefix VALUE-PREFIX]
> ...
> {code} 
> To provide a consistent user experience, it would be nice to add a '--help' 
> option to all Kafka commands.





[jira] [Commented] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands

2018-09-19 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7418?focusedCommentId=16620459&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16620459
 ] 

Attila Sasvari commented on KAFKA-7418:
---

[~mrsrinivas] sure, that is totally fine with me. However, we probably need to 
write a KIP: adding a new option to the commands changes their public interface 
(even though it is only a backward-compatible addition). [~ijuma], [~hachikuji] 
what do you think?

> Add '--help' option to all available Kafka CLI commands 
> 
>
> Key: KAFKA-7418
> URL: https://issues.apache.org/jira/browse/KAFKA-7418
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Attila Sasvari
>Priority: Major
>
> Currently, the '--help' option is not recognized by some Kafka commands. For 
> example:
> {code}
> $ kafka-console-producer --help
> help is not a recognized option
> {code}
> However, the '--help' option is supported by other commands:
> {code}
> $ kafka-verifiable-producer --help
> usage: verifiable-producer [-h] --topic TOPIC --broker-list 
> HOST1:PORT1[,HOST2:PORT2[...]] [--max-messages MAX-MESSAGES] [--throughput 
> THROUGHPUT] [--acks ACKS]
>[--producer.config CONFIG_FILE] 
> [--message-create-time CREATETIME] [--value-prefix VALUE-PREFIX]
> ...
> {code} 
> To provide a consistent user experience, it would be nice to add a '--help' 
> option to all Kafka commands.





[jira] [Created] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands

2018-09-18 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-7418:
-

 Summary: Add '--help' option to all available Kafka CLI commands 
 Key: KAFKA-7418
 URL: https://issues.apache.org/jira/browse/KAFKA-7418
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Reporter: Attila Sasvari


Currently, the '--help' option is not recognized by some Kafka commands. For 
example:
{code}
$ kafka-console-producer --help
help is not a recognized option
{code}

However, the '--help' option is supported by other commands:
{code}
$ kafka-verifiable-producer --help
usage: verifiable-producer [-h] --topic TOPIC --broker-list 
HOST1:PORT1[,HOST2:PORT2[...]] [--max-messages MAX-MESSAGES] [--throughput 
THROUGHPUT] [--acks ACKS]
   [--producer.config CONFIG_FILE] 
[--message-create-time CREATETIME] [--value-prefix VALUE-PREFIX]

...
{code} 

To provide a consistent user experience, it would be nice to add a '--help' 
option to all Kafka commands.






[jira] [Assigned] (KAFKA-7392) Allow to specify subnet for Docker containers using standard CIDR notation

2018-09-09 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-7392:
-

Assignee: Attila Sasvari

> Allow to specify subnet for Docker containers using standard CIDR notation
> --
>
> Key: KAFKA-7392
> URL: https://issues.apache.org/jira/browse/KAFKA-7392
> Project: Kafka
>  Issue Type: Improvement
>  Components: system tests
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>Priority: Major
>
> During Kafka system test execution, the IP range of the Docker subnet, 
> 'ducknet' is allocated by Docker.
> {code}
> docker network inspect ducknet
> [
> {
> "Name": "ducknet",
> "Id": 
> "f4325c524feee777817b9cc57b91634e20de96127409c1906c2c156bfeb4beeb",
> "Created": "2018-09-09T11:53:40.4332613Z",
> "Scope": "local",
> "Driver": "bridge",
> "EnableIPv6": false,
> "IPAM": {
> "Driver": "default",
> "Options": {},
> "Config": [
> {
> "Subnet": "172.23.0.0/16",
> "Gateway": "172.23.0.1"
> }
> ]
> },
> {code}
> The default bridge (docker0) can be controlled 
> [externally|https://success.docker.com/article/how-do-i-configure-the-default-bridge-docker0-network-for-docker-engine-to-a-different-subnet]
> through /etc/docker/daemon.json; however, the subnet created by ducknet is not. 
> This can be a problem, as many businesses make extensive use of the 
> [RFC1918|https://tools.ietf.org/html/rfc1918] private address space (such as 
> 172.16.0.0/12 : 172.16.0.0 - 172.31.255.255) for internal networks (e.g. VPN).
> h4. Proposed changes:
> - Introduce a new subnet argument that can be used by {{ducker-ak up}} to 
> specify IP range using standard CIDR, extend help message with the following:
> {code}
> If --subnet is specified, the default Docker subnet is overridden by the given IP 
> address and netmask,
> using standard CIDR notation. For example: 192.168.1.5/24.
> {code}





[jira] [Created] (KAFKA-7392) Allow to specify subnet for Docker containers using standard CIDR notation

2018-09-09 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-7392:
-

 Summary: Allow to specify subnet for Docker containers using 
standard CIDR notation
 Key: KAFKA-7392
 URL: https://issues.apache.org/jira/browse/KAFKA-7392
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Attila Sasvari


During Kafka system test execution, the IP range of the Docker subnet, 
'ducknet' is allocated by Docker.
{code}
docker network inspect ducknet
[
{
"Name": "ducknet",
"Id": 
"f4325c524feee777817b9cc57b91634e20de96127409c1906c2c156bfeb4beeb",
"Created": "2018-09-09T11:53:40.4332613Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "172.23.0.0/16",
"Gateway": "172.23.0.1"
}
]
},
{code}

The default bridge (docker0) can be controlled 
[externally|https://success.docker.com/article/how-do-i-configure-the-default-bridge-docker0-network-for-docker-engine-to-a-different-subnet]
through /etc/docker/daemon.json; however, the subnet created by ducknet is not. This 
can be a problem, as many businesses make extensive use of the 
[RFC1918|https://tools.ietf.org/html/rfc1918] private address space (such as 
172.16.0.0/12 : 172.16.0.0 - 172.31.255.255) for internal networks (e.g. VPN).

h4. Proposed changes:
- Introduce a new subnet argument that can be used by {{ducker-ak up}} to 
specify IP range using standard CIDR, extend help message with the following:
{code}
If --subnet is specified, the default Docker subnet is overridden by the given IP 
address and netmask,
using standard CIDR notation. For example: 192.168.1.5/24.
{code}
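
A sketch of how such an argument could be validated before being handed to Docker (using Python's standard ipaddress module; the function name is mine, not part of any patch):

```python
import ipaddress

def parse_subnet(cidr):
    """Validate a --subnet value given in standard CIDR notation.
    strict=False accepts a host address such as 192.168.1.5/24 and
    normalizes it to its network, 192.168.1.0/24; anything that is
    not valid CIDR raises ValueError."""
    return str(ipaddress.ip_network(cidr, strict=False))

print(parse_subnet("192.168.1.5/24"))   # 192.168.1.0/24
```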







[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication

2018-09-07 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607140#comment-16607140
 ] 

Attila Sasvari commented on KAFKA-4544:
---

Thanks [~omkreddy], I see your point. Today it is not possible, as 
--consumer-property is not exposed by the Python wrapper (nor is kafka_opts), 
but my patch will make console-consumer.py smarter.

> Add system tests for delegation token based authentication
> --
>
> Key: KAFKA-4544
> URL: https://issues.apache.org/jira/browse/KAFKA-4544
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ashish Singh
>Assignee: Attila Sasvari
>Priority: Major
>
> Add system tests for delegation token based authentication.





[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication

2018-09-06 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605944#comment-16605944
 ] 

Attila Sasvari commented on KAFKA-4544:
---

[~omkreddy] thanks for the info, I extended the test case to better cover the 
lifecycle of a delegation token based on your idea:
- Create delegation token
- Create a console-producer using SCRAM and delegation token and produce a 
message
- Verify the message is created (with kafka.search_data_files())
- Create a console-consumer using SCRAM and delegation token and consume 1 
message 
- Expire the token, immediately
- Try producing one more message with the expired token
- Verify the last message is not persisted by the broker

Initially, I wanted to use console_consumer.py and verifiable clients to 
validate things (messages produced / consumed), but I ran into some issues:
- jaas.conf / KafkaClient config cannot include more login modules 
{code}  
Multiple LoginModule-s in JAAS 
Caused by: java.lang.IllegalArgumentException: JAAS config property contains 2 
login modules, should be 1 module
at 
org.apache.kafka.common.security.JaasContext.load(JaasContext.java:95)
at 
org.apache.kafka.common.security.JaasContext.loadClientContext(JaasContext.java:84)
at 
org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:119)
at 
org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:65)
at 
org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:88)
at 
org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:419)
{code}
- To request a delegation token, we need GSSAPI (with a keytab); subsequently, 
consumers and producers use the delegation token. So I ended up constructing 
the jaas.config and client configs manually in my POC. 
- Both with and without my changes, JMX failed to start up when I tried to run 
{{./ducker-ak test ../kafkatest/sanity_checks/test_console_consumer.py}}:
{code}
Exception in thread ConsoleConsumer-0-140287252789520-worker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
  File 
"/usr/local/lib/python2.7/dist-packages/ducktape/services/background_thread.py",
 line 35, in _protected_worker
self._worker(idx, node)
  File "/opt/kafka-dev/tests/kafkatest/services/console_consumer.py", line 229, 
in _worker
self.start_jmx_tool(idx, node)
  File "/opt/kafka-dev/tests/kafkatest/services/monitor/jmx.py", line 86, in 
start_jmx_tool
wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 
36, in wait_until
raise TimeoutError(err_msg)
TimeoutError: ducker@ducker04: Jmx tool took too long to start
{code}
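
For reference, the client-side JAAS entry for delegation token authentication (per KIP-48) uses a single SCRAM login module; the token ID and HMAC below are placeholders. A KafkaClient context may contain only one login module, which is why combining it with a GSSAPI module fails with the "2 login modules" error above:

```
// A sketch, not from the POC: client JAAS for delegation tokens (KIP-48).
// <token-id> and <token-hmac> are placeholders for the values returned
// by kafka-delegation-tokens.sh --create.
KafkaClient {
  org.apache.kafka.common.security.scram.ScramLoginModule required
    username="<token-id>"
    password="<token-hmac>"
    tokenauth="true";
};
```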

Right now a lot of things are 
[hardcoded|https://github.com/asasvari/kafka/commit/edfc37012079764d2a589dbf5f24ad04505975d4#diff-3e7b2bdbd55d075bcebbbe5ba8c4e269]
(using shell scripts) in my POC. It would be nice to extract common 
functionality and make it easily reusable (e.g. by creating Python wrappers 
to handle delegation tokens). 
 

> Add system tests for delegation token based authentication
> --
>
> Key: KAFKA-4544
> URL: https://issues.apache.org/jira/browse/KAFKA-4544
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ashish Singh
>Assignee: Attila Sasvari
>Priority: Major
>
> Add system tests for delegation token based authentication.





[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication

2018-09-05 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604530#comment-16604530
 ] 

Attila Sasvari commented on KAFKA-4544:
---

[~omkreddy] thanks a lot!
- I started to work on this here: 
[https://github.com/asasvari/kafka/commit/6ce766c3ec17b7787415d278a9b59f15ed197c1c],
 can you take a quick look?
- What kind of tests did you plan? We might want to also create subtasks for 
this ticket as it would be easier to review the changes.
- So far I have only added a new test to verify that we can create a delegation 
token with {{kafka-delegation-tokens.sh}}. 
- I believe another basic test is to start a console application (e.g. 
kafka-console-consumer) and verify whether it can connect to the broker using the 
previously generated delegation token. 


> Add system tests for delegation token based authentication
> --
>
> Key: KAFKA-4544
> URL: https://issues.apache.org/jira/browse/KAFKA-4544
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ashish Singh
>Assignee: Attila Sasvari
>Priority: Major
>
> Add system tests for delegation token based authentication.





[jira] [Assigned] (KAFKA-4544) Add system tests for delegation token based authentication

2018-09-05 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-4544:
-

Assignee: Attila Sasvari  (was: Manikumar)

> Add system tests for delegation token based authentication
> --
>
> Key: KAFKA-4544
> URL: https://issues.apache.org/jira/browse/KAFKA-4544
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ashish Singh
>Assignee: Attila Sasvari
>Priority: Major
>
> Add system tests for delegation token based authentication.





[jira] [Commented] (KAFKA-4544) Add system tests for delegation token based authentication

2018-08-31 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598557#comment-16598557
 ] 

Attila Sasvari commented on KAFKA-4544:
---

Do you have the capacity to work on this [~omkreddy]? If there's anything I can 
do to help, please let me know.

> Add system tests for delegation token based authentication
> --
>
> Key: KAFKA-4544
> URL: https://issues.apache.org/jira/browse/KAFKA-4544
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Ashish Singh
>Assignee: Manikumar
>Priority: Major
>
> Add system tests for delegation token based authentication.





[jira] [Commented] (KAFKA-7289) Performance tools should allow user to specify output type

2018-08-15 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580792#comment-16580792
 ] 

Attila Sasvari commented on KAFKA-7289:
---

- I will start the KIP soon. It might be enough to deal with CSV in the first 
run.
- I have started to work on a POC that allows ProducerPerformance to print out 
final results to an output file in CSV format: 
https://github.com/asasvari/kafka/commit/82bbff649c5afb2c30f56960172319d2c380fbcd.
It will probably be a subtask of this JIRA. It uses Apache Commons CSV. 
{{--print-metrics}} is not handled (metrics are not written to the output file 
of the final results - could be handled in KAFKA-1939).
{code}
$ bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance
...
  --output-with-header   Print out final results to output file with headers. 
(default: false)
  --output-type OUTPUT-TYPE
 Format type of the output file. By default it is CSV. 
(default: csv)
  --output-path OUTPUT-PATH
 Write final results to the file OUTPUT-PATH.

$ bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic 
TOPIC --num-records 1 --throughput -1 --record-size  100 --producer-props 
bootstrap.servers=localhost:9092 --output-path  producer_stats.csv 
--output-with-header

$ cat producer_stats.csv 
records sent,records/sec,MB/sec,ms avg latency,ms max latency,ms 50th,ms 
95th,ms 99th,ms 99.9th
1,6.7114093959731544,6.400498767827181E-4,142.0,142.0,142,142,142,142
{code}
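
The POC uses Apache Commons CSV on the Java side; purely as an illustration of the same output layout (the names here are mine, not from the patch), the header and a summary row can be rendered with Python's csv module:

```python
import csv
import io

# Header columns mirror the fields of ProducerPerformance's final summary line.
HEADER = ["records sent", "records/sec", "MB/sec", "ms avg latency",
          "ms max latency", "ms 50th", "ms 95th", "ms 99th", "ms 99.9th"]

def format_stats_csv(row):
    """Render one summary row, preceded by the header, as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerow(row)
    return buf.getvalue()

print(format_stats_csv([1, 6.71, 0.00064, 142.0, 142.0, 142, 142, 142, 142]))
```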


> Performance tools should allow user to specify output type
> --
>
> Key: KAFKA-7289
> URL: https://issues.apache.org/jira/browse/KAFKA-7289
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.0.0
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>Priority: Major
>
> Currently, org.apache.kafka.tools.ProducerPerformance and 
> kafka.tools.ConsumerPerformance do not provide command line options to 
> specify output type(s).
> Sample output of ProducerPerformance is as follows:
> {code}
> 1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg 
> latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, 
> 3842 ms 99.9th.
> {code}
> It would be, however, nice to allow users to generate performance reports in 
> a machine-readable format (such as CSV and JSON). This way, performance 
> results could be easily processed by external applications (e.g. displayed in 
> charts).
> It will probably require a KIP.





[jira] [Assigned] (KAFKA-7289) Performance tools should allow user to specify output type

2018-08-15 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-7289:
-

Assignee: Attila Sasvari

> Performance tools should allow user to specify output type
> --
>
> Key: KAFKA-7289
> URL: https://issues.apache.org/jira/browse/KAFKA-7289
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.0.0
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>Priority: Major
>
> Currently, org.apache.kafka.tools.ProducerPerformance and 
> kafka.tools.ConsumerPerformance do not provide command line options to 
> specify output type(s).
> Sample output of ProducerPerformance is as follows:
> {code}
> 1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg 
> latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, 
> 3842 ms 99.9th.
> {code}
> It would be, however, nice to allow users to generate performance reports in 
> a machine-readable format (such as CSV and JSON). This way, performance 
> results could be easily processed by external applications (e.g. displayed in 
> charts).
> It will probably require a KIP.





[jira] [Commented] (KAFKA-7289) Performance tools should allow user to specify output type

2018-08-14 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579719#comment-16579719
 ] 

Attila Sasvari commented on KAFKA-7289:
---

It is a generalization of KAFKA-1939.

> Performance tools should allow user to specify output type
> --
>
> Key: KAFKA-7289
> URL: https://issues.apache.org/jira/browse/KAFKA-7289
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.0.0
>Reporter: Attila Sasvari
>Priority: Major
>
> Currently, org.apache.kafka.tools.ProducerPerformance and 
> kafka.tools.ConsumerPerformance do not provide command line options to 
> specify output type(s).
> Sample output of ProducerPerformance is as follows:
> {code}
> 1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg 
> latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, 
> 3842 ms 99.9th.
> {code}
> It would be, however, nice to allow users to generate performance reports in 
> a machine-readable format (such as CSV and JSON). This way, performance 
> results could be easily processed by external applications (e.g. displayed in 
> charts).
> It will probably require a KIP.





[jira] [Created] (KAFKA-7289) Performance tools should allow user to specify output type

2018-08-14 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-7289:
-

 Summary: Performance tools should allow user to specify output type
 Key: KAFKA-7289
 URL: https://issues.apache.org/jira/browse/KAFKA-7289
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.0.0
Reporter: Attila Sasvari


Currently, org.apache.kafka.tools.ProducerPerformance and 
kafka.tools.ConsumerPerformance do not provide command line options to specify 
output type(s).

Sample output of ProducerPerformance is as follows:
{code}
1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg 
latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, 3842 
ms 99.9th.
{code}

It would be, however, nice to allow users to generate performance reports in a 
machine-readable format (such as CSV and JSON). This way, performance results 
could be easily processed by external applications (e.g. displayed in charts).

It will probably require a KIP.





[jira] [Commented] (KAFKA-7134) KafkaLog4jAppender - Appender exceptions are propagated to caller

2018-07-26 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558325#comment-16558325
 ] 

Attila Sasvari commented on KAFKA-7134:
---

[~venkatpotru] please note that if the underlying producer cannot connect to a Kafka 
broker (because the broker is not running), {{send()}} will fail and throw a 
{{TimeoutException}}. The producer notices the failure and tries to log it:
https://github.com/apache/kafka/blob/1d9a427225c64e7629a4eb2e2d129d5551185049/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L876

However, if the {{rootLogger}} in {{log4j.properties}} is set to the 
{{KafkaLog4jAppender}}, it will try to log this message by sending it to Kafka, 
which creates an infinite loop. 
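
One common way to break that cycle (a sketch, assuming the appender from the kafka-log4j-appender module; the logger name com.example.app, broker list, and topic are placeholders) is to keep Kafka's own loggers off the Kafka appender:

```
# Sketch only: route application logs to Kafka, but never Kafka's own logs.
log4j.rootLogger=INFO, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p %c - %m%n

log4j.appender.KAFKA=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.KAFKA.brokerList=localhost:9092
log4j.appender.KAFKA.topic=app-logs
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%m

# Only the application's logger feeds the KAFKA appender; the producer's
# own error logs go to stdout, so a failing send() cannot log into itself.
log4j.logger.com.example.app=INFO, KAFKA
log4j.logger.org.apache.kafka=INFO, stdout
log4j.additivity.org.apache.kafka=false
```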

> KafkaLog4jAppender - Appender exceptions are propagated to caller
> -
>
> Key: KAFKA-7134
> URL: https://issues.apache.org/jira/browse/KAFKA-7134
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: venkata praveen
>Assignee: Andras Katona
>Priority: Major
>
> KafkaLog4jAppender exceptions are propagated to caller when Kafka is 
> down/slow/other, it may cause the application crash. Ideally appender should 
> print and ignore the exception
>  or should provide option to ignore/throw the exceptions like 
> 'ignoreExceptions' property of 
> https://logging.apache.org/log4j/2.x/manual/appenders.html#KafkaAppender





[jira] [Resolved] (KAFKA-7159) mark configuration files in confluent-kafka RPM SPEC file

2018-07-25 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari resolved KAFKA-7159.
---
Resolution: Won't Fix

> mark configuration files in confluent-kafka RPM SPEC file
> -
>
> Key: KAFKA-7159
> URL: https://issues.apache.org/jira/browse/KAFKA-7159
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Affects Versions: 1.1.0
> Environment: RHEL7
>Reporter: Robert Fabisiak
>Priority: Trivial
>  Labels: rpm
>
> All configuration files in kafka RPM SPEC file should be marked with %config 
> prefix in %files section.
> This would prevent overwrites during install/upgrade and uninstall operations
> [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/rpm_packaging_guide/index#files]
> It's especially important to save configuration during package upgrades.
> Section to change in SPEC file:
> {code:java}
> %files
> %config(noreplace) %{_sysconfdir}/kafka/*.conf
> %config(noreplace) %{_sysconfdir}/kafka/*.properties
> {code}
> It would also be good to mark documentation files with %doc





[jira] [Commented] (KAFKA-7159) mark configuration files in confluent-kafka RPM SPEC file

2018-07-20 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550735#comment-16550735
 ] 

Attila Sasvari commented on KAFKA-7159:
---

- Are you sure you filed the JIRA to the proper project? There is no code 
matching '%files' in apache/kafka - 
https://github.com/apache/kafka/search?q=%25files_q=%25files
- However, Apache Bigtop has some code for packaging kafka, but it was not 
updated for a while: 
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/rpm/kafka/SPECS/kafka.spec
- {{confluent-kafka RPM SPEC}} indicates you might have wanted to report this 
to Confluent

> mark configuration files in confluent-kafka RPM SPEC file
> -
>
> Key: KAFKA-7159
> URL: https://issues.apache.org/jira/browse/KAFKA-7159
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Affects Versions: 1.1.0
> Environment: RHEL7
>Reporter: Robert Fabisiak
>Priority: Trivial
>  Labels: rpm
>
> All configuration files in kafka RPM SPEC file should be marked with %config 
> prefix in %files section.
> This would prevent overwrites during install/upgrade and uninstall operations
> [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/rpm_packaging_guide/index#files]
> It's especially important to save configuration during package upgrades.
> Section to change in SPEC file:
> {code:java}
> %files
> %config(noreplace) %{_sysconfdir}/kafka/*.conf
> %config(noreplace) %{_sysconfdir}/kafka/*.properties
> {code}
> It would also be good to mark documentation files with %doc





[jira] [Assigned] (KAFKA-6960) Remove the methods from the internal Scala AdminClient that are provided by the new AdminClient

2018-07-19 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-6960:
-

Assignee: Attila Sasvari

> Remove the methods from the internal Scala AdminClient that are provided by 
> the new AdminClient
> ---
>
> Key: KAFKA-6960
> URL: https://issues.apache.org/jira/browse/KAFKA-6960
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Attila Sasvari
>Assignee: Attila Sasvari
>Priority: Major
>
> This is a follow-up task of KAFKA-6884. 
> We should remove all the methods from the internal Scala AdminClient that are 
> provided by the new AdminClient. To "safe delete" them (i.e. 
> {{deleteConsumerGroups, describeConsumerGroup, listGroups, listAllGroups,  
> listAllGroupsFlattened}}), related tests need to be reviewed and adjusted 
> (for example: the tests in core_tests and streams_test). 





[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-07-19 Thread Attila Sasvari (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549082#comment-16549082
 ] 

Attila Sasvari commented on KAFKA-6884:
---

pull request was merged, thanks for the reviews!

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
> Fix For: 2.1.0
>
>
> Now that we have KIP-222, we should update ConsumerGroupCommand to use the 
> new AdminClient.





[jira] [Assigned] (KAFKA-7173) Update BrokerApiVersionsCommand to use new AdminClient

2018-07-17 Thread Attila Sasvari (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-7173:
-

Assignee: Attila Sasvari

> Update BrokerApiVersionsCommand to use new AdminClient
> --
>
> Key: KAFKA-7173
> URL: https://issues.apache.org/jira/browse/KAFKA-7173
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>  Labels: needs-kip
>
> This tool seems to be the last use of the deprecated scala AdminClient. This 
> may require a KIP to fix since the new AdminClient doesn't seem to expose 
> broker API versions.





[jira] [Created] (KAFKA-6960) Remove the methods from the internal Scala AdminClient that are provided by the new AdminClient

2018-05-28 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-6960:
-

 Summary: Remove the methods from the internal Scala AdminClient 
that are provided by the new AdminClient
 Key: KAFKA-6960
 URL: https://issues.apache.org/jira/browse/KAFKA-6960
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Attila Sasvari


This is a follow-up task of KAFKA-6884. 

We should remove all the methods from the internal Scala AdminClient that are 
provided by the new AdminClient. To "safe delete" them (i.e. 
{{deleteConsumerGroups, describeConsumerGroup, listGroups, listAllGroups,  
listAllGroupsFlattened}}), related tests need to be reviewed and adjusted (for 
example: the tests in core_tests and streams_test). 





[jira] [Updated] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-05-17 Thread Attila Sasvari (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari updated KAFKA-6884:
--
Description: Now that we have KIP-222, we should update 
ConsumerGroupCommand to use the new AdminClient.  (was: Now that we have 
KIP-222, we should update consumer-groups.sh to use the new AdminClient.)

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>
> Now that we have KIP-222, we should update ConsumerGroupCommand to use the 
> new AdminClient.





[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-05-15 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476020#comment-16476020
 ] 

Attila Sasvari commented on KAFKA-6884:
---

- tests that verify timeouts pass after making some changes in 
DescribeConsumerGroupTest
FROM
{code}
case _: TimeoutException => // OK
{code}
TO:
{code}
case e: ExecutionException => assert(e.getMessage.contains("TimeoutException")) 
// OK
{code}

- [~andrasbeni] noticed that the Empty state was missing from 
ConsumerGroupState.java in the pull request 
https://github.com/apache/kafka/pull/4980 . I added it and now tests that 
exercise groups with no members pass.

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>
> Now that we have KIP-222, we should update consumer-groups.sh to use the new 
> AdminClient.





[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-05-15 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475751#comment-16475751
 ] 

Attila Sasvari commented on KAFKA-6884:
---

Thanks [~hachikuji]. I have seen the pull request was updated, so I also 
updated [my work in progress 
patch|https://github.com/asasvari/kafka/tree/KAFKA-6884_ConsumerGroupCommand_should_use_new_AdminClient]
 too.

Right now 9/31 tests fail in DescribeConsumerGroupTest: 
- Four of them are related to the change in behaviour of timeout handling. In 
particular, an ExecutionException is now thrown that carries the information. 
For example, {{testDescribeGroupOffsetsWithShortInitializationTimeout, 
testDescribeGroupMembersWithShortInitializationTimeout, 
testDescribeGroupStateWithShortInitializationTimeout, 
testDescribeGroupWithShortInitializationTimeout}} fail with:
{code}
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node 
assignment.
{code}
(TimeoutException is embedded in ExecutionException)
- The rest of the failures are related to assignments / group membership. As I see, 
the handling of assignment information in ConsumerGroupDescription in KafkaAdminClient 
is different from the previous version. If the consumer group is not in the 
"Stable" state, an empty list is returned, see 
[AdminClient.scala|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/AdminClient.scala#L336].
Because of this change, when the state is unknown, KafkaAdminClient returns:
{code}
java.lang.AssertionError: Expected no active member in describe group results, 
state: Some(Unknown), assignments: 
Some(List(PartitionAssignmentState(test.group,Some(localhost:58637 (id: 0 rack: 
null)),Some(foo),Some(0),Some(0),Some(0),Some(-),Some(-),Some(-),Some(0

at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
kafka.admin.DescribeConsumerGroupTest.testDescribeOffsetsOfExistingGroupWithNoMembers(DescribeConsumerGroupTest.scala:339)
...
{code}
I believe that is why  {{testDescribeExistingGroupWithNoMembers, 
testDescribeOffsetsOfExistingGroupWithNoMembers, 
testDescribeSimpleConsumerGroup, 
testDescribeMembersOfExistingGroupWithNoMembers, 
testDescribeStateOfExistingGroupWithNoMembers}} fail, but I am still 
investigating.

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>
> Now that we have KIP-222, we should update consumer-groups.sh to use the new 
> AdminClient.





[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-05-11 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472041#comment-16472041
 ] 

Attila Sasvari commented on KAFKA-6884:
---

It seems assignmentStrategy is also needed. Added some new review comments to 
the pull request. 

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>
> Now that we have KIP-222, we should update consumer-groups.sh to use the new 
> AdminClient.





[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-05-11 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471932#comment-16471932
 ] 

Attila Sasvari commented on KAFKA-6884:
---

[~hachikuji] Thanks, I downloaded the pull request as a diff & applied it. 

Using the new KafkaAdminClient, I am wondering how I can retrieve the 
coordinator / partition leader of the internal offset topic for a consumer 
group. 
[Earlier|https://github.com/apache/kafka/blob/c3921d489f4da80aad6f387158c33ec2e4bca52d/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L571]
 it was returned in the 
[ConsumerGroupSummary|https://github.com/apache/kafka/blob/c3921d489f4da80aad6f387158c33ec2e4bca52d/core/src/main/scala/kafka/admin/AdminClient.scala#L300]
 object.

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>
> Now that we have KIP-222, we should update consumer-groups.sh to use the new 
> AdminClient.





[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-05-10 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470680#comment-16470680
 ] 

Attila Sasvari commented on KAFKA-6884:
---

[~hachikuji] thanks for the additional information!
 - I replaced the old, Scala-based AdminClient with KafkaAdminClient in 
ConsumerGroupCommand, and started to resolve the resulting issues.
 - As far as I can see, AdminClient.scala currently fetches [metadata state information of 
the 
groups|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/AdminClient.scala#L315]
 that is used in 
[ConsumerGroupCommand|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L153]
 at multiple places (such as 
[collectGroupOffsets|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L587],
 [collectGroupMembers 
|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala#L593],
 and so on).
 - KIP-222's DescribeConsumerGroupsResult returned by the new AdminClient's 
describeConsumerGroups does not include this information, see for example [the 
member fields 
|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/DescribeConsumerGroupsResult.java]
 and 
[ConsumerGroupDescription|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/ConsumerGroupDescription.java].
 - Without state information we cannot say, for example, whether a consumer 
group is being rebalanced.

I believe the problem is that KafkaAdminClient does not extract metadata state 
from 
[DescribeGroupsResponse.GroupMetadata|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java#L2396]
 when constructing the DescribeConsumerGroupsResult. Do I understand it 
correctly? What do you think: shall I fix it here or in a separate JIRA?

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>
> Now that we have KIP-222, we should update consumer-groups.sh to use the new 
> AdminClient.





[jira] [Commented] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-05-09 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468881#comment-16468881
 ] 

Attila Sasvari commented on KAFKA-6884:
---

[~hachikuji] consumer-groups.sh currently uses 
{{kafka.admin.ConsumerGroupCommand}}. Shall I modify {{ConsumerGroupCommand}} 
class so that it imports KafkaAdminClient and uses the new methods introduced 
by KIP-222? I believe we don't want to add a main method to KafkaAdminClient 
and adjust the shell script itself.

What is the target version? If it is 2.0, then shall I also remove support for 
--zookeeper (old consumer)?

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>
> Now that we have KIP-222, we should update consumer-groups.sh to use the new 
> AdminClient.





[jira] [Assigned] (KAFKA-6884) ConsumerGroupCommand should use new AdminClient

2018-05-09 Thread Attila Sasvari (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-6884:
-

Assignee: Attila Sasvari

> ConsumerGroupCommand should use new AdminClient
> ---
>
> Key: KAFKA-6884
> URL: https://issues.apache.org/jira/browse/KAFKA-6884
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Attila Sasvari
>Priority: Major
>
> Now that we have KIP-222, we should update consumer-groups.sh to use the new 
> AdminClient.





[jira] [Commented] (KAFKA-6883) KafkaShortnamer should allow to convert Kerberos principal name to upper case user name

2018-05-08 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467207#comment-16467207
 ] 

Attila Sasvari commented on KAFKA-6883:
---

HADOOP-13984 is a similar issue.

> KafkaShortnamer should allow to convert Kerberos principal name to upper case 
> user name
> ---
>
> Key: KAFKA-6883
> URL: https://issues.apache.org/jira/browse/KAFKA-6883
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Attila Sasvari
>Priority: Major
>
> KAFKA-5764 implemented support for converting a Kerberos principal name to a 
> lower-case Linux user name via auth_to_local rules. 
> As a follow-up, KafkaShortnamer could be further extended to allow converting 
> principal names to upper case by appending /U to a rule.





[jira] [Created] (KAFKA-6883) KafkaShortnamer should allow to convert Kerberos principal name to upper case user name

2018-05-08 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-6883:
-

 Summary: KafkaShortnamer should allow to convert Kerberos 
principal name to upper case user name
 Key: KAFKA-6883
 URL: https://issues.apache.org/jira/browse/KAFKA-6883
 Project: Kafka
  Issue Type: Improvement
Reporter: Attila Sasvari


KAFKA-5764 implemented support for converting a Kerberos principal name to a 
lower-case Linux user name via auth_to_local rules.

As a follow-up, KafkaShortnamer could be further extended to allow converting 
principal names to upper case by appending /U to a rule.
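A minimal sketch of what such a case modifier could do (illustrative only; the actual rule syntax and parsing in KafkaShortnamer are not reproduced here): /L lower-cases the mapped short name as added by KAFKA-5764, and the proposed /U would upper-case it.

```java
import java.util.Locale;

public class ShortnamerCaseSketch {
    // Applies the case modifier that a trailing "/L" or the proposed "/U"
    // on an auth_to_local rule would request; any other modifier leaves
    // the mapped short name unchanged.
    public static String applyCaseModifier(String shortName, String modifier) {
        switch (modifier) {
            case "/L": return shortName.toLowerCase(Locale.ROOT);
            case "/U": return shortName.toUpperCase(Locale.ROOT);
            default:   return shortName;
        }
    }

    public static void main(String[] args) {
        System.out.println(applyCaseModifier("Alice", "/L")); // prints "alice"
        System.out.println(applyCaseModifier("alice", "/U")); // prints "ALICE"
    }
}
```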





[jira] [Updated] (KAFKA-6822) Kafka Consumer 0.10.2.1 can not normally read data from Kafka 1.0.0

2018-04-25 Thread Attila Sasvari (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari updated KAFKA-6822:
--
Component/s: streams

> Kafka Consumer 0.10.2.1 can not normally read data from Kafka 1.0.0
> ---
>
> Key: KAFKA-6822
> URL: https://issues.apache.org/jira/browse/KAFKA-6822
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, streams
>Reporter: Phil Mikhailov
>Priority: Major
>
> We faced a problem where a StateStore (0.10.2.1) got stuck loading data 
> during microservice start-up.
>  Our deployment runs Kafka 1.0.0, but the microservices are built with Kafka 
> Streams 0.10.2.1.
>  We had to reset the stream offsets to unblock the microservices because 
> restarts didn't help.
> We faced the problem only once and didn't get a chance to reproduce it, so 
> we apologize in advance if the explanation is incomplete.
> Below are details that we've managed to collect that time:
> Kafka consumer 0.10.2.1 calculates offsets like this:
>  Fetcher:524
> {code:java}
> long nextOffset = partRecords.get(partRecords.size() - 1).offset() + 1;
> {code}
> It takes the offset of the last record returned by {{poll}} and adds 1, so 
> the next offset is only an estimate.
> In Kafka 1.0.0, by contrast, the broker explicitly provides the next offset 
> to fetch:
> {code:java}
> long nextOffset = partitionRecords.nextFetchOffset;
> {code}
> This returns the actual next offset rather than an estimate.
>  
> That said, we hit a situation where the StateStore (0.10.2.1) got stuck 
> loading data. The cause was {{ProcessorStateManager.restoreActiveState:245}}, 
> which kept spinning in the consumer loop because this condition was never met:
> {code:java}
> } else if (restoreConsumer.position(storePartition) == endOffset) {
>     break;
> }
> {code}
> We assume that the 0.10.2.1 consumer estimates the end offset incorrectly 
> when log compaction is involved, or that there is an inconsistency in offset 
> calculation between 0.10.2.1 and 1.0.0 which prevents using a 0.10.2.1 
> consumer with a 1.0.0 broker.
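The offset mismatch described above can be illustrated with plain numbers (hypothetical offsets, not taken from this report): when compaction leaves gaps at the tail of a changelog, the last-offset-plus-one estimate can land below the broker's end offset, so the restore loop's strict equality check never fires.

```java
import java.util.Arrays;
import java.util.List;

public class RestoreOffsetSketch {
    // 0.10.2.1-style estimate: offset of the last record in the batch, plus 1.
    public static long estimatedNextOffset(List<Long> batchOffsets) {
        return batchOffsets.get(batchOffsets.size() - 1) + 1;
    }

    public static void main(String[] args) {
        long endOffset = 10L; // end offset the restore loop compares against
        // Hypothetical offsets surviving compaction; later offsets are gone.
        List<Long> batch = Arrays.asList(3L, 5L, 7L);
        long position = estimatedNextOffset(batch); // 8
        // The restore loop exits only on strict equality, which never holds:
        System.out.println(position == endOffset); // prints "false"
    }
}
```

A broker-supplied next fetch offset, as in 1.0.0, avoids the estimate entirely.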





[jira] [Commented] (KAFKA-6822) Kafka Consumer 0.10.2.1 can not normally read data from Kafka 1.0.0

2018-04-25 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451772#comment-16451772
 ] 

Attila Sasvari commented on KAFKA-6822:
---

[~phil.mikhailov] I wonder if it is related to 
[KAFKA-6367|https://issues.apache.org/jira/browse/KAFKA-6367] ("Fix 
StateRestoreListener To Use Correct Batch Ending Offset"), which was fixed in 
1.0.1 and 1.1.0. Can you re-test with 1.0.1?

> Kafka Consumer 0.10.2.1 can not normally read data from Kafka 1.0.0
> ---
>
> Key: KAFKA-6822
> URL: https://issues.apache.org/jira/browse/KAFKA-6822
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Phil Mikhailov
>Priority: Major
>
> We faced a problem where a StateStore (0.10.2.1) got stuck loading data 
> during microservice start-up.
>  Our deployment runs Kafka 1.0.0, but the microservices are built with Kafka 
> Streams 0.10.2.1.
>  We had to reset the stream offsets to unblock the microservices because 
> restarts didn't help.
> We faced the problem only once and didn't get a chance to reproduce it, so 
> we apologize in advance if the explanation is incomplete.
> Below are details that we've managed to collect that time:
> Kafka consumer 0.10.2.1 calculates offsets like this:
>  Fetcher:524
> {code:java}
> long nextOffset = partRecords.get(partRecords.size() - 1).offset() + 1;
> {code}
> It takes the offset of the last record returned by {{poll}} and adds 1, so 
> the next offset is only an estimate.
> In Kafka 1.0.0, by contrast, the broker explicitly provides the next offset 
> to fetch:
> {code:java}
> long nextOffset = partitionRecords.nextFetchOffset;
> {code}
> This returns the actual next offset rather than an estimate.
>  
> That said, we hit a situation where the StateStore (0.10.2.1) got stuck 
> loading data. The cause was {{ProcessorStateManager.restoreActiveState:245}}, 
> which kept spinning in the consumer loop because this condition was never met:
> {code:java}
> } else if (restoreConsumer.position(storePartition) == endOffset) {
>     break;
> }
> {code}
> We assume that the 0.10.2.1 consumer estimates the end offset incorrectly 
> when log compaction is involved, or that there is an inconsistency in offset 
> calculation between 0.10.2.1 and 1.0.0 which prevents using a 0.10.2.1 
> consumer with a 1.0.0 broker.





[jira] [Resolved] (KAFKA-6799) Consumer livelock during consumer group rebalance

2018-04-23 Thread Attila Sasvari (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari resolved KAFKA-6799.
---
Resolution: Information Provided

> Consumer livelock during consumer group rebalance
> -
>
> Key: KAFKA-6799
> URL: https://issues.apache.org/jira/browse/KAFKA-6799
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 1.0.0, 0.11.0.2, 1.1.0
>Reporter: Pierre-Henri Dezanneau
>Assignee: Attila Sasvari
>Priority: Critical
>
> We have the following environment:
> * 1 kafka cluster with 3 brokers
> * 1 topic with 3 partitions
> * 1 producer
> * 1 consumer group with 3 consumers
> From this setup, we remove one broker from the cluster, the hard way, by 
> simply killing it. Quite often, we see that the consumer group is not 
> rebalanced correctly. By that I mean that all 3 consumers stop consuming and 
> get stuck in a loop, forever.
> The thread dump shows that the consumer threads aren't blocked but run 
> forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due 
> to the {{synchronized}} keyword on the calling method. Heartbeat threads are 
> blocked, waiting for the consumer threads to release the lock. This situation 
> prevents all consumers from consuming any more records.
> We built a simple project which seems to reliably demonstrate this:
> {code:sh}
> $ mkdir -p /tmp/sandbox && cd /tmp/sandbox
> $ git clone https://github.com/phdezann/helloworld-kafka-livelock
> $ cd helloworld-kafka-livelock && ./spin.sh
> ...
> livelock detected
> {code}
> {code:sh|title=Consumer thread|borderStyle=solid}
> "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728
> at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl)
> - locked <0x2a16> (a java.util.Collections$UnmodifiableSet)
> - locked <0x2a17> (a sun.nio.ch.Util$3)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:684)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
> - locked <0x2a0c> (a 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:279)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
> at 
> org.helloworld.kafka.bus.HelloWorldKafkaListener.lambda$createConsumerInDedicatedThread$0(HelloWorldKafkaListener.java:45)
> at 
> org.helloworld.kafka.bus.HelloWorldKafkaListener$$Lambda$42.1776656466.run(Unknown
>  Source:-1)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> {code:sh|title=Heartbeat thread|borderStyle=solid}
> "kafka-coordinator-heartbeat-thread | helloWorldGroup@10728" daemon prio=5 
> tid=0x36 nid=NA waiting for monitor entry
>   java.lang.Thread.State: BLOCKED
>waiting for kafka-consumer-1@10733 to release lock on <0x2a0c> (a 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> at java.lang.Object.wait(Object.java:-1)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:955)
> {code}





[jira] [Assigned] (KAFKA-6799) Consumer livelock during consumer group rebalance

2018-04-23 Thread Attila Sasvari (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari reassigned KAFKA-6799:
-

Assignee: Attila Sasvari

> Consumer livelock during consumer group rebalance
> -
>
> Key: KAFKA-6799
> URL: https://issues.apache.org/jira/browse/KAFKA-6799
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 1.0.0, 0.11.0.2, 1.1.0
>Reporter: Pierre-Henri Dezanneau
>Assignee: Attila Sasvari
>Priority: Critical
>
> We have the following environment:
> * 1 kafka cluster with 3 brokers
> * 1 topic with 3 partitions
> * 1 producer
> * 1 consumer group with 3 consumers
> From this setup, we remove one broker from the cluster, the hard way, by 
> simply killing it. Quite often, we see that the consumer group is not 
> rebalanced correctly. By that I mean that all 3 consumers stop consuming and 
> get stuck in a loop, forever.
> The thread dump shows that the consumer threads aren't blocked but run 
> forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due 
> to the {{synchronized}} keyword on the calling method. Heartbeat threads are 
> blocked, waiting for the consumer threads to release the lock. This situation 
> prevents all consumers from consuming any more records.
> We built a simple project which seems to reliably demonstrate this:
> {code:sh}
> $ mkdir -p /tmp/sandbox && cd /tmp/sandbox
> $ git clone https://github.com/phdezann/helloworld-kafka-livelock
> $ cd helloworld-kafka-livelock && ./spin.sh
> ...
> livelock detected
> {code}
> {code:sh|title=Consumer thread|borderStyle=solid}
> "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728
> at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl)
> - locked <0x2a16> (a java.util.Collections$UnmodifiableSet)
> - locked <0x2a17> (a sun.nio.ch.Util$3)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:684)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
> - locked <0x2a0c> (a 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:279)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
> at 
> org.helloworld.kafka.bus.HelloWorldKafkaListener.lambda$createConsumerInDedicatedThread$0(HelloWorldKafkaListener.java:45)
> at 
> org.helloworld.kafka.bus.HelloWorldKafkaListener$$Lambda$42.1776656466.run(Unknown
>  Source:-1)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> {code:sh|title=Heartbeat thread|borderStyle=solid}
> "kafka-coordinator-heartbeat-thread | helloWorldGroup@10728" daemon prio=5 
> tid=0x36 nid=NA waiting for monitor entry
>   java.lang.Thread.State: BLOCKED
>waiting for kafka-consumer-1@10733 to release lock on <0x2a0c> (a 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> at java.lang.Object.wait(Object.java:-1)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:955)
> {code}





[jira] [Commented] (KAFKA-6799) Consumer livelock during consumer group rebalance

2018-04-20 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445529#comment-16445529
 ] 

Attila Sasvari commented on KAFKA-6799:
---

[~phdezann] do you have any update? I have been running the test since 
yesterday (with the replication factor set to 3 for the __consumer_offsets 
topic) and did not encounter the issue anymore. I would like to resolve the 
issue as "Information Provided" or "Not a Bug".

Some more information:
- I saw the following displayed on my stdout many times: 
{code}
could not reproduce the issue this time, re-create the cluster from scratch
Stops containers and removes containers, networks, volumes, and images
created by `up`.
{code}
- When I have only 3 producers, the following is also printed: 
{code}
{"timestamp":1524215171045,"status":500,"error":"Internal Server 
Error","exception":"org.apache.kafka.common.errors.InvalidReplicationFactorException","message":"Replication
 fact
or: 3 larger than available brokers: 2.","path":"/kafka/topic/create"}Producer 
started
{code} 
- I also tested with 4 brokers, and it went well too.



> Consumer livelock during consumer group rebalance
> -
>
> Key: KAFKA-6799
> URL: https://issues.apache.org/jira/browse/KAFKA-6799
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 1.0.0, 0.11.0.2, 1.1.0
>Reporter: Pierre-Henri Dezanneau
>Priority: Critical
>
> We have the following environment:
> * 1 kafka cluster with 3 brokers
> * 1 topic with 3 partitions
> * 1 producer
> * 1 consumer group with 3 consumers
> From this setup, we remove one broker from the cluster, the hard way, by 
> simply killing it. Quite often, we see that the consumer group is not 
> rebalanced correctly. By that I mean that all 3 consumers stop consuming and 
> get stuck in a loop, forever.
> The thread dump shows that the consumer threads aren't blocked but run 
> forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due 
> to the {{synchronized}} keyword on the calling method. Heartbeat threads are 
> blocked, waiting for the consumer threads to release the lock. This situation 
> prevents all consumers from consuming any more records.
> We built a simple project which seems to reliably demonstrate this:
> {code:sh}
> $ mkdir -p /tmp/sandbox && cd /tmp/sandbox
> $ git clone https://github.com/phdezann/helloworld-kafka-livelock
> $ cd helloworld-kafka-livelock && ./spin.sh
> ...
> livelock detected
> {code}
> {code:sh|title=Consumer thread|borderStyle=solid}
> "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728
> at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl)
> - locked <0x2a16> (a java.util.Collections$UnmodifiableSet)
> - locked <0x2a17> (a sun.nio.ch.Util$3)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:684)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
> - locked <0x2a0c> (a 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:279)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
> at 
> org.helloworld.kafka.bus.HelloWorldKafkaListener.lambda$createConsumerInDedicatedThread$0(HelloWorldKafkaListener.java:45)
> at 
> 

[jira] [Comment Edited] (KAFKA-6799) Consumer livelock during consumer group rebalance

2018-04-19 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443921#comment-16443921
 ] 

Attila Sasvari edited comment on KAFKA-6799 at 4/19/18 3:00 PM:


[~phdezann] thanks for reporting this issue and creating the docker environment 
for reproducing it.

- I had to set the environment variable M2_REPOSITORY before running the shell 
script: {{M2_REPOSITORY=/root/.m2/ ./spin.sh}}. After that, I saw the issue you 
described.
- I looked a bit around in the helloworld-kafka-1 docker container, and noticed 
that the replication factor for the internal topic was set to 1:
{code}
root@b6e9218f1761:/install/kafka# bin/kafka-topics.sh --describe -zookeeper 
172.170.0.80:2181
Topic:__consumer_offsetsPartitionCount:50   ReplicationFactor:1 
Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic: __consumer_offsets   Partition: 0Leader: 103 
Replicas: 103   Isr: 103
Topic: __consumer_offsets   Partition: 1Leader: 101 
Replicas: 101   Isr: 101
Topic: __consumer_offsets   Partition: 2Leader: -1  
Replicas: 102   Isr: 102
{code}
In this situation, the consumer cannot contact the partition leader of 
__consumer_offsets partition 2, because that broker was killed by the test, so 
it cannot commit offsets for that partition.
- After I changed the replication factor of {{__consumer_offsets}} to 3, I did 
not see this issue anymore.
- Can you add the following to {{docker/entrypoint.sh}} and re-test?
{code}
cat >>config/server.properties <<EOF
offsets.topic.replication.factor=3
EOF
{code}

> Consumer livelock during consumer group rebalance
> -
>
> Key: KAFKA-6799
> URL: https://issues.apache.org/jira/browse/KAFKA-6799
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 1.0.0, 0.11.0.2, 1.1.0
>Reporter: Pierre-Henri Dezanneau
>Priority: Critical
>
> We have the following environment:
> * 1 kafka cluster with 3 brokers
> * 1 topic with 3 partitions
> * 1 producer
> * 1 consumer group with 3 consumers
> From this setup, we remove one broker from the cluster, the hard way, by 
> simply killing it. Quite often, we see that the consumer group is not 
> rebalanced correctly. By that I mean that all 3 consumers stop consuming and 
> get stuck in a loop, forever.
> The thread dump shows that the consumer threads aren't blocked but run 
> forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due 
> to the {{synchronized}} keyword on the calling method. Heartbeat threads are 
> blocked, waiting for the consumer threads to release the lock. This situation 
> prevents all consumers from consuming any more records.
> We built a simple project which seems to reliably demonstrate this:
> {code:sh}
> $ mkdir -p /tmp/sandbox && cd /tmp/sandbox
> $ git clone https://github.com/phdezann/helloworld-kafka-livelock
> $ cd helloworld-kafka-livelock && ./spin.sh
> ...
> livelock detected
> {code}
> {code:sh|title=Consumer thread|borderStyle=solid}
> "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728
> at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl)
> - locked <0x2a16> (a java.util.Collections$UnmodifiableSet)
> - locked <0x2a17> (a sun.nio.ch.Util$3)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:684)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
> - locked <0x2a0c> (a 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> at 
> 

[jira] [Commented] (KAFKA-6799) Consumer livelock during consumer group rebalance

2018-04-19 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443921#comment-16443921
 ] 

Attila Sasvari commented on KAFKA-6799:
---

[~phdezann] thanks for reporting this issue and creating the docker environment 
for reproducing the issue.

- I had to set the environment variable M2_REPOSITORY before running the shell 
script: {{M2_REPOSITORY=/root/.m2/ ./spin.sh}}. After that, I saw the issue you 
described.
- I looked a bit around in the helloworld-kafka-1 docker container, and noticed 
that the replication factor for the internal topic was set to 1:
{code}
root@b6e9218f1761:/install/kafka# bin/kafka-topics.sh --describe -zookeeper 172.170.0.80:2181
Topic:__consumer_offsets    PartitionCount:50    ReplicationFactor:1    Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets    Partition: 0    Leader: 103    Replicas: 103    Isr: 103
    Topic: __consumer_offsets    Partition: 1    Leader: 101    Replicas: 101    Isr: 101
    Topic: __consumer_offsets    Partition: 2    Leader: -1    Replicas: 102    Isr: 102
{code}
In this situation, the consumer cannot contact the partition leader of 
__consumer_offsets partition 2, as the broker hosting it was killed by the 
test, so it cannot commit offsets for that specific partition.
- I changed the replication factor of {{__consumer_offsets}} to 3, and then I 
did not see this issue.
- Can you add the following to {{docker/entrypoint.sh}} and re-test?
{code}
cat >>config/server.properties <<EOF
offsets.topic.replication.factor=3
EOF
{code}

> Consumer livelock during consumer group rebalance
> -
>
> Key: KAFKA-6799
> URL: https://issues.apache.org/jira/browse/KAFKA-6799
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Affects Versions: 1.0.0, 0.11.0.2, 1.1.0
>Reporter: Pierre-Henri Dezanneau
>Priority: Critical
>
> We have the following environment:
> * 1 kafka cluster with 3 brokers
> * 1 topic with 3 partitions
> * 1 producer
> * 1 consumer group with 3 consumers
> From this setup, we remove one broker from the cluster, the hard way, by 
> simply killing it. Quite often, we see that the consumer group is not 
> rebalanced correctly. By that I mean that all 3 consumers stop consuming and 
> get stuck in a loop, forever.
> The thread dump shows that the consumer threads aren't blocked but run 
> forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due 
> to the {{synchonized}} keyword on the calling method. Heartbeat threads are 
> blocked, waiting for the consumer threads to release the lock. This situation 
> prevents all consumers from consuming any more record.
> We build a simple project which seems to reliably demonstrate this:
> {code:sh}
> $ mkdir -p /tmp/sandbox && cd /tmp/sandbox
> $ git clone https://github.com/phdezann/helloworld-kafka-livelock
> $ cd helloworld-kafka-livelock && ./spin.sh
> ...
> livelock detected
> {code}
> {code:sh|title=Consumer thread|borderStyle=solid}
> "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728
> at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl)
> - locked <0x2a16> (a java.util.Collections$UnmodifiableSet)
> - locked <0x2a17> (a sun.nio.ch.Util$3)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:684)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
> - locked <0x2a0c> (a 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> at 
> 

[jira] [Resolved] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader

2018-03-23 Thread Attila Sasvari (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari resolved KAFKA-6703.
---
Resolution: Not A Bug

It turned out I had forgotten to set a proper replication factor for 
{{__consumer_offsets}}. The default replication factor is 1, and the broker 
hosting the consumer group's coordinator partition (determined by 
{{partitionFor(group)}} in GroupCoordinator) was down. After changing the 
replication factor to 3, I did not experience the issue. I still saw a couple 
of messages like
{code:java}
[2018-03-23 16:45:52,298] DEBUG [Consumer clientId=2-1, groupId=2] Leader for 
partition testR1P3-2 is unavailable for fetching offset 
(org.apache.kafka.clients.consumer.internals.Fetcher){code}
but messages in other topics matched by the whitelist regexp were fetched by 
MirrorMaker.
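
To make the {{partitionFor(group)}} point above concrete: the group coordinator 
is the leader of one particular {{__consumer_offsets}} partition, selected from 
the group id. Below is a rough Python sketch of that mapping - illustrative 
only; the helper names are assumptions and the real logic lives in Kafka's 
GroupMetadataManager on the broker side.

```python
def java_string_hash(s):
    """Java's String.hashCode(): h = 31*h + char, in signed 32-bit arithmetic."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def coordinator_partition(group_id, offsets_partitions=50):
    """Pick the __consumer_offsets partition whose leader coordinates the group.
    Kafka computes Utils.abs(groupId.hashCode) % offsets-topic partition count;
    the masking below mimics that non-negative modulo."""
    return (java_string_hash(group_id) & 0x7FFFFFFF) % offsets_partitions
```

With replication factor 1, killing the single broker that leads this one 
partition leaves the group with no coordinator at all, which is why the 
FindCoordinator lookups above keep failing.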

> MirrorMaker cannot make progress when any matched topic from a whitelist 
> regexp has -1 leader
> -
>
> Key: KAFKA-6703
> URL: https://issues.apache.org/jira/browse/KAFKA-6703
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Attila Sasvari
>Priority: Major
>
> Scenario:
>  - MM whitelist regexp matches multiple topics
>  - destination cluster has 5 brokers with multiple topics replication factor 3
>  - without partition reassign shut down 2 brokers
>  - suppose a topic has no leader any more because it was off-sync and the 
> leader and the rest of the replicas are hosted on the downed brokers.
>  - so we have 1 topic with some partitions with leader -1
>  - the rest of the matching topics has 3 replicas with leaders
> MM will not produce into any of the matched topics until:
>  - the "orphaned" topic removed or
>  - the partition reassign carried out from the downed brokers (suppose you 
> can turn these back on)
> In the MirrorMaker logs, there are a lot of messages like the following ones:
> {code}
> [2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, 
> groupId=console-consumer-43054] Coordinator discovery failed, refreshing 
> metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, 
> groupId=console-consumer-43054] Sending metadata request 
> (type=MetadataRequest, topics=) to node 192.168.1.102:9092 (id: 0 rack: 
> null) (org.apache.kafka.clients.NetworkClient)
> [2018-03-22 19:55:32,525] DEBUG Updated cluster metadata version 10 to 
> Cluster(id = Y-qtoFP-RMq2uuVnkEKAAw, nodes = [192.168.1.102:9092 (id: 0 rack: 
> null)], partitions = [Partition(topic = testR1P2, partition = 1, leader = 
> none, replicas = [42], isr = [], offlineReplicas = [42]), Partition(topic = 
> testR1P1, partition = 0, leader = 0, replicas = [0], isr = [0], 
> offlineReplicas = []), Partition(topic = testAlive, partition = 0, leader = 
> 0, replicas = [0], isr = [0], offlineReplicas = []), Partition(topic = 
> testERRR, partition = 0, leader = 0, replicas = [0], isr = [0], 
> offlineReplicas = []), Partition(topic = testR1P2, partition = 0, leader = 0, 
> replicas = [0], isr = [0], offlineReplicas = [])]) 
> (org.apache.kafka.clients.Metadata)
> [2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, 
> groupId=console-consumer-43054] Sending FindCoordinator request to broker 
> 192.168.1.102:9092 (id: 0 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, 
> groupId=console-consumer-43054] Received FindCoordinator response 
> ClientResponse(receivedTimeMs=1521744932525, latencyMs=0, disconnected=false, 
> requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, 
> clientId=consumer-1, correlationId=19), 
> responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
> error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 19:55:32,526] DEBUG [Consumer clientId=consumer-1, 
> groupId=console-consumer-43054] Group coordinator lookup failed: The 
> coordinator is not available. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> {code}
> Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer 
> properties file, then an OldConsumer is created, and it can make progress.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6699) When one of two Kafka nodes are dead, streaming API cannot handle messaging

2018-03-23 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411454#comment-16411454
 ] 

Attila Sasvari commented on KAFKA-6699:
---

[~habdank] just a quick question: is the replication factor set to 2 for the 
internal topic {{__consumer_offsets}} too? Please note that if you change the 
application id, you will not be able to continue consuming messages from the 
last processed offset (unless you do some magic). 

> When one of two Kafka nodes are dead, streaming API cannot handle messaging
> ---
>
> Key: KAFKA-6699
> URL: https://issues.apache.org/jira/browse/KAFKA-6699
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.2
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Major
>
> Dears,
> I am observing quite often, when Kafka Broker is partly dead(*), then 
> application, which uses streaming API are doing nothing.
> (*) Partly dead in my case it means that one of two Kafka nodes are out of 
> order. 
> Especially when disk is full on one machine, then Broker is going in some 
> strange state, where streaming API goes vacations. It seems like regular 
> producer/consumer API has no problem in such a case.
> Can you have a look on that matter?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader

2018-03-22 Thread Attila Sasvari (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Sasvari updated KAFKA-6703:
--
Description: 
Scenario:
 - the MirrorMaker whitelist regexp matches multiple topics
 - the destination cluster has 5 brokers; the matched topics have replication 
factor 3
 - 2 brokers are shut down without partition reassignment
 - suppose one topic has no leader any more, because it was out-of-sync and 
the leader and the rest of the replicas were hosted on the downed brokers
 - so we have 1 topic with some partitions whose leader is -1
 - the rest of the matching topics have 3 replicas with leaders

MirrorMaker will not produce into any of the matched topics until:
 - the "orphaned" topic is removed, or
 - partition reassignment is carried out from the downed brokers (supposing 
you can turn these back on)

In the MirrorMaker logs, there are a lot of messages like the following ones:
{code}
[2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, 
groupId=console-consumer-43054] Coordinator discovery failed, refreshing 
metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

[2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, 
groupId=console-consumer-43054] Sending metadata request (type=MetadataRequest, 
topics=) to node 192.168.1.102:9092 (id: 0 rack: null) 
(org.apache.kafka.clients.NetworkClient)

[2018-03-22 19:55:32,525] DEBUG Updated cluster metadata version 10 to 
Cluster(id = Y-qtoFP-RMq2uuVnkEKAAw, nodes = [192.168.1.102:9092 (id: 0 rack: 
null)], partitions = [Partition(topic = testR1P2, partition = 1, leader = none, 
replicas = [42], isr = [], offlineReplicas = [42]), Partition(topic = testR1P1, 
partition = 0, leader = 0, replicas = [0], isr = [0], offlineReplicas = []), 
Partition(topic = testAlive, partition = 0, leader = 0, replicas = [0], isr = 
[0], offlineReplicas = []), Partition(topic = testERRR, partition = 0, leader = 
0, replicas = [0], isr = [0], offlineReplicas = []), Partition(topic = 
testR1P2, partition = 0, leader = 0, replicas = [0], isr = [0], offlineReplicas 
= [])]) (org.apache.kafka.clients.Metadata)

[2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, 
groupId=console-consumer-43054] Sending FindCoordinator request to broker 
192.168.1.102:9092 (id: 0 rack: null) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

[2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, 
groupId=console-consumer-43054] Received FindCoordinator response 
ClientResponse(receivedTimeMs=1521744932525, latencyMs=0, disconnected=false, 
requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, 
clientId=consumer-1, correlationId=19), 
responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

[2018-03-22 19:55:32,526] DEBUG [Consumer clientId=consumer-1, 
groupId=console-consumer-43054] Group coordinator lookup failed: The 
coordinator is not available. 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
{code}

Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer 
properties file, then an OldConsumer is created, and it can make progress.



  was:
Scenario:
 - MM whitelist regexp matches multiple topics
 - destination cluster has 5 brokers with multiple topics replication factor 3
 - without partition reassign shut down 2 brokers
 - suppose a topic has no leader any more because it was off-sync and the 
leader and the rest of the replicas are hosted on the downed brokers.
 - so we have 1 topic with some partitions with leader -1
 - the rest of the matching topics has 3 replicas with leaders

MM will not produce into any of the matched topics until:
 - the "orphaned" topic removed or
 - the partition reassign carried out from the downed brokers (suppose you can 
turn these back on)

In the MirrorMaker logs, there are a lot of messages like the following ones:
{code}
[2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-1, groupId=1] Sending 
FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-0, groupId=1] Sending 
FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Received 
FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, 
latencyMs=1, disconnected=false, 
requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, 
clientId=1-0, correlationId=71), 
responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 

[jira] [Commented] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader

2018-03-22 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410078#comment-16410078
 ] 

Attila Sasvari commented on KAFKA-6703:
---

There is a call to 
[ensureCoordinatorReady()|https://github.com/apache/kafka/blob/f0a29a693548efe539cba04807e21862c8dfc1bf/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L279]
 in {{ConsumerCoordinator}}. It tries to poll the coordinator in a 
[loop|https://github.com/apache/kafka/blob/f0a29a693548efe539cba04807e21862c8dfc1bf/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L217-L219].
 When the lookup does not succeed, the client keeps retrying, and the other 
matched topics are ignored until the failing coordinator becomes healthy.
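
The shape of that polling loop, sketched in Python - the function and its 
parameters are hypothetical stand-ins for the Java client's 
{{lookupCoordinator}} and {{ConsumerNetworkClient.poll}}, not a real Kafka API:

```python
import time

def ensure_coordinator_ready(find_coordinator, poll, timeout_s=None):
    """Sketch of AbstractCoordinator.ensureCoordinatorReady: block until a
    coordinator is located, retrying indefinitely by default."""
    deadline = None if timeout_s is None else time.time() + timeout_s
    while True:
        node = find_coordinator()   # issues a FindCoordinator request
        if node is not None:
            return node             # coordinator is known and reachable
        poll()                      # drive network I/O / back off, then retry
        if deadline is not None and time.time() > deadline:
            raise TimeoutError("no coordinator became available")
```

While the coordinator's {{__consumer_offsets}} partition has no leader, 
{{find_coordinator}} keeps returning nothing and the caller spins here; since 
the Java method is {{synchronized}}, the heartbeat thread stays blocked on the 
same lock for as long as this loop runs.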


> MirrorMaker cannot make progress when any matched topic from a whitelist 
> regexp has -1 leader
> -
>
> Key: KAFKA-6703
> URL: https://issues.apache.org/jira/browse/KAFKA-6703
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Attila Sasvari
>Priority: Major
>
> Scenario:
>  - MM whitelist regexp matches multiple topics
>  - destination cluster has 5 brokers with multiple topics replication factor 3
>  - without partition reassign shut down 2 brokers
>  - suppose a topic has no leader any more because it was off-sync and the 
> leader and the rest of the replicas are hosted on the downed brokers.
>  - so we have 1 topic with some partitions with leader -1
>  - the rest of the matching topics has 3 replicas with leaders
> MM will not produce into any of the matched topics until:
>  - the "orphaned" topic removed or
>  - the partition reassign carried out from the downed brokers (suppose you 
> can turn these back on)
> In the MirrorMaker logs, there are a lot of messages like the following ones:
> {code}
> [2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-1, groupId=1] Sending 
> FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-0, groupId=1] Sending 
> FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Received 
> FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, 
> latencyMs=1, disconnected=false, 
> requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, 
> clientId=1-0, correlationId=71), 
> responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
> error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Received 
> FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, 
> latencyMs=1, disconnected=false, 
> requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, 
> clientId=1-1, correlationId=71), 
> responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
> error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Group 
> coordinator lookup failed: The coordinator is not available. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Group 
> coordinator lookup failed: The coordinator is not available. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] 
> Coordinator discovery failed, refreshing metadata 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] 
> Coordinator discovery failed, refreshing metadata 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> {code}
> Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer 
> properties file, then an OldConsumer is created, and it can make progress.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader

2018-03-22 Thread Attila Sasvari (JIRA)
Attila Sasvari created KAFKA-6703:
-

 Summary: MirrorMaker cannot make progress when any matched topic 
from a whitelist regexp has -1 leader
 Key: KAFKA-6703
 URL: https://issues.apache.org/jira/browse/KAFKA-6703
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Attila Sasvari


Scenario:
 - the MirrorMaker whitelist regexp matches multiple topics
 - the destination cluster has 5 brokers; the matched topics have replication 
factor 3
 - 2 brokers are shut down without partition reassignment
 - suppose one topic has no leader any more, because it was out-of-sync and 
the leader and the rest of the replicas were hosted on the downed brokers
 - so we have 1 topic with some partitions whose leader is -1
 - the rest of the matching topics have 3 replicas with leaders

MirrorMaker will not produce into any of the matched topics until:
 - the "orphaned" topic is removed, or
 - partition reassignment is carried out from the downed brokers (supposing 
you can turn these back on)

In the MirrorMaker logs, there are a lot of messages like the following ones:
{code}
[2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-1, groupId=1] Sending 
FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-0, groupId=1] Sending 
FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Received 
FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, 
latencyMs=1, disconnected=false, 
requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, 
clientId=1-0, correlationId=71), 
responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Received 
FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, 
latencyMs=1, disconnected=false, 
requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, 
clientId=1-1, correlationId=71), 
responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', 
error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Group 
coordinator lookup failed: The coordinator is not available. 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Group 
coordinator lookup failed: The coordinator is not available. 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Coordinator 
discovery failed, refreshing metadata 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Coordinator 
discovery failed, refreshing metadata 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
{code}

Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer 
properties file, then an OldConsumer is created, and it can make progress.
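
For reference, the consumer-properties difference mentioned above comes down 
to which of these keys is present (host names below are placeholders):

{code}
# new consumer: bootstrap.servers => KafkaConsumer, subject to the
# coordinator lookup loop described above
bootstrap.servers=dest-broker:9092
group.id=mm-group

# old consumer: zookeeper.connect => OldConsumer, which does not depend on
# the group coordinator and can make progress in this scenario
zookeeper.connect=zk-host:2181
group.id=mm-group
{code}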





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6332) Kafka system tests should use nc instead of log grep to detect start-up

2018-02-19 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369119#comment-16369119
 ] 

Attila Sasvari commented on KAFKA-6332:
---

[~hachikuji] many thanks! I created a PR and all checks have passed.

> Kafka system tests should use nc instead of log grep to detect start-up
> ---
>
> Key: KAFKA-6332
> URL: https://issues.apache.org/jira/browse/KAFKA-6332
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Attila Sasvari
>Priority: Major
>  Labels: newbie
>
> [~ewencp] suggested using nc -z test instead of grepping the logs for a more 
> reliable test. This came up when the system tests were broken by a log 
> improvement change.
> Reference: https://github.com/apache/kafka/pull/3834



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6332) Kafka system tests should use nc instead of log grep to detect start-up

2018-02-17 Thread Attila Sasvari (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368328#comment-16368328
 ] 

Attila Sasvari commented on KAFKA-6332:
---

[~ijuma] I'd like to work on this task, can you please assign it to me?

In {{kafka.py}}, we could add something similar to the [listening() function in 
zookeeper.py|https://github.com/apache/kafka/blob/ee352be9c88663b95b1096b7a294e61857857380/tests/kafkatest/services/zookeeper.py#L87]
 :
{code}
def listening(self, node, port):
    try:
        cmd = "nc -z %s %s" % (node.account.hostname, port)
        node.account.ssh_output(cmd, allow_fail=False)
        self.logger.debug("Kafka server started accepting connections at: '%s:%s'",
                          node.account.hostname, port)
        return True
    except (RemoteCommandError, ValueError):
        return False
{code}
and call it in {{start_node}}:
{code}
def start_node(self, node):
    ...
    self.logger.debug("Attempting to start KafkaService on %s with command: %s"
                      % (str(node.account), cmd))
    node.account.ssh(cmd)
    wait_until(lambda: self.listening(node, self.jmx_port), timeout_sec=30,
               backoff_sec=.25,
               err_msg="Kafka server didn't finish startup")
{code}
I have a [work in 
progress|https://github.com/asasvari/kafka/commit/916ab0b8291c7db2d16d84fc847630972628e345]; 
I executed a few tests 
({{./tests/kafkatest/tests/core/consumer_group_command_test.py}}, 
{{./tests/kafkatest/tests/core/simple_consumer_shell_test.py}}) and they passed.

However, {{listening()}} could be more general, so it might go into the 
{{utils}} module. I see there are references to {{monitor.wait_until}} grepping 
for messages in server logs at other places (like {{minikdc.py}}, 
{{connect.py}}, {{trogdor.py}}, {{streams.py}}) - we would need to know the 
exact ports there as well.
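
A possible shape for that shared helper - a sketch under the assumption that 
services expose a ducktape-style {{node.account}} with {{hostname}} and 
{{ssh_output}}; the function name and module placement are hypothetical, and 
the broad exception handler is for sketch purposes only:

```python
def wait_for_listening_port(node, port, logger=None):
    """Return True once `nc -z` confirms the port accepts connections,
    False on any failure (so callers can retry via wait_until)."""
    try:
        cmd = "nc -z %s %s" % (node.account.hostname, port)
        node.account.ssh_output(cmd, allow_fail=False)
        if logger is not None:
            logger.debug("%s:%s is accepting connections",
                         node.account.hostname, port)
        return True
    except Exception:
        return False
```

Services that know their listening port could then call 
{{wait_until(lambda: wait_for_listening_port(node, port), ...)}} instead of 
grepping logs.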

> Kafka system tests should use nc instead of log grep to detect start-up
> ---
>
> Key: KAFKA-6332
> URL: https://issues.apache.org/jira/browse/KAFKA-6332
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Priority: Major
>  Labels: newbie
>
> [~ewencp] suggested using nc -z test instead of grepping the logs for a more 
> reliable test. This came up when the system tests were broken by a log 
> improvement change.
> Reference: https://github.com/apache/kafka/pull/3834



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)