[jira] [Created] (KAFKA-12712) KRaft: Missing controller.quorom.voters config not properly handled

2021-04-23 Thread Magnus Edenhill (Jira)
Magnus Edenhill created KAFKA-12712:
---

 Summary: KRaft: Missing controller.quorom.voters config not 
properly handled
 Key: KAFKA-12712
 URL: https://issues.apache.org/jira/browse/KAFKA-12712
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.8.0
Reporter: Magnus Edenhill


When trying out KRaft in 2.8 I misspelled controller.quorum.voters as 
controller.quorom.voters, but the broker did not fail to start, nor did it 
print any warning.

 

Instead it raised this error:

 
{code:java}
[2021-04-23 18:25:13,484] INFO Starting controller 
(kafka.server.ControllerServer)[2021-04-23 18:25:13,484] INFO Starting 
controller (kafka.server.ControllerServer)[2021-04-23 18:25:13,485] ERROR 
[kafka-raft-io-thread]: Error due to 
(kafka.raft.KafkaRaftManager$RaftIoThread)java.lang.IllegalArgumentException: 
bound must be positive at java.util.Random.nextInt(Random.java:388) at 
org.apache.kafka.raft.RequestManager.findReadyVoter(RequestManager.java:57) at 
org.apache.kafka.raft.KafkaRaftClient.maybeSendAnyVoterFetch(KafkaRaftClient.java:1778)
 at 
org.apache.kafka.raft.KafkaRaftClient.pollUnattachedAsObserver(KafkaRaftClient.java:2080)
 at 
org.apache.kafka.raft.KafkaRaftClient.pollUnattached(KafkaRaftClient.java:2061) 
at 
org.apache.kafka.raft.KafkaRaftClient.pollCurrentState(KafkaRaftClient.java:2096)
 at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2181) at 
kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:53) at 
kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
{code}
which I guess eventually (1 minute later) led to this error, which terminated 
the broker:
{code:java}
[2021-04-23 18:26:14,435] ERROR [BrokerLifecycleManager id=2] Shutting down 
because we were unable to register with the controller quorum. 
(kafka.server.BrokerLifecycleManager)[2021-04-23 18:26:14,435] ERROR 
[BrokerLifecycleManager id=2] Shutting down because we were unable to register 
with the controller quorum. (kafka.server.BrokerLifecycleManager)[2021-04-23 
18:26:14,436] INFO [BrokerLifecycleManager id=2] registrationTimeout: shutting 
down event queue. (org.apache.kafka.queue.KafkaEventQueue)[2021-04-23 
18:26:14,437] INFO [BrokerLifecycleManager id=2] Transitioning from STARTING to 
SHUTTING_DOWN. (kafka.server.BrokerLifecycleManager)[2021-04-23 18:26:14,437] 
INFO [broker-2-to-controller-send-thread]: Shutting down 
(kafka.server.BrokerToControllerRequestThread)[2021-04-23 18:26:14,438] INFO 
[broker-2-to-controller-send-thread]: Stopped 
(kafka.server.BrokerToControllerRequestThread)[2021-04-23 18:26:14,438] INFO 
[broker-2-to-controller-send-thread]: Shutdown completed 
(kafka.server.BrokerToControllerRequestThread)[2021-04-23 18:26:14,441] ERROR 
[BrokerServer id=2] Fatal error during broker startup. Prepare to shutdown 
(kafka.server.BrokerServer)java.util.concurrent.CancellationException at 
java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2276) at 
kafka.server.BrokerLifecycleManager$ShutdownEvent.run(BrokerLifecycleManager.scala:474)
 at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:174)
 at java.lang.Thread.run(Thread.java:748)
{code}
But since the client listeners were made available prior to shutting down, the 
broker was deemed up and operational by the (naive) monitoring tool.

So:

 - The broker should fail on startup when given invalid/unknown config 
properties (see the sketch below). I understand this is technically tricky, so 
at the very least a warning should be logged.

 - Perhaps don't create the client listeners until the control plane is 
somewhat happy.
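For illustration, a minimal sketch of the kind of unknown-key check that could 
flag the typo at startup. The allow-list and helper below are hypothetical; a 
real check would be driven by the broker's own config definitions.

{code:java}
import java.util.Properties;
import java.util.Set;

public class UnknownConfigCheck {

    // Hypothetical allow-list for illustration only; the real broker would
    // derive the set of known keys from its config definitions.
    private static final Set<String> KNOWN_KEYS =
            Set.of("process.roles", "node.id", "controller.quorum.voters");

    public static void warnOnUnknownKeys(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (!KNOWN_KEYS.contains(key)) {
                // A startup warning like this would have flagged the
                // misspelled "controller.quorom.voters" immediately.
                System.err.printf("WARN Unknown configuration property '%s'%n", key);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("controller.quorom.voters", "1@localhost:9093"); // the typo
        warnOnUnknownKeys(props);
    }
}
{code}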

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9180) Broker won't start with empty log dir

2019-11-13 Thread Magnus Edenhill (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Edenhill resolved KAFKA-9180.

Resolution: Invalid

Turned out to be old client jars making a mess.

 

> Broker won't start with empty log dir
> -
>
> Key: KAFKA-9180
> URL: https://issues.apache.org/jira/browse/KAFKA-9180
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.4.0
>Reporter: Magnus Edenhill
>Assignee: Ismael Juma
>Priority: Blocker
>
> On kafka trunk at commit 1675115ec193acf4c7d44e68a57272edfec0b455:
>  
> Attempting to start the broker with an existing but empty log dir yields the 
> following error and terminates the process:
> {code:java}
> [2019-11-13 10:42:16,922] ERROR Failed to read meta.properties file under dir 
> /Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties
>  due to 
> /Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties
>  (No such file or directory) 
> (kafka.server.BrokerMetadataCheckpoint)[2019-11-13 10:42:16,924] ERROR Fail 
> to read meta.properties under log directory 
> /Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs
>  (kafka.server.KafkaServer)java.io.FileNotFoundException: 
> /Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties
>  (No such file or directory)at java.io.FileInputStream.open0(Native 
> Method)at java.io.FileInputStream.open(FileInputStream.java:195)  
>   at java.io.FileInputStream.(FileInputStream.java:138)at 
> java.io.FileInputStream.(FileInputStream.java:93)at 
> org.apache.kafka.common.utils.Utils.loadProps(Utils.java:512)at 
> kafka.server.BrokerMetadataCheckpoint.liftedTree2$1(BrokerMetadataCheckpoint.scala:73)
> at 
> kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:72) 
>at 
> kafka.server.KafkaServer.$anonfun$getBrokerMetadataAndOfflineDirs$1(KafkaServer.scala:704)
> at 
> kafka.server.KafkaServer.$anonfun$getBrokerMetadataAndOfflineDirs$1$adapted(KafkaServer.scala:702)
> at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)   
>  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)  
>   at 
> kafka.server.KafkaServer.getBrokerMetadataAndOfflineDirs(KafkaServer.scala:702)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:214)at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)  
>   at kafka.Kafka$.main(Kafka.scala:84)at 
> kafka.Kafka.main(Kafka.scala) {code}
>  
>  
> Changing the catch to FileNotFoundException fixes the issue, here:
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerMetadataCheckpoint.scala#L84]
>  
>  
> This is a regression from 2.3.x.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9180) Broker won't start with empty log dir

2019-11-13 Thread Magnus Edenhill (Jira)
Magnus Edenhill created KAFKA-9180:
--

 Summary: Broker won't start with empty log dir
 Key: KAFKA-9180
 URL: https://issues.apache.org/jira/browse/KAFKA-9180
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.4.0
Reporter: Magnus Edenhill
Assignee: Ismael Juma


On kafka trunk at commit 1675115ec193acf4c7d44e68a57272edfec0b455:

 

Attempting to start the broker with an existing but empty log dir yields the 
following error and terminates the process:
{code:java}
[2019-11-13 10:42:16,922] ERROR Failed to read meta.properties file under dir 
/Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties
 due to 
/Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties
 (No such file or directory) (kafka.server.BrokerMetadataCheckpoint)[2019-11-13 
10:42:16,924] ERROR Fail to read meta.properties under log directory 
/Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs
 (kafka.server.KafkaServer)java.io.FileNotFoundException: 
/Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties
 (No such file or directory)at java.io.FileInputStream.open0(Native 
Method)at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.(FileInputStream.java:138)at 
java.io.FileInputStream.(FileInputStream.java:93)at 
org.apache.kafka.common.utils.Utils.loadProps(Utils.java:512)at 
kafka.server.BrokerMetadataCheckpoint.liftedTree2$1(BrokerMetadataCheckpoint.scala:73)
at 
kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:72)   
 at 
kafka.server.KafkaServer.$anonfun$getBrokerMetadataAndOfflineDirs$1(KafkaServer.scala:704)
at 
kafka.server.KafkaServer.$anonfun$getBrokerMetadataAndOfflineDirs$1$adapted(KafkaServer.scala:702)
at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)  
  at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) 
   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)  
  at 
kafka.server.KafkaServer.getBrokerMetadataAndOfflineDirs(KafkaServer.scala:702) 
   at kafka.server.KafkaServer.startup(KafkaServer.scala:214)at 
kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
at kafka.Kafka$.main(Kafka.scala:84)at kafka.Kafka.main(Kafka.scala) 
{code}
 
 
Changing the catch to FileNotFoundException fixes the issue, here:
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerMetadataCheckpoint.scala#L84]
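For illustration only, a minimal Java sketch of that suggestion (the actual fix 
belongs in the Scala file linked above): treat a missing meta.properties as "no 
metadata yet" rather than a fatal error.

{code:java}
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Optional;
import java.util.Properties;

public class MetaPropertiesReader {

    // Sketch of the suggested behaviour: an empty log dir simply has no
    // meta.properties yet, which should not terminate the broker.
    public static Optional<Properties> read(String path) throws IOException {
        Properties props = new Properties();
        try (FileReader reader = new FileReader(path)) {
            props.load(reader);
            return Optional.of(props);
        } catch (FileNotFoundException e) {
            return Optional.empty();
        }
    }
}
{code}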
 
 
This is a regression from 2.3.x.
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-7549) Old ProduceRequest with zstd compression does not return error to client

2018-10-25 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-7549:
--

 Summary: Old ProduceRequest with zstd compression does not return 
error to client
 Key: KAFKA-7549
 URL: https://issues.apache.org/jira/browse/KAFKA-7549
 Project: Kafka
  Issue Type: Bug
  Components: compression
Reporter: Magnus Edenhill


Kafka broker v2.1.0rc0.

 

KIP-110 states that:

"Zstd will only be allowed for the bumped produce API. That is, for older 
version clients(=below KAFKA_2_1_IV0), we return UNSUPPORTED_COMPRESSION_TYPE 
regardless of the message format."

 

However, sending a ProduceRequest V3 with zstd compression (which is a client 
side bug) closes the connection with the following exception rather than 
returning UNSUPPORTED_COMPRESSION_TYPE in the ProduceResponse:

 
{noformat}
[2018-10-25 11:40:31,813] ERROR Exception while processing request from 
127.0.0.1:60723-127.0.0.1:60656-94 (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error getting request 
for apiKey: PRODUCE, apiVersion: 3, connectionId: 
127.0.0.1:60723-127.0.0.1:60656-94, listenerName: ListenerName(PLAINTEXT), 
principal: User:ANONYMOUS
Caused by: org.apache.kafka.common.record.InvalidRecordException: Produce 
requests with version 3 are note allowed to use ZStandard compression
{noformat}
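A rough sketch of the check this report expects: return an error code in the 
ProduceResponse instead of throwing and dropping the connection. The class and 
enum names are illustrative, and the zstd attribute id and the v7 cutoff are my 
assumptions based on KIP-110, not taken from the broker source.

{code:java}
// Illustrative sketch only, not Kafka's actual request handling code.
enum ProduceError { NONE, UNSUPPORTED_COMPRESSION_TYPE }

final class ZstdVersionCheck {
    // Assumed attribute value for zstd in the record batch, and the first
    // produce version assumed to be allowed to carry it (per KIP-110).
    static final byte ZSTD = 4;
    static final short MIN_PRODUCE_VERSION_FOR_ZSTD = 7;

    static ProduceError validate(short produceApiVersion, byte compressionType) {
        if (compressionType == ZSTD
                && produceApiVersion < MIN_PRODUCE_VERSION_FOR_ZSTD) {
            // Expected outcome for the case in this report: a per-partition
            // error in the ProduceResponse rather than a closed connection.
            return ProduceError.UNSUPPORTED_COMPRESSION_TYPE;
        }
        return ProduceError.NONE;
    }
}
{code}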



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6885) DescribeConfigs synonyms are identical to parent entry for BROKER resources

2018-05-09 Thread Magnus Edenhill (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Edenhill resolved KAFKA-6885.

Resolution: Invalid

> DescribeConfigs synonyms are identical to parent entry for BROKER 
> resources
> ---
>
> Key: KAFKA-6885
> URL: https://issues.apache.org/jira/browse/KAFKA-6885
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 1.1.0
>Reporter: Magnus Edenhill
>Priority: Major
>
> The DescribeConfigs protocol response for BROKER resources returns synonyms 
> for various configuration entries, such as "log.dir".
> The list of synonyms returned are identical to their parent configuration 
> entry, rather than the actual synonyms.
> For example, for the broker "log.dir" config entry it returns one synonym, 
> also named "log.dir" rather than "log.dirs" or whatever the synonym is 
> supposed to be.
>  
> This does not seem to happen for TOPIC resources.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6885) DescribeConfigs synonyms are identical to parent entry for BROKER resources

2018-05-08 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-6885:
--

 Summary: DescribeConfigs synonyms are identical to parent entry 
for BROKER resources
 Key: KAFKA-6885
 URL: https://issues.apache.org/jira/browse/KAFKA-6885
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 1.1.0
Reporter: Magnus Edenhill


The DescribeConfigs protocol response for BROKER resources returns synonyms for 
various configuration entries, such as "log.dir".

The list of synonyms returned are identical to their parent configuration 
entry, rather than the actual synonyms.

For example, for the broker "log.dir" config entry it returns one synonym, also 
named "log.dir" rather than "log.dirs" or whatever the synonym is supposed to 
be.

 

This does not seem to happen for TOPIC resources.
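A Java AdminClient snippet along these lines can be used to reproduce the 
observation; the bootstrap address and broker id below are placeholders.

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.clients.admin.DescribeConfigsOptions;
import org.apache.kafka.common.config.ConfigResource;

public class BrokerSynonymsRepro {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1"); // placeholder id
            Config config = admin.describeConfigs(
                        Collections.singleton(broker),
                        new DescribeConfigsOptions().includeSynonyms(true))
                    .values().get(broker).get();
            for (ConfigEntry entry : config.entries()) {
                // Expected: "log.dir" lists "log.dirs" (or whatever the real
                // synonym is) rather than just repeating its own name.
                System.out.println(entry.name() + " -> " + entry.synonyms());
            }
        }
    }
}
{code}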



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6778) DescribeConfigs does not return error for non-existent topic

2018-04-11 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-6778:
--

 Summary: DescribeConfigs does not return error for non-existent 
topic
 Key: KAFKA-6778
 URL: https://issues.apache.org/jira/browse/KAFKA-6778
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 1.1.0
Reporter: Magnus Edenhill


Sending a DescribeConfigsRequest with a ConfigResource(TOPIC, 
"non-existent-topic") returns a fully populated ConfigResource back in the 
response with 24 configuration entries.

A resource-level error_code of UnknownTopic.. would be expected instead.

 
{code:java}
[0081_admin / 1.143s] ConfigResource #0: type TOPIC (2), 
"rdkafkatest_rnd3df408bf5d94d696_DescribeConfigs_notexist": 24 ConfigEntries, 
error NO_ERROR ()
[0081_admin / 1.144s] #0/24: Source UNKNOWN (5): "compression.type"="producer" 
[is read-only=n, default=n, sensitive=n, synonym=n] with 1 synonym(s)

{code}
But the topic does not exist:
{code:java}
$ $KAFKA_PATH/bin/kafka-topics.sh --zookeeper $ZK_ADDRESS --list | grep 
rdkafkatest_rnd3df408bf5d94d696_DescribeConfigs_notexist ; echo $?
1

{code}
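The same expectation expressed with the Java AdminClient (bootstrap address is 
a placeholder): the future for the unknown topic should fail with a 
resource-level error rather than return a populated config.

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.config.ConfigResource;

public class DescribeMissingTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(
                    ConfigResource.Type.TOPIC, "non-existent-topic");
            try {
                admin.describeConfigs(Collections.singleton(topic))
                        .values().get(topic).get();
                System.out.println("Unexpected: got a config for a topic that does not exist");
            } catch (ExecutionException e) {
                // Expected outcome: the future fails with an unknown-topic error.
                System.out.println("Resource-level error: " + e.getCause());
            }
        }
    }
}
{code}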



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-05-27 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027366#comment-16027366
 ] 

Magnus Edenhill commented on KAFKA-4340:


Since this is a change to the protocol API (a change of behaviour), I suspect 
that a proper KIP is required, and I would recommend reverting this commit for 
the upcoming release.


> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.11.0.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data with timestamp already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by log retention thread when it kicks in and the messages will be 
> deleted. 
> To avoid this produce-and-deleted scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-05-24 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023668#comment-16023668
 ] 

Magnus Edenhill commented on KAFKA-4340:


Generally I would agree, but in this case I don't think there is a practical 
difference between discarding outdated messages and removing them with the 
retention cleaner; it is just a matter of time frame. Discarding outdated 
messages instead of appending them to the log is immediate, while the retention 
cleaner might kick in immediately or only after whatever maximum interval it is 
configured with.
So from the producer and consumer perspectives the end result is pretty much 
the same: there is no guarantee that an outdated message will be seen by a 
consumer.

However, rejecting the entire batch means guaranteed data loss in the message 
stream: the producer will not retry those failed messages, even if all but one 
message in the batch were actually fine. I strongly feel this is undesired 
behaviour from the application's point of view.


> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.11.0.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data with timestamp already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by log retention thread when it kicks in and the messages will be 
> deleted. 
> To avoid this produce-and-deleted scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-05-24 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022945#comment-16022945
 ] 

Magnus Edenhill commented on KAFKA-4340:


While the idea behind this JIRA is good (as a means of optimization), I think 
it might be troublesome in practice.

If a producer sends a batch of N messages with one message being too old, the 
entire batch will fail (errors are propagated per partition, not per message) 
and the producer can't really recover and gracefully retry producing the 
messages that have not timed out.

This problem is not tied to a specific client, but rather to the nature of the 
data being produced: it will manifest itself with old timestamps, such as 
app-sourced timestamps, or things like MM.

A better alternative would perhaps be to silently discard the outdated message 
on the broker instead (which is effectively the same as writing the message to 
the log and then immediately cleaning it before a consumer picks it up).
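To make the contrast concrete, a small illustrative sketch (not broker code) of 
per-message discarding, where only the outdated records are dropped and the 
rest of the batch is still appended:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class PerMessageDiscard {
    // Illustrative only: drop only the records whose timestamps are too old,
    // instead of failing the whole batch because of a single outdated record.
    static List<Long> acceptRecent(List<Long> timestamps, long nowMs, long maxDiffMs) {
        List<Long> accepted = new ArrayList<>();
        for (long ts : timestamps) {
            if (nowMs - ts <= maxDiffMs) {
                accepted.add(ts); // would be appended to the log
            }
            // else: silently discard this record only
        }
        return accepted;
    }
}
{code}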

> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.11.0.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data with timestamp already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by log retention thread when it kicks in and the messages will be 
> deleted. 
> To avoid this produce-and-deleted scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-05-24 Thread Magnus Edenhill (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Edenhill reopened KAFKA-4340:


> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.11.0.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data with timestamp already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by log retention thread when it kicks in and the messages will be 
> deleted. 
> To avoid this produce-and-deleted scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (KAFKA-4842) Streams integration tests occasionally fail with connection error

2017-04-18 Thread Magnus Edenhill (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Edenhill updated KAFKA-4842:
---
Comment: was deleted

(was: Happened again on a trunk PR, 
https://github.com/apache/kafka/pull/2765#issuecomment-290381971 :

https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2525/


{noformat}
org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance FAILED
java.lang.AssertionError: Condition not met within timeout 3. waiting 
for metadata, store and value to be non null
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:257)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.verifyAllKVKeys(QueryableStateIntegrationTest.java:262)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:344)
{noformat}

)

> Streams integration tests occasionally fail with connection error
> -
>
> Key: KAFKA-4842
> URL: https://issues.apache.org/jira/browse/KAFKA-4842
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
>Assignee: Matthias J. Sax
> Fix For: 0.11.0.0
>
>
> {noformat}
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
> queryOnRebalance[0] FAILED  java.lang.IllegalStateException: No entry found 
> for connection 0. In ClusterConnectionStates.java: node state.
> {noformat}
> This happens locally. Happened to 
> KStreamAggregationIntegrationTest.java.shouldReduce() once too. 
> Not clear this is streams related. Also, it's hard to reproduce since it 
> doesn't happen all the time. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4983) Test failure: kafka.api.ConsumerBounceTest.testSubscribeWhenTopicUnavailable

2017-03-30 Thread Magnus Edenhill (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Edenhill updated KAFKA-4983:
---
Description: 
The PR builder encountered this test failure:
{{kafka.api.ConsumerBounceTest.testSubscribeWhenTopicUnavailable}}

https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2532/testReport/junit/kafka.api/ConsumerBounceTest/testSubscribeWhenTopicUnavailable/

{noformat}
[2017-03-30 13:42:40,875] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-03-30 13:42:40,884] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-03-30 13:42:41,221] FATAL [Kafka Server 1], Fatal error during 
KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer:118)
kafka.common.KafkaException: Socket server failed to bind to localhost:42198: 
Address already in use.
at kafka.network.Acceptor.openServerSocket(SocketServer.scala:330)
at kafka.network.Acceptor.(SocketServer.scala:255)
at kafka.network.SocketServer.$anonfun$startup$1(SocketServer.scala:99)
at 
kafka.network.SocketServer.$anonfun$startup$1$adapted(SocketServer.scala:90)
at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.network.SocketServer.startup(SocketServer.scala:90)
at kafka.server.KafkaServer.startup(KafkaServer.scala:215)
at kafka.utils.TestUtils$.createServer(TestUtils.scala:122)
at 
kafka.integration.KafkaServerTestHarness.$anonfun$setUp$1(KafkaServerTestHarness.scala:91)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.map(TraversableLike.scala:234)
at scala.collection.TraversableLike.map$(TraversableLike.scala:227)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:91)
at 
kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:65)
at kafka.api.ConsumerBounceTest.setUp(ConsumerBounceTest.scala:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at 

[jira] [Created] (KAFKA-4983) Test failure: kafka.api.ConsumerBounceTest.testSubscribeWhenTopicUnavailable

2017-03-30 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-4983:
--

 Summary: Test failure: 
kafka.api.ConsumerBounceTest.testSubscribeWhenTopicUnavailable
 Key: KAFKA-4983
 URL: https://issues.apache.org/jira/browse/KAFKA-4983
 Project: Kafka
  Issue Type: Test
Reporter: Magnus Edenhill


The PR builder encountered this test failure:
{{kafka.api.ConsumerBounceTest.testSubscribeWhenTopicUnavailable}}

https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2532/testReport/junit/kafka.api/ConsumerBounceTest/testSubscribeWhenTopicUnavailable/

{noformat}
[2017-03-30 13:42:40,875] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-03-30 13:42:40,884] ERROR ZKShutdownHandler is not registered, so 
ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
changes (org.apache.zookeeper.server.ZooKeeperServer:472)
[2017-03-30 13:42:41,221] FATAL [Kafka Server 1], Fatal error during 
KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer:118)
kafka.common.KafkaException: Socket server failed to bind to localhost:42198: 
Address already in use.
at kafka.network.Acceptor.openServerSocket(SocketServer.scala:330)
at kafka.network.Acceptor.(SocketServer.scala:255)
at kafka.network.SocketServer.$anonfun$startup$1(SocketServer.scala:99)
at 
kafka.network.SocketServer.$anonfun$startup$1$adapted(SocketServer.scala:90)
at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.network.SocketServer.startup(SocketServer.scala:90)
at kafka.server.KafkaServer.startup(KafkaServer.scala:215)
at kafka.utils.TestUtils$.createServer(TestUtils.scala:122)
at 
kafka.integration.KafkaServerTestHarness.$anonfun$setUp$1(KafkaServerTestHarness.scala:91)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234)
at scala.collection.Iterator.foreach(Iterator.scala:929)
at scala.collection.Iterator.foreach$(Iterator.scala:929)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
at scala.collection.IterableLike.foreach(IterableLike.scala:71)
at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.map(TraversableLike.scala:234)
at scala.collection.TraversableLike.map$(TraversableLike.scala:227)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:91)
at 
kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:65)
at kafka.api.ConsumerBounceTest.setUp(ConsumerBounceTest.scala:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
at 

[jira] [Commented] (KAFKA-4476) Kafka Streams gets stuck if metadata is missing

2017-03-30 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949128#comment-15949128
 ] 

Magnus Edenhill commented on KAFKA-4476:


Directed here from KAFKA-4482.

Happened again on trunk PR:
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2536/testReport/junit/org.apache.kafka.streams.integration/ResetIntegrationTest/testReprocessingFromScratchAfterResetWithoutIntermediateUserTopic/

> Kafka Streams gets stuck if metadata is missing
> ---
>
> Key: KAFKA-4476
> URL: https://issues.apache.org/jira/browse/KAFKA-4476
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> When a Kafka Streams application gets started for the first time, it can 
> happen that some topic metadata is missing when 
> {{StreamPartitionAssigner#assign()}} is called on the group leader instance. 
> This can result in an infinite loop within 
> {{StreamPartitionAssigner#assign()}}. This issue was detected by 
> {{ResetIntegrationTest}} that does have a transient timeout failure (c.f. 
> https://issues.apache.org/jira/browse/KAFKA-4058 -- this issue was re-opened 
> multiple times as the problem was expected to be in the test -- however, that 
> is not the case).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4842) Streams integration tests occasionally fail with connection error

2017-03-30 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948898#comment-15948898
 ] 

Magnus Edenhill commented on KAFKA-4842:


Happened again on a trunk PR, 
https://github.com/apache/kafka/pull/2765#issuecomment-290381971 :

https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2525/


{noformat}
org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance FAILED
java.lang.AssertionError: Condition not met within timeout 3. waiting 
for metadata, store and value to be non null
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:257)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.verifyAllKVKeys(QueryableStateIntegrationTest.java:262)
at 
org.apache.kafka.streams.integration.QueryableStateIntegrationTest.queryOnRebalance(QueryableStateIntegrationTest.java:344)
{noformat}



> Streams integration tests occasionally fail with connection error
> -
>
> Key: KAFKA-4842
> URL: https://issues.apache.org/jira/browse/KAFKA-4842
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
> Fix For: 0.11.0.0
>
>
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
> queryOnRebalance[0] FAILED  java.lang.IllegalStateException: No entry found 
> for connection 0. In ClusterConnectionStates.java: node state. This happens 
> locally. Happened to KStreamAggregationIntegrationTest.java.shouldReduce() 
> once too. 
> Not clear this is streams related. Also, it's hard to reproduce since it 
> doesn't happen all the time. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4974) System test failure in 0.8.2.2 upgrade tests

2017-03-29 Thread Magnus Edenhill (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Edenhill updated KAFKA-4974:
---
Description: 
The 0.10.2 system test failed in one of the upgrade tests from 0.8.2.2:
http://testing.confluent.io/confluent-kafka-0-10-2-system-test-results/?prefix=2017-03-21--001.1490092219--apache--0.10.2--4a019bd/TestUpgrade/test_upgrade/from_kafka_version=0.8.2.2.to_message_format_version=None.compression_types=.snappy.new_consumer=False/


{noformat}
[INFO  - 2017-03-21 07:35:48,802 - runner_client - log - lineno:221]: 
RunnerClient: 
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.8.2.2.to_message_format_version=None.compression_types=.snappy.new_consumer=False:
 FAIL: Kafka server didn't finish startup
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 125, in test_upgrade
self.run_produce_consume_validate(core_test_action=lambda: 
self.perform_upgrade(from_kafka_version,
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 114, in run_produce_consume_validate
core_test_action(*args)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 126, in 
to_message_format_version))
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 52, in perform_upgrade
self.kafka.start_node(node)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/services/kafka/kafka.py",
 line 222, in start_node
monitor.wait_until("Kafka Server.*started", timeout_sec=30, 
backoff_sec=.25, err_msg="Kafka server didn't finish startup")
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/cluster/remoteaccount.py",
 line 642, in wait_until
return wait_until(lambda: self.acct.ssh("tail -c +%d %s | grep '%s'" % 
(self.offset+1, self.log, pattern), allow_fail=True) == 0, **kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
{noformat}

Logs:
{noformat}
==> ./KafkaService-0-140646398705744/worker9/server-start-stdout-stderr.log <==
[2017-03-21 07:35:18,250] DEBUG Leaving process event 
(org.I0Itec.zkclient.ZkClient)
[2017-03-21 07:35:18,250] INFO EventThread shut down 
(org.apache.zookeeper.ClientCnxn)
[2017-03-21 07:35:18,250] INFO [Kafka Server 2], shut down completed 
(kafka.server.KafkaServer)
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
already in use: 9192; nested exception is: 
java.net.BindException: Address already in use
{noformat}

That's from starting the upgraded broker, which seems to indicate that 0.8.2.2 
was not properly shut down or has its RMI port in the close-wait state.

Since there probably isn't much to do about 0.8.2.2, the test should probably 
be hardened to either select a random port or wait for the lingering port to 
become available (netstat can be used for that).
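As a rough illustration of the "wait for the lingering port" hardening (the 
actual system tests are Python/ducktape; this Java sketch only shows the 
retry-until-bindable idea):

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortWait {
    // Retry binding until the port is free again or the deadline passes.
    static boolean waitForFreePort(int port, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            try (ServerSocket socket = new ServerSocket()) {
                socket.setReuseAddress(true);
                socket.bind(new InetSocketAddress(port));
                return true; // bind succeeded, so the port is available
            } catch (IOException stillInUse) {
                Thread.sleep(250); // e.g. previous broker's socket still lingering
            }
        }
        return false;
    }
}
{code}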

This earlier failure from the same 0.8.2.2 invocation might be of interest:
{noformat}
[2017-03-21 07:35:18,233] DEBUG Writing clean shutdown marker at 
/mnt/kafka-data-logs (kafka.log.LogManager)
[2017-03-21 07:35:18,235] INFO Shutdown complete. (kafka.log.LogManager)
[2017-03-21 07:35:18,238] DEBUG Shutting down task scheduler. 
(kafka.utils.KafkaScheduler)
[2017-03-21 07:35:18,243] WARN sleep interrupted (kafka.utils.Utils$)
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
kafka.controller.RequestSendThread$$anonfun$liftedTree1$1$1.apply$mcV$sp(ControllerChannelManager.scala:144)
at kafka.utils.Utils$.swallow(Utils.scala:172)
at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
at kafka.utils.Logging$class.swallow(Logging.scala:94)
at kafka.utils.Utils$.swallow(Utils.scala:45)

[jira] [Created] (KAFKA-4974) System test failure in 0.8.2.2 upgrade tests

2017-03-29 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-4974:
--

 Summary: System test failure in 0.8.2.2 upgrade tests
 Key: KAFKA-4974
 URL: https://issues.apache.org/jira/browse/KAFKA-4974
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Magnus Edenhill


The 0.10.2 system test failed in one of the upgrade tests from 0.8.2.2:
http://testing.confluent.io/confluent-kafka-0-10-2-system-test-results/?prefix=2017-03-21--001.1490092219--apache--0.10.2--4a019bd/TestUpgrade/test_upgrade/from_kafka_version=0.8.2.2.to_message_format_version=None.compression_types=.snappy.new_consumer=False/


{noformat}
[INFO  - 2017-03-21 07:35:48,802 - runner_client - log - lineno:221]: 
RunnerClient: 
kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.8.2.2.to_message_format_version=None.compression_types=.snappy.new_consumer=False:
 FAIL: Kafka server didn't finish startup
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 125, in test_upgrade
self.run_produce_consume_validate(core_test_action=lambda: 
self.perform_upgrade(from_kafka_version,
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 114, in run_produce_consume_validate
core_test_action(*args)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 126, in 
to_message_format_version))
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 52, in perform_upgrade
self.kafka.start_node(node)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/services/kafka/kafka.py",
 line 222, in start_node
monitor.wait_until("Kafka Server.*started", timeout_sec=30, 
backoff_sec=.25, err_msg="Kafka server didn't finish startup")
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/cluster/remoteaccount.py",
 line 642, in wait_until
return wait_until(lambda: self.acct.ssh("tail -c +%d %s | grep '%s'" % 
(self.offset+1, self.log, pattern), allow_fail=True) == 0, **kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
 line 36, in wait_until
{noformat}

Logs:
{noformat}
==> ./KafkaService-0-140646398705744/worker9/server-start-stdout-stderr.log <==
[2017-03-21 07:35:18,250] DEBUG Leaving process event 
(org.I0Itec.zkclient.ZkClient)
[2017-03-21 07:35:18,250] INFO EventThread shut down 
(org.apache.zookeeper.ClientCnxn)
[2017-03-21 07:35:18,250] INFO [Kafka Server 2], shut down completed 
(kafka.server.KafkaServer)
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
already in use: 9192; nested exception is: 
java.net.BindException: Address already in use
{noformat}

That's from starting the upgraded broker, which seems to indicate that 0.8.2.2 
was not properly shut down or has its RMI port in the close-wait state.

Since there probably isn't much to do about 0.8.2.2, the test should probably 
be hardened to either select a random port or wait for the lingering port to 
become available (netstat can be used for that).

This earlier failure from the same 0.8.2.2 invocation might be of interest:
{noformat}
[2017-03-21 07:35:18,233] DEBUG Writing clean shutdown marker at 
/mnt/kafka-data-logs (kafka.log.LogManager)
[2017-03-21 07:35:18,235] INFO Shutdown complete. (kafka.log.LogManager)
[2017-03-21 07:35:18,238] DEBUG Shutting down task scheduler. 
(kafka.utils.KafkaScheduler)
[2017-03-21 07:35:18,243] WARN sleep interrupted (kafka.utils.Utils$)
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
kafka.controller.RequestSendThread$$anonfun$liftedTree1$1$1.apply$mcV$sp(ControllerChannelManager.scala:144)
at kafka.utils.Utils$.swallow(Utils.scala:172)
at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
at 

[jira] [Commented] (KAFKA-1588) Offset response does not support two requests for the same topic/partition combo

2016-08-10 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415000#comment-15415000
 ] 

Magnus Edenhill commented on KAFKA-1588:


I see now that this behaviour is in fact documented in the protocol spec, 
sorry about that. I rest my case.

> Offset response does not support two requests for the same topic/partition 
> combo
> 
>
> Key: KAFKA-1588
> URL: https://issues.apache.org/jira/browse/KAFKA-1588
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Todd Palino
>Assignee: Sriharsha Chintalapani
>  Labels: newbie
>
> When performing an OffsetRequest, if you request the same topic and partition 
> combination in a single request more than once (for example, if you want to 
> get both the head and tail offsets for a partition in the same request), you 
> will get a response for both, but they will be the same offset.
> We identified that the problem is that when the offset response is assembled, 
> a map is used to store the offset info before it is converted to the response 
> format and sent to the client. Therefore, the second request for a 
> topic/partition combination will overwrite the offset from the first request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1588) Offset response does not support two requests for the same topic/partition combo

2016-08-10 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414995#comment-15414995
 ] 

Magnus Edenhill commented on KAFKA-1588:


Thanks [~ijuma].
But I don't see how the bug fix changes the protocol (the behaviour is 
undocumented and non-intuitive) or breaks any clients (how can a client rely on 
or make use of undefined ordering, silent ignoring of duplicate topics, etc.?).

For the greater good of the community and ecosystem I feel it is important that 
the protocol specification is considered authoritative rather than the broker 
implementation, and that any discrepancies between the two are fixed in the 
corresponding implementation(s) rather than in the protocol spec, unless the 
protocol spec is clearly wrong (which would affect all protocol implementations, 
i.e. clients).


> Offset response does not support two requests for the same topic/partition 
> combo
> 
>
> Key: KAFKA-1588
> URL: https://issues.apache.org/jira/browse/KAFKA-1588
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Todd Palino
>Assignee: Sriharsha Chintalapani
>  Labels: newbie
>
> When performing an OffsetRequest, if you request the same topic and partition 
> combination in a single request more than once (for example, if you want to 
> get both the head and tail offsets for a partition in the same request), you 
> will get a response for both, but they will be the same offset.
> We identified that the problem is that when the offset response is assembled, 
> a map is used to store the offset info before it is converted to the response 
> format and sent to the client. Therefore, the second request for a 
> topic/partition combination will overwrite the offset from the first request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1588) Offset response does not support two requests for the same topic/partition combo

2016-08-10 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414868#comment-15414868
 ] 

Magnus Edenhill commented on KAFKA-1588:


[~guozhang] Why would this fix require a protocol change? It is already an 
array of Topics+partitions in the OffsetRequest and all the broker needs to do 
is honour the contents and order of that list and respond accordingly, which 
should be a broker implementation detail only and not affect the protocol 
definition, right?
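For what it's worth, the overwrite described in the issue is easy to picture 
with a plain map keyed only by topic+partition (illustrative, not the broker's 
actual code):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class DuplicateEntryOverwrite {
    public static void main(String[] args) {
        // If the response is assembled in a map keyed by topic-partition alone,
        // asking for both the earliest and latest offset of the same partition
        // leaves only one answer.
        Map<String, Long> offsets = new HashMap<>();
        offsets.put("foobar-0", -2L); // request #1: earliest
        offsets.put("foobar-0", -1L); // request #2: latest overwrites request #1
        System.out.println(offsets);  // {foobar-0=-1}
    }
}
{code}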

> Offset response does not support two requests for the same topic/partition 
> combo
> 
>
> Key: KAFKA-1588
> URL: https://issues.apache.org/jira/browse/KAFKA-1588
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Todd Palino
>Assignee: Sriharsha Chintalapani
>  Labels: newbie
>
> When performing an OffsetRequest, if you request the same topic and partition 
> combination in a single request more than once (for example, if you want to 
> get both the head and tail offsets for a partition in the same request), you 
> will get a response for both, but they will be the same offset.
> We identified that the problem is that when the offset response is assembled, 
> a map is used to store the offset info before it is converted to the response 
> format and sent to the client. Therefore, the second request for a 
> topic/partition combination will overwrite the offset from the first request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3743) kafka-server-start.sh: Unhelpful error message

2016-05-22 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-3743:
--

 Summary: kafka-server-start.sh: Unhelpful error message
 Key: KAFKA-3743
 URL: https://issues.apache.org/jira/browse/KAFKA-3743
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.10.0.0
Reporter: Magnus Edenhill
Priority: Minor


When trying to start Kafka from an uncompiled source tarball, rather than the 
binary distribution, the kafka-server-start.sh command gives a mystical error 
message:

{noformat}
$ bin/kafka-server-start.sh config/server.properties 
Error: Could not find or load main class config.server.properties
{noformat}

This could probably be improved to say something closer to the truth.

This is on the 0.10.0.0-rc6 tarball from github.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3547) Broker does not disconnect client on unknown request

2016-04-12 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236783#comment-15236783
 ] 

Magnus Edenhill commented on KAFKA-3547:


Fixed in
https://github.com/apache/kafka/commit/bb643f83a95e9ebb3a9f8331fe9998c8722c07d4#diff-d0332a0ff31df50afce3809d90505b25R80

> Broker does not disconnect client on unknown request
> 
>
> Key: KAFKA-3547
> URL: https://issues.apache.org/jira/browse/KAFKA-3547
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.0, 0.9.0.1
>Reporter: Magnus Edenhill
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> A regression in 0.9.0 causes the broker to not close a client connection when 
> receiving an unsupported request.
> Two effects of this are:
> - the client is not informed that the request was not supported (even though 
> a closed connection is a blunt indication, it is in fact some indication); the 
> request will (hopefully) just time out on the client after some time, 
> stalling subsequent operations.
>  - the broker leaks the connection until the connection reaper brings it down 
> or the client closes the connection



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3547) Broker does not disconnect client on unknown request

2016-04-12 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-3547:
--

 Summary: Broker does not disconnect client on unknown request
 Key: KAFKA-3547
 URL: https://issues.apache.org/jira/browse/KAFKA-3547
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.9.0.1, 0.9.0.0
Reporter: Magnus Edenhill
Priority: Critical
 Fix For: 0.10.0.0


A regression in 0.9.0 causes the broker to not close a client connection when 
receiving an unsupported request.

Two effects of this are:
 - the client is not informed that the request was not supported (even though a 
closed connection is a blunt indication, it is in fact some indication); the 
request will (hopefully) just time out on the client after some time, stalling 
subsequent operations.
 - the broker leaks the connection until the connection reaper brings it down 
or the client closes the connection





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum

2016-04-11 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234610#comment-15234610
 ] 

Magnus Edenhill commented on KAFKA-3160:


[~ijuma] It needs to go through the slow path to "fix down" (i.e., break the 
checksum) for older clients connecting to a newer broker.
But yeah, the plan is to bundle this functionality with message format v1, so 
that message format v0 with the LZ4 attribute flag has a broken checksum, while 
message format v1 with the LZ4 attribute flag has a correct checksum.
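For reference, the LZ4 frame spec computes the header checksum over the frame 
descriptor only, excluding the 4-byte magic number. A minimal sketch, assuming 
the lz4-java xxhash bindings; the descriptor bytes below are illustrative.

{code:java}
import net.jpountz.xxhash.XXHashFactory;

public class Lz4HeaderChecksum {
    // Per the LZ4 frame format, HC = (xxh32(descriptor, seed=0) >> 8) & 0xFF,
    // where the descriptor excludes the magic number. Hashing the magic too
    // is the bug described in this issue.
    static int headerChecksum(byte[] frameDescriptor) {
        int hash = XXHashFactory.fastestInstance().hash32()
                .hash(frameDescriptor, 0, frameDescriptor.length, 0);
        return (hash >> 8) & 0xFF;
    }

    public static void main(String[] args) {
        byte[] descriptor = {0x60, 0x40}; // FLG and BD bytes, values illustrative
        System.out.printf("HC = 0x%02X%n", headerChecksum(descriptor));
    }
}
{code}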

> Kafka LZ4 framing code miscalculates header checksum
> 
>
> Key: KAFKA-3160
> URL: https://issues.apache.org/jira/browse/KAFKA-3160
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.8.2.2, 0.9.0.1
>Reporter: Dana Powers
>Assignee: Magnus Edenhill
>  Labels: compatibility, compression, lz4
>
> KAFKA-1493 partially implements the LZ4 framing specification, but it 
> incorrectly calculates the header checksum. This causes 
> KafkaLZ4BlockInputStream to raise an error 
> [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed 
> LZ4 data. It also causes KafkaLZ4BlockOutputStream to generate incorrectly 
> framed LZ4 data, which means clients decoding LZ4 messages from kafka will 
> always receive incorrectly framed data.
> Specifically, the current implementation includes the 4-byte MagicNumber in 
> the checksum, which is incorrect.
> http://cyan4973.github.io/lz4/lz4_Frame_format.html
> Third-party clients that attempt to use off-the-shelf lz4 framing find that 
> brokers reject messages as having a corrupt checksum. So currently non-java 
> clients must 'fixup' lz4 packets to deal with the broken checksum.
> Magnus first identified this issue in librdkafka; kafka-python has the same 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum

2016-04-10 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234245#comment-15234245
 ] 

Magnus Edenhill commented on KAFKA-3160:


[~dana.powers] My broker patch adds a new attribute bit to specify the fixed 
LZ4F framing but still relies on clients being backwards compatible with the 
broken framing format; that was before KIP-31 was a thing, though.

Your proposed solution of reusing the KIP-31 behaviour is much better, and I'd 
definitely like to see this in broker 0.10.
This will also formally add LZ4 support to the protocol (it is not even 
mentioned in the Kafka protocol docs) and be compatible with KIP-35 (e.g., 
ProduceRequest >= v2 supports LZ4).

I'm a strong +1 on this.

> Kafka LZ4 framing code miscalculates header checksum
> 
>
> Key: KAFKA-3160
> URL: https://issues.apache.org/jira/browse/KAFKA-3160
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.8.2.2, 0.9.0.1
>Reporter: Dana Powers
>Assignee: Magnus Edenhill
>  Labels: compatibility, compression, lz4
>
> KAFKA-1493 partially implements the LZ4 framing specification, but it 
> incorrectly calculates the header checksum. This causes 
> KafkaLZ4BlockInputStream to raise an error 
> [IOException(DESCRIPTOR_HASH_MISMATCH)] if a client sends *correctly* framed 
> LZ4 data. It also causes KafkaLZ4BlockOutputStream to generate incorrectly 
> framed LZ4 data, which means clients decoding LZ4 messages from kafka will 
> always receive incorrectly framed data.
> Specifically, the current implementation includes the 4-byte MagicNumber in 
> the checksum, which is incorrect.
> http://cyan4973.github.io/lz4/lz4_Frame_format.html
> Third-party clients that attempt to use off-the-shelf lz4 framing find that 
> brokers reject messages as having a corrupt checksum. So currently non-java 
> clients must 'fixup' lz4 packets to deal with the broken checksum.
> Magnus first identified this issue in librdkafka; kafka-python has the same 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3442) FetchResponse size exceeds max.partition.fetch.bytes

2016-03-22 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206838#comment-15206838
 ] 

Magnus Edenhill commented on KAFKA-3442:


librdkafka silently ignores partial messages (which are most often seen at the 
end of a message set), so it should be improved for this case, where it gets 
stuck on a single message that exceeds the fetch size.
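
For readers less familiar with the setting in question, below is a minimal 
Java-consumer sketch of the config the quoted repro below relies on; the 
bootstrap address, group id and poll timeout are placeholder assumptions.

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch: bound the per-partition fetch size, as in the repro described below.
public class SmallFetchConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");      // placeholder
        props.put("group.id", "kafka-3442-repro");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // The limit under discussion: the broker is expected to truncate the
        // partition's data to roughly this many bytes per fetch.
        props.put("max.partition.fetch.bytes", "1024");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("foobar"));  // topic from the repro
            ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
            System.out.println("fetched " + records.count() + " records");
        }
    }
}
{code}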

> FetchResponse size exceeds max.partition.fetch.bytes
> 
>
> Key: KAFKA-3442
> URL: https://issues.apache.org/jira/browse/KAFKA-3442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Dana Powers
>Assignee: Jiangjie Qin
>Priority: Blocker
> Fix For: 0.10.0.0
>
>
> Produce 1 byte message to topic foobar
> Fetch foobar w/ max.partition.fetch.bytes=1024
> Test expects to receive a truncated message (~1024 bytes). 0.8 and 0.9 pass 
> this test, but 0.10 FetchResponse has full message, exceeding the max 
> specified in the FetchRequest.
> I tested with v0 and v1 apis, both fail. Have not tested w/ v2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3394) Broker fails to parse Null Metadata in OffsetCommit requests

2016-03-14 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-3394:
--

 Summary: Broker fails to parse Null Metadata in OffsetCommit 
requests
 Key: KAFKA-3394
 URL: https://issues.apache.org/jira/browse/KAFKA-3394
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.0.0
Reporter: Magnus Edenhill


librdkafka sends a null Metadata string (size -1) in its OffsetCommitRequests 
when there is no metadata; unfortunately, this leads to an exception on the 
broker, which expects a non-null string (a small sketch of the encoding follows 
after the log below).

{noformat}
[2016-03-11 11:11:57,623] ERROR Closing socket for 
10.191.0.33:9092-10.191.0.33:56503 because of error (kafka.network.Processor)
kafka.network.InvalidRequestException: Error getting request for apiKey: 8 and 
apiVersion: 1
at 
kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:91)
at kafka.network.RequestChannel$Request.(RequestChannel.scala:88)
at kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:426)
at kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:421)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.network.Processor.run(SocketServer.scala:421)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.protocol.types.SchemaException: Error 
reading field 'topics': Error reading field 'partitions': Error reading field 
'metadata': java.lang.NegativeArraySizeException
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
at 
org.apache.kafka.common.requests.OffsetCommitRequest.parse(OffsetCommitRequest.java:260)
at 
org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:50)
at 
kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:88)
... 9 more
{noformat}
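
To make the failure mode concrete, here is a small sketch (not from the ticket) 
of the wire encoding involved: protocol strings are an int16 length followed by 
the bytes, with -1 marking a null string. A reader that blindly allocates 
new byte[len] fails with exactly the NegativeArraySizeException shown above, 
while a null-aware reader tolerates it.

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: wire-format strings are an int16 length followed by the bytes;
// a null string is encoded as length -1. A reader that does not special-case
// -1 fails exactly like the broker log above.
public class NullableStringSketch {
    static void writeNullableString(ByteBuffer buf, String s) {
        if (s == null) {
            buf.putShort((short) -1);               // null marker
        } else {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) b.length);
            buf.put(b);
        }
    }

    static String readNullableString(ByteBuffer buf) {
        short len = buf.getShort();
        if (len < 0) {
            return null;                            // tolerate the null marker
        }
        byte[] b = new byte[len];                   // a naive reader with len = -1
        buf.get(b);                                 // would throw NegativeArraySizeException
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        writeNullableString(buf, null);             // what librdkafka sends for "no metadata"
        buf.flip();
        System.out.println("read back: " + readNullableString(buf));
    }
}
{code}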



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3219) Long topic names mess up broker topic state

2016-02-08 Thread Magnus Edenhill (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Edenhill updated KAFKA-3219:
---
Description: 
Seems like the broker doesn't like topic names of 254 chars or more when they 
are created using kafka-topics.sh --create.
The problem does not seem to arise when the topic is created through automatic 
topic creation.

How to reproduce:

{code}
TOPIC=$(printf 'd%.0s' {1..254} ) ; bin/kafka-topics.sh --zookeeper 0 --create 
--topic $TOPIC --partitions 1 --replication-factor 1
{code}

{code}
[2016-02-06 22:00:01,943] INFO [ReplicaFetcherManager on broker 3] Removed 
fetcher for partitions 
[dd,0]
 (kafka.server.ReplicaFetcherManager)
[2016-02-06 22:00:01,944] ERROR [KafkaApi-3] Error when handling request 
{controller_id=3,controller_epoch=12,partition_states=[{topic=dd,partition=0,controller_epoch=12,leader=3,leader_epoch=0,isr=[3],zk_version=0,replicas=[3]}],live_leaders=[{id=3,host=eden,port=9093}]}
 (kafka.server.KafkaApis)
java.lang.NullPointerException
at 
scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.log.Log.loadSegments(Log.scala:138)
at kafka.log.Log.(Log.scala:92)
at kafka.log.LogManager.createLog(LogManager.scala:357)
at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:96)
at 
kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:176)
at 
kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:176)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:176)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:170)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:259)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:267)
at kafka.cluster.Partition.makeLeader(Partition.scala:170)
at 
kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:696)
at 
kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:695)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:695)
at 
kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:641)
at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:142)
at kafka.server.KafkaApis.handle(KafkaApis.scala:79)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)
{code}

  was:
Seems like the broker doesn't like topic names of 254 chars or more when 
creating using kafka-topics.sh --create.
The problem does not seem to arise when topic is created through automatic 
topic creation.

How to reproduce:

TOPIC=$(printf 'd%.0s' {1..254} ) ; bin/kafka-topics.sh --zookeeper 0 --create 
--topic $TOPIC --partitions 1 --replication-factor 1

[2016-02-06 22:00:01,943] INFO [ReplicaFetcherManager on broker 3] Removed 
fetcher for partitions 
[dd,0]
 (kafka.server.ReplicaFetcherManager)
[2016-02-06 22:00:01,944] ERROR [KafkaApi-3] Error when handling request 

[jira] [Created] (KAFKA-3219) Long topic names mess up broker topic state

2016-02-08 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-3219:
--

 Summary: Long topic names mess up broker topic state
 Key: KAFKA-3219
 URL: https://issues.apache.org/jira/browse/KAFKA-3219
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.9.0.0
Reporter: Magnus Edenhill


Seems like the broker doesn't like topic names of 254 chars or more when they 
are created using kafka-topics.sh --create.
The problem does not seem to arise when the topic is created through automatic 
topic creation (a note on the name length follows the log below).

How to reproduce:

TOPIC=$(printf 'd%.0s' {1..254} ) ; bin/kafka-topics.sh --zookeeper 0 --create 
--topic $TOPIC --partitions 1 --replication-factor 1

[2016-02-06 22:00:01,943] INFO [ReplicaFetcherManager on broker 3] Removed 
fetcher for partitions 
[dd,0]
 (kafka.server.ReplicaFetcherManager)
[2016-02-06 22:00:01,944] ERROR [KafkaApi-3] Error when handling request 
{controller_id=3,controller_epoch=12,partition_states=[{topic=dd,partition=0,controller_epoch=12,leader=3,leader_epoch=0,isr=[3],zk_version=0,replicas=[3]}],live_leaders=[{id=3,host=eden,port=9093}]}
 (kafka.server.KafkaApis)
java.lang.NullPointerException
at 
scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at kafka.log.Log.loadSegments(Log.scala:138)
at kafka.log.Log.(Log.scala:92)
at kafka.log.LogManager.createLog(LogManager.scala:357)
at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:96)
at 
kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:176)
at 
kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:176)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:176)
at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:170)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:259)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:267)
at kafka.cluster.Partition.makeLeader(Partition.scala:170)
at 
kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:696)
at 
kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:695)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:695)
at 
kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:641)
at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:142)
at kafka.server.KafkaApis.handle(KafkaApis.scala:79)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)
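
One possible explanation, an assumption on my part rather than anything stated 
in this ticket, is that each partition's log directory is named 
<topic>-<partition> and that common filesystems cap a single path component at 
255 bytes, which a 254-character topic name exceeds once the partition suffix 
is appended. A tiny sketch of the arithmetic:

{code:java}
// Sketch (assumption, not from the ticket): the per-partition log directory is
// named "<topic>-<partition>", and a single path component on common Linux
// filesystems is limited to 255 bytes, which a 254-character topic name
// exceeds as soon as the partition suffix is appended.
public class TopicNameLength {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 254; i++) {
            sb.append('d');                         // the repro's 254-char topic name
        }
        String logDirName = sb + "-0";              // directory for partition 0
        System.out.println(logDirName.length() + " chars, filesystem limit is 255");
    }
}
{code}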



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3160) Kafka LZ4 framing code miscalculates header checksum

2016-01-28 Thread Magnus Edenhill (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magnus Edenhill reassigned KAFKA-3160:
--

Assignee: Magnus Edenhill

> Kafka LZ4 framing code miscalculates header checksum
> 
>
> Key: KAFKA-3160
> URL: https://issues.apache.org/jira/browse/KAFKA-3160
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.8.2.2
>Reporter: Dana Powers
>Assignee: Magnus Edenhill
>
> KAFKA-1493 implements the LZ4 framing specification, but it incorrectly 
> calculates the header checksum. Specifically, the current implementation 
> includes the 4-byte MagicNumber in the checksum, which is incorrect.
> http://cyan4973.github.io/lz4/lz4_Frame_format.html
> Third-party clients that attempt to use off-the-shelf lz4 framing find that 
> brokers reject messages as having a corrupt checksum. So currently non-java 
> clients must 'fixup' lz4 packets to deal with the broken checksum.
> Magnus first identified this issue in librdkafka; kafka-python has the same 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2016-01-27 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119920#comment-15119920
 ] 

Magnus Edenhill commented on KAFKA-1493:


[~dana.powers] I can confirm this is the case, as you describe it. I suggest 
creating a new issue for this.
I have a patch that adds a new compression.type=lz4f with proper framing.
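
For context, producer-side compression is selected with the compression.type 
property; a minimal sketch with the Java producer is below. Note that lz4f is 
only the value proposed by the patch mentioned above, not an existing Kafka 
option, so the sketch uses the shipped lz4 value and flags the proposed one in 
a comment. Broker address and topic are placeholders.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: selecting LZ4 compression on the producer. "lz4f" is the value
// proposed in the patch discussed here, not a shipped Kafka option.
public class Lz4Producer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("compression.type", "lz4");                // existing option
        // props.put("compression.type", "lz4f");            // proposed by the patch
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("mytopic", "hello"));
        }
    }
}
{code}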

> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> --
>
> Key: KAFKA-1493
> URL: https://issues.apache.org/jira/browse/KAFKA-1493
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: James Oliver
>Assignee: James Oliver
>Priority: Blocker
> Fix For: 0.8.2.0
>
> Attachments: KAFKA-1493.patch, KAFKA-1493.patch, 
> KAFKA-1493_2014-10-16_13:49:34.patch, KAFKA-1493_2014-10-16_21:25:23.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2695) Handle null string/bytes protocol primitives

2015-12-29 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073886#comment-15073886
 ] 

Magnus Edenhill commented on KAFKA-2695:


This prevents kafka-consumer-groups.sh from describing librdkafka-based 
consumer groups; see https://github.com/edenhill/librdkafka/issues/479

> Handle null string/bytes protocol primitives
> 
>
> Key: KAFKA-2695
> URL: https://issues.apache.org/jira/browse/KAFKA-2695
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> The kafka protocol supports null bytes and string primitives by passing -1 as 
> the size, but the current deserializers implemented in 
> o.a.k.common.protocol.types.Type do not handle them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2014-10-06 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160927#comment-14160927
 ] 

Magnus Edenhill commented on KAFKA-1367:


May I suggest not changing the protocol, but only sending an empty ISR vector 
in the MetadataResponse?

 Broker topic metadata not kept in sync with ZooKeeper
 -

 Key: KAFKA-1367
 URL: https://issues.apache.org/jira/browse/KAFKA-1367
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Ryan Berdeen
  Labels: newbie++
 Attachments: KAFKA-1367.txt


 When a broker is restarted, the topic metadata responses from the brokers 
 will be incorrect (different from ZooKeeper) until a preferred replica leader 
 election.
 In the metadata, it looks like leaders are correctly removed from the ISR 
 when a broker disappears, but followers are not. Then, when a broker 
 reappears, the ISR is never updated.
 I used a variation of the Vagrant setup created by Joe Stein to reproduce 
 this with latest from the 0.8.1 branch: 
 https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2014-09-25 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148054#comment-14148054
 ] 

Magnus Edenhill commented on KAFKA-1367:


[~guozhang] Yes, KAFKA-1555 looks like a good match.

 Broker topic metadata not kept in sync with ZooKeeper
 -

 Key: KAFKA-1367
 URL: https://issues.apache.org/jira/browse/KAFKA-1367
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Ryan Berdeen
  Labels: newbie++
 Attachments: KAFKA-1367.txt


 When a broker is restarted, the topic metadata responses from the brokers 
 will be incorrect (different from ZooKeeper) until a preferred replica leader 
 election.
 In the metadata, it looks like leaders are correctly removed from the ISR 
 when a broker disappears, but followers are not. Then, when a broker 
 reappears, the ISR is never updated.
 I used a variation of the Vagrant setup created by Joe Stein to reproduce 
 this with latest from the 0.8.1 branch: 
 https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2014-09-22 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143218#comment-14143218
 ] 

Magnus Edenhill commented on KAFKA-1367:


Apart from supplying the ISR count and list in its metadata API, librdkafka 
also provides an `enforce.isr.cnt` configuration property that fails
produce requests locally before transmission if the currently known ISR count 
is smaller than the configured value.
This is a workaround for the broker not fully honoring `request.required.acks`, 
i.e., if `request.required.acks=3` and only one broker is available the produce 
request will not fail.
More info in the original issue here: 
https://github.com/edenhill/librdkafka/issues/91
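
Purely as an illustration, here is a rough sketch of the same guard expressed 
with the Java producer rather than librdkafka: inspect the in-sync replica list 
reported in the topic metadata and fail locally, before transmission, when it 
is smaller than a required count. The topic name, broker address and threshold 
are assumptions, and the guard is of course only as good as the ISR information 
returned by the broker, which is what this issue is about.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.PartitionInfo;

// Sketch: a client-side ISR-count guard in the spirit of librdkafka's
// enforce.isr.cnt, using the ISR list exposed in the producer's metadata.
public class IsrGuardProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        int requiredIsr = 2;                                 // illustrative threshold
        String topic = "mytopic";                            // placeholder

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (PartitionInfo p : producer.partitionsFor(topic)) {
                if (p.inSyncReplicas().length < requiredIsr) {
                    // Fail locally, before transmission, like enforce.isr.cnt does.
                    throw new IllegalStateException(
                        "partition " + p.partition() + " has ISR count "
                        + p.inSyncReplicas().length + " < " + requiredIsr);
                }
            }
            producer.send(new ProducerRecord<>(topic, "key", "value"));
        }
    }
}
{code}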

Generally I would assume that information provided by the broker is correct, 
otherwise it should not be included at all since it can't be used (reliably).

 Broker topic metadata not kept in sync with ZooKeeper
 -

 Key: KAFKA-1367
 URL: https://issues.apache.org/jira/browse/KAFKA-1367
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Ryan Berdeen
  Labels: newbie++
 Attachments: KAFKA-1367.txt


 When a broker is restarted, the topic metadata responses from the brokers 
 will be incorrect (different from ZooKeeper) until a preferred replica leader 
 election.
 In the metadata, it looks like leaders are correctly removed from the ISR 
 when a broker disappears, but followers are not. Then, when a broker 
 reappears, the ISR is never updated.
 I used a variation of the Vagrant setup created by Joe Stein to reproduce 
 this with latest from the 0.8.1 branch: 
 https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-955) After a leader change, messages sent with ack=0 are lost

2013-08-21 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746810#comment-13746810
 ] 

Magnus Edenhill commented on KAFKA-955:
---

Hi Guozhang,

I understand that you might not want to introduce a new message semantic at 
this point of the 0.8 beta, but it won't get easier after the release.

My proposal is a change to the protocol definition to allow unsolicited 
metadata response messages to be sent from the broker. This would of course 
require changes in most clients, but a very small one for those that are not 
interested in keeping their leader cache up to date.

Consider a producer forwarding 100k msgs/s for a number of topics to a broker 
that suddenly drops the connection because one of those topics changed leader: 
the producer message queue will quickly build up and might start dropping 
messages (for topics that didn't lose their leader) due to local queue 
thresholds, or will recover only slowly if the current message rate is close 
to the maximum throughput.


In my mind, closing the socket because one topic+partition changed leader is a 
very intrusive way to signal an event for a sub-set of the communication; it 
should instead be fixed properly with an unsolicited metadata response message.

The unsolicited metadata response message is useful for other scenarios as 
well, for instance new brokers and topics being added.

My two cents on the topic, thank you.
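
As a concrete reference for the scenario above, a minimal sketch of the ack=0 
setting with the (current) Java producer; broker address and topic are 
placeholders. With acks=0 the broker never sends a produce response, so a 
leader change can only be noticed at the next metadata refresh.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: with acks=0 the producer gets no acknowledgement at all, so a
// "Leader not local" condition on the old leader cannot reach the client.
public class AcksZeroProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("acks", "0");                              // the setting under discussion
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("mytopic", "value"),
                (metadata, exception) -> {
                    // With acks=0 this fires once the record is written out;
                    // broker-side errors such as a leader change are never seen here.
                    if (exception != null) {
                        exception.printStackTrace();
                    }
                });
            producer.flush();
        }
    }
}
{code}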

 After a leader change, messages sent with ack=0 are lost
 

 Key: KAFKA-955
 URL: https://issues.apache.org/jira/browse/KAFKA-955
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Rosenberg
Assignee: Guozhang Wang
 Attachments: KAFKA-955.v1.patch, KAFKA-955.v1.patch, 
 KAFKA-955.v2.patch, KAFKA-955.v3.patch


 If the leader changes for a partition, and a producer is sending messages 
 with ack=0, then messages will be lost, since the producer has no active way 
 of knowing that the leader has changed, until it's next metadata refresh 
 update.
 The broker receiving the message, which is no longer the leader, logs a 
 message like this:
 Produce request with correlation id 7136261 from client  on partition 
 [mytopic,0] failed due to Leader not local for partition [mytopic,0] on 
 broker 508818741
 This is exacerbated by the controlled shutdown mechanism, which forces an 
 immediate leader change.
 A possible solution to this would be for a broker which receives a message, 
 for a topic that it is no longer the leader for (and if the ack level is 0), 
 then the broker could just silently forward the message over to the current 
 leader.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira