[jira] [Assigned] (KAFKA-5358) Consumer perf tool should count rebalance time separately

2017-06-01 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-5358:
---

Assignee: huxi

> Consumer perf tool should count rebalance time separately
> -
>
> Key: KAFKA-5358
> URL: https://issues.apache.org/jira/browse/KAFKA-5358
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: huxi
>
> It would be helpful to measure rebalance time separately in the performance 
> tool so that throughput between different versions can be compared more 
> easily in spite of improvements such as 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance.
>  At the moment, running the perf tool on 0.11.0 or trunk for a short amount 
> of time will present a severely skewed picture since the overall time will be 
> dominated by the join group delay.
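A minimal sketch of how join/rebalance time could be measured separately from fetch
time, assuming a plain Java consumer; the bootstrap servers, group id, and topic name
are placeholders, not values from the issue:

{code}
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceTimingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "perf-test");                // hypothetical group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        long start = System.currentTimeMillis();
        final long[] joinEnd = {start};
        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("perf-topic"),  // hypothetical topic
                new ConsumerRebalanceListener() {
                    @Override
                    public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                    @Override
                    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                        joinEnd[0] = System.currentTimeMillis();  // group join finished here
                    }
                });
        consumer.poll(10000);  // the first poll triggers the join/rebalance
        System.out.println("rebalance time (ms): " + (joinEnd[0] - start));
        // ...measure fetch throughput from this point on, excluding the join delay...
        consumer.close();
    }
}
{code}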



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5327) Console Consumer should only poll for up to max messages

2017-05-25 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-5327:
---

Assignee: huxi

> Console Consumer should only poll for up to max messages
> 
>
> Key: KAFKA-5327
> URL: https://issues.apache.org/jira/browse/KAFKA-5327
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Dustin Cote
>Assignee: huxi
>Priority: Minor
>
> The ConsoleConsumer has a --max-messages flag that can be used to limit the 
> number of messages consumed. However, the number of records actually consumed 
> is governed by max.poll.records. This means you see one message on the 
> console, but your offset has moved forward a default of 500, which is kind of 
> counterintuitive. It would be good to only commit offsets for messages we 
> have printed to the console.
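A minimal sketch of the proposed behaviour, assuming a plain Java consumer with
placeholder connection settings; the idea is to cap max.poll.records at the
--max-messages value so the tool never fetches more records than it will print:

{code}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MaxMessagesSketch {
    public static void main(String[] args) {
        int maxMessages = 1;  // value that would come from --max-messages
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "console-consumer-demo");    // hypothetical
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // Cap max.poll.records at --max-messages so the consumer never fetches
        // (and later commits past) more records than it will actually print.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,
                Integer.toString(Math.min(500, maxMessages)));
        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.close();
    }
}
{code}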



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5296) Unable to write to some partitions of newly created topic in 10.2

2017-05-21 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019074#comment-16019074
 ] 

huxi commented on KAFKA-5296:
-

With the 0.10.2 client, could you try increasing `request.timeout.ms` a little 
to see whether this issue happens again?
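A minimal sketch of that suggestion, assuming a plain Java producer with placeholder
bootstrap servers and serializers; 60000 is only an illustrative value above the 30s
default:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TimeoutTweakSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Raise request.timeout.ms above the 30s default seen in the expiration error.
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");  // illustrative value
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        producer.close();
    }
}
{code}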

> Unable to write to some partitions of newly created topic in 10.2
> -
>
> Key: KAFKA-5296
> URL: https://issues.apache.org/jira/browse/KAFKA-5296
> Project: Kafka
>  Issue Type: Bug
>Reporter: Abhisek Saikia
>
> We are using Kafka 0.10.2; the cluster was running fine for a month with 50 
> topics, and now we are having an issue producing messages to newly created 
> topics. The create-topic command is successful, but producers are throwing 
> errors while writing to some partitions. 
> Error in producer-
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for 
> [topic1]-8: 30039 ms has passed since batch creation plus linger time
>   at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:70)
>  ~[kafka-clients-0.10.2.0.jar:na]
>   at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:57)
>  ~[kafka-clients-0.10.2.0.jar:na]
>   at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
>  ~[kafka-clients-0.10.2.0.jar:na]
> On the broker side, I don't see any topic-partition folder getting created for 
> the broker that is the leader for the partition. 
> With the 0.8 client, the write used to hang when it started writing to the 
> partition having this issue. With 0.10.2 the producer hang issue is resolved.
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5299) MirrorMaker with New.consumer doesn't consume message from multiple topics whitelisted

2017-05-21 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019072#comment-16019072
 ] 

huxi commented on KAFKA-5299:
-

How did you specify the whitelist? It should be a valid regular expression.
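A minimal sketch of how the new consumer treats such a whitelist, assuming placeholder
connection settings and hypothetical topic names topic1 and topic2; the whitelist is
compiled as a Java regular expression, so multiple topics are joined with '|', not
commas:

{code}
import java.util.Collection;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class WhitelistSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "mirror-maker-demo");        // hypothetical group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        // The whitelist is a Java regex, so several topics are joined with '|',
        // e.g. "topic1|topic2", not with commas.
        consumer.subscribe(Pattern.compile("topic1|topic2"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
        });
        consumer.close();
    }
}
{code}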

> MirrorMaker with New.consumer doesn't consume message from multiple topics 
> whitelisted 
> ---
>
> Key: KAFKA-5299
> URL: https://issues.apache.org/jira/browse/KAFKA-5299
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Jyoti
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5278) kafka-console-consumer: `--value-deserializer` is not working but `--property value.deserializer` does

2017-05-19 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-5278:
---

Assignee: huxi

> kafka-console-consumer: `--value-deserializer` is not working but `--property 
> value.deserializer` does
> --
>
> Key: KAFKA-5278
> URL: https://issues.apache.org/jira/browse/KAFKA-5278
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.2.1
>Reporter: Yeva Byzek
>Assignee: huxi
>Priority: Minor
>
> kafka-console-consumer: {{--value-deserializer}} is not working but 
> {{--property value.deserializer}} is working
> 1. Does not work
> {noformat}
> $ kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning 
> --topic TEST1  --value-deserializer 
> org.apache.kafka.common.serialization.LongDeserializer
> [2017-05-18 13:09:41,745] ERROR Error processing message, terminating 
> consumer process:  (kafka.tools.ConsoleConsumer$)
> java.lang.ClassCastException: java.lang.Long cannot be cast to [B
>   at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:100)
>   at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
>   at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
>   at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
>   at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> Processed a total of 0 messages
> {noformat}
> 2. Does work
> {noformat}
> $ kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning 
> --topic TEST1  --property 
> value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
> 1000
> 2500
> 2000
> 5500
> 8000
> {noformat}
> Without either, the output is
> {noformat}
> $ kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning 
> --topic TEST1
> ?
>   ?
> ?
> |
> @
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-14 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009914#comment-16009914
 ] 

huxi commented on KAFKA-5200:
-

[~ecomar] It seems there is no convenient tool to do this. However, you could 
follow the instructions below to complete the deletion (a sketch of steps 2 and 
4 follows the list):
1. Stop all healthy brokers
2. Manually delete `/brokers/topics/` znode in Zookeeper
3. Delete log files for the topic in the file systems
4. Remove `/admin/delete_topics` znode in Zookeeper
5. Start all healthy brokers
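A minimal Java sketch of steps 2 and 4 using the plain ZooKeeper client, assuming a
hypothetical topic name `my-topic`; step 3 (deleting the log files) still has to be
done on each broker's file system:

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;

public class TopicZnodeCleanup {
    // The plain ZooKeeper client has no recursive delete, so remove children first.
    static void deleteRecursively(ZooKeeper zk, String path) throws Exception {
        for (String child : zk.getChildren(path, false)) {
            deleteRecursively(zk, path + "/" + child);
        }
        zk.delete(path, -1);  // -1 matches any znode version
    }

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, (WatchedEvent event) -> { });
        deleteRecursively(zk, "/brokers/topics/my-topic");       // step 2
        deleteRecursively(zk, "/admin/delete_topics/my-topic");  // step 4
        zk.close();
    }
}
{code}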

> Deleting topic when one broker is down will prevent topic to be re-creatable
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5222) Kafka Connection error in contoller.logs

2017-05-12 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007770#comment-16007770
 ] 

huxi commented on KAFKA-5222:
-

Are you sure you see it while the broker is running rather than while it is 
being shut down?

> Kafka Connection error in contoller.logs
> 
>
> Key: KAFKA-5222
> URL: https://issues.apache.org/jira/browse/KAFKA-5222
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
> Environment: AMI - Amazon Linux AMI 
> Hardware Specifications - (8 core, 16 GB RAM , 1 TB HardDisk)
>Reporter: Abhimanyu Nagrath
>Priority: Minor
>
> I am using single node Kafka and single node zookeeper and my controller.log 
> files are flooded with this exception.
> java.io.IOException: Connection to 1 was disconnected before the response 
> was read
>   at 
> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3(NetworkClientBlockingOps.scala:114)
>   at 
> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3$adapted(NetworkClientBlockingOps.scala:112)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$1(NetworkClientBlockingOps.scala:112)
>   at 
> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
>   at 
> kafka.utils.NetworkClientBlockingOps$.pollContinuously$extension(NetworkClientBlockingOps.scala:142)
>   at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>   at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:192)
>   at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:184)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> What is the fix for this error?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5221) Kafka Consumer Attached to partition but not consuming messages

2017-05-12 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007766#comment-16007766
 ] 

huxi commented on KAFKA-5221:
-

[~manyu_aditya] How many partitions are subscribed to by the consumer instance 
that you suspect is not consuming messages? If it subscribes to too many 
partitions, I guess it's very likely that some partitions will not be consumed 
for quite a while.

> Kafka Consumer Attached to partition but not consuming messages
> ---
>
> Key: KAFKA-5221
> URL: https://issues.apache.org/jira/browse/KAFKA-5221
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.2.0
> Environment: Amazon-Linux AMI 
> Hardware Specifications - (8 core, 16 GB RAM , 1 TB HardDisk)
>Reporter: Abhimanyu Nagrath
>
> I have a single-node Kafka broker and a single ZooKeeper (3.4.9), and I am 
> using the new Kafka consumer APIs. One strange thing I observed: when I start 
> multiple Kafka consumers for multiple topics in a single group and run 
> ./kafka-consumer-groups.sh for that group, a few of the consumers are attached 
> to the group but do not consume any messages. 
> Below are the stats from the group command.
> TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST  CLIENT-ID
> topic1   0          288             288             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /     consumer-8
> topic1   1          283             283             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /     consumer-8
> topic1   2          279             279             0    consumer-8-c9487cd3-573b-4c97-87c1-ddf2063ab5ae  /     consumer-8
> topic2   0          -               9               -    consumer-1-b0476dc8-099c-4a62-a68c-e9dc9c0a5bed  /     consumer-1
> topic2   1          -               2               -    consumer-1-b0476dc8-099c-4a62-a68c-e9dc9c0a5bed  /     consumer-1
> topic3   0          450             450             0    consumer-3-63c07703-17d0-471b-8c5f-17347699f108  /     consumer-3
> topic4   1          -               54              -    consumer-2-94dcc209-8377-45ce-8473-9ab0d85951c4  /
> topic2   2          441             441             0    consumer-5-bcfffc99-5915-41f4-b3e4-970baa204c14  /
> So can someone help me understand why, for topic **topic2** partition 0, 
> **current-offset** is showing - and **lag** is showing -, but messages are 
> still there on the server, as **LOG-END-OFFSET** is showing 9?
> I have 600 GB of data distributed among 280 topics and 7000 partitions.
> This is happening very frequently, and restarting the consumers solves the 
> issue temporarily.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-12 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007757#comment-16007757
 ] 

huxi commented on KAFKA-5200:
-

Alternatively, you could start up that broker again; the delete operation will 
then resume automatically.

> Deleting topic when one broker is down will prevent topic to be re-creatable
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5222) Kafka Connection error in contoller.logs

2017-05-12 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007742#comment-16007742
 ] 

huxi commented on KAFKA-5222:
-

Did you set `broker.id` in server.properties? I wonder where `Connection to 1` 
comes from if you did not set this config and only one broker is started, since 
the default broker id starts from zero.

> Kafka Connection error in contoller.logs
> 
>
> Key: KAFKA-5222
> URL: https://issues.apache.org/jira/browse/KAFKA-5222
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
> Environment: AMI - Amazon Linux AMI 
> Hardware Specifications - (8 core, 16 GB RAM , 1 TB HardDisk)
>Reporter: Abhimanyu Nagrath
>Priority: Minor
>
> I am using single node Kafka and single node zookeeper and my controller.log 
> files are flooded with this exception.
> java.io.IOException: Connection to 1 was disconnected before the response 
> was read
>   at 
> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3(NetworkClientBlockingOps.scala:114)
>   at 
> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3$adapted(NetworkClientBlockingOps.scala:112)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$1(NetworkClientBlockingOps.scala:112)
>   at 
> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
>   at 
> kafka.utils.NetworkClientBlockingOps$.pollContinuously$extension(NetworkClientBlockingOps.scala:142)
>   at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>   at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:192)
>   at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:184)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> What is the fix for this error?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-09 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002107#comment-16002107
 ] 

huxi edited comment on KAFKA-5200 at 5/9/17 6:15 AM:
-

We could add a parameter to the kafka-reassign-partitions script to enable a 
forced reassignment of partitions from dead brokers.


was (Author: huxi_2b):
Could add parameter for kafka-reassign-partitions script to forcibly reassign 
partitions from dead brokers.

> Deleting topic when one broker is down will prevent topic to be re-creatable
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5200) Deleting topic when one broker is down will prevent topic to be re-creatable

2017-05-09 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002107#comment-16002107
 ] 

huxi commented on KAFKA-5200:
-

Could add parameter for kafka-reassign-partitions script to forcibly reassign 
partitions from dead brokers.

> Deleting topic when one broker is down will prevent topic to be re-creatable
> 
>
> Key: KAFKA-5200
> URL: https://issues.apache.org/jira/browse/KAFKA-5200
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Edoardo Comar
>
> In a cluster with 5 broker, replication factor=3, min in sync=2,
> one broker went down 
> A user's app remained of course unaware of that and deleted a topic that 
> (unknowingly) had a replica on the dead broker.
> The topic went in 'pending delete' mode
> The user then tried to recreate the topic - which failed, so his app was left 
> stuck - no working topic and no ability to create one.
> The reassignment tool fails to move the replica out of the dead broker - 
> specifically because the broker with the partition replica to move is dead :-)
> Incidentally the confluent-rebalancer docs say
> http://docs.confluent.io/current/kafka/post-deployment.html#scaling-the-cluster
> > Supports moving partitions away from dead brokers
> It'd be nice to similarly improve the opensource reassignment tool



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5178) Potential Performance Degradation in Kafka Producer when using Multiple Threads

2017-05-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998316#comment-15998316
 ] 

huxi commented on KAFKA-5178:
-

I suggest splitting the test into multiple sub-tests in order to find the 
bottleneck. The round-trip latency is made up of several components, such as 
remote time, queue time on the producer side, and polling time on the consumer 
side. It's hard to tell which part contributes most when hitting the 
performance limit.

Besides, I notice that there are 64 partitions in total distributed across 
three brokers. How many physical disks does each broker machine have on 
average? Could you also monitor the disk usage% during the test to see if the 
disks are too busy?

> Potential Performance Degradation in Kafka Producer when using Multiple 
> Threads
> ---
>
> Key: KAFKA-5178
> URL: https://issues.apache.org/jira/browse/KAFKA-5178
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
>
> There is evidence that the Kafka Producer drops performance as we increase 
> the number of threads using it. 
> This is based on some benchmarking done in the community. I have not 
> independently validated these results. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5177) Automatic creation of topic prompts warnings into console

2017-05-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997932#comment-15997932
 ] 

huxi commented on KAFKA-5177:
-

This seems to be a transient warning indicating the client failed to find the 
leader in its metadata. You will not see it any more once the client-side 
metadata is updated.

> Automatic creation of topic prompts warnings into console
> -
>
> Key: KAFKA-5177
> URL: https://issues.apache.org/jira/browse/KAFKA-5177
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 0.10.1.0
> Environment: Mac OSX 16.5.0 Darwin Kernel Version 16.5.0 
> root:xnu-3789.51.2~3/RELEASE_X86_64 x86_64 & JDK 1.8.0_121
>Reporter: Pranas Baliuka
>Priority: Minor
>
> The quick start guide https://kafka.apache.org/0101/documentation.html#quickstart
> leaves a bad first impression at the step where test messages are appended:
> {code}
> kafka_2.11-0.10.1.0 pranas$ bin/kafka-topics.sh --list --zookeeper 
> localhost:2181
> session-1
> kafka_2.11-0.10.1.0 pranas$ bin/kafka-console-producer.sh --broker-list 
> localhost:9092 --topic test
> Message 1
> [2017-05-05 09:05:10,923] WARN Error while fetching metadata with correlation 
> id 0 : {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
> Message 2
> Message 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5177) Default quick start config prompts warnings into console

2017-05-04 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997692#comment-15997692
 ] 

huxi commented on KAFKA-5177:
-

Did you create topic `test` before running the kafka-console-producer script?

> Default quick start config prompts warnings into console
> 
>
> Key: KAFKA-5177
> URL: https://issues.apache.org/jira/browse/KAFKA-5177
> Project: Kafka
>  Issue Type: Wish
>  Components: documentation
>Affects Versions: 0.10.1.0
> Environment: Mac OSX 16.5.0 Darwin Kernel Version 16.5.0 
> root:xnu-3789.51.2~3/RELEASE_X86_64 x86_64 & JDK 1.8.0_121
>Reporter: Pranas Baliuka
>
> The quick start guide https://kafka.apache.org/0101/documentation.html#quickstart
> leaves a bad first impression at the step where test messages are appended:
> {code}
> kafka_2.11-0.10.1.0 pranas$ bin/kafka-topics.sh --list --zookeeper 
> localhost:2181
> session-1
> kafka_2.11-0.10.1.0 pranas$ bin/kafka-console-producer.sh --broker-list 
> localhost:9092 --topic test
> Message 1
> [2017-05-05 09:05:10,923] WARN Error while fetching metadata with correlation 
> id 0 : {test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
> Message 2
> Message 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5153) KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting

2017-05-02 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994197#comment-15994197
 ] 

huxi commented on KAFKA-5153:
-

Could you set `replica.lag.time.max.ms` and `replica.fetch.wait.max.ms` to 
larger values and retry? 

> KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
> ---
>
> Key: KAFKA-5153
> URL: https://issues.apache.org/jira/browse/KAFKA-5153
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
> Environment: RHEL 6
> Java Version  1.8.0_91-b14
>Reporter: Arpan
>Priority: Critical
> Attachments: server_1_72server.log, server_2_73_server.log, 
> server_3_74Server.log, server.properties, ThreadDump_1493564142.dump, 
> ThreadDump_1493564177.dump, ThreadDump_1493564249.dump
>
>
> Hi Team, 
> I was earlier referring to issue KAFKA-4477 because the problem I am facing 
> is similar. I tried to find the same reference in the release docs as well, 
> but did not get anything in 0.10.1.1 or 0.10.2.0. I am currently using 
> 2.11_0.10.2.0.
> I have a 3-node cluster for Kafka and a cluster for ZooKeeper as well, on the 
> same set of servers in cluster mode. We have around 240 GB of data getting 
> transferred through Kafka every day. What we are observing is a disconnect of 
> a server from the cluster and the ISR getting reduced, and it starts impacting 
> service.
> I have also observed the file descriptor count increasing a bit; in normal 
> circumstances we have not observed an FD count of more than 500, but when the 
> issue started we were observing it in the range of 650-700 on all 3 servers. 
> Attaching thread dumps of all 3 servers from when we started facing the issue 
> recently.
> The issue vanishes once you bounce the nodes, and the setup does not run for 
> more than 5 days without this issue. Attaching server logs as well.
> Kindly let me know if you need any additional information. Attaching 
> server.properties as well for one of the servers (it's similar on all 3 
> servers).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-5155) Messages can be deleted prematurely when some producers use timestamps and some not

2017-05-02 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994177#comment-15994177
 ] 

huxi edited comment on KAFKA-5155 at 5/3/17 2:27 AM:
-

This is very similar to a JIRA 
issue ([kafka-4398|https://issues.apache.org/jira/browse/KAFKA-4398]) I 
reported, complaining that the broker side cannot honor the order of 
timestamps.
It sounds like you cannot mix the new timestamps and old timestamps under the 
current design.


was (Author: huxi_2b):
This is very similar with a jira 
issue([kafka-4398|https://issues.apache.org/jira/browse/KAFKA-4398]) reported 
by me complaining of the fact that Kafka cannot broker side cannot honor the 
order of timestamp.   
Sounds like you cannot mix up the new timestamps and old timestamps based on 
the current design.

> Messages can be deleted prematurely when some producers use timestamps and 
> some not
> ---
>
> Key: KAFKA-5155
> URL: https://issues.apache.org/jira/browse/KAFKA-5155
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
>Reporter: Petr Plavjaník
>
> Some messages can be deleted prematurely and never read in the following 
> scenario. One producer uses timestamps and produces messages that are appended 
> to the beginning of a log segment. Another producer produces messages without 
> a timestamp. In that case the largest timestamp is determined by the old 
> messages that carry a timestamp, the new messages without a timestamp do not 
> influence it, and the log segment with old and new messages can be deleted 
> immediately after the last new message with no timestamp is appended. When all 
> appended messages have no timestamp, they are not deleted, because the 
> {{lastModified}} attribute of the {{LogSegment}} is used instead.
> New test case to {{kafka.log.LogTest}} that fails:
> {code}
>   @Test
>   def 
> shouldNotDeleteTimeBasedSegmentsWhenTimestampIsNotProvidedForSomeMessages() {
> val retentionMs = 1000
> val old = TestUtils.singletonRecords("test".getBytes, timestamp = 0)
> val set = TestUtils.singletonRecords("test".getBytes, timestamp = -1, 
> magicValue = 0)
> val log = createLog(set.sizeInBytes, retentionMs = retentionMs)
> // append some messages to create some segments
> log.append(old)
> for (_ <- 0 until 12)
>   log.append(set)
> assertEquals("No segment should be deleted", 0, log.deleteOldSegments())
>   }
> {code}
> It can be prevented by using {{def largestTimestamp = 
> Math.max(maxTimestampSoFar, lastModified)}} in LogSegment, or by using 
> current timestamp when messages with timestamp {{-1}} are appended.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5155) Messages can be deleted prematurely when some producers use timestamps and some not

2017-05-02 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994177#comment-15994177
 ] 

huxi commented on KAFKA-5155:
-

This is very similar with a jira 
issue([kafka-4398|https://issues.apache.org/jira/browse/KAFKA-4398]) reported 
by me complaining of the fact that Kafka cannot broker side cannot honor the 
order of timestamp.   
Sounds like you cannot mix up the new timestamps and old timestamps based on 
the current design.

> Messages can be deleted prematurely when some producers use timestamps and 
> some not
> ---
>
> Key: KAFKA-5155
> URL: https://issues.apache.org/jira/browse/KAFKA-5155
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
>Reporter: Petr Plavjaník
>
> Some messages can be deleted prematurely and never read in the following 
> scenario. One producer uses timestamps and produces messages that are appended 
> to the beginning of a log segment. Another producer produces messages without 
> a timestamp. In that case the largest timestamp is determined by the old 
> messages that carry a timestamp, the new messages without a timestamp do not 
> influence it, and the log segment with old and new messages can be deleted 
> immediately after the last new message with no timestamp is appended. When all 
> appended messages have no timestamp, they are not deleted, because the 
> {{lastModified}} attribute of the {{LogSegment}} is used instead.
> New test case to {{kafka.log.LogTest}} that fails:
> {code}
>   @Test
>   def 
> shouldNotDeleteTimeBasedSegmentsWhenTimestampIsNotProvidedForSomeMessages() {
> val retentionMs = 1000
> val old = TestUtils.singletonRecords("test".getBytes, timestamp = 0)
> val set = TestUtils.singletonRecords("test".getBytes, timestamp = -1, 
> magicValue = 0)
> val log = createLog(set.sizeInBytes, retentionMs = retentionMs)
> // append some messages to create some segments
> log.append(old)
> for (_ <- 0 until 12)
>   log.append(set)
> assertEquals("No segment should be deleted", 0, log.deleteOldSegments())
>   }
> {code}
> It can be prevented by using {{def largestTimestamp = 
> Math.max(maxTimestampSoFar, lastModified)}} in LogSegment, or by using 
> current timestamp when messages with timestamp {{-1}} are appended.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5161) reassign-partitions to check if broker of ID exists in cluster

2017-05-02 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-5161:
---

Assignee: huxi

> reassign-partitions to check if broker of ID exists in cluster
> --
>
> Key: KAFKA-5161
> URL: https://issues.apache.org/jira/browse/KAFKA-5161
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.1.1
> Environment: Debian 8
>Reporter: Lawrence Weikum
>Assignee: huxi
>Priority: Minor
>
> A topic was created with only one replica. We wanted to increase it later to 
> 3 replicas. A JSON file was created, but the IDs for the brokers were 
> incorrect and not part of the system. 
> The script or the brokers receiving the reassignment command should first 
> check that the new IDs exist in the cluster and only then continue, throwing 
> an error to the user if one of them doesn't.
> The current effect of assigning partitions to non-existent brokers is a stuck 
> replication assignment with no way to stop it. 
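A minimal sketch of the requested check, assuming the plain ZooKeeper client and
hypothetical replica ids; live brokers register themselves under /brokers/ids:

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;

public class ReassignmentPreCheck {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, (WatchedEvent event) -> { });
        // Live brokers register ephemeral znodes under /brokers/ids.
        Set<Integer> liveBrokers = new HashSet<>();
        for (String id : zk.getChildren("/brokers/ids", false)) {
            liveBrokers.add(Integer.parseInt(id));
        }
        // Broker ids taken from the proposed reassignment JSON (hypothetical values).
        List<Integer> proposedReplicas = Arrays.asList(1, 2, 5);
        for (int id : proposedReplicas) {
            if (!liveBrokers.contains(id)) {
                throw new IllegalArgumentException("Broker id " + id + " does not exist in the cluster");
            }
        }
        zk.close();
    }
}
{code}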



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5143) Windows platform does not offer kafka-broker-api-versions.bat

2017-04-29 Thread huxi (JIRA)
huxi created KAFKA-5143:
---

 Summary: Windows platform does not offer 
kafka-broker-api-versions.bat
 Key: KAFKA-5143
 URL: https://issues.apache.org/jira/browse/KAFKA-5143
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 0.10.2.0
Reporter: huxi
Assignee: huxi
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4879) KafkaConsumer.position may hang forever when deleting a topic

2017-04-25 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982598#comment-15982598
 ] 

huxi commented on KAFKA-4879:
-

[~guozhang] What about KafkaConsumer.pollOnce? Does the running time of 
`updateFetchPositions` count towards the total timeout?

> KafkaConsumer.position may hang forever when deleting a topic
> -
>
> Key: KAFKA-4879
> URL: https://issues.apache.org/jira/browse/KAFKA-4879
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.2.0
>Reporter: Shixiong Zhu
>Assignee: Armin Braun
>
> KafkaConsumer.position may hang forever when deleting a topic. The problem is 
> this line 
> https://github.com/apache/kafka/blob/022bf129518e33e165f9ceefc4ab9e622952d3bd/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L374
> The timeout is "Long.MAX_VALUE", and it will just retry forever for 
> UnknownTopicOrPartitionException.
> Here is a reproducer
> {code}
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.TopicPartition;
> import org.apache.kafka.common.serialization.StringDeserializer;
> import java.util.Collections;
> import java.util.Properties;
> import java.util.Set;
> public class KafkaReproducer {
>   public static void main(String[] args) {
> // Make sure "delete.topic.enable" is set to true.
> // Please create the topic test with "3" partitions manually.
> // The issue is gone when there is only one partition.
> String topic = "test";
> Properties props = new Properties();
> props.put("bootstrap.servers", "localhost:9092");
> props.put("group.id", "testgroup");
> props.put("value.deserializer", StringDeserializer.class.getName());
> props.put("key.deserializer", StringDeserializer.class.getName());
> props.put("enable.auto.commit", "false");
> KafkaConsumer<String, String> kc = new KafkaConsumer<>(props);
> kc.subscribe(Collections.singletonList(topic));
> kc.poll(0);
> Set<TopicPartition> partitions = kc.assignment();
> System.out.println("partitions: " + partitions);
> kc.pause(partitions);
> kc.seekToEnd(partitions);
> System.out.println("please delete the topic in 30 seconds");
> try {
>   // Sleep 30 seconds to give us enough time to delete the topic.
>   Thread.sleep(30000);
> } catch (InterruptedException e) {
>   e.printStackTrace();
> }
> System.out.println("sleep end");
> for (TopicPartition p : partitions) {
>   System.out.println(p + " offset: " + kc.position(p));
> }
> System.out.println("cannot reach here");
> kc.close();
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs

2017-04-24 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-5118:
---

Assignee: huxi

> Improve message for Kafka failed startup with non-Kafka data in data.dirs
> -
>
> Key: KAFKA-5118
> URL: https://issues.apache.org/jira/browse/KAFKA-5118
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.0
>Reporter: Dustin Cote
>Assignee: huxi
>Priority: Minor
>
> Today, if you try to start up a broker with some non-Kafka data in the 
> data.dirs, you end up with a cryptic message:
> {code}
> [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads 
> during logs loading: java.lang.StringIndexOutOfBoundsException: String index 
> out of range: -1 (kafka.log.LogManager) 
> [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1 
> {code}
> It'd be better if we could tell the user to look for non-Kafka data in the 
> data.dirs and print out the offending directory that caused the problem in 
> the first place.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs

2017-04-24 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982092#comment-15982092
 ] 

huxi edited comment on KAFKA-5118 at 4/25/17 1:12 AM:
--

I remember 0.10.2.0 checks whether the directory name contains a dash before 
substring-ing it. Another possibility is that your non-Kafka directory ends 
with `-delete`. Right?


was (Author: huxi_2b):
I remember 0.10.2.0 checks whether the directory name contains the dash before 
substring-ing it. This is another possibility that your non-Kafka directory 
ends with `-delete`. Right?

> Improve message for Kafka failed startup with non-Kafka data in data.dirs
> -
>
> Key: KAFKA-5118
> URL: https://issues.apache.org/jira/browse/KAFKA-5118
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.0
>Reporter: Dustin Cote
>Priority: Minor
>
> Today, if you try to start up a broker with some non-Kafka data in the 
> data.dirs, you end up with a cryptic message:
> {code}
> [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads 
> during logs loading: java.lang.StringIndexOutOfBoundsException: String index 
> out of range: -1 (kafka.log.LogManager) 
> [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1 
> {code}
> It'd be better if we could tell the user to look for non-Kafka data in the 
> data.dirs and print out the offending directory that caused the problem in 
> the first place.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5118) Improve message for Kafka failed startup with non-Kafka data in data.dirs

2017-04-24 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982092#comment-15982092
 ] 

huxi commented on KAFKA-5118:
-

I remember 0.10.2.0 checks whether the directory name contains the dash before 
substring-ing it. This is another possibility that your non-Kafka directory 
ends with `-delete`. Right?

> Improve message for Kafka failed startup with non-Kafka data in data.dirs
> -
>
> Key: KAFKA-5118
> URL: https://issues.apache.org/jira/browse/KAFKA-5118
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.0
>Reporter: Dustin Cote
>Priority: Minor
>
> Today, if you try to start up a broker with some non-Kafka data in the 
> data.dirs, you end up with a cryptic message:
> {code}
> [2017-04-21 13:35:08,122] ERROR There was an error in one of the threads 
> during logs loading: java.lang.StringIndexOutOfBoundsException: String index 
> out of range: -1 (kafka.log.LogManager) 
> [2017-04-21 13:35:08,124] FATAL [Kafka Server 3], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1 
> {code}
> It'd be better if we could tell the user to look for non-Kafka data in the 
> data.dirs and print out the offending directory that caused the problem in 
> the first place.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4295) kafka-console-consumer.sh does not delete the temporary group in zookeeper

2017-04-24 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980835#comment-15980835
 ] 

huxi commented on KAFKA-4295:
-

[~ijuma] Are we still going to fix this issue, given that KIP-109 is planned 
to be finished in 0.11?

> kafka-console-consumer.sh does not delete the temporary group in zookeeper
> --
>
> Key: KAFKA-4295
> URL: https://issues.apache.org/jira/browse/KAFKA-4295
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Sswater Shi
>Assignee: huxi
>Priority: Minor
>
> I'm not sure whether it is a bug or whether you designed it this way.
> Since 0.9.x.x, kafka-console-consumer.sh does not delete the group 
> information under zookeeper/consumers on exit when run without "--new-consumer". 
> There will be a lot of abandoned zookeeper/consumers/console-consumer-xxx if 
> kafka-console-consumer.sh is run many times.
> In 0.8.x.x, kafka-console-consumer.sh could be followed by a "group" argument. 
> If not specified, kafka-console-consumer.sh would create a temporary group 
> name like 'console-consumer-'. If the group name was specified by "group", 
> the information under zookeeper/consumers was kept on exit. If the group name 
> was a temporary one, the information in ZooKeeper was deleted when 
> kafka-console-consumer.sh was quit with Ctrl+C. Why was this changed in 
> 0.9.x.x?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4592) Kafka Producer Metrics Invalid Value

2017-04-24 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980818#comment-15980818
 ] 

huxi commented on KAFKA-4592:
-

[~ijuma] What's the status of this JIRA, after you said you needed a discussion 
with Jun about whether 0 is a good initial value for metrics?

> Kafka Producer Metrics Invalid Value
> 
>
> Key: KAFKA-4592
> URL: https://issues.apache.org/jira/browse/KAFKA-4592
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
>Reporter: AJ Jwair
>Assignee: huxi
>
> Producer metrics
> Metric name: record-size-max
> When no records are produced during the monitoring window, record-size-max 
> has an invalid value of -9.223372036854776E16.
> Please notice that the value is not a very small number close to zero bytes; 
> it is negative 90 quadrillion bytes.
> The same behavior was observed in records-lag-max.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5091) ReassignPartitionsCommand should protect against empty replica list assignment

2017-04-23 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-5091:
---

Assignee: huxi

> ReassignPartitionsCommand should protect against empty replica list 
> assignment 
> ---
>
> Key: KAFKA-5091
> URL: https://issues.apache.org/jira/browse/KAFKA-5091
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Ryan P
>Assignee: huxi
>
> Currently it is possible to lower a topic's replication factor to 0 through 
> the use of the kafka-reassign-partitions command, 
> i.e. 
> cat increase-replication-factor.json
>   {"version":1,
>   "partitions":[{"topic":"foo","partition":0,"replicas":[]}]}
> kafka-reassign-partitions --zookeeper localhost:2181 --reassignment-json-file 
> increase-replication-factor.json --execute
> Topic:test  PartitionCount:1  ReplicationFactor:0  Configs:
>   Topic: foo  Partition: 0  Leader: -1  Replicas:   Isr:
> I for one can't think of a reason why this is something someone would do 
> intentionally. That said I think it's worth validating that at least 1 
> replica remains within the replica list prior to executing the partition 
> reassignment. 
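A minimal sketch of the suggested validation, assuming the replica list has already
been parsed from the reassignment JSON; the partition name in the message is taken
from the example above:

{code}
import java.util.Collections;
import java.util.List;

public class ReplicaListCheck {
    public static void main(String[] args) {
        // Replica list parsed from "replicas":[] in the reassignment JSON above.
        List<Integer> replicas = Collections.emptyList();
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException(
                    "Partition foo-0 has an empty replica list; at least one replica is required");
        }
    }
}
{code}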



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5104) DumpLogSegments should not open index files with `rw`

2017-04-23 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-5104:
---

Assignee: huxi

> DumpLogSegments should not open index files with `rw`
> -
>
> Key: KAFKA-5104
> URL: https://issues.apache.org/jira/browse/KAFKA-5104
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.10.2.0
> Environment: Kafka is run as root
>Reporter: Yeva Byzek
>Assignee: huxi
>Priority: Minor
>
> kafka.tools.DumpLogSegments requires sudo for index files but not for log 
> files. It seems inconsistent that `w` access would be required for the index 
> files just to dump the output.
> {noformat}
> $ sudo kafka-run-class kafka.tools.DumpLogSegments  --files 
> .index
> Dumping .index
> offset: 0 position: 0
> {noformat}
> {noformat}
> $ kafka-run-class kafka.tools.DumpLogSegments  --files .index
> Dumping .index
> Exception in thread "main" java.io.FileNotFoundException: 
> .index (Permission denied)
> at java.io.RandomAccessFile.open0(Native Method)
> at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
> at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:50)
> at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
> at 
> kafka.tools.DumpLogSegments$.kafka$tools$DumpLogSegments$$dumpIndex(DumpLogSegments.scala:137)
> at 
> kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:100)
> at 
> kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:93)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:93)
> at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5007) Kafka Replica Fetcher Thread- Resource Leak

2017-04-18 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972314#comment-15972314
 ] 

huxi commented on KAFKA-5007:
-

Maybe a complete stack trace would be helpful for diagnosing which action 
throws this exception.

> Kafka Replica Fetcher Thread- Resource Leak
> ---
>
> Key: KAFKA-5007
> URL: https://issues.apache.org/jira/browse/KAFKA-5007
> Project: Kafka
>  Issue Type: Bug
>  Components: core, network
>Affects Versions: 0.10.0.0, 0.10.1.1, 0.10.2.0
> Environment: Centos 7
> Jave 8
>Reporter: Joseph Aliase
>Priority: Critical
>  Labels: reliability
>
> Kafka runs out of open file descriptors when the system network interface is 
> down.
> Issue description:
> We have a Kafka cluster of 5 nodes running version 0.10.1.1. The open file 
> descriptor limit for the account running Kafka is set to 10.
> During an upgrade, the network interface went down. The outage continued for 
> 12 hours and eventually all the brokers crashed with a java.io.IOException: 
> Too many open files error.
> We repeated the test in a lower environment and observed that the open socket 
> count keeps increasing while the NIC is down.
> We have around 13 topics with a maximum partition count of 120, and the 
> number of replica fetcher threads is set to 8.
> Using an internal monitoring tool we observed that the open socket descriptor 
> count for the broker pid continued to increase while the NIC was down, 
> leading to the open file descriptor error. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5010) Log cleaner crashed with BufferOverflowException when writing to the writeBuffer

2017-04-17 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972081#comment-15972081
 ] 

huxi commented on KAFKA-5010:
-

[~ijuma] Is it due to the fact that Kafka chooses 1024 as the minimum block 
size for Snappy if the total size of log message set is smaller than 1024? See 
o.a.k.common.record.AbstractRecords#estimatedSize: 
return compressionType == CompressionType.NONE ? size : Math.min(Math.max(size 
/ 2, 1024), 1 << 16);


> Log cleaner crashed with BufferOverflowException when writing to the 
> writeBuffer
> 
>
> Key: KAFKA-5010
> URL: https://issues.apache.org/jira/browse/KAFKA-5010
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.2.0
>Reporter: Shuai Lin
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> After upgrading from 0.10.0.1 to 0.10.2.0 the log cleaner thread crashed with 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer:
> {code}
> [2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log 
> app-topic-20170317-20. (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for 
> app-topic-20170317-20... (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log 
> app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). 
> (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log 
> app-topic-20170317-20 complete. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 
> (cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior 
> to Thu Mar 23 10:18:02 GMT 2017)... (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log 
> app-topic-20170317-20 (largest timestamp Fri Mar 24 09:58:25 GMT 2017) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,372] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> java.nio.BufferOverflowException
> at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
> at org.apache.kafka.common.record.LogEntry.writeTo(LogEntry.java:98)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:158)
> at 
> org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:111)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:468)
> at kafka.log.Cleaner.$anonfun$cleanSegments$1(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner.$anonfun$cleanSegments$1$adapted(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner.$anonfun$clean$6(LogCleaner.scala:363)
> at kafka.log.Cleaner.$anonfun$clean$6$adapted(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:378)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-03-24 10:41:07,375] INFO [kafka-log-cleaner-thread-0], Stopped  
> (kafka.log.LogCleaner)
> {code}
> I tried different values of log.cleaner.buffer.size, from 512K to 2M to 10M 
> to 128M, all with no luck: The log cleaner thread crashed immediately after 
> the broker got restarted. But setting it to 256MB fixed the problem!
> Here are the settings for the cluster:
> {code}
> - log.message.format.version = 0.9.0.0 (we use 0.9 format because have old 
> consumers)
> - log.cleaner.enable = 'true'
> - log.cleaner.min.cleanable.ratio = '0.1'
> - log.cleaner.threads = '1'
> - log.cleaner.io.buffer.load.factor = '0.98'
> - log.roll.hours = '24'
> - log.cleaner.dedupe.buffer.size = 2GB 
> - log.segment.bytes = 256MB (global is 512MB, but we have been using 256MB 
> for this topic)
> - message.max.bytes = 10MB
> {code}
> Given that the size of readBuffer and writeBuffer are exactly the same (half 
> of log.cleaner.io.buffer.size), why would the cleaner throw a 
> BufferOverflowException when writing the filtered records into the 
> writeBuffer? IIUC that should never happen because the size of the filtered 
> records should be no greater than the size of the readBuffer, thus no greater 
> than the size of the writeBuffer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5068) Optionally print out metrics after running the perf tests

2017-04-16 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-5068:
---

Assignee: huxi

> Optionally print out metrics after running the perf tests
> -
>
> Key: KAFKA-5068
> URL: https://issues.apache.org/jira/browse/KAFKA-5068
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>Assignee: huxi
>  Labels: newbie
>
> Often, we run ProducerPerformance/ConsumerPerformance tests to investigate 
> performance issues. It's useful for the tool to print out the metrics in the 
> producer/consumer at the end of the tests. We can make this optional to 
> preserve the current behavior by default.
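A minimal sketch of the requested option, assuming a plain Java producer handle;
Metric.value() is the accessor available in the 0.10.x client:

{code}
import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class PerfMetricsDump {
    // Dump every producer metric once the perf run has finished, before close().
    static void printMetrics(KafkaProducer<byte[], byte[]> producer) {
        Map<MetricName, ? extends Metric> metrics = producer.metrics();
        for (Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
            MetricName name = entry.getKey();
            System.out.printf("%s : %s = %f%n", name.group(), name.name(), entry.getValue().value());
        }
    }
}
{code}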



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5068) Optionally print out metrics after running the perf tests

2017-04-14 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968805#comment-15968805
 ] 

huxi commented on KAFKA-5068:
-

It seems ProducerPerformance does print the final result for the test. 

> Optionally print out metrics after running the perf tests
> -
>
> Key: KAFKA-5068
> URL: https://issues.apache.org/jira/browse/KAFKA-5068
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>  Labels: newbie
>
> Often, we run ProducerPerformance/ConsumerPerformance tests to investigate 
> performance issues. It's useful for the tool to print out the metrics in the 
> producer/consumer at the end of the tests. We can make this optional to 
> preserve the current behavior by default.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5062) Kafka brokers can accept malformed requests which allocate gigabytes of memory

2017-04-12 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967026#comment-15967026
 ] 

huxi commented on KAFKA-5062:
-

I am curious how this problem can be reproduced.

> Kafka brokers can accept malformed requests which allocate gigabytes of memory
> --
>
> Key: KAFKA-5062
> URL: https://issues.apache.org/jira/browse/KAFKA-5062
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>
> In some circumstances, it is possible to cause a Kafka broker to allocate 
> massive amounts of memory by writing malformed bytes to the broker's port. 
> In investigating an issue, we saw byte arrays on the Kafka heap of up to 1.8 
> gigabytes, the first 360 bytes of which were non-Kafka requests -- an 
> application was writing the wrong data to Kafka, causing the broker to 
> interpret the request size as 1.8GB and then allocate that amount. Apart from 
> the first 360 bytes, the rest of the 1.8GB byte array was null. 
> We have socket.request.max.bytes set at 100MB to protect against this kind 
> of thing, but somehow that limit is not always respected. We need to 
> investigate why and fix it.
> cc [~rnpridgeon], [~ijuma], [~gwenshap], [~cmccabe]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5048) kafka brokers hang when one of broker node killed in the cluster and remaining broker nodes are not consuming the data.

2017-04-12 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965622#comment-15965622
 ] 

huxi commented on KAFKA-5048:
-

These warnings should be transient as long as the replication factor is set 
appropriately. What is the replication factor for the topic?
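
For reference, one hedged way to check the replication factor from a client, 
assuming an already-configured KafkaConsumer named `consumer`; the topic name is 
a placeholder:
{code}
for (org.apache.kafka.common.PartitionInfo p : consumer.partitionsFor("my-topic")) {
    System.out.printf("partition %d: leader=%s, replicas=%d, in-sync=%d%n",
            p.partition(), p.leader(), p.replicas().length, p.inSyncReplicas().length);
}
{code}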

> kafka brokers hang when one of broker node killed in the cluster and 
> remaining broker nodes are not consuming the data.
> ---
>
> Key: KAFKA-5048
> URL: https://issues.apache.org/jira/browse/KAFKA-5048
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, offset manager
>Affects Versions: 0.10.1.1
>Reporter: Srinivas Yarra
>
> Kafka brokers hang when one of broker node killed in cluster and remaining 
> broker nodes are not consuming the data. Below message is getting in the 
> console consumer screen:
> WARN Auto offset commit failed for group console-consumer-26249: Offset 
> commit failed with a retriable exception. You should retry committing 
> offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> if bring it back stopped service and then all brokers are starting consuming 
> data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4987) Topic creation allows invalid config values on running brokers

2017-03-30 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950237#comment-15950237
 ] 

huxi commented on KAFKA-4987:
-

Did you use a newer client version talking to an older broker?

> Topic creation allows invalid config values on running brokers
> --
>
> Key: KAFKA-4987
> URL: https://issues.apache.org/jira/browse/KAFKA-4987
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1, 0.10.1.0
>Reporter: dan norwood
>
> we use kip4 capabilities to make a `CreateTopicsRequest` for our topics. one 
> of the configs we use is `cleanup.policy=compact, delete`. this was 
> inadvertently run against a cluster that does not support that policy. the 
> result was that the topic was created, however on subsequent broker bounce 
> the broker fails to start up
> {code}
> [2017-03-23 00:00:44,837] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> org.apache.kafka.common.config.ConfigException: Invalid value compact,delete 
> for configuration cleanup.policy: String must be one of: compact, delete
>   at 
> org.apache.kafka.common.config.ConfigDef$ValidString.ensureValid(ConfigDef.java:827)
>   at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:427)
>   at 
> org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:55)
>   at kafka.log.LogConfig.<init>(LogConfig.scala:56)
>   at kafka.log.LogConfig$.fromProps(LogConfig.scala:192)
>   at kafka.server.KafkaServer$$anonfun$3.apply(KafkaServer.scala:598)
>   at kafka.server.KafkaServer$$anonfun$3.apply(KafkaServer.scala:597)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:597)
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:183)
>   at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
>   at kafka.Kafka$.main(Kafka.scala:67)
>   at kafka.Kafka.main(Kafka.scala)
> [2017-03-23 00:00:44,839] INFO shutting down (kafka.server.KafkaServer)
> [2017-03-23 00:00:44,844] INFO shut down completed (kafka.server.KafkaServer)
> {code}
> i believe that the broker should fail when given an invalid config during 
> topic creation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4952) ZkUtils get topics by consumer group not correct

2017-03-27 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944500#comment-15944500
 ] 

huxi commented on KAFKA-4952:
-

[~zxylvlp] Every subscribed topic's name is created as a child znode under 
`owners` while the consumer group is running, so I think it is feasible to 
retrieve all topics by fetching the child nodes under that directory.
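
As a sketch of that idea (the connect string and group name are placeholders, 
and a production client would wait for the connected event before issuing 
requests):
{code}
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class OwnedTopics {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        // Each child of /consumers/<group>/owners is the name of a subscribed topic.
        List<String> topics = zk.getChildren("/consumers/my-group/owners", false);
        System.out.println("topics for group my-group: " + topics);
        zk.close();
    }
}
{code}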


> ZkUtils get topics by consumer group not correct
> 
>
> Key: KAFKA-4952
> URL: https://issues.apache.org/jira/browse/KAFKA-4952
> Project: Kafka
>  Issue Type: Bug
>  Components: zkclient
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 
> 0.10.1.1, 0.10.2.0
> Environment: all
>Reporter: ZhaoXingyu
>  Labels: easyfix
> Attachments: WechatIMG1.jpeg
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> ZkUtils.getTopicsByConsumerGroup should get topics by consumergroup offsets 
> dir, but the code get topics by owners now. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-4949) Calling kaka-consumer-group.sh to get the consumer offset throws OOM with heap space error

2017-03-26 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942584#comment-15942584
 ] 

huxi edited comment on KAFKA-4949 at 3/27/17 3:30 AM:
--

Is it a duplicate of 
[kafka-4090|https://issues.apache.org/jira/browse/KAFKA-4090]?


was (Author: huxi_2b):
Is is a duplicate of 
[kafka-4090|https://issues.apache.org/jira/browse/KAFKA-4090]?

> Calling kaka-consumer-group.sh to get the consumer offset throws OOM with 
> heap space error
> --
>
> Key: KAFKA-4949
> URL: https://issues.apache.org/jira/browse/KAFKA-4949
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.10.1.0
>Reporter: T Rao
>
> Command --> 
> bin/kafka-consumer-groups.sh --bootstrap-server 
> broker1:9092,broker2:9092,broker3:9092 --describe --group testgroups
> Error 
> -
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
>   at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:291)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:232)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:180)
>   at 
> kafka.admin.AdminClient.kafka$admin$AdminClient$$send(AdminClient.scala:49)
>   at 
> kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:61)
>   at 
> kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:58)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at kafka.admin.AdminClient.sendAnyNode(AdminClient.scala:58)
>   at kafka.admin.AdminClient.findCoordinator(AdminClient.scala:72)
>   at kafka.admin.AdminClient.describeGroup(AdminClient.scala:125)
>   at kafka.admin.AdminClient.describeConsumerGroup(AdminClient.scala:147)
>   at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:308)
>   at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describe(ConsumerGroupCommand.scala:89)
>   at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describe(ConsumerGroupCommand.scala:296)
>   at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:68)
>   at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4949) Calling kaka-consumer-group.sh to get the consumer offset throws OOM with heap space error

2017-03-26 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942584#comment-15942584
 ] 

huxi commented on KAFKA-4949:
-

Is is a duplicate of 
[kafka-4090|https://issues.apache.org/jira/browse/KAFKA-4090]?

> Calling kaka-consumer-group.sh to get the consumer offset throws OOM with 
> heap space error
> --
>
> Key: KAFKA-4949
> URL: https://issues.apache.org/jira/browse/KAFKA-4949
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.10.1.0
>Reporter: T Rao
>
> Command --> 
> bin/kafka-consumer-groups.sh --bootstrap-server 
> broker1:9092,broker2:9092,broker3:9092 --describe --group testgroups
> Error 
> -
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>   at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>   at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
>   at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:291)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:232)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:180)
>   at 
> kafka.admin.AdminClient.kafka$admin$AdminClient$$send(AdminClient.scala:49)
>   at 
> kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:61)
>   at 
> kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:58)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at kafka.admin.AdminClient.sendAnyNode(AdminClient.scala:58)
>   at kafka.admin.AdminClient.findCoordinator(AdminClient.scala:72)
>   at kafka.admin.AdminClient.describeGroup(AdminClient.scala:125)
>   at kafka.admin.AdminClient.describeConsumerGroup(AdminClient.scala:147)
>   at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describeGroup(ConsumerGroupCommand.scala:308)
>   at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService$class.describe(ConsumerGroupCommand.scala:89)
>   at 
> kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.describe(ConsumerGroupCommand.scala:296)
>   at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:68)
>   at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4908) consumer.properties logging warnings

2017-03-19 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932099#comment-15932099
 ] 

huxi commented on KAFKA-4908:
-

`zookeeper.connection.timeout.ms` and `zookeeper.connect` are old-consumer 
configs that should not be specified when running the console consumer with the 
new consumer (which is also the default). If you do, Kafka warns that it does 
not recognize them as valid configs, which is expected behavior.
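
In other words, only the new-consumer configs take effect; a minimal sketch of 
a configuration that avoids the warnings (values are placeholders):
{code}
import java.util.Properties;

Properties props = new Properties();
// Recognized by the new consumer:
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Old-consumer settings such as zookeeper.connect and zookeeper.connection.timeout.ms
// are unknown to ConsumerConfig and only produce the WARN lines quoted below.
{code}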

> consumer.properties logging warnings
> 
>
> Key: KAFKA-4908
> URL: https://issues.apache.org/jira/browse/KAFKA-4908
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
>Reporter: Seweryn Habdank-Wojewodzki
>Priority: Minor
>
> default consumer.properties at startaup of the console consumer delivered 
> with Kafka package are logging warnings:
> [2017-03-15 16:36:57,439] WARN The configuration 
> 'zookeeper.connection.timeout.ms' was supplied but isn't a known config. 
> (org.apache.kafka.clients.consumer.ConsumerConfig)
> [2017-03-15 16:36:57,455] WARN The configuration 'zookeeper.connect' was 
> supplied but isn't a known config. 
> (org.apache.kafka.clients.consumer.ConsumerConfig)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4879) KafkaConsumer.position may hang forever when deleting a topic

2017-03-09 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904322#comment-15904322
 ] 

huxi commented on KAFKA-4879:
-

We could have KafkaConsumer#updateFetchPositions, Fetcher#updateFetchPositions, 
and Fetcher#resetOffset accept a timeout. An alternative would be to return 
immediately once an `UnknownTopicOrPartitionException` is explicitly caught.
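
Until then, a hedged caller-side workaround for the reproducer quoted below 
(reusing its `kc` and `partitions` variables) is to skip `position()` for 
topics that no longer exist; this narrows, but does not fully close, the race 
window:
{code}
java.util.Set<String> existingTopics = kc.listTopics().keySet();
for (org.apache.kafka.common.TopicPartition p : partitions) {
    if (existingTopics.contains(p.topic())) {
        System.out.println(p + " offset: " + kc.position(p));
    } else {
        System.out.println(p + " skipped: topic no longer exists");
    }
}
{code}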

> KafkaConsumer.position may hang forever when deleting a topic
> -
>
> Key: KAFKA-4879
> URL: https://issues.apache.org/jira/browse/KAFKA-4879
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.10.2.0
>Reporter: Shixiong Zhu
>
> KafkaConsumer.position may hang forever when deleting a topic. The problem is 
> this line 
> https://github.com/apache/kafka/blob/022bf129518e33e165f9ceefc4ab9e622952d3bd/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L374
> The timeout is "Long.MAX_VALUE", and it will just retry forever for 
> UnknownTopicOrPartitionException.
> Here is a reproducer
> {code}
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.TopicPartition;
> import org.apache.kafka.common.serialization.StringDeserializer;
> import java.util.Collections;
> import java.util.Properties;
> import java.util.Set;
> public class KafkaReproducer {
>   public static void main(String[] args) {
> // Make sure "delete.topic.enable" is set to true.
> // Please create the topic test with "3" partitions manually.
> // The issue is gone when there is only one partition.
> String topic = "test";
> Properties props = new Properties();
> props.put("bootstrap.servers", "localhost:9092");
> props.put("group.id", "testgroup");
> props.put("value.deserializer", StringDeserializer.class.getName());
> props.put("key.deserializer", StringDeserializer.class.getName());
> props.put("enable.auto.commit", "false");
> KafkaConsumer<String, String> kc = new KafkaConsumer<>(props);
> kc.subscribe(Collections.singletonList(topic));
> kc.poll(0);
> Set<TopicPartition> partitions = kc.assignment();
> System.out.println("partitions: " + partitions);
> kc.pause(partitions);
> kc.seekToEnd(partitions);
> System.out.println("please delete the topic in 30 seconds");
> try {
>   // Sleep 30 seconds to give us enough time to delete the topic.
>   Thread.sleep(30000);
> } catch (InterruptedException e) {
>   e.printStackTrace();
> }
> System.out.println("sleep end");
> for (TopicPartition p : partitions) {
>   System.out.println(p + " offset: " + kc.position(p));
> }
> System.out.println("cannot reach here");
> kc.close();
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4866) Kafka console consumer property is ignored

2017-03-08 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-4866:
---

Assignee: huxi

> Kafka console consumer property is ignored
> --
>
> Key: KAFKA-4866
> URL: https://issues.apache.org/jira/browse/KAFKA-4866
> Project: Kafka
>  Issue Type: Bug
>  Components: core, tools
>Affects Versions: 0.10.2.0
> Environment: Java 8, Mac
>Reporter: Frank Lyaruu
>Assignee: huxi
>Priority: Trivial
>
> I'd like to read a topic using the console consumer, which prints the keys 
> but not the values:
> kafka-console-consumer --bootstrap-server someserver:9092 --from-beginning 
> --property print.key=true --property print.value=false --topic some_topic
> the print.value property seems to be completely ignored (I seems missing in 
> the source), but it is mentioned in the quickstart.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong

2017-03-06 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898790#comment-15898790
 ] 

huxi commented on KAFKA-4834:
-

Did you delete the ZooKeeper nodes manually before issuing this topic deletion?

> Kafka cannot delete topic with ReplicaStateMachine went wrong
> -
>
> Key: KAFKA-4834
> URL: https://issues.apache.org/jira/browse/KAFKA-4834
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Dan
>  Labels: reliability
>
> It happened several times that some topics can not be deleted in our 
> production environment. By analyzing the log, we found ReplicaStateMachine 
> went wrong. Here are the error messages:
> In state-change.log:
> ERROR Controller 2 epoch 201 initiated state change of replica 1 for 
> partition [test_create_topic1,1] from OnlineReplica to ReplicaDeletionStarted 
> failed (state.change.logger)
> java.lang.AssertionError: assertion failed: Replica 
> [Topic=test_create_topic1,Partition=1,Replica=1] should be in the 
> OfflineReplica states before moving to ReplicaDeletionStarted state. Instead 
> it is in OnlineReplica state
> at scala.Predef$.assert(Predef.scala:179)
> at 
> kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:309)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:190)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:344)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:334)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:334)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:367)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:313)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:312)
> at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:312)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:431)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:403)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:403)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:397)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> In controller.log:
> INFO Leader not yet assigned for partition [test_create_topic1,1]. Skip 
> sending UpdateMetadataRequest. (kafka.controller.ControllerBrokerRequestBatch)
> There may exist two controllers in the cluster because creating a new topic 
> may trigger two machines to change the state of same partition,

[jira] [Commented] (KAFKA-4844) kafka is holding open file descriptors

2017-03-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896815#comment-15896815
 ] 

huxi commented on KAFKA-4844:
-

What is your OS version? This seems to be a known behavior before NFS 4.

> kafka is holding open file descriptors
> --
>
> Key: KAFKA-4844
> URL: https://issues.apache.org/jira/browse/KAFKA-4844
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: chao
>Priority: Critical
>
> We found strange issue on Kafka 0.9.0.1 , kafka is holding opne file 
> descriptors , and not allowing disk space to be reclaimed
> my question:
> 1. what does file (nfsX) mean ??? 
> 2. why kafka is holding file ?? 
> $ sudo lsof /nas/kafka_logs/kafka/Order-6/.nfs04550ffcbd61
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 97465 kafka mem REG 0,25 10485760 72683516 
> /nas/kafka_logs/kafka/Order-6/.nfs04550ffcbd61



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong

2017-03-02 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893619#comment-15893619
 ] 

huxi commented on KAFKA-4834:
-

How did you delete the topic? It seems the replicas are still in the 
OnlineReplica state, which is a little odd, since the delete thread should 
first move them to OfflineReplica.

> Kafka cannot delete topic with ReplicaStateMachine went wrong
> -
>
> Key: KAFKA-4834
> URL: https://issues.apache.org/jira/browse/KAFKA-4834
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Dan
>
> It happened several times that some topics can not be deleted in our 
> production environment. By analyzing the log, we found ReplicaStateMachine 
> went wrong. Here are the error messages:
> In state-change.log:
> ERROR Controller 2 epoch 201 initiated state change of replica 1 for 
> partition [test_create_topic1,1] from OnlineReplica to ReplicaDeletionStarted 
> failed (state.change.logger)
> java.lang.AssertionError: assertion failed: Replica 
> [Topic=test_create_topic1,Partition=1,Replica=1] should be in the 
> OfflineReplica states before moving to ReplicaDeletionStarted state. Instead 
> it is in OnlineReplica state
> at scala.Predef$.assert(Predef.scala:179)
> at 
> kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:309)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:190)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:344)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:334)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:334)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:367)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:313)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:312)
> at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:312)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:431)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:403)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:403)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:397)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> In controller.log:
> INFO Leader not yet assigned for partition [test_create_topic1,1]. Skip 
> sending UpdateMetadataRequest. (kafka.controller.ControllerBrokerRequestBatch)
> There may exist two controllers in the cluster because creating a new topic 
> may trigger two machines to change 

[jira] [Commented] (KAFKA-4822) Kafka producer implementation without additional threads, similar to sync producer of 0.8.

2017-03-01 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889871#comment-15889871
 ] 

huxi commented on KAFKA-4822:
-

For KafkaProducer, all writes are asynchronous by default. The snippet below 
implements a synchronous write:
{code}
Future<RecordMetadata> future = producer.send(record);
RecordMetadata metadata = future.get();
{code}
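
A slightly fuller sketch with imports and error handling; the bootstrap 
address, topic, and serializers are placeholders:
{code}
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    try {
        // get() blocks until the broker acknowledges the write, i.e. a synchronous send.
        RecordMetadata metadata = producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
        System.out.printf("wrote to %s-%d at offset %d%n", metadata.topic(), metadata.partition(), metadata.offset());
    } catch (InterruptedException | ExecutionException e) {
        // ExecutionException wraps the broker-side error; its cause carries the real reason.
        e.printStackTrace();
    }
}
{code}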



> Kafka producer implementation without additional threads, similar to sync 
> producer of 0.8.
> --
>
> Key: KAFKA-4822
> URL: https://issues.apache.org/jira/browse/KAFKA-4822
> Project: Kafka
>  Issue Type: New Feature
>  Components: producer 
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>Reporter: Giri
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric

2017-02-27 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-4811:
---

Assignee: huxi

> ReplicaFetchThread may fail to create due to existing metric
> 
>
> Key: KAFKA-4811
> URL: https://issues.apache.org/jira/browse/KAFKA-4811
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>Assignee: huxi
>  Labels: newbie
>
> Someone reported the following error.
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=connection-close-rate, group=replica-fetcher-metrics, 
> description=Connections closed per second in the window., tags={broker-id=1, 
> fetcher-id=0}]' already exists, can't register another one.
> at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234)
> at 
> org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:680)
> at org.apache.kafka.common.network.Selector.<init>(Selector.java:140)
> at 
> kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:86)
> at 
> kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
> at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882)
> at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700)
> at 
> kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric

2017-02-27 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887308#comment-15887308
 ] 

huxi commented on KAFKA-4811:
-

[~junrao] what about the port change? Do we also need to consider this 
situation?

> ReplicaFetchThread may fail to create due to existing metric
> 
>
> Key: KAFKA-4811
> URL: https://issues.apache.org/jira/browse/KAFKA-4811
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>  Labels: newbie
>
> Someone reported the following error.
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=connection-close-rate, group=replica-fetcher-metrics, 
> description=Connections closed per second in the window., tags={broker-id=1, 
> fetcher-id=0}]' already exists, can't register another one.
> at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234)
> at 
> org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:680)
> at org.apache.kafka.common.network.Selector.<init>(Selector.java:140)
> at 
> kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:86)
> at 
> kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
> at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882)
> at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700)
> at 
> kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large

2017-02-21 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877641#comment-15877641
 ] 

huxi edited comment on KAFKA-4762 at 2/22/17 7:16 AM:
--

[~huadongliu] The reason you saw `NoCompressionCodec` in the output is the 
`deep-iteration` option, which forces a deep iteration of the compressed 
messages. 

However, it's easy to see that many messages share the same position, which 
means the producer has compression enabled even if it is not explicitly set on 
the broker side. 


was (Author: huxi_2b):
[~huadongliu] The reason you saw `NoCompressionCodec` in the output is because 
of the effect of the config `deep-iteration` which enforces a deep interation 
of the compressed messages. 

However, it's easy to find out that many messages share a same position which 
means the producer enables the compression if it's not explicitly set on broker 
side. 

> Consumer throwing RecordTooLargeException even when messages are not that 
> large
> ---
>
> Key: KAFKA-4762
> URL: https://issues.apache.org/jira/browse/KAFKA-4762
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Vipul Singh
>
> We were just recently hit by a weird error. 
> Before going in any further, explaining of our service setup. we have a 
> producer which produces messages not larger than 256 kb of messages( we have 
> an explicit check about this on the producer side) and on the client side we 
> have a fetch limit of 512kb(max.partition.fetch.bytes is set to 524288 bytes) 
> Recently our client started to see this error:
> {quote}
> org.apache.kafka.common.errors.RecordTooLargeException: There are some 
> messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is 
> larger than the fetch size 524288 and hence cannot be ever returned. Increase 
> the fetch size, or decrease the maximum message size the broker will allow.
> {quote}
> We tried consuming messages with another consumer, without any 
> max.partition.fetch.bytes limit, and it consumed fine. The messages were 
> small, and did not seem to be greater than 256 kb
> We took a log dump, and the log size looked fine.
> {quote}
> mpresscodec: NoCompressionCodec crc: 2473548911 keysize: 8
> offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8
> offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8
> offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8
> offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8
> offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8
> offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8
> offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: 
> 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8
> offset: 9925056039 position: 191713371 isvalid: true payloadsize: 1571 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8
> {quote}
> Has anyone seen something similar? or any points to troubleshoot this further
> Please Note: To overcome this issue, we deployed a new consumer, without this 
> limit of max.partition.fetch.bytes, and it worked fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large

2017-02-21 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877641#comment-15877641
 ] 

huxi commented on KAFKA-4762:
-

[~huadongliu] The reason you saw `NoCompressionCodec` in the output is because 
of the effect of the config `deep-iteration` which enforces a deep interation 
of the compressed messages. 

However, it's easy to find out that many messages share a same position which 
means the producer enables the compression if it's not explicitly set on broker 
side. 

> Consumer throwing RecordTooLargeException even when messages are not that 
> large
> ---
>
> Key: KAFKA-4762
> URL: https://issues.apache.org/jira/browse/KAFKA-4762
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Vipul Singh
>
> We were just recently hit by a weird error. 
> Before going in any further, explaining of our service setup. we have a 
> producer which produces messages not larger than 256 kb of messages( we have 
> an explicit check about this on the producer side) and on the client side we 
> have a fetch limit of 512kb(max.partition.fetch.bytes is set to 524288 bytes) 
> Recently our client started to see this error:
> {quote}
> org.apache.kafka.common.errors.RecordTooLargeException: There are some 
> messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is 
> larger than the fetch size 524288 and hence cannot be ever returned. Increase 
> the fetch size, or decrease the maximum message size the broker will allow.
> {quote}
> We tried consuming messages with another consumer, without any 
> max.partition.fetch.bytes limit, and it consumed fine. The messages were 
> small, and did not seem to be greater than 256 kb
> We took a log dump, and the log size looked fine.
> {quote}
> mpresscodec: NoCompressionCodec crc: 2473548911 keysize: 8
> offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8
> offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8
> offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8
> offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8
> offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8
> offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8
> offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: 
> 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8
> offset: 9925056039 position: 191713371 isvalid: true payloadsize: 1571 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8
> {quote}
> Has anyone seen something similar? or any points to troubleshoot this further
> Please Note: To overcome this issue, we deployed a new consumer, without this 
> limit of max.partition.fetch.bytes, and it worked fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-19 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-4767:
---

Assignee: huxi

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Assignee: huxi
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the io thread running. The correct way of handling this is a follows:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> // propagate the interrupt
> this.ioThread.interrupt();
> try { 
>  this.ioThread.join();
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> } finally {
> // make sure we maintain the interrupted status
> Thread.currentThread().interrupt();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4780) ReplicaFetcherThread.fetch could not get any reponse

2017-02-19 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873942#comment-15873942
 ] 

huxi commented on KAFKA-4780:
-

Seems it is a duplicate of 
[KAFKA-4477|https://issues.apache.org/jira/browse/KAFKA-4477?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22was%20disconnected%20before%20the%20response%20was%20read%22]?

> ReplicaFetcherThread.fetch could not get any reponse
> 
>
> Key: KAFKA-4780
> URL: https://issues.apache.org/jira/browse/KAFKA-4780
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.1.0
>Reporter: mashudong
> Attachments: capture.png, log.png, partition.png
>
>
> All partitions with broker 3 as leader has just broker 3 in its isr
>  !partition.png!
> Many IOException in server.log on broker 1 and 2
> !log.png!
> According to network packet capture, ReplicaFetcherThread of broker 1 could 
> not get any response from broker 3, and after 30 seconds the connection was 
> closed by broker 1.
> !capture.png!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-16 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870055#comment-15870055
 ] 

huxi commented on KAFKA-4767:
-

What do you mean by "leaking the IO thread"? Do you mean it could not be shut 
down successfully after interrupting the user thread in which 
KafkaProducer.close was invoked? That should not happen, since 
this.sender.initiateClose() would always be run even when you interrupt the 
user thread. 

In my opinion, interrupting the user thread is no different from invoking 
ioThread.join with a relatively small timeout, because there is still a chance 
to force-close the IO thread and wait for it again. That's also why we swallow 
the InterruptedException during the first join. 

Does that sound reasonable to you? And out of curiosity, did you encounter any 
case where the IO thread failed to shut down?

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the io thread running. The correct way of handling this is a follows:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> // propagate the interrupt
> this.ioThread.interrupt();
> try { 
>  this.ioThread.join();
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> } finally {
> // make sure we maintain the interrupted status
> Thread.currentThread().interrupt();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869382#comment-15869382
 ] 

huxi commented on KAFKA-4767:
-

I think I get your point now. What really concerns you is that the IO thread 
might fail to shut down, or be left in an inconsistent state, if the user 
thread is interrupted. Am I right?

1. `this.sender.initiateClose()` is not interruptible, which means the user 
thread is always able to initiate a close of the IO thread even after we 
interrupt the user thread somewhere.
2. I do agree we should restore the interruption status of the user thread 
after catching the InterruptedException, which is a really good practice.
3. The current logic already considers the situation where the user thread does 
not wait long enough for the IO thread to finish its work, so it adds 
forceClose and corresponding code to force-close the IO thread. In that case, 
we don't have to explicitly do the same thing again in the catch clause as you 
suggest.

Does that make sense?

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the io thread running. The correct way of handling this is a follows:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> // propagate the interrupt
> this.ioThread.interrupt();
> try { 
>  this.ioThread.join();
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> } finally {
> // make sure we maintain the interrupted status
> Thread.currentThread().interrupt();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868943#comment-15868943
 ] 

huxi commented on KAFKA-4767:
-

It seems that KafkaProducer already initiates a close of the IO thread by 
setting running to false, so it is not reasonable to issue ioThread.interrupt() 
directly.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the io thread running. The correct way of handling this is a follows:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> // propagate the interrupt
> this.ioThread.interrupt();
> try { 
>  // join again (if you want to be more accurate, you can re-adjust 
> the timeout)
>  this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> } finally {
> // make sure we maintain the interrupted status
> Thread.currentThread().interrupt();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large

2017-02-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864954#comment-15864954
 ] 

huxi edited comment on KAFKA-4762 at 2/15/17 9:17 AM:
--

Logs show that you are using 0.10.x (or before), where max.partition.fetch.bytes 
is a hard limit even when compression is enabled. In your case, it seems that 
you have enabled compression on the producer side. 
`max.partition.fetch.bytes` also applies to the whole compressed message set, 
which is often much larger than a single message. That's why you run into 
RecordTooLargeException.

0.10.1, which completes 
[KIP-74|https://cwiki.apache.org/confluence/display/KAFKA/KIP-74:+Add+Fetch+Response+Size+Limit+in+Bytes],
 already 'fixes' your problem by making the `max.partition.fetch.bytes` field in 
the fetch request much less restrictive, so you can try with a 0.10.1 build.



was (Author: huxi_2b):
Logs show that you are using 0.10.x where max.partition.fetch.bytes is a hard 
limit even when you enable the compression. In your case, seems that you have 
enabled the compression on the producer side. `max.partition.fetch.bytes` also 
applies to the whole compressed message which is often much larger than a 
single one. That's why you run into RecordTooLargeException.

0.10.1 which completes 
[KIP-74|https://cwiki.apache.org/confluence/display/KAFKA/KIP-74:+Add+Fetch+Response+Size+Limit+in+Bytes]
 already 'fixes' your problem by making  `max.partition.fetch.bytes` field in 
the fetch request much less useful, so you can try with an 0.10.1 build.
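
For reference, a hedged sketch of the consumer settings involved (values are 
placeholders, not recommendations):
{code}
import java.util.Properties;

Properties props = new Properties();
// On 0.9/0.10.0 this is a hard per-partition limit, so it must exceed the largest
// (possibly compressed) message set the broker can return:
props.put("max.partition.fetch.bytes", "1048576");
// From 0.10.1 (KIP-74) the overall response size is bounded instead, and the broker
// returns at least one message even if it is larger than the per-partition limit:
props.put("fetch.max.bytes", "52428800");
{code}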


> Consumer throwing RecordTooLargeException even when messages are not that 
> large
> ---
>
> Key: KAFKA-4762
> URL: https://issues.apache.org/jira/browse/KAFKA-4762
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Vipul Singh
>
> We were just recently hit by a weird error. 
> Before going in any further, explaining of our service setup. we have a 
> producer which produces messages not larger than 256 kb of messages( we have 
> an explicit check about this on the producer side) and on the client side we 
> have a fetch limit of 512kb(max.partition.fetch.bytes is set to 524288 bytes) 
> Recently our client started to see this error:
> {quote}
> org.apache.kafka.common.errors.RecordTooLargeException: There are some 
> messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is 
> larger than the fetch size 524288 and hence cannot be ever returned. Increase 
> the fetch size, or decrease the maximum message size the broker will allow.
> {quote}
> We tried consuming messages with another consumer, without any 
> max.partition.fetch.bytes limit, and it consumed fine. The messages were 
> small, and did not seem to be greater than 256 kb
> We took a log dump, and the log size looked fine.
> {quote}
> mpresscodec: NoCompressionCodec crc: 2473548911 keysize: 8
> offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8
> offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8
> offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8
> offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8
> offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8
> offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8
> offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: 
> 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8
> offset: 9925056039 position: 191713371 isvalid: true payloadsize: 1571 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8
> {quote}
> Has anyone seen something similar? or any points to troubleshoot this further
> Please Note: To overcome this issue, we deployed a new consumer, without this 
> limit of max.partition.fetch.bytes, and it worked fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large

2017-02-13 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864954#comment-15864954
 ] 

huxi commented on KAFKA-4762:
-

Logs show that you are using 0.10.x where max.partition.fetch.bytes is a hard 
limit even when you enable the compression. In your case, seems that you have 
enabled the compression on the producer side. `max.partition.fetch.bytes` also 
applies to the whole compressed message which is often much larger than a 
single one. That's why you run into RecordTooLargeException.

0.10.1 which completes 
[KIP-74|https://cwiki.apache.org/confluence/display/KAFKA/KIP-74:+Add+Fetch+Response+Size+Limit+in+Bytes]
 already 'fixes' your problem by making  `max.partition.fetch.bytes` field in 
the fetch request much less useful, so you can try with an 0.10.1 build.


> Consumer throwing RecordTooLargeException even when messages are not that 
> large
> ---
>
> Key: KAFKA-4762
> URL: https://issues.apache.org/jira/browse/KAFKA-4762
> Project: Kafka
>  Issue Type: Bug
>Reporter: Vipul Singh
>
> We were just recently hit by a weird error. 
> Before going in any further, explaining of our service setup. we have a 
> producer which produces messages not larger than 256 kb of messages( we have 
> an explicit check about this on the producer side) and on the client side we 
> have a fetch limit of 512kb(max.partition.fetch.bytes is set to 524288 bytes) 
> Recently our client started to see this error:
> {quote}
> org.apache.kafka.common.errors.RecordTooLargeException: There are some 
> messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is 
> larger than the fetch size 524288 and hence cannot be ever returned. Increase 
> the fetch size, or decrease the maximum message size the broker will allow.
> {quote}
> We tried consuming messages with another consumer, without any 
> max.partition.fetch.bytes limit, and it consumed fine. The messages were 
> small, and did not seem to be greater than 256 kb
> We took a log dump, and the log size looked fine.
> {quote}
> mpresscodec: NoCompressionCodec crc: 2473548911 keysize: 8
> offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8
> offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8
> offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8
> offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8
> offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8
> offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8
> offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: 
> 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8
> offset: 9925056039 position: 191713371 isvalid: true payloadsize: 1571 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8
> {quote}
> Has anyone seen something similar? or any points to troubleshoot this further
> Please Note: To overcome this issue, we deployed a new consumer, without this 
> limit of max.partition.fetch.bytes, and it worked fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4739) KafkaConsumer poll going into an infinite loop

2017-02-06 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855232#comment-15855232
 ] 

huxi commented on KAFKA-4739:
-

Seems that FETCH requests time out every 40 seconds, and 40 seconds is the 
default value of the `request.timeout.ms` config for the new consumer. Could you 
make sure your settings actually take effect? And why do you set such a large 
timeout value? Poor network conditions?
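
As a sanity check, a minimal sketch of setting this timeout explicitly on the consumer; the override value, addresses and class name are just examples:

{code}
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hypothetical sketch to confirm a custom request.timeout.ms is applied;
// 40000 ms is the default for the new consumer in this version.
public class RequestTimeoutExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("request.timeout.ms", "60000");  // example override
        new KafkaConsumer<String, String>(props).close();
    }
}
{code}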

> KafkaConsumer poll going into an infinite loop
> --
>
> Key: KAFKA-4739
> URL: https://issues.apache.org/jira/browse/KAFKA-4739
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Vipul Singh
>
> We are seeing an issue with our kafka consumer where it seems to go into an 
> infinite loop while polling, trying to fetch data from kafka. We are seeing 
> the heartbeat requests on the broker from the consumer, but nothing else from 
> the kafka consumer.
> We enabled debug level logging on the consumer, and see these logs: 
> https://gist.github.com/neoeahit/757bff7acdea62656f065f4dcb8974b4
> And this just goes on. The way we have been able to replicate this issue, is 
> by restarting the process in multiple successions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4727) A Production server configuration needs to be updated

2017-02-02 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-4727:
---

Assignee: huxi

> A Production server configuration needs to be updated
> -
>
> Key: KAFKA-4727
> URL: https://issues.apache.org/jira/browse/KAFKA-4727
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Assignee: huxi
>  Labels: newbie
>
> In docs/ops.html, we have a section on "A Production server configuration" 
> with queued.max.requests=16. This is often too low. We should change it to 
> the default value, which is 500. It will also be useful to see if other 
> configurations need to be changed accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4685) All partitions offline, no controller znode in ZK

2017-01-23 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834169#comment-15834169
 ] 

huxi commented on KAFKA-4685:
-

Server logs showed that all the brokers came back and registered in ZK again; 
see "Creating /brokers/ids/0,1,2" in the logs. So did you mean that /controller and 
/brokers/ids are still empty even after broker 2 was elected as the new 
controller? 

> All partitions offline, no controller znode in ZK
> 
>
> Key: KAFKA-4685
> URL: https://issues.apache.org/jira/browse/KAFKA-4685
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sinóros-Szabó Péter
> Attachments: kafka-0-logs.zip, kafka-1-logs.zip, kafka-2-logs.zip, 
> zookeeper-logs.zip
>
>
> Setup: 3 Kafka 0.11.1.1 nodes on kubernetes (in AWS), and another 3 nodes of 
> Zookeeper 3.5.2-alpha also in kubernetes (in AWS).
> At 2017-01-23 06:51 ZK sessions expired. It seems from the logs that kafka-2 
> was elected as the new controller, but I am not sure how to read that logs.
> I've checked the ZK data and both the /controller is empty and also the 
> /brokers/ids is empty. Kafka reports that all partitions are offline, 
> although it seems to be working because messages are coming and going.
> We are using an alpha version, I know that it may be a problem, but I suppose 
> that Kafka should see that there is not any node registered as controller.
> I have attached the Kafka and ZK logs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4684) Kafka does not offer kafka-configs.bat on Windows box

2017-01-23 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi updated KAFKA-4684:

Labels: newbie  (was: )

> Kafka does not offer kafka-configs.bat on Windows box
> -
>
> Key: KAFKA-4684
> URL: https://issues.apache.org/jira/browse/KAFKA-4684
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.10.1.1
>Reporter: huxi
>Assignee: huxi
>Priority: Minor
>  Labels: newbie
>
> Kafka does not ship with kafka-configs.bat, so it's a little inconvenient to 
> add/modify configs on Windows platform



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-01-23 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834053#comment-15834053
 ] 

huxi edited comment on KAFKA-4669 at 1/23/17 8:25 AM:
--

Maybe we could invoke the alternative method `ProduceRequestResult#await(long, 
TimeUnit)` and specify a reasonable timeout value (e.g. 30 seconds, or 
request.timeout.ms) in RecordAccumulator#awaitFlushCompletion. If await 
times out, we record a warning to indicate to users that some request did not get 
marked as complete, and flush will no longer be stuck, as shown below:

{code:borderStyle=solid}
public void awaitFlushCompletion() throws InterruptedException {
    try {
        for (RecordBatch batch : this.incomplete.all()) {
            if (!batch.produceFuture.await(30, TimeUnit.SECONDS)) {
                log.warn("Did not complete the produce request for {} in 30 seconds.",
                         batch.produceFuture.topicPartition());
            }
        }
    } finally {
        this.flushesInProgress.decrementAndGet();
    }
}
{code}

[~becket_qin] Does it make sense?


was (Author: huxi_2b):
Maybe we could invoke the alternative method `ProduceRequestResult#await(long, 
TimeUnit)` and specify a reasonable timeout value in 
RecordAccumulator#awaitFlushCompletion. If await times out, we record a warning 
to indicate to users that some request did not get marked as complete, and 
flush will no longer be stuck, as shown below:

{code:borderStyle=solid}
public void awaitFlushCompletion() throws InterruptedException {
    try {
        for (RecordBatch batch : this.incomplete.all()) {
            if (!batch.produceFuture.await(30, TimeUnit.SECONDS)) {
                log.warn("Did not complete the produce request for {} in 30 seconds.",
                         batch.produceFuture.topicPartition());
            }
        }
    } finally {
        this.flushesInProgress.decrementAndGet();
    }
}
{code}

[~becket_qin] Does it make sense?

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked 0.10 code and think this bug does exist in 0.10 versions.
> A real case.  First a correlateId not match exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-01-23 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834053#comment-15834053
 ] 

huxi commented on KAFKA-4669:
-

Maybe we could invoke the alternative method `ProduceRequestResult#await(long, 
TimeUnit)` and specify a reasonable timeout value in 
RecordAccumulator#awaitFlushCompletion. If await times out, we record a warning 
to indicate to users that some request did not get marked as complete, and 
flush will no longer be stuck, as shown below:

{code:borderStyle=solid}
public void awaitFlushCompletion() throws InterruptedException {
    try {
        for (RecordBatch batch : this.incomplete.all()) {
            if (!batch.produceFuture.await(30, TimeUnit.SECONDS)) {
                log.warn("Did not complete the produce request for {} in 30 seconds.",
                         batch.produceFuture.topicPartition());
            }
        }
    } finally {
        this.flushesInProgress.decrementAndGet();
    }
}
{code}

[~becket_qin] Does it make sense?

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked 0.10 code and think this bug does exist in 0.10 versions.
> A real case.  First a correlateId not match exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:477)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:440)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
>   at java.lang.Thread.run(Thread.java:745)
> Then jstack shows the thread is hanging on:
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:57)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.awaitFlushCompletion(RecordAccumulator.java:425)
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.flush(KafkaProducer.java:544)
>   at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:224)
>   at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>   at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>   at java.lang.Thread.run(Thread.java:745)
> client code 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4675) Subsequent CreateTopic command could be lost after a DeleteTopic command

2017-01-21 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1586#comment-1586
 ] 

huxi edited comment on KAFKA-4675 at 1/22/17 7:26 AM:
--

Failed to reproduce this issue with the snippet below:

{code}
val zkUtils = ZkUtils("localhost:2181", 3, 3, false)
AdminUtils.deleteTopic(zkUtils, "old-topic")
AdminUtils.createTopic(zkUtils, "new-topic", 1, 1)
{code}

After running the code above, topic "new-topic" is created, so it seems the 
subsequent CreateTopic command still takes effect. [~guozhang] Does the 
code reflect the way you ran into this problem?


was (Author: huxi_2b):
Failed to reproduce this issue with the snippet below:

{code}
val zkUtils = ZkUtils("localhost:2181", 3, 3, false)
AdminUtils.deleteTopic(zkUtils, "old-topic")
AdminUtils.createTopic(zkUtils, "new-topic", 1, 1)
{code}

After running the code above, topic "new-topic" is created, so it seems the 
subsequent CreateTopic command still takes effect. Does the code reflect 
the way you ran into this problem?

> Subsequent CreateTopic command could be lost after a DeleteTopic command
> 
>
> Key: KAFKA-4675
> URL: https://issues.apache.org/jira/browse/KAFKA-4675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>  Labels: admin
>
> This is discovered while investigating KAFKA-3896: If an admin client sends a 
> delete topic command and a create topic command consecutively, even if it 
> wait for the response of the previous command before issuing the second, 
> there is still a race condition that the create topic command could be "lost".
> This is because currently these commands are all asynchronous as defined in 
> KIP-4, and controller will return the response once it has written the 
> corresponding data to ZK path, which can be handled by different listener 
> threads at different paces, and if the thread handling create is faster than 
> the other, the executions could be effectively re-ordered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4675) Subsequent CreateTopic command could be lost after a DeleteTopic command

2017-01-21 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1586#comment-1586
 ] 

huxi commented on KAFKA-4675:
-

Failed to reproduce this issue with the snippet below:

{code}
val zkUtils = ZkUtils("localhost:2181", 3, 3, false)
AdminUtils.deleteTopic(zkUtils, "old-topic")
AdminUtils.createTopic(zkUtils, "new-topic", 1, 1)
{code}

After running the code above, topic "new-topic" is created, so it seems the 
subsequent CreateTopic command still takes effect. Does the code reflect 
the way you ran into this problem?

> Subsequent CreateTopic command could be lost after a DeleteTopic command
> 
>
> Key: KAFKA-4675
> URL: https://issues.apache.org/jira/browse/KAFKA-4675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>  Labels: admin
>
> This is discovered while investigating KAFKA-3896: If an admin client sends a 
> delete topic command and a create topic command consecutively, even if it 
> wait for the response of the previous command before issuing the second, 
> there is still a race condition that the create topic command could be "lost".
> This is because currently these commands are all asynchronous as defined in 
> KIP-4, and controller will return the response once it has written the 
> corresponding data to ZK path, which can be handled by different listener 
> threads at different paces, and if the thread handling create is faster than 
> the other, the executions could be effectively re-ordered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4684) Kafka does not offer kafka-configs.bat on Windows box

2017-01-21 Thread huxi (JIRA)
huxi created KAFKA-4684:
---

 Summary: Kafka does not offer kafka-configs.bat on Windows box
 Key: KAFKA-4684
 URL: https://issues.apache.org/jira/browse/KAFKA-4684
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.10.1.1
Reporter: huxi
Assignee: huxi
Priority: Minor


Kafka does not ship with kafka-configs.bat, so it's a little inconvenient to 
add/modify configs on the Windows platform.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4680) min.insync.replicas can be set higher than replication factor

2017-01-20 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832729#comment-15832729
 ] 

huxi commented on KAFKA-4680:
-

If you send records with acks set to all to a topic whose 'min.insync.replicas' 
is larger than its replication factor, the client callback returns 
`NOT_ENOUGH_REPLICAS`, which tells you to adjust the topic-level 
min.insync.replicas or check broker availability, doesn't it?
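
For illustration, a rough sketch of what this looks like on the producer side, assuming a topic that is already misconfigured as described; the topic name, addresses and class name are made up:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;

// Hypothetical sketch: with acks=all, sending to a topic whose
// min.insync.replicas exceeds its replication factor fails the callback
// with NotEnoughReplicasException (NOT_ENOUGH_REPLICAS on the wire).
public class MinIsrExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("misconfigured-topic", "key", "value"),
                (metadata, exception) -> {
                    if (exception instanceof NotEnoughReplicasException) {
                        // Surface the misconfiguration instead of silently retrying.
                        System.err.println("min.insync.replicas > replication factor: " + exception);
                    }
                });
        }
    }
}
{code}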

I do agree that it's definitely better if we add some check before 
creating/altering a topic. What I am concerned about is completeness. If a check is 
introduced when creating/altering topics, do we also need to add 
checks before changing the replication factor, especially when reducing it 
(we could call kafka-reassign-partitions.sh to do this, although it's sort of 
inconvenient to use), to ensure the reduced number is still no smaller than 
min.insync.replicas? If this is the case, it seems we have to figure out all the 
affected code paths.

You of course could assign this jira to yourself and get rolling :-)





> min.insync.replicas can be set higher than replication factor
> -
>
> Key: KAFKA-4680
> URL: https://issues.apache.org/jira/browse/KAFKA-4680
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: James Cheng
>
> It is possible to specify a min.insync.replicas for a topic that is higher 
> than the replication factor of the topic. If you do this, you will not be 
> able to produce to the topic with acks=all.
> Furthermore, each produce request (including retries) to the topic will emit 
> an ERROR level message to the broker debuglogs. If this is not noticed 
> quickly enough, it can cause the debuglogs to balloon.
> We actually hosed one of our Kafka clusters because of this. A topic got 
> configured with min.insync.replicas > replication factor. It had partitions 
> on all brokers of our cluster. The broker logs ballooned and filled up the 
> disks. We run these clusters on CoreOS, and CoreOS's etcd database got 
> corrupted. (Kafka didn't get corrupted, tho).
> I think Kafka should do validation when someone tries to change a topic to 
> min.insync.replicas > replication factor, and reject the change.
> This would presumably affect kafka-topics.sh, kafka-configs.sh, as well as 
> the CreateTopics operation that came in KIP-4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4675) Subsequent CreateTopic command could be lost after a DeleteTopic command

2017-01-20 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831945#comment-15831945
 ] 

huxi commented on KAFKA-4675:
-

Seems that deleting '/brokers/topics/' comes fourth from the bottom in 
the topic deletion sequence, while creating it comes first in the createTopic 
logic. Did you encounter a TopicExistsException that failed the creation?

> Subsequent CreateTopic command could be lost after a DeleteTopic command
> 
>
> Key: KAFKA-4675
> URL: https://issues.apache.org/jira/browse/KAFKA-4675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>  Labels: admin
>
> This is discovered while investigating KAFKA-3896: If an admin client sends a 
> delete topic command and a create topic command consecutively, even if it 
> wait for the response of the previous command before issuing the second, 
> there is still a race condition that the create topic command could be "lost".
> This is because currently these commands are all asynchronous as defined in 
> KIP-4, and controller will return the response once it has written the 
> corresponding data to ZK path, which can be handled by different listener 
> threads at different paces, and if the thread handling create is faster than 
> the other, the executions could be effectively re-ordered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4592) Kafka Producer Metrics Invalid Value

2017-01-19 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi updated KAFKA-4592:

Reviewer: Ismael Juma

> Kafka Producer Metrics Invalid Value
> 
>
> Key: KAFKA-4592
> URL: https://issues.apache.org/jira/browse/KAFKA-4592
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
>Reporter: AJ Jwair
>Assignee: huxi
>
> Producer metrics
> Metric name: record-size-max
> When no records are produced during the monitoring window, the 
> record-size-max has an invalid value of -9.223372036854776E16
> Please notice that the value is not a very small number close to zero bytes, 
> it is negative 90 quadrillion bytes
> The same behavior was observed in: records-lag-max



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4295) kafka-console-consumer.sh does not delete the temporary group in zookeeper

2017-01-19 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi updated KAFKA-4295:

Reviewer: Ismael Juma

> kafka-console-consumer.sh does not delete the temporary group in zookeeper
> --
>
> Key: KAFKA-4295
> URL: https://issues.apache.org/jira/browse/KAFKA-4295
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Sswater Shi
>Assignee: huxi
>Priority: Minor
>
> I'm not sure it is a bug or you guys designed it.
> Since 0.9.x.x, the kafka-console-consumer.sh will not delete the group 
> information in zookeeper/consumers on exit when without "--new-consumer". 
> There will be a lot of abandoned zookeeper/consumers/console-consumer-xxx if 
> kafka-console-consumer.sh runs a lot of times.
> When 0.8.x.x,  the kafka-console-consumer.sh can be followed by an argument 
> "group". If not specified, the kafka-console-consumer.sh will create a 
> temporary group name like 'console-consumer-'. If the group name is 
> specified by "group", the information in the zookeeper/consumers will be kept 
> on exit. If the group name is a temporary one, the information in the 
> zookeeper will be deleted when kafka-console-consumer.sh is quitted by 
> Ctrl+C. Why this is changed from 0.9.x.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4674) Frequent ISR shrinking and expanding and disconnects among brokers

2017-01-19 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829525#comment-15829525
 ] 

huxi commented on KAFKA-4674:
-

Is it a duplicate of 
[KAFKA-3916|https://issues.apache.org/jira/browse/KAFKA-3916]?

> Frequent ISR shrinking and expanding and disconnects among brokers
> --
>
> Key: KAFKA-4674
> URL: https://issues.apache.org/jira/browse/KAFKA-4674
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core
>Affects Versions: 0.10.0.1
> Environment: OS: Redhat Linux 2.6.32-431.el6.x86_64
> JDK: 1.8.0_45
>Reporter: Kaiming Wan
> Attachments: controller.log.rar, server.log.2017-01-11-14, 
> zookeeper.out.2017-01-11.log
>
>
> We use a kafka cluster with 3 brokers in production environment. It works 
> well for several month. Recently, we get the UnderReplicatedPartitions>0 
> warning mail. When we check the log, we find that the partition is always 
> experience ISR shrinking and expanding. And the disconnection exception can 
> be found in controller's log.
> We also found some deviant output in zookeeper's log which point to a 
> consumer(using old API depends on zookeeper ) which has stopped its work with 
> many lags.
> Actually, it is not the first time we encounter this problem. When we 
> first met this problem, we also found the same phenomenon and the log output. 
> We solve the problem by deleting the consumer node info in zookeeper. Then 
> everything goes well.
> However, this time, after we deleting the consumer which already have 
> large lag, the frequent ISR shrinking and expanding didn't stop for a very 
> long time(serveral hours). Though, the issue didn't affect our consumer and 
> producer, we think it will make our cluster unstable. So at last, we solve 
> this problem by restart the controller broker.
> And now I wander what cause this problem. I check the source code and 
> only know poll timeout will cause disconnection and ISR shrinking. Is the 
> issue related to zookeeper because it will not hold too many metadata 
> modification and make the replication fetch thread take more time?
> I upload the log file in the attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4629) Per topic MBeans leak

2017-01-16 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825545#comment-15825545
 ] 

huxi edited comment on KAFKA-4629 at 1/17/17 6:48 AM:
--

Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some 
debugging, I found that BrokerTopicMetrics.close was always invoked even 
when the corresponding MBeans did not get removed. I am not sure whether it is 
caused by Kafka code. Might need to investigate whether it's a known issue in the 
metrics-core library.


was (Author: huxi_2b):
Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some 
debugging, I found that BrokerTopicMetrics.close could always be invoked even 
when the corresponding MBeans did not get removed. Seems that it's not caused 
by Kafka code. Might need to investigate whether it's a known issue for 
metrics-core library.

> Per topic MBeans leak
> -
>
> Key: KAFKA-4629
> URL: https://issues.apache.org/jira/browse/KAFKA-4629
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: Alberto Forti
>Priority: Minor
>
> Hi,
> In our application we create and delete topics dynamically. Most of the times 
> when a topic is deleted the related MBeans are not deleted. Example of MBean: 
> kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d
> Also, deleting a topic often produces (what I think is) noise in the logs at 
> WARN level. One example is:
> WARN  PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener 
> on 1]: Ignoring request to delete non-existing topics 
> dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2
> Easy reproducible with a basic Kafka cluster with two brokers, just create 
> and delete topics few times. Sometimes the MBeans for the topic are deleted 
> and sometimes are not.
> I'm creating and deleting topics using the AdminUtils class in the Java API:
> AdminUtils.deleteTopic(zkUtils, topicName);
> AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, 
> topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$);
> Kafka version: 0.10.0.1 (haven't tried other versions)
> Thanks,
> Alberto



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4629) Per topic MBeans leak

2017-01-16 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825545#comment-15825545
 ] 

huxi edited comment on KAFKA-4629 at 1/17/17 6:41 AM:
--

Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some 
debugging, I found that BrokerTopicMetrics.close could always be invoked even 
when the corresponding MBeans did not get removed. Seems that it's not caused 
by Kafka code. Might need to investigate whether it's a known issue for 
metrics-core library.


was (Author: huxi_2b):
Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some 
debugging, I found that BrokerTopicMetrics.close was always invoked even 
when the corresponding MBeans did not get removed. Seems that it's not caused by 
Kafka code. Might need to investigate whether it's a known issue in the 
metrics-core library.

> Per topic MBeans leak
> -
>
> Key: KAFKA-4629
> URL: https://issues.apache.org/jira/browse/KAFKA-4629
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: Alberto Forti
>Priority: Minor
>
> Hi,
> In our application we create and delete topics dynamically. Most of the times 
> when a topic is deleted the related MBeans are not deleted. Example of MBean: 
> kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d
> Also, deleting a topic often produces (what I think is) noise in the logs at 
> WARN level. One example is:
> WARN  PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener 
> on 1]: Ignoring request to delete non-existing topics 
> dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2
> Easy reproducible with a basic Kafka cluster with two brokers, just create 
> and delete topics few times. Sometimes the MBeans for the topic are deleted 
> and sometimes are not.
> I'm creating and deleting topics using the AdminUtils class in the Java API:
> AdminUtils.deleteTopic(zkUtils, topicName);
> AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, 
> topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$);
> Kafka version: 0.10.0.1 (haven't tried other versions)
> Thanks,
> Alberto



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4629) Per topic MBeans leak

2017-01-16 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825545#comment-15825545
 ] 

huxi commented on KAFKA-4629:
-

Thanks for [~alberto.fo...@natwestmarkets.com]'s response. After some 
debugging, I found that BrokerTopicMetrics.close was always invoked even 
when the corresponding MBeans did not get removed. Seems that it's not caused by 
Kafka code. Might need to investigate whether it's a known issue in the 
metrics-core library.

> Per topic MBeans leak
> -
>
> Key: KAFKA-4629
> URL: https://issues.apache.org/jira/browse/KAFKA-4629
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: Alberto Forti
>Priority: Minor
>
> Hi,
> In our application we create and delete topics dynamically. Most of the times 
> when a topic is deleted the related MBeans are not deleted. Example of MBean: 
> kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d
> Also, deleting a topic often produces (what I think is) noise in the logs at 
> WARN level. One example is:
> WARN  PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener 
> on 1]: Ignoring request to delete non-existing topics 
> dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2
> Easy reproducible with a basic Kafka cluster with two brokers, just create 
> and delete topics few times. Sometimes the MBeans for the topic are deleted 
> and sometimes are not.
> I'm creating and deleting topics using the AdminUtils class in the Java API:
> AdminUtils.deleteTopic(zkUtils, topicName);
> AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, 
> topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$);
> Kafka version: 0.10.0.1 (haven't tried other versions)
> Thanks,
> Alberto



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4629) Per topic MBeans leak

2017-01-16 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823329#comment-15823329
 ] 

huxi edited comment on KAFKA-4629 at 1/16/17 9:36 AM:
--

[~alberto.fo...@natwestmarkets.com] Yes, having multiple brokers DOES help reproduce 
the issue, and some interesting things were found. If the delete-topic command 
was issued right after creating the topic, some MBeans at the topic level 
indeed failed to get removed. But if some delay was put between these 
two commands, then all the topic-level MBeans were removed as expected. In 
the former case, the controller is still doing many background tasks to 
complete the topic creation even though the CREATE command returns successfully. 
So could you wait some time before issuing the delete command and see whether 
you still run into this issue? 
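
To make the timing concrete, a rough repro sketch with an artificial delay between the two commands; the delay length, timeouts, topic name and ZK address are arbitrary:

{code}
import java.util.Properties;
import kafka.admin.AdminUtils;
import kafka.utils.ZkUtils;

// Hypothetical sketch: create a topic, give the controller time to finish its
// background work, then delete it. All values here are examples only.
public class CreateThenDeleteExample {
    public static void main(String[] args) throws InterruptedException {
        ZkUtils zkUtils = ZkUtils.apply("localhost:2181", 30000, 30000, false);
        try {
            AdminUtils.createTopic(zkUtils, "mbean-test", 1, 2, new Properties(),
                    kafka.admin.RackAwareMode.Enforced$.MODULE$);
            Thread.sleep(10000);  // wait for the controller to complete topic creation
            AdminUtils.deleteTopic(zkUtils, "mbean-test");
        } finally {
            zkUtils.close();
        }
    }
}
{code}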

By the way, I am using 0.10.1.0 although I think it does not matter.


was (Author: huxi_2b):
[~alberto.fo...@natwestmarkets.com] Yes, having multiple brokers DOES help reproduce 
the issue, and some interesting things were found. If the delete-topic command 
was issued right after creating the topic, some MBeans at the topic level 
indeed failed to get removed. But if some delay was put between these 
two commands, then all the topic-level MBeans were removed as expected. In 
the former case, the controller is still doing many background tasks to 
complete the topic creation even though the CREATE command returns successfully. 
So could you wait some time before issuing the delete command and see whether 
you still run into this issue? 

> Per topic MBeans leak
> -
>
> Key: KAFKA-4629
> URL: https://issues.apache.org/jira/browse/KAFKA-4629
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: Alberto Forti
>Priority: Minor
>
> Hi,
> In our application we create and delete topics dynamically. Most of the times 
> when a topic is deleted the related MBeans are not deleted. Example of MBean: 
> kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d
> Also, deleting a topic often produces (what I think is) noise in the logs at 
> WARN level. One example is:
> WARN  PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener 
> on 1]: Ignoring request to delete non-existing topics 
> dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2
> Easy reproducible with a basic Kafka cluster with two brokers, just create 
> and delete topics few times. Sometimes the MBeans for the topic are deleted 
> and sometimes are not.
> I'm creating and deleting topics using the AdminUtils class in the Java API:
> AdminUtils.deleteTopic(zkUtils, topicName);
> AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, 
> topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$);
> Kafka version: 0.10.0.1 (haven't tried other versions)
> Thanks,
> Alberto



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers

2017-01-16 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823594#comment-15823594
 ] 

huxi commented on KAFKA-4595:
-

[~pengwei] I don't think it's doable, since it would leave inconsistent state 
for the involved partitions and thus pollute the controller's cache. In the 
current design, topic deletion is totally asynchronous: users nearly always 
see the topic marked as deleted successfully even though there are several 
steps the controller still needs to finish in the background. If we time out 
deleteTopicStopReplicaCallback, completeReplicaDeletion will not be invoked.

Does it make sense?

> Controller send thread can't stop when broker change listener event trigger 
> for  dead brokers
> -
>
> Key: KAFKA-4595
> URL: https://issues.apache.org/jira/browse/KAFKA-4595
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.1
>Reporter: Pengwei
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> In our test env, we found controller is not working after a delete topic 
> opertation and network issue, the stack is below:
> "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" 
> #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on 
> condition [0x7fb76b7c8000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc05497b8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50)
>   at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32)
>   at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128)
>   at 
> kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81)
>   - locked <0xc0258760> (a java.lang.Object)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>   at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>Locked ownable synchronizers:
>   - <

[jira] [Commented] (KAFKA-4557) ConcurrentModificationException in KafkaProducer event loop

2017-01-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823399#comment-15823399
 ] 

huxi commented on KAFKA-4557:
-

[~scf37] Did you create your own Sender instances in the producer code?

> ConcurrentModificationException in KafkaProducer event loop
> ---
>
> Key: KAFKA-4557
> URL: https://issues.apache.org/jira/browse/KAFKA-4557
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: Sergey Alaev
>Priority: Critical
>  Labels: reliability
> Fix For: 0.10.2.0
>
>
> Under heavy load, Kafka producer can stop publishing events. Logs below.
> [2016-12-19T15:01:28.779Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [NetworkClient] [] [] [] [DEBUG]: Disconnecting from node 2 due to 
> request timeout.
> [2016-12-19T15:01:28.793Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> [2016-12-19T15:01:28.838Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
>   org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received. (#2 from 2016-12-19T15:01:28.793Z)
> 
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#285 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [SgsService] [] [] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka 
> deadletter queue
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#286 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [Sender] [] [] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer 
> I/O thread:
> java.util.ConcurrentModificationException: null
> at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) 
> ~[na:1.8.0_45]
> at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
>  ~[kafka-clients-0.10.1.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) 
> ~[kafka-clients-0.10.1.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
> ~[kafka-clients-0.10.1.0.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> [2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [NetworkClient] [] [] [1B2M2Y8Asg] [WARN]: Error while fetching 
> metadata with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4577) NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers

2017-01-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823347#comment-15823347
 ] 

huxi commented on KAFKA-4577:
-

I suspect the TopicDeletionManager instance in KafkaController is null, although 
the controller should have created an instance of it during 
initialization. Could you check the logs to see if the controller has failed over 
to another broker?

> NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers 
> --
>
> Key: KAFKA-4577
> URL: https://issues.apache.org/jira/browse/KAFKA-4577
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Scott Reynolds
>
> Seems as if either deleteTopicManager or deleteTopicManager. 
> partitionsToBeDeleted wasn't set ?
> {code}
> java.lang.NullPointerException
> at 
> kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers
>  (ControllerChannelManager.scala:331)
> at kafka.controller.KafkaController.sendUpdateMetadataRequest 
> (KafkaController.scala:1023)
> at 
> kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications
>  (KafkaController.scala:1371)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp
>  (KafkaController.scala:1358)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply
>  (KafkaController.scala:1351)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply
>  (KafkaController.scala:1351)
> at kafka.utils.CoreUtils$.inLock (CoreUtils.scala:234)
> at kafka.controller.IsrChangeNotificationListener.handleChildChange 
> (KafkaController.scala:1351)
> at org.I0Itec.zkclient.ZkClient$10.run (ZkClient.java:843)
> at org.I0Itec.zkclient.ZkEventThread.run (ZkEventThread.java:71)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4629) Per topic MBeans leak

2017-01-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823329#comment-15823329
 ] 

huxi commented on KAFKA-4629:
-

[~alberto.fo...@natwestmarkets.com] Yes, having multiple brokers DOES help reproduce 
the issue, and some interesting things were found. If the delete-topic command 
was issued right after creating the topic, some MBeans at the topic level 
indeed failed to get removed. But if some delay was put between these 
two commands, then all the topic-level MBeans were removed as expected. In 
the former case, the controller is still doing many background tasks to 
complete the topic creation even though the CREATE command returns successfully. 
So could you wait some time before issuing the delete command and see whether 
you still run into this issue? 

> Per topic MBeans leak
> -
>
> Key: KAFKA-4629
> URL: https://issues.apache.org/jira/browse/KAFKA-4629
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: Alberto Forti
>Priority: Minor
>
> Hi,
> In our application we create and delete topics dynamically. Most of the times 
> when a topic is deleted the related MBeans are not deleted. Example of MBean: 
> kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d
> Also, deleting a topic often produces (what I think is) noise in the logs at 
> WARN level. One example is:
> WARN  PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener 
> on 1]: Ignoring request to delete non-existing topics 
> dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2
> Easy reproducible with a basic Kafka cluster with two brokers, just create 
> and delete topics few times. Sometimes the MBeans for the topic are deleted 
> and sometimes are not.
> I'm creating and deleting topics using the AdminUtils class in the Java API:
> AdminUtils.deleteTopic(zkUtils, topicName);
> AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, 
> topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$);
> Kafka version: 0.10.0.1 (haven't tried other versions)
> Thanks,
> Alberto



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4629) Per topic MBeans leak

2017-01-14 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823029#comment-15823029
 ] 

huxi commented on KAFKA-4629:
-

I tried in my Kafka 0.10.0.1 single-broker test environment but had no luck 
reproducing the issue. The partition count and replication factor were both set 
to 1, though I am not sure whether that setting matters.

As for the WARN logs, it seems you are requesting deletion of a non-existent 
topic. Could you confirm that the topic exists before invoking 
AdminUtils.deleteTopic?

> Per topic MBeans leak
> -
>
> Key: KAFKA-4629
> URL: https://issues.apache.org/jira/browse/KAFKA-4629
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: Alberto Forti
>Priority: Minor
>
> Hi,
> In our application we create and delete topics dynamically. Most of the times 
> when a topic is deleted the related MBeans are not deleted. Example of MBean: 
> kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d
> Also, deleting a topic often produces (what I think is) noise in the logs at 
> WARN level. One example is:
> WARN  PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener 
> on 1]: Ignoring request to delete non-existing topics 
> dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2
> Easy reproducible with a basic Kafka cluster with two brokers, just create 
> and delete topics few times. Sometimes the MBeans for the topic are deleted 
> and sometimes are not.
> I'm creating and deleting topics using the AdminUtils class in the Java API:
> AdminUtils.deleteTopic(zkUtils, topicName);
> AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, 
> topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$);
> Kafka version: 0.10.0.1 (haven't tried other versions)
> Thanks,
> Alberto



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4616) Message loss is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle

2017-01-13 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821473#comment-15821473
 ] 

huxi commented on KAFKA-4616:
-

They are not missing; they were just not delivered to Kafka successfully.
bq. The guarantee that Kafka offers is that a committed message will not be 
lost, as long as there is at least one in sync replica alive, at all times.
To avoid this "data loss", try appending "retries=" 
to the command, although you might then see some duplicated messages. 
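
Roughly what the equivalent settings look like in producer code; the retry count is only an example since the value is left open above, and the addresses and class name are made up:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

// Hypothetical producer settings mirroring the advice above: acks=all (-1)
// plus retries, at the cost of possibly duplicated messages.
public class RetriesExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");         // same as acks=-1
        props.put("retries", "1000000");  // example value; pick something large
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        Producer<byte[], byte[]> producer = new KafkaProducer<>(props);
        producer.close();
    }
}
{code}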

> Message loss is seen when kafka-producer-perf-test.sh is running and any 
> broker restarted in middle
> ---
>
> Key: KAFKA-4616
> URL: https://issues.apache.org/jira/browse/KAFKA-4616
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: Apache mesos
>Reporter: sandeep kumar singh
>
> if any broker is restarted while kafka-producer-perf-test.sh command is 
> running, we see message loss.
> commands i run:
> **perf command:
> $ bin/kafka-producer-perf-test.sh --num-records 10 --record-size 4096  
> --throughput 1000 --topic test3R3P3 --producer-props 
> bootstrap.servers=x.x.x.x:,x.x.x.x:,x.x.x.x:
> I am  sending 10 messages of each having size 4096
> error thrown by perf command:
> 4944 records sent, 988.6 records/sec (3.86 MB/sec), 31.5 ms avg latency, 
> 433.0 max latency.
> 5061 records sent, 1012.0 records/sec (3.95 MB/sec), 67.7 ms avg latency, 
> 798.0 max latency.
> 5001 records sent, 1000.0 records/sec (3.91 MB/sec), 49.0 ms avg latency, 
> 503.0 max latency.
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 37.3 ms avg latency, 
> 594.0 max latency.
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 32.6 ms avg latency, 
> 501.0 max latency.
> 5000 records sent, 999.8 records/sec (3.91 MB/sec), 49.4 ms avg latency, 
> 516.0 max latency.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> truncated
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 33.9 ms avg latency, 
> 497.0 max latency.
> 4928 records sent, 985.6 records/sec (3.85 MB/sec), 42.1 ms avg latency, 
> 521.0 max latency.
> 5073 records sent, 1014.4 records/sec (3.96 MB/sec), 39.4 ms avg latency, 
> 418.0 max latency.
> 10 records sent, 999.950002 records/sec (3.91 MB/sec), 37.65 ms avg 
> latency, 798.00 ms max latency, 1 ms 50th, 260 ms 95th, 411 ms 99th, 571 ms 
> 99.9th.
> **consumer command:
> $ bin/kafka-console-consumer.sh --zookeeper 
> x.x.x.x:2181/dcos-service-kafka-framework --topic  test3R3P3  
> 1>~/kafka_output.log
> message stored:
> $ wc -l ~/kafka_output.log
> 99932 /home/montana/kafka_output.log
> I found only 99932 message are stored and 68 messages are lost.
> **topic describe command:
>  $ bin/kafka-topics.sh  --zookeeper x.x.x.x:2181/dcos-service-kafka-framework 
> --describe |grep test3R3
> Topic:test3R3P3 PartitionCount:3ReplicationFactor:3 Configs:
> Topic: test3R3P3Partition: 0Leader: 2   Replicas: 
> 1,2,0 Isr: 2,0,1
> Topic: test3R3P3Partition: 1Leader: 2   Replicas: 
> 2,0,1 Isr: 2,0,1
> Topic: test3R3P3Partition: 2Leader: 0   Replicas: 
> 0,1,2 Isr: 2,0,1
> **consumer group command:
> $  bin/kafka-consumer-groups.sh --zookeeper 
> x.x.x.x:2181/dcos-service-kafka-framework --describe --group 
> console-consumer-9926
> GROUP  TOPIC  PARTITION  
> CURRENT-OFFSET  LOG-END-OFFSET  LAG OWNER
> console-consumer-9926  test3R3P3  0  
> 33265   33265   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> console-consumer-9926  test3R3P3  1  
> 4   4   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> console-consumer-9926  test3R3P3  2  
> 3   3   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> could you please help me understand what this error means "err - 
> org.apache.kafka.common.errors.NetworkException: The se

[jira] [Commented] (KAFKA-4616) Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between

2017-01-11 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820267#comment-15820267
 ] 

huxi commented on KAFKA-4616:
-

Append the acks option to the command like this: --producer-props 
bootstrap.servers=x.x.x.x:,x.x.x.x:,x.x.x.x: acks=-1

> Message log is seen when kafka-producer-perf-test.sh is running and any 
> broker restarted in middle in-between 
> --
>
> Key: KAFKA-4616
> URL: https://issues.apache.org/jira/browse/KAFKA-4616
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: Apache mesos
>Reporter: sandeep kumar singh
>
> if any broker is restarted while kafka-producer-perf-test.sh command is 
> running, we see message loss.
> commands i run:
> **perf command:
> $ bin/kafka-producer-perf-test.sh --num-records 10 --record-size 4096  
> --throughput 1000 --topic test3R3P3 --producer-props 
> bootstrap.servers=x.x.x.x:,x.x.x.x:,x.x.x.x:
> I am  sending 10 messages of each having size 4096
> error thrown by perf command:
> 4944 records sent, 988.6 records/sec (3.86 MB/sec), 31.5 ms avg latency, 
> 433.0 max latency.
> 5061 records sent, 1012.0 records/sec (3.95 MB/sec), 67.7 ms avg latency, 
> 798.0 max latency.
> 5001 records sent, 1000.0 records/sec (3.91 MB/sec), 49.0 ms avg latency, 
> 503.0 max latency.
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 37.3 ms avg latency, 
> 594.0 max latency.
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 32.6 ms avg latency, 
> 501.0 max latency.
> 5000 records sent, 999.8 records/sec (3.91 MB/sec), 49.4 ms avg latency, 
> 516.0 max latency.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> truncated
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 33.9 ms avg latency, 
> 497.0 max latency.
> 4928 records sent, 985.6 records/sec (3.85 MB/sec), 42.1 ms avg latency, 
> 521.0 max latency.
> 5073 records sent, 1014.4 records/sec (3.96 MB/sec), 39.4 ms avg latency, 
> 418.0 max latency.
> 10 records sent, 999.950002 records/sec (3.91 MB/sec), 37.65 ms avg 
> latency, 798.00 ms max latency, 1 ms 50th, 260 ms 95th, 411 ms 99th, 571 ms 
> 99.9th.
> **consumer command:
> $ bin/kafka-console-consumer.sh --zookeeper 
> x.x.x.x:2181/dcos-service-kafka-framework --topic  test3R3P3  
> 1>~/kafka_output.log
> message stored:
> $ wc -l ~/kafka_output.log
> 99932 /home/montana/kafka_output.log
> I found only 99932 message are stored and 68 messages are lost.
> **topic describe command:
>  $ bin/kafka-topics.sh  --zookeeper x.x.x.x:2181/dcos-service-kafka-framework 
> --describe |grep test3R3
> Topic:test3R3P3 PartitionCount:3ReplicationFactor:3 Configs:
> Topic: test3R3P3Partition: 0Leader: 2   Replicas: 
> 1,2,0 Isr: 2,0,1
> Topic: test3R3P3Partition: 1Leader: 2   Replicas: 
> 2,0,1 Isr: 2,0,1
> Topic: test3R3P3Partition: 2Leader: 0   Replicas: 
> 0,1,2 Isr: 2,0,1
> **consumer group command:
> $  bin/kafka-consumer-groups.sh --zookeeper 
> x.x.x.x:2181/dcos-service-kafka-framework --describe --group 
> console-consumer-9926
> GROUP  TOPIC  PARTITION  
> CURRENT-OFFSET  LOG-END-OFFSET  LAG OWNER
> console-consumer-9926  test3R3P3  0  
> 33265   33265   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> console-consumer-9926  test3R3P3  1  
> 4   4   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> console-consumer-9926  test3R3P3  2  
> 3   3   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> could you please help me understand what this error means "err - 
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received."?
> Could you please provide suggestion to fix this issue?
> we are seeing this behavior every-time we perform above test-scenario.
> my understanding is, there should not any data loss till n-1 broker is alive. 
> is message loss is an expected behavior in the above case?
> thanks
> Sandeep



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4616) Message log is seen when kafka-producer-perf-test.sh is running and any broker restarted in middle in-between

2017-01-11 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817886#comment-15817886
 ] 

huxi commented on KAFKA-4616:
-

Could you set acks = -1 and retry the producer to see whether there is still 
message loss?

> Message log is seen when kafka-producer-perf-test.sh is running and any 
> broker restarted in middle in-between 
> --
>
> Key: KAFKA-4616
> URL: https://issues.apache.org/jira/browse/KAFKA-4616
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
> Environment: Apache mesos
>Reporter: sandeep kumar singh
>
> if any broker is restarted while kafka-producer-perf-test.sh command is 
> running, we see message loss.
> commands i run:
> **perf command:
> $ bin/kafka-producer-perf-test.sh --num-records 10 --record-size 4096  
> --throughput 1000 --topic test3R3P3 --producer-props 
> bootstrap.servers=x.x.x.x:,x.x.x.x:,x.x.x.x:
> I am  sending 10 messages of each having size 4096
> error thrown by perf command:
> 4944 records sent, 988.6 records/sec (3.86 MB/sec), 31.5 ms avg latency, 
> 433.0 max latency.
> 5061 records sent, 1012.0 records/sec (3.95 MB/sec), 67.7 ms avg latency, 
> 798.0 max latency.
> 5001 records sent, 1000.0 records/sec (3.91 MB/sec), 49.0 ms avg latency, 
> 503.0 max latency.
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 37.3 ms avg latency, 
> 594.0 max latency.
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 32.6 ms avg latency, 
> 501.0 max latency.
> 5000 records sent, 999.8 records/sec (3.91 MB/sec), 49.4 ms avg latency, 
> 516.0 max latency.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> truncated
> 5001 records sent, 1000.2 records/sec (3.91 MB/sec), 33.9 ms avg latency, 
> 497.0 max latency.
> 4928 records sent, 985.6 records/sec (3.85 MB/sec), 42.1 ms avg latency, 
> 521.0 max latency.
> 5073 records sent, 1014.4 records/sec (3.96 MB/sec), 39.4 ms avg latency, 
> 418.0 max latency.
> 10 records sent, 999.950002 records/sec (3.91 MB/sec), 37.65 ms avg 
> latency, 798.00 ms max latency, 1 ms 50th, 260 ms 95th, 411 ms 99th, 571 ms 
> 99.9th.
> **consumer command:
> $ bin/kafka-console-consumer.sh --zookeeper 
> x.x.x.x:2181/dcos-service-kafka-framework --topic  test3R3P3  
> 1>~/kafka_output.log
> message stored:
> $ wc -l ~/kafka_output.log
> 99932 /home/montana/kafka_output.log
> I found only 99932 message are stored and 68 messages are lost.
> **topic describe command:
>  $ bin/kafka-topics.sh  --zookeeper x.x.x.x:2181/dcos-service-kafka-framework 
> --describe |grep test3R3
> Topic:test3R3P3 PartitionCount:3ReplicationFactor:3 Configs:
> Topic: test3R3P3Partition: 0Leader: 2   Replicas: 
> 1,2,0 Isr: 2,0,1
> Topic: test3R3P3Partition: 1Leader: 2   Replicas: 
> 2,0,1 Isr: 2,0,1
> Topic: test3R3P3Partition: 2Leader: 0   Replicas: 
> 0,1,2 Isr: 2,0,1
> **consumer group command:
> $  bin/kafka-consumer-groups.sh --zookeeper 
> x.x.x.x:2181/dcos-service-kafka-framework --describe --group 
> console-consumer-9926
> GROUP  TOPIC  PARTITION  
> CURRENT-OFFSET  LOG-END-OFFSET  LAG OWNER
> console-consumer-9926  test3R3P3  0  
> 33265   33265   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> console-consumer-9926  test3R3P3  1  
> 4   4   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> console-consumer-9926  test3R3P3  2  
> 3   3   0   
> console-consumer-9926_node-44a8422fe1a0-1484127474935-c795478e-0
> could you please help me understand what this error means "err - 
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received."?
> Could you please provide suggestion to fix this issue?
> we are seeing this behavior every-time we perform above test-scenario.
> my understanding is, there should not any data loss till n-1 broker is alive. 
> is message loss is an expected behavior in the above case?
> thanks
> Sandeep



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex

2017-01-11 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817849#comment-15817849
 ] 

huxi edited comment on KAFKA-4614 at 1/11/17 10:00 AM:
---

Great catch on the whole thing. 

As for the unmap issue, there seems to be no perfect solution at the moment. 
Java does not offer a corresponding unmap that invokes the underlying munmap 
system call for the mapped file. See [the JDK 
bug|http://bugs.java.com/view_bug.do?bug_id=4724038].

Right now there exist two "not-that-perfect" solutions:
1. Explicitly unmap the mapped file, as [~kawamuray] said, by invoking 
'DirectBuffer.cleaner().clean()'. By the way, Kafka already offers a method 
named 'forceUnmap' that does exactly this kind of thing, so we could simply 
reuse it. However, this solution has two drawbacks: (a) if anything uses the 
MappedByteBuffer afterwards, the JVM will crash; (b) there is a portability 
risk, although sun.misc.Cleaner is present in most major JVM vendors' 
implementations.
2. Explicitly invoke 'System.gc()', which is also not a good idea, especially 
when -XX:+DisableExplicitGC is set.
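
To make option 1 concrete, here is a minimal, illustrative sketch of the explicit-unmap approach, assuming the pre-Java-9 sun.misc.Cleaner / sun.nio.ch.DirectBuffer internals (this is not the actual Kafka 'forceUnmap' code):

{code}
import java.nio.MappedByteBuffer;
import sun.misc.Cleaner;
import sun.nio.ch.DirectBuffer;

public class MmapUtil {
    // Explicitly release the mapping behind a MappedByteBuffer instead of
    // waiting for GC. Any later access to the buffer will crash the JVM,
    // and the classes used here are JDK-internal, hence the portability risk.
    public static void forceUnmap(MappedByteBuffer buffer) {
        if (buffer instanceof DirectBuffer) {
            Cleaner cleaner = ((DirectBuffer) buffer).cleaner();
            if (cleaner != null) {
                cleaner.clean();   // triggers munmap via the buffer's deallocator
            }
        }
    }
}
{code}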


was (Author: huxi_2b):
Great catch for the whole things. 

As for the unmap things, seems there is no perfect solution at this moment. 
Java does not offer a corresponding unmap that invokes underlying munmap system 
call for the mapped file. See [the jdk 
bug|http://bugs.java.com/view_bug.do?bug_id=4724038]

Right now there exists two "not-that-perfect" solutions:
1. Explicitly unmap the mapped file as [~kawamuray] said via invoking 
'DirectBuffer.cleaner().clean()'. By the way, Kafka already offers a method 
named 'forceUnmap' that actually does this kind of thing. We could simply reuse 
this method. However, this solution has two drawbacks: 1. If others use 
MappedByteBuffer later, it will crash. 2. Portability risk, although 
sun.misc.Cleaner is present in most of major JVM vendors.
2 Explicitly invoke 'System.gc()', which is also a good enough idea, especially 
when setting -XX:+DisableExplicitGC.

> Long GC pause harming broker performance which is caused by mmap objects 
> created for OffsetIndex
> 
>
> Key: KAFKA-4614
> URL: https://issues.apache.org/jira/browse/KAFKA-4614
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>
> First let me clarify our system environment information as I think it's 
> important to understand this issue:
> OS: CentOS6
> Kernel version: 2.6.32-XX
> Filesystem used for data volume: XFS
> Java version: 1.8.0_66
> GC option: Kafka default(G1GC)
> Kafka version: 0.10.0.1
> h2. Phenomenon
> In our Kafka cluster, an usual response time for Produce request is about 1ms 
> for 50th percentile to 10ms for 99th percentile. All topics are configured to 
> have 3 replicas and all producers are configured {{acks=all}} so this time 
> includes replication latency.
> Sometimes we observe 99th percentile latency get increased to 100ms ~ 500ms 
> but for the most cases the time consuming part is "Remote" which means it is 
> caused by slow replication which is known to happen by various reasons(which 
> is also an issue that we're trying to improve, but out of interest within 
> this issue).
> However, we found that there are some different patterns which happens rarely 
> but stationary 3 ~ 5 times a day for each servers. The interesting part is 
> that "RequestQueue" also got increased as well as "Total" and "Remote".
> At the same time, we observed that the disk read metrics(in terms of both 
> read bytes and read time) spikes exactly for the same moment. Currently we 
> have only caught up consumers so this metric sticks to zero while all 
> requests are served by page cache.
> In order to investigate what Kafka is "read"ing, I employed SystemTap and 
> wrote the following script. It traces all disk reads(only actual read by 
> physical device) made for the data volume by broker process.
> {code}
> global target_pid = KAFKA_PID
> global target_dev = DATA_VOLUME
> probe ioblock.request {
>   if (rw == BIO_READ && pid() == target_pid && devname == target_dev) {
>  t_ms = gettimeofday_ms() + 9 * 3600 * 1000 // timezone adjustment
>  printf("%s,%03d:  tid = %d, device = %s, inode = %d, size = %d\n", 
> ctime(t_ms / 1000), t_ms % 1000, tid(), devname, ino, size)
>  print_backtrace()
>  print_ubacktrace()
>   }
> }
> {code}
> As the result, we could observe many logs like below:
> {code}
&

[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex

2017-01-11 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817849#comment-15817849
 ] 

huxi commented on KAFKA-4614:
-

Great catch for the whole things. 

As for the unmap things, seems there is no perfect solution at this moment. 
Java does not offer a corresponding unmap that invokes underlying munmap system 
call for the mapped file. See [the jdk 
bug|http://bugs.java.com/view_bug.do?bug_id=4724038]

Right now there exists two "not-that-perfect" solutions:
1. Explicitly unmap the mapped file as [~kawamuray] said via invoking 
'DirectBuffer.cleaner().clean()'. By the way, Kafka already offers a method 
named 'forceUnmap' that actually does this kind of thing. We could simply reuse 
this method. However, this solution has two drawbacks: 1. If others use 
MappedByteBuffer later, it will crash. 2. Portability risk, although 
sun.misc.Cleaner is present in most of major JVM vendors.
2 Explicitly invoke 'System.gc()', which is also a good enough idea, especially 
when setting -XX:+DisableExplicitGC.

> Long GC pause harming broker performance which is caused by mmap objects 
> created for OffsetIndex
> 
>
> Key: KAFKA-4614
> URL: https://issues.apache.org/jira/browse/KAFKA-4614
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>
> First let me clarify our system environment information as I think it's 
> important to understand this issue:
> OS: CentOS6
> Kernel version: 2.6.32-XX
> Filesystem used for data volume: XFS
> Java version: 1.8.0_66
> GC option: Kafka default(G1GC)
> Kafka version: 0.10.0.1
> h2. Phenomenon
> In our Kafka cluster, an usual response time for Produce request is about 1ms 
> for 50th percentile to 10ms for 99th percentile. All topics are configured to 
> have 3 replicas and all producers are configured {{acks=all}} so this time 
> includes replication latency.
> Sometimes we observe 99th percentile latency get increased to 100ms ~ 500ms 
> but for the most cases the time consuming part is "Remote" which means it is 
> caused by slow replication which is known to happen by various reasons(which 
> is also an issue that we're trying to improve, but out of interest within 
> this issue).
> However, we found that there are some different patterns which happens rarely 
> but stationary 3 ~ 5 times a day for each servers. The interesting part is 
> that "RequestQueue" also got increased as well as "Total" and "Remote".
> At the same time, we observed that the disk read metrics(in terms of both 
> read bytes and read time) spikes exactly for the same moment. Currently we 
> have only caught up consumers so this metric sticks to zero while all 
> requests are served by page cache.
> In order to investigate what Kafka is "read"ing, I employed SystemTap and 
> wrote the following script. It traces all disk reads(only actual read by 
> physical device) made for the data volume by broker process.
> {code}
> global target_pid = KAFKA_PID
> global target_dev = DATA_VOLUME
> probe ioblock.request {
>   if (rw == BIO_READ && pid() == target_pid && devname == target_dev) {
>  t_ms = gettimeofday_ms() + 9 * 3600 * 1000 // timezone adjustment
>  printf("%s,%03d:  tid = %d, device = %s, inode = %d, size = %d\n", 
> ctime(t_ms / 1000), t_ms % 1000, tid(), devname, ino, size)
>  print_backtrace()
>  print_ubacktrace()
>   }
> }
> {code}
> As the result, we could observe many logs like below:
> {code}
> Thu Dec 22 17:21:39 2016,209:  tid = 126123, device = sdb1, inode = -1, size 
> = 4096
>  0x81275050 : generic_make_request+0x0/0x5a0 [kernel]
>  0x81275660 : submit_bio+0x70/0x120 [kernel]
>  0xa036bcaa : _xfs_buf_ioapply+0x16a/0x200 [xfs]
>  0xa036d95f : xfs_buf_iorequest+0x4f/0xe0 [xfs]
>  0xa036db46 : _xfs_buf_read+0x36/0x60 [xfs]
>  0xa036dc1b : xfs_buf_read+0xab/0x100 [xfs]
>  0xa0363477 : xfs_trans_read_buf+0x1f7/0x410 [xfs]
>  0xa033014e : xfs_btree_read_buf_block+0x5e/0xd0 [xfs]
>  0xa0330854 : xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
>  0xa0330edf : xfs_btree_lookup+0xbf/0x470 [xfs]
>  0xa032456f : xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
>  0xa032628b : xfs_bmap_del_extent+0x12b/0xac0 [xfs]
>  0xa0326f34 : xfs_bunmapi+0x314/0x850 [xfs]
>  0xa034ad79 : xfs_itruncate_extents+0xe9/0x280 [xfs]
>  0xa

[jira] [Commented] (KAFKA-4603) argument error,and command parsed error

2017-01-06 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803933#comment-15803933
 ] 

huxi commented on KAFKA-4603:
-

[~auroraxlh] If you type '--zookeeper.connect' exactly, does the command run 
successfully, and does it only fail when the option name contains a typo? Correct?
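
For reference, the issue description below notes that this abbreviation matching can be disabled when constructing the parser. A minimal, illustrative sketch, assuming a jopt-simple release that provides the OptionParser(boolean allowAbbreviations) constructor mentioned there:

{code}
import joptsimple.ArgumentAcceptingOptionSpec;
import joptsimple.OptionParser;
import joptsimple.OptionSet;

public class ZkMigratorOptions {
    public static void main(String[] args) {
        // 'false' disables option-name abbreviation, so a mistyped
        // "--zookeeper.connection" is rejected instead of being silently
        // matched to "--zookeeper.connection.timeout".
        OptionParser parser = new OptionParser(false);
        ArgumentAcceptingOptionSpec<String> zkAcl =
                parser.accepts("zookeeper.acl").withRequiredArg();
        ArgumentAcceptingOptionSpec<String> zkConnect =
                parser.accepts("zookeeper.connect").withRequiredArg();
        parser.accepts("zookeeper.connection.timeout").withRequiredArg().ofType(Integer.class);

        OptionSet options = parser.parse(args);
        System.out.println("zookeeper.acl = " + options.valueOf(zkAcl));
        System.out.println("zookeeper.connect = " + options.valueOf(zkConnect));
    }
}
{code}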

> argument error,and command parsed error
> ---
>
> Key: KAFKA-4603
> URL: https://issues.apache.org/jira/browse/KAFKA-4603
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, documentation
>Affects Versions: 0.10.0.1, 0.10.2.0
> Environment: suse
>Reporter: Xin
>Priority: Minor
>
> according to the 7.6.2 Migrating clusters of document :
> ./zookeeper-security-migration.sh --zookeeper.acl=secure 
> --zookeeper.connection=localhost:2181
> joptsimple.OptionArgumentConversionException: Cannot parse argument 
> 'localhost:2181' of option zookeeper.connection.timeout
>   at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:93)
>   at 
> joptsimple.ArgumentAcceptingOptionSpec.convert(ArgumentAcceptingOptionSpec.java:274)
>   at joptsimple.OptionSet.valuesOf(OptionSet.java:223)
>   at joptsimple.OptionSet.valueOf(OptionSet.java:172)
>   at kafka.admin.ZkSecurityMigrator$.run(ZkSecurityMigrator.scala:111)
>   at kafka.admin.ZkSecurityMigrator$.main(ZkSecurityMigrator.scala:119)
>   at kafka.admin.ZkSecurityMigrator.main(ZkSecurityMigrator.scala)
> Caused by: joptsimple.internal.ReflectionException: 
> java.lang.NumberFormatException: For input string: "localhost:2181"
>   at 
> joptsimple.internal.Reflection.reflectionException(Reflection.java:140)
>   at joptsimple.internal.Reflection.invoke(Reflection.java:122)
>   at 
> joptsimple.internal.MethodInvokingValueConverter.convert(MethodInvokingValueConverter.java:48)
>   at joptsimple.internal.Reflection.convertWith(Reflection.java:128)
>   at joptsimple.AbstractOptionSpec.convertWith(AbstractOptionSpec.java:90)
>   ... 6 more
> Caused by: java.lang.NumberFormatException: For input string: "localhost:2181"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:492)
>   at java.lang.Integer.valueOf(Integer.java:582)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at joptsimple.internal.Reflection.invoke(Reflection.java:119)
>   ... 9 more
> ===>the argument  "zookeeper.connection" has been parsed to 
> "zookeeper.connection.timeout"
> using help i found that  the argument  is :
> --zookeeper.connectSets the ZooKeeper connect string 
>  (ensemble). This parameter takes a  
>  comma-separated list of host:port   
>  pairs. (default: localhost:2181)
> --zookeeper.connection.timeout Sets the ZooKeeper connection timeout.
> the document describe wrong, and the code also has something wrong:
>  in ZkSecurityMigrator.scala,
>   val parser = new OptionParse()==>
> Any of --v, --ve, ... are accepted on the command line and treated as though 
> you had typed --verbose.
> To suppress this behavior, use the OptionParser constructor 
> OptionParser(boolean allowAbbreviations) and pass a value of false.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4595) Controller send thread can't stop when broker change listener event trigger for dead brokers

2017-01-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803804#comment-15803804
 ] 

huxi commented on KAFKA-4595:
-

Yes, it seems the sequence might look like this:
1. At some point in time, a delete-topic operation began, which resumed the 
DeleteTopicsThread. That thread acquired the controller lock, ran successfully, 
and then released the lock, but the request send thread had not yet begun to 
execute the callback.
2. Later, something went wrong with the network, so the controller's ZK 
listener thread was triggered to remove the dead brokers. That thread acquired 
the controller lock, shut down the request send thread for the dead broker, and 
waited for it to stop.
3. The request send thread then began to execute deleteTopicStopReplicaCallback, 
which also tried to acquire the controller lock, so it waited forever and could 
never finish shutting down.
4. Hence the deadlock.
[~pengwei] Does it make sense?
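
For what it's worth, the lock-ordering problem can be boiled down to a small, purely illustrative sketch (class and thread names are hypothetical, not the actual controller code):

{code}
import java.util.concurrent.CountDownLatch;

public class ControllerShutdownDeadlock {
    public static void main(String[] args) throws InterruptedException {
        final Object controllerLock = new Object();
        final CountDownLatch shutdownLatch = new CountDownLatch(1);

        // Stand-in for the request send thread: its callback needs the
        // controller lock before it can finish and signal shutdown.
        Thread sendThread = new Thread(() -> {
            synchronized (controllerLock) {
                // deleteTopicStopReplicaCallback work would happen here
            }
            shutdownLatch.countDown();
        }, "request-send-thread");

        // Stand-in for the broker-change listener: it takes the controller
        // lock, then blocks until the send thread reports it has shut down.
        Thread listenerThread = new Thread(() -> {
            synchronized (controllerLock) {
                try {
                    shutdownLatch.await();   // waits forever: sendThread cannot get the lock
                } catch (InterruptedException ignored) {
                }
            }
        }, "zk-listener-thread");

        listenerThread.start();
        Thread.sleep(100);        // let the listener grab the lock first
        sendThread.start();
        listenerThread.join();    // never returns -> the deadlock described above
    }
}
{code}

Run as-is, the listener thread parks on the latch while holding the lock and the program never exits, which mirrors the stack trace in the description.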

> Controller send thread can't stop when broker change listener event trigger 
> for  dead brokers
> -
>
> Key: KAFKA-4595
> URL: https://issues.apache.org/jira/browse/KAFKA-4595
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.1
>Reporter: Pengwei
> Fix For: 0.10.2.0
>
>
> In our test env, we found controller is not working after a delete topic 
> opertation and network issue, the stack is below:
> "ZkClient-EventThread-15-192.168.1.3:2184,192.168.1.4:2184,192.168.1.5:2184" 
> #15 daemon prio=5 os_prio=0 tid=0x7fb76416e000 nid=0x3019 waiting on 
> condition [0x7fb76b7c8000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xc05497b8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:50)
>   at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:32)
>   at 
> kafka.controller.ControllerChannelManager.kafka$controller$ControllerChannelManager$$removeExistingBroker(ControllerChannelManager.scala:128)
>   at 
> kafka.controller.ControllerChannelManager.removeBroker(ControllerChannelManager.scala:81)
>   - locked <0xc0258760> (a java.lang.Object)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply$mcVI$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(ReplicaStateMachine.scala:369)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:369)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at 
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChan

[jira] [Commented] (KAFKA-4599) KafkaConsumer encounters SchemaException when Kafka broker stopped

2017-01-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803315#comment-15803315
 ] 

huxi commented on KAFKA-4599:
-

Which Storm-supplied Kafka client do you use, storm-kafka or 
storm-kafka-client? And which version?

> KafkaConsumer encounters SchemaException when Kafka broker stopped
> --
>
> Key: KAFKA-4599
> URL: https://issues.apache.org/jira/browse/KAFKA-4599
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Andrew Olson
>
> We recently observed an issue in production that can apparently occur a small 
> percentage of the time when a Kafka broker is stopped. We're using version 
> 0.9.0.1 for all brokers and clients.
> During a recent episode, 3 KafkaConsumer instances (out of approximately 100) 
> ran into the following SchemaException within a few seconds of instructing 
> the broker to shutdown.
> {noformat}
> 2017-01-04 14:46:19 org.apache.kafka.common.protocol.types.SchemaException: 
> Error reading field 'responses': Error reading array of size 2774863, only 62 
> bytes available
>   at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:71)
>   at 
> org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:439)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> {noformat}
> The exception message was slightly different for one consumer,
> {{Error reading field 'responses': Error reading array of size 2774863, only 
> 260 bytes available}}
> The exception was not caught and caused the Storm Executor thread to restart, 
> so it's not clear if it would have been transient or fatal for the 
> KafkaConsumer.
> Here are the initial broker shutdown logs,
> {noformat}
> 2017-01-04 14:46:15,869 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> shutting down
> 2017-01-04 14:46:16,298 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutting down
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Stopped 
> 2017-01-04 14:46:18,364 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-1-40], Shutdown completed
> 2017-01-04 14:46:18,612 INFO kafka.server.ReplicaFetcherThread: 
> [ReplicaFetcherThread-3-30], Shutting down
> 2017-01-04 14:46:19,547 INFO kafka.server.KafkaServer: [Kafka Server 4], 
> Controlled shutdown succeeded
> 2017-01-04 14:46:19,554 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutting down
> 2017-01-04 14:46:19,593 INFO kafka.network.SocketServer: [Socket Server on 
> Broker 4], Shutdown completed
> {noformat}
> We've found one very similar reported occurrence,
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAGnq0kFPm%2Bd0Xdm4tY_O7MnV3_LqLU10uDhPwxzv-T7UnHy08g%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4592) Kafka Producer Metrics Invalid Value

2017-01-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801348#comment-15801348
 ] 

huxi edited comment on KAFKA-4592 at 1/5/17 2:36 PM:
-

It seems that o.a.k.common.metrics.stats.Max's default constructor assigns 
Double.NEGATIVE_INFINITY as the initial value and also yields NEGATIVE_INFINITY 
in combine, but for metrics such as record size, a value of zero would be more 
reasonable. 
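
A tiny illustrative sketch of that behaviour (plain Java, not the actual o.a.k.common.metrics code):

{code}
public class MaxStatSketch {
    public static void main(String[] args) {
        // A max-style stat seeded with Double.NEGATIVE_INFINITY keeps that
        // sentinel when no sample was recorded in the window, and it is then
        // surfaced downstream as an absurd negative value instead of 0.
        double max = Double.NEGATIVE_INFINITY;  // initial value in Max's default constructor
        double[] samplesInWindow = {};          // nothing produced during the window
        for (double v : samplesInWindow) {
            max = Math.max(max, v);
        }
        System.out.println("record-size-max = " + max);   // prints -Infinity
    }
}
{code}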


was (Author: huxi_2b):
Seems that o.a.k.common.metrics.stats.Max's default constructor assigns 
Double.NEGATIVE_INFINITY as the initial value, but for the record size and 
like, value of zero is more reasonable. 

> Kafka Producer Metrics Invalid Value
> 
>
> Key: KAFKA-4592
> URL: https://issues.apache.org/jira/browse/KAFKA-4592
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
>Reporter: AJ Jwair
>Assignee: huxi
>
> Producer metrics
> Metric name: record-size-max
> When no records are produced during the monitoring window, the 
> record-size-max has an invalid value of -9.223372036854776E16
> Please notice that the value is not a very small number close to zero bytes, 
> it is negative 90 quadrillion bytes
> The same behavior was observed in: records-lag-max



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-4592) Kafka Producer Metrics Invalid Value

2017-01-05 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-4592:
---

Assignee: huxi

> Kafka Producer Metrics Invalid Value
> 
>
> Key: KAFKA-4592
> URL: https://issues.apache.org/jira/browse/KAFKA-4592
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
>Reporter: AJ Jwair
>Assignee: huxi
>
> Producer metrics
> Metric name: record-size-max
> When no records are produced during the monitoring window, the 
> record-size-max has an invalid value of -9.223372036854776E16
> Please notice that the value is not a very small number close to zero bytes, 
> it is negative 90 quadrillion bytes
> The same behavior was observed in: records-lag-max



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4592) Kafka Producer Metrics Invalid Value

2017-01-05 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801348#comment-15801348
 ] 

huxi edited comment on KAFKA-4592 at 1/5/17 1:31 PM:
-

Seems that o.a.k.common.metrics.stats.Max's default constructor assigns 
Double.NEGATIVE_INFINITY as the initial value, but for the record size and 
like, value of zero is more reasonable. 


was (Author: huxi_2b):
Seems that o.a.k.common.metrics.stats.Max's default constructor assign 
Double.NEGATIVE_INFINITY as the initial value, but for the record size and 
like, value of zero is more reasonable. 

> Kafka Producer Metrics Invalid Value
> 
>
> Key: KAFKA-4592
> URL: https://issues.apache.org/jira/browse/KAFKA-4592
> Project: Kafka
>  Issue Type: Bug
>  Components: core, producer 
>Affects Versions: 0.10.1.1
>Reporter: AJ Jwair
>
> Producer metrics
> Metric name: record-size-max
> When no records are produced during the monitoring window, the 
> record-size-max has an invalid value of -9.223372036854776E16
> Please notice that the value is not a very small number close to zero bytes, 
> it is negative 90 quadrillion bytes
> The same behavior was observed in: records-lag-max



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-4295) kafka-console-consumer.sh does not delete the temporary group in zookeeper

2017-01-04 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reopened KAFKA-4295:
-

> kafka-console-consumer.sh does not delete the temporary group in zookeeper
> --
>
> Key: KAFKA-4295
> URL: https://issues.apache.org/jira/browse/KAFKA-4295
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Sswater Shi
>Assignee: huxi
>Priority: Minor
>
> I'm not sure it is a bug or you guys designed it.
> Since 0.9.x.x, the kafka-console-consumer.sh will not delete the group 
> information in zookeeper/consumers on exit when without "--new-consumer". 
> There will be a lot of abandoned zookeeper/consumers/console-consumer-xxx if 
> kafka-console-consumer.sh runs a lot of times.
> When 0.8.x.x,  the kafka-console-consumer.sh can be followed by an argument 
> "group". If not specified, the kafka-console-consumer.sh will create a 
> temporary group name like 'console-consumer-'. If the group name is 
> specified by "group", the information in the zookeeper/consumers will be kept 
> on exit. If the group name is a temporary one, the information in the 
> zookeeper will be deleted when kafka-console-consumer.sh is quitted by 
> Ctrl+C. Why this is changed from 0.9.x.x.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4434) KafkaProducer configuration is logged twice

2017-01-04 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi resolved KAFKA-4434.
-
Resolution: Fixed
  Reviewer: Ismael Juma

> KafkaProducer configuration is logged twice
> ---
>
> Key: KAFKA-4434
> URL: https://issues.apache.org/jira/browse/KAFKA-4434
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.10.0.1
>Reporter: Ruben de Gooijer
>Assignee: huxi
>Priority: Minor
>  Labels: newbie
> Fix For: 0.10.2.0
>
>
> The constructor of org.apache.kafka.clients.producer.KafkaProducer accepts a 
> ProducerConfig which when constructed logs the configuration: 
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java#L58
>  . 
> However, when the construction of KafkaProducer proceeds the provided 
> ProducerConfig is repurposed and another instance is created 
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L323
>  which triggers another log with the same contents (only the clientId can 
> differ in case its not supplied in the original config). 
> At first sight this seems like unintended behaviour to me. At least it caused 
> me to dive into it in order to verify if there weren't two producer instances 
> running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3739) Add no-arg constructor for library provided serdes

2017-01-04 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-3739:
---

Assignee: huxi

> Add no-arg constructor for library provided serdes
> --
>
> Key: KAFKA-3739
> URL: https://issues.apache.org/jira/browse/KAFKA-3739
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: huxi
>  Labels: newbie, user-experience
>
> We need to add the no-arg constructor explicitly for those library-provided 
> serdes such as {{WindowedSerde}} that already have constructors with 
> arguments. Otherwise they cannot be used through configs which are expecting 
> to construct them via reflections with no-arg constructors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4308) Inconsistent parameters between console producer and consumer

2017-01-03 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi resolved KAFKA-4308.
-
Resolution: Duplicate

> Inconsistent parameters between console producer and consumer
> -
>
> Key: KAFKA-4308
> URL: https://issues.apache.org/jira/browse/KAFKA-4308
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Gwen Shapira
>  Labels: newbie
>
> kafka-console-producer uses --broker-list while kafka-console-consumer uses 
> --bootstrap-server.
> Let's add --bootstrap-server to the producer for some consistency?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-4434) KafkaProducer configuration is logged twice

2017-01-03 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-4434:
---

Assignee: huxi  (was: kevin.chen)

> KafkaProducer configuration is logged twice
> ---
>
> Key: KAFKA-4434
> URL: https://issues.apache.org/jira/browse/KAFKA-4434
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.10.0.1
>Reporter: Ruben de Gooijer
>Assignee: huxi
>Priority: Minor
>  Labels: newbie
> Fix For: 0.10.2.0
>
>
> The constructor of org.apache.kafka.clients.producer.KafkaProducer accepts a 
> ProducerConfig which when constructed logs the configuration: 
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java#L58
>  . 
> However, when the construction of KafkaProducer proceeds the provided 
> ProducerConfig is repurposed and another instance is created 
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L323
>  which triggers another log with the same contents (only the clientId can 
> differ in case its not supplied in the original config). 
> At first sight this seems like unintended behaviour to me. At least it caused 
> me to dive into it in order to verify if there weren't two producer instances 
> running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4576) Log segments close to max size break on fetch

2017-01-03 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795040#comment-15795040
 ] 

huxi commented on KAFKA-4576:
-

If we use a loop to collect all 12 bytes, there is a possibility, however rare, 
that the loop never exits. Should we handle that situation if we decide to 
employ the loop?

> Log segments close to max size break on fetch
> -
>
> Key: KAFKA-4576
> URL: https://issues.apache.org/jira/browse/KAFKA-4576
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.1
>Reporter: Ivan Babrou
>Assignee: huxi
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> We are running Kafka 0.10.1.1~rc1 (it's the same as 0.10.1.1).
> Max segment size is set to 2147483647 globally, that's 1 byte less than max 
> signed int32.
> Every now and then we see failures like this:
> {noformat}
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: ERROR [Replica Manager on 
> Broker 1006]: Error processing fetch operation on partition [mytopic,11], 
> offset 483579108587 (kafka.server.ReplicaManager)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: 
> java.lang.IllegalStateException: Failed to read complete buffer for 
> targetOffset 483686627237 startPosition 2145701130 in 
> /disk/data0/kafka-logs/mytopic-11/483571890786.log
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.log.FileMessageSet.searchForOffsetWithSize(FileMessageSet.scala:145)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.log.LogSegment.translateOffset(LogSegment.scala:128)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.log.LogSegment.read(LogSegment.scala:180)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.log.Log.read(Log.scala:563)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager.kafka$server$ReplicaManager$$read$1(ReplicaManager.scala:567)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:606)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:605)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> scala.collection.Iterator$class.foreach(Iterator.scala:893)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:605)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:469)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.KafkaApis.handle(KafkaApis.scala:79)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> {noformat}
> ...
> -rw-r--r-- 1 kafka kafka  0 Dec 25 15:15 
> 483557418204.timeindex
> -rw-r--r-- 1 kafka kafka   9496 Dec 25 15:26 483564654488.index
> -rw-r--r-- 1 kafka kafka 2145763964 Dec 25 15:26 483564654488.log
> -rw-r--r-- 1 kafka kafka  0 Dec 25 15:26 
> 483564654488.timeindex
> -rw-r--r-- 1 kafka kafka   9576 Dec 25 15:37 483571890786.index
> -rw-r--r-- 1 kafka kafka 2147483644 Dec 25 15:37 483571890786.log
> -rw-r--r-- 1 kafka kafka  0 Dec 25 15:37 
> 483571890786.timeindex
> -rw-r--r-- 1 kafka kafka   9568 Dec 25 15:48 483579135712.index
> -rw-r--r-- 1 kafka kafka 2146791360 Dec 25 15:48 483579135712.log
> -rw-r--r-- 1 kafka kafka  0 Dec 25 15:48 
> 483579135712.timeindex
> -rw-r--r-- 1 kafka kafka   9408 Dec 25 15:59 483586374164.index
> ...
> {noformat}
> Here 483571890786.log is just 3 bytes below the max size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4576) Log segments close to max size break on fetch

2017-01-02 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794042#comment-15794042
 ] 

huxi commented on KAFKA-4576:
-

It seems FileChannel.read does not fill up the 12-byte buffer even though there 
are enough bytes to read.

The javadoc does not specify exactly how many bytes will be read into the 
buffer. In practice we are likely to get a full buffer when using FileChannel, 
but Java does not actually guarantee that; it depends on the OS implementation.

Maybe we should use a while loop to make sure all 12 bytes have been read into 
the buffer, without violating the corner-case check. Does that make sense? 
[~guozhang] [~ijuma] [~becket_qin]
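
A minimal sketch of such a loop (illustrative only, not the actual patch; the readFully helper name is hypothetical): keep reading until the buffer is full, and treat end-of-file as an error so a truncated segment cannot make the loop spin forever.

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FileChannelUtil {
    // Read exactly buffer.remaining() bytes starting at 'position'.
    // FileChannel.read may return fewer bytes than requested, so loop,
    // and treat end-of-file (-1) as an error instead of looping forever.
    public static void readFully(FileChannel channel, ByteBuffer buffer, long position) throws IOException {
        long pos = position;
        while (buffer.hasRemaining()) {
            int read = channel.read(buffer, pos);
            if (read == -1) {
                throw new IOException("Reached end of file before reading "
                        + buffer.limit() + " bytes at position " + position);
            }
            pos += read;
        }
    }
}
{code}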

> Log segments close to max size break on fetch
> -
>
> Key: KAFKA-4576
> URL: https://issues.apache.org/jira/browse/KAFKA-4576
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.1
>Reporter: Ivan Babrou
>
> We are running Kafka 0.10.1.1~rc1 (it's the same as 0.10.1.1).
> Max segment size is set to 2147483647 globally, that's 1 byte less than max 
> signed int32.
> Every now and then we see failures like this:
> {noformat}
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: ERROR [Replica Manager on 
> Broker 1006]: Error processing fetch operation on partition [mytopic,11], 
> offset 483579108587 (kafka.server.ReplicaManager)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: 
> java.lang.IllegalStateException: Failed to read complete buffer for 
> targetOffset 483686627237 startPosition 2145701130 in 
> /disk/data0/kafka-logs/mytopic-11/483571890786.log
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.log.FileMessageSet.searchForOffsetWithSize(FileMessageSet.scala:145)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.log.LogSegment.translateOffset(LogSegment.scala:128)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.log.LogSegment.read(LogSegment.scala:180)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.log.Log.read(Log.scala:563)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager.kafka$server$ReplicaManager$$read$1(ReplicaManager.scala:567)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:606)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:605)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> scala.collection.Iterator$class.foreach(Iterator.scala:893)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:605)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:469)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.KafkaApis.handle(KafkaApis.scala:79)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
> java.lang.Thread.run(Thread.java:745)
> {noformat}
> {noformat}
> ...
> -rw-r--r-- 1 kafka kafka  0 Dec 25 15:15 
> 483557418204.timeindex
> -rw-r--r-- 1 kafka kafka   9496 Dec 25 15:26 483564654488.index
> -rw-r--r-- 1 kafka kafka 2145763964 Dec 25 15:26 483564654488.log
> -rw-r--r-- 1 kafka kafka  0 Dec 25 15:26 
> 483564654488.timeindex
> -rw-r--r-- 1 kafka kafka   9576 Dec 25 15:37 483571890786.index
> -rw-r--r-- 1 kafka kafka 2147483644 Dec 25 15:37 483571890786.log
> -rw-r--r-- 1 kafka kafka  0 Dec 25 15:37 
> 483571890786.timeindex
> -rw-r--r-- 1 kafka kafka   9568 Dec 25 15:48 483579135712.index
> -rw-r--r-- 1 kafka kafka 2146791360 Dec 25 15:48 483579135712.log
> -rw-r--r-- 1 kafka kafka   

[jira] [Commented] (KAFKA-4577) NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers

2016-12-31 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15790517#comment-15790517
 ] 

huxi commented on KAFKA-4577:
-

Could you elaborate on the steps you ran into this problem?

> NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers 
> --
>
> Key: KAFKA-4577
> URL: https://issues.apache.org/jira/browse/KAFKA-4577
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Scott Reynolds
>
> Seems as if either deleteTopicManager or deleteTopicManager. 
> partitionsToBeDeleted wasn't set ?
> {code}
> java.lang.NullPointerException
> at 
> kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers
>  (ControllerChannelManager.scala:331)
> at kafka.controller.KafkaController.sendUpdateMetadataRequest 
> (KafkaController.scala:1023)
> at 
> kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications
>  (KafkaController.scala:1371)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp
>  (KafkaController.scala:1358)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply
>  (KafkaController.scala:1351)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply
>  (KafkaController.scala:1351)
> at kafka.utils.CoreUtils$.inLock (CoreUtils.scala:234)
> at kafka.controller.IsrChangeNotificationListener.handleChildChange 
> (KafkaController.scala:1351)
> at org.I0Itec.zkclient.ZkClient$10.run (ZkClient.java:843)
> at org.I0Itec.zkclient.ZkEventThread.run (ZkEventThread.java:71)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4577) NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers

2016-12-31 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15790517#comment-15790517
 ] 

huxi edited comment on KAFKA-4577 at 1/1/17 4:26 AM:
-

Could you elaborate on the steps when you ran into this problem?


was (Author: huxi_2b):
Could you elaborate on the steps you ran into this problem?

> NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers 
> --
>
> Key: KAFKA-4577
> URL: https://issues.apache.org/jira/browse/KAFKA-4577
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Scott Reynolds
>
> Seems as if either deleteTopicManager or deleteTopicManager. 
> partitionsToBeDeleted wasn't set ?
> {code}
> java.lang.NullPointerException
> at 
> kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers
>  (ControllerChannelManager.scala:331)
> at kafka.controller.KafkaController.sendUpdateMetadataRequest 
> (KafkaController.scala:1023)
> at 
> kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications
>  (KafkaController.scala:1371)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp
>  (KafkaController.scala:1358)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply
>  (KafkaController.scala:1351)
> at 
> kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply
>  (KafkaController.scala:1351)
> at kafka.utils.CoreUtils$.inLock (CoreUtils.scala:234)
> at kafka.controller.IsrChangeNotificationListener.handleChildChange 
> (KafkaController.scala:1351)
> at org.I0Itec.zkclient.ZkClient$10.run (ZkClient.java:843)
> at org.I0Itec.zkclient.ZkEventThread.run (ZkEventThread.java:71)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4573) Producer sporadic timeout

2016-12-28 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784692#comment-15784692
 ] 

huxi commented on KAFKA-4573:
-

Is it possible it's caused by a transient network error? 

> Producer sporadic timeout
> -
>
> Key: KAFKA-4573
> URL: https://issues.apache.org/jira/browse/KAFKA-4573
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Ankur C
>
> We had production outage due to sporadic kafka producer timeout. About 1 to 
> 2% of the message would timeout continuously. 
> Kafka version - 0.9.0.1
> #Kafka brokers - 5
> #Replication for each topic - 3
> #Number of topics  - ~30
> #Number of partition - ~300
> We have kafka 0.9.0.1 running in our 5 broker cluster for 1 month without any 
> issues. However, on Dec 23rd we saw sporadic kafka producer timeout. 
> Issue begin around 6:51am and continued until we bounced kafka broker. 
> 6:51am Underreplication started on small number of topics
> 6:53am All underreplication recovered 
> 11:00am We restarted all kafka producer writer app but this didn't solve the 
> sporadic kafka producer timeout issue
> 12:01pm We restarted all kafka broker after this the issue was resolved.
> Kafka metrics and kafka logs doesn't show any major issue. There were no 
> offline partitions during the outage and #controller was exactly 1. 
> We only saw following exception in kafka broker in controller.log. This log 
> was present for all broker 0 to 4.
> java.io.IOException: Connection to 2 was disconnected before the response was 
> read at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>  at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>  at scala.Option.foreach(Option.scala:236) at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>  at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>  at 
> kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
>  at 
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
>  at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>  at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
>  at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171) 
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>  [2016-12-23 06:51:37,384] WARN [Controller-2-to-broker-2-send-thread], 
> Controller 2 epoch 18 fails to send request 
> {controller_id=2,controller_epoch=18,partition_states=[{topic=compliance_pipeline_fast_green,partition=4,controller_epoch=18,leader=4,leader_epoch=53,isr=[2,4],zk_version=111,replicas=[4,1,2]}],live_brokers=[{id=3,end_points=[{port=31161,host=10.126.144.73,security_protocol_type=0}]},{id=4,end_points=[{port=31355,host=10.126.144.233,security_protocol_type=0}]},{id=2,end_points=[{port=31293,host=10.126.144.137,security_protocol_type=0}]},{id=1,end_points=[{port=31824,host=10.126.144.169,security_protocol_type=0}]},{id=0,end_points=[{port=31139,host=10.126.144.201,security_protocol_type=0}]}]}
>  to broker Node(2, 10.126.144.137, 31293). Reconnecting to broker. 
> (kafka.controller.RequestSendThread)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

