[jira] [Comment Edited] (KAFKA-8604) kafka log dir was marked as offline because of deleting segments of __consumer_offsets failed

2019-07-13 Thread songyingshuan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884547#comment-16884547
 ] 

songyingshuan edited comment on KAFKA-8604 at 7/14/19 5:56 AM:
---

I think we have found the reason: in our Kafka cluster there is a topic with 
high throughput, and the consumer commits its offset every time before it 
calls poll() for more messages. 
This consumer group commits about 7.5 million records in about 5 minutes, 
which causes the __consumer_offsets-38 partition to roll a new segment file 
about every 20 seconds.
Whenever a new segment is created, a log cleaner thread tries to clean 
this partition (we configure 'log.cleaner.threads=11', and only a few 
topics use the 'compact' cleanup policy).
At the end of the cleaning process an asyncDeleteSegment task is scheduled 
(by default, 60 s later); if two consecutive tasks have the same file to 
delete, the latter fails.

Based on this analysis we first modified the consumer's code: auto commit is 
now used and the interval is fixed at 3 seconds.
The LogDirFailure has not appeared since.

So, we think asyncDeleteSegment should be a synchronous delete 
(syncDeleteSegment), or the default delay of the async delete operation 
should be decreased.
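
For reference, the 60-second delay mentioned above corresponds to the broker's file.delete.delay.ms setting (60000 ms by default). Below is a minimal sketch of the consumer-side mitigation described above, switching from a commit before every poll() to auto commit every 3 seconds; the bootstrap address and group id are placeholders:

{code}
import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer

// Sketch of the mitigation described above: instead of committing before every
// poll(), let the consumer auto-commit offsets every 3 seconds, which keeps the
// commit rate into __consumer_offsets low.
val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")  // placeholder
props.put(ConsumerConfig.GROUP_ID_CONFIG, "high-throughput-group")  // placeholder
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "3000")

val consumer = new KafkaConsumer[String, String](props)
{code}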



was (Author: ymxz):
I think we have found the reason: in our Kafka cluster there is a topic with 
high throughput, and the consumer commits its offset every time before it 
calls poll() for more messages. 
This consumer group commits about 7.5 million records in about 5 minutes, 
which causes the __consumer_offsets-38 partition to roll a new segment file 
about every 20 seconds.
Whenever a new segment is created, a log cleaner thread tries to clean 
this partition (we configure 'log.cleaner.threads=11', and only a few 
topics use the 'compact' cleanup policy).
At the end of the cleaning process an asyncDeleteSegment task is scheduled 
(by default, 60 s later); if two consecutive tasks have the same file to 
delete, the latter fails.

Based on this analysis we first modified the consumer's code: auto commit is 
now used and the interval is fixed at 3 seconds.
The LogDirFailure has not appeared since.

So, we think asyncDeleteSegment should be a synchronous delete 
(syncDeleteSegment), or the default delay of the async delete operation 
should be decreased.


> kafka log dir was marked as offline because of deleting segments of 
> __consumer_offsets failed
> -
>
> Key: KAFKA-8604
> URL: https://issues.apache.org/jira/browse/KAFKA-8604
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 1.0.1
>Reporter: songyingshuan
>Priority: Major
> Attachments: error-logs.log
>
>
> We encountered a problem in our production environment without any warning. When the Kafka 
> broker tried to clean __consumer_offsets-38 (and this only happens to this 
> partition), the log shows
> it failed, and the whole disk/log dir was marked offline, which has a 
> negative impact on some otherwise healthy partitions (because the ISR lists of those 
> partitions shrink).
> We had to restart the broker to bring the disk/dir that was marked as 
> offline back into use. BUT!! this problem recurs periodically for the same reason, so we 
> have to restart the broker periodically.
> We read some source code of kafka-1.0.1 but cannot determine why this 
> happens. The cluster had been healthy until this problem suddenly 
> hit us.
> The error log looks like this:
>  
> {code:java}
> 2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index 
> /data6/kafka/data/__consumer_offsets-38/012855596978.timeindex.deleted
> 2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error while 
> deleting segments for __consumer_offsets-38 in dir /data6/kafka/data
> java.io.IOException: Delete of log .log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
> at 
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.

[jira] [Commented] (KAFKA-8604) kafka log dir was marked as offline because of deleting segments of __consumer_offsets failed

2019-07-13 Thread songyingshuan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884548#comment-16884548
 ] 

songyingshuan commented on KAFKA-8604:
--

It is worth mentioning that we have another Kafka cluster used specifically to 
run Kafka Streams/KSQL tasks, and the same problem appeared in that cluster. 
Looking back now, it is most likely because the streaming/KSQL tasks 
consume a topic with high throughput and commit offsets at high frequency.

What do you think?
[~junrao]
[~huxi_2b]

> kafka log dir was marked as offline because of deleting segments of 
> __consumer_offsets failed
> -
>
> Key: KAFKA-8604
> URL: https://issues.apache.org/jira/browse/KAFKA-8604
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 1.0.1
>Reporter: songyingshuan
>Priority: Major
> Attachments: error-logs.log
>
>
> We encountered a problem in our production environment without any warning. When the Kafka 
> broker tried to clean __consumer_offsets-38 (and this only happens to this 
> partition), the log shows
> it failed, and the whole disk/log dir was marked offline, which has a 
> negative impact on some otherwise healthy partitions (because the ISR lists of those 
> partitions shrink).
> We had to restart the broker to bring the disk/dir that was marked as 
> offline back into use. BUT!! this problem recurs periodically for the same reason, so we 
> have to restart the broker periodically.
> We read some source code of kafka-1.0.1 but cannot determine why this 
> happens. The cluster had been healthy until this problem suddenly 
> hit us.
> The error log looks like this:
>  
> {code:java}
> 2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index 
> /data6/kafka/data/__consumer_offsets-38/012855596978.timeindex.deleted
> 2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error while 
> deleting segments for __consumer_offsets-38 in dir /data6/kafka/data
> java.io.IOException: Delete of log .log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
> at 
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-25 00:11:26,265 ERROR kafka.utils.KafkaScheduler: Uncaught exception 
> in scheduled task 'delete-file'
> org.apache.kafka.common.errors.KafkaStorageException: Error while deleting 
> segments for __consumer_offsets-38 in dir /data6/kafka/data
> Caused by: java.io.IOException: Delete of log 
> .log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
> at 
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.

[jira] [Comment Edited] (KAFKA-8604) kafka log dir was marked as offline because of deleting segments of __consumer_offsets failed

2019-07-13 Thread songyingshuan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884547#comment-16884547
 ] 

songyingshuan edited comment on KAFKA-8604 at 7/14/19 3:50 AM:
---

I think we have found the reason: in our Kafka cluster there is a topic with 
high throughput, and the consumer commits its offset every time before it 
calls poll() for more messages. 
This consumer group commits about 7.5 million records in about 5 minutes, 
which causes the __consumer_offsets-38 partition to roll a new segment file 
about every 20 seconds.
Whenever a new segment is created, a log cleaner thread tries to clean 
this partition (we configure 'log.cleaner.threads=11', and only a few 
topics use the 'compact' cleanup policy).
At the end of the cleaning process an asyncDeleteSegment task is scheduled 
(by default, 60 s later); if two consecutive tasks have the same file to 
delete, the latter fails.

Based on this analysis we first modified the consumer's code: auto commit is 
now used and the interval is fixed at 3 seconds.
The LogDirFailure has not appeared since.

So, we think asyncDeleteSegment should be a synchronous delete 
(syncDeleteSegment), or the default delay of the async delete operation 
should be decreased.



was (Author: ymxz):
I think we have found the reason: in our Kafka cluster there is a topic with 
high throughput, and the consumer commits its offset every time before it 
calls poll() for more messages. 
This consumer group commits about 7.5 million records in about 5 minutes, 
which causes the __consumer_offsets-38 partition to roll a new segment file 
about every 20 seconds.
Whenever a new segment is created, a log cleaner thread tries to clean 
this partition (we configure 'log.cleaner.threads=11', and only a few 
topics use the 'compact' cleanup policy).
At the end of the cleaning process an asyncDeleteSegment task is scheduled 
(by default, 60 s later); if two consecutive tasks have the same file to 
delete, the latter fails.

Based on this analysis we first modified the consumer's code: auto commit is 
now used and the interval is fixed at 3 seconds.
The LogDirFailure has not appeared since.

So, we think asyncDeleteSegment should be a synchronous delete 
(syncDeleteSegment), or the default delay of the async delete operation 
should be decreased.


> kafka log dir was marked as offline because of deleting segments of 
> __consumer_offsets failed
> -
>
> Key: KAFKA-8604
> URL: https://issues.apache.org/jira/browse/KAFKA-8604
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 1.0.1
>Reporter: songyingshuan
>Priority: Major
> Attachments: error-logs.log
>
>
> We encountered a problem in our production environment without any warning. When the Kafka 
> broker tried to clean __consumer_offsets-38 (and this only happens to this 
> partition), the log shows
> it failed, and the whole disk/log dir was marked offline, which has a 
> negative impact on some otherwise healthy partitions (because the ISR lists of those 
> partitions shrink).
> We had to restart the broker to bring the disk/dir that was marked as 
> offline back into use. BUT!! this problem recurs periodically for the same reason, so we 
> have to restart the broker periodically.
> We read some source code of kafka-1.0.1 but cannot determine why this 
> happens. The cluster had been healthy until this problem suddenly 
> hit us.
> The error log looks like this:
>  
> {code:java}
> 2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index 
> /data6/kafka/data/__consumer_offsets-38/012855596978.timeindex.deleted
> 2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error while 
> deleting segments for __consumer_offsets-38 in dir /data6/kafka/data
> java.io.IOException: Delete of log .log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
> at 
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent

[jira] [Commented] (KAFKA-8604) kafka log dir was marked as offline because of deleting segments of __consumer_offsets failed

2019-07-13 Thread songyingshuan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884547#comment-16884547
 ] 

songyingshuan commented on KAFKA-8604:
--

I think we have found the reason: in our Kafka cluster there is a topic with 
high throughput, and the consumer commits its offset every time before it 
calls poll() for more messages. 
This consumer group commits about 7.5 million records in about 5 minutes, 
which causes the __consumer_offsets-38 partition to roll a new segment file 
about every 20 seconds.
Whenever a new segment is created, a log cleaner thread tries to clean 
this partition (we configure 'log.cleaner.threads=11', and only a few 
topics use the 'compact' cleanup policy).
At the end of the cleaning process an asyncDeleteSegment task is scheduled 
(by default, 60 s later); if two consecutive tasks have the same file to 
delete, the latter fails.

Based on this analysis we first modified the consumer's code: auto commit is 
now used and the interval is fixed at 3 seconds.
The LogDirFailure has not appeared since.

So, we think asyncDeleteSegment should be a synchronous delete 
(syncDeleteSegment), or the default delay of the async delete operation 
should be decreased.


> kafka log dir was marked as offline because of deleting segments of 
> __consumer_offsets failed
> -
>
> Key: KAFKA-8604
> URL: https://issues.apache.org/jira/browse/KAFKA-8604
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 1.0.1
>Reporter: songyingshuan
>Priority: Major
> Attachments: error-logs.log
>
>
> We encountered a problem in our production environment without any warning. When the Kafka 
> broker tried to clean __consumer_offsets-38 (and this only happens to this 
> partition), the log shows
> it failed, and the whole disk/log dir was marked offline, which has a 
> negative impact on some otherwise healthy partitions (because the ISR lists of those 
> partitions shrink).
> We had to restart the broker to bring the disk/dir that was marked as 
> offline back into use. BUT!! this problem recurs periodically for the same reason, so we 
> have to restart the broker periodically.
> We read some source code of kafka-1.0.1 but cannot determine why this 
> happens. The cluster had been healthy until this problem suddenly 
> hit us.
> The error log looks like this:
>  
> {code:java}
> 2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index 
> /data6/kafka/data/__consumer_offsets-38/012855596978.timeindex.deleted
> 2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error while 
> deleting segments for __consumer_offsets-38 in dir /data6/kafka/data
> java.io.IOException: Delete of log .log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
> at 
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2019-06-25 00:11:26,265 ERROR kafka.utils.KafkaScheduler: Uncaught exception 
> in scheduled task 'delete-file'
> org.apache.kafka.common.errors.KafkaStorageException: Error while deleting 
> segments for __consumer_offsets-38 in dir /data6/kafka/data
> Caused by: java.io.IOException: Delete of log 
> .log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> 

[jira] [Comment Edited] (KAFKA-8604) kafka log dir was marked as offline because of deleting segments of __consumer_offsets failed

2019-07-13 Thread songyingshuan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884547#comment-16884547
 ] 

songyingshuan edited comment on KAFKA-8604 at 7/14/19 3:49 AM:
---

I think we have found the reason: in our Kafka cluster there is a topic with 
high throughput, and the consumer commits its offset every time before it 
calls poll() for more messages. 
This consumer group commits about 7.5 million records in about 5 minutes, 
which causes the __consumer_offsets-38 partition to roll a new segment file 
about every 20 seconds.
Whenever a new segment is created, a log cleaner thread tries to clean 
this partition (we configure 'log.cleaner.threads=11', and only a few 
topics use the 'compact' cleanup policy).
At the end of the cleaning process an asyncDeleteSegment task is scheduled 
(by default, 60 s later); if two consecutive tasks have the same file to 
delete, the latter fails.

Based on this analysis we first modified the consumer's code: auto commit is 
now used and the interval is fixed at 3 seconds.
The LogDirFailure has not appeared since.

So, we think asyncDeleteSegment should be a synchronous delete 
(syncDeleteSegment), or the default delay of the async delete operation 
should be decreased.



was (Author: ymxz):
I think we have found the reason: in our Kafka cluster there is a topic with 
high throughput, and the consumer commits its offset every time before it 
calls poll() for more messages. 
This consumer group commits about 7.5 million records in about 5 minutes, 
which causes the __consumer_offsets-38 partition to roll a new segment file 
about every 20 seconds.
Whenever a new segment is created, a log cleaner thread tries to clean 
this partition (we configure 'log.cleaner.threads=11', and only a few 
topics use the 'compact' cleanup policy).
At the end of the cleaning process an asyncDeleteSegment task is scheduled 
(by default, 60 s later); if two consecutive tasks have the same file to 
delete, the latter fails.

Based on this analysis we first modified the consumer's code: auto commit is 
now used and the interval is fixed at 3 seconds.
The LogDirFailure has not appeared since.

So, we think asyncDeleteSegment should be a synchronous delete 
(syncDeleteSegment), or the default delay of the async delete operation 
should be decreased.


> kafka log dir was marked as offline because of deleting segments of 
> __consumer_offsets failed
> -
>
> Key: KAFKA-8604
> URL: https://issues.apache.org/jira/browse/KAFKA-8604
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 1.0.1
>Reporter: songyingshuan
>Priority: Major
> Attachments: error-logs.log
>
>
> We encountered a problem in our production environment without any warning. When the Kafka 
> broker tried to clean __consumer_offsets-38 (and this only happens to this 
> partition), the log shows
> it failed, and the whole disk/log dir was marked offline, which has a 
> negative impact on some otherwise healthy partitions (because the ISR lists of those 
> partitions shrink).
> We had to restart the broker to bring the disk/dir that was marked as 
> offline back into use. BUT!! this problem recurs periodically for the same reason, so we 
> have to restart the broker periodically.
> We read some source code of kafka-1.0.1 but cannot determine why this 
> happens. The cluster had been healthy until this problem suddenly 
> hit us.
> The error log looks like this:
>  
> {code:java}
> 2019-06-25 00:11:26,241 INFO kafka.log.TimeIndex: Deleting index 
> /data6/kafka/data/__consumer_offsets-38/012855596978.timeindex.deleted
> 2019-06-25 00:11:26,258 ERROR kafka.server.LogDirFailureChannel: Error while 
> deleting segments for __consumer_offsets-38 in dir /data6/kafka/data
> java.io.IOException: Delete of log .log.deleted failed.
> at kafka.log.LogSegment.delete(LogSegment.scala:496)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply$mcV$sp(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log$$anonfun$kafka$log$Log$$deleteSeg$1$1.apply(Log.scala:1596)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> at kafka.log.Log.kafka$log$Log$$deleteSeg$1(Log.scala:1595)
> at 
> kafka.log.Log$$anonfun$kafka$log$Log$$asyncDeleteSegment$1.apply$mcV$sp(Log.scala:1599)
> at 
> kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurren

[jira] [Commented] (KAFKA-8663) partition assignment would be better original_assignment + new_reassignment during reassignments

2019-07-13 Thread GEORGE LI (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884526#comment-16884526
 ] 

GEORGE LI commented on KAFKA-8663:
--

As we can see from the original comment in the code:

{code}
 //1. Update AR in ZK with OAR + RAR.
{code}

But the actual implementation does RAR + OAR instead (a different ordering).
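
As a standalone illustration (not the controller code itself), using the replica lists from the example in the description below; the real code builds a Set and converts it back to a Seq, while this sketch uses Seq and distinct just to make the effective ordering visible:

{code}
// Illustration only: how the concatenation order decides which replicas come
// first in the merged assignment during reassignment.
val oar = Seq(1026, 1028, 1025) // original assigned replicas
val rar = Seq(1027, 1025, 1028) // reassignment target replicas

// What the controller effectively does today (RAR first, then OAR):
val rarFirst = (rar ++ oar).distinct // Seq(1027, 1025, 1028, 1026)

// What the "OAR + RAR" comment suggests (original ordering kept in front):
val oarFirst = (oar ++ rar).distinct // Seq(1026, 1028, 1025, 1027)

println(rarFirst.mkString(",")) // 1027,1025,1028,1026 -- matches the observed Replicas list
println(oarFirst.mkString(",")) // 1026,1028,1025,1027 -- keeps 1026 as the preferred leader
{code}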

> partition assignment would be better original_assignment + new_reassignment 
> during reassignments
> 
>
> Key: KAFKA-8663
> URL: https://issues.apache.org/jira/browse/KAFKA-8663
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, core
>Affects Versions: 1.1.1, 2.3.0
>Reporter: GEORGE LI
>Priority: Minor
>
> From my observation/experience during reassignment, the partition assignment 
> replica ordering gets changed, because it is the OAR + RAR (original replicas 
> + reassignment replicas) set union.
> However, it seems the preferred leader changes during the reassignment. 
> Normally, if there is no preferred leader election, the leader stays the old 
> leader. But if a leader election happens during the reassignment, the 
> leadership changes. This causes some side effects. Let's look at this 
> example.
> {code}
> Topic:georgeli_test   PartitionCount:8   ReplicationFactor:3   Configs:
>   Topic: georgeli_test   Partition: 0   Leader: 1026   Replicas: 1026,1028,1025   Isr: 1026,1028,1025
> {code}
> reassignment  (1026,1028,1025) => (1027,1025,1028)
> {code}
> Topic:georgeli_test   PartitionCount:8   ReplicationFactor:4   Configs:leader.replication.throttled.replicas=0:1026,0:1028,0:1025,follower.replication.throttled.replicas=0:1027
>   Topic: georgeli_test   Partition: 0   Leader: 1026   Replicas: 1027,1025,1028,1026   Isr: 1026,1028,1025
> {code}
> Notice the above: the leader remains 1026, but Replicas: 1027,1025,1028,1026. 
> If we run preferred leader election, it will try 1027 first, then 1025.
> After 1027 is in the ISR, the final assignment will be (1027,1025,1028).
> My proposal for a minor improvement is to keep the original replica ordering 
> during the reassignment (which can take long for big topics/partitions), and 
> only after all replicas are in the ISR, finally set the partition assignment 
> to the new reassignment.
> {code}
>   val newAndOldReplicas = (reassignedPartitionContext.newReplicas ++ 
> controllerContext.partitionReplicaAssignment(topicPartition)).toSet
>   //1. Update AR in ZK with OAR + RAR.
>   updateAssignedReplicasForPartition(topicPartition, 
> newAndOldReplicas.toSeq)
> {code}
> The above code would change to the following to keep the original ordering 
> first during reassignment:
> {code}
>   val newAndOldReplicas = 
> (controllerContext.partitionReplicaAssignment(topicPartition) ++ 
> reassignedPartitionContext.newReplicas).toSet
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8663) partition assignment would be better original_assignment + new_reassignment during reassignments

2019-07-13 Thread GEORGE LI (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GEORGE LI updated KAFKA-8663:
-
Description: 
From my observation/experience during reassignment, the partition assignment 
replica ordering gets changed, because it is the OAR + RAR (original replicas + 
reassignment replicas) set union.

However, it seems the preferred leader changes during the reassignment. 
Normally, if there is no preferred leader election, the leader stays the old 
leader. But if a leader election happens during the reassignment, the 
leadership changes. This causes some side effects. Let's look at this example.

{code}
Topic:georgeli_test   PartitionCount:8   ReplicationFactor:3   Configs:
  Topic: georgeli_test   Partition: 0   Leader: 1026   Replicas: 1026,1028,1025   Isr: 1026,1028,1025
{code}

reassignment  (1026,1028,1025) => (1027,1025,1028)

{code}
Topic:georgeli_test   PartitionCount:8   ReplicationFactor:4   Configs:leader.replication.throttled.replicas=0:1026,0:1028,0:1025,follower.replication.throttled.replicas=0:1027
  Topic: georgeli_test   Partition: 0   Leader: 1026   Replicas: 1027,1025,1028,1026   Isr: 1026,1028,1025
{code}

Notice the above: the leader remains 1026, but Replicas: 1027,1025,1028,1026. 
If we run preferred leader election, it will try 1027 first, then 1025.
After 1027 is in the ISR, the final assignment will be (1027,1025,1028).

My proposal for a minor improvement is to keep the original replica ordering 
during the reassignment (which can take long for big topics/partitions), and 
only after all replicas are in the ISR, finally set the partition assignment to 
the new reassignment.

{code}
  val newAndOldReplicas = (reassignedPartitionContext.newReplicas ++ 
controllerContext.partitionReplicaAssignment(topicPartition)).toSet
  //1. Update AR in ZK with OAR + RAR.
  updateAssignedReplicasForPartition(topicPartition, 
newAndOldReplicas.toSeq)
{code}

The above code would change to the following to keep the original ordering 
first during reassignment:

{code}
  val newAndOldReplicas = 
(controllerContext.partitionReplicaAssignment(topicPartition) ++ 
reassignedPartitionContext.newReplicas).toSet
{code}

  was:
From my observation/experience during reassignment, the partition assignment 
replica ordering gets changed, because it is the OAR + RAR (original replicas + 
reassignment replicas) set union.

However, it seems the preferred leader changes during the reassignment. 
Normally, if there is no preferred leader election, the leader stays the old 
leader. But if a leader election happens during the reassignment, the 
leadership changes. This causes some side effects. Let's look at this example.

{code}
Topic:georgeli_test   PartitionCount:8   ReplicationFactor:3   Configs:
  Topic: georgeli_test   Partition: 0   Leader: 1026   Replicas: 1026,1028,1025   Isr: 1026,1028,1025
{code}

reassignment  (1026,1028,1025) => (1027,1025,1028)

{code}
Topic:georgeli_test   PartitionCount:8   ReplicationFactor:4   Configs:leader.replication.throttled.replicas=0:1026,0:1028,0:1025,follower.replication.throttled.replicas=0:1027
  Topic: georgeli_test   Partition: 0   Leader: 1026   Replicas: 1027,1025,1028,1026   Isr: 1026,1028,1025
{code}

Notice the above: the leader remains 1026, but Replicas: 1027,1025,1028,1026. 
If we run preferred leader election, it will try 1027 first, then 1025.
After 1027 is in the ISR, the final assignment will be (1027,1025,1028).

My proposal for a minor improvement is to keep the original replica ordering 
during the reassignment (which can take long for big topics/partitions), and 
only after all replicas are in the ISR, finally set the partition assignment to 
the new reassignment.

{code}
  val newAndOldReplicas = (reassignedPartitionContext.newReplicas ++ 
controllerContext.partitionReplicaAssignment(topicPartition)).toSet
  //1. Update AR in ZK with OAR + RAR.
  updateAssignedReplicasForPartition(topicPartition, 
newAndOldReplicas.toSeq)
{code}

The above code would change to the following to keep the original ordering 
during reassignment:

{code}
  val newAndOldReplicas = 
(controllerContext.partitionReplicaAssignment(topicPartition) ++ 
reassignedPartitionContext.newReplicas).toSet
{code}


> partition assignment would be better original_assignment + new_reassignment 
> during reassignments
> 
>
> Key: KAFKA-8663
> URL: https://issues.apache.org/jira/browse/KAFKA-8663
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, core
>Affects Versions: 1.1.1, 2.3.0
>Reporter: GEORGE LI
>Priority: Minor
>
> From my observation/e

[jira] [Created] (KAFKA-8663) partition assignment would be better original_assignment + new_reassignment during reassignments

2019-07-13 Thread GEORGE LI (JIRA)
GEORGE LI created KAFKA-8663:


 Summary: partition assignment would be better original_assignment 
+ new_reassignment during reassignments
 Key: KAFKA-8663
 URL: https://issues.apache.org/jira/browse/KAFKA-8663
 Project: Kafka
  Issue Type: Improvement
  Components: controller, core
Affects Versions: 2.3.0, 1.1.1
Reporter: GEORGE LI


From my observation/experience during reassignment, the partition assignment 
replica ordering gets changed, because it is the OAR + RAR (original replicas + 
reassignment replicas) set union.

However, it seems the preferred leader changes during the reassignment. 
Normally, if there is no preferred leader election, the leader stays the old 
leader. But if a leader election happens during the reassignment, the 
leadership changes. This causes some side effects. Let's look at this example.

{code}
Topic:georgeli_test   PartitionCount:8   ReplicationFactor:3   Configs:
  Topic: georgeli_test   Partition: 0   Leader: 1026   Replicas: 1026,1028,1025   Isr: 1026,1028,1025
{code}

reassignment  (1026,1028,1025) => (1027,1025,1028)

{code}
Topic:georgeli_test   PartitionCount:8   ReplicationFactor:4   Configs:leader.replication.throttled.replicas=0:1026,0:1028,0:1025,follower.replication.throttled.replicas=0:1027
  Topic: georgeli_test   Partition: 0   Leader: 1026   Replicas: 1027,1025,1028,1026   Isr: 1026,1028,1025
{code}

Notice the above: the leader remains 1026, but Replicas: 1027,1025,1028,1026. 
If we run preferred leader election, it will try 1027 first, then 1025.
After 1027 is in the ISR, the final assignment will be (1027,1025,1028).

My proposal for a minor improvement is to keep the original replica ordering 
during the reassignment (which can take long for big topics/partitions), and 
only after all replicas are in the ISR, finally set the partition assignment to 
the new reassignment.

{code}
  val newAndOldReplicas = (reassignedPartitionContext.newReplicas ++ 
controllerContext.partitionReplicaAssignment(topicPartition)).toSet
  //1. Update AR in ZK with OAR + RAR.
  updateAssignedReplicasForPartition(topicPartition, 
newAndOldReplicas.toSeq)
{code}

The above code would change to the following to keep the original ordering 
during reassignment:

{code}
  val newAndOldReplicas = 
(controllerContext.partitionReplicaAssignment(topicPartition) ++ 
reassignedPartitionContext.newReplicas).toSet
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-8183) Trogdor - ProduceBench should retry on UnknownTopicOrPartitionException during topic creation

2019-07-13 Thread Stanislav Kozlovski (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski resolved KAFKA-8183.

Resolution: Fixed

> Trogdor - ProduceBench should retry on UnknownTopicOrPartitionException 
> during topic creation
> -
>
> Key: KAFKA-8183
> URL: https://issues.apache.org/jira/browse/KAFKA-8183
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Minor
>
> There exists a race condition in the Trogdor produce bench worker code where 
> `WorkerUtils#createTopics()` [notices the topic 
> exists|https://github.com/apache/kafka/blob/4824dc994d7fc56b7540b643a78aadb4bdd0f14d/tools/src/main/java/org/apache/kafka/trogdor/common/WorkerUtils.java#L159]
>  yet when it goes on to verify the topics, the DescribeTopics call throws an 
> `UnknownTopicOrPartitionException`.
> We should add sufficient retries such that this does not fail the task.
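
A minimal sketch of the kind of retry this asks for (the helper name, retry count and sleep are illustrative and not what the actual fix does):

{code}
import java.util.concurrent.ExecutionException
import scala.collection.JavaConverters._
import org.apache.kafka.clients.admin.{AdminClient, TopicDescription}
import org.apache.kafka.common.errors.UnknownTopicOrPartitionException

// Retry describeTopics while the just-created topics have not propagated yet
// and the broker still answers with UnknownTopicOrPartitionException.
def describeWithRetries(admin: AdminClient, topics: Seq[String], retries: Int = 10): Map[String, TopicDescription] = {
  try {
    admin.describeTopics(topics.asJava).all().get().asScala.toMap
  } catch {
    case e: ExecutionException if e.getCause.isInstanceOf[UnknownTopicOrPartitionException] && retries > 0 =>
      Thread.sleep(1000) // illustrative back-off
      describeWithRetries(admin, topics, retries - 1)
  }
}
{code}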



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8183) Trogdor - ProduceBench should retry on UnknownTopicOrPartitionException during topic creation

2019-07-13 Thread Stanislav Kozlovski (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884491#comment-16884491
 ] 

Stanislav Kozlovski commented on KAFKA-8183:


[https://github.com/apache/kafka/pull/6937] fixes this issue. Thanks [~cmccabe]!

> Trogdor - ProduceBench should retry on UnknownTopicOrPartitionException 
> during topic creation
> -
>
> Key: KAFKA-8183
> URL: https://issues.apache.org/jira/browse/KAFKA-8183
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Minor
>
> There exists a race condition in the Trogdor produce bench worker code where 
> `WorkerUtils#createTopics()` [notices the topic 
> exists|https://github.com/apache/kafka/blob/4824dc994d7fc56b7540b643a78aadb4bdd0f14d/tools/src/main/java/org/apache/kafka/trogdor/common/WorkerUtils.java#L159]
>  yet when it goes on to verify the topics, the DescribeTopics call throws an 
> `UnknownTopicOrPartitionException`.
> We should add sufficient retries such that this does not fail the task.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8662) Produce fails if a previous produce was to an unauthorized topic

2019-07-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884485#comment-16884485
 ] 

ASF GitHub Bot commented on KAFKA-8662:
---

rajinisivaram commented on pull request #7086: KAFKA-8662; Fix producer 
metadata error handling and consumer manual assignment
URL: https://github.com/apache/kafka/pull/7086
 
 
   Producer adds a topic to its Metadata instance when a send is requested. If 
the metadata request for the topic fails (e.g. due to an authorization failure), we 
retain the topic in Metadata and continue to attempt refresh until a hard-coded 
expiry time of 5 minutes. Due to changes introduced in 
https://github.com/apache/kafka/commit/460e46c3bb76a361d0706b263c03696005e12566,
 subsequent sends to any topic, including valid authorized topics, report 
authorization failures for any topic in the metadata, rather than just the topic 
to which the send was requested. As a result, the producer remains unusable for 5 
minutes if a send is requested on an unauthorized topic. This PR fails the send 
only if metadata for the topic being sent to has an error (or there is a fatal 
exception like an authentication failure).
   
   Consumer adds a topic to its Metadata instance on `subscribe()` or 
`assign()`. Even though `assign()` is not incremental and replaces the existing 
assignment, new assignments were being added to the existing topics in 
SubscriptionState#groupSubscriptions, which is used for fetching topic 
metadata. This PR does a replace for manual assignment alone.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Produce fails if a previous produce was to an unauthorized topic
> 
>
> Key: KAFKA-8662
> URL: https://issues.apache.org/jira/browse/KAFKA-8662
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 2.3.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 2.4.0, 2.3.1
>
>
> This is a regression introduced by the commit 
> [https://github.com/apache/kafka/commit/460e46c3bb76a361d0706b263c03696005e12566|https://github.com/apache/kafka/commit/460e46c3bb76a361d0706b263c03696005e12566.].
> When we produce to a topic, we add the topic to the producer's Metadata 
> instance. If metadata authorization fails for the topic, we fail the send and 
> propagate the authorization exception to the caller. The topic remains in the 
> Metadata instance. We expire the topic and remove it from Metadata after a fixed 
> interval of 5 minutes. This has been the case for a while.
>  
> If a subsequent send is to a different, authorized topic, we may still get 
> metadata authorization failures for the previous unauthorized topic that is 
> still in Metadata. Prior to that commit in 2.3.0, sends to authorized topics 
> completed successfully even if there were other unauthorized or invalid 
> topics in the Metadata. Now, we propagate the exceptions without checking the 
> topic. This is a regression and not the expected behaviour, since the producer 
> becomes unusable for 5 minutes unless authorization is granted to the first 
> topic.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8613) Set default grace period to 0

2019-07-13 Thread Lillian (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884478#comment-16884478
 ] 

Lillian commented on KAFKA-8613:


[~cadonna], Do you have time to write up a KIP for this issue?

If not, I can try.

> Set default grace period to 0
> -
>
> Key: KAFKA-8613
> URL: https://issues.apache.org/jira/browse/KAFKA-8613
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 3.0.0
>Reporter: Bruno Cadonna
>Priority: Blocker
>
> Currently, the grace period is set to retention time if the grace period is 
> not specified explicitly. The reason for setting the default grace period to 
> retention time was backward compatibility. Topologies that were implemented 
> before the introduction of the grace period added late-arriving records to a 
> window as long as the window existed, i.e., as long as its retention time had 
> not elapsed.
> This unintuitive default grace period has already caused confusion among 
> users.
> For the next major release, we should set the default grace period to 
> {{Duration.ZERO}}.
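
For context, a minimal Streams sketch of what users must write today to get zero grace explicitly (the topic name, window size and reliance on default serdes are placeholder assumptions); with this proposal, leaving out grace() would behave the same:

{code}
import java.time.Duration
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.kstream.TimeWindows

// Sketch only: explicitly setting the grace period to zero, which this ticket
// proposes to make the default. Topic name and window size are placeholders.
val builder = new StreamsBuilder()
builder.stream[String, String]("events")
  .groupByKey()
  .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ZERO))
  .count()
{code}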



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8608) Broker shows WARN on reassignment partitions on new brokers: Replica LEO, follower position & Cache truncation

2019-07-13 Thread Lillian (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884471#comment-16884471
 ] 

Lillian commented on KAFKA-8608:


It appears that there is not enough context to deduce what triggered the 
warning. [~xmar], is it possible for you to provide the log (with 
redactions)?

> Broker shows WARN on reassignment partitions on new brokers: Replica LEO, 
> follower position & Cache truncation
> --
>
> Key: KAFKA-8608
> URL: https://issues.apache.org/jira/browse/KAFKA-8608
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.1.1
> Environment: Kafka 2.1.1
>Reporter: Di Campo
>Priority: Minor
>  Labels: broker, reassign, repartition
>
> I added two brokers (brokerId 4,5) to a 3-node (brokerId 1,2,3) cluster where 
> there were 32 topics and 64 partitions on each, replication 3.
> I ran a partition reassignment. 
> On each run, I can see the following WARN messages, but when the partition 
> reassignment process finishes, everything seems OK. The ISR is OK (the count is 3 in every 
> partition).
> But I get the following message types, one per partition:
>  
> {code:java}
> [2019-06-27 12:42:03,946] WARN [LeaderEpochCache visitors-0.0.1-10] New epoch 
> entry EpochEntry(epoch=24, startOffset=51540) caused truncation of 
> conflicting entries ListBuffer(EpochEntry(epoch=22, startOffset=51540)). 
> Cache now contains 5 entries. (kafka.server.epoch.LeaderEpochFileCache) {code}
> -> This relates to cache, so I suppose it's pretty safe.
> {code:java}
> [2019-06-27 12:42:04,250] WARN [ReplicaManager broker=1] Leader 1 failed to 
> record follower 3's position 47981 since the replica is not recognized to be 
> one of the assigned replicas 1,2,5 for partition visitors-0.0.1-28. Empty 
> records will be returned for this partition. 
> (kafka.server.ReplicaManager){code}
> -> This is scary. I'm not sure about the severity of this, but it looks like 
> it may be missing records? 
> {code:java}
> [2019-06-27 12:42:03,709] WARN [ReplicaManager broker=1] While recording the 
> replica LEO, the partition visitors-0.0.1-58 hasn't been created. 
> (kafka.server.ReplicaManager){code}
> -> Here, these partitions are created. 
> First of all, am I supposed to be losing data here? I assume I am not, 
> and so far I don't see any trace of losing anything.
> If so, I'm not sure what these messages are trying to say. Should they 
> really be at WARN level? If so, should the messages explain the 
> different risks involved more clearly? 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8662) Produce fails if a previous produce was to an unauthorized topic

2019-07-13 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created KAFKA-8662:
-

 Summary: Produce fails if a previous produce was to an 
unauthorized topic
 Key: KAFKA-8662
 URL: https://issues.apache.org/jira/browse/KAFKA-8662
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 2.3.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
 Fix For: 2.4.0, 2.3.1


This is a regression introduced by the commit 
[https://github.com/apache/kafka/commit/460e46c3bb76a361d0706b263c03696005e12566|https://github.com/apache/kafka/commit/460e46c3bb76a361d0706b263c03696005e12566.].

When we produce to a topic, we add the topic to the producer's Metadata 
instance. If metadata authorization fails for the topic, we fail the send and 
propagate the authorization exception to the caller. The topic remains in the 
Metadata instance. We expire the topic and remove it from Metadata after a fixed 
interval of 5 minutes. This has been the case for a while.

If a subsequent send is to a different, authorized topic, we may still get 
metadata authorization failures for the previous unauthorized topic that is 
still in Metadata. Prior to that commit in 2.3.0, sends to authorized topics 
completed successfully even if there were other unauthorized or invalid topics 
in the Metadata. Now, we propagate the exceptions without checking the topic. This 
is a regression and not the expected behaviour, since the producer becomes unusable 
for 5 minutes unless authorization is granted to the first topic.
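
A minimal sketch of the symptom (an illustration only, not a test from the PR; the broker address and topic names are placeholders, and the client must lack ACLs on the first topic):

{code}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

// Illustration of the regression: after one send to an unauthorized topic,
// sends to an authorized topic also fail until the unauthorized topic expires
// from the producer's Metadata (about 5 minutes). Names are placeholders.
val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

val producer = new KafkaProducer[String, String](props)
try {
  // Expected to fail with a TopicAuthorizationException.
  producer.send(new ProducerRecord("unauthorized-topic", "k", "v")).get()
} catch { case _: Exception => () }

// With the 2.3.0 regression this send also fails, even though the topic is authorized.
producer.send(new ProducerRecord("authorized-topic", "k", "v")).get()
{code}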

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-6333) java.awt.headless should not be on commandline

2019-07-13 Thread Sujay Hegde (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884383#comment-16884383
 ] 

Sujay Hegde commented on KAFKA-6333:


Hi,

 

I am a newbie and wanted to work on this issue.

How do I go about doing this?

Please help me out.

 

Thanks,

Sujay

> java.awt.headless should not be on commandline
> --
>
> Key: KAFKA-6333
> URL: https://issues.apache.org/jira/browse/KAFKA-6333
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.0.0
>Reporter: Fabrice Bacchella
>Priority: Trivial
>
> The option -Djava.awt.headless=true is defined in KAFKA_JVM_PERFORMANCE_OPTS.
> But it should not even be present on the command line. It is only useful for 
> applications that can run in both a headless 
> and a traditional desktop environment. Kafka is a server, so the property should be set in 
> the main class. This helps reduce clutter on the command line.
> See http://www.oracle.com/technetwork/articles/javase/headless-136834.html 
> for more details.
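
For reference, a sketch of the kind of change suggested here: set the property in the server's entry point instead of on the command line (this is not the actual kafka.Kafka main, just an illustration):

{code}
// Sketch only: set headless mode programmatically in the entry point
// rather than via -Djava.awt.headless=true on the command line.
object Server {
  def main(args: Array[String]): Unit = {
    System.setProperty("java.awt.headless", "true")
    // ... start the broker ...
  }
}
{code}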



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8024) UtilsTest.testFormatBytes fails with german locale

2019-07-13 Thread Sujay Hegde (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884382#comment-16884382
 ] 

Sujay Hegde commented on KAFKA-8024:


Hi,

I am a newbie.

How do I assign this issue to myself?

I deem it to be a trivial issue and a good starting point to contribute to 
kafka.

 

Thanks

> UtilsTest.testFormatBytes fails with german locale
> --
>
> Key: KAFKA-8024
> URL: https://issues.apache.org/jira/browse/KAFKA-8024
> Project: Kafka
>  Issue Type: Bug
>Reporter: Patrik Kleindl
>Priority: Trivial
>
> The unit test fails when the default locale is not English (in my case, deAT)
> assertEquals("1.1 MB", formatBytes((long) (1.1 * 1024 * 1024)));
>  
> org.apache.kafka.common.utils.UtilsTest > testFormatBytes FAILED
>     org.junit.ComparisonFailure: expected:<1[.]1 MB> but was:<1[,]1 MB>
>         at org.junit.Assert.assertEquals(Assert.java:115)
>         at org.junit.Assert.assertEquals(Assert.java:144)
>         at 
> org.apache.kafka.common.utils.UtilsTest.testFormatBytes(UtilsTest.java:106)
>  
> The easiest fix in this case should be adding
> {code:java}
> jvmArgs '-Duser.language=en -Duser.country=US'{code}
> to the test configuration 
> [https://github.com/apache/kafka/blob/b03e8c234a8aeecd10c2c96b683cfb39b24b548a/build.gradle#L270]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8360) Docs do not mention RequestQueueSize JMX metric

2019-07-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884343#comment-16884343
 ] 

ASF GitHub Bot commented on KAFKA-8360:
---

ankit-kumar-25 commented on pull request #220: KAFKA-8360: Docs do not mention 
RequestQueueSize JMX metric 
URL: https://github.com/apache/kafka-site/pull/220
 
 
   What? :: Mentioning "Request Queue Size" under the 
[Monitoring](https://kafka.apache.org/documentation/#monitoring) tab. 
RequestQueueSize is an important metric for monitoring the number of requests in 
the queue, as a crowded queue may have trouble processing incoming or outgoing 
requests.
   
   Can you please review this?
   
   Thanks!!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Docs do not mention RequestQueueSize JMX metric
> ---
>
> Key: KAFKA-8360
> URL: https://issues.apache.org/jira/browse/KAFKA-8360
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation, metrics, network
>Reporter: Charles Francis Larrieu Casias
>Assignee: Ankit Kumar
>Priority: Major
>  Labels: documentation
>
> In the [monitoring 
> documentation|https://kafka.apache.org/documentation/#monitoring] there is 
> no mention of the `kafka.network:type=RequestChannel,name=RequestQueueSize` 
> JMX metric. This is an important metric because it can indicate that there 
> are too many requests in the queue and suggest either increasing 
> `queued.max.requests` (along with perhaps memory), or increasing 
> `num.io.threads`.
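
For anyone who wants to check the metric directly, a small sketch that reads the MBean quoted above over JMX (the JMX host/port is a placeholder, and the gauge's attribute is assumed to be exposed as "Value", as Kafka's Yammer gauges typically are):

{code}
import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

// Sketch: read RequestQueueSize from a broker's JMX endpoint.
// The host/port below are placeholders for the broker's JMX settings.
val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi")
val connector = JMXConnectorFactory.connect(url)
try {
  val mbsc = connector.getMBeanServerConnection
  val name = new ObjectName("kafka.network:type=RequestChannel,name=RequestQueueSize")
  val queueSize = mbsc.getAttribute(name, "Value")
  println(s"RequestQueueSize = $queueSize")
} finally connector.close()
{code}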



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)