[jira] [Updated] (KAFKA-6366) StackOverflowError in kafka-coordinator-heartbeat-thread

2018-01-12 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-6366:
---
Fix Version/s: 1.0.1

> StackOverflowError in kafka-coordinator-heartbeat-thread
> 
>
> Key: KAFKA-6366
> URL: https://issues.apache.org/jira/browse/KAFKA-6366
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 1.0.0
>Reporter: Joerg Heinicke
>Assignee: Jason Gustafson
> Fix For: 1.0.1
>
> Attachments: 6366.v1.txt, ConverterProcessor.zip, 
> ConverterProcessor_DEBUG.zip, Screenshot-2017-12-19 21.35-22.10 processing.png
>
>
> With Kafka 1.0, our consumer groups fall into a permanent cycle of rebalancing 
> once a StackOverflowError occurs in the heartbeat thread due to 
> connectivity issues between the consumers and the coordinating broker.
> Immediately before the exception there are hundreds, if not thousands, of log 
> entries of the following type:
> 2017-12-12 16:23:12.361 [kafka-coordinator-heartbeat-thread | 
> my-consumer-group] INFO  - [Consumer clientId=consumer-4, 
> groupId=my-consumer-group] Marking the coordinator : (id: 
> 2147483645 rack: null) dead
> The exceptions always happen somewhere in the DateFormat code, though 
> at different lines.
> 2017-12-12 16:23:12.363 [kafka-coordinator-heartbeat-thread | 
> my-consumer-group] ERROR - Uncaught exception in thread 
> 'kafka-coordinator-heartbeat-thread | my-consumer-group':
> java.lang.StackOverflowError
>  at 
> java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:362)
>  at 
> java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)
>  at java.util.Calendar.getDisplayName(Calendar.java:2110)
>  at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
>  at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
>  at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
>  at java.text.DateFormat.format(DateFormat.java:345)
>  at 
> org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
>  at 
> org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
>  at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
>  at 
> org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
>  at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
>  at 
> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>  at 
> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>  at org.apache.log4j.Category.callAppenders(Category.java:206)
>  at org.apache.log4j.Category.forcedLog(Category.java:391)
>  at org.apache.log4j.Category.log(Category.java:856)
>  at 
> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:324)
>  at 
> org.apache.kafka.common.utils.LogContext$KafkaLogger.info(LogContext.java:341)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:649)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
> ...
> the following 9 lines are repeated around a hundred times.
> ...
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:353)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.failUnsentRequests(ConsumerNetworkClient.java:416)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.disconnect(ConsumerNetworkClient.java:388)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:653)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
>  at 
> 

[jira] [Commented] (KAFKA-6366) StackOverflowError in kafka-coordinator-heartbeat-thread

2018-01-12 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324992#comment-16324992
 ] 

Jason Gustafson commented on KAFKA-6366:


[~joerg.heinicke] Thanks for the debug logs. I think I finally understand the 
problem. All it takes is a coordinator disconnect with a large number of 
pending offset commits. In this case, the consumer appears to be sending async 
commits quite frequently (nearly every message apparently). When the 
coordinator disconnected, I counted 3,346 pending commits which had to be 
cancelled. I wrote a simple test case for this scenario and reproduced the 
overflow. I was also able to confirm that my fix above does indeed solve the 
problem, so I will remove the WIP tag and try to get this into the upcoming bug 
fix release.

For a shorter-term solution, I would recommend implementing some logic to 
dampen the rate of offset commits. Typically people make commits periodic or 
base them on the number of messages consumed.
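
A minimal, hedged sketch of such count-based damping on the Java consumer; 
the topic name, threshold, and processing logic below are hypothetical, not 
taken from the reporter's application:

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DampedCommitLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-consumer-group");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        final int commitEvery = 1000;  // hypothetical threshold
        long uncommitted = 0;
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));  // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    process(record);  // application logic
                    uncommitted++;
                }
                if (uncommitted >= commitEvery) {
                    consumer.commitAsync();  // one in-flight commit per batch instead of per message
                    uncommitted = 0;
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) { /* ... */ }
}
{code}

With at most one async commit per batch of records, a coordinator disconnect 
leaves only a handful of pending commits to cancel instead of thousands.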

> StackOverflowError in kafka-coordinator-heartbeat-thread
> 
>
> Key: KAFKA-6366
> URL: https://issues.apache.org/jira/browse/KAFKA-6366
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 1.0.0
>Reporter: Joerg Heinicke
>Assignee: Jason Gustafson
> Attachments: 6366.v1.txt, ConverterProcessor.zip, 
> ConverterProcessor_DEBUG.zip, Screenshot-2017-12-19 21.35-22.10 processing.png
>
>
> With Kafka 1.0, our consumer groups fall into a permanent cycle of rebalancing 
> once a StackOverflowError occurs in the heartbeat thread due to 
> connectivity issues between the consumers and the coordinating broker.
> Immediately before the exception there are hundreds, if not thousands, of log 
> entries of the following type:
> 2017-12-12 16:23:12.361 [kafka-coordinator-heartbeat-thread | 
> my-consumer-group] INFO  - [Consumer clientId=consumer-4, 
> groupId=my-consumer-group] Marking the coordinator : (id: 
> 2147483645 rack: null) dead
> The exceptions always happen somewhere in the DateFormat code, though 
> at different lines.
> 2017-12-12 16:23:12.363 [kafka-coordinator-heartbeat-thread | 
> my-consumer-group] ERROR - Uncaught exception in thread 
> 'kafka-coordinator-heartbeat-thread | my-consumer-group':
> java.lang.StackOverflowError
>  at 
> java.text.DateFormatSymbols.getProviderInstance(DateFormatSymbols.java:362)
>  at 
> java.text.DateFormatSymbols.getInstance(DateFormatSymbols.java:340)
>  at java.util.Calendar.getDisplayName(Calendar.java:2110)
>  at java.text.SimpleDateFormat.subFormat(SimpleDateFormat.java:1125)
>  at java.text.SimpleDateFormat.format(SimpleDateFormat.java:966)
>  at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
>  at java.text.DateFormat.format(DateFormat.java:345)
>  at 
> org.apache.log4j.helpers.PatternParser$DatePatternConverter.convert(PatternParser.java:443)
>  at 
> org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
>  at org.apache.log4j.PatternLayout.format(PatternLayout.java:506)
>  at 
> org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
>  at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
>  at 
> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>  at 
> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>  at org.apache.log4j.Category.callAppenders(Category.java:206)
>  at org.apache.log4j.Category.forcedLog(Category.java:391)
>  at org.apache.log4j.Category.log(Category.java:856)
>  at 
> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:324)
>  at 
> org.apache.kafka.common.utils.LogContext$KafkaLogger.info(LogContext.java:341)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead(AbstractCoordinator.java:649)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onFailure(AbstractCoordinator.java:797)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:496)
> ...
> the following 9 lines are repeated around a hundred times.
> ...
>  at 
> 

[jira] [Comment Edited] (KAFKA-6434) Kafka-consumer-groups.sh reset-offsets does not work properly for not existing group

2018-01-12 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324830#comment-16324830
 ] 

Vahid Hashemian edited comment on KAFKA-6434 at 1/13/18 5:18 AM:
-

[~cabot] This is fixed in the current trunk (by KIP-171) and will land in 1.1.0.

It appears, though, that {{kafka-consumer-groups.sh --list --bootstrap-server 
localhost:9092}} still does not output such groups, because the protocol type 
for them is not set (the protocol type required by this command is 
{{consumer}}). [~hachikuji], do you think this is the correct behavior?

*Update*: I just came across 
[KAFKA-6287|https://issues.apache.org/jira/browse/KAFKA-6287] which addresses 
the issue I mentioned.


was (Author: vahid):
[~cabot] This is fixed in the current trunk (by KIP-171) and will land in 1.1.0.

It appears, though, that {{kafka-consumer-groups.sh --list --bootstrap-server 
localhost:9092}} still does not output such groups, because the protocol type 
for them is not set (the protocol type required by this command is 
{{consumer}}). [~hachikuji], do you think this is the correct behavior?

> Kafka-consumer-groups.sh reset-offsets does not work properly for not 
> existing group
> 
>
> Key: KAFKA-6434
> URL: https://issues.apache.org/jira/browse/KAFKA-6434
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.11.0.2
>Reporter: George Smith
>Assignee: Vahid Hashemian
>
> Our use case: We are migrating a Spark Streaming app to Kafka Streams. We 
> want to continue processing from the last processed offsets of the Spark 
> Streaming app. Therefore we want to define a new consumer group (application 
> id) with given offsets. The new app has not been launched yet (we don't want 
> the side effects of processing to hit the db) -> the new consumer group does not exist.
> I was happy to see that the updated kafka-consumer-groups.sh supports a 
> reset-offsets method. Unfortunately, it does not seem to work as expected. 
> {code}
> kafka-consumer-groups.sh --reset-offsets --bootstrap-server localhost:9092 
> --topic testTopic:0 --group testGROUP --to-offset 10
> Note: This will only show information about consumers that use the Java 
> consumer API (non-ZooKeeper-based consumers).
> TOPIC  PARTITION  NEW-OFFSET
> testTopic  0  10
> {code}
> Now I want to check offsets for the group:
> {code}
> kafka-consumer-groups.sh --describe --bootstrap-server localhost:9092 --group 
> testGROUP
> Error: Consumer group 'testGROUP' does not exist.
> {code}
> That's strange, isn't it?
> On the other hand, when I use kafka-streams-application-reset.sh the group 
> is obviously created - unfortunately that tool does not support specifying 
> offsets per partition (only resetting to the beginning is supported) and it 
> does not support secured Kafka connections...





[jira] [Commented] (KAFKA-4499) Add "getAllKeys" API for querying windowed KTable stores

2018-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324869#comment-16324869
 ] 

ASF GitHub Bot commented on KAFKA-4499:
---

guozhangwang closed pull request #4385: [KAFKA-4499] Adding documentation for 
querying WindowStores
URL: https://github.com/apache/kafka/pull/4385
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/docs/streams/upgrade-guide.html b/docs/streams/upgrade-guide.html
index 7567abd085e..2e0a2fc3aca 100644
--- a/docs/streams/upgrade-guide.html
+++ b/docs/streams/upgrade-guide.html
@@ -63,6 +63,10 @@ Upgrade Guide  API Changes
 
 
 Streams API changes in 1.1.0
+
+   We have added support for methods in ReadOnlyWindowStore 
which allow for querying WindowStores without the necessity of 
providing keys.
+
+
 
 
The introduction of KIP-220 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-220%3A+Add+AdminClient+into+Kafka+Streams%27+ClientSupplier)


 




> Add "getAllKeys" API for querying windowed KTable stores
> 
>
> Key: KAFKA-4499
> URL: https://issues.apache.org/jira/browse/KAFKA-4499
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Richard Yu
>  Labels: needs-kip
> Fix For: 1.1.0
>
> Attachments: 4499-All-test-v1.patch, 4499-CachingWindowStore-v1.patch
>
>
> Currently, both {{KTable}} and windowed-{{KTable}} stores can be queried via 
> the IQ feature. While {{ReadOnlyKeyValueStore}} (for {{KTable}} stores) provides 
> the method {{all()}} to scan the whole store (i.e., it returns an iterator over 
> all stored key-value pairs), there is no similar API for {{ReadOnlyWindowStore}} 
> (for windowed-{{KTable}} stores).
> This limits the usage of a windowed store, because the user needs to know 
> what keys are stored in order to query it. It would be useful to provide 
> APIs like these (only a rough sketch):
>  - {{keys()}} returns all keys available in the store (maybe together with 
> available time ranges)
>  - {{all(long timeFrom, long timeTo)}} that returns all windows for a specific 
> time range
>  - {{allLatest()}} that returns the latest window for each key
> Because this feature would require scanning multiple segments (i.e., RocksDB 
> instances), it would be quite inefficient with the current store design. Thus, 
> this feature also requires redesigning the underlying window store itself.
> Because this is a major change, a KIP 
> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)
>  is required. The KIP should cover the actual API design as well as the store 
> refactoring.
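
For reference, key-less scans along these lines shipped in 1.1.0 as {{all()}} 
and {{fetchAll(timeFrom, timeTo)}} on {{ReadOnlyWindowStore}}. A hedged sketch 
of querying them via Interactive Queries (store name and types are hypothetical):

{code:java}
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;

public class WindowStoreScan {
    // Scan a windowed store without knowing its keys up front.
    static void dumpLastHour(KafkaStreams streams) {
        ReadOnlyWindowStore<String, Long> store =
            streams.store("counts-store", QueryableStoreTypes.<String, Long>windowStore());
        long now = System.currentTimeMillis();
        // All windows of all keys whose start time falls in the last hour:
        try (KeyValueIterator<Windowed<String>, Long> iter = store.fetchAll(now - 3600_000L, now)) {
            while (iter.hasNext()) {
                KeyValue<Windowed<String>, Long> entry = iter.next();
                System.out.println(entry.key.key() + "@" + entry.key.window().start() + " = " + entry.value);
            }
        }
    }
}
{code}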





[jira] [Commented] (KAFKA-6441) FetchRequest populates buffer of size MinBytes, even if response is smaller

2018-01-12 Thread Ivan Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324857#comment-16324857
 ] 

Ivan Babrou commented on KAFKA-6441:


Looks like the issue is in Sarama, which only reads one record batch:

* https://github.com/Shopify/sarama/issues/1022
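
For comparison, the Java consumer keeps the two limits separate: 
{{fetch.min.bytes}} only controls how much data the broker waits for before 
answering, while {{fetch.max.bytes}} caps the response size. A hedged 
configuration sketch (values are illustrative only):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FetchSizing {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fetch-sizing-demo");  // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // The broker waits for this much data (or fetch.max.wait.ms) before replying...
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);
        // ...and the reply only ever contains the data actually returned, up to this cap.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 64 * 1024 * 1024);
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);
        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.close();
    }
}
{code}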

> FetchRequest populates buffer of size MinBytes, even if response is smaller
> ---
>
> Key: KAFKA-6441
> URL: https://issues.apache.org/jira/browse/KAFKA-6441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.1
>Reporter: Ivan Babrou
>
> We're using the Sarama Go client as a consumer, but I don't think that's 
> relevant. The producer is syslog-ng with Kafka output. I'm not quite sure 
> which log format Kafka itself is using, but I assume 0.11.0.0, because that's 
> what is set in the topic settings.
> Our FetchRequest has minSize = 16MB, maxSize = 64, maxWait = 500ms. For some 
> silly reason, Kafka decides to reply with a buffer of at least minSize 
> containing just one 1KB log message. When Sarama was using the older consumer 
> API, everything was okay. When we upgraded to the 0.11.0.0 consumer API, 
> consumer traffic for a 125 Mbit/s topic spiked to 55,000 Mbit/s on the wire 
> and the consumer wasn't even able to keep up.
> A 1KB message in a 16MB buffer is 1,600,000% overhead.
> I don't think there's any valid reason to do this.
> It's also mildly annoying that there is no 0.11.0.1 tag in git; looking at 
> the changes is harder than it should be.





[jira] [Resolved] (KAFKA-6441) FetchRequest populates buffer of size MinBytes, even if response is smaller

2018-01-12 Thread Ivan Babrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Babrou resolved KAFKA-6441.

Resolution: Invalid

> FetchRequest populates buffer of size MinBytes, even if response is smaller
> ---
>
> Key: KAFKA-6441
> URL: https://issues.apache.org/jira/browse/KAFKA-6441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.1
>Reporter: Ivan Babrou
>
> We're using the Sarama Go client as a consumer, but I don't think that's 
> relevant. The producer is syslog-ng with Kafka output. I'm not quite sure 
> which log format Kafka itself is using, but I assume 0.11.0.0, because that's 
> what is set in the topic settings.
> Our FetchRequest has minSize = 16MB, maxSize = 64, maxWait = 500ms. For some 
> silly reason, Kafka decides to reply with a buffer of at least minSize 
> containing just one 1KB log message. When Sarama was using the older consumer 
> API, everything was okay. When we upgraded to the 0.11.0.0 consumer API, 
> consumer traffic for a 125 Mbit/s topic spiked to 55,000 Mbit/s on the wire 
> and the consumer wasn't even able to keep up.
> A 1KB message in a 16MB buffer is 1,600,000% overhead.
> I don't think there's any valid reason to do this.
> It's also mildly annoying that there is no 0.11.0.1 tag in git; looking at 
> the changes is harder than it should be.





[jira] [Commented] (KAFKA-6309) add support for getting topic defaults from AdminClient

2018-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324854#comment-16324854
 ] 

ASF GitHub Bot commented on KAFKA-6309:
---

guozhangwang opened a new pull request #4421: KAFKA-6309: Return value getter 
based on KTable materialization status
URL: https://github.com/apache/kafka/pull/4421
 
 
   This is a bug fix that is composed of two parts:
   
   1. The major part: for all operators that generate a KTable, we should 
construct the value getter based on whether the KTable itself is materialized.
   1.a If yes, query the materialized store directly for the value getter.
   1.b If not, hand over to the parent's value getter (recursively) and apply 
the computation to the returned value.
   
   2. The minor part: in KStreamImpl, when joining with a table, we should 
connect with the table's `valueGetterSupplier().storeNames()`, not 
`internalStoreName()`, as the latter always assumes that the KTable is 
materialized, which is not always true.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
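
To illustrate the distinction this fix relies on, a hypothetical Streams 
snippet (topic and store names are invented): a KTable created with an 
explicit {{Materialized}} store can serve value-getter lookups directly, 
while a derived, non-materialized KTable must delegate to its parent's getter 
and re-apply its computation.

{code:java}
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class ValueGetterShape {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        // Materialized: joins can query "prices-store" directly for lookups.
        KTable<String, String> prices = builder.table("prices",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("prices-store"));
        // Not materialized: a lookup has to read from the parent's store and
        // then apply the mapValues computation on the way out.
        KTable<String, String> normalized = prices.mapValues(v -> v.trim());
        builder.build();
    }
}
{code}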
   




> add support for getting topic defaults from AdminClient
> ---
>
> Key: KAFKA-6309
> URL: https://issues.apache.org/jira/browse/KAFKA-6309
> Project: Kafka
>  Issue Type: Improvement
>Reporter: dan norwood
>Assignee: dan norwood
>
> kip here: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-234%3A+add+support+for+getting+topic+defaults+from+AdminClient





[jira] [Commented] (KAFKA-6434) Kafka-consumer-groups.sh reset-offsets does not work properly for not existing group

2018-01-12 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324830#comment-16324830
 ] 

Vahid Hashemian commented on KAFKA-6434:


[~cabot] This is fixed in the current trunk (by KIP-171) and will land in 1.1.0.

It appears, though, that {{kafka-consumer-groups.sh --list --bootstrap-server 
localhost:9092}} still does not output such groups, because the protocol type 
for them is not set (the protocol type required by this command is 
{{consumer}}). [~hachikuji], do you think this is the correct behavior?

> Kafka-consumer-groups.sh reset-offsets does not work properly for not 
> existing group
> 
>
> Key: KAFKA-6434
> URL: https://issues.apache.org/jira/browse/KAFKA-6434
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.11.0.2
>Reporter: George Smith
>Assignee: Vahid Hashemian
>
> Our use case: We are migrating a Spark Streaming app to Kafka Streams. We 
> want to continue processing from the last processed offsets of the Spark 
> Streaming app. Therefore we want to define a new consumer group (application 
> id) with given offsets. The new app has not been launched yet (we don't want 
> the side effects of processing to hit the db) -> the new consumer group does not exist.
> I was happy to see that the updated kafka-consumer-groups.sh supports a 
> reset-offsets method. Unfortunately, it does not seem to work as expected. 
> {code}
> kafka-consumer-groups.sh --reset-offsets --bootstrap-server localhost:9092 
> --topic testTopic:0 --group testGROUP --to-offset 10
> Note: This will only show information about consumers that use the Java 
> consumer API (non-ZooKeeper-based consumers).
> TOPIC  PARTITION  NEW-OFFSET
> testTopic  0  10
> {code}
> Now I want to check offsets for the group:
> {code}
> kafka-consumer-groups.sh --describe --bootstrap-server localhost:9092 --group 
> testGROUP
> Error: Consumer group 'testGROUP' does not exist.
> {code}
> That's strange, isn't it?
> On the other hand, when I use kafka-streams-application-reset.sh the group 
> is obviously created - unfortunately that tool does not support specifying 
> offsets per partition (only resetting to the beginning is supported) and it 
> does not support secured Kafka connections...





[jira] [Commented] (KAFKA-6434) Kafka-consumer-groups.sh reset-offsets does not work properly for not existing group

2018-01-12 Thread Satyajit varma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324760#comment-16324760
 ] 

Satyajit varma commented on KAFKA-6434:
---

[~cabot] That is because your consumer group "testGROUP" is not currently 
stable (running); the tool does still respond and show you the new offsets for 
the reset-offsets request you made.

reset-offsets only works when the consumer group is INACTIVE (not running), 
and you will only see a result from "kafka-consumer-groups.sh --describe 
--bootstrap-server localhost:9092 --group testGROUP" once you have the 
consumer group running.
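
For completeness, committed offsets for a group that has never run can also be 
seeded from a short-lived Java consumer with manual assignment. A hedged 
sketch mirroring the testTopic:0 / offset 10 example above:

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SeedOffsets {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "testGROUP");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("testTopic", 0);
            consumer.assign(Collections.singletonList(tp));  // manual assignment, no group protocol
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(10L)));
        }
    }
}
{code}

A group created this way has no protocol type set, which is also why 
{{--list}} may not show it, as noted in the comments above.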

> Kafka-consumer-groups.sh reset-offsets does not work properly for not 
> existing group
> 
>
> Key: KAFKA-6434
> URL: https://issues.apache.org/jira/browse/KAFKA-6434
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.11.0.2
>Reporter: George Smith
>Assignee: Vahid Hashemian
>
> Our use case: We are migrating a Spark Streaming app to Kafka Streams. We 
> want to continue processing from the last processed offsets of the Spark 
> Streaming app. Therefore we want to define a new consumer group (application 
> id) with given offsets. The new app has not been launched yet (we don't want 
> the side effects of processing to hit the db) -> the new consumer group does not exist.
> I was happy to see that the updated kafka-consumer-groups.sh supports a 
> reset-offsets method. Unfortunately, it does not seem to work as expected. 
> {code}
> kafka-consumer-groups.sh --reset-offsets --bootstrap-server localhost:9092 
> --topic testTopic:0 --group testGROUP --to-offset 10
> Note: This will only show information about consumers that use the Java 
> consumer API (non-ZooKeeper-based consumers).
> TOPIC  PARTITION  NEW-OFFSET
> testTopic  0  10
> {code}
> Now I want to check offsets for the group:
> {code}
> kafka-consumer-groups.sh --describe --bootstrap-server localhost:9092 --group 
> testGROUP
> Error: Consumer group 'testGROUP' does not exist.
> {code}
> That's strange, isn't it?
> On the other hand, when I use kafka-streams-application-reset.sh the group 
> is obviously created - unfortunately that tool does not support specifying 
> offsets per partition (only resetting to the beginning is supported) and it 
> does not support secured Kafka connections...





[jira] [Commented] (KAFKA-6018) Make KafkaFuture.Function java 8 lambda compatible

2018-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324658#comment-16324658
 ] 

ASF GitHub Bot commented on KAFKA-6018:
---

ewencp closed pull request #4033: KAFKA-6018: Make KafkaFuture.Future an 
interface
URL: https://github.com/apache/kafka/pull/4033
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/clients/src/main/java/org/apache/kafka/clients/admin/DeleteAclsResult.java b/clients/src/main/java/org/apache/kafka/clients/admin/DeleteAclsResult.java
index 90bc2970e13..eaa5a0185cd 100644
--- a/clients/src/main/java/org/apache/kafka/clients/admin/DeleteAclsResult.java
+++ b/clients/src/main/java/org/apache/kafka/clients/admin/DeleteAclsResult.java
@@ -102,7 +102,7 @@ public ApiException exception() {
      */
     public KafkaFuture<Collection<AclBinding>> all() {
         return KafkaFuture.allOf(futures.values().toArray(new KafkaFuture[0])).thenApply(
-            new KafkaFuture.Function<Void, Collection<AclBinding>>() {
+            new KafkaFuture.FunctionInterface<Void, Collection<AclBinding>>() {
                 @Override
                 public Collection<AclBinding> apply(Void v) {
                     List<AclBinding> acls = new ArrayList<>();
diff --git a/clients/src/main/java/org/apache/kafka/clients/admin/DescribeConfigsResult.java b/clients/src/main/java/org/apache/kafka/clients/admin/DescribeConfigsResult.java
index 478bf055cfd..343a06af9d5 100644
--- a/clients/src/main/java/org/apache/kafka/clients/admin/DescribeConfigsResult.java
+++ b/clients/src/main/java/org/apache/kafka/clients/admin/DescribeConfigsResult.java
@@ -53,7 +53,7 @@
      */
     public KafkaFuture<Map<ConfigResource, Config>> all() {
         return KafkaFuture.allOf(futures.values().toArray(new KafkaFuture[0])).
-            thenApply(new KafkaFuture.Function<Void, Map<ConfigResource, Config>>() {
+            thenApply(new KafkaFuture.FunctionInterface<Void, Map<ConfigResource, Config>>() {
                 @Override
                 public Map<ConfigResource, Config> apply(Void v) {
                     Map<ConfigResource, Config> configs = new HashMap<>(futures.size());
diff --git a/clients/src/main/java/org/apache/kafka/clients/admin/DescribeLogDirsResult.java b/clients/src/main/java/org/apache/kafka/clients/admin/DescribeLogDirsResult.java
index de186fd751d..9bd2d520b6b 100644
--- a/clients/src/main/java/org/apache/kafka/clients/admin/DescribeLogDirsResult.java
+++ b/clients/src/main/java/org/apache/kafka/clients/admin/DescribeLogDirsResult.java
@@ -51,7 +51,7 @@
      */
     public KafkaFuture<Map<Integer, Map<String, LogDirInfo>>> all() {
         return KafkaFuture.allOf(futures.values().toArray(new KafkaFuture[0])).
-            thenApply(new KafkaFuture.Function<Void, Map<Integer, Map<String, LogDirInfo>>>() {
+            thenApply(new KafkaFuture.FunctionInterface<Void, Map<Integer, Map<String, LogDirInfo>>>() {
                 @Override
                 public Map<Integer, Map<String, LogDirInfo>> apply(Void v) {
                     Map<Integer, Map<String, LogDirInfo>> descriptions = new HashMap<>(futures.size());
diff --git a/clients/src/main/java/org/apache/kafka/clients/admin/DescribeReplicaLogDirsResult.java b/clients/src/main/java/org/apache/kafka/clients/admin/DescribeReplicaLogDirsResult.java
index 401b4aa7b9d..8a73df38b46 100644
--- a/clients/src/main/java/org/apache/kafka/clients/admin/DescribeReplicaLogDirsResult.java
+++ b/clients/src/main/java/org/apache/kafka/clients/admin/DescribeReplicaLogDirsResult.java
@@ -52,7 +52,7 @@
      */
     public KafkaFuture<Map<TopicPartitionReplica, ReplicaLogDirInfo>> all() {
         return KafkaFuture.allOf(futures.values().toArray(new KafkaFuture[0])).
-            thenApply(new KafkaFuture.Function<Void, Map<TopicPartitionReplica, ReplicaLogDirInfo>>() {
+            thenApply(new KafkaFuture.FunctionInterface<Void, Map<TopicPartitionReplica, ReplicaLogDirInfo>>() {
                 @Override
                 public Map<TopicPartitionReplica, ReplicaLogDirInfo> apply(Void v) {
                     Map<TopicPartitionReplica, ReplicaLogDirInfo> replicaLogDirInfos = new HashMap<>();
diff --git a/clients/src/main/java/org/apache/kafka/clients/admin/DescribeTopicsResult.java b/clients/src/main/java/org/apache/kafka/clients/admin/DescribeTopicsResult.java
index 18f5f9d20cd..add317659f6 100644
--- a/clients/src/main/java/org/apache/kafka/clients/admin/DescribeTopicsResult.java
+++ b/clients/src/main/java/org/apache/kafka/clients/admin/DescribeTopicsResult.java
@@ -51,7 +51,7 @@
      */
     public KafkaFuture<Map<String, TopicDescription>> all() {
         return KafkaFuture.allOf(futures.values().toArray(new 

[jira] [Commented] (KAFKA-6018) Make KafkaFuture.Function java 8 lambda compatible

2018-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324659#comment-16324659
 ] 

ASF GitHub Bot commented on KAFKA-6018:
---

steven-aerts opened a new pull request #4033: KAFKA-6018: Make 
KafkaFuture.Future an interface
URL: https://github.com/apache/kafka/pull/4033
 
 
   Changing KafkaFuture.Future and KafkaFuture.BiConsumer into interfaces 
makes them functional interfaces.  This makes them Java 8 lambda compatible.
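
The mechanics behind that, in a standalone hedged sketch (the names are 
illustrative, not the actual KafkaFuture types): a single-method abstract 
class can only be implemented by a subclass, while the equivalent interface 
accepts a lambda directly.

{code:java}
public class LambdaCompat {
    // Abstract class: even with a single abstract method, a lambda cannot
    // implement it; an (anonymous) subclass is required.
    static abstract class FunctionClass<A, B> {
        public abstract B apply(A a);
    }

    // Single-abstract-method interface: a functional interface, so a lambda works.
    @FunctionalInterface
    interface FunctionIface<A, B> {
        B apply(A a);
    }

    public static void main(String[] args) {
        FunctionClass<String, Integer> asClass = new FunctionClass<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
        FunctionIface<String, Integer> asLambda = s -> s.length();
        System.out.println(asClass.apply("kafka") + " " + asLambda.apply("kafka"));
    }
}
{code}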




> Make KafkaFuture.Function java 8 lambda compatible
> --
>
> Key: KAFKA-6018
> URL: https://issues.apache.org/jira/browse/KAFKA-6018
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Steven Aerts
>
> KafkaFuture.Function is currently an empty public abstract class.
> This means you cannot implement it as a Java lambda, and you end up with 
> constructs like:
> {code:java}
> new KafkaFuture.Function<Set<String>, Object>() {
>     @Override
>     public Object apply(Set<String> strings) {
>         return foo;
>     }
> }
> {code}
> I propose to define them as interfaces.
> So in Java 8 this code can become:
> {code:java}
> strings -> foo
> {code}
> I know this change is backwards incompatible (extends becomes implements), 
> but {{KafkaFuture}} is marked as {{@InterfaceStability.Evolving}}, and 
> KafkaFuture states in its javadoc:
> {quote}This will eventually become a thin shim on top of Java 8's 
> CompletableFuture.{quote}
> I think this change might be worth considering.





[jira] [Assigned] (KAFKA-6434) Kafka-consumer-groups.sh reset-offsets does not work properly for not existing group

2018-01-12 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian reassigned KAFKA-6434:
--

Assignee: Vahid Hashemian

> Kafka-consumer-groups.sh reset-offsets does not work properly for not 
> existing group
> 
>
> Key: KAFKA-6434
> URL: https://issues.apache.org/jira/browse/KAFKA-6434
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.11.0.2
>Reporter: George Smith
>Assignee: Vahid Hashemian
>
> Our use case: We are migrating a Spark Streaming app to Kafka Streams. We 
> want to continue processing from the last processed offsets of the Spark 
> Streaming app. Therefore we want to define a new consumer group (application 
> id) with given offsets. The new app has not been launched yet (we don't want 
> the side effects of processing to hit the db) -> the new consumer group does not exist.
> I was happy to see that the updated kafka-consumer-groups.sh supports a 
> reset-offsets method. Unfortunately, it does not seem to work as expected. 
> {code}
> kafka-consumer-groups.sh --reset-offsets --bootstrap-server localhost:9092 
> --topic testTopic:0 --group testGROUP --to-offset 10
> Note: This will only show information about consumers that use the Java 
> consumer API (non-ZooKeeper-based consumers).
> TOPIC  PARTITION  NEW-OFFSET
> testTopic  0  10
> {code}
> Now I want to check offsets for the group:
> {code}
> kafka-consumer-groups.sh --describe --bootstrap-server localhost:9092 --group 
> testGROUP
> Error: Consumer group 'testGROUP' does not exist.
> {code}
> That's strange, isn't it?
> On the other hand, when I use kafka-streams-application-reset.sh the group 
> is obviously created - unfortunately that tool does not support specifying 
> offsets per partition (only resetting to the beginning is supported) and it 
> does not support secured Kafka connections...





[jira] [Commented] (KAFKA-5540) Deprecate and remove internal converter configs

2018-01-12 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324350#comment-16324350
 ] 

Randall Hauch commented on KAFKA-5540:
--

[KIP-174|https://cwiki.apache.org/confluence/display/KAFKA/KIP-174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig]
 covers this and is now undergoing voting.

> Deprecate and remove internal converter configs
> ---
>
> Key: KAFKA-5540
> URL: https://issues.apache.org/jira/browse/KAFKA-5540
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> The internal.key.converter and internal.value.converter were originally 
> exposed as configs because a) they are actually pluggable and b) providing a 
> default would require relying on the JsonConverter always being available, 
> and until we had classloader isolation it was possible that it might be 
> removed for compatibility reasons.
> However, this has ultimately just caused a lot more trouble and confusion 
> than it is worth. We should deprecate the configs, give them a default of 
> JsonConverter (which is also kind of nice since it results in human-readable 
> data in the internal topics), and then ultimately remove them in the next 
> major version.
> These are all public APIs so this will need a small KIP before we can make 
> the change.





[jira] [Commented] (KAFKA-6443) KTable involved in multiple joins could result in duplicate results

2018-01-12 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324283#comment-16324283
 ] 

Guozhang Wang commented on KAFKA-6443:
--

The two duplicate records will be the same; see my added tests in 
https://github.com/apache/kafka/pull/4331 (the expected result list) for details.

> KTable involved in multiple joins could result in duplicate results
> ---
>
> Key: KAFKA-6443
> URL: https://issues.apache.org/jira/browse/KAFKA-6443
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>
> Consider the following multi table-table joins:
> {code}
> table1.join(table2).join(table2);  // "join" could be replaced with 
> "leftJoin" or "outerJoin"
> {code}
> where {{table2}} is involved multiple times in this multi-way join. In this 
> case, when a new record from the source topic of {{table2}} is being 
> processed, it will be sent to two children down in the topology and hence may 
> result in duplicate join results depending on the join types.
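
A hypothetical topology with the shape described (topic names invented): an 
update arriving on "topic2" flows into both join processors.

{code:java}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class MultiJoinShape {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KTable<String, String> table1 = builder.table("topic1");
        KTable<String, String> table2 = builder.table("topic2");
        // table2 participates twice: one upstream update can reach both join
        // processors and emit two (identical) results downstream.
        KTable<String, String> joined = table1
                .join(table2, (v1, v2) -> v1 + v2)
                .join(table2, (v12, v2) -> v12 + v2);
        builder.build();
    }
}
{code}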





[jira] [Commented] (KAFKA-5117) Kafka Connect REST endpoints reveal Password typed values

2018-01-12 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324215#comment-16324215
 ] 

Randall Hauch commented on KAFKA-5117:
--

So I'm a bit concerned that simply masking the passwords will not be that 
advantageous. Sure, it might work if you're just managing your configuration 
files locally and then using the REST API with curl and thus never really 
needing to get configurations back out. But this change would likely break 
every management tool that is using the API to read, modify, and post 
configurations. Also, to maintain backward compatibility, we'd need to 
introduce a config that defaults to _not masking_ -- doesn't that kind of 
defeat the purpose?

[KIP-208|https://cwiki.apache.org/confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface]
 is already trying to add SSL/TLS support to the Connect REST API, and then 
adding (with a different KIP) ACLs support would mean you can control who can 
and cannot use different endpoints. That is definitely one approach to 
preventing exposure of passwords.

Another approach is to avoid putting passwords in the configuration file in the 
first place. KAFKA-6142 proposes adding support for variables in configuration 
files, and variables could be used in place of passwords to have the passwords 
resolved only upon deployment via some "configuration transformer" plugin.
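
For context, the masking referred to in this ticket comes from {{ConfigDef}}'s 
{{PASSWORD}} type: values are wrapped in 
{{org.apache.kafka.common.config.types.Password}}, whose {{toString()}} 
returns the literal string "[hidden]". A minimal sketch (the config key is 
hypothetical):

{code:java}
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.common.config.AbstractConfig;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.types.Password;

public class PasswordMasking {
    public static void main(String[] args) {
        ConfigDef def = new ConfigDef()
                .define("db.password", ConfigDef.Type.PASSWORD,
                        ConfigDef.Importance.HIGH, "Database password");
        Map<String, String> raw = Collections.singletonMap("db.password", "s3cret");
        AbstractConfig config = new AbstractConfig(def, raw);
        Password pw = config.getPassword("db.password");
        System.out.println(pw);          // prints "[hidden]", not the secret
        System.out.println(pw.value());  // the real value, for code that needs it
    }
}
{code}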

> Kafka Connect REST endpoints reveal Password typed values
> -
>
> Key: KAFKA-5117
> URL: https://issues.apache.org/jira/browse/KAFKA-5117
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Thomas Holmes
>  Labels: needs-kip
>
> A Kafka Connect connector can specify ConfigDef keys as type of Password. 
> This type was added to prevent logging the values (instead "[hidden]" is 
> logged).
> This change does not apply to the values returned by executing a GET on 
> {{connectors/\{connector-name\}}} and 
> {{connectors/\{connector-name\}/config}}. This creates an easily accessible 
> way for an attacker who has infiltrated your network to gain access to 
> potential secrets that should not be available.
> I have started on a code change that addresses this issue by parsing the 
> config values through the ConfigDef for the connector and returning their 
> output instead (which leads to the masking of Password typed configs as 
> [hidden]).





[jira] [Commented] (KAFKA-6204) Interceptor and MetricsReporter should implement java.io.Closeable

2018-01-12 Thread Charly Molter (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323797#comment-16323797
 ] 

Charly Molter commented on KAFKA-6204:
--

This is stuck on the Java 8 upgrade, so we can have a default implementation 
for backward compatibility.
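
What the Java 8 default-method support buys, as a hedged sketch (a 
hypothetical shape, not the actual Kafka interface): a default no-op 
{{close()}} lets the interface extend {{Closeable}} without breaking existing 
implementations.

{code:java}
import java.io.Closeable;
import java.util.List;
import org.apache.kafka.common.metrics.KafkaMetric;

// Hypothetical Closeable-friendly reporter: the default method keeps
// already-written implementations source-compatible.
interface CloseableMetricsReporter extends Closeable {
    void init(List<KafkaMetric> metrics);

    void metricChange(KafkaMetric metric);

    @Override
    default void close() {
        // no-op by default, for backward compatibility
    }
}
{code}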

> Interceptor and MetricsReporter should implement java.io.Closeable
> --
>
> Key: KAFKA-6204
> URL: https://issues.apache.org/jira/browse/KAFKA-6204
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Charly Molter
>Priority: Minor
>
> The serializers and deserializers extend the Closeable interface, and even 
> ConsumerInterceptors and ProducerInterceptors provide a close() method.
> However, ConsumerInterceptor, ProducerInterceptor and MetricsReporter do not 
> extend the Closeable interface.
> Maybe they should, for coherency with the rest of the APIs.





[jira] [Assigned] (KAFKA-5890) records.lag should use tags for topic and partition rather than using metric name.

2018-01-12 Thread Charly Molter (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charly Molter reassigned KAFKA-5890:


Assignee: Charly Molter

> records.lag should use tags for topic and partition rather than using metric 
> name.
> --
>
> Key: KAFKA-5890
> URL: https://issues.apache.org/jira/browse/KAFKA-5890
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.2.0
>Reporter: Charly Molter
>Assignee: Charly Molter
> Fix For: 1.1.0
>
>
> As part of KIP-92 [1] a per-partition lag metric was added.
> These metrics are really useful; however, the implementation encodes the 
> topic and partition as a prefix of the metric name: 
> https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java#L1321-L1344
> Usually these kinds of metrics use tags, and the name is constant across all 
> topics and partitions.
> We have a custom reporter that aggregates topics/partitions together to avoid 
> an explosion in the number of KPIs, and this metric doesn't support that, as 
> it doesn't have tags but a composite name.
> [1] 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-92+-+Add+per+partition+lag+metrics+to+KafkaConsumer
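
The tag-based convention being asked for, sketched against the client metrics 
API (group, names, and tag values are illustrative):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Value;

public class TaggedLagMetric {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        // Keep the metric name constant and move the coordinates into tags,
        // instead of encoding "<topic>-<partition>.records-lag" into the name.
        Map<String, String> tags = new HashMap<>();
        tags.put("topic", "my-topic");
        tags.put("partition", "3");
        MetricName name = metrics.metricName("records-lag",
                "consumer-fetch-manager-metrics", "Latest lag of the partition", tags);
        Sensor sensor = metrics.sensor("records-lag-my-topic-3");
        sensor.add(name, new Value());
        sensor.record(42.0);
        metrics.close();
    }
}
{code}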





[jira] [Assigned] (KAFKA-6310) ConcurrentModificationException when reporting requests-in-flight in producer

2018-01-12 Thread Charly Molter (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charly Molter reassigned KAFKA-6310:


Assignee: Charly Molter

> ConcurrentModificationException when reporting requests-in-flight in producer
> -
>
> Key: KAFKA-6310
> URL: https://issues.apache.org/jira/browse/KAFKA-6310
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics, network
>Affects Versions: 1.0.0
>Reporter: Charly Molter
>Assignee: Charly Molter
>
> We are running into an issue really similar to KAFKA-4950.
> We have a producer running and a MetricsReporter with a background thread 
> that publishes these metrics.
> The ConcurrentModificationException happens when one thread calls 
> `InFlightRequests.count()` while a connection or disconnection is happening: 
> one thread is iterating over the map while another is adding to or removing 
> from it, thus causing the exception.
> We could potentially fix this with a volatile, like in KAFKA-4950.
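
A hedged sketch of that volatile-counter approach (a simplified stand-in, not 
the real InFlightRequests class): reader threads consult a counter instead of 
iterating the map, so structural changes in the network thread can no longer 
throw.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in: only the network thread mutates "requests";
// any other thread may safely read the volatile counter.
public class InFlightCounter {
    private final Map<String, Deque<Object>> requests = new HashMap<>();
    private volatile int inFlightCount = 0;

    public void add(String node, Object request) {   // network thread only
        requests.computeIfAbsent(node, k -> new ArrayDeque<>()).addFirst(request);
        inFlightCount++;
    }

    public Object complete(String node) {            // network thread only
        Object request = requests.get(node).pollLast();
        inFlightCount--;
        return request;
    }

    public int count() {                             // safe from any thread
        return inFlightCount;
    }
}
{code}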





[jira] [Updated] (KAFKA-6180) Add a Validator for NonNull configurations and remove redundant null checks on lists

2018-01-12 Thread Charly Molter (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charly Molter updated KAFKA-6180:
-
Fix Version/s: 1.1.0

> Add a Validator for NonNull configurations and remove redundant null checks 
> on lists
> 
>
> Key: KAFKA-6180
> URL: https://issues.apache.org/jira/browse/KAFKA-6180
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Affects Versions: 0.10.1.0, 0.10.1.1, 0.10.2.0, 0.10.2.1, 0.11.0.0, 1.0.0
>Reporter: Charly Molter
>Assignee: Charly Molter
>Priority: Trivial
> Fix For: 1.1.0
>
>
> AbstractConfig.getList returns null if the property is unset and there's no 
> default.
> This creates a lot of cases where we need to do null checks (and remember to 
> do them).
> It's good practice to just return an empty list, as code usually handles 
> empty lists naturally.
> To do this we set the default on lists to Collections.emptyList() and add 
> a Validator to disallow null values.
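
Roughly what such a validator looks like with the {{ConfigDef}} API (class 
name and config key here are only for illustration):

{code:java}
import java.util.Collections;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigException;

public class NonNullListValidator implements ConfigDef.Validator {
    @Override
    public void ensureValid(String name, Object value) {
        if (value == null) {
            throw new ConfigException(name, null, "entry must be non null");
        }
    }

    // Usage: default to an empty list and reject explicit nulls.
    static final ConfigDef DEF = new ConfigDef().define(
            "interceptor.classes", ConfigDef.Type.LIST, Collections.emptyList(),
            new NonNullListValidator(), ConfigDef.Importance.LOW, "Interceptor classes.");
}
{code}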





[jira] [Commented] (KAFKA-6442) Catch 22 with cluster rebalancing

2018-01-12 Thread Andreas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323763#comment-16323763
 ] 

Andreas commented on KAFKA-6442:


For what it's worth, restarting node4 (for maintenance) seems to have gotten 
everything unstuck. I guess it triggered leader re-election internally?

> Catch 22 with cluster rebalancing
> -
>
> Key: KAFKA-6442
> URL: https://issues.apache.org/jira/browse/KAFKA-6442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Andreas
>
> PS. I classified this as a bug because I think the cluster should not be 
> stuck in that situation; apologies if that is wrong.
> Hi,
> I found myself in a situation that is a bit difficult to explain, so I will 
> skip how I ended up in it, but here is the problem.
> Some of the brokers of my cluster are permanently gone. Consequently, I had 
> some partitions that now had offline leaders etc., so I used 
> {{kafka-reassign-partitions.sh}} to rebalance my topics, and for the most 
> part that worked ok. Where it did not work ok was for partitions whose 
> leaders, replicas and ISRs were entirely on the gone brokers. Those got stuck 
> halfway through in what now looks like
> Topic: topicA Partition: 32 Leader: -1 Replicas: 1,6,2,7,3,8 Isr: 
> (1,2,3 are legit, 6,7,8 permanently gone)
> So the first catch-22 is that I cannot elect a new leader, because the 
> leader needs to be elected from the ISR, and I cannot recreate the ISR 
> because the topic has no leader.
> The second catch-22 is that I cannot rerun {{kafka-reassign-partitions.sh}} 
> because the previous run is supposedly still in progress, and I cannot 
> increase the number of partitions to account for the now permanently offline 
> partitions, because that produces the following error: {{Error while 
> executing topic command requirement failed: All partitions should have the 
> same number of replicas.}}, from which I cannot recover because I cannot run 
> {{kafka-reassign-partitions.sh}}.
> Is there a way to recover from such a situation? 





[jira] [Commented] (KAFKA-6442) Catch 22 with cluster rebalancing

2018-01-12 Thread Andreas (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323747#comment-16323747
 ] 

Andreas commented on KAFKA-6442:


Thanks for the reply. I am afraid "unclean.leader.election.enable" is not set 
at all, so it should default to true.
Running ./zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids" returns

WatchedEvent state:SyncConnected type:None path:null
[1, 2, 3, 4]

which is legit.

> Catch 22 with cluster rebalancing
> -
>
> Key: KAFKA-6442
> URL: https://issues.apache.org/jira/browse/KAFKA-6442
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Andreas
>
> PS. I classified this as a bug because I think the cluster should not be 
> stuck in that situation; apologies if that is wrong.
> Hi,
> I found myself in a situation that is a bit difficult to explain, so I will 
> skip how I ended up in it, but here is the problem.
> Some of the brokers of my cluster are permanently gone. Consequently, I had 
> some partitions that now had offline leaders etc., so I used 
> {{kafka-reassign-partitions.sh}} to rebalance my topics, and for the most 
> part that worked ok. Where it did not work ok was for partitions whose 
> leaders, replicas and ISRs were entirely on the gone brokers. Those got stuck 
> halfway through in what now looks like
> Topic: topicA Partition: 32 Leader: -1 Replicas: 1,6,2,7,3,8 Isr: 
> (1,2,3 are legit, 6,7,8 permanently gone)
> So the first catch-22 is that I cannot elect a new leader, because the 
> leader needs to be elected from the ISR, and I cannot recreate the ISR 
> because the topic has no leader.
> The second catch-22 is that I cannot rerun {{kafka-reassign-partitions.sh}} 
> because the previous run is supposedly still in progress, and I cannot 
> increase the number of partitions to account for the now permanently offline 
> partitions, because that produces the following error: {{Error while 
> executing topic command requirement failed: All partitions should have the 
> same number of replicas.}}, from which I cannot recover because I cannot run 
> {{kafka-reassign-partitions.sh}}.
> Is there a way to recover from such a situation? 


