[jira] [Issue Comment Deleted] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HongLiang updated KAFKA-6326:
-
Comment: was deleted

(was: [~liuweiwell] are you kidding?)

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HongLiang updated KAFKA-6326:
-
Attachment: (was: fast-recver-shutdownbroker.diff)

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283158#comment-16283158
 ] 

HongLiang edited comment on KAFKA-6326 at 12/8/17 7:32 AM:
---

sorry, the version is 0.10.2.1 [~huxi_2b]


was (Author: hongliang):
sorry, the version is 0.10.2.1

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283158#comment-16283158
 ] 

HongLiang commented on KAFKA-6326:
--

sorry, the version is 0.10.2.1

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread huxihx (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283156#comment-16283156
 ] 

huxihx commented on KAFKA-6326:
---

Based on the screenshot, it seems to be the 0.10.2.1 codebase. Did you run into 
this problem against 1.0.0?

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Updated] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HongLiang updated KAFKA-6326:
-
Attachment: (was: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png)

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283053#comment-16283053
 ] 

HongLiang edited comment on KAFKA-6326 at 12/8/17 4:51 AM:
---

[~huxi_2b]  zookeeper.session.timeout.ms is 120 sec, but the ZK session timeout 
is not the problem in this case.
The problem is that the broker has gone down, yet the controller still sends 
requests to it and waits for the 30 sec timeout. I think the controller should 
not wait, because it already knows the broker is down.


was (Author: hongliang):
[~huxi_2b]  zookeeper.session.timeout.ms is 120 sec, but the ZK session timeout 
is not the problem in this case.
The problem is that the broker has gone down (ZooKeeper already detects this 
within the session timeout), yet the controller still sends requests to it and 
waits for the 30 sec timeout. I think the controller should not wait, because 
it already knows the broker is down.

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283053#comment-16283053
 ] 

HongLiang commented on KAFKA-6326:
--

[~huxi_2b]  zookeeper.session.timeout.ms is 120 sec, but the ZK session timeout 
is not the problem in this case.
The problem is that the broker has gone down (ZooKeeper already detects this 
within the session timeout), yet the controller still sends requests to it and 
waits for the 30 sec timeout. I think the controller should not wait, because 
it already knows the broker is down.
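For context, the 30 sec figure most likely comes from the broker-side 
`request.timeout.ms` setting, which defaults to 30000 ms and bounds how long 
the controller waits for a response from a broker. A hedged workaround sketch 
(shortening the timeout also affects healthy-but-slow brokers, so this is a 
trade-off rather than a fix):

```properties
# server.properties sketch (assumption: the controller's per-request wait is
# governed by request.timeout.ms, whose default is 30000 ms)
request.timeout.ms=10000
```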

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Updated] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HongLiang updated KAFKA-6326:
-
Attachment: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread huxihx (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283028#comment-16283028
 ] 

huxihx commented on KAFKA-6326:
---

What's your `zookeeper.session.timeout.ms`? I am also curious about what the 
ZkEventThread complained about before shutting down the request-sending thread.

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately

2017-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283003#comment-16283003
 ] 

ASF GitHub Bot commented on KAFKA-6323:
---

GitHub user fredfp opened a pull request:

https://github.com/apache/kafka/pull/4304

KAFKA-6323: document that punctuation is called immediately.

If KAFKA-6323 is not a bug, then it needs better documentation.

Alternative to https://github.com/apache/kafka/pull/4301

@mihbor @mjsax

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fredfp/kafka KAFKA-6323

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4304


commit 935d0fc61084f383ec528d28f9c18f8b51fff1d2
Author: Frederic Arno 
Date:   2017-12-08T03:22:52Z

KAFKA-6323: document that punctuation is called immediately.




> punctuate with WALL_CLOCK_TIME triggered immediately
> 
>
> Key: KAFKA-6323
> URL: https://issues.apache.org/jira/browse/KAFKA-6323
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Frederic Arno
> Fix For: 1.1.0, 1.0.1
>
>
> While working on a custom Processor from which I am scheduling a punctuation 
> using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I 
> set, a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I could find that all 
> PunctuationSchedule's timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes PunctuationSchedule's timestamp 
> to 0, which I guess is what is causing an immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?
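The fix the reporter hints at is to seed each schedule's next-fire time with 
the current time plus the interval instead of 0. A minimal sketch under that 
assumption (the `PunctuationGate` class and its names are hypothetical, not 
Streams' actual internals):

```java
// Minimal sketch of the suggested fix: seed the first punctuation time with
// now + interval instead of 0. PunctuationGate is a hypothetical stand-in,
// not org.apache.kafka.streams.processor.internals.PunctuationQueue itself.
public class PunctuationGate {
    private final long intervalMs;
    private long nextPunctuateMs;

    public PunctuationGate(long nowMs, long intervalMs) {
        this.intervalMs = intervalMs;
        // Avoids the immediate first firing caused by a 0-initialized timestamp.
        this.nextPunctuateMs = nowMs + intervalMs;
    }

    // True when a punctuation is due; advances the schedule by one interval.
    public boolean mayPunctuate(long nowMs) {
        if (nowMs < nextPunctuateMs) {
            return false;
        }
        nextPunctuateMs = nowMs + intervalMs;
        return true;
    }
}
```

With this seeding, a gate created at t=1000 with a 500 ms interval first fires 
at t=1500 rather than immediately.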





[jira] [Created] (KAFKA-6329) Load trust store as a resource

2017-12-07 Thread Allen Wang (JIRA)
Allen Wang created KAFKA-6329:
-

 Summary: Load trust store as a resource
 Key: KAFKA-6329
 URL: https://issues.apache.org/jira/browse/KAFKA-6329
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 1.0.0, 0.11.0.0, 0.10.2.0
Reporter: Allen Wang


We would like to publish a Kafka client library with SSL enabled by default and 
distribute it to internal applications so that they can communicate with our 
brokers securely. We also need to distribute a trust store with our internal CA 
cert. In our library/application ecosystem, this is the easiest way to enable 
security without burdening each application with deploying a particular trust 
store.

However, that does not seem to be possible, as the Kafka client assumes that 
the trust store is in a local file system and uses FileInputStream, which does 
not work with classpath resources.

https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/ssl/SslFactory.java

Here is the actual line of code:

{code:java}
in = new FileInputStream(path);
{code}

Ideally we would also like to be able to load the trust store as a classpath 
resource:

{code:java}
in = this.getClass().getResourceAsStream(resourcePath);
{code}
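A hedged sketch of the requested behavior, trying the classpath first and 
falling back to the file system (the `TrustStoreLoader` helper is hypothetical, 
not part of Kafka's API):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not Kafka API: resolve a trust store path as a
// classpath resource first, falling back to the local file system.
public class TrustStoreLoader {
    public static InputStream openTrustStore(String path) throws IOException {
        InputStream in = TrustStoreLoader.class.getResourceAsStream(path);
        if (in != null) {
            return in; // found on the classpath, e.g. bundled inside the jar
        }
        return new FileInputStream(path); // existing file-system behavior
    }
}
```

`Class#getResourceAsStream` treats a leading `/` as the classpath root, so a 
trust store bundled at `src/main/resources/internal-ca.jks` (a hypothetical 
name) would be addressed as `/internal-ca.jks`.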








[jira] [Assigned] (KAFKA-6126) Reduce rebalance time by not checking if created topics are available

2017-12-07 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-6126:
--

Assignee: Matthias J. Sax

> Reduce rebalance time by not checking if created topics are available
> -
>
> Key: KAFKA-6126
> URL: https://issues.apache.org/jira/browse/KAFKA-6126
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
> Fix For: 1.1.0
>
>
> Within {{StreamPartitionAssignor#assign}} we create new topics and afterwards 
> wait in an "infinite loop" until the topic metadata has propagated throughout 
> the cluster. We do this to make sure topics are available when we start 
> processing.
> However, this approach "extends" the time spent in the rebalance phase and 
> thus hurts responsiveness (there are no calls to `poll` for the liveness 
> check, and {{KafkaStreams#close}} suffers). Thus, we might want to remove 
> this check and handle potential "topic not found" exceptions in the main 
> thread gracefully.





[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282953#comment-16282953
 ] 

HongLiang edited comment on KAFKA-6326 at 12/8/17 2:57 AM:
---

[~liuweiwell] are you kidding?


was (Author: hongliang):
[~liuweiwell] are you kidding?

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282953#comment-16282953
 ] 

HongLiang commented on KAFKA-6326:
--

[~liuweiwell] are you kidding?

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when a broker is unavailable (such as when the broker's machine is down), the 
> controller will wait for a 30 sec timeout by default. This timeout wait seems 
> unnecessary, and it increases the MTTR of the dead broker.





[jira] [Commented] (KAFKA-6318) StreamsResetter should return non-zero return code on error

2017-12-07 Thread siva santhalingam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282932#comment-16282932
 ] 

siva santhalingam commented on KAFKA-6318:
--

Hi [~mjsax], can I assign this to myself? Also, is 
maybeResetInputAndSeekToEndIntermediateTopicOffsets the only method that needs 
to be changed? Please let me know if I'm missing something.

> StreamsResetter should return non-zero return code on error
> ---
>
> Key: KAFKA-6318
> URL: https://issues.apache.org/jira/browse/KAFKA-6318
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, tools
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>
> If users specify a non-existing topic as an input parameter, 
> {{StreamsResetter}} only prints an error message that the topic was not 
> found, but the return code is still zero. We should return a non-zero return 
> code for this case.





[jira] [Commented] (KAFKA-4857) Replace StreamsKafkaClient with AdminClient in Kafka Streams

2017-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282785#comment-16282785
 ] 

ASF GitHub Bot commented on KAFKA-4857:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4242


> Replace StreamsKafkaClient with AdminClient in Kafka Streams
> 
>
> Key: KAFKA-4857
> URL: https://issues.apache.org/jira/browse/KAFKA-4857
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
> Fix For: 1.1.0
>
>
> Streams uses {{KafkaClientSupplier}} to get 
> consumer/restore-consumer/producer clients. Streams also uses one more client 
> for admin purposes, namely {{StreamsKafkaClient}}, which is instantiated 
> "manually".
> With the newly upcoming {{AdminClient}} from KIP-117, we can simplify, or 
> even replace, {{StreamsKafkaClient}} with the new {{AdminClient}}. We 
> furthermore want to unify how the client is generated and extend 
> {{KafkaClientSupplier}} with a method that returns this client.
> NOTE: The public-facing changes are summarized in a separate ticket, 
> KAFKA-6170; this ticket is only for the internal swap, with the acceptance 
> criterion of completely replacing StreamsKafkaClient with the newly 
> introduced KafkaAdminClient.





[jira] [Updated] (KAFKA-6328) Exclude node groups belonging to global stores in InternalTopologyBuilder#makeNodeGroups

2017-12-07 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6328:
---
Affects Version/s: 1.0.0

> Exclude node groups belonging to global stores in 
> InternalTopologyBuilder#makeNodeGroups
> 
>
> Key: KAFKA-6328
> URL: https://issues.apache.org/jira/browse/KAFKA-6328
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Guozhang Wang
>  Labels: newbie
>
> Today when we group processor nodes into groups (i.e. sub-topologies), we 
> assign sub-topology ids to global tables' dummy groups as well. As a result, 
> the sub-topology ids (and hence task ids) are no longer consecutive. This is 
> quite confusing for users troubleshooting and debugging; in addition, the 
> node groups for global stores are not useful either: we simply exclude them 
> in all the callers of makeNodeGroups.
> It would be better to simply exclude the global stores' node groups in this 
> function so that the sub-topology ids and task ids are consecutive.





[jira] [Created] (KAFKA-6328) Exclude node groups belonging to global stores in InternalTopologyBuilder#makeNodeGroups

2017-12-07 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-6328:


 Summary: Exclude node groups belonging to global stores in 
InternalTopologyBuilder#makeNodeGroups
 Key: KAFKA-6328
 URL: https://issues.apache.org/jira/browse/KAFKA-6328
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Guozhang Wang


Today when we group processor nodes into groups (i.e. sub-topologies), we 
assign sub-topology ids to global tables' dummy groups as well. As a result, 
the sub-topology ids (and hence task ids) are no longer consecutive. This is 
quite confusing for users troubleshooting and debugging; in addition, the node 
groups for global stores are not useful either: we simply exclude them in all 
the callers of makeNodeGroups.

It would be better to simply exclude the global stores' node groups in this 
function so that the sub-topology ids and task ids are consecutive.





[jira] [Commented] (KAFKA-6327) IllegalArgumentException in RocksDB when RocksDBException being generated

2017-12-07 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282593#comment-16282593
 ] 

Matthias J. Sax commented on KAFKA-6327:


Thanks for the heads up [~nyokodo]!

> IllegalArgumentException in RocksDB when RocksDBException being generated
> -
>
> Key: KAFKA-6327
> URL: https://issues.apache.org/jira/browse/KAFKA-6327
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Anthony May
>Priority: Minor
>
> RocksDB had a bug where RocksDBException subCodes related to disk usage were 
> not present, and when a RocksDBException is generated for those, it throws an 
> IllegalArgumentException instead, obscuring the error. This is 
> [fixed|https://github.com/facebook/rocksdb/pull/3050] in RocksDB master but 
> doesn't appear to have been released yet. Adding this issue so that it can be 
> tracked for a future release.
> Exception:
> {noformat}
> java.lang.IllegalArgumentException: Illegal value provided for SubCode.
>   at org.rocksdb.Status$SubCode.getSubCode(Status.java:109)
>   at org.rocksdb.Status.<init>(Status.java:30)
>   at org.rocksdb.RocksDB.write0(Native Method)
>   at org.rocksdb.RocksDB.write(RocksDB.java:602)
> {noformat}





[jira] [Commented] (KAFKA-6319) kafka-acls regression for comma characters (and maybe other characters as well)

2017-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282489#comment-16282489
 ] 

ASF GitHub Bot commented on KAFKA-6319:
---

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/4303

KAFKA-6319: Quote strings stored in JSON configs

This is required for ACLs where SSL principals contain special characters 
(e.g. comma) that are escaped using backslash. The strings need to be quoted 
for JSON to ensure that the JSON stored in ZK is valid.
Also converted `SslEndToEndAuthorizationTest` to use a principal with 
special characters to ensure that this path is tested.
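The root cause is that `\` is itself a JSON escape character, so a principal 
containing `\,` must be stored as `\\,` inside a quoted JSON string. A minimal 
illustration of the needed escaping (the `quoteForJson` helper is hypothetical, 
not Kafka's actual serialization code):

```java
// Minimal illustration (hypothetical helper, not Kafka's serialization code):
// a backslash is itself a JSON escape character, so raw backslashes and quotes
// must be escaped before the value is embedded in a JSON document.
public class JsonEscape {
    public static String quoteForJson(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : raw.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\'); // escape backslashes and double quotes
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

For the principal in this ticket, the value stored in ZK would need to hold a 
doubled backslash (`MyCompany\\,`) inside a quoted string for the document to 
parse as valid JSON.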


### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-6319

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4303


commit 9518f465a76b3431a36777d83d57bf1430eaa302
Author: Rajini Sivaram 
Date:   2017-12-07T20:00:03Z

KAFKA-6319: Quote strings in JSON to enable ACLs for principals with 
special chars




> kafka-acls regression for comma characters (and maybe other characters as 
> well)
> ---
>
> Key: KAFKA-6319
> URL: https://issues.apache.org/jira/browse/KAFKA-6319
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 1.0.0
> Environment: Debian 8. Java 8. SSL clients.
>Reporter: Jordan Mcmillan
>Assignee: Rajini Sivaram
>  Labels: regression
> Fix For: 1.1.0
>
>
> As of version 1.0.0, kafka-acls.sh no longer recognizes my ACLs stored in 
> zookeeper. I am using SSL and the default principal builder class. My 
> principal name contains a comma. Ex:
> "CN=myhost.mycompany.com,OU=MyOU,O=MyCompany, Inc.,ST=MyST,C=US"
> The default principal builder uses the getName() function in 
> javax.security.auth.x500:
> https://docs.oracle.com/javase/8/docs/api/javax/security/auth/x500/X500Principal.html#getName
> The documentation states "The only characters in attribute values that are 
> escaped are those which section 2.4 of RFC 2253 states must be escaped". This 
> makes sense, as my kafka-authorizer log shows the comma correctly escaped with 
> a backslash:
> INFO Principal = User:CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, 
> Inc.,ST=MyST,C=US is Denied Operation = Describe from host = 1.2.3.4 on 
> resource = Topic:mytopic (kafka.authorizer.logger)
> Here's what I get when I try to create the ACL in kafka 1.0:
> {code:java}
> > # kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add 
> > --allow-principal User:"CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, 
> > Inc.,ST=MyST,C=US" --operation "Describe" --allow-host "*" --topic="mytopic"
> Adding ACLs for resource `Topic:mytopic`:
>  User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US 
> has Allow permission for operations: Describe from hosts: *
> Current ACLs for resource `Topic:mytopic`:
>  "User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US has 
> Allow permission for operations: Describe from hosts: *">
> {code}
> Examining Zookeeper, I can see the data, though I notice that the JSON string 
> for ACLs is not actually valid, since the backslash is not escaped with a 
> double backslash. This was true for 0.11.0.1 as well, but was never actually 
> a problem.
> {code:java}
> > #  zk-shell localhost:2181
> Welcome to zk-shell (1.1.1)
> (CLOSED) />
> (CONNECTED) /> get /kafka-acl/Topic/mytopic
> {"version":1,"acls":[{"principal":"User:CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\,
>  
> Inc.,ST=MyST,C=US","permissionType":"Allow","operation":"Describe","host":"*"}]}
> (CONNECTED) /> json_get /kafka-acl/Topic/mytopic acls
> Path /kafka-acl/Topic/mytopic has bad JSON.
> {code}
> Now Kafka does not recognize any ACLs that have an escaped comma in the 
> principal name and all the clients are denied access. I tried searching for 

[jira] [Created] (KAFKA-6327) IllegalArgumentException in RocksDB when RocksDBException being generated

2017-12-07 Thread Anthony May (JIRA)
Anthony May created KAFKA-6327:
--

 Summary: IllegalArgumentException in RocksDB when RocksDBException 
being generated
 Key: KAFKA-6327
 URL: https://issues.apache.org/jira/browse/KAFKA-6327
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.0.0
Reporter: Anthony May
Priority: Minor


RocksDB had a bug where RocksDBException subCodes related to disk usage were 
not present, and when a RocksDBException is generated for those, it throws an 
IllegalArgumentException instead, obscuring the error. This is 
[fixed|https://github.com/facebook/rocksdb/pull/3050] in RocksDB master but 
doesn't appear to have been released yet. Adding this issue so that it can be 
tracked for a future release.

Exception:

{noformat}
java.lang.IllegalArgumentException: Illegal value provided for SubCode.
at org.rocksdb.Status$SubCode.getSubCode(Status.java:109)
at org.rocksdb.Status.<init>(Status.java:30)
at org.rocksdb.RocksDB.write0(Native Method)
at org.rocksdb.RocksDB.write(RocksDB.java:602)
{noformat}






[jira] [Updated] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately

2017-12-07 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6323:
---
Fix Version/s: 1.1.0

> punctuate with WALL_CLOCK_TIME triggered immediately
> 
>
> Key: KAFKA-6323
> URL: https://issues.apache.org/jira/browse/KAFKA-6323
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Frederic Arno
> Fix For: 1.1.0, 1.0.1
>
>
> While working on a custom Processor from which I schedule a punctuation 
> using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I 
> set, a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I found that all 
> PunctuationSchedules' timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes a PunctuationSchedule's timestamp 
> to 0, which I guess is what causes the immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?
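The behavior described in the report can be modeled with a minimal sketch of the scheduling check (hypothetical names; not the actual Kafka Streams PunctuationQueue source): a schedule fires once the current time reaches its stored timestamp, so a timestamp initialized to 0 fires on the very first check, while one initialized to now + interval waits a full interval.

```java
// Minimal model of the scheduling logic under discussion; names are
// illustrative, not the real Kafka Streams internals.
public class PunctuationSketch {
    static final class Schedule {
        long timestamp;      // next time the punctuator should fire
        final long interval;
        Schedule(long timestamp, long interval) { this.timestamp = timestamp; this.interval = interval; }
    }

    // Mirrors the mayPunctuate idea: fire if the schedule is due, then reschedule.
    static boolean mayPunctuate(Schedule s, long now) {
        if (now < s.timestamp) return false;
        s.timestamp = now + s.interval;
        return true;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long interval = 5_000L;
        Schedule zeroInit = new Schedule(0L, interval);
        Schedule deferredInit = new Schedule(now + interval, interval);
        System.out.println(mayPunctuate(zeroInit, now));                  // true: fires immediately
        System.out.println(mayPunctuate(deferredInit, now));              // false: waits one interval
        System.out.println(mayPunctuate(deferredInit, now + interval));   // true: fires on schedule
    }
}
```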





[jira] [Commented] (KAFKA-6260) AbstractCoordinator not clearly handles NULL Exception

2017-12-07 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282274#comment-16282274
 ] 

Matthias J. Sax commented on KAFKA-6260:


You could check out {{https://github.com/apache/kafka/tree/1.0}} and build it 
yourself. Then update your application's dependencies to use the new client jars.

> AbstractCoordinator not clearly handles NULL Exception
> --
>
> Key: KAFKA-6260
> URL: https://issues.apache.org/jira/browse/KAFKA-6260
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: RedHat Linux
>Reporter: Seweryn Habdank-Wojewodzki
>Assignee: Jason Gustafson
> Fix For: 1.1.0, 1.0.1
>
>
> The error reporting is not clear, but it seems that the Kafka heartbeat shuts 
> down the application due to a NULL exception caused by "fake" disconnections.
> One more comment: we are processing messages in the stream, but sometimes we 
> have to block processing for minutes, as the consumers cannot handle too much 
> load. Is it possible that while the stream is waiting, the heartbeat is 
> blocked as well?
> Can you check that?
> {code}
> 2017-11-23 23:54:47 DEBUG AbstractCoordinator:177 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Received successful Heartbeat response
> 2017-11-23 23:54:50 DEBUG AbstractCoordinator:183 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Sending Heartbeat request to coordinator 
> cljp01.eb.lan.at:9093 (id: 2147483646 rack: null)
> 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Sending HEARTBEAT 
> {group_id=kafka-endpoint,generation_id=3834,member_id=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer-94f18be5-e49a-4817-9e5a-fe82a64e0b08}
>  with correlation id 24 to node 2147483646
> 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Completed receive from node 2147483646 for HEARTBEAT 
> with correlation id 24, received {throttle_time_ms=0,error_code=0}
> 2017-11-23 23:54:50 DEBUG AbstractCoordinator:177 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Received successful Heartbeat response
> 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Disconnecting from node 1 due to request timeout.
> 2017-11-23 23:54:52 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Cancelled request 
> {replica_id=-1,max_wait_time=6000,min_bytes=1,max_bytes=52428800,isolation_level=0,topics=[{topic=clj_internal_topic,partitions=[{partition=6,fetch_offset=211558395,log_start_offset=-1,max_bytes=1048576},{partition=8,fetch_offset=210178209,log_start_offset=-1,max_bytes=1048576},{partition=0,fetch_offset=209353523,log_start_offset=-1,max_bytes=1048576},{partition=2,fetch_offset=209291462,log_start_offset=-1,max_bytes=1048576},{partition=4,fetch_offset=210728595,log_start_offset=-1,max_bytes=1048576}]}]}
>  with correlation id 21 due to node 1 being disconnected
> 2017-11-23 23:54:52 DEBUG ConsumerNetworkClient:195 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Cancelled FETCH request RequestHeader(apiKey=FETCH, 
> apiVersion=6, 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  correlationId=21) with correlation id 21 due to node 1 being disconnected
> 2017-11-23 23:54:52 DEBUG Fetcher:195 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Fetch request 
> {clj_internal_topic-6=(offset=211558395, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-8=(offset=210178209, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-0=(offset=209353523, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-2=(offset=209291462, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-4=(offset=210728595, logStartOffset=-1, 
> maxBytes=1048576)} to cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) failed 
> org.apache.kafka.common.errors.DisconnectException: null
> 2017-11-23 23:54:52 TRACE NetworkClient:123 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  grou

[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.

2017-12-07 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282200#comment-16282200
 ] 

Ismael Juma commented on KAFKA-6322:


The issue was always related to holding a file handle while trying to delete 
the file. KAFKA-6324 fixes one instance of that.
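The delete-vs-deleteIfExists distinction behind KAFKA-6324 can be shown with the standard java.nio.file API: Files.delete throws NoSuchFileException when the target is already gone, while Files.deleteIfExists simply returns false, which is the safer behavior during recovery. A minimal sketch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteSemantics {
    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("segment", ".log");
        Files.delete(f);                              // first delete succeeds
        System.out.println(Files.deleteIfExists(f));  // false: already gone, no exception
        try {
            Files.delete(f);                          // plain delete on a missing file throws
        } catch (NoSuchFileException e) {
            System.out.println("NoSuchFileException");
        }
    }
}
```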

> Error deleting log for topic, all log dirs failed.
> --
>
> Key: KAFKA-6322
> URL: https://issues.apache.org/jira/browse/KAFKA-6322
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: RancherOS with NFS mounted for log directory
>Reporter: dongyan li
>
> Hello,
> I encountered an error when trying to delete a topic with Kafka version 1.0.0; 
> the error is not present on version 0.10.2.1, which is the version I upgraded 
> from.
> I suspect that some other thread is still using the file while Kafka is 
> trying to delete it.
> {code:java}
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,846] ERROR Exception while deleting 
> Log(/opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete) in 
> dir /opt/kafka/logs. (kafka.log.LogManager)
> 12/6/2017 3:37:32 PMorg.apache.kafka.common.errors.KafkaStorageException: 
> Error while deleting log for topicname-0 in dir /opt/kafka/logs
> 12/6/2017 3:37:32 PMCaused by: java.nio.file.FileSystemException: 
> /opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete/.nfs00e609f200ce:
>  Device or resource busy
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.delete(Files.java:1126)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:630)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:619)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.walkFileTree(Files.java:2670)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.walkFileTree(Files.java:2742)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils.delete(Utils.java:619)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.$anonfun$delete$2(Log.scala:1432)
> 12/6/2017 3:37:32 PM  at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.delete(Log.scala:1427)
> 12/6/2017 3:37:32 PM  at kafka.log.LogManager.deleteLogs(LogManager.scala:626)
> 12/6/2017 3:37:32 PM  at 
> kafka.log.LogManager.$anonfun$startup$7(LogManager.scala:362)
> 12/6/2017 3:37:32 PM  at 
> kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:110)
> 12/6/2017 3:37:32 PM  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 12/6/2017 3:37:32 PM  at java.lang.Thread.run(Thread.java:748)
> {code}
> Then
> {code:java}
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,882] INFO Stopping serving logs in 
> dir /opt/kafka/logs (kafka.log.LogManager)
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,883] FATAL Shutdown broker because 
> all log dirs in /opt/kafka/logs have failed (kafka.log.LogManager)
> {code}
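The stack trace above walks through a recursive directory delete. A sketch of that pattern (patterned after org.apache.kafka.common.utils.Utils.delete, not a copy of it): Files.walkFileTree removes every file, then each directory on the way back up. On NFS, a still-open file is silly-renamed to a .nfsXXXX entry that Files.delete cannot remove, which surfaces as the FileSystemException ("Device or resource busy") seen in the log.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

public class RecursiveDelete {
    static void deleteTree(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                // Throws FileSystemException if the file is busy (e.g. an NFS .nfsXXXX entry)
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }
            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("topicname-0-delete");
        Files.createFile(root.resolve("00000000000000000000.log"));
        deleteTree(root);
        System.out.println(Files.exists(root)); // false: tree fully removed
    }
}
```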





[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.

2017-12-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282186#comment-16282186
 ] 

Ted Yu commented on KAFKA-6322:
---

Edited previous comment.
Since Dongyan's case was not one where the file did not exist at deletion time, 
we should look for the possible cause of the file still being in use.






[jira] [Comment Edited] (KAFKA-6322) Error deleting log for topic, all log dirs failed.

2017-12-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282120#comment-16282120
 ] 

Ted Yu edited comment on KAFKA-6322 at 12/7/17 5:20 PM:


FileSystemException was encountered in Dongyan's case (which is not among the 
exceptions thrown by 
https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#deleteIfExists(java.nio.file.Path)).

Edit:
If the problem is not fixed, I am not sure what exception would surface when 
deleting the file.


was (Author: yuzhih...@gmail.com):
FileSystemException was encountered in Dongyan's case (which is not among the 
exceptions thrown by 
https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#deleteIfExists(java.nio.file.Path)).







[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.

2017-12-07 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282156#comment-16282156
 ] 

Ismael Juma commented on KAFKA-6322:


It was thrown by Files.delete, as shown by the stack trace. Not sure how the 
javadoc helps here.






[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.

2017-12-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282120#comment-16282120
 ] 

Ted Yu commented on KAFKA-6322:
---

FileSystemException was encountered in Dongyan's case (which is not among the 
exceptions thrown by 
https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#deleteIfExists(java.nio.file.Path)).







[jira] [Commented] (KAFKA-6199) Single broker with fast growing heap usage

2017-12-07 Thread Manikumar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282116#comment-16282116
 ] 

Manikumar commented on KAFKA-6199:
--

[~rt_skyscanner] yes, jstack output

> Single broker with fast growing heap usage
> --
>
> Key: KAFKA-6199
> URL: https://issues.apache.org/jira/browse/KAFKA-6199
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.1
> Environment: Amazon Linux
>Reporter: Robin Tweedie
> Attachments: Screen Shot 2017-11-10 at 1.55.33 PM.png, Screen Shot 
> 2017-11-10 at 11.59.06 AM.png, dominator_tree.png, histo_live.txt, 
> histo_live_20171206.txt, histo_live_80.txt, merge_shortest_paths.png, 
> path2gc.png
>
>
> We have a single broker in our cluster of 25 with fast-growing heap usage, 
> which necessitates restarting it every 12 hours. If we don't restart the 
> broker, it becomes very slow from long GC pauses and eventually hits 
> {{OutOfMemory}} errors.
> See {{Screen Shot 2017-11-10 at 11.59.06 AM.png}} for a graph of heap usage 
> percentage on the broker. A "normal" broker in the same cluster stays below 
> 50% (averaged) over the same time period.
> We have taken heap dumps when the broker's heap usage is getting dangerously 
> high, and there are a lot of retained {{NetworkSend}} objects referencing 
> byte buffers.
> We also noticed that the single affected broker logs a lot more of this kind 
> of warning than any other broker:
> {noformat}
> WARN Attempting to send response via channel for which there is no open 
> connection, connection id 13 (kafka.network.Processor)
> {noformat}
> See {{Screen Shot 2017-11-10 at 1.55.33 PM.png}} for counts of that WARN log 
> message visualized across all the brokers (to show it happens a bit on other 
> brokers, but not nearly as much as it does on the "bad" broker).
> I can't make the heap dumps public, but would appreciate advice on how to pin 
> down the problem better. We're currently trying to narrow it down to a 
> particular client, but without much success so far.
> Let me know what else I could investigate or share to track down the source 
> of this leak.





[jira] [Commented] (KAFKA-6199) Single broker with fast growing heap usage

2017-12-07 Thread Robin Tweedie (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282111#comment-16282111
 ] 

Robin Tweedie commented on KAFKA-6199:
--

[~omkreddy] Nothing obvious beyond the warnings I shared further up. I'll have 
another look.

When you say thread dump, just the output of {{jstack PID}}?






[jira] [Commented] (KAFKA-6324) Change LogSegment.delete to deleteIfExists and harden log recovery

2017-12-07 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282109#comment-16282109
 ] 

Ismael Juma commented on KAFKA-6324:


https://github.com/apache/kafka/pull/4040

> Change LogSegment.delete to deleteIfExists and harden log recovery
> --
>
> Key: KAFKA-6324
> URL: https://issues.apache.org/jira/browse/KAFKA-6324
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 1.1.0
>
>
> Fixes KAFKA-6194 and a delete while open issue (KAFKA-6322 and KAFKA-6075) 
> and makes the code more robust.





[jira] [Updated] (KAFKA-6324) Change LogSegment.delete to deleteIfExists and harden log recovery

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-6324:
---
Description: Fixes KAFKA-6194 and a delete while open issue (KAFKA-6322 and 
KAFKA-6075) and makes the code more robust.  (was: Fixes KAFKA-6194 and makes 
the code more robust.)

> Change LogSegment.delete to deleteIfExists and harden log recovery
> --
>
> Key: KAFKA-6324
> URL: https://issues.apache.org/jira/browse/KAFKA-6324
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 1.1.0
>
>
> Fixes KAFKA-6194 and a delete while open issue (KAFKA-6322 and KAFKA-6075) 
> and makes the code more robust.





[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.

2017-12-07 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282105#comment-16282105
 ] 

Ismael Juma commented on KAFKA-6322:


Probably a duplicate of KAFKA-6324, that is nearly ready to be merged.

> Error deleting log for topic, all log dirs failed.
> --
>
> Key: KAFKA-6322
> URL: https://issues.apache.org/jira/browse/KAFKA-6322
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: RancherOS with NFS mounted for log directory
>Reporter: dongyan li
>
> Hello,
> I encountered an error when trying to delete a topic with Kafka version 
> 1.0.0; the error is not present on version 0.10.2.1, which is the version I 
> upgraded from.
> I suspect that some other thread is still using that file while Kafka is 
> trying to delete it.
> {code:java}
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,846] ERROR Exception while deleting 
> Log(/opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete) in 
> dir /opt/kafka/logs. (kafka.log.LogManager)
> 12/6/2017 3:37:32 PMorg.apache.kafka.common.errors.KafkaStorageException: 
> Error while deleting log for topicname-0 in dir /opt/kafka/logs
> 12/6/2017 3:37:32 PMCaused by: java.nio.file.FileSystemException: 
> /opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete/.nfs00e609f200ce:
>  Device or resource busy
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.delete(Files.java:1126)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:630)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:619)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.walkFileTree(Files.java:2670)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.walkFileTree(Files.java:2742)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils.delete(Utils.java:619)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.$anonfun$delete$2(Log.scala:1432)
> 12/6/2017 3:37:32 PM  at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.delete(Log.scala:1427)
> 12/6/2017 3:37:32 PM  at kafka.log.LogManager.deleteLogs(LogManager.scala:626)
> 12/6/2017 3:37:32 PM  at 
> kafka.log.LogManager.$anonfun$startup$7(LogManager.scala:362)
> 12/6/2017 3:37:32 PM  at 
> kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:110)
> 12/6/2017 3:37:32 PM  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 12/6/2017 3:37:32 PM  at java.lang.Thread.run(Thread.java:748)
> {code}
> Then
> {code:java}
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,882] INFO Stopping serving logs in 
> dir /opt/kafka/logs (kafka.log.LogManager)
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,883] FATAL Shutdown broker because 
> all log dirs in /opt/kafka/logs have failed (kafka.log.LogManager)
> {code}





[jira] [Commented] (KAFKA-6322) Error deleting log for topic, all log dirs failed.

2017-12-07 Thread dongyan li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282064#comment-16282064
 ] 

dongyan li commented on KAFKA-6322:
---

Yes, I think they stem from the same problem. However, instead of changing the 
delete methods, we may want to find out who is actually holding the file 
handle when the system tries to delete the folder. I know NFS is not a 
"regular" filesystem, and that is the root problem.

> Error deleting log for topic, all log dirs failed.
> --
>
> Key: KAFKA-6322
> URL: https://issues.apache.org/jira/browse/KAFKA-6322
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: RancherOS with NFS mounted for log directory
>Reporter: dongyan li
>
> Hello,
> I encountered an error when trying to delete a topic with Kafka version 
> 1.0.0; the error is not present on version 0.10.2.1, which is the version I 
> upgraded from.
> I suspect that some other thread is still using that file while Kafka is 
> trying to delete it.
> {code:java}
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,846] ERROR Exception while deleting 
> Log(/opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete) in 
> dir /opt/kafka/logs. (kafka.log.LogManager)
> 12/6/2017 3:37:32 PMorg.apache.kafka.common.errors.KafkaStorageException: 
> Error while deleting log for topicname-0 in dir /opt/kafka/logs
> 12/6/2017 3:37:32 PMCaused by: java.nio.file.FileSystemException: 
> /opt/kafka/logs/topicname-0.112ff11872e4411ca7470ba7e3026ab0-delete/.nfs00e609f200ce:
>  Device or resource busy
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
> 12/6/2017 3:37:32 PM  at 
> sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.delete(Files.java:1126)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:630)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils$1.visitFile(Utils.java:619)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.walkFileTree(Files.java:2670)
> 12/6/2017 3:37:32 PM  at java.nio.file.Files.walkFileTree(Files.java:2742)
> 12/6/2017 3:37:32 PM  at 
> org.apache.kafka.common.utils.Utils.delete(Utils.java:619)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.$anonfun$delete$2(Log.scala:1432)
> 12/6/2017 3:37:32 PM  at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
> 12/6/2017 3:37:32 PM  at kafka.log.Log.delete(Log.scala:1427)
> 12/6/2017 3:37:32 PM  at kafka.log.LogManager.deleteLogs(LogManager.scala:626)
> 12/6/2017 3:37:32 PM  at 
> kafka.log.LogManager.$anonfun$startup$7(LogManager.scala:362)
> 12/6/2017 3:37:32 PM  at 
> kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:110)
> 12/6/2017 3:37:32 PM  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 12/6/2017 3:37:32 PM  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 12/6/2017 3:37:32 PM  at java.lang.Thread.run(Thread.java:748)
> {code}
> Then
> {code:java}
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,882] INFO Stopping serving logs in 
> dir /opt/kafka/logs (kafka.log.LogManager)
> 12/6/2017 3:37:32 PM[2017-12-06 21:37:32,883] FATAL Shutdown broker because 
> all log dirs in /opt/kafka/logs have failed (kafka.log.LogManager)
> {code}





[jira] [Commented] (KAFKA-6199) Single broker with fast growing heap usage

2017-12-07 Thread Manikumar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282041#comment-16282041
 ] 

Manikumar commented on KAFKA-6199:
--

Are there any exceptions in the broker logs? Can you upload a thread dump of 
the problematic broker?

> Single broker with fast growing heap usage
> --
>
> Key: KAFKA-6199
> URL: https://issues.apache.org/jira/browse/KAFKA-6199
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.1
> Environment: Amazon Linux
>Reporter: Robin Tweedie
> Attachments: Screen Shot 2017-11-10 at 1.55.33 PM.png, Screen Shot 
> 2017-11-10 at 11.59.06 AM.png, dominator_tree.png, histo_live.txt, 
> histo_live_20171206.txt, histo_live_80.txt, merge_shortest_paths.png, 
> path2gc.png
>
>
> We have a single broker in our cluster of 25 with fast growing heap usage 
> which necessitates us restarting it every 12 hours. If we don't restart the 
> broker, it becomes very slow from long GC pauses and eventually has 
> {{OutOfMemory}} errors.
> See {{Screen Shot 2017-11-10 at 11.59.06 AM.png}} for a graph of heap usage 
> percentage on the broker. A "normal" broker in the same cluster stays below 
> 50% (averaged) over the same time period.
> We have taken heap dumps when the broker's heap usage is getting dangerously 
> high, and there are a lot of retained {{NetworkSend}} objects referencing 
> byte buffers.
> We also noticed that the single affected broker logs a lot more of this kind 
> of warning than any other broker:
> {noformat}
> WARN Attempting to send response via channel for which there is no open 
> connection, connection id 13 (kafka.network.Processor)
> {noformat}
> See {{Screen Shot 2017-11-10 at 1.55.33 PM.png}} for counts of that WARN log 
> message visualized across all the brokers (to show it happens a bit on other 
> brokers, but not nearly as much as it does on the "bad" broker).
> I can't make the heap dumps public, but would appreciate advice on how to pin 
> down the problem better. We're currently trying to narrow it down to a 
> particular client, but without much success so far.
> Let me know what else I could investigate or share to track down the source 
> of this leak.





[jira] [Commented] (KAFKA-6325) Producer.flush() doesn't throw exception on timeout

2017-12-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282030#comment-16282030
 ] 

Ted Yu commented on KAFKA-6325:
---

I assume you have modified your producer code to accommodate this behavior.

Looks like option #2 can be adopted.

> Producer.flush() doesn't throw exception on timeout
> ---
>
> Key: KAFKA-6325
> URL: https://issues.apache.org/jira/browse/KAFKA-6325
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Erik Scheuter
> Attachments: FlushTest.java
>
>
> Reading the javadoc of the flush() method, we assumed an exception would be 
> thrown when an error occurs. This would make the code more understandable, 
> as we would not have to return a list of futures if we want to send multiple 
> records to Kafka and eventually call future.get().
> When send() is called, the metadata is retrieved and send is blocked on this 
> process. When this process fails (no brokers), a FutureFailure is returned.
> When you just flush, no exceptions are thrown (in contrast to 
> future.get()). Of course, you can implement callbacks in the send method.
> I think there are two solutions:
> * Change flush() (and doSend()) to throw exceptions
> * Change the javadoc and describe the scenario in which you can lose events 
> because no exceptions are thrown and the events are not sent.
> I added a unit test to show the behaviour. Kafka doesn't have to be 
> available for this.





[jira] [Issue Comment Deleted] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread wei liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei liu updated KAFKA-6326:
---
Comment: was deleted

(was: Hi [~China body] 
{panel:title=My SQL 
code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
{code:}
select * from table;
{code}


{code:xml}



{code}
{panel})

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when broker is unavailable (such as the broker's machine is down), the 
> controller will wait on a 30 sec timeout by default. It seems that this 
> timeout wait is not necessary, and it increases the MTTR of the dead broker.





[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread wei liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281987#comment-16281987
 ] 

wei liu edited comment on KAFKA-6326 at 12/7/17 3:19 PM:
-

Hi [~China body] 
{panel:title=My SQL 
code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
{code:}
select * from table;
{code}


{code:xml}



{code}
{panel}


was (Author: liuweiwell):
{panel:title=My SQL 
code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
{code:}
select * from table;
{code}


{code:xml}



{code}
{panel}

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when broker is unavailable (such as the broker's machine is down), the 
> controller will wait on a 30 sec timeout by default. It seems that this 
> timeout wait is not necessary, and it increases the MTTR of the dead broker.





[jira] [Comment Edited] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread wei liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281987#comment-16281987
 ] 

wei liu edited comment on KAFKA-6326 at 12/7/17 3:18 PM:
-

{panel:title=My SQL 
code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
{code:}
select * from table;
{code}


{code:xml}



{code}
{panel}


was (Author: liuweiwell):
{panel:title=My SQL 
code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
{code:}
select * from table;
{code}
{panel}

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when broker is unavailable (such as the broker's machine is down), the 
> controller will wait on a 30 sec timeout by default. It seems that this 
> timeout wait is not necessary, and it increases the MTTR of the dead broker.





[jira] [Commented] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread wei liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281987#comment-16281987
 ] 

wei liu commented on KAFKA-6326:


{panel:title=My SQL 
code|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
{code:}
select * from table;
{code}
{panel}

> when broker is unavailable(such as broker's machine is down), controller will 
> wait 30 sec timeout 
> --
>
> Key: KAFKA-6326
> URL: https://issues.apache.org/jira/browse/KAFKA-6326
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 1.0.0
>Reporter: HongLiang
> Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
> fast-recver-shutdownbroker.diff
>
>
> when broker is unavailable (such as the broker's machine is down), the 
> controller will wait on a 30 sec timeout by default. It seems that this 
> timeout wait is not necessary, and it increases the MTTR of the dead broker.





[jira] [Created] (KAFKA-6326) when broker is unavailable(such as broker's machine is down), controller will wait 30 sec timeout

2017-12-07 Thread HongLiang (JIRA)
HongLiang created KAFKA-6326:


 Summary: when broker is unavailable(such as broker's machine is 
down), controller will wait 30 sec timeout 
 Key: KAFKA-6326
 URL: https://issues.apache.org/jira/browse/KAFKA-6326
 Project: Kafka
  Issue Type: Bug
  Components: controller
Affects Versions: 1.0.0
Reporter: HongLiang
 Attachments: _f7d1d2b4-39ae-4e02-8519-99bcba849668.png, 
fast-recver-shutdownbroker.diff

when broker is unavailable (such as the broker's machine is down), the 
controller will wait on a 30 sec timeout by default. It seems that this 
timeout wait is not necessary, and it increases the MTTR of the dead broker.
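If the 30-second wait comes from the controller's request socket timeout, the 
knob involved would be controller.socket.timeout.ms, whose shipped default is 
30000 ms. A hypothetical override is sketched below; the value is illustrative, 
and lowering it trades faster failover for more spurious timeouts against 
brokers that are merely slow:

```properties
# server.properties -- hypothetical override; the default is 30000 (30 s).
# Lowering it makes the controller give up on an unreachable broker sooner.
controller.socket.timeout.ms=10000
```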





[jira] [Created] (KAFKA-6325) Producer.flush() doesn't throw exception on timeout

2017-12-07 Thread Erik Scheuter (JIRA)
Erik Scheuter created KAFKA-6325:


 Summary: Producer.flush() doesn't throw exception on timeout
 Key: KAFKA-6325
 URL: https://issues.apache.org/jira/browse/KAFKA-6325
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Reporter: Erik Scheuter
 Attachments: FlushTest.java

Reading the javadoc of the flush() method, we assumed an exception would be 
thrown when an error occurs. This would make the code more understandable, as 
we would not have to return a list of futures if we want to send multiple 
records to Kafka and eventually call future.get().

When send() is called, the metadata is retrieved and send is blocked on this 
process. When this process fails (no brokers), a FutureFailure is returned.

When you just flush, no exceptions are thrown (in contrast to future.get()). 
Of course, you can implement callbacks in the send method.

I think there are two solutions:
* Change flush() (and doSend()) to throw exceptions
* Change the javadoc and describe the scenario in which you can lose events 
because no exceptions are thrown and the events are not sent.

I added a unit test to show the behaviour. Kafka doesn't have to be available 
for this.
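The reported contrast between flush() and future.get() can be sketched with 
plain java.util.concurrent futures, independent of Kafka (a stand-in 
illustration of the two waiting styles, not the producer's implementation):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class FlushVsGet {
    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for KafkaProducer.send(): a future that
        // completes exceptionally because no broker is reachable.
        CompletableFuture<Long> send = new CompletableFuture<>();
        send.completeExceptionally(new TimeoutException("no brokers available"));

        // flush()-style barrier: wait for completion, ignore the outcome.
        // It returns normally even though the send failed -- silent loss.
        CountDownLatch allDone = new CountDownLatch(1);
        send.whenComplete((value, error) -> allDone.countDown());
        allDone.await();
        System.out.println("flush returned normally");

        // future.get()-style check: the failure surfaces as ExecutionException.
        try {
            send.get();
        } catch (ExecutionException e) {
            System.out.println("get threw: " + e.getCause().getMessage());
        }
    }
}
```

Running it shows the flush-style barrier returning normally while get() 
surfaces the failure, which is exactly the silent-loss scenario the second 
proposed solution would document.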





[jira] [Commented] (KAFKA-3240) Replication issues

2017-12-07 Thread Sergey AKhmatov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281779#comment-16281779
 ] 

Sergey AKhmatov commented on KAFKA-3240:


Happens to me as well.
kafka 1.0.0
3-node setup on FreeBSD-11.1
openjdk8-8.152.16

Sometimes a .log file becomes corrupted on random topics and partitions.
hexdump shows zeroes in places where a record should be.

Can't figure out how to reproduce.

> Replication issues
> --
>
> Key: KAFKA-3240
> URL: https://issues.apache.org/jira/browse/KAFKA-3240
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1
> Environment: FreeBSD 10.2-RELEASE-p9
>Reporter: Jan Omar
>  Labels: reliability
>
> Hi,
> We are trying to replace our 3-broker cluster running on 0.6 with a new 
> cluster on 0.9.0.1 (but tried 0.8.2.2 and 0.9.0.0 as well).
> - 3 kafka nodes with one zookeeper instance on each machine
> - FreeBSD 10.2 p9
> - Nagle off (sysctl net.inet.tcp.delayed_ack=0)
> - all kafka machines write a ZFS ZIL to a dedicated SSD
> - 3 producers on 3 machines, writing to 1 topic with 3 partitions, 
> replication factor 3
> - acks all
> - 10 Gigabit Ethernet, all machines on one switch, ping 0.05 ms worst case.
> While using the ProducerPerformance or rdkafka_performance we are seeing very 
> strange Replication errors. Any hint on what's going on would be highly 
> appreciated. Any suggestion on how to debug this properly would help as well.
> This is what our broker config looks like:
> {code}
> broker.id=5
> auto.create.topics.enable=false
> delete.topic.enable=true
> listeners=PLAINTEXT://:9092
> port=9092
> host.name=kafka-five.acc
> advertised.host.name=10.5.3.18
> zookeeper.connect=zookeeper-four.acc:2181,zookeeper-five.acc:2181,zookeeper-six.acc:2181
> zookeeper.connection.timeout.ms=6000
> num.replica.fetchers=1
> replica.fetch.max.bytes=1
> replica.fetch.wait.max.ms=500
> replica.high.watermark.checkpoint.interval.ms=5000
> replica.socket.timeout.ms=30
> replica.socket.receive.buffer.bytes=65536
> replica.lag.time.max.ms=1000
> min.insync.replicas=2
> controller.socket.timeout.ms=3
> controller.message.queue.size=100
> log.dirs=/var/db/kafka
> num.partitions=8
> message.max.bytes=1
> auto.create.topics.enable=false
> log.index.interval.bytes=4096
> log.index.size.max.bytes=10485760
> log.retention.hours=168
> log.flush.interval.ms=1
> log.flush.interval.messages=2
> log.flush.scheduler.interval.ms=2000
> log.roll.hours=168
> log.retention.check.interval.ms=30
> log.segment.bytes=536870912
> zookeeper.connection.timeout.ms=100
> zookeeper.sync.time.ms=5000
> num.io.threads=8
> num.network.threads=4
> socket.request.max.bytes=104857600
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
> queued.max.requests=10
> fetch.purgatory.purge.interval.requests=100
> producer.purgatory.purge.interval.requests=100
> replica.lag.max.messages=1000
> {code}
> These are the errors we're seeing:
> {code:borderStyle=solid}
> ERROR [Replica Manager on Broker 5]: Error processing fetch operation on 
> partition [test,0] offset 50727 (kafka.server.ReplicaManager)
> java.lang.IllegalStateException: Invalid message size: 0
>   at kafka.log.FileMessageSet.searchFor(FileMessageSet.scala:141)
>   at kafka.log.LogSegment.translateOffset(LogSegment.scala:105)
>   at kafka.log.LogSegment.read(LogSegment.scala:126)
>   at kafka.log.Log.read(Log.scala:506)
>   at 
> kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:536)
>   at 
> kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:507)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:507)
>   at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:462)
>   at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:431)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:69)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> and 
> {code}
> ERROR Found invalid messages during fetch for partition [test,0] offset 2732 
> error Message found with corrupt size (0) in shallow iterator 
> (kafka
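The "corrupt size (0)" error above can be made concrete with a small, 
self-contained scan. Assuming the legacy on-disk entry layout of an 8-byte 
offset followed by a 4-byte size (a simplification of the real log format; the 
helper below is hypothetical, not Kafka code), a zeroed-out region reads back 
as a size-0 entry, which is what the fetch path rejects:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ZeroSizeScan {
    // Hypothetical scanner over [offset: 8 bytes][size: 4 bytes][payload].
    // Returns the byte position of the first size <= 0 entry, or -1.
    static long findZeroSizeEntry(Path segment) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(segment));
        while (buf.remaining() >= 12) {
            int pos = buf.position();
            buf.getLong();             // logical offset (unused here)
            int size = buf.getInt();
            if (size <= 0) return pos; // corrupt: zeroed-out region
            buf.position(pos + 12 + size);
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path seg = Files.createTempFile("segment", ".log");
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putLong(0).putInt(3).put(new byte[]{1, 2, 3}); // one good entry
        buf.putLong(1).putInt(0);                          // zeroed entry
        buf.flip();
        Files.write(seg, Arrays.copyOf(buf.array(), buf.limit()));
        System.out.println("corrupt entry at byte " + findZeroSizeEntry(seg));
    }
}
```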

[jira] [Commented] (KAFKA-1996) Scaladoc error: unknown tag parameter

2017-12-07 Thread Andy Doddington (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281760#comment-16281760
 ] 

Andy Doddington commented on KAFKA-1996:


I am getting this error on Scaladoc comments that I have added to a method in a trait. 
Curiously, the error only appears on the third @param definition in the method 
header (third out of five - the other three have no errors). I am using 
IntelliJ IDEA CE 2017.3 Build IC-173.3727.127.

> Scaladoc error: unknown tag parameter
> -
>
> Key: KAFKA-1996
> URL: https://issues.apache.org/jira/browse/KAFKA-1996
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Yaguo Zhou
>Assignee: Yaguo Zhou
>Priority: Minor
>  Labels: doc
> Attachments: scala-doc-unknown-tag-parameter.patch
>
>
> There are some Scaladoc errors: unknown tag parameter





[jira] [Created] (KAFKA-6324) Change LogSegment.delete to deleteIfExists and harden log recovery

2017-12-07 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-6324:
--

 Summary: Change LogSegment.delete to deleteIfExists and harden log 
recovery
 Key: KAFKA-6324
 URL: https://issues.apache.org/jira/browse/KAFKA-6324
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Ismael Juma
 Fix For: 1.1.0


Fixes KAFKA-6194 and makes the code more robust.
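The semantic difference behind the rename can be sketched with plain 
java.nio.file (an illustration of delete vs. deleteIfExists, not the 
LogSegment code itself): delete throws NoSuchFileException when the file is 
already gone, which can abort a recovery re-run, while deleteIfExists simply 
reports the outcome.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class DeleteIfExistsDemo {
    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("segment", ".log");

        // First delete succeeds; a second delete() on the same path throws,
        // which is hostile to crash-and-retry code paths like log recovery.
        Files.delete(f);
        try {
            Files.delete(f);
        } catch (NoSuchFileException e) {
            System.out.println("second delete threw NoSuchFileException");
        }

        // deleteIfExists() reports the outcome instead of throwing,
        // so repeating it after a crash is harmless.
        boolean removed = Files.deleteIfExists(f);
        System.out.println("deleteIfExists returned " + removed);
    }
}
```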





[jira] [Assigned] (KAFKA-5272) Improve validation for Alter Configs (KIP-133)

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-5272:
--

Assignee: (was: Ismael Juma)

> Improve validation for Alter Configs (KIP-133)
> --
>
> Key: KAFKA-5272
> URL: https://issues.apache.org/jira/browse/KAFKA-5272
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
> Fix For: 1.1.0
>
>
> TopicConfigHandler.processConfigChanges() warns about certain topic configs. 
> We should include such validations in alter configs and reject the change if 
> the validation fails. Note that this should be done without changing the 
> behaviour of the ConfigCommand (as it does not have access to the broker 
> configs).
> We should consider adding other validations like KAFKA-4092 and KAFKA-4680.
> Finally, the AlterConfigsPolicy mentioned in KIP-133 will be added at the 
> same time.





[jira] [Assigned] (KAFKA-5276) Support derived and prefixed configs in DescribeConfigs (KIP-133)

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-5276:
--

Assignee: (was: Ismael Juma)

> Support derived and prefixed configs in DescribeConfigs (KIP-133)
> -
>
> Key: KAFKA-5276
> URL: https://issues.apache.org/jira/browse/KAFKA-5276
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
> Fix For: 1.1.0
>
>
> The broker supports config overrides per listener. The way we do that is by 
> prefixing the configs with the listener name. These configs are not defined 
> by ConfigDef and they don't appear in `values()`. They do appear in 
> `originals()`. We should change the code so that we return these configs. 
> Because these configs are read-only, nothing needs to be done for 
> AlterConfigs.
> With regards to derived configs, an example is advertised.listeners, which 
> falls back to listeners. This is currently done outside AbstractConfig. We 
> should look into including these into AbstractConfig so that the fallback 
> happens for the returned configs.
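To make the listener-prefix behavior described above concrete, here is a minimal sketch (in Python, not Kafka's actual implementation) of how a key prefixed with {{listener.name.<listener>.}} could shadow the unprefixed key when configs are resolved for a given listener:

```python
# Sketch only: resolve per-listener config overrides, where a key of the form
# "listener.name.<listener>.<key>" shadows the plain "<key>" for that listener.
# Function and variable names here are invented for illustration.
def resolve_config(originals, listener_name):
    prefix = "listener.name.%s." % listener_name.lower()
    # Start from all non-prefixed configs...
    resolved = {k: v for k, v in originals.items()
                if not k.startswith("listener.name.")}
    # ...then let any matching prefixed config win.
    for key, value in originals.items():
        if key.startswith(prefix):
            resolved[key[len(prefix):]] = value
    return resolved

configs = {
    "ssl.keystore.location": "/default.jks",
    "listener.name.internal.ssl.keystore.location": "/internal.jks",
}
assert resolve_config(configs, "INTERNAL")["ssl.keystore.location"] == "/internal.jks"
assert resolve_config(configs, "EXTERNAL")["ssl.keystore.location"] == "/default.jks"
```

The point of the ticket is that such prefixed keys live only in `originals()`, not `values()`, so DescribeConfigs needs to look at the former to return them.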



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-3665) Default ssl.endpoint.identification.algorithm should be https

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-3665:
--

Assignee: (was: Ismael Juma)

> Default ssl.endpoint.identification.algorithm should be https
> -
>
> Key: KAFKA-3665
> URL: https://issues.apache.org/jira/browse/KAFKA-3665
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Ismael Juma
>
> The default `ssl.endpoint.identification.algorithm` is `null`, which is not a 
> secure default (man-in-the-middle attacks are possible).
> We should probably use `https` instead. A more conservative alternative would 
> be to update the documentation instead of changing the default.
> A paper on the topic (thanks to Ryan Pridgeon for the reference): 
> http://www.cs.utexas.edu/~shmat/shmat_ccs12.pdf
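As an illustration of what the setting controls (using Python's standard `ssl` module, not Kafka code): with `https`, the client additionally verifies that the server certificate matches the hostname it connected to; with the null default, only the certificate chain is checked, so a man-in-the-middle holding any valid certificate is accepted.

```python
import ssl

# Sketch: the effect of ssl.endpoint.identification.algorithm, modeled with
# Python's ssl module. "https" turns on hostname verification; the null
# default leaves only chain validation, which is what makes MITM possible.
def make_context(identification_algorithm):
    ctx = ssl.create_default_context()  # certificate chain is always verified
    ctx.check_hostname = identification_algorithm == "https"
    return ctx

assert make_context("https").check_hostname is True
assert make_context(None).check_hostname is False
```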



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-4680) min.insync.replicas can be set higher than replication factor

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-4680:
--

Assignee: (was: Ismael Juma)

> min.insync.replicas can be set higher than replication factor
> -
>
> Key: KAFKA-4680
> URL: https://issues.apache.org/jira/browse/KAFKA-4680
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: James Cheng
>
> It is possible to specify a min.insync.replicas for a topic that is higher 
> than the replication factor of the topic. If you do this, you will not be 
> able to produce to the topic with acks=all.
> Furthermore, each produce request (including retries) to the topic will emit 
> an ERROR-level message to the broker logs. If this is not noticed 
> quickly enough, it can cause the logs to balloon.
> We actually hosed one of our Kafka clusters because of this. A topic got 
> configured with min.insync.replicas > replication factor. It had partitions 
> on all brokers of our cluster. The broker logs ballooned and filled up the 
> disks. We run these clusters on CoreOS, and CoreOS's etcd database got 
> corrupted. (Kafka didn't get corrupted, though.)
> I think Kafka should do validation when someone tries to change a topic to 
> min.insync.replicas > replication factor, and reject the change.
> This would presumably affect kafka-topics.sh, kafka-configs.sh, as well as 
> the CreateTopics operation that came in KIP-4.
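The requested check can be sketched as follows (a hypothetical helper in Python; the names are invented and this is not Kafka's actual validation API):

```python
# Hypothetical sketch of the validation the reporter asks for: a topic
# configuration is acceptable only if min.insync.replicas does not exceed
# the topic's replication factor.
def is_valid_topic_config(replication_factor, configs):
    min_isr = int(configs.get("min.insync.replicas", 1))  # broker default is 1
    return min_isr <= replication_factor

# A topic with RF=2 and min.insync.replicas=3 can never satisfy acks=all,
# so the change should be rejected at create/alter time, not discovered later.
assert is_valid_topic_config(3, {"min.insync.replicas": "2"})
assert not is_valid_topic_config(2, {"min.insync.replicas": "3"})
```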



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-4637) Update system test(s) to use multiple listeners for the same security protocol

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-4637:
--

Assignee: (was: Ismael Juma)

> Update system test(s) to use multiple listeners for the same security protocol
> --
>
> Key: KAFKA-4637
> URL: https://issues.apache.org/jira/browse/KAFKA-4637
> Project: Kafka
>  Issue Type: Test
>Reporter: Ismael Juma
>  Labels: newbie
>
> Even though this is tested via the JUnit tests introduced by KAFKA-4565, it 
> would be good to have at least one system test exercising this functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-6319) kafka-acls regression for comma characters (and maybe other characters as well)

2017-12-07 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-6319:
-

Assignee: Rajini Sivaram

> kafka-acls regression for comma characters (and maybe other characters as 
> well)
> ---
>
> Key: KAFKA-6319
> URL: https://issues.apache.org/jira/browse/KAFKA-6319
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 1.0.0
> Environment: Debian 8. Java 8. SSL clients.
>Reporter: Jordan Mcmillan
>Assignee: Rajini Sivaram
>  Labels: regression
> Fix For: 1.1.0
>
>
> As of version 1.0.0, kafka-acls.sh no longer recognizes my ACLs stored in 
> ZooKeeper. I am using SSL and the default principal builder class. My 
> principal name contains a comma. Ex:
> "CN=myhost.mycompany.com,OU=MyOU,O=MyCompany, Inc.,ST=MyST,C=US"
> The default principal builder uses the getName() function in 
> javax.security.auth.x500:
> https://docs.oracle.com/javase/8/docs/api/javax/security/auth/x500/X500Principal.html#getName
> The documentation states "The only characters in attribute values that are 
> escaped are those which section 2.4 of RFC 2253 states must be escaped". This 
> makes sense, as my kafka-authorizer log shows the comma correctly escaped with 
> a backslash:
> INFO Principal = User:CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, 
> Inc.,ST=MyST,C=US is Denied Operation = Describe from host = 1.2.3.4 on 
> resource = Topic:mytopic (kafka.authorizer.logger)
> Here's what I get when I try to create the ACL in kafka 1.0:
> {code:java}
> > # kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add 
> > --allow-principal User:"CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, 
> > Inc.,ST=MyST,C=US" --operation "Describe" --allow-host "*" --topic="mytopic"
> Adding ACLs for resource `Topic:mytopic`:
>  User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US 
> has Allow permission for operations: Describe from hosts: *
> Current ACLs for resource `Topic:mytopic`:
>  "User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US has 
> Allow permission for operations: Describe from hosts: *">
> {code}
> Examining Zookeeper, I can see the data. Though I notice that the json string 
> for ACLs is not actually valid since the backslash is not escaped with a 
> double backslash. This was true for 0.11.0.1 as well, but was never actually 
> a problem.
> {code:java}
> > #  zk-shell localhost:2181
> Welcome to zk-shell (1.1.1)
> (CLOSED) />
> (CONNECTED) /> get /kafka-acl/Topic/mytopic
> {"version":1,"acls":[{"principal":"User:CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\,
>  
> Inc.,ST=MyST,C=US","permissionType":"Allow","operation":"Describe","host":"*"}]}
> (CONNECTED) /> json_get /kafka-acl/Topic/mytopic acls
> Path /kafka-acl/Topic/mytopic has bad JSON.
> {code}
> Now Kafka does not recognize any ACLs that have an escaped comma in the 
> principal name, and all the clients are denied access. I tried searching for 
> anything relevant that changed between 0.11.0.1 and 1.0.0 and I noticed 
> KAFKA-1595:
> https://github.com/apache/kafka/commit/8b14e11743360a711b2bb670cf503acc0e604602#diff-db89a14f2c85068b1f0076d52e590d05
> Could the new JSON library be failing because the ACL is not actually a valid 
> JSON string? 
> I can store a valid JSON string with an escaped backslash (ex: containing 
> "O=MyCompany\\, Inc."), and the comparison between the principal builder 
> string and what is read from ZooKeeper succeeds. However, successively applying 
> ACLs seems to strip the backslashes and generally corrupts things:
> {code:java}
> > #  kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 
> > --add --allow-principal 
> > User:"CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\\\, Inc.,ST=MyST,C=US" 
> > --operation Describe --group="*" --allow-host "*" --topic="mytopic"
> Adding ACLs for resource `Topic:mytopic`:
>  User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\\, Inc.,ST=MyST,C=US 
> has Allow permission for operations: Describe from hosts: *
> Adding ACLs for resource `Group:*`:
>  User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\\, Inc.,ST=MyST,C=US 
> has Allow permission for operations: Describe from hosts: *
> Current ACLs for resource `Topic:mytopic`:
>  User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US 
> has Allow permission for operations: Describe from hosts: *
> Current ACLs for resource `Group:*`:
>  User:CN=CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US 
> has Allow permission for operations: Describe from hosts: *
> {code}
> It looks as though the backslash used for escaping RFC 2253 strings is not 
> being handled correctly. That's as far as I've dug.
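The reported mismatch can be reproduced with plain JSON handling (Python's standard `json` module here, independent of Kafka's code): a backslash that is a legal RFC 2253 escape inside the principal string is not a legal escape inside a raw JSON document unless it is itself doubled.

```python
import json

# A principal whose DN contains an RFC 2253-escaped comma ("\,").
principal = r"CN=myhost.mycompany.com,OU=MyOU,O=MyCompany\, Inc.,ST=MyST,C=US"

# Embedding the principal verbatim, as the ZooKeeper data above apparently
# does, yields invalid JSON: "\," is not a recognized JSON escape sequence.
raw = '{"principal": "%s"}' % principal
try:
    json.loads(raw)
    assert False, "expected a parse failure"
except json.JSONDecodeError:
    pass

# Proper JSON encoding doubles the backslash and round-trips cleanly.
assert json.loads(json.dumps({"principal": principal}))["principal"] == principal
```

This is consistent with a stricter JSON parser rejecting ACL entries that an older, more lenient parser happened to accept.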


[jira] [Resolved] (KAFKA-6313) Kafka Core should have explicit SLF4J API dependency

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-6313.

Resolution: Fixed

> Kafka Core should have explicit SLF4J API dependency
> 
>
> Key: KAFKA-6313
> URL: https://issues.apache.org/jira/browse/KAFKA-6313
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
> Fix For: 1.1.0
>
>
> In an application that depends on the Kafka server artifacts with:
> {code:xml}
> <dependency>
>   <groupId>org.apache.kafka</groupId>
>   <artifactId>kafka_2.11</artifactId>
>   <version>1.1.0-SNAPSHOT</version>
> </dependency>
> {code}
> and then running the server programmatically, the following error occurs:
> {noformat}
> [2017-11-23 01:01:45,029] INFO Shutting down producer 
> (kafka.producer.Producer:63)
> [2017-11-23 01:01:45,051] INFO Closing all sync producers 
> (kafka.producer.ProducerPool:63)
> [2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms 
> (kafka.producer.Producer:63)
> [2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down 
> (kafka.server.KafkaServer:63)
> [2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during 
> KafkaServer shutdown. (kafka.server.KafkaServer:161)
> java.lang.NoClassDefFoundError: org/slf4j/event/Level
>   at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83)
>   at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520)
>...
> Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:359)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   ... 25 more
> {noformat}
> It appears that KAFKA-1044 and [this 
> PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from 
> Core but [added use 
> of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34]
>  the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The 
> {{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, 
> which is currently not included in the dependencies of 
> {{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the 
> server, the SLF4J API library probably needs to be added to the dependencies.
> [~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that 
> the SLF4J API be marked as {{provided}}? BTW, I marked this as CRITICAL just 
> because this probably needs to be sorted out before the release.
> *Update*: As the comments below explain, the actual problem is a bit 
> different to what is in the JIRA description.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6313) Kafka Core should have explicit SLF4J API dependency

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-6313:
---
Description: 
In an application that depends on the Kafka server artifacts with:

{code:xml}
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.11</artifactId>
  <version>1.1.0-SNAPSHOT</version>
</dependency>
{code}

and then running the server programmatically, the following error occurs:

{noformat}
[2017-11-23 01:01:45,029] INFO Shutting down producer 
(kafka.producer.Producer:63)
[2017-11-23 01:01:45,051] INFO Closing all sync producers 
(kafka.producer.ProducerPool:63)
[2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms 
(kafka.producer.Producer:63)
[2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down 
(kafka.server.KafkaServer:63)
[2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during 
KafkaServer shutdown. (kafka.server.KafkaServer:161)
java.lang.NoClassDefFoundError: org/slf4j/event/Level
at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520)
   ...
Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level
at java.net.URLClassLoader$1.run(URLClassLoader.java:359)
at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 25 more
{noformat}

It appears that KAFKA-1044 and [this 
PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from 
Core but [added use 
of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34]
 the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The 
{{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, 
which is currently not included in the dependencies of 
{{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the 
server, the SLF4J API library probably needs to be added to the dependencies.

[~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that the 
SLF4J API be marked as {{provided}}? BTW, I marked this as CRITICAL just 
because this probably needs to be sorted out before the release.

*Update*: As the comments below explain, the actual problem is a bit different 
to what is in the JIRA description.

  was:
In an application that depends on the Kafka server artifacts with:

{code:xml}
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.11</artifactId>
  <version>1.1.0-SNAPSHOT</version>
</dependency>
{code}

and then running the server programmatically, the following error occurs:

{noformat}
[2017-11-23 01:01:45,029] INFO Shutting down producer 
(kafka.producer.Producer:63)
[2017-11-23 01:01:45,051] INFO Closing all sync producers 
(kafka.producer.ProducerPool:63)
[2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms 
(kafka.producer.Producer:63)
[2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down 
(kafka.server.KafkaServer:63)
[2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during 
KafkaServer shutdown. (kafka.server.KafkaServer:161)
java.lang.NoClassDefFoundError: org/slf4j/event/Level
at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520)
   ...
Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level
at java.net.URLClassLoader$1.run(URLClassLoader.java:359)
at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 25 more
{noformat}

It appears that KAFKA-1044 and [this 
PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from 
Core but [added use 
of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34]
 the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The 
{{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, 
which is currently not included in the dependencies of 
{{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the 
server, the SLF4J API library probably needs to be added to the dependencies.

[~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that the 
SLF4J API be marked

[jira] [Updated] (KAFKA-6313) Kafka Core should have explicit SLF4J API dependency

2017-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-6313:
---
Summary: Kafka Core should have explicit SLF4J API dependency  (was: Kafka 
Core maven dependencies are missing SLF4J API)

> Kafka Core should have explicit SLF4J API dependency
> 
>
> Key: KAFKA-6313
> URL: https://issues.apache.org/jira/browse/KAFKA-6313
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
> Fix For: 1.1.0
>
>
> In an application that depends on the Kafka server artifacts with:
> {code:xml}
> <dependency>
>   <groupId>org.apache.kafka</groupId>
>   <artifactId>kafka_2.11</artifactId>
>   <version>1.1.0-SNAPSHOT</version>
> </dependency>
> {code}
> and then running the server programmatically, the following error occurs:
> {noformat}
> [2017-11-23 01:01:45,029] INFO Shutting down producer 
> (kafka.producer.Producer:63)
> [2017-11-23 01:01:45,051] INFO Closing all sync producers 
> (kafka.producer.ProducerPool:63)
> [2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms 
> (kafka.producer.Producer:63)
> [2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down 
> (kafka.server.KafkaServer:63)
> [2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during 
> KafkaServer shutdown. (kafka.server.KafkaServer:161)
> java.lang.NoClassDefFoundError: org/slf4j/event/Level
>   at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83)
>   at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520)
>...
> Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:359)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   ... 25 more
> {noformat}
> It appears that KAFKA-1044 and [this 
> PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from 
> Core but [added use 
> of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34]
>  the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The 
> {{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, 
> which is currently not included in the dependencies of 
> {{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the 
> server, the SLF4J API library probably needs to be added to the dependencies.
> [~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that 
> the SLF4J API be marked as {{provided}}? BTW, I marked this as CRITICAL just 
> because this probably needs to be sorted out before the release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6313) Kafka Core maven dependencies are missing SLF4J API

2017-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281499#comment-16281499
 ] 

ASF GitHub Bot commented on KAFKA-6313:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4296


> Kafka Core maven dependencies are missing SLF4J API
> ---
>
> Key: KAFKA-6313
> URL: https://issues.apache.org/jira/browse/KAFKA-6313
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
> Fix For: 1.1.0
>
>
> In an application that depends on the Kafka server artifacts with:
> {code:xml}
> <dependency>
>   <groupId>org.apache.kafka</groupId>
>   <artifactId>kafka_2.11</artifactId>
>   <version>1.1.0-SNAPSHOT</version>
> </dependency>
> {code}
> and then running the server programmatically, the following error occurs:
> {noformat}
> [2017-11-23 01:01:45,029] INFO Shutting down producer 
> (kafka.producer.Producer:63)
> [2017-11-23 01:01:45,051] INFO Closing all sync producers 
> (kafka.producer.ProducerPool:63)
> [2017-11-23 01:01:45,052] INFO Producer shutdown completed in 23 ms 
> (kafka.producer.Producer:63)
> [2017-11-23 01:01:45,052] INFO [KafkaServer id=1] shutting down 
> (kafka.server.KafkaServer:63)
> [2017-11-23 01:01:45,057] ERROR [KafkaServer id=1] Fatal error during 
> KafkaServer shutdown. (kafka.server.KafkaServer:161)
> java.lang.NoClassDefFoundError: org/slf4j/event/Level
>   at kafka.utils.CoreUtils$.swallow$default$3(CoreUtils.scala:83)
>   at kafka.server.KafkaServer.shutdown(KafkaServer.scala:520)
>...
> Caused by: java.lang.ClassNotFoundException: org.slf4j.event.Level
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:359)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:312)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   ... 25 more
> {noformat}
> It appears that KAFKA-1044 and [this 
> PR|https://github.com/apache/kafka/pull/3477] removed the use of Log4J from 
> Core but [added use 
> of|https://github.com/confluentinc/kafka/commit/ed8b0315a6c3705b2a163ce3ab4723234779264f#diff-52505b9374ea885e44bcb73cbc4714d6R34]
>  the {{org.slf4j.event.Level}} in {{CoreUtils.scala}}. The 
> {{org.slf4j.event.Level}} class is in the {{org.slf4j:slf4j-api}} artifact, 
> which is currently not included in the dependencies of 
> {{org.apache.kafka:kafka_2.11:1.1.0-SNAPSHOT}}. Because this is needed by the 
> server, the SLF4J API library probably needs to be added to the dependencies.
> [~viktorsomogyi] and [~ijuma], was this intentional, or is it intended that 
> the SLF4J API be marked as {{provided}}? BTW, I marked this as CRITICAL just 
> because this probably needs to be sorted out before the release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6323) punctuate with WALL_CLOCK_TIME triggered immediately

2017-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281498#comment-16281498
 ] 

ASF GitHub Bot commented on KAFKA-6323:
---

GitHub user fredfp opened a pull request:

https://github.com/apache/kafka/pull/4301

KAFKA-6323: punctuate with WALL_CLOCK_TIME triggered immediately

This is the only way I found to fix the issue without altering the API.

@mihbor @mjsax 

the contribution is my original work and I license the work to the project 
under the project's open source license

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fredfp/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4301


commit 0c9b6cac6a5de8e6db81e6ae6f42fe8933012621
Author: Frederic Arno 
Date:   2017-12-07T08:18:42Z

KAFKA-6323: fix punctuate with WALL_CLOCK_TIME triggered immediately




> punctuate with WALL_CLOCK_TIME triggered immediately
> 
>
> Key: KAFKA-6323
> URL: https://issues.apache.org/jira/browse/KAFKA-6323
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Frederic Arno
> Fix For: 1.0.1
>
>
> While working on a custom Processor from which I am scheduling a punctuation 
> using WALL_CLOCK_TIME, I've noticed that whatever punctuation interval I 
> set, a call to my Punctuator is always triggered immediately.
> Having a quick look at kafka-streams' code, I could find that all 
> PunctuationSchedule's timestamps are matched against the current time in 
> order to decide whether or not to trigger the punctuator 
> (org.apache.kafka.streams.processor.internals.PunctuationQueue#mayPunctuate). 
> However, I've only seen code that initializes PunctuationSchedule's timestamp 
> to 0, which I guess is what is causing an immediate punctuation.
> At least when using WALL_CLOCK_TIME, shouldn't the PunctuationSchedule's 
> timestamp be initialized to current time + interval?
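The mechanism described above can be sketched in a few lines (a simplified Python model, not the actual `PunctuationQueue` code): a schedule whose timestamp starts at 0 fires on the very first check, whereas initializing it to now + interval defers the first punctuation by one full interval, which is the fix the reporter suggests.

```python
# Simplified model of a punctuation schedule. With defer_first=False the
# timestamp starts at 0 (the behavior reported as a bug), so the first
# may_punctuate() check always fires; with defer_first=True the first
# punctuation waits a full interval.
class PunctuationSchedule:
    def __init__(self, interval_ms, now_ms, defer_first=True):
        self.interval_ms = interval_ms
        self.timestamp_ms = (now_ms + interval_ms) if defer_first else 0

    def may_punctuate(self, now_ms):
        if now_ms >= self.timestamp_ms:
            self.timestamp_ms = now_ms + self.interval_ms  # schedule next fire
            return True
        return False

buggy = PunctuationSchedule(60000, now_ms=0, defer_first=False)
fixed = PunctuationSchedule(60000, now_ms=0, defer_first=True)
assert buggy.may_punctuate(0) is True     # fires immediately
assert fixed.may_punctuate(0) is False    # waits one interval
assert fixed.may_punctuate(60000) is True
```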



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5882) NullPointerException in StreamTask

2017-12-07 Thread Seweryn Habdank-Wojewodzki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281484#comment-16281484
 ] 

Seweryn Habdank-Wojewodzki commented on KAFKA-5882:
---

Hmm... I will try to retest all together with fix for KAFKA-6260.

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>
> It seems the bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced another issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in case of a FATAL error, anything other than an NPE is thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-5882) NullPointerException in StreamTask

2017-12-07 Thread Seweryn Habdank-Wojewodzki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272747#comment-16272747
 ] 

Seweryn Habdank-Wojewodzki edited comment on KAFKA-5882 at 12/7/17 8:09 AM:


[~mjsax] In the meanwhile I had ported the code to {{1.0.0}} :-). I will try to do my 
best.


was (Author: habdank):
[~mjsax] In mean while I had ported the code to {{1.0.0}} :-). I will try do my 
best.

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>
> It seems the bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced another issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in case of a FATAL error, anything other than an NPE is thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6260) AbstractCoordinator not clearly handles NULL Exception

2017-12-07 Thread Seweryn Habdank-Wojewodzki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281481#comment-16281481
 ] 

Seweryn Habdank-Wojewodzki commented on KAFKA-6260:
---

Thanks a lot!

When the releases are ready, I will test them.
I am not sure if and how I can get the libraries earlier, before they are released :-).

> AbstractCoordinator not clearly handles NULL Exception
> --
>
> Key: KAFKA-6260
> URL: https://issues.apache.org/jira/browse/KAFKA-6260
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: RedHat Linux
>Reporter: Seweryn Habdank-Wojewodzki
>Assignee: Jason Gustafson
> Fix For: 1.1.0, 1.0.1
>
>
> The error reporting is not clear, but it seems that the Kafka heartbeat shuts 
> down the application due to a NULL exception caused by "fake" disconnections.
> One more comment: we are processing messages in the stream, but sometimes we 
> have to block processing for minutes, as the consumers cannot handle too much 
> load. Is it possible that while the stream is waiting, the heartbeat is also 
> blocked?
> Can you check that?
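(A note on the question above: since KIP-62 in Kafka 0.10.1, the consumer sends heartbeats from a dedicated background thread, so blocking inside user processing code does not by itself stop heartbeats. What does bound processing time is the {{max.poll.interval.ms}} consumer config: if {{poll()}} is not called within that interval, the consumer leaves the group and a rebalance is triggered. A sketch of the relevant consumer properties follows; the property names are real consumer configs, but the values are illustrative only, not recommendations:

{code}
# Liveness is decided by two independent mechanisms:
# 1. Background heartbeats, bounded by session.timeout.ms.
session.timeout.ms=10000
heartbeat.interval.ms=3000
# 2. Progress of the poll loop, bounded by max.poll.interval.ms.
#    Raise this if processing a batch can legitimately take minutes.
max.poll.interval.ms=600000
# Optionally shrink batches so each poll() finishes sooner.
max.poll.records=100
{code}

An alternative to raising the interval is to call {{pause()}} on the assigned partitions, keep calling {{poll()}} while the slow work runs, and {{resume()}} afterwards.)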
> {code}
> 2017-11-23 23:54:47 DEBUG AbstractCoordinator:177 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Received successful Heartbeat response
> 2017-11-23 23:54:50 DEBUG AbstractCoordinator:183 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Sending Heartbeat request to coordinator 
> cljp01.eb.lan.at:9093 (id: 2147483646 rack: null)
> 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Sending HEARTBEAT 
> {group_id=kafka-endpoint,generation_id=3834,member_id=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer-94f18be5-e49a-4817-9e5a-fe82a64e0b08}
>  with correlation id 24 to node 2147483646
> 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Completed receive from node 2147483646 for HEARTBEAT 
> with correlation id 24, received {throttle_time_ms=0,error_code=0}
> 2017-11-23 23:54:50 DEBUG AbstractCoordinator:177 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Received successful Heartbeat response
> 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Disconnecting from node 1 due to request timeout.
> 2017-11-23 23:54:52 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Cancelled request 
> {replica_id=-1,max_wait_time=6000,min_bytes=1,max_bytes=52428800,isolation_level=0,topics=[{topic=clj_internal_topic,partitions=[{partition=6,fetch_offset=211558395,log_start_offset=-1,max_bytes=1048576},{partition=8,fetch_offset=210178209,log_start_offset=-1,max_bytes=1048576},{partition=0,fetch_offset=209353523,log_start_offset=-1,max_bytes=1048576},{partition=2,fetch_offset=209291462,log_start_offset=-1,max_bytes=1048576},{partition=4,fetch_offset=210728595,log_start_offset=-1,max_bytes=1048576}]}]}
>  with correlation id 21 due to node 1 being disconnected
> 2017-11-23 23:54:52 DEBUG ConsumerNetworkClient:195 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Cancelled FETCH request RequestHeader(apiKey=FETCH, 
> apiVersion=6, 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  correlationId=21) with correlation id 21 due to node 1 being disconnected
> 2017-11-23 23:54:52 DEBUG Fetcher:195 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Fetch request 
> {clj_internal_topic-6=(offset=211558395, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-8=(offset=210178209, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-0=(offset=209353523, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-2=(offset=209291462, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-4=(offset=210728595, logStartOffset=-1, 
> maxBytes=1048576)} to cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) failed 
> org.apache.kafka.common.errors.DisconnectException: null
> 2017-11-23 23:54:52 TRACE NetworkClient:123 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  gro