[jira] [Comment Edited] (KAFKA-6185) Selector memory leak with high likelihood of OOM in case of down conversion

2017-11-17 Thread Brett Rann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257860#comment-16257860
 ] 

Brett Rann edited comment on KAFKA-6185 at 11/18/17 1:47 AM:
-

Confirming this patch fixed our issue. Since it was deployed, there has been no
more heavy memory leaking.
!https://www.evernote.com/l/Ah_dUOQiNLhNEaxZ6oNvgAj7KWwHh3xD8q8B/image.png!


was (Author: brettrann):
Confirming this patch fixed our issue.
!https://www.evernote.com/l/Ah_dUOQiNLhNEaxZ6oNvgAj7KWwHh3xD8q8B/image.png!

> Selector memory leak with high likelihood of OOM in case of down conversion
> ---
>
> Key: KAFKA-6185
> URL: https://issues.apache.org/jira/browse/KAFKA-6185
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Ubuntu 14.04.5 LTS
> 5 brokers: 1 & 2 on 1.0.0; 3, 4, 5 on 0.11.0.1
> inter.broker.protocol.version=0.11.0.1
> log.message.format.version=0.11.0.1
> clients a mix of 0.9, 0.10, 0.11
>Reporter: Brett Rann
>Assignee: Rajini Sivaram
>Priority: Blocker
>  Labels: regression
> Fix For: 1.1.0, 1.0.1
>
> Attachments: Kafka_Internals___Datadog.png, 
> Kafka_Internals___Datadog.png
>
>
> We are testing 1.0.0 in a couple of environments.
> Both have about 5 brokers, with two 1.0.0 brokers and the rest 0.11.0.1 
> brokers.
> One is using on-disk message format 0.9.0.1, the other 0.11.0.1.
> We have 0.9, 0.10, and 0.11 clients connecting.
> The cluster on the 0.9.0.1 format has been running fine for a week.
> But the cluster on the 0.11.0.1 format is consistently having memory issues, 
> only on the two upgraded brokers running 1.0.0.
> The first occurrence of the error comes along with this stack trace:
> {noformat}
> {"timestamp":"2017-11-06 
> 14:22:32,402","level":"ERROR","logger":"kafka.server.KafkaApis","thread":"kafka-request-handler-7","message":"[KafkaApi-1]
>  Error when handling request 
> {replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=maxwell.users,partitions=[{partition=0,fetch_offset=227537,max_bytes=1100},{partition=4,fetch_offset=354468,max_bytes=1100},{partition=5,fetch_offset=266524,max_bytes=1100},{partition=8,fetch_offset=324562,max_bytes=1100},{partition=10,fetch_offset=292931,max_bytes=1100},{partition=12,fetch_offset=325718,max_bytes=1100},{partition=15,fetch_offset=229036,max_bytes=1100}]}]}"}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at 
> org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:101)
> at 
> org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:253)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:520)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:518)
> at scala.Option.map(Option.scala:146)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:518)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:508)
> at scala.Option.flatMap(Option.scala:171)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$convertedPartitionData$1(KafkaApis.scala:508)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:556)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:555)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$createResponse$2(KafkaApis.scala:555)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
> at 
> kafka.server.KafkaApis$$anonfun$sendResponseMaybeThrottle$1.apply$mcVI$sp(KafkaApis.scala:2034)
> at 
> kafka.server.ClientRequestQuotaManager.maybeRecordAndThrottle(ClientRequestQuotaManager.scala:52)
> at 
> kafka.server.KafkaApis.sendResponseMaybeThrottle(KafkaApis.scala:2033)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$fe

[jira] [Commented] (KAFKA-6185) Selector memory leak with high likelihood of OOM in case of down conversion

2017-11-17 Thread Brett Rann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257860#comment-16257860
 ] 

Brett Rann commented on KAFKA-6185:
---

Confirming this patch fixed our issue.
!https://www.evernote.com/l/Ah_dUOQiNLhNEaxZ6oNvgAj7KWwHh3xD8q8B/image.png!

> Selector memory leak with high likelihood of OOM in case of down conversion
> ---
>
> Key: KAFKA-6185
> URL: https://issues.apache.org/jira/browse/KAFKA-6185
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: Ubuntu 14.04.5 LTS
> 5 brokers: 1 & 2 on 1.0.0; 3, 4, 5 on 0.11.0.1
> inter.broker.protocol.version=0.11.0.1
> log.message.format.version=0.11.0.1
> clients a mix of 0.9, 0.10, 0.11
>Reporter: Brett Rann
>Assignee: Rajini Sivaram
>Priority: Blocker
>  Labels: regression
> Fix For: 1.1.0, 1.0.1
>
> Attachments: Kafka_Internals___Datadog.png, 
> Kafka_Internals___Datadog.png
>
>
> We are testing 1.0.0 in a couple of environments.
> Both have about 5 brokers, with two 1.0.0 brokers and the rest 0.11.0.1 
> brokers.
> One is using on-disk message format 0.9.0.1, the other 0.11.0.1.
> We have 0.9, 0.10, and 0.11 clients connecting.
> The cluster on the 0.9.0.1 format has been running fine for a week.
> But the cluster on the 0.11.0.1 format is consistently having memory issues, 
> only on the two upgraded brokers running 1.0.0.
> The first occurrence of the error comes along with this stack trace:
> {noformat}
> {"timestamp":"2017-11-06 
> 14:22:32,402","level":"ERROR","logger":"kafka.server.KafkaApis","thread":"kafka-request-handler-7","message":"[KafkaApi-1]
>  Error when handling request 
> {replica_id=-1,max_wait_time=500,min_bytes=1,topics=[{topic=maxwell.users,partitions=[{partition=0,fetch_offset=227537,max_bytes=1100},{partition=4,fetch_offset=354468,max_bytes=1100},{partition=5,fetch_offset=266524,max_bytes=1100},{partition=8,fetch_offset=324562,max_bytes=1100},{partition=10,fetch_offset=292931,max_bytes=1100},{partition=12,fetch_offset=325718,max_bytes=1100},{partition=15,fetch_offset=229036,max_bytes=1100}]}]}"}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at 
> org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:101)
> at 
> org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:253)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:520)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1$$anonfun$apply$4.apply(KafkaApis.scala:518)
> at scala.Option.map(Option.scala:146)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:518)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$convertedPartitionData$1$1.apply(KafkaApis.scala:508)
> at scala.Option.flatMap(Option.scala:171)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$convertedPartitionData$1(KafkaApis.scala:508)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:556)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$createResponse$2$1.apply(KafkaApis.scala:555)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$createResponse$2(KafkaApis.scala:555)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$fetchResponseCallback$1$1.apply(KafkaApis.scala:569)
> at 
> kafka.server.KafkaApis$$anonfun$sendResponseMaybeThrottle$1.apply$mcVI$sp(KafkaApis.scala:2034)
> at 
> kafka.server.ClientRequestQuotaManager.maybeRecordAndThrottle(ClientRequestQuotaManager.scala:52)
> at 
> kafka.server.KafkaApis.sendResponseMaybeThrottle(KafkaApis.scala:2033)
> at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$fetchResponseCallback$1(KafkaApis.scala:569)
> at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$processResponseCallback$1$1.apply$mcVI$sp(KafkaApis.scala:588)
> at 
> kafka.server.ClientQuotaManager.maybeRecordAndThrottle(ClientQ

[jira] [Comment Edited] (KAFKA-5802) ScramServerCallbackHandler#handle should check username not being null before calling credentialCache.get()

2017-11-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212069#comment-16212069
 ] 

Ted Yu edited comment on KAFKA-5802 at 11/18/17 12:43 AM:
--

Thanks for the PR, Umesh.


was (Author: yuzhih...@gmail.com):
Thanks for the PR, Umesh.

> ScramServerCallbackHandler#handle should check username not being null before 
> calling credentialCache.get()
> ---
>
> Key: KAFKA-5802
> URL: https://issues.apache.org/jira/browse/KAFKA-5802
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> String username = null;
> for (Callback callback : callbacks) {
>     if (callback instanceof NameCallback)
>         username = ((NameCallback) callback).getDefaultName();
>     else if (callback instanceof ScramCredentialCallback)
>         ((ScramCredentialCallback) callback).scramCredential(credentialCache.get(username));
> }
> {code}
> Since ConcurrentHashMap, used by CredentialCache, doesn't allow null keys, we 
> should check that username is not null before calling credentialCache.get().
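A minimal, self-contained sketch of the proposed guard. The class below is a hypothetical stand-in for the real ScramServerCallbackHandler path; a plain ConcurrentHashMap models CredentialCache, which is backed by one and therefore rejects null keys:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the credential lookup path: CredentialCache is
// backed by a ConcurrentHashMap, so get(null) would throw NullPointerException.
public class NullGuardSketch {
    static final ConcurrentHashMap<String, String> credentialCache = new ConcurrentHashMap<>();

    static String lookup(String username) {
        // Proposed fix: skip the lookup when no NameCallback supplied a
        // username, instead of passing null to ConcurrentHashMap.get().
        return username == null ? null : credentialCache.get(username);
    }

    public static void main(String[] args) {
        credentialCache.put("alice", "scram-credential");
        System.out.println(lookup("alice")); // prints scram-credential
        System.out.println(lookup(null));    // prints null instead of throwing
    }
}
```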



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-4107) Support offset reset capability in Kafka Connect

2017-11-17 Thread Satyajit varma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257788#comment-16257788
 ] 

Satyajit varma edited comment on KAFKA-4107 at 11/18/17 12:30 AM:
--

[~rhauch], I would like to start working on this ticket.


was (Author: satyajit):
[~rhauch], I would like to start working on this ticket; we badly require this 
functionality.

> Support offset reset capability in Kafka Connect
> 
>
> Key: KAFKA-4107
> URL: https://issues.apache.org/jira/browse/KAFKA-4107
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Jason Gustafson
>
> It would be useful in some cases to be able to reset connector offsets. For 
> example, if a topic in Kafka corresponding to a source database is 
> accidentally deleted (or deleted because of corrupt data), an administrator 
> may want to reset offsets and reproduce the log from the beginning. It may 
> also be useful to have support for overriding offsets, but that seems like a 
> less likely use case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4107) Support offset reset capability in Kafka Connect

2017-11-17 Thread Satyajit varma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257788#comment-16257788
 ] 

Satyajit varma commented on KAFKA-4107:
---

[~rhauch], I would like to start working on this ticket; we badly require this 
functionality.

> Support offset reset capability in Kafka Connect
> 
>
> Key: KAFKA-4107
> URL: https://issues.apache.org/jira/browse/KAFKA-4107
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Jason Gustafson
>
> It would be useful in some cases to be able to reset connector offsets. For 
> example, if a topic in Kafka corresponding to a source database is 
> accidentally deleted (or deleted because of corrupt data), an administrator 
> may want to reset offsets and reproduce the log from the beginning. It may 
> also be useful to have support for overriding offsets, but that seems like a 
> less likely use case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5721) Kafka broker stops after network failure

2017-11-17 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-5721.
---
   Resolution: Fixed
Fix Version/s: 0.11.0.2

The specific deadlock in the stack trace in this JIRA was fixed under 
KAFKA-3994 (0.10.2.0). There have been a couple of other deadlocks which have 
been fixed since. The two releases that don't have any known deadlocks are 
1.0.0 and 0.11.0.2.

> Kafka broker stops after network failure
> 
>
> Key: KAFKA-5721
> URL: https://issues.apache.org/jira/browse/KAFKA-5721
> Project: Kafka
>  Issue Type: Bug
>Reporter: Val Feron
> Fix For: 0.11.0.2
>
> Attachments: thread_dump_kafka.out
>
>
> +Cluster description+
> * 3 brokers
> * version 0.10.1.1
> * running on AWS
> +Description+
> The following happens at random intervals, on random brokers.
> From the logs, here is the information I could gather:
> # Intra-cluster replication shrinks on a random broker (I suppose it could 
> be a temporary network failure, but I couldn't produce evidence of it)
> # System starts showing close to no activity @02:27:20 (note that it's not 
> load related as it happens at very quiet times)
> !https://i.stack.imgur.com/g1Pem.png!
> # From there, this Kafka broker doesn't process messages, which is expected 
> IMO as it dropped out of the cluster replication.
> # Now the real issue appears as the number of connections in CLOSE_WAIT is 
> constantly increasing until it reaches the configured ulimit of the 
> system/process, ending up crashing the kafka process.
> Now, I've been changing limits to see if Kafka would eventually rejoin the 
> ISR before crashing, but even with a very high limit, Kafka just seems stuck 
> in a weird state and never recovers.
> Note that between the time when the faulty broker is on its own and the time 
> it crashes, Kafka is still listening and producers keep connecting to it.
> For this single crash, I could see 320 errors like this from the producers :
> {code}java.util.concurrent.ExecutionException: 
> org.springframework.kafka.core.KafkaProducerException: Failed to send; nested 
> exception is org.apache.kafka.common.errors.NotLeaderForPartitionException: 
> This server is not the leader for that topic-partition.
> {code}
> 
> The configuration is the default one and the usage is quite standard, so I'm 
> wondering if I missed something.
> I put in place a script that checks the number of Kafka file descriptors and 
> restarts the service when it gets abnormally high, which does the trick for 
> now, but I still lose messages when it crashes.
> I'm available to make any verification / test you need.
> Could see some similarities with ticket KAFKA-5007 too.
> cc [~junrao]
> Confirming the logs seen at the time of failure :
> {code}
>  /var/log/kafka/kafkaServer.out:[2017-08-09 22:13:29,045] WARN 
> [ReplicaFetcherThread-0-2], Error in fetch 
> kafka.server.ReplicaFetcherThread$FetchRequest@4a63f79 
> (kafka.server.ReplicaFetcherThread)
> /var/log/kafka/kafkaServer.out-java.io.IOException: Connection to 2 was 
> disconnected before the response was read
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> /var/log/kafka/kafkaServer.out-   at 
> scala.Option.foreach(Option.scala:257)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.server.AbstractFetcherThread.pro

[jira] [Commented] (KAFKA-3821) Allow Kafka Connect source tasks to produce offset without writing to topics

2017-11-17 Thread Gunnar Morling (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257642#comment-16257642
 ] 

Gunnar Morling commented on KAFKA-3821:
---

Hi [~rhauch], I see that this one is assigned for 1.1.0. Have you created a 
KIP for it already, or do you plan to do so soon? I like the idea of the 
dedicated {{OffsetRecord}}.

> Allow Kafka Connect source tasks to produce offset without writing to topics
> 
>
> Key: KAFKA-3821
> URL: https://issues.apache.org/jira/browse/KAFKA-3821
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.1
>Reporter: Randall Hauch
>  Labels: needs-kip
> Fix For: 1.1.0
>
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for 
> a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown 
> number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a 
> database). Once the task completes those records, the connector wants to 
> update the offsets (e.g., the snapshot is complete) but has no more records 
> to be written to a topic. With this change, the task could simply supply an 
> updated offset.
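A toy model of the gap being described (plain Java with no Connect dependency; the offset-only record with a null value is the hypothetical part being proposed, not the current Connect API):

```java
import java.util.Map;

// Toy model: today an offset is only committed when attached to a record
// that is actually written to a topic. The proposal would let a task commit
// an updated offset with nothing left to write (the "OffsetRecord" idea).
public class OffsetOnlySketch {
    static class SourceRecord {
        final Map<String, ?> sourcePartition;
        final Map<String, ?> sourceOffset;
        final Object value; // null models a hypothetical offset-only record

        SourceRecord(Map<String, ?> partition, Map<String, ?> offset, Object value) {
            this.sourcePartition = partition;
            this.sourceOffset = offset;
            this.value = value;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> partition = Map.of("database", "inventory");
        // During a snapshot, every record carries the same offset...
        SourceRecord row = new SourceRecord(partition, Map.of("snapshot", true), "row-1");
        // ...and when the snapshot completes, the task wants to commit
        // {snapshot=false} without producing another record to a topic.
        SourceRecord offsetOnly = new SourceRecord(partition, Map.of("snapshot", false), null);
        System.out.println(offsetOnly.value == null); // true: nothing to write
    }
}
```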



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6230) KafkaConsumer.poll(0) should not block forever if coordinator is not available

2017-11-17 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6230.

Resolution: Duplicate

> KafkaConsumer.poll(0) should not block forever if coordinator is not available
> --
>
> Key: KAFKA-6230
> URL: https://issues.apache.org/jira/browse/KAFKA-6230
> Project: Kafka
>  Issue Type: Bug
>Reporter: Dong Lin
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6223) Please delete old releases from mirroring system

2017-11-17 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257556#comment-16257556
 ] 

Sebb commented on KAFKA-6223:
-

Why are 0.11.0.2 and 0.10.2.1 still needed?

AFAICT 0.11.0.2 serves the same Scala versions as 1.0.
Unless there will be further development of the 0.11.x.y line - or it is needed 
to support a particular version of Scala - why should anyone be using it in 
preference to 1.0?


> Please delete old releases from mirroring system
> 
>
> Key: KAFKA-6223
> URL: https://issues.apache.org/jira/browse/KAFKA-6223
> Project: Kafka
>  Issue Type: Bug
> Environment: https://dist.apache.org/repos/dist/release/kafka/
>Reporter: Sebb
>Assignee: Rajini Sivaram
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> It's unfair to expect the 3rd party mirrors to carry old releases.
> Note that older releases can still be linked from the download page, but such 
> links should use the archive server at:
> https://archive.apache.org/dist/kafka/
> A suggested process is:
> + Change the download page to use archive.a.o for old releases
> + Delete the corresponding directories from 
> {{https://dist.apache.org/repos/dist/release/kafka/}}
> e.g. {{svn delete https://dist.apache.org/repos/dist/release/kafka/0.8.0}}
> Thanks!
> [1] http://www.apache.org/dev/release.html#when-to-archive



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6223) Please delete old releases from mirroring system

2017-11-17 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257536#comment-16257536
 ] 

Rajini Sivaram commented on KAFKA-6223:
---

I have updated 0.9.0.1 and 0.8.2.2 to use archives. So now only 1.0.0, 0.11.0.2 
and 0.10.2.1 use mirrors.
I wasn't sure of the recommended releases, so I haven't done that.

> Please delete old releases from mirroring system
> 
>
> Key: KAFKA-6223
> URL: https://issues.apache.org/jira/browse/KAFKA-6223
> Project: Kafka
>  Issue Type: Bug
> Environment: https://dist.apache.org/repos/dist/release/kafka/
>Reporter: Sebb
>Assignee: Rajini Sivaram
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> It's unfair to expect the 3rd party mirrors to carry old releases.
> Note that older releases can still be linked from the download page, but such 
> links should use the archive server at:
> https://archive.apache.org/dist/kafka/
> A suggested process is:
> + Change the download page to use archive.a.o for old releases
> + Delete the corresponding directories from 
> {{https://dist.apache.org/repos/dist/release/kafka/}}
> e.g. {{svn delete https://dist.apache.org/repos/dist/release/kafka/0.8.0}}
> Thanks!
> [1] http://www.apache.org/dev/release.html#when-to-archive



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6231) Download page must link to source artifact

2017-11-17 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb resolved KAFKA-6231.
-
Resolution: Invalid

Doh - sorry, wrong project - there are source links on Kafka.

> Download page must link to source artifact
> --
>
> Key: KAFKA-6231
> URL: https://issues.apache.org/jira/browse/KAFKA-6231
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sebb
>
> The ASF mission is the release of open source software.
> As such, the download page must include a link to the source distribution.
> It may also include a link to binary distribution(s),



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6231) Download page must link to source artifact

2017-11-17 Thread Sebb (JIRA)
Sebb created KAFKA-6231:
---

 Summary: Download page must link to source artifact
 Key: KAFKA-6231
 URL: https://issues.apache.org/jira/browse/KAFKA-6231
 Project: Kafka
  Issue Type: Bug
Reporter: Sebb


The ASF mission is the release of open source software.
As such, the download page must include a link to the source distribution.
It may also include a link to binary distribution(s).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6223) Please delete old releases from mirroring system

2017-11-17 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257474#comment-16257474
 ] 

Sebb commented on KAFKA-6223:
-

The download page has a lot of releases on it, and it is not obvious which 
releases are considered the best available.

I assume 1.0 should be used for Scala 2.11 and 2.12, but otherwise it's quite 
hard to find which release to use.
It might be useful to add a table of links for each version of Scala.

> Please delete old releases from mirroring system
> 
>
> Key: KAFKA-6223
> URL: https://issues.apache.org/jira/browse/KAFKA-6223
> Project: Kafka
>  Issue Type: Bug
> Environment: https://dist.apache.org/repos/dist/release/kafka/
>Reporter: Sebb
>Assignee: Rajini Sivaram
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> It's unfair to expect the 3rd party mirrors to carry old releases.
> Note that older releases can still be linked from the download page, but such 
> links should use the archive server at:
> https://archive.apache.org/dist/kafka/
> A suggested process is:
> + Change the download page to use archive.a.o for old releases
> + Delete the corresponding directories from 
> {{https://dist.apache.org/repos/dist/release/kafka/}}
> e.g. {{svn delete https://dist.apache.org/repos/dist/release/kafka/0.8.0}}
> Thanks!
> [1] http://www.apache.org/dev/release.html#when-to-archive



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-6223) Please delete old releases from mirroring system

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257465#comment-16257465
 ] 

Ismael Juma edited comment on KAFKA-6223 at 11/17/17 7:39 PM:
--

Thanks [~rsivaram], I think we can change all pre 0.10 releases to use the 
archive.


was (Author: ijuma):
Thanks @rajini

> Please delete old releases from mirroring system
> 
>
> Key: KAFKA-6223
> URL: https://issues.apache.org/jira/browse/KAFKA-6223
> Project: Kafka
>  Issue Type: Bug
> Environment: https://dist.apache.org/repos/dist/release/kafka/
>Reporter: Sebb
>Assignee: Rajini Sivaram
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> It's unfair to expect the 3rd party mirrors to carry old releases.
> Note that older releases can still be linked from the download page, but such 
> links should use the archive server at:
> https://archive.apache.org/dist/kafka/
> A suggested process is:
> + Change the download page to use archive.a.o for old releases
> + Delete the corresponding directories from 
> {{https://dist.apache.org/repos/dist/release/kafka/}}
> e.g. {{svn delete https://dist.apache.org/repos/dist/release/kafka/0.8.0}}
> Thanks!
> [1] http://www.apache.org/dev/release.html#when-to-archive



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6223) Please delete old releases from mirroring system

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257465#comment-16257465
 ] 

Ismael Juma commented on KAFKA-6223:


Thanks @rajini

> Please delete old releases from mirroring system
> 
>
> Key: KAFKA-6223
> URL: https://issues.apache.org/jira/browse/KAFKA-6223
> Project: Kafka
>  Issue Type: Bug
> Environment: https://dist.apache.org/repos/dist/release/kafka/
>Reporter: Sebb
>Assignee: Rajini Sivaram
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> It's unfair to expect the 3rd party mirrors to carry old releases.
> Note that older releases can still be linked from the download page, but such 
> links should use the archive server at:
> https://archive.apache.org/dist/kafka/
> A suggested process is:
> + Change the download page to use archive.a.o for old releases
> + Delete the corresponding directories from 
> {{https://dist.apache.org/repos/dist/release/kafka/}}
> e.g. {{svn delete https://dist.apache.org/repos/dist/release/kafka/0.8.0}}
> Thanks!
> [1] http://www.apache.org/dev/release.html#when-to-archive



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6223) Please delete old releases from mirroring system

2017-11-17 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257449#comment-16257449
 ] 

Rajini Sivaram commented on KAFKA-6223:
---

[~ijuma] I have updated the downloads page to use the archive link for older 
releases of each major version. So the releases still pointing to mirrors are 
(1.0.0, 0.11.0.2, 0.10.2.1, 0.9.0.1 and 0.8.2.2). I wasn't sure if we wanted to 
delete any of these.

> Please delete old releases from mirroring system
> 
>
> Key: KAFKA-6223
> URL: https://issues.apache.org/jira/browse/KAFKA-6223
> Project: Kafka
>  Issue Type: Bug
> Environment: https://dist.apache.org/repos/dist/release/kafka/
>Reporter: Sebb
>Assignee: Rajini Sivaram
>
> To reduce the load on the ASF mirrors, projects are required to delete old 
> releases [1]
> Please can you remove all non-current releases?
> It's unfair to expect the 3rd party mirrors to carry old releases.
> Note that older releases can still be linked from the download page, but such 
> links should use the archive server at:
> https://archive.apache.org/dist/kafka/
> A suggested process is:
> + Change the download page to use archive.a.o for old releases
> + Delete the corresponding directories from 
> {{https://dist.apache.org/repos/dist/release/kafka/}}
> e.g. {{svn delete https://dist.apache.org/repos/dist/release/kafka/0.8.0}}
> Thanks!
> [1] http://www.apache.org/dev/release.html#when-to-archive



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6222) Download page must not link to dist.apache.org

2017-11-17 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-6222.
---
   Resolution: Fixed
Fix Version/s: 0.11.0.2

Updated along with the 0.11.0.2 changes.

> Download page must not link to dist.apache.org
> --
>
> Key: KAFKA-6222
> URL: https://issues.apache.org/jira/browse/KAFKA-6222
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Rajini Sivaram
> Fix For: 0.11.0.2
>
>
> The download page currently links to dist.apache.org for sigs and hashes.
> However that host is only intended as a development staging host; it is not 
> intended for general downloads.
> Links to hashes and sigs must use https://www.apache.org/dist/kafka
> Note that the KEYS file should really use https: as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6222) Download page must not link to dist.apache.org

2017-11-17 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-6222:
--
Fix Version/s: (was: 0.11.0.2)

> Download page must not link to dist.apache.org
> --
>
> Key: KAFKA-6222
> URL: https://issues.apache.org/jira/browse/KAFKA-6222
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sebb
>Assignee: Rajini Sivaram
>
> The download page currently links to dist.apache.org for sigs and hashes.
> However that host is only intended as a development staging host; it is not 
> intended for general downloads.
> Links to hashes and sigs must use https://www.apache.org/dist/kafka
> Note that the KEYS file should really use https: as well.





[jira] [Commented] (KAFKA-3554) Generate actual data with specific compression ratio and add multi-thread support in the ProducerPerformance tool.

2017-11-17 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257400#comment-16257400
 ] 

Jiangjie Qin commented on KAFKA-3554:
-

[~airbots] Thanks for volunteering to help. The patch needs a rebase, again. I 
guess the reviewers are currently busy. [~ijuma], do you have time to look at 
this patch if I do a rebase? Not sure if we need a KIP for this, though. 
Sometimes we submit KIPs for tooling and sometimes we don't. I am neutral on 
this one. Let me know if you prefer a KIP.

> Generate actual data with specific compression ratio and add multi-thread 
> support in the ProducerPerformance tool.
> --
>
> Key: KAFKA-3554
> URL: https://issues.apache.org/jira/browse/KAFKA-3554
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.1
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 1.1.0
>
>
> Currently the ProducerPerformance tool always generates the payload with the 
> same bytes. This does not work well for testing compressed data because the 
> payload is extremely compressible no matter how big it is.
> We can make some changes to make it more useful for compressed messages. 
> Currently I am generating a payload containing integers from a given range. 
> By adjusting the range of the integers, we can get different compression 
> ratios. 
> API-wise, we can either let the user specify the integer range or the expected 
> compression ratio (we would do some probing to get the corresponding range for 
> the user).
> Besides that, in many cases it is useful to have multiple producer threads 
> when the producer threads themselves are the bottleneck. Admittedly people can 
> run multiple ProducerPerformance instances to achieve a similar result, but it 
> is still different from the real case when people actually use the producer.
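The integer-range idea above can be sketched as follows: drawing payload bytes from a narrower value range yields more repetition and therefore a higher compression ratio. This is only an illustration of the technique, not the actual ProducerPerformance patch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class CompressiblePayload {
    // Draw each payload byte from [0, range); a small range means few distinct
    // values, so the payload compresses well. Assumes range <= 256.
    static byte[] payload(int size, int range, long seed) {
        Random rnd = new Random(seed);
        byte[] buf = new byte[size];
        for (int i = 0; i < size; i++) {
            buf[i] = (byte) rnd.nextInt(range);
        }
        return buf;
    }

    // Measure compressed size with gzip as a stand-in for the broker codec.
    static int gzipSize(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        byte[] narrow = payload(64 * 1024, 4, 42L);   // highly compressible
        byte[] wide   = payload(64 * 1024, 256, 42L); // nearly incompressible
        // The narrow-range payload should compress to well under half the size.
        System.out.println(gzipSize(narrow) < gzipSize(wide) / 2);
    }
}
```

Calibrating the range against a target compression ratio is the API question raised above, and is left open here.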





[jira] [Created] (KAFKA-6230) KafkaConsumer.poll(0) should not block forever if coordinator is not available

2017-11-17 Thread Dong Lin (JIRA)
Dong Lin created KAFKA-6230:
---

 Summary: KafkaConsumer.poll(0) should not block forever if 
coordinator is not available
 Key: KAFKA-6230
 URL: https://issues.apache.org/jira/browse/KAFKA-6230
 Project: Kafka
  Issue Type: Bug
Reporter: Dong Lin








[jira] [Commented] (KAFKA-6048) Support negative record timestamps

2017-11-17 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257307#comment-16257307
 ] 

Matthias J. Sax commented on KAFKA-6048:


Thanks for the input. Doing the less intrusive change might be a good way to 
go. However, I disagree here:

{quote}
but you need a sentinel value anyway unless you encode the lack of timestamp in 
a flag
{quote}

An alternative might be to force people to configure {{AppendTime}} and return 
an error to a producer if it doesn't specify a timestamp and {{CreateTime}} is 
configured.
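The alternative described above can be sketched as broker-side resolution logic: under {{LogAppendTime}} the broker stamps the record itself, while under {{CreateTime}} a missing producer timestamp is rejected instead of being stored as the {{-1}} sentinel. This is a hedged illustration with hypothetical names, not actual broker code.

```java
public class TimestampPolicy {
    enum TimestampType { CREATE_TIME, LOG_APPEND_TIME }

    // Kafka's "no timestamp" sentinel value.
    static final long NO_TIMESTAMP = -1L;

    // Hypothetical resolution: LogAppendTime topics ignore the producer value;
    // CreateTime topics must carry an explicit timestamp, so negative values
    // remain usable end-to-end without a sentinel ambiguity.
    static long resolve(TimestampType type, long producerTs, long brokerNowMs) {
        if (type == TimestampType.LOG_APPEND_TIME) {
            return brokerNowMs; // broker clock wins, producer value irrelevant
        }
        if (producerTs == NO_TIMESTAMP) {
            throw new IllegalArgumentException(
                "CreateTime topic requires an explicit timestamp");
        }
        return producerTs;
    }

    public static void main(String[] args) {
        // LogAppendTime: broker clock is used.
        System.out.println(resolve(TimestampType.LOG_APPEND_TIME, NO_TIMESTAMP, 1000L));
        // CreateTime: a negative (pre-epoch) timestamp passes through unchanged.
        System.out.println(
            resolve(TimestampType.CREATE_TIME, -386380800000L, 1000L) == -386380800000L);
    }
}
```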

> Support negative record timestamps
> --
>
> Key: KAFKA-6048
> URL: https://issues.apache.org/jira/browse/KAFKA-6048
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, core, streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Assignee: james chien
>  Labels: needs-kip
>
> Kafka does not support negative record timestamps, and this prevents the 
> storage of historical data in Kafka. In general, negative timestamps are 
> supported by UNIX system time stamps: 
> From https://en.wikipedia.org/wiki/Unix_time
> {quote}
> The Unix time number is zero at the Unix epoch, and increases by exactly 
> 86,400 per day since the epoch. Thus 2004-09-16T00:00:00Z, 12,677 days after 
> the epoch, is represented by the Unix time number 12,677 × 86,400 = 
> 1095292800. This can be extended backwards from the epoch too, using negative 
> numbers; thus 1957-10-04T00:00:00Z, 4,472 days before the epoch, is 
> represented by the Unix time number −4,472 × 86,400 = −386380800.
> {quote}
> Allowing for negative timestamps would require multiple changes:
>  - while brokers in general do support negative timestamps, brokers use {{-1}} 
> as the default value if a producer uses an old message format (this would not 
> be compatible with supporting negative timestamps "end-to-end", as {{-1}} 
> could no longer be used as "unknown"): we could introduce a message flag 
> indicating a missing timestamp (and let the consumer throw an exception if 
> {{ConsumerRecord#timestamp()}} is called). Another possible solution might be 
> to require topics that are used by old producers to be configured with 
> {{LogAppendTime}} semantics, and to reject writes to topics with 
> {{CreateTime}} semantics for older message formats
>  - {{KafkaProducer}} does not allow sending records with a negative timestamp, 
> and thus this would need to be fixed
>  - the Streams API drops records with negative timestamps (or fails by 
> default) -- also, some internal store implementations for windowed stores 
> assume that there are no negative timestamps when doing range queries
> There might be other gaps we need to address. This is just a summary of the 
> issues coming to my mind atm.





[jira] [Created] (KAFKA-6229) Controller node is not switched in ZK when existing is blocked using iptables

2017-11-17 Thread Navdeep Brar (JIRA)
Navdeep Brar created KAFKA-6229:
---

 Summary: Controller node is not switched in ZK when existing is 
blocked using iptables
 Key: KAFKA-6229
 URL: https://issues.apache.org/jira/browse/KAFKA-6229
 Project: Kafka
  Issue Type: Bug
Reporter: Navdeep Brar


When the controller node is blocked on the network, ZK has no information that 
the controller went down and doesn't switch to a new controller (another 
broker). 

 - This only happens when the connection is blocked using iptables
 - if the kafka service is brought down on the controller node, or even killed, 
then ZK acknowledges it and switches to a new controller






[jira] [Commented] (KAFKA-6228) Intermittent test failure in FetchRequestTest.testDownConversionWithConnectionFailure

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257266#comment-16257266
 ] 

Ismael Juma commented on KAFKA-6228:


cc [~rsivaram]

> Intermittent test failure in 
> FetchRequestTest.testDownConversionWithConnectionFailure
> -
>
> Key: KAFKA-6228
> URL: https://issues.apache.org/jira/browse/KAFKA-6228
> Project: Kafka
>  Issue Type: Test
>Reporter: Ted Yu
>Priority: Minor
>
> From 
> https://builds.apache.org/job/kafka-trunk-jdk8/2219/testReport/junit/kafka.server/FetchRequestTest/testDownConversionWithConnectionFailure/
>  :
> {code}
> java.lang.AssertionError: Fetch size too small 42, broker may have run out of 
> memory
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> kafka.server.FetchRequestTest.kafka$server$FetchRequestTest$$fetch$1(FetchRequestTest.scala:214)
>   at 
> kafka.server.FetchRequestTest$$anonfun$testDownConversionWithConnectionFailure$2.apply(FetchRequestTest.scala:226)
>   at 
> kafka.server.FetchRequestTest$$anonfun$testDownConversionWithConnectionFailure$2.apply(FetchRequestTest.scala:226)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at 
> kafka.server.FetchRequestTest.testDownConversionWithConnectionFailure(FetchRequestTest.scala:226)
> {code}
> I ran FetchRequestTest locally which passed.
> {code}
>   assertTrue(s"Fetch size too small $size, broker may have run out of 
> memory",
>   size > maxPartitionBytes - batchSize)
> {code}
> The assertion message should include maxPartitionBytes and batchSize which 
> would give us more information.





[jira] [Commented] (KAFKA-6048) Support negative record timestamps

2017-11-17 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257265#comment-16257265
 ] 

Ewen Cheslack-Postava commented on KAFKA-6048:
--

A less intrusive change would be to just special case -1. It's unfortunate we 
didn't spot this when initially adding timestamps, but you need a sentinel 
value anyway unless you encode the lack of timestamp in a flag (which we don't 
have, would use a precious flag bit, and would be a more complicated change). 
Then the fix is isolated to fixing `KafkaProducer` and the Streams API and is a 
lot more minimal. Apps might need to be aware of this if they are going to use 
negative timestamps, but a single invalid timestamp value that apps need to 
deal with might be the best tradeoff here, especially since I suspect use of 
negative timestamps is probably relatively uncommon unless you're working with 
historical data.
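Special-casing -1 means exactly one pre-epoch millisecond becomes unrepresentable: a record honestly stamped -1 ms (1969-12-31T23:59:59.999Z) is indistinguishable from "no timestamp". A small sketch of that tradeoff, illustrative rather than actual Kafka client code:

```java
public class SentinelTradeoff {
    // Kafka's sentinel for "timestamp not set".
    static final long NO_TIMESTAMP = -1L;

    // With -1 special-cased, every negative timestamp except -1 stays usable;
    // apps adopting negative timestamps must simply avoid the single value -1.
    static boolean hasTimestamp(long ts) {
        return ts != NO_TIMESTAMP;
    }

    public static void main(String[] args) {
        System.out.println(hasTimestamp(-386380800000L)); // 1957-10-04: valid
        System.out.println(hasTimestamp(-1L));            // ambiguous: reads as unset
    }
}
```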

> Support negative record timestamps
> --
>
> Key: KAFKA-6048
> URL: https://issues.apache.org/jira/browse/KAFKA-6048
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, core, streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Assignee: james chien
>  Labels: needs-kip
>
> Kafka does not support negative record timestamps, and this prevents the 
> storage of historical data in Kafka. In general, negative timestamps are 
> supported by UNIX system time stamps: 
> From https://en.wikipedia.org/wiki/Unix_time
> {quote}
> The Unix time number is zero at the Unix epoch, and increases by exactly 
> 86,400 per day since the epoch. Thus 2004-09-16T00:00:00Z, 12,677 days after 
> the epoch, is represented by the Unix time number 12,677 × 86,400 = 
> 1095292800. This can be extended backwards from the epoch too, using negative 
> numbers; thus 1957-10-04T00:00:00Z, 4,472 days before the epoch, is 
> represented by the Unix time number −4,472 × 86,400 = −386380800.
> {quote}
> Allowing for negative timestamps would require multiple changes:
>  - while brokers in general do support negative timestamps, brokers use {{-1}} 
> as the default value if a producer uses an old message format (this would not 
> be compatible with supporting negative timestamps "end-to-end", as {{-1}} 
> could no longer be used as "unknown"): we could introduce a message flag 
> indicating a missing timestamp (and let the consumer throw an exception if 
> {{ConsumerRecord#timestamp()}} is called). Another possible solution might be 
> to require topics that are used by old producers to be configured with 
> {{LogAppendTime}} semantics, and to reject writes to topics with 
> {{CreateTime}} semantics for older message formats
>  - {{KafkaProducer}} does not allow sending records with a negative timestamp, 
> and thus this would need to be fixed
>  - the Streams API drops records with negative timestamps (or fails by 
> default) -- also, some internal store implementations for windowed stores 
> assume that there are no negative timestamps when doing range queries
> There might be other gaps we need to address. This is just a summary of the 
> issues coming to my mind atm.





[jira] [Created] (KAFKA-6228) Intermittent test failure in FetchRequestTest.testDownConversionWithConnectionFailure

2017-11-17 Thread Ted Yu (JIRA)
Ted Yu created KAFKA-6228:
-

 Summary: Intermittent test failure in 
FetchRequestTest.testDownConversionWithConnectionFailure
 Key: KAFKA-6228
 URL: https://issues.apache.org/jira/browse/KAFKA-6228
 Project: Kafka
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From 
https://builds.apache.org/job/kafka-trunk-jdk8/2219/testReport/junit/kafka.server/FetchRequestTest/testDownConversionWithConnectionFailure/
 :
{code}
java.lang.AssertionError: Fetch size too small 42, broker may have run out of 
memory
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
kafka.server.FetchRequestTest.kafka$server$FetchRequestTest$$fetch$1(FetchRequestTest.scala:214)
at 
kafka.server.FetchRequestTest$$anonfun$testDownConversionWithConnectionFailure$2.apply(FetchRequestTest.scala:226)
at 
kafka.server.FetchRequestTest$$anonfun$testDownConversionWithConnectionFailure$2.apply(FetchRequestTest.scala:226)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at 
kafka.server.FetchRequestTest.testDownConversionWithConnectionFailure(FetchRequestTest.scala:226)
{code}
I ran FetchRequestTest locally which passed.
{code}
  assertTrue(s"Fetch size too small $size, broker may have run out of 
memory",
  size > maxPartitionBytes - batchSize)
{code}
The assertion message should include maxPartitionBytes and batchSize which 
would give us more information.
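The suggested improvement is mechanical: include the bounds in the failure message so a CI-only failure is diagnosable from the log alone. A sketch in Java (the actual test is Scala; the names mirror the snippet above):

```java
public class FetchSizeAssertion {
    // Fail with all three numbers visible, not just the observed size.
    static void assertFetchSize(int size, int maxPartitionBytes, int batchSize) {
        if (size <= maxPartitionBytes - batchSize) {
            throw new AssertionError(String.format(
                "Fetch size too small %d (maxPartitionBytes=%d, batchSize=%d), "
                    + "broker may have run out of memory",
                size, maxPartitionBytes, batchSize));
        }
    }

    public static void main(String[] args) {
        try {
            assertFetchSize(42, 1024, 512); // 42 <= 1024 - 512, so this fails
        } catch (AssertionError e) {
            // The message now carries enough context to reason about the failure.
            System.out.println(e.getMessage().contains("maxPartitionBytes=1024"));
        }
    }
}
```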





[jira] [Commented] (KAFKA-5007) Kafka Replica Fetcher Thread- Resource Leak

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257169#comment-16257169
 ] 

Ismael Juma commented on KAFKA-5007:


0.11.0.2 includes a couple of deadlock fixes and, as you can see in 
https://issues.apache.org/jira/browse/KAFKA-5721?focusedCommentId=16257162&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16257162,
 deadlocks can cause a build-up of connections in CLOSE_WAIT status. The 
announcement is going out very soon, but you can already find it in the mirrors:

http://apache.mirror.anlx.net/kafka/0.11.0.2/

> Kafka Replica Fetcher Thread- Resource Leak
> ---
>
> Key: KAFKA-5007
> URL: https://issues.apache.org/jira/browse/KAFKA-5007
> Project: Kafka
>  Issue Type: Bug
>  Components: core, network
>Affects Versions: 0.10.0.0, 0.10.1.1, 0.10.2.0
> Environment: Centos 7
> Jave 8
>Reporter: Joseph Aliase
>Priority: Critical
>  Labels: reliability
> Attachments: jstack-kafka.out, jstack-zoo.out, lsofkafka.txt, 
> lsofzookeeper.txt
>
>
> Kafka runs out of open file descriptors when the system network interface is 
> down.
> Issue description:
> We have a Kafka cluster of 5 nodes running on version 0.10.1.1. The open file 
> descriptor limit for the account running Kafka is set to 10.
> During an upgrade, the network interface went down. The outage continued for 
> 12 hours; eventually all the brokers crashed with a java.io.IOException: Too 
> many open files error.
> We repeated the test in a lower environment and observed that the open socket 
> count keeps increasing while the NIC is down.
> We have around 13 topics with a max partition count of 120, and the number of 
> replica fetcher threads is set to 8.
> Using an internal monitoring tool we observed that the open socket descriptor 
> count for the broker pid continued to increase although the NIC was down, 
> leading to the open file descriptor error. 





[jira] [Updated] (KAFKA-5153) KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting

2017-11-17 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5153:
---
Labels: reliability  (was: )

> KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
> ---
>
> Key: KAFKA-5153
> URL: https://issues.apache.org/jira/browse/KAFKA-5153
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0, 0.11.0.0
> Environment: RHEL 6
> Java Version  1.8.0_91-b14
>Reporter: Arpan
>Priority: Critical
>  Labels: reliability
> Attachments: ThreadDump_1493564142.dump, ThreadDump_1493564177.dump, 
> ThreadDump_1493564249.dump, server.properties, server_1_72server.log, 
> server_2_73_server.log, server_3_74Server.log
>
>
> Hi Team, 
> I was earlier referring to issue KAFKA-4477 because the problem I am facing 
> is similar. I tried to search for the same reference in the release docs as 
> well but did not find anything in 0.10.1.1 or 0.10.2.0. I am currently using 
> 2.11_0.10.2.0.
> I have a 3-node cluster for Kafka and a cluster for ZK as well on the same set 
> of servers, in cluster mode. We have around 240GB of data getting transferred 
> through Kafka every day. What we are observing is a disconnect of a server 
> from the cluster, the ISR getting reduced, and service starting to be 
> impacted.
> I have also observed the file descriptor count getting increased a bit; in 
> normal circumstances we have not observed an FD count of more than 500, but 
> when the issue started we were observing it in the range of 650-700 on all 3 
> servers. Attaching thread dumps of all 3 servers from when we started facing 
> the issue recently.
> The issue vanishes once you bounce the nodes, and the setup does not run more 
> than 5 days without this issue. Attaching server logs as well.
> Kindly let me know if you need any additional information. Attaching 
> server.properties as well for one of the servers (it's similar on all 3 
> servers).





[jira] [Commented] (KAFKA-5153) KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257165#comment-16257165
 ] 

Ismael Juma commented on KAFKA-5153:


0.11.0.2 includes additional deadlock fixes. If any of the people seeing this 
issue with 0.10.1.1 or 0.10.2.x still see it when they upgrade to 0.11.0.2, it 
would be helpful to know. Thanks.

> KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
> ---
>
> Key: KAFKA-5153
> URL: https://issues.apache.org/jira/browse/KAFKA-5153
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0, 0.11.0.0
> Environment: RHEL 6
> Java Version  1.8.0_91-b14
>Reporter: Arpan
>Priority: Critical
>  Labels: reliability
> Attachments: ThreadDump_1493564142.dump, ThreadDump_1493564177.dump, 
> ThreadDump_1493564249.dump, server.properties, server_1_72server.log, 
> server_2_73_server.log, server_3_74Server.log
>
>
> Hi Team, 
> I was earlier referring to issue KAFKA-4477 because the problem I am facing 
> is similar. I tried to search for the same reference in the release docs as 
> well but did not find anything in 0.10.1.1 or 0.10.2.0. I am currently using 
> 2.11_0.10.2.0.
> I have a 3-node cluster for Kafka and a cluster for ZK as well on the same set 
> of servers, in cluster mode. We have around 240GB of data getting transferred 
> through Kafka every day. What we are observing is a disconnect of a server 
> from the cluster, the ISR getting reduced, and service starting to be 
> impacted.
> I have also observed the file descriptor count getting increased a bit; in 
> normal circumstances we have not observed an FD count of more than 500, but 
> when the issue started we were observing it in the range of 650-700 on all 3 
> servers. Attaching thread dumps of all 3 servers from when we started facing 
> the issue recently.
> The issue vanishes once you bounce the nodes, and the setup does not run more 
> than 5 days without this issue. Attaching server logs as well.
> Kindly let me know if you need any additional information. Attaching 
> server.properties as well for one of the servers (it's similar on all 3 
> servers).





[jira] [Updated] (KAFKA-6216) kafka logs for misconfigured ssl clients are unhelpful

2017-11-17 Thread radai rosenblatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

radai rosenblatt updated KAFKA-6216:

Affects Version/s: (was: 1.0.0)
   0.10.2.1

> kafka logs for misconfigured ssl clients are unhelpful
> --
>
> Key: KAFKA-6216
> URL: https://issues.apache.org/jira/browse/KAFKA-6216
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.1
>Reporter: radai rosenblatt
>
> if you misconfigure the keystores on an ssl client, you will currently get a 
> log full of these:
> {code}
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470)
>   at 
> org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:195)
>   at 
> org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:163)
>   at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:731)
>   at 
> org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:54)
>   at org.apache.kafka.common.network.Selector.doClose(Selector.java:540)
>   at org.apache.kafka.common.network.Selector.close(Selector.java:531)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:378)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:131)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> these are caught and printed as part of the client Selector trying to close 
> the channel after having caught an IOException (let's call that the root 
> issue).
> the root issue itself is only logged at debug, which is not enabled 99% of the 
> time, leaving users with very little to go on as to what's gone wrong.
> after turning on debug logging, the root issue clearly indicated what the 
> problem was in our case:
> {code}
> javax.net.ssl.SSLHandshakeException: General SSLEngine problem
>   at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1409)
>   at 
> sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
>   at 
> sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
>   at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
>   at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
>   at 
> org.apache.kafka.common.network.SslTransportLayer.handshakeWrap(SslTransportLayer.java:382)
>   at 
> org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:243)
>   at 
> org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:69)
>   at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:350)
>   at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
>   at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:131)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
>   at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
>   at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
>   at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
>   at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
>   at 
> sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1478)
>   at 
> sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:212)
>   at sun.security.ssl.Handshaker.processLoop(Handshaker.java:957)
>   at sun.security.ssl.Handshaker$1.run(Handshaker.java:897)
>   at sun.security.ssl.Handshaker$1.run(Handshaker.java:894)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1347)
>   at 
> org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:336)
>   at 
> org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:417)
>   at 
> org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:270)
>   ... 7 more
> Caused by: sun.security.validator.ValidatorExc
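Until the log level of the root failure is raised, the workaround the report implies is enabling debug logging for the client's network layer. A hypothetical log4j.properties fragment, with the logger name taken from the package in the stack trace above:

```properties
# Surface SSL handshake failures that the Selector otherwise logs at DEBUG.
log4j.logger.org.apache.kafka.common.network=DEBUG
```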

[jira] [Commented] (KAFKA-5721) Kafka broker stops after network failure

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257162#comment-16257162
 ] 

Ismael Juma commented on KAFKA-5721:


Your stacktrace says that a "deadlock" was found. It could be a duplicate of 
KAFKA-4477, KAFKA-4399 or KAFKA-5970. [~rsivaram], can you please check and 
close if appropriate?

> Kafka broker stops after network failure
> 
>
> Key: KAFKA-5721
> URL: https://issues.apache.org/jira/browse/KAFKA-5721
> Project: Kafka
>  Issue Type: Bug
>Reporter: Val Feron
> Attachments: thread_dump_kafka.out
>
>
> +Cluster description+
> * 3 brokers
> * version 0.10.1.1
> * running on AWS
> +Description+
> The following will happen at random intervals, on random brokers
> From the logs here is the information I could gather :
> # Shrinking Intra-cluster replication on a random broker (I suppose it could 
> be a temporary network failure but couldn't produce evidence of it)
> # System starts showing close to no activity @02:27:20 (note that it's not 
> load related as it happens at very quiet times)
> !https://i.stack.imgur.com/g1Pem.png!
> # From there, this kafka broker doesn't process messages which is expected 
> IMO as it dropped out of the cluster replication.
> # Now the real issue appears as the number of connections in CLOSE_WAIT is 
> constantly increasing until it reaches the configured ulimit of the 
> system/process, ending up crashing the kafka process.
> Now, I've been changing limits to see if kafka would eventually rejoin the 
> ISR before crashing, but even with a very high limit, kafka just seems stuck 
> in a weird state and never recovers.
> Note that between the time when the faulty broker is on its own and the time 
> it crashes, kafka is still listening for kafka producers.
> For this single crash, I could see 320 errors like this from the producers :
> {code}java.util.concurrent.ExecutionException: 
> org.springframework.kafka.core.KafkaProducerException: Failed to send; nested 
> exception is org.apache.kafka.common.errors.NotLeaderForPartitionException: 
> This server is not the leader for that topic-partition.
> {code}
> 
> The configuration being the default one and the use being quite standard, I'm 
> wondering if I missed something.
> I put in place a script that check the number of kafka file descriptors and 
> restarts the service when it gets abnormally high, which does the trick for 
> now but I still lose messages when it crashes.
> I'm available to make any verification / test you need.
> Could see some similarities with ticket KAFKA-5007 too.
> cc [~junrao]
> Confirming the logs seen at the time of failure :
> {code}
>  /var/log/kafka/kafkaServer.out:[2017-08-09 22:13:29,045] WARN 
> [ReplicaFetcherThread-0-2], Error in fetch 
> kafka.server.ReplicaFetcherThread$FetchRequest@4a63f79 
> (kafka.server.ReplicaFetcherThread)
> /var/log/kafka/kafkaServer.out-java.io.IOException: Connection to 2 was 
> disconnected before the response was read
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> /var/log/kafka/kafkaServer.out-   at 
> scala.Option.foreach(Option.scala:257)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> /var/log/kafka/kafkaServer.out-   at 
> kafka.serve

[jira] [Resolved] (KAFKA-6103) one broker appear to dead lock after running serval hours with a fresh cluster

2017-11-17 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-6103.

Resolution: Duplicate

> one broker appear to dead lock after running serval hours with a fresh cluster
> --
>
> Key: KAFKA-6103
> URL: https://issues.apache.org/jira/browse/KAFKA-6103
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 0.10.1.0
> Environment: brokers: 3
> Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-96-generic x86_64)
>  cpu: 8 core mem: 16G
>Reporter: Peyton Peng
>
> today we recreated a fresh kafka cluster with three brokers; at the 
> beginning everything ran well without exception. the main configuration is 
> listed below:
> num.io.threads=16
> num.network.threads=3
> offsets.commit.timeout.ms=1
> #offsets.topic.num.partitions=60
> default.replication.factor=3
> offsets.topic.replication.factor=3
> num.replica.fetchers=4
> replica.fetch.wait.max.ms=1000
> replica.lag.time.max.ms=2
> replica.socket.receive.buffer.bytes=1048576
> replica.socket.timeout.ms=6
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
> log.dirs=/data/kafka-logs
> num.partitions=12
> log.retention.hours=48
> log.roll.hours=48
> zookeeper.connect=
> listeners=PLAINTEXT://:9092
> advertised.listeners=PLAINTEXT://*:9092
> broker.id=3
> after several hours we got the LOOP exception from the consumer layer as 
> below:
> "Marking the coordinator ### dead".
> We checked and found one broker running with no IO; CPU rate is ok, memory ok 
> also, but the other two brokers throw this exception:
> java.io.IOException: Connection to 3 was disconnected before the response was 
> read
>   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
>   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
>   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
>   at 
> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
>   at 
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
>   at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>   at 
> kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
>   at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
>   at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>   at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>   at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> Finally, the JVM stack dump showed a deadlock:
> Found one Java-level deadlock:
> =
> "executor-Heartbeat":
>   waiting to lock monitor 0x7f747c038db8 (object 0x0005886a5ae8, a 
> kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
> "group-metadata-manager-0":
>   waiting to lock monitor 0x7f757c0085e8 (object 0x0005a71fccb0, a 
> java.util.LinkedList),
>   which is held by "kafka-request-handler-7"
> "kafka-request-handler-7":
>   waiting to lock monitor 0x7f747c038db8 (object 0x0005886a5ae8, a 
> kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
> Java stack information for the threads listed above:
> ===
> "executor-Heartbeat":
> at 
> kafka.coordinator.GroupCoordinator.onExpireHeartbeat(GroupCoordinator.scala:739)
> - waiting to lock <0x0005886a5ae8> (a 
> kafka.coordinator.GroupMetadata)
> at 
> kafka.coordinator.DelayedHeartbeat.onExpiration(DelayedHeartbeat.scala:33)
> at kafka.server.DelayedOperation.run(DelayedOperation.scala:107)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "group-meta
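The cycle reported above (executor-Heartbeat waits on the GroupMetadata monitor held by group-metadata-manager-0, which waits on a LinkedList monitor held by kafka-request-handler-7, which in turn waits on GroupMetadata) is a classic lock-ordering deadlock. As a hedged illustration only (the class and lock names below are hypothetical stand-ins, not Kafka's actual code), this sketch shows the pattern and the standard fix: every code path acquires the two monitors in the same global order, so no cycle can form.

```java
// Illustrative sketch of avoiding a lock-ordering deadlock.
// groupMetadata and pendingQueue stand in for the GroupMetadata and
// LinkedList monitors from the stack dump above; names are hypothetical.
public class LockOrderingDemo {
    private static final Object groupMetadata = new Object();
    private static final Object pendingQueue  = new Object();

    // Deadlock-prone code would lock metadata->queue on one thread and
    // queue->metadata on another. Here BOTH paths use the same order:
    // groupMetadata first, then pendingQueue, so the wait graph is acyclic.
    static int expireHeartbeat() {
        synchronized (groupMetadata) {
            synchronized (pendingQueue) {
                return 1;
            }
        }
    }

    static int flushGroupMetadata() {
        synchronized (groupMetadata) {      // same order as expireHeartbeat()
            synchronized (pendingQueue) {
                return 2;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> System.out.println("op1=" + expireHeartbeat()));
        Thread t2 = new Thread(() -> System.out.println("op2=" + flushGroupMetadata()));
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("done");         // both threads completed; no deadlock
    }
}
```

With inconsistent ordering this program could hang forever under the right interleaving; with a single global order it always terminates.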

[jira] [Commented] (KAFKA-6103) one broker appear to dead lock after running serval hours with a fresh cluster

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257159#comment-16257159
 ] 

Ismael Juma commented on KAFKA-6103:


KAFKA-4399 for example. The workaround is to upgrade.

> one broker appear to dead lock after running serval hours with a fresh cluster
> --
>
> Key: KAFKA-6103
> URL: https://issues.apache.org/jira/browse/KAFKA-6103
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 0.10.1.0
> Environment: brokers: 3
> Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-96-generic x86_64)
>  cpu: 8 core mem: 16G
>Reporter: Peyton Peng
>
> Today we recreated a fresh Kafka cluster with three brokers; at the 
> beginning everything ran well without exception. The main configuration is 
> listed below:
> num.io.threads=16
> num.network.threads=3
> offsets.commit.timeout.ms=1
> #offsets.topic.num.partitions=60
> default.replication.factor=3
> offsets.topic.replication.factor=3
> num.replica.fetchers=4
> replica.fetch.wait.max.ms=1000
> replica.lag.time.max.ms=2
> replica.socket.receive.buffer.bytes=1048576
> replica.socket.timeout.ms=6
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
> log.dirs=/data/kafka-logs
> num.partitions=12
> log.retention.hours=48
> log.roll.hours=48
> zookeeper.connect=
> listeners=PLAINTEXT://:9092
> advertised.listeners=PLAINTEXT://*:9092
> broker.id=3
> After several hours we got the looping exception below from the consumer layer:
> "Marking the coordinator ### dead".
> We checked and found one broker running with no IO; its CPU and memory usage 
> were OK, but the other two brokers threw this exception:
> java.io.IOException: Connection to 3 was disconnected before the response was 
> read
>   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
>   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
>   at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
>   at 
> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
>   at 
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
>   at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>   at 
> kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
>   at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
>   at 
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>   at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>   at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> Finally, the JVM stack dump showed a deadlock:
> Found one Java-level deadlock:
> =
> "executor-Heartbeat":
>   waiting to lock monitor 0x7f747c038db8 (object 0x0005886a5ae8, a 
> kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
> "group-metadata-manager-0":
>   waiting to lock monitor 0x7f757c0085e8 (object 0x0005a71fccb0, a 
> java.util.LinkedList),
>   which is held by "kafka-request-handler-7"
> "kafka-request-handler-7":
>   waiting to lock monitor 0x7f747c038db8 (object 0x0005886a5ae8, a 
> kafka.coordinator.GroupMetadata),
>   which is held by "group-metadata-manager-0"
> Java stack information for the threads listed above:
> ===
> "executor-Heartbeat":
> at 
> kafka.coordinator.GroupCoordinator.onExpireHeartbeat(GroupCoordinator.scala:739)
> - waiting to lock <0x0005886a5ae8> (a 
> kafka.coordinator.GroupMetadata)
> at 
> kafka.coordinator.DelayedHeartbeat.onExpiration(DelayedHeartbeat.scala:33)
> at kafka.server.DelayedOperation.run(DelayedOperation.scala:107)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo

[jira] [Updated] (KAFKA-6227) Kafka 0.11.01 New consumer - multiple consumers under same group not working as expected

2017-11-17 Thread Ramkumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar updated KAFKA-6227:

Description: 
In Kafka 0.8 high-level consumers, the consumer.id under group.id 
differentiates consumer connections and manages rebalancing of the 
partitions in ZooKeeper. Our service uses this logic and keeps the Kafka 
stream connection in a cache (ConcurrentHashMap), so that consecutive HTTP 
client connections don't have to make a stream connection but take it from 
the cache and read off the messages. This also lets multiple consumers under 
the same group.id connect to Kafka simultaneously and read off messages 
(load balancing).


In Kafka 0.11.0.1, the new consumer API design has changed. The consumer.id 
property is no longer supported, and the connections with ZooKeeper are 
managed by Kafka itself. When two consumer instances under the same group 
attempt to connect simultaneously, one connection waits in the 
consumer.poll() method until the other one (which is already active) drops 
its connection. 
http://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
 That is, at any point only one active consumer instance is able to poll 
messages from the topic. This would slightly change the behavior of our 
service in that we would have to restrict it to one consumer connection per 
group per topic; that is, we couldn't hold the connection in the cache if 
multiple consumers under the same group need to use Kafka.
I couldn't find any property that allows multiple consumer connections on 
the same group.

Manual partition assignment may be a workaround, but it is far too complex 
to handle in a service: the service would need to track the consumer 
connections, assign the partitions of the topic, and do the rebalancing 
(what the Kafka 0.8 high-level consumer does natively).
Unfortunately, we cannot use the legacy APIs in Kafka 0.11, since the 
documentation mentions performance degradation when using old APIs in new 
versions.

Was there a solution devised for how the high-level consumer of Kafka 0.8 
can be migrated without any change in behavior from the user's perspective?


  was:
In Kafka 0.8 high-level consumers, the consumer.id under group.id 
differentiates consumer connections and manages rebalancing of the 
partitions in ZooKeeper. Our service uses this logic and keeps the Kafka 
stream connection in a cache (ConcurrentHashMap), so that consecutive HTTP 
client connections don't have to make a stream connection but take it from 
the cache and read off the messages. This also lets multiple consumers under 
the same group.id connect to Kafka simultaneously and read off messages 
(load balancing).


In Kafka 0.11.0.1, the new consumer API design has changed. The consumer.id 
property is no longer supported, and the connections with ZooKeeper are 
managed by Kafka itself. When two consumer instances under the same group 
attempt to connect simultaneously, one connection waits in the 
consumer.poll() method until the other one drops its connection. 
http://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
 That is, at any point only one active consumer instance is able to poll 
messages from the topic. This would slightly change the behavior of our 
service in that we would have to restrict it to one consumer connection per 
group per topic; that is, we couldn't hold the connection in the cache if 
multiple consumers under the same group need to use Kafka.
I couldn't find any property that allows multiple consumer connections on 
the same group.

Manual partition assignment may be a workaround, but it is far too complex 
to handle in a service: the service would need to track the consumer 
connections, assign the partitions of the topic, and do the rebalancing 
(what the Kafka 0.8 high-level consumer does natively).
Unfortunately, we cannot use the legacy APIs in Kafka 0.11, since the 
documentation mentions performance degradation when using old APIs in new 
versions.

Was there a solution devised for how the high-level consumer of Kafka 0.8 
can be migrated without any change in behavior from the user's perspective?



> Kafka 0.11.01 New consumer - multiple consumers under same group not working 
> as expected
> 
>
> Key: KAFKA-6227
> URL: https://issues.apache.org/jira/browse/KAFKA-6227
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.11.0.1
>Reporter: Ramkumar
>
> In Kafka 0.8 high-level consumers, the consumer.id under group.id 
> differentiates consumer connections and manages rebalancing of the 
> partitions in ZooKeeper.  Ou

[jira] [Updated] (KAFKA-6227) Kafka 0.11.01 New consumer - multiple consumers under same group not working as expected

2017-11-17 Thread Ramkumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar updated KAFKA-6227:

Description: 
In Kafka 0.8 high-level consumers, the consumer.id under group.id 
differentiates consumer connections and manages rebalancing of the 
partitions in ZooKeeper. Our service uses this logic and keeps the Kafka 
stream connection in a cache (ConcurrentHashMap), so that consecutive HTTP 
client connections don't have to make a stream connection but take it from 
the cache and read off the messages. This also lets multiple consumers under 
the same group.id connect to Kafka simultaneously and read off messages 
(load balancing).


In Kafka 0.11.0.1, the new consumer API design has changed. The consumer.id 
property is no longer supported, and the connections with ZooKeeper are 
managed by Kafka itself. When two consumer instances under the same group 
attempt to connect simultaneously, one connection waits in the 
consumer.poll() method until the other one drops its connection. 
http://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
 That is, at any point only one active consumer instance is able to poll 
messages from the topic. This would slightly change the behavior of our 
service in that we would have to restrict it to one consumer connection per 
group per topic; that is, we couldn't hold the connection in the cache if 
multiple consumers under the same group need to use Kafka.
I couldn't find any property that allows multiple consumer connections on 
the same group.

Manual partition assignment may be a workaround, but it is far too complex 
to handle in a service: the service would need to track the consumer 
connections, assign the partitions of the topic, and do the rebalancing 
(what the Kafka 0.8 high-level consumer does natively).
Unfortunately, we cannot use the legacy APIs in Kafka 0.11, since the 
documentation mentions performance degradation when using old APIs in new 
versions.

Was there a solution devised for how the high-level consumer of Kafka 0.8 
can be migrated without any change in behavior from the user's perspective?


  was:

In Kafka 0.8 high-level consumers, the consumer.id under group.id 
differentiates consumer connections and manages rebalancing of the 
partitions. Our service uses this logic and keeps the Kafka stream 
connection in a cache (ConcurrentHashMap), so that consecutive HTTP client 
connections don't have to make a stream connection but take it from the 
cache and read off the messages. This also lets multiple consumers under 
the same group.id connect to Kafka simultaneously and read off messages 
(load balancing).


In Kafka 0.11.0.1, the new consumer API design has changed. The consumer.id 
property is no longer supported, and the connections with ZooKeeper are 
managed by Kafka itself. When two consumer instances under the same group 
attempt to connect simultaneously, one connection waits in the 
consumer.poll() method until the other one drops its connection. 
http://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
 That is, at any point only one active consumer instance is able to poll 
messages from the topic. This would slightly change the behavior of our 
service in that we would have to restrict it to one consumer connection per 
group per topic; that is, we couldn't hold the connection in the cache if 
multiple consumers under the same group need to use Kafka.
I couldn't find any property that allows multiple consumer connections on 
the same group.

Manual partition assignment may be a workaround, but it is far too complex 
to handle in a service: the service would need to track the consumer 
connections, assign the partitions of the topic, and do the rebalancing 
(what the Kafka 0.8 high-level consumer does natively).
Unfortunately, we cannot use the legacy APIs in Kafka 0.11, since the 
documentation mentions performance degradation when using old APIs in new 
versions.

Was there a solution devised for how the high-level consumer of Kafka 0.8 
can be migrated without any change in behavior from the user's perspective?



> Kafka 0.11.01 New consumer - multiple consumers under same group not working 
> as expected
> 
>
> Key: KAFKA-6227
> URL: https://issues.apache.org/jira/browse/KAFKA-6227
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.11.0.1
>Reporter: Ramkumar
>
> In Kafka 0.8 high-level consumers, the consumer.id under group.id 
> differentiates consumer connections and manages rebalancing of the 
> partitions in ZooKeeper.  Our service uses this logic and keeps the

[jira] [Created] (KAFKA-6227) Kafka 0.11.01 New consumer - multiple consumers under same group not working as expected

2017-11-17 Thread Ramkumar (JIRA)
Ramkumar created KAFKA-6227:
---

 Summary: Kafka 0.11.01 New consumer - multiple consumers under 
same group not working as expected
 Key: KAFKA-6227
 URL: https://issues.apache.org/jira/browse/KAFKA-6227
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.11.0.1
Reporter: Ramkumar



In Kafka 0.8 high-level consumers, the consumer.id under group.id 
differentiates consumer connections and manages rebalancing of the 
partitions. Our service uses this logic and keeps the Kafka stream 
connection in a cache (ConcurrentHashMap), so that consecutive HTTP client 
connections don't have to make a stream connection but take it from the 
cache and read off the messages. This also lets multiple consumers under 
the same group.id connect to Kafka simultaneously and read off messages 
(load balancing).


In Kafka 0.11.0.1, the new consumer API design has changed. The consumer.id 
property is no longer supported, and the connections with ZooKeeper are 
managed by Kafka itself. When two consumer instances under the same group 
attempt to connect simultaneously, one connection waits in the 
consumer.poll() method until the other one drops its connection. 
http://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
 That is, at any point only one active consumer instance is able to poll 
messages from the topic. This would slightly change the behavior of our 
service in that we would have to restrict it to one consumer connection per 
group per topic; that is, we couldn't hold the connection in the cache if 
multiple consumers under the same group need to use Kafka.
I couldn't find any property that allows multiple consumer connections on 
the same group.

Manual partition assignment may be a workaround, but it is far too complex 
to handle in a service: the service would need to track the consumer 
connections, assign the partitions of the topic, and do the rebalancing 
(what the Kafka 0.8 high-level consumer does natively).
Unfortunately, we cannot use the legacy APIs in Kafka 0.11, since the 
documentation mentions performance degradation when using old APIs in new 
versions.

Was there a solution devised for how the high-level consumer of Kafka 0.8 
can be migrated without any change in behavior from the user's perspective?
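The caching scheme the reporter describes (one shared stream connection per group.id, handed out to successive HTTP requests from a ConcurrentHashMap) can be sketched with plain JDK types. This is a hedged illustration of that pattern under the reporter's description, not the service's actual code; ConsumerConn is a hypothetical stand-in for the Kafka stream handle.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionCacheDemo {
    // Hypothetical stand-in for a Kafka consumer/stream connection.
    static class ConsumerConn {
        final String groupId;
        ConsumerConn(String groupId) { this.groupId = groupId; }
    }

    static final AtomicInteger created = new AtomicInteger();
    static final ConcurrentMap<String, ConsumerConn> cache = new ConcurrentHashMap<>();

    // computeIfAbsent atomically creates at most one connection per group.id,
    // even under concurrent HTTP requests, matching the cache described above.
    static ConsumerConn getOrCreate(String groupId) {
        return cache.computeIfAbsent(groupId, g -> {
            created.incrementAndGet();
            return new ConsumerConn(g);
        });
    }

    public static void main(String[] args) {
        ConsumerConn a = getOrCreate("group-1");
        ConsumerConn b = getOrCreate("group-1"); // second request reuses the cached conn
        System.out.println("same=" + (a == b) + " created=" + created.get());
        // prints: same=true created=1
    }
}
```

Under the new consumer's group protocol, however, sharing one such handle is exactly what serializes the group to a single active poller, which is the behavior change the reporter observes.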




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6046) DeleteRecordsRequest to a non-leader

2017-11-17 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-6046.

Resolution: Fixed

> DeleteRecordsRequest to a non-leader
> 
>
> Key: KAFKA-6046
> URL: https://issues.apache.org/jira/browse/KAFKA-6046
> Project: Kafka
>  Issue Type: Bug
>Reporter: Tom Bentley
>Assignee: Ted Yu
> Fix For: 1.1.0
>
>
> When a `DeleteRecordsRequest` is sent to a broker that's not the leader for 
> the partition the  `DeleteRecordsResponse` returns 
> `UNKNOWN_TOPIC_OR_PARTITION`. This is ambiguous (does the topic not exist on 
> any broker, or did we just send the request to the wrong broker?), and 
> inconsistent (a `ProduceRequest` would return `NOT_LEADER_FOR_PARTITION`).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6046) DeleteRecordsRequest to a non-leader

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257154#comment-16257154
 ] 

ASF GitHub Bot commented on KAFKA-6046:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4052


> DeleteRecordsRequest to a non-leader
> 
>
> Key: KAFKA-6046
> URL: https://issues.apache.org/jira/browse/KAFKA-6046
> Project: Kafka
>  Issue Type: Bug
>Reporter: Tom Bentley
>Assignee: Ted Yu
> Fix For: 1.1.0
>
>
> When a `DeleteRecordsRequest` is sent to a broker that's not the leader for 
> the partition the  `DeleteRecordsResponse` returns 
> `UNKNOWN_TOPIC_OR_PARTITION`. This is ambiguous (does the topic not exist on 
> any broker, or did we just send the request to the wrong broker?), and 
> inconsistent (a `ProduceRequest` would return `NOT_LEADER_FOR_PARTITION`).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-6221:
---
Priority: Minor  (was: Major)

> ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic 
> creation 
> ---
>
> Key: KAFKA-6221
> URL: https://issues.apache.org/jira/browse/KAFKA-6221
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1, 0.11.0.1, 1.0.0
> Environment: RHEL 7
>Reporter: Alex Dunayevsky
>Priority: Minor
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This issue appeared to happen frequently on 0.10.2.0. 
> On 0.10.2.1 and 1.0.0 it is much harder to reproduce. 
> We'll focus on reproducing it on 0.10.2.1 and 1.0.0.
> *TOPOLOGY:* 
>   3 brokers, 1 zk.
> *REPRODUCING STRATEGY:* 
> Create a few dozen topics (say, 40) one by one, each with replication factor 
> 2. The number of partitions generally does not matter but, for easier 
> reproduction, should not be too small (around 30 or so). 
> *CREATE 40 TOPICS:*
> {code:java} for i in {1..40}; do bin/kafka-topics.sh --create --topic 
> "topic${i}_p28_r2" --partitions 28 --replication-factor 2 --zookeeper :2165; 
> done {code}
> *ERRORS*
> {code:java}
> *BROKER 1*
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> *BROKER 2*
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,0] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherT

[jira] [Commented] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257059#comment-16257059
 ] 

Ismael Juma commented on KAFKA-6221:


[~alex.dunayevsky], I think the best we can do is add a clarifying message. 
This error may indicate real issues if it doesn't fix itself after a short 
period of time.

> ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic 
> creation 
> ---
>
> Key: KAFKA-6221
> URL: https://issues.apache.org/jira/browse/KAFKA-6221
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1, 0.11.0.1, 1.0.0
> Environment: RHEL 7
>Reporter: Alex Dunayevsky
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This issue appeared to happen frequently on 0.10.2.0. 
> On 0.10.2.1 and 1.0.0 it is much harder to reproduce. 
> We'll focus on reproducing it on 0.10.2.1 and 1.0.0.
> *TOPOLOGY:* 
>   3 brokers, 1 zk.
> *REPRODUCING STRATEGY:* 
> Create a few dozen topics (say, 40) one by one, each with replication factor 
> 2. The number of partitions generally does not matter but, for easier 
> reproduction, should not be too small (around 30 or so). 
> *CREATE 40 TOPICS:*
> {code:java} for i in {1..40}; do bin/kafka-topics.sh --create --topic 
> "topic${i}_p28_r2" --partitions 28 --replication-factor 2 --zookeeper :2165; 
> done {code}
> *ERRORS*
> {code:java}
> *BROKER 1*
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> *BROKER 2*
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2

[jira] [Resolved] (KAFKA-679) Phabricator for code review

2017-11-17 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-679.
---
Resolution: Won't Do

Yes, this can be safely closed.

> Phabricator for code review
> ---
>
> Key: KAFKA-679
> URL: https://issues.apache.org/jira/browse/KAFKA-679
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Neha Narkhede
>Assignee: Sriram Subramanian
>
> Sriram proposed adding phabricator support for code reviews. 
> From http://phabricator.org/ : "Phabricator is an open source collection of 
> web applications which make it easier to write, review, and share source 
> code. It is currently available as an early release. Phabricator was 
> developed at Facebook."
> It's open source so pretty much anyone could host an instance of this 
> software.
> To begin with, there will be a public-facing instance located at 
> http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
> http://osuosl.org).
> We can use this JIRA to deal with adding (and ensuring) Apache-friendly 
> support that will allow us to do code reviews with Phabricator for Kafka.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Alex Dunayevsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Dunayevsky updated KAFKA-6221:
---
Affects Version/s: 0.11.0.1

> ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic 
> creation 
> ---
>
> Key: KAFKA-6221
> URL: https://issues.apache.org/jira/browse/KAFKA-6221
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1, 0.11.0.1, 1.0.0
> Environment: RHEL 7
>Reporter: Alex Dunayevsky
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This issue appeared to happen frequently on 0.10.2.0. 
> On 0.10.2.1 and 1.0.0 it's way harder to reproduce. 
> We'll focus on reproducing it on 0.10.2.1 and 1.0.0.
> *TOPOLOGY:* 
>   3 brokers, 1 zk.
> *REPRODUCING STRATEGY:* 
> Create a few dozen topics (say, 40) one by one, each with replication factor 
> 2. The number of partitions generally does not matter but, for easier 
> reproduction, should not be too small (around 30 or so). 
> *CREATE 40 TOPICS:*
> {code:java} for i in {1..40}; do bin/kafka-topics.sh --create --topic 
> "topic${i}_p28_r2" --partitions 28 --replication-factor 2 --zookeeper :2165; 
> done {code}
> *ERRORS*
> {code:java}
> *BROKER 1*
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> *BROKER 2*
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,0] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16

[jira] [Commented] (KAFKA-679) Phabricator for code review

2017-11-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257000#comment-16257000
 ] 

Sönke Liebau commented on KAFKA-679:


Is there still any activity ongoing or pending in this direction? Seeing as the 
ticket has not been updated in five years and review/coding/etc. has moved to 
GitHub, I'd assume this can be closed?

> Phabricator for code review
> ---
>
> Key: KAFKA-679
> URL: https://issues.apache.org/jira/browse/KAFKA-679
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Neha Narkhede
>Assignee: Sriram Subramanian
>
> Sriram proposed adding phabricator support for code reviews. 
> From http://phabricator.org/ : "Phabricator is an open source collection of 
> web applications which make it easier to write, review, and share source 
> code. It is currently available as an early release. Phabricator was 
> developed at Facebook."
> It's open source so pretty much anyone could host an instance of this 
> software.
> To begin with, there will be a public-facing instance located at 
> http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
> http://osuosl.org).
> We can use this JIRA to deal with adding (and ensuring) Apache-friendly 
> support that will allow us to do code reviews with Phabricator for Kafka.





[jira] [Commented] (KAFKA-517) Ensure that we escape the metric names if they include user strings

2017-11-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256991#comment-16256991
 ] 

Sönke Liebau commented on KAFKA-517:


A recent 
[commit|https://github.com/apache/kafka/commit/9be71f7bdcd147aee7a360a4ccf400acb858a056]
 introduced a jmxSanitizer to properly quote JMX strings where necessary. It is 
currently applied to tags and mbean strings, but not to the name.
This is probably not a real problem, as I couldn't find any occurrences of 
dynamic metric names, so any exceptions should surface during testing when new 
metrics are added. Still, it is a potential exception that can be avoided by 
sanitizing names as well.

We could apply the transformation 
[here|https://github.com/apache/kafka/blob/7672e9ec3def7af6797bc0ecf254ac694efdfad5/core/src/main/scala/kafka/metrics/KafkaMetricsGroup.scala#L64].
 As far as I can tell (and I am by no means an expert on JMX) this should not 
change anything for properly named metrics (which is currently all of them), 
but if someone ever adds one with an illegal name it would not cause an 
exception. 

I am unsure whether this is a useful addition or whether we'd rather have new 
metrics fail so that the author can change the name to something valid. Maybe 
someone can comment on my musings; I'm happy to create a small pull request if 
we deem this useful. If not, I believe we can close the issue.
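For illustration, the JDK's `javax.management.ObjectName.quote` shows the effect such sanitizing would have. This is a standalone sketch of the idea, not the Kafka jmxSanitizer itself; the metric string is invented:

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxNameQuoteDemo {
    public static void main(String[] args) throws Exception {
        String metricName = "Fetch,Rate"; // comma is illegal in an unquoted key property
        try {
            // Without quoting, constructing the ObjectName fails.
            new ObjectName("kafka.server:type=FetcherStats,name=" + metricName);
        } catch (MalformedObjectNameException e) {
            System.out.println("unquoted fails: " + e);
        }
        // ObjectName.quote escapes special characters and wraps the value in
        // double quotes, making the same string a legal key property value.
        ObjectName ok = new ObjectName(
                "kafka.server:type=FetcherStats,name=" + ObjectName.quote(metricName));
        System.out.println("quoted ok: " + ok.getCanonicalName());
    }
}
```

This is why sanitizing the name in KafkaMetricsGroup would turn a construction-time exception into a (slightly ugly but valid) quoted mbean name.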

> Ensure that we escape the metric names if they include user strings
> ---
>
> Key: KAFKA-517
> URL: https://issues.apache.org/jira/browse/KAFKA-517
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
>  Labels: bugs
>
> JMX has limits on valid strings. We need to check validity before blindly 
> creating a metric that includes a given topic name. If we fail to do this we 
> will get an exception like this:
> javax.management.MalformedObjectNameException: Unterminated key property part
>   at javax.management.ObjectName.construct(ObjectName.java:540)
>   at javax.management.ObjectName.<init>(ObjectName.java:1403)
>   at 
> com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
>   at 
> com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
>   at 
> com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
>   at 
> com.yammer.metrics.core.MetricsRegistry.newMeter(MetricsRegistry.java:240)
>   at com.yammer.metrics.Metrics.newMeter(Metrics.java:245)
>   at 
> kafka.metrics.KafkaMetricsGroup$class.newMeter(KafkaMetricsGroup.scala:46)
>   at kafka.server.FetcherStat.newMeter(AbstractFetcherThread.scala:180)
>   at kafka.server.FetcherStat.<init>(AbstractFetcherThread.scala:182)
>   at 
> kafka.server.FetcherStat$$anonfun$2.apply(AbstractFetcherThread.scala:186)
>   at 
> kafka.server.FetcherStat$$anonfun$2.apply(AbstractFetcherThread.scala:186)
>   at kafka.utils.Pool.getAndMaybePut(Pool.scala:60)
>   at 
> kafka.server.FetcherStat$.getFetcherStat(AbstractFetcherThread.scala:190)





[jira] [Updated] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Alex Dunayevsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Dunayevsky updated KAFKA-6221:
---
Description: 
This issue appeared to happen frequently on 0.10.2.0. 
On 0.10.2.1 and 1.0.0 it's way harder to reproduce. 
We'll focus on reproducing it on 0.10.2.1 and 1.0.0.

*TOPOLOGY:* 
  3 brokers, 1 zk.

*REPRODUCING STRATEGY:* 
Create a few dozen topics (say, 40) one by one, each with replication factor 
2. The number of partitions generally does not matter but, for easier 
reproduction, should not be too small (around 30 or so). 

*CREATE 40 TOPICS:*
{code:java} for i in {1..40}; do bin/kafka-topics.sh --create --topic 
"topic${i}_p28_r2" --partitions 28 --replication-factor 2 --zookeeper :2165; 
done {code}

*ERRORS*
{code:java}
*BROKER 1*
[2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,27] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,27] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,9] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,9] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,3] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,3] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,15] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,15] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,21] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,21] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)

*BROKER 2*
[2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,12] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,12] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,0] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,0] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,6] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,6] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 

[jira] [Updated] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Alex Dunayevsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Dunayevsky updated KAFKA-6221:
---
Affects Version/s: 1.0.0

> ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic 
> creation 
> ---
>
> Key: KAFKA-6221
> URL: https://issues.apache.org/jira/browse/KAFKA-6221
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1, 1.0.0
> Environment: RHEL 7
>Reporter: Alex Dunayevsky
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This issue appeared to happen frequently on 0.10.2.0. 
> On 0.10.2.1 it's way harder to reproduce. 
> We'll focus on reproducing it on 0.10.2.1.
> *TOPOLOGY:* 
>   3 brokers, 1 zk.
> *REPRODUCING STRATEGY:* 
> Create a few dozen topics (say, 40) one by one, each with replication factor 
> 2. The number of partitions generally does not matter but, for easier 
> reproduction, should not be too small (around 30 or so). 
> *CREATE 40 TOPICS:*
> {code:java} for i in {1..40}; do bin/kafka-topics.sh --create --topic 
> "topic${i}_p28_r2" --partitions 28 --replication-factor 2 --zookeeper :2165; 
> done {code}
> *ERRORS*
> {code:java}
> *BROKER 1*
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> *BROKER 2*
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,0] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,410] ERROR [ReplicaFetcher

[jira] [Created] (KAFKA-6226) Performance Consumer should print units in its output, like the producer

2017-11-17 Thread Antony Stubbs (JIRA)
Antony Stubbs created KAFKA-6226:


 Summary: Performance Consumer should print units in its output, 
like the producer
 Key: KAFKA-6226
 URL: https://issues.apache.org/jira/browse/KAFKA-6226
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Antony Stubbs


IMO this should be the default behaviour, which would match the performance 
producer, and it should be possible to disable it with a config.

https://github.com/apache/kafka/pull/4080





[jira] [Created] (KAFKA-6225) Add an option for the performance consumer to consume continuously

2017-11-17 Thread Antony Stubbs (JIRA)
Antony Stubbs created KAFKA-6225:


 Summary: Add an option for the performance consumer to consume continuously
 Key: KAFKA-6225
 URL: https://issues.apache.org/jira/browse/KAFKA-6225
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 1.1.0
Reporter: Antony Stubbs
Priority: Minor


IMO this should be the default behaviour, which would match the performance 
producer, and it should be possible to disable it with a config.
I can implement this either by adding an infinite loop or by letting the user 
configure the timeout setting, which is currently hard-coded to 1 second. 
Patches are available for either.
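As a toy sketch of the configurable-timeout option, a blocking queue can stand in for the consumer (the class and method names here are invented for illustration; the real tool would call `consumer.poll(timeout)`):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class IdleTimeoutConsumeDemo {

    // Keep "polling" until nothing arrives within timeoutMs. Each successful
    // poll starts a fresh timeout window, so the loop only stops once the
    // source has been quiet for a full window.
    static int consumeUntilIdle(BlockingQueue<String> records, long timeoutMs)
            throws InterruptedException {
        int consumed = 0;
        while (true) {
            // Stand-in for consumer.poll(timeout) in the real tool.
            String record = records.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (record == null) {
                return consumed; // idle for a full window: stop consuming
            }
            consumed++;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> q = new LinkedBlockingQueue<>(List.of("a", "b", "c"));
        System.out.println(consumeUntilIdle(q, 100L)); // prints 3
    }
}
```

With a large (or infinite) timeout this degenerates into the continuous-consume behaviour; with the current hard-coded 1 second it stops as soon as the topic goes quiet.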

https://github.com/apache/kafka/pull/4082





[jira] [Commented] (KAFKA-5563) Clarify handling of connector name in config

2017-11-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256744#comment-16256744
 ] 

Sönke Liebau commented on KAFKA-5563:
-

As KAFKA-4930 is approaching a stable state I looked at this again and found 
that a check like this is already implemented for [updating a 
connector config|https://github.com/apache/kafka/blob/e31c0c9bdbad432bc21b583bd3c084f05323f642/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/resources/ConnectorsResource.java#L135].

I've prepared a pull request that moves this check into a small helper function 
and calls it from the createConnector method as well. 

I've also tested merging this with the PR for 4930, which works nicely (for 
now); if any problems occur I'll rebase whichever PR gets merged later.
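The shape of such a check can be sketched roughly as follows. The helper name, exception type, and class are my assumptions for illustration, not the actual Connect code:

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectorNameCheckDemo {

    // Hypothetical helper: reject a request whose config "name" differs from
    // the connector name given at the root level, and mirror the root-level
    // name into the config when it is absent.
    static Map<String, String> checkConnectorName(String connName, Map<String, String> config) {
        String includedName = config.get("name");
        if (includedName == null) {
            config.put("name", connName);
        } else if (!includedName.equals(connName)) {
            throw new IllegalArgumentException(
                    "Connector name in config (" + includedName
                    + ") does not match connector name in request (" + connName + ")");
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("connector.class", "org.apache.kafka.connect.tools.MockSinkConnector");
        // Name absent from the config: filled in from the root level.
        System.out.println(checkConnectorName("test", config).get("name")); // prints "test"
    }
}
```

Calling the same helper from both the create and update paths keeps the two places the name is stored consistent.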

> Clarify handling of connector name in config 
> -
>
> Key: KAFKA-5563
> URL: https://issues.apache.org/jira/browse/KAFKA-5563
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Minor
>
> The connector name is currently being stored in two places, once at the root 
> level of the connector and once in the config:
> {code:java}
> {
>   "name": "test",
>   "config": {
>   "connector.class": 
> "org.apache.kafka.connect.tools.MockSinkConnector",
>   "tasks.max": "3",
>   "topics": "test-topic",
>   "name": "test"
>   },
>   "tasks": [
>   {
>   "connector": "test",
>   "task": 0
>   }
>   ]
> }
> {code}
> If no name is provided in the "config" element, then the name from the root 
> level is [copied there when the connector is being 
> created|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/resources/ConnectorsResource.java#L95].
>  If however a name is provided in the config then it is not touched, which 
> means it is possible to create a connector with a different name at the root 
> level and in the config like this:
> {code:java}
> {
>   "name": "test1",
>   "config": {
>   "connector.class": 
> "org.apache.kafka.connect.tools.MockSinkConnector",
>   "tasks.max": "3",
>   "topics": "test-topic",
>   "name": "differentname"
>   },
>   "tasks": [
>   {
>   "connector": "test1",
>   "task": 0
>   }
>   ]
> }
> {code}
> I am not aware of any issues that this currently causes, but it is at least 
> confusing, probably not intended behavior, and definitely bears potential 
> for bugs if different functions take the name from different places.
> Would it make sense to add a check to reject requests that provide different 
> names in the request and the config section?





[jira] [Commented] (KAFKA-5563) Clarify handling of connector name in config

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256739#comment-16256739
 ] 

ASF GitHub Bot commented on KAFKA-5563:
---

GitHub user soenkeliebau opened a pull request:

https://github.com/apache/kafka/pull/4230

KAFKA-5563: Moved comparison of connector name from url against name …

…from config to own function and added check to create connector call.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/soenkeliebau/kafka KAFKA-5563

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4230


commit 9722a70e235951f55b5e02dce14ab4c9bf3f95ae
Author: Soenke Liebau 
Date:   2017-11-17T09:24:54Z

KAFKA-5563: Moved comparison of connector name from url against name from 
config to own function and added check to create connector call.




> Clarify handling of connector name in config 
> -
>
> Key: KAFKA-5563
> URL: https://issues.apache.org/jira/browse/KAFKA-5563
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Minor
>
> The connector name is currently being stored in two places, once at the root 
> level of the connector and once in the config:
> {code:java}
> {
>   "name": "test",
>   "config": {
>   "connector.class": 
> "org.apache.kafka.connect.tools.MockSinkConnector",
>   "tasks.max": "3",
>   "topics": "test-topic",
>   "name": "test"
>   },
>   "tasks": [
>   {
>   "connector": "test",
>   "task": 0
>   }
>   ]
> }
> {code}
> If no name is provided in the "config" element, then the name from the root 
> level is [copied there when the connector is being 
> created|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/resources/ConnectorsResource.java#L95].
>  If however a name is provided in the config then it is not touched, which 
> means it is possible to create a connector with a different name at the root 
> level and in the config like this:
> {code:java}
> {
>   "name": "test1",
>   "config": {
>   "connector.class": 
> "org.apache.kafka.connect.tools.MockSinkConnector",
>   "tasks.max": "3",
>   "topics": "test-topic",
>   "name": "differentname"
>   },
>   "tasks": [
>   {
>   "connector": "test1",
>   "task": 0
>   }
>   ]
> }
> {code}
> I am not aware of any issues that this currently causes, but it is at least 
> confusing, probably not intended behavior, and definitely bears potential 
> for bugs if different functions take the name from different places.
> Would it make sense to add a check to reject requests that provide different 
> names in the request and the config section?





[jira] [Comment Edited] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Alex Dunayevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256703#comment-16256703
 ] 

Alex Dunayevsky edited comment on KAFKA-6221 at 11/17/17 9:20 AM:
--

*huxihx*, thank you for the explanation! Nope, no exceptions later, everything 
works fine, but it's quite confusing to observe this when deploying Kafka in 
production... I believe this should *not* be considered as a normal Kafka 
behavior and should be fixed. What do you think?


was (Author: alex.dunayevsky):
*huxihx*, thank you for the explanation! Nope, no exceptions later, everything 
works fine, but it's quite confusing to observe this when deploying Kafka in 
production... I believe this should *not * be considered as a normal Kafka 
behavior and should be fixed. What do you think?

> ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic 
> creation 
> ---
>
> Key: KAFKA-6221
> URL: https://issues.apache.org/jira/browse/KAFKA-6221
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: RHEL 7
>Reporter: Alex Dunayevsky
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This issue appeared to happen frequently on 0.10.2.0. 
> On 0.10.2.1 it's way harder to reproduce. 
> We'll focus on reproducing it on 0.10.2.1.
> *TOPOLOGY:* 
>   3 brokers, 1 zk.
> *REPRODUCING STRATEGY:* 
> Create a few dozen topics (say, 40) one by one, each with replication factor 
> 2. The number of partitions generally does not matter but, for easier 
> reproduction, should not be too small (around 30 or so). 
> *CREATE 40 TOPICS:*
> {code:java} for i in {1..40}; do bin/kafka-topics.sh --create --topic 
> "topic${i}_p28_r2" --partitions 28 --replication-factor 2 --zookeeper :2165; 
> done {code}
> *ERRORS*
> {code:java}
> *BROKER 1*
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> *BROKER 2*
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> 

[jira] [Comment Edited] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Alex Dunayevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256703#comment-16256703
 ] 

Alex Dunayevsky edited comment on KAFKA-6221 at 11/17/17 9:20 AM:
--

*huxihx*, thank you for the explanation! Nope, no exceptions later, everything 
works fine, but it's quite confusing to observe this when deploying Kafka in 
production... I believe this should *not* be considered normal Kafka 
behavior and should be fixed. What do you think?


was (Author: alex.dunayevsky):
*huxihx*, thank you for the explanation! Nope, no exceptions later, everything 
works fine, but it's quite confusing to observe this when deploying Kafka in 
production... I believe this should not be considered as a normal Kafka 
behavior and should be fixed. What do you think?

> ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic 
> creation 
> ---
>
> Key: KAFKA-6221
> URL: https://issues.apache.org/jira/browse/KAFKA-6221
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: RHEL 7
>Reporter: Alex Dunayevsky
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This issue appeared frequently on 0.10.2.0. 
> On 0.10.2.1 it is much harder to reproduce. 
> We'll focus on reproducing it on 0.10.2.1.
> *TOPOLOGY:* 
>   3 brokers, 1 zk.
> *REPRODUCING STRATEGY:* 
> Create a few dozen topics (say, 40) one by one, each with replication factor 
> 2. The number of partitions generally does not matter but, for easier 
> reproduction, it should not be too small (around 30 or so). 
> *CREATE 40 TOPICS:*
> {code:java} for i in {1..40}; do bin/kafka-topics.sh --create --topic 
> "topic${i}_p28_r2" --partitions 28 --replication-factor 2 --zookeeper :2165; 
> done {code}
> *ERRORS*
> {code:java}
> *BROKER 1*
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> *BROKER 2*
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> se

[jira] [Commented] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Alex Dunayevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256703#comment-16256703
 ] 

Alex Dunayevsky commented on KAFKA-6221:


*huxihx*, thank you for the explanation! Nope, no exceptions later, everything 
works fine, but it's quite confusing to observe this when deploying Kafka in 
production... I believe this should not be considered as a normal Kafka 
behavior and should be fixed. What do you think?


[jira] [Comment Edited] (KAFKA-6219) Inconsistent behavior for kafka-consumer-groups

2017-11-17 Thread huxihx (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254858#comment-16254858
 ] 

huxihx edited comment on KAFKA-6219 at 11/17/17 8:55 AM:
-

[~vahid] I had been thinking about that before opening this one. In KAFKA-5638 
you recommended we narrow down the minimum required permission for ListGroup. 
For this jira, however, I am wondering whether it's okay for `listGroups` to 
capture all exceptions and return an empty list. The ACL case here is just one 
example. Does it make sense?


was (Author: huxi_2b):
[~vahid] I've been thinking of that before opening this one. In KAFKA-5638 you 
recommend we narrow down the minimum required permission for ListGroup. For 
this jira, however, I am thinking whether it's okay for `listGroups` to capture 
all exceptions return the empty list. The ACL case here is just an example. 
Does it make sense?

> Inconsistent behavior for kafka-consumer-groups
> ---
>
> Key: KAFKA-6219
> URL: https://issues.apache.org/jira/browse/KAFKA-6219
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 1.0.0
>Reporter: huxihx
>Assignee: huxihx
>
> For example, when ACL is enabled, running kafka-consumer-groups.sh --describe 
> to describe a group complains:
> `Error: Executing consumer group command failed due to Not authorized to 
> access group: Group authorization failed.`
> However, running kafka-consumer-groups.sh --list returns nothing, leaving the 
> user unsure whether there are no groups at all or something went wrong.
> In `AdminClient.listAllGroups`, all possible exceptions are captured and an 
> empty List is returned.
> It's better to keep those two methods consistent. Does that make sense?
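The inconsistency is easy to see in miniature. Below is a hedged Java sketch of the two error-handling styles; `fetchGroups`, `describeStyle`, and `listStyle` are hypothetical stand-ins, not the real AdminClient API. It shows why swallowing every exception makes an authorization failure indistinguishable from an empty cluster:

```java
import java.util.Collections;
import java.util.List;

public class ListGroupsInconsistency {

    // Hypothetical stand-in for a broker call that fails with an
    // authorization error when ACLs deny the request.
    static List<String> fetchGroups() {
        throw new SecurityException("Group authorization failed");
    }

    // Mirrors the --describe path: the error propagates to the caller,
    // who can report "Not authorized to access group".
    static List<String> describeStyle() {
        return fetchGroups();
    }

    // Mirrors the --list path: every exception is swallowed and an empty
    // list is returned, indistinguishable from "no groups exist".
    static List<String> listStyle() {
        try {
            return fetchGroups();
        } catch (Exception e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        System.out.println("list: " + listStyle());       // prints "list: []"
        try {
            describeStyle();
        } catch (SecurityException e) {
            System.out.println("describe: " + e.getMessage());
        }
    }
}
```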



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-6224) Can not build Kafka 1.0.0 with gradle 3.2.1

2017-11-17 Thread huxihx (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256668#comment-16256668
 ] 

huxihx edited comment on KAFKA-6224 at 11/17/17 8:51 AM:
-

Seems the valid property name should be `scalaVersion` in build.gradle instead 
of `scala_version`.


was (Author: huxi_2b):
Seems the valid property name should be ``scalaVersion` in build.gradle instead 
of `scala_version`.

> Can not build Kafka 1.0.0 with gradle 3.2.1
> ---
>
> Key: KAFKA-6224
> URL: https://issues.apache.org/jira/browse/KAFKA-6224
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.0.0
> Environment: Ubuntu 17.10
>Reporter: Chao Ren
>Priority: Trivial
>
> Trying to play around with Kafka on a brand new machine with Ubuntu 17.10 
> installed.
> When building, I got the following error. I do have Gradle 3.2.1 installed, 
> so there's definitely no need to upgrade. Should I downgrade?
> ```
> chaoren@chaoren:~/code/kafka-1.0.0-src$ ./gradlew jar -Pscala_version=2.11.11
> To honour the JVM settings for this build a new JVM will be forked. Please 
> consider using the daemon: 
> https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
> Building project 'core' with Scala version 2.11.11
> FAILURE: Build failed with an exception.
> * Where:
> Build file '/home/chaoren/code/kafka-1.0.0-src/build.gradle' line: 963
> * What went wrong:
> A problem occurred evaluating root project 'kafka-1.0.0-src'.
> > Failed to apply plugin [class 
> > 'com.github.jengelman.gradle.plugins.shadow.ShadowBasePlugin']
>> This version of Shadow supports Gradle 3.0+ only. Please upgrade.
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or 
> --debug option to get more log output.
> BUILD FAILED
> Total time: 6.186 secs
> chaoren@chaoren:~/code/kafka-1.0.0-src$ gradle -version
> 
> Gradle 3.2.1
> 
> Build time:   2012-12-21 00:00:00 UTC
> Revision: none
> Groovy:   2.4.8
> Ant:  Apache Ant(TM) version 1.9.9 compiled on June 29 2017
> JVM:  1.8.0_151 (Oracle Corporation 25.151-b12)
> OS:   Linux 4.13.0-16-generic amd64
> ```





[jira] [Commented] (KAFKA-6224) Can not build Kafka 1.0.0 with gradle 3.2.1

2017-11-17 Thread huxihx (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256668#comment-16256668
 ] 

huxihx commented on KAFKA-6224:
---

Seems the valid property name should be `scalaVersion` in build.gradle instead 
of `scala_version`.






[jira] [Commented] (KAFKA-6224) Can not build Kafka 1.0.0 with gradle 3.2.1

2017-11-17 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256661#comment-16256661
 ] 

Ismael Juma commented on KAFKA-6224:


The message implies that gradle 2.13 is being used:

{code}
chaoren@chaoren:~/code/kafka-1.0.0-src$ ./gradlew jar -Pscala_version=2.11.11
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
{code}

Note that the docs link is for 2.13. What was the output when you ran `gradle` 
in order to generate `gradlew`?






[jira] [Updated] (KAFKA-5588) Remove deprecated new-consumer option for tools

2017-11-17 Thread Paolo Patierno (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno updated KAFKA-5588:
--
Fix Version/s: 2.0.0

> Remove deprecated new-consumer option for tools
> ---
>
> Key: KAFKA-5588
> URL: https://issues.apache.org/jira/browse/KAFKA-5588
> Project: Kafka
>  Issue Type: Bug
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Minor
> Fix For: 2.0.0
>
>
> Hi,
> with the current version of the ConsoleConsumer, ConsumerPerformance and 
> ConsumerGroupCommand command line tools, it is no longer necessary to specify 
> the --new-consumer option in order to use the new consumer. The choice 
> between the old consumer and the new one is made simply by specifying 
> --zookeeper for the former and --bootstrap-server for the latter.
> The issues [KAFKA-5599|https://issues.apache.org/jira/browse/KAFKA-5599] and 
> [KAFKA-5619|https://issues.apache.org/jira/browse/KAFKA-5619], fixed and 
> included in the 1.0.0 release, deprecated the usage of the --new-consumer 
> flag.
> More details in the related 
> [KIP-176|https://cwiki.apache.org/confluence/display/KAFKA/KIP-176%3A+Remove+deprecated+new-consumer+option+for+tools].
> Thanks,
> Paolo.





[jira] [Commented] (KAFKA-3984) Broker doesn't retry reconnecting to an expired Zookeeper connection

2017-11-17 Thread Kuldeep Dhole (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256617#comment-16256617
 ] 

Kuldeep Dhole commented on KAFKA-3984:
--

any ETA for this fix?

> Broker doesn't retry reconnecting to an expired Zookeeper connection
> 
>
> Key: KAFKA-3984
> URL: https://issues.apache.org/jira/browse/KAFKA-3984
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.1.1
>Reporter: Braedon Vickers
>
> We've been having issues with the network connectivity of our Kafka cluster, 
> and this seems to be triggering an issue where the brokers stop trying to 
> reconnect to Zookeeper, leaving us with a broken cluster even when the 
> network has recovered.
> When network issues begin we see {{java.net.NoRouteToHostException}} 
> exceptions from {{org.apache.zookeeper.ClientCnxn}} as it attempts to 
> re-establish the connection. If the network issue resolves itself while we 
> are only getting these errors the broker seems to reconnect fine.
> However, a lot of the time we end up with a message like this:
> {code}[2016-07-22 00:21:44,181] FATAL Could not establish session with 
> zookeeper (kafka.server.KafkaHealthcheck)
> org.I0Itec.zkclient.exception.ZkException: Unable to connect to <zookeeper hosts>
>   at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:71)
>   at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:1279)
> ...
> Caused by: java.net.UnknownHostException: 
>   at java.net.InetAddress.getAllByName(InetAddress.java:1126)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1192)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
>   at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
> ...
> {code}
> (apologies for the partial stack traces - I'm having to try and reconstruct 
> them from a less than ideal centralised logging setup.)
> If this happens, the broker stops trying to reconnect to Zookeeper, and we 
> have to restart it.
> It looks like while the {{org.apache.zookeeper.Zookeeper}} client's state 
> isn't {{Expired}} it will keep retrying the connection, and will recover OK 
> when the network is back. However, once it changes to {{Expired}} (not 
> entirely sure how that happens - based on the session timeout perhaps?) 
> zkclient closes the existing client and attempts to create a new one. If the 
> network is still down, the client constructor throws a 
> {{java.net.UnknownHostException}}, zkclient calls 
> {{handleSessionEstablishmentError()}} on {{KafkaHealthcheck}}, 
> {{KafkaHealthcheck.handleSessionEstablishmentError()}} logs a "Fatal" error 
> and does nothing else.
> It seems like some form of retry needs to happen here, or the broker is stuck 
> with no Zookeeper connection indefinitely. 
> {{KafkaHealthcheck.handleSessionEstablishmentError()}} used to 
> kill the JVM, but that was removed in 
> https://issues.apache.org/jira/browse/KAFKA-2405. Killing the JVM would be 
> better than doing nothing, as then your init system could restart it, 
> allowing it to recover once the network was back.
> Our cluster is running 0.9.0.1, so I'm not sure if it affects 0.10.0.0 as 
> well. However, it seems likely, as there don't appear to be any code changes 
> in kafka or zkclient that would affect this behaviour.
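A retry, as suggested, could be as simple as looping with a backoff instead of logging a fatal error and giving up. The sketch below is illustrative only; `connectWithRetry` and the simulated connect callback are hypothetical names, not the actual KafkaHealthcheck or zkclient API:

```java
import java.util.concurrent.Callable;

public class ZkReconnectSketch {

    // Retries a connect attempt instead of giving up after the first
    // failure (e.g. UnknownHostException while the network is down),
    // sleeping between attempts. Rethrows the last failure if all
    // attempts are exhausted.
    static <T> T connectWithRetry(Callable<T> connect, int maxAttempts,
                                  long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(backoffMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated network: the first two attempts fail, the third succeeds.
        int[] calls = {0};
        String session = connectWithRetry(() -> {
            if (++calls[0] < 3) throw new java.net.UnknownHostException("zk");
            return "session-established";
        }, 5, 10);
        System.out.println(session + " after " + calls[0] + " attempts");
        // prints "session-established after 3 attempts"
    }
}
```

Killing the JVM (the pre-KAFKA-2405 behaviour) achieves a similar effect by delegating the retry to the init system, but an in-process retry loop avoids dropping the broker entirely.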





[jira] [Commented] (KAFKA-4871) Kafka doesn't respect TTL on Zookeeper hostname - crash if zookeeper IP changes

2017-11-17 Thread Kuldeep Dhole (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256616#comment-16256616
 ] 

Kuldeep Dhole commented on KAFKA-4871:
--

any ETA for this fix?

> Kafka doesn't respect TTL on Zookeeper hostname - crash if zookeeper IP 
> changes
> ---
>
> Key: KAFKA-4871
> URL: https://issues.apache.org/jira/browse/KAFKA-4871
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
>Reporter: Stephane Maarek
>
> I had a Zookeeper cluster whose machines automatically obtain hostnames so 
> that the names remain constant over time. I deleted my 3 zookeeper machines; 
> new machines came back online with the same hostnames and updated their 
> CNAMEs.
> Kafka then failed and couldn't reconnect to Zookeeper because it didn't try 
> to resolve the Zookeeper IP again. See log below:
> [2017-03-09 05:49:57,302] INFO Client will use GSSAPI as SASL mechanism. 
> (org.apache.zookeeper.client.ZooKeeperSaslClient)
> [2017-03-09 05:49:57,302] INFO Opening socket connection to server 
> zookeeper-3.example.com/10.12.79.43:2181. Will attempt to SASL-authenticate 
> using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
> [ec2-user]$ dig +short zookeeper-3.example.com
> 10.12.79.36
> As you can see, even though the machine is capable of resolving the new 
> hostname, Kafka somehow didn't respect the TTL (set to 60 seconds) and 
> didn't get the new IP. I feel that on a failed Zookeeper connection, Kafka 
> should at least try to resolve the Zookeeper IP again. That would allow 
> Kafka to keep up with Zookeeper changes over time.
> What do you think? Is that expected behaviour or a bug?
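One possible shape of such a fix (a hedged sketch, not Kafka's actual code; `resolveFresh` is a hypothetical helper) is to look the hostname up again on every reconnect attempt rather than reusing the address obtained at startup. Note that the JVM's own resolver cache (the `networkaddress.cache.ttl` security property) can also pin stale addresses and may need tuning:

```java
import java.net.InetAddress;

public class ResolveOnReconnect {

    // Consults the resolver on each call instead of caching the address
    // at startup, so a changed CNAME/A record (within its DNS TTL) is
    // picked up on the next reconnect attempt. Still subject to the
    // JVM-level networkaddress.cache.ttl setting.
    static InetAddress resolveFresh(String host) throws Exception {
        return InetAddress.getAllByName(host)[0];
    }

    public static void main(String[] args) throws Exception {
        InetAddress addr = resolveFresh("localhost");
        System.out.println("resolved localhost -> " + addr.getHostAddress());
    }
}
```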





[jira] [Updated] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-17 Thread Alex Dunayevsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Dunayevsky updated KAFKA-6221:
---
Summary: ReplicaFetcherThread throws UnknownTopicOrPartitionException on 
topic creation   (was: ReplicaFetcherThread throws 
UnknownTopicOrPartitionExeption on topic creation )
