Build failed in Jenkins: kafka-trunk-jdk8 #526

2016-04-13 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3470: treat commits as member heartbeats

--
[...truncated 3190 lines...]

kafka.KafkaTest > testGetKafkaConfigFromArgs PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsAtTheEnd PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsOnly PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsAtTheBegging PASSED

kafka.metrics.KafkaTimerTest > testKafkaTimer PASSED

kafka.utils.UtilsTest > testAbs PASSED

kafka.utils.UtilsTest > testReplaceSuffix PASSED

kafka.utils.UtilsTest > testCircularIterator PASSED

kafka.utils.UtilsTest > testReadBytes PASSED

kafka.utils.UtilsTest > testCsvList PASSED

kafka.utils.UtilsTest > testReadInt PASSED

kafka.utils.UtilsTest > testCsvMap PASSED

kafka.utils.UtilsTest > testInLock PASSED

kafka.utils.UtilsTest > testSwallow PASSED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask PASSED

kafka.utils.SchedulerTest > testNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testRestart PASSED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler PASSED

kafka.utils.SchedulerTest > testPeriodicTask PASSED

kafka.utils.ByteBoundedBlockingQueueTest > testByteBoundedBlockingQueue PASSED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask PASSED

kafka.utils.timer.TimerTest > testTaskExpiration PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg PASSED

kafka.utils.CommandLineUtilsTest > testParseSingleArg PASSED

kafka.utils.CommandLineUtilsTest > testParseArgs PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid PASSED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr PASSED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition PASSED

kafka.utils.JsonTest > testJsonEncoding PASSED

kafka.message.MessageCompressionTest > testCompressSize PASSED

kafka.message.MessageCompressionTest > testSimpleCompressDecompress PASSED

kafka.message.MessageWriterTest > testWithNoCompressionAttribute PASSED

kafka.message.MessageWriterTest > testWithCompressionAttribute PASSED

kafka.message.MessageWriterTest > testBufferingOutputStream PASSED

kafka.message.MessageWriterTest > testWithKey PASSED

kafka.message.MessageTest > testChecksum PASSED

kafka.message.MessageTest > testInvalidTimestamp PASSED

kafka.message.MessageTest > testIsHashable PASSED

kafka.message.MessageTest > testInvalidTimestampAndMagicValueCombination PASSED

kafka.message.MessageTest > testExceptionMapping PASSED

kafka.message.MessageTest > testFieldValues PASSED

kafka.message.MessageTest > testInvalidMagicByte PASSED

kafka.message.MessageTest > testEquality PASSED

kafka.message.MessageTest > testMessageFormatConversion PASSED

kafka.message.ByteBufferMessageSetTest > testMessageWithProvidedOffsetSeq PASSED

kafka.message.ByteBufferMessageSetTest > testValidBytes PASSED

kafka.message.ByteBufferMessageSetTest > testValidBytesWithCompression PASSED

kafka.message.ByteBufferMessageSetTest > testOffsetAssignmentAfterMessageFormatConversion PASSED

kafka.message.ByteBufferMessageSetTest > testIteratorIsConsistent PASSED

kafka.message.ByteBufferMessageSetTest > testAbsoluteOffsetAssignment PASSED

kafka.message.ByteBufferMessageSetTest > testCreateTime PASSED

kafka.message.ByteBufferMessageSetTest > testInvalidCreateTime PASSED

kafka.message.ByteBufferMessageSetTest > testWrittenEqualsRead PASSED

kafka.message.ByteBufferMessageSetTest > testLogAppendTime PASSED

kafka.message.ByteBufferMessageSetTest > testWriteTo PASSED

kafka.message.ByteBufferMessageSetTest > testEquals PASSED

kafka.message.ByteBufferMessageSetTest > testSizeInBytes PASSED

kafka.message.ByteBufferMessageSetTest > testIterator PASSED

kafka.message.ByteBufferMessageSetTest > testRelativeOffsetAssignment PASSED

kafka.tools.ConsoleConsumerTest > shouldLimitReadsToMaxMessageLimit PASSED

kafka.tools.ConsoleConsumerTest > shouldParseValidNewConsumerValidConfig PASSED

kafka.tools.ConsoleConsumerTest > shouldParseConfigsFromFile PASSED

kafka.tools.ConsoleConsumerTest > shouldParseValidOldConsumerValidConfig PASSED

kafka.tools.ConsoleProducerTest > testParseKeyProp PASSED

kafka.tools.ConsoleProducerTest > testValidConfigsOldProducer PASSED

kafka.tools.ConsoleProducerTest > testInvalidConfigs PASSED

kafka.tools.ConsoleProducerTest > testValidConfigsNewProducer PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > simpleRequest PASSED

kafka.network.SocketServerTest > testSessionPrincipal PASSED

kafka.network.SocketServerTest 

[jira] [Commented] (KAFKA-3470) Consumer group coordinator should take commit requests as effective as heartbeats

2016-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240353#comment-15240353
 ] 

ASF GitHub Bot commented on KAFKA-3470:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1206


> Consumer group coordinator should take commit requests as effective as 
> heartbeats
> -
>
> Key: KAFKA-3470
> URL: https://issues.apache.org/jira/browse/KAFKA-3470
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Zaiming Shi
>Assignee: Jason Gustafson
> Fix For: 0.10.0.0
>
>
> Group coordinator does not reset the heartbeat timer when a commit request is 
> received.
> This may lead to unnecessary session timeouts when a consumer sends
> sync commit requests so frequently that heartbeat requests are delayed.
> Presumably (as I do not know the Kafka code well):
> Commit requests (v1 and v2) carry all the data fields of a heartbeat request.
> If they are treated as effective heartbeats, we should have
> better group stability.
> [For reference]
> previous discussions in: us...@kafka.apache.org
> mail title:   consumer group, why commit requests are not considered as 
> effective heartbeats?
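
For illustration, a minimal sketch of the proposed behavior (hypothetical names, not the actual coordinator code): any offset commit observed for a member resets that member's session deadline, exactly as a heartbeat would.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the coordinator's per-member session tracking.
class SessionTracker {
    private final Map<String, Long> deadlines = new ConcurrentHashMap<>();
    private final long sessionTimeoutMs;

    SessionTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Called on a HeartbeatRequest.
    void onHeartbeat(String memberId) {
        touch(memberId);
    }

    // The proposed change: an OffsetCommitRequest (v1/v2) carries the same
    // group/member fields, so treat it as an implicit heartbeat as well.
    void onOffsetCommit(String memberId) {
        touch(memberId);
    }

    private void touch(String memberId) {
        deadlines.put(memberId, System.currentTimeMillis() + sessionTimeoutMs);
    }

    boolean isExpired(String memberId, long nowMs) {
        Long deadline = deadlines.get(memberId);
        return deadline == null || nowMs > deadline;
    }
}
{code}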



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3470: treat commits as member heartbeats

2016-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1206


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3470) Consumer group coordinator should take commit requests as effective as heartbeats

2016-04-13 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-3470.
--
   Resolution: Fixed
Fix Version/s: 0.10.0.0

Issue resolved by pull request 1206
[https://github.com/apache/kafka/pull/1206]

> Consumer group coordinator should take commit requests as effective as 
> heartbeats
> -
>
> Key: KAFKA-3470
> URL: https://issues.apache.org/jira/browse/KAFKA-3470
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Zaiming Shi
>Assignee: Jason Gustafson
> Fix For: 0.10.0.0
>
>
> Group coordinator does not reset the heartbeat timer when a commit request is 
> received.
> This may lead to unnecessary session timeouts when a consumer sends
> sync commit requests so frequently that heartbeat requests are delayed.
> Presumably (as I do not know the Kafka code well):
> Commit requests (v1 and v2) carry all the data fields of a heartbeat request.
> If they are treated as effective heartbeats, we should have
> better group stability.
> [For reference]
> previous discussions in: us...@kafka.apache.org
> mail title:   consumer group, why commit requests are not considered as 
> effective heartbeats?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3556) Improve group coordinator metrics

2016-04-13 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-3556:
--

 Summary: Improve group coordinator metrics
 Key: KAFKA-3556
 URL: https://issues.apache.org/jira/browse/KAFKA-3556
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


We currently don't have many metrics to track the behavior of the group 
coordinator (especially with respect to the new consumer). On a quick pass, I 
only saw a couple of gauges in GroupMetadataManager for the number of groups and 
the number of cached offsets. Here are some interesting metrics that may be 
worth tracking:

1. Session timeout rate
2. Rebalance latency/rate
3. Commit latency/rate
4. Average group size
5. Size of metadata cache

Some of these may also be interesting to track per group.
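
As a rough sketch of the kind of sensors this could mean, using the org.apache.kafka.common.metrics API purely for illustration (the broker registers its metrics differently; metric names here are made up):

{code}
import java.util.Collections;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Rate;

public class CoordinatorMetricsSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();

        Sensor rebalance = metrics.sensor("rebalance");
        rebalance.add(new MetricName("rebalance-rate", "group-coordinator",
                "rebalances per second", Collections.<String, String>emptyMap()), new Rate());
        rebalance.add(new MetricName("rebalance-latency-avg", "group-coordinator",
                "average rebalance time in ms", Collections.<String, String>emptyMap()), new Avg());

        Sensor commit = metrics.sensor("commit");
        commit.add(new MetricName("commit-rate", "group-coordinator",
                "offset commits per second", Collections.<String, String>emptyMap()), new Rate());

        // Record a 150 ms rebalance and one commit.
        rebalance.record(150.0);
        commit.record(1.0);
        metrics.close();
    }
}
{code}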



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3511) Provide built-in aggregators sum() and avg() in Kafka Streams DSL

2016-04-13 Thread Ishita Mandhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishita Mandhan reassigned KAFKA-3511:
-

Assignee: Ishita Mandhan

> Provide built-in aggregators sum() and avg() in Kafka Streams DSL
> -
>
> Key: KAFKA-3511
> URL: https://issues.apache.org/jira/browse/KAFKA-3511
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Ishita Mandhan
>  Labels: api, newbie
> Fix For: 0.10.1.0
>
>
> Currently we only have one built-in aggregate function count() in the Kafka 
> Streams DSL, but we want to add more aggregation functions like sum() and 
> avg().
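
A minimal sketch of the aggregation semantics involved, independent of the actual DSL signatures: sum() is an aggregate with initializer 0 and an adder, while avg() needs extra state (a running count) so the result can be derived as sum / count.

{code}
import java.util.HashMap;
import java.util.Map;

public class AggregatorSketch {
    static final class AvgAccumulator {
        long sum;
        long count;
        double value() { return count == 0 ? 0.0 : (double) sum / count; }
    }

    public static void main(String[] args) {
        Map<String, AvgAccumulator> byKey = new HashMap<>();
        long[][] records = {{1, 10}, {1, 20}, {2, 5}};   // (key, value) pairs
        for (long[] r : records) {
            AvgAccumulator acc = byKey.computeIfAbsent(String.valueOf(r[0]), k -> new AvgAccumulator());
            acc.sum += r[1];     // the sum() part
            acc.count++;         // the extra state needed for avg()
        }
        byKey.forEach((k, acc) -> System.out.printf("key=%s sum=%d avg=%.1f%n", k, acc.sum, acc.value()));
    }
}
{code}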



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3555) Unexpected close of KStreams transformer

2016-04-13 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-3555:
---

 Summary: Unexpected close of KStreams transformer 
 Key: KAFKA-3555
 URL: https://issues.apache.org/jira/browse/KAFKA-3555
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.0.0
Reporter: Anna Povzner
Assignee: Guozhang Wang


I consistently get this behavior when running my system test, which runs a 
1-node kafka cluster.

We implemented a TransformerSupplier, and the topology is 
transform().filter().map().filter().aggregate(). I have a log message in my 
transformer's close() method. On every run of the test, I see that after 
running for 10-20 seconds, the transformer's close() is called. Then, about 20 
seconds later, I see that the transformer is re-initialized and continues 
running.

I don't see any exceptions happening in KStreams before close() happens.
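
For reference, a sketch of instrumenting the transformer lifecycle as described above, against the 0.10-era Transformer interface (the pass-through logic is illustrative):

{code}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;

public class LoggingTransformer implements Transformer<String, String, KeyValue<String, String>> {
    @Override
    public void init(ProcessorContext context) {
        System.out.println("init() at " + System.currentTimeMillis());
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        return KeyValue.pair(key, value);  // pass-through
    }

    @Override
    public KeyValue<String, String> punctuate(long timestamp) {
        return null;  // no scheduled output
    }

    @Override
    public void close() {
        // Log with a timestamp so close/re-init intervals can be correlated
        // with broker-side events (e.g. a consumer rebalance).
        System.out.println("close() at " + System.currentTimeMillis());
    }
}
{code}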



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk8 #525

2016-04-13 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3552) New Consumer: java.lang.OutOfMemoryError: Direct buffer memory

2016-04-13 Thread Kanak Biscuitwala (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240051#comment-15240051
 ] 

Kanak Biscuitwala commented on KAFKA-3552:
--

Looks like there is a full GC immediately before the OOM (and that's confusing 
because shouldn't the completion of a full GC be enough to free up memory?):

{code}
2016-04-13T18:19:47.181+0000: 5318.045: [GC [1 CMS-initial-mark: 
702209K(900444K)] 809543K(1844188K), 0.0330340 secs] [Times: user=0.03 
sys=0.00, real=0.03 secs]
2016-04-13T18:19:47.214+0000: 5318.078: [CMS-concurrent-mark-start]
2016-04-13T18:19:47.563+0000: 5318.427: [CMS-concurrent-mark: 0.349/0.349 secs] 
[Times: user=0.58 sys=0.16, real=0.35 secs]
2016-04-13T18:19:47.564+0000: 5318.427: [CMS-concurrent-preclean-start]
2016-04-13T18:19:47.573+0000: 5318.436: [CMS-concurrent-preclean: 0.008/0.009 
secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2016-04-13T18:19:47.573+0000: 5318.437: 
[CMS-concurrent-abortable-preclean-start]
2016-04-13T18:19:49.670+0000: 5320.534: [Full GC2016-04-13T18:19:49.671+0000: 
5320.534: [CMS2016-04-13T18:19:49.677+0000: 5320.541: 
[CMS-concurrent-abortable-preclean: 2.079/2.104 secs] [Times: user=3.46 
sys=0.22, real=2.11 secs]
 (concurrent mode interrupted): 702209K->385639K(900444K), 0.8443580 secs] 
1507659K->385639K(1844188K), [CMS Perm : 56002K->56002K(83968K)], 0.8448730 
secs] [Times: user=0.86 sys=0.00, real=0.84 secs]
2016-04-13T18:19:52.619+0000: 5323.483: [Full GC2016-04-13T18:19:52.620+0000: 
5323.483: [CMS: 385639K->384334K(900444K), 0.6693420 secs] 
714628K->384334K(1844188K), [CMS Perm : 56002K->56002K(83968K)], 0.6698370 
secs] [Times: user=0.68 sys=0.00, real=0.67 secs]
2016-04-13T18:19:55.395+0000: 5326.259: [Full GC2016-04-13T18:19:55.395+0000: 
5326.259: [CMS: 384334K->383389K(900444K), 0.6660360 secs] 
695662K->383389K(1844188K), [CMS Perm : 56002K->56002K(83968K)], 0.6665300 
secs] [Times: user=0.68 sys=0.00, real=0.67 secs]
2016-04-13T18:19:58.166+0000: 5329.030: [Full GC2016-04-13T18:19:58.166+0000: 
5329.030: [CMS: 383389K->382675K(900444K), 0.6607420 secs] 
624249K->382675K(1844188K), [CMS Perm : 56002K->56002K(83968K)], 0.6612310 
secs] [Times: user=0.67 sys=0.00, real=0.66 secs]
2016-04-13T18:20:01.171+0000: 5332.035: [GC2016-04-13T18:20:01.171+0000: 
5332.035: [ParNew: 838912K->90048K(943744K), 0.0167690 secs] 
1221587K->472723K(1844188K), 0.0172720 secs] [Times: user=0.06 sys=0.00, 
real=0.01 secs]
2016-04-13T18:20:07.407+0000: 5338.270: [GC2016-04-13T18:20:07.407+0000: 
5338.271: [ParNew: 928960K->25607K(943744K), 0.0232340 secs] 
1311635K->408283K(1844188K), 0.0237360 secs] [Times: user=0.07 sys=0.00, 
real=0.03 secs]
{code}

And the OOM occurs at 2016-04-13/18:19:58.928 UTC

I do see that my host is running somewhat low on physical memory (but has 
plenty of swap) -- the heap dump is too large to attach, but I will attach a 
couple screenshots of byte[] allocations taking much more space than I would 
expect. Is it possible that there is a memory leak here?
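
One way to correlate this with direct-buffer usage is the JDK's standard BufferPoolMXBean (available since Java 7); a minimal sketch:

{code}
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectBufferMonitor {
    public static void main(String[] args) {
        // The JDK exposes per-pool stats; the "direct" pool backs allocateDirect().
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                System.out.printf("direct buffers: count=%d used=%d bytes capacity=%d bytes%n",
                        pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }
}
{code}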

> New Consumer: java.lang.OutOfMemoryError: Direct buffer memory
> --
>
> Key: KAFKA-3552
> URL: https://issues.apache.org/jira/browse/KAFKA-3552
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Kanak Biscuitwala
>Assignee: Liquan Pei
> Attachments: Screen Shot 2016-04-13 at 11.56.05 AM.png, Screen Shot 
> 2016-04-13 at 2.17.48 PM.png
>
>
> I'm running Kafka's new consumer with message handlers that can sometimes 
> take a lot of time to return, and combining that with manual offset 
> management (to get at-least-once semantics). Since poll() is the only way to 
> heartbeat with the consumer, I have a thread that runs every 500 milliseconds 
> that does the following:
> 1) Pause all partitions
> 2) Call poll(0)
> 3) Resume all partitions
> For the record, all accesses to KafkaConsumer are protected by synchronized 
> blocks. This generally works, but I'm occasionally seeing messages like this:
> {code}
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
> at 
> 

[jira] [Updated] (KAFKA-3552) New Consumer: java.lang.OutOfMemoryError: Direct buffer memory

2016-04-13 Thread Kanak Biscuitwala (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanak Biscuitwala updated KAFKA-3552:
-
Attachment: Screen Shot 2016-04-13 at 11.56.05 AM.png
Screen Shot 2016-04-13 at 2.17.48 PM.png

> New Consumer: java.lang.OutOfMemoryError: Direct buffer memory
> --
>
> Key: KAFKA-3552
> URL: https://issues.apache.org/jira/browse/KAFKA-3552
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Kanak Biscuitwala
>Assignee: Liquan Pei
> Attachments: Screen Shot 2016-04-13 at 11.56.05 AM.png, Screen Shot 
> 2016-04-13 at 2.17.48 PM.png
>
>
> I'm running Kafka's new consumer with message handlers that can sometimes 
> take a lot of time to return, and combining that with manual offset 
> management (to get at-least-once semantics). Since poll() is the only way to 
> heartbeat with the consumer, I have a thread that runs every 500 milliseconds 
> that does the following:
> 1) Pause all partitions
> 2) Call poll(0)
> 3) Resume all partitions
> For the record, all accesses to KafkaConsumer are protected by synchronized 
> blocks. This generally works, but I'm occasionally seeing messages like this:
> {code}
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> {code}
> In addition, when I'm reporting offsets, I'm seeing:
> {code}
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
> {code}
> Given that I'm 
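
For reference, a minimal sketch of the pause/poll(0)/resume keep-alive pattern described in the report (assuming the 0.10-style Collection-based pause/resume signatures; 0.9 used varargs):

{code}
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

class KeepAlive {
    static void heartbeat(KafkaConsumer<byte[], byte[]> consumer) {
        synchronized (consumer) {              // all access guarded, as in the report
            Set<TopicPartition> assigned = consumer.assignment();
            consumer.pause(assigned);          // 1) pause all partitions
            consumer.poll(0);                  // 2) poll(0): heartbeats, returns no records
            consumer.resume(assigned);         // 3) resume all partitions
        }
    }
}
{code}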

[jira] [Commented] (KAFKA-3490) Multiple version support for ducktape performance tests

2016-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239991#comment-15239991
 ] 

ASF GitHub Bot commented on KAFKA-3490:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1173


> Multiple version support for ducktape performance tests
> ---
>
> Key: KAFKA-3490
> URL: https://issues.apache.org/jira/browse/KAFKA-3490
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.10.1.0
>
>
> To verify the performance impact of changes, it is very handy to be able to 
> run ducktape performance tests across multiple Kafka versions. Luckily 
> [~geoffra] has done most of the work for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3490; Multiple version support for duckt...

2016-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1173


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3490) Multiple version support for ducktape performance tests

2016-04-13 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3490:
-
   Resolution: Fixed
Fix Version/s: (was: 0.10.0.0)
   0.10.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1173
[https://github.com/apache/kafka/pull/1173]

> Multiple version support for ducktape performance tests
> ---
>
> Key: KAFKA-3490
> URL: https://issues.apache.org/jira/browse/KAFKA-3490
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.10.1.0
>
>
> To verify the performance impact of changes, it is very handy to be able to 
> run ducktape performance tests across multiple Kafka versions. Luckily 
> [~geoffra] has done most of the work for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-13 Thread Alexander Binzberger (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239897#comment-15239897
 ] 

Alexander Binzberger commented on KAFKA-3042:
-

In my case it was high load on the network.
I had this on the test machines/test cluster on openstack.
The physical network was at max load (peaks, but sometimes for longer periods) 
when I saw this.
I know this info is not very precise but it might still help.

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who the leader is
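
A generic sketch of the behavior the ticket asks for, with purely illustrative names (not Kafka internals): cap the retries on a zkVersion conflict and re-resolve leadership instead of looping.

{code}
class IsrUpdateSketch {
    private static final int MAX_ATTEMPTS = 5;

    interface ZkClient {
        boolean conditionalUpdateIsr(int cachedZkVersion); // false on a version conflict
        void refreshLeaderAndIsrFromZk();                   // re-read the source of truth
    }

    static boolean updateIsr(ZkClient zk, int cachedZkVersion) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (zk.conditionalUpdateIsr(cachedZkVersion))
                return true;
            // A conflict means the cached zkVersion is stale; retrying with the
            // same version cannot succeed indefinitely.
        }
        // After repeated conflicts the broker's view is stale (it may no longer
        // be the leader), so re-resolve leadership instead of looping forever.
        zk.refreshLeaderAndIsrFromZk();
        return false;
    }
}
{code}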



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Problem with kafka doFlush method

2016-04-13 Thread Liquan Pei
Hi

Are you using the JDBC connector? Can you share with me the command and
configuration that you were running?

Thanks,
Liquan

On Wed, Mar 23, 2016 at 8:36 AM, Aleksandar Pejakovic  wrote:

> Hi,
>
>
> I have created Confluent Connect (http://docs.confluent.io/2.0.1/connect/?)
> sink and source tasks. While working in standalone mode there are no
> errors or exceptions.
>
>
> But when I start tasks in distributed mode I get a NullPointerException
> while OffsetStorageWriter is executing its doFlush() method.
>
> Full stack trace:
>
>
> ERROR Unhandled exception when committing
> WorkerSourceTask{id=distributed-s3-source-0}:
> (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:118)
> java.lang.NullPointerException
> at
> org.apache.kafka.connect.storage.KafkaOffsetBackingStore.set(KafkaOffsetBackingStore.java:122)
> at
> org.apache.kafka.connect.storage.OffsetStorageWriter.doFlush(OffsetStorageWriter.java:161)
> at
> org.apache.kafka.connect.runtime.WorkerSourceTask.commitOffsets(WorkerSourceTask.java:267)
> at
> org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter.commit(SourceTaskOffsetCommitter.java:110)
> at
> org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter$1.run(SourceTaskOffsetCommitter.java:76)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> After a closer look I have determined that in KafkaOffsetBackingStore, in the
> set() method, the parameter [final Map values] is null,
> therefore the next line:
>
>  - SetCallbackFuture producerCallback = new
> SetCallbackFuture(values.size(), callback);  throws an exception.
>
> After that exception the sink and source tasks continue working without any
> problems.
> When kafka tries to do another flush, it throws the following exception:
>
> ERROR Invalid call to OffsetStorageWriter flush() while already flushing,
> the framework should not allow this
> (org.apache.kafka.connect.storage.OffsetStorageWriter:108)
> [2016-03-23 14:57:33,220] ERROR Unhandled exception when committing
> WorkerSourceTask{id=distributed-s3-source-0}:
> (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:118)
> org.apache.kafka.connect.errors.ConnectException: OffsetStorageWriter is
> already flushing
> at
> org.apache.kafka.connect.storage.OffsetStorageWriter.beginFlush(OffsetStorageWriter.java:110)
> at
> org.apache.kafka.connect.runtime.WorkerSourceTask.commitOffsets(WorkerSourceTask.java:227)
> at
> org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter.commit(SourceTaskOffsetCommitter.java:110)
> at
> org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter$1.run(SourceTaskOffsetCommitter.java:76)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Does anyone know how to fix this exception?
>
>
> I am using confluent-2.0.1 with kafka-0.9.0.1-cp1.
>
>
>
>
>


-- 
Liquan Pei
Software Engineer, Confluent Inc
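
A sketch of the defensive check implied by the stack trace above, with illustrative names (not the actual Connect source): guard against a null or empty offset map before dereferencing it.

{code}
import java.util.Map;

class OffsetStoreSketch {
    // set() dereferences the values map, so a null map (nothing flushed yet)
    // causes the NullPointerException reported above.
    void set(Map<byte[], byte[]> values, Runnable callback) {
        if (values == null || values.isEmpty()) {
            // Nothing to flush; complete the callback instead of throwing.
            if (callback != null)
                callback.run();
            return;
        }
        System.out.println("flushing " + values.size() + " offsets");
    }
}
{code}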


[jira] [Commented] (KAFKA-3492) support quota based on authenticated user name

2016-04-13 Thread Scott Kruger (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239810#comment-15239810
 ] 

Scott Kruger commented on KAFKA-3492:
-

Really, I think the important thing is to be able to control the distribution 
of quota at times of contention or, barring that, to get a relatively fair 
partition of the quota among active clients.

> support quota based on authenticated user name
> --
>
> Key: KAFKA-3492
> URL: https://issues.apache.org/jira/browse/KAFKA-3492
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Jun Rao
>Assignee: Rajini Sivaram
>
> Currently, quota is based on the client.id set in the client configuration, 
> which can be changed easily. Ideally, quota should be set on the 
> authenticated user name. We will need to have a KIP proposal/discussion on 
> this first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3492) support quota based on authenticated user name

2016-04-13 Thread Scott Kruger (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239806#comment-15239806
 ] 

Scott Kruger commented on KAFKA-3492:
-

It would be nice to be able to assign clients minimum percentages of a user's 
quota.  For example, if user A has clients X and Y, both of which are 
contending for A's quota, it would be nice to assign client X a minimum of 25% 
of A's quota so that Y can't starve it out.
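
A toy illustration of the minimum-share arithmetic proposed above (the numbers and the policy itself are hypothetical):

{code}
public class QuotaShareSketch {
    public static void main(String[] args) {
        double userQuotaBytesPerSec = 10_000_000;  // user A's total quota
        double minShareForX = 0.25;                // guaranteed floor for client X
        int otherActiveClients = 3;                // e.g. Y plus two more

        double xQuota = userQuotaBytesPerSec * minShareForX;
        double perOther = userQuotaBytesPerSec * (1 - minShareForX) / otherActiveClients;
        System.out.printf("X >= %.0f B/s; the others get %.0f B/s each under contention%n",
                xQuota, perOther);
    }
}
{code}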

> support quota based on authenticated user name
> --
>
> Key: KAFKA-3492
> URL: https://issues.apache.org/jira/browse/KAFKA-3492
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Jun Rao
>Assignee: Rajini Sivaram
>
> Currently, quota is based on the client.id set in the client configuration, 
> which can be changed easily. Ideally, quota should be set on the 
> authenticated user name. We will need to have a KIP proposal/discussion on 
> this first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3554) Generate actual data with specific compression ratio in the ProducerPerformance tool.

2016-04-13 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-3554:

Affects Version/s: 0.9.0.1
Fix Version/s: 0.10.1.0

> Generate actual data with specific compression ratio in the 
> ProducerPerformance tool.
> -
>
> Key: KAFKA-3554
> URL: https://issues.apache.org/jira/browse/KAFKA-3554
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.1
>Reporter: Jiangjie Qin
>Assignee: Dong Lin
> Fix For: 0.10.1.0
>
>
> Currently the ProducerPerformance tool always generates the payload with the 
> same bytes. This does not work well for testing compressed data because the 
> payload is extremely compressible no matter how big it is.
> We can make some changes to make it more useful for compressed messages. 
> Currently I am generating the payload containing integers from a given range. 
> By adjusting the range of the integers, we can get different compression 
> ratios. 
> API-wise, we can either let the user specify the integer range or the expected 
> compression ratio (we would do some probing to get the corresponding range for 
> the user)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3554) Generate actual data with specific compression ratio in the ProducerPerformance tool.

2016-04-13 Thread Jiangjie Qin (JIRA)
Jiangjie Qin created KAFKA-3554:
---

 Summary: Generate actual data with specific compression ratio in 
the ProducerPerformance tool.
 Key: KAFKA-3554
 URL: https://issues.apache.org/jira/browse/KAFKA-3554
 Project: Kafka
  Issue Type: Improvement
Reporter: Jiangjie Qin


Currently the ProducerPerformance tool always generates the payload with the 
same bytes. This does not work well for testing compressed data because the 
payload is extremely compressible no matter how big it is.

We can make some changes to make it more useful for compressed messages. 
Currently I am generating the payload containing integers from a given range. By 
adjusting the range of the integers, we can get different compression ratios. 

API-wise, we can either let the user specify the integer range or the expected 
compression ratio (we would do some probing to get the corresponding range for 
the user)
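
A minimal sketch of this idea, with arbitrary example ranges: draw payload content from a bounded integer range, so that narrowing the range raises compressibility and widening it lowers it.

{code}
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class CompressiblePayload {
    public static byte[] generate(int sizeBytes, int intRange, Random random) {
        StringBuilder sb = new StringBuilder(sizeBytes);
        while (sb.length() < sizeBytes)
            sb.append(random.nextInt(intRange));   // small range => repetitive => compressible
        return sb.substring(0, sizeBytes).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        byte[] veryCompressible = generate(1024, 10, r);        // digits 0-9 only
        byte[] lessCompressible = generate(1024, 1_000_000, r); // far more distinct tokens
        System.out.println(veryCompressible.length + " / " + lessCompressible.length);
    }
}
{code}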



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3554) Generate actual data with specific compression ratio in the ProducerPerformance tool.

2016-04-13 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-3554:

Assignee: Dong Lin

> Generate actual data with specific compression ratio in the 
> ProducerPerformance tool.
> -
>
> Key: KAFKA-3554
> URL: https://issues.apache.org/jira/browse/KAFKA-3554
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Dong Lin
>
> Currently the ProducerPerformance tool always generates the payload with the 
> same bytes. This does not work well for testing compressed data because the 
> payload is extremely compressible no matter how big it is.
> We can make some changes to make it more useful for compressed messages. 
> Currently I am generating the payload containing integers from a given range. 
> By adjusting the range of the integers, we can get different compression 
> ratios. 
> API-wise, we can either let the user specify the integer range or the expected 
> compression ratio (we would do some probing to get the corresponding range for 
> the user)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3358) Only request metadata updates once we have topics or a pattern subscription

2016-04-13 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239604#comment-15239604
 ] 

Jason Gustafson commented on KAFKA-3358:


I can confirm Phil Luckhurst's investigation. This issue with the metadata 
storm should go away automatically with the changes to the topic metadata 
request, but it can also be fixed more simply with the following patch: 
https://github.com/hachikuji/kafka/commit/428a973c36434f06282900209f1ea29a1343704e.
 This change seems to make sense on its own, so it may be worth merging 
in addition to the KIP-4 changes. Thoughts?
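
In spirit, the guard amounts to something like the following sketch (names are hypothetical, not the real client internals):

{code}
class MetadataUpdateGuard {
    private final java.util.Set<String> subscribedTopics = new java.util.HashSet<>();
    private boolean needsUpdate = false;

    void subscribe(String topic) {
        subscribedTopics.add(topic);
    }

    // Only flag a cluster metadata refresh when there is actually something to
    // refresh for; an empty subscription previously triggered a request for
    // metadata of all topics.
    void requestUpdateIfNeeded() {
        if (!subscribedTopics.isEmpty())
            needsUpdate = true;
    }

    boolean needsUpdate() {
        return needsUpdate;
    }
}
{code}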

> Only request metadata updates once we have topics or a pattern subscription
> ---
>
> Key: KAFKA-3358
> URL: https://issues.apache.org/jira/browse/KAFKA-3358
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.0, 0.9.0.1
>Reporter: Ismael Juma
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.1.0
>
>
> The current code requests a metadata update for _all_ topics which can cause 
> major load issues in large clusters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-33 - Add a time based log index

2016-04-13 Thread Becket Qin
Hi Jun and Guozhang,

I have updated the KIP wiki to incorporate your comments. Please let me
know if you prefer starting another discussion thread for further
discussion.

Thanks,

Jiangjie (Becket) Qin

On Mon, Apr 11, 2016 at 12:21 AM, Becket Qin  wrote:

> Hi Guozhang and Jun,
>
> Thanks for the comments. Please see the responses below.
>
> Regarding Guozhang's question #1 and Jun's question #12: I was
> inserting the time index and offset index entries together mostly for
> simplicity, as Guozhang mentioned. The purpose of using index interval bytes
> for the time index was to control the density of the time index, which is the
> same purpose as for the offset index. It seems reasonable to make them
> aligned. We can track the physical position separately when we insert the
> last time index entry (my original code did that), but when I wrote the code
> it seemed unnecessary. Another minor benefit is that searching by timestamp
> could be potentially faster if we align the time index and offset index.
> It is possible that only one of the time index and the offset index is
> corrupted, but not both. Although we could choose to rebuild only the one
> which is corrupted, given that we have to scan the entire log segment
> anyway, rebuilding both of them seems not much overhead. So the current
> patch I have rebuilds both of them together.
>
> 10. Yes, it should only happen after a hard failure. The last time index
> entry of a normally closed segment already points to the LEO, so there
> is no scan during start up.
>
> 11. On broker startup, if a time index does not exist, an empty one will
> be created first. If the message format version is 0.9.0, we will append a
> time index entry of (last modification time -> base offset of next segment) to
> the time index of each inactive segment. So no actual rebuild will happen
> during the upgrade. However, if the message format version is 0.10.0, we will
> rebuild the time index if it does not exist. (I actually had a question
> about how we are loading the log segments; we can discuss it in the PR)
>
> I will update the wiki to clarify the question raised in the comments and
> submit a PR by tomorrow. I am currently cleaning up the documentation.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Sun, Apr 10, 2016 at 9:25 PM, Jun Rao  wrote:
>
>> Hi, Jiangjie,
>>
>> Thanks for the update. Looks good to me overall. Just a few minor comments
>> below.
>>
>> 10. On broker startup, it's not clear to me why we need to scan the log
>> segment to retrieve the largest timestamp since the time index always has
>> an entry for the largest timestamp. Is that only for restarting after a
>> hard failure?
>>
>> 11. On broker startup, if a log segment misses the time index, do we
>> always
>> rebuild it? This can happen when the broker is upgraded.
>>
>> 12. Related to Guozhang's question #1. It seems simpler to add time
>> index entries independently of the offset index, since an index entry may
>> not be added to the offset index and the time index at the same time. Also,
>> this allows the time index to be rebuilt independently if needed.
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Wed, Apr 6, 2016 at 5:44 PM, Becket Qin  wrote:
>>
>> > Hi all,
>> >
>> > I updated KIP-33 based on the initial implementation. Per discussion on
>> > yesterday's KIP hangout, I would like to initiate the new vote thread for
>> > KIP-33.
>> >
>> > The KIP wiki:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
>> >
>> > Here is a brief summary of the KIP:
>> > 1. We propose to add a time index for each log segment.
>> > 2. The time indices are going to be used for log retention, log rolling,
>> > and message search by timestamp.
>> >
>> > There was an old voting thread which has some discussions on this KIP. The
>> > mail thread link is the following:
>> >
>> >
>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201602.mbox/%3ccabtagwgoebukyapfpchmycjk2tepq3ngtuwnhtr2tjvsnc8...@mail.gmail.com%3E
>> >
>> > I have the following WIP patch for reference. It needs a few more unit
>> > tests and documentation. Other than that it should run fine.
>> >
>> >
>> https://github.com/becketqin/kafka/commit/712357a3fbf1423e05f9eed7d2fed5b6fe6c37b7
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>>
>
>
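
For readers following the thread, a sketch of what a per-segment time index provides: sparse (timestamp -> offset) entries searched with floor semantics. The layout below is illustrative only; the actual format is specified in the KIP.

{code}
import java.util.TreeMap;

public class TimeIndexSketch {
    private final TreeMap<Long, Long> entries = new TreeMap<>(); // timestamp -> offset

    void maybeAppend(long timestamp, long offset) {
        // Like the offset index, entries are appended sparsely (e.g. every
        // index.interval.bytes), and indexed timestamps must not go backwards.
        if (entries.isEmpty() || timestamp > entries.lastKey())
            entries.put(timestamp, offset);
    }

    // First offset whose indexed timestamp is <= target; scan forward from there.
    Long lookup(long targetTimestamp) {
        java.util.Map.Entry<Long, Long> floor = entries.floorEntry(targetTimestamp);
        return floor == null ? null : floor.getValue();
    }
}
{code}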


[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-13 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239575#comment-15239575
 ] 

Jun Rao commented on KAFKA-3042:


[~ismael juma], we fixed a few issues related to soft failure post 0.8.2. 
However, there could be other issues that we don't know yet.

[~delbaeth], [~wushujames], even when we fix all the bugs related to soft 
failure, it would still be good to avoid it in the first place since it only 
adds overhead. A 6 zookeeper.session.timeout seems high though. Do you know 
what's causing the ZK session timeout? Is it related to GC or network?

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who the leader is



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-13 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239575#comment-15239575
 ] 

Jun Rao edited comment on KAFKA-3042 at 4/13/16 4:45 PM:
-

[~ijuma]], we fixed a few issues related to soft failure post 0.8.2. However, 
there could be other issues that we don't know yet.

[~delbaeth], [~wushujames], even when we fix all the bugs related to soft 
failure, it would still be good to avoid it in the first place since it only 
adds overhead. A 6 zookeeper.session.timeout seems high though. Do you know 
what's causing the ZK session timeout? Is it related to GC or network?


was (Author: junrao):
[~ismael juma], we fixed a few issues related to soft failure post 0.8.2. 
However, there could be other issues that we don't know yet.

[~delbaeth], [~wushujames], even when we fix all the bugs related to soft 
failure, it would still be good to avoid it in the first place since it only 
adds overhead. A 6 zookeeper.session.timeout seems high though. Do you know 
what's causing the ZK session timeout? Is it related to GC or network?

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who the leader is



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-13 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239575#comment-15239575
 ] 

Jun Rao edited comment on KAFKA-3042 at 4/13/16 4:45 PM:
-

[~ijuma], we fixed a few issues related to soft failure post 0.8.2. However, 
there could be other issues that we don't know yet.

[~delbaeth], [~wushujames], even when we fix all the bugs related to soft 
failure, it would still be good to avoid it in the first place since it only 
adds overhead. A 6 zookeeper.session.timeout seems high though. Do you know 
what's causing the ZK session timeout? Is it related to GC or network?


was (Author: junrao):
[~ijuma]], we fixed a few issues related to soft failure post 0.8.2. However, 
there could be other issues that we don't know yet.

[~delbaeth], [~wushujames], even when we fix all the bugs related to soft 
failure, it would still be good to avoid it in the first place since it only 
adds overhead. A 6 zookeeper.session.timeout seems high though. Do you know 
what's causing the ZK session timeout? Is it related to GC or network?

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who the leader is



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3410) Unclean leader election and "Halting because log truncation is not allowed"

2016-04-13 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239469#comment-15239469
 ] 

Jun Rao commented on KAFKA-3410:


[~james cheng], for #2, what you suggested makes sense. Instead of halting the 
broker, a less intrusive way is probably to just log an error, delay the 
fetching of the partition, and keep retrying.
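
A sketch of the less intrusive handling suggested above, with illustrative names (not ReplicaFetcherThread internals): log, back off, and retry the fetch instead of halting the broker.

{code}
import java.util.concurrent.TimeUnit;

class FetchRetrySketch {
    static void onTruncationNotAllowed(String topicPartition) throws InterruptedException {
        System.err.println("ERROR: leader offset behind replica for " + topicPartition
                + "; delaying fetch and retrying instead of halting");
        TimeUnit.SECONDS.sleep(5);  // arbitrary backoff before the next fetch attempt
        // re-enqueue the partition for fetching here
    }
}
{code}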

> Unclean leader election and "Halting because log truncation is not allowed"
> ---
>
> Key: KAFKA-3410
> URL: https://issues.apache.org/jira/browse/KAFKA-3410
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>
> I ran into a scenario where one of my brokers would continually shut down, 
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
> 5. At this point, both brokers are in the ISR. Broker1 is the partition 
> leader.
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets 
> dropped out of ISR. Broker1 is still the leader. I can still write data to 
> the partition.
> 7. Shutdown Broker1. Hard or controlled, doesn't matter.
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement or 
> full hardware replacement)
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed 
> because broker1 is down. At this point, the partition is offline. Can't write 
> to it.
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts 
> to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I am able to recover by setting unclean.leader.election.enable=true on my 
> brokers.
> I'm trying to understand a couple things:
> * In step 10, why is broker1 allowed to resume leadership even though it has 
> no data?
> * In step 10, why is it necessary to stop the entire broker due to one 
> partition that is in this state? Wouldn't it be possible for the broker to 
> continue to serve traffic for all the other topics, and just mark this one as 
> unavailable?
> * Would it make sense to allow an operator to manually specify which broker 
> they want to become the new master? This would give me more control over how 
> much data loss I am willing to handle. In this case, I would want broker2 to 
> become the new master. Or, is that possible and I just don't know how to do 
> it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3522) Consider adding version information into rocksDB storage format

2016-04-13 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska reassigned KAFKA-3522:
---

Assignee: Eno Thereska

> Consider adding version information into rocksDB storage format
> ---
>
> Key: KAFKA-3522
> URL: https://issues.apache.org/jira/browse/KAFKA-3522
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: semantics
> Fix For: 0.10.1.0
>
>
> Kafka Streams does not introduce any modifications to the data format in the 
> underlying Kafka protocol, but it does use RocksDB for persistent state 
> storage, and currently its data format is fixed and hard-coded. We want to 
> consider the evolution path for the future when we change the data format, and 
> hence having some version info stored along with the storage file / directory 
> would be useful.
> And this information could even live outside the storage file; for example, we 
> can just use a small "version indicator" file in the rocksdb directory for 
> this purpose. Thoughts? [~enothereska] [~jkreps]
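
A sketch of the "version indicator" file idea; the file name and format below are made up for illustration:

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StoreVersionFile {
    static final String CURRENT_VERSION = "1";

    static void writeVersion(Path storeDir) throws IOException {
        Files.write(storeDir.resolve(".version"), CURRENT_VERSION.getBytes(StandardCharsets.UTF_8));
    }

    static String readVersion(Path storeDir) throws IOException {
        Path f = storeDir.resolve(".version");
        // A missing file means pre-versioning data; treat it as version "0".
        return Files.exists(f) ? new String(Files.readAllBytes(f), StandardCharsets.UTF_8) : "0";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rocksdb-store");
        writeVersion(dir);
        System.out.println("store format version: " + readVersion(dir));
    }
}
{code}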



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3548) Locale is not handled properly in kafka-consumer

2016-04-13 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239434#comment-15239434
 ] 

Ismael Juma commented on KAFKA-3548:


[~rsivaram], please go ahead.

> Locale is not handled properly in kafka-consumer
> 
>
> Key: KAFKA-3548
> URL: https://issues.apache.org/jira/browse/KAFKA-3548
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Tanju Cataltepe
>Assignee: Rajini Sivaram
> Fix For: 0.10.0.0
>
>
> If the JVM's locale language is Turkish, which has a different upper case for 
> the lower-case letter i, the result is a runtime error caused by 
> org.apache.kafka.clients.consumer.OffsetResetStrategy. More specifically, an 
> enum constant *EARLİEST* is generated, which does not match *EARLIEST* (note 
> the _dotted capital I_).
> If the locale for the JVM is explicitly set to en_US, the example runs as 
> expected.
> A sample error log is below:
> {noformat}
> [akka://ReactiveKafka/user/$a] Failed to construct kafka consumer
> akka.actor.ActorInitializationException: exception during creation
> at akka.actor.ActorInitializationException$.apply(Actor.scala:172)
> at akka.actor.ActorCell.create(ActorCell.scala:606)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:461)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
> at akka.dispatch.Mailbox.run(Mailbox.scala:223)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka 
> consumer
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:648)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:542)
> at 
> com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer$lzycompute(ReactiveKafkaConsumer.scala:31)
> at 
> com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer(ReactiveKafkaConsumer.scala:30)
> at 
> com.softwaremill.react.kafka.KafkaActorPublisher.<init>(KafkaActorPublisher.scala:17)
> at 
> com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
> at 
> com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
> at 
> akka.actor.TypedCreatorFunctionConsumer.produce(IndirectActorProducer.scala:87)
> at akka.actor.Props.newActor(Props.scala:214)
> at akka.actor.ActorCell.newActor(ActorCell.scala:562)
> at akka.actor.ActorCell.create(ActorCell.scala:588)
> ... 7 more
> Caused by: java.lang.IllegalArgumentException: No enum constant 
> org.apache.kafka.clients.consumer.OffsetResetStrategy.EARLİEST
> at java.lang.Enum.valueOf(Enum.java:238)
> at 
> org.apache.kafka.clients.consumer.OffsetResetStrategy.valueOf(OffsetResetStrategy.java:15)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:588)
> ... 17 more
> {noformat}
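
The failure mode and the usual fix can be demonstrated in isolation: locale-sensitive toUpperCase() turns "earliest" into "EARLİEST" under the Turkish locale, so the enum lookup must use a fixed locale such as Locale.ROOT. A self-contained sketch (the enum here is a stand-in for the real one):

{code}
import java.util.Locale;

public class LocaleDemo {
    enum OffsetResetStrategy { LATEST, EARLIEST, NONE }

    public static void main(String[] args) {
        String configured = "earliest";
        // Locale-sensitive conversion: prints EARLİEST (dotted capital I) under tr_TR.
        System.out.println(configured.toUpperCase(new Locale("tr", "TR")));
        // Locale-independent conversion matches the enum constant:
        OffsetResetStrategy s = OffsetResetStrategy.valueOf(configured.toUpperCase(Locale.ROOT));
        System.out.println(s); // EARLIEST
    }
}
{code}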



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3548) Locale is not handled properly in kafka-consumer

2016-04-13 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3548:
---
Assignee: Rajini Sivaram  (was: Neha Narkhede)

> Locale is not handled properly in kafka-consumer
> 
>
> Key: KAFKA-3548
> URL: https://issues.apache.org/jira/browse/KAFKA-3548
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Tanju Cataltepe
>Assignee: Rajini Sivaram
> Fix For: 0.10.0.0
>
>
> If the JVM's locale language is Turkish, which has a different upper case for 
> the lower-case letter i, the result is a runtime error caused by 
> org.apache.kafka.clients.consumer.OffsetResetStrategy. More specifically, an 
> enum constant *EARLİEST* is generated, which does not match *EARLIEST* (note 
> the _dotted capital I_).
> If the locale for the JVM is explicitly set to en_US, the example runs as 
> expected.
> A sample error log is below:
> {noformat}
> [akka://ReactiveKafka/user/$a] Failed to construct kafka consumer
> akka.actor.ActorInitializationException: exception during creation
> at akka.actor.ActorInitializationException$.apply(Actor.scala:172)
> at akka.actor.ActorCell.create(ActorCell.scala:606)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:461)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
> at akka.dispatch.Mailbox.run(Mailbox.scala:223)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka 
> consumer
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:648)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:542)
> at 
> com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer$lzycompute(ReactiveKafkaConsumer.scala:31)
> at 
> com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer(ReactiveKafkaConsumer.scala:30)
> at 
> com.softwaremill.react.kafka.KafkaActorPublisher.<init>(KafkaActorPublisher.scala:17)
> at 
> com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
> at 
> com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
> at 
> akka.actor.TypedCreatorFunctionConsumer.produce(IndirectActorProducer.scala:87)
> at akka.actor.Props.newActor(Props.scala:214)
> at akka.actor.ActorCell.newActor(ActorCell.scala:562)
> at akka.actor.ActorCell.create(ActorCell.scala:588)
> ... 7 more
> Caused by: java.lang.IllegalArgumentException: No enum constant 
> org.apache.kafka.clients.consumer.OffsetResetStrategy.EARLİEST
> at java.lang.Enum.valueOf(Enum.java:238)
> at 
> org.apache.kafka.clients.consumer.OffsetResetStrategy.valueOf(OffsetResetStrategy.java:15)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:588)
> ... 17 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3548) Locale is not handled properly in kafka-consumer

2016-04-13 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239431#comment-15239431
 ] 

Rajini Sivaram commented on KAFKA-3548:
---

[~nehanarkhede] Is anyone working on this issue? Otherwise, I can submit a PR. 
Thanks.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3553) Issue with getting data of next offset after restarting the consumer and producer is always running

2016-04-13 Thread pooja sadashiv deokar (JIRA)
pooja sadashiv deokar created KAFKA-3553:


 Summary: Issue with getting data of next offset after restarting 
the consumer and producer is always running
 Key: KAFKA-3553
 URL: https://issues.apache.org/jira/browse/KAFKA-3553
 Project: Kafka
  Issue Type: Bug
  Components: consumer, offset manager
Affects Versions: 0.9.0.0
 Environment: operating system : ubuntu 14.04
using kafka-python
Reporter: pooja sadashiv deokar
Assignee: Neha Narkhede


I am pushing data to a Kafka topic every second from Python, and I have 
written a consumer that fetches data from the topic with a consumer timeout of 
500 ms and enable_auto_commit set to false.
The scripts are as follows:

1) prod.py

from kafka import KafkaClient
from kafka.producer import SimpleProducer
import random
import time

kafka = KafkaClient("localhost:9092")
producer = SimpleProducer(kafka)

names = ['Rahul', 'Narendra', 'NaMo', 'ManMohan', 'Sonia']
i = 0
while True:
    # Publish one "<counter>,<name>" message per second to the "test" topic.
    msg = str(i) + ',' + random.choice(names)
    print('putting data : ' + msg)
    producer.send_messages("test", msg)
    i += 1
    time.sleep(1)

2) con.py

from kafka import KafkaConsumer

consumer = KafkaConsumer('test',
                         bootstrap_servers=['localhost:9092'],
                         consumer_timeout_ms=500,
                         enable_auto_commit=False)
for message in consumer:
    print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                         message.offset, message.key,
                                         message.value))
# Commit the consumed offsets once the iterator times out (no message for
# 500 ms), since auto-commit is disabled.
consumer.commit()


My producer runs continuously, and I start the consumer again whenever it 
stops. The first time, the consumer works well and receives data promptly. But 
on subsequent runs it takes anywhere from 1 minute to 5 minutes (sometimes 
more) to get the next data, when it should arrive within 1 to 5 seconds at 
most.
Also, one thing I observe: if I wait about 1 minute before starting the 
consumer again, the data arrives as expected.

Please correct me if I am doing something wrong.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3149) Extend SASL implementation to support more mechanisms

2016-04-13 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-3149:
--
Fix Version/s: 0.10.0.0
   Status: Patch Available  (was: Open)

Updated based on the discussions in the mailing threads around KIP-43 and 
KIP-35.

> Extend SASL implementation to support more mechanisms
> -
>
> Key: KAFKA-3149
> URL: https://issues.apache.org/jira/browse/KAFKA-3149
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.9.0.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
> Fix For: 0.10.0.0
>
>
> Make SASL implementation more configurable to enable integration with 
> existing authentication servers.
> Details are in KIP-43: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-43%3A+Kafka+SASL+enhancements]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-43: Kafka SASL enhancements

2016-04-13 Thread Rajini Sivaram
I have updated the PR (https://github.com/apache/kafka/pull/812) and KIP-43
to use standard Kafka format for the new request/response added by KIP-43.
I haven't changed the overall structure of the Java code. Feedback is
appreciated.

Thanks,

Rajini

On Tue, Apr 12, 2016 at 3:52 PM, Ismael Juma  wrote:

> Hi Jun,
>
> Comments inline.
>
> On Mon, Apr 11, 2016 at 1:57 AM, Jun Rao  wrote:
>
> > Yes, that should be fine right? Since the new api key will start with a 0
> > byte, it actually guarantees that it's different from 0x60 (1st byte in
> the
> > old protocol) even if we change the request version id in the future.
>
>
> Yes, this is true. Also, the GSS API library will throw an exception if the
> first byte is not 0x60 (for the case where newer clients connect to older
> brokers):
>
>
> https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/security/jgss/GSSHeader.java#L97
>
>
> And the DEFECTIVE_TOKEN status code is specified in both RFC 2743[1] and
> RFC 5653[2]. Section 3.1 of RFC 2743 specifies that the token tag consists
> of the following elements, in order:
>
> 1. 0x60 -- Tag for [APPLICATION 0] SEQUENCE; indicates that
>   -- constructed form, definite length encoding follows.
>
> 2. Token length octets ...
>
>
> Ismael
>
> [1] Generic Security Service Application Program Interface Version 2,
> Update 1: https://tools.ietf.org/html/rfc2743
> [2] Generic Security Service API Version 2: Java Bindings Update:
> https://tools.ietf.org/html/rfc5653
>
> Ismael
>



-- 
Regards,

Rajini
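
To make the first-byte distinction above concrete, here is a minimal sketch 
(illustrative only, not the KIP-43 patch; the class and constant names are 
invented):

import java.nio.ByteBuffer;

public class SaslTokenCheck {
    // RFC 2743, section 3.1: a GSS-API InitialContextToken begins with the
    // [APPLICATION 0] SEQUENCE tag byte, 0x60.
    private static final int GSS_API_TAG = 0x60;

    // A Kafka request begins with a two-byte api_key whose first byte is 0,
    // so the first byte alone separates the two formats, even if request
    // version ids change in the future.
    public static boolean isGssApiToken(ByteBuffer buffer) {
        return buffer.remaining() > 0 && (buffer.get(0) & 0xFF) == GSS_API_TAG;
    }

    public static void main(String[] args) {
        ByteBuffer gssToken = ByteBuffer.wrap(new byte[] {0x60, 0x1a});
        ByteBuffer kafkaRequest = ByteBuffer.wrap(new byte[] {0x00, 0x11});
        System.out.println(isGssApiToken(gssToken));     // true
        System.out.println(isGssApiToken(kafkaRequest)); // false
    }
}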


[jira] [Commented] (KAFKA-3358) Only request metadata updates once we have topics or a pattern subscription

2016-04-13 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239013#comment-15239013
 ] 

Ismael Juma commented on KAFKA-3358:


Phil Luckhurst mentioned in the mailing list that an additional aspect is that 
we send metadata requests quite frequently if the producer is started, but not 
used:

{noformat}
With debug logging turned on we've sometimes seen our logs filling up with the 
kafka producer sending metadata requests
every 100ms e.g.

2016-04-08 10:39:33,592 DEBUG [kafka-producer-network-thread | phil-pa-1] 
org.apache.kafka.clients.NetworkClient: Sending metadata request 
ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=249,client_id=phil-pa-1},
 body={topics=[phil-pa-1-device-update]}), isInitiatedByNetworkClient, 
createdTimeMs=1460108373592, sendTimeMs=0) to node 0
2016-04-08 10:39:33,592 DEBUG [kafka-producer-network-thread | phil-pa-1] 
org.apache.kafka.clients.Metadata: Updated cluster metadata version 248 to 
Cluster(nodes = [Node(0, ta-eng-kafka2, 9092)], partitions = [Partition(topic = 
phil-pa-1-device-update, partition = 0, leader = 0, replicas = [0,], isr = 
[0,]])
2016-04-08 10:39:33,698 DEBUG [kafka-producer-network-thread | phil-pa-1] 
org.apache.kafka.clients.NetworkClient: Sending metadata request 
ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=250,client_id=phil-pa-1},
 body={topics=[phil-pa-1-device-update]}), isInitiatedByNetworkClient, 
createdTimeMs=1460108373698, sendTimeMs=0) to node 0
2016-04-08 10:39:33,698 DEBUG [kafka-producer-network-thread | phil-pa-1] 
org.apache.kafka.clients.Metadata: Updated cluster metadata version 249 to 
Cluster(nodes = [Node(0, ta-eng-kafka2, 9092)], partitions = [Partition(topic = 
phil-pa-1-device-update, partition = 0, leader = 0, replicas = [0,], isr = 
[0,]])

These metadata requests continue to be sent every 100ms (retry.backoff.ms) 
until we stop the process.

This only seems to happen if the KafkaProducer instance is created but not used 
to publish a message for 5 minutes. After 5
minutes (metadata.max.age.ms) the producer thread sends a metadata request to 
the server that has an empty topics list and
the server responds with the partition information for *all* topics hosted on 
the server.

2016-04-11 14:16:39,320 DEBUG [kafka-producer-network-thread | phil-pa-1] 
org.apache.kafka.clients.NetworkClient: Sending metadata request 
ClientRequest(expectResponse=true, callback=null, 
request=RequestSend(header={api_key=3,api_version=0,correlation_id=0,client_id=phil-pa-1},
 body={topics=[]}), isInitiatedByNetworkClient, createdTimeMs=1460380599289, 
sendTimeMs=0) to node -1

If we then use that KafkaProducer instance to send a message, the next 'Sending 
metadata request' will be just for the topic we have sent the message to, and 
this then triggers the flood of retry requests noted above.

If we ensure we send the first message within the time set by 
metadata.max.age.ms (default 5 minutes) then everything works as
expected and the metadata requests do not continually get retried.

In many cases I can understand that creating a KafkaProducer and then not using 
it within 5 minutes is unusual, but in our case we create it when our REST-based 
application starts up, and we can't guarantee that a message will be published 
within that time. To get around this we are currently posting a test message to 
the topic right after creating the KafkaProducer, which prevents the problem 
from happening.
{noformat}
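
(An editor's aside, not from the original thread: a lighter-weight warm-up than 
publishing a test message is to fetch the topic's partition metadata with 
partitionsFor() right after constructing the producer, so the topic is 
registered before metadata.max.age.ms expires. The broker address and topic 
name below are placeholders.)

{noformat}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

public class ProducerWarmUp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);

        // Registers the topic with the producer's metadata and blocks until
        // its partition info arrives, so subsequent refreshes are
        // topic-scoped rather than the empty list described above.
        producer.partitionsFor("my-topic"); // placeholder topic
        producer.close();
    }
}
{noformat}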

Phil investigated some more and said:

{noformat}
The request does succeed and the reason it keeps requesting is a check in the 
Sender.run(long now) method.

public void run(long now) {
    Cluster cluster = metadata.fetch();
    // get the list of partitions with data ready to send
    RecordAccumulator.ReadyCheckResult result =
        this.accumulator.ready(cluster, now);

    // if there are any partitions whose leaders are not known yet,
    // force metadata update
    if (result.unknownLeadersExist)
        this.metadata.requestUpdate();

It looks like the this.accumulator.ready(cluster, now) method checks the leader 
for each partition against the cluster metadata it already has. In this case 
the original metadata request had the empty topic list, so it got information 
for all partitions, but after using the producer the cluster only contains the 
one topic, which means this check sets unknownLeadersExist = true.
Node leader = cluster.leaderFor(part);
if (leader == null) {
    unknownLeadersExist = true;

As you can see above, the Sender.run method checks for this in the result and 
then calls this.metadata.requestUpdate(), which triggers the metadata to be 
requested again. And of course the same thing happens when checking the next 
response, and we're suddenly in a loop.
{noformat}

[jira] [Updated] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-13 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3042:
---
Fix Version/s: 0.10.0.0

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who the leader is.
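
For illustration (an editor's sketch with invented names, not the broker code): 
the bounded-retry behavior the reporter asks for would cap the conditional 
updates and then re-resolve leadership instead of looping:

{noformat}
public class IsrUpdateRetry {
    private static final int MAX_ISR_UPDATE_ATTEMPTS = 5; // hypothetical cap

    interface ZkClient {
        // True if the conditional write succeeded; false on a zkVersion
        // mismatch (another broker updated the path first).
        boolean conditionalUpdateIsr(int cachedZkVersion);
    }

    static void updateIsr(ZkClient zk, int cachedZkVersion) {
        for (int attempt = 1; attempt <= MAX_ISR_UPDATE_ATTEMPTS; attempt++) {
            if (zk.conditionalUpdateIsr(cachedZkVersion)) {
                return; // update succeeded
            }
            // Mismatch: our cached view of the partition state may be stale.
        }
        // Stop retrying after repeated mismatches and find out who the
        // leader really is, instead of logging "Cached zkVersion ... skip
        // updating ISR" forever.
        refreshLeadershipFromZooKeeper();
    }

    static void refreshLeadershipFromZooKeeper() {
        // Placeholder: re-read partition state to learn the current leader.
    }
}
{noformat}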



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-13 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238721#comment-15238721
 ] 

Ismael Juma commented on KAFKA-3042:


[~junrao], some people have said that they have seen this issue in 0.8.2 too, 
which suggests that there may be two different problems?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)