[jira] [Comment Edited] (KAFKA-5403) Transactions system test should dedup consumed messages by offset

2017-08-16 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129866#comment-16129866
 ] 

Apurva Mehta edited comment on KAFKA-5403 at 8/17/17 4:19 AM:
--

I think we should punt on this. The problems with the patch are not easy to fix 
as the {{VerifiableConsumer}} validates that offsets are sequential, which is 
not true when you have transactions. And the {{ConsoleConsumer}} doesn't expose 
offsets. So modifying either without breaking compatibility will take time.

Also, the system test has been running reliably for months without suffering 
any problems with duplicate reads on the same offset.


was (Author: apurva):
I think we should punt on this. The problems with the patch are not easy to fix 
as the {{VerifiableConsumer}} validates that offsets are sequential, which is 
not true when you have transactions. And the {{ConsoleConsumer}} doesn't expose 
offsets. So modifying either breaking compatibility will take time.

Also, the system test has been running reliably for months without suffering 
any problems with duplicate reads on the same offset.

> Transactions system test should dedup consumed messages by offset
> -
>
> Key: KAFKA-5403
> URL: https://issues.apache.org/jira/browse/KAFKA-5403
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 1.0.0
>
>
> In KAFKA-5396, we saw that the consumers which verify the data in multiple 
> topics could read the same offsets multiple times, for instance when a 
> rebalance happens. 
> This would show up as spurious duplicates, causing the test to fail. We should 
> dedup the consumed messages by offset and only fail the test if we have 
> duplicate values for a unique set of offsets.
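
A minimal sketch of the dedup-by-offset idea above, with hypothetical class and topic names that are not part of the actual ducktape test code: record the value seen for each (topic, partition, offset) and treat a record as a real duplicate only when the same offset is observed with a different value.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating dedup-by-offset: re-reads of the same
// (topic, partition, offset) are ignored unless the value differs, which
// would indicate a real correctness problem rather than a rebalance re-read.
public class OffsetDedupValidator {

    private final Map<String, String> seenValues = new HashMap<>();

    /** Returns true only for a genuine duplicate: same offset, different value. */
    public boolean isRealDuplicate(String topic, int partition, long offset, String value) {
        String key = topic + "-" + partition + "@" + offset;
        String previous = seenValues.putIfAbsent(key, value);
        return previous != null && !previous.equals(value);
    }

    public static void main(String[] args) {
        OffsetDedupValidator v = new OffsetDedupValidator();
        System.out.println(v.isRealDuplicate("output", 0, 42L, "a")); // false: first read
        System.out.println(v.isRealDuplicate("output", 0, 42L, "a")); // false: benign re-read after a rebalance
        System.out.println(v.isRealDuplicate("output", 0, 42L, "b")); // true: conflicting value, real failure
    }
}
{code}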



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5543) We don't remove the LastStableOffsetLag metric when a partition is moved away from a broker

2017-08-16 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5543:

Fix Version/s: (was: 0.11.0.1)
   1.0.0

> We don't remove the LastStableOffsetLag metric when a partition is moved away 
> from a broker
> ---
>
> Key: KAFKA-5543
> URL: https://issues.apache.org/jira/browse/KAFKA-5543
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 1.0.0
>
>
> Reported by [~junrao], we have a small leak where the `LastStableOffsetLag` 
> metric is not removed along with the other metrics in the 
> `Partition.removeMetrics` method. This could create a leak when partitions 
> are reassigned or a topic is deleted.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5482) A CONCURRENT_TRANSACTIONS error for the first AddPartitionsToTxn request slows down transactions significantly

2017-08-16 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5482:

Fix Version/s: (was: 0.11.0.2)
   1.0.0

> A CONCURRENT_TRANSACTIONS error for the first AddPartitionsToTxn request 
> slows down transactions significantly
> --
>
> Key: KAFKA-5482
> URL: https://issues.apache.org/jira/browse/KAFKA-5482
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 1.0.0
>
>
> Here is the issue.
> # When we do a commit transaction, the producer sends an `EndTxn` request to 
> the coordinator. The coordinator writes the `PrepareCommit` message to the 
> transaction log and then returns the response to the client. It writes the 
> transaction markers and the final 'CompleteCommit' message asynchronously. 
> # In the meantime, if the client starts another transaction, it will send an 
> `AddPartitions` request on the next `Sender.run` loop. If the markers haven't 
> been written yet, then the coordinator will return a retriable 
> `CONCURRENT_TRANSACTIONS` error to the client.
> # The current behavior in the producer is to sleep for `retryBackoffMs` 
> before retrying the request. The current default for this is 100ms. So the 
> producer will sleep for 100ms before sending the `AddPartitions` again. This 
> puts a floor on the latency for back to back transactions.
> This has been worked around in 
> https://issues.apache.org/jira/browse/KAFKA-5477 by reducing the retryBackoff 
> for the first AddPartitions request. But we need a stronger solution: like 
> having the commit block until the transaction is complete, or delaying the 
> addPartitions until batches are actually ready to be sent to the transaction.
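
As a rough illustration of the latency floor described above, here is a sketch of two back-to-back transactions against the Java producer API (the broker address, topic and transactional id below are made up): the AddPartitionsToTxn triggered by the second transaction's first send() is the request that can hit the retriable CONCURRENT_TRANSACTIONS error and wait out retry.backoff.ms.

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BackToBackTxns {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-txn");        // hypothetical id
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        // retry.backoff.ms is the sleep between retries discussed above (default 100ms).
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();

            producer.beginTransaction();
            producer.send(new ProducerRecord<>("demo-topic", "k", "v1"));
            producer.commitTransaction(); // coordinator responds once PrepareCommit is written

            producer.beginTransaction();
            // The AddPartitionsToTxn for this send can race with the asynchronous
            // marker writes of the previous commit and get CONCURRENT_TRANSACTIONS,
            // so the producer may sleep retry.backoff.ms before retrying.
            producer.send(new ProducerRecord<>("demo-topic", "k", "v2"));
            producer.commitTransaction();
        }
    }
}
{code}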



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5482) A CONCURRENT_TRANSACTIONS error for the first AddPartitionsToTxn request slows down transactions significantly

2017-08-16 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129870#comment-16129870
 ] 

Apurva Mehta commented on KAFKA-5482:
-

The proposed solution is involved and cannot be part of the next bugfix 
release. 

> A CONCURRENT_TRANSACTIONS error for the first AddPartitionsToTxn request 
> slows down transactions significantly
> --
>
> Key: KAFKA-5482
> URL: https://issues.apache.org/jira/browse/KAFKA-5482
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 1.0.0
>
>
> Here is the issue.
> # When we do a commit transaction, the producer sends an `EndTxn` request to 
> the coordinator. The coordinator writes the `PrepareCommit` message to the 
> transaction log and then returns the response to the client. It writes the 
> transaction markers and the final 'CompleteCommit' message asynchronously. 
> # In the meantime, if the client starts another transaction, it will send an 
> `AddPartitions` request on the next `Sender.run` loop. If the markers haven't 
> been written yet, then the coordinator will return a retriable 
> `CONCURRENT_TRANSACTIONS` error to the client.
> # The current behavior in the producer is to sleep for `retryBackoffMs` 
> before retrying the request. The current default for this is 100ms. So the 
> producer will sleep for 100ms before sending the `AddPartitions` again. This 
> puts a floor on the latency for back to back transactions.
> This has been worked around in 
> https://issues.apache.org/jira/browse/KAFKA-5477 by reducing the retryBackoff 
> for the first AddPartitions request. But we need a stronger solution: like 
> having the commit block until the transaction is complete, or delaying the 
> addPartitions until batches are actually ready to be sent to the transaction.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5347) OutOfSequence error should be fatal

2017-08-16 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5347:

Fix Version/s: (was: 0.11.0.2)
   1.0.0

> OutOfSequence error should be fatal
> ---
>
> Key: KAFKA-5347
> URL: https://issues.apache.org/jira/browse/KAFKA-5347
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 1.0.0
>
>
> If the producer sees an OutOfSequence error for a given partition, we 
> currently treat it as an abortable error. This makes some sense because 
> OutOfSequence won't prevent us from being able to send the EndTxn to abort 
> the transaction. The problem is that the producer, even after aborting, still 
> won't be able to send to the topic, since it will keep getting OutOfSequence. One way to deal 
> with this is to ask the user to call {{initTransactions()}} again to bump the 
> epoch, but this is a bit difficult to explain and could be dangerous since it 
> renders zombie checking less effective. Probably we should just consider 
> OutOfSequence fatal for the transactional producer.
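
A rough sketch, from the application's point of view, of what treating OutOfSequence as fatal would look like, grouping it with ProducerFencedException rather than with the abortable errors (the topic name and surrounding structure are hypothetical, not the producer's internal handling):

{code:java}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TxnErrorHandling {

    // producer is assumed to be an already-initialized transactional KafkaProducer.
    static void sendInTransaction(KafkaProducer<String, String> producer) {
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("demo-topic", "k", "v"));
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException fatal) {
            // Treat as fatal: aborting would succeed, but subsequent sends to the
            // partition would keep failing with OutOfSequence anyway, so the only
            // safe option is to close this producer and create a new one.
            producer.close();
            throw fatal;
        } catch (KafkaException abortable) {
            // Other errors are abortable: give up on this transaction and retry it.
            producer.abortTransaction();
        }
    }
}
{code}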



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5284) Add tools and metrics to diagnose problems with the idempotent producer and transactions

2017-08-16 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5284:

Fix Version/s: (was: 0.11.0.2)
   1.0.0

> Add tools and metrics to diagnose problems with the idempotent producer and 
> transactions
> 
>
> Key: KAFKA-5284
> URL: https://issues.apache.org/jira/browse/KAFKA-5284
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Apurva Mehta
>  Labels: exactly-once
> Fix For: 1.0.0
>
>
> The KIP mentions a number of metrics which we should add, but haven't yet 
> done so. It would also be good to have tools to help diagnose degenerate 
> situations like:
> # If a consumer is stuck, we should be able to find the LSO of the partition 
> it is blocked on, and which producer is holding up the advancement of the LSO.
> # We should be able to force abort any inflight transaction to free up 
> consumers
> # etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5403) Transactions system test should dedup consumed messages by offset

2017-08-16 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129866#comment-16129866
 ] 

Apurva Mehta commented on KAFKA-5403:
-

I think we should punt on this. The problems with the patch are not easy to fix 
as the {{VerifiableConsumer}} validates that offsets are sequential, which is 
not true when you have transactions. And the {{ConsoleConsumer}} doesn't expose 
offsets. So modifying either without breaking compatibility will take time.

Also, the system test has been running reliably for months without suffering 
any problems with duplicate reads on the same offset.

> Transactions system test should dedup consumed messages by offset
> -
>
> Key: KAFKA-5403
> URL: https://issues.apache.org/jira/browse/KAFKA-5403
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 1.0.0
>
>
> In KAFKA-5396, we saw that the consumers which verify the data in multiple 
> topics could read the same offsets multiple times, for instance when a 
> rebalance happens. 
> This would show up as spurious duplicates, causing the test to fail. We should 
> dedup the consumed messages by offset and only fail the test if we have 
> duplicate values for a unique set of offsets.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-5745) Partition.makeLeader() should convert HW to OffsetMetadata before becoming the leader

2017-08-16 Thread huxihx (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxihx reassigned KAFKA-5745:
-

Assignee: huxihx

> Partition.makeLeader() should convert HW to OffsetMetadata before becoming 
> the leader
> -
>
> Key: KAFKA-5745
> URL: https://issues.apache.org/jira/browse/KAFKA-5745
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.11.0.0
>Reporter: Jun Rao
>Assignee: huxihx
>
> Saw the following uncaught exception in the log.
> ERROR [KafkaApi-411] Error when handling request Name: FetchRequest; Version: 
> 3; CorrelationId: 708572; ClientId: client-0; ReplicaId: -1; MaxWait: 500 ms; 
> MinBytes: 1 bytes; MaxBytes:52428800 bytes; RequestInfo: 
> ([topic1,3],PartitionFetchInfo(953794,1048576)) (kafka.server.KafkaApis) 
> org.apache.kafka.common.KafkaException: 953793 [-1 : -1] cannot compare its 
> segment info with 953794 [829749 : 564535961] since it only has message 
> offset info 
> at kafka.server.LogOffsetMetadata.onOlderSegment(LogOffsetMetadata.scala:48) 
> at 
> kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:93) 
> at 
> kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:77) 
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) 
> at kafka.server.DelayedFetch.tryComplete(DelayedFetch.scala:77) 
> at kafka.server.DelayedOperation.safeTryComplete(DelayedOperation.scala:104)
> at 
> kafka.server.DelayedOperationPurgatory.tryCompleteElseWatch(DelayedOperation.scala:196)
>  
> at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:516) 
> at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) 
> at kafka.server.KafkaApis.handle(KafkaApis.scala:79) 
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) 
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5745) Partition.makeLeader() should convert HW to OffsetMetadata before becoming the leader

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129839#comment-16129839
 ] 

ASF GitHub Bot commented on KAFKA-5745:
---

GitHub user huxihx opened a pull request:

https://github.com/apache/kafka/pull/3682

KAFKA-5745: makeLeader should invoke `convertHWToLocalOffsetMetadata` 
before marking it as leader



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huxihx/kafka KAFKA-5745

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3682


commit 3d7c4081428a7eae0ba1cb26e017b9bcbc1603c7
Author: huxihx 
Date:   2017-08-17T03:47:25Z

KAFKA-5745: makeLeader should invoke `convertHWToLocalOffsetMetadata` 
before marking local replica as the leader.




> Partition.makeLeader() should convert HW to OffsetMetadata before becoming 
> the leader
> -
>
> Key: KAFKA-5745
> URL: https://issues.apache.org/jira/browse/KAFKA-5745
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.11.0.0
>Reporter: Jun Rao
>
> Saw the following uncaught exception in the log.
> ERROR [KafkaApi-411] Error when handling request Name: FetchRequest; Version: 
> 3; CorrelationId: 708572; ClientId: client-0; ReplicaId: -1; MaxWait: 500 ms; 
> MinBytes: 1 bytes; MaxBytes:52428800 bytes; RequestInfo: 
> ([topic1,3],PartitionFetchInfo(953794,1048576)) (kafka.server.KafkaApis) 
> org.apache.kafka.common.KafkaException: 953793 [-1 : -1] cannot compare its 
> segment info with 953794 [829749 : 564535961] since it only has message 
> offset info 
> at kafka.server.LogOffsetMetadata.onOlderSegment(LogOffsetMetadata.scala:48) 
> at 
> kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:93) 
> at 
> kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:77) 
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) 
> at kafka.server.DelayedFetch.tryComplete(DelayedFetch.scala:77) 
> at kafka.server.DelayedOperation.safeTryComplete(DelayedOperation.scala:104)
> at 
> kafka.server.DelayedOperationPurgatory.tryCompleteElseWatch(DelayedOperation.scala:196)
>  
> at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:516) 
> at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) 
> at kafka.server.KafkaApis.handle(KafkaApis.scala:79) 
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) 
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5676) MockStreamsMetrics should be in o.a.k.test

2017-08-16 Thread Chanchal Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129758#comment-16129758
 ] 

Chanchal Singh commented on KAFKA-5676:
---

testMetrics() in all three classes, including the ProcessorNodeTest and 
NamedCacheTest classes, seems to depend on a real metrics implementation. I have 
created a pull request; please review.


> MockStreamsMetrics should be in o.a.k.test
> --
>
> Key: KAFKA-5676
> URL: https://issues.apache.org/jira/browse/KAFKA-5676
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Chanchal Singh
>  Labels: newbie
>  Time Spent: 96h
>  Remaining Estimate: 0h
>
> {{MockStreamsMetrics}}'s package should be `o.a.k.test` not 
> `o.a.k.streams.processor.internals`. 
> In addition, it should not require a {{Metrics}} parameter in its constructor 
> as it is only needed for its extended base class; the right way of mocking 
> should be implementing {{StreamsMetrics}} with mock behavior rather than extending a 
> real implementation of {{StreamsMetricsImpl}}.
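
A tiny illustration of the mocking approach being suggested, using a simplified stand-in interface rather than the real {{StreamsMetrics}} (all names below are made up for the example): implementing the interface keeps the mock free of any real {{Metrics}} dependency, whereas extending the real implementation drags that dependency in through the constructor.

{code:java}
// Simplified stand-in for the real StreamsMetrics interface (hypothetical).
interface SimpleMetrics {
    void recordLatency(String sensorName, long startNs, long endNs);
}

// Mock by implementing the interface: no constructor dependency on a real
// metrics registry, so it can live in a shared test package such as o.a.k.test.
class MockSimpleMetrics implements SimpleMetrics {
    int recordCalls = 0;

    @Override
    public void recordLatency(String sensorName, long startNs, long endNs) {
        recordCalls++; // just count invocations; no real metrics machinery needed
    }
}

public class MockingByInterfaceExample {
    public static void main(String[] args) {
        MockSimpleMetrics metrics = new MockSimpleMetrics();
        metrics.recordLatency("process-latency", 0L, 42L);
        System.out.println("recordLatency called " + metrics.recordCalls + " times");
    }
}
{code}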



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5745) Partition.makeLeader() should convert HW to OffsetMetadata before becoming the leader

2017-08-16 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129725#comment-16129725
 ] 

Jun Rao commented on KAFKA-5745:


The HW in the follower only includes the offset, but not the log segment and 
segment position. When a follower changes to a leader, in 
Partition.makeLeader(), we first mark the replica as the leader and then call 
leaderReplica.convertHWToLocalOffsetMetadata(), which adds the log segment info 
to the HW. So, if a consumer fetches data between these two steps,  
DelayedFetch.tryComplete() will hit the exception in the description since HW 
is missing log segment info.

We can fix this issue by first calling 
leaderReplica.convertHWToLocalOffsetMetadata() and then marking the replica as 
the leader.
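
The general shape of the race, sketched below with made-up Java names rather than the actual Scala code in Partition.makeLeader(): if the leader flag becomes visible before the high watermark carries segment metadata, a concurrent fetch can observe an offset-only HW; doing the conversion first closes that window.

{code:java}
// Hypothetical illustration of the ordering fix, not the real broker code.
class LeaderState {
    private volatile boolean isLeader = false;
    private volatile String hwMetadata = "offset-only"; // stands in for LogOffsetMetadata

    // Buggy order: the leader flag is published before the HW has segment info,
    // so a concurrent fetch between the two lines sees incomplete metadata.
    void makeLeaderBuggy() {
        isLeader = true;
        hwMetadata = "offset+segment";
    }

    // Fixed order: enrich the HW first, then publish the leader flag.
    void makeLeaderFixed() {
        hwMetadata = "offset+segment";
        isLeader = true;
    }

    boolean canServeDelayedFetch() {
        // A delayed fetch only completes correctly once the HW carries segment info.
        return isLeader && hwMetadata.equals("offset+segment");
    }
}
{code}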

> Partition.makeLeader() should convert HW to OffsetMetadata before becoming 
> the leader
> -
>
> Key: KAFKA-5745
> URL: https://issues.apache.org/jira/browse/KAFKA-5745
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.11.0.0
>Reporter: Jun Rao
>
> Saw the following uncaught exception in the log.
> ERROR [KafkaApi-411] Error when handling request Name: FetchRequest; Version: 
> 3; CorrelationId: 708572; ClientId: client-0; ReplicaId: -1; MaxWait: 500 ms; 
> MinBytes: 1 bytes; MaxBytes:52428800 bytes; RequestInfo: 
> ([topic1,3],PartitionFetchInfo(953794,1048576)) (kafka.server.KafkaApis) 
> org.apache.kafka.common.KafkaException: 953793 [-1 : -1] cannot compare its 
> segment info with 953794 [829749 : 564535961] since it only has message 
> offset info 
> at kafka.server.LogOffsetMetadata.onOlderSegment(LogOffsetMetadata.scala:48) 
> at 
> kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:93) 
> at 
> kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:77) 
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) 
> at kafka.server.DelayedFetch.tryComplete(DelayedFetch.scala:77) 
> at kafka.server.DelayedOperation.safeTryComplete(DelayedOperation.scala:104)
> at 
> kafka.server.DelayedOperationPurgatory.tryCompleteElseWatch(DelayedOperation.scala:196)
>  
> at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:516) 
> at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) 
> at kafka.server.KafkaApis.handle(KafkaApis.scala:79) 
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) 
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5745) Partition.makeLeader() should convert HW to OffsetMetadata before becoming the leader

2017-08-16 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-5745:
--

 Summary: Partition.makeLeader() should convert HW to 
OffsetMetadata before becoming the leader
 Key: KAFKA-5745
 URL: https://issues.apache.org/jira/browse/KAFKA-5745
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.11.0.0
Reporter: Jun Rao


Saw the following uncaught exception in the log.

ERROR [KafkaApi-411] Error when handling request Name: FetchRequest; Version: 
3; CorrelationId: 708572; ClientId: client-0; ReplicaId: -1; MaxWait: 500 ms; 
MinBytes: 1 bytes; MaxBytes:52428800 bytes; RequestInfo: 
([topic1,3],PartitionFetchInfo(953794,1048576)) (kafka.server.KafkaApis) 
org.apache.kafka.common.KafkaException: 953793 [-1 : -1] cannot compare its 
segment info with 953794 [829749 : 564535961] since it only has message offset 
info 
at kafka.server.LogOffsetMetadata.onOlderSegment(LogOffsetMetadata.scala:48) 
at 
kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:93) 
at 
kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:77) 
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) 
at kafka.server.DelayedFetch.tryComplete(DelayedFetch.scala:77) 
at kafka.server.DelayedOperation.safeTryComplete(DelayedOperation.scala:104)
at 
kafka.server.DelayedOperationPurgatory.tryCompleteElseWatch(DelayedOperation.scala:196)
 
at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:516) 
at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) 
at kafka.server.KafkaApis.handle(KafkaApis.scala:79) 
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) 
at java.lang.Thread.run(Thread.java:745)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5233) Changes to punctuate semantics (KIP-138)

2017-08-16 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129646#comment-16129646
 ] 

Guozhang Wang commented on KAFKA-5233:
--

Thanks [~mihbor]!

> Changes to punctuate semantics (KIP-138)
> 
>
> Key: KAFKA-5233
> URL: https://issues.apache.org/jira/browse/KAFKA-5233
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michal Borowiecki
>Assignee: Michal Borowiecki
>  Labels: kip
> Fix For: 1.0.0
>
>
> This ticket is to track implementation of 
> [KIP-138: Change punctuate 
> semantics|https://cwiki.apache.org/confluence/display/KAFKA/KIP-138%3A+Change+punctuate+semantics]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5733) System tests get exception RocksDBException: db has more levels than options.num_levels

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129642#comment-16129642
 ] 

ASF GitHub Bot commented on KAFKA-5733:
---

GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/3681

KAFKA-5733: RocksDB bulk load with lower number of levels.

This is to complete Bill's PR on KAFKA-5733, incorporating the suggestion 
in https://github.com/facebook/rocksdb/issues/2734.

Some minor changes: move `open = true` in `openDB`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka K5733-rocksdb-bulk-load

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3681.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3681


commit 2cfc54c8e77b840acc0d176cea8eabda4d1c4b03
Author: Bill Bejeck 
Date:   2017-08-14T14:46:08Z

KAFKA-5733: ensure clean RocksDB directory before setting 
prepareForBulkload settings

commit f468147455c9f8eaf640f2821b7849b7b370968d
Author: Bill Bejeck 
Date:   2017-08-14T16:57:29Z

KAFKA-5733: address comments

commit 149634b405cd5d73ec0e27c9a55b16f9971abf1f
Author: Bill Bejeck 
Date:   2017-08-14T18:20:05Z

KAFKA-5733: address additional comments, enforce not opening
and closing if not a clean RocksDB dir on restore

commit b4cd505b5c5cb9e57df4728414b5194a65d14b1c
Author: Guozhang Wang 
Date:   2017-08-16T22:17:17Z

Merge branch 'KAFKA-5733_rocks_db_throws_more_than_num_levels_on_restore' 
of https://github.com/bbejeck/kafka into K5733-rocksdb-bulk-load

commit d0ada11eb8e97fbd5efdfb4c66160a81970575e4
Author: Guozhang Wang 
Date:   2017-08-16T22:45:16Z

use compactRange to still enable bulk loading

commit 08da1f16c8aeb587dd8b1ac7b5192f1ab3395cd7
Author: Guozhang Wang 
Date:   2017-08-16T23:53:02Z

minor fixes




> System tests get exception RocksDBException: db has more levels than 
> options.num_levels
> ---
>
> Key: KAFKA-5733
> URL: https://issues.apache.org/jira/browse/KAFKA-5733
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.1
>Reporter: Eno Thereska
> Fix For: 0.11.0.1
>
>
> New system tests as part of KAFKA-5725 with PR 
> https://github.com/apache/kafka/pull/3656 consistently fail like this:
> [2017-08-14 10:37:57,216] ERROR User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group SmokeTest failed on partition assignment 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> org.apache.kafka.streams.errors.ProcessorStateException: Error opening store 
> cntStoreName at location /mnt/streams/SmokeTest/2_0/rocksdb/cntStoreName
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:206)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:176)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.toggleDbForBulkLoading(RocksDBStore.java:259)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore.access$200(RocksDBStore.java:74)
>   at 
> org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.onRestoreStart(RocksDBStore.java:551)
>   at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.onRestoreStart(CompositeRestoreListener.java:54)
>   at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restoreStarted(StateRestorer.java:61)
>   at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:126)
>   at 
> org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:94)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:177)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:368)
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:317)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>   at 
> 

[jira] [Commented] (KAFKA-5743) All ducktape services should store their files in subdirectories of /mnt

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129641#comment-16129641
 ] 

ASF GitHub Bot commented on KAFKA-5743:
---

GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/3680

KAFKA-5743. Ducktape services should use subdirs of /mnt



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-5743

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3680


commit 65d6737b654f49a32f5c92be9bb0c47a56425137
Author: Colin P. Mccabe 
Date:   2017-08-16T23:13:48Z

KAFKA-5743. Ducktape services should use subdirs of /mnt




> All ducktape services should store their files in subdirectories of /mnt
> 
>
> Key: KAFKA-5743
> URL: https://issues.apache.org/jira/browse/KAFKA-5743
> Project: Kafka
>  Issue Type: Improvement
>  Components: system tests
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Currently, some ducktape services like KafkaService store their files 
> directly in /mnt.  This means that cleanup involves running {{rm -rf 
> /mnt/*}}.  It would be better if services stored their files in 
> subdirectories of /mnt.  For example, KafkaService could store its files in 
> /mnt/kafka.  This would make cleanup simpler and avoid the need to remove all 
> of /mnt.  It would also make running multiple services on the same node 
> possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5743) All ducktape services should store their files in subdirectories of /mnt

2017-08-16 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-5743:
---
Description: Currently, some ducktape services like KafkaService store 
their files directly in /mnt.  This means that cleanup involves running {{rm 
-rf /mnt/*}}.  It would be better if services stored their files in 
subdirectories of /mnt.  For example, KafkaService could store its files in 
/mnt/kafka.  This would make cleanup simpler and avoid the need to remove all 
of /mnt.  It would also make running multiple services on the same node 
possible.  (was: Currently, ducktape services like KafkaService store their 
files directly in /mnt.  This means that cleanup involves running {{rm -rf 
/mnt/*}}.  It would be better if services stored their files in subdirectories 
of /mnt.  For example, KafkaService could store its files in /mnt/kafka.  This 
would make cleanup simpler and avoid the need to remove all of /mnt.  It would 
also make running multiple services on the same node possible.)

> All ducktape services should store their files in subdirectories of /mnt
> 
>
> Key: KAFKA-5743
> URL: https://issues.apache.org/jira/browse/KAFKA-5743
> Project: Kafka
>  Issue Type: Improvement
>  Components: system tests
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Currently, some ducktape services like KafkaService store their files 
> directly in /mnt.  This means that cleanup involves running {{rm -rf 
> /mnt/*}}.  It would be better if services stored their files in 
> subdirectories of mount.  For example, KafkaService could store its files in 
> /mnt/kafka.  This would make cleanup simpler and avoid the need to remove all 
> of /mnt.  It would also make running multiple services on the same node 
> possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5743) All ducktape services should store their files in subdirectories of /mnt

2017-08-16 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-5743:
---
Summary: All ducktape services should store their files in subdirectories 
of /mnt  (was: Ducktape services should store their files in subdirectories of 
/mnt)

> All ducktape services should store their files in subdirectories of /mnt
> 
>
> Key: KAFKA-5743
> URL: https://issues.apache.org/jira/browse/KAFKA-5743
> Project: Kafka
>  Issue Type: Improvement
>  Components: system tests
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Currently, ducktape services like KafkaService store their files directly in 
> /mnt.  This means that cleanup involves running {{rm -rf /mnt/*}}.  It would 
> be better if services stored their files in subdirectories of /mnt.  For 
> example, KafkaService could store its files in /mnt/kafka.  This would make 
> cleanup simpler and avoid the need to remove all of /mnt.  It would also make 
> running multiple services on the same node possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5744) ShellTest: add tests for attempting to run nonexistent program, error return

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129560#comment-16129560
 ] 

ASF GitHub Bot commented on KAFKA-5744:
---

GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/3679

KAFKA-5744: ShellTest: add tests for attempting to run nonexistent pr…

…ogram, error return

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-5744

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3679


commit dea4535693dd4980971c4b615c4cf035de65b082
Author: Colin P. Mccabe 
Date:   2017-08-16T22:46:53Z

KAFKA-5744: ShellTest: add tests for attempting to run nonexistent program, 
error return




> ShellTest: add tests for attempting to run nonexistent program, error return
> 
>
> Key: KAFKA-5744
> URL: https://issues.apache.org/jira/browse/KAFKA-5744
> Project: Kafka
>  Issue Type: Improvement
>  Components: unit tests
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> ShellTest should have tests for attempting to run nonexistent program, and 
> running a program which returns an error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5744) ShellTest: add tests for attempting to run nonexistent program, error return

2017-08-16 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-5744:
--

 Summary: ShellTest: add tests for attempting to run nonexistent 
program, error return
 Key: KAFKA-5744
 URL: https://issues.apache.org/jira/browse/KAFKA-5744
 Project: Kafka
  Issue Type: Improvement
  Components: unit tests
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


ShellTest should have tests for attempting to run nonexistent program, and 
running a program which returns an error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5076) Remove usage of java.xml.bind.* classes hidden by default in JDK9

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129557#comment-16129557
 ] 

ASF GitHub Bot commented on KAFKA-5076:
---

Github user xvrl closed the pull request at:

https://github.com/apache/kafka/pull/2862


> Remove usage of java.xml.bind.* classes hidden by default in JDK9
> -
>
> Key: KAFKA-5076
> URL: https://issues.apache.org/jira/browse/KAFKA-5076
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Xavier Léauté
>Assignee: Xavier Léauté
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5233) Changes to punctuate semantics (KIP-138)

2017-08-16 Thread Michal Borowiecki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129556#comment-16129556
 ] 

Michal Borowiecki commented on KAFKA-5233:
--

Hi [~guozhang], 
I had a look at KStreamTestDriver and the punctuate method defined there is 
only ever called from the KStreamTransformTest to trigger the deprecated 
punctuate method on the Transformer interface. The test validates that records 
returned from Transformer.punctuate are correctly forwarded and processed by 
the downstream Processor.
I think there is value in preserving this test until the deprecated method is 
removed. However, I've added a new equivalent test to check that calling 
context.forward from within the Punctuator callback achieves the same result. I 
renamed KStreamTestDriver's punctuate to punctuateDeprecated and annotated it 
as such to make it clearer it's only left to test deprecated functionality so 
that it's not accidentally used for new developments.
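
For reference, the new-style equivalent that the added test exercises looks roughly like this (a sketch against the KIP-138-era API; the processor, keys and values below are made up): the Punctuator registered via {{context.schedule()}} calls {{context.forward()}} itself, instead of returning records from the deprecated {{punctuate()}} method.

{code:java}
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;

// Sketch of a processor that forwards from a Punctuator callback rather than
// relying on the deprecated Processor#punctuate(long) return-value style.
public class ForwardingPunctuatorProcessor extends AbstractProcessor<String, String> {

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        // Every 60s of stream time, emit a record downstream from the callback.
        context.schedule(60_000L, PunctuationType.STREAM_TIME,
                timestamp -> context().forward("tick", "punctuated-at-" + timestamp));
    }

    @Override
    public void process(String key, String value) {
        // Regular records pass straight through in this sketch.
        context().forward(key, value);
    }
}
{code}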

> Changes to punctuate semantics (KIP-138)
> 
>
> Key: KAFKA-5233
> URL: https://issues.apache.org/jira/browse/KAFKA-5233
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michal Borowiecki
>Assignee: Michal Borowiecki
>  Labels: kip
> Fix For: 1.0.0
>
>
> This ticket is to track implementation of 
> [KIP-138: Change punctuate 
> semantics|https://cwiki.apache.org/confluence/display/KAFKA/KIP-138%3A+Change+punctuate+semantics]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5743) Ducktape services should store their files in subdirectories of /mnt

2017-08-16 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-5743:
--

 Summary: Ducktape services should store their files in 
subdirectories of /mnt
 Key: KAFKA-5743
 URL: https://issues.apache.org/jira/browse/KAFKA-5743
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


Currently, ducktape services like KafkaService store their files directly in 
/mnt.  This means that cleanup involves running {{rm -rf /mnt/*}}.  It would be 
better if services stored their files in subdirectories of /mnt.  For example, 
KafkaService could store its files in /mnt/kafka.  This would make cleanup 
simpler and avoid the need to remove all of /mnt.  It would also make running 
multiple services on the same node possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5233) Changes to punctuate semantics (KIP-138)

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129533#comment-16129533
 ] 

ASF GitHub Bot commented on KAFKA-5233:
---

GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/3678

KAFKA-5233 follow up



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3678


commit f6da0be924c6fb657d532ea4cf2b1877354e00d1
Author: Michal Borowiecki 
Date:   2017-08-16T22:21:33Z

KAFKA-5233 follow up




> Changes to punctuate semantics (KIP-138)
> 
>
> Key: KAFKA-5233
> URL: https://issues.apache.org/jira/browse/KAFKA-5233
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Michal Borowiecki
>Assignee: Michal Borowiecki
>  Labels: kip
> Fix For: 1.0.0
>
>
> This ticket is to track implementation of 
> [KIP-138: Change punctuate 
> semantics|https://cwiki.apache.org/confluence/display/KAFKA/KIP-138%3A+Change+punctuate+semantics]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5567) With transformations that mutate the topic-partition, committing offsets should refer to the original topic-partition

2017-08-16 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-5567.
--
Resolution: Fixed

> With transformations that mutate the topic-partition, committing offsets 
> should refer to the original topic-partition
> ---
>
> Key: KAFKA-5567
> URL: https://issues.apache.org/jira/browse/KAFKA-5567
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0, 0.10.2.1, 0.11.0.0
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Blocker
> Fix For: 0.11.0.1, 1.0.0
>
>
>   When a chain of transformations (SMTs) that mutate a record's 
> topic-partition is applied, then Connect is unable to map the transformed 
> record to its original topic-partition. This affects committing offsets. 
>  Currently, in order to reproduce the issue one could use the 
> {{TimestampRouter}} transformation with a sink connector such as the 
> {{FileStreamSinkConnector}}.
>   In this ticket we'll address the issue for connectors that don't 
> manage/commit their offsets themselves. For the connectors that do such 
> management, broader API changes are required to supply the connectors with 
> the necessary information that will allow them to map a transformed record to 
> the original. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5742) Support passing ZK chroot in system tests

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129436#comment-16129436
 ] 

ASF GitHub Bot commented on KAFKA-5742:
---

GitHub user xvrl opened a pull request:

https://github.com/apache/kafka/pull/3677

KAFKA-5742 support ZK chroot in system tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xvrl/kafka support-zk-chroot-in-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3677


commit e2c6ed282c1d6d0a373def99fa0243e654cce37d
Author: Xavier Léauté 
Date:   2017-08-16T20:58:58Z

KAFKA-5742 support ZK chroot in system tests




> Support passing ZK chroot in system tests
> -
>
> Key: KAFKA-5742
> URL: https://issues.apache.org/jira/browse/KAFKA-5742
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Xavier Léauté
>Assignee: Xavier Léauté
>
> Currently spinning up multiple Kafka clusters in system tests requires at 
> least one ZK node per Kafka cluster, which wastes a lot of resources. We 
> currently also don't test anything outside of the ZK root path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-5742) Support passing ZK chroot in system tests

2017-08-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xavier Léauté reassigned KAFKA-5742:


Assignee: Xavier Léauté

> Support passing ZK chroot in system tests
> -
>
> Key: KAFKA-5742
> URL: https://issues.apache.org/jira/browse/KAFKA-5742
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Xavier Léauté
>Assignee: Xavier Léauté
>
> Currently spinning up multiple Kafka clusters in system tests requires at 
> least one ZK node per Kafka cluster, which wastes a lot of resources. We 
> currently also don't test anything outside of the ZK root path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5742) Support passing ZK chroot in system tests

2017-08-16 Thread JIRA
Xavier Léauté created KAFKA-5742:


 Summary: Support passing ZK chroot in system tests
 Key: KAFKA-5742
 URL: https://issues.apache.org/jira/browse/KAFKA-5742
 Project: Kafka
  Issue Type: Test
  Components: system tests
Reporter: Xavier Léauté


Currently spinning up multiple Kafka clusters in system tests requires at 
least one ZK node per Kafka cluster, which wastes a lot of resources. We 
currently also don't test anything outside of the ZK root path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5740) Use separate file for HTTP logs

2017-08-16 Thread CJ Woolard (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129389#comment-16129389
 ] 

CJ Woolard commented on KAFKA-5740:
---

For deployments in Mesos, where you typically have a scheduled health check 
assigned to your Kafka Connect distributed worker instances, you end up with 
quite a few HTTP log lines in the Kafka Connect logs. This can sometimes make 
it difficult to inspect the output of the Kafka Connect tasks themselves 
(without additional tooling). As this ticket describes it might be helpful to 
separate out the HTTP logs from the 'normal' Kafka Connect logs in order to 
ease debugging scenarios. 

> Use separate file for HTTP logs
> ---
>
> Key: KAFKA-5740
> URL: https://issues.apache.org/jira/browse/KAFKA-5740
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Priority: Critical
>  Labels: monitoring, usability
>
> Currently the HTTP logs are interspersed in the normal output. However, 
> client usage/requests should be logged to a separate file.
> Question: should the HTTP logs be *removed* from the normal Connect logs?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5441) Fix transaction marker grouping by producerId in TC

2017-08-16 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-5441:
-
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2
   1.0.0

> Fix transaction marker grouping by producerId in TC
> ---
>
> Key: KAFKA-5441
> URL: https://issues.apache.org/jira/browse/KAFKA-5441
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
>Assignee: Guozhang Wang
>  Labels: exactly-once
> Fix For: 1.0.0, 0.11.0.2
>
>
> It seems in some cases the WriteTxnMarker request can be sent with multiple 
> entries for the same ProducerId. This is the cause of KAFKA-5438. This is not 
> necessarily a correctness problem, but it seems unintentional and should 
> probably be fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5441) Fix transaction marker grouping by producerId in TC

2017-08-16 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129239#comment-16129239
 ] 

Guozhang Wang commented on KAFKA-5441:
--

I'll not do it for 0.11.0.1, moving to 0.11.0.2 / 1.0.0 for now.

> Fix transaction marker grouping by producerId in TC
> ---
>
> Key: KAFKA-5441
> URL: https://issues.apache.org/jira/browse/KAFKA-5441
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
>Assignee: Guozhang Wang
>  Labels: exactly-once
> Fix For: 1.0.0, 0.11.0.2
>
>
> It seems in some cases the WriteTxnMarker request can be sent with multiple 
> entries for the same ProducerId. This is the cause of KAFKA-5438. This is not 
> necessarily a correctness problem, but it seems unintentional and should 
> probably be fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5545) Kafka Stream not able to successfully restart over new broker ip

2017-08-16 Thread Yogesh BG (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129229#comment-16129229
 ] 

Yogesh BG commented on KAFKA-5545:
--

thank you




> Kafka Stream not able to successfully restart over new broker ip
> 
>
> Key: KAFKA-5545
> URL: https://issues.apache.org/jira/browse/KAFKA-5545
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Yogesh BG
>Priority: Critical
> Fix For: 0.11.0.1, 1.0.0
>
> Attachments: kafkastreams.log
>
>
> Hi,
> I have one Kafka broker and one Kafka Streams application.
> Initially the Streams application connects and starts processing data. Then I restart 
> the broker. When the broker restarts, a new IP is assigned.
> In the Streams application I have a thread that runs every 5 minutes and checks whether the broker IP 
> has changed; if it has, we clean up the stream, rebuild the topology (also tried 
> reusing the topology) and start the stream again. I end up with the following 
> exceptions.
> 11:04:08.032 [StreamThread-38] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-38] Creating active task 0_5 with assigned 
> partitions [PR-5]
> 11:04:08.033 [StreamThread-41] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-41] Creating active task 0_1 with assigned 
> partitions [PR-1]
> 11:04:08.036 [StreamThread-34] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-34] Creating active task 0_7 with assigned 
> partitions [PR-7]
> 11:04:08.036 [StreamThread-37] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-37] Creating active task 0_3 with assigned 
> partitions [PR-3]
> 11:04:08.036 [StreamThread-45] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-45] Creating active task 0_0 with assigned 
> partitions [PR-0]
> 11:04:08.037 [StreamThread-36] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-36] Creating active task 0_4 with assigned 
> partitions [PR-4]
> 11:04:08.037 [StreamThread-43] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-43] Creating active task 0_6 with assigned 
> partitions [PR-6]
> 11:04:08.038 [StreamThread-48] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-48] Creating active task 0_2 with assigned 
> partitions [PR-2]
> 11:04:09.034 [StreamThread-38] WARN  o.a.k.s.p.internals.StreamThread - Could 
> not create task 0_5. Will retry.
> org.apache.kafka.streams.errors.LockException: task [0_5] Failed to lock the 
> state directory for task 0_5
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:100)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> 

[jira] [Updated] (KAFKA-5545) Kafka Stream not able to successfully restart over new broker ip

2017-08-16 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-5545:
-
Fix Version/s: 1.0.0
   0.11.0.1

> Kafka Stream not able to successfully restart over new broker ip
> 
>
> Key: KAFKA-5545
> URL: https://issues.apache.org/jira/browse/KAFKA-5545
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Yogesh BG
>Priority: Critical
> Fix For: 0.11.0.1, 1.0.0
>
> Attachments: kafkastreams.log
>
>
> Hi,
> I have one Kafka broker and one Kafka Streams application.
> Initially the Streams application connects and starts processing data. Then I restart 
> the broker. When the broker restarts, a new IP is assigned.
> In the Streams application I have a thread that runs every 5 minutes and checks whether the broker IP 
> has changed; if it has, we clean up the stream, rebuild the topology (also tried 
> reusing the topology) and start the stream again. I end up with the following 
> exceptions.
> 11:04:08.032 [StreamThread-38] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-38] Creating active task 0_5 with assigned 
> partitions [PR-5]
> 11:04:08.033 [StreamThread-41] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-41] Creating active task 0_1 with assigned 
> partitions [PR-1]
> 11:04:08.036 [StreamThread-34] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-34] Creating active task 0_7 with assigned 
> partitions [PR-7]
> 11:04:08.036 [StreamThread-37] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-37] Creating active task 0_3 with assigned 
> partitions [PR-3]
> 11:04:08.036 [StreamThread-45] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-45] Creating active task 0_0 with assigned 
> partitions [PR-0]
> 11:04:08.037 [StreamThread-36] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-36] Creating active task 0_4 with assigned 
> partitions [PR-4]
> 11:04:08.037 [StreamThread-43] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-43] Creating active task 0_6 with assigned 
> partitions [PR-6]
> 11:04:08.038 [StreamThread-48] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-48] Creating active task 0_2 with assigned 
> partitions [PR-2]
> 11:04:09.034 [StreamThread-38] WARN  o.a.k.s.p.internals.StreamThread - Could 
> not create task 0_5. Will retry.
> org.apache.kafka.streams.errors.LockException: task [0_5] Failed to lock the 
> state directory for task 0_5
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:100)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> 

[jira] [Commented] (KAFKA-5545) Kafka Stream not able to successfully restart over new broker ip

2017-08-16 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129204#comment-16129204
 ] 

Guozhang Wang commented on KAFKA-5545:
--

[~yogeshbelur] We have KAFKA-5152 which summarizes the root cause of this 
issue, and we already have a fix in the 0.11 branch. It will be included in the 
0.11.0.1 bug fix release which is expected to be out soon.

Note that upgrading the Streams client from 0.10.2.x to 0.11.0.x should be smooth, 
as there are no backward-incompatible changes.

> Kafka Stream not able to successfully restart over new broker ip
> 
>
> Key: KAFKA-5545
> URL: https://issues.apache.org/jira/browse/KAFKA-5545
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Yogesh BG
>Priority: Critical
> Attachments: kafkastreams.log
>
>
> Hi,
> I have one Kafka broker and one Kafka Streams application.
> Initially the Kafka Streams application connects and starts processing data. Then I 
> restart the broker. When the broker restarts, a new IP is assigned.
> In the Kafka Streams application I have a 5-minute interval thread which checks 
> whether the broker IP has changed and, if so, we clean up the stream, rebuild the 
> topology (also tried reusing the topology) and start the stream again. I end up 
> with the following exceptions.
> 11:04:08.032 [StreamThread-38] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-38] Creating active task 0_5 with assigned 
> partitions [PR-5]
> 11:04:08.033 [StreamThread-41] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-41] Creating active task 0_1 with assigned 
> partitions [PR-1]
> 11:04:08.036 [StreamThread-34] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-34] Creating active task 0_7 with assigned 
> partitions [PR-7]
> 11:04:08.036 [StreamThread-37] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-37] Creating active task 0_3 with assigned 
> partitions [PR-3]
> 11:04:08.036 [StreamThread-45] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-45] Creating active task 0_0 with assigned 
> partitions [PR-0]
> 11:04:08.037 [StreamThread-36] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-36] Creating active task 0_4 with assigned 
> partitions [PR-4]
> 11:04:08.037 [StreamThread-43] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-43] Creating active task 0_6 with assigned 
> partitions [PR-6]
> 11:04:08.038 [StreamThread-48] INFO  o.a.k.s.p.internals.StreamThread - 
> stream-thread [StreamThread-48] Creating active task 0_2 with assigned 
> partitions [PR-2]
> 11:04:09.034 [StreamThread-38] WARN  o.a.k.s.p.internals.StreamThread - Could 
> not create task 0_5. Will retry.
> org.apache.kafka.streams.errors.LockException: task [0_5] Failed to lock the 
> state directory for task 0_5
> at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:100)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:73)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:108)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:864)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:1237)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
>  ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:967)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$600(StreamThread.java:69)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:234)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:259)
>  [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:352)
>  

[jira] [Resolved] (KAFKA-3796) SslTransportLayerTest.testInvalidEndpointIdentification fails on trunk

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-3796.
--
Resolution: Cannot Reproduce

 Please reopen if you think the issue still exists.

> SslTransportLayerTest.testInvalidEndpointIdentification fails on trunk
> --
>
> Key: KAFKA-3796
> URL: https://issues.apache.org/jira/browse/KAFKA-3796
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, security
>Affects Versions: 0.10.0.0
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
>Priority: Trivial
>
> org.apache.kafka.common.network.SslTransportLayerTest > 
> testEndpointIdentificationDisabled FAILED
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
> at 
> org.apache.kafka.common.network.NioEchoServer.(NioEchoServer.java:48)
> at 
> org.apache.kafka.common.network.SslTransportLayerTest.testEndpointIdentificationDisabled(SslTransportLayerTest.java:120)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5696) SourceConnector does not commit offset on rebalance

2017-08-16 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-5696:
-
Priority: Critical  (was: Major)

> SourceConnector does not commit offset on rebalance
> ---
>
> Key: KAFKA-5696
> URL: https://issues.apache.org/jira/browse/KAFKA-5696
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Oleg Kuznetsov
>Priority: Critical
>  Labels: newbie
>
> I'm running a SourceConnector that reads files from storage and puts data into 
> Kafka. I want offsets to be flushed in case of reconfiguration. 
> Say a file is completely processed, but the source records are not yet committed, 
> so in case of reconfiguration their offsets might be missing from the store.
> Is it possible to force committing offsets on reconfiguration?
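
For context on how source offsets end up in the offset store at all: the framework 
can only flush whatever sourcePartition/sourceOffset maps the task attached to its 
records. Below is a minimal, hypothetical sketch (the class name, the readNextLine() 
helper, the topic, and the "filename"/"position" keys are all made up for 
illustration, not taken from the reporter's connector) of a SourceTask attaching 
per-file offsets:

{code}
// Hypothetical sketch only: shows where a SourceTask attaches the offsets that
// Kafka Connect later commits. Names ("filename", "position", readNextLine) are
// illustrative, not from the reporter's connector.
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public abstract class FileSourceTaskSketch extends SourceTask {
    private String currentFile;   // file currently being read
    private long position;        // how far into the file we have read

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        String line = readNextLine();   // storage-specific helper, omitted here
        if (line == null) {
            return null;                // nothing new; the framework polls again later
        }
        Map<String, ?> sourcePartition = Collections.singletonMap("filename", currentFile);
        Map<String, ?> sourceOffset = Collections.singletonMap("position", ++position);
        // The framework persists sourceOffset when it commits offsets; on
        // reconfiguration, anything not yet committed can be re-read.
        return Collections.singletonList(new SourceRecord(
                sourcePartition, sourceOffset, "files-topic", Schema.STRING_SCHEMA, line));
    }

    protected abstract String readNextLine();
}
{code}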



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5741) Prioritize threads in Connect distributed worker process

2017-08-16 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-5741:
-
Priority: Critical  (was: Blocker)

> Prioritize threads in Connect distributed worker process
> 
>
> Key: KAFKA-5741
> URL: https://issues.apache.org/jira/browse/KAFKA-5741
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Priority: Critical
>
> Connect's distributed worker process uses the {{DistributedHerder}} to 
> perform all administrative operations, including: starting, stopping, 
> pausing, resuming, reconfiguring connectors; rebalancing; etc. The 
> {{DistributedHerder}} uses a single threaded executor service to do all this 
> work and to do it sequentially. If this thread gets preempted for any reason 
> (e.g., connector tasks are bogging down the process, DoS, etc.), then the 
> herder's membership in the group may be dropped, causing a rebalance.
> This herder thread should be run at a much higher priority than all of the 
> other threads in the system.
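
As a point of reference only, a single-threaded executor with an elevated thread 
priority can be built with plain JDK facilities; the sketch below is illustrative 
(names made up) and is not the DistributedHerder code. Note that JVM thread 
priorities are only hints to the OS scheduler, so this alone may not be enough.

{code}
// Illustrative sketch: a single-threaded executor whose worker thread runs at
// elevated priority, in the spirit of what this ticket proposes for the herder.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public final class HighPriorityExecutorSketch {
    public static ExecutorService newHerderExecutor(final String threadName) {
        ThreadFactory factory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, threadName);
                t.setDaemon(true);
                // Run above NORM_PRIORITY so connector tasks are less likely to starve it.
                t.setPriority(Thread.MAX_PRIORITY);
                return t;
            }
        };
        return Executors.newSingleThreadExecutor(factory);
    }
}
{code}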



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5696) SourceConnector does not commit offset on rebalance

2017-08-16 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-5696:
-
Fix Version/s: (was: 0.10.0.2)

> SourceConnector does not commit offset on rebalance
> ---
>
> Key: KAFKA-5696
> URL: https://issues.apache.org/jira/browse/KAFKA-5696
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Oleg Kuznetsov
>  Labels: newbie
>
> I'm running a SourceConnector that reads files from storage and puts data into 
> Kafka. I want offsets to be flushed in case of reconfiguration. 
> Say a file is completely processed, but the source records are not yet committed, 
> so in case of reconfiguration their offsets might be missing from the store.
> Is it possible to force committing offsets on reconfiguration?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4813) 2h6R1

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4813.
--
Resolution: Invalid

> 2h6R1
> -
>
> Key: KAFKA-4813
> URL: https://issues.apache.org/jira/browse/KAFKA-4813
> Project: Kafka
>  Issue Type: Bug
>Reporter: Vamsi Jakkula
>
> Creating of an issue using project keys and issue type names using the REST 
> API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4803) OT6Y1

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4803.
--
Resolution: Invalid

> OT6Y1
> -
>
> Key: KAFKA-4803
> URL: https://issues.apache.org/jira/browse/KAFKA-4803
> Project: Kafka
>  Issue Type: Bug
>Reporter: Vamsi Jakkula
>
> Creating of an issue using project keys and issue type names using the REST 
> API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4821) 9244L

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4821.
--
Resolution: Invalid

> 9244L
> -
>
> Key: KAFKA-4821
> URL: https://issues.apache.org/jira/browse/KAFKA-4821
> Project: Kafka
>  Issue Type: Task
>Reporter: Vamsi Jakkula
>
> Creating of an issue using project keys and issue type names using the REST 
> API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4804) TdOZY

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4804.
--
Resolution: Invalid

> TdOZY
> -
>
> Key: KAFKA-4804
> URL: https://issues.apache.org/jira/browse/KAFKA-4804
> Project: Kafka
>  Issue Type: Bug
>Reporter: Vamsi Jakkula
>
> Creating of an issue using project keys and issue type names using the REST 
> API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4847) 1Y30J

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4847.
--
Resolution: Invalid

> 1Y30J
> -
>
> Key: KAFKA-4847
> URL: https://issues.apache.org/jira/browse/KAFKA-4847
> Project: Kafka
>  Issue Type: Bug
>Reporter: Vamsi Jakkula
>
> Creating of an issue using project keys and issue type names using the REST 
> API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4865) 2X8BF

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4865.
--
Resolution: Invalid

> 2X8BF
> -
>
> Key: KAFKA-4865
> URL: https://issues.apache.org/jira/browse/KAFKA-4865
> Project: Kafka
>  Issue Type: Bug
>Reporter: Vamsi Jakkula
>
> Creating of an issue using project keys and issue type names using the REST 
> API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4889) 2G8lc

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4889.
--
Resolution: Invalid

> 2G8lc
> -
>
> Key: KAFKA-4889
> URL: https://issues.apache.org/jira/browse/KAFKA-4889
> Project: Kafka
>  Issue Type: Task
>Reporter: Vamsi Jakkula
>
> Creating of an issue using project keys and issue type names using the REST 
> API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4951) KafkaProducer may send duplicated message sometimes

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-4951.
--
Resolution: Fixed

This scenario is handled by the idempotent producer (KIP-98) released in Kafka 
0.11.0.0. Please reopen if you think the issue still exists.
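
For readers hitting this on 0.11.0.0+ clients and brokers, a minimal sketch of 
enabling the idempotent producer (broker address, topic and serializers are 
illustrative):

{code}
// Sketch: with enable.idempotence=true the broker de-duplicates internal retries,
// so the timeout/re-enqueue path described in this issue no longer duplicates
// messages in the log. Requires 0.11.0.0+ clients and brokers.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // illustrative
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");   // implies acks=all and safe retries
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "abc"));
        }
    }
}
{code}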

> KafkaProducer may send duplicated message sometimes
> ---
>
> Key: KAFKA-4951
> URL: https://issues.apache.org/jira/browse/KAFKA-4951
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: cuiyang
>
> I found that KafkaProducer may send duplicated messages sometimes, which 
> happens when:
>  In the Sender thread:
>  NetworkClient::poll()
>  -> this.selector.poll()   // sends the message, e.g. "abc", and 
> delivers it to the broker successfully
>  -> handleTimedOutRequests(responses,updatedNow);   // judges whether 
> the message "abc" sent above has expired or timed out, based on the 
> parameter this.requestTimeoutMs and updatedNow;
>  -> response.request().callback().onComplete()
>  -> 
> completeBatch(batch,Errors.NETWORK_EXCEPTION,-1L,correlationId,now);   // if 
> the message was judged expired, it is re-enqueued and sent again on the 
> next loop;
>  -> this.accumulator.reenqueue(batch,now);
> The problem: if the message "abc" is sent successfully to the broker but is 
> judged expired, it will be sent again on the next loop, which duplicates the 
> message.
> I can reproduce this scenario regularly.
> In my opinion, Send::handleTimedOutRequests() is not very useful here, because 
> the response to the send request from the broker is successful and has no 
> error, which means the broker persisted it successfully. This function can 
> therefore lead to duplicated messages.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5740) Use separate file for HTTP logs

2017-08-16 Thread Randall Hauch (JIRA)
Randall Hauch created KAFKA-5740:


 Summary: Use separate file for HTTP logs
 Key: KAFKA-5740
 URL: https://issues.apache.org/jira/browse/KAFKA-5740
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 0.11.0.0
Reporter: Randall Hauch
Priority: Critical


Currently the HTTP logs are interspersed in the normal output. However, client 
usage/requests should be logged to a separate file.

Question: should the HTTP logs be *removed* from the normal Connect logs?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5723) Refactor BrokerApiVersionsCommand to use the new AdminClient

2017-08-16 Thread Andrey Dyachkov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128985#comment-16128985
 ] 

Andrey Dyachkov commented on KAFKA-5723:


[~ppatierno] thank you for the clarification. As you can see (the PR is there), I 
already took some of your classes and changed them a little bit (sorry about that). 
I would really like to proceed with this work; it would be very nice if one of you 
could take a look at the PR [~ppatierno] [~ijuma]

> Refactor BrokerApiVersionsCommand to use the new AdminClient
> 
>
> Key: KAFKA-5723
> URL: https://issues.apache.org/jira/browse/KAFKA-5723
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Viktor Somogyi
>Assignee: Viktor Somogyi
>
> Currently it uses the deprecated AdminClient and in order to remove usages of 
> that client, this class needs to be refactored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5723) Refactor BrokerApiVersionsCommand to use the new AdminClient

2017-08-16 Thread Paolo Patierno (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128973#comment-16128973
 ] 

Paolo Patierno commented on KAFKA-5723:
---

[~adyachkov] sure you can. To have more people working on different tools in a 
common way (i.e. with some shared classes for options and so on), we should commit 
these common classes first so that people can use them. For now it's all on my fork. 
Having these common classes committed to the Kafka project could help development, 
at least in a branch if not on trunk. I don't know whether something like this has 
happened in the past and what the approach was. Maybe [~ijuma] can give us some 
advice on that?

> Refactor BrokerApiVersionsCommand to use the new AdminClient
> 
>
> Key: KAFKA-5723
> URL: https://issues.apache.org/jira/browse/KAFKA-5723
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Viktor Somogyi
>Assignee: Viktor Somogyi
>
> Currently it uses the deprecated AdminClient and in order to remove usages of 
> that client, this class needs to be refactored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong

2017-08-16 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128899#comment-16128899
 ] 

Ismael Juma commented on KAFKA-4834:


cc [~junrao] [~onurkaraman]

> Kafka cannot delete topic with ReplicaStateMachine went wrong
> -
>
> Key: KAFKA-4834
> URL: https://issues.apache.org/jira/browse/KAFKA-4834
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.10.1.1
>Reporter: Dan
>  Labels: reliability
>
> It happened several times that some topics can not be deleted in our 
> production environment. By analyzing the log, we found ReplicaStateMachine 
> went wrong. Here are the error messages:
> In state-change.log:
> ERROR Controller 2 epoch 201 initiated state change of replica 1 for 
> partition [test_create_topic1,1] from OnlineReplica to ReplicaDeletionStarted 
> failed (state.change.logger)
> java.lang.AssertionError: assertion failed: Replica 
> [Topic=test_create_topic1,Partition=1,Replica=1] should be in the 
> OfflineReplica states before moving to ReplicaDeletionStarted state. Instead 
> it is in OnlineReplica state
> at scala.Predef$.assert(Predef.scala:179)
> at 
> kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:309)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:190)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:344)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:334)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:334)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:367)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:313)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:312)
> at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:312)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:431)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:403)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:403)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:397)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> In controller.log:
> INFO Leader not yet assigned for partition [test_create_topic1,1]. Skip 
> sending UpdateMetadataRequest. (kafka.controller.ControllerBrokerRequestBatch)
> There may exist two controllers in the cluster because creating a new topic 
> may trigger two machines to change the state of same partition, eg. 
> NonExistentPartition -> NewPartition.
> On the other controller, we found following messages in controller.log of 
> several days earlier:
> [2017-02-25 16:51:22,353] INFO [Topic Deletion Manager 0], Topic deletion 
> callback for 

[jira] [Updated] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4834:
---
Component/s: controller

> Kafka cannot delete topic with ReplicaStateMachine went wrong
> -
>
> Key: KAFKA-4834
> URL: https://issues.apache.org/jira/browse/KAFKA-4834
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.10.1.1
>Reporter: Dan
>  Labels: reliability
>
> It happened several times that some topics can not be deleted in our 
> production environment. By analyzing the log, we found ReplicaStateMachine 
> went wrong. Here are the error messages:
> In state-change.log:
> ERROR Controller 2 epoch 201 initiated state change of replica 1 for 
> partition [test_create_topic1,1] from OnlineReplica to ReplicaDeletionStarted 
> failed (state.change.logger)
> java.lang.AssertionError: assertion failed: Replica 
> [Topic=test_create_topic1,Partition=1,Replica=1] should be in the 
> OfflineReplica states before moving to ReplicaDeletionStarted state. Instead 
> it is in OnlineReplica state
> at scala.Predef$.assert(Predef.scala:179)
> at 
> kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:309)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:190)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:344)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:334)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:334)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:367)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:313)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:312)
> at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:312)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:431)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:403)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:403)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:397)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:397)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> In controller.log:
> INFO Leader not yet assigned for partition [test_create_topic1,1]. Skip 
> sending UpdateMetadataRequest. (kafka.controller.ControllerBrokerRequestBatch)
> There may exist two controllers in the cluster because creating a new topic 
> may trigger two machines to change the state of same partition, eg. 
> NonExistentPartition -> NewPartition.
> On the other controller, we found following messages in controller.log of 
> several days earlier:
> [2017-02-25 16:51:22,353] INFO [Topic Deletion Manager 0], Topic deletion 
> callback for simple_topic_090 (kafka.controller.TopicDeletionManager)

[jira] [Commented] (KAFKA-4834) Kafka cannot delete topic with ReplicaStateMachine went wrong

2017-08-16 Thread Joao Santos (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128881#comment-16128881
 ] 

Joao Santos commented on KAFKA-4834:


We are experiencing exactly the same issue on 0.11.0.0.

Here is what we do to fix it:
 - Force controller re-election

 - If the topic is gone, cleanup is done. Otherwise, continue to next step.

 - Check that the data directories for the problematic topic do not exist on 
any of the Kafka nodes. For example, for topic "testTopic" and a Kafka base data 
directory of "/opt/kafka/logs", check there are no directories named 
"/opt/kafka/logs/testTopic-*".

 - If directories do exist, stop the node where they do exist, remove the 
directories from the filesystem, restart the node and wait for all topics to be 
in sync again ("ISR" equals "Replicas").  Repeat this step for each node 
containing those directories. It is important to wait for sync, to guarantee 
cluster availability.

 - If the topic is gone, cleanup is done. Otherwise, continue to next step.

 - Login to zookeeper using zookeeper shell

 - Delete the topic from the zookeeper tree by issuing the command "rmr 
/brokers/topics/testTopic"

 - Check that the topic is gone.



To guarantee it's in a good state, you can now create the topic manually on the 
command line and delete it again. It should delete properly this time around.

> Kafka cannot delete topic with ReplicaStateMachine went wrong
> -
>
> Key: KAFKA-4834
> URL: https://issues.apache.org/jira/browse/KAFKA-4834
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Dan
>  Labels: reliability
>
> It happened several times that some topics can not be deleted in our 
> production environment. By analyzing the log, we found ReplicaStateMachine 
> went wrong. Here are the error messages:
> In state-change.log:
> ERROR Controller 2 epoch 201 initiated state change of replica 1 for 
> partition [test_create_topic1,1] from OnlineReplica to ReplicaDeletionStarted 
> failed (state.change.logger)
> java.lang.AssertionError: assertion failed: Replica 
> [Topic=test_create_topic1,Partition=1,Replica=1] should be in the 
> OfflineReplica states before moving to ReplicaDeletionStarted state. Instead 
> it is in OnlineReplica state
> at scala.Predef$.assert(Predef.scala:179)
> at 
> kafka.controller.ReplicaStateMachine.assertValidPreviousStates(ReplicaStateMachine.scala:309)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChange(ReplicaStateMachine.scala:190)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.ReplicaStateMachine$$anonfun$handleStateChanges$2.apply(ReplicaStateMachine.scala:114)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> kafka.controller.ReplicaStateMachine.handleStateChanges(ReplicaStateMachine.scala:114)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:344)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$startReplicaDeletion$2.apply(TopicDeletionManager.scala:334)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> kafka.controller.TopicDeletionManager.startReplicaDeletion(TopicDeletionManager.scala:334)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onPartitionDeletion(TopicDeletionManager.scala:367)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:313)
> at 
> kafka.controller.TopicDeletionManager$$anonfun$kafka$controller$TopicDeletionManager$$onTopicDeletion$2.apply(TopicDeletionManager.scala:312)
> at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
> at 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$onTopicDeletion(TopicDeletionManager.scala:312)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:431)
> at 
> kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1$$anonfun$apply$mcV$sp$4.apply(TopicDeletionManager.scala:403)
> at 
> scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
> at 
> 

[jira] [Commented] (KAFKA-5714) Allow whitespaces in the principal name

2017-08-16 Thread Manikumar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128831#comment-16128831
 ] 

Manikumar commented on KAFKA-5714:
--

>>The point is, that I am expecting the same behavior, whether I put this name 
>>in server.properties with spaces, or without.
Ok, I got your point, but why should we expect the same behavior? KafkaPrincipal is 
formed from the name of the principal received from the underlying channel. In the 
case of SSL, it is the string representation of the X.500 subject of the 
certificate. This is a comma-separated string of attribute key/value pairs without 
any spaces, so we expect the same string to be used in configs (super.users) and 
scripts (kafka-acls.sh). We also have the PrincipalBuilder interface for any 
customization.

Not sure we want to trim whitespace from the principal name. Let us hear 
others' opinions on this. 
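
For illustration of the point above, the compact comma-separated form is simply the 
RFC 2253 rendering of the certificate's distinguished name, which is roughly the 
form the JVM hands back for an SSL peer. A small sketch using the DN from this 
ticket:

{code}
// Sketch: the same X.500 DN with and without spaces. getName(RFC2253) yields the
// compact comma-separated form that super.users / kafka-acls.sh are matched against.
import javax.security.auth.x500.X500Principal;

public class PrincipalNameSketch {
    public static void main(String[] args) {
        X500Principal dn = new X500Principal(
                "CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown");
        System.out.println(dn.getName(X500Principal.RFC2253));
        // prints: CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
    }
}
{code}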


> Allow whitespaces in the principal name
> ---
>
> Key: KAFKA-5714
> URL: https://issues.apache.org/jira/browse/KAFKA-5714
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.2.1
>Reporter: Alla Tumarkin
>Assignee: Manikumar
>
> Request
> Improve parser behavior to allow whitespaces in the principal name in the 
> config file, as in:
> {code}
> super.users=User:CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, 
> C=Unknown
> {code}
> Background
> Current implementation requires that there are no whitespaces after commas, 
> i.e.
> {code}
> super.users=User:CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> {code}
> Note: having a semicolon at the end doesn't help, i.e. this does not work 
> either
> {code}
> super.users=User:CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, 
> C=Unknown;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5152) Kafka Streams keeps restoring state after shutdown is initiated during startup

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128797#comment-16128797
 ] 

ASF GitHub Bot commented on KAFKA-5152:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/3675

KAFKA-5152: perform state restoration in poll loop



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-5152

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3675


commit 4da235927f9eebf2442533b44bf017454d372747
Author: Damian Guy 
Date:   2017-08-14T17:23:53Z

perform state restoration in poll loop




> Kafka Streams keeps restoring state after shutdown is initiated during startup
> --
>
> Key: KAFKA-5152
> URL: https://issues.apache.org/jira/browse/KAFKA-5152
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Xavier Léauté
>Assignee: Damian Guy
>Priority: Blocker
> Fix For: 0.11.0.1, 1.0.0
>
>
> If streams shutdown is initiated during state restore (e.g. an uncaught 
> exception is thrown) streams will not shut down until all stores are first 
> finished restoring.
> As restore progresses, stream threads appear to be taken out of service as 
> part of the shutdown sequence, causing rebalancing of tasks. This compounds 
> the problem by slowing down the restore process even further, since the 
> remaining threads now have to also restore the reassigned tasks before they 
> can shut down.
> A more severe issue is that if there is a new rebalance triggered during the 
> end of the waitingSync phase (e.g. due to a new member joining the group, or 
> some members timed out the SyncGroup response), then some consumer clients of 
> the group may already proceed with the {{onPartitionsAssigned}} and blocked 
> on trying to grab the file dir lock not yet released from other clients, 
> while the other clients holding the lock are consistently re-sending 
> {{JoinGroup}} requests while the rebalance cannot be completed because the 
> clients blocked on the file dir lock will not be kicked out of the group as 
> its heartbeat thread has been consistently sending HBRequest. Hence this is a 
> deadlock caused by not releasing the file dir locks in task suspension.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5739) Rewrite KStreamPeekTest at processor level avoiding driver usage

2017-08-16 Thread Paolo Patierno (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128795#comment-16128795
 ] 

Paolo Patierno commented on KAFKA-5739:
---

[~damianguy] what do you think about that ?

> Rewrite KStreamPeekTest at processor level avoiding driver usage
> 
>
> Key: KAFKA-5739
> URL: https://issues.apache.org/jira/browse/KAFKA-5739
> Project: Kafka
>  Issue Type: Test
>  Components: streams
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Minor
>
> Hi,
> as already done for the {{KStreamPrintTest}}, we could remove the usage of 
> {{KStreamTestDriver}} from the {{KStreamPeekTest}} as well and test it at the 
> processor level rather than at the stream level.
> My proposal is to:
> * create the {{KStreamPeek}} instance, providing an action which fills a 
> collection, as already happens today
> * test for both {{forwardDownStream}} values, true and false
> * use the {{MockProcessorContext}} class to override the {{forward}} 
> method, filling a streamObserved collection as happens today when 
> {{forwardDownStream}} is true, and checking that {{forward}} isn't called 
> when {{forwardDownStream}} is false (otherwise the test fails)
> Thanks,
> Paolo 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5739) Rewrite KStreamPeekTest at processor level avoiding driver usage

2017-08-16 Thread Paolo Patierno (JIRA)
Paolo Patierno created KAFKA-5739:
-

 Summary: Rewrite KStreamPeekTest at processor level avoiding 
driver usage
 Key: KAFKA-5739
 URL: https://issues.apache.org/jira/browse/KAFKA-5739
 Project: Kafka
  Issue Type: Test
  Components: streams
Reporter: Paolo Patierno
Assignee: Paolo Patierno
Priority: Minor


Hi,
as already done for the {{KStreamPrintTest}}, we could remove the usage of 
{{KStreamTestDriver}} from the {{KStreamPeekTest}} as well and test it at the 
processor level rather than at the stream level.
My proposal is to:

* create the {{KStreamPeek}} instance, providing an action which fills a 
collection, as already happens today
* test for both {{forwardDownStream}} values, true and false
* use the {{MockProcessorContext}} class to override the {{forward}} method, 
filling a streamObserved collection as happens today when {{forwardDownStream}} 
is true, and checking that {{forward}} isn't called when {{forwardDownStream}} 
is false (otherwise the test fails)

Thanks,
Paolo 
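
A rough shape-of-the-test sketch of the capture-and-assert pattern proposed above; 
the classes below are stand-ins (not Kafka's internal {{KStreamPeek}} or 
{{MockProcessorContext}}, whose exact signatures differ between versions), so treat 
it as an outline only:

{code}
// Stand-in sketch of the proposed test pattern: run the peek action, capture
// forwards in a stub, and assert on both depending on forwardDownStream.
import java.util.ArrayList;
import java.util.List;

public class KStreamPeekTestSketch {

    interface Forwarder { void forward(String key, String value); }

    static final class CapturingForwarder implements Forwarder {
        final List<String> forwarded = new ArrayList<>();
        @Override
        public void forward(String key, String value) { forwarded.add(key + "=" + value); }
    }

    // Mirrors the behaviour under test: run the action, forward only if requested.
    static final class PeekLikeProcessor {
        private final List<String> observed;
        private final boolean forwardDownStream;
        private final Forwarder forwarder;

        PeekLikeProcessor(List<String> observed, boolean forwardDownStream, Forwarder forwarder) {
            this.observed = observed;
            this.forwardDownStream = forwardDownStream;
            this.forwarder = forwarder;
        }

        void process(String key, String value) {
            observed.add(key + "=" + value);               // the peek action
            if (forwardDownStream) forwarder.forward(key, value);
        }
    }

    public static void main(String[] args) {
        CapturingForwarder context = new CapturingForwarder();
        List<String> observed = new ArrayList<>();
        PeekLikeProcessor peek = new PeekLikeProcessor(observed, /* forwardDownStream */ false, context);
        peek.process("k", "v");
        if (observed.size() != 1)
            throw new AssertionError("the peek action should always run");
        if (!context.forwarded.isEmpty())
            throw new AssertionError("nothing should be forwarded when forwardDownStream is false");
    }
}
{code}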



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5054) ChangeLoggingKeyValueByteStore delete and putIfAbsent should be synchronized

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128786#comment-16128786
 ] 

ASF GitHub Bot commented on KAFKA-5054:
---

Github user dguy closed the pull request at:

https://github.com/apache/kafka/pull/3170


> ChangeLoggingKeyValueByteStore delete and putIfAbsent should be synchronized
> 
>
> Key: KAFKA-5054
> URL: https://issues.apache.org/jira/browse/KAFKA-5054
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Damian Guy
>Assignee: Damian Guy
>Priority: Critical
> Fix For: 1.0.0
>
>
> {{putIfAbsent}} and {{delete}} should be synchronized as they involve at 
> least 2 operations on the underlying store and may result in inconsistent 
> results if someone were to query via IQ
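
The race is the usual check-then-act problem: both operations read and then write 
the underlying store, so an interleaved IQ read can see a half-applied update. A 
minimal illustrative sketch (a plain map stands in for the real byte store; this is 
not the ChangeLoggingKeyValueByteStore code) of synchronizing the composite 
operations:

{code}
// Illustrative sketch only: putIfAbsent and delete are read-then-write composites,
// so they are synchronized as a unit. The HashMap stands in for the inner store.
import java.util.HashMap;
import java.util.Map;

public class ChangeLoggingStoreSketch {
    private final Map<String, byte[]> inner = new HashMap<>();

    public synchronized byte[] putIfAbsent(String key, byte[] value) {
        byte[] existing = inner.get(key);   // read ...
        if (existing == null) {
            inner.put(key, value);          // ... then write; must not interleave with readers
            // a real store would also write the change to the changelog topic here
        }
        return existing;
    }

    public synchronized byte[] delete(String key) {
        byte[] existing = inner.get(key);   // read the old value to return it ...
        inner.remove(key);                  // ... then remove: again a two-step composite
        return existing;
    }

    public synchronized byte[] get(String key) {  // an IQ-style read sees a consistent state
        return inner.get(key);
    }
}
{code}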



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5546) Temporary loss of availability data when the leader is disconnected

2017-08-16 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128785#comment-16128785
 ] 

Ismael Juma commented on KAFKA-5546:


What guarantees are you aiming for? You can tweak timeouts like the producer's 
request.timeout.ms and the broker's zookeeper.session.timeout.ms so that such 
issues are detected quicker. If your network is unreliable (like AWS, for 
example), then this is likely to have undesired effects, however.
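
For concreteness, the knobs mentioned above are ordinary configs; a sketch with 
purely illustrative values for the producer side (the broker-side 
zookeeper.session.timeout.ms lives in server.properties), keeping in mind the 
caveat about unreliable networks:

{code}
// Illustrative values only: tighter timeouts detect a dead leader sooner, but
// overly aggressive settings cause spurious failures on flaky networks.
import java.util.Properties;

public class FailoverTuningSketch {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // illustrative
        props.put("request.timeout.ms", "10000");   // fail in-flight requests sooner (default 30000)
        props.put("retries", "5");                  // retry against the newly elected leader
        props.put("metadata.max.age.ms", "5000");   // refresh leadership metadata more often
        return props;
    }
}
{code}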

> Temporary loss of availability data when the leader is disconnected
> ---
>
> Key: KAFKA-5546
> URL: https://issues.apache.org/jira/browse/KAFKA-5546
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.2.1, 0.11.0.0
> Environment: docker, failing-network
>Reporter: Björn Eriksson
>
> We've noticed that if the leader's networking is deconfigured (with {{ifconfig 
> eth0 down}}), the producer won't notice this and doesn't immediately connect 
> to the newly elected leader.
> {{docker-compose.yml}} and test runner are at 
> https://github.com/owbear/kafka-network-failure-tests.
> We were expecting a transparent failover to the new leader but testing shows 
> that there's a 8-15 seconds long gap where no values are stored in the log 
> after the network is taken down.
> Tests (and results) [against 
> 0.10.2.1|https://github.com/owbear/kafka-network-failure-tests/tree/kafka-network-failure-tests-0.10.2.1]
> Tests (and results) [against 
> 0.11.0.0|https://github.com/owbear/kafka-network-failure-tests/tree/kafka-network-failure-tests-0.11.0.0]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5611) One or more consumers in a consumer-group stop consuming after rebalancing

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-5611.

Resolution: Fixed

I'm going to mark this as closed. If [~pskianis] or someone else can still see 
the issue after the fix that was merged, we can reopen.

> One or more consumers in a consumer-group stop consuming after rebalancing
> --
>
> Key: KAFKA-5611
> URL: https://issues.apache.org/jira/browse/KAFKA-5611
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
>Reporter: Panos Skianis
>Assignee: Jason Gustafson
>  Labels: reliability
> Fix For: 0.11.0.1
>
> Attachments: bad-server-with-more-logging-1.tar.gz, kka02, Server 1, 
> Server 2, Server 3
>
>
> Scenario: 
>   - 3 zookeepers, 4 Kafkas. 0.10.2.0, with 0.9.0 compatibility still on 
> (other apps need it but the one mentioned below is already on kafka 0.10.2.0  
> client).
>   - 3 servers running 1 consumer each under the same consumer groupId. 
>   - Servers seem to be consuming messages happily, but then there is a timeout 
> to an external service that causes our app to restart the Kafka consumer on 
> one of the servers (this is by design). That causes a rebalance of the group, 
> and after the restart one of the consumers seems to "block".
>   - Server 3 is where the problems occur.
>   - The problem fixes itself either by restarting one of the 3 servers or by 
> causing the group to rebalance again, using the console consumer with 
> autocommit set to false and the same group.
>  
> Note: 
>  - Haven't managed to recreate it at will yet.
>  - Mainly happens in production environment, often enough. Hence I do not 
> have any logs with DEBUG/TRACE statements yet.
>  - Extracts from log of each app server are attached. Also the log of the 
> kafka that seems to be dealing with the related group and generations.
>  - See COMMENT lines in the files for further info.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-3087) Fix documentation for retention.ms property and update documentation for LogConfig.scala class

2017-08-16 Thread Manikumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-3087.
--
Resolution: Fixed

This doc issue was fixed in newer Kafka versions.

> Fix documentation for retention.ms property and update documentation for 
> LogConfig.scala class
> --
>
> Key: KAFKA-3087
> URL: https://issues.apache.org/jira/browse/KAFKA-3087
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Raju Bairishetti
>Assignee: Jay Kreps
>Priority: Critical
>  Labels: documentation
>
> Log retention settings can be set on the broker, and some properties can be 
> overridden at the topic level. 
> |Property|Default|Server Default property|Description|
> |retention.ms|7 days|log.retention.minutes|This configuration controls the maximum time we will retain a log before we will discard old log segments to free up space if we are using the "delete" retention policy. This represents an SLA on how soon consumers must read their data.|
> But retention.ms is in milliseconds, not minutes, so the corresponding *Server 
> Default property* should be *log.retention.ms* instead of 
> *log.retention.minutes*.
> It would be better if the documentation page mentioned whether the time is in 
> millis/minutes/hours, and documented it in the code as well (right now the code 
> just says *age*; we should specify the age's time granularity).
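
As a small illustration of the unit in question, retention.ms is a per-topic 
override expressed in milliseconds; a sketch (topic name and value illustrative, 
assuming the 0.11+ AdminClient API) of setting it:

{code}
// Sketch: setting the per-topic retention.ms override, in milliseconds.
// Topic name and value are illustrative; assumes the 0.11+ AdminClient API.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // illustrative
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // 7 days expressed in milliseconds, matching the property's actual unit
            Config config = new Config(Collections.singleton(
                    new ConfigEntry("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000))));
            admin.alterConfigs(Collections.singletonMap(topic, config)).all().get();
        }
    }
}
{code}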



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5737) KafkaAdminClient thread should be daemon

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128710#comment-16128710
 ] 

ASF GitHub Bot commented on KAFKA-5737:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3674


> KafkaAdminClient thread should be daemon
> 
>
> Key: KAFKA-5737
> URL: https://issues.apache.org/jira/browse/KAFKA-5737
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> The admin client thread should be daemon, for consistency with the consumer 
> and producer threads.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-5738) Add cumulative count attribute for all Kafka rate metrics

2017-08-16 Thread Rajini Sivaram (JIRA)
Rajini Sivaram created KAFKA-5738:
-

 Summary: Add cumulative count attribute for all Kafka rate metrics
 Key: KAFKA-5738
 URL: https://issues.apache.org/jira/browse/KAFKA-5738
 Project: Kafka
  Issue Type: New Feature
  Components: metrics
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
 Fix For: 1.0.0


Add cumulative count attribute to all Kafka rate metrics to make downstream 
processing simpler, more accurate, and more flexible.
 
See 
[KIP-187|https://cwiki.apache.org/confluence/display/KAFKA/KIP-187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics]
 for details.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5152) Kafka Streams keeps restoring state after shutdown is initiated during startup

2017-08-16 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-5152:
--
Fix Version/s: (was: 0.10.2.2)
   1.0.0

> Kafka Streams keeps restoring state after shutdown is initiated during startup
> --
>
> Key: KAFKA-5152
> URL: https://issues.apache.org/jira/browse/KAFKA-5152
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Xavier Léauté
>Assignee: Damian Guy
>Priority: Blocker
> Fix For: 0.11.0.1, 1.0.0
>
>
> If streams shutdown is initiated during state restore (e.g. an uncaught 
> exception is thrown) streams will not shut down until all stores are first 
> finished restoring.
> As restore progresses, stream threads appear to be taken out of service as 
> part of the shutdown sequence, causing rebalancing of tasks. This compounds 
> the problem by slowing down the restore process even further, since the 
> remaining threads now have to also restore the reassigned tasks before they 
> can shut down.
> A more severe issue is that if there is a new rebalance triggered during the 
> end of the waitingSync phase (e.g. due to a new member joining the group, or 
> some members timed out the SyncGroup response), then some consumer clients of 
> the group may already proceed with the {{onPartitionsAssigned}} and blocked 
> on trying to grab the file dir lock not yet released from other clients, 
> while the other clients holding the lock are consistently re-sending 
> {{JoinGroup}} requests while the rebalance cannot be completed because the 
> clients blocked on the file dir lock will not be kicked out of the group as 
> its heartbeat thread has been consistently sending HBRequest. Hence this is a 
> deadlock caused by not releasing the file dir locks in task suspension.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-4950) ConcurrentModificationException when iterating over Kafka Metrics

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4950:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> ConcurrentModificationException when iterating over Kafka Metrics
> -
>
> Key: KAFKA-4950
> URL: https://issues.apache.org/jira/browse/KAFKA-4950
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Dumitru Postoronca
>Assignee: Vahid Hashemian
>Priority: Minor
> Fix For: 0.11.0.2
>
>
> It looks like when calling {{PartitionStates.partitionSet()}}, while the 
> resulting set is being built, the internal state of the allocations can 
> change, which leads to a ConcurrentModificationException during the copy 
> operation.
> {code}
> java.util.ConcurrentModificationException
> at 
> java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
> at 
> java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.(HashSet.java:119)
> at 
> org.apache.kafka.common.internals.PartitionStates.partitionSet(PartitionStates.java:66)
> at 
> org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedPartitions(SubscriptionState.java:291)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$ConsumerCoordinatorMetrics$1.measure(ConsumerCoordinator.java:783)
> at 
> org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:61)
> at 
> org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:52)
> {code}
> {code}
> // client code: exposes the Kafka client's metrics as Dropwizard gauges
> import java.util.Collections;
> import java.util.HashMap;
> import java.util.Map;
> import com.codahale.metrics.Gauge;
> import com.codahale.metrics.Metric;
> import com.codahale.metrics.MetricSet;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.MetricName;
> import static com.codahale.metrics.MetricRegistry.name;
> public class KafkaMetricSet implements MetricSet {
>     private final KafkaConsumer<?, ?> client;
>     public KafkaMetricSet(KafkaConsumer<?, ?> client) {
>         this.client = client;
>     }
>     @Override
>     public Map<String, Metric> getMetrics() {
>         final Map<String, Metric> gauges = new HashMap<String, Metric>();
>         Map<MetricName, ? extends org.apache.kafka.common.Metric> m = client.metrics();
>         for (final Map.Entry<MetricName, ? extends org.apache.kafka.common.Metric> e : m.entrySet()) {
>             gauges.put(name(e.getKey().group(), e.getKey().name(), "count"), new Gauge<Double>() {
>                 @Override
>                 public Double getValue() {
>                     return e.getValue().value(); // exception thrown here
>                 }
>             });
>         }
>         return Collections.unmodifiableMap(gauges);
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5327) Console Consumer should only poll for up to max messages

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5327:
---
Fix Version/s: (was: 0.11.0.1)
   1.0.0

> Console Consumer should only poll for up to max messages
> 
>
> Key: KAFKA-5327
> URL: https://issues.apache.org/jira/browse/KAFKA-5327
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Dustin Cote
>Assignee: huxihx
>Priority: Minor
> Fix For: 1.0.0
>
>
> The ConsoleConsumer has a --max-messages flag that can be used to limit the 
> number of messages consumed. However, the number of records actually consumed 
> is governed by max.poll.records. This means you see one message on the 
> console, but your offset has moved forward a default of 500, which is kind of 
> counterintuitive. It would be good to only commit offsets for messages we 
> have printed to the console.
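
The knob in question is the consumer's max.poll.records; a sketch (illustrative 
topic, group and values) of how bounding records per poll, plus manual commits, 
keeps committed offsets aligned with what was actually printed:

{code}
// Sketch: bound records per poll and commit manually so offsets only cover what
// was actually handled. Topic, group and values are illustrative.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BoundedPollSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // illustrative
        props.put("group.id", "console-like-group");
        props.put("enable.auto.commit", "false");   // commit only what we print
        props.put("max.poll.records", "1");         // at most one record per poll
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
            consumer.commitSync();   // offsets now reflect only the printed records
        }
    }
}
{code}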



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5543) We don't remove the LastStableOffsetLag metric when a partition is moved away from a broker

2017-08-16 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128622#comment-16128622
 ] 

Ismael Juma commented on KAFKA-5543:


[~apurva], will this make it to 0.11.0.1 (happening soon) or should it be moved 
to 0.11.0.2?

> We don't remove the LastStableOffsetLag metric when a partition is moved away 
> from a broker
> ---
>
> Key: KAFKA-5543
> URL: https://issues.apache.org/jira/browse/KAFKA-5543
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 0.11.0.1
>
>
> Reported by [~junrao], we have a small leak where the `LastStableOffsetLag` 
> metric is not removed along with the other metrics in the 
> `Partition.removeMetrics` method. This could create a leak when partitions 
> are reassigned or a topic is deleted.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5503) Idempotent producer ignores shutdown while fetching ProducerId

2017-08-16 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128621#comment-16128621
 ] 

Ismael Juma commented on KAFKA-5503:


[~evis], do you intend to submit a PR for this soon? Otherwise, we will have to 
move this to 0.11.0.2.

> Idempotent producer ignores shutdown while fetching ProducerId
> --
>
> Key: KAFKA-5503
> URL: https://issues.apache.org/jira/browse/KAFKA-5503
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Affects Versions: 0.11.0.0
>Reporter: Jason Gustafson
>Assignee: Evgeny Veretennikov
> Fix For: 0.11.0.1
>
>
> When using the idempotent producer, we initially block the sender thread 
> while we attempt to get the ProducerId. During this time, a concurrent call 
> to close() will be ignored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5482) A CONCURRENT_TRANASCTIONS error for the first AddPartitionsToTxn request slows down transactions significantly

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5482:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> A CONCURRENT_TRANASCTIONS error for the first AddPartitionsToTxn request 
> slows down transactions significantly
> --
>
> Key: KAFKA-5482
> URL: https://issues.apache.org/jira/browse/KAFKA-5482
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 0.11.0.2
>
>
> Here is the issue.
> # When we do a commit transaction, the producer sends an `EndTxn` request to 
> the coordinator. The coordinator writes the `PrepareCommit` message to the 
> transaction log and then returns the response to the client. It writes the 
> transaction markers and the final 'CompleteCommit' message asynchronously. 
> # In the meantime, if the client starts another transaction, it will send an 
> `AddPartitions` request on the next `Sender.run` loop. If the markers haven't 
> been written yet, then the coordinator will return a retriable 
> `CONCURRENT_TRANSACTIONS` error to the client.
> # The current behavior in the producer is to sleep for `retryBackoffMs` 
> before retrying the request. The current default for this is 100ms. So the 
> producer will sleep for 100ms before sending the `AddPartitions` again. This 
> puts a floor on the latency for back to back transactions.
> This has been worked around in 
> https://issues.apache.org/jira/browse/KAFKA-5477 by reducing the retryBackoff 
> for the first AddPartitions request. But we need a stronger solution: like 
> having the commit block until the transaction is complete, or delaying the 
> addPartitions until batches are actually ready to be sent for the transaction.
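> 
> For reference, the client-side shape of the workload that hits this floor, as a minimal sketch against the public producer API (broker address, transactional id and topic are made up). Lowering {{retry.backoff.ms}} only softens the problem; the stronger fixes are the ones listed above.
> {code}
> import java.util.Properties;
> 
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerConfig;
> import org.apache.kafka.clients.producer.ProducerRecord;
> 
> public class BackToBackTransactions {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
>         props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-txn-id");     // assumed id
>         props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
>                   "org.apache.kafka.common.serialization.StringSerializer");
>         props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
>                   "org.apache.kafka.common.serialization.StringSerializer");
>         // retry.backoff.ms also paces the retry of the AddPartitionsToTxn request that gets
>         // the retriable CONCURRENT_TRANSACTIONS error; its 100ms default is the latency floor.
>         props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "20");
> 
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>             producer.initTransactions();
>             for (int i = 0; i < 10; i++) {
>                 producer.beginTransaction();
>                 producer.send(new ProducerRecord<>("demo-topic", "key-" + i, "value-" + i));
>                 // commitTransaction() can return once PrepareCommit is written; the markers are
>                 // written asynchronously, which is what the next AddPartitionsToTxn races against.
>                 producer.commitTransaction();
>             }
>         }
>     }
> }
> {code}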



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5441) Fix transaction marker grouping by producerId in TC

2017-08-16 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128619#comment-16128619
 ] 

Ismael Juma commented on KAFKA-5441:


[~guozhang], are you planning to do this for 0.11.0.1 or shall we move it to 
0.11.0.2?

> Fix transaction marker grouping by producerId in TC
> ---
>
> Key: KAFKA-5441
> URL: https://issues.apache.org/jira/browse/KAFKA-5441
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
>Assignee: Guozhang Wang
>  Labels: exactly-once
> Fix For: 0.11.0.1
>
>
> It seems in some cases the WriteTxnMarker request can be sent with multiple 
> entries for the same ProducerId. This is the cause of KAFKA-5438. This is not 
> necessarily a correctness problem, but it seems unintentional and should 
> probably be fixed.
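> 
> The intended invariant, sketched with illustrative types rather than the real request classes: before a WriteTxnMarkers request is built, pending marker entries should be collapsed so the request carries at most one entry per producerId.
> {code}
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> 
> // Illustrative stand-in for the per-producer entry of a WriteTxnMarkers request.
> class MarkerEntry {
>     final long producerId;
>     final List<String> partitions = new ArrayList<>();
>     MarkerEntry(long producerId) { this.producerId = producerId; }
> }
> 
> class MarkerGrouping {
>     // Merge pending entries so each producerId appears exactly once in the outgoing request.
>     static List<MarkerEntry> groupByProducerId(List<MarkerEntry> pending) {
>         Map<Long, MarkerEntry> grouped = new HashMap<>();
>         for (MarkerEntry entry : pending) {
>             MarkerEntry merged = grouped.computeIfAbsent(entry.producerId, MarkerEntry::new);
>             merged.partitions.addAll(entry.partitions);
>         }
>         return new ArrayList<>(grouped.values());
>     }
> }
> {code}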



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5383) Additional Test Cases for ReplicaManager

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5383:
---
Fix Version/s: (was: 0.11.0.1)
   1.0.0

> Additional Test Cases for ReplicaManager
> 
>
> Key: KAFKA-5383
> URL: https://issues.apache.org/jira/browse/KAFKA-5383
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
> Fix For: 1.0.0
>
>
> KAFKA-5355 and KAFKA-5376 have shown that current testing of ReplicaManager 
> is inadequate. This is definitely the case when it comes to KIP-98 and is 
> likely true in general. We should improve this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5284) Add tools and metrics to diagnose problems with the idempotent producer and transactions

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5284:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> Add tools and metrics to diagnose problems with the idempotent producer and 
> transactions
> 
>
> Key: KAFKA-5284
> URL: https://issues.apache.org/jira/browse/KAFKA-5284
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.2
>
>
> The KIP mentions a number of metrics which we should add, but haven't yet 
> done so. It would also be good to have tools to help diagnose degenerate 
> situations like:
> # If a consumer is stuck, we should be able to find the LSO of the partition 
> it is blocked on, and which producer is holding up the advancement of the LSO.
> # We should be able to force abort any inflight transaction to free up 
> consumers
> # etc.
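> 
> Until dedicated tooling exists, the LSO lag from the first point can be approximated from the outside with two consumers, assuming {{endOffsets()}} returns the last stable offset under {{isolation.level=read_committed}} and the high watermark otherwise (topic and broker address below are made up; this finds the lag, not the producer that is holding it up).
> {code}
> import java.util.Collections;
> import java.util.Map;
> import java.util.Properties;
> import java.util.Set;
> 
> import org.apache.kafka.clients.consumer.ConsumerConfig;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.common.TopicPartition;
> 
> public class LsoLagProbe {
>     public static void main(String[] args) {
>         Set<TopicPartition> partitions = Collections.singleton(new TopicPartition("demo-topic", 0));
> 
>         Map<TopicPartition, Long> logEnd = endOffsets(partitions, "read_uncommitted"); // high watermark
>         Map<TopicPartition, Long> stable = endOffsets(partitions, "read_committed");   // last stable offset
> 
>         for (TopicPartition tp : partitions) {
>             long lag = logEnd.get(tp) - stable.get(tp);
>             System.out.printf("%s: log end = %d, LSO = %d, lag = %d%n",
>                               tp, logEnd.get(tp), stable.get(tp), lag);
>         }
>     }
> 
>     private static Map<TopicPartition, Long> endOffsets(Set<TopicPartition> partitions, String isolation) {
>         Properties props = new Properties();
>         props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
>         props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, isolation);
>         props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
>                   "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>         props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
>                   "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>         try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>             return consumer.endOffsets(partitions);
>         }
>     }
> }
> {code}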



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5342) Distinguish abortable failures in transactional producer

2017-08-16 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128617#comment-16128617
 ] 

Ismael Juma commented on KAFKA-5342:


[~hachikuji], we are starting to plan 0.11.0.1. Do we expect to have this soon 
or shall we move it to 0.11.0.2?

> Distinguish abortable failures in transactional producer
> 
>
> Key: KAFKA-5342
> URL: https://issues.apache.org/jira/browse/KAFKA-5342
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.11.0.1
>
>
> The transactional producer distinguishes two classes of user-visible errors:
> 1. Abortable errors: these are errors which are fatal to the ongoing 
> transaction, but which can be successfully aborted. Essentially any error in 
> which the producer can still expect to successfully send EndTxn to the 
> transaction coordinator is abortable.
> 2. Fatal errors: any error which is not abortable is fatal. For example, a 
> transactionalId authorization error is fatal because it would also prevent 
> the TC from receiving the EndTxn request.
> At the moment, it's not clear how the user would know how they should handle 
> a given failure. One option is to add an exception type to indicate which 
> errors are abortable (e.g. AbortableKafkaException). Then any other exception 
> could be considered fatal.
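> 
> Roughly the handling pattern users are expected to follow today, sketched against the public API (identifiers and topic are made up): known-fatal exception types are caught explicitly and the producer is closed, while anything else surfaced as a {{KafkaException}} is treated as abortable. A dedicated marker type such as the proposed AbortableKafkaException would let the second catch block be precise instead of catch-all.
> {code}
> import java.util.Properties;
> 
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerConfig;
> import org.apache.kafka.clients.producer.ProducerRecord;
> import org.apache.kafka.common.KafkaException;
> import org.apache.kafka.common.errors.AuthorizationException;
> import org.apache.kafka.common.errors.ProducerFencedException;
> 
> public class TxnErrorHandling {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
>         props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-txn-id");     // assumed id
>         props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
>                   "org.apache.kafka.common.serialization.StringSerializer");
>         props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
>                   "org.apache.kafka.common.serialization.StringSerializer");
> 
>         KafkaProducer<String, String> producer = new KafkaProducer<>(props);
>         producer.initTransactions();
>         try {
>             producer.beginTransaction();
>             producer.send(new ProducerRecord<>("demo-topic", "k", "v"));
>             producer.commitTransaction();
>         } catch (ProducerFencedException | AuthorizationException fatal) {
>             // Fatal: EndTxn cannot succeed either, so aborting would not help.
>             // Fall through to close() below and recreate the producer if needed.
>         } catch (KafkaException abortable) {
>             // Treated as abortable: roll back this transaction and carry on.
>             producer.abortTransaction();
>         } finally {
>             producer.close();
>         }
>     }
> }
> {code}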



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-3955) Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to failed broker boot

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3955:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> Kafka log recovery doesn't truncate logs on non-monotonic offsets, leading to 
> failed broker boot
> 
>
> Key: KAFKA-3955
> URL: https://issues.apache.org/jira/browse/KAFKA-3955
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1, 0.8.2.0, 0.8.2.1, 0.8.2.2, 
> 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.1.1
>Reporter: Tom Crayford
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.2
>
>
> Hi,
> I've found a bug impacting kafka brokers on startup after an unclean 
> shutdown. If a log segment is corrupt and has non-monotonic offsets (see the 
> appendix of this bug for a sample output from {{DumpLogSegments}}), then 
> {{LogSegment.recover}} throws an {{InvalidOffsetException}} error: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/OffsetIndex.scala#L218
> That code is called by {{LogSegment.recover}}: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogSegment.scala#L191
> Which is called in several places in {{Log.scala}}. Notably it's called four 
> times during recovery:
> Thrice in Log.loadSegments
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L199
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L204
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L226
> and once in Log.recoverLog
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L268
> Of these, only the very last one has a {{catch}} for 
> {{InvalidOffsetException}}. When that catches the issue, it truncates the 
> whole log (not just this segment): 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/Log.scala#L274
>  to the start segment of the bad log segment.
> However, this code can't be hit on recovery, because of the code paths in 
> {{loadSegments}} - they mean we'll never hit truncation here, as we always 
> throw this exception and that goes all the way to the toplevel exception 
> handler and crashes the JVM.
> As {{Log.recoverLog}} is always called during recovery, I *think* a fix for 
> this is to move this crash recovery/truncate code inside a new method in 
> {{Log.scala}}, and call that instead of {{LogSegment.recover}} in each place. 
> That code should return the number of {{truncatedBytes}} like we do in 
> {{Log.recoverLog}} and then truncate the log. The callers will have to be 
> notified "stop iterating over files in the directory", likely via a return 
> value of {{truncatedBytes}} like {{Log.recoverLog}} does right now.
> I'm happy working on a patch for this. I'm aware this recovery code is tricky 
> and important to get right.
> I'm also curious (and currently don't have good theories as of yet) as to how 
> this log segment got into this state with non-monotonic offsets. This segment 
> is using gzip compression, and is under 0.9.0.1. The same bug with respect to 
> recovery exists in trunk, but I'm unsure if the new handling around 
> compressed messages (KIP-31) means the bug where non-monotonic offsets get 
> appended is still present in trunk.
> As a production workaround, one can manually truncate that log folder 
> yourself (delete all .index/.log files including and after the one with the 
> bad offset). However, kafka should (and can) handle this case well - with 
> replication we can truncate in broker startup.
> stacktrace and error message:
> {code}
> pri=WARN  t=pool-3-thread-4 at=Log Found a corrupted index file, 
> /$DIRECTORY/$TOPIC-22/14306536.index, deleting and rebuilding 
> index...
> pri=ERROR t=main at=LogManager There was an error in one of the threads 
> during logs loading: kafka.common.InvalidOffsetException: Attempt to append 
> an offset (15000337) to position 111719 no larger than the last offset 
> appended (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index.
> pri=FATAL t=main at=KafkaServer Fatal error during KafkaServer startup. 
> Prepare to shutdown kafka.common.InvalidOffsetException: Attempt to append an 
> offset (15000337) to position 111719 no larger than the last offset appended 
> (15000337) to /$DIRECTORY/$TOPIC-8/14008931.index.
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)

[jira] [Updated] (KAFKA-5347) OutOfSequence error should be fatal

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5347:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> OutOfSequence error should be fatal
> ---
>
> Key: KAFKA-5347
> URL: https://issues.apache.org/jira/browse/KAFKA-5347
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.2
>
>
> If the producer sees an OutOfSequence error for a given partition, we 
> currently treat it as an abortable error. This makes some sense because 
> OutOfSequence won't prevent us from being able to send the EndTxn to abort 
> the transaction. The problem is that the producer, even after aborting, still 
> won't be able to send to the topic with an OutOfSequence. One way to deal 
> with this is to ask the user to call {{initTransactions()}} again to bump the 
> epoch, but this is a bit difficult to explain and could be dangerous since it 
> renders zombie checking less effective. Probably we should just consider 
> OutOfSequence fatal for the transactional producer.
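> 
> Client-side, treating the error as fatal would look roughly like the send callback below (a sketch, not the proposed producer change itself): on {{OutOfOrderSequenceException}} the producer instance is closed and recreated rather than the transaction being aborted. It would be registered per send, e.g. {{producer.send(record, new FatalOnOutOfSequence(producer))}}.
> {code}
> import java.util.concurrent.TimeUnit;
> 
> import org.apache.kafka.clients.producer.Callback;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.RecordMetadata;
> import org.apache.kafka.common.errors.OutOfOrderSequenceException;
> 
> // Send callback for an idempotent/transactional producer that treats an
> // out-of-sequence error as fatal for this producer instance.
> class FatalOnOutOfSequence implements Callback {
>     private final KafkaProducer<String, String> producer;
> 
>     FatalOnOutOfSequence(KafkaProducer<String, String> producer) {
>         this.producer = producer;
>     }
> 
>     @Override
>     public void onCompletion(RecordMetadata metadata, Exception exception) {
>         if (exception instanceof OutOfOrderSequenceException) {
>             // Aborting would not help: subsequent sends to the partition keep failing,
>             // so close (non-blocking form, since this runs on the sender thread) and
>             // let the application recreate the producer.
>             producer.close(0, TimeUnit.MILLISECONDS);
>         } else if (exception != null) {
>             exception.printStackTrace(); // other errors may be retriable or abortable
>         }
>     }
> }
> {code}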



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-4972) Kafka 0.10.0 Found a corrupted index file during Kafka broker startup

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4972:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> Kafka 0.10.0  Found a corrupted index file during Kafka broker startup
> --
>
> Key: KAFKA-4972
> URL: https://issues.apache.org/jira/browse/KAFKA-4972
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.0.0
> Environment: JDK: HotSpot  x64  1.7.0_80
> Tag: 0.10.0
>Reporter: fangjinuo
>Priority: Critical
> Fix For: 0.11.0.2
>
> Attachments: Snap3.png
>
>
> After force-shutting down all Kafka brokers one by one and then restarting them 
> one by one, one broker failed to start up.
> The following WARN-level log was found in the log file:
> found a corrupted index file, .index, deleting it ...
> You can view the details in the attachment.
> Looking at the code in the core module, I found that the non-thread-safe method 
> LogSegment.append(offset, messages) has two callers:
> 1) Log.append(messages)  // called under a synchronized lock
> 2) LogCleaner.cleanInto(topicAndPartition, source, dest, map, retainDeletes, 
> messageFormatVersion)  // not called under that lock
> So I guess this may be the reason for the repeated offsets in the 0xx.log file 
> (the log segment's .log file).
> Although this is just my inference, I hope this problem can be fixed quickly.
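> 
> Whether or not that guess is right, the general fix for a non-thread-safe append with multiple callers is for every call site to take the same lock. A schematic sketch, with illustrative names rather than the real {{LogSegment}}/{{LogCleaner}} code:
> {code}
> import java.util.concurrent.locks.ReentrantLock;
> 
> // Schematic: both write paths must hold the same lock before touching the segment.
> class SegmentWriter {
>     private final ReentrantLock lock = new ReentrantLock();
> 
>     // Path 1: the normal append path (the report notes this one is already synchronized).
>     void append(long offset, byte[] messages) {
>         lock.lock();
>         try {
>             unsafeAppend(offset, messages);
>         } finally {
>             lock.unlock();
>         }
>     }
> 
>     // Path 2: the cleaner path, which the report suspects writes without the lock.
>     void cleanInto(long offset, byte[] retained) {
>         lock.lock(); // the suspected missing guard
>         try {
>             unsafeAppend(offset, retained);
>         } finally {
>             lock.unlock();
>         }
>     }
> 
>     private void unsafeAppend(long offset, byte[] bytes) {
>         // not thread-safe: log and index positions are updated without further guarding
>     }
> }
> {code}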



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-1923) Negative offsets in replication-offset-checkpoint file

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-1923:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> Negative offsets in replication-offset-checkpoint file
> --
>
> Key: KAFKA-1923
> URL: https://issues.apache.org/jira/browse/KAFKA-1923
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.0
>Reporter: Oleg Golovin
>  Labels: reliability
> Fix For: 0.11.0.2
>
>
> Today was the second time we witnessed negative offsets in 
> the replication-offset-checkpoint file. After a restart, the node stops replicating 
> some of its partitions.
> Unfortunately we can't reproduce it yet. But the two cases we encountered 
> indicate a bug which should be addressed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5026) DebuggingConsumerId and DebuggingMessageFormatter and message format v2

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5026:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> DebuggingConsumerId and DebuggingMessageFormatter and message format v2
> ---
>
> Key: KAFKA-5026
> URL: https://issues.apache.org/jira/browse/KAFKA-5026
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Ismael Juma
>  Labels: exactly-once, tooling
> Fix For: 0.11.0.2
>
>
> [~junrao] suggested the following:
> Currently, the broker supports a DebuggingConsumerId mode for the fetch 
> request. Should we extend that so that the consumer can read the control 
> message as well? Should we also have some kind of DebuggingMessageFormatter 
> so that ConsoleConsumer can show all the newly introduced fields in the new 
> message format (e.g., pid, epoch, etc.) for debugging purposes?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5403) Transactions system test should dedup consumed messages by offset

2017-08-16 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128614#comment-16128614
 ] 

Ismael Juma commented on KAFKA-5403:


[~apurva], we are starting to plan 0.11.0.1. Will this be ready soon or shall 
we move it to 0.11.0.2?

> Transactions system test should dedup consumed messages by offset
> -
>
> Key: KAFKA-5403
> URL: https://issues.apache.org/jira/browse/KAFKA-5403
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 0.11.0.1
>
>
> In KAFKA-5396, we saw that the consumers which verify the data in multiple 
> topics could read the same offsets multiple times, for instance when a 
> rebalance happens. 
> This would detect spurious duplicates, causing the test to fail. We should 
> dedup the consumed messages by offset and only fail the test if we have 
> duplicate values for a unique set of offsets.
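> 
> A minimal sketch of the proposed verification rule (illustrative class, not the actual system-test code): re-reading the same partition/offset is tolerated, and only a different value at an already-seen offset counts as a duplicate.
> {code}
> import java.util.HashMap;
> import java.util.Map;
> 
> import org.apache.kafka.clients.consumer.ConsumerRecord;
> import org.apache.kafka.common.TopicPartition;
> 
> class OffsetDedupValidator {
>     private final Map<TopicPartition, Map<Long, String>> seen = new HashMap<>();
> 
>     // Returns true only for a genuine duplicate, i.e. a different value at a known offset.
>     boolean isBadDuplicate(ConsumerRecord<String, String> record) {
>         TopicPartition tp = new TopicPartition(record.topic(), record.partition());
>         Map<Long, String> byOffset = seen.computeIfAbsent(tp, k -> new HashMap<>());
>         String previous = byOffset.putIfAbsent(record.offset(), record.value());
>         if (previous == null) {
>             return false;                        // first time this offset is seen
>         }
>         return !previous.equals(record.value()); // same offset re-read with the same value is fine
>     }
> }
> {code}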



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-4686) Null Message payload is shutting down broker

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4686:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> Null Message payload is shutting down broker
> 
>
> Key: KAFKA-4686
> URL: https://issues.apache.org/jira/browse/KAFKA-4686
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
> Environment: Amazon Linux AMI release 2016.03 kernel 
> 4.4.19-29.55.amzn1.x86_64
>Reporter: Rodrigo Queiroz Saramago
>Priority: Critical
> Fix For: 0.11.0.2
>
> Attachments: KAFKA-4686-NullMessagePayloadError.tar.xz, 
> kafkaServer.out
>
>
> Hello, I have a test environment with 3 brokers and 1 ZooKeeper node, in 
> which clients connect using two-way SSL authentication. I use Kafka version 
> 0.10.1.1. The system works as expected for a while, but if a broker goes 
> down and is then restarted, something gets corrupted and it is not possible to 
> start the broker again; it always fails with the same error. What does this 
> error mean? What can I do in this case? Is this the expected behavior?
> [2017-01-23 07:03:28,927] ERROR There was an error in one of the threads 
> during logs loading: kafka.common.KafkaException: Message payload is null: 
> Message(magic = 0, attributes = 1, crc = 4122289508, key = null, payload = 
> null) (kafka.log.LogManager)
> [2017-01-23 07:03:28,929] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> at kafka.log.Log.loadSegments(Log.scala:179)
> at kafka.log.Log.(Log.scala:108)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)
> at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> [2017-01-23 07:03:28,946] INFO shutting down (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,949] INFO Terminate ZkClient event thread. 
> (org.I0Itec.zkclient.ZkEventThread)
> [2017-01-23 07:03:28,954] INFO EventThread shut down for session: 
> 0x159bd458ae70008 (org.apache.zookeeper.ClientCnxn)
> [2017-01-23 07:03:28,954] INFO Session: 0x159bd458ae70008 closed 
> (org.apache.zookeeper.ZooKeeper)
> [2017-01-23 07:03:28,957] INFO shut down completed (kafka.server.KafkaServer)
> [2017-01-23 07:03:28,959] FATAL Fatal error during KafkaServerStartable 
> startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
> kafka.common.KafkaException: Message payload is null: Message(magic = 0, 
> attributes = 1, crc = 4122289508, key = null, payload = null)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.(ByteBufferMessageSet.scala:90)
> at 
> kafka.message.ByteBufferMessageSet$.deepIterator(ByteBufferMessageSet.scala:85)
> at kafka.message.MessageAndOffset.firstOffset(MessageAndOffset.scala:33)
> at kafka.log.LogSegment.recover(LogSegment.scala:223)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
> at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> at kafka.log.Log.loadSegments(Log.scala:179)
> at kafka.log.Log.(Log.scala:108)
> 

[jira] [Updated] (KAFKA-5272) Improve validation for Alter Configs (KIP-133)

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5272:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.1.0

> Improve validation for Alter Configs (KIP-133)
> --
>
> Key: KAFKA-5272
> URL: https://issues.apache.org/jira/browse/KAFKA-5272
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.11.1.0
>
>
> TopicConfigHandler.processConfigChanges() warns about certain topic configs. 
> We should include such validations in alter configs and reject the change if 
> the validation fails. Note that this should be done without changing the 
> behaviour of the ConfigCommand (as it does not have access to the broker 
> configs).
> We should consider adding other validations like KAFKA-4092 and KAFKA-4680.
> Finally, the AlterConfigsPolicy mentioned in KIP-133 will be added at the 
> same time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-3689) Exception when attempting to decrease connection count for address with no connections

2017-08-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3689:
---
Fix Version/s: (was: 0.11.0.1)
   0.11.0.2

> Exception when attempting to decrease connection count for address with no 
> connections
> --
>
> Key: KAFKA-3689
> URL: https://issues.apache.org/jira/browse/KAFKA-3689
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 0.9.0.1
> Environment: ubuntu 14.04,
> java version "1.7.0_95"
> OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.2)
> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
> 3 broker cluster (all 3 servers identical -  Intel Xeon E5-2670 @2.6GHz, 
> 8cores, 16 threads 64 GB RAM & 1 TB Disk)
> Kafka Cluster is managed by 3 server ZK cluster (these servers are different 
> from Kafka broker servers). All 6 servers are connected via 10G switch. 
> Producers run from external servers.
>Reporter: Buvaneswari Ramanan
>Assignee: Ryan P
> Fix For: 0.11.0.2
>
> Attachments: kafka-3689-instrumentation.patch, KAFKA-3689.log.redacted
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> As per Ismael Juma's suggestion in email thread to us...@kafka.apache.org 
> with the same subject, I am creating this bug report.
> The following error occurs in one of the brokers in our 3 broker cluster, 
> which serves about 8000 topics. These topics are single partitioned with a 
> replication factor = 3. Each topic gets data at a low rate  – 200 bytes per 
> sec.  Leaders are balanced across the topics.
> Producers run from external servers (4 Ubuntu servers with same config as the 
> brokers), each producing to 2000 topics utilizing kafka-python library.
> This error message occurs repeatedly in one of the servers. Between the hours 
> of 10:30am and 1:30pm on 5/9/16, there were about 10 Million such 
> occurrences. This was right after a cluster restart.
> This is not the first time we got this error in this broker. In those 
> instances, the error occurred hours or days after a cluster restart.
> =
> [2016-05-09 10:38:43,932] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.IllegalArgumentException: Attempted to decrease connection count 
> for address with no connections, address: /X.Y.Z.144 (actual network address 
> masked)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
> at scala.collection.AbstractMap.getOrElse(Map.scala:59)
> at kafka.network.ConnectionQuotas.dec(SocketServer.scala:564)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:450)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:445)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:445)
> at java.lang.Thread.run(Thread.java:745)
> [2016-05-09 10:38:43,932] ERROR Processor got uncaught exception. 
> (kafka.network.Processor)
> java.lang.IllegalArgumentException: Attempted to decrease connection count 
> for address with no connections, address: /X.Y.Z.144
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at 
> kafka.network.ConnectionQuotas$$anonfun$9.apply(SocketServer.scala:565)
> at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
> at scala.collection.AbstractMap.getOrElse(Map.scala:59)
> at kafka.network.ConnectionQuotas.dec(SocketServer.scala:564)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:450)
> at 
> kafka.network.Processor$$anonfun$run$13.apply(SocketServer.scala:445)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:445)
> at java.lang.Thread.run(Thread.java:745)
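> 
> A schematic counter in the spirit of {{ConnectionQuotas}}, written with illustrative names: the decrement path tolerates a missing or zero entry instead of throwing, which would at least keep one accounting inconsistency from being rethrown by the {{Processor}} millions of times (it does not address the root cause of the mismatched counts).
> {code}
> import java.net.InetAddress;
> import java.util.HashMap;
> import java.util.Map;
> 
> class ConnectionCounter {
>     private final Map<InetAddress, Integer> counts = new HashMap<>();
> 
>     synchronized void inc(InetAddress address) {
>         counts.merge(address, 1, Integer::sum);
>     }
> 
>     synchronized void dec(InetAddress address) {
>         Integer current = counts.get(address);
>         if (current == null || current == 0) {
>             // The real broker throws IllegalArgumentException here; logging instead keeps the
>             // Processor thread from hitting the same uncaught exception over and over.
>             System.err.println("No connections recorded for " + address + "; ignoring decrement");
>             return;
>         }
>         if (current == 1) {
>             counts.remove(address);
>         } else {
>             counts.put(address, current - 1);
>         }
>     }
> }
> {code}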



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5152) Kafka Streams keeps restoring state after shutdown is initiated during startup

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128595#comment-16128595
 ] 

ASF GitHub Bot commented on KAFKA-5152:
---

Github user dguy closed the pull request at:

https://github.com/apache/kafka/pull/3653


> Kafka Streams keeps restoring state after shutdown is initiated during startup
> --
>
> Key: KAFKA-5152
> URL: https://issues.apache.org/jira/browse/KAFKA-5152
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Xavier Léauté
>Assignee: Damian Guy
>Priority: Blocker
> Fix For: 0.10.2.2, 0.11.0.1
>
>
> If streams shutdown is initiated during state restore (e.g. an uncaught 
> exception is thrown) streams will not shut down until all stores are first 
> finished restoring.
> As restore progresses, stream threads appear to be taken out of service as 
> part of the shutdown sequence, causing rebalancing of tasks. This compounds 
> the problem by slowing down the restore process even further, since the 
> remaining threads now have to also restore the reassigned tasks before they 
> can shut down.
> A more severe issue is that if there is a new rebalance triggered during the 
> end of the waitingSync phase (e.g. due to a new member joining the group, or 
> some members timed out the SyncGroup response), then some consumer clients of 
> the group may already proceed with the {{onPartitionsAssigned}} and blocked 
> on trying to grab the file dir lock not yet released from other clients, 
> while the other clients holding the lock are consistently re-sending 
> {{JoinGroup}} requests while the rebalance cannot be completed because the 
> clients blocked on the file dir lock will not be kicked out of the group as 
> its heartbeat thread has been consistently sending HBRequest. Hence this is a 
> deadlock caused by not releasing the file dir locks in task suspension.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5440) Kafka Streams report state RUNNING even if all threads are dead

2017-08-16 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy resolved KAFKA-5440.
---
Resolution: Duplicate

> Kafka Streams report state RUNNING even if all threads are dead
> ---
>
> Key: KAFKA-5440
> URL: https://issues.apache.org/jira/browse/KAFKA-5440
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1, 0.11.0.0
>Reporter: Matthias J. Sax
> Fix For: 0.11.0.1, 1.0.0
>
>
> From the mailing list:
> {quote}
> Hi All,
> We recently implemented a health check for a Kafka Streams based application. 
> The health check is simply checking the state of Kafka Streams by calling 
> KafkaStreams.state(). It reports healthy if it’s not in PENDING_SHUTDOWN or 
> NOT_RUNNING states. 
> We truly appreciate having the possibility to easily check the state of Kafka 
> Streams but to our surprise we noticed that KafkaStreams.state() returns 
> RUNNING even though all StreamThreads have crashed and reached the NOT_RUNNING 
> state. Is this intended behaviour or is it a bug? Semantically it seems weird 
> to me that KafkaStreams would say it’s RUNNING when it is in fact not 
> consuming anything since all underlying worker threads have crashed. 
> If this is intended behaviour I would appreciate an explanation of why that 
> is the case. Also in that case, how could I determine if the consumption from 
> Kafka hasn’t crashed? 
> If this is not intended behaviour, how fast could I expect it to be fixed? I 
> wouldn’t mind fixing it myself but I’m not sure if this is considered trivial 
> or big enough to require a JIRA. Also, if I were to implement a fix, I'd like 
> your input on what a reasonable solution would be. By just inspecting the code 
> I have an idea, but I'm not sure I understand all the implications, so I'd be 
> happy to hear your thoughts first. 
> {quote}
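> 
> For reference, the health check described in the quote looks roughly like the sketch below against the 0.11-era Streams API (application id, broker address and topics are made up). A state listener is also registered, since watching transitions is one way to notice dying threads that a bare {{state()}} poll can miss.
> {code}
> import java.util.Properties;
> 
> import org.apache.kafka.streams.KafkaStreams;
> import org.apache.kafka.streams.StreamsConfig;
> import org.apache.kafka.streams.kstream.KStreamBuilder;
> 
> public class StreamsHealthCheck {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(StreamsConfig.APPLICATION_ID_CONFIG, "health-check-demo");  // assumed id
>         props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed address
> 
>         KStreamBuilder builder = new KStreamBuilder();
>         builder.stream("input-topic").to("output-topic");                     // assumed topics
> 
>         KafkaStreams streams = new KafkaStreams(builder, props);
> 
>         // Observe transitions instead of only polling state().
>         streams.setStateListener((newState, oldState) ->
>             System.out.println("Streams state: " + oldState + " -> " + newState));
> 
>         streams.start();
> 
>         // The check from the quote: shutdown states are unhealthy. As reported above, the
>         // instance may still answer RUNNING even after every StreamThread has died.
>         KafkaStreams.State state = streams.state();
>         boolean healthy = state != KafkaStreams.State.PENDING_SHUTDOWN
>                 && state != KafkaStreams.State.NOT_RUNNING;
>         System.out.println("healthy=" + healthy);
>     }
> }
> {code}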



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-5735) Client-ids are not handled consistently by clients and broker

2017-08-16 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-5735:
-

Assignee: Mickael Maison  (was: Rajini Sivaram)

> Client-ids are not handled consistently by clients and broker
> -
>
> Key: KAFKA-5735
> URL: https://issues.apache.org/jira/browse/KAFKA-5735
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, core
>Affects Versions: 0.11.0.0
>Reporter: Rajini Sivaram
>Assignee: Mickael Maison
> Fix For: 1.0.0
>
>
> At the moment, Java clients expect client-ids to use a limited set of 
> characters so that they can be used without quoting in metrics. 
> kafka-configs.sh allows quotas to be defined only for that limited set. But 
> the broker does not validate client-ids. And the documentation does not 
> mention any limitations. Existing non-Java clients do not place any 
> restrictions on client-ids and hence introducing restrictions on the 
> broker-side now will be a breaking change. So we should allow any characters 
> and treat them consistently in the same way as we handle user principals.
> Changes required:
> 1. Client-id in metrics should be sanitized using URL-encoding similar to the 
> encoding used for user principal in quota metrics. This leaves metrics for 
> client-ids using the current limited set of characters as-is, but will allow 
> arbitrary characters in encoded form. To avoid sanitizing multiple times and 
> to avoid unsanitized ids being used by mistake in some metrics, it may be 
> better to introduce a ClientId class that stores the sanitized id and uses 
> appropriate methods to retrieve the id for metrics etc.
> 2. Quota metrics and sensors as well as ZooKeeper quota configuration paths 
> should use sanitized ids for client-ids (they already do for user principal).
> 3. Remove client-id validation in kafka-configs.sh and allow any characters 
> for client-id similar to usernames, URL-encoding the names to generate ZK 
> path.
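> 
> Point 1 amounts to something like the sketch below (illustrative class; the real change would reuse whatever sanitization user principals already go through): URL-encode the raw client-id before it is embedded in metric names or ZooKeeper paths, so ids using the current limited character set stay unchanged and arbitrary ones become safe in encoded form.
> {code}
> import java.io.UnsupportedEncodingException;
> import java.net.URLDecoder;
> import java.net.URLEncoder;
> 
> class ClientIdSanitizer {
>     static String sanitize(String clientId) {
>         try {
>             return URLEncoder.encode(clientId, "UTF-8");
>         } catch (UnsupportedEncodingException e) {
>             throw new IllegalStateException("UTF-8 is always supported", e);
>         }
>     }
> 
>     static String desanitize(String sanitized) {
>         try {
>             return URLDecoder.decode(sanitized, "UTF-8");
>         } catch (UnsupportedEncodingException e) {
>             throw new IllegalStateException("UTF-8 is always supported", e);
>         }
>     }
> 
>     public static void main(String[] args) {
>         String raw = "mobile app / v1.2";          // arbitrary characters are allowed
>         String safe = sanitize(raw);               // "mobile+app+%2F+v1.2"
>         System.out.println(safe + " -> " + desanitize(safe));
>     }
> }
> {code}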



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5684) KStreamPrintProcessor as customized KStreamPeekProcessor

2017-08-16 Thread Paolo Patierno (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno resolved KAFKA-5684.
---
Resolution: Feedback Received

> KStreamPrintProcessor as customized KStreamPeekProcessor
> 
>
> Key: KAFKA-5684
> URL: https://issues.apache.org/jira/browse/KAFKA-5684
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Minor
>
> Hi,
> the {{KStreamPrintProcessor}} is implemented from scratch (from the 
> {{AbstractProcessor}}), and so is the related supplier.
> It looks to me like it is just a special {{KStreamPeekProcessor}} with 
> forwardDownStream set to false, plus the ability to specify the Serdes 
> instances used when keys/values are bytes.
> At the same time, used by the {{print()}} method, it provides a fast way to print 
> data flowing through the pipeline (whereas with just {{peek()}} you have to 
> write that code yourself).
> I think it could be useful to refactor {{KStreamPrintProcessor}} so that it 
> derives from {{KStreamPeekProcessor}} and customizes its behavior.
> Thanks,
> Paolo.
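> 
> For context, the overlap being pointed out: in the 0.11-era DSL, {{print()}} terminates a branch while {{peek()}} forwards records downstream, so hand-written printing via {{peek()}} looks roughly like the sketch below (application id, broker address and topics are made up).
> {code}
> import java.util.Properties;
> 
> import org.apache.kafka.common.serialization.Serdes;
> import org.apache.kafka.streams.KafkaStreams;
> import org.apache.kafka.streams.StreamsConfig;
> import org.apache.kafka.streams.kstream.KStream;
> import org.apache.kafka.streams.kstream.KStreamBuilder;
> 
> public class PrintViaPeek {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(StreamsConfig.APPLICATION_ID_CONFIG, "print-via-peek");     // assumed id
>         props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed address
> 
>         KStreamBuilder builder = new KStreamBuilder();
>         KStream<String, String> stream =
>             builder.stream(Serdes.String(), Serdes.String(), "input-topic");  // assumed topic
> 
>         // peek() forwards downstream, so the pipeline continues after the side-effect print;
>         // print() would instead end the branch and handle the formatting for you.
>         stream.peek((key, value) -> System.out.println("[input-topic] " + key + " , " + value))
>               .to(Serdes.String(), Serdes.String(), "output-topic");
> 
>         new KafkaStreams(builder, props).start();
>     }
> }
> {code}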



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5684) KStreamPrintProcessor as customized KStreamPeekProcessor

2017-08-16 Thread Paolo Patierno (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128423#comment-16128423
 ] 

Paolo Patierno commented on KAFKA-5684:
---

I'm going to close this JIRA (and the related PR). This issue will be addressed 
through the KIP-182.

> KStreamPrintProcessor as customized KStreamPeekProcessor
> 
>
> Key: KAFKA-5684
> URL: https://issues.apache.org/jira/browse/KAFKA-5684
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Minor
>
> Hi,
> the {{KStreamPrintProcessor}} is implemented from scratch (from the 
> {{AbstractProcessor}}) and the same for the related supplier.
> It looks to me like it is just a special {{KStreamPeekProcessor}} with 
> forwardDownStream set to false, plus the ability to specify the Serdes 
> instances used when keys/values are bytes.
> At the same time, used by the {{print()}} method, it provides a fast way to print 
> data flowing through the pipeline (whereas with just {{peek()}} you have to 
> write that code yourself).
> I think it could be useful to refactor {{KStreamPrintProcessor}} so that it 
> derives from {{KStreamPeekProcessor}} and customizes its behavior.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5684) KStreamPrintProcessor as customized KStreamPeekProcessor

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128422#comment-16128422
 ] 

ASF GitHub Bot commented on KAFKA-5684:
---

Github user ppatierno closed the pull request at:

https://github.com/apache/kafka/pull/3636


> KStreamPrintProcessor as customized KStreamPeekProcessor
> 
>
> Key: KAFKA-5684
> URL: https://issues.apache.org/jira/browse/KAFKA-5684
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Paolo Patierno
>Assignee: Paolo Patierno
>Priority: Minor
>
> Hi,
> the {{KStreamPrintProcessor}} is implemented from scratch (from the 
> {{AbstractProcessor}}) and the same for the related supplier.
> It looks to me like it is just a special {{KStreamPeekProcessor}} with 
> forwardDownStream set to false, plus the ability to specify the Serdes 
> instances used when keys/values are bytes.
> At the same time, used by the {{print()}} method, it provides a fast way to print 
> data flowing through the pipeline (whereas with just {{peek()}} you have to 
> write that code yourself).
> I think it could be useful to refactor {{KStreamPrintProcessor}} so that it 
> derives from {{KStreamPeekProcessor}} and customizes its behavior.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)