[jira] [Commented] (KAFKA-8576) Consumer failed to join the coordinator

2019-06-21 Thread yanrui (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870040#comment-16870040
 ] 

yanrui commented on KAFKA-8576:
---

The problem was resolved after the broker restarted.

> Consumer failed to join the coordinator
> ---
>
> Key: KAFKA-8576
> URL: https://issues.apache.org/jira/browse/KAFKA-8576
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: yanrui
>Priority: Blocker
> Attachments: image-2019-06-21-10-52-38-762.png
>
>
> Environment:
>  single-node Kafka (2.1.1), 6 GB memory, 4 cores
>  client (0.11.0.1)
>  number of consumer groups: 1170
> After running for a while, consumers can't join the coordinator. Describing
> the group reports that it is not the correct coordinator. The consumer is
> trapped in an endless loop of discovering the group coordinator and then
> marking it dead. Please help analyze the cause, thank you very much.
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-8576) Consumer failed to join the coordinator

2019-06-21 Thread yanrui (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870039#comment-16870039
 ] 

yanrui edited comment on KAFKA-8576 at 6/22/19 3:56 AM:


Is it possible that the LeaderAndIsr request sent by the controller was lost 
during the broker restart, causing some partitions of __consumer_offsets not 
to be successfully loaded, so that 'ownedPartitions' does not include those 
partitions?


was (Author: yanrui):

Is it possible that the LeaderAndIsr request sent by the controller was lost 
during the broker restart, causing some partitions of __consumer_offsets not 
to be successfully loaded, so that 'ownedPartitions' does not include those partitions

> Consumer failed to join the coordinator
> ---
>
> Key: KAFKA-8576
> URL: https://issues.apache.org/jira/browse/KAFKA-8576
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: yanrui
>Priority: Blocker
> Attachments: image-2019-06-21-10-52-38-762.png
>
>
> Environment:
>  single-node Kafka (2.1.1), 6 GB memory, 4 cores
>  client (0.11.0.1)
>  number of consumer groups: 1170
> After running for a while, consumers can't join the coordinator. Describing
> the group reports that it is not the correct coordinator. The consumer is
> trapped in an endless loop of discovering the group coordinator and then
> marking it dead. Please help analyze the cause, thank you very much.
>





[jira] [Commented] (KAFKA-8576) Consumer failed to join the coordinator

2019-06-21 Thread yanrui (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870039#comment-16870039
 ] 

yanrui commented on KAFKA-8576:
---


Is it possible that the LeaderAndIsr request sent by the controller was lost 
during the broker restart, causing some partitions of __consumer_offsets not 
to be successfully loaded, so that 'ownedPartitions' does not include those 
partitions?

> Consumer failed to join the coordinator
> ---
>
> Key: KAFKA-8576
> URL: https://issues.apache.org/jira/browse/KAFKA-8576
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: yanrui
>Priority: Blocker
> Attachments: image-2019-06-21-10-52-38-762.png
>
>
> Environment:
>  single-node Kafka (2.1.1), 6 GB memory, 4 cores
>  client (0.11.0.1)
>  number of consumer groups: 1170
> After running for a while, consumers can't join the coordinator. Describing
> the group reports that it is not the correct coordinator. The consumer is
> trapped in an endless loop of discovering the group coordinator and then
> marking it dead. Please help analyze the cause, thank you very much.
>





[jira] [Commented] (KAFKA-8576) Consumer failed to join the coordinator

2019-06-21 Thread yanrui (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870022#comment-16870022
 ] 

yanrui commented on KAFKA-8576:
---

2019-05-27 15:16:09 860 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
(Re-)joining group fm-base-message
2019-05-27 15:16:09 866 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Marking the coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) dead for 
group fm-base-message
2019-05-27 15:16:09 975 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Discovered coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) for group 
fm-base-message.
2019-05-27 15:16:09 976 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
(Re-)joining group fm-base-message
2019-05-27 15:16:10 042 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Marking the coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) dead for 
group fm-base-message
2019-05-27 15:16:10 161 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Discovered coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) for group 
fm-base-message.
2019-05-27 15:16:10 161 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
(Re-)joining group fm-base-message
2019-05-27 15:16:10 176 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Marking the coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) dead for 
group fm-base-message
2019-05-27 15:16:10 348 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Discovered coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) for group 
fm-base-message.
2019-05-27 15:16:10 349 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
(Re-)joining group fm-base-message
2019-05-27 15:16:10 358 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Marking the coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) dead for 
group fm-base-message
2019-05-27 15:16:10 467 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Discovered coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) for group 
fm-base-message.
2019-05-27 15:16:10 467 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
(Re-)joining group fm-base-message
2019-05-27 15:16:10 473 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-23] - 
Marking the coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) dead for 
group fm-base-message
2019-05-27 15:16:10 479 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-22] - 
Discovered coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) for group 
fm-base-message.
2019-05-27 15:16:10 544 INFO  
[org.apache.kafka.clients.consumer.internals.AbstractCoordinator][Thread-22] - 
Marking the coordinator 172.20.0.26:9092 (id: 2147483647 rack: null) dead for 
group fm-base-message

> Consumer failed to join the coordinator
> ---
>
> Key: KAFKA-8576
> URL: https://issues.apache.org/jira/browse/KAFKA-8576
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: yanrui
>Priority: Blocker
> Attachments: image-2019-06-21-10-52-38-762.png
>
>
> Environment:
>  single-node Kafka (2.1.1), 6 GB memory, 4 cores
>  client (0.11.0.1)
>  number of consumer groups: 1170
> After running for a while, consumers can't join the coordinator. Describing
> the group reports that it is not the correct coordinator. The consumer is
> trapped in an endless loop of discovering the group coordinator and then
> marking it dead. Please help analyze the cause, thank you very much.
>





[jira] [Created] (KAFKA-8586) Source task producers silently fail to send records

2019-06-21 Thread Chris Egerton (JIRA)
Chris Egerton created KAFKA-8586:


 Summary: Source task producers silently fail to send records
 Key: KAFKA-8586
 URL: https://issues.apache.org/jira/browse/KAFKA-8586
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.3.0
Reporter: Chris Egerton


The Connect framework marks source records as successfully sent when they are 
dispatched to the producer, instead of when they are actually sent to Kafka. 
[This is assumed to be good 
enough|https://github.com/apache/kafka/blob/3e9d1c1411c5268de382f9dfcc95bdf66d0063a0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L324-L331]
 since the Connect framework sets up its producer to use infinite retries on 
retriable errors, but in the case of an authorization or authentication failure 
with a secured Kafka broker, the errors aren't retriable and cause the producer 
to invoke its send callback with an exception and then give up on sending the 
message. This is a problem since the callback currently used by the 
WorkerSourceTask class when it invokes Producer.send(...) logs the exception 
and does nothing else. This leads to data loss since the source offsets for 
those failed records are committed, and the status of the task is never 
affected so users may not even know that something is wrong unless they check 
the worker log files or notice that data isn't flowing into Kafka. Until and 
unless someone does notice that something's wrong, the task will continue 
processing records and committing offsets, even though nothing is making it 
into Kafka.
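A hedged sketch of one possible fix, using made-up names rather than the actual WorkerSourceTask code: have the send callback record the first fatal exception so the task loop can surface it before committing source offsets, instead of only logging it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch (not the real Connect code): a send callback that
// records the first fatal exception rather than just logging it, so the
// task can fail loudly before committing the offsets of lost records.
public class FailureTrackingCallback {

    private final AtomicReference<Exception> firstFailure = new AtomicReference<>();

    // Mirrors the shape of the producer's Callback#onCompletion contract.
    public void onCompletion(Object metadata, Exception exception) {
        if (exception != null) {
            // Keep only the first failure; later ones are usually symptoms.
            firstFailure.compareAndSet(null, exception);
        }
    }

    // The task would check this before committing offsets and rethrow,
    // failing the task instead of silently dropping records.
    public Exception maybeFirstFailure() {
        return firstFailure.get();
    }
}
```

The key design point is that the callback only captures state; the task thread itself decides when to check and throw, keeping failure handling on the thread that owns offset commits.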





[jira] [Assigned] (KAFKA-8586) Source task producers silently fail to send records

2019-06-21 Thread Chris Egerton (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton reassigned KAFKA-8586:


Assignee: Chris Egerton

> Source task producers silently fail to send records
> ---
>
> Key: KAFKA-8586
> URL: https://issues.apache.org/jira/browse/KAFKA-8586
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.3.0
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
>
> The Connect framework marks source records as successfully sent when they are 
> dispatched to the producer, instead of when they are actually sent to Kafka. 
> [This is assumed to be good 
> enough|https://github.com/apache/kafka/blob/3e9d1c1411c5268de382f9dfcc95bdf66d0063a0/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L324-L331]
>  since the Connect framework sets up its producer to use infinite retries on 
> retriable errors, but in the case of an authorization or authentication 
> failure with a secured Kafka broker, the errors aren't retriable and cause 
> the producer to invoke its send callback with an exception and then give up 
> on sending the message. This is a problem since the callback currently used 
> by the WorkerSourceTask class when it invokes Producer.send(...) logs the 
> exception and does nothing else. This leads to data loss since the source 
> offsets for those failed records are committed, and the status of the task is 
> never affected so users may not even know that something is wrong unless they 
> check the worker log files or notice that data isn't flowing into Kafka. 
> Until and unless someone does notice that something's wrong, the task will 
> continue processing records and committing offsets, even though nothing is 
> making it into Kafka.





[jira] [Commented] (KAFKA-8584) Allow "bytes" type to generate a ByteBuffer rather than byte arrays

2019-06-21 Thread SuryaTeja Duggi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869968#comment-16869968
 ] 

SuryaTeja Duggi commented on KAFKA-8584:


Let me know if there is anything else I should know.

> Allow "bytes" type to generate a ByteBuffer rather than byte arrays
> 
>
> Key: KAFKA-8584
> URL: https://issues.apache.org/jira/browse/KAFKA-8584
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Guozhang Wang
>Assignee: SuryaTeja Duggi
>Priority: Major
>  Labels: newbie
>
> Right now in the RPC definition, the type {{bytes}} is translated into 
> {{byte[]}} in generated Java code. However, for some requests such as 
> ProduceRequest#partitionData, the underlying type would be better as a 
> ByteBuffer rather than a byte array.
> One proposal is to add an additional boolean tag {{useByteBuffer}} for the 
> {{bytes}} type, defaulting to false; when set to {{true}}, the corresponding 
> field is generated as {{ByteBuffer}} instead of {{byte[]}}. 





[jira] [Assigned] (KAFKA-8584) Allow "bytes" type to generate a ByteBuffer rather than byte arrays

2019-06-21 Thread SuryaTeja Duggi (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SuryaTeja Duggi reassigned KAFKA-8584:
--

Assignee: SuryaTeja Duggi

> Allow "bytes" type to generate a ByteBuffer rather than byte arrays
> 
>
> Key: KAFKA-8584
> URL: https://issues.apache.org/jira/browse/KAFKA-8584
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Guozhang Wang
>Assignee: SuryaTeja Duggi
>Priority: Major
>  Labels: newbie
>
> Right now in the RPC definition, the type {{bytes}} is translated into 
> {{byte[]}} in generated Java code. However, for some requests such as 
> ProduceRequest#partitionData, the underlying type would be better as a 
> ByteBuffer rather than a byte array.
> One proposal is to add an additional boolean tag {{useByteBuffer}} for the 
> {{bytes}} type, defaulting to false; when set to {{true}}, the corresponding 
> field is generated as {{ByteBuffer}} instead of {{byte[]}}. 





[jira] [Created] (KAFKA-8585) Controller should make LeaderAndIsr updates optimistically

2019-06-21 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-8585:
--

 Summary: Controller should make LeaderAndIsr updates optimistically
 Key: KAFKA-8585
 URL: https://issues.apache.org/jira/browse/KAFKA-8585
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson


Before the controller updates LeaderAndIsr information for a set of partitions, 
it always first looks up the current state. We can skip this since the state is 
already cached on the controller. In the common case (e.g. controlled 
shutdown), the update will succeed. If there was a change which had not been 
propagated to the controller, we can retry (this logic exists already).
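The optimistic update-and-retry described above can be sketched as a compare-and-set against the cached state. Names and the state layout here are illustrative, not the controller's actual code:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the optimistic pattern: apply the update against
// the controller's cached version and retry only on conflict, instead of
// reading the current state first on every update.
public class OptimisticIsrUpdate {

    // Stands in for the cached LeaderAndIsr state plus its version.
    static final class Versioned {
        final String state;
        final int version;
        Versioned(String state, int version) { this.state = state; this.version = version; }
    }

    private final AtomicReference<Versioned> cache =
            new AtomicReference<>(new Versioned("leader=1,isr=[1,2,3]", 0));

    // Optimistic write: assume the cached version is current. If a concurrent
    // change invalidated it, the compare-and-set fails and the loop retries
    // (the retry logic the ticket notes already exists).
    public int update(String newState) {
        while (true) {
            Versioned current = cache.get();
            Versioned next = new Versioned(newState, current.version + 1);
            if (cache.compareAndSet(current, next)) {
                return next.version;  // common case: first attempt succeeds
            }
        }
    }
}
```

In the common case (e.g. controlled shutdown) the first attempt wins, so the per-update read is saved entirely.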





[jira] [Comment Edited] (KAFKA-8574) EOS race condition during task transition leads to LocalStateStore truncation in Kafka Streams 2.0.1

2019-06-21 Thread William Greer (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869529#comment-16869529
 ] 

William Greer edited comment on KAFKA-8574 at 6/21/19 11:13 PM:


This is a different issue from KAFKA-8187. KAFKA-8187 covered possible local 
data loss when standby tasks were in use. This issue can cause data loss 
regardless of whether standbys are used, as long as EOS is enabled. This issue 
may be made less likely by KAFKA-7672, but the race condition that makes it 
possible still exists even with the KAFKA-7672 change-set applied. The race 
conditions around unprotected access to the checkpoint files (writes and reads 
of the checkpoint without a lock) still exist in trunk.


was (Author: wgreerx):
This is a different issue from KAFKA-8187. KAFKA-8187 covered possible local 
data loss when standby tasks were in use. This issue can cause data loss 
regardless of whether standbys are used, as long as EOS is enabled. This issue 
may be mitigated by KAFKA-7672, but the race condition that makes it possible 
still exists even with the KAFKA-7672 change-set applied. The race conditions 
around unprotected access to the checkpoint files (writes and reads of the 
checkpoint without a lock) still exist in trunk; whether there are any 
correctness issues when the race condition occurs is a different question.

> EOS race condition during task transition leads to LocalStateStore truncation 
> in Kafka Streams 2.0.1
> 
>
> Key: KAFKA-8574
> URL: https://issues.apache.org/jira/browse/KAFKA-8574
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.1
>Reporter: William Greer
>Priority: Major
>
> *Overview*
>  While using EOS in Kafka Streams there is a race condition where the 
> checkpoint file is written by the previous owning thread (Thread A) after the 
> new owning thread (Thread B) reads the checkpoint file. Thread B then starts 
> a restoration, since no checkpoint file was found. A rebalance occurs before 
> Thread B completes the restoration, and a third thread (Thread C) becomes the 
> owning thread; Thread C reads the checkpoint file written by Thread A, which 
> does not correspond to the current state of the RocksDB state store. When 
> this race condition occurs, the state store will have the most recent records 
> and some of the oldest records, but will be missing records in between. If 
> A->Z represents the entire changelog to the present, then the state store 
> would contain records [A->K and Y->Z], missing the records K->Y.
>   
>  This race condition is possible due to dirty writes and dirty reads of the 
> checkpoint file.
>   
>  *Example:*
>  Thread refers to a Kafka Streams StreamThread [0]
>  Thread A, B and C are running in the same JVM in the same streams 
> application.
>   
>  Scenario:
>  Thread-A is in RUNNING state and up to date on partition 1.
>  Thread-A is suspended on 1. This does not write a checkpoint file because 
> EOS is enabled [1]
>  Thread-B is assigned to 1
>  Thread-B does not find checkpoint in StateManager [2]
>  Thread-A is assigned a different partition. Task writes suspended tasks 
> checkpoints to disk. Checkpoint for 1 is written. [3]
>  Thread-B deletes LocalStore and starts restoring. The deletion of the 
> LocalStore does not delete checkpoint file. [4]
>  Thread-C is revoked
>  Thread-A is revoked
>  Thread-B is revoked from the assigned status. Does not write a checkpoint 
> file
>  - Note Thread-B never reaches the running state, it remains in the 
> PARTITIONS_ASSIGNED state until it transitions to the PARTITIONS_REVOKED state
> Thread-C is assigned 1
>  Thread-C finds checkpoint in StateManager. This checkpoint corresponds to 
> where Thread-A left the state store for partition 1 at and not where Thread-B 
> left the state store at.
>  Thread-C begins restoring from checkpoint. The state store is missing an 
> unknown number of records at this point
>  Thread-B is assigned, but does not write a checkpoint file for partition 1 
> because it had not reached the RUNNING state before being revoked
>   
>  [0] 
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java]
>  [1] 
> [https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L522-L553]
>  [2] 
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/ProcessorStateManager.java#L98]
>  [3] 
> 

[jira] [Updated] (KAFKA-8574) EOS race condition during task transition leads to LocalStateStore truncation in Kafka Streams 2.0.1

2019-06-21 Thread William Greer (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Greer updated KAFKA-8574:
-
Description: 
*Overview*
 While using EOS in Kafka Streams there is a race condition where the checkpoint 
file is written by the previous owning thread (Thread A) after the new owning 
thread (Thread B) reads the checkpoint file. Thread B then starts a restoration, 
since no checkpoint file was found. A rebalance occurs before Thread B completes 
the restoration, and a third thread (Thread C) becomes the owning thread; 
Thread C reads the checkpoint file written by Thread A, which does not 
correspond to the current state of the RocksDB state store. When this race 
condition occurs, the state store will have the most recent records and some 
of the oldest records, but will be missing records in between. If A->Z 
represents the entire changelog to the present, then the state store would 
contain records [A->K and Y->Z], missing the records K->Y.
  
 This race condition is possible due to dirty writes and dirty reads of the 
checkpoint file.
  
 *Example:*
 Thread refers to a Kafka Streams StreamThread [0]
 Thread A, B and C are running in the same JVM in the same streams application.
  
 Scenario:
 Thread-A is in RUNNING state and up to date on partition 1.
 Thread-A is suspended on 1. This does not write a checkpoint file because EOS 
is enabled [1]
 Thread-B is assigned to 1
 Thread-B does not find checkpoint in StateManager [2]
 Thread-A is assigned a different partition. Task writes suspended tasks 
checkpoints to disk. Checkpoint for 1 is written. [3]
 Thread-B deletes LocalStore and starts restoring. The deletion of the 
LocalStore does not delete checkpoint file. [4]
 Thread-C is revoked
 Thread-A is revoked
 Thread-B is revoked from the assigned status. Does not write a checkpoint file
 - Note Thread-B never reaches the running state, it remains in the 
PARTITIONS_ASSIGNED state until it transitions to the PARTITIONS_REVOKED state

Thread-C is assigned 1
 Thread-C finds checkpoint in StateManager. This checkpoint corresponds to 
where Thread-A left the state store for partition 1 at and not where Thread-B 
left the state store at.
 Thread-C begins restoring from checkpoint. The state store is missing an 
unknown number of records at this point
 Thread-B is assigned, but does not write a checkpoint file for partition 1 because 
it had not reached the RUNNING state before being revoked
  
 [0] 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java]
 [1] 
[https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L522-L553]
 [2] 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/ProcessorStateManager.java#L98]
 [3] 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java#L104-L105]
 & 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/AssignedTasks.java#L316-L331]
 [4] 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java#L228]
 & 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/AbstractStateManager.java#L62-L123]
 Specifically 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/AbstractStateManager.java#L107-L119]
 is where the state store is deleted but the checkpoint file is not.
  
 *How we recovered:*
 1. Deleted the impacted state store. This triggered multiple exceptions and 
initiated a re-balance.
  
 *Possible approaches to address this issue:*
 1. Add a collection of global task locks for concurrency protection of the 
checkpoint file. With the lock for suspended tasks being released after 
closeNonAssignedSuspendedTasks and the locks being acquired after lock release 
for the assigned tasks.
 2. Delete checkpoint file in EOS when partitions are revoked. This doesn't 
address the race condition but would make it so that the checkpoint file would 
never be ahead of the LocalStore in EOS, this would increase the likelihood of 
triggering a full restoration of a LocalStore on partition movement between 
threads on one host.
 3. Configure task stickiness for StreamThreads. E.G. if a host with multiple 
StreamThreads is assigned a task the host had before prefer to assign the task 
to the thread on the host that had the task before.
 4. Add a new state that splits the PARTITIONS_ASSIGNED state to a clean up 
previous assignment step and a bootstrap new assignment. This would require all 
valid threads to 

[jira] [Created] (KAFKA-8584) Allow "bytes" type to generate a ByteBuffer rather than byte arrays

2019-06-21 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-8584:


 Summary: Allow "bytes" type to generate a ByteBuffer rather than 
byte arrays
 Key: KAFKA-8584
 URL: https://issues.apache.org/jira/browse/KAFKA-8584
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang


Right now in the RPC definition, the type {{bytes}} is translated into 
{{byte[]}} in generated Java code. However, for some requests such as 
ProduceRequest#partitionData, the underlying type would be better as a 
ByteBuffer rather than a byte array.

One proposal is to add an additional boolean tag {{useByteBuffer}} for the 
{{bytes}} type, defaulting to false; when set to {{true}}, the corresponding 
field is generated as {{ByteBuffer}} instead of {{byte[]}}. 
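As a small illustration of why the generated type matters (illustrative code, not the message generator itself): ByteBuffer.wrap gives a zero-copy view over an existing array, while a byte[] accessor typically has to copy defensively.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: a ByteBuffer field can expose payload data without
// copying it, while a byte[] field typically forces a defensive copy.
public class BytesVsByteBuffer {

    // Zero-copy: the buffer is a view over the same backing array.
    public static boolean wrapSharesArray(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        return buf.array() == data;
    }

    // Defensive copy: a new array, so the payload is duplicated in memory.
    public static boolean copySharesArray(byte[] data) {
        byte[] copy = Arrays.copyOf(data, data.length);
        return copy == data;
    }
}
```

For large produce payloads, avoiding that per-record copy is the motivation for generating ByteBuffer fields.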





[jira] [Commented] (KAFKA-7500) MirrorMaker 2.0 (KIP-382)

2019-06-21 Thread Ryanne Dolan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869925#comment-16869925
 ] 

Ryanne Dolan commented on KAFKA-7500:
-

[~abellemare] Thanks for trying it out.

> 1) Are the offset translation topics included... KAFKA-7915

Yes, I've included the required changes to SourceTask in PR-6295.

> 2) ...switching a consumer from one cluster to another...

So glad you asked :)

The key is the RemoteClusterUtils.translateOffsets() method. This consumes from 
the checkpoint topic (not the offset-sync topic directly), which has both 
upstream and downstream offsets for each consumer group. The downstream offsets 
are calculated based on the offset-sync stream, of course, but 
MirrorCheckpointConnector does the translation for you. This makes the 
translateOffsets() method rather straightforward -- it just finds the most 
recent checkpoint for a given group.

The translateOffsets() method works both ways: you can translate from a source 
topic ("topic1") to a remote topic ("us-east.topic1") and vice versa, which 
means your failover and failback logic is identical. In both cases you just 
migrate all your consumer groups from one cluster to another.

Also note that migration only requires information that is already stored on 
the target cluster (the checkpoints), so you do not need to connect to a failed 
cluster in order to failover from it. Obviously that would defeat the purpose!

Based on translateOffsets(), you can do several nifty things with respect to 
failover/failback, e.g. build scripts that bulk-migrate consumer groups from 
one cluster to another, or add consumer client logic that automatically 
fails over to a secondary cluster as needed.

In the former case, you can use kafka-consumer-groups.sh to "reset offsets" to 
those returned by translateOffsets(). This will cause consumer state to be 
transferred to the target cluster, in effect. In the latter, you can use 
translateOffsets() with KafkaConsumer.seek().
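The lookup described above (find the most recent checkpoint for a given group) can be modeled in a few lines. This is a hedged sketch with made-up names, not the real RemoteClusterUtils or checkpoint classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical model of the translation described above: "translation" is
// just finding the most recent checkpoint for a group, because the
// checkpoint connector already computed the downstream offsets.
public class CheckpointLookup {

    // One (upstreamOffset, downstreamOffset) pair from the checkpoint topic.
    static final class Checkpoint {
        final long upstream;
        final long downstream;
        Checkpoint(long upstream, long downstream) {
            this.upstream = upstream;
            this.downstream = downstream;
        }
    }

    private final Map<String, List<Checkpoint>> byGroup = new HashMap<>();

    // Append a checkpoint as the checkpoint connector would emit it.
    public void record(String group, long upstream, long downstream) {
        byGroup.computeIfAbsent(group, g -> new ArrayList<>())
               .add(new Checkpoint(upstream, downstream));
    }

    // "Translate": the downstream offset of the group's latest checkpoint.
    public OptionalLong translate(String group) {
        List<Checkpoint> cps = byGroup.getOrDefault(group, new ArrayList<>());
        if (cps.isEmpty()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(cps.get(cps.size() - 1).downstream);
    }
}
```

A failover script would feed the translated offsets to a consumer-group reset (or KafkaConsumer.seek()) on the target cluster.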

There are more advanced operations and architectures you can build using MM2 as 
well. Some are outlined in the following talk (by me): 
https://www.slideshare.net/ConfluentInc/disaster-recovery-with-mirrormaker-20-ryanne-dolan-cloudera-kafka-summit-london-2019

> 3) Do you use timestamps at all for failing over from one cluster to another?

MM2 preserves the timestamps of replicated records, but otherwise does not care 
about timestamps. Nor does failover need to involve any timestamps.

Ryanne

> MirrorMaker 2.0 (KIP-382)
> -
>
> Key: KAFKA-7500
> URL: https://issues.apache.org/jira/browse/KAFKA-7500
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect, mirrormaker
>Reporter: Ryanne Dolan
>Priority: Minor
> Fix For: 2.4.0
>
>
> Implement a drop-in replacement for MirrorMaker leveraging the Connect 
> framework.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0]
> [https://github.com/apache/kafka/pull/6295]





[jira] [Updated] (KAFKA-8583) Optimization for SslTransportLayer#write(ByteBuffer)

2019-06-21 Thread Mao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao updated KAFKA-8583:
---
Description: 
This is a proposal to optimize SslTransportLayer#write(ByteBuffer).

Currently, sslEngine only reads data from src once per selector loop.

This can be optimized by reading all of the src data in one selector loop.

The proposal comes from [Ambry|https://github.com/linkedin/ambry/pull/1105], 
which uses the same transport-layer code; the change has been deployed in 
production for 6 months. Unfortunately, Ambry didn't record the perf diff.

Code change: [https://github.com/apache/kafka/pull/6984]

Let me know if anything else is needed. 

 

  was:
This is a proposal to optimize SslTransportLayer#write(ByteBuffer).

Currently, sslEngine only reads data from src once per selector loop.

This can be optimized by reading all of the src data in one selector loop.

The proposal comes from [Ambry|https://github.com/linkedin/ambry/pull/1105], 
which uses the same transport-layer code.

Code change: [https://github.com/apache/kafka/pull/6984]

Let me know if anything else is needed. 

 


> Optimization for SslTransportLayer#write(ByteBuffer)
> 
>
> Key: KAFKA-8583
> URL: https://issues.apache.org/jira/browse/KAFKA-8583
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Reporter: Mao
>Priority: Major
>
> This is a proposal to optimize SslTransportLayer#write(ByteBuffer).
> Currently, sslEngine only reads data from src once per selector loop.
> This can be optimized by reading all of the src data in one selector loop.
> The proposal comes from [Ambry|https://github.com/linkedin/ambry/pull/1105], 
> which uses the same transport-layer code; the change has been deployed in 
> production for 6 months. Unfortunately, Ambry didn't record the perf diff.
>  
> Code change: [https://github.com/apache/kafka/pull/6984]
> Let me know if anything else is needed. 
>  
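The proposed change can be sketched as a drain loop. This is a simplified stand-in that assumes the engine consumes a bounded amount of src per call; it is not the real SSLEngine or SslTransportLayer code:

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

// Simplified sketch of the proposal: instead of one wrap() call per selector
// loop, keep handing src to the engine until it is drained or the engine
// stops making progress (e.g. its network buffer is full). The "engine" is a
// stand-in function, not the real javax.net.ssl.SSLEngine.
public class DrainingWrite {

    // Returns total bytes consumed from src across repeated engine calls.
    public static int writeAll(ByteBuffer src, Function<ByteBuffer, Integer> engineWrap) {
        int total = 0;
        while (src.hasRemaining()) {
            int n = engineWrap.apply(src);
            if (n <= 0) {
                break;  // no progress; give up until the next selector loop
            }
            total += n;
        }
        return total;
    }
}
```

The no-progress check matters: without it, a full network buffer would spin the loop instead of returning control to the selector.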





[jira] [Updated] (KAFKA-8583) Optimization for SslTransportLayer#write(ByteBuffer)

2019-06-21 Thread Mao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mao updated KAFKA-8583:
---
Description: 
The is a propose to optimize SslTransportLayer#write(ByteBuffer).

Currently, sslEngine only reads data from src once per selector loop.

This can be optimized by reading all src data in one selector loop.

The propose comes from [Amrby|[https://github.com/linkedin/ambry/pull/1105]], 
which uses same Transport Layer code. 

 

Code change: [https://github.com/apache/kafka/pull/6984]

Let me know if anything needed. 

 

  was:
The is a propose to optimize SslTransportLayer#write(ByteBuffer).

Currently, sslEngine only reads data from src once per selector loop.

This can be optimized by reading all src data in one selector loop.

The proposal comes from [Ambry|https://github.com/linkedin/ambry/pull/1105], 
which uses the same transport layer code. 

 


> Optimization for SslTransportLayer#write(ByteBuffer)
> 
>
> Key: KAFKA-8583
> URL: https://issues.apache.org/jira/browse/KAFKA-8583
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Reporter: Mao
>Priority: Major
>
> This is a proposal to optimize SslTransportLayer#write(ByteBuffer).
> Currently, sslEngine only reads data from src once per selector loop.
> This can be optimized by reading all src data in one selector loop.
> The proposal comes from [Ambry|https://github.com/linkedin/ambry/pull/1105], 
> which uses the same transport layer code. 
>  
> Code change: [https://github.com/apache/kafka/pull/6984]
> Let me know if anything else is needed. 
>  





[jira] [Commented] (KAFKA-8583) Optimization for SslTransportLayer#write(ByteBuffer)

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869908#comment-16869908
 ] 

ASF GitHub Bot commented on KAFKA-8583:
---

zzmao commented on pull request #6984: KAFKA-8583: Optimization for 
SslTransportLayer#write(ByteBuffer)
URL: https://github.com/apache/kafka/pull/6984
 
 
   Wrap as much data as possible in SslTransportLayer#write(ByteBuffer).
   The change comes from Ambry, which uses the same TransportLayer code.
   https://github.com/linkedin/ambry/pull/1105
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Optimization for SslTransportLayer#write(ByteBuffer)
> 
>
> Key: KAFKA-8583
> URL: https://issues.apache.org/jira/browse/KAFKA-8583
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Reporter: Mao
>Priority: Major
>
> This is a proposal to optimize SslTransportLayer#write(ByteBuffer).
> Currently, sslEngine only reads data from src once per selector loop.
> This can be optimized by reading all src data in one selector loop.
> The proposal comes from [Ambry|https://github.com/linkedin/ambry/pull/1105], 
> which uses the same transport layer code. 
>  





[jira] [Created] (KAFKA-8583) Optimization for SslTransportLayer#write(ByteBuffer)

2019-06-21 Thread Mao (JIRA)
Mao created KAFKA-8583:
--

 Summary: Optimization for SslTransportLayer#write(ByteBuffer)
 Key: KAFKA-8583
 URL: https://issues.apache.org/jira/browse/KAFKA-8583
 Project: Kafka
  Issue Type: Improvement
  Components: network
Reporter: Mao


This is a proposal to optimize SslTransportLayer#write(ByteBuffer).

Currently, sslEngine only reads data from src once per selector loop.

This can be optimized by reading all src data in one selector loop.

The proposal comes from [Ambry|https://github.com/linkedin/ambry/pull/1105], 
which uses the same transport layer code. 

 





[jira] [Commented] (KAFKA-8515) Materialize KTable when upstream uses Windowed instead of

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869865#comment-16869865
 ] 

ASF GitHub Bot commented on KAFKA-8515:
---

abbccdda commented on pull request #6885: KAFKA-8515: (POC) add state store 
type in KTable for correct materialization
URL: https://github.com/apache/kafka/pull/6885
 
 
   
 



> Materialize KTable when upstream uses Windowed instead of 
> 
>
> Key: KAFKA-8515
> URL: https://issues.apache.org/jira/browse/KAFKA-8515
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>






[jira] [Commented] (KAFKA-8106) Reducing the allocation and copying of ByteBuffer when LogValidator does validation.

2019-06-21 Thread Guozhang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869862#comment-16869862
 ] 

Guozhang Wang commented on KAFKA-8106:
--

The merged PR is inspired by the original work of #6699 by [~Flower.min]. It 
tries to achieve the same CPU savings by reducing unnecessary byte allocation 
and the corresponding GC. Unlike #6699, which depends on LZ4's skipBytes 
backed by a shared byte array, in this PR we create a skip buffer outside of 
the compressed input stream. The reason is that not all compressed input 
stream implementations are optimized; more specifically:

1. GZIP uses BufferedInputStream, which has a shared buffer, sized 16KB.
2. SNAPPY uses its own SnappyInputStream -> InputStream, which allocates 
dynamically.
3. LZ4 uses its own KafkaLZ4BlockInputStream, which has a shared buffer of 64KB.
4. ZSTD uses its own ZstdInputStream, but its own overridden skip also 
allocates dynamically.

The detailed implementation can be summarized as follows:

1. Add skipKeyValueIterator() to DefaultRecordBatch, used in LogValidator; 
also add PartialDefaultRecord, which extends DefaultRecord.
1.a. To make this optimization really effective, we also refactor the 
LogValidator to move part of the per-record validation into the outer loop so 
that we no longer need to update inPlaceAssignment inside the loop. Based on 
this boolean we can then decide, before the loop, whether or not to use the 
skipKeyValueIterator.

1.b. Also use the streaming iterator when the skip iterator cannot be used.

2. With the SkipKeyValueIterator, pre-allocate a skip byte array with a fixed 
size (2KB), and use this array to take the decompressed bytes through each 
record, validating metadata and key / value / header sizes while skipping the 
key / value bytes.

3. Also tighten the LogValidator unit tests to make sure scenarios like 
mismatched magic bytes / multiple batches per partition / discontinuous offsets 
/ etc. are indeed validated.

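The skip-buffer idea in point 2 can be sketched as follows. This is an illustrative reduction, not the merged PR's code; skipBytes and SKIP_BUFFER are hypothetical names. The point is that discarding key/value bytes needs only one small reusable buffer instead of a per-record ByteBuffer allocation:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: drain bytes we only need to size-validate through one
// fixed, reused buffer (2KB in the merged PR) instead of allocating per record.
public class SkipBufferSketch {
    private static final byte[] SKIP_BUFFER = new byte[2048];

    // Discards exactly n bytes from the stream without per-record allocation.
    static void skipBytes(InputStream in, int n) throws IOException {
        while (n > 0) {
            int read = in.read(SKIP_BUFFER, 0, Math.min(n, SKIP_BUFFER.length));
            if (read < 0) throw new IOException("unexpected end of stream");
            n -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10_000]);
        skipBytes(in, 9_999);
        System.out.println(in.read()); // one byte left: prints 0
        System.out.println(in.read()); // exhausted: prints -1
    }
}
```

Note the loop handles short reads, which matters because the underlying compressed streams may return fewer bytes than requested.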
> Reducing the allocation and copying of ByteBuffer when LogValidator does 
> validation.
> 
>
> Key: KAFKA-8106
> URL: https://issues.apache.org/jira/browse/KAFKA-8106
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.2.0, 2.1.1
> Environment: Server : 
> cpu:2*16 ; 
> MemTotal : 256G;
> Ethernet controller:Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network 
> Connection ; 
> SSD.
>Reporter: Flower.min
>Assignee: Flower.min
>Priority: Major
>  Labels: performance
> Fix For: 2.4.0
>
>
>       We performed performance testing of Kafka in the specific scenarios 
> described below. We built a Kafka cluster with one broker and created topics 
> with different numbers of partitions. We then started many producer processes 
> to send large volumes of messages to one of the topics per test.
> *_Specific Scenario_*
>   
>  *_1.Main config of Kafka_*  
>  # Main config of Kafka 
> server: num.network.threads=6; num.io.threads=128; queued.max.requests=500
>  # Number of TopicPartition : 50~2000
>  # Size of Single Message : 1024B
>  
>  *_2.Config of KafkaProducer_* 
> ||compression.type||linger.ms||batch.size||buffer.memory||
> |lz4|1000ms~5000ms|16KB/10KB/100KB|128MB|
> *_3.The best result of performance testing_*  
> ||Network inflow rate||CPU Used (%)||Disk write speed||Performance of 
> production||
> |550MB/s~610MB/s|97%~99%|550MB/s~610MB/s       |23,000,000 messages/s|
> *_4.Phenomenon and my doubt_*
>     _The upper limit of CPU usage has been reached, but the upper limit of 
> the server's network bandwidth has not. *We wanted to find out what costs so 
> much CPU time, in order to improve performance and reduce the CPU usage of 
> the Kafka server.*_
> _*5.Analysis*_
>         We analyzed the JFR recordings of the Kafka server during performance 
> testing. After checking and re-running the performance test, we located the 
> code "*ByteBuffer recordBuffer = 
> ByteBuffer.allocate(sizeOfBodyInBytes);*" (*Class: DefaultRecord, Function: readFrom()*), 
> which consumed CPU resources and caused a lot of GC. Our modified code 
> reduces the allocation and copying of ByteBuffer, so the test performance is 
> greatly improved, and CPU usage stays stable *below 60%*. The following is a 
> comparison of test performance of the different code versions under the same 
> conditions.
> *Result of performance testing*
> *Main config of Kafka: Single 
> Message:1024B;TopicPartitions:200;linger.ms:1000ms.*
> | Single Message : 1024B,|Network inflow rate|CPU(%)|Messages/s|
> |Source code|600M/s|97%|25,000,000|
> |Modified 

[jira] [Updated] (KAFKA-8575) Investigate cleaning up task suspension (part 8)

2019-06-21 Thread Sophie Blee-Goldman (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sophie Blee-Goldman updated KAFKA-8575:
---
Summary: Investigate cleaning up task suspension (part 8)  (was: 
Investigate cleaning up task suspension (part 7))

> Investigate cleaning up task suspension (part 8)
> 
>
> Key: KAFKA-8575
> URL: https://issues.apache.org/jira/browse/KAFKA-8575
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> With KIP-429 the suspend/resume of tasks may have minimal gains while adding 
> a lot of complexity and potential bugs. We should consider removing/cleaning 
> it up.





[jira] [Created] (KAFKA-8582) Consider adding an ExpiredWindowRecordHandler to Suppress

2019-06-21 Thread John Roesler (JIRA)
John Roesler created KAFKA-8582:
---

 Summary: Consider adding an ExpiredWindowRecordHandler to Suppress
 Key: KAFKA-8582
 URL: https://issues.apache.org/jira/browse/KAFKA-8582
 Project: Kafka
  Issue Type: Improvement
Reporter: John Roesler


I got some feedback on Suppress:
{quote}Specifying how to handle events outside the grace period does seem like 
a business concern, and simply discarding them thus seems risky (for example 
imagine any situation where money is involved).

This sort of situation is addressed by the late-triggering approach associated 
with watermarks 
(https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102), given 
this I wondered if you were considering adding anything similar?{quote}

It seems like, if a record has arrived past the grace period for its window, 
then the state of the windowed aggregation would already have been lost, so if 
we were to compute an aggregation result, it would be incorrect. Plus, since 
the window is already expired, we can't store the new (incorrect, but more 
importantly expired) aggregation result either, so any subsequent super-late 
records would also face the same blank-slate. I think this would wind up 
looking like this: if you have three timely records for a window, and then 
three more that arrive after the grace period, and you were doing a count 
aggregation, you'd see the counts emitted for the window as [1, 2, 3, 1, 1, 1]. 
I guess we could add a flag to the post-expiration results to indicate that 
they're broken, but this seems like the wrong approach. The post-expiration 
aggregation _results_ are meaningless, but I could see wanting to send the 
past-expiration _input records_ to a dead-letter queue or something instead of 
dropping them.

Along this line of thinking, I wonder if we should add an optional 
past-expiration record handler interface to the suppression operator. Then, you 
could define your own logic, whether it's a dead-letter queue, sending it to 
some alerting pipeline, or even just crashing the application before it can do 
something wrong. This would be a similar pattern to how we allow custom logic 
to handle deserialization errors by supplying a 
org.apache.kafka.streams.errors.DeserializationExceptionHandler.
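A hypothetical shape for such a handler, mirroring how DeserializationExceptionHandler lets users plug in custom logic. This interface is not part of Kafka's API; all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a pluggable handler for input records that arrive
// after their window's grace period, so users can route them to a dead-letter
// queue, alert, or fail instead of silently dropping them.
public class ExpiredWindowSketch {
    interface ExpiredWindowRecordHandler<K, V> {
        enum Response { CONTINUE, FAIL }
        Response handle(K key, V value, long recordTimestamp, long windowEnd);
    }

    public static void main(String[] args) {
        List<String> deadLetters = new ArrayList<>();
        // Example handler: record the late key/timestamp and keep processing.
        ExpiredWindowRecordHandler<String, Long> handler = (k, v, ts, end) -> {
            deadLetters.add(k + "@" + ts);
            return ExpiredWindowRecordHandler.Response.CONTINUE;
        };
        // A record whose timestamp falls past the window end plus grace:
        handler.handle("user-1", 3L, 1_000L, 900L);
        System.out.println(deadLetters); // [user-1@1000]
    }
}
```

The suppress operator would invoke the handler instead of computing a meaningless post-expiration aggregate, keeping the expired input record available to the application.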






[jira] [Assigned] (KAFKA-8581) Augment ProduceResponse error messaging for specific culprit records

2019-06-21 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang reassigned KAFKA-8581:


Assignee: Guozhang Wang

> Augment ProduceResponse error messaging for specific culprit records
> 
>
> Key: KAFKA-8581
> URL: https://issues.apache.org/jira/browse/KAFKA-8581
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, core
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: needs-kip, user-experience
>
> We want to augment the error messaging from broker to producer clients in 
> order to let clients implement finer-grained handling logic and report 
> more meaningful error messages to the caller.





[jira] [Updated] (KAFKA-8581) Augment ProduceResponse error messaging for specific culprit records

2019-06-21 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-8581:
-
Labels: needs-kip user-experience  (was: user-experience)

> Augment ProduceResponse error messaging for specific culprit records
> 
>
> Key: KAFKA-8581
> URL: https://issues.apache.org/jira/browse/KAFKA-8581
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, core
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: needs-kip, user-experience
>
> We want to augment the error messaging from broker to producer clients in 
> order to let clients implement finer-grained handling logic and report 
> more meaningful error messages to the caller.





[jira] [Updated] (KAFKA-8581) Augment ProduceResponse error messaging for specific culprit records

2019-06-21 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-8581:
-
Description: We want to augment the error messaging from broker to producer 
clients in order to let clients implement finer-grained handling logic and 
report more meaningful error messages to the caller.

> Augment ProduceResponse error messaging for specific culprit records
> 
>
> Key: KAFKA-8581
> URL: https://issues.apache.org/jira/browse/KAFKA-8581
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, core
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: user-experience
>
> We want to augment the error messaging from broker to producer clients in 
> order to let clients implement finer-grained handling logic and report 
> more meaningful error messages to the caller.





[jira] [Created] (KAFKA-8581) Augment ProduceResponse error messaging for specific culprit records

2019-06-21 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-8581:


 Summary: Augment ProduceResponse error messaging for specific 
culprit records
 Key: KAFKA-8581
 URL: https://issues.apache.org/jira/browse/KAFKA-8581
 Project: Kafka
  Issue Type: Bug
  Components: clients, core
Reporter: Guozhang Wang








[jira] [Resolved] (KAFKA-8106) Reducing the allocation and copying of ByteBuffer when LogValidator does validation.

2019-06-21 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-8106.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> Reducing the allocation and copying of ByteBuffer when LogValidator does 
> validation.
> 
>
> Key: KAFKA-8106
> URL: https://issues.apache.org/jira/browse/KAFKA-8106
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.2.0, 2.1.1
> Environment: Server : 
> cpu:2*16 ; 
> MemTotal : 256G;
> Ethernet controller:Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network 
> Connection ; 
> SSD.
>Reporter: Flower.min
>Assignee: Flower.min
>Priority: Major
>  Labels: performance
> Fix For: 2.4.0
>
>
>       We performed performance testing of Kafka in the specific scenarios 
> described below. We built a Kafka cluster with one broker and created topics 
> with different numbers of partitions. We then started many producer processes 
> to send large volumes of messages to one of the topics per test.
> *_Specific Scenario_*
>   
>  *_1.Main config of Kafka_*  
>  # Main config of Kafka 
> server: num.network.threads=6; num.io.threads=128; queued.max.requests=500
>  # Number of TopicPartition : 50~2000
>  # Size of Single Message : 1024B
>  
>  *_2.Config of KafkaProducer_* 
> ||compression.type||linger.ms||batch.size||buffer.memory||
> |lz4|1000ms~5000ms|16KB/10KB/100KB|128MB|
> *_3.The best result of performance testing_*  
> ||Network inflow rate||CPU Used (%)||Disk write speed||Performance of 
> production||
> |550MB/s~610MB/s|97%~99%|550MB/s~610MB/s       |23,000,000 messages/s|
> *_4.Phenomenon and my doubt_*
>     _The upper limit of CPU usage has been reached, but the upper limit of 
> the server's network bandwidth has not. *We wanted to find out what costs so 
> much CPU time, in order to improve performance and reduce the CPU usage of 
> the Kafka server.*_
> _*5.Analysis*_
>         We analyzed the JFR recordings of the Kafka server during performance 
> testing. After checking and re-running the performance test, we located the 
> code "*ByteBuffer recordBuffer = 
> ByteBuffer.allocate(sizeOfBodyInBytes);*" (*Class: DefaultRecord, Function: readFrom()*), 
> which consumed CPU resources and caused a lot of GC. Our modified code 
> reduces the allocation and copying of ByteBuffer, so the test performance is 
> greatly improved, and CPU usage stays stable *below 60%*. The following is a 
> comparison of test performance of the different code versions under the same 
> conditions.
> *Result of performance testing*
> *Main config of Kafka: Single 
> Message:1024B;TopicPartitions:200;linger.ms:1000ms.*
> | Single Message : 1024B,|Network inflow rate|CPU(%)|Messages/s|
> |Source code|600M/s|97%|25,000,000|
> |Modified code|1GB/s|<60%|41,660,000|
> **1.Before modified code(Source code) GC:**
> ![](https://i.loli.net/2019/05/07/5cd16df163ad3.png)
> **2.After modified code(remove allocation of ByteBuffer) GC:**
> ![](https://i.loli.net/2019/05/07/5cd16dae1dbc2.png)





[jira] [Commented] (KAFKA-8106) Reducing the allocation and copying of ByteBuffer when LogValidator does validation.

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869836#comment-16869836
 ] 

ASF GitHub Bot commented on KAFKA-8106:
---

guozhangwang commented on pull request #6785: KAFKA-8106: Skipping ByteBuffer 
allocation of key / value / headers in logValidator
URL: https://github.com/apache/kafka/pull/6785
 
 
   
 



> Reducing the allocation and copying of ByteBuffer when LogValidator does 
> validation.
> 
>
> Key: KAFKA-8106
> URL: https://issues.apache.org/jira/browse/KAFKA-8106
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.2.0, 2.1.1
> Environment: Server : 
> cpu:2*16 ; 
> MemTotal : 256G;
> Ethernet controller:Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network 
> Connection ; 
> SSD.
>Reporter: Flower.min
>Assignee: Flower.min
>Priority: Major
>  Labels: performance
>
>       We performed performance testing of Kafka in the specific scenarios 
> described below. We built a Kafka cluster with one broker and created topics 
> with different numbers of partitions. We then started many producer processes 
> to send large volumes of messages to one of the topics per test.
> *_Specific Scenario_*
>   
>  *_1.Main config of Kafka_*  
>  # Main config of Kafka 
> server: num.network.threads=6; num.io.threads=128; queued.max.requests=500
>  # Number of TopicPartition : 50~2000
>  # Size of Single Message : 1024B
>  
>  *_2.Config of KafkaProducer_* 
> ||compression.type||linger.ms||batch.size||buffer.memory||
> |lz4|1000ms~5000ms|16KB/10KB/100KB|128MB|
> *_3.The best result of performance testing_*  
> ||Network inflow rate||CPU Used (%)||Disk write speed||Performance of 
> production||
> |550MB/s~610MB/s|97%~99%|550MB/s~610MB/s       |23,000,000 messages/s|
> *_4.Phenomenon and my doubt_*
>     _The upper limit of CPU usage has been reached, but the upper limit of 
> the server's network bandwidth has not. *We wanted to find out what costs so 
> much CPU time, in order to improve performance and reduce the CPU usage of 
> the Kafka server.*_
> _*5.Analysis*_
>         We analyzed the JFR recordings of the Kafka server during performance 
> testing. After checking and re-running the performance test, we located the 
> code "*ByteBuffer recordBuffer = 
> ByteBuffer.allocate(sizeOfBodyInBytes);*" (*Class: DefaultRecord, Function: readFrom()*), 
> which consumed CPU resources and caused a lot of GC. Our modified code 
> reduces the allocation and copying of ByteBuffer, so the test performance is 
> greatly improved, and CPU usage stays stable *below 60%*. The following is a 
> comparison of test performance of the different code versions under the same 
> conditions.
> *Result of performance testing*
> *Main config of Kafka: Single 
> Message:1024B;TopicPartitions:200;linger.ms:1000ms.*
> | Single Message : 1024B,|Network inflow rate|CPU(%)|Messages/s|
> |Source code|600M/s|97%|25,000,000|
> |Modified code|1GB/s|<60%|41,660,000|
> **1.Before modified code(Source code) GC:**
> ![](https://i.loli.net/2019/05/07/5cd16df163ad3.png)
> **2.After modified code(remove allocation of ByteBuffer) GC:**
> ![](https://i.loli.net/2019/05/07/5cd16dae1dbc2.png)





[jira] [Commented] (KAFKA-8557) Support named listeners in system tests

2019-06-21 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869716#comment-16869716
 ] 

Rajini Sivaram commented on KAFKA-8557:
---

Merged [https://github.com/apache/kafka/pull/6938], which adds support for an 
inter-broker listener and a client listener with the same security protocol on 
different listeners. Leaving this JIRA open to add full support for named 
listeners.

> Support named listeners in system tests
> ---
>
> Key: KAFKA-8557
> URL: https://issues.apache.org/jira/browse/KAFKA-8557
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Stanislav Vodetskyi
>Priority: Major
>  Labels: system-tests
>
> Kafka currently supports named listeners, where you can have two or more 
> listeners with the same security protocol but different names. The current 
> {{KafkaService}} implementation, however, doesn't allow that, since 
> listeners in {{port_mappings}} are keyed by {{security_protocol}}, so there's 
> a 1-1 relationship. Kafka clients in system tests use {{bootstrap_servers()}} 
> method, which also accepts {{security_protocol}}, as a way to pick a port to 
> talk to kafka.
> The scope of this jira is to refactor KafkaService to support named 
> listeners, specifically two things - ability to have custom-named listeners 
> and ability to have several listeners with the same security protocol. 





[jira] [Resolved] (KAFKA-8563) Minor: Remove method call in NetworkSend. Rely on Java's varargs boxing/autoboxing

2019-06-21 Thread Ismael Juma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-8563.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Minor: Remove method call in NetworkSend. Rely on Java's varargs 
> boxing/autoboxing
> 
>
> Key: KAFKA-8563
> URL: https://issues.apache.org/jira/browse/KAFKA-8563
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 2.4.0
> Environment: Darwin WM-CXX 18.2.0 Darwin Kernel Version 18.2.0: 
> Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64 x86_64
> ProductName:  Mac OS X
> ProductVersion:   10.14.3
> java version "1.8.0_201"
> Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
> Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
>Reporter: karan kumar
>Priority: Minor
> Fix For: 2.4.0
>
>
> There is a method call at 
> [https://github.com/apache/kafka/blob/93bf96589471acadfb90e57ebfecbd91f679f77b/clients/src/main/java/org/apache/kafka/common/network/NetworkSend.java#L30]
>  which can be removed from the NetworkSend class. 
>  
> Initial JMH benchmarks suggest no performance penalty.
>  
> Present network send JMH report:
>  
> {code:java}
> jmh-benchmarks git:(trunk) ✗ ./jmh.sh -f 2 ByteBufferSendBenchmark
> running gradlew :jmh-benchmarks:clean :jmh-benchmarks:shadowJar in quiet mode
> ./jmh.sh: line 34: ../gradlew: No such file or directory
> gradle build done
> running JMH with args [-f 2 ByteBufferSendBenchmark]
> # JMH version: 1.21
> # VM version: JDK 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, 25.201-b09
> # VM invoker: 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home/jre/bin/java
> # VM options: 
> # Warmup: 5 iterations, 2000 ms each
> # Measurement: 5 iterations, 5000 ms each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: 
> org.apache.kafka.jmh.common.ByteBufferSendBenchmark.benchmarkMethod
> # Run progress: 0.00% complete, ETA 00:01:10
> # Fork: 1 of 2
> # Warmup Iteration 1: 35.049 ops/us
> # Warmup Iteration 2: 60.877 ops/us
> # Warmup Iteration 3: 59.207 ops/us
> # Warmup Iteration 4: 59.077 ops/us
> # Warmup Iteration 5: 59.327 ops/us
> Iteration 1: 58.516 ops/us
> Iteration 2: 58.952 ops/us
> Iteration 3: 58.596 ops/us
> Iteration 4: 59.126 ops/us
> Iteration 5: 58.557 ops/us
> # Run progress: 50.00% complete, ETA 00:00:35
> # Fork: 2 of 2
> # Warmup Iteration 1: 36.377 ops/us
> # Warmup Iteration 2: 61.741 ops/us
> # Warmup Iteration 3: 59.683 ops/us
> # Warmup Iteration 4: 59.571 ops/us
> # Warmup Iteration 5: 59.351 ops/us
> Iteration 1: 59.044 ops/us
> Iteration 2: 59.107 ops/us
> Iteration 3: 57.771 ops/us
> Iteration 4: 59.648 ops/us
> Iteration 5: 59.408 ops/us
> Result "org.apache.kafka.jmh.common.ByteBufferSendBenchmark.benchmarkMethod":
> 58.872 ±(99.9%) 0.806 ops/us [Average]
> (min, avg, max) = (57.771, 58.872, 59.648), stdev = 0.533
> CI (99.9%): [58.066, 59.679] (assumes normal distribution)
> # Run complete. Total time: 00:01:11
> REMEMBER: The numbers below are just data. To gain reusable insights, you 
> need to follow up on
> why the numbers are the way they are. Use profilers (see -prof, -lprof), 
> design factorial
> experiments, perform baseline and negative tests that provide experimental 
> control, make sure
> the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from 
> the domain experts.
> Do not assume the numbers tell you what you want them to tell.
> Benchmark Mode Cnt Score Error Units
> *ByteBufferSendBenchmark.benchmarkMethod thrpt 10 58.872 ± 0.806 ops/us*
> JMH benchmarks done
> {code}
> and after removing the method call
>  
> {code:java}
> // code placeholder
> ./jmh.sh: line 34: ../gradlew: No such file or directory
> gradle build done
> running JMH with args [-f 2 ByteBufferSendBenchmark]
> # JMH version: 1.21
> # VM version: JDK 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, 25.201-b09
> # VM invoker: 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home/jre/bin/java
> # VM options: 
> # Warmup: 5 iterations, 2000 ms each
> # Measurement: 5 iterations, 5000 ms each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: 
> org.apache.kafka.jmh.common.ByteBufferSendBenchmark.benchmarkMethod
> # Run progress: 0.00% complete, ETA 00:01:10
> # Fork: 1 of 2
> # Warmup Iteration 1: 34.273 ops/us
> # Warmup Iteration 2: 61.565 ops/us
> # Warmup Iteration 3: 59.307 ops/us
> # Warmup Iteration 4: 57.081 ops/us
> # Warmup Iteration 5: 59.970 ops/us
> Iteration 1: 59.657 ops/us
> Iteration 2: 59.607 ops/us
> Iteration 3: 59.931 ops/us
> 
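The varargs point in the title can be illustrated generically. The names below (sizeDelimit, totalRemaining) are illustrative, not the actual NetworkSend code: a helper that builds a ByteBuffer[] by hand can be dropped when the callee declares varargs, since the compiler creates the array at the call site anyway.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of inlining an array-building helper via varargs.
public class VarargsSketch {
    // Before: helper materializes the buffer array explicitly.
    static ByteBuffer[] sizeDelimit(ByteBuffer payload) {
        ByteBuffer size = ByteBuffer.allocate(4).putInt(0, payload.remaining());
        return new ByteBuffer[] { size, payload };
    }

    // Varargs callee: callers pass buffers directly; javac builds the array.
    static int totalRemaining(ByteBuffer... buffers) {
        int total = 0;
        for (ByteBuffer b : buffers) total += b.remaining();
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.allocate(100);
        // Before: an explicit intermediate array from the helper.
        int before = totalRemaining(sizeDelimit(payload));
        // After: the same arguments inlined through varargs.
        ByteBuffer size = ByteBuffer.allocate(4).putInt(0, payload.remaining());
        int after = totalRemaining(size, payload);
        System.out.println(before + " " + after); // both 104
    }
}
```

Both forms allocate one array per call, which is consistent with the JMH result above showing no performance penalty from the change.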

[jira] [Commented] (KAFKA-8563) Minor: Remove method call in NetworkSend. Rely on Java's varargs boxing/autoboxing

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869684#comment-16869684
 ] 

ASF GitHub Bot commented on KAFKA-8563:
---

ijuma commented on pull request #6967: KAFKA-8563: removing sizeDelimit call
URL: https://github.com/apache/kafka/pull/6967
 
 
   
 



> Minor: Remove method call in NetworkSend. Rely on Java's varargs 
> boxing/autoboxing
> 
>
> Key: KAFKA-8563
> URL: https://issues.apache.org/jira/browse/KAFKA-8563
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 2.4.0
> Environment: Darwin WM-CXX 18.2.0 Darwin Kernel Version 18.2.0: 
> Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64 x86_64
> ProductName:  Mac OS X
> ProductVersion:   10.14.3
> java version "1.8.0_201"
> Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
> Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
>Reporter: karan kumar
>Priority: Minor
>
> There is a method call at 
> [https://github.com/apache/kafka/blob/93bf96589471acadfb90e57ebfecbd91f679f77b/clients/src/main/java/org/apache/kafka/common/network/NetworkSend.java#L30]
>  which can be removed from the NetworkSend class. 
>  
> Initial JMH benchmarks suggest no performance penalty.
>  
> Present network send JMH report:
>  
> {code:java}
> jmh-benchmarks git:(trunk) ✗ ./jmh.sh -f 2 ByteBufferSendBenchmark
> running gradlew :jmh-benchmarks:clean :jmh-benchmarks:shadowJar in quiet mode
> ./jmh.sh: line 34: ../gradlew: No such file or directory
> gradle build done
> running JMH with args [-f 2 ByteBufferSendBenchmark]
> # JMH version: 1.21
> # VM version: JDK 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, 25.201-b09
> # VM invoker: 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home/jre/bin/java
> # VM options: 
> # Warmup: 5 iterations, 2000 ms each
> # Measurement: 5 iterations, 5000 ms each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: 
> org.apache.kafka.jmh.common.ByteBufferSendBenchmark.benchmarkMethod
> # Run progress: 0.00% complete, ETA 00:01:10
> # Fork: 1 of 2
> # Warmup Iteration 1: 35.049 ops/us
> # Warmup Iteration 2: 60.877 ops/us
> # Warmup Iteration 3: 59.207 ops/us
> # Warmup Iteration 4: 59.077 ops/us
> # Warmup Iteration 5: 59.327 ops/us
> Iteration 1: 58.516 ops/us
> Iteration 2: 58.952 ops/us
> Iteration 3: 58.596 ops/us
> Iteration 4: 59.126 ops/us
> Iteration 5: 58.557 ops/us
> # Run progress: 50.00% complete, ETA 00:00:35
> # Fork: 2 of 2
> # Warmup Iteration 1: 36.377 ops/us
> # Warmup Iteration 2: 61.741 ops/us
> # Warmup Iteration 3: 59.683 ops/us
> # Warmup Iteration 4: 59.571 ops/us
> # Warmup Iteration 5: 59.351 ops/us
> Iteration 1: 59.044 ops/us
> Iteration 2: 59.107 ops/us
> Iteration 3: 57.771 ops/us
> Iteration 4: 59.648 ops/us
> Iteration 5: 59.408 ops/us
> Result "org.apache.kafka.jmh.common.ByteBufferSendBenchmark.benchmarkMethod":
> 58.872 ±(99.9%) 0.806 ops/us [Average]
> (min, avg, max) = (57.771, 58.872, 59.648), stdev = 0.533
> CI (99.9%): [58.066, 59.679] (assumes normal distribution)
> # Run complete. Total time: 00:01:11
> REMEMBER: The numbers below are just data. To gain reusable insights, you 
> need to follow up on
> why the numbers are the way they are. Use profilers (see -prof, -lprof), 
> design factorial
> experiments, perform baseline and negative tests that provide experimental 
> control, make sure
> the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from 
> the domain experts.
> Do not assume the numbers tell you what you want them to tell.
> Benchmark Mode Cnt Score Error Units
> *ByteBufferSendBenchmark.benchmarkMethod thrpt 10 58.872 ± 0.806 ops/us*
> JMH benchmarks done
> {code}
> and after removing the method call
>  
> {code:java}
> // code placeholder
> ./jmh.sh: line 34: ../gradlew: No such file or directory
> gradle build done
> running JMH with args [-f 2 ByteBufferSendBenchmark]
> # JMH version: 1.21
> # VM version: JDK 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, 25.201-b09
> # VM invoker: 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home/jre/bin/java
> # VM options: 
> # Warmup: 5 iterations, 2000 ms each
> # Measurement: 5 iterations, 5000 ms each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: 
> 

[jira] [Commented] (KAFKA-6173) Leader should stop accepting requests when disconnected from ZK

2019-06-21 Thread jacky (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869680#comment-16869680
 ] 

jacky commented on KAFKA-6173:
--

Is there any plan to fix this? I also think the leader should stop accepting 
requests, but if the partition has only one replica, no other broker can take 
over leadership, so there is no need to reject requests.

> Leader should stop accepting requests when disconnected from ZK
> ---
>
> Key: KAFKA-6173
> URL: https://issues.apache.org/jira/browse/KAFKA-6173
> Project: Kafka
>  Issue Type: Bug
>Reporter: Kostas Christidis
>Priority: Minor
> Attachments: 0-before.png, 1-disconnect.png, 2-after.png
>
>
> h2. Setup
> 1 consumer (C1), 2 datacenters (DC1, DC2), a ZK ensemble located in DC1, and 
> 3 brokers (B1, B2, B3), where:
> * B1 is the cluster controller, located in DC1
> * B2 is the leader for partition "foo", located in DC2
> * B3 is a replica for partition "foo", located in DC1
> h2. Condition
> There is a network partition between DC1 and DC2
> h2. Actual behavior
> B2 will not relinquish leadership and will continue to serve C1. This 
> split-brain situation will go on until C1 refreshes its metadata and finds 
> out about the new leader.
> h2. Expected behavior
> B2 should stop accepting requests when it loses the ZK connection.
> --
> I am prioritizing this a "minor" bug because the multi-DC arrangement is not 
> optimal, and we'd be better off using a tool such as MirrorMaker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8557) Support named listeners in system tests

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869679#comment-16869679
 ] 

ASF GitHub Bot commented on KAFKA-8557:
---

rajinisivaram commented on pull request #6938: KAFKA-8557: system tests - add 
support for (optional) interbroker listener with the same security protocol as 
client listeners
URL: https://github.com/apache/kafka/pull/6938
 
 
   
 



> Support named listeners in system tests
> ---
>
> Key: KAFKA-8557
> URL: https://issues.apache.org/jira/browse/KAFKA-8557
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Stanislav Vodetskyi
>Priority: Major
>  Labels: system-tests
>
> Kafka currently supports named listeners, where you can have two or more 
> listeners with the same security protocol but different names. The current 
> {{KafkaService}} implementation, however, wouldn't allow that, since 
> listeners in {{port_mappings}} are keyed by {{security_protocol}}, so there's 
> a 1-1 relationship. Kafka clients in system tests use {{bootstrap_servers()}} 
> method, which also accepts {{security_protocol}}, as a way to pick a port to 
> talk to kafka.
> The scope of this jira is to refactor KafkaService to support named 
> listeners, specifically two things: the ability to have custom-named 
> listeners and the ability to have several listeners with the same security 
> protocol. 





[jira] [Resolved] (KAFKA-8570) Downconversion could fail when log contains out of order message formats

2019-06-21 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8570.

   Resolution: Fixed
Fix Version/s: 2.2.2
   2.1.2
   2.0.2
   2.3.1

> Downconversion could fail when log contains out of order message formats
> 
>
> Key: KAFKA-8570
> URL: https://issues.apache.org/jira/browse/KAFKA-8570
> Project: Kafka
>  Issue Type: Bug
>Reporter: Dhruvil Shah
>Assignee: Dhruvil Shah
>Priority: Major
> Fix For: 2.3.1, 2.0.2, 2.1.2, 2.2.2
>
>
> When the log contains out of order message formats (for example a v2 message 
> followed by a v1 message), it is possible for down-conversion to fail in 
> certain scenarios where batches are compressed and greater than 1 kB in size. 
> Down-conversion fails with a stack like the following:
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Buffer.java:275)
> at 
> org.apache.kafka.common.record.FileLogInputStream$FileChannelRecordBatch.writeTo(FileLogInputStream.java:176)
> at 
> org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:107)
> at 
> org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:242)





[jira] [Commented] (KAFKA-8570) Downconversion could fail when log contains out of order message formats

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869657#comment-16869657
 ] 

ASF GitHub Bot commented on KAFKA-8570:
---

hachikuji commented on pull request #6974: KAFKA-8570: Grow buffer to hold down 
converted records if it was insufficiently sized
URL: https://github.com/apache/kafka/pull/6974
 
 
   
 



> Downconversion could fail when log contains out of order message formats
> 
>
> Key: KAFKA-8570
> URL: https://issues.apache.org/jira/browse/KAFKA-8570
> Project: Kafka
>  Issue Type: Bug
>Reporter: Dhruvil Shah
>Assignee: Dhruvil Shah
>Priority: Major
>
> When the log contains out of order message formats (for example a v2 message 
> followed by a v1 message), it is possible for down-conversion to fail in 
> certain scenarios where batches are compressed and greater than 1 kB in size. 
> Down-conversion fails with a stack like the following:
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Buffer.java:275)
> at 
> org.apache.kafka.common.record.FileLogInputStream$FileChannelRecordBatch.writeTo(FileLogInputStream.java:176)
> at 
> org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:107)
> at 
> org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:242)
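For readers unfamiliar with the failure mode: the IllegalArgumentException in the top frame comes from java.nio.Buffer.limit rejecting a limit larger than the buffer's capacity, which is what happens when the pre-sized down-conversion buffer turns out to be too small for the batch. A minimal sketch of just that failure (class name and sizes are illustrative, not Kafka code):

```java
import java.nio.ByteBuffer;

public class BufferLimitDemo {
    public static void main(String[] args) {
        // Buffer sized from an estimate of the down-converted batch size.
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        int actualBatchSize = 2048; // the real batch is larger than estimated

        try {
            // Buffer.limit(n) throws IllegalArgumentException when n exceeds
            // the capacity, matching the top frame of the stack trace above.
            buffer.limit(actualBatchSize);
        } catch (IllegalArgumentException e) {
            System.out.println("limit(" + actualBatchSize
                    + ") rejected for capacity " + buffer.capacity());
        }
    }
}
```

The pull request referenced in this comment takes the other route: growing the destination buffer when the initial size estimate proves insufficient, instead of failing.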





[jira] [Resolved] (KAFKA-6408) Kafka MirrorMaker doesn't replicate messages when .* regex is used

2019-06-21 Thread Waleed Fateem (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Waleed Fateem resolved KAFKA-6408.
--
Resolution: Not A Problem

> Kafka MirrorMaker doesn't replicate messages when .* regex is used
> --
>
> Key: KAFKA-6408
> URL: https://issues.apache.org/jira/browse/KAFKA-6408
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.11.0.0
>Reporter: Waleed Fateem
>Priority: Minor
>
> When using the regular expression .* for the whitelist parameter in Kafka 
> MirrorMaker in order to mirror all topics, the MirrorMaker doesn't replicate 
> any messages. I was then able to see messages flowing again and being 
> replicated between the two Kafka clusters once I changed the whitelist 
> configuration to use another regular expression, such as 'topic1 | topic2 | 
> topic3' 





[jira] [Commented] (KAFKA-8578) Add Functionality to Expose RocksDB Metrics

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869620#comment-16869620
 ] 

ASF GitHub Bot commented on KAFKA-8578:
---

cadonna commented on pull request #6979: KAFKA-8578: Add basic functionality to 
expose RocksDB metrics
URL: https://github.com/apache/kafka/pull/6979
 
 
   - Adds `RocksDBMetrics` class that provides methods to get
 sensors from the Kafka metrics registry and to setup the
 sensors to record RocksDB metrics
   - Extends `StreamsMetricsImpl` with functionality to add the
 required metrics to the sensors.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 



> Add Functionality to Expose RocksDB Metrics
> ---
>
> Key: KAFKA-8578
> URL: https://issues.apache.org/jira/browse/KAFKA-8578
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> To expose RocksDB metrics as specified in KIP-471, functionality to create 
> and record metrics in the Kafka metrics registry is required. 





[jira] [Commented] (KAFKA-6408) Kafka MirrorMaker doesn't replicate messages when .* regex is used

2019-06-21 Thread Waleed Fateem (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869618#comment-16869618
 ] 

Waleed Fateem commented on KAFKA-6408:
--

Just closing the loop here. This is only an issue when you configure Kafka 
using the Cloudera Manager UI. The problem is with Cloudera Manager, not 
Kafka: it does not parse the regular expression correctly. 



Marking as resolved. 

> Kafka MirrorMaker doesn't replicate messages when .* regex is used
> --
>
> Key: KAFKA-6408
> URL: https://issues.apache.org/jira/browse/KAFKA-6408
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.11.0.0
>Reporter: Waleed Fateem
>Priority: Minor
>
> When using the regular expression .* for the whitelist parameter in Kafka 
> MirrorMaker in order to mirror all topics, the MirrorMaker doesn't replicate 
> any messages. I was then able to see messages flowing again and being 
> replicated between the two Kafka clusters once I changed the whitelist 
> configuration to use another regular expression, such as 'topic1 | topic2 | 
> topic3' 





[jira] [Updated] (KAFKA-8578) Add Functionality to Expose RocksDB Metrics

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-8578:
-
Summary: Add Functionality to Expose RocksDB Metrics  (was: Add Sensors and 
Metrics to Expose RocksDB Metrics)

> Add Functionality to Expose RocksDB Metrics
> ---
>
> Key: KAFKA-8578
> URL: https://issues.apache.org/jira/browse/KAFKA-8578
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> To expose RocksDB metrics as specified in KIP-471, functionality to create 
> and record metrics in the Kafka metrics registry is required. 





[jira] [Updated] (KAFKA-8578) Add Sensors and Metrics to Expose RocksDB Metrics

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-8578:
-
Description: To expose RocksDB metrics as specified in KIP-471, 
functionality to create and record metrics in the Kafka metrics registry is 
required.   (was: To expose RocksDB metrics as specified in KIP-471, a bunch of 
sensors and metrics in the Kafka metrics registry are required.)

> Add Sensors and Metrics to Expose RocksDB Metrics
> -
>
> Key: KAFKA-8578
> URL: https://issues.apache.org/jira/browse/KAFKA-8578
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> To expose RocksDB metrics as specified in KIP-471, functionality to create 
> and record metrics in the Kafka metrics registry is required. 





[jira] [Assigned] (KAFKA-8580) Compute RocksDB Metrics

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna reassigned KAFKA-8580:


Assignee: Bruno Cadonna

> Compute RocksDB Metrics
> ---
>
> Key: KAFKA-8580
> URL: https://issues.apache.org/jira/browse/KAFKA-8580
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
>
> Once the metrics for RocksDB are exposed, the metrics need to be recorded in 
> the RocksDB state stores.





[jira] [Assigned] (KAFKA-8578) Add Sensors and Metrics to Expose RocksDB Metrics

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna reassigned KAFKA-8578:


Assignee: Bruno Cadonna

> Add Sensors and Metrics to Expose RocksDB Metrics
> -
>
> Key: KAFKA-8578
> URL: https://issues.apache.org/jira/browse/KAFKA-8578
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> To expose RocksDB metrics as specified in KIP-471, a bunch of sensors and 
> metrics in the Kafka metrics registry are required.





[jira] [Updated] (KAFKA-8579) Expose RocksDB Metrics to JMX

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-8579:
-
Fix Version/s: 2.4.0

> Expose RocksDB Metrics to JMX
> -
>
> Key: KAFKA-8579
> URL: https://issues.apache.org/jira/browse/KAFKA-8579
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> Sensors and the corresponding metrics specified to record RocksDB metrics 
> need to be created in the RocksDB state stores. Once the metrics are created 
> and registered in the Kafka metrics registry, they are also exposed in JMX. 
> This ticket does not include the computation of the RocksDB metrics.  





[jira] [Updated] (KAFKA-8579) Expose RocksDB Metrics to JMX

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-8579:
-
Description: Sensors and the corresponding metrics specified to record 
RocksDB metrics need to be created in the RocksDB state stores. Once the 
metrics are created and registered in the Kafka metrics registry, they are also 
exposed in JMX. This ticket does not include the computation of the RocksDB 
metrics.    (was: Sensors and the corresponding metrics specified to record 
RocksDB metrics need to be created in the RocksDB state stores. Once the 
metrics are created and registered in the Kafka metrics registry, they are also 
exposed in JMX. This ticker does not include the computation of the RocksDB 
metrics.  )

> Expose RocksDB Metrics to JMX
> -
>
> Key: KAFKA-8579
> URL: https://issues.apache.org/jira/browse/KAFKA-8579
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> Sensors and the corresponding metrics specified to record RocksDB metrics 
> need to be created in the RocksDB state stores. Once the metrics are created 
> and registered in the Kafka metrics registry, they are also exposed in JMX. 
> This ticket does not include the computation of the RocksDB metrics.  





[jira] [Assigned] (KAFKA-8579) Expose RocksDB Metrics to JMX

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna reassigned KAFKA-8579:


Assignee: Bruno Cadonna

> Expose RocksDB Metrics to JMX
> -
>
> Key: KAFKA-8579
> URL: https://issues.apache.org/jira/browse/KAFKA-8579
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> Sensors and the corresponding metrics specified to record RocksDB metrics 
> need to be created in the RocksDB state stores. Once the metrics are created 
> and registered in the Kafka metrics registry, they are also exposed in JMX. 
> This ticket does not include the computation of the RocksDB metrics.  





[jira] [Updated] (KAFKA-8578) Add Sensors and Metrics to Expose RocksDB Metrics

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-8578:
-
Fix Version/s: 2.4.0

> Add Sensors and Metrics to Expose RocksDB Metrics
> -
>
> Key: KAFKA-8578
> URL: https://issues.apache.org/jira/browse/KAFKA-8578
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Major
> Fix For: 2.4.0
>
>
> To expose RocksDB metrics as specified in KIP-471, a bunch of sensors and 
> metrics in the Kafka metrics registry are required.





[jira] [Updated] (KAFKA-8578) Add Sensors and Metrics to Expose RocksDB Metrics

2019-06-21 Thread Bruno Cadonna (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna updated KAFKA-8578:
-
Component/s: streams

> Add Sensors and Metrics to Expose RocksDB Metrics
> -
>
> Key: KAFKA-8578
> URL: https://issues.apache.org/jira/browse/KAFKA-8578
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Bruno Cadonna
>Priority: Major
>
> To expose RocksDB metrics as specified in KIP-471, a bunch of sensors and 
> metrics in the Kafka metrics registry are required.





[jira] [Created] (KAFKA-8580) Compute RocksDB Metrics

2019-06-21 Thread Bruno Cadonna (JIRA)
Bruno Cadonna created KAFKA-8580:


 Summary: Compute RocksDB Metrics
 Key: KAFKA-8580
 URL: https://issues.apache.org/jira/browse/KAFKA-8580
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Bruno Cadonna


Once the metrics for RocksDB are exposed, the metrics need to be recorded in 
the RocksDB state stores.





[jira] [Created] (KAFKA-8579) Expose RocksDB Metrics to JMX

2019-06-21 Thread Bruno Cadonna (JIRA)
Bruno Cadonna created KAFKA-8579:


 Summary: Expose RocksDB Metrics to JMX
 Key: KAFKA-8579
 URL: https://issues.apache.org/jira/browse/KAFKA-8579
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Bruno Cadonna


Sensors and the corresponding metrics specified to record RocksDB metrics need 
to be created in the RocksDB state stores. Once the metrics are created and 
registered in the Kafka metrics registry, they are also exposed in JMX. This 
ticket does not include the computation of the RocksDB metrics.  





[jira] [Commented] (KAFKA-8519) Trogdor should support network degradation

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869588#comment-16869588
 ] 

ASF GitHub Bot commented on KAFKA-8519:
---

mumrah commented on pull request #6912: KAFKA-8519 Add trogdor action to slow 
down a network
URL: https://github.com/apache/kafka/pull/6912
 
 
   
 



> Trogdor should support network degradation
> --
>
> Key: KAFKA-8519
> URL: https://issues.apache.org/jira/browse/KAFKA-8519
> Project: Kafka
>  Issue Type: Improvement
>  Components: system tests
>Reporter: David Arthur
>Assignee: David Arthur
>Priority: Minor
>
> Trogdor should allow us to simulate degraded networks, similar to the network 
> partition spec.





[jira] [Created] (KAFKA-8578) Add Sensors and Metrics to Expose RocksDB Metrics

2019-06-21 Thread Bruno Cadonna (JIRA)
Bruno Cadonna created KAFKA-8578:


 Summary: Add Sensors and Metrics to Expose RocksDB Metrics
 Key: KAFKA-8578
 URL: https://issues.apache.org/jira/browse/KAFKA-8578
 Project: Kafka
  Issue Type: Sub-task
Reporter: Bruno Cadonna


To expose RocksDB metrics as specified in KIP-471, a bunch of sensors and 
metrics in the Kafka metrics registry are required.





[jira] [Commented] (KAFKA-8574) EOS race condition during task transition leads to LocalStateStore truncation in Kafka Streams 2.0.1

2019-06-21 Thread William Greer (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869529#comment-16869529
 ] 

William Greer commented on KAFKA-8574:
--

This is a different issue from KAFKA-8187. KAFKA-8187 involved possible local 
data loss when standby tasks were in use; this issue can cause data loss 
regardless of whether standbys are used, as long as EOS is enabled. This issue 
may be mitigated by KAFKA-7672, but the race condition that makes it possible 
still exists even with the KAFKA-7672 change-set applied. The race conditions 
around unprotected access to the checkpoint files (writes and reads of the 
checkpoint without a lock) still exist in trunk; whether there are any 
correctness issues when the race occurs is a different question.

> EOS race condition during task transition leads to LocalStateStore truncation 
> in Kafka Streams 2.0.1
> 
>
> Key: KAFKA-8574
> URL: https://issues.apache.org/jira/browse/KAFKA-8574
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.1
>Reporter: William Greer
>Priority: Major
>
> *Overview*
>  While using EOS in Kafka Stream there is a race condition where the 
> checkpoint file is written by the previous owning thread (Thread A) after the 
> new owning thread (Thread B) reads the checkpoint file. Thread B then starts 
> a restoration since no checkpoint file was found. A re-balance occurs before 
> Thread B completes the restoration and a third Thread (Thread C) becomes the 
> owning thread (Thread C) reads the checkpoint file written by Thread A which 
> does not correspond to the current state of the RocksDB state store. When 
> this race condition occurs the state store will have the most recent records 
> and some amount of the oldest records but will be missing some amount of 
> records in between. If A->Z represents the entire changelog to the present 
> then when this scenario occurs the state store would contain records [A->K 
> and Y->Z] where the state store is missing records K->Y.
>   
>  This race condition is possible due to dirty writes and dirty reads of the 
> checkpoint file.
>   
>  *Example:*
>  Thread refers to a Kafka Streams StreamThread [0]
>  Thread A, B and C are running in the same JVM in the same streams 
> application.
>   
>  Scenario:
>  Thread-A is in RUNNING state and up to date on partition 1.
>  Thread-A is suspended on 1. This does not write a checkpoint file because 
> EOS is enabled [1]
>  Thread-B is assigned to 1
>  Thread-B does not find checkpoint in StateManager [2]
>  Thread-A is assigned a different partition. Task writes suspended tasks 
> checkpoints to disk. Checkpoint for 1 is written. [3]
>  Thread-B deletes LocalStore and starts restoring. The deletion of the 
> LocalStore does not delete checkpoint file. [4]
>  Thread-C is revoked
>  Thread-A is revoked
>  Thread-B is revoked from the assigned status. Does not write a checkpoint 
> file
>  - Note Thread-B never reaches the running state, it remains in the 
> PARTITIONS_ASSIGNED state until it transitions to the PARTITIONS_REVOKED state
> Thread-C is assigned 1
>  Thread-C finds checkpoint in StateManager. This checkpoint corresponds to 
> where Thread-A left the state store for partition 1 at and not where Thread-B 
> left the state store at.
>  Thread-C begins restoring from checkpoint. The state store is missing an 
> unknown number of records at this point
>  Thread-B is assigned does not write a checkpoint file for partition 1, 
> because it had not reached a running status before being revoked
>   
>  [0] 
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java]
>  [1] 
> [https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L522-L553]
>  [2] 
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/ProcessorStateManager.java#L98]
>  [3] 
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java#L104-L105]
>  & 
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/AssignedTasks.java#L316-L331]
>  [4] 
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java#L228]
>  & 
> [https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/AbstractStateManager.java#L62-L123]
>  Specifically 
> 

[jira] [Commented] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2019-06-21 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869505#comment-16869505
 ] 

Rajini Sivaram commented on KAFKA-8010:
---

[~mimaison] I believe you should be able to use square brackets around values 
for configs whose values contain equals signs, commas, etc.
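The rejection in the ticket can be illustrated with a toy parser that, as the error message suggests, requires each comma-separated entry to contain exactly one '='. This is an illustrative sketch only, not Kafka's actual ConfigCommand implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NaiveConfigParser {
    // Illustrative only: mimics a parser that splits entries on ',' and then
    // requires exactly one '=' per entry. Values that themselves contain '='
    // (as sasl.jaas.config does) are rejected with the error seen in the ticket.
    static Map<String, String> parse(String addConfig) {
        Map<String, String> configs = new LinkedHashMap<>();
        for (String entry : addConfig.split(",")) {
            String[] kv = entry.split("=");
            if (kv.length != 2) {
                throw new IllegalArgumentException(
                        "requirement failed: Invalid entity config: all configs "
                                + "to be added must be in the format \"key=val\"");
            }
            configs.put(kv[0].trim(), kv[1].trim());
        }
        return configs;
    }

    public static void main(String[] args) {
        // A simple value parses fine.
        System.out.println(parse("retention.ms=100"));
        // A JAAS-style value with embedded '=' signs does not.
        try {
            parse("sasl.jaas.config=PlainLoginModule required username=\"u\";");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

As the comment above notes, wrapping such values in square brackets is the suggested way to protect embedded delimiters from this kind of splitting.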

> kafka-configs.sh does not allow setting config with an equal in the value
> -
>
> Key: KAFKA-8010
> URL: https://issues.apache.org/jira/browse/KAFKA-8010
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Mickael Maison
>Priority: Major
> Attachments: image-2019-03-05-19-45-47-168.png
>
>
> The sasl.jaas.config typically includes equals in its value. Unfortunately 
> the kafka-configs tool does not parse such values correctly and hits an error:
> ./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
> --entity-name 59 --alter --add-config "sasl.jaas.config=KafkaServer \{\n  
> org.apache.kafka.common.security.plain.PlainLoginModule required\n  
> username=\"myuser\"\n  password=\"mypassword\";\n};\nClient \{\n  
> org.apache.zookeeper.server.auth.DigestLoginModule required\n  
> username=\"myuser2\"\n  password=\"mypassword2\;\n};"
> requirement failed: Invalid entity config: all configs to be added must be in 
> the format "key=val"





[jira] [Commented] (KAFKA-8342) Admin tool to setup Kafka Stream topology (internal) topics

2019-06-21 Thread WooYoung (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869352#comment-16869352
 ] 

WooYoung commented on KAFKA-8342:
-

Not yet, I'm sorry.

I'm going to share a document describing what I've been doing so far.

It will probably be shared within a week.

Once again, I'm sorry for the delay.

> Admin tool to setup Kafka Stream topology (internal) topics
> ---
>
> Key: KAFKA-8342
> URL: https://issues.apache.org/jira/browse/KAFKA-8342
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Boyang Chen
>Assignee: WooYoung
>Priority: Major
>  Labels: newbie
>
> We have seen customers who need to deploy their application to a production 
> environment but don't have access to create changelog and repartition topics. 
> They need to ask the admin team to manually create those topics before 
> proceeding to start the actual stream job. We could add an admin tool to help 
> them go through the process more quickly by providing a command that could
>  # Read through current stream topology
>  # Create corresponding topics as needed, even including output topics.
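As a sketch of what such a tool would enumerate: Kafka Streams derives its internal topic names from the application.id, so the tool could list them up front for the admin team to create. The helper class below is hypothetical; the "&lt;application.id&gt;-&lt;name&gt;-changelog" / "-repartition" naming convention is the one Streams uses.

```java
// Hypothetical helper illustrating the internal topic names an admin tool
// would need to pre-create. Kafka Streams names its internal topics
// "<application.id>-<storeName>-changelog" and
// "<application.id>-<operatorName>-repartition".
public class StreamsInternalTopics {

    static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    static String repartitionTopic(String applicationId, String operatorName) {
        return applicationId + "-" + operatorName + "-repartition";
    }

    public static void main(String[] args) {
        // Example for a hypothetical app "orders-app" with one state store
        // and one auto-generated repartition step.
        System.out.println(changelogTopic("orders-app", "order-counts"));
        System.out.println(repartitionTopic("orders-app", "KSTREAM-KEY-SELECT-0000000001"));
    }
}
```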



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8342) Admin tool to setup Kafka Stream topology (internal) topics

2019-06-21 Thread Carlos Manuel Duclos Vergara (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869349#comment-16869349
 ] 

Carlos Manuel Duclos Vergara commented on KAFKA-8342:
-

I have read the information in the ticket and the comments; that is all the 
information I have. Is there a design document that you could share with me?

> Admin tool to setup Kafka Stream topology (internal) topics
> ---
>
> Key: KAFKA-8342
> URL: https://issues.apache.org/jira/browse/KAFKA-8342
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Boyang Chen
>Assignee: WooYoung
>Priority: Major
>  Labels: newbie
>
> We have seen customers who need to deploy their application to production 
> environment but don't have access to create changelog and repartition topics. 
> They need to ask admin team to manually create those topics before proceeding 
> to start the actual stream job. We could add an admin tool to help them go 
> through the process quicker by providing a command that could
>  # Read through current stream topology
>  # Create corresponding topics as needed, even including output topics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-8342) Admin tool to setup Kafka Stream topology (internal) topics

2019-06-21 Thread WooYoung (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869320#comment-16869320
 ] 

WooYoung edited comment on KAFKA-8342 at 6/21/19 9:28 AM:
--

Yep, it is in progress.

If I need help, I would appreciate your assistance.

Do you have any reference material for this work?

Thanks


was (Author: clearpal7):
Yep, it is in progress.

If I need help, I would appreciate your assistance.

Thanks

> Admin tool to setup Kafka Stream topology (internal) topics
> ---
>
> Key: KAFKA-8342
> URL: https://issues.apache.org/jira/browse/KAFKA-8342
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Boyang Chen
>Assignee: WooYoung
>Priority: Major
>  Labels: newbie
>
> We have seen customers who need to deploy their application to production 
> environment but don't have access to create changelog and repartition topics. 
> They need to ask admin team to manually create those topics before proceeding 
> to start the actual stream job. We could add an admin tool to help them go 
> through the process quicker by providing a command that could
>  # Read through current stream topology
>  # Create corresponding topics as needed, even including output topics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8342) Admin tool to setup Kafka Stream topology (internal) topics

2019-06-21 Thread WooYoung (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869320#comment-16869320
 ] 

WooYoung commented on KAFKA-8342:
-

Yep, it is in progress.

If I need help, I would appreciate your assistance.

Thanks

> Admin tool to setup Kafka Stream topology (internal) topics
> ---
>
> Key: KAFKA-8342
> URL: https://issues.apache.org/jira/browse/KAFKA-8342
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Boyang Chen
>Assignee: WooYoung
>Priority: Major
>  Labels: newbie
>
> We have seen customers who need to deploy their application to production 
> environment but don't have access to create changelog and repartition topics. 
> They need to ask admin team to manually create those topics before proceeding 
> to start the actual stream job. We could add an admin tool to help them go 
> through the process quicker by providing a command that could
>  # Read through current stream topology
>  # Create corresponding topics as needed, even including output topics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8576) Consumer failed to join the coordinator

2019-06-21 Thread yanrui (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yanrui updated KAFKA-8576:
--
Priority: Blocker  (was: Critical)

> Consumer failed to join the coordinator
> ---
>
> Key: KAFKA-8576
> URL: https://issues.apache.org/jira/browse/KAFKA-8576
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: yanrui
>Priority: Blocker
> Attachments: image-2019-06-21-10-52-38-762.png
>
>
> Environment:
>  single node kafka (2.1.1)6g 4c
>  client(0.11.0.1 )
>  consumer group number:1170
> After running for a while, consumers can't join the coordinator. Describing 
> the group reports that the broker is not the correct coordinator. The 
> consumer is trapped in an endless loop of discovering the group coordinator 
> and then marking it dead. Please help analyze the cause, thank you very much.
>
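For context on the symptom: the coordinator for a group is the leader of one partition of __consumer_offsets, chosen by hashing the group id. The sketch below mirrors Kafka's Utils.abs(groupId.hashCode) modulo the offsets topic partition count (default 50); it is an illustration, not broker code. If that partition's state never finishes loading on the broker, FindCoordinator keeps returning a coordinator the client then marks dead.

```java
// Sketch of how a group id maps to a __consumer_offsets partition; the
// leader of that partition acts as the group coordinator. Mirrors
// Utils.abs(groupId.hashCode) % offsets.topic.num.partitions (default 50).
public class CoordinatorPartition {

    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        // Bit-mask instead of Math.abs so Integer.MIN_VALUE stays non-negative.
        return (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        // With ~1170 consumer groups spread over the 50 default partitions,
        // a single unloaded partition strands roughly 23 groups at once.
        System.out.println(partitionFor("my-group", 50));
    }
}
```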



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8544) Remove legacy kafka.admin.AdminClient

2019-06-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869230#comment-16869230
 ] 

ASF GitHub Bot commented on KAFKA-8544:
---

ijuma commented on pull request #6947: KAFKA-8544: Remove legacy 
kafka.admin.AdminClient
URL: https://github.com/apache/kafka/pull/6947

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove legacy kafka.admin.AdminClient
> -
>
> Key: KAFKA-8544
> URL: https://issues.apache.org/jira/browse/KAFKA-8544
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>Priority: Major
> Fix For: 2.4.0
>
>
> It has been deprecated since 0.11.0, it was never meant as a publicly
> supported API and people should use
> `org.apache.kafka.clients.admin.AdminClient` instead. Its presence
> causes confusion and people still use them accidentally at times.
> `BrokerApiVersionsCommand` uses one method that is not available
> in `org.apache.kafka.clients.admin.AdminClient`, we inline it for now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-5723) Refactor BrokerApiVersionsCommand to use AdminClient

2019-06-21 Thread Ismael Juma (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5723:
---
Description: 
Currently it uses the deprecated AdminClient and in order to remove usages of 
that client, this class needs to be refactored.

Update: after KAFKA-8544, the deprecated kafka.admin.AdminClient has been 
removed and the functionality needed by BrokerApiVersionsCommand was inlined. 
We should remove that as part of this ticket.

  was:Currently it uses the deprecated AdminClient and in order to remove 
usages of that client, this class needs to be refactored.


> Refactor BrokerApiVersionsCommand to use AdminClient
> 
>
> Key: KAFKA-5723
> URL: https://issues.apache.org/jira/browse/KAFKA-5723
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Viktor Somogyi-Vass
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>
> Currently it uses the deprecated AdminClient and in order to remove usages of 
> that client, this class needs to be refactored.
> Update: after KAFKA-8544, the deprecated kafka.admin.AdminClient has been 
> removed and the functionality needed by BrokerApiVersionsCommand was inlined. 
> We should remove that as part of this ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8577) Flaky Test `DistributedHerderTest.testJoinLeaderCatchUpFails`

2019-06-21 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-8577:
--

 Summary: Flaky Test 
`DistributedHerderTest.testJoinLeaderCatchUpFails`
 Key: KAFKA-8577
 URL: https://issues.apache.org/jira/browse/KAFKA-8577
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


Started seeing this regularly:
{code:java}
java.lang.AssertionError: 
  Unexpected method call WorkerGroupMember.maybeLeaveGroup("taking too long to 
read the log"):
WorkerGroupMember.ensureActive(): expected: 2, actual: 1
WorkerGroupMember.wakeup(): expected: 2, actual: 1
WorkerGroupMember.maybeLeaveGroup("test join leader catch up fails"): 
expected: 1, actual: 0
WorkerGroupMember.requestRejoin(): expected: 1, actual: 0
WorkerGroupMember.poll(): expected: 1, actual: 0{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)