[jira] [Commented] (KAFKA-7591) Changelog retention period doesn't synchronise with window-store size

2020-01-14 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015631#comment-17015631
 ] 

Matthias J. Sax commented on KAFKA-7591:


Yes, it would be a semantic issue. The issue would resolve itself over time, 
though, as old windows are eventually discarded; new windows would be 
created correctly.

However, the main argument is that I don't think we can detect this case, and 
it's the user's responsibility to reset the application in this case – we can't 
really provide support for this atm – it's a more general issue tracked via 
https://issues.apache.org/jira/browse/KAFKA-8307
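
For reference, a minimal sketch of what such a reset involves (our summary of the documented reset procedure, with a hypothetical application id and input topic; not part of the original comment):

{code:java}
// 1) With all instances stopped, run the reset tool for the application id:
//    $ bin/kafka-streams-application-reset.sh --application-id my-streams-app \
//          --bootstrap-servers localhost:9092 --input-topics my-input
// 2) Then wipe each instance's local state before restarting:
KafkaStreams streams = new KafkaStreams(topology, props);
streams.cleanUp();   // deletes this instance's local state directory
streams.start();
{code}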

> Changelog retention period doesn't synchronise with window-store size
> -
>
> Key: KAFKA-7591
> URL: https://issues.apache.org/jira/browse/KAFKA-7591
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jon Bates
>Priority: Major
>
> When a new windowed state store is created, the associated changelog topic's 
> `retention.ms` value is set to `window-size + 
> CHANGELOG_ADDITIONAL_RETENTION_MS`
> h3. Expected Behaviour
> If the window-size is updated, the changelog topic's `retention.ms` config 
> should be updated to reflect the new size
> h3. Actual Behaviour
> The changelog-topic's `retention.ms` setting is not amended, resulting in 
> possible loss of data upon application restart
>  
> n.b. Although it is easy to update changelog topic config, I logged this as 
> `major` due to the potential for data-loss for any user of Kafka-Streams who 
> may not be intimately aware of the relationship between a windowed store and 
> the changelog config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9042) Auto infer external topic partitions in stream reset tool

2020-01-14 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015630#comment-17015630
 ] 

Matthias J. Sax commented on KAFKA-9042:


Sure. Done. :) 

> Auto infer external topic partitions in stream reset tool
> -
>
> Key: KAFKA-9042
> URL: https://issues.apache.org/jira/browse/KAFKA-9042
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Reporter: Boyang Chen
>Assignee: highluck
>Priority: Major
>  Labels: needs-kip, newbie, newbie++
>
> As of today, users have to specify `--input-topic` in the stream reset tool to be 
> able to reset offsets to a specific position. For a stream job with multiple 
> external topics that need to be purged, users usually don't want to name all 
> the topics in order to reset the offsets. It's really painful to look through 
> the entire topology to make sure we purge all the committed offsets.
> We could add a config `--reset-all-external-topics` to the reset tool such 
> that, when enabled, we delete offsets for all involved topics. The topic 
> metadata could be acquired by issuing a `DescribeGroup` request from the admin 
> client, as it is stored in the member subscription information.
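
For illustration, a minimal sketch of one possible approach (our assumption, not what the tool does today): discover every topic the application's group has committed offsets for via the Java Admin client. The bootstrap server and application id are placeholders:

{code:java}
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ListGroupTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        String appId = "my-streams-app";                    // placeholder: application.id

        try (Admin admin = Admin.create(props)) {
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                .listConsumerGroupOffsets(appId)
                .partitionsToOffsetAndMetadata()
                .get();
            // Every topic the group has committed offsets for is a reset candidate.
            Set<String> topics = committed.keySet().stream()
                .map(TopicPartition::topic)
                .collect(Collectors.toSet());
            System.out.println("Topics to reset: " + topics);
        }
    }
}
{code}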



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-9042) Auto infer external topic partitions in stream reset tool

2020-01-14 Thread highluck (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015511#comment-17015511
 ] 

highluck edited comment on KAFKA-9042 at 1/15/20 4:15 AM:
--

 

[~mjsax]

Thank you !!

Can you give me WIKI write permission?

my id : high.lee :)

thank you!


was (Author: high.lee):
 

[~mjsax]

Thank you !!

Can you give me write permission?

my id is high.lee :)

thank you!

> Auto infer external topic partitions in stream reset tool
> -
>
> Key: KAFKA-9042
> URL: https://issues.apache.org/jira/browse/KAFKA-9042
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Reporter: Boyang Chen
>Assignee: highluck
>Priority: Major
>  Labels: needs-kip, newbie, newbie++
>
> As of today, users have to specify `--input-topic` in the stream reset tool to be 
> able to reset offsets to a specific position. For a stream job with multiple 
> external topics that need to be purged, users usually don't want to name all 
> the topics in order to reset the offsets. It's really painful to look through 
> the entire topology to make sure we purge all the committed offsets.
> We could add a config `--reset-all-external-topics` to the reset tool such 
> that, when enabled, we delete offsets for all involved topics. The topic 
> metadata could be acquired by issuing a `DescribeGroup` request from the admin 
> client, as it is stored in the member subscription information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9365) Add consumer group information to txn commit

2020-01-14 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-9365.

Resolution: Fixed

> Add consumer group information to txn commit 
> -
>
> Key: KAFKA-9365
> URL: https://issues.apache.org/jira/browse/KAFKA-9365
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> This effort adds consumer group information to the txn commit protocol on the 
> broker side.
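
For context, a minimal sketch of the client-side counterpart (assuming the KIP-447 API shape; `producer`, `consumer`, and `consumedOffsets` are assumed to be prepared by the caller), showing how the consumer group metadata travels with the transactional offset commit:

{code:java}
// Assumed in scope: a transactional KafkaProducer `producer`, a KafkaConsumer
// `consumer`, and `consumedOffsets` (Map<TopicPartition, OffsetAndMetadata>)
// built from the records just processed.
producer.beginTransaction();
// ... send the output records for this batch ...
producer.sendOffsetsToTransaction(
    consumedOffsets,
    consumer.groupMetadata());   // group id, generation, and member id go to the broker
producer.commitTransaction();
{code}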



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9365) Add consumer group information to txn commit

2020-01-14 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9365:
---
Description: This effort adds consumer group information to the txn commit 
protocol on the broker side.

> Add consumer group information to txn commit 
> -
>
> Key: KAFKA-9365
> URL: https://issues.apache.org/jira/browse/KAFKA-9365
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> This effort adds consumer group information to the txn commit protocol on the 
> broker side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9365) Add consumer group information to txn commit

2020-01-14 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen updated KAFKA-9365:
---
Summary: Add consumer group information to txn commit   (was: Add consumer 
group information to producer txn commit )

> Add consumer group information to txn commit 
> -
>
> Key: KAFKA-9365
> URL: https://issues.apache.org/jira/browse/KAFKA-9365
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9339) Increased CPU utilization in brokers in 2.4.0

2020-01-14 Thread James Brown (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015562#comment-17015562
 ] 

James Brown commented on KAFKA-9339:


I attached the requested files; `allocs.txt` is suspiciously empty

> Increased CPU utilization in brokers in 2.4.0
> -
>
> Key: KAFKA-9339
> URL: https://issues.apache.org/jira/browse/KAFKA-9339
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: CentOS 6; Java 1.8.0_232 (OpenJDK)
>Reporter: James Brown
>Priority: Minor
>  Labels: performance
> Attachments: allocs.txt, profile.svg
>
>
> I upgraded one of my company's test clusters from 2.3.1 to 2.4.0 and have 
> noticed a significant (40%) increase in the CPU time consumed. This is a 
> small cluster of three nodes (running on t2.large EC2 instances all in the 
> same AZ) pushing about 150 messages/s in aggregate spread across 208 topics (a 
> total of 266 partitions; most topics only have one partition). Leadership is 
> reasonably well-distributed and each node has between 83 and 94 partitions 
> which it leads. This CPU time increase is visible symmetrically on all three 
> nodes in the cluster (e.g., the controller isn't using more CPU than the 
> other nodes).
>  
> The CPU consumption did not return to normal after I did the second restart 
> to bump the log and inter-broker protocol versions to 2.4, so I don't think 
> it has anything to do with down-converting to the 2.3 protocols.
>  
> No settings were changed, nor was anything about the JVM changed. There is 
> nothing interesting being written to the logs. There's no sign of any 
> instability (partitions aren't being reassigned, etc).
>  
> The best guess I have for the increased CPU usage is that the number of 
> garbage collections increased by approximately 30%, suggesting that something 
> is churning a lot more garbage inside Kafka. This is a small cluster, so it's 
> only got a 3GB heap allocated to Kafka on each node; we're using G1GC with 
> some light tuning and are on Java 8 if that helps.
>  
> We are only using OpenJDK, so I don't think I can produce a Flight Recorder 
> profile.
>  
> The kafka-users mailing list suggested this was worth filing a Jira issue 
> about.
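
As a side note, a small self-contained sketch (ours, not from the report) of reading the GC counters that back the ~30% observation above, via the standard JMX beans:

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCounts {
    public static void main(String[] args) {
        // Prints cumulative collection counts/times for each collector (e.g. G1
        // Young/Old Generation); sampling this before and after an upgrade gives
        // the kind of GC-rate comparison described in the report.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
{code}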



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9339) Increased CPU utilization in brokers in 2.4.0

2020-01-14 Thread James Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Brown updated KAFKA-9339:
---
Attachment: allocs.txt

> Increased CPU utilization in brokers in 2.4.0
> -
>
> Key: KAFKA-9339
> URL: https://issues.apache.org/jira/browse/KAFKA-9339
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: CentOS 6; Java 1.8.0_232 (OpenJDK)
>Reporter: James Brown
>Priority: Minor
>  Labels: performance
> Attachments: allocs.txt, profile.svg
>
>
> I upgraded one of my company's test clusters from 2.3.1 to 2.4.0 and have 
> noticed a significant (40%) increase in the CPU time consumed. This is a 
> small cluster of three nodes (running on t2.large EC2 instances all in the 
> same AZ) pushing about 150 messages/s in aggregate spread across 208 topics (a 
> total of 266 partitions; most topics only have one partition). Leadership is 
> reasonably well-distributed and each node has between 83 and 94 partitions 
> which it leads. This CPU time increase is visible symmetrically on all three 
> nodes in the cluster (e.g., the controller isn't using more CPU than the 
> other nodes).
>  
> The CPU consumption did not return to normal after I did the second restart 
> to bump the log and inter-broker protocol versions to 2.4, so I don't think 
> it has anything to do with down-converting to the 2.3 protocols.
>  
> No settings were changed, nor was anything about the JVM changed. There is 
> nothing interesting being written to the logs. There's no sign of any 
> instability (partitions aren't being reassigned, etc).
>  
> The best guess I have for the increased CPU usage is that the number of 
> garbage collections increased by approximately 30%, suggesting that something 
> is churning a lot more garbage inside Kafka. This is a small cluster, so it's 
> only got a 3GB heap allocated to Kafka on each node; we're using G1GC with 
> some light tuning and are on Java 8 if that helps.
>  
> We are only using OpenJDK, so I don't think I can produce a Flight Recorder 
> profile.
>  
> The kafka-users mailing list suggested this was worth filing a Jira issue 
> about.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9339) Increased CPU utilization in brokers in 2.4.0

2020-01-14 Thread James Brown (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Brown updated KAFKA-9339:
---
Attachment: profile.svg

> Increased CPU utilization in brokers in 2.4.0
> -
>
> Key: KAFKA-9339
> URL: https://issues.apache.org/jira/browse/KAFKA-9339
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: CentOS 6; Java 1.8.0_232 (OpenJDK)
>Reporter: James Brown
>Priority: Minor
>  Labels: performance
> Attachments: profile.svg
>
>
> I upgraded one of my company's test clusters from 2.3.1 to 2.4.0 and have 
> noticed a significant (40%) increase in the CPU time consumed. This is a 
> small cluster of three nodes (running on t2.large EC2 instances all in the 
> same AZ) pushing about 150 messages/s in aggregate spread across 208 topics (a 
> total of 266 partitions; most topics only have one partition). Leadership is 
> reasonably well-distributed and each node has between 83 and 94 partitions 
> which it leads. This CPU time increase is visible symmetrically on all three 
> nodes in the cluster (e.g., the controller isn't using more CPU than the 
> other nodes).
>  
> The CPU consumption did not return to normal after I did the second restart 
> to bump the log and inter-broker protocol versions to 2.4, so I don't think 
> it has anything to do with down-converting to the 2.3 protocols.
>  
> No settings were changed, nor was anything about the JVM changed. There is 
> nothing interesting being written to the logs. There's no sign of any 
> instability (partitions aren't being reassigned, etc).
>  
> The best guess I have for the increased CPU usage is that the number of 
> garbage collections increased by approximately 30%, suggesting that something 
> is churning a lot more garbage inside Kafka. This is a small cluster, so it's 
> only got a 3GB heap allocated to Kafka on each node; we're using G1GC with 
> some light tuning and are on Java 8 if that helps.
>  
> We are only using OpenJDK, so I don't think I can produce a Flight Recorder 
> profile.
>  
> The kafka-users mailing list suggested this was worth filing a Jira issue 
> about.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9235) Transaction state not cleaned up following StopReplica request

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015536#comment-17015536
 ] 

ASF GitHub Bot commented on KAFKA-9235:
---

hachikuji commented on pull request #7963: KAFKA-9235; Ensure transaction 
coordinator is stopped after replica deletion
URL: https://github.com/apache/kafka/pull/7963
 
 
   During a reassignment, it can happen that the current leader of a partition 
is demoted and removed from the replica set at the same time. In this case, we 
rely on the StopReplica request in order to stop replica fetchers and to clear 
the group coordinator cache. This patch adds similar logic to ensure that the 
transaction coordinator state cache also gets cleared.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Transaction state not cleaned up following StopReplica request
> --
>
> Key: KAFKA-9235
> URL: https://issues.apache.org/jira/browse/KAFKA-9235
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> When the broker receives a StopReplica request from the controller for one of 
> the transaction state topics, we should make sure to cleanup existing state 
> in the TransactionCoordinator for the corresponding partition. We have 
> similar logic already for the group coordinator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9042) Auto infer external topic partitions in stream reset tool

2020-01-14 Thread highluck (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015511#comment-17015511
 ] 

highluck commented on KAFKA-9042:
-

 

[~mjsax]

Thank you !!

Can you give me write permission?

my id is high.lee :)

thank you!

> Auto infer external topic partitions in stream reset tool
> -
>
> Key: KAFKA-9042
> URL: https://issues.apache.org/jira/browse/KAFKA-9042
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Reporter: Boyang Chen
>Assignee: highluck
>Priority: Major
>  Labels: needs-kip, newbie, newbie++
>
> As of today, users have to specify `--input-topic` in the stream reset tool to be 
> able to reset offsets to a specific position. For a stream job with multiple 
> external topics that need to be purged, users usually don't want to name all 
> the topics in order to reset the offsets. It's really painful to look through 
> the entire topology to make sure we purge all the committed offsets.
> We could add a config `--reset-all-external-topics` to the reset tool such 
> that, when enabled, we delete offsets for all involved topics. The topic 
> metadata could be acquired by issuing a `DescribeGroup` request from the admin 
> client, as it is stored in the member subscription information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7591) Changelog retention period doesn't synchronise with window-store size

2020-01-14 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015502#comment-17015502
 ] 

Matthias J. Sax commented on KAFKA-7591:


{quote}Shouldn't you need to reset the app if the window size changes?
{quote}
Maybe, but this seems to be orthogonal.
{quote}But it wouldn't be able to distinguish a change in retention from a 
change in window size,
{quote}
As above, seems to be orthogonal.

I would still prefer if KS would automatically update the topic configuration. 
Note that we do have an explicit `topic.` prefix to specify topic configs, and 
it seems reasonable to allow users to change those configs and have KS update 
the corresponding topic configuration (this also holds for `replication.factor`, 
btw). But I think we can cover all this with a single PR – if a topic exists, 
we fetch its config, compare it to whatever config we computed, and issue an 
AlterTopicConfig request to update the config. (We would also log an INFO 
statement when this happens.)
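
To make the proposed mechanism concrete, a rough sketch (our assumption of how it could look, not existing Streams code) using the Java Admin client; `admin`, `changelogTopic`, `windowSize`, and `CHANGELOG_ADDITIONAL_RETENTION_MS` are assumed in scope:

{code:java}
import java.util.Collections;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

// Fetch the changelog topic's current config, compare, and alter if it drifted.
ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, changelogTopic);
Config current = admin.describeConfigs(Collections.singleton(topic))
    .all().get().get(topic);

String expected = String.valueOf(windowSize + CHANGELOG_ADDITIONAL_RETENTION_MS);
String actual = current.get(TopicConfig.RETENTION_MS_CONFIG).value();

if (!expected.equals(actual)) {
    // Log at INFO as proposed above, then bring the topic config back in line.
    AlterConfigOp set = new AlterConfigOp(
        new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, expected),
        AlterConfigOp.OpType.SET);
    admin.incrementalAlterConfigs(
        Collections.singletonMap(topic, Collections.singletonList(set))).all().get();
}
{code}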

 

> Changelog retention period doesn't synchronise with window-store size
> -
>
> Key: KAFKA-7591
> URL: https://issues.apache.org/jira/browse/KAFKA-7591
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jon Bates
>Priority: Major
>
> When a new windowed state store is created, the associated changelog topic's 
> `retention.ms` value is set to `window-size + 
> CHANGELOG_ADDITIONAL_RETENTION_MS`
> h3. Expected Behaviour
> If the window-size is updated, the changelog topic's `retention.ms` config 
> should be updated to reflect the new size
> h3. Actual Behaviour
> The changelog-topic's `retention.ms` setting is not amended, resulting in 
> possible loss of data upon application restart
>  
> n.b. Although it is easy to update changelog topic config, I logged this as 
> `major` due to the potential for data-loss for any user of Kafka-Streams who 
> may not be intimately aware of the relationship between a windowed store and 
> the changelog config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6144) Allow serving interactive queries from in-sync Standbys

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015479#comment-17015479
 ] 

ASF GitHub Bot commented on KAFKA-6144:
---

vvcephei commented on pull request #7962: KAFKA-6144: option to query restoring 
and standby
URL: https://github.com/apache/kafka/pull/7962
 
 
   This is based on a temporary branch, which is mirrored from 
https://github.com/apache/kafka/pull/7960.
   
   I will delete the temporary branch once #7960 is merged and re-target this 
PR to trunk.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow serving interactive queries from in-sync Standbys
> ---
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Navinder Brar
>Priority: Major
>  Labels: kip-535
> Attachments: image-2019-10-09-20-33-37-423.png, 
> image-2019-10-09-20-47-38-096.png
>
>
> Currently, when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large states can take a very long 
> time; even for small state stores, more than a few ms of unavailability can be 
> a deal-breaker for microservice use cases.
> One workaround is to allow stale data to be read from the state stores when 
> use case allows. Adding the use case from KAFKA-8994 as it is more 
> descriptive.
> "Consider the following scenario in a three node Streams cluster with node A, 
> node S and node R, executing a stateful sub-topology/topic group with 1 
> partition and `_num.standby.replicas=1_`  
>  * *t0*: A is the active instance owning the partition, B is the standby that 
> keeps replicating the A's state into its local disk, R just routes streams 
> IQs to active instance using StreamsMetadata
>  * *t1*: IQs pick node R as router, R forwards query to A, A responds back to 
> R which reverse forwards back the results.
>  * *t2:* Active A instance is killed and rebalance begins. IQs start failing 
> to A
>  * *t3*: Rebalance assignment happens and standby B is now promoted as active 
> instance. IQs continue to fail
>  * *t4*: B fully catches up to changelog tail and rewinds offsets to A's last 
> commit position, IQs continue to fail
>  * *t5*: IQs to R, get routed to B, which is now ready to serve results. IQs 
> start succeeding again
>  
> Depending on Kafka consumer group session/heartbeat timeouts, steps t2 and t3 
> can take a few seconds (~10 seconds based on default values). Depending on how 
> laggy the standby B was prior to A being killed, t4 can take anywhere from a 
> few seconds to minutes. 
> While this behavior favors consistency over availability at all times, the 
> long unavailability window might be undesirable for certain classes of 
> applications (e.g. simple caches or dashboards). 
> This issue aims to also expose information about standby B to R, during each 
> rebalance such that the queries can be routed by an application to a standby 
> to serve stale reads, choosing availability over consistency."
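
For a sense of the API direction, a minimal sketch (assuming the KIP-535 interfaces as they later shipped; `streams` is a running KafkaStreams instance with a store named "counts"):

{code:java}
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// Opt in to serving from standby/restoring copies: reads may be stale,
// but stay available while the active task is rebalancing or restoring.
ReadOnlyKeyValueStore<String, Long> store = streams.store(
    StoreQueryParameters
        .fromNameAndType("counts", QueryableStoreTypes.<String, Long>keyValueStore())
        .enableStaleStores());
Long countForKey = store.get("some-key");
{code}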



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9431) Expose API in KafkaStreams to fetch all local offset lags

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015475#comment-17015475
 ] 

ASF GitHub Bot commented on KAFKA-9431:
---

vinothchandar commented on pull request #7961: [KAFKA-9431] Expose API in 
KafkaStreams to fetch all local offset lags
URL: https://github.com/apache/kafka/pull/7961
 
 
   
- Adds KafkaStreams#allLocalOffsetLags(), which returns lag information of 
all active/standby tasks local to a streams instance
- LagInfo class encapsulates the current position in the changelog, 
endoffset in the changelog and their difference as lag
- Lag information is a mere estimate; it can over-estimate (source topic 
optimization), or under-estimate.
- Each call to allLocalOffsetLags() generates a metadata call to Kafka 
brokers, so caution advised
- Unit and Integration tests added.
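
   A hypothetical usage sketch of the proposed API (names per the PR description; the final shape may differ; `streams` is assumed to be a running KafkaStreams instance):

{code:java}
import java.util.Map;
import org.apache.kafka.streams.LagInfo;

Map<String, Map<Integer, LagInfo>> lags = streams.allLocalOffsetLags();
lags.forEach((store, partitions) ->
    partitions.forEach((partition, lag) ->
        // current changelog position, changelog end offset, and their difference
        System.out.printf("%s-%d: position=%d end=%d lag=%d%n",
            store, partition,
            lag.currentOffsetPosition(), lag.endOffsetPosition(), lag.offsetLag())));
{code}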
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Expose API in KafkaStreams to fetch all local offset lags
> -
>
> Key: KAFKA-9431
> URL: https://issues.apache.org/jira/browse/KAFKA-9431
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9365) Add consumer group information to producer txn commit

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015472#comment-17015472
 ] 

ASF GitHub Bot commented on KAFKA-9365:
---

guozhangwang commented on pull request #7897: KAFKA-9365: Add server side 
change  to include consumer group information within transaction commit
URL: https://github.com/apache/kafka/pull/7897
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add consumer group information to producer txn commit 
> --
>
> Key: KAFKA-9365
> URL: https://issues.apache.org/jira/browse/KAFKA-9365
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7591) Changelog retention period doesn't synchronise with window-store size

2020-01-14 Thread Jon Bates (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015433#comment-17015433
 ] 

Jon Bates commented on KAFKA-7591:
--

Agreed!
A WARN message could at least be picked up, even if synchronizing the retention 
period isn't feasible.

> Changelog retention period doesn't synchronise with window-store size
> -
>
> Key: KAFKA-7591
> URL: https://issues.apache.org/jira/browse/KAFKA-7591
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jon Bates
>Priority: Major
>
> When a new windowed state store is created, the associated changelog topic's 
> `retention.ms` value is set to `window-size + 
> CHANGELOG_ADDITIONAL_RETENTION_MS`
> h3. Expected Behaviour
> If the window-size is updated, the changelog topic's `retention.ms` config 
> should be updated to reflect the new size
> h3. Actual Behaviour
> The changelog-topic's `retention.ms` setting is not amended, resulting in 
> possible loss of data upon application restart
>  
> n.b. Although it is easy to update changelog topic config, I logged this as 
> `major` due to the potential for data-loss for any user of Kafka-Streams who 
> may not be intimately aware of the relationship between a windowed store and 
> the changelog config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015432#comment-17015432
 ] 

ASF GitHub Bot commented on KAFKA-8764:
---

mumrah commented on pull request #7932: KAFKA-8764: LogCleanerManager endless 
loop while compacting/clea
URL: https://github.com/apache/kafka/pull/7932
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> LogCleanerManager endless loop while compacting/cleaning segments
> -
>
> Key: KAFKA-8764
> URL: https://issues.apache.org/jira/browse/KAFKA-8764
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.3.0, 2.2.1, 2.4.0
> Environment: docker base image: openjdk:8-jre-alpine base image, 
> kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz
>Reporter: Tomislav Rajakovic
>Priority: Major
>  Labels: patch
> Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, 
> kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, 
> log-cleaner-bug-reproduction.zip
>
>
> {{LogCleanerManager stuck in an endless loop while cleaning segments for one 
> partition, resulting in many log outputs and heavy disk reads/writes/IOPS.}}
>  
> Issue appeared on follower brokers, and it happens on every (new) broker if 
> partition assignment is changed.
>  
> Original issue setup:
>  * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers
>  * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB
>  * 5 zookeepers
>  * topic created with config:
>  ** name = "backup_br_domain_squad"
> partitions = 36
> replication_factor = 3
> config = {
>  "cleanup.policy" = "compact"
>  "min.compaction.lag.ms" = "8640"
>  "min.cleanable.dirty.ratio" = "0.3"
> }
>  
>  
> Log excerpt:
> {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,895] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, 
> 

[jira] [Updated] (KAFKA-8677) Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2020-01-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8677:
---
Priority: Critical  (was: Blocker)

> Flakey test 
> GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> 
>
> Key: KAFKA-8677
> URL: https://issues.apache.org/jira/browse/KAFKA-8677
> Project: Kafka
>  Issue Type: Bug
>  Components: core, security, unit tests
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Assignee: Guozhang Wang
>Priority: Critical
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]
>  
> *18:43:39* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl STARTED
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl.test.stdout
> *18:44:00*
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED
> *18:44:00* org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout instead of the expected 1 records
> ---
> I found this flaky test is actually exposing a real bug in the consumer: within 
> {{KafkaConsumer.poll}}, we have an optimization that tries to send the next fetch 
> request before returning the data, in order to pipeline the fetch requests:
> {code}
> if (!records.isEmpty()) {
>     // before returning the fetched records, we can send off the next round of fetches
>     // and avoid block waiting for their responses to enable pipelining while the user
>     // is handling the fetched records.
>     //
>     // NOTE: since the consumed position has already been updated, we must not allow
>     // wakeups or any other errors to be triggered prior to returning the fetched records.
>     if (fetcher.sendFetches() > 0 || client.hasPendingRequests()) {
>         client.pollNoWakeup();
>     }
>     return this.interceptors.onConsume(new ConsumerRecords<>(records));
> }
> {code}
> As the NOTE mentions, this pollNoWakeup should NOT throw any exceptions, 
> since at this point the fetch position has already been updated. If an exception is 
> thrown here and the caller decides to catch it and continue, those records 
> would never be returned again, causing data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8677) Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2020-01-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8677:
---
Affects Version/s: (was: 2.4.0)

> Flakey test 
> GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> 
>
> Key: KAFKA-8677
> URL: https://issues.apache.org/jira/browse/KAFKA-8677
> Project: Kafka
>  Issue Type: Bug
>  Components: core, security, unit tests
>Reporter: Boyang Chen
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]
>  
> *18:43:39* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl STARTED
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl.test.stdout
> *18:44:00*
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED
> *18:44:00* org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout instead of the expected 1 records
> ---
> I found this flaky test is actually exposing a real bug in the consumer: within 
> {{KafkaConsumer.poll}}, we have an optimization that tries to send the next fetch 
> request before returning the data, in order to pipeline the fetch requests:
> {code}
> if (!records.isEmpty()) {
>     // before returning the fetched records, we can send off the next round of fetches
>     // and avoid block waiting for their responses to enable pipelining while the user
>     // is handling the fetched records.
>     //
>     // NOTE: since the consumed position has already been updated, we must not allow
>     // wakeups or any other errors to be triggered prior to returning the fetched records.
>     if (fetcher.sendFetches() > 0 || client.hasPendingRequests()) {
>         client.pollNoWakeup();
>     }
>     return this.interceptors.onConsume(new ConsumerRecords<>(records));
> }
> {code}
> As the NOTE mentions, this pollNoWakeup should NOT throw any exceptions, 
> since at this point the fetch position has already been updated. If an exception is 
> thrown here and the caller decides to catch it and continue, those records 
> would never be returned again, causing data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8677) Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2020-01-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8677:
---
Affects Version/s: 2.5.0

> Flakey test 
> GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> 
>
> Key: KAFKA-8677
> URL: https://issues.apache.org/jira/browse/KAFKA-8677
> Project: Kafka
>  Issue Type: Bug
>  Components: core, security, unit tests
>Affects Versions: 2.5.0
>Reporter: Boyang Chen
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]
>  
> *18:43:39* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl STARTED
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl.test.stdout
> *18:44:00*
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED
> *18:44:00* org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout instead of the expected 1 records
> ---
> I found this flaky test is actually exposing a real bug in the consumer: within 
> {{KafkaConsumer.poll}}, we have an optimization that tries to send the next fetch 
> request before returning the data, in order to pipeline the fetch requests:
> {code}
> if (!records.isEmpty()) {
>     // before returning the fetched records, we can send off the next round of fetches
>     // and avoid block waiting for their responses to enable pipelining while the user
>     // is handling the fetched records.
>     //
>     // NOTE: since the consumed position has already been updated, we must not allow
>     // wakeups or any other errors to be triggered prior to returning the fetched records.
>     if (fetcher.sendFetches() > 0 || client.hasPendingRequests()) {
>         client.pollNoWakeup();
>     }
>     return this.interceptors.onConsume(new ConsumerRecords<>(records));
> }
> {code}
> As the NOTE mentions, this pollNoWakeup should NOT throw any exceptions, 
> since at this point the fetch position has already been updated. If an exception is 
> thrown here and the caller decides to catch it and continue, those records 
> would never be returned again, causing data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8677) Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2020-01-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8677:
---
Fix Version/s: (was: 2.4.0)

> Flakey test 
> GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> 
>
> Key: KAFKA-8677
> URL: https://issues.apache.org/jira/browse/KAFKA-8677
> Project: Kafka
>  Issue Type: Bug
>  Components: core, security, unit tests
>Affects Versions: 2.4.0
>Reporter: Boyang Chen
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]
>  
> *18:43:39* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl STARTED
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl.test.stdout
> *18:44:00*
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED
> *18:44:00* org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout instead of the expected 1 records
> ---
> I found this flaky test is actually exposing a real bug in the consumer: within 
> {{KafkaConsumer.poll}}, we have an optimization that tries to send the next fetch 
> request before returning the data, in order to pipeline the fetch requests:
> {code}
> if (!records.isEmpty()) {
>     // before returning the fetched records, we can send off the next round of fetches
>     // and avoid block waiting for their responses to enable pipelining while the user
>     // is handling the fetched records.
>     //
>     // NOTE: since the consumed position has already been updated, we must not allow
>     // wakeups or any other errors to be triggered prior to returning the fetched records.
>     if (fetcher.sendFetches() > 0 || client.hasPendingRequests()) {
>         client.pollNoWakeup();
>     }
>     return this.interceptors.onConsume(new ConsumerRecords<>(records));
> }
> {code}
> As the NOTE mentions, this pollNoWakeup should NOT throw any exceptions, 
> since at this point the fetch position has already been updated. If an exception is 
> thrown here and the caller decides to catch it and continue, those records 
> would never be returned again, causing data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-8677) Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2020-01-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-8677:


Reopening this ticket. Test failed again: 
[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/4226/testReport/junit/kafka.api/GroupEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/]
{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'responses': Error reading array of size 131085, only 28 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110)
    at org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:313)
    at org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:719)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:833)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:556)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1306)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1246)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1214)
    at kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:795)
    at kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1351)
    at kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:537)
    at kafka.api.EndToEndAuthorizationTest.consumeRecordsIgnoreOneAuthorizationException(EndToEndAuthorizationTest.scala:556)
    at kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:376)
{code}

> Flakey test 
> GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> 
>
> Key: KAFKA-8677
> URL: https://issues.apache.org/jira/browse/KAFKA-8677
> Project: Kafka
>  Issue Type: Bug
>  Components: core, security, unit tests
>Affects Versions: 2.4.0
>Reporter: Boyang Chen
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 2.4.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]
>  
> *18:43:39* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl STARTED
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl.test.stdout
> *18:44:00*
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED
> *18:44:00* org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout instead of the expected 1 records
> ---
> I found this flaky test is actually exposing a real bug in the consumer: within 
> {{KafkaConsumer.poll}}, we have an optimization that tries to send the next fetch 
> request before returning the data, in order to pipeline the fetch requests:
> {code}
> if (!records.isEmpty()) {
>     // before returning the fetched records, we can send off the next round of fetches
>     // and avoid block waiting for their responses to enable pipelining while the user
>     // is handling the fetched records.
>     //
>     // NOTE: since the consumed position has already been updated, we must not allow
>     // wakeups or any other errors to be triggered prior to returning the fetched records.
>     if (fetcher.sendFetches() > 0 || client.hasPendingRequests()) {
>         client.pollNoWakeup();
>     }
>     return this.interceptors.onConsume(new ConsumerRecords<>(records));
> }
> {code}
> As the NOTE mentions, this pollNoWakeup should NOT throw any exceptions, 
> since at this point the fetch position has already been updated. If an exception is 
> thrown here and the caller decides to catch it and continue, those records 
> would never be returned again, causing data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9013) Flaky Test MirrorConnectorsIntegrationTest#testReplication

2020-01-14 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015428#comment-17015428
 ] 

Matthias J. Sax commented on KAFKA-9013:


[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/211/testReport/junit/org.apache.kafka.connect.mirror/MirrorConnectorsIntegrationTest/testReplication/]

> Flaky Test MirrorConnectorsIntegrationTest#testReplication
> --
>
> Key: KAFKA-9013
> URL: https://issues.apache.org/jira/browse/KAFKA-9013
> Project: Kafka
>  Issue Type: Bug
>Reporter: Bruno Cadonna
>Priority: Major
>  Labels: flaky-test
>
> h1. Stacktrace:
> {code:java}
> java.lang.AssertionError: Condition not met within timeout 2. Offsets not 
> translated downstream to primary cluster.
>   at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:377)
>   at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:354)
>   at 
> org.apache.kafka.connect.mirror.MirrorConnectorsIntegrationTest.testReplication(MirrorConnectorsIntegrationTest.java:239)
> {code}
> h1. Standard Error
> {code}
> Standard Error
> Oct 09, 2019 11:32:00 PM org.glassfish.jersey.internal.inject.Providers 
> checkProviderRuntime
> WARNING: A provider 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource 
> registered in SERVER runtime does not implement any provider interfaces 
> applicable in the SERVER runtime. Due to constraint configuration problems 
> the provider 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource will 
> be ignored. 
> Oct 09, 2019 11:32:00 PM org.glassfish.jersey.internal.inject.Providers 
> checkProviderRuntime
> WARNING: A provider 
> org.apache.kafka.connect.runtime.rest.resources.LoggingResource registered in 
> SERVER runtime does not implement any provider interfaces applicable in the 
> SERVER runtime. Due to constraint configuration problems the provider 
> org.apache.kafka.connect.runtime.rest.resources.LoggingResource will be 
> ignored. 
> Oct 09, 2019 11:32:00 PM org.glassfish.jersey.internal.inject.Providers 
> checkProviderRuntime
> WARNING: A provider 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered 
> in SERVER runtime does not implement any provider interfaces applicable in 
> the SERVER runtime. Due to constraint configuration problems the provider 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be 
> ignored. 
> Oct 09, 2019 11:32:00 PM org.glassfish.jersey.internal.inject.Providers 
> checkProviderRuntime
> WARNING: A provider 
> org.apache.kafka.connect.runtime.rest.resources.RootResource registered in 
> SERVER runtime does not implement any provider interfaces applicable in the 
> SERVER runtime. Due to constraint configuration problems the provider 
> org.apache.kafka.connect.runtime.rest.resources.RootResource will be ignored. 
> Oct 09, 2019 11:32:01 PM org.glassfish.jersey.internal.Errors logErrors
> WARNING: The following warnings have been detected: WARNING: The 
> (sub)resource method listLoggers in 
> org.apache.kafka.connect.runtime.rest.resources.LoggingResource contains 
> empty path annotation.
> WARNING: The (sub)resource method listConnectors in 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains 
> empty path annotation.
> WARNING: The (sub)resource method createConnector in 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains 
> empty path annotation.
> WARNING: The (sub)resource method listConnectorPlugins in 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource 
> contains empty path annotation.
> WARNING: The (sub)resource method serverInfo in 
> org.apache.kafka.connect.runtime.rest.resources.RootResource contains empty 
> path annotation.
> Oct 09, 2019 11:32:02 PM org.glassfish.jersey.internal.inject.Providers 
> checkProviderRuntime
> WARNING: A provider 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource 
> registered in SERVER runtime does not implement any provider interfaces 
> applicable in the SERVER runtime. Due to constraint configuration problems 
> the provider 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource will 
> be ignored. 
> Oct 09, 2019 11:32:02 PM org.glassfish.jersey.internal.inject.Providers 
> checkProviderRuntime
> WARNING: A provider 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered 
> in SERVER runtime does not implement any provider interfaces applicable in 
> the SERVER runtime. Due to constraint configuration problems the provider 
> org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be 
> ignored. 
> Oct 09, 2019 11:32:02 PM org.glassfish.jersey.internal.inject.Providers 
> 

[jira] [Commented] (KAFKA-9294) Enhance DSL Naming Guide to Include All Naming Rules

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015426#comment-17015426
 ] 

ASF GitHub Bot commented on KAFKA-9294:
---

mjsax commented on pull request #7927: KAFKA-9294: Add tests for Named parameter
URL: https://github.com/apache/kafka/pull/7927
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enhance DSL Naming Guide to Include All Naming Rules
> 
>
> Key: KAFKA-9294
> URL: https://issues.apache.org/jira/browse/KAFKA-9294
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
>Priority: Major
>
> We already have a naming guide in the docs, but we should expand it to cover 
> how all components of the DSL get named.
> -Seems there is a broken link, too: 
> [https://kafka.apache.org/24/documentation/streams/developer-guide/dsl-api.html#naming-a-streams-app]
>  links to 
> [https://kafka.apache.org/24/documentation/streams/developer-guide/dsl-topology-naming.html]
>  (that does not work)-
>  
> EDIT: Link to the naming guide is fixed now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7591) Changelog retention period doesn't synchronise with window-store size

2020-01-14 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015408#comment-17015408
 ] 

Sophie Blee-Goldman commented on KAFKA-7591:


Shouldn't you need to reset the app if the window size changes? On the other 
hand, it might be a good idea to at least verify the topic configs (for all 
internal topics), and log a warning or even throw an exception if they don't 
match. 

On a related note, users may want to increase the retention period to allow 
querying the state for longer – in that case it does seem reasonable for 
Streams to alter the changelog's retention. But it wouldn't be able to 
distinguish a change in retention from a change in window size, so I think 
it's still better to just detect the discrepancy and alert the user so they can 
consider the best course of action (reset the app or alter the topic config).

> Changelog retention period doesn't synchronise with window-store size
> -
>
> Key: KAFKA-7591
> URL: https://issues.apache.org/jira/browse/KAFKA-7591
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jon Bates
>Priority: Major
>
> When a new windowed state store is created, the associated changelog topic's 
> `retention.ms` value is set to `window-size + 
> CHANGELOG_ADDITIONAL_RETENTION_MS`
> h3. Expected Behaviour
> If the window-size is updated, the changelog topic's `retention.ms` config 
> should be updated to reflect the new size
> h3. Actual Behaviour
> The changelog-topic's `retention.ms` setting is not amended, resulting in 
> possible loss of data upon application restart
>  
> n.b. Although it is easy to update changelog topic config, I logged this as 
> `major` due to the potential for data-loss for any user of Kafka-Streams who 
> may not be intimately aware of the relationship between a windowed store and 
> the changelog config



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8770) Either switch to or add an option for emit-on-change

2020-01-14 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015406#comment-17015406
 ] 

Matthias J. Sax commented on KAFKA-8770:


[~Yohan123] – just a heads up – I am swamped atm and won't have time to review 
your KIP any time soon.

> Either switch to or add an option for emit-on-change
> 
>
> Key: KAFKA-8770
> URL: https://issues.apache.org/jira/browse/KAFKA-8770
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Major
>  Labels: needs-kip
>
> Currently, Streams offers two emission models:
> * emit-on-window-close: (using Suppression)
> * emit-on-update: (i.e., emit a new result whenever a new record is 
> processed, regardless of whether the result has changed)
> There is also an option to drop some intermediate results, either using 
> caching or suppression.
> However, there is no support for emit-on-change, in which results would be 
> forwarded only if the result has changed. This has been reported to be 
> extremely valuable as a performance optimization for some high-traffic 
> applications, and it reduces the computational burden both internally for 
> downstream Streams operations, as well as for external systems that consume 
> the results, and currently have to deal with a lot of "no-op" changes.
> It would be pretty straightforward to implement this, by loading the prior 
> results before a stateful operation and comparing with the new result before 
> persisting or forwarding. In many cases, we load the prior result anyway, so 
> it may not be a significant performance impact either.
> One design challenge is what to do with timestamps. If we get one record at 
> time 1 that produces a result, and then another at time 2 that produces a 
> no-op, what should be the timestamp of the result, 1 or 2? emit-on-change 
> would require us to say 1.
> Clearly, we'd need to do some serious benchmarks to evaluate any potential 
> implementation of emit-on-change.
> Another design challenge is to decide if we should just automatically provide 
> emit-on-change for stateful operators, or if it should be configurable. 
> Configuration increases complexity, so unless the performance impact is high, 
> we may just want to change the emission model without a configuration.
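
To make the idea concrete, a minimal emit-on-change sketch using the Processor 
API: the prior result is loaded from a state store and the update is forwarded 
only if it differs. The store name and types are assumptions for the example, 
and the timestamp question above is deliberately left open:

{code:java}
import java.util.Objects;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

class EmitOnChange implements Transformer<String, String, KeyValue<String, String>> {
    private KeyValueStore<String, String> priorResults;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        // "prior-results" must be registered on the topology and connected here.
        priorResults = (KeyValueStore<String, String>) context.getStateStore("prior-results");
    }

    @Override
    public KeyValue<String, String> transform(final String key, final String newResult) {
        final String prior = priorResults.get(key);
        if (Objects.equals(prior, newResult)) {
            return null;  // no-op update: suppress the downstream emission
        }
        priorResults.put(key, newResult);
        return KeyValue.pair(key, newResult);
    }

    @Override
    public void close() {}
}
{code}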



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9042) Auto infer external topic partitions in stream reset tool

2020-01-14 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015405#comment-17015405
 ] 

Matthias J. Sax commented on KAFKA-9042:


Thanks for the PR [~high.lee] – note that this ticket requires a KIP (it's 
labeled `needs-kip`). Hence, please first prepare a KIP and get it accepted 
before we can review and merge any PR. Details about the KIP process are 
described in the Kafka wiki: 
[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals]

> Auto infer external topic partitions in stream reset tool
> -
>
> Key: KAFKA-9042
> URL: https://issues.apache.org/jira/browse/KAFKA-9042
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Reporter: Boyang Chen
>Assignee: highluck
>Priority: Major
>  Labels: needs-kip, newbie, newbie++
>
> As of today, the user has to specify `--input-topic` in the stream reset tool 
> to be able to reset offsets to a specific position. For a stream job with 
> multiple external topics that need to be purged, users usually don't want to name all 
> the topics in order to reset the offsets. It's really painful to look through 
> the entire topology to make sure we purge all the committed offsets.
> We could add a config `--reset-all-external-topics` to the reset tool such 
> that when enabled, we could delete offsets for all involved topics. The topic 
> metadata could be acquired by issuing a `DescribeGroup` request from admin 
> client, which is stored in the member subscription information.
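
As a sketch of the idea (names are illustrative, and a Streams application's 
group id equals its application.id): the admin client exposes each member's 
assignment, which approximates the subscription information mentioned above, 
so the involved topics can be collected without the user enumerating them:

{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;
import org.apache.kafka.common.TopicPartition;

// Collect every topic the application's consumer group touches, so the
// reset tool would not need the user to list them by hand.
static Set<String> topicsUsedByGroup(Admin admin, String applicationId) throws Exception {
    ConsumerGroupDescription group = admin
        .describeConsumerGroups(Collections.singleton(applicationId))
        .all().get().get(applicationId);
    Set<String> topics = new TreeSet<>();
    group.members().forEach(member -> {
        for (TopicPartition tp : member.assignment().topicPartitions()) {
            topics.add(tp.topic());
        }
    });
    return topics;
}
{code}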



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9346) Consumer fetch offset back-off with pending transactions

2020-01-14 Thread Boyang Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-9346.

Resolution: Fixed

> Consumer fetch offset back-off with pending transactions
> 
>
> Key: KAFKA-9346
> URL: https://issues.apache.org/jira/browse/KAFKA-9346
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9420) Bump APIs to enable flexible versions

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015350#comment-17015350
 ] 

ASF GitHub Bot commented on KAFKA-9420:
---

hachikuji commented on pull request #7931: KAFKA-9420: Add flexible version 
support for converted protocols
URL: https://github.com/apache/kafka/pull/7931
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Bump APIs to enable flexible versions
> -
>
> Key: KAFKA-9420
> URL: https://issues.apache.org/jira/browse/KAFKA-9420
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> As part of KIP-482 
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields),
>  we need to bump the versions of all protocols which have been converted to 
> use the generated protocol in order to enable flexible version support. This 
> ticket covers the following APIs which now support the generated protocol:
> - SaslAuthenticate
> - SaslHandshake
> - CreatePartitions
> - DescribeDelegationToken
> - ExpireDelegationToken
> - RenewDelegationToken



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9420) Bump APIs to enable flexible versions

2020-01-14 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-9420.

Fix Version/s: 2.5.0
   Resolution: Fixed

> Bump APIs to enable flexible versions
> -
>
> Key: KAFKA-9420
> URL: https://issues.apache.org/jira/browse/KAFKA-9420
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.5.0
>
>
> As part of KIP-482 
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields),
>  we need to bump the versions of all protocols which have been converted to 
> use the generated protocol in order to enable flexible version support. This 
> ticket covers the following APIs which now support the generated protocol:
> - SaslAuthenticate
> - SaslHandshake
> - CreatePartitions
> - DescribeDelegationToken
> - ExpireDelegationToken
> - RenewDelegationToken



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9417) Add extension to the TransactionsTest.transactions_test for new EOS model

2020-01-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-9417:
---
Component/s: streams

> Add extension to the TransactionsTest.transactions_test for new EOS model
> -
>
> Key: KAFKA-9417
> URL: https://issues.apache.org/jira/browse/KAFKA-9417
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> We would like to extend the `TransactionMessageCopier` to use the new 
> subscription mode consumer and do a system test based off that in order to 
> verify the new semantic actually works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9418) Add new sendOffsets API to include consumer group metadata

2020-01-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-9418:
---
Component/s: streams

> Add new sendOffsets API to include consumer group metadata
> --
>
> Key: KAFKA-9418
> URL: https://issues.apache.org/jira/browse/KAFKA-9418
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Add the consumer group metadata as part of producer sendTransactions API to 
> enable proper fencing under 447
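
For context, the consume-transform-produce loop with the new overload looks 
roughly like this; the topic name and the surrounding client setup are 
placeholders:

{code:java}
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

static void copyLoop(KafkaConsumer<String, String> consumer,
                     KafkaProducer<String, String> producer) {
    producer.initTransactions();
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        if (records.isEmpty()) continue;
        producer.beginTransaction();
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
        }
        // Group metadata (instead of a plain group id) lets the broker fence
        // zombie producers by member generation.
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    }
}
{code}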



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9395) Improve Kafka scheduler's periodic maybeShrinkIsr()

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015303#comment-17015303
 ] 

ASF GitHub Bot commented on KAFKA-9395:
---

bdbyrne commented on pull request #7921: KAFKA-9395: Only acquire write lock in 
maybeShrinkIsr() if necessary.
URL: https://github.com/apache/kafka/pull/7921
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Kafka scheduler's periodic maybeShrinkIsr()
> ---
>
> Key: KAFKA-9395
> URL: https://issues.apache.org/jira/browse/KAFKA-9395
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Brian Byrne
>Assignee: Rajini Sivaram
>Priority: Major
>
> The ReplicaManager schedules a periodic call to maybeShrinkIsr() with the 
> KafkaScheduler for a period of replica.lag.time.max.ms / 2. While 
> replica.lag.time.max.ms defaults to 30s, my setup was 45s, which means 
> maybeShrinkIsr() was being called every 22.5 seconds. Normally this is not a 
> problem.
> Fetch/produce requests hold a partition's leaderIsrUpdateLock in reader mode 
> while they are running. When a partition is requested to check whether it 
> should shrink its ISR, it acquires a write lock. So there's potential for 
> contention here, and if the fetch/produce requests are long running, they may 
> block maybeShrinkIsr() for hundreds of ms.
> This becomes a problem due to the way the scheduler runnable is set up: it 
> calls maybeShrinkIsr() for every partition in a single scheduler invocation. If 
> there's a lot of partitions, this could take many seconds, even minutes. 
> However, the runnable is scheduled via 
> ScheduledThreadPoolExecutor#scheduleAtFixedRate, which means if it exceeds 
> its period, it's immediately scheduled to run again. So it backs up enough 
> that the scheduler is always executing this function.
> This may cause partitions to periodically check their ISR a lot less 
> frequently than intended. This also contributes a huge source of contention 
> for cases where the produce/fetch requests are long-running.
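
The scheduling behavior is easy to reproduce outside Kafka. In this standalone 
snippet the task takes 5x its period, so scheduleAtFixedRate re-runs it 
back-to-back with no idle gap, which is the backlog effect described above:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedRateBacklog {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        scheduler.scheduleAtFixedRate(() -> {
            System.out.println("tick at " + System.currentTimeMillis());
            try {
                Thread.sleep(500);  // the task takes 500 ms against a 100 ms period
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
        // Ticks arrive ~500 ms apart with no pause in between: the executor
        // never catches up, so the scheduler thread stays permanently busy.
    }
}
{code}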



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6144) Allow serving interactive queries from in-sync Standbys

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015277#comment-17015277
 ] 

ASF GitHub Bot commented on KAFKA-6144:
---

vvcephei commented on pull request #7960: [KAFKA-6144]: Add KeyQueryMetadata 
APIs to KafkaStreams
URL: https://github.com/apache/kafka/pull/7960
 
 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow serving interactive queries from in-sync Standbys
> ---
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Navinder Brar
>Priority: Major
>  Labels: kip-535
> Attachments: image-2019-10-09-20-33-37-423.png, 
> image-2019-10-09-20-47-38-096.png
>
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large states can take a very long 
> time, or for small state stores even more than a few ms can be a deal-breaker 
> for micro service use cases.
> One workaround is to allow stale data to be read from the state stores when 
> use case allows. Adding the use case from KAFKA-8994 as it is more 
> descriptive.
> "Consider the following scenario in a three node Streams cluster with node A, 
> node B and node R, executing a stateful sub-topology/topic group with 1 
> partition and `_num.standby.replicas=1_`  
>  * *t0*: A is the active instance owning the partition, B is the standby that 
> keeps replicating the A's state into its local disk, R just routes streams 
> IQs to active instance using StreamsMetadata
>  * *t1*: IQs pick node R as router, R forwards query to A, A responds back to 
> R which reverse forwards back the results.
>  * *t2:* Active A instance is killed and rebalance begins. IQs start failing 
> to A
>  * *t3*: Rebalance assignment happens and standby B is now promoted as active 
> instance. IQs continue to fail
>  * *t4*: B fully catches up to changelog tail and rewinds offsets to A's last 
> commit position, IQs continue to fail
>  * *t5*: IQs to R, get routed to B, which is now ready to serve results. IQs 
> start succeeding again
>  
> Depending on Kafka consumer group session/heartbeat timeouts, steps t2 and t3 
> can take a few seconds (~10 seconds based on default values). Depending on how 
> laggy the standby B was prior to A being killed, t4 can take a few seconds to 
> minutes. 
> While this behavior favors consistency over availability at all times, the 
> long unavailability window might be undesirable for certain classes of 
> applications (e.g. simple caches or dashboards). 
> This issue aims to also expose information about standby B to R, during each 
> rebalance such that the queries can be routed by an application to a standby 
> to serve stale reads, choosing availability over consistency."
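
A rough sketch of how a routing instance ("R" above) could use the metadata 
API being added here to fall back to a standby; the method names follow the 
KIP-535 work and may differ slightly between versions, and isReachable() is a 
placeholder for an application-specific health check:

{code:java}
import java.util.Set;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.state.HostInfo;

static HostInfo pickHost(KafkaStreams streams, String store, String key, boolean allowStale) {
    KeyQueryMetadata metadata =
        streams.queryMetadataForKey(store, key, Serdes.String().serializer());
    HostInfo active = metadata.getActiveHost();
    Set<HostInfo> standbys = metadata.getStandbyHosts();
    // Prefer the active host; during a rebalance, route to a standby only if
    // the application tolerates stale reads.
    if (isReachable(active) || !allowStale || standbys.isEmpty()) {
        return active;
    }
    return standbys.iterator().next();
}

static boolean isReachable(HostInfo host) {
    return true;  // placeholder: e.g. ping the instance's REST endpoint
}
{code}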



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9427) StateRestoreListener.onRestoreEnd should report actual message count

2020-01-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-9427:
---
Component/s: streams

> StateRestoreListener.onRestoreEnd should report actual message count
> 
>
> Key: KAFKA-9427
> URL: https://issues.apache.org/jira/browse/KAFKA-9427
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Chris Stromberger
>Priority: Minor
>
> {{StateRestoreListener.onRestoreEnd}} appears to report the difference between 
> offsets as "totalRestored", which may differ from the actual number of 
> messages restored to a state store. Am assuming this is due to missing 
> offsets in compacted topics. It would be more useful if 
> {{StateRestoreListener.onRestoreEnd}} reported the actual count of 
> messages restored (sum of values reported by 
> {{StateRestoreListener.onBatchRestored}}).
> Was asked to create this ticket in Slack thread 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1578956151094200]
>  
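
As a workaround sketch for the behavior described above, a listener can 
accumulate the per-batch counts itself and compare them with the reported 
total (registered via KafkaStreams#setGlobalStateRestoreListener):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class CountingRestoreListener implements StateRestoreListener {
    private final Map<TopicPartition, Long> restored = new ConcurrentHashMap<>();

    @Override
    public void onRestoreStart(TopicPartition partition, String store,
                               long startingOffset, long endingOffset) {
        restored.put(partition, 0L);
    }

    @Override
    public void onBatchRestored(TopicPartition partition, String store,
                                long batchEndOffset, long numRestored) {
        restored.merge(partition, numRestored, Long::sum);  // actual record count
    }

    @Override
    public void onRestoreEnd(TopicPartition partition, String store, long totalRestored) {
        long actual = restored.getOrDefault(partition, 0L);
        // With compacted changelogs, totalRestored (an offset difference) can
        // exceed the number of records actually written to the store.
        System.out.printf("%s: reported=%d, actually restored=%d%n", store, totalRestored, actual);
    }
}
{code}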



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9395) Improve Kafka scheduler's periodic maybeShrinkIsr()

2020-01-14 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-9395.

  Assignee: Rajini Sivaram  (was: Brian Byrne)
Resolution: Done

> Improve Kafka scheduler's periodic maybeShrinkIsr()
> ---
>
> Key: KAFKA-9395
> URL: https://issues.apache.org/jira/browse/KAFKA-9395
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Brian Byrne
>Assignee: Rajini Sivaram
>Priority: Major
>
> The ReplicaManager schedules a periodic call to maybeShrinkIsr() with the 
> KafkaScheduler for a period of replica.lag.time.max.ms / 2. While 
> replica.lag.time.max.ms defaults to 30s, my setup was 45s, which means 
> maybeShrinkIsr() was being called every 22.5 seconds. Normally this is not a 
> problem.
> Fetch/produce requests hold a partition's leaderIsrUpdateLock in reader mode 
> while they are running. When a partition is requested to check whether it 
> should shrink its ISR, it acquires a write lock. So there's potential for 
> contention here, and if the fetch/produce requests are long running, they may 
> block maybeShrinkIsr() for hundreds of ms.
> This becomes a problem due to the way the scheduler runnable is set up: it 
> calls maybeShrinkIsr() for every partition in a single scheduler invocation. If 
> there's a lot of partitions, this could take many seconds, even minutes. 
> However, the runnable is scheduled via 
> ScheduledThreadPoolExecutor#scheduleAtFixedRate, which means if it exceeds 
> its period, it's immediately scheduled to run again. So it backs up enough 
> that the scheduler is always executing this function.
> This may cause partitions to periodically check their ISR a lot less 
> frequently than intended. This also contributes a huge source of contention 
> for cases where the produce/fetch requests are long-running.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9428) Expose standby information in KafkaStreams via queryMetadataForKey API

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated KAFKA-9428:
--
Component/s: streams

> Expose standby information in KafkaStreams via queryMetadataForKey API
> --
>
> Key: KAFKA-9428
> URL: https://issues.apache.org/jira/browse/KAFKA-9428
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9430) Tighten up lag estimates when source topic optimization is on

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated KAFKA-9430:
--
Component/s: streams

> Tighten up lag estimates when source topic optimization is on 
> --
>
> Key: KAFKA-9430
> URL: https://issues.apache.org/jira/browse/KAFKA-9430
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.5.0
>
>
> Right now, we use _endOffsets_ of the source topic for the computation. Since 
> the source topics can also have user event produces, this is an overestimate.
>  
> From John:
> For "optimized" changelogs, this will be wrong, strictly speaking, but it's 
> an over-estimate (which seems better than an under-estimate), and it's also 
> still an apples-to-apples comparison, since all replicas would use the same 
> upper bound to compute their lags, so the "pick the freshest" replica is 
> still going to pick the right one. We can add a new 2.5 blocker ticket to 
> really fix it, and not worry about it until after this KSQL stuff is done.
>  
> For active: we need to use consumed offsets and not the end of the source topic.
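
A toy illustration of the two estimates for a single partition (all values 
are assumed inputs, not real Streams code):

{code:java}
// Current behavior: lag measured against the end of the source topic.
// User produce traffic advances endOffset, so this over-estimates.
static long lagUpperBound(long endOffset, long restoredOffset) {
    return Math.max(0, endOffset - restoredOffset);
}

// Tighter bound suggested for active tasks: measure against the offset the
// task has actually consumed rather than the end of the source topic.
static long lagForActive(long consumedOffset, long restoredOffset) {
    return Math.max(0, consumedOffset - restoredOffset);
}
{code}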



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9429) Allow ability to control whether stale reads out of state stores are desirable

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated KAFKA-9429:
--
Component/s: streams

> Allow ability to control whether stale reads out of state stores are desirable
> --
>
> Key: KAFKA-9429
> URL: https://issues.apache.org/jira/browse/KAFKA-9429
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.5.0
>
>
> From John :
>  
> I also meant to talk with you about the change to allow querying recovering 
> stores. I think you might have already talked with Matthias a little about 
> this in the scope of KIP-216, but it's probably not ok to just change the 
> default from only allowing query while running, since there are actually 
> people depending on full-consistency queries for correctness right now.
>  
> What we can do is add an overload {{KafkaStreams.store(name, 
> QueriableStoreType, QueriableStoreOptions)}}, with one option: 
> {{queryStaleState(true/false)}} (your preference on the name, I just made 
> that up right now). The default would be false, and KSQL would set it to 
> true. While false, it would not allow querying recovering stores OR standbys. 
> This basically allows a single switch to preserve existing behavior.
>  
>  
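
Usage of the proposed overload would look roughly like this; note that the 
options type and the queryStaleState flag are quoted from the proposal above 
and do not exist in Kafka, so this is purely illustrative:

{code:java}
// Default (false): only fully caught-up active stores may be queried, as today.
// KSQL would opt in to stale reads from recovering stores and standbys.
ReadOnlyKeyValueStore<String, Long> store = streams.store(
    "counts-store",                               // example store name
    QueryableStoreTypes.<String, Long>keyValueStore(),
    QueriableStoreOptions.queryStaleState(true)); // hypothetical option
{code}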



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6144) Allow serving interactive queries from in-sync Standbys

2020-01-14 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015248#comment-17015248
 ] 

Vinoth Chandar commented on KAFKA-6144:
---

[~vvcephei] [~NaviBrar] Added subtasks here.

> Allow serving interactive queries from in-sync Standbys
> ---
>
> Key: KAFKA-6144
> URL: https://issues.apache.org/jira/browse/KAFKA-6144
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Antony Stubbs
>Assignee: Navinder Brar
>Priority: Major
>  Labels: kip-535
> Attachments: image-2019-10-09-20-33-37-423.png, 
> image-2019-10-09-20-47-38-096.png
>
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large states can take a very long 
> time, or for small state stores even more than a few ms can be a deal-breaker 
> for micro service use cases.
> One workaround is to allow stale data to be read from the state stores when 
> use case allows. Adding the use case from KAFKA-8994 as it is more 
> descriptive.
> "Consider the following scenario in a three node Streams cluster with node A, 
> node B and node R, executing a stateful sub-topology/topic group with 1 
> partition and `_num.standby.replicas=1_`  
>  * *t0*: A is the active instance owning the partition, B is the standby that 
> keeps replicating the A's state into its local disk, R just routes streams 
> IQs to active instance using StreamsMetadata
>  * *t1*: IQs pick node R as router, R forwards query to A, A responds back to 
> R which reverse forwards back the results.
>  * *t2:* Active A instance is killed and rebalance begins. IQs start failing 
> to A
>  * *t3*: Rebalance assignment happens and standby B is now promoted as active 
> instance. IQs continue to fail
>  * *t4*: B fully catches up to changelog tail and rewinds offsets to A's last 
> commit position, IQs continue to fail
>  * *t5*: IQs to R, get routed to B, which is now ready to serve results. IQs 
> start succeeding again
>  
> Depending on Kafka consumer group session/heartbeat timeouts, steps t2 and t3 
> can take a few seconds (~10 seconds based on default values). Depending on how 
> laggy the standby B was prior to A being killed, t4 can take a few seconds to 
> minutes. 
> While this behavior favors consistency over availability at all times, the 
> long unavailability window might be undesirable for certain classes of 
> applications (e.g. simple caches or dashboards). 
> This issue aims to also expose information about standby B to R, during each 
> rebalance such that the queries can be routed by an application to a standby 
> to serve stale reads, choosing availability over consistency."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9430) Tighten up lag estimates when source topic optimization is on

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated KAFKA-9430:
--
Fix Version/s: 2.5.0

> Tighten up lag estimates when source topic optimization is on 
> --
>
> Key: KAFKA-9430
> URL: https://issues.apache.org/jira/browse/KAFKA-9430
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 2.5.0
>
>
> Right now, we use _endOffsets_ of the source topic for the computation. Since 
> the source topics can also have user event produces, this is an overestimate.
>  
> From John:
> For "optimized" changelogs, this will be wrong, strictly speaking, but it's 
> an over-estimate (which seems better than an under-estimate), and it's also 
> still an apples-to-apples comparison, since all replicas would use the same 
> upper bound to compute their lags, so the "pick the freshest" replica is 
> still going to pick the right one. We can add a new 2.5 blocker ticket to 
> really fix it, and not worry about it until after this KSQL stuff is done.
>  
> For active: we need to use consumed offsets and not the end of the source topic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9431) Expose API in KafkaStreams to fetch all local offset lags

2020-01-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-9431:
-

 Summary: Expose API in KafkaStreams to fetch all local offset lags
 Key: KAFKA-9431
 URL: https://issues.apache.org/jira/browse/KAFKA-9431
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar
 Fix For: 2.5.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9430) Tighten up lag estimates when source topic optimization is on

2020-01-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-9430:
-

 Summary: Tighten up lag estimates when source topic optimization 
is on 
 Key: KAFKA-9430
 URL: https://issues.apache.org/jira/browse/KAFKA-9430
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar


Right now, we use _endOffsets_ of the source topic for the computation. Since 
the source topics can also have user event produces, this is an overestimate.

 

From John:

For "optimized" changelogs, this will be wrong, strictly speaking, but it's an 
over-estimate (which seems better than an under-estimate), and it's also still 
an apples-to-apples comparison, since all replicas would use the same upper 
bound to compute their lags, so the "pick the freshest" replica is still going 
to pick the right one. We can add a new 2.5 blocker ticket to really fix it, 
and not worry about it until after this KSQL stuff is done.

 

For active: we need to use consumed offsets and not the end of the source topic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9429) Allow ability to control whether stale reads out of state stores are desirable

2020-01-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-9429:
-

 Summary: Allow ability to control whether stale reads out of state 
stores are desirable
 Key: KAFKA-9429
 URL: https://issues.apache.org/jira/browse/KAFKA-9429
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar
 Fix For: 2.5.0


From John:

 

I also meant to talk with you about the change to allow querying recovering 
stores. I think you might have already talked with Matthias a little about this 
in the scope of KIP-216, but it's probably not ok to just change the default 
from only allowing query while running, since there are actually people 
depending on full-consistency queries for correctness right now.

 

What we can do is add an overload {{KafkaStreams.store(name, 
QueriableStoreType, QueriableStoreOptions)}}, with one option: 
{{queryStaleState(true/false)}} (your preference on the name, I just made that 
up right now). The default would be false, and KSQL would set it to true. While 
false, it would not allow querying recovering stores OR standbys. This 
basically allows a single switch to preserve existing behavior.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9428) Expose standby information in KafkaStreams via queryMetadataForKey API

2020-01-14 Thread Vinoth Chandar (Jira)
Vinoth Chandar created KAFKA-9428:
-

 Summary: Expose standby information in KafkaStreams via 
queryMetadataForKey API
 Key: KAFKA-9428
 URL: https://issues.apache.org/jira/browse/KAFKA-9428
 Project: Kafka
  Issue Type: Sub-task
Reporter: Vinoth Chandar
Assignee: Vinoth Chandar
 Fix For: 2.5.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9427) StateRestoreListener.onRestoreEnd should report actual message count

2020-01-14 Thread Chris Stromberger (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Stromberger updated KAFKA-9427:
-
Summary: StateRestoreListener.onRestoreEnd should report actual message 
count  (was: StateRestoreListener.onRestoreEnd should return actual message 
count)

> StateRestoreListener.onRestoreEnd should report actual message count
> 
>
> Key: KAFKA-9427
> URL: https://issues.apache.org/jira/browse/KAFKA-9427
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chris Stromberger
>Priority: Minor
>
> {{StateRestoreListener.onRestoreEnd}} appears to report the difference between 
> offsets as "totalRestored", which may differ from the actual number of 
> messages restored to a state store. Am assuming this is due to missing 
> offsets in compacted topics. It would be more useful if 
> {{StateRestoreListener.onRestoreEnd}} reported the actual count of 
> messages restored (sum of values reported by 
> {{StateRestoreListener.onBatchRestored}}).
> Was asked to create this ticket in Slack thread 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1578956151094200]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9427) StateRestoreListener.onRestoreEnd should return actual message count

2020-01-14 Thread Chris Stromberger (Jira)
Chris Stromberger created KAFKA-9427:


 Summary: StateRestoreListener.onRestoreEnd should return actual 
message count
 Key: KAFKA-9427
 URL: https://issues.apache.org/jira/browse/KAFKA-9427
 Project: Kafka
  Issue Type: Improvement
Reporter: Chris Stromberger


{{StateRestoreListener.onRestoreEnd}} appears to report the difference between 
offsets as "totalRestored", which may differ from the actual number of messages 
restored to a state store. Am assuming this is due to missing offsets in 
compacted topics. It would be more useful if 
{{StateRestoreListener.onRestoreEnd}} reported the actual count of messages 
restored (sum of values reported by {{StateRestoreListener.onBatchRestored}}).

Was asked to create this ticket in Slack thread 
[https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1578956151094200]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9042) Auto infer external topic partitions in stream reset tool

2020-01-14 Thread highluck (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015214#comment-17015214
 ] 

highluck commented on KAFKA-9042:
-

[~bchen225242] [~mjsax]

May I ask for a review?

> Auto infer external topic partitions in stream reset tool
> -
>
> Key: KAFKA-9042
> URL: https://issues.apache.org/jira/browse/KAFKA-9042
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Reporter: Boyang Chen
>Assignee: highluck
>Priority: Major
>  Labels: needs-kip, newbie, newbie++
>
> As of today, the user has to specify `--input-topic` in the stream reset tool 
> to be able to reset offsets to a specific position. For a stream job with 
> multiple external topics that need to be purged, users usually don't want to name all 
> the topics in order to reset the offsets. It's really painful to look through 
> the entire topology to make sure we purge all the committed offsets.
> We could add a config `--reset-all-external-topics` to the reset tool such 
> that when enabled, we could delete offsets for all involved topics. The topic 
> metadata could be acquired by issuing a `DescribeGroup` request from admin 
> client, which is stored in the member subscription information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (KAFKA-9042) Auto infer external topic partitions in stream reset tool

2020-01-14 Thread highluck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

highluck updated KAFKA-9042:

Comment: was deleted

(was: [~bchen225242]

I don't know if I understand well, but please review it!)

> Auto infer external topic partitions in stream reset tool
> -
>
> Key: KAFKA-9042
> URL: https://issues.apache.org/jira/browse/KAFKA-9042
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, tools
>Reporter: Boyang Chen
>Assignee: highluck
>Priority: Major
>  Labels: needs-kip, newbie, newbie++
>
> As of today, the user has to specify `--input-topic` in the stream reset tool 
> to be able to reset offsets to a specific position. For a stream job with 
> multiple external topics that need to be purged, users usually don't want to name all 
> the topics in order to reset the offsets. It's really painful to look through 
> the entire topology to make sure we purge all the committed offsets.
> We could add a config `--reset-all-external-topics` to the reset tool such 
> that when enabled, we could delete offsets for all involved topics. The topic 
> metadata could be acquired by issuing a `DescribeGroup` request from admin 
> client, which is stored in the member subscription information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau updated KAFKA-9423:

Description: 
KAFKA-8474 changed the layout of configuration options on the website from a 
table which over time ran out of horizontal space to a list.
This vastly improved readability but is not yet ideal. Further discussion was 
had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
This ticket is to move that discussion to a separate thread and make it more 
visible to other people and to give subsequent PRs a home.

Currently proposed options are attached to this issue.

  was:
KAKFA-8474 changed the layout of configuration options on the website from a 
table which over time ran out of horizontal space to a list.
This vastly improved readability but is not yet ideal. Further discussion was 
had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
This ticket is to move that discussion to a separate thread and make it more 
visible to other people and to give subsequent PRs a home.

Currently proposed options are attached to this issue.


> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
> Attachments: option1.png, option2.png, option3.png, option4.png
>
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-6212) Kafka Streams - Incorrect partition rebalancing

2020-01-14 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-6212.
--
Resolution: Cannot Reproduce

> Kafka Streams - Incorrect partition rebalancing
> ---
>
> Key: KAFKA-6212
> URL: https://issues.apache.org/jira/browse/KAFKA-6212
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.1
> Environment: Linux
>Reporter: Ivan Atanasov
>Priority: Major
>
> Trying to use streaming with version 0.10.0.1 of Kafka, but it is not working 
> as I'd expect. I realize that this is a fairly old version now, but it is 
> what we are running and are not in a position to upgrade right now.
> The particular problem I am having is when an extra instance of the streaming 
> app is run using the same application ID. What seems to happen is the newly 
> introduced instance takes half of the partitions available, which is expected 
> but the original instance drops all the partitions it was reading from. 
> Therefore, from then on, data is only read from half the partitions.
> Strangely offsets are still being committed for the other partitions but the 
> data from them is not consumed as expected.
> My topology is very simple for now, all it does is a print of the message. 
> Also I have tried making both instances use different client IDs and state 
> directories.
> Is this a known bug in 0.10.0.1?
> *Logs Below:*
> Instance 1:
> {quote}[2017-11-15 10:41:41,597] INFO [StreamThread-1] Setting newly assigned 
> partitions [rawEvents-5, rawEvents-6, rawEvents-3, rawEvents-4, rawEvents-9, 
> rawEvents-7, rawEvents-8, rawEvents-1, rawEvents-2, rawEvents-0] for group 
> kafka-stream-test 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-11-15 10:41:41,616] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_0 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,645] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_1 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,645] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_2 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,646] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_3 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,646] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_4 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,646] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_5 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,647] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_6 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,647] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_7 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,647] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_8 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,648] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_9 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:42:08,682] INFO [StreamThread-1] Revoking previously assigned 
> partitions [rawEvents-5, rawEvents-6, rawEvents-3, rawEvents-4, rawEvents-9, 
> rawEvents-7, rawEvents-8, rawEvents-1, rawEvents-2, rawEvents-0] for group 
> kafka-stream-test 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-11-15 10:42:08,682] INFO [StreamThread-1] Removing a task 0_0 
> (org.apache.kafka.streams.processor.internals.StreamThread){quote}
> Instance 2:
> {quote}[2017-11-15 10:42:08,827] INFO [StreamThread-1] Successfully joined 
> group kafka-stream-test with generation 2 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-11-15 10:42:08,829] INFO [StreamThread-1] Setting newly assigned 
> partitions [rawEvents-5, rawEvents-3, rawEvents-1, rawEvents-2, rawEvents-0] 
> for group kafka-stream-test 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-11-15 10:42:08,840] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_0 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:42:08,869] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_1 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:42:08,870] INFO [StreamThread-1] 

[jira] [Commented] (KAFKA-6212) Kafka Streams - Incorrect partition rebalancing

2020-01-14 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015205#comment-17015205
 ] 

Guozhang Wang commented on KAFKA-6212:
--

I'm closing this ticket since there has been no activity for about 2 years. If 
it still exists, please feel free to re-open with any additional information 
you'd like to provide.

> Kafka Streams - Incorrect partition rebalancing
> ---
>
> Key: KAFKA-6212
> URL: https://issues.apache.org/jira/browse/KAFKA-6212
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.1
> Environment: Linux
>Reporter: Ivan Atanasov
>Priority: Major
>
> Trying to use streaming with version 0.10.0.1 of Kafka, but it is not working 
> as I'd expect. I realize that this is a fairly old version now, but it is 
> what we are running and are not in a position to upgrade right now.
> The particular problem I am having is when an extra instance of the streaming 
> app is run using the same application ID. What seems to happen is the newly 
> introduced instance takes half of the partitions available, which is expected 
> but the original instance drops all the partitions it was reading from. 
> Therefore, from then on, data is only read from half the partitions.
> Strangely offsets are still being committed for the other partitions but the 
> data from them is not consumed as expected.
> My topology is very simple for now, all it does is a print of the message. 
> Also I have tried making both instances use different client IDs and state 
> directories.
> Is this a known bug in 0.10.0.1?
> *Logs Below:*
> Instance 1:
> {quote}[2017-11-15 10:41:41,597] INFO [StreamThread-1] Setting newly assigned 
> partitions [rawEvents-5, rawEvents-6, rawEvents-3, rawEvents-4, rawEvents-9, 
> rawEvents-7, rawEvents-8, rawEvents-1, rawEvents-2, rawEvents-0] for group 
> kafka-stream-test 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-11-15 10:41:41,616] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_0 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,645] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_1 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,645] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_2 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,646] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_3 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,646] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_4 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,646] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_5 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,647] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_6 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,647] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_7 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,647] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_8 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:41:41,648] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_9 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:42:08,682] INFO [StreamThread-1] Revoking previously assigned 
> partitions [rawEvents-5, rawEvents-6, rawEvents-3, rawEvents-4, rawEvents-9, 
> rawEvents-7, rawEvents-8, rawEvents-1, rawEvents-2, rawEvents-0] for group 
> kafka-stream-test 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-11-15 10:42:08,682] INFO [StreamThread-1] Removing a task 0_0 
> (org.apache.kafka.streams.processor.internals.StreamThread){quote}
> Instance 2:
> {quote}[2017-11-15 10:42:08,827] INFO [StreamThread-1] Successfully joined 
> group kafka-stream-test with generation 2 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2017-11-15 10:42:08,829] INFO [StreamThread-1] Setting newly assigned 
> partitions [rawEvents-5, rawEvents-3, rawEvents-1, rawEvents-2, rawEvents-0] 
> for group kafka-stream-test 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2017-11-15 10:42:08,840] INFO [StreamThread-1] Creating restoration consumer 
> client for stream task #0_0 
> (org.apache.kafka.streams.processor.internals.StreamTask)
> [2017-11-15 10:42:08,869] INFO [StreamThread-1] 

[jira] [Commented] (KAFKA-9426) OffsetsForLeaderEpochClient Use Switch Statement

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015160#comment-17015160
 ] 

ASF GitHub Bot commented on KAFKA-9426:
---

belugabehr commented on pull request #7959: KAFKA-9426: 
OffsetsForLeaderEpochClient Use Switch Statement
URL: https://github.com/apache/kafka/pull/7959
 
 
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> OffsetsForLeaderEpochClient Use Switch Statement
> 
>
> Key: KAFKA-9426
> URL: https://issues.apache.org/jira/browse/KAFKA-9426
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Minor
>
> Use switch statement for Error Code Enum handling.
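
The shape of the refactor, with example cases (not the exact set the client 
handles):

{code:java}
import org.apache.kafka.common.protocol.Errors;

static void handlePartitionError(Errors error, String topic) {
    switch (error) {
        case NONE:
            break;  // success: use the returned leader epoch / end offset
        case TOPIC_AUTHORIZATION_FAILED:
            throw Errors.TOPIC_AUTHORIZATION_FAILED.exception();
        case UNKNOWN_TOPIC_OR_PARTITION:
            // likely stale metadata; treat as retriable
            break;
        default:
            // generic handling for everything else
            break;
    }
}
{code}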



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9426) OffsetsForLeaderEpochClient Use Switch Statement

2020-01-14 Thread David Mollitor (Jira)
David Mollitor created KAFKA-9426:
-

 Summary: OffsetsForLeaderEpochClient Use Switch Statement
 Key: KAFKA-9426
 URL: https://issues.apache.org/jira/browse/KAFKA-9426
 Project: Kafka
  Issue Type: Improvement
Reporter: David Mollitor


Use switch statement for Error Code Enum handling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9425) InFlightRequests Class Uses Thread-Safe Counter Non-Thread-Safe Collection

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015148#comment-17015148
 ] 

ASF GitHub Bot commented on KAFKA-9425:
---

belugabehr commented on pull request #7958: KAFKA-9425: InFlightRequests Class 
Uses Thread-Safe Counter Non-Thread-Safe Collection
URL: https://github.com/apache/kafka/pull/7958
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> InFlightRequests Class Uses Thread-Safe Counter Non-Thread-Safe Collection
> --
>
> Key: KAFKA-9425
> URL: https://issues.apache.org/jira/browse/KAFKA-9425
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Minor
>
> [https://github.com/apache/kafka/blob/d6ace7b2d7c4ad721dd8247fb2b3eff9f67fbfee/clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java#L34-L36]
>  
> Not sure why this needs the overhead of {{AtomicInteger}} when the 
> collection being modified isn't itself thread-safe.  The comment of the 
> counter says that it may lag as things currently stand.
>  
> Also, add a few more niceties since I'm opening it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9425) InFlightRequests Class Uses Thread-Safe Counter Non-Thread-Safe Collection

2020-01-14 Thread David Mollitor (Jira)
David Mollitor created KAFKA-9425:
-

 Summary: InFlightRequests Class Uses Thread-Safe Counter 
Non-Thread-Safe Collection
 Key: KAFKA-9425
 URL: https://issues.apache.org/jira/browse/KAFKA-9425
 Project: Kafka
  Issue Type: Improvement
Reporter: David Mollitor


[https://github.com/apache/kafka/blob/d6ace7b2d7c4ad721dd8247fb2b3eff9f67fbfee/clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java#L34-L36]

 

Not sure why this needs the overhead of {{AtomicInteger}} when the collection 
being modified isn't itself thread-safe.  The comment of the counter says that 
it may lag as things currently stand.

 

Also, add a few more niceties since I'm opening it.
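
A minimal sketch of the suggestion: if the queue map is only ever mutated from 
one thread, a plain int can track the total without atomic overhead. The class 
and field names here are illustrative, not the real InFlightRequests internals:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class InFlightTracker {
    // HashMap is not thread-safe, so all access is assumed single-threaded anyway.
    private final Map<String, Deque<Object>> requests = new HashMap<>();
    private int inFlightCount = 0;  // plain int, updated on the same thread

    void add(String node, Object request) {
        requests.computeIfAbsent(node, n -> new ArrayDeque<>()).addFirst(request);
        inFlightCount++;
    }

    Object completeOldest(String node) {
        inFlightCount--;
        return requests.get(node).pollLast();
    }

    int count() {
        return inFlightCount;
    }
}
{code}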



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7538) Improve locking model used to update ISRs and HW

2020-01-14 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-7538.
---
Fix Version/s: 2.5.0
 Reviewer: Jason Gustafson
   Resolution: Fixed

> Improve locking model used to update ISRs and HW
> 
>
> Key: KAFKA-7538
> URL: https://issues.apache.org/jira/browse/KAFKA-7538
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.5.0
>
>
> We currently use a ReadWriteLock in Partition to update ISRs and high water 
> mark for the partition. This can result in severe lock contention if there 
> are multiple producers writing a large amount of data into a single partition.
> The current locking model is:
>  # read lock while appending to log on every Produce request on the request 
> handler thread
>  # write lock on leader change, updating ISRs etc. on request handler or 
> scheduler thread
>  # write lock on every replica fetch request to check if ISRs need to be 
> updated and to update HW and ISR on the request handler thread
> 2) is infrequent, but 1) and 3) may be frequent and can result in lock 
> contention. If there are lots of produce requests to a partition from 
> multiple processes, on the leader broker we may see:
>  # one slow log append locks up one request thread for that produce while 
> holding onto the read lock
>  # (replicationFactor-1) request threads can be blocked waiting for write 
> lock to process replica fetch request
>  # potentially several other request threads processing Produce may be queued 
> up to acquire read lock because of the waiting writers.
> In a thread dump with this issue, we noticed several request threads blocked 
> waiting for write, possibly due to replication fetch retries.
>  
> Possible fixes:
>  # Process `Partition#maybeExpandIsr` on a single scheduler thread similar to 
> `Partition#maybeShrinkIsr` so that only a single thread is blocked on the 
> write lock. But this will delay updating ISRs and HW.
>  # Change locking in `Partition#maybeExpandIsr` so that only read lock is 
> acquired to check if ISR needs updating and write lock is acquired only to 
> update ISRs. Also use a different lock for updating HW (perhaps just the 
> Partition object lock) so that typical replica fetch requests complete 
> without acquiring Partition write lock on the request handler thread.
> I will submit a PR for 2), but other suggestions to fix this are welcome.
>  
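
A minimal sketch of option 2's check-then-act pattern (names are illustrative; 
the real logic lives in Partition#maybeExpandIsr): take the read lock for the 
cheap common-case check, and escalate to the write lock only when an update is 
actually needed, re-checking after the lock switch:

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class IsrState {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<Integer> isr = new HashSet<>();

    void maybeExpandIsr(int replicaId, boolean caughtUp) {
        lock.readLock().lock();
        try {
            if (!caughtUp || isr.contains(replicaId)) {
                return;  // common case: nothing to do, no writer is blocked
            }
        } finally {
            lock.readLock().unlock();
        }
        lock.writeLock().lock();
        try {
            // Re-check: state may have changed between the two acquisitions.
            if (caughtUp && !isr.contains(replicaId)) {
                isr.add(replicaId);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}
{code}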



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7538) Improve locking model used to update ISRs and HW

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015136#comment-17015136
 ] 

ASF GitHub Bot commented on KAFKA-7538:
---

rajinisivaram commented on pull request #5866: KAFKA-7538: Reduce lock 
contention for Partition ISR lock
URL: https://github.com/apache/kafka/pull/5866
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve locking model used to update ISRs and HW
> 
>
> Key: KAFKA-7538
> URL: https://issues.apache.org/jira/browse/KAFKA-7538
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
>
> We currently use a ReadWriteLock in Partition to update ISRs and high water 
> mark for the partition. This can result in severe lock contention if there 
> are multiple producers writing a large amount of data into a single partition.
> The current locking model is:
>  # read lock while appending to log on every Produce request on the request 
> handler thread
>  # write lock on leader change, updating ISRs etc. on request handler or 
> scheduler thread
>  # write lock on every replica fetch request to check if ISRs need to be 
> updated and to update HW and ISR on the request handler thread
> 2) is infrequent, but 1) and 3) may be frequent and can result in lock 
> contention. If there are lots of produce requests to a partition from 
> multiple processes, on the leader broker we may see:
>  # one slow log append locks up one request thread for that produce while 
> holding onto the read lock
>  # (replicationFactor-1) request threads can be blocked waiting for write 
> lock to process replica fetch request
>  # potentially several other request threads processing Produce may be queued 
> up to acquire the read lock because of the waiting writers.
> In a thread dump with this issue, we noticed several request threads blocked 
> waiting for the write lock, possibly due to replication fetch retries.
>  
> Possible fixes:
>  # Process `Partition#maybeExpandIsr` on a single scheduler thread similar to 
> `Partition#maybeShrinkIsr` so that only a single thread is blocked on the 
> write lock. But this will delay updating ISRs and HW.
>  # Change locking in `Partition#maybeExpandIsr` so that only read lock is 
> acquired to check if ISR needs updating and write lock is acquired only to 
> update ISRs. Also use a different lock for updating HW (perhaps just the 
> Partition object lock) so that typical replica fetch requests complete 
> without acquiring Partition write lock on the request handler thread.
> I will submit a PR for 2), but other suggestions to fix this are welcome.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-7787) Add error specifications to KAFKA-7609

2020-01-14 Thread Tom Bentley (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Bentley updated KAFKA-7787:
---
Description: In our RPC JSON, it would be nice if we could specify what 
versions of a response could contain what errors.  See the discussion here: 
https://github.com/apache/kafka/pull/5893#discussion_r244841051  (was: In our 
RPC JSON, it would be nice if we could specify what versions of a response 
could contain what errors.  See the discussion here: 
https://github.com/apache/kafka/pull/5893)

> Add error specifications to KAFKA-7609
> --
>
> Key: KAFKA-7787
> URL: https://issues.apache.org/jira/browse/KAFKA-7787
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Colin McCabe
>Priority: Minor
>
> In our RPC JSON, it would be nice if we could specify what versions of a 
> response could contain what errors.  See the discussion here: 
> https://github.com/apache/kafka/pull/5893#discussion_r244841051



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8768) Replace DeleteRecords request/response with automated protocol

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015102#comment-17015102
 ] 

ASF GitHub Bot commented on KAFKA-8768:
---

tombentley commented on pull request #7957: KAFKA-8768: DeleteRecords 
request/response automated protocol
URL: https://github.com/apache/kafka/pull/7957
 
 
   Also add version 2 to make use of flexible versions, per KIP-482.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Replace DeleteRecords request/response with automated protocol
> --
>
> Key: KAFKA-8768
> URL: https://issues.apache.org/jira/browse/KAFKA-8768
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Tom Bentley
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9424) Using AclCommand, avoid calling the global method loadCache in SimpleAclAuthorizer

2020-01-14 Thread Steven Lu (Jira)
Steven Lu created KAFKA-9424:


 Summary: Using AclCommand, avoid calling the global method loadCache 
in SimpleAclAuthorizer
 Key: KAFKA-9424
 URL: https://issues.apache.org/jira/browse/KAFKA-9424
 Project: Kafka
  Issue Type: Improvement
  Components: admin, tools
Affects Versions: 2.3.1, 2.4.0, 0.10.2.0
 Environment: Linux,JDK7+
Reporter: Steven Lu


The AclCommand class configures SimpleAclAuthorizer, but it has no need to call 
loadCache.
We now have 20,000 topics in our Kafka cluster; every time AclCommand runs, the 
ACLs of all these topics are loaded, which is very slow.
The purpose of this optimization is to let users choose not to load the ACLs of 
all topics into memory, mainly when adding and deleting permissions.

PR available here: [https://github.com/apache/kafka/pull/7706]

When adding and deleting permissions, we can choose not to load the ACLs of all 
topics into memory by passing the two args "--load-acl-cache" "false" to 
AclCommand.main; if these args are omitted, the ACL cache is loaded by default.

This improves the running time from minutes to less than one second.
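
Assuming the proposed flag is merged as described, a programmatic invocation 
might look like the sketch below. The --load-acl-cache option is the proposal 
from the PR above, not an existing option; the remaining arguments are an 
ordinary, illustrative add-ACL call.

{code:java}
// Hypothetical sketch: --load-acl-cache is the flag proposed in this ticket.
public class AddAclWithoutCache {
    public static void main(String[] args) {
        kafka.admin.AclCommand.main(new String[] {
            "--authorizer-properties", "zookeeper.connect=localhost:2181",
            "--add",
            "--allow-principal", "User:alice",
            "--operation", "Read",
            "--topic", "payments",
            "--load-acl-cache", "false"  // proposed: skip loading all topic ACLs
        });
    }
}
{code}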



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9392) Document and add test for deleteAcls that match single/multiple resources

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015077#comment-17015077
 ] 

ASF GitHub Bot commented on KAFKA-9392:
---

rajinisivaram commented on pull request #7956: KAFKA-9392; Clarify deleteAcls 
javadoc and add test for create/delete timing
URL: https://github.com/apache/kafka/pull/7956
 
 
   Follow-on from PR #7911  to clarify the guarantee for deleteAcls and add a 
deterministic test to ensure we don't regress.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Document and add test for deleteAcls that match single/multiple resources
> -
>
> Key: KAFKA-9392
> URL: https://issues.apache.org/jira/browse/KAFKA-9392
> Project: Kafka
>  Issue Type: Task
>  Components: security
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.5.0
>
>
> From the PR review of [https://github.com/apache/kafka/pull/7911]:
> If you do {{Admin.createAcls()}} followed by {{Admin.deleteAcls()}} and specify 
> the exact ACL in both cases, you are guaranteed to delete the ACL regardless of 
> which broker handles the request. If you use a matching filter that doesn't 
> specify the resource pattern for {{deleteAcls}}, then we don't provide that 
> guarantee.
> We should document this and add a deterministic test for the guaranteed 
> behaviour to ensure we don't regress.
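
A minimal sketch of the guaranteed path with the Java Admin client follows; the 
broker address, topic and principal are illustrative. The delete uses the exact 
binding's filter rather than a broader matching filter, which is the case the 
guarantee covers.

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class CreateThenDeleteAcl {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        try (AdminClient admin = AdminClient.create(props)) {
            AclBinding binding = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "payments", PatternType.LITERAL),
                new AccessControlEntry("User:alice", "*",
                    AclOperation.READ, AclPermissionType.ALLOW));

            admin.createAcls(Collections.singleton(binding)).all().get();

            // Deleting with the exact binding's filter is the guaranteed case;
            // a filter that omits the resource pattern does not carry the
            // same guarantee.
            admin.deleteAcls(Collections.singleton(binding.toFilter())).all().get();
        }
    }
}
{code}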



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015040#comment-17015040
 ] 

Sönke Liebau commented on KAFKA-9423:
-

I've taken the liberty of creating a pull request with option 3.
Ping [~hachikuji] & [~mimaison] if you find a bit of time for a review.

> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
> Attachments: option1.png, option2.png, option3.png, option4.png
>
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015037#comment-17015037
 ] 

ASF GitHub Bot commented on KAFKA-9423:
---

soenkeliebau commented on pull request #7955: KAFKA-9423: Refine layout of 
configuration options on website and make individual settings directly linkable
URL: https://github.com/apache/kafka/pull/7955
 
 
   Test strategy:
   Built the site and served it from an Apache httpd Docker container. Used the 
W3C validator to ensure the generated HTML is valid.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
> Attachments: option1.png, option2.png, option3.png, option4.png
>
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread moshe blumberg (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015003#comment-17015003
 ] 

moshe blumberg commented on KAFKA-9423:
---

1 & 3 as well.

> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
> Attachments: option1.png, option2.png, option3.png, option4.png
>
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014995#comment-17014995
 ] 

Sönke Liebau commented on KAFKA-9423:
-

Personally, I am in favor of one of the options that list the settings vertically (1 
& 3).
If we put the specifications on one line, we risk running out of horizontal 
space again, for example if a setting has a very long default value.

> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
> Attachments: option1.png, option2.png, option3.png, option4.png
>
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9308) Misses SAN after certificate creation

2020-01-14 Thread Agostino Sarubbo (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014994#comment-17014994
 ] 

Agostino Sarubbo commented on KAFKA-9308:
-

Hello,

the same thing happens as described in KAFKA-7450.

It works for me without ssl.client.auth=required.

> Misses SAN after certificate creation
> -
>
> Key: KAFKA-9308
> URL: https://issues.apache.org/jira/browse/KAFKA-9308
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.3.1
>Reporter: Agostino Sarubbo
>Priority: Minor
>
> Hello,
> I followed the documentation to use Kafka with SSL; however, by the end the 
> entire procedure has lost the specified SAN.
> To test, run (after the first keytool command and after the last one):
>  
> {code:java}
> keytool -list -v -keystore server.keystore.jks
> {code}
> Reference:
>  [http://kafka.apache.org/documentation.html#security_ssl]
>  
> {code:java}
> #!/bin/bash
> #Step 1
> keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg 
> RSA -genkey -ext SAN=DNS:test.test.com
> #Step 2
> openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
> keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
> keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
> #Step 3
> keytool -keystore server.keystore.jks -alias localhost -certreq -file 
> cert-file 
> openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed 
> -days 365 -CAcreateserial -passin pass:test1234 
> keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert 
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}
>  
> In detail, the SAN is lost after:
> {code:java}
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}
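
As an alternative to keytool -list -v, the short Java snippet below can confirm 
whether the SAN survived the final import. It is a sketch: it assumes the 
keystore password is test1234 (the pass phrase used in the script) and the 
alias localhost from step 1.

{code:java}
import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.cert.X509Certificate;

public class CheckSan {
    public static void main(String[] args) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("server.keystore.jks")) {
            ks.load(in, "test1234".toCharArray()); // assumed keystore password
        }
        X509Certificate cert = (X509Certificate) ks.getCertificate("localhost");
        // Prints null if the SAN extension was lost during the re-import.
        System.out.println("SANs: " + cert.getSubjectAlternativeNames());
    }
}
{code}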



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau updated KAFKA-9423:

Attachment: (was: image-2020-01-14-11-18-36-825.png)

> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau updated KAFKA-9423:

Attachment: (was: image-2020-01-14-11-17-55-277.png)

> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau updated KAFKA-9423:

Description: 
KAFKA-8474 changed the layout of configuration options on the website from a 
table which over time ran out of horizontal space to a list.
This vastly improved readability but is not yet ideal. Further discussion was 
had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
This ticket is to move that discussion to a separate thread and make it more 
visible to other people and to give subsequent PRs a home.

Currently proposed options are attached to this issue.

  was:
KAFKA-8474 changed the layout of configuration options on the website from a 
table which over time ran out of horizontal space to a list.
This vastly improved readability but is not yet ideal. Further discussion was 
had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
This ticket is to move that discussion to a separate thread and make it more 
visible to other people and to give subsequent PRs a home.

Currently proposed options are listed below.

Option 1: 
 !image-2020-01-14-11-17-55-277.png|thumbnail! 

Option 2:
 !image-2020-01-14-11-18-12-190.png|thumbnail! 

Option 3:
 !image-2020-01-14-11-18-24-939.png|thumbnail! 

Option 4:
 !image-2020-01-14-11-18-36-825.png|thumbnail! 


> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau updated KAFKA-9423:

Attachment: (was: image-2020-01-14-11-18-12-190.png)

> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau updated KAFKA-9423:

Attachment: (was: image-2020-01-14-11-18-24-939.png)

> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau reassigned KAFKA-9423:
---

Assignee: Sönke Liebau

> Refine layout of configuration options on website and make individual 
> settings directly linkable
> 
>
> Key: KAFKA-9423
> URL: https://issues.apache.org/jira/browse/KAFKA-9423
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Sönke Liebau
>Assignee: Sönke Liebau
>Priority: Trivial
> Attachments: image-2020-01-14-11-17-55-277.png, 
> image-2020-01-14-11-18-12-190.png, image-2020-01-14-11-18-24-939.png, 
> image-2020-01-14-11-18-36-825.png
>
>
> KAFKA-8474 changed the layout of configuration options on the website from a 
> table which over time ran out of horizontal space to a list.
> This vastly improved readability but is not yet ideal. Further discussion was 
> had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
> This ticket is to move that discussion to a separate thread and make it more 
> visible to other people and to give subsequent PRs a home.
> Currently proposed options are listed below.
> Option 1: 
>  !image-2020-01-14-11-17-55-277.png|thumbnail! 
> Option 2:
>  !image-2020-01-14-11-18-12-190.png|thumbnail! 
> Option 3:
>  !image-2020-01-14-11-18-24-939.png|thumbnail! 
> Option 4:
>  !image-2020-01-14-11-18-36-825.png|thumbnail! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable

2020-01-14 Thread Jira
Sönke Liebau created KAFKA-9423:
---

 Summary: Refine layout of configuration options on website and 
make individual settings directly linkable
 Key: KAFKA-9423
 URL: https://issues.apache.org/jira/browse/KAFKA-9423
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Reporter: Sönke Liebau
 Attachments: image-2020-01-14-11-17-55-277.png, 
image-2020-01-14-11-18-12-190.png, image-2020-01-14-11-18-24-939.png, 
image-2020-01-14-11-18-36-825.png

KAFKA-8474 changed the layout of configuration options on the website from a 
table which over time ran out of horizontal space to a list.
This vastly improved readability but is not yet ideal. Further discussion was 
had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474.
This ticket is to move that discussion to a separate thread and make it more 
visible to other people and to give subsequent PRs a home.

Currently proposed options are listed below.

Option 1: 
 !image-2020-01-14-11-17-55-277.png|thumbnail! 

Option 2:
 !image-2020-01-14-11-18-12-190.png|thumbnail! 

Option 3:
 !image-2020-01-14-11-18-24-939.png|thumbnail! 

Option 4:
 !image-2020-01-14-11-18-36-825.png|thumbnail! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9308) Misses SAN after certificate creation

2020-01-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/KAFKA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014989#comment-17014989
 ] 

Sönke Liebau commented on KAFKA-9308:
-

Hi [~ago], thanks for the update. Would you be interested in creating a pull 
request to update the documentation?

Regarding your second question, that is a bit hard to diagnose without further 
detail, does it give any reason why the handshake failed?

> Misses SAN after certificate creation
> -
>
> Key: KAFKA-9308
> URL: https://issues.apache.org/jira/browse/KAFKA-9308
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.3.1
>Reporter: Agostino Sarubbo
>Priority: Minor
>
> Hello,
> I followed the documentation to use Kafka with SSL; however, by the end the 
> entire procedure has lost the specified SAN.
> To test, run (after the first keytool command and after the last one):
>  
> {code:java}
> keytool -list -v -keystore server.keystore.jks
> {code}
> Reference:
>  [http://kafka.apache.org/documentation.html#security_ssl]
>  
> {code:java}
> #!/bin/bash
> #Step 1
> keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg 
> RSA -genkey -ext SAN=DNS:test.test.com
> #Step 2
> openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
> keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
> keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
> #Step 3
> keytool -keystore server.keystore.jks -alias localhost -certreq -file 
> cert-file 
> openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed 
> -days 365 -CAcreateserial -passin pass:test1234 
> keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert 
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}
>  
> In detail, the SAN is lost after:
> {code:java}
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)