[jira] [Commented] (KAFKA-8338) Improve consumer offset expiration logic to take subscription into account
[ https://issues.apache.org/jira/browse/KAFKA-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839998#comment-16839998 ]

ASF GitHub Bot commented on KAFKA-8338:
---------------------------------------

huxihx commented on pull request #6737: KAFKA-8338: consumer offset expiration should consider subscription.
URL: https://github.com/apache/kafka/pull/6737

https://issues.apache.org/jira/browse/KAFKA-8338

Currently, only empty groups are checked for expired offsets. However, if a group is in the Stable state but no longer subscribes to some topics, the offsets for those topics' partitions will never be removed.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Improve consumer offset expiration logic to take subscription into account
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-8338
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8338
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Gwen Shapira
>            Assignee: huxihx
>            Priority: Major
>
> Currently, we expire consumer offsets for a group after the group is
> considered gone.
> There is a case where the consumer group still exists, but is now subscribed
> to different topics. In that case, the offsets of the old topics will never
> expire, and if lag is monitored, the monitors will show ever-growing lag on
> those topics.
> We need to improve the logic to expire the consumer offsets if the consumer
> group didn't subscribe to specific topics/partitions for enough time.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
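The proposed rule (expire offsets for topics a still-live group no longer subscribes to, once the last commit is old enough) can be sketched in a few lines of plain Java. This is only an illustration of the idea, not the broker's actual GroupMetadata code; all names below are hypothetical:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OffsetExpiry {
    // An offset of a non-empty (e.g. Stable) group becomes expirable once its
    // topic is no longer subscribed and the last commit is older than retention.
    static Set<String> expirableTopics(Set<String> subscribedTopics,
                                       Map<String, Long> lastCommitMs,
                                       long nowMs, long retentionMs) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> e : lastCommitMs.entrySet()) {
            boolean unsubscribed = !subscribedTopics.contains(e.getKey());
            boolean oldEnough = nowMs - e.getValue() >= retentionMs;
            if (unsubscribed && oldEnough) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> commits = new HashMap<>();
        commits.put("old-topic", 0L);        // last committed long ago
        commits.put("active-topic", 9_000L); // recently committed
        System.out.println(expirableTopics(Set.of("active-topic"), commits, 10_000L, 5_000L));
        // prints: [old-topic]
    }
}
```

Under today's behavior the subscription check is absent for non-empty groups, so `old-topic` would be kept forever.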
[jira] [Assigned] (KAFKA-8338) Improve consumer offset expiration logic to take subscription into account
[ https://issues.apache.org/jira/browse/KAFKA-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huxihx reassigned KAFKA-8338:
-----------------------------

    Assignee: huxihx

> Improve consumer offset expiration logic to take subscription into account
[jira] [Commented] (KAFKA-8361) Fix ConsumerPerformanceTest#testNonDetailedHeaderMatchBody to test a real ConsumerPerformance's method
[ https://issues.apache.org/jira/browse/KAFKA-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839982#comment-16839982 ]

ASF GitHub Bot commented on KAFKA-8361:
---------------------------------------

sekikn commented on pull request #6736: KAFKA-8361. Fix ConsumerPerformanceTest#testNonDetailedHeaderMatchBody to test a real ConsumerPerformance's method
URL: https://github.com/apache/kafka/pull/6736

kafka.tools.ConsumerPerformanceTest#testNonDetailedHeaderMatchBody doesn't work as a regression test, since it checks the number of fields output by an inline `println`, not by a real method of ConsumerPerformance. This PR extracts ConsumerPerformance's output logic into an independent method and has testNonDetailedHeaderMatchBody test it. It also includes some formatting fixes.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

> Fix ConsumerPerformanceTest#testNonDetailedHeaderMatchBody to test a real
> ConsumerPerformance's method
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-8361
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8361
>             Project: Kafka
>          Issue Type: Improvement
>          Components: unit tests
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Minor
>
> {{kafka.tools.ConsumerPerformanceTest#testNonDetailedHeaderMatchBody}}
> doesn't currently work as a regression test, since it tests an anonymous
> function defined in the test method itself.
> {code:java}
>   @Test
>   def testNonDetailedHeaderMatchBody(): Unit = {
>     testHeaderMatchContent(detailed = false, 2, () =>
>       println(s"${dateFormat.format(System.currentTimeMillis)}, " +
>         s"${dateFormat.format(System.currentTimeMillis)}, 1.0, 1.0, 1, 1.0, 1, 1, 1.1, 1.1"))
>   }
> {code}
> It should test a real {{ConsumerPerformance}}'s method, just like
> {{testDetailedHeaderMatchBody}}.
> {code:java}
>   @Test
>   def testDetailedHeaderMatchBody(): Unit = {
>     testHeaderMatchContent(detailed = true, 2,
>       () => ConsumerPerformance.printConsumerProgress(1, 1024 * 1024, 0, 1, 0, 0, 1, dateFormat, 1L))
>   }
> {code}
[jira] [Commented] (KAFKA-7847) KIP-421: Automatically resolve external configurations.
[ https://issues.apache.org/jira/browse/KAFKA-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839926#comment-16839926 ]

Randall Hauch commented on KAFKA-7847:
--------------------------------------

See https://github.com/apache/kafka/pull/6467 for the PR.

> KIP-421: Automatically resolve external configurations.
> --------------------------------------------------------
>
>                 Key: KAFKA-7847
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7847
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config
>            Reporter: TEJAL ADSUL
>            Priority: Minor
>              Labels: needs-kip
>             Fix For: 2.3.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This proposal intends to enhance the AbstractConfig base class to support
> replacing variables in configurations just prior to parsing and validation.
> This simple change will make it very easy for client applications, Kafka
> Connect, and Kafka Streams to use shared code to easily incorporate
> externalized secrets and other variable replacements within their
> configurations.
[jira] [Resolved] (KAFKA-7847) KIP-421: Automatically resolve external configurations.
[ https://issues.apache.org/jira/browse/KAFKA-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch resolved KAFKA-7847.
----------------------------------

    Resolution: Fixed
      Reviewer: Randall Hauch

> KIP-421: Automatically resolve external configurations.
[jira] [Commented] (KAFKA-7847) KIP-421: Automatically resolve external configurations.
[ https://issues.apache.org/jira/browse/KAFKA-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839925#comment-16839925 ]

Randall Hauch commented on KAFKA-7847:
--------------------------------------

See [KIP-421|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100829515]

> KIP-421: Automatically resolve external configurations.
[jira] [Updated] (KAFKA-7847) KIP-421: Automatically resolve external configurations.
[ https://issues.apache.org/jira/browse/KAFKA-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch updated KAFKA-7847:
---------------------------------

    Labels: needs-kip  (was: )

> KIP-421: Automatically resolve external configurations.
[jira] [Created] (KAFKA-8366) partitions of topics being deleted show up in the offline partitions metric
radai rosenblatt created KAFKA-8366:
------------------------------------

             Summary: partitions of topics being deleted show up in the offline partitions metric
                 Key: KAFKA-8366
                 URL: https://issues.apache.org/jira/browse/KAFKA-8366
             Project: Kafka
          Issue Type: Improvement
            Reporter: radai rosenblatt

I believe this is a bug. Offline partitions is a metric that indicates an error condition - lack of Kafka availability. As an artifact of how deletion is implemented, the partitions for a topic undergoing deletion will show up as offline, which just creates false-positive alerts. If needed, maybe there should be a separate "partitions to be deleted" sensor.
[jira] [Commented] (KAFKA-7994) Improve Stream-Time for rebalances and restarts
[ https://issues.apache.org/jira/browse/KAFKA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839868#comment-16839868 ]

Richard Yu commented on KAFKA-7994:
-----------------------------------

Hi all, could we get some reviews on the PR? There hasn't been too much activity as of late, so I thought it would be good if we could get some thoughts.

> Improve Stream-Time for rebalances and restarts
> -----------------------------------------------
>
>                 Key: KAFKA-7994
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7994
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Richard Yu
>            Priority: Major
>         Attachments: possible-patch.diff
>
> We compute a per-partition partition-time as the maximum timestamp over all
> records processed so far. Furthermore, we use partition-time to compute
> stream-time for each task as the maximum over all partition-times (for all
> corresponding task partitions). This stream-time is used to make decisions
> about processing out-of-order records or dropping them if they are late (i.e.,
> timestamp < stream-time - grace-period).
> During rebalances and restarts, stream-time is initialized as UNKNOWN (i.e.,
> -1) for tasks that are newly created (or migrated). In net effect, we forget
> the current stream-time in this case, which may lead to non-deterministic
> behavior if we stop processing right before a late record that would be
> dropped if we continued processing, but is not dropped after a
> rebalance/restart. Let's look at an example with a grace period of 5ms for a
> tumbling window of 5ms, and the following records (timestamps in parentheses):
>
> {code:java}
> r1(0) r2(5) r3(11) r4(2){code}
> In the example, stream-time advances as 0, 5, 11, 11 and thus record `r4` is
> dropped as late because 2 < 6 = 11 - 5. However, if we stop processing or
> rebalance after processing `r3` but before processing `r4`, we would
> reinitialize stream-time as -1, and thus would process `r4` on restart/after
> rebalance. The problem is that stream-time then advances differently from a
> global point of view: 0, 5, 11, 2.
>
> Note, this is a corner case, because if we stopped processing one record
> earlier, i.e., after processing `r2` but before processing `r3`, stream-time
> would advance correctly from a global point of view.
> A potential fix would be to store the latest observed partition-time in the
> metadata of committed offsets. That way, on restart/rebalance we can
> re-initialize time correctly.
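The r1..r4 example above can be traced with a small standalone sketch (plain Java; the names are illustrative and this is not the actual Streams code):

```java
public class StreamTimeDemo {
    static final long UNKNOWN = -1L; // stream-time right after restart/rebalance

    // A record is dropped as late if its timestamp is older than
    // stream-time minus the grace period.
    static boolean isLate(long ts, long streamTime, long grace) {
        return streamTime != UNKNOWN && ts < streamTime - grace;
    }

    public static void main(String[] args) {
        long grace = 5L;
        long streamTime = UNKNOWN;
        long[] timestamps = {0L, 5L, 11L, 2L}; // r1, r2, r3, r4
        for (long ts : timestamps) {
            System.out.println("ts=" + ts + " late=" + isLate(ts, streamTime, grace));
            streamTime = Math.max(streamTime, ts); // stream-time only advances
        }
        // r4(2) is late because 2 < 11 - 5. But if we restart after r3,
        // stream-time is re-initialized to UNKNOWN and r4 gets processed:
        System.out.println("after restart: late=" + isLate(2L, UNKNOWN, grace));
    }
}
```

Running this shows `late=true` for r4 in the continuous run but `late=false` after the simulated restart, which is exactly the non-determinism described above.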
[jira] [Commented] (KAFKA-6951) Implement offset expiration semantics for unsubscribed topics
[ https://issues.apache.org/jira/browse/KAFKA-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839753#comment-16839753 ]

Vahid Hashemian commented on KAFKA-6951:
----------------------------------------

Hi [~apovzner]. Thanks for catching this. I added a comment in the KIP to point this out, but I feel that's not enough. Perhaps a follow-up KIP for that remaining feature makes more sense?

> Implement offset expiration semantics for unsubscribed topics
> -------------------------------------------------------------
>
>                 Key: KAFKA-6951
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6951
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Vahid Hashemian
>            Assignee: Vahid Hashemian
>            Priority: Major
>
> [This portion|https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets#KIP-211:ReviseExpirationSemanticsofConsumerGroupOffsets-UnsubscribingfromaTopic]
> of KIP-211 will be implemented separately from the main PR.
[jira] [Resolved] (KAFKA-8363) Config provider parsing is broken
[ https://issues.apache.org/jira/browse/KAFKA-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch resolved KAFKA-8363.
----------------------------------

    Resolution: Fixed
      Reviewer: Randall Hauch

> Config provider parsing is broken
> ---------------------------------
>
>                 Key: KAFKA-8363
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8363
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>             Fix For: 2.0.2, 2.3.0, 2.1.2, 2.2.1
>
> The [regex|https://github.com/apache/kafka/blob/63e4f67d9ba9e08bdce705b35c5acf32dcd20633/clients/src/main/java/org/apache/kafka/common/config/ConfigTransformer.java#L56]
> used by the {{ConfigTransformer}} class to parse config provider syntax (see
> [KIP-297|https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations])
> is broken and fails when multiple path-less configs are specified. For
> example, {{"${provider:configOne} ${provider:configTwo}"}} would be parsed
> incorrectly as a single reference with a path of {{"configOne} ${provider"}}.
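The mis-parse described in the issue can be reproduced with a small standalone snippet. The pattern below is copied from the linked ConfigTransformer line; the class and method names are my own, for illustration only:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfigParseDemo {
    // Pattern copied from the ConfigTransformer line linked above.
    static final Pattern DEFAULT_PATTERN = Pattern.compile("\\$\\{(.*?):((.*?):)?(.*?)\\}");

    // Returns the "path" group of the first reference found, or null.
    static String firstPath(String value) {
        Matcher matcher = DEFAULT_PATTERN.matcher(value);
        return matcher.find() ? matcher.group(3) : null;
    }

    public static void main(String[] args) {
        // The optional middle group spans across the two path-less references,
        // so the whole string collapses into a single bogus match:
        System.out.println(firstPath("${provider:configOne} ${provider:configTwo}"));
        // prints: configOne} ${provider
    }
}
```

With a single three-part reference like `${provider:path:key}` the pattern behaves as intended; the bug only bites when two path-less references appear in one value, because the optional `((.*?):)?` group happily matches across the `}` of the first reference.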
[jira] [Updated] (KAFKA-8363) Config provider parsing is broken
[ https://issues.apache.org/jira/browse/KAFKA-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch updated KAFKA-8363:
---------------------------------

    Fix Version/s: 2.2.1
                   2.1.2
                   2.3.0
                   2.0.2

> Config provider parsing is broken
[jira] [Commented] (KAFKA-8365) Protocol and consumer support for follower fetching
[ https://issues.apache.org/jira/browse/KAFKA-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839727#comment-16839727 ]

ASF GitHub Bot commented on KAFKA-8365:
---------------------------------------

mumrah commented on pull request #6731: KAFKA-8365 Consumer support for follower fetch
URL: https://github.com/apache/kafka/pull/6731

Support for preferred read replica in the consumer client.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

> Protocol and consumer support for follower fetching
> ---------------------------------------------------
>
>                 Key: KAFKA-8365
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8365
>             Project: Kafka
>          Issue Type: New Feature
>          Components: consumer
>            Reporter: David Arthur
>            Assignee: David Arthur
>            Priority: Major
>             Fix For: 2.3.0
>
> Add the consumer client changes and implement the protocol support for
> [KIP-392|https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica]
[jira] [Commented] (KAFKA-8363) Config provider parsing is broken
[ https://issues.apache.org/jira/browse/KAFKA-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839724#comment-16839724 ]

ASF GitHub Bot commented on KAFKA-8363:
---------------------------------------

rhauch commented on pull request #6726: KAFKA-8363: Fix parsing bug for config providers
URL: https://github.com/apache/kafka/pull/6726

> Config provider parsing is broken
[jira] [Created] (KAFKA-8365) Protocol and consumer support for follower fetching
David Arthur created KAFKA-8365:
--------------------------------

             Summary: Protocol and consumer support for follower fetching
                 Key: KAFKA-8365
                 URL: https://issues.apache.org/jira/browse/KAFKA-8365
             Project: Kafka
          Issue Type: New Feature
          Components: consumer
            Reporter: David Arthur
            Assignee: David Arthur
             Fix For: 2.3.0

Add the consumer client changes and implement the protocol support for [KIP-392|https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica]
[jira] [Commented] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839711#comment-16839711 ]

Andrew commented on KAFKA-8315:
-------------------------------

I constructed a super-ugly workaround for my {{join-example}} demo app. It adds a transformer onto each of the left and right streams, and they refer to each other's streamTime to decide whether to forward the messages. Not the 'right' solution, but I might be able to fix this up to serve as a workaround. I need to ensure that the right transformers pair up depending on their assigned partition, and maybe use a state store. It's a hack, but it might just work for now. https://github.com/the4thamigo-uk/join-example/pull/1/files

> Historical join issues
> ----------------------
>
>                 Key: KAFKA-8315
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8315
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Andrew
>            Assignee: John Roesler
>            Priority: Major
>         Attachments: code.java
>
> The problem we are experiencing is that we cannot reliably perform simple
> joins over pre-populated Kafka topics. This seems more apparent where one
> topic has records at less frequent record-timestamp intervals than the other.
> An example of the issue is provided in this repository:
> [https://github.com/the4thamigo-uk/join-example]
>
> The only way to increase the period of historically joined records is to
> increase the grace period for the join windows, and this has repercussions
> when you extend it to a large period, e.g. 2 years of minute-by-minute records.
>
> Related Slack conversations:
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900]
>
> Research on this issue has gone through a few phases:
> 1) This issue was initially thought to be due to the inability to set the
> retention period for a join window via {{Materialized}}, i.e. the
> documentation says to use {{Materialized}}, not {{JoinWindows.until()}}
> ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
> but there is nowhere to pass a {{Materialized}} instance to the join
> operation; it seems only the group operation supports it.
> This was considered to be a problem with the documentation, not with the API,
> and is addressed in [https://github.com/apache/kafka/pull/6664]
> 2) We then found an apparent issue in the code which would affect the
> partition that is selected to deliver the next record to the join. This would
> only be a problem for data that is out of order, and join-example uses data
> that is in order of timestamp in both topics. So this fix is thought not to
> affect join-example.
> This was considered to be an issue and is being addressed in
> [https://github.com/apache/kafka/pull/6719]
> 3) Further investigation using a crafted unit test seems to show that the
> partition selection and ordering (PartitionGroup/RecordQueue) work OK:
> [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b]
> 4) The current assumption is that the issue is rooted in the way records are
> consumed from the topics:
> we have tried to set various options to suppress reads from the source topics,
> but it doesn't seem to make any difference:
> [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63]
[jira] [Commented] (KAFKA-8354) Replace SyncGroup request/response with automated protocol
[ https://issues.apache.org/jira/browse/KAFKA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839707#comment-16839707 ]

ASF GitHub Bot commented on KAFKA-8354:
---------------------------------------

abbccdda commented on pull request #6729: KAFKA-8354: Replace Sync group request/response with automated protocol
URL: https://github.com/apache/kafka/pull/6729

As part of https://issues.apache.org/jira/browse/KAFKA-7830

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

> Replace SyncGroup request/response with automated protocol
> ----------------------------------------------------------
>
>                 Key: KAFKA-8354
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8354
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
[jira] [Commented] (KAFKA-8305) AdminClient should support creating topics with default partitions and replication factor
[ https://issues.apache.org/jira/browse/KAFKA-8305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839617#comment-16839617 ]

ASF GitHub Bot commented on KAFKA-8305:
---------------------------------------

agavra commented on pull request #6728: KAFKA-8305: support default partitions & replication factor in AdminClient#createTopic
URL: https://github.com/apache/kafka/pull/6728

See [KIP-464](https://cwiki.apache.org/confluence/display/KAFKA/KIP-464%3A+Defaults+for+AdminClient%23createTopic) for more information.

### Description

This change makes the two changes required to support creating topics using the cluster defaults for replication and partitions:
1. Adds a `NewTopic(String)` constructor to the `NewTopic` API
2. Changes the `AdminManager` to accept `-1` as a valid value for replication factor and partitions. If this is the case, it will resolve it using the default configuration.

### Testing

- Updated unit tests with the new conditions
- **TODO:** I still need to do end-to-end testing by spinning up my own Kafka cluster using this code and making sure that it works, but I'm putting the PR out early for review since the feature freeze is coming up soon

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

> AdminClient should support creating topics with default partitions and
> replication factor
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-8305
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8305
>             Project: Kafka
>          Issue Type: Improvement
>          Components: admin
>            Reporter: Almog Gavra
>            Priority: Major
>
> Today, the AdminClient creates topics by requiring a `NewTopic` object, which
> must contain either partitions and replicas or an exact broker mapping (which
> then infers partitions and replicas). Some users, however, could benefit from
> just using the cluster default for replication factor but may not want to use
> auto topic creation.
> NOTE: I am planning on working on this, but I do not have permissions to
> assign this ticket to myself.
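The `-1` sentinel resolution described in the PR can be sketched as follows. This is only an illustration of the proposed broker-side behavior, not the actual AdminManager implementation; the names and default values are hypothetical:

```java
public class TopicDefaults {
    // KIP-464 proposes -1 as a sentinel meaning "use the broker default".
    static final int USE_DEFAULT = -1;

    static int resolve(int requested, int brokerDefault, String what) {
        if (requested == USE_DEFAULT) return brokerDefault;
        if (requested <= 0) throw new IllegalArgumentException("invalid " + what + ": " + requested);
        return requested;
    }

    public static void main(String[] args) {
        // assume num.partitions=6 and default.replication.factor=3 on the broker
        System.out.println(resolve(-1, 6, "partition count"));    // 6: broker default applies
        System.out.println(resolve(3, 6, "partition count"));     // 3: explicit value wins
        System.out.println(resolve(-1, 3, "replication factor")); // 3: broker default applies
    }
}
```

A client would then be able to create a topic from just a name, letting the broker fill in both values.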
[jira] [Comment Edited] (KAFKA-6951) Implement offset expiration semantics for unsubscribed topics
[ https://issues.apache.org/jira/browse/KAFKA-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839605#comment-16839605 ] Anna Povzner edited comment on KAFKA-6951 at 5/14/19 4:34 PM: -- Hi [~vahid], KIP-211 wiki implies that KIP-211 is fully implemented and in 2.1.0 release. Could you please add a reference to this JIRA to KIP-211 wiki, so that it is clear which part of KIP-211 is implemented and which is not? was (Author: apovzner): Hi [~vahid], KIP-211 wiki implies that KIP-211 is fully implemented and in 2.1.1 release. Could you please add a reference to this JIRA to KIP-211 wiki, so that it is clear which part of KIP-211 is implemented and which is not? > Implement offset expiration semantics for unsubscribed topics > - > > Key: KAFKA-6951 > URL: https://issues.apache.org/jira/browse/KAFKA-6951 > Project: Kafka > Issue Type: Improvement > Components: core >Reporter: Vahid Hashemian >Assignee: Vahid Hashemian >Priority: Major > > [This > portion|https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets#KIP-211:ReviseExpirationSemanticsofConsumerGroupOffsets-UnsubscribingfromaTopic] > of KIP-211 will be implemented separately from the main PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
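The unimplemented portion of KIP-211 discussed in this thread (expiring offsets for topics a still-live group no longer subscribes to) can be modelled roughly as follows. All names here are illustrative, not the actual group coordinator API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Rough model: an offset becomes an expiration candidate once its topic has
// been out of the group's subscription for longer than the retention period,
// even if the group itself is still Stable.
public class OffsetExpiry {
    public static Set<String> expiredTopics(Map<String, Long> lastSubscribedMs,
                                            Set<String> currentSubscription,
                                            long nowMs, long retentionMs) {
        Set<String> expired = new HashSet<>();
        for (Map.Entry<String, Long> e : lastSubscribedMs.entrySet()) {
            boolean stillSubscribed = currentSubscription.contains(e.getKey());
            if (!stillSubscribed && nowMs - e.getValue() >= retentionMs) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        Map<String, Long> lastSubscribed = new HashMap<>();
        lastSubscribed.put("old-topic", 0L);  // the group moved off this topic at t=0
        lastSubscribed.put("live-topic", 0L);
        Set<String> subscription = new HashSet<>();
        subscription.add("live-topic");
        // Only the unsubscribed topic's offsets expire once retention elapses.
        System.out.println(expiredTopics(lastSubscribed, subscription, 10_000L, 5_000L));
    }
}
```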
[jira] [Updated] (KAFKA-8343) streams application crashed due to rocksdb
[ https://issues.apache.org/jira/browse/KAFKA-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoshu updated KAFKA-8343: -- Description: My streams application always crashes within a few days. The crash log looks like [https://github.com/facebook/rocksdb/issues/5234], so I think it may be because RocksDBStore.java is closed incorrectly under multithreaded access, or RocksIterator.key() is accessed after RocksDBStore.close() in some cases. (was: My streams application always crashes within a few days. The crash log looks like [https://github.com/facebook/rocksdb/issues/5234], so I think it may be because RocksDBStore.java is closed incorrectly under multithreaded access, or RocksIterator.key() is accessed after RocksDBStore.close().) > streams application crashed due to rocksdb > -- > > Key: KAFKA-8343 > URL: https://issues.apache.org/jira/browse/KAFKA-8343 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 > Environment: centos 7 jdk8 kafka-streams1.0 >Reporter: gaoshu >Priority: Major > Attachments: fullsizeoutput_6.jpeg > > > My streams application always crashes within a few days. The crash log looks like > [https://github.com/facebook/rocksdb/issues/5234], > so I think it may be because RocksDBStore.java is closed incorrectly under > multithreaded access, or RocksIterator.key() is accessed after RocksDBStore.close() > in some cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8343) streams application crashed due to rocksdb
[ https://issues.apache.org/jira/browse/KAFKA-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoshu updated KAFKA-8343: -- Description: My streams application always crashes within a few days. The crash log looks like [https://github.com/facebook/rocksdb/issues/5234], so I think it may be because RocksDBStore.java is closed incorrectly under multithreaded access, or RocksIterator.key() is accessed after RocksDBStore.close(). (was: My streams application always crashes within a few days. The crash log looks like [https://github.com/facebook/rocksdb/issues/5234], so I think it may be because RocksDBStore.java is closed incorrectly under multithreaded access. Looking through the code below, db.close() should run only after the open iterators are closed. However, db.close() may be executed before the iterators are closed due to instruction reordering. I hope my guess is correct.
{code:java}
// RocksDBStore.java
@Override
public synchronized void close() {
    if (!open) {
        return;
    }
    open = false;
    closeOpenIterators();
    options.close();
    wOptions.close();
    fOptions.close();
    db.close();
    options = null;
    wOptions = null;
    fOptions = null;
    db = null;
}
{code}) > streams application crashed due to rocksdb > -- > > Key: KAFKA-8343 > URL: https://issues.apache.org/jira/browse/KAFKA-8343 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 > Environment: centos 7 jdk8 kafka-streams1.0 >Reporter: gaoshu >Priority: Major > Attachments: fullsizeoutput_6.jpeg > > > My streams application always crashes within a few days. The crash log looks like > [https://github.com/facebook/rocksdb/issues/5234], > so I think it may be because RocksDBStore.java is closed incorrectly under > multithreaded access, or RocksIterator.key() is accessed after RocksDBStore.close(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
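The close-ordering concern raised above can be illustrated with a simplified stand-in. This is not the real RocksDBStore; it is a sketch assuming iterators must be closed before the underlying handle and that all access goes through the store's monitor lock:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: close open iterators first, then release the underlying
// resource, all inside one synchronized block so a concurrent caller can
// never observe the handle closed while iterators are still live.
public class GuardedStore implements AutoCloseable {
    private final List<AutoCloseable> openIterators = new ArrayList<>();
    private boolean open = true;

    public synchronized void register(AutoCloseable iterator) {
        if (!open) {
            throw new IllegalStateException("store is closed");
        }
        openIterators.add(iterator);
    }

    public synchronized boolean isOpen() {
        return open;
    }

    @Override
    public synchronized void close() throws Exception {
        if (!open) {
            return; // idempotent close
        }
        open = false;
        for (AutoCloseable it : openIterators) {
            it.close(); // iterators first...
        }
        openIterators.clear();
        // ...then the underlying handle would be released here
        // (db.close() in the real store).
    }

    public static void main(String[] args) throws Exception {
        GuardedStore store = new GuardedStore();
        store.register(() -> System.out.println("iterator closed"));
        store.close();
        System.out.println(store.isOpen()); // false after close
    }
}
```

Because every method synchronizes on the same monitor, the Java memory model forbids another thread from observing the resource release ahead of the iterator cleanup, which is the reordering the reporter suspects.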
[jira] [Assigned] (KAFKA-8364) Avoid decompression of record when validate record at server in the scene of inPlaceAssignment .
[ https://issues.apache.org/jira/browse/KAFKA-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flower.min reassigned KAFKA-8364: - Assignee: Flower.min > Avoid decompression of record when validate record at server in the scene of > inPlaceAssignment . > - > > Key: KAFKA-8364 > URL: https://issues.apache.org/jira/browse/KAFKA-8364 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 2.2.0 >Reporter: Flower.min >Assignee: Flower.min >Priority: Major > > We did performance testing of the Kafka server in a specific scenario. We built a kafka cluster with one broker and created topics with different numbers of partitions. We then started many producer processes sending large volumes of messages to one of the topics in each test, and found that CPU usage reached its upper limit while the bandwidth of the server network did not (network inflow rate: 600 MB/s; CPU: >97%). > We analysed a JFR (Java Flight Recorder) recording of the Kafka server taken during the performance tests and located the hot spot at the code "ByteBuffer recordBuffer = ByteBuffer.allocate(sizeOfBodyInBytes);" (class: DefaultRecord, method: readFrom()), which consumed CPU and caused a lot of GC. After removing that allocation and copy in our modified code, test performance improved greatly (network inflow rate: 1 GB/s; CPU: <60%). That issue has already been raised and solved in KAFKA-8106. > We also analysed the server-side record validation code. Currently the broker decompresses the whole record, including key and value, to validate the record timestamp, key, offset, uncompressed size in bytes, and magic. We removed the decompression operation and ran the performance test again; the CPU's stable usage was below 30% or even lower. Removing the decompression operation during record validation minimises CPU usage and improves performance greatly. Should we consider avoiding record decompression when validating records at the server in the inPlaceAssignment scenario? > We think the server-side record validation process should be optimised so that messages can be verified without decompressing them. Perhaps the client could attach some batch-level properties ('batch.min.timestamp' (Long), 'records.number' (Integer), 'all.key.is.null' (boolean)) for validation, so that the broker does not need to decompress records to validate 'offset', 'timestamp', and key (the value of 'all.key.is.null' would be false when any key is null). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
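The batch-level validation idea proposed in this ticket could look roughly like the following. The three property names come from the ticket itself; everything else (class name, signature, which checks the broker performs) is an assumption for illustration:

```java
// Hypothetical sketch of validating a compressed batch from batch-level
// metadata only ('batch.min.timestamp', 'records.number', 'all.key.is.null'
// are the properties proposed in the ticket; they are not an existing Kafka
// protocol field). None of these checks requires per-record decompression.
public class BatchValidator {
    public static boolean validate(long batchMinTimestamp,
                                   int recordsNumber,
                                   boolean anyKeyIsNull,
                                   long minAllowedTimestamp,
                                   boolean topicRequiresKeys) {
        if (recordsNumber <= 0) {
            return false; // an empty or corrupt batch fails validation
        }
        if (batchMinTimestamp < minAllowedTimestamp) {
            return false; // the oldest record already violates the timestamp bound
        }
        if (topicRequiresKeys && anyKeyIsNull) {
            return false; // e.g. a compacted topic rejects null keys
        }
        return true;
    }

    public static void main(String[] args) {
        // A well-formed batch on a compacted topic with all keys present passes.
        System.out.println(validate(100L, 5, false, 0L, true));
    }
}
```

The trade-off, not addressed in this sketch, is that the broker must trust the client-supplied metadata or spot-check it, since nothing is decompressed to verify it.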
[jira] [Created] (KAFKA-8364) Avoid decompression of record when validate record at server in the scene of inPlaceAssignment .
Flower.min created KAFKA-8364: - Summary: Avoid decompression of record when validate record at server in the scene of inPlaceAssignment . Key: KAFKA-8364 URL: https://issues.apache.org/jira/browse/KAFKA-8364 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 2.2.0 Reporter: Flower.min -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more apparent where one topic has records at less frequent record timestamp intervals than the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] The only way to increase the period of historically joined records is to increase the grace period for the join windows, and this has repercussions when you extend it to a large period, e.g. 2 years of minute-by-minute records. Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] Research on this issue has gone through a few phases : 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized}}: i.e. the documentation says to use `Materialized`, not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; only the group operation supports it, it seems. This was considered to be a problem with the documentation, not with the API, and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for data that is out of order, and join-example uses data that is in timestamp order in both topics, so this fix is thought not to affect join-example. This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test seems to show that the partition selection and ordering (PartitionGroup/RecordQueue) work ok: [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) The current assumption is that the issue is rooted in the way records are consumed from the topics: we have tried to set various options to suppress reads from the source topics, but it doesn't seem to make any difference: [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63] was: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more apparent where one topic has records at less frequent record timestamp intervals than the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] Research on this issue has gone through a few phases : 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized}}: i.e. the documentation says to use `Materialized`, not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; only the group operation supports it, it seems. This was considered to be a problem with the documentation, not with the API, and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for data that is out of order, and join-example uses data that is in timestamp order in both topics, so this fix is thought not to affect join-example. This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test seems to show that the partition selection and ordering (PartitionGroup/RecordQueue) work ok: [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) The current assumption is that the issue is rooted in the way records are consumed from the topics: we have tried to set various options to suppress reads from the source topics, but it doesn't seem to make any difference: [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63] > Historical join issues > -- > > Key: KAFKA-8315 > URL: https://issues.apache.org/jira/browse/KAFKA-8315 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Andrew >Assignee: John Roesler >Priority: Major > Attachments: code.java > > > The problem we are experiencing is that we cannot reliably perform simple > joins over pre-populated kafka topics. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
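The grace-period repercussion mentioned in the description can be made concrete with the window arithmetic alone. This is plain arithmetic, not the Kafka Streams API; the method names are illustrative:

```java
// Sketch of windowed-join timing: two records join when their timestamps are
// within the window, and a record is dropped as late once observed stream
// time passes its window end plus the grace period. Joining years of
// pre-populated history minute by minute therefore forces a very large grace
// period, which is the repercussion described in the ticket.
public class JoinWindowMath {
    // Symmetric join window: |leftTs - rightTs| <= windowMs.
    public static boolean withinWindow(long leftTs, long rightTs, long windowMs) {
        return Math.abs(leftTs - rightTs) <= windowMs;
    }

    // A record is late (and dropped) once stream time exceeds recordTs + window + grace.
    public static boolean isLate(long recordTs, long streamTimeMs, long windowMs, long graceMs) {
        return streamTimeMs > recordTs + windowMs + graceMs;
    }

    public static void main(String[] args) {
        System.out.println(withinWindow(1_000L, 1_500L, 600L)); // true: gap of 500 <= 600
        System.out.println(isLate(0L, 700L, 500L, 100L));       // true: 700 > 0 + 500 + 100
    }
}
```

With stream time already advanced to "now", a historical record from two years ago only survives if windowMs + graceMs covers those two years, which is why increasing grace is the only workaround reported above.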
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more apparent where one topic has records at less frequent record timestamp intervals that the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] Research on this issue has gone through a few phases : 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized: i.e.}} The documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is no where to pass a `Materialized` instance to the join operation, only to the group operation is supported it seems. This was considered to be a problem with the documentation not with the API and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for data that is out-of-order, and join-example uses data that is in order of timestamp in both topics. So this fix is thought not to affect join-example. 
This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test seems to show that the partition-selection and ordering (PartitionGroup/RecordQueue) seems to work ok [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) the current assumption is that the issue is rooted in the way records are consumed from the topics : We have tried to set various options to suppress reads form the source topics but it doesnt seem to make any difference : [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63] was: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more apparent where one topic has records at less frequent record timestamp intervals that the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized: i.e.}} The documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is no where to pass a `Materialized` instance to the join operation, only to the group operation is supported it seems. This was considered to be a problem with the documentation not with the API and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. 
This would only be a problem for data that is out-of-order, and join-example uses data that is in order of timestamp in both topics. So this fix is thought not to affect join-example. This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test seems to show that the partition-selection and ordering (PartitionGroup/RecordQueue) seems to work ok [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) the current assumption is that the issue is rooted in the way records are consumed from the topics : We have tried to set various options to suppress reads form the source topics but it doesnt seem to make any difference : [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63] > Historical join issues > -- > > Key: KAFKA-8315 > URL: https://issues.apache.org/jira/browse/KAFKA-8315 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Andrew >Assignee: John Roesler >Priority: Major > Attachments: code.java > > > The problem we are experiencing is that we cannot reliably perform simple > joins over
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals that the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized: i.e.}} The documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is no where to pass a `Materialized` instance to the join operation, only to the group operation is supported it seems. This was considered to be a problem with the documentation not with the API and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for data that is out-of-order, and join-example uses data that is in order of timestamp in both topics. So this fix is thought not to affect join-example. 
This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test seems to show that the partition-selection and ordering (PartitionGroup/RecordQueue) seems to work ok [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) the current assumption is that the issue is rooted in the way records are consumed from the topics : We have tried to set various options to suppress reads form the source topics but it doesnt seem to make any difference : [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63] was: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals that the other. An example of the issue is provided in this repository : [ [https://github.com/the4thamigo-uk/join-example]] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized: i.e.}} The documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is no where to pass a `Materialized` instance to the join operation, only to the group operation is supported it seems. This was considered to be a problem with the documentation not with the API and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. 
This would only be a problem for data that is out-of-order, and join-example uses data that is in order of timestamp in both topics. So this fix is thought not to affect join-example. This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test seems to show that the partition-selection and ordering (PartitionGroup/RecordQueue) seems to work ok [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) the current assumption is that the issue is rooted in the way records are consumed from the topics : We have tried to set various options to suppress reads form the source topics but it doesnt seem to make any difference : https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63 > Historical join issues > -- > > Key: KAFKA-8315 > URL: https://issues.apache.org/jira/browse/KAFKA-8315 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Andrew >Assignee: John Roesler >Priority: Major > Attachments: code.java > > > The problem we are experiencing is that we cannot reliably perform simple > joins over pre-populated kafka topics. This seems more of a
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals that the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized: i.e.}} The documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is no where to pass a `Materialized` instance to the join operation, only to the group operation is supported it seems. This was considered to be a problem with the documentation not with the API and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for data that is out-of-order, and join-example uses data that is in order of timestamp in both topics. So this fix is thought not to affect join-example. 
This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test seems to show that the partition-selection and ordering (PartitionGroup/RecordQueue) seems to work ok [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) the current assumption is that the issue is rooted in the way records are consumed from the topics : We have tried to set various options to suppress reads form the source topics but it doesnt seem to make any difference : [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63] was: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals that the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized: i.e.}} The documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is no where to pass a `Materialized` instance to the join operation, only to the group operation is supported it seems. This was considered to be a problem with the documentation not with the API and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. 
This would only be a problem for data that is out-of-order, and join-example uses data that is in order of timestamp in both topics. So this fix is thought not to affect join-example. This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test seems to show that the partition-selection and ordering (PartitionGroup/RecordQueue) seems to work ok [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) the current assumption is that the issue is rooted in the way records are consumed from the topics : We have tried to set various options to suppress reads form the source topics but it doesnt seem to make any difference : [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63] > Historical join issues > -- > > Key: KAFKA-8315 > URL: https://issues.apache.org/jira/browse/KAFKA-8315 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Andrew >Assignee: John Roesler >Priority: Major > Attachments: code.java > > > The problem we are experiencing is that we cannot reliably perform simple > joins over pre-populated kafka topics. This seems more of a
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more apparent where one topic has records at less frequent record timestamp intervals that the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized: i.e.}} The documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is no where to pass a `Materialized` instance to the join operation, only to the group operation is supported it seems. This was considered to be a problem with the documentation not with the API and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for data that is out-of-order, and join-example uses data that is in order of timestamp in both topics. So this fix is thought not to affect join-example. 
This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test suggests that the partition selection and ordering (PartitionGroup/RecordQueue) work correctly [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) The current assumption is that the issue is rooted in the way records are consumed from the topics: we have tried various options to suppress reads from the source topics, but it doesn't seem to make any difference: [https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63]
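The reported symptom is that joins over pre-populated topics miss matches when one topic's timestamps are much sparser than the other's. As a point of reference, the *expected* semantics of a stream–stream inner join are purely timestamp-based: every left/right pair with the same key whose timestamps differ by at most the window size should join, regardless of arrival order. A naive Python sketch of that reference behaviour (hypothetical, not the Streams implementation) for the frequent-vs-sparse scenario:

```python
def windowed_join(left, right, window_ms):
    """Naive reference semantics for a stream-stream inner join:
    every (l, r) pair with matching keys whose timestamps differ by
    at most window_ms joins, independent of arrival order.
    Records are (timestamp, key, value) tuples."""
    out = []
    for lts, lkey, lval in left:
        for rts, rkey, rval in right:
            if lkey == rkey and abs(lts - rts) <= window_ms:
                out.append((max(lts, rts), lkey, (lval, rval)))
    return sorted(out)

# One topic with frequent timestamps, one sparse -- the reported scenario.
left  = [(0, "k", "l0"), (100, "k", "l1"), (200, "k", "l2"), (300, "k", "l3")]
right = [(0, "k", "r0"), (300, "k", "r1")]
joined = windowed_join(left, right, window_ms=100)
# l2 at t=200 joins only the sparse record at t=300; nothing pairs it at t=0.
```

A correct Streams join over these inputs should produce the same set of pairs; fewer results than this reference would indicate records being dropped or windows closing early on the consumption path.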
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals than the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized}}: i.e. the documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; it seems only the group operation is supported. This was considered to be a problem with the documentation, not with the API, and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for out-of-order data, and join-example uses data that is in timestamp order in both topics, so this fix is thought not to affect join-example. This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test suggests that the partition selection and ordering (PartitionGroup/RecordQueue) work correctly [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) The current assumption is that the issue is rooted in the way records are consumed from the topics: we have tried various options to suppress reads from the source topics, but it doesn't seem to make any difference : https://github.com/the4thamigo-uk/join-example/commit/c674977fd0fdc689152695065d9277abea6bef63
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals than the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] Related slack conversations : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1557733979453900] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized}}: i.e. the documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; it seems only the group operation is supported. This was considered to be a problem with the documentation, not with the API, and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for out-of-order data, and join-example uses data that is in timestamp order in both topics, so this fix is thought not to affect join-example. This was considered to be an issue and is being addressed in [https://github.com/apache/kafka/pull/6719] 3) Further investigation using a crafted unit test suggests that the partition selection and ordering (PartitionGroup/RecordQueue) work correctly [https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b] 4) The current assumption is that the issue is rooted in the way records are consumed from the topics.
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals than the other. An example of the issue is provided in this repository : [https://github.com/the4thamigo-uk/join-example] 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized}}: i.e. the documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; it seems only the group operation is supported. This was considered to be a problem with the documentation, not with the API, and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This would only be a problem for out-of-order data, and join-example uses data that is in timestamp order in both topics, so this fix is thought not to affect join-example. This was considered to be an issue and is being addressed in https://github.com/apache/kafka/pull/6719 3) Further investigation using a couple of crafted unit tests Slack conversation here : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [Additional] From what I understand, the retention period should be independent of the grace period, so I think this is more than a documentation fix (see comments below)
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals than the other. An example of the issue is provided in this repository : https://github.com/the4thamigo-uk/join-example 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized}}: i.e. the documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; it seems only the group operation is supported. This was considered to be a problem with the documentation, not with the API, and is addressed in [https://github.com/apache/kafka/pull/6664] 2) We then found an apparent issue in the code which would affect the partition that is selected to deliver the next record to the join. This was considered to be an issue and is being addressed in Slack conversation here : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [Additional] From what I understand, the retention period should be independent of the grace period, so I think this is more than a documentation fix (see comments below) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals than the other. An example of the issue is provided in this repository : 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized}}: i.e. the documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; it seems only the group operation is supported. This was considered to be a problem with the documentation, not with the API, and is addressed in [https://github.com/apache/kafka/pull/6664] 2) Slack conversation here : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [Additional] From what I understand, the retention period should be independent of the grace period, so I think this is more than a documentation fix (see comments below)
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Description: The problem we are experiencing is that we cannot reliably perform simple joins over pre-populated kafka topics. This seems more of a problem where one topic has records at less frequent record timestamp intervals than the other. 1) This issue was initially thought to be due to the inability to set the retention period for a join window via {{Materialized}}. The documentation says to use `Materialized` not `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; it seems only the group operation is supported. Slack conversation here : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300] [Additional] From what I understand, the retention period should be independent of the grace period, so I think this is more than a documentation fix (see comments below)
[jira] [Updated] (KAFKA-8315) Historical join issues
[ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew updated KAFKA-8315: -- Summary: Historical join issues (was: Cannot pass Materialized into a join operation - hence can't set retention period independent of grace)
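The [Additional] note in the updates above argues that retention should be independent of grace: grace bounds how late a record may arrive (relative to stream time) and still be processed, while retention bounds how long window state is kept around. A toy model under that reading (plain Python, not Kafka code; the field names are illustrative only):

```python
class WindowState:
    """Toy model separating grace from retention, per the [Additional] note:
    grace bounds how late a record may arrive and still be accepted;
    retention bounds how long stored window state remains queryable."""
    def __init__(self, grace_ms, retention_ms):
        self.grace_ms = grace_ms
        self.retention_ms = retention_ms
        self.records = []      # accepted (timestamp, value) pairs
        self.stream_time = 0   # highest timestamp observed so far

    def accept(self, ts, value):
        self.stream_time = max(self.stream_time, ts)
        if ts < self.stream_time - self.grace_ms:
            return False       # later than the grace period: dropped
        self.records.append((ts, value))
        return True

    def retained(self):
        # Only state newer than (stream_time - retention) stays queryable.
        cutoff = self.stream_time - self.retention_ms
        return [(ts, v) for ts, v in self.records if ts >= cutoff]

w = WindowState(grace_ms=10, retention_ms=1000)
w.accept(100, "a")
w.accept(95, "b")        # 5 ms late, within grace: accepted
late = w.accept(50, "c") # 50 ms late, beyond grace: dropped
```

The two knobs are independent here: a short grace drops very late records immediately, while a long retention keeps accepted state queryable long afterwards, which is the distinction the reporter says the documentation blurs.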