[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850501#comment-16850501 ]

ASF GitHub Bot commented on KAFKA-7652:

guozhangwang commented on pull request #6349: KAFKA-7652: [WIP] Peel off the segmenting layer on session store caching
URL: https://github.com/apache/kafka/pull/6349

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
>
>                 Key: KAFKA-7652
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7652
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 2.0.1
>            Reporter: Jonathan Gordon
>            Assignee: Guozhang Wang
>            Priority: Major
>              Labels: kip
>             Fix For: 2.3.0
>
>         Attachments: 0.10.2.1-NamedCache.txt, 2.2.0-rc0_b-NamedCache.txt, 2.3.0-7652-NamedCache.txt, kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but
> experience a severe performance degradation. The highest amount of CPU time
> seems spent in retrieving from the local cache. Here's an example thread
> profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state
> store with acceptable performance. Here's an example thread profile with
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation reveals that it appears we're performing about 3 orders
> magnitude more lookups on the NamedCache over a comparable time period. I've
> attached logs of the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>
> KIP-420:
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
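
For readers who want to reproduce the reported setup, the two settings quoted in the description can be written out as plain properties. This is a minimal sketch, assuming the keys are supplied as strings through `java.util.Properties`; the class and method names are hypothetical, and only the two keys and their values come from the report.

```java
import java.util.Properties;

// Hypothetical holder for the two settings quoted in the issue description.
final class ReportedStreamsSettings {

    // Builds only the two properties named in the report.
    static Properties reported() {
        Properties props = new Properties();
        // commit.interval.ms = 30 * 1000, i.e. 30 seconds
        props.setProperty("commit.interval.ms", Integer.toString(30 * 1000));
        // cache.max.bytes.buffering = 10485760, i.e. 10 * 1024 * 1024 bytes (10 MiB)
        props.setProperty("cache.max.bytes.buffering", Integer.toString(10 * 1024 * 1024));
        return props;
    }

    public static void main(String[] args) {
        Properties props = reported();
        System.out.println("commit.interval.ms=" + props.getProperty("commit.interval.ms"));
        System.out.println("cache.max.bytes.buffering=" + props.getProperty("cache.max.bytes.buffering"));
    }
}
```

A real application would add these to the rest of its Streams configuration (application id, bootstrap servers, serdes), none of which are given in the report.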
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820588#comment-16820588 ]

ASF GitHub Bot commented on KAFKA-7652:

guozhangwang commented on pull request #6448: KAFKA-7652: Restrict range of fetch/findSessions in cache
URL: https://github.com/apache/kafka/pull/6448
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793213#comment-16793213 ]

Sophie Blee-Goldman commented on KAFKA-7652:

Hi [~jonathanpdx]. I've been looking into the caching layer more deeply and discussed it with Guozhang. We believe his earlier patch is not an appropriate fix, so I have opened a PR that should address this more completely. If you could, please try this out on top of trunk and let me know if it helps and how it compares: https://github.com/apache/kafka/pull/6448
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793208#comment-16793208 ]

ASF GitHub Bot commented on KAFKA-7652:

ableegoldman commented on pull request #6448: KAFKA-7652: Restrict range of fetch/findSessions in cache
URL: https://github.com/apache/kafka/pull/6448

Reduce the total key space cache iterators have to search for segmented byte stores by wrapping several single-segment iterators.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
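
The idea in the PR description (several single-segment iterators instead of one iterator over the whole key space) can be illustrated with a toy sorted cache. This is only a sketch under stated assumptions, not the code from PR #6448: the cache is modelled as a `TreeMap`, the cache key layout (zero-padded segment id, then record key) is hypothetical, and all class and method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a sorted cache whose keys are prefixed by a segment id.
final class SegmentedCacheRangeSketch {

    // Hypothetical cache key layout: fixed-width segment id, then the record key.
    static String cacheKey(long segmentId, String key) {
        return String.format("%08d|%s", segmentId, key);
    }

    // One wide range from (firstSegment, keyFrom) to (lastSegment, keyTo).
    // Every key of the intermediate segments falls inside this range.
    static List<String> wideRange(NavigableMap<String, String> cache,
                                  long firstSegment, long lastSegment,
                                  String keyFrom, String keyTo) {
        return new ArrayList<>(cache.subMap(
                cacheKey(firstSegment, keyFrom), true,
                cacheKey(lastSegment, keyTo), true).keySet());
    }

    // One narrow range per segment, concatenated in segment order.
    static List<String> perSegmentRange(NavigableMap<String, String> cache,
                                        long firstSegment, long lastSegment,
                                        String keyFrom, String keyTo) {
        List<String> visited = new ArrayList<>();
        for (long segment = firstSegment; segment <= lastSegment; segment++) {
            visited.addAll(cache.subMap(
                    cacheKey(segment, keyFrom), true,
                    cacheKey(segment, keyTo), true).keySet());
        }
        return visited;
    }

    // Three segments, each holding the record keys a, b, c and d.
    static NavigableMap<String, String> sampleCache() {
        NavigableMap<String, String> cache = new TreeMap<>();
        for (long segment = 0; segment < 3; segment++) {
            for (String key : new String[] {"a", "b", "c", "d"}) {
                cache.put(cacheKey(segment, key), "value");
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> cache = sampleCache();
        // Look up record key "b" across segments 0..2 with both strategies.
        System.out.println("wide range visits:  " + wideRange(cache, 0, 2, "b", "b").size());
        System.out.println("per-segment visits: " + perSegmentRange(cache, 0, 2, "b", "b").size());
    }
}
```

In this toy data the wide range walks over entries for other record keys in every segment it crosses, while the per-segment ranges touch only the entries for the requested key, which is the reduction in searched key space the PR description refers to.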
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782175#comment-16782175 ]

Guozhang Wang commented on KAFKA-7652:

Hi [~jonathanpdx], we are now voting on the 2.2.0 RC1. If it is accepted then 2.2.0 is final and this PR would not be included; if it is cancelled we will see if we can push it into 2.2.0.

On the other hand, the PR I gave you is a bit hacky as it is just to validate the root cause, and I'd like to do a thorough profiling and see if we should consider this as a general regression fix, not only for session stores but also for window stores. We will start the investigation right away, but in the worst case, if we cannot get the clean fix into 2.2.0, we will cut a 2.2.1 release immediately for this purpose as well.

In the meantime, I think it is safe for your application to turn off caching: in session-windowed aggregations, as long as your record timestamps are monotonically increasing and there's little out-of-order data, you will keep merging / expanding your sessions as you accept new data, which means that you'd not have too many overwrites on the store that can be de-duplicated. If you see the downstream traffic increase by a lot when caching is not used, please let me know, and we can look into that as well.

cc [~ableegoldman].
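
The workaround suggested above (turning caching off) is commonly done by setting `cache.max.bytes.buffering` to 0. A minimal sketch, assuming the application builds its settings as `java.util.Properties` with string keys; the class and method names are hypothetical, and the base values are the ones quoted in the issue description.

```java
import java.util.Properties;

// Hypothetical helper that derives a "no record cache" variant of existing settings.
final class DisableCachingSketch {

    // Returns a copy of the given settings with the record cache size set to zero,
    // leaving the original object untouched.
    static Properties withCachingDisabled(Properties base) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty("cache.max.bytes.buffering", "0");
        return copy;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("commit.interval.ms", "30000");
        base.setProperty("cache.max.bytes.buffering", "10485760");

        Properties noCache = withCachingDisabled(base);
        System.out.println("before: " + base.getProperty("cache.max.bytes.buffering"));
        System.out.println("after:  " + noCache.getProperty("cache.max.bytes.buffering"));
    }
}
```

This switches the cache off for the whole application. Where only the session store should be affected, `Materialized#withCachingDisabled()` on that store is the narrower option. Either way, every update is then forwarded downstream, which is why the comment asks about increased downstream traffic.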
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782139#comment-16782139 ]

Jonathan Gordon commented on KAFKA-7652:

That did it! This is really encouraging. Any chance it'll make it into 2.2.0?

[^2.3.0-7652-NamedCache.txt]
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781178#comment-16781178 ]

Guozhang Wang commented on KAFKA-7652:

Thanks [~jonathanpdx] The new profiling image is very helpful. Could you try out this PR: https://github.com/apache/kafka/pull/6349 on top of trunk (or 2.2, should work as well) and let me know if it helps?

Guozhang
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781177#comment-16781177 ]

ASF GitHub Bot commented on KAFKA-7652:

guozhangwang commented on pull request #6349: KAFKA-7652: [WIP] Peel off the segmenting layer on session store caching
URL: https://github.com/apache/kafka/pull/6349

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780115#comment-16780115 ]

Jonathan Gordon commented on KAFKA-7652:

{quote}1) when you profile on latest trunk did you see the same pattern as observed in [https://i.imgur.com/IHxC2cZ.png] as well as in the trace logging compared with 0.10.2.x?
{quote}
The image you linked is actually for 0.10.2.x, which is our current deployment. It shows us gated by RocksDB, but that's actually *faster* than what we saw in 0.11.0.0, the recent trunk, or the test I just ran against 2.2.0-rc0:

[https://i.imgur.com/L6PWIEF.png]
{quote}2) practically the lookups in the caching layer is very cheap and hence even increased a lot it should not contribute to much overhead, whereas the fetches on the underlying store would be much more expensive. Could you confirm if the performance bottleneck is from the underlying rocksDB, or from the caching layer access?
{quote}
For 2.2.0-rc0, we're spending the bulk of our time trying to retrieve records from the NamedCache. See:

[^0.10.2.1-NamedCache.txt]

[^2.2.0-rc0_b-NamedCache.txt]

While I agree it seems it should be more performant per retrieval, as you can see from the latest logs, it's the difference between 1,096,089 (2.2.0-rc0) and 19,245 (0.10.2.1) hits per second to the cache. The two orders of magnitude appear to outweigh whatever performance benefit we'd receive from the caching layer. This is just one of 8 tasks.

During their respective runs, the services consumed 8.4M messages (0.10.2.1) with no lag vs 637K messages (2.2.0-rc0) with considerable lag.

I'd be happy to run again with whatever custom logging or configuration you suggest to help further pinpoint the problem.
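
The figures quoted in the comment above can be turned into ratios with a few lines of arithmetic. A minimal sketch: the class and method names are hypothetical, and the message counts are the rounded values from the comment (8.4M and 637K), so the second ratio is approximate.

```java
// Hypothetical helper that compares the figures quoted in the comment.
final class CacheHitRatioSketch {

    // Plain ratio of two measurements.
    static double ratio(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Cache hits per second: 2.2.0-rc0 vs 0.10.2.1.
        double hitRatio = ratio(1_096_089, 19_245);
        // Base-10 logarithm expresses the same ratio in orders of magnitude.
        double ordersOfMagnitude = Math.log10(hitRatio);
        // Messages consumed per run: 0.10.2.1 vs 2.2.0-rc0 (rounded figures).
        double throughputRatio = ratio(8_400_000, 637_000);

        System.out.printf("cache hits/s ratio: %.1fx (10^%.2f)%n", hitRatio, ordersOfMagnitude);
        System.out.printf("messages consumed ratio: %.1fx%n", throughputRatio);
    }
}
```

On these numbers the newer build performs roughly 57 times as many cache lookups per second (a little under two orders of magnitude) while consuming roughly 13 times fewer messages.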
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778224#comment-16778224 ]

Guozhang Wang commented on KAFKA-7652:

Oh that's bad news..

1) when you profile on latest trunk did you see the same pattern as observed in https://i.imgur.com/IHxC2cZ.png as well as in the trace logging compared with 0.10.2.x?

2) practically the lookups in the caching layer is very cheap and hence even increased a lot it should not contribute to much overhead, whereas the fetches on the underlying store would be much more expensive. Could you confirm if the performance bottleneck is from the underlying rocksDB, or from the caching layer access?
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777630#comment-16777630 ]

Jonathan Gordon commented on KAFKA-7652:

I tested out with trunk on Feb 22 (commit 0d461e4ea0a8353c358ae661837f471995943bb0) and we're still seeing the same performance issue. Aside from logging the output of the NamedCache stats, is there data I can provide to help further narrow down the issue? Any other ideas?
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766843#comment-16766843 ]

ASF GitHub Bot commented on KAFKA-7652:

guozhangwang commented on pull request #6191: KAFKA-7652: Part III; Put to underlying before Flush
URL: https://github.com/apache/kafka/pull/6191
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756771#comment-16756771 ]

ASF GitHub Bot commented on KAFKA-7652:
---------------------------------------

guozhangwang commented on pull request #6161: KAFKA-7652: Part II; Add single-point query for SessionStore and use for flushing / getter
URL: https://github.com/apache/kafka/pull/6161
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750785#comment-16750785 ]

ASF GitHub Bot commented on KAFKA-7652:
---------------------------------------

guozhangwang commented on pull request #6191: KAFKA-7652: Part III; Put to underlying before Flush
URL: https://github.com/apache/kafka/pull/6191

This is on top of the Part II PR and hence should only be reviewed once the Part II PR is merged.

1) In the caching layer's flush listener call, we should always write to the underlying store before flushing (see point 4 of https://github.com/apache/kafka/pull/4331 for a detailed explanation). The fix in 4331 only touched KV stores, but it turns out that we should fix window and session stores as well.

2) Also apply the optimization that was already in the session store: when the new value bytes and old value bytes are both null (this is possible e.g. if there is a put(K, V) followed by a remove(K) or put(K, null), and these two operations only hit the cache), upon flushing this means the underlying store does not have this value at all and no intermediate value has been sent downstream either. In this case we can skip both putting a null to the underlying store and calling the flush listener to send `null -> null`.

Modifies corresponding unit tests.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
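The two rules in the PR #6191 description above (write to the underlying store before notifying the flush listener, and skip the `null -> null` case entirely) can be illustrated with a minimal sketch. This is not Kafka's caching-store code: the class `FlushOrderSketch`, its fields, and the `onFlush` method are hypothetical names chosen for illustration, and a plain `HashMap` stands in for the underlying store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a caching layer's flush path; not Kafka's actual classes.
class FlushOrderSketch {
    // Stand-in for the underlying (persistent) store.
    final Map<String, String> underlying = new HashMap<>();
    // Records the "old -> new" changes forwarded downstream, in order.
    final List<String> forwarded = new ArrayList<>();

    // Invoked when a dirty cache entry is flushed; newValue == null means delete.
    void onFlush(String key, String newValue) {
        // Read the old value before it is overwritten.
        String oldValue = underlying.get(key);

        // Both null: the store never held the key and nothing was sent
        // downstream, so there is nothing to write and nothing to forward.
        if (newValue == null && oldValue == null) {
            return;
        }

        // Step 1: write to the underlying store first.
        if (newValue == null) {
            underlying.remove(key);
        } else {
            underlying.put(key, newValue);
        }

        // Step 2: only then notify the flush listener (modelled as a list here).
        forwarded.add(key + ": " + oldValue + " -> " + newValue);
    }
}
```

With this ordering, anything that reads the underlying store while handling the forwarded change already sees the new value.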
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746627#comment-16746627 ]

ASF GitHub Bot commented on KAFKA-7652:
---------------------------------------

guozhangwang commented on pull request #6134: KAFKA-7652: Part I; Fix SessionStore's findSession(single-key)
URL: https://github.com/apache/kafka/pull/6134
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744760#comment-16744760 ]

ASF GitHub Bot commented on KAFKA-7652:
---------------------------------------

guozhangwang commented on pull request #6161: KAFKA-7652: Part II; Add single-point query for SessionStore and use for flushing / getter
URL: https://github.com/apache/kafka/pull/6161

https://github.com/apache/kafka/pull/2972 tried to fix a bug in the flushing operation, but the fix was not complete, since `findSessions(key, earliestEnd, latestStart)` is not guaranteed to return only a single entry: its semantics are to return any sessions whose end > earliestEnd and whose start < latestStart.

I've tried various ways to fix it completely and ended up having to add a single-point query to the public ReadOnlySessionStore API for the exact semantics needed. It is used for flushing to read the old values (otherwise the wrong old values will be sent downstream, hence it is a correctness issue) and also for getting the value for value-getters (this is for performance only).

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
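The difference between the range query and the single-point query described in PR #6161 above can be shown with a small in-memory sketch. It does not use Kafka's store classes: `SessionLookupSketch`, its `Session` type, and the method bodies are hypothetical, and the range bounds are treated as inclusive here, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; not Kafka's SessionStore implementation.
class SessionLookupSketch {
    static final class Session {
        final String key;
        final long start;
        final long end;
        final String value;

        Session(String key, long start, long end, String value) {
            this.key = key;
            this.start = start;
            this.end = end;
            this.value = value;
        }
    }

    private final List<Session> sessions = new ArrayList<>();

    void put(String key, long start, long end, String value) {
        sessions.add(new Session(key, start, end, value));
    }

    // Range semantics: every session of the key that ends at or after
    // earliestEnd and starts at or before latestStart (inclusive by assumption).
    List<Session> findSessions(String key, long earliestEnd, long latestStart) {
        List<Session> result = new ArrayList<>();
        for (Session s : sessions) {
            if (s.key.equals(key) && s.end >= earliestEnd && s.start <= latestStart) {
                result.add(s);
            }
        }
        return result;
    }

    // Single-point semantics: only the session with exactly this key, start
    // and end; returns its value, or null if there is no such session.
    String fetchSession(String key, long start, long end) {
        for (Session s : sessions) {
            if (s.key.equals(key) && s.start == start && s.end == end) {
                return s.value;
            }
        }
        return null;
    }
}
```

If the store holds sessions [0, 30] and [5, 20] for the same key, `findSessions(key, 20, 5)` matches both, so a caller that wants the old value of [5, 20] cannot rely on it; the exact-match lookup returns only that one session.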
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740923#comment-16740923 ]

Guozhang Wang commented on KAFKA-7652:
--------------------------------------

[~jonathanpdx] I've finally been able to reproduce the issue you discovered, and I've put up a fix here: https://github.com/apache/kafka/pull/6134

This is aimed at trunk (2.2.0), but once it is confirmed to fix the issue I will cherry-pick the fix to older branches as well. Sorry for the long wait!
[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
[ https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740913#comment-16740913 ]

ASF GitHub Bot commented on KAFKA-7652:
---------------------------------------

guozhangwang commented on pull request #6134: KAFKA-7652: Fix SessionStore's findSession(single-key)
URL: https://github.com/apache/kafka/pull/6134

1. Let `findSessions(final K key)` call the underlying bytes store directly, using the more restricted range.
2. Fix the conservative upper range for multi-key range in session schema.
3. Minor: removed unnecessary private WrappedSessionStoreBytesIterator class as it is only used in unit tests.
4. Minor: removed unnecessary schema#init function by using the direct bytes-to-binary function.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
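Point 1 of PR #6134 above mentions using a "more restricted range" for a single-key lookup. As a rough illustration of why a tighter range matters in a sorted store, the sketch below compares a loose scan with one bounded to exactly one key. Everything here is hypothetical: `SingleKeyRangeSketch`, `SessionKey`, and both scan methods are invented for illustration, a `TreeMap` stands in for the bytes store, and the loose scan is a deliberately simple stand-in rather than the range Kafka actually computed.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sorted store keyed by (record key, session start); not Kafka code.
class SingleKeyRangeSketch {
    static final class SessionKey {
        final String key;
        final long start;

        SessionKey(String key, long start) {
            this.key = key;
            this.start = start;
        }
    }

    // Order by record key first, then by session start time.
    static final Comparator<SessionKey> ORDER =
        Comparator.comparing((SessionKey k) -> k.key).thenComparingLong(k -> k.start);

    final NavigableMap<SessionKey, String> store = new TreeMap<>(ORDER);

    void put(String key, long start, String value) {
        store.put(new SessionKey(key, start), value);
    }

    // Loose range: everything from the first entry of the key to the end of
    // the store, so entries of later keys are visited as well.
    int looseScanSize(String key) {
        return store.tailMap(new SessionKey(key, Long.MIN_VALUE), true).size();
    }

    // Restricted range: bounded on both sides, so only entries of this key
    // are visited.
    int restrictedScanSize(String key) {
        return store.subMap(
            new SessionKey(key, Long.MIN_VALUE), true,
            new SessionKey(key, Long.MAX_VALUE), true).size();
    }
}
```

The fewer entries a range covers, the fewer candidates a lookup has to iterate over and filter, which is the general effect a tighter single-key range is meant to achieve.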