[GitHub] [kafka] ableegoldman commented on a diff in pull request #13993: KAFKA-15178: Improve ConsumerCoordinator.poll perf

2023-07-14 Thread via GitHub


ableegoldman commented on code in PR #13993:
URL: https://github.com/apache/kafka/pull/13993#discussion_r1264175800


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:
##
@@ -119,6 +119,7 @@ public final class ConsumerCoordinator extends 
AbstractCoordinator {
 private Set joinedSubscription;
 private MetadataSnapshot metadataSnapshot;
 private MetadataSnapshot assignmentSnapshot;
+private boolean metadataUpdated;

Review Comment:
   I think I would personally need to see benchmarks (somehow) to be convinced 
that just changing it to a `LinkedHashSet` is sufficient as an optimization, or 
even just guaranteed not to be a regression. Speaking from experience, I once 
tried to optimize the cache in Streams for better range-scan performance which 
involved swapping out the underlying data structure for a `LinkedHashMap`. The 
consequences were pretty dire, with the `LinkedHashMap` having comparably 
terrible performance characteristics when scaling up the keyspace. Granted it's 
not the exact same scenario because that was a map and also the Concurrent 
variant which surely means the scaling characteristics are not going to be 
exactly the same. But it has made me incredibly suspicious of the 
`LinkedHashXXX` data structures for use cases like this.
   
   All that said, if this is literally only used for an equality check and 
nothing else, why not just hash it or something like that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] ableegoldman commented on a diff in pull request #13993: KAFKA-15178: Improve ConsumerCoordinator.poll perf

2023-07-11 Thread via GitHub


ableegoldman commented on code in PR #13993:
URL: https://github.com/apache/kafka/pull/13993#discussion_r1260200818


##
clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:
##
@@ -119,6 +119,7 @@ public final class ConsumerCoordinator extends 
AbstractCoordinator {
 private Set joinedSubscription;
 private MetadataSnapshot metadataSnapshot;
 private MetadataSnapshot assignmentSnapshot;
+private boolean metadataUpdated;

Review Comment:
   This optimization makes perfect sense but I am extremely paranoid that 
future changes will end up modifying the snapshot without remembering (or even 
knowing about) the need to update this boolean as well. Can you maybe 
encapsulate this in the `MetadataSnapshot` class or otherwise make sure it's 
impossible for this to get out of sync?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org