zpinto opened a new pull request, #2657:
URL: https://github.com/apache/helix/pull/2657
Add _deserializedPartitionCapacityMap to the ResourceConfig model to ensure that
the partition capacity map is deserialized only once. This will lead to
major performance gains, as the deserialization was previously repeated in
several parts of BestPossibleStateCalculation and CurrentStateComputation.
### Description
During profiling we realized that the majority (up to 90%) of the time in
CurrentStateComputationStage is spent in `_objectMapper.readValue`.
After investigating further, this problem is not only seen in
CurrentStateComputationStage. By only deserializing on the first call to
`getPartitionCapacityMap` we can **tremendously** improve the pipeline
performance in more than one place.
Where is this called:
- CurrentStateComputationStage
  - For every resource when reporting resource capacity metrics
  - For every partition of a WAGED resource, for
    checkAndReduceInstanceCapacity based on currentStateMap
  - For every pending STATE_TRANSITION message for a WAGED replica, for
    checkAndReduceInstanceCapacity (n + 1 respect capacity solution)
- BestPossibleStateCalcStage
  - For checkAndReduceCapacity of every instanceToAdd for all partitions of
    WAGED resources (n + 1 respect capacity solution)
  - For the creation of every single AssignableReplica for all
    replicas of WAGED resources
Basically, every time getPartitionCapacityWeightMap is called (and the number
of calls grows without bound as the cluster grows), it deserializes the
PARTITION_CAPACITY_MAP mapField in the ResourceConfig all over again. This
change makes sure that deserialization happens only once per resource, which
saves a lot of time.
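
The deserialize-once idea can be sketched roughly as below. This is an illustration only, not the code in this PR: `LazyCapacityConfig`, its `key=weight` string format, and `getDeserializeCount` are hypothetical names introduced here so the example runs with the JDK alone, whereas the real field holds JSON that is parsed with `_objectMapper.readValue`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ResourceConfig; not the actual Helix class. */
final class LazyCapacityConfig {
  // Raw serialized form: partition name -> "KEY=weight,KEY=weight".
  // ASSUMPTION: toy format chosen for this sketch; the real mapField holds JSON.
  private final Map<String, String> rawPartitionCapacity;

  // Cached result of deserialization; stays null until the first getter call.
  private Map<String, Map<String, Integer>> deserializedPartitionCapacityMap;

  // Counts how often the expensive parsing step actually ran.
  private int deserializeCount = 0;

  LazyCapacityConfig(Map<String, String> rawPartitionCapacity) {
    this.rawPartitionCapacity = new HashMap<>(rawPartitionCapacity);
  }

  /** Parses the raw field on the first call and returns the cached map afterwards. */
  synchronized Map<String, Map<String, Integer>> getPartitionCapacityMap() {
    if (deserializedPartitionCapacityMap == null) {
      deserializedPartitionCapacityMap = deserialize();
    }
    return deserializedPartitionCapacityMap;
  }

  synchronized int getDeserializeCount() {
    return deserializeCount;
  }

  // The expensive step that previously ran on every getter call.
  private Map<String, Map<String, Integer>> deserialize() {
    deserializeCount++;
    Map<String, Map<String, Integer>> result = new HashMap<>();
    for (Map.Entry<String, String> entry : rawPartitionCapacity.entrySet()) {
      Map<String, Integer> weights = new HashMap<>();
      for (String pair : entry.getValue().split(",")) {
        String[] keyAndWeight = pair.split("=");
        weights.put(keyAndWeight[0].trim(), Integer.parseInt(keyAndWeight[1].trim()));
      }
      result.put(entry.getKey(), Collections.unmodifiableMap(weights));
    }
    return Collections.unmodifiableMap(result);
  }
}
```

With this shape, calling `getPartitionCapacityMap()` once per partition, replica, or pending message costs a single parse per config object. A cache like this is only safe if the raw field cannot change after construction, or if the cached map is reset whenever it does; the sketch assumes the former.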
### Tests
No behavior changes, so will rely on current tests and CI to validate.
### Changes that Break Backward Compatibility (Optional)
NA
### Commits
- My commits all reference appropriate Apache Helix GitHub issues in their
  subject lines. In addition, my commits follow the guidelines from "[How to
  write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"
### Code Quality
- My diff has been formatted using helix-style.xml
(helix-style-intellij.xml if IntelliJ IDE is used)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]