zpinto opened a new pull request, #2657:
URL: https://github.com/apache/helix/pull/2657

   Add _deserializedPartitionCapacityMap to the ResourceConfig model so that 
the partition capacity map is deserialized only once. This should yield 
major performance gains, because the map was previously deserialized in 
several parts of BestPossibleStateCalculation and CurrentStateComputation.
   
   ### Description
   
   During profiling we found that the majority (up to 90%) of the time spent 
in CurrentStateComputationStage goes to `_objectMapper.readValue`.
   
   After investigating further, we found that this problem is not limited to 
CurrentStateComputationStage. By deserializing only on the first call to 
`getPartitionCapacityMap`, we can **tremendously** improve pipeline 
performance in more than one place.
   
   Where this is called:
   - CurrentStateComputationStage
     - For every resource when reporting resource capacity metrics
     - For every partition of a WAGED resource when calling 
checkAndReduceInstanceCapacity based on currentStateMap
     - For every pending STATE_TRANSITION message for a WAGED replica when 
calling checkAndReduceInstanceCapacity (n + 1 respect capacity solution)
   - BestPossibleStateCalcStage
     - For checkAndReduceCapacity of every instanceToAdd for all partitions of 
WAGED resources (n + 1 respect capacity solution)
     - For the creation of every single AssignableReplica for all 
replicas of WAGED resources
   
   Basically, every time getPartitionCapacityWeightMap is called (an unbounded 
number of times as the cluster grows), it deserializes the 
PARTITION_CAPACITY_MAP mapField in the ResourceConfig all over again. This 
change makes sure that deserialization happens only once per resource, which 
will save a lot of time.
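
The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CachedCapacityConfig` is a hypothetical stand-in for ResourceConfig, the injected `parser` function stands in for `_objectMapper.readValue`, and `parseSimple` uses a made-up `KEY=VALUE,KEY=VALUE` format instead of the real serialized form so that the example needs only the JDK.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for ResourceConfig; not the actual Helix class.
class CachedCapacityConfig {
  // Serialized capacity strings keyed by partition name
  // (stand-in for the PARTITION_CAPACITY_MAP mapField).
  private final Map<String, String> rawPartitionCapacityMap;

  // Assumed parser; plays the role of _objectMapper.readValue in this sketch.
  private final Function<String, Map<String, Integer>> parser;

  // Cached result; stays null until the first call to the getter.
  private Map<String, Map<String, Integer>> deserializedPartitionCapacityMap;

  CachedCapacityConfig(Map<String, String> raw,
      Function<String, Map<String, Integer>> parser) {
    this.rawPartitionCapacityMap = new HashMap<>(raw);
    this.parser = parser;
  }

  // Deserializes on the first call only; later calls return the cached map.
  synchronized Map<String, Map<String, Integer>> getPartitionCapacityMap() {
    if (deserializedPartitionCapacityMap == null) {
      Map<String, Map<String, Integer>> parsed = new HashMap<>();
      for (Map.Entry<String, String> entry : rawPartitionCapacityMap.entrySet()) {
        parsed.put(entry.getKey(),
            Collections.unmodifiableMap(parser.apply(entry.getValue())));
      }
      deserializedPartitionCapacityMap = Collections.unmodifiableMap(parsed);
    }
    return deserializedPartitionCapacityMap;
  }

  // Toy parser for the made-up "CPU=10,MEM=20" format used in this sketch.
  static Map<String, Integer> parseSimple(String serialized) {
    Map<String, Integer> weights = new HashMap<>();
    if (serialized == null || serialized.trim().isEmpty()) {
      return weights;
    }
    for (String pair : serialized.split(",")) {
      String[] kv = pair.split("=", 2);
      weights.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
    }
    return weights;
  }
}
```

With this shape, a caller that asks for the map once per partition or per replica pays the parsing cost only on the first request. A real implementation would also need to reset the cached map whenever the underlying raw field is modified; that part is omitted here.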
   
   ### Tests
   
   No behavior changes, so this relies on existing tests and CI for validation.
   
   ### Changes that Break Backward Compatibility (Optional)
   
   NA
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their 
subject lines. In addition, my commits follow the guidelines from "[How to 
write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

