GitHub user esevastyanov opened a pull request:

    https://github.com/apache/kafka/pull/4350

    Cached hashCode of a Node instance since it is immutable

    `Node` is immutable, so the `hashCode` of a `Node` instance can be 
cached, as is already done in the `TopicPartition` class.
    We ran into performance degradation under high load with a large 
number of brokers (100), topics (150) and partitions (350). Several 
recordings made with Java Flight Recorder showed that the method 
`HashSet::contains` in 
[`RecordAccumulator::ready`](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L423)
 takes about 40% of the application's total time. The cause is that the 
hash code of the 
[leader](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L433)
 (a `Node` instance) is recalculated for every [batch 
entry](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L429).
 With the hash code cached, the share of time spent in 
`HashSet::contains` in `RecordAccumulator::ready` dropped to ~2%.
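
    For illustration only, the caching idea can be sketched roughly as 
follows. This is not the actual patch: `SimpleNode` and its fields are 
hypothetical stand-ins for an immutable `Node`-like value class, and the 
"0 means not yet computed" convention is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified stand-in for an immutable Node-like class.
final class SimpleNode {
    private final int id;
    private final String host;
    private final int port;
    private final String rack; // may be null

    // Cached hash code. 0 means "not computed yet" (assumption of this
    // sketch); a real hash of 0 is simply recomputed on each call.
    private int hash;

    SimpleNode(int id, String host, int port, String rack) {
        this.id = id;
        this.host = host;
        this.port = port;
        this.rack = rack;
    }

    @Override
    public int hashCode() {
        // Read the field once into a local. Because all inputs are final,
        // concurrent callers can only ever store the same value.
        int h = hash;
        if (h == 0) {
            h = Objects.hash(id, host, port, rack);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof SimpleNode)) return false;
        SimpleNode other = (SimpleNode) obj;
        return id == other.id
                && port == other.port
                && Objects.equals(host, other.host)
                && Objects.equals(rack, other.rack);
    }

    // Helper used to illustrate the hot path: many set lookups on one node.
    static boolean isReady(Set<SimpleNode> readyNodes, SimpleNode leader) {
        return readyNodes.contains(leader);
    }
}
```

    After the first call to `hashCode`, repeated `HashSet::contains` 
lookups on the same instance reuse the stored value instead of hashing the 
host string and the other fields again.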


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/esevastyanov/kafka node-hash

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/4350.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4350
    
----
commit 736eb4c1bfed561cdd13cfc384ef1c6838faed59
Author: esevastyanov <eugene.sevastyanov@...>
Date:   2017-12-21T15:19:08Z

    Cached hashCode of a Node instance since it is immutable.

----

