[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions
[ https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963190#comment-15963190 ] Denis Magda commented on IGNITE-4828: - [~yzhdanov], will you merge the final changes to the master, especially considering that the fair affinity function will be removed in 2.0? I didn't get whether the merge happened or not. > Improve the distribution of keys within partitions > -- > > Key: IGNITE-4828 > URL: https://issues.apache.org/jira/browse/IGNITE-4828 > Project: Ignite > Issue Type: Improvement >Affects Versions: 1.9 >Reporter: Michael Griggs >Assignee: Michael Griggs > Labels: important > Fix For: 2.0 > > > An issue has been found when inserting several million string keys in to a > cache. Each string key was approximately 22-characters in length. When I > exported the partition counts (via GG Visor) I was able to see an unusual > periodicity in the number of keys allocated to partitions. I charted this in > Excel (1). > After further investigation, it appears that there is a relationship > between the number of keys being inserted, the number of partitions > assigned to the cache and amount of apparent periodicity: a small number of > partitions will cause periodicity to appear with a lower number of keys. > The {{RendezvousAffinityFunction#partition}} function performs a simple > calculation of key hashcode modulo partition-count: > {{U.safeAbs(key.hashCode() % parts)}} > Digging further I was led to the fact that this is how the Java HashMap > *used* to behave (2), but was upgraded around Java 1.4 to perform the > following: > {{key.hashCode() & (parts - 1)}} > which performs more efficiently. It was then updated further to do the > following: > {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}} > with the bit-shift performed to > bq. incorporate impact of the highest bits that would otherwise never be used > in index calculations because of table bounds > When using this function, rather than our > {{RendezvousAffinityFunction#partition}} implementation, I also saw a > significant decrease in the periodicity and a better distribution of keys > amongst partitions (3). > (1): https://i.imgur.com/0FtCZ2A.png > (2): > https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings > (3): https://i.imgur.com/8ZuCSA3.png -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions
[ https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962772#comment-15962772 ] Yakov Zhdanov commented on IGNITE-4828: --- >> 3. Can we mathematically guarantee that no key.hashCode(), when XOR with >> h>>>16 will return a negative value? No, but return i & (parts - 1) is non-negative for any i and power-of-two parts. Mike, [~dmagda], I pushed the changes to ignite-4828-reviewed since it is not 100% what to do with fair affinity. So, I would suggest we merge these changes and then react accordingly. > Improve the distribution of keys within partitions > -- > > Key: IGNITE-4828 > URL: https://issues.apache.org/jira/browse/IGNITE-4828 > Project: Ignite > Issue Type: Improvement >Affects Versions: 1.9 >Reporter: Michael Griggs >Assignee: Michael Griggs > Labels: important > Fix For: 2.0 > > > An issue has been found when inserting several million string keys in to a > cache. Each string key was approximately 22-characters in length. When I > exported the partition counts (via GG Visor) I was able to see an unusual > periodicity in the number of keys allocated to partitions. I charted this in > Excel (1). > After further investigation, it appears that there is a relationship > between the number of keys being inserted, the number of partitions > assigned to the cache and amount of apparent periodicity: a small number of > partitions will cause periodicity to appear with a lower number of keys. > The {{RendezvousAffinityFunction#partition}} function performs a simple > calculation of key hashcode modulo partition-count: > {{U.safeAbs(key.hashCode() % parts)}} > Digging further I was led to the fact that this is how the Java HashMap > *used* to behave (2), but was upgraded around Java 1.4 to perform the > following: > {{key.hashCode() & (parts - 1)}} > which performs more efficiently. It was then updated further to do the > following: > {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}} > with the bit-shift performed to > bq. incorporate impact of the highest bits that would otherwise never be used > in index calculations because of table bounds > When using this function, rather than our > {{RendezvousAffinityFunction#partition}} implementation, I also saw a > significant decrease in the periodicity and a better distribution of keys > amongst partitions (3). > (1): https://i.imgur.com/0FtCZ2A.png > (2): > https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings > (3): https://i.imgur.com/8ZuCSA3.png -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions
[ https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961127#comment-15961127 ] Michael Griggs commented on IGNITE-4828: [~yzhdanov], thank you for your helpful input. My answers are below: 1. Good point, I will look in to that. Can we guarantee that such an adapter implementation would be early-bound, and not give us any runtime performance impact? 2. Using an enum combined with a switch statement is the highest performing implementation, as Java will (or should) emit a jump table in bytecode if it sees a switch with constants. It will not do that for an if-else. Given that the branch taken will likely be different for every cache, I don't know how well the CPU branch-predictor will cope; if it cannot cope, then the CPU pipeline will be flushed for every misprediction. However, to be completely transparent, this is absolutely a micro-optimisation: in my testing of a 3-branch if-else versus a 3 case switch, the difference was of the order of nanoseconds per operation. 3. Can we mathematically guarantee that no key.hashCode(), when XOR with h>>>16 will return a negative value? > Improve the distribution of keys within partitions > -- > > Key: IGNITE-4828 > URL: https://issues.apache.org/jira/browse/IGNITE-4828 > Project: Ignite > Issue Type: Improvement >Affects Versions: 1.9 >Reporter: Michael Griggs >Assignee: Michael Griggs > Labels: important > Fix For: 2.0 > > > An issue has been found when inserting several million string keys in to a > cache. Each string key was approximately 22-characters in length. When I > exported the partition counts (via GG Visor) I was able to see an unusual > periodicity in the number of keys allocated to partitions. I charted this in > Excel (1). > After further investigation, it appears that there is a relationship > between the number of keys being inserted, the number of partitions > assigned to the cache and amount of apparent periodicity: a small number of > partitions will cause periodicity to appear with a lower number of keys. > The {{RendezvousAffinityFunction#partition}} function performs a simple > calculation of key hashcode modulo partition-count: > {{U.safeAbs(key.hashCode() % parts)}} > Digging further I was led to the fact that this is how the Java HashMap > *used* to behave (2), but was upgraded around Java 1.4 to perform the > following: > {{key.hashCode() & (parts - 1)}} > which performs more efficiently. It was then updated further to do the > following: > {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}} > with the bit-shift performed to > bq. incorporate impact of the highest bits that would otherwise never be used > in index calculations because of table bounds > When using this function, rather than our > {{RendezvousAffinityFunction#partition}} implementation, I also saw a > significant decrease in the periodicity and a better distribution of keys > amongst partitions (3). > (1): https://i.imgur.com/0FtCZ2A.png > (2): > https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings > (3): https://i.imgur.com/8ZuCSA3.png -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions
[ https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959371#comment-15959371 ] Yakov Zhdanov commented on IGNITE-4828: --- Michael, Here are my comments # We have several more affinity function implementations and your change would also be helpful for them. I would create AffinityFunctionAdapter and implement {{setPartition(int);}} and {{partition(Object);}} there. Then, I would change all implementations under ignite-core/src/main to extend it. # Enum with hash type is questionable. I think it will be better to initialize {{private int hashMask = powerOfTwo ? parts - 1 : 0;}} and then check it with {{if (hashMask != 0) }}. # I don't think you need {{safeAbs()}} here {{int i = U.safeAbs((h = key.hashCode()) ^ (h >>> 16)); return i & (parts - 1) }}. I think we will always set sign bit to 0 with this. Do points above make sense? > Improve the distribution of keys within partitions > -- > > Key: IGNITE-4828 > URL: https://issues.apache.org/jira/browse/IGNITE-4828 > Project: Ignite > Issue Type: Improvement >Affects Versions: 1.9 >Reporter: Michael Griggs >Assignee: Michael Griggs > Labels: important > Fix For: 2.0 > > > An issue has been found when inserting several million string keys in to a > cache. Each string key was approximately 22-characters in length. When I > exported the partition counts (via GG Visor) I was able to see an unusual > periodicity in the number of keys allocated to partitions. I charted this in > Excel (1). > After further investigation, it appears that there is a relationship > between the number of keys being inserted, the number of partitions > assigned to the cache and amount of apparent periodicity: a small number of > partitions will cause periodicity to appear with a lower number of keys. > The {{RendezvousAffinityFunction#partition}} function performs a simple > calculation of key hashcode modulo partition-count: > {{U.safeAbs(key.hashCode() % parts)}} > Digging further I was led to the fact that this is how the Java HashMap > *used* to behave (2), but was upgraded around Java 1.4 to perform the > following: > {{key.hashCode() & (parts - 1)}} > which performs more efficiently. It was then updated further to do the > following: > {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}} > with the bit-shift performed to > bq. incorporate impact of the highest bits that would otherwise never be used > in index calculations because of table bounds > When using this function, rather than our > {{RendezvousAffinityFunction#partition}} implementation, I also saw a > significant decrease in the periodicity and a better distribution of keys > amongst partitions (3). > (1): https://i.imgur.com/0FtCZ2A.png > (2): > https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings > (3): https://i.imgur.com/8ZuCSA3.png -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions
[ https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926456#comment-15926456 ] Michael Griggs commented on IGNITE-4828: Performance testing: After 50 iterations, for 10,000,000 calculations per iteration || Hash Type || Min || Max || Mean || | Old | 166 | 216 | 189 | | New | 162 | 198 | 180 | > Improve the distribution of keys within partitions > -- > > Key: IGNITE-4828 > URL: https://issues.apache.org/jira/browse/IGNITE-4828 > Project: Ignite > Issue Type: Improvement >Affects Versions: 1.9 >Reporter: Michael Griggs > Fix For: 2.0 > > > An issue has been found when inserting several million string keys in to a > cache. Each string key was approximately 22-characters in length. When I > exported the partition counts (via GG Visor) I was able to see an unusual > periodicity in the number of keys allocated to partitions. I charted this in > Excel (1). > After further investigation, it appears that there is a relationship > between the number of keys being inserted, the number of partitions > assigned to the cache and amount of apparent periodicity: a small number > ofpartitions will cause periodicity to appear with a lower number of keys. > The {{RendezvousAffinityFunction#partition}} function performs a simple > calculation of key hashcode modulo partition-count: > {{U.safeAbs(key.hashCode() % parts)}} > Digging further I was led to the fact that this is how the Java HashMap > *used* to behave (2), but was upgraded around Java 1.4 to perform the > following: > {{key.hashCode() & (parts - 1)}} > which performs more efficiently. It was then updated further to do the > following: > {{(h = key.hashCode()) ^ (h >>> 16);}} > with the bit-shift performed to > bq. incorporate impact of the highest bits that would otherwise never be used > in index calculations because of table bounds > When using this function, rather than our > {{RendezvousAffinityFunction#partition}} implementation, I also saw a > significant decrease in the periodicity and a better distribution of keys > amongst partitions (3). > (1): https://i.imgur.com/0FtCZ2A.png > (2): > https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings > (3): https://i.imgur.com/8ZuCSA3.png -- This message was sent by Atlassian JIRA (v6.3.15#6346)