[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions

2017-04-10 Thread Denis Magda (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963190#comment-15963190
 ] 

Denis Magda commented on IGNITE-4828:
-

[~yzhdanov], will you merge the final changes to the master, especially 
considering that the fair affinity function will be removed in 2.0? I didn't 
get whether the merge happened or not.

> Improve the distribution of keys within partitions
> --
>
> Key: IGNITE-4828
> URL: https://issues.apache.org/jira/browse/IGNITE-4828
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 1.9
>Reporter: Michael Griggs
>Assignee: Michael Griggs
>  Labels: important
> Fix For: 2.0
>
>
> An issue has been found when inserting several million string keys in to a 
> cache.  Each string key was approximately 22-characters in length.  When I 
> exported the partition counts (via GG Visor) I was able to see an unusual 
> periodicity in the number of keys allocated to partitions.  I charted this in 
> Excel (1).
> After further investigation, it appears that there is a relationship
> between the number of keys being inserted, the number of partitions
> assigned to the cache and amount of apparent periodicity: a small number of 
> partitions will cause periodicity to appear with a lower number of keys.
> The {{RendezvousAffinityFunction#partition}} function performs a simple
> calculation of key hashcode modulo partition-count:
> {{U.safeAbs(key.hashCode() % parts)}}
> Digging further I was led to the fact that this is how the Java HashMap
> *used* to behave (2), but was upgraded around Java 1.4 to perform the
> following:
> {{key.hashCode() & (parts - 1)}}
> which performs more efficiently.  It was then updated further to do the
> following:
> {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}}
> with the bit-shift performed to
> bq. incorporate impact of the highest bits that would otherwise never be used 
> in index calculations because of table bounds
> When using this function, rather than our
> {{RendezvousAffinityFunction#partition}} implementation, I also saw a
> significant decrease in the periodicity and a better distribution of keys
> amongst partitions (3). 
> (1):  https://i.imgur.com/0FtCZ2A.png
> (2):  
> https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
> (3):  https://i.imgur.com/8ZuCSA3.png



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions

2017-04-10 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962772#comment-15962772
 ] 

Yakov Zhdanov commented on IGNITE-4828:
---

>> 3. Can we mathematically guarantee that no key.hashCode(), when XOR with 
>> h>>>16 will return a negative value?
No, but return i & (parts - 1) is non-negative for any i and power-of-two parts.

Mike, [~dmagda], I pushed the changes to ignite-4828-reviewed since it is not 
100% what to do with fair affinity. So, I would suggest we merge these changes 
and then react accordingly.

> Improve the distribution of keys within partitions
> --
>
> Key: IGNITE-4828
> URL: https://issues.apache.org/jira/browse/IGNITE-4828
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 1.9
>Reporter: Michael Griggs
>Assignee: Michael Griggs
>  Labels: important
> Fix For: 2.0
>
>
> An issue has been found when inserting several million string keys in to a 
> cache.  Each string key was approximately 22-characters in length.  When I 
> exported the partition counts (via GG Visor) I was able to see an unusual 
> periodicity in the number of keys allocated to partitions.  I charted this in 
> Excel (1).
> After further investigation, it appears that there is a relationship
> between the number of keys being inserted, the number of partitions
> assigned to the cache and amount of apparent periodicity: a small number of 
> partitions will cause periodicity to appear with a lower number of keys.
> The {{RendezvousAffinityFunction#partition}} function performs a simple
> calculation of key hashcode modulo partition-count:
> {{U.safeAbs(key.hashCode() % parts)}}
> Digging further I was led to the fact that this is how the Java HashMap
> *used* to behave (2), but was upgraded around Java 1.4 to perform the
> following:
> {{key.hashCode() & (parts - 1)}}
> which performs more efficiently.  It was then updated further to do the
> following:
> {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}}
> with the bit-shift performed to
> bq. incorporate impact of the highest bits that would otherwise never be used 
> in index calculations because of table bounds
> When using this function, rather than our
> {{RendezvousAffinityFunction#partition}} implementation, I also saw a
> significant decrease in the periodicity and a better distribution of keys
> amongst partitions (3). 
> (1):  https://i.imgur.com/0FtCZ2A.png
> (2):  
> https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
> (3):  https://i.imgur.com/8ZuCSA3.png



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions

2017-04-07 Thread Michael Griggs (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961127#comment-15961127
 ] 

Michael Griggs commented on IGNITE-4828:


[~yzhdanov], thank you for your helpful input.  My answers are below:

1. Good point, I will look in to that.  Can we guarantee that such an adapter 
implementation would be early-bound, and not give us any runtime performance 
impact?

2. Using an enum combined with a switch statement is the highest performing 
implementation, as Java will (or should) emit a jump table in bytecode if it 
sees a switch with constants.  It will not do that for an if-else.  Given that 
the branch taken will likely be different for every cache, I don't know how 
well the CPU branch-predictor will cope; if it cannot cope, then the CPU 
pipeline will be flushed for every misprediction.  

However, to be completely transparent, this is absolutely a micro-optimisation: 
in my testing of a 3-branch if-else versus a 3 case switch, the difference was 
of the order of nanoseconds per operation.  

3. Can we mathematically guarantee that no key.hashCode(), when XOR with h>>>16 
will return a negative value?  

> Improve the distribution of keys within partitions
> --
>
> Key: IGNITE-4828
> URL: https://issues.apache.org/jira/browse/IGNITE-4828
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 1.9
>Reporter: Michael Griggs
>Assignee: Michael Griggs
>  Labels: important
> Fix For: 2.0
>
>
> An issue has been found when inserting several million string keys in to a 
> cache.  Each string key was approximately 22-characters in length.  When I 
> exported the partition counts (via GG Visor) I was able to see an unusual 
> periodicity in the number of keys allocated to partitions.  I charted this in 
> Excel (1).
> After further investigation, it appears that there is a relationship
> between the number of keys being inserted, the number of partitions
> assigned to the cache and amount of apparent periodicity: a small number of 
> partitions will cause periodicity to appear with a lower number of keys.
> The {{RendezvousAffinityFunction#partition}} function performs a simple
> calculation of key hashcode modulo partition-count:
> {{U.safeAbs(key.hashCode() % parts)}}
> Digging further I was led to the fact that this is how the Java HashMap
> *used* to behave (2), but was upgraded around Java 1.4 to perform the
> following:
> {{key.hashCode() & (parts - 1)}}
> which performs more efficiently.  It was then updated further to do the
> following:
> {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}}
> with the bit-shift performed to
> bq. incorporate impact of the highest bits that would otherwise never be used 
> in index calculations because of table bounds
> When using this function, rather than our
> {{RendezvousAffinityFunction#partition}} implementation, I also saw a
> significant decrease in the periodicity and a better distribution of keys
> amongst partitions (3). 
> (1):  https://i.imgur.com/0FtCZ2A.png
> (2):  
> https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
> (3):  https://i.imgur.com/8ZuCSA3.png



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions

2017-04-06 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959371#comment-15959371
 ] 

Yakov Zhdanov commented on IGNITE-4828:
---

Michael,

Here are my comments

# We have several more affinity function implementations and your change would 
also be helpful for them. I would create AffinityFunctionAdapter and implement 
{{setPartition(int);}} and {{partition(Object);}} there. Then, I would change 
all implementations under ignite-core/src/main to extend it.
# Enum with hash type is questionable. I think it will be better to initialize 
{{private int hashMask = powerOfTwo ? parts - 1 : 0;}} and then check it with 
{{if (hashMask != 0) }}.
# I don't think you need {{safeAbs()}} here {{int i = U.safeAbs((h = 
key.hashCode()) ^ (h >>> 16)); return i & (parts - 1) }}. I think we will 
always set sign bit to 0 with this.

Do points above make sense?


> Improve the distribution of keys within partitions
> --
>
> Key: IGNITE-4828
> URL: https://issues.apache.org/jira/browse/IGNITE-4828
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 1.9
>Reporter: Michael Griggs
>Assignee: Michael Griggs
>  Labels: important
> Fix For: 2.0
>
>
> An issue has been found when inserting several million string keys in to a 
> cache.  Each string key was approximately 22-characters in length.  When I 
> exported the partition counts (via GG Visor) I was able to see an unusual 
> periodicity in the number of keys allocated to partitions.  I charted this in 
> Excel (1).
> After further investigation, it appears that there is a relationship
> between the number of keys being inserted, the number of partitions
> assigned to the cache and amount of apparent periodicity: a small number of 
> partitions will cause periodicity to appear with a lower number of keys.
> The {{RendezvousAffinityFunction#partition}} function performs a simple
> calculation of key hashcode modulo partition-count:
> {{U.safeAbs(key.hashCode() % parts)}}
> Digging further I was led to the fact that this is how the Java HashMap
> *used* to behave (2), but was upgraded around Java 1.4 to perform the
> following:
> {{key.hashCode() & (parts - 1)}}
> which performs more efficiently.  It was then updated further to do the
> following:
> {{((h = key.hashCode()) ^ (h >>> 16) & (parts - 1));}}
> with the bit-shift performed to
> bq. incorporate impact of the highest bits that would otherwise never be used 
> in index calculations because of table bounds
> When using this function, rather than our
> {{RendezvousAffinityFunction#partition}} implementation, I also saw a
> significant decrease in the periodicity and a better distribution of keys
> amongst partitions (3). 
> (1):  https://i.imgur.com/0FtCZ2A.png
> (2):  
> https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
> (3):  https://i.imgur.com/8ZuCSA3.png



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (IGNITE-4828) Improve the distribution of keys within partitions

2017-03-15 Thread Michael Griggs (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926456#comment-15926456
 ] 

Michael Griggs commented on IGNITE-4828:


Performance testing:

After 50 iterations, for 10,000,000 calculations per iteration

|| Hash Type || Min || Max || Mean ||
| Old | 166 | 216 | 189 |
| New | 162 | 198 | 180 |


> Improve the distribution of keys within partitions
> --
>
> Key: IGNITE-4828
> URL: https://issues.apache.org/jira/browse/IGNITE-4828
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 1.9
>Reporter: Michael Griggs
> Fix For: 2.0
>
>
> An issue has been found when inserting several million string keys in to a 
> cache.  Each string key was approximately 22-characters in length.  When I 
> exported the partition counts (via GG Visor) I was able to see an unusual 
> periodicity in the number of keys allocated to partitions.  I charted this in 
> Excel (1).
> After further investigation, it appears that there is a relationship
> between the number of keys being inserted, the number of partitions
> assigned to the cache and amount of apparent periodicity: a small number 
> ofpartitions will cause periodicity to appear with a lower number of keys.
> The {{RendezvousAffinityFunction#partition}} function performs a simple
> calculation of key hashcode modulo partition-count:
> {{U.safeAbs(key.hashCode() % parts)}}
> Digging further I was led to the fact that this is how the Java HashMap
> *used* to behave (2), but was upgraded around Java 1.4 to perform the
> following:
> {{key.hashCode() & (parts - 1)}}
> which performs more efficiently.  It was then updated further to do the
> following:
> {{(h = key.hashCode()) ^ (h >>> 16);}}
> with the bit-shift performed to
> bq. incorporate impact of the highest bits that would otherwise never be used 
> in index calculations because of table bounds
> When using this function, rather than our
> {{RendezvousAffinityFunction#partition}} implementation, I also saw a
> significant decrease in the periodicity and a better distribution of keys
> amongst partitions (3). 
> (1):  https://i.imgur.com/0FtCZ2A.png
> (2):  
> https://www.quora.com/Why-does-Java-use-a-mediocre-hashCode-implementation-for-strings
> (3):  https://i.imgur.com/8ZuCSA3.png



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)