[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175883#comment-15175883
 ] 

ASF GitHub Bot commented on FLINK-3422:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1685


> Scramble HashPartitioner hashes
> ---
>
> Key: FLINK-3422
> URL: https://issues.apache.org/jira/browse/FLINK-3422
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 0.10.2
> Reporter: Stephan Ewen
> Assignee: Gabor Horvath
> Priority: Critical
> Fix For: 1.0.0
>
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash
> scrambling to protect against bad user hash functions.
> We should apply a Murmur or Jenkins hash on top of the hash code, similar to
> what the {{DataSet}} API does.
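
A minimal sketch of the idea in the description, assuming a channel is chosen as the hash modulo the number of channels. The names `ScrambleSketch`, `mix`, `selectChannel` and `channelsUsed` are hypothetical and not Flink's API; `mix` is the MurmurHash3 32-bit finalizer, used only as a stand-in for whatever scrambling function the partitioner applies. Keys whose hash codes are all multiples of the channel count pile onto a single channel unless the hash is scrambled first.

```java
import java.util.HashSet;
import java.util.Set;

public class ScrambleSketch {

    // MurmurHash3 32-bit finalizer; a stand-in scrambler, not Flink's code.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hypothetical channel selection: optionally scramble, then reduce modulo numChannels.
    static int selectChannel(int userHash, int numChannels, boolean scramble) {
        int h = scramble ? mix(userHash) : userHash;
        return Math.floorMod(h, numChannels); // floorMod keeps the index non-negative
    }

    // Counts how many distinct channels receive at least one key.
    static int channelsUsed(int numKeys, int numChannels, boolean scramble) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < numKeys; i++) {
            int badHash = i * numChannels; // a "bad" user hash: always a multiple of numChannels
            used.add(selectChannel(badHash, numChannels, scramble));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("channels used, unscrambled: " + channelsUsed(1000, 4, false));
        System.out.println("channels used, scrambled:   " + channelsUsed(1000, 4, true));
    }
}
```

Without scrambling every key in this sample lands on channel 0; with scrambling the keys spread over more than one channel.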



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173356#comment-15173356
 ] 

ASF GitHub Bot commented on FLINK-3422:
---

Github user mbalassi commented on the pull request:

https://github.com/apache/flink/pull/1685#issuecomment-190584934
  
If there are no objections, I will merge this tomorrow morning.




[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171182#comment-15171182
 ] 

ASF GitHub Bot commented on FLINK-3422:
---

Github user Xazax-hun commented on the pull request:

https://github.com/apache/flink/pull/1685#issuecomment-189944863
  
I think this change is done and ready to be considered for merging. It
should go into both the master and the release-1.0 branches.





[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156635#comment-15156635
 ] 

ASF GitHub Bot commented on FLINK-3422:
---

Github user Xazax-hun commented on the pull request:

https://github.com/apache/flink/pull/1685#issuecomment-187076733
  
Thank you for your insight! I think you are right.
I will move the Murmur hash to MathUtils as well, and document which hash
should be used for which purpose. I will also migrate the streaming API
changes to use Murmur instead of Jenkins.




[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156306#comment-15156306
 ] 

ASF GitHub Bot commented on FLINK-3422:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1685#issuecomment-186948795
  
It is crucial that different hash functions are used for partitioning across
machines and for the internal partitioning of data structures. If the same
hash function is used for both, many internal data structure partitions will
be empty.

So far we have divided it the following way (admittedly not documented):
  - Murmur hash across machines
  - Jenkins hash internally in data structures

How about we stick with that division and use Murmur hash in the streaming
partitioner as well?
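
The empty-partition effect can be reproduced with a short sketch. It assumes both levels pick an index as the hash modulo a count, and that the number of internal partitions equals the number of machines (the worst case). `TwoLevelHashing`, `murmurMix`, `jenkinsMix` and `internalPartitionsUsed` are illustrative names written for this example (the MurmurHash3 finalizer and a Jenkins-style integer mixer); they are not taken from Flink's code base.

```java
import java.util.HashSet;
import java.util.Set;

public class TwoLevelHashing {

    // MurmurHash3 32-bit finalizer, used here for the cross-machine level.
    static int murmurMix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Jenkins-style 32-bit integer mixer, used here for the internal level.
    static int jenkinsMix(int h) {
        h = (h + 0x7ed55d16) + (h << 12);
        h = (h ^ 0xc761c23c) ^ (h >>> 19);
        h = (h + 0x165667b1) + (h << 5);
        h = (h + 0xd3a2646c) ^ (h << 9);
        h = (h + 0xfd7046c5) + (h << 3);
        h = (h ^ 0xb55a4f09) ^ (h >>> 16);
        return h;
    }

    // Counts the internal partitions that receive data on one machine.
    static int internalPartitionsUsed(int machine, int numMachines,
                                      int numInternal, boolean sameHash) {
        Set<Integer> used = new HashSet<>();
        for (int key = 0; key < 10_000; key++) {
            int outer = murmurMix(key);
            if (Math.floorMod(outer, numMachines) != machine) {
                continue; // this key is shipped to a different machine
            }
            // Either reuse the outer hash or apply an independent mixer.
            int inner = sameHash ? outer : jenkinsMix(key);
            used.add(Math.floorMod(inner, numInternal));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("same hash:       " + internalPartitionsUsed(0, 4, 4, true));
        System.out.println("different hash:  " + internalPartitionsUsed(0, 4, 4, false));
    }
}
```

When the same hash is reused, every key that reaches machine 0 already satisfies `hash mod 4 == 0`, so only one of the four internal partitions is ever filled. An independent second hash removes that correlation.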






[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156257#comment-15156257
 ] 

ASF GitHub Bot commented on FLINK-3422:
---

GitHub user Xazax-hun opened a pull request:

https://github.com/apache/flink/pull/1685

[WIP][FLINK-3422][streaming][api-breaking] Scramble HashPartitioner hashes.

This pull request contains a fix for FLINK-3422. Some of the tests are
failing at the moment because they rely on prior knowledge about the user
hash function. Fixing those tests requires knowledge about the internals of
Flink that I do not possess yet, so Márton Balassi is helping me.
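
One way to keep such tests independent of the concrete hash function is to assert properties of the channel selection instead of hard-coded channel numbers. The sketch below is only an illustration: `PartitionerPropertyCheck` and `holdsForSample` are hypothetical names, and the `IntBinaryOperator` stands in for a partitioner as a plain `(keyHash, numChannels) -> channel` function rather than Flink's actual interfaces.

```java
import java.util.function.IntBinaryOperator;

public class PartitionerPropertyCheck {

    // Returns true if, for every sampled key hash, the selector is deterministic
    // and yields a channel index in the range [0, numChannels).
    static boolean holdsForSample(IntBinaryOperator selector, int numChannels) {
        for (int keyHash = -1000; keyHash <= 1000; keyHash++) {
            int first = selector.applyAsInt(keyHash, numChannels);
            int second = selector.applyAsInt(keyHash, numChannels);
            if (first != second) {
                return false; // the same key must always map to the same channel
            }
            if (first < 0 || first >= numChannels) {
                return false; // the channel index must be valid
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two stand-in selectors: plain modulo and modulo after a multiplicative scramble.
        IntBinaryOperator plain = (hash, n) -> Math.floorMod(hash, n);
        IntBinaryOperator scrambled = (hash, n) -> Math.floorMod(hash * 0x9E3779B1, n);

        System.out.println("plain ok:     " + holdsForSample(plain, 4));
        System.out.println("scrambled ok: " + holdsForSample(scrambled, 4));
    }
}
```

A check written this way keeps passing when the scrambling function changes, whereas a test that expects a specific key on a specific channel does not.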

The Jira ticket mentions both Murmur and Jenkins hash.
Murmur hash is already used in the batch implementation: 
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/shipping/OutputEmitter.java#L187

My approach was to move the Jenkins hash from CompactingHashTable to MathUtils
and use it in HashPartitioner. If you think Murmur hash is the better choice
here, or that consistency with the batch implementation is valuable, please
let me know.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Xazax-hun/flink HashPartitioner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1685


commit afaa069483423e0bbb448f773cdcb4992689745e
Author: Gabor Horvath 
Date:   2016-02-21T13:54:44Z

[FLINK-3422][streaming][api-breaking] Scramble HashPartitioner hashes.

commit 102053618e11e0de784d4d02152dc439a1e274ca
Author: Márton Balassi 
Date:   2016-02-21T22:01:00Z

[WIP][FLINK-3422][streaming][api-breaking] Update tests reliant on hashing






[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-18 Thread Gabor Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152209#comment-15152209
 ] 

Gabor Horvath commented on FLINK-3422:
--

Hi, I am a Master's student from Hungary and a newcomer to Flink. I plan to look
into this issue over the weekend to get more familiar with the code.



[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-17 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150221#comment-15150221
 ] 

Stephan Ewen commented on FLINK-3422:
-

True enough, but all the more reason to fix that now...



[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-16 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150058#comment-15150058
 ] 

Aljoscha Krettek commented on FLINK-3422:
-

This is a breaking change, though. Some users might have already developed
stuff that depends on the hash function staying as it is. For example, there is
this custom Akka state query thingy.
