[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@gatorsmile Thanks! I will close it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20568
  
@mrkm4ntr Thank you for your contribution! The PR has been merged using
your GitHub account. Could you close this?


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20568
  
I think we can close this now.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20568
  
Submitted the PR https://github.com/apache/spark/pull/20630 to take this 
over. 


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20568
  
To speed up the work here, I will take this over. All credit for the
contribution should go to @mrkm4ntr.

Thanks for your work! @mrkm4ntr 


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/20568
  
I think this should block RC4 :(  For ML, it's really important that
MurmurHash3 behaves consistently across platforms.

However, for ML, we'll need to maintain the old implementation of
MurmurHash3 to preserve the behavior of ML Pipelines exported from previous
versions of Spark.  I'll create & link a JIRA here in a moment.
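
Keeping the old implementation for previously exported pipelines amounts to version-aware loading. Below is a minimal sketch, assuming the saved model metadata records the writing Spark version under a `sparkVersion` key. The cut-over version `CORRECTED_HASH_SINCE` is a hypothetical placeholder, and `legacy_hash`/`standard_hash` are `zlib` stand-ins with the same `(bytes, seed) -> int` shape as the two Murmur3 variants, not Spark code.

```python
import zlib
from typing import Callable, Mapping, Tuple

HashFn = Callable[[bytes, int], int]

# Hypothetical cut-over release: models saved by earlier versions keep the legacy hash.
CORRECTED_HASH_SINCE: Tuple[int, int] = (2, 3)


def parse_version(version: str) -> Tuple[int, int]:
    """Turn "major.minor[.patch][-suffix]" into (major, minor)."""
    major, minor = version.split("-")[0].split(".")[:2]
    return int(major), int(minor)


def hash_for_saved_model(metadata: Mapping[str, str], legacy: HashFn, standard: HashFn) -> HashFn:
    """Pick the hash a saved model was trained with, based on the Spark version that wrote it."""
    saved = parse_version(metadata["sparkVersion"])
    return standard if saved >= CORRECTED_HASH_SINCE else legacy


# Stand-ins (NOT Murmur3) with the same (bytes, seed) -> int shape as the two variants.
def legacy_hash(data: bytes, seed: int) -> int:
    return zlib.crc32(data, seed)


def standard_hash(data: bytes, seed: int) -> int:
    return zlib.adler32(data, seed)


old_model_hash = hash_for_saved_model({"sparkVersion": "2.2.1"}, legacy_hash, standard_hash)
new_model_hash = hash_for_saved_model({"sparkVersion": "2.3.0"}, legacy_hash, standard_hash)
```

Dispatching on the version stored with each model, rather than on a global switch, lets old and new models coexist in the same application.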


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/20568
  
@hvanhovell just to make sure, given the dependency on `FeatureHasher`, 
should this block RC4?


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20568
  
@mrkm4ntr This is a legitimate failure. Can you fix the Python tests?


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20568
  
Jenkins, retest this please.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87509/


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20568
  
**[Test build #87509 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87509/testReport)**
 for PR 20568 at commit 
[`c20cd97`](https://github.com/apache/spark/commit/c20cd97d7ce5690993b4490bb7cca955e7703d90).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20568
  
**[Test build #87509 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87509/testReport)**
 for PR 20568 at commit 
[`c20cd97`](https://github.com/apache/spark/commit/c20cd97d7ce5690993b4490bb7cca955e7703d90).


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20568
  
Jenkins, retest this please.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20568
  
retest this please.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20568
  
Jenkins, retest this please.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87501/


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20568
  
**[Test build #87501 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87501/testReport)**
 for PR 20568 at commit 
[`c20cd97`](https://github.com/apache/spark/commit/c20cd97d7ce5690993b4490bb7cca955e7703d90).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20568
  
**[Test build #87501 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87501/testReport)**
 for PR 20568 at commit 
[`c20cd97`](https://github.com/apache/spark/commit/c20cd97d7ce5690993b4490bb7cca955e7703d90).


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20568
  
Jenkins, retest this please.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20568
  
Retest this please


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20568
  
@mrkm4ntr Do not worry about these failures. Some tests are known to be
flaky, and the community is working on fixing them. In the meantime, we have
to re-trigger the tests.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
I cannot reproduce this test failure in my environment.
It seems to me that it is not related to this change...


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20568
  
Retest this please


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20568
  
**[Test build #87472 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87472/testReport)**
 for PR 20568 at commit 
[`336bce0`](https://github.com/apache/spark/commit/336bce0d38d2068d12c4ba647da084e65bf30c93).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87472/


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20568
  
**[Test build #87472 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87472/testReport)**
 for PR 20568 at commit 
[`336bce0`](https://github.com/apache/spark/commit/336bce0d38d2068d12c4ba647da084e65bf30c93).


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20568
  
Jenkins, test this please


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@hvanhovell I added a new method and changed the code so that only
FeatureHasher calls it.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@hvanhovell I sent an e-mail to the `[VOTE] Spark 2.3.0 (RC3)` thread.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
I registered on the dev list with the same user name.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20568
  
@mrkm4ntr I see your point. Adding a method to Murmur3 would work.

The problem is that we are about to release a `FeatureHasher` in Spark
2.3 that uses the current Murmur3 implementation. If we switch it to the
correct Murmur3 implementation after Spark 2.3 is released, we will break all
feature-hashing models created with Spark 2.3. This might be a blocker.
Can you send an e-mail to the dev list?

cc @sameeragarwal @srowen for more visibility.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20568
  
How about adding a new config that controls whether to use the new Murmur3
hash function, turned off by default? We would also have to document the change
explicitly.  WDYT @gatorsmile @hvanhovell @cloud-fan ?
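
The opt-in flag proposed here could look like the minimal sketch below. The configuration key is hypothetical (not an existing Spark setting), and `legacy_hash`/`standard_hash` are `zlib` stand-ins with the same `(bytes, seed) -> int` shape as the two Murmur3 variants. The design point is that an absent or unrecognized value falls back to the legacy hash, so existing behavior is unchanged unless a user explicitly opts in.

```python
import zlib
from typing import Callable, Mapping

HashFn = Callable[[bytes, int], int]

# Hypothetical key for illustration only; this is not an existing Spark setting.
USE_STANDARD_MURMUR3 = "spark.hypothetical.hash.useStandardMurmur3"


def choose_hash(conf: Mapping[str, str], legacy: HashFn, standard: HashFn) -> HashFn:
    """Keep the legacy hash unless the flag is explicitly "true" (off by default)."""
    enabled = conf.get(USE_STANDARD_MURMUR3, "false").strip().lower() == "true"
    return standard if enabled else legacy


# Stand-ins (NOT Murmur3) with the same (bytes, seed) -> int shape as the two variants.
def legacy_hash(data: bytes, seed: int) -> int:
    return zlib.crc32(data, seed)


def standard_hash(data: bytes, seed: int) -> int:
    return zlib.adler32(data, seed)


default_fn = choose_hash({}, legacy_hash, standard_hash)
opt_in_fn = choose_hash({USE_STANDARD_MURMUR3: "true"}, legacy_hash, standard_hash)
```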


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-12 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@hvanhovell The main motivation is to enable online prediction with
parameters trained using FeatureHasher in MLlib. If the generated hash values
differ from those of implementations in other languages, the indices of the
coefficients do not match and the predictions are wrong.
But I agree that backward compatibility is more important. Since FeatureHasher
is being added in Spark 2.3.0, how about adding a new method with this change
to Murmur3 and using it only from FeatureHasher?
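
The mismatch described here can be illustrated with a minimal Python sketch. `murmur3_x86_32` follows the reference MurmurHash3_x86_32 algorithm, in which the 1 to 3 trailing bytes are packed into a single block. `murmur3_legacy_tail` is an assumption about the pre-fix Spark behavior: each trailing byte is sign-extended (Java's `byte` is signed) and mixed as a full block of its own. The `feature_index` helper, the `column=value` term format, and seed 42 are also assumptions, meant only to mimic how a hashing featurizer turns a hash into a coefficient index; none of these names are Spark APIs.

```python
from typing import Callable

C1 = 0xCC9E2D51
C2 = 0x1B873593
MASK = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK


def _mix_k1(k1: int) -> int:
    k1 = (k1 * C1) & MASK
    k1 = _rotl32(k1, 15)
    return (k1 * C2) & MASK


def _mix_h1(h1: int, k1: int) -> int:
    h1 = _rotl32(h1 ^ k1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def _fmix(h1: int, length: int) -> int:
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    return h1 ^ (h1 >> 16)


def _hash_blocks(data: bytes, seed: int) -> int:
    """Mix every complete little-endian 4-byte block into h1."""
    h1 = seed & MASK
    for i in range(0, len(data) - len(data) % 4, 4):
        h1 = _mix_h1(h1, _mix_k1(int.from_bytes(data[i:i + 4], "little")))
    return h1


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Reference MurmurHash3_x86_32: the tail bytes form ONE block, XORed into h1 without rotation."""
    h1 = _hash_blocks(data, seed)
    tail = data[len(data) - len(data) % 4:]
    if tail:
        k1 = int.from_bytes(tail, "little")  # tail[0] | tail[1] << 8 | tail[2] << 16
        h1 ^= _mix_k1(k1)
    return _fmix(h1, len(data))


def murmur3_legacy_tail(data: bytes, seed: int = 0) -> int:
    """ASSUMED pre-fix behavior: every tail byte is sign-extended and mixed as a full block."""
    h1 = _hash_blocks(data, seed)
    for b in data[len(data) - len(data) % 4:]:
        signed = b - 256 if b >= 128 else b  # Java's byte is signed
        h1 = _mix_h1(h1, _mix_k1(signed & MASK))
    return _fmix(h1, len(data))


def to_java_int(h: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed Java int."""
    return h - (1 << 32) if h & 0x80000000 else h


def feature_index(term: str, num_features: int,
                  hash_fn: Callable[[bytes, int], int], seed: int = 42) -> int:
    """Hypothetical hashing-trick index: Python's % with a positive divisor is never negative."""
    return to_java_int(hash_fn(term.encode("utf-8"), seed)) % num_features


for term in ["city=Tokyo", "id=1"]:
    print(term,
          feature_index(term, 1 << 18, murmur3_x86_32),
          feature_index(term, 1 << 18, murmur3_legacy_tail))
```

In this sketch, inputs whose length is a multiple of 4 (such as `id=1`) hash identically under both variants, while any other length takes a different path through the tail. An index computed by another language's standard MurmurHash3 can therefore point at a different coefficient than the one Spark trained.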


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-12 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20568
  
@mrkm4ntr The change itself looks pretty reasonable. However, I am very
hesitant to merge this because it will probably break bucketing (which uses
Murmur3 to create the buckets); for example, a bucketed table written by Spark
2.2 could not be safely read by Spark after the change.

Can you explain what problem you are trying to fix here?
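
Why a hash change breaks bucketing can be sketched as follows, assuming a bucket id is the non-negative remainder (pmod) of a signed 32-bit hash of the bucket key. `zlib.crc32` and `zlib.adler32` are stand-ins for the old and new hash functions, not Murmur3, and the helper names are hypothetical. The point is only that the writer and a bucket-pruning reader must agree on the hash.

```python
import zlib
from typing import Callable, Dict, List

HashFn = Callable[[bytes], int]


def to_java_int(h: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed Java int."""
    return h - (1 << 32) if h & 0x80000000 else h


def bucket_id(key: str, num_buckets: int, hash_fn: HashFn) -> int:
    """pmod(hash(key), numBuckets); Python's % is already non-negative for a positive divisor."""
    return to_java_int(hash_fn(key.encode("utf-8")) & 0xFFFFFFFF) % num_buckets


def write_bucketed(keys: List[str], num_buckets: int, hash_fn: HashFn) -> Dict[int, List[str]]:
    """Writer: place every key in the bucket its hash selects."""
    buckets: Dict[int, List[str]] = {i: [] for i in range(num_buckets)}
    for key in keys:
        buckets[bucket_id(key, num_buckets, hash_fn)].append(key)
    return buckets


def bucket_pruned_lookup(buckets: Dict[int, List[str]], key: str,
                         num_buckets: int, hash_fn: HashFn) -> bool:
    """Reader: trust the layout and scan only the one bucket the key should be in."""
    return key in buckets[bucket_id(key, num_buckets, hash_fn)]


# Stand-ins: crc32 plays the hash the table was written with,
# adler32 plays a changed hash used by a newer reader.
old_hash: HashFn = zlib.crc32
new_hash: HashFn = zlib.adler32

keys = [f"user{i}" for i in range(100)]
table = write_bucketed(keys, 8, old_hash)
found_old = sum(bucket_pruned_lookup(table, k, 8, old_hash) for k in keys)
found_new = sum(bucket_pruned_lookup(table, k, 8, new_hash) for k in keys)
print(found_old, found_new)
```

With the writer's hash, the reader finds every key; with a different hash, many keys are looked up in the wrong bucket and silently reported as missing, which is the risk for bucketed tables written before the change.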


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-11 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@kiszk Thank you for your review! I fixed it.


---




[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20568
  
Can one of the admins verify this patch?


---



