[jira] [Commented] (SPARK-23381) Murmur3 hash generates a different value from other implementations
[ https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367895#comment-16367895 ]

Apache Spark commented on SPARK-23381:
--------------------------------------

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/20630

> Murmur3 hash generates a different value from other implementations
> -------------------------------------------------------------------
>
> Key: SPARK-23381
> URL: https://issues.apache.org/jira/browse/SPARK-23381
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.1
> Reporter: Shintaro Murakami
> Priority: Major
>
> Murmur3 hash generates a different value from the original and other
> implementations (such as the Scala standard library and Guava) when the length
> of a byte array is not a multiple of 4.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
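To make the report concrete: MurmurHash3_x86_32 consumes its input in 4-byte little-endian blocks, and only the final 1 to 3 bytes (the "tail") need special handling. The Python sketch below contrasts the reference tail step, which packs the leftover bytes into a single partial block, with a hypothetical variant that sign-extends each leftover byte (as a Java `byte` would be) and mixes it as a block of its own. That the pre-fix Spark code behaved like the second variant is our reading of the symptom, not something stated in this thread; all function names are illustrative.

```python
from typing import Tuple

MASK = 0xFFFFFFFF
C1, C2 = 0xCC9E2D51, 0x1B873593


def rotl32(x: int, r: int) -> int:
    """32-bit rotate left."""
    return ((x << r) | (x >> (32 - r))) & MASK


def mix_k1(k1: int) -> int:
    k1 = (k1 * C1) & MASK
    k1 = rotl32(k1, 15)
    return (k1 * C2) & MASK


def mix_h1(h1: int, k1: int) -> int:
    h1 ^= k1
    h1 = rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def fmix(h1: int, length: int) -> int:
    """Final avalanche step; the total input length is folded in here."""
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def hash_aligned_blocks(data: bytes, seed: int) -> Tuple[int, int]:
    """Mix every complete 4-byte little-endian block; return (h1, bytes consumed)."""
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        h1 = mix_h1(h1, mix_k1(int.from_bytes(data[i:i + 4], "little")))
    return h1, aligned


def murmur3_32_reference(data: bytes, seed: int = 0) -> int:
    """Reference MurmurHash3_x86_32: the 1-3 tail bytes form ONE partial block."""
    h1, aligned = hash_aligned_blocks(data, seed)
    tail = data[aligned:]
    if tail:
        # Tail bytes (unsigned) packed little-endian, mixed once and XORed in,
        # without the rotate/multiply-add of mix_h1.
        h1 ^= mix_k1(int.from_bytes(tail, "little"))
    return fmix(h1, len(data))


def murmur3_32_per_byte_tail(data: bytes, seed: int = 0) -> int:
    """Hypothetical variant: each tail byte is sign-extended and mixed as if it
    were a full 4-byte block."""
    h1, aligned = hash_aligned_blocks(data, seed)
    for b in data[aligned:]:
        signed = b - 256 if b >= 128 else b  # Java bytes are signed
        h1 = mix_h1(h1, mix_k1(signed & MASK))
    return fmix(h1, len(data))


if __name__ == "__main__":
    for s in (b"abcd", b"abc"):
        print(s, hex(murmur3_32_reference(s)), hex(murmur3_32_per_byte_tail(s)))
```

Both functions share every step except the tail, so they agree whenever `len(data) % 4 == 0` (as for `b"abcd"`) and should disagree otherwise (as for `b"abc"`), which matches the condition described in the report.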
[ https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367845#comment-16367845 ]

Joseph K. Bradley commented on SPARK-23381:
-------------------------------------------

Copying my comment from the PR:
{quote}
For ML, I actually don't think this has to be a blocker. It's not great, but it's not a regression. However, we should definitely fix this, and soon: for ML, it's really important that MurmurHash3 behave consistently across platforms. To fix this, we'll need to keep the old implementation of MurmurHash3 to preserve the behavior of ML Pipelines exported from previous versions of Spark.
{quote}
[ https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359956#comment-16359956 ]

Sean Owen commented on SPARK-23381:
-----------------------------------

I don't think that behavior is guaranteed anyway, but I can see it's a nice-to-have. I suppose the problem is that this also changes the behavior of programs at the margins. Maybe OK for 2.4.x.
[ https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359690#comment-16359690 ]

Shintaro Murakami commented on SPARK-23381:
-------------------------------------------

FeatureHasher in MLlib uses Murmur3 to compute feature indices. If I serve online predictions from another environment, such as a C++ prediction server, the indices do not match and the model cannot predict correctly.
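A prediction server only reproduces FeatureHasher's indices if it matches both the 32-bit hash and the step that maps that hash to a column index. Assuming (our understanding, not stated in this thread) that Spark treats the Murmur3 output as a signed 32-bit `Int` and applies a non-negative modulo by `numFeatures`, a non-JVM port must replicate the signed interpretation as well. The sketch below uses hypothetical helper names to show how an unsigned modulo of the same 32 bits lands on a different index.

```python
def to_int32(h: int) -> int:
    """Reinterpret the low 32 bits of h as a signed (Java/Scala) Int."""
    h &= 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def non_negative_mod(x: int, mod: int) -> int:
    """Java-style truncating remainder, shifted into [0, mod) for negative x."""
    raw = abs(x) % mod
    if x < 0:
        raw = -raw  # Java's % keeps the sign of the dividend
    return raw + mod if raw < 0 else raw


def feature_index(hash32: int, num_features: int) -> int:
    """Hypothetical helper: map a 32-bit Murmur3 output to a column index."""
    return non_negative_mod(to_int32(hash32), num_features)


if __name__ == "__main__":
    h = 0xFFFFFFFF  # the same 32 bits, i.e. -1 as a signed Int
    print(feature_index(h, 10))  # 9: signed interpretation
    print(h % 10)                # 5: naive unsigned modulo
```

In Python, `x % mod` already returns a value in `[0, mod)` for a positive `mod`; the explicit version mirrors the JVM arithmetic, which is what a C++ port (where `%` also truncates toward zero) would need to reproduce alongside a bit-identical hash.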
[ https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359501#comment-16359501 ]

Sean Owen commented on SPARK-23381:
-----------------------------------

... what problem does this cause?
[ https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359445#comment-16359445 ]

Apache Spark commented on SPARK-23381:
--------------------------------------

User 'mrkm4ntr' has created a pull request for this issue:
https://github.com/apache/spark/pull/20568