[jira] [Commented] (SPARK-23381) Murmur3 hash generates a different value from other implementations

2018-02-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367895#comment-16367895
 ] 

Apache Spark commented on SPARK-23381:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/20630

> Murmur3 hash generates a different value from other implementations
> ---
>
> Key: SPARK-23381
> URL: https://issues.apache.org/jira/browse/SPARK-23381
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.1
> Reporter: Shintaro Murakami
> Priority: Major
>
> Murmur3 hash generates a different value from the original and other
> implementations (such as the Scala standard library and Guava) when the
> length of a byte array is not a multiple of 4.
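
To make the reported difference concrete, here is a minimal Python sketch of 32-bit Murmur3 with two ways of handling the trailing 1 to 3 bytes. `murmur3_reference` follows the original algorithm, which packs the tail into a single little-endian word and mixes it once. `murmur3_per_byte_tail` is an assumption about the kind of divergence described in this issue: it treats every trailing byte as its own sign-extended block. All function names are hypothetical, and this is not Spark's actual code.

```python
MASK = 0xFFFFFFFF  # keep all arithmetic within 32 bits


def rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def mix_k1(k1):
    k1 = (k1 * 0xCC9E2D51) & MASK
    k1 = rotl32(k1, 15)
    return (k1 * 0x1B873593) & MASK


def mix_h1(h1, k1):
    h1 ^= k1
    h1 = rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK


def fmix(h1, length):
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def hash_aligned(data, seed):
    """Mix all complete 4-byte little-endian blocks; return (state, bytes consumed)."""
    h1 = seed & MASK
    aligned = len(data) - len(data) % 4
    for i in range(0, aligned, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        h1 = mix_h1(h1, mix_k1(k1))
    return h1, aligned


def murmur3_reference(data, seed=0):
    """Reference tail handling: the 1-3 leftover bytes form one partial word."""
    h1, aligned = hash_aligned(data, seed)
    tail = data[aligned:]
    if tail:
        h1 ^= mix_k1(int.from_bytes(tail, "little"))
    return fmix(h1, len(data))


def murmur3_per_byte_tail(data, seed=0):
    """Assumed divergent variant: each leftover byte is mixed as a full block."""
    h1, aligned = hash_aligned(data, seed)
    for b in data[aligned:]:
        signed = b - 256 if b >= 128 else b  # mimic a signed (Java-style) byte
        h1 = mix_h1(h1, mix_k1(signed & MASK))
    return fmix(h1, len(data))


if __name__ == "__main__":
    for sample in (b"abcd", b"abc", b"abcde"):
        ref = murmur3_reference(sample)
        alt = murmur3_per_byte_tail(sample)
        print(sample, hex(ref), hex(alt), "same" if ref == alt else "different")
```

When the input length is a multiple of 4 there is no tail, so the two functions run exactly the same steps and agree. They can only diverge on the remaining 1 to 3 bytes, which matches the condition described in this report.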



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23381) Murmur3 hash generates a different value from other implementations

2018-02-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367845#comment-16367845
 ] 

Joseph K. Bradley commented on SPARK-23381:
---

Copying my comment from the PR:
{quote}
For ML, I actually don't think this has to be a blocker. It's not great, but 
it's not a regression.

However, we should definitely fix this in the future and soon: For ML, it's 
really important that MurmurHash3 behave consistently across platforms.

To fix this, we'll need to keep the old implementation of MurmurHash3 to 
preserve the behavior of ML Pipelines exported from previous versions of Spark.
{quote}




[jira] [Commented] (SPARK-23381) Murmur3 hash generates a different value from other implementations

2018-02-11 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359956#comment-16359956
 ] 

Sean Owen commented on SPARK-23381:
---

I don't think that behavior is guaranteed anyway, but I can see it's a 
nice-to-have. I suppose the problem is that this changes behavior of programs 
at the margins, too. Maybe OK for 2.4.x




[jira] [Commented] (SPARK-23381) Murmur3 hash generates a different value from other implementations

2018-02-10 Thread Shintaro Murakami (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359690#comment-16359690
 ] 

Shintaro Murakami commented on SPARK-23381:
---

FeatureHasher in MLlib uses Murmur3 to hash feature indices. If I serve online 
predictions from another environment, such as a C++ prediction server, the 
indices do not match and the model cannot predict correctly.




[jira] [Commented] (SPARK-23381) Murmur3 hash generates a different value from other implementations

2018-02-10 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359501#comment-16359501
 ] 

Sean Owen commented on SPARK-23381:
---

... what problem does this cause?




[jira] [Commented] (SPARK-23381) Murmur3 hash generates a different value from other implementations

2018-02-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359445#comment-16359445
 ] 

Apache Spark commented on SPARK-23381:
--

User 'mrkm4ntr' has created a pull request for this issue:
https://github.com/apache/spark/pull/20568
