[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...

gatorsmile Fri, 16 Feb 2018 13:07:07 -0800

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20568#discussion_r168870192
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
    @@ -218,4 +221,32 @@ object FeatureHasher extends DefaultParamsReadable[FeatureHasher] {
     
       @Since("2.3.0")
       override def load(path: String): FeatureHasher = super.load(path)
    +
    +  private val seed = OldHashingTF.seed
    +
    +  /**
    +   * Calculate a hash code value for the term object using
    +   * Austin Appleby's MurmurHash 3 algorithm (MurmurHash3_x86_32).
    +   * This is the default hash algorithm used from Spark 2.0 onwards.
    +   * Use hashUnsafeBytes2 to match the original algorithm with the value.
    +   * See SPARK-23381.
    +   */
    +  @Since("2.3.0")
    +  def murmur3Hash(term: Any): Int = {
    --- End diff --
    
    I would also address this comment.


