Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17049#discussion_r102882772
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
    @@ -781,12 +780,12 @@ object HiveHashFunction extends InterpretedHashFunction {
             var i = 0
             val length = struct.numFields
             while (i < length) {
    -          result = (31 * result) + hash(struct.get(i, types(i)), types(i), seed + 1).toInt
    +          result = (31 * result) + hash(struct.get(i, types(i)), types(i), 0).toInt
    --- End diff --
    
    The `seed` is something used by the murmur3 hash; Hive hash does not need it. See the original implementation in the Hive codebase: https://github.com/apache/hive/blob/4ba713ccd85c3706d195aeef9476e6e6363f1c21/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638
    
    Since the hashing-related methods in Spark already take a `seed`, I had to accept it in hive-hash as well. Whenever I compute the hash, I always pass `seed` as 0, which is what is done here.
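    
    For illustration only, here is a minimal standalone sketch of the seedless `31 * result + fieldHash` accumulation shown in the diff above. The object and method names (`HiveHashSketch`, `hiveHashInt`, `hiveHashStruct`) are made up for this example and the per-field hash is hard-coded for ints rather than going through Spark's `hash(value, dataType, seed)`:
    
    ```scala
    object HiveHashSketch {
      // Per-field hash; for an int, Hive's hash is simply the value itself.
      def hiveHashInt(v: Int): Int = v
    
      // Fold field hashes with 31 * result + fieldHash, starting from 0.
      // No seed is threaded through the recursion.
      def hiveHashStruct(fields: Seq[Int]): Int =
        fields.foldLeft(0)((result, field) => 31 * result + hiveHashInt(field))
    
      def main(args: Array[String]): Unit = {
        // struct(1, 2, 3) hashes to 31 * (31 * (31 * 0 + 1) + 2) + 3 = 1026
        println(hiveHashStruct(Seq(1, 2, 3)))
      }
    }
    ```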

