[Spark SQL] Why does Spark SQL hash() return the same hash value though the keys/expr are not the same?

Gokula Krishnan D Tue, 25 Sep 2018 18:57:54 -0700

Hello All,

I am calculating the hash value of a few columns to determine whether a
record is an Insert/Delete/Update, but I found a scenario that is a little
weird: some records return the same hash value even though the keys are
totally different.


For instance,

scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
+---------------+---------------+
|hash(40514XXXX)|hash(41751XXXX)|
+---------------+---------------+
|      976573657|      976573657|
+---------------+---------------+

scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
+-----------+---------------+
|hash(14589)|hash(40004XXXX)|
+-----------+---------------+
|  777096871|      777096871|
+-----------+---------------+

I understand that hash() returns an integer. Have these values reached the
maximum value?
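
For context, both values shown above are well below the largest 32-bit
signed integer (2147483647), so this looks more like an ordinary collision
than an overflow. The sketch below is a rough birthday-bound estimate of
how often two different keys would be expected to share a hash. It assumes
the hash behaves like a uniformly distributed 32-bit function, which is an
assumption and not something verified here. The object name and the key
counts are hypothetical, chosen only for illustration.

```scala
// Rough birthday-bound estimate for a 32-bit hash.
// Assumption: hash outputs are uniformly distributed over 2^32 values.
object HashCollisionEstimate {
  // Number of distinct values a 32-bit integer hash can take.
  val Buckets: Double = math.pow(2, 32)

  // Expected number of colliding pairs among n distinct keys.
  def expectedCollidingPairs(n: Long): Double =
    n.toDouble * (n - 1).toDouble / (2.0 * Buckets)

  // Approximate probability that at least one pair of keys collides.
  def collisionProbability(n: Long): Double =
    1.0 - math.exp(-expectedCollidingPairs(n))

  def main(args: Array[String]): Unit = {
    // Hypothetical key counts, chosen only for illustration.
    for (n <- Seq(10000L, 100000L, 1000000L)) {
      println(f"keys=$n%d expectedPairs=${expectedCollidingPairs(n)}%.3f " +
        f"P(collision)=${collisionProbability(n)}%.3f")
    }
  }
}
```

Under that assumption, roughly one colliding pair is expected at around
100,000 distinct keys, so a 32-bit hash on its own may not be a safe way to
tell records apart at that scale.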

Thanks & Regards,
Gokula Krishnan (Gokul)
