Hello All, I am calculating the hash value of a few columns to determine whether a record is an insert, delete, or update, but I found a strange scenario: some records return the same hash value even though the keys are totally different.
For instance:

scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
+---------------+---------------+
|hash(40514XXXX)|hash(41751XXXX)|
+---------------+---------------+
|      976573657|      976573657|
+---------------+---------------+

scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
+-----------+---------------+
|hash(14589)|hash(40004XXXX)|
+-----------+---------------+
|  777096871|      777096871|
+-----------+---------------+

I understand that hash() returns an integer. Have these values reached the maximum value?

Thanks & Regards,
Gokula Krishnan (Gokul)