Not sure I get what you mean. I ran the query that you had, and I don't get the same hash values as you.

From: Gokula Krishnan D <email2...@gmail.com>
Date: Friday, September 28, 2018 at 10:40 AM
To: "Thakrar, Jayesh" <jthak...@conversantmedia.com>
Cc: user <user@spark.apache.org>
Subject: Re: [Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

Hello Jayesh,

I have masked the input values with XXXX.

Thanks & Regards,
Gokula Krishnan (Gokul)

On Wed, Sep 26, 2018 at 2:20 PM Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:

Cannot reproduce your situation.
Can you share your Spark version?

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
+----------------+---------------+
|hash(40514XXXXX)|hash(41751XXXX)|
+----------------+---------------+
|     -1898845883|      916273350|
+----------------+---------------+

scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
+-----------+---------------+
|hash(14589)|hash(40004XXXX)|
+-----------+---------------+
|  777096871|    -1593820563|
+-----------+---------------+

scala>

From: Gokula Krishnan D <email2...@gmail.com>
Date: Tuesday, September 25, 2018 at 8:57 PM
To: user <user@spark.apache.org>
Subject: [Spark SQL] why spark sql hash() are returns the same hash value though the keys/expr are not same

Hello All,

I am calculating the hash value of a few columns to determine whether a record is an Insert/Delete/Update, but I found a scenario that is a little weird: some of the records return the same hash value even though the keys are totally different.

For instance,

scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
+---------------+---------------+
|hash(40514XXXX)|hash(41751XXXX)|
+---------------+---------------+
|      976573657|      976573657|
+---------------+---------------+

scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
+-----------+---------------+
|hash(14589)|hash(40004XXXX)|
+-----------+---------------+
|  777096871|      777096871|
+-----------+---------------+

I do understand that hash() returns an integer. Have these reached the max value?

Thanks & Regards,
Gokula Krishnan (Gokul)