Hi, I am trying to implement some simple hashing/checksum logic. The key steps are:
1. Generate a SHA-1 hash
2. Extract the last 8 characters
3. Convert those 8 characters to an int (using base 16)

Here is the cut-down version of the code:

---------------------------------------------------------------------------------------
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Aliased so it does not clash with pyspark.sql.functions.sha1
from hashlib import sha1 as local_sha1

df = spark.sql("select '4104003141' value_to_hash union all select '4102859263'")

# hexdigest() is 40 hex characters long, so [32:] keeps the last 8
f1 = lambda x: str(int(local_sha1(x.encode('UTF-8')).hexdigest()[32:],16))
f2 = lambda x: int(local_sha1(x.encode('UTF-8')).hexdigest()[32:],16)

sha2Int1 = udf( f1 , StringType())
sha2Int2 = udf( f2 , IntegerType())

# Sanity check outside Spark
print(f2('4102859263'))

dfr = df.select(df.value_to_hash, sha2Int1(df.value_to_hash).alias('1'), sha2Int2(df.value_to_hash).alias('2'))
dfr.show(truncate=False)
---------------------------------------------------------------------------------------------

I was expecting both columns to contain exactly the same values, but that is not always the case:

2520346415
+-------------+----------+-----------+
|value_to_hash|1         |2          |
+-------------+----------+-----------+
|4104003141   |478797741 |478797741  |
|4102859263   |2520346415|-1774620881|
+-------------+----------+-----------+

The function works fine on its own, as the print statement shows. However, the two columns do not match for the second row, and the values differ widely. Any pointers?

--
Best Regards,
Ayan Guha
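
For reference, the three steps can be sketched in plain Python without Spark. The helper names `last8_as_int` and `as_signed_32` are hypothetical and not part of the original code. The second helper only illustrates an arithmetic observation: the mismatched value in column 2 equals the expected value minus 2**32. That is what one would see if the number were stored in a signed 32-bit integer (Spark's IntegerType is a 4-byte signed type), so this is offered as a likely explanation rather than a confirmed diagnosis.

```python
from hashlib import sha1


def last8_as_int(value: str) -> int:
    """Steps 1-3: SHA-1 hex digest, keep the last 8 hex chars, parse as base 16."""
    digest = sha1(value.encode("UTF-8")).hexdigest()  # always 40 hex characters
    return int(digest[-8:], 16)  # same slice as [32:]; result is in [0, 2**32)


def as_signed_32(n: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return n - 2**32 if n >= 2**31 else n


# Values taken from the output reported in the post:
print(as_signed_32(478797741))   # fits in 31 bits, so it is unchanged
print(as_signed_32(2520346415))  # exceeds 2**31 - 1, so it wraps to a negative number
```

Eight hex characters can encode values up to 2**32 - 1, while a signed 32-bit integer tops out at 2**31 - 1, so roughly half of all inputs would be affected under this assumption. Declaring the UDF's return type as LongType() instead of IntegerType() may be worth trying.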