Thanks a lot. Will try.

On Mon, Mar 23, 2020 at 8:16 PM Jacob Lynn <abebopare...@gmail.com> wrote:

> You are overflowing the integer type, which goes up to a max value
> of 2147483647 (2^31 - 1). Change the return type of `sha2Int2` to
> `LongType()` and it works as expected.
>
> On Mon, Mar 23, 2020 at 6:15 AM ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> I am trying to implement simple hashing/checksum logic. The key steps
>> are:
>>
>> 1. Generate sha1 hash
>> 2. Extract last 8 chars
>> 3. Convert 8 chars to Int (using base 16)
>>
>> Here is the cut down version of the code:
>>
>> ---------------------------------------------------------------------------------------
>> from pyspark.sql.functions import *
>> from pyspark.sql.types import *
>> from hashlib import sha1 as local_sha1
>>
>> df = spark.sql("select '4104003141' value_to_hash union all select '4102859263'")
>>
>> f1 = lambda x: str(int(local_sha1(x.encode('UTF-8')).hexdigest()[32:], 16))
>> f2 = lambda x: int(local_sha1(x.encode('UTF-8')).hexdigest()[32:], 16)
>>
>> sha2Int1 = udf(f1, StringType())
>> sha2Int2 = udf(f2, IntegerType())
>>
>> print(f2('4102859263'))
>>
>> dfr = df.select(df.value_to_hash,
>>                 sha2Int1(df.value_to_hash).alias('1'),
>>                 sha2Int2(df.value_to_hash).alias('2'))
>> dfr.show(truncate=False)
>> ---------------------------------------------------------------------------------------
>>
>> I was expecting both columns to contain exactly the same values; however,
>> that is not always the case:
>>
>> 2520346415
>> +-------------+----------+-----------+
>> |value_to_hash|1         |2          |
>> +-------------+----------+-----------+
>> |4104003141   |478797741 |478797741  |
>> |4102859263   |2520346415|-1774620881|
>> +-------------+----------+-----------+
>>
>> The function itself works fine, as the print statement shows. However,
>> the two columns do not match for the second row, and the values differ
>> widely.
>>
>> Any pointers?
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
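
For reference, a minimal plain-Python sketch of the overflow described in the quoted reply. It assumes the out-of-range value ends up reinterpreted as a signed 32-bit integer, which is consistent with the -1774620881 shown in column 2 but is not confirmed Spark behaviour in general. The helper names `last8_as_int` and `wrap_to_int32` are hypothetical and used only for illustration.

```python
from hashlib import sha1

INT32_MAX = 2**31 - 1  # 2147483647, the IntegerType maximum quoted above


def last8_as_int(value: str) -> int:
    """SHA-1 the string, keep the last 8 hex chars, parse them as base 16."""
    # A SHA-1 hex digest is 40 chars long, so [32:] is the last 8 chars.
    return int(sha1(value.encode("UTF-8")).hexdigest()[32:], 16)


def wrap_to_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed two's-complement value.

    Assumption: this mimics what happened to the value in column 2.
    """
    n &= 0xFFFFFFFF
    return n - 2**32 if n > INT32_MAX else n


# The two values reported in column 1 of the thread's output.
for reported in (478797741, 2520346415):
    print(reported, "->", wrap_to_int32(reported))

# Eight hex chars can encode up to 0xFFFFFFFF, which exceeds INT32_MAX,
# so some inputs will not fit in a 32-bit signed integer column.
for value in ("4104003141", "4102859263"):
    h = last8_as_int(value)
    print(value, h, "fits in int32" if h <= INT32_MAX else "needs a wider type")
```

Eight hex characters can encode values up to 4294967295, so about half of all inputs will exceed the 32-bit signed range. A 64-bit return type such as `LongType()`, as suggested in the reply, holds every possible result.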

-- 
Best Regards,
Ayan Guha
