Re: [PR] [WIP][SPARK-53938][PYTHON][CONNECT] Fix decimal rescaling in LocalDataToArrowConversion [spark]

via GitHub Sat, 18 Oct 2025 06:33:15 -0700


zhengruifeng commented on PR #52637:
URL: https://github.com/apache/spark/pull/52637#issuecomment-3413850999


   
   ```py
   import decimal
   from pyspark.sql.types import DecimalType
   from pyspark.sql.functions import udf, lit, udtf
   
   
   df = spark.sql("SELECT DOUBLE(1.234) AS v")
   
   @udf(returnType=DecimalType(38, 18))
   def f(v: float):
       return decimal.Decimal(v)
   
   df.select("*", f("v")).show()
   
   
   @udf(returnType=DecimalType(38, 18), useArrow=True)
   def f2(v: float):
       return decimal.Decimal(v)
   
   df.select("*", f2("v")).show()
   
   
   @udtf(returnType='a: DOUBLE, b: DECIMAL(38, 18)')
   class Float2Decimal:
       def eval(self, v: float):
           yield v, decimal.Decimal(v)
   
   
   Float2Decimal(lit(1.234)).show()
   
   
   @udtf(returnType='a: DOUBLE, b: DECIMAL(38, 18)', useArrow=True)
   class Float2Decimal2:
       def eval(self, v: float):
           yield v, decimal.Decimal(v)
   
   Float2Decimal2(lit(1.234)).show()
   ```
   
   It turns out that this PR also happens to fix the same issue in
arrow-optimized UDFs/UDTFs: `f2` and `Float2Decimal2` both fail with
   ```
   pyarrow.lib.ArrowInvalid: Rescaling Decimal value would cause data loss
   ```
   before this change.
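
For context, here is a minimal stdlib-only sketch of what I assume is the underlying cause (not confirmed in this thread): `decimal.Decimal(float)` keeps the exact binary value of the float, so the result has far more than 18 fractional digits and cannot be stored at scale 18 without rounding. The explicit `quantize` call is only an illustration of rounding to the target scale; it is not necessarily what this PR does internally.

```python
import decimal

# Decimal(float) preserves the exact binary expansion of the double,
# so the value carries many more fractional digits than scale 18 allows.
exact = decimal.Decimal(1.234)
exact_scale = -exact.as_tuple().exponent

# Illustrative only: round explicitly to 18 fractional digits,
# using a context wide enough for DECIMAL(38, 18).
ctx = decimal.Context(prec=38, rounding=decimal.ROUND_HALF_EVEN)
rounded = exact.quantize(decimal.Decimal("1e-18"), context=ctx)
rounded_scale = -rounded.as_tuple().exponent

print("exact:  ", exact, "scale =", exact_scale)
print("rounded:", rounded, "scale =", rounded_scale)
```

The rounded value differs from the exact one by less than `1e-18`, which is the precision loss that the strict rescaling check refuses to perform silently.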
   
   @ueshin @HyukjinKwon @allisonport-db @asl3 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

