bersprockets opened a new pull request, #44099:
URL: https://github.com/apache/spark/pull/44099

   ### What changes were proposed in this pull request?
   
   In various Pandas aggregate functions, replace each comparison or arithmetic 
operation between `DoubleType` and `IntegerType` in `evaluateExpression` with 
a comparison or arithmetic operation between `DoubleType` and `DoubleType`.
   
   Affected functions are `PandasStddev`, `PandasVariance`, `PandasSkewness`, 
`PandasKurtosis`, and `PandasCovar`.
   
   ### Why are the changes needed?
   
   These functions fail in interpreted mode. For example, `evaluateExpression` 
in `PandasKurtosis` compares a double to an integer:
   ```
   If(n < 4, Literal.create(null, DoubleType) ...
   ```
   This results in a boxed double and a boxed integer being passed to 
`SQLOrderingUtil.compareDoubles`, which expects two doubles as arguments. The 
Scala runtime tries to unbox the boxed integer as a double, resulting in an 
error.
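
The failure mode can be illustrated outside Spark with a minimal Java sketch. The class `BoxedCompareSketch` and its methods are hypothetical stand-ins, not Spark code: `compareDoubles` here only mimics a comparator that takes two primitive doubles, and `interpretedCompare` mimics an evaluator that receives its operands as boxed objects.

```java
public class BoxedCompareSketch {
    // Hypothetical stand-in for a comparator that expects two primitive doubles.
    static int compareDoubles(double x, double y) {
        return Double.compare(x, y);
    }

    // Mimics an interpreted evaluator: operands arrive as boxed Objects and
    // are cast to the type the comparator expects before being unboxed.
    static int interpretedCompare(Object left, Object right) {
        return compareDoubles((Double) left, (Double) right);
    }

    // Returns true if comparing the two boxed operands raises ClassCastException.
    static boolean failsWithClassCast(Object left, Object right) {
        try {
            interpretedCompare(left, right);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object n = Double.valueOf(6.0);
        // A boxed Integer cannot be cast to Double, so this comparison fails.
        System.out.println(failsWithClassCast(n, Integer.valueOf(4)));
        // With a boxed Double on both sides, the comparison succeeds.
        System.out.println(failsWithClassCast(n, Double.valueOf(4.0)));
    }
}
```

Under these assumptions, the first call reports a failure and the second does not, which is the same distinction as comparing `n` against an integer literal versus a double literal.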
   
   Reproduction example:
   ```
   spark.sql("set spark.sql.codegen.wholeStage=false")
   spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
   
   import numpy as np
   import pandas as pd
   
   import pyspark.pandas as ps
   
   pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
   psser = ps.from_pandas(pser)
   
   psser.kurt()
   ```
   See Jira (SPARK-46189) for the other reproduction cases.
   
   This works fine in codegen mode because the integer is already unboxed and 
the Java runtime will implicitly cast it to a double.
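
As a rough illustration of why unboxed operands are not a problem, the following Java sketch passes a primitive `int` where a `double` parameter is expected. The class `PrimitiveWideningSketch` and its methods are hypothetical and are not taken from Spark's generated code; they only show the language-level widening conversion.

```java
public class PrimitiveWideningSketch {
    // Hypothetical stand-in for a comparator that expects two primitive doubles.
    static int compareDoubles(double x, double y) {
        return Double.compare(x, y);
    }

    // The int argument is widened to double at the call site,
    // so no cast is needed and no exception is thrown.
    static boolean lessThan(double n, int threshold) {
        return compareDoubles(n, threshold) < 0;
    }

    public static void main(String[] args) {
        System.out.println(lessThan(3.0, 4)); // 3.0 < 4.0
        System.out.println(lessThan(6.0, 4)); // 6.0 < 4.0
    }
}
```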
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


