chitralverma edited a comment on issue #25122: 
[SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25122#issuecomment-510682452
 
 
   @HyukjinKwon I've raised this PR as a WIP until I incorporate your comments. 
I had some doubts regarding the tests in pivot.sql and was hoping you could 
clear them up for me.
   
   While porting 'pivot.sql', I ran the command below against the original SQL 
file, and it fails when run with the configs 
`spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY`:
   
   `build/sbt "sql/test-only *SQLQueryTestSuite -- -z pivot.sql"` 
   
   On inspection, there seems to be a discrepancy in how `null` values are 
handled when they pass through the UDF. The Scala UDF produces `null` and the 
Python UDF produces `None`, but the golden file contains `nan`, so the 
comparison fails.
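
   To illustrate why the three tokens can never match as text, here is a 
minimal plain-Python sketch. It does not use Spark; `stringify` and 
`stringify_null_safe` are hypothetical stand-ins for a string-casting UDF, and 
the idea that the golden file's `nan` came from a float NaN being cast to a 
string is only an assumption on my part.

```python
# Plain-Python sketch (no Spark involved). The helper names are hypothetical
# and only stand in for a UDF that casts its input to a string.

def stringify(value):
    """Cast anything to text, the way a naive string-casting UDF might."""
    return str(value)


def stringify_null_safe(value):
    """Pass missing values through untouched instead of turning them into text."""
    return None if value is None else str(value)


missing_as_none = None         # a missing value represented as Python None
missing_as_nan = float("nan")  # a missing value represented as a float NaN

print(stringify(missing_as_none))            # None
print(stringify(missing_as_nan))             # nan
print(stringify_null_safe(missing_as_none))  # None (the object, not the text)
```

   Since the test compares output as strings, `nan`, `None` and `null` are 
all distinct results even though each one denotes a missing value.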
   
   The same error persists in the port. As per the guide, I looked for a 
related JIRA ticket but couldn't find one, so I thought I'd run this by you 
first before creating one.
   
   Stacktrace:
   
   ```
   5:21:42.536 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY
   [info] - udf/udf-pivot.sql - Regular Python UDF *** FAILED *** (24 seconds, 575 milliseconds)
   [info]   Expected "Java      2012    20000   [nan
   [info]   Java        2013    nan     30000
   [info]   dotNET      2012    15000   nan
   [info]   dotNET      2013    nan]    48000", but got "Java   2012    20000   [None
   [info]   Java        2013    None    30000
   [info]   dotNET      2012    15000   None
   [info]   dotNET      2013    None]   48000" Result did not match for query #8
   [info]   SELECT * FROM (
   [info]     SELECT course, year, earnings, udf(s) as s
   [info]     FROM courseSales
   [info]     JOIN years ON year = y
   [info]   )
   [info]   PIVOT (
   [info]     udf(sum(earnings))
   [info]     FOR s IN (1, 2)
   [info]   ) (SQLQueryTestSuite.scala:333)
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
   [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
   [info]   at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
   [info]   at org.scalatest.Assertions.assertResult(Assertions.scala:1003)
   
   ```
   ```
   5:21:17.912 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY
   [info] - udf/udf-pivot.sql - Scala UDF *** FAILED *** (25 seconds, 411 milliseconds)
   [info]   Expected "Java      2012    20000   n[an
   [info]   Java        2013    nan     30000
   [info]   dotNET      2012    15000   nan
   [info]   dotNET      2013    nan]    48000", but got "Java   2012    20000   n[ull
   [info]   Java        2013    null    30000
   [info]   dotNET      2012    15000   null
   [info]   dotNET      2013    null]   48000" Result did not match for query #8
   [info]   SELECT * FROM (
   [info]     SELECT course, year, earnings, udf(s) as s
   [info]     FROM courseSales
   [info]     JOIN years ON year = y
   [info]   )
   [info]   PIVOT (
   [info]     udf(sum(earnings))
   [info]     FOR s IN (1, 2)
   [info]   ) (SQLQueryTestSuite.scala:333)
   ```
   
   Any help would be appreciated. Thanks!
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
