GitHub user aray commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17226#discussion_r105322758

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
    @@ -216,4 +216,10 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext{
             Row("d", 15000.0, 48000.0) :: Row("J", 20000.0, 30000.0) :: Nil
         )
       }
    +
    +  test("pivot with null should not throw NPE") {
    +    checkAnswer(
    +      Seq(Tuple1(None), Tuple1(Some(1))).toDF("a").groupBy($"a").pivot("a").count(),
    +      Row(null, 1, null) :: Row(1, null, 1) :: Nil)
    --- End diff --

    Right, the non-optimized code path should have been doing a null-safe equals in the if statement. I have fixed that in a81c062 and added a unit test.

    As to whether an aggregate function of count(1) in a pivot should fill in 0s for null, I think that is an orthogonal issue. First, note that it will always* follow the optimized code path, as the choice is based on the return type of the aggregate. Second, it's not clear that filling in 0s is the expected result: for instance, pandas leaves those values as null and Oracle 11g gives 0 (I still need to check R/reshape2 and MS SQL Server). I think it would be best to open another JIRA ticket to discuss this further.

    * Unless there are multiple aggregates and one of them is not supported, which is a consistency problem.
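
    For readers unfamiliar with the distinction, the difference between a plain equals and a null-safe equals in a pivot-style `if` guard can be sketched in plain Scala. This is only a simplified model of SQL `=` versus `<=>` semantics using `Option`, not Spark's pivot implementation and not the change in a81c062; the names `NullSafeEqualsSketch`, `sqlEquals`, `nullSafeEquals`, and `pivotCell` are hypothetical. In this model a plain-equals guard simply never matches a null key, whereas the actual failure discussed in the PR was an NPE.

```scala
// Simplified model of SQL equality semantics; NOT Spark's actual code.
// All names here are hypothetical and exist only for illustration.
object NullSafeEqualsSketch {
  // SQL `=`: the result is unknown (None) when either operand is null (None).
  def sqlEquals(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // SQL `<=>`: always true or false; two nulls compare as equal.
  def nullSafeEquals(a: Option[Int], b: Option[Int]): Boolean = a == b

  // Pivot-style guard: keep the value only when the row's key matches
  // the pivot column's value; otherwise the cell is null (None).
  def pivotCell(
      key: Option[Int],
      pivotValue: Option[Int],
      value: Long,
      nullSafe: Boolean): Option[Long] = {
    val matches =
      if (nullSafe) nullSafeEquals(key, pivotValue)
      else sqlEquals(key, pivotValue).getOrElse(false) // unknown counts as "no match"
    if (matches) Some(value) else None
  }

  def main(args: Array[String]): Unit = {
    // Null key against the null pivot column, plain equals: the cell is lost.
    println(pivotCell(None, None, 1L, nullSafe = false)) // prints "None"
    // Same inputs with null-safe equals: the cell is kept.
    println(pivotCell(None, None, 1L, nullSafe = true)) // prints "Some(1)"
  }
}
```

    With the null-safe guard, the row whose key is null contributes to the null pivot column, which matches the `Row(null, 1, null)` expectation in the test above.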