[GitHub] spark pull request #17226: [SPARK-19882][SQL] Pivot with null as a distinct ...

aray Thu, 09 Mar 2017 19:04:33 -0800

Github user aray commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17226#discussion_r105322758
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
    @@ -216,4 +216,10 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext{
           Row("d", 15000.0, 48000.0) :: Row("J", 20000.0, 30000.0) :: Nil
         )
       }
    +
    +  test("pivot with null should not throw NPE") {
    +    checkAnswer(
    +      Seq(Tuple1(None), Tuple1(Some(1))).toDF("a").groupBy($"a").pivot("a").count(),
    +      Row(null, 1, null) :: Row(1, null, 1) :: Nil)
    --- End diff --
    
    Right, the non-optimized code path should have been doing a null-safe
    equals in the if statement. I have fixed that in a81c062 and added a unit
    test.
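
    To illustrate the difference, here is a minimal plain-Scala sketch, not
    the Spark code itself. It models a nullable SQL integer as Option[Int]
    (None standing for NULL); the names sqlEq, nullSafeEq and pivotCell are
    made up for this example. In Spark itself the null-safe comparison is
    available as the `<=>` operator on Column.

```scala
object NullSafeEquals {
  // Plain SQL equality: any comparison involving NULL yields NULL (unknown).
  def sqlEq(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // Null-safe equality: NULL equals NULL, and the result is never NULL.
  def nullSafeEq(a: Option[Int], b: Option[Int]): Boolean = a == b

  // Models IF(cond, value, NULL): an unknown condition falls through to NULL.
  def pivotCell(cond: Option[Boolean], value: Int): Option[Int] =
    if (cond.contains(true)) Some(value) else None
}
```

    With plain equality a row whose pivot column is null never matches the
    null pivot value, so its cell is dropped; with the null-safe variant it
    matches.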
    
    As to whether an aggregate function of count(1) in a pivot should fill in
    0s for null, I think that is an orthogonal issue. First, note that it will
    always* follow the optimized code path, as the choice is based on the
    return type of the aggregate. Second, it's not clear that that is the
    expected result: for instance, pandas leaves those values as null and
    Oracle 11g gives 0 (I still need to check R/reshape2 and MS SQL Server).
    I think it would be best to open another JIRA ticket to discuss this
    further.

    * Unless there are multiple aggregates and one of them is not supported,
    which is a consistency problem.
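
    For the follow-up discussion, the two conventions can be sketched in
    plain Scala. This is only a model of the output shapes, not how Spark,
    pandas or Oracle compute them; PivotCountSketch, pivotCount and fillZeros
    are hypothetical names, and None again stands for NULL.

```scala
object PivotCountSketch {
  // None stands for a SQL NULL in the grouping/pivot column.
  type Key = Option[Int]

  // Count rows per (group, pivot value); a cell with no matching rows stays None (null).
  def pivotCount(rows: Seq[Key], pivotValues: Seq[Key]): Map[Key, Seq[Option[Long]]] =
    rows.groupBy(identity).map { case (group, members) =>
      group -> pivotValues.map { pv =>
        val n = members.count(_ == pv).toLong
        if (n == 0L) None else Some(n)
      }
    }

  // Alternative convention: report 0 instead of null for empty cells.
  def fillZeros(table: Map[Key, Seq[Option[Long]]]): Map[Key, Seq[Long]] =
    table.map { case (group, cells) => group -> cells.map(_.getOrElse(0L)) }
}
```

    For the input used in the new test, Seq(None, Some(1)) with pivot values
    (null, 1), pivotCount gives (1, null) for the null group and (null, 1)
    for the 1 group, matching the expected rows in the diff; fillZeros would
    turn those nulls into 0.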


