Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
There are three things going on here in your one example.
1. Spark 1.6 [first version with pivot] (and Spark 2.0+ with an aggregate
output type unsupported by PivotFirst) gives incorrect answers when one of
the pivot column values is null (this only affects the 'null' column). This is
fixed by using a null-safe equals in the injected if statement:
https://github.com/apache/spark/pull/17226/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R525
2. Spark 2.0+ with PivotFirst throws an NPE when one of the pivot column
values is null. This is the main thing fixed in this PR.
3. There is an inconsistency between Spark 1.6 and 2.0+ in the result of a
pivot with a `count(1)` aggregate when no values are aggregated for a cell.
This is separate from the issues above, and it's not clear which behavior is
correct (pandas leaves those values as null, Oracle 11g gives 0, and I need
to test others). See the sketch below.
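
For anyone wanting to try these locally, here is a minimal sketch (made-up data against a Spark 2.x session, not the original reporter's example) of the kind of pivot that hits all three cases:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, lit}

object PivotNullRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-null-repro")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: the pivot column ("course") contains a null value.
    val df = Seq(
      (2012, Option("dotNET"), 10000),
      (2012, Option("Java"), 20000),
      (2013, Option.empty[String], 5000)
    ).toDF("year", "course", "earnings")

    // Pivoting on a column that contains null exercises cases 1 and 2;
    // using count(1) as the aggregate exercises case 3 (what value an
    // empty cell gets when nothing is aggregated for it).
    df.groupBy("year")
      .pivot("course")
      .agg(count(lit(1)))
      .show()

    spark.stop()
  }
}
```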