Github user aray commented on the issue:
https://github.com/apache/spark/pull/17226
There are three things going on here in your one example.
1. Spark 1.6 [first version with pivot] (and Spark 2.0+ with an aggregate
output type unsupported by PivotFirst) gives incorrect answers when one of
the pivot column values is null (this only affects the 'null' column). This is
fixed by using a null-safe equals in the injected if statement:
https://github.com/apache/spark/pull/17226/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R525
2. Spark 2.0+ with PivotFirst throws an NPE when one of the pivot column
values is null. This is the main thing fixed in this PR.
3. There is an inconsistency between Spark 1.6 and 2.0+ in the result of a
pivot with a `count(1)` aggregate when no values are aggregated for a cell.
This is separate from the issues above, and it's not clear which behavior is
correct (pandas leaves those values as null, Oracle 11g gives 0, and I need
to test others). See the sketch below.
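
For anyone wanting to try these locally, here is a minimal sketch (made-up data against a Spark 2.x session, not the original reporter's example) of the kind of pivot that hits all three cases:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, lit}

object PivotNullRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-null-repro")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: the pivot column ("course") contains a null value.
    val df = Seq(
      (2012, Option("dotNET"), 10000),
      (2012, Option("Java"), 20000),
      (2013, Option.empty[String], 5000)
    ).toDF("year", "course", "earnings")

    // Pivoting on a column that contains null exercises cases 1 and 2;
    // using count(1) as the aggregate exercises case 3 (what value an
    // empty cell gets when nothing is aggregated for it).
    df.groupBy("year")
      .pivot("course")
      .agg(count(lit(1)))
      .show()

    spark.stop()
  }
}
```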