GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17226#discussion_r105290321
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
@@ -216,4 +216,10 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext{
Row("d", 15000.0, 48000.0) :: Row("J", 20000.0, 30000.0) :: Nil
)
}
+
+ test("pivot with null should not throw NPE") {
+ checkAnswer(
+ Seq(Tuple1(None), Tuple1(Some(1))).toDF("a").groupBy($"a").pivot("a").count(),
+ Row(null, 1, null) :: Row(1, null, 1) :: Nil)
--- End diff --
Hi @aray, thanks for taking a look at this. I have two questions.
I tried exactly the same change first :). I gave up because it produced different results depending on whether the pivot is optimized or not.
For example, it now produces the following results:
- Optimized (in this PR)
```
+----+----+----+
|   a|null|   1|
+----+----+----+
|null|   1|null|
|   1|null|   1|
+----+----+----+
```
- Not optimized
```
+----+----+---+
|   a|null|  1|
+----+----+---+
|null|   0|  0|
|   1|   0|  1|
+----+----+---+
```
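As a rough illustration of where the two tables diverge, here is a Spark-free Scala model written only to reproduce them. It is a sketch under assumptions, not Spark's implementation: `PivotModel`, `optimized` and `nonOptimized` are hypothetical names, `None` stands for SQL NULL, the optimized side is modeled as "count once per (group, pivot value), then scatter into columns", and the non-optimized side is modeled as one conditional count per pivot value using plain SQL equality, which is consistent with the `0` in the null/null cell.

```scala
// Hypothetical, Spark-free model of the two pivot strategies; it only mimics
// the observed results and is not Spark's actual code.
object PivotModel {
  type Value = Option[Int]  // None plays the role of SQL NULL
  type Cell = Option[Long]  // None means the output cell is NULL

  // SQL-style equality: any comparison with NULL is "unknown", i.e. not true.
  def sqlEquals(x: Value, y: Value): Boolean = (x, y) match {
    case (Some(l), Some(r)) => l == r
    case _                  => false
  }

  // "Optimized" model: count once per (group value, pivot value) pair, then
  // scatter the counts into columns. A pair with no rows never gets a count,
  // so its cell stays NULL. Grouping on Option keys treats None as equal to
  // None, so a NULL pivot value lands in the NULL group's "null" column.
  def optimized(rows: Seq[(Value, Value)], groups: Seq[Value], pivots: Seq[Value]): Seq[Seq[Cell]] = {
    val counts: Map[(Value, Value), Long] =
      rows.groupBy(identity).map { case (key, vs) => key -> vs.size.toLong }
    groups.map(g => pivots.map(p => counts.get((g, p))))
  }

  // "Non-optimized" model: one conditional COUNT per pivot value, roughly
  // count(if (pivot = p) 1 else null). COUNT skips NULLs and returns 0 when
  // nothing matches, and pivot = NULL never matches under SQL equality.
  def nonOptimized(rows: Seq[(Value, Value)], groups: Seq[Value], pivots: Seq[Value]): Seq[Seq[Cell]] =
    groups.map { g =>
      val inGroup = rows.filter { case (gv, _) => gv == g } // GROUP BY puts all NULLs in one group
      pivots.map(p => Some(inGroup.count { case (_, pv) => sqlEquals(pv, p) }.toLong))
    }
}
```

With `rows = Seq((None, None), (Some(1), Some(1)))` (grouping and pivoting on the same column `a`) and `Seq(None, Some(1))` as both the group and pivot values, `optimized` yields `Seq(Seq(Some(1), None), Seq(None, Some(1)))` and `nonOptimized` yields `Seq(Seq(Some(0), Some(0)), Seq(Some(0), Some(1)))`, matching the two tables row by row.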
Wouldn't we get different results for `count` when the type is not supported by this optimization?
So I tried doing something with the transformed plans, but I started to worry whether it was worth it. This is why I said in my PR, "I could not find a clean and short way".
Do you know an easy way to fill in 0 for `null` in this case? If you know a way to produce the same results, I think my change could be ported here for the non-optimized path.
Don't we need to look into the plan to find out whether the aggregate is a `Count`?
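A toy sketch of why the plan would need inspecting: whether an empty pivot cell should become 0 or stay NULL depends on which aggregate produced it. The classes `Agg`, `Count`, `Sum`, `Alias` and the `MissingCellDefault` helper are hypothetical stand-ins, not Catalyst's expression classes or any existing Spark API.

```scala
// Hypothetical, heavily simplified aggregate tree; not Catalyst's classes.
sealed trait Agg
final case class Count(column: String) extends Agg
final case class Sum(column: String) extends Agg
final case class Alias(child: Agg, name: String) extends Agg

object MissingCellDefault {
  // Walk down the expression to decide what a "no rows" cell should be:
  // COUNT over zero rows is 0, while other aggregates here (e.g. SUM) are NULL.
  def forMissingCell(agg: Agg): Option[Long] = agg match {
    case Count(_)        => Some(0L)
    case Alias(child, _) => forMissingCell(child)
    case _               => None
  }

  // Fill only the cells that were left NULL; existing values are untouched.
  def fillMissing(cells: Seq[Option[Long]], agg: Agg): Seq[Option[Long]] =
    cells.map(_.orElse(forMissingCell(agg)))
}
```

Even with such a lookup, only the missing cells change: the optimized row `(null, 1, null)` would become `(null, 1, 0)`, not the not-optimized `(null, 0, 0)`, because the two tables also disagree on whether a `null` pivot value matches a `null` row.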