Github user mn-mikke commented on a diff in the pull request:
https://github.com/apache/spark/pull/21687#discussion_r199425774
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -129,7 +129,7 @@ case class CaseWhen(
case Seq(dt1, dt2) => dt1.sameType(dt2)
}
- override def dataType: DataType = branches.head._2.dataType
+ override def dataType: DataType = valueTypes.reduce(TypeCoercion.findTightestCommonType(_, _).get)
--- End diff --
Thanks for your suggestion, but is it the best solution?
1. It seems that this solution would also fail on the example in the
description. The root data types in both branches are non-nullable, so
```branches.head._2.dataType``` gets called again.
```
|-- named_struct(val1, x AS `val1`, val2, 10 AS `val2`): struct (nullable = false)
| |-- val1: string (nullable = false)
| |-- val2: integer (nullable = false)
```
and
```
|-- s: struct (nullable = false)
| |-- val1: string (nullable = true)
| |-- val2: integer (nullable = false)
```
2. Let's assume that there is no if statement and we just call
```.asNullable```. This will make all the struct fields nullable and change
```containsNull``` to ```true``` for all ```MapTypes``` and ```ArrayTypes```
within the data type structure. Is that what we want?
Personally, I think we need something that goes recursively through
non-primitive types and merges the ```nullable``` and ```containsNull``` flags.
That's exactly what ```TypeCoercion.findTightestCommonType``` does when it is
passed data types that are equal according to ```sameType```.
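To illustrate the difference between the two approaches, here is a minimal, self-contained sketch. It does not use Spark's real classes: ```DType```, ```Field```, ```StructT```, ```ArrayT```, ```MapT```, ```NullabilityMerge``` and ```Example``` are hypothetical stand-ins, and the sketch only mirrors the behaviour described above, not the actual implementation of ```TypeCoercion.findTightestCommonType```.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait DType
case object StringT extends DType
case object IntT extends DType
final case class Field(name: String, dataType: DType, nullable: Boolean)
final case class StructT(fields: Seq[Field]) extends DType
final case class ArrayT(elementType: DType, containsNull: Boolean) extends DType
final case class MapT(keyType: DType, valueType: DType, valueContainsNull: Boolean) extends DType

object NullabilityMerge {
  // Recursively merges two types of the same shape, OR-ing the nullability flags.
  // Returns None when the shapes differ.
  def merge(a: DType, b: DType): Option[DType] = (a, b) match {
    case (StructT(fs1), StructT(fs2)) if fs1.length == fs2.length =>
      val merged = fs1.zip(fs2).map {
        case (f1, f2) if f1.name == f2.name =>
          merge(f1.dataType, f2.dataType)
            .map(t => Field(f1.name, t, f1.nullable || f2.nullable))
        case _ => None
      }
      if (merged.forall(_.isDefined)) Some(StructT(merged.flatten)) else None
    case (ArrayT(e1, n1), ArrayT(e2, n2)) =>
      merge(e1, e2).map(e => ArrayT(e, n1 || n2))
    case (MapT(k1, v1, n1), MapT(k2, v2, n2)) =>
      for { k <- merge(k1, k2); v <- merge(v1, v2) } yield MapT(k, v, n1 || n2)
    case _ if a == b => Some(a)
    case _ => None
  }

  // Blanket alternative: marks every nested flag as nullable, regardless of the inputs.
  def asNullable(t: DType): DType = t match {
    case StructT(fs) =>
      StructT(fs.map(f => Field(f.name, asNullable(f.dataType), nullable = true)))
    case ArrayT(e, _) => ArrayT(asNullable(e), containsNull = true)
    case MapT(k, v, _) => MapT(asNullable(k), asNullable(v), valueContainsNull = true)
    case other => other
  }
}

object Example {
  // The two branch types from the schemas quoted above.
  val first: DType = StructT(Seq(
    Field("val1", StringT, nullable = false),
    Field("val2", IntT, nullable = false)))
  val second: DType = StructT(Seq(
    Field("val1", StringT, nullable = true),
    Field("val2", IntT, nullable = false)))
}
```

With these definitions, ```NullabilityMerge.merge(Example.first, Example.second)``` keeps ```val2``` non-nullable and only relaxes ```val1```, whereas ```NullabilityMerge.asNullable(Example.first)``` marks both fields as nullable.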