Github user mn-mikke commented on a diff in the pull request:
https://github.com/apache/spark/pull/21704#discussion_r200134825
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
}
}
- override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
+ override def dataType: DataType = {
+ val dataTypes = children.map(_.dataType)
+ dataTypes.headOption.map {
+ case ArrayType(et, _) =>
+ ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
--- End diff --
@ueshin For ```Concat```, ```Coalesce```, etc. that seems to be the case,
since a coercion rule is executed whenever there is a nullability difference
at any level of nesting. It is not the case for the ```CaseWhenCoercion```
rule, though, since it uses the ```sameType``` method for the comparison.
If the goal is to avoid generating extra ```Cast``` expressions, shouldn't
the other coercion rules use the ```sameType``` method as well? Also, assuming
the result of ```concat``` is subsequently consumed by ```flatten```, wouldn't
that lead to the generation of extra null-safe checks, as mentioned
[here](https://github.com/apache/spark/pull/21704#discussion_r200110924)?
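
To make the distinction concrete, here is a minimal, self-contained sketch of
the two comparison styles. It does not use Spark's classes: ```DataType```,
```ArrayType``` and ```IntegerType``` below are toy stand-ins, and
```strictlyEqual``` and ```sameTypeIgnoringNullability``` are hypothetical
helper names that only approximate what I understand strict equality and
```sameType``` to do.

```scala
// Toy stand-ins for the catalyst types; NOT the real Spark classes.
sealed trait DataType
case object IntegerType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

object NullabilitySketch {
  // Strict comparison: any containsNull difference, at any nesting level,
  // makes the two types unequal (so a coercion rule would kick in).
  def strictlyEqual(a: DataType, b: DataType): Boolean = a == b

  // Hypothetical analogue of a sameType-style check: compares the structure
  // recursively while ignoring containsNull on every level.
  def sameTypeIgnoringNullability(a: DataType, b: DataType): Boolean = (a, b) match {
    case (ArrayType(ea, _), ArrayType(eb, _)) => sameTypeIgnoringNullability(ea, eb)
    case _ => a == b
  }

  // Two nested array types that differ only in the inner containsNull flag.
  val innerNotNull: DataType = ArrayType(ArrayType(IntegerType, containsNull = false), containsNull = false)
  val innerNullable: DataType = ArrayType(ArrayType(IntegerType, containsNull = true), containsNull = false)

  def main(args: Array[String]): Unit = {
    // The strict check reports a mismatch, the nullability-insensitive one does not.
    println(strictlyEqual(innerNotNull, innerNullable))
    println(sameTypeIgnoringNullability(innerNotNull, innerNullable))
  }
}
```

Under the strict check the two children count as different types, so a
```Cast``` would be inserted; under the nullability-insensitive check they
count as the same type and no ```Cast``` is needed, but the expression then
has to work out the resulting ```containsNull``` itself.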