Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21704#discussion_r199999346
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
     }
   }

-  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
+  override def dataType: DataType = {
+    val dataTypes = children.map(_.dataType)
+    dataTypes.headOption.map {
+      case ArrayType(et, _) =>
+        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+      case dt => dt
+    }.getOrElse(StringType)
+  }
--- End diff ---
Actually, `Concat` for array types has type coercion that adds casts to make
all children the same type, but we also have the `SimplifyCasts` optimization,
which removes unnecessary casts and so might strip a cast from an array whose
elements cannot contain null (`containsNull = false`) to an array whose
elements can (`containsNull = true`)
([optimizer/expressions.scala#L611](https://github.com/apache/spark/blob/d87a8c6c0d1a4db5c9444781160a65562f8ea738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L611)).
E.g., `concat(array(1,2,3), array(4,null,6))` might report a wrong data type
during execution.
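To illustrate the idea behind the patched `dataType`, here is a minimal
standalone sketch (not Spark's actual classes; `DType`, `ArrayT`, and
`concatDataType` are hypothetical names): the result array must report
`containsNull = true` whenever *any* child array may contain null, which is
exactly the `dataTypes.exists(...)` merge in the diff above.

```scala
// Hypothetical stand-ins for Catalyst's DataType / ArrayType, for illustration only.
sealed trait DType
case class ArrayT(elementType: String, containsNull: Boolean) extends DType
case class SimpleT(name: String) extends DType

// Mirrors the patched dataType: take the head child's type, but for arrays
// merge containsNull across all children so no child's nullability is lost.
def concatDataType(children: Seq[DType]): DType =
  children.headOption.map {
    case ArrayT(et, _) =>
      ArrayT(et, children.exists(_.asInstanceOf[ArrayT].containsNull))
    case dt => dt
  }.getOrElse(SimpleT("string"))

// concat(array(1,2,3), array(4,null,6)): only the second child has
// containsNull = true, yet the result must still report containsNull = true.
val result = concatDataType(Seq(ArrayT("int", false), ArrayT("int", true)))
assert(result == ArrayT("int", true))
```

With the old `headOption`-only logic, the example above would have returned
`ArrayT("int", false)`, i.e. a data type claiming the result cannot contain
null even though it does.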
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]