Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21704#discussion_r199999346

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
         }
       }

    -  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
    +  override def dataType: DataType = {
    +    val dataTypes = children.map(_.dataType)
    +    dataTypes.headOption.map {
    +      case ArrayType(et, _) =>
    +        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
    +      case dt => dt
    +    }.getOrElse(StringType)
    +  }
    --- End diff --

    Actually, `Concat` for array type has a type coercion rule that adds casts to make all children the same type, but we also have the `SimplifyCasts` optimization to remove unnecessary casts, which might remove casts from arrays that don't contain null to arrays that do contain null ([optimizer/expressions.scala#L611](https://github.com/apache/spark/blob/d87a8c6c0d1a4db5c9444781160a65562f8ea738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L611)). E.g., `concat(array(1,2,3), array(4,null,6))` might produce a wrong data type during execution.
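To illustrate the nullability logic in the patched `dataType` above, here is a minimal self-contained sketch. The `DataType`/`ArrayType`/`IntegerType`/`StringType` definitions below are simplified stand-ins, not Spark's actual Catalyst classes; only the `headOption.map { ... }.getOrElse(...)` shape mirrors the diff.

```scala
// Simplified stand-ins for Catalyst's type hierarchy (not Spark's real classes).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

// Mirrors the patched Concat.dataType: when the children are arrays, the
// result's containsNull is true if ANY child's elements may be null,
// instead of blindly taking the first child's type.
def concatDataType(childTypes: Seq[DataType]): DataType =
  childTypes.headOption.map {
    case ArrayType(et, _) =>
      ArrayType(et, childTypes.exists(_.asInstanceOf[ArrayType].containsNull))
    case dt => dt
  }.getOrElse(StringType)

// concat(array(1,2,3), array(4,null,6)): the second child contains null,
// so the result type must be ArrayType(IntegerType, containsNull = true),
// even though the first child's containsNull is false.
val result = concatDataType(Seq(
  ArrayType(IntegerType, containsNull = false),
  ArrayType(IntegerType, containsNull = true)))
```

This shows why the old `headOption`-only version was wrong once `SimplifyCasts` strips the widening cast: the first child alone would report `containsNull = false`.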