Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r207043203
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -2217,6 +2218,100 @@ class Analyzer(
}
}
+ /**
+ * Resolves columns of an output table from the data in a logical plan.
This rule will:
+ *
+ * - Reorder columns when the write is by name
+ * - Insert safe casts when data types do not match
+ * - Insert aliases when column names do not match
+ * - Detect plans that are not compatible with the output table and
throw AnalysisException
+ */
+ object ResolveOutputRelation extends Rule[LogicalPlan] {
+ override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case append @ AppendData(table, query, isByName)
+ if table.resolved && query.resolved && !append.resolved =>
+ val projection = resolveOutputColumns(table.name, table.output,
query, isByName)
+
+ if (projection != query) {
+ append.copy(query = projection)
+ } else {
+ append
+ }
+ }
+
+ def resolveOutputColumns(
+ tableName: String,
+ expected: Seq[Attribute],
+ query: LogicalPlan,
+ byName: Boolean): LogicalPlan = {
+
+ if (expected.size < query.output.size) {
+ throw new AnalysisException(
+ s"""Cannot write to '$tableName', too many data columns:
+ |Table columns: ${expected.map(_.name).mkString(", ")}
+ |Data columns: ${query.output.map(_.name).mkString(",
")}""".stripMargin)
+ }
+
+ val errors = new mutable.ArrayBuffer[String]()
+ val resolved: Seq[NamedExpression] = if (byName) {
+ expected.flatMap { outAttr =>
+ query.resolveQuoted(outAttr.name, resolver) match {
+ case Some(inAttr) if inAttr.nullable && !outAttr.nullable =>
--- End diff --
I fixed this by always failing if `canWrite` returns false and always
adding the `UpCast`.
Now, `canWrite` will return true if the write type can be cast to the read
type for atomic types, as determined by `Cast.canSafeCast`. Since it only
returns a boolean, we always insert the cast and the optimizer should remove it
if it isn't needed.
I also added better error messages. When an error is found, the check will
add a clear error message by calling `addError: String => Unit`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]