dtenedor commented on code in PR #40732:
URL: https://github.com/apache/spark/pull/40732#discussion_r1165858639
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala:
##########
@@ -550,45 +551,24 @@ case class ResolveDefaultColumns(catalog: SessionCatalog) extends Rule[LogicalPl
    * Returns the schema for the target table of a DML command, looking into the catalog if needed.
    */
   private def getSchemaForTargetTable(table: LogicalPlan): Option[StructType] = {
-    // First find the source relation. Note that we use 'collectFirst' to descend past any
-    // SubqueryAlias nodes that may be present.
-    val source: Option[LogicalPlan] = table.collectFirst {
-      case r: NamedRelation if !r.skipSchemaResolution =>
-        // Here we only resolve the default columns in the tables that require schema resolution
-        // during write operations.
-        r
-      case r: UnresolvedCatalogRelation => r
-    }
-    // Check if the target table is already resolved. If so, return the computed schema.
-    source.foreach { r =>
-      if (r.schema.fields.nonEmpty) {
-        return Some(r.schema)
-      }
-    }
-    // Lookup the relation from the catalog by name. This either succeeds or returns some "not
-    // found" error. In the latter cases, return out of this rule without changing anything and let
-    // the analyzer return a proper error message elsewhere.
-    val tableName: TableIdentifier = source match {
-      case Some(r: UnresolvedRelation) => TableIdentifier(r.name)
-      case Some(r: UnresolvedCatalogRelation) => r.tableMeta.identifier
-      case _ => return None
-    }
-    // First try to get the table metadata directly. If that fails, check for views below.
-    if (catalog.tableExists(tableName)) {
-      return Some(catalog.getTableMetadata(tableName).schema)
-    }
-    val lookup: LogicalPlan = try {
-      catalog.lookupRelation(tableName)
-    } catch {
-      case _: AnalysisException => return None
-    }
-    lookup match {
-      case SubqueryAlias(_, r: UnresolvedCatalogRelation) =>
-        Some(r.tableMeta.schema)
-      case SubqueryAlias(_, r: View) if r.isTempView =>
-        Some(r.desc.schema)
-      case _ => None
+    table.foreach {
+      case r: UnresolvedCatalogRelation =>
+        return Some(r.tableMeta.schema)
+      case d: DataSourceV2Relation =>
+        return Some(CatalogV2Util.v2ColumnsToStructType(d.table.columns()))
+      case r: UnresolvedRelation if !r.skipSchemaResolution =>
+        val resolved = resolveRelations(r)
+        resolved.collectFirst {
+          case u: UnresolvedCatalogRelation =>
Review Comment:
> We already check for UnresolvedCatalogRelation above... what happens if
resolveRelations returns one of the other types of interest, like
DataSourceV2Relation?
I updated this to first run the `resolveRelations` function and then
pattern-match against the possible candidates (including `View`), per your suggestion.
> Also, shouldn't we somehow install the result of the resolveRelations
call, so that we don't repeat the work later?
The `ResolveRelations` rule internally uses a cache for this purpose [1], so
we are OK.
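To illustrate the point about not repeating work: a hypothetical sketch of this kind of memoization, where a relation resolved once by its identifier is reused on later analyzer iterations instead of redoing the catalog lookup. The names (`RelationCache`, `getOrResolve`, `slowLookup`) are illustrative stand-ins, not Spark's actual cache inside `ResolveRelations`.

```scala
import scala.collection.mutable

// Illustrative sketch only: memoize successful relation lookups by identifier
// so that repeated analyzer iterations do not redo the expensive catalog call.
class RelationCache[K, V] {
  private val cache = mutable.Map.empty[K, V]

  // `resolve` is a by-name parameter, so the expensive lookup only runs on a miss.
  def getOrResolve(key: K)(resolve: => Option[V]): Option[V] =
    cache.get(key) match {
      case hit @ Some(_) => hit
      case None =>
        val result = resolve
        result.foreach(cache.put(key, _)) // memoize only successful lookups
        result
    }
}

// Usage: the second request for the same identifier never hits the "catalog".
var catalogCalls = 0
val relations = new RelationCache[String, String]
def slowLookup(name: String): Option[String] = { catalogCalls += 1; Some(s"schema-of-$name") }
relations.getOrResolve("t1")(slowLookup("t1"))
relations.getOrResolve("t1")(slowLookup("t1"))
println(catalogCalls) // prints 1
```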
> Also... in Analyzer.scala I see that ResolveRelations runs before
ResolveDefaultColumns -- so why would the plan here ever contain an
UnresolvedRelation?
The analyzer rules run in batches to a fixed point: each rule in the batch
runs over the logical plan, and the whole batch repeats until none of the
rules changes the plan anymore (the fixed point). So each rule must be
prepared to encounter both unresolved and resolved relations, since it may be
executing in the first iteration of the batch or in a later one. For this
reason, Catalyst rules should generally not make assumptions about where they
appear in the batch sequence.
[1]
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L1262
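The fixed-point behavior described above can be sketched in miniature. This is not Spark's actual `RuleExecutor`; `Plan` and `Rule` are toy stand-ins for Catalyst's `LogicalPlan` and `Rule[LogicalPlan]`, chosen only to show why rule order within a batch does not matter once the batch iterates to a fixed point.

```scala
// Toy stand-ins for Catalyst's LogicalPlan and Rule[LogicalPlan].
type Plan = List[String]
type Rule = Plan => Plan

// Apply every rule in order, then repeat the whole batch until a full pass
// changes nothing (the fixed point), with a safety cap on iterations.
def executeBatch(rules: Seq[Rule], initial: Plan, maxIterations: Int = 100): Plan = {
  var current = initial
  var changed = true
  var i = 0
  while (changed && i < maxIterations) {
    val next = rules.foldLeft(current)((plan, rule) => rule(plan))
    changed = next != current // fixed point reached when nothing changed
    current = next
    i += 1
  }
  current
}

// "resolveDefaults" only fires on already-resolved relations, yet it can be
// listed *before* "resolveRelations": the second batch iteration picks it up.
val resolveRelations: Rule = _.map(n => if (n == "UnresolvedRelation") "ResolvedRelation" else n)
val resolveDefaults: Rule  = _.map(n => if (n == "ResolvedRelation") "RelationWithDefaults" else n)

println(executeBatch(Seq(resolveDefaults, resolveRelations), List("UnresolvedRelation")))
// prints List(RelationWithDefaults)
```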
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]