dtenedor commented on code in PR #40732:
URL: https://github.com/apache/spark/pull/40732#discussion_r1165858639
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala:
##########
@@ -550,45 +551,24 @@ case class ResolveDefaultColumns(catalog: SessionCatalog) extends Rule[LogicalPl
    * Returns the schema for the target table of a DML command, looking into the catalog if needed.
    */
   private def getSchemaForTargetTable(table: LogicalPlan): Option[StructType] = {
-    // First find the source relation. Note that we use 'collectFirst' to descend past any
-    // SubqueryAlias nodes that may be present.
-    val source: Option[LogicalPlan] = table.collectFirst {
-      case r: NamedRelation if !r.skipSchemaResolution =>
-        // Here we only resolve the default columns in the tables that require schema resolution
-        // during write operations.
-        r
-      case r: UnresolvedCatalogRelation => r
-    }
-    // Check if the target table is already resolved. If so, return the computed schema.
-    source.foreach { r =>
-      if (r.schema.fields.nonEmpty) {
-        return Some(r.schema)
-      }
-    }
-    // Lookup the relation from the catalog by name. This either succeeds or returns some "not
-    // found" error. In the latter cases, return out of this rule without changing anything and let
-    // the analyzer return a proper error message elsewhere.
-    val tableName: TableIdentifier = source match {
-      case Some(r: UnresolvedRelation) => TableIdentifier(r.name)
-      case Some(r: UnresolvedCatalogRelation) => r.tableMeta.identifier
-      case _ => return None
-    }
-    // First try to get the table metadata directly. If that fails, check for views below.
-    if (catalog.tableExists(tableName)) {
-      return Some(catalog.getTableMetadata(tableName).schema)
-    }
-    val lookup: LogicalPlan = try {
-      catalog.lookupRelation(tableName)
-    } catch {
-      case _: AnalysisException => return None
-    }
-    lookup match {
-      case SubqueryAlias(_, r: UnresolvedCatalogRelation) =>
-        Some(r.tableMeta.schema)
-      case SubqueryAlias(_, r: View) if r.isTempView =>
-        Some(r.desc.schema)
-      case _ => None
+    table.foreach {
+      case r: UnresolvedCatalogRelation =>
+        return Some(r.tableMeta.schema)
+      case d: DataSourceV2Relation =>
+        return Some(CatalogV2Util.v2ColumnsToStructType(d.table.columns()))
+      case r: UnresolvedRelation if !r.skipSchemaResolution =>
+        val resolved = resolveRelations(r)
+        resolved.collectFirst {
+          case u: UnresolvedCatalogRelation =>
Review Comment:
> We already check for UnresolvedCatalogRelation above... what happens if
resolveRelations returns one of the other types of interest, like
DataSourceV2Relation?
I updated this to first run the `resolveRelations` function and then
pattern-match against the possible candidates (including `View`), per your suggestion.
> Also, shouldn't we somehow install the result of the resolveRelations
call, so that we don't repeat the work later?
The `ResolveRelations` rule internally uses a cache for this purpose [1], so
we are OK.
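To illustrate the point about not repeating work: a hypothetical sketch of this kind of memoization, where a relation resolved once by its identifier is reused on later analyzer iterations instead of redoing the catalog lookup. The names (`RelationCache`, `getOrResolve`, `slowLookup`) are illustrative stand-ins, not Spark's actual cache inside `ResolveRelations`.

```scala
import scala.collection.mutable

// Illustrative sketch only: memoize successful relation lookups by identifier
// so that repeated analyzer iterations do not redo the expensive catalog call.
class RelationCache[K, V] {
  private val cache = mutable.Map.empty[K, V]

  // `resolve` is a by-name parameter, so the expensive lookup only runs on a miss.
  def getOrResolve(key: K)(resolve: => Option[V]): Option[V] =
    cache.get(key) match {
      case hit @ Some(_) => hit
      case None =>
        val result = resolve
        result.foreach(cache.put(key, _)) // memoize only successful lookups
        result
    }
}

// Usage: the second request for the same identifier never hits the "catalog".
var catalogCalls = 0
val relations = new RelationCache[String, String]
def slowLookup(name: String): Option[String] = { catalogCalls += 1; Some(s"schema-of-$name") }
relations.getOrResolve("t1")(slowLookup("t1"))
relations.getOrResolve("t1")(slowLookup("t1"))
println(catalogCalls) // prints 1
```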
> Also... in Analyzer.scala I see that ResolveRelations runs before
ResolveDefaultColumns -- so why would the plan here ever contain an
UnresolvedRelation?
The analyzer rules run in batches to a fixed point: each rule in the batch
runs over the logical plan, and the whole batch repeats until none of the
rules changes the plan anymore (the fixed point). So each rule must be
prepared to encounter both unresolved and resolved relations, since it may be
executing in the first iteration of the batch or in a later one. For this
reason, Catalyst rules should generally not make assumptions about where they
appear in the batch sequence.
[1]
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L1262
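The fixed-point behavior described above can be sketched in miniature. This is not Spark's actual `RuleExecutor`; `Plan` and `Rule` are toy stand-ins for Catalyst's `LogicalPlan` and `Rule[LogicalPlan]`, chosen only to show why rule order within a batch does not matter once the batch iterates to a fixed point.

```scala
// Toy stand-ins for Catalyst's LogicalPlan and Rule[LogicalPlan].
type Plan = List[String]
type Rule = Plan => Plan

// Apply every rule in order, then repeat the whole batch until a full pass
// changes nothing (the fixed point), with a safety cap on iterations.
def executeBatch(rules: Seq[Rule], initial: Plan, maxIterations: Int = 100): Plan = {
  var current = initial
  var changed = true
  var i = 0
  while (changed && i < maxIterations) {
    val next = rules.foldLeft(current)((plan, rule) => rule(plan))
    changed = next != current // fixed point reached when nothing changed
    current = next
    i += 1
  }
  current
}

// "resolveDefaults" only fires on already-resolved relations, yet it can be
// listed *before* "resolveRelations": the second batch iteration picks it up.
val resolveRelations: Rule = _.map(n => if (n == "UnresolvedRelation") "ResolvedRelation" else n)
val resolveDefaults: Rule  = _.map(n => if (n == "ResolvedRelation") "RelationWithDefaults" else n)

println(executeBatch(Seq(resolveDefaults, resolveRelations), List("UnresolvedRelation")))
// prints List(RelationWithDefaults)
```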
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]