Re: [PR] [SPARK-54119] Support METRIC_VIEW creation on V2 catalogs [spark]

via GitHub Fri, 01 May 2026 07:45:13 -0700


cloud-fan commented on code in PR #55487:
URL: https://github.com/apache/spark/pull/55487#discussion_r3173613801



##########
sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala:
##########
@@ -75,6 +76,20 @@ case class StructField(
       ("metadata" -> metadataJson)
   }
 
+  /**
+   * The compact JSON representation of this StructField, including name, 
type, nullable, and
+   * metadata. Inverse of [[StructField.fromJson]].
+   *
+   * @since 4.2.0
+   */
+  def json: String = compact(render(jsonValue))

Review Comment:
   The most recent commit (`d55143f5b6184d2638e7fde4dae1fc29040b9509`) adds 
three new public APIs (`StructField.json`, `StructField.fromJson`, 
`Column.fromStructField`) and widens `DataType.parseStructField` from `private` 
to `private[sql]`. None of them are called by code in this PR — verified by 
grep, the only call sites are the tests added in the same commit. Per the 
commit message they exist to support a downstream Unity Catalog connector.
   
   Public API additions to `org.apache.spark.sql.types` and 
`org.apache.spark.sql.connector.catalog` warrant a dedicated PR with its own 
JIRA ticket so they can be reviewed independently of metric-view support. Could 
you split this commit out?



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Column.java:
##########
@@ -89,6 +91,21 @@ static Column create(
             null, identityColumnSpec, metadataInJSON);
   }
 
+  /**
+   * Creates a {@link Column} from a Spark {@link StructField}, preserving 
name, dataType,
+   * nullable, comment, and the field metadata as a JSON string. Fields with 
empty metadata
+   * map to a column with a {@code null} {@link #metadataInJSON()}.
+   *
+   * @since 4.2.0
+   */
+  static Column fromStructField(StructField field) {
+    String comment = field.getComment().isDefined() ? field.getComment().get() 
: null;
+    String metadataJson = field.metadata().equals(Metadata.empty())
+        ? null
+        : field.metadata().json();

Review Comment:
   This diverges from the canonical inverse 
`CatalogV2Util.structFieldToV2Column` (CatalogV2Util.scala:762), which strips 
the `"comment"` key from `f.metadata().json()` before stamping it as 
`metadataInJSON` — because the comment is exposed separately via 
`Column.comment()`.
   
   Here, when `field` was built with `withComment(...)`, the `"comment"` key 
sits inside `field.metadata()` and gets written into `metadataInJSON` as well, 
while also being lifted into `comment`. Net effect: the comment is duplicated. 
Round-trip via `v2ColumnToStructField` happens to recover, but any downstream 
consumer reading `metadataInJSON` will see an unexpected `"comment"` entry.
   
   Mirror `structFieldToV2Column` and strip `"comment"` before serializing — 
using a `MetadataBuilder` to drop the key, then null out the result if it 
becomes empty. That also matches the existing function's behavior of returning 
`null` `metadataInJSON` for fields whose only metadata was the comment.



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala:
##########
@@ -232,18 +232,20 @@ case class DropTableCommand(
       // If the command DROP VIEW is to drop a table or DROP TABLE is to drop 
a view
       // issue an exception.
       catalog.getTableMetadata(tableName).tableType match {
-        case CatalogTableType.VIEW if !isView =>
+        // Both VIEW and METRIC_VIEW are conceptually views and must be 
dropped via DROP VIEW.
+        case CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW if !isView =>
           throw QueryCompilationErrors.wrongCommandForObjectTypeError(
             operation = "DROP TABLE",
             requiredType = s"${CatalogTableType.EXTERNAL.name} or 
${CatalogTableType.MANAGED.name}",
             objectName = catalog.getTableMetadata(tableName).qualifiedName,
             foundType = catalog.getTableMetadata(tableName).tableType.name,
             alternative = "DROP VIEW"
           )
-        case o if o != CatalogTableType.VIEW && isView =>
+        case o if o != CatalogTableType.VIEW && o != 
CatalogTableType.METRIC_VIEW && isView =>

Review Comment:
   `verifyAlterTableType` at line 1086 of this file was missed in the audit. 
With `tableType == METRIC_VIEW`:
   
   - `ALTER TABLE <metric_view> RENAME TO ...` falls through both arms (line 
1092 `case CatalogTableType.VIEW if !isView` and line 1096 `case o if o != 
CatalogTableType.VIEW && isView`) and silently renames the metric view as if it 
were a regular table — regression vs. the pre-PR `viewWithMetrics` storage 
where the same call threw `cannotAlterViewWithAlterTableError`.
   - `ALTER VIEW <metric_view> RENAME TO ...` matches `case o if o != 
CatalogTableType.VIEW && isView` and throws 
`cannotAlterTableWithAlterViewError` ("cannot alter table with alter view") — 
wrong category for a view-kind.
   
   Same fix pattern as the DROP audit on this hunk: widen both cases at line 
1092 / 1096 to also accept `METRIC_VIEW`.



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala:
##########
@@ -323,6 +324,33 @@ class DataSourceV2Strategy(session: SparkSession) extends 
Strategy with Predicat
       CreateV2ViewExec(catalog.asInstanceOf[ViewCatalog], ident, 
userSpecifiedColumns, comment,
         collation, properties, sqlText, child, allowExisting, replace, 
viewSchemaMode) :: Nil
 
+    // CREATE VIEW ... WITH METRICS on a non-session v2 catalog. Routes the 
metric-view path
+    // through `CreateV2MetricViewExec`, which extends `V2ViewPreparation` to 
share the
+    // `IF NOT EXISTS` short-circuit, `OR REPLACE`, and cross-type-collision 
decoding with
+    // `CreateV2ViewExec`. Session-catalog dispatch stays in 
`CreateMetricViewCommand.run`.
+    case CreateMetricViewCommand(
+        ResolvedIdentifier(catalog, ident), userSpecifiedColumns, comment, 
properties,
+        originalText, allowExisting, replace) if 
!CatalogV2Util.isSessionCatalog(catalog) =>
+      val viewCatalog = catalog match {
+        case vc: ViewCatalog => vc
+        case _ => throw 
QueryCompilationErrors.missingCatalogViewsAbilityError(catalog)
+      }
+      // Parse + analyze the YAML body here (during planning). This mirrors 
the v1 path's
+      // late analysis in `CreateMetricViewCommand.run` -- the metric-view 
source plan is not
+      // a SQL string, so it can't ride along as a regular `query` 
`LogicalPlan` field on the
+      // logical command the way `CreateView` does.
+      val analyzed = MetricViewHelper.analyzeMetricViewText(
+        session, ident.asTableIdentifier, originalText)

Review Comment:
   `ident.asTableIdentifier` throws `requiresSinglePartNamespaceError` whenever 
`ident` has a multi-level namespace. Plain v2 views (`CreateV2ViewExec`, this 
file:324) carry `Identifier` directly and support multi-level namespace 
targets, but v2 metric views fail here because 
`MetricViewHelper.analyzeMetricViewText` requires a `TableIdentifier`.
   
   The dependency-extraction multi-level test only exercises the *source* 
(`testcat.ns_a.ns_b.events_deep`); the metric view's own ident is always 
single-level (`testcat.ns.mv`) — a CREATE at `cat.db1.db2.mv` would fail here. 
`analyzeMetricViewText` only uses the name to (a) build a synthetic 
`CatalogTable` for analysis context and (b) feed `name.nameParts` into 
`verifyTemporaryObjectsNotExists`, both of which could take `Seq[String]` 
directly.
   
   Either lift the helper to multi-part nameParts or call out the 
single-level-namespace constraint as a known limitation of the v2 metric-view 
path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-54119] Support METRIC_VIEW creation on V2 catalogs [spark]

Reply via email to