cloud-fan commented on code in PR #55487:
URL: https://github.com/apache/spark/pull/55487#discussion_r3173613801
##########
sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala:
##########
@@ -75,6 +76,20 @@ case class StructField(
("metadata" -> metadataJson)
}
+ /**
+ * The compact JSON representation of this StructField, including name,
type, nullable, and
+ * metadata. Inverse of [[StructField.fromJson]].
+ *
+ * @since 4.2.0
+ */
+ def json: String = compact(render(jsonValue))
Review Comment:
The most recent commit (`d55143f5b6184d2638e7fde4dae1fc29040b9509`) adds
three new public APIs (`StructField.json`, `StructField.fromJson`,
`Column.fromStructField`) and widens `DataType.parseStructField` from `private`
to `private[sql]`. None of them are called by code in this PR — verified by
grep, the only call sites are the tests added in the same commit. Per the
commit message they exist to support a downstream Unity Catalog connector.
Public API additions to `org.apache.spark.sql.types` and
`org.apache.spark.sql.connector.catalog` warrant a dedicated PR with its own
JIRA ticket so they can be reviewed independently of metric-view support. Could
you split this commit out?
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Column.java:
##########
@@ -89,6 +91,21 @@ static Column create(
null, identityColumnSpec, metadataInJSON);
}
+ /**
+ * Creates a {@link Column} from a Spark {@link StructField}, preserving
name, dataType,
+ * nullable, comment, and the field metadata as a JSON string. Fields with
empty metadata
+ * map to a column with a {@code null} {@link #metadataInJSON()}.
+ *
+ * @since 4.2.0
+ */
+ static Column fromStructField(StructField field) {
+ String comment = field.getComment().isDefined() ? field.getComment().get()
: null;
+ String metadataJson = field.metadata().equals(Metadata.empty())
+ ? null
+ : field.metadata().json();
Review Comment:
This diverges from the canonical inverse
`CatalogV2Util.structFieldToV2Column` (CatalogV2Util.scala:762), which strips
the `"comment"` key from `f.metadata().json()` before stamping it as
`metadataInJSON` — because the comment is exposed separately via
`Column.comment()`.
Here, when `field` was built with `withComment(...)`, the `"comment"` key
sits inside `field.metadata()` and gets written into `metadataInJSON` as well,
while also being lifted into `comment`. Net effect: the comment is duplicated.
Round-trip via `v2ColumnToStructField` happens to recover, but any downstream
consumer reading `metadataInJSON` will see an unexpected `"comment"` entry.
Mirror `structFieldToV2Column` and strip `"comment"` before serializing —
using a `MetadataBuilder` to drop the key, then null out the result if it
becomes empty. That also matches the existing function's behavior of returning
`null` `metadataInJSON` for fields whose only metadata was the comment.
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala:
##########
@@ -232,18 +232,20 @@ case class DropTableCommand(
// If the command DROP VIEW is to drop a table or DROP TABLE is to drop
a view
// issue an exception.
catalog.getTableMetadata(tableName).tableType match {
- case CatalogTableType.VIEW if !isView =>
+ // Both VIEW and METRIC_VIEW are conceptually views and must be
dropped via DROP VIEW.
+ case CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW if !isView =>
throw QueryCompilationErrors.wrongCommandForObjectTypeError(
operation = "DROP TABLE",
requiredType = s"${CatalogTableType.EXTERNAL.name} or
${CatalogTableType.MANAGED.name}",
objectName = catalog.getTableMetadata(tableName).qualifiedName,
foundType = catalog.getTableMetadata(tableName).tableType.name,
alternative = "DROP VIEW"
)
- case o if o != CatalogTableType.VIEW && isView =>
+ case o if o != CatalogTableType.VIEW && o !=
CatalogTableType.METRIC_VIEW && isView =>
Review Comment:
`verifyAlterTableType` at line 1086 of this file was missed in the audit.
With `tableType == METRIC_VIEW`:
- `ALTER TABLE <metric_view> RENAME TO ...` falls through both arms (line
1092 `case CatalogTableType.VIEW if !isView` and line 1096 `case o if o !=
CatalogTableType.VIEW && isView`) and silently renames the metric view as if it
were a regular table — regression vs. the pre-PR `viewWithMetrics` storage
where the same call threw `cannotAlterViewWithAlterTableError`.
- `ALTER VIEW <metric_view> RENAME TO ...` matches `case o if o !=
CatalogTableType.VIEW && isView` and throws
`cannotAlterTableWithAlterViewError` ("cannot alter table with alter view") —
wrong category for a view-kind.
Same fix pattern as the DROP audit on this hunk: widen both cases at line
1092 / 1096 to also accept `METRIC_VIEW`.
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala:
##########
@@ -323,6 +324,33 @@ class DataSourceV2Strategy(session: SparkSession) extends
Strategy with Predicat
CreateV2ViewExec(catalog.asInstanceOf[ViewCatalog], ident,
userSpecifiedColumns, comment,
collation, properties, sqlText, child, allowExisting, replace,
viewSchemaMode) :: Nil
+ // CREATE VIEW ... WITH METRICS on a non-session v2 catalog. Routes the
metric-view path
+ // through `CreateV2MetricViewExec`, which extends `V2ViewPreparation` to
share the
+ // `IF NOT EXISTS` short-circuit, `OR REPLACE`, and cross-type-collision
decoding with
+ // `CreateV2ViewExec`. Session-catalog dispatch stays in
`CreateMetricViewCommand.run`.
+ case CreateMetricViewCommand(
+ ResolvedIdentifier(catalog, ident), userSpecifiedColumns, comment,
properties,
+ originalText, allowExisting, replace) if
!CatalogV2Util.isSessionCatalog(catalog) =>
+ val viewCatalog = catalog match {
+ case vc: ViewCatalog => vc
+ case _ => throw
QueryCompilationErrors.missingCatalogViewsAbilityError(catalog)
+ }
+ // Parse + analyze the YAML body here (during planning). This mirrors
the v1 path's
+ // late analysis in `CreateMetricViewCommand.run` -- the metric-view
source plan is not
+ // a SQL string, so it can't ride along as a regular `query`
`LogicalPlan` field on the
+ // logical command the way `CreateView` does.
+ val analyzed = MetricViewHelper.analyzeMetricViewText(
+ session, ident.asTableIdentifier, originalText)
Review Comment:
`ident.asTableIdentifier` throws `requiresSinglePartNamespaceError` whenever
`ident` has a multi-level namespace. Plain v2 views (`CreateV2ViewExec`, this
file:324) carry `Identifier` directly and support multi-level namespace
targets, but v2 metric views fail here because
`MetricViewHelper.analyzeMetricViewText` requires a `TableIdentifier`.
The dependency-extraction multi-level test only exercises the *source*
(`testcat.ns_a.ns_b.events_deep`); the metric view's own ident is always
single-level (`testcat.ns.mv`) — a CREATE at `cat.db1.db2.mv` would fail here.
`analyzeMetricViewText` only uses the name to (a) build a synthetic
`CatalogTable` for analysis context and (b) feed `name.nameParts` into
`verifyTemporaryObjectsNotExists`, both of which could take `Seq[String]`
directly.
Either lift the helper to multi-part nameParts or call out the
single-level-namespace constraint as a known limitation of the v2 metric-view
path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]