cloud-fan commented on code in PR #56606:
URL: https://github.com/apache/spark/pull/56606#discussion_r3445009059
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/MetadataTable.java:
##########
@@ -81,12 +82,14 @@ public Map<String, String> properties() {
@Override
public Transform[] partitioning() {
- return info.partitions();
+ // Partitioning is a table-only concept; a view wrapped as a MetadataTable
has none.
+ return info instanceof TableInfo tableInfo ? tableInfo.partitions() : new
Transform[0];
}
@Override
public Constraint[] constraints() {
- return info.constraints();
+ // Constraints are a table-only concept; a view wrapped as a MetadataTable
has none.
+ return info instanceof TableInfo tableInfo ? tableInfo.constraints() : new
Constraint[0];
Review Comment:
Thanks @aokolnychyi, this is the right instinct -- `MetadataTable` doing
double duty (delegating table *and* view carrier) is exactly the smell. I want
to go with your Option 1, with one refinement that I think makes it land with
less ceremony than the sketch.
The key realization is that **a view has no object form**. Unlike a table,
Spark never builds a view object -- it expands the query text. So `ViewInfo`
was never "metadata that gets turned into a `View`" the way `TableInfo` ->
`Table` works; the info *is* the view. That asymmetry is real and shouldn't be
papered over with a `View` wrapper the catalog has to construct. So I'd rename
`ViewInfo` -> `View` and let it *be* the loadable thing:
```java
sealed interface Relation permits Table, View {
Column[] columns();
Map<String, String> properties();
}
non-sealed interface Table extends Relation { /* object: scans, writes,
capabilities */ }
final class View implements Relation { // renamed from ViewInfo
// columns, properties, queryText, currentCatalog, sqlConfigs, schemaMode,
...
}
```
The table side keeps its two layers exactly as today -- no forced symmetry:
```java
final class TableInfo { /* partitions, constraints, provider, location, ...
*/ }
final class DelegatingTable implements Table { // renamed from
MetadataTable, table-only now
DelegatingTable(TableInfo info, String name) { ... }
}
```
`loadTableOrView` becomes `loadRelation(): Relation`. Compared with your
sketch, the catalog never constructs a `View` carrier or runs a `case
Table/case View` switch -- for a view it returns the `View` directly; for a
table it returns its own `Table` (or a `DelegatingTable` if it's metadata-only):
```scala
override def createView(ident, view: View): View = { store.put(ident, view);
view }
override def loadRelation(ident): Relation =
Option(store.get(ident)).getOrElse(throw new NoSuchTableException(ident))
```
Consequences:
- **The view is never represented as a `Table`** -- the core of your
objection. `DelegatingTable` becomes single-purpose (only ever adapts a
`TableInfo`) and is the only carrier left.
- **`RelationInfo` goes away.** Its sole purpose was being the common base
of `TableInfo` and `ViewInfo`; once `ViewInfo` becomes a loadable `View` and
`TableInfo` stays build-time metadata, they're no longer in the same layer, and
keeping both `Relation` and `RelationInfo` just reintroduces a naming clash.
The shared builder setters survive as a small impl-detail builder base.
- The model maps cleanly onto Postgres: `Relation` is the catalog entry, the
subtype is `relkind`. `loadTable`/`loadView`/`tableExists`/`viewExists` stay as
defaults that filter the `Relation` by subtype.
This is a wider rename than this followup currently does (`ViewCatalog` now
speaks `View`: `loadView`/`createView`/`replaceView`), but everything here is
`@Evolving`/unreleased, so it's free to do now and much cheaper than after
4.2.0 ships. I'll restructure the PR along these lines unless you see a problem
with dropping the `RelationInfo` base.
(Minor: `View` collides by simple name with the Catalyst logical `View` --
different package, just needs a qualified import in resolver code.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]