GitHub user rdblue commented on the issue:
https://github.com/apache/spark/pull/21306
@cloud-fan, thanks for the thorough feedback!
> What catalog operations we want to forward to the data source catalog?
> Currently it's create/drop/alter table, I think it's good enough for now.
This PR introduces create, drop, and alter. These are the operations we need
in order to implement DataSourceV2 operations and DDL support, and we can
always add more later.
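To make that scope concrete, here is a minimal sketch of a catalog limited to those three operations, backed by an in-memory map. Every name in it (`TableId`, `TableMeta`, `TableCatalogSketch`, `InMemoryTableCatalog`) is hypothetical and is not the interface added in this PR. For brevity, alter is reduced to merging property changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical identifier: a database (namespace) plus a table name.
final class TableId {
    final String database;
    final String name;

    TableId(String database, String name) {
        this.database = Objects.requireNonNull(database);
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TableId)) return false;
        TableId other = (TableId) o;
        return database.equals(other.database) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(database, name);
    }

    @Override
    public String toString() {
        return database + "." + name;
    }
}

// Hypothetical table metadata: a schema string plus table properties.
final class TableMeta {
    final TableId id;
    final String schema;
    final Map<String, String> properties;

    TableMeta(TableId id, String schema, Map<String, String> properties) {
        this.id = id;
        this.schema = schema;
        this.properties = properties;
    }
}

// Hypothetical catalog interface limited to create, drop, and alter.
interface TableCatalogSketch {
    TableMeta createTable(TableId id, String schema, Map<String, String> properties);
    boolean dropTable(TableId id);
    TableMeta alterTable(TableId id, Map<String, String> propertyChanges);
}

// In-memory implementation, for illustration only.
final class InMemoryTableCatalog implements TableCatalogSketch {
    private final Map<TableId, TableMeta> tables = new HashMap<>();

    @Override
    public TableMeta createTable(TableId id, String schema, Map<String, String> properties) {
        if (tables.containsKey(id)) {
            throw new IllegalStateException("Table already exists: " + id);
        }
        TableMeta table = new TableMeta(id, schema, new HashMap<>(properties));
        tables.put(id, table);
        return table;
    }

    @Override
    public boolean dropTable(TableId id) {
        // Returns false instead of throwing when the table is missing.
        return tables.remove(id) != null;
    }

    @Override
    public TableMeta alterTable(TableId id, Map<String, String> propertyChanges) {
        TableMeta existing = tables.get(id);
        if (existing == null) {
            throw new IllegalArgumentException("Table not found: " + id);
        }
        // Merge the requested property changes over the existing properties.
        Map<String, String> merged = new HashMap<>(existing.properties);
        merged.putAll(propertyChanges);
        TableMeta updated = new TableMeta(id, existing.schema, merged);
        tables.put(id, updated);
        return updated;
    }
}
```

Adding another operation later would mean adding a method to the interface, which is why starting with the minimum is cheap.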
> Spark provides an API so that end-users can do it directly. e.g.
> `spark.catalog("iceberge").createTable(...)`, or SQL API `CREATE TABLE
> iceberge.db1.tbl1 . . .`
These two are the easiest and least intrusive ways to start, because the
interaction with the data source catalog is explicitly tied to a named catalog.
They also match the behavior of other systems that support multiple catalogs.
I think this is what we should start with, and then we can tackle ideas like
your second point.
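One way to picture "explicitly tied to a catalog" is a resolver that routes a multi-part name like `iceberge.db1.tbl1` to the catalog named by its first part. This is a hypothetical sketch: `ResolvedName` and `CatalogNameResolver` are made-up names, and sending names without a registered catalog prefix to a default catalog is an assumption, not behavior decided in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical result: the catalog that owns the table and the remaining name parts.
final class ResolvedName {
    final String catalog;
    final List<String> nameParts;

    ResolvedName(String catalog, List<String> nameParts) {
        this.catalog = catalog;
        this.nameParts = nameParts;
    }
}

// Hypothetical resolver that routes a multi-part name to an explicitly named catalog.
final class CatalogNameResolver {
    private final Set<String> registeredCatalogs;
    private final String defaultCatalog;

    CatalogNameResolver(Set<String> registeredCatalogs, String defaultCatalog) {
        this.registeredCatalogs = registeredCatalogs;
        this.defaultCatalog = defaultCatalog;
    }

    ResolvedName resolve(String multipartName) {
        // Naive split on '.'; a real SQL parser would also handle quoted identifiers.
        List<String> parts = Arrays.asList(multipartName.split("\\."));
        if (parts.size() > 1 && registeredCatalogs.contains(parts.get(0))) {
            // The first part names a registered catalog, so that catalog handles the call.
            return new ResolvedName(parts.get(0), parts.subList(1, parts.size()));
        }
        // Assumption: names without a known catalog prefix go to the default catalog.
        return new ResolvedName(defaultCatalog, parts);
    }
}
```

Because the catalog is named in the statement itself, there is no ambiguity about which catalog receives the create, drop, or alter call.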
> When creating/dropping/altering Spark tables, also forward it to the data
> source catalog. . .
For this and a couple of other questions, I don't think we need to decide
right now. This PR is about getting the catalog interface for other sources
into Spark. We don't necessarily need to know all of the ways that users will
call it or interact with it, such as how `DESC TABLE` will work.
As for your question here, I'm not sure whether the `CREATE TABLE ... USING
source` syntax should use the default catalog, defer to the catalog for
`source`, or forward to both. But that doesn't need to block adding this API,
because I think we can decide it later. We should also discuss this on the dev
list to make sure we get the behavior right.
> How to lookup the table metadata from data source catalog?
The SPIP proposes two catalog interfaces that return `Table`: one that uses
table identifiers and one that uses paths. Data sources can implement either
one or both. This PR includes only the support for table identifiers; we would
add a similar API for path-based tables in another PR.
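Since a source may implement only one of the two interfaces, callers have to check what a source supports before looking up a table. The sketch below illustrates that split under assumed names: `TableHandle` stands in for `Table`, and `IdentifierCatalog`, `PathCatalog`, `IdentifierOnlySource`, and `TableLookup` are all hypothetical, not the SPIP's actual interfaces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the `Table` abstraction both catalog interfaces return.
interface TableHandle {
    String name();
}

// Hypothetical identifier-based lookup (the kind of support this PR adds).
interface IdentifierCatalog {
    Optional<TableHandle> loadTable(String database, String table);
}

// Hypothetical path-based lookup (the kind of support planned for a later PR).
interface PathCatalog {
    Optional<TableHandle> loadTableAt(String path);
}

// A source that implements only identifier-based lookup.
final class IdentifierOnlySource implements IdentifierCatalog {
    private final Map<String, TableHandle> tables = new HashMap<>();

    void register(String database, String table) {
        String key = database + "." + table;
        // TableHandle has a single abstract method, so a lambda can implement it.
        tables.put(key, () -> key);
    }

    @Override
    public Optional<TableHandle> loadTable(String database, String table) {
        return Optional.ofNullable(tables.get(database + "." + table));
    }
}

final class TableLookup {
    // Callers check which interfaces a source implements before dispatching.
    static Optional<TableHandle> byPath(Object source, String path) {
        if (source instanceof PathCatalog) {
            return ((PathCatalog) source).loadTableAt(path);
        }
        throw new UnsupportedOperationException(
            source.getClass().getSimpleName() + " does not support path-based tables");
    }
}
```

Keeping the two lookups in separate interfaces lets a source opt into path-based tables later without changing its identifier-based support.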
> How to define table metadata? Maybe we can forward `DESC TABLE` . . .
That sounds like a reasonable idea to me. As with the behavior of `USING`, I
don't think this is something that we have to decide right now. We can add
support later as we implement table DDL. Maybe `Table` should return a
DataFrame containing its `DESCRIBE` output.
> How does the table metadata involve in data reading/writing?
This is another example of something we don't need to decide yet. We have a
couple of different options for the behavior, and we will want to think them
through and discuss them on the dev list. But I don't think the behavior
needs to be decided before we add this API for sources.