I don’t have a good answer for that yet. My initial motivation here is
mainly to get consensus around this:
- DSv2 should support table names through SQL and the API, and
- It should use the existing classes in the logical plan (i.e.,
TableIdentifier).
In contrast, I think Wenchen is
I am definitely in favor of first-class / consistent support for tables and
data sources.
One thing that is not clear to me from this proposal is exactly what the
interfaces are between:
- Spark
- A (The?) metastore
- A data source
If we pass in the table identifier, is the data source then
There are two main ways to load tables in Spark: by name (db.table) and by
path. Unfortunately, the DataSourceV2 integration has no support for
identifying tables by name.
I propose supporting the use of TableIdentifier, which is the standard way
to pass around table names.
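To make the two naming schemes concrete, here is a minimal sketch of what
"identifying a table by name" looks like. This is not the actual Spark API,
just a simplified model of catalyst's TableIdentifier case class (a table
name plus an optional database), with a toy parse helper I added for
illustration; the real parser also handles backtick quoting:

```scala
// Simplified model of org.apache.spark.sql.catalyst.TableIdentifier:
// a table name, optionally qualified by a database.
case class TableIdentifier(table: String, database: Option[String] = None) {
  // Render as a quoted name, e.g. `db`.`table` or just `table`.
  def quotedString: String =
    database.map(db => s"`$db`.`$table`").getOrElse(s"`$table`")
}

object TableIdentifier {
  // Toy parser: "db.table" -> qualified, "table" -> unqualified.
  def parse(name: String): TableIdentifier = name.split('.') match {
    case Array(db, t) => TableIdentifier(t, Some(db))
    case Array(t)     => TableIdentifier(t)
    case _ => throw new IllegalArgumentException(s"Bad table name: $name")
  }
}
```

With an identifier like this passed through the plan, a by-name lookup
(TableIdentifier.parse("sales.orders")) and a by-path load stay clearly
distinguished, which is the distinction DSv2 currently lacks.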
The reason I