rdblue commented on a change in pull request #30554:
URL: https://github.com/apache/spark/pull/30554#discussion_r533833283
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##########
@@ -657,14 +659,8 @@ class ResolveSessionCatalog(
comment: Option[String],
storageFormat: CatalogStorageFormat,
external: Boolean): CatalogTable = {
- if (external) {
- if (DDLUtils.isHiveTable(Some(provider))) {
- if (location.isEmpty) {
- throw new AnalysisException(s"CREATE EXTERNAL TABLE must be
accompanied by LOCATION")
- }
- } else {
- throw new AnalysisException(s"Operation not allowed: CREATE EXTERNAL
TABLE ... USING")
- }
+ if (external && location.isEmpty) {
+ throw new AnalysisException(s"CREATE EXTERNAL TABLE must be accompanied
by LOCATION")
Review comment:
This PR adds support for `EXTERNAL` with data source tables, but only
for tables with the `LOCATION` clause. I have no problem with adding that
support, or for enforcing the `LOCATION` restriction in Spark's catalog.
The problem is that it is done in the analyzer rather than in the catalog
implementation, so the `LOCATION` clause restriction is applied before calling
a v2 extension to the session catalog.
We've now had several problems with behavior mismatches between the session
catalog and other v2 catalogs, so I think that support for `EXTERNAL` should be
added properly: pass the flag correctly to the v2 extension and enforce the
`LOCATION` restriction in the catalog rather than in the analysis rules. I'd
also like to have tests for the newly supported SQL cases to validate that
behavior.
I know that the reason behind this was to avoid test failures, but I think
only partially adding this feature means that it isn't done right: it
introduces a behavior difference between session and other v2 catalogs, and
would be missing tests.
If you want to update this to add the `LOCATION` restriction in the session
catalog, pass the v2 property correctly to catalog extensions, and add tests,
then I'm okay with this change. However, then we would be looking at a
significantly larger PR, so it is unlikely that this should be done at the same
time as adding configuration to use data source tables by default.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]