Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15024#discussion_r85079470
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -62,6 +62,7 @@ case class CreateDataSourceTableCommand(table: CatalogTable, ignoreIfExists: Boolean
       sparkSession = sparkSession,
       userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
       className = table.provider.get,
+      paths = table.storage.locationUri.toSeq,
       bucketSpec = table.bucketSpec,
       options = table.storage.properties).resolveRelation()
--- End diff ---
I have fixed it, with better semantics.

Previously, although we kept the `path` option, it changed after some table
operations, e.g. `SET LOCATION` and `RENAME TABLE`, so we couldn't reliably use
`path` as a data source option, as it could change unexpectedly.
Now the `path` option and the table location are decoupled. We infer the table
location from the `path` option once, at table creation time; after that they
are two independent fields, and `SET LOCATION` only changes the table location,
not the `path` option. I have updated the PR description with the detailed
rules about this.
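To illustrate the new rule (a hedged sketch; the table name and paths are hypothetical placeholders):

    // Create a data source table; its location is inferred from the `path` option at creation time.
    spark.sql("CREATE TABLE t USING parquet OPTIONS (path '/data/t_v1')")

    // Changing the location afterwards updates only the table location; the stored
    // `path` option is left untouched. Reads should follow the new location, per the
    // `paths = table.storage.locationUri.toSeq` change above.
    spark.sql("ALTER TABLE t SET LOCATION '/data/t_v2'")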
One drawback is that I had to change the semantics of `DataSource.options` a
little: the `path` option only takes effect when `paths` is empty. This means
`reader.option("path", path1).parquet(path2, path3)` will break, as `path1` is
now ignored. However, I don't think that is a reasonable use case, and it seems
fine to break it.
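To make the breaking case concrete (a hedged sketch; the paths are placeholders):

    // Before this change, all three paths were read.
    // Now path2 and path3 are passed as `paths`, so the `path` option (path1) is ignored.
    val df = spark.read
      .option("path", "/data/path1")   // ignored under the new semantics
      .parquet("/data/path2", "/data/path3")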
cc @yhuai