Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15024#discussion_r85079470
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -62,6 +62,7 @@ case class CreateDataSourceTableCommand(table: CatalogTable, ignoreIfExists: Boolean
       sparkSession = sparkSession,
       userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
       className = table.provider.get,
+      paths = table.storage.locationUri.toSeq,
       bucketSpec = table.bucketSpec,
       options = table.storage.properties).resolveRelation()
--- End diff ---
I have fixed it, with better semantics.

Previously, although we kept the `path` option, it changed after some table
operations, e.g. `SET LOCATION` and `RENAME TABLE`, so we couldn't reliably use
`path` as a data source option, as it could change unexpectedly.
Now the `path` option and the table location are decoupled. We infer the table
location from the `path` option once, at table creation time; after that they
are two independent fields, and `SET LOCATION` only changes the table location,
not the `path` option. I have updated the PR description with the detailed
rules about this.
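To illustrate the new rule (a hedged sketch; the table name and paths are hypothetical placeholders):

    // Create a data source table; its location is inferred from the `path` option at creation time.
    spark.sql("CREATE TABLE t USING parquet OPTIONS (path '/data/t_v1')")

    // Changing the location afterwards updates only the table location; the stored
    // `path` option is left untouched. Reads should follow the new location, per the
    // `paths = table.storage.locationUri.toSeq` change above.
    spark.sql("ALTER TABLE t SET LOCATION '/data/t_v2'")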
One drawback is that I had to change the semantics of `DataSource.options` a
little: the `path` option only takes effect when `paths` is empty. This means
`reader.option("path", path1).parquet(path2, path3)` will break, as `path1` is
now ignored. However, I don't think that is a reasonable use case, and it seems
fine to break it.
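To make the breaking case concrete (a hedged sketch; the paths are placeholders):

    // Before this change, all three paths were read.
    // Now path2 and path3 are passed as `paths`, so the `path` option (path1) is ignored.
    val df = spark.read
      .option("path", "/data/path1")   // ignored under the new semantics
      .parquet("/data/path2", "/data/path3")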
cc @yhuai