GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/15996
[SPARK-18567][SQL][WIP] Simplify CreateDataSourceTableAsSelectCommand
## What changes were proposed in this pull request?
The `CreateDataSourceTableAsSelectCommand` is quite complex now, as it has
a lot of work to do if the table already exists:
1. throw an exception if we don't want to ignore the existing table.
2. run some checks and adjust the schema if we want to append data.
3. drop the table and create it again if we want to overwrite.
Items 2 and 3 are needed only by `DataFrameWriter`, so I think it is more
reasonable to move them into `DataFrameWriter` and simplify
`CreateDataSourceTableAsSelectCommand`. With this change, `saveAsTable` can work
with Hive tables in append mode.
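The three cases above amount to a dispatch on the save mode when the target table already exists. The following is a minimal, self-contained sketch of that dispatch, not Spark's actual code: `Action`, `ExistingTablePlanner`, and `planForExistingTable` are hypothetical names, and the local `SaveMode` type only mirrors the value names of Spark's real `SaveMode` enum.

```scala
// Hypothetical model of what has to happen when the target table exists.
// Only the mode names mirror Spark's SaveMode; the rest is illustrative.
sealed trait SaveMode
object SaveMode {
  case object Append extends SaveMode
  case object Overwrite extends SaveMode
  case object ErrorIfExists extends SaveMode
  case object Ignore extends SaveMode
}

sealed trait Action
object Action {
  case object Fail extends Action                 // item 1: throw an exception
  case object Skip extends Action                 // item 1: silently ignore
  case object CheckSchemaAndInsert extends Action // item 2: append
  case object DropAndRecreate extends Action      // item 3: overwrite
}

object ExistingTablePlanner {
  // Maps a save mode to the work required for an already-existing table.
  def planForExistingTable(mode: SaveMode): Action = mode match {
    case SaveMode.ErrorIfExists => Action.Fail
    case SaveMode.Ignore        => Action.Skip
    case SaveMode.Append        => Action.CheckSchemaAndInsert
    case SaveMode.Overwrite     => Action.DropAndRecreate
  }
}
```

In this model, the proposal is to have the writer side own the `Append` and `Overwrite` branches, leaving the command with only the fail-or-skip decision.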
Behaviour changes:
1. Previously we threw an exception if the provider given by
`DataFrameWriter` didn't match the provider of the existing table. This is
annoying because `DataFrameWriter` uses the parquet provider by default, so users
have to specify the provider of the table they want to append data to. After
this PR, we simply ignore the provider when appending data to existing
tables. (We can go back to the previous behaviour if you think it makes sense.)
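The behaviour change can be illustrated with a small model of how the provider is chosen on append. This is only a sketch under assumptions: `ProviderResolution`, `resolveBefore`, and `resolveAfter` are hypothetical helpers, and the plain string comparison is a simplification of whatever matching Spark actually performs.

```scala
object ProviderResolution {
  // Old behaviour (as described above): a mismatch between the provider
  // requested by the writer and the existing table's provider is an error.
  def resolveBefore(requested: String, existing: String): Either[String, String] =
    if (requested == existing) Right(existing)
    else Left(s"Provider '$requested' does not match existing table provider '$existing'")

  // Proposed behaviour: the requested provider is ignored on append and
  // the existing table's provider is always used.
  def resolveAfter(requested: String, existing: String): String = existing
}
```

For example, appending with the default `parquet` provider to a table stored as `orc` would be rejected by `resolveBefore` but resolved to `orc` by `resolveAfter`.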
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark append
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15996.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15996
----
commit f52b364d448951cd73ddc8274957181166ddcfe9
Author: Wenchen Fan <[email protected]>
Date: 2016-11-23T16:37:04Z
remove OverwriteOptions
commit 7f90a100d8122531a3f668a0bf442883f92f98e0
Author: Wenchen Fan <[email protected]>
Date: 2016-11-23T17:41:35Z
tmp
----