GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/13386

    [SPARK-14507] [SPARK-15646] [SQL] When spark.sql.hive.convertCTAS is true, 
we should not convert the table stored as TEXTFILE/SEQUENCEFILE and we need 
to respect the user-defined location

    ## What changes were proposed in this pull request?
    When `spark.sql.hive.convertCTAS` is true, for a CTAS statement, we will 
create a data source table using the default source (i.e. parquet) if the CTAS 
does not specify any Hive storage format. However, there are two issues with 
this conversion logic.
    1. First, we determine whether a CTAS statement defines a storage format by 
checking the serde. However, TEXTFILE and SEQUENCEFILE do not have a default 
serde, and at the time of the check the default serde has not been set yet. So, 
a query like `CREATE TABLE abc STORED AS TEXTFILE AS SELECT ...` actually 
creates a data source parquet table.
    2. In the conversion logic, we are ignoring the user-specified location.
    
    This PR fixes the above two issues.
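
    The intended behavior can be illustrated with a small, self-contained 
model. This is only a sketch under stated assumptions, not the code in this 
patch: `CtasStorage`, `converts_by_serde_only`, `should_convert`, and 
`plan_ctas` are hypothetical names introduced for illustration, and the model 
assumes a statement counts as defining a Hive storage format if it has either 
a `STORED AS` clause or an explicit serde.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtasStorage:
    """Hypothetical, simplified view of what a CTAS statement specified."""
    stored_as: Optional[str] = None  # e.g. "TEXTFILE", "SEQUENCEFILE"
    serde: Optional[str] = None      # explicit serde, if any
    location: Optional[str] = None   # user-specified LOCATION, if any


def converts_by_serde_only(storage: CtasStorage) -> bool:
    """Models issue 1: only the serde is inspected, so a TEXTFILE table
    (no serde set yet) looks as if it had no storage format at all."""
    return storage.serde is None


def should_convert(storage: CtasStorage, convert_ctas: bool) -> bool:
    """Assumed fix: convert only when neither a file format nor a serde
    was specified by the user."""
    return convert_ctas and storage.stored_as is None and storage.serde is None


def plan_ctas(storage: CtasStorage, convert_ctas: bool,
              default_source: str = "parquet") -> dict:
    """Return a description of the table that would be created.
    The user-specified location is carried through in both branches,
    which models the fix for issue 2."""
    if should_convert(storage, convert_ctas):
        return {"kind": "datasource", "provider": default_source,
                "location": storage.location}
    return {"kind": "hive", "stored_as": storage.stored_as,
            "serde": storage.serde, "location": storage.location}
```

    In this model, `CtasStorage(stored_as="TEXTFILE")` is converted by the 
serde-only check but kept as a Hive table by `should_convert`, and a location 
given in the statement survives the conversion to a data source table.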
    
    ## How was this patch tested?
    I am adding new tests in SQLQuerySuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark SPARK-14507

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13386
    
----
commit f613d9e6687ac306f54d2e82276561ec5eb6a1ac
Author: Yin Huai <[email protected]>
Date:   2016-05-28T23:18:49Z

    test cases

commit 1e22d53cb2089841cc4dba4dd71066bb9915c9d6
Author: Yin Huai <[email protected]>
Date:   2016-05-28T23:19:45Z

    Move the conversion logic to the parser.

commit 2615f676844d19b33552f60cd2849522d5564360
Author: Yin Huai <[email protected]>
Date:   2016-05-28T23:32:05Z

    Update tests

----



