[GitHub] spark pull request: SPARK-14459: Detect relation partitioning and ...

rdblue Thu, 07 Apr 2016 09:56:35 -0700

GitHub user rdblue opened a pull request:

    https://github.com/apache/spark/pull/12239


    SPARK-14459: Detect relation partitioning and adjust the logical plan

    ## What changes were proposed in this pull request?
    
    This detects a relation's partitioning and adds checks to the analyzer.
    If an InsertIntoTable node has no partitioning, the relation's partition
    scheme is used instead, and the input columns are adjusted so that the
    partition columns come last, in partition order. If an InsertIntoTable
    node does specify partitioning, it is checked against the table's
    reported partitions.
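    A minimal Scala sketch of the two cases above. It assumes columns are identified by name and that an explicit partition spec matches when it names the same columns in any order. `PartitionAlignment`, `Table`, `alignInput`, and `checkPartitioning` are hypothetical names for illustration only, not Spark's analyzer API.

    ```scala
    // Hypothetical, simplified model; not Spark's InsertIntoTable or catalog API.
    object PartitionAlignment {
      // Assumption: a column is represented only by its name.
      final case class Table(dataColumns: Seq[String], partitionColumns: Seq[String])

      // No partitioning on the insert: adopt the table's partition scheme and move
      // the partition columns to the end of the input, in partition order.
      def alignInput(input: Seq[String], table: Table): Seq[String] = {
        val partSet = table.partitionColumns.toSet
        val missing = table.partitionColumns.filterNot(input.contains)
        require(missing.isEmpty, s"Missing partition columns: ${missing.mkString(", ")}")
        input.filterNot(partSet.contains) ++ table.partitionColumns
      }

      // Explicit partitioning on the insert: check it against the table's partitions.
      // Assumption: order in the request does not matter; table order is returned.
      def checkPartitioning(requested: Seq[String], table: Table): Either[String, Seq[String]] =
        if (requested.toSet == table.partitionColumns.toSet) Right(table.partitionColumns)
        else Left(
          s"Requested partitioning ${requested.mkString("(", ", ", ")")} does not match " +
            s"table partitioning ${table.partitionColumns.mkString("(", ", ", ")")}")
    }
    ```

    For a table partitioned by `(year, month)`, an input of `month, id, year, data` is aligned to `id, data, year, month`.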
    
    These changes required adding a PartitionedRelation trait to the catalog
    interface because Hive's MetastoreRelation doesn't extend
    CatalogRelation.
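    A hypothetical sketch of why a shared trait helps: analyzer code can match on the trait rather than on CatalogRelation, so a relation from an unrelated hierarchy (here a stand-in for MetastoreRelation) can still report its partitioning. Apart from the trait's name, every class and member below is an illustrative assumption, not the trait's actual definition.

    ```scala
    // Hypothetical sketch; member and class names are illustrative only.
    object PartitionedRelationSketch {
      trait PartitionedRelation {
        def partitionColumnNames: Seq[String]
      }

      // Stand-ins for two relation types that do not share a common superclass.
      final case class CatalogBackedRelation(name: String, partitionColumnNames: Seq[String])
        extends PartitionedRelation

      final case class MetastoreBackedRelation(db: String, table: String, partitionKeys: Seq[String])
        extends PartitionedRelation {
        def partitionColumnNames: Seq[String] = partitionKeys
      }

      // Analyzer-side code depends only on the shared trait.
      def partitioningOf(relation: Any): Option[Seq[String]] = relation match {
        case p: PartitionedRelation => Some(p.partitionColumnNames)
        case _                      => None
      }
    }
    ```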
    
    This commit also includes a fix to InsertIntoTable's resolved logic,
    which now checks that all expected columns are present, including
    dynamic partition columns. Previously, the number of expected columns
    was not checked, so resolved could be true even when columns were
    missing.
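    A minimal sketch of the column-count check. It assumes a partition spec in which `Some(value)` marks a static partition (its value comes from the PARTITION clause, not from the query) and `None` marks a dynamic partition whose value the query must supply. `InsertResolution` and its methods are hypothetical names.

    ```scala
    // Hypothetical sketch of the expected-column check; not Spark's actual code.
    object InsertResolution {
      // Static partitions (Some) are supplied by the PARTITION clause;
      // dynamic partitions (None) must be supplied by the query.
      def expectedColumnCount(dataColumns: Seq[String],
                              partitionSpec: Map[String, Option[String]]): Int =
        dataColumns.size + partitionSpec.count { case (_, value) => value.isEmpty }

      // Resolved only when the query produces every expected column.
      def isResolved(queryOutput: Seq[String],
                     dataColumns: Seq[String],
                     partitionSpec: Map[String, Option[String]]): Boolean =
        queryOutput.size == expectedColumnCount(dataColumns, partitionSpec)
    }
    ```

    With data columns `id, data`, a static `year` and a dynamic `month`, the query must produce three columns; a query producing only `id, data` is not resolved.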
    
    ## How was this patch tested?
    
    This adds new tests to InsertIntoTableSuite that fail without this PR
    and pass with it.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-14459-detect-hive-partitioning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12239
    
----
commit c0f38a2c18bb5f3e1c4255556b6983cf02acb3da
Author: Ryan Blue <[email protected]>
Date:   2016-04-01T19:02:59Z

    SPARK-14459: Detect relation partitioning and adjust the logical plan to match.

----

