GitHub user rdblue opened a pull request:
https://github.com/apache/spark/pull/12239
SPARK-14459: Detect relation partitioning and adjust the logical plan
## What changes were proposed in this pull request?
This detects a relation's partitioning and adds checks to the analyzer.
If an InsertIntoTable node has no partitioning, it is replaced by the
relation's partition scheme and input columns are correctly adjusted,
placing the partition columns at the end in partition order. If an
InsertIntoTable node has partitioning, it is checked against the table's
reported partitions.
These changes required adding a PartitionedRelation trait to the catalog
interface because Hive's MetastoreRelation doesn't extend
CatalogRelation.
This commit also includes a fix to InsertIntoTable's resolved logic,
which now detects that all expected columns are present, including
dynamic partition columns. Previously, the number of expected columns
was not checked and resolved was true if there were missing columns.
## How was this patch tested?
This adds new tests to the InsertIntoTableSuite that are fixed by this PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rdblue/spark
SPARK-14459-detect-hive-partitioning
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12239.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12239
----
commit c0f38a2c18bb5f3e1c4255556b6983cf02acb3da
Author: Ryan Blue <[email protected]>
Date: 2016-04-01T19:02:59Z
SPARK-14459: Detect relation partitioning and adjust the logical plan to
match.
This detects a relation's partitioning and adds checks to the analyzer.
If an InsertIntoTable node has no partitioning, it is replaced by the
relation's partition scheme and input columns are correctly adjusted,
placing the partition columns at the end in partition order. If an
InsertIntoTable node has partitioning, it is checked against the table's
reported partitions.
These changes required adding a PartitionedRelation trait to the catalog
interface because Hive's MetastoreRelation doesn't extend
CatalogRelation.
This commit also includes a fix to InsertIntoTable's resolved logic,
which now detects that all expected columns are present, including
dynamic partition columns. Previously, the number of expected columns
was not checked and resolved was true if there were missing columns.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]