GitHub user sureshthalamati opened a pull request:
https://github.com/apache/spark/pull/18994
[SPARK-21784][SQL] Adds support for defining informational primary key and
foreign key constraints using ALTER TABLE DDL.
## What changes were proposed in this pull request?
This PR implements the ALTER TABLE ... ADD CONSTRAINT DDL to add informational
primary key and foreign key (referential integrity) constraints in Spark. These
constraints will be used in query optimization; more details are in the spec
attached to [SPARK-19842](https://issues.apache.org/jira/browse/SPARK-19842).
The proposed constraint DDL syntax is similar to the Hive 2.1 referential
integrity constraint support
([HIVE-13076](https://issues.apache.org/jira/browse/HIVE-13076)), which in turn
is aligned with Oracle's semantics.
Syntax:
```sql
ALTER TABLE [db_name.]table_name ADD [CONSTRAINT constraintName]
(PRIMARY KEY (col_names) |
FOREIGN KEY (col_names) REFERENCES [db_name.]table_name [(col_names)])
[VALIDATE | NOVALIDATE] [RELY | NORELY]
```
Examples:
```sql
ALTER TABLE employee ADD CONSTRAINT pk PRIMARY KEY(empno) VALIDATE RELY;
ALTER TABLE department ADD CONSTRAINT emp_fk FOREIGN KEY (mgrno) REFERENCES
employee(empno) NOVALIDATE NORELY;
ALTER TABLE department ADD PRIMARY KEY(deptno) VALIDATE RELY;
ALTER TABLE employee ADD FOREIGN KEY (workdept) REFERENCES
department(deptno) VALIDATE RELY;
```
The constraint information is stored in the table properties, as one JSON
string per constraint.
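As a rough illustration (the property key and JSON layout shown in the comments are assumptions for this sketch, not the exact serialization the PR writes), the stored constraint can be inspected through `SHOW TBLPROPERTIES` once the DDL has run:
```scala
// Minimal sketch, assuming a Hive-enabled SparkSession with this patch applied.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("inspect-constraint-properties")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("ALTER TABLE employee ADD CONSTRAINT pk PRIMARY KEY(empno) VALIDATE RELY")

// The constraint is recorded as a table property holding a JSON string,
// e.g. a hypothetical key/value such as:
//   constraints.pk -> {"name":"pk","type":"primaryKey","columns":["empno"],
//                      "validate":true,"rely":true}
spark.sql("SHOW TBLPROPERTIES employee").show(truncate = false)
```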
One advantage of storing the constraints in table properties is that the
feature works with all supported Hive metastore versions.
An alternative approach we considered was to store the constraint information
through the Hive metastore API, which keeps constraints in a separate metastore
table. The problem with that approach is that the feature would only work in
Spark installations that use a **Hive 2.1 metastore**, which is NOT the current
Spark default. More details are in the spec document.
**This PR implements the ALTER TABLE constraint DDL using table properties
because it is important that the feature work with Spark's default Hive
metastore version.**
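For context, the metastore version Spark talks to is a deployment-time setting; below is a minimal sketch using the standard configuration keys (1.2.1 is the Spark 2.x default, well below Hive 2.1):
```scala
// Sketch: because the constraints live in plain table properties, the DDL
// works regardless of which metastore version the session is configured for.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "1.2.1") // current Spark default
  .config("spark.sql.hive.metastore.jars", "builtin")  // Spark's built-in Hive client
  .getOrCreate()
```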
The syntax to define constraints as part of the _CREATE TABLE_ definition will
be implemented in a follow-up JIRA.
## How was this patch tested?
Added new unit test cases to HiveDDLSuite and SparkSqlParserSuite.
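For reference, a hedged sketch of the flavor of end-to-end check added (the test name and the assertion detail are assumptions; `withTable` and `sql` are the usual helpers available to Spark test suites):
```scala
// Illustrative only: written in the style of a HiveDDLSuite test case.
test("ALTER TABLE ADD CONSTRAINT records an informational primary key") {
  withTable("employee") {
    sql("CREATE TABLE employee (empno INT, name STRING)")
    sql("ALTER TABLE employee ADD CONSTRAINT pk PRIMARY KEY(empno) VALIDATE RELY")
    // Expect the constraint to surface among the table properties.
    val keys = sql("SHOW TBLPROPERTIES employee").collect().map(_.getString(0))
    assert(keys.exists(_.toLowerCase.contains("pk")))
  }
}
```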
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/sureshthalamati/spark alter_add_pk_fk_SPARK-21784
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18994.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18994
----
commit 4839e8419ca7360f0feafeceec8f3832102e3dba
Author: sureshthalamati <[email protected]>
Date: 2017-08-18T08:39:12Z
[SPARK-21784][SQL] Adds alter table add constraint DDL support to allow
users to define informational primary key and foreign key constraints on a table.
----