GitHub user sureshthalamati opened a pull request:
https://github.com/apache/spark/pull/18994
[SPARK-21784][SQL] Adds support for defining informational primary key and
foreign key constraints using ALTER TABLE DDL.
## What changes were proposed in this pull request?
This PR implements the ALTER TABLE ... ADD CONSTRAINT DDL to add informational
primary key and foreign key (referential integrity) constraints in Spark. These
constraints will be used in query optimization; more details are in the spec
attached to [SPARK-19842](https://issues.apache.org/jira/browse/SPARK-19842).
The proposed constraint DDL syntax is similar to the Hive 2.1 referential
integrity constraint support
([HIVE-13076](https://issues.apache.org/jira/browse/HIVE-13076)), which in turn
is aligned with Oracle's semantics.
Syntax:
```sql
ALTER TABLE [db_name.]table_name ADD [CONSTRAINT constraintName]
(PRIMARY KEY (col_names) |
FOREIGN KEY (col_names) REFERENCES [db_name.]table_name [(col_names)])
[VALIDATE | NOVALIDATE] [RELY | NORELY]
```
Examples:
```sql
ALTER TABLE employee ADD CONSTRAINT pk PRIMARY KEY(empno) VALIDATE RELY;
ALTER TABLE department ADD CONSTRAINT emp_fk FOREIGN KEY (mgrno) REFERENCES
employee(empno) NOVALIDATE NORELY;
ALTER TABLE department ADD PRIMARY KEY(deptno) VALIDATE RELY;
ALTER TABLE employee ADD FOREIGN KEY (workdept) REFERENCES
department(deptno) VALIDATE RELY;
```
The constraint information is stored in the table properties, as one JSON
string per constraint.
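As a rough illustration (the property key and JSON layout shown in the comments are assumptions for this sketch, not the exact serialization the PR writes), the stored constraint can be inspected through `SHOW TBLPROPERTIES` once the DDL has run:
```scala
// Minimal sketch, assuming a Hive-enabled SparkSession with this patch applied.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("inspect-constraint-properties")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("ALTER TABLE employee ADD CONSTRAINT pk PRIMARY KEY(empno) VALIDATE RELY")

// The constraint is recorded as a table property holding a JSON string,
// e.g. a hypothetical key/value such as:
//   constraints.pk -> {"name":"pk","type":"primaryKey","columns":["empno"],
//                      "validate":true,"rely":true}
spark.sql("SHOW TBLPROPERTIES employee").show(truncate = false)
```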
One advantage of storing the constraints in table properties is that the
feature works with all supported Hive metastore versions.
An alternative approach we considered was to store the constraint information
through the Hive metastore API, which keeps constraints in a separate metastore
table. The problem with that approach is that the feature would only work in
Spark installations that use a **Hive 2.1 metastore**, which is NOT the current
Spark default. More details are in the spec document.
**This PR implements the ALTER TABLE constraint DDL using table properties
because it is important that the feature work with Spark's default Hive
metastore version.**
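For context, the metastore version Spark talks to is a deployment-time setting; below is a minimal sketch using the standard configuration keys (1.2.1 is the Spark 2.x default, well below Hive 2.1):
```scala
// Sketch: because the constraints live in plain table properties, the DDL
// works regardless of which metastore version the session is configured for.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "1.2.1") // current Spark default
  .config("spark.sql.hive.metastore.jars", "builtin")  // Spark's built-in Hive client
  .getOrCreate()
```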
The syntax to define constraints as part of the _CREATE TABLE_ definition will
be implemented in a follow-up JIRA.
## How was this patch tested?
Added new unit test cases to HiveDDLSuite and SparkSqlParserSuite.
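For reference, a hedged sketch of the flavor of end-to-end check added (the test name and the assertion detail are assumptions; `withTable` and `sql` are the usual helpers available to Spark test suites):
```scala
// Illustrative only: written in the style of a HiveDDLSuite test case.
test("ALTER TABLE ADD CONSTRAINT records an informational primary key") {
  withTable("employee") {
    sql("CREATE TABLE employee (empno INT, name STRING)")
    sql("ALTER TABLE employee ADD CONSTRAINT pk PRIMARY KEY(empno) VALIDATE RELY")
    // Expect the constraint to surface among the table properties.
    val keys = sql("SHOW TBLPROPERTIES employee").collect().map(_.getString(0))
    assert(keys.exists(_.toLowerCase.contains("pk")))
  }
}
```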
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/sureshthalamati/spark alter_add_pk_fk_SPARK-21784
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18994.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18994
----
commit 4839e8419ca7360f0feafeceec8f3832102e3dba
Author: sureshthalamati <[email protected]>
Date: 2017-08-18T08:39:12Z
[SPARK-21784][SQL] Adds alter table add constraint DDL support to allow
users to define informational primary key and foreign key constraints on a table.
----