GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/11836

    [SPARK-14014] [SQL] Replace existing catalog with SessionCatalog

    ## What changes were proposed in this pull request?
    
    `SessionCatalog`, introduced in #11750, is a catalog that keeps track of 
temporary functions and tables, and delegates metastore operations to 
`ExternalCatalog`. This functionality overlaps a lot with the existing 
`analysis.Catalog`.
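
As a rough illustration of that split, here is a minimal sketch, not the actual Spark code. Only the class names `SessionCatalog` and `ExternalCatalog` come from this PR; every method name and signature, the `InMemoryExternalCatalog` helper, and the use of a plain string for a table's plan are simplifying assumptions. It also keeps the current database inside the session catalog, as one of the commits listed below does.

```scala
import scala.collection.mutable

// Simplified stand-in for the metastore-facing interface; the method names
// and signatures here are assumptions, not Spark's real ExternalCatalog API.
trait ExternalCatalog {
  def createTable(db: String, table: String): Unit
  def tableExists(db: String, table: String): Boolean
}

// Hypothetical in-memory implementation, used only to make the sketch runnable.
class InMemoryExternalCatalog extends ExternalCatalog {
  private val tables = mutable.Set.empty[(String, String)]

  override def createTable(db: String, table: String): Unit = {
    tables += ((db, table))
  }

  override def tableExists(db: String, table: String): Boolean =
    tables.contains((db, table))
}

// Per-session catalog: owns temporary tables and the current database,
// and forwards everything persistent to the shared ExternalCatalog.
class SessionCatalog(external: ExternalCatalog) {
  // Temporary table name -> placeholder for its logical plan (assumption).
  private val tempTables = mutable.Map.empty[String, String]
  private var currentDb: String = "default"

  def setCurrentDatabase(db: String): Unit = { currentDb = db }

  def createTempTable(name: String, plan: String): Unit = {
    tempTables.update(name, plan)
  }

  // Persistent tables go to the external catalog; an unspecified
  // database resolves to the session's current database.
  def createTable(name: String, db: Option[String] = None): Unit =
    external.createTable(db.getOrElse(currentDb), name)

  // Unqualified names check temporary tables first, then the metastore.
  def tableExists(name: String, db: Option[String] = None): Boolean =
    (db.isEmpty && tempTables.contains(name)) ||
      external.tableExists(db.getOrElse(currentDb), name)
}
```

In this sketch, two `SessionCatalog` instances built on the same `ExternalCatalog` see each other's persistent tables but not each other's temporary tables or current database.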
    
    As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be 
dead code. There are still things that need to be done after this patch, namely:
    - SPARK-14013: Properly implement temporary functions in `SessionCatalog`
    - SPARK-13879: Decide which DDL/DML commands to support natively in Spark
    - SPARK-?????: Implement the ones we do want to support through 
`SessionCatalog`.
    - SPARK-?????: Merge SQL/HiveContext
    
    ## How was this patch tested?
    
    This is largely a refactoring task so there are no new tests introduced. 
The particularly relevant tests are `SessionCatalogSuite` and 
`ExternalCatalogSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark use-session-catalog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11836.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11836
    
----
commit 9130563d025c9b3f7307c84b9b96e61a1f18091b
Author: Andrew Or <[email protected]>
Date:   2016-03-16T22:50:36Z

    Squashed commit of the following:
    
    commit ad43a5ffdeeb881aaed8944971b63a27d1f4257f
    Author: Andrew Or <[email protected]>
    Date:   Wed Mar 16 14:35:02 2016 -0700
    
        Expand test scope + clean up test code
    
    commit 08969cdcaf8196a30a3c879f956a8386fe400695
    Author: Andrew Or <[email protected]>
    Date:   Wed Mar 16 13:21:50 2016 -0700
    
        Fix tests
    
    commit 6d9fa2f946ac93ebc95a9f25cf515fb0ea54b17c
    Author: Andrew Or <[email protected]>
    Date:   Wed Mar 16 12:31:52 2016 -0700
    
        Keep track of current database in SessionCatalog
    
        This allows us to not pass it into every single method like
        we used to before this commit.
    
    commit ff1c2c4661986622e8071a39922e25033b3e62ab
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 19:42:22 2016 -0700
    
        Add TODO
    
    commit 8c84dd803829ffcb8c82ee2f593ef58c3c5c94c9
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 19:41:30 2016 -0700
    
        Implement tests for functions
    
    commit 3da16fb3473b750f13ffcbbb8aaf9a7de7292897
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 19:04:03 2016 -0700
    
        Implement tests for table partitions
    
    commit 794744565269bb9ffb00f8d7a81d7b703251f956
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 18:52:30 2016 -0700
    
        Implement tests for databases and tables
    
    commit 2f5121b43c938b2b585de0c3d80680c0ad5a8a7d
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 16:59:38 2016 -0700
    
        Fix infinite loop (woops)
    
    commit d3f252d4d21b91a22dd7277f983f84daa56d65b5
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 16:12:55 2016 -0700
    
        Refactor CatalogTestCases to make methods accessible
    
    commit caa4013e457a46ef0b8c3a2291cb375eb9064972
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 15:44:23 2016 -0700
    
        Clean up duplicate code in Table/FunctionIdentifier
    
    commit 90ccdbb22bd8baf8caf839047148ebfd326b3593
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 15:33:30 2016 -0700
    
        Fix style
    
    commit 5587a4995634af44ceecc9755165eb9a02bc0e5b
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 15:32:38 2016 -0700
    
        Implement SessionCatalog using ExternalCatalog
    
    commit 196f7ce1b9cfdcd607e363be10716c2dec409bd2
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 14:39:22 2016 -0700
    
        Document and clean up function methods
    
    commit 6d530a919c2f61e69d970625f77b99df5c93b019
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 14:38:50 2016 -0700
    
        Fix tests
    
    commit 2118212a6b5314838d322169c756714d9670d9ac
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 14:33:20 2016 -0700
    
        Refactor CatalogFunction to use FunctionIdentifier
    
    commit dd1fbaef9f53cb61cf726b95fe2bd1a845afa2c3
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 14:22:37 2016 -0700
    
        Refactor CatalogTable to use TableIdentifier
    
        This is a standalone commit so that in the future we can split
        it out into a separate patch if preferable.
    
    commit 39a153c1b5ac495766eed13c5bb5e5f1135a4e4f
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 13:53:42 2016 -0700
    
        Take into account current database in table methods
    
    commit 5bf695c686d84df500b36713b2ef86226615f3c6
    Author: Andrew Or <[email protected]>
    Date:   Mon Mar 14 17:14:59 2016 -0700
    
        Do the same for functions and partitions
    
    commit 1d12578708da845fe309d3aae1dcdadfee1dee89
    Author: Andrew Or <[email protected]>
    Date:   Mon Mar 14 16:27:11 2016 -0700
    
        Clean up table method signatures + add comments
    
    commit 98c8a3b922168b843fe648664fc0e8ac2f930472
    Author: Andrew Or <[email protected]>
    Date:   Thu Mar 10 16:35:35 2016 -0800
    
        Merge in @yhuai's changes

commit aa80f9cbf232d1d7251e5e7272e0d71a2cf70cad
Author: Andrew Or <[email protected]>
Date:   2016-03-17T00:24:18Z

    Refactor SQLContext etc. to take in ExternalCatalog
    
    We need to be able to pass in ExternalCatalog in the constructor
    of SQLContext and its subclasses because the external catalog
    should persist across sessions. Unfortunately, without significant
    refactoring in the HiveContext and TestHive code, we cannot make
    this simple change happen.

commit 1f1dd007124ab92ff7f064322216c934fbf497c1
Author: Andrew Or <[email protected]>
Date:   2016-03-17T18:48:31Z

    Attempt to remove old catalog from SessionState
    
    This failed because SessionCatalog does not implement
    refreshTable. This is a bigger problem because SessionCatalog
    has no notion of caching tables in the first place and so it
    doesn't really make sense to implement refreshTable. More
    refactoring involving HiveMetastoreCatalog is required to
    make this work.

commit 5daa696a9c02e0ab87d658c735472ce24e936261
Author: Andrew Or <[email protected]>
Date:   2016-03-17T19:07:20Z

    Merge branch 'master' of github.com:apache/spark into use-session-catalog

commit 71a01e04859f307ff11dda3cabcb7188acb83117
Author: Andrew Or <[email protected]>
Date:   2016-03-17T19:14:38Z

    Fix style

commit 9f5154f46b6e78aa74f6a1f86070657ba31c6c03
Author: Andrew Or <[email protected]>
Date:   2016-03-17T22:16:37Z

    Replace all usages of analysis.Catalog
    
    This commit deletes the trait analysis.Catalog and all of its
    subclasses, with one notable exception: HiveMetastoreCatalog
    is kept because a lot of existing functionality (like caching
    data source tables) is still needed. All other occurrences
    are now replaced with SessionCatalog.
    
    Unfortunately, because HiveMetastoreCatalog is a massive
    sprawl of unmaintainable code, there is no clean way to
    integrate it nicely with the new HiveCatalog. The path of
    least resistance, then, is to route previous usages of
    HiveMetastoreCatalog through HiveCatalog. This requires
    some wacky initialization order hacks because HMC takes
    in HiveContext but HiveContext takes in HiveCatalog.

commit 78cbcbd28574c7d1711c7d5b6746f5d9d5b7fa69
Author: Andrew Or <[email protected]>
Date:   2016-03-18T20:24:13Z

    Fix tests
    
    The biggest change here is moving HiveMetastoreCatalog from
    HiveCatalog (the external one) to HiveSessionCatalog (the
    session-specific one). This is needed because HMC depends on a lot
    of session-specific things, e.g. for creating data source tables.
    The old arrangement was failing tests that do things with multiple
    sessions, i.e. HiveQuerySuite.

commit 5e1648074ffb96f1b2104dc5ea3d78d25e505181
Author: Andrew Or <[email protected]>
Date:   2016-03-18T22:52:00Z

    Fix tests round 2
    
    There were some issues with case sensitivity analysis and error
    messages not being exactly as expected. The latter is now relaxed
    where possible.

----

