GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/11918

    [SPARK-14014] [SQL] Replace existing catalog with SessionCatalog

    ## What changes were proposed in this pull request?
    
    SessionCatalog, introduced in #11750, is a catalog that keeps track of 
temporary functions and tables, and delegates metastore operations to 
ExternalCatalog. This functionality overlaps a lot with the existing 
analysis.Catalog.
    
    As of this commit, SessionCatalog and ExternalCatalog will no longer be 
dead code. There are still things that need to be done after this patch, namely:
    
    * SPARK-14013: Properly implement temporary functions in SessionCatalog
    * SPARK-13879: Decide which DDL/DML commands to support natively in Spark
    * SPARK-?????: Implement the ones we do want to support through 
SessionCatalog.
    * SPARK-?????: Merge SQL/HiveContext
    
    
    ## How was this patch tested?
    
    This is largely a refactoring task so there are no new tests introduced. 
The particularly relevant tests are SessionCatalogSuite and 
ExternalCatalogSuite.
    
    NOTE: This one has an extra commit on top of 
https://github.com/apache/spark/pull/11836 for fixing python tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark use-session-catalog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11918.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11918
    
----
commit 9130563d025c9b3f7307c84b9b96e61a1f18091b
Author: Andrew Or <[email protected]>
Date:   2016-03-16T22:50:36Z

    Squashed commit of the following:
    
    commit ad43a5ffdeeb881aaed8944971b63a27d1f4257f
    Author: Andrew Or <[email protected]>
    Date:   Wed Mar 16 14:35:02 2016 -0700
    
        Expand test scope + clean up test code
    
    commit 08969cdcaf8196a30a3c879f956a8386fe400695
    Author: Andrew Or <[email protected]>
    Date:   Wed Mar 16 13:21:50 2016 -0700
    
        Fix tests
    
    commit 6d9fa2f946ac93ebc95a9f25cf515fb0ea54b17c
    Author: Andrew Or <[email protected]>
    Date:   Wed Mar 16 12:31:52 2016 -0700
    
        Keep track of current database in SessionCatalog
    
        This allows us to not pass it into every single method like
        we used to before this commit.
    
    commit ff1c2c4661986622e8071a39922e25033b3e62ab
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 19:42:22 2016 -0700
    
        Add TODO
    
    commit 8c84dd803829ffcb8c82ee2f593ef58c3c5c94c9
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 19:41:30 2016 -0700
    
        Implement tests for functions
    
    commit 3da16fb3473b750f13ffcbbb8aaf9a7de7292897
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 19:04:03 2016 -0700
    
        Implement tests for table partitions
    
    commit 794744565269bb9ffb00f8d7a81d7b703251f956
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 18:52:30 2016 -0700
    
        Implement tests for databases and tables
    
    commit 2f5121b43c938b2b585de0c3d80680c0ad5a8a7d
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 16:59:38 2016 -0700
    
        Fix infinite loop (woops)
    
    commit d3f252d4d21b91a22dd7277f983f84daa56d65b5
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 16:12:55 2016 -0700
    
        Refactor CatalogTestCases to make methods accessible
    
    commit caa4013e457a46ef0b8c3a2291cb375eb9064972
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 15:44:23 2016 -0700
    
        Clean up duplicate code in Table/FunctionIdentifier
    
    commit 90ccdbb22bd8baf8caf839047148ebfd326b3593
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 15:33:30 2016 -0700
    
        Fix style
    
    commit 5587a4995634af44ceecc9755165eb9a02bc0e5b
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 15:32:38 2016 -0700
    
        Implement SessionCatalog using ExternalCatalog
    
    commit 196f7ce1b9cfdcd607e363be10716c2dec409bd2
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 14:39:22 2016 -0700
    
        Document and clean up function methods
    
    commit 6d530a919c2f61e69d970625f77b99df5c93b019
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 14:38:50 2016 -0700
    
        Fix tests
    
    commit 2118212a6b5314838d322169c756714d9670d9ac
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 14:33:20 2016 -0700
    
        Refactor CatalogFunction to use FunctionIdentifier
    
    commit dd1fbaef9f53cb61cf726b95fe2bd1a845afa2c3
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 14:22:37 2016 -0700
    
        Refactor CatalogTable to use TableIdentifier
    
        This is a standalone commit such that in the future we can split
        it out into a separate patch if preferrable.
    
    commit 39a153c1b5ac495766eed13c5bb5e5f1135a4e4f
    Author: Andrew Or <[email protected]>
    Date:   Tue Mar 15 13:53:42 2016 -0700
    
        Take into account current database in table methods
    
    commit 5bf695c686d84df500b36713b2ef86226615f3c6
    Author: Andrew Or <[email protected]>
    Date:   Mon Mar 14 17:14:59 2016 -0700
    
        Do the same for functions and partitions
    
    commit 1d12578708da845fe309d3aae1dcdadfee1dee89
    Author: Andrew Or <[email protected]>
    Date:   Mon Mar 14 16:27:11 2016 -0700
    
        Clean up table method signatures + add comments
    
    commit 98c8a3b922168b843fe648664fc0e8ac2f930472
    Author: Andrew Or <[email protected]>
    Date:   Thu Mar 10 16:35:35 2016 -0800
    
        Merge in @yhuai's changes

commit aa80f9cbf232d1d7251e5e7272e0d71a2cf70cad
Author: Andrew Or <[email protected]>
Date:   2016-03-17T00:24:18Z

    Refactor SQLContext etc. to take in ExternalCatalog
    
    We need to be able to pass in ExternalCatalog in the constructor
    of SQLContext and subclasses because these should be persistent
    across sessions. Unfortunately without significant refactoring
    in the HiveContext and TestHive code we cannot make this simple
    change happen.

commit 1f1dd007124ab92ff7f064322216c934fbf497c1
Author: Andrew Or <[email protected]>
Date:   2016-03-17T18:48:31Z

    Attempt to remove old catalog from SessionState
    
    This failed because SessionCatalog does not implement
    refreshTable. This is a bigger problem because SessionCatalog
    has no notion of caching tables in the first place and so it
    doesn't really make sense to implement refreshTable. More
    refactoring involving HiveMetastoreCatalog is required to
    make this work.

commit 5daa696a9c02e0ab87d658c735472ce24e936261
Author: Andrew Or <[email protected]>
Date:   2016-03-17T19:07:20Z

    Merge branch 'master' of github.com:apache/spark into use-session-catalog

commit 71a01e04859f307ff11dda3cabcb7188acb83117
Author: Andrew Or <[email protected]>
Date:   2016-03-17T19:14:38Z

    Fix style

commit 9f5154f46b6e78aa74f6a1f86070657ba31c6c03
Author: Andrew Or <[email protected]>
Date:   2016-03-17T22:16:37Z

    Replace all usages of analysis.Catalog
    
    This commit deletes the trait analysis.Catalog and all of its
    subclasses, with one notable exception: HiveMetastoreCatalog
    is kept because a lot of existing functionality (like caching
    data source tables) are still needed. All other occurrences
    are now replaced with SessionCatalog.
    
    Unfortunately, because HiveMetastoreCatalog is a massive
    sprawl of unmaintainable code, there is no clean way to
    integrate it nicely with the new HiveCatalog. The path of
    least resistance, then, route previous usages of
    HiveMetastoreCatalog through HiveCatalog. This requires
    some whacky initialization order hacks because HMC takes
    in HiveContext but HiveContext takes in HiveCatalog.

commit 78cbcbd28574c7d1711c7d5b6746f5d9d5b7fa69
Author: Andrew Or <[email protected]>
Date:   2016-03-18T20:24:13Z

    Fix tests
    
    The biggest change here is moving HiveMetastoreCatalog from
    HiveCatalog (the external one) to HiveSessionCatalog (the session
    specific one). This is needed because HMC depends on a lot of
    session specific things for, e.g. creating data source tables.
    This was failing tests that do things with multiple sessions,
    i.e. HiveQuerySuite.

commit 5e1648074ffb96f1b2104dc5ea3d78d25e505181
Author: Andrew Or <[email protected]>
Date:   2016-03-18T22:52:00Z

    Fix tests round 2
    
    There were some issues with case sensitivity analysis and error
    messages not being exactly as expected. The latter is now relaxed
    where possible.

commit 57c8c29d30ca29301581be60e22bcba58832a9c1
Author: Andrew Or <[email protected]>
Date:   2016-03-18T23:29:31Z

    Fix MiMa

commit c439280820a3478c45b64de8c605b0cc0f96e1a1
Author: Andrew Or <[email protected]>
Date:   2016-03-18T23:29:45Z

    Merge branch 'master' of github.com:apache/spark into use-session-catalog

commit a3c6bf7e9c0c30912872828517968b43826c356a
Author: Andrew Or <[email protected]>
Date:   2016-03-18T23:39:33Z

    Minor fixes

commit 193d93c670538a3fb7b64ea372a42c96d603de03
Author: Andrew Or <[email protected]>
Date:   2016-03-18T23:40:39Z

    sessionState.sessionCatalog -> sessionState.catalog

commit f089e2bebacc000ac65a0a14b1124c0c5a1e860c
Author: Andrew Or <[email protected]>
Date:   2016-03-18T23:43:55Z

    Fix tests round 3 (small round)

commit 9cd89f8d952b6577a9ce8e28e60cec8f1745887c
Author: Andrew Or <[email protected]>
Date:   2016-03-19T18:06:26Z

    Merge branch 'master' of github.com:apache/spark into use-session-catalog

commit f41346b79e436e83be3dd41bc63b1b6f33122b02
Author: Andrew Or <[email protected]>
Date:   2016-03-19T18:07:32Z

    Don't bother sessionizing HiveCatalog

commit 4b37d7aae3bdaaf61dba18d23dae2c7da9938a5f
Author: Andrew Or <[email protected]>
Date:   2016-03-19T18:52:16Z

    Fix tests (round 4) - ignored test in CliSuite
    
    Note: This commit ignores a test in CliSuite. There a future
    timed out and I investigated for like half an hour and could
    not figure out why. It has something to do with the way we set
    the current database and executing commands with "-e". This
    will take a little longer to debug so I prefer to do that in
    a separate patch.

commit 1e72b0af0f03fe1149c502c52eea10497cda0f74
Author: Andrew Or <[email protected]>
Date:   2016-03-21T18:17:55Z

    Merge branch 'master' of github.com:apache/spark into use-session-catalog

commit 52e027367dc03fcdec1aab7792f6e332e16f14a7
Author: Andrew Or <[email protected]>
Date:   2016-03-21T18:45:06Z

    Clear temp tables after each suite

commit 19750d74230e1839c0b678be946b79e5afe43261
Author: Andrew Or <[email protected]>
Date:   2016-03-21T18:51:27Z

    Require DB exists before showing tables on them

commit 561ca3ce16d4e4fbd1bc77c4484cefeed45f9f7d
Author: Andrew Or <[email protected]>
Date:   2016-03-21T19:58:17Z

    Fix tests

commit b9de78c980bca3738cb493056326cab1c81ed343
Author: Andrew Or <[email protected]>
Date:   2016-03-21T21:10:56Z

    Fix MultiDatabaseSuite

commit 536cea2382ad3349b20cdccafcf1f235bc9dc9d1
Author: Andrew Or <[email protected]>
Date:   2016-03-22T17:14:51Z

    Merge branch 'master' of github.com:apache/spark into use-session-catalog

commit 4133d3f64747987728a0db227d32d3001e846996
Author: Andrew Or <[email protected]>
Date:   2016-03-22T17:57:39Z

    Fix HiveUDFSuite + add tests
    
    The problem was that the metadataHive didn't get any of the
    spark.sql.* confs, so the barrier prefixes weren't actually set.
    Thanks to @yhuai for uncovering this.

commit 159e51cdf6a38d26d8082a40daf6b3db70675232
Author: Andrew Or <[email protected]>
Date:   2016-03-22T21:07:36Z

    Fix HiveCompatibilitySuite?
    
    The issue is that after each test we only set the current
    database in Hive but not the one in SessionCatalog. This means
    the next test will create a table in the default database (since
    we just pass CREATE TABLE commands to hive currently) but try
    to resolve it in a database left over from a previous test.

commit 542283cdd6c4a26a127c0134ed4316bf33b4f617
Author: Andrew Or <[email protected]>
Date:   2016-03-22T21:27:03Z

    Fix CliSuite
    
    We were expecting an "OK" that never came. This test is way
    to specific anyway and is super brittle. It's also better to
    alawys set the current database through the catalog so we don't
    end up with mismatched current databases between Spark and Hive.

commit 98751ccf97345883310819655139ead59e877c07
Author: Andrew Or <[email protected]>
Date:   2016-03-22T21:28:50Z

    Merge branch 'master' of github.com:apache/spark into use-session-catalog

commit 16a54bad76a8297f15417ba931d20c4d86092c84
Author: Andrew Or <[email protected]>
Date:   2016-03-22T21:41:03Z

    Fix HiveQuerySuite?
    
    Every time we called TestHive.reset() we created a new temp
    directory for derby, and then we would go ahead and override
    the old one in the same TestHiveContext. This fails tests that
    use multiple sessions for some reason. Setting the same confs
    in metadataHive whenever we call reset() seems unnecessary,
    so I removed it.

commit 3439dc216dbaf6b7ab23246d36d9ba4bf52847ed
Author: Andrew Or <[email protected]>
Date:   2016-03-22T23:57:46Z

    Ignore new test for now...

commit e5525581d6b92b4306076fae75a7321fe346e650
Author: Andrew Or <[email protected]>
Date:   2016-03-23T00:48:07Z

    Fix HiveContextSuite?

commit 5ea8469aafd347a7d1e69077de8d31a8f0167b25
Author: Andrew Or <[email protected]>
Date:   2016-03-23T05:20:06Z

    Revert "Fix HiveContextSuite?"
    
    This reverts commit e5525581d6b92b4306076fae75a7321fe346e650.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to