GitHub user andrewor14 opened a pull request:
https://github.com/apache/spark/pull/11836
[SPARK-14014] [SQL] Replace existing catalog with SessionCatalog
## What changes were proposed in this pull request?
`SessionCatalog`, introduced in #11750, is a catalog that keeps track of
temporary functions and tables, and delegates metastore operations to
`ExternalCatalog`. This functionality overlaps a lot with the existing
`analysis.Catalog`.
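
The split described above can be pictured with a minimal sketch. Every name below (`ExternalCatalogSketch`, `InMemoryExternalCatalogSketch`, `SessionCatalogSketch`, and their methods) is a hypothetical stand-in, not an actual class or signature from this patch. The rule that temporary tables are checked before permanent ones is also an assumption made for illustration.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins; these are not Spark's classes or signatures.
trait ExternalCatalogSketch {
  def tableExists(db: String, table: String): Boolean
}

// In-memory stand-in for a persistent metastore that outlives any one session.
class InMemoryExternalCatalogSketch extends ExternalCatalogSketch {
  private val tables = mutable.Set.empty[(String, String)]

  def createTable(db: String, table: String): Unit = tables += ((db, table))

  def tableExists(db: String, table: String): Boolean =
    tables.contains((db, table))
}

// Session-scoped catalog: owns temporary tables and delegates the rest.
class SessionCatalogSketch(external: ExternalCatalogSketch) {
  private val tempTables = mutable.Set.empty[String]

  def registerTempTable(name: String): Unit = tempTables += name

  // Assumption: temporary tables are checked first, then the external catalog.
  def tableExists(db: String, table: String): Boolean =
    tempTables.contains(table) || external.tableExists(db, table)
}
```

In this sketch, two sessions built on the same external catalog see the same permanent tables, while a temporary table registered in one session stays invisible to the other.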
As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be
dead code. There are still things that need to be done after this patch, namely:
- SPARK-14013: Properly implement temporary functions in `SessionCatalog`
- SPARK-13879: Decide which DDL/DML commands to support natively in Spark
- SPARK-?????: Implement the ones we do want to support through
`SessionCatalog`.
- SPARK-?????: Merge SQL/HiveContext
## How was this patch tested?
This is largely a refactoring task so there are no new tests introduced.
The particularly relevant tests are `SessionCatalogSuite` and
`ExternalCatalogSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark use-session-catalog
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11836.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11836
----
commit 9130563d025c9b3f7307c84b9b96e61a1f18091b
Author: Andrew Or <[email protected]>
Date: 2016-03-16T22:50:36Z
Squashed commit of the following:
commit ad43a5ffdeeb881aaed8944971b63a27d1f4257f
Author: Andrew Or <[email protected]>
Date: Wed Mar 16 14:35:02 2016 -0700
Expand test scope + clean up test code
commit 08969cdcaf8196a30a3c879f956a8386fe400695
Author: Andrew Or <[email protected]>
Date: Wed Mar 16 13:21:50 2016 -0700
Fix tests
commit 6d9fa2f946ac93ebc95a9f25cf515fb0ea54b17c
Author: Andrew Or <[email protected]>
Date: Wed Mar 16 12:31:52 2016 -0700
Keep track of current database in SessionCatalog
This allows us to not pass it into every single method like
we used to before this commit.
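
A rough sketch of the idea in this commit, under stated assumptions: the class name, the method names, and the initial database `"default"` are all hypothetical and are not taken from the patch. It only illustrates how holding the current database in the catalog lets callers omit it.

```scala
// Hypothetical sketch; names and signatures are illustrative, not Spark's API.
class CurrentDbCatalogSketch {
  // Assumption: a session starts in a database named "default".
  private var currentDb: String = "default"

  def setCurrentDatabase(db: String): Unit = currentDb = db

  def getCurrentDatabase: String = currentDb

  // An explicit database wins; otherwise fall back to the session's current one.
  def qualify(table: String, db: Option[String] = None): String =
    s"${db.getOrElse(currentDb)}.$table"
}
```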
commit ff1c2c4661986622e8071a39922e25033b3e62ab
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 19:42:22 2016 -0700
Add TODO
commit 8c84dd803829ffcb8c82ee2f593ef58c3c5c94c9
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 19:41:30 2016 -0700
Implement tests for functions
commit 3da16fb3473b750f13ffcbbb8aaf9a7de7292897
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 19:04:03 2016 -0700
Implement tests for table partitions
commit 794744565269bb9ffb00f8d7a81d7b703251f956
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 18:52:30 2016 -0700
Implement tests for databases and tables
commit 2f5121b43c938b2b585de0c3d80680c0ad5a8a7d
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 16:59:38 2016 -0700
Fix infinite loop (woops)
commit d3f252d4d21b91a22dd7277f983f84daa56d65b5
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 16:12:55 2016 -0700
Refactor CatalogTestCases to make methods accessible
commit caa4013e457a46ef0b8c3a2291cb375eb9064972
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 15:44:23 2016 -0700
Clean up duplicate code in Table/FunctionIdentifier
commit 90ccdbb22bd8baf8caf839047148ebfd326b3593
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 15:33:30 2016 -0700
Fix style
commit 5587a4995634af44ceecc9755165eb9a02bc0e5b
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 15:32:38 2016 -0700
Implement SessionCatalog using ExternalCatalog
commit 196f7ce1b9cfdcd607e363be10716c2dec409bd2
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 14:39:22 2016 -0700
Document and clean up function methods
commit 6d530a919c2f61e69d970625f77b99df5c93b019
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 14:38:50 2016 -0700
Fix tests
commit 2118212a6b5314838d322169c756714d9670d9ac
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 14:33:20 2016 -0700
Refactor CatalogFunction to use FunctionIdentifier
commit dd1fbaef9f53cb61cf726b95fe2bd1a845afa2c3
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 14:22:37 2016 -0700
Refactor CatalogTable to use TableIdentifier
This is a standalone commit such that in the future we can split
it out into a separate patch if preferable.
commit 39a153c1b5ac495766eed13c5bb5e5f1135a4e4f
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 13:53:42 2016 -0700
Take into account current database in table methods
commit 5bf695c686d84df500b36713b2ef86226615f3c6
Author: Andrew Or <[email protected]>
Date: Mon Mar 14 17:14:59 2016 -0700
Do the same for functions and partitions
commit 1d12578708da845fe309d3aae1dcdadfee1dee89
Author: Andrew Or <[email protected]>
Date: Mon Mar 14 16:27:11 2016 -0700
Clean up table method signatures + add comments
commit 98c8a3b922168b843fe648664fc0e8ac2f930472
Author: Andrew Or <[email protected]>
Date: Thu Mar 10 16:35:35 2016 -0800
Merge in @yhuai's changes
commit aa80f9cbf232d1d7251e5e7272e0d71a2cf70cad
Author: Andrew Or <[email protected]>
Date: 2016-03-17T00:24:18Z
Refactor SQLContext etc. to take in ExternalCatalog
We need to be able to pass in ExternalCatalog in the constructor
of SQLContext and subclasses because these should be persistent
across sessions. Unfortunately without significant refactoring
in the HiveContext and TestHive code we cannot make this simple
change happen.
commit 1f1dd007124ab92ff7f064322216c934fbf497c1
Author: Andrew Or <[email protected]>
Date: 2016-03-17T18:48:31Z
Attempt to remove old catalog from SessionState
This failed because SessionCatalog does not implement
refreshTable. This is a bigger problem because SessionCatalog
has no notion of caching tables in the first place and so it
doesn't really make sense to implement refreshTable. More
refactoring involving HiveMetastoreCatalog is required to
make this work.
commit 5daa696a9c02e0ab87d658c735472ce24e936261
Author: Andrew Or <[email protected]>
Date: 2016-03-17T19:07:20Z
Merge branch 'master' of github.com:apache/spark into use-session-catalog
commit 71a01e04859f307ff11dda3cabcb7188acb83117
Author: Andrew Or <[email protected]>
Date: 2016-03-17T19:14:38Z
Fix style
commit 9f5154f46b6e78aa74f6a1f86070657ba31c6c03
Author: Andrew Or <[email protected]>
Date: 2016-03-17T22:16:37Z
Replace all usages of analysis.Catalog
This commit deletes the trait analysis.Catalog and all of its
subclasses, with one notable exception: HiveMetastoreCatalog
is kept because a lot of existing functionality (like caching
data source tables) is still needed. All other occurrences
are now replaced with SessionCatalog.
Unfortunately, because HiveMetastoreCatalog is a massive
sprawl of unmaintainable code, there is no clean way to
integrate it nicely with the new HiveCatalog. The path of
least resistance, then, is to route previous usages of
HiveMetastoreCatalog through HiveCatalog. This requires
some wacky initialization order hacks because HMC takes
in HiveContext but HiveContext takes in HiveCatalog.
commit 78cbcbd28574c7d1711c7d5b6746f5d9d5b7fa69
Author: Andrew Or <[email protected]>
Date: 2016-03-18T20:24:13Z
Fix tests
The biggest change here is moving HiveMetastoreCatalog from
HiveCatalog (the external one) to HiveSessionCatalog (the
session-specific one). This is needed because HMC depends on a lot
of session-specific things for, e.g., creating data source tables.
This was failing tests that do things with multiple sessions,
i.e. HiveQuerySuite.
commit 5e1648074ffb96f1b2104dc5ea3d78d25e505181
Author: Andrew Or <[email protected]>
Date: 2016-03-18T22:52:00Z
Fix tests round 2
There were some issues with case sensitivity analysis and error
messages not being exactly as expected. The latter is now relaxed
where possible.
----