GitHub user yhuai opened a pull request:
https://github.com/apache/spark/pull/11918
[SPARK-14014] [SQL] Replace existing catalog with SessionCatalog
## What changes were proposed in this pull request?
SessionCatalog, introduced in #11750, is a catalog that keeps track of
temporary functions and tables, and delegates metastore operations to
ExternalCatalog. This functionality overlaps a lot with the existing
analysis.Catalog.
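The split of responsibilities described above can be sketched with a toy model. This is plain Python rather than Spark's Scala API, and every class and method name in it is hypothetical; the rule that a temporary table shadows a persistent one only when no database is given is also an assumption made for illustration.

```python
class InMemoryExternalCatalog:
    """Hypothetical stand-in for a metastore-backed external catalog."""

    def __init__(self):
        self._tables = {}  # (database, table name) -> metadata

    def create_table(self, db, name, metadata):
        self._tables[(db, name)] = metadata

    def get_table(self, db, name):
        return self._tables[(db, name)]


class ToySessionCatalog:
    """Keeps temporary tables per session and delegates everything else."""

    def __init__(self, external):
        self._external = external  # shared across sessions
        self._temp_tables = {}     # session-local: table name -> metadata

    def create_temp_table(self, name, metadata):
        self._temp_tables[name] = metadata

    def lookup_table(self, db, name):
        # Assumption: a temp table wins only when no database is specified.
        if db is None and name in self._temp_tables:
            return self._temp_tables[name]
        return self._external.get_table(db or "default", name)
```

Two `ToySessionCatalog` instances built on the same `InMemoryExternalCatalog` see the same persistent tables, but a temporary table created in one is invisible to the other.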
As of this commit, SessionCatalog and ExternalCatalog will no longer be
dead code. There are still things that need to be done after this patch, namely:
* SPARK-14013: Properly implement temporary functions in SessionCatalog
* SPARK-13879: Decide which DDL/DML commands to support natively in Spark
* SPARK-?????: Implement the ones we do want to support through
SessionCatalog.
* SPARK-?????: Merge SQL/HiveContext
## How was this patch tested?
This is largely a refactoring task so there are no new tests introduced.
The particularly relevant tests are SessionCatalogSuite and
ExternalCatalogSuite.
NOTE: This one has an extra commit on top of
https://github.com/apache/spark/pull/11836 for fixing python tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yhuai/spark use-session-catalog
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11918.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11918
----
commit 9130563d025c9b3f7307c84b9b96e61a1f18091b
Author: Andrew Or <[email protected]>
Date: 2016-03-16T22:50:36Z
Squashed commit of the following:
commit ad43a5ffdeeb881aaed8944971b63a27d1f4257f
Author: Andrew Or <[email protected]>
Date: Wed Mar 16 14:35:02 2016 -0700
Expand test scope + clean up test code
commit 08969cdcaf8196a30a3c879f956a8386fe400695
Author: Andrew Or <[email protected]>
Date: Wed Mar 16 13:21:50 2016 -0700
Fix tests
commit 6d9fa2f946ac93ebc95a9f25cf515fb0ea54b17c
Author: Andrew Or <[email protected]>
Date: Wed Mar 16 12:31:52 2016 -0700
Keep track of current database in SessionCatalog
This allows us to not pass it into every single method like
we used to before this commit.
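A minimal illustration of that idea, as a Python toy rather than the actual Scala code (the names `ToyCatalog`, `set_current_database`, `create_table`, and `table_exists` are hypothetical): the catalog stores the current database once, and methods fall back to it when the caller does not name a database.

```python
class ToyCatalog:
    """Hypothetical catalog that remembers the current database."""

    def __init__(self):
        self.current_db = "default"
        self._tables = set()  # (database, table name) pairs

    def set_current_database(self, db):
        self.current_db = db

    def create_table(self, name, db=None):
        # Fall back to the tracked current database when none is given.
        self._tables.add((db or self.current_db, name))

    def table_exists(self, name, db=None):
        return (db or self.current_db, name) in self._tables
```

Callers that do care about a specific database can still pass `db=` explicitly.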
commit ff1c2c4661986622e8071a39922e25033b3e62ab
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 19:42:22 2016 -0700
Add TODO
commit 8c84dd803829ffcb8c82ee2f593ef58c3c5c94c9
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 19:41:30 2016 -0700
Implement tests for functions
commit 3da16fb3473b750f13ffcbbb8aaf9a7de7292897
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 19:04:03 2016 -0700
Implement tests for table partitions
commit 794744565269bb9ffb00f8d7a81d7b703251f956
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 18:52:30 2016 -0700
Implement tests for databases and tables
commit 2f5121b43c938b2b585de0c3d80680c0ad5a8a7d
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 16:59:38 2016 -0700
Fix infinite loop (woops)
commit d3f252d4d21b91a22dd7277f983f84daa56d65b5
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 16:12:55 2016 -0700
Refactor CatalogTestCases to make methods accessible
commit caa4013e457a46ef0b8c3a2291cb375eb9064972
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 15:44:23 2016 -0700
Clean up duplicate code in Table/FunctionIdentifier
commit 90ccdbb22bd8baf8caf839047148ebfd326b3593
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 15:33:30 2016 -0700
Fix style
commit 5587a4995634af44ceecc9755165eb9a02bc0e5b
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 15:32:38 2016 -0700
Implement SessionCatalog using ExternalCatalog
commit 196f7ce1b9cfdcd607e363be10716c2dec409bd2
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 14:39:22 2016 -0700
Document and clean up function methods
commit 6d530a919c2f61e69d970625f77b99df5c93b019
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 14:38:50 2016 -0700
Fix tests
commit 2118212a6b5314838d322169c756714d9670d9ac
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 14:33:20 2016 -0700
Refactor CatalogFunction to use FunctionIdentifier
commit dd1fbaef9f53cb61cf726b95fe2bd1a845afa2c3
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 14:22:37 2016 -0700
Refactor CatalogTable to use TableIdentifier
This is a standalone commit such that in the future we can split
it out into a separate patch if preferable.
commit 39a153c1b5ac495766eed13c5bb5e5f1135a4e4f
Author: Andrew Or <[email protected]>
Date: Tue Mar 15 13:53:42 2016 -0700
Take into account current database in table methods
commit 5bf695c686d84df500b36713b2ef86226615f3c6
Author: Andrew Or <[email protected]>
Date: Mon Mar 14 17:14:59 2016 -0700
Do the same for functions and partitions
commit 1d12578708da845fe309d3aae1dcdadfee1dee89
Author: Andrew Or <[email protected]>
Date: Mon Mar 14 16:27:11 2016 -0700
Clean up table method signatures + add comments
commit 98c8a3b922168b843fe648664fc0e8ac2f930472
Author: Andrew Or <[email protected]>
Date: Thu Mar 10 16:35:35 2016 -0800
Merge in @yhuai's changes
commit aa80f9cbf232d1d7251e5e7272e0d71a2cf70cad
Author: Andrew Or <[email protected]>
Date: 2016-03-17T00:24:18Z
Refactor SQLContext etc. to take in ExternalCatalog
We need to be able to pass in ExternalCatalog in the constructor
of SQLContext and subclasses because these should be persistent
across sessions. Unfortunately without significant refactoring
in the HiveContext and TestHive code we cannot make this simple
change happen.
commit 1f1dd007124ab92ff7f064322216c934fbf497c1
Author: Andrew Or <[email protected]>
Date: 2016-03-17T18:48:31Z
Attempt to remove old catalog from SessionState
This failed because SessionCatalog does not implement
refreshTable. This is a bigger problem because SessionCatalog
has no notion of caching tables in the first place and so it
doesn't really make sense to implement refreshTable. More
refactoring involving HiveMetastoreCatalog is required to
make this work.
commit 5daa696a9c02e0ab87d658c735472ce24e936261
Author: Andrew Or <[email protected]>
Date: 2016-03-17T19:07:20Z
Merge branch 'master' of github.com:apache/spark into use-session-catalog
commit 71a01e04859f307ff11dda3cabcb7188acb83117
Author: Andrew Or <[email protected]>
Date: 2016-03-17T19:14:38Z
Fix style
commit 9f5154f46b6e78aa74f6a1f86070657ba31c6c03
Author: Andrew Or <[email protected]>
Date: 2016-03-17T22:16:37Z
Replace all usages of analysis.Catalog
This commit deletes the trait analysis.Catalog and all of its
subclasses, with one notable exception: HiveMetastoreCatalog
is kept because a lot of existing functionality (like caching
data source tables) is still needed. All other occurrences
are now replaced with SessionCatalog.
Unfortunately, because HiveMetastoreCatalog is a massive
sprawl of unmaintainable code, there is no clean way to
integrate it nicely with the new HiveCatalog. The path of
least resistance, then, is to route previous usages of
HiveMetastoreCatalog through HiveCatalog. This requires
some whacky initialization order hacks because HMC takes
in HiveContext but HiveContext takes in HiveCatalog.
commit 78cbcbd28574c7d1711c7d5b6746f5d9d5b7fa69
Author: Andrew Or <[email protected]>
Date: 2016-03-18T20:24:13Z
Fix tests
The biggest change here is moving HiveMetastoreCatalog from
HiveCatalog (the external one) to HiveSessionCatalog (the session
specific one). This is needed because HMC depends on a lot of
session specific things for, e.g. creating data source tables.
This was failing tests that do things with multiple sessions,
i.e. HiveQuerySuite.
commit 5e1648074ffb96f1b2104dc5ea3d78d25e505181
Author: Andrew Or <[email protected]>
Date: 2016-03-18T22:52:00Z
Fix tests round 2
There were some issues with case sensitivity analysis and error
messages not being exactly as expected. The latter is now relaxed
where possible.
commit 57c8c29d30ca29301581be60e22bcba58832a9c1
Author: Andrew Or <[email protected]>
Date: 2016-03-18T23:29:31Z
Fix MiMa
commit c439280820a3478c45b64de8c605b0cc0f96e1a1
Author: Andrew Or <[email protected]>
Date: 2016-03-18T23:29:45Z
Merge branch 'master' of github.com:apache/spark into use-session-catalog
commit a3c6bf7e9c0c30912872828517968b43826c356a
Author: Andrew Or <[email protected]>
Date: 2016-03-18T23:39:33Z
Minor fixes
commit 193d93c670538a3fb7b64ea372a42c96d603de03
Author: Andrew Or <[email protected]>
Date: 2016-03-18T23:40:39Z
sessionState.sessionCatalog -> sessionState.catalog
commit f089e2bebacc000ac65a0a14b1124c0c5a1e860c
Author: Andrew Or <[email protected]>
Date: 2016-03-18T23:43:55Z
Fix tests round 3 (small round)
commit 9cd89f8d952b6577a9ce8e28e60cec8f1745887c
Author: Andrew Or <[email protected]>
Date: 2016-03-19T18:06:26Z
Merge branch 'master' of github.com:apache/spark into use-session-catalog
commit f41346b79e436e83be3dd41bc63b1b6f33122b02
Author: Andrew Or <[email protected]>
Date: 2016-03-19T18:07:32Z
Don't bother sessionizing HiveCatalog
commit 4b37d7aae3bdaaf61dba18d23dae2c7da9938a5f
Author: Andrew Or <[email protected]>
Date: 2016-03-19T18:52:16Z
Fix tests (round 4) - ignored test in CliSuite
Note: This commit ignores a test in CliSuite. There, a future
timed out and I investigated for like half an hour and could
not figure out why. It has something to do with the way we set
the current database and executing commands with "-e". This
will take a little longer to debug so I prefer to do that in
a separate patch.
commit 1e72b0af0f03fe1149c502c52eea10497cda0f74
Author: Andrew Or <[email protected]>
Date: 2016-03-21T18:17:55Z
Merge branch 'master' of github.com:apache/spark into use-session-catalog
commit 52e027367dc03fcdec1aab7792f6e332e16f14a7
Author: Andrew Or <[email protected]>
Date: 2016-03-21T18:45:06Z
Clear temp tables after each suite
commit 19750d74230e1839c0b678be946b79e5afe43261
Author: Andrew Or <[email protected]>
Date: 2016-03-21T18:51:27Z
Require DB exists before showing tables on them
commit 561ca3ce16d4e4fbd1bc77c4484cefeed45f9f7d
Author: Andrew Or <[email protected]>
Date: 2016-03-21T19:58:17Z
Fix tests
commit b9de78c980bca3738cb493056326cab1c81ed343
Author: Andrew Or <[email protected]>
Date: 2016-03-21T21:10:56Z
Fix MultiDatabaseSuite
commit 536cea2382ad3349b20cdccafcf1f235bc9dc9d1
Author: Andrew Or <[email protected]>
Date: 2016-03-22T17:14:51Z
Merge branch 'master' of github.com:apache/spark into use-session-catalog
commit 4133d3f64747987728a0db227d32d3001e846996
Author: Andrew Or <[email protected]>
Date: 2016-03-22T17:57:39Z
Fix HiveUDFSuite + add tests
The problem was that the metadataHive didn't get any of the
spark.sql.* confs, so the barrier prefixes weren't actually set.
Thanks to @yhuai for uncovering this.
commit 159e51cdf6a38d26d8082a40daf6b3db70675232
Author: Andrew Or <[email protected]>
Date: 2016-03-22T21:07:36Z
Fix HiveCompatibilitySuite?
The issue is that after each test we only set the current
database in Hive but not the one in SessionCatalog. This means
the next test will create a table in the default database (since
we just pass CREATE TABLE commands to hive currently) but try
to resolve it in a database left over from a previous test.
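The failure mode described in this commit message can be reproduced with a toy model. This is a Python sketch, not the real test harness; `ToyHive`, `ToySession`, `reset_hive_only`, and `reset_both` are hypothetical names, and the model assumes only what the message states: table creation goes through Hive's current database, while resolution uses the session's.

```python
class ToyHive:
    """Hypothetical Hive client that owns its own current database."""

    def __init__(self):
        self.current_db = "default"
        self.tables = set()  # (database, table name) pairs

    def run_create_table(self, name):
        # CREATE TABLE is passed straight through, so Hive's database applies.
        self.tables.add((self.current_db, name))


class ToySession:
    """Hypothetical session catalog with a separate current database."""

    def __init__(self, hive):
        self.hive = hive
        self.current_db = "default"

    def resolve(self, name):
        return (self.current_db, name) in self.hive.tables


def reset_hive_only(session):
    # The buggy cleanup: the session still points at the old database.
    session.hive.current_db = "default"


def reset_both(session):
    # The fixed cleanup: both sides agree on the current database.
    session.hive.current_db = "default"
    session.current_db = "default"
```

If a test leaves both sides in some database `db1` and cleanup calls `reset_hive_only`, the next `run_create_table("t")` lands in `default` while `resolve("t")` still looks in `db1`.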
commit 542283cdd6c4a26a127c0134ed4316bf33b4f617
Author: Andrew Or <[email protected]>
Date: 2016-03-22T21:27:03Z
Fix CliSuite
We were expecting an "OK" that never came. This test is way
too specific anyway and is super brittle. It's also better to
always set the current database through the catalog so we don't
end up with mismatched current databases between Spark and Hive.
commit 98751ccf97345883310819655139ead59e877c07
Author: Andrew Or <[email protected]>
Date: 2016-03-22T21:28:50Z
Merge branch 'master' of github.com:apache/spark into use-session-catalog
commit 16a54bad76a8297f15417ba931d20c4d86092c84
Author: Andrew Or <[email protected]>
Date: 2016-03-22T21:41:03Z
Fix HiveQuerySuite?
Every time we called TestHive.reset() we created a new temp
directory for derby, and then we would go ahead and override
the old one in the same TestHiveContext. This fails tests that
use multiple sessions for some reason. Setting the same confs
in metadataHive whenever we call reset() seems unnecessary,
so I removed it.
commit 3439dc216dbaf6b7ab23246d36d9ba4bf52847ed
Author: Andrew Or <[email protected]>
Date: 2016-03-22T23:57:46Z
Ignore new test for now...
commit e5525581d6b92b4306076fae75a7321fe346e650
Author: Andrew Or <[email protected]>
Date: 2016-03-23T00:48:07Z
Fix HiveContextSuite?
commit 5ea8469aafd347a7d1e69077de8d31a8f0167b25
Author: Andrew Or <[email protected]>
Date: 2016-03-23T05:20:06Z
Revert "Fix HiveContextSuite?"
This reverts commit e5525581d6b92b4306076fae75a7321fe346e650.
----