GitHub user andrewor14 opened a pull request:
https://github.com/apache/spark/pull/11189
[SPARK-13080] [SQL] [WIP] Implement new Catalog API using Hive
This is a step towards merging `SQLContext` and `HiveContext`. A new
internal `Catalog` API was introduced in #10982 and extended in #11069. This
patch introduces an implementation of this API using `HiveClient`, an existing
interface to Hive. It also extends `HiveClient` with additional calls to Hive
that are needed to complete the catalog implementation.
The new class hierarchy is as follows:
```
org.apache.spark.sql.catalyst.catalog.Catalog
- org.apache.spark.sql.catalyst.catalog.InMemoryCatalog
- org.apache.spark.sql.hive.HiveCatalog
```
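To make the hierarchy concrete, here is a minimal sketch of one interface with an in-memory implementation behind it. The names `SketchCatalog` and `SketchInMemoryCatalog`, and the three methods shown, are simplified stand-ins invented for illustration; they are not the signatures in this patch. A Hive-backed implementation would presumably satisfy the same trait by delegating each call to `HiveClient`.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the internal Catalog API.
// The real API covers tables, partitions and functions as well.
trait SketchCatalog {
  def createDatabase(name: String, ignoreIfExists: Boolean): Unit
  def databaseExists(name: String): Boolean
  def listDatabases(): Seq[String]
}

// In-memory implementation: all state lives in a local collection.
final class SketchInMemoryCatalog extends SketchCatalog {
  // Insertion-ordered set of database names.
  private val databases = mutable.LinkedHashSet.empty[String]

  override def createDatabase(name: String, ignoreIfExists: Boolean): Unit = {
    if (databases.contains(name)) {
      // Creating an existing database is an error unless the caller opts out.
      if (!ignoreIfExists) {
        throw new IllegalArgumentException(s"Database '$name' already exists")
      }
    } else {
      databases += name
    }
  }

  override def databaseExists(name: String): Boolean = databases.contains(name)

  override def listDatabases(): Seq[String] = databases.toSeq
}
```

Callers that depend only on the trait would not need to know which implementation they were given.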
Note that, as of this patch, none of these classes is used anywhere.
That will come later, before the Spark 2.0 release.
WIP pending tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark hive-catalog
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11189.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11189
----
commit 3b6660578f23c69abfb59fae6796ee10bf4d482d
Author: Andrew Or <[email protected]>
Date: 2016-02-10T21:16:30Z
Add skeleton for HiveCatalog
commit f3e094ad21bd38d400f90b93898995182a508e9b
Author: Andrew Or <[email protected]>
Date: 2016-02-10T21:34:36Z
Implement createDatabase
commit 4b09a7da8ddcc17a813e494d868a6ea55f01cd2e
Author: Andrew Or <[email protected]>
Date: 2016-02-10T21:48:00Z
Fix style
commit 526f278d78664c49572fd1b48495ca99d12d1896
Author: Andrew Or <[email protected]>
Date: 2016-02-10T21:59:02Z
Implement dropDatabase
commit 4aa6e66b5ee9fa2e5f8e4b9955ed98de5b35a57c
Author: Andrew Or <[email protected]>
Date: 2016-02-10T22:06:08Z
Implement alterDatabase
commit 433d180260c57a905e226f0b8686eeb92d5dc938
Author: Andrew Or <[email protected]>
Date: 2016-02-10T22:14:15Z
Implement getDatabase, listDatabases and databaseExists
commit ff5c5bea8d4d84ae56acd4caf225e59231b946ba
Author: Andrew Or <[email protected]>
Date: 2016-02-10T23:18:53Z
Implement createTable
This required converting o.a.s.sql.catalyst.catalog.Table to its
counterpart, o.a.s.sql.hive.client.HiveTable. That in turn required
making o.a.s.sql.hive.client.TableType an enum, because we need
to create one of these from its name.
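As a rough illustration of creating a table type from its name, the sketch below uses a sealed class with a lookup method. `SketchTableType`, its members, and `fromName` are hypothetical names chosen for this example, not the code in the commit.

```scala
// Hypothetical stand-in for a table-type enum; names are illustrative only.
sealed abstract class SketchTableType(val name: String)

object SketchTableType {
  case object ManagedTable extends SketchTableType("MANAGED_TABLE")
  case object ExternalTable extends SketchTableType("EXTERNAL_TABLE")
  case object VirtualView extends SketchTableType("VIRTUAL_VIEW")

  // Every known table type, used for lookups by name.
  val values: Seq[SketchTableType] = Seq(ManagedTable, ExternalTable, VirtualView)

  // Resolve a table type from its string name; None if the name is unknown.
  def fromName(name: String): Option[SketchTableType] =
    values.find(_.name == name)
}
```

Returning an `Option` here is a design choice of the sketch: it forces the caller to decide what to do with an unrecognized name.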
commit ff49f0cf6fabc645121b43b5746017c838a3551d
Author: Andrew Or <[email protected]>
Date: 2016-02-10T23:22:38Z
Explicitly mark methods with override in HiveCatalog
commit ca98c00264564717ddd427282bfff301ebdb6c70
Author: Andrew Or <[email protected]>
Date: 2016-02-10T23:25:27Z
Implement dropTable
commit 71f99646cdf30a68a8e592b80ef5a6f40685551b
Author: Andrew Or <[email protected]>
Date: 2016-02-10T23:40:37Z
Implement renameTable, alterTable
commit 13795d83c325a69fb35260c300b379e2e55725aa
Author: Andrew Or <[email protected]>
Date: 2016-02-12T00:51:36Z
Remove intermediate representation of tables, columns etc.
Currently there's the catalog table, the Spark table used in the
hive module, and the Hive table. To avoid converting back and forth
between these table representations, we kill the intermediate one,
which is the one currently used throughout HiveClient and friends.
commit af5ffc0ee84f3dc3c2b9249228293ae7285f916e
Author: Andrew Or <[email protected]>
Date: 2016-02-12T01:34:24Z
Remove TableType enum
Instead, this commit introduces CatalogTableType that serves
the same purpose. This adds some type-safety and keeps the code
clean.
commit d7b18e628374659f0a792d5c5a9154711fc9073b
Author: Andrew Or <[email protected]>
Date: 2016-02-12T01:48:30Z
Re-implement all table operations after the refactor
commit a915d01eac651994c4d69b961299b476fe40f77d
Author: Andrew Or <[email protected]>
Date: 2016-02-12T20:50:39Z
Implement all partition operations
commit 3ceb88d51a6e6af92cff2e90622ba235d0d107e9
Author: Andrew Or <[email protected]>
Date: 2016-02-12T22:04:45Z
Implement all function operations
commit 07332ad6803e578d9a61cc4693d8ce665ad8c29a
Author: Andrew Or <[email protected]>
Date: 2016-02-12T22:10:33Z
Simplify alterDatabase
The operation doesn't support renaming anyway, so it doesn't
make sense to pass in a name AND a CatalogDatabase that always
has the same name.
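The shape of this simplification might look like the sketch below, where the database definition carries its own name and no separate name argument is needed. `SketchDatabase` and `SketchDatabaseStore` are hypothetical types made up for the example; the real `CatalogDatabase` has its own fields.

```scala
import scala.collection.mutable

// Hypothetical, pared-down database description.
final case class SketchDatabase(name: String, description: String)

final class SketchDatabaseStore {
  private val databases = mutable.Map.empty[String, SketchDatabase]

  def createDatabase(db: SketchDatabase): Unit =
    databases.update(db.name, db)

  // Simplified form: the target is identified by db.name, so a separate
  // name parameter would only duplicate information.
  def alterDatabase(db: SketchDatabase): Unit = {
    require(databases.contains(db.name), s"Database '${db.name}' does not exist")
    databases.update(db.name, db)
  }

  def getDatabase(name: String): Option[SketchDatabase] = databases.get(name)
}
```

With a single argument there is no way for the name and the definition to disagree.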
----