Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/3895#discussion_r26449025
--- Diff:
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala ---
@@ -297,7 +297,7 @@ private[hive] object HiveShim {
def getStatsSetupConstRawDataSize = StatsSetupConst.RAW_DATA_SIZE
def createDefaultDBIfNeeded(context: HiveContext) = {
- context.runSqlHive("CREATE DATABASE default")
+ context.runSqlHive("CREATE DATABASE IF NOT EXISTS default")
--- End diff --
This is a bit tricky to explain. When initializing a `TestHiveContext`, the
following things happen:
1. `HiveContext.hiveconf` is initialized (notice that the metastore and
warehouse paths point to whatever is configured in `hive-site.xml`, or to the
default locations)
2. `HiveContext.sessionState` is initialized
3. `TestHiveContext.configure()` is called; the metastore and warehouse paths
now point to temporary directories used for testing purposes, and no `default`
database is defined there.
4. `HiveShim.createDefaultDbIfNeeded()` is called to create the `default`
database in the temporary directories.
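The effect of the one-line diff above can be illustrated with a toy sketch (the `MetastoreSketch` object and its methods are hypothetical, purely for illustration, and not Spark or Hive code): a plain `CREATE DATABASE` fails when the database already exists, while `CREATE DATABASE IF NOT EXISTS` is idempotent and therefore safe even if the initialization order causes it to run more than once, or against a metastore that already contains the database.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a metastore, only to contrast the two DDL forms.
object MetastoreSketch {
  private val databases = mutable.Set.empty[String]

  // Mimics plain `CREATE DATABASE`: errors out if the database exists.
  def createDatabase(name: String): Unit = {
    require(!databases.contains(name), s"Database $name already exists")
    databases += name
  }

  // Mimics `CREATE DATABASE IF NOT EXISTS`: a no-op on repeated calls.
  def createDatabaseIfNotExists(name: String): Unit = {
    databases += name
  }
}
```

Under this sketch, calling `createDatabaseIfNotExists("default")` twice succeeds, whereas a second `createDatabase("default")` throws, which mirrors why the patched SQL is the safer choice during test-context initialization.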
As Michael [commented] [1], the `createDefaultDbIfNeeded` method is more
of a hack that papers over this initialization disorder. That's why I opened
baishuo/spark#2 against this PR branch, which fixes the root cause of the
initialization disorder instead.
[1]: https://github.com/apache/spark/pull/3895#issuecomment-80642173