maropu commented on a change in pull request #27827:
[SPARK-31066][SQL][test-hive1.2] Disable useless and uncleaned hive SessionState initialization parts
URL: https://github.com/apache/spark/pull/27827#discussion_r391406612
##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
##########
@@ -1232,4 +1204,42 @@ private[hive] object HiveClientImpl {
     StatsSetupConst.RAW_DATA_SIZE,
     StatsSetupConst.TOTAL_SIZE
   )
+
+  def newHiveConf(
+      sparkConf: SparkConf,
+      hadoopConf: JIterable[JMap.Entry[String, String]],
+      extraConfig: Map[String, String],
+      classLoader: Option[ClassLoader] = None): HiveConf = {
+    val hiveConf = new HiveConf(classOf[SessionState])
+    // HiveConf is a Hadoop Configuration, which has a field of classLoader and
+    // the initial value will be the current thread's context class loader.
+    // We call hiveConf.setClassLoader(initClassLoader) at here to ensure it use the classloader
+    // we want.
+    classLoader.foreach(hiveConf.setClassLoader)
+    // 1: Take all from the hadoopConf to this hiveConf.
+    // This hadoopConf contains user settings in Hadoop's core-site.xml file
+    // and Hive's hive-site.xml file. Note, we load hive-site.xml file manually in
+    // SharedState and put settings in this hadoopConf instead of relying on HiveConf
+    // to load user settings. Otherwise, HiveConf's initialize method will override
+    // settings in the hadoopConf. This issue only shows up when spark.sql.hive.metastore.jars
+    // is not set to builtin. When spark.sql.hive.metastore.jars is builtin, the classpath
+    // has hive-site.xml. So, HiveConf will use that to override its default values.
+    // 2: we set all spark confs to this hiveConf.
+    // 3: we set all entries in config to this hiveConf.
+    val confMap = (hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue) ++
+      sparkConf.getAll.toMap ++ extraConfig).toMap
+    confMap.foreach { case (k, v) => hiveConf.set(k, v) }
+    SQLConf.get.redactOptions(confMap).foreach { case (k, v) =>
+      logDebug(s"Applying Hadoop/Hive/Spark and extra properties to Hive Conf:$k=$v")
+    }
+    // Disable CBO because we removed the Calcite dependency.
+    hiveConf.setBoolean("hive.cbo.enable", false)
+    // If this is true, SessionState.start will create a file to log hive job which will not be
+    // deleted on exit and is useless for spark
+    hiveConf.setBoolean("hive.session.history.enabled", false)
+    // If this is tez engine, SessionState.start might bring extra logic to initialize tez stuff,
+    // which is useless for spark.
+    hiveConf.set("hive.execution.engine", "spark")
Review comment:
What do you think of this fix, @dongjoon-hyun @HyukjinKwon?
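
For reference, the precedence implied by steps 1 to 3 in the comments above can be sketched with plain Scala maps. This is only an illustration, not code from the PR: `ConfMergeSketch`, `merge`, and the keys and values are hypothetical, and plain `Map[String, String]` values stand in for `hadoopConf`, `sparkConf.getAll`, and `extraConfig`.

```scala
// Illustrative sketch only: plain maps stand in for the three config sources.
object ConfMergeSketch {
  // Merge in the same order as the diff: Hadoop first, then Spark, then extra.
  // With `++`, an entry from the right-hand operand replaces an entry with
  // the same key from the left-hand operand.
  def merge(
      hadoopConf: Map[String, String],
      sparkConf: Map[String, String],
      extraConfig: Map[String, String]): Map[String, String] = {
    hadoopConf ++ sparkConf ++ extraConfig
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical keys chosen to show each collision case.
    val merged = merge(
      hadoopConf = Map("k.all" -> "hadoop", "k.two" -> "hadoop", "k.one" -> "hadoop"),
      sparkConf = Map("k.all" -> "spark", "k.two" -> "spark"),
      extraConfig = Map("k.all" -> "extra"))
    merged.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Under these assumptions, `extraConfig` wins over `sparkConf`, which wins over `hadoopConf`. The three `hiveConf.set*` calls that follow the merge in the diff are applied afterwards, so they take effect regardless of what any of the three sources contain.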