[GitHub] [spark] maropu commented on a change in pull request #27827: [SPARK-31066][SQL][test-hive1.2] Disable useless and uncleaned hive SessionState initialization parts

GitBox Wed, 11 Mar 2020 22:21:25 -0700

maropu commented on a change in pull request #27827: [SPARK-31066][SQL][test-hive1.2] Disable useless and uncleaned hive SessionState initialization parts
URL: https://github.com/apache/spark/pull/27827#discussion_r391406612


 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
 ##########
 @@ -1232,4 +1204,42 @@ private[hive] object HiveClientImpl {
     StatsSetupConst.RAW_DATA_SIZE,
     StatsSetupConst.TOTAL_SIZE
   )
+
+  def newHiveConf(
+      sparkConf: SparkConf,
+      hadoopConf: JIterable[JMap.Entry[String, String]],
+      extraConfig: Map[String, String],
+      classLoader: Option[ClassLoader] = None): HiveConf = {
+    val hiveConf = new HiveConf(classOf[SessionState])
+    // HiveConf is a Hadoop Configuration, which has a classLoader field whose
+    // initial value is the current thread's context class loader.
+    // We call hiveConf.setClassLoader(classLoader) here to ensure that it uses
+    // the classloader we want.
+    classLoader.foreach(hiveConf.setClassLoader)
+    // 1: Take all entries from the hadoopConf into this hiveConf.
+    // This hadoopConf contains user settings from Hadoop's core-site.xml file
+    // and Hive's hive-site.xml file. Note: we load the hive-site.xml file manually in
+    // SharedState and put its settings in this hadoopConf instead of relying on HiveConf
+    // to load user settings. Otherwise, HiveConf's initialize method would override
+    // settings in the hadoopConf. This issue only shows up when spark.sql.hive.metastore.jars
+    // is not set to builtin. When spark.sql.hive.metastore.jars is builtin, the classpath
+    // contains hive-site.xml, so HiveConf uses it to override its default values.
+    // 2: Set all Spark confs on this hiveConf.
+    // 3: Set all entries in extraConfig on this hiveConf.
+    // For duplicate keys, later steps override earlier ones.
+    val confMap = (hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue) ++
+      sparkConf.getAll.toMap ++ extraConfig).toMap
+    confMap.foreach { case (k, v) => hiveConf.set(k, v) }
+    SQLConf.get.redactOptions(confMap).foreach { case (k, v) =>
+      logDebug(s"Applying Hadoop/Hive/Spark and extra properties to Hive Conf:$k=$v")
+    }
+    // Disable CBO because we removed the Calcite dependency.
+    hiveConf.setBoolean("hive.cbo.enable", false)
+    // If this is true, SessionState.start creates a file to log Hive jobs, which is
+    // not deleted on exit and is useless for Spark.
+    hiveConf.setBoolean("hive.session.history.enabled", false)
+    // If the execution engine is Tez, SessionState.start may run extra logic to
+    // initialize Tez, which is useless for Spark.
+    hiveConf.set("hive.execution.engine", "spark")
 
 Review comment:
   WDYT about this fix? @dongjoon-hyun @HyukjinKwon
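The override order in newHiveConf matters for this fix, so here is a minimal sketch of it. It uses plain immutable Scala maps in place of HiveConf and SparkConf (an assumption so that it runs without Spark or Hive on the classpath), and `mergeConfs` and `applyForcedSettings` are hypothetical helpers, not Spark APIs. For duplicate keys the right-hand side of `++` wins, so Spark confs override Hadoop/hive-site.xml values and extraConfig overrides both. Because the three hard-coded Hive settings are applied after the merge, they replace any user-supplied value for those keys.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// Sketch only: immutable Maps stand in for HiveConf/SparkConf.
// mergeConfs and applyForcedSettings are hypothetical helpers, not Spark APIs.
object ConfMergeSketch {

  // Same order as newHiveConf: Hadoop entries, then Spark confs, then extraConfig.
  // With Map ++ Map, the right-hand side wins for duplicate keys.
  def mergeConfs(
      hadoopConf: java.lang.Iterable[JMap.Entry[String, String]],
      sparkConf: Map[String, String],
      extraConfig: Map[String, String]): Map[String, String] = {
    val builder = Map.newBuilder[String, String]
    val it = hadoopConf.iterator()
    while (it.hasNext) {
      val entry = it.next()
      builder += (entry.getKey -> entry.getValue)
    }
    builder.result() ++ sparkConf ++ extraConfig
  }

  // Mirrors the hiveConf.set/setBoolean calls made after the merge:
  // they run last, so they replace any user-supplied value for these keys.
  def applyForcedSettings(conf: Map[String, String]): Map[String, String] =
    conf ++ Map(
      "hive.cbo.enable" -> "false",
      "hive.session.history.enabled" -> "false",
      "hive.execution.engine" -> "spark")

  def main(args: Array[String]): Unit = {
    val hadoop = new JHashMap[String, String]()
    hadoop.put("hive.execution.engine", "tez") // e.g. a value from hive-site.xml
    hadoop.put("hive.metastore.uris", "thrift://a:9083")
    val merged = mergeConfs(
      hadoop.entrySet(),
      Map("hive.metastore.uris" -> "thrift://b:9083"), // Spark conf overrides Hadoop
      Map.empty)
    applyForcedSettings(merged).toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Running `main` prints `hive.execution.engine=spark` even though the Hadoop-side value was `tez`, and `hive.metastore.uris=thrift://b:9083` from the Spark-side map.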
