Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19605#discussion_r147929746
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -89,10 +89,12 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
}
/**
- * Run some code involving `client` in a [[synchronized]] block and wrap certain
- * exceptions thrown in the process in [[AnalysisException]].
+ * Run some code involving `client` and wrap certain exceptions thrown in the process in
+ * [[AnalysisException]]. Thread-safety is guaranteed here because methods in the `client`
+ * ([[org.apache.spark.sql.hive.client.HiveClientImpl]]) are already synchronized through
+ * `clientLoader` in the `retryLocked` method.
*/
- private def withClient[T](body: => T): T = synchronized {
+ private def withClient[T](body: => T): T = {
--- End diff --
I went through all the synchronized methods in `HiveClient` (except `addJar`):
- `getState` is used only in tests.
- `setOut`, `setInfo` and `setError` are used only in `SparkSQLEnv.init()`.
- All other methods are called through `HiveExternalCatalog`.
So it seems `addJar` is the only exception.
To make `addJar` also go through `HiveExternalCatalog`, we can pass `externalCatalog` instead of `client` at [line 46 in HiveSessionStateBuilder](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L46).
But I don't know why we need to call `newSession()` at [line 45](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L45), where a new `HiveClientImpl` instance is created with the same class loader and Hive client.
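
To illustrate the locking argument, here is a minimal, self-contained sketch of the pattern under discussion. It is not the Spark code: `LoaderLock`, `SketchClient`, `SketchCatalog`, `SketchAnalysisException` and `SketchDemo` are hypothetical stand-ins, and the assumption is only that every client method takes a lock on one shared loader object, as the updated doc comment describes for `retryLocked`. Under that assumption the catalog-side wrapper needs no `synchronized` of its own and only translates exceptions.

```scala
// Hypothetical stand-ins for illustration only; none of these are Spark classes.
final class LoaderLock // plays the role of the shared `clientLoader`

final class SketchAnalysisException(msg: String, cause: Throwable)
  extends RuntimeException(msg, cause)

class SketchClient(loader: LoaderLock) {
  private var jars = Vector.empty[String]

  // Every public method funnels through this helper, which locks on the shared loader.
  private def retryLocked[A](body: => A): A = loader.synchronized { body }

  def addJar(path: String): Unit = retryLocked {
    if (path.isEmpty) throw new IllegalArgumentException("empty jar path")
    jars = jars :+ path
  }

  def listJars(): Vector[String] = retryLocked { jars }
}

class SketchCatalog(client: SketchClient) {
  // No outer `synchronized`: this wrapper only translates exceptions.
  private def withClient[T](body: => T): T =
    try body
    catch {
      case e: IllegalArgumentException =>
        throw new SketchAnalysisException(e.getMessage, e)
    }

  // Routing `addJar` through the catalog gives it the same wrapping as other calls.
  def addJar(path: String): Unit = withClient { client.addJar(path) }
  def listJars(): Vector[String] = withClient { client.listJars() }
}

object SketchDemo {
  // Eight threads add 100 jars each through the catalog; returns the final count.
  def runDemo(): Int = {
    val catalog = new SketchCatalog(new SketchClient(new LoaderLock))
    val threads = (1 to 8).map { i =>
      new Thread(new Runnable {
        def run(): Unit = (1 to 100).foreach(j => catalog.addJar(s"jar-$i-$j"))
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    catalog.listJars().size
  }
}
```

In this sketch the inner lock alone keeps concurrent `addJar` calls from losing updates. A caller that holds a `SketchClient` directly is still safe with respect to locking, but it skips the exception translation in `withClient`.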