[GitHub] spark pull request #19605: [SPARK-22394] [SQL] Remove redundant synchronizat...

wzhfy Tue, 31 Oct 2017 02:15:43 -0700

Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19605#discussion_r147929746
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -89,10 +89,12 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       }
     
       /**
    -   * Run some code involving `client` in a [[synchronized]] block and wrap certain
    -   * exceptions thrown in the process in [[AnalysisException]].
    +   * Run some code involving `client` and wrap certain exceptions thrown in the process in
    +   * [[AnalysisException]]. Thread-safety is guaranteed here because methods in the `client`
    +   * ([[org.apache.spark.sql.hive.client.HiveClientImpl]]) are already synchronized through
    +   * `clientLoader` in the `retryLocked` method.
        */
    -  private def withClient[T](body: => T): T = synchronized {
    +  private def withClient[T](body: => T): T = {
    --- End diff --
    
    I went through all the synchronized methods in `HiveClient` (except `addJar`):
    - `getState` is used only in tests.
    - `setOut`, `setInfo` and `setError` are used only in `SparkSQLEnv.init()`.
    - All other methods are called through `HiveExternalCatalog`.
    
    So it seems `addJar` is the only exception.
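
The reasoning in the doc comment above can be sketched with toy classes. This is a minimal illustration, not the Spark code: `ToyLoader`, `ToyClient`, `ToyCatalog` and `ToyDemo` are hypothetical names, and `locked` only stands in for the idea of `retryLocked` taking a lock on the shared `clientLoader`.

```scala
// Toy stand-ins only; none of these are actual Spark classes.
final class ToyLoader // plays the role of the shared lock object (`clientLoader`)

final class ToyClient(loader: ToyLoader) {
  private var counter = 0

  // Similar in spirit to `retryLocked`: every public method runs under the loader's monitor.
  private def locked[A](body: => A): A = loader.synchronized(body)

  def increment(): Int = locked { counter += 1; counter }
  def current: Int = locked(counter)
}

final class ToyCatalog(client: ToyClient) {
  // No outer `synchronized`: each individual client call is already serialized by `locked`.
  private def withClient[T](body: => T): T = body

  def bump(): Int = withClient(client.increment())
}

object ToyDemo {
  // Starts `threads` workers that each call `bump()` `perThread` times, then returns the total.
  def run(threads: Int, perThread: Int): Int = {
    val client = new ToyClient(new ToyLoader)
    val catalog = new ToyCatalog(client)
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(_ => catalog.bump())
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    client.current
  }
}
```

One caveat of this pattern: the inner lock only makes each client call atomic. If a `withClient` body issues several client calls, dropping the outer lock means another thread can interleave between them.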
    
    To make `addJar` also go through `HiveExternalCatalog`, we can pass `externalCatalog` instead of `client` at [line 46 in HiveSessionStateBuilder](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L46). But I don't know why we need to call `newSession()` at [line 45](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L45), where a new `HiveClientImpl` instance is created with the same class loader and Hive client.
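
A rough sketch of the proposed routing, under the assumption that the catalog would gain an `addJar` method of its own. All names here (`ToyHiveClient`, `RecordingClient`, `ToyExternalCatalog`, `ToyResourceLoader`, `ToyAnalysisException`) are hypothetical and do not reflect the actual Spark signatures.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for AnalysisException.
final class ToyAnalysisException(msg: String, cause: Throwable) extends Exception(msg, cause)

// Hypothetical stand-in for the client interface.
trait ToyHiveClient { def addJar(path: String): Unit }

final class RecordingClient extends ToyHiveClient {
  val jars: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // The client method synchronizes on its own, as in the discussion above.
  def addJar(path: String): Unit = synchronized {
    if (path.isEmpty) throw new IllegalArgumentException("empty jar path")
    jars += path
    ()
  }
}

final class ToyExternalCatalog(client: ToyHiveClient) {
  // Wraps selected client exceptions, mirroring the stated purpose of `withClient`.
  private def withClient[T](body: => T): T =
    try body
    catch {
      case e: IllegalArgumentException => throw new ToyAnalysisException(e.getMessage, e)
    }

  // Assumed addition: `addJar` exposed on the catalog so callers never touch the client.
  def addJar(path: String): Unit = withClient(client.addJar(path))
}

// The resource loader receives the catalog instead of the raw client.
final class ToyResourceLoader(catalog: ToyExternalCatalog) {
  def addJar(path: String): Unit = catalog.addJar(path)
}
```

With this shape, `addJar` would pick up the same exception wrapping as every other catalog call, and the client would have a single entry point.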


