LantaoJin commented on a change in pull request #28938:
URL: https://github.com/apache/spark/pull/28938#discussion_r509954044
##########
File path:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -95,11 +100,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf,
hadoopConf: Configurat
}
/**
- * Run some code involving `client` in a [[synchronized]] block and wrap certain
+ * Run some code involving `client` in a [[ReadWriteLock]] and wrap certain
* exceptions thrown in the process in [[AnalysisException]].
+ *
+ * @param db to specify the place of the operation act on.
+ * @param writeLock to specify it is a write lock.
*/
- private def withClient[T](body: => T): T = synchronized {
+ private def withClient[T](db: String = "", writeLock: Boolean = false)(body: => T): T = {
Review comment:
@cloud-fan The Hive client is not thread-safe, but it is thread-local.
1. Two threads create tables in the same database. For example, thread-1
creates db1.table1 and thread-2 creates db1.table2. The db-level lock in
`withClient` blocks one thread until the other has finished.
2. Two threads create tables in different databases. For example, thread-1
creates db1.table1 and thread-2 creates db2.table2. These operations can run
concurrently because the Hive clients are thread-local. The client in thread-1
is not the same instance as the client in thread-2:
```scala
private def client: Hive = {
  // get the Hive and set to thread local
  val c = Hive.get(conf)
  Hive.set(c)
  c
}
```
This works because `Hive.get(conf)` returns a separate Hive client instance for each thread:
```java
private static Hive getInternal(HiveConf c, boolean needsRefresh,
    boolean isFastCheck, boolean doRegisterAllFns) throws HiveException {
  Hive db = hiveDB.get();
  if (db == null || !db.isCurrentUserOwner() || needsRefresh
      || (c != null && db.metaStoreClient != null
          && !isCompatible(db, c, isFastCheck))) {
    db = create(c, false, db, doRegisterAllFns);
  }
  if (c != null) {
    db.conf = c;
  }
  return db;
}
```
and
```java
private static ThreadLocal<Hive> hiveDB = new ThreadLocal<Hive>() {
```
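To make the two cases above concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this PR and it does not use Hive: `PerDbLockSketch`, `CLIENT`, `LOCKS`, and `writerBlocked` are hypothetical names, and a plain `Object` stands in for the Hive client. It assumes one `ReentrantReadWriteLock` per database name, which is only one possible way to get a db-level lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch; not the code from this PR or from Hive.
class PerDbLockSketch {
    // Stand-in for Hive's thread-local `hiveDB`: one instance per thread.
    private static final ThreadLocal<Object> CLIENT =
        ThreadLocal.withInitial(Object::new);

    // One read-write lock per database name, created on first use (assumption).
    private static final ConcurrentHashMap<String, ReentrantReadWriteLock> LOCKS =
        new ConcurrentHashMap<>();

    static Object client() {
        return CLIENT.get();
    }

    // Runs `body` while holding the read or write lock of `db`.
    static <T> T withClient(String db, boolean writeLock, Supplier<T> body) {
        ReentrantReadWriteLock rw =
            LOCKS.computeIfAbsent(db, k -> new ReentrantReadWriteLock());
        Lock lock = writeLock ? rw.writeLock() : rw.readLock();
        lock.lock();
        try {
            return body.get();
        } finally {
            lock.unlock();
        }
    }

    // Waits up to `millis` for `t`; returns true if `t` is still running.
    private static boolean stillRunningAfter(Thread t, long millis) {
        try {
            t.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.isAlive();
    }

    // True if a second thread observes a different client instance.
    static boolean clientsDifferAcrossThreads() {
        Object mine = client();
        Object[] seen = new Object[1];
        Thread t = new Thread(() -> seen[0] = client());
        t.start();
        stillRunningAfter(t, 5000);
        return seen[0] != null && seen[0] != mine;
    }

    // True if a writer on `other` is still waiting while the caller
    // holds the write lock on `held`.
    static boolean writerBlocked(String held, String other) {
        Thread t = new Thread(() -> { withClient(other, true, () -> true); });
        boolean blocked = withClient(held, true, () -> {
            t.start();
            return stillRunningAfter(t, 500);
        });
        stillRunningAfter(t, 5000); // let the second writer finish
        return blocked;
    }
}
```

In this sketch, `writerBlocked("db1", "db1")` corresponds to case 1 (the second writer waits), and `writerBlocked("db1", "db2")` corresponds to case 2 (the second writer proceeds), while `clientsDifferAcrossThreads()` mirrors the thread-local client behavior.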