LantaoJin opened a new pull request #28938: URL: https://github.com/apache/spark/pull/28938
### What changes were proposed in this pull request?

Use a `ReadWriteLock` per database instead of a single synchronized block, to improve performance.

### Why are the changes needed?

In `HiveExternalCatalog`, all metastore operations are synchronized on the same object lock. On a heavily loaded Spark Thrift Server or Spark driver, users' queries can get stuck behind any single long-running operation. For example, if a user accesses a table with a massive number of partitions, `loadDynamicPartitions()` holds the object lock for a long time, and all other queries block waiting for it. In the thread dump below, `Thread-61500` was holding the object lock for extended periods while accessing such a table, which left many queries stuck.

```
61500 HiveServer2-Background-Pool: Thread-61500
java.lang.Object.wait(Native Method)
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1542)
org.apache.hadoop.ipc.Client.call(Client.java:1498)
org.apache.hadoop.ipc.Client.call(Client.java:1398)
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
com.sun.proxy.$Proxy10.getEZForPath(Unknown Source)
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEZForPath(ClientNamenodeProtocolTranslatorPB.java:1448)
sun.reflect.GeneratedMethodAccessor292.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
com.sun.proxy.$Proxy11.getEZForPath(Unknown Source)
org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3408)
org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2259)
org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:339)
org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
org.apache.hadoop.hive.ql.metadata.Hive.needToCopy(Hive.java:2687)
org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2621)
org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2748)
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1593)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.spark.sql.hive.client.Shim_v1_2.loadDynamicPartitions(HiveShim.scala:1001)
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:961)
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:959)
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:959)
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1$$anonfun$apply$2.apply(HiveClientImpl.scala:326)
org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$retryLocked(HiveClientImpl.scala:255)
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:309)
org.apache.spark.sql.hive.client.HiveClientImpl.updateCallMetrics(HiveClientImpl.scala:339)
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:308)
org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:959)
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:993)
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:981)
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:981)
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:127)
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:152)
org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:981)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:262)
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:111)
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:111) => holding Monitor(org.apache.spark.sql.execution.command.DataWritingCommandExec@708550291})
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:109)
org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:126)
org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:137) => holding Monitor(java.lang.Object@1464134318})
org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:197)
org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:197)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
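The per-database locking idea can be sketched roughly as follows. This is an illustrative Java outline, not the PR's actual code: the `PerDatabaseLockRegistry` class and its method names are hypothetical, and it only shows the core pattern of keying a `ReentrantReadWriteLock` on the database name so a long write against one database no longer blocks operations on other databases.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch of per-database read/write locking.
// Instead of one global synchronized block guarding every metastore call,
// each database gets its own ReentrantReadWriteLock, so a long write on one
// database (e.g. loadDynamicPartitions) does not block reads on another.
public class PerDatabaseLockRegistry {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    // Lazily create one lock per database name.
    private ReentrantReadWriteLock lockFor(String db) {
        return locks.computeIfAbsent(db, k -> new ReentrantReadWriteLock());
    }

    // Multiple readers of the same database may proceed concurrently.
    public <T> T withReadLock(String db, Supplier<T> body) {
        Lock l = lockFor(db).readLock();
        l.lock();
        try {
            return body.get();
        } finally {
            l.unlock();
        }
    }

    // Writers get exclusive access, but only within their own database.
    public <T> T withWriteLock(String db, Supplier<T> body) {
        Lock l = lockFor(db).writeLock();
        l.lock();
        try {
            return body.get();
        } finally {
            l.unlock();
        }
    }
}
```

Under this scheme a slow `loadDynamicPartitions` on `db_a` would take `db_a`'s write lock, while queries touching `db_b` acquire a different lock and proceed unblocked, which is the contention the thread dump above illustrates.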
