turboFei commented on a change in pull request #25797: [SPARK-29043][Core]
Improve the concurrent performance of History Server
URL: https://github.com/apache/spark/pull/25797#discussion_r328416610
##########
File path:
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##########
@@ -161,6 +160,22 @@ private[history] class FsHistoryProvider(conf: SparkConf,
clock: Clock)
new HistoryServerDiskManager(conf, path, listing, clock)
}
+ // Used to store the paths, which are being processed. This enable the replay log tasks execute
+ // asynchronously and make sure that checkForLogs would not process a path repeatedly.
+ private val processing = ConcurrentHashMap.newKeySet[String]
+
+ private def isProcessing(path: Path): Boolean = {
+ processing.contains(path.getName)
+ }
+
+ private def processing(path: Path): Unit = {
Review comment:
Thanks for your suggestion. The name was modeled on that of `blacklist`.
I think `processing` serves a similar purpose to `blacklist`, so how
about just keeping it consistent with `blacklist`?
https://github.com/apache/spark/blob/21db2f86f7c196c535e8f0b5675ae48cb2c372f7/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L164-L173
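
For context, the pattern under discussion can be sketched roughly as below. This is only an illustration of an in-flight set that stops a scan from picking up the same log twice, not the code in the PR: the object `InFlightPaths` and the methods `startProcessing` and `endProcessing` are hypothetical names, and plain strings stand in for Hadoop `Path` objects.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the provider; not the actual FsHistoryProvider code.
object InFlightPaths {
  // Thread-safe set of log names whose replay task is currently running.
  private val inFlight = ConcurrentHashMap.newKeySet[String]()

  // True while a replay task for this log name has been started and not yet finished.
  def isProcessing(name: String): Boolean = inFlight.contains(name)

  // Marks the name as in flight; returns true only for the caller that added it.
  def startProcessing(name: String): Boolean = inFlight.add(name)

  // Clears the mark so that a later scan may pick the log up again.
  def endProcessing(name: String): Unit = {
    inFlight.remove(name)
    ()
  }
}
```

A scan loop would skip any name for which `isProcessing` returns true (or for which `startProcessing` returns false), and the asynchronous replay task would call `endProcessing` in a `finally` block.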