tgravescs commented on a change in pull request #29630:
URL: https://github.com/apache/spark/pull/29630#discussion_r482980562
##########
File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##########
@@ -471,7 +471,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
val newLastScanTime = clock.getTimeMillis()
logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
-    val updated = Option(fs.listStatus(new Path(logDir))).map(_.toSeq).getOrElse(Nil)
+    val updated = Option(fs.globStatus(new Path(logDir + "/*"))).map(_.toSeq).getOrElse(Nil)
Review comment:
Yeah, I don't quite get this: you are requiring all the directories to sit under a single parent directory. I guess that makes the logic easier, but why impose that restriction instead of accepting a list of directories? If we are going to support multiple directories and make sure they work, I don't see a reason for the restriction. What if people have multiple clusters writing to different HDFS filesystems, for instance? A rough sketch of what I mean follows.
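Purely for illustration, a minimal sketch of the list-based alternative, assuming we scan each configured root with its own FileSystem; the `scanLogDirs` helper, its parameters, and the cluster URIs in the comments are hypothetical, not code from this PR:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

// Hypothetical sketch, not code from this PR: scan an explicit list of roots,
// resolving a FileSystem per root so that directories on different HDFS
// clusters (e.g. hdfs://clusterA/logs vs hdfs://clusterB/logs) each use the
// client matching their URI.
def scanLogDirs(logDirs: Seq[String], hadoopConf: Configuration): Seq[FileStatus] = {
  logDirs.flatMap { dir =>
    val path = new Path(dir)
    // Path.getFileSystem resolves the FileSystem for this path's scheme/authority.
    val fs = path.getFileSystem(hadoopConf)
    Option(fs.listStatus(path)).map(_.toSeq).getOrElse(Nil)
  }
}
```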
I agree with @HeartSaVioR: if we are going to support multiple directories, we need a thorough look at all the logic here to make sure there are no other problems. I guess in this case you are assuming a single filesystem?
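For anyone following along, my reading of the diff: `listStatus` enumerates the children of one directory, while `globStatus` expands a wildcard pattern, so appending `"/*"` reproduces the old single-directory listing but also lets `logDir` itself be a pattern. A standalone sketch with a made-up path (`/spark-events` is just an example value):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Standalone illustration; "/spark-events" is an assumed example path.
val hadoopConf = new Configuration()
val logDir = "/spark-events"
val fs = new Path(logDir).getFileSystem(hadoopConf)

// Before: list the children of a single directory.
val viaList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq).getOrElse(Nil)

// After: expand a glob against the same (single) FileSystem instance. This is
// equivalent for a plain directory, but logDir could now also be a pattern
// such as "/spark-events-*".
val viaGlob = Option(fs.globStatus(new Path(logDir + "/*"))).map(_.toSeq).getOrElse(Nil)
```

Either way the glob resolves against a single FileSystem instance, which is why I keep coming back to the explicit list for the multi-cluster case.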
I think we need to flesh out more of the overall goals and design first.