tgravescs commented on a change in pull request #29630:
URL: https://github.com/apache/spark/pull/29630#discussion_r482980562
##########
File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##########
@@ -471,7 +471,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
val newLastScanTime = clock.getTimeMillis()
logDebug(s"Scanning $logDir with lastScanTime==$lastScanTime")
-    val updated = Option(fs.listStatus(new Path(logDir))).map(_.toSeq).getOrElse(Nil)
+    val updated = Option(fs.globStatus(new Path(logDir + "/*"))).map(_.toSeq).getOrElse(Nil)
Review comment:
Yeah, I don't quite get this: you are requiring all the directories to sit under a single parent directory. I guess that makes the logic easier, but why impose that restriction instead of accepting a list of directories? If we are going to support multiple directories and make sure they work, I don't see a reason for the restriction. What if people have multiple clusters writing to different HDFS filesystems, for instance? A rough sketch of what I mean follows.
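Purely for illustration, a minimal sketch of the list-based alternative, assuming we scan each configured root with its own FileSystem; the `scanLogDirs` helper, its parameters, and the cluster URIs in the comments are hypothetical, not code from this PR:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

// Hypothetical sketch, not code from this PR: scan an explicit list of roots,
// resolving a FileSystem per root so that directories on different HDFS
// clusters (e.g. hdfs://clusterA/logs vs hdfs://clusterB/logs) each use the
// client matching their URI.
def scanLogDirs(logDirs: Seq[String], hadoopConf: Configuration): Seq[FileStatus] = {
  logDirs.flatMap { dir =>
    val path = new Path(dir)
    // Path.getFileSystem resolves the FileSystem for this path's scheme/authority.
    val fs = path.getFileSystem(hadoopConf)
    Option(fs.listStatus(path)).map(_.toSeq).getOrElse(Nil)
  }
}
```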
I agree with @HeartSaVioR: if we are going to support multiple directories, we need a thorough look at all the logic here to make sure there are no other problems. I guess in this case you are assuming a single filesystem?
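For anyone following along, my reading of the diff: `listStatus` enumerates the children of one directory, while `globStatus` expands a wildcard pattern, so appending `"/*"` reproduces the old single-directory listing but also lets `logDir` itself be a pattern. A standalone sketch with a made-up path (`/spark-events` is just an example value):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Standalone illustration; "/spark-events" is an assumed example path.
val hadoopConf = new Configuration()
val logDir = "/spark-events"
val fs = new Path(logDir).getFileSystem(hadoopConf)

// Before: list the children of a single directory.
val viaList = Option(fs.listStatus(new Path(logDir))).map(_.toSeq).getOrElse(Nil)

// After: expand a glob against the same (single) FileSystem instance. This is
// equivalent for a plain directory, but logDir could now also be a pattern
// such as "/spark-events-*".
val viaGlob = Option(fs.globStatus(new Path(logDir + "/*"))).map(_.toSeq).getOrElse(Nil)
```

Either way the glob resolves against a single FileSystem instance, which is why I keep coming back to the explicit list for the multi-cluster case.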
I think we need to flesh out more of the overall goals and design first.