GitHub user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/8153#discussion_r37003277
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -204,13 +204,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           mod1 >= mod2
       }
-      logInfos.sliding(20, 20).foreach { batch =>
+      val tasks: Iterator[Future[_]] = logInfos.grouped(20).map { batch =>
--- End diff --
Since we're talking about cleaning this up, you could do this also:
    logInfos.grouped(20)
      .map { batch =>
        replayExecutor.submit(new Runnable {
          override def run(): Unit = mergeApplicationListing(batch)
        })
      }
      .foreach { task =>
        // Wait for all tasks to finish. This makes sure that checkForLogs is
        // not scheduled again while some tasks are already running in the
        // replayExecutor.
        try {
          task.get()
        } catch {
          case e: InterruptedException =>
            throw e
          case e: Exception =>
            logWarning("Error replaying logs.", e)
        }
      }
Note that I added some exception handling that was missing. Without it, an
error in one task would make you revert to the old behavior of piling up
executions.
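
For reference, here is a minimal standalone sketch of the same submit-then-wait pattern, outside of Spark. The names `BatchReplaySketch`, `runAll` and `process` are hypothetical stand-ins (they are not part of `FsHistoryProvider`), and the fixed pool of two threads is an assumption made only for the example. One difference from the snippet above: `grouped(...).map` on an iterator is lazy, so the sketch calls `toList` to submit every batch before waiting on any of them. Whether that matters depends on how many threads the executor has.

```scala
import java.util.concurrent.{Executors, Future}

object BatchReplaySketch {

  // Splits `items` into batches, runs `process` on each batch in a thread
  // pool, waits for every batch to finish, and returns how many batches
  // failed. `process` is a hypothetical stand-in for mergeApplicationListing.
  def runAll[A](items: Seq[A], batchSize: Int)(process: Seq[A] => Unit): Int = {
    // Assumption for this sketch: a small fixed pool created per call.
    val executor = Executors.newFixedThreadPool(2)
    var failedBatches = 0
    try {
      // toList forces the lazy iterator, so all batches are submitted
      // before we start waiting on the first one.
      val tasks: List[Future[_]] = items.grouped(batchSize).map { batch =>
        executor.submit(new Runnable {
          override def run(): Unit = process(batch)
        })
      }.toList

      tasks.foreach { task =>
        try {
          // Blocks until this batch is done; rethrows any failure wrapped
          // in an ExecutionException.
          task.get()
        } catch {
          case e: InterruptedException =>
            throw e
          case e: Exception =>
            // Log and keep waiting on the remaining tasks, so one bad
            // batch does not leave other tasks running unobserved.
            failedBatches += 1
            println(s"Error processing batch: ${e.getCause}")
        }
      }
    } finally {
      executor.shutdown()
    }
    failedBatches
  }
}
```

With 46 items and a batch size of 20 this submits three batches. If `process` throws for one of them, `runAll` still waits for the other two and reports a single failed batch instead of returning early.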