Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/6082#discussion_r30179302
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -331,14 +334,25 @@ private[spark] class ApplicationMaster(
finish(FinalApplicationStatus.FAILED,
ApplicationMaster.EXIT_REPORTER_FAILURE, "Exception was thrown " +
s"${failureCount} time(s) from Reporter thread.")
-
} else {
logWarning(s"Reporter thread fails ${failureCount} time(s) in a row.", e)
}
}
}
try {
- Thread.sleep(interval)
+ val numPendingAllocate = allocator.getNumPendingAllocate
+ if (numPendingAllocate > 0) {
+ currentAllocationInterval =
+ math.min(heartbeatInterval,currentAllocationInterval * 2)
+ logDebug(s"Number of pending allocations is ${numPendingAllocate}. " +
+ "Sleeping for " + currentAllocationInterval)
+ Thread.sleep(currentAllocationInterval)
--- End diff --
One thing that is not really covered by the bug, but would be nice to add,
is code to wake up this thread when a request for new executors arrives. That
would lower the latency of acquiring new executors after an idle period, when
this loop has gone back to sleeping for the full heartbeat interval.
Feel free to punt that to a separate issue, though.
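
To make the suggestion concrete, here is a minimal sketch of a sleep that another thread can cut short. It is only an illustration under stated assumptions: `InterruptibleSleeper`, `sleep`, and `wakeUp` are hypothetical names that do not exist in Spark, and the actual change could just as well use a different mechanism.

```scala
// Hypothetical helper, not part of Spark: a timed sleep that can be ended early.
final class InterruptibleSleeper {
  private val lock = new Object
  // Set by wakeUp(); guards against spurious wake-ups and against a
  // wake-up that arrives just before the sleeper starts waiting.
  private var wakeRequested = false

  /** Blocks for up to `millis` ms. Returns true if ended early by wakeUp(). */
  def sleep(millis: Long): Boolean = lock.synchronized {
    val deadline = System.nanoTime() + millis * 1000000L
    var remainingMs = millis
    while (!wakeRequested && remainingMs > 0) {
      lock.wait(remainingMs)
      // Recompute the remaining time, since wait() may return spuriously.
      remainingMs = (deadline - System.nanoTime()) / 1000000L
    }
    val woken = wakeRequested
    wakeRequested = false // consume the request so the next sleep runs normally
    woken
  }

  /** Ends the current sleep, or the next one if no thread is sleeping yet. */
  def wakeUp(): Unit = lock.synchronized {
    wakeRequested = true
    lock.notifyAll()
  }
}
```

Under these assumptions, the reporter loop would call `sleeper.sleep(currentAllocationInterval)` where it currently calls `Thread.sleep`, and whatever code path handles a request for new executors would call `sleeper.wakeUp()`. The flag means a wake-up issued while the loop is busy allocating is not lost; it simply makes the next sleep return right away.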