Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13482#discussion_r66713200
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -462,10 +464,23 @@ private[spark] class ApplicationMaster(
                       nextAllocationInterval = initialAllocationInterval
                       heartbeatInterval
                     }
    -              logDebug(s"Number of pending allocations is $numPendingAllocate. " +
    -                       s"Sleeping for $sleepInterval.")
    +              sleepStart = System.currentTimeMillis()
                   allocatorLock.wait(sleepInterval)
                 }
    +            val sleepDuration = System.currentTimeMillis() - sleepStart
    +            if (sleepDuration < sleepInterval - 5) {
    +              // log when sleep is interrupted
    +              logInfo(s"Number of pending allocations is $numPendingAllocate. " +
    --- End diff --
    
    These messages are the only signal we have that the allocation loop is being 
signalled too often. I think it's worth an info message so we can identify other 
cases that cause this behavior. The normal case, where the thread already slept 
for at least the minimum interval, is logged at debug. This doesn't add an 
unreasonable number of log messages.
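    The pattern under discussion, measuring how long the thread actually slept 
and treating an early wakeup as a signal worth logging, can be sketched roughly 
like this. (This is a minimal illustration, not Spark's actual code; the object 
and method names, and the 5 ms jitter tolerance taken from the diff above, are 
illustrative.)

    ```scala
    // Sketch: detect whether a timed wait on a lock was cut short by a notify.
    object AllocatorSleepSketch {
      private val lock = new Object
      // Tolerance for timer jitter, mirroring the `sleepInterval - 5` check
      // in the diff above.
      private val minWakeupSlackMs = 5L

      // Waits up to sleepIntervalMs on the lock. Returns true if the wait
      // ended noticeably early (i.e. the thread was signalled), false if it
      // ran for roughly the full interval.
      def timedWait(sleepIntervalMs: Long): Boolean = {
        val start = System.currentTimeMillis()
        lock.synchronized {
          lock.wait(sleepIntervalMs)
        }
        val sleptMs = System.currentTimeMillis() - start
        sleptMs < sleepIntervalMs - minWakeupSlackMs
      }

      def main(args: Array[String]): Unit = {
        // Nothing notifies the lock here, so a short wait should run its
        // full duration and not count as interrupted.
        println(timedWait(50))
      }
    }
    ```

    The caller would then log at info when this returns true (interrupted 
early, the interesting case) and at debug otherwise, which is exactly the 
split the comment argues for.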


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
