liangyu-1 commented on PR #42058: URL: https://github.com/apache/spark/pull/42058#issuecomment-1665123004
> can you please provide details on exactly what is happening in the code to cause it to get stuck? If the executors are killed, then we should see we don't have enough executors and allocate more. The pr description should describe what isn't working properly such that isn't working.

@tgravescs When all executors are dead, no more batches complete, so _batchProcTimeCount_ stays at zero, because it is only incremented in the _onBatchCompleted()_ method. In the _manageAllocation()_ method, the manager only decides whether to request new executors when _batchProcTimeCount_ is greater than zero, and it only obtains new executors by calling the _requestExecutors()_ method. This causes the streaming app to hang.

The result is a cycle:

1. No batches complete, so _batchProcTimeCount_ is always zero.
2. Because _batchProcTimeCount_ is always zero, the manager never evaluates whether to call _requestExecutors()_.
3. Because the manager never calls _requestExecutors()_, there are no executors to complete new batches.
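
To make the cycle concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not the Spark code: the class `ToyAllocationManager`, the helper `run_ticks`, the 0.9 scaling-up threshold, and the assumption that one batch completes per tick whenever at least one executor is alive are all made up for illustration.

```python
class ToyAllocationManager:
    """Toy model of the allocation logic described above (hypothetical, not Spark's code)."""

    def __init__(self, batch_duration_ms, scaling_up_ratio=0.9):
        self.batch_duration_ms = batch_duration_ms
        self.scaling_up_ratio = scaling_up_ratio  # assumed threshold for this sketch
        self.batch_proc_time_sum = 0
        self.batch_proc_time_count = 0
        self.executors = 0

    def request_executors(self, n):
        # The only place where the executor count can grow.
        self.executors += n

    def on_batch_completed(self, proc_time_ms):
        # The only place where the counter is incremented.
        self.batch_proc_time_sum += proc_time_ms
        self.batch_proc_time_count += 1

    def manage_allocation(self):
        # Scaling is only considered if at least one batch completed
        # since the previous call.
        if self.batch_proc_time_count > 0:
            average = self.batch_proc_time_sum / self.batch_proc_time_count
            ratio = average / self.batch_duration_ms
            if ratio >= self.scaling_up_ratio:
                self.request_executors(max(round(ratio), 1))
        self.batch_proc_time_sum = 0
        self.batch_proc_time_count = 0


def run_ticks(manager, ticks):
    """Simulate `ticks` allocation rounds; a batch completes only if an executor exists."""
    for _ in range(ticks):
        if manager.executors > 0:
            # Assume every batch takes twice the batch duration, i.e. the app is overloaded.
            manager.on_batch_completed(proc_time_ms=2 * manager.batch_duration_ms)
        manager.manage_allocation()
    return manager.executors


# With zero executors nothing ever completes, so nothing is ever requested.
dead = ToyAllocationManager(batch_duration_ms=1000)
stuck_count = run_ticks(dead, ticks=10)

# With a single live executor the same logic scales up as expected.
alive = ToyAllocationManager(batch_duration_ms=1000)
alive.executors = 1
scaled_count = run_ticks(alive, ticks=3)
```

In this model `stuck_count` stays at zero no matter how many rounds are run, while `scaled_count` grows beyond one, which is the asymmetry the comment above describes.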
