liangyu-1 commented on PR #42058:
URL: https://github.com/apache/spark/pull/42058#issuecomment-1665123004

   > can you please provide details on exactly what is happening in the code to 
cause it to get stuck? If the executors are killed, then we should see we don't 
have enough executors and allocate more. The pr description should describe 
what isn't working properly such that isn't working.
   
   @tgravescs 
   
   When all executors are dead, no more batches can complete, so 
_batchProcTimeCount_ stays at zero, because it is only incremented in the 
_onBatchCompleted()_ method.
   In the _manageAllocation()_ method, the manager only decides whether to 
request new executors when _batchProcTimeCount_ is greater than zero.
   The manager can only get new executors by calling the 
_requestExecutors()_ method.
   
   This causes the streaming app to hang. With no batches completing, 
_batchProcTimeCount_ stays at zero; with _batchProcTimeCount_ at zero, the 
manager never evaluates whether to call _requestExecutors()_; and because 
_requestExecutors()_ is never called, there are never any executors to 
complete new batches.
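
   To make the cycle concrete, here is a small toy model of the loop described 
above. It is not Spark's actual code: the class name, the ratio calculation, the 
0.9 threshold, and the counter reset at the end of each check are assumptions 
made for illustration only. The parts taken from the description are that the 
counter only grows in the batch-completed callback and that the request for 
executors is guarded by the counter being greater than zero.

```python
class AllocationManagerModel:
    """Toy model of the allocation loop; NOT Spark's real implementation."""

    def __init__(self, num_executors, batch_duration_ms=1000, scaling_up_ratio=0.9):
        self.num_executors = num_executors
        self.batch_duration_ms = batch_duration_ms
        self.scaling_up_ratio = scaling_up_ratio  # hypothetical threshold
        self.batch_proc_time_sum = 0
        self.batch_proc_time_count = 0
        self.requests = 0  # number of times request_executors() was called

    def on_batch_completed(self, proc_time_ms):
        # The only place where the counter is incremented.
        self.batch_proc_time_sum += proc_time_ms
        self.batch_proc_time_count += 1

    def request_executors(self, n):
        # The only way the model gains executors.
        self.requests += 1
        self.num_executors += n

    def manage_allocation(self):
        # The scaling decision is skipped entirely when the counter is zero.
        if self.batch_proc_time_count > 0:
            avg = self.batch_proc_time_sum / self.batch_proc_time_count
            ratio = avg / self.batch_duration_ms  # assumed scaling metric
            if ratio >= self.scaling_up_ratio:
                self.request_executors(max(round(ratio), 1))
        # Assumed: statistics are reset after every check.
        self.batch_proc_time_sum = 0
        self.batch_proc_time_count = 0

    def run_batch(self, proc_time_ms):
        # A batch can only complete if at least one executor is alive.
        if self.num_executors > 0:
            self.on_batch_completed(proc_time_ms)
            return True
        return False


def simulate_all_executors_dead(ticks=5):
    """Start with zero executors and run several allocation checks."""
    manager = AllocationManagerModel(num_executors=0)
    for _ in range(ticks):
        manager.run_batch(proc_time_ms=2000)  # never completes
        manager.manage_allocation()           # guard is never satisfied
    return manager.num_executors, manager.requests
```

   In this model, `simulate_all_executors_dead()` ends with zero executors and 
zero calls to `request_executors()`, no matter how many checks run, while a 
manager that starts with one executor and a slow batch does scale up.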


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

