skonto edited a comment on pull request #30675:
URL: https://github.com/apache/spark/pull/30675#issuecomment-745201103
@attilapiros ok got it. The part where we do the check is meant for the case
where we missed an event, at least that is how I read it (are there any other
edge cases where we need this?):
> // Reconcile the case where Spark claims to know about an executor but the corresponding pod
> // is missing from the cluster. This would occur if we miss a deletion event and the pod
> // transitions immediately from running to absent.
To avoid missing the deletion we could add a finalizer: the pod then stays
visible with a deletionTimestamp set, so we can do our bookkeeping, remove the
finalizer, and let the proper deletion proceed. A rough sketch is below.
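Something along these lines with the fabric8 client (a minimal sketch; the finalizer name and the `handlePossibleDeletion` helper are just illustrative, not existing Spark code, and the exact `patch` call may need adjusting per client version):

```scala
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.KubernetesClient

import scala.collection.JavaConverters._

object ExecutorFinalizerSketch {
  // Hypothetical finalizer name; Spark does not register one today.
  val ExecutorFinalizer = "spark.apache.org/executor-cleanup"

  def handlePossibleDeletion(client: KubernetesClient, namespace: String, podName: String): Unit = {
    val pod: Pod = client.pods().inNamespace(namespace).withName(podName).get()
    // With the finalizer present the pod is not removed immediately; instead
    // deletionTimestamp is set, so the deletion cannot be missed.
    if (pod != null && pod.getMetadata.getDeletionTimestamp != null) {
      // Do the Spark-side bookkeeping here (mark the executor as lost, etc.),
      // then drop our finalizer so Kubernetes can complete the deletion.
      val remaining = pod.getMetadata.getFinalizers.asScala.filterNot(_ == ExecutorFinalizer)
      pod.getMetadata.setFinalizers(remaining.asJava)
      client.pods().inNamespace(namespace).withName(podName).patch(pod)
    }
  }
}
```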
Btw, my informers+lister proposal does notify you during the cache
re-sync/re-list check (see
[here](https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html)),
but if the controller restarts you lose the cache state. That is not a problem
here because the Spark driver is not meant to be restarted anyway. So instead of
having something like a timeout before a pod is considered missing, the moment the
delete callback is triggered we know that the pod's deletion was missed; this is known as
[DeletedFinalStateUnknown](https://javadoc.io/static/io.fabric8/kubernetes-client/4.6.4/io/fabric8/kubernetes/client/informers/cache/DeltaFIFO.DeletedFinalStateUnknown.html),
as sketched below.
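A minimal sketch of what that delete callback looks like, assuming the fabric8 informer API around 4.6.x (the resync period and handler bodies are placeholders, and the factory method signature is from memory so it may differ slightly):

```scala
import io.fabric8.kubernetes.api.model.{Pod, PodList}
import io.fabric8.kubernetes.client.DefaultKubernetesClient
import io.fabric8.kubernetes.client.informers.ResourceEventHandler

object PodInformerSketch {
  def main(args: Array[String]): Unit = {
    val client = new DefaultKubernetesClient()
    val factory = client.informers()
    // Re-sync every 30s: the informer replays its cache, so changes missed
    // between watch reconnects are surfaced as events.
    val podInformer = factory.sharedIndexInformerFor(classOf[Pod], classOf[PodList], 30 * 1000L)

    podInformer.addEventHandler(new ResourceEventHandler[Pod] {
      override def onAdd(pod: Pod): Unit = ()
      override def onUpdate(oldPod: Pod, newPod: Pod): Unit = ()
      override def onDelete(pod: Pod, deletedFinalStateUnknown: Boolean): Unit = {
        // deletedFinalStateUnknown == true means the informer only noticed the
        // pod was gone during a re-list, i.e. the original delete event was missed.
        if (deletedFinalStateUnknown) {
          // Handle the "missed deletion" case explicitly instead of a timeout.
        }
      }
    })

    factory.startAllRegisteredInformers()
  }
}
```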
I suspect much of what we do manually could be done via the informer+lister
approach, but I have not done the work yet to see what can be refactored.