skonto edited a comment on pull request #30675:
URL: https://github.com/apache/spark/pull/30675#issuecomment-745201103
@attilapiros ok got it. The part where we do the check is meant for the case
where we missed an event, at least that is how I read it (are there any other
edge cases where we need this?):
> // Reconcile the case where Spark claims to know about an executor but the corresponding pod
> // is missing from the cluster. This would occur if we miss a deletion event and the pod
> // transitions immediately from running to absent.
To avoid missing the deletion we could add a finalizer: the pod then stays
visible with a deletionTimestamp set, so we can do our bookkeeping, remove the
finalizer, and let the proper deletion proceed. A rough sketch is below.
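Something along these lines with the fabric8 client (a minimal sketch; the finalizer name and the `handlePossibleDeletion` helper are just illustrative, not existing Spark code, and the exact `patch` call may need adjusting per client version):

```scala
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.KubernetesClient

import scala.collection.JavaConverters._

object ExecutorFinalizerSketch {
  // Hypothetical finalizer name; Spark does not register one today.
  val ExecutorFinalizer = "spark.apache.org/executor-cleanup"

  def handlePossibleDeletion(client: KubernetesClient, namespace: String, podName: String): Unit = {
    val pod: Pod = client.pods().inNamespace(namespace).withName(podName).get()
    // With the finalizer present the pod is not removed immediately; instead
    // deletionTimestamp is set, so the deletion cannot be missed.
    if (pod != null && pod.getMetadata.getDeletionTimestamp != null) {
      // Do the Spark-side bookkeeping here (mark the executor as lost, etc.),
      // then drop our finalizer so Kubernetes can complete the deletion.
      val remaining = pod.getMetadata.getFinalizers.asScala.filterNot(_ == ExecutorFinalizer)
      pod.getMetadata.setFinalizers(remaining.asJava)
      client.pods().inNamespace(namespace).withName(podName).patch(pod)
    }
  }
}
```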
Btw, my informers+lister proposal does notify you during the cache
re-sync/re-list check (see
[here](https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html)),
but if the controller restarts you lose the cache state. That is not a problem
here because the Spark driver is not meant to be restarted anyway. So instead of
having something like a timeout before a pod is considered missing, the moment the
delete callback is triggered we know that the pod's deletion was missed; this is known as
[DeletedFinalStateUnknown](https://javadoc.io/static/io.fabric8/kubernetes-client/4.6.4/io/fabric8/kubernetes/client/informers/cache/DeltaFIFO.DeletedFinalStateUnknown.html),
as sketched below.
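A minimal sketch of what that delete callback looks like, assuming the fabric8 informer API around 4.6.x (the resync period and handler bodies are placeholders, and the factory method signature is from memory so it may differ slightly):

```scala
import io.fabric8.kubernetes.api.model.{Pod, PodList}
import io.fabric8.kubernetes.client.DefaultKubernetesClient
import io.fabric8.kubernetes.client.informers.ResourceEventHandler

object PodInformerSketch {
  def main(args: Array[String]): Unit = {
    val client = new DefaultKubernetesClient()
    val factory = client.informers()
    // Re-sync every 30s: the informer replays its cache, so changes missed
    // between watch reconnects are surfaced as events.
    val podInformer = factory.sharedIndexInformerFor(classOf[Pod], classOf[PodList], 30 * 1000L)

    podInformer.addEventHandler(new ResourceEventHandler[Pod] {
      override def onAdd(pod: Pod): Unit = ()
      override def onUpdate(oldPod: Pod, newPod: Pod): Unit = ()
      override def onDelete(pod: Pod, deletedFinalStateUnknown: Boolean): Unit = {
        // deletedFinalStateUnknown == true means the informer only noticed the
        // pod was gone during a re-list, i.e. the original delete event was missed.
        if (deletedFinalStateUnknown) {
          // Handle the "missed deletion" case explicitly instead of a timeout.
        }
      }
    })

    factory.startAllRegisteredInformers()
  }
}
```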
I suspect much of what we do manually could be done via the informer+lister
approach, but I have not done the work yet to see what can be refactored.