pbacsko commented on PR #918: URL: https://github.com/apache/yunikorn-k8shim/pull/918#issuecomment-2401887556
> I think we need to handle orphan pods for foreign pods similar to yunikorn ones. In other words, if a pod references a non-existent node, it needs to be "orphaned" and not sent to the core. If a node appears, any currently orphaned pods (foreign or otherwise) which reference the new node need to be adopted and registered with the core. @craigcondit I looked at the code in more details, here are my findings: 1. Adding orphan foreign pod: handled inside `updateForeignPod()`: ``` if oldPod == nil && utils.IsAssignedPod(pod) && !utils.IsPodTerminated(pod) { if ctx.schedulerCache.UpdatePod(pod) { // pod was accepted by a real node log.Log(log.ShimContext).Debug("pod is assigned to a node, trigger occupied resource update", zap.String("namespace", pod.Namespace), zap.String("podName", pod.Name), zap.String("podStatusBefore", podStatusBefore), zap.String("podStatusCurrent", string(pod.Status.Phase))) allocReq := common.CreateAllocationForForeignPod(pod) if err := ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateAllocation(allocReq); err != nil { log.Log(log.ShimContext).Error("failed to add foreign allocation to the core", zap.Error(err)) } } else { // pod is orphaned (references an unknown node) log.Log(log.ShimContext).Info("skipping occupied resource update for assigned orphaned pod", zap.String("namespace", pod.Namespace), zap.String("podName", pod.Name), zap.String("nodeName", pod.Spec.NodeName)) } return } ``` 2. Pod becomes assigned to a real node: handled together with YK pods (`adoptedPods` slice) 3. Orphan pod update: same as addition, handled in `updateForeignPod()` 4. Deleting orphan pod: this is where probably the only change is necessary, don't send message to the core: ``` func (ctx *Context) deleteForeignPod(pod *v1.Pod) { releaseReq := common.CreateReleaseRequestForForeignPod(string(pod.UID), constants.DefaultPartition) if err := ctx.apiProvider.GetAPIs().SchedulerAPI.UpdateAllocation(releaseReq); err != nil { <--- don't send if orphan pod log.Log(log.ShimContext).Error("failed to remove foreign allocation from the core", zap.Error(err)) } log.Log(log.ShimContext).Debug("removing pod from cache", zap.String("podName", pod.Name)) ctx.schedulerCache.RemovePod(pod) } ``` Is there anything else? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@yunikorn.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org