Re: [PR] [YUNIKORN-2253] Support retry when bind volume failed case instead of… [yunikorn-k8shim]

via GitHub Wed, 14 Aug 2024 19:51:04 -0700


zhuqi-lucas commented on PR #890:
URL: https://github.com/apache/yunikorn-k8shim/pull/890#issuecomment-2290439182


   > Another thing we can consider is wrapping the entire `bindPodVolumes()` in 
a retry loop, I'm not sure if that makes sense.
   > 
   > As a follow-up, we can think about retrying while doing the pod binding in 
`Task.postTaskAllocated()`. Another thing can be a more generic allocation 
retry where a failed volume/pod binding does not result in a failed Task. 
Instead, we cancel the allocation from the shim and let the core re-schedule it 
at a later time.
   
   Thanks @pbacsko for the review.
   I first wanted to do this in `Task.postTaskAllocated()`, but that function 
includes many fine-grained operations besides the volume binding, so I chose 
to retry only the fine-grained function that performs the volume binding.
   
   
   This is a good idea that we can follow up on in the future: general retry 
logic for other cases where a task fails. It may need a specific config option 
to enable it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
