vandonr-amz opened a new pull request, #29077:
URL: https://github.com/apache/airflow/pull/29077

   Explained in comments in the code, but there is a race condition when 
stopping a pipeline that can make the operation fail if it's in the middle of 
two steps.
   It should be very rare in real life, but happens more often on our system 
tests, which run a dummy pipeline made of very short steps.
   Boto doesn't retry it because its retry strategy doesn't include either 409 
HTTP codes nor ConflictExceptions, which can happen for good reasons when 
trying to create resources that already exist for instance.
   
   Sagemaker team might implement a fix/retry on the server side, but in the 
meantime, we can fix it here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to