I don't believe so. We ran into a similar issue. Looking through Marathon's GitHub issues turned up the following relevant tickets:
https://github.com/mesosphere/marathon/issues/1504
https://github.com/mesosphere/marathon/issues/1111
https://github.com/mesosphere/marathon/issues/1470

Basically, the issue is that as soon as the Mesos task reaches the RUNNING state, Marathon clears the exponential backoff, even if the task eventually fails. A ticket to fix this is currently slated for 0.10.0, but it has been slated for earlier releases before and slipped.

(We actually set up our deploy process to create the new deployment and then periodically check on its status, so that we can kill it if it times out and don't end up with perma-failing deployments in Marathon.)

From: Maciej Strzelecki [mailto:[email protected]]
Sent: Tuesday, July 07, 2015 10:43 AM
To: [email protected]
Subject: Can marathon cancel a deployment if the application is "sick"?

How can I make Marathon cancel a deployment if the app is not starting after several tries?

I saw these three settings (with defaults) in the documentation:

"backoffSeconds": 1,
"backoffFactor": 1.15,
"maxLaunchDelaySeconds": 3600,

backoffSeconds, backoffFactor and maxLaunchDelaySeconds
Configures exponential backoff behavior when launching potentially sick apps. This prevents sandboxes associated with consecutively failing tasks from filling up the hard disk on Mesos slaves. The backoff period is multiplied by the factor for each consecutive failure until it reaches maxLaunchDelaySeconds. This applies also to tasks that are killed due to failing too many health checks.

I would expect to be able to tell Marathon to "give up" after it has tried a few times. Is there a way?

backoffSeconds - 5
backoffFactor - high, 100-200ish (so it reaches the max delay very quickly, after just a few failures)
maxLaunchDelaySeconds - 600 (to allow for a docker pull to finish and general startup lag)

Root cause - a developer deploys an application with either a code failure (a skipped test) or a Docker image that can't be pulled.
If this task is left retrying its deployment on Marathon for some time, the Mesos UI shows thousands of failed tasks. I'd love to see one, maybe two failed start attempts, and then a back-off.

Maciej Strzelecki
Operations Engineer

Tel: +49 30 6098381-50
Fax: +49 851-213728-88
E-mail: [email protected]
www.crealytics.com
blog.crealytics.com

crealytics GmbH - Semantic PPC Advertising Technology
Brunngasse 1 - 94032 Passau - Germany
Oranienstraße 185 - 10999 Berlin - Germany
Managing directors: Andreas Reiffen, Christof König, Dr. Markus Kurch
Register court: Amtsgericht Passau, HRB 7466
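For reference, the backoff arithmetic described in the quoted documentation can be sketched in a few lines of Python. This is only an illustration of "multiply by the factor for each consecutive failure until the cap is reached"; the function name `launch_delays` is made up here, and it assumes the first delay equals backoffSeconds. It also models only tasks that never reach RUNNING, since per the tickets above the backoff is cleared once a task gets that far.

```python
def launch_delays(backoff_seconds, backoff_factor, max_launch_delay_seconds, failures):
    """Delay in seconds applied after each of `failures` consecutive failures.

    Hypothetical helper: assumes the delay starts at backoff_seconds and is
    multiplied by backoff_factor per failure, capped at max_launch_delay_seconds.
    """
    delays = []
    delay = float(backoff_seconds)
    cap = float(max_launch_delay_seconds)
    for _ in range(failures):
        delays.append(min(delay, cap))
        delay *= backoff_factor
    return delays


# Settings proposed in the question: 5 s base, factor 100, 600 s cap.
print(launch_delays(5, 100, 600, 4))  # [5.0, 500.0, 600.0, 600.0]
```

With those settings the cap is hit on the third consecutive failure, which matches the intent of "reaches max delay very quickly". Under the same assumptions, the defaults (1, 1.15, 3600) take far longer to get there.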

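The workaround mentioned in the reply (create the deployment, poll its status, kill it on timeout) could look roughly like the sketch below. It is not our actual deploy script. It assumes Marathon's REST API exposes GET /v2/deployments (a JSON list of in-flight deployments, each with an "id") and DELETE /v2/deployments/{id} to cancel one; the function names and the base URL in the usage note are hypothetical.

```python
import json
import time
import urllib.request


def list_deployment_ids(base_url):
    """Return the ids of deployments Marathon currently reports as in flight."""
    with urllib.request.urlopen(base_url + "/v2/deployments") as resp:
        return [d["id"] for d in json.load(resp)]


def cancel_deployment(base_url, deployment_id):
    """Ask Marathon to cancel the given deployment."""
    req = urllib.request.Request(
        base_url + "/v2/deployments/" + deployment_id, method="DELETE"
    )
    urllib.request.urlopen(req).close()


def wait_or_cancel(deployment_id, list_ids, cancel, timeout_s,
                   poll_s=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until the deployment leaves the in-flight list or the timeout expires.

    Returns True if the deployment disappeared from the list in time,
    False if it was still listed at the deadline and cancel() was called.
    list_ids and cancel are passed in so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if deployment_id not in list_ids():
            return True
        sleep(poll_s)
    cancel(deployment_id)
    return False
```

Usage would be something like `wait_or_cancel(dep_id, lambda: list_deployment_ids(base), lambda d: cancel_deployment(base, d), timeout_s=600)`, with `base` pointing at your Marathon instance (for example `http://marathon.example:8080`, a placeholder) and `dep_id` taken from the response to the request that started the deployment.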
