Re: distinguishing failure types during upgrade

2017-11-01 Thread Bill Farner
> > How does rollback work in that case Rollback behavior is unchanged when update pulses are enabled. disable auto-rollback That's also a feasible option. On Wed, Nov 1, 2017 at 9:15 AM, Mohit Jaggi wrote: > Signal = > - exit status from service > - reason code from mesos, it task was kill

Re: distinguishing failure types during upgrade

2017-11-01 Thread Mohit Jaggi
Signal = - exit status from service - reason code from mesos, it task was killed by Mesos e.g. revocable core revoked during oversubscription Yes, I am aware of co-ordinated updates which allow this logic to be placed outside Aurora. How does rollback work in that case? Perhaps I should just disab

Re: distinguishing failure types during upgrade

2017-11-01 Thread Bill Farner
> > Can Aurora distinguish between failures caused by the upgrade itself or > other transient systemic issues There isn't any signal i know of that would allow Aurora to independently determine the cause of task failures in a generic way. Two options come to mind: 1. Human intervention - aurora

distinguishing failure types during upgrade

2017-10-31 Thread Mohit Jaggi
Folks, Sometimes in our cluster upgrades start failing due to transient outages of dependencies or reasons unrelated to the new code being pushed out. Aurora hits its failure threshold and starts automatic rollback which may make a bad condition worse (e.g. if the outage was related to load rollbac