Thanks to a few hours of debugging by glandium Tuesday night, a tweak has been made which may address the multi-hour delays we had been seeing. (As of now, we've seen one successful push that took ~38 min of CPU and wall clock time to complete.)
With that in place, the current plans are:
 a) right now we're monitoring for the impact (if any) of changes
    outlined in bug 1001735 comment 39
 b) if there are any significant issues, we will perform a new
    history-preserving reset of try
 c) if there continue to be any significant issues, we will fall back
    to the old try reset which deletes history

We have some confidence that (b) will be successful, and could be automated with minimal try closure time. I'll get a new bug open on that within a few days.

--Hal

On 2014-05-01, 11:26, Hal Wine wrote:
> [including dev.platform this time as originally intended]
>
> tl;dr: there was a 4h "outage" on Wed, here's our plan if it happens
> again.
>
> Active bug for next reset: bug 1001735 (https://bugzil.la/1001735)
>
> Action Summary:
> - if we get another multi-hour (>2) "outage", we will immediately do a
>   hard reset of try during business hours.
> - in parallel, we are looking at several "history preserving" methods
>   for try resets
> - we will announce any hard reset here, along with the status of
>   history.
> - we will announce any updates to this plan here
>
> Details:
> - many devs had timeouts on push to try
> - nothing landed on try for a 4h period, between Wed Apr 30 18:55:32
>   2014 +0000 & Wed Apr 30 23:05:24 2014 +0000
> - try has been responsive since then.
>
> Please let me know if anyone has strong objections to the above.
> --Hal
>
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform