Thanks to a few hours of debugging by glandium on Tuesday night, a tweak
has been made that may address the multi-hour delays we had been
seeing. (So far, we've seen one successful push that took ~38 min of
CPU and wall clock time to complete.)

With that in place, the current plans are:
 a) right now we're monitoring the impact (if any) of the changes
outlined in bug 1001735 comment 39
 b) if there are any significant issues, we will perform a new
history-preserving reset of try
 c) if significant issues continue after that, we will fall back to
the old try reset, which deletes history

We have some confidence that (b) will be successful, and that it could
be automated with minimal try closure time. I'll open a new bug for that
within a few days.

--Hal

On 2014-05-01, 11:26, Hal Wine wrote:
> [including dev.platform this time as originally intended]
> 
> tl;dr: there was a 4h "outage" on Wed, here's our plan if it happens
> again.
> 
> Active bug for next reset: bug 1001735 (https://bugzil.la/1001735)
> 
> Action Summary:
>  - if we get another multi-hour (>2) "outage", we will immediately do a
>    hard reset of try during business hours.
>  - in parallel, we are looking at several "history preserving" methods
>    for try resets
>  - we will announce any hard reset here, along with the status of
>    history.
>  - we will announce any updates to this plan here
> 
> Details:
>  - many devs had timeouts on push to try
>  - nothing landed on try for a 4h period, between Wed Apr 30 18:55:32
>    2014 +0000 & Wed Apr 30 23:05:24 2014 +0000
>  - try has been responsive since then.
> 
> Please let me know if anyone has strong objections to the above.
> --Hal
> 

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
