Re: [Wikitech-l] [Engineering] Gerrit was down today
Heya!

Gonna reboot Gerrit real quick this morning. Turns out "cobalt" did not have
hyperthreading turned on. Services should be back momentarily!

-Chad

On Fri, Oct 7, 2016 at 2:07 PM Daniel Zahn wrote:
> The Gerrit migration is over. It is back up and served from new server
> "cobalt" now. It feels faster than before as well. Thanks much to
> Brandon Black for help.
>
> ___
> Engineering mailing list
> engineer...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/engineering

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] Gerrit was down today
It was bothering me, but I'm guessing this is one of the many flaws of Gerrit
itself and probably not easily fixable (other people are more qualified to
comment). I do want to suggest speeding up the move to Differential, which
handles such downtime much better, alongside other benefits.

Best

On Fri, Oct 7, 2016, 2:26 AM Gergo Tisza wrote:
> Thanks a lot for the quick recovery!
>
> Would it be possible to use something other than a redirect next time when
> traffic needs to be blocked? An apache deny rule or a 404 would work, but a
> redirect means that reloading the page (or reopening the browser) will
> cause the URL to be lost with little hope of recovery (browsers don't
> record redirects in the history). That can be very annoying when one uses
> tabs as bookmarks (bad habit as it is).
>
> On Thu, Oct 6, 2016 at 3:33 PM, Chad Horohoe wrote:
> > Hi!
> >
> > Sorry for the extended downtime! From what we can tell, it appears as
> > though the machine that Gerrit is running on (lead) is having some
> > hardware issues that are making the CPU misbehave. We've worked around
> > it for now, so things should be up (and Zuul is processing CI events
> > just fine).
> >
> > However, since it appears it's a hardware problem, we're planning to
> > migrate off of lead to a new machine (cobalt). The public IP addresses
> > will not be changing. The plan right now is to do this migration
> > tomorrow with a scheduled downtime at 17:00 UTC (10:00 PST).
> >
> > We'll be keeping a close eye on things in the meantime, so if things
> > deteriorate again we can start the migration sooner.
> >
> > (and yeah, wikitech incident report to follow, I'm a little burnt out
> > right now though)
> >
> > Thanks again for bearing with us!
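A note on the request quoted above to block traffic without a redirect: the idea can be sketched as a tiny stand-in server that answers every path with an error status and never sends a Location header, so the original URL stays in the tab. This is only an illustration and not what was deployed on lead or cobalt. The names MaintenanceHandler and make_server are hypothetical, and the 503 status with Retry-After is an assumption of this sketch; the thread itself suggested an Apache deny rule or a 404, which would preserve the URL in the same way.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: refuse every request without redirecting."""

    def _refuse(self):
        body = b"Gerrit is down for maintenance. Please try again later.\n"
        # 503 is this sketch's choice; a 404 as proposed in the thread
        # would equally leave the requested URL untouched in the browser.
        self.send_response(503)
        self.send_header("Retry-After", "3600")  # assumed value, in seconds
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if self.command != "HEAD":
            self.wfile.write(body)

    # Same answer for the common methods; no Location header is ever sent.
    do_GET = do_HEAD = do_POST = _refuse

    def log_message(self, format, *args):
        # Keep the sketch quiet; a real deployment would want access logs.
        pass


def make_server(port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), MaintenanceHandler)
```

To try it locally, call make_server(8080).serve_forever() (8080 is an arbitrary port) and reload any path a few times: the address bar keeps the original URL on every reload.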
Re: [Wikitech-l] [Engineering] Gerrit was down today
Hi!

Sorry for the extended downtime! From what we can tell, it appears as though
the machine that Gerrit is running on (lead) is having some hardware issues
that are making the CPU misbehave. We've worked around it for now, so things
should be up (and Zuul is processing CI events just fine).

However, since it appears it's a hardware problem, we're planning to migrate
off of lead to a new machine (cobalt). The public IP addresses will not be
changing. The plan right now is to do this migration tomorrow with a
scheduled downtime at 17:00 UTC (10:00 PST).

We'll be keeping a close eye on things in the meantime, so if things
deteriorate again we can start the migration sooner.

(and yeah, wikitech incident report to follow, I'm a little burnt out right
now though)

Thanks again for bearing with us!

-Chad

On Thu, Oct 6, 2016 at 2:32 PM Greg Grossmeier wrote:
> (It wasn't just you)
>
> Gerrit was down today starting around 17:49 UTC. It is now back up and
> services are coming back online.
>
> A full investigation into the cause of the outage is still on-going.[0]
>
> Apologies for the downtime.
>
> WMF Release Engineering
>
> [0] https://etherpad.wikimedia.org/p/gerrit-outage-20161006
> But this is missing a lot of the information/discussion that is
> happening in #wikimedia-operations on Freenode. A link to the
> incident report will be pasted into that etherpad when it is
> created.
>
> --
> | Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
> | Release Team Manager            A18D 1138 8E47 FAC8 1C7D |