Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
<quote name="Greg Grossmeier" date="2015-05-28" time="12:18:07 -0700">
> See also:
> https://phabricator.wikimedia.org/maniphest/?statuses=open%28%29projects=PHID-PROJ-4uc7r7pdosfsk55qg7f6#R
>
> (aka: open tasks filed in the #Wikimedia-log-errors project on phabricator)

A maybe better way to view that query:
https://phabricator.wikimedia.org/maniphest/query/8G19mXxCyGox/#R

That one is sorted by (phabricator) project, so you can more easily see which code repository is (probably) responsible for the error.

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
<quote name="Mukunda Modell" date="2015-05-28" time="13:42:50 -0500">
> This also means we need to be even more diligent about policing the error logs and eliminating noise which obscures real problems by burying them among the other log messages.

+1000

See also:
https://phabricator.wikimedia.org/maniphest/?statuses=open%28%29projects=PHID-PROJ-4uc7r7pdosfsk55qg7f6#R

(aka: open tasks filed in the #Wikimedia-log-errors project on phabricator)

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
On May 28, 2015 4:21 PM, Jon Robson jdlrob...@gmail.com wrote:

> I suspect the idea is to lean more on our quality assurance infrastructure e.g. browser tests which I fully welcome. The more developed they become the less chance of regressions making it to code let alone our projects. When I joined 3 years ago we had no quality assurance infrastructure and now we've got things in a great place. They still need a little fine tuning but this should help us iron out the kinks by forcing us to rely on them more and push out better code.

Significantly better than three years ago, sure. However, I would not use the phrase "great place". There are still significant gaps in our coverage. For browser tests in particular, I'm given to understand that their asynchronous nature and somewhat high false positive rate mean they are not taken as seriously as they should be.

FWIW, I have on several occasions received reports from users that I broke something (obviously I try to avoid that, but I'm far from perfect). I have never once had a browser test successfully tell me in advance that I broke something (albeit it could be because I do backend things, but backend things do affect the front end when they explode). Occasionally the unit tests do, but I would still say there is a lot they don't catch.

Tl;dr: imo tests are great, but nowhere near replacing actual testing, at least for now.

That said, I don't think that the new deployment schedule will cause any problems, and it is at the very least worth trying.

--bawolff

P.S. Anyone remember writing code in the time between 1.16 and 1.17? You are all spoiled :p
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
Awesome! This will make many teams very happy since they'll be moving faster.

What's the criteria by which you will evaluate the success of this?

Thanks,
Dan

On 27 May 2015 10:19 pm, Greg Grossmeier g...@wikimedia.org wrote:

> Hi all,
>
> Starting the week of June 8th we'll be transitioning our MediaWiki + Extensions deployment cadence to a shorter/simpler one. This will begin with 1.26wmf9.
>
> New cadence:
> Tuesday: New branch cut, deployed to test wikis
> Wednesday: deployed to non-wikipedias
> Thursday: deployed to Wikipedias
>
> This is not only a lot simpler to understand (wait, we deploy twice on Wednesday?) but it also shortens the time to get code to everyone (2 or 3 days from branch cut, depending on how you count).
>
> == Transition ==
> Transitions from one cadence to another are hard. Here's how we'll be doing this transition:
>
> Week of June 1st (next week):
> * We'll complete the wmf8 rollout on June 3rd
> * However, we won't be cutting wmf9 on June 3rd
>
> Week of June 8th (in two weeks):
> * We'll begin the new cadence with wmf9 on Tuesday June 9th
>
> I hope this helps our users and developers get great new features and fixes faster.
>
> Greg
>
> endnotes:
> * The task: https://phabricator.wikimedia.org/T97553
> * I'll be updating the relevant documentation before the transition
>
> ___
> Engineering mailing list
> engineer...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/engineering
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
I suspect the idea is to lean more on our quality assurance infrastructure e.g. browser tests which I fully welcome. The more developed they become the less chance of regressions making it to code let alone our projects. When I joined 3 years ago we had no quality assurance infrastructure and now we've got things in a great place. They still need a little fine tuning but this should help us iron out the kinks by forcing us to rely on them more and push out better code.

On 28 May 2015 3:53 pm, Risker risker...@gmail.com wrote:

> This is strictly a question from an uninvolved observer. Does this schedule provide for sufficient time and real-time/hands-on testing before changes hit the big projects?
>
> An IRC discussion I was following last evening suggested to me that the first deploy (to test wikis and mw.org) probably did not get sufficient hands-on testing/utilization to surface many issues that would be significant on production wikis, which means only 24 hours on smaller non-wikipedia wikis, hoping that any problems will pop up before it's applied to dewiki, frwiki and enwiki.
>
> I recognize the challenges in balancing continuous improvement and uptime - but if problems aren't surfaced before they hit wikipedias simply because the changes aren't activated by user actions or the problems aren't reported quickly enough, then it's probably going to make more work at the other end of the chain, with more likelihood that changes will need to be rolled back or patches having to be written on the fly.
>
> I have a lot of admiration for all of you who address these unplanned situations (it really is impressive to watch!), but I'd hate to see a lot of people constantly being pulled away from other tasks to problem-solve downtimes on big projects.
>
> Risker/Anne
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
This is strictly a question from an uninvolved observer. Does this schedule provide for sufficient time and real-time/hands-on testing before changes hit the big projects?

An IRC discussion I was following last evening suggested to me that the first deploy (to test wikis and mw.org) probably did not get sufficient hands-on testing/utilization to surface many issues that would be significant on production wikis, which means only 24 hours on smaller non-wikipedia wikis, hoping that any problems will pop up before it's applied to dewiki, frwiki and enwiki.

I recognize the challenges in balancing continuous improvement and uptime - but if problems aren't surfaced before they hit wikipedias simply because the changes aren't activated by user actions or the problems aren't reported quickly enough, then it's probably going to make more work at the other end of the chain, with more likelihood that changes will need to be rolled back or patches having to be written on the fly.

I have a lot of admiration for all of you who address these unplanned situations (it really is impressive to watch!), but I'd hate to see a lot of people constantly being pulled away from other tasks to problem-solve downtimes on big projects.

Risker/Anne
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
<quote name="Dan Garry" date="2015-05-28" time="13:51:47 +0200">
> Awesome! This will make many teams very happy since they'll be moving faster.

:)

> What's the criteria by which you will evaluate the success of this?

1) the above (happier teams)
2) It's going to be hard to measure success but it'll be much easier to identify failure. I'll be talking with many PMs from WMF and looking through the SWAT deploys and incidents over the next two or more weeks (and ongoing, of course) to determine if this has caused any unmitigated pain.

Greg

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
<quote name="Risker" date="2015-05-28" time="09:53:31 -0400">
> This is strictly a question from an uninvolved observer. Does this schedule provide for sufficient time and real-time/hands-on testing before changes hit the big projects?

Yes. We still have Beta Cluster (production-like environment) which runs all code merged into master within 10 minutes of it being merged.

> An IRC discussion I was following last evening suggested to me that the first deploy (to test wikis and mw.org) probably did not get sufficient hands-on testing/utilization to surface many issues that would be significant on production wikis, which means only 24 hours on smaller non-wikipedia wikis, hoping that any problems will pop up before it's applied to dewiki, frwiki and enwiki.

Honestly, that's the wrong perspective to take on that incident yesterday[0]. The issue is one that is hard to identify at low traffic levels (one that only really manifests itself at Wikipedia-scale with Wikipedia-scale caching). There will always be issues like this, unfortunately.

The way to mitigate them better is by changing how we bucket requests to new or old versions of the software on production. Currently we bucket by domain name/project site. This doesn't give us a lot of flexibility to test new versions at a scale large enough to show issues without exposing everyone. We would need to be able to deploy new versions based on percentage of overall requests (i.e., 5% of all users to new version, then 10% of all users to new version, then everyone).

Best,

Greg

[0] https://wikitech.wikimedia.org/wiki/Incident_documentation/20150527-Cookie

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
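The percentage-based bucketing described in the message above can be illustrated with a minimal sketch. This is not how Wikimedia routes traffic; it only shows the general idea of assigning a stable share of clients to a new version. The `client_id` string, the function names, the use of SHA-256 and the 100-bucket split are all assumptions made for the example, and the two version strings are used purely as labels.

```python
import hashlib

# Labels only: the thread mentions wmf8 and wmf9 as consecutive branches.
OLD_VERSION = "1.26wmf8"
NEW_VERSION = "1.26wmf9"


def bucket_for(client_id: str, buckets: int = 100) -> int:
    """Map a stable client identifier (hypothetical) to a bucket in [0, buckets)."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # The first 8 bytes of the digest are enough for an even spread.
    return int.from_bytes(digest[:8], "big") % buckets


def version_for(client_id: str, rollout_percent: int) -> str:
    """Return the version a client is served at a given rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Clients in buckets below the threshold get the new version, so raising
    # the percentage only ever adds clients; nobody flips back to the old one.
    return NEW_VERSION if bucket_for(client_id) < rollout_percent else OLD_VERSION


if __name__ == "__main__":
    sample = [f"client-{i}" for i in range(10_000)]
    for percent in (5, 10, 100):
        on_new = sum(version_for(c, percent) == NEW_VERSION for c in sample)
        print(f"{percent:>3}% target: {on_new / len(sample):.1%} of sample on {NEW_VERSION}")
```

Hashing a stable identifier, rather than picking at random per request, keeps each client on one version for the duration of a rollout stage, which is the property that bucketing by domain name also provides today.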
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
This is super awesome Greg. Thanks for making this happen. The deployment schedule has always been a huge source of pain for me.
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
<quote name="S Page" date="2015-05-27" time="14:58:16 -0700">
> Benito, Grossmeier! He made the trains run on time [1]
>
> > Tuesday: New branch cut, deployed to test wikis
>
> and mediawiki.org as before, I assume.

right right, I just mentally lump mw.org with test wikis ;)

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence
Benito, Grossmeier! He made the trains run on time [1]

> Tuesday: New branch cut, deployed to test wikis

and mediawiki.org as before, I assume.

[1] Or not, http://www.transportmyths.co.uk/mussolini.htm

--
=S Page  WMF Tech writer