Re: [Wikitech-l] Simplifying the WMF deployment cadence
On May 30, 2015 4:07 AM, John Mark Vandenberg jay...@gmail.com wrote:
> So if a shorter deploy process is implemented, we need to find ways to get bug reports to you sooner,

I think work/life balance is going to continue to prevent me personally from getting things any sooner ;)

> and ensure you are not the only one who can notice and fix bugs related to API breakages, etc.

Anyone *can*, just no one really *does* at the moment. The Platform reorg made a team that would have, but then the Engineering reorg changed that plan. We decided to have the new Reading Infrastructure team take on API maintenance, but at the moment the team is short on developers.

> IIRC your API warnings system can send multiple distinct warnings as a single string, with each warning separated by only a new line, which is especially nasty for user agents to 'understand'. (but this may only be in older versions of the API - I'm not sure)

Fixing that is on my todo list.

> that there would be a warning related to the module used, e.g. result['warnings']['allpages'], but that didn't exist because this warning was in result['warnings']['query'].

The query module is used there too ;)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
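A minimal sketch of the lookup problem discussed in this message: a client asking for warnings about `allpages` also has to look under the parent `query` module. It assumes the legacy JSON shape in which each module's warnings sit under a `'*'` key; `warnings_for` and the sample response are hypothetical and not taken from pywikibot.

```python
def warnings_for(result, module, parent='query'):
    """Return (module_name, text) pairs filed under `module` or its parent."""
    warnings = result.get('warnings', {})
    found = []
    # dict.fromkeys keeps order and avoids checking the same key twice
    # when `module` is itself the parent module.
    for name in dict.fromkeys((module, parent)):
        entry = warnings.get(name)
        if entry is None:
            continue
        # Assumed legacy shape: {'query': {'*': 'warning text'}}
        text = entry.get('*', '') if isinstance(entry, dict) else str(entry)
        if text:
            found.append((name, text))
    return found


# Hypothetical response: the warning is filed under 'query', not 'allpages'.
result = {'warnings': {'query': {'*': 'Example warning text.'}}}
print(warnings_for(result, 'allpages'))
```

Checking both keys means the client neither raises a `KeyError` nor silently drops the warning when the API files it under the parent module.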
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On Fri, May 29, 2015 at 9:36 PM, Brad Jorsch (Anomie) bjor...@wikimedia.org wrote:
> On Thu, May 28, 2015 at 2:39 PM, John Mark Vandenberg jay...@gmail.com wrote:
>> [T96942 https://phabricator.wikimedia.org/T96942] was reported by pywikibot devs almost as soon as we detected that the test wikis were failing in our travis-ci tests.
>
> At 20:22 in the timezone of the main API developer (me).
>
>> It was 12 hours before a MediaWiki API fix was submitted to Gerrit,
>
> 09:31, basically first thing in the morning for the main API developer. There's really not much to complain about there.

In the proposed deploy sequence, 12 hours is a serious amount of time. So if a shorter deploy process is implemented, we need to find ways to get bug reports to you sooner, and ensure you are not the only one who can notice and fix bugs related to API breakages, etc.

>> and it took four additional *days* to get merged.
>
> That part does suck.

On a positive note, the API breakage this week was rectified much quicker and pywikibot test builds are green again.

https://phabricator.wikimedia.org/T100775
https://travis-ci.org/wikimedia/pywikibot-core/builds/64631025

>> This also doesn't give clients sufficient time to work around MediaWiki's wonderful intentional API breakages. e.g. raw continue, which completely broke pywikibot and needed a large chunk of code rewritten urgently, both for pywikibot core and the much older and harder to fix pywikibot compat, which is still used as part of processes that wiki communities rely on.
>
> The continuation change hasn't actually broken anything yet.

Hmm. You still don't appreciate that you actually really truly fair-dinkum broke pywikibot? Warnings are part of the API, and adding or changing them can break clients. Warnings are one of the more brittle parts of the API.

The impact on pywikibot core users wasn't so apparent, as the pywikibot core devs fixed the problem when it hit the test servers and it was merged before it hit production servers.
Not all users had run 'git pull' to get the latest pywikibot core code, and they informed us their bots were broken, but as far as I am aware we didn't get any pywikibot core bug reports submitted because (often after mentioning problems on IRC) their problems disappeared after they ran 'git pull'.

However pywikipedia / compat isn't actively maintained, and it broke badly in production, with some scripts being broken for over a month:

https://phabricator.wikimedia.org/T74667
https://phabricator.wikimedia.org/T74749

pywikibot core is gradually improving its understanding of the API warning system, but it isn't well supported yet. As a result, pywikibot generally reports warnings to the user. IIRC your API warnings system can send multiple distinct warnings as a single string, with each warning separated by only a new line, which is especially nasty for user agents to 'understand'. (but this may only be in older versions of the API - I'm not sure)

So adding a new warning to the API can result in the same warning appearing many many times on the user console / logs, and thousands of warnings on the screen send the users into panic mode.

I strongly recommend fixing the warning system before using it again as aggressively as was done for rawcontinue. e.g. It would be nice if the API emitted codes for each warning scenario (don't reuse codes for similar scenarios), so we don't need to do string matching to detect and discard expected warnings, and you can i18n those messages without breaking clients. (I think there is already a phab task for this.)

I also strongly recommend that Wikimedia gets heavily involved in decommissioning pywikibot compat bots on Wikimedia servers, and any other very old unmaintained clients, so that the API can be aggressively updated without breaking the many bots still using compat. pywikibot devs did some initial work with WMF staff at Lyon on this front, and we need to keep that moving ahead.
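One way a client could cope with newline-joined warnings and avoid flooding the console is sketched below. It is an illustration under assumptions, not pywikibot's code: `distinct_warnings` is a hypothetical helper, the response is made up, and the `'*'` key reflects the legacy JSON warning shape.

```python
def distinct_warnings(result, seen):
    """Yield each (module, message) pair that is not already in `seen`."""
    for module, entry in result.get('warnings', {}).items():
        # Assumed legacy shape: {'query': {'*': 'first\nsecond'}}
        text = entry.get('*', '') if isinstance(entry, dict) else str(entry)
        # One string may carry several warnings separated by newlines.
        for message in text.split('\n'):
            message = message.strip()
            key = (module, message)
            if message and key not in seen:
                seen.add(key)
                yield key


seen = set()
# Hypothetical response repeated on every request of a long bot run.
response = {'warnings': {'query': {
    '*': 'First example warning.\nSecond example warning.'}}}
for _ in range(1000):
    for module, message in distinct_warnings(response, seen):
        print('API warning (%s): %s' % (module, message))
```

Each distinct warning is shown once per run instead of once per request. This still matches on message text; stable per-scenario warning codes, as suggested above, would make a better key.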
> Unless pywikibot was treating warnings as errors and that's what broke it?

Yes, some of the code was raising an exception when it detected an API warning. However another part of the breakage was that the JSON structure of the new rawcontinue warnings was not what was expected. Some pywikipedia / compat code assumed that the presence of warnings implied that there would be a warning related to the module used, e.g. result['warnings']['allpages'], but that didn't exist because this warning was in result['warnings']['query'].

https://gerrit.wikimedia.org/r/#/c/176910/4/wikipedia.py,cm
https://gerrit.wikimedia.org/r/#/c/170075/2/wikipedia.py,cm

> It's coming soon though. Nor should a large chunk of code *need* rewriting, just add one parameter to your action=query requests.

I presume you mean we could have just added rawcontinue=''. Our very limited testing at the time (we didn't have much time to fix the bugs before it would hit production), and subsequent testing, indicates that the rawcontinue parameter can be used even for
Re: [Wikitech-l] Simplifying the WMF deployment cadence
I'll echo those nervous about the faster pace for deploys -- although not so nervous as to dig my feet in and yell stop.

Mostly my concerns boil down to the fact that the beta environment isn't really a good test for anything other than absolute crashers. Hardly anyone uses the Collection extension on beta, for example.

I'd *really* like to see some effort put into improving beta. In particular, running beta with an up-to-date (but sanitized, perhaps) mirror of the main WP databases would ensure that we have a decent number of test cases on beta.

We recently spent a couple of hours trying to test Parsoid on beta, only to find out that the behavior we were chasing down was caused by the fact that beta's copy of the IPA formatting templates from enwiki (!) was out of date and incomplete. We really need to do better than [[English language]] as far as articles to test on enbeta.

--scott
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On Thu, May 28, 2015 at 2:39 PM, John Mark Vandenberg jay...@gmail.com wrote:
> [T96942 https://phabricator.wikimedia.org/T96942] was reported by pywikibot devs almost as soon as we detected that the test wikis were failing in our travis-ci tests.

At 20:22 in the timezone of the main API developer (me).

> It was 12 hours before a MediaWiki API fix was submitted to Gerrit,

09:31, basically first thing in the morning for the main API developer. There's really not much to complain about there.

> and it took four additional *days* to get merged.

That part does suck.

> This also doesn't give clients sufficient time to work around MediaWiki's wonderful intentional API breakages. e.g. raw continue, which completely broke pywikibot and needed a large chunk of code rewritten urgently, both for pywikibot core and the much older and harder to fix pywikibot compat, which is still used as part of processes that wiki communities rely on.

The continuation change hasn't actually broken anything yet. It's coming soon though. Nor should a large chunk of code *need* rewriting, just add one parameter to your action=query requests.

Unless pywikibot was treating warnings as errors and that's what broke it? Or you're referring to unit tests rather than actual breakage? But a notice about the warnings was sent to mediawiki-api-announce in September 2014,[1] a bit over a month before the warnings started.[2]

[1]: https://lists.wikimedia.org/pipermail/mediawiki-api-announce/2014-September/69.html
[2]: https://gerrit.wikimedia.org/r/#/c/160222/

> Another example is the action=help rewrite not being backwards compatible. pywikibot wasn't broken, as it only uses the help module for older MW releases; but it wouldn't surprise me if there are clients that were parsing the help text and they would have been broken.

Comments on that and other proposed changes were requested on mediawiki-api-announce in July 2014,[3] three months before the change[4] was merged. No concerns were raised at the requested location[5] or on the mediawiki-api mailing list.

[3]: https://lists.wikimedia.org/pipermail/mediawiki-api-announce/2014-July/62.html
[4]: https://gerrit.wikimedia.org/r/#/c/160798/
[5]: https://www.mediawiki.org/wiki/API/Architecture_work/Planning#HTMLizing_action.3Dhelp

--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
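The "one parameter" change mentioned in this message can be sketched roughly as follows. This is an illustration rather than pywikibot's actual fix: `with_rawcontinue` is a hypothetical helper, and `list=allpages` with `aplimit` is only a sample query.

```python
from urllib.parse import urlencode


def with_rawcontinue(params):
    """Return a copy of `params`, adding rawcontinue to action=query requests."""
    updated = dict(params)
    if updated.get('action') == 'query':
        # An empty value is enough to request the old continuation format.
        updated.setdefault('rawcontinue', '')
    return updated


# Sample request parameters; values are illustrative only.
params = with_rawcontinue({
    'action': 'query',
    'format': 'json',
    'list': 'allpages',
    'aplimit': '10',
})
print(urlencode(params))
```

Applying the helper at the single place where requests are assembled keeps the change to one spot, provided the client's continuation handling already expects the old format.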
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On 28/05/2015 20:39, John Mark Vandenberg wrote:
[snip]
> This also doesn't give clients sufficient time to work around MediaWiki's wonderful intentional API breakages. e.g. raw continue, which completely broke pywikibot and needed a large chunk of code rewritten urgently, both for pywikibot core and the much older and harder to fix pywikibot compat, which is still used as part of processes that wiki communities rely on.
>
> Another example is the action=help rewrite not being backwards compatible. pywikibot wasn't broken, as it only uses the help module for older MW releases; but it wouldn't surprise me if there are clients that were parsing the help text and they would have been broken.

I can't stress enough how important pywikibot is! It covers so many features and use cases that it is an excellent stress test for the API.

A low-hanging fruit would be to run its test suite against beta (which runs tip of master) on an hourly basis.

--
Antoine "hashar" Musso
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On Fri, May 29, 2015 at 2:07 AM, Greg Grossmeier g...@wikimedia.org wrote:
> On 2015-05-29 at 01:39:52 +0700, John Mark Vandenberg wrote:
>> It was reported by pywikibot devs almost as soon as we detected that the test wikis were failing in our travis-ci tests. It was 12 hours before a MediaWiki API fix was submitted to Gerrit, and it took four additional *days* to get merged. The Phabricator task was marked Unbreak Now! all that time.
>
> Which shows the tooling works, but not the social aspects. The backport process (eg SWAT and related things) will improve soon as well which should address much of this.

Your tooling depends on pywikibot developers (all volunteers) merging a patch within your branch-deploy cycle, which fires off a Travis-CI build of *pywikibot* unit tests which runs some tests against test.wikipedia.org and test.wikidata.org? And you're proposing to shorten the window in which all this can happen and get useful bug reports out.

A little crazy but OK. The biggest problem with that approach is Travis-CI is not very reliable - often they are backlogged and tests are not run for days. So I suggest that you arrange to run the pywikibot tests daily (or more regularly) on WMF test/beta servers, and the unit tests of any other client which is a critical part of processes on the Wikimedia wikis.

> Not-a-great-response-but: can you specifically ping me in phabricator (I'm @greg) for issues like that above?

That is a process problem. The MediaWiki ops devs need to detect and escalate massive API breakages, especially after creating the fix which needs to be code reviewed.

--
John Vandenberg
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On 2015-05-29 at 04:11:05 +0700, John Mark Vandenberg wrote:
> On Fri, May 29, 2015 at 2:07 AM, Greg Grossmeier g...@wikimedia.org wrote:
>> On 2015-05-29 at 01:39:52 +0700, John Mark Vandenberg wrote:
>>> It was reported by pywikibot devs almost as soon as we detected that the test wikis were failing in our travis-ci tests. It was 12 hours before a MediaWiki API fix was submitted to Gerrit, and it took four additional *days* to get merged. The Phabricator task was marked Unbreak Now! all that time.
>>
>> Which shows the tooling works, but not the social aspects. The backport process (eg SWAT and related things) will improve soon as well which should address much of this.
>
> Your tooling depends on pywikibot developers (all volunteers) merging a patch within your branch-deploy cycle, which fires off a Travis-CI build of *pywikibot* unit tests which runs some tests against test.wikipedia.org and test.wikidata.org? And you're proposing to shorten the window in which all this can happen and get useful bug reports out.

That's not my tooling, that's pywikibot's ;). But the point is, a problem identified by your testing was reported, and a fix was submitted, in a reasonable amount of time. The failure to get it merged, however, was the failure.

> A little crazy but OK. The biggest problem with that approach is Travis-CI is not very reliable - often they are backlogged and tests are not run for days. So I suggest that you arrange to run the pywikibot tests daily (or more regularly) on WMF test/beta servers, and the unit tests of any other client which is a critical part of processes on the Wikimedia wikis.

I would support having pywikibot use WMF hosted integration testing. Please file a task with your current setup in the #continuous-integration-config project: https://phabricator.wikimedia.org/project/profile/1208/

>> Not-a-great-response-but: can you specifically ping me in phabricator (I'm @greg) for issues like that above?
>
> That is a process problem. The MediaWiki ops devs need to detect and escalate massive API breakages, especially after creating the fix which needs to be code reviewed.

Concur.

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On May 28, 2015 8:40 PM, John Mark Vandenberg jay...@gmail.com wrote:
> On Fri, May 29, 2015 at 12:17 AM, Legoktm legoktm.wikipe...@gmail.com wrote:
>> On 05/27/2015 01:19 PM, Greg Grossmeier wrote:
>>> Hi all,
>>>
>>> New cadence:
>>> Tuesday: New branch cut, deployed to test wikis
>>> Wednesday: deployed to non-wikipedias
>>> Thursday: deployed to Wikipedias
>>
>> This means that if we/users spot a bug once the train hits Wikipedias, or the bug is in an extension like PageTriage which is only used on the English Wikipedia, we have to: rush to make the 4pm SWAT window, deploy on Friday, or wait until Monday; which from what I remember were similar reasons from when we moved the train from Thursday to Wednesday.
>
> Recent API breakages suggest that this doesn't give enough time for client tests to be run, bugs reported, fixed and merged.
>
> https://phabricator.wikimedia.org/T96942 was an API bug last month which completely broke pywikibot. All wikis; all use cases. It was reported by pywikibot devs almost as soon as we detected that the test wikis were failing in our travis-ci tests. It was 12 hours before a MediaWiki API fix was submitted to Gerrit, and it took four additional *days* to get merged. The Phabricator task was marked Unbreak Now! all that time.

Shouldn't such tests be run against beta wiki, not testwiki?

--bawolff
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On 2015-05-28 at 10:17:19 -0700, Legoktm wrote:
> This means that if we/users spot a bug once the train hits Wikipedias, or the bug is in an extension like PageTriage which is only used on the English Wikipedia, we have to: rush to make the 4pm SWAT window, deploy on Friday, or wait until Monday; which from what I remember were similar reasons from when we moved the train from Thursday to Wednesday.

Emergency bug fixes are already OK on Fridays (just not "I want my new feature out").

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On Fri, May 29, 2015 at 12:17 AM, Legoktm legoktm.wikipe...@gmail.com wrote:
> On 05/27/2015 01:19 PM, Greg Grossmeier wrote:
>> Hi all,
>>
>> New cadence:
>> Tuesday: New branch cut, deployed to test wikis
>> Wednesday: deployed to non-wikipedias
>> Thursday: deployed to Wikipedias
>
> This means that if we/users spot a bug once the train hits Wikipedias, or the bug is in an extension like PageTriage which is only used on the English Wikipedia, we have to: rush to make the 4pm SWAT window, deploy on Friday, or wait until Monday; which from what I remember were similar reasons from when we moved the train from Thursday to Wednesday.

Recent API breakages suggest that this doesn't give enough time for client tests to be run, bugs reported, fixed and merged.

https://phabricator.wikimedia.org/T96942 was an API bug last month which completely broke pywikibot. All wikis; all use cases. It was reported by pywikibot devs almost as soon as we detected that the test wikis were failing in our travis-ci tests. It was 12 hours before a MediaWiki API fix was submitted to Gerrit, and it took four additional *days* to get merged. The Phabricator task was marked Unbreak Now! all that time.

This also doesn't give clients sufficient time to work around MediaWiki's wonderful intentional API breakages. e.g. raw continue, which completely broke pywikibot and needed a large chunk of code rewritten urgently, both for pywikibot core and the much older and harder to fix pywikibot compat, which is still used as part of processes that wiki communities rely on.

Another example is the action=help rewrite not being backwards compatible. pywikibot wasn't broken, as it only uses the help module for older MW releases; but it wouldn't surprise me if there are clients that were parsing the help text and they would have been broken.

--
John Vandenberg
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On 2015-05-29 at 01:39:52 +0700, John Mark Vandenberg wrote:
> It was reported by pywikibot devs almost as soon as we detected that the test wikis were failing in our travis-ci tests. It was 12 hours before a MediaWiki API fix was submitted to Gerrit, and it took four additional *days* to get merged. The Phabricator task was marked Unbreak Now! all that time.

Which shows the tooling works, but not the social aspects. The backport process (eg SWAT and related things) will improve soon as well which should address much of this.

Not-a-great-response-but: can you specifically ping me in phabricator (I'm @greg) for issues like that above?

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On 05/27/2015 01:19 PM, Greg Grossmeier wrote:
> Hi all,
>
> New cadence:
> Tuesday: New branch cut, deployed to test wikis
> Wednesday: deployed to non-wikipedias
> Thursday: deployed to Wikipedias

This means that if we/users spot a bug once the train hits Wikipedias, or the bug is in an extension like PageTriage which is only used on the English Wikipedia, we have to: rush to make the 4pm SWAT window, deploy on Friday, or wait until Monday; which from what I remember were similar reasons from when we moved the train from Thursday to Wednesday.

-- Legoktm
[Wikitech-l] Simplifying the WMF deployment cadence
Hi all,

Starting the week of June 8th we'll be transitioning our MediaWiki + Extensions deployment cadence to a shorter/simpler one. This will begin with 1.26wmf9.

New cadence:
Tuesday: New branch cut, deployed to test wikis
Wednesday: deployed to non-wikipedias
Thursday: deployed to Wikipedias

This is not only a lot simpler to understand ("wait, we deploy twice on Wednesday?") but it also shortens the time to get code to everyone (2 or 3 days from branch cut, depending on how you count).

== Transition ==

Transitions from one cadence to another are hard. Here's how we'll be doing this transition:

Week of June 1st (next week):
* We'll complete the wmf8 rollout on June 3rd
* However, we won't be cutting wmf9 on June 3rd

Week of June 8th (in two weeks):
* We'll begin the new cadence with wmf9 on Tuesday June 9th

I hope this helps our users and developers get great new features and fixes faster.

Greg

endnotes:
* The task: https://phabricator.wikimedia.org/T97553
* I'll be updating the relevant documentation before the transition

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
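For anyone who wants the dates spelled out, the new cadence can be expressed as a small calculation. This is only a sketch: `CADENCE` and `train_schedule` are hypothetical names, and the stage descriptions are copied from the announcement above.

```python
from datetime import date, timedelta

# Offsets from Monday (Monday == 0) for each stage of the weekly train.
CADENCE = [
    (1, 'New branch cut, deployed to test wikis'),
    (2, 'deployed to non-wikipedias'),
    (3, 'deployed to Wikipedias'),
]


def train_schedule(any_day):
    """Return (date, stage) pairs for the week that contains `any_day`."""
    monday = any_day - timedelta(days=any_day.weekday())
    return [(monday + timedelta(days=offset), stage)
            for offset, stage in CADENCE]


# First week of the new cadence (wmf9), starting Tuesday June 9th.
for day, stage in train_schedule(date(2015, 6, 8)):
    print(day.isoformat(), stage)
```

Counting from the Tuesday branch cut, the Thursday deployment to Wikipedias falls two days later, which matches the "2 or 3 days" figure depending on whether the cut day itself is counted.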
Re: [Wikitech-l] Simplifying the WMF deployment cadence
Is there still the same week gap between version deployments to catch bugs?

On Wed, May 27, 2015 at 4:19 PM, Greg Grossmeier g...@wikimedia.org wrote:
> Hi all,
>
> Starting the week of June 8th we'll be transitioning our MediaWiki + Extensions deployment cadence to a shorter/simpler one. This will begin with 1.26wmf9.
>
> New cadence:
> Tuesday: New branch cut, deployed to test wikis
> Wednesday: deployed to non-wikipedias
> Thursday: deployed to Wikipedias
>
> This is not only a lot simpler to understand ("wait, we deploy twice on Wednesday?") but it also shortens the time to get code to everyone (2 or 3 days from branch cut, depending on how you count).
>
> == Transition ==
>
> Transitions from one cadence to another are hard. Here's how we'll be doing this transition:
>
> Week of June 1st (next week):
> * We'll complete the wmf8 rollout on June 3rd
> * However, we won't be cutting wmf9 on June 3rd
>
> Week of June 8th (in two weeks):
> * We'll begin the new cadence with wmf9 on Tuesday June 9th
>
> I hope this helps our users and developers get great new features and fixes faster.
>
> Greg
>
> endnotes:
> * The task: https://phabricator.wikimedia.org/T97553
> * I'll be updating the relevant documentation before the transition
>
> --
> | Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
> | identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On 2015-05-27 at 16:24:17 -0400, John wrote:
> Is there still the same week gap between version deployments to catch bugs?

No, the time to Wikipedias from branch cut is 2 days.

I trust our code review, integration, and testing workflows. If it turns out this is too aggressive, we'll switch back to the previous cadence.

Greg

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
Re: [Wikitech-l] Simplifying the WMF deployment cadence
On 27/05/2015 22:19, Greg Grossmeier wrote:
> Hi all,
>
> Starting the week of June 8th we'll be transitioning our MediaWiki + Extensions deployment cadence to a shorter/simpler one. This will begin with 1.26wmf9.
>
> New cadence:
> Tuesday: New branch cut, deployed to test wikis
> Wednesday: deployed to non-wikipedias
> Thursday: deployed to Wikipedias
>
> This is not only a lot simpler to understand ("wait, we deploy twice on Wednesday?") but it also shortens the time to get code to everyone (2 or 3 days from branch cut, depending on how you count).

Two days... this is awesome.

> == Transition ==
>
> Transitions from one cadence to another are hard. Here's how we'll be doing this transition:
>
> Week of June 1st (next week):
> * We'll complete the wmf8 rollout on June 3rd
> * However, we won't be cutting wmf9 on June 3rd
>
> Week of June 8th (in two weeks):
> * We'll begin the new cadence with wmf9 on Tuesday June 9th
>
> I hope this helps our users and developers get great new features and fixes faster.
>
> Greg
>
> endnotes:
> * The task: https://phabricator.wikimedia.org/T97553
> * I'll be updating the relevant documentation before the transition