Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-28 Thread Greg Grossmeier
<quote name="Greg Grossmeier" date="2015-05-28" time="12:18:07 -0700">
> See also:
> https://phabricator.wikimedia.org/maniphest/?statuses=open%28%29&projects=PHID-PROJ-4uc7r7pdosfsk55qg7f6#R
> (aka: open tasks filed in the #Wikimedia-log-errors project on
> phabricator)

A maybe better way to view that query:
https://phabricator.wikimedia.org/maniphest/query/8G19mXxCyGox/#R

That one is sorted by (phabricator) project, so you can more easily see
which code repository is (probably) responsible for the error.

-- 
| Greg Grossmeier                  GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                      A18D 1138 8E47 FAC8 1C7D |

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-28 Thread Greg Grossmeier
<quote name="Mukunda Modell" date="2015-05-28" time="13:42:50 -0500">
> This also means we need to be even more diligent about policing the error
> logs and eliminating noise which obscures real problems by burying them
> among the other log messages.

+1000

See also:
https://phabricator.wikimedia.org/maniphest/?statuses=open%28%29&projects=PHID-PROJ-4uc7r7pdosfsk55qg7f6#R
(aka: open tasks filed in the #Wikimedia-log-errors project on
phabricator)

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-28 Thread Brian Wolff
On May 28, 2015 4:21 PM, Jon Robson <jdlrob...@gmail.com> wrote:

> I suspect the idea is to lean more on our quality assurance infrastructure
> e.g. browser tests which I fully welcome.
>
> The more developed they become the less chance of regressions making it to
> code let alone our projects.
>
> When I joined 3 years ago we had no quality assurance infrastructure and
> now we've got things in a great place. They still need a little fine tuning
> but this should help us iron out the kinks by forcing us to rely on them
> more and push out better code.

Significantly better than three years ago, sure. However, I would not use
the phrase "great place". There are still significant gaps in our coverage.
For browser tests in particular, I'm given to understand that their
asynchronous nature and somewhat high false-positive rate mean they aren't
taken as seriously as they should be.

FWIW, I have on several occasions received reports from users that I broke
something (obviously I try to avoid that, but I'm far from perfect). I have
never once had a browser test successfully tell me I broke something in
advance (albeit that could be because I do backend things, but backend
things do affect the front end when they explode). Occasionally the unit
tests do, but I would still say there is a lot they don't catch.

TL;DR: IMO tests are great, but nowhere near a replacement for actual
testing, at least for now.

That said, I don't think that the new deployment schedule will cause any
problems, and it is at the very least worth trying.

--bawolff

P.S. Anyone remember writing code in the time between 1.16 and 1.17? You
are all spoiled :p

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-28 Thread Dan Garry
Awesome! This will make many teams very happy since they'll be moving
faster.

What's the criteria by which you will evaluate the success of this?

Thanks,
Dan
On 27 May 2015 10:19 pm, Greg Grossmeier <g...@wikimedia.org> wrote:

> Hi all,
>
> Starting the week of June 8th we'll be transitioning our MediaWiki +
> Extensions deployment cadence to a shorter/simpler one. This will begin
> with 1.26wmf9.
>
> New cadence:
> Tuesday: New branch cut, deployed to test wikis
> Wednesday: deployed to non-Wikipedias
> Thursday: deployed to Wikipedias
>
> This is not only a lot simpler to understand (wait, we deploy twice on
> Wednesday?) but it also shortens the time to get code to everyone (2 or
> 3 days from branch cut, depending on how you count).
>
> == Transition ==
> Transitions from one cadence to another are hard. Here's how we'll be
> doing this transition:
>
> Week of June 1st (next week):
> * We'll complete the wmf8 rollout on June 3rd
> * However, we won't be cutting wmf9 on June 3rd
>
> Week of June 8th (in two weeks):
> * We'll begin the new cadence with wmf9 on Tuesday June 9th
>
>
> I hope this helps our users and developers get great new features and
> fixes faster.
>
> Greg
>
> endnotes:
> * The task: https://phabricator.wikimedia.org/T97553
> * I'll be updating the relevant documentation before the transition
>
> --
> | Greg Grossmeier                  GPG: B2FA 27B1 F7EB D327 6B8E |
> | identi.ca: @greg                      A18D 1138 8E47 FAC8 1C7D |
>
> ___
> Engineering mailing list
> engineer...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/engineering
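
The new cadence announced above can be modelled as fixed day offsets from the Tuesday branch cut. The sketch below is purely illustrative: the names `STAGES`, `train_schedule` and `days_to_everyone` are hypothetical and not part of any WMF tooling. It computes the three deployment dates for one branch and shows where the "2 or 3 days, depending on how you count" figure comes from.

```python
from datetime import date, timedelta

# Hypothetical model of the weekly train from the announcement:
# (days after the Tuesday branch cut, which wikis receive the new branch).
STAGES = [
    (0, "test wikis and mediawiki.org"),
    (1, "non-Wikipedias"),
    (2, "Wikipedias"),
]


def train_schedule(branch_cut: date) -> list[tuple[date, str]]:
    """Return (deployment date, target wikis) pairs for one weekly branch."""
    if branch_cut.weekday() != 1:  # date.weekday(): Monday == 0, Tuesday == 1
        raise ValueError("under the new cadence the branch is cut on a Tuesday")
    return [(branch_cut + timedelta(days=offset), target) for offset, target in STAGES]


def days_to_everyone(branch_cut: date) -> tuple[int, int]:
    """Days from branch cut until the Wikipedias get the code.

    The first number counts elapsed days (Tuesday to Thursday); the second
    counts calendar days inclusively (Tuesday, Wednesday and Thursday).
    """
    last_deploy = train_schedule(branch_cut)[-1][0]
    elapsed = (last_deploy - branch_cut).days
    return elapsed, elapsed + 1


if __name__ == "__main__":
    # First train on the new cadence: 1.26wmf9, cut on Tuesday 9 June 2015.
    for day, target in train_schedule(date(2015, 6, 9)):
        print(day.isoformat(), "->", target)
    print("days to everyone:", days_to_everyone(date(2015, 6, 9)))
```

Keeping the offsets in a single table means a change to the train (say, an extra stage) only touches `STAGES`.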

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-28 Thread Jon Robson
I suspect the idea is to lean more on our quality assurance infrastructure,
e.g. browser tests, which I fully welcome.

The more developed they become, the less chance there is of regressions
making it into code, let alone onto our projects.

When I joined 3 years ago we had no quality assurance infrastructure, and
now we've got things in a great place. They still need a little fine-tuning,
but this should help us iron out the kinks by forcing us to rely on them
more and push out better code.

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-28 Thread Risker
This is strictly a question from an uninvolved observer.  Does this
schedule provide for sufficient time and real-time/hands-on testing before
changes hit the big projects?

An IRC discussion I was following last evening suggested to me that the
first deploy (to test wikis and mw.org) probably did not get sufficient
hands-on testing/utilization to surface many issues that would be
significant on production wikis, which means only 24 hours on smaller
non-Wikipedia wikis, hoping that any problems will pop up before it's
applied to dewiki, frwiki and enwiki.

I recognize the challenges in balancing continuous improvement and uptime -
but if problems aren't surfaced before they hit Wikipedias simply because
the changes aren't activated by user actions or the problems aren't
reported quickly enough, then it's probably going to make more work at the
other end of the chain, with more likelihood that changes will need to be
rolled back or patches having to be written on the fly.  I have a lot of
admiration for all of you who address these unplanned situations (it really
is impressive to watch!), but I'd hate to see a lot of people constantly
being pulled away from other tasks to problem-solve downtimes on big
projects.

Risker/Anne


Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-28 Thread Greg Grossmeier
<quote name="Dan Garry" date="2015-05-28" time="13:51:47 +0200">
> Awesome! This will make many teams very happy since they'll be moving
> faster.

:)

> What's the criteria by which you will evaluate the success of this?

1) The above (happier teams).
2) It's going to be hard to measure success, but it'll be much easier
to identify failure. I'll be talking with many PMs from WMF and looking
through the SWAT deploys and incidents over the next two or more weeks
(and ongoing, of course) to determine whether this has caused any
unmitigated pain.

Greg

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-28 Thread Greg Grossmeier
<quote name="Risker" date="2015-05-28" time="09:53:31 -0400">
> This is strictly a question from an uninvolved observer.  Does this
> schedule provide for sufficient time and real-time/hands-on testing before
> changes hit the big projects?

Yes. We still have the Beta Cluster (a production-like environment),
which runs all code merged into master within 10 minutes of it being
merged.

> An IRC discussion I was following last evening suggested to me that the
> first deploy (to test wikis and mw.org) probably did not get sufficient
> hands-on testing/utilization to surface many issues that would be
> significant on production wikis, which means only 24 hours on smaller
> non-wikipedia wikis, hoping that any problems will pop up before it's
> applied to dewiki, frwiki and enwiki.

Honestly, that's the wrong perspective to take on that incident
yesterday[0]. The issue is one that is hard to identify at low traffic
levels (one that only really manifests itself at Wikipedia-scale with
Wikipedia-scale caching). There will always be issues like this,
unfortunately. The way to mitigate them better is by changing how we
bucket requests to new or old versions of the software on production.

Currently we bucket by domain name/project site. This doesn't give us a
lot of flexibility in testing new versions at a scale large enough to
surface issues but smaller than everyone. We would need to be able to
deploy new versions based on a percentage of overall requests (i.e. 5% of
all users to the new version, then 10% of all users, then everyone).
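
Routing a percentage of requests, rather than whole sites, to the new version is essentially a staged (canary-style) rollout. A minimal sketch follows, assuming each request carries some stable identifier (the hypothetical `request_key`, e.g. a user or session ID) and treating the old and new branches as plain labels; it does not describe how WMF's actual routing layer works or would work. Hashing the key deterministically keeps a given user on the same version from request to request, and anyone already on the new version at 5% stays on it when the rollout grows to 10%.

```python
import hashlib

BUCKETS = 10_000  # basis points: one bucket == 0.01% of traffic


def bucket_for(request_key: str, salt: str = "rollout") -> int:
    """Deterministically map a (hypothetical) request key to a bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{salt}:{request_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def choose_version(request_key: str, percent_new: float, old: str, new: str) -> str:
    """Send roughly percent_new% of keys to `new` and the rest to `old`."""
    if not 0 <= percent_new <= 100:
        raise ValueError("percent_new must be between 0 and 100")
    threshold = percent_new * BUCKETS / 100
    return new if bucket_for(request_key) < threshold else old


if __name__ == "__main__":
    # Hypothetical user IDs, stepping through 5%, 10%, then everyone.
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (5, 10, 100):
        on_new = sum(choose_version(u, percent, "1.26wmf8", "1.26wmf9") == "1.26wmf9" for u in users)
        print(f"{percent:>3}% rollout: {on_new} of {len(users)} users on the new branch")
```

Because the threshold only grows, each step adds users without moving anyone back to the old version. Changing `salt` between rollouts would pick a different subset of users each time, so the same people are not always the first to see new code.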

Best,

Greg

[0] https://wikitech.wikimedia.org/wiki/Incident_documentation/20150527-Cookie

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-27 Thread Jon Robson
This is super awesome Greg. Thanks for making this happen. The deployment
schedule has always been a huge source of pain for me.

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-27 Thread Greg Grossmeier
<quote name="S Page" date="2015-05-27" time="14:58:16 -0700">
> Benito, Grossmeier! He made the trains run on time [1]
>
> > Tuesday: New branch cut, deployed to test wikis
> and mediawiki.org as before, I assume.

right right, I just mentally lump mw.org with test wikis ;)

Re: [Wikitech-l] [Engineering] Simplifying the WMF deployment cadence

2015-05-27 Thread S Page
Benito, Grossmeier! He made the trains run on time [1]

> Tuesday: New branch cut, deployed to test wikis
and mediawiki.org as before, I assume.

[1] Or not, http://www.transportmyths.co.uk/mussolini.htm

-- 
=S Page  WMF Tech writer