Re: [Wikitech-l] [Engineering] Long running tasks/scripts now included on [[wikitech:Deployments]]

2016-09-22 Thread Greg Grossmeier
What Alex said. This is for one-off type things.

--
Sent from my phone, please excuse brevity.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Long running tasks/scripts now included on [[wikitech:Deployments]]

2016-09-22 Thread Jaime Crespo
Let me clarify the reasoning for the idea:

We realized that some schema changes (which used to be scheduled like other
deployments) no longer take 1 hour: they can take 1 month, running
continuously, like https://phabricator.wikimedia.org/T139090 , because it
affects 3 of our largest tables. Also, they no longer require read-only
mode or affect code in any way (unless they are a prerequisite).

On the other hand, a schema change, combined with high read or write load
from long-running maintenance jobs, like those of the updateCollation
script or any other (those were just examples), could potentially make
lag a worse problem: a single transaction has to store pending changes
during its lifetime, or long-running reads can block and create pileups due
to metadata locking. We want to avoid those, which have certainly caused
infrastructure issues in the past.

So, in summary, regular deployments are mutually exclusive.
Long-running maintenance tasks, however, could affect each other. This is a
way for me (and others) to have visibility into those potential negative
interactions, and make sure we can coordinate: "You are doing work on enwiki?
No problem, we will just run this task for commons". "You need to do an
emergency data recovery? I will hold off on this other task that can wait".
Even if only DBAs use it, it is already useful for avoiding incompatible
changes at the same time. But it will be even more useful if everybody uses it!
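
To illustrate the kind of check this visibility makes possible, here is a
minimal Python sketch. It is only an assumption about how one might model
calendar entries: the Task class, the conflicts() helper, and the sample
dates are hypothetical and are not part of the Deployments page or any
existing tool. It flags pairs of long-running tasks that target the same
wiki during overlapping time windows.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Task:
    """Hypothetical calendar entry for a long-running task."""
    name: str
    wiki: str          # e.g. "enwiki" or "commonswiki" (illustrative values)
    start: datetime
    end: datetime


def conflicts(tasks):
    """Return name pairs of tasks on the same wiki whose windows overlap."""
    found = []
    for a, b in combinations(tasks, 2):
        same_wiki = a.wiki == b.wiki
        # Two half-open intervals overlap when each starts before the other ends.
        overlap = a.start < b.end and b.start < a.end
        if same_wiki and overlap:
            found.append((a.name, b.name))
    return found


# Made-up sample entries, purely for illustration.
calendar = [
    Task("schema change", "enwiki", datetime(2016, 9, 1), datetime(2016, 10, 1)),
    Task("updateCollation", "enwiki", datetime(2016, 9, 20), datetime(2016, 9, 22)),
    Task("updateCollation", "commonswiki", datetime(2016, 9, 20), datetime(2016, 9, 22)),
]

print(conflicts(calendar))
```

With these sample entries, only the two enwiki tasks are reported as a
potential conflict; the commons run can proceed in parallel, which matches
the "we will just run this task for commons" case above.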

On Thu, Sep 22, 2016 at 4:27 PM, Alex Monk wrote:

> I had been assuming that puppetised crons were not really relevant...
>
> On 22 September 2016 at 15:19, Guillaume Lederrey wrote:
>
>> Hello!
>>
>> Increasing visibility sounds like a great idea! How far do we want to
>> go in that direction? In particular, I'm thinking of a few of the
>> crons we have for Cirrus. For example, we do have daily crons on
>> terbium that re-generate the suggester indices. Those can run for more
>> than 1 hour.
>>
>> My understanding is that those kinds of crons should not be considered
>> scripts, but standard working parts of the system. Adding them will
>> probably generate more noise than useful information. Is this a
>> reasonable understanding?
>>
>> Thanks!
>>
>>Guillaume
>>
>>
>>
>> On Wed, Sep 21, 2016 at 12:29 AM, Greg Grossmeier 
>> wrote:
>> > In an effort to reduce surprises and potential mishaps it is now
>> > required to include any long running tasks in the deployment
>> > calendar[0].
>> >
>> > "Long running tasks" include any script that is run on production 'work
>> > machines' such as terbium and that lasts longer than ~1 hour. Think:
>> > migration and maintenance scripts.
>> >
>> > This was discussed and proposed in T144661[1].
>> >
>> > Best,
>> >
>> > Greg
>> >
>> > [0] https://wikitech.wikimedia.org/wiki/Deployments
>> > Relevant diff:
>> > https://wikitech.wikimedia.org/w/index.php?diff=850923&oldid=850244
>> > [1] https://phabricator.wikimedia.org/T144661
>> >
>> > --
>> > | Greg Grossmeier     GPG: B2FA 27B1 F7EB D327 6B8E |
>> > | Release Team Manager     A18D 1138 8E47 FAC8 1C7D |
>> >
>> > ___
>> > Engineering mailing list
>> > engineer...@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/engineering
>> >
>>
>>
>>
>> --
>> Guillaume Lederrey
>> Operations Engineer, Discovery
>> Wikimedia Foundation
>> UTC+2 / CEST
>>
>>
>
>
>
> --
> Alex Monk
> VisualEditor/Editing team
> https://wikimediafoundation.org/wiki/User:Krenair_(WMF)
>
>
>


-- 
Jaime Crespo


Re: [Wikitech-l] [Engineering] Long running tasks/scripts now included on [[wikitech:Deployments]]

2016-09-22 Thread Guillaume Lederrey
I assumed the same, but better to be explicit about those assumptions :)

On Thu, Sep 22, 2016 at 4:27 PM, Alex Monk wrote:
> I had been assuming that puppetised crons were not really relevant...



-- 
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
UTC+2 / CEST


Re: [Wikitech-l] [Engineering] Long running tasks/scripts now included on [[wikitech:Deployments]]

2016-09-22 Thread Guillaume Lederrey
Hello!

Increasing visibility sounds like a great idea! How far do we want to
go in that direction? In particular, I'm thinking of a few of the
crons we have for Cirrus. For example, we do have daily crons on
terbium that re-generate the suggester indices. Those can run for more
than 1 hour.

My understanding is that those kinds of crons should not be considered
scripts, but standard working parts of the system. Adding them will
probably generate more noise than useful information. Is this a
reasonable understanding?

Thanks!

   Guillaume



On Wed, Sep 21, 2016 at 12:29 AM, Greg Grossmeier wrote:
> In an effort to reduce surprises and potential mishaps it is now
> required to include any long running tasks in the deployment
> calendar[0].
>
> "Long running tasks" include any script that is run on production 'work
> machines' such as terbium and that lasts longer than ~1 hour. Think:
> migration and maintenance scripts.
>
> This was discussed and proposed in T144661[1].
>
> Best,
>
> Greg
>
> [0] https://wikitech.wikimedia.org/wiki/Deployments
> Relevant diff:
> https://wikitech.wikimedia.org/w/index.php?diff=850923&oldid=850244
> [1] https://phabricator.wikimedia.org/T144661
>



-- 
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
UTC+2 / CEST


Re: [Wikitech-l] [Engineering] Long running tasks/scripts now included on [[wikitech:Deployments]]

2016-09-20 Thread Greg Grossmeier

> How far ahead of time must entries be added?

I trust most people's best judgement here :)

But the best time to add an entry is as soon as you have an idea of when
you'll run the task. Obviously, if the task will impact others and/or
requires that no other deploys happen at the same time, it should be
scheduled further in advance (~1 week) than something that is low-risk and
will be completed in 2-3 hours.

-- 
| Greg Grossmeier     GPG: B2FA 27B1 F7EB D327 6B8E |
| Release Team Manager     A18D 1138 8E47 FAC8 1C7D |
