Re: Apache Beam 2.4.0 release process retrospective and automation possibilities

2018-03-23 Thread Romain Manni-Bucau
2018-03-23 9:52 GMT+01:00 Robert Bradshaw :

> To put this in context, this was a brain dump of some of the things I
> encountered while doing the release. Were I to do a release again, it would
> be a lot easier (though still not ideal).
>
> At the high level, rather than focusing on steps, I think it's more
> interesting to focus on why we need a human to do the release. IMHO, the
> role of a human is
>
> 1) Choose a commit. Any commit in master should do, if we had proper test
> and code hygiene.
> 2) Sign the release artifacts. (Frankly, I would be happy with a robot
> signing everything but the tag in github, if the keys could be properly
> maintained.)
> 3) Manage the email thread.
>
> To this end, I would like a process where one would propose a release
> candidate via a lightweight, fast tool, and jenkins (or some other system)
> would recognize the release branch and test, generate artifacts, push
> artifacts, and test against those artifacts (including, ideally, nexus and
> svn staging). One should be able to easily run this locally, but that
> shouldn't be needed. If a human needs to sign, there could be a script for
> one to download artifacts, sign them, and push the signatures. On success,
> an email body (with all the links and details) would be created, to be sent
> out by a human.
>
> More responses inline below (and thanks for the feedback!).
>
>
> On Thu, Mar 22, 2018 at 11:10 PM Romain Manni-Bucau 
> wrote:
>
>> Hi
>>
>> On 23 March 2018 at 04:29, "Alan Myrvold" wrote:
>>
>> Robert explained his experience with the release process as the release
>> engineer for 2.4.0, and we discussed the prototype shell script for
>> checking release progress in pull/4896.
>>
>> I'd like to help automate the release process, initially just checking
>> that all steps look ok, then automating all feasible steps, with the goal
>> of reducing the effort of the release engineer per release to less than 1
>> hour for creating the first RC.
>>
>> Overall, it is a large, scattered process. For someone who has done this
>> many times (like jb@ in the previous release), it is likely easy. Robert
>> was familiar with pypi, making that part easy for him. He was not as
>> familiar with the java release artifacts or nexus, making that more of a
>> challenge.
>>
>>- Several steps are not reversible, so it isn't a restartable process
>>if there are errors.
>>
>> Can we ensure they are done last? Maven introduced the deployAtEnd
>> feature, so I guess it could serve as a model here.
>>
>
> In retrospect, aside from pushing stuff, things weren't strictly
> irreversible. I got to be good friends with git clean, git checkout, and
> git reset. That last one, however, always feels wrong to use (and is
> error-prone).
>
> I also don't think I'll ever get used to the build system (mvn or gradle)
> editing code and creating commits, but perhaps that's needed with the way
> -SNAPSHOTS are special (though having to edit every file seems overkill).
>

It is normally fully automated and hidden behind mvn release:prepare
(gradle has an equivalent, but its design is a bit different, so it can feel a
bit more natural for some people). Did you do it manually?


>
>>- Several steps have high latency; it may take 30min of work before
>>prompting for a password.
>>
>> What prevents putting the password in settings.xml, or using an agent, so
>> as not to be prompted?
>>
>
> Foreknowledge :). I did end up using an agent. (Personally, I have some
> qualms putting my gpg password in a plaintext xml file). There were also
> gitbox prompts in some of the steps, and of course svn commit (though the
> svn steps weren't high latency). I can't remember if there were others.
>

Could that be worth a mention in the docs? Side note: on Linux you don't have
to store it in clear text.


>
>>- Problems with both his laptop and desktop; GCS wasn't working well
>>on the laptop preventing running tests. Maven extremely slow on his
>>desktop, but he discovered a workaround in his configuration. Would have
>>been nice to use jenkins for most steps instead of relying on
>>laptop/desktop configurations.
>>
> As JB mentioned, these were environmental issues. But it wasn't clear at
> the time (being new with the release process) and could have been elevated
> had I not had to do so many manual steps (including running the tests). I
> happened to get particularly unlucky with timing with one of them too.
>
>>
>>- Many of the steps ran the same tests over and over. Sometimes tests
>>were flaky, so needed to restart a long process due to a test that had
>>passed earlier now flaking.
>>
>> I wonder whether remote tests should be mocked *during* a release to avoid
>> that bad-luck effect.
>>
>
> We should be able to release from a known good commit, as verified by
> jenkins, and never run 

Re: Apache Beam 2.4.0 release process retrospective and automation possibilities

2018-03-23 Thread Robert Bradshaw
To put this in context, this was a brain dump of some of the things I
encountered while doing the release. Were I to do a release again, it would
be a lot easier (though still not ideal).

At the high level, rather than focusing on steps, I think it's more
interesting to focus on why we need a human to do the release. IMHO, the
role of a human is

1) Choose a commit. Any commit in master should do, if we had proper test
and code hygiene.
2) Sign the release artifacts. (Frankly, I would be happy with a robot
signing everything but the tag in github, if the keys could be properly
maintained.)
3) Manage the email thread.

To this end, I would like a process where one would propose a release
candidate via a lightweight, fast tool, and jenkins (or some other system)
would recognize the release branch and test, generate artifacts, push
artifacts, and test against those artifacts (including, ideally, nexus and
svn staging). One should be able to easily run this locally, but that
shouldn't be needed. If a human needs to sign, there could be a script for
one to download artifacts, sign them, and push the signatures. On success,
an email body (with all the links and details) would be created, to be sent
out by a human.
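
The "download artifacts, sign them, and push the signatures" script described above could start from something like the following. This is only a minimal sketch, not an existing Beam tool: the function name and the `.asc` naming are assumptions, and it only builds the `gpg` command lines rather than running them, so the release manager can review them first.

```python
def signing_commands(artifact_paths):
    """Build one detached-signature gpg command per artifact.

    Hypothetical helper: nothing is executed here; the caller decides
    whether to pass each command to subprocess.run().
    """
    commands = []
    for path in artifact_paths:
        if path.endswith(".asc"):
            continue  # already a signature, nothing to sign
        # ASCII-armored detached signature written next to the artifact.
        commands.append(
            ["gpg", "--armor", "--output", path + ".asc", "--detach-sign", path]
        )
    return commands
```

With a gpg agent running, executing these commands would not prompt for the passphrase on every artifact.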

More responses inline below (and thanks for the feedback!).


On Thu, Mar 22, 2018 at 11:10 PM Romain Manni-Bucau 
wrote:

> Hi
>
> On 23 March 2018 at 04:29, "Alan Myrvold" wrote:
>
> Robert explained his experience with the release process as the release
> engineer for 2.4.0, and we discussed the prototype shell script for
> checking release progress in pull/4896.
>
> I'd like to help automate the release process, initially just checking
> that all steps look ok, then automating all feasible steps, with the goal
> of reducing the effort of the release engineer per release to less than 1
> hour for creating the first RC.
>
> Overall, it is a large, scattered process. For someone who has done this
> many times (like jb@ in the previous release), it is likely easy. Robert
> was familiar with pypi, making that part easy for him. He was not as
> familiar with the java release artifacts or nexus, making that more of a
> challenge.
>
>- Several steps are not reversible, so it isn't a restartable process
>if there are errors.
>
> Can we ensure they are done last? Maven introduced the deployAtEnd feature,
> so I guess it could serve as a model here.
>

In retrospect, aside from pushing stuff, things weren't strictly
irreversible. I got to be good friends with git clean, git checkout, and
git reset. That last one, however, always feels wrong to use (and is
error-prone).

I also don't think I'll ever get used to the build system (mvn or gradle)
editing code and creating commits, but perhaps that's needed with the way
-SNAPSHOTS are special (though having to edit every file seems overkill).

>
>- Several steps have high latency; it may take 30min of work before
>prompting for a password.
>
> What prevents putting the password in settings.xml, or using an agent, so
> as not to be prompted?
>

Foreknowledge :). I did end up using an agent. (Personally, I have some
qualms putting my gpg password in a plaintext xml file). There were also
gitbox prompts in some of the steps, and of course svn commit (though the
svn steps weren't high latency). I can't remember if there were others.

>
>- Problems with both his laptop and desktop; GCS wasn't working well
>on the laptop preventing running tests. Maven extremely slow on his
>desktop, but he discovered a workaround in his configuration. Would have
>been nice to use jenkins for most steps instead of relying on
>laptop/desktop configurations.
>
As JB mentioned, these were environmental issues. But it wasn't clear at
the time (being new with the release process) and could have been elevated
had I not had to do so many manual steps (including running the tests). I
happened to get particularly unlucky with timing with one of them too.

>
>- Many of the steps ran the same tests over and over. Sometimes tests
>were flaky, so needed to restart a long process due to a test that had
>passed earlier now flaking.
>
> I wonder whether remote tests should be mocked *during* a release to avoid
> that bad-luck effect.
>

We should be able to release from a known good commit, as verified by
jenkins, and never run tests again.


>
>- Robert was new to Nexus, so setting up permissions and navigating
>the UI was confusing.
>
> When that happens, don't hesitate to ping here.
>
>
>- Needed to rebuild the google cloud dataflow containers to get
>dataflow working with the RC, and that ended up being a painful process.
>The github sdk/python/container is part of the portability effort and
>should help eliminate googlers needing to do steps like this with each
>release because that container 

Re: Apache Beam 2.4.0 release process retrospective and automation possibilities

2018-03-23 Thread Jean-Baptiste Onofré
Hi Alan,

some feedback inline:

>   * Several steps have high latency; it may take 30min of work before
> prompting for a password.

Using a GPG agent or  in .m2/settings.xml works fine. It's what I do in all
my Apache releases.

>   * Problems with both his laptop and desktop; GCS wasn't working well on the
> laptop preventing running tests. Maven extremely slow on his desktop, but
> he discovered a workaround in his configuration. Would have been nice to
> use jenkins for most steps instead of relying on laptop/desktop
> configurations.

Not sure there, the release has to be done by the release manager IMHO. So,
that's an environment issue, not a release issue.

>   * Many of the steps ran the same tests over and over. Sometimes tests were
> flaky, so needed to restart a long process due to a test that had passed
> earlier now flaking.

Good point, there are definitely improvements to make on the tests (maybe a
better split between unit tests and integration tests).

>   * Robert was new to Nexus, so setting up permissions and navigating the UI
> was confusing.

Not a release concern IMHO, Nexus is straightforward for a release with a
staging repo. I think it's well explained in the release guide.

>   * Needed to rebuild the google cloud dataflow containers to get dataflow
> working with the RC, and that ended up being a painful process. The github
> sdk/python/container is part of the portability effort and should help
> eliminate googlers needing to do steps like this with each release because
> that container can be built externally.

+1

>   * Automated release notes were not seen as valuable due to limitations in
> any automated documentation.

I guess you mean Release Notes available on Jira, so I agree, it's already
easily generated.

>   * Missed a step of changing the java/python version numbers but was able to
> fix that.

Changing version numbers where? The release plugin (in the prepare goal) already
changes the versions in the POMs (like mvn versions:set). I guess you are
talking about some other files? Maybe those files could use the project.version
from the pom?
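
Until such files derive their version from the pom, a release-checking script could at least detect the missed step. A minimal sketch, under stated assumptions: the two regular expressions and the `mismatched_versions` name are illustrative, and which files carry a version in Beam is not stated in this thread.

```python
import re

# Assumed patterns; the first <version> in a real pom may be the parent's.
PATTERNS = {
    "pom": re.compile(r"<version>([^<]+)</version>"),
    "python": re.compile(r"__version__\s*=\s*['\"]([^'\"]+)['\"]"),
}


def mismatched_versions(expected, sources):
    """Return names of sources whose first version string is not `expected`.

    `sources` is an iterable of (name, kind, text) tuples, where `kind`
    selects one of the PATTERNS above.
    """
    bad = []
    for name, kind, text in sources:
        match = PATTERNS[kind].search(text)
        if match is None or match.group(1) != expected:
            bad.append(name)
    return bad
```

An empty result means every checked file agrees with the version being released.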

>   * Some copy/paste errors when creating the voting emails.
>   * Many steps are not possible for non-committers.

And that's normal per Apache rules.

Regards
JB
-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Apache Beam 2.4.0 release process retrospective and automation possibilities

2018-03-23 Thread Romain Manni-Bucau
Hi

On 23 March 2018 at 04:29, "Alan Myrvold" wrote:

Robert explained his experience with the release process as the release engineer
for 2.4.0, and we discussed the prototype shell script for checking release
progress in pull/4896.

I'd like to help automate the release process, initially just checking that
all steps look ok, then automating all feasible steps, with the goal of
reducing the effort of the release engineer per release to less than 1 hour
for creating the first RC.

Overall, it is a large, scattered process. For someone who has done this
many times (like jb@ in the previous release), it is likely easy. Robert
was familiar with pypi, making that part easy for him. He was not as
familiar with the java release artifacts or nexus, making that more of a
challenge.

   - Several steps are not reversible, so it isn't a restartable process if
   there are errors.


Can we ensure they are done last? Maven introduced the deployAtEnd feature,
so I guess it could serve as a model here.
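
To illustrate the idea, here is a minimal sketch of a restartable step runner that defers irreversible steps to the end, in the spirit of deployAtEnd. Everything in it is hypothetical (the `run_release` name, the JSON state file, the step tuples); it is not how the release script in pull/4896 works.

```python
import json
import os


def run_release(steps, state_path):
    """Run steps in order, skipping those recorded as done in `state_path`.

    `steps` is a list of (name, func, reversible) tuples. Reversible steps
    run first; irreversible ones (pushes, deploys) are deferred to the end.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)

    # Stable sort: reversible steps keep their relative order and go first.
    ordered = sorted(steps, key=lambda step: not step[2])
    for name, func, _reversible in ordered:
        if name in done:
            continue  # completed in an earlier run
        func()  # an exception here leaves the state file intact
        done.append(name)
        with open(state_path, "w") as f:
            json.dump(done, f)
    return done
```

If a step fails, rerunning with the same state file resumes at the failed step instead of starting over.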


   - Several steps have high latency; it may take 30min of work before
   prompting for a password.


What prevents putting the password in settings.xml, or using an agent, so
as not to be prompted?


   - Problems with both his laptop and desktop; GCS wasn't working well on
   the laptop preventing running tests. Maven extremely slow on his desktop,
   but he discovered a workaround in his configuration. Would have been nice
   to use jenkins for most steps instead of relying on laptop/desktop
   configurations.
   - Many of the steps ran the same tests over and over. Sometimes tests
   were flaky, so needed to restart a long process due to a test that had
   passed earlier now flaking.

I wonder whether remote tests should be mocked *during* a release to avoid
that bad-luck effect.


   - Robert was new to Nexus, so setting up permissions and navigating the
   UI was confusing.

When that happens, don't hesitate to ping here.


   - Needed to rebuild the google cloud dataflow containers to get dataflow
   working with the RC, and that ended up being a painful process. The github
   sdk/python/container is part of the portability effort and should help
   eliminate googlers needing to do steps like this with each release because
   that container can be built externally.

Is there a way to see this process or is it "closed"?


   - Automated release notes were not seen as valuable due to limitations
   in any automated documentation.


If needed, I have a script to do it from jira; I filter issues by the label
"changelog". The RM must review the tickets before the release in that case.
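
The filtering part of such a script can be sketched as follows. This is not the script mentioned above, only a hypothetical illustration: it assumes the issues have already been fetched from jira into plain dictionaries with `key`, `type`, `summary` and `labels` fields.

```python
def release_notes(issues, label="changelog"):
    """Return plain-text notes for issues carrying `label`, grouped by type."""
    groups = {}
    for issue in issues:
        if label in issue.get("labels", []):
            groups.setdefault(issue["type"], []).append(issue)

    lines = []
    for issue_type in sorted(groups):
        lines.append(issue_type + ":")
        # Keys are sorted as strings, which is enough for a sketch.
        for issue in sorted(groups[issue_type], key=lambda i: i["key"]):
            lines.append("  * [%s] %s" % (issue["key"], issue["summary"]))
    return "\n".join(lines)
```

Issues without the label are silently left out, which is why the review of tickets before the release matters.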


   - Missed a step of changing the java/python version numbers but was able
   to fix that.

Is it doable through maven/gradle filtering?


   - Some copy/paste errors when creating the voting emails.


At TomEE we had a CLI to do it, if you're interested.
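
As an illustration of how a small tool can avoid copy/paste errors, here is a minimal sketch that renders a vote email from a template and fails loudly when a field is missing. The template wording and field names are assumptions, not the actual Beam or TomEE vote email.

```python
from string import Template

# Hypothetical template; a real vote email carries more links and details.
VOTE_TEMPLATE = Template("""\
Subject: [VOTE] Release $version, release candidate #$rc

Please review and vote on release candidate #$rc for version $version.

* Commit: $commit
* Staging repository: $staging_repo
* Source release: $dist_url

[ ] +1, approve the release
[ ] -1, do not approve the release (please provide specific comments)
""")


def vote_email(**fields):
    """Render the email body; raises KeyError if a placeholder is missing."""
    return VOTE_TEMPLATE.substitute(**fields)
```

The body would still be reviewed and sent by a human, which keeps the email step manual as intended.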


   - Many steps are not possible for non-committers.

Sadly that's intended/expected, but you can help with jira review and snapshot
validation before the release.


   - The prototype shell script was seen as helpful, especially since it
   can be restarted. Some concerns over the maintainability of such a large
   shell script.

Move to groovy? Or is it just a size issue?

The steps that should not be automated, and need human involvement:

   1. All emails (propose release, ask for votes)
   2. Picking the commit to start the release
   3. Signing artifacts

Most everything else should be possible to automate, although
non-committers do not have access to logging into jira, nexus, or the
jenkins ui, making some of this tricky to automate for non-contributors.
Also not clear to us how nexus picks the sequential artifact suffix (1031)?


It is a sequence, like a db sequence; I don't recall if it is per user or per
repo (per repo from memory, but not 100% sure). If the goal is to automate the
close and the mailing, maybe check the sonatype plugin, which replaces the
deploy one (make sure to deactivate auto release), or just the nexus api, which
returns this value. A maven (gradle) extension should be writable too.
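
Once the staging repository id is known, extracting the suffix for the vote email is simple. A minimal sketch, assuming ids of the form "orgapachebeam-1031"; that format is an assumption not stated in this thread, and how the id is obtained from nexus is deliberately left out.

```python
import re

# Assumed id shape: a profile name, a dash, then the numeric sequence.
_STAGING_ID = re.compile(r"^(?P<profile>[A-Za-z0-9_.]+)-(?P<seq>\d+)$")


def staging_sequence(repo_id):
    """Return the numeric suffix of a staging repository id."""
    match = _STAGING_ID.match(repo_id)
    if match is None:
        raise ValueError("unexpected staging repository id: %r" % repo_id)
    return int(match.group("seq"))
```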


Next steps for me are to enhance the release-checking script, automate
feasible actions, and pair with the next release engineer to make this
smoother, especially if they are at google, but even if they are not.


+1, everything should be bound to release:perform except the vote process and
the post-vote tasks, which can themselves be automated (dist update, site
update, release staging, ...)


Alan