On 30.05.2018 13:01, Victor Stinner wrote:
Hi,

I would like to delegate the maintenance task "watch buildbots", since
I'm already very busy with many other maintenance tasks. I'm looking
for volunteers to handle incoming emails on buildbot-status. I already
started to explain to Pablo Galindo Salgado how to do that, but it
would be great to have at least two people doing this task. Otherwise,
Pablo wouldn't be able to take a holiday or just take a break for any
reason. Buildbots are evil beasts that require care every day.
Otherwise, they quickly turn red and become less useful :-(

It seems that the first blocker is that we have no explicit
documentation on how to deal with buildbots (the devguide
documentation is incomplete; it doesn't explain what I describe
below). Let me start with a few notes on how I watch buildbots.

I'm getting buildbot notifications on IRC (#python-dev on Freenode)
and on the buildbot-status mailing list:
https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/

When a buildbot fails, I look at the test logs and try to check whether
an issue has already been reported. For example, search for the test
method in the title (ex: "test_complex" for the test_complex() method).
If there is no result, search using the test filename (ex: "test_os" for
Lib/test/test_os.py). If there is still no result, repeat with full-text
searches ("All Text"). If you cannot find any open bug, create a new
one:

* The title should contain the test name, test method and the buildbot
name. Example: "test_posix: TestPosixSpawn fails on PPC64 Fedora
3.x".
* The description should contain the link to the buildbot failure. Try
to identify the useful parts of the test log and copy them into the
description.
* Fill the Python version field (ex: "3.8" for 3.x buildbots)
* Select at least the "Tests" Component. You may select additional
Components depending on the bug.

If a bug is already open, you may add a comment to mention that there
is a new failure: include at least the buildbot name and a link to
the failure.
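
The search order and the new-issue checklist above can be sketched as
follows. This is only an illustration under stated assumptions, not a
client for the real bug tracker: `search` is a hypothetical callable
standing in for the query you would run by hand, and the field names
"title" and "all_text" as well as both function names are invented for
this sketch.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical search function: (field, query) -> ids of open issues.
# "title" and "all_text" are illustrative field names, not a tracker API.
SearchFn = Callable[[str, str], List[int]]


def find_existing_issue(search: SearchFn, test_method: str,
                        test_file: str) -> Optional[int]:
    """Try the narrowest search first, then fall back to broader ones."""
    attempts = [
        ("title", test_method),     # ex: "test_complex"
        ("title", test_file),       # ex: "test_os"
        ("all_text", test_method),  # full-text search as a last resort
        ("all_text", test_file),
    ]
    for field, query in attempts:
        results = search(field, query)
        if results:
            return results[0]
    return None


def new_issue_fields(test_name: str, failing_test: str, builder: str,
                     version: str, failure_url: str,
                     log_excerpt: str) -> Dict[str, object]:
    """Collect the fields of a new report, following the checklist."""
    return {
        "title": f"{test_name}: {failing_test} fails on {builder}",
        "description": f"{failure_url}\n\n{log_excerpt}",
        "version": version,       # ex: "3.8" for a 3.x buildbot
        "components": ["Tests"],  # add more depending on the bug
    }
```

If `find_existing_issue()` returns an issue, comment on it; otherwise
file a new report with the fields from `new_issue_fields()`.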

And that's all! Simple, isn't it? At this stage, there is no need to
investigate the test failure.

To finish, reply to the failure notification on the mailing list with
a very short email: add a link to the existing or the freshly created
issue, maybe copy one line of the failure and/or the issue title.
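
For anyone who prefers to script that short reply, here is a minimal
sketch using the standard library `email` package. The function name
and its parameters are assumptions made for illustration; the sender,
recipient and the actual sending step are deliberately left to the
caller.

```python
from email.message import EmailMessage


def build_reply(original_subject: str, original_message_id: str,
                issue_url: str, issue_title: str,
                failure_line: str = "") -> EmailMessage:
    """Build a very short reply to a failure notification."""
    msg = EmailMessage()
    # Keep the thread subject, adding "Re:" only if it is missing.
    if original_subject.lower().startswith("re:"):
        msg["Subject"] = original_subject
    else:
        msg["Subject"] = f"Re: {original_subject}"
    # These two headers let mail clients thread the reply correctly.
    msg["In-Reply-To"] = original_message_id
    msg["References"] = original_message_id
    body_lines = [issue_url, issue_title]
    if failure_line:
        body_lines.append(failure_line)  # optional one-line excerpt
    msg.set_content("\n".join(body_lines))
    return msg
```

The caller still has to set "From" and "To" and send the message, for
example with `smtplib`.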

Recent bug example: https://bugs.python.org/issue33630

--

Later, you may want to analyze these failures, but I consider that
it's a different job (different "maintenance task"). If you don't feel
able to analyze the bug, you may try to find someone who knows more
than you about the failure.

For better bug reports, you can look at the [Changes] tab of a build
failure and try to identify which recent change introduced the
regression. This task requires following recent commits, because
sometimes the failure is old and the test just fails randomly
depending on network issues, system load, or anything else. Sometimes,
previous tests have side effects. Or the buildbot owner made a change
on the system. There are many different explanations, and it's hard to
write a complete list. It's really on a case-by-case basis.

Fortunately, it's now more common that a buildbot failure is obvious and
caused by a very specific recent change which can be found in the
[Changes] tab.
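
The idea of going back to the build where a builder first turned red
can be sketched like this. The `Build` record and `suspect_changes()`
are hypothetical names for illustration; the data would have to be
copied from the buildbot pages by hand or by some other means, since
no buildbot API is assumed here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Build:
    number: int
    passed: bool
    # Commit ids listed in the [Changes] tab of this build.
    changes: List[str] = field(default_factory=list)


def suspect_changes(builds: List[Build]) -> List[str]:
    """Return the changes of the first build in the current red streak."""
    ordered = sorted(builds, key=lambda b: b.number)
    first_red = None
    # Walk back from the newest build until the last green one.
    for build in reversed(ordered):
        if build.passed:
            break
        first_red = build
    return first_red.changes if first_red else []
```

The result is only a starting point: as noted above, a test may fail
randomly, so the first red build does not always contain the change
that caused the failure.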

--

If you are interested in helping me watch our CIs, please join
the python-build...@python.org mailing list! Introduce yourself and
explain how you plan to help. I may offer to mentor you and assist
you during the first weeks.

As I wrote, maybe a first step would be to write documentation on
how to deal with buildbots and/or update and complete the existing
documentation.

https://devguide.python.org/buildbots/

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/vano%40mail.mipt.ru

What's the big idea of separate buildbots anyway? I thought the purpose of CI is to test everything _before_ it breaks the main codebase. Then it's the job of the contributor rather than the maintainer to fix any breakages.

So, maybe making them driven by GitHub checks would be a better time investment. Especially since we got VSTS checks just recently, so whoever did that still knows how to interface with this GitHub machinery.

If the bots cancel the previous build when a new one for the same PR arrives, this will not lead to a significant load difference, because the number of actively developed PRs is stable and roughly equal to the number of merges, judging by the open/closed ticket dynamics.

--
Regards,
Ivan
