STINNER Victor <[email protected]> added the comment:
Charris, Pablo and I identified that TCP connections are closed by the load
balancer on some buildbot workers.
When the "buildbot.python.org" host name is used, TCP connections (tcp port
9020) go through a load balancer.
Ernest exposed the TCP port 9020 directly to the Internet (without the load
balancer) using a new host name: "buildbot-api.python.org".
Buildbot workers should be updated to use "buildbot-api.python.org". I also
suggest using a keepalive of 60 seconds rather than 600 seconds.
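For reference, the relevant settings in a worker's buildbot.tac would look roughly like this after the change. This is only an excerpt, assuming the layout of a standard generated buildbot.tac; the "port" line is my assumption based on the TCP port mentioned above, and all other settings are omitted:

```python
# Excerpt of buildbot.tac after the change (other settings omitted).
# The "port" name and value are assumed from the TCP port 9020 mentioned above.
buildmaster_host = 'buildbot-api.python.org'  # direct access, no load balancer
port = 9020
keepalive = 60  # seconds; the previous value was 600
```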
If your worker was impacted by this issue, I strongly advise you to manually
clean up the temporary directory (/tmp). When a worker was disconnected, the
build was interrupted without removing temporary files. On some workers, we got
around 20 GB of temporary files in /tmp: "ccXXXX" files and "tmpXXXX" files. I
guess that some files come from the compiler and others from the Python
test suite.
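Before deleting anything, it may help to see how much space these leftover files take. The following is a minimal sketch, not something I ran on the workers: the function name leftover_size() is hypothetical, and it only counts regular files directly in the directory whose names start with "cc" or "tmp". It does not remove anything.

```python
import glob
import os


def leftover_size(tmp_dir="/tmp", prefixes=("cc", "tmp")):
    """Return (paths, total_bytes) for regular files in tmp_dir whose
    names start with one of the given prefixes. Nothing is deleted."""
    paths = []
    total = 0
    for prefix in prefixes:
        pattern = os.path.join(glob.escape(tmp_dir), prefix + "*")
        for path in glob.glob(pattern):
            # Skip directories and symbolic links: only plain files count.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
            paths.append(path)
    return sorted(paths), total


if __name__ == "__main__":
    found, size = leftover_size()
    print(f"{len(found)} files, {size / 2**30:.2f} GiB")
```

Stop the worker first, so that files belonging to a running build are not counted as leftovers.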
I updated the buildbot client configuration of the 9 workers operated by Red
Hat:
Fedora Rawhide x86-64
Fedora Stable x86-64
RHEL8 x86-64
RHEL7 x86-64
RHEL8 FIPS x86-64
Fedora Rawhide AArch64
Fedora Stable AArch64
RHEL 8 ppc64le
RHEL 7 ppc64le
On our workers, I used the following commands:
systemctl stop buildbot-worker.service
du -sh /tmp; rm -f /tmp/{cc,tmp}*; du -sh /tmp
sed -i -e "s/buildmaster_host = 'buildbot.python.org'/buildmaster_host = 'buildbot-api.python.org'/;s/keepalive = .*/keepalive = 60/" /home/buildbot/buildarea/buildbot.tac
grep -E '(host|keepalive) =' /home/buildbot/buildarea/buildbot.tac
systemctl start buildbot-worker.service
systemctl status buildbot-worker.service
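If sed is not convenient on a worker, the same two substitutions can be sketched in Python. This is only an illustration of what the sed expression above does: the function name update_tac() is hypothetical, it works on the file content as a string, and reading and writing buildbot.tac is left to the caller.

```python
import re


def update_tac(text, host="buildbot-api.python.org", keepalive=60):
    """Apply the same two substitutions as the sed command:
    switch the build master host name and set the keepalive."""
    text = text.replace(
        "buildmaster_host = 'buildbot.python.org'",
        f"buildmaster_host = '{host}'",
    )
    # ".*" stops at the end of the line, like in the sed expression.
    text = re.sub(r"keepalive = .*", f"keepalive = {keepalive}", text)
    return text
```

A possible usage is to read /home/buildbot/buildarea/buildbot.tac, pass its content to update_tac(), and write the result back while the worker service is stopped.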
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41642>
_______________________________________