[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-27 Thread STINNER Victor


STINNER Victor  added the comment:

The buildbot server migrated to a new machine and is now behind a load 
balancer. tcp/80 (buildbot web page, HTTP) and tcp/9020 (used by buildbot 
workers) are both behind the load balancer.

Maybe the load balancer closes TCP connections which are idle for 60 seconds?

Buildbot workers have a TCP keepalive option of 1 hour (3600 seconds) by 
default:
https://docs.buildbot.net/latest/manual/configuration/workers.html#master-worker-tcp-keepalive

Pablo told me that his worker uses a keepalive of 2 minutes (120 seconds).
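
If the 60-second idle timeout guessed above is correct, both keepalive values mentioned here would be too long to keep the connection open. Here is a minimal sketch of that comparison. The 60-second figure is only the unconfirmed guess from this comment, and the function name is made up for illustration.

```python
def keepalive_beats_idle_timeout(keepalive_s: float, idle_timeout_s: float) -> bool:
    """Return True if a keepalive is sent before the idle timeout expires."""
    return keepalive_s < idle_timeout_s


# Hypothesized load balancer idle timeout (a guess, not a measured value).
IDLE_TIMEOUT_S = 60

for label, keepalive_s in [("default", 3600), ("Pablo's worker", 120)]:
    ok = keepalive_beats_idle_timeout(keepalive_s, IDLE_TIMEOUT_S)
    status = "ok" if ok else "connection may be dropped"
    print(f"{label}: keepalive={keepalive_s}s -> {status}")
```

Under that assumption, only a keepalive shorter than 60 seconds would keep an otherwise idle connection alive.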

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-27 Thread STINNER Victor


STINNER Victor  added the comment:

On the worker (client) side, I see many "lost remote step" every 1 to 3 
minutes. Example with the PPC64LE Fedora Stable 
(cstratak-fedora-stable-ppc64le) worker:

2020-08-27 01:30:09-0400 [Broker,client] lost remote step
2020-08-27 01:31:57-0400 [Broker,client] lost remote step
2020-08-27 01:34:34-0400 [Broker,client] lost remote step
2020-08-27 01:36:29-0400 [Broker,client] lost remote step
2020-08-27 01:38:37-0400 [Broker,client] lost remote step
(...)
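
To check the "every 1 to 3 minutes" observation, the gaps between the log lines above can be computed with a short script. This is only a sketch: the function name is illustrative, and it assumes every line starts with a timestamp in the format shown.

```python
from datetime import datetime

LOG_LINES = [
    "2020-08-27 01:30:09-0400 [Broker,client] lost remote step",
    "2020-08-27 01:31:57-0400 [Broker,client] lost remote step",
    "2020-08-27 01:34:34-0400 [Broker,client] lost remote step",
    "2020-08-27 01:36:29-0400 [Broker,client] lost remote step",
    "2020-08-27 01:38:37-0400 [Broker,client] lost remote step",
]


def disconnect_gaps(lines):
    """Return the seconds elapsed between consecutive 'lost remote step' lines."""
    stamps = [
        # The timestamp is everything before the first " [".
        datetime.strptime(line.split(" [", 1)[0], "%Y-%m-%d %H:%M:%S%z")
        for line in lines
        if "lost remote step" in line
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(disconnect_gaps(LOG_LINES))  # [108, 157, 115, 128]
```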




[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-27 Thread STINNER Victor


STINNER Victor  added the comment:

> I have found a large number of un-removed files in /tmp.

Right. I found many /tmp/cc.XXX and /tmp/tmpX files. Around 20 GB of 
these files! Maybe passing "-pipe" to gcc/clang would avoid the 
/tmp/cc.XXX files when a build is interrupted. For example, I saw assembly 
files (.s) of around 20 MB.

I don't know what the /tmp/tmpX files are.

I'm disappointed that in 2020, buildbot has no safe way to ensure that all 
created files are removed at the end of a build. chroot, containers, etc. are 
efficient ways to ensure that everything is removed at the end of a build.
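
A lighter-weight option than a chroot or container is to give each build its own temporary directory through the TMPDIR environment variable, which gcc honors for its temporary files. The sketch below is not something buildbot does by itself, and the function name is made up. It also does not help if the wrapper process itself is killed, or if a tool writes to a hard-coded /tmp path.

```python
import os
import subprocess
import sys
import tempfile


def run_with_private_tmp(cmd):
    """Run cmd with TMPDIR pointing at a directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory(prefix="build-tmp-") as tmp:
        env = dict(os.environ, TMPDIR=tmp)
        result = subprocess.run(cmd, env=env)
    # The directory and anything left inside it are gone at this point,
    # even if the command failed.
    return result.returncode, tmp


# Stand-in for a build step: leave a file behind in the temporary directory.
leaky = (
    "import os, tempfile; "
    "open(os.path.join(tempfile.gettempdir(), 'cc.leftover.s'), 'w').close()"
)
returncode, tmp_path = run_with_private_tmp([sys.executable, "-c", leaky])
print(returncode, os.path.exists(tmp_path))
```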




[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-27 Thread David Edelsohn


David Edelsohn  added the comment:

I have found a large number of un-removed files in /tmp.  Things seem to 
function better with Buildbots running older 0.x "buildslave" as opposed to 
newer "buildbot-worker" instances.

--
nosy: +David.Edelsohn
title: Buildbot: workers detached every minute and "no space left on device" 
issue -> RHEL and fedora buildbots fail due to disk space error




[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread STINNER Victor


STINNER Victor  added the comment:

Statistics on the partitions that are the most full.

Fedora Rawhide x86-64 is ok:

/dev/mapper/vg_root_python--builder--rawhide.osci.io-root   14G  5,4G  7,6G  42% /
/dev/mapper/vg_root_python--builder--rawhide.osci.io-home   36G   24G   11G  70% /home

Fedora Stable x86-64 is ok:

/dev/mapper/vg_root_python--builder2--rawhide.osci.io-root   14G  7,7G  5,2G  60% /
/dev/mapper/vg_root_python--builder2--rawhide.osci.io-home   36G   23G   12G  67% /home

RHEL8 x86-64 is ok:

/dev/mapper/vg_root_python--builder--rhel8.osci.io-root   14G  3,5G  9,5G  27% /
/dev/mapper/vg_root_python--builder--rhel8.osci.io-home   36G  9,7G   24G  29% /home

RHEL7 x86-64 is ok:

/dev/mapper/vg_root_python--builder--rhel7.osci.io-root  7,6G  3,6G  3,7G  49% /
/dev/mapper/vg_root_python--builder--rhel7.osci.io-home   22G   15G  5,9G  71% /home

RHEL8 FIPS x86-64 is ok:

/dev/mapper/vg_root_python--builder--rhel8--fips.osci.io-root   15G  2.8G   12G  20% /
/dev/mapper/vg_root_python--builder--rhel8--fips.osci.io-home   34G  3.7G   29G  12% /home

Fedora Rawhide AArch64 is ok:

/dev/mapper/fedora-root   44G   26G   19G  58% /
tmpfs                     16G  436K   16G   1% /tmp

Fedora Stable AArch64 is ok:

/dev/mapper/fedora-root   44G   33G   11G  76% /
tmpfs                     16G  1,6G   15G  11% /tmp

RHEL7 AArch64 is ok:

/dev/mapper/rhel-root   44G   15G   30G  33% /

RHEL8 AArch64 had like 22 GB in /tmp, I removed them. It's now better:

* before: /dev/mapper/rhel-root   44G   41G  3.3G  93% /
* after: /dev/mapper/rhel-root   44G   16G   28G  36% /


Fedora Stable ppc64le /tmp contained 1 GB of temporary files. I removed them. 
Before:

/dev/mapper/fedora-root   45G   29G   17G  63% /
tmpfs                    4.0G  1.1G  3.0G  27% /tmp

Fedora Rawhide ppc64le is ok:

/dev/mapper/fedora-root   45G   27G   19G  59% /
tmpfs                    4.0G  384K  4.0G   1% /tmp

RHEL7 ppc64le is ok:

/dev/mapper/rhel-root   45G   19G   27G  42% /

RHEL8 ppc64le had 22 GB of old files in /tmp: removed, rebooted. Before:

/dev/mapper/rhel-root   45G   41G  4.2G  91% /
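
A check like the manual survey above could be scripted on each worker with the standard library. This is only a sketch: the function names and the 90% threshold are assumptions, and the percentage may differ slightly from the Use% column printed by df, which is computed from the used and available space rather than from the total size.

```python
import shutil


def percent_used(path):
    """Return the used share of the filesystem holding path, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def nearly_full(paths, threshold=90.0):
    """Return the paths whose filesystem is at or above threshold percent used."""
    return [p for p in paths if percent_used(p) >= threshold]


for mount_point in ("/", "/tmp"):
    try:
        print(f"{mount_point}: {percent_used(mount_point):.0f}% used")
    except OSError as exc:
        # The mount point may not exist on this machine.
        print(f"{mount_point}: {exc}")
```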




[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread STINNER Victor


STINNER Victor  added the comment:

python-builder-rawhide had its /tmp partition full of temporary "cc.XXX" 
files. Before: /tmp was full at 100% (3.9 GB). After sudo rm -f /tmp/cc*, only 
52 KB are used (1%).

I'm not sure why gcc/clang left so many temporary files :-/ There are many 
large (22 MB) assembly files (.s).
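
Before running a blanket rm, it can be safer to list only the compiler leftovers that are old enough not to belong to a build still in progress. The sketch below only reports such files and deletes nothing. The function name and the one-day age limit are assumptions chosen for illustration.

```python
import glob
import os
import time


def stale_files(pattern, min_age_s):
    """Return (path, size) pairs for regular files matching pattern
    whose last modification is at least min_age_s seconds old."""
    now = time.time()
    found = []
    for path in sorted(glob.glob(pattern)):
        try:
            if not os.path.isfile(path):
                continue
            st = os.stat(path)
        except FileNotFoundError:
            continue  # removed in the meantime, e.g. by a running compiler
        if now - st.st_mtime >= min_age_s:
            found.append((path, st.st_size))
    return found


# List only; nothing is deleted here.
leftovers = stale_files("/tmp/cc*", min_age_s=24 * 3600)
total_mb = sum(size for _, size in leftovers) / 1e6
print(f"{len(leftovers)} stale files, {total_mb:.1f} MB")
```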




[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread Charalampos Stratakis


Charalampos Stratakis  added the comment:

There were almost 10GB of remnant cc* files in /tmp from the compilers used, 
which I presume were also the temporary artifacts which remained there after 
the disconnects.

Cleaned those up and rebooted the RHEL8 x86_64 buildbot.




[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread Charalampos Stratakis


Charalampos Stratakis  added the comment:

There is an issue which I discovered after I returned from holidays: the 
buildbot-worker keeps getting disconnected from the master, so builds start and 
end abruptly, leaving some artifacts behind.

The next second it tries again with the same result, eventually filling the 
hard disk with the artifacts.

Might be due to an updated package, but I've yet to discover what the issue is.




[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread STINNER Victor


STINNER Victor  added the comment:

> It seems many of the RHEL and Fedora builds fail due to disk space

These workers have different owners and so need to reach different people. We 
should list all impacted workers.

> https://buildbot.python.org/all/#/builders/185/builds/2

AMD64 RHEL8 3.x is the worker: cstratak-RHEL8-x86_64.




[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread Karthikeyan Singaravelan


New submission from Karthikeyan Singaravelan :

It seems many of the RHEL and Fedora builds fail due to disk space

https://buildbot.python.org/all/#/builders/185/builds/2

./configure: line 2382: cannot create temp file for here-document: No space left on device
./configure: line 2394: cannot create temp file for here-document: No space left on device
./configure: line 2429: cannot create temp file for here-document: No space left on device
./configure: line 2591: cannot create temp file for here-document: No space left on device
./configure: line 2595: cannot create temp file for here-document: No space left on device
./configure: line 2599: cannot create temp file for here-document: No space left on device
./configure: line 2603: cannot create temp file for here-document: No space left on device
./configure: line 2607: cannot create temp file for here-document: No space left on device
./configure: line 2611: cannot create temp file for here-document: No space left on device

--
components: Tests
messages: 375925
nosy: cstratak, pablogsal, vstinner, xtreak
priority: normal
severity: normal
status: open
title: RHEL and fedora buildbots fail due to disk space error
type: behavior
versions: Python 3.10
