Re: poudriere job && find jobs which received signal 11
On Wednesday, October 18, 2023 at 12:10:27 PM +0200, Alexander Leidinger wrote:

> On 2023-10-18 09:54, Matthias Apitz wrote:
> > Hello,
> >
> > I'm compiling "my" ports with poudriere on 14.0-CURRENT 1400094 amd64,
> > from git as of October 14, 2023. In the last two days 2229 packages were
> > built fine; one job failed (p5-Gtk2-1.24993_3, which is known to be broken).
> >
> > This morning I was looking for something in /var/log/messages and
> > noticed by accident that yesterday a few compilations failed:
> >
> > # grep 'signal 11' /var/log/messages | grep -v conftest
> > Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> > Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534:
> > exited on signal 11 (core dumped)
> >
> > As I said, none of the 2229 jobs failed apart from the known one:
> >
> > # cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
> > # ls -C1 | wc -l
> > 2229
> > # grep -l 'build failure' *
> > p5-Gtk2-1.24993_3.log
> >
> > How is it possible that the make engines did not fail? The uid
>
> That can be part of configure runs which try to test some features.
>
> > 65534 is the one used by poudriere, can I use the jid 24 somehow to find
> > the job which received the signal 11? Or is the time the only way to
>
> jid = jail ID, the first column in the output of "jls". If you have the
> ...

Thanks for the detailed explanation and hints. I did not log the stdout
of poudriere; I only have the build logs of all 2229 jobs.

I managed to identify the 47 builds which were running at that time,
between 10:00 and 13:00 (with some grep commands, cutting away all builds
which ended before 10:00, and then all which started after 13:00). I ran
the build for the 47 ports again, one after the other, with only one
builder. The culprit seems to be lang/gcc10, which is still building as I
type but has already produced two more core dumps:

Oct 18 17:44:45 jet kernel: pid 21011 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 17:45:17 jet kernel: pid 30102 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)

Will dig into its build log later ...

Yours

	matthias

--
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
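The time-window filtering described in the message above (drop every build that ended before 10:00, then every build that started after 13:00) amounts to an interval-overlap test. The following Python sketch only illustrates that logic. The port names and times in `builds` are made-up sample data, and the step of extracting start and end times from the per-package build logs is deliberately left out, since it depends on the log contents.

```python
from datetime import datetime


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def overlapping_builds(builds, window_start, window_end):
    """Return the names of builds whose run time intersects the window.

    A build is dropped if it ended before the window started or
    started after the window ended; everything else is a candidate.
    """
    return [
        name
        for name, start, end in builds
        if end >= window_start and start <= window_end
    ]


# Hypothetical sample data: (port, build start, build end).
builds = [
    ("hypothetical/early", ts("2023-10-17 08:00"), ts("2023-10-17 08:05")),
    ("hypothetical/spanning", ts("2023-10-17 09:30"), ts("2023-10-17 12:40")),
    ("hypothetical/late", ts("2023-10-17 13:10"), ts("2023-10-17 13:20")),
]

candidates = overlapping_builds(
    builds, ts("2023-10-17 10:00"), ts("2023-10-17 13:00")
)
print(candidates)
```

Only the build that was actually running inside the window remains as a candidate for a one-by-one rebuild.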
Re: poudriere job && find jobs which received signal 11
On 2023-10-18 09:54, Matthias Apitz wrote:

> Hello,
>
> I'm compiling "my" ports with poudriere on 14.0-CURRENT 1400094 amd64,
> from git as of October 14, 2023. In the last two days 2229 packages were
> built fine; one job failed (p5-Gtk2-1.24993_3, which is known to be broken).
>
> This morning I was looking for something in /var/log/messages and
> noticed by accident that yesterday a few compilations failed:
>
> # grep 'signal 11' /var/log/messages | grep -v conftest
> Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534:
> exited on signal 11 (core dumped)
> Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534:
> exited on signal 11 (core dumped)
> Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534:
> exited on signal 11 (core dumped)
> Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534:
> exited on signal 11 (core dumped)
> Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534:
> exited on signal 11 (core dumped)
>
> As I said, none of the 2229 jobs failed apart from the known one:
>
> # cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
> # ls -C1 | wc -l
> 2229
> # grep -l 'build failure' *
> p5-Gtk2-1.24993_3.log
>
> How is it possible that the make engines did not fail? The uid

That can be part of configure runs which try to test some features.

> 65534 is the one used by poudriere, can I use the jid 24 somehow to find
> the job which received the signal 11? Or is the time the only way to

jid = jail ID, the first column in the output of "jls". If you have the
poudriere runtime logs (where it lists which package it is processing at
the moment), you will see a number from 1 to the maximum number of jails
which run in parallel. This number is part of the hostname of the jail.
So if you still have the poudriere jails running, you can make a mapping
from the jid to the hostname to the builder number, and together with the
time you can see which package it was building at that time.
Unfortunately poudriere doesn't list the hostname of the builder nor the
jid (feature request anyone?).

Example poudriere runtime log:
---snip---
[00:54:11] [03] [00:00:00] Building security/nss | nss-3.94
[00:56:46] [03] [00:02:35] Finished security/nss | nss-3.94: Success
[00:56:47] [03] [00:00:00] Building textproc/gsed | gsed-4.9
[00:57:41] [01] [00:06:18] Finished x11-toolkits/gtk30 | gtk3-3.24.34_1: Success
[00:57:42] [01] [00:00:00] Building devel/qt6-base | qt6-base-6.5.3
---snip---

While poudriere is running, jls reports this:
---snip---
# jls jid host.hostname
[...]
91 poudriere-bastille-default
92 poudriere-bastille-default
93 poudriere-bastille-default-job-01
94 poudriere-bastille-default-job-01
95 poudriere-bastille-default-job-02
96 poudriere-bastille-default-job-03
97 poudriere-bastille-default-job-02
98 poudriere-bastille-default-job-03
---snip---

So if we assume a coredump in jid 96 or 98, this means it was in builder
3. nss and gsed were built by poudriere builder number 3 (nss started
about 54 minutes and gsed about 56 minutes after the start of poudriere),
and gtk30 and qt6-base by poudriere builder number 1. If we assume
further that the coredumps are in the time range of 54 to 56 minutes
after the poudriere start, the logs of nss may have a trace of it (or
not: if it was part of configure, you would have to do the configure run
and check the messages to see if it generates similar coredumps).

> look, which of the 4 poudriere engines were running at this time? I'd
> like to rerun/reproduce the package again.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org : PGP 0x8F31830F9F2772BF
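The jid-to-builder-to-package lookup described in the message above can be sketched in Python, using the jls output and the runtime log excerpt quoted there as sample input. This is an illustration under assumptions, not a poudriere feature: the function names are invented here, the regular expressions are written only against the lines shown in the message, and the first timestamp in the runtime log is taken to be the time elapsed since poudriere started. A wall-clock time from /var/log/messages would first have to be converted using the poudriere start time.

```python
import re

# Sample input copied from the message above.
JLS_OUTPUT = """\
91 poudriere-bastille-default
92 poudriere-bastille-default
93 poudriere-bastille-default-job-01
94 poudriere-bastille-default-job-01
95 poudriere-bastille-default-job-02
96 poudriere-bastille-default-job-03
97 poudriere-bastille-default-job-02
98 poudriere-bastille-default-job-03
"""

RUNTIME_LOG = """\
[00:54:11] [03] [00:00:00] Building security/nss | nss-3.94
[00:56:46] [03] [00:02:35] Finished security/nss | nss-3.94: Success
[00:56:47] [03] [00:00:00] Building textproc/gsed | gsed-4.9
[00:57:41] [01] [00:06:18] Finished x11-toolkits/gtk30 | gtk3-3.24.34_1: Success
[00:57:42] [01] [00:00:00] Building devel/qt6-base | qt6-base-6.5.3
"""

JLS_LINE = re.compile(r"\s*(\d+)\s+\S*-job-(\d+)\s*$")
LOG_LINE = re.compile(
    r"\[(\d+:\d+:\d+)\] \[(\d+)\] \[\d+:\d+:\d+\] (Building|Finished) (\S+)"
)


def builder_for_jid(jls_text, jid):
    """Map a jail ID to the builder number embedded in the jail hostname."""
    for line in jls_text.splitlines():
        match = JLS_LINE.match(line)
        if match and int(match.group(1)) == jid:
            return int(match.group(2))
    return None  # not a builder jail, or jid not listed


def to_seconds(hms):
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def port_at(log_text, builder, elapsed):
    """Return the port a builder was working on at the given elapsed time."""
    target = to_seconds(elapsed)
    current = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if not match or int(match.group(2)) != builder:
            continue
        if to_seconds(match.group(1)) > target:
            break  # log is chronological; later entries are irrelevant
        current = match.group(4) if match.group(3) == "Building" else None
    return current


builder = builder_for_jid(JLS_OUTPUT, 96)
print(builder, port_at(RUNTIME_LOG, builder, "00:55:00"))
```

With the sample input, jid 96 resolves to builder 3, and at 55 minutes after the start that builder was working on security/nss, matching the reasoning in the message.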
poudriere job && find jobs which received signal 11
Hello,

I'm compiling "my" ports with poudriere on 14.0-CURRENT 1400094 amd64,
from git as of October 14, 2023. In the last two days 2229 packages were
built fine; one job failed (p5-Gtk2-1.24993_3, which is known to be broken).

This morning I was looking for something in /var/log/messages and noticed
by accident that yesterday a few compilations failed:

# grep 'signal 11' /var/log/messages | grep -v conftest
Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)

As I said, none of the 2229 jobs failed apart from the known one:

# cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
# ls -C1 | wc -l
2229
# grep -l 'build failure' *
p5-Gtk2-1.24993_3.log

How is it possible that the make engines did not fail? The uid 65534 is
the one used by poudriere. Can I use the jid 24 somehow to find the job
which received the signal 11? Or is the time the only way to see which of
the 4 poudriere engines were running at this time? I'd like to rerun the
package build to reproduce the problem.

Thanks

	matthias

--
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
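For anyone who wants the crash records in structured form rather than as grep output, the kernel messages quoted above can be parsed with a short Python sketch. It mirrors the `grep 'signal 11' ... | grep -v conftest` pipeline from the message. The regular expression is an assumption derived only from the five lines shown there, and the extra conftest line in the sample is hypothetical, added to show the filter at work.

```python
import re
from collections import Counter

# Kernel messages copied from the message above, plus one hypothetical
# conftest line (last entry) to demonstrate the exclusion filter.
MESSAGES = """\
Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:40:00 jet kernel: pid 11111 (conftest), jid 24, uid 65534: exited on signal 11 (core dumped)
"""

CRASH_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"pid (?P<pid>\d+) \((?P<cmd>[^)]+)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+): exited on signal (?P<signal>\d+)"
)


def crashes(text, signal=11, exclude="conftest"):
    """Return one dict per matching crash line, skipping excluded lines."""
    found = []
    for line in text.splitlines():
        if exclude in line:
            continue  # same effect as `grep -v conftest`
        match = CRASH_LINE.match(line)
        if match and int(match.group("signal")) == signal:
            found.append(match.groupdict())
    return found


records = crashes(MESSAGES)
per_jail = Counter((r["jid"], r["cmd"]) for r in records)
print(per_jail)
```

Grouping by jid and command makes it easy to see that all five crashes came from cc1plus in the same jail.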