Re: freebsd-update 12.3 to 14.0RC1 takes 12-24 hours (block cloning regression)

2023-10-18 Thread Piotr P. Stefaniak

On 2023-10-17 09:40:37, Kevin Bowling wrote:


> The flash SLOG system took around 12 hours to complete freebsd-update
> from 13.2 to 14.0-RC1.  The system without the SLOG took nearly 24
> hours.  This was the result of ~50k patches, and ~10k files from
> freebsd-update and a very pathological 'install' command performance.
>
> I spoke with mjg about this and because my pools do not have block
> cloning enabled, copy_file_range turns into a massive pessimization in
> 'install'.


I reported on IRC what I think is the same issue, except in my case this
was on an MMC and took many days (I stopped paying attention after 3
days).

Piotr
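
To put the quoted run times in perspective, here is a rough back-of-the-envelope sketch. It assumes (this is not stated in the thread) that the ~50k patches and ~10k files can be treated as about 60k roughly equal units of work, which is only an approximation since the run time is not necessarily spread evenly.

```python
# Rough per-item cost implied by the numbers quoted in the thread.
# Assumption (not from the thread): ~50k patches + ~10k files are
# treated as ~60k roughly equal units of work.
ITEMS = 50_000 + 10_000


def seconds_per_item(hours: float, items: int = ITEMS) -> float:
    """Average wall-clock seconds per item for a run lasting `hours`."""
    return hours * 3600 / items


with_slog = seconds_per_item(12)     # flash SLOG system, around 12 hours
without_slog = seconds_per_item(24)  # system without SLOG, nearly 24 hours

print(f"with SLOG:    {with_slog:.2f} s per item")
print(f"without SLOG: {without_slog:.2f} s per item")
```

Under that assumption each item costs on the order of a second of wall-clock time, which is far more than a small file copy would normally take.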



compiling lang/gcc10 produces messages about signal 11 kills in /var/log/messages

2023-10-18 Thread Matthias Apitz


Hello salvad...@freebsd.org,

I'm writing to you as MAINTAINER of the port lang/gcc10.

When compiling the port lang/gcc10 with poudriere on a recent
14.0-CURRENT system with very recent ports from git, I see lines
like these in /var/log/messages:

Oct 18 17:44:45 jet kernel: pid 21011 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 17:45:17 jet kernel: pid 30102 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 18:17:33 jet kernel: pid 24168 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 18:17:53 jet kernel: pid 32905 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 18:32:42 jet kernel: pid 99948 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 18:33:02 jet kernel: pid 8837 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)

For background, see the forwarded mail below.

Any idea what causes this? The package itself builds fine.

Yours

matthias


- Forwarded message from Matthias Apitz  -

Date: Wed, 18 Oct 2023 18:19:51 +0200
From: Matthias Apitz 
To: freebsd-current@freebsd.org
Subject: Re: poudriere job && find jobs which received signal 11

On Wednesday, October 18, 2023 at 12:10:27 PM +0200, Alexander
Leidinger wrote:

> On 2023-10-18 09:54, Matthias Apitz wrote:
> > Hello,
> >
> > I'm compiling with poudriere on 14.0-CURRENT 1400094 amd64 "my" ports,
> > from git October 14, 2023. In the last two days 2229 packages were
> > produced fine; one job failed (p5-Gtk2-1.24993_3, which is known to
> > be broken).
> >
> > This morning I was looking for something in /var/log/messages and
> > noticed by chance that a few compilations failed yesterday:
> >
> > # grep 'signal 11' /var/log/messages | grep -v conftest
> > Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> > Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> > Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> > Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> > Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> >
> > As I said, none of the 2229 jobs failed because of this:
> >
> > # cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
> > # ls -C1  | wc -l
> > 2229
> > # grep -l 'build failure' *
> > p5-Gtk2-1.24993_3.log
> >
> > How is it possible that the make processes did not fail? The uid
>
> That can be part of configure runs which try to test some features.
>
> > 65534 is the one used by poudriere. Can I somehow use the jid 24 to find
> > the job which received the signal 11? Or is the time the only way to
>
> jid = jail ID, the first column in the output of "jls". If you have the
> ...

Thanks for the detailed explanation and hints. I did not log poudriere's
stdout; I only have the build logs of all 2229 jobs. I managed to
identify the 47 builds which were running between 10:00 and 13:00
(with some grep commands, discarding all builds which ended before
10:00 and then all which started after 13:00). I reran the build for
these 47 ports, one after the other, with only one builder. The culprit
seems to be lang/gcc10, which is still building as I type but has
already produced the messages again twice:

Oct 18 17:44:45 jet kernel: pid 21011 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 17:45:17 jet kernel: pid 30102 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)

...

Yours

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub


- End forwarded message -

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: poudriere job && find jobs which received signal 11

2023-10-18 Thread Matthias Apitz
On Wednesday, October 18, 2023 at 12:10:27 PM +0200, Alexander
Leidinger wrote:

> On 2023-10-18 09:54, Matthias Apitz wrote:
> > Hello,
> >
> > I'm compiling with poudriere on 14.0-CURRENT 1400094 amd64 "my" ports,
> > from git October 14, 2023. In the last two days 2229 packages were
> > produced fine; one job failed (p5-Gtk2-1.24993_3, which is known to
> > be broken).
> >
> > This morning I was looking for something in /var/log/messages and
> > noticed by chance that a few compilations failed yesterday:
> >
> > # grep 'signal 11' /var/log/messages | grep -v conftest
> > Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> > Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> > Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> > Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> > Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> >
> > As I said, none of the 2229 jobs failed because of this:
> >
> > # cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
> > # ls -C1  | wc -l
> > 2229
> > # grep -l 'build failure' *
> > p5-Gtk2-1.24993_3.log
> >
> > How is it possible that the make processes did not fail? The uid
>
> That can be part of configure runs which try to test some features.
>
> > 65534 is the one used by poudriere. Can I somehow use the jid 24 to find
> > the job which received the signal 11? Or is the time the only way to
>
> jid = jail ID, the first column in the output of "jls". If you have the
> ...

Thanks for the detailed explanation and hints. I did not log poudriere's
stdout; I only have the build logs of all 2229 jobs. I managed to
identify the 47 builds which were running between 10:00 and 13:00
(with some grep commands, discarding all builds which ended before
10:00 and then all which started after 13:00). I reran the build for
these 47 ports, one after the other, with only one builder. The culprit
seems to be lang/gcc10, which is still building as I type but has
already produced the messages again twice:

Oct 18 17:44:45 jet kernel: pid 21011 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)
Oct 18 17:45:17 jet kernel: pid 30102 (cc1plus), jid 169, uid 65534: exited on signal 11 (core dumped)

Will dig into its build log later ...
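
The time-window filtering described above (drop builds that ended before 10:00, then drop builds that started after 13:00) amounts to an interval-overlap test. A minimal sketch follows; the build names are reused from this thread purely as labels, and the start and end times are hypothetical, not taken from the real build logs.

```python
from datetime import datetime


def at(hhmm: str) -> datetime:
    """Build a timestamp on 2023-10-17 from an HH:MM string."""
    return datetime.strptime("2023-10-17 " + hhmm, "%Y-%m-%d %H:%M")


def overlaps(start, end, win_start, win_end) -> bool:
    """True if the build interval [start, end] intersects the window."""
    return end >= win_start and start <= win_end


def builds_in_window(builds, win_start, win_end):
    """Return the names of all builds that were running inside the window."""
    return [name for name, start, end in builds
            if overlaps(start, end, win_start, win_end)]


# Hypothetical (name, start, end) tuples for illustration only.
builds = [
    ("lang/gcc10", at("09:30"), at("12:45")),
    ("textproc/gsed", at("08:00"), at("08:05")),   # ended before 10:00
    ("devel/qt6-base", at("13:10"), at("14:00")),  # started after 13:00
    ("security/nss", at("10:58"), at("11:01")),
]

candidates = builds_in_window(builds, at("10:00"), at("13:00"))
print(candidates)
```

A build that started before the window and finished inside it, like the first entry, is kept, which is what makes a long-running port such as a compiler show up among the candidates.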

Yours

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: issue: poudriere jail update fails after recent changes around certctl

2023-10-18 Thread Dag-Erling Smørgrav
Alexander Leidinger  writes:
> If FreeBSD provides some certs as trusted (as part of
> e.g. installworld), and I have some of them listed in untrusted, I
> would not expect an error case, but a failsafe action of not trusting
> them and not complaining... am I doing something wrong?

No, this is definitely something we want to support.

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org



Re: odd bhyve guest message (Khelp module "ertt" can't unload)

2023-10-18 Thread Gary Jennejohn
On Wed, 18 Oct 2023 12:20:24 +0100
void  wrote:

> I'm seeing this message on exit from a 14.0-RC1 guest on a host
> running main-n265915 when it exits normally (via poweroff within
> the guest):
>
> pflog0: promiscuous mode disabled
> Waiting (max 60 seconds) for system process `vnlru' to stop... done
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining... 0 0 0 0 done
> All buffers synced.
> GEOM_ELI: Device vtbd0p2.eli destroyed.
> GEOM_ELI: Detached vtbd0p2.eli on last close.
> Uptime: 7h26m6s
> GEOM_ELI: Device vtbd0p3.eli destroyed.
> GEOM_ELI: Detached vtbd0p3.eli on last close.
>
> Khelp module "ertt" can't unload until its refcount drops from 18 to 0.
>
> acpi0: Powering system off
>
> Not sure it's indicative of a problem
> --
>

I also see the ertt message when I shut down my FreeBSD-15 system.  I don't
think it indicates a problem, and I've never seen a shutdown failure.

--
Gary Jennejohn



odd bhyve guest message (Khelp module "ertt" can't unload)

2023-10-18 Thread void

I'm seeing this message on exit from a 14.0-RC1 guest on a host
running main-n265915 when it exits normally (via poweroff within
the guest):

pflog0: promiscuous mode disabled
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 0 done
All buffers synced.
GEOM_ELI: Device vtbd0p2.eli destroyed.
GEOM_ELI: Detached vtbd0p2.eli on last close.
Uptime: 7h26m6s
GEOM_ELI: Device vtbd0p3.eli destroyed.
GEOM_ELI: Detached vtbd0p3.eli on last close.

Khelp module "ertt" can't unload until its refcount drops from 18 to 0.

acpi0: Powering system off

Not sure it's indicative of a problem
--



Re: poudriere job && find jobs which received signal 11

2023-10-18 Thread Alexander Leidinger

On 2023-10-18 09:54, Matthias Apitz wrote:

> Hello,
>
> I'm compiling with poudriere on 14.0-CURRENT 1400094 amd64 "my" ports,
> from git October 14, 2023. In the last two days 2229 packages were
> produced fine; one job failed (p5-Gtk2-1.24993_3, which is known to
> be broken).
>
> This morning I was looking for something in /var/log/messages and
> noticed by chance that a few compilations failed yesterday:
>
> # grep 'signal 11' /var/log/messages | grep -v conftest
> Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
> Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
>
> As I said, none of the 2229 jobs failed because of this:
>
> # cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
> # ls -C1  | wc -l
> 2229
> # grep -l 'build failure' *
> p5-Gtk2-1.24993_3.log
>
> How is it possible that the make processes did not fail? The uid

That can be part of configure runs which try to test some features.

> 65534 is the one used by poudriere. Can I somehow use the jid 24 to find
> the job which received the signal 11? Or is the time the only way to

jid = jail ID, the first column in the output of "jls". If you have the
poudriere runtime logs (where it lists which package it is processing
at the moment), you will see a number from 1 to the maximum number of
jails which run in parallel. This number is part of the hostname of the
jail. So if you still have the poudriere jails running, you can map the
jid to the hostname and the hostname to the builder number, and together
with the time you can see which package it was building at that time.
Unfortunately poudriere lists neither the hostname of the builder nor
the jid (feature request anyone?).


Example poudriere runtime log:
---snip---
[00:54:11] [03] [00:00:00] Building security/nss | nss-3.94
[00:56:46] [03] [00:02:35] Finished security/nss | nss-3.94: Success
[00:56:47] [03] [00:00:00] Building textproc/gsed | gsed-4.9
[00:57:41] [01] [00:06:18] Finished x11-toolkits/gtk30 | gtk3-3.24.34_1: Success
[00:57:42] [01] [00:00:00] Building devel/qt6-base | qt6-base-6.5.3
---snip---

While poudriere is running, jls reports this:
---snip---
# jls jid host.hostname
[...]
91 poudriere-bastille-default
92 poudriere-bastille-default
93 poudriere-bastille-default-job-01
94 poudriere-bastille-default-job-01
95 poudriere-bastille-default-job-02
96 poudriere-bastille-default-job-03
97 poudriere-bastille-default-job-02
98 poudriere-bastille-default-job-03
---snip---

So if we assume a coredump in jid 96 or 98, this means it happened in
builder 3. nss and gsed were built by poudriere builder number 3 (both
around 56 minutes after the start of poudriere), and gtk30 and qt6-base
by poudriere builder number 1.
If we further assume that the coredumps fall in the time range of 54 to
56 minutes after the poudriere start, the logs of nss may have a trace
of them (or not, if they were part of configure; then you would have to
repeat the configure run and check whether it generates similar
coredump messages).
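
The mapping described above can be sketched as follows. This is only an illustration, assuming the builder number is the trailing "-job-NN" part of the jail hostname and that the runtime log lines look like the example above; the sample data is copied from this message and the function names are made up for the sketch.

```python
import re

# jls output and poudriere runtime log lines copied from this message.
JLS_OUTPUT = """\
91 poudriere-bastille-default
92 poudriere-bastille-default
93 poudriere-bastille-default-job-01
94 poudriere-bastille-default-job-01
95 poudriere-bastille-default-job-02
96 poudriere-bastille-default-job-03
97 poudriere-bastille-default-job-02
98 poudriere-bastille-default-job-03
"""

POUDRIERE_LOG = """\
[00:54:11] [03] [00:00:00] Building security/nss | nss-3.94
[00:56:46] [03] [00:02:35] Finished security/nss | nss-3.94: Success
[00:56:47] [03] [00:00:00] Building textproc/gsed | gsed-4.9
[00:57:41] [01] [00:06:18] Finished x11-toolkits/gtk30 | gtk3-3.24.34_1: Success
[00:57:42] [01] [00:00:00] Building devel/qt6-base | qt6-base-6.5.3
"""

JOB_RE = re.compile(r"-job-(\d+)$")
LOG_RE = re.compile(
    r"^\[(\d\d:\d\d:\d\d)\] \[(\d+)\] \[\d\d:\d\d:\d\d\] Building (\S+)")


def jid_to_builder(jls_text):
    """Map jid -> builder number for jails whose hostname ends in -job-NN."""
    mapping = {}
    for line in jls_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip headers, "[...]" markers and blank lines
        match = JOB_RE.search(parts[1].strip())
        if match:
            mapping[int(parts[0])] = int(match.group(1))
    return mapping


def builds_started_by(log_text, builder):
    """Return (elapsed time, port origin) for each build a builder started."""
    started = []
    for line in log_text.splitlines():
        match = LOG_RE.match(line)
        if match and int(match.group(2)) == builder:
            started.append((match.group(1), match.group(3)))
    return started


builder = jid_to_builder(JLS_OUTPUT)[96]
print(builder, builds_started_by(POUDRIERE_LOG, builder))
```

The jid lookup only works while the jails still exist, so the jls output has to be captured during the poudriere run.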



> tell which of the 4 poudriere builders was running at that time?
> I'd like to rebuild the package to reproduce this.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




poudriere job && find jobs which received signal 11

2023-10-18 Thread Matthias Apitz


Hello,

I'm compiling with poudriere on 14.0-CURRENT 1400094 amd64 "my" ports,
from git October 14, 2023. In the last two days 2229 packages were
produced fine; one job failed (p5-Gtk2-1.24993_3, which is known to
be broken).

This morning I was looking for something in /var/log/messages and
noticed by chance that a few compilations failed yesterday:

# grep 'signal 11' /var/log/messages | grep -v conftest
Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)

As I said, none of the 2229 jobs failed because of this:

# cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
# ls -C1  | wc -l
2229
# grep -l 'build failure' *
p5-Gtk2-1.24993_3.log

How is it possible that the make processes did not fail? The uid
65534 is the one used by poudriere. Can I somehow use the jid 24 to find
the job which received the signal 11? Or is the time the only way to
tell which of the 4 poudriere builders was running at that time?
I'd like to rebuild the package to reproduce this.
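
For reference, the fields in those kernel messages (timestamp, pid, process name, jid, uid, signal) can be pulled apart with a small script, which makes it easier to group crashes by jail and time. This is a minimal sketch assuming each message sits on a single line in the format shown above; the function name is made up for the sketch and the conftest line in the sample is a hypothetical example of what `grep -v conftest` would drop.

```python
import re
from collections import Counter

SIG_RE = re.compile(
    r"^(?P<time>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"pid (?P<pid>\d+) \((?P<proc>.+?)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+): exited on signal (?P<sig>\d+)")


def crashes(lines, signal=11, skip="conftest"):
    """Parse kernel 'exited on signal' lines, like grep | grep -v does."""
    found = []
    for line in lines:
        if skip in line:
            continue  # mirrors: grep -v conftest
        match = SIG_RE.match(line)
        if match and int(match.group("sig")) == signal:
            found.append({
                "time": match.group("time"),
                "pid": int(match.group("pid")),
                "proc": match.group("proc"),
                "jid": int(match.group("jid")),
                "uid": int(match.group("uid")),
            })
    return found


SAMPLE = [
    "Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)",
    "Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)",
    # Hypothetical line, added only to show the conftest filter at work:
    "Oct 17 11:00:00 jet kernel: pid 11111 (conftest), jid 24, uid 65534: exited on signal 11 (core dumped)",
    "Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)",
]

hits = crashes(SAMPLE)
per_jid = Counter((hit["jid"], hit["proc"]) for hit in hits)
for hit in hits:
    print(hit["time"], hit["jid"], hit["proc"], hit["pid"])
print(per_jid)
```

To run it against the real log, the lines of /var/log/messages could be passed in place of SAMPLE.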

Thanks

matthias


-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub