Re: bsdinstall wifi setup is broken on CURRENT
On 18/05/24 11:33, Alfonso S. Siciliano wrote:
> On 5/16/24 20:40, Renato Botelho wrote:
>> I saw some users on a .br group complaining that bsdinstall was failing to set up the wifi network on 15.0 snapshots, and I tried it myself. I was able to reproduce the problem and also noticed another one.
>
> Thank you for your report. The video is highly appreciated; it makes the problem quick and easy to understand.
>
>> I noticed the Network Selection screen only shows one line; it's not pleasant to navigate through items this way. On 14.1-BETA2 it shows multiple lines, so it seems to be a regression.
>
> Problem 1. Looking at wlanconfig, it seems related to $height $width $rows for the selection menu. Could you please open a PR and add me to it, so we can test and solve it?
>
>> The problem users reported was: after selecting the desired network, it just starts over instead of asking for the password. I made a video [1] showing the problem.
>
> Problem 2. I know about this --mixedform issue; my last import, 2 days ago, should solve it: a6d8be451f62d425b71a4874f7d4e133b9fb393c. You could try the latest main snapshot (yesterday, 17 May); please let me know of any problem.

I confirmed it is fixed with bsddialog 1.0.2, but I found another issue while testing. Instead of the password, it was adding the SSID to the psk field of wpa_supplicant.conf.

I've created the following review to address that: https://reviews.freebsd.org/D45344

Thanks!
--
Renato Botelho
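For illustration only: in a wpa_supplicant.conf network block, the SSID belongs in the ssid field and the passphrase in the psk field. The sketch below shows that mapping with a hypothetical helper. The function name wpa_network_block and its arguments are assumptions made for this example; it is not code from the bsdinstall wlanconfig script or from D45344, and it does not escape quote characters embedded in either value.

```sh
#!/bin/sh
# Hypothetical helper (not taken from bsdinstall): print a
# wpa_supplicant.conf network block for a WPA-PSK network.
#   $1 = SSID
#   $2 = passphrase
wpa_network_block() {
    ssid=$1
    psk=$2
    printf 'network={\n'
    # The SSID goes into ssid= and the passphrase into psk=;
    # swapping the two is the kind of bug described above.
    printf '\tssid="%s"\n' "$ssid"
    printf '\tpsk="%s"\n' "$psk"
    printf '}\n'
}

# Example call (values are placeholders):
#   wpa_network_block "ExampleNet" "example-passphrase"
```

Keeping the two values in separately named variables makes an accidental swap easier to spot in review.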
Re: bsdinstall wifi setup is broken on CURRENT
On 18/05/24 11:33, Alfonso S. Siciliano wrote:
> On 5/16/24 20:40, Renato Botelho wrote:
>> I saw some users on a .br group complaining that bsdinstall was failing to set up the wifi network on 15.0 snapshots, and I tried it myself. I was able to reproduce the problem and also noticed another one.
>
> Thank you for your report. The video is highly appreciated; it makes the problem quick and easy to understand.
>
>> I noticed the Network Selection screen only shows one line; it's not pleasant to navigate through items this way. On 14.1-BETA2 it shows multiple lines, so it seems to be a regression.
>
> Problem 1. Looking at wlanconfig, it seems related to $height $width $rows for the selection menu. Could you please open a PR and add me to it, so we can test and solve it?

I've fixed it locally and submitted a fix for review: https://reviews.freebsd.org/D45271

>> The problem users reported was: after selecting the desired network, it just starts over instead of asking for the password. I made a video [1] showing the problem.
>
> Problem 2. I know about this --mixedform issue; my last import, 2 days ago, should solve it: a6d8be451f62d425b71a4874f7d4e133b9fb393c. You could try the latest main snapshot (yesterday, 17 May); please let me know of any problem.

The last snapshot still contains bsddialog 1.0, so I'll wait for the next one and give it a try.

>> Jessica, I've cc'd you because git shows you were the last person making changes in this area. If it's not related and I made a mistake, just ignore me.
>>
>> [1] https://youtube.com/shorts/Gmeckokw2a0
>
> Again, thanks for the video.
>
> Best Regards,
> Alfonso

--
Renato Botelho
Re: bsdinstall wifi setup is broken on CURRENT
On 5/16/24 20:40, Renato Botelho wrote:
> I saw some users on a .br group complaining that bsdinstall was failing to set up the wifi network on 15.0 snapshots, and I tried it myself. I was able to reproduce the problem and also noticed another one.

Thank you for your report. The video is highly appreciated; it makes the problem quick and easy to understand.

> I noticed the Network Selection screen only shows one line; it's not pleasant to navigate through items this way. On 14.1-BETA2 it shows multiple lines, so it seems to be a regression.

Problem 1. Looking at wlanconfig, it seems related to $height $width $rows for the selection menu. Could you please open a PR and add me to it, so we can test and solve it?

> The problem users reported was: after selecting the desired network, it just starts over instead of asking for the password. I made a video [1] showing the problem.

Problem 2. I know about this --mixedform issue; my last import, 2 days ago, should solve it: a6d8be451f62d425b71a4874f7d4e133b9fb393c. You could try the latest main snapshot (yesterday, 17 May); please let me know of any problem.

> Jessica, I've cc'd you because git shows you were the last person making changes in this area. If it's not related and I made a mistake, just ignore me.
>
> [1] https://youtube.com/shorts/Gmeckokw2a0

Again, thanks for the video.

Best Regards,
Alfonso
Re: bsdinstall wifi setup is broken on CURRENT
Renato Botelho writes:
> I'm not sure about a good way to test it on a running system instead.

Update your source tree, build and install world, run `sudo bsdconfig`, scroll down and select “Network Management”, then select “Wireless Networks”.

DES
--
Dag-Erling Smørgrav - d...@freebsd.org
Re: bsdinstall wifi setup is broken on CURRENT
Hello Renato,

I will give it a try this weekend with bhyve, since I have passthru for an iwlwifi card.

Cheers,

Renato Botelho wrote (Thursday, 16/05/2024 at 19:56):

> On 16/05/24 15:47, Jessica Clarke wrote:
> > On 16 May 2024, at 19:40, Renato Botelho wrote:
> >>
> >> I saw some users on a .br group complaining bsdinstall was failing to setup wifi network on 15.0 snapshots and tried it myself. I was able to reproduce the problem and also noticed another one.
> >>
> >> I noticed Network Selection screen only shows one line, it's not beautiful to navigate through items this way. On 14.1-BETA2 it shows multiple lines so it seems to be a regression.
> >>
> >> The problem users reported was: after selecting desired network it just starts over instead of asking for password. I made a video [1] showing the problem.
> >>
> >> Jessica, I've cc'd you because git shows you were the last person making changes in this area. If it's not related and I made a mistake, just ignore me.
> >
> > Hi Renato,
> > I touched the code that lets you select the wireless interface in the first place, but not the script that then gets called to set it up and is responsible for the dialogs you see. Given the behaviour, I wonder if this is what today’s import of bsddialog[1] fixes? From reading the script the next dialog uses --mixedform, and restarts the script on error, which it looks like is what you observe.
>
> Thanks for pointing that out, Jessica. I'll wait for the next 15 snapshot and will check.
>
> I'm not sure about a good way to test it on a running system instead.
>
> --
> Renato Botelho

--
Nuno Teixeira
FreeBSD UNIX: Web: https://FreeBSD.org
Re: bsdinstall wifi setup is broken on CURRENT
On 16/05/24 15:47, Jessica Clarke wrote: On 16 May 2024, at 19:40, Renato Botelho wrote: I saw some users on a .br group complaining bsdinstall was failing to setup wifi network on 15.0 snapshots and tried it myself. I was able to reproduce the problem and also noticed another one. I noticed Network Selection screen only shows one line, it's not beautiful to navigate through items this way. On 14.1-BETA2 it shows multiple lines so it seems to be a regression. The problem users reported was: after selecting desired network it just starts over instead of asking for password. I made a video [1] showing the problem. Jessica, I've cc'd you because git shows you were the last person making changes in this area. If it's not related and I made a mistake, just ignore me. Hi Renato, I touched the code that lets you select the wireless interface in the first place, but not the script that then gets called to set it up and is responsible for the dialogs you see. Given the behaviour, I wonder if this is what today’s import of bsddialog[1] fixes? From reading the script the next dialog uses --mixedform, and restarts the script on error, which it looks like is what you observe. Thanks for pointing that out, Jessica. I'll wait for the next 15 snapshot and will check. I'm not sure about a good way to test it on a running system instead. -- Renato Botelho
Re: bsdinstall wifi setup is broken on CURRENT
On 16 May 2024, at 19:40, Renato Botelho wrote: > > I saw some users on a .br group complaining bsdinstall was failing to setup > wifi network on 15.0 snapshots and tried it myself. I was able to reproduce > the problem and also noticed another one. > > I noticed Network Selection screen only shows one line, it's not beautiful to > navigate through items this way. On 14.1-BETA2 it shows multiple lines so it > seems to be a regression. > > The problem users reported was: after selecting desired network it just > starts over instead of asking for password. I made a video [1] showing the > problem. > > Jessica, I've cc'd you because git shows you were the last person making > changes in this area. If it's not related and I made a mistake, just ignore > me. Hi Renato, I touched the code that lets you select the wireless interface in the first place, but not the script that then gets called to set it up and is responsible for the dialogs you see. Given the behaviour, I wonder if this is what today’s import of bsddialog[1] fixes? From reading the script the next dialog uses --mixedform, and restarts the script on error, which it looks like is what you observe. Jess [1] https://cgit.freebsd.org/src/commit/?id=a6d8be451f62d425b71a4874f7d4e133b9fb393c > [1] https://youtube.com/shorts/Gmeckokw2a0 > -- > Renato Botelho
bsdinstall wifi setup is broken on CURRENT
I saw some users on a .br group complaining bsdinstall was failing to setup wifi network on 15.0 snapshots and tried it myself. I was able to reproduce the problem and also noticed another one. I noticed Network Selection screen only shows one line, it's not beautiful to navigate through items this way. On 14.1-BETA2 it shows multiple lines so it seems to be a regression. The problem users reported was: after selecting desired network it just starts over instead of asking for password. I made a video [1] showing the problem. Jessica, I've cc'd you because git shows you were the last person making changes in this area. If it's not related and I made a mistake, just ignore me. [1] https://youtube.com/shorts/Gmeckokw2a0 -- Renato Botelho
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, . . .] [Update to Host OSVERSION 1500018 did not help]
On 2024-05-08 23:53:57 (+0800), Mark Millard wrote: On Apr 29, 2024, at 20:16, Mark Millard wrote: On Apr 29, 2024, at 20:11, Mark Millard wrote: On Apr 29, 2024, at 19:54, Mark Millard wrote: On Apr 28, 2024, at 18:06, Philip Paeps wrote: On 2024-04-18 23:14:22 (+0800), Mark Millard wrote: On Apr 18, 2024, at 08:02, Mark Millard wrote: void wrote on Date: Thu, 18 Apr 2024 14:08:36 UTC : Not sure where to post this.. The last bulk build for arm64 appears to have happened around mid-March on ampere2. Is it broken? main-armv7 building is broken and the last completed build was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It gets stuck making no progress until manually forced to stop, which leads to huge elapsed times for the incomplete builds: [...] My guess is that FreeBSD has something that broken after bd45bbe440 that was broken as of f5f08e41aa and was still broken at 75464941dc . One thing of possible note: Failing . . . Host OSVERSION: 156 Jail OSVERSION: 1500014 I have finished a package builder refresh this morning. All our builder hosts (except PowerPC - I don't touch those) are now on main-n269671-feabaf8d5389 (OSVERSION 1500018). ampere1 successfully finished its 140releng-armv7-quarterly build, so it looks like the problem with stuck builds was limited to ampere2 building main-armv7. I'll keep a close eye on this one when it starts its next build. I see that main-armv7 started. It queued only 31935 instead of the prior 34528 (or more): it is doing an incremental build instead of a full build. For example, pkg was not built but instead the prior build is in use. Thus bad results from the prior build might be involved in this new build. I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch build for the purposes of the main-armv7 test. Actually the test is not going to previde the information we are after as things are. giflib-5.2.2 failed to build, which leads to devel/doxygen being skipped. 
>>> devel/doxygen was the first one to hang up in the prior 2 failing attempts, if I remember right.
>>>
>>> giflib-5.2.2 also causes graphics/graphviz to be skipped. graphics/graphviz was installed just before the hangup in all of the example hangups. So the context will not be replicated.
>>>
>>> We need graphics/giflib to build to actually do the test.
>>
>> Looks like:
>>
>> https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f
>>
>> is the fix for the graphics/giflib build failure.
>
> Well, main-armv7 is building again and things are still getting stuck. So much for my idea.
>
> For reference, I list the ones that have been running for over 10 hours so far:
>
> doxygen-1.9.6_1,2    build-depends  13:03:54
> py39-pydot-2.0.0     run-depends    12:24:04
> py39-pygraphviz-1.6  lib-depends    12:10:38
>
> It would likely be appropriate to get a copy of the output of "ps -alxdww". "procstat -k -k" usage and the like on stuck processes would probably be appropriate as well.
>
> Does anyone with appropriate investigative background have login access to ampere2 to take a look at what is getting stuck?

This is unfortunate. I'm sure I have the appropriate background, but I'm spread very thin!

I'll get as much information as I can about this machine while it's stuck, before I bounce it again.

I think it may be worth trying to build those ports in isolation on ref14-aarch64, to see what they're trying to do. I'll also set up a set of refX-armv7 jails on that machine.

Hopefully we can get to the bottom of this soon. This is a very tedious failure mode.

We could also try to put an older armv7 image on the builder jail on ampere2. Depending on whether we have a sufficiently old image, that will either be very straightforward, or a very deep rabbit hole.

Thanks again for keeping an eye on this. We really should have better monitoring for stuck builds than "Mark will tell us". :-)

Philip
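As a rough illustration of the monitoring idea, the sketch below flags jobs whose elapsed time exceeds a threshold. It assumes input lines of the form "package phase H:MM:SS", modeled on the list quoted above; that format, the function names elapsed_to_seconds and stuck_jobs, and the 10-hour default are assumptions for this example, not how poudriere or the pkg-status pages actually expose their data.

```sh
#!/bin/sh
# Convert an elapsed time of the form H:MM:SS (hours may have any
# number of digits, e.g. 651:21:56) to seconds.
elapsed_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Read "<package> <phase> <elapsed>" lines on stdin and print the
# package name and elapsed time of every job that has been running
# longer than the limit in hours ($1, default 10).
stuck_jobs() {
    limit_hours=${1:-10}
    awk -v limit="$limit_hours" '
        NF >= 3 {
            # The elapsed time is taken from the last field.
            n = split($NF, t, ":")
            if (n != 3) next
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (secs > limit * 3600) print $1, $NF
        }'
}

# Example (job_status.txt is a hypothetical file in the format above):
#   stuck_jobs 10 < job_status.txt
```

A fixed threshold is crude, since some ports legitimately build for many hours; it is only meant to show the shape of such a check.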
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, . . .] [Update to Host OSVERSION 1500018 did not help]
On Apr 29, 2024, at 20:16, Mark Millard wrote: > On Apr 29, 2024, at 20:11, Mark Millard wrote: > >> On Apr 29, 2024, at 19:54, Mark Millard wrote: >> >>> On Apr 28, 2024, at 18:06, Philip Paeps wrote: >>> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote: > On Apr 18, 2024, at 08:02, Mark Millard wrote: >> void wrote on >> Date: Thu, 18 Apr 2024 14:08:36 UTC : >> >>> Not sure where to post this.. >>> >>> The last bulk build for arm64 appears to have happened around >>> mid-March on ampere2. Is it broken? >> >> main-armv7 building is broken and the last completed build >> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It >> gets stuck making no progress until manually forced to stop, >> which leads to huge elapsed times for the incomplete builds: >> >> [...] >> >> My guess is that FreeBSD has something that broken after bd45bbe440 >> that was broken as of f5f08e41aa and was still broken at 75464941dc . >> > > One thing of possible note: > > Failing . . . > > Host OSVERSION: 156 > Jail OSVERSION: 1500014 I have finished a package builder refresh this morning. All our builder hosts (except PowerPC - I don't touch those) are now on main-n269671-feabaf8d5389 (OSVERSION 1500018). ampere1 successfully finished its 140releng-armv7-quarterly build, so it looks like the problem with stuck builds was limited to ampere2 building main-armv7. I'll keep a close eye on this one when it starts its next build. >>> >>> I see that main-armv7 started. >>> >>> It queued only 31935 instead of the prior 34528 (or more): it is doing an >>> incremental build instead of a full build. For example, pkg was not built >>> but instead the prior build is in use. Thus bad results from the prior >>> build might be involved in this new build. >>> >>> I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch >>> build for the purposes of the main-armv7 test. >> >> Actually the test is not going to previde the information we are >> after as things are. 
>>
>> giflib-5.2.2 failed to build, which leads to devel/doxygen being
>> skipped. devel/doxygen was the first one to hang up in the prior
>> 2 failing attempts, if I remember right.
>>
>> giflib-5.2.2 also causes graphics/graphviz to be skipped.
>> graphics/graphviz was installed just before the hangup in all of
>> the example hangups. So the context will not be replicated.
>>
>> We need graphics/giflib to build to actually do the test.
>
> Looks like:
>
> https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f
>
> is the fix for the graphics/giflib build failure.

Well, main-armv7 is building again and things are still getting stuck. So much for my idea.

For reference, I list the ones that have been running for over 10 hours so far:

doxygen-1.9.6_1,2    build-depends  13:03:54
py39-pydot-2.0.0     run-depends    12:24:04
py39-pygraphviz-1.6  lib-depends    12:10:38

It would likely be appropriate to get a copy of the output of "ps -alxdww". "procstat -k -k" usage and the like on stuck processes would probably be appropriate as well.

Does anyone with appropriate investigative background have login access to ampere2 to take a look at what is getting stuck?

===
Mark Millard
marklmi at yahoo.com
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On Apr 29, 2024, at 20:11, Mark Millard wrote: > On Apr 29, 2024, at 19:54, Mark Millard wrote: > >> On Apr 28, 2024, at 18:06, Philip Paeps wrote: >> >>> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote: On Apr 18, 2024, at 08:02, Mark Millard wrote: > void wrote on > Date: Thu, 18 Apr 2024 14:08:36 UTC : > >> Not sure where to post this.. >> >> The last bulk build for arm64 appears to have happened around >> mid-March on ampere2. Is it broken? > > main-armv7 building is broken and the last completed build > was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It > gets stuck making no progress until manually forced to stop, > which leads to huge elapsed times for the incomplete builds: > > [...] > > My guess is that FreeBSD has something that broken after bd45bbe440 > that was broken as of f5f08e41aa and was still broken at 75464941dc . > One thing of possible note: Failing . . . Host OSVERSION: 156 Jail OSVERSION: 1500014 >>> >>> I have finished a package builder refresh this morning. All our builder >>> hosts (except PowerPC - I don't touch those) are now on >>> main-n269671-feabaf8d5389 (OSVERSION 1500018). >>> >>> ampere1 successfully finished its 140releng-armv7-quarterly build, so it >>> looks like the problem with stuck builds was limited to ampere2 building >>> main-armv7. I'll keep a close eye on this one when it starts its next >>> build. >>> >> >> I see that main-armv7 started. >> >> It queued only 31935 instead of the prior 34528 (or more): it is doing an >> incremental build instead of a full build. For example, pkg was not built >> but instead the prior build is in use. Thus bad results from the prior >> build might be involved in this new build. >> >> I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch >> build for the purposes of the main-armv7 test. > > Actually the test is not going to previde the information we are > after as things are. > > giflib-5.2.2 failed to build, which leads to devel/doxygen being > skipped. 
> devel/doxygen was the first one to hang up in the prior
> 2 failing attempts, if I remember right.
>
> giflib-5.2.2 also causes graphics/graphviz to be skipped.
> graphics/graphviz was installed just before the hangup in all of
> the example hangups. So the context will not be replicated.
>
> We need graphics/giflib to build to actually do the test.

Looks like:

https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f

is the fix for the graphics/giflib build failure.

===
Mark Millard
marklmi at yahoo.com
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On Apr 29, 2024, at 19:54, Mark Millard wrote: > On Apr 28, 2024, at 18:06, Philip Paeps wrote: > >> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote: >>> On Apr 18, 2024, at 08:02, Mark Millard wrote: void wrote on Date: Thu, 18 Apr 2024 14:08:36 UTC : > Not sure where to post this.. > > The last bulk build for arm64 appears to have happened around > mid-March on ampere2. Is it broken? main-armv7 building is broken and the last completed build was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It gets stuck making no progress until manually forced to stop, which leads to huge elapsed times for the incomplete builds: [...] My guess is that FreeBSD has something that broken after bd45bbe440 that was broken as of f5f08e41aa and was still broken at 75464941dc . >>> >>> One thing of possible note: >>> >>> Failing . . . >>> >>> Host OSVERSION: 156 >>> Jail OSVERSION: 1500014 >> >> I have finished a package builder refresh this morning. All our builder >> hosts (except PowerPC - I don't touch those) are now on >> main-n269671-feabaf8d5389 (OSVERSION 1500018). >> >> ampere1 successfully finished its 140releng-armv7-quarterly build, so it >> looks like the problem with stuck builds was limited to ampere2 building >> main-armv7. I'll keep a close eye on this one when it starts its next build. >> > > I see that main-armv7 started. > > It queued only 31935 instead of the prior 34528 (or more): it is doing an > incremental build instead of a full build. For example, pkg was not built > but instead the prior build is in use. Thus bad results from the prior > build might be involved in this new build. > > I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch > build for the purposes of the main-armv7 test. Actually the test is not going to previde the information we are after as things are. giflib-5.2.2 failed to build, which leads to devel/doxygen being skipped. 
devel/doxygen was the first one to hang up in the prior 2 failing attempts, if I remember right.

giflib-5.2.2 also causes graphics/graphviz to be skipped. graphics/graphviz was installed just before the hangup in all of the example hangups. So the context will not be replicated.

We need graphics/giflib to build to actually do the test.

===
Mark Millard
marklmi at yahoo.com
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On Apr 28, 2024, at 18:06, Philip Paeps wrote: > On 2024-04-18 23:14:22 (+0800), Mark Millard wrote: >> On Apr 18, 2024, at 08:02, Mark Millard wrote: >>> void wrote on >>> Date: Thu, 18 Apr 2024 14:08:36 UTC : >>> Not sure where to post this.. The last bulk build for arm64 appears to have happened around mid-March on ampere2. Is it broken? >>> >>> main-armv7 building is broken and the last completed build >>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It >>> gets stuck making no progress until manually forced to stop, >>> which leads to huge elapsed times for the incomplete builds: >>> >>> [...] >>> >>> My guess is that FreeBSD has something that broken after bd45bbe440 >>> that was broken as of f5f08e41aa and was still broken at 75464941dc . >>> >> >> One thing of possible note: >> >> Failing . . . >> >> Host OSVERSION: 156 >> Jail OSVERSION: 1500014 > > I have finished a package builder refresh this morning. All our builder > hosts (except PowerPC - I don't touch those) are now on > main-n269671-feabaf8d5389 (OSVERSION 1500018). > > ampere1 successfully finished its 140releng-armv7-quarterly build, so it > looks like the problem with stuck builds was limited to ampere2 building > main-armv7. I'll keep a close eye on this one when it starts its next build. > I see that main-armv7 started. It queued only 31935 instead of the prior 34528 (or more): it is doing an incremental build instead of a full build. For example, pkg was not built but instead the prior build is in use. Thus bad results from the prior build might be involved in this new build. I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch build for the purposes of the main-armv7 test. === Mark Millard marklmi at yahoo.com
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On 2024-04-18 23:14:22 (+0800), Mark Millard wrote: On Apr 18, 2024, at 08:02, Mark Millard wrote: void wrote on Date: Thu, 18 Apr 2024 14:08:36 UTC : Not sure where to post this.. The last bulk build for arm64 appears to have happened around mid-March on ampere2. Is it broken? main-armv7 building is broken and the last completed build was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It gets stuck making no progress until manually forced to stop, which leads to huge elapsed times for the incomplete builds: [...] My guess is that FreeBSD has something that broken after bd45bbe440 that was broken as of f5f08e41aa and was still broken at 75464941dc . One thing of possible note: Failing . . . Host OSVERSION: 156 Jail OSVERSION: 1500014 I have finished a package builder refresh this morning. All our builder hosts (except PowerPC - I don't touch those) are now on main-n269671-feabaf8d5389 (OSVERSION 1500018). ampere1 successfully finished its 140releng-armv7-quarterly build, so it looks like the problem with stuck builds was limited to ampere2 building main-armv7. I'll keep a close eye on this one when it starts its next build. Philip
Re: TXT Kernel linking failed on -CURRENT
Konstantin, good day!

25.04.2024 0:09, Konstantin Belousov wrote:
> On Wed, Apr 24, 2024 at 01:12:39PM +0500, BSD USER wrote:
>> linking kernel
>> ld: error: undefined symbol: ktrcapfail
>> >>> referenced by vfs_lookup.c
>> >>>               vfs_lookup.o:(namei)
>> >>> referenced by vfs_lookup.c
>> >>>               vfs_lookup.o:(namei_setup)
>> >>> referenced by vfs_lookup.c
>> >>>               vfs_lookup.o:(vfs_lookup)
>> >>> referenced 3 more times
>> *** [kernel] Error code 1
>
> Try https://reviews.freebsd.org/D44931

Yes, now the system and kernel build fine. Thanks!
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On Apr 26, 2024, at 18:55, Philip Paeps wrote: > On 2024-04-18 23:02:30 (+0800), Mark Millard wrote: >> void wrote on >> Date: Thu, 18 Apr 2024 14:08:36 UTC : >> >>> Not sure where to post this.. >>> >>> The last bulk build for arm64 appears to have happened around >>> mid-March on ampere2. Is it broken? >> >> main-armv7 building is broken and the last completed build >> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It >> gets stuck making no progress until manually forced to stop, >> which leads to huge elapsed times for the incomplete builds: >> >> pd5512ae7b8c6_s75464941dc 34472 12282 (+9196) 107 (+77) 4753 (+2247) 1390 >> (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56 >> >> p43e3af5f5763_sf5f08e41aa 19809 5919 (+3126) 137 (+100) 5363 (+2741) 1395 >> (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2 >> >> ampere2 alternates between trying to build main-arm64 and main-armv7, so >> main-armv7 being stuck blocks main-arm64 from building. >> >> One can see that all 13 job ID's show over 570 hours: >> >> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc >> >> It is not random which packages are building when this happens. Compare: >> >> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa >> >> By contrast, the 19 Feb 2024 from-scratch (full) build worked: >> >> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440 >> >> My guess is that FreeBSD has something that broken after bd45bbe440 >> that was broken as of f5f08e41aa and was still broken at 75464941dc . > > It looks like ampere2 is going to end up in this state again: > > https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-armv7-default=p1c7a816cd0ad_s1bd4f769ca > > It's got a couple of things stuck in -depends already. I'll keep an eye on > it for the next hour or two. 
> If no progress is made, I'll kill this build
> and force an upgrade. The next build will start at 01:01 UTC Sunday. So we
> won't have long to wait before it tries again.
>
> ampere1 is chewing away at llvm, and doesn't look stuck.
>
> ampere3 has been upgraded.

Output from the likes of:

# ps -axldww

could be interesting. As might be output from:

# procstat -k -k PIDs_OF_STUCK_PROCESSES

(kernel stack backtraces).

===
Mark Millard
marklmi at yahoo.com
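A minimal sketch for gathering the kernel stack backtraces suggested above for several stuck processes in one go, using procstat(1) with -k given twice. The function name collect_kstacks and the PROCSTAT override variable are assumptions made for this example; on a real builder this would normally need to run as root, and the PIDs would come from the ps output.

```sh
#!/bin/sh
# PROCSTAT can be overridden (for example, for a dry run); by default
# the FreeBSD procstat(1) utility is used.
: "${PROCSTAT:=procstat}"

# Print a kernel stack backtrace for every PID given as an argument.
# "-kk" is the same as "-k -k": kernel stacks including function offsets.
collect_kstacks() {
    for pid in "$@"; do
        echo "=== pid ${pid} ==="
        "$PROCSTAT" -kk "$pid"
    done
}

# Example (the PIDs are placeholders):
#   collect_kstacks 1234 5678 > kstacks.txt
```

Printing a header line per PID keeps the combined output readable when it is pasted into a mail or a bug report.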
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On 2024-04-18 23:02:30 (+0800), Mark Millard wrote: void wrote on Date: Thu, 18 Apr 2024 14:08:36 UTC : Not sure where to post this.. The last bulk build for arm64 appears to have happened around mid-March on ampere2. Is it broken? main-armv7 building is broken and the last completed build was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It gets stuck making no progress until manually forced to stop, which leads to huge elapsed times for the incomplete builds: pd5512ae7b8c6_s75464941dc 34472 12282 (+9196) 107 (+77) 4753 (+2247) 1390 (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56 p43e3af5f5763_sf5f08e41aa 19809 5919 (+3126) 137 (+100) 5363 (+2741) 1395 (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2 ampere2 alternates between trying to build main-arm64 and main-armv7, so main-armv7 being stuck blocks main-arm64 from building. One can see that all 13 job ID's show over 570 hours: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc It is not random which packages are building when this happens. Compare: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa By contrast, the 19 Feb 2024 from-scratch (full) build worked: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440 My guess is that FreeBSD has something that broken after bd45bbe440 that was broken as of f5f08e41aa and was still broken at 75464941dc . It looks like ampere2 is going to end up in this state again: https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-armv7-default=p1c7a816cd0ad_s1bd4f769ca It's got a couple of things stuck in -depends already. I'll keep an eye on it for the next hour or two. If no progress is made, I'll kill this build and force an upgrade. The next build will start at 01:01 UTC Sunday. So we won't have long to wait before it tries again. 
ampere1 is chewing away at llvm, and doesn't look stuck. ampere3 has been upgraded. Philip
Re: TXT Kernel linking failed on -CURRENT
On Wed, Apr 24, 2024 at 01:12:39PM +0500, BSD USER wrote: > linking kernel > ld: error: undefined symbol: ktrcapfail > >>> referenced by vfs_lookup.c > >>> vfs_lookup.o:(namei) > >>> referenced by vfs_lookup.c > >>> vfs_lookup.o:(namei_setup) > >>> referenced by vfs_lookup.c > >>> vfs_lookup.o:(vfs_lookup) > >>> referenced 3 more times > *** [kernel] Error code 1 Try https://reviews.freebsd.org/D44931
TXT Kernel linking failed on -CURRENT
Sorry for HTML-trash from previous mail :)

Hi, FreeBSD Community!

I have a teach with FreeBSD and use -CURRENT on my test machine. And some days ago after
- git pull
- make buildworld
- make buildkernel

There is /etc/src.conf and BSDSERV below, what can cause that error?
Thanks for help!

My /usr/src state is:

git log -n 1
commit a0d7d68a2dd818ce84e37e1ff20c8849cda6d853 (HEAD -> main, origin/main, origin/HEAD)
Author: Cy Schubert

kernel building failed with such messages:

--
--- force-dynamic-hack.pico ---
cc -target x86_64-unknown-freebsd15.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -shared -O2 -pipe -fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -MD -MF.depend.force-dynamic-hack.pico -MTforce-dynamic-hack.pico -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -nostdlib force-dynamic-hack.c -o force-dynamic-hack.pico
--- vers.c ---
MAKE="make" sh /usr/src/sys/conf/newvers.sh BSDSERV
--- vers.o ---
cc -target x86_64-unknown-freebsd15.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -c -O2 -pipe -fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -Werror vers.c
--- kernel ---
linking kernel
ld: error: undefined symbol: ktrcapfail
>>> referenced by vfs_lookup.c
>>> vfs_lookup.o:(namei)
>>> referenced by vfs_lookup.c
>>> vfs_lookup.o:(namei_setup)
>>> referenced by vfs_lookup.c
>>> vfs_lookup.o:(vfs_lookup)
>>> referenced 3 more times
*** [kernel] Error code 1

make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
make[2]: 1 error

make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
1098.27 real 2002.17 user 176.26 sys

make[1]: stopped in /usr/src

make: stopped in /usr/src

/etc/src.conf
===
WITHOUT_APM=yes
WITHOUT_ASSERT_DEBUG=yes
WITHOUT_AUTHPF=yes
WITHOUT_BHYVE=yes
WITHOUT_BLACKLIST=yes
WITHOUT_BLUETOOTH=yes
WITHOUT_CCD=yes
WITHOUT_CXGBETOOL=yes
WITHOUT_DEBUG_FILES=yes
WITHOUT_DTRACE=yes
WITHOUT_FLOPPY=yes
WITHOUT_GOOGLETEST=yes
WITHOUT_HAST=yes
WITHOUT_HTML=yes
WITHOUT_HYPERV=yes
WITHOUT_INET6=yes
WITHOUT_IPFILTER=yes
WITHOUT_ISCSI=yes
WITHOUT_KDUMP=yes
WITHOUT_KERNEL_SYMBOLS=yes
WITH_MALLOC_PRODUCTION=yes
WITHOUT_MLX5TOOL=yes
WITHOUT_NVME=yes
WITHOUT_OFED=yes
WITHOUT_PF=yes
WITHOUT_PTHREADS_ASSERTIONS=yes
WITHOUT_RADIUS_SUPPORT=yes
WITHOUT_RELRO=yes
WITHOUT_SSP=yes
WITHOUT_WARNS=yes
WITHOUT_WERROR=yes
WITHOUT_TESTS=yes
WITHOUT_WIRELESS=yes

BSDSERV
===
cpu HAMMER
ident BSDSERV
device amdtemp
options SCHED_ULE # ULE scheduler
options PREEMPTION # Enable kernel thread preemption
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET # InterNETworking
options TCP_OFFLOAD
Kernel linking error on -CURRENT
Hi, FreeBSD Community!

I have a teach with FreeBSD and use -CURRENT on my test machine. And some days ago after
- git pull
- make buildworld
- make buildkernel

There is /etc/src.conf and BSDSERV below, what can cause that error?
Thanks for help!

kernel building failed with such messages:

--- force-dynamic-hack.pico ---
cc -target x86_64-unknown-freebsd15.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -shared -O2 -pipe -fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -MD -MF.depend.force-dynamic-hack.pico -MTforce-dynamic-hack.pico -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -nostdlib force-dynamic-hack.c -o force-dynamic-hack.pico
--- vers.c ---
MAKE="make" sh /usr/src/sys/conf/newvers.sh BSDSERV
--- vers.o ---
cc -target x86_64-unknown-freebsd15.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -c -O2 -pipe -fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -Werror vers.c
--- kernel ---
linking kernel
ld: error: undefined symbol: ktrcapfail
>>> referenced by vfs_lookup.c
>>> vfs_lookup.o:(namei)
>>> referenced by vfs_lookup.c
>>> vfs_lookup.o:(namei_setup)
>>> referenced by vfs_lookup.c
>>> vfs_lookup.o:(vfs_lookup)
>>> referenced 3 more times
*** [kernel] Error code 1

make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
make[2]: 1 error

make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
1098.27 real 2002.17 user 176.26 sys

make[1]: stopped in /usr/src

make: stopped in /usr/src

/etc/src.conf
===
WITHOUT_APM=yes
WITHOUT_ASSERT_DEBUG=yes
WITHOUT_AUTHPF=yes
WITHOUT_BHYVE=yes
WITHOUT_BLACKLIST=yes
WITHOUT_BLUETOOTH=yes
WITHOUT_CCD=yes
WITHOUT_CXGBETOOL=yes
WITHOUT_DEBUG_FILES=yes
WITHOUT_DTRACE=yes
WITHOUT_FLOPPY=yes
WITHOUT_GOOGLETEST=yes
WITHOUT_HAST=yes
WITHOUT_HTML=yes
WITHOUT_HYPERV=yes
WITHOUT_INET6=yes
WITHOUT_IPFILTER=yes
WITHOUT_ISCSI=yes
WITHOUT_KDUMP=yes
WITHOUT_KERNEL_SYMBOLS=yes
WITH_MALLOC_PRODUCTION=yes
WITHOUT_MLX5TOOL=yes
WITHOUT_NVME=yes
WITHOUT_OFED=yes
WITHOUT_PF=yes
WITHOUT_PTHREADS_ASSERTIONS=yes
WITHOUT_RADIUS_SUPPORT=yes
WITHOUT_RELRO=yes
WITHOUT_SSP=yes
WITHOUT_WARNS=yes
WITHOUT_WERROR=yes
WITHOUT_TESTS=yes
WITHOUT_WIRELESS=yes

BSDSERV
===
cpu HAMMER
ident BSDSERV
device amdtemp
options SCHED_ULE # ULE scheduler
options PREEMPTION # Enable kernel thread preemption
options VIMAGE # Subsystem virtualization, e.g. VNET
options INET # InterNETworking
options TCP_OFFLOAD # TCP offload
options TCP_BLACKBOX # Enhanced TCP event logging
options TCP_HHOOK # hhook(9) framework for TCP
options TCP_RFC7413 # TCP Fast Open
options KERN_TLS # TLS transmit & receive offload
options FFS # Berkeley Fast Filesystem
options SOF
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On 2024-04-24 02:12:41 (+0800), Mark Millard wrote: On Apr 19, 2024, at 07:16, Philip Paeps wrote: On 2024-04-18 23:02:30 (+0800), Mark Millard wrote: void wrote on Date: Thu, 18 Apr 2024 14:08:36 UTC : Not sure where to post this.. The last bulk build for arm64 appears to have happened around mid-March on ampere2. Is it broken? main-armv7 building is broken and the last completed build was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It gets stuck making no progress until manually forced to stop, which leads to huge elapsed times for the incomplete builds: pd5512ae7b8c6_s75464941dc 34472 12282 (+9196) 107 (+77) 4753 (+2247) 1390 (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56 p43e3af5f5763_sf5f08e41aa 19809 5919 (+3126) 137 (+100) 5363 (+2741) 1395 (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2 ampere2 alternates between trying to build main-arm64 and main-armv7, so main-armv7 being stuck blocks main-arm64 from building. One can see that all 13 job ID's show over 570 hours: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc It is not random which packages are building when this happens. Compare: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa By contrast, the 19 Feb 2024 from-scratch (full) build worked: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440 My guess is that FreeBSD has something that broken after bd45bbe440 that was broken as of f5f08e41aa and was still broken at 75464941dc . I'll kill the build on ampere2 again. Thanks for the nudge. We don't really have good monitoring for this. Also: builds should time out after 36 hours. The fact that this one does not is a bug in itself. Philip [hat: clusteradm] I'll note that I've never managed to replicate the problem for building for armv7 on aarch64. 
But my context never has the likes of:

QUOTE
Host OSVERSION: 156
Jail OSVERSION: 1500015
. . .
!!! Jail is newer than host. (Jail: 1500015, Host: 156) !!!
!!! This is not supported. !!!
!!! Host kernel must be same or newer than jail. !!!
!!! Expect build failures. !!!
END QUOTE

but always has the two OSVERSION's the same, such as:

Host OSVERSION: 1500015
Jail OSVERSION: 1500015

or, recently,

Host OSVERSION: 1500018
Jail OSVERSION: 1500018

My bulk runs do go through the sequence where the hangups have repeated for main-armv7 on ampere2. I wonder what would happen if "Host OSVERSION" was updated (modernized) to match the modern "Jail OSVERSION" that would be used?

The package builders are due for a regular refresh to newer -CURRENT dogfood. I'll do the aarch64 builders first this time. I've set /root/stop-builds on them. I'll upgrade them when they go idle. Or I'll kill them if they take much longer to build what they're building. It annoys me that they do not stop building after 36 hours, like they're supposed to.

They're currently running:

n266879-6abee52e0d79 2023-12-09 01:06:28 jlduran strfmon: Silence scan-build warning

Our current clusteradm build is:

n269399-bbc6e6c5ec8c 2024-04-14 03:12:36 sigsys daemon: fix -R to enable supervision mode

I may do a new build while waiting for them to go idle:

- quarterly 140arm64 1b931669de11 parallel_build 28776 15299 33 588 985 0 11871 3D:01:08:29
  https://pkg-status.freebsd.org/ampere1/build.html?mastername=140arm64-quarterly=1b931669de11
- default main-arm64 p1c7a816cd0ad_s1bd4f769caf parallel_build 34528 19888 65 669 980 0 12926 4D:00:52:21
  https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-arm64-default=p1c7a816cd0ad_s1bd4f769caf
- default 140releng-armv7 2910ff97e727 parallel_build 34543 14826 60 5539 1397 0 12721 1D:09:35:28
  https://pkg-status.freebsd.org/ampere3/build.html?mastername=140releng-armv7-default=2910ff97e727

Philip
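The counters in the three status lines can be sanity-checked with a little arithmetic. The sketch below assumes the seven numbers are, in order, queued, built, failed, skipped, ignored, fetched and remaining; the mail does not label the columns, so that order is an assumption, and check_counts is a made-up helper name. Under that reading the last six counters should add up to the first. For the main-arm64 line this holds when the skipped and ignored fields are read as 669 and 980.

```shell
#!/bin/sh
# check_counts QUEUED BUILT FAILED SKIPPED IGNORED FETCHED REMAINING
# Succeeds when the trailing counters add up to QUEUED.
# The column order is an assumption; the mail does not label the columns.
check_counts() {
    queued=$1
    shift
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    [ "$total" -eq "$queued" ]
}

check_counts 28776 15299 33 588 985 0 11871 && echo "140arm64: counters add up"
check_counts 34528 19888 65 669 980 0 12926 && echo "main-arm64: counters add up"
check_counts 34543 14826 60 5539 1397 0 12721 && echo "140releng-armv7: counters add up"
```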
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On Apr 19, 2024, at 07:16, Philip Paeps wrote: > On 2024-04-18 23:02:30 (+0800), Mark Millard wrote: > >> void wrote on >> Date: Thu, 18 Apr 2024 14:08:36 UTC : >> >>> Not sure where to post this.. >>> >>> The last bulk build for arm64 appears to have happened around >>> mid-March on ampere2. Is it broken? >> >> main-armv7 building is broken and the last completed build >> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It >> gets stuck making no progress until manually forced to stop, >> which leads to huge elapsed times for the incomplete builds: >> >> pd5512ae7b8c6_s75464941dc 34472 12282 (+9196) 107 (+77) 4753 (+2247) 1390 >> (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56 >> >> p43e3af5f5763_sf5f08e41aa 19809 5919 (+3126) 137 (+100) 5363 (+2741) 1395 >> (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2 >> >> ampere2 alternates between trying to build main-arm64 and main-armv7, so >> main-armv7 being stuck blocks main-arm64 from building. >> >> One can see that all 13 job ID's show over 570 hours: >> >> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc >> >> It is not random which packages are building when this happens. Compare: >> >> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa >> >> By contrast, the 19 Feb 2024 from-scratch (full) build worked: >> >> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440 >> >> My guess is that FreeBSD has something that broken after bd45bbe440 >> that was broken as of f5f08e41aa and was still broken at 75464941dc . > > I'll kill the build on ampere2 again. Thanks for the nudge. > > We don't really have good monitoring for this. Also: builds should time out > after 36 hours. The fact that this one does not is a bug in itself. 
> > Philip [hat: clusteradm] I'll note that I've never managed to replicate the problem for building for armv7 on aarch64. But my context never has the likes of: QUOTE Host OSVERSION: 156 Jail OSVERSION: 1500015 . . !!! Jail is newer than host. (Jail: 1500015, Host: 156) !!! !!! This is not supported. !!! !!! Host kernel must be same or newer than jail. !!! !!! Expect build failures. !!! END QUOTE but always has the two OSVERSION's the same, such as: Host OSVERSION: 1500015 Jail OSVERSION: 1500015 or, recently, Host OSVERSION: 1500018 Jail OSVERSION: 1500018 My bulk runs do go through the sequence where the hangups have repeated for main-armv7 on ampere2. I wonder what would happen if "Host OSVERSION" was updated (modernized) to match the modern "Jail OSVERSION" that would be used? === Mark Millard marklmi at yahoo.com
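The host/jail comparison behind the quoted warning can be sketched in a few lines of sh. This is only an illustration, not poudriere's own code: jail_newer_than_host and the JAIL_OSVERSION variable are hypothetical names, and reading the host value from the kern.osreldate sysctl is an assumption about where "Host OSVERSION" comes from.

```shell
#!/bin/sh
# jail_newer_than_host HOST_OSVERSION JAIL_OSVERSION
# Succeeds (exit 0) when the jail's OSVERSION is greater than the host's,
# which is the situation the quoted warning complains about.
jail_newer_than_host() {
    [ "$2" -gt "$1" ]
}

# On a FreeBSD host the running kernel's value can be read with sysctl
# (assumed here to be what "Host OSVERSION" reports); fall back to a sample
# value elsewhere so the sketch still runs.
host=$(sysctl -n kern.osreldate 2>/dev/null || echo 1500015)
# JAIL_OSVERSION is a hypothetical variable; set it to the jail's value.
jail=${JAIL_OSVERSION:-1500018}

if jail_newer_than_host "$host" "$jail"; then
    echo "jail ($jail) is newer than host ($host): expect build failures"
else
    echo "host ($host) is same or newer than jail ($jail)"
fi
```

With the matched pairs reported above (1500015/1500015 or 1500018/1500018) the check takes the second branch.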
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On 2024-04-18 23:02:30 (+0800), Mark Millard wrote: void wrote on Date: Thu, 18 Apr 2024 14:08:36 UTC : Not sure where to post this.. The last bulk build for arm64 appears to have happened around mid-March on ampere2. Is it broken? main-armv7 building is broken and the last completed build was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It gets stuck making no progress until manually forced to stop, which leads to huge elapsed times for the incomplete builds: pd5512ae7b8c6_s75464941dc 34472 12282 (+9196) 107 (+77) 4753 (+2247) 1390 (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56 p43e3af5f5763_sf5f08e41aa 19809 5919 (+3126) 137 (+100) 5363 (+2741) 1395 (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2 ampere2 alternates between trying to build main-arm64 and main-armv7, so main-armv7 being stuck blocks main-arm64 from building. One can see that all 13 job ID's show over 570 hours: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc It is not random which packages are building when this happens. Compare: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa By contrast, the 19 Feb 2024 from-scratch (full) build worked: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440 My guess is that FreeBSD has something that broken after bd45bbe440 that was broken as of f5f08e41aa and was still broken at 75464941dc . I'll kill the build on ampere2 again. Thanks for the nudge. We don't really have good monitoring for this. Also: builds should time out after 36 hours. The fact that this one does not is a bug in itself. Philip [hat: clusteradm]
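Since the missing piece is monitoring for builds that run past the 36 hour limit, here is a minimal sketch of such a check. It assumes the elapsed time is given in the H:MM:SS form used in the listing above (for example 651:21:56); over_limit is a made-up helper, not part of poudriere or the cluster tooling, and it does not handle a day-prefixed form such as 3D:01:08:29.

```shell
#!/bin/sh
# over_limit ELAPSED LIMIT_HOURS
# ELAPSED is H:MM:SS as shown in the build listing, e.g. 651:21:56.
# Succeeds (exit 0) when ELAPSED is strictly longer than LIMIT_HOURS.
over_limit() {
    printf '%s\n' "$1" | awk -F: -v limit="$2" '{
        elapsed = $1 * 3600 + $2 * 60 + $3
        exit (elapsed > limit * 3600) ? 0 : 1
    }'
}

for t in 651:21:56 359:42:14 12:00:00; do
    if over_limit "$t" 36; then
        echo "$t: over the 36 hour limit"
    else
        echo "$t: within the limit"
    fi
done
```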
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On Thu, Apr 18, 2024 at 08:02:30AM -0700, Mark Millard wrote: void wrote on Date: Thu, 18 Apr 2024 14:08:36 UTC : Not sure where to post this.. The last bulk build for arm64 appears to have happened around mid-March on ampere2. Is it broken? main-armv7 building is broken and the last completed build was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It gets stuck making no progress until manually forced to stop, which leads to huge elapsed times for the incomplete builds: Should I report it in bugzilla? --
Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
On Apr 18, 2024, at 08:02, Mark Millard wrote: > void wrote on > Date: Thu, 18 Apr 2024 14:08:36 UTC : > >> Not sure where to post this.. >> >> The last bulk build for arm64 appears to have happened around >> mid-March on ampere2. Is it broken? > > main-armv7 building is broken and the last completed build > was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It > gets stuck making no progress until manually forced to stop, > which leads to huge elapsed times for the incomplete builds: > > pd5512ae7b8c6_s75464941dc 34472 12282 (+9196) 107 (+77) 4753 (+2247) 1390 > (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56 > > p43e3af5f5763_sf5f08e41aa 19809 5919 (+3126) 137 (+100) 5363 (+2741) 1395 > (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2 > > ampere2 alternates between trying to build main-arm64 and main-armv7, so > main-armv7 being stuck blocks main-arm64 from building. > > One can see that all 13 job ID's show over 570 hours: > > http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc > > It is not random which packages are building when this happens. Compare: > > http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa > > By contrast, the 19 Feb 2024 from-scratch (full) build worked: > > http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440 > > My guess is that FreeBSD has something that broken after bd45bbe440 > that was broken as of f5f08e41aa and was still broken at 75464941dc . > One thing of possible note: Failing . . . Host OSVERSION: 156 Jail OSVERSION: 1500014 and, more recently, Host OSVERSION: 156 Jail OSVERSION: 1500015 But the most recent working had . . . Host OSVERSION: 156 Jail OSVERSION: 1500014 So, if it is a FreeBSD problem, it seems to have started during 1500014 . === Mark Millard marklmi at yahoo.com
pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]
void wrote on Date: Thu, 18 Apr 2024 14:08:36 UTC : > Not sure where to post this.. > > The last bulk build for arm64 appears to have happened around > mid-March on ampere2. Is it broken? main-armv7 building is broken and the last completed build was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It gets stuck making no progress until manually forced to stop, which leads to huge elapsed times for the incomplete builds: pd5512ae7b8c6_s75464941dc 34472 12282 (+9196) 107 (+77) 4753 (+2247) 1390 (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56 p43e3af5f5763_sf5f08e41aa 19809 5919 (+3126) 137 (+100) 5363 (+2741) 1395 (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2 ampere2 alternates between trying to build main-arm64 and main-armv7, so main-armv7 being stuck blocks main-arm64 from building. One can see that all 13 job ID's show over 570 hours: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc It is not random which packages are building when this happens. Compare: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa By contrast, the 19 Feb 2024 from-scratch (full) build worked: http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440 My guess is that FreeBSD has something that broken after bd45bbe440 that was broken as of f5f08e41aa and was still broken at 75464941dc . === Mark Millard marklmi at yahoo.com
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
Am 2024-03-29 18:21, schrieb Alexander Leidinger: Am 2024-03-29 18:13, schrieb Mark Johnston: On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote: Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup: - not a generic kernel - very modular kernel (as much as possible as a module) - bind_now (a build without fails too, tested with clean /usr/obj) - ccache (a build without fails too, tested with clean /usr/obj) - kernel retpoline (build without in progress) - userland retpoline (build without in progress) - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline) - -fno-builtin - CPUFLAGS=native (except for stuff in /usr/src/sys/boot) - malloc production - COPTFLAGS= -O2 -pipe The issue is, that kernel modules load OK from loader, but once it starts init any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync and the module refuses to load. What is the exact revision you're running? There were some unrelated changes to the kernel linker around the same time. The working src is from 2024-03-11-094351 (GMT+0100). The failing src was fetched after Glebs stabilization week message (and todays src before the sound stuff still fails). Retpoline wasn't the cause, next test is the CTF stuff in the kernel... A rather obscure problem was causing this. The "last" BE had canmount set to "on" instead of "noauto". No idea how this happened, but this resulted in the "last" BE to be mounted on "zfs mount -a" on top of the current BE. This means that all modules loaded after the zfs rc script has run was loading old kernel modules and the error message of kernel version mismatch was correct. 
I found the issue while bisecting the tree, when the error message suddenly went away but a new issue of missing dev entries popped up (/dev was mounted correctly on the booting dataset, but the last BE was mounted on top of it and /dev went empty...).

It looks to me like bectl was doing this (from "zpool history")...

2024-03-11.14:16:31 zpool set bootfs=rpool/ROOT/2024-03-11-094351 rpool
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-02-10-144617
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-11-212006
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-16-082836
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211_ok
2024-03-11.14:16:33 zfs set canmount=on rpool/ROOT/2024-03-11-094351
2024-03-11.14:16:33 zfs promote rpool/ROOT/2024-03-11-094351
2024-03-11.14:17:03 zfs destroy -r rpool/ROOT/2024-02-24-140211_ok

I surely didn't do the "zfs set canmount=..." for those by hand.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
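A quick way to spot this situation is to list the canmount property of every boot environment and flag any dataset other than the active one that is set to "on". The following is a minimal sketch, assuming the rpool/ROOT layout from this mail; stray_canmount_on is a made-up helper name, and the sample canmount values are invented for illustration.

```shell
#!/bin/sh
# stray_canmount_on ACTIVE_BE
# Reads "name<TAB>canmount" lines on stdin, as printed by
#   zfs list -H -r -o name,canmount rpool/ROOT
# and prints every dataset other than ACTIVE_BE whose canmount is "on".
stray_canmount_on() {
    awk -F '\t' -v active="$1" '$2 == "on" && $1 != active { print $1 }'
}

# Sample input modelled on the dataset names in the mail; the canmount
# values here are made up for illustration.
printf '%s\t%s\n' \
    rpool/ROOT/2024-02-24-140211 on \
    rpool/ROOT/2024-03-11-094351 on \
    rpool/ROOT/2024-02-16-082836 noauto |
    stray_canmount_on rpool/ROOT/2024-03-11-094351
```

On a live system the input would come from the zfs list command named in the comment, and the parent dataset (rpool/ROOT here) may show up as well if its canmount is "on". A flagged boot environment can be put back with the same kind of command that appears in the zpool history, that is, zfs set canmount=noauto followed by the dataset name.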
Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230
(For the non-working Wifi chip, I am currently using a USB Wifi dongle, a Realtek RTL8191S WLAN Adapter, which works fine.)

I also can't get Xorg plus twm up; the end of /var/log/Xorg.0.log says:

..
REDWOOD, ATI Mobility Radeon Graphics, CEDAR, ATI FirePro 2270, ATI Radeon HD 5450, CAYMAN, AMD Radeon HD 6900 Series, AMD Radeon HD 6900M Series, Mobility Radeon HD 6000 Series, BARTS, AMD Radeon HD 6800 Series, AMD Radeon HD 6700 Series, TURKS, CAICOS, ARUBA, TAHITI, PITCAIRN, VERDE, OLAND, HAINAN, BONAIRE, KABINI, MULLINS, KAVERI, HAWAII
[ 248.442] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[ 248.442] (II) scfb: driver for wsdisplay framebuffer: scfb
[ 248.442] (II) VESA: driver for VESA chipsets: vesa
[ 248.442] (--) Using syscons driver with X support (version 2.0)
[ 248.442] (--) using VT number 9
[ 248.447] (EE) open /dev/dri/card0: No such file or directory
[ 248.447] (WW) Falling back to old probe method for modesetting
[ 248.447] (EE) open /dev/dri/card0: No such file or directory
[ 248.447] (WW) Falling back to old probe method for scfb
[ 248.447] scfb trace: probe start
..
The kernel modules loaded are:

Id Refs Address    Size    Name
 1  139 0x8020     1d4f010 kernel
 2    1 0x81f5     36c0    coretemp.ko
 3    1 0x81f55000 9c48    if_cdce.ko
 4    2 0x81f5f000 6138    uether.ko
 5    1 0x81f66000 a698    cuse.ko
 6    1 0x81f71000 f7f38   ipl.ko
 7    1 0x83c0     462be0  zfs.ko
 8    1 0x84063000 1510b8  radeonkms.ko
 9    2 0x841b5000 73da0   drm.ko
10    1 0x83bd7000 22a8    iic.ko
11    3 0x83bda000 1100    linuxkpi_gplv2.ko
12    4 0x83bdc000 6320    dmabuf.ko
13    4 0x83be3000 3080    linuxkpi_hdmi.ko
14    1 0x83be7000 c7b0    ttm.ko
15    1 0x83bf4000 3370    acpi_wmi.ko
16    1 0x83bf8000 5ee0    ig4.ko
17    1 0x84229000 3210    intpm.ko
18    1 0x8422d000 2178    smbus.ko
19    1 0x8423     30ad8   linux.ko
20    4 0x84261000 be30    linux_common.ko
21    1 0x8426d000 2ccf8   linux64.ko
22    1 0x8429a000 2270    pty.ko
23    1 0x8429d000 3540    fdescfs.ko
24    1 0x842a1000 73c0    linprocfs.ko
25    1 0x842a9000 43e4    linsysfs.ko
26    1 0x842ae000 4d00    ng_ubt.ko
27    6 0x842b3000 bb28    netgraph.ko
28    2 0x842bf000 a238    ng_hci.ko
29    4 0x842ca000 2668    ng_bluetooth.ko
30    1 0x842cd000 a7e0    if_rsu.ko
31    1 0x842d8000 3218    iichid.ko
32    5 0x842dc000 32a8    hidbus.ko
33    1 0x842e     f250    ng_l2cap.ko
34    1 0x842f     19f08   ng_btsocket.ko
35    1 0x8430a000 38b8    ng_socket.ko
37    1 0x8432e000 21e0    hms.ko
38    1 0x84331000 40a8    hidmap.ko
39    1 0x84336000 334d    hmt.ko
40    1 0x8433a000 22c4    hconf.ko

The complete Xorg.0.log is here: http://www.unixarea.de/Xorg.0.log.txt

Thanks in advance for ideas.

matthias

--
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On 3/29/24 16:52, Alexander Leidinger wrote: Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup: - not a generic kernel - very modular kernel (as much as possible as a module) - bind_now (a build without fails too, tested with clean /usr/obj) - ccache (a build without fails too, tested with clean /usr/obj) - kernel retpoline (build without in progress) - userland retpoline (build without in progress) - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline) - -fno-builtin - CPUFLAGS=native (except for stuff in /usr/src/sys/boot) - malloc production - COPTFLAGS= -O2 -pipe The issue is, that kernel modules load OK from loader, but once it starts init any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync and the module refuses to load. I tried the workaround to load the modules from the loader, which works, but then I can't login remotely as ssh fails to allocate a pty. By loading modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuild = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised... and it looks like they are failing to get initialised because of this missing CTF stuff (I'm back to the previous boot env to be able to login remotely and send mails, I don't have a copy of the failure message at hand). I assume the missing CTF stuff is due to the CTF based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and have the module working)? 
This is indeed how it works: those messages are emitted by the CTF loading routines in 'kern/kern_ctf.c' as warnings and do not affect the rest of the module loading process. However, I completely agree that they are cryptic and spammy; I'll try to do something about that.

Bojan
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On 2024-03-29 18:13, Mark Johnston wrote: On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote: Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup: - not a generic kernel - very modular kernel (as much as possible as a module) - bind_now (a build without fails too, tested with clean /usr/obj) - ccache (a build without fails too, tested with clean /usr/obj) - kernel retpoline (build without in progress) - userland retpoline (build without in progress) - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline) - -fno-builtin - CPUFLAGS=native (except for stuff in /usr/src/sys/boot) - malloc production - COPTFLAGS= -O2 -pipe The issue is, that kernel modules load OK from loader, but once it starts init any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync and the module refuses to load. What is the exact revision you're running? There were some unrelated changes to the kernel linker around the same time. The working src is from 2024-03-11-094351 (GMT+0100). The failing src was fetched after Gleb's stabilization week message (and today's src, from before the sound stuff, still fails). Retpoline wasn't the cause, next test is the CTF stuff in the kernel... I tried the workaround to load the modules from the loader, which works, but then I can't login remotely as ssh fails to allocate a pty. By loading modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuild = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised... 
and it looks like they are failing to get initialised because of this missing CTF stuff (I'm back to the previous boot env to be able to login remotely and send mails, I don't have a copy of the failure message at hand). I assume the missing CTF stuff is due to the CTF based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and have the module working)? From my reading of linker_ctf_load_file(), this is exactly how it already works. Great that it works this way; I still suggest printing a message that explains what the warning about the missing CTF data means. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote: > Hi, > > sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work > (see below for the issue). As the monthly stabilisation pass didn't find > obvious issues, it is something related to my setup: > - not a generic kernel > - very modular kernel (as much as possible as a module) > - bind_now (a build without fails too, tested with clean /usr/obj) > - ccache (a build without fails too, tested with clean /usr/obj) > - kernel retpoline (build without in progress) > - userland retpoline (build without in progress) > - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't > retpoline) > - -fno-builtin > - CPUFLAGS=native (except for stuff in /usr/src/sys/boot) > - malloc production > - COPTFLAGS= -O2 -pipe > > The issue is, that kernel modules load OK from loader, but once it starts > init any module fails to load (e.g. via autodetection of hardware or rc.conf > kld_list) with the message that the kernel and module versions are out of > sync and the module refuses to load. What is the exact revision you're running? There were some unrelated changes to the kernel linker around the same time. > I tried the workaround to load the modules from the loader, which works, but > then I can't login remotely as ssh fails to allocate a pty. By loading > modules via the loader, I can see messages about missing CTF info when the > nvidia modules (from ports = not yet rebuild = in /boot/modules/...ko > instead of /boot/kernel/...ko) try to get initialised... and it looks like > they are failing to get initialised because of this missing CTF stuff (I'm > back to the previous boot env to be able to login remotely and send mails, I > don't have a copy of the failure message at hand). > > I assume the missing CTF stuff is due to the CTF based pretty printing > (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). 
> Is this supposed to fail to load modules which are compiled without CTF > data? Shouldn't this work gracefully (e.g. spit out a warning that pretty > printing is not available for module X and have the module working)? From my reading of linker_ctf_load_file(), this is exactly how it already works. > Next steps: > - try a world without retpoline (bind_now and ccache active) > - try a kernel without CTF (bind now, ccache, retpoline active) > - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS > > If anyone has an idea how to debug this in some other way... > > Bye, > Alexander. > > -- > http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF > http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup: - not a generic kernel - very modular kernel (as much as possible as a module) - bind_now (a build without fails too, tested with clean /usr/obj) - ccache (a build without fails too, tested with clean /usr/obj) - kernel retpoline (build without in progress) - userland retpoline (build without in progress) - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline) - -fno-builtin - CPUFLAGS=native (except for stuff in /usr/src/sys/boot) - malloc production - COPTFLAGS= -O2 -pipe The issue is, that kernel modules load OK from loader, but once it starts init any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync and the module refuses to load. I tried the workaround to load the modules from the loader, which works, but then I can't login remotely as ssh fails to allocate a pty. By loading modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuild = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised... and it looks like they are failing to get initialised because of this missing CTF stuff (I'm back to the previous boot env to be able to login remotely and send mails, I don't have a copy of the failure message at hand). I assume the missing CTF stuff is due to the CTF based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and have the module working)? 
Next steps: - try a world without retpoline (bind_now and ccache active) - try a kernel without CTF (bind now, ccache, retpoline active) - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS If anyone has an idea how to debug this in some other way... Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230
On Wednesday, March 27, 2024 at 03:51:33 PM +0100, Matthias Apitz wrote:
> The WLAN card seems to be:
>
> none2@pci0:1:0:0: class=0x028000 rev=0x00 hdr=0x00 vendor=0x14c3 device=0x7961 subvendor=0x1a3b subdevice=0x4680
>     vendor = 'MEDIATEK Corp.'
>     device = 'MT7921 802.11ax PCI Express Wireless Network Adapter'
>     class = network
>
> Perhaps not supported until today:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264300

While grepping through /usr/src I found the driver in /usr/src/sys/contrib/dev/mediatek/mt76/mt7921/ and a make && make install in

    # cd /usr/src/sys/modules/mt76
    # make ..
    # make install
    ===> core (install)
    install -T release -o root -g wheel -m 555 mt76_core.ko /boot/modules/
    kldxref /boot/modules
    ===> mt7915 (install)
    install -T release -o root -g wheel -m 555 if_mt7915.ko /boot/modules/
    kldxref /boot/modules
    ===> mt7921 (install)
    install -T release -o root -g wheel -m 555 if_mt7921.ko /boot/modules/
    kldxref /boot/modules

installs it fine. There is also a manpage in /usr/src/share/man/man4/mt7921.4 (attached). I still have to build the firmware from ports/net/wifi-firmware-mt76-kmod.

matthias

-- Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub

MT7921(4)               FreeBSD Kernel Interfaces Manual               MT7921(4)

NAME
    mt7921 – MediaTek IEEE 802.11ax wireless network driver

SYNOPSIS
    The driver will auto-load without any user interaction using devmatch(8) if enabled in rc.conf(5).

    Only if auto-loading is explicitly disabled, place the following lines in rc.conf(5) to manually load the driver as a module at boot time:

        kld_list="${kld_list} if_mt7921"

    The driver should automatically load any firmware needed for the particular chipset.

    It is discouraged to load the driver from loader(8).

DESCRIPTION
    The mt7921 driver is derived from MediaTek's Linux mt76 driver and provides support for the following chipsets:

        MediaTek MT7921E (PCIe)

    This driver requires firmware to be loaded before it will work. The package wifi-firmware-mt76-kmod from the ports/net/wifi-firmware-mt76-kmod port needs to be installed before the driver is loaded. Otherwise no wlan(4) interface can be created using ifconfig(8).

    The driver uses the linuxkpi_wlan and linuxkpi compat framework to bridge between the Linux and native FreeBSD driver code as well as to the native net80211(4) wireless stack.

    While mt7921 supports all 802.11 a/b/g/n/ac and ax the compatibility code currently only supports 802.11 a/b/g modes. Support for 802.11 n/ac is to come.

BUGS
    Certainly.

SEE ALSO
    wlan(4), ifconfig(8), wpa_supplicant(8)

HISTORY
    The mt7921 driver first appeared in FreeBSD 14.0.

FreeBSD 14.0-CURRENT             April 18, 2023             FreeBSD 14.0-CURRENT
Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230
On Wednesday, March 27, 2024 at 10:37:48 AM +0100, Matthias Apitz wrote:
>
> Hello,
>
> I bought the laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230 and managed to
> boot FreeBSD with boot verbose messages from an USB key and I'm able to
> login. The /var/log/messages are here
> http://www.unixarea.de/ASUS-VivoBook-Pro-14-messages.txt

A 'gpart list nda0' is here: http://www.unixarea.de/nda0.txt

Currently the (Windows) partitions are:

    # egrep 'Name:|Mediasize:' nda0.txt
    1. Name: nda0p1
       Mediasize: 272629760 (260M)
    2. Name: nda0p2
       Mediasize: 16777216 (16M)
    3. Name: nda0p3
       Mediasize: 510507949568 (475G)
    4. Name: nda0p4
       Mediasize: 1101004800 (1.0G)
    5. Name: nda0p5
       Mediasize: 209715200 (200M)
    1. Name: nda0
       Mediasize: 512110190592 (477G)

A 'pciconf -lv' is here: http://www.unixarea.de/pciconf.txt

The WLAN card seems to be:

    none2@pci0:1:0:0: class=0x028000 rev=0x00 hdr=0x00 vendor=0x14c3 device=0x7961 subvendor=0x1a3b subdevice=0x4680
        vendor = 'MEDIATEK Corp.'
        device = 'MT7921 802.11ax PCI Express Wireless Network Adapter'
        class = network

Perhaps not supported until today: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264300

An attached USB mouse is detected and works.

matthias

-- Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub
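In pciconf output, a device name that starts with "none" (such as none2 above) means that no driver has attached to the device. The sketch below pulls the vendor:device ID pairs of such devices out of `pciconf -l` style output, which is handy when searching Bugzilla or the source tree for a matching driver. The helper name is hypothetical, and the field layout is assumed to match the sample quoted in this message.

```sh
# unattached_pci_ids: read `pciconf -l` (or -lv) output on stdin and print
# "vendor:device" for every device that has no driver attached (noneN@pci...).
unattached_pci_ids() {
    awk '/^none[0-9]+@pci/ {
        v = d = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^vendor=/) v = substr($i, 8)   # strip "vendor="
            if ($i ~ /^device=/) d = substr($i, 8)   # strip "device="
        }
        print v ":" d
    }'
}

# Usage on the laptop itself:
#   pciconf -l | unattached_pci_ids
```

For the WLAN card above this yields 0x14c3:0x7961, the MediaTek MT7921 ID pair.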
Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230
On Wed, Mar 27, 2024 at 12:38 PM Matthias Apitz wrote: > > > Hello, > > I bought the laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230 and managed to > boot FreeBSD with boot verbose messages from an USB key and I'm able to > login. The /var/log/messages are here > http://www.unixarea.de/ASUS-VivoBook-Pro-14-messages.txt `pciconf -lv` would be useful too.
CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230
Hello, I bought the laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230 and managed to boot FreeBSD with verbose boot messages from a USB key, and I'm able to log in. The /var/log/messages are here: http://www.unixarea.de/ASUS-VivoBook-Pro-14-messages.txt I can identify the hard disk as nda0 (correct), but I don't see any mouse or WLAN card. I will also boot Ubuntu 22.04.4 from USB; maybe this gives more hints about hardware details. Thanks for any comments. I have 30 days to return the laptop to the store. matthias -- Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045 Public GnuPG key: http://www.unixarea.de/key.pub
CURRENT 220ee18f1964 memstick kernel panic, MacBookPro8,3
Originally posted to <https://discord.com/channels/727023752348434432/1221505362016862288> Photograph: <https://media.discordapp.net/attachments/1221505362016862288/1221505364936364152/image.png?ex=6612d285=66005d85=9b188930e96072deb379c61c52cca279c7cde78c0ba125199a62f336fe2083bd&==webp=lossless=915=686> USB flash drive written from FreeBSD-15.0-CURRENT-amd64-20240314-220ee18f1964-268793-memstick.img.xz Broadcom Wi-Fi-related, maybe? <https://bsd-hardware.info/?probe=89647876db#pci:14e4-4331-106b-00d6> <https://cgit.freebsd.org/src/log/?qt=range=220ee18f1964> Reproducible in safe mode.
Re: sysutils/pam_xdg: Cancelled on -CURRENT
On 2024-03-19 16:02, Emmanuel Vadot wrote: > On Tue, 19 Mar 2024 07:55:15 + > Alastair Hogge wrote: > >> On 2024-03-19 15:23, Emmanuel Vadot wrote: >> > Hi, >> >> Hey Emmanuel, >> >> > On Tue, 19 Mar 2024 06:54:27 + >> > Alastair Hogge wrote: >> > >> >> Hello, >> >> >> >> Recently a similar module (PAM) mentioned in the subject was committed >> >> to base[1]. The module in base masks the currently installed Port, the >> >> man page can be accessed with man -M /usr/local/share/man 8 pam_xdg, >> >> however, I can now no longer build the Port. I noticed that the base >> >> module has no WITHOUT_ option, tho, that might be extreme for one >> >> module, but then again, the base module masks a more feature full >> >> module. What is the practice to enable use of the Port again? At the >> >> moment I am updating my host, and testing the following: >> >> >> >> diff --git a/lib/libpam/modules/modules.inc >> >> b/lib/libpam/modules/modules.inc >> >> index f3ab65333f4f..ddbb326f0312 100644 >> >> --- a/lib/libpam/modules/modules.inc >> >> +++ b/lib/libpam/modules/modules.inc >> >> @@ -30,4 +30,3 @@ MODULES += pam_ssh >> >> .endif >> >> MODULES+= pam_tacplus >> >> MODULES+= pam_unix >> >> -MODULES+= pam_xdg >> >> \ No newline at end of file >> >> >> >> 1: >> >> https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292 >> >> >> >> Thanks. >> >> >> > >> > I don't see why you can't build the ports. >> >> From sysutils/pam_xdg[2]: >> >> if exists(/usr/lib/pam_xdg.so) >> IGNORE= module name conflict with a different implementation in >> base system >> endif > > Ah yes, I've missed this :) > >> > Using would be a problem but why do you want to use it now that we >> > have one in base ? >> > Do you have any problems with the one in base ? >> >> I would like to continue using sysutils/pam_xdg because it handles all >> ${XDG_*_HOME}, and local name spaces. 
> > XDG_*_HOME variables aren't needed, all applications must have a > fallback to the base directories in the spec and sysutils/pam_xdg > doesn't offer to use other directories so that's why I didn't implement > those in the base one. > What do you mean by "local name spaces" ? I meant all the other ${XDG_FU} excluding ${XDG_*_HOME}. Anyway, it turns out I was badly mistaken. I deployed another corporate craptop from the dumpster today, and the user's home directory was not populated with XDG dirs. I was sure I was using sysutils/pam_xdg for that, but I will now have to find my older scripts, which predate my use of sysutils/pam_xdg, to achieve that. Sorry for the noise. Thanks, Alastair
Re: sysutils/pam_xdg: Cancelled on -CURRENT
On Tue, 19 Mar 2024 07:55:15 + Alastair Hogge wrote: > On 2024-03-19 15:23, Emmanuel Vadot wrote: > > Hi, > > Hey Emmanuel, > > > On Tue, 19 Mar 2024 06:54:27 + > > Alastair Hogge wrote: > > > >> Hello, > >> > >> Recently a similar module (PAM) mentioned in the subject was committed > >> to base[1]. The module in base masks the currently installed Port, the > >> man page can be accessed with man -M /usr/local/share/man 8 pam_xdg, > >> however, I can now no longer build the Port. I noticed that the base > >> module has no WITHOUT_ option, tho, that might be extreme for one > >> module, but then again, the base module masks a more feature full > >> module. What is the practice to enable use of the Port again? At the > >> moment I am updating my host, and testing the following: > >> > >> diff --git a/lib/libpam/modules/modules.inc > >> b/lib/libpam/modules/modules.inc > >> index f3ab65333f4f..ddbb326f0312 100644 > >> --- a/lib/libpam/modules/modules.inc > >> +++ b/lib/libpam/modules/modules.inc > >> @@ -30,4 +30,3 @@ MODULES += pam_ssh > >> .endif > >> MODULES+= pam_tacplus > >> MODULES+= pam_unix > >> -MODULES+= pam_xdg > >> \ No newline at end of file > >> > >> 1: > >> https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292 > >> > >> Thanks. > >> > > > > I don't see why you can't build the ports. > > From sysutils/pam_xdg[2]: > > if exists(/usr/lib/pam_xdg.so) > IGNORE= module name conflict with a different implementation in > base system > endif Ah yes, I've missed this :) > > Using would be a problem but why do you want to use it now that we > > have one in base ? > > Do you have any problems with the one in base ? > > I would like to continue using sysutils/pam_xdg because it handles all > ${XDG_*_HOME}, and local name spaces. 
XDG_*_HOME variables aren't needed, all applications must have a fallback to the base directories in the spec and sysutils/pam_xdg doesn't offer to use other directories so that's why I didn't implement those in the base one. What do you mean by "local name spaces" ? > 2: https://cgit.freebsd.org/ports/tree/sysutils/pam_xdg/Makefile#n16 > > Thanks. > Cheers, -- Emmanuel Vadot
Re: sysutils/pam_xdg: Cancelled on -CURRENT
On 2024-03-19 15:23, Emmanuel Vadot wrote: > Hi, Hey Emmanuel, > On Tue, 19 Mar 2024 06:54:27 + > Alastair Hogge wrote: > >> Hello, >> >> Recently a similar module (PAM) mentioned in the subject was committed >> to base[1]. The module in base masks the currently installed Port, the >> man page can be accessed with man -M /usr/local/share/man 8 pam_xdg, >> however, I can now no longer build the Port. I noticed that the base >> module has no WITHOUT_ option, tho, that might be extreme for one >> module, but then again, the base module masks a more feature full >> module. What is the practice to enable use of the Port again? At the >> moment I am updating my host, and testing the following: >> >> diff --git a/lib/libpam/modules/modules.inc >> b/lib/libpam/modules/modules.inc >> index f3ab65333f4f..ddbb326f0312 100644 >> --- a/lib/libpam/modules/modules.inc >> +++ b/lib/libpam/modules/modules.inc >> @@ -30,4 +30,3 @@ MODULES += pam_ssh >> .endif >> MODULES+= pam_tacplus >> MODULES+= pam_unix >> -MODULES+= pam_xdg >> \ No newline at end of file >> >> 1: >> https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292 >> >> Thanks. >> > > I don't see why you can't build the ports. From sysutils/pam_xdg[2]: if exists(/usr/lib/pam_xdg.so) IGNORE= module name conflict with a different implementation in base system endif > Using would be a problem but why do you want to use it now that we > have one in base ? > Do you have any problems with the one in base ? I would like to continue using sysutils/pam_xdg because it handles all ${XDG_*_HOME}, and local name spaces. 2: https://cgit.freebsd.org/ports/tree/sysutils/pam_xdg/Makefile#n16 Thanks.
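The IGNORE condition quoted from the port's Makefile depends only on whether /usr/lib/pam_xdg.so exists, so the same test can be made from the shell before attempting a build. A minimal sketch; the helper name and its optional ROOT argument (for checking an alternate root such as a jail or boot environment) are hypothetical.

```sh
# base_pam_xdg_present [ROOT]: succeed if the base system under ROOT
# (default: the running system) ships its own pam_xdg.so.
base_pam_xdg_present() {
    [ -e "${1:-}/usr/lib/pam_xdg.so" ]
}

if base_pam_xdg_present; then
    echo "base pam_xdg.so found: sysutils/pam_xdg will be marked IGNORE"
else
    echo "no base pam_xdg.so: the port's IGNORE condition does not apply"
fi
```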
Re: sysutils/pam_xdg: Cancelled on -CURRENT
Hi, On Tue, 19 Mar 2024 06:54:27 + Alastair Hogge wrote: > Hello, > > Recently a similar module (PAM) mentioned in the subject was committed > to base[1]. The module in base masks the currently installed Port, the > man page can be accessed with man -M /usr/local/share/man 8 pam_xdg, > however, I can now no longer build the Port. I noticed that the base > module has no WITHOUT_ option, tho, that might be extreme for one > module, but then again, the base module masks a more feature full > module. What is the practice to enable use of the Port again? At the > moment I am updating my host, and testing the following: > > diff --git a/lib/libpam/modules/modules.inc > b/lib/libpam/modules/modules.inc > index f3ab65333f4f..ddbb326f0312 100644 > --- a/lib/libpam/modules/modules.inc > +++ b/lib/libpam/modules/modules.inc > @@ -30,4 +30,3 @@ MODULES += pam_ssh > .endif > MODULES+= pam_tacplus > MODULES+= pam_unix > -MODULES+= pam_xdg > \ No newline at end of file > > 1: > https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292 > > Thanks. > I don't see why you can't build the ports. Using would be a problem but why do you want to use it now that we have one in base ? Do you have any problems with the one in base ? Cheers, -- Emmanuel Vadot
sysutils/pam_xdg: Cancelled on -CURRENT
Hello, Recently a similar module (PAM) mentioned in the subject was committed to base[1]. The module in base masks the currently installed Port, the man page can be accessed with man -M /usr/local/share/man 8 pam_xdg, however, I can now no longer build the Port. I noticed that the base module has no WITHOUT_ option, tho, that might be extreme for one module, but then again, the base module masks a more feature full module. What is the practice to enable use of the Port again? At the moment I am updating my host, and testing the following: diff --git a/lib/libpam/modules/modules.inc b/lib/libpam/modules/modules.inc index f3ab65333f4f..ddbb326f0312 100644 --- a/lib/libpam/modules/modules.inc +++ b/lib/libpam/modules/modules.inc @@ -30,4 +30,3 @@ MODULES += pam_ssh .endif MODULES+= pam_tacplus MODULES+= pam_unix -MODULES+= pam_xdg \ No newline at end of file 1: https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292 Thanks.
Re: Unable to boot -CURRENT on Thinkpad P16s G2
On Thu, Mar 7, 2024 at 4:50 PM Doug Ambrisko wrote: > On Thu, Mar 07, 2024 at 07:15:48PM +0100, Philipp Ost wrote: > | On 2/28/24 21:10, Philipp Ost wrote: > | [boot log stripped] > | > Does anyone have any suggestions on how to proceed at this point? [..] > | > | Short follow-up: disabling uart0 and uart1 at the loader prompt allowed > us > | to boot and install FreeBSD (the -CURRENT snapshot from 2024-02-29 in > case > | it matters). > > UARTS on AMD can be a bit different. Some BIOS implementations seem > to set them up to work like legacy ports others do not. On a Naples > platform I helped add support for them since they were not setup > in the legacy configuration. The AMD servers I'm using now have them > setup in legacy mode and just work like on other systems. > > If I remember right those UARTS were defined in ACPI. On a laptop they > probably don't have serial ports and the probe is getting stuck on > something. It might be good to instrument it to see what. > It might also be time to finally drop the UART fallback when ACPI is present. I've seen spotty reports of accessing these registers (for uart, kbd and maybe mouse) causing problems. The ACPI definition of the UARTs would be additional uart units. The fallback stuff is needed only for extremely edge cases at this point. Warner
Re: Unable to boot -CURRENT on Thinkpad P16s G2
On Thu, Mar 07, 2024 at 07:15:48PM +0100, Philipp Ost wrote: | On 2/28/24 21:10, Philipp Ost wrote: | [boot log stripped] | > Does anyone have any suggestions on how to proceed at this point? [...] | | Short follow-up: disabling uart0 and uart1 at the loader prompt allowed us | to boot and install FreeBSD (the -CURRENT snapshot from 2024-02-29 in case | it matters). UARTS on AMD can be a bit different. Some BIOS implementations seem to set them up to work like legacy ports others do not. On a Naples platform I helped add support for them since they were not setup in the legacy configuration. The AMD servers I'm using now have them setup in legacy mode and just work like on other systems. If I remember right those UARTS were defined in ACPI. On a laptop they probably don't have serial ports and the probe is getting stuck on something. It might be good to instrument it to see what. Thanks, Doug A.
Re: Unable to boot -CURRENT on Thinkpad P16s G2
On 2/28/24 21:10, Philipp Ost wrote: [boot log stripped] Does anyone have any suggestions on how to proceed at this point? [...] Short follow-up: disabling uart0 and uart1 at the loader prompt allowed us to boot and install FreeBSD (the -CURRENT snapshot from 2024-02-29 in case it matters). Best Philipp
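The message does not say exactly how the UARTs were disabled. One common way, shown here as an assumption, is to set device hints at the loader prompt (`set hint.uart.0.disabled=1`, `set hint.uart.1.disabled=1`, then `boot`) and to make them persistent after installation. The sketch below appends the hints to a loader configuration file only if they are not already there; the helper name is hypothetical.

```sh
# disable_uart_hints [FILE]: append hints that disable uart0 and uart1 to FILE
# (default /boot/loader.conf) unless the exact line is already present.
disable_uart_hints() {
    conf=${1:-/boot/loader.conf}
    for unit in 0 1; do
        line="hint.uart.${unit}.disabled=\"1\""
        grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
    done
}

# Usage (as root, on the installed system):
#   disable_uart_hints
```

Running it twice leaves the file unchanged the second time, so it is safe to call from a post-install script.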
Unable to boot -CURRENT on Thinkpad P16s G2
Hi everyone, we have a Lenovo Thinkpad P16s Gen2 (Type 21K9) here with an AMD Ryzen 7 CPU which we would like to install FreeBSD on. Alas, it won't boot. The FreeBSD 15-CURRENT amd64 snapshot from February 22 hangs after: [...] acpi_acad0: on acpi0 battery0: on acpi0 A verbose boot provides some more information, but then hangs as well: [...] acpi_acad0: on acpi0 AcpiOsExecute: task queue not started battery0: on acpi0 AcpiOsExecute: task queue not started ACPI: Enabled 1 GPEs in block 00 to 1F ahc_isa_identify 0: ioport 0xc00 alloc failed ahc_isa_identify 1: ioport 0x1c00 alloc failed ahc_isa_identify 2: ioport 0x2c00 alloc failed ahc_isa_identify 3: ioport 0x3c00 alloc failed ahc_isa_identify 4: ioport 0x4c00 alloc failed ahc_isa_identify 5: ioport 0x5c00 alloc failed ahc_isa_identify 6: ioport 0x6c00 alloc failed isa_probe_children: disabling PnP devices atkbdc: atkbdc0 already exists; skipping it atrtc: atrtc0 already exists; skipping it attimer: attimer0 already exists; skipping it sc: sc0 already exists; skipping it isa_probe_children: probing non-PnP devices sc0 failed to probe on isa0 vga0 failed to probe on isa0 fdc0 failed to probe at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 ppc0: cannot reserve I/O port range ppc0 failed to probe at irq 7 on isa0 uart0 failed to probe at port 0x3f8 irq 4 on isa0 Booting with ACPI disabled results in a kernel panic. An older snapshot (-CURRENT from 15. February) as well as 14-STABLE/-RELEASE fail to boot in the same manner. Does anyone have any suggestions on how to proceed at this point? We really would like to get FreeBSD running on this machine and are willing to help. As neither of us is versed in low-level system programming, we would require some support. Currently, the notebook is running Lubuntu 22.04 from which we gathered some information about the hardware. 
Everything we collected so far can be found here: https://philippost.de/TP-P16s-G2/ If there is anything missing, please indicate which information you require. Thanks a lot in advance! Best Philipp and Klaus
Re: Missing files on -current
On Sat, Feb 24, 2024 at 03:59:01PM +, Gary Jennejohn wrote: > > The function run_rc_scripts is defined in /usr/src/libexec/rc/rc.subr and > is called in /usr/src/libexec/rc/rc. /etc/rc includes /etc/rc.subr. > > So, maybe one of these files is not up to date under /etc? > My fault, etcupdate reported a conflict and I didn't notice it. Sorry for the noise! bob prohaska
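For anyone who hits the same symptom, a quick check is whether the installed /etc/rc.subr actually defines the function that /etc/rc calls, followed by a look at pending etcupdate conflicts. A minimal sketch; the helper name is hypothetical, and it assumes the function is defined at the start of a line as `name()`.

```sh
# defines_function FILE NAME: succeed if FILE contains a shell function
# definition for NAME that starts at the beginning of a line.
defines_function() {
    grep -q "^$2()" "$1"
}

# On the affected system one would run, for example:
#   defines_function /etc/rc.subr run_rc_scripts || echo "stale /etc/rc.subr"
#   etcupdate status     # show files with unresolved conflicts
#   etcupdate resolve    # step through and resolve them
```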
Re: Missing files on -current
On Sat, 24 Feb 2024 06:53:37 -0800 bob prohaska wrote: > A Pi4 running -current completed a build/install cycle for world and kernel > without obvious errors but failed to reboot, reporting: > ... > Warning: no time-of-day clock registered, system time will not be set > accurately > Dual Console: Serial Primary, Video Secondary > /etc/rc: run_rc_scripts: not found > /etc/rc: run_rc_scripts: not found > /etc/rc: have: not found > > Sat Feb 24 13:42:09 UTC 2024 > 2024-02-24T13:42:10.007616+00:00 - init 31 - - can't exec getty > '/usr/libexec/getty' for port /dev/ttyv1: No such file or directory > ... > > Uname -a reports: > FreeBSD 15.0-CURRENT FreeBSD 15.0-CURRENT #121 main-n268499-b9870ba93ea9: > Fri Feb 23 23:14:59 PST 2024 > b...@nemesis.zefox.com:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC > arm64distribution. > > Power cycling allowed boot to single-user, running fsck -fy reports a clean > root file system. > > /etc/fstab contains > /dev/da0s2a / ufs rw 1 1 > /dev/da0s1 /boot/msdos msdosfs rw,noatime 0 0 > #tmpfs /tmp tmpfs rw,mode=1777,size=50m 0 0 > /dev/da0s2d /usr ufs rw 2 2 > /dev/da0s2b none swap sw > > There does not seem to be a file named run_rc_scripts present > in the filesystem. > > Any suggestions on how to back myself out of this corner > would be much appreciated! > > Thanks for reading, > The function run_rc_scripts is defined in /usr/src/libexec/rc/rc.subr and is called in /usr/src/libexec/rc/rc. /etc/rc includes /etc/rc.subr. So, maybe one of these files is not up to date under /etc? -- Gary Jennejohn
Re: Missing files on -current
On Sat, Feb 24, 2024 at 07:02:19AM -0800, David Wolfskill wrote: > > This is from an amd64 system at main-n268514-61b88a230bac, but > run_rc_scripts is a shell function defined in /etc/rc.subr. > > So the whine about not finding run_rc_scripts would indicate that at > least one of the following is true: > > * The script that should have sourced /etc/rc.subr failed to do so. > > * /etc/rc.csubr is corrupted, and fails to define run_rc_scripts(). > Indeed, it seems to be absent: root@:~ # more /etc/rc.csubr /etc/rc.csubr: No such file or directory root@:~ # However, the same is true of a Pi3 running 14-release p5. It boots reliably once it reaches loader. I wouldn't expect this part of the boot process to be platform dependent. Maybe -current and -release do things differently? > * /etc/rc.subr is missing. Present and accounted for: root@:~ # ls -l /etc/rc.subr -rw-r--r-- 1 root wheel 51911 Nov 18 21:46 /etc/rc.subr root@:~ # Thanks for writing! bob prohaska
Missing files on -current
A Pi4 running -current completed a build/install cycle for world and kernel without obvious errors but failed to reboot, reporting: ... Warning: no time-of-day clock registered, system time will not be set accurately Dual Console: Serial Primary, Video Secondary /etc/rc: run_rc_scripts: not found /etc/rc: run_rc_scripts: not found /etc/rc: have: not found Sat Feb 24 13:42:09 UTC 2024 2024-02-24T13:42:10.007616+00:00 - init 31 - - can't exec getty '/usr/libexec/getty' for port /dev/ttyv1: No such file or directory ... Uname -a reports: FreeBSD 15.0-CURRENT FreeBSD 15.0-CURRENT #121 main-n268499-b9870ba93ea9: Fri Feb 23 23:14:59 PST 2024 b...@nemesis.zefox.com:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64distribution. Power cycling allowed boot to single-user, running fsck -fy reports a clean root file system. /etc/fstab contains /dev/da0s2a / ufs rw 1 1 /dev/da0s1 /boot/msdos msdosfs rw,noatime 0 0 #tmpfs /tmp tmpfs rw,mode=1777,size=50m 0 0 /dev/da0s2d /usr ufs rw 2 2 /dev/da0s2b none swap sw There does not seem to be a file named run_rc_scripts present in the filesystem. Any suggestions on how to back myself out of this corner would be much appreciated! Thanks for reading, bob prohaska
Re: FreeBSD CURRENT stabilization cycle
Hi, And whom do you want to „stab“ with this? ;) Why not do the same thing that ports does and call this „monthly“ which is pretty much what it is and easy to understand and you can have one build at the end of that week? Cheers, Franco > On 24. Feb 2024, at 12:51, Kirill Ponomarev wrote: > > On 02/23, Mark Millard wrote: >> Gleb Smirnoff wrote on >> Date: Sat, 24 Feb 2024 04:32:52 UTC : >> >>> More seriously speaking, I >>> actually hope that in some future snapshots.FreeBSD.org will start using >>> these >>> points for snapshot generation. >> >> How about also the likes of: >> >> https://pkg.freebsd.org/FreeBSD:15:aarch64/stabweek/ >> >> for pkgbase (various "aarch64" replacements too)? > > yes, great idea, base_stabweek or similar is something I'd vote for.
Re: FreeBSD CURRENT stabilization cycle
On 02/23, Mark Millard wrote: > Gleb Smirnoff wrote on > Date: Sat, 24 Feb 2024 04:32:52 UTC : > > > More seriously speaking, I > > actually hope that in some future snapshots.FreeBSD.org will start using > > these > > points for snapshot generation. > > How about also the likes of: > > https://pkg.freebsd.org/FreeBSD:15:aarch64/stabweek/ > > for pkgbase (various "aarch64" replacements too)? yes, great idea, base_stabweek or similar is something I'd vote for.
RE: FreeBSD CURRENT stabilization cycle
Gleb Smirnoff wrote on Date: Sat, 24 Feb 2024 04:32:52 UTC : > More seriously speaking, I > actually hope that in some future snapshots.FreeBSD.org will start using these > points for snapshot generation. How about also the likes of: https://pkg.freebsd.org/FreeBSD:15:aarch64/stabweek/ for pkgbase (various "aarch64" replacements too)? === Mark Millard marklmi at yahoo.com
FreeBSD CURRENT stabilization cycle
Hi FreeBSD CURRENT users,

back in November I came up with a proposal of providing some stabilization cadence to development of the main branch, also known as FreeBSD CURRENT. Here is a video with the initial proposal and following discussion at VendorBSD Conference (18 minutes): https://www.youtube.com/live/k-AzShVdAHo?si=hPAhCd_-RuoTRqcW=2511

And here goes an up to date version of the plan!

In the last decade the quality of FreeBSD CURRENT has improved so much that not only brave developers run it on their laptops, but large companies use it too. Time to bring it to a new level.

Every individual or business that uses CURRENT has their own protocol for staying up to date and avoiding disasters. An individual will first update their desktop and only after that will update server(s). A company would run their internal regression test suite or some other validation protocol. Right now we all do that independently from each other, with little coordination and little help for each other. We also do not broadcast to the world that FreeBSD CURRENT is usable. I've seen a lot of people who stay away from CURRENT based on their 20-year-old experience with it.

Here is how we are going to improve:

* Last week of a month is declared a stabilization week

Src committers are encouraged to avoid pushing risky changes to FreeBSD/main during this week. This is advice, not a policy! If a committer breaks something during the week they get 3x public shame, but no administrative penalties or fines. Committers are encouraged to push bug fixes, improve unit tests, clean up comments and improve documentation. It is also a good time to merge past work to stable branches. Developers of course will continue their work on bigger projects in their private branches.

Sidenote: there is no agreement in the world on what "the last week of a month" is. For our purposes we will use the week that contains the last Friday of the month.
Because we want the monthly snapshot to be called by the name of the month (not next month) and thus we want the last day of the stabilization cycle always to be in that month. * Monday of the stabweek is the day to update your CURRENT and test it Monday 8:00 GMT a tag is created and published. Right now it is published at my personal https://github.com/glebius/FreeBSD/tags. Note that the tag points at a hash in the official repo, so there is no trust involved here. At Netflix I will be working on merging the tagged revision into our tree and I will hand off the resulting branch to our excellent testing team (dhw@ + olivier@) usually by the end of Monday (PST time). Other companies and parties are encouraged to start testing the tagged revision. Peter Holm may switch his stress2 to run that revision. You are encouraged to update your desktop or laptop that of course runs FreeBSD CURRENT. * A short lived stabilization branch may be created In case we discover regressions compared to the previous month stabweek, bug fixes to them will be committed to a short lived branch. This branch may contain direct cherry-picks from main, as well as work-in-progress bugfixes that had not yet been committed to main, reverts of commits and even stop gaps that disable certain functionality for the sake of stability. This branch may be rebased and force pushed if a temporary bugfix appears different to a final one in main. The branch may observe commits immediately Monday morning in case we already know about a certain regression. The branch will not observe commits to a long standing bugs that were fixed in main during the stabweek, unless somebody explicitly asks to include one. And finally, the branch may not even be created in case testing confirms everything is alright with the Monday tag. The branch will be published at https://github.com/glebius/FreeBSD. There is certain level of trust required to use it. That may change to a more official publishing point in the future. 
* The stabweek quiet period ends no later than Friday 18:00 GMT No matter if we were able to identify and fix any or all bugs the quiet period ends. The public shame level for src committers breaking FreeBSD CURRENT goes back to normal level. In a case we were not able to address all issues by end of Friday the stabweek branch will be active past the end of the stabweek, as we want to collect all regression fixes in the branch. But this is the worst case scenario! A more appreciated scenario is that the stabilization period ends earlier in the week. If all testing parties report their satisfaction with state of main as is or of the stabweek branch and if I don't see any fresh bug reports in bugzilla or submissions via other channels, there is no reason to withheld committers with pushing their stuff. At the end of the stabilization period be it Friday or earlier I will write email to current@ reporting the results: - were there any regression identified with the Monday tag - what has been a
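The calendar rule in the proposal above (stabweek is the week containing the last Friday of the month, the tag is cut Monday 8:00 GMT, and the quiet period ends no later than Friday 18:00 GMT) can be sketched in a few lines. This is only an illustration of the stated rule; the function names are made up for the example and are not part of any FreeBSD tooling.

```python
import calendar
from datetime import date, datetime, time, timedelta, timezone


def last_friday(year, month):
    """Return the date of the last Friday in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # date.weekday(): Monday is 0, Friday is 4. Step back to the nearest Friday.
    return d - timedelta(days=(d.weekday() - 4) % 7)


def stabweek(year, month):
    """Return (tag time, latest end of quiet period) for that month's stabweek.

    Both values are timezone-aware datetimes in UTC, following the
    Monday 8:00 GMT and Friday 18:00 GMT times given in the proposal.
    """
    friday = last_friday(year, month)
    monday = friday - timedelta(days=4)
    tag_time = datetime.combine(monday, time(8, 0), tzinfo=timezone.utc)
    quiet_end = datetime.combine(friday, time(18, 0), tzinfo=timezone.utc)
    return tag_time, quiet_end


if __name__ == "__main__":
    tag_time, quiet_end = stabweek(2024, 2)
    print("tag created:", tag_time.isoformat())
    print("quiet period ends by:", quiet_end.isoformat())
```

Because the last Friday of a month never falls before the 22nd, the Monday of that week is always in the same month, which matches the stated goal of naming the snapshot after the month.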
Re: nvme controller reset failures on recent -CURRENT
Hi all,

> On 13.02.2024 at 20:56, Pete Wright wrote:
> 1. M.2 nvme really does need proper cooling, much more so than traditional
> SATA/SAS/SCSI drives.

I recently found a tool named "Scrutiny" that presents a nice dashboard of all your disk devices and their SMART data, including crucial points like temperature.

Pros:
- Open source
- Nice web UI
- Uses smartmontools to gather the data, not reinventing the wheel
- Agents that can be called from cron jobs for many OSes including FreeBSD
- Alerting via a variety of communication channels

Cons:
- Central hub best run on Linux plus docker compose
- No authentication whatsoever, so strictly internal use
- No grouping or any organisation of systems, so does not scale beyond tens of servers

I found a couple of problematic HDDs and SSDs right after deploying it which regular SMART tests overlooked.

https://github.com/AnalogJ/scrutiny

Look for the Hub/Spoke deployment if you are willing to use e.g. a Linux VM to run the tool, then point your FreeBSD systems at that. It probably can be deployed strictly on FreeBSD, too, using the manual installation instructions.

HTH, kind regards,
Patrick
Re: nvme controller reset failures on recent -CURRENT
I had issues with a nvme drive in an intel nuc. When I asked freebsd-hackers, overheating was the first guess: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-May/052783.html I blew the dust out of the fan assembly and changed the bios fan settings to be more aggressive and the system has been rock solid since. Craig
Re: nvme controller reset failures on recent -CURRENT
> > There's a tiny chance that this could be something more exotic, but my
> > money is on hardware gone bad after 2 years of service. I don't think this
> > is 'wear out' of the NAND (it's only 15TB written, but it could be if this
> > drive is really really crappy nand: first generation QLC maybe, but it
> > seems too new). It might also be a connector problem that's developed over
> > time. There might be a few other things too, but I don't think this is a
> > U.2 drive with funky cables.
>
> The system was probably idle the majority of those two years of power on
> time.
>
> It's one of these:
> https://www.techpowerup.com/ssd-specs/intel-660p-512-gb.d437
>
> I've seen comments that these generally don't need cooling.
>
> I just ordered a heatsink with some nice big fins, but it will take a week
> or more to arrive.

just wanted to add another data-point to this discussion. i had a crucial NVME drive on my workstation that recently was showing similar problems. after much debugging i came to the same conclusion that it was getting too hot. i went ahead and purchased a Sabrent NVME drive that came with a heat sink. i've also started making much more use of my workstation (and the disk subsystem) and have had zero issues.

so lessons learnt:
1. M.2 nvme really does need proper cooling, much more so than traditional SATA/SAS/SCSI drives.
2. not all vendors do a great job reporting the health of devices

-pete

--
Pete Wright
p...@nomadlogic.org
Re: nvme controller reset failures on recent -CURRENT
On 12 Feb, Warner Losh wrote: > On Mon, Feb 12, 2024 at 9:15 PM Don Lewis wrote: > >> On 12 Feb, Maxim Sobolev wrote: >> > Might be an overheating. Today's nvme drives are notoriously flaky if you >> > run them without proper heat sink attached to it. >> >> I don't think it is a thermal problem. According to the drive health >> page, the device temperature has never reached Temperature 2, whatever >> that is. The room temperature is around 65F. The system was stable >> last summer when the room temperature spent a lot of time in the 80-85F >> range. The device temperature depends a lot on the I/O rate, and the >> last panic happened when the I/O rate had been below 40tps for quite a >> while. >> > > It did reach temperature 1, though. That's the 'Warning this drive is too > hot' temperature. It has spent 41213 minutes of your 19297 hours of up > time, or an average of 2 minutes per hour. That's too much. Temperature > 2 is critical error: we are about to shut down completely due to it > being too hot. It's only a couple degrees below hardware power off > due to temperature in many drives. Some really cheap ones don't really > implement it at all. On my card with the bad heat sink, Warning temp is > 70C while critical is 75C while IIRC thermal shutdown is 78C or 80C. > > I don't think we report these values in nvmecontrol identify. But you can > do a raw dump with -x look at bytes 266:267 for warning and 268:269 > for critical. > > In contrast, the few dozen drives that I have, all of which have been > abused in various ways, And only one of them has any heat issues, > and that one is an engineering special / sample with what I think is > a damaged heat sink. If your card has no heat sink, this could well > be what's going on. > > This panic means "the nvme card lost its mind and stopped talking > to the host". Its status registers read 0xff's, which means that the card > isn't decoding bus signals. 
Usually this means that the firmware on the > card has faulted and rebooted. If the card is overheating, then this could > well be what's happening. > > There's a tiny chance that this could be something more exotic, > but my money is on hardware gone bad after 2 years of service. I don't think > this is 'wear out' of the NAND (it's only 15TB written, but it could be if > this > drive is really really crappy nand: first generation QLC maybe, but it seems > too new). It might also be a connector problem that's developed over time. > There might be a few other things too, but I don't think this is a U.2 drive > with funky cables. The system was probably idle the majority of those two years of power on time. It's one of these: https://www.techpowerup.com/ssd-specs/intel-660p-512-gb.d437 I've seen comments that these generally don't need cooling. I just ordered a heatsink with some nice big fins, but it will take a week or more to arrive. > >> > On Mon, Feb 12, 2024, 4:28 PM Don Lewis wrote: >> > >> >> I just upgraded my package build machine to: >> >> FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e >> >> from: >> >> FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 >> >> and I've had two nvme-triggered panics in the last day. >> >> >> >> nvme is being used for swap and L2ARC. I'm not able to get a crash >> >> dump, probably because the nvme device has gone away and I get an error >> >> about not having a dump device. It looks like a low-memory panic >> >> because free memory is low and zfs is calling malloc(). >> >> >> >> This shows up in the log leading up to the panic: >> >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a >> >> timeout a >> >> nd possible hot unplug. >> >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times >> >> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller >> >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a >> >> timeout a >> >> nd possible hot unplug. 
>> >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times >> >> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete >> >> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times >> >> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o >> >> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping >> watchdog >> >> ti >> >> meout. >> >> >> >> The device looks healthy to me: >> >> SMART/Health Information Log >> >> ===
Re: nvme controller reset failures on recent -CURRENT
On Mon, Feb 12, 2024 at 9:15 PM Don Lewis wrote:

> On 12 Feb, Maxim Sobolev wrote:
> > Might be an overheating. Today's nvme drives are notoriously flaky if you
> > run them without proper heat sink attached to it.
>
> I don't think it is a thermal problem. According to the drive health
> page, the device temperature has never reached Temperature 2, whatever
> that is. The room temperature is around 65F. The system was stable
> last summer when the room temperature spent a lot of time in the 80-85F
> range. The device temperature depends a lot on the I/O rate, and the
> last panic happened when the I/O rate had been below 40tps for quite a
> while.

It did reach temperature 1, though. That's the 'Warning: this drive is too hot' temperature. It has spent 41213 minutes of your 19297 hours of uptime there, or an average of 2 minutes per hour. That's too much. Temperature 2 is the critical error: we are about to shut down completely due to it being too hot. It's only a couple of degrees below hardware power-off due to temperature in many drives. Some really cheap ones don't really implement it at all. On my card with the bad heat sink, the warning temp is 70C and critical is 75C, while IIRC thermal shutdown is 78C or 80C.

I don't think we report these values in nvmecontrol identify. But you can do a raw dump with -x and look at bytes 266:267 for warning and 268:269 for critical.

In contrast, I have a few dozen drives, all of which have been abused in various ways, and only one of them has any heat issues, and that one is an engineering special / sample with what I think is a damaged heat sink. If your card has no heat sink, this could well be what's going on.

This panic means "the nvme card lost its mind and stopped talking to the host". Its status registers read 0xff's, which means that the card isn't decoding bus signals. Usually this means that the firmware on the card has faulted and rebooted. If the card is overheating, then this could well be what's happening.
There's a tiny chance that this could be something more exotic, but my money is on hardware gone bad after 2 years of service. I don't think this is 'wear out' of the NAND (it's only 15TB written, but it could be if this drive is really really crappy nand: first generation QLC maybe, but it seems too new). It might also be a connector problem that's developed over time. There might be a few other things too, but I don't think this is a U.2 drive with funky cables. Warner > > On Mon, Feb 12, 2024, 4:28 PM Don Lewis wrote: > > > >> I just upgraded my package build machine to: > >> FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e > >> from: > >> FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 > >> and I've had two nvme-triggered panics in the last day. > >> > >> nvme is being used for swap and L2ARC. I'm not able to get a crash > >> dump, probably because the nvme device has gone away and I get an error > >> about not having a dump device. It looks like a low-memory panic > >> because free memory is low and zfs is calling malloc(). > >> > >> This shows up in the log leading up to the panic: > >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a > >> timeout a > >> nd possible hot unplug. > >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times > >> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller > >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a > >> timeout a > >> nd possible hot unplug. > >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times > >> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete > >> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times > >> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o > >> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping > watchdog > >> ti > >> meout. 
> >> > >> The device looks healthy to me: > >> SMART/Health Information Log > >> > >> Critical Warning State: 0x00 > >> Available spare: 0 > >> Temperature: 0 > >> Device reliability:0 > >> Read only: 0 > >> Volatile memory backup:0 > >> Temperature:312 K, 38.85 C, 101.93 F > >> Available spare:100 > >> Available spare threshold: 10 > >> Percentage used:3 > >> Data units (512,000 byte) read: 5761183 > >> Data units written: 29911502 > >> Host read commands: 471921188 >
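The byte offsets mentioned in this message (266:267 for the warning threshold, 268:269 for the critical one) can be decoded from a raw identify-controller buffer as sketched below. Two things here are assumptions rather than statements from the thread: that each field is a little-endian 16-bit value, and that it is expressed in Kelvin, as the health log's own temperature line is. The function takes raw bytes; turning the text output of a hex dump into bytes is deliberately left out, since its exact layout is not shown in the thread.

```python
import struct

# Offsets quoted in the message above; the names are illustrative only.
WARNING_TEMP_OFFSET = 266   # bytes 266:267
CRITICAL_TEMP_OFFSET = 268  # bytes 268:269


def composite_temp_thresholds(identify_data):
    """Return (warning, critical) thresholds from raw identify data.

    Assumes little-endian 16-bit fields holding Kelvin values.
    """
    if len(identify_data) < CRITICAL_TEMP_OFFSET + 2:
        raise ValueError("identify data too short to hold both thresholds")
    (warning_k,) = struct.unpack_from("<H", identify_data, WARNING_TEMP_OFFSET)
    (critical_k,) = struct.unpack_from("<H", identify_data, CRITICAL_TEMP_OFFSET)
    return warning_k, critical_k


def kelvin_to_celsius(kelvin):
    """Convert Kelvin to degrees Celsius."""
    return kelvin - 273.15


if __name__ == "__main__":
    # Synthetic buffer standing in for a real dump: 343 K and 348 K are
    # roughly the 70C / 75C thresholds described for the card above.
    buf = bytearray(4096)
    struct.pack_into("<HH", buf, WARNING_TEMP_OFFSET, 343, 348)
    for label, k in zip(("warning", "critical"), composite_temp_thresholds(bytes(buf))):
        print(f"{label}: {k} K = {kelvin_to_celsius(k):.2f} C")
```

A value of 0 in either field would need separate handling, since some drives may not fill these thresholds in at all.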
Re: nvme controller reset failures on recent -CURRENT
On 12 Feb, Maxim Sobolev wrote: > Might be an overheating. Today's nvme drives are notoriously flaky if you > run them without proper heat sink attached to it. I don't think it is a thermal problem. According to the drive health page, the device temperature has never reached Temperature 2, whatever that is. The room temperature is around 65F. The system was stable last summer when the room temperature spent a lot of time in the 80-85F range. The device temperature depends a lot on the I/O rate, and the last panic happened when the I/O rate had been below 40tps for quite a while. > On Mon, Feb 12, 2024, 4:28 PM Don Lewis wrote: > >> I just upgraded my package build machine to: >> FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e >> from: >> FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 >> and I've had two nvme-triggered panics in the last day. >> >> nvme is being used for swap and L2ARC. I'm not able to get a crash >> dump, probably because the nvme device has gone away and I get an error >> about not having a dump device. It looks like a low-memory panic >> because free memory is low and zfs is calling malloc(). >> >> This shows up in the log leading up to the panic: >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a >> timeout a >> nd possible hot unplug. >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times >> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a >> timeout a >> nd possible hot unplug. >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times >> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete >> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times >> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o >> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog >> ti >> meout. 
>> >> The device looks healthy to me: >> SMART/Health Information Log >> >> Critical Warning State: 0x00 >> Available spare: 0 >> Temperature: 0 >> Device reliability:0 >> Read only: 0 >> Volatile memory backup:0 >> Temperature:312 K, 38.85 C, 101.93 F >> Available spare:100 >> Available spare threshold: 10 >> Percentage used:3 >> Data units (512,000 byte) read: 5761183 >> Data units written: 29911502 >> Host read commands: 471921188 >> Host write commands:605394753 >> Controller busy time (minutes): 32359 >> Power cycles: 110 >> Power on hours: 19297 >> Unsafe shutdowns: 14 >> Media errors: 0 >> No. error info log entries: 0 >> Warning Temp Composite Time:0 >> Error Temp Composite Time: 0 >> Temperature 1 Transition Count: 5231 >> Temperature 2 Transition Count: 0 >> Total Time For Temperature 1: 41213 >> Total Time For Temperature 2: 0 >> >> >>
Re: nvme controller reset failures on recent -CURRENT
On 12 Feb, Mark Johnston wrote: > On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote: >> I just upgraded my package build machine to: >> FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e >> from: >> FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 >> and I've had two nvme-triggered panics in the last day. >> >> nvme is being used for swap and L2ARC. I'm not able to get a crash >> dump, probably because the nvme device has gone away and I get an error >> about not having a dump device. It looks like a low-memory panic >> because free memory is low and zfs is calling malloc(). >> >> This shows up in the log leading up to the panic: >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout a >> nd possible hot unplug. >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times >> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout a >> nd possible hot unplug. >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times >> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete >> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times >> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o >> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog ti >> meout. > > Are you by chance using the drive mentioned here? > https://github.com/openzfs/zfs/discussions/14793 > > I was bitten by that and ended up replacing the drive with a different > model. The crash manifested exactly as you describe, though I didn't > have L2ARC or swap enabled on it. 
Nope:

nda0 at nvme0 bus 0 scbus9 target 0 lun 1
nda0:
nda0: Serial Number BTNH940617WE512A
nda0: nvme version 1.3
nda0: 488386MB (1000215216 512 byte sectors)

I'm not seeing super high I/O rates. I happened to have iostat running when the machine panicked:

   0  584   88.4   31   2.68   65.8  112   7.18   68.2  107   7.13  80  0 20  0  0
   0  565   99.1   32   3.06   27.9   74   2.01   30.5   70   2.08  80  0 20  0  0
   0  612   92.8   31   2.77   18.9  148   2.74   18.9  148   2.73  86  0 14  0  0
   0  618   88.6   13   1.17   25.0   59   1.44   24.2   61   1.44  89  0 11  0  0
   0  586   45.4    5   0.22   31.4   55   1.70   30.8   57   1.70  84  0 16  0  0
   0  598   12.7    3   0.03   38.1   64   2.40   37.1   66   2.40  84  0 16  0  0
   0  675   36.1    6   0.21   23.7  156   3.62   22.7  164   3.63  88  0 12  0  0
   0  641    6.9    6   0.04   25.7  243   6.10   25.3  246   6.08  71  0 29  0  0
   0  737   20.1    9   0.18   36.4  148   5.24   37.2  144   5.24  78  0 22  0  0
   0  578   44.7   23   1.03   25.1  164   4.01   25.5  161   3.99  86  0 14  0  0
   0  608   70.3   15   1.06   51.1   64   3.19   51.3   64   3.19  89  0 11  0  0
   0  624   38.6    9   0.35   32.3  121   3.80   32.2  121   3.79  90  0 10  0  0
   0  577   80.6   16   1.28   37.8   66   2.44   36.5   69   2.46  90  0 10  0  0
      tty               nda0               ada0               ada1           cpu
 tin tout   KB/t  tps   MB/s   KB/t  tps   MB/s   KB/t  tps   MB/s  us ni sy in id
   0  566   87.7   16   1.39   27.2   60   1.60   25.3   66   1.62  87  0 13  0  0
   0  599   77.2   11   0.83   17.4  391   6.66   17.3  395   6.66  74  0 26  0  0
   0  660   45.0    7   0.31   18.7  575  10.51   18.6  578  10.49  76  0 24  0  0
   0  615   37.7    8   0.31   24.0  303   7.11   24.0  303   7.11  58  0 42  0  0
Fssh_packet_write_wait: ... port 22: Broken pipe

ada* are old and slow spinning rust.

That report does mention something else that could also be a cause. I upgraded the motherboard BIOS around the same time. When I get a chance, I'll drop back to the older FreeBSD version and see if the problem goes away.
Re: nvme controller reset failures on recent -CURRENT
On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote: > I just upgraded my package build machine to: > FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e > from: > FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 > and I've had two nvme-triggered panics in the last day. > > nvme is being used for swap and L2ARC. I'm not able to get a crash > dump, probably because the nvme device has gone away and I get an error > about not having a dump device. It looks like a low-memory panic > because free memory is low and zfs is calling malloc(). > > This shows up in the log leading up to the panic: > Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout a > nd possible hot unplug. > Feb 12 10:07:41 zipper syslogd: last message repeated 1 times > Feb 12 10:07:41 zipper kernel: nvme0: resetting controller > Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout a > nd possible hot unplug. > Feb 12 10:07:41 zipper syslogd: last message repeated 1 times > Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete > Feb 12 10:07:41 zipper syslogd: last message repeated 2 times > Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o > Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog ti > meout. Are you by chance using the drive mentioned here? https://github.com/openzfs/zfs/discussions/14793 I was bitten by that and ended up replacing the drive with a different model. The crash manifested exactly as you describe, though I didn't have L2ARC or swap enabled on it. 
> The device looks healthy to me: > SMART/Health Information Log > > Critical Warning State: 0x00 > Available spare: 0 > Temperature: 0 > Device reliability:0 > Read only: 0 > Volatile memory backup:0 > Temperature:312 K, 38.85 C, 101.93 F > Available spare:100 > Available spare threshold: 10 > Percentage used:3 > Data units (512,000 byte) read: 5761183 > Data units written: 29911502 > Host read commands: 471921188 > Host write commands:605394753 > Controller busy time (minutes): 32359 > Power cycles: 110 > Power on hours: 19297 > Unsafe shutdowns: 14 > Media errors: 0 > No. error info log entries: 0 > Warning Temp Composite Time:0 > Error Temp Composite Time: 0 > Temperature 1 Transition Count: 5231 > Temperature 2 Transition Count: 0 > Total Time For Temperature 1: 41213 > Total Time For Temperature 2: 0 > >
Re: nvme controller reset failures on recent -CURRENT
Might be an overheating. Today's nvme drives are notoriously flaky if you run them without proper heat sink attached to it. -Max On Mon, Feb 12, 2024, 4:28 PM Don Lewis wrote: > I just upgraded my package build machine to: > FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e > from: > FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38 > and I've had two nvme-triggered panics in the last day. > > nvme is being used for swap and L2ARC. I'm not able to get a crash > dump, probably because the nvme device has gone away and I get an error > about not having a dump device. It looks like a low-memory panic > because free memory is low and zfs is calling malloc(). > > This shows up in the log leading up to the panic: > Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a > timeout a > nd possible hot unplug. > Feb 12 10:07:41 zipper syslogd: last message repeated 1 times > Feb 12 10:07:41 zipper kernel: nvme0: resetting controller > Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a > timeout a > nd possible hot unplug. > Feb 12 10:07:41 zipper syslogd: last message repeated 1 times > Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete > Feb 12 10:07:41 zipper syslogd: last message repeated 2 times > Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o > Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog > ti > meout. > > The device looks healthy to me: > SMART/Health Information Log > > Critical Warning State: 0x00 > Available spare: 0 > Temperature: 0 > Device reliability:0 > Read only: 0 > Volatile memory backup:0 > Temperature:312 K, 38.85 C, 101.93 F > Available spare:100 > Available spare threshold: 10 > Percentage used:3 > Data units (512,000 byte) read: 5761183 > Data units written: 29911502 > Host read commands: 471921188 > Host write commands:605394753 > Controller busy time (minutes): 32359 > Power cycles: 110 > Power on hours: 19297 > Unsafe shutdowns: 14 > Media errors: 0 > No. 
error info log entries: 0 > Warning Temp Composite Time:0 > Error Temp Composite Time: 0 > Temperature 1 Transition Count: 5231 > Temperature 2 Transition Count: 0 > Total Time For Temperature 1: 41213 > Total Time For Temperature 2: 0 > > >
nvme controller reset failures on recent -CURRENT
I just upgraded my package build machine to:
  FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
from:
  FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
and I've had two nvme-triggered panics in the last day.

nvme is being used for swap and L2ARC. I'm not able to get a crash dump, probably because the nvme device has gone away and I get an error about not having a dump device. It looks like a low-memory panic because free memory is low and zfs is calling malloc().

This shows up in the log leading up to the panic:
Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.

The device looks healthy to me:
SMART/Health Information Log

Critical Warning State:         0x00
 Available spare:               0
 Temperature:                   0
 Device reliability:            0
 Read only:                     0
 Volatile memory backup:        0
Temperature:                    312 K, 38.85 C, 101.93 F
Available spare:                100
Available spare threshold:      10
Percentage used:                3
Data units (512,000 byte) read: 5761183
Data units written:             29911502
Host read commands:             471921188
Host write commands:            605394753
Controller busy time (minutes): 32359
Power cycles:                   110
Power on hours:                 19297
Unsafe shutdowns:               14
Media errors:                   0
No. error info log entries:     0
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature 1 Transition Count: 5231
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   41213
Total Time For Temperature 2:   0
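The thermal counters in the health log above are easier to judge once they are put in proportion to the power-on time. The sketch below does that arithmetic with the values from the log; the helper names are invented for the example, and the reading of "Total Time For Temperature 1" as minutes follows the replies in this thread rather than anything printed in the log itself.

```python
def minutes_per_hour(total_minutes, power_on_hours):
    """Average minutes spent in a thermal state per hour of power-on time."""
    return total_minutes / power_on_hours


def fraction_of_uptime(total_minutes, power_on_hours):
    """Share of power-on time (0.0 to 1.0) spent in a thermal state."""
    return total_minutes / (power_on_hours * 60)


def kelvin_to_c_f(kelvin):
    """Convert Kelvin to a (Celsius, Fahrenheit) pair."""
    celsius = kelvin - 273.15
    return celsius, celsius * 9 / 5 + 32


if __name__ == "__main__":
    # Values copied from the SMART/Health Information Log above.
    temp1_minutes = 41213
    temp1_transitions = 5231
    power_on_hours = 19297

    print(f"minutes per hour at Temperature 1: "
          f"{minutes_per_hour(temp1_minutes, power_on_hours):.2f}")
    print(f"share of uptime at Temperature 1: "
          f"{fraction_of_uptime(temp1_minutes, power_on_hours):.2%}")
    print(f"average minutes per excursion: "
          f"{temp1_minutes / temp1_transitions:.2f}")
    celsius, fahrenheit = kelvin_to_c_f(312)
    print(f"current temperature: {celsius:.2f} C, {fahrenheit:.2f} F")
```

The per-excursion figure is only a mean over 5231 transitions; the log does not say how long the longest single excursion lasted.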
Re: make buildworld failure on arm64 on -current n267777
On Fri, 26 Jan 2024, at 00:14, void wrote:
> In /usr/src # git rev-list --count --first-parent HEAD
> 26

in /usr/src, a 'git reset --hard' followed by 'git pull' and then 'git checkout main' fixed this.

For some reason, 'git pull --ff-only' didn't pull /usr/src/sys/contrib/dev/acpica/include/platform !
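Before restarting a long buildworld after a repair like the one above, it can be worth confirming that the header the build stopped on is actually present in the tree. The sketch below is a generic existence check, not part of the FreeBSD build system; the full header path is an assumption put together from the directory named in this message and the file name in the compiler error.

```python
from pathlib import Path

# Assumed location: directory from the message above plus the header
# named in the "file not found" error.
ACPICA_HEADER = "sys/contrib/dev/acpica/include/platform/acfreebsd.h"


def missing_paths(src_root, relative_paths):
    """Return the entries of relative_paths that do not exist under src_root."""
    root = Path(src_root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("/usr/src", [ACPICA_HEADER]):
        print(f"missing from source tree: {rel}")
```

An empty result only says the file exists; it does not show that the checkout matches the intended revision.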
make buildworld failure on arm64 on -current n267777
In /usr/src # git rev-list --count --first-parent HEAD
26

include/machine -> /usr/src/sys/arm64/include
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/vers.c
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/8x16.c
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/autoload.o
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/bootinfo.o
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/conf.o
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/copy.o
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/efi_main.o
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/framebuffer.o
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/main.o
/usr/src/stand/efi/loader_4th/../loader/main.c:63:10: fatal error: 'platform/acfreebsd.h' file not found
   63 | #include "platform/acfreebsd.h"
      |          ^~
1 error generated.
make[2]: stopped in /usr/src
make[2]: stopped in /usr/src
make[4]: stopped in /usr/src/secure/lib
make[3]: stopped in /usr/src/secure
make[2]: stopped in /usr/src
make[3]: stopped in /usr/src/lib
make[2]: stopped in /usr/src
93.16 real 26.94 user 8.34 sys
Re: NFSv4 crash of CURRENT
On Mon, Jan 15, 2024 at 11:03 AM FreeBSD User wrote: > > Am Mon, 15 Jan 2024 11:53:31 +0100 > Peter Blok schrieb: > > > Hi, > > > > Forgot to mention I’m on 13-stable. The fix that is causing the crash with > > automounted NFS > > is: > > > > commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b > > Author: Konstantin Belousov > > Date: Tue Jan 2 00:22:44 2024 +0200 > > > > nfsclient: limit situations when we do unlocked read-ahead by nfsiod > > > > (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e) > > > > When I remove the fix, the problem is gone. Add it back and the crash > > happens. > > > > Peter > > > > > On 15 Jan 2024, at 09:31, Peter Blok wrote: > > > > > > Hi, > > > > > > I do have a crash on a NFS client with stable of today > > > (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related > > > Maybe it is the > > > same problem. > > > > > > I have ports automounted on /am/ports. When I do cd /am/ports/sys and > > > type tab to > > > autocomplete it crashes with the below stack trace. If I plainly mount > > > ports on /usr/ports > > > and do the same everything works. 
Re: NFSv4 crash of CURRENT
On Mon, 15 Jan 2024 16:59:07 +0100, Peter Blok wrote:

> Rick,
>
> I can confirm Kostik’s fix works on 13-stable.
>
> Peter

Me, too. The patch fixed the reported problem. Thank you very much.

oh

> > On 15 Jan 2024, at 16:13, Peter Blok wrote:
> >
> > I can give it a shot on one of my clients.
Re: NFSv4 crash of CURRENT
On Mon, 15 Jan 2024 11:53:31 +0100, Peter Blok wrote:

> Hi,
>
> Forgot to mention I’m on 13-stable. The fix that is causing the crash with
> automounted NFS is:
>
> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> Author: Konstantin Belousov
> Date: Tue Jan 2 00:22:44 2024 +0200
>
>     nfsclient: limit situations when we do unlocked read-ahead by nfsiod
>
>     (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
>
> When I remove the fix, the problem is gone. Add it back and the crash happens.
>
> Peter

Good catch!

--
O. Hartmann
Re: NFSv4 crash of CURRENT
Rick,

I can confirm Kostik’s fix works on 13-stable.

Peter

> On 15 Jan 2024, at 16:13, Peter Blok wrote:
>
> I can give it a shot on one of my clients.
Re: NFSv4 crash of CURRENT
I can give it a shot on one of my clients.

> On 15 Jan 2024, at 16:04, Rick Macklem wrote:
>
> Kostik has already come up with a probable fix. If you want it right
> away, here it is, but he'll probably commit it soon anyhow:
>
> [diff of sys/fs/nfsclient/nfs_clbio.c snipped]
>
> rick
> ps: It appears that autofs causes the directory to be read before it
> is open'd for some reason. I've never looked at autofs.
Re: NFSv4 crash of CURRENT
On Mon, Jan 15, 2024 at 2:53 AM Peter Blok wrote:
>
> Hi,
>
> Forgot to mention I’m on 13-stable. The fix that is causing the crash with
> automounted NFS is:
>
> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> Author: Konstantin Belousov
> Date: Tue Jan 2 00:22:44 2024 +0200
>
>     nfsclient: limit situations when we do unlocked read-ahead by nfsiod
>
>     (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
>
> When I remove the fix, the problem is gone. Add it back and the crash happens.

Kostik has already come up with a probable fix. If you want it right
away, here it is, but he'll probably commit it soon anyhow:

diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
index c027d7d7c3fd..1cf45bb0c924 100644
--- a/sys/fs/nfsclient/nfs_clbio.c
+++ b/sys/fs/nfsclient/nfs_clbio.c
@@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct thread *td, struct ucred *cred)
 	return (error);
 }
 
+static bool
+ncl_bioread_dora(struct vnode *vp)
+{
+	vm_object_t obj;
+
+	obj = vp->v_object;
+	if (obj == NULL)
+		return (true);
+	return (!vm_object_mightbedirty(vp->v_object) &&
+	    vp->v_object->un_pager.vnp.writemappings == 0);
+}
+
 /*
  * Vnode op for read using bio
  */
@@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
 		 * unlocked read by nfsiod could obliterate changes
 		 * done by userspace.
 		 */
-		if (nmp->nm_readahead > 0 &&
-		    !vm_object_mightbedirty(vp->v_object) &&
-		    vp->v_object->un_pager.vnp.writemappings == 0) {
+		if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
 			for (nra = 0; nra < nmp->nm_readahead && nra < seqcount &&
 			    (off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
 				rabn = lbn + 1 + nra;
@@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
 		 * directory offset cookie of the next block.)
 		 */
 		NFSLOCKNODE(np);
-		if (nmp->nm_readahead > 0 &&
-		    !vm_object_mightbedirty(vp->v_object) &&
-		    vp->v_object->un_pager.vnp.writemappings == 0 &&
+		if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
 		    (bp->b_flags & B_INVAL) == 0 &&
 		    (np->n_direofoffset == 0 ||
 		    (lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&

rick
ps: It appears that autofs causes the directory to be read before it
is open'd for some reason. I've never looked at autofs.
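The guard that the patch in this message introduces can be tried outside the kernel. What follows is a minimal userland sketch under stated assumptions, not the FreeBSD code: `struct mock_vm_object`, `struct mock_vnode`, their fields and `mock_bioread_dora()` are hypothetical stand-ins modeled on the names in the diff. It only shows that testing the object pointer first avoids the dereference the unguarded condition performed when a vnode has no VM object attached.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's vm_object and vnode. */
struct mock_vm_object {
	bool	might_be_dirty;	/* stands in for vm_object_mightbedirty() */
	int	writemappings;	/* stands in for un_pager.vnp.writemappings */
};

struct mock_vnode {
	struct mock_vm_object *v_object;	/* NULL if no object is attached */
};

/*
 * Same shape as ncl_bioread_dora() in the diff: the patch treats a
 * vnode without an object as eligible for read-ahead; otherwise the
 * two original conditions are checked.
 */
static bool
mock_bioread_dora(const struct mock_vnode *vp)
{
	const struct mock_vm_object *obj;

	obj = vp->v_object;
	if (obj == NULL)
		return (true);
	return (!obj->might_be_dirty && obj->writemappings == 0);
}
```

Calling `mock_bioread_dora()` on a `struct mock_vnode` whose `v_object` is NULL returns true without touching the object; the original inline condition would have read through the NULL pointer in that case.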
Re: NFSv4 crash of CURRENT
Hi,

Forgot to mention I’m on 13-stable. The fix that is causing the crash with automounted NFS is:

commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
Author: Konstantin Belousov
Date: Tue Jan 2 00:22:44 2024 +0200

    nfsclient: limit situations when we do unlocked read-ahead by nfsiod

    (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)

When I remove the fix, the problem is gone. Add it back and the crash happens.

Peter

> On 15 Jan 2024, at 09:31, Peter Blok wrote:
>
> I do have a crash on a NFS client with stable of today
> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related.
> Maybe it is the same problem.
Re: NFSv4 crash of CURRENT
Hi,

I do have a crash on an NFS client with stable of today (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe it is the same problem.

I have ports automounted on /am/ports. When I do cd /am/ports/sys and type tab to autocomplete, it crashes with the stack trace below. If I plainly mount ports on /usr/ports and do the same, everything works. I am using NFSv3.

Peter

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x89
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x809645d4
stack pointer           = 0x28:0xfe00acadb830
frame pointer           = 0x28:0xfe00acadb830
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6869 (csh)
trap number             = 12
panic: page fault
cpuid = 2
time = 1705306940
KDB: stack backtrace:
#0 0x806232f5 at kdb_backtrace+0x65
#1 0x805d7a02 at vpanic+0x152
#2 0x805d78a3 at panic+0x43
#3 0x809d58ad at trap_fatal+0x38d
#4 0x809d58ff at trap_pfault+0x4f
#5 0x809af048 at calltrap+0x8
#6 0x804c7a7e at ncl_bioread+0xb7e
#7 0x804b9d90 at nfs_readdir+0x1f0
#8 0x8069c61a at vop_sigdefer+0x2a
#9 0x809f8ae0 at VOP_READDIR_APV+0x20
#10 0x81ce75de at autofs_readdir+0x2ce
#11 0x809f8ae0 at VOP_READDIR_APV+0x20
#12 0x806c3002 at kern_getdirentries+0x222
#13 0x806c33a9 at sys_getdirentries+0x29
#14 0x809d6180 at amd64_syscall+0x110
#15 0x809af95b at fast_syscall_common+0xf8

> On 15 Jan 2024, at 06:46, FreeBSD User wrote:
>
> On Sun, 14 Jan 2024 20:34:12 -0800, Cy Schubert wrote:
>
>> Is anyone by chance seeing autofs in the backtrace too?
>
> Hello Cy Schubert,
>
> I forgot to mention that those crashes occur with autofs mounted filesystems.
> Good question, by the way, I will check whether crashes also happen when
> mounting the traditional way.
>
> Kind regards,
>
> oh
>
> --
> O. Hartmann
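A fault virtual address as small as the 0x89 in this trap message is consistent with reading a structure member through a NULL pointer, since the address touched is then just the member's offset. The sketch below illustrates only that arithmetic; `struct demo_object` and its layout are invented so that a member lands at offset 0x89, and they do not describe any real kernel structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented layout: 0x89 bytes of other members, then a one-byte field. */
struct demo_object {
	unsigned char	other_members[0x89];
	unsigned char	flag;
};

/*
 * Address that would be touched when reading `flag` through a pointer
 * whose value is `base`. With base == 0 (NULL) the result is the bare
 * member offset, which is what a trap message would report.
 */
static uintptr_t
demo_fault_address(uintptr_t base)
{
	return (base + offsetof(struct demo_object, flag));
}
```

With a base of 0, `demo_fault_address()` yields the member offset itself, which is why a near-zero fault address usually points at a NULL structure pointer rather than at random memory corruption.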
Re: NFSv4 crash of CURRENT
On Sun, 14 Jan 2024 20:34:12 -0800, Cy Schubert wrote:

> Is anyone by chance seeing autofs in the backtrace too?

Hello Cy Schubert,

I forgot to mention that those crashes occur with autofs mounted filesystems. Good question, by the way, I will check whether crashes also happen when mounting the traditional way.

Kind regards,

oh

--
O. Hartmann
Re: NFSv4 crash of CURRENT
Rick Macklem writes:
> Did you rebuild both the nfscommon and nfscl modules from the same sources?
> I did a commit to main that changes the interface between these two
> modules and did bump the __FreeBSD_version to 1500010, which should
> cause both to be rebuilt.
> (If you have "options NFSCL" in your kernel config, both should have
> been rebuilt as a part of the kernel build.)

Is anyone by chance seeing autofs in the backtrace too?

--
Cheers,
Cy Schubert
FreeBSD UNIX: Web: https://FreeBSD.org
NTP: Web: https://nwtime.org

e^(i*pi)+1=0
Re: NFSv4 crash of CURRENT
On Sat, 13 Jan 2024 19:41:30 -0800, Rick Macklem wrote:

> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop wrote:
> >
> > From: FreeBSD User
> > Date: 13 January 2024 19:34
> > To: FreeBSD CURRENT
> > Subject: NFSv4 crash of CURRENT
> >
> > Hello,
> >
> > running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> > Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
> > as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer
> > NFSv4 filesystems, non-kerberized.
> >
> > I can crash the client reproducibly by accessing the one or other NFSv4 FS
> > (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters). I do
> > not have physical access to the client host, luckily the box recovers.
> Did you rebuild both the nfscommon and nfscl modules from the same sources?

Yes, as requested, as soon as the commit occurred. I recompiled the whole OS from a "make -j4 cleanworld cleandir". But I have a custom kernel with several custom options statically compiled in.

> I did a commit to main that changes the interface between these two
> modules and did bump the __FreeBSD_version to 1500010, which should
> cause both to be rebuilt.
> (If you have "options NFSCL" in your kernel config, both should have
> been rebuilt as a part of the kernel build.)

On Monday I will try to compile in several debug options when I get my hands on the machine again, and on Tuesday I can test on several other boxes running CURRENT (after update) how they interact with themselves (CURRENT) and with others (FBSD14, FBSD13) via NFSv4.

> rick
> >
> > I have no idea what causes this problem ...
> >
> > Kind regards,
> >
> > O. Hartmann
> >
> > Do you have something like a panic message, stack trace or core dump?
> >
> > Regards
> > Ronald

--
O. Hartmann
Re: NFSv4 crash of CURRENT
On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop wrote:
>
> From: FreeBSD User
> Date: 13 January 2024 19:34
> To: FreeBSD CURRENT
> Subject: NFSv4 crash of CURRENT
>
> Hello,
>
> running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
> as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer NFSv4
> filesystems, non-kerberized.
>
> I can crash the client reproducibly by accessing the one or other NFSv4 FS
> (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters). I do not
> have physical access to the client host, luckily the box recovers.

Did you rebuild both the nfscommon and nfscl modules from the same sources?
I did a commit to main that changes the interface between these two
modules and did bump the __FreeBSD_version to 1500010, which should
cause both to be rebuilt.
(If you have "options NFSCL" in your kernel config, both should have
been rebuilt as a part of the kernel build.)

rick

> I have no idea what causes this problem ...
>
> Kind regards,
>
> O. Hartmann
>
> Do you have something like a panic message, stack trace or core dump?
>
> Regards
> Ronald
Re: NFSv4 crash of CURRENT
From: FreeBSD User
Date: 13 January 2024 19:34
To: FreeBSD CURRENT
Subject: NFSv4 crash of CURRENT

Hello,

running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a: Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is the same OS revision as the mentioned client, the other is FreeBSD 13.2-RELEASE-p8. Both offer NFSv4 filesystems, non-kerberized.

I can crash the client reproducibly by accessing one or the other NFSv4 FS (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters). I do not have physical access to the client host; luckily the box recovers.

I have no idea what causes this problem ...

Kind regards,

O. Hartmann

--
O. Hartmann

Do you have something like a panic message, stack trace or core dump?

Regards
Ronald
Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64
> On Jan 9, 2024, at 6:24 PM, void wrote:
>
> On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:
>>
>> Was the kernel/utility built with IPv6? If not, that’s a general bug which
>> should be filed (which can be easily checked/avoided using the FEATURES(9)
>> subsystem)…
>> Cheers!
>> -Enji
>
> world/kernel was built with WITHOUT_INET6= in /etc/src.conf
>
> I made the problem go away with removing WITHOUT_INET6= and rebuilding.
> The system was installed by taking
> FreeBSD-15.0-CURRENT-arm64-aarch64-RPI-20240104-8bf0882e186e-267378.img
> and dd-ing it to a usb3-connected hd.
>
> Where can I read about features?

Features can be retrieved with `sysctl kern.features`. For INET6 it should be `kern.features.inet6`.

> % man features
> No manual entry for "features"
>
> it's not in apropos
> thanks,
> --

Best regards,
Zhenlei
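For anyone who wants to script the check mentioned above, here is a minimal sh sketch. It assumes that a compiled-in feature shows up as kern.features.<name> with the value 1, and that a missing OID means the feature is absent. The helper name has_kern_feature is made up for illustration and is not part of FreeBSD.

```shell
#!/bin/sh
# has_kern_feature NAME
# Hypothetical helper (not part of FreeBSD): succeeds when
# `sysctl -n kern.features.NAME` prints 1. If the OID does not exist,
# sysctl fails and prints nothing on stdout, which is treated here as
# "feature not present".
has_kern_feature() {
    [ "$(sysctl -n "kern.features.$1" 2>/dev/null)" = "1" ]
}

if has_kern_feature inet6; then
    echo "kernel has INET6"
else
    echo "kernel lacks INET6 (or kern.features is unavailable)"
fi
```

A script that adds IPv6 routes at boot could guard those calls with a test like this, so that a kernel built without INET6 is skipped quietly rather than producing usage errors.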
Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64
> On Jan 9, 2024, at 7:17 AM, void wrote:
>
> On Tue, Jan 09, 2024 at 12:24:40PM +, void wrote:
>> On Tue, Jan 09, 2024 at 10:24:53AM +, void wrote:
>>> On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:
>>>>
>>>> Was the kernel/utility built with IPv6? If not, that’s a general bug which
>>>> should be filed (which can be easily checked/avoided using the FEATURES(9)
>>>> subsystem)…
>>>> Cheers!
>>>> -Enji
>>>
>>> world/kernel was built with WITHOUT_INET6= in /etc/src.conf
>>>
>>> I made the problem go away with removing WITHOUT_INET6= and rebuilding.
>>
>> I'll re-add this to try and replicate the problem with the same sources
>> (main-n267425-aa1223ac3afc) and if it happens again I'll make a PR for it
>
> I forgot about this line:
>
> options INET6 # IPv6 communications protocols
>
> which, on current/arm64 lives in std.arm64 which gets included by
> GENERIC which is included by GENERIC-MMCCAM which is included by
> GENERIC-MMCCAM-NODEBUG
>
> commenting it out and having WITHOUT_INET6= in /etc/src.conf and rebuilding
> fixes the problem. Sorry for the noise.

It’s not noise; what you found is a valid issue. Please file an issue for this, noting that the kernel was built without INET6 support (that’s the key bit of info for reproing the issue).

Thank you!
-Enji
Re: kernel: fatal trap 12 on CURRENT, when using WireGuard
On 09.01.24 at 21:40, Gleb Smirnoff wrote:

> Rainer,
>
> On Tue, Jan 09, 2024 at 09:23:54PM +0100, Rainer Hurling wrote:
> R> I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a very
> R> recent commit. The build and install went fine. After booting with new
> R> base, I got a page fault with the following error:
>
> Sorry for that, my fault. Can you please test this patch?

Hi Gleb,

Thanks for the very fast response.

I tried your patch and it seems to work as expected. I have a running system, with WireGuard on, at commit main-n267469-0013741108bc-dirty.

Many thanks again and best wishes,
Rainer
Re: kernel: fatal trap 12 on CURRENT, when using WireGuard
Rainer,

On Tue, Jan 09, 2024 at 09:23:54PM +0100, Rainer Hurling wrote:
R> I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a very
R> recent commit. The build and install went fine. After booting with new
R> base, I got a page fault with the following error:

Sorry for that, my fault. Can you please test this patch?

--
Gleb Smirnoff

diff --git a/sys/netlink/netlink_domain.c b/sys/netlink/netlink_domain.c
index 7660dcada103..4790845d1d31 100644
--- a/sys/netlink/netlink_domain.c
+++ b/sys/netlink/netlink_domain.c
@@ -233,7 +233,7 @@ nl_send_group(struct nl_writer *nw)
 				copy = nl_buf_copy(nb);
 				if (copy != NULL) {
 					nw->buf = copy;
-					(void)nl_send_one(nw);
+					(void)nl_send(nw, nlp_last);
 				} else {
 					NLP_LOCK(nlp_last);
 					if (nlp_last->nl_socket != NULL)
@@ -246,7 +246,7 @@ nl_send_group(struct nl_writer *nw)
 	}
 	if (nlp_last != NULL) {
 		nw->buf = nb;
-		(void)nl_send_one(nw);
+		(void)nl_send(nw, nlp_last);
 	} else
 		nl_buf_free(nb);
 
diff --git a/sys/netlink/netlink_io.c b/sys/netlink/netlink_io.c
index fb8e0a46e8dd..5f50c40f71d8 100644
--- a/sys/netlink/netlink_io.c
+++ b/sys/netlink/netlink_io.c
@@ -194,9 +194,8 @@ nl_taskqueue_handler(void *_arg, int pending)
  * If no queue overrunes happened, wakes up socket owner.
  */
 bool
-nl_send_one(struct nl_writer *nw)
+nl_send(struct nl_writer *nw, struct nlpcb *nlp)
 {
-	struct nlpcb *nlp = nw->nlp;
 	struct socket *so = nlp->nl_socket;
 	struct sockbuf *sb = &so->so_rcv;
 	struct nl_buf *nb;
diff --git a/sys/netlink/netlink_message_writer.c b/sys/netlink/netlink_message_writer.c
index 0b85378b41b6..50305e3d9d80 100644
--- a/sys/netlink/netlink_message_writer.c
+++ b/sys/netlink/netlink_message_writer.c
@@ -65,6 +65,13 @@ nlmsg_get_buf(struct nl_writer *nw, u_int len, bool waitok)
 	return (true);
 }
 
+static bool
+nl_send_one(struct nl_writer *nw)
+{
+
+	return (nl_send(nw, nw->nlp));
+}
+
 bool
 _nlmsg_get_unicast_writer(struct nl_writer *nw, int size, struct nlpcb *nlp)
 {
diff --git a/sys/netlink/netlink_var.h b/sys/netlink/netlink_var.h
index c8f0d02a0dab..ddf30b373446 100644
--- a/sys/netlink/netlink_var.h
+++ b/sys/netlink/netlink_var.h
@@ -130,9 +130,7 @@ void nl_osd_unregister(void);
 void nl_set_thread_nlp(struct thread *td, struct nlpcb *nlp);
 
 /* netlink_io.c */
-#define	NL_IOF_UNTRANSLATED	0x01
-#define	NL_IOF_IGNORE_LIMIT	0x02
-bool nl_send_one(struct nl_writer *);
+bool nl_send(struct nl_writer *, struct nlpcb *);
 void nlmsg_ack(struct nlpcb *nlp, int error, struct nlmsghdr *nlmsg,
     struct nl_pstate *npt);
 void nl_on_transmit(struct nlpcb *nlp);
kernel: fatal trap 12 on CURRENT, when using WireGuard
I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a very recent commit. The build and install went fine. After booting with new base, I got a page fault with the following error:

Kernel page fault with the following non-sleepable locks held:
shared rm netlink lock (netlink lock) r = 0 (0xf8005fc8ca20) locked @ /usr/src/sys/netlink/netlink_domain.c:241
exclusive rw lle (lle) r = 0 (0xf801951dce90) locked @ /usr/src/sys/netinet/in.c:1716
stack backtrace:
#0 0x80bc6c45 at witness_debugger+0x65
#1 0x80bc7d89 at witness_warn+0x3e9
#2 0x81056b18 at trap_pfault+0x88
#3 0x81028708 at calltrap+0x8
#4 0x80dbd6a2 at nl_send_group+0x1d2
#5 0x80dc0e27 at _nlmsg_flush+0x37
#6 0x80dc4fdc at rtnl_lle_event+0x10c
#7 0x80d15e32 at arp_mark_lle_reachable+0xd2
#8 0x80d15b43 at arp_check_update_lle+0x293
#9 0x80d151c5 at arpintr+0xa65
#10 0x80caaaed at netisr_dispatch_src+0xad
#11 0x80c8d57a at ether_demux+0x17a
#12 0x80c8ec53 at ether_nh_input+0x403
#13 0x80caaaed at netisr_dispatch_src+0xad
#14 0x80c8d9c9 at ether_input+0xd9
#15 0x80ca66ac at iflib_rxeof+0xe4c
#16 0x80ca0b5a at _task_fn_rx+0x7a
#17 0x80ba0118 at gtaskqueue_run_locked+0xa8

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x3
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x80dc0a10
stack pointer = 0x28:0xfe006a3a8760
frame pointer = 0x28:0xfe006a3a8790
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
rdi: fe006a3a8850 rsi: fe006a3a86f0 rdx: fe006a3a87b0
rcx: f80001f88740  r8: 83210090  r9:
rax:  rbx: 0003 rbp: fe006a3a8790
r10: 0001 r11:  r12: f8005fc8ca00
r13: f8005fc8ca20 r14: fe006a3a8850 r15:
trap number = 12
panic: page fault
cpuid = 0
time = 1704824328
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe006a3a8430
vpanic() at vpanic+0x131/frame 0xfe006a3a8560
panic() at panic+0x43/frame 0xfe006a3a85c0
trap_fatal() at trap_fatal+0x40f/frame 0xfe006a3a8620
trap_pfault() at trap_pfault+0xae/frame 0xfe006a3a8690
calltrap() at calltrap+0x8/frame 0xfe006a3a8690
--- trap 0xc, rip = 0x80dc0a10, rsp = 0xfe006a3a8760, rbp = 0xfe006a3a8790 ---
nl_send_one() at nl_send_one+0x20/frame 0xfe006a3a8790
nl_send_group() at nl_send_group+0x1d2/frame 0xfe006a3a8820
_nlmsg_flush() at _nlmsg_flush+0x37/frame 0xfe006a3a8840
rtnl_lle_event() at rtnl_lle_event+0x10c/frame 0xfe006a3a88e0
arp_mark_lle_reachable() at arp_mark_lle_reachable+0xd2/frame 0xfe006a3a8930
arp_check_update_lle() at arp_check_update_lle+0x293/frame 0xfe006a3a8a00
arpintr() at arpintr+0xa65/frame 0xfe006a3a8b60
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfe006a3a8bc8
ether_demux() at ether_demux+0x17a/frame 0xfe006a4a8bf0
ether_nh_input() at ether_nh_input+0x403/frame 0xfe006a3a8c40
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfe006a3a8ca0
ether_input() at ether_input+0xd9/frame 0xfe006a3a8d00
iflib_rxeof() at iflib_rxeof+0xe4c/frame 0xfe006a3a8e00
_task_fn_rx() at _task_fn_rx+0x7a/frame 0xfe006a3a8e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa8/frame 0xfe006a3a8ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd3/frame 0xfe006a3a8ef0
fork_exit() at fork_exit+0x82/frame 0xfe006a3a8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe006a3a8f30
--- trap 0xf2b9f109, rip = 0x7afef8a176bef8a5, rsp = 0xddc963edd18963e9, rbp = 0x61f64fc36db64fc7
KDB: enter: panic
[ thread pid 0 tid 100067 ]
Stopped at kdb_enter+0x33: movq $0,0xe3a582(%rip)
db>

Since the current process 'if_io_tqg_0' and problems with netlink are mentioned, I searched in the area of my network connections. I discovered that this page fault only occurs when a connection is established with WireGuard (wg-quick up wg0). Without using WireGuard, this error does not occur.

I was able to find out at which commit this behavior occurs with my box:

- Up to commit main-n267347-660bd40a598a everything is fine.
- The two following commits n267348-67d9023f07a4 and n267349-0ad011ececb9 do not build on my box (module/netlink broken ...).
- From commit n267349-0ad011ececb9 (netlink) onwards this page fault occurs when WireGuard is started.

Any help is greatly appreciated. CC'ed Gleb Smirnoff due to the affected commits.

Regards,
Rainer Hurling
Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64
On Tue, Jan 09, 2024 at 12:24:40PM +, void wrote:
> On Tue, Jan 09, 2024 at 10:24:53AM +, void wrote:
>> On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:
>>>
>>> Was the kernel/utility built with IPv6? If not, that’s a general bug which
>>> should be filed (which can be easily checked/avoided using the FEATURES(9)
>>> subsystem)…
>>> Cheers!
>>> -Enji
>>
>> world/kernel was built with WITHOUT_INET6= in /etc/src.conf
>>
>> I made the problem go away with removing WITHOUT_INET6= and rebuilding.
>
> I'll re-add this to try and replicate the problem with the same sources
> (main-n267425-aa1223ac3afc) and if it happens again I'll make a PR for it

I forgot about this line:

options INET6 # IPv6 communications protocols

which, on current/arm64 lives in std.arm64 which gets included by GENERIC which is included by GENERIC-MMCCAM which is included by GENERIC-MMCCAM-NODEBUG

commenting it out and having WITHOUT_INET6= in /etc/src.conf and rebuilding fixes the problem. Sorry for the noise.

--
Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64
On Tue, Jan 09, 2024 at 10:24:53AM +, void wrote:
> On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:
>>
>> Was the kernel/utility built with IPv6? If not, that’s a general bug which
>> should be filed (which can be easily checked/avoided using the FEATURES(9)
>> subsystem)…
>> Cheers!
>> -Enji
>
> world/kernel was built with WITHOUT_INET6= in /etc/src.conf
>
> I made the problem go away with removing WITHOUT_INET6= and rebuilding.

I'll re-add this to try and replicate the problem with the same sources (main-n267425-aa1223ac3afc) and if it happens again I'll make a PR for it

--
Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64
On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:
>
> Was the kernel/utility built with IPv6? If not, that’s a general bug which
> should be filed (which can be easily checked/avoided using the FEATURES(9)
> subsystem)…
> Cheers!
> -Enji

world/kernel was built with WITHOUT_INET6= in /etc/src.conf

I made the problem go away with removing WITHOUT_INET6= and rebuilding. The system was installed by taking FreeBSD-15.0-CURRENT-arm64-aarch64-RPI-20240104-8bf0882e186e-267378.img and dd-ing it to a usb3-connected hd.

Where can I read about features?

% man features
No manual entry for "features"

it's not in apropos

thanks,
--
[Bug 197921] scheduler: Allow non-migratable threads to bind to their current CPU
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197921

Zhenlei Huang changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |z...@freebsd.org

--- Comment #3 from Zhenlei Huang ---
It seems we do not have any usage that binds a thread to the local CPU, otherwise `KASSERT(THREAD_CAN_MIGRATE(td), ("%p must be migratable", td))` would complain (when the kernel is built with option INVARIANTS).

(In reply to Ed Maste from comment #1)
> but, what about just moving the KASSERT after the `if (PCPU_GET(cpuid) ==
> cpu)` test?

I think that is much simpler.

--
You are receiving this mail because:
You are on the CC list for the bug.
Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64
> On Jan 7, 2024, at 6:29 AM, void wrote:
>
> Hi,
>
> on a rpi4/8GB, my rc.conf looks like so. It's an ipv4-only system on a LAN
> not directly connected to the internet
>
> hostname="generic.home.arpa"
> ifconfig_genet0="inet 192.168.1.199 netmask 255.255.255.0"
> defaultrouter="192.168.1.1"
> sshd_enable="YES"
> sendmail_enable="NONE"
> sendmail_submit_enable="NO"
> sendmail_outbound_enable="NO"
> sendmail_msp_queue_enable="NO"
> growfs_enable="YES"
> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> dumpdev="AUTO"
> ntpd_enable="YES"
> ntpdate_enable="YES"
>
> when it boots, the following appears in the serial console
>
> ###
>
> Starting devd.
> Autoloading module: uhid
> Autoloading module: usbhid
> Autoloading module: wmt
> route: message indicates error: File exists
> add host 127.0.0.1: gateway lo0 fib 0: route already in table
> add net default: gateway 192.168.1.1
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> Updating motd:.
> Creating and/or trimming log files.

Was the kernel/utility built with IPv6? If not, that’s a general bug which should be filed (which can be easily checked/avoided using the FEATURES(9) subsystem)…

Cheers!
-Enji
route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64
Hi,

on a rpi4/8GB, my rc.conf looks like so. It's an ipv4-only system on a LAN not directly connected to the internet

hostname="generic.home.arpa"
ifconfig_genet0="inet 192.168.1.199 netmask 255.255.255.0"
defaultrouter="192.168.1.1"
sshd_enable="YES"
sendmail_enable="NONE"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"
growfs_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
ntpd_enable="YES"
ntpdate_enable="YES"

when it boots, the following appears in the serial console

###

Starting devd.
Autoloading module: uhid
Autoloading module: usbhid
Autoloading module: wmt
route: message indicates error: File exists
add host 127.0.0.1: gateway lo0 fib 0: route already in table
add net default: gateway 192.168.1.1
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
Updating motd:.
Creating and/or trimming log files.

###

Why is it erroring for ipv6 when there's no ipv6 in rc.conf? I've not tried an amd64 -current system yet.

--
[Bug 197921] scheduler: Allow non-migratable threads to bind to their current CPU
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197921

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|mfc-stable12?,              |
                   |mfc-stable11?               |

--- Comment #2 from Mark Linimon ---
^Triage: remove OBE flags.

--
You are receiving this mail because:
You are on the CC list for the bug.
Re: Checksum Error on installer (3 iso images in CURRENT)
On 29/12/2023 05:46, Christopher Davidson wrote:

> Hi FreeBSD mailing list,
>
> I have recently started to look at the CURRENT isos, for installation in a
> virtualbox, and while trying to install these images I have received
> verification issues with the checksums.
>
> Problem: FreeBSD installer will error out of the installer upon
> verification of checksums, they do not align with the checksum files in
> the directory:
> https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/15.0/
>
> Steps to replicate:
> 1. Create a virtualbox profile with freebsd
> 2. Attach one of the below iso images
> 3. Run the installation
> 4. Select keymap
> 5. Select partition setup (UFS)
> 6. Select packages to install
> 7. Program starts verifying the base package and comes back with the
>    error message
>
> I have confirmed this with some people on the libera.chat IRC server, under
> #freebsd and this is not an isolated event.
>
> The 3 iso images in question are:
> * FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973-bootonly.iso
> * FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso
> * FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058-bootonly.iso
>
> Here are the respective checksums for each of these files:
>
> CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973
> SHA256 (CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973) = 827182ccbfbce984c969790e7aac43828dffc4a21d43e855c91bac03f29dc74e
> SHA256 (FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973-bootonly.iso) = fdd8870549474f38d35665c330d209df7733aa8608630845471685b291c06746
>
> CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058
> SHA256 (CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058) = 60f01c27aa02acb47cab7dec58119f34e7215c3656b8486854bc64217cdfe3bb
> SHA256 (FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058-bootonly.iso) = abdd81c253c651bbc10e3db1b97b8b111f73b3f657f729e37cdbe975de0dc056
>
> CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242
> SHA256 (CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242) = 83698ee594d56108b29e40d635671c7a2de6ada2af636ef5254eafbd35e95e96
> SHA256 (FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso) = 2deb850673f148cf1ab269175ddf40448e6a96b331b4ca0027f8abe16b3edfa0
>
> If any further information/clarification is required, please do let me
> know.
>
> Kind Regards,
> Chris

Hi Chris,

I had the same problem some time ago because I used to test iso/bsdinstall every day. Fortunately, the re@ team explained the cause and the solution to me.

The new development snapshot builds are propagated to the mirrors once a week (unless there is some sort of build failure). That propagation is not atomic, so there is a window in which the files on a mirror can come from different snapshots.

You could subscribe to the freebsd-snapshots@ mailing list if you want to be notified when the propagation is complete.

Alfonso
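Given the non-atomic mirror propagation described above, it may help to confirm that a downloaded image and its CHECKSUM.SHA256-* file belong to the same snapshot before booting it. Below is a minimal sh sketch, assuming the `SHA256 (file) = digest` line format quoted in this thread. The helper name verify_iso is hypothetical; it uses sha256(1) where available and falls back to sha256sum(1). Note that this only checks the image itself and does not replace the installer's own verification of the distribution sets.

```shell
#!/bin/sh
# verify_iso CHECKSUM_FILE ISO
# Hypothetical helper: compares the SHA256 digest of ISO with the entry
# for the same file name in a BSD-style checksum file, whose lines look like
#   SHA256 (name.iso) = <hex digest>
# Returns 0 on match, 1 on mismatch, 2 if the file has no entry.
verify_iso() {
    sumfile=$1
    iso=$2
    name=$(basename "$iso")

    # Expected digest: last field of the line that names this image.
    want=$(awk -v n="$name" \
        '$1 == "SHA256" && $2 == ("(" n ")") { print $NF }' "$sumfile")
    if [ -z "$want" ]; then
        echo "no entry for $name in $sumfile" >&2
        return 2
    fi

    # Actual digest: sha256(1) on FreeBSD, sha256sum(1) as a fallback.
    if command -v sha256 >/dev/null 2>&1; then
        have=$(sha256 -q "$iso")
    else
        have=$(sha256sum "$iso" | awk '{ print $1 }')
    fi

    [ "$want" = "$have" ]
}
```

It would be called with the checksum file first and the image second, for example `verify_iso CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242 FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso && echo OK`. A mismatch right after a new snapshot appears would be consistent with the mirror still being mid-update.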
Checksum Error on installer (3 iso images in CURRENT)
Hi FreeBSD mailing list,

I have recently started to look at the CURRENT isos, for installation in a virtualbox, and while trying to install these images I have received verification issues with the checksums.

Problem: FreeBSD installer will error out of the installer upon verification of checksums, they do not align with the checksum files in the directory:
https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/15.0/

Steps to replicate:
1. Create a virtualbox profile with freebsd
2. Attach one of the below iso images
3. Run the installation
4. Select keymap
5. Select partition setup (UFS)
6. Select packages to install
7. Program starts verifying the base package and comes back with the error message

I have confirmed this with some people on the libera.chat IRC server, under #freebsd and this is not an isolated event.

The 3 iso images in question are:
* FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973-bootonly.iso
* FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso
* FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058-bootonly.iso

Here are the respective checksums for each of these files:

CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973
SHA256 (CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973) = 827182ccbfbce984c969790e7aac43828dffc4a21d43e855c91bac03f29dc74e
SHA256 (FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973-bootonly.iso) = fdd8870549474f38d35665c330d209df7733aa8608630845471685b291c06746

CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058
SHA256 (CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058) = 60f01c27aa02acb47cab7dec58119f34e7215c3656b8486854bc64217cdfe3bb
SHA256 (FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058-bootonly.iso) = abdd81c253c651bbc10e3db1b97b8b111f73b3f657f729e37cdbe975de0dc056

CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242
SHA256 (CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242) = 83698ee594d56108b29e40d635671c7a2de6ada2af636ef5254eafbd35e95e96
SHA256 (FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso) = 2deb850673f148cf1ab269175ddf40448e6a96b331b4ca0027f8abe16b3edfa0

If any further information/clarification is required, please do let me know.

Kind Regards,
Chris
Re: Problem building world on current
It seems that it was related to PR 273661. I followed https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273661#c5 and it's now building.

Thanks
Santi

On 12/28/23 20:32, Santiago Martinez wrote:
> Hi David,
>
> - I'm running 14.0R-P3
> - Last commit:
>   5f71f9636efa25f6de1a832202bae7c78ad013aa (HEAD -> main, origin/main, origin/HEAD)
>   Author: rilysh
>   Date: Thu Dec 28 02:34:32 2023 -0500
> - Just a clean build, no options on command line or src.conf/make
> - Kernel builds without a problem ( just in case).
>
> Thanks.
> Santi
>
> On 12/28/23 16:23, David Wolfskill wrote:
>> On Thu, Dec 28, 2023 at 04:05:49PM +0100, Santiago Martinez wrote:
>>> Hi Everyone, I'm having issues building world from current (just now).
>>>
>>> Same header missing on multiple parts.
>>>
>>> Best regards.
>>>
>>> Santiago
>>
>> It might be useful to know:
>> * What you are running at the time;
>> * what the most recent commit in your source tree is;
>> * whether this is a clean build, you are using make's META_MODE, or
>>   you are just setting -DNO_CLEAN.
>>
>> I have not seen the issue you cite; my most recent builds of head:
>>
>> FreeBSD 15.0-CURRENT #22 main-n267169-5bc10feacc9d: Tue Dec 26 12:14:41 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008
>>
>> FreeBSD 15.0-CURRENT #23 main-n267215-3334a537ed38: Wed Dec 27 15:52:04 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008
>>
>> FreeBSD 15.0-CURRENT #24 main-n267279-789480702e49: Thu Dec 28 12:18:34 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008
>>
>> (in my case, using make's META_MODE).
>>
>> More details at https://www.catwhisker.org/~david/FreeBSD/history/.
>>
>> Peace,
>> david
Re: Problem building world on current
Hi David,

- I'm running 14.0R-P3
- Last commit:
  5f71f9636efa25f6de1a832202bae7c78ad013aa (HEAD -> main, origin/main, origin/HEAD)
  Author: rilysh
  Date: Thu Dec 28 02:34:32 2023 -0500
- Just a clean build, no options on command line or src.conf/make
- Kernel builds without a problem ( just in case).

Thanks.
Santi

On 12/28/23 16:23, David Wolfskill wrote:
> On Thu, Dec 28, 2023 at 04:05:49PM +0100, Santiago Martinez wrote:
>> Hi Everyone, I'm having issues building world from current (just now).
>>
>> Same header missing on multiple parts.
>>
>> Best regards.
>>
>> Santiago
>
> It might be useful to know:
> * What you are running at the time;
> * what the most recent commit in your source tree is;
> * whether this is a clean build, you are using make's META_MODE, or
>   you are just setting -DNO_CLEAN.
>
> I have not seen the issue you cite; my most recent builds of head:
>
> FreeBSD 15.0-CURRENT #22 main-n267169-5bc10feacc9d: Tue Dec 26 12:14:41 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008
>
> FreeBSD 15.0-CURRENT #23 main-n267215-3334a537ed38: Wed Dec 27 15:52:04 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008
>
> FreeBSD 15.0-CURRENT #24 main-n267279-789480702e49: Thu Dec 28 12:18:34 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008
>
> (in my case, using make's META_MODE).
>
> More details at https://www.catwhisker.org/~david/FreeBSD/history/.
>
> Peace,
> david
Re: Problem building world on current
On Thu, Dec 28, 2023 at 04:05:49PM +0100, Santiago Martinez wrote:
> Hi Everyone, I'm having issues building world from current (just now).
>
> Same header missing on multiple parts.
>
> Best regards.
>
> Santiago

It might be useful to know:
* What you are running at the time;
* what the most recent commit in your source tree is;
* whether this is a clean build, you are using make's META_MODE, or
  you are just setting -DNO_CLEAN.

I have not seen the issue you cite; my most recent builds of head:

FreeBSD 15.0-CURRENT #22 main-n267169-5bc10feacc9d: Tue Dec 26 12:14:41 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008

FreeBSD 15.0-CURRENT #23 main-n267215-3334a537ed38: Wed Dec 27 15:52:04 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008

FreeBSD 15.0-CURRENT #24 main-n267279-789480702e49: Thu Dec 28 12:18:34 UTC 2023 r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC amd64 1500008 1500008

(in my case, using make's META_MODE).

More details at https://www.catwhisker.org/~david/FreeBSD/history/.

Peace,
david
--
David H. Wolfskill da...@catwhisker.org
Do these ends really justify those means?

See https://www.catwhisker.org/~david/publickey.gpg for my public key.
Problem building world on current
Hi Everyone, I'm having issues building world from current (just now).

Same header missing on multiple parts.

Best regards.

Santiago

"""
In file included from /usr/src/contrib/llvm-project/llvm/lib/Demangle/ItaniumDemangle.cpp:13:
In file included from /usr/src/contrib/llvm-project/llvm/include/llvm/Demangle/Demangle.h:13:
/usr/include/c++/v1/string:561:10: fatal error: '__string/char_traits.h' file not found
#include <__string/char_traits.h>
         ^~~~
1 error generated.
*** Error code 1
"""
Re: compile 13.2p8 on a recent current fails: compiler issue ?
Dimitry Andric writes:
> henry vogt writes:
> > ===> usr.sbin/zic (obj,all,install)
> > Building /usr/obj/usr/src/13.2/amd64.amd64/tmp/obj-tools/usr.sbin/zic/zic.o
> > --- zic.o ---
> > /usr/src/13.2/contrib/tzcode/zic.c:464:8: error: an attribute list cannot appear here
> >   464 | static ATTRIBUTE_NORETURN void
> >       |        ^~
> This appears to have been fixed upstream some time ago:
> https://github.com/eggert/tz/commit/9cfe9507fcc22cd4a0c4da486ea1c7f0de6b075f

It's also fixed in 14 and 15:

https://cgit.freebsd.org/src/commit/?id=75411d157232ee3b4789b92c9205453e7d59a3d2

It was too late for 13.2, but I'll make sure it's merged before 13.3.

DES
--
Dag-Erling Smørgrav - d...@freebsd.org
Re: compile 13.2p8 on a recent current fails: compiler issue ?
On 13 Dec 2023, at 13:08, henry vogt wrote:
>
> attempt to compile 13.2p8 on a recent current fails: compiler issue ?
>
> ...
>
> ===> usr.sbin/zic (obj,all,install)
> Building /usr/obj/usr/src/13.2/amd64.amd64/tmp/obj-tools/usr.sbin/zic/zic.o
> --- zic.o ---
> /usr/src/13.2/contrib/tzcode/zic.c:464:8: error: an attribute list cannot appear here
>   464 | static ATTRIBUTE_NORETURN void
>       |        ^~
> /usr/src/13.2/contrib/tzcode/private.h:471:30: note: expanded from macro 'ATTRIBUTE_NORETURN'
>   471 | # define ATTRIBUTE_NORETURN [[noreturn]]
>       |                              ^~~~
> /usr/src/13.2/contrib/tzcode/zic.c:471:8: error: an attribute list cannot appear here
>   471 | static ATTRIBUTE_NORETURN void
>       |        ^~
> /usr/src/13.2/contrib/tzcode/private.h:471:30: note: expanded from macro 'ATTRIBUTE_NORETURN'
>   471 | # define ATTRIBUTE_NORETURN [[noreturn]]
>       |                              ^~~~
> /usr/src/13.2/contrib/tzcode/zic.c:669:8: error: an attribute list cannot appear here
>   669 | static ATTRIBUTE_NORETURN void
>       |        ^~
> /usr/src/13.2/contrib/tzcode/private.h:471:30: note: expanded from macro 'ATTRIBUTE_NORETURN'
>   471 | # define ATTRIBUTE_NORETURN [[noreturn]]
>       |                              ^~~~
> /usr/src/13.2/contrib/tzcode/zic.c:3778:8: error: an attribute list cannot appear here
>  3778 | static ATTRIBUTE_NORETURN void
>       |        ^~
> /usr/src/13.2/contrib/tzcode/private.h:471:30: note: expanded from macro 'ATTRIBUTE_NORETURN'
>   471 | # define ATTRIBUTE_NORETURN [[noreturn]]
>       |                              ^~~~
> 4 errors generated.
> *** [zic.o] Error code 1
>
> make[3]: stopped in /usr/src/13.2/usr.sbin/zic
>
> # cc -v
> FreeBSD clang version 17.0.6 (https://github.com/llvm/llvm-project.git llvmorg-17.0.6-0-g6009708b4367)
> Target: x86_64-unknown-freebsd15.0
> Thread model: posix
> InstalledDir: /usr/bin

This appears to have been fixed upstream some time ago:
https://github.com/eggert/tz/commit/9cfe9507fcc22cd4a0c4da486ea1c7f0de6b075f
but clang 17 has become more strict about invalid attribute placement, possibly to be more like gcc (which should already have given this as a warning or error).

So I guess for 13.2-p8 you will have to apply that fix manually, if you want to build it on a 15-CURRENT box. Otherwise, I would advise a jail.

-Dimitry