Re: latest current fails to boot.
On Sun, 26 Sep 2021 12:16:55 -0400
Alexander Motin wrote:

> Thank you for the notification. 08063e9f98a should fix the hang.

Thanks for the fix!
I rebuilt and rebooted several times with kern.sched.steal_thresh set to 0,
set to 1, and with the whole line commented out; all went fine.

> I just want to add that lowering kern.sched.steal_thresh below 2 should
> not be a proper fix for any problem, but only a workaround. I guess
> either some CPU can't wake up from sleep for too long, or the wakeup
> interrupt is not properly sent to it when the load is assigned. In such
> case stealing makes other CPU to do the work instead. It would be good
> to find and fix the real problem.

For me (with a Core i7-8750H), lowering kern.sched.steal_thresh didn't make
a significant improvement, but I had no reason to go back to the default.
 *As sched_ule has now been modified, I'm trying the default
  kern.sched.steal_thresh value now.

On the other hand, at least some Ryzen users seem to have a much more
severe problem than I do, and for them the workaround makes a significant
improvement.

> On 25.09.2021 21:47, Konstantin Belousov wrote:
> > On Sun, Sep 26, 2021 at 10:23:47AM +0900, Tomoaki AOKI wrote:
> >> On Sat, 25 Sep 2021 23:46:48 +0300
> >> Andriy Gapon wrote:
> >>
> >>> On 25/09/2021 19:10, Johan Hendriks wrote:
> >>>> For me i had kern.sched.steal_thresh=1 in my sysctl as i use this
> >>>> machine mainly for tests and so on.
> >>>> By removing this sysctl the system boots again. I already used the
> >>>> latest snapshot and that booted fine.
> >>>
> >>> Might have something to do with
> >>> https://cgit.FreeBSD.org/src/commit/?id=bd84094a51c4648a7c97ececdaccfb30bc832096
> >>>
> >>> --
> >>> Andriy Gapon
> >>
> >> Commenting out kern.sched.steal_thresh=0 line in /etc/sysctl.conf let
> >> me boot fine. No other setting of kern.sched.* affected.
> >> I've introduced the setting by reading posts [1] and [2] on
> >> freebsd-current ML. Thanks for the hint, Jan!
> >>
> >> Andriy, I took time to bi-sect and determined the commit triggered
> >> this issue was e745d729be60. [3]
> >> Worked OK even with kern.sched.steal_thresh=0 at a342ecd326ee. [4]
> >>
> >> Tested commits are as below (tested order, not using git bisect):
> >>   0b79a76f8487: [Known to be OK]
> >>   8db1669959ce: [Problematic rev I first encountered]
> >>   0f6829488ef3: OK
> >>   df8dd6025af8: OK
> >>   4f917847c903: OK
> >>   e745d729be60: NG!
> >>   bd84094a51c4: OK
> >>   a342ecd326ee: OK
> >>
> >> Konstantin, no more chance to get into ddb on hang up until my previous
> >> post. ^T never worked on hang up situation. Sory. But does info above
> >> help?
> > Let the author of the commit look.
> >
> >>
> >> [1]
> >> https://lists.freebsd.org/pipermail/freebsd-current/2021-March/079237.html
> >>
> >> [2]
> >> https://lists.freebsd.org/pipermail/freebsd-current/2021-March/079240.html
> >>
> >> [3]
> >> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-September/007513.html
> >>
> >> [4]
> >> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-September/007512.html
> >>
> >> --
> >> Tomoaki AOKI
>
> --
> Alexander Motin

--
Tomoaki AOKI
Re: latest current fails to boot.
Thank you for the notification. 08063e9f98a should fix the hang.

I just want to add that lowering kern.sched.steal_thresh below 2 should
not be considered a proper fix for any problem, only a workaround. I guess
either some CPU can't wake up from sleep for too long, or the wakeup
interrupt is not properly sent to it when the load is assigned. In such
a case, stealing makes another CPU do the work instead. It would be good
to find and fix the real problem.

On 25.09.2021 21:47, Konstantin Belousov wrote:
> On Sun, Sep 26, 2021 at 10:23:47AM +0900, Tomoaki AOKI wrote:
>> On Sat, 25 Sep 2021 23:46:48 +0300
>> Andriy Gapon wrote:
>>
>>> On 25/09/2021 19:10, Johan Hendriks wrote:
>>>> For me i had kern.sched.steal_thresh=1 in my sysctl as i use this
>>>> machine mainly for tests and so on.
>>>> By removing this sysctl the system boots again. I already used the
>>>> latest snapshot and that booted fine.
>>>
>>> Might have something to do with
>>> https://cgit.FreeBSD.org/src/commit/?id=bd84094a51c4648a7c97ececdaccfb30bc832096
>>>
>>> --
>>> Andriy Gapon
>>
>> Commenting out kern.sched.steal_thresh=0 line in /etc/sysctl.conf let
>> me boot fine. No other setting of kern.sched.* affected.
>> I've introduced the setting by reading posts [1] and [2] on
>> freebsd-current ML. Thanks for the hint, Jan!
>>
>> Andriy, I took time to bi-sect and determined the commit triggered
>> this issue was e745d729be60. [3]
>> Worked OK even with kern.sched.steal_thresh=0 at a342ecd326ee. [4]
>>
>> Tested commits are as below (tested order, not using git bisect):
>>   0b79a76f8487: [Known to be OK]
>>   8db1669959ce: [Problematic rev I first encountered]
>>   0f6829488ef3: OK
>>   df8dd6025af8: OK
>>   4f917847c903: OK
>>   e745d729be60: NG!
>>   bd84094a51c4: OK
>>   a342ecd326ee: OK
>>
>> Konstantin, no more chance to get into ddb on hang up until my previous
>> post. ^T never worked on hang up situation. Sory. But does info above
>> help?
> Let the author of the commit look.
>
>>
>> [1]
>> https://lists.freebsd.org/pipermail/freebsd-current/2021-March/079237.html
>>
>> [2]
>> https://lists.freebsd.org/pipermail/freebsd-current/2021-March/079240.html
>>
>> [3]
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-September/007513.html
>>
>> [4]
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-September/007512.html
>>
>> --
>> Tomoaki AOKI

--
Alexander Motin
Re: latest current fails to boot.
On Sun, Sep 26, 2021 at 10:23:47AM +0900, Tomoaki AOKI wrote: > On Sat, 25 Sep 2021 23:46:48 +0300 > Andriy Gapon wrote: > > > On 25/09/2021 19:10, Johan Hendriks wrote: > > > For me i had kern.sched.steal_thresh=1 in my sysctl as i use this machine > > > mainly > > > for tests and so on. > > > By removing this sysctl the system boots again. I already used the latest > > > snapshot and that booted fine. > > > > Might have something to do with > > https://cgit.FreeBSD.org/src/commit/?id=bd84094a51c4648a7c97ececdaccfb30bc832096 > > > > -- > > Andriy Gapon > > Commenting out kern.sched.steal_thresh=0 line in /etc/sysctl.conf let > me boot fine. No other setting of kern.sched.* affected. > I've introduced the setting by reading posts [1] and [2] on > freebsd-current ML. Thanks for the hint, Jan! > > Andriy, I took time to bi-sect and determined the commit triggered > this issue was e745d729be60. [3] > Worked OK even with kern.sched.steal_thresh=0 at a342ecd326ee. [4] > > Tested commits are as below (tested order, not using git bisect): > 0b79a76f8487: [Known to be OK] > 8db1669959ce: [Problematic rev I first encountered] > 0f6829488ef3: OK > df8dd6025af8: OK > 4f917847c903: OK > e745d729be60: NG! > bd84094a51c4: OK > a342ecd326ee: OK > > Konstantin, no more chance to get into ddb on hang up until my previous > post. ^T never worked on hang up situation. Sory. But does info above > help? Let the author of the commit look. > > > [1] > https://lists.freebsd.org/pipermail/freebsd-current/2021-March/079237.html > > [2] > https://lists.freebsd.org/pipermail/freebsd-current/2021-March/079240.html > > [3] > https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-September/007513.html > > [4] > https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-September/007512.html > > -- > Tomoaki AOKI
Re: latest current fails to boot.
On Sat, 25 Sep 2021 23:46:48 +0300
Andriy Gapon wrote:

> On 25/09/2021 19:10, Johan Hendriks wrote:
> > For me i had kern.sched.steal_thresh=1 in my sysctl as i use this machine
> > mainly for tests and so on.
> > By removing this sysctl the system boots again. I already used the latest
> > snapshot and that booted fine.
>
> Might have something to do with
> https://cgit.FreeBSD.org/src/commit/?id=bd84094a51c4648a7c97ececdaccfb30bc832096
>
> --
> Andriy Gapon

Commenting out the kern.sched.steal_thresh=0 line in /etc/sysctl.conf let
me boot fine. No other kern.sched.* setting was affected.
I had introduced the setting after reading posts [1] and [2] on the
freebsd-current ML. Thanks for the hint, Jan!

Andriy, I took the time to bisect and determined that the commit that
triggered this issue was e745d729be60. [3]
It worked OK even with kern.sched.steal_thresh=0 at a342ecd326ee. [4]

The tested commits are listed below (in the order tested, not using git bisect):
  0b79a76f8487: [Known to be OK]
  8db1669959ce: [Problematic rev I first encountered]
  0f6829488ef3: OK
  df8dd6025af8: OK
  4f917847c903: OK
  e745d729be60: NG!
  bd84094a51c4: OK
  a342ecd326ee: OK

Konstantin, I have had no further chance to get into ddb on a hang since my
previous post. ^T never worked in the hung state. Sorry. But does the info
above help?

[1]
https://lists.freebsd.org/pipermail/freebsd-current/2021-March/079237.html

[2]
https://lists.freebsd.org/pipermail/freebsd-current/2021-March/079240.html

[3]
https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-September/007513.html

[4]
https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-September/007512.html

--
Tomoaki AOKI
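The commits in this message were narrowed down by hand. The same search could in principle be driven by git bisect; in the real case each step needs a kernel build and a reboot, so the verdict would be entered manually with `git bisect good` or `git bisect bad`. The sketch below only illustrates the mechanism on a throwaway repository: the repository, the `state` and `rev` files, the commit messages and the `bisect_demo` name are all invented for the example and have nothing to do with the FreeBSD src tree.

```shell
#!/bin/sh
# Toy demonstration of git bisect on a temporary repository.
# Everything created here (files, commit messages) is hypothetical.
bisect_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
        export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
        git init -q . || exit 1
        # Eight commits; the simulated regression appears in commit 6.
        i=1
        while [ "$i" -le 8 ]; do
            if [ "$i" -ge 6 ]; then echo NG > state; else echo OK > state; fi
            echo "$i" > rev
            git add state rev
            git commit -q -m "commit $i" || exit 1
            i=$((i + 1))
        done
        good=$(git rev-list --max-parents=0 HEAD)   # root commit, known good
        git bisect start HEAD "$good" >/dev/null 2>&1 || exit 1
        # The test command exits 0 on a good revision and 1 on a bad one.
        git bisect run sh -c 'grep -q OK state' >/dev/null 2>&1
        # refs/bisect/bad now points at the first bad commit.
        git log -1 --format=%s refs/bisect/bad
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}
```

Calling `bisect_demo` prints the subject of the first bad commit in the toy history. For a boot hang like the one in this thread, `git bisect run` is not practical, so one would start with `git bisect start <bad> <good>` and mark each rebuilt kernel by hand.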
Re: latest current fails to boot.
On 25/09/2021 19:10, Johan Hendriks wrote:
> For me i had kern.sched.steal_thresh=1 in my sysctl as i use this machine
> mainly for tests and so on.
> By removing this sysctl the system boots again. I already used the latest
> snapshot and that booted fine.

Might have something to do with
https://cgit.FreeBSD.org/src/commit/?id=bd84094a51c4648a7c97ececdaccfb30bc832096

--
Andriy Gapon
Re: latest current fails to boot.
On 25/09/2021 06:45, Jan Beich wrote:
> Tomoaki AOKI writes:
>
> > On Wed, 22 Sep 2021 05:47:46 -0700
> > David Wolfskill wrote:
> >
> >> On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
> >> > I did a git pull this morning and it fails to boot.
> >> > I hangs at Setting hostid : 0x917bf354
> >> >
> >> > This is a vm running on vmware.
> >> > If i boot the old kernel from yesterday it boots normally.
> >> >
> >> > uname -a
> >> > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
> >> > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
> >> > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64
> >> >
> >>
> >> I had no issues with my build machine or either of two laptops, either
> >> from yesterday:
> >>
> >> FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358
> >> main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021
> >> r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> >> amd64 1400033 1400033
> >>
> >> or today:
> >>
> >> FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359
> >> main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021
> >> r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> >> amd64 1400033 1400033
> >>
> >> [uname strings from my main laptop shown, but I keep the machines
> >> in sync rather aggressively.]
> >>
> >> Perhaps the issue you are encountering involves things not in my
> >> environment (such as VMs or ZFS)?
> >>
> >> Peace,
> >> david
> >> --
> >> David H. Wolfskill da...@catwhisker.org
> >> Life is not intended to be a zero-sum game.
> >>
> >> See https://www.catwhisker.org/~david/publickey.gpg for my public key.
> >
> > For me, on bare metal (non-vm) amd64 with root-on-ZFS,
> >
> >   Fails to boot to multiuser at git: 8db1669959ce
> >   Boot fine at git: 0b79a76f8487
> >
> > Boot to singleuser is fine even with failed revision.
> >
> > Failure mode:
> >   Hard hangup or spinning and non-operable. Hard power-off needed.
> >   Seems to happen after starting rc.conf processing and before setting
> >   hostid.
>
> Does "git revert --no-edit -2 8db1669959ce" help?
> Do you modify kern.sched.* via /etc/sysctl.conf?

I had kern.sched.steal_thresh=1 in my sysctl.conf, as I use this machine
mainly for tests and so on.
After removing this sysctl the system boots again. I had already tried the
latest snapshot, and that booted fine.

regards
Johan
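A minimal sketch of the workaround described in this message, assuming the override lives in a sysctl.conf-style file as a plain `name=value` line. The helper name `disable_steal_thresh` and the `.bak` backup suffix are illustrative choices, not anything provided by FreeBSD.

```shell
#!/bin/sh
# Comment out any active kern.sched.steal_thresh assignment in a
# sysctl.conf-style file, keeping a backup copy next to it.
# The function name and the ".bak" suffix are hypothetical.
disable_steal_thresh() {
    conf="$1"
    [ -f "$conf" ] || return 1
    cp "$conf" "$conf.bak" || return 1
    # Prefix the matching line with '#'. Lines that are already
    # commented out do not match and are left untouched.
    sed 's/^[[:space:]]*kern\.sched\.steal_thresh[[:space:]]*=/#&/' \
        "$conf.bak" > "$conf"
}
```

It would typically be tried on a copy of the file first, for example `disable_steal_thresh /tmp/sysctl.conf.test`, before running it as root on `/etc/sysctl.conf`. With the line commented out, the kernel's default threshold applies again on the next boot.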
Re: latest current fails to boot.
Tomoaki AOKI writes:

> On Wed, 22 Sep 2021 05:47:46 -0700
> David Wolfskill wrote:
>
>> On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
>> > I did a git pull this morning and it fails to boot.
>> > I hangs at Setting hostid : 0x917bf354
>> >
>> > This is a vm running on vmware.
>> > If i boot the old kernel from yesterday it boots normally.
>> >
>> > uname -a
>> > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
>> > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
>> > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64
>> >
>>
>> I had no issues with my build machine or either of two laptops, either
>> from yesterday:
>>
>> FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358
>> main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021
>> r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
>> amd64 1400033 1400033
>>
>> or today:
>>
>> FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359
>> main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021
>> r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
>> amd64 1400033 1400033
>>
>> [uname strings from my main laptop shown, but I keep the machines
>> in sync rather aggressively.]
>>
>> Perhaps the issue you are encountering involves things not in my
>> environment (such as VMs or ZFS)?
>>
>> Peace,
>> david
>> --
>> David H. Wolfskill da...@catwhisker.org
>> Life is not intended to be a zero-sum game.
>>
>> See https://www.catwhisker.org/~david/publickey.gpg for my public key.
>
> For me, on bare metal (non-vm) amd64 with root-on-ZFS,
>
>   Fails to boot to multiuser at git: 8db1669959ce
>   Boot fine at git: 0b79a76f8487
>
> Boot to singleuser is fine even with failed revision.
>
> Failure mode:
>   Hard hangup or spinning and non-operable. Hard power-off needed.
>   Seems to happen after starting rc.conf processing and before setting
>   hostid.

Does "git revert --no-edit -2 8db1669959ce" help?
Do you modify kern.sched.* via /etc/sysctl.conf?
Re: latest current fails to boot.
On Sat, Sep 25, 2021 at 11:00:50AM +0900, Tomoaki AOKI wrote: > On Fri, 24 Sep 2021 01:33:33 +0300 > Konstantin Belousov wrote: > > > On Thu, Sep 23, 2021 at 09:20:51PM +0200, Johan Hendriks wrote: > > > > > > On 23/09/2021 19:52, Konstantin Belousov wrote: > > > > On Fri, Sep 24, 2021 at 12:43:01AM +0900, Tomoaki AOKI wrote: > > > > > On Wed, 22 Sep 2021 23:09:05 +0900 > > > > > Tomoaki AOKI wrote: > > > > > > > > > > > On Wed, 22 Sep 2021 05:47:46 -0700 > > > > > > David Wolfskill wrote: > > > > > > > > > > > > > On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote: > > > > > > > > I did a git pull this morning and it fails to boot. > > > > > > > > I hangs at Setting hostid : 0x917bf354 > > > > > > > > > > > > > > > > This is a vm running on vmware. > > > > > > > > If i boot the old kernel from yesterday it boots normally. > > > > > > > > > > > > > > > > uname -a > > > > > > > > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0 > > > > > > > > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021 > > > > > > > > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL > > > > > > > > amd64 > > > > > > > > > > > > > > > I had no issues with my build machine or either of two laptops, > > > > > > > either > > > > > > > from yesterday: > > > > > > > > > > > > > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT > > > > > > > #358 main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021 > > > > > > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY > > > > > > > amd64 1400033 1400033 > > > > > > > > > > > > > > or today: > > > > > > > > > > > > > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT > > > > > > > #359 main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021 > > > > > > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY > > > > > > > amd64 1400033 1400033 > > > > > > > > > > > > > > [uname strings from my main laptop shown, but I keep the machines > > > 
> > > > in sync rather aggressively.] > > > > > > > > > > > > > > Perhaps the issue you are encountering involves things not in my > > > > > > > environment (such as VMs or ZFS)? > > > > > > > > > > > > > > Peace, > > > > > > > david > > > > > > > -- > > > > > > > David H. Wolfskill > > > > > > > da...@catwhisker.org > > > > > > > Life is not intended to be a zero-sum game. > > > > > > > > > > > > > > See https://www.catwhisker.org/~david/publickey.gpg for my public > > > > > > > key. > > > > > > For me, on bare metal (non-vm) amd64 with root-on-ZFS, > > > > > > > > > > > >Fails to boot to multiuser at git: 8db1669959ce > > > > > >Boot fine at git: 0b79a76f8487 > > > > > > > > > > > > Boot to singleuser is fine even with failed revision. > > > > > > > > > > > > Failure mode: > > > > > > Hard hangup or spinning and non-operable. Hard power-off needed. > > > > > > Seems to happen after starting rc.conf processing and before > > > > > > setting > > > > > > hostid. > > > > > > > > > > > > -- > > > > > > Tomoaki AOKI > > > > > > > > > > > Additional info and correction. > > > > > *Hung up before setting hostuuid, not hostid. > > > > > > > > > > *^T doesn't respond at all, only hard power off worked. > > > > > > > > > > *`kldload nvidia-modeset.ko` on single user mode sanely work. > > > > > > > > > > > > > > > Why I could know rc.conf is started to be processed: > > > > > > > > > > I have lines below at the end of /etc/rc.conf and its output is > > > > > always > > > > > the first line related to /etc/rc.conf, at least for non-verbose > > > > > boot. > > > > > The next line is normally "Setting hostuuid: " line, which was not > > > > > displayed when boot hung up. > > > > > > > > > > > > > > > kldstat -q -n nvidia.ko > > > > > if [ 0 -ne $? ] ; then > > > > >echo "Loading nvidia-driver modules via rc.conf." 
> > > > >if [ -e /boot/modules/nvidia-modeset.ko ] ; then > > > > > kld_list="${kld_list} nvidia-modeset.ko" > > > > >else > > > > > kld_list="${kld_list} nvidia.ko" > > > > >fi > > > > > fi > > > > If you do not load nvidia-modeset.ko at all, does the boot proceed? > > > > > > > > When the boot hangs, can you enter into ddb? > > > > > > > > > > > I do not load a nvidia-modeset.ko kernel module and it will not boot. It > > > hangs with Setting hostid : as the last message. Then only a powercycle > > > gets > > > me back. If i boot in single user mode all is fine, but as soon as i exit > > > single user mode it hangs at the same spot. > > > > Can you enter ddb at the hang point? > > It depends. In most cases, nothing other than power cycle works, but I > could get into ddb by ctrl-alt-esc only once. `bt` was like below. > Converted from photo using Google Lens, and hand-fixed mis-conversion > as much as possible, but there can be remaining mis-conversion. > > > = `bt` output = > > KDB:
Re: latest current fails to boot.
On Fri, 24 Sep 2021 01:33:33 +0300 Konstantin Belousov wrote: > On Thu, Sep 23, 2021 at 09:20:51PM +0200, Johan Hendriks wrote: > > > > On 23/09/2021 19:52, Konstantin Belousov wrote: > > > On Fri, Sep 24, 2021 at 12:43:01AM +0900, Tomoaki AOKI wrote: > > > > On Wed, 22 Sep 2021 23:09:05 +0900 > > > > Tomoaki AOKI wrote: > > > > > > > > > On Wed, 22 Sep 2021 05:47:46 -0700 > > > > > David Wolfskill wrote: > > > > > > > > > > > On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote: > > > > > > > I did a git pull this morning and it fails to boot. > > > > > > > I hangs at Setting hostid : 0x917bf354 > > > > > > > > > > > > > > This is a vm running on vmware. > > > > > > > If i boot the old kernel from yesterday it boots normally. > > > > > > > > > > > > > > uname -a > > > > > > > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0 > > > > > > > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021 > > > > > > > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL > > > > > > > amd64 > > > > > > > > > > > > > I had no issues with my build machine or either of two laptops, > > > > > > either > > > > > > from yesterday: > > > > > > > > > > > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358 > > > > > > main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021 > > > > > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY > > > > > > amd64 1400033 1400033 > > > > > > > > > > > > or today: > > > > > > > > > > > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359 > > > > > > main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021 > > > > > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY > > > > > > amd64 1400033 1400033 > > > > > > > > > > > > [uname strings from my main laptop shown, but I keep the machines > > > > > > in sync rather aggressively.] 
> > > > > > > > > > > > Perhaps the issue you are encountering involves things not in my > > > > > > environment (such as VMs or ZFS)? > > > > > > > > > > > > Peace, > > > > > > david > > > > > > -- > > > > > > David H. Wolfskill da...@catwhisker.org > > > > > > Life is not intended to be a zero-sum game. > > > > > > > > > > > > See https://www.catwhisker.org/~david/publickey.gpg for my public > > > > > > key. > > > > > For me, on bare metal (non-vm) amd64 with root-on-ZFS, > > > > > > > > > >Fails to boot to multiuser at git: 8db1669959ce > > > > >Boot fine at git: 0b79a76f8487 > > > > > > > > > > Boot to singleuser is fine even with failed revision. > > > > > > > > > > Failure mode: > > > > > Hard hangup or spinning and non-operable. Hard power-off needed. > > > > > Seems to happen after starting rc.conf processing and before setting > > > > > hostid. > > > > > > > > > > -- > > > > > Tomoaki AOKI > > > > > > > > > Additional info and correction. > > > > *Hung up before setting hostuuid, not hostid. > > > > > > > > *^T doesn't respond at all, only hard power off worked. > > > > > > > > *`kldload nvidia-modeset.ko` on single user mode sanely work. > > > > > > > > > > > > Why I could know rc.conf is started to be processed: > > > > > > > > I have lines below at the end of /etc/rc.conf and its output is always > > > > the first line related to /etc/rc.conf, at least for non-verbose boot. > > > > The next line is normally "Setting hostuuid: " line, which was not > > > > displayed when boot hung up. > > > > > > > > > > > > kldstat -q -n nvidia.ko > > > > if [ 0 -ne $? ] ; then > > > >echo "Loading nvidia-driver modules via rc.conf." > > > >if [ -e /boot/modules/nvidia-modeset.ko ] ; then > > > > kld_list="${kld_list} nvidia-modeset.ko" > > > >else > > > > kld_list="${kld_list} nvidia.ko" > > > >fi > > > > fi > > > If you do not load nvidia-modeset.ko at all, does the boot proceed? > > > > > > When the boot hangs, can you enter into ddb? 
> >
> > I do not load a nvidia-modeset.ko kernel module and it will not boot. It
> > hangs with Setting hostid : as the last message. Then only a powercycle gets
> > me back. If i boot in single user mode all is fine, but as soon as i exit
> > single user mode it hangs at the same spot.
>
> Can you enter ddb at the hang point?

It depends. In most cases nothing other than a power cycle works, but I
managed to get into ddb with ctrl-alt-esc just once. The `bt` output was as
shown below. I converted it from a photo using Google Lens and hand-corrected
conversion errors as far as possible, but some may remain.

= `bt` output =

KDB: enter: manual escape to debugger
[ thread pid 12 tid 100041 ]
Stopped at kdb_enter+0x37: movq $0,0x103aa1e(%rip)
db> bt
Tracing pid 12 tid 100041 td 0xfe00e32c
kdb_enter() at kdb_enter+0x37/frame 0xfe00e2e80d40
vt_kbdevent() at vt_kbdevent+0x22f/frame 0xfe00e2e80da0
kbdmux_intr() at kbdmux_intr+0x45/frame 0xfe00e2e80dc0
Re: latest current fails to boot.
On Thu, 23 Sep 2021 20:45:41 +0200
Juraj Lutter wrote:

> > On 23 Sep 2021, at 19:52, Konstantin Belousov wrote:
> >
> > If you do not load nvidia-modeset.ko at all, does the boot proceed?
> >
> > When the boot hangs, can you enter into ddb?
>
> That also brings up a question: Does nvidia kmods (and probably also drm
> kmod) match the kernel?
>
> --
> Juraj Lutter
> o...@freebsd.org

Of course. I habitually rebuild all ports that install *.ko modules after
installworld and etcupdate.
(I don't use drm-*-kmod, as nvidia-driver doesn't need it.)

--
Tomoaki AOKI
Re: latest current fails to boot.
On Thu, Sep 23, 2021 at 09:20:51PM +0200, Johan Hendriks wrote: > > On 23/09/2021 19:52, Konstantin Belousov wrote: > > On Fri, Sep 24, 2021 at 12:43:01AM +0900, Tomoaki AOKI wrote: > > > On Wed, 22 Sep 2021 23:09:05 +0900 > > > Tomoaki AOKI wrote: > > > > > > > On Wed, 22 Sep 2021 05:47:46 -0700 > > > > David Wolfskill wrote: > > > > > > > > > On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote: > > > > > > I did a git pull this morning and it fails to boot. > > > > > > I hangs at Setting hostid : 0x917bf354 > > > > > > > > > > > > This is a vm running on vmware. > > > > > > If i boot the old kernel from yesterday it boots normally. > > > > > > > > > > > > uname -a > > > > > > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0 > > > > > > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021 > > > > > > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64 > > > > > > > > > > > I had no issues with my build machine or either of two laptops, either > > > > > from yesterday: > > > > > > > > > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358 > > > > > main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021 > > > > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY > > > > > amd64 1400033 1400033 > > > > > > > > > > or today: > > > > > > > > > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359 > > > > > main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021 > > > > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY > > > > > amd64 1400033 1400033 > > > > > > > > > > [uname strings from my main laptop shown, but I keep the machines > > > > > in sync rather aggressively.] > > > > > > > > > > Perhaps the issue you are encountering involves things not in my > > > > > environment (such as VMs or ZFS)? > > > > > > > > > > Peace, > > > > > david > > > > > -- > > > > > David H. 
Wolfskill da...@catwhisker.org > > > > > Life is not intended to be a zero-sum game. > > > > > > > > > > See https://www.catwhisker.org/~david/publickey.gpg for my public key. > > > > For me, on bare metal (non-vm) amd64 with root-on-ZFS, > > > > > > > >Fails to boot to multiuser at git: 8db1669959ce > > > >Boot fine at git: 0b79a76f8487 > > > > > > > > Boot to singleuser is fine even with failed revision. > > > > > > > > Failure mode: > > > > Hard hangup or spinning and non-operable. Hard power-off needed. > > > > Seems to happen after starting rc.conf processing and before setting > > > > hostid. > > > > > > > > -- > > > > Tomoaki AOKI > > > > > > > Additional info and correction. > > > *Hung up before setting hostuuid, not hostid. > > > > > > *^T doesn't respond at all, only hard power off worked. > > > > > > *`kldload nvidia-modeset.ko` on single user mode sanely work. > > > > > > > > > Why I could know rc.conf is started to be processed: > > > > > > I have lines below at the end of /etc/rc.conf and its output is always > > > the first line related to /etc/rc.conf, at least for non-verbose boot. > > > The next line is normally "Setting hostuuid: " line, which was not > > > displayed when boot hung up. > > > > > > > > > kldstat -q -n nvidia.ko > > > if [ 0 -ne $? ] ; then > > >echo "Loading nvidia-driver modules via rc.conf." > > >if [ -e /boot/modules/nvidia-modeset.ko ] ; then > > > kld_list="${kld_list} nvidia-modeset.ko" > > >else > > > kld_list="${kld_list} nvidia.ko" > > >fi > > > fi > > If you do not load nvidia-modeset.ko at all, does the boot proceed? > > > > When the boot hangs, can you enter into ddb? > > > > > I do not load a nvidia-modeset.ko kernel module and it will not boot. It > hangs with Setting hostid : as the last message. Then only a powercycle gets > me back. If i boot in single user mode all is fine, but as soon as i exit > single user mode it hangs at the same spot. Can you enter ddb at the hang point? 
Do you load any other modules besides nvidia, from rc.conf?
Re: latest current fails to boot.
On 23/09/2021 19:52, Konstantin Belousov wrote: On Fri, Sep 24, 2021 at 12:43:01AM +0900, Tomoaki AOKI wrote: On Wed, 22 Sep 2021 23:09:05 +0900 Tomoaki AOKI wrote: On Wed, 22 Sep 2021 05:47:46 -0700 David Wolfskill wrote: On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote: I did a git pull this morning and it fails to boot. I hangs at Setting hostid : 0x917bf354 This is a vm running on vmware. If i boot the old kernel from yesterday it boots normally. uname -a FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021 root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64 I had no issues with my build machine or either of two laptops, either from yesterday: FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358 main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021 r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY amd64 1400033 1400033 or today: FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359 main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021 r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY amd64 1400033 1400033 [uname strings from my main laptop shown, but I keep the machines in sync rather aggressively.] Perhaps the issue you are encountering involves things not in my environment (such as VMs or ZFS)? Peace, david -- David H. Wolfskill da...@catwhisker.org Life is not intended to be a zero-sum game. See https://www.catwhisker.org/~david/publickey.gpg for my public key. For me, on bare metal (non-vm) amd64 with root-on-ZFS, Fails to boot to multiuser at git: 8db1669959ce Boot fine at git: 0b79a76f8487 Boot to singleuser is fine even with failed revision. Failure mode: Hard hangup or spinning and non-operable. Hard power-off needed. Seems to happen after starting rc.conf processing and before setting hostid. -- Tomoaki AOKI Additional info and correction. 
*Hung up before setting hostuuid, not hostid. *^T doesn't respond at all, only hard power off worked. *`kldload nvidia-modeset.ko` on single user mode sanely work. Why I could know rc.conf is started to be processed: I have lines below at the end of /etc/rc.conf and its output is always the first line related to /etc/rc.conf, at least for non-verbose boot. The next line is normally "Setting hostuuid: " line, which was not displayed when boot hung up. kldstat -q -n nvidia.ko if [ 0 -ne $? ] ; then echo "Loading nvidia-driver modules via rc.conf." if [ -e /boot/modules/nvidia-modeset.ko ] ; then kld_list="${kld_list} nvidia-modeset.ko" else kld_list="${kld_list} nvidia.ko" fi fi If you do not load nvidia-modeset.ko at all, does the boot proceed? When the boot hangs, can you enter into ddb? I do not load a nvidia-modeset.ko kernel module and it will not boot. It hangs with Setting hostid : as the last message. Then only a powercycle gets me back. If i boot in single user mode all is fine, but as soon as i exit single user mode it hangs at the same spot.
Re: latest current fails to boot.
> On 23 Sep 2021, at 19:52, Konstantin Belousov wrote:
>
> If you do not load nvidia-modeset.ko at all, does the boot proceed?
>
> When the boot hangs, can you enter into ddb?

That also brings up a question: do the nvidia kmods (and probably also the
drm kmod) match the kernel?

--
Juraj Lutter
o...@freebsd.org
Re: latest current fails to boot.
On Fri, Sep 24, 2021 at 12:43:01AM +0900, Tomoaki AOKI wrote:
> On Wed, 22 Sep 2021 23:09:05 +0900
> Tomoaki AOKI wrote:
>
> > On Wed, 22 Sep 2021 05:47:46 -0700
> > David Wolfskill wrote:
> >
> > > On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
> > > > I did a git pull this morning and it fails to boot.
> > > > I hangs at Setting hostid : 0x917bf354
> > > >
> > > > This is a vm running on vmware.
> > > > If i boot the old kernel from yesterday it boots normally.
> > > >
> > > > uname -a
> > > > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
> > > > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
> > > > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64
> > > >
> > >
> > > I had no issues with my build machine or either of two laptops, either
> > > from yesterday:
> > >
> > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358
> > > main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021
> > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > > amd64 1400033 1400033
> > >
> > > or today:
> > >
> > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359
> > > main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021
> > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > > amd64 1400033 1400033
> > >
> > > [uname strings from my main laptop shown, but I keep the machines
> > > in sync rather aggressively.]
> > >
> > > Perhaps the issue you are encountering involves things not in my
> > > environment (such as VMs or ZFS)?
> > >
> > > Peace,
> > > david
> > > --
> > > David H. Wolfskill da...@catwhisker.org
> > > Life is not intended to be a zero-sum game.
> > >
> > > See https://www.catwhisker.org/~david/publickey.gpg for my public key.
> >
> > For me, on bare metal (non-vm) amd64 with root-on-ZFS,
> >
> >   Fails to boot to multiuser at git: 8db1669959ce
> >   Boot fine at git: 0b79a76f8487
> >
> > Boot to singleuser is fine even with failed revision.
> >
> > Failure mode:
> >   Hard hangup or spinning and non-operable. Hard power-off needed.
> >   Seems to happen after starting rc.conf processing and before setting
> >   hostid.
> >
> > --
> > Tomoaki AOKI
> >
>
> Additional info and correction.
>  *Hung up before setting hostuuid, not hostid.
>
>  *^T doesn't respond at all, only hard power off worked.
>
>  *`kldload nvidia-modeset.ko` on single user mode sanely work.
>
>
> Why I could know rc.conf is started to be processed:
>
> I have lines below at the end of /etc/rc.conf and its output is always
> the first line related to /etc/rc.conf, at least for non-verbose boot.
> The next line is normally "Setting hostuuid: " line, which was not
> displayed when boot hung up.
>
>
> kldstat -q -n nvidia.ko
> if [ 0 -ne $? ] ; then
>   echo "Loading nvidia-driver modules via rc.conf."
>   if [ -e /boot/modules/nvidia-modeset.ko ] ; then
>     kld_list="${kld_list} nvidia-modeset.ko"
>   else
>     kld_list="${kld_list} nvidia.ko"
>   fi
> fi

If you do not load nvidia-modeset.ko at all, does the boot proceed?

When the boot hangs, can you enter into ddb?
Re: latest current fails to boot.
On Wed, 22 Sep 2021 23:09:05 +0900
Tomoaki AOKI wrote:

> On Wed, 22 Sep 2021 05:47:46 -0700
> David Wolfskill wrote:
>
> > On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
> > > I did a git pull this morning and it fails to boot.
> > > I hangs at Setting hostid : 0x917bf354
> > >
> > > This is a vm running on vmware.
> > > If i boot the old kernel from yesterday it boots normally.
> > >
> > > uname -a
> > > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
> > > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
> > > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64
> > >
> >
> > I had no issues with my build machine or either of two laptops, either
> > from yesterday:
> >
> > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358
> > main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021
> > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > amd64 1400033 1400033
> >
> > or today:
> >
> > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359
> > main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021
> > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > amd64 1400033 1400033
> >
> > [uname strings from my main laptop shown, but I keep the machines
> > in sync rather aggressively.]
> >
> > Perhaps the issue you are encountering involves things not in my
> > environment (such as VMs or ZFS)?
> >
> > Peace,
> > david
> > --
> > David H. Wolfskill da...@catwhisker.org
> > Life is not intended to be a zero-sum game.
> >
> > See https://www.catwhisker.org/~david/publickey.gpg for my public key.
>
> For me, on bare metal (non-vm) amd64 with root-on-ZFS,
>
> Fails to boot to multiuser at git: 8db1669959ce
> Boot fine at git: 0b79a76f8487
>
> Boot to singleuser is fine even with failed revision.
>
> Failure mode:
> Hard hangup or spinning and non-operable. Hard power-off needed.
> Seems to happen after starting rc.conf processing and before setting
> hostid.
>
> --
> Tomoaki AOKI
>

Additional info and a correction.

 * It hung before setting hostuuid, not hostid.

 * ^T does not respond at all; only a hard power-off worked.

 * `kldload nvidia-modeset.ko` in single-user mode works sanely.


How I know that rc.conf processing had started:

I have the lines below at the end of /etc/rc.conf, and their output is always
the first line related to /etc/rc.conf, at least for a non-verbose boot.
The next line is normally the "Setting hostuuid: " line, which was not
displayed when the boot hung.


kldstat -q -n nvidia.ko
if [ 0 -ne $? ] ; then
    echo "Loading nvidia-driver modules via rc.conf."
    if [ -e /boot/modules/nvidia-modeset.ko ] ; then
        kld_list="${kld_list} nvidia-modeset.ko"
    else
        kld_list="${kld_list} nvidia.ko"
    fi
fi

--
Tomoaki AOKI
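For readers who want to try the module-selection logic from the rc.conf fragment above outside of a boot, here is a minimal sketch. The function name `pick_nvidia_module` and its optional directory argument are assumptions made for illustration; they are not part of FreeBSD's rc system or of the original message. It only reproduces the "prefer nvidia-modeset.ko if it exists, else nvidia.ko" choice and leaves out the `kldstat` check.

```sh
# Hypothetical helper (not from the original message or from rc(8)):
# prints the module name that the rc.conf fragment would append to kld_list.
# The directory argument is an assumption added so the check can be pointed
# at a scratch directory; it defaults to /boot/modules as in the original.
pick_nvidia_module() {
    module_dir="${1:-/boot/modules}"
    if [ -e "${module_dir}/nvidia-modeset.ko" ] ; then
        echo "nvidia-modeset.ko"
    else
        echo "nvidia.ko"
    fi
}
```

In rc.conf this could presumably be used as `kld_list="${kld_list} $(pick_nvidia_module)"`, still guarded by the `kldstat -q -n nvidia.ko` test from the original fragment.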
Re: latest current fails to boot.
On 22/09/2021 16:09, Tomoaki AOKI wrote:
> On Wed, 22 Sep 2021 05:47:46 -0700
> David Wolfskill wrote:
>
> > On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
> > > I did a git pull this morning and it fails to boot.
> > > I hangs at Setting hostid : 0x917bf354
> > >
> > > This is a vm running on vmware.
> > > If i boot the old kernel from yesterday it boots normally.
> > >
> > > uname -a
> > > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
> > > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
> > > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64
> >
> > I had no issues with my build machine or either of two laptops, either
> > from yesterday:
> >
> > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358
> > main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021
> > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > amd64 1400033 1400033
> >
> > or today:
> >
> > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359
> > main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021
> > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > amd64 1400033 1400033
> >
> > [uname strings from my main laptop shown, but I keep the machines
> > in sync rather aggressively.]
> >
> > Perhaps the issue you are encountering involves things not in my
> > environment (such as VMs or ZFS)?
> >
> > Peace,
> > david
> > --
> > David H. Wolfskill da...@catwhisker.org
> > Life is not intended to be a zero-sum game.
> >
> > See https://www.catwhisker.org/~david/publickey.gpg for my public key.
>
> For me, on bare metal (non-vm) amd64 with root-on-ZFS,
>
> Fails to boot to multiuser at git: 8db1669959ce
> Boot fine at git: 0b79a76f8487
>
> Boot to singleuser is fine even with failed revision.
>
> Failure mode:
> Hard hangup or spinning and non-operable. Hard power-off needed.
> Seems to happen after starting rc.conf processing and before setting
> hostid.

For me, booting in single-user mode also works.
Re: latest current fails to boot.
On Wed, 22 Sep 2021 05:47:46 -0700
David Wolfskill wrote:

> On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
> > I did a git pull this morning and it fails to boot.
> > I hangs at Setting hostid : 0x917bf354
> >
> > This is a vm running on vmware.
> > If i boot the old kernel from yesterday it boots normally.
> >
> > uname -a
> > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
> > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
> > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64
> >
>
> I had no issues with my build machine or either of two laptops, either
> from yesterday:
>
> FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358
> main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021
> r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> amd64 1400033 1400033
>
> or today:
>
> FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359
> main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021
> r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> amd64 1400033 1400033
>
> [uname strings from my main laptop shown, but I keep the machines
> in sync rather aggressively.]
>
> Perhaps the issue you are encountering involves things not in my
> environment (such as VMs or ZFS)?
>
> Peace,
> david
> --
> David H. Wolfskill da...@catwhisker.org
> Life is not intended to be a zero-sum game.
>
> See https://www.catwhisker.org/~david/publickey.gpg for my public key.

For me, on bare metal (non-vm) amd64 with root-on-ZFS,

Fails to boot to multiuser at git: 8db1669959ce
Boot fine at git: 0b79a76f8487

Boot to singleuser is fine even with failed revision.

Failure mode:
Hard hangup or spinning and non-operable. Hard power-off needed.
Seems to happen after starting rc.conf processing and before setting
hostid.

--
Tomoaki AOKI
Re: latest current fails to boot.
On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
> I did a git pull this morning and it fails to boot.
> I hangs at Setting hostid : 0x917bf354
>
> This is a vm running on vmware.
> If i boot the old kernel from yesterday it boots normally.
>
> uname -a
> FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
> main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
> root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64

You did not provide any useful information. What is displayed on the console
if you press ^T? What processes are running, and what are they waiting for?
Re: latest current fails to boot.
On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
> I did a git pull this morning and it fails to boot.
> I hangs at Setting hostid : 0x917bf354
>
> This is a vm running on vmware.
> If i boot the old kernel from yesterday it boots normally.
>
> uname -a
> FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
> main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
> root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64
>

I had no issues with my build machine or either of two laptops, either
from yesterday:

FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358
main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021
r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
amd64 1400033 1400033

or today:

FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359
main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021
r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
amd64 1400033 1400033

[uname strings from my main laptop shown, but I keep the machines
in sync rather aggressively.]

Perhaps the issue you are encountering involves things not in my
environment (such as VMs or ZFS)?

Peace,
david
--
David H. Wolfskill da...@catwhisker.org
Life is not intended to be a zero-sum game.

See https://www.catwhisker.org/~david/publickey.gpg for my public key.