Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot
On 2017-Dec-12, at 3:19 PM, Mark Millard wrote: > On 2017-Dec-12, at 2:02 PM, Andreas Schwarz wrote: > >> On 12.12.17, Mark Millard wrote: >> >>> I initially jumped from -r326192 to -r326726 and >>> ended up with a rpi2 that would normally hang >>> somewhere around release APs being displayed. >>> (I have had a couple of completed boots but many >>> dozens of hung-up attempts.) Both a debug kernel >>> and a non-debug kernel hang the same way. >>> >>> Bisecting the kernel (holding world -r326726 >>> constant) showed: >>> >>> -r326346 did not hang (nor did before) >>> -r326347 and later hung. >> >> JFYI, the latest kernel (and world) running at one of my >> RPI2-B is r326631, without any issues. > > Interesting. (By the way: My context > is with a V1.1 Cortex-A7 based rpi2, > not V1.2 and Cortex-A53.) > > I've almost always run the root file > system being on a USB SSD instead of > on mmcsd0 . I wonder if that is > somehow involved since it may be > unusual. UFS file system. > > The USB SSD is on a powered hub that > is in turn plugged into the rpi2. > > [I had the hang problem before the > following and after.] > > The mechanism for holding mmcsd0 in > failed recently but the ejection > mechanism still works. So I hold > in mmcsd0 until after I get a USB > SSD boot now. (Interrupt boot, unload, > boot/autoboot, picks up the kernel > from the USB SSD.) > > This means that I effectively can > not avoid the USB SSD any more > unless I get my hands on a different > V1.1 rpi2. Looks like I'll get my hands on a different rpi2 V1.1 in a few days. So I should then be able to do reasonable mmcsd0-only experiments. At least once I find the time. FYI, in case boot details are involved in reproducing the problem. . . On the mmcsd0 I have /boot/loader.conf with: kern.cam.boot_delay="1" vfs.mountroot.timeout="10" and /etc/fstab with: /dev/ufs/RPI2rootfs / ufs rw,noatime 1 1 /dev/label/RPI2Aswapnoneswapsw 0 0 /dev/label/RPI2Aboot/boot/msdos msdosfs rw,noatime 0 0 where the /dev/ufs/RPI2rootfs was the USB SSD. However, I interrupt the loader and unload and then boot or autoboot. (But the hangs started before this extra sequence was involved.) On the USB SSD I have /boot/loader.conf with: kern.cam.boot_delay="1" vfs.mountroot.timeout="10" and /etc/fstab with: /dev/da0p1 / ufs rw,noatime 1 1 /dev/da0p2 noneswapsw 0 0 What db> showed does point to things that -r326347 involve: chain 32: thread 11 (pid 1, kernel) blocked on sx "umadrain" XLOCK thread 100077 (pid 23, uma) is on a run queue But for all I know -r326347 could be depending on something required to be true but not correct elsewhere in the rpi2 support. I'm not claiming that -r326347 is wrong, just that it is involved. I've way to little knowledge to claim to know what is wrong on the evidence that I have. I've not yet tried a bpi-m3 Cortex-A7 context or a pine64+ 2GB or rpi3 Cortex-A53 context. Nor powerpc64 nor powerpc. At some point I'll get the time for one or more of these. I've not had amd64 problems in this area. I may not be able to test the bpi-m3: its barrel connector for power seems flaky and it is difficult to keep the board powered for long periods in recent times. === Mark Millard markmi at dsl-only.net ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot [backtraces added]
[Just adding back traces. ffs_mount use of uma_zcreate is involved in the kernel thread, as is uma_reclaim_worker's use of uma_reclaim_locked in the uma thread.] On 2017-Dec-12, at 4:20 PM, Mark Millard wrote: > On 2017-Dec-12, at 3:19 PM, Mark Millard wrote: > >> On 2017-Dec-12, at 2:02 PM, Andreas Schwarz >> wrote: >> >>> On 12.12.17, Mark Millard wrote: >>> I initially jumped from -r326192 to -r326726 and ended up with a rpi2 that would normally hang somewhere around release APs being displayed. (I have had a couple of completed boots but many dozens of hung-up attempts.) Both a debug kernel and a non-debug kernel hang the same way. Bisecting the kernel (holding world -r326726 constant) showed: -r326346 did not hang (nor did before) -r326347 and later hung. >>> >>> JFYI, the latest kernel (and world) running at one of my >>> RPI2-B is r326631, without any issues. >> >> Interesting. (By the way: My context >> is with a V1.1 Cortex-A7 based rpi2, >> not V1.2 and Cortex-A53.) >> >> I've almost always run the root file >> system being on a USB SSD instead of >> on mmcsd0 . I wonder if that is >> somehow involved since it may be >> unusual. UFS file system. >> >> The USB SSD is on a powered hub that >> is in turn plugged into the rpi2. >> >> [I had the hang problem before the >> following and after.] >> >> The mechanism for holding mmcsd0 in >> failed recently but the ejection >> mechanism still works. So I hold >> in mmcsd0 until after I get a USB >> SSD boot now. (Interrupt boot, unload, >> boot/autoboot, picks up the kernel >> from the USB SSD.) >> >> This means that I effectively can >> not avoid the USB SSD any more >> unless I get my hands on a different >> V1.1 rpi2. > > Looks like I'll get my hands on a different > rpi2 V1.1 in a few days. So I should then > be able to do reasonable mmcsd0-only > experiments. At least once I find the time. > > FYI, in case boot details are involved > in reproducing the problem. . . > > On the mmcsd0 I have /boot/loader.conf with: > > kern.cam.boot_delay="1" > vfs.mountroot.timeout="10" > > and /etc/fstab with: > > /dev/ufs/RPI2rootfs / ufs rw,noatime 1 1 > /dev/label/RPI2Aswapnoneswapsw 0 0 > /dev/label/RPI2Aboot/boot/msdos msdosfs rw,noatime 0 0 > > where the /dev/ufs/RPI2rootfs was the USB SSD. > > However, I interrupt the loader and unload and > then boot or autoboot. (But the hangs started > before this extra sequence was involved.) > > On the USB SSD I have /boot/loader.conf with: > > kern.cam.boot_delay="1" > vfs.mountroot.timeout="10" > > and /etc/fstab with: > > /dev/da0p1 / ufs rw,noatime 1 1 > /dev/da0p2 noneswapsw 0 0 > > > What db> showed does point to things that > -r326347 involve: > > chain 32: > thread 11 (pid 1, kernel) blocked on sx "umadrain" XLOCK > thread 100077 (pid 23, uma) is on a run queue > > But for all I know -r326347 could be depending > on something required to be true but not correct > elsewhere in the rpi2 support. I'm not claiming > that -r326347 is wrong, just that it is involved. > I've way to little knowledge to claim to know > what is wrong on the evidence that I have. > > I've not yet tried a bpi-m3 Cortex-A7 context or > a pine64+ 2GB or rpi3 Cortex-A53 context. Nor > powerpc64 nor powerpc. At some point I'll get the > time for one or more of these. I've not had amd64 > problems in this area. > > I may not be able to test the bpi-m3: its barrel > connector for power seems flaky and it is > difficult to keep the board powered for long > periods in recent times. For this new example -r326347 kernel based boot-hang: chain 35: thread 11 (pid 1, kernel) blocked on sx "umadrain" XLOCK thread 100073 (pid 23, uma) is on a run queue db> bt Tracing pid 1 tid 11 td 0xd429f000 cpu_switch() at cpu_switch+0x18 pc = 0xc0584f80 lr = 0xc0299ad8 (sched_switch+0x5c0) sp = 0xd42a56b8 fp = 0xd42a56e8 sched_switch() at sched_switch+0x5c0 pc = 0xc0299ad8 lr = 0xc0275f28 (mi_switch+0x258) sp = 0xd42a56f0 fp = 0xd42a5718 r4 = 0x0001f0f5 r5 = 0x r6 = 0xd429f000 r7 = 0x0006612d r8 = 0x r9 = 0x0104 r10 = 0xc08073f0 mi_switch() at mi_switch+0x258 pc = 0xc0275f28 lr = 0xc02c1508 (sleepq_switch+0x1c0) sp = 0xd42a5720 fp = 0xd42a5748 r4 = 0xd429f000 r5 = 0xc0947dc4 r6 = 0xc09a9c68 r7 = 0xc0947dc0 r8 = 0x r9 = 0xc09440c0 r10 = 0x00f4 sleepq_switch() at sleepq_switch+0x1c0 pc = 0xc02c1508 lr = 0xc02c1308 (sleepq_wait+0x48) sp = 0xd42a5750 fp = 0xd42a5760 r4 = 0xd429f000 r5 = 0x r6 = 0xc09a9c68 r7 = 0xc06b415c r8 = 0x0001 r9 = 0xc09a9c78 r10 = 0x sleepq_wait() at sleepq
Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot
On 12.12.17, Mark Millard wrote: > On 2017-Dec-12, at 2:02 PM, Andreas Schwarz wrote: >> JFYI, the latest kernel (and world) running at one of my >> RPI2-B is r326631, without any issues. > > Interesting. (By the way: My context > is with a V1.1 Cortex-A7 based rpi2, > not V1.2 and Cortex-A53.) Same here. The RPI2-B (not V1.2) is the most stable system of all my SBCs running FreeBSD. > I've almost always run the root file > system being on a USB SSD instead of > on mmcsd0 . I wonder if that is > somehow involved since it may be > unusual. UFS file system. > > The USB SSD is on a powered hub that > is in turn plugged into the rpi2. Ok, I don't use an USB HDD, the limited USB bridge is the weakest point at the whole rpi, so there are probably not much users with this setup. I understand that you are forced to use ist -asc ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot
On 12.12.17, Mark Millard wrote: > I initially jumped from -r326192 to -r326726 and > ended up with a rpi2 that would normally hang > somewhere around release APs being displayed. > (I have had a couple of completed boots but many > dozens of hung-up attempts.) Both a debug kernel > and a non-debug kernel hang the same way. > > Bisecting the kernel (holding world -r326726 > constant) showed: > > -r326346 did not hang (nor did before) > -r326347 and later hung. JFYI, the latest kernel (and world) running at one of my RPI2-B is r326631, without any issues. -asc ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot
On 2017-Dec-12, at 2:02 PM, Andreas Schwarz wrote: > On 12.12.17, Mark Millard wrote: > >> I initially jumped from -r326192 to -r326726 and >> ended up with a rpi2 that would normally hang >> somewhere around release APs being displayed. >> (I have had a couple of completed boots but many >> dozens of hung-up attempts.) Both a debug kernel >> and a non-debug kernel hang the same way. >> >> Bisecting the kernel (holding world -r326726 >> constant) showed: >> >> -r326346 did not hang (nor did before) >> -r326347 and later hung. > > JFYI, the latest kernel (and world) running at one of my > RPI2-B is r326631, without any issues. Interesting. (By the way: My context is with a V1.1 Cortex-A7 based rpi2, not V1.2 and Cortex-A53.) I've almost always run the root file system being on a USB SSD instead of on mmcsd0 . I wonder if that is somehow involved since it may be unusual. UFS file system. The USB SSD is on a powered hub that is in turn plugged into the rpi2. [I had the hang problem before the following and after.] The mechanism for holding mmcsd0 in failed recently but the ejection mechanism still works. So I hold in mmcsd0 until after I get a USB SSD boot now. (Interrupt boot, unload, boot/autoboot, picks up the kernel from the USB SSD.) This means that I effectively can not avoid the USB SSD any more unless I get my hands on a different V1.1 rpi2. === Mark Millard markmi at dsl-only.net ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot
I initially jumped from -r326192 to -r326726 and ended up with a rpi2 that would normally hang somewhere around release APs being displayed. (I have had a couple of completed boots but many dozens of hung-up attempts.) Both a debug kernel and a non-debug kernel hang the same way. Bisecting the kernel (holding world -r326726 constant) showed: -r326346 did not hang (nor did before) -r326347 and later hung. Conclusion: -r326347 changes are involved in the boot hangups. Context: head -r326726 based (from before I did the bisect). My knowledge is limited so I may not have picked optimal material to get from the db> prompt. It appears that the messages about "random" occur during the hang (indefinite wait). As I remember, I have seen examples where the "Trying to mount" did not show up --but normally it did. And example from a hung boot: . . . Release APs Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]... random: unblocking device. arc4random: no preloaded entropy cache arc4random: no preloaded entropy cache arc4random: no preloaded entropy cache The hang never seems to time out but just sits there, even for hours. It allows the ~^B sequence to get to the db> prompt. I've looked around a little a couple of times. One common point is that show allchains has everything listed as sleeping except: chain 32: thread 11 (pid 1, kernel) blocked on sx "umadrain" XLOCK thread 100077 (pid 23, uma) is on a run queue chain 33: (Note: uma's thread number has varied, as has the one for [pagedaemon].) bd> ps pid ppid pgrp uid state wmesg wchan cmd 28 0 0 0 DL syncer 0xc095a31c [syncer] 27 0 0 0 DL vlruwt 0xd7417730 [vnlru] 26 0 0 0 DL psleep 0xc0959c00 [bufdaemon] 25 0 0 0 RL [bufspacedaemon] 24 0 0 0 DL psleep 0xc095e8f8 [vmdaemon] 23 0 0 0 RL (threaded) [pagedaemon] 100065 RunQ[pagedaemon] 100076 D launds 0xc095e804 [laundry: dom0] 100077 RunQ[uma] . . . 1 0 0 0 DL umadrai 0xc095e528 [kernel] 0 0 0 0 RLs (threaded) [kernel] 10 D swapin 0xc09673c8 [swapper] . . . 100072 D - 0xd75b7e80 [if_io_tqg_1] 100073 RunQ[if_io_tqg_2] 100074 D - 0xd75b7d80 [if_io_tqg_3] 100075 D - 0xd75b7d00 [if_config_tqg_0] 100078 D - 0xd83dc100 [softirq_0] 100079 D - 0xd75b7c00 [softirq_1] 100080 RunQ[softirq_2] 100081 D - 0xd75b7b00 [softirq_3] (Which if_io_tqg_ and softirq_ pair has RunQ varies.) All RunQ's are shown above. One or two [idle: CPU]'s have state CanRun and the other [idle: CPU]'s have state RUN. They are the only items with those states. Example from the same hangup: 10 0 0 0 RL (threaded) [idle] 12 Run CPU 0 [idle: cpu0] 13 Run CPU 1 [idle: cpu1] 14 CanRun [idle: cpu2] 15 CanRun [idle: cpu3] db> show lock 0xc095e528 class: sx name: umadrain state: XLOCK: 0xd6cbe740 (tid 100077, pid 23, "uma") waiters: shared db> show thread 11 Thread 11 at 0xd40a7000: proc (pid 1): 0xd40a3000 name: kernel stack: 0xd40ac000-0xd40adfff flags: 0x4 pflags: 0x2000 state: INHIBITED: {SLEEPING} wmesg: umadrain wchan: 0xc095e528 sleeptimo 0. 0 (curr 51d. 5eac6a04) priority: 84 container lock: sleepq chain (0xc0957244) last voluntary switch: 1297717 ms ago last involuntary switch: 1297809 ms ago db> show thread 100077 Thread 100077 at 0xd6cbe740: proc (pid 23): 0xd6cab000 name: uma stack: 0xd83ca000-0xd83cbfff flags: 0x4 pflags: 0x20 state: RUNQ priority: 84 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297815 ms ago last involuntary switch: 1297815 ms ago db> show thread 100073 Thread 100073 at 0xd7406740: proc (pid 0): 0xc09673c8 name: if_io_tqg_2 stack: 0xd742a000-0xd742bfff flags: 0x4 pflags: 0x20 state: RUNQ priority: 24 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297818 ms ago db> show thread 100080 Thread 100080 at 0xd7431ae0: proc (pid 0): 0xc09673c8 name: softirq_2 stack: 0xd83f5000-0xd83f6fff flags: 0x4 pflags: 0x20 state: RUNQ priority: 24 container lock: sched lock 2 (0xc0952640) last voluntary switch: 1297816 ms ago db> show lock 0xc0952640 class: spin mutex name: sched lock 2 flags: {SPIN, RECURSE} state: {UNOWNED} db> show lock 0xc0957244 class: spin mutex name: sleepq chain flags: {SPIN, RECURSE} state: {UNOWNED}