Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot

2017-12-12 Thread Mark Millard
On 2017-Dec-12, at 3:19 PM, Mark Millard  wrote:

> On 2017-Dec-12, at 2:02 PM, Andreas Schwarz  wrote:
> 
>> On 12.12.17, Mark Millard wrote:
>> 
>>> I initially jumped from -r326192 to -r326726 and
>>> ended up with a rpi2 that would normally hang
>>> somewhere around release APs being displayed.
>>> (I have had a couple of completed boots but many
>>> dozens of hung-up attempts.) Both a debug kernel
>>> and a non-debug kernel hang the same way.
>>> 
>>> Bisecting the kernel (holding world -r326726
>>> constant) showed:
>>> 
>>> -r326346 did not hang (nor did before)
>>> -r326347 and later hung.
>> 
>> JFYI, the latest kernel (and world) running at one of my 
>> RPI2-B is r326631, without any issues.
> 
> Interesting. (By the way: My context
> is with a V1.1 Cortex-A7 based rpi2,
> not V1.2 and Cortex-A53.)
> 
> I've almost always run the root file
> system being on a USB SSD instead of
> on mmcsd0 . I wonder if that is
> somehow involved since it may be
> unusual. UFS file system.
> 
> The USB SSD is on a powered hub that
> is in turn plugged into the rpi2.
> 
> [I had the hang problem before the
> following and after.]
> 
> The mechanism for holding mmcsd0 in
> failed recently but the ejection
> mechanism still works. So I hold
> in mmcsd0 until after I get a USB
> SSD boot now. (Interrupt boot, unload,
> boot/autoboot, picks up the kernel
> from the USB SSD.)
> 
> This means that I effectively can
> not avoid the USB SSD any more
> unless I get my hands on a different
> V1.1 rpi2.

Looks like I'll get my hands on a different
rpi2 V1.1 in a few days. So I should then
be able to do reasonable mmcsd0-only
experiments. At least once I find the time.

FYI, in case boot details are involved
in reproducing the problem. . .

On the mmcsd0 I have /boot/loader.conf with:

kern.cam.boot_delay="1"
vfs.mountroot.timeout="10"

and /etc/fstab with:

/dev/ufs/RPI2rootfs /   ufs rw,noatime  1 1
/dev/label/RPI2Aswapnoneswapsw  0 0
/dev/label/RPI2Aboot/boot/msdos msdosfs rw,noatime  0 0

where the /dev/ufs/RPI2rootfs was the USB SSD.

However, I interrupt the loader and unload and
then boot or autoboot. (But the hangs started
before this extra sequence was involved.)

On the USB SSD I have /boot/loader.conf with:

kern.cam.boot_delay="1"
vfs.mountroot.timeout="10"

and /etc/fstab with:

/dev/da0p1  /   ufs rw,noatime  1 1
/dev/da0p2  noneswapsw  0 0


What db> showed does point to things that
-r326347 involve:

chain 32:
thread 11 (pid 1, kernel) blocked on sx "umadrain" XLOCK
thread 100077 (pid 23, uma) is on a run queue

But for all I know -r326347 could be depending
on something required to be true but not correct
elsewhere in the rpi2 support. I'm not claiming
that -r326347 is wrong, just that it is involved.
I've way to little knowledge to claim to know
what is wrong on the evidence that I have.

I've not yet tried a bpi-m3 Cortex-A7 context or
a pine64+ 2GB or rpi3 Cortex-A53 context. Nor
powerpc64 nor powerpc. At some point I'll get the
time for one or more of these. I've not had amd64
problems in this area.

I may not be able to test the bpi-m3: its barrel
connector for power seems flaky and it is
difficult to keep the board powered for long
periods in recent times.

===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot [backtraces added]

2017-12-12 Thread Mark Millard
[Just adding back traces. ffs_mount
use of uma_zcreate is involved in the
kernel thread, as is uma_reclaim_worker's
use of uma_reclaim_locked in the uma
thread.]

On 2017-Dec-12, at 4:20 PM, Mark Millard  wrote:

> On 2017-Dec-12, at 3:19 PM, Mark Millard  wrote:
> 
>> On 2017-Dec-12, at 2:02 PM, Andreas Schwarz  
>> wrote:
>> 
>>> On 12.12.17, Mark Millard wrote:
>>> 
 I initially jumped from -r326192 to -r326726 and
 ended up with a rpi2 that would normally hang
 somewhere around release APs being displayed.
 (I have had a couple of completed boots but many
 dozens of hung-up attempts.) Both a debug kernel
 and a non-debug kernel hang the same way.
 
 Bisecting the kernel (holding world -r326726
 constant) showed:
 
 -r326346 did not hang (nor did before)
 -r326347 and later hung.
>>> 
>>> JFYI, the latest kernel (and world) running at one of my 
>>> RPI2-B is r326631, without any issues.
>> 
>> Interesting. (By the way: My context
>> is with a V1.1 Cortex-A7 based rpi2,
>> not V1.2 and Cortex-A53.)
>> 
>> I've almost always run the root file
>> system being on a USB SSD instead of
>> on mmcsd0 . I wonder if that is
>> somehow involved since it may be
>> unusual. UFS file system.
>> 
>> The USB SSD is on a powered hub that
>> is in turn plugged into the rpi2.
>> 
>> [I had the hang problem before the
>> following and after.]
>> 
>> The mechanism for holding mmcsd0 in
>> failed recently but the ejection
>> mechanism still works. So I hold
>> in mmcsd0 until after I get a USB
>> SSD boot now. (Interrupt boot, unload,
>> boot/autoboot, picks up the kernel
>> from the USB SSD.)
>> 
>> This means that I effectively can
>> not avoid the USB SSD any more
>> unless I get my hands on a different
>> V1.1 rpi2.
> 
> Looks like I'll get my hands on a different
> rpi2 V1.1 in a few days. So I should then
> be able to do reasonable mmcsd0-only
> experiments. At least once I find the time.
> 
> FYI, in case boot details are involved
> in reproducing the problem. . .
> 
> On the mmcsd0 I have /boot/loader.conf with:
> 
> kern.cam.boot_delay="1"
> vfs.mountroot.timeout="10"
> 
> and /etc/fstab with:
> 
> /dev/ufs/RPI2rootfs /   ufs rw,noatime  1 1
> /dev/label/RPI2Aswapnoneswapsw  0 0
> /dev/label/RPI2Aboot/boot/msdos msdosfs rw,noatime  0 0
> 
> where the /dev/ufs/RPI2rootfs was the USB SSD.
> 
> However, I interrupt the loader and unload and
> then boot or autoboot. (But the hangs started
> before this extra sequence was involved.)
> 
> On the USB SSD I have /boot/loader.conf with:
> 
> kern.cam.boot_delay="1"
> vfs.mountroot.timeout="10"
> 
> and /etc/fstab with:
> 
> /dev/da0p1  /   ufs rw,noatime  1 1
> /dev/da0p2  noneswapsw  0 0
> 
> 
> What db> showed does point to things that
> -r326347 involve:
> 
> chain 32:
> thread 11 (pid 1, kernel) blocked on sx "umadrain" XLOCK
> thread 100077 (pid 23, uma) is on a run queue
> 
> But for all I know -r326347 could be depending
> on something required to be true but not correct
> elsewhere in the rpi2 support. I'm not claiming
> that -r326347 is wrong, just that it is involved.
> I've way to little knowledge to claim to know
> what is wrong on the evidence that I have.
> 
> I've not yet tried a bpi-m3 Cortex-A7 context or
> a pine64+ 2GB or rpi3 Cortex-A53 context. Nor
> powerpc64 nor powerpc. At some point I'll get the
> time for one or more of these. I've not had amd64
> problems in this area.
> 
> I may not be able to test the bpi-m3: its barrel
> connector for power seems flaky and it is
> difficult to keep the board powered for long
> periods in recent times.

For this new example -r326347 kernel based
boot-hang:

chain 35:
 thread 11 (pid 1, kernel) blocked on sx "umadrain" XLOCK
 thread 100073 (pid 23, uma) is on a run queue

db> bt
Tracing pid 1 tid 11 td 0xd429f000
cpu_switch() at cpu_switch+0x18
 pc = 0xc0584f80  lr = 0xc0299ad8 (sched_switch+0x5c0)
 sp = 0xd42a56b8  fp = 0xd42a56e8
sched_switch() at sched_switch+0x5c0
 pc = 0xc0299ad8  lr = 0xc0275f28 (mi_switch+0x258)
 sp = 0xd42a56f0  fp = 0xd42a5718
 r4 = 0x0001f0f5  r5 = 0x
 r6 = 0xd429f000  r7 = 0x0006612d
 r8 = 0x  r9 = 0x0104
r10 = 0xc08073f0
mi_switch() at mi_switch+0x258
 pc = 0xc0275f28  lr = 0xc02c1508 (sleepq_switch+0x1c0)
 sp = 0xd42a5720  fp = 0xd42a5748
 r4 = 0xd429f000  r5 = 0xc0947dc4
 r6 = 0xc09a9c68  r7 = 0xc0947dc0
 r8 = 0x  r9 = 0xc09440c0
r10 = 0x00f4
sleepq_switch() at sleepq_switch+0x1c0
 pc = 0xc02c1508  lr = 0xc02c1308 (sleepq_wait+0x48)
 sp = 0xd42a5750  fp = 0xd42a5760
 r4 = 0xd429f000  r5 = 0x
 r6 = 0xc09a9c68  r7 = 0xc06b415c
 r8 = 0x0001  r9 = 0xc09a9c78
r10 = 0x
sleepq_wait() at sleepq

Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot

2017-12-12 Thread Andreas Schwarz
On 12.12.17, Mark Millard wrote:

> On 2017-Dec-12, at 2:02 PM, Andreas Schwarz  wrote:

>> JFYI, the latest kernel (and world) running at one of my 
>> RPI2-B is r326631, without any issues.
>
> Interesting. (By the way: My context
> is with a V1.1 Cortex-A7 based rpi2,
> not V1.2 and Cortex-A53.)

Same here. The RPI2-B (not V1.2) is the most stable system of all my SBCs
running FreeBSD.


> I've almost always run the root file
> system being on a USB SSD instead of
> on mmcsd0 . I wonder if that is
> somehow involved since it may be
> unusual. UFS file system.
>
> The USB SSD is on a powered hub that
> is in turn plugged into the rpi2.

Ok, I don't use an USB HDD, the limited USB bridge is the weakest point at 
the whole rpi, so there are probably not much users with this setup. I 
understand 
that you are forced to use ist

-asc

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot

2017-12-12 Thread Andreas Schwarz
On 12.12.17, Mark Millard wrote:

> I initially jumped from -r326192 to -r326726 and
> ended up with a rpi2 that would normally hang
> somewhere around release APs being displayed.
> (I have had a couple of completed boots but many
> dozens of hung-up attempts.) Both a debug kernel
> and a non-debug kernel hang the same way.
>
> Bisecting the kernel (holding world -r326726
> constant) showed:
>
> -r326346 did not hang (nor did before)
> -r326347 and later hung.

JFYI, the latest kernel (and world) running at one of my 
RPI2-B is r326631, without any issues.


-asc

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot

2017-12-12 Thread Mark Millard
On 2017-Dec-12, at 2:02 PM, Andreas Schwarz  wrote:

> On 12.12.17, Mark Millard wrote:
> 
>> I initially jumped from -r326192 to -r326726 and
>> ended up with a rpi2 that would normally hang
>> somewhere around release APs being displayed.
>> (I have had a couple of completed boots but many
>> dozens of hung-up attempts.) Both a debug kernel
>> and a non-debug kernel hang the same way.
>> 
>> Bisecting the kernel (holding world -r326726
>> constant) showed:
>> 
>> -r326346 did not hang (nor did before)
>> -r326347 and later hung.
> 
> JFYI, the latest kernel (and world) running at one of my 
> RPI2-B is r326631, without any issues.

Interesting. (By the way: My context
is with a V1.1 Cortex-A7 based rpi2,
not V1.2 and Cortex-A53.)

I've almost always run the root file
system being on a USB SSD instead of
on mmcsd0 . I wonder if that is
somehow involved since it may be
unusual. UFS file system.

The USB SSD is on a powered hub that
is in turn plugged into the rpi2.

[I had the hang problem before the
following and after.]

The mechanism for holding mmcsd0 in
failed recently but the ejection
mechanism still works. So I hold
in mmcsd0 until after I get a USB
SSD boot now. (Interrupt boot, unload,
boot/autoboot, picks up the kernel
from the USB SSD.)

This means that I effectively can
not avoid the USB SSD any more
unless I get my hands on a different
V1.1 rpi2.

===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot

2017-12-12 Thread Mark Millard
I initially jumped from -r326192 to -r326726 and
ended up with a rpi2 that would normally hang
somewhere around release APs being displayed.
(I have had a couple of completed boots but many
dozens of hung-up attempts.) Both a debug kernel
and a non-debug kernel hang the same way.

Bisecting the kernel (holding world -r326726
constant) showed:

-r326346 did not hang (nor did before)
-r326347 and later hung.

Conclusion: -r326347 changes are involved in the
boot hangups.

Context: head -r326726 based (from before I did
the bisect). My knowledge is limited so I may not
have picked optimal material to get from the db>
prompt.

It appears that the messages about "random" occur
during the hang (indefinite wait). As I remember,
I have seen examples where the "Trying to mount"
did not show up --but normally it did.

And example from a hung boot:

. . .
Release APs
Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]...
random: unblocking device.
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache

The hang never seems to time out but just sits
there, even for hours.

It allows the ~^B sequence to get to the db>
prompt. I've looked around a little a couple of
times.

One common point is that show allchains has
everything listed as sleeping except:

chain 32:
thread 11 (pid 1, kernel) blocked on sx "umadrain" XLOCK
thread 100077 (pid 23, uma) is on a run queue
chain 33:

(Note: uma's thread number has varied, as has the one
for [pagedaemon].)

bd> ps
 pid  ppid  pgrp   uid  state   wmesg   wchan   cmd
  28 0 0 0  DL  syncer  0xc095a31c  [syncer]
  27 0 0 0  DL  vlruwt  0xd7417730  [vnlru]
  26 0 0 0  DL  psleep  0xc0959c00  [bufdaemon]
  25 0 0 0  RL  [bufspacedaemon]
  24 0 0 0  DL  psleep  0xc095e8f8  [vmdaemon]
  23 0 0 0  RL  (threaded)  [pagedaemon]
100065   RunQ[pagedaemon]
100076   D   launds  0xc095e804  [laundry: dom0]
100077   RunQ[uma]
. . .
   1 0 0 0  DL  umadrai 0xc095e528  [kernel]
   0 0 0 0  RLs (threaded)  [kernel]
10   D   swapin  0xc09673c8  [swapper]
. . .
100072   D   -   0xd75b7e80  [if_io_tqg_1]
100073   RunQ[if_io_tqg_2]
100074   D   -   0xd75b7d80  [if_io_tqg_3]
100075   D   -   0xd75b7d00  [if_config_tqg_0]
100078   D   -   0xd83dc100  [softirq_0]
100079   D   -   0xd75b7c00  [softirq_1]
100080   RunQ[softirq_2]
100081   D   -   0xd75b7b00  [softirq_3]

(Which if_io_tqg_ and softirq_ pair has RunQ varies.)

All RunQ's are shown above. One or two [idle: CPU]'s have
state CanRun and the other [idle: CPU]'s have state RUN.
They are the only items with those states. Example from the
same hangup:

  10 0 0 0  RL  (threaded)  [idle]
12   Run CPU 0   [idle: cpu0]
13   Run CPU 1   [idle: cpu1]
14   CanRun  [idle: cpu2]
15   CanRun  [idle: cpu3]


db> show lock 0xc095e528
class: sx
name: umadrain
state: XLOCK: 0xd6cbe740 (tid 100077, pid 23, "uma")
waiters: shared

db> show thread 11
Thread 11 at 0xd40a7000:
proc (pid 1): 0xd40a3000
name: kernel
stack: 0xd40ac000-0xd40adfff
flags: 0x4  pflags: 0x2000
state: INHIBITED: {SLEEPING}
wmesg: umadrain  wchan: 0xc095e528 sleeptimo 0. 0 (curr 51d. 5eac6a04)
priority: 84
container lock: sleepq chain (0xc0957244)
last voluntary switch: 1297717 ms ago
last involuntary switch: 1297809 ms ago

db> show thread 100077
Thread 100077 at 0xd6cbe740:
proc (pid 23): 0xd6cab000
name: uma
stack: 0xd83ca000-0xd83cbfff
flags: 0x4  pflags: 0x20
state: RUNQ
priority: 84
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297815 ms ago
last involuntary switch: 1297815 ms ago

db> show thread 100073
Thread 100073 at 0xd7406740:
proc (pid 0): 0xc09673c8
name: if_io_tqg_2
stack: 0xd742a000-0xd742bfff
flags: 0x4  pflags: 0x20
state: RUNQ
priority: 24
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297818 ms ago

db> show thread 100080
Thread 100080 at 0xd7431ae0:
proc (pid 0): 0xc09673c8
name: softirq_2
stack: 0xd83f5000-0xd83f6fff
flags: 0x4  pflags: 0x20
state: RUNQ
priority: 24
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297816 ms ago

db> show lock 0xc0952640
class: spin mutex
name: sched lock 2
flags: {SPIN, RECURSE}
state: {UNOWNED}

db> show lock 0xc0957244
class: spin mutex
name: sleepq chain
flags: {SPIN, RECURSE}
state: {UNOWNED}