Re: FYI: [My FreeBSD-12.0-CURRENT-arm64-aarch64.raw ] under qemu-system-aarch64 on odroid-c2 under UbuntuMate : [A combination that boots but gets some panics]
FYI: differences in FreeBSD's code vs.:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/BABJDBHI.html

and its "Example 11.5. Cleaning by Virtual Address". . .

(I do not claim to know that the differences point to any
error. But I note them in case they prompt something. A couple
of points do seem odd to me for the FreeBSD code.)

The ARM example code does:

DSB ISH
DSB ISH
ISB

This has the property of forcing visibility only after
everything is available to be visible (DSB ISH being referenced
here). (The FreeBSD code does not have this property, but
enforces visibility of a mix of old and new as it goes along.)
It is point-of-unification code (data and instruction) for its
purpose.

The FreeBSD arm64_dcache__range code always pairs:

DC CVAC (or CIVAC or IVAC)
DSB ISH

inside its loop, forcing visibility of its intermediate state as
it goes along. It is also point-of-coherency instead of
point-of-unification (because of its purpose).

The FreeBSD arm64_idcache_wbinv_range code always does the
sequence of 4:

DC CIVAC
DSB ISH
IC IVAU
DSB ISH

inside its loop and after the loop does one:

ISB

This is a mix of point-of-coherency and point-of-unification
code that forces visibility (DSB ISH) as it goes along, not just
after the overall loop. To me the mix of point-of-coherency and
point-of-unification seems strange.

The FreeBSD arm64_icache_sync_range code always does the
sequence of 4:

DC CVAU
DSB ISH
IC IVAU
DSB ISH

inside its loop and after the loop does one:

ISB

This is the closest to the "Example 11.5. Cleaning by Virtual
Address" code and its purpose. It is all point-of-unification
code that forces visibility (DSB ISH) as it goes along, not just
after the overall loop.

===
Mark Millard
markmi at dsl-only.net
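To make the barrier-placement difference concrete, here is a small Python model of the two orderings. It only builds lists of operation names and executes no cache maintenance. The helper names, the single 64-byte line size shared by the data and instruction caches, and the batched ordering (one DC CVAU pass, one DSB ISH, one IC IVAU pass, one DSB ISH, then ISB) are assumptions made for illustration. They reflect my reading of the ARM example and are not taken from the FreeBSD source.

```python
def cache_lines(start, length, line=64):
    """Yield the line-aligned addresses covering [start, start + length)."""
    addr = start & ~(line - 1)      # round down to a line boundary
    end = start + length
    while addr < end:
        yield addr
        addr += line


def per_line_barriers(start, length, line=64):
    """Barriers inside the loop, as described for arm64_icache_sync_range."""
    ops = []
    for addr in cache_lines(start, length, line):
        ops += [("DC CVAU", addr), ("DSB ISH", None),
                ("IC IVAU", addr), ("DSB ISH", None)]
    ops.append(("ISB", None))
    return ops


def batched_barriers(start, length, line=64):
    """Barriers only after each full pass (assumed shape of the ARM example)."""
    ops = [("DC CVAU", addr) for addr in cache_lines(start, length, line)]
    ops.append(("DSB ISH", None))
    ops += [("IC IVAU", addr) for addr in cache_lines(start, length, line)]
    ops.append(("DSB ISH", None))
    ops.append(("ISB", None))
    return ops


def count_ops(ops, name):
    """Count how many times the named operation appears in the sequence."""
    return sum(1 for op, _ in ops if op == name)


if __name__ == "__main__":
    per_line = per_line_barriers(0x1010, 0x100)
    batched = batched_barriers(0x1010, 0x100)
    print("DSB ISH per-line:", count_ops(per_line, "DSB ISH"))
    print("DSB ISH batched: ", count_ops(batched, "DSB ISH"))
```

For a 0x100-byte range starting at 0x1010 the model covers five 64-byte lines, so the per-line ordering issues ten DSB ISH operations where the batched ordering issues two. Both orderings end with a single ISB.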
Re: FYI: [My FreeBSD-12.0-CURRENT-arm64-aarch64.raw ] under qemu-system-aarch64 on odroid-c2 under UbuntuMate : [A combination that boots but gets some panics]
On 2017-May-2, at 2:59 PM, Mark Millard wrote:

> The code around handle_el1h_sync+0x70 :
>
> 00607804  sub   sp, sp, #0x80
> 00607808  sub   sp, sp, #0x120
> 0060780c  stp   x29, x30, [sp,#272]
> 00607810  stp   x28, x29, [sp,#256]
> 00607814  stp   x26, x27, [sp,#240]
> 00607818  stp   x24, x25, [sp,#224]
> 0060781c  stp   x22, x23, [sp,#208]
> 00607820  stp   x20, x21, [sp,#192]
> 00607824  stp   x18, x19, [sp,#176]
> 00607828  stp   x16, x17, [sp,#160]
> 0060782c  stp   x14, x15, [sp,#144]
> 00607830  stp   x12, x13, [sp,#128]
> 00607834  stp   x10, x11, [sp,#112]
> 00607838  stp   x8, x9, [sp,#96]
> 0060783c  stp   x6, x7, [sp,#80]
> 00607840  stp   x4, x5, [sp,#64]
> 00607844  stp   x2, x3, [sp,#48]
> 00607848  stp   x0, x1, [sp,#32]
> 0060784c  mrs   x10, elr_el1
> 00607850  mrs   x11, spsr_el1
> 00607854  mrs   x12, esr_el1
> 00607858  str   x10, [sp,#16]
> 0060785c  stp   w11, w12, [sp,#24]
> 00607860  stp   x18, x30, [sp]
> 00607864  mrs   x18, tpidr_el1
> 00607868  add   x29, sp, #0x110
> 0060786c  mov   x0, sp
> 00607870  bl    0061aad8
> 00607874  msr   daifset, #0x2
> 00607878  ldp   x18, x30, [sp]
> 0060787c  ldp   x10, x11, [sp,#16]
> 00607880  msr   spsr_el1, x11
> 00607884  msr   elr_el1, x10
> 00607888  ldp   x0, x1, [sp,#32]
> 0060788c  ldp   x2, x3, [sp,#48]
> 00607890  ldp   x4, x5, [sp,#64]
> 00607894  ldp   x6, x7, [sp,#80]
> 00607898  ldp   x8, x9, [sp,#96]
> 0060789c  ldp   x10, x11, [sp,#112]
> 006078a0  ldp   x12, x13, [sp,#128]
> 006078a4  ldp   x14, x15, [sp,#144]
> 006078a8  ldp   x16, x17, [sp,#160]
> 006078ac  ldr   x29, [sp,#264]
> 006078b0  mov   sp, x18
> 006078b4  mrs   x18, tpidr_el1
> 006078b8  eret
>
> So the bl to do_el1h_sync apparently gets the data_abort.

It turns out that in the first type of example there is also a:

data_abort() at handle_el1h_sync+0x70
         pc = 0x0061ad94  lr = 0x00607870
         sp = 0x40238180  fp = 0x40238290

handle_el1h_sync() at pmap_enter+0x678
         pc = 0x00607870  lr = 0x00615684
         sp = 0x402382a0  fp = 0x402383b0

in what I showed.
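As a cross-check on how the "symbol+offset" labels relate to the lr values in the two frames above, the arithmetic can be sketched in Python. The sketch assumes that the label printed after "at" is the symbolized form of that frame's lr value; that is my reading of the output, not something stated in the thread. The helper name entry_address is made up for this illustration.

```python
# Frames copied from the backtrace above: (function, "at" label, pc, lr).
FRAMES = [
    ("data_abort", "handle_el1h_sync+0x70", 0x0061ad94, 0x00607870),
    ("handle_el1h_sync", "pmap_enter+0x678", 0x00607870, 0x00615684),
]


def entry_address(label, lr):
    """Return (symbol, entry), assuming the label is lr expressed as symbol+offset."""
    symbol, offset = label.split("+")
    return symbol, lr - int(offset, 16)


def chain_is_consistent(frames):
    """Check that each frame's lr equals the pc reported for the next frame."""
    return all(cur[3] == nxt[2] for cur, nxt in zip(frames, frames[1:]))


if __name__ == "__main__":
    for name, label, pc, lr in FRAMES:
        symbol, entry = entry_address(label, lr)
        print(f"{name}(): lr {lr:#010x} is {label}; "
              f"{symbol} would start at {entry:#010x}")
    print("lr/pc chain consistent:", chain_is_consistent(FRAMES))
```

Under that assumption handle_el1h_sync would start at 0x00607800 and pmap_enter at 0x0061500c, and the lr values land exactly on the bl instructions shown at 00607870 and 00615684 in the disassembly.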
And around pmap_enter+0x678 happens to be:

0061566c  b     00615688
00615670  and   x8, x28, #0x1f
00615674  cmp   x8, #0xb
00615678  b.ne  00615688
0061567c  ldr   x0, [sp,#32]
00615680  orr   w1, wzr, #0x1000
00615684  bl    00605884
00615688  ldrb  w8, [x22,#93]
0061568c  tbnz  w8, #2, 006157a4
00615690  add   x1, sp, #0x38
00615694  mov   x0, x19
00615698  mov   x24, x23
0061569c  orr   x23, x23, #0x100
006156a0  bl    00615f44

So again handle_el1h_sync happens at a bl to
arm64_dcache_wb_range and ends up with a data_abort at
handle_el1h_sync+0x70 . The context is pmap_enter instead
of pmap_remove_pages.

But an example of a pmap_remove_pages+0x2a8 context for
handle_el1h_sync is also in the call chain for the first
type of example that I originally showed.

> The code around pmap_remove_pages+0x2a8 :
>
> 00617570  bl    005cf83c
> 00617574  ldr   x9, [sp,#80]
> 00617578  adrp  x8, 00bbd000
> 0061757c  add   x8, x8, #0x848
> 00617580  str   x0, [sp,#48]
> 00617584  cmp   x9, x8
> 00617588  b.eq  006175a4
> 0061758c  ldr   x8, [x18]
> 00617590  ldr   x8, [x8,#8]
> 00617594  ldr   x8, [x8,#512]
> 00617598  ldr   x8, [x8,#224]
> 0061759c  cmp   x8, x9
> 006175a0  b.ne  006175d8
> 006175a4  and   x8, x22, #0x1f
> 006175a8  cmp   x28, #0x3
> 006175ac  b.ne  006175c4
> 006175b0  cmp   x8, #0xb
> 006175b4  b.ne  006175d8
> 006175b8  ldr   x0, [x24]
> 006175bc  orr   w1, wzr, #0x1000
> 006175c0  b     006175d4
> 006175c4  cmp   x8, #0x9
> 006175c8  b.ne  006175d8
> 006175cc  ldr   x0, [x24]
> 006175d0  orr   w1, wzr, #0x20
> 006175d4  bl    00605884
> 006175d8  mov   x8, xzr
> 006175dc  orr   w1, wzr, #0x8
> 006175e0  mov   x0, x26
> 006175e4  ldxr  x9, [x26]
> 006175e8  stxr  w10, x8, [x26]
> 006175ec  cbnz  w10, 006175e4
> 006175f0  bl    00605884
>
> So this happens to involve arm64_dcache_wb_range (that has
> not started yet).

I still have not replicated the example that in
Re: FYI: [My FreeBSD-12.0-CURRENT-arm64-aarch64.raw ] under qemu-system-aarch64 on odroid-c2 under UbuntuMate : [A combination that boots but gets some panics]
It turns out that the bt's from the example panics are
repeatable for the pc and lr sequence involved (but not
the specific sp's and fp's involved). I report this in
case it suggests anything. I'll note that the build had
a production style kernel for a build of -r317015 .

The first type of panic is actually a back-to-back
sequence of two bt's; this is the sleeping-thread type
of example. The second type is just one bt by itself.

There is one variable lr in the bt for the sleeping-thread
type of example (the first type of panic of the two shown
later, the one with back-to-back bt's):

131,133c131,133
< handle_el0_sync() at 0x40040070
<          pc = 0x006079e8  lr = 0x40040070
<          sp = 0x65dfdba0  fp = 0xeb00
---
> handle_el0_sync() at 0x40044490
>          pc = 0x006079e8  lr = 0x40044490
>          sp = 0x40229ba0  fp = 0xe3d0

Otherwise the two bt's in the example match for the pc/lr
sequence.

I only have the two examples of this type to compare so
far (one diff).

I have 3 examples of the second type and they had no such
variation.

One thing in common to all 5 of these examples is the
sequence:

data_abort() at handle_el1h_sync+0x70
         lr = 0x00607870
handle_el1h_sync() at pmap_remove_pages+0x2a8
         pc = 0x00607870  lr = 0x006175d4
pmap_remove_pages()

being involved in each example.

I'm not saying that I can cause any panics at will, but
when either of the two types happens the bt is (mostly)
stable for the pc and lr sequence and that short
sequence above is involved.

I have seen one other type of panic but I did not manage
to record a bt for it yet. It involved the instruction
cache instead of arm64_dcache_wb_range .

I quote the prior reported example bt's below.

On 2017-May-2, at 5:24 AM, Mark Millard wrote:

> On 2017-May-2, at 3:37 AM, Mark Millard wrote:
>
>> On 2017-May-2, at 2:53 AM, Mark Millard wrote:
>>
>> . . .
>> FYI:
>>
>> I do sometimes get things like:
>>
>> System shutdown time has arrived
>> Apr 30 19:43:15 ODC2FBSD shutdown: power-down by root:
>> Sleeping thread (tid 100093, pid 708) owns a non-sleepable lock
>> KDB: stack backtrace of thread 100093:
>> sched_switch() at mi_switch+0x100
>>          pc = 0x00347d44  lr = 0x00327358
>>          sp = 0x40237e00  fp = 0x40237e20
>>
>> mi_switch() at sleepq_wait+0x3c
>>          pc = 0x00327358  lr = 0x0036c174
>>          sp = 0x40237e30  fp = 0x40237e50
>>
>> sleepq_wait() at _sleep+0x29c
>>          pc = 0x0036c174  lr = 0x00326c7c
>>          sp = 0x40237e60  fp = 0x40237ee0
>>
>> _sleep() at vm_page_sleep_if_busy+0xb0
>>          pc = 0x00326c7c  lr = 0x005cfcf4
>>          sp = 0x40237ef0  fp = 0x40237f10
>>
>> vm_page_sleep_if_busy() at vm_fault_hold+0xcc8
>>          pc = 0x005cfcf4  lr = 0x005ba17c
>>          sp = 0x40237f20  fp = 0x40238070
>>
>> vm_fault_hold() at vm_fault+0x70
>>          pc = 0x005ba17c  lr = 0x005b9464
>>          sp = 0x40238080  fp = 0x402380b0
>>
>> vm_fault() at data_abort+0xe0
>>          pc = 0x005b9464  lr = 0x0061ad94
>>          sp = 0x402380c0  fp = 0x40238170
>>
>> data_abort() at handle_el1h_sync+0x70
>>          pc = 0x0061ad94  lr = 0x00607870
>>          sp = 0x40238180  fp = 0x40238290
>>
>> handle_el1h_sync() at pmap_enter+0x678
>>          pc = 0x00607870  lr = 0x00615684
>>          sp = 0x402382a0  fp = 0x402383b0
>>
>> pmap_enter() at vm_fault_hold+0x17c0
>>          pc = 0x00615684  lr = 0x005bac74
>>          sp = 0x402383c0  fp = 0x40238510
>>
>> vm_fault_hold() at vm_fault+0x70
>>          pc = 0x005bac74  lr = 0x005b9464
>>          sp = 0x40238520  fp = 0x40238550
>>
>> vm_fault() at data_abort+0xe0
>>          pc = 0x005b9464  lr = 0x0061ad94
>>          sp = 0x40238560  fp = 0x40238610
>>
>> data_abort() at handle_el1h_sync+0x70
>>          pc = 0x0061ad94  lr = 0x00607870
>>          sp = 0x40238620  fp = 0x40238730
>>
>> handle_el1h_sync() at pmap_remove_pages+0x2a8
>>          pc = 0x00607870  lr = 0x006175d4
>>          sp = 0x40238740  fp = 0x40238870
>>
>> pmap_remove_pages() at vmspace_exit+0xb0
>>          pc = 0x006175d4  lr = 0x005c020c
>>          sp = 0x40238880  fp = 0x402388b0
>>
>> vmspace_exit() at exit1+0x604
>>          pc = 0x005c020c  lr = 0x002db5e0
>>          sp = 0x402388c0  fp = 0x40238920
>>
>> exit1() at sys_sys_exit+0x10
>>          pc = 0x002db5e0  lr = 0x002dafd8
>>          sp = 0x40238930  fp = 0x40238930
>>
>> sys_sys_exit() at do_el0_sync+0xa48
>>