Re: FYI: [My FreeBSD-12.0-CURRENT-arm64-aarch64.raw ] under qemu-system-aarch64 on odroid-c2 under UbuntuMate : [A combination that boots but gets some panics]

2017-05-02 Thread Mark Millard
FYI: differences in FreeBSD's code vs.:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/BABJDBHI.html

and its "Example 11.5.  Cleaning by Virtual Address". . .

(I do not claim to know that the differences
point to any error. But I note them in case
they prompt something. A couple of points
do seem odd to me for the FreeBSD code.)

The ARM example code does:


DSB ISH

DSB ISH
ISB

This has the property of forcing visibility only
after everything is available to be visible
(DSB ISH being referenced here). (The FreeBSD
code does not have this property, but enforces
visibility of a mix of old and new as it
goes along.)

It is point-of-unification code (data and instruction)
for its purpose.

The FreeBSD arm64_dcache__range code always
pairs:

DC CVAC (or CIVAC or IVAC)
DSB ISH

inside its loop, forcing visibility of its
intermediate state as it goes along. It
is also point-of-coherency instead of
point-of-unification (because of its purpose).

The FreeBSD arm64_idcache_wbinv_range code
always does the sequence of 4:

DC CIVAC
DSB ISH
IC IVAU
DSB ISH

inside its loop and after the loop does
one:

ISB

This is a mix of point-of-coherency and
point-of-unification code that forces
visibility (DSB ISH) as it goes along, not
just after the overall loop.

To me the mix of point-of-coherency and
point-of-unification seems strange.

The FreeBSD arm64_icache_sync_range code
always does the sequence of 4:

DC CVAU
DSB ISH
IC IVAU
DSB ISH

inside its loop and after the loop does
one:

ISB

This is the closest to the "Example 11.5.
Cleaning by Virtual Address" code and its
purpose. It is all point-of-unification
code that forces visibility (DSB ISH) as
it goes along, not just after the overall
loop.
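
The structural difference noted above can be sketched as a small model.
It only records the order of operations and performs no real cache
maintenance. The function names, the 64-byte line size, and the use of
DC CVAU / IC IVAU inside the batched loops are assumptions made for
illustration; the loop bodies of the ARM example are not shown above.

```python
LINE = 64  # assumed line size in bytes; real code derives it from CTR_EL0


def lines(start, length, line=LINE):
    """Yield the line-aligned addresses covering [start, start + length)."""
    addr = start & ~(line - 1)
    end = start + length
    while addr < end:
        yield addr
        addr += line


def batched(start, length):
    """One barrier per phase: all data ops, DSB, all instruction ops, DSB, ISB."""
    ops = [("DC CVAU", a) for a in lines(start, length)]
    ops.append(("DSB ISH", None))
    ops += [("IC IVAU", a) for a in lines(start, length)]
    ops += [("DSB ISH", None), ("ISB", None)]
    return ops


def per_line(start, length):
    """A barrier after every maintenance op, with one ISB after the loop."""
    ops = []
    for a in lines(start, length):
        ops += [("DC CVAU", a), ("DSB ISH", None),
                ("IC IVAU", a), ("DSB ISH", None)]
    ops.append(("ISB", None))
    return ops


def count(ops, name):
    """Count how many times the named operation appears in a sequence."""
    return sum(1 for op, _ in ops if op == name)
```

For a 256-byte aligned range (4 lines in this model) the batched form
issues 2 DSB ISH and the per-line form issues 8. Both issue the same
number of DC and IC operations; only the barrier placement differs.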


===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: FYI: [My FreeBSD-12.0-CURRENT-arm64-aarch64.raw ] under qemu-system-aarch64 on odroid-c2 under UbuntuMate : [A combination that boots but gets some panics]

2017-05-02 Thread Mark Millard
On 2017-May-2, at 2:59 PM, Mark Millard  wrote:

> The code around handle_el1h_sync+0x70 :
> 
> 00607804  sub sp, sp, #0x80
> 00607808  sub sp, sp, #0x120
> 0060780c  stp x29, x30, [sp,#272]
> 00607810  stp x28, x29, [sp,#256]
> 00607814  stp x26, x27, [sp,#240]
> 00607818  stp x24, x25, [sp,#224]
> 0060781c  stp x22, x23, [sp,#208]
> 00607820  stp x20, x21, [sp,#192]
> 00607824  stp x18, x19, [sp,#176]
> 00607828  stp x16, x17, [sp,#160]
> 0060782c  stp x14, x15, [sp,#144]
> 00607830  stp x12, x13, [sp,#128]
> 00607834  stp x10, x11, [sp,#112]
> 00607838  stp x8, x9, [sp,#96]
> 0060783c  stp x6, x7, [sp,#80]
> 00607840  stp x4, x5, [sp,#64]
> 00607844  stp x2, x3, [sp,#48]
> 00607848  stp x0, x1, [sp,#32]
> 0060784c  mrs x10, elr_el1
> 00607850  mrs x11, spsr_el1
> 00607854  mrs x12, esr_el1
> 00607858  str x10, [sp,#16]
> 0060785c  stp w11, w12, [sp,#24]
> 00607860  stp x18, x30, [sp]
> 00607864  mrs x18, tpidr_el1
> 00607868  add x29, sp, #0x110
> 0060786c  mov x0, sp
> 00607870  bl 0061aad8
> 
> 00607874  msr daifset, #0x2
> 00607878  ldp x18, x30, [sp]
> 0060787c  ldp x10, x11, [sp,#16]
> 00607880  msr spsr_el1, x11
> 00607884  msr elr_el1, x10
> 00607888  ldp x0, x1, [sp,#32]
> 0060788c  ldp x2, x3, [sp,#48]
> 00607890  ldp x4, x5, [sp,#64]
> 00607894  ldp x6, x7, [sp,#80]
> 00607898  ldp x8, x9, [sp,#96]
> 0060789c  ldp x10, x11, [sp,#112]
> 006078a0  ldp x12, x13, [sp,#128]
> 006078a4  ldp x14, x15, [sp,#144]
> 006078a8  ldp x16, x17, [sp,#160]
> 006078ac  ldr x29, [sp,#264]
> 006078b0  mov sp, x18
> 006078b4  mrs x18, tpidr_el1
> 006078b8  eret
> 
> So the bl to do_el1h_sync apparently gets the data_abort.

It turns out that in the first type of example there
is also a:

data_abort() at handle_el1h_sync+0x70
 pc = 0x0061ad94  lr = 0x00607870
 sp = 0x40238180  fp = 0x40238290
handle_el1h_sync() at pmap_enter+0x678
 pc = 0x00607870  lr = 0x00615684
 sp = 0x402382a0  fp = 0x402383b0

in what I showed. And around pmap_enter+0x678
happens to be:

0061566c  b   00615688 
00615670  and x8, x28, #0x1f
00615674  cmp x8, #0xb
00615678  b.ne 00615688

0061567c  ldr x0, [sp,#32]
00615680  orr w1, wzr, #0x1000
00615684  bl  00605884 
00615688  ldrb w8, [x22,#93]
0061568c  tbnz w8, #2, 006157a4

00615690  add x1, sp, #0x38
00615694  mov x0, x19
00615698  mov x24, x23
0061569c  orr x23, x23, #0x100
006156a0  bl  00615f44 

So again handle_el1h_sync happens at a bl to
arm64_dcache_wb_range and ends up with a
data_abort at handle_el1h_sync+0x70 .
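
The "symbol+offset" entries can be cross-checked against the
disassembly with simple arithmetic: subtracting the offset from the
reported lr gives the start of the symbol, and in both listings above
the reported lr is the address of the bl itself. A minimal sketch,
using only values copied from the backtrace lines above (symbol_base
is a hypothetical helper name):

```python
def symbol_base(addr, offset):
    """Return the start address of a symbol, given an address inside it
    and the offset of that address from the symbol start."""
    return addr - offset


# (symbol, offset, lr) triples copied from the backtrace lines above.
frames = [
    ("handle_el1h_sync", 0x70, 0x00607870),
    ("pmap_enter", 0x678, 0x00615684),
]

for name, off, lr in frames:
    print(f"{name}: base = {symbol_base(lr, off):#010x}, call site = {lr:#010x}")
```

This gives 0x00607800 as the start of handle_el1h_sync and 0x0061500c
as the start of pmap_enter, with call sites matching the bl at
00607870 and the bl at 00615684 in the listings.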

The context is pmap_enter instead of
pmap_remove_pages.

But an example of a pmap_remove_pages+0x2a8
context for handle_el1h_sync is also in the
call chain for the first type of example
that I originally showed.

> The code around pmap_remove_pages+0x2a8 :
> 
> 00617570  bl   005cf83c 
> 
> 00617574 

Re: FYI: [My FreeBSD-12.0-CURRENT-arm64-aarch64.raw ] under qemu-system-aarch64 on odroid-c2 under UbuntuMate : [A combination that boots but gets some panics]

2017-05-02 Thread Mark Millard

On 2017-May-2, at 2:30 PM, Mark Millard  wrote:

> On 2017-May-2, at 2:22 PM, Mark Millard  wrote:
> 
>> It turns out that the bt's from the example panics are
>> repeatable for the pc and lr sequence involved (but not
>> the specific sp's and fp's involved). I report this in
>> case it suggests anything. I'll note that the build had
>> a production style kernel for a build of -r317015 .
>> 
>> The first type of panic is actually a back-to-back
>> sequence of two bt's; this is the sleeping-thread type
>> of example. The second type is just one bt by itself.
>> 
>> There is one variable lr in the bt for the sleeping-thread
>> type of example (the first type of panic of the two shown
>> later, the one with back-to-back bt's):
>> 
>> 131,133c131,133
>> < handle_el0_sync() at 0x40040070
>> < pc = 0x006079e8  lr = 0x40040070
>> < sp = 0x65dfdba0  fp = 0xeb00
>> ---
>> > handle_el0_sync() at 0x40044490
>> >  pc = 0x006079e8  lr = 0x40044490
>> >  sp = 0x40229ba0  fp = 0xe3d0
>> 
>> Otherwise the two bt's in the example match for the pc/lr
>> sequence.
>> 
>> I only have the two examples of this type to compare so
>> far (one diff).
>> 
>> I have 3 examples of the second type and they had no such
>> variation.
>> 
>> One thing in common to all 5 of these examples is the
>> sequence:
>> 
>> data_abort() at handle_el1h_sync+0x70
>> lr = 0x00607870
>> handle_el1h_sync() at pmap_remove_pages+0x2a8
>>pc = 0x00607870  lr = 0x006175d4
>> pmap_remove_pages()
>> 
>> being involved in each example.
>> 
>> 
>> I'm not saying that I can cause any panics at will, but
>> when either of the two types happen the bt is (mostly)
>> stable for the pc and lr sequence and that short
>> sequence above is involved.
>> 
>> I have seen one other type of panic but I did not manage
>> to record a bt for it yet. It involved the instruction
>> cache instead of arm64_dcache_wb_range .
>> 
>> I quote the prior reported example bt's below.
>> 
>> On 2017-May-2, at 5:24 AM, Mark Millard  wrote:
>> 
>>> On 2017-May-2, at 3:37 AM, Mark Millard  wrote:
>>> 
>>>> On 2017-May-2, at 2:53 AM, Mark Millard  wrote:
>>>>
>>>> . . .
>>>> FYI:
>>>>
>>>> I do sometimes get things like:
>>>>
>>>>
>>>> System shutdown time has arrived
>>>> Apr 30 19:43:15 ODC2FBSD shutdown: power-down by root:
>>>> Sleeping thread (tid 100093, pid 708) owns a non-sleepable lock
>>>> KDB: stack backtrace of thread 100093:
>>>> sched_switch() at mi_switch+0x100
>>>>  pc = 0x00347d44  lr = 0x00327358
>>>>  sp = 0x40237e00  fp = 0x40237e20
>>>>
>>>> mi_switch() at sleepq_wait+0x3c
>>>>  pc = 0x00327358  lr = 0x0036c174
>>>>  sp = 0x40237e30  fp = 0x40237e50
>>>>
>>>> sleepq_wait() at _sleep+0x29c
>>>>  pc = 0x0036c174  lr = 0x00326c7c
>>>>  sp = 0x40237e60  fp = 0x40237ee0
>>>>
>>>> _sleep() at vm_page_sleep_if_busy+0xb0
>>>>  pc = 0x00326c7c  lr = 0x005cfcf4
>>>>  sp = 0x40237ef0  fp = 0x40237f10
>>>>
>>>> vm_page_sleep_if_busy() at vm_fault_hold+0xcc8
>>>>  pc = 0x005cfcf4  lr = 0x005ba17c
>>>>  sp = 0x40237f20  fp = 0x40238070
>>>>
>>>> vm_fault_hold() at vm_fault+0x70
>>>>  pc = 0x005ba17c  lr = 0x005b9464
>>>>  sp = 0x40238080  fp = 0x402380b0
>>>>
>>>> vm_fault() at data_abort+0xe0
>>>>  pc = 0x005b9464  lr = 0x0061ad94
>>>>  sp = 0x402380c0  fp = 0x40238170
>>>>
>>>> data_abort() at handle_el1h_sync+0x70
>>>>  pc = 0x0061ad94  lr = 0x00607870
>>>>  sp = 0x40238180  fp = 0x40238290
>>>>
>>>> handle_el1h_sync() at pmap_enter+0x678
>>>>  pc = 0x00607870  lr = 0x00615684
>>>>  sp = 0x402382a0  fp = 0x402383b0
>>>>
>>>> pmap_enter() at vm_fault_hold+0x17c0
>>>>  pc = 0x00615684  lr = 0x005bac74
>>>>  sp = 0x402383c0  fp = 0x40238510
>>>>
>>>> vm_fault_hold() at vm_fault+0x70
>>>>  pc = 0x005bac74  lr = 0x005b9464
>>>>  sp = 0x40238520  fp = 0x40238550
>>>>
>>>> vm_fault() at data_abort+0xe0
>>>>  pc = 0x005b9464  lr = 0x0061ad94
>>>>  sp = 0x40238560  fp = 0x40238610
>>>>
>>>> data_abort() at handle_el1h_sync+0x70
>>>>  pc = 0x0061ad94  lr = 0x00607870
>>>>  sp = 0x40238620  fp = 0x40238730
>>>>
>>>> handle_el1h_sync() at pmap_remove_pages+0x2a8
>>>>  pc = 0x00607870  lr = 0x006175d4
>>>>  sp = 0x40238740  fp = 0x40238870
>>>>
>>>> pmap_remove_pages() at vmspace_exit+0xb0
>>>>  pc = 0x006175d4  lr = 0x005c020c
>>>>  sp = 0x40238880  fp = 0x402388b0
>>>>

Re: FYI: [My FreeBSD-12.0-CURRENT-arm64-aarch64.raw ] under qemu-system-aarch64 on odroid-c2 under UbuntuMate : [A combination that boots but gets some panics]

2017-05-02 Thread Mark Millard
On 2017-May-2, at 2:22 PM, Mark Millard  wrote:

> It turns out that the bt's from the example panics are
> repeatable for the pc and lr sequence involved (but not
> the specific sp's and fp's involved). I report this in
> case it suggests anything. I'll note that the build had
> a production style kernel for a build of -r317015 .
> 
> The first type of panic is actually a back-to-back
> sequence of two bt's; this is the sleeping-thread type
> of example. The second type is just one bt by itself.
> 
> There is one variable lr in the bt for the sleeping-thread
> type of example (the first type of panic of the two shown
> later, the one with back-to-back bt's):
> 
> 131,133c131,133
> < handle_el0_sync() at 0x40040070
> <  pc = 0x006079e8  lr = 0x40040070
> <  sp = 0x65dfdba0  fp = 0xeb00
> ---
> > handle_el0_sync() at 0x40044490
> >   pc = 0x006079e8  lr = 0x40044490
> >   sp = 0x40229ba0  fp = 0xe3d0
> 
> Otherwise the two bt's in the example match for the pc/lr
> sequence.
> 
> I only have the two examples of this type to compare so
> far (one diff).
> 
> I have 3 examples of the second type and they had no such
> variation.
> 
> One thing in common to all 5 of these examples is the
> sequence:
> 
> data_abort() at handle_el1h_sync+0x70
>  lr = 0x00607870
> handle_el1h_sync() at pmap_remove_pages+0x2a8
> pc = 0x00607870  lr = 0x006175d4
> pmap_remove_pages()
> 
> being involved in each example.
> 
> 
> I'm not saying that I can cause any panics at will, but
> when either of the two types happen the bt is (mostly)
> stable for the pc and lr sequence and that short
> sequence above is involved.
> 
> I have seen one other type of panic but I did not manage
> to record a bt for it yet. It involved the instruction
> cache instead of arm64_dcache_wb_range .
> 
> I quote the prior reported example bt's below.
> 
> On 2017-May-2, at 5:24 AM, Mark Millard  wrote:
> 
>> On 2017-May-2, at 3:37 AM, Mark Millard  wrote:
>> 
>>> On 2017-May-2, at 2:53 AM, Mark Millard  wrote:
>>> 
>>> . . .
>>> FYI:
>>> 
>>> I do sometimes get things like:
>>> 
>>> 
>>> System shutdown time has arrived
>>> Apr 30 19:43:15 ODC2FBSD shutdown: power-down by root: 
>>> Sleeping thread (tid 100093, pid 708) owns a non-sleepable lock
>>> KDB: stack backtrace of thread 100093:
>>> sched_switch() at mi_switch+0x100
>>>  pc = 0x00347d44  lr = 0x00327358
>>>  sp = 0x40237e00  fp = 0x40237e20
>>> 
>>> mi_switch() at sleepq_wait+0x3c
>>>  pc = 0x00327358  lr = 0x0036c174
>>>  sp = 0x40237e30  fp = 0x40237e50
>>> 
>>> sleepq_wait() at _sleep+0x29c
>>>  pc = 0x0036c174  lr = 0x00326c7c
>>>  sp = 0x40237e60  fp = 0x40237ee0
>>> 
>>> _sleep() at vm_page_sleep_if_busy+0xb0
>>>  pc = 0x00326c7c  lr = 0x005cfcf4
>>>  sp = 0x40237ef0  fp = 0x40237f10
>>> 
>>> vm_page_sleep_if_busy() at vm_fault_hold+0xcc8
>>>  pc = 0x005cfcf4  lr = 0x005ba17c
>>>  sp = 0x40237f20  fp = 0x40238070
>>> 
>>> vm_fault_hold() at vm_fault+0x70
>>>  pc = 0x005ba17c  lr = 0x005b9464
>>>  sp = 0x40238080  fp = 0x402380b0
>>> 
>>> vm_fault() at data_abort+0xe0
>>>  pc = 0x005b9464  lr = 0x0061ad94
>>>  sp = 0x402380c0  fp = 0x40238170
>>> 
>>> data_abort() at handle_el1h_sync+0x70
>>>  pc = 0x0061ad94  lr = 0x00607870
>>>  sp = 0x40238180  fp = 0x40238290
>>> 
>>> handle_el1h_sync() at pmap_enter+0x678
>>>  pc = 0x00607870  lr = 0x00615684
>>>  sp = 0x402382a0  fp = 0x402383b0
>>> 
>>> pmap_enter() at vm_fault_hold+0x17c0
>>>  pc = 0x00615684  lr = 0x005bac74
>>>  sp = 0x402383c0  fp = 0x40238510
>>> 
>>> vm_fault_hold() at vm_fault+0x70
>>>  pc = 0x005bac74  lr = 0x005b9464
>>>  sp = 0x40238520  fp = 0x40238550
>>> 
>>> vm_fault() at data_abort+0xe0
>>>  pc = 0x005b9464  lr = 0x0061ad94
>>>  sp = 0x40238560  fp = 0x40238610
>>> 
>>> data_abort() at handle_el1h_sync+0x70
>>>  pc = 0x0061ad94  lr = 0x00607870
>>>  sp = 0x40238620  fp = 0x40238730
>>> 
>>> handle_el1h_sync() at pmap_remove_pages+0x2a8
>>>  pc = 0x00607870  lr = 0x006175d4
>>>  sp = 0x40238740  fp = 0x40238870
>>> 
>>> pmap_remove_pages() at vmspace_exit+0xb0
>>>  pc = 0x006175d4  lr = 0x005c020c
>>>  sp = 0x40238880  fp = 0x402388b0
>>> 
>>> vmspace_exit() at exit1+0x604
>>>  pc = 0x005c020c  lr = 0x002db5e0
>>>  sp = 0x402388c0  fp = 0x40238920

Re: FYI: [My FreeBSD-12.0-CURRENT-arm64-aarch64.raw ] under qemu-system-aarch64 on odroid-c2 under UbuntuMate : [A combination that boots but gets some panics]

2017-05-02 Thread Mark Millard
It turns out that the bt's from the example panics are
repeatable for the pc and lr sequence involved (but not
the specific sp's and fp's involved). I report this in
case it suggests anything. I'll note that the build had
a production style kernel for a build of -r317015 .

The first type of panic is actually a back-to-back
sequence of two bt's; this is the sleeping-thread type
of example. The second type is just one bt by itself.

There is one variable lr in the bt for the sleeping-thread
type of example (the first type of panic of the two shown
later, the one with back-to-back bt's):

131,133c131,133
< handle_el0_sync() at 0x40040070
<  pc = 0x006079e8  lr = 0x40040070
<  sp = 0x65dfdba0  fp = 0xeb00
---
> handle_el0_sync() at 0x40044490
>   pc = 0x006079e8  lr = 0x40044490
>   sp = 0x40229ba0  fp = 0xe3d0

Otherwise the two bt's in the example match for the pc/lr
sequence.

I only have the two examples of this type to compare so
far (one diff).

I have 3 examples of the second type and they had no such
variation.

One thing in common to all 5 of these examples is the
sequence:

data_abort() at handle_el1h_sync+0x70
  lr = 0x00607870
handle_el1h_sync() at pmap_remove_pages+0x2a8
 pc = 0x00607870  lr = 0x006175d4
pmap_remove_pages()

being involved in each example.


I'm not saying that I can cause any panics at will, but
when either of the two types happens the bt is (mostly)
stable for the pc and lr sequence and that short
sequence above is involved.
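
The comparison described above (same pc/lr sequence, sp and fp
ignored) can be sketched as follows. This is only an illustration of
the check, not the tool actually used; Frame and pc_lr_diffs are
hypothetical names, and the two sample frames are the handle_el0_sync
lines from the diff in this thread.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    func: str
    pc: int
    lr: int
    sp: int
    fp: int


def pc_lr_diffs(bt_a, bt_b):
    """Return the indexes of frames whose (func, pc, lr) differ.

    sp and fp are deliberately ignored, since they vary between panics.
    """
    if len(bt_a) != len(bt_b):
        raise ValueError("backtraces have different depths")
    return [i for i, (a, b) in enumerate(zip(bt_a, bt_b))
            if (a.func, a.pc, a.lr) != (b.func, b.pc, b.lr)]


first = [Frame("handle_el0_sync", 0x006079e8, 0x40040070, 0x65dfdba0, 0xeb00)]
second = [Frame("handle_el0_sync", 0x006079e8, 0x40044490, 0x40229ba0, 0xe3d0)]

print(pc_lr_diffs(first, second))  # [0]: only the lr of this frame differs
```

Frames that differ only in sp or fp are treated as matching, which is
the sense in which the bt's are described as stable.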

I have seen one other type of panic but I did not manage
to record a bt for it yet. It involved the instruction
cache instead of arm64_dcache_wb_range .

I quote the prior reported example bt's below.

On 2017-May-2, at 5:24 AM, Mark Millard  wrote:

> On 2017-May-2, at 3:37 AM, Mark Millard  wrote:
> 
>> On 2017-May-2, at 2:53 AM, Mark Millard  wrote:
>> 
>> . . .
>> FYI:
>> 
>> I do sometimes get things like:
>> 
>> 
>> System shutdown time has arrived
>> Apr 30 19:43:15 ODC2FBSD shutdown: power-down by root: 
>> Sleeping thread (tid 100093, pid 708) owns a non-sleepable lock
>> KDB: stack backtrace of thread 100093:
>> sched_switch() at mi_switch+0x100
>>   pc = 0x00347d44  lr = 0x00327358
>>   sp = 0x40237e00  fp = 0x40237e20
>> 
>> mi_switch() at sleepq_wait+0x3c
>>   pc = 0x00327358  lr = 0x0036c174
>>   sp = 0x40237e30  fp = 0x40237e50
>> 
>> sleepq_wait() at _sleep+0x29c
>>   pc = 0x0036c174  lr = 0x00326c7c
>>   sp = 0x40237e60  fp = 0x40237ee0
>> 
>> _sleep() at vm_page_sleep_if_busy+0xb0
>>   pc = 0x00326c7c  lr = 0x005cfcf4
>>   sp = 0x40237ef0  fp = 0x40237f10
>> 
>> vm_page_sleep_if_busy() at vm_fault_hold+0xcc8
>>   pc = 0x005cfcf4  lr = 0x005ba17c
>>   sp = 0x40237f20  fp = 0x40238070
>> 
>> vm_fault_hold() at vm_fault+0x70
>>   pc = 0x005ba17c  lr = 0x005b9464
>>   sp = 0x40238080  fp = 0x402380b0
>> 
>> vm_fault() at data_abort+0xe0
>>   pc = 0x005b9464  lr = 0x0061ad94
>>   sp = 0x402380c0  fp = 0x40238170
>> 
>> data_abort() at handle_el1h_sync+0x70
>>   pc = 0x0061ad94  lr = 0x00607870
>>   sp = 0x40238180  fp = 0x40238290
>> 
>> handle_el1h_sync() at pmap_enter+0x678
>>   pc = 0x00607870  lr = 0x00615684
>>   sp = 0x402382a0  fp = 0x402383b0
>> 
>> pmap_enter() at vm_fault_hold+0x17c0
>>   pc = 0x00615684  lr = 0x005bac74
>>   sp = 0x402383c0  fp = 0x40238510
>> 
>> vm_fault_hold() at vm_fault+0x70
>>   pc = 0x005bac74  lr = 0x005b9464
>>   sp = 0x40238520  fp = 0x40238550
>> 
>> vm_fault() at data_abort+0xe0
>>   pc = 0x005b9464  lr = 0x0061ad94
>>   sp = 0x40238560  fp = 0x40238610
>> 
>> data_abort() at handle_el1h_sync+0x70
>>   pc = 0x0061ad94  lr = 0x00607870
>>   sp = 0x40238620  fp = 0x40238730
>> 
>> handle_el1h_sync() at pmap_remove_pages+0x2a8
>>   pc = 0x00607870  lr = 0x006175d4
>>   sp = 0x40238740  fp = 0x40238870
>> 
>> pmap_remove_pages() at vmspace_exit+0xb0
>>   pc = 0x006175d4  lr = 0x005c020c
>>   sp = 0x40238880  fp = 0x402388b0
>> 
>> vmspace_exit() at exit1+0x604
>>   pc = 0x005c020c  lr = 0x002db5e0
>>   sp = 0x402388c0  fp = 0x40238920
>> 
>> exit1() at sys_sys_exit+0x10
>>   pc = 0x002db5e0  lr = 0x002dafd8
>>   sp = 0x40238930  fp = 0x40238930
>> 
>> sys_sys_exit() at do_el0_sync+0xa48
>>