On Wed, Jul 27, 2016 at 08:06:57PM +0200, Patrick Wildt wrote:
> On Tue, Jul 26, 2016 at 06:43:18PM +0200, Patrick Wildt wrote:
> > Hi,
> > 
> > I've been trying to debug a pmap issue on ARM.  I have no solution, but
> > I would like to share my findings so far.
> > 
> > First of all, the panic/uvm_fault:
> > 
> > 
> > login:
> > uvm_fault(0xca06c858, 0, 1, 0) -> e
> > Fatal kernel mode data abort: 'Translation Fault (P)'
> > trapframe: 0xc986bd98
> > DFSR=00000007, DFAR=0000000c, spsr=60000113
> > r0 =00000000, r1 =ccaa7148, r2 =00136000, r3 =00000000
> > r4 =00000012, r5 =ccaa7148, r6 =32130000, r7 =00000005
> > r8 =3213000e, r9 =00000021, r10=cc634698, r11=c986be34
> > r12=cd43a9a0, ssp=c986bde8, slr=c054c1d0, pc =c054c1d0
> > 
> > Stopped at      pmap_enter+0x178:       ldr     r6, [r0, #0x00c]
> > ddb> trace
> > pmap_enter+0xc
> >         scp=0xc054c064 rlv=0xc04f60c0 (uvm_fault+0xa28)
> >         rsp=0xc986be38 rfp=0xc986bf54
> >         r10=0x00000000 r9=0x00000001 r8=0x00000000 r7=0xc050fe80
> >         r6=0x00000000 r5=0xc4a5aad0 r4=0x00000001
> > uvm_fault+0xc
> >         scp=0xc04f56a4 rlv=0xc0547e68 (data_abort_handler+0x264)
> >         rsp=0xc986bf58 rfp=0xc986bfac
> >         r10=0xc986bfb0 r9=0x00000007 r8=0xc986a000 r7=0x00000001
> >         r6=0xca06c858 r5=0x00000001 r4=0x00136000
> > data_abort_handler+0xc
> >         scp=0xc0547c10 rlv=0xc05477bc (exception_exit)
> >         rsp=0xc986bfb0 rfp=0xbffc2430
> >         r10=0x00da776c r9=0x00000001 r8=0x60165dbc r7=0x00000000
> >         r6=0x00000200 r5=0x00d0c920 r4=0x4188e9f4
> > 
> > The faulting instruction is in pmap_enter() of pamp7.c:
> > 
> >                         if (opg) {
> >                                 /*
> >                                  * Replacing an existing mapping with a new 
> > one.
> >                                  * It is part of our managed memory so we
> >                                  * must remove it from the PV list
> >                                  */
> >                                 pve = pmap_remove_pv(opg, pm, va);
> >     pve is NULL ->              oflags = pve->pv_flags;
> >                         } else
> > 
> > How did we come here?  pmap_enter() was called to enter mapping for a
> > given virtual and physical address.  The VA is 0x136000.  Then it looked
> > if there is already a mapping at the given address.  Turns out there is!
> > opte (old pagetable entry) is set to 0x429f202c.  This is an "invalid"
> > (means not active) page entry pointing to the physical page 0x429f2000.
> > Using the vm_pages struct for that physical address, we can have a look
> > at its virtual addresses.
> > 
> > vm_pages:
> > 0xc4f86754:     429f1000
> > 0xc4f86758:     0
> > 0xc4f8675c:     0
> > 0xc4f86760:     3
> > 0xc4f86764:     0
> > 0xc4f86768:     0
> > 0xc4f8676c:     0
> > 0xc4f86770:     c47263f0
> > 0xc4f86774:     c4f86590
> > 0xc4f86778:     c4a721d0
> > 0xc4f8677c:     c520d200
> > 0xc4f86780:     0
> > 0xc4f86784:     0
> > 0xc4f86788:     ca32b230
> > 0xc4f8678c:     0
> > 0xc4f86790:     0
> > 0xc4f86794:     0
> > 0xc4f86798:     140000
> > 0xc4f8679c:     41
> > 0xc4f867a0:     0
> > 0xc4f867a4:     429f2000  <-- vm_page.phys_addr
> > 0xc4f867a8:     0
> > 0xc4f867ac:     cd43a9a0  <-- vm_page.mdpage.pvh_list
> > 0xc4f867b0:     3
> > 0xc4f867b4:     0
> > 0xc4f867b8:     0
> > 0xc4f867bc:     0
> > 0xc4f867c0:     ffffffff
> > 0xc4f867c4:     ffffffff
> > 0xc4f867c8:     c4f50760
> > 0xc4f867cc:     c4c67990
> > 0xc4f867d0:     c50374d0
> > 0xc4f867d4:     0
> > 0xc4f867d8:     0
> > 0xc4f867dc:     c0714248
> > 0xc4f867e0:     cf2c6000
> > 0xc4f867e4:     0
> > 0xc4f867e8:     4d
> > 0xc4f867ec:     385
> > 0xc4f867f0:     1
> > 0xc4f867f4:     429f3000
> > 
> > vm_page.mdpage.pvh_list:
> > 0xcd43a9a0:     0 <-- next ptr in list
> > 0xcd43a9a4:     ccaa7148 <- pmap
> > 0xcd43a9a8:     717c7000 <- va
> > 0xcd43a9ac:     b <- flags
> > 
> > This means, the physical page already has a VA.  Oh, and it belongs
> > to the same pmap.  Have a look at register r5, that's the pmap we want
> > to enter a mapping for.  But it's not the kernel pmap:
> > 
> > ddb> print kernel_pmap_store
> > c073b8f4
> > 
> > So wait, the only VA for that physical address is 0x717c7000; but the
> > VA I used to look the physical page up is 0x136000.  That is weird.
> > 
> > So now we got a pmap_enter() for a physical address and a virtual
> > address, where the virtual address is already used by the same pmap
> > for another physical address.  But that physical address thinks a
> > completely different VA is using it.
> > 
> > The phys address behind the mapping is 0x429f2000.  The phys address
> > it wanted to map the VA to is in R6: 0x32130000.
> > 
> > That's probably not enough to find the exact fault, but it's a writeup
> > of what I have found so far.
> > 
> > Patrick
> 
> So, I found a suspicious place, added printfs, triggered the bug again
> and voila, I think I got it.
> 
> I was looking for places where the PTE for a given VA were to be zeroed
> and where that same VA would be removed from the reverse link in the
> vm pages.
> 
> The following part of the code goes through each mapped virtual page
> for a given physical page, and removes the mappings.  It looks at the
> PTE pointer.  If there is no valid mapping, then it just removes the
> VA from the list and does not zero the PTE.
> 
> But invalid and zeroed is a difference.  Zeroed means there is no
> mapping.  Invalid means either "no mapping" or "mapped but not
> enabled for reference emulation".
> 
> So the VA was mapped, no process accessed the page, so it never got
> valid.  And in that case, the code in pmap_page_remove() removes the
> VA from the vm pages, but does not change the PTE.  Because it assumed
> there is no mapping anway.
> 
> There are more places in that pmap where we explicitly check for zero
> and not for being valid.  Unfortunately this place was missed.
> 
> Patrick

You're right, you should know, you put it there in rev1.2. of pmap7.c
-Artturi

> 
> diff --git a/sys/arch/arm/arm/pmap7.c b/sys/arch/arm/arm/pmap7.c
> index 0d32bf9..e965c48 100644
> --- a/sys/arch/arm/arm/pmap7.c
> +++ b/sys/arch/arm/arm/pmap7.c
> @@ -1155,7 +1155,7 @@ pmap_page_remove(struct vm_page *pg)
>               KDASSERT(l2b != NULL);
>  
>               ptep = &l2b->l2b_kva[l2pte_index(pv->pv_va)];
> -             if (l2pte_valid(*ptep)) {
> +             if (*ptep != 0) {
>                       pte = *ptep;
>  
>                       /* inline pmap_is_current(pm) */
> 

Reply via email to