So Theo's favourite way to trigger crashes on arm64 is to run "make
-j20" in /usr/src/lib/libc.  If I do this, I don't really see any
crashes.  However, somewhere halfway through my machine just hangs.  Now
that I can break into ddb, it quickly became obvious why.  There are
lots of processes waiting on the "vp" channel.  That's the pool used
by the pmap to build page tables and keep track of mappings.  There is
still plenty of free memory when the hang happens, so the reason the
pool allocations fail must be related to a kva shortage of some sort.

The items in the vp pool are large: 8192 bytes, 2 pages.  So the pool
uses the (interrupt-safe) multi-page allocator.  That allocator uses
the kmem_map, which only covers a rather limited part of kva space.
And when we run out, we effectively deadlock, since we have no real
push back mechanism.  We could make the kmem_map bigger, but since the
page tables can grow without a clear bound that doesn't really solve
anything.  

A better approach would be to use the non-interrupt-safe multi-page
pool allocator here.  That should be ok since we don't actually enter
userland mappings from interrupt context.  It may have some
implications for SMP though.  At the very least, future SMP work will
have to be aware that the non-interrupt-safe pool allocator may take
the kernel lock when allocating new pool pages.

Thoughts?  ok?

P.S. The current pmap_vp_enter() code suggests that it may be called
for the kernel pmap.  That isn't actually true and drahn@ had a fix
for this in his SMP patch series.  That diff also switched from
PR_WAITOK to PR_NOWAIT, which I think is a good move.  However, doing
that without addressing the kva issue leads to a scenario where the
kernel just spins refaulting if it runs out of kva space.

Index: arch/arm64/arm64/pmap.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
retrieving revision 1.31
diff -u -p -r1.31 pmap.c
--- arch/arm64/arm64/pmap.c     4 Apr 2017 12:56:24 -0000       1.31
+++ arch/arm64/arm64/pmap.c     13 Apr 2017 10:05:24 -0000
@@ -1474,8 +1474,8 @@ pmap_init(void)
        pool_init(&pmap_pted_pool, sizeof(struct pte_desc), 0, IPL_VM, 0,
            "pted", NULL);
        pool_setlowat(&pmap_pted_pool, 20);
-       pool_init(&pmap_vp_pool, sizeof(struct pmapvp2), PAGE_SIZE, IPL_VM, 0,
-           "vp", NULL);
+       pool_init(&pmap_vp_pool, sizeof(struct pmapvp2), PAGE_SIZE, IPL_VM,
+           PR_WAITOK, "vp", NULL);
        /* pool_setlowat(&pmap_vp_pool, 20); */
 
        pmap_initialized = 1;



