So Theo's favourite way to trigger crashes on arm64 is to run "make -j20" in /usr/src/lib/libc. If I do this, I don't really see any crashes. However, somewhere halfway through, my machine just hangs. Now that I can break into ddb, it quickly became obvious why: there are lots of processes waiting on the "vp" channel. That's the pool used by the pmap to build page tables and keep track of mappings. There is still plenty of free memory when the hang happens, so the reason the pool allocations fail must be related to a kva shortage of some sort.
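
To give a feel for why the items in that pool are awkward to allocate, here is a rough userland sketch of the size arithmetic. It assumes 4 KB pages and a table level with 512 entries, each needing a 64-bit hardware descriptor plus a pointer-sized slot for the software structure of the next level. The struct, the constants and the helper are hypothetical stand-ins made up for illustration; they are not copied from pmap.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KB pages */
#define SKETCH_IDX_CNT   512u	/* assumption: entries per table level */

/* Hypothetical layout of one "vp" pool item. */
struct sketch_vp {
	uint64_t l2[SKETCH_IDX_CNT];	/* hardware descriptors */
	uint64_t vp[SKETCH_IDX_CNT];	/* pointer-sized slots, modelled as 64-bit */
};

/* Number of whole pages needed to hold one item of the given size. */
static size_t
sketch_pages_per_item(size_t item_size, size_t page_size)
{
	assert(page_size != 0);
	return (item_size + page_size - 1) / page_size;
}
```

Under those assumptions sizeof(struct sketch_vp) is 512 * 8 + 512 * 8 = 8192 bytes, so sketch_pages_per_item() returns 2: every item needs two contiguous pages of kva, which a single-page allocator cannot hand out.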

The items in the vp pool are large: 8192 bytes, 2 pages. So the pool uses the (interrupt-safe) multi-page allocator. That allocator uses the kmem_map, which only covers a rather limited part of kva space. And when we run out, we effectively deadlock, since we have no real push-back mechanism.

We could make the kmem_map bigger, but since the page tables can grow without a clear bound, that doesn't really solve anything. A better approach would be to use the non-interrupt-safe multi-page pool allocator here. That should be ok since we don't actually enter userland mappings from interrupt context. It may have some implications for SMP though. At the very least, future SMP work will have to be aware that the non-interrupt-safe pool allocator may take the kernel lock when allocating new pool pages.

Thoughts? ok?

P.S. The current pmap_vp_enter() code suggests that it may be called for the kernel pmap. That isn't actually true, and drahn@ had a fix for this in his SMP patch series. That diff also switched from PR_WAITOK to PR_NOWAIT, which I think is a good move. However, doing that without addressing the kva issue leads to a scenario where the kernel just spins refaulting if it runs out of kva space.


Index: arch/arm64/arm64/pmap.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
retrieving revision 1.31
diff -u -p -r1.31 pmap.c
--- arch/arm64/arm64/pmap.c	4 Apr 2017 12:56:24 -0000	1.31
+++ arch/arm64/arm64/pmap.c	13 Apr 2017 10:05:24 -0000
@@ -1474,8 +1474,8 @@ pmap_init(void)
 	pool_init(&pmap_pted_pool, sizeof(struct pte_desc), 0, IPL_VM, 0,
 	    "pted", NULL);
 	pool_setlowat(&pmap_pted_pool, 20);
-	pool_init(&pmap_vp_pool, sizeof(struct pmapvp2), PAGE_SIZE, IPL_VM, 0,
-	    "vp", NULL);
+	pool_init(&pmap_vp_pool, sizeof(struct pmapvp2), PAGE_SIZE, IPL_VM,
+	    PR_WAITOK, "vp", NULL);
 	/* pool_setlowat(&pmap_vp_pool, 20); */
 
 	pmap_initialized = 1;
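
For anybody wondering why passing a single flag to pool_init() is enough to change the backing allocator, here is a toy userland model of the selection as described above. It is only a sketch: the enum, the flag value and the function name are all hypothetical, and the single-page case for small items is my assumption rather than something the diff shows. The real logic lives in the pool code and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KB pages */
#define SKETCH_PR_WAITOK 0x1	/* hypothetical flag value */

enum sketch_allocator {
	SKETCH_ALLOC_SINGLE,	/* one page at a time (assumed default) */
	SKETCH_ALLOC_MULTI,	/* multi-page, interrupt-safe, kmem_map backed */
	SKETCH_ALLOC_MULTI_NI	/* multi-page, non-interrupt-safe */
};

/*
 * Pick a backing allocator for a pool whose items need pgsize bytes
 * of contiguous kva, given the flags passed at pool creation time.
 */
static enum sketch_allocator
sketch_pick_allocator(size_t pgsize, int flags)
{
	assert(pgsize != 0);
	if (pgsize <= SKETCH_PAGE_SIZE)
		return SKETCH_ALLOC_SINGLE;
	return (flags & SKETCH_PR_WAITOK) ?
	    SKETCH_ALLOC_MULTI_NI : SKETCH_ALLOC_MULTI;
}
```

In this model, sketch_pick_allocator(8192, 0) lands on the interrupt-safe allocator that is limited by the kmem_map, while sketch_pick_allocator(8192, SKETCH_PR_WAITOK) lands on the non-interrupt-safe one, which is the switch the diff is after.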