Re: remove kcopy
On Fri, Jan 10, 2014 at 6:31 AM, Ted Unangst t...@tedunangst.com wrote:
> On Fri, Jan 10, 2014 at 05:14, Miod Vallat wrote:
> > > The only caller of kcopy is uiomove. There is no way a function
> > > like this can ever work. If you need to rely on your copy function
> > > to save you from pointers outside the address space, it means you
> > > don't know what garbage you're passing it. Meaning you may well be
> > > passing it pointers inside the address space, but to something
> > > unexpected, which you will then shit on. Replace with memcpy.
> >
> > Vetoed. kcopy() is not only used to move data from the kernel data
> > section to the kernel data section. It is used to move data *within
> > the kernel address space* to data *within the kernel address space*.
> > Think dd if=/dev/mem ...
>
> isn't that an example of kernel address space to userland? i did dig
> around a bit into uvm_io and callers, but didn't see anything that
> depended on kcopy fault protection. there were some comments
> indicating it is perhaps a holdover from swappable upage?

uvm_io maps userland map entries into kernel_map, leaving them in exactly the same state as in userland. Even if it weren't possible to create valid userland map entries that always fault (it is, see below), they can still fault on errors.

Here's a test that will crash the kernel without kcopy:
https://github.com/art4711/stuff/blob/master/pttest/pttest.c

Instead of ptrace we can trigger this with sysctl(KERN_PROC_ARGS) or by dumping core. Instead of mmap of unallocated file space you can use revoke(2), mprotect, PT_WRITE to an mmap'ed hole in a file on a full filesystem, etc. Add to that a combinatorial explosion of other situations where errors propagate back to the fault, and it is almost impossible to make sure that whatever goes through uvm_io will never fault (I guess you could try with vslock). This is not just limited to uvm_io.
I bet this can be triggered through exec on tmpfs and the pageable mappings of its aobj too (out of memory), and anything else that can somehow end up being a uiomove to/from kernel_map, exec_map or some other pageable map.

Don't do it.

//art
Kill P_BIGLOCK
P_BIGLOCK is only used to figure out if the process holds the biglock. The problem with this is that the first entry point from a sleepable context into the kernel needs to call KERNEL_PROC_LOCK, while recursive (or non-process) entry points need to call KERNEL_LOCK. Pedro showed at least one entry point where we got it wrong; there might be others.

Instead of playing with the flag in mi_switch, just check that we're the current biglock holder. This makes KERNEL_PROC_LOCK and KERNEL_LOCK more or less equivalent. Cleanup will come after.

//art

Index: kern/kern_lock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.35
diff -u -r1.35 kern_lock.c
--- kern/kern_lock.c	26 Apr 2010 05:48:17 -0000	1.35
+++ kern/kern_lock.c	5 Jul 2011 19:34:47 -0000
@@ -378,13 +378,11 @@
 {
 	SCHED_ASSERT_UNLOCKED();
 	__mp_lock(&kernel_lock);
-	atomic_setbits_int(&p->p_flag, P_BIGLOCK);
 }
 
 void
 _kernel_proc_unlock(struct proc *p)
 {
-	atomic_clearbits_int(&p->p_flag, P_BIGLOCK);
 	__mp_unlock(&kernel_lock);
 }
Index: kern/sched_bsd.c
===================================================================
RCS file: /cvs/src/sys/kern/sched_bsd.c,v
retrieving revision 1.25
diff -u -r1.25 sched_bsd.c
--- kern/sched_bsd.c	7 Mar 2011 07:07:13 -0000	1.25
+++ kern/sched_bsd.c	5 Jul 2011 19:34:47 -0000
@@ -366,8 +366,10 @@
 	 * Release the kernel_lock, as we are about to yield the CPU.
 	 */
 	sched_count = __mp_release_all_but_one(&sched_lock);
-	if (p->p_flag & P_BIGLOCK)
+	if (__mp_lock_held(&kernel_lock))
 		hold_count = __mp_release_all(&kernel_lock);
+	else
+		hold_count = 0;
 #endif
@@ -448,7 +450,7 @@
 	 * released the scheduler lock to avoid deadlock, and before
 	 * we reacquire the interlock and the scheduler lock.
 	 */
-	if (p->p_flag & P_BIGLOCK)
+	if (hold_count)
 		__mp_acquire_count(&kernel_lock, hold_count);
 	__mp_acquire_count(&sched_lock, sched_count + 1);
 #endif
Re: Filesystem Hierarchy Standard (FHS) and OpenBSD
On Tue, May 10, 2011 at 5:33 AM, Jeff Licquia j...@licquia.org wrote:
> My question to you is: do you consider the FHS to be relevant to
> current and future development of OpenBSD? If not, is this simply due
> to lack of maintenance; would your interest in the FHS be greater with
> more consistent updates?

More updates will not atone for /lib64.

//art
I am an idiot in km_alloc
Free the correct memory when we failed to allocate va.

//art

Index: uvm/uvm_km.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_km.c,v
retrieving revision 1.97
diff -u -r1.97 uvm_km.c
--- uvm/uvm_km.c	18 Apr 2011 19:23:46 -0000	1.97
+++ uvm/uvm_km.c	19 Apr 2011 15:46:45 -0000
@@ -928,7 +928,8 @@
 	while (uvm_km_pages.free == 0) {
 		if (kd->kd_waitok == 0) {
 			mtx_leave(&uvm_km_pages.mtx);
-			uvm_pagefree(pg);
+			if (!TAILQ_EMPTY(&pgl))
+				uvm_pglistfree(&pgl);
 			return NULL;
 		}
 		msleep(&uvm_km_pages.free, &uvm_km_pages.mtx, PVM,
@@ -961,6 +962,8 @@
 			tsleep(map, PVM, "km_allocva", 0);
 			goto try_map;
 		}
+		if (!TAILQ_EMPTY(&pgl))
+			uvm_pglistfree(&pgl);
 		return (NULL);
 	}
 }
km_alloc for stack and exec
A repeat of an earlier diff. Change stack and exec argument allocation from the old allocators to km_alloc(9).

//art

Index: kern/kern_exec.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exec.c,v
retrieving revision 1.117
diff -u -r1.117 kern_exec.c
--- kern/kern_exec.c	4 Apr 2011 13:00:13 -0000	1.117
+++ kern/kern_exec.c	18 Apr 2011 19:37:08 -0000
@@ -227,6 +227,11 @@
 	return (error);
 }
 
+struct kmem_va_mode kv_exec = {
+	.kv_map = &exec_map,
+	.kv_wait = 1
+};
+
 /*
  * exec system call
  */
@@ -312,7 +317,7 @@
 	/* XXX -- THE FOLLOWING SECTION NEEDS MAJOR CLEANUP */
 	/* allocate an argument buffer */
-	argp = (char *) uvm_km_valloc_wait(exec_map, NCARGS);
+	argp = km_alloc(NCARGS, &kv_exec, &kp_pageable, &kd_waitok);
 #ifdef DIAGNOSTIC
 	if (argp == NULL)
 		panic("execve: argp == NULL");
@@ -592,7 +597,7 @@
 		splx(s);
 	}
 
-	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
 	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
 	vn_close(pack.ep_vp, FREAD, cred, p);
@@ -689,7 +694,7 @@
 	/* close and put the exec'd file */
 	vn_close(pack.ep_vp, FREAD, cred, p);
 	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
-	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
 freehdr:
 	free(pack.ep_hdr, M_EXEC);
@@ -717,7 +722,7 @@
 	free(pack.ep_emul_arg, M_TEMP);
 	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
 	vn_close(pack.ep_vp, FREAD, cred, p);
-	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
 free_pack_abort:
 	free(pack.ep_hdr, M_EXEC);
Index: kern/kern_fork.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.125
diff -u -r1.125 kern_fork.c
--- kern/kern_fork.c	3 Apr 2011 14:56:28 -0000	1.125
+++ kern/kern_fork.c	18 Apr 2011 19:37:08 -0000
@@ -195,6 +195,11 @@
 /* print the 'table full' message once per 10 seconds */
 struct timeval fork_tfmrate = { 10, 0 };
 
+struct kmem_va_mode kv_fork = {
+	.kv_map = &kernel_map,
+	.kv_align = USPACE_ALIGN
+};
+
 int
 fork1(struct proc *p1, int exitsig, int flags, void *stack, size_t stacksize,
     void (*func)(void *), void *arg, register_t *retval,
@@ -204,7 +209,7 @@
 	uid_t uid;
 	struct vmspace *vm;
 	int count;
-	vaddr_t uaddr;
+	struct user *uaddr;
 	int s;
 	extern void endtsleep(void *);
 	extern void realitexpire(void *);
@@ -251,10 +256,7 @@
 		return (EAGAIN);
 	}
 
-	uaddr = uvm_km_kmemalloc_pla(kernel_map, uvm.kernel_object, USPACE,
-	    USPACE_ALIGN, UVM_KMF_ZERO,
-	    dma_constraint.ucr_low, dma_constraint.ucr_high,
-	    0, 0, USPACE/PAGE_SIZE);
+	uaddr = km_alloc(USPACE, &kv_fork, &kp_dma_zero, &kd_waitok);
 	if (uaddr == 0) {
 		chgproccnt(uid, -1);
 		nprocs--;
Index: kern/sys_pipe.c
===================================================================
RCS file: /cvs/src/sys/kern/sys_pipe.c,v
retrieving revision 1.58
diff -u -r1.58 sys_pipe.c
--- kern/sys_pipe.c	14 Jan 2010 23:12:11 -0000	1.58
+++ kern/sys_pipe.c	18 Apr 2011 19:37:08 -0000
@@ -168,9 +168,9 @@
 int
 pipespace(struct pipe *cpipe, u_int size)
 {
-	caddr_t buffer;
+	void *buffer;
 
-	buffer = (caddr_t)uvm_km_valloc(kernel_map, size);
+	buffer = km_alloc(size, &kv_any, &kp_pageable, &kd_waitok);
 	if (buffer == NULL) {
 		return (ENOMEM);
 	}
@@ -714,8 +714,8 @@
 		if (cpipe->pipe_buffer.size > PIPE_SIZE)
 			--nbigpipe;
 		amountpipekva -= cpipe->pipe_buffer.size;
-		uvm_km_free(kernel_map, (vaddr_t)cpipe->pipe_buffer.buffer,
-		    cpipe->pipe_buffer.size);
+		km_free(cpipe->pipe_buffer.buffer, cpipe->pipe_buffer.size,
+		    &kv_any, &kp_pageable);
 		cpipe->pipe_buffer.buffer = NULL;
 	}
 }
Index: uvm/uvm_glue.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
retrieving revision 1.58
diff -u -r1.58 uvm_glue.c
--- uvm/uvm_glue.c	15 Apr 2011 21:47:24 -0000	1.58
+++ uvm/uvm_glue.c	18 Apr 2011 19:37:09 -0000
@@ -361,9 +361,11 @@
 void
 uvm_exit(struct proc *p)
 {
+	extern struct kmem_va_mode kv_fork;
+
 	uvmspace_free(p->p_vmspace);
 	p->p_vmspace = NULL;
-	uvm_km_free(kernel_map, (vaddr_t)p->p_addr, USPACE);
+	km_free(p->p_addr, USPACE, &kv_fork, &kp_dma);
 	p->p_addr = NULL;
 }
more km_alloc - fork, exec and pipes
A few more conversions to km_alloc: exec arguments, kernel stacks and pipe buffers. Tested on amd64, i386 and sparc. Please give it a spin on other architectures; I would be especially interested in mips64, since it's the only one that needs kernel stack alignment.

//art

Index: kern/kern_exec.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exec.c,v
retrieving revision 1.117
diff -u -r1.117 kern_exec.c
--- kern/kern_exec.c	4 Apr 2011 13:00:13 -0000	1.117
+++ kern/kern_exec.c	5 Apr 2011 20:45:08 -0000
@@ -227,6 +227,11 @@
 	return (error);
 }
 
+struct kmem_va_mode kv_exec = {
+	.kv_map = &exec_map,
+	.kv_wait = 1
+};
+
 /*
  * exec system call
  */
@@ -312,7 +317,7 @@
 	/* XXX -- THE FOLLOWING SECTION NEEDS MAJOR CLEANUP */
 	/* allocate an argument buffer */
-	argp = (char *) uvm_km_valloc_wait(exec_map, NCARGS);
+	argp = km_alloc(NCARGS, &kv_exec, &kp_pageable, &kd_waitok);
 #ifdef DIAGNOSTIC
 	if (argp == NULL)
 		panic("execve: argp == NULL");
@@ -592,7 +597,7 @@
 		splx(s);
 	}
 
-	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
 	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
 	vn_close(pack.ep_vp, FREAD, cred, p);
@@ -689,7 +694,7 @@
 	/* close and put the exec'd file */
 	vn_close(pack.ep_vp, FREAD, cred, p);
 	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
-	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
 freehdr:
 	free(pack.ep_hdr, M_EXEC);
@@ -717,7 +722,7 @@
 	free(pack.ep_emul_arg, M_TEMP);
 	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
 	vn_close(pack.ep_vp, FREAD, cred, p);
-	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
 free_pack_abort:
 	free(pack.ep_hdr, M_EXEC);
Index: kern/kern_fork.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.125
diff -u -r1.125 kern_fork.c
--- kern/kern_fork.c	3 Apr 2011 14:56:28 -0000	1.125
+++ kern/kern_fork.c	5 Apr 2011 20:45:08 -0000
@@ -195,6 +195,11 @@
 /* print the 'table full' message once per 10 seconds */
 struct timeval fork_tfmrate = { 10, 0 };
 
+struct kmem_va_mode kv_fork = {
+	.kv_map = &kernel_map,
+	.kv_align = USPACE_ALIGN
+};
+
 int
 fork1(struct proc *p1, int exitsig, int flags, void *stack, size_t stacksize,
     void (*func)(void *), void *arg, register_t *retval,
@@ -204,7 +209,7 @@
 	uid_t uid;
 	struct vmspace *vm;
 	int count;
-	vaddr_t uaddr;
+	struct user *uaddr;
 	int s;
 	extern void endtsleep(void *);
 	extern void realitexpire(void *);
@@ -251,10 +256,7 @@
 		return (EAGAIN);
 	}
 
-	uaddr = uvm_km_kmemalloc_pla(kernel_map, uvm.kernel_object, USPACE,
-	    USPACE_ALIGN, UVM_KMF_ZERO,
-	    dma_constraint.ucr_low, dma_constraint.ucr_high,
-	    0, 0, USPACE/PAGE_SIZE);
+	uaddr = km_alloc(USPACE, &kv_fork, &kp_dma_zero, &kd_waitok);
 	if (uaddr == 0) {
 		chgproccnt(uid, -1);
 		nprocs--;
Index: kern/sys_pipe.c
===================================================================
RCS file: /cvs/src/sys/kern/sys_pipe.c,v
retrieving revision 1.58
diff -u -r1.58 sys_pipe.c
--- kern/sys_pipe.c	14 Jan 2010 23:12:11 -0000	1.58
+++ kern/sys_pipe.c	5 Apr 2011 20:45:08 -0000
@@ -168,9 +168,9 @@
 int
 pipespace(struct pipe *cpipe, u_int size)
 {
-	caddr_t buffer;
+	void *buffer;
 
-	buffer = (caddr_t)uvm_km_valloc(kernel_map, size);
+	buffer = km_alloc(size, &kv_any, &kp_pageable, &kd_waitok);
 	if (buffer == NULL) {
 		return (ENOMEM);
 	}
@@ -714,8 +714,8 @@
 		if (cpipe->pipe_buffer.size > PIPE_SIZE)
 			--nbigpipe;
 		amountpipekva -= cpipe->pipe_buffer.size;
-		uvm_km_free(kernel_map, (vaddr_t)cpipe->pipe_buffer.buffer,
-		    cpipe->pipe_buffer.size);
+		km_free(cpipe->pipe_buffer.buffer, cpipe->pipe_buffer.size,
+		    &kv_any, &kp_pageable);
 		cpipe->pipe_buffer.buffer = NULL;
 	}
 }
Index: uvm/uvm_glue.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
retrieving revision 1.56
diff -u -r1.56 uvm_glue.c
--- uvm/uvm_glue.c	1 Apr 2011 15:43:13 -0000	1.56
+++ uvm/uvm_glue.c	5 Apr 2011 20:45:08 -0000
@@ -361,9 +361,11 @@
 void
 uvm_exit(struct proc *p)
 {
+	extern struct kmem_va_mode kv_fork;
+
 	uvmspace_free(p->p_vmspace);
 	p->p_vmspace = NULL;
-	uvm_km_free(kernel_map, (vaddr_t)p->p_addr, USPACE);
+	km_free(p->p_addr, USPACE, &kv_fork, &kp_dma);
Re: more km_alloc - fork, exec and pipes
On Tue, Apr 5, 2011 at 11:16 PM, Mark Kettenis mark.kette...@xs4all.nl wrote:
> > +	uaddr = km_alloc(USPACE, &kv_fork, &kp_dma_zero, &kd_waitok);
> > 	if (uaddr == 0) {
>
> ...you should use NULL in the comparison here and drop the
> (struct user *) cast a bit further down.

Yup. I'll fix that after commit.

//art
Use km_alloc instead of the single page allocator
First proper use of the new km_alloc.

- Change pool constraints to use kmem_pa_mode instead of uvm_constraint_range.
- Use km_alloc for all backend allocations in pools.
- Use km_alloc for the emergency kentry allocations in uvm_mapent_alloc.
- Garbage collect uvm_km_getpage, uvm_km_getpage_pla and uvm_km_putpage.

Please eyeball and test this.

//art

Index: kern/dma_alloc.c
===================================================================
RCS file: /cvs/src/sys/kern/dma_alloc.c,v
retrieving revision 1.5
diff -u -r1.5 dma_alloc.c
--- kern/dma_alloc.c	2 Apr 2011 17:06:21 -0000	1.5
+++ kern/dma_alloc.c	4 Apr 2011 21:30:57 -0000
@@ -37,7 +37,7 @@
 		    1 << (i + DMA_BUCKET_OFFSET));
 		pool_init(&dmapools[i], 1 << (i + DMA_BUCKET_OFFSET), 0, 0, 0,
 		    dmanames[i], NULL);
-		pool_set_constraints(&dmapools[i], &dma_constraint, 1);
+		pool_set_constraints(&dmapools[i], &kp_dma);
 		pool_setipl(&dmapools[i], IPL_VM);
 		/* XXX need pool_setlowat(&dmapools[i], dmalowat); */
 	}
Index: kern/subr_pool.c
===================================================================
RCS file: /cvs/src/sys/kern/subr_pool.c,v
retrieving revision 1.101
diff -u -r1.101 subr_pool.c
--- kern/subr_pool.c	4 Apr 2011 11:13:55 -0000	1.101
+++ kern/subr_pool.c	4 Apr 2011 21:30:58 -0000
@@ -401,8 +401,7 @@
 	}
 
 	/* pglistalloc/constraint parameters */
-	pp->pr_crange = &no_constraint;
-	pp->pr_pa_nsegs = 0;
+	pp->pr_crange = &kp_dirty;
 
 	/* Insert this into the list of all pools. */
 	TAILQ_INSERT_HEAD(&pool_head, pp, pr_poollist);
@@ -1013,18 +1012,9 @@
 }
 
 void
-pool_set_constraints(struct pool *pp, struct uvm_constraint_range *range,
-    int nsegs)
+pool_set_constraints(struct pool *pp, struct kmem_pa_mode *mode)
 {
-	/*
-	 * Subsequent changes to the constrictions are only
-	 * allowed to make them _more_ strict.
-	 */
-	KASSERT(pp->pr_crange->ucr_high >= range->ucr_high &&
-	    pp->pr_crange->ucr_low <= range->ucr_low);
-
-	pp->pr_crange = range;
-	pp->pr_pa_nsegs = nsegs;
+	pp->pr_crange = mode;
 }
 
 void
@@ -1495,32 +1485,36 @@
 void *
 pool_page_alloc(struct pool *pp, int flags, int *slowdown)
 {
-	int kfl = (flags & PR_WAITOK) ? 0 : UVM_KMF_NOWAIT;
+	struct kmem_dyn_mode kd = KMEM_DYN_INITIALIZER;
+
+	kd.kd_waitok = (flags & PR_WAITOK);
+	kd.kd_slowdown = slowdown;
 
-	return (uvm_km_getpage_pla(kfl, slowdown, pp->pr_crange->ucr_low,
-	    pp->pr_crange->ucr_high, 0, 0));
+	return (km_alloc(PAGE_SIZE, &kv_page, pp->pr_crange, &kd));
 }
 
 void
 pool_page_free(struct pool *pp, void *v)
 {
-	uvm_km_putpage(v);
+	km_free(v, PAGE_SIZE, &kv_page, pp->pr_crange);
 }
 
 void *
 pool_large_alloc(struct pool *pp, int flags, int *slowdown)
 {
-	int kfl = (flags & PR_WAITOK) ? 0 : UVM_KMF_NOWAIT;
-	vaddr_t va;
+	struct kmem_dyn_mode kd = KMEM_DYN_INITIALIZER;
+	void *v;
 	int s;
 
+	kd.kd_waitok = (flags & PR_WAITOK);
+	kd.kd_slowdown = slowdown;
+
 	s = splvm();
-	va = uvm_km_kmemalloc_pla(kmem_map, NULL, pp->pr_alloc->pa_pagesz, 0,
-	    kfl, pp->pr_crange->ucr_low, pp->pr_crange->ucr_high,
-	    0, 0, pp->pr_pa_nsegs);
+	v = km_alloc(pp->pr_alloc->pa_pagesz, &kv_intrsafe, pp->pr_crange,
+	    &kd);
 	splx(s);
 
-	return ((void *)va);
+	return (v);
 }
 
 void
@@ -1529,23 +1523,23 @@
 	int s;
 
 	s = splvm();
-	uvm_km_free(kmem_map, (vaddr_t)v, pp->pr_alloc->pa_pagesz);
+	km_free(v, pp->pr_alloc->pa_pagesz, &kv_intrsafe, pp->pr_crange);
 	splx(s);
 }
 
 void *
 pool_large_alloc_ni(struct pool *pp, int flags, int *slowdown)
 {
-	int kfl = (flags & PR_WAITOK) ? 0 : UVM_KMF_NOWAIT;
+	struct kmem_dyn_mode kd = KMEM_DYN_INITIALIZER;
+
+	kd.kd_waitok = (flags & PR_WAITOK);
+	kd.kd_slowdown = slowdown;
 
-	return ((void *)uvm_km_kmemalloc_pla(kernel_map, uvm.kernel_object,
-	    pp->pr_alloc->pa_pagesz, 0, kfl,
-	    pp->pr_crange->ucr_low, pp->pr_crange->ucr_high,
-	    0, 0, pp->pr_pa_nsegs));
+	return (km_alloc(pp->pr_alloc->pa_pagesz, &kv_any, pp->pr_crange, &kd));
}
 
 void
 pool_large_free_ni(struct pool *pp, void *v)
 {
-	uvm_km_free(kernel_map, (vaddr_t)v, pp->pr_alloc->pa_pagesz);
+	km_free(v, pp->pr_alloc->pa_pagesz, &kv_any, pp->pr_crange);
 }
Index: kern/uipc_mbuf.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.149
diff -u -r1.149 uipc_mbuf.c
--- kern/uipc_mbuf.c	29 Jan 2011 13:15:39 -0000	1.149
+++ kern/uipc_mbuf.c	4 Apr 2011 21:30:59 -0000
@@ -136,7 +136,7 @@
 	int i;
 
 	pool_init(&mbpool, MSIZE, 0, 0, 0, "mbpl", NULL);
-	pool_set_constraints(&mbpool, &dma_constraint, 1);
+	pool_set_constraints(&mbpool, &kp_dma);
Fix physio on bigmem
There were two problems with the vslock_device functions that are used for magic page flipping for physio and bigmem.

- Fix error handling so that we free stuff on error.
- We use the mappings to keep track of which pages need to be freed, so don't unmap before freeing (this is theoretically incorrect and will be fixed soon).

This makes fsck happy on bigmem machines (it doesn't leak all DMA-able memory anymore).

Index: uvm/uvm_glue.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
retrieving revision 1.55
diff -u -r1.55 uvm_glue.c
--- uvm/uvm_glue.c	2 Jul 2010 22:38:32 -0000	1.55
+++ uvm/uvm_glue.c	1 Apr 2011 15:08:40 -0000
@@ -222,8 +222,10 @@
 			paddr_t pa;
 
 			if (!pmap_extract(p->p_vmspace->vm_map.pmap,
-			    start + ptoa(i), &pa))
-				return (EFAULT);
+			    start + ptoa(i), &pa)) {
+				error = EFAULT;
+				goto out_unwire;
+			}
 			if (!PADDR_IS_DMA_REACHABLE(pa))
 				break;
 		}
@@ -233,13 +235,15 @@
 	}
 
 	if ((va = uvm_km_valloc(kernel_map, sz)) == 0) {
-		return (ENOMEM);
+		error = ENOMEM;
+		goto out_unwire;
 	}
 
 	TAILQ_INIT(&pgl);
 	error = uvm_pglistalloc(npages * PAGE_SIZE, dma_constraint.ucr_low,
 	    dma_constraint.ucr_high, 0, 0, &pgl, npages, UVM_PLA_WAITOK);
-	KASSERT(error == 0);
+	if (error)
+		goto out_unmap;
 
 	sva = va;
 	while ((pg = TAILQ_FIRST(&pgl)) != NULL) {
@@ -252,7 +256,16 @@
 	KASSERT(va == sva + sz);
 	*retp = (void *)(sva + off);
 
-	error = copyin(addr, *retp, len);
+	if ((error = copyin(addr, *retp, len)) == 0)
+		return 0;
+
+	uvm_km_pgremove_intrsafe(sva, sva + sz);
+	pmap_kremove(sva, sz);
+	pmap_update(pmap_kernel());
+out_unmap:
+	uvm_km_free(kernel_map, sva, sz);
+out_unwire:
+	uvm_fault_unwire(&p->p_vmspace->vm_map, start, end);
 	return (error);
 }
@@ -277,9 +290,9 @@
 		return;
 
 	kva = trunc_page((vaddr_t)map);
+	uvm_km_pgremove_intrsafe(kva, kva + sz);
 	pmap_kremove(kva, sz);
 	pmap_update(pmap_kernel());
-	uvm_km_pgremove_intrsafe(kva, kva + sz);
 	uvm_km_free(kernel_map, kva, sz);
 }
Re: UBC?
Ariane van der Steldt ari...@stack.nl writes:
> Why are the pventries allocated from the kmem_map anyway? I think they
> should be allocated using uvm_km_getpage instead. Or even better, from
> a pvpool like amd64.

Recursion. The caller holds a lock on kernel_map, the getpage pool is empty, the caller wakes up the getpage thread and goes to sleep (still holding the kernel_map lock), the getpage thread wakes up and deadlocks on the kernel_map lock. It's not an easily detectable recursion either, so we don't panic when it happens, we just hang.

amd64 can avoid it thanks to the direct map (no kernel_map involved when calling getpage).

We could try some magic with allocating from a pool with NOWAIT and then falling back to kmem_map when that fails, but the logic would become hairy. Maybe a pool allocator with those semantics?

//art