Re: remove kcopy

2014-01-10 Thread Artur Grabowski
On Fri, Jan 10, 2014 at 6:31 AM, Ted Unangst <t...@tedunangst.com> wrote:
> On Fri, Jan 10, 2014 at 05:14, Miod Vallat wrote:
>>> The only caller of kcopy is uiomove. There is no way a function like
>>> this can ever work. If you need to rely on your copy function to save
>>> you from pointers outside the address space, it means you don't know
>>> what garbage you're passing it. Meaning you may well be passing it
>>> pointers inside the address space, but to something unexpected, which
>>> you will then shit on.
>>>
>>> Replace with memcpy.
>>
>> Vetoed.
>>
>> kcopy() is not only used to move data from the kernel data section to
>> the kernel data section.
>>
>> It is used to move data *within the kernel address space* to data
>> *within the kernel address space*. Think dd if=/dev/mem ...
>
> isn't that an example of kernel address space to userland?
>
> i did dig around a bit into uvm_io and callers, but didn't see
> anything that depended on kcopy fault protection. there were some
> comments indicating it is perhaps a holdover from swappable upage?

uvm_io maps userland map entries into kernel_map leaving them in the
exact same state as in userland. Even if it wasn't possible to create
valid userland map entries that always fault (it is, see below), they
can still fault on errors.

Here's a test that will crash the kernel without kcopy:
https://github.com/art4711/stuff/blob/master/pttest/pttest.c

Instead of ptrace we can trigger this with sysctl(KERN_PROC_ARGS) or
dumping core. Instead of mmap of unallocated file space you can use
revoke(2), mprotect, PT_WRITE to an mmap'ed hole in a file on a full
filesystem, etc. Add to that a combinatorial explosion of other
situations where errors propagate back to the fault and it is almost
impossible to make sure that whatever goes through uvm_io will never
fault (I guess you could try with vslock).

This is not just limited to uvm_io. I bet this can be triggered
through exec on tmpfs and the pageable mappings of its aobj too (out
of memory) and anything else that can somehow end up being an uiomove
to/from kernel_map, exec_map or some other pageable map.

Don't do it.

//art



Kill P_BIGLOCK

2011-07-05 Thread Artur Grabowski
P_BIGLOCK is only used to figure out if the process holds the biglock.
The problem with this is that the first entry point from a sleepable context
to the kernel needs to call KERNEL_PROC_LOCK, while recursive (or non-process)
entry points need to call KERNEL_LOCK. Pedro showed at least one entry
point where we got it wrong; there might be others.

Instead of playing with the flag in mi_switch, just check that we're the
current biglock holder. Make KERNEL_PROC_LOCK and KERNEL_LOCK more or less
equivalent.
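
To spell out why the flag is fragile (the entry point below is hypothetical,
purely for illustration): every entry point has to classify itself as first
or recursive and pick the matching macro pair, and one misclassified path
leaves P_BIGLOCK out of sync with the lock, after which mi_switch() either
releases a biglock it does not hold or sleeps while still holding it. Asking
the lock itself with __mp_lock_held() removes that dependency on every caller
being right.

void
frobnicate_enter(struct proc *p, int first_entry)
{
	/*
	 * With P_BIGLOCK this classification has to be correct on every
	 * path; with the __mp_lock_held() check in mi_switch() a mistake
	 * here can no longer confuse the scheduler.
	 */
	if (first_entry)
		KERNEL_PROC_LOCK(p);	/* takes the biglock, sets P_BIGLOCK */
	else
		KERNEL_LOCK();		/* takes the biglock, flag untouched */

	/* ... do the actual work, possibly sleeping ... */

	if (first_entry)
		KERNEL_PROC_UNLOCK(p);	/* clears P_BIGLOCK */
	else
		KERNEL_UNLOCK();
}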

Cleanup will come after.

//art

Index: kern/kern_lock.c
===
RCS file: /cvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.35
diff -u -r1.35 kern_lock.c
--- kern/kern_lock.c26 Apr 2010 05:48:17 -  1.35
+++ kern/kern_lock.c5 Jul 2011 19:34:47 -
@@ -378,13 +378,11 @@
 {
SCHED_ASSERT_UNLOCKED();
	__mp_lock(&kernel_lock);
-	atomic_setbits_int(&p->p_flag, P_BIGLOCK);
 }
 
 void
 _kernel_proc_unlock(struct proc *p)
 {
-	atomic_clearbits_int(&p->p_flag, P_BIGLOCK);
	__mp_unlock(&kernel_lock);
 }
 
Index: kern/sched_bsd.c
===
RCS file: /cvs/src/sys/kern/sched_bsd.c,v
retrieving revision 1.25
diff -u -r1.25 sched_bsd.c
--- kern/sched_bsd.c7 Mar 2011 07:07:13 -   1.25
+++ kern/sched_bsd.c5 Jul 2011 19:34:47 -
@@ -366,8 +366,10 @@
 * Release the kernel_lock, as we are about to yield the CPU.
 */
	sched_count = __mp_release_all_but_one(&sched_lock);
-	if (p->p_flag & P_BIGLOCK)
+	if (__mp_lock_held(&kernel_lock))
		hold_count = __mp_release_all(&kernel_lock);
+   else
+   hold_count = 0;
 #endif
 
/*
@@ -448,7 +450,7 @@
 * released the scheduler lock to avoid deadlock, and before
 * we reacquire the interlock and the scheduler lock.
 */
-	if (p->p_flag & P_BIGLOCK)
+	if (hold_count)
		__mp_acquire_count(&kernel_lock, hold_count);
	__mp_acquire_count(&sched_lock, sched_count + 1);
 #endif



Re: Filesystem Hierarchy Standard (FHS) and OpenBSD

2011-05-10 Thread Artur Grabowski
On Tue, May 10, 2011 at 5:33 AM, Jeff Licquia <j...@licquia.org> wrote:

> My question to you is: do you consider the FHS to be relevant to current and
> future development of OpenBSD?  If not, is this simply due to lack of
> maintenance; would your interest in the FHS be greater with more consistent
> updates?

More updates will not atone for /lib64.

//art



I am an idiot in km_alloc

2011-04-19 Thread Artur Grabowski
Free the correct memory when we failed to allocate va.

//art

Index: uvm/uvm_km.c
===
RCS file: /cvs/src/sys/uvm/uvm_km.c,v
retrieving revision 1.97
diff -u -r1.97 uvm_km.c
--- uvm/uvm_km.c18 Apr 2011 19:23:46 -  1.97
+++ uvm/uvm_km.c19 Apr 2011 15:46:45 -
@@ -928,7 +928,8 @@
while (uvm_km_pages.free == 0) {
		if (kd->kd_waitok == 0) {
			mtx_leave(&uvm_km_pages.mtx);
-			uvm_pagefree(pg);
+			if (!TAILQ_EMPTY(&pgl))
+				uvm_pglistfree(&pgl);
			return NULL;
		}
		msleep(&uvm_km_pages.free, &uvm_km_pages.mtx, PVM,
@@ -961,6 +962,8 @@
			tsleep(map, PVM, "km_allocva", 0);
goto try_map;
}
+		if (!TAILQ_EMPTY(&pgl))
+			uvm_pglistfree(&pgl);
return (NULL);
}
}



km_alloc for stack and exec

2011-04-18 Thread Artur Grabowski
A repeat of an earlier diff.

Change stack and exec arguments allocation from old allocators to km_alloc(9).

//art
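
For reference, since the conversions below lean on it: km_alloc(9) describes
an allocation with three small mode structures passed by pointer, a
kmem_va_mode (which map the virtual addresses come from, alignment, whether
to wait for va), a kmem_pa_mode (what physical pages back the mapping, if
any) and a kmem_dyn_mode (per-call behaviour such as sleeping). A hedged
sketch of the calls this diff introduces, using the stock kp_pageable,
kp_dma_zero, kp_dma and kd_waitok modes from the km_alloc proposal;
km_alloc_examples() itself is just an illustrative wrapper:

/* The va mode used for exec argument buffers in the diff below. */
struct kmem_va_mode kv_exec = {
	.kv_map = &exec_map,	/* addresses come from exec_map */
	.kv_wait = 1		/* sleep if the map is out of space */
};

extern struct kmem_va_mode kv_fork;	/* defined in the kern_fork.c hunk */

void
km_alloc_examples(void)
{
	char *argp;
	struct user *uaddr;

	/* kern_exec.c: NCARGS bytes of pageable kernel memory, may sleep. */
	argp = km_alloc(NCARGS, &kv_exec, &kp_pageable, &kd_waitok);

	/* kern_fork.c: a zeroed, DMA-reachable kernel stack, aligned to
	 * USPACE_ALIGN via kv_fork. */
	uaddr = km_alloc(USPACE, &kv_fork, &kp_dma_zero, &kd_waitok);

	/* Freeing names the same va/pa modes so the right map and backing
	 * pages are torn down. */
	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
	km_free(uaddr, USPACE, &kv_fork, &kp_dma);
}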


Index: kern/kern_exec.c
===
RCS file: /cvs/src/sys/kern/kern_exec.c,v
retrieving revision 1.117
diff -u -r1.117 kern_exec.c
--- kern/kern_exec.c4 Apr 2011 13:00:13 -   1.117
+++ kern/kern_exec.c18 Apr 2011 19:37:08 -
@@ -227,6 +227,11 @@
return (error);
 }
 
+struct kmem_va_mode kv_exec = {
+	.kv_map = &exec_map,
+   .kv_wait = 1
+};
+
 /*
  * exec system call
  */
@@ -312,7 +317,7 @@
/* XXX -- THE FOLLOWING SECTION NEEDS MAJOR CLEANUP */
 
/* allocate an argument buffer */
-   argp = (char *) uvm_km_valloc_wait(exec_map, NCARGS);
+	argp = km_alloc(NCARGS, &kv_exec, &kp_pageable, &kd_waitok);
 #ifdef DIAGNOSTIC
if (argp == NULL)
		panic("execve: argp == NULL");
@@ -592,7 +597,7 @@
splx(s);
}
 
-   uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
vn_close(pack.ep_vp, FREAD, cred, p);
@@ -689,7 +694,7 @@
/* close and put the exec'd file */
vn_close(pack.ep_vp, FREAD, cred, p);
	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
-	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
  freehdr:
free(pack.ep_hdr, M_EXEC);
@@ -717,7 +722,7 @@
free(pack.ep_emul_arg, M_TEMP);
	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
	vn_close(pack.ep_vp, FREAD, cred, p);
-	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
 free_pack_abort:
free(pack.ep_hdr, M_EXEC);
Index: kern/kern_fork.c
===
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.125
diff -u -r1.125 kern_fork.c
--- kern/kern_fork.c3 Apr 2011 14:56:28 -   1.125
+++ kern/kern_fork.c18 Apr 2011 19:37:08 -
@@ -195,6 +195,11 @@
 /* print the 'table full' message once per 10 seconds */
 struct timeval fork_tfmrate = { 10, 0 };
 
+struct kmem_va_mode kv_fork = {
+	.kv_map = &kernel_map,
+   .kv_align = USPACE_ALIGN
+};
+
 int
 fork1(struct proc *p1, int exitsig, int flags, void *stack, size_t stacksize,
 void (*func)(void *), void *arg, register_t *retval,
@@ -204,7 +209,7 @@
uid_t uid;
struct vmspace *vm;
int count;
-   vaddr_t uaddr;
+   struct user *uaddr;
int s;
extern void endtsleep(void *);
extern void realitexpire(void *);
@@ -251,10 +256,7 @@
return (EAGAIN);
}
 
-   uaddr = uvm_km_kmemalloc_pla(kernel_map, uvm.kernel_object, USPACE,
-   USPACE_ALIGN, UVM_KMF_ZERO,
-   dma_constraint.ucr_low, dma_constraint.ucr_high,
-   0, 0, USPACE/PAGE_SIZE);
+	uaddr = km_alloc(USPACE, &kv_fork, &kp_dma_zero, &kd_waitok);
if (uaddr == 0) {
chgproccnt(uid, -1);
nprocs--;
Index: kern/sys_pipe.c
===
RCS file: /cvs/src/sys/kern/sys_pipe.c,v
retrieving revision 1.58
diff -u -r1.58 sys_pipe.c
--- kern/sys_pipe.c 14 Jan 2010 23:12:11 -  1.58
+++ kern/sys_pipe.c 18 Apr 2011 19:37:08 -
@@ -168,9 +168,9 @@
 int
 pipespace(struct pipe *cpipe, u_int size)
 {
-   caddr_t buffer;
+   void *buffer;
 
-   buffer = (caddr_t)uvm_km_valloc(kernel_map, size);
+	buffer = km_alloc(size, &kv_any, &kp_pageable, &kd_waitok);
if (buffer == NULL) {
return (ENOMEM);
}
@@ -714,8 +714,8 @@
	if (cpipe->pipe_buffer.size > PIPE_SIZE)
		--nbigpipe;
	amountpipekva -= cpipe->pipe_buffer.size;
-	uvm_km_free(kernel_map, (vaddr_t)cpipe->pipe_buffer.buffer,
-	    cpipe->pipe_buffer.size);
+	km_free(cpipe->pipe_buffer.buffer, cpipe->pipe_buffer.size,
+	    &kv_any, &kp_pageable);
	cpipe->pipe_buffer.buffer = NULL;
}
 }
Index: uvm/uvm_glue.c
===
RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
retrieving revision 1.58
diff -u -r1.58 uvm_glue.c
--- uvm/uvm_glue.c  15 Apr 2011 21:47:24 -  1.58
+++ uvm/uvm_glue.c  18 Apr 2011 19:37:09 -
@@ -361,9 +361,11 @@
 void
 uvm_exit(struct proc *p)
 {
+   extern struct kmem_va_mode kv_fork;
+
	uvmspace_free(p->p_vmspace);
	p->p_vmspace = NULL;
-	uvm_km_free(kernel_map, (vaddr_t)p->p_addr, USPACE);
+	km_free(p->p_addr, USPACE, &kv_fork, &kp_dma);
	p->p_addr = NULL;
 }



more km_alloc - fork, exec and pipes

2011-04-05 Thread Artur Grabowski
A few more conversions to km_alloc: exec arguments, kernel stacks and
pipe buffers.

Tested on amd64, i386 and sparc. Please give it a spin on other architectures;
I would be especially interested in mips64 since it's the only one that needs
kernel stack alignment.

//art

Index: kern/kern_exec.c
===
RCS file: /cvs/src/sys/kern/kern_exec.c,v
retrieving revision 1.117
diff -u -r1.117 kern_exec.c
--- kern/kern_exec.c4 Apr 2011 13:00:13 -   1.117
+++ kern/kern_exec.c5 Apr 2011 20:45:08 -
@@ -227,6 +227,11 @@
return (error);
 }
 
+struct kmem_va_mode kv_exec = {
+	.kv_map = &exec_map,
+   .kv_wait = 1
+};
+
 /*
  * exec system call
  */
@@ -312,7 +317,7 @@
/* XXX -- THE FOLLOWING SECTION NEEDS MAJOR CLEANUP */
 
/* allocate an argument buffer */
-   argp = (char *) uvm_km_valloc_wait(exec_map, NCARGS);
+	argp = km_alloc(NCARGS, &kv_exec, &kp_pageable, &kd_waitok);
 #ifdef DIAGNOSTIC
if (argp == NULL)
		panic("execve: argp == NULL");
@@ -592,7 +597,7 @@
splx(s);
}
 
-   uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+	km_free(argp, NCARGS, &kv_exec, &kp_pageable);
 
	pool_put(&namei_pool, nid.ni_cnd.cn_pnbuf);
vn_close(pack.ep_vp, FREAD, cred, p);
@@ -689,7 +694,7 @@
/* close and put the exec'd file */
vn_close(pack.ep_vp, FREAD, cred, p);
pool_put(namei_pool, nid.ni_cnd.cn_pnbuf);
-   uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+   km_free(argp, NCARGS, kv_exec, kp_pageable);
 
  freehdr:
free(pack.ep_hdr, M_EXEC);
@@ -717,7 +722,7 @@
free(pack.ep_emul_arg, M_TEMP);
pool_put(namei_pool, nid.ni_cnd.cn_pnbuf);
vn_close(pack.ep_vp, FREAD, cred, p);
-   uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
+   km_free(argp, NCARGS, kv_exec, kp_pageable);
 
 free_pack_abort:
free(pack.ep_hdr, M_EXEC);
Index: kern/kern_fork.c
===
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.125
diff -u -r1.125 kern_fork.c
--- kern/kern_fork.c3 Apr 2011 14:56:28 -   1.125
+++ kern/kern_fork.c5 Apr 2011 20:45:08 -
@@ -195,6 +195,11 @@
 /* print the 'table full' message once per 10 seconds */
 struct timeval fork_tfmrate = { 10, 0 };
 
+struct kmem_va_mode kv_fork = {
+	.kv_map = &kernel_map,
+   .kv_align = USPACE_ALIGN
+};
+
 int
 fork1(struct proc *p1, int exitsig, int flags, void *stack, size_t stacksize,
 void (*func)(void *), void *arg, register_t *retval,
@@ -204,7 +209,7 @@
uid_t uid;
struct vmspace *vm;
int count;
-   vaddr_t uaddr;
+   struct user *uaddr;
int s;
extern void endtsleep(void *);
extern void realitexpire(void *);
@@ -251,10 +256,7 @@
return (EAGAIN);
}
 
-   uaddr = uvm_km_kmemalloc_pla(kernel_map, uvm.kernel_object, USPACE,
-   USPACE_ALIGN, UVM_KMF_ZERO,
-   dma_constraint.ucr_low, dma_constraint.ucr_high,
-   0, 0, USPACE/PAGE_SIZE);
+	uaddr = km_alloc(USPACE, &kv_fork, &kp_dma_zero, &kd_waitok);
if (uaddr == 0) {
chgproccnt(uid, -1);
nprocs--;
Index: kern/sys_pipe.c
===
RCS file: /cvs/src/sys/kern/sys_pipe.c,v
retrieving revision 1.58
diff -u -r1.58 sys_pipe.c
--- kern/sys_pipe.c 14 Jan 2010 23:12:11 -  1.58
+++ kern/sys_pipe.c 5 Apr 2011 20:45:08 -
@@ -168,9 +168,9 @@
 int
 pipespace(struct pipe *cpipe, u_int size)
 {
-   caddr_t buffer;
+   void *buffer;
 
-   buffer = (caddr_t)uvm_km_valloc(kernel_map, size);
+	buffer = km_alloc(size, &kv_any, &kp_pageable, &kd_waitok);
if (buffer == NULL) {
return (ENOMEM);
}
@@ -714,8 +714,8 @@
	if (cpipe->pipe_buffer.size > PIPE_SIZE)
		--nbigpipe;
	amountpipekva -= cpipe->pipe_buffer.size;
-	uvm_km_free(kernel_map, (vaddr_t)cpipe->pipe_buffer.buffer,
-	    cpipe->pipe_buffer.size);
+	km_free(cpipe->pipe_buffer.buffer, cpipe->pipe_buffer.size,
+	    &kv_any, &kp_pageable);
	cpipe->pipe_buffer.buffer = NULL;
}
 }
Index: uvm/uvm_glue.c
===
RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
retrieving revision 1.56
diff -u -r1.56 uvm_glue.c
--- uvm/uvm_glue.c  1 Apr 2011 15:43:13 -   1.56
+++ uvm/uvm_glue.c  5 Apr 2011 20:45:08 -
@@ -361,9 +361,11 @@
 void
 uvm_exit(struct proc *p)
 {
+   extern struct kmem_va_mode kv_fork;
+
	uvmspace_free(p->p_vmspace);
	p->p_vmspace = NULL;
-	uvm_km_free(kernel_map, (vaddr_t)p->p_addr, USPACE);
+	km_free(p->p_addr, USPACE, &kv_fork, &kp_dma);

Re: more km_alloc - fork, exec and pipes

2011-04-05 Thread Artur Grabowski
On Tue, Apr 5, 2011 at 11:16 PM, Mark Kettenis <mark.kette...@xs4all.nl>
wrote:

>> +	uaddr = km_alloc(USPACE, &kv_fork, &kp_dma_zero, &kd_waitok);
>> 	if (uaddr == 0) {
>
> ...you should use NULL in the comparison here and drop the (struct
> user *) cast a bit further down.


Yup. I'll fix that after commit.

//art



Use km_alloc instead of the single page allocator

2011-04-04 Thread Artur Grabowski
First proper use of the new km_alloc.

 - Change pool constraints to use kmem_pa_mode instead of uvm_constraint_range
 - Use km_alloc for all backend allocations in pools.
 - Use km_alloc for the emergency kentry allocations in uvm_mapent_alloc
 - Garbage collect uvm_km_getpage, uvm_km_getpage_pla and uvm_km_putpage

Please eyeball and test this.

//art

Index: kern/dma_alloc.c
===
RCS file: /cvs/src/sys/kern/dma_alloc.c,v
retrieving revision 1.5
diff -u -r1.5 dma_alloc.c
--- kern/dma_alloc.c2 Apr 2011 17:06:21 -   1.5
+++ kern/dma_alloc.c4 Apr 2011 21:30:57 -
@@ -37,7 +37,7 @@
	    1 << (i + DMA_BUCKET_OFFSET));
	pool_init(&dmapools[i], 1 << (i + DMA_BUCKET_OFFSET), 0, 0, 0,
	    dmanames[i], NULL);
-	pool_set_constraints(&dmapools[i], &dma_constraint, 1);
+	pool_set_constraints(&dmapools[i], &kp_dma);
	pool_setipl(&dmapools[i], IPL_VM);
/* XXX need pool_setlowat(dmapools[i], dmalowat); */
}
Index: kern/subr_pool.c
===
RCS file: /cvs/src/sys/kern/subr_pool.c,v
retrieving revision 1.101
diff -u -r1.101 subr_pool.c
--- kern/subr_pool.c4 Apr 2011 11:13:55 -   1.101
+++ kern/subr_pool.c4 Apr 2011 21:30:58 -
@@ -401,8 +401,7 @@
}
 
/* pglistalloc/constraint parameters */
-	pp->pr_crange = &no_constraint;
-	pp->pr_pa_nsegs = 0;
+	pp->pr_crange = &kp_dirty;
 
	/* Insert this into the list of all pools. */
	TAILQ_INSERT_HEAD(&pool_head, pp, pr_poollist);
@@ -1013,18 +1012,9 @@
 }
 
 void
-pool_set_constraints(struct pool *pp, struct uvm_constraint_range *range,
-int nsegs)
+pool_set_constraints(struct pool *pp, struct kmem_pa_mode *mode)
 {
-   /*
-* Subsequent changes to the constrictions are only
-* allowed to make them _more_ strict.
-*/
-	KASSERT(pp->pr_crange->ucr_high >= range->ucr_high &&
-	    pp->pr_crange->ucr_low <= range->ucr_low);
-
-	pp->pr_crange = range;
-	pp->pr_pa_nsegs = nsegs;
+	pp->pr_crange = mode;
 }
 
 void
@@ -1495,32 +1485,36 @@
 void *
 pool_page_alloc(struct pool *pp, int flags, int *slowdown)
 {
-	int kfl = (flags & PR_WAITOK) ? 0 : UVM_KMF_NOWAIT;
+	struct kmem_dyn_mode kd = KMEM_DYN_INITIALIZER;
+
+	kd.kd_waitok = (flags & PR_WAITOK);
+	kd.kd_slowdown = slowdown;
 
-	return (uvm_km_getpage_pla(kfl, slowdown, pp->pr_crange->ucr_low,
-	    pp->pr_crange->ucr_high, 0, 0));
+	return (km_alloc(PAGE_SIZE, &kv_page, pp->pr_crange, &kd));
 }
 
 void
 pool_page_free(struct pool *pp, void *v)
 {
-   uvm_km_putpage(v);
+	km_free(v, PAGE_SIZE, &kv_page, pp->pr_crange);
 }
 
 void *
 pool_large_alloc(struct pool *pp, int flags, int *slowdown)
 {
-	int kfl = (flags & PR_WAITOK) ? 0 : UVM_KMF_NOWAIT;
-	vaddr_t va;
+	struct kmem_dyn_mode kd = KMEM_DYN_INITIALIZER;
+	void *v;
	int s;
 
+	kd.kd_waitok = (flags & PR_WAITOK);
+	kd.kd_slowdown = slowdown;
+
	s = splvm();
-	va = uvm_km_kmemalloc_pla(kmem_map, NULL, pp->pr_alloc->pa_pagesz, 0,
-	    kfl, pp->pr_crange->ucr_low, pp->pr_crange->ucr_high,
-	    0, 0, pp->pr_pa_nsegs);
+	v = km_alloc(pp->pr_alloc->pa_pagesz, &kv_intrsafe, pp->pr_crange,
+	    &kd);
splx(s);
 
-   return ((void *)va);
+   return (v);
 }
 
 void
@@ -1529,23 +1523,23 @@
int s;
 
s = splvm();
-	uvm_km_free(kmem_map, (vaddr_t)v, pp->pr_alloc->pa_pagesz);
+	km_free(v, pp->pr_alloc->pa_pagesz, &kv_intrsafe, pp->pr_crange);
splx(s);
 }
 
 void *
 pool_large_alloc_ni(struct pool *pp, int flags, int *slowdown)
 {
-	int kfl = (flags & PR_WAITOK) ? 0 : UVM_KMF_NOWAIT;
+	struct kmem_dyn_mode kd = KMEM_DYN_INITIALIZER;
+
+	kd.kd_waitok = (flags & PR_WAITOK);
+	kd.kd_slowdown = slowdown;
 
-	return ((void *)uvm_km_kmemalloc_pla(kernel_map, uvm.kernel_object,
-	    pp->pr_alloc->pa_pagesz, 0, kfl,
-	    pp->pr_crange->ucr_low, pp->pr_crange->ucr_high,
-	    0, 0, pp->pr_pa_nsegs));
+	return (km_alloc(pp->pr_alloc->pa_pagesz, &kv_any, pp->pr_crange, &kd));
 }
 
 void
 pool_large_free_ni(struct pool *pp, void *v)
 {
-	uvm_km_free(kernel_map, (vaddr_t)v, pp->pr_alloc->pa_pagesz);
+	km_free(v, pp->pr_alloc->pa_pagesz, &kv_any, pp->pr_crange);
 }
Index: kern/uipc_mbuf.c
===
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.149
diff -u -r1.149 uipc_mbuf.c
--- kern/uipc_mbuf.c29 Jan 2011 13:15:39 -  1.149
+++ kern/uipc_mbuf.c4 Apr 2011 21:30:59 -
@@ -136,7 +136,7 @@
int i;
 
	pool_init(&mbpool, MSIZE, 0, 0, 0, "mbpl", NULL);
-	pool_set_constraints(&mbpool, &dma_constraint, 1);
+	pool_set_constraints(&mbpool, &kp_dma);

Fix physio on bigmem

2011-04-01 Thread Artur Grabowski
There were two problems with vslock_device functions that are
used for magic page flipping for physio and bigmem.

 - Fix error handling so that we free stuff on error.
 - We use the mappings to keep track of which pages need to be
   freed so don't unmap before freeing (this is theoretically
   incorrect and will be fixed soon).

This makes fsck happy on bigmem machines (it doesn't leak all
DMA-able memory anymore).

Index: uvm/uvm_glue.c
===
RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
retrieving revision 1.55
diff -u -r1.55 uvm_glue.c
--- uvm/uvm_glue.c  2 Jul 2010 22:38:32 -   1.55
+++ uvm/uvm_glue.c  1 Apr 2011 15:08:40 -
@@ -222,8 +222,10 @@
paddr_t pa;
 
		if (!pmap_extract(p->p_vmspace->vm_map.pmap,
-		    start + ptoa(i), &pa))
-			return (EFAULT);
+		    start + ptoa(i), &pa)) {
+   error = EFAULT;
+   goto out_unwire;
+   }
if (!PADDR_IS_DMA_REACHABLE(pa))
break;
}
@@ -233,13 +235,15 @@
}
 
if ((va = uvm_km_valloc(kernel_map, sz)) == 0) {
-   return (ENOMEM);
+   error = ENOMEM;
+   goto out_unwire;
}
 
	TAILQ_INIT(&pgl);
	error = uvm_pglistalloc(npages * PAGE_SIZE, dma_constraint.ucr_low,
	    dma_constraint.ucr_high, 0, 0, &pgl, npages, UVM_PLA_WAITOK);
-   KASSERT(error == 0);
+   if (error)
+   goto out_unmap;
 
sva = va;
	while ((pg = TAILQ_FIRST(&pgl)) != NULL) {
@@ -252,7 +256,16 @@
KASSERT(va == sva + sz);
*retp = (void *)(sva + off);
 
-   error = copyin(addr, *retp, len);   
+   if ((error = copyin(addr, *retp, len)) == 0)
+   return 0;
+
+   uvm_km_pgremove_intrsafe(sva, sva + sz);
+   pmap_kremove(sva, sz);
+   pmap_update(pmap_kernel());
+out_unmap:
+   uvm_km_free(kernel_map, sva, sz);
+out_unwire:
+	uvm_fault_unwire(&p->p_vmspace->vm_map, start, end);
return (error);
 }
 
@@ -277,9 +290,9 @@
return;
 
kva = trunc_page((vaddr_t)map);
+   uvm_km_pgremove_intrsafe(kva, kva + sz);
pmap_kremove(kva, sz);
pmap_update(pmap_kernel());
-   uvm_km_pgremove_intrsafe(kva, kva + sz);
uvm_km_free(kernel_map, kva, sz);
 }



Re: UBC?

2010-02-01 Thread Artur Grabowski
Ariane van der Steldt <ari...@stack.nl> writes:

> Why are the pventries allocated from the kmem_map anyway? I think they
> should be allocated using the uvm_km_getpage instead. Or even better,
> from a pvpool like amd64.

Recursion.

The caller holds a lock on kernel_map. The getpage pool is empty, so the
caller wakes up the getpage thread and goes to sleep (still holding the
kernel_map lock); the getpage thread wakes up and deadlocks on kernel_map. It's
not an easily detectable recursion either, so we don't panic when it
happens, we just hang.

amd64 can avoid it thanks to the direct map (no kernel_map involved when
calling getpage).

We could try some magic with allocating from a pool with NOWAIT and
then fall back to kmem_map when that fails, but the logic would become
hairy. Maybe a pool allocator with those semantics?
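
A rough sketch of what such a pool allocator could look like, to make the
trade-off concrete; this is not committed code, pv_page_alloc()/pv_page_free()
are made-up names, the uvm_km_* signatures are quoted from memory, and the
free side is exactly where the hairy part lives:

void *
pv_page_alloc(struct pool *pp, int flags, int *slowdown)
{
	void *v;

	/* Fast path: never sleeps and never wakes the getpage thread,
	 * so it cannot recurse on a held kernel_map lock. */
	v = uvm_km_getpage(0, slowdown);
	if (v != NULL)
		return (v);

	/* Free list empty: fall back to the kmem_map-backed path the
	 * pmap uses today. */
	return ((void *)uvm_km_kmemalloc(kmem_map, NULL, PAGE_SIZE,
	    (flags & PR_WAITOK) ? 0 : UVM_KMF_NOWAIT));
}

void
pv_page_free(struct pool *pp, void *v)
{
	/* Freeing has to remember which of the two allocators the page
	 * came from and return it there; that bookkeeping is the hairy
	 * logic referred to above. */
}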

//art