Hello,

I'm trying to develop a gntdev for NetBSD. I posted a first version of the device some time ago, but it had problems (mainly, it was unable to work with HVM domains).

Maybe it would be good to start with a short introduction to the gnt device. It is used by Xen userspace programs to map memory from other domains (it does not allow sharing memory from the current domain, only mapping memory from other domains that have previously allowed it). It is mainly used to run device backends in userspace.

The device works on a memory region previously allocated with mmap(NULL, size, prot, MAP_ANON | MAP_SHARED, -1, 0). This region is passed to the device using an ioctl, and then inside the device we get the mfns of the ptes of the allocated memory region and pass them to Xen, so the hypervisor can modify the ptes to point to the right mfns from the other domain.

So far I've been able to get the ptes, pass them to Xen and establish the mapping. Writing to that memory area from userspace seems to work fine (using pread), but the problem comes when the userspace program executes something like:

pwrite(fd, buf, ...)

where "buf" is a region of the memory mapped by the gnt device. This triggers a page fault in UVM, and the fault handler tries to modify the pte of the mapped memory region. This pte must not be modified: if we change its content, Xen will probably complain and crash, and even if Xen doesn't crash we won't be able to unmap the pte later on, since it no longer contains the value that Xen expects.

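To make the userspace side of this concrete, here is a minimal sketch of how a program might prepare the region and the ioctl argument. The two struct definitions are copied from the xenio.h part of the patch below; the helper names gnt_alloc_region and gnt_fill_map_request are made up for illustration and are not part of the patch or of any Xen library. The sketch deliberately stops before opening /dev/gntdev or issuing the ioctl.

```c
#define _DEFAULT_SOURCE 1	/* only so glibc exposes MAP_ANON in strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copied from the xenio.h hunk of the patch. */
typedef struct ioctl_gntdev_grant_ref {
	uint32_t domid;		/* domain ID of the grant to be mapped */
	uint32_t ref;		/* grant reference to be mapped */
} ioctl_gntdev_grant_ref;

typedef struct ioctl_gntdev_map_grant_ref {
	uint32_t count;		/* IN: number of grants to be mapped */
	uint32_t pad;
	uint64_t vaddr;		/* IN: start of the region to map into */
	uint64_t index;		/* OUT: filled in by the device */
	ioctl_gntdev_grant_ref *refs;	/* IN: array of @count entries */
} ioctl_gntdev_map_grant_ref;

/*
 * Hypothetical helper: allocate `count` pages with the same mmap() flags
 * described above. Returns NULL on failure.
 */
static void *
gnt_alloc_region(uint32_t count, int prot)
{
	size_t size = (size_t)count * (size_t)sysconf(_SC_PAGESIZE);
	void *va = mmap(NULL, size, prot, MAP_ANON | MAP_SHARED, -1, 0);

	return va == MAP_FAILED ? NULL : va;
}

/*
 * Hypothetical helper: fill in the ioctl argument for a region returned
 * by gnt_alloc_region(). The caller owns `refs` and must keep it alive
 * until the ioctl returns, because the device copies it in entry by entry.
 */
static void
gnt_fill_map_request(ioctl_gntdev_map_grant_ref *req, void *va,
    ioctl_gntdev_grant_ref *refs, uint32_t count)
{
	/* mmap() returns page-aligned addresses; check it anyway. */
	assert(((uintptr_t)va & ((uintptr_t)sysconf(_SC_PAGESIZE) - 1)) == 0);

	req->count = count;
	req->pad = 0;
	req->vaddr = (uint64_t)(uintptr_t)va;
	req->index = 0;
	req->refs = refs;
}
```

With the patch applied, a caller would then open /dev/gntdev and pass the filled request to ioctl(fd, IOCTL_GNTDEV_MAP_GRANT_REF, &req). The failing case described above is a later pwrite(fd2, va, ...) that uses the mapped region as its source buffer.
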
I've added a little hack to my gnt device to find out who is trying to change the content of the pte, and I got the following trace:

breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
xpq_flush_queue() at netbsd:xpq_flush_queue+0x180
pmap_enter_ma() at netbsd:pmap_enter_ma+0x5c1
pmap_enter() at netbsd:pmap_enter+0x35
uvm_fault_upper_enter.clone.4() at netbsd:uvm_fault_upper_enter.clone.4+0x22a
uvm_fault_internal() at netbsd:uvm_fault_internal+0x28f4
uvm_fault_wire() at netbsd:uvm_fault_wire+0x53
genfs_directio() at netbsd:genfs_directio+0x16a
ffs_write() at netbsd:ffs_write+0x43a
VOP_WRITE() at netbsd:VOP_WRITE+0x55
vn_write() at netbsd:vn_write+0xf9
do_filewritev() at netbsd:do_filewritev+0x1fd
sys_pwritev() at netbsd:sys_pwritev+0x2b
syscall() at netbsd:syscall+0x94
--- syscall (number 290) ---

Is there any way to prevent UVM from faulting? The address at that VA is already set AFAIK, but I know almost nothing about how UVM works, so I would like to ask if someone could help me with that.

I'm attaching the code of the gntdev. The main function that contains interesting code is gntmap_grant_ref; that's where I try to get the ptes and set up the mapping. This is not finished code, but I would like to understand why these page faults happen and how I can solve this problem.

Thanks, Roger.

From 1e6fb3749453810be6bec2e14e8f8abd371e9a6f Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger....@citrix.com>
Date: Tue, 8 Jan 2013 19:42:29 +0100
Subject: [PATCH] xen: add gntdev

This is a basic (and experimental) gntdev implementation for NetBSD.

The gnt device allows usermode applications to map grant references in
userspace. It is mainly used by Qemu to implement a Xen backend (that
runs in userspace).

Due to the fact that qemu-upstream is not yet functional in NetBSD,
the only way to try this gntdev is to use the old qemu
(qemu-traditional).

This device allows to map memory from guests domains that request it,
but it doesn't allow to map memory from the current domain to another
one.
---
 etc/MAKEDEV.tmpl                   |    5 +
 etc/etc.amd64/MAKEDEV.conf         |    2 +-
 etc/etc.i386/MAKEDEV.conf          |    2 +-
 sys/arch/amd64/conf/XEN3_DOM0      |    1 +
 sys/arch/amd64/conf/majors.amd64   |    1 +
 sys/arch/i386/conf/XEN3_DOM0       |    1 +
 sys/arch/i386/conf/majors.i386     |    1 +
 sys/arch/xen/conf/files.xen        |    2 +
 sys/arch/xen/include/xen_shm.h     |    3 +
 sys/arch/xen/include/xenio.h       |   76 ++++++
 sys/arch/xen/x86/x86_xpmap.c       |   24 ++
 sys/arch/xen/x86/xen_shm_machdep.c |   70 +++++-
 sys/arch/xen/xen/gntdev.c          |  492 ++++++++++++++++++++++++++++++++++++
 sys/dev/DEVNAMES                   |    1 +
 sys/rump/librump/rumpkern/devsw.c  |    1 +
 15 files changed, 679 insertions(+), 3 deletions(-)
 create mode 100644 sys/arch/xen/xen/gntdev.c

diff --git a/etc/MAKEDEV.tmpl b/etc/MAKEDEV.tmpl
index 21b0568..00029c6 100644
--- a/etc/MAKEDEV.tmpl
+++ b/etc/MAKEDEV.tmpl
@@ -289,6 +289,7 @@
 # wsfont*	console font control
 # wsmux*	wscons event multiplexor
 # xenevt	Xen event interface
+# gntdev	Xen grant table interface
 #
 # iSCSI communication devices
 # iscsi*	iSCSI driver and /sbin/iscsid communication
@@ -1020,6 +1021,10 @@ xsd_kva)
 	mkdev xsd_kva c %xenevt_chr% 1
 	;;
 
+gntdev)
+	mkdev gntdev c %gntdev_chr% 0
+	;;
+
 xencons)
 	mkdev xencons c %xencons_chr% 0
 	;;
diff --git a/etc/etc.amd64/MAKEDEV.conf b/etc/etc.amd64/MAKEDEV.conf
index a4a831c..5e2098c 100644
--- a/etc/etc.amd64/MAKEDEV.conf
+++ b/etc/etc.amd64/MAKEDEV.conf
@@ -44,5 +44,5 @@ all_md)
 	;;
 
 xen)
-	makedev xenevt xencons xsd_kva
+	makedev xenevt xencons xsd_kva gntdev
 	;;
diff --git a/etc/etc.i386/MAKEDEV.conf b/etc/etc.i386/MAKEDEV.conf
index ba3e2cc..bd38673 100644
--- a/etc/etc.i386/MAKEDEV.conf
+++ b/etc/etc.i386/MAKEDEV.conf
@@ -48,7 +48,7 @@ all_md)
 	;;
 
 xen)
-	makedev xenevt xencons xsd_kva
+	makedev xenevt xencons xsd_kva gntdev
 	;;
 
 floppy)
diff --git a/sys/arch/amd64/conf/XEN3_DOM0 b/sys/arch/amd64/conf/XEN3_DOM0
index e5f9f1f..1807dd2 100644
--- a/sys/arch/amd64/conf/XEN3_DOM0
+++ b/sys/arch/amd64/conf/XEN3_DOM0
@@ -838,6 +838,7 @@ pseudo-device	wsfont
 pseudo-device	drvctl
 
 # xen pseudo-devices
+pseudo-device	gntdev
 pseudo-device	xenevt
 pseudo-device	xvif
 pseudo-device	xbdback
diff --git a/sys/arch/amd64/conf/majors.amd64 b/sys/arch/amd64/conf/majors.amd64
index 9e6b1ac..cf15f7d 100644
--- a/sys/arch/amd64/conf/majors.amd64
+++ b/sys/arch/amd64/conf/majors.amd64
@@ -96,6 +96,7 @@ device-major	nsmb		char 98			nsmb
 # - they appear in the i386 MAKEDEV
 #
 
+device-major	gntdev		char 140		gntdev
 device-major	xenevt		char 141		xenevt
 device-major	xbd		char 142 block 142	xbd
 device-major	xencons		char 143		xencons
diff --git a/sys/arch/i386/conf/XEN3_DOM0 b/sys/arch/i386/conf/XEN3_DOM0
index 8b5cf99..be28bbc 100644
--- a/sys/arch/i386/conf/XEN3_DOM0
+++ b/sys/arch/i386/conf/XEN3_DOM0
@@ -820,6 +820,7 @@ pseudo-device	wsfont
 pseudo-device	drvctl
 
 # xen pseudo-devices
+pseudo-device	gntdev
 pseudo-device	xenevt
 pseudo-device	xvif
 pseudo-device	xbdback
diff --git a/sys/arch/i386/conf/majors.i386 b/sys/arch/i386/conf/majors.i386
index 38c043f..9aab728 100644
--- a/sys/arch/i386/conf/majors.i386
+++ b/sys/arch/i386/conf/majors.i386
@@ -111,6 +111,7 @@ device-major	mt		char 107 block 24	mt
 # - they appear in the i386 MAKEDEV
 #
 
+device-major	gntdev		char 140		gntdev
 device-major	xenevt		char 141		xenevt
 device-major	xbd		char 142 block 142	xbd
 device-major	xencons		char 143		xencons
diff --git a/sys/arch/xen/conf/files.xen b/sys/arch/xen/conf/files.xen
index e022db5..91ff858 100644
--- a/sys/arch/xen/conf/files.xen
+++ b/sys/arch/xen/conf/files.xen
@@ -198,6 +198,7 @@ attach xencons at xendevbus
 file	arch/xen/xen/xencons.c		xencons needs-flag
 
 # Xen event peudo-device
+defpseudo gntdev
 defpseudo xenevt
 defpseudo xvif
 defpseudo xbdback
@@ -390,6 +391,7 @@ include "dev/pcmcia/files.pcmcia"
 # Domain-0 operations
 defflag	opt_xen.h			DOM0OPS
 file	arch/xen/xen/privcmd.c		dom0ops
+file	arch/xen/xen/gntdev.c		dom0ops
 file	arch/xen/x86/xen_shm_machdep.c	dom0ops
 file	arch/x86/pci/pci_machdep.c	hypervisor & pci & dom0ops
 file	arch/xen/xen/pci_intr_machdep.c	hypervisor & pci
diff --git a/sys/arch/xen/include/xen_shm.h b/sys/arch/xen/include/xen_shm.h
index e2d89d0..6416ca1 100644
--- a/sys/arch/xen/include/xen_shm.h
+++ b/sys/arch/xen/include/xen_shm.h
@@ -37,7 +37,10 @@
  */
 
 int xen_shm_map(int, int, grant_ref_t *, vaddr_t *, grant_handle_t *, int);
+int xen_shm_map_pte(int nentries, int *domid, grant_ref_t *grefp,
+    pt_entry_t **pte, grant_handle_t *handlep, int flags);
 void xen_shm_unmap(vaddr_t, int, grant_handle_t *);
+int xen_shm_unmap_pte(int, pt_entry_t **, grant_handle_t *);
 int xen_shm_callback(int (*)(void *), void *);
 
 /* flags for xen_shm_map() */
diff --git a/sys/arch/xen/include/xenio.h b/sys/arch/xen/include/xenio.h
index 6b25733..87cd376 100644
--- a/sys/arch/xen/include/xenio.h
+++ b/sys/arch/xen/include/xenio.h
@@ -122,4 +122,80 @@ typedef struct oprivcmd_hypercall
 /* EVTCHN_UNBIND: Unbind from the specified event-channel port. */
 #define EVTCHN_UNBIND _IOW('E', 3, unsigned long)
 
+/* Interface to /dev/gntdev */
+
+typedef struct ioctl_gntdev_grant_ref {
+	/* The domain ID of the grant to be mapped. */
+	uint32_t domid;
+	/* The grant reference of the grant to be mapped. */
+	uint32_t ref;
+} ioctl_gntdev_grant_ref;
+
+typedef struct ioctl_gntdev_map_grant_ref {
+	/* IN parameters */
+	/* The number of grants to be mapped. */
+	uint32_t count;
+	uint32_t pad;
+	uint64_t vaddr;
+	/* OUT parameters */
+	/* The offset to be used on a subsequent call to mmap(). */
+	uint64_t index;
+	/* Variable IN parameter. */
+	/* Array of grant references, of size @count. */
+	ioctl_gntdev_grant_ref *refs;
+} ioctl_gntdev_map_grant_ref;
+
+typedef struct ioctl_gntdev_unmap_grant_ref {
+	/* IN parameters */
+	/* The offset was returned by the corresponding map operation. */
+	uint64_t index;
+	/* The number of pages to be unmapped. */
+	uint32_t count;
+	uint32_t pad;
+} ioctl_gntdev_unmap_grant_ref;
+
+typedef struct ioctl_gntdev_get_offset_for_vaddr {
+	/* IN parameters */
+	/* The virtual address of the first mapped page in a range. */
+	uint64_t vaddr;
+	/* OUT parameters */
+	/* The offset that was used in the initial mmap() operation. */
+	uint64_t offset;
+	/* The number of pages mapped in the VM area that begins at @vaddr. */
+	uint32_t count;
+	uint32_t pad;
+} ioctl_gntdev_get_offset_for_vaddr;
+
+/*
+ * Inserts the grant references into the mapping table of an instance
+ * of gntdev. N.B. This does not perform the mapping, which is deferred
+ * until mmap() is called with @index as the offset.
+ */
+#define IOCTL_GNTDEV_MAP_GRANT_REF \
+	_IOWR('G', 0, ioctl_gntdev_map_grant_ref)
+
+/*
+ * Removes the grant references from the mapping table of an instance of
+ * of gntdev. N.B. munmap() must be called on the relevant virtual address(es)
+ * before this ioctl is called, or an error will result.
+ */
+#define IOCTL_GNTDEV_UNMAP_GRANT_REF \
+	_IOW('G', 1, ioctl_gntdev_unmap_grant_ref)
+
+/*
+ * Returns the offset in the driver's address space that corresponds
+ * to @vaddr. This can be used to perform a munmap(), followed by an
+ * UNMAP_GRANT_REF ioctl, where no state about the offset is retained by
+ * the caller. The number of pages that were allocated at the same time as
+ * @vaddr is returned in @count.
+ *
+ * N.B. Where more than one page has been mapped into a contiguous range, the
+ * supplied @vaddr must correspond to the start of the range; otherwise
+ * an error will result. It is only possible to munmap() the entire
+ * contiguously-allocated range at once, and not any subrange thereof.
+ */
+#define IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR \
+	_IOWR('G', 2, ioctl_gntdev_get_offset_for_vaddr)
+
+
 #endif /* __XEN_XENIO_H__ */
diff --git a/sys/arch/xen/x86/x86_xpmap.c b/sys/arch/xen/x86/x86_xpmap.c
index ebb0567..2c71a8a 100644
--- a/sys/arch/xen/x86/x86_xpmap.c
+++ b/sys/arch/xen/x86/x86_xpmap.c
@@ -173,6 +173,9 @@ void xpq_debug_dump(void);
 static mmu_update_t xpq_queue_array[MAXCPUS][XPQUEUE_SIZE];
 static int xpq_idx_array[MAXCPUS];
 
+paddr_t grant_pte[XPQUEUE_SIZE];
+static int initialized = 0;
+
 #ifdef i386
 extern union descriptor tmpgdt[];
 #endif /* i386 */
@@ -180,6 +183,7 @@ void
 xpq_flush_queue(void)
 {
 	int i, ok = 0, ret;
+	int j;
 
 	mmu_update_t *xpq_queue = xpq_queue_array[curcpu()->ci_cpuid];
 	int xpq_idx = xpq_idx_array[curcpu()->ci_cpuid];
@@ -189,6 +193,26 @@ xpq_flush_queue(void)
 		XENPRINTK2(("%d: 0x%08" PRIx64 " 0x%08" PRIx64 "\n", i,
 		    xpq_queue[i].ptr, xpq_queue[i].val));
 
+	if (initialized == 0) {
+		memset(grant_pte, 0, sizeof(grant_pte[0]) * XPQUEUE_SIZE);
+		initialized = 1;
+	}
+
+	/* XXX: This is the other part of the lame hack,
+	 * Ptes that hold references to grant frames should not
+	 * be modified, or we will not be able to unmap them!
+	 */
+	for (i = 0; i < 2048; i++) {
+		if (grant_pte[i] == 0)
+			continue;
+		for(j = 0; j < xpq_idx; j++) {
+			if (xpq_queue[j].ptr == grant_pte[i]) {
+				panic("bang: %p -> %p", (void *) xpq_queue[j].ptr,
+				    (void *) xpq_queue[j].val);
+			}
+		}
+	}
+
 retry:
 	ret = HYPERVISOR_mmu_update_self(xpq_queue, xpq_idx, &ok);
 
diff --git a/sys/arch/xen/x86/xen_shm_machdep.c b/sys/arch/xen/x86/xen_shm_machdep.c
index d47745c..ba99b7c 100644
--- a/sys/arch/xen/x86/xen_shm_machdep.c
+++ b/sys/arch/xen/x86/xen_shm_machdep.c
@@ -35,6 +35,7 @@ __KERNEL_RCSID(0, "$NetBSD: xen_shm_machdep.c,v 1.10 2011/09/02 22:25:08 dyoung
 #include <sys/queue.h>
 #include <sys/vmem.h>
 #include <sys/kernel.h>
+#include <sys/malloc.h>
 #include <uvm/uvm.h>
 
 #include <machine/pmap.h>
@@ -116,7 +117,6 @@ xen_shm_init(void)
 	}
 }
 
-int
 xen_shm_map(int nentries, int domid, grant_ref_t *grefp, vaddr_t *vap,
     grant_handle_t *handlep, int flags)
 {
@@ -185,6 +185,74 @@ xen_shm_map(int nentries, int domid, grant_ref_t *grefp, vaddr_t *vap,
 	return 0;
 }
 
+int
+xen_shm_map_pte(int nentries, int *domid, grant_ref_t *grefp,
+    pt_entry_t **pte, grant_handle_t *handlep, int flags)
+{
+	int i;
+	int err;
+	gnttab_map_grant_ref_t op[XENSHM_MAX_PAGES_PER_REQUEST];
+
+#ifdef DIAGNOSTIC
+	if (nentries > XENSHM_MAX_PAGES_PER_REQUEST) {
+		printf("xen_shm_map_pte: %d entries\n", nentries);
+		panic("xen_shm_map_pte");
+	}
+#endif
+
+	for (i = 0; i < nentries; i++) {
+		op[i].host_addr = xpmap_ptetomach(pte[i]);
+		op[i].dom = domid[i];
+		op[i].ref = grefp[i];
+		op[i].flags = GNTMAP_host_map | GNTMAP_contains_pte |
+		    GNTMAP_application_map |
+		    ((flags & XSHM_RO) ? GNTMAP_readonly : 0);
+	}
+	err = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nentries);
+	if (__predict_false(err))
+		panic("xen_shm_map_pte: HYPERVISOR_grant_table_op failed");
+	for (i = 0; i < nentries; i++) {
+		if (__predict_false(op[i].status)) {
+			/* On error, unmap mapped grefs and return */
+			xen_shm_unmap_pte(i, pte, handlep);
+			return op[i].status;
+		}
+		handlep[i] = op[i].handle;
+	}
+	return 0;
+}
+
+int
+xen_shm_unmap_pte(int nentries, pt_entry_t **pte, grant_handle_t *handlep)
+{
+	gnttab_unmap_grant_ref_t op[XENSHM_MAX_PAGES_PER_REQUEST];
+	int ret;
+	int i;
+
+#ifdef DIAGNOSTIC
+	if (nentries > XENSHM_MAX_PAGES_PER_REQUEST) {
+		printf("xen_shm_unmap_pte: %d entries\n", nentries);
+		panic("xen_shm_unmap_pte");
+	}
+#endif
+
+	for (i = 0; i < nentries; i++) {
+		op[i].host_addr = xpmap_ptetomach(pte[i]);
+		op[i].dev_bus_addr = 0;
+		op[i].handle = handlep[i];
+	}
+	ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+	    op, nentries);
+	if (__predict_false(ret))
+		panic("xen_shm_unmap_pte: unmap failed");
+	for (i = 0; i < nentries; i++) {
+		if(__predict_false(op[i].status)) {
+			return op[i].status;
+		}
+	}
+	return 0;
+}
+
 void
 xen_shm_unmap(vaddr_t va, int nentries, grant_handle_t *handlep)
 {
diff --git a/sys/arch/xen/xen/gntdev.c b/sys/arch/xen/xen/gntdev.c
new file mode 100644
index 0000000..5ac5098
--- /dev/null
+++ b/sys/arch/xen/xen/gntdev.c
@@ -0,0 +1,492 @@
+/*
+ * Copyright (c) 2012 Roger Pau Monné.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+#include <sys/cdefs.h>
+
+#include "opt_xen.h"
+
+#include <sys/param.h>
+#include <sys/malloc.h>
+#include <sys/mutex.h>
+#include <sys/file.h>
+#include <sys/filedesc.h>
+#include <sys/conf.h>
+
+#include <uvm/uvm.h>
+
+#include <xen/xen_shm.h>
+#include <xen/xenio.h>
+
+extern paddr_t grant_pte[2048];
+
+void gntdevattach(int n);
+
+#define freem(va) \
+	if (va) free(va, M_DEVBUF)
+
+#define GNTDEBUG
+#ifdef GNTDEBUG
+    #define debug(M, ...) \
+        printk("gntdev:%d: " M "\n", __LINE__, ##__VA_ARGS__)
+#else
+    #define debug(M, ...)
+#endif
+
+#define error(M, ...) \
+	printk("gntdev:%d error:" M "\n", __LINE__, ##__VA_ARGS__)
+
+#define VA_FREE 0
+
+static int gntdev_fioctl(struct file *, u_long, void *);
+static int gntdev_fclose(struct file *);
+
+static const struct fileops gntdev_fileops = {
+	.fo_read = fbadop_read,
+	.fo_write = fbadop_write,
+	.fo_ioctl = gntdev_fioctl,
+	.fo_fcntl = fnullop_fcntl,
+	.fo_poll = fnullop_poll,
+	.fo_stat = fbadop_stat,
+	.fo_close = gntdev_fclose,
+	.fo_kqfilter = fnullop_kqfilter,
+	.fo_restart = fnullop_restart,
+};
+
+dev_type_open(gntdev_open);
+
+const struct cdevsw gntdev_cdevsw = {
+	gntdev_open, nullclose, noread, nowrite, noioctl,
+	nostop, notty, nopoll, nommap, nokqfilter, D_OTHER
+};
+
+struct gntmap {
+	struct uvm_object uobj;
+	struct vm_map *vmap;
+	LIST_ENTRY(gntmap) next_map;
+	int index;
+	int count;
+	grant_ref_t *grants;
+	int *domids;
+	vaddr_t va;
+	grant_handle_t *handles;
+	pd_entry_t **pte;
+	bool ro;
+	bool mapped;
+};
+
+struct gntproc {
+	LIST_HEAD(,gntmap) maps;
+	kmutex_t lock;
+	struct lwp *lwp;
+	unsigned int num_maps;
+};
+
+static void
+gntdev_insert_map(struct gntproc *proc, struct gntmap *map);
+static struct gntmap *
+gntdev_find_map(struct gntproc *proc, int index, int count);
+static struct gntmap *
+gntdev_find_vaddr(struct gntproc *proc, vaddr_t va);
+static void
+gntdev_remove_map(struct gntproc *proc, struct gntmap *map);
+
+/* --- Helpers --- */
+
+static void
+gntdev_insert_map(struct gntproc *proc, struct gntmap *map)
+{
+	struct gntmap *tmap;
+
+	mutex_enter(&proc->lock);
+	proc->num_maps++;
+	if (LIST_EMPTY(&proc->maps)) {
+		LIST_INSERT_HEAD(&proc->maps, map, next_map);
+		goto out;
+	}
+	LIST_FOREACH(tmap, &proc->maps, next_map) {
+		if (map->index + map->count < tmap->index) {
+			LIST_INSERT_BEFORE(tmap, map, next_map);
+			goto out;
+		}
+		map->index = tmap->index + tmap->count;
+		if (LIST_NEXT(tmap, next_map) == NULL) {
+			LIST_INSERT_AFTER(tmap, map, next_map);
+			goto out;
+		}
+	}
+
+out:
+	mutex_exit(&proc->lock);
+	return;
+}
+
+static struct gntmap *
+gntdev_find_map(struct gntproc *proc, int index, int count)
+{
+	struct gntmap *map = NULL;
+
+	mutex_enter(&proc->lock);
+	if (LIST_EMPTY(&proc->maps))
+		goto out;
+
+	LIST_FOREACH(map, &proc->maps, next_map) {
+		if (index != map->index) {
+			continue;
+		}
+		if (count && count != map->count) {
+			continue;
+		}
+		goto out;
+	}
+	map = NULL;
+
+out:
+	mutex_exit(&proc->lock);
+	return map;
+}
+
+static struct gntmap *
+gntdev_find_vaddr(struct gntproc *proc, vaddr_t va)
+{
+	struct gntmap *map = NULL;
+
+	mutex_enter(&proc->lock);
+	if (LIST_EMPTY(&proc->maps))
+		goto out;
+
+	LIST_FOREACH(map, &proc->maps, next_map) {
+		if (va >= map->va && va < (map->va + (map->count * PAGE_SIZE)))
+			goto out;
+	}
+	map = NULL;
+
+out:
+	mutex_exit(&proc->lock);
+	return map;
+}
+
+static void
+gntdev_remove_map(struct gntproc *proc, struct gntmap *map)
+{
+	int i, j;
+
+	mutex_enter(&proc->lock);
+	LIST_REMOVE(map, next_map);
+	proc->num_maps--;
+	mutex_exit(&proc->lock);
+	if (map->mapped) {
+		debug("unmapping map at index: %d", map->index);
+		if (xen_shm_unmap_pte(map->count, map->pte, map->handles)) {
+			error("unable to unmap grant references for index %d", map->index);
+		}
+		/* XXX: Since we have unmapped the grants, remove the protection */
+		for (i = 0; i < map->count; i++) {
+			for (j = 0; j < 2048; j++) {
+				if (grant_pte[j] == xpmap_ptetomach(map->pte[i])) {
+					grant_pte[j] = 0;
+					break;
+				}
+			}
+		}
+		map->mapped = false;
+	}
+	free(map->grants, M_DEVBUF);
+	free(map->handles, M_DEVBUF);
+	free(map->domids, M_DEVBUF);
+	free(map->pte, M_DEVBUF);
+	free(map, M_DEVBUF);
+}
+
+static int
+gntmap_grant_ref(struct gntmap *map)
+{
+	int i, j, rc;
+	pt_entry_t *ptep;
+	pd_entry_t *ptes;
+	pd_entry_t * const *pdes;
+	pmap_t pmap = vm_map_pmap(map->vmap);
+	struct pmap *pmap2;
+
+	memset(map->handles, -1, sizeof(map->handles[0]) * map->count);
+
+	/* Lock pmap for the operation */
+	kpreempt_disable();
+	pmap_map_ptes(pmap, &pmap2, &ptes, &pdes);
+	for (i = 0; i < map->count; i++) {
+		/* Get ptes to pass to the grant table operation */
+		ptep = &ptes[pl1_i(map->va + (i * PAGE_SIZE))];
+		if (!pmap_valid_entry(*ptep)) {
+			error("pte at %p not valid", ptep);
+			rc = EINVAL;
+			goto out;
+		}
+		map->pte[i] = ptep;
+	}
+
+	rc = xen_shm_map_pte(map->count, map->domids, map->grants, map->pte,
+	    map->handles, map->ro ? XSHM_RO : 0);
+	if (rc) {
+		error("unable to map ptes");
+		goto out;
+	}
+
+	/* XXX: This is a lame debug hack to check if someone (UVM)
+	 * is modifying those ptes behind our back.
+	 *
+	 * Ptes used to map grant refs should not be modified, or we will
+	 * not be able to unmap them!
+	 */
+	for (i = 0; i < map->count; i++) {
+		debug("VA: %p *pte: %p pte: %p *pte maddr: %p",
+		    map->va + (i * PAGE_SIZE), map->pte[i], *(map->pte[i]),
+		    xpmap_ptetomach(map->pte[i]));
+		for (j = 0; j < 2048; j++) {
+			if (grant_pte[j] == 0) {
+				grant_pte[j] = xpmap_ptetomach(map->pte[i]);
+				break;
+			}
+		}
+	}
+
+	rc = 0;
+out:
+	pmap_unmap_ptes(pmap, pmap2);
+	kpreempt_enable();
+	return rc;
+}
+
+/* --- ioctl handlers --- */
+
+static int
+gntdev_ioctl_map_grant_ref(struct gntproc *proc,
+    ioctl_gntdev_map_grant_ref *map_grants)
+{
+	grant_ref_t *refs = NULL;
+	grant_handle_t *handles = NULL;
+	int *domids = NULL;
+	pt_entry_t **pte = NULL;
+	struct gntmap *map = NULL;
+	struct vm_map *vmm;
+	ioctl_gntdev_grant_ref ioctl_map;
+	int i, rc;
+	vaddr_t va0;
+
+	if (gntdev_find_vaddr(proc, map_grants->vaddr)) {
+		error("memory area %p already in use", (void *) map_grants->vaddr);
+		rc = EINVAL;
+		goto error;
+	}
+
+	debug("mapping %d refs", map_grants->count);
+
+	refs = malloc(sizeof(*refs) * map_grants->count, M_DEVBUF,
+	    M_WAITOK | M_ZERO);
+	handles = malloc(sizeof(*handles) * map_grants->count, M_DEVBUF,
+	    M_WAITOK | M_ZERO);
+	domids = malloc(sizeof(*domids) * map_grants->count, M_DEVBUF,
+	    M_WAITOK | M_ZERO);
+	pte = malloc(sizeof(*pte) * map_grants->count, M_DEVBUF,
+	    M_WAITOK | M_ZERO);
+
+	for (i = 0; i < map_grants->count; i++) {
+		rc = copyin(&map_grants->refs[i], &ioctl_map, sizeof(ioctl_map));
+		if (rc != 0) {
+			error("unable to copyin grant ref info %d", i);
+			goto error;
+		}
+		debug("mapping ref: %u Dom: %u", ioctl_map.ref, ioctl_map.domid);
+		refs[i] = ioctl_map.ref;
+		domids[i] = ioctl_map.domid;
+	}
+	map = malloc(sizeof(*map), M_DEVBUF,
+	    M_WAITOK | M_ZERO);
+	vmm = &proc->lwp->l_proc->p_vmspace->vm_map;
+	va0 = map_grants->vaddr & ~PAGE_MASK;
+	vm_map_lock_read(vmm);
+	if (uvm_map_checkprot(vmm, va0, va0 + (map_grants->count << PGSHIFT) - 1,
+	    VM_PROT_WRITE)) {
+		map->ro = false;
+	} else if (uvm_map_checkprot(vmm, va0,
+	    va0 + (map_grants->count << PGSHIFT) - 1, VM_PROT_READ)) {
+		map->ro = true;
+	} else {
+		error("unable check protection");
+		rc = EINVAL;
+		vm_map_unlock_read(vmm);
+		goto error;
+	}
+	vm_map_unlock_read(vmm);
+	map->grants = refs;
+	map->handles = handles;
+	map->pte = pte;
+	map->domids = domids;
+	map->va = map_grants->vaddr;
+	map->count = map_grants->count;
+	map->vmap = vmm;
+	map->index = 0;
+	map->mapped = false;
+
+	rc = gntmap_grant_ref(map);
+	if (rc) {
+		error("map_grant_ref failed");
+		goto error;
+	}
+	map->mapped = true;
+	gntdev_insert_map(proc, map);
+	map_grants->index = map->index << PAGE_SHIFT;
+	debug("gntrefs mapped at index %" PRIu64 "", map->index);
+	return 0;
+
+error:
+	freem(refs);
+	freem(handles);
+	freem(pte);
+	freem(domids);
+	freem(map);
+	error("unable to map grant refs");
+	return rc;
+}
+
+static int
+gntdev_ioctl_unmap_grant_ref(struct gntproc *proc,
+    ioctl_gntdev_unmap_grant_ref *unmap_grants)
+{
+	struct gntmap *map;
+	uint64_t index = unmap_grants->index >> PAGE_SHIFT;
+	int rc = 0;
+
+	map = gntdev_find_map(proc, index, unmap_grants->count);
+	if (map == NULL) {
+		error("unable to find index %" PRIu64, index);
+		rc = EINVAL;
+		goto out;
+	}
+	gntdev_remove_map(proc, map);
+out:
+	return rc;
+}
+
+static int
+gntdev_ioctl_get_offset_vaddr(struct gntproc *proc,
+    ioctl_gntdev_get_offset_for_vaddr *offset_vaddr)
+{
+	struct gntmap *map;
+	int rc = 0;
+
+	debug("find offset va: %p", (void *)offset_vaddr->vaddr);
+
+	map = gntdev_find_vaddr(proc, offset_vaddr->vaddr);
+	if (map == NULL) {
+		error("unable to find vaddr");
+		rc = EINVAL;
+		goto out;
+	}
+
+	offset_vaddr->offset = map->index << PAGE_SHIFT;
+	offset_vaddr->count = map->count;
+
+out:
+	return rc;
+}
+
+/* --- Device ops handlers --- */
+
+static int
+gntdev_fioctl(struct file *fp, u_long cmd, void *addr)
+{
+	struct gntproc *proc = fp->f_data;
+	ioctl_gntdev_map_grant_ref *map_grants;
+	ioctl_gntdev_unmap_grant_ref *unmap_grants;
+	ioctl_gntdev_get_offset_for_vaddr *offset_vaddr;
+	int rc;
+
+	switch (cmd) {
+	case IOCTL_GNTDEV_MAP_GRANT_REF:
+		map_grants = addr;
+		rc = gntdev_ioctl_map_grant_ref(proc, map_grants);
+		break;
+	case IOCTL_GNTDEV_UNMAP_GRANT_REF:
+		unmap_grants = addr;
+		rc = gntdev_ioctl_unmap_grant_ref(proc, unmap_grants);
+		break;
+	case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
+		offset_vaddr = addr;
+		rc = gntdev_ioctl_get_offset_vaddr(proc, offset_vaddr);
+		break;
+	default:
+		error("unknown ioctl 0x%08lu", cmd);
+		rc = EINVAL;
+	}
+	return rc;
+}
+
+int
+gntdev_open(dev_t dev, int flags, int mode, struct lwp *l)
+{
+	struct gntproc *proc;
+	struct file *fp;
+	int fd, rc;
+
+	rc = fd_allocfile(&fp, &fd);
+	if (rc)
+		return rc;
+
+	proc = malloc(sizeof(*proc), M_DEVBUF, M_WAITOK | M_ZERO);
+	mutex_init(&proc->lock, MUTEX_DEFAULT, IPL_NONE);
+	LIST_INIT(&proc->maps);
+	proc->lwp = l;
+	proc->num_maps = 0;
+	debug("opened for proc %p", l);
+	return fd_clone(fp, fd, flags, &gntdev_fileops, proc);
+}
+
+static int
+gntdev_fclose(struct file *fp)
+{
+	struct gntproc *proc = fp->f_data;
+	struct gntmap *map;
+
+	mutex_enter(&proc->lock);
+	while (LIST_FIRST(&proc->maps) != NULL) {
+		map = LIST_FIRST(&proc->maps);
+		mutex_exit(&proc->lock);
+		gntdev_remove_map(proc, map);
+		mutex_enter(&proc->lock);
+	}
+	KASSERT(proc->num_maps == 0);
+	mutex_exit(&proc->lock);
+	mutex_destroy(&proc->lock);
+	debug("closed device for proc %p", proc->lwp);
+	free(proc, M_DEVBUF);
+	return 0;
+}
+
+void
+gntdevattach(int n)
+{
+	debug("attached");
+	return;
+}
diff --git a/sys/dev/DEVNAMES b/sys/dev/DEVNAMES
index 45cf018..765fe45 100644
--- a/sys/dev/DEVNAMES
+++ b/sys/dev/DEVNAMES
@@ -1517,6 +1517,7 @@ xdc			MI
 xdc			sun3
 xe			next68k
 xel			x68k
+gntdev			xen
 xencons			xen
 xenevt			xen
 xennet			xen
diff --git a/sys/rump/librump/rumpkern/devsw.c b/sys/rump/librump/rumpkern/devsw.c
index 5a1af01..e513885 100644
--- a/sys/rump/librump/rumpkern/devsw.c
+++ b/sys/rump/librump/rumpkern/devsw.c
@@ -134,6 +134,7 @@ struct devsw_conv devsw_conv0[] = {
 	{ "rd",		22,	105,	DEVNODE_DONTBOTHER, 0, { 0, 0 }},
 	{ "ct",		23,	106,	DEVNODE_DONTBOTHER, 0, { 0, 0 }},
 	{ "mt",		24,	107,	DEVNODE_DONTBOTHER, 0, { 0, 0 }},
+	{ "gntdev",	-1,	140,	DEVNODE_DONTBOTHER, 0, { 0, 0 }},
 	{ "xenevt",	-1,	141,	DEVNODE_DONTBOTHER, 0, { 0, 0 }},
 	{ "xbd",	142,	142,	DEVNODE_DONTBOTHER, 0, { 0, 0 }},
 	{ "xencons",	-1,	143,	DEVNODE_DONTBOTHER, 0, { 0, 0 }},
-- 
1.7.7.5 (Apple Git-26)