[kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Hi all,

Just finished my prototype of inter-guest virtio, using networking as an example. Each guest mmaps the other's address space and uses a FIFO for notifications.

There are two issues with this approach. The first is that neither guest can change its mappings. See patch 1. The second is that our feature configuration is "host presents, guest chooses", which breaks down when we don't know the capabilities of each guest. In particular, TSO capability for networking.

There are three possible solutions:
1) Just offer the lowest common denominator to both sides (ie. no features). This is what I do with lguest in these patches.
2) Offer something and handle the case where one Guest accepts and another doesn't by emulating it. ie. de-TSO the packets manually.
3) Hot unplug the device from the guest which asks for the greater features, then re-add it offering less features. Requires hotplug in the guest OS.

I haven't tuned or even benchmarked these patches, but it pings!

Rusty.

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
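For option 2, the launcher would have to resegment oversized packets itself before handing them to the Guest that declined TSO. Below is a minimal sketch of only the splitting arithmetic. `struct segment` and `split_payload()` are made-up names for illustration, not part of lguest, and a real de-TSO path would also have to copy the headers onto each segment and fix up TCP sequence numbers, IP lengths and checksums.

```c
#include <assert.h>
#include <stddef.h>

/* One output segment: a byte range within the original payload.
 * (Hypothetical type, for illustration only.) */
struct segment {
	size_t offset;
	size_t len;
};

/* Split a payload of payload_len bytes into chunks of at most mss bytes.
 * Fills out[] and returns the number of segments written, or 0 if mss is
 * zero or out[] (max_out entries) is too small to hold them all. */
static size_t split_payload(size_t payload_len, size_t mss,
			    struct segment *out, size_t max_out)
{
	size_t n = 0, off = 0;

	assert(out != NULL);
	if (mss == 0)
		return 0;

	while (off < payload_len) {
		size_t left = payload_len - off;

		if (n == max_out)
			return 0;
		out[n].offset = off;
		out[n].len = left < mss ? left : mss;
		off += out[n].len;
		n++;
	}
	return n;
}
```

With a 1460-byte MSS, a 3000-byte payload comes out as two full segments and one 80-byte tail.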
[kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file
From: Paul TBBle Hampson [EMAIL PROTECTED]

This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory mappings created by map_zeroed_pages.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 Documentation/lguest/lguest.c | 59
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -236,19 +236,51 @@ static int open_or_die(const char *name,
 	return fd;
 }
 
-/* map_zeroed_pages() takes a number of pages. */
+/* unlink_memfile() removes the backing file for the Guest's memory, if we exit
+ * cleanly. */
+static char memfile_path[PATH_MAX];
+
+static void unlink_memfile(void)
+{
+	unlink(memfile_path);
+}
+
+/* map_zeroed_pages() takes a number of pages, and creates a mapping file where
+ * this Guest's memory lives. */
 static void *map_zeroed_pages(unsigned int num)
 {
-	int fd = open_or_die("/dev/zero", O_RDONLY);
+	int fd;
 	void *addr;
 
-	/* We use a private mapping (ie. if we write to the page, it will be
-	 * copied). */
+	/* We create a .lguest directory in the user's home, to put the memory
+	 * files into. */
+	snprintf(memfile_path, PATH_MAX, "%s/.lguest", getenv("HOME") ?: "");
+	if (mkdir(memfile_path, S_IRWXU) != 0 && errno != EEXIST)
+		err(1, "Creating directory %s", memfile_path);
+
+	/* Name the memfiles by the process ID of this launcher. */
+	snprintf(memfile_path, PATH_MAX, "%s/.lguest/%u",
+		 getenv("HOME") ?: "", getpid());
+	fd = open(memfile_path, O_RDWR | O_CREAT | O_TRUNC, S_IRWXU);
+	if (fd < 0)
+		err(1, "Creating memory backing file %s", memfile_path);
+
+	/* Make sure we remove it when we're finished. */
+	atexit(unlink_memfile);
+
+	/* Now, we opened it with O_TRUNC, so the file is 0 bytes long. Here
+	 * we expand it to the length we need, and it will be filled with
+	 * zeroes. */
+	if (ftruncate(fd, num * getpagesize()) != 0)
+		err(1, "Truncating file %s %u pages", memfile_path, num);
+
+	/* We use a shared mapping, so others can share with us. */
 	addr = mmap(NULL, getpagesize() * num,
-		    PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);
+		    PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, fd, 0);
 	if (addr == MAP_FAILED)
 		err(1, "Mmaping %u pages of /dev/zero", num);
 
+	verbose("Memory backing file is %s @ %p\n", memfile_path, addr);
 	return addr;
 }
 
@@ -263,23 +295,12 @@ static void *get_pages(unsigned int num)
 	return addr;
 }
 
-/* This routine is used to load the kernel or initrd. It tries mmap, but if
- * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries),
- * it falls back to reading the memory in. */
+/* This routine is used to load the kernel or initrd. We used to mmap, but now
+ * we simply read it in, so it will be present in the shared underlying
+ * file. */
 static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
 {
 	ssize_t r;
-
-	/* We map writable even though for some segments are marked read-only.
-	 * The kernel really wants to be writable: it patches its own
-	 * instructions.
-	 *
-	 * MAP_PRIVATE means that the page won't be copied until a write is
-	 * done to it. This allows us to share untouched memory between
-	 * Guests. */
-	if (mmap(addr, len, PROT_READ|PROT_WRITE|PROT_EXEC,
-		 MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED)
-		return;
 
 	/* pread does a seek and a read in one shot: saves a few lines. */
 	r = pread(fd, addr, len, offset);
[kvm-devel] [RFC PATCH 2/5] lguest: Encapsulate Guest memory ready for dealing with other Guests.
We currently keep Guest memory pointer and size in globals. We move this into a structure and explicitly hand that to to_guest_phys() and from_guest_phys() so we can deal with other Guests' memory.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 Documentation/lguest/lguest.c | 89
 1 file changed, 49 insertions(+), 40 deletions(-)

diff -r 95558c7d210e Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c	Thu Mar 13 14:11:40 2008 +1100
+++ b/Documentation/lguest/lguest.c	Thu Mar 13 23:05:35 2008 +1100
@@ -76,10 +76,20 @@ static bool verbose;
 
 /* The pipe to send commands to the waker process */
 static int waker_fd;
-/* The pointer to the start of guest memory. */
-static void *guest_base;
-/* The maximum guest physical address allowed, and maximum possible. */
-static unsigned long guest_limit, guest_max;
+
+struct guest_memory
+{
+	/* The pointer to the start of guest memory. */
+	void *base;
+	/* The maximum guest physical address allowed. */
+	unsigned long limit;
+};
+
+/* The maximum possible page for the guest. */
+static unsigned long guest_max;
+
+/* This Guest's memory. */
+static struct guest_memory gmem;
 
 /* a per-cpu variable indicating whose vcpu is currently running */
 static unsigned int __thread cpu_id;
@@ -207,20 +217,19 @@ static u8 *get_feature_bits(struct devic
  * will get you through this section. Or, maybe not.
  *
  * The Launcher sets up a big chunk of memory to be the Guest's "physical"
- * memory and stores it in "guest_base". In other words, Guest physical ==
- * Launcher virtual with an offset.
+ * memory. In other words, Guest physical == Launcher virtual with an offset.
  *
  * This can be tough to get your head around, but usually it just means that we
  * use these trivial conversion functions when the Guest gives us it's
  * "physical" addresses: */
-static void *from_guest_phys(unsigned long addr)
+static void *from_guest_phys(struct guest_memory *mem, unsigned long addr)
 {
-	return guest_base + addr;
+	return mem->base + addr;
 }
 
-static unsigned long to_guest_phys(const void *addr)
+static unsigned long to_guest_phys(struct guest_memory *mem, const void *addr)
 {
-	return (addr - guest_base);
+	return (addr - mem->base);
 }
 
 /*L:130
@@ -287,10 +296,10 @@ static void *map_zeroed_pages(unsigned i
 /* Get some more pages for a device. */
 static void *get_pages(unsigned int num)
 {
-	void *addr = from_guest_phys(guest_limit);
+	void *addr = from_guest_phys(&gmem, gmem.limit);
 
-	guest_limit += num * getpagesize();
-	if (guest_limit > guest_max)
+	gmem.limit += num * getpagesize();
+	if (gmem.limit > guest_max)
 		errx(1, "Not enough memory for devices");
 	return addr;
 }
@@ -351,7 +360,7 @@ static unsigned long map_elf(int elf_fd,
 			i, phdr[i].p_memsz, (void *)phdr[i].p_paddr);
 
 		/* We map this section of the file at its physical address. */
-		map_at(elf_fd, from_guest_phys(phdr[i].p_paddr),
+		map_at(elf_fd, from_guest_phys(&gmem, phdr[i].p_paddr),
 		       phdr[i].p_offset, phdr[i].p_filesz);
 	}
 
@@ -371,7 +380,7 @@ static unsigned long load_bzimage(int fd
 	struct boot_params boot;
 	int r;
 	/* Modern bzImages get loaded at 1M. */
-	void *p = from_guest_phys(0x100000);
+	void *p = from_guest_phys(&gmem, 0x100000);
 
 	/* Go back to the start of the file and read the header. It should be
 	 * a Linux boot header (see Documentation/i386/boot.txt) */
@@ -444,7 +453,7 @@ static unsigned long load_initrd(const c
 	/* We map the initrd at the top of memory, but mmap wants it to be
 	 * page-aligned, so we round the size up for that. */
 	len = page_align(st.st_size);
-	map_at(ifd, from_guest_phys(mem - len), 0, st.st_size);
+	map_at(ifd, from_guest_phys(&gmem, mem - len), 0, st.st_size);
 	/* Once a file is mapped, you can close the file descriptor. It's a
 	 * little odd, but quite useful. */
 	close(ifd);
@@ -473,7 +482,7 @@ static unsigned long setup_pagetables(un
 	linear_pages = (mapped_pages + ptes_per_page-1)/ptes_per_page;
 
 	/* We put the toplevel page directory page at the top of memory. */
-	pgdir = from_guest_phys(mem) - initrd_size - getpagesize();
+	pgdir = from_guest_phys(&gmem, mem) - initrd_size - getpagesize();
 
 	/* Now we use the next linear_pages pages as pte pages */
 	linear = (void *)pgdir - linear_pages*getpagesize();
@@ -487,16 +496,16 @@ static unsigned long setup_pagetables(un
 	/* The top level points to the linear page table pages above. */
 	for (i = 0; i < mapped_pages; i += ptes_per_page) {
 		pgdir[i/ptes_per_page]
-			= ((to_guest_phys(linear) + i*sizeof(void *))
+			= ((to_guest_phys(&gmem,
[kvm-devel] [RFC PATCH 3/5] lguest: separate out virtqueue info from device info.
To deal with other Guest's virtqueue, we need to separate out the parts of the structure which deal with the actual virtqueue from configuration information and the device. Then we can change the virtqueue descriptor handling functions to take that smaller structure.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 Documentation/lguest/lguest.c | 142
 1 file changed, 76 insertions(+), 66 deletions(-)

diff -r 49ed4fa72c7c Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c	Mon Mar 17 15:33:54 2008 +1100
+++ b/Documentation/lguest/lguest.c	Mon Mar 17 22:33:20 2008 +1100
@@ -148,6 +148,18 @@ struct device
 };
 
 /* The virtqueue structure describes a queue attached to a device. */
+struct virtqueue_info
+{
+	/* The memory this virtqueue sits in (usually gmem, our Guest). */
+	struct guest_memory *mem;
+
+	/* The actual ring of buffers. */
+	struct vring vring;
+
+	/* Last available index we saw. */
+	u16 last_avail_idx;
+};
+
 struct virtqueue
 {
 	struct virtqueue *next;
@@ -158,11 +170,8 @@ struct virtqueue
 	/* The configuration for this queue. */
 	struct lguest_vqconfig config;
 
-	/* The actual ring of buffers. */
-	struct vring vring;
-
-	/* Last available index we saw. */
-	u16 last_avail_idx;
+	/* Information about the Guest's virtqueue. */
+	struct virtqueue_info vqi;
 
 	/* The routine to call when the Guest pings us. */
 	void (*handle_output)(int fd, struct virtqueue *me);
@@ -656,7 +665,7 @@ static void *_check_pointer(unsigned lon
 		errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr);
 	/* We return a pointer for the caller's convenience, now we know it's
 	 * safe to use. */
-	return from_guest_phys(&gmem, addr);
+	return from_guest_phys(mem, addr);
 }
 /* A macro which transparently hands the line number to the real function. */
 #define check_pointer(mem,addr,size) _check_pointer(addr, size, mem, __LINE__)
@@ -664,20 +673,20 @@ static void *_check_pointer(unsigned lon
 /* Each buffer in the virtqueues is actually a chain of descriptors. This
  * function returns the next descriptor in the chain, or vq->vring.num if we're
  * at the end. */
-static unsigned next_desc(struct virtqueue *vq, unsigned int i)
+static unsigned next_desc(struct virtqueue_info *vqi, unsigned int i)
 {
 	unsigned int next;
 
 	/* If this descriptor says it doesn't chain, we're done. */
-	if (!(vq->vring.desc[i].flags & VRING_DESC_F_NEXT))
-		return vq->vring.num;
+	if (!(vqi->vring.desc[i].flags & VRING_DESC_F_NEXT))
+		return vqi->vring.num;
 
 	/* Check they're not leading us off end of descriptors. */
-	next = vq->vring.desc[i].next;
+	next = vqi->vring.desc[i].next;
 	/* Make sure compiler knows to grab that: we don't want it changing! */
 	wmb();
 
-	if (next >= vq->vring.num)
+	if (next >= vqi->vring.num)
 		errx(1, "Desc next is %u", next);
 
 	return next;
@@ -688,29 +697,29 @@ static unsigned next_desc(struct virtque
  * number of output then some number of input descriptors, it's actually two
  * iovecs, but we pack them into one and note how many of each there were.
  *
- * This function returns the descriptor number found, or vq->vring.num (which
- * is never a valid descriptor number) if none was found. */
-static unsigned get_vq_desc(struct virtqueue *vq,
-			    struct iovec iov[],
-			    unsigned int *out_num, unsigned int *in_num)
+ * This function returns the descriptor number found, or -1 if none was
+ * found. */
+static int get_vq_desc(struct virtqueue_info *vqi,
+		       struct iovec iov[],
+		       unsigned int *out_num, unsigned int *in_num)
 {
 	unsigned int i, head;
 
 	/* Check it isn't doing very strange things with descriptor numbers. */
-	if ((u16)(vq->vring.avail->idx - vq->last_avail_idx) > vq->vring.num)
+	if ((u16)(vqi->vring.avail->idx - vqi->last_avail_idx) > vqi->vring.num)
 		errx(1, "Guest moved used index from %u to %u",
-		     vq->last_avail_idx, vq->vring.avail->idx);
+		     vqi->last_avail_idx, vqi->vring.avail->idx);
 
 	/* If there's nothing new since last we looked, return invalid. */
-	if (vq->vring.avail->idx == vq->last_avail_idx)
-		return vq->vring.num;
+	if (vqi->vring.avail->idx == vqi->last_avail_idx)
+		return -1;
 
 	/* Grab the next descriptor number they're advertising, and increment
 	 * the index we've seen. */
-	head = vq->vring.avail->ring[vq->last_avail_idx++ % vq->vring.num];
+	head = vqi->vring.avail->ring[vqi->last_avail_idx++ % vqi->vring.num];
 
 	/* If their number is silly, that's a fatal mistake. */
-	if (head >= vq->vring.num)
+	if (head >= vqi->vring.num)
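The sanity check at the top of get_vq_desc() depends on 16-bit wraparound: the difference between the Guest's avail index and the last index we saw is taken modulo 65536, so it stays correct when the index wraps. A small standalone sketch of that arithmetic follows. `pending_buffers()` is a hypothetical helper, not a function in lguest.c, and it returns -1 where the Launcher would call errx().

```c
#include <assert.h>
#include <stdint.h>

/* How many buffers has the Guest made available since we last looked?
 * Returns -1 if the Guest claims more than the ring can hold. */
static int pending_buffers(uint16_t avail_idx, uint16_t last_avail_idx,
			   unsigned int ring_size)
{
	/* The subtraction is done in int, then truncated back to 16 bits,
	 * which gives the distance modulo 65536. */
	uint16_t pending = (uint16_t)(avail_idx - last_avail_idx);

	assert(ring_size > 0 && ring_size <= 65535);
	if (pending > ring_size)
		return -1;
	return pending;
}
```

For instance, an avail index of 2 with a last-seen index of 65534 means four new buffers, not a negative count.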
Re: [kvm-devel] [kvm-ia64-devel] Cross-arch support for make sync in userspace
Avi Kivity wrote:
> Zhang, Xiantao wrote:
>> Hi, Avi
>> Currently, make sync in userspace only syncs x86-specific headers from the kernel source, because the path is hard-coded in the Makefile. Do you have a plan to provide cross-arch support for that?
>
> No plans. I'll apply patches though. But don't you need kernel changes which make it impossible to run kvm-ia64 on older kernels?
>
>> Other archs may need it for save/restore :)
>
> Save/restore? Don't understand.

You know, currently make sync syncs header files to userspace from include/asm-x86/, so kvm.h and kvm_host.h are always synced from there for every arch. Some arch-specific stuff for save/restore should be defined in include/asm-$arch/(kvm.h; kvm_host.h), so ia64 and other archs will need it when they implement save/restore.

Xiantao
Re: [kvm-devel] [kvm-ia64-devel] Cross-arch support for make sync in userspace
Zhang, Xiantao wrote:
> Avi Kivity wrote:
>> Zhang, Xiantao wrote:
>>> Avi Kivity wrote:
>>>> Zhang, Xiantao wrote:
>>>>> Hi, Avi
>>>>> Currently, make sync in userspace only syncs x86-specific headers from the kernel source, because the path is hard-coded in the Makefile. Do you have a plan to provide cross-arch support for that?
>>>>
>>>> No plans. I'll apply patches though. But don't you need kernel changes which make it impossible to run kvm-ia64 on older kernels?
>>>>
>>>>> Other archs may need it for save/restore :)
>>>>
>>>> Save/restore? Don't understand.
>>>
>>> You know, currently make sync syncs header files to userspace from include/asm-x86/, so kvm.h and kvm_host.h are always synced from there for every arch. Some arch-specific stuff for save/restore should be defined in include/asm-$arch/(kvm.h; kvm_host.h), so ia64 and other archs will need it when they implement save/restore.
>>
>> I see. But is 'make sync' actually useful for you? Can you run kvm-ia64 on top of 2.6.24, which doesn't include your ia64 core API changes?
>
> For now we don't intend to support kernels older than 2.6.24, and we don't want to compile the kernel module in userspace. But we at least need make sync to work first, because we need it to guarantee that Qemu uses the right header files for its compilation.
> Xiantao

I see. ./configure --with-patched-kernel should work for that, but I have no issue with copying include/asm-ia64 either.

--
Any sufficiently difficult bug is indistinguishable from a feature.
Re: [kvm-devel] [kvm-ia64-devel] Cross-arch support for make sync in userspace
Avi Kivity wrote:
> Zhang, Xiantao wrote:
>> Avi Kivity wrote:
> I see. ./configure --with-patched-kernel should work for that, but I have no issue with copying include/asm-ia64 either.

Copying would be ugly, since it needs extra documentation to describe it. If --with-patched-kernel can call a script, that should be fine as well.

Xiantao
Re: [kvm-devel] [Lguest] [RFC PATCH 1/5] lguest: mmap backing file
On Thu, 2008-03-20 at 17:05 +1100, Rusty Russell wrote:
> +	snprintf(memfile_path, PATH_MAX, "%s/.lguest", getenv("HOME") ?: "");

Hi Rusty,

Is that safe if being run via setuid/gid or shared root? It might be better to just look it up in /etc/passwd against the real UID, considering that anyone can change (or null) that env string.

Of course it's also practical to just say "DON'T RUN LGUEST AS SETUID/GID". Even if you say that, someone will do it. You might also add "beware of sudoers". For people (like myself and lab mates) who are forced to share machines, it could breed a whole new strain of practical jokes :)

Looking it up that way will cause lguest to inherit a memory leak from getpwuid(), but it only leaks once.

Cheers,
--Tim
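A rough sketch of the lookup Tim describes: take the home directory from the password database entry of the real UID rather than from the environment. `real_uid_home()` and `lguest_dir()` are hypothetical helper names, not code from the patch, and refusing a home directory that is not an absolute path is an extra assumption made here.

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Home directory of the real UID according to the password database, or
 * NULL if there is no usable entry.  The result points into storage
 * owned by getpwuid() and is overwritten by later calls. */
static const char *real_uid_home(void)
{
	struct passwd *pw = getpwuid(getuid());

	if (!pw || !pw->pw_dir || pw->pw_dir[0] != '/')
		return NULL;
	return pw->pw_dir;
}

/* Write "<home>/.lguest" into buf.  Returns 0 on success, -1 if home is
 * missing, not absolute, or the result would not fit in len bytes. */
static int lguest_dir(const char *home, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);
	if (!home || home[0] != '/')
		return -1;
	n = snprintf(buf, len, "%s/.lguest", home);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A launcher would call `lguest_dir(real_uid_home(), memfile_path, PATH_MAX)` and bail out on -1 instead of silently falling back to an empty prefix.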
Re: [kvm-devel] kvm-guest-drivers patch for RHEL4
david ahern wrote:
> Backport of the virtio drivers to RHEL4. The patch applies against the kvm-guest-drivers-linux-1 release but also contains diffs for Anthony's spin_lock_irqsave/restore patch.
>
> Of note is that to build for RHEL4 Makefile is renamed to Makefile-2.6 so that Makefile can contain the build rules for the modules. RHEL4 (AFAIK) does not contain the Kbuild stuff. The make command is then
>
>     make -f Makefile-2.6

Looks much better than I feared, the #ifdef COMPAT_RHEL4s are quite small.

--
error compiling committee.c: too many arguments to function
Re: [kvm-devel] PATCH: dont call exit() from pci_nic_init(), let caller handle
On Wed, Mar 19, 2008 at 07:19:51PM -0500, Ryan Harper wrote:
> While exploring the PCI hotplug code recently posted, I encountered a situation where I don't believe the current behavior is ideal. With hotplug, we can add additional pci-based nic devices like e1000 and rtl8139 from the qemu monitor. If one mistakenly specifies model=ne2000 (the ISA version), qemu just exits. If a command is run from the monitor and specifies bogus values, I don't believe the right behavior is to exit out of the guest entirely.
>
> The attached patch (which doesn't apply directly against qemu-cvs since hotplug hasn't been merged) changes pci_nic_init() to return NULL on error instead of exiting and then I've replaced all callers to check the return value and exit(), preserving the existing behavior, but allowing flexibility so hotplug can do the right thing and just report the error rather than exiting the guest.

Hi Ryan,

Looks good, thanks. There might still be some exit()'s lurking around due to device/cpu hot/add failure.
[kvm-devel] [ kvm-Bugs-1920897 ] KVM Guest Drivers fail on Windows
Bugs item #1920897, was opened at 2008-03-20 14:01
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1920897&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM Guest Drivers fail on Windows

Initial Comment:
KVM Guest Drivers v1 fail on Windows XP (SP2 ACPI) - it basically halts the VM as soon as the drivers get installed.

-Alexey, 20.3.2008.
[kvm-devel] [ kvm-Bugs-1920900 ] KVM Guest Drivers fail on Linux
Bugs item #1920900, was opened at 2008-03-20 14:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1920900&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM Guest Drivers fail on Linux

Initial Comment:
The newly released kvm-guest-drivers-linux fail on stage 1: make sync

Tested on openSUSE 10.3 32-bit guest. (gcc+kernel-source installed)

-Alexey
Re: [kvm-devel] KVM: MMU: add KVM_ZAP_GFN ioctl
Marcelo Tosatti wrote:
> Add an ioctl to zap all mappings to a given gfn. This allows userspace to remove the QEMU process mappings and the page without causing inconsistency.

I'm thinking of committing rmap_nuke() to kvm.git, and the rest to the external module, since this is only needed on kernels without mmu notifiers.

Andrea, is rmap_nuke() suitable for the mmu notifiers pte clear callback?

Oh, and a single gfn may have multiple hvas, so we need to iterate over something here.

> Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index f0cdfba..c41464f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -642,6 +642,67 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
>  	account_shadowed(kvm, gfn);
>  }
>  
> +static void rmap_nuke(struct kvm *kvm, u64 gfn)
> +{
> +	unsigned long *rmapp;
> +	u64 *spte;
> +	int nuked = 0;
> +
> +	gfn = unalias_gfn(kvm, gfn);
> +	rmapp = gfn_to_rmap(kvm, gfn, 0);
> +
> +	spte = rmap_next(kvm, rmapp, NULL);
> +	while (spte) {
> +		BUG_ON(!spte);
> +		BUG_ON(!(*spte & PT_PRESENT_MASK));
> +		rmap_printk("rmap_nuke: spte %p %llx\n", spte, *spte);
> +		rmap_remove(kvm, spte);
> +		set_shadow_pte(spte, shadow_trap_nonpresent_pte);
> +		nuked = 1;
> +		spte = rmap_next(kvm, rmapp, spte);
> +	}
> +	/* check for huge page mappings */
> +	rmapp = gfn_to_rmap(kvm, gfn, 1);
> +	spte = rmap_next(kvm, rmapp, NULL);
> +	while (spte) {
> +		BUG_ON(!spte);
> +		BUG_ON(!(*spte & PT_PRESENT_MASK));
> +		BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
> +		pgprintk("rmap_nuke(large): spte %p %llx %lld\n", spte, *spte, gfn);
> +		rmap_remove(kvm, spte);
> +		--kvm->stat.lpages;
> +		set_shadow_pte(spte, shadow_trap_nonpresent_pte);
> +		nuked = 1;
> +		spte = rmap_next(kvm, rmapp, spte);
> +	}
> +
> +	if (nuked)
> +		kvm_flush_remote_tlbs(kvm);
> +}
> +
> +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn)
> +{
> +	unsigned long addr;
> +	int have_mmu_notifiers = 0;
> +
> +	down_read(&kvm->slots_lock);
> +	addr = gfn_to_hva(kvm, gfn);
> +
> +	if (kvm_is_error_hva(addr)) {
> +		up_read(&kvm->slots_lock);
> +		return -EINVAL;
> +	}
> +
> +	if (!have_mmu_notifiers) {
> +		spin_lock(&kvm->mmu_lock);
> +		rmap_nuke(kvm, gfn);
> +		spin_unlock(&kvm->mmu_lock);
> +	}
> +	up_read(&kvm->slots_lock);
> +
> +	return 0;
> +}
> +
>  #ifdef MMU_DEBUG
>  static int is_empty_shadow_page(u64 *spt)
>  {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e65a9d6..d982ca1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -816,6 +816,9 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_NR_MEMSLOTS:
>  		r = KVM_MEMORY_SLOTS;
>  		break;
> +	case KVM_CAP_ZAP_GFN:
> +		r = 1;
> +		break;
>  	default:
>  		r = 0;
>  		break;
> @@ -1636,6 +1639,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
>  		r = 0;
>  		break;
>  	}
> +	case KVM_ZAP_GFN: {
> +		gfn_t gfn;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&gfn, argp, sizeof gfn))
> +			goto out;
> +		r = kvm_zap_single_gfn(kvm, gfn);
> +		break;
> +	}
>  	default:
>  		;
>  	}
> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
> index 024b57c..4e45bd2 100644
> --- a/include/asm-x86/kvm_host.h
> +++ b/include/asm-x86/kvm_host.h
> @@ -425,6 +425,7 @@ void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>  void kvm_mmu_zap_all(struct kvm *kvm);
> +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn);
>  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
>  
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index e92e703..9ea714f 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -236,6 +236,7 @@ struct kvm_vapic_addr {
>  #define KVM_CAP_CLOCKSOURCE 8
>  #define KVM_CAP_NR_VCPUS 9       /* returns max vcpus per vm */
>  #define KVM_CAP_NR_MEMSLOTS 10   /* returns max memory slots per vm */
> +#define KVM_CAP_ZAP_GFN 11
>  
>  /*
>   * ioctls for VM fds
> @@ -258,6 +259,7 @@ struct kvm_vapic_addr {
>  #define KVM_IRQ_LINE              _IOW(KVMIO, 0x61, struct kvm_irq_level)
>  #define KVM_GET_IRQCHIP           _IOWR(KVMIO, 0x62, struct kvm_irqchip)
>  #define KVM_SET_IRQCHIP           _IOR(KVMIO, 0x63, struct kvm_irqchip)
> +#define KVM_ZAP_GFN               _IOR(KVMIO, 0x64, unsigned long)
>  
>  /*
>   * ioctls for vcpu
Re: [kvm-devel] KVM Test result, kernel f1080a0.., userspace 49cf2d2..
Yunfeng Zhao wrote:
> Following issues fixed:
> 1. qcow based smp linux guests likely hang
> https://sourceforge.net/tracker/index.php?func=detail&aid=1901980&group_id=180599&atid=893831
> 2. smp windows installer crashes while rebooting
> https://sourceforge.net/tracker/index.php?func=detail&aid=1877875&group_id=180599&atid=893831

No idea how these were fixed.

> 3. Timer of guest is inaccurate
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1826080&group_id=180599

This may be the in-kernel pit.

> 4. Installer of 64bit vista guest will pause for ten minutes after reboot
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1836905&group_id=180599

The pit again?! Confused.

--
error compiling committee.c: too many arguments to function
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Avi Kivity wrote:
> Rusty Russell wrote:
>> Hi all,
>>
>> Just finished my prototype of inter-guest virtio, using networking as an example. Each guest mmaps the other's address space and uses a FIFO for notifications.
>
> Isn't that a security hole (hole? chasm)? If the two guests can access each other's memory, they might as well be just one guest, and communicate internally.

Each guest's host userspace mmaps the other guest's address space. The userspace then does a copy on both the tx and rx paths.

Conceivably, this could be done as a read-only mapping so that each guest userspace copies only the rx packets. That's about as secure as you're going to get with this approach I think.

Regards,

Anthony Liguori

> My feeling is that the host needs to copy the data, using dma if available. Another option is to have one guest map the other's memory for read and write, while the other guest is unprivileged. This allows one privileged guest to provide services for other, unprivileged guests, like domain 0 or driver domains in Xen.
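The read-only variant Anthony mentions can be pictured with two mappings of one backing file: the owner maps it read/write, the peer maps it PROT_READ. The sketch below is Linux/POSIX only and uses invented names (`make_backing_file()`, `map_peer_readonly()`) and an unlinked temporary file in /tmp as a stand-in for a Guest's memory file. It is not how the lguest patches are wired up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an anonymous (already unlinked) zero-filled file of len bytes,
 * standing in for one Guest's memory file.  Returns an fd or -1. */
static int make_backing_file(size_t len)
{
	char path[] = "/tmp/peer-mem-XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Map the other Guest's memory so that we can read it but never modify
 * it.  Returns NULL on failure. */
static const void *map_peer_readonly(int fd, size_t len)
{
	void *addr;

	assert(fd >= 0);
	addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	return addr == MAP_FAILED ? NULL : addr;
}
```

Stores made through the owner's writable MAP_SHARED mapping show up in the peer's view, while a store through the peer's mapping would fault.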
Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file
Rusty Russell wrote:
> From: Paul TBBle Hampson [EMAIL PROTECTED]
>
> This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory mappings created by map_zeroed_pages.

I created a test program recently that measured the latency of a reads/writes to an mmap() file in /dev/shm and in a normal filesystem. Even after unlinking the underlying file, the write latency was much better with a mmap()'d file in /dev/shm.

/dev/shm is not really for general use. I think we'll want to have our own tmpfs mount that we use to create VM images.

I also prefer to use a unix socket for communication, unlink the file immediately after open, and then pass the fd via SCM_RIGHTS to the other process.

Regards,

Anthony Liguori
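For readers who have not used SCM_RIGHTS, the fd passing Anthony prefers looks roughly like this. `send_fd()` and `recv_fd()` are illustrative helper names and error handling is kept minimal. It is a sketch of the standard sendmsg()/recvmsg() ancillary-data pattern, not code from lguest or kvm-userspace.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer for exactly one fd, aligned for struct cmsghdr. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send fd over the connected Unix socket sock.  0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';	/* at least one byte of real data must go along */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from sock.  Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Because the receiver gets its own descriptor for the same open file, the backing file can be unlinked right after creation and still be mmap()ed by the other process.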
Re: [kvm-devel] [Lguest] [RFC PATCH 1/5] lguest: mmap backing file
On Thu, Mar 20, 2008 at 04:16:00PM +0800, Tim Post wrote:
> On Thu, 2008-03-20 at 17:05 +1100, Rusty Russell wrote:
>> +	snprintf(memfile_path, PATH_MAX, "%s/.lguest", getenv("HOME") ?: "");
>
> Hi Rusty,
>
> Is that safe if being run via setuid/gid or shared root? It might be better to just look it up in /etc/passwd against the real UID, considering that anyone can change (or null) that env string.
>
> Of course its also practical to just say DON'T RUN LGUEST AS SETUID/GID. Even if you say that, someone will do it. You might also add beware of sudoers. For people (like myself and lab mates) who are forced to share machines, it could breed a whole new strain of practical jokes :)

I'm not sure I see the risk here. Surely not anyone can modify your environment variables out from under you?

Are you worried that other root users are going to point root's .lguest directory somewhere else, but not the non-root user's directory?

I fear I'm missing something here...

There _is_ an issue I hadn't thought of at the time, which is if your $HOME is on shared media, and you clash PIDs between lguest launchers on two machines sharing that media as $HOME, you're going to clash memfiles, specifically truncating the earlier memfile.

(Sorry for the double-up, lguest list. I hit send too quickly)

--
---
Paul TBBle Hampson, B.Sc, LPI, MCSE
Very-later-year Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
[EMAIL PROTECTED]

Of course Pacman didn't influence us as kids. If it did, we'd be running around in darkened rooms, popping pills and listening to repetitive music.
 -- Kristian Wilson, Nintendo, Inc, 1989

License: http://creativecommons.org/licenses/by/2.1/au/
---
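The truncation Paul describes comes from opening the memfile with O_TRUNC. One possible mitigation, which nobody in the thread proposed and which is shown here only as an assumption, is to create the file with O_EXCL so that a name clash fails with EEXIST instead of wiping the earlier launcher's memory. `create_memfile_excl()` is a hypothetical helper. On NFS-backed home directories O_EXCL is only dependable with NFSv3 or later, so this would not be a complete answer for shared media.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create path for reading and writing.  If a file of that name already
 * exists, fail with errno == EEXIST rather than truncating it. */
static int create_memfile_excl(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
}
```

A launcher seeing EEXIST could then pick another name, for example by adding the hostname to the PID, rather than carrying on with someone else's file.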
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Rusty Russell wrote:
> Hi all,
>
> Just finished my prototype of inter-guest virtio, using networking as an example. Each guest mmaps the other's address space and uses a FIFO for notifications.
>
> There are two issues with this approach. The first is that neither guest can change its mappings. See patch 1.

Avi mentioned that with MMU notifiers, it may be possible to introduce a new kernel mechanism whereby you could map an arbitrary region of one process's memory into another process. This would address this problem quite nicely.

> The second is that our feature configuration is "host presents, guest chooses", which breaks down when we don't know the capabilities of each guest. In particular, TSO capability for networking.
>
> There are three possible solutions:
> 1) Just offer the lowest common denominator to both sides (ie. no features). This is what I do with lguest in these patches.
> 2) Offer something and handle the case where one Guest accepts and another doesn't by emulating it. ie. de-TSO the packets manually.
> 3) Hot unplug the device from the guest which asks for the greater features, then re-add it offering less features. Requires hotplug in the guest OS.

4) Add a feature negotiation feature. The feature that gets set is the feature negotiation feature. If a guest doesn't support feature negotiation, you end up with the least-common denominator (no features). If both guests support feature negotiation, you can then add something new to determine the true common subset.

> I haven't tuned or even benchmarked these patches, but it pings!

Very nice! It's particularly cool that it was possible entirely in userspace.

Regards,

Anthony Liguori

> Rusty.
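Option 4 above can be sketched as a simple bitmask rule. The feature bits here are hypothetical and chosen only for illustration (they are not real virtio feature numbers), and `common_features` is an assumed helper name, not something from the patches.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define F_NEGOTIATE	(1u << 0)	/* "I support feature negotiation" */
#define F_CSUM		(1u << 1)
#define F_TSO		(1u << 2)

/* Return the features both guests may use.  If either side lacks the
 * negotiation feature we fall back to the lowest common denominator
 * (no features); otherwise we take the intersection of both sets. */
static uint32_t common_features(uint32_t a, uint32_t b)
{
	if (!(a & b & F_NEGOTIATE))
		return 0;
	return (a & b) & ~F_NEGOTIATE;
}
```

With this rule a guest that advertises TSO but talks to a peer without it simply ends up without TSO, so nothing needs to be emulated or hot-unplugged.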
Re: [kvm-devel] [Qemu-devel] PATCH: dont call exit() from pci_nic_init(), let caller handle
* Avi Kivity [EMAIL PROTECTED] [2008-03-20 07:19]:
> Ryan Harper wrote:
> > While exploring the PCI hotplug code recently posted, I encountered a situation where I don't believe the current behavior is ideal. With hotplug, we can add additional pci-based nic devices like e1000 and rtl8139 from the qemu monitor. If one mistakenly specifies model=ne2000 (the ISA version), qemu just exits. If a command is run from the monitor and specifies bogus values, I don't believe the right behavior is to exit out of the guest entirely.
> >
> > The attached patch (which doesn't apply directly against qemu-cvs since hotplug hasn't been merged) changes pci_nic_init() to return NULL on error instead of exiting and then I've replaced all callers to check the return value and exit(), preserving the existing behavior, but allowing flexibility so hotplug can do the right thing and just report the error rather than exiting the guest.
>
> Applied, thanks.
>
> [this didn't make it to kvm-devel for some reason?]

Yeah, not sure about that, sometimes it gets clogged in our outgoing system; they tend to not get along with some servers for unknown reasons to me. It has worked in the past for me. *shrugs*

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
[EMAIL PROTECTED]
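The pattern described in the quoted patch summary (report failure to the caller and let the caller decide whether to exit) looks roughly like this. The sketch is not qemu code: `struct nic`, `nic_init` and `hotplug_add_nic` are hypothetical stand-ins, and the real pci_nic_init() signature is not shown in this thread.

```c
#include <stdio.h>
#include <string.h>

struct nic {
	const char *model;
};

/* Hypothetical stand-in for an init routine that used to call exit():
 * it now returns NULL for an unknown model and leaves policy to callers. */
static struct nic *nic_init(const char *model)
{
	static struct nic pci_nics[] = { { "e1000" }, { "rtl8139" } };
	size_t i;

	for (i = 0; i < sizeof(pci_nics) / sizeof(pci_nics[0]); i++)
		if (strcmp(model, pci_nics[i].model) == 0)
			return &pci_nics[i];
	return NULL;
}

/* Hotplug-style caller: report the error and keep the guest running.
 * A boot-time caller would instead check for NULL and call exit(1),
 * which preserves the old behaviour. */
static int hotplug_add_nic(const char *model)
{
	if (!nic_init(model)) {
		fprintf(stderr, "unsupported PCI NIC model: %s\n", model);
		return -1;
	}
	return 0;
}
```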
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Anthony Liguori wrote:
> Avi Kivity wrote:
> > Rusty Russell wrote:
> > > Hi all,
> > >
> > > Just finished my prototype of inter-guest virtio, using networking as an example. Each guest mmaps the other's address space and uses a FIFO for notifications.
> >
> > Isn't that a security hole (hole? chasm)? If the two guests can access each other's memory, they might as well be just one guest, and communicate internally.
>
> Each guest's host userspace mmaps the other guest's address space. The userspace then does a copy on both the tx and rx paths.

Well, that's better security-wise (I'd still prefer to avoid it, so we can run each guest under a separate uid), but then we lose performance wise.

> Conceivably, this could be done as a read-only mapping so that each guest userspace copies only the rx packets. That's about as secure as you're going to get with this approach I think.

Maybe we can terminate the virtio queue in the host kernel as a pipe, and splice pipes together. That gives us guest-guest and guest-process communications, and if you use aio the kernel can use a dma engine for the copy.

-- 
error compiling committee.c: too many arguments to function
Re: [kvm-devel] [Lguest] [RFC PATCH 1/5] lguest: mmap backing file
On Thu, Mar 20, 2008 at 09:04:17AM -0500, Anthony Liguori wrote:
> Rusty Russell wrote:
> > From: Paul TBBle Hampson [EMAIL PROTECTED]
> >
> > This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory mappings created by map_zeroed_pages.
>
> I created a test program recently that measured the latency of reads/writes to an mmap() file in /dev/shm and in a normal filesystem. Even after unlinking the underlying file, the write latency was much better with a mmap()'d file in /dev/shm.
>
> /dev/shm is not really for general use. I think we'll want to have our own tmpfs mount that we use to create VM images. I also prefer to use a unix socket for communication, unlink the file immediately after open, and then pass the fd via SCM_RIGHTS to the other process.

The original motivations for the file-backed mmap (rather than the /dev/zero mmap) were two-fold.

Firstly, to allow suspend and resume to be done to a guest, it would need somewhere for its memory to survive. (ie. a guest could be suspended externally immediately, and its state would be resumable from that mmap file)

Secondly, heading towards some kind of common-page-sharing trick, where each lguest could spot and share pages in common with other lguests.

Both of these assume the file is going to be visible in the filesystem until the guest is shut down.

As to whether these are still interesting motivations, I withhold any opinion in favour of those who know better. ^_^

-- 
Paul "TBBle" Hampson, B.Sc, LPI, MCSE
Very-later-year Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
[EMAIL PROTECTED]
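For reference, the alternative suggested in the quoted text (unlink the file right after opening it and hand the descriptor over a unix socket with SCM_RIGHTS) can be sketched as below. The helper names `open_unlinked`, `send_fd` and `recv_fd` are hypothetical. Note that this is exactly the approach that conflicts with the two motivations given above, since the file is no longer visible in the filesystem.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Create a backing file from a mkstemp() template and unlink it at once;
 * the storage stays alive for as long as a descriptor refers to it. */
static int open_unlinked(char *tmpl)
{
	int fd = mkstemp(tmpl);

	if (fd >= 0)
		unlink(tmpl);
	return fd;
}

/* Send fd over a connected AF_UNIX socket.  Returns 0 on success. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd().  Returns the new fd or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver can mmap() the descriptor it gets without ever knowing a path name, which is what makes the early unlink possible.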
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Avi Kivity wrote:
> Anthony Liguori wrote:
> > Avi Kivity wrote:
> >
> > Each guest's host userspace mmaps the other guest's address space. The userspace then does a copy on both the tx and rx paths.
>
> Well, that's better security-wise (I'd still prefer to avoid it, so we can run each guest under a separate uid), but then we lose performance wise.

What performance win? I'm not sure the copies can be eliminated in the case of interguest IO.

Fast interguest IO means mmap()'ing the other guest's address space read-only. If you had a pv dma registration api you could conceivably only allow the active dma entries to be mapped but my fear would be that the zap'ing on unregister would hurt performance.

> > Conceivably, this could be done as a read-only mapping so that each guest userspace copies only the rx packets. That's about as secure as you're going to get with this approach I think.
>
> Maybe we can terminate the virtio queue in the host kernel as a pipe, and splice pipes together. That gives us guest-guest and guest-process communications, and if you use aio the kernel can use a dma engine for the copy.

Ah, so you're looking to use a DMA engine for accelerated copy. Perhaps the answer is to expose the DMA engine via a userspace API?

Regards,

Anthony Liguori
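The read-only mapping idea discussed in this message can be illustrated with a small userspace sketch. It assumes the peer guest's memory is backed by a file that this process may open; `map_peer_readonly` and `copy_rx` are hypothetical helper names, and a real launcher would take offsets and lengths from the virtio ring rather than from its caller.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map len bytes of the peer's memory file read-only.  Because the file is
 * opened O_RDONLY and mapped PROT_READ, this process cannot modify the
 * peer's memory through the mapping.  Returns NULL on failure. */
static const void *map_peer_readonly(const char *path, size_t len)
{
	int fd = open(path, O_RDONLY);
	void *p;

	if (fd < 0)
		return NULL;
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	return p == MAP_FAILED ? NULL : p;
}

/* Copy one received packet of n bytes at offset off out of the peer's
 * memory into dst, refusing ranges that fall outside the mapping. */
static int copy_rx(void *dst, const void *peer, size_t peer_len,
		   size_t off, size_t n)
{
	if (off > peer_len || n > peer_len - off)
		return -1;
	memcpy(dst, (const char *)peer + off, n);
	return 0;
}
```

This only prevents writes; as the later replies in the thread point out, being able to read all of the peer's memory is itself a trust problem.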
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Anthony Liguori wrote:
> Avi Kivity wrote:
> > Anthony Liguori wrote:
> > > Avi Kivity wrote:
> > >
> > > Each guest's host userspace mmaps the other guest's address space. The userspace then does a copy on both the tx and rx paths.
> >
> > Well, that's better security-wise (I'd still prefer to avoid it, so we can run each guest under a separate uid), but then we lose performance wise.
>
> What performance win? I'm not sure the copies can be eliminated in the case of interguest IO.

I guess not. But at least you can dma instead of busy-copying.

> Fast interguest IO means mmap()'ing the other guest's address space read-only.

This implies trusting the other userspace, which is not a good thing. Let the kernel copy, we already trust it, and it has more resources to do the copy.

> If you had a pv dma registration api you could conceivably only allow the active dma entries to be mapped but my fear would be that the zap'ing on unregister would hurt performance.

Yes, mmu games are costly. They also only work on page granularity which isn't always possible to guarantee.

> > > Conceivably, this could be done as a read-only mapping so that each guest userspace copies only the rx packets. That's about as secure as you're going to get with this approach I think.
> >
> > Maybe we can terminate the virtio queue in the host kernel as a pipe, and splice pipes together. That gives us guest-guest and guest-process communications, and if you use aio the kernel can use a dma engine for the copy.
>
> Ah, so you're looking to use a DMA engine for accelerated copy. Perhaps the answer is to expose the DMA engine via a userspace API?

That's one option, but it still involves sharing all of memory. Splicing pipes might be better.

-- 
error compiling committee.c: too many arguments to function
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Avi Kivity wrote:
> Anthony Liguori wrote:
> > Avi Kivity wrote:
> > > Anthony Liguori wrote:
> > > > Avi Kivity wrote:
> > > >
> > > > Each guest's host userspace mmaps the other guest's address space. The userspace then does a copy on both the tx and rx paths.
> > >
> > > Well, that's better security-wise (I'd still prefer to avoid it, so we can run each guest under a separate uid), but then we lose performance wise.
> >
> > What performance win? I'm not sure the copies can be eliminated in the case of interguest IO.
>
> I guess not. But at least you can dma instead of busy-copying.
>
> > Fast interguest IO means mmap()'ing the other guest's address space read-only.

You can have the file descriptor be opened O_RDONLY so trust isn't an issue.

> This implies trusting the other userspace, which is not a good thing. Let the kernel copy, we already trust it, and it has more resources to do the copy.

You're going to end up with the same trust issues no matter what unless you let the kernel look directly at the virtio ring queue. That's the only way to arbitrate what memory gets copied.

There may be a generic API here for fast interprocess IO, I don't know. splice() is a little awkward though for this because you really don't want to sit in a splice() loop. What you want is for both sides to be kick'ing the kernel and the kernel to raise an event via eventfd() or something.

Absent whatever this kernel API is (which is really just helpful with a DMA engine), I think the current userspace approach is pretty reasonable. Not just for interguest IO but also for driver domains which I think is a logical extension.

Regards,

Anthony Liguori
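As a point of reference for the eventfd() remark, the notification primitive itself is small. The sketch below shows only the userspace side of an eventfd counter on Linux; it does not implement the kernel-side copy discussed in the thread, and `kick` and `wait_kicks` are hypothetical helper names.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the other side once by adding 1 to the eventfd counter.
 * Returns 0 on success, -1 on error. */
static int kick(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Block until at least one kick has arrived, then return how many kicks
 * accumulated since the last call.  The read resets the counter to zero
 * (the eventfd is assumed to be created without EFD_SEMAPHORE). */
static int64_t wait_kicks(int efd)
{
	uint64_t n;

	if (read(efd, &n, sizeof(n)) != (ssize_t)sizeof(n))
		return -1;
	return (int64_t)n;
}
```

Because several kicks collapse into a single wakeup, the receiver does not have to sit in a loop per notification, which is the property the message above is asking for.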
Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file
Anthony Liguori wrote:
> Rusty Russell wrote:
> > From: Paul TBBle Hampson [EMAIL PROTECTED]
> >
> > This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory mappings created by map_zeroed_pages.
>
> I created a test program recently that measured the latency of reads/writes to an mmap() file in /dev/shm and in a normal filesystem. Even after unlinking the underlying file, the write latency was much better with a mmap()'d file in /dev/shm.

Surely the difference disappears once the pages have been faulted in?

-- 
error compiling committee.c: too many arguments to function
Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file
Avi Kivity wrote:
> Anthony Liguori wrote:
> > Rusty Russell wrote:
> > > From: Paul TBBle Hampson [EMAIL PROTECTED]
> > >
> > > This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory mappings created by map_zeroed_pages.
> >
> > I created a test program recently that measured the latency of reads/writes to an mmap() file in /dev/shm and in a normal filesystem. Even after unlinking the underlying file, the write latency was much better with a mmap()'d file in /dev/shm.
>
> Surely the difference disappears once the pages have been faulted in?

I don't recall. I believe rewrite was okay but initial write was much worse.

Regards,

Anthony Liguori
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Anthony Liguori wrote:
> You can have the file descriptor be opened O_RDONLY so trust isn't an issue.

Reading is just as bad as writing.

> > This implies trusting the other userspace, which is not a good thing. Let the kernel copy, we already trust it, and it has more resources to do the copy.
>
> You're going to end up with the same trust issues no matter what unless you let the kernel look directly at the virtio ring queue. That's the only way to arbitrate what memory gets copied.

That's what we need, then.

> There may be a generic API here for fast interprocess IO, I don't know. splice() is a little awkward though for this because you really don't want to sit in a splice() loop. What you want is for both sides to be kick'ing the kernel and the kernel to raise an event via eventfd() or something.
>
> Absent whatever this kernel API is (which is really just helpful with a DMA engine), I think the current userspace approach is pretty reasonable. Not just for interguest IO but also for driver domains which I think is a logical extension.

I disagree. A driver domain is shared between multiple guests, and if one of the guests manages to break into qemu then it can see other guests' data.

[Driver domains are a horrible idea IMO, but that's another story]

-- 
error compiling committee.c: too many arguments to function
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
Avi Kivity wrote:
> I disagree. A driver domain is shared between multiple guests, and if one of the guests manages to break into qemu then it can see other guests' data.

You still don't strictly need to do things in the kernel if this is your concern. You can have another process map both guests' address spaces and do the copying on behalf of each guest if you're paranoid about escaping into QEMU.

> [Driver domains are a horrible idea IMO, but that's another story]

I don't disagree :-)

Regards,

Anthony Liguori
[kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable
From: Martin Schwidefsky [EMAIL PROTECTED]

The SIE instruction on s390 uses the 2nd half of the page table page to virtualize the storage keys of a guest. This patch offers the s390_enable_sie function, which reorganizes the page tables of a single-threaded process to reserve space in the page table: s390_enable_sie makes sure that the process is single threaded and then uses dup_mm to create a new mm with reorganized page tables. The old mm is freed and the process now has a page status extended field after every page table.

Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.

This patch has a small common code hit, namely making dup_mm non-static.

Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/Kconfig              |    4 ++
 arch/s390/kernel/setup.c       |    4 ++
 arch/s390/mm/pgtable.c         |   55 ++--
 include/asm-s390/mmu.h         |    1
 include/asm-s390/mmu_context.h |    8 +
 include/asm-s390/pgtable.h     |    1
 kernel/fork.c                  |    2 -
 7 files changed, 70 insertions(+), 5 deletions(-)

Index: kvm/arch/s390/Kconfig
===================================================================
--- kvm.orig/arch/s390/Kconfig
+++ kvm/arch/s390/Kconfig
@@ -55,6 +55,10 @@ config GENERIC_LOCKBREAK
 	default y
 	depends on SMP && PREEMPT
 
+config PGSTE
+	bool
+	default y if KVM
+
 mainmenu "Linux Kernel Configuration"
 
 config S390
Index: kvm/arch/s390/kernel/setup.c
===================================================================
--- kvm.orig/arch/s390/kernel/setup.c
+++ kvm/arch/s390/kernel/setup.c
@@ -315,7 +315,11 @@ static int __init early_parse_ipldelay(c
 early_param("ipldelay", early_parse_ipldelay);
 
 #ifdef CONFIG_S390_SWITCH_AMODE
+#ifdef CONFIG_PGSTE
+unsigned int switch_amode = 1;
+#else
 unsigned int switch_amode = 0;
+#endif
 EXPORT_SYMBOL_GPL(switch_amode);
 
 static void set_amode_and_uaccess(unsigned long user_amode,
Index: kvm/arch/s390/mm/pgtable.c
===================================================================
--- kvm.orig/arch/s390/mm/pgtable.c
+++ kvm/arch/s390/mm/pgtable.c
@@ -30,11 +30,27 @@
 #define TABLES_PER_PAGE	4
 #define FRAG_MASK	15UL
 #define SECOND_HALVES	10UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+	clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+	memset(table + 256, 0, PAGE_SIZE/4);
+	clear_table(table + 512, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+	memset(table + 768, 0, PAGE_SIZE/4);
+}
+
 #else
 #define ALLOC_ORDER	2
 #define TABLES_PER_PAGE	2
 #define FRAG_MASK	3UL
 #define SECOND_HALVES	2UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+	clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/2);
+	memset(table + 256, 0, PAGE_SIZE/2);
+}
+
 #endif
 
 unsigned long *crst_table_alloc(struct mm_struct *mm, int noexec)
@@ -153,7 +169,7 @@ unsigned long *page_table_alloc(struct m
 	unsigned long *table;
 	unsigned long bits;
 
-	bits = mm->context.noexec ? 3UL : 1UL;
+	bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
 	spin_lock(&mm->page_table_lock);
 	page = NULL;
 	if (!list_empty(&mm->context.pgtable_list)) {
@@ -170,7 +186,10 @@ unsigned long *page_table_alloc(struct m
 		pgtable_page_ctor(page);
 		page->flags &= ~FRAG_MASK;
 		table = (unsigned long *) page_to_phys(page);
-		clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
+		if (mm->context.pgstes)
+			clear_table_pgstes(table);
+		else
+			clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
 		spin_lock(&mm->page_table_lock);
 		list_add(&page->lru, &mm->context.pgtable_list);
 	}
@@ -191,7 +210,7 @@ void page_table_free(struct mm_struct *m
 	struct page *page;
 	unsigned long bits;
 
-	bits = mm->context.noexec ? 3UL : 1UL;
+	bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
 	bits <<= (__pa(table) & (PAGE_SIZE - 1)) / 256 / sizeof(unsigned long);
 	page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
 	spin_lock(&mm->page_table_lock);
@@ -228,3 +247,33 @@ void disable_noexec(struct mm_struct *mm
 	mm->context.noexec = 0;
 	update_mm(mm, tsk);
 }
+
+struct mm_struct *dup_mm(struct task_struct *tsk);
+
+/*
+ * switch on pgstes for its userspace process (for kvm)
+ */
+int s390_enable_sie(void)
+{
+	struct task_struct *tsk = current;
+	struct mm_struct *mm;
+
+	if (tsk->mm->context.pgstes)
+		return 0;
+	if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
+	    tsk->mm != tsk->active_mm || tsk->mm->ioctx_list)
+		return -EINVAL;
+	tsk->mm->context.pgstes = 1;	/* dirty little tricks .. */
+	mm = dup_mm(tsk);
+	tsk->mm->context.pgstes = 0;
+	if (!mm)
+		return
[kvm-devel] [RFC/PATCH 02/15] preparation: host memory management changes for s390 kvm
From: Heiko Carstens [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch changes the s390 memory management definitions to use the pgste field for dirty and reference bit tracking of host and guest code. Usually on s390, dirty and referenced are tracked in storage keys, which belong to the physical page. This changes with virtualization: The guest and host dirty/reference bits are defined to be the logical OR of the values for the mapping and the physical page. This patch implements the necessary changes in pgtable.h for s390.

There is a common code change in mm/rmap.c, the call to page_test_and_clear_young must be moved. This is a no-op for all architectures but s390. page_referenced checks the referenced bits for the physical page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.

Without pgstes (the current implementation on Linux s390) the physical page check is implemented but the mapping callbacks are no-ops because dirty and referenced are not tracked in the s390 page tables. The pgstes introduce guest and host dirty and reference bits for s390 in the host mapping. These mappings must be checked before page_test_and_clear_young resets the reference bit.

Signed-off-by: Heiko Carstens [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 include/asm-s390/pgtable.h |  109 +++--
 mm/rmap.c                  |    7 +-
 2 files changed, 110 insertions(+), 6 deletions(-)

Index: kvm/include/asm-s390/pgtable.h
===================================================================
--- kvm.orig/include/asm-s390/pgtable.h
+++ kvm/include/asm-s390/pgtable.h
@@ -30,6 +30,7 @@
  */
 #ifndef __ASSEMBLY__
 #include <linux/mm_types.h>
+#include <asm/atomic.h>
 #include <asm/bug.h>
 #include <asm/processor.h>
 
@@ -258,6 +259,13 @@ extern char empty_zero_page[PAGE_SIZE];
  * swap pte is 1011 and 0001, 0011, 0101, 0111 are invalid.
  */
 
+/* Page status extended for virtualization */
+#define _PAGE_RCP_PCL	0x0080UL
+#define _PAGE_RCP_HR	0x0040UL
+#define _PAGE_RCP_HC	0x0020UL
+#define _PAGE_RCP_GR	0x0004UL
+#define _PAGE_RCP_GC	0x0002UL
+
 #ifndef __s390x__
 
 /* Bits in the segment table address-space-control-element */
@@ -513,6 +521,67 @@ static inline int pte_file(pte_t pte)
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(a,b)  (pte_val(a) == pte_val(b))
 
+static inline void rcp_lock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	atomic64_t *rcp = (atomic64_t *) (ptep + PTRS_PER_PTE);
+	preempt_disable();
+	atomic64_set_mask(_PAGE_RCP_PCL, rcp);
+#endif
+}
+
+static inline void rcp_unlock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	atomic64_t *rcp = (atomic64_t *) (ptep + PTRS_PER_PTE);
+	atomic64_clear_mask(_PAGE_RCP_PCL, rcp);
+	preempt_enable();
+#endif
+}
+
+static inline void rcp_set_bits(pte_t *ptep, unsigned long val)
+{
+#ifdef CONFIG_PGSTE
+	*(unsigned long *) (ptep + PTRS_PER_PTE) |= val;
+#endif
+}
+
+static inline int rcp_test_and_clear_bits(pte_t *ptep, unsigned long val)
+{
+#ifdef CONFIG_PGSTE
+	unsigned long ret;
+
+	ret = *(unsigned long *) (ptep + PTRS_PER_PTE);
+	*(unsigned long *) (ptep + PTRS_PER_PTE) &= ~val;
+	return (ret & val) == val;
+#else
+	return 0;
+#endif
+}
+
+
+/* forward declaration for SetPageUptodate in page-flags.h*/
+static inline void page_clear_dirty(struct page *page);
+#include <linux/page-flags.h>
+
+static inline void ptep_rcp_copy(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+	struct page *page = virt_to_page(pte_val(*ptep));
+	unsigned int skey;
+
+	skey = page_get_storage_key(page_to_phys(page));
+	if (skey & _PAGE_CHANGED)
+		rcp_set_bits(ptep, _PAGE_RCP_GC);
+	if (skey & _PAGE_REFERENCED)
+		rcp_set_bits(ptep, _PAGE_RCP_GR);
+	if (rcp_test_and_clear_bits(ptep, _PAGE_RCP_HC))
+		SetPageDirty(page);
+	if (rcp_test_and_clear_bits(ptep, _PAGE_RCP_HR))
+		SetPageReferenced(page);
+#endif
+}
+
 /*
  * query functions pte_write/pte_dirty/pte_young only work if
  * pte_present() is true. Undefined behaviour if not..
@@ -599,6 +668,8 @@ static inline void pmd_clear(pmd_t *pmd)
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
+	if (mm->context.pgstes)
+		ptep_rcp_copy(ptep);
 	pte_val(*ptep) = _PAGE_TYPE_EMPTY;
 	if (mm->context.noexec)
 		pte_val(ptep[PTRS_PER_PTE]) = _PAGE_TYPE_EMPTY;
@@ -667,6 +738,22 @@ static inline pte_t pte_mkyoung(pte_t pt
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long addr, pte_t *ptep)
 {
+#ifdef
[kvm-devel] [RFC/PATCH 05/15] kvm-s390: s390 arch backend for the kvm kernel module
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]
From: Heiko Carstens [EMAIL PROTECTED]

This patch contains the port of Qumranet's kvm kernel module to IBM zSeries (aka s390x, mainframe) architecture. It uses the mainframe's virtualization instruction SIE to run virtual machines with up to 64 virtual CPUs each. This port is only usable on 64bit host kernels, and can only run 64bit guest kernels. However, running 31bit applications in guest userspace is possible.

The following source files are introduced by this patch:

arch/s390/kvm/kvm-s390.c	similar to arch/x86/kvm/x86.c, this implements all arch callbacks for kvm. __vcpu_run calls back into sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S		assembler function sie64a, which enters guest context via SIE, and switches world before and after that
include/asm-s390/kvm_host.h	contains all vital data structures needed to run virtual machines on the mainframe
include/asm-s390/kvm.h		defines kvm_regs and friends for user access to guest register content
arch/s390/kvm/gaccess.h		functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h	header file for kvm-s390 internals, extended by later patches

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Heiko Carstens [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/Makefile          |    2
 arch/s390/kernel/vtime.c    |    1
 arch/s390/kvm/Makefile      |   14 +
 arch/s390/kvm/gaccess.h     |  280 +
 arch/s390/kvm/kvm-s390.c    |  574
 arch/s390/kvm/kvm-s390.h    |   29 ++
 arch/s390/kvm/sie64a.S      |   47 +++
 include/asm-s390/Kbuild     |    1
 include/asm-s390/kvm.h      |   44 +++
 include/asm-s390/kvm_host.h |  119 +
 include/asm-s390/kvm_para.h |   30 ++
 include/linux/kvm.h         |   15 +
 include/linux/kvm_host.h    |    4
 13 files changed, 1159 insertions(+), 1 deletion(-)

Index: kvm/arch/s390/Makefile
===================================================================
--- kvm.orig/arch/s390/Makefile
+++ kvm/arch/s390/Makefile
@@ -87,7 +87,7 @@ LDFLAGS_vmlinux := -e start
 head-y		:= arch/s390/kernel/head.o arch/s390/kernel/init_task.o
 
 core-y		+= arch/s390/mm/ arch/s390/kernel/ arch/s390/crypto/ \
-		   arch/s390/appldata/ arch/s390/hypfs/
+		   arch/s390/appldata/ arch/s390/hypfs/ arch/s390/kvm/
 libs-y		+= arch/s390/lib/
 drivers-y	+= drivers/s390/
 drivers-$(CONFIG_MATHEMU) += arch/s390/math-emu/
Index: kvm/arch/s390/kernel/vtime.c
===================================================================
--- kvm.orig/arch/s390/kernel/vtime.c
+++ kvm/arch/s390/kernel/vtime.c
@@ -110,6 +110,7 @@ void account_system_vtime(struct task_st
 	S390_lowcore.steal_clock -= cputime << 12;
 	account_system_time(tsk, 0, cputime);
 }
+EXPORT_SYMBOL_GPL(account_system_vtime);
 
 static inline void set_vtimer(__u64 expires)
 {
Index: kvm/arch/s390/kvm/Makefile
===================================================================
--- /dev/null
+++ kvm/arch/s390/kvm/Makefile
@@ -0,0 +1,14 @@
+# Makefile for kernel virtual machines on s390
+#
+# Copyright IBM Corp. 2008
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License (version 2 only)
+# as published by the Free Software Foundation.
+
+common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
+
+EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
+
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o
+obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/gaccess.h
===================================================================
--- /dev/null
+++ kvm/arch/s390/kvm/gaccess.h
@@ -0,0 +1,280 @@
+/*
+ * gaccess.h - access guest memory
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *    Author(s): Carsten Otte [EMAIL PROTECTED]
+ */
+
+#ifndef __KVM_S390_GACCESS_H
+#define __KVM_S390_GACCESS_H
+
+#include <linux/compiler.h>
+#include <linux/kvm_host.h>
+#include <asm/uaccess.h>
+
+static inline void __user *__guestaddr_to_user(struct kvm_vcpu *vcpu,
+					       u64 guestaddr)
+{
+	u64 prefix  = vcpu->arch.sie_block->prefix;
+	u64 origin  = vcpu->kvm->arch.guest_origin;
+	u64 memsize = vcpu->kvm->arch.guest_memsize;
+
+	if (guestaddr < 2 * PAGE_SIZE)
+		guestaddr += prefix;
+	else if ((guestaddr >= prefix) && (guestaddr < prefix + 2 * PAGE_SIZE))
+
[kvm-devel] [RFC/PATCH 04/15] preparation: split sysinfo defintions for kvm use
From: Christian Borntraeger [EMAIL PROTECTED]

drivers/s390/sysinfo.c uses the store system information instruction to query
the system for information about the machine, the LPAR, and additional
hypervisors. KVM has to implement the host part of this instruction.

To avoid code duplication, this patch splits the common definitions from
sysinfo.c into a separate header file, include/asm-s390/sysinfo.h, for KVM use.

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 drivers/s390/sysinfo.c     |  100
 include/asm-s390/sysinfo.h |  112
 2 files changed, 113 insertions(+), 99 deletions(-)

Index: kvm/drivers/s390/sysinfo.c
===
--- kvm.orig/drivers/s390/sysinfo.c
+++ kvm/drivers/s390/sysinfo.c
@@ -11,111 +11,13 @@
 #include <linux/init.h>
 #include <linux/delay.h>
 #include <asm/ebcdic.h>
+#include <asm/sysinfo.h>
 
 /* Sigh, math-emu. Don't ask. */
 #include <asm/sfp-util.h>
 #include <math-emu/soft-fp.h>
 #include <math-emu/single.h>
 
-struct sysinfo_1_1_1 {
-	char reserved_0[32];
-	char manufacturer[16];
-	char type[4];
-	char reserved_1[12];
-	char model_capacity[16];
-	char sequence[16];
-	char plant[4];
-	char model[16];
-};
-
-struct sysinfo_1_2_1 {
-	char reserved_0[80];
-	char sequence[16];
-	char plant[4];
-	char reserved_1[2];
-	unsigned short cpu_address;
-};
-
-struct sysinfo_1_2_2 {
-	char format;
-	char reserved_0[1];
-	unsigned short acc_offset;
-	char reserved_1[24];
-	unsigned int secondary_capability;
-	unsigned int capability;
-	unsigned short cpus_total;
-	unsigned short cpus_configured;
-	unsigned short cpus_standby;
-	unsigned short cpus_reserved;
-	unsigned short adjustment[0];
-};
-
-struct sysinfo_1_2_2_extension {
-	unsigned int alt_capability;
-	unsigned short alt_adjustment[0];
-};
-
-struct sysinfo_2_2_1 {
-	char reserved_0[80];
-	char sequence[16];
-	char plant[4];
-	unsigned short cpu_id;
-	unsigned short cpu_address;
-};
-
-struct sysinfo_2_2_2 {
-	char reserved_0[32];
-	unsigned short lpar_number;
-	char reserved_1;
-	unsigned char characteristics;
-	unsigned short cpus_total;
-	unsigned short cpus_configured;
-	unsigned short cpus_standby;
-	unsigned short cpus_reserved;
-	char name[8];
-	unsigned int caf;
-	char reserved_2[16];
-	unsigned short cpus_dedicated;
-	unsigned short cpus_shared;
-};
-
-#define LPAR_CHAR_DEDICATED	(1 << 7)
-#define LPAR_CHAR_SHARED	(1 << 6)
-#define LPAR_CHAR_LIMITED	(1 << 5)
-
-struct sysinfo_3_2_2 {
-	char reserved_0[31];
-	unsigned char count;
-	struct {
-		char reserved_0[4];
-		unsigned short cpus_total;
-		unsigned short cpus_configured;
-		unsigned short cpus_standby;
-		unsigned short cpus_reserved;
-		char name[8];
-		unsigned int caf;
-		char cpi[16];
-		char reserved_1[24];
-
-	} vm[8];
-};
-
-static inline int stsi(void *sysinfo, int fc, int sel1, int sel2)
-{
-	register int r0 asm("0") = (fc << 28) | sel1;
-	register int r1 asm("1") = sel2;
-
-	asm volatile(
-		"   stsi 0(%2)\n"
-		"0: jz   2f\n"
-		"1: lhi  %0,%3\n"
-		"2:\n"
-		EX_TABLE(0b,1b)
-		: "+d" (r0) : "d" (r1), "a" (sysinfo), "K" (-ENOSYS)
-		: "cc", "memory" );
-	return r0;
-}
-
 static inline int stsi_0(void)
 {
 	int rc = stsi (NULL, 0, 0, 0);
Index: kvm/include/asm-s390/sysinfo.h
===
--- /dev/null
+++ kvm/include/asm-s390/sysinfo.h
@@ -0,0 +1,112 @@
+/*
+ * definition for store system information stsi
+ *
+ * Copyright IBM Corp. 2001,2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *    Author(s): Ulrich Weigand [EMAIL PROTECTED]
+ *               Christian Borntraeger [EMAIL PROTECTED]
+ */
+
+struct sysinfo_1_1_1 {
+	char reserved_0[32];
+	char manufacturer[16];
+	char type[4];
+	char reserved_1[12];
+	char model_capacity[16];
+	char sequence[16];
+	char plant[4];
+	char model[16];
+};
+
+struct sysinfo_1_2_1 {
+	char reserved_0[80];
+	char sequence[16];
+	char plant[4];
+	char reserved_1[2];
+	unsigned short cpu_address;
+};
+
+struct sysinfo_1_2_2 {
+	char format;
+	char reserved_0[1];
+	unsigned short acc_offset;
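For readers following the stsi() wrapper that this patch moves into the header: it packs the function code into the top four bits of the first register operand and puts selector 1 in the low bits, while selector 2 goes into a second register. The sketch below only reproduces that packing in portable C; the name stsi_r0 is made up for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: mirrors the "(fc << 28) | sel1" packing used by the
 * stsi() wrapper in the patch. stsi_r0 is a hypothetical helper name.
 * Selector 2 is passed in a separate register and is not modelled here.
 */
static inline uint32_t stsi_r0(unsigned int fc, unsigned int sel1)
{
	return ((uint32_t)fc << 28) | sel1;
}
```

With this packing, function code 1 and selector 1 give 0x10000001. The struct names (sysinfo_1_1_1, sysinfo_2_2_2, and so on) appear to follow the same function code / selector 1 / selector 2 scheme.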
[kvm-devel] [RFC/PATCH 03/15] preparation: address of the 64bit extint parm in lowcore
From: Christian Borntraeger [EMAIL PROTECTED]

The address 0x11b8 is used by z/VM for pfault and diag 250 I/O to provide a
64 bit extint parameter. virtio uses the same address, so it's time to update
the lowcore structure.

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 include/asm-s390/lowcore.h |   15
 1 file changed, 10 insertions(+), 5 deletions(-)

Index: kvm/include/asm-s390/lowcore.h
===
--- kvm.orig/include/asm-s390/lowcore.h
+++ kvm/include/asm-s390/lowcore.h
@@ -380,27 +380,32 @@ struct _lowcore
 	/* whether the kernel died with panic() or not */
 	__u32        panic_magic;              /* 0xe00 */
 
-	__u8         pad13[0x1200-0xe04];      /* 0xe04 */
+	__u8         pad13[0x11b8-0xe04];      /* 0xe04 */
+
+	/* 64 bit extparam used for pfault, diag 250 etc  */
+	__u64        ext_params2;              /* 0x11B8 */
+
+	__u8         pad14[0x1200-0x11C0];     /* 0x11C0 */
 
 	/* System info area */
 	__u64        floating_pt_save_area[16]; /* 0x1200 */
 	__u64        gpregs_save_area[16];      /* 0x1280 */
 	__u32        st_status_fixed_logout[4]; /* 0x1300 */
-	__u8         pad14[0x1318-0x1310];      /* 0x1310 */
+	__u8         pad15[0x1318-0x1310];      /* 0x1310 */
 	__u32        prefixreg_save_area;       /* 0x1318 */
 	__u32        fpt_creg_save_area;        /* 0x131c */
-	__u8         pad15[0x1324-0x1320];      /* 0x1320 */
+	__u8         pad16[0x1324-0x1320];      /* 0x1320 */
 	__u32        tod_progreg_save_area;     /* 0x1324 */
 	__u32        cpu_timer_save_area[2];    /* 0x1328 */
 	__u32        clock_comp_save_area[2];   /* 0x1330 */
-	__u8         pad16[0x1340-0x1338];      /* 0x1338 */
+	__u8         pad17[0x1340-0x1338];      /* 0x1338 */
 	__u32        access_regs_save_area[16]; /* 0x1340 */
 	__u64        cregs_save_area[16];       /* 0x1380 */
 
 	/* align to the top of the prefix area */
 
-	__u8         pad17[0x2000-0x1400];      /* 0x1400 */
+	__u8         pad18[0x2000-0x1400];      /* 0x1400 */
 #endif /* !__s390x__ */
 } __attribute__((packed)); /* End structure*/
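The padding arithmetic in this hunk can be checked outside the kernel. The sketch below is a cut-down stand-in for the affected part of struct _lowcore, assuming everything below 0xe04 can be collapsed into one filler array; struct lowcore_sketch and its below[] member are hypothetical, only the offsets come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, cut-down model of the patched region of struct _lowcore.
 * Only the offsets quoted in the patch are reproduced; "below" replaces
 * all fields in front of pad13.
 */
struct lowcore_sketch {
	uint8_t  below[0xe04];               /* 0x0000: not modelled   */
	uint8_t  pad13[0x11b8 - 0xe04];      /* 0x0e04                 */
	uint64_t ext_params2;                /* 0x11b8: 64 bit extparm */
	uint8_t  pad14[0x1200 - 0x11c0];     /* 0x11c0                 */
	uint64_t floating_pt_save_area[16];  /* 0x1200                 */
} __attribute__((packed));

/* Compile-time checks: the new field must not move the system info area. */
_Static_assert(offsetof(struct lowcore_sketch, ext_params2) == 0x11b8,
	       "ext_params2 must sit at 0x11b8");
_Static_assert(offsetof(struct lowcore_sketch, floating_pt_save_area) == 0x1200,
	       "system info area must stay at 0x1200");
```

Shrinking pad13 and adding pad14 keeps every later field at its old offset, which is why the remaining pads are only renumbered.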
[kvm-devel] [RFC/PATCH 07/15] kvm-s390: interrupt subsystem, cpu timer, waitpsw
From: Carsten Otte [EMAIL PROTECTED]

This patch contains the s390 interrupt subsystem (similar to the in-kernel
apic), including timer interrupts (similar to the in-kernel pit) and enabled
wait (similar to the in-kernel hlt). To achieve that, this patch also
introduces intercept handling for instruction intercepts, and it implements
the load control instructions.

This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both the
vm file descriptors and the vcpu file descriptors. If this ioctl is issued
against a vm file descriptor, the interrupt is considered floating. Floating
interrupts may be delivered to any virtual cpu in the configuration.

The following interrupts are supported:
SIGP STOP       - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
                  (stopped) remote cpu
INT EMERGENCY   - interprocessor interrupt, usually used to signal
                  need_resched and for smp_call_function() in the guest
PROGRAM INT     - exception during program execution such as page fault,
                  illegal instruction and friends
RESTART         - interprocessor signal that starts a stopped cpu
INT VIRTIO      - floating interrupt for virtio signalisation
INT SERVICE     - floating interrupt for signalisations from the system
                  service processor

struct kvm_s390_interrupt, which is submitted as the ioctl parameter when
injecting an interrupt, also carries parameter data for interrupts along with
the interrupt type. Interrupts on s390 usually have a state that represents
the current operation, or identifies which device has caused the interruption.

kvm_s390_handle_wait() handles waitpsw in two flavors: in case of a disabled
wait (that is, disabled for interrupts), we exit to userspace. In case of an
enabled wait, we set up a timer that equals the cpu clock comparator value and
sleep on a wait queue.
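The split between floating and cpu-local interrupts can be sketched as a small acceptance check. This is not the kernel code: the enum values are placeholders rather than the real constants from include/linux/kvm.h, and the sketch assumes that every non-floating type is acceptable on a vcpu file descriptor. The -EINVAL behaviour is taken from the API documentation patch later in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder type codes; the real values live in include/linux/kvm.h. */
enum sk_int_type {
	SK_SIGP_STOP,
	SK_SIGP_SET_PREFIX,
	SK_INT_EMERGENCY,
	SK_PROGRAM_INT,
	SK_RESTART,
	SK_INT_VIRTIO,
	SK_INT_SERVICE,
};

/* Which kind of file descriptor the ioctl was issued against. */
enum sk_fd_kind { SK_FD_VM, SK_FD_VCPU };

/* Per the list above, only virtio and service interrupts are floating. */
static inline bool sk_is_floating(enum sk_int_type type)
{
	return type == SK_INT_VIRTIO || type == SK_INT_SERVICE;
}

/* Returns 0 if the injection would be accepted, -EINVAL otherwise. */
static inline int sk_check_inject(enum sk_fd_kind fd, enum sk_int_type type)
{
	if (fd == SK_FD_VM)
		return sk_is_floating(type) ? 0 : -EINVAL;
	/* Assumption: anything that is not floating is valid per cpu. */
	return sk_is_floating(type) ? -EINVAL : 0;
}
```

A userspace launcher would therefore send virtio notifications through the vm file descriptor and, for example, SIGP STOP through the file descriptor of the target vcpu.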
Acked-by: Martin Schwidefsky [EMAIL PROTECTED] Signed-off-by: Carsten Otte [EMAIL PROTECTED] --- arch/s390/kvm/Makefile |2 arch/s390/kvm/intercept.c | 123 + arch/s390/kvm/interrupt.c | 583 arch/s390/kvm/kvm-s390.c| 48 +++ arch/s390/kvm/kvm-s390.h| 15 + include/asm-s390/kvm_host.h | 75 + include/linux/kvm.h | 17 + 7 files changed, 860 insertions(+), 3 deletions(-) Index: kvm/arch/s390/kvm/Makefile === --- kvm.orig/arch/s390/kvm/Makefile +++ kvm/arch/s390/kvm/Makefile @@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm -kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o +kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o obj-$(CONFIG_KVM) += kvm.o Index: kvm/arch/s390/kvm/intercept.c === --- kvm.orig/arch/s390/kvm/intercept.c +++ kvm/arch/s390/kvm/intercept.c @@ -18,6 +18,91 @@ #include asm/kvm_host.h #include kvm-s390.h +#include gaccess.h + +static int handle_lctg(struct kvm_vcpu *vcpu) +{ + int reg1 = (vcpu-arch.sie_block-ipa 0x00f0) 4; + int reg3 = vcpu-arch.sie_block-ipa 0x000f; + int base2 = vcpu-arch.sie_block-ipb 28; + int disp2 = ((vcpu-arch.sie_block-ipb 0x0fff) 16) + + ((vcpu-arch.sie_block-ipb 0xff00) 4); + u64 useraddr; + int reg, rc; + + vcpu-stat.instruction_lctg++; + if ((vcpu-arch.sie_block-ipb 0xff) != 0x2f) + return -ENOTSUPP; + + useraddr = disp2; + if (base2) + useraddr += vcpu-arch.guest_gprs[base2]; + + reg = reg1; + + VCPU_EVENT(vcpu, 5, lctg r1:%x, r3:%x,b2:%x,d2:%x, reg1, reg3, base2, + disp2); + + do { + rc = get_guest_u64(vcpu, useraddr, + vcpu-arch.sie_block-gcr[reg]); + if (rc == -EFAULT) { + kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); + break; + } + useraddr += 8; + if (reg == reg3) + break; + reg = reg + 1; + if (reg 15) + reg = 0; + } while (1); + return 0; +} + +static int handle_lctl(struct kvm_vcpu *vcpu) +{ + int reg1 = (vcpu-arch.sie_block-ipa 0x00f0) 4; + int reg3 = vcpu-arch.sie_block-ipa 0x000f; + int base2 = vcpu-arch.sie_block-ipb 28; + int 
disp2 = ((vcpu-arch.sie_block-ipb 0x0fff) 16); + u64 useraddr; + u32 val = 0; + int reg, rc; + + vcpu-stat.instruction_lctl++; + + useraddr = disp2; + if (base2) + useraddr += vcpu-arch.guest_gprs[base2]; + + reg = reg1; + + VCPU_EVENT(vcpu, 5, lctl r1:%x, r3:%x,b2:%x,d2:%x, reg1, reg3, base2, + disp2); + +
[kvm-devel] [RFC/PATCH 09/15] kvm-s390: interprocessor communication via sigp
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch introduces in-kernel handling of _some_ sigp interprocessor
signals (similar to ipi). kvm_s390_handle_sigp() decodes the sigp instruction
and calls individual handlers depending on the operation requested:
- sigp sense tries to retrieve information such as existence or running
  state of the remote cpu
- sigp emergency sends an external interrupt to the remote cpu
- sigp stop stops a remote cpu
- sigp stop store status stops a remote cpu, and stores its entire internal
  state to the cpu's lowcore
- sigp set arch sets the architecture mode of the remote cpu. Setting to
  ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit)
  is denied, all others are passed to userland
- sigp set prefix sets the prefix register of a remote cpu

To implement this, the stop intercept indication is deliberately reused: a
set of action bits defines what to do once a cpu gets stopped:
ACTION_STOP_ON_STOP  really stops the cpu when a stop intercept is recognized
ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept
                     is recognized

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/kvm/Makefile      |    2
 arch/s390/kvm/intercept.c   |   22 +++
 arch/s390/kvm/kvm-s390.c    |    7 +
 arch/s390/kvm/kvm-s390.h    |    7 +
 arch/s390/kvm/sigp.c        |  289
 include/asm-s390/kvm_host.h |   12 +
 6 files changed, 336 insertions(+), 3 deletions(-)

Index: kvm/arch/s390/kvm/Makefile
===
--- kvm.orig/arch/s390/kvm/Makefile
+++ kvm/arch/s390/kvm/Makefile
@@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/
 
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
 
-kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o sigp.o
 obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/intercept.c
===
---
kvm.orig/arch/s390/kvm/intercept.c +++ kvm/arch/s390/kvm/intercept.c @@ -100,6 +100,7 @@ static int handle_lctl(struct kvm_vcpu * } static intercept_handler_t instruction_handlers[256] = { + [0xae] = kvm_s390_handle_sigp, [0xb2] = kvm_s390_handle_priv, [0xb7] = handle_lctl, [0xeb] = handle_lctg, @@ -122,10 +123,27 @@ static int handle_noop(struct kvm_vcpu * static int handle_stop(struct kvm_vcpu *vcpu) { + int rc; + vcpu-stat.exit_stop_request++; - VCPU_EVENT(vcpu, 3, %s, cpu stopped); atomic_clear_mask(CPUSTAT_RUNNING, vcpu-arch.sie_block-cpuflags); - return -ENOTSUPP; + spin_lock_bh(vcpu-arch.local_int.lock); + if (vcpu-arch.local_int.action_bits ACTION_STORE_ON_STOP) { + vcpu-arch.local_int.action_bits = ~ACTION_STORE_ON_STOP; + rc = __kvm_s390_vcpu_store_status(vcpu, + KVM_S390_STORE_STATUS_NOADDR); + if (rc = 0) + rc = -ENOTSUPP; + } + + if (vcpu-arch.local_int.action_bits ACTION_STOP_ON_STOP) { + vcpu-arch.local_int.action_bits = ~ACTION_STOP_ON_STOP; + VCPU_EVENT(vcpu, 3, %s, cpu stopped); + rc = -ENOTSUPP; + } else + rc = 0; + spin_unlock_bh(vcpu-arch.local_int.lock); + return rc; } static int handle_validity(struct kvm_vcpu *vcpu) Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -57,6 +57,12 @@ struct kvm_stats_debugfs_item debugfs_en { instruction_chsc, VCPU_STAT(instruction_chsc) }, { instruction_stsi, VCPU_STAT(instruction_stsi) }, { instruction_stfl, VCPU_STAT(instruction_stfl) }, + { instruction_sigp_sense, VCPU_STAT(instruction_sigp_sense) }, + { instruction_sigp_emergency, VCPU_STAT(instruction_sigp_emergency) }, + { instruction_sigp_stop, VCPU_STAT(instruction_sigp_stop) }, + { instruction_sigp_set_arch, VCPU_STAT(instruction_sigp_arch) }, + { instruction_sigp_set_prefix, VCPU_STAT(instruction_sigp_prefix) }, + { instruction_sigp_restart, VCPU_STAT(instruction_sigp_restart) }, { NULL } }; @@ -290,6 +296,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(st 
spin_lock_bh(kvm-arch.float_int.lock); kvm-arch.float_int.local_int[id] = vcpu-arch.local_int; init_waitqueue_head(vcpu-arch.local_int.wq); + vcpu-arch.local_int.cpuflags = vcpu-arch.sie_block-cpuflags; spin_unlock_bh(kvm-arch.float_int.lock); rc = kvm_vcpu_init(vcpu, kvm, id); Index:
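The action-bit scheme described in the commit message can be sketched independently of the kernel. The following is a minimal model of that description, not a copy of handle_stop(): the bit values, the struct, and the function name are invented for illustration, and locking is left out.

```c
#include <assert.h>

/* Hypothetical bit values; the patch defines the real ones in kvm_host.h. */
#define SK_ACTION_STORE_ON_STOP 1u
#define SK_ACTION_STOP_ON_STOP  2u

/* What the stop intercept ended up doing. */
struct sk_stop_result {
	int stored;   /* cpu status would be stored to lowcore */
	int stopped;  /* cpu would really be stopped           */
};

/*
 * Consume the pending action bits on a stop intercept. Each bit is
 * cleared once its action has been carried out, so a later intercept
 * without new requests does nothing.
 */
static inline struct sk_stop_result sk_handle_stop(unsigned int *action_bits)
{
	struct sk_stop_result r = { 0, 0 };

	if (*action_bits & SK_ACTION_STORE_ON_STOP) {
		*action_bits &= ~SK_ACTION_STORE_ON_STOP;
		r.stored = 1;
	}
	if (*action_bits & SK_ACTION_STOP_ON_STOP) {
		*action_bits &= ~SK_ACTION_STOP_ON_STOP;
		r.stopped = 1;
	}
	return r;
}
```

In this model, sigp stop would set only the stop bit, while sigp stop store status would set both bits before the stop intercept is triggered.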
[kvm-devel] [RFC/PATCH 10/15] kvm-s390: intercepts for diagnose instructions
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch introduces interpretation of some diagnose instruction intercepts.
Diagnose is our classic architected way of doing a hypercall. This patch
features the following diagnose codes:
- vm storage size, that tells the guest about its memory layout
- time slice end, which is used by the guest to indicate that it waits
  for a lock and thus cannot use up its time slice in a useful way
- ipl functions, which a guest can use to reset and reboot itself

In order to implement ipl functions, we also introduce an exit reason that
causes userspace to perform various resets on the virtual machine. All resets
are described in the principles of operation book, except KVM_S390_RESET_IPL
which causes a reboot of the machine.

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/kvm/Makefile      |    2 -
 arch/s390/kvm/diag.c        |   67
 arch/s390/kvm/intercept.c   |    1
 arch/s390/kvm/kvm-s390.c    |    1
 arch/s390/kvm/kvm-s390.h    |    2 +
 include/asm-s390/kvm_host.h |    5 ++-
 include/linux/kvm.h         |    8 +
 7 files changed, 84 insertions(+), 2 deletions(-)

Index: kvm/arch/s390/kvm/Makefile
===
--- kvm.orig/arch/s390/kvm/Makefile
+++ kvm/arch/s390/kvm/Makefile
@@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/
 
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
 
-kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o sigp.o
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o sigp.o diag.o
 obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/diag.c
===
--- /dev/null
+++ kvm/arch/s390/kvm/diag.c
@@ -0,0 +1,67 @@
+/*
+ *  diag.c - handling diagnose instructions
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ * + *Author(s): Carsten Otte [EMAIL PROTECTED] + * Christian Borntraeger [EMAIL PROTECTED] + */ + +#include linux/kvm.h +#include linux/kvm_host.h +#include kvm-s390.h + +static int __diag_time_slice_end(struct kvm_vcpu *vcpu) +{ + VCPU_EVENT(vcpu, 5, %s, diag time slice end); + vcpu-stat.diagnose_44++; + vcpu_put(vcpu); + schedule(); + vcpu_load(vcpu); + return 0; +} + +static int __diag_ipl_functions(struct kvm_vcpu *vcpu) +{ + unsigned int reg = vcpu-arch.sie_block-ipa 0xf; + unsigned long subcode = vcpu-arch.guest_gprs[reg] 0x; + + VCPU_EVENT(vcpu, 5, diag ipl functions, subcode %lx, subcode); + switch (subcode) { + case 3: + vcpu-run-s390_reset_flags = KVM_S390_RESET_CLEAR; + break; + case 4: + vcpu-run-s390_reset_flags = 0; + break; + default: + return -ENOTSUPP; + } + + atomic_clear_mask(CPUSTAT_RUNNING, vcpu-arch.sie_block-cpuflags); + vcpu-run-s390_reset_flags |= KVM_S390_RESET_SUBSYSTEM; + vcpu-run-s390_reset_flags |= KVM_S390_RESET_IPL; + vcpu-run-s390_reset_flags |= KVM_S390_RESET_CPU_INIT; + vcpu-run-exit_reason = KVM_EXIT_S390_RESET; + VCPU_EVENT(vcpu, 3, requesting userspace resets %lx, + vcpu-run-s390_reset_flags); + return -EREMOTE; +} + +int kvm_s390_handle_diag(struct kvm_vcpu *vcpu) +{ + int code = (vcpu-arch.sie_block-ipb 0xfff) 16; + + switch (code) { + case 0x44: + return __diag_time_slice_end(vcpu); + case 0x308: + return __diag_ipl_functions(vcpu); + default: + return -ENOTSUPP; + } +} Index: kvm/arch/s390/kvm/intercept.c === --- kvm.orig/arch/s390/kvm/intercept.c +++ kvm/arch/s390/kvm/intercept.c @@ -100,6 +100,7 @@ static int handle_lctl(struct kvm_vcpu * } static intercept_handler_t instruction_handlers[256] = { + [0x83] = kvm_s390_handle_diag, [0xae] = kvm_s390_handle_sigp, [0xb2] = kvm_s390_handle_priv, [0xb7] = handle_lctl, Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -63,6 +63,7 @@ struct kvm_stats_debugfs_item debugfs_en { instruction_sigp_set_arch, 
VCPU_STAT(instruction_sigp_arch) }, { instruction_sigp_set_prefix, VCPU_STAT(instruction_sigp_prefix) }, { instruction_sigp_restart, VCPU_STAT(instruction_sigp_restart) }, + { diagnose_44, VCPU_STAT(diagnose_44) }, { NULL } }; Index:
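The way __diag_ipl_functions() builds the reset request for userspace can be followed in isolation. The sketch below assumes made-up flag values (the real KVM_S390_RESET_* constants are defined in include/linux/kvm.h by this patch), uses the userspace errno EOPNOTSUPP where the kernel code returns -ENOTSUPP, and returns 0 where the kernel returns -EREMOTE to force the exit to userspace.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values, for illustration only. */
#define SK_RESET_CLEAR     0x02u
#define SK_RESET_SUBSYSTEM 0x04u
#define SK_RESET_CPU_INIT  0x08u
#define SK_RESET_IPL       0x10u

/*
 * Translate a diag 0x308 subcode into the set of resets userspace is
 * asked to perform. Subcode 3 additionally requests a clear reset;
 * subcode 4 does not. Every other subcode is left to userspace.
 */
static inline int sk_ipl_reset_flags(unsigned long subcode, unsigned int *flags)
{
	switch (subcode) {
	case 3:
		*flags = SK_RESET_CLEAR;
		break;
	case 4:
		*flags = 0;
		break;
	default:
		return -EOPNOTSUPP;
	}
	*flags |= SK_RESET_SUBSYSTEM | SK_RESET_IPL | SK_RESET_CPU_INIT;
	return 0;
}
```

Userspace would then look at the exit reason KVM_EXIT_S390_RESET and perform each reset whose flag is set.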
[kvm-devel] [RFC/PATCH 11/15] kvm-s390: add kvm to kconfig on s390
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch adds the virtualization submenu and the kvm option to the kernel
config. It also defines HAVE_KVM for 64bit kernels.

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/Kconfig     |    3 +++
 arch/s390/kvm/Kconfig |   43 +++
 2 files changed, 46 insertions(+)

Index: kvm/arch/s390/Kconfig
===
--- kvm.orig/arch/s390/Kconfig
+++ kvm/arch/s390/Kconfig
@@ -66,6 +66,7 @@ config S390
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
+	select HAVE_KVM if 64BIT
 
 source "init/Kconfig"
 
@@ -553,3 +554,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/s390/kvm/Kconfig"
Index: kvm/arch/s390/kvm/Kconfig
===
--- /dev/null
+++ kvm/arch/s390/kvm/Kconfig
@@ -0,0 +1,43 @@
+#
+# KVM configuration
+#
+config HAVE_KVM
+	bool
+
+menuconfig VIRTUALIZATION
+	bool "Virtualization"
+	default y
+	---help---
+	  Say Y here to get to see options for using your Linux host to run other
+	  operating systems inside virtual machines (guests).
+	  This option alone does not add any kernel code.
+
+	  If you say N, all options in this submenu will be skipped and disabled.
+
+if VIRTUALIZATION
+
+config KVM
+	tristate "Kernel-based Virtual Machine (KVM) support"
+	depends on HAVE_KVM && EXPERIMENTAL
+	select PREEMPT_NOTIFIERS
+	select ANON_INODES
+	select S390_SWITCH_AMODE
+	select PREEMPT
+	---help---
+	  Support hosting paravirtualized guest machines using the SIE
+	  virtualization capability on the mainframe. This should work
+	  on any 64bit machine.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  To compile this as a module, choose M here: the module
+	  will be called kvm.
+
+	  If unsure, say N.
+
+# OK, it's a little counter-intuitive to do this, but it puts it neatly under
+# the virtualization menu.
+source drivers/virtio/Kconfig
+
+endif # VIRTUALIZATION
[kvm-devel] [RFC/PATCH 12/15] kvm-s390: API documentation
From: Carsten Otte [EMAIL PROTECTED]

This patch adds Documentation/s390/kvm.txt, which describes specifics of kvm's
user interface that are unique to the s390 architecture.

Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 Documentation/s390/kvm.txt |  125
 1 file changed, 125 insertions(+)

Index: kvm/Documentation/s390/kvm.txt
===
--- /dev/null
+++ kvm/Documentation/s390/kvm.txt
@@ -0,0 +1,125 @@
+*** BIG FAT WARNING ***
+The kvm module is currently in EXPERIMENTAL state for s390. This means that
+the interface to the module is not yet considered to remain stable. Thus, be
+prepared that we keep breaking your userspace application and guest
+compatibility over and over again until we feel happy with the result. Make sure
+your guest kernel, your host kernel, and your userspace launcher are in a
+consistent state.
+
+This Documentation describes the unique ioctl calls to /dev/kvm, the resulting
+kvm-vm file descriptors, and the kvm-vcpu file descriptors that differ from x86.
+
+1. ioctl calls to /dev/kvm
+KVM does support the following ioctls on s390 that are common with other
+architectures and do behave the same:
+KVM_GET_API_VERSION
+KVM_CREATE_VM		(*) see note
+KVM_CHECK_EXTENSION
+KVM_GET_VCPU_MMAP_SIZE
+
+Notes:
+* KVM_CREATE_VM may fail on s390, if the calling process has multiple
+threads and has not called KVM_S390_ENABLE_SIE before.
+
+In addition, on s390 the following architecture specific ioctls are supported:
+ioctl:		KVM_S390_ENABLE_SIE
+args:		none
+see also:	include/linux/kvm.h
+This call causes the kernel to switch on PGSTE in the user page table. This
+operation is needed in order to run a virtual machine, and it requires the
+calling process to be single-threaded. Note that the first call to KVM_CREATE_VM
+will implicitly try to switch on PGSTE if the user process has not called
+KVM_S390_ENABLE_SIE before. User processes that want to launch multiple threads
+before creating a virtual machine have to call KVM_S390_ENABLE_SIE, or will
+observe an error calling KVM_CREATE_VM. Switching on PGSTE is a one-time
+operation, is not reversible, and will persist over the entire lifetime of
+the calling process. It does not have any user-visible effect other than a small
+performance penalty.
+
+2. ioctl calls to the kvm-vm file descriptor
+KVM does support the following ioctls on s390 that are common with other
+architectures and do behave the same:
+KVM_CREATE_VCPU
+KVM_SET_USER_MEMORY_REGION      (*) see note
+KVM_GET_DIRTY_LOG		(**) see note
+
+Notes:
+*  kvm does only allow exactly one memory slot on s390, which has to start
+   at guest absolute address zero and at a user address that is aligned on any
+   page boundary. This hardware limitation allows us to have a few unique
+   optimizations. The memory slot doesn't actually have to be filled
+   with memory, it may contain sparse holes. That said, with different
+   user memory layout this does still allow a large flexibility when
+   doing the guest memory setup.
+** KVM_GET_DIRTY_LOG doesn't work properly yet. The user will receive an empty
+log. This ioctl call is only needed for guest migration, and we intend to
+implement this one in the future.
+
+In addition, on s390 the following architecture specific ioctls for the kvm-vm
+file descriptor are supported:
+ioctl:		KVM_S390_INTERRUPT
+args:		struct kvm_s390_interrupt *
+see also:	include/linux/kvm.h
+This ioctl is used to submit a floating interrupt for a virtual machine.
+Floating interrupts may be delivered to any virtual cpu in the configuration.
+Only some interrupt types defined in include/linux/kvm.h make sense when
+submitted as floating interrupt. The following interrupts are not considered
+to be useful as floating interrupt, and a call to inject them will result in
+-EINVAL error code: program interrupts, and interprocessor signals. Valid
+floating interrupts are:
+KVM_S390_INT_VIRTIO
+KVM_S390_INT_SERVICE
+
+3. ioctl calls to the kvm-vcpu file descriptor
+KVM does support the following ioctls on s390 that are common with other
+architectures and do behave the same:
+KVM_RUN
+KVM_GET_REGS
+KVM_SET_REGS
+KVM_GET_SREGS
+KVM_SET_SREGS
+KVM_GET_FPU
+KVM_SET_FPU
+
+In addition, on s390 the following architecture specific ioctls for the
+kvm-vcpu file descriptor are supported:
+ioctl:		KVM_S390_INTERRUPT
+args:		struct kvm_s390_interrupt *
+see also:	include/linux/kvm.h
+This ioctl is used to submit an interrupt for a specific virtual cpu.
+Only some interrupt types defined in include/linux/kvm.h make sense when
+submitted for a specific cpu. The following interrupts are not considered
+to be useful, and a call to inject them will result in -EINVAL error code:
+service processor calls, and virtio interrupts. Valid interrupt types are:
+KVM_S390_PROGRAM_INT
+KVM_S390_SIGP_STOP
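The memory slot rule from section 2 of the document (one slot, starting at guest absolute address zero, backed by a page-aligned user address) can be expressed as a small userspace pre-check. This is a sketch under assumptions: struct sk_memslot is a simplified stand-in loosely modelled on the argument of KVM_SET_USER_MEMORY_REGION, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Simplified stand-in for the memory region description. */
struct sk_memslot {
	uint64_t guest_phys_addr;  /* where the slot starts in the guest   */
	uint64_t memory_size;      /* size in bytes, may cover sparse holes */
	uint64_t userspace_addr;   /* backing address in the launcher       */
};

/* True if the slot satisfies the s390 constraints quoted above. */
static inline bool sk_slot_ok(const struct sk_memslot *m)
{
	return m->guest_phys_addr == 0 &&
	       (m->userspace_addr % SK_PAGE_SIZE) == 0;
}
```

A launcher could run such a check before issuing the ioctl, so that a misplaced mapping is reported with a clear message rather than as a bare ioctl failure.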
[kvm-devel] [RFC/PATCH 13/15] kvm-s390: update maintainers
From: Christian Borntraeger [EMAIL PROTECTED]

This patch adds an entry for kvm on s390 to the MAINTAINERS file :-). We
intend to push all patches regarding this via Avi's kvm.git.

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 MAINTAINERS |   10 ++
 1 file changed, 10 insertions(+)

Index: kvm/MAINTAINERS
===
--- kvm.orig/MAINTAINERS
+++ kvm/MAINTAINERS
@@ -2296,6 +2296,16 @@ L:	[EMAIL PROTECTED]
 W:	kvm.sourceforge.net
 S:	Supported
 
+KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
+P:	Carsten Otte
+M:	[EMAIL PROTECTED]
+P:	Christian Borntraeger
+M:	[EMAIL PROTECTED]
+M:	[EMAIL PROTECTED]
+L:	[EMAIL PROTECTED]
+W:	http://www.ibm.com/developerworks/linux/linux390/
+S:	Supported
+
 KEXEC
 P:	Eric Biederman
 M:	[EMAIL PROTECTED]
[kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm
From: Christian Borntraeger [EMAIL PROTECTED]
From: Carsten Otte [EMAIL PROTECTED]

This patch adds functionality to detect if the kernel runs under the KVM
hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This allows
drivers to skip device detection if the system runs non-virtualized. We also
define a preferred console to avoid having the ttyS0, which is a line mode
only console.

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/Kconfig        |    7 +++
 arch/s390/kernel/early.c |    4
 arch/s390/kernel/setup.c |   10 +++---
 include/asm-s390/setup.h |    1 +
 4 files changed, 19 insertions(+), 3 deletions(-)

Index: kvm/arch/s390/Kconfig
===
--- kvm.orig/arch/s390/Kconfig
+++ kvm/arch/s390/Kconfig
@@ -533,6 +533,13 @@ config ZFCPDUMP
 	  Select this option if you want to build an zfcpdump enabled kernel.
 	  Refer to <file:Documentation/s390/zfcpdump.txt> for more details on this.
 
+config S390_GUEST
+bool	"s390 guest support (EXPERIMENTAL)"
+	depends on 64BIT && EXPERIMENTAL
+	select VIRTIO
+	select VIRTIO_RING
+	help
+	  Select this option if you want to run the kernel under s390 linux
 endmenu
 
 source "net/Kconfig"
Index: kvm/arch/s390/kernel/early.c
===
--- kvm.orig/arch/s390/kernel/early.c
+++ kvm/arch/s390/kernel/early.c
@@ -143,6 +143,10 @@ static noinline __init void detect_machi
 	/* Running on a P/390 ? */
 	if (cpuinfo->cpu_id.machine == 0x7490)
 		machine_flags |= 4;
+
+	/* Running under KVM ? */
+	if (cpuinfo->cpu_id.version == 0xfe)
+		machine_flags |= 64;
 }
 
 #ifdef CONFIG_64BIT
Index: kvm/arch/s390/kernel/setup.c
===
--- kvm.orig/arch/s390/kernel/setup.c
+++ kvm/arch/s390/kernel/setup.c
@@ -793,9 +793,13 @@ setup_arch(char **cmdline_p)
 	       "This machine has an IEEE fpu\n" :
 	       "This machine has no IEEE fpu\n");
 #else /* CONFIG_64BIT */
-	printk((MACHINE_IS_VM) ?
-	       "We are running under VM (64 bit mode)\n" :
-	       "We are running native (64 bit mode)\n");
+	if (MACHINE_IS_VM)
+		printk("We are running under VM (64 bit mode)\n");
+	else if (MACHINE_IS_KVM) {
+		printk("We are running under KVM (64 bit mode)\n");
+		add_preferred_console("ttyS", 1, NULL);
+	} else
+		printk("We are running native (64 bit mode)\n");
 #endif /* CONFIG_64BIT */
 
 	/* Save unparsed command line copy for /proc/cmdline */
Index: kvm/include/asm-s390/setup.h
===
--- kvm.orig/include/asm-s390/setup.h
+++ kvm/include/asm-s390/setup.h
@@ -62,6 +62,7 @@ extern unsigned long machine_flags;
 
 #define MACHINE_IS_VM		(machine_flags & 1)
 #define MACHINE_IS_P390		(machine_flags & 4)
 #define MACHINE_HAS_MVPG	(machine_flags & 16)
+#define MACHINE_IS_KVM		(machine_flags & 64)
 #define MACHINE_HAS_IDTE	(machine_flags & 128)
 #define MACHINE_HAS_DIAG9C	(machine_flags & 256)
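The detection logic boils down to a cpu id check that sets one bit in machine_flags, plus a macro that tests it. The sketch below replays that in plain C so the flag values can be followed; the flag bits (1, 4, 64) and the ids (0x7490, 0xfe) are taken from the patch, while the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* Flag bits as used by the MACHINE_IS_* macros in the patch. */
#define SK_FLAG_VM   1ul
#define SK_FLAG_P390 4ul
#define SK_FLAG_KVM  64ul

/* Hypothetical, reduced view of the cpu id fields that are checked. */
struct sk_cpuid {
	unsigned int version;
	unsigned int machine;
};

/* Add the P/390 and KVM bits to an existing flag word. */
static inline unsigned long sk_detect(const struct sk_cpuid *id,
				      unsigned long flags)
{
	if (id->machine == 0x7490)
		flags |= SK_FLAG_P390;
	if (id->version == 0xfe)
		flags |= SK_FLAG_KVM;
	return flags;
}

/* Same precedence as the new if/else chain in setup_arch(). */
static inline const char *sk_banner(unsigned long flags)
{
	if (flags & SK_FLAG_VM)
		return "under VM";
	if (flags & SK_FLAG_KVM)
		return "under KVM";
	return "native";
}
```

Because z/VM is tested first, a flag word with both the VM and KVM bits set would still be reported as running under VM.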
[kvm-devel] [RFC/PATCH 15/15] guest: virtio device support, and kvm hypercalls
From: Christian Borntraeger [EMAIL PROTECTED]

This patch implements kvm guest kernel support for paravirtualized devices
and contains two parts:
o a basic virtio stub using virtio_ring and external interrupts and hypercalls
o full hypercall implementation in kvm_para.h

Currently we don't have PCI on s390. Making virtio_pci usable for s390 seems
more complicated than providing our own stub. This virtio stub is similar to
the lguest one: the memory for the descriptors and the device detection is
provided via additional mapped memory on top of the guest storage. We use an
external interrupt with extint code 1237 for host-guest notification.

The hypercall definition uses the diag instruction for issuing a hypercall.
The parameters are written in R2-R7, the hypercall number is written in R1.
This is similar to the system call ABI (svc) which can use R1 for the number
and R2-R6 for the parameters.

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 drivers/s390/Makefile         |    2
 drivers/s390/kvm/Makefile     |    9 +
 drivers/s390/kvm/kvm_virtio.c |  326 ++
 drivers/s390/kvm/kvm_virtio.h |   47 ++
 include/asm-s390/kvm_para.h   |  124 +++
 5 files changed, 505 insertions(+), 3 deletions(-)

Index: kvm/drivers/s390/Makefile
===
--- kvm.orig/drivers/s390/Makefile
+++ kvm/drivers/s390/Makefile
@@ -5,7 +5,7 @@
 CFLAGS_sysinfo.o += -Iinclude/math-emu -Iarch/s390/math-emu -w
 
 obj-y += s390mach.o sysinfo.o s390_rdev.o
-obj-y += cio/ block/ char/ crypto/ net/ scsi/
+obj-y += cio/ block/ char/ crypto/ net/ scsi/ kvm/
 
 drivers-y += drivers/s390/built-in.o
 
Index: kvm/drivers/s390/kvm/Makefile
===
--- /dev/null
+++ kvm/drivers/s390/kvm/Makefile
@@ -0,0 +1,9 @@
+# Makefile for kvm guest drivers on s390
+#
+# Copyright IBM Corp.
2008 +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License (version 2 only) +# as published by the Free Software Foundation. + +obj-$(CONFIG_VIRTIO) += kvm_virtio.o Index: kvm/drivers/s390/kvm/kvm_virtio.c === --- /dev/null +++ kvm/drivers/s390/kvm/kvm_virtio.c @@ -0,0 +1,326 @@ +/* + * kvm_virtio.c - virtio for kvm on s390 + * + * Copyright IBM Corp. 2008 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2 only) + * as published by the Free Software Foundation. + * + *Author(s): Christian Borntraeger [EMAIL PROTECTED] + */ + +#include linux/init.h +#include linux/bootmem.h +#include linux/err.h +#include linux/virtio.h +#include linux/virtio_config.h +#include linux/interrupt.h +#include linux/virtio_ring.h +#include asm/io.h +#include asm/kvm_para.h +#include asm/setup.h +#include asm/s390_ext.h + +#include kvm_virtio.h + +/* + * The pointer to our (page) of device descriptions. + */ +static void *kvm_devices; + +/* + * Unique numbering for kvm devices. + */ +static unsigned int dev_index; + +struct kvm_device { + struct virtio_device vdev; + struct kvm_device_desc *desc; +}; + +#define to_kvmdev(vd) container_of(vd, struct kvm_device, vdev) + +/* + * memory layout: + * - kvm_device_descriptor + *struct kvm_device_desc + * - configuration + *struct kvm_vqconfig + * - feature bits + * - config space + */ +static struct kvm_vqconfig *kvm_vq_config(const struct kvm_device_desc *desc) +{ + return (struct kvm_vqconfig *)(desc + 1); +} + +static u8 *kvm_vq_features(const struct kvm_device_desc *desc) +{ + return (u8 *)(kvm_vq_config(desc) + desc-num_vq); +} + +static u8 *kvm_vq_configspace(const struct kvm_device_desc *desc) +{ + return kvm_vq_features(desc) + desc-feature_len * 2; +} + +/* + * The total size of the config page used by this device (incl. 
desc)
+ */
+static unsigned desc_size(const struct kvm_device_desc *desc)
+{
+	return sizeof(*desc)
+		+ desc->num_vq * sizeof(struct kvm_vqconfig)
+		+ desc->feature_len * 2
+		+ desc->config_len;
+}
+
+/*
+ * This tests (and acknowledges) a feature bit.
+ */
+static bool kvm_feature(struct virtio_device *vdev, unsigned fbit)
+{
+	struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+	u8 *features;
+
+	if (fbit / 8 > desc->feature_len)
+		return false;
+
+	features = kvm_vq_features(desc);
+	if (!(features[fbit / 8] & (1 << (fbit % 8))))
+		return false;
+
+	/*
+	 * We set the matching bit in the other half of the bitmap to tell the
+	 * Host we want to use this feature.
+	 */
+	features[desc->feature_len + fbit / 8] |= (1 << (fbit % 8));
+	return true;
+}
+
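The feature negotiation in kvm_feature() relies on a bitmap with two halves: the first feature_len bytes hold what the host offers, the second feature_len bytes hold what the guest has acknowledged. A minimal userspace sketch of that idea follows. The names demo_desc and demo_feature are hypothetical, the fixed 8-byte array is an assumption made to keep the example self-contained, and the bounds check here uses `>=` as a deliberately conservative choice for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device descriptor: feature_len bytes of
 * host-offered bits, followed by feature_len bytes of guest-acked bits. */
struct demo_desc {
	uint8_t feature_len;	/* must be <= 4 for this fixed-size sketch */
	uint8_t features[8];	/* offered half, then acknowledged half */
};

/* Test a feature bit offered by the host and, if it is set, acknowledge it
 * by setting the matching bit in the second half of the bitmap. */
static bool demo_feature(struct demo_desc *desc, unsigned fbit)
{
	/* A bit beyond the end of the offered half is never set. */
	if (fbit / 8 >= desc->feature_len)
		return false;

	if (!(desc->features[fbit / 8] & (1u << (fbit % 8))))
		return false;

	desc->features[desc->feature_len + fbit / 8] |=
		(uint8_t)(1u << (fbit % 8));
	return true;
}
```

With feature_len = 2 and the host offering bits 0, 2 and 15, a call for bit 0 succeeds and sets bit 0 of the third byte, while a call for bit 1 leaves the acknowledged half untouched.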
[kvm-devel] [RFC/PATCH 08/15] kvm-s390: intercepts for privileged instructions
From: Carsten Otte [EMAIL PROTECTED] From: Christian Borntraeger [EMAIL PROTECTED] This patch introduces in-kernel handling of some intercepts for privileged instructions: handle_set_prefix()sets the prefix register of the local cpu handle_store_prefix() stores the content of the prefix register to memory handle_store_cpu_address() stores the cpu number of the current cpu to memory handle_skey() just decrements the instruction address and retries handle_stsch() delivers condition code 3 operation not supported handle_chsc() same here handle_stfl() stores the facility list which contains the capabilities of the cpu handle_stidp() stores cpu type/model/revision and such handle_stsi() stores information about the system topology Acked-by: Martin Schwidefsky [EMAIL PROTECTED] Signed-off-by: Christian Borntraeger [EMAIL PROTECTED] Signed-off-by: Carsten Otte [EMAIL PROTECTED] --- arch/s390/kvm/Makefile |2 arch/s390/kvm/intercept.c |1 arch/s390/kvm/kvm-s390.c| 11 + arch/s390/kvm/kvm-s390.h|3 arch/s390/kvm/priv.c| 322 include/asm-s390/kvm_host.h | 13 + 6 files changed, 351 insertions(+), 1 deletion(-) Index: kvm/arch/s390/kvm/Makefile === --- kvm.orig/arch/s390/kvm/Makefile +++ kvm/arch/s390/kvm/Makefile @@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm -kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o +kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o obj-$(CONFIG_KVM) += kvm.o Index: kvm/arch/s390/kvm/intercept.c === --- kvm.orig/arch/s390/kvm/intercept.c +++ kvm/arch/s390/kvm/intercept.c @@ -100,6 +100,7 @@ static int handle_lctl(struct kvm_vcpu * } static intercept_handler_t instruction_handlers[256] = { + [0xb2] = kvm_s390_handle_priv, [0xb7] = handle_lctl, [0xeb] = handle_lctg, }; Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -48,6 +48,15 @@ struct kvm_stats_debugfs_item debugfs_en { 
deliver_restart_signal, VCPU_STAT(deliver_restart_signal) }, { deliver_program_interruption, VCPU_STAT(deliver_program_int) }, { exit_wait_state, VCPU_STAT(exit_wait_state) }, + { instruction_stidp, VCPU_STAT(instruction_stidp) }, + { instruction_spx, VCPU_STAT(instruction_spx) }, + { instruction_stpx, VCPU_STAT(instruction_stpx) }, + { instruction_stap, VCPU_STAT(instruction_stap) }, + { instruction_storage_key, VCPU_STAT(instruction_storage_key) }, + { instruction_stsch, VCPU_STAT(instruction_stsch) }, + { instruction_chsc, VCPU_STAT(instruction_chsc) }, + { instruction_stsi, VCPU_STAT(instruction_stsi) }, + { instruction_stfl, VCPU_STAT(instruction_stfl) }, { NULL } }; @@ -249,6 +258,8 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu vcpu-arch.sie_block-eca = 0xC1002001U; setup_timer(vcpu-arch.ckc_timer, kvm_s390_idle_wakeup, (unsigned long) vcpu); + get_cpu_id(vcpu-arch.cpu_id); + vcpu-arch.cpu_id.version = 0xfe; return 0; } Index: kvm/arch/s390/kvm/kvm-s390.h === --- kvm.orig/arch/s390/kvm/kvm-s390.h +++ kvm/arch/s390/kvm/kvm-s390.h @@ -47,4 +47,7 @@ int kvm_s390_inject_vm(struct kvm *kvm, int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu, struct kvm_s390_interrupt *s390int); int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code); + +/* implemented in priv.c */ +int kvm_s390_handle_priv(struct kvm_vcpu *vcpu); #endif Index: kvm/arch/s390/kvm/priv.c === --- /dev/null +++ kvm/arch/s390/kvm/priv.c @@ -0,0 +1,322 @@ +/* + * priv.c - handling privileged instructions + * + * Copyright IBM Corp. 2008 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2 only) + * as published by the Free Software Foundation. 
+ * + *Author(s): Carsten Otte [EMAIL PROTECTED] + * Christian Borntraeger [EMAIL PROTECTED] + */ + +#include linux/kvm.h +#include linux/errno.h +#include asm/current.h +#include asm/debug.h +#include asm/ebcdic.h +#include asm/sysinfo.h +#include gaccess.h +#include kvm-s390.h + +static int handle_set_prefix(struct kvm_vcpu *vcpu) +{ + int base2 = vcpu-arch.sie_block-ipb 28; + int disp2 = ((vcpu-arch.sie_block-ipb 0x0fff) 16); + u64 operand2; + u32 address = 0; + u8 tmp; + +
[kvm-devel] [RFC/PATCH 06/15] kvm-s390: sie intercept handling
From: Carsten Otte [EMAIL PROTECTED] From: Christian Borntraeger [EMAIL PROTECTED] This path introduces handling of sie intercepts in three flavors: Intercepts are either handled completely in-kernel by kvm_handle_sie_intercept(), or passed to userspace with corresponding data in struct kvm_run in case kvm_handle_sie_intercept() returns -ENOTSUPP. In case of partial execution in kernel with the need of userspace support, kvm_handle_sie_intercept() may choose to set up struct kvm_run and return -EREMOTE. The trivial intercept reasons are handled in this patch: handle_noop() just does nothing for intercepts that don't require our support at all handle_stop() is called when a cpu enters stopped state, and it drops out to userland after updating our vcpu state handle_validity() faults in the cpu lowcore if needed, or passes the request to userland Acked-by: Martin Schwidefsky [EMAIL PROTECTED] Signed-off-by: Christian Borntraeger [EMAIL PROTECTED] Signed-off-by: Carsten Otte [EMAIL PROTECTED] --- arch/s390/kvm/Makefile |2 - arch/s390/kvm/intercept.c | 83 arch/s390/kvm/kvm-s390.c| 46 +++- arch/s390/kvm/kvm-s390.h|6 +++ include/asm-s390/kvm_host.h |4 ++ include/linux/kvm.h |9 6 files changed, 148 insertions(+), 2 deletions(-) Index: kvm/arch/s390/kvm/Makefile === --- kvm.orig/arch/s390/kvm/Makefile +++ kvm/arch/s390/kvm/Makefile @@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/ EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm -kvm-objs := $(common-objs) kvm-s390.o sie64a.o +kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o obj-$(CONFIG_KVM) += kvm.o Index: kvm/arch/s390/kvm/intercept.c === --- /dev/null +++ kvm/arch/s390/kvm/intercept.c @@ -0,0 +1,83 @@ +/* + * intercept.c - in-kernel handling for sie intercepts + * + * Copyright IBM Corp. 2008 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2 only) + * as published by the Free Software Foundation. 
+ * + *Author(s): Carsten Otte [EMAIL PROTECTED] + * Christian Borntraeger [EMAIL PROTECTED] + */ + +#include linux/kvm_host.h +#include linux/errno.h +#include linux/pagemap.h + +#include asm/kvm_host.h + +#include kvm-s390.h + +static int handle_noop(struct kvm_vcpu *vcpu) +{ + switch (vcpu-arch.sie_block-icptcode) { + case 0x10: + vcpu-stat.exit_external_request++; + break; + case 0x14: + vcpu-stat.exit_external_interrupt++; + break; + default: + break; /* nothing */ + } + return 0; +} + +static int handle_stop(struct kvm_vcpu *vcpu) +{ + vcpu-stat.exit_stop_request++; + VCPU_EVENT(vcpu, 3, %s, cpu stopped); + atomic_clear_mask(CPUSTAT_RUNNING, vcpu-arch.sie_block-cpuflags); + return -ENOTSUPP; +} + +static int handle_validity(struct kvm_vcpu *vcpu) +{ + int viwhy = vcpu-arch.sie_block-ipb 16; + vcpu-stat.exit_validity++; + if (viwhy == 0x37) { + fault_in_pages_writeable((char __user *) + vcpu-kvm-arch.guest_origin + + vcpu-arch.sie_block-prefix, PAGE_SIZE); + return 0; + } + VCPU_EVENT(vcpu, 2, unhandled validity intercept code %d, + viwhy); + return -ENOTSUPP; +} + +static const intercept_handler_t intercept_funcs[0x48 2] = { + [0x00 2] = handle_noop, + [0x10 2] = handle_noop, + [0x14 2] = handle_noop, + [0x20 2] = handle_validity, + [0x28 2] = handle_stop, +}; + +int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu) +{ + intercept_handler_t func; + u8 code = vcpu-arch.sie_block-icptcode; + + if (code 3 || code 0x48) + return -ENOTSUPP; + + func = intercept_funcs[code 2]; + + if (func) + return func(vcpu); + + return -ENOTSUPP; +} + Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -23,12 +23,17 @@ #include asm/lowcore.h #include asm/pgtable.h +#include kvm-s390.h #include gaccess.h #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU struct kvm_stats_debugfs_item debugfs_entries[] = { { userspace_handled, VCPU_STAT(exit_userspace) }, + { exit_validity, VCPU_STAT(exit_validity) }, + 
{ exit_stop_request, VCPU_STAT(exit_stop_request) }, + { exit_external_request, VCPU_STAT(exit_external_request) }, + { exit_external_interrupt, VCPU_STAT(exit_external_interrupt) }, { NULL } }; @@ -384,6 +389,7 @@ static void __vcpu_run(struct
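The dispatch scheme described in this patch (handlers in a table indexed by the intercept code shifted right by two, with -ENOTSUPP meaning "hand this to userspace") can be sketched outside the kernel as below. All demo_ names are hypothetical, the handlers are stubs, and the table is sized one entry larger than 0x48 >> 2 so that code 0x48 itself stays in bounds; that sizing is a choice made for this sketch, not a statement about the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524	/* kernel-internal errno value, absent from userspace headers */
#endif

typedef int (*demo_handler_t)(uint8_t code);

/* Stub handlers: 0 means handled in-kernel, -ENOTSUPP means go to userspace. */
static int demo_noop(uint8_t code) { (void)code; return 0; }
static int demo_stop(uint8_t code) { (void)code; return -ENOTSUPP; }

/* Intercept codes are multiples of 4, so code >> 2 gives a dense index. */
static const demo_handler_t demo_funcs[(0x48 >> 2) + 1] = {
	[0x00 >> 2] = demo_noop,
	[0x10 >> 2] = demo_noop,
	[0x14 >> 2] = demo_noop,
	[0x28 >> 2] = demo_stop,
};

static int demo_dispatch(uint8_t code)
{
	/* Reject codes that are not multiples of 4 or are past the table. */
	if ((code & 3) || code > 0x48)
		return -ENOTSUPP;

	if (demo_funcs[code >> 2])
		return demo_funcs[code >> 2](code);

	/* No in-kernel handler registered for this code. */
	return -ENOTSUPP;
}
```

A caller would treat a return of 0 as "re-enter the guest" and -ENOTSUPP as "fill in the run structure and return to the launcher".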
Re: [kvm-devel] [RFC/PATCH 05/15] KVM_MAX_VCPUS
Hollis Blanchard wrote:
> On Thu, 2008-03-20 at 17:24 +0100, Carsten Otte wrote:
>> Index: kvm/include/linux/kvm_host.h
>> ===
>> --- kvm.orig/include/linux/kvm_host.h
>> +++ kvm/include/linux/kvm_host.h
>> @@ -24,7 +24,11 @@
>>
>>  #include <asm/kvm_host.h>
>>
>> +#ifdef CONFIG_S390
>> +#define KVM_MAX_VCPUS 64
>> +#else
>>  #define KVM_MAX_VCPUS 16
>> +#endif
>>  #define KVM_MEMORY_SLOTS 32
>>  /* memory slots that does not exposed to userspace */
>>  #define KVM_PRIVATE_MEM_SLOTS 4
>
> Why don't we just define this in asm/kvm_host.h ?

No problem with that. I just wanted to keep the impact on common code very low and things like this separated from the actual port. I have a few things like this that can safely be taken care of later.
Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm
On Thu, 20 Mar 2008 17:25:26 +0100 Carsten Otte wrote:

> From: Christian Borntraeger [EMAIL PROTECTED]
> From: Carsten Otte [EMAIL PROTECTED]
>
> This patch adds functionality to detect if the kernel runs under the KVM hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This allows drivers to skip device detection if the systems runs non-virtualized. We also define a preferred console to avoid having the ttyS0, which is a line mode only console.
>
> Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
> Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
> Signed-off-by: Carsten Otte [EMAIL PROTECTED]
> ---
>  arch/s390/Kconfig        |    7 +++
>  arch/s390/kernel/early.c |    4
>  arch/s390/kernel/setup.c |   10 +++---
>  include/asm-s390/setup.h |    1 +
>  4 files changed, 19 insertions(+), 3 deletions(-)
>
> Index: kvm/arch/s390/kernel/early.c
> ===
> --- kvm.orig/arch/s390/kernel/early.c
> +++ kvm/arch/s390/kernel/early.c
> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>  	/* Running on a P/390 ? */
>  	if (cpuinfo->cpu_id.machine == 0x7490)
>  		machine_flags |= 4;
> +
> +	/* Running under KVM ? */
> +	if (cpuinfo->cpu_id.version == 0xfe)

Hi,

Where are these magic numbers documented? (0x7490, 0xfe, etc.)

> +		machine_flags |= 64;
>  }
>
>  #ifdef CONFIG_64BIT

---
~Randy
Re: [kvm-devel] [RFC/PATCH 12/15] kvm-s390: API documentation
On Thu, 20 Mar 2008 17:25:20 +0100 Carsten Otte wrote: This patch adds Documentation/s390/kvm.txt, which describes specifics of kvm's user interface that are unique to s390 architecture. Signed-off-by: Carsten Otte [EMAIL PROTECTED] --- Documentation/s390/kvm.txt | 125 + 1 file changed, 125 insertions(+) Index: kvm/Documentation/s390/kvm.txt === --- /dev/null +++ kvm/Documentation/s390/kvm.txt @@ -0,0 +1,125 @@ +*** BIG FAT WARNING *** +The kvm module is currently in EXPERIMENTAL state for s390. This means, that This means that [no comma] +the interface to the module is not yet considered to remain stable. Thus, be +prepared that we keep breaking your userspace application and guest +compatibility over and over again until we feel happy with the result. Make sure +your guest kernel, your host kernel, and your userspace launcher are in a +consistent state. + +This Documentation describes the unique ioctl calls to /dev/kvm, the resulting +kvm-vm file descriptors, and the kvm-vcpu file descriptors that differ from x86. + +1. ioctl calls to /dev/kvm +KVM does support the following ioctls on s390 that are common with other +architectures and do behave the same: +KVM_GET_API_VERSION +KVM_CREATE_VM(*) see note +KVM_CHECK_EXTENSION +KVM_GET_VCPU_MMAP_SIZE + +Notes: +* KVM_CREATE_VM may fail on s390, if the calling process has multiple +threads and has not called KVM_S390_ENABLE_SIE before. + +In addition, on s390 the following architecture specific ioctls are supported: +ioctl: KVM_S390_ENABLE_SIE +args:none +see also:include/linux/kvm.h +This call causes the kernel to switch on PGSTE in the user page table. This +operation is needed in order to run a virtual machine, and it requires the +calling process to be single-threaded. Note that the first call to KVM_CREATE_VM +will implicitly try to switch on PGSTE if the user process has not called +KVM_S390_ENABLE_SIE before. 
User processes that want to launch multiple threads +before creating a virtual machine have to call KVM_S390_ENABLE_SIE, or will +observe an error calling KVM_CREATE_VM. Switching on PGSTE is a one-time +operation, is not reversible, and will persist over the entire lifetime of +the calling process. It does not have any user-visibe effect other than a small user-visible +performance penalty. + +2. ioctl calls to the kvm-vm file descriptor +KVM does support the following ioctls on s390 that are common with other +architectures and do behave the same: +KVM_CREATE_VCPU +KVM_SET_USER_MEMORY_REGION (*) see note +KVM_GET_DIRTY_LOG(**) see note + +Notes: +* kvm does only allow exactly one memory slot on s390, which has to start + at guest absolute address zero and at a user address that is aligned on any + page boundary. This hardware limitation allows us to have a few unique + optimizations. The memory slot does'nt have to be filled doesn't + with memory actually, it may contain sparse holes. That said, with different + user memory layout this does still allow a large flexibility when + doing the guest memory setup. +** KVM_GET_DIRTY_LOG does'nt work proper yet. The user will receive an empty doesn't work properly +log. This ioctl call is only needed for guest migration, and we intend to +implement this one in the future. + +In addition, on s390 the following architecture specific ioctls for the kvm-vm +file descriptor are supported: +ioctl: KVM_S390_INTERRUPT +args:struct kvm_s390_interrupt * +see also:include/linux/kvm.h +This ioctl is used to submit a floating interrupt for a virtual machine. +Floating interrupts may be delivered to any virtual cpu in the configuration. +Only some interrupt types defined in include/linux/kvm.h make sense when +submitted as floating interrupt. The following interrupts are not considered interrupts. 
+to be useful as floating interrupt, and a call to inject them will result in interrupts, +-EINVAL error code: program interrupts, and interprocessor signals. Valid no comma +floating interrupts are: +KVM_S390_INT_VIRTIO +KVM_S390_INT_SERVICE + +3. ioctl calls to the kvm-vcpu file descriptor +KVM does support the following ioctls on s390 that are common with other +architectures and do behave the same: +KVM_RUN +KVM_GET_REGS +KVM_SET_REGS +KVM_GET_SREGS +KVM_SET_SREGS +KVM_GET_FPU +KVM_SET_FPU + +In addition, on s390 the following architecture specific ioctls for the +kvm-vcpu file descriptor are supported: +ioctl: KVM_S390_INTERRUPT +args:struct kvm_s390_interrupt * +see also:
Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm
Randy Dunlap wrote:
>> Index: kvm/arch/s390/kernel/early.c
>> ===
>> --- kvm.orig/arch/s390/kernel/early.c
>> +++ kvm/arch/s390/kernel/early.c
>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>  	/* Running on a P/390 ? */
>>  	if (cpuinfo->cpu_id.machine == 0x7490)
>>  		machine_flags |= 4;
>> +
>> +	/* Running under KVM ? */
>> +	if (cpuinfo->cpu_id.version == 0xfe)
>
> Hi,
>
> Where are these magic numbers documented? (0x7490, 0xfe, etc.)
>
>> +		machine_flags |= 64;
>>  }
>>
>>  #ifdef CONFIG_64BIT

The cpuid (and most other things about the s390 arch) is documented in the principles of operation:
http://publibz.boulder.ibm.com/epubs/pdf/a2278324.pdf
http://publibz.boulder.ibm.com/epubs/pdf/dz9zs001.pdf
(see chapter "control instructions", "store cpu id")

The 0xfe, however, is a convention: the kvm arch code sets this value where it implements that instruction. See the privileged instructions patch.
Re: [kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable
Carsten Otte wrote:
> +struct mm_struct *dup_mm(struct task_struct *tsk);

No prototypes in .c files. Put this in an appropriate header.

J
Re: [kvm-devel] [PATCH] virtio_blk: Dont waste major numbers
Christian Borntraeger wrote:
> Rusty,
>
> currently virtio_blk uses one major number per device. While this works quite well on most systems it is wasteful and will exhaust major numbers on larger installations.
>
> This patch allocates a major number on init and will use 16 minor numbers for each disk. That will allow ~64k virtio_blk disks.

Would it be too much to allow 64 minors (63 partitions)? I have run out of 16, myself, but never 64.

-hpa
Re: [kvm-devel] [PATCH] virtio_blk: Dont waste major numbers
Anthony Liguori wrote:
> Christian Borntraeger wrote:
>> Rusty,
>>
>> currently virtio_blk uses one major number per device. While this works quite well on most systems it is wasteful and will exhaust major numbers on larger installations.
>>
>> This patch allocates a major number on init and will use 16 minor numbers for each disk. That will allow ~64k virtio_blk disks.
>
> There are some other limitations to the number of virtio block devices. For instance...
>
> sprintf(vblk->disk->disk_name, "vd%c", virtblk_index++);
>
> This gets bogus after 64 disks.
>
> We also have a hard limit for virtio-pci based on the number of PCI slots available. One thing I was considering was whether we should try to support multiple disks per virtio device.

I would much prefer a /dev/vd/dXpY naming scheme, similar to cciss and other large disk installations. Unfortunately, yet another side effect of people not habitually registering major numbers is that the namespace is not as well maintained.

-hpa
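The numbers in this exchange can be checked with a little arithmetic: Linux minor numbers are 20 bits wide, so one major shared by all disks gives 2^20 / 16 = 65536 disks at 16 minors each, or 16384 disks at the 64 minors hpa asks about. The sketch below works that out and also formats a name in the dXpY style mentioned above. The demo_ helpers and the exact "vd/d%up%u" spelling are assumptions made for illustration, not what the driver does.

```c
#include <stdio.h>

#define DEMO_MINORBITS 20	/* width of the minor field in a Linux dev_t */

/* How many disks fit under a single major for a given minors-per-disk. */
static unsigned demo_disks_per_major(unsigned minors_per_disk)
{
	return (1u << DEMO_MINORBITS) / minors_per_disk;
}

/* First minor number belonging to a given disk index. */
static unsigned demo_first_minor(unsigned disk_index, unsigned minors_per_disk)
{
	return disk_index * minors_per_disk;
}

/* Format a name in a hypothetical dXpY scheme; part 0 means the whole disk.
 * Unlike a single "vd%c" letter, this does not run out of characters. */
static int demo_disk_name(char *buf, size_t len, unsigned disk, unsigned part)
{
	if (part == 0)
		return snprintf(buf, len, "vd/d%u", disk);
	return snprintf(buf, len, "vd/d%up%u", disk, part);
}
```

The trade-off under discussion is visible directly: quadrupling the minors per disk cuts the number of disks per major by four, which still leaves far more disks than the PCI slot limit allows.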
Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm
On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>  	/* Running on a P/390 ? */
>  	if (cpuinfo->cpu_id.machine == 0x7490)
>  		machine_flags |= 4;
> +
> +	/* Running under KVM ? */
> +	if (cpuinfo->cpu_id.version == 0xfe)
> +		machine_flags |= 64;

Shouldn't these have symbolic names?
[kvm-devel] performance question
I am trying to understand spikes in system time that I am seeing in a VM. The guest OS is RHEL4, with 2 vcpus and 2.5 GB RAM; the host is running a 2.6.24.2 kernel. The kvm version is kvm-63.

Using the stat scripts Christian Ehrhardt posted a few days ago (thanks, Christian, very handy tool) I collected kvm_stat data as a function of time (I added time to the output). Comparing plots of guest system time to plots of kvm_stat, the spikes in system time correlate most strongly with the following kvm_stat variables:

mmu_cache_miss
mmu_flooded
mmu_pte_updated
mmu_pte_write
mmu_shadow_zapped
pf_fixed
pf_guest
remote_tlb_flush
tlb_flush

Can someone provide some guidance/hints on what would cause spikes in the above, and whether there is anything I can do to improve it? The load on the VM is fairly constant (network traffic of ~48kB/sec received and ~189kB/sec transmitted) with some moderate disk IO as well.

thanks,
david
Re: [kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable
On Thu, 2008-03-20 at 10:28 -0700, Jeremy Fitzhardinge wrote:
> Carsten Otte wrote:
>> +struct mm_struct *dup_mm(struct task_struct *tsk);
>
> No prototypes in .c files. Put this in an appropriate header.

Well, and more fundamentally: do we really want dup_mm() able to be called from other code?

Maybe we need a bit more detailed justification why fork() itself isn't good enough. It looks to me like they basically need an arch-specific argument to fork, telling the new process's page tables to take the fancy new bit.

I'm really curious how this new stuff is going to get used. Are you basically replacing fork() when creating kvm guests?

-- Dave
Re: [kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable
Dave Hansen wrote:
> Well, and more fundamentally: do we really want dup_mm() able to be called from other code?
>
> Maybe we need a bit more detailed justification why fork() itself isn't good enough. It looks to me like they basically need an arch-specific argument to fork, telling the new process's page tables to take the fancy new bit.
>
> I'm really curious how this new stuff is going to get used. Are you basically replacing fork() when creating kvm guests?

No. The trick is that we need bigger page tables when running guests: our page tables are usually 2k, but when running a guest they're 4k, to track both guest and host dirty and reference information. It looks like this:

*----------*
*2k PTE's  *
*----------*
*2k PGSTE  *
*----------*

We don't want to waste precious memory for all page tables. We'd like to have one kernel image that runs regular server workload _and_ guests. Therefore, we need to reallocate the page table after fork() once we know that task is going to be a hypervisor. That's what this code does: reallocate a bigger page table to accommodate the extra information. The task needs to be single-threaded when calling for extended page tables.

Btw: at fork() time, we cannot tell whether or not the user's going to be a hypervisor. Therefore we cannot do this in fork.
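The layout in the diagram (2k of PTEs, optionally followed by 2k of PGSTEs) and the "reallocate once we know the task will be a hypervisor" step can be illustrated with a small userspace model. This is only a sketch under stated assumptions: 256 entries of 8 bytes are assumed so that the plain table comes to the 2k mentioned above, every demo_ name is hypothetical, and realloc() stands in for whatever the kernel patch really does to rebuild the page tables.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PTRS_PER_PTE 256	/* 256 entries * 8 bytes = 2k, as in the diagram */

struct demo_pgtable {
	int has_pgste;		/* non-zero once the table has been extended */
	uint64_t *entries;	/* PTEs, followed by PGSTEs when has_pgste is set */
};

/* Size in bytes of a plain or an extended page table. */
static size_t demo_pgtable_bytes(int has_pgste)
{
	size_t ptes = DEMO_PTRS_PER_PTE * sizeof(uint64_t);

	return has_pgste ? 2 * ptes : ptes;
}

/* Grow a plain table into an extended one, keeping the existing PTEs and
 * zeroing the new PGSTE half. Returns 0 on success, -1 on allocation failure. */
static int demo_enable_pgste(struct demo_pgtable *pt)
{
	size_t old_size = demo_pgtable_bytes(0);
	size_t new_size = demo_pgtable_bytes(1);
	uint64_t *bigger;

	if (pt->has_pgste)
		return 0;	/* one-time operation, already done */

	bigger = realloc(pt->entries, new_size);
	if (!bigger)
		return -1;

	memset((char *)bigger + old_size, 0, new_size - old_size);
	pt->entries = bigger;
	pt->has_pgste = 1;
	return 0;
}
```

The model makes the cost argument concrete: only tables belonging to a hypervisor task pay the extra 2k, and ordinary workloads keep the smaller allocation.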
Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm
Christoph Hellwig wrote:
> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>  	/* Running on a P/390 ? */
>>  	if (cpuinfo->cpu_id.machine == 0x7490)
>>  		machine_flags |= 4;
>> +
>> +	/* Running under KVM ? */
>> +	if (cpuinfo->cpu_id.version == 0xfe)
>> +		machine_flags |= 64;
>
> Shouldn't these have symbolic names?

You mean symbolics for machine_flags? Or symbolics for cpu ids?
Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm
On Thu, Mar 20, 2008 at 09:37:19PM +0100, Carsten Otte wrote:
> Christoph Hellwig wrote:
>> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>>  	/* Running on a P/390 ? */
>>>  	if (cpuinfo->cpu_id.machine == 0x7490)
>>>  		machine_flags |= 4;
>>> +
>>> +	/* Running under KVM ? */
>>> +	if (cpuinfo->cpu_id.version == 0xfe)
>>> +		machine_flags |= 64;
>>
>> Shouldn't these have symbolic names?
>
> You mean symbolics for machine_flags? Or symbolics for cpu ids?

Either.
Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm
Christoph Hellwig wrote:
> On Thu, Mar 20, 2008 at 09:37:19PM +0100, Carsten Otte wrote:
>> Christoph Hellwig wrote:
>>> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>>>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>>>  	/* Running on a P/390 ? */
>>>>  	if (cpuinfo->cpu_id.machine == 0x7490)
>>>>  		machine_flags |= 4;
>>>> +
>>>> +	/* Running under KVM ? */
>>>> +	if (cpuinfo->cpu_id.version == 0xfe)
>>>> +		machine_flags |= 64;
>>>
>>> Shouldn't these have symbolic names?
>>
>> You mean symbolics for machine_flags? Or symbolics for cpu ids?
>
> Either.

Hmmh. For cpu ids, symbolic names probably didn't make sense until now that kvm also uses them. Before, this was the only place that used them. With kvm and 0xfe, this one is sort of a temporary one. We intend to rework this code to use store system information, which would give us way more information about the machine and its hypervisor topology. Until my todo list gets to that point, I think we'll have to cope with a temporary number. We'll aim for making that change before 2.6.26 gets released.

The machine flags do have symbolic names, defined in include/asm-s390/setup.h. And yea, they should be used here. Will change that.
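What the reviewers are asking for could look roughly like the sketch below. The numeric values 4, 64, 0x7490 and 0xfe are taken from the quoted hunk; every DEMO_ name, the demo_cpu_id structure and the demo_detect_machine() helper are hypothetical and are not the names used in include/asm-s390/setup.h.

```c
#include <stdint.h>

/* Hypothetical symbolic names for the magic numbers in the quoted hunk. */
#define DEMO_MACHINE_FLAG_P390	4u	/* running on a P/390 */
#define DEMO_MACHINE_FLAG_KVM	64u	/* running under KVM */
#define DEMO_CPU_MACHINE_P390	0x7490u	/* machine type reported by a P/390 */
#define DEMO_CPU_VERSION_KVM	0xfeu	/* version byte set by the kvm arch code */

/* Simplified stand-in for the cpu id fields the detection code looks at. */
struct demo_cpu_id {
	uint8_t version;
	uint16_t machine;
};

/* Same logic as the hunk, with names instead of bare numbers. */
static unsigned demo_detect_machine(const struct demo_cpu_id *id)
{
	unsigned flags = 0;

	if (id->machine == DEMO_CPU_MACHINE_P390)
		flags |= DEMO_MACHINE_FLAG_P390;
	if (id->version == DEMO_CPU_VERSION_KVM)
		flags |= DEMO_MACHINE_FLAG_KVM;
	return flags;
}
```

Named constants also make flag clashes easier to spot, since two definitions with the same value sit next to each other in one header.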
[kvm-devel] [PATCH] QEMU hotplug: check device name in drive_add
Using drive_add with bogus devfn values would segfault QEMU when attempting to add scsi devices. Attached patch checks in hotplug code for appropriate devices that drive_add() will work with (looking before leaping) and bails if you don't specify a proper device with your bus,devfn.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
[EMAIL PROTECTED]

diffstat output:
 device-hotplug.c | 14 +-
 1 files changed, 13 insertions(+), 1 deletion(-)

Signed-off-by: Ryan Harper [EMAIL PROTECTED]
---
When using drive_add in the QEMU monitor, if one specifies a bogus devfn to the command while specifying a scsi disk (if=scsi), then QEMU segfaults due to issues with getting a valid return from find_pci_dev, and vl.c setting unit_id=0 avoiding lsi_scsi_attach's check for a controller. Rather than muck through the unit_id calculation (which does make sense for the case that users don't specify a unit_id), in drive_add() we know that we only support the SCSI controller and virtio_blk, so ignore any devfn that doesn't point to either type of device.

Signed-off-by: Ryan Harper [EMAIL PROTECTED]

diff --git a/qemu/hw/device-hotplug.c b/qemu/hw/device-hotplug.c
index 98a467c..a717d9b 100644
--- a/qemu/hw/device-hotplug.c
+++ b/qemu/hw/device-hotplug.c
@@ -55,7 +55,7 @@ void drive_hot_add(int pcibus, const char *devfn_string, const char *opts)
 {
     int drive_idx, type, bus;
     int devfn;
-    int success = 0;
+    int success = 0, valid_dev = 0;
     PCIDevice *dev;
 
     devfn = strtoul(devfn_string, NULL, 0);
@@ -67,6 +67,18 @@ void drive_hot_add(int pcibus, const char *devfn_string, const char *opts)
         return;
     }
 
+    if (!strcmp(dev->name, "LSI53C895A SCSI HBA")) {
+        valid_dev = 1;
+    } else if (!strcmp(dev->name, "virtio-blk")) {
+        valid_dev = 1;
+    }
+
+    if (!valid_dev) {
+        term_printf("Invalid PCI Device specified by bus:%d devfn:%d\n",
+                    pcibus, devfn);
+        return;
+    }
+
     drive_idx = add_init_drive(opts);
     if (drive_idx < 0)
         return;
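The look-before-leaping check in this patch is a name whitelist. A standalone sketch of the same idea is shown below; the two device names are the ones the patch compares against, while demo_hotplug_capable and demo_valid_hotplug_dev are hypothetical names, and the table form is just one way to make room for further entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Device names that drive_add can attach a disk to, as listed in the patch. */
static const char *const demo_hotplug_capable[] = {
	"LSI53C895A SCSI HBA",
	"virtio-blk",
};

/* Return true only if the PCI device name is one we know how to handle. */
static bool demo_valid_hotplug_dev(const char *name)
{
	size_t i;

	if (!name)
		return false;

	for (i = 0; i < sizeof(demo_hotplug_capable) /
			sizeof(demo_hotplug_capable[0]); i++) {
		if (!strcmp(name, demo_hotplug_capable[i]))
			return true;
	}
	return false;
}
```

A caller would run this check right after looking up the device and bail out with an error message before touching any drive state.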
Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm
On Thu, Mar 20, 2008 at 09:59:32PM +0100, Carsten Otte wrote:
> Christoph Hellwig wrote:
>> On Thu, Mar 20, 2008 at 09:37:19PM +0100, Carsten Otte wrote:
>>> Christoph Hellwig wrote:
>>>> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>>>>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>>>>  	/* Running on a P/390 ? */
>>>>>  	if (cpuinfo->cpu_id.machine == 0x7490)
>>>>>  		machine_flags |= 4;
>>>>> +
>>>>> +	/* Running under KVM ? */
>>>>> +	if (cpuinfo->cpu_id.version == 0xfe)
>>>>> +		machine_flags |= 64;
>>>>
>>>> Shouldn't these have symbolic names?
>>>
>>> You mean symbolics for machine_flags? Or symbolics for cpu ids?
>>
>> Either.
>
> [...] The machine flags do have symbolic names, defined in include/asm-s390/setup.h. And yea, they should be used here. Will change that.

Since when do we have symbolic names for the bits? It was always on my todo list to do a cleanup and replace the numbers we use everywhere with names. Especially since we have clashes from time to time... but that didn't hurt enough yet, obviously. But now that you volunteered to take care of this... :)
Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file
On Friday 21 March 2008 01:04:17 Anthony Liguori wrote:
> Rusty Russell wrote:
>> From: Paul TBBle Hampson [EMAIL PROTECTED]
>>
>> This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory mappings created by map_zeroed_pages.
>
> I created a test program recently that measured the latency of reads/writes to an mmap() file in /dev/shm and in a normal filesystem. Even after unlinking the underlying file, the write latency was much better with a mmap()'d file in /dev/shm.

How odd! Do you have any idea why?

> /dev/shm is not really for general use. I think we'll want to have our own tmpfs mount that we use to create VM images.

If we're going to mod the kernel, how about a "mmap this part of their address space" and having the kernel keep the mappings in sync? But I think that if we want to get speed, we should probably be doing the copy between address spaces in-kernel so we can do lightweight exits.

> I also prefer to use a unix socket for communication, unlink the file immediately after open, and then pass the fd via SCM_RIGHTS to the other process.

Yeah, I shied away from that because cred passing kills whole litters of puppies. It makes for better encapsulation tho, so I'd do it that way in a serious implementation.

Cheers,
Rusty.
Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest
On Thursday 20 March 2008 17:54:45 Avi Kivity wrote:
> Rusty Russell wrote:
>> Hi all,
>>
>> Just finished my prototype of inter-guest virtio, using networking as an example. Each guest mmaps the other's address space and uses a FIFO for notifications.
>
> Isn't that a security hole (hole? chasm)? If the two guests can access each other's memory, they might as well be just one guest, and communicate internally.

Sorry, sloppy language on my part. Each launcher process maps the other guest's memory as well: ie. copying occurs in the host.

> My feeling is that the host needs to copy the data, using dma if available. Another option is to have one guest map the other's memory for read and write, while the other guest is unprivileged. This allows one privileged guest to provide services for other, unprivileged guests, like domain 0 or driver domains in Xen.

One having privilege is possible, even trivial with the current patch (it's actually doing a completely generic inter-virtio-ring shuffle). I chose the symmetrical approach for this demo for no particularly good reason.

Cheers,
Rusty.
Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file
Rusty Russell wrote:
> How odd! Do you have any idea why?

Nope, but part of the reason I did this was I recalled a similar discussion relating to kqemu and why it used /dev/shm. I thought it was only an issue with older kernels but apparently not.

>> /dev/shm is not really for general use. I think we'll want to have our own tmpfs mount that we use to create VM images.
>
> If we're going to mod the kernel, how about a "mmap this part of their address space" and having the kernel keep the mappings in sync. But I think that if we want to get speed, we should probably be doing the copy between address spaces in-kernel so we can do lightweight exits.

I don't think lightweight exits help the situation very much. The difference between a lightweight and a heavyweight exit is only 3-4k cycles or so. In-kernel doesn't make the situation much easier. You have to map pages in from a different task. It's a lot easier if you have both guests mapped in userspace.

>> I also prefer to use a unix socket for communication, unlink the file immediately after open, and then pass the fd via SCM_RIGHTS to the other process.
>
> Yeah, I shied away from that because cred passing kills whole litters of puppies. It makes for better encapsulation tho, so I'd do it that way in a serious implementation.

I'm working on an implementation for KVM at the moment. Instead of just supporting two guests, I'm looking to support N guests and provide a simple switch. I'll have patches soon.

Regards,

Anthony Liguori

> Cheers,
> Rusty.
Re: [kvm-devel] [RFC/PATCH 15/15] guest: virtio device support, and kvm hypercalls
On Friday 21 March 2008 03:25:28 Carsten Otte wrote:
> +static void kvm_set_status(struct virtio_device *vdev, u8 status)
> +{
> +    BUG_ON(!status);
> +    to_kvmdev(vdev)->desc->status = status;
> +}
> +
> +/*
> + * To reset the device, we (ab)use the NOTIFY hypercall, with the descriptor
> + * address of the device. The Host will zero the status and all the
> + * features.
> + */
> +static void kvm_reset(struct virtio_device *vdev)
> +{
> +    unsigned long offset = (void *)to_kvmdev(vdev)->desc - kvm_devices;
> +
> +    kvm_hypercall1(1237, (max_pfn<<PAGE_SHIFT) + offset);
> +}

I'd recommend a hypercall after set_status, as well as reset. The reason lguest doesn't do this is that we don't do feature negotiation (assuming guest kernel matches host kernel). In general, the host needs to know when the VIRTIO_CONFIG_S_DRIVER_OK is set so it can see what features the guest driver accepted.

Overloading the notify hypercall is kind of a hack too, but it works so no real need to change that.

> + * The root device for the kvm virtio devices.
> + * This makes them appear as /sys/devices/kvm/0,1,2 not /sys/devices/0,1,2.
> + */
> +static struct device kvm_root = {
> +    .parent = NULL,
> +    .bus_id = "kvm_s390",
> +};

You mean /sys/devices/kvm_s390/0,1,2?

> +static int __init kvm_devices_init(void)
> +{
> +    if (!MACHINE_IS_KVM)
> +        return -ENODEV;
> +
> +    if (device_register(&kvm_root) != 0)
> +        panic("Could not register kvm root");
> +
> +    if (add_shared_memory((max_pfn) << PAGE_SHIFT, PAGE_SIZE)) {
> +        device_unregister(&kvm_root);
> +        return -ENOMEM;
> +    }

Hmm, panic on device_register fail, but -ENOMEM on add_shared_memory fail? My theory was that since this is boot time, panic() is the right thing.

Cheers,
Rusty.
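To make the set_status suggestion concrete, here is a small userspace model, offered only as a sketch under stated assumptions. The structs and function names are invented for the example and a plain function call stands in for the hypercall; the one value taken from Linux's virtio headers is VIRTIO_CONFIG_S_DRIVER_OK (4). The point it tries to show is that if the guest notifies the host on every status write, the host can latch the accepted features at the moment DRIVER_OK appears.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4   /* as in Linux's virtio_config.h */

/* Hypothetical stand-in for a device descriptor shared with the host. */
struct desc_sketch {
    uint8_t status;
    uint32_t guest_features;          /* bits the guest driver accepted */
};

/* Hypothetical host-side bookkeeping so the sketch runs in userspace. */
struct host_sketch {
    unsigned int notifications;
    int features_latched;
    uint32_t accepted_features;
};

/* Stand-in for the hypercall: the host looks at the descriptor and, the
 * first time it sees DRIVER_OK, records what the guest accepted. */
static void host_status_notify(struct host_sketch *host,
                               const struct desc_sketch *desc)
{
    host->notifications++;
    if ((desc->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
        !host->features_latched) {
        host->accepted_features = desc->guest_features;
        host->features_latched = 1;
    }
}

/* Guest side: write the status, then tell the host it changed. */
static void set_status_and_notify(struct host_sketch *host,
                                  struct desc_sketch *desc, uint8_t status)
{
    assert(status != 0);              /* mirrors BUG_ON(!status) above */
    desc->status = status;
    host_status_notify(host, desc);
}
```

In the real guest code the notification would be a hypercall rather than a function call, and how the host reads the feature bits depends on the descriptor layout, which this sketch does not attempt to reproduce.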
Re: [kvm-devel] [Lguest] [RFC PATCH 1/5] lguest: mmap backing file
On Thursday 20 March 2008 19:16:00 Tim Post wrote:
> On Thu, 2008-03-20 at 17:05 +1100, Rusty Russell wrote:
>> +    snprintf(memfile_path, PATH_MAX, "%s/.lguest", getenv("HOME") ?: "");
>
> Hi Rusty,
>
> Is that safe if being run via setuid/gid or shared root? It might be better to just look it up in /etc/passwd against the real UID, considering that anyone can change (or null) that env string.

Hi Tim,

Fair point: it is bogus in this usage case. Of course, setuid-ing lguest is dumb anyway, since you could use --block= to read and write any file in the filesystem.

The mid-term goal is to allow non-root to run lguest, which fixes this problem (we don't allow that at the moment, as the guest can pin memory).

Cheers,
Rusty.
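Tim's suggestion of looking the directory up against the real UID could be sketched as follows. This is not part of the patch; the function name is hypothetical, while getuid() and getpwuid() are the standard POSIX calls for consulting the passwd database.

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<home>/.lguest" from the passwd entry of the real UID instead of
 * trusting $HOME. Returns 0 on success, or -1 if the lookup fails, the
 * entry has no home directory, or the buffer is too small. */
static int lguest_dir_for_real_uid(char *buf, size_t len)
{
    struct passwd *pw;
    int n;

    assert(buf != NULL);
    pw = getpwuid(getuid());          /* real UID, not the effective one */
    if (pw == NULL || pw->pw_dir == NULL || strlen(pw->pw_dir) == 0)
        return -1;

    n = snprintf(buf, len, "%s/.lguest", pw->pw_dir);
    if (n < 0 || (size_t)n >= len)    /* error or truncated */
        return -1;
    return 0;
}
```

Unlike the getenv("HOME") ?: "" form, this fails outright when no home directory can be found rather than falling back to a path rooted at "/".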
Re: [kvm-devel] KVM Test result, kernel f1080a0.., userspace 49cf2d2..
Avi Kivity wrote:
> Yunfeng Zhao wrote:
>> Following issues fixed:
>> 1. qcow based smp linux guests likely hang
>> https://sourceforge.net/tracker/index.php?func=detail&aid=1901980&group_id=180599&atid=893831
>> 2. smp windows installer crashes while rebooting
>> https://sourceforge.net/tracker/index.php?func=detail&aid=1877875&group_id=180599&atid=893831
>
> No idea how these were fixed.

The first one should be fixed by in-kernel pit. But not sure about the second one.

>> 3. Timer of guest is inaccurate
>> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1826080&group_id=180599
>
> This may be the in-kernel pit.
>
>> 4. Installer of 64bit vista guest will pause for ten minutes after reboot
>> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1836905&group_id=180599
>
> The pit again?! Confused.

Yes, this one should be fixed by in-kernel pit too.