[kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Rusty Russell
Hi all,

   Just finished my prototype of inter-guest virtio, using networking as an 
example.  Each guest mmaps the other's address space and uses a FIFO for 
notifications.
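Here is a minimal sketch of FIFO-based notification. It assumes each launcher owns a named FIFO and the peer writes one byte to it as a "kick". The helper names and the one-byte protocol are illustrative assumptions, not code from these patches.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create the FIFO if needed and open it.
 * O_RDWR on a FIFO is Linux-specific: the open does not block waiting
 * for a peer, and reads never see EOF when the peer goes away. */
static int notify_open(const char *path)
{
	if (mkfifo(path, 0600) != 0 && errno != EEXIST)
		return -1;
	return open(path, O_RDWR | O_NONBLOCK);
}

/* Kick the other side: one byte means "look at the rings again". */
static int notify_kick(int fd)
{
	char c = 1;

	return write(fd, &c, 1) == 1 ? 0 : -1;
}

/* Drain pending kicks without blocking; returns how many bytes were
 * pending. Stops on EAGAIN (FIFO empty) or any other error. */
static int notify_drain(int fd)
{
	char buf[64];
	int total = 0;
	ssize_t r;

	while ((r = read(fd, buf, sizeof(buf))) > 0)
		total += (int)r;
	return total;
}
```

Several kicks can collapse into a single wakeup. The receiver only needs to know that "something changed", so this is harmless.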

   There are two issues with this approach.  The first is that neither guest 
can change its mappings.  See patch 1.  The second is that our feature 
configuration model is "host presents, guest chooses", which breaks down when
we don't know the capabilities of each guest.  In particular, TSO capability for
networking.

   There are three possible solutions:
1) Just offer the lowest common denominator to both sides (ie. no features). 
   This is what I do with lguest in these patches.
2) Offer something and handle the case where one Guest accepts and another
   doesn't by emulating it.  ie. de-TSO the packets manually.
3) Hot unplug the device from the guest which asks for the greater features,
   then re-add it offering less features.  Requires hotplug in the guest OS.
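Options 1 and 2 can be sketched with plain feature bitmasks. This assumes a 32-bit mask per Guest. `FEAT_TSO4`, `common_features()` and `needs_emulation()` are hypothetical names, not the real virtio-net feature bits or lguest functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit standing in for a TSO capability. */
#define FEAT_TSO4 (1u << 0)

/* Option 1: offer each side only what both are known to support.
 * If a Guest's capabilities are unknown, pass 0 and nothing is offered. */
static uint32_t common_features(uint32_t guest_a, uint32_t guest_b)
{
	return guest_a & guest_b;
}

/* Option 2: offer features optimistically, then check whether the
 * sender negotiated a feature the receiver did not accept; if so the
 * launcher would have to emulate it (e.g. segment TSO packets itself). */
static bool needs_emulation(uint32_t sender_acked, uint32_t receiver_acked,
			    uint32_t feature)
{
	return (sender_acked & feature) && !(receiver_acked & feature);
}
```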

I haven't tuned or even benchmarked these patches, but it pings!
Rusty.

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Rusty Russell
From: Paul TBBle Hampson [EMAIL PROTECTED]

This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory
mappings created by map_zeroed_pages.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 Documentation/lguest/lguest.c |   59 --
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -236,19 +236,51 @@ static int open_or_die(const char *name,
return fd;
 }
 
-/* map_zeroed_pages() takes a number of pages. */
+/* unlink_memfile() removes the backing file for the Guest's memory, if we exit
+ * cleanly. */
+static char memfile_path[PATH_MAX];
+
+static void unlink_memfile(void)
+{
+   unlink(memfile_path);
+}
+
+/* map_zeroed_pages() takes a number of pages, and creates a mapping file where
+ * this Guest's memory lives. */
 static void *map_zeroed_pages(unsigned int num)
 {
-   int fd = open_or_die("/dev/zero", O_RDONLY);
+   int fd;
void *addr;
 
-   /* We use a private mapping (ie. if we write to the page, it will be
-    * copied). */
+   /* We create a .lguest directory in the user's home, to put the memory
+    * files into. */
+   snprintf(memfile_path, PATH_MAX, "%s/.lguest", getenv("HOME") ?: "");
+   if (mkdir(memfile_path, S_IRWXU) != 0 && errno != EEXIST)
+           err(1, "Creating directory %s", memfile_path);
+
+   /* Name the memfiles by the process ID of this launcher. */
+   snprintf(memfile_path, PATH_MAX, "%s/.lguest/%u",
+            getenv("HOME") ?: "", getpid());
+   fd = open(memfile_path, O_RDWR | O_CREAT | O_TRUNC, S_IRWXU);
+   if (fd < 0)
+           err(1, "Creating memory backing file %s", memfile_path);
+
+   /* Make sure we remove it when we're finished. */
+   atexit(unlink_memfile);
+
+   /* Now, we opened it with O_TRUNC, so the file is 0 bytes long.  Here
+    * we expand it to the length we need, and it will be filled with
+    * zeroes. */
+   if (ftruncate(fd, num * getpagesize()) != 0)
+           err(1, "Truncating file %s %u pages", memfile_path, num);
+
+   /* We use a shared mapping, so others can share with us. */
addr = mmap(NULL, getpagesize() * num,
-   PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);
+   PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
err(1, "Mmaping %u pages of /dev/zero", num);
 
+   verbose("Memory backing file is %s @ %p\n", memfile_path, addr);
return addr;
 }
 
@@ -263,23 +295,12 @@ static void *get_pages(unsigned int num)
return addr;
 }
 
-/* This routine is used to load the kernel or initrd.  It tries mmap, but if
- * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries),
- * it falls back to reading the memory in. */
+/* This routine is used to load the kernel or initrd.  We used to mmap, but now
+ * we simply read it in, so it will be present in the shared underlying
+ * file. */
 static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
 {
ssize_t r;
-
-   /* We map writable even though for some segments are marked read-only.
-    * The kernel really wants to be writable: it patches its own
-    * instructions.
-    *
-    * MAP_PRIVATE means that the page won't be copied until a write is
-    * done to it.  This allows us to share untouched memory between
-    * Guests. */
-   if (mmap(addr, len, PROT_READ|PROT_WRITE|PROT_EXEC,
-            MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED)
-           return;
 
/* pread does a seek and a read in one shot: saves a few lines. */
r = pread(fd, addr, len, offset);
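This patch relies on two properties: a freshly ftruncate()d file reads back as zeroes, and two MAP_SHARED mappings of one file see each other's writes. Both can be checked in isolation with a minimal sketch. `map_backing_file()` is a hypothetical helper. It uses sysconf(_SC_PAGESIZE) in place of getpagesize() and drops PROT_EXEC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>   /* mkstemp() for callers */
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: size fd to num pages and map it shared. */
static void *map_backing_file(int fd, unsigned int num)
{
	size_t len = (size_t)num * (size_t)sysconf(_SC_PAGESIZE);
	void *addr;

	/* Extending a file with ftruncate() makes the new bytes read as 0. */
	if (ftruncate(fd, (off_t)len) != 0)
		return NULL;
	/* MAP_SHARED: writes go to the file, so every mapping sees them. */
	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return addr == MAP_FAILED ? NULL : addr;
}
```

If you call it twice on the same descriptor (for example one from mkstemp()), you get two different addresses whose contents stay in sync. That is what lets a second launcher map this Guest's memory.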



[kvm-devel] [RFC PATCH 2/5] lguest: Encapsulate Guest memory ready for dealing with other Guests.

2008-03-20 Thread Rusty Russell
We currently keep Guest memory pointer and size in globals.  We move
this into a structure and explicitly hand that to to_guest_phys() and
from_guest_phys() so we can deal with other Guests' memory.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 Documentation/lguest/lguest.c |   89 +++---
 1 file changed, 49 insertions(+), 40 deletions(-)

diff -r 95558c7d210e Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c Thu Mar 13 14:11:40 2008 +1100
+++ b/Documentation/lguest/lguest.c Thu Mar 13 23:05:35 2008 +1100
@@ -76,10 +76,20 @@ static bool verbose;
 
 /* The pipe to send commands to the waker process */
 static int waker_fd;
-/* The pointer to the start of guest memory. */
-static void *guest_base;
-/* The maximum guest physical address allowed, and maximum possible. */
-static unsigned long guest_limit, guest_max;
+
+struct guest_memory
+{
+   /* The pointer to the start of guest memory. */
+   void *base;
+   /* The maximum guest physical address allowed. */
+   unsigned long limit;
+};
+
+/* The maximum possible page for the guest. */
+static unsigned long guest_max;
+
+/* This Guest's memory. */
+static struct guest_memory gmem;
 
 /* a per-cpu variable indicating whose vcpu is currently running */
 static unsigned int __thread cpu_id;
@@ -207,20 +217,19 @@ static u8 *get_feature_bits(struct devic
  * will get you through this section.  Or, maybe not.
  *
  * The Launcher sets up a big chunk of memory to be the Guest's physical
- * memory and stores it in guest_base.  In other words, Guest physical ==
- * Launcher virtual with an offset.
+ * memory.  In other words, Guest physical == Launcher virtual with an offset.
  *
  * This can be tough to get your head around, but usually it just means that we
  * use these trivial conversion functions when the Guest gives us it's
  * physical addresses: */
-static void *from_guest_phys(unsigned long addr)
+static void *from_guest_phys(struct guest_memory *mem, unsigned long addr)
 {
-   return guest_base + addr;
+   return mem->base + addr;
 }
 
-static unsigned long to_guest_phys(const void *addr)
+static unsigned long to_guest_phys(struct guest_memory *mem, const void *addr)
 {
-   return (addr - guest_base);
+   return (addr - mem->base);
 }
 
 /*L:130
@@ -287,10 +296,10 @@ static void *map_zeroed_pages(unsigned i
 /* Get some more pages for a device. */
 static void *get_pages(unsigned int num)
 {
-   void *addr = from_guest_phys(guest_limit);
+   void *addr = from_guest_phys(&gmem, gmem.limit);
 
-   guest_limit += num * getpagesize();
-   if (guest_limit > guest_max)
+   if (gmem.limit > guest_max)
errx(1, "Not enough memory for devices");
return addr;
 }
@@ -351,7 +360,7 @@ static unsigned long map_elf(int elf_fd,
i, phdr[i].p_memsz, (void *)phdr[i].p_paddr);
 
/* We map this section of the file at its physical address. */
-   map_at(elf_fd, from_guest_phys(phdr[i].p_paddr),
+   map_at(elf_fd, from_guest_phys(&gmem, phdr[i].p_paddr),
   phdr[i].p_offset, phdr[i].p_filesz);
}
 
@@ -371,7 +380,7 @@ static unsigned long load_bzimage(int fd
struct boot_params boot;
int r;
/* Modern bzImages get loaded at 1M. */
-   void *p = from_guest_phys(0x100000);
+   void *p = from_guest_phys(&gmem, 0x100000);
 
/* Go back to the start of the file and read the header.  It should be
 * a Linux boot header (see Documentation/i386/boot.txt) */
@@ -444,7 +453,7 @@ static unsigned long load_initrd(const c
/* We map the initrd at the top of memory, but mmap wants it to be
 * page-aligned, so we round the size up for that. */
len = page_align(st.st_size);
-   map_at(ifd, from_guest_phys(mem - len), 0, st.st_size);
+   map_at(ifd, from_guest_phys(&gmem, mem - len), 0, st.st_size);
/* Once a file is mapped, you can close the file descriptor.  It's a
 * little odd, but quite useful. */
close(ifd);
@@ -473,7 +482,7 @@ static unsigned long setup_pagetables(un
linear_pages = (mapped_pages + ptes_per_page-1)/ptes_per_page;
 
/* We put the toplevel page directory page at the top of memory. */
-   pgdir = from_guest_phys(mem) - initrd_size - getpagesize();
+   pgdir = from_guest_phys(&gmem, mem) - initrd_size - getpagesize();
 
/* Now we use the next linear_pages pages as pte pages */
linear = (void *)pgdir - linear_pages*getpagesize();
@@ -487,16 +496,16 @@ static unsigned long setup_pagetables(un
/* The top level points to the linear page table pages above. */
for (i = 0; i < mapped_pages; i += ptes_per_page) {
pgdir[i/ptes_per_page]
-   = ((to_guest_phys(linear) + i*sizeof(void *))
+   = ((to_guest_phys(&gmem, 

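Because the memory description is now passed explicitly, the same helpers can translate addresses in another Guest's region. Below is a standalone sketch of that idea. The bounds-checked `from_guest_phys_checked()` is a hypothetical addition, loosely modelled on check_pointer(). In this sketch `limit` is treated as an exclusive end, and char pointers replace the GNU void-pointer arithmetic used in lguest.c.

```c
#include <assert.h>
#include <stddef.h>

struct guest_memory {
	void *base;		/* Launcher address of Guest physical 0 */
	unsigned long limit;	/* end of Guest physical memory (exclusive here) */
};

static void *from_guest_phys(struct guest_memory *mem, unsigned long addr)
{
	return (char *)mem->base + addr;
}

static unsigned long to_guest_phys(struct guest_memory *mem, const void *addr)
{
	return (unsigned long)((const char *)addr - (const char *)mem->base);
}

/* Hypothetical: NULL unless [addr, addr + size) lies below mem->limit.
 * Written as two comparisons so addr + size cannot overflow. */
static void *from_guest_phys_checked(struct guest_memory *mem,
				     unsigned long addr, unsigned long size)
{
	if (addr > mem->limit || size > mem->limit - addr)
		return NULL;
	return from_guest_phys(mem, addr);
}
```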
[kvm-devel] [RFC PATCH 3/5] lguest: separate out virtqueue info from device info.

2008-03-20 Thread Rusty Russell
To deal with another Guest's virtqueue, we need to separate out the
parts of the structure which deal with the actual virtqueue from
configuration information and the device.  Then we can change the
virtqueue descriptor handling functions to take that smaller
structure.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 Documentation/lguest/lguest.c |  142 ++
 1 file changed, 76 insertions(+), 66 deletions(-)

diff -r 49ed4fa72c7c Documentation/lguest/lguest.c
--- a/Documentation/lguest/lguest.c Mon Mar 17 15:33:54 2008 +1100
+++ b/Documentation/lguest/lguest.c Mon Mar 17 22:33:20 2008 +1100
@@ -148,6 +148,18 @@ struct device
 };
 
 /* The virtqueue structure describes a queue attached to a device. */
+struct virtqueue_info
+{
+   /* The memory this virtqueue sits in (usually gmem, our Guest). */
+   struct guest_memory *mem;
+
+   /* The actual ring of buffers. */
+   struct vring vring;
+
+   /* Last available index we saw. */
+   u16 last_avail_idx;
+};
+
 struct virtqueue
 {
struct virtqueue *next;
@@ -158,11 +170,8 @@ struct virtqueue
/* The configuration for this queue. */
struct lguest_vqconfig config;
 
-   /* The actual ring of buffers. */
-   struct vring vring;
-
-   /* Last available index we saw. */
-   u16 last_avail_idx;
+   /* Information about the Guest's virtqueue. */
+   struct virtqueue_info vqi;
 
/* The routine to call when the Guest pings us. */
void (*handle_output)(int fd, struct virtqueue *me);
@@ -656,7 +665,7 @@ static void *_check_pointer(unsigned lon
errx(1, %s:%i: Invalid address %#lx, __FILE__, line, addr);
/* We return a pointer for the caller's convenience, now we know it's
 * safe to use. */
-   return from_guest_phys(&gmem, addr);
+   return from_guest_phys(mem, addr);
 }
 /* A macro which transparently hands the line number to the real function. */
 #define check_pointer(mem,addr,size) _check_pointer(addr, size, mem, __LINE__)
@@ -664,20 +673,20 @@ static void *_check_pointer(unsigned lon
 /* Each buffer in the virtqueues is actually a chain of descriptors.  This
  * function returns the next descriptor in the chain, or vq->vring.num if we're
  * at the end. */
-static unsigned next_desc(struct virtqueue *vq, unsigned int i)
+static unsigned next_desc(struct virtqueue_info *vqi, unsigned int i)
 {
unsigned int next;
 
/* If this descriptor says it doesn't chain, we're done. */
-   if (!(vq->vring.desc[i].flags & VRING_DESC_F_NEXT))
-   return vq->vring.num;
+   if (!(vqi->vring.desc[i].flags & VRING_DESC_F_NEXT))
+   return vqi->vring.num;
 
/* Check they're not leading us off end of descriptors. */
-   next = vq->vring.desc[i].next;
+   next = vqi->vring.desc[i].next;
/* Make sure compiler knows to grab that: we don't want it changing! */
wmb();
 
-   if (next >= vq->vring.num)
+   if (next >= vqi->vring.num)
errx(1, "Desc next is %u", next);
 
return next;
@@ -688,29 +697,29 @@ static unsigned next_desc(struct virtque
  * number of output then some number of input descriptors, it's actually two
  * iovecs, but we pack them into one and note how many of each there were.
  *
- * This function returns the descriptor number found, or vq->vring.num (which
- * is never a valid descriptor number) if none was found. */
-static unsigned get_vq_desc(struct virtqueue *vq,
-   struct iovec iov[],
-   unsigned int *out_num, unsigned int *in_num)
+ * This function returns the descriptor number found, or -1 if none was
+ * found. */
+static int get_vq_desc(struct virtqueue_info *vqi,
+  struct iovec iov[],
+  unsigned int *out_num, unsigned int *in_num)
 {
unsigned int i, head;
 
/* Check it isn't doing very strange things with descriptor numbers. */
-   if ((u16)(vq->vring.avail->idx - vq->last_avail_idx) > vq->vring.num)
+   if ((u16)(vqi->vring.avail->idx - vqi->last_avail_idx) > vqi->vring.num)
errx(1, "Guest moved used index from %u to %u",
-vq->last_avail_idx, vq->vring.avail->idx);
+vqi->last_avail_idx, vqi->vring.avail->idx);
 
/* If there's nothing new since last we looked, return invalid. */
-   if (vq->vring.avail->idx == vq->last_avail_idx)
-   return vq->vring.num;
+   if (vqi->vring.avail->idx == vqi->last_avail_idx)
+   return -1;
 
/* Grab the next descriptor number they're advertising, and increment
 * the index we've seen. */
-   head = vq->vring.avail->ring[vq->last_avail_idx++ % vq->vring.num];
+   head = vqi->vring.avail->ring[vqi->last_avail_idx++ % vqi->vring.num];
 
/* If their number is silly, that's a fatal mistake. */
-   if (head >= vq->vring.num)
+   if (head >= vqi->vring.num)
   
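The index handling in get_vq_desc() can be exercised on its own. The sketch below uses a hypothetical, simplified ring: `struct mini_vring` is not the <linux/virtio_ring.h> layout, and there are no memory barriers. It keeps the same two checks, though: the free-running u16 index comparison and the head range check. It returns -1 when nothing is pending, as the patched function does, and -2 where lguest would call errx().

```c
#include <assert.h>
#include <stdint.h>

#define MINI_RING_MAX 8		/* vring.num must not exceed this */

/* Hypothetical, simplified stand-in for struct vring. */
struct mini_vring {
	unsigned int num;		/* ring size */
	uint16_t avail_idx;		/* free-running index written by the Guest */
	uint16_t ring[MINI_RING_MAX];	/* descriptor heads offered by the Guest */
};

struct mini_vq_info {
	struct mini_vring vring;
	uint16_t last_avail_idx;	/* free-running index we have consumed */
};

static int next_avail_head(struct mini_vq_info *vqi)
{
	/* u16 subtraction copes with the indices wrapping past 65535. */
	uint16_t pending = (uint16_t)(vqi->vring.avail_idx - vqi->last_avail_idx);
	unsigned int head;

	if (pending > vqi->vring.num)
		return -2;	/* Guest moved the index impossibly far */
	if (pending == 0)
		return -1;	/* nothing new since we last looked */

	head = vqi->vring.ring[vqi->last_avail_idx++ % vqi->vring.num];
	if (head >= vqi->vring.num)
		return -2;	/* silly descriptor number */
	return (int)head;
}
```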

Re: [kvm-devel] [kvm-ia64-devel] Cross-arch support for make sync in userspace

2008-03-20 Thread Zhang, Xiantao
Avi Kivity wrote:
> Zhang, Xiantao wrote:
>> Hi, Avi
>> Currently, make sync in userspace only syncs x86-specific heads from
>> kernel source due to hard-coded in Makefile.
>> Do you have plan to provide cross-arch support for that?
>
> No plans.  I'll apply patches though.  But don't you need kernel
> changes which make it impossible to run kvm-ia64 on older kernels?
>
>> Other archs may
>> need it for save/restore :)
>
> Save/restore?  Don't understand.

You know, make sync currently syncs the header files to userspace only from
include/asm-x86/, so kvm.h and kvm_host.h are always taken from there,
whatever the arch. Since the arch-specific definitions for save/restore
belong in include/asm-$arch/kvm.h and include/asm-$arch/kvm_host.h, ia64
and other archs will need make sync to handle them once they implement
save/restore.
Xiantao



Re: [kvm-devel] [kvm-ia64-devel] Cross-arch support for make sync in userspace

2008-03-20 Thread Avi Kivity
Zhang, Xiantao wrote:
> Avi Kivity wrote:
>> Zhang, Xiantao wrote:
>>> Avi Kivity wrote:
>>>> Zhang, Xiantao wrote:
>>>>> Hi, Avi
>>>>> Currently, make sync in userspace only syncs x86-specific heads
>>>>> from kernel source due to hard-coded in Makefile.
>>>>> Do you have plan to provide cross-arch support for that?
>>>>
>>>> No plans.  I'll apply patches though.  But don't you need kernel
>>>> changes which make it impossible to run kvm-ia64 on older kernels?
>>>>
>>>>> Other archs may
>>>>> need it for save/restore :)
>>>>
>>>> Save/restore?  Don't understand.
>>>
>>> You know, currently make sync would sync header files to userspace
>>> from include/asm-x86/, so kvm.h and kvm_host.h are always synced
>>> from there for any archs. Since some arch-specific stuff for
>>> save/restore should be defined in include/asm-$arch/(kvm.h;
>>> kvm_host.h), so ia64 or other archs should need it when they
>>> implement save/restore.
>>
>> I see.  But is 'make sync' actually useful for you?  Can you run
>> kvm-ia64 on top of 2.6.24, which doesn't include your ia64 core API
>> changes?
>
> Now we don't intend to provide support for kernel which is older than
> 2.6.24. And we don't want to compile kernel module in userspace.
> But at least we need to ensure make sync work first, because we need
> it to guarantee Qemu to use right header files for its compilation.
> Xiantao

I see.  ./configure --with-patched-kernel should work for that, but I 
have no issue with copying include/asm-ia64 either.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.




Re: [kvm-devel] [kvm-ia64-devel] Cross-arch support for make sync in userspace

2008-03-20 Thread Zhang, Xiantao
Avi Kivity wrote:
> Zhang, Xiantao wrote:
>> Avi Kivity wrote:
>
> I see.  ./configure --with-patched-kernel should work for that, but I
> have no issue with copying include/asm-ia64 either.

Copying would be ugly, since it needs extra documentation to describe.
If --with-patched-kernel can call a script, that would be fine as well.
Xiantao



Re: [kvm-devel] [Lguest] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Tim Post
On Thu, 2008-03-20 at 17:05 +1100, Rusty Russell wrote:
> +   snprintf(memfile_path, PATH_MAX, "%s/.lguest", getenv("HOME") ?: "");

Hi Rusty,

Is that safe if it is run via setuid/setgid or from a shared root account?
It might be better to just look the home directory up in /etc/passwd
against the real UID, considering that anyone can change (or null) that
environment variable.

Of course it's also practical to just say "DON'T RUN LGUEST AS
SETUID/GID". Even if you say that, someone will do it. You might also
add "beware of sudoers".

For people (like myself and lab mates) who are forced to share machines,
it could breed a whole new strain of practical jokes :)

Looking it up with getpwuid() will cause lguest to inherit a memory leak,
but it only leaks once.

Cheers,
--Tim
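Tim's suggestion might look like the sketch below, assuming POSIX getpwuid_r() is available. The directory comes from the password entry for the real UID, so $HOME is ignored. The reentrant call fills a caller-supplied buffer instead of static storage. `lguest_home_dir()` and the 4096-byte scratch size are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the real UID's home directory from the
 * password database into out. Returns 0 on success, -1 on failure
 * (no entry, lookup error, or out too small). */
static int lguest_home_dir(char *out, size_t outlen)
{
	struct passwd pw, *result = NULL;
	char scratch[4096];	/* assumed large enough for one entry */

	if (getpwuid_r(getuid(), &pw, scratch, sizeof(scratch), &result) != 0
	    || result == NULL)
		return -1;
	if (strlen(pw.pw_dir) >= outlen)
		return -1;
	strcpy(out, pw.pw_dir);
	return 0;
}
```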





Re: [kvm-devel] kvm-guest-drivers patch for RHEL4

2008-03-20 Thread Avi Kivity
david ahern wrote:
> Backport of the virtio drivers to RHEL4.
>
> The patch applies against the kvm-guest-drivers-linux-1 release but
> also contains diffs for Anthony's spin_lock_irqsave/restore patch. Of
> note is that to build for RHEL4 Makefile is renamed to Makefile-2.6 so
> that Makefile can contain the build rules for the modules. RHEL4
> (AFAIK) does not contain the Kbuild stuff. The make command is then
> make -f Makefile-2.6


Looks much better than I feared; the #ifdef COMPAT_RHEL4 sections are quite small.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] PATCH: dont call exit() from pci_nic_init(), let caller handle

2008-03-20 Thread Marcelo Tosatti
On Wed, Mar 19, 2008 at 07:19:51PM -0500, Ryan Harper wrote:
> While exploring the PCI hotplug code recently posted, I encountered a
> situation where I don't believe the current behavior is ideal.  With
> hotplug, we can add additional pci-based nic devices like e1000 and
> rtl8139 from the qemu monitor.  If one mistakenly specifies model=ne2000
> (the ISA version), qemu just exits.  If a command is run from the
> monitor and specifies bogus values, I don't believe the right behavior
> is to exit out of the guest entirely.  The attached patch (which doesn't
> apply directly against qemu-cvs since hotplug hasn't been merged)
> changes pci_nic_init() to return NULL on error instead of exiting
> and then I've replaced all callers to check the return value and exit(),
> preserving the existing behavior, but allowing flexibility so
> hotplug can do the right thing and just report the error rather than
> exiting the guest.

Hi Ryan,

Looks good, thanks.

There might still be some exit()s lurking around for device/cpu hot-add
failures.

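The pattern Ryan describes can be sketched as below: the init function reports failure, and each caller decides whether that failure is fatal. `struct nic`, `nic_init()` and the model table are hypothetical stand-ins, not QEMU's pci_nic_init() signature.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct nic {
	const char *model;
};

/* Hypothetical: the models that can sit on the PCI bus here. */
static const char *const pci_models[] = { "e1000", "rtl8139", NULL };

/* Returns NULL instead of calling exit() for an unsupported model. */
static struct nic *nic_init(const char *model)
{
	int i;

	for (i = 0; pci_models[i] != NULL; i++) {
		if (strcmp(model, pci_models[i]) == 0) {
			struct nic *n = calloc(1, sizeof(*n));

			if (n)
				n->model = pci_models[i];
			return n;	/* NULL here means out of memory */
		}
	}
	return NULL;	/* not a PCI model, e.g. "ne2000" */
}

/* Startup path: keep the old behaviour and exit on failure. */
static struct nic *nic_init_or_exit(const char *model)
{
	struct nic *n = nic_init(model);

	if (!n) {
		fprintf(stderr, "Unsupported NIC model: %s\n", model);
		exit(1);
	}
	return n;
}

/* Monitor/hotplug path: report the error and keep the guest running. */
static struct nic *hotplug_nic(const char *model)
{
	struct nic *n = nic_init(model);

	if (!n)
		fprintf(stderr, "hotplug: cannot add NIC model %s\n", model);
	return n;
}
```

A bad monitor command now fails only that one request, while command-line setup still exits as before.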



[kvm-devel] [ kvm-Bugs-1920897 ] KVM Guest Drivers fail on Windows

2008-03-20 Thread SourceForge.net
Bugs item #1920897, was opened at 2008-03-20 14:01
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1920897&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM Guest Drivers fail on Windows

Initial Comment:
KVM Guest Drivers v1 fail on Windows XP (SP2 ACPI) - it basically halts the VM 
as soon as the drivers get installed.

-Alexey, 20.3.2008.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1920897&group_id=180599



[kvm-devel] [ kvm-Bugs-1920900 ] KVM Guest Drivers fail on Linux

2008-03-20 Thread SourceForge.net
Bugs item #1920900, was opened at 2008-03-20 14:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1920900&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM Guest Drivers fail on Linux

Initial Comment:
The newly released kvm-guest-drivers-linux fail on stage 1: make sync

Tested on openSUSE 10.3 32-bit guest. (gcc+kernel-source installed)

-Alexey

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1920900&group_id=180599



Re: [kvm-devel] KVM: MMU: add KVM_ZAP_GFN ioctl

2008-03-20 Thread Avi Kivity
Marcelo Tosatti wrote:
 Add an ioctl to zap all mappings to a given gfn. This allows userspace to
 remove the QEMU process mappings and the page without causing
 inconsistency.

   

I'm thinking of committing rmap_nuke() to kvm.git, and the rest to the 
external module, since this is only needed on kernels without mmu notifiers.

Andrea, is rmap_nuke() suitable for the mmu notifiers pte clear callback?


Oh, and a single gfn may have multiple hvas, so we need to iterate over 
something here.

 Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]


 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index f0cdfba..c41464f 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -642,6 +642,67 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
   account_shadowed(kvm, gfn);
  }
  
 +static void rmap_nuke(struct kvm *kvm, u64 gfn)
 +{
 + unsigned long *rmapp;
 + u64 *spte;
 + int nuked = 0;
 +
 + gfn = unalias_gfn(kvm, gfn);
 + rmapp = gfn_to_rmap(kvm, gfn, 0);
 +
 + spte = rmap_next(kvm, rmapp, NULL);
 + while (spte) {
 + BUG_ON(!spte);
 + BUG_ON(!(*spte & PT_PRESENT_MASK));
 + rmap_printk("rmap_nuke: spte %p %llx\n", spte, *spte);
 + rmap_remove(kvm, spte);
 + set_shadow_pte(spte, shadow_trap_nonpresent_pte);
 + nuked = 1;
 + spte = rmap_next(kvm, rmapp, spte);
 + }
 + /* check for huge page mappings */
 + rmapp = gfn_to_rmap(kvm, gfn, 1);
 + spte = rmap_next(kvm, rmapp, NULL);
 + while (spte) {
 + BUG_ON(!spte);
 + BUG_ON(!(*spte & PT_PRESENT_MASK));
 + BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
 + pgprintk("rmap_nuke(large): spte %p %llx %lld\n", spte, *spte, gfn);
 + rmap_remove(kvm, spte);
 + --kvm->stat.lpages;
 + set_shadow_pte(spte, shadow_trap_nonpresent_pte);
 + nuked = 1;
 + spte = rmap_next(kvm, rmapp, spte);
 + }
 +
 + if (nuked)
 + kvm_flush_remote_tlbs(kvm);
 +}
 +
 +int kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn)
 +{
 + unsigned long addr;
 + int have_mmu_notifiers = 0;
 +
 + down_read(&kvm->slots_lock);
 + addr = gfn_to_hva(kvm, gfn);
 +
 + if (kvm_is_error_hva(addr)) {
 + up_read(&kvm->slots_lock);
 + return -EINVAL;
 + }
 +
 + if (!have_mmu_notifiers) {
 + spin_lock(&kvm->mmu_lock);
 + rmap_nuke(kvm, gfn);
 + spin_unlock(&kvm->mmu_lock);
 + }
 + up_read(&kvm->slots_lock);
 +
 + return 0;
 +}
 +
  #ifdef MMU_DEBUG
  static int is_empty_shadow_page(u64 *spt)
  {
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index e65a9d6..d982ca1 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -816,6 +816,9 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_NR_MEMSLOTS:
   r = KVM_MEMORY_SLOTS;
   break;
 + case KVM_CAP_ZAP_GFN:
 + r = 1;
 + break;
   default:
   r = 0;
   break;
 @@ -1636,6 +1639,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
   r = 0;
   break;
   }
 + case KVM_ZAP_GFN: {
 + gfn_t gfn;
 +
 + r = -EFAULT;
 + if (copy_from_user(&gfn, argp, sizeof gfn))
 + goto out;
 + r = kvm_zap_single_gfn(kvm, gfn);
 + break;
 + }
   default:
   ;
   }
 diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
 index 024b57c..4e45bd2 100644
 --- a/include/asm-x86/kvm_host.h
 +++ b/include/asm-x86/kvm_host.h
 @@ -425,6 +425,7 @@ void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
  void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
  void kvm_mmu_zap_all(struct kvm *kvm);
 +int  kvm_zap_single_gfn(struct kvm *kvm, gfn_t gfn);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
  
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index e92e703..9ea714f 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -236,6 +236,7 @@ struct kvm_vapic_addr {
  #define KVM_CAP_CLOCKSOURCE 8
  #define KVM_CAP_NR_VCPUS 9   /* returns max vcpus per vm */
  #define KVM_CAP_NR_MEMSLOTS 10   /* returns max memory slots per vm */
 +#define KVM_CAP_ZAP_GFN  11
  
  /*
   * ioctls for VM fds
 @@ -258,6 +259,7 @@ struct kvm_vapic_addr {
  #define KVM_IRQ_LINE   _IOW(KVMIO, 0x61, struct kvm_irq_level)
  #define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip)
  #define KVM_SET_IRQCHIP _IOR(KVMIO,  0x63, struct kvm_irqchip)
 +#define KVM_ZAP_GFN     _IOR(KVMIO,  0x64, unsigned long)
  
  /*
   * ioctls for vcpu 

Re: [kvm-devel] KVM Test result, kernel f1080a0.., userspace 49cf2d2..

2008-03-20 Thread Avi Kivity
Yunfeng Zhao wrote:

> Following issues fixed:
> 1. qcow based smp linux guests likely hang
> https://sourceforge.net/tracker/index.php?func=detail&aid=1901980&group_id=180599&atid=893831
>
> 2. smp windows installer crashes while rebooting
> https://sourceforge.net/tracker/index.php?func=detail&aid=1877875&group_id=180599&atid=893831

No idea how these were fixed.

> 3. Timer of guest is inaccurate
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1826080&group_id=180599

This may be the in-kernel pit.

> 4. Installer of 64bit vista guest will pause for ten minutes after reboot
> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1836905&group_id=180599

The pit again?!

Confused.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Anthony Liguori
Avi Kivity wrote:
> Rusty Russell wrote:
>> Hi all,
>>
>>    Just finished my prototype of inter-guest virtio, using networking as an
>> example.  Each guest mmaps the other's address space and uses a FIFO for
>> notifications.
>
> Isn't that a security hole (hole? chasm)?  If the two guests can access
> each other's memory, they might as well be just one guest, and
> communicate internally.

Each guest's host userspace mmaps the other guest's address space.  The 
userspace then does a copy on both the tx and rx paths.

Conceivably, this could be done as a read-only mapping so that each 
guest userspace copies only the rx packets.  That's about as secure as 
you're going to get with this approach, I think.

Regards,

Anthony Liguori

> My feeling is that the host needs to copy the data, using dma if
> available.  Another option is to have one guest map the other's memory
> for read and write, while the other guest is unprivileged.  This allows
> one privileged guest to provide services for other, unprivileged guests,
> like domain 0 or driver domains in Xen.

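Anthony's read-only variant could look roughly like this sketch. The launcher maps the peer's backing file with PROT_READ only, then copies an rx buffer out after checking that it lies inside the peer's memory. The function names and the (offset, length) description of a buffer are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>   /* mkstemp() for callers */
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct peer_mem {
	const unsigned char *base;	/* read-only view of the peer's RAM */
	size_t len;
};

/* Hypothetical: map another Guest's backing file read-only. */
static int peer_map_ro(const char *path, struct peer_mem *pm)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) != 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	if (p == MAP_FAILED)
		return -1;
	pm->base = p;
	pm->len = (size_t)st.st_size;
	return 0;
}

/* Copy len bytes at peer offset off into dst, rejecting any range that
 * falls outside the peer's memory. Returns 0 on success, -1 otherwise. */
static int peer_copy_rx(const struct peer_mem *pm, size_t off, size_t len,
			void *dst)
{
	if (off > pm->len || len > pm->len - off)
		return -1;
	memcpy(dst, pm->base + off, len);
	return 0;
}
```

Even with a buggy or hostile peer, the worst this side can do is read inside the mapping. Writes to the peer are impossible by construction.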
   




Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Anthony Liguori
Rusty Russell wrote:
> From: Paul TBBle Hampson [EMAIL PROTECTED]
>
> This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory
> mappings created by map_zeroed_pages.

I created a test program recently that measured the latency of reads/writes
to an mmap()'d file in /dev/shm and in a normal filesystem.  Even after
unlinking the underlying file, the write latency was much better with an
mmap()'d file in /dev/shm.

/dev/shm is not really for general use.  I think we'll want to have our 
own tmpfs mount that we use to create VM images.  I also prefer to use a 
unix socket for communication, unlink the file immediately after open, 
and then pass the fd via SCM_RIGHTS to the other process.
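
A minimal sketch of the pattern described above, assuming a Linux host: create the backing file, unlink it immediately so it has no name left in the filesystem, then hand the open descriptor to the peer process over a Unix-domain socket as SCM_RIGHTS ancillary data. The helper names (create_unlinked_file, send_fd, recv_fd) and the template path are hypothetical; neither lguest nor QEMU is claimed to use this code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Create a file from a mkstemp() template (e.g. on a dedicated tmpfs mount),
 * size it, and unlink it at once: the memory stays reachable through the
 * descriptor but has no name left in the filesystem. Returns fd or -1. */
static int create_unlinked_file(char *template, off_t size)
{
	int fd = mkstemp(template);

	if (fd < 0)
		return -1;
	if (unlink(template) < 0 || ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Send one descriptor as SCM_RIGHTS ancillary data. At least one byte of
 * ordinary data has to accompany it. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(u.buf, 0, sizeof(u.buf));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(). The kernel installs a new
 * descriptor in this process referring to the same open file.
 * Returns it, or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Usage: the launcher calls create_unlinked_file() on its tmpfs mount, passes the descriptor with send_fd() over a socketpair() or connected AF_UNIX socket, and the receiver mmap()s the descriptor returned by recv_fd(). The memory is released once the last descriptor and mapping go away, so nothing is left behind if either process dies.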

Regards,

Anthony Liguori


Re: [kvm-devel] [Lguest] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Paul TBBle Hampson
On Thu, Mar 20, 2008 at 04:16:00PM +0800, Tim Post wrote:
 On Thu, 2008-03-20 at 17:05 +1100, Rusty Russell wrote:
 +   snprintf(memfile_path, PATH_MAX, "%s/.lguest",
 getenv("HOME") ?: "");

 Hi Rusty,

 Is that safe if being run via setuid/gid or shared root? It might be
 better to just look it up in /etc/passwd against the real UID,
 considering that anyone can change (or null) that env string.

 Of course it's also practical to just say DON'T RUN LGUEST AS
 SETUID/GID. Even if you say that, someone will do it. You might also
 add "beware of sudoers".

 For people (like myself and lab mates) who are forced to share machines,
 it could breed a whole new strain of practical jokes :)

I'm not sure I see the risk here. Surely not anyone can modify your
environment variables out from under you?

Are you worried that other root users are going to point root's .lguest
directory somewhere else, but not the non-root user's directory?

I fear I'm missing something here...

There _is_ an issue I hadn't thought of at the time, which is if your
$HOME is on shared media, and you clash PIDs between lguest launchers on
two machines sharing that media as $HOME, you're going to clash
memfiles, specifically truncating the earlier memfile.  
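
One way to address both concerns, sketched under assumptions: take the home directory from the password entry of the real UID (getpwuid(getuid())) instead of $HOME, and put the hostname as well as the PID into the memfile name so launchers on machines sharing a home directory cannot collide. The helper name memfile_path and the "<host>-<pid>" naming scheme are hypothetical, not what the patch does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>	/* PATH_MAX, for callers' buffers */
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<home>/.lguest/<hostname>-<pid>" in buf.
 * The home directory comes from the password database entry for the real
 * UID, so a modified or empty $HOME (or a setuid launch) cannot redirect it.
 * Returns 0 on success, -1 on lookup failure or truncation. */
static int memfile_path(char *buf, size_t len)
{
	struct passwd *pw = getpwuid(getuid());
	char host[256];
	int n;

	if (!pw || !pw->pw_dir)
		return -1;
	if (gethostname(host, sizeof(host)) < 0)
		return -1;
	/* POSIX does not promise NUL termination if the name was truncated. */
	host[sizeof(host) - 1] = '\0';

	n = snprintf(buf, len, "%s/.lguest/%s-%ld",
		     pw->pw_dir, host, (long)getpid());
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Usage: `char path[PATH_MAX]; if (memfile_path(path, sizeof(path)) < 0) ...`. Whether a hostname is unique enough across machines sharing $HOME is itself an assumption.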

(Sorry for the double-up, lguest list. I hit send too quickly)

-- 
---
Paul TBBle Hampson, B.Sc, LPI, MCSE
Very-later-year Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
[EMAIL PROTECTED]

Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
 -- Kristian Wilson, Nintendo, Inc, 1989

License: http://creativecommons.org/licenses/by/2.1/au/
---


Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Anthony Liguori
Rusty Russell wrote:
 Hi all,

Just finished my prototype of inter-guest virtio, using networking as an 
 example.  Each guest mmaps the other's address space and uses a FIFO for 
 notifications.

There are two issues with this approach.  The first is that neither guest 
 can change its mappings.  See patch 1.

Avi mentioned that with MMU notifiers, it may be possible to introduce a 
new kernel mechanism whereby you could map an arbitrary region of one 
process's memory into another process.  This would address this problem 
quite nicely.

   The second is that our feature 
 configuration is host presents, guest chooses which breaks down when we 
 don't know the capabilities of each guest.  In particular, TSO capability for 
 networking.
There are three possible solutions:
 1) Just offer the lowest common denominator to both sides (ie. no features). 
This is what I do with lguest in these patches.
 2) Offer something and handle the case where one Guest accepts and another
doesn't by emulating it.  ie. de-TSO the packets manually.
 3) Hot unplug the device from the guest which asks for the greater features,
then re-add it offering less features.  Requires hotplug in the guest OS.
   
4) Add a feature negotiation feature.  The feature that gets set is the 
feature negotiate feature.  If a guest doesn't support feature 
negotiation, you end up with the least-common denominator (no 
features).  If both guests support feature negotiation, you can then add 
something new to determine the true common subset.
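
One possible reading of option 4, as a sketch: both guests are offered a hypothetical FEAT_NEGOTIATE bit alongside the real features, and the host only honours the intersection of what they accepted when both guests acknowledged that bit. The bit names and values are illustrative and are not virtio's actual feature numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, not real virtio-net bit numbers. */
#define FEAT_NEGOTIATE	(1u << 0)	/* guest understands a second round */
#define FEAT_TSO	(1u << 1)
#define FEAT_CSUM	(1u << 2)

/* Given what each guest accepted, return what both may actually use.
 * If either guest ignored FEAT_NEGOTIATE, fall back to the lowest common
 * denominator (no features); otherwise use the intersection. */
static uint32_t common_features(uint32_t accepted_a, uint32_t accepted_b)
{
	if (!(accepted_a & FEAT_NEGOTIATE) || !(accepted_b & FEAT_NEGOTIATE))
		return 0;
	return accepted_a & accepted_b;
}
```

An old guest that never sets FEAT_NEGOTIATE therefore degrades both sides to option 1, while two new guests can agree on, for example, checksum offload without TSO.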

 I haven't tuned or even benchmarked these patches, but it pings!
   

Very nice!  It's particularly cool that it was possible entirely in 
userspace.

Regards,

Anthony Liguori

 Rusty.


Re: [kvm-devel] [Qemu-devel] PATCH: dont call exit() from pci_nic_init(), let caller handle

2008-03-20 Thread Ryan Harper
* Avi Kivity [EMAIL PROTECTED] [2008-03-20 07:19]:
 Ryan Harper wrote:
  While exploring the PCI hotplug code recently posted, I encountered a
  situation where I don't believe the current behavior is ideal.  With
  hotplug, we can add additional pci-based nic devices like e1000 and
  rtl8139 from the qemu monitor.  If one mistakenly specifies model=ne2000
  (the ISA version), qemu just exits.  If a command is run from the
  monitor and specifies bogus values, I don't believe the right behavior
  is to exit out of the guest entirely.  The attached patch (which doesn't
  apply directly against qemu-cvs since hotplug hasn't been merged)
  changes pci_nic_init() to return NULL on error instead of exiting
  and then I've replaced all callers to check the return value and exit(),
  preserving the existing behavior, but allowing flexibility so
  hotplug can do the right thing and just report the error rather than
  exiting the guest.
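
A minimal sketch of the calling convention described above, with hypothetical names (struct nic, nic_init_sketch) standing in for QEMU's real types and for pci_nic_init() itself: the init function reports the problem and returns NULL, the command-line path still exits, and the monitor path only reports the error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PCI NIC device; not QEMU's PCIDevice. */
struct nic {
	const char *model;
};

/* On a bad model, report it and return NULL instead of calling exit(),
 * leaving the error policy to the caller. */
static struct nic *nic_init_sketch(const char *model)
{
	static const char *const pci_models[] = { "e1000", "rtl8139", NULL };
	struct nic *nic;
	int i;

	for (i = 0; pci_models[i]; i++) {
		if (strcmp(model, pci_models[i]) != 0)
			continue;
		nic = malloc(sizeof(*nic));
		if (nic)
			nic->model = pci_models[i];
		return nic;
	}
	fprintf(stderr, "unsupported PCI NIC model '%s'\n", model);
	return NULL;
}

/* Command-line (boot) caller: keeps the old behaviour and exits. */
static struct nic *nic_init_or_exit(const char *model)
{
	struct nic *nic = nic_init_sketch(model);

	if (!nic)
		exit(1);
	return nic;
}

/* Monitor (hotplug) caller: reports the failure; the guest keeps running. */
static int monitor_add_nic(const char *model)
{
	struct nic *nic = nic_init_sketch(model);

	if (!nic) {
		fprintf(stderr, "NIC hotplug failed\n");
		return -1;
	}
	/* A real implementation would attach nic to the bus here; the
	 * sketch just releases it. */
	free(nic);
	return 0;
}
```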
 

 
 Applied, thanks.
 
 [this didn't make it to kvm-devel for some reason?]

Yeah, not sure about that, sometimes it gets clogged in our outgoing
system; they tend to not get along with some servers for unknown reasons
to me.  It has worked in the past for me.  *shrugs*

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
[EMAIL PROTECTED]


Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Avi Kivity
Anthony Liguori wrote:
 Avi Kivity wrote:
 Rusty Russell wrote:
 Hi all,

Just finished my prototype of inter-guest virtio, using 
 networking as an example.  Each guest mmaps the other's address 
 space and uses a FIFO for notifications.


 Isn't that a security hole (hole? chasm)?  If the two guests can 
 access each other's memory, they might as well be just one guest, and 
 communicate internally.

 Each guest's host userspace mmaps the other guest's address space.  
 The userspace then does a copy on both the tx and rx paths.


Well, that's better security-wise (I'd still prefer to avoid it, so we 
can run each guest under a separate uid), but then we lose performance wise.

 Conceivably, this could be done as a read-only mapping so that each 
 guest userspace copies only the rx packets.  That's about as secure as 
 you're going to get with this approach I think.


Maybe we can terminate the virtio queue in the host kernel as a pipe, 
and splice pipes together.

That gives us guest-guest and guest-process communications, and if you 
use aio the kernel can use a dma engine for the copy.
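
For readers unfamiliar with splice(), a minimal userspace illustration of the primitive, assuming a Linux kernel that supports splicing directly between two pipes: bytes queued in one pipe are moved into another without passing through a user buffer. This is not the in-kernel virtio termination being proposed, only the syscall such a design would build on; pipe_to_pipe is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Move up to len bytes already queued in the pipe read end 'from_rd' into
 * the pipe write end 'to_wr'. The kernel transfers the pipe buffers itself;
 * the data never passes through a userspace buffer.
 * Returns the number of bytes moved, or -1 on error. */
static ssize_t pipe_to_pipe(int from_rd, int to_wr, size_t len)
{
	return splice(from_rd, NULL, to_wr, NULL, len, SPLICE_F_MOVE);
}
```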

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [Lguest] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Paul TBBle Hampson
On Thu, Mar 20, 2008 at 09:04:17AM -0500, Anthony Liguori wrote:
 Rusty Russell wrote:
 From: Paul TBBle Hampson [EMAIL PROTECTED]

 This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory
 mappings created by map_zeroed_pages.

 I created a test program recently that measured the latency of reads/writes
 to an mmap() file in /dev/shm and in a normal filesystem.  Even after
 unlinking the underlying file, the write latency was much better with a
 mmap()'d file in /dev/shm.

 /dev/shm is not really for general use.  I think we'll want to have our own
 tmpfs mount that we use to create VM images.  I also prefer to use a unix
 socket for communication, unlink the file immediately after open, and then
 pass the fd via SCM_RIGHTS to the other process.

The original motivations for the file-backed mmap (rather than the
/dev/zero mmap) were two-fold.

Firstly, to allow suspend and resume to be done to a guest, it would
need somewhere for its memory to survive. (ie. a guest could be
suspended externally immediately, and its state would be resumable from
that mmap file)

Secondly, heading towards some kind of common-page-sharing trick, where
each lguest could spot and share pages in common with other lguests.

Both of these assume the file is going to be visible in the filesystem
until the guest is shut down.
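
A minimal sketch of the file-backed mapping these motivations rely on, assuming POSIX mmap(): with MAP_SHARED, guest stores go to the file's page cache, so the contents survive munmap() and process exit and can be mapped again on resume or by another process. The helper name and path handling are hypothetical; surviving a host crash would additionally need msync().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map 'size' bytes of guest memory backed by the file at 'path'.
 * A newly created file reads as zeroes, like the /dev/zero mapping it
 * replaces; an existing file keeps its contents (the resume case).
 * Returns the mapping, or MAP_FAILED. */
static void *map_backed_pages(const char *path, size_t size)
{
	void *addr;
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return MAP_FAILED;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return MAP_FAILED;
	}
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping holds its own reference to the file */
	return addr;
}
```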

As to whether these are still interesting motivations, I withhold any
opinion in favour of those who know better. ^_^

-- 
---
Paul TBBle Hampson, B.Sc, LPI, MCSE
Very-later-year Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
[EMAIL PROTECTED]

Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
 -- Kristian Wilson, Nintendo, Inc, 1989

License: http://creativecommons.org/licenses/by/2.1/au/
---


Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Anthony Liguori
Avi Kivity wrote:
 Anthony Liguori wrote:
 Avi Kivity wrote:

 Each guest's host userspace mmaps the other guest's address space.  
 The userspace then does a copy on both the tx and rx paths.


 Well, that's better security-wise (I'd still prefer to avoid it, so we 
 can run each guest under a separate uid), but then we lose performance 
 wise.

What performance win?  I'm not sure the copies can be eliminated in the 
case of interguest IO.

Fast interguest IO means mmap()'ing the other guest's address space 
read-only.  If you had a pv dma registration api you could conceivably 
only allow the active dma entries to be mapped but my fear would be that 
the zap'ing on unregister would hurt performance.

 Conceivably, this could be done as a read-only mapping so that each 
 guest userspace copies only the rx packets.  That's about as secure 
 as you're going to get with this approach I think.


 Maybe we can terminate the virtio queue in the host kernel as a pipe, 
 and splice pipes together.

 That gives us guest-guest and guest-process communications, and if you 
 use aio the kernel can use a dma engine for the copy.

Ah, so you're looking to use a DMA engine for accelerated copy.  Perhaps 
the answer is to expose the DMA engine via a userspace API?

Regards,

Anthony Liguori


Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Avi Kivity
Anthony Liguori wrote:
 Avi Kivity wrote:
 Anthony Liguori wrote:
 Avi Kivity wrote:

 Each guest's host userspace mmaps the other guest's address space.  
 The userspace then does a copy on both the tx and rx paths.


 Well, that's better security-wise (I'd still prefer to avoid it, so 
 we can run each guest under a separate uid), but then we lose 
 performance wise.

 What performance win?  I'm not sure the copies can be eliminated in 
 the case of interguest IO.


I guess not.  But at least you can dma instead of busy-copying.

 Fast interguest IO means mmap()'ing the other guest's address space 
 read-only.  

This implies trusting the other userspace, which is not a good thing.  
Let the kernel copy, we already trust it, and it has more resources to 
do the copy.

 If you had a pv dma registration api you could conceivably only allow 
 the active dma entries to be mapped but my fear would be that the 
 zap'ing on unregister would hurt performance.


Yes, mmu games are costly.  They also only work on page granularity 
which isn't always possible to guarantee.


 Conceivably, this could be done as a read-only mapping so that each 
 guest userspace copies only the rx packets.  That's about as secure 
 as you're going to get with this approach I think.


 Maybe we can terminate the virtio queue in the host kernel as a pipe, 
 and splice pipes together.

 That gives us guest-guest and guest-process communications, and if 
 you use aio the kernel can use a dma engine for the copy.

 Ah, so you're looking to use a DMA engine for accelerated copy.  
 Perhaps the answer is to expose the DMA engine via a userspace API?

That's one option, but it still involves sharing all of memory.  
Splicing pipes might be better.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Anthony Liguori
Avi Kivity wrote:
 Anthony Liguori wrote:
 Avi Kivity wrote:
 Anthony Liguori wrote:
 Avi Kivity wrote:

 Each guest's host userspace mmaps the other guest's address space.  
 The userspace then does a copy on both the tx and rx paths.


 Well, that's better security-wise (I'd still prefer to avoid it, so 
 we can run each guest under a separate uid), but then we lose 
 performance wise.

 What performance win?  I'm not sure the copies can be eliminated in 
 the case of interguest IO.


 I guess not.  But at least you can dma instead of busy-copying.

 Fast interguest IO means mmap()'ing the other guest's address space 
 read-only.  

You can have the file descriptor be opened O_RDONLY so trust isn't an issue.
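
A sketch of what opening the descriptor O_RDONLY does and does not buy, assuming Linux/POSIX mmap() semantics: any attempt to create a writable MAP_SHARED mapping of that descriptor fails with EACCES, so the mapper can copy rx data out but cannot modify the peer, while it can still read all of the peer's memory. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the peer guest's memory file read-only. Returns NULL on failure. */
static const void *map_peer_readonly(const char *path, size_t size)
{
	void *addr;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	return addr == MAP_FAILED ? NULL : addr;
}

/* Returns 1 if a writable shared mapping of an O_RDONLY descriptor is
 * refused with EACCES, i.e. the read-only open cannot be upgraded later. */
static int writable_map_refused(const char *path, size_t size)
{
	int fd = open(path, O_RDONLY);
	void *addr;
	int refused;

	if (fd < 0)
		return 0;
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	refused = (addr == MAP_FAILED && errno == EACCES);
	if (addr != MAP_FAILED)
		munmap(addr, size);
	close(fd);
	return refused;
}
```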

 This implies trusting the other userspace, which is not a good thing.  
 Let the kernel copy, we already trust it, and it has more resources to 
 do the copy.


You're going to end up with the same trust issues no matter what unless 
you let the kernel look directly at the virtio ring queue.  That's the 
only way to arbitrate what memory gets copied.  There may be a generic 
API here for fast interprocess IO, I don't know.  splice() is a little 
awkward though for this because you really don't want to sit in a 
splice() loop.  What you want is for both sides to be kick'ing the 
kernel and the kernel to raise an event via eventfd() or something.
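
A minimal sketch of eventfd-based kicks, assuming Linux eventfd(2) in its default (non-semaphore) mode: each kick adds 1 to a 64-bit counter, and a read blocks until the counter is non-zero, then returns it and resets it to zero, so several kicks coalesce into one wakeup. Only the notification side is shown, not the in-kernel ring arbitration; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Kick the other side: add 1 to the eventfd counter, waking any reader. */
static int kick(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Block until at least one kick arrived; return how many kicks were
 * coalesced since the last read (the read resets the counter to zero).
 * Returns 0 on error. */
static uint64_t wait_for_kicks(int efd)
{
	uint64_t count;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}
```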

Absent whatever this kernel API is (which is really just helpful with a 
DMA engine), I think the current userspace approach is pretty 
reasonable.  Not just for interguest IO but also for driver domains 
which I think is a logical extension.

Regards,

Anthony Liguori


Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Avi Kivity
Anthony Liguori wrote:
 Rusty Russell wrote:
 From: Paul TBBle Hampson [EMAIL PROTECTED]

 This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory
 mappings created by map_zeroed_pages.

 I created a test program recently that measured the latency of
 reads/writes to an mmap() file in /dev/shm and in a normal filesystem.
 Even after unlinking the underlying file, the write latency was much
 better with a mmap()'d file in /dev/shm.

Surely the difference disappears once the pages have been faulted in?


-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Anthony Liguori
Avi Kivity wrote:
 Anthony Liguori wrote:
 Rusty Russell wrote:
 From: Paul TBBle Hampson [EMAIL PROTECTED]

 This creates a file in $HOME/.lguest/ to directly back the RAM and DMA memory
 mappings created by map_zeroed_pages.

 I created a test program recently that measured the latency of
 reads/writes to an mmap() file in /dev/shm and in a normal
 filesystem.  Even after unlinking the underlying file, the write
 latency was much better with a mmap()'d file in /dev/shm.

 Surely the difference disappears once the pages have been faulted in?

I don't recall.  I believe rewrite was okay but initial write was much 
worse.

Regards,

Anthony Liguori



Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Avi Kivity
Anthony Liguori wrote:

 You can have the file descriptor be opened O_RDONLY so trust isn't an 
 issue.


Reading is just as bad as writing.

 This implies trusting the other userspace, which is not a good 
 thing.  Let the kernel copy, we already trust it, and it has more 
 resources to do the copy.


 You're going to end up with the same trust issues no matter what 
 unless you let the kernel look directly at the virtio ring queue.  
 That's the only way to arbitrate what memory gets copied.  

That's what we need, then.

 There may be a generic API here for fast interprocess IO, I don't 
 know.  splice() is a little awkward though for this because you really 
 don't want to sit in a splice() loop.  What you want is for both sides 
 to be kick'ing the kernel and the kernel to raise an event via 
 eventfd() or something.

 Absent whatever this kernel API is (which is really just helpful with 
 a DMA engine), I think the current userspace approach is pretty 
 reasonable.  Not just for interguest IO but also for driver domains 
 which I think is a logical extension.

I disagree.  A driver domain is shared between multiple guests, and if 
one of the guests manages to break into qemu then it can see other 
guests' data.

[Driver domains are a horrible idea IMO, but that's another story]

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Anthony Liguori
Avi Kivity wrote:

 I disagree.  A driver domain is shared between multiple guests, and if 
 one of the guests manages to break into qemu then it can see other 
 guests' data.

You still don't strictly need to do things in the kernel if this is your 
concern.  You can have another process map both guests' address spaces 
and do the copying on behalf of each guest if you're paranoid about 
escaping into QEMU.
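
A sketch of the copy such a helper process would perform, assuming it already has both guests' RAM mapped (as in this patch series) and takes offsets from the guests' virtio descriptors. Because those offsets are guest-controlled, both ranges are bounds-checked before memcpy(); struct guest_mem and copy_between_guests are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One guest's RAM as mapped into the copying process. */
struct guest_mem {
	uint8_t *base;
	size_t size;
};

/* Return 1 if [off, off + len) lies entirely inside the guest's RAM.
 * Written so that off + len can never overflow. */
static int range_ok(const struct guest_mem *g, uint64_t off, size_t len)
{
	return off <= g->size && len <= g->size - off;
}

/* Copy len bytes from offset src_off in 'src' to offset dst_off in 'dst'
 * on behalf of both guests. Returns 0 on success, -1 if either range
 * falls outside its guest's memory. */
static int copy_between_guests(struct guest_mem *dst, uint64_t dst_off,
			       const struct guest_mem *src, uint64_t src_off,
			       size_t len)
{
	if (!range_ok(src, src_off, len) || !range_ok(dst, dst_off, len))
		return -1;
	memcpy(dst->base + dst_off, src->base + src_off, len);
	return 0;
}
```

Neither guest's QEMU needs the other's mapping in this arrangement; only the helper process does, which is the point of the suggestion.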

 [Driver domains are a horrible idea IMO, but that's another story]

I don't disagree :-)

Regards,

Anthony Liguori


[kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable

2008-03-20 Thread Carsten Otte
From: Martin Schwidefsky [EMAIL PROTECTED]

The SIE instruction on s390 uses the 2nd half of the page table page to
virtualize the storage keys of a guest. This patch offers the s390_enable_sie
function, which reorganizes the page tables of a single-threaded process to
reserve space in the page table:
s390_enable_sie makes sure that the process is single threaded and then uses
dup_mm to create a new mm with reorganized page tables. The old mm is freed 
and the process now has a page status extended field after every page table.

Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.

This patch has a small common code hit, namely making dup_mm non-static.
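
A userspace model of the layout described above, for illustration only: in a table with PGSTEs the page status entries follow the 256 PTEs, so the PGSTE for a PTE sits at ptep + PTRS_PER_PTE, and such a table claims two fragments in page_table_alloc(). The value used for _PAGE_TYPE_EMPTY is an assumption, and none of this is kernel code.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PTRS_PER_PTE	256	/* entries per s390 page table */
#define SKETCH_PAGE_TYPE_EMPTY	0x400UL	/* assumed value of _PAGE_TYPE_EMPTY */

/* One page table plus its PGSTE area, as clear_table_pgstes() lays it out:
 * 256 PTEs immediately followed by 256 page status table entries. On 64-bit
 * this fills one 4 KB page; on 31-bit two such pairs share a page. */
struct pt_with_pgste {
	unsigned long entry[2 * SKETCH_PTRS_PER_PTE];
};

/* Mirrors clear_table_pgstes() for one table: PTEs empty, PGSTEs zero. */
static void sketch_clear(struct pt_with_pgste *t)
{
	int i;

	for (i = 0; i < SKETCH_PTRS_PER_PTE; i++)
		t->entry[i] = SKETCH_PAGE_TYPE_EMPTY;
	memset(&t->entry[SKETCH_PTRS_PER_PTE], 0,
	       SKETCH_PTRS_PER_PTE * sizeof(unsigned long));
}

/* The (ptep + PTRS_PER_PTE) addressing used by rcp_lock() and friends. */
static unsigned long *pgste_of(unsigned long *ptep)
{
	return ptep + SKETCH_PTRS_PER_PTE;
}

/* Fragment bits claimed per table, as in the patched page_table_alloc():
 * two adjacent fragments when noexec or pgstes needs the second half. */
static unsigned long fragment_bits(int noexec, int pgstes)
{
	return (noexec || pgstes) ? 3UL : 1UL;
}
```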


Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---

 arch/s390/Kconfig  |4 ++
 arch/s390/kernel/setup.c   |4 ++
 arch/s390/mm/pgtable.c |   55 ++---
 include/asm-s390/mmu.h |1 
 include/asm-s390/mmu_context.h |8 +
 include/asm-s390/pgtable.h |1 
 kernel/fork.c  |2 -
 7 files changed, 70 insertions(+), 5 deletions(-)

Index: kvm/arch/s390/Kconfig
===
--- kvm.orig/arch/s390/Kconfig
+++ kvm/arch/s390/Kconfig
@@ -55,6 +55,10 @@ config GENERIC_LOCKBREAK
default y
depends on SMP && PREEMPT
 
+config PGSTE
+   bool
+   default y if KVM
+
 mainmenu "Linux Kernel Configuration"
 
 config S390
Index: kvm/arch/s390/kernel/setup.c
===
--- kvm.orig/arch/s390/kernel/setup.c
+++ kvm/arch/s390/kernel/setup.c
@@ -315,7 +315,11 @@ static int __init early_parse_ipldelay(c
 early_param("ipldelay", early_parse_ipldelay);
 
 #ifdef CONFIG_S390_SWITCH_AMODE
+#ifdef CONFIG_PGSTE
+unsigned int switch_amode = 1;
+#else
 unsigned int switch_amode = 0;
+#endif
 EXPORT_SYMBOL_GPL(switch_amode);
 
 static void set_amode_and_uaccess(unsigned long user_amode,
Index: kvm/arch/s390/mm/pgtable.c
===
--- kvm.orig/arch/s390/mm/pgtable.c
+++ kvm/arch/s390/mm/pgtable.c
@@ -30,11 +30,27 @@
 #define TABLES_PER_PAGE	4
 #define FRAG_MASK  15UL
 #define SECOND_HALVES  10UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+   clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+   memset(table + 256, 0, PAGE_SIZE/4);
+   clear_table(table + 512, _PAGE_TYPE_EMPTY, PAGE_SIZE/4);
+   memset(table + 768, 0, PAGE_SIZE/4);
+}
+
 #else
 #define ALLOC_ORDER	2
 #define TABLES_PER_PAGE	2
 #define FRAG_MASK  3UL
 #define SECOND_HALVES  2UL
+
+void clear_table_pgstes(unsigned long *table)
+{
+   clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/2);
+   memset(table + 256, 0, PAGE_SIZE/2);
+}
+
 #endif
 
 unsigned long *crst_table_alloc(struct mm_struct *mm, int noexec)
@@ -153,7 +169,7 @@ unsigned long *page_table_alloc(struct m
unsigned long *table;
unsigned long bits;
 
-   bits = mm->context.noexec ? 3UL : 1UL;
+   bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
spin_lock(&mm->page_table_lock);
page = NULL;
if (!list_empty(&mm->context.pgtable_list)) {
@@ -170,7 +186,10 @@ unsigned long *page_table_alloc(struct m
pgtable_page_ctor(page);
page->flags &= ~FRAG_MASK;
table = (unsigned long *) page_to_phys(page);
-   clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
+   if (mm->context.pgstes)
+   clear_table_pgstes(table);
+   else
+   clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE);
spin_lock(&mm->page_table_lock);
list_add(&page->lru, &mm->context.pgtable_list);
}
@@ -191,7 +210,7 @@ void page_table_free(struct mm_struct *m
struct page *page;
unsigned long bits;
 
-   bits = mm->context.noexec ? 3UL : 1UL;
+   bits = (mm->context.noexec || mm->context.pgstes) ? 3UL : 1UL;
bits <<= (__pa(table) & (PAGE_SIZE - 1)) / 256 / sizeof(unsigned long);
page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
spin_lock(&mm->page_table_lock);
@@ -228,3 +247,33 @@ void disable_noexec(struct mm_struct *mm
mm->context.noexec = 0;
update_mm(mm, tsk);
 }
+
+struct mm_struct *dup_mm(struct task_struct *tsk);
+
+/*
+ * switch on pgstes for its userspace process (for kvm)
+ */
+int s390_enable_sie(void)
+{
+   struct task_struct *tsk = current;
+   struct mm_struct *mm;
+
+   if (tsk->mm->context.pgstes)
+   return 0;
+   if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
+   tsk->mm != tsk->active_mm || tsk->mm->ioctx_list)
+   return -EINVAL;
+   tsk->mm->context.pgstes = 1;	/* dirty little tricks .. */
+   mm = dup_mm(tsk);
+   tsk->mm->context.pgstes = 0;
+   if (!mm)
+   return 

[kvm-devel] [RFC/PATCH 02/15] preparation: host memory management changes for s390 kvm

2008-03-20 Thread Carsten Otte
From: Heiko Carstens [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch changes the s390 memory management definitions to use the pgste field
for dirty and reference bit tracking of host and guest code. Usually on s390, 
dirty and referenced are tracked in storage keys, which belong to the physical
page. This changes with virtualization: The guest and host dirty/reference bits
are defined to be the logical OR of the values for the mapping and the physical
page. This patch implements the necessary changes in pgtable.h for s390.


There is a common code change in mm/rmap.c: the call to page_test_and_clear_young
must be moved. This is a no-op for all architectures except s390. page_referenced
checks the referenced bits for the physical page and for all mappings:
o The physical page is checked with page_test_and_clear_young.
o The mappings are checked with ptep_test_and_clear_young and friends.

Without pgstes (the current implementation on Linux s390) the physical page
check is implemented but the mapping callbacks are no-ops because dirty 
and referenced are not tracked in the s390 page tables. The pgstes introduce
guest and host dirty and reference bits for s390 in the host mapping. These
mappings must be checked before page_test_and_clear_young resets the reference
bit. 
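
A userspace model of the bit handling in rcp_test_and_clear_bits() and ptep_rcp_copy(), for illustration only. The test succeeds only when all requested bits were set, and the requested bits are cleared either way. The _PAGE_RCP_* values are copied from the patch; the storage-key bit values (0x02 changed, 0x04 referenced) are assumptions, and the model is single-threaded, ignoring concurrent PGSTE updates.

```c
#include <assert.h>

/* PGSTE bits, values as defined in the patch (_PAGE_RCP_*). */
#define RCP_HR	0x0040UL	/* host referenced */
#define RCP_HC	0x0020UL	/* host changed */
#define RCP_GR	0x0004UL	/* guest referenced */
#define RCP_GC	0x0002UL	/* guest changed */

/* Assumed storage-key bit values for _PAGE_CHANGED / _PAGE_REFERENCED. */
#define SKEY_CHANGED	0x02U
#define SKEY_REFERENCED	0x04U

/* Hypothetical stand-in for the struct page dirty/referenced flags. */
struct page_model {
	int dirty;
	int referenced;
};

/* Same logic as rcp_test_and_clear_bits(): clear val in *pgste and report
 * whether every bit of val was set beforehand. */
static int test_and_clear(unsigned long *pgste, unsigned long val)
{
	unsigned long old = *pgste;

	*pgste &= ~val;
	return (old & val) == val;
}

/* Same flow as ptep_rcp_copy(): record the storage key state in the guest
 * bits of the PGSTE, then move the host bits into the page flags. */
static void rcp_copy_model(unsigned int skey, unsigned long *pgste,
			   struct page_model *page)
{
	if (skey & SKEY_CHANGED)
		*pgste |= RCP_GC;
	if (skey & SKEY_REFERENCED)
		*pgste |= RCP_GR;
	if (test_and_clear(pgste, RCP_HC))
		page->dirty = 1;
	if (test_and_clear(pgste, RCP_HR))
		page->referenced = 1;
}
```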

Signed-off-by: Heiko Carstens [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 include/asm-s390/pgtable.h |  109 +++--
 mm/rmap.c  |7 +-
 2 files changed, 110 insertions(+), 6 deletions(-)

Index: kvm/include/asm-s390/pgtable.h
===
--- kvm.orig/include/asm-s390/pgtable.h
+++ kvm/include/asm-s390/pgtable.h
@@ -30,6 +30,7 @@
  */
 #ifndef __ASSEMBLY__
 #include <linux/mm_types.h>
+#include <asm/atomic.h>
 #include <asm/bug.h>
 #include <asm/processor.h>
 
@@ -258,6 +259,13 @@ extern char empty_zero_page[PAGE_SIZE];
  * swap pte is 1011 and 0001, 0011, 0101, 0111 are invalid.
  */
 
+/* Page status extended for virtualization */
+#define _PAGE_RCP_PCL  0x0080UL
+#define _PAGE_RCP_HR   0x0040UL
+#define _PAGE_RCP_HC   0x0020UL
+#define _PAGE_RCP_GR   0x0004UL
+#define _PAGE_RCP_GC   0x0002UL
+
 #ifndef __s390x__
 
 /* Bits in the segment table address-space-control-element */
@@ -513,6 +521,67 @@ static inline int pte_file(pte_t pte)
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(a,b)  (pte_val(a) == pte_val(b))
 
+static inline void rcp_lock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+   atomic64_t *rcp = (atomic64_t *) (ptep + PTRS_PER_PTE);
+   preempt_disable();
+   atomic64_set_mask(_PAGE_RCP_PCL, rcp);
+#endif
+}
+
+static inline void rcp_unlock(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+   atomic64_t *rcp = (atomic64_t *) (ptep + PTRS_PER_PTE);
+   atomic64_clear_mask(_PAGE_RCP_PCL, rcp);
+   preempt_enable();
+#endif
+}
+
+static inline void rcp_set_bits(pte_t *ptep, unsigned long val)
+{
+#ifdef CONFIG_PGSTE
+   *(unsigned long *) (ptep + PTRS_PER_PTE) |= val;
+#endif
+}
+
+static inline int rcp_test_and_clear_bits(pte_t *ptep, unsigned long val)
+{
+#ifdef CONFIG_PGSTE
+   unsigned long ret;
+
+   ret = *(unsigned long *) (ptep + PTRS_PER_PTE);
+   *(unsigned long *) (ptep + PTRS_PER_PTE) &= ~val;
+   return (ret & val) == val;
+#else
+   return 0;
+#endif
+}
+
+
+/* forward declaration for SetPageUptodate in page-flags.h*/
+static inline void page_clear_dirty(struct page *page);
+#include <linux/page-flags.h>
+
+static inline void ptep_rcp_copy(pte_t *ptep)
+{
+#ifdef CONFIG_PGSTE
+   struct page *page = virt_to_page(pte_val(*ptep));
+   unsigned int skey;
+
+   skey = page_get_storage_key(page_to_phys(page));
+   if (skey & _PAGE_CHANGED)
+   rcp_set_bits(ptep, _PAGE_RCP_GC);
+   if (skey & _PAGE_REFERENCED)
+   rcp_set_bits(ptep, _PAGE_RCP_GR);
+   if (rcp_test_and_clear_bits(ptep, _PAGE_RCP_HC))
+   SetPageDirty(page);
+   if (rcp_test_and_clear_bits(ptep, _PAGE_RCP_HR))
+   SetPageReferenced(page);
+#endif
+}
+
 /*
  * query functions pte_write/pte_dirty/pte_young only work if
  * pte_present() is true. Undefined behaviour if not..
@@ -599,6 +668,8 @@ static inline void pmd_clear(pmd_t *pmd)
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
+   if (mm->context.pgstes)
+   ptep_rcp_copy(ptep);
pte_val(*ptep) = _PAGE_TYPE_EMPTY;
if (mm->context.noexec)
pte_val(ptep[PTRS_PER_PTE]) = _PAGE_TYPE_EMPTY;
@@ -667,6 +738,22 @@ static inline pte_t pte_mkyoung(pte_t pt
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
 {
+#ifdef 

[kvm-devel] [RFC/PATCH 05/15] kvm-s390: s390 arch backend for the kvm kernel module

2008-03-20 Thread Carsten Otte
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]
From: Heiko Carstens [EMAIL PROTECTED]

This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
 (aka s390x, mainframe) architecture. It uses the mainframe's virtualization
instruction SIE to run virtual machines with up to 64 virtual CPUs each.
This port is only usable on 64bit host kernels, and can only run 64bit guest
kernels. However, running 31bit applications in guest userspace is possible.

The following source files are introduced by this patch:
arch/s390/kvm/kvm-s390.c     similar to arch/x86/kvm/x86.c, this implements all
                             arch callbacks for kvm. __vcpu_run calls back into
                             sie64a to enter the guest machine context
arch/s390/kvm/sie64a.S       assembler function sie64a, which enters guest
                             context via SIE, and switches world before and
                             after that
include/asm-s390/kvm_host.h  contains all vital data structures needed to run
                             virtual machines on the mainframe
include/asm-s390/kvm.h       defines kvm_regs and friends for user access to
                             guest register content
arch/s390/kvm/gaccess.h      functions similar to uaccess to access guest memory
arch/s390/kvm/kvm-s390.h     header file for kvm-s390 internals, extended by
                             later patches

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Heiko Carstens [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/Makefile          |    2 
 arch/s390/kernel/vtime.c    |    1 
 arch/s390/kvm/Makefile      |   14 +
 arch/s390/kvm/gaccess.h     |  280 +
 arch/s390/kvm/kvm-s390.c    |  574 
 arch/s390/kvm/kvm-s390.h    |   29 ++
 arch/s390/kvm/sie64a.S      |   47 +++
 include/asm-s390/Kbuild     |    1 
 include/asm-s390/kvm.h      |   44 +++
 include/asm-s390/kvm_host.h |  119 +
 include/asm-s390/kvm_para.h |   30 ++
 include/linux/kvm.h         |   15 +
 include/linux/kvm_host.h    |    4 
 13 files changed, 1159 insertions(+), 1 deletion(-)

Index: kvm/arch/s390/Makefile
===
--- kvm.orig/arch/s390/Makefile
+++ kvm/arch/s390/Makefile
@@ -87,7 +87,7 @@ LDFLAGS_vmlinux := -e start
 head-y := arch/s390/kernel/head.o arch/s390/kernel/init_task.o
 
 core-y += arch/s390/mm/ arch/s390/kernel/ arch/s390/crypto/ \
-  arch/s390/appldata/ arch/s390/hypfs/
+  arch/s390/appldata/ arch/s390/hypfs/ arch/s390/kvm/
 libs-y += arch/s390/lib/
 drivers-y  += drivers/s390/
 drivers-$(CONFIG_MATHEMU) += arch/s390/math-emu/
Index: kvm/arch/s390/kernel/vtime.c
===
--- kvm.orig/arch/s390/kernel/vtime.c
+++ kvm/arch/s390/kernel/vtime.c
@@ -110,6 +110,7 @@ void account_system_vtime(struct task_st
S390_lowcore.steal_clock -= cputime << 12;
account_system_time(tsk, 0, cputime);
 }
+EXPORT_SYMBOL_GPL(account_system_vtime);
 
 static inline void set_vtimer(__u64 expires)
 {
Index: kvm/arch/s390/kvm/Makefile
===
--- /dev/null
+++ kvm/arch/s390/kvm/Makefile
@@ -0,0 +1,14 @@
+# Makefile for kernel virtual machines on s390
+#
+# Copyright IBM Corp. 2008
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License (version 2 only)
+# as published by the Free Software Foundation.
+
+common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o)
+
+EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
+
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o
+obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/gaccess.h
===
--- /dev/null
+++ kvm/arch/s390/kvm/gaccess.h
@@ -0,0 +1,280 @@
+/*
+ * gaccess.h -  access guest memory
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *Author(s): Carsten Otte [EMAIL PROTECTED]
+ */
+
+#ifndef __KVM_S390_GACCESS_H
+#define __KVM_S390_GACCESS_H
+
+#include <linux/compiler.h>
+#include <linux/kvm_host.h>
+#include <asm/uaccess.h>
+
+static inline void __user *__guestaddr_to_user(struct kvm_vcpu *vcpu,
+  u64 guestaddr)
+{
+   u64 prefix  = vcpu->arch.sie_block->prefix;
+   u64 origin  = vcpu->kvm->arch.guest_origin;
+   u64 memsize = vcpu->kvm->arch.guest_memsize;
+
+   if (guestaddr < 2 * PAGE_SIZE)
+       guestaddr += prefix;
+   else if ((guestaddr >= prefix) && (guestaddr < prefix + 2 * PAGE_SIZE))
+

[kvm-devel] [RFC/PATCH 04/15] preparation: split sysinfo definitions for kvm use

2008-03-20 Thread Carsten Otte
From: Christian Borntraeger [EMAIL PROTECTED]

drivers/s390/sysinfo.c uses the store system information instruction (stsi) to
query information about the machine, the LPAR and additional hypervisors. KVM
has to implement the host part of this instruction.

To avoid code duplication, this patch splits the common definitions from
sysinfo.c into a separate header file include/asm-s390/sysinfo.h for KVM use.
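The stsi() helper moved by this patch loads its operands into general registers 0 and 1 before issuing the instruction: the function code goes into the top four bits of the 32-bit value in register 0, selector 1 into its low bits, and selector 2 into register 1. As the struct names sysinfo_1_1_1, sysinfo_2_2_2 and sysinfo_3_2_2 suggest, the function code picks the machine, LPAR or hypervisor level. A minimal sketch of just that operand encoding, without the inline assembly; stsi_encode() and struct stsi_operands are hypothetical names for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two register operands of stsi. */
struct stsi_operands {
	uint32_t r0;    /* function code (top 4 bits) | selector 1 */
	uint32_t r1;    /* selector 2 */
};

/* Same arithmetic as the register setup in stsi(), without the asm. */
static struct stsi_operands stsi_encode(unsigned int fc, unsigned int sel1,
					unsigned int sel2)
{
	struct stsi_operands ops;

	ops.r0 = ((uint32_t)fc << 28) | sel1;
	ops.r1 = sel2;
	return ops;
}
```

For example, the 2.2.2 (LPAR) block is requested with r0 = 0x20000002 and r1 = 2.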

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 drivers/s390/sysinfo.c |  100 
 include/asm-s390/sysinfo.h |  112 +
 2 files changed, 113 insertions(+), 99 deletions(-)

Index: kvm/drivers/s390/sysinfo.c
===
--- kvm.orig/drivers/s390/sysinfo.c
+++ kvm/drivers/s390/sysinfo.c
@@ -11,111 +11,13 @@
 #include <linux/init.h>
 #include <linux/delay.h>
 #include <asm/ebcdic.h>
+#include <asm/sysinfo.h>
 
 /* Sigh, math-emu. Don't ask. */
 #include <asm/sfp-util.h>
 #include <math-emu/soft-fp.h>
 #include <math-emu/single.h>
 
-struct sysinfo_1_1_1 {
-   char reserved_0[32];
-   char manufacturer[16];
-   char type[4];
-   char reserved_1[12];
-   char model_capacity[16];
-   char sequence[16];
-   char plant[4];
-   char model[16];
-};
-
-struct sysinfo_1_2_1 {
-   char reserved_0[80];
-   char sequence[16];
-   char plant[4];
-   char reserved_1[2];
-   unsigned short cpu_address;
-};
-
-struct sysinfo_1_2_2 {
-   char format;
-   char reserved_0[1];
-   unsigned short acc_offset;
-   char reserved_1[24];
-   unsigned int secondary_capability;
-   unsigned int capability;
-   unsigned short cpus_total;
-   unsigned short cpus_configured;
-   unsigned short cpus_standby;
-   unsigned short cpus_reserved;
-   unsigned short adjustment[0];
-};
-
-struct sysinfo_1_2_2_extension {
-   unsigned int alt_capability;
-   unsigned short alt_adjustment[0];
-};
-
-struct sysinfo_2_2_1 {
-   char reserved_0[80];
-   char sequence[16];
-   char plant[4];
-   unsigned short cpu_id;
-   unsigned short cpu_address;
-};
-
-struct sysinfo_2_2_2 {
-   char reserved_0[32];
-   unsigned short lpar_number;
-   char reserved_1;
-   unsigned char characteristics;
-   unsigned short cpus_total;
-   unsigned short cpus_configured;
-   unsigned short cpus_standby;
-   unsigned short cpus_reserved;
-   char name[8];
-   unsigned int caf;
-   char reserved_2[16];
-   unsigned short cpus_dedicated;
-   unsigned short cpus_shared;
-};
-
-#define LPAR_CHAR_DEDICATED    (1 << 7)
-#define LPAR_CHAR_SHARED       (1 << 6)
-#define LPAR_CHAR_LIMITED      (1 << 5)
-
-struct sysinfo_3_2_2 {
-   char reserved_0[31];
-   unsigned char count;
-   struct {
-   char reserved_0[4];
-   unsigned short cpus_total;
-   unsigned short cpus_configured;
-   unsigned short cpus_standby;
-   unsigned short cpus_reserved;
-   char name[8];
-   unsigned int caf;
-   char cpi[16];
-   char reserved_1[24];
-
-   } vm[8];
-};
-
-static inline int stsi(void *sysinfo, int fc, int sel1, int sel2)
-{
-   register int r0 asm("0") = (fc << 28) | sel1;
-   register int r1 asm("1") = sel2;
-
-   asm volatile(
-       "   stsi 0(%2)\n"
-       "0: jz   2f\n"
-       "1: lhi  %0,%3\n"
-       "2:\n"
-       EX_TABLE(0b,1b)
-       : "+d" (r0) : "d" (r1), "a" (sysinfo), "K" (-ENOSYS)
-       : "cc", "memory" );
-   return r0;
-}
-
 static inline int stsi_0(void)
 {
int rc = stsi (NULL, 0, 0, 0);
Index: kvm/include/asm-s390/sysinfo.h
===
--- /dev/null
+++ kvm/include/asm-s390/sysinfo.h
@@ -0,0 +1,112 @@
+/*
+ * definition for store system information stsi
+ *
+ * Copyright IBM Corp. 2001,2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *Author(s): Ulrich Weigand [EMAIL PROTECTED]
+ *   Christian Borntraeger [EMAIL PROTECTED]
+ */
+
+struct sysinfo_1_1_1 {
+   char reserved_0[32];
+   char manufacturer[16];
+   char type[4];
+   char reserved_1[12];
+   char model_capacity[16];
+   char sequence[16];
+   char plant[4];
+   char model[16];
+};
+
+struct sysinfo_1_2_1 {
+   char reserved_0[80];
+   char sequence[16];
+   char plant[4];
+   char reserved_1[2];
+   unsigned short cpu_address;
+};
+
+struct sysinfo_1_2_2 {
+   char format;
+   char reserved_0[1];
+   unsigned short acc_offset;
+   

[kvm-devel] [RFC/PATCH 03/15] preparation: address of the 64bit extint parm in lowcore

2008-03-20 Thread Carsten Otte
From: Christian Borntraeger [EMAIL PROTECTED]

The address 0x11b8 is used by z/VM for pfault and diag 250 I/O to
provide a 64 bit extint parameter. virtio uses the same address, so
it's time to update the lowcore structure.
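The new padding sizes can be checked mechanically: panic_magic ends at 0xe04, so pad13 must span 0x11b8 - 0xe04 bytes for ext_params2 to land at 0x11b8, and the 8-byte field ends at 0x11c0, where pad14 takes over up to the system info area at 0x1200. A minimal sketch, assuming a GCC-compatible compiler for __attribute__((packed)); struct lowcore_excerpt and LC_BASE are hypothetical and model only the 0xe00 to 0x1200 window, rebased to offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lowcore offset that corresponds to offset 0 of the excerpt below. */
#define LC_BASE 0xe00

/* Hypothetical excerpt of the 64-bit lowcore after this patch. */
struct lowcore_excerpt {
	uint32_t panic_magic;                /* 0xe00 */
	uint8_t  pad13[0x11b8 - 0xe04];      /* 0xe04 */
	uint64_t ext_params2;                /* 0x11b8 */
	uint8_t  pad14[0x1200 - 0x11c0];     /* 0x11c0 */
} __attribute__((packed));

/* Compile-time checks that the padding puts each field where it belongs. */
_Static_assert(offsetof(struct lowcore_excerpt, ext_params2) == 0x11b8 - LC_BASE,
	       "ext_params2 must sit at lowcore offset 0x11b8");
_Static_assert(sizeof(struct lowcore_excerpt) == 0x1200 - LC_BASE,
	       "the excerpt must end at the system info area (0x1200)");
```

If either padding expression were off by even one byte, the build would fail instead of silently shifting every later lowcore field.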

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 include/asm-s390/lowcore.h |   15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

Index: kvm/include/asm-s390/lowcore.h
===
--- kvm.orig/include/asm-s390/lowcore.h
+++ kvm/include/asm-s390/lowcore.h
@@ -380,27 +380,32 @@ struct _lowcore
    /* whether the kernel died with panic() or not */
    __u32        panic_magic;                 /* 0xe00 */
 
-   __u8         pad13[0x1200-0xe04];         /* 0xe04 */
+   __u8         pad13[0x11b8-0xe04];         /* 0xe04 */
+
+   /* 64 bit extparam used for pfault, diag 250 etc  */
+   __u64        ext_params2;                 /* 0x11B8 */
+
+   __u8         pad14[0x1200-0x11C0];        /* 0x11C0 */
 
    /* System info area */
 
    __u64        floating_pt_save_area[16];   /* 0x1200 */
    __u64        gpregs_save_area[16];        /* 0x1280 */
    __u32        st_status_fixed_logout[4];   /* 0x1300 */
-   __u8         pad14[0x1318-0x1310];        /* 0x1310 */
+   __u8         pad15[0x1318-0x1310];        /* 0x1310 */
    __u32        prefixreg_save_area;         /* 0x1318 */
    __u32        fpt_creg_save_area;          /* 0x131c */
-   __u8         pad15[0x1324-0x1320];        /* 0x1320 */
+   __u8         pad16[0x1324-0x1320];        /* 0x1320 */
    __u32        tod_progreg_save_area;       /* 0x1324 */
    __u32        cpu_timer_save_area[2];      /* 0x1328 */
    __u32        clock_comp_save_area[2];     /* 0x1330 */
-   __u8         pad16[0x1340-0x1338];        /* 0x1338 */
+   __u8         pad17[0x1340-0x1338];        /* 0x1338 */
    __u32        access_regs_save_area[16];   /* 0x1340 */
    __u64        cregs_save_area[16];         /* 0x1380 */
 
    /* align to the top of the prefix area */
 
-   __u8         pad17[0x2000-0x1400];        /* 0x1400 */
+   __u8         pad18[0x2000-0x1400];        /* 0x1400 */
 #endif /* !__s390x__ */
 } __attribute__((packed)); /* End structure*/
 





[kvm-devel] [RFC/PATCH 07/15] kvm-s390: interrupt subsystem, cpu timer, waitpsw

2008-03-20 Thread Carsten Otte
From: Carsten Otte [EMAIL PROTECTED]

This patch contains the s390 interrupt subsystem (similar to the in-kernel APIC)
including timer interrupts (similar to the in-kernel PIT) and enabled wait
(similar to in-kernel HLT handling).

In order to achieve that, this patch also introduces intercept handling
for instruction intercepts, and it implements load control instructions.
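The load control handlers decode their operands straight from the ipa and ipb fields of the SIE control block: ipa holds the first halfword of the instruction (opcode plus the R1/R3 register numbers), while ipb holds the base register, the displacement bytes and, for LCTG, the second opcode byte 0x2f. A minimal user-space sketch of the masks and shifts used by handle_lctg(), plus the wrap-around from control register 15 back to 0; struct rsy_operands, decode_lctg() and lctg_register_count() are hypothetical names, and the sample encodings in the checks are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded operands of LCTG. */
struct rsy_operands {
	int r1;         /* first control register to load */
	int r3;         /* last control register to load */
	int b2;         /* base register, 0 means "no base" */
	uint32_t d2;    /* displacement, combined as handle_lctg() does */
};

/* Same masks and shifts as handle_lctg() in this patch. */
static struct rsy_operands decode_lctg(uint16_t ipa, uint32_t ipb)
{
	struct rsy_operands op;

	op.r1 = (ipa & 0x00f0) >> 4;
	op.r3 = ipa & 0x000f;
	op.b2 = ipb >> 28;
	/* low 12 displacement bits plus the high byte shifted above them */
	op.d2 = ((ipb & 0x0fff0000) >> 16) + ((ipb & 0xff00) << 4);
	return op;
}

/*
 * Number of control registers loaded: r1..r3 wraps from 15 back to 0,
 * matching the do/while loop in handle_lctg().
 */
static int lctg_register_count(int r1, int r3)
{
	return ((r3 - r1) & 15) + 1;
}
```

Like the patch, the sketch adds the high displacement byte as an unsigned value.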

This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
the vm file descriptors and the vcpu file descriptors. If this ioctl is
issued against a vm file descriptor, the interrupt is considered floating.
Floating interrupts may be delivered to any virtual cpu in the configuration.

The following interrupts are supported:
SIGP STOP       - interprocessor signal that stops a remote cpu
SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
                  (stopped) remote cpu
INT EMERGENCY   - interprocessor interrupt, usually used to signal need_resched
                  and for smp_call_function() in the guest
PROGRAM INT     - exception during program execution such as page fault,
                  illegal instruction and friends
RESTART         - interprocessor signal that starts a stopped cpu
INT VIRTIO      - floating interrupt for virtio signaling
INT SERVICE     - floating interrupt for signals from the system
                  service processor

struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
an interrupt, also carries parameter data for the interrupt along with the
interrupt type. Interrupts on s390 usually carry state that represents the
current operation, or that identifies the device which caused the interruption.

kvm_s390_handle_wait() handles the wait PSW in two flavors: in case of a
disabled wait (that is, disabled for interrupts), we exit to userspace. In case
of an enabled wait we set up a timer that expires at the cpu clock comparator
value and sleep on a wait queue.
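The two flavors of wait PSW handling boil down to two small decisions: whether the wait can ever be ended by an interrupt, and how long the host should sleep until the clock comparator fires. A minimal sketch of both, assuming that only the I/O and external interrupt masks matter (machine checks are ignored here) and using the fact that the TOD clock advances by 4096 units per microsecond; the mask constants and function names are hypothetical, not the ones used in interrupt.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PSW mask bits: bit 6 = I/O, bit 7 = external (MSB is bit 0). */
#define SKETCH_PSW_MASK_IO  0x0200000000000000ULL
#define SKETCH_PSW_MASK_EXT 0x0100000000000000ULL

/* A disabled wait can never be ended by an interrupt: exit to userspace. */
static bool wait_is_disabled(uint64_t psw_mask)
{
	return !(psw_mask & (SKETCH_PSW_MASK_IO | SKETCH_PSW_MASK_EXT));
}

/*
 * Enabled wait: nanoseconds until the clock comparator (ckc) is reached,
 * given the current TOD value; 0 means the timer interrupt is already due.
 */
static uint64_t ckc_delta_ns(uint64_t ckc, uint64_t now)
{
	uint64_t delta;

	if (ckc <= now)
		return 0;
	delta = ckc - now;
	/* 4096 TOD units per microsecond; split to avoid overflowing *1000 */
	return (delta >> 12) * 1000 + (((delta & 0xfff) * 1000) >> 12);
}
```

A comparator value 4096 TOD units in the future corresponds to a 1000 ns timer; one already in the past yields 0, so the interrupt is due immediately.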

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/kvm/Makefile      |    2 
 arch/s390/kvm/intercept.c   |  123 +
 arch/s390/kvm/interrupt.c   |  583 
 arch/s390/kvm/kvm-s390.c    |   48 +++
 arch/s390/kvm/kvm-s390.h    |   15 +
 include/asm-s390/kvm_host.h |   75 +
 include/linux/kvm.h         |   17 +
 7 files changed, 860 insertions(+), 3 deletions(-)

Index: kvm/arch/s390/kvm/Makefile
===
--- kvm.orig/arch/s390/kvm/Makefile
+++ kvm/arch/s390/kvm/Makefile
@@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/
 
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
 
-kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o
 obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/intercept.c
===
--- kvm.orig/arch/s390/kvm/intercept.c
+++ kvm/arch/s390/kvm/intercept.c
@@ -18,6 +18,91 @@
 #include <asm/kvm_host.h>
 
 #include "kvm-s390.h"
+#include "gaccess.h"
+
+static int handle_lctg(struct kvm_vcpu *vcpu)
+{
+   int reg1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
+   int reg3 = vcpu->arch.sie_block->ipa & 0x000f;
+   int base2 = vcpu->arch.sie_block->ipb >> 28;
+   int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16) +
+           ((vcpu->arch.sie_block->ipb & 0xff00) << 4);
+   u64 useraddr;
+   int reg, rc;
+
+   vcpu->stat.instruction_lctg++;
+   if ((vcpu->arch.sie_block->ipb & 0xff) != 0x2f)
+       return -ENOTSUPP;
+
+   useraddr = disp2;
+   if (base2)
+       useraddr += vcpu->arch.guest_gprs[base2];
+
+   reg = reg1;
+
+   VCPU_EVENT(vcpu, 5, "lctg r1:%x, r3:%x,b2:%x,d2:%x", reg1, reg3, base2,
+          disp2);
+
+   do {
+       rc = get_guest_u64(vcpu, useraddr,
+                  &vcpu->arch.sie_block->gcr[reg]);
+       if (rc == -EFAULT) {
+           kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+           break;
+       }
+       useraddr += 8;
+       if (reg == reg3)
+           break;
+       reg = reg + 1;
+       if (reg > 15)
+           reg = 0;
+   } while (1);
+   return 0;
+}
+
+static int handle_lctl(struct kvm_vcpu *vcpu)
+{
+   int reg1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
+   int reg3 = vcpu->arch.sie_block->ipa & 0x000f;
+   int base2 = vcpu->arch.sie_block->ipb >> 28;
+   int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
+   u64 useraddr;
+   u32 val = 0;
+   int reg, rc;
+
+   vcpu->stat.instruction_lctl++;
+
+   useraddr = disp2;
+   if (base2)
+       useraddr += vcpu->arch.guest_gprs[base2];
+
+   reg = reg1;
+
+   VCPU_EVENT(vcpu, 5, "lctl r1:%x, r3:%x,b2:%x,d2:%x", reg1, reg3, base2,
+          disp2);
+
+  

[kvm-devel] [RFC/PATCH 09/15] kvm-s390: interprocessor communication via sigp

2008-03-20 Thread Carsten Otte
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch introduces in-kernel handling of _some_ sigp interprocessor
signals (similar to ipi).
kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
handlers depending on the operation requested:
- sigp sense tries to retrieve information such as existence or running state
  of the remote cpu
- sigp emergency sends an external interrupt to the remote cpu
- sigp stop stops a remote cpu
- sigp stop store status stops a remote cpu, and stores its entire internal
  state to the cpu's lowcore
- sigp set arch sets the architecture mode of the remote cpu. Setting it to
  ESAME (s390x 64bit) is accepted, setting it to ESA/S390 (s390, 31 or 24 bit)
  is denied, all others are passed to userland
- sigp set prefix sets the prefix register of a remote cpu
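The set architecture handling above is a three-way decision on the requested mode. A minimal sketch, assuming the mode code sits in the low byte of the order parameter with 0 meaning ESA/390 and 1 or 2 meaning z/Architecture (ESAME); the enum and the function name are hypothetical and only illustrate the accept, deny and pass-to-userland split described in this patch.

```c
#include <assert.h>

/* Hypothetical outcome of a SIGP SET ARCHITECTURE order. */
enum set_arch_result {
	SET_ARCH_ACCEPTED,      /* handled in the kernel */
	SET_ARCH_DENIED,        /* refused in the kernel */
	SET_ARCH_TO_USERSPACE,  /* not handled, passed on to userland */
};

/* Assumed mode codes: 0 = ESA/390, 1 and 2 = z/Architecture (ESAME). */
static enum set_arch_result sigp_set_arch_policy(unsigned long parameter)
{
	switch (parameter & 0xff) {
	case 0:
		return SET_ARCH_DENIED;
	case 1:
	case 2:
		return SET_ARCH_ACCEPTED;
	default:
		return SET_ARCH_TO_USERSPACE;
	}
}
```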

To implement this, the stop intercept indication is deliberately reused: a set
of action bits defines what to do once a cpu gets stopped:
ACTION_STOP_ON_STOP  really stops the cpu when a stop intercept is recognized
ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
                     recognized
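The action bits decouple the cpu that sends the stop order from the cpu that eventually processes the stop intercept. A minimal sketch of both sides, modelled on handle_stop() in this patch but without locking and without actually storing any status; the bit values, the order enum and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed bit values; the patch only names the two flags. */
#define SKETCH_ACTION_STORE_ON_STOP 0x1u
#define SKETCH_ACTION_STOP_ON_STOP  0x2u

/* Hypothetical subset of SIGP orders that request a stop. */
enum sketch_sigp_order { ORDER_STOP, ORDER_STOP_STORE_STATUS };

/* Sending side: record what the target cpu must do once it stops. */
static unsigned int sigp_stop_action_bits(enum sketch_sigp_order order)
{
	if (order == ORDER_STOP_STORE_STATUS)
		return SKETCH_ACTION_STOP_ON_STOP | SKETCH_ACTION_STORE_ON_STOP;
	return SKETCH_ACTION_STOP_ON_STOP;
}

struct stop_outcome {
	unsigned int remaining_bits; /* bits still pending afterwards */
	int store_status;            /* status must be stored to lowcore */
	int stop;                    /* cpu must really stop */
};

/* Receiving side: consume the pending bits as handle_stop() does. */
static struct stop_outcome handle_stop_sketch(unsigned int action_bits)
{
	struct stop_outcome out = { action_bits, 0, 0 };

	if (out.remaining_bits & SKETCH_ACTION_STORE_ON_STOP) {
		out.remaining_bits &= ~SKETCH_ACTION_STORE_ON_STOP;
		out.store_status = 1;
	}
	if (out.remaining_bits & SKETCH_ACTION_STOP_ON_STOP) {
		out.remaining_bits &= ~SKETCH_ACTION_STOP_ON_STOP;
		out.stop = 1;
	}
	return out;
}
```

A stop and store status order thus sets both bits, and a single stop intercept consumes both of them.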

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/kvm/Makefile      |    2 
 arch/s390/kvm/intercept.c   |   22 +++
 arch/s390/kvm/kvm-s390.c    |    7 +
 arch/s390/kvm/kvm-s390.h    |    7 +
 arch/s390/kvm/sigp.c        |  289 
 include/asm-s390/kvm_host.h |   12 +
 6 files changed, 336 insertions(+), 3 deletions(-)

Index: kvm/arch/s390/kvm/Makefile
===
--- kvm.orig/arch/s390/kvm/Makefile
+++ kvm/arch/s390/kvm/Makefile
@@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/
 
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
 
-kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o sigp.o
 obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/intercept.c
===
--- kvm.orig/arch/s390/kvm/intercept.c
+++ kvm/arch/s390/kvm/intercept.c
@@ -100,6 +100,7 @@ static int handle_lctl(struct kvm_vcpu *
 }
 
 static intercept_handler_t instruction_handlers[256] = {
+   [0xae] = kvm_s390_handle_sigp,
[0xb2] = kvm_s390_handle_priv,
[0xb7] = handle_lctl,
[0xeb] = handle_lctg,
@@ -122,10 +123,27 @@ static int handle_noop(struct kvm_vcpu *
 
 static int handle_stop(struct kvm_vcpu *vcpu)
 {
+   int rc;
+
vcpu->stat.exit_stop_request++;
-   VCPU_EVENT(vcpu, 3, "%s", "cpu stopped");
atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
-   return -ENOTSUPP;
+   spin_lock_bh(&vcpu->arch.local_int.lock);
+   if (vcpu->arch.local_int.action_bits & ACTION_STORE_ON_STOP) {
+       vcpu->arch.local_int.action_bits &= ~ACTION_STORE_ON_STOP;
+       rc = __kvm_s390_vcpu_store_status(vcpu,
+                         KVM_S390_STORE_STATUS_NOADDR);
+       if (rc >= 0)
+           rc = -ENOTSUPP;
+   }
+
+   if (vcpu->arch.local_int.action_bits & ACTION_STOP_ON_STOP) {
+       vcpu->arch.local_int.action_bits &= ~ACTION_STOP_ON_STOP;
+       VCPU_EVENT(vcpu, 3, "%s", "cpu stopped");
+       rc = -ENOTSUPP;
+   } else
+       rc = 0;
+   spin_unlock_bh(&vcpu->arch.local_int.lock);
+   return rc;
 }
 
 static int handle_validity(struct kvm_vcpu *vcpu)
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -57,6 +57,12 @@ struct kvm_stats_debugfs_item debugfs_en
{ "instruction_chsc", VCPU_STAT(instruction_chsc) },
{ "instruction_stsi", VCPU_STAT(instruction_stsi) },
{ "instruction_stfl", VCPU_STAT(instruction_stfl) },
+   { "instruction_sigp_sense", VCPU_STAT(instruction_sigp_sense) },
+   { "instruction_sigp_emergency", VCPU_STAT(instruction_sigp_emergency) },
+   { "instruction_sigp_stop", VCPU_STAT(instruction_sigp_stop) },
+   { "instruction_sigp_set_arch", VCPU_STAT(instruction_sigp_arch) },
+   { "instruction_sigp_set_prefix", VCPU_STAT(instruction_sigp_prefix) },
+   { "instruction_sigp_restart", VCPU_STAT(instruction_sigp_restart) },
{ NULL }
 };
 
@@ -290,6 +296,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(st
spin_lock_bh(&kvm->arch.float_int.lock);
kvm->arch.float_int.local_int[id] = &vcpu->arch.local_int;
init_waitqueue_head(&vcpu->arch.local_int.wq);
+   vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags;
spin_unlock_bh(&kvm->arch.float_int.lock);
 
rc = kvm_vcpu_init(vcpu, kvm, id);
Index: 

[kvm-devel] [RFC/PATCH 10/15] kvm-s390: intercepts for diagnose instructions

2008-03-20 Thread Carsten Otte
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch introduces interpretation of some diagnose instruction intercepts.
Diagnose is our classic architected way of doing a hypercall. This patch
features the following diagnose codes:
- vm storage size, that tells the guest about its memory layout
- time slice end, which is used by the guest to indicate that it waits
  for a lock and thus cannot use up its time slice in a useful way
- ipl functions, which a guest can use to reset and reboot itself

In order to implement ipl functions, we also introduce an exit reason that
causes userspace to perform various resets on the virtual machine. All resets
are described in the Principles of Operation, except KVM_S390_RESET_IPL,
which causes a reboot of the machine.
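The reset request handed to userspace is a bit mask: subcode 3 of the IPL diagnose asks for a clearing reset and subcode 4 for a non-clearing one, and both add a subsystem reset, an initial cpu reset and a re-IPL. A minimal sketch of that computation as done in __diag_ipl_functions(); the SKETCH_RESET_* values are illustrative only, and the real constants are the KVM_S390_RESET_* flags in include/linux/kvm.h.

```c
#include <assert.h>

/* Illustrative flag values; the real ones live in include/linux/kvm.h. */
#define SKETCH_RESET_CLEAR     2
#define SKETCH_RESET_SUBSYSTEM 4
#define SKETCH_RESET_CPU_INIT  8
#define SKETCH_RESET_IPL       16

/*
 * Reset flags requested for a given value of the guest register that
 * holds the subcode. Returns -1 for subcodes the patch does not handle.
 */
static int ipl_reset_flags(unsigned long gpr_value)
{
	unsigned long subcode = gpr_value & 0xffff;
	int flags;

	switch (subcode) {
	case 3:                         /* reset with clear */
		flags = SKETCH_RESET_CLEAR;
		break;
	case 4:                         /* reset without clear */
		flags = 0;
		break;
	default:
		return -1;
	}
	return flags | SKETCH_RESET_SUBSYSTEM | SKETCH_RESET_IPL |
	       SKETCH_RESET_CPU_INIT;
}
```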

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/kvm/Makefile      |    2 -
 arch/s390/kvm/diag.c        |   67 
 arch/s390/kvm/intercept.c   |    1 
 arch/s390/kvm/kvm-s390.c    |    1 
 arch/s390/kvm/kvm-s390.h    |    2 +
 include/asm-s390/kvm_host.h |    5 ++-
 include/linux/kvm.h         |    8 +
 7 files changed, 84 insertions(+), 2 deletions(-)

Index: kvm/arch/s390/kvm/Makefile
===
--- kvm.orig/arch/s390/kvm/Makefile
+++ kvm/arch/s390/kvm/Makefile
@@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/
 
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
 
-kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o sigp.o
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o sigp.o diag.o
 obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/diag.c
===
--- /dev/null
+++ kvm/arch/s390/kvm/diag.c
@@ -0,0 +1,67 @@
+/*
+ * diag.c - handling diagnose instructions
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *Author(s): Carsten Otte [EMAIL PROTECTED]
+ *   Christian Borntraeger [EMAIL PROTECTED]
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include "kvm-s390.h"
+
+static int __diag_time_slice_end(struct kvm_vcpu *vcpu)
+{
+   VCPU_EVENT(vcpu, 5, "%s", "diag time slice end");
+   vcpu->stat.diagnose_44++;
+   vcpu_put(vcpu);
+   schedule();
+   vcpu_load(vcpu);
+   return 0;
+}
+
+static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
+{
+   unsigned int reg = vcpu->arch.sie_block->ipa & 0xf;
+   unsigned long subcode = vcpu->arch.guest_gprs[reg] & 0xffff;
+
+   VCPU_EVENT(vcpu, 5, "diag ipl functions, subcode %lx", subcode);
+   switch (subcode) {
+   case 3:
+       vcpu->run->s390_reset_flags = KVM_S390_RESET_CLEAR;
+       break;
+   case 4:
+       vcpu->run->s390_reset_flags = 0;
+       break;
+   default:
+       return -ENOTSUPP;
+   }
+
+   atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
+   vcpu->run->s390_reset_flags |= KVM_S390_RESET_SUBSYSTEM;
+   vcpu->run->s390_reset_flags |= KVM_S390_RESET_IPL;
+   vcpu->run->s390_reset_flags |= KVM_S390_RESET_CPU_INIT;
+   vcpu->run->exit_reason = KVM_EXIT_S390_RESET;
+   VCPU_EVENT(vcpu, 3, "requesting userspace resets %lx",
+         vcpu->run->s390_reset_flags);
+   return -EREMOTE;
+}
+
+int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
+{
+   int code = (vcpu->arch.sie_block->ipb & 0xfff0000) >> 16;
+
+   switch (code) {
+   case 0x44:
+   return __diag_time_slice_end(vcpu);
+   case 0x308:
+   return __diag_ipl_functions(vcpu);
+   default:
+   return -ENOTSUPP;
+   }
+}
Index: kvm/arch/s390/kvm/intercept.c
===
--- kvm.orig/arch/s390/kvm/intercept.c
+++ kvm/arch/s390/kvm/intercept.c
@@ -100,6 +100,7 @@ static int handle_lctl(struct kvm_vcpu *
 }
 
 static intercept_handler_t instruction_handlers[256] = {
+   [0x83] = kvm_s390_handle_diag,
[0xae] = kvm_s390_handle_sigp,
[0xb2] = kvm_s390_handle_priv,
[0xb7] = handle_lctl,
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -63,6 +63,7 @@ struct kvm_stats_debugfs_item debugfs_en
{ "instruction_sigp_set_arch", VCPU_STAT(instruction_sigp_arch) },
{ "instruction_sigp_set_prefix", VCPU_STAT(instruction_sigp_prefix) },
{ "instruction_sigp_restart", VCPU_STAT(instruction_sigp_restart) },
+   { "diagnose_44", VCPU_STAT(diagnose_44) },
{ NULL }
 };
 
Index: 

[kvm-devel] [RFC/PATCH 11/15] kvm-s390: add kvm to kconfig on s390

2008-03-20 Thread Carsten Otte
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch adds the virtualization submenu and the kvm option to the kernel
config. It also defines HAVE_KVM for 64bit kernels.

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/Kconfig     |    3 +++
 arch/s390/kvm/Kconfig |   43 +++
 2 files changed, 46 insertions(+)

Index: kvm/arch/s390/Kconfig
===
--- kvm.orig/arch/s390/Kconfig
+++ kvm/arch/s390/Kconfig
@@ -66,6 +66,7 @@ config S390
select HAVE_OPROFILE
select HAVE_KPROBES
select HAVE_KRETPROBES
+   select HAVE_KVM if 64BIT
 
 source "init/Kconfig"
 
@@ -553,3 +554,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
+
+source "arch/s390/kvm/Kconfig"
Index: kvm/arch/s390/kvm/Kconfig
===
--- /dev/null
+++ kvm/arch/s390/kvm/Kconfig
@@ -0,0 +1,43 @@
+#
+# KVM configuration
+#
+config HAVE_KVM
+   bool
+
+menuconfig VIRTUALIZATION
+   bool "Virtualization"
+   default y
+   ---help---
+ Say Y here to get to see options for using your Linux host to run other
+ operating systems inside virtual machines (guests).
+ This option alone does not add any kernel code.
+
+ If you say N, all options in this submenu will be skipped and disabled.
+
+if VIRTUALIZATION
+
+config KVM
+   tristate "Kernel-based Virtual Machine (KVM) support"
+   depends on HAVE_KVM && EXPERIMENTAL
+   select PREEMPT_NOTIFIERS
+   select ANON_INODES
+   select S390_SWITCH_AMODE
+   select PREEMPT
+   ---help---
+ Support hosting paravirtualized guest machines using the SIE
+ virtualization capability on the mainframe. This should work
+ on any 64bit machine.
+
+ This module provides access to the hardware capabilities through
+ a character device node named /dev/kvm.
+
+ To compile this as a module, choose M here: the module
+ will be called kvm.
+
+ If unsure, say N.
+
+# OK, it's a little counter-intuitive to do this, but it puts it neatly under
+# the virtualization menu.
+source "drivers/virtio/Kconfig"
+
+endif # VIRTUALIZATION





[kvm-devel] [RFC/PATCH 12/15] kvm-s390: API documentation

2008-03-20 Thread Carsten Otte
From: Carsten Otte [EMAIL PROTECTED]

This patch adds Documentation/s390/kvm.txt, which describes specifics of kvm's
user interface that are unique to the s390 architecture.
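Two of the s390-specific rules documented by this patch are simple enough to state as code: the single memory slot must be slot 0, start at guest absolute address zero and sit at a page-aligned user address, and only virtio and service interrupts are accepted as floating interrupts, while the per-vcpu ioctl rejects exactly those. A minimal sketch of both checks, assuming 4 KiB pages; the function names and enum irq_kind are hypothetical stand-ins, not the types from include/linux/kvm.h, and only the four interrupt kinds named in the documentation are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* Memory slot rule: one slot, at guest address 0, page aligned in userspace. */
static bool s390_region_ok(uint32_t slot, uint64_t guest_phys_addr,
			   uint64_t userspace_addr)
{
	if (slot != 0)
		return false;
	if (guest_phys_addr != 0)
		return false;
	if (userspace_addr & (SKETCH_PAGE_SIZE - 1))
		return false;
	return true;
}

/* Hypothetical interrupt kinds named after the KVM_S390_* types. */
enum irq_kind { IRQ_PROGRAM, IRQ_SIGP_STOP, IRQ_VIRTIO, IRQ_SERVICE };

/* vm file descriptor: only virtio and service interrupts may float. */
static bool valid_as_floating(enum irq_kind kind)
{
	return kind == IRQ_VIRTIO || kind == IRQ_SERVICE;
}

/* vcpu file descriptor: the floating-only kinds are rejected. */
static bool valid_for_vcpu(enum irq_kind kind)
{
	return !valid_as_floating(kind);
}
```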

Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 Documentation/s390/kvm.txt |  125 +
 1 file changed, 125 insertions(+)

Index: kvm/Documentation/s390/kvm.txt
===
--- /dev/null
+++ kvm/Documentation/s390/kvm.txt
@@ -0,0 +1,125 @@
+*** BIG FAT WARNING ***
+The kvm module is currently in EXPERIMENTAL state for s390. This means that
+the interface to the module is not yet considered stable. Thus, be
+prepared that we keep breaking your userspace application and guest
+compatibility over and over again until we feel happy with the result. Make sure
+your guest kernel, your host kernel, and your userspace launcher are in a
+consistent state.
+
+This document describes the unique ioctl calls to /dev/kvm, the resulting
+kvm-vm file descriptors, and the kvm-vcpu file descriptors that differ from x86.
+
+1. ioctl calls to /dev/kvm
+KVM supports the following ioctls on s390 that are common with other
+architectures and behave the same:
+KVM_GET_API_VERSION
+KVM_CREATE_VM  (*) see note
+KVM_CHECK_EXTENSION
+KVM_GET_VCPU_MMAP_SIZE
+
+Notes:
+* KVM_CREATE_VM may fail on s390 if the calling process has multiple
+threads and has not called KVM_S390_ENABLE_SIE before.
+
+In addition, on s390 the following architecture specific ioctls are supported:
+ioctl: KVM_S390_ENABLE_SIE
+args:  none
+see also:  include/linux/kvm.h
+This call causes the kernel to switch on PGSTE in the user page table. This
+operation is needed in order to run a virtual machine, and it requires the
+calling process to be single-threaded. Note that the first call to KVM_CREATE_VM
+will implicitly try to switch on PGSTE if the user process has not called
+KVM_S390_ENABLE_SIE before. User processes that want to launch multiple threads
+before creating a virtual machine have to call KVM_S390_ENABLE_SIE, or will
+observe an error calling KVM_CREATE_VM. Switching on PGSTE is a one-time
+operation, is not reversible, and will persist over the entire lifetime of
+the calling process. It does not have any user-visible effect other than a small
+performance penalty.
+
+2. ioctl calls to the kvm-vm file descriptor
+KVM supports the following ioctls on s390 that are common with other
+architectures and behave the same:
+KVM_CREATE_VCPU
+KVM_SET_USER_MEMORY_REGION  (*) see note
+KVM_GET_DIRTY_LOG  (**) see note
+
+Notes:
+*  kvm only allows exactly one memory slot on s390, which has to start
+   at guest absolute address zero and at a user address that is aligned to a
+   page boundary. This hardware limitation allows us to have a few unique
+   optimizations. The memory slot does not actually have to be backed by
+   memory everywhere; it may contain sparse holes. Combined with a suitable
+   user memory layout, this still allows a lot of flexibility when setting
+   up guest memory.
+** KVM_GET_DIRTY_LOG does not work properly yet. The user will receive an empty
+log. This ioctl call is only needed for guest migration, and we intend to
+implement it in the future.
+
+In addition, on s390 the following architecture specific ioctls for the kvm-vm
+file descriptor are supported:
+ioctl: KVM_S390_INTERRUPT
+args:  struct kvm_s390_interrupt *
+see also:  include/linux/kvm.h
+This ioctl is used to submit a floating interrupt for a virtual machine.
+Floating interrupts may be delivered to any virtual cpu in the configuration.
+Only some interrupt types defined in include/linux/kvm.h make sense when
+submitted as floating interrupts. The following interrupts are not considered
+useful as floating interrupts, and a call to inject them will result in an
+-EINVAL error code: program interrupts and interprocessor signals. Valid
+floating interrupts are:
+KVM_S390_INT_VIRTIO
+KVM_S390_INT_SERVICE
+
+3. ioctl calls to the kvm-vcpu file descriptor
+KVM supports the following ioctls on s390 that are common with other
+architectures and behave the same:
+KVM_RUN
+KVM_GET_REGS
+KVM_SET_REGS
+KVM_GET_SREGS
+KVM_SET_SREGS
+KVM_GET_FPU
+KVM_SET_FPU
+
+In addition, on s390 the following architecture specific ioctls for the
+kvm-vcpu file descriptor are supported:
+ioctl: KVM_S390_INTERRUPT
+args:  struct kvm_s390_interrupt *
+see also:  include/linux/kvm.h
+This ioctl is used to submit an interrupt for a specific virtual cpu.
+Only some interrupt types defined in include/linux/kvm.h make sense when
+submitted for a specific cpu. The following interrupts are not considered
+useful here, and a call to inject them will result in an -EINVAL error code:
+service processor calls and virtio interrupts. Valid interrupt types are:
+KVM_S390_PROGRAM_INT
+KVM_S390_SIGP_STOP

[kvm-devel] [RFC/PATCH 13/15] kvm-s390: update maintainers

2008-03-20 Thread Carsten Otte
From: Christian Borntraeger [EMAIL PROTECTED]

This patch adds an entry for kvm on s390 to the MAINTAINERS file :-). We intend
to push all patches regarding this via Avi's kvm.git.

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 MAINTAINERS |   10 ++
 1 file changed, 10 insertions(+)

Index: kvm/MAINTAINERS
===
--- kvm.orig/MAINTAINERS
+++ kvm/MAINTAINERS
@@ -2296,6 +2296,16 @@ L:   [EMAIL PROTECTED]
 W: kvm.sourceforge.net
 S: Supported
 
+KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
+P: Carsten Otte
+M: [EMAIL PROTECTED]
+P: Christian Borntraeger
+M: [EMAIL PROTECTED]
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://www.ibm.com/developerworks/linux/linux390/
+S: Supported
+
 KEXEC
 P: Eric Biederman
 M: [EMAIL PROTECTED]





[kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Carsten Otte
From: Christian Borntraeger [EMAIL PROTECTED]
From: Carsten Otte [EMAIL PROTECTED]

This patch adds functionality to detect whether the kernel runs under the KVM
hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
allows drivers to skip device detection if the system runs non-virtualized.
We also define a preferred console to avoid ending up on ttyS0, which is a
line-mode-only console.
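Detection is a plain comparison on the cpu id that detect_machine_type() already inspects, and MACHINE_IS_KVM just tests the value 64 in machine_flags. A minimal stand-alone sketch of the two checks in detect_machine_type() touched by this patch; detect_flags(), machine_is_kvm() and the FLAG_* names are hypothetical, and the machine type values used in the checks are arbitrary apart from 0x7490.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as tested by the MACHINE_IS_* macros in setup.h. */
#define FLAG_IS_P390 4UL
#define FLAG_IS_KVM  64UL

/* Hypothetical stand-alone version of the two checks shown in early.c. */
static unsigned long detect_flags(unsigned int version, unsigned int machine)
{
	unsigned long flags = 0;

	if (machine == 0x7490)          /* running on a P/390 */
		flags |= FLAG_IS_P390;
	if (version == 0xfe)            /* running under KVM */
		flags |= FLAG_IS_KVM;
	return flags;
}

static bool machine_is_kvm(unsigned long flags)
{
	return (flags & FLAG_IS_KVM) != 0;
}
```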

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/Kconfig        |    7 +++
 arch/s390/kernel/early.c |    4 
 arch/s390/kernel/setup.c |   10 +++---
 include/asm-s390/setup.h |    1 +
 4 files changed, 19 insertions(+), 3 deletions(-)

Index: kvm/arch/s390/Kconfig
===
--- kvm.orig/arch/s390/Kconfig
+++ kvm/arch/s390/Kconfig
@@ -533,6 +533,13 @@ config ZFCPDUMP
   Select this option if you want to build an zfcpdump enabled kernel.
   Refer to file:Documentation/s390/zfcpdump.txt for more details on this.
 
+config S390_GUEST
+   bool "s390 guest support (EXPERIMENTAL)"
+   depends on 64BIT && EXPERIMENTAL
+   select VIRTIO
+   select VIRTIO_RING
+   help
+ Select this option if you want to run the kernel under s390 linux
 endmenu
 
 source "net/Kconfig"
Index: kvm/arch/s390/kernel/early.c
===
--- kvm.orig/arch/s390/kernel/early.c
+++ kvm/arch/s390/kernel/early.c
@@ -143,6 +143,10 @@ static noinline __init void detect_machi
/* Running on a P/390 ? */
if (cpuinfo->cpu_id.machine == 0x7490)
machine_flags |= 4;
+
+   /* Running under KVM ? */
+   if (cpuinfo->cpu_id.version == 0xfe)
+   machine_flags |= 64;
 }
 
 #ifdef CONFIG_64BIT
Index: kvm/arch/s390/kernel/setup.c
===
--- kvm.orig/arch/s390/kernel/setup.c
+++ kvm/arch/s390/kernel/setup.c
@@ -793,9 +793,13 @@ setup_arch(char **cmdline_p)
   "This machine has an IEEE fpu\n" :
   "This machine has no IEEE fpu\n");
 #else /* CONFIG_64BIT */
-   printk((MACHINE_IS_VM) ?
-  "We are running under VM (64 bit mode)\n" :
-  "We are running native (64 bit mode)\n");
+   if (MACHINE_IS_VM)
+       printk("We are running under VM (64 bit mode)\n");
+   else if (MACHINE_IS_KVM) {
+       printk("We are running under KVM (64 bit mode)\n");
+       add_preferred_console("ttyS", 1, NULL);
+   } else
+       printk("We are running native (64 bit mode)\n");
 #endif /* CONFIG_64BIT */
 
/* Save unparsed command line copy for /proc/cmdline */
Index: kvm/include/asm-s390/setup.h
===
--- kvm.orig/include/asm-s390/setup.h
+++ kvm/include/asm-s390/setup.h
@@ -62,6 +62,7 @@ extern unsigned long machine_flags;
 #define MACHINE_IS_VM      (machine_flags & 1)
 #define MACHINE_IS_P390    (machine_flags & 4)
 #define MACHINE_HAS_MVPG   (machine_flags & 16)
+#define MACHINE_IS_KVM     (machine_flags & 64)
 #define MACHINE_HAS_IDTE   (machine_flags & 128)
 #define MACHINE_HAS_DIAG9C (machine_flags & 256)
 





[kvm-devel] [RFC/PATCH 15/15] guest: virtio device support, and kvm hypercalls

2008-03-20 Thread Carsten Otte
From: Christian Borntraeger [EMAIL PROTECTED]

This patch implements kvm guest kernel support for paravirtualized devices
and contains two parts:
o a basic virtio stub using virtio_ring, external interrupts and hypercalls
o a full hypercall implementation in kvm_para.h

Currently we don't have PCI on s390. Making virtio_pci usable for s390 seems
more complicated than providing our own stub. This virtio stub is similar to
the lguest one: the memory for the descriptors and for device detection is
provided via additional memory mapped on top of the guest storage. We use an
external interrupt with extint code 1237 for host->guest notification.
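The device page described above, walked by kvm_vq_config(), kvm_vq_features(), kvm_vq_configspace() and desc_size() in kvm_virtio.c, is a flat run of variable-length sections. A minimal sketch of that offset arithmetic follows, using hypothetical stand-in structs: only num_vq, feature_len and config_len are taken from the patch, while the other fields and their sizes are illustrative assumptions and the real kvm_virtio.h definitions may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: num_vq, feature_len and config_len come from the
 * patch; 'type' and 'status' are illustrative only. */
struct sketch_device_desc {
	uint8_t type;
	uint8_t num_vq;      /* number of sketch_vqconfig entries that follow */
	uint8_t feature_len; /* size in bytes of ONE feature bitmap */
	uint8_t config_len;  /* size in bytes of the config space */
	uint8_t status;
};

/* Hypothetical per-virtqueue entry; the field set is an assumption. */
struct sketch_vqconfig {
	uint64_t token;
	uint64_t address;
	uint16_t num;
};

/* Byte offsets from the start of the descriptor. */
static size_t sketch_vq_config_off(const struct sketch_device_desc *d)
{
	assert(d != NULL);
	return sizeof(*d); /* like (desc + 1) */
}

static size_t sketch_features_off(const struct sketch_device_desc *d)
{
	return sketch_vq_config_off(d) +
	       (size_t)d->num_vq * sizeof(struct sketch_vqconfig);
}

static size_t sketch_configspace_off(const struct sketch_device_desc *d)
{
	/* Two bitmaps of feature_len bytes: host offer, then guest ack. */
	return sketch_features_off(d) + (size_t)d->feature_len * 2;
}

static size_t sketch_desc_size(const struct sketch_device_desc *d)
{
	return sketch_configspace_off(d) + d->config_len;
}
```

A consumer would step from one descriptor to the next by adding sketch_desc_size() to its address, which is why every section length has to be recorded in the descriptor itself.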

The hypercall definition uses the diag instruction to issue a hypercall. The
parameters are passed in R2-R7 and the hypercall number in R1. This is
similar to the system call ABI (svc), which can use R1 for the number and
R2-R6 for the parameters.


Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 drivers/s390/Makefile |2 
 drivers/s390/kvm/Makefile |9 +
 drivers/s390/kvm/kvm_virtio.c |  326 ++
 drivers/s390/kvm/kvm_virtio.h |   47 ++
 include/asm-s390/kvm_para.h   |  124 +++
 5 files changed, 505 insertions(+), 3 deletions(-)

Index: kvm/drivers/s390/Makefile
===
--- kvm.orig/drivers/s390/Makefile
+++ kvm/drivers/s390/Makefile
@@ -5,7 +5,7 @@
 CFLAGS_sysinfo.o += -Iinclude/math-emu -Iarch/s390/math-emu -w
 
 obj-y += s390mach.o sysinfo.o s390_rdev.o
-obj-y += cio/ block/ char/ crypto/ net/ scsi/
+obj-y += cio/ block/ char/ crypto/ net/ scsi/ kvm/
 
 drivers-y += drivers/s390/built-in.o
 
Index: kvm/drivers/s390/kvm/Makefile
===
--- /dev/null
+++ kvm/drivers/s390/kvm/Makefile
@@ -0,0 +1,9 @@
+# Makefile for kvm guest drivers on s390
+#
+# Copyright IBM Corp. 2008
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License (version 2 only)
+# as published by the Free Software Foundation.
+
+obj-$(CONFIG_VIRTIO) += kvm_virtio.o
Index: kvm/drivers/s390/kvm/kvm_virtio.c
===
--- /dev/null
+++ kvm/drivers/s390/kvm/kvm_virtio.c
@@ -0,0 +1,326 @@
+/*
+ * kvm_virtio.c - virtio for kvm on s390
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *Author(s): Christian Borntraeger [EMAIL PROTECTED]
+ */
+
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/err.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/interrupt.h>
+#include <linux/virtio_ring.h>
+#include <asm/io.h>
+#include <asm/kvm_para.h>
+#include <asm/setup.h>
+#include <asm/s390_ext.h>
+
+#include "kvm_virtio.h"
+
+/*
+ * The pointer to our (page) of device descriptions.
+ */
+static void *kvm_devices;
+
+/*
+ * Unique numbering for kvm devices.
+ */
+static unsigned int dev_index;
+
+struct kvm_device {
+   struct virtio_device vdev;
+   struct kvm_device_desc *desc;
+};
+
+#define to_kvmdev(vd) container_of(vd, struct kvm_device, vdev)
+
+/*
+ * memory layout:
+ * - kvm_device_descriptor
+ *struct kvm_device_desc
+ * - configuration
+ *struct kvm_vqconfig
+ * - feature bits
+ * - config space
+ */
+static struct kvm_vqconfig *kvm_vq_config(const struct kvm_device_desc *desc)
+{
+   return (struct kvm_vqconfig *)(desc + 1);
+}
+
+static u8 *kvm_vq_features(const struct kvm_device_desc *desc)
+{
+   return (u8 *)(kvm_vq_config(desc) + desc->num_vq);
+}
+
+static u8 *kvm_vq_configspace(const struct kvm_device_desc *desc)
+{
+   return kvm_vq_features(desc) + desc->feature_len * 2;
+}
+
+/*
+ * The total size of the config page used by this device (incl. desc)
+ */
+static unsigned desc_size(const struct kvm_device_desc *desc)
+{
+   return sizeof(*desc)
+   + desc->num_vq * sizeof(struct kvm_vqconfig)
+   + desc->feature_len * 2
+   + desc->config_len;
+}
+
+/*
+ * This tests (and acknowledges) a feature bit.
+ */
+static bool kvm_feature(struct virtio_device *vdev, unsigned fbit)
+{
+   struct kvm_device_desc *desc = to_kvmdev(vdev)->desc;
+   u8 *features;
+
+   if (fbit / 8 > desc->feature_len)
+       return false;
+
+   features = kvm_vq_features(desc);
+   if (!(features[fbit / 8] & (1 << (fbit % 8))))
+       return false;
+
+   /*
+    * We set the matching bit in the other half of the bitmap to tell the
+    * Host we want to use this feature.
+    */
+   features[desc->feature_len + fbit / 8] |= (1 << (fbit % 8));
+   return true;
+}
+
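kvm_feature() above treats the feature area as two bitmaps of feature_len bytes each: the host's offer first, then the guest's acknowledgement. The following standalone sketch shows that handshake over a plain byte array; the function name and parameters are hypothetical. It bounds-checks with `>=` because the valid byte indexes of one half are 0 to feature_len - 1. With `>`, a bit whose byte index equals feature_len would be looked up in the acknowledgement half instead of being rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * features[0 .. len-1]     : bits offered by the host
 * features[len .. 2*len-1] : bits acknowledged by the guest
 * 'features' and 'len' stand in for kvm_vq_features(desc) and
 * desc->feature_len.
 */
static bool sketch_feature_ack(uint8_t *features, size_t len, unsigned fbit)
{
	assert(features != NULL);

	/* Byte index must stay inside the first (offer) half. */
	if (fbit / 8 >= len)
		return false;

	/* Host did not offer this feature. */
	if (!(features[fbit / 8] & (1u << (fbit % 8))))
		return false;

	/* Record in the second half that the guest wants this feature. */
	features[len + fbit / 8] |= (uint8_t)(1u << (fbit % 8));
	return true;
}
```

With a two-byte bitmap whose first byte is 0x05 (bits 0 and 2 offered), acknowledging bits 0 and 2 sets the first acknowledgement byte to 0x05, bit 1 is refused, and bit 16 is rejected by the bounds check.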

[kvm-devel] [RFC/PATCH 08/15] kvm-s390: intercepts for privileged instructions

2008-03-20 Thread Carsten Otte
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch introduces in-kernel handling of some intercepts for privileged
instructions:
handle_set_prefix()        sets the prefix register of the local cpu
handle_store_prefix()      stores the content of the prefix register to memory
handle_store_cpu_address() stores the cpu number of the current cpu to memory
handle_skey()              just decrements the instruction address and retries
handle_stsch()             delivers condition code 3 (operation not supported)
handle_chsc()              same as handle_stsch()
handle_stfl()              stores the facility list, which contains the
                           capabilities of the cpu
handle_stidp()             stores cpu type/model/revision and such
handle_stsi()              stores information about the system topology
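As the intercept.c hunk below shows, every 0xb2xx instruction is routed to kvm_s390_handle_priv(), which then has to select one of the handlers listed above by the second opcode byte. What follows is a minimal sketch of such a second-level table. It assumes that the 16-bit `ipa` field holds the first two instruction bytes, and that the three storage-key instructions all map to handle_skey(). The opcode values are the z/Architecture ones for the listed instructions. The table, the strings standing in for handler functions, and the helper names are illustrative, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Second-level table for 0xb2xx opcodes, indexed by the low opcode byte.
 * Strings stand in for the handler functions named in the description. */
static const char *const sketch_b2_handlers[256] = {
	[0x02] = "handle_stidp",             /* STIDP: store cpu id */
	[0x10] = "handle_set_prefix",        /* SPX */
	[0x11] = "handle_store_prefix",      /* STPX */
	[0x12] = "handle_store_cpu_address", /* STAP */
	[0x29] = "handle_skey",              /* ISKE */
	[0x2a] = "handle_skey",              /* RRBE */
	[0x2b] = "handle_skey",              /* SSKE */
	[0x34] = "handle_stsch",             /* STSCH */
	[0x5f] = "handle_chsc",              /* CHSC */
	[0x7d] = "handle_stsi",              /* STSI */
	[0xb1] = "handle_stfl",              /* STFL */
};

/* Handler for the instruction whose first two bytes are 'ipa', or NULL if
 * there is none (the kernel would then return -ENOTSUPP). */
static const char *sketch_priv_lookup(uint16_t ipa)
{
	assert((ipa >> 8) == 0xb2); /* first-level table already matched 0xb2 */
	return sketch_b2_handlers[ipa & 0x00ff];
}

static bool sketch_priv_is(uint16_t ipa, const char *handler)
{
	const char *h = sketch_priv_lookup(ipa);

	return h != NULL && strcmp(h, handler) == 0;
}
```

A 256-entry array indexed directly by the opcode byte matches the style of instruction_handlers[256] in intercept.c and makes the lookup a single load.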

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/kvm/Makefile  |2 
 arch/s390/kvm/intercept.c   |1 
 arch/s390/kvm/kvm-s390.c|   11 +
 arch/s390/kvm/kvm-s390.h|3 
 arch/s390/kvm/priv.c|  322 
 include/asm-s390/kvm_host.h |   13 +
 6 files changed, 351 insertions(+), 1 deletion(-)

Index: kvm/arch/s390/kvm/Makefile
===
--- kvm.orig/arch/s390/kvm/Makefile
+++ kvm/arch/s390/kvm/Makefile
@@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/
 
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
 
-kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o interrupt.o priv.o
 obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/intercept.c
===
--- kvm.orig/arch/s390/kvm/intercept.c
+++ kvm/arch/s390/kvm/intercept.c
@@ -100,6 +100,7 @@ static int handle_lctl(struct kvm_vcpu *
 }
 
 static intercept_handler_t instruction_handlers[256] = {
+   [0xb2] = kvm_s390_handle_priv,
[0xb7] = handle_lctl,
[0xeb] = handle_lctg,
 };
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -48,6 +48,15 @@ struct kvm_stats_debugfs_item debugfs_en
{ "deliver_restart_signal", VCPU_STAT(deliver_restart_signal) },
{ "deliver_program_interruption", VCPU_STAT(deliver_program_int) },
{ "exit_wait_state", VCPU_STAT(exit_wait_state) },
+   { "instruction_stidp", VCPU_STAT(instruction_stidp) },
+   { "instruction_spx", VCPU_STAT(instruction_spx) },
+   { "instruction_stpx", VCPU_STAT(instruction_stpx) },
+   { "instruction_stap", VCPU_STAT(instruction_stap) },
+   { "instruction_storage_key", VCPU_STAT(instruction_storage_key) },
+   { "instruction_stsch", VCPU_STAT(instruction_stsch) },
+   { "instruction_chsc", VCPU_STAT(instruction_chsc) },
+   { "instruction_stsi", VCPU_STAT(instruction_stsi) },
+   { "instruction_stfl", VCPU_STAT(instruction_stfl) },
{ NULL }
 };
 
@@ -249,6 +258,8 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu 
vcpu->arch.sie_block->eca   = 0xC1002001U;
setup_timer(&vcpu->arch.ckc_timer, kvm_s390_idle_wakeup,
 (unsigned long) vcpu);
+   get_cpu_id(&vcpu->arch.cpu_id);
+   vcpu->arch.cpu_id.version = 0xfe;
return 0;
 }
 
Index: kvm/arch/s390/kvm/kvm-s390.h
===
--- kvm.orig/arch/s390/kvm/kvm-s390.h
+++ kvm/arch/s390/kvm/kvm-s390.h
@@ -47,4 +47,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
 int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
struct kvm_s390_interrupt *s390int);
 int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code);
+
+/* implemented in priv.c */
+int kvm_s390_handle_priv(struct kvm_vcpu *vcpu);
 #endif
Index: kvm/arch/s390/kvm/priv.c
===
--- /dev/null
+++ kvm/arch/s390/kvm/priv.c
@@ -0,0 +1,322 @@
+/*
+ * priv.c - handling privileged instructions
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *Author(s): Carsten Otte [EMAIL PROTECTED]
+ *   Christian Borntraeger [EMAIL PROTECTED]
+ */
+
+#include <linux/kvm.h>
+#include <linux/errno.h>
+#include <asm/current.h>
+#include <asm/debug.h>
+#include <asm/ebcdic.h>
+#include <asm/sysinfo.h>
+#include "gaccess.h"
+#include "kvm-s390.h"
+
+static int handle_set_prefix(struct kvm_vcpu *vcpu)
+{
+   int base2 = vcpu->arch.sie_block->ipb >> 28;
+   int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff) >> 16);
+   u64 operand2;
+   u32 address = 0;
+   u8 tmp;
+
+   

[kvm-devel] [RFC/PATCH 06/15] kvm-s390: sie intercept handling

2008-03-20 Thread Carsten Otte
From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

This patch introduces handling of sie intercepts in three flavors: intercepts
are either handled completely in-kernel by kvm_handle_sie_intercept(), or
passed to userspace with the corresponding data in struct kvm_run in case
kvm_handle_sie_intercept() returns -ENOTSUPP. In case of partial execution in
the kernel with the need for userspace support, kvm_handle_sie_intercept() may
choose to set up struct kvm_run and return -EREMOTE.

The trivial intercept reasons are handled in this patch:
handle_noop()     does nothing, for intercepts that don't require our support
                  at all
handle_stop()     is called when a cpu enters the stopped state; it drops out
                  to userland after updating our vcpu state
handle_validity() faults in the cpu lowcore if needed, or passes the request
                  to userland
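The dispatch described above can be sketched as a table indexed by `icptcode >> 2`, since the handled intercept codes are multiples of 4. Below is a userspace sketch with hypothetical names and a stub vcpu. It covers only the in-kernel (0) and hand-to-userspace (-ENOTSUPP) outcomes and leaves out the -EREMOTE partial-execution case. The value 524 is that of the kernel-internal ENOTSUPP. Deriving the upper bound from the table size, rather than from a literal such as 0x48, keeps the check in step with the table and rejects any code whose index would fall past the last entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENOTSUPP 524 /* value of the kernel-internal ENOTSUPP */

struct sketch_vcpu {
	uint8_t icptcode; /* SIE intercept code */
	int stopped;
};

typedef int (*sketch_intercept_t)(struct sketch_vcpu *);

/* Handled completely in-kernel: nothing to do. */
static int sketch_noop(struct sketch_vcpu *v)
{
	(void)v;
	return 0;
}

/* Update vcpu state, then ask for userspace by returning -ENOTSUPP. */
static int sketch_stop(struct sketch_vcpu *v)
{
	v->stopped = 1;
	return -SKETCH_ENOTSUPP;
}

static const sketch_intercept_t sketch_funcs[] = {
	[0x00 >> 2] = sketch_noop,
	[0x10 >> 2] = sketch_noop,
	[0x14 >> 2] = sketch_noop,
	[0x28 >> 2] = sketch_stop,
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static int sketch_handle_intercept(struct sketch_vcpu *v)
{
	size_t idx;

	assert(v != NULL);
	if (v->icptcode & 3)
		return -SKETCH_ENOTSUPP;
	idx = v->icptcode >> 2;
	if (idx >= SKETCH_ARRAY_SIZE(sketch_funcs) || !sketch_funcs[idx])
		return -SKETCH_ENOTSUPP; /* no in-kernel handler: go to userspace */
	return sketch_funcs[idx](v);
}
```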

Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
Signed-off-by: Carsten Otte [EMAIL PROTECTED]
---
 arch/s390/kvm/Makefile  |2 -
 arch/s390/kvm/intercept.c   |   83 
 arch/s390/kvm/kvm-s390.c|   46 +++-
 arch/s390/kvm/kvm-s390.h|6 +++
 include/asm-s390/kvm_host.h |4 ++
 include/linux/kvm.h |9 
 6 files changed, 148 insertions(+), 2 deletions(-)

Index: kvm/arch/s390/kvm/Makefile
===
--- kvm.orig/arch/s390/kvm/Makefile
+++ kvm/arch/s390/kvm/Makefile
@@ -10,5 +10,5 @@ common-objs = $(addprefix ../../../virt/
 
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/s390/kvm
 
-kvm-objs := $(common-objs) kvm-s390.o sie64a.o
+kvm-objs := $(common-objs) kvm-s390.o sie64a.o intercept.o
 obj-$(CONFIG_KVM) += kvm.o
Index: kvm/arch/s390/kvm/intercept.c
===
--- /dev/null
+++ kvm/arch/s390/kvm/intercept.c
@@ -0,0 +1,83 @@
+/*
+ * intercept.c - in-kernel handling for sie intercepts
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *Author(s): Carsten Otte [EMAIL PROTECTED]
+ *   Christian Borntraeger [EMAIL PROTECTED]
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/errno.h>
+#include <linux/pagemap.h>
+
+#include <asm/kvm_host.h>
+
+#include "kvm-s390.h"
+
+static int handle_noop(struct kvm_vcpu *vcpu)
+{
+   switch (vcpu->arch.sie_block->icptcode) {
+   case 0x10:
+       vcpu->stat.exit_external_request++;
+       break;
+   case 0x14:
+       vcpu->stat.exit_external_interrupt++;
+   break;
+   default:
+   break; /* nothing */
+   }
+   return 0;
+}
+
+static int handle_stop(struct kvm_vcpu *vcpu)
+{
+   vcpu->stat.exit_stop_request++;
+   VCPU_EVENT(vcpu, 3, "%s", "cpu stopped");
+   atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
+   return -ENOTSUPP;
+}
+
+static int handle_validity(struct kvm_vcpu *vcpu)
+{
+   int viwhy = vcpu->arch.sie_block->ipb >> 16;
+   vcpu->stat.exit_validity++;
+   if (viwhy == 0x37) {
+       fault_in_pages_writeable((char __user *)
+               vcpu->kvm->arch.guest_origin +
+               vcpu->arch.sie_block->prefix, PAGE_SIZE);
+       return 0;
+   }
+   VCPU_EVENT(vcpu, 2, "unhandled validity intercept code %d",
+       viwhy);
+   return -ENOTSUPP;
+}
+
+static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
+   [0x00 >> 2] = handle_noop,
+   [0x10 >> 2] = handle_noop,
+   [0x14 >> 2] = handle_noop,
+   [0x20 >> 2] = handle_validity,
+   [0x28 >> 2] = handle_stop,
+};
+
+int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
+{
+   intercept_handler_t func;
+   u8 code = vcpu->arch.sie_block->icptcode;
+
+   if (code & 3 || code > 0x48)
+       return -ENOTSUPP;
+
+   func = intercept_funcs[code >> 2];
+
+   if (func)
+   return func(vcpu);
+
+   return -ENOTSUPP;
+}
+
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -23,12 +23,17 @@
 #include <asm/lowcore.h>
 #include <asm/pgtable.h>
 
+#include "kvm-s390.h"
 #include "gaccess.h"
 
 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
 
 struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "userspace_handled", VCPU_STAT(exit_userspace) },
+   { "exit_validity", VCPU_STAT(exit_validity) },
+   { "exit_stop_request", VCPU_STAT(exit_stop_request) },
+   { "exit_external_request", VCPU_STAT(exit_external_request) },
+   { "exit_external_interrupt", VCPU_STAT(exit_external_interrupt) },
{ NULL }
 };
 
@@ -384,6 +389,7 @@ static void __vcpu_run(struct 

Re: [kvm-devel] [RFC/PATCH 05/15] KVM_MAX_VCPUS

2008-03-20 Thread Carsten Otte
Hollis Blanchard wrote:
> On Thu, 2008-03-20 at 17:24 +0100, Carsten Otte wrote:
>> Index: kvm/include/linux/kvm_host.h
>> ===
>> --- kvm.orig/include/linux/kvm_host.h
>> +++ kvm/include/linux/kvm_host.h
>> @@ -24,7 +24,11 @@
>>
>>  #include <asm/kvm_host.h>
>>
>> +#ifdef CONFIG_S390
>> +#define KVM_MAX_VCPUS 64
>> +#else
>>  #define KVM_MAX_VCPUS 16
>> +#endif
>>  #define KVM_MEMORY_SLOTS 32
>>  /* memory slots that does not exposed to userspace */
>>  #define KVM_PRIVATE_MEM_SLOTS 4
>
> Why don't we just define this in asm/kvm_host.h ?
No problem with that; I just wanted to keep the impact on common code very
low and keep things like this separated from the actual port. I have a few
things like this that can safely be taken care of later.
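One way to follow Hollis's suggestion without touching every architecture at once is to let the arch header define the limit and keep a fallback in the common header. The following is a minimal single-file sketch of that pattern. The hypothetical SKETCH_ARCH_S390 switch stands in for the arch header, and the two commented sections stand in for asm/kvm_host.h and linux/kvm_host.h.

```c
#include <assert.h>

/* ---- stands in for the arch header (asm/kvm_host.h) ---- */
#ifdef SKETCH_ARCH_S390
#define KVM_MAX_VCPUS 64
#endif

/* ---- stands in for the common header (linux/kvm_host.h) ---- */
#ifndef KVM_MAX_VCPUS
#define KVM_MAX_VCPUS 16 /* default when the arch header does not override */
#endif

_Static_assert(KVM_MAX_VCPUS > 0, "KVM_MAX_VCPUS must be positive");

static int sketch_max_vcpus(void)
{
	return KVM_MAX_VCPUS;
}
```

Built as-is, sketch_max_vcpus() returns 16. Built with -DSKETCH_ARCH_S390, it returns 64, and the common code does not need an `#ifdef CONFIG_S390`.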



Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Randy Dunlap
On Thu, 20 Mar 2008 17:25:26 +0100 Carsten Otte wrote:

> From: Christian Borntraeger [EMAIL PROTECTED]
> From: Carsten Otte [EMAIL PROTECTED]
>
> This patch adds functionality to detect if the kernel runs under the KVM
> hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
> allows drivers to skip device detection if the systems runs non-virtualized.
> We also define a preferred console to avoid having the ttyS0, which is a line
> mode only console.
>
> Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
> Acked-by: Martin Schwidefsky [EMAIL PROTECTED]
> Signed-off-by: Carsten Otte [EMAIL PROTECTED]
> ---
>  arch/s390/Kconfig        |    7 +++
>  arch/s390/kernel/early.c |    4 
>  arch/s390/kernel/setup.c |   10 +++---
>  include/asm-s390/setup.h |    1 +
>  4 files changed, 19 insertions(+), 3 deletions(-)
>
> Index: kvm/arch/s390/kernel/early.c
> ===
> --- kvm.orig/arch/s390/kernel/early.c
> +++ kvm/arch/s390/kernel/early.c
> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>   /* Running on a P/390 ? */
>   if (cpuinfo->cpu_id.machine == 0x7490)
>   machine_flags |= 4;
> +
> + /* Running under KVM ? */
> + if (cpuinfo->cpu_id.version == 0xfe)

Hi,

Where are these magic numbers documented?  (0x7490, 0xfe, etc.)


> + machine_flags |= 64;
>   }
>
>   #ifdef CONFIG_64BIT

---
~Randy



Re: [kvm-devel] [RFC/PATCH 12/15] kvm-s390: API documentation

2008-03-20 Thread Randy Dunlap
On Thu, 20 Mar 2008 17:25:20 +0100 Carsten Otte wrote:

> This patch adds Documentation/s390/kvm.txt, which describes specifics of kvm's
> user interface that are unique to s390 architecture.
>
> Signed-off-by: Carsten Otte [EMAIL PROTECTED]
> ---
>  Documentation/s390/kvm.txt |  125 +
>  1 file changed, 125 insertions(+)
>
> Index: kvm/Documentation/s390/kvm.txt
> ===
> --- /dev/null
> +++ kvm/Documentation/s390/kvm.txt
> @@ -0,0 +1,125 @@
> +*** BIG FAT WARNING ***
> +The kvm module is currently in EXPERIMENTAL state for s390. This means, that

This means that  [no comma]

> +the interface to the module is not yet considered to remain stable. Thus, be
> +prepared that we keep breaking your userspace application and guest
> +compatibility over and over again until we feel happy with the result. Make sure
> +your guest kernel, your host kernel, and your userspace launcher are in a
> +consistent state.
> +
> +This Documentation describes the unique ioctl calls to /dev/kvm, the resulting
> +kvm-vm file descriptors, and the kvm-vcpu file descriptors that differ from x86.
> +
> +1. ioctl calls to /dev/kvm
> +KVM does support the following ioctls on s390 that are common with other
> +architectures and do behave the same:
> +KVM_GET_API_VERSION
> +KVM_CREATE_VM               (*) see note
> +KVM_CHECK_EXTENSION
> +KVM_GET_VCPU_MMAP_SIZE
> +
> +Notes:
> +* KVM_CREATE_VM may fail on s390, if the calling process has multiple
> +threads and has not called KVM_S390_ENABLE_SIE before.
> +
> +In addition, on s390 the following architecture specific ioctls are supported:
> +ioctl:      KVM_S390_ENABLE_SIE
> +args:       none
> +see also:   include/linux/kvm.h
> +This call causes the kernel to switch on PGSTE in the user page table. This
> +operation is needed in order to run a virtual machine, and it requires the
> +calling process to be single-threaded. Note that the first call to KVM_CREATE_VM
> +will implicitly try to switch on PGSTE if the user process has not called
> +KVM_S390_ENABLE_SIE before. User processes that want to launch multiple threads
> +before creating a virtual machine have to call KVM_S390_ENABLE_SIE, or will
> +observe an error calling KVM_CREATE_VM. Switching on PGSTE is a one-time
> +operation, is not reversible, and will persist over the entire lifetime of
> +the calling process. It does not have any user-visibe effect other than a small

 user-visible

> +performance penalty.
> +
> +2. ioctl calls to the kvm-vm file descriptor
> +KVM does support the following ioctls on s390 that are common with other
> +architectures and do behave the same:
> +KVM_CREATE_VCPU
> +KVM_SET_USER_MEMORY_REGION  (*) see note
> +KVM_GET_DIRTY_LOG           (**) see note
> +
> +Notes:
> +*  kvm does only allow exactly one memory slot on s390, which has to start
> +   at guest absolute address zero and at a user address that is aligned on any
> +   page boundary. This hardware limitation allows us to have a few unique
> +   optimizations. The memory slot does'nt have to be filled

 doesn't

> +   with memory actually, it may contain sparse holes. That said, with different
> +   user memory layout this does still allow a large flexibility when
> +   doing the guest memory setup.
> +** KVM_GET_DIRTY_LOG does'nt work proper yet. The user will receive an empty

doesn't work properly

> +log. This ioctl call is only needed for guest migration, and we intend to
> +implement this one in the future.
> +
> +In addition, on s390 the following architecture specific ioctls for the kvm-vm
> +file descriptor are supported:
> +ioctl:      KVM_S390_INTERRUPT
> +args:       struct kvm_s390_interrupt *
> +see also:   include/linux/kvm.h
> +This ioctl is used to submit a floating interrupt for a virtual machine.
> +Floating interrupts may be delivered to any virtual cpu in the configuration.
> +Only some interrupt types defined in include/linux/kvm.h make sense when
> +submitted as floating interrupt. The following interrupts are not considered

 interrupts.

> +to be useful as floating interrupt, and a call to inject them will result in

interrupts,

> +-EINVAL error code: program interrupts, and interprocessor signals. Valid

no comma

> +floating interrupts are:
> +KVM_S390_INT_VIRTIO
> +KVM_S390_INT_SERVICE
> +
> +3. ioctl calls to the kvm-vcpu file descriptor
> +KVM does support the following ioctls on s390 that are common with other
> +architectures and do behave the same:
> +KVM_RUN
> +KVM_GET_REGS
> +KVM_SET_REGS
> +KVM_GET_SREGS
> +KVM_SET_SREGS
> +KVM_GET_FPU
> +KVM_SET_FPU
> +
> +In addition, on s390 the following architecture specific ioctls for the
> +kvm-vcpu file descriptor are supported:
> +ioctl:      KVM_S390_INTERRUPT
> +args:       struct kvm_s390_interrupt *
> +see also:

Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Carsten Otte
Randy Dunlap wrote:
>> Index: kvm/arch/s390/kernel/early.c
>> ===
>> --- kvm.orig/arch/s390/kernel/early.c
>> +++ kvm/arch/s390/kernel/early.c
>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>   /* Running on a P/390 ? */
>>   if (cpuinfo->cpu_id.machine == 0x7490)
>>   machine_flags |= 4;
>> +
>> +/* Running under KVM ? */
>> +if (cpuinfo->cpu_id.version == 0xfe)
>
> Hi,
>
> Where are these magic numbers documented?  (0x7490, 0xfe, etc.)
>
>
>> +machine_flags |= 64;
>>   }
>>
>>   #ifdef CONFIG_64BIT
The CPU ID (and most other things about the s390 architecture) is documented
in the Principles of Operation:
http://publibz.boulder.ibm.com/epubs/pdf/a2278324.pdf
http://publibz.boulder.ibm.com/epubs/pdf/dz9zs001.pdf

(see the chapter "Control Instructions", STORE CPU ID)

The 0xfe, however, is a convention: the kvm arch code sets this value
where it implements that instruction. See the privileged instructions patch.



Re: [kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable

2008-03-20 Thread Jeremy Fitzhardinge
Carsten Otte wrote:
> +struct mm_struct *dup_mm(struct task_struct *tsk);
>

No prototypes in .c files.  Put this in an appropriate header.

J



Re: [kvm-devel] [PATCH] virtio_blk: Dont waste major numbers

2008-03-20 Thread H. Peter Anvin
Christian Borntraeger wrote:
> Rusty,
>
> currently virtio_blk uses one major number per device. While this works
> quite well on most systems it is wasteful and will exhaust major numbers
> on larger installations.
>
> This patch allocates a major number on init and will use 16 minor numbers
> for each disk. That will allow ~64k virtio_blk disks.
>

Would it be too much to allow 64 minors (63 partitions)?  I have run out 
of 16, myself, but never 64.
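Here is a sketch of the arithmetic behind the ~64k figure and hpa's suggestion. Linux minor numbers are 20 bits wide (MINORBITS), so one major provides 2^20 = 1048576 minors. Reserving 16 minors per disk gives 65536 disks, and 64 minors per disk would still give 16384. The helper names are hypothetical. The minor layout (disk index times minors per disk, plus partition number) is the usual scheme for partitioned block devices, assumed here rather than taken from the patch.

```c
#include <assert.h>

/* dev_t minor numbers are 20 bits wide in Linux (MINORBITS). */
#define SKETCH_MINORBITS 20u

/* How many disks fit under one major if each disk reserves
 * 'minors_per_disk' minors (the whole disk plus its partitions). */
static unsigned sketch_max_disks(unsigned minors_per_disk)
{
	assert(minors_per_disk > 0);
	return (1u << SKETCH_MINORBITS) / minors_per_disk;
}

/* Minor number of partition 'part' (0 = whole disk) on disk 'index'. */
static unsigned sketch_minor(unsigned index, unsigned minors_per_disk,
			     unsigned part)
{
	assert(part < minors_per_disk);
	return index * minors_per_disk + part;
}
```

With 16 minors per disk, partition 3 of the third disk (index 2) gets minor 35. With 64 minors per disk, the second disk starts at minor 64.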

-hpa



Re: [kvm-devel] [PATCH] virtio_blk: Dont waste major numbers

2008-03-20 Thread H. Peter Anvin
Anthony Liguori wrote:
> Christian Borntraeger wrote:
>> Rusty,
>>
>> currently virtio_blk uses one major number per device. While this works
>> quite well on most systems it is wasteful and will exhaust major numbers
>> on larger installations.
>>
>> This patch allocates a major number on init and will use 16 minor numbers
>> for each disk. That will allow ~64k virtio_blk disks.
>>
>
> There's are some other limitations to the number of virtio block
> devices.  For instances...
>
>  sprintf(vblk->disk->disk_name, "vd%c", virtblk_index++);
>
> This gets bogus after 64 disks.  We also have a hard limit for
> virtio-pci based on the number of PCI slots available.  One thing I was
> considering was whether we should try to support multiple disks per
> virtio device.
>

I would much rather prefer a /dev/vd/dXpY naming scheme, similar to 
cciss and other large disk installations.
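To illustrate how `vd%c` could be replaced, here is a sketch of two alternatives with hypothetical function names. The first is a bijective base-26 suffix (vda ... vdz, vdaa, ...), similar in spirit to how sd disks are named. The second is the dXpY layout hpa proposes. Neither is taken from a posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes "vd" plus a bijective base-26 suffix: 0 -> "vda", 25 -> "vdz",
 * 26 -> "vdaa", 702 -> "vdaaa". Returns 0, or -1 if buf is too small. */
static int sketch_vd_name(unsigned index, char *buf, size_t len)
{
	char rev[8];          /* 7 letters cover any 32-bit index */
	size_t n = 0, i;
	long long v = index;

	assert(buf != NULL);
	do {
		rev[n++] = (char)('a' + v % 26);
		v = v / 26 - 1;
	} while (v >= 0);

	if (len < n + 3)      /* "vd" + letters + NUL */
		return -1;
	buf[0] = 'v';
	buf[1] = 'd';
	for (i = 0; i < n; i++)
		buf[2 + i] = rev[n - 1 - i]; /* most significant letter first */
	buf[2 + n] = '\0';
	return 0;
}

/* hpa's proposal: "vd/d<disk>" for the whole disk and "vd/d<disk>p<part>"
 * for a partition (the /dev prefix is left to the caller). */
static int sketch_vd_dir_name(unsigned disk, unsigned part, char *buf,
			      size_t len)
{
	int r = part ? snprintf(buf, len, "vd/d%up%u", disk, part)
		     : snprintf(buf, len, "vd/d%u", disk);

	return (r < 0 || (size_t)r >= len) ? -1 : 0;
}
```

The letter scheme stays compatible with the existing vda ... vdz names, while the numeric directory layout never runs out of characters and keeps the partition number visually separate from the disk number.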

Unfortunately yet another side effect of people not habitually 
registering major numbers is that the namespace is not as well maintained.

-hpa



Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Christoph Hellwig
On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>   /* Running on a P/390 ? */
>   if (cpuinfo->cpu_id.machine == 0x7490)
>   machine_flags |= 4;
> +
> + /* Running under KVM ? */
> + if (cpuinfo->cpu_id.version == 0xfe)
> + machine_flags |= 64;

Shouldn't these have symbolic names?




[kvm-devel] performance question

2008-03-20 Thread david ahern
I am trying to understand spikes in system time that I am seeing in a VM. The
guest OS is RHEL4 with 2 vcpus and 2.5 GB RAM; the host is running a 2.6.24.2
kernel. The kvm version is kvm-63.

Using the stat scripts Christian Ehrhardt posted a few days ago (thanks,
Christian, very handy tool) I collected kvm_stat data as a function of time (I
added time to the output). Comparing plots of guest system time to plots of
kvm_stat, the spikes in system time most closely correlate with the following
kvm_stat variables:

mmu_cache_miss
mmu_flooded
mmu_pte_updated
mmu_pte_write
mmu_shadow_zapped
pf_fixed
pf_guest
remote_tlb_flush
tlb_flush

Can someone provide some guidance/hints on what would cause spikes in the above,
and whether there is anything I can do to improve it?

The load on the VM is fairly constant (network traffic of ~48 kB/sec received
and ~189 kB/sec transmitted), with some moderate disk I/O as well.

thanks,
david



Re: [kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable

2008-03-20 Thread Dave Hansen
On Thu, 2008-03-20 at 10:28 -0700, Jeremy Fitzhardinge wrote:
> Carsten Otte wrote:
>> +struct mm_struct *dup_mm(struct task_struct *tsk);
>
> No prototypes in .c files.  Put this in an appropriate header.

Well, and more fundamentally: do we really want dup_mm() able to be
called from other code?

Maybe we need a bit more detailed justification why fork() itself isn't
good enough.  It looks to me like they basically need an arch-specific
argument to fork, telling the new process's page tables to take the
fancy new bit.

I'm really curious how this new stuff is going to get used.  Are you
basically replacing fork() when creating kvm guests?

-- Dave




Re: [kvm-devel] [RFC/PATCH 01/15] preparation: provide hook to enable pgstes in user pagetable

2008-03-20 Thread Carsten Otte
Dave Hansen wrote:
> Well, and more fundamentally: do we really want dup_mm() able to be
> called from other code?
>
> Maybe we need a bit more detailed justification why fork() itself isn't
> good enough.  It looks to me like they basically need an arch-specific
> argument to fork, telling the new process's page tables to take the
> fancy new bit.
>
> I'm really curious how this new stuff is going to get used.  Are you
> basically replacing fork() when creating kvm guests?
No. The trick is that we need bigger page tables when running guests:
our page tables are usually 2k, but when running a guest they're 4k to
track both guest and host dirty and reference information.
This looks like this:
*-----------*
* 2k PTEs   *
*-----------*
* 2k PGSTEs *
*-----------*
We don't want to waste precious memory by making every page table that big.
We'd like to have one kernel image that runs regular server workload _and_
guests. Therefore, we need to reallocate the page table after fork() once
we know that the task is going to be a hypervisor. That's what this code
does: it reallocates a bigger page table to accommodate the extra
information. The task needs to be single-threaded when asking for
extended page tables.

Btw: at fork() time, we cannot tell whether or not the user is going to
be a hypervisor. Therefore we cannot do this in fork().
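Carsten's diagram can be made concrete with a little pointer arithmetic. On 64-bit s390, a page table covers a 1 MB segment with 256 eight-byte entries, which is the 2k half in the diagram. Doubling the allocation leaves room for one PGSTE per PTE. The sketch below assumes, as the diagram suggests, that the PGSTE for entry i sits exactly 256 entries after the PTE. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* 1 MB segment / 4 KB pages = 256 entries of 8 bytes = 2 KB. */
#define SKETCH_PTRS_PER_PTE 256

typedef uint64_t sketch_pte_t;
typedef uint64_t sketch_pgste_t;

/* Allocate a zeroed page table; with 'with_pgste' the allocation is
 * doubled so that a PGSTE array of the same length follows the PTEs. */
static sketch_pte_t *sketch_alloc_pt(int with_pgste)
{
	size_t entries = SKETCH_PTRS_PER_PTE * (with_pgste ? 2 : 1);

	return calloc(entries, sizeof(sketch_pte_t));
}

/* PGSTE that belongs to a PTE: same index, one PTE-table length further. */
static sketch_pgste_t *sketch_pgste_of(sketch_pte_t *ptep)
{
	assert(ptep != NULL);
	return (sketch_pgste_t *)(ptep + SKETCH_PTRS_PER_PTE);
}
```

Because the PGSTE is found by a fixed offset from its PTE, no extra pointer is needed per entry. The price is that the whole table must be reallocated at double size before the task can run guests, which is why it happens once, after fork().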




Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Carsten Otte
Christoph Hellwig wrote:
> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>  /* Running on a P/390 ? */
>>  if (cpuinfo->cpu_id.machine == 0x7490)
>>  machine_flags |= 4;
>> +
>> +/* Running under KVM ? */
>> +if (cpuinfo->cpu_id.version == 0xfe)
>> +machine_flags |= 64;
>
> Shouldn't these have symbolic names?
You mean symbolics for machine_flags? Or symbolics for cpu ids?


Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Christoph Hellwig
On Thu, Mar 20, 2008 at 09:37:19PM +0100, Carsten Otte wrote:
> Christoph Hellwig wrote:
>> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>>  	/* Running on a P/390 ? */
>>>  	if (cpuinfo->cpu_id.machine == 0x7490)
>>>  		machine_flags |= 4;
>>> +
>>> +	/* Running under KVM ? */
>>> +	if (cpuinfo->cpu_id.version == 0xfe)
>>> +		machine_flags |= 64;
>>
>> Shouldn't these have symbolic names?
> You mean symbolics for machine_flags? Or symbolics for cpu ids?

Either.


Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Carsten Otte
Christoph Hellwig wrote:
> On Thu, Mar 20, 2008 at 09:37:19PM +0100, Carsten Otte wrote:
>> Christoph Hellwig wrote:
>>> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>>>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>>>  	/* Running on a P/390 ? */
>>>>  	if (cpuinfo->cpu_id.machine == 0x7490)
>>>>  		machine_flags |= 4;
>>>> +
>>>> +	/* Running under KVM ? */
>>>> +	if (cpuinfo->cpu_id.version == 0xfe)
>>>> +		machine_flags |= 64;
>>> Shouldn't these have symbolic names?
>> You mean symbolics for machine_flags? Or symbolics for cpu ids?
>>
> Either.
Hmm. Symbolic names for the CPU IDs probably didn't make sense until
now, when KVM also uses them. Before, this was the only place that used
them.

The KVM check against 0xfe is a temporary one. We intend to rework this
code to use STORE SYSTEM INFORMATION (STSI), which would give us far
more information about the machine and its hypervisor topology. Until
my todo list gets to that point, I think we'll have to cope with a
temporary number. We'll aim to make that change before 2.6.26 is
released.

The machine flags do have symbolic names, defined in 
include/asm-s390/setup.h. And yea, they should be used here. Will 
change that.
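A minimal sketch of what symbolic names could look like for the check quoted above. The macro names (`MACHINE_FLAG_P390`, `MACHINE_FLAG_KVM`, `CPU_ID_MACHINE_P390`, `CPU_ID_VERSION_KVM`) and the simplified `struct model_cpuid` are hypothetical and are not taken from include/asm-s390/setup.h; only the numeric values 4, 64, 0x7490 and 0xfe come from the patch.

```c
#include <stdint.h>

/* Hypothetical names; the values are the ones used in the patch. */
#define MACHINE_FLAG_P390    (1UL << 2)   /* == 4  */
#define MACHINE_FLAG_KVM     (1UL << 6)   /* == 64 */

#define CPU_ID_MACHINE_P390  0x7490
#define CPU_ID_VERSION_KVM   0xfe        /* temporary, until STSI is used */

/* Simplified stand-in for the CPU ID fields the check reads. */
struct model_cpuid {
    uint8_t  version;
    uint16_t machine;
};

/* Same logic as the quoted hunk, spelled with names instead of numbers. */
static unsigned long detect_machine_flags(const struct model_cpuid *id)
{
    unsigned long flags = 0;

    /* Running on a P/390? */
    if (id->machine == CPU_ID_MACHINE_P390)
        flags |= MACHINE_FLAG_P390;

    /* Running under KVM? */
    if (id->version == CPU_ID_VERSION_KVM)
        flags |= MACHINE_FLAG_KVM;

    return flags;
}
```

Writing each flag as a shift makes it obvious which bit it occupies, so two flags accidentally sharing a bit stand out on review.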


[kvm-devel] [PATCH] QEMU hotplug: check device name in drive_add

2008-03-20 Thread Ryan Harper
Using drive_add with a bogus devfn value segfaults QEMU when attempting
to add SCSI devices.  The attached patch makes the hotplug code check
that the device is one drive_add() works with (looking before leaping),
and bails out if the bus,devfn you specify doesn't point to a suitable
device.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
[EMAIL PROTECTED]


diffstat output:
 device-hotplug.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletion(-)

Signed-off-by: Ryan Harper [EMAIL PROTECTED]
---
When using drive_add in the QEMU monitor, specifying a bogus devfn together
with a SCSI disk (if=scsi) makes QEMU segfault: find_pci_dev does not
return a valid device, and vl.c sets unit_id=0, which bypasses
lsi_scsi_attach's check for a controller.  Rather than muck with the
unit_id calculation (which does make sense when users don't specify a
unit_id), we rely on the fact that drive_add() only supports the SCSI
controller and virtio_blk, and ignore any devfn that doesn't point to
one of those two device types.

Signed-off-by: Ryan Harper [EMAIL PROTECTED]

diff --git a/qemu/hw/device-hotplug.c b/qemu/hw/device-hotplug.c
index 98a467c..a717d9b 100644
--- a/qemu/hw/device-hotplug.c
+++ b/qemu/hw/device-hotplug.c
@@ -55,7 +55,7 @@ void drive_hot_add(int pcibus, const char *devfn_string, const char *opts)
 {
     int drive_idx, type, bus;
     int devfn;
-    int success = 0;
+    int success = 0, valid_dev = 0;
     PCIDevice *dev;
 
     devfn = strtoul(devfn_string, NULL, 0);
@@ -67,6 +67,18 @@ void drive_hot_add(int pcibus, const char *devfn_string, const char *opts)
         return;
     }
 
+    if (!strcmp(dev->name, "LSI53C895A SCSI HBA")) {
+        valid_dev = 1;
+    } else if (!strcmp(dev->name, "virtio-blk")) {
+        valid_dev = 1;
+    }
+
+    if (!valid_dev) {
+        term_printf("Invalid PCI Device specified by bus:%d devfn:%d\n",
+                    pcibus, devfn);
+        return;
+    }
+
     drive_idx = add_init_drive(opts);
     if (drive_idx < 0)
         return;
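The same look-before-leaping check could also be written table-driven, which keeps the list of supported controllers in one place if more are added later. This is a hedged standalone sketch, not QEMU code: the helper `is_hotplug_drive_controller` and the array are hypothetical; the two device names are the ones the patch compares against.

```c
#include <stddef.h>
#include <string.h>

/* Device names drive_add() can attach a drive to (taken from the patch). */
static const char *const hotplug_drive_controllers[] = {
    "LSI53C895A SCSI HBA",
    "virtio-blk",
};

/* Hypothetical helper: return 1 if name is a supported controller. */
static int is_hotplug_drive_controller(const char *name)
{
    size_t i;

    if (!name)
        return 0;
    for (i = 0; i < sizeof(hotplug_drive_controllers) /
                    sizeof(hotplug_drive_controllers[0]); i++) {
        if (strcmp(name, hotplug_drive_controllers[i]) == 0)
            return 1;
    }
    return 0;
}
```

In drive_hot_add(), a single `if (!is_hotplug_drive_controller(dev->name))` test would then replace the valid_dev bookkeeping.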


Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Heiko Carstens
On Thu, Mar 20, 2008 at 09:59:32PM +0100, Carsten Otte wrote:
> Christoph Hellwig wrote:
>> On Thu, Mar 20, 2008 at 09:37:19PM +0100, Carsten Otte wrote:
>>> Christoph Hellwig wrote:
>>>> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>>>>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>>>>  	/* Running on a P/390 ? */
>>>>>  	if (cpuinfo->cpu_id.machine == 0x7490)
>>>>>  		machine_flags |= 4;
>>>>> +
>>>>> +	/* Running under KVM ? */
>>>>> +	if (cpuinfo->cpu_id.version == 0xfe)
>>>>> +		machine_flags |= 64;
>>>> Shouldn't these have symbolic names?
>>> You mean symbolics for machine_flags? Or symbolics for cpu ids?
>>>
>> Either.
> [...]
> The machine flags do have symbolic names, defined in
> include/asm-s390/setup.h. And yea, they should be used here. Will
> change that.

Since when have we had symbolic names for the bits?
It was always on my todo list to do a cleanup and replace the numbers
we use everywhere with names, especially since we get clashes from time
to time... but obviously that hasn't hurt enough yet.
But now that you've volunteered to take care of this... :)


Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Rusty Russell
On Friday 21 March 2008 01:04:17 Anthony Liguori wrote:
> Rusty Russell wrote:
>> From: Paul "TBBle" Hampson [EMAIL PROTECTED]
>>
>> This creates a file in $HOME/.lguest/ to directly back the RAM and DMA
>> memory mappings created by map_zeroed_pages.
>
> I created a test program recently that measured the latency of
> reads/writes to an mmap()'d file in /dev/shm and in a normal filesystem.
> Even after unlinking the underlying file, the write latency was much
> better with a mmap()'d file in /dev/shm.

How odd!  Do you have any idea why?

> /dev/shm is not really for general use.  I think we'll want to have our
> own tmpfs mount that we use to create VM images.

If we're going to mod the kernel, how about a "mmap this part of their
address space" call, with the kernel keeping the mappings in sync?  But
I think that if we want speed, we should probably be doing the copy
between address spaces in-kernel so we can do lightweight exits.

> I also prefer to use a unix socket for communication, unlink the file
> immediately after open, and then pass the fd via SCM_RIGHTS to the
> other process.

Yeah, I shied away from that because cred passing kills whole litters of
puppies.  It makes for better encapsulation though, so I'd do it that way
in a serious implementation.
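A minimal POSIX sketch of the approach Anthony describes: create the backing file, unlink it immediately, and hand the open descriptor to another process over a Unix socket with SCM_RIGHTS, where it can be mmap()ed MAP_SHARED. For brevity both ends live in one process over a socketpair(); the helpers `send_fd`, `recv_fd` and `demo_shared_mapping` and the /tmp path template are hypothetical (a dedicated tmpfs mount, as suggested above, would be the more likely home), and error handling is minimal.

```c
#define _GNU_SOURCE            /* mkstemp() and CMSG_* on glibc */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned as struct cmsghdr requires. */
union fd_cmsg {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one file descriptor over a Unix-domain socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                       /* at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns it, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Create an unlinked backing file, pass its fd across a socketpair and
 * check that a write through one mapping shows up in the other. */
static int demo_shared_mapping(size_t size)
{
    char path[] = "/tmp/guestmem-XXXXXX";   /* hypothetical location */
    int sv[2], fd, fd2, ret = -1;
    char *a, *b;

    if (size < 5 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        goto out_sock;
    unlink(path);                 /* only the open fds keep it alive now */
    if (ftruncate(fd, (off_t)size) < 0 || send_fd(sv[0], fd) < 0)
        goto out_fd;
    fd2 = recv_fd(sv[1]);         /* the "other process" end */
    if (fd2 < 0)
        goto out_fd;

    a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd2, 0);
    if (a != MAP_FAILED && b != MAP_FAILED) {
        strcpy(a, "ping");
        ret = strcmp(b, "ping") == 0 ? 0 : -1;
    }
    if (a != MAP_FAILED)
        munmap(a, size);
    if (b != MAP_FAILED)
        munmap(b, size);
    close(fd2);
out_fd:
    close(fd);
out_sock:
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Because the file is unlinked straight away, nothing is left behind in the filesystem if either process dies; the memory goes away when the last descriptor and mapping are gone.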

Cheers,
Rusty.


Re: [kvm-devel] [RFC PATCH 0/4] Inter-guest virtio I/O example with lguest

2008-03-20 Thread Rusty Russell
On Thursday 20 March 2008 17:54:45 Avi Kivity wrote:
> Rusty Russell wrote:
>> Hi all,
>>
>>    Just finished my prototype of inter-guest virtio, using networking as
>> an example.  Each guest mmaps the other's address space and uses a FIFO
>> for notifications.
>
> Isn't that a security hole (hole? chasm)?  If the two guests can access
> each other's memory, they might as well be just one guest, and
> communicate internally.

Sorry, sloppy language on my part.  Each launcher process maps the other
guest's memory as well as its own guest's, i.e. the copying occurs in the
host, not in the guests.

> My feeling is that the host needs to copy the data, using dma if
> available.  Another option is to have one guest map the other's memory
> for read and write, while the other guest is unprivileged.  This allows
> one privileged guest to provide services for other, unprivileged guests,
> like domain 0 or driver domains in Xen.

Having one side privileged is possible, even trivial, with the current
patch (it's actually doing a completely generic inter-virtio-ring
shuffle).  I chose the symmetrical approach for this demo for no
particularly good reason.
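To illustrate what "copying occurs in the host" means, here is a deliberately simplified model: the launcher holds mappings of both guests' memory and copies a buffer described by one guest into a buffer offered by the other. The names `guest_mem`, `buf_desc`, `guest_ptr` and `shuffle_one` are hypothetical, and the descriptors are far simpler than real virtio rings (no chaining, no used ring, no notifications). Every guest-supplied address is bounds-checked, so a guest cannot make the host read or write outside the mapped memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A guest's memory as mapped into the launcher (host) process. */
struct guest_mem {
    uint8_t *base;  /* where the launcher mapped it */
    size_t   size;  /* bytes of guest-physical memory */
};

/* Hypothetical, simplified buffer descriptor: guest-physical address + length. */
struct buf_desc {
    uint64_t addr;
    uint32_t len;
};

/* Translate a guest-physical range to a host pointer, or NULL if it
 * falls outside the guest's memory. */
static uint8_t *guest_ptr(const struct guest_mem *g, uint64_t addr, uint32_t len)
{
    if (addr > g->size || len > g->size - addr)
        return NULL;
    return g->base + addr;
}

/* Copy one outgoing buffer from guest 'src' into an incoming buffer
 * offered by guest 'dst'.  Returns the number of bytes copied, or -1 on
 * a bad descriptor or a receive buffer that is too small. */
static long shuffle_one(const struct guest_mem *src, const struct buf_desc *out,
                        const struct guest_mem *dst, const struct buf_desc *in)
{
    uint8_t *from = guest_ptr(src, out->addr, out->len);
    uint8_t *to = guest_ptr(dst, in->addr, in->len);

    if (!from || !to || out->len > in->len)
        return -1;
    memcpy(to, from, out->len);
    return (long)out->len;
}
```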

Cheers,
Rusty.


Re: [kvm-devel] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Anthony Liguori
Rusty Russell wrote:
> How odd!  Do you have any idea why?

Nope, but part of the reason I did this was that I recalled a similar
discussion about kqemu and why it used /dev/shm.  I thought it was only
an issue with older kernels, but apparently not.

>> /dev/shm is not really for general use.  I think we'll want to have our
>> own tmpfs mount that we use to create VM images.
>
> If we're going to mod the kernel, how about a "mmap this part of their
> address space" call, with the kernel keeping the mappings in sync?  But
> I think that if we want speed, we should probably be doing the copy
> between address spaces in-kernel so we can do lightweight exits.

I don't think lightweight exits help the situation very much.  The
difference between a lightweight and a heavyweight exit is only 3-4k
cycles or so.

Doing it in-kernel doesn't make the situation much easier: you have to
map pages in from a different task.  It's a lot easier if you have both
guests mapped in userspace.

>> I also prefer to use a unix socket for communication, unlink the file
>> immediately after open, and then pass the fd via SCM_RIGHTS to the
>> other process.
>
> Yeah, I shied away from that because cred passing kills whole litters of
> puppies.  It makes for better encapsulation though, so I'd do it that way
> in a serious implementation.

I'm working on an implementation for KVM at the moment.  Instead of just
supporting two guests, I'm looking to support N guests and provide a
simple switch.  I'll have patches soon.
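The message doesn't describe how the switch would work, so the following is only a guess at the general idea: a minimal MAC-learning switch in C that forwards a frame to the port where its destination address was last seen as a source, and floods broadcast, multicast and unknown destinations. The table size, `struct sw` and the `sw_*` functions are hypothetical and say nothing about the actual KVM patches.

```c
#include <stdint.h>
#include <string.h>

#define SW_MAX_ENTRIES 64      /* hypothetical table size */
#define SW_FLOOD       (-1)    /* send to every port except the ingress one */

struct sw_entry {
    uint8_t mac[6];
    int     port;
};

struct sw {
    struct sw_entry table[SW_MAX_ENTRIES];
    int             n;
};

/* Index of mac in the table, or -1 if unknown. */
static int sw_lookup(const struct sw *s, const uint8_t mac[6])
{
    for (int i = 0; i < s->n; i++)
        if (memcmp(s->table[i].mac, mac, 6) == 0)
            return i;
    return -1;
}

/* Remember that mac is reachable via port (entries are never aged out). */
static void sw_learn(struct sw *s, const uint8_t mac[6], int port)
{
    int i = sw_lookup(s, mac);

    if (i < 0) {
        if (s->n == SW_MAX_ENTRIES)
            return;            /* table full: keep flooding for this MAC */
        i = s->n++;
        memcpy(s->table[i].mac, mac, 6);
    }
    s->table[i].port = port;
}

/* Decide where an Ethernet frame arriving on in_port goes: a port number,
 * or SW_FLOOD.  (A real switch would also drop frames whose destination
 * was learned on the ingress port.) */
static int sw_forward(struct sw *s, const uint8_t *frame, int in_port)
{
    const uint8_t *dst = frame, *src = frame + 6;
    int i;

    if (!(src[0] & 1))         /* never learn group addresses */
        sw_learn(s, src, in_port);
    if (dst[0] & 1)            /* group bit set: broadcast/multicast */
        return SW_FLOOD;
    i = sw_lookup(s, dst);
    return i < 0 ? SW_FLOOD : s->table[i].port;
}
```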

Regards,

Anthony Liguori

> Cheers,
> Rusty.


Re: [kvm-devel] [RFC/PATCH 15/15] guest: virtio device support, and kvm hypercalls

2008-03-20 Thread Rusty Russell
On Friday 21 March 2008 03:25:28 Carsten Otte wrote:
> +static void kvm_set_status(struct virtio_device *vdev, u8 status)
> +{
> +	BUG_ON(!status);
> +	to_kvmdev(vdev)->desc->status = status;
> +}
> +
> +/*
> + * To reset the device, we (ab)use the NOTIFY hypercall, with the descriptor
> + * address of the device.  The Host will zero the status and all the
> + * features.
> + */
> +static void kvm_reset(struct virtio_device *vdev)
> +{
> +	unsigned long offset = (void *)to_kvmdev(vdev)->desc - kvm_devices;
> +
> +	kvm_hypercall1(1237, (max_pfn << PAGE_SHIFT) + offset);
> +}

I'd recommend a hypercall after set_status, as well as after reset.  The
reason lguest doesn't do this is that we don't do feature negotiation
(we assume the guest kernel matches the host kernel).  In general, the
host needs to know when VIRTIO_CONFIG_S_DRIVER_OK is set so it can see
which features the guest driver accepted.

Overloading the notify hypercall is kind of a hack too, but it works, so
there's no real need to change that.
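A small model of why the host wants to hear about status changes: only once the driver has set VIRTIO_CONFIG_S_DRIVER_OK is the guest's feature selection final. The status bit values below are those in linux/virtio_config.h; everything else (`struct model_desc`, `guest_accept`, `host_status_changed`, a single 32-bit feature word in each direction) is a hypothetical simplification, and in a real driver a hypercall would follow the status write instead of the host function being called directly.

```c
#include <stdint.h>

/* Status bits with the values used in linux/virtio_config.h. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FAILED      0x80

/* Hypothetical shared device descriptor. */
struct model_desc {
    uint8_t  status;
    uint32_t host_features;   /* offered by the host */
    uint32_t guest_features;  /* accepted by the guest driver */
};

/* Guest side: accept a subset of the offered features, then mark the
 * driver ready.  A "status changed" hypercall would follow this. */
static void guest_accept(struct model_desc *d, uint32_t wanted)
{
    d->status |= VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
    d->guest_features = d->host_features & wanted;
    d->status |= VIRTIO_CONFIG_S_DRIVER_OK;
}

/* Host side: what the host could do on that hypercall.  Returns 1 and
 * stores the negotiated features once DRIVER_OK is set, or 0 if the
 * driver isn't ready yet (or has failed). */
static int host_status_changed(const struct model_desc *d, uint32_t *negotiated)
{
    if (!(d->status & VIRTIO_CONFIG_S_DRIVER_OK) ||
        (d->status & VIRTIO_CONFIG_S_FAILED))
        return 0;
    /* Never trust bits the host did not offer. */
    *negotiated = d->guest_features & d->host_features;
    return 1;
}
```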

> + * The root device for the kvm virtio devices.
> + * This makes them appear as /sys/devices/kvm/0,1,2 not /sys/devices/0,1,2.
> + */
> +static struct device kvm_root = {
> +	.parent = NULL,
> +	.bus_id = "kvm_s390",
> +};

You mean /sys/devices/kvm_s390/0,1,2?

> +static int __init kvm_devices_init(void)
> +{
> +	if (!MACHINE_IS_KVM)
> +		return -ENODEV;
> +
> +	if (device_register(&kvm_root) != 0)
> +		panic("Could not register kvm root");
> +
> +	if (add_shared_memory((max_pfn) << PAGE_SHIFT, PAGE_SIZE)) {
> +		device_unregister(&kvm_root);
> +		return -ENOMEM;
> +	}

Hmm, you panic if device_register fails, but return -ENOMEM if
add_shared_memory fails?  My theory was that since this is boot time,
panic() is the right thing.

Cheers,
Rusty.


Re: [kvm-devel] [Lguest] [RFC PATCH 1/5] lguest: mmap backing file

2008-03-20 Thread Rusty Russell
On Thursday 20 March 2008 19:16:00 Tim Post wrote:
> On Thu, 2008-03-20 at 17:05 +1100, Rusty Russell wrote:
>> +	snprintf(memfile_path, PATH_MAX, "%s/.lguest", getenv("HOME") ?: "");
>
> Hi Rusty,
>
> Is that safe if being run via setuid/gid or shared root? It might be
> better to just look it up in /etc/passwd against the real UID,
> considering that anyone can change (or null) that env string.

Hi Tim,

Fair point: it is bogus in this use case.  Of course, setuid-ing lguest
is dumb anyway, since you could use --block= to read and write any file
in the filesystem.  The mid-term goal is to allow non-root users to run
lguest, which fixes this problem (we don't allow that at the moment, as
the guest can pin memory).
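Tim's suggestion could look something like this: derive the directory from the password database entry for the real UID via getpwuid(getuid()) instead of trusting $HOME. This is a sketch only; the helper name `lguest_dir` is hypothetical and it is not what the patch does.

```c
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write "<home of the real UID>/.lguest" into buf.
 * Returns 0 on success, -1 if there is no passwd entry for the real UID
 * or the path doesn't fit in len bytes. */
static int lguest_dir(char *buf, size_t len)
{
    /* getuid() is the real UID, so a setuid binary still resolves the
     * invoking user's home directory, and $HOME is never consulted. */
    struct passwd *pw = getpwuid(getuid());
    int n;

    if (!pw || !pw->pw_dir)
        return -1;
    n = snprintf(buf, len, "%s/.lguest", pw->pw_dir);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

Using getuid() rather than geteuid() ties the path to the invoking user even when the binary runs setuid.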

Cheers,
Rusty.


Re: [kvm-devel] KVM Test result, kernel f1080a0.., userspace 49cf2d2..

2008-03-20 Thread Yunfeng Zhao
Avi Kivity wrote:
> Yunfeng Zhao wrote:
>> Following issues fixed:
>> 1. qcow based smp linux guests likely hang
>> https://sourceforge.net/tracker/index.php?func=detail&aid=1901980&group_id=180599&atid=893831
>>
>> 2. smp windows installer crashes while rebooting
>> https://sourceforge.net/tracker/index.php?func=detail&aid=1877875&group_id=180599&atid=893831
>
> No idea how these were fixed.
The first one should be fixed by the in-kernel PIT, but I'm not sure
about the second one.

   
>> 3. Timer of guest is inaccurate
>> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1826080&group_id=180599
>
> This may be the in-kernel pit.
>
>> 4. Installer of 64bit vista guest will pause for ten minutes after reboot
>> https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1836905&group_id=180599
>
> The pit again?!
>
> Confused.

Yes, this one should be fixed by the in-kernel PIT too.

