Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-22 Thread Peter Zijlstra
On Sat, 2012-02-04 at 11:08 +0900, Takuya Yoshikawa wrote: The latter needs a fundamental change: I heard (from Avi) that we can change mmu_lock to mutex_lock if mmu_notifier becomes preemptible. So I was planning to restart this work when Peter's mm: Preemptibility

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-18 Thread Avi Kivity
On 02/16/2012 10:41 PM, Scott Wood wrote: Sharing the data structures is not needed. Simply synchronize them before lookup, like we do for ordinary registers. Ordinary registers are a few bytes. We're talking of dozens of kbytes here. A TLB way is a few dozen bytes, no? I think you

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-18 Thread Avi Kivity
On 02/17/2012 02:19 AM, Alexander Graf wrote: Or we try to be less clever unless we have a really compelling reason. qemu monitor and gdb support aren't compelling reasons to optimize. The goal here was simplicity with a grain of performance concerns. Shared memory is simple in one

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-18 Thread Avi Kivity
On 02/17/2012 02:09 AM, Michael Ellerman wrote: On Thu, 2012-02-16 at 21:28 +0200, Avi Kivity wrote: On 02/16/2012 03:04 AM, Michael Ellerman wrote: ioctl is good for hardware devices and stuff that you want to enumerate and/or control permissions on. For something like KVM that is

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-18 Thread Alexander Graf
On 18.02.2012, at 11:00, Avi Kivity a...@redhat.com wrote: On 02/17/2012 02:19 AM, Alexander Graf wrote: Or we try to be less clever unless we have a really compelling reason. qemu monitor and gdb support aren't compelling reasons to optimize. The goal here was simplicity with a grain

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-17 Thread Scott Wood
On 02/16/2012 06:23 PM, Alexander Graf wrote: On 16.02.2012, at 21:41, Scott Wood wrote: And yes, we do have fancier hardware coming fairly soon for which this breaks (TLB0 entries can be loaded without host involvement, as long as there's a translation from guest physical to physical in a

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Gleb Natapov
On Wed, Feb 15, 2012 at 03:59:33PM -0600, Anthony Liguori wrote: On 02/15/2012 07:39 AM, Avi Kivity wrote: On 02/07/2012 08:12 PM, Rusty Russell wrote: I would really love to have this, but the problem is that we'd need a general purpose bytecode VM with binding to some kernel APIs. The

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 12:21 AM, Arnd Bergmann wrote: ioctl is good for hardware devices and stuff that you want to enumerate and/or control permissions on. For something like KVM that is really a core kernel service, a syscall makes much more sense. I would certainly never mix the two concepts: If

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Anthony Liguori
On 02/16/2012 02:57 AM, Gleb Natapov wrote: On Wed, Feb 15, 2012 at 03:59:33PM -0600, Anthony Liguori wrote: On 02/15/2012 07:39 AM, Avi Kivity wrote: On 02/07/2012 08:12 PM, Rusty Russell wrote: I would really love to have this, but the problem is that we'd need a general purpose bytecode VM

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/15/2012 04:08 PM, Alexander Graf wrote: Well, the scatter/gather registers I proposed will give you just one register or all of them. One register is hardly any use. We either need all ways of a respective address to do a full fledged lookup or all of them. I should have said,

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 03:04 AM, Michael Ellerman wrote: ioctl is good for hardware devices and stuff that you want to enumerate and/or control permissions on. For something like KVM that is really a core kernel service, a syscall makes much more sense. Yeah maybe. That distinction is at least

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Alexander Graf
On 16.02.2012, at 20:24, Avi Kivity wrote: On 02/15/2012 04:08 PM, Alexander Graf wrote: Well, the scatter/gather registers I proposed will give you just one register or all of them. One register is hardly any use. We either need all ways of a respective address to do a full fledged

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 04:46 PM, Anthony Liguori wrote: What will it buy us? Surely not speed. Entering a guest is not much (if at all) faster than exiting to userspace and any non trivial operation will require exit to userspace anyway, You can emulate the PIT/RTC entirely within the guest using

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 09:34 PM, Alexander Graf wrote: On 16.02.2012, at 20:24, Avi Kivity wrote: On 02/15/2012 04:08 PM, Alexander Graf wrote: Well, the scatter/gather registers I proposed will give you just one register or all of them. One register is hardly any use. We either need all

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Scott Wood
On 02/16/2012 01:38 PM, Avi Kivity wrote: On 02/16/2012 09:34 PM, Alexander Graf wrote: On 16.02.2012, at 20:24, Avi Kivity wrote: On 02/15/2012 04:08 PM, Alexander Graf wrote: Well, the scatter/gather registers I proposed will give you just one register or all of them. One register is

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Michael Ellerman
On Thu, 2012-02-16 at 21:28 +0200, Avi Kivity wrote: On 02/16/2012 03:04 AM, Michael Ellerman wrote: ioctl is good for hardware devices and stuff that you want to enumerate and/or control permissions on. For something like KVM that is really a core kernel service, a syscall makes

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Alexander Graf
On 16.02.2012, at 20:38, Avi Kivity wrote: On 02/16/2012 09:34 PM, Alexander Graf wrote: On 16.02.2012, at 20:24, Avi Kivity wrote: On 02/15/2012 04:08 PM, Alexander Graf wrote: Well, the scatter/gather registers I proposed will give you just one register or all of them. One register

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Alexander Graf
On 16.02.2012, at 21:41, Scott Wood wrote: On 02/16/2012 01:38 PM, Avi Kivity wrote: On 02/16/2012 09:34 PM, Alexander Graf wrote: On 16.02.2012, at 20:24, Avi Kivity wrote: On 02/15/2012 04:08 PM, Alexander Graf wrote: Well, the scatter/gather registers I proposed will give you just

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 04:39 PM, Alexander Graf wrote: Syscalls are orthogonal to that - they're to avoid the fget_light() and to tighten the vcpu/thread and vm/process relationship. How about keeping the ioctl interface but moving vcpu_run to a syscall then? I dislike half-and-half interfaces

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Alexander Graf
On 15.02.2012, at 12:18, Avi Kivity wrote: On 02/07/2012 04:39 PM, Alexander Graf wrote: Syscalls are orthogonal to that - they're to avoid the fget_light() and to tighten the vcpu/thread and vm/process relationship. How about keeping the ioctl interface but moving vcpu_run to a syscall

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/15/2012 01:57 PM, Alexander Graf wrote: Is an extra syscall for copying TLB entries to user space prohibitively expensive? The copying can be very expensive, yes. We want to have the possibility of exposing a very large TLB to the guest, in the order of multiple kentries. Every

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/12/2012 09:10 AM, Takuya Yoshikawa wrote: Avi Kivity a...@redhat.com wrote: Slot searching is quite fast since there's a small number of slots, and we sort the larger ones to be in the front, so positive lookups are fast. We cache negative lookups in the shadow page tables
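
The slot search described in this preview can be sketched roughly as follows. This is a minimal illustration, not KVM's actual memslot code: `struct memslot`, `sort_slots()` and `find_slot()` are hypothetical names, and the only assumptions taken from the text are that there are few slots and that larger ones are sorted to the front so positive lookups tend to hit early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical memory slot: a contiguous range of guest frame numbers. */
struct memslot {
    uint64_t base_gfn;  /* first guest frame number covered */
    uint64_t npages;    /* number of pages in the slot */
};

/* Order slots by size, largest first. */
static int cmp_slot_size_desc(const void *a, const void *b)
{
    const struct memslot *x = a;
    const struct memslot *y = b;

    if (x->npages > y->npages)
        return -1;
    if (x->npages < y->npages)
        return 1;
    return 0;
}

/* Sort once, whenever the slot layout changes. */
static void sort_slots(struct memslot *slots, size_t n)
{
    qsort(slots, n, sizeof(*slots), cmp_slot_size_desc);
}

/*
 * Linear search. With the largest slots in front, a lookup for a mapped
 * gfn usually ends after one or two comparisons. Returns NULL when no
 * slot covers gfn (a negative lookup).
 */
static const struct memslot *find_slot(const struct memslot *slots,
                                       size_t n, uint64_t gfn)
{
    assert(slots != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (gfn >= slots[i].base_gfn &&
            gfn - slots[i].base_gfn < slots[i].npages)
            return &slots[i];
    }
    return NULL;
}
```

A negative lookup still walks the whole array in this sketch; the preview notes that such results are cached in the shadow page tables, which the sketch does not model.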

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 05:23 PM, Anthony Liguori wrote: On 02/07/2012 07:40 AM, Alexander Graf wrote: Why? For the HPET timer register for example, we could have a simple MMIO hook that says on_read: return read_current_time() - shared_page.offset; on_write: handle_in_user_space();

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Alexander Graf
On 15.02.2012, at 14:29, Avi Kivity wrote: On 02/15/2012 01:57 PM, Alexander Graf wrote: Is an extra syscall for copying TLB entries to user space prohibitively expensive? The copying can be very expensive, yes. We want to have the possibility of exposing a very large TLB to the guest,

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 08:12 PM, Rusty Russell wrote: I would really love to have this, but the problem is that we'd need a general purpose bytecode VM with binding to some kernel APIs. The bytecode VM, if made general enough to host more complicated devices, would likely be much larger than the

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 06:29 PM, Jan Kiszka wrote: Isn't there another level in between just scheduling and full syscall return if the user return notifier has some real work to do? Depends on whether you're scheduling a kthread or a userspace process, no? If Kthreads can't return, of

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/07/2012 06:19 PM, Anthony Liguori wrote: Ah. But then ioeventfd has that as well, unless the other end is in the kernel too. Yes, that was my point exactly :-) ioeventfd/mmio-over-socketpair to a different thread is not faster than a synchronous KVM_RUN + writing to an eventfd in

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Avi Kivity
On 02/15/2012 03:37 PM, Alexander Graf wrote: On 15.02.2012, at 14:29, Avi Kivity wrote: On 02/15/2012 01:57 PM, Alexander Graf wrote: Is an extra syscall for copying TLB entries to user space prohibitively expensive? The copying can be very expensive, yes. We want to have the

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Alexander Graf
On 15.02.2012, at 14:57, Avi Kivity wrote: On 02/15/2012 03:37 PM, Alexander Graf wrote: On 15.02.2012, at 14:29, Avi Kivity wrote: On 02/15/2012 01:57 PM, Alexander Graf wrote: Is an extra syscall for copying TLB entries to user space prohibitively expensive? The copying can be very

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Scott Wood
On 02/15/2012 05:57 AM, Alexander Graf wrote: On 15.02.2012, at 12:18, Avi Kivity wrote: Well the real reason is we have an extra bit reported by page faults that we can control. Can't you set up a hashed pte that is configured in a way that it will fault, no matter what type of access

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Anthony Liguori
On 02/15/2012 07:39 AM, Avi Kivity wrote: On 02/07/2012 08:12 PM, Rusty Russell wrote: I would really love to have this, but the problem is that we'd need a general purpose bytecode VM with binding to some kernel APIs. The bytecode VM, if made general enough to host more complicated devices,

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Arnd Bergmann
On Tuesday 07 February 2012, Alexander Graf wrote: On 07.02.2012, at 07:58, Michael Ellerman wrote: On Mon, 2012-02-06 at 13:46 -0600, Scott Wood wrote: You're exposing a large, complex kernel subsystem that does very low-level things with the hardware. It's a potential source of

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Arnd Bergmann
On Tuesday 07 February 2012, Alexander Graf wrote: Not sure we'll ever get there. For PPC, it will probably take another 1-2 years until we get the 32-bit targets stabilized. By then we will have new 64-bit support though. And then the next gen will come out giving us even more new

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Michael Ellerman
On Wed, 2012-02-15 at 22:21 +, Arnd Bergmann wrote: On Tuesday 07 February 2012, Alexander Graf wrote: On 07.02.2012, at 07:58, Michael Ellerman wrote: On Mon, 2012-02-06 at 13:46 -0600, Scott Wood wrote: You're exposing a large, complex kernel subsystem that does very low-level

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-15 Thread Rusty Russell
On Wed, 15 Feb 2012 15:39:41 +0200, Avi Kivity a...@redhat.com wrote: On 02/07/2012 08:12 PM, Rusty Russell wrote: I would really love to have this, but the problem is that we'd need a general purpose bytecode VM with binding to some kernel APIs. The bytecode VM, if made general enough

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-11 Thread Takuya Yoshikawa
Avi Kivity a...@redhat.com wrote: Slot searching is quite fast since there's a small number of slots, and we sort the larger ones to be in the front, so positive lookups are fast. We cache negative lookups in the shadow page tables (an spte can be either not mapped, mapped to

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-09 Thread Jamie Lokier
Anthony Liguori wrote: The new API will do away with the IOAPIC/PIC/PIT emulation and defer them to userspace. I'm a big fan of this. I agree with getting rid of unnecessary emulations. (Why were those things emulated in the first place?) But it would be good to retain some way to plugin

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-08 Thread Scott Wood
On 02/07/2012 06:28 AM, Anthony Liguori wrote: On 02/06/2012 01:46 PM, Scott Wood wrote: On 02/03/2012 04:52 PM, Anthony Liguori wrote: On 02/03/2012 12:07 PM, Eric Northup wrote: How would the ability to use sys_kvm_* be regulated? Why should it be regulated? It's not a finite or

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-08 Thread Alan Cox
If the fd overhead really is a problem, perhaps the fd could be retained for setup operations, and omitted only on calls that require a vcpu to have been already set up on the current thread? Quite frankly I'd like to have an fd because it means you've got a meaningful way of ensuring that id

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-08 Thread Alan Cox
register_pio_hook_ptr_r(PIO_IDE, SIZE_BYTE, &s->cmd[0]); for (i = 1; i < 7; i++) { register_pio_hook_ptr_r(PIO_IDE + i, SIZE_BYTE, &s->cmd[i]); register_pio_hook_ptr_w(PIO_IDE + i, SIZE_BYTE, &s->cmd[i]); } You can't easily serialize updates to that address with the kernel
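
To make the proposal under discussion concrete, here is a minimal sketch of what a pointer-backed PIO hook table might do. The `register_pio_hook_ptr_r`/`_w` calls in the quote are a proposal from the thread, not an existing API; every name below (`pio_hook_table`, `pio_register()`, `pio_read()`, `pio_write()`) is hypothetical, and the port number 0x1f0 in the usage note is the conventional primary IDE base, assumed here because the preview does not give the value of `PIO_IDE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PIO_HOOKS 16

/* Access rights a hook grants to the guest. */
enum { PIO_HOOK_R = 1, PIO_HOOK_W = 2 };

/* One hooked port, backed directly by a byte of device state. */
struct pio_hook {
    uint16_t port;
    uint8_t *ptr;
    unsigned access;  /* bitmask of PIO_HOOK_R / PIO_HOOK_W */
};

struct pio_hook_table {
    struct pio_hook hooks[MAX_PIO_HOOKS];
    size_t count;
};

static struct pio_hook *pio_lookup(struct pio_hook_table *t, uint16_t port)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->hooks[i].port == port)
            return &t->hooks[i];
    }
    return NULL;
}

/* Register or extend a hook. Returns 0 on success, -1 if the table is full. */
static int pio_register(struct pio_hook_table *t, uint16_t port,
                        uint8_t *ptr, unsigned access)
{
    struct pio_hook *h = pio_lookup(t, port);

    assert(ptr != NULL);
    if (h == NULL) {
        if (t->count == MAX_PIO_HOOKS)
            return -1;
        h = &t->hooks[t->count++];
        h->port = port;
        h->access = 0;
    }
    h->ptr = ptr;
    h->access |= access;
    return 0;
}

/* Returns 1 if served from the table, 0 if userspace must handle it. */
static int pio_read(struct pio_hook_table *t, uint16_t port, uint8_t *out)
{
    struct pio_hook *h = pio_lookup(t, port);

    if (h == NULL || !(h->access & PIO_HOOK_R))
        return 0;
    *out = *h->ptr;  /* no locking: see the note below */
    return 1;
}

/* Returns 1 if stored in place, 0 if userspace must handle it. */
static int pio_write(struct pio_hook_table *t, uint16_t port, uint8_t value)
{
    struct pio_hook *h = pio_lookup(t, port);

    if (h == NULL || !(h->access & PIO_HOOK_W))
        return 0;
    *h->ptr = value;  /* no locking: see the note below */
    return 1;
}
```

With this table, the quoted loop would amount to registering port 0x1f0 as read-only and ports 0x1f1 through 0x1f6 as read/write, each pointing at one byte of the command array. The sketch deliberately has no synchronization between the hook and the owner of the backing bytes, which is the difficulty the reply raises: updates to that address cannot easily be serialized with the kernel.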

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Alexander Graf
On 07.02.2012, at 07:58, Michael Ellerman wrote: On Mon, 2012-02-06 at 13:46 -0600, Scott Wood wrote: On 02/03/2012 04:52 PM, Anthony Liguori wrote: On 02/03/2012 12:07 PM, Eric Northup wrote: On Thu, Feb 2, 2012 at 8:09 AM, Avi Kivity a...@redhat.com wrote: [...] Moving to syscalls

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Avi Kivity
On 02/06/2012 07:41 PM, Rob Earhart wrote: I like the ioctl() interface. If the overhead matters in your hot path, I can't say that it's a pressing problem, but it's not negligible. I suspect you're doing it wrong; What am I doing wrong? You the vmm not you the KVM maintainer :-)

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Avi Kivity
On 02/06/2012 09:11 PM, Anthony Liguori wrote: I'm not so sure. ioeventfds and a future mmio-over-socketpair have to put the kthread to sleep while it waits for the other end to process it. This is effectively equivalent to a heavy weight exit. The difference in cost is dropping to

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Avi Kivity
On 02/07/2012 03:08 AM, Alexander Graf wrote: I don't like the idea too much. On s390 and ppc we can set other vcpu's interrupt status. How would that work in this model? It would be a vm-wide syscall. You can also do that on x86 (through KVM_IRQ_LINE). I really do like the ioctl model

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Anthony Liguori
On 02/06/2012 01:46 PM, Scott Wood wrote: On 02/03/2012 04:52 PM, Anthony Liguori wrote: On 02/03/2012 12:07 PM, Eric Northup wrote: On Thu, Feb 2, 2012 at 8:09 AM, Avi Kivity a...@redhat.com wrote: [...] Moving to syscalls avoids these problems, but introduces new ones: - adding new

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Avi Kivity
On 02/07/2012 02:28 PM, Anthony Liguori wrote: It's a potential source of exploits (from bugs in KVM or in hardware). I can see people wanting to be selective with access because of that. As is true of the rest of the kernel. If you want finer grain access control, that's exactly why we

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Anthony Liguori
On 02/07/2012 06:40 AM, Avi Kivity wrote: On 02/07/2012 02:28 PM, Anthony Liguori wrote: It's a potential source of exploits (from bugs in KVM or in hardware). I can see people wanting to be selective with access because of that. As is true of the rest of the kernel. If you want finer

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Alexander Graf
On 07.02.2012, at 13:24, Avi Kivity wrote: On 02/07/2012 03:08 AM, Alexander Graf wrote: I don't like the idea too much. On s390 and ppc we can set other vcpu's interrupt status. How would that work in this model? It would be a vm-wide syscall. You can also do that on x86 (through

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Avi Kivity
On 02/07/2012 02:51 PM, Alexander Graf wrote: On 07.02.2012, at 13:24, Avi Kivity wrote: On 02/07/2012 03:08 AM, Alexander Graf wrote: I don't like the idea too much. On s390 and ppc we can set other vcpu's interrupt status. How would that work in this model? It would be a vm-wide

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Avi Kivity
On 02/07/2012 02:51 PM, Anthony Liguori wrote: On 02/07/2012 06:40 AM, Avi Kivity wrote: On 02/07/2012 02:28 PM, Anthony Liguori wrote: It's a potential source of exploits (from bugs in KVM or in hardware). I can see people wanting to be selective with access because of that. As is true of

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Alexander Graf
On 07.02.2012, at 14:16, Avi Kivity wrote: On 02/07/2012 02:51 PM, Alexander Graf wrote: On 07.02.2012, at 13:24, Avi Kivity wrote: On 02/07/2012 03:08 AM, Alexander Graf wrote: I don't like the idea too much. On s390 and ppc we can set other vcpu's interrupt status. How would that

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Avi Kivity
On 02/07/2012 03:40 PM, Alexander Graf wrote: Not sure we'll ever get there. For PPC, it will probably take another 1-2 years until we get the 32-bit targets stabilized. By then we will have new 64-bit support though. And then the next gen will come out giving us even more new constraints.

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Alexander Graf
On 07.02.2012, at 15:21, Avi Kivity wrote: On 02/07/2012 03:40 PM, Alexander Graf wrote: Not sure we'll ever get there. For PPC, it will probably take another 1-2 years until we get the 32-bit targets stabilized. By then we will have new 64-bit support though. And then the next gen

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Anthony Liguori
On 02/07/2012 07:18 AM, Avi Kivity wrote: On 02/07/2012 02:51 PM, Anthony Liguori wrote: On 02/07/2012 06:40 AM, Avi Kivity wrote: On 02/07/2012 02:28 PM, Anthony Liguori wrote: It's a potential source of exploits (from bugs in KVM or in hardware). I can see people wanting to be selective

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Anthony Liguori
On 02/07/2012 06:03 AM, Avi Kivity wrote: On 02/06/2012 09:11 PM, Anthony Liguori wrote: I'm not so sure. ioeventfds and a future mmio-over-socketpair have to put the kthread to sleep while it waits for the other end to process it. This is effectively equivalent to a heavy weight exit. The

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Anthony Liguori
On 02/07/2012 07:40 AM, Alexander Graf wrote: Why? For the HPET timer register for example, we could have a simple MMIO hook that says on_read: return read_current_time() - shared_page.offset; on_write: handle_in_user_space(); For IDE, it would be as simple as

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Alexander Graf
On 07.02.2012, at 16:23, Anthony Liguori wrote: On 02/07/2012 07:40 AM, Alexander Graf wrote: Why? For the HPET timer register for example, we could have a simple MMIO hook that says on_read: return read_current_time() - shared_page.offset; on_write:

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Avi Kivity
On 02/07/2012 05:17 PM, Anthony Liguori wrote: On 02/07/2012 06:03 AM, Avi Kivity wrote: On 02/06/2012 09:11 PM, Anthony Liguori wrote: I'm not so sure. ioeventfds and a future mmio-over-socketpair have to put the kthread to sleep while it waits for the other end to process it. This is

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Jan Kiszka
On 2012-02-07 17:02, Avi Kivity wrote: On 02/07/2012 05:17 PM, Anthony Liguori wrote: On 02/07/2012 06:03 AM, Avi Kivity wrote: On 02/06/2012 09:11 PM, Anthony Liguori wrote: I'm not so sure. ioeventfds and a future mmio-over-socketpair have to put the kthread to sleep while it waits for

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Anthony Liguori
On 02/07/2012 10:02 AM, Avi Kivity wrote: On 02/07/2012 05:17 PM, Anthony Liguori wrote: On 02/07/2012 06:03 AM, Avi Kivity wrote: On 02/06/2012 09:11 PM, Anthony Liguori wrote: I'm not so sure. ioeventfds and a future mmio-over-socketpair have to put the kthread to sleep while it waits for

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Anthony Liguori
On 02/07/2012 10:18 AM, Jan Kiszka wrote: On 2012-02-07 17:02, Avi Kivity wrote: On 02/07/2012 05:17 PM, Anthony Liguori wrote: On 02/07/2012 06:03 AM, Avi Kivity wrote: On 02/06/2012 09:11 PM, Anthony Liguori wrote: I'm not so sure. ioeventfds and a future mmio-over-socketpair have to put

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Jan Kiszka
On 2012-02-07 17:21, Anthony Liguori wrote: On 02/07/2012 10:18 AM, Jan Kiszka wrote: On 2012-02-07 17:02, Avi Kivity wrote: On 02/07/2012 05:17 PM, Anthony Liguori wrote: On 02/07/2012 06:03 AM, Avi Kivity wrote: On 02/06/2012 09:11 PM, Anthony Liguori wrote: I'm not so sure. ioeventfds

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote: On 02/07/2012 07:18 AM, Avi Kivity wrote: On 02/07/2012 02:51 PM, Anthony Liguori wrote: On 02/07/2012 06:40 AM, Avi Kivity wrote: On 02/07/2012 02:28 PM, Anthony Liguori wrote: It's a potential source of exploits (from bugs in KVM or in

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Rusty Russell
On Mon, 06 Feb 2012 11:34:01 +0200, Avi Kivity a...@redhat.com wrote: On 02/05/2012 06:36 PM, Anthony Liguori wrote: If userspace had a way to upload bytecode to the kernel that was executed for a PIO operation, it could either pass the operation to userspace or handle it within the kernel

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Avi Kivity
On 02/05/2012 06:36 PM, Anthony Liguori wrote: On 02/05/2012 03:51 AM, Gleb Natapov wrote: On Sun, Feb 05, 2012 at 11:44:43AM +0200, Avi Kivity wrote: On 02/05/2012 11:37 AM, Gleb Natapov wrote: On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote: Device model Currently

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Anthony Liguori
On 02/06/2012 03:34 AM, Avi Kivity wrote: On 02/05/2012 06:36 PM, Anthony Liguori wrote: On 02/05/2012 03:51 AM, Gleb Natapov wrote: On Sun, Feb 05, 2012 at 11:44:43AM +0200, Avi Kivity wrote: On 02/05/2012 11:37 AM, Gleb Natapov wrote: On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Avi Kivity
On 02/06/2012 03:33 PM, Anthony Liguori wrote: Look at arch/x86/kvm/i8254.c:pit_ioport_read() for a counterexample. There are also interactions with other devices (for example the apic/ioapic interaction via the apic bus). Hrm, maybe I'm missing it, but the path that would be hot is: if

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Anthony Liguori
On 02/06/2012 07:54 AM, Avi Kivity wrote: On 02/06/2012 03:33 PM, Anthony Liguori wrote: Look at arch/x86/kvm/i8254.c:pit_ioport_read() for a counterexample. There are also interactions with other devices (for example the apic/ioapic interaction via the apic bus). Hrm, maybe I'm missing it,

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Avi Kivity
On 02/06/2012 04:00 PM, Anthony Liguori wrote: Do guests always read an unlatched counter? Doesn't seem reasonable since they can't get a stable count this way. Perhaps. You could have the latching done by writing to persisted scratch memory but then locking becomes an issue. Oh, you'd

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Rob Earhart
On Sun, Feb 5, 2012 at 5:14 AM, Avi Kivity a...@redhat.com wrote: On 02/03/2012 12:13 AM, Rob Earhart wrote: On Thu, Feb 2, 2012 at 8:09 AM, Avi Kivity a...@redhat.com wrote: The kvm api has been accumulating cruft for several years now. This is due to

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Anthony Liguori
On 02/06/2012 11:41 AM, Rob Earhart wrote: On Sun, Feb 5, 2012 at 5:14 AM, Avi Kivity a...@redhat.com wrote: On 02/03/2012 12:13 AM, Rob Earhart wrote: On Thu, Feb 2, 2012 at 8:09 AM, Avi Kivity a...@redhat.com wrote: The kvm api has been accumulating cruft for

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Scott Wood
On 02/03/2012 04:52 PM, Anthony Liguori wrote: On 02/03/2012 12:07 PM, Eric Northup wrote: On Thu, Feb 2, 2012 at 8:09 AM, Avi Kivity a...@redhat.com wrote: [...] Moving to syscalls avoids these problems, but introduces new ones: - adding new syscalls is generally frowned upon, and kvm will

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Alexander Graf
On 03.02.2012, at 03:09, Anthony Liguori wrote: On 02/02/2012 10:09 AM, Avi Kivity wrote: The kvm api has been accumulating cruft for several years now. This is due to feature creep, fixing mistakes, experience gained by the maintainers and developers on how to do things, ports to new

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-06 Thread Michael Ellerman
On Mon, 2012-02-06 at 13:46 -0600, Scott Wood wrote: On 02/03/2012 04:52 PM, Anthony Liguori wrote: On 02/03/2012 12:07 PM, Eric Northup wrote: On Thu, Feb 2, 2012 at 8:09 AM, Avi Kivity a...@redhat.com wrote: [...] Moving to syscalls avoids these problems, but introduces new ones:

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-05 Thread Avi Kivity
On 02/03/2012 04:09 AM, Anthony Liguori wrote: Note: this may cause a regression for older guests that don't support MSI or kvmclock. Device assignment will be done using VFIO, that is, without direct kvm involvement. Local APICs will be mandatory, but it will be possible to hide them from

Re: [RFC] Next gen kvm api

2012-02-05 Thread Gleb Natapov
On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote: Device model Currently kvm virtualizes or emulates a set of x86 cores, with or without local APICs, a 24-input IOAPIC, a PIC, a PIT, and a number of PCI devices assigned from the host. The API allows emulating the local

Re: [RFC] Next gen kvm api

2012-02-05 Thread Avi Kivity
On 02/05/2012 11:37 AM, Gleb Natapov wrote: On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote: Device model Currently kvm virtualizes or emulates a set of x86 cores, with or without local APICs, a 24-input IOAPIC, a PIC, a PIT, and a number of PCI devices assigned

Re: [RFC] Next gen kvm api

2012-02-05 Thread Gleb Natapov
On Sun, Feb 05, 2012 at 11:44:43AM +0200, Avi Kivity wrote: On 02/05/2012 11:37 AM, Gleb Natapov wrote: On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote: Device model Currently kvm virtualizes or emulates a set of x86 cores, with or without local APICs, a

Re: [RFC] Next gen kvm api

2012-02-05 Thread Avi Kivity
On 02/05/2012 11:51 AM, Gleb Natapov wrote: On Sun, Feb 05, 2012 at 11:44:43AM +0200, Avi Kivity wrote: On 02/05/2012 11:37 AM, Gleb Natapov wrote: On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote: Device model Currently kvm virtualizes or emulates a set of

Re: [RFC] Next gen kvm api

2012-02-05 Thread Gleb Natapov
On Sun, Feb 05, 2012 at 11:56:21AM +0200, Avi Kivity wrote: On 02/05/2012 11:51 AM, Gleb Natapov wrote: On Sun, Feb 05, 2012 at 11:44:43AM +0200, Avi Kivity wrote: On 02/05/2012 11:37 AM, Gleb Natapov wrote: On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote: Device model

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-05 Thread Avi Kivity
On 02/03/2012 12:13 AM, Rob Earhart wrote: On Thu, Feb 2, 2012 at 8:09 AM, Avi Kivity a...@redhat.com wrote: The kvm api has been accumulating cruft for several years now. This is due to feature creep, fixing mistakes, experience gained by the

Re: [RFC] Next gen kvm api

2012-02-05 Thread Avi Kivity
On 02/05/2012 12:58 PM, Gleb Natapov wrote: Reduced performance is what I mean. Obviously old guests will continue working. I'm not happy about it either. It is not only about old guests either. In RHEL we pretend to not support HPET because when some guests detect it they

Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-05 Thread Anthony Liguori
On 02/05/2012 03:51 AM, Gleb Natapov wrote: On Sun, Feb 05, 2012 at 11:44:43AM +0200, Avi Kivity wrote: On 02/05/2012 11:37 AM, Gleb Natapov wrote: On Thu, Feb 02, 2012 at 06:09:54PM +0200, Avi Kivity wrote: Device model Currently kvm virtualizes or emulates a set of x86 cores,
