
Re: linux-aio usable?

2010-03-09 Thread Avi Kivity

On 03/08/2010 10:19 PM, Michael Tokarev wrote:
> Michael Tokarev wrote:
>> []
>> Apparently that does not quite work.  I just re-compiled kvm with
>> --enable-linux-aio (actually I just installed libaio-dev on debian
>> and qemu-kvm's configure picked it up automatically), and tried
>> a guest.  But any I/O fails.
>
> It has nothing to do with kvm.  It is compat_ioctl32 in the kernel
> wrt aio calls.  Historically I've a 64bit kernel with 32bit userland,
> and tried 32bit kvm too, and that does not work.  But 64bit kvm works
> just fine with aio, and the performance numbers are indeed better.

Can you elaborate?  This sounds like a bug that wants to be fixed.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux-aio usable?

2010-03-09 Thread Avi Kivity

On 03/08/2010 11:27 PM, Nikola Ciprich wrote:
>> It's faster.
>
> Hi Avi,
> Could You give some rough estimate on how much faster?

The standard "it depends on the workload".

> I'm stuck with glibc-2.5 now, but I'm always eager to improve performance,
> so I wonder if it would make sense to either port eventfd + aio stuff, or
> switch to glibc-2.8 for me...

Switching to a modern setup should be much easier and safer.  Esp. a
modern kernel.


--
error compiling committee.c: too many arguments to function



Re: linux-aio usable?

2010-03-09 Thread Michael Tokarev
Avi Kivity wrote:
 On 03/08/2010 10:19 PM, Michael Tokarev wrote:
 Michael Tokarev wrote:
 []
   
 Apparently that does not quite work.  I just re-compiled kvm with
 --enable-linux-aio (actually I just installed libaio-dev on debian
 and qemu-kvm's configure picked it up automatically), and tried
 a guest.  But any I/O fails.
  
 It has nothing to do with kvm.  It is compat_ioctl32 in the kernel
 wrt aio calls.  Historically I've a 64bit kernel with 32bit userland,
 and tried 32bit kvm too, and that does not work.  But 64bit kvm works
 just fine with aio, and the performance numbers are indeed better.
 
 Can you elaborate?  This sounds like a bug that wants to be fixed.

http://thread.gmane.org/gmane.linux.kernel.aio.general/2891

It's missing compat handling for some of the aio opcodes, namely
PREADV and PWRITEV - the only ones used by kvm and the only
ones missing in kernel.

As far as I can see, current code converts the iocb array just
fine, but does not touch the iovec array used with p{read,write}v.
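
To illustrate why the iovec array needs its own conversion: a 32-bit process lays out each iovec as two 32-bit fields, while a 64-bit kernel expects native pointer and size widths. The sketch below is a userspace model only, not the kernel's compat code; `struct iovec32` and `widen_iovecs()` are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* An iovec as a 32-bit process lays it out (hypothetical name). */
struct iovec32 {
    uint32_t iov_base;  /* 32-bit user pointer */
    uint32_t iov_len;   /* 32-bit length */
};

/*
 * Widen an array of 32-bit iovecs into native ones.
 * Returns 0 on success, -1 if either array pointer is NULL.
 */
int widen_iovecs(const struct iovec32 *in, struct iovec *out, size_t n)
{
    if (!in || !out)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* Zero-extend the 32-bit pointer and length to native width. */
        out[i].iov_base = (void *)(uintptr_t)in[i].iov_base;
        out[i].iov_len = in[i].iov_len;
    }
    return 0;
}
```

If this step is skipped, a 64-bit kernel reading the 32-bit array as native iovecs would take two adjacent 32-bit entries as one pointer, which fits the "any I/O fails" symptom reported above.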

/mjt


[PATCH] take srcu lock before call to complete_pio()

2010-03-09 Thread Gleb Natapov
complete_pio() may use slot table which is protected by srcu.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 703f637..3753c11 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4519,7 +4519,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	kvm_set_cr8(vcpu, kvm_run->cr8);
 
 	if (vcpu->arch.pio.cur_count) {
+		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = complete_pio(vcpu);
+		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 		if (r)
 			goto out;
 	}
--
Gleb.


Re: linux-aio usable?

2010-03-09 Thread Avi Kivity

On 03/09/2010 11:19 AM, Michael Tokarev wrote:



Can you elaborate?  This sounds like a bug that wants to be fixed.
 

http://thread.gmane.org/gmane.linux.kernel.aio.general/2891

It's missing compat handling for some of the aio opcodes, namely
PREADV and PWRITEV - the only ones used by kvm and the only
ones missing in kernel.

As far as I can see, current code converts the iocb array just
fine, but does not touch the iovec array used with p{read,write}v.
   


Yikes, looks pretty bad.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: PPC: Do not create debugfs if fail to create vcpu

2010-03-09 Thread Alexander Graf

On 09.03.2010, at 07:13, Wei Yongjun wrote:

 If fail to create the vcpu, we should not create the debugfs
 for it.
 
 Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com

Good catch.

I guess a "goto out" kind of construct would be better, but for a single line of 
code this is enough. And whoever adds more lines can put the goto into place.

Acked-by: Alexander Graf ag...@suse.de


Alex


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Arnd Bergmann
On Monday 08 March 2010, Cam Macdonell wrote:
 enum ivshmem_registers {
 IntrMask = 0,
 IntrStatus = 2,
 Doorbell = 4,
 IVPosition = 6,
 IVLiveList = 8
 };
 
 The first two registers are the interrupt mask and status registers.
 Interrupts are triggered when a message is received on the guest's eventfd 
 from
 another VM.  Writing to the 'Doorbell' register is how synchronization 
 messages
 are sent to other VMs.
 
 The IVPosition register is read-only and reports the guest's ID number.  The
 IVLiveList register is also read-only and reports a bit vector of currently
 live VM IDs.
 
 The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
 upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
 value which will be written to the destination VM and what the guest status
 register will be set to when the interrupt is triggered in the destination
 guest.
 A value of 255 in the upper 8-bits will trigger a broadcast where the message
 will be sent to all other guests.
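
The Doorbell layout described above can be sketched as a few helpers. This is a minimal illustration based only on the description in this message (upper 8 bits destination VM ID, lower 8 bits value, 255 meaning broadcast); the helper and macro names are hypothetical and are not taken from the ivshmem patch.

```c
#include <stdint.h>

/* Destination ID that requests a broadcast (hypothetical macro name). */
#define DOORBELL_BROADCAST_ID 0xFFu

/* Pack a destination VM ID and an 8-bit value into the 16-bit register. */
static inline uint16_t doorbell_encode(uint8_t dest_vm, uint8_t value)
{
    return (uint16_t)(((uint16_t)dest_vm << 8) | value);
}

/* Upper 8 bits: destination VM ID. */
static inline uint8_t doorbell_dest(uint16_t reg)
{
    return (uint8_t)(reg >> 8);
}

/* Lower 8 bits: value delivered to the destination's status register. */
static inline uint8_t doorbell_value(uint16_t reg)
{
    return (uint8_t)(reg & 0xFFu);
}

/* True when the write targets all other guests. */
static inline int doorbell_is_broadcast(uint16_t reg)
{
    return doorbell_dest(reg) == DOORBELL_BROADCAST_ID;
}
```

Under this layout, sending value 0x2A to VM 3 would be a single 16-bit write of 0x032A.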

This means you have at least two intercepts for each message:

1. Sender writes to doorbell
2. Receiver gets interrupted

With optionally two more intercepts in order to avoid interrupting the
receiver every time:

3. Receiver masks interrupt in order to process data
4. Receiver unmasks interrupt when it's done and status is no longer pending

I believe you can do much better than this if you combine status and mask
bits, making this level triggered, and move to a bitmask of all guests:

In order to send an interrupt to another guest, the sender first checks
the bit for the receiver. If it's '1', no need for any intercept, the
receiver will come back anyway. If it's zero, write a '1' bit, which
gets OR'd into the bitmask by the host. The receiver gets interrupted
at a rising edge and just leaves the bit on, until it's done processing,
then turns the bit off by writing a '1' into its own location in the mask.
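
The scheme above can be modelled in a few lines. This is only a userspace sketch of the proposed protocol, assuming at most 32 guests; the struct, its counters and the function names are hypothetical and exist only to make the intercept count visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the shared bitmask; guest IDs must be below 32. */
struct irq_model {
    uint32_t pending;     /* one bit per guest, readable without a trap */
    unsigned intercepts;  /* register writes that trap to the host */
    unsigned interrupts;  /* interrupts injected into receivers */
};

/* Sender side. Returns true if an interrupt was actually injected. */
bool notify_guest(struct irq_model *m, unsigned dest)
{
    uint32_t bit = UINT32_C(1) << dest;

    if (m->pending & bit)
        return false;     /* already pending: receiver will come back */

    m->intercepts++;      /* the write traps; host ORs the bit in */
    m->pending |= bit;
    m->interrupts++;      /* rising edge interrupts the receiver */
    return true;
}

/* Receiver side: write-1-to-clear its own bit once processing is done. */
void ack_done(struct irq_model *m, unsigned self)
{
    m->intercepts++;
    m->pending &= ~(UINT32_C(1) << self);
}
```

In this model a burst of notifications to a busy receiver costs one intercept on the sender and one on the receiver, instead of two to four intercepts per message.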

Arnd


Re: [PATCH 02/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-09 Thread Avi Kivity

On 03/08/2010 08:03 PM, Alexander Graf wrote:

Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.
   


I asked for a KVM_CAP_ for this.  What was the conclusion of that thread?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 09/10] Initialize in-kernel irqchip

2010-03-09 Thread Avi Kivity

On 03/02/2010 08:25 PM, Glauber Costa wrote:

On Tue, Mar 02, 2010 at 01:31:35AM -0300, Marcelo Tosatti wrote:
   

On Fri, Feb 26, 2010 at 05:12:20PM -0300, Glauber Costa wrote:
 

Now that we have all devices set up, this patch initializes the irqchip.
This is dependent on the io-thread, since we need someone to pull ourselves
out of the halted state.
   

I don't understand why - it should work without iothread.
 

with irqchip in kernel, we have to handle halted state in the kernel too.
   


We still exit on signals, same as tcg w/o iothread.

qemu-kvm had irqchip before iothread, IIRC.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface

2010-03-09 Thread Alexander Graf

On 09.03.2010, at 14:00, Avi Kivity wrote:

 On 03/08/2010 08:03 PM, Alexander Graf wrote:
 MOL uses its own hypercall interface to call back into userspace when
 the guest wants to do something.
 
 So let's implement that as an exit reason, specify it with a CAP and
 only really use it when userspace wants us to.
 
 The only user of it so far is MOL.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 
 ---
 
 v1 -> v2:
 
   - Add documentation for OSI exit struct
 ---
  Documentation/kvm/api.txt |   13 +
  arch/powerpc/include/asm/kvm_book3s.h |5 +
  arch/powerpc/include/asm/kvm_host.h   |2 ++
  arch/powerpc/kvm/book3s.c |   24 ++--
  arch/powerpc/kvm/powerpc.c|   12 
  include/linux/kvm.h   |6 ++
  6 files changed, 56 insertions(+), 6 deletions(-)
 
 diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
 index 6a19ab6..b2129e8 100644
 --- a/Documentation/kvm/api.txt
 +++ b/Documentation/kvm/api.txt
 @@ -932,6 +932,19 @@ s390 specific.
 
  powerpc specific.
 
 +/* KVM_EXIT_OSI */
 +struct {
 +__u64 gprs[32];
 +} osi;
 +
 +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we 
 catch
 +hypercalls and exit with this exit struct that contains all the guest gprs.
 +
 +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a 
 hypercall.
 +Userspace can now handle the hypercall and when it's done modify the gprs as
 +necessary. Upon guest entry all guest GPRs will then be replaced by the 
 values
 +in this struct.
 +
   
 
 That's migration unsafe.  There may not be next guest entry on this host.

It's as unsafe as MMIO then.

 Is using KVM_[GS]ET_REGS problematic for some reason?

It's two additional ioctls for no good reason. We know the interface, so we can 
model towards it.


Alex


Re: [PATCH 13/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-09 Thread Avi Kivity

On 03/09/2010 03:01 PM, Alexander Graf wrote:

On 09.03.2010, at 13:56, Avi Kivity wrote:

   

On 03/08/2010 08:03 PM, Alexander Graf wrote:
 

Sometimes we don't want all capabilities to be available to all
our vcpus. One example for that is the OSI interface, implemented
in the next patch.

In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.


diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index d170cb4..6a19ab6 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -749,6 +749,21 @@ Writes debug registers into the vcpu.
  See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
  yet and must be cleared on entry.

+4.34 KVM_ENABLE_CAP
+
+Capability: basic

   

Capability: basic means that the feature was present in 2.6.22.  Otherwise you 
need to specify the KVM_CAP_ that presents this feature.

 

+Architectures: all


   

But it's implemented for ppc only (other arches will get ENOTTY).
 

That was the whole idea behind it. if it fails it fails. Nothing we can do 
about it. If it succeeds - great.
   


If KVM_CAP_ENABLE_CAP is present, it means the KVM_ENABLE_CAP ioctl will 
not return ENOTTY (it may return EINVAL if wrong values are present).


ENOTTY means not implemented.  'Architectures: all' means implemented.


+Not all extensions are enabled by default. Using this ioctl the application
+can enable an extension, making it available to the guest.
+
+On systems that do not support this ioctl, it always fails. On systems that
+do support it, it only works for extensions that are supported for enablement.
+As of writing this the only enablement enabled extension is KVM_CAP_PPC_OSI.

   

That needs to be documented.  It also needs to be discoverable separately - we 
can have a kernel with KVM_ENABLE_CAP but without KVM_CAP_PPC_OSI.

btw, KVM_CAP_PPC_OSI conflicts with the KVM_CAP_ namespace.  Please choose 
another namespace.
 

Well I figured it'd be slick to have capabilities get enabled or disabled. 
That's the whole idea behind making it generic. If I wanted a specific 
interface I'd go in and create an ioctl ENABLE_OSI_INTERFACE.
   


Ah, I see.  Well, that makes sense.  Please document it.


But this way the detection if a capability exists can be done using the 
existing CAP detection. It can then be enabled using ENABLE_CAP.
   


Okay, I agree.
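
The detect-then-enable flow agreed on here can be sketched from userspace as follows. This assumes a `linux/kvm.h` that already carries `KVM_ENABLE_CAP`, `struct kvm_enable_cap` and `KVM_CAP_PPC_OSI` as proposed in this series; the helper name `enable_cap_if_present()` is hypothetical, and the exact struct fields may differ from the version of the patch under review.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * kvm_fd:  descriptor for /dev/kvm
 * vcpu_fd: descriptor returned by KVM_CREATE_VCPU
 *
 * Returns 1 if the capability was enabled, 0 if the kernel does not
 * advertise it, and -1 if an ioctl failed (errno is left set).
 */
int enable_cap_if_present(int kvm_fd, int vcpu_fd, unsigned int cap)
{
    struct kvm_enable_cap ec;
    int r;

    /* Discovery uses the existing CAP detection; no guest needs to run. */
    r = ioctl(kvm_fd, KVM_CHECK_EXTENSION, (unsigned long)cap);
    if (r < 0)
        return -1;
    if (r == 0)
        return 0;

    /* Enabling is a separate per-vcpu step. */
    memset(&ec, 0, sizeof(ec));
    ec.cap = cap;
    if (ioctl(vcpu_fd, KVM_ENABLE_CAP, &ec) < 0)
        return -1;

    return 1;
}
```

A caller would typically invoke it as `enable_cap_if_present(kvm_fd, vcpu_fd, KVM_CAP_PPC_OSI)` right after creating the vcpu and treat a return of 0 as "feature unavailable on this kernel".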

--
error compiling committee.c: too many arguments to function



Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface

2010-03-09 Thread Alexander Graf

On 09.03.2010, at 14:11, Avi Kivity wrote:

 On 03/09/2010 03:04 PM, Alexander Graf wrote:
 
 +  /* KVM_EXIT_OSI */
 +  struct {
 +  __u64 gprs[32];
 +  } osi;
 +
 +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we 
 catch
 +hypercalls and exit with this exit struct that contains all the guest 
 gprs.
 +
 +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a 
 hypercall.
 +Userspace can now handle the hypercall and when it's done modify the gprs 
 as
 +necessary. Upon guest entry all guest GPRs will then be replaced by the 
 values
 +in this struct.
 +
 
   
 That's migration unsafe.  There may not be next guest entry on this host.
 
 It's as unsafe as MMIO then.
 
   
 
 From api.txt:
 
 NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations
 are complete (and guest state is consistent) only after userspace has
 re-entered the kernel with KVM_RUN.  The kernel side will first finish
 incomplete operations and then check for pending signals.  Userspace
 can re-enter the guest with an unmasked signal pending to complete
 pending operations.
 

Alright - so I add KVM_EXIT_OSI there and be good? :)

 
 Is using KVM_[GS]ET_REGS problematic for some reason?
 
 It's two additional ioctls for no good reason. We know the interface, so we 
 can model towards it.
   
 
 But we need to be migration safe.  If the interface is not heavily used, 
 let's not add complications.

MOL uses OSI calls instead of MMIO. So yes, it is heavily used.


Alex


[PATCH] KVM test: Make sure check_image script runs on VMs turned off

2010-03-09 Thread Lucas Meneghel Rodrigues
As it is hard to guarantee that a qcow2 image will be in a
consistent state with a VM turned on, take an extra safety
step and make sure the preprocessor shuts down the VMs
before the post process command check_image.py runs.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/tests_base.cfg.sample |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 340b0c0..beae786 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -1049,6 +1049,8 @@ variants:
 post_command =  python scripts/check_image.py;
 remove_image = no
 post_command_timeout = 600
+kill_vm = yes
+kill_vm_gracefully = yes
 - vmdk:
 only Fedora Ubuntu Windows
 only smp2
-- 
1.6.6.1



Re: [PATCH] KVM: PPC: Do not create debugfs if fail to create vcpu

2010-03-09 Thread Avi Kivity
On 03/09/2010 08:13 AM, Wei Yongjun wrote:
 If fail to create the vcpu, we should not create the debugfs
 for it.
   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] take srcu lock before call to complete_pio()

2010-03-09 Thread Avi Kivity

On 03/09/2010 12:01 PM, Gleb Natapov wrote:

complete_pio() may use slot table which is protected by srcu.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 13/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-09 Thread Alexander Graf

On 09.03.2010, at 13:56, Avi Kivity wrote:

 On 03/08/2010 08:03 PM, Alexander Graf wrote:
 Sometimes we don't want all capabilities to be available to all
 our vcpus. One example for that is the OSI interface, implemented
 in the next patch.
 
 In order to have a generic mechanism in how to enable capabilities
 individually, this patch introduces a new ioctl that can be used
 for this purpose. That way features we don't want in all guests or
 userspace configurations can just not be enabled and we're good.
 
 
 diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
 index d170cb4..6a19ab6 100644
 --- a/Documentation/kvm/api.txt
 +++ b/Documentation/kvm/api.txt
 @@ -749,6 +749,21 @@ Writes debug registers into the vcpu.
  See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
  yet and must be cleared on entry.
 
 +4.34 KVM_ENABLE_CAP
 +
 +Capability: basic
   
 
 Capability: basic means that the feature was present in 2.6.22.  Otherwise 
 you need to specify the KVM_CAP_ that presents this feature.
 
 +Architectures: all
 
   
 
 But it's implemented for ppc only (other arches will get ENOTTY).

That was the whole idea behind it. if it fails it fails. Nothing we can do 
about it. If it succeeds - great.

 
 +Not all extensions are enabled by default. Using this ioctl the application
 +can enable an extension, making it available to the guest.
 +
 +On systems that do not support this ioctl, it always fails. On systems that
 +do support it, it only works for extensions that are supported for 
 enablement.
 +As of writing this the only enablement enabled extension is KVM_CAP_PPC_OSI.
   
 
 That needs to be documented.  It also needs to be discoverable separately - 
 we can have a kernel with KVM_ENABLE_CAP but without KVM_CAP_PPC_OSI.
 
 btw, KVM_CAP_PPC_OSI conflicts with the KVM_CAP_ namespace.  Please choose 
 another namespace.

Well I figured it'd be slick to have capabilities get enabled or disabled. 
That's the whole idea behind making it generic. If I wanted a specific 
interface I'd go in and create an ioctl ENABLE_OSI_INTERFACE.

But this way the detection if a capability exists can be done using the 
existing CAP detection. It can then be enabled using ENABLE_CAP.

 Need to document the structure fields.
 
 
  /*
 @@ -696,6 +705,8 @@ struct kvm_clock_data {
  /* Available with KVM_CAP_DEBUGREGS */
  #define KVM_GET_DEBUGREGS _IOR(KVMIO,  0xa1, struct kvm_debugregs)
  #define KVM_SET_DEBUGREGS _IOW(KVMIO,  0xa2, struct kvm_debugregs)
 +/* No need for CAP, because then it just always fails */
 +#define KVM_ENABLE_CAP    _IOW(KVMIO,  0xa3, struct kvm_enable_cap)
   
 The CAPs are needed so you can discover what you have without running guests.

The whole point of this extension was to make CAPs not always enabled, but make 
them possibly enable on demand.


Alex


Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface

2010-03-09 Thread Avi Kivity

On 03/09/2010 03:04 PM, Alexander Graf wrote:



+   /* KVM_EXIT_OSI */
+   struct {
+   __u64 gprs[32];
+   } osi;
+
+MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
+hypercalls and exit with this exit struct that contains all the guest gprs.
+
+If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
+Userspace can now handle the hypercall and when it's done modify the gprs as
+necessary. Upon guest entry all guest GPRs will then be replaced by the values
+in this struct.
+

   

That's migration unsafe.  There may not be next guest entry on this host.
 

It's as unsafe as MMIO then.

   


From api.txt:


NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations
are complete (and guest state is consistent) only after userspace has
re-entered the kernel with KVM_RUN.  The kernel side will first finish
incomplete operations and then check for pending signals.  Userspace
can re-enter the guest with an unmasked signal pending to complete
pending operations.




Is using KVM_[GS]ET_REGS problematic for some reason?
 

It's two additional ioctls for no good reason. We know the interface, so we can 
model towards it.
   


But we need to be migration safe.  If the interface is not heavily used, 
let's not add complications.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-09 Thread Alexander Graf

On 09.03.2010, at 13:50, Avi Kivity wrote:

 On 03/08/2010 08:03 PM, Alexander Graf wrote:
 Userspace can tell us that it wants to trigger an interrupt. But
 so far it can't tell us that it wants to stop triggering one.
 
 So let's interpret the parameter to the ioctl that we have anyways
 to tell us if we want to raise or lower the interrupt line.
   
 
 I asked for a KVM_CAP_ for this.  What was the conclusion of that thread?

Uh - did we come to one?

The last thing you said about it was:

 Having individual capabilities makes backporting a lot easier (otherwise you 
 have to backport the whole thing).  If the changes are logically separate, I 
 prefer 500 separate capabilities.
 
 However, for a platform bringup, it's okay to have just one capability, 
 assuming none of the changes are applicable to other platforms.

So I assumed it'd be ok to not have one. If you like I can send an additional 
patch adding the CAP.


Alex


Re: [PATCH 02/15] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-09 Thread Avi Kivity

On 03/09/2010 02:54 PM, Alexander Graf wrote:

On 09.03.2010, at 13:50, Avi Kivity wrote:

   

On 03/08/2010 08:03 PM, Alexander Graf wrote:
 

Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.

   

I asked for a KVM_CAP_ for this.  What was the conclusion of that thread?
 

Uh - did we come to one?

The last thing you said about it was:

   

Having individual capabilities makes backporting a lot easier (otherwise you 
have to backport the whole thing).  If the changes are logically separate, I 
prefer 500 separate capabilities.

However, for a platform bringup, it's okay to have just one capability, 
assuming none of the changes are applicable to other platforms.
 

So I assumed it'd be ok to not have one. If you like I can send an additional 
patch adding the CAP.

   


Well, what's the capability for this patchset?

Things like "if you have KVM_CAP_OSI you can assume you have 
KVM_INTERRUPT_LOWER" don't work for me.  A platform cap would be called 
KVM_CAP_MOL and explicitly document everything in there.


And it commits you to not deprecating things individually.  Really, 
individual caps are better.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 0/7] kvm-tpr-opt cleanups

2010-03-09 Thread Avi Kivity

On 03/09/2010 02:47 AM, Marcelo Tosatti wrote:

Prepare kvm-tpr-opt.c for upstream merge.

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface

2010-03-09 Thread Avi Kivity

On 03/09/2010 03:12 PM, Alexander Graf wrote:

On 09.03.2010, at 14:11, Avi Kivity wrote:

   

On 03/09/2010 03:04 PM, Alexander Graf wrote:
 
   

+   /* KVM_EXIT_OSI */
+   struct {
+   __u64 gprs[32];
+   } osi;
+
+MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
+hypercalls and exit with this exit struct that contains all the guest gprs.
+
+If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
+Userspace can now handle the hypercall and when it's done modify the gprs as
+necessary. Upon guest entry all guest GPRs will then be replaced by the values
+in this struct.
+


   

That's migration unsafe.  There may not be next guest entry on this host.

 

It's as unsafe as MMIO then.


   

 From api.txt:

 

NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations
are complete (and guest state is consistent) only after userspace has
re-entered the kernel with KVM_RUN.  The kernel side will first finish
incomplete operations and then check for pending signals.  Userspace
can re-enter the guest with an unmasked signal pending to complete
pending operations.
   
 

Alright - so I add KVM_EXIT_OSI there and be good? :)
   


Sure, just verify that the note holds for that case too.


But we need to be migration safe.  If the interface is not heavily used, let's 
not add complications.
 

MOL uses OSI calls instead of MMIO. So yes, it is heavily used.

   


Ok.
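
For readers following the OSI discussion, here is a rough model of how userspace might service such an exit. It only mirrors the `gprs[32]` exit struct from the quoted patch; the exit-reason constants, the function name and the "add two registers" hypercall are hypothetical stand-ins, since the real MOL calling convention is not described in this thread.

```c
#include <stdint.h>

/* Mirrors the exit struct from the quoted patch. */
struct osi_exit {
    uint64_t gprs[32];
};

/* Hypothetical stand-ins for exit_reason values. */
enum { EXIT_OTHER = 0, EXIT_OSI = 1 };

/*
 * Handle one exit. Returns 1 if it was an OSI hypercall and the vcpu
 * should be re-entered, 0 if the exit belongs to some other handler.
 *
 * The made-up hypercall here adds r4 and r5 and returns the sum in r3.
 * The modified gprs only reach the guest on the next entry, which is
 * why the api.txt note about re-entering with KVM_RUN matters for
 * migration.
 */
int handle_exit(int exit_reason, struct osi_exit *osi)
{
    if (exit_reason != EXIT_OSI)
        return 0;

    osi->gprs[3] = osi->gprs[4] + osi->gprs[5];
    return 1;
}
```

The point of the sketch is the ordering: userspace edits the register copy, and guest state is consistent only after the kernel has been re-entered and copied the values back.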

--
error compiling committee.c: too many arguments to function



Re: [PATCH 05/10] Don't call apic functions directly from kvm code

2010-03-09 Thread Avi Kivity

On 02/26/2010 10:12 PM, Glauber Costa wrote:

It is actually not necessary to call a tpr function to save and load cr8,
as cr8 is part of the processor state, and thus, it is much easier
to just add it to CPUState.

As for apic base, wrap kvm usages, so we can call either the qemu device,
or the in kernel version.


  }

+static void kvm_set_apic_base(CPUState *env, uint64_t val)
+{
+    if (!kvm_irqchip_in_kernel())
+        cpu_set_apic_base(env, val);
   


What if it is in kernel?  Just ignored?  Doesn't seem right.

--
error compiling committee.c: too many arguments to function



[PATCH 17/24] KVM: x86 emulator: Use load_segment_descriptor() instead of kvm_load_segment_descriptor()

2010-03-09 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 094d17c..81ecf47 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1506,7 +1506,7 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, seg);
+	rc = load_segment_descriptor(ctxt, ops, (u16)selector, seg);
 	return rc;
 }
 
@@ -1681,7 +1681,7 @@ static int emulate_ret_far(struct x86_emulate_ctxt *ctxt,
 	rc = emulate_pop(ctxt, ops, &cs, c->op_bytes);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
-	rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)cs, VCPU_SREG_CS);
+	rc = load_segment_descriptor(ctxt, ops, (u16)cs, VCPU_SREG_CS);
 	return rc;
 }
 
@@ -2714,7 +2714,7 @@ special_insn:
 		if (c->modrm_reg == VCPU_SREG_SS)
 			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS);
 
-		rc = kvm_load_segment_descriptor(ctxt->vcpu, sel, c->modrm_reg);
+		rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
 
 		c->dst.type = OP_NONE;  /* Disable writeback. */
 		break;
@@ -2889,8 +2889,8 @@ special_insn:
 		goto jmp;
 	case 0xea: /* jmp far */
 	jump_far:
-		if (kvm_load_segment_descriptor(ctxt->vcpu, c->src2.val,
-						VCPU_SREG_CS))
+		if (load_segment_descriptor(ctxt, ops, c->src2.val,
+					    VCPU_SREG_CS))
 			goto done;
 
 		c->eip = c->src.val;
-- 
1.6.5



[PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.

2010-03-09 Thread Gleb Natapov
Provide get_cached_descriptor(), set_cached_descriptor(),
get_segment_selector(), set_segment_selector(), get_gdt(),
write_std() callbacks.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |   16 +
 arch/x86/kvm/x86.c |  130 +++
 2 files changed, 131 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 032d02f..e881618 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -63,6 +63,15 @@ struct x86_emulate_ops {
unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
/*
+* write_std: Write bytes of standard (non-emulated/special) memory.
+*Used for descriptor writing.
+*  @addr:  [IN ] Linear address to which to write.
+*  @val:   [OUT] Value write to memory, zero-extended to 'u_long'.
+*  @bytes: [IN ] Number of bytes to write to memory.
+*/
+   int (*write_std)(unsigned long addr, void *val,
+unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
+   /*
 * fetch: Read bytes of standard (non-emulated/special) memory.
 *Used for instruction fetch.
 *  @addr:  [IN ] Linear address from which to read.
@@ -108,6 +117,13 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
+   bool (*get_cached_descriptor)(struct desc_struct *desc,
+ int seg, struct kvm_vcpu *vcpu);
+   void (*set_cached_descriptor)(struct desc_struct *desc,
+ int seg, struct kvm_vcpu *vcpu);
+   u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu);
+   void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu);
+   void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 41cf54c..f89502d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3077,6 +3077,18 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
 }
 
+static void kvm_set_segment(struct kvm_vcpu *vcpu,
+   struct kvm_segment *var, int seg)
+{
+   kvm_x86_ops->set_segment(vcpu, var, seg);
+}
+
+void kvm_get_segment(struct kvm_vcpu *vcpu,
+struct kvm_segment *var, int seg)
+{
+   kvm_x86_ops->get_segment(vcpu, var, seg);
+}
+
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
@@ -3157,14 +3169,18 @@ static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
 }
 
-static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
-   struct kvm_vcpu *vcpu, u32 *error)
+static int kvm_write_guest_virt_helper(gva_t addr, void *val,
+  unsigned int bytes,
+  struct kvm_vcpu *vcpu, u32 access,
+  u32 *error)
 {
void *data = val;
int r = X86EMUL_CONTINUE;
 
+   access |= PFERR_WRITE_MASK;
+
while (bytes) {
-   gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error);
+   gpa_t gpa =  vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
unsigned offset = addr & (PAGE_SIZE-1);
unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
int ret;
@@ -3187,6 +3203,19 @@ out:
return r;
 }
 
+static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
+   struct kvm_vcpu *vcpu, u32 *error)
+{
+   u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+   return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, access, error);
+}
+
+static int kvm_write_guest_virt_system(gva_t addr, void *val,
+  unsigned int bytes,
+  struct kvm_vcpu *vcpu, u32 *error)
+{
+   return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
+}
 
 static int emulator_read_emulated(unsigned long addr,
  void *val,
@@ -3453,12 +3482,95 @@ static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
}
 }
 
+static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu)
+{
+   kvm_x86_ops->get_gdt(vcpu, dt);
+}
+
+static bool 

[PATCH 14/24] KVM: x86 emulator: cleanup grp3 return value

2010-03-09 Thread Gleb Natapov
When x86_emulate_insn() does not know how to emulate an instruction, it
exits via the cannot_emulate label in all cases except when emulating
grp3. Fix that.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |   12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 018abb3..6e2b34b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1394,7 +1394,6 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
   struct x86_emulate_ops *ops)
 {
struct decode_cache *c = &ctxt->decode;
-   int rc = X86EMUL_CONTINUE;
 
switch (c->modrm_reg) {
case 0 ... 1:   /* test */
@@ -1407,11 +1406,9 @@ static inline int emulate_grp3(struct x86_emulate_ctxt *ctxt,
emulate_1op("neg", c->dst, ctxt->eflags);
break;
default:
-   DPRINTF("Cannot emulate %02x\n", c->b);
-   rc = X86EMUL_UNHANDLEABLE;
-   break;
+   return 0;
}
-   return rc;
+   return 1;
 }
 
 static inline int emulate_grp45(struct x86_emulate_ctxt *ctxt,
@@ -2370,9 +2367,8 @@ special_insn:
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
case 0xf6 ... 0xf7: /* Grp3 */
-   rc = emulate_grp3(ctxt, ops);
-   if (rc != X86EMUL_CONTINUE)
-   goto done;
+   if (!emulate_grp3(ctxt, ops))
+   goto cannot_emulate;
break;
case 0xf8: /* clc */
ctxt->eflags &= ~EFLG_CF;
-- 
1.6.5



[PATCH 10/24] KVM: x86 emulator: fix mov dr to inject #UD when needed.

2010-03-09 Thread Gleb Natapov
If CR4.DE=1, access to registers DR4/DR5 causes #UD.
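
The rule can be sketched in isolation as below. This is only an
illustration, not code from the series: the helper name is hypothetical,
and CR4_DE is spelled out here as bit 3 of CR4 (the architectural
position of the Debugging Extensions flag).

```c
#include <stdbool.h>

/* CR4.DE (Debugging Extensions) is bit 3 of CR4. */
#define CR4_DE (1UL << 3)

/*
 * Hypothetical helper: returns true when a mov to/from debug register
 * 'dr' has to raise #UD, i.e. when CR4.DE is set and the register is
 * DR4 or DR5.
 */
static bool dr_access_raises_ud(unsigned long cr4, int dr)
{
	return (cr4 & CR4_DE) && (dr == 4 || dr == 5);
}
```

With CR4.DE clear the same encodings are not rejected by this check,
which is why the test depends on the current CR4 value and not only on
the register number.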

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5ba082a..dcb9720 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2527,9 +2527,12 @@ twobyte_insn:
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
-   if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
-   goto cannot_emulate;
-   rc = X86EMUL_CONTINUE;
+   if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+   (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
+   emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x22: /* mov reg, cr */
@@ -2537,9 +2540,12 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
-   if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
-   goto cannot_emulate;
-   rc = X86EMUL_CONTINUE;
+   if ((ops->get_cr(4, ctxt->vcpu) & X86_CR4_DE) &&
+   (c->modrm_reg == 4 || c->modrm_reg == 5)) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
+   emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x30:
-- 
1.6.5



[PATCH 09/24] KVM: x86 emulator: inject #UD on access to non-existing CR

2010-03-09 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 54e62dc..5ba082a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2516,6 +2516,13 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x20: /* mov cr, reg */
+   switch (c->modrm_reg) {
+   case 1:
+   case 5 ... 7:
+   case 9 ... 15:
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
-- 
1.6.5



[PATCH 06/24] KVM: x86 emulator: fix mov r/m, sreg emulation.

2010-03-09 Thread Gleb Natapov
mov r/m, sreg generates #UD if sreg is incorrect.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2cc9ef4..2df510b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2122,12 +2122,11 @@ special_insn:
case 0x8c: { /* mov r/m, sreg */
struct kvm_segment segreg;
 
-   if (c->modrm_reg <= 5)
+   if (c->modrm_reg <= VCPU_SREG_GS)
kvm_get_segment(ctxt->vcpu, &segreg, c->modrm_reg);
else {
-   printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 0x%02x\n",
-  c->modrm);
-   goto cannot_emulate;
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
}
c->dst.val = segreg.selector;
break;
-- 
1.6.5



[PATCH 08/24] KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.

2010-03-09 Thread Gleb Natapov
Recent spec says that for 0f (20|21|22|23) the 2 bits in the mod field
are ignored. Interestingly enough, an older spec says that 11 is the only
valid encoding.
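
For reference, a minimal sketch of how a ModRM byte splits into its
three fields. The struct and function names are hypothetical and REX
extension bits are not modelled. For the opcodes above only reg (the
control or debug register) and rm (the general purpose register) carry
meaning, so two bytes that differ only in mod select the same
registers.

```c
#include <stdint.h>

/* Hypothetical container for the three ModRM fields. */
struct modrm_fields {
	uint8_t mod;	/* bits 7:6 */
	uint8_t reg;	/* bits 5:3 */
	uint8_t rm;	/* bits 2:0 */
};

/* Split a ModRM byte into mod, reg and rm. */
static struct modrm_fields decode_modrm(uint8_t modrm)
{
	struct modrm_fields f = {
		.mod = (modrm >> 6) & 0x3,
		.reg = (modrm >> 3) & 0x7,
		.rm  = modrm & 0x7,
	};
	return f;
}
```

For example, 0xd8 (mod=3) and 0x18 (mod=0) both decode to reg=3, rm=0.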

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |8 
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1a32b78..54e62dc 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2516,28 +2516,20 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x20: /* mov cr, reg */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
if (emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]))
goto cannot_emulate;
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x22: /* mov reg, cr */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
-   if (c->modrm_mod != 3)
-   goto cannot_emulate;
if (emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]))
goto cannot_emulate;
rc = X86EMUL_CONTINUE;
-- 
1.6.5



[PATCH 04/24] KVM: Provide current CPL as part of emulator context.

2010-03-09 Thread Gleb Natapov
Eliminate the need to call back into KVM to get it from emulator.
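
As a rough illustration of the idea (not code from this series): once
the CPL is stored in the context, a check such as the IOPL test needs no
callback into KVM. The struct and helper names below are hypothetical,
the mask and shift describe the architectural EFLAGS.IOPL field (bits
12-13), and the VM86 special case is left out.

```c
#include <stdbool.h>

#define EFLAGS_IOPL_MASK  0x3000UL	/* EFLAGS bits 12-13 */
#define EFLAGS_IOPL_SHIFT 12

/* Hypothetical, cut-down emulator context carrying a cached CPL. */
struct emu_ctxt {
	unsigned long eflags;
	int cpl;
};

/* True when the current privilege level is numerically above IOPL. */
static bool bad_iopl(const struct emu_ctxt *ctxt)
{
	int iopl = (ctxt->eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;

	return ctxt->cpl > iopl;
}
```

The caller fills in cpl once before emulation starts, so every later
check reads the same value.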

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |6 +++---
 arch/x86/kvm/x86.c |1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0c5caa4..d8b2da0 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -159,6 +159,7 @@ struct x86_emulate_ctxt {
struct kvm_vcpu *vcpu;
 
unsigned long eflags;
+   int cpl;
/* Emulated execution mode, represented by an X86EMUL_MODE value. */
int mode;
u32 cs_base;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3d1ee74..ed29a52 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1254,7 +1254,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
int rc;
unsigned long val, change_mask;
int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+   int cpl = ctxt->cpl;
 
rc = emulate_pop(ctxt, ops, &val, len);
if (rc != X86EMUL_CONTINUE)
@@ -1763,7 +1763,7 @@ static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
if (ctxt->mode == X86EMUL_MODE_VM86)
return true;
iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+   return ctxt->cpl > iopl;
 }
 
 static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
@@ -1839,7 +1839,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
/* Privileged instruction can be executed only in CPL=0 */
-   if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+   if ((c->d & Priv) && ctxt->cpl) {
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f2b61c..9b5fb43 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3499,6 +3499,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
vcpu->arch.emulate_ctxt.vcpu = vcpu;
vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
+   vcpu->arch.emulate_ctxt.cpl = kvm_x86_ops->get_cpl(vcpu);
vcpu->arch.emulate_ctxt.mode =
(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
-- 
1.6.5



[PATCH 21/24] KVM: x86 emulator: remove saved_eip

2010-03-09 Thread Gleb Natapov
c->eip is never written back in case of emulation failure, so there is
no need to set it to the old value.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |9 +
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 505dfba..ba1ce61 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2377,7 +2377,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
unsigned long memop = 0;
u64 msr_data;
-   unsigned long saved_eip = 0;
struct decode_cache *c = &ctxt->decode;
unsigned int port;
int io_dir_in;
@@ -2391,7 +2390,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 */
 
memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
-   saved_eip = c->eip;
 
if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
@@ -2983,11 +2981,7 @@ writeback:
kvm_rip_write(ctxt->vcpu, c->eip);
 
 done:
-   if (rc == X86EMUL_UNHANDLEABLE) {
-   c->eip = saved_eip;
-   return -1;
-   }
-   return 0;
+   return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 
 twobyte_insn:
switch (c->b) {
@@ -3264,6 +3258,5 @@ twobyte_insn:
 
 cannot_emulate:
DPRINTF("Cannot emulate %02x\n", c->b);
-   c->eip = saved_eip;
return -1;
 }
-- 
1.6.5



[PATCH 13/24] KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.

2010-03-09 Thread Gleb Natapov
If the LOCK prefix is used, the destination argument should be memory;
otherwise the instruction should generate #UD.
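
The combined condition can be sketched on its own as follows. The flag
and the operand-type enum are hypothetical stand-ins for the emulator's
decode flags and operand types; only the shape of the test mirrors the
patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "LOCK prefix allowed" decode flag. */
#define FLAG_LOCK_ALLOWED (1u << 0)

/* Hypothetical operand types. */
enum operand_type { OPERAND_REG, OPERAND_MEM, OPERAND_IMM, OPERAND_NONE };

/*
 * True when a LOCK-prefixed instruction has to raise #UD: either the
 * instruction does not accept LOCK at all, or its destination is not
 * a memory operand.
 */
static bool lock_prefix_raises_ud(bool lock_prefix, unsigned int flags,
				  enum operand_type dst_type)
{
	return lock_prefix &&
	       (!(flags & FLAG_LOCK_ALLOWED) || dst_type != OPERAND_MEM);
}
```

Without the prefix the function never asks for #UD, whatever the
destination is.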

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 45ded7f..018abb3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1838,7 +1838,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
/* LOCK prefix is allowed only with some instructions */
-   if (c->lock_prefix && !(c->d & Lock)) {
+   if (c->lock_prefix && (!(c->d & Lock) || c->dst.type != OP_MEM)) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
goto done;
}
-- 
1.6.5



[PATCH 12/24] KVM: x86 emulator: do not call writeback if msr access fails.

2010-03-09 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6381df9..45ded7f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2559,7 +2559,7 @@ twobyte_insn:
| ((u64)c->regs[VCPU_REGS_RDX] << 32);
if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = ctxt->eip;
+   goto done;
}
rc = X86EMUL_CONTINUE;
c->dst.type = OP_NONE;
@@ -2568,7 +2568,7 @@ twobyte_insn:
/* rdmsr */
if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
kvm_inject_gp(ctxt->vcpu, 0);
-   c->eip = ctxt->eip;
+   goto done;
} else {
c->regs[VCPU_REGS_RAX] = (u32)msr_data;
c->regs[VCPU_REGS_RDX] = msr_data >> 32;
-- 
1.6.5



[PATCH 19/24] KVM: x86 emulator: fix in/out emulation.

2010-03-09 Thread Gleb Natapov
in/out emulation is currently broken. The breakage differs depending
on where the IO device resides. If it is in userspace, the emulator
reports emulation failure since it incorrectly interprets the
kvm_emulate_pio() return value. If the IO device is in the kernel,
emulation of 'in' does nothing: kvm_emulate_pio() stores the result
directly into the vcpu registers, so the emulator overwrites the result
of emulation during commit of the shadowed registers.
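
The in-kernel case can be reproduced with a toy model. Everything
below is hypothetical (type names, helpers, a one-register file); it
only shows why a value written straight into the vcpu registers is
lost when the emulator later commits its shadow copy, and why writing
into the shadow copy is not.

```c
#include <string.h>

enum { REG_RAX, NR_REGS };

/* Hypothetical stand-ins for the vcpu and the emulator's shadow state. */
struct toy_vcpu   { unsigned long regs[NR_REGS]; };
struct toy_decode { unsigned long regs[NR_REGS]; };

/* Old behaviour: the device result lands in the vcpu registers. */
static void pio_in_direct(struct toy_vcpu *v, unsigned long data)
{
	v->regs[REG_RAX] = data;
}

/* New behaviour: the result is delivered into the shadow registers. */
static void pio_in_shadow(struct toy_decode *c, unsigned long data)
{
	c->regs[REG_RAX] = data;
}

/* Writeback: the shadow registers replace the vcpu registers. */
static void commit(struct toy_vcpu *v, const struct toy_decode *c)
{
	memcpy(v->regs, c->regs, sizeof v->regs);
}

/* Emulate one 'in' and return the guest-visible RAX afterwards. */
static unsigned long emulate_in(int use_shadow, unsigned long old_rax,
				unsigned long data)
{
	struct toy_vcpu v = { .regs = { old_rax } };
	struct toy_decode c;

	memcpy(c.regs, v.regs, sizeof c.regs);	/* take the shadow copy */
	if (use_shadow)
		pio_in_shadow(&c, data);
	else
		pio_in_direct(&v, data);
	commit(&v, &c);
	return v.regs[REG_RAX];
}
```

In this model emulate_in(0, ...) leaves the old RAX value in place,
while emulate_in(1, ...) makes the device data visible to the guest.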

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |7 ++
 arch/x86/kvm/emulate.c |   17 ++--
 arch/x86/kvm/svm.c |   22 +
 arch/x86/kvm/vmx.c |   19 +---
 arch/x86/kvm/x86.c |  203 +---
 5 files changed, 139 insertions(+), 129 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 4268330..7d323d5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -119,6 +119,13 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
+
+   int (*pio_in_emulated)(int size, unsigned short port, void *val,
+  unsigned int count, struct kvm_vcpu *vcpu);
+
+   int (*pio_out_emulated)(int size, unsigned short port, const void *val,
+   unsigned int count, struct kvm_vcpu *vcpu);
+
bool (*get_cached_descriptor)(struct desc_struct *desc,
  int seg, struct kvm_vcpu *vcpu);
void (*set_cached_descriptor)(struct desc_struct *desc,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 81ecf47..0ec7b9b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -208,12 +208,12 @@ static u32 opcode_table[256] = {
0, 0, 0, 0, 0, 0, 0, 0,
/* 0xE0 - 0xE7 */
0, 0, 0, 0,
-   ByteOp | SrcImmUByte, SrcImmUByte,
+   ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
ByteOp | SrcImmUByte, SrcImmUByte,
/* 0xE8 - 0xEF */
SrcImm | Stack, SrcImm | ImplicitOps,
SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps,
-   SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
+   SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
/* 0xF0 - 0xF7 */
0, 0, 0, 0,
@@ -2915,12 +2915,13 @@ special_insn:
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
-  (c->d & ByteOp) ? 1 : c->op_bytes,
-  port) != 0) {
-   c->eip = saved_eip;
-   goto cannot_emulate;
-   }
+   if (io_dir_in)
+   ops->pio_in_emulated((c->d & ByteOp) ? 1 : c->op_bytes,
+port, &c->dst.val, 1, ctxt->vcpu);
+   else
+   ops->pio_out_emulated((c->d & ByteOp) ? 1 : c->op_bytes,
+ port, &c->regs[VCPU_REGS_RAX], 1,
+ ctxt->vcpu);
break;
case 0xf4:  /* hlt */
ctxt->vcpu->arch.halt_request = 1;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index def4877..315e8a8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1488,29 +1488,9 @@ static int shutdown_interception(struct vcpu_svm *svm)
 
 static int io_interception(struct vcpu_svm *svm)
 {
-   u32 io_info = svm->vmcb->control.exit_info_1; /* address size bug? */
-   int size, in, string;
-   unsigned port;
-
++svm->vcpu.stat.io_exits;
 
-   svm->next_rip = svm->vmcb->control.exit_info_2;
-
-   string = (io_info & SVM_IOIO_STR_MASK) != 0;
-
-   if (string) {
-   if (emulate_instruction(&svm->vcpu,
-   0, 0, 0) == EMULATE_DO_MMIO)
-   return 0;
-   return 1;
-   }
-
-   in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
-   port = io_info >> 16;
-   size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
-
-   skip_emulated_instruction(&svm->vcpu);
-   return kvm_emulate_pio(&svm->vcpu, in, size, port);
+   return !(emulate_instruction(&svm->vcpu, 0, 0, 0) == EMULATE_DO_MMIO);
 }
 
 static int nmi_interception(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..7f33d8e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2974,26 +2974,9 @@ static int handle_triple_fault(struct kvm_vcpu *vcpu)
 
 static int handle_io(struct kvm_vcpu *vcpu)
 {
-   unsigned long exit_qualification;
-   int size, in, string;
-   unsigned port;
-
++vcpu->stat.io_exits;
-   

[PATCH 11/24] KVM: x86 emulator: fix return values of syscall/sysenter/sysexit emulations

2010-03-09 Thread Gleb Natapov
Return X86EMUL_PROPAGATE_FAULT if a fault was injected. Also inject #UD
for those instructions when appropriate.
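
A minimal sketch of the convention, under stated assumptions: the enum
values, the context struct and the pending_vector field are all
hypothetical, and only the syscall mode check is modelled. The point is
that the helper both records the exception and reports "fault already
injected" to its caller, instead of reporting that the instruction
cannot be handled.

```c
/* Hypothetical return codes mirroring the three outcomes discussed. */
enum emul_rc { EMUL_CONTINUE, EMUL_UNHANDLEABLE, EMUL_PROPAGATE_FAULT };

/* Hypothetical execution modes. */
enum emul_mode { MODE_REAL, MODE_VM86, MODE_PROT32, MODE_PROT64 };

#define VECTOR_NONE (-1)
#define VECTOR_UD   6	/* #UD is exception vector 6 */

/* Hypothetical, cut-down context: mode plus one pending exception. */
struct emu_ctxt {
	enum emul_mode mode;
	int pending_vector;
};

/* syscall is not available in real or VM86 mode: queue #UD there. */
static enum emul_rc syscall_mode_check(struct emu_ctxt *ctxt)
{
	if (ctxt->mode == MODE_REAL || ctxt->mode == MODE_VM86) {
		ctxt->pending_vector = VECTOR_UD;
		return EMUL_PROPAGATE_FAULT;
	}
	return EMUL_CONTINUE;
}
```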

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |   17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dcb9720..6381df9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1597,8 +1597,11 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
u64 msr_data;
 
/* syscall is not available in real mode */
-   if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86)
-   return X86EMUL_UNHANDLEABLE;
+   if (ctxt->mode == X86EMUL_MODE_REAL ||
+   ctxt->mode == X86EMUL_MODE_VM86) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1648,14 +1651,16 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
/* inject #GP if in real mode */
if (ctxt->mode == X86EMUL_MODE_REAL) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return X86EMUL_UNHANDLEABLE;
+   return X86EMUL_PROPAGATE_FAULT;
}
 
/* XXX sysenter/sysexit have not been tested in 64bit mode.
* Therefore, we inject an #UD.
*/
-   if (ctxt->mode == X86EMUL_MODE_PROT64)
-   return X86EMUL_UNHANDLEABLE;
+   if (ctxt->mode == X86EMUL_MODE_PROT64) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1710,7 +1715,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
if (ctxt->mode == X86EMUL_MODE_REAL ||
ctxt->mode == X86EMUL_MODE_VM86) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return X86EMUL_UNHANDLEABLE;
+   return X86EMUL_PROPAGATE_FAULT;
}
 
setup_syscalls_segments(ctxt, &cs, &ss);
-- 
1.6.5



Re: [PATCH 02/24] KVM: Provide callback to get/set control registers in emulator ops.

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

> Use this callback instead of directly call kvm function. Also rename
> realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
> to do with real mode.
>
> +   ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
> +   void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
>   };


Note, passing a vcpu means we are still tightly coupled to kvm.  Can be 
fixed later.



> +static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
> +{
> +   unsigned long value;
> +
> +   switch (cr) {
> +   case 0:
> +   value = kvm_read_cr0(vcpu);
> +   break;
> +   case 2:
> +   value = vcpu->arch.cr2;
> +   break;
> +   case 3:
> +   value = vcpu->arch.cr3;
> +   break;
> +   case 4:
> +   value = kvm_read_cr4(vcpu);
> +   break;
> +   case 8:
> +   value = kvm_get_cr8(vcpu);
> +   break;
> +   default:
> +   vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
> +   return 0;


This printk is triggerable by guest code (as the patch didn't introduce 
this, it can be fixed later).


The emulator should #UD on unrecognised control registers.
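
A minimal sketch of such a check, assuming the set of recognised
registers is exactly the one handled by the quoted switch (CR0, CR2,
CR3, CR4 and CR8). The helper name is hypothetical; a caller would
queue #UD when it returns false, before reaching the callback.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true for the control registers handled by the
 * quoted emulator_get_cr(), false for everything else.
 */
static bool cr_is_recognised(int cr)
{
	switch (cr) {
	case 0:
	case 2:
	case 3:
	case 4:
	case 8:
		return true;
	default:
		return false;
	}
}
```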

--
error compiling committee.c: too many arguments to function



[PATCH 03/24] KVM: remove realmode_lmsw function.

2010-03-09 Thread Gleb Natapov
Use (get|set)_cr callback to emulate lmsw inside emulator.
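
The value handed to set_cr can be sketched as a pure function. The
function name is hypothetical; the expression mirrors the one in the
patch below. It does not model the architectural rule that lmsw can set
CR0.PE but cannot clear it.

```c
/*
 * Replace the low four bits of CR0 (PE, MP, EM, TS) with the low four
 * bits of the lmsw source operand and keep every other CR0 bit.
 */
static unsigned long lmsw_merge(unsigned long cr0, unsigned long src)
{
	return (cr0 & ~0x0ful) | (src & 0x0f);
}
```

Bits of the source operand above bit 3 are ignored, so a 16-bit operand
such as 0xfff6 changes only the low nibble of CR0.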

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |2 --
 arch/x86/kvm/emulate.c  |4 ++--
 arch/x86/kvm/x86.c  |7 ---
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e8e108a..1e15a0a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -582,8 +582,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 void kvm_report_emulation_failure(struct kvm_vcpu *cvpu, const char *context);
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
 void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags);
 
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d515795..3d1ee74 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2483,8 +2483,8 @@ twobyte_insn:
c->dst.val = ops->get_cr(0, ctxt->vcpu);
break;
case 6: /* lmsw */
-   realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
- &ctxt->eflags);
+   ops->set_cr(0, (ops->get_cr(0, ctxt->vcpu) & ~0x0ful) |
+   (c->src.val & 0x0f), ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 7: /* invlpg*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7b62ef2..8f2b61c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4072,13 +4072,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
kvm_x86_ops->set_idt(vcpu, &dt);
 }
 
-void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
-  unsigned long *rflags)
-{
-   kvm_lmsw(vcpu, msw);
-   *rflags = kvm_get_rflags(vcpu);
-}
-
 static int move_to_next_stateful_cpuid_entry(struct kvm_vcpu *vcpu, int i)
 {
struct kvm_cpuid_entry2 *e = &vcpu->arch.cpuid_entries[i];
-- 
1.6.5



[PATCH 00/24] [RFC] emulator cleanup

2010-03-09 Thread Gleb Natapov
This is the first series of patches that tries to clean up the emulator
code. It is a mix of bug fixes and moving code that does emulation from
x86.c to emulator.c while making it KVM independent. The status of the
patches: works for me. The realtime.flat test now also passes where it
failed before.

Gleb Natapov (24):
  KVM: Remove pointer to rflags from realmode_set_cr parameters.
  KVM: Provide callback to get/set control registers in emulator ops.
  KVM: remove realmode_lmsw function.
  KVM: Provide current CPL as part of emulator context.
  KVM: Provide current eip as part of emulator context.
  KVM: x86 emulator: fix mov r/m, sreg emulation.
  KVM: x86 emulator: fix 0f 01 /5 emulation
  KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
  KVM: x86 emulator: inject #UD on access to non-existing CR
  KVM: x86 emulator: fix mov dr to inject #UD when needed.
  KVM: x86 emulator: fix return values of syscall/sysenter/sysexit
emulations
  KVM: x86 emulator: do not call writeback if msr access fails.
  KVM: x86 emulator: If LOCK prefix is used dest arg should be memory.
  KVM: x86 emulator: cleanup grp3 return value
  KVM: x86 emulator: Provide more callbacks for x86 emulator.
  KVM: x86 emulator: Emulate task switch in emulator.c
  KVM: x86 emulator: Use load_segment_descriptor() instead of
kvm_load_segment_descriptor()
  KVM: Use task switch from emulator.c
  KVM: x86 emulator: fix in/out emulation.
  KVM: x86 emulator: Move string pio emulation into emulator.c
  KVM: x86 emulator: remove saved_eip
  KVM: x86 emulator: restart string instruction without going back to a
guest.
  KVM: x86 emulator: introduce pio in string read ahead.
  KVM: small kvm_arch_vcpu_ioctl_run() cleanup.

 arch/x86/include/asm/kvm_emulate.h |   41 ++-
 arch/x86/include/asm/kvm_host.h|   10 -
 arch/x86/kvm/emulate.c |  813 +++
 arch/x86/kvm/svm.c |   22 +-
 arch/x86/kvm/vmx.c |   19 +-
 arch/x86/kvm/x86.c | 1112 +---
 6 files changed, 1016 insertions(+), 1001 deletions(-)



[PATCH 18/24] KVM: Use task switch from emulator.c

2010-03-09 Thread Gleb Natapov
Remove old task switch code from x86.c

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/x86.c |  558 ++--
 1 files changed, 18 insertions(+), 540 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f89502d..5171696 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4798,553 +4798,31 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
return 0;
 }
 
-static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector,
-  struct kvm_segment *kvm_desct)
-{
-   kvm_desct->base = get_desc_base(seg_desc);
-   kvm_desct->limit = get_desc_limit(seg_desc);
-   if (seg_desc->g) {
-   kvm_desct->limit <<= 12;
-   kvm_desct->limit |= 0xfff;
-   }
-   kvm_desct->selector = selector;
-   kvm_desct->type = seg_desc->type;
-   kvm_desct->present = seg_desc->p;
-   kvm_desct->dpl = seg_desc->dpl;
-   kvm_desct->db = seg_desc->d;
-   kvm_desct->s = seg_desc->s;
-   kvm_desct->l = seg_desc->l;
-   kvm_desct->g = seg_desc->g;
-   kvm_desct->avl = seg_desc->avl;
-   if (!selector)
-   kvm_desct->unusable = 1;
-   else
-   kvm_desct->unusable = 0;
-   kvm_desct->padding = 0;
-}
-
-static void get_segment_descriptor_dtable(struct kvm_vcpu *vcpu,
- u16 selector,
- struct desc_ptr *dtable)
-{
-   if (selector & 1 << 2) {
-   struct kvm_segment kvm_seg;
-
-   kvm_get_segment(vcpu, &kvm_seg, VCPU_SREG_LDTR);
-
-   if (kvm_seg.unusable)
-   dtable->size = 0;
-   else
-   dtable->size = kvm_seg.limit;
-   dtable->address = kvm_seg.base;
-   }
-   else
-   kvm_x86_ops->get_gdt(vcpu, dtable);
-}
-
-/* allowed just for 8 bytes segments */
-static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-struct desc_struct *seg_desc)
-{
-   struct desc_ptr dtable;
-   u16 index = selector >> 3;
-   int ret;
-   u32 err;
-   gva_t addr;
-
-   get_segment_descriptor_dtable(vcpu, selector, &dtable);
-
-   if (dtable.size < index * 8 + 7) {
-   kvm_queue_exception_e(vcpu, GP_VECTOR, selector & 0xfffc);
-   return X86EMUL_PROPAGATE_FAULT;
-   }
-   addr = dtable.address + index * 8;
-   ret = kvm_read_guest_virt_system(addr, seg_desc, sizeof(*seg_desc),
-vcpu,  &err);
-   if (ret == X86EMUL_PROPAGATE_FAULT)
-   kvm_inject_page_fault(vcpu, addr, err);
-
-   return ret;
-}
-
-/* allowed just for 8 bytes segments */
-static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-struct desc_struct *seg_desc)
-{
-   struct desc_ptr dtable;
-   u16 index = selector >> 3;
-
-   get_segment_descriptor_dtable(vcpu, selector, &dtable);
-
-   if (dtable.size < index * 8 + 7)
-   return 1;
-   return kvm_write_guest_virt(dtable.address + index*8, seg_desc, sizeof(*seg_desc), vcpu, NULL);
-}
-
-static gpa_t get_tss_base_addr_write(struct kvm_vcpu *vcpu,
-  struct desc_struct *seg_desc)
-{
-   u32 base_addr = get_desc_base(seg_desc);
-
-   return kvm_mmu_gva_to_gpa_write(vcpu, base_addr, NULL);
-}
-
-static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
-struct desc_struct *seg_desc)
-{
-   u32 base_addr = get_desc_base(seg_desc);
-
-   return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
-}
-
-static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
-{
-   struct kvm_segment kvm_seg;
-
-   kvm_get_segment(vcpu, &kvm_seg, seg);
-   return kvm_seg.selector;
-}
-
-static int kvm_load_realmode_segment(struct kvm_vcpu *vcpu, u16 selector, int seg)
-{
-   struct kvm_segment segvar = {
-   .base = selector << 4,
-   .limit = 0xffff,
-   .selector = selector,
-   .type = 3,
-   .present = 1,
-   .dpl = 3,
-   .db = 0,
-   .s = 1,
-   .l = 0,
-   .g = 0,
-   .avl = 0,
-   .unusable = 0,
-   };
-   kvm_x86_ops->set_segment(vcpu, &segvar, seg);
-   return X86EMUL_CONTINUE;
-}
-
-static int is_vm86_segment(struct kvm_vcpu *vcpu, int seg)
-{
-   return (seg != VCPU_SREG_LDTR) &&
-   (seg != VCPU_SREG_TR) &&
-   (kvm_get_rflags(vcpu) & X86_EFLAGS_VM);
-}
-
-int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg)
-{
-   struct kvm_segment kvm_seg;
-   struct desc_struct seg_desc;
-   u8 dpl, rpl, cpl;
-   unsigned err_vec = GP_VECTOR;
-   u32 err_code = 0;
-   bool 

[PATCH 20/24] KVM: x86 emulator: Move string pio emulation into emulator.c

2010-03-09 Thread Gleb Natapov
Currently emulation is done outside of the emulator, so things like doing
ins/outs to/from mmio are broken. It also makes it hard (if not impossible)
to implement single stepping in the future. The implementation in this
patch is not efficient, since it exits to userspace for each IO while the
previous implementation did 'ins' in batches. A further patch that
implements pio in string read ahead addresses this problem.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |5 -
 arch/x86/kvm/emulate.c  |   61 ++--
 arch/x86/kvm/x86.c  |  204 +++
 3 files changed, 45 insertions(+), 225 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1e15a0a..8507b22 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -224,14 +224,9 @@ struct kvm_pv_mmu_op_buffer {
 
 struct kvm_pio_request {
unsigned long count;
-   int cur_count;
-   gva_t guest_gva;
int in;
int port;
int size;
-   int string;
-   int down;
-   int rep;
 };
 
 /*
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0ec7b9b..505dfba 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -151,8 +151,8 @@ static u32 opcode_table[256] = {
0, 0, 0, 0,
/* 0x68 - 0x6F */
SrcImm | Mov | Stack, 0, SrcImmByte | Mov | Stack, 0,
-   SrcNone  | ByteOp  | ImplicitOps, SrcNone  | ImplicitOps, /* insb, insw/insd */
-   SrcNone  | ByteOp  | ImplicitOps, SrcNone  | ImplicitOps, /* outsb, outsw/outsd */
+   DstMem | ByteOp | Mov | String, DstMem | Mov | String, /* insb, insw/insd */
+   SrcMem | ByteOp | ImplicitOps | String, SrcMem | ImplicitOps | String, /* outsb, outsw/outsd */
/* 0x70 - 0x77 */
SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
SrcImmByte, SrcImmByte, SrcImmByte, SrcImmByte,
@@ -2439,7 +2439,12 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
goto done;
}
}
-   c->regs[VCPU_REGS_RCX]--;
+   if (c->src.type == OP_MEM)
+   memop = register_address(c, seg_override_base(ctxt, c),
+c->regs[VCPU_REGS_RSI]);
+   if (c->dst.type == OP_MEM)
+   memop = register_address(c, es_base(ctxt),
+c->regs[VCPU_REGS_RDI]);
c->eip = ctxt->eip;
}
 
@@ -2596,20 +2601,14 @@ special_insn:
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (kvm_emulate_pio_string(ctxt->vcpu,
-   1,
-   (c->d & ByteOp) ? 1 : c->op_bytes,
-   c->rep_prefix ?
-   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
-   (ctxt->eflags & EFLG_DF),
-   register_address(c, es_base(ctxt),
-c->regs[VCPU_REGS_RDI]),
-   c->rep_prefix,
-   c->regs[VCPU_REGS_RDX]) == 0) {
-   c->eip = saved_eip;
-   return -1;
-   }
-   return 0;
+   if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
+ &c->dst.val, 1, ctxt->vcpu))
+   goto done; /* IO is needed, skip writeback */
+
+   register_address_increment(c, &c->regs[VCPU_REGS_RDI],
+  (ctxt->eflags & EFLG_DF) ?
+  -c->dst.bytes : c->dst.bytes);
+   break;
case 0x6e:  /* outsb */
case 0x6f:  /* outsw/outsd */
if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
@@ -2617,21 +2616,14 @@ special_insn:
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (kvm_emulate_pio_string(ctxt->vcpu,
-   0,
-   (c->d & ByteOp) ? 1 : c->op_bytes,
-   c->rep_prefix ?
-   address_mask(c, c->regs[VCPU_REGS_RCX]) : 1,
-   (ctxt->eflags & EFLG_DF),
-register_address(c,
- seg_override_base(ctxt, c),
-c->regs[VCPU_REGS_RSI]),
-   c->rep_prefix,
-   c->regs[VCPU_REGS_RDX]) == 0) {
-   c->eip = saved_eip;
-   return -1;
-   }
-   return 0;
+  

[PATCH 23/24] KVM: x86 emulator: introduce pio in string read ahead.

2010-03-09 Thread Gleb Natapov
To optimize rep ins instruction do IO in big chunks ahead of time
instead of doing it only when required during instruction emulation.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |7 +++
 arch/x86/kvm/emulate.c |   34 ++
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f74b4ad..da7a711 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -150,6 +150,12 @@ struct fetch_cache {
unsigned long end;
 };
 
+struct read_cache {
+   u8 data[1024];
+   unsigned long pos;
+   unsigned long end;
+};
+
 struct decode_cache {
u8 twobyte;
u8 b;
@@ -177,6 +183,7 @@ struct decode_cache {
void *modrm_ptr;
unsigned long modrm_val;
struct fetch_cache fetch;
+   struct read_cache io_read;
 };
 
 struct x86_emulate_ctxt {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 76ed77d..987be2a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1222,6 +1222,28 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  unsigned int size, unsigned short port,
+  void *dest, unsigned int count)
+{
+   struct read_cache *mc = &ctxt->decode.io_read;
+
+   if (mc->pos == mc->end) { /* refill pio read ahead */
+   unsigned int n = sizeof(mc->data) / size;
+   n = min(n, count);
+   mc->pos = mc->end = 0;
+   if (!ops->pio_in_emulated(size, port, mc->data, n,
+ ctxt->vcpu))
+   return 0;
+   mc->end = n * size;
+   }
+
+   memcpy(dest, mc->data + mc->pos, size);
+   mc->pos += size;
+   return 1;
+}
+
 static u32 desc_limit_scaled(struct desc_struct *desc)
 {
u32 limit = get_desc_limit(desc);
@@ -2601,8 +2623,11 @@ special_insn:
kvm_inject_gp(ctxt->vcpu, 0);
goto done;
}
-   if (!ops->pio_in_emulated(c->dst.bytes, c->regs[VCPU_REGS_RDX],
- &c->dst.val, 1, ctxt->vcpu))
+   if (c->rep_prefix)
+   ctxt->restart = true;
+   if (!pio_in_emulated(ctxt, ops, c->dst.bytes,
+c->regs[VCPU_REGS_RDX], &c->dst.val,
+c->rep_prefix ? c->regs[VCPU_REGS_RCX] : 1))
goto done; /* IO is needed, skip writeback */
 
register_address_increment(c, &c->regs[VCPU_REGS_RDI],
@@ -2908,8 +2933,9 @@ special_insn:
goto done;
}
if (io_dir_in)
-   ops->pio_in_emulated((c->d & ByteOp) ? 1 : c->op_bytes,
-port, &c->dst.val, 1, ctxt->vcpu);
+   pio_in_emulated(ctxt, ops,
+   (c->d & ByteOp) ? 1 : c->op_bytes,
+   port, &c->dst.val, 1);
else
ops->pio_out_emulated((c->d & ByteOp) ? 1 : c->op_bytes,
  port, c->regs[VCPU_REGS_RAX], 1,
-- 
1.6.5
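
For readers following along, the read-ahead idea in this patch can be
tried outside the kernel. The following is only a minimal userspace
sketch, assuming a made-up backend (backend_pio_in) in place of
ops->pio_in_emulated and no KVM types at all; it mirrors the shape of the
patch's pio_in_emulated() but is not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the device backend: fills buf with `count`
 * items of `size` bytes each and counts how often it was called. */
static unsigned int backend_calls;
static unsigned char next_byte;

static int backend_pio_in(unsigned int size, unsigned int count, void *buf)
{
	unsigned char *p = buf;
	size_t i;

	backend_calls++;
	for (i = 0; i < (size_t)size * count; i++)
		p[i] = next_byte++;
	return 1; /* data is always available immediately in this sketch */
}

struct read_cache {
	unsigned char data[1024];
	size_t pos;
	size_t end;
};

/* Hand out one item of `size` bytes. When the cache is empty, refill it
 * with up to `count` items in a single backend call. */
static int cached_pio_in(struct read_cache *rc, unsigned int size,
			 void *dest, unsigned int count)
{
	assert(size > 0 && size <= sizeof(rc->data));
	assert(count > 0);

	if (rc->pos == rc->end) { /* refill read ahead */
		unsigned int n = sizeof(rc->data) / size;

		if (n > count)
			n = count;
		rc->pos = rc->end = 0;
		if (!backend_pio_in(size, n, rc->data))
			return 0;
		rc->end = (size_t)n * size;
	}

	memcpy(dest, rc->data + rc->pos, size);
	rc->pos += size;
	return 1;
}
```

With a remaining count of 8 and one-byte items, the first call triggers a
single backend access and the next seven calls are served from the cache.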



[PATCH 07/24] KVM: x86 emulator: fix 0f 01 /5 emulation

2010-03-09 Thread Gleb Natapov
It is undefined and should generate #UD.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/emulate.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2df510b..1a32b78 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,6 +2486,9 @@ twobyte_insn:
(c->src.val & 0x0f), ctxt->vcpu);
c->dst.type = OP_NONE;
break;
+   case 5: /* not defined */
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
case 7: /* invlpg*/
emulate_invlpg(ctxt->vcpu, memop);
/* Disable writeback. */
-- 
1.6.5
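
As background for the "/5" notation: the digit is the reg field (bits
5:3) of the ModRM byte that follows the 0f 01 opcode. A minimal sketch of
that decode, with helper names (modrm_reg, group7_raises_ud) that are
illustrative only and not taken from emulate.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the /digit (reg field, bits 5:3) from a ModRM byte. */
static unsigned int modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 7;
}

/* In this sketch only /5 of opcode 0f 01 is treated as undefined,
 * which is the single case the patch adds. */
static bool group7_raises_ud(uint8_t modrm)
{
	return modrm_reg(modrm) == 5;
}
```

For example, ModRM byte 0xe8 (binary 11 101 000) selects /5, while 0x38
(binary 00 111 000) selects /7, the invlpg case that follows in the switch.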



[PATCH 16/24] KVM: x86 emulator: Emulate task switch in emulator.c

2010-03-09 Thread Gleb Natapov
Implement emulation of 16/32 bit task switch in emulator.c

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |5 +
 arch/x86/kvm/emulate.c |  564 
 2 files changed, 569 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index e881618..4268330 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -11,6 +11,8 @@
 #ifndef _ASM_X86_KVM_X86_EMULATE_H
 #define _ASM_X86_KVM_X86_EMULATE_H
 
+#include <asm/desc_defs.h>
+
 struct x86_emulate_ctxt;
 
 /*
@@ -210,5 +212,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops);
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt,
 struct x86_emulate_ops *ops);
+int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+u16 tss_selector, int reason);
 
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6e2b34b..094d17c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -33,6 +33,7 @@
 #include <asm/kvm_emulate.h>
 
 #include "x86.h"
+#include "tss.h"
 
 /*
  * Opcode effective-address decode tables.
@@ -1218,6 +1219,199 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static u32 desc_limit_scaled(struct desc_struct *desc)
+{
+   u32 limit = get_desc_limit(desc);
+
+   return desc->g ? (limit << 12) | 0xfff : limit;
+}
+
+static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops,
+u16 selector, struct desc_ptr *dt)
+{
+   if (selector & 1 << 2) {
+   struct desc_struct desc;
+   memset (dt, 0, sizeof *dt);
+   if(!ops->get_cached_descriptor(&desc, VCPU_SREG_LDTR, ctxt->vcpu))
+   return;
+
+   dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? */
+   dt->address = get_desc_base(&desc);
+   }
+   else
+   ops->get_gdt(dt, ctxt->vcpu);
+}
+
+/* allowed just for 8 bytes segments */
+static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  u16 selector, struct desc_struct *desc)
+{
+   struct desc_ptr dt;
+   u16 index = selector >> 3;
+   int ret;
+   u32 err;
+   ulong addr;
+
+   get_descriptor_table_ptr(ctxt, ops, selector, &dt);
+
+   if (dt.size < index * 8 + 7) {
+   kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+   addr = dt.address + index * 8;
+   ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu,  &err);
+   if (ret == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+
+   return ret;
+}
+
+/* allowed just for 8 bytes segments */
+static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops *ops,
+   u16 selector, struct desc_struct *desc)
+{
+   struct desc_ptr dt;
+   u16 index = selector >> 3;
+   u32 err;
+   ulong addr;
+   int ret;
+
+   get_descriptor_table_ptr(ctxt, ops, selector, &dt);
+
+   if (dt.size < index * 8 + 7) {
+   kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+   return X86EMUL_PROPAGATE_FAULT;
+   }
+
+   addr = dt.address + index * 8;
+   ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
+   if (ret == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(ctxt->vcpu, addr, err);
+
+   return ret;
+}
+
+static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
+  u16 selector, int seg)
+{
+   struct desc_struct seg_desc;
+   u8 dpl, rpl, cpl;
+   unsigned err_vec = GP_VECTOR;
+   u32 err_code = 0;
+   bool null_selector = !(selector & ~0x3); /* 0000-0003 are null */
+   int ret;
+
+   memset(&seg_desc, 0, sizeof seg_desc);
+
+   if ((seg <= VCPU_SREG_GS && ctxt->mode == X86EMUL_MODE_VM86)
+   || ctxt->mode == X86EMUL_MODE_REAL) {
+   /* set real mode segment descriptor */
+   set_desc_base(&seg_desc, selector << 4);
+   set_desc_limit(&seg_desc, 0xffff);
+   seg_desc.type = 3;
+   seg_desc.p = 1;
+   seg_desc.s = 1;
+   goto load;
+   }
+
+   /* NULL selector is not valid for TR, CS and SS */
+   if ((seg == VCPU_SREG_CS || seg == VCPU_SREG_SS || seg == VCPU_SREG_TR)
+   && null_selector)
+   goto exception;
+
+   /* TR should be in 

[PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-09 Thread Gleb Natapov
Currently when string instruction is only partially complete we go back
to a guest mode, guest tries to reexecute instruction and exits again
and at this point emulation continues. Avoid all of this by restarting
instruction without going back to a guest mode.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |   22 --
 arch/x86/kvm/x86.c |   16 +++-
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 7d323d5..f74b4ad 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -193,6 +193,7 @@ struct x86_emulate_ctxt {
/* interruptibility state, as a result of execution of STI or MOV SS */
int interruptibility;
 
+   bool restart; /* restart string instruction after writeback */
/* decode cache */
struct decode_cache decode;
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ba1ce61..76ed77d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -925,8 +925,11 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
int mode = ctxt->mode;
int def_op_bytes, def_ad_bytes, group;
 
-   /* Shadow copy of register state. Committed on successful emulation. */
 
+   /* we cannot decode insn before we complete previous rep insn */
+   WARN_ON(ctxt->restart);
+
+   /* Shadow copy of register state. Committed on successful emulation. */
memset(c, 0, sizeof(struct decode_cache));
c->eip = ctxt->eip;
ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
@@ -2412,8 +2415,11 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
memop = c->modrm_ea;
 
if (c->rep_prefix && (c->d & String)) {
+   ctxt->restart = true;
/* All REP prefixes have the same first termination condition */
if (c->regs[VCPU_REGS_RCX] == 0) {
+   string_done:
+   ctxt->restart = false;
kvm_rip_write(ctxt->vcpu, c->eip);
goto done;
}
@@ -2425,17 +2431,13 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 *  - if REPNE/REPNZ and ZF = 1 then done
 */
if ((c->b == 0xa6) || (c->b == 0xa7) ||
-   (c->b == 0xae) || (c->b == 0xaf)) {
+   (c->b == 0xae) || (c->b == 0xaf)) {
if ((c->rep_prefix == REPE_PREFIX) &&
-   ((ctxt->eflags & EFLG_ZF) == 0)) {
-   kvm_rip_write(ctxt->vcpu, c->eip);
-   goto done;
-   }
+   ((ctxt->eflags & EFLG_ZF) == 0))
+   goto string_done;
if ((c->rep_prefix == REPNE_PREFIX) &&
-   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF)) {
-   kvm_rip_write(ctxt->vcpu, c->eip);
-   goto done;
-   }
+   ((ctxt->eflags & EFLG_ZF) == EFLG_ZF))
+   goto string_done;
}
if (c->src.type == OP_MEM)
memop = register_address(c, seg_override_base(ctxt, c),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b25ef4b..82379e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3724,6 +3724,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
return EMULATE_DONE;
}
 
+restart:
r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
 
@@ -3746,7 +3747,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
if (r) {
if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
-   return EMULATE_DONE;
+   goto done;
if (!vcpu->mmio_needed) {
kvm_report_emulation_failure(vcpu, "mmio");
return EMULATE_FAIL;
@@ -3761,6 +3762,10 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
return EMULATE_DO_MMIO;
}
 
+done:
+   if (vcpu->arch.emulate_ctxt.restart)
+   goto restart;
+
return EMULATE_DONE;
 }
 EXPORT_SYMBOL_GPL(emulate_instruction);
@@ -4516,6 +4521,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
goto out;
}
}
+   if (vcpu->arch.emulate_ctxt.restart) {
+   vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+   r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
+   srcu_read_unlock(&vcpu->kvm->srcu, 

Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

Provide get_cached_descriptor(), set_cached_descriptor(),
get_segment_selector(), set_segment_selector(), get_gdt(),
write_std() callbacks.

Signed-off-by: Gleb Natapov g...@redhat.com
---
  arch/x86/include/asm/kvm_emulate.h |   16 +
  arch/x86/kvm/x86.c |  130 +++
  2 files changed, 131 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 032d02f..e881618 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -63,6 +63,15 @@ struct x86_emulate_ops {
unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);

/*
+* write_std: Write bytes of standard (non-emulated/special) memory.
+*Used for descriptor writing.
+*  @addr:  [IN ] Linear address to which to write.
+*  @val:   [OUT] Value write to memory, zero-extended to 'u_long'.
+*  @bytes: [IN ] Number of bytes to write to memory.
+*/
+   int (*write_std)(unsigned long addr, void *val,
+unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
   


Descriptor writes need an atomic kvm_set_guest_bit(), no?
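
To illustrate the concern: kvm_set_guest_bit() is not shown anywhere in
this thread, so the following is only a userspace C11 analogue of what an
atomic descriptor bit update could look like. The helper name
desc_mark_accessed is made up; the bit position follows the x86
descriptor layout, where the accessed bit is bit 0 of the type field,
i.e. bit 40 of the 8-byte descriptor.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 40 of an 8-byte segment descriptor is the "accessed" bit
 * (bit 0 of the type field). */
#define DESC_ACCESSED (UINT64_C(1) << 40)

/* Atomically set the accessed bit and report whether it was already
 * set. A plain read-modify-write of the whole descriptor could lose a
 * concurrent update made through another vcpu. */
static bool desc_mark_accessed(_Atomic uint64_t *desc)
{
	uint64_t old = atomic_fetch_or(desc, DESC_ACCESSED);

	return (old & DESC_ACCESSED) != 0;
}
```

The point of the sketch is only that a single-bit update does not have to
rewrite the other 63 bits of the descriptor.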

--
error compiling committee.c: too many arguments to function



Re: [PATCH 19/24] KVM: x86 emulator: fix in/out emulation.

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

in/out emulation is broken now. The breakage is different depending
on where IO device resides. If it is in userspace emulator reports
emulation failure since it incorrectly interprets kvm_emulate_pio()
return value. If IO device is in the kernel emulation of 'in' will do
nothing since kvm_emulate_pio() stores result directly into vcpu
registers, so emulator will overwrite result of emulation during
commit of shadowed register.


index def4877..315e8a8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1488,29 +1488,9 @@ static int shutdown_interception(struct vcpu_svm *svm)

  static int io_interception(struct vcpu_svm *svm)
  {
-   u32 io_info = svm->vmcb->control.exit_info_1; /* address size bug? */
-   int size, in, string;
-   unsigned port;
-
++svm->vcpu.stat.io_exits;

-   svm->next_rip = svm->vmcb->control.exit_info_2;
-
-   string = (io_info & SVM_IOIO_STR_MASK) != 0;
-
-   if (string) {
-   if (emulate_instruction(&svm->vcpu,
-   0, 0, 0) == EMULATE_DO_MMIO)
-   return 0;
-   return 1;
-   }
-
-   in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
-   port = io_info >> 16;
-   size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
-
-   skip_emulated_instruction(&svm->vcpu);
-   return kvm_emulate_pio(&svm->vcpu, in, size, port);
+   return !(emulate_instruction(&svm->vcpu, 0, 0, 0) == EMULATE_DO_MMIO);
  }
   


We don't want to enter the emulator for non-string in/out.  Leftover 
test code?




  static int nmi_interception(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..7f33d8e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2974,26 +2974,9 @@ static int handle_triple_fault(struct kvm_vcpu *vcpu)

  static int handle_io(struct kvm_vcpu *vcpu)
  {
-   unsigned long exit_qualification;
-   int size, in, string;
-   unsigned port;
-
++vcpu->stat.io_exits;
-   exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
-   string = (exit_qualification & 16) != 0;

-   if (string) {
-   if (emulate_instruction(vcpu, 0, 0, 0) == EMULATE_DO_MMIO)
-   return 0;
-   return 1;
-   }
-
-   size = (exit_qualification & 7) + 1;
-   in = (exit_qualification & 8) != 0;
-   port = exit_qualification >> 16;
-
-   skip_emulated_instruction(vcpu);
-   return kvm_emulate_pio(vcpu, in, size, port);
+   return !(emulate_instruction(vcpu, 0, 0, 0) == EMULATE_DO_MMIO);
  }
   


Ditto.
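
For reference, the fast path being asked for depends only on the exit
information that the removed handle_io() code already decoded. A
standalone sketch of that decode, using the same bit layout as the
removed lines; the struct and helper names (io_exit, decode_io_exit,
needs_emulator) are illustrative and not part of KVM:

```c
#include <stdbool.h>
#include <stdint.h>

struct io_exit {
	unsigned int size; /* access width in bytes */
	bool in;           /* true for in/ins, false for out/outs */
	bool string;       /* true for ins/outs */
	uint16_t port;
};

/* Decode a VMX I/O exit qualification with the same masks and shifts
 * as the removed handle_io() code. */
static struct io_exit decode_io_exit(uint64_t exit_qualification)
{
	struct io_exit io;

	io.size = (unsigned int)(exit_qualification & 7) + 1;
	io.in = (exit_qualification & 8) != 0;
	io.string = (exit_qualification & 16) != 0;
	io.port = (uint16_t)(exit_qualification >> 16);
	return io;
}

/* Hypothetical policy helper: only string I/O goes to the emulator,
 * plain in/out can be handled directly. */
static bool needs_emulator(const struct io_exit *io)
{
	return io->string;
}
```

A one-byte "in" from port 0x3f8 decodes as non-string and would stay on
the direct path; an "insw" from port 0x1f0 would go to the emulator.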


--
error compiling committee.c: too many arguments to function



Re: KVM usability

2010-03-09 Thread Anthony Liguori

On 03/09/2010 07:32 AM, Avi Kivity wrote:

On 03/02/2010 04:36 AM, Anthony Liguori wrote:
I keep a patch in the SUSE version for quite some time now that 
bumps the default to 384 for qemu-kvm. That was the first round 
number where an openSUSE installation worked.


If someone works up a patch and tests at least a couple types of 
guests to confirm that they all install with that number, I'd be 
happy to apply it (although we need some trickery to support older pc 
versions).


We should avoid changing defaults.


I disagree.  IMHO, the defaults should represent our best suggestion for 
any given release.  The compatibility machines make it very easy for a 
user to fix on a particular version of a machine type.


Regards,

Anthony Liguori



Re: KVM usability

2010-03-09 Thread Anthony Liguori

On 03/09/2010 08:38 AM, Alexander Graf wrote:

On 09.03.2010, at 15:32, Dustin Kirkland wrote:

   

On Tue, 2010-03-09 at 15:32 +0200, Avi Kivity wrote:
 

On 03/02/2010 04:36 AM, Anthony Liguori wrote:
   

I keep a patch in the SUSE version for quite some time now that bumps
the default to 384 for qemu-kvm. That was the first round number
where an openSUSE installation worked.
   

If someone works up a patch and tests at least a couple types of
guests to confirm that they all install with that number, I'd be happy
to apply it (although we need some trickery to support older pc
versions).
 

We should avoid changing defaults.  I don't think in this case it
matters, since everyone specifies -m anyway, but as a general rule
changing defaults = breakage for the unwary.  At least make the default
part of the machine type to preserve compatibility.
   

In that case, Alex, where can I find your +384M patch, because I'd like
to carry the same one in Ubuntu...
 

It's all in the openSUSE build service. The direct access URL (login required 
FWIW) is here:

https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization

It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the kvm 
package.
   


We should attempt to do three things with default ram size:

1) bump it up to a more reasonable number
2) make it specified in the global default config
3) make sure we can provide compatibility support for older machine types


Alex
   




Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

Currently when string instruction is only partially complete we go back
to a guest mode, guest tries to reexecute instruction and exits again
and at this point emulation continues. Avoid all of this by restarting
instruction without going back to a guest mode.
   


What happens if rcx is really big?  Going back into the guest gave us a 
preemption point.
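
One possible way to keep a preemption point while still restarting
in-kernel is to bound the number of iterations per call. This is only a
sketch of the idea: RESTART_BUDGET and emulate_rep_bounded are made-up
names, and the patch under review does not do this.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESTART_BUDGET 1024 /* hypothetical cap, not from the patch */

/* Run at most RESTART_BUDGET iterations of a rep-prefixed instruction.
 * Returns true when the instruction completed, false when the caller
 * should yield (for example by going back to the guest) and call
 * again later. */
static bool emulate_rep_bounded(uint64_t *rcx, uint64_t *iterations_done)
{
	unsigned int budget = RESTART_BUDGET;

	while (*rcx != 0) {
		if (budget-- == 0)
			return false; /* preemption point */
		(*rcx)--;             /* one string iteration */
		(*iterations_done)++;
	}
	return true;
}
```

With rcx = 3000 this returns false twice (after 1024 and 2048 iterations)
and true on the third call, so a huge count cannot monopolize the CPU.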


--
error compiling committee.c: too many arguments to function



Re: KVM usability

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:50 PM, Anthony Liguori wrote:
It's all in the openSUSE build service. The direct access URL (login 
required FWIW) is here:


https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization 



It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the 
kvm package.


We should attempt to do three things with default ram size:

1) bump it up to a more reasonable number
2) make it specified in the global default config
3) make sure we can provide compatibility support for older machine types


It's really sad, the amount of code needed to change a number.

--
error compiling committee.c: too many arguments to function



Re: KVM usability

2010-03-09 Thread Alexander Graf

On 09.03.2010, at 15:32, Dustin Kirkland wrote:

 On Tue, 2010-03-09 at 15:32 +0200, Avi Kivity wrote:
 On 03/02/2010 04:36 AM, Anthony Liguori wrote:
 I keep a patch in the SUSE version for quite some time now that bumps 
 the default to 384 for qemu-kvm. That was the first round number 
 where an openSUSE installation worked.
 
 If someone works up a patch and tests at least a couple types of 
 guests to confirm that they all install with that number, I'd be happy 
 to apply it (although we need some trickery to support older pc 
 versions).
 
 We should avoid changing defaults.  I don't think in this case it 
 matters, since everyone specifies -m anyway, but as a general rule 
 changing defaults = breakage for the unwary.  At least make the default 
 part of the machine type to preserve compatibility.
 
 In that case, Alex, where can I find your +384M patch, because I'd like
 to carry the same one in Ubuntu...

It's all in the openSUSE build service. The direct access URL (login required 
FWIW) is here:

https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization

It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the kvm 
package.


Alex


Re: KVM usability

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:49 PM, Anthony Liguori wrote:

On 03/09/2010 07:32 AM, Avi Kivity wrote:

On 03/02/2010 04:36 AM, Anthony Liguori wrote:
I keep a patch in the SUSE version for quite some time now that 
bumps the default to 384 for qemu-kvm. That was the first round 
number where an openSUSE installation worked.


If someone works up a patch and tests at least a couple types of 
guests to confirm that they all install with that number, I'd be 
happy to apply it (although we need some trickery to support older 
pc versions).


We should avoid changing defaults.


I disagree.  IMHO, the defaults should represent our best suggestion 
for any given release.  The compatibility machines make it very easy 
for a user to fix on a particular version of a machine type.




Agreed, should have said, avoid changing defaults without taking care of 
backwards compatibility.


--
error compiling committee.c: too many arguments to function



Re: KVM usability

2010-03-09 Thread Anthony Liguori

On 03/09/2010 08:52 AM, Avi Kivity wrote:

On 03/09/2010 04:50 PM, Anthony Liguori wrote:
It's all in the openSUSE build service. The direct access URL (login 
required FWIW) is here:


https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization 



It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the 
kvm package.


We should attempt to do three things with default ram size:

1) bump it up to a more reasonable number
2) make it specified in the global default config
3) make sure we can provide compatibility support for older machine 
types


It's really sad, the amount of code needed to change a number.


We don't do enough via a config.  If we did, we could just have a 0.12 
config version that got frozen over time.


So really, if we can make the mem readable by global config, and we can 
have machine specific configs, it would simplify the problem in the 
future so that we just had to bump a number.
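
A rough sketch of what that could look like, assuming a per-machine
table with a global fallback. The table contents, the "pc-0.12" entry
pinning 128, and the names machine_type and default_ram_for are
illustrative only; this is not how vl.c is structured today.

```c
#include <stddef.h>
#include <string.h>

struct machine_type {
	const char *name;
	unsigned int default_ram_mb; /* 0 means "use the global default" */
};

/* Hypothetical table: an older machine type pins the old value so
 * existing setups keep their behaviour. */
static const struct machine_type machines[] = {
	{ "pc-0.12", 128 },
	{ "pc",      0   },
};

/* In a real setup this would be read from the global default config. */
static unsigned int global_default_ram_mb = 384;

static unsigned int default_ram_for(const char *machine)
{
	size_t i;

	for (i = 0; i < sizeof(machines) / sizeof(machines[0]); i++) {
		if (strcmp(machines[i].name, machine) == 0 &&
		    machines[i].default_ram_mb != 0)
			return machines[i].default_ram_mb;
	}
	return global_default_ram_mb;
}
```

Bumping the default then means changing one number in the config, while
the frozen machine types keep returning their pinned value.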


Regards,

Anthony Liguori



Re: qemu-kvm upstream segfaults when using -smp 1

2010-03-09 Thread Lucas Meneghel Rodrigues
On Thu, 2010-03-04 at 19:30 +0100, Jan Kiszka wrote:
 Lucas Meneghel Rodrigues wrote:
  Hi folks:
  
  Today's upstream qemu-kvm.git is crashing when attempting to use -smp 1:
  
  03/04 12:56:12 DEBUG|kvm_vm:0461| Running qemu command:
  /usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor 
  unix:/tmp/monitor-20100304-125508-G6lf,server,nowait -drive 
  file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net 
  nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 
  -smp 1 -drive 
  file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
   -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp 
  /usr/local/autotest/tests/kvm/images/tftpboot  -boot d -bootp /pxelinux.0 
  -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
  03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) kvm_create_vcpu: Bad file 
  descriptor
  03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) /bin/sh: line 1: 17273 
  Segmentation fault  (core dumped) /usr/local/autotest/tests/kvm/qemu 
  -name 'vm1' -monitor unix:/tmp/monitor-20100304-125508-G6lf,server,nowait 
  -drive file=/tmp/kvm_autotest_root/images/rhel5-64.qcow2,if=ide -net 
  nic,vlan=0,model=rtl8139,macaddr=52:54:00:12:36:60 -net user,vlan=0 -m 1024 
  -smp 1 -drive 
  file=/tmp/kvm_autotest_root/isos/linux/RHEL-5.4-x86_64-DVD.iso,index=2,media=cdrom
   -fda /usr/local/autotest/tests/kvm/images/floppy.img -tftp 
  /usr/local/autotest/tests/kvm/images/tftpboot -boot d -bootp /pxelinux.0 
  -boot n -mem-path /mnt/kvm_hugepage -redir tcp:5000::22 -vnc :0
  03/04 12:56:13 DEBUG|kvm_subpro:0686| (qemu) (Process terminated with 
  status 139)
  
  I have opened a bug about it on KVM's bug tracking system on sourceforge. 
  Relevant software versions involved:
  
  Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 
  7811d4e8ec057d25db68f900be1f09a142faca49 (tag kvm-88-3686-g7811d4e)
  Kernel: 2.6.31.12-174.2.22.fc12.x86_64
  
  Please let me know if you need more information about it. 
  
 
 Should be fixed by this:
 
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/47883

Ok, seems like the fix was already applied and today's upstream job
didn't present any problems (100% PASS across the board for qemu-kvm and
qemu :))

Thanks,

Lucas





Re: KVM usability

2010-03-09 Thread Avi Kivity

On 03/02/2010 04:36 AM, Anthony Liguori wrote:
I keep a patch in the SUSE version for quite some time now that bumps 
the default to 384 for qemu-kvm. That was the first round number 
where an openSUSE installation worked.


If someone works up a patch and tests at least a couple types of 
guests to confirm that they all install with that number, I'd be happy 
to apply it (although we need some trickery to support older pc 
versions).


We should avoid changing defaults.  I don't think in this case it 
matters, since everyone specifies -m anyway, but as a general rule 
changing defaults = breakage for the unwary.  At least make the default 
part of the machine type to preserve compatibility.
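To illustrate the last point, here is a minimal sketch of a default that travels with the machine type, so that an older machine type keeps its old default while a newer one gets the bumped value. The structure, the machine names "pc-old" and "pc-new", and the 128 MB figure are assumptions made up for this example and are not QEMU's actual definitions; only the 384 MB figure comes from the discussion above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical machine description; not QEMU's real structure. */
struct machine_type {
    const char *name;
    unsigned default_ram_mb;   /* used only when -m is not given */
};

/* Illustrative table: the old type keeps its old default forever. */
static const struct machine_type machine_types[] = {
    { "pc-old", 128 },
    { "pc-new", 384 },
};

/* Return the default RAM size for a machine type, or 0 if unknown. */
static unsigned machine_default_ram_mb(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof machine_types / sizeof machine_types[0]; i++)
        if (strcmp(machine_types[i].name, name) == 0)
            return machine_types[i].default_ram_mb;
    return 0;
}
```

A guest started with the older machine type would then see the same memory size as before, regardless of what the newest default is.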


--
error compiling committee.c: too many arguments to function



Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Avi Kivity

On 03/08/2010 07:57 PM, Cam Macdonell wrote:



Can you provide a spec that describes the device?  This would be useful for
maintaining the code, writing guest drivers, and as a framework for review.
 

I'm not sure if you want the Qemu command-line part as part of the
spec here, but I've included it for completeness.
   


I meant something from the guest's point of view, so command line syntax 
is less important.  It should be equally applicable to a real PCI card 
that works with the same driver.


See http://ozlabs.org/~rusty/virtio-spec/ for an example.


The Inter-VM Shared Memory PCI device
---

BARs

The device supports two BARs.  BAR0 is a 256-byte MMIO region to
support registers
   


(but might be extended in the future)


and BAR1 is used to map the shared memory object from the host.  The size of
BAR1 is specified on the command-line and must be a power of 2 in size.

Registers

BAR0 currently supports 5 registers of 16-bits each.


Suggest making registers 32-bits, friendlier towards non-x86.


  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory server).
   


How does the driver detect whether interrupts are supported or not?


When using interrupts, VMs communicate with a shared memory server that passes
the shared memory object file descriptor using SCM_RIGHTS.  The server assigns
each VM an ID number and sends this ID number to the Qemu process along with a
series of eventfd file descriptors, one per guest using the shared memory
server.  These eventfds will be used to send interrupts between guests.  Each
guest listens on the eventfd corresponding to their ID and may use the others
for sending interrupts to other guests.

enum ivshmem_registers {
 IntrMask = 0,
 IntrStatus = 2,
 Doorbell = 4,
 IVPosition = 6,
 IVLiveList = 8
};

The first two registers are the interrupt mask and status registers.
Interrupts are triggered when a message is received on the guest's eventfd from
another VM.  Writing to the 'Doorbell' register is how synchronization messages
are sent to other VMs.

The IVPosition register is read-only and reports the guest's ID number.  The
IVLiveList register is also read-only and reports a bit vector of currently
live VM IDs.
   


That limits the number of guests to 16.


The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
value which will be written to the destination VM and what the guest status
register will be set to when the interrupt is triggered in the destination guest.
   


What happens when two interrupts are sent back-to-back to the same 
guest?  Will the first status value be lost?


Also, reading the status register requires a vmexit.  I suggest dropping 
it and requiring the application to manage this information in the 
shared memory area (where it could do proper queueing of multiple messages).



A value of 255 in the upper 8-bits will trigger a broadcast where the message
will be sent to all other guests.
   


Please consider adding:

- MSI support
- interrupt on a guest attaching/detaching to the shared memory device

With MSI you could also have the doorbell specify both guest ID and 
vector number, which may be useful.


Thanks for this - it definitely makes reviewing easier.
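For reference, the doorbell encoding described in the draft above could be sketched like this. It assumes the 16-bit register layout exactly as quoted (upper 8 bits destination VM ID, lower 8 bits value, 255 meaning broadcast); the helper names and the IVSHMEM_BROADCAST_ID macro are made up for the example and do not come from the patch.

```c
#include <stdint.h>

/* Register offsets within BAR0, as listed in the draft spec. */
enum ivshmem_registers {
    IntrMask   = 0,
    IntrStatus = 2,
    Doorbell   = 4,
    IVPosition = 6,
    IVLiveList = 8
};

/* Hypothetical name for the broadcast destination (255). */
#define IVSHMEM_BROADCAST_ID 0xff

/* Build the 16-bit value a guest would write to the Doorbell register. */
static inline uint16_t ivshmem_doorbell(uint8_t dest_id, uint8_t value)
{
    return (uint16_t)(((uint16_t)dest_id << 8) | value);
}

/* Split a doorbell value back into its two 8-bit halves. */
static inline uint8_t ivshmem_doorbell_dest(uint16_t db)
{
    return (uint8_t)(db >> 8);
}

static inline uint8_t ivshmem_doorbell_value(uint16_t db)
{
    return (uint8_t)(db & 0xff);
}
```

With MSI, the low byte could carry a vector number instead of a status value, which is the variation suggested above.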

--
error compiling committee.c: too many arguments to function



Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface

2010-03-09 Thread Avi Kivity

On 03/08/2010 08:03 PM, Alexander Graf wrote:

MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.

So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.

The only user of it so far is MOL.

Signed-off-by: Alexander Graf <ag...@suse.de>

---

v1 -  v2:

   - Add documentation for OSI exit struct
---
  Documentation/kvm/api.txt |   13 +
  arch/powerpc/include/asm/kvm_book3s.h |5 +
  arch/powerpc/include/asm/kvm_host.h   |2 ++
  arch/powerpc/kvm/book3s.c |   24 ++--
  arch/powerpc/kvm/powerpc.c|   12 
  include/linux/kvm.h   |6 ++
  6 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 6a19ab6..b2129e8 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -932,6 +932,19 @@ s390 specific.

  powerpc specific.

+   /* KVM_EXIT_OSI */
+   struct {
+   __u64 gprs[32];
+   } osi;
+
+MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
+hypercalls and exit with this exit struct that contains all the guest gprs.
+
+If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
+Userspace can now handle the hypercall and when it's done modify the gprs as
+necessary. Upon guest entry all guest GPRs will then be replaced by the values
+in this struct.
+
   


That's not migration safe.  There may not be a next guest entry on this host.

Is using KVM_[GS]ET_REGS problematic for some reason?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 13/15] KVM: Add support for enabling capabilities per-vcpu

2010-03-09 Thread Avi Kivity

On 03/08/2010 08:03 PM, Alexander Graf wrote:

Sometimes we don't want all capabilities to be available to all
our vcpus. One example for that is the OSI interface, implemented
in the next patch.

In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.


diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index d170cb4..6a19ab6 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -749,6 +749,21 @@ Writes debug registers into the vcpu.
  See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
  yet and must be cleared on entry.

+4.34 KVM_ENABLE_CAP
+
+Capability: basic
   


Capability: basic means that the feature was present in 2.6.22.  
Otherwise you need to specify the KVM_CAP_ that presents this feature.



+Architectures: all

   


But it's implemented for ppc only (other arches will get ENOTTY).


+Not all extensions are enabled by default. Using this ioctl the application
+can enable an extension, making it available to the guest.
+
+On systems that do not support this ioctl, it always fails. On systems that
+do support it, it only works for extensions that are supported for enablement.
+As of this writing, the only extension that can be enabled this way is KVM_CAP_PPC_OSI.
   


That needs to be documented.  It also needs to be discoverable 
separately - we can have a kernel with KVM_ENABLE_CAP but without 
KVM_CAP_PPC_OSI.


btw, KVM_CAP_PPC_OSI conflicts with the KVM_CAP_ namespace.  Please 
choose another namespace.


Need to document the structure fields.



  /*
@@ -696,6 +705,8 @@ struct kvm_clock_data {
  /* Available with KVM_CAP_DEBUGREGS */
  #define KVM_GET_DEBUGREGS _IOR(KVMIO,  0xa1, struct kvm_debugregs)
  #define KVM_SET_DEBUGREGS _IOW(KVMIO,  0xa2, struct kvm_debugregs)
+/* No need for CAP, because then it just always fails */
+#define KVM_ENABLE_CAP    _IOW(KVMIO,  0xa3, struct kvm_enable_cap)
   

The CAPs are needed so you can discover what you have without running guests.



--
error compiling committee.c: too many arguments to function



Re: [PATCH 07/24] KVM: x86 emulator: fix 0f 01 /5 emulation

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

It is undefined and should generate #UD.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
  arch/x86/kvm/emulate.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2df510b..1a32b78 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,6 +2486,9 @@ twobyte_insn:
 				    (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
+		case 5: /* not defined */
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
 		case 7: /* invlpg*/
 			emulate_invlpg(ctxt->vcpu, memop);
 			/* Disable writeback. */
   


Why is this needed?  We can only get here if the guest tricks us 
(otherwise the #UD would go back to the guest, or rather, we'd trap it 
to see if it's a hypercall instruction, but not pass it on to the emulator).


--
error compiling committee.c: too many arguments to function



Re: [PATCH bugfix] KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails

2010-03-09 Thread Avi Kivity

On 03/09/2010 07:55 AM, Takuya Yoshikawa wrote:

svm_create_vcpu() does not free the pages allocated during the creation
when it fails to complete the allocations. This patch fixes it.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] KVM call agenda for Mar 9

2010-03-09 Thread Anthony Liguori

On 03/09/2010 07:44 AM, Luiz Capitulino wrote:

On Mon, 8 Mar 2010 22:29:55 -0800
Chris Wright <chr...@redhat.com> wrote:

   

- virtio-9p passthrough filesystem support
- modular command line helpers

Please send in any additional agenda items you are interested in covering.
 

- Summer of code 2010 (do we want to join?)
- Status of Anthony's patch queue
   


I was more behind than I thought I was.  I will be catching up today.  
Sorry for the inconvenience.


Regards,

Anthony Liguori




Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-09 Thread Anthony Liguori

On 03/08/2010 02:33 PM, Marcelo Tosatti wrote:

On Fri, Mar 05, 2010 at 09:37:26PM -0500, Kevin O'Connor wrote:
   

On Thu, Mar 04, 2010 at 03:35:52PM -0300, Marcelo Tosatti wrote:
 

On Thu, Mar 04, 2010 at 12:58:58AM -0500, Kevin O'Connor wrote:
   

On Thu, Mar 04, 2010 at 01:21:12AM -0300, Marcelo Tosatti wrote:
 

The regression seems to be caused by seabios commit d7e998f. Kevin, the
failure can be seen on the attached screenshot, which happens on the
first reboot of WinXP 32 installation (after copying files etc).
   

Sorry - I also noticed a bug in that commit recently.  I pushed the
fix I had in my local tree.
 

Thanks, it does fix the issue here. Anthony can you please update
seabios?
   

Neither commit d7e998f nor the fix 8f469b96 are on the SeaBIOS stable
branch.  Is qemu ready to pull in bigger changes now?
 

Anthony pulls in seabios master into qemu.git master periodically.
   


We should be up to date now FWIW.

Regards,

Anthony Liguori



Re: [PATCH 07/24] KVM: x86 emulator: fix 0f 01 /5 emulation

2010-03-09 Thread Gleb Natapov
On Tue, Mar 09, 2010 at 04:27:39PM +0200, Avi Kivity wrote:
 On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 It is undefined and should generate #UD.
 
 Signed-off-by: Gleb Natapov <g...@redhat.com>
 ---
   arch/x86/kvm/emulate.c |3 +++
   1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 2df510b..1a32b78 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -2486,6 +2486,9 @@ twobyte_insn:
  				    (c->src.val & 0x0f), ctxt->vcpu);
  			c->dst.type = OP_NONE;
  			break;
 +		case 5: /* not defined */
 +			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
 +			goto done;
  		case 7: /* invlpg*/
  			emulate_invlpg(ctxt->vcpu, memop);
  			/* Disable writeback. */
 
 Why is this needed?  We can only get here if the guest tricks us
 (otherwise the #UD would go back to the guest, or rather, we'd trap
 it to see if it's a hypercall instruction, but not pass it on to the
 emulator).
 
For completeness. A lot of the code we added recently is there only because the
guest can trick us into entering the emulator. Unfortunately we have to take such
tricks into account. Without this patch, if the emulator gets here it will report
failed emulation.
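As a rough illustration of the case being discussed: the "/5" in "0f 01 /5" is the reg field of the ModRM byte (bits 5:3), which the emulator switches on. The sketch below only covers the single encoding the patch adds; the function names are made up for the example and are not the emulator's own.

```c
#include <stdint.h>
#include <stdbool.h>

/* Extract the reg field (bits 5:3) of a ModRM byte. */
static inline unsigned modrm_reg(uint8_t modrm)
{
    return (modrm >> 3) & 7;
}

/*
 * Hypothetical helper: true if a "0f 01" instruction with this ModRM
 * byte is the undefined /5 encoding, for which #UD should be queued
 * instead of reporting a failed emulation.
 */
static inline bool grp7_is_undefined(uint8_t modrm)
{
    return modrm_reg(modrm) == 5;
}
```

A guest that gets such bytes into the emulator would then receive #UD, as it would on real hardware, rather than causing an emulation failure report.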

--
Gleb.


Re: [Qemu-devel] KVM call agenda for Mar 9

2010-03-09 Thread Luiz Capitulino
On Mon, 8 Mar 2010 22:29:55 -0800
Chris Wright chr...@redhat.com wrote:

 - virtio-9p passthrough filesystem support
 - modular command line helpers 
 
 Please send in any additional agenda items you are interested in covering.

- Summer of code 2010 (do we want to join?)
- Status of Anthony's patch queue


Re: [PATCH 07/24] KVM: x86 emulator: fix 0f 01 /5 emulation

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:33 PM, Gleb Natapov wrote:

On Tue, Mar 09, 2010 at 04:27:39PM +0200, Avi Kivity wrote:
   

On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 

It is undefined and should generate #UD.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
  arch/x86/kvm/emulate.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2df510b..1a32b78 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2486,6 +2486,9 @@ twobyte_insn:
 				    (c->src.val & 0x0f), ctxt->vcpu);
 			c->dst.type = OP_NONE;
 			break;
+		case 5: /* not defined */
+			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+			goto done;
 		case 7: /* invlpg*/
 			emulate_invlpg(ctxt->vcpu, memop);
 			/* Disable writeback. */
   

Why is this needed?  We can only get here if the guest tricks us
(otherwise the #UD would go back to the guest, or rather, we'd trap
it to see if it's a hypercall instruction, but not pass it on to the
emulator).

 

For completeness. A lot of the code we added recently is there only because the
guest can trick us into entering the emulator. Unfortunately we have to take such
tricks into account. Without this patch, if the emulator gets here it will report
failed emulation.

   


Okay.

--
error compiling committee.c: too many arguments to function



[PATCH 01/24] KVM: Remove pointer to rflags from realmode_set_cr parameters.

2010-03-09 Thread Gleb Natapov
The mov reg, cr instruction doesn't change flags in any meaningful way, so
there is no need to update rflags after instruction execution.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |3 +--
 arch/x86/kvm/emulate.c  |3 +--
 arch/x86/kvm/x86.c  |4 +---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ec891a2..3b178d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -586,8 +586,7 @@ void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
   unsigned long *rflags);
 
 unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value,
-unsigned long *rflags);
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4f6ccab..a91bb42 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2531,8 +2531,7 @@ twobyte_insn:
 	case 0x22: /* mov reg, cr */
 		if (c->modrm_mod != 3)
 			goto cannot_emulate;
-		realmode_set_cr(ctxt->vcpu,
-				c->modrm_reg, c->modrm_val, &ctxt->eflags);
+		realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
 		c->dst.type = OP_NONE;
 		break;
 	case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..d8711fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4054,13 +4054,11 @@ unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr)
 	return value;
 }
 
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val,
-		     unsigned long *rflags)
+void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long val)
 {
 	switch (cr) {
 	case 0:
 		kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
-		*rflags = kvm_get_rflags(vcpu);
 		break;
 	case 2:
 		vcpu->arch.cr2 = val;
-- 
1.6.5



[PATCH 24/24] KVM: small kvm_arch_vcpu_ioctl_run() cleanup.

2010-03-09 Thread Gleb Natapov
Unify all conditions that get us back into the emulator after returning from
userspace.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/x86.c |   32 ++--
 1 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 82379e1..a2c728f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4495,33 +4495,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		kvm_set_cr8(vcpu, kvm_run->cr8);
 
-	if (vcpu->arch.pio.count) {
-		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
-		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-		if (r == EMULATE_DO_MMIO) {
-			r = 0;
-			goto out;
+	if (vcpu->arch.pio.count || vcpu->mmio_needed ||
+	    vcpu->arch.emulate_ctxt.restart) {
+		if (vcpu->mmio_needed) {
+			memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
+			vcpu->mmio_read_completed = 1;
+			vcpu->mmio_needed = 0;
 		}
-	}
-	if (vcpu->mmio_needed) {
-		memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8);
-		vcpu->mmio_read_completed = 1;
-		vcpu->mmio_needed = 0;
-
-		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-		r = emulate_instruction(vcpu, vcpu->arch.mmio_fault_cr2, 0,
-					EMULTYPE_NO_DECODE);
-		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-		if (r == EMULATE_DO_MMIO) {
-			/*
-			 * Read-modify-write.  Back to userspace.
-			 */
-			r = 0;
-			goto out;
-		}
-	}
-	if (vcpu->arch.emulate_ctxt.restart) {
 		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = emulate_instruction(vcpu, 0, 0, EMULTYPE_NO_DECODE);
 		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
-- 
1.6.5



Re: [PATCH] KVM: s390: Fix possible memory leak of in kvm_arch_vcpu_create()

2010-03-09 Thread Avi Kivity
On 03/09/2010 08:37 AM, Wei Yongjun wrote:
 This patch fixed possible memory leak in kvm_arch_vcpu_create()
 under s390, which would happen when kvm_arch_vcpu_create() fails.
   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: s390: Fix possible memory leak of in kvm_arch_vcpu_create()

2010-03-09 Thread Carsten Otte

Wei Yongjun wrote:

This patch fixed possible memory leak in kvm_arch_vcpu_create()
under s390, which would happen when kvm_arch_vcpu_create() fails.

Good catch, thanks!

Acked-by: Carsten Otte co...@de.ibm.com


[PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Cam Macdonell
On Tue, Mar 9, 2010 at 3:29 AM, Avi Kivity a...@redhat.com wrote:
 On 03/08/2010 07:57 PM, Cam Macdonell wrote:

 Can you provide a spec that describes the device?  This would be useful
 for
 maintaining the code, writing guest drivers, and as a framework for
 review.


 I'm not sure if you want the Qemu command-line part as part of the
 spec here, but I've included it for completeness.


 I meant something from the guest's point of view, so command line syntax is
 less important.  It should be equally applicable to a real PCI card that
 works with the same driver.

 See http://ozlabs.org/~rusty/virtio-spec/ for an example.

 The Inter-VM Shared Memory PCI device
 ---

 BARs

 The device supports two BARs.  BAR0 is a 256-byte MMIO region to
 support registers


 (but might be extended in the future)

 and BAR1 is used to map the shared memory object from the host.  The size
 of
 BAR1 is specified on the command-line and must be a power of 2 in size.

 Registers

 BAR0 currently supports 5 registers of 16-bits each.

 Suggest making registers 32-bits, friendlier towards non-x86.

  Registers are used
 for synchronization between guests sharing the same memory object when
 interrupts are supported (this requires using the shared memory server).


 How does the driver detect whether interrupts are supported or not?

At the moment, the VM ID is set to -1 if interrupts aren't supported,
but that may not be the clearest way to do things.  With UIO is there
a way to detect if the interrupt pin is on?


 When using interrupts, VMs communicate with a shared memory server that
 passes
 the shared memory object file descriptor using SCM_RIGHTS.  The server
 assigns
 each VM an ID number and sends this ID number to the Qemu process along
 with a
 series of eventfd file descriptors, one per guest using the shared memory
 server.  These eventfds will be used to send interrupts between guests.
  Each
 guest listens on the eventfd corresponding to their ID and may use the
 others
 for sending interrupts to other guests.

 enum ivshmem_registers {
     IntrMask = 0,
     IntrStatus = 2,
     Doorbell = 4,
     IVPosition = 6,
     IVLiveList = 8
 };

 The first two registers are the interrupt mask and status registers.
 Interrupts are triggered when a message is received on the guest's eventfd
 from
 another VM.  Writing to the 'Doorbell' register is how synchronization
 messages
 are sent to other VMs.

 The IVPosition register is read-only and reports the guest's ID number.
  The
 IVLiveList register is also read-only and reports a bit vector of
 currently
 live VM IDs.


 That limits the number of guests to 16.

True, it could grow to 32 or 64 without difficulty.  We could leave
'liveness' to the user (could be implemented using the shared memory
region) or via the interrupts that arrive on guest attach/detach as
you suggest below.
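To make the limit concrete, here is a small sketch of how a guest might test liveness against the 16-bit IVLiveList value. It assumes that bit N of the register corresponds to VM ID N, which the draft does not actually state, and the helper name and macro are invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Width of the IVLiveList register in the current draft. */
#define IVSHMEM_LIVELIST_BITS 16

/*
 * Assumption: bit N set means the VM with ID N is live.
 * IDs that do not fit in the register cannot be represented at all,
 * which is where the limit of 16 guests comes from.
 */
static inline bool ivshmem_vm_is_live(uint16_t live_list, unsigned vm_id)
{
    if (vm_id >= IVSHMEM_LIVELIST_BITS)
        return false;
    return ((live_list >> vm_id) & 1u) != 0;
}
```

Widening the register to 32 or 64 bits only changes the macro and the integer type; moving liveness into the shared memory region would remove the fixed limit entirely.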


 The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
 upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
 value which will be written to the destination VM and what the guest
 status
 register will be set to when the interrupt is triggered in the destination
 guest.


 What happens when two interrupts are sent back-to-back to the same guest?
  Will the first status value be lost?

Right now, it would be.  I believe that eventfd has a counting
semaphore option, that could prevent loss of status (but limits what
the status could be).  My understanding of uio_pci interrupt handling
is fairly new, but we could have the uio driver store the interrupt
statuses to avoid losing them.
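The counting semaphore option I have in mind is the EFD_SEMAPHORE flag of eventfd(2). A minimal userspace sketch of its behaviour follows; the helper names are made up, and this is not code from the patch. It shows that two back-to-back notifications are both delivered, but also that each read only ever returns 1, so the 8-bit status value itself cannot be carried this way.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Open a non-blocking eventfd in semaphore mode (Linux-specific). */
static int notify_open(void)
{
    return eventfd(0, EFD_SEMAPHORE | EFD_NONBLOCK);
}

/* Add n pending notifications; returns 0 on success, -1 on error. */
static int notify_post(int efd, uint64_t n)
{
    return write(efd, &n, sizeof n) == (ssize_t)sizeof n ? 0 : -1;
}

/*
 * Consume one pending notification.  In semaphore mode a successful
 * read returns 1 and decrements the counter by one.  Returns 0 if
 * nothing is pending or on error.
 */
static uint64_t notify_take(int efd)
{
    uint64_t v = 0;

    if (read(efd, &v, sizeof v) != (ssize_t)sizeof v)
        return 0;
    return v;
}
```

Without EFD_SEMAPHORE, a single read would return the accumulated count and reset it to zero, so the receiver would still learn how many notifications arrived, just not one at a time.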


 Also, reading the status register requires a vmexit.  I suggest dropping it
 and requiring the application to manage this information in the shared
 memory area (where it could do proper queueing of multiple messages).

 A value of 255 in the upper 8-bits will trigger a broadcast where the
 message
 will be sent to all other guests.


 Please consider adding:

 - MSI support

Sure, I'll look into it.

 - interrupt on a guest attaching/detaching to the shared memory device

Sure.


 With MSI you could also have the doorbell specify both guest ID and vector
 number, which may be useful.

 Thanks for this - it definitely makes reviewing easier.


Re: [PATCH 04/24] KVM: Provide current CPL as part of emulator context.

2010-03-09 Thread Gleb Natapov
On Tue, Mar 09, 2010 at 04:24:45PM +0200, Avi Kivity wrote:
 On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 Eliminate the need to call back into KVM to get it from emulator.
 
 
 @@ -3499,6 +3499,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
  	vcpu->arch.emulate_ctxt.vcpu = vcpu;
  	vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
 +	vcpu->arch.emulate_ctxt.cpl = kvm_x86_ops->get_cpl(vcpu);
  	vcpu->arch.emulate_ctxt.mode =
  		(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
  		(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
 
 This is an unconditional VMREAD, which is slow (extra slow if
 nested).  Most common emulator ops do not need the cpl.
 
I will have to make it one of the x86_emulate_ops callbacks then.
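Roughly what I mean, as a standalone sketch: the CPL is fetched through a callback only when an instruction actually needs it. The structures here are simplified stand-ins and not the real x86_emulate_ops or x86_emulate_ctxt, the per-context caching is an extra assumption of this sketch, and fake_get_cpl is a stub that only counts how often the (expensive) read would happen.

```c
#include <stdbool.h>

/* Simplified stand-in for the emulator ops table (hypothetical). */
struct emu_ops {
    int (*cpl)(void *vcpu);
};

/* Simplified stand-in for the emulator context (hypothetical). */
struct emu_ctxt {
    const struct emu_ops *ops;
    void *vcpu;
    bool cpl_valid;   /* has the CPL been fetched for this instruction? */
    int cpl;
};

/* Fetch the CPL on first use only, then reuse the cached value. */
static int emu_cpl(struct emu_ctxt *ctxt)
{
    if (!ctxt->cpl_valid) {
        ctxt->cpl = ctxt->ops->cpl(ctxt->vcpu);
        ctxt->cpl_valid = true;
    }
    return ctxt->cpl;
}

/* Demo stub standing in for the VMREAD-backed implementation. */
static int fake_reads;

static int fake_get_cpl(void *vcpu)
{
    (void)vcpu;
    fake_reads++;
    return 3;
}
```

Instructions that never look at the CPL then cost nothing extra, which addresses the unconditional VMREAD concern.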

--
Gleb.


Re: [PATCH 02/24] KVM: Provide callback to get/set control registers in emulator ops.

2010-03-09 Thread Gleb Natapov
On Tue, Mar 09, 2010 at 04:18:09PM +0200, Avi Kivity wrote:
 On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 Use this callback instead of calling the kvm function directly. Also rename
 realmode_(set|get)_cr to emulator_(set|get)_cr since the function has nothing
 to do with real mode.
 
 
 +ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
 +void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
   };
 
 Note, passing a vcpu means we are still tightly coupled to kvm.  Can
 be fixed later.
 
Yes, that is on my todo.

 +static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
 +{
 +	unsigned long value;
 +
 +	switch (cr) {
 +	case 0:
 +		value = kvm_read_cr0(vcpu);
 +		break;
 +	case 2:
 +		value = vcpu->arch.cr2;
 +		break;
 +	case 3:
 +		value = vcpu->arch.cr3;
 +		break;
 +	case 4:
 +		value = kvm_read_cr4(vcpu);
 +		break;
 +	case 8:
 +		value = kvm_get_cr8(vcpu);
 +		break;
 +	default:
 +		vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
 +		return 0;
 
 This printk is triggerable by guest code (as the patch didn't
 introduce this, it can be fixed later).
 
 The emulator should #UD on unrecognised control registers.
The "inject #UD on access to non-existing CR" patch does this.

--
Gleb.


[PATCH 05/24] KVM: Provide current eip as part of emulator context.

2010-03-09 Thread Gleb Natapov
Eliminate the need to call back into KVM to get it from emulator.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |3 ++-
 arch/x86/kvm/emulate.c |   12 ++--
 arch/x86/kvm/x86.c |1 +
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index d8b2da0..032d02f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -140,7 +140,7 @@ struct decode_cache {
 	u8 seg_override;
 	unsigned int d;
 	unsigned long regs[NR_VCPU_REGS];
-	unsigned long eip, eip_orig;
+	unsigned long eip;
 	/* modrm */
 	u8 modrm;
 	u8 modrm_mod;
@@ -159,6 +159,7 @@ struct x86_emulate_ctxt {
 	struct kvm_vcpu *vcpu;
 
 	unsigned long eflags;
+	unsigned long eip; /* eip before instruction emulation */
 	int cpl;
 	/* Emulated execution mode, represented by an X86EMUL_MODE value. */
 	int mode;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ed29a52..2cc9ef4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -667,7 +667,7 @@ static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
 	int rc;
 
 	/* x86 instructions are limited to 15 bytes. */
-	if (eip + size - ctxt->decode.eip_orig > 15)
+	if (eip + size - ctxt->eip > 15)
 		return X86EMUL_UNHANDLEABLE;
 	eip += ctxt->cs_base;
 	while (size--) {
@@ -927,7 +927,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	/* Shadow copy of register state. Committed on successful emulation. */
 
 	memset(c, 0, sizeof(struct decode_cache));
-	c->eip = c->eip_orig = kvm_rip_read(ctxt->vcpu);
+	c->eip = ctxt->eip;
 	ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 
@@ -1874,7 +1874,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 			}
 		}
 		c->regs[VCPU_REGS_RCX]--;
-		c->eip = kvm_rip_read(ctxt->vcpu);
+		c->eip = ctxt->eip;
 	}
 
 	if (c->src.type == OP_MEM) {
@@ -2443,7 +2443,7 @@ twobyte_insn:
 				goto done;
 
 			/* Let the processor re-execute the fixed hypercall */
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 			/* Disable writeback. */
 			c->dst.type = OP_NONE;
 			break;
@@ -2547,7 +2547,7 @@ twobyte_insn:
 			| ((u64)c->regs[VCPU_REGS_RDX] << 32);
 		if (kvm_set_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 		}
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;
@@ -2556,7 +2556,7 @@ twobyte_insn:
 		/* rdmsr */
 		if (kvm_get_msr(ctxt->vcpu, c->regs[VCPU_REGS_RCX], &msr_data)) {
 			kvm_inject_gp(ctxt->vcpu, 0);
-			c->eip = kvm_rip_read(ctxt->vcpu);
+			c->eip = ctxt->eip;
 		} else {
 			c->regs[VCPU_REGS_RAX] = (u32)msr_data;
 			c->regs[VCPU_REGS_RDX] = msr_data >> 32;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b5fb43..41cf54c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3500,6 +3500,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
vcpu->arch.emulate_ctxt.vcpu = vcpu;
vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
vcpu->arch.emulate_ctxt.cpl = kvm_x86_ops->get_cpl(vcpu);
+   vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
vcpu->arch.emulate_ctxt.mode =
(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
-- 
1.6.5



Re: [PATCH 04/24] KVM: Provide current CPL as part of emulator context.

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

Eliminate the need to call back into KVM to get it from emulator.


@@ -3499,6 +3499,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,

vcpu->arch.emulate_ctxt.vcpu = vcpu;
vcpu->arch.emulate_ctxt.eflags = kvm_x86_ops->get_rflags(vcpu);
+   vcpu->arch.emulate_ctxt.cpl = kvm_x86_ops->get_cpl(vcpu);
vcpu->arch.emulate_ctxt.mode =
(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
   


This is an unconditional VMREAD, which is slow (extra slow if nested).  
Most common emulator ops do not need the cpl.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 14/15] KVM: PPC: Add OSI hypercall interface

2010-03-09 Thread Alexander Graf

On 09.03.2010, at 14:19, Avi Kivity wrote:

 On 03/09/2010 03:12 PM, Alexander Graf wrote:
 On 09.03.2010, at 14:11, Avi Kivity wrote:
 
   
 On 03/09/2010 03:04 PM, Alexander Graf wrote:
 
   
 +/* KVM_EXIT_OSI */
 +struct {
 +__u64 gprs[32];
 +} osi;
 +
 +MOL uses a special hypercall interface it calls 'OSI'. To enable it, we 
 catch
 +hypercalls and exit with this exit struct that contains all the guest 
 gprs.
 +
 +If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a 
 hypercall.
 +Userspace can now handle the hypercall and when it's done modify the 
 gprs as
 +necessary. Upon guest entry all guest GPRs will then be replaced by the 
 values
 +in this struct.
 +
 
 
   
 That's migration unsafe.  There may not be next guest entry on this host.
 
 
 It's as unsafe as MMIO then.
 
 
   
 From api.txt:
 
 
 NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations
 are complete (and guest state is consistent) only after userspace has
 re-entered the kernel with KVM_RUN.  The kernel side will first finish
 incomplete operations and then check for pending signals.  Userspace
 can re-enter the guest with an unmasked signal pending to complete
 pending operations.
   
 
 Alright - so I add KVM_EXIT_OSI there and be good? :)
   
 
 Sure, just verify that the note holds for that case too.

The handling of the hypercall write-back is in the same region as the mmio one. 
So whatever applies for MMIO entries applies for OSI entries too.

Alex


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Avi Kivity

On 03/09/2010 02:49 PM, Arnd Bergmann wrote:

On Monday 08 March 2010, Cam Macdonell wrote:
   

enum ivshmem_registers {
 IntrMask = 0,
 IntrStatus = 2,
 Doorbell = 4,
 IVPosition = 6,
 IVLiveList = 8
};

The first two registers are the interrupt mask and status registers.
Interrupts are triggered when a message is received on the guest's eventfd from
another VM.  Writing to the 'Doorbell' register is how synchronization messages
are sent to other VMs.

The IVPosition register is read-only and reports the guest's ID number.  The
IVLiveList register is also read-only and reports a bit vector of currently
live VM IDs.

The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
value which will be written to the destination VM and what the guest status
register will be set to when the interrupt is triggered in the destination guest.
A value of 255 in the upper 8-bits will trigger a broadcast where the message
will be sent to all other guests.
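
The Doorbell layout described above can be sketched as a pair of pack/unpack helpers. This is only an illustration of the bit layout stated in the text (upper 8 bits destination VM ID, lower 8 bits value, 255 meaning broadcast); the function and macro names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 8 bits select the destination VM; 255 means "all other guests". */
#define IVSHMEM_BROADCAST_ID 255u

/* Pack a destination VM ID and an 8-bit message into a 16-bit Doorbell value. */
static uint16_t doorbell_encode(unsigned dest_vm, unsigned value)
{
    assert(dest_vm <= 0xff && value <= 0xff);
    return (uint16_t)((dest_vm << 8) | value);
}

/* Destination VM ID: the upper 8 bits. */
static uint8_t doorbell_dest(uint16_t doorbell)
{
    return (uint8_t)(doorbell >> 8);
}

/* Message value: the lower 8 bits, which end up in the receiver's status. */
static uint8_t doorbell_value(uint16_t doorbell)
{
    return (uint8_t)(doorbell & 0xff);
}

/* True when the write addresses every other guest. */
static int doorbell_is_broadcast(uint16_t doorbell)
{
    return doorbell_dest(doorbell) == IVSHMEM_BROADCAST_ID;
}
```

A guest would write the encoded value to the Doorbell register at offset 4; how that MMIO write is issued depends on the driver and is not shown here.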
 

This means you have at least two intercepts for each message:

1. Sender writes to doorbell
2. Receiver gets interrupted

With optionally two more intercepts in order to avoid interrupting the
receiver every time:

3. Receiver masks interrupt in order to process data
4. Receiver unmasks interrupt when it's done and status is no longer pending

I believe you can do much better than this, you combine status and mask
bits, making this level triggered, and move to a bitmask of all guests:

In order to send an interrupt to another guest, the sender first checks
the bit for the receiver. If it's '1', no need for any intercept, the
receiver will come back anyway. If it's zero, write a '1' bit, which
gets OR'd into the bitmask by the host. The receiver gets interrupted
at a rising edge and just leaves the bit on, until it's done processing,
then turns the bit off by writing a '1' into its own location in the mask.
   


We could make the masking in RAM, not in registers, like virtio, which 
would require no exits.  It would then be part of the application 
specific protocol and out of scope of this spec.
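
A minimal sketch of what such an application-level protocol might look like, assuming the pending bitmask from Arnd's proposal is kept in the shared memory region itself rather than in a device register. The structure layout and function names are hypothetical; only the "ring the doorbell when the bit goes from 0 to 1" idea comes from the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one pending bit per guest, kept in the shared region. */
struct shm_notify {
    _Atomic uint32_t pending;   /* bit N set => guest N has work queued */
};

/*
 * Sender side: mark the receiver as pending. Returns true only when the
 * bit went 0 -> 1, i.e. when the caller actually has to ring the doorbell.
 * If the bit was already set, the receiver is still processing and will
 * see the new data without another exit.
 */
static bool notify_needs_doorbell(struct shm_notify *n, unsigned receiver)
{
    assert(receiver < 32);
    uint32_t bit = UINT32_C(1) << receiver;
    uint32_t old = atomic_fetch_or(&n->pending, bit);
    return (old & bit) == 0;
}

/* Receiver side: clear our own bit once processing is complete. */
static void notify_done(struct shm_notify *n, unsigned self)
{
    assert(self < 32);
    atomic_fetch_and(&n->pending, ~(UINT32_C(1) << self));
}
```

In a scheme like this the receiver would presumably need to re-check the shared data after calling notify_done(), since a sender may have queued work just before the bit was cleared.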


--
error compiling committee.c: too many arguments to function



[PATCH 02/24] KVM: Provide callback to get/set control registers in emulator ops.

2010-03-09 Thread Gleb Natapov
Use this callback instead of calling the kvm function directly. Also rename
realmode_(set|get)_cr to emulator_(set|get)_cr, since the function has nothing
to do with real mode.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |3 +-
 arch/x86/include/asm/kvm_host.h|2 -
 arch/x86/kvm/emulate.c |7 +-
 arch/x86/kvm/x86.c |  114 ++--
 4 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 2666d7a..0c5caa4 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -108,7 +108,8 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct kvm_vcpu *vcpu);
-
+   ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
+   void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3b178d8..e8e108a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -585,8 +585,6 @@ void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, 
unsigned long address);
 void realmode_lmsw(struct kvm_vcpu *vcpu, unsigned long msw,
   unsigned long *rflags);
 
-unsigned long realmode_get_cr(struct kvm_vcpu *vcpu, int cr);
-void realmode_set_cr(struct kvm_vcpu *vcpu, int cr, unsigned long value);
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a91bb42..d515795 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2480,7 +2480,7 @@ twobyte_insn:
break;
case 4: /* smsw */
c->dst.bytes = 2;
-   c->dst.val = realmode_get_cr(ctxt->vcpu, 0);
+   c->dst.val = ops->get_cr(0, ctxt->vcpu);
break;
case 6: /* lmsw */
realmode_lmsw(ctxt->vcpu, (u16)c->src.val,
@@ -2516,8 +2516,7 @@ twobyte_insn:
case 0x20: /* mov cr, reg */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   c->regs[c->modrm_rm] =
-   realmode_get_cr(ctxt->vcpu, c->modrm_reg);
+   c->regs[c->modrm_rm] = ops->get_cr(c->modrm_reg, ctxt->vcpu);
c->dst.type = OP_NONE;  /* no writeback */
break;
case 0x21: /* mov from dr to reg */
@@ -2531,7 +2530,7 @@ twobyte_insn:
case 0x22: /* mov reg, cr */
if (c->modrm_mod != 3)
goto cannot_emulate;
-   realmode_set_cr(ctxt->vcpu, c->modrm_reg, c->modrm_val);
+   ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d8711fe..7b62ef2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3397,12 +3397,70 @@ void kvm_report_emulation_failure(struct kvm_vcpu 
*vcpu, const char *context)
 }
 EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
 
+static u64 mk_cr_64(u64 curr_cr, u32 new_val)
+{
+   return (curr_cr & ~((1ULL << 32) - 1)) | new_val;
+}
+
+static unsigned long emulator_get_cr(int cr, struct kvm_vcpu *vcpu)
+{
+   unsigned long value;
+
+   switch (cr) {
+   case 0:
+   value = kvm_read_cr0(vcpu);
+   break;
+   case 2:
+   value = vcpu->arch.cr2;
+   break;
+   case 3:
+   value = vcpu->arch.cr3;
+   break;
+   case 4:
+   value = kvm_read_cr4(vcpu);
+   break;
+   case 8:
+   value = kvm_get_cr8(vcpu);
+   break;
+   default:
+   vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+   return 0;
+   }
+
+   return value;
+}
+
+static void emulator_set_cr(int cr, unsigned long val, struct kvm_vcpu *vcpu)
+{
+   switch (cr) {
+   case 0:
+   kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
+   break;
+   case 2:
+   vcpu->arch.cr2 = val;
+   break;
+   case 3:
+   kvm_set_cr3(vcpu, val);
+   break;
+   case 4:
+   kvm_set_cr4(vcpu, mk_cr_64(kvm_read_cr4(vcpu), val));
+   break;
+   case 8:
+   kvm_set_cr8(vcpu, val & 0xfUL);
+   break;
+   default:
+   vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
+   }
+}
+
 static struct x86_emulate_ops emulate_ops = {

Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.

2010-03-09 Thread Gleb Natapov
On Tue, Mar 09, 2010 at 04:43:59PM +0200, Avi Kivity wrote:
 On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 Provide get_cached_descriptor(), set_cached_descriptor(),
 get_segment_selector(), set_segment_selector(), get_gdt(),
 write_std() callbacks.
 
 Signed-off-by: Gleb Natapovg...@redhat.com
 ---
   arch/x86/include/asm/kvm_emulate.h |   16 +
   arch/x86/kvm/x86.c |  130 
  +++
   2 files changed, 131 insertions(+), 15 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_emulate.h 
 b/arch/x86/include/asm/kvm_emulate.h
 index 032d02f..e881618 100644
 --- a/arch/x86/include/asm/kvm_emulate.h
 +++ b/arch/x86/include/asm/kvm_emulate.h
 @@ -63,6 +63,15 @@ struct x86_emulate_ops {
  unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
  /*
 + * write_std: Write bytes of standard (non-emulated/special) memory.
 + *Used for descriptor writing.
 + *  @addr:  [IN ] Linear address to which to write.
 + *  @val:   [OUT] Value write to memory, zero-extended to 'u_long'.
 + *  @bytes: [IN ] Number of bytes to write to memory.
 + */
 +int (*write_std)(unsigned long addr, void *val,
 + unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
 Descriptor writes need an atomic kvm_set_guest_bit(), no?
 
It is? atomic against what? Current code just write whole descriptor
using write_std().

--
Gleb.


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Cam Macdonell
On Tue, Mar 9, 2010 at 6:03 AM, Avi Kivity a...@redhat.com wrote:
 On 03/09/2010 02:49 PM, Arnd Bergmann wrote:

 On Monday 08 March 2010, Cam Macdonell wrote:


 enum ivshmem_registers {
     IntrMask = 0,
     IntrStatus = 2,
     Doorbell = 4,
     IVPosition = 6,
     IVLiveList = 8
 };

 The first two registers are the interrupt mask and status registers.
 Interrupts are triggered when a message is received on the guest's
 eventfd from
 another VM.  Writing to the 'Doorbell' register is how synchronization
 messages
 are sent to other VMs.

 The IVPosition register is read-only and reports the guest's ID number.
  The
 IVLiveList register is also read-only and reports a bit vector of
 currently
 live VM IDs.

 The Doorbell register is 16-bits, but is treated as two 8-bit values.
  The
 upper 8-bits are used for the destination VM ID.  The lower 8-bits are
 the
 value which will be written to the destination VM and what the guest
 status
 register will be set to when the interrupt is triggered in the destination
 guest.
 A value of 255 in the upper 8-bits will trigger a broadcast where the
 message
 will be sent to all other guests.


 This means you have at least two intercepts for each message:

 1. Sender writes to doorbell
 2. Receiver gets interrupted

 With optionally two more intercepts in order to avoid interrupting the
 receiver every time:

 3. Receiver masks interrupt in order to process data
 4. Receiver unmasks interrupt when it's done and status is no longer
 pending

 I believe you can do much better than this, you combine status and mask
 bits, making this level triggered, and move to a bitmask of all guests:

 In order to send an interrupt to another guest, the sender first checks
 the bit for the receiver. If it's '1', no need for any intercept, the
 receiver will come back anyway. If it's zero, write a '1' bit, which
 gets OR'd into the bitmask by the host. The receiver gets interrupted
 at a rising edge and just leaves the bit on, until it's done processing,
 then turns the bit off by writing a '1' into its own location in the mask.


 We could make the masking in RAM, not in registers, like virtio, which would
 require no exits.  It would then be part of the application specific
 protocol and out of scope of this spec.


This kind of implementation would be possible now since with UIO it's
up to the application whether to mask interrupts or not and what
interrupts mean.  We could leave the interrupt mask register for those
who want that behaviour.  Arnd's idea would remove the need for the
Doorbell and Mask, but we will always need at least one MMIO register
to send whatever interrupts we do send.

Cam


Re: KVM usability

2010-03-09 Thread Avi Kivity

On 03/09/2010 04:57 PM, Anthony Liguori wrote:

On 03/09/2010 08:52 AM, Avi Kivity wrote:

On 03/09/2010 04:50 PM, Anthony Liguori wrote:
It's all in the openSUSE build service. The direct access URL 
(login required FWIW) is here:


https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization



It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the 
kvm package.


We should attempt to do three things with default ram size:

1) bump it up to a more reasonable number
2) make it specified in the global default config
3) make sure we can provide compatibility support for older machine 
types


It's really sad, the amount of code needed to change a number.


We don't do enough via a config.  If we did, we could just have a 0.12 
config version that got frozen over time.


So really, if we can make the mem readable by global config, and we 
can have machine specific configs, it would simplify the problem in 
the future so that we just had to bump a number.


Perhaps a json representation of things.  We already have the parser.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.

2010-03-09 Thread Avi Kivity

On 03/09/2010 06:25 PM, Gleb Natapov wrote:

On Tue, Mar 09, 2010 at 04:43:59PM +0200, Avi Kivity wrote:
   

On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 

Provide get_cached_descriptor(), set_cached_descriptor(),
get_segment_selector(), set_segment_selector(), get_gdt(),
write_std() callbacks.

Signed-off-by: Gleb Natapovg...@redhat.com
---
  arch/x86/include/asm/kvm_emulate.h |   16 +
  arch/x86/kvm/x86.c |  130 +++
  2 files changed, 131 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 032d02f..e881618 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -63,6 +63,15 @@ struct x86_emulate_ops {
unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);

/*
+* write_std: Write bytes of standard (non-emulated/special) memory.
+*Used for descriptor writing.
+*  @addr:  [IN ] Linear address to which to write.
+*  @val:   [OUT] Value write to memory, zero-extended to 'u_long'.
+*  @bytes: [IN ] Number of bytes to write to memory.
+*/
+   int (*write_std)(unsigned long addr, void *val,
+unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
   

Descriptor writes need an atomic kvm_set_guest_bit(), no?

 

It is? atomic against what? Current code just write whole descriptor
using write_std().
   


These are accessed bit changes, and are done atomically in the same way 
as a page table walk sets the accessed and dirty bit.  Presumably the 
atomic operation is to allow the kernel to scan segments and swap them 
out if they are not used.
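
To illustrate the point about atomicity, here is a small user-space sketch of setting a descriptor's accessed bit with a compare-and-exchange loop instead of rewriting the whole descriptor. It uses C11 atomics rather than any KVM helper; the function name is hypothetical, and the only hardware fact assumed is that the accessed flag is bit 0 of the type field, which is bit 40 of the 8-byte descriptor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Accessed flag: bit 0 of the type field, i.e. bit 40 of the descriptor. */
#define DESC_ACCESSED (UINT64_C(1) << 40)

/*
 * Set the accessed bit with a compare-and-exchange loop so that a
 * concurrent update to any other bit of the descriptor is not overwritten.
 * Returns true if this call changed the descriptor.
 */
static bool desc_set_accessed(_Atomic uint64_t *desc)
{
    assert(desc);
    uint64_t old = atomic_load(desc);

    while (!(old & DESC_ACCESSED)) {
        /* On failure 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(desc, &old, old | DESC_ACCESSED))
            return true;
    }
    return false;   /* already marked accessed; nothing written */
}
```

A plain read-modify-write of the full 8 bytes would, by contrast, silently undo any change another CPU made to the descriptor between the read and the write.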


--
error compiling committee.c: too many arguments to function



Re: KVM usability

2010-03-09 Thread Anthony Liguori

On 03/09/2010 11:11 AM, Avi Kivity wrote:

On 03/09/2010 04:57 PM, Anthony Liguori wrote:

On 03/09/2010 08:52 AM, Avi Kivity wrote:

On 03/09/2010 04:50 PM, Anthony Liguori wrote:
It's all in the openSUSE build service. The direct access URL 
(login required FWIW) is here:


https://build.opensuse.org/package/view_file?file=kvm-qemu-default-memsize.patch&package=kvm&project=Virtualization



It merely changes DEFAULT_RAM_SIZE in vl.c from 128 to 384 for the 
kvm package.


We should attempt to do three things with default ram size:

1) bump it up to a more reasonable number
2) make it specified in the global default config
3) make sure we can provide compatibility support for older machine 
types


It's really sad, the amount of code needed to change a number.


We don't do enough via a config.  If we did, we could just have a 
0.12 config version that got frozen over time.


So really, if we can make the mem readable by global config, and we 
can have machine specific configs, it would simplify the problem in 
the future so that we just had to bump a number.


Perhaps a json representation of things.  We already have the parser.


Please no :-)

We have a config format, QemuOpts ties nicely into it as does qdev.  We 
just need to represent machine information via QemuOpts and tie -m to 
manipulating the memory assigned to a machine.  IOW, instead of:


(machine_init)(ram_addr_t ram_size,
const char *boot_device,
const char *kernel_filename,
const char *kernel_cmdline,
const char *initrd_filename,
const char *cpu_model)

It should be:

(machine_init)(QemuOpts *opts);

Then we can have a [machine] section in the config where we describe all 
of these things.
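
The shape of that change can be sketched outside QEMU with a toy option list. Everything here is a stand-in: struct opts and opts_get() are hypothetical and are not QEMU's QemuOpts API, and the option names "ram", "cpu" and "kernel" are assumptions. The point is only that the init function takes one options object and that defaults (such as the 128 MB mentioned earlier in the thread) live in one place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an option group; this is not QEMU's QemuOpts. */
struct opt { const char *name; const char *value; };
struct opts { const struct opt *items; size_t count; };

/* Look up an option by name, falling back to a default when it is absent. */
static const char *opts_get(const struct opts *o, const char *name,
                            const char *def)
{
    for (size_t i = 0; i < o->count; i++)
        if (strcmp(o->items[i].name, name) == 0)
            return o->items[i].value;
    return def;
}

struct machine {
    unsigned long ram_mb;
    const char *cpu_model;
    const char *kernel;
};

/* One options argument replaces the long positional parameter list. */
static void machine_init(struct machine *m, const struct opts *o)
{
    assert(m && o);
    m->ram_mb = strtoul(opts_get(o, "ram", "128"), NULL, 10);
    m->cpu_model = opts_get(o, "cpu", "default");
    m->kernel = opts_get(o, "kernel", NULL);
}
```

With this shape, adding a new machine property means adding a key rather than changing every machine_init signature, and a frozen per-version config only has to pin the values it cares about.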


Regards,

Anthony Liguori




Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Avi Kivity

On 03/09/2010 05:27 PM, Cam Macdonell wrote:





  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory server).

   

How does the driver detect whether interrupts are supported or not?
 

At the moment, the VM ID is set to -1 if interrupts aren't supported,
but that may not be the clearest way to do things.  With UIO is there
a way to detect if the interrupt pin is on?
   


I suggest not designing the device to uio.  Make it a good 
guest-independent device, and if uio doesn't fit it, change it.


Why not support interrupts unconditionally?  Is the device useful 
without interrupts?



The Doorbell register is 16-bits, but is treated as two 8-bit values.  The
upper 8-bits are used for the destination VM ID.  The lower 8-bits are the
value which will be written to the destination VM and what the guest
status
register will be set to when the interrupt is triggered in the destination
guest.

   

What happens when two interrupts are sent back-to-back to the same guest?
  Will the first status value be lost?
 

Right now, it would be.  I believe that eventfd has a counting
semaphore option, that could prevent loss of status (but limits what
the status could be).
   


It only counts the number of interrupts (and kvm will coalesce them anyway).
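
The counting behaviour can be seen with a plain eventfd in user space. The sketch below signals an eventfd twice and reads it once; the helper name is hypothetical, while eventfd(), EFD_SEMAPHORE, read() and write() are the standard Linux interfaces. In neither mode does the read carry anything other than a count, so a per-interrupt status value cannot travel through it.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Signal an eventfd twice, then read it once.
 * Default mode: the read returns the accumulated count and resets it.
 * EFD_SEMAPHORE mode: the read returns 1 and decrements the count by one.
 * Returns the value read, or 0 on error.
 */
static uint64_t two_signals_one_read(int flags)
{
    uint64_t one = 1, got = 0;
    int fd = eventfd(0, flags);

    if (fd < 0)
        return 0;
    if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
        write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
        read(fd, &got, sizeof(got)) != (ssize_t)sizeof(got))
        got = 0;
    close(fd);
    return got;
}
```

Calling it with flags 0 should yield 2, and with EFD_SEMAPHORE it should yield 1 (with one more signal still pending).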


My understanding of uio_pci interrupt handling
is fairly new, but we could have the uio driver store the interrupt
statuses to avoid losing them.
   


There's nowhere to store them if we use ioeventfd/irqfd.  I think it's 
both easier and more efficient to leave this to the application (to 
store into shared memory).


--

error compiling committee.c: too many arguments to function



Re: KVM usability

2010-03-09 Thread Avi Kivity

On 03/09/2010 07:27 PM, Anthony Liguori wrote:

Perhaps a json representation of things.  We already have the parser.



Please no :-)

We have a config format, QemuOpts ties nicely into it as does qdev.  
We just need to represent machine information via QemuOpts and tie -m 
to manipulating the memory assigned to a machine.  IOW, instead of:


(machine_init)(ram_addr_t ram_size,
const char *boot_device,
const char *kernel_filename,
const char *kernel_cmdline,
const char *initrd_filename,
const char *cpu_model)

It should be:

(machine_init)(QemuOpts *opts);

Then we can have a [machine] section in the config where we describe 
all of these things.


Looks good.

One day we'll read VHDL descriptions of the device model from the 
machine config file and tcg them to host native code, and qemu will be 
pure infrastructure with zero details.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Anthony Liguori

On 03/09/2010 11:28 AM, Avi Kivity wrote:

On 03/09/2010 05:27 PM, Cam Macdonell wrote:





  Registers are used
for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory 
server).



How does the driver detect whether interrupts are supported or not?

At the moment, the VM ID is set to -1 if interrupts aren't supported,
but that may not be the clearest way to do things.  With UIO is there
a way to detect if the interrupt pin is on?


I suggest not designing the device to uio.  Make it a good 
guest-independent device, and if uio doesn't fit it, change it.


You can always fall back to reading the config space directly.  It's not 
strictly required that you stick to the UIO interface.


Why not support interrupts unconditionally?  Is the device useful 
without interrupts?


You can always just have interrupts enabled and not use them if that's 
desired.


Regards,

Anthony Liguori


Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.

2010-03-09 Thread Gleb Natapov
On Tue, Mar 09, 2010 at 07:22:51PM +0200, Avi Kivity wrote:
 On 03/09/2010 06:25 PM, Gleb Natapov wrote:
 On Tue, Mar 09, 2010 at 04:43:59PM +0200, Avi Kivity wrote:
 On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 Provide get_cached_descriptor(), set_cached_descriptor(),
 get_segment_selector(), set_segment_selector(), get_gdt(),
 write_std() callbacks.
 
 Signed-off-by: Gleb Natapovg...@redhat.com
 ---
   arch/x86/include/asm/kvm_emulate.h |   16 +
   arch/x86/kvm/x86.c |  130 
  +++
   2 files changed, 131 insertions(+), 15 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_emulate.h 
 b/arch/x86/include/asm/kvm_emulate.h
 index 032d02f..e881618 100644
 --- a/arch/x86/include/asm/kvm_emulate.h
 +++ b/arch/x86/include/asm/kvm_emulate.h
 @@ -63,6 +63,15 @@ struct x86_emulate_ops {
unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
/*
 +   * write_std: Write bytes of standard (non-emulated/special) memory.
 +   *Used for descriptor writing.
 +   *  @addr:  [IN ] Linear address to which to write.
 +   *  @val:   [OUT] Value write to memory, zero-extended to 'u_long'.
 +   *  @bytes: [IN ] Number of bytes to write to memory.
 +   */
 +  int (*write_std)(unsigned long addr, void *val,
 +   unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 Descriptor writes need an atomic kvm_set_guest_bit(), no?
 
 It is? atomic against what? Current code just write whole descriptor
 using write_std().
 
 These are accessed bit changes, and are done atomically in the same
 way as a page table walk sets the accessed and dirty bit.
 Presumably the atomic operation is to allow the kernel to scan
 segments and swap them out if they are not used.
 
We can use cmpxchg callback for that, no?

--
Gleb.


Re: [PATCH 19/24] KVM: x86 emulator: fix in/out emulation.

2010-03-09 Thread Gleb Natapov
On Tue, Mar 09, 2010 at 04:47:24PM +0200, Avi Kivity wrote:
 On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 in/out emulation is broken now. The breakage is different depending
 on where IO device resides. If it is in userspace emulator reports
 emulation failure since it incorrectly interprets kvm_emulate_pio()
 return value. If IO device is in the kernel emulation of 'in' will do
 nothing since kvm_emulate_pio() stores result directly into vcpu
 registers, so emulator will overwrite result of emulation during
 commit of shadowed register.
 
 
 index def4877..315e8a8 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1488,29 +1488,9 @@ static int shutdown_interception(struct vcpu_svm *svm)
 
   static int io_interception(struct vcpu_svm *svm)
   {
 -u32 io_info = svm->vmcb->control.exit_info_1; /* address size bug? */
 -int size, in, string;
 -unsigned port;
 -
  ++svm->vcpu.stat.io_exits;
 
 -svm->next_rip = svm->vmcb->control.exit_info_2;
 -
 -string = (io_info & SVM_IOIO_STR_MASK) != 0;
 -
 -if (string) {
 -if (emulate_instruction(&svm->vcpu,
 -0, 0, 0) == EMULATE_DO_MMIO)
 -return 0;
 -return 1;
 -}
 -
 -in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
 -port = io_info >> 16;
 -size = (io_info & SVM_IOIO_SIZE_MASK) >> SVM_IOIO_SIZE_SHIFT;
 -
 -skip_emulated_instruction(&svm->vcpu);
 -return kvm_emulate_pio(&svm->vcpu, in, size, port);
 +return !(emulate_instruction(&svm->vcpu, 0, 0, 0) == EMULATE_DO_MMIO);
   }
 
 We don't want to enter the emulator for non-string in/out.  Leftover
 test code?
 
No, unfortunately this is not leftover. I just don't see a way how we
can bypass emulator and still have emulator be able to emulate in/out
(for big real mode for instance). The problem is basically described in
the commit message. If we have function outside of emulator that does
in/out emulation on vcpu directly, then emulator can't  use it since
committing shadowed registers will overwrite the result of emulation.
Having two different emulations (one outside of emulator and another in
emulator) is also problematic since when userspace returns after IO exit
we don't know which emulation to continue. If we want to avoid
instruction decoding we can fill in emulation context from exit info as
if instruction was already decoded and call emulator.

--
Gleb.


Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.

2010-03-09 Thread Gleb Natapov
On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:
 On 03/09/2010 04:09 PM, Gleb Natapov wrote:
 Currently when string instruction is only partially complete we go back
 to a guest mode, guest tries to reexecute instruction and exits again
 and at this point emulation continues. Avoid all of this by restarting
 instruction without going back to a guest mode.
 
 What happens if rcx is really big?  Going back into the guest gave
 us a preemption point.
 
Two solutions. We can check if reschedule is required and yield cpu if
needed. Or we can enter guest from time to time.
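
Either option amounts to bounding how much work is done before the loop gives something else a chance to run. A user-space sketch of that idea follows, with the iteration budget and all names being hypothetical; in the kernel the yield point would be a reschedule check or a guest re-entry rather than a return to the caller.

```c
#include <assert.h>

/* Hypothetical budget; the thread does not specify a value. */
#define ITERATIONS_PER_SLICE 1024u

/*
 * Emulate up to one slice of a rep-prefixed string operation.
 * 'remaining' plays the role of RCX. Returns 1 when the instruction has
 * completed, 0 when the caller should yield (or re-enter the guest) and
 * then call again to continue where it left off.
 */
static int emulate_rep_slice(unsigned long *remaining, unsigned long *done)
{
    unsigned budget = ITERATIONS_PER_SLICE;

    assert(remaining && done);
    while (*remaining > 0 && budget > 0) {
        (*done)++;          /* stand-in for one iteration of the string op */
        (*remaining)--;
        budget--;
    }
    return *remaining == 0;
}
```

With a budget like this, even a very large count is processed in bounded chunks, which restores the preemption point that returning to the guest used to provide.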

--
Gleb.


Re: [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Cam Macdonell
On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity a...@redhat.com wrote:
 On 03/09/2010 05:27 PM, Cam Macdonell wrote:


  Registers are used
 for synchronization between guests sharing the same memory object when
 interrupts are supported (this requires using the shared memory server).



 How does the driver detect whether interrupts are supported or not?


 At the moment, the VM ID is set to -1 if interrupts aren't supported,
 but that may not be the clearest way to do things.  With UIO is there
 a way to detect if the interrupt pin is on?


 I suggest not designing the device to uio.  Make it a good guest-independent
 device, and if uio doesn't fit it, change it.

 Why not support interrupts unconditionally?  Is the device useful without
 interrupts?

Currently my patch works with or without the shared memory server.  If
you give the parameter

-ivshmem 256,foo

then this will create (if necessary) and map /dev/shm/foo as the
shared region without interrupt support.  Some users of shared memory
are using it this way.

Going forward we can require the shared memory server and always have
interrupts enabled.


 The Doorbell register is 16-bits, but is treated as two 8-bit values.
  The
 upper 8-bits are used for the destination VM ID.  The lower 8-bits are
 the
 value which will be written to the destination VM and what the guest
 status
 register will be set to when the interrupt is triggered in the destination
 guest.



 What happens when two interrupts are sent back-to-back to the same guest?
  Will the first status value be lost?


 Right now, it would be.  I believe that eventfd has a counting
 semaphore option, that could prevent loss of status (but limits what
 the status could be).


 It only counts the number of interrupts (and kvm will coalesce them anyway).

Right.


 My understanding of uio_pci interrupt handling
 is fairly new, but we could have the uio driver store the interrupt
 statuses to avoid losing them.


 There's nowhere to store them if we use ioeventfd/irqfd.  I think it's both
 easier and more efficient to leave this to the application (to store into
 shared memory).

Agreed.

Cam


Re: KVM Guest mmap.c bug

2010-03-09 Thread Bruno Cesar Ribas
On Mon, Mar 08, 2010 at 03:49:01PM +0100, Andrea Arcangeli wrote:
 On Mon, Mar 08, 2010 at 03:32:19PM +0200, Avi Kivity wrote:
  It looks unrelated to kvm, though of course random memory corruption 
  cannot be ruled out.
  
  Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)?
  
  Andrea, any idea?
 
 Basically find_vma(vma->vm_mm, vma->vm_start) doesn't return vma
 despite vma is the one with the smaller vm_end where the comparison
 vma->vm_start < vma->vm_end is true (the next vma is null and the
 prev will have vma->vm_start == prev->vm_end, not <).
 
 The bug check looks right, it doesn't seem false positive and this
 bugcheck indicates that the vma rbtree is memory-corrupted somehow.
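
A simplified sketch of the invariant behind that bug check. The kernel walks an rbtree; this example swaps in a sorted array and a linear scan, and the names find_vma_sorted and lookup_is_consistent are hypothetical. The property is the same: looking up an area's own start address must return that area.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a VMA: the half-open range [vm_start, vm_end). */
struct vma {
    unsigned long vm_start;
    unsigned long vm_end;
};

/* Return the first area with addr < vm_end, or NULL if there is none.
 * Assumes the array is sorted by address and the areas do not overlap. */
static const struct vma *find_vma_sorted(const struct vma *v, size_t n,
                                         unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr < v[i].vm_end)
            return &v[i];
    return NULL;
}

/* Check that every area is found when searching for its own start. */
static int lookup_is_consistent(const struct vma *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (find_vma_sorted(v, n, v[i].vm_start) != &v[i])
            return 0;
    return 1;
}
```

If an earlier area's vm_end is corrupted so that it reaches past the next area's vm_start, the lookup returns the wrong area and the check fails, which is the kind of inconsistency the bug check reports.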
 
 so yes fiddling with npt on and off sounds a good start, if it's a bug

I can confirm it happens with npt on and off.

And it also happens on a Nehalem XEON (it just happened).

 in shadow paging it's unlikely the exact same bug materializes with
 both npt and without. If the crash happens with npt on and off, then
 maybe it's not hypervisor related. Could also be bad RAM if it only

I doubt it is bad ram! This machine has been working (without KVM) for almost 2
years and MCE does not report any problems on the host machine.

And it happens on two identical machines (Opteron) and now on the new (5 days
old) Intel Nehalem XEON.

All guests are running the same kernel. It happens with a kernel compiled by
me and with the one from debian SID, both 2.6.32.9, and with the previous
kernels I tried (2.6.31.12 and 2.6.27.45).

 happens on a single host and all other hosts are fine with same binary
 guest/host kernels (rbtree walk might stress the memory bus more than
 other operations). Said that vm_next being null (and if it's null,
 likely vm_next pointer has no ram bitflip) is a bit weird and not
 common scenario and this page fault seems triggered with procfs
 copy_user call which is non standard, so maybe this is a guest bug. It
 would be interesting to know what is the vm_start address, at the end
 there are stack, vdso and vsyscall areas.

I'll make it print vm_start for next reboot.

-- 
Bruno Ribas - ri...@c3sl.ufpr.br
http://www.inf.ufpr.br/ribas
C3SL: http://www.c3sl.ufpr.br


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Jamie Lokier
Paul Brook wrote:
  However, coherence could be made host-type-independent by the host
  mapping and unmapping pages, so that each page is only mapped into one
  guest (or guest CPU) at a time.  Just like some clustering filesystems
  do to maintain coherence.
 
 You're assuming that a TLB flush implies a write barrier, and a TLB miss 
 implies a read barrier.  I'd be surprised if this were true in general.

The host driver itself can issue full barriers at the same time as it
maps pages on TLB miss, and would probably have to interrupt the
guest's SMP KVM threads to insert a full barrier when broadcasting a
TLB flush on unmap.

-- Jamie



Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Jamie Lokier
Avi Kivity wrote:
 On 03/08/2010 03:03 PM, Paul Brook wrote:
 On 03/08/2010 12:53 AM, Paul Brook wrote:
  
 Support an inter-vm shared memory device that maps a shared-memory
 object as a PCI device in the guest.  This patch also supports
 interrupts between guest by communicating over a unix domain socket.
 This patch applies to the qemu-kvm repository.
  
 No. All new devices should be fully qdev based.
 
 I suspect you've also ignored a load of coherency issues, especially when
 not using KVM. As soon as you have shared memory in more than one host
 thread/process you have to worry about memory barriers.

 Shouldn't it be sufficient to require the guest to issue barriers (and
 to ensure tcg honours the barriers, if someone wants this with tcg)?.
  
 In a cross environment that becomes extremely hairy.  For example the x86
 architecture effectively has an implicit write barrier before every store,
 and an implicit read barrier before every load.

 
 Ah yes.  For cross tcg environments you can map the memory using mmio 
 callbacks instead of directly, and issue the appropriate barriers there.

That makes sense.  It will force an mmio callback for every access to
the shared memory, which is ok for correctness but vastly slower when
running in TCG compared with KVM.

But it's hard to see what else could be done - those implicit write
barriers on x86 have to be emulated somehow.  For TCG without inter-vm
shared memory, those barriers aren't a problem.

Non-random-corruption guest behaviour is paramount, so I hope the
inter-vm device will add those mmio callbacks for the cross-arch case
before it sees much action.  (Strictly, it isn't cross-arch, but
host-has-more-relaxed-implicit-memory-model-than-guest.  I'm assuming
TCG doesn't reorder memory instructions).
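
As an illustration of "require the guest to issue barriers": a minimal sketch using C11 atomics, where the release/acquire pair states the ordering explicitly instead of relying on the host's implicit memory model. The struct and function names are hypothetical, and the sketch assumes atomic_uint is lock-free and usable in memory shared between processes, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single-slot mailbox placed in the shared region. */
struct mailbox {
    uint32_t payload;   /* plain data, written before the flag */
    atomic_uint ready;  /* nonzero once payload is valid */
};

/* Producer: the release store keeps the payload write ahead of the flag. */
static void mailbox_publish(struct mailbox *m, uint32_t value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: the acquire load keeps the payload read behind the flag.
 * Returns 1 and fills *out if a message was available, else 0. */
static int mailbox_try_consume(struct mailbox *m, uint32_t *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return 0;
    *out = m->payload;
    /* Release again so the slot is only reused after the read is done. */
    atomic_store_explicit(&m->ready, 0, memory_order_release);
    return 1;
}
```

On an x86 host these orderings typically cost nothing extra; on a host with a more relaxed model they compile to real barriers. That only helps if the guest code is written this way, which is exactly the assumption under discussion.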

-- Jamie


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Jamie Lokier
Paul Brook wrote:
  On 03/08/2010 12:53 AM, Paul Brook wrote:
   Support an inter-vm shared memory device that maps a shared-memory
   object as a PCI device in the guest.  This patch also supports
   interrupts between guest by communicating over a unix domain socket. 
   This patch applies to the qemu-kvm repository.
  
   No. All new devices should be fully qdev based.
  
   I suspect you've also ignored a load of coherency issues, especially when
   not using KVM. As soon as you have shared memory in more than one host
   thread/process you have to worry about memory barriers.
  
  Shouldn't it be sufficient to require the guest to issue barriers (and
  to ensure tcg honours the barriers, if someone wants this with tcg)?.
 
 In a cross environment that becomes extremely hairy.  For example the x86
 architecture effectively has an implicit write barrier before every store,
 and an implicit read barrier before every load.

Btw, x86 doesn't have any implicit barriers due to ordinary loads.
Only stores and atomics have implicit barriers, afaik.

-- Jamie


report stolen time via pvclock?

2010-03-09 Thread Thomas Treutner
Hi,

I'm referring to this patchset

http://www.mail-archive.com/kvm@vger.kernel.org/msg23810.html

of Marcelo Tosatti. It seems it was never included or even discussed, although 
it's nearly half a year old. I wonder if there is a good reason for that? I'd 
like to use the steal time for my VMs, as I consider it useful in some cases.

Thanks,
kr,t


Re: report stolen time via pvclock?

2010-03-09 Thread Rik van Riel

On 03/09/2010 04:30 PM, Marcelo Tosatti wrote:

On Tue, Mar 09, 2010 at 09:47:38PM +0100, Thomas Treutner wrote:

Hi,

I'm referring to this patchset

http://www.mail-archive.com/kvm@vger.kernel.org/msg23810.html

of Marcelo Tosatti. It seems it was never included or even discussed, although
it's nearly half a year old. I wonder if there is a good reason for that? I'd
like to use the steal time for my VMs, as I consider it useful in some cases.


There is a problem with it: stolen time is accounted separately (in
addition to) user/system/idle.

And as you noted there seems to be lack of interest in the feature.


More like a lack of time to implement it properly for KVM.

I know it is a useful feature for system administrators, and
should probably try to get it reimplemented so it works right
and with lower overhead than full schedstats (I have some
ideas on how to achieve that).

Thanks for reminding me of this project :)
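
For context on where an administrator would see this number: Linux reports steal as the eighth value on the "cpu" lines of /proc/stat, alongside user, nice, system, idle, iowait, irq and softirq. A small sketch of reading it, with hypothetical helper names; whether the field is ever nonzero in a KVM guest depends on the support discussed in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* The first eight counters of a "cpu" line in /proc/stat, in order. */
struct cpu_times {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse one aggregate "cpu ..." line. Returns 1 on success, 0 otherwise. */
static int parse_cpu_line(const char *line, struct cpu_times *t)
{
    return sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                  &t->user, &t->nice, &t->system, &t->idle,
                  &t->iowait, &t->irq, &t->softirq, &t->steal) == 8;
}

/* Fraction of all accounted time that was stolen by the hypervisor. */
static double steal_fraction(const struct cpu_times *t)
{
    unsigned long long total = t->user + t->nice + t->system + t->idle +
                               t->iowait + t->irq + t->softirq + t->steal;
    return total ? (double)t->steal / (double)total : 0.0;
}
```

The counters are cumulative since boot, so a monitoring tool would sample the line twice and apply the same arithmetic to the differences.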


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Anthony Liguori

On 03/08/2010 03:54 AM, Jamie Lokier wrote:

Alexander Graf wrote:
   

Or we could put in some code that tells the guest the host shm
architecture and only accept x86 on x86 for now. If anyone cares for
other combinations, they're free to implement them.

Seriously, we're looking at an interface designed for kvm here. Let's
please keep it as simple and fast as possible for the actual use case,
not some theoretically possible ones.
 

The concern is that a perfectly working guest image running on kvm,
the guest being some OS or app that uses this facility (_not_ a
kvm-only guest driver), is later run on qemu on a different host, and
then mostly works except for some silent data corruption.

That is not a theoretical scenario.
   


Hint: no matter what you do, shared memory is a hack that's going to 
lead to subtle failures one way or another.


It's useful to support because it has some interesting academic uses but 
it's not a mechanism that can ever be used for real world purposes.


It's impossible to support save/restore correctly.  It can never be made 
to work with TCG in a safe way.  That's why I've been advocating keeping 
this as simple as humanly possible.  It's just not worth trying to make 
this fancier than it needs to be because it will never be fully correct.


Regards,

Anthony Liguori


Well, the bit with this driver is theoretical, obviously :-)
But not the bit about moving to a different host.

-- Jamie
   




Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Anthony Liguori

On 03/08/2010 07:16 AM, Avi Kivity wrote:

On 03/08/2010 03:03 PM, Paul Brook wrote:

On 03/08/2010 12:53 AM, Paul Brook wrote:

Support an inter-vm shared memory device that maps a shared-memory
object as a PCI device in the guest.  This patch also supports
interrupts between guest by communicating over a unix domain socket.
This patch applies to the qemu-kvm repository.

No. All new devices should be fully qdev based.

I suspect you've also ignored a load of coherency issues, especially when
not using KVM. As soon as you have shared memory in more than one host
thread/process you have to worry about memory barriers.

Shouldn't it be sufficient to require the guest to issue barriers (and
to ensure tcg honours the barriers, if someone wants this with tcg)?.
In a cross environment that becomes extremely hairy.  For example the x86
architecture effectively has an implicit write barrier before every store,
and an implicit read barrier before every load.


Ah yes.  For cross tcg environments you can map the memory using mmio 
callbacks instead of directly, and issue the appropriate barriers there.


Not good enough unless you want to severely restrict the use of shared 
memory within the guest.


For instance, it's going to be useful to assume that your atomic 
instructions remain atomic.  Crossing architecture boundaries here makes 
these assumptions invalid.  A barrier is not enough.


Shared memory only makes sense when using KVM.  In fact, we should 
actively disable the shared memory device when not using KVM.


Regards,

Anthony Liguori




Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Paul Brook
  In a cross environment that becomes extremely hairy.  For example the x86
  architecture effectively has an implicit write barrier before every
  store, and an implicit read barrier before every load.
 
 Btw, x86 doesn't have any implicit barriers due to ordinary loads.
 Only stores and atomics have implicit barriers, afaik.

As of March 2009[1] Intel guarantees that memory reads occur in order (they 
may only be reordered relative to writes). It appears AMD do not provide this 
guarantee, which could be an interesting problem for heterogeneous migration..

Paul

[1] The most recent docs I have handy. Up to and including Core-2 Duo.

