date:20110118

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-18 17:37, Anthony Liguori wrote:
 On 01/18/2011 10:17 AM, Jan Kiszka wrote:
 On 2011-01-18 17:04, Anthony Liguori wrote:

 A KVM device should sit on a KVM specific bus that hangs off of
 sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.



  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.



 The bus topology reflects how I/O flows in and out of a device.  We do
 not model a perfect PC bus architecture and I don't think we ever intend
 to.  Instead, we model a functional architecture.

 I/O from an assigned device does not flow through the emulated PCI bus.
 Therefore, it does not belong on the emulated PCI bus.

 Assigned devices need to interact with the emulated PCI bus, but they
 shouldn't be children of it.

  
 You should be able to find assigned devices on some PCI bus, so you
 either have to hack up the existing bus to host devices that are, on the
 other side, not part of it or branch off a pci-kvm sub-bus, just like
 you would have to create a sysbus-kvm.

 Management tools should never transverse the device tree to find
 devices.  This is a recipe for disaster in the long term because the
 device tree will not remain stable.

 So yes, a management tool should be able to enumerate assigned devices
 as they would enumerate any other PCI device but that has almost nothing
 to do with what the tree layout is.
  
 I'm probably misunderstanding you, but if the bus topology as the guest
 sees it is not properly reflected in an object tree on the qemu side, we
 are creating hacks again.

 
 There is no such thing as the bus topology as the guest sees it.
 
 The guest just sees a bunch of devices.  The guest can only infer things 
 like ISA busses.  The guest sees a bunch of devices: an i8254, i8259, 
 RTC, etc.  Whether those devices are on an ISA bus, and LPC bus, or all 
 in a SuperI/O chip that's part of the southbridge is all invisible to 
 the guest.
 
 The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.

 
 Management and analysis tools must be able to traverse the system buses
 and find guest devices this way.
 
 We need to provide a compatible interface to the guest.  If you agree 
 with my above statements, then you'll also agree that we can do this 
 without keeping the device model topology stable.
 
 But we also need to provide a compatible interface to management tools.  
 Exposing the device model topology as a compatible interface 
 artificially limits us.  It's far better to provide higher level 
 supported interfaces to give us the flexibility to change the device 
 model as we need to.

How do you want to change qdev to keep the guest and management tool
view stable while branching off kvm sub-buses? Please propose such
extensions so that they can be discussed. IIUC, that would be second
relation between qdev and qbus instances besides the physical topology.
What further use cases (besides passing kvm_state around) do you have in
mind?

 
   If they create a device on bus X, it
 must never end up on bus Y just because it happens to be KVM-assisted or
 has some other property.
 
 Nope.  This is exactly what should happen.
 
 90% of the devices in the device model are not created by management 
 tools.  They're part of a chipset.  The chipset has well defined 
 extension points and we provide management interfaces to create devices 
 on those extension points.  That is, interfaces to create PCI devices.
 

Creating kvm irqchips via the management tool would be one simple way
(not the only one, though) to enable/disable kvm assistance for those
devices.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Alex Williamson

On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
 On 2011-01-18 16:48, Anthony Liguori wrote:
  On 01/18/2011 09:43 AM, Jan Kiszka wrote:
  On 2011-01-18 16:04, Anthony Liguori wrote:
 
  On 01/18/2011 08:28 AM, Jan Kiszka wrote:
   
  On 2011-01-12 11:31, Jan Kiszka wrote:
 
 
  Am 12.01.2011 11:22, Avi Kivity wrote:
 
   
  On 01/11/2011 03:54 PM, Anthony Liguori wrote:
 
 
  Right, we should introduce a KVMBus that KVM devices are created on.
  The devices can get at KVMState through the BusState.
 
   
  There is no kvm bus in a PC (I looked).  We're bending the device model
  here because a device is implemented in the kernel and not in
  userspace.  An implementation detail is magnified beyond all 
  proportions.
 
  An ioapic that is implemented by kvm lives in exactly the same place
  that the qemu ioapic lives in.  An assigned pci device lives on the PCI
  bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
  it elsewhere, not through creating imaginary buses that don't exist.
 
 
 
  Exactly.
 
  So we can either infect the whole device tree with kvm (or maybe a
  more generic accelerator structure that also deals with Xen) or we need
  to pull the reference inside the device's init function from some global
  service (kvm_get_state).
 
   
  Note that this topic is still waiting for good suggestions, specifically
  from those who believe in kvm_state references :). This is not only
  blocking kvmstate merge but will affect KVM irqchips as well.
 
  It boils down to how we reasonably pass a kvm_state reference from
  machine init code to a sysbus device. I'm probably biased, but I don't
  see any way that does not work against the idea of confining access to
  kvm_state or breaks device instantiation from the command line or a
  config file.
 
 
  A KVM device should sit on a KVM specific bus that hangs off of sysbus.
  It can get to kvm_state through that bus.
 
  That bus doesn't get instantiated through qdev so requiring a pointer
  argument should not be an issue.
 
   
  This design is in conflict with the requirement to attach KVM-assisted
  devices also to their home bus, e.g. an assigned PCI device to the PCI
  bus. We don't support multi-homed qdev devices.
 
  
  With vfio, would an assigned PCI device even need kvm_state?
 
 IIUC: Yes, for establishing the irqfd link.

We abstract this through the msi/msix layer though, so the vfio driver
doesn't directly know anything about kvm_state.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-18 18:02, Alex Williamson wrote:
 On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
 On 2011-01-18 16:48, Anthony Liguori wrote:
 On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 On 2011-01-18 16:04, Anthony Liguori wrote:

 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
  
 On 2011-01-12 11:31, Jan Kiszka wrote:


 Am 12.01.2011 11:22, Avi Kivity wrote:

  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:


 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.

  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all 
 proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.



 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


 A KVM device should sit on a KVM specific bus that hangs off of sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.

  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.


 With vfio, would an assigned PCI device even need kvm_state?

 IIUC: Yes, for establishing the irqfd link.
 
 We abstract this through the msi/msix layer though, so the vfio driver
 doesn't directly know anything about kvm_state.

Which version/tree are you referring to? It wasn't the case in the last
version I found on the list.

Does the msi layer use irqfd for every source in kvm mode then? Of
course, the key question will be how that layer will once obtain kvm_state.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori


On 01/18/2011 10:56 AM, Jan Kiszka wrote:



The device model topology is 100% a hidden architectural detail.
 

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.
   


But we also don't do PCI passthrough so we really haven't even explored 
how that maps in qdev.  I don't know if qemu-kvm has attempted to 
qdev-ify it.



Management and analysis tools must be able to traverse the system buses
and find guest devices this way.
   

We need to provide a compatible interface to the guest.  If you agree
with my above statements, then you'll also agree that we can do this
without keeping the device model topology stable.

But we also need to provide a compatible interface to management tools.
Exposing the device model topology as a compatible interface
artificially limits us.  It's far better to provide higher level
supported interfaces to give us the flexibility to change the device
model as we need to.
 

How do you want to change qdev to keep the guest and management tool
view stable while branching off kvm sub-buses?


The qdev device model is not a stable interface.  I think that's been 
clear from the very beginning.



  Please propose such
extensions so that they can be discussed. IIUC, that would be second
relation between qdev and qbus instances besides the physical topology.
What further use cases (besides passing kvm_state around) do you have in
mind?
   


The -device interface is a stable interface.  Right now, you don't 
specify any type of identifier of the pci bus when you create a PCI 
device.  It's implied in the interface.


   
 

   If they create a device on bus X, it
must never end up on bus Y just because it happens to be KVM-assisted or
has some other property.
   

Nope.  This is exactly what should happen.

90% of the devices in the device model are not created by management
tools.  They're part of a chipset.  The chipset has well defined
extension points and we provide management interfaces to create devices
on those extension points.  That is, interfaces to create PCI devices.

 

Creating kvm irqchips via the management tool would be one simple way
(not the only one, though) to enable/disable kvm assistance for those
devices.
   


It's automatically created as part of the CPUs or as part of the 
chipset.  How to enable/disable kvm assistance is a property of the CPU 
and/or chipset.


Regards,

Anthony Liguori



Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-18 18:09, Anthony Liguori wrote:
 On 01/18/2011 10:56 AM, Jan Kiszka wrote:

 The device model topology is 100% a hidden architectural detail.
  
 This is true for the sysbus, it is obviously not the case for PCI and
 similarly discoverable buses. There we have a guest-explorable topology
 that is currently equivalent to the the qdev layout.

 
 But we also don't do PCI passthrough so we really haven't even explored 
 how that maps in qdev.  I don't know if qemu-kvm has attempted to 
 qdev-ify it.

It is. And even if it weren't or the current version in qemu-kvm was not
perfect, we need to consider those uses cases now as we are about to
define a generic model for kvm device integration. That's the point of
this discussion.

 
 Management and analysis tools must be able to traverse the system buses
 and find guest devices this way.

 We need to provide a compatible interface to the guest.  If you agree
 with my above statements, then you'll also agree that we can do this
 without keeping the device model topology stable.

 But we also need to provide a compatible interface to management tools.
 Exposing the device model topology as a compatible interface
 artificially limits us.  It's far better to provide higher level
 supported interfaces to give us the flexibility to change the device
 model as we need to.
  
 How do you want to change qdev to keep the guest and management tool
 view stable while branching off kvm sub-buses?
 
 The qdev device model is not a stable interface.  I think that's been 
 clear from the very beginning.

Internals aren't stable, but they should only be changed for a good
reason, specifically when the change may impact the whole set of device
models.

 
   Please propose such
 extensions so that they can be discussed. IIUC, that would be second
 relation between qdev and qbus instances besides the physical topology.
 What further use cases (besides passing kvm_state around) do you have in
 mind?

 
 The -device interface is a stable interface.  Right now, you don't 
 specify any type of identifier of the pci bus when you create a PCI 
 device.  It's implied in the interface.

Which only works as along as we expose a single bus. You don't need to
be an oracle to predict that this is not a stable interface.

 

  
If they create a device on bus X, it
 must never end up on bus Y just because it happens to be KVM-assisted or
 has some other property.

 Nope.  This is exactly what should happen.

 90% of the devices in the device model are not created by management
 tools.  They're part of a chipset.  The chipset has well defined
 extension points and we provide management interfaces to create devices
 on those extension points.  That is, interfaces to create PCI devices.

  
 Creating kvm irqchips via the management tool would be one simple way
 (not the only one, though) to enable/disable kvm assistance for those
 devices.

 
 It's automatically created as part of the CPUs or as part of the 
 chipset.  How to enable/disable kvm assistance is a property of the CPU 
 and/or chipset.

If we exclude creation via command line / config files, we could also
pass the kvm_state directly from the machine or chipset setup code and
save us at least the kvm system buses.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori


On 01/18/2011 11:20 AM, Jan Kiszka wrote:


Which only works as along as we expose a single bus. You don't need to
be an oracle to predict that this is not a stable interface.
   


Today we only have a very low level factory interface--device creation 
and deletion.


I think we should move to higher level bus factory interfaces.  An 
interface to create a PCI device and to delete PCI devices.  This is the 
only sane way to do hot plug.


This also makes supporting multiple busses a lot more reasonable since 
this factory interface could be a method of the controller.



It's automatically created as part of the CPUs or as part of the
chipset.  How to enable/disable kvm assistance is a property of the CPU
and/or chipset.
 

If we exclude creation via command line / config files, we could also
pass the kvm_state directly from the machine or chipset setup code and
save us at least the kvm system buses.
   


Which is fine in the short term.  This is exactly why we don't want the 
device model to be an ABI.   It gives us the ability to make changes as 
they make sense instead of trying to be perfect from the start (which we 
never will be).


Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Alex Williamson

On Tue, 2011-01-18 at 18:08 +0100, Jan Kiszka wrote:
 On 2011-01-18 18:02, Alex Williamson wrote:
  On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
  On 2011-01-18 16:48, Anthony Liguori wrote:
  On 01/18/2011 09:43 AM, Jan Kiszka wrote:
  On 2011-01-18 16:04, Anthony Liguori wrote:
 
  On 01/18/2011 08:28 AM, Jan Kiszka wrote:
   
  On 2011-01-12 11:31, Jan Kiszka wrote:
 
 
  Am 12.01.2011 11:22, Avi Kivity wrote:
 
   
  On 01/11/2011 03:54 PM, Anthony Liguori wrote:
 
 
  Right, we should introduce a KVMBus that KVM devices are created on.
  The devices can get at KVMState through the BusState.
 
   
  There is no kvm bus in a PC (I looked).  We're bending the device 
  model
  here because a device is implemented in the kernel and not in
  userspace.  An implementation detail is magnified beyond all 
  proportions.
 
  An ioapic that is implemented by kvm lives in exactly the same place
  that the qemu ioapic lives in.  An assigned pci device lives on the 
  PCI
  bus, not a KVMBus.  If we need a pointer to KVMState, then we must 
  find
  it elsewhere, not through creating imaginary buses that don't exist.
 
 
 
  Exactly.
 
  So we can either infect the whole device tree with kvm (or maybe a
  more generic accelerator structure that also deals with Xen) or we 
  need
  to pull the reference inside the device's init function from some 
  global
  service (kvm_get_state).
 
   
  Note that this topic is still waiting for good suggestions, 
  specifically
  from those who believe in kvm_state references :). This is not only
  blocking kvmstate merge but will affect KVM irqchips as well.
 
  It boils down to how we reasonably pass a kvm_state reference from
  machine init code to a sysbus device. I'm probably biased, but I don't
  see any way that does not work against the idea of confining access to
  kvm_state or breaks device instantiation from the command line or a
  config file.
 
 
  A KVM device should sit on a KVM specific bus that hangs off of sysbus.
  It can get to kvm_state through that bus.
 
  That bus doesn't get instantiated through qdev so requiring a pointer
  argument should not be an issue.
 
   
  This design is in conflict with the requirement to attach KVM-assisted
  devices also to their home bus, e.g. an assigned PCI device to the PCI
  bus. We don't support multi-homed qdev devices.
 
 
  With vfio, would an assigned PCI device even need kvm_state?
 
  IIUC: Yes, for establishing the irqfd link.
  
  We abstract this through the msi/msix layer though, so the vfio driver
  doesn't directly know anything about kvm_state.
 
 Which version/tree are you referring to? It wasn't the case in the last
 version I found on the list.
 
 Does the msi layer use irqfd for every source in kvm mode then? Of
 course, the key question will be how that layer will once obtain kvm_state.

Looking at [RFC PATCH v2] VFIO based device assignment sent on Nov
5th, I guess we do call kvm_set_irqfd.  Maybe I'm just wishing that the
msi layer abstracted it better.  I'd like to be able to pass in a
userspace interrupt handler function pointer and an event notifier fd
and let the interrupt layers worry about how it's hooked up.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-18 18:31, Anthony Liguori wrote:
 It's automatically created as part of the CPUs or as part of the
 chipset.  How to enable/disable kvm assistance is a property of the CPU
 and/or chipset.
  
 If we exclude creation via command line / config files, we could also
 pass the kvm_state directly from the machine or chipset setup code and
 save us at least the kvm system buses.

 
 Which is fine in the short term.

Unless we want to abuse the pointer property for this, and there was
some resistance, we would have to change the sysbus init function
signature. I don't want to propose this for a short-term workaround, we
need a clearer vision and roadmap to avoid multiple invasive changes to
the device model.

  This is exactly why we don't want the 
 device model to be an ABI.   It gives us the ability to make changes as 
 they make sense instead of trying to be perfect from the start (which we 
 never will be).

The device model will always consist of a stable part, the guest and
management visible topology. That beast needs to be modeled as well,
likely via some new bus objects. If that's the way to go, starting now
is probably the right time as we have an urgent use case, right?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost: rcu annotation fixup

2011-01-18 Thread Paul E. McKenney

On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
 When built with rcu checks enabled, vhost triggers
 bogus warnings as vhost features are read without
 dev-mutex sometimes.
 Fixing it properly is not trivial as vhost.h does not
 know which lockdep classes it will be used under.
 Disable the warning by stubbing out the check for now.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/vhost/vhost.h |4 +---
  1 files changed, 1 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
 index 2af44b7..2d03a31 100644
 --- a/drivers/vhost/vhost.h
 +++ b/drivers/vhost/vhost.h
 @@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct vhost_dev 
 *dev, int bit)
  {
   unsigned acked_features;
 
 - acked_features =
 - rcu_dereference_index_check(dev-acked_features,
 - lockdep_is_held(dev-mutex));
 + acked_features = rcu_dereference_index_check(dev-acked_features, 1);

Ouch!!!

Could you please at least add a comment?

Alternatively, pass in the lock that is held and check for that?  Given
that this is a static inline, the compiler should be able to optimize
the argument away when !PROVE_RCU, correct?

Thanx, Paul

   return acked_features  (1  bit);
  }
 
 -- 
 1.7.3.2.91.g446ac
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost: rcu annotation fixup

2011-01-18 Thread Michael S. Tsirkin

On Tue, Jan 18, 2011 at 09:48:34AM -0800, Paul E. McKenney wrote:
 On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
  When built with rcu checks enabled, vhost triggers
  bogus warnings as vhost features are read without
  dev-mutex sometimes.
  Fixing it properly is not trivial as vhost.h does not
  know which lockdep classes it will be used under.
  Disable the warning by stubbing out the check for now.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   drivers/vhost/vhost.h |4 +---
   1 files changed, 1 insertions(+), 3 deletions(-)
  
  diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
  index 2af44b7..2d03a31 100644
  --- a/drivers/vhost/vhost.h
  +++ b/drivers/vhost/vhost.h
  @@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct vhost_dev 
  *dev, int bit)
   {
  unsigned acked_features;
  
  -   acked_features =
  -   rcu_dereference_index_check(dev-acked_features,
  -   lockdep_is_held(dev-mutex));
  +   acked_features = rcu_dereference_index_check(dev-acked_features, 1);
 
 Ouch!!!
 
 Could you please at least add a comment?

Yes, OK.

 Alternatively, pass in the lock that is held and check for that?  Given
 that this is a static inline, the compiler should be able to optimize
 the argument away when !PROVE_RCU, correct?
 
   Thanx, Paul

Hopefully, yes. We don't always have a lock: the idea was
to create a lockdep for these cases. But we can't pass
the pointer to that ...

  return acked_features  (1  bit);
   }
  
  -- 
  1.7.3.2.91.g446ac
  --
  To unsubscribe from this list: send the line unsubscribe linux-kernel in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv2] vhost: rcu annotation fixup

2011-01-18 Thread Michael S. Tsirkin

When built with rcu checks enabled, vhost triggers
bogus warnings as vhost features are read without
dev-mutex sometimes, and private pointer is read
with our kind of rcu where work serves as a
read side critical section.

Fixing it properly is not trivial.
Disable the warnings by stubbing out the checks for now.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Changes from v1: add TODO, fix more warnings.

 drivers/vhost/net.c   |9 +
 drivers/vhost/vhost.h |6 +++---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9b3ca10..f616cef 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -128,8 +128,7 @@ static void handle_tx(struct vhost_net *net)
size_t hdr_size;
struct socket *sock;
 
-   /* TODO: check that we are running from vhost_worker?
-* Not sure it's worth it, it's straight-forward enough. */
+   /* TODO: check that we are running from vhost_worker? */
sock = rcu_dereference_check(vq-private_data, 1);
if (!sock)
return;
@@ -306,7 +305,8 @@ static void handle_rx_big(struct vhost_net *net)
size_t len, total_len = 0;
int err;
size_t hdr_size;
-   struct socket *sock = rcu_dereference(vq-private_data);
+   /* TODO: check that we are running from vhost_worker? */
+   struct socket *sock = rcu_dereference_check(vq-private_data, 1);
if (!sock || skb_queue_empty(sock-sk-sk_receive_queue))
return;
 
@@ -415,7 +415,8 @@ static void handle_rx_mergeable(struct vhost_net *net)
int err, headcount;
size_t vhost_hlen, sock_hlen;
size_t vhost_len, sock_len;
-   struct socket *sock = rcu_dereference(vq-private_data);
+   /* TODO: check that we are running from vhost_worker? */
+   struct socket *sock = rcu_dereference_check(vq-private_data, 1);
if (!sock || skb_queue_empty(sock-sk-sk_receive_queue))
return;
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 2af44b7..b3363ae 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -173,9 +173,9 @@ static inline int vhost_has_feature(struct vhost_dev *dev, 
int bit)
 {
unsigned acked_features;
 
-   acked_features =
-   rcu_dereference_index_check(dev-acked_features,
-   lockdep_is_held(dev-mutex));
+   /* TODO: check that we are running from vhost_worker or dev mutex is
+* held? */
+   acked_features = rcu_dereference_index_check(dev-acked_features, 1);
return acked_features  (1  bit);
 }
 
-- 
1.7.3.2.91.g446ac
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost: rcu annotation fixup

2011-01-18 Thread Paul E. McKenney

On Tue, Jan 18, 2011 at 07:55:00PM +0200, Michael S. Tsirkin wrote:
 On Tue, Jan 18, 2011 at 09:48:34AM -0800, Paul E. McKenney wrote:
  On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
   When built with rcu checks enabled, vhost triggers
   bogus warnings as vhost features are read without
   dev-mutex sometimes.
   Fixing it properly is not trivial as vhost.h does not
   know which lockdep classes it will be used under.
   Disable the warning by stubbing out the check for now.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
drivers/vhost/vhost.h |4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
   
   diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
   index 2af44b7..2d03a31 100644
   --- a/drivers/vhost/vhost.h
   +++ b/drivers/vhost/vhost.h
   @@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct vhost_dev 
   *dev, int bit)
{
 unsigned acked_features;
   
   - acked_features =
   - rcu_dereference_index_check(dev-acked_features,
   - lockdep_is_held(dev-mutex));
   + acked_features = rcu_dereference_index_check(dev-acked_features, 1);
  
  Ouch!!!
  
  Could you please at least add a comment?
 
 Yes, OK.
 
  Alternatively, pass in the lock that is held and check for that?  Given
  that this is a static inline, the compiler should be able to optimize
  the argument away when !PROVE_RCU, correct?
  
  Thanx, Paul
 
 Hopefully, yes. We don't always have a lock: the idea was
 to create a lockdep for these cases. But we can't pass
 the pointer to that ...

I suppose you could pass a pointer to the lockdep map structure.
Not sure if this makes sense, but it would handle the situation.

Alternatively, create a helper function that checks the possibilities
and screams if none of them are in effect.

Thanx, Paul

 return acked_features  (1  bit);
}
   
   -- 
   1.7.3.2.91.g446ac
   --
   To unsubscribe from this list: send the line unsubscribe linux-kernel in
   the body of a message to majord...@vger.kernel.org
   More majordomo info at  http://vger.kernel.org/majordomo-info.html
   Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Flow Control and Port Mirroring Revisited

2011-01-18 Thread Rick Jones


So it won't be all that simple to implement well, and before we try,
I'd like to know whether there are applications that are helped
by it. For example, we could try to measure latency at various
pps and see whether the backpressure helps. netperf has -b, -w
flags which might help these measurements.


Those options are enabled when one adds --enable-burst to the pre-compilation 
./configure  of netperf (one doesn't have to recompile netserver).  However, if 
one is also looking at latency statistics via the -j option in the top-of-trunk, 
or simply at the histogram with --enable-histogram on the ./configure and a 
verbosity level of 2 (global -v 2) then one wants the very top of trunk netperf 
from:


http://www.netperf.org/svn/netperf2/trunk

to get the recently added support for accurate (netperf level) RTT measuremnts 
on burst-mode request/response tests.


happy benchmarking,

rick jones

PS - the enhanced latency statistics from -j are only available in the omni 
version of the TCP_RR test.  To get that add a --enable-omni to the ./configure 
- and in this case both netperf and netserver have to be recompiled.  For very 
basic output one can peruse the output of:


src/netperf -t omni -- -O /?

and then pick those outputs of interest and put them into an output selection 
file which one then passes to either (test-specific) -o, -O or -k to get CVS, 
Human or keyval output respectively.  E.G.


raj@tardy:~/netperf2_trunk$ cat foo
THROUGHPUT,THROUGHPUT_UNITS
RT_LATENCY,MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY
P50_LATENCY,P90_LATENCY,P99_LATENCY,STDDEV_LATENCY

when foo is passed to -o one will get those all on one line of CSV.  To -O one 
gets three lines of more netperf-classic-like human readable output, and when 
one passes that to -k one gets a string of keyval output a la:


raj@tardy:~/netperf2_trunk$ src/netperf -t omni -j -v 2 -- -r 1 -d rr -k foo
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 
AF_INET : histogram

THROUGHPUT=29454.12
THROUGHPUT_UNITS=Trans/s
RT_LATENCY=33.951
MIN_LATENCY=19
MEAN_LATENCY=32.00
MAX_LATENCY=126
P50_LATENCY=32
P90_LATENCY=38
P99_LATENCY=41
STDDEV_LATENCY=5.46

Histogram of request/response times
UNIT_USEC :0:0:0:0:0:0:0:0:0:0
TEN_USEC  :0: 3553: 45244: 237790: 7859:   86:   10:3:0:0
HUNDRED_USEC  :0:2:0:0:0:0:0:0:0:0
UNIT_MSEC :0:0:0:0:0:0:0:0:0:0
TEN_MSEC  :0:0:0:0:0:0:0:0:0:0
HUNDRED_MSEC  :0:0:0:0:0:0:0:0:0:0
UNIT_SEC  :0:0:0:0:0:0:0:0:0:0
TEN_SEC   :0:0:0:0:0:0:0:0:0:0
100_SECS: 0
HIST_TOTAL:  294547

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost: rcu annotation fixup

2011-01-18 Thread Michael S. Tsirkin

On Tue, Jan 18, 2011 at 11:02:33AM -0800, Paul E. McKenney wrote:
 On Tue, Jan 18, 2011 at 07:55:00PM +0200, Michael S. Tsirkin wrote:
  On Tue, Jan 18, 2011 at 09:48:34AM -0800, Paul E. McKenney wrote:
   On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
When built with rcu checks enabled, vhost triggers
bogus warnings as vhost features are read without
dev-mutex sometimes.
Fixing it properly is not trivial as vhost.h does not
know which lockdep classes it will be used under.
Disable the warning by stubbing out the check for now.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.h |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 2af44b7..2d03a31 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct 
vhost_dev *dev, int bit)
 {
unsigned acked_features;

-   acked_features =
-   rcu_dereference_index_check(dev-acked_features,
-   
lockdep_is_held(dev-mutex));
+   acked_features = 
rcu_dereference_index_check(dev-acked_features, 1);
   
   Ouch!!!
   
   Could you please at least add a comment?
  
  Yes, OK.
  
   Alternatively, pass in the lock that is held and check for that?  Given
   that this is a static inline, the compiler should be able to optimize
   the argument away when !PROVE_RCU, correct?
   
 Thanx, Paul
  
  Hopefully, yes. We don't always have a lock: the idea was
  to create a lockdep for these cases. But we can't pass
  the pointer to that ...
 
 I suppose you could pass a pointer to the lockdep map structure.
 Not sure if this makes sense, but it would handle the situation.

Will it compile with lockdep disabled too? What will the pointer be?

 Alternatively, create a helper function that checks the possibilities
 and screams if none of them are in effect.
 
   Thanx, Paul

The problem here is the callee needs to know about all callers.

return acked_features  (1  bit);
 }

-- 
1.7.3.2.91.g446ac
--
To unsubscribe from this list: send the line unsubscribe linux-kernel 
in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Flow Control and Port Mirroring Revisited

2011-01-18 Thread Michael S. Tsirkin

On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
 So it won't be all that simple to implement well, and before we try,
 I'd like to know whether there are applications that are helped
 by it. For example, we could try to measure latency at various
 pps and see whether the backpressure helps. netperf has -b, -w
 flags which might help these measurements.
 
 Those options are enabled when one adds --enable-burst to the
 pre-compilation ./configure  of netperf (one doesn't have to
 recompile netserver).  However, if one is also looking at latency
 statistics via the -j option in the top-of-trunk, or simply at the
 histogram with --enable-histogram on the ./configure and a verbosity
 level of 2 (global -v 2) then one wants the very top of trunk
 netperf from:
 
 http://www.netperf.org/svn/netperf2/trunk
 
 to get the recently added support for accurate (netperf level) RTT
 measuremnts on burst-mode request/response tests.
 
 happy benchmarking,
 
 rick jones
 
 PS - the enhanced latency statistics from -j are only available in
 the omni version of the TCP_RR test.  To get that add a
 --enable-omni to the ./configure - and in this case both netperf and
 netserver have to be recompiled.


Is this TCP only? I would love to get latency data from UDP as well.

  For very basic output one can
 peruse the output of:
 
 src/netperf -t omni -- -O /?
 
 and then pick those outputs of interest and put them into an output
 selection file which one then passes to either (test-specific) -o,
 -O or -k to get CVS, Human or keyval output respectively.  E.G.
 
 raj@tardy:~/netperf2_trunk$ cat foo
 THROUGHPUT,THROUGHPUT_UNITS
 RT_LATENCY,MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY
 P50_LATENCY,P90_LATENCY,P99_LATENCY,STDDEV_LATENCY
 
 when foo is passed to -o one will get those all on one line of CSV.
 To -O one gets three lines of more netperf-classic-like human
 readable output, and when one passes that to -k one gets a string of
 keyval output a la:
 
 raj@tardy:~/netperf2_trunk$ src/netperf -t omni -j -v 2 -- -r 1 -d rr -k foo
 OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost
 (127.0.0.1) port 0 AF_INET : histogram
 THROUGHPUT=29454.12
 THROUGHPUT_UNITS=Trans/s
 RT_LATENCY=33.951
 MIN_LATENCY=19
 MEAN_LATENCY=32.00
 MAX_LATENCY=126
 P50_LATENCY=32
 P90_LATENCY=38
 P99_LATENCY=41
 STDDEV_LATENCY=5.46
 
 Histogram of request/response times
 UNIT_USEC :0:0:0:0:0:0:0:0:0:0
 TEN_USEC  :0: 3553: 45244: 237790: 7859:   86:   10:3:0:0
 HUNDRED_USEC  :0:2:0:0:0:0:0:0:0:0
 UNIT_MSEC :0:0:0:0:0:0:0:0:0:0
 TEN_MSEC  :0:0:0:0:0:0:0:0:0:0
 HUNDRED_MSEC  :0:0:0:0:0:0:0:0:0:0
 UNIT_SEC  :0:0:0:0:0:0:0:0:0:0
 TEN_SEC   :0:0:0:0:0:0:0:0:0:0
 100_SECS: 0
 HIST_TOTAL:  294547
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost: rcu annotation fixup

2011-01-18 Thread Paul E. McKenney

On Tue, Jan 18, 2011 at 10:10:31PM +0200, Michael S. Tsirkin wrote:
 On Tue, Jan 18, 2011 at 11:02:33AM -0800, Paul E. McKenney wrote:
  On Tue, Jan 18, 2011 at 07:55:00PM +0200, Michael S. Tsirkin wrote:
   On Tue, Jan 18, 2011 at 09:48:34AM -0800, Paul E. McKenney wrote:
On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
 When built with rcu checks enabled, vhost triggers
 bogus warnings as vhost features are read without
 dev-mutex sometimes.
 Fixing it properly is not trivial as vhost.h does not
 know which lockdep classes it will be used under.
 Disable the warning by stubbing out the check for now.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/vhost/vhost.h |4 +---
  1 files changed, 1 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
 index 2af44b7..2d03a31 100644
 --- a/drivers/vhost/vhost.h
 +++ b/drivers/vhost/vhost.h
 @@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct 
 vhost_dev *dev, int bit)
  {
   unsigned acked_features;
 
 - acked_features =
 - rcu_dereference_index_check(dev-acked_features,
 - 
 lockdep_is_held(dev-mutex));
 + acked_features = 
 rcu_dereference_index_check(dev-acked_features, 1);

Ouch!!!

Could you please at least add a comment?
   
   Yes, OK.
   
Alternatively, pass in the lock that is held and check for that?  Given
that this is a static inline, the compiler should be able to optimize
the argument away when !PROVE_RCU, correct?

Thanx, Paul
   
   Hopefully, yes. We don't always have a lock: the idea was
   to create a lockdep for these cases. But we can't pass
   the pointer to that ...
  
  I suppose you could pass a pointer to the lockdep map structure.
  Not sure if this makes sense, but it would handle the situation.
 
 Will it compile with lockdep disabled too? What will the pointer be?

One (crude) approach would be to make the pointer void* if lockdep
is disabled.

  Alternatively, create a helper function that checks the possibilities
  and screams if none of them are in effect.
  
  Thanx, Paul
 
 The problem here is the callee needs to know about all callers.

As does the guy reading the code.  ;-)

Thanx, Paul

   return acked_features  (1  bit);
  }
 
 -- 
 1.7.3.2.91.g446ac
 --
 To unsubscribe from this list: send the line unsubscribe 
 linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Flow Control and Port Mirroring Revisited

2011-01-18 Thread Rick Jones


Michael S. Tsirkin wrote:

On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:


PS - the enhanced latency statistics from -j are only available in
the omni version of the TCP_RR test.  To get that add a
--enable-omni to the ./configure - and in this case both netperf and
netserver have to be recompiled.


Is this TCP only? I would love to get latency data from UDP as well.


I believe it will work with UDP request response as well.  The omni test code 
strives to be protocol agnostic.  (I'm sure there are bugs of course, there 
always are.)


There is though the added complication of there being no specific matching of 
requests to responses.  The code as written takes advantage of TCP's in-order 
semantics and recovery from packet loss.  In a plain UDP_RR test, with one at 
a time transactions, if either the request or response are lost, data flow 
effectively stops there until the timer expires.  So, one has reasonable RTT 
numbers from before that point.  In a burst UDP RR test, the code doesn't know 
which request/response was lost and so the matching being done to get RTTs will 
be off by each lost datagram.  And if something were re-ordered the timstamps 
would be off even without a datagram loss event.


To fix that would require netperf do something it has not yet done in 18-odd 
years :)  That is actually echo something back from the netserver on the RR test 
- either an id, or a timestamp.  That means dirtying the buffers which means 
still more cache misses, from places other than the actual stack. Not beyond the 
realm of the possible, but it would be a bit of departure for normal operation 
(*) and could enforce a minimum request/response size beyond the present single 
byte (ok, perhaps only two or four bytes :).  But that, perhaps, is a discussion 
best left to netperf-talk at netperf.org.


happy benchmarking,

rick jones

(*) netperf does have the concept of reading from and/or dirtying buffers, 
put-in back in the days of COW/page-remapping in HP-UX 9.0, but that was mainly 
to force COW and/or show the effect of the required data cache purges/flushes. 
As such it was made conditional on DIRTY being defined.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/3] Removing scripts/check_image.py

2011-01-18 Thread Lucas Meneghel Rodrigues

This patchset aims to remove the script check_image.py from
KVM autotest. The reasoning behind this is that we want to
revamp all the pre/post scripts processing, that is, making
the py scripts part of the infrastructure, where they can
share API with the rest of the framework. So we'll proceed
by parts, this being the first.

Lucas Meneghel Rodrigues (3):
  KVM test: Introduce check_image postprocess directive
  KVM test: Modify enospc test to not require scripts/check_image.py
  KVM test: Removing scripts/check_image.py

 client/tests/kvm/kvm_preprocessing.py   |3 +-
 client/tests/kvm/kvm_vm.py  |   62 +++
 client/tests/kvm/scripts/check_image.py |   99 ---
 client/tests/kvm/tests/enospc.py|   12 ++--
 client/tests/kvm/tests_base.cfg.sample  |7 +-
 5 files changed, 73 insertions(+), 110 deletions(-)
 delete mode 100644 client/tests/kvm/scripts/check_image.py

-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] KVM test: Introduce check_image postprocess directive

2011-01-18 Thread Lucas Meneghel Rodrigues

With check_image_foo = yes, KVM autotest will use
qemu-img to perform checks on the qcow2 image.

This new functionality intends to replace the
script check_image.py. The plan is to supersede most
of the pre/post scripts in place throughout KVM
autotest.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/kvm_preprocessing.py  |3 +-
 client/tests/kvm/kvm_vm.py |   62 
 client/tests/kvm/tests_base.cfg.sample |7 ++--
 3 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py 
b/client/tests/kvm/kvm_preprocessing.py
index 56acf0c..4a6e0f8 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -93,11 +93,12 @@ def preprocess_vm(test, params, env, name):
 def postprocess_image(test, params):
 
 Postprocess a single QEMU image according to the instructions in params.
-Currently this function just removes an image if requested.
 
 @param test: An Autotest test object.
 @param params: A dict containing image postprocessing parameters.
 
+if params.get(check_image) == yes:
+kvm_vm.check_image(params, test.bindir)
 if params.get(remove_image) == yes:
 kvm_vm.remove_image(params, test.bindir)
 
diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 18d10ef..767c6d4 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -47,6 +47,15 @@ class VMImageMissingError(VMError):
 return CD image file not found: %r % self.filename
 
 
+class VMImageCheckError(VMError):
+def __init__(self, filename):
+VMError.__init__(self, filename)
+self.filename = filename
+
+def __str__(self):
+return Errors found on image: %r % self.filename
+
+
 class VMBadPATypeError(VMError):
 def __init__(self, pa_type):
 VMError.__init__(self, pa_type)
@@ -239,6 +248,59 @@ def remove_image(params, root_dir):
 logging.debug(Image file %s not found)
 
 
+def check_image(params, root_dir):
+
+Check an image using qemu-img.
+
+@param params: Dictionary containing the test parameters.
+@param root_dir: Base directory for relative filenames.
+
+@note: params should contain:
+   image_name -- the name of the image file, without extension
+   image_format -- the format of the image (qcow2, raw etc)
+
+image_filename = get_image_filename(params, root_dir)
+logging.debug(Checking image file %s... % image_filename)
+qemu_img_cmd = kvm_utils.get_path(root_dir,
+  params.get(qemu_img_binary, 
qemu-img))
+check_critical = params.get(check_image_critical) == 'yes'
+image_is_qcow2 = params.get(image_format) == 'qcow2'
+if os.path.exists(image_filename) and image_is_qcow2:
+# Verifying if qemu-img supports 'check'
+q_result = utils.run(qemu_img_cmd, ignore_status=True)
+q_output = q_result.stdout
+check_img = True
+if not check in q_output:
+logging.error(qemu-img does not support 'check', 
+  skipping check...)
+check_img = False
+if not info in q_output:
+logging.error(qemu-img does not support 'info', 
+  skipping check...)
+check_img = False
+if check_img:
+try:
+utils.system(%s info %s % (qemu_img_cmd, image_filename))
+except error.CmdError:
+logging.error(Error getting info from image %s,
+  image_filename)
+try:
+utils.system(%s check %s % (qemu_img_cmd, image_filename))
+except error.CmdError:
+if check_critical:
+raise VMImageCheckError(image_filename)
+else:
+logging.error(Error checking image %s, image_filename)
+
+else:
+if not os.path.exists(image_filename):
+logging.debug(Image file %s not found, skipping check...,
+  image_filename)
+elif not image_is_qcow2:
+logging.debug(Image file %s not qcow2, skipping check...,
+  image_filename)
+
+
 class VM:
 
 This class handles all basic VM operations.
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 28064a8..8b67256 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -2357,11 +2357,10 @@ kdump:
 variants:
 - @qcow2:
 image_format = qcow2
-post_command +=  python scripts/check_image.py;
-post_command_timeout = 600
-post_command_noncritical = yes
+check_image = yes
+check_image_critical = no
 ioquit:
-post_command_noncritical = no
+check_image_critical = yes
 - vmdk:
 no

[PATCH 2/3] KVM test: Modify enospc test to not require scripts/check_image.py

2011-01-18 Thread Lucas Meneghel Rodrigues

With this we prepare to remove the aforementioned script.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/tests/enospc.py |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/client/tests/kvm/tests/enospc.py b/client/tests/kvm/tests/enospc.py
index 2d8b628..4dd694f 100644
--- a/client/tests/kvm/tests/enospc.py
+++ b/client/tests/kvm/tests/enospc.py
@@ -1,7 +1,7 @@
 import logging, commands, time, os, re
 from autotest_lib.client.common_lib import error
 from autotest_lib.client.bin import utils
-import kvm_test_utils
+import kvm_test_utils, kvm_vm
 
 
 def run_enospc(test, params, env):
@@ -46,11 +46,11 @@ def run_enospc(test, params, env):
 if paused in status:
 pause_n += 1
 logging.info(Checking all images in use by the VM)
-script_path = os.path.join(test.bindir, scripts/check_image.py)
-try:
-cmd_result = utils.run('python %s' % script_path)
-except error.CmdError, e:
-logging.debug(e.result_obj.stdout)
+for image_name in vm.params.objects(images):
+image_params = vm.params.object_params(image_name)
+# Just to make sure the image check won't throw exceptions
+image_params[check_image_critical] = 'no'
+kvm_vm.check_image(image_params, test.bindir)
 logging.info(Guest paused, extending Logical Volume size)
 try:
 cmd_result = utils.run(lvextend -L +200M /dev/vgtest/lvtest)
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] KVM test: Removing scripts/check_image.py

2011-01-18 Thread Lucas Meneghel Rodrigues

As its functionality was implemented as part of the
framework on previous patches.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/scripts/check_image.py |   99 ---
 1 files changed, 0 insertions(+), 99 deletions(-)
 delete mode 100644 client/tests/kvm/scripts/check_image.py

diff --git a/client/tests/kvm/scripts/check_image.py 
b/client/tests/kvm/scripts/check_image.py
deleted file mode 100644
index 2b5c227..000
--- a/client/tests/kvm/scripts/check_image.py
+++ /dev/null
@@ -1,99 +0,0 @@
-import os, sys, commands
-
-
-class ImageCheckError(Exception):
-
-Simple wrapper for the builtin Exception class.
-
-pass
-
-
-class ImageCheck(object):
-
-Check qcow2 image by qemu-img info/check command.
-
-def __init__(self):
-
-Gets params from environment variables and sets class attributes.
-
-self.image_path_list = []
-client_dir =  os.environ['AUTODIR']
-self.kvm_dir = os.path.join(client_dir, 'tests/kvm')
-img_to_check = os.environ['KVM_TEST_images'].split()
-
-for img in img_to_check:
-img_name_str = KVM_TEST_image_name_%s % img
-if not os.environ.has_key(img_name_str):
-img_name_str = KVM_TEST_image_name
-img_format_str = KVM_TEST_image_format_%s % img
-if os.environ.has_key(img_format_str):
-image_format = os.environ[img_format_str]
-else:
-image_format = os.environ['KVM_TEST_image_format']
-if image_format != qcow2:
-continue
-image_name = os.environ[img_name_str]
-image_filename = %s.%s % (image_name, image_format)
-image_filename = os.path.join(self.kvm_dir, image_filename)
-self.image_path_list.append(image_filename)
-if os.environ.has_key('KVM_TEST_qemu_img_binary'):
-self.qemu_img_path = os.environ['KVM_TEST_qemu_img_binary']
-else:
-self.qemu_img_path = os.path.join(self.kvm_dir, 'qemu-img')
-self.qemu_img_check = True
-cmd = %s |grep check % self.qemu_img_path
-(s1, output) = commands.getstatusoutput(cmd)
-if s1:
-self.qemu_img_check = False
-print Command qemu-img check not available, not checking...
-cmd = %s |grep info % self.qemu_img_path
-(s2, output) = commands.getstatusoutput(cmd)
-if s2:
-self.qemu_img_check = False
-print Command qemu-img info not available, not checking...
-
-def exec_img_cmd(self, cmd_type, image_path):
-
-Run qemu-img info/check on given image.
-
-@param cmd_type: Sub command used together with qemu.
-@param image_path: Real path of the image.
-
-cmd = ' '.join([self.qemu_img_path, cmd_type, image_path])
-print Checking image with command %s % cmd
-(status, output) = commands.getstatusoutput(cmd)
-print output
-if status or (cmd_type == check and not No errors in output):
-msg = Command %s failed % cmd
-return False, msg
-else:
-return True, ''
-
-
-def check_image(self):
-
-Run qemu-img info/check to check the image in list.
-
-If the image checking is failed, raise an exception.
-
-# Check all the image in list.
-errmsg = []
-for image_path in self.image_path_list:
-if not os.path.exists(image_path):
-print Image %s does not exist! % image_path
-continue
-s, o = self.exec_img_cmd('info', image_path)
-if not s:
-errmsg.append(o)
-s, o = self.exec_img_cmd('check', image_path)
-if not s:
-errmsg.append(o)
-
-if len(errmsg)  0:
-raise ImageCheckError('Errors were found, please check log!')
-
-
-if __name__ == __main__:
-image_check = ImageCheck()
-if image_check.qemu_img_check:
-image_check.check_image()
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] [PATCH 1/3] KVM test: Introduce check_image postprocess directive

2011-01-18 Thread Amos Kong

- Original Message -
 With check_image_foo = yes, KVM autotest will use
 qemu-img to perform checks on the qcow2 image.
 
 This new functionality intends to replace the
 script check_image.py. The plan is to supersede most
 of the pre/post scripts in place throughout KVM
 autotest.

Good idea!
Some pre/post cmd/script is very confused, this would make thing clear.

 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com

I'll test patches when I go to office.

 ---
 client/tests/kvm/kvm_preprocessing.py | 3 +-
 client/tests/kvm/kvm_vm.py | 62 
 client/tests/kvm/tests_base.cfg.sample | 7 ++--
 3 files changed, 67 insertions(+), 5 deletions(-)
 
 diff --git a/client/tests/kvm/kvm_preprocessing.py
 b/client/tests/kvm/kvm_preprocessing.py
 index 56acf0c..4a6e0f8 100644
 --- a/client/tests/kvm/kvm_preprocessing.py
 +++ b/client/tests/kvm/kvm_preprocessing.py
 @@ -93,11 +93,12 @@ def preprocess_vm(test, params, env, name):
 def postprocess_image(test, params):
 
 Postprocess a single QEMU image according to the instructions in
 params.
 - Currently this function just removes an image if requested.
 
 @param test: An Autotest test object.
 @param params: A dict containing image postprocessing parameters.
 
 + if params.get(check_image) == yes:
 + kvm_vm.check_image(params, test.bindir)
 if params.get(remove_image) == yes:
 kvm_vm.remove_image(params, test.bindir)

It will fix the problem that check_image.py try to check some removed images.
 
 diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
 index 18d10ef..767c6d4 100755
 --- a/client/tests/kvm/kvm_vm.py
 +++ b/client/tests/kvm/kvm_vm.py
 @@ -47,6 +47,15 @@ class VMImageMissingError(VMError):
 return CD image file not found: %r % self.filename
 
 
 +class VMImageCheckError(VMError):
 + def __init__(self, filename):
 + VMError.__init__(self, filename)
 + self.filename = filename
 +
 + def __str__(self):
 + return Errors found on image: %r % self.filename
 +
 +
 class VMBadPATypeError(VMError):
 def __init__(self, pa_type):
 VMError.__init__(self, pa_type)
 @@ -239,6 +248,59 @@ def remove_image(params, root_dir):
 logging.debug(Image file %s not found)
 
 
 +def check_image(params, root_dir):
 + 
 + Check an image using qemu-img.
 +
 + @param params: Dictionary containing the test parameters.
 + @param root_dir: Base directory for relative filenames.
 +
 + @note: params should contain:
 + image_name -- the name of the image file, without extension
 + image_format -- the format of the image (qcow2, raw etc)
 + 
 + image_filename = get_image_filename(params, root_dir)
 + logging.debug(Checking image file %s... % image_filename)
 + qemu_img_cmd = kvm_utils.get_path(root_dir,
 + params.get(qemu_img_binary, qemu-img))
 + check_critical = params.get(check_image_critical) == 'yes'
 + image_is_qcow2 = params.get(image_format) == 'qcow2'
 + if os.path.exists(image_filename) and image_is_qcow2:
 + # Verifying if qemu-img supports 'check'
 + q_result = utils.run(qemu_img_cmd, ignore_status=True)
 + q_output = q_result.stdout
 + check_img = True
 + if not check in q_output:
 + logging.error(qemu-img does not support 'check', 
 + skipping check...)
 + check_img = False
 + if not info in q_output:
 + logging.error(qemu-img does not support 'info', 
 + skipping check...)
 + check_img = False
 + if check_img:
 + try:
 + utils.system(%s info %s % (qemu_img_cmd, image_filename))
 + except error.CmdError:
 + logging.error(Error getting info from image %s,
 + image_filename)
 + try:
 + utils.system(%s check %s % (qemu_img_cmd, image_filename))
 + except error.CmdError:
 + if check_critical:
 + raise VMImageCheckError(image_filename)
 + else:
 + logging.error(Error checking image %s, image_filename)
 +
 + else:
 + if not os.path.exists(image_filename):
 + logging.debug(Image file %s not found, skipping check...,
 + image_filename)
 + elif not image_is_qcow2:
 + logging.debug(Image file %s not qcow2, skipping check...,
 + image_filename)
 +
 +
 class VM:
 
 This class handles all basic VM operations.
 diff --git a/client/tests/kvm/tests_base.cfg.sample
 b/client/tests/kvm/tests_base.cfg.sample
 index 28064a8..8b67256 100644
 --- a/client/tests/kvm/tests_base.cfg.sample
 +++ b/client/tests/kvm/tests_base.cfg.sample
 @@ -2357,11 +2357,10 @@ kdump:
 variants:
 - @qcow2:
 image_format = qcow2
 - post_command +=  python scripts/check_image.py;
 - post_command_timeout = 600
 - post_command_noncritical = yes
 + check_image = yes
 + check_image_critical = no
 ioquit:
 - post_command_noncritical = no
 + check_image_critical = yes
 - vmdk:
 no ioquit
 only Fedora Ubuntu Windows
 --
 1.7.3.4
 
 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost: rcu annotation fixup

2011-01-18 Thread Mel Gorman

On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
 When built with rcu checks enabled, vhost triggers
 bogus warnings as vhost features are read without
 dev-mutex sometimes.
 Fixing it properly is not trivial as vhost.h does not
 know which lockdep classes it will be used under.
 Disable the warning by stubbing out the check for now.
 

What is the harm in leaving the bogus warnings until the difficult fix
happens? RCU checks enabled does not seem like something that is enabled
in production. If this patch is applied, there is always the risk that
it'll be simply forgotten about.

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] Removing scripts/hugepage.py

2011-01-18 Thread Lucas Meneghel Rodrigues

Continuing on our quest of obsoleting and removing the pre/post
scripts, this time we remove hugepage.py for a slightly better
and cleaner framework implementation.

Lucas Meneghel Rodrigues (2):
  KVM test: Turning hugepages setup into KVM autotest infrastructure
  KVM test: Removing scripts/hugepage.py

 client/tests/kvm/kvm_preprocessing.py  |8 ++
 client/tests/kvm/kvm_utils.py  |  102 +++
 client/tests/kvm/scripts/hugepage.py   |  118 
 client/tests/kvm/tests_base.cfg.sample |3 +-
 4 files changed, 111 insertions(+), 120 deletions(-)
 delete mode 100755 client/tests/kvm/scripts/hugepage.py

-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] KVM test: Turning hugepages setup into KVM autotest infrastructure

2011-01-18 Thread Lucas Meneghel Rodrigues

So we can get rid of scripts/hugepage.py. The implementation
strategy is to have a kvm_utils.HugePageConfig class that
can do both hugepages setup and cleanup, and call it during
pre/postprocessing.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/kvm_preprocessing.py  |8 +++
 client/tests/kvm/kvm_utils.py  |  102 
 client/tests/kvm/tests_base.cfg.sample |3 +-
 3 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py 
b/client/tests/kvm/kvm_preprocessing.py
index 4a6e0f8..41455cf 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -254,6 +254,10 @@ def preprocess(test, params, env):
 logging.debug(KVM userspace version: %s % kvm_userspace_version)
 test.write_test_keyval({kvm_userspace_version: kvm_userspace_version})
 
+if params.get(setup_hugepages) == yes:
+h = kvm_utils.HugePageConfig(params)
+h.setup()
+
 # Execute any pre_commands
 if params.get(pre_command):
 process_command(test, params, env, params.get(pre_command),
@@ -350,6 +354,10 @@ def postprocess(test, params, env):
 env[tcpdump].close()
 del env[tcpdump]
 
+if params.get(setup_hugepages) == yes:
+h = kvm_utils.HugePageConfig(params)
+h.cleanup()
+
 # Execute any post_commands
 if params.get(post_command):
 process_command(test, params, env, params.get(post_command),
diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 78c9f25..b61ff94 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -1265,6 +1265,108 @@ class KvmLoggingConfig(logging_config.LoggingConfig):
 verbose=verbose)
 
 
+class HugePageConfig:
+def __init__(self, params):
+
+Gets environment variable values and calculates the target number
+of huge memory pages.
+
+@param params: Dict like object containing parameters for the test.
+
+self.vms = len(params.objects(vms))
+self.mem = int(params.get(mem))
+self.max_vms = int(params.get(max_vms, 0))
+self.hugepage_path = '/mnt/kvm_hugepage'
+self.hugepage_size = self.get_hugepage_size()
+self.target_hugepages = self.get_target_hugepages()
+self.kernel_hp_file = '/proc/sys/vm/nr_hugepages'
+
+
+def get_hugepage_size(self):
+
+Get the current system setting for huge memory page size.
+
+meminfo = open('/proc/meminfo', 'r').readlines()
+huge_line_list = [h for h in meminfo if h.startswith(Hugepagesize)]
+try:
+return int(huge_line_list[0].split()[1])
+except ValueError, e:
+raise ValueError(Could not get huge page size setting from 
+ /proc/meminfo: %s % e)
+
+
+def get_target_hugepages(self):
+
+Calculate the target number of hugepages for testing purposes.
+
+if self.vms  self.max_vms:
+self.vms = self.max_vms
+# memory of all VMs plus qemu overhead of 64MB per guest
+vmsm = (self.vms * self.mem) + (self.vms * 64)
+return int(vmsm * 1024 / self.hugepage_size)
+
+
+@error.context_aware
+def set_hugepages(self):
+
+Sets the hugepage limit to the target hugepage value calculated.
+
+error.base_context(Setting hugepages limit to %s %
+   self.target_hugepages)
+hugepage_cfg = open(self.kernel_hp_file, r+)
+hp = hugepage_cfg.readline()
+while int(hp)  self.target_hugepages:
+loop_hp = hp
+hugepage_cfg.write(str(self.target_hugepages))
+hugepage_cfg.flush()
+hugepage_cfg.seek(0)
+hp = int(hugepage_cfg.readline())
+if loop_hp == hp:
+raise ValueError(Cannot set the kernel hugepage setting 
+ to the target value of %d hugepages. %
+ self.target_hugepages)
+hugepage_cfg.close()
+logging.debug(Successfuly set %s large memory pages on host ,
+  self.target_hugepages)
+
+
+@error.context_aware
+def mount_hugepage_fs(self):
+
+Verify if there's a hugetlbfs mount set. If there's none, will set up
+a hugetlbfs mount using the class attribute that defines the mount
+point.
+
+error.base_context(Mounting hugepages path)
+if not os.path.ismount(self.hugepage_path):
+if not os.path.isdir(self.hugepage_path):
+os.makedirs(self.hugepage_path)
+cmd = mount -t hugetlbfs none %s % self.hugepage_path
+utils.system(cmd)
+
+
+def setup(self):
+logging.debug(Number of VMs this test will use: %d, self.vms)
+

[PATCH 2/2] KVM test: Removing scripts/hugepage.py

2011-01-18 Thread Lucas Meneghel Rodrigues

Now that its functionality has been reimplemented as
KVM test infrastructure.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/scripts/hugepage.py |  118 --
 1 files changed, 0 insertions(+), 118 deletions(-)
 delete mode 100755 client/tests/kvm/scripts/hugepage.py

diff --git a/client/tests/kvm/scripts/hugepage.py 
b/client/tests/kvm/scripts/hugepage.py
deleted file mode 100755
index 8a1b0f6..000
--- a/client/tests/kvm/scripts/hugepage.py
+++ /dev/null
@@ -1,118 +0,0 @@
-#!/usr/bin/python
-# -*- coding: utf-8 -*-
-import os, sys, time
-
-
-Simple script to allocate enough hugepages for KVM testing purposes.
-
-
-class HugePageError(Exception):
-
-Simple wrapper for the builtin Exception class.
-
-pass
-
-
-class HugePage:
-def __init__(self, hugepage_path=None):
-
-Gets environment variable values and calculates the target number
-of huge memory pages.
-
-@param hugepage_path: Path where to mount hugetlbfs path, if not
-yet configured.
-
-self.vms = len(os.environ['KVM_TEST_vms'].split())
-self.mem = int(os.environ['KVM_TEST_mem'])
-try:
-self.max_vms = int(os.environ['KVM_TEST_max_vms'])
-except KeyError:
-self.max_vms = 0
-
-if hugepage_path:
-self.hugepage_path = hugepage_path
-else:
-self.hugepage_path = '/mnt/kvm_hugepage'
-
-self.hugepage_size = self.get_hugepage_size()
-self.target_hugepages = self.get_target_hugepages()
-print Number of VMs this test will use: %d % self.vms
-print Amount of memory used by each vm: %s % self.mem
-print (System setting for large memory page size: %s %
-   self.hugepage_size)
-print (Number of large memory pages needed for this test: %s %
-   self.target_hugepages)
-
-
-def get_hugepage_size(self):
-
-Get the current system setting for huge memory page size.
-
-meminfo = open('/proc/meminfo', 'r').readlines()
-huge_line_list = [h for h in meminfo if h.startswith(Hugepagesize)]
-try:
-return int(huge_line_list[0].split()[1])
-except ValueError, e:
-raise HugePageError(Could not get huge page size setting from 
-/proc/meminfo: %s % e)
-
-
-def get_target_hugepages(self):
-
-Calculate the target number of hugepages for testing purposes.
-
-if self.vms  self.max_vms:
-self.vms = self.max_vms
-# memory of all VMs plus qemu overhead of 64MB per guest
-vmsm = (self.vms * self.mem) + (self.vms * 64)
-return int(vmsm * 1024 / self.hugepage_size)
-
-
-def set_hugepages(self):
-
-Sets the hugepage limit to the target hugepage value calculated.
-
-hugepage_cfg = open(/proc/sys/vm/nr_hugepages, r+)
-hp = hugepage_cfg.readline()
-while int(hp)  self.target_hugepages:
-loop_hp = hp
-hugepage_cfg.write(str(self.target_hugepages))
-hugepage_cfg.flush()
-hugepage_cfg.seek(0)
-hp = int(hugepage_cfg.readline())
-if loop_hp == hp:
-raise HugePageError(Cannot set the kernel hugepage setting 
-to the target value of %d hugepages. %
-self.target_hugepages)
-hugepage_cfg.close()
-print (Successfuly set %s large memory pages on host  %
-   self.target_hugepages)
-
-
-def mount_hugepage_fs(self):
-
-Verify if there's a hugetlbfs mount set. If there's none, will set up
-a hugetlbfs mount using the class attribute that defines the mount
-point.
-
-if not os.path.ismount(self.hugepage_path):
-if not os.path.isdir(self.hugepage_path):
-os.makedirs(self.hugepage_path)
-cmd = mount -t hugetlbfs none %s % self.hugepage_path
-if os.system(cmd):
-raise HugePageError(Cannot mount hugetlbfs path %s %
-self.hugepage_path)
-
-
-def setup(self):
-self.set_hugepages()
-self.mount_hugepage_fs()
-
-
-if __name__ == __main__:
-if len(sys.argv)  2:
-huge_page = HugePage()
-else:
-huge_page = HugePage(sys.argv[1])
-
-huge_page.setup()
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

buildbot failure in kvm on ia64

2011-01-18 Thread kvm

The Buildbot has detected a new failure of ia64 on kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/ia64/builds/67

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_master' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 1/3] KVM: fix rcu usage warning in kvm_arch_vcpu_ioctl_set_sregs()

2011-01-18 Thread Xiao Guangrong

On 01/12/2011 03:39 PM, Xiao Guangrong wrote:
 Fix:
 
 [ 1001.499596] ===
 [ 1001.499599] [ INFO: suspicious rcu_dereference_check() usage. ]
 [ 1001.499601] ---
 [ 1001.499604] include/linux/kvm_host.h:301 invoked rcu_dereference_check() 
 without protection!
   ..
 [ 1001.499636] Pid: 6035, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #62
 [ 1001.499638] Call Trace:
 [ 1001.499644]  [] lockdep_rcu_dereference+0x9d/0xa5
 [ 1001.499653]  [] gfn_to_memslot+0x8d/0xc8 [kvm]
 [ 1001.499661]  [] gfn_to_hva+0x16/0x3f [kvm]
 [ 1001.499669]  [] kvm_read_guest_page+0x1e/0x5e [kvm]
 [ 1001.499681]  [] kvm_read_guest_page_mmu+0x53/0x5e [kvm]
 [ 1001.499699]  [] load_pdptrs+0x3f/0x9c [kvm]
 [ 1001.499705]  [] ? vmx_set_cr0+0x507/0x517 [kvm_intel]
 [ 1001.499717]  [] kvm_arch_vcpu_ioctl_set_sregs+0x1f3/0x3c0 [kvm]
 [ 1001.499727]  [] kvm_vcpu_ioctl+0x6a5/0xbc5 [kvm]
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

Ping ...?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vhost: rcu annotation fixup

2011-01-18 Thread Michael S. Tsirkin

On Wed, Jan 19, 2011 at 12:40:40AM +, Mel Gorman wrote:
 On Tue, Jan 18, 2011 at 01:08:45PM +0200, Michael S. Tsirkin wrote:
  When built with rcu checks enabled, vhost triggers
  bogus warnings as vhost features are read without
  dev-mutex sometimes.
  Fixing it properly is not trivial as vhost.h does not
  know which lockdep classes it will be used under.
  Disable the warning by stubbing out the check for now.
  
 
 What is the harm in leaving the bogus warnings until the difficult fix
 happens? RCU checks enabled does not seem like something that is enabled
 in production.

I would like to run with rcu checks enabled sometimes to debug kvm,
which has an elaborate rcu strategy.  Bogus warnings in the log
make it easy to overlook the real ones. Further, the rcu macros
used are a form of documentation. If we have
-   rcu_dereference_index_check(dev-acked_features,
-   lockdep_is_held(dev-mutex));
this means 'taken in rcu read side critical section or under mutex',

+   acked_features = rcu_dereference_index_check(dev-acked_features, 1);
means 'not checked'.

 If this patch is applied, there is always the risk that
 it'll be simply forgotten about.

Well, that's why I put in a TODO.
If there's a demand for that, I can add a Kconfig option to
trigger a warning at each unchecked rcu call in vhost-net
but I doubt it'll get a lof of use :)


 -- 
 Mel Gorman
 Part-time Phd Student  Linux Technology Center
 University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 00/19] Kemari for KVM v0.2.6

2011-01-18 Thread Yoshiaki Tamura

Hi,

This patch series is a revised version of Kemari for KVM, which
applied comments for the previous post.  The current code is based on
qemu.git d03d11260ee2d55579e8b76116e35ccdf5031833.

The changes from v0.2.5 - v0.2.6 are:

- use qemu_{put,get}_be32() to save/load niov in event-tap

The changes from v0.2.4 - v0.2.5 are:

- fixed braces and trailing spaces by using Blue's checkpatch.pl (Blue)
- event-tap: don't try to send blk_req if it's a bdrv_aio_flush event

The changes from v0.2.3 - v0.2.4 are:

- call vm_start() before event_tap_flush_one() to avoid failure in
  virtio-net assertion
- add vm_change_state_handler to turn off ft_mode
- use qemu_iovec functions in event-tap
- remove duplicated code in migration
- remove unnecessary new line for error_report in ft_trans_file

The changes from v0.2.2 - v0.2.3 are:

- queue async net requests without copying (MST)
-- if not async, contents of the packets are sent to the secondary
- better description for option -k (MST)
- fix memory transfer failure
- fix ft transaction initiation failure

The changes from v0.2.1 - v0.2.2 are:

- decrement last_avaid_idx with inuse before saving (MST)
- remove qemu_aio_flush() and bdrv_flush_all() in migrate_ft_trans_commit()

The changes from v0.2 - v0.2.1 are:

- Move event-tap to net/block layer and use stubs (Blue, Paul, MST, Kevin)
- Tap bdrv_aio_flush (Marcelo)
- Remove multiwrite interface in event-tap (Stefan)
- Fix event-tap to use pio/mmio to replay both net/block (Stefan)
- Improve error handling in event-tap (Stefan)
- Fix leak in event-tap (Stefan)
- Revise virtio last_avail_idx manipulation (MST)
- Clean up migration.c hook (Marcelo)
- Make deleting change state handler robust (Isaku, Anthony)

The changes from v0.1.1 - v0.2 are:

- Introduce a queue in event-tap to make VM sync live.
- Change transaction receiver to a state machine for async receiving.
- Replace net/block layer functions with event-tap proxy functions.
- Remove dirty bitmap optimization for now.
- convert DPRINTF() in ft_trans_file to trace functions.
- convert fprintf() in ft_trans_file to error_report().
- improved error handling in ft_trans_file.
- add a tmp pointer to qemu_del_vm_change_state_handler.

The changes from v0.1 - v0.1.1 are:

- events are tapped in net/block layer instead of device emulation layer.
- Introduce a new option for -incoming to accept FT transaction.

- Removed writev() support to QEMUFile and FdMigrationState for now.
  I would post this work in a different series.

- Modified virtio-blk save/load handler to send inuse variable to
  correctly replay.

- Removed configure --enable-ft-mode.
- Removed unnecessary check for qemu_realloc().

The first 6 patches modify several functions of qemu to prepare
introducing Kemari specific components.

The next 6 patches are the components of Kemari.  They introduce
event-tap and the FT transaction protocol file based on buffered file.
The design document of FT transaction protocol can be found at,
http://wiki.qemu.org/images/b/b1/Kemari_sender_receiver_0.5a.pdf

Then the following 2 patches modifies net/block layer functions with
event-tap functions.  Please note that if Kemari is off, event-tap
will just passthrough, and there is most no intrusion to exisiting
functions including normal live migration.

Finally, the migration layer are modified to support Kemari in the
last 5 patches.  Again, there shouldn't be any affection if a user
doesn't specify Kemari specific options.  The transaction is now async
on both sender and receiver side.  The sender side respects the
max_downtime to decide when to switch from async to sync mode.

The repository contains all patches I'm sending with this message.
For those who want to try, please pull the following repository.  It
also includes dirty bitmap optimization which aren't ready for posting
yet.  To remove the dirty bitmap optimization, please look at HEAD~5
of the tree.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari next

Thanks,

Yoshi

Yoshiaki Tamura (19):
  Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and
qemu_clear_buffer().
  Introduce read() to FdMigrationState.
  Introduce skip_header parameter to qemu_loadvm_state().
  qemu-char: export socket_set_nodelay().
  vl.c: add deleted flag for deleting the handler.
  virtio: decrement last_avail_idx with inuse before saving.
  Introduce fault tolerant VM transaction QEMUFile and ft_mode.
  savevm: introduce util functions to control ft_trans_file from savevm
layer.
  Introduce event-tap.
  Call init handler of event-tap at main() in vl.c.
  ioport: insert event_tap_ioport() to ioport_write().
  Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.
  net: insert event-tap to qemu_send_packet() and
qemu_sendv_packet_async().
  block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().
  savevm: introduce qemu_savevm_trans_{begin,commit}.
  migration: introduce migrate_ft_trans_{put,get}_ready(), and modify

[PATCH 02/19] Introduce read() to FdMigrationState.

2011-01-18 Thread Yoshiaki Tamura

Currently FdMigrationState doesn't support read(), and this patch
introduces it to get response from the other side.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   15 +++
 migration.c |   13 +
 migration.h |3 +++
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index b55f419..55777c8 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -39,6 +39,20 @@ static int socket_write(FdMigrationState *s, const void * 
buf, size_t size)
 return send(s-fd, buf, size, 0);
 }
 
+static int socket_read(FdMigrationState *s, const void * buf, size_t size)
+{
+ssize_t len;
+
+do {
+len = recv(s-fd, (void *)buf, size, 0);
+} while (len == -1  socket_error() == EINTR);
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF(tcp_close\n);
@@ -94,6 +108,7 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 
 s-get_error = socket_errno;
 s-write = socket_write;
+s-read = socket_read;
 s-close = tcp_close;
 s-mig_state.cancel = migrate_fd_cancel;
 s-mig_state.get_status = migrate_fd_get_status;
diff --git a/migration.c b/migration.c
index e5ba51c..4a749bb 100644
--- a/migration.c
+++ b/migration.c
@@ -330,6 +330,19 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void 
*data, size_t size)
 return ret;
 }
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t 
size)
+{
+FdMigrationState *s = opaque;
+int ret;
+
+ret = s-read(s, data, size);
+if (ret == -1) {
+ret = -(s-get_error(s));
+}
+
+return ret;
+}
+
 void migrate_fd_connect(FdMigrationState *s)
 {
 int ret;
diff --git a/migration.h b/migration.h
index d13ed4f..7bf6747 100644
--- a/migration.h
+++ b/migration.h
@@ -47,6 +47,7 @@ struct FdMigrationState
 int (*get_error)(struct FdMigrationState*);
 int (*close)(struct FdMigrationState*);
 int (*write)(struct FdMigrationState*, const void *, size_t);
+int (*read)(struct FdMigrationState *, const void *, size_t);
 void *opaque;
 };
 
@@ -115,6 +116,8 @@ void migrate_fd_put_notify(void *opaque);
 
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size);
 
+int migrate_fd_get_buffer(void *opaque, uint8_t *data, int64_t pos, size_t 
size);
+
 void migrate_fd_connect(FdMigrationState *s);
 
 void migrate_fd_put_ready(void *opaque);
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 12/19] Insert event_tap_mmio() to cpu_physical_memory_rw() in exec.c.

2011-01-18 Thread Yoshiaki Tamura

Record mmio write event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index 49c28b1..4a171cc 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include osdep.h
 #include kvm.h
 #include qemu-timer.h
+#include event-tap.h
 #if defined(CONFIG_USER_ONLY)
 #include qemu.h
 #include signal.h
@@ -3625,6 +3626,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, 
uint8_t *buf,
 io_index = (pd  IO_MEM_SHIFT)  (IO_MEM_NB_ENTRIES - 1);
 if (p)
 addr1 = (addr  ~TARGET_PAGE_MASK) + p-region_offset;
+
+event_tap_mmio(addr, buf, len);
+
 /* XXX: could force cpu_single_env to NULL to avoid
potential bugs */
 if (l = 4  ((addr1  3) == 0)) {
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/19] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2011-01-18 Thread Yoshiaki Tamura

This code implements VM transaction protocol.  Like buffered_file, it
sits between savevm and migration layer.  With this architecture, VM
transaction protocol is implemented mostly independent from other
existing code.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.objs   |1 +
 ft_trans_file.c |  624 +++
 ft_trans_file.h |   72 +++
 migration.c |3 +
 trace-events|   16 ++
 5 files changed, 716 insertions(+), 0 deletions(-)
 create mode 100644 ft_trans_file.c
 create mode 100644 ft_trans_file.h

diff --git a/Makefile.objs b/Makefile.objs
index c3e52c5..de38579 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -100,6 +100,7 @@ common-obj-y += msmouse.o ps2.o
 common-obj-y += qdev.o qdev-properties.o
 common-obj-y += block-migration.o
 common-obj-y += pflib.o
+common-obj-y += ft_trans_file.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/ft_trans_file.c b/ft_trans_file.c
new file mode 100644
index 000..2b42b95
--- /dev/null
+++ b/ft_trans_file.c
@@ -0,0 +1,624 @@
+/*
+ * Fault tolerant VM transaction QEMUFile
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * This source code is based on buffered_file.c.
+ * Copyright IBM, Corp. 2008
+ * Authors:
+ *  Anthony Liguorialigu...@us.ibm.com
+ */
+
+#include qemu-common.h
+#include qemu-error.h
+#include hw/hw.h
+#include qemu-timer.h
+#include sysemu.h
+#include qemu-char.h
+#include trace.h
+#include ft_trans_file.h
+
+typedef struct FtTransHdr
+{
+uint16_t cmd;
+uint16_t id;
+uint32_t seq;
+uint32_t payload_len;
+} FtTransHdr;
+
+typedef struct QEMUFileFtTrans
+{
+FtTransPutBufferFunc *put_buffer;
+FtTransGetBufferFunc *get_buffer;
+FtTransPutReadyFunc *put_ready;
+FtTransGetReadyFunc *get_ready;
+FtTransWaitForUnfreezeFunc *wait_for_unfreeze;
+FtTransCloseFunc *close;
+void *opaque;
+QEMUFile *file;
+
+enum QEMU_VM_TRANSACTION_STATE state;
+uint32_t seq;
+uint16_t id;
+
+int has_error;
+
+bool freeze_output;
+bool freeze_input;
+bool rate_limit;
+bool is_sender;
+bool is_payload;
+
+uint8_t *buf;
+size_t buf_max_size;
+size_t put_offset;
+size_t get_offset;
+
+FtTransHdr header;
+size_t header_offset;
+} QEMUFileFtTrans;
+
+#define IO_BUF_SIZE 32768
+
+static void ft_trans_append(QEMUFileFtTrans *s,
+const uint8_t *buf, size_t size)
+{
+if (size  (s-buf_max_size - s-put_offset)) {
+trace_ft_trans_realloc(s-buf_max_size, size + 1024);
+s-buf_max_size += size + 1024;
+s-buf = qemu_realloc(s-buf, s-buf_max_size);
+}
+
+trace_ft_trans_append(size);
+memcpy(s-buf + s-put_offset, buf, size);
+s-put_offset += size;
+}
+
+static void ft_trans_flush(QEMUFileFtTrans *s)
+{
+size_t offset = 0;
+
+if (s-has_error) {
+error_report(flush when error %d, bailing, s-has_error);
+return;
+}
+
+while (offset  s-put_offset) {
+ssize_t ret;
+
+ret = s-put_buffer(s-opaque, s-buf + offset, s-put_offset - 
offset);
+if (ret == -EAGAIN) {
+break;
+}
+
+if (ret = 0) {
+error_report(error flushing data, %s, strerror(errno));
+s-has_error = FT_TRANS_ERR_FLUSH;
+break;
+} else {
+offset += ret;
+}
+}
+
+trace_ft_trans_flush(offset, s-put_offset);
+memmove(s-buf, s-buf + offset, s-put_offset - offset);
+s-put_offset -= offset;
+s-freeze_output = !!s-put_offset;
+}
+
+static ssize_t ft_trans_put(void *opaque, void *buf, int size)
+{
+QEMUFileFtTrans *s = opaque;
+size_t offset = 0;
+ssize_t len;
+
+/* flush buffered data before putting next */
+if (s-put_offset) {
+ft_trans_flush(s);
+}
+
+while (!s-freeze_output  offset  size) {
+len = s-put_buffer(s-opaque, (uint8_t *)buf + offset, size - offset);
+
+if (len == -EAGAIN) {
+trace_ft_trans_freeze_output();
+s-freeze_output = 1;
+break;
+}
+
+if (len = 0) {
+error_report(putting data failed, %s, strerror(errno));
+s-has_error = 1;
+offset = -EINVAL;
+break;
+}
+
+offset += len;
+}
+
+if (s-freeze_output) {
+ft_trans_append(s, buf + offset, size - offset);
+offset = size;
+}
+
+return offset;
+}
+
+static int ft_trans_send_header(QEMUFileFtTrans *s,
+enum QEMU_VM_TRANSACTION_STATE state,
+uint32_t payload_len)
+{
+int ret;
+FtTransHdr

[PATCH 18/19] Introduce -k option to enable FT migration mode (Kemari).

2011-01-18 Thread Yoshiaki Tamura

When -k option is set to migrate command, it will turn on ft_mode to
start FT migration mode (Kemari).

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hmp-commands.hx |7 ---
 migration.c |4 
 qmp-commands.hx |7 ---
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 1cea572..b7f8f2f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -735,13 +735,14 @@ ETEXI
 
 {
 .name   = migrate,
-.args_type  = detach:-d,blk:-b,inc:-i,uri:s,
-.params = [-d] [-b] [-i] uri,
+.args_type  = detach:-d,blk:-b,inc:-i,ft:-k,uri:s,
+.params = [-d] [-b] [-i] [-k] uri,
 .help   = migrate to URI (using -d to not wait for completion)
  \n\t\t\t -b for migration without shared storage with
   full copy of disk\n\t\t\t -i for migration without 
  shared storage with incremental copy of disk 
- (base image shared between src and destination),
+ (base image shared between src and destination)
+ \n\t\t\t -k for Fault Tolerance mode (Kemari protocol),
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
diff --git a/migration.c b/migration.c
index fb73b2d..11bbdf8 100644
--- a/migration.c
+++ b/migration.c
@@ -92,6 +92,10 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 return -1;
 }
 
+if (qdict_get_try_bool(qdict, ft, 0)) {
+ft_mode = FT_INIT;
+}
+
 if (strstart(uri, tcp:, p)) {
 s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
  blk, inc);
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 56c4d8b..1521931 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -431,13 +431,14 @@ EQMP
 
 {
 .name   = migrate,
-.args_type  = detach:-d,blk:-b,inc:-i,uri:s,
-.params = [-d] [-b] [-i] uri,
+.args_type  = detach:-d,blk:-b,inc:-i,ft:-k,uri:s,
+.params = [-d] [-b] [-i] [-k] uri,
 .help   = migrate to URI (using -d to not wait for completion)
  \n\t\t\t -b for migration without shared storage with
   full copy of disk\n\t\t\t -i for migration without 
  shared storage with incremental copy of disk 
- (base image shared between src and destination),
+ (base image shared between src and destination)
+ \n\t\t\t -k for Fault Tolerance mode (Kemari protocol),
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 19/19] migration: add a parser to accept FT migration incoming mode.

2011-01-18 Thread Yoshiaki Tamura

The option looks like, -incoming protocol:address:port,ft_mode

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/migration.c b/migration.c
index 11bbdf8..7275f02 100644
--- a/migration.c
+++ b/migration.c
@@ -45,6 +45,12 @@ int qemu_start_incoming_migration(const char *uri)
 const char *p;
 int ret;
 
+/* check ft_mode option  */
+p = strstr(uri, ft_mode);
+if (p  !strcmp(p, ft_mode)) {
+ft_mode = FT_INIT;
+}
+
 if (strstart(uri, tcp:, p))
 ret = tcp_start_incoming_migration(p);
 #if !defined(WIN32)
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 09/19] Introduce event-tap.

2011-01-18 Thread Yoshiaki Tamura

event-tap controls when to start FT transaction, and provides proxy
functions to called from net/block devices.  While FT transaction, it
queues up net/block requests, and flush them when the transaction gets
completed.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.target |1 +
 event-tap.c |  847 +++
 event-tap.h |   42 +++
 qemu-tool.c |   24 ++
 trace-events|9 +
 5 files changed, 923 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h

diff --git a/Makefile.target b/Makefile.target
index e15b1c4..f36cd75 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -199,6 +199,7 @@ obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
 LIBS+=-lz
+obj-y += event-tap.o
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
diff --git a/event-tap.c b/event-tap.c
new file mode 100644
index 000..f492708
--- /dev/null
+++ b/event-tap.c
@@ -0,0 +1,847 @@
+/*
+ * Event Tap functions for QEMU
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include qemu-common.h
+#include qemu-error.h
+#include block.h
+#include block_int.h
+#include ioport.h
+#include osdep.h
+#include sysemu.h
+#include hw/hw.h
+#include net.h
+#include event-tap.h
+#include trace.h
+
+enum EVENT_TAP_STATE {
+EVENT_TAP_OFF,
+EVENT_TAP_ON,
+EVENT_TAP_FLUSH,
+EVENT_TAP_LOAD,
+EVENT_TAP_REPLAY,
+};
+
+static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
+static BlockDriverAIOCB dummy_acb; /* we may need a pool for dummies */
+
+typedef struct EventTapIOport {
+uint32_t address;
+uint32_t data;
+int  index;
+} EventTapIOport;
+
+#define MMIO_BUF_SIZE 8
+
+typedef struct EventTapMMIO {
+uint64_t address;
+uint8_t  buf[MMIO_BUF_SIZE];
+int  len;
+} EventTapMMIO;
+
+typedef struct EventTapNetReq {
+char *device_name;
+int iovcnt;
+struct iovec *iov;
+int vlan_id;
+bool vlan_needed;
+bool async;
+NetPacketSent *sent_cb;
+} EventTapNetReq;
+
+#define MAX_BLOCK_REQUEST 32
+
+typedef struct EventTapBlkReq {
+char *device_name;
+int num_reqs;
+int num_cbs;
+bool is_flush;
+BlockRequest reqs[MAX_BLOCK_REQUEST];
+BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST];
+void *opaque[MAX_BLOCK_REQUEST];
+} EventTapBlkReq;
+
+#define EVENT_TAP_IOPORT (1  0)
+#define EVENT_TAP_MMIO   (1  1)
+#define EVENT_TAP_NET(1  2)
+#define EVENT_TAP_BLK(1  3)
+
+#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
+
+typedef struct EventTapLog {
+int mode;
+union {
+EventTapIOport ioport;
+EventTapMMIO mmio;
+};
+union {
+EventTapNetReq net_req;
+EventTapBlkReq blk_req;
+};
+QTAILQ_ENTRY(EventTapLog) node;
+} EventTapLog;
+
+static EventTapLog *last_event_tap;
+
+static QTAILQ_HEAD(, EventTapLog) event_list;
+static QTAILQ_HEAD(, EventTapLog) event_pool;
+
+static int (*event_tap_cb)(void);
+static QEMUBH *event_tap_bh;
+static VMChangeStateEntry *vmstate;
+
+static void event_tap_bh_cb(void *p)
+{
+if (event_tap_cb) {
+event_tap_cb();
+}
+
+qemu_bh_delete(event_tap_bh);
+event_tap_bh = NULL;
+}
+
+static void event_tap_schedule_bh(void)
+{
+trace_event_tap_ignore_bh(!!event_tap_bh);
+
+/* if bh is already set, we ignore it for now */
+if (event_tap_bh) {
+return;
+}
+
+event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
+qemu_bh_schedule(event_tap_bh);
+
+return ;
+}
+
+static void event_tap_alloc_net_req(EventTapNetReq *net_req,
+   VLANClientState *vc,
+   const struct iovec *iov, int iovcnt,
+   NetPacketSent *sent_cb, bool async)
+{
+int i;
+
+net_req-iovcnt = iovcnt;
+net_req-async = async;
+net_req-device_name = qemu_strdup(vc-name);
+net_req-sent_cb = sent_cb;
+
+if (vc-vlan) {
+net_req-vlan_needed = 1;
+net_req-vlan_id = vc-vlan-id;
+} else {
+net_req-vlan_needed = 0;
+}
+
+if (async) {
+net_req-iov = (struct iovec *)iov;
+} else {
+net_req-iov = qemu_malloc(sizeof(struct iovec) * iovcnt);
+for (i = 0; i  iovcnt; i++) {
+net_req-iov[i].iov_base = qemu_malloc(iov[i].iov_len);
+memcpy(net_req-iov[i].iov_base, iov[i].iov_base, iov[i].iov_len);
+net_req-iov[i].iov_len = iov[i].iov_len;
+}
+}
+}
+
+static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
+BlockDriverState *bs, BlockRequest *reqs,
+int num_reqs, BlockDriverCompletionFunc

[PATCH 10/19] Call init handler of event-tap at main() in vl.c.

2011-01-18 Thread Yoshiaki Tamura

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 8bbb785..9faeb27 100644
--- a/vl.c
+++ b/vl.c
@@ -162,6 +162,7 @@ int main(int argc, char **argv)
 #include qemu-queue.h
 #include cpus.h
 #include arch_init.h
+#include event-tap.h
 
 #include ui/qemu-spice.h
 
@@ -2895,6 +2896,8 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 
+event_tap_init();
+
 if (default_cdrom) {
 /* we always create the cdrom drive, even if no disk is there */
 drive_add(NULL, CDROM_ALIAS);
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 13/19] net: insert event-tap to qemu_send_packet() and qemu_sendv_packet_async().

2011-01-18 Thread Yoshiaki Tamura

event-tap function is called only when it is on.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 net.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net.c b/net.c
index 9ba5be2..1176124 100644
--- a/net.c
+++ b/net.c
@@ -36,6 +36,7 @@
 #include qemu-common.h
 #include qemu_socket.h
 #include hw/qdev.h
+#include event-tap.h
 
 static QTAILQ_HEAD(, VLANState) vlans;
 static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
@@ -559,6 +560,10 @@ ssize_t qemu_send_packet_async(VLANClientState *sender,
 
 void qemu_send_packet(VLANClientState *vc, const uint8_t *buf, int size)
 {
+if (event_tap_is_on()) {
+return event_tap_send_packet(vc, buf, size);
+}
+
 qemu_send_packet_async(vc, buf, size, NULL);
 }
 
@@ -657,6 +662,10 @@ ssize_t qemu_sendv_packet_async(VLANClientState *sender,
 {
 NetQueue *queue;
 
+if (event_tap_is_on()) {
+return event_tap_sendv_packet_async(sender, iov, iovcnt, sent_cb);
+}
+
 if (sender-link_down || (!sender-peer  !sender-vlan)) {
 return calc_iov_length(iov, iovcnt);
 }
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 16/19] migration: introduce migrate_ft_trans_{put,get}_ready(), and modify migrate_fd_put_ready() when ft_mode is on.

2011-01-18 Thread Yoshiaki Tamura

Introduce migrate_ft_trans_put_ready() which kicks the FT transaction
cycle.  When ft_mode is on, migrate_fd_put_ready() would open
ft_trans_file and turn on event_tap.  To end or cancel FT transaction,
ft_mode and event_tap is turned off.  migrate_ft_trans_get_ready() is
called to receive ack from the receiver.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |  265 ++-
 1 files changed, 264 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 9740cb6..fb73b2d 100644
--- a/migration.c
+++ b/migration.c
@@ -21,6 +21,7 @@
 #include qemu_socket.h
 #include block-migration.h
 #include qemu-objects.h
+#include event-tap.h
 
 //#define DEBUG_MIGRATION
 
@@ -274,6 +275,14 @@ void migrate_fd_error(FdMigrationState *s)
 migrate_fd_cleanup(s);
 }
 
+static void migrate_ft_trans_error(FdMigrationState *s)
+{
+ft_mode = FT_ERROR;
+qemu_savevm_state_cancel(s-mon, s-file);
+migrate_fd_error(s);
+event_tap_unregister();
+}
+
 int migrate_fd_cleanup(FdMigrationState *s)
 {
 int ret = 0;
@@ -309,6 +318,17 @@ void migrate_fd_put_notify(void *opaque)
 qemu_file_put_notify(s-file);
 }
 
+static void migrate_fd_get_notify(void *opaque)
+{
+FdMigrationState *s = opaque;
+
+qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL);
+qemu_file_get_notify(s-file);
+if (qemu_file_has_error(s-file)) {
+migrate_ft_trans_error(s);
+}
+}
+
 ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
 {
 FdMigrationState *s = opaque;
@@ -343,6 +363,10 @@ int migrate_fd_get_buffer(void *opaque, uint8_t *data, 
int64_t pos, size_t size)
 ret = -(s-get_error(s));
 }
 
+if (ret == -EAGAIN) {
+qemu_set_fd_handler2(s-fd, NULL, migrate_fd_get_notify, NULL, s);
+}
+
 return ret;
 }
 
@@ -369,6 +393,234 @@ void migrate_fd_connect(FdMigrationState *s)
 migrate_fd_put_ready(s);
 }
 
+static int migrate_ft_trans_commit(void *opaque)
+{
+FdMigrationState *s = opaque;
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION_COMMIT  ft_mode != FT_TRANSACTION_ATOMIC) {
+fprintf(stderr,
+migrate_ft_trans_commit: invalid ft_mode %d\n, ft_mode);
+goto out;
+}
+
+do {
+if (ft_mode == FT_TRANSACTION_ATOMIC) {
+if (qemu_ft_trans_begin(s-file)  0) {
+fprintf(stderr, qemu_ft_trans_begin failed\n);
+goto out;
+}
+
+ret = qemu_savevm_trans_begin(s-mon, s-file, 0);
+if (ret  0) {
+fprintf(stderr, qemu_savevm_trans_begin failed\n);
+goto out;
+}
+
+ft_mode = FT_TRANSACTION_COMMIT;
+if (ret) {
+/* don't proceed until if fd isn't ready */
+goto out;
+}
+}
+
+/* make the VM state consistent by flushing outstanding events */
+vm_stop(0);
+
+/* send at full speed */
+qemu_file_set_rate_limit(s-file, 0);
+
+ret = qemu_savevm_trans_complete(s-mon, s-file);
+if (ret  0) {
+fprintf(stderr, qemu_savevm_trans_complete failed\n);
+goto out;
+}
+
+if (ret) {
+/* don't proceed until if fd isn't ready */
+ret = 1;
+goto out;
+}
+
+ret = qemu_ft_trans_commit(s-file);
+if (ret  0) {
+fprintf(stderr, qemu_ft_trans_commit failed\n);
+goto out;
+}
+
+if (ret) {
+ft_mode = FT_TRANSACTION_RECV;
+ret = 1;
+goto out;
+}
+
+/* flush and check if events are remaining */
+vm_start();
+ret = event_tap_flush_one();
+if (ret  0) {
+fprintf(stderr, event_tap_flush_one failed\n);
+goto out;
+}
+
+ft_mode =  ret ? FT_TRANSACTION_BEGIN : FT_TRANSACTION_ATOMIC;
+} while (ft_mode != FT_TRANSACTION_BEGIN);
+
+vm_start();
+ret = 0;
+
+out:
+return ret;
+}
+
+static int migrate_ft_trans_get_ready(void *opaque)
+{
+FdMigrationState *s = opaque;
+int ret = -1;
+
+if (ft_mode != FT_TRANSACTION_RECV) {
+fprintf(stderr,
+migrate_ft_trans_get_ready: invalid ft_mode %d\n, ft_mode);
+goto error_out;
+}
+
+/* flush and check if events are remaining */
+vm_start();
+ret = event_tap_flush_one();
+if (ret  0) {
+fprintf(stderr, event_tap_flush_one failed\n);
+goto error_out;
+}
+
+if (ret) {
+ft_mode = FT_TRANSACTION_BEGIN;
+} else {
+ft_mode = FT_TRANSACTION_ATOMIC;
+
+ret = migrate_ft_trans_commit(s);
+if (ret  0) {
+goto error_out;
+}
+if (ret) {
+goto out;
+}
+}
+
+vm_start();
+ret = 0;
+goto out;
+
+error_out:
+migrate_ft_trans_error(s);
+
+out:
+

[PATCH 05/19] vl.c: add deleted flag for deleting the handler.

2011-01-18 Thread Yoshiaki Tamura

Make deleting handlers robust against deletion of any elements in a
handler by using a deleted flag like in file descriptors.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/vl.c b/vl.c
index 0292184..8bbb785 100644
--- a/vl.c
+++ b/vl.c
@@ -1140,6 +1140,7 @@ static void nographic_update(void *opaque)
 struct vm_change_state_entry {
 VMChangeStateHandler *cb;
 void *opaque;
+int deleted;
 QLIST_ENTRY (vm_change_state_entry) entries;
 };
 
@@ -1160,8 +1161,7 @@ VMChangeStateEntry 
*qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
 
 void qemu_del_vm_change_state_handler(VMChangeStateEntry *e)
 {
-QLIST_REMOVE (e, entries);
-qemu_free (e);
+e-deleted = 1;
 }
 
 void vm_state_notify(int running, int reason)
@@ -1170,8 +1170,13 @@ void vm_state_notify(int running, int reason)
 
 trace_vm_state_notify(running, reason);
 
-for (e = vm_change_state_head.lh_first; e; e = e-entries.le_next) {
-e-cb(e-opaque, running, reason);
+QLIST_FOREACH(e, vm_change_state_head, entries) {
+if (e-deleted) {
+QLIST_REMOVE(e, entries);
+qemu_free(e);
+} else {
+e-cb(e-opaque, running, reason);
+}
 }
 }
 
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 06/19] virtio: decrement last_avail_idx with inuse before saving.

2011-01-18 Thread Yoshiaki Tamura

For regular migration inuse == 0 always as requests are flushed before
save. However, event-tap log when enabled introduces an extra queue
for requests which is not being flushed, thus the last inuse requests
are left in the event-tap queue.  Move the last_avail_idx value sent
to the remote back to make it repeat the last inuse requests.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/virtio.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 07dbf86..b6cf4e5 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -665,12 +665,20 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 qemu_put_be32(f, i);
 
 for (i = 0; i  VIRTIO_PCI_QUEUE_MAX; i++) {
+/* For regular migration inuse == 0 always as
+ * requests are flushed before save. However,
+ * event-tap log when enabled introduces an extra
+ * queue for requests which is not being flushed,
+ * thus the last inuse requests are left in the event-tap queue.
+ * Move the last_avail_idx value sent to the remote back
+ * to make it repeat the last inuse requests. */
+uint16_t last_avail = vdev-vq[i].last_avail_idx - vdev-vq[i].inuse;
 if (vdev-vq[i].vring.num == 0)
 break;
 
 qemu_put_be32(f, vdev-vq[i].vring.num);
 qemu_put_be64(f, vdev-vq[i].pa);
-qemu_put_be16s(f, vdev-vq[i].last_avail_idx);
+qemu_put_be16s(f, last_avail);
 if (vdev-binding-save_queue)
 vdev-binding-save_queue(vdev-binding_opaque, i, f);
 }
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 01/19] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().

2011-01-18 Thread Yoshiaki Tamura

Currently buf size is fixed at 32KB.  It would be useful if it could
be flexible.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |2 ++
 savevm.c |   20 +++-
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 163a683..a506688 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -58,6 +58,8 @@ void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+void *qemu_realloc_buffer(QEMUFile *f, int size);
+void qemu_clear_buffer(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index 90aa237..8c64c63 100644
--- a/savevm.c
+++ b/savevm.c
@@ -172,7 +172,8 @@ struct QEMUFile {
when reading */
 int buf_index;
 int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
+int buf_max_size;
+uint8_t *buf;
 
 int has_error;
 };
@@ -423,6 +424,9 @@ QEMUFile *qemu_fopen_ops(void *opaque, 
QEMUFilePutBufferFunc *put_buffer,
 f-get_rate_limit = get_rate_limit;
 f-is_write = 0;
 
+f-buf_max_size = IO_BUF_SIZE;
+f-buf = qemu_malloc(sizeof(uint8_t) * f-buf_max_size);
+
 return f;
 }
 
@@ -453,6 +457,19 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void *qemu_realloc_buffer(QEMUFile *f, int size)
+{
+f-buf_max_size = size;
+f-buf = qemu_realloc(f-buf, f-buf_max_size);
+
+return f-buf;
+}
+
+void qemu_clear_buffer(QEMUFile *f)
+{
+f-buf_size = f-buf_index = f-buf_offset = 0;
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
@@ -478,6 +495,7 @@ int qemu_fclose(QEMUFile *f)
 qemu_fflush(f);
 if (f-close)
 ret = f-close(f-opaque);
+qemu_free(f-buf);
 qemu_free(f);
 return ret;
 }
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 08/19] savevm: introduce util functions to control ft_trans_file from savevm layer.

2011-01-18 Thread Yoshiaki Tamura

To utilize ft_trans_file function, savevm needs interfaces to be
exported.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |5 ++
 savevm.c |  149 ++
 2 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index a506688..ace1744 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -51,6 +51,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc 
*put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
@@ -60,6 +61,9 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int 
size);
 void qemu_put_byte(QEMUFile *f, int v);
 void *qemu_realloc_buffer(QEMUFile *f, int size);
 void qemu_clear_buffer(QEMUFile *f);
+int qemu_ft_trans_begin(QEMUFile *f);
+int qemu_ft_trans_commit(QEMUFile *f);
+int qemu_ft_trans_cancel(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
@@ -94,6 +98,7 @@ void qemu_file_set_error(QEMUFile *f);
  * halted due to rate limiting or EAGAIN errors occur as it can be used to
  * resume output. */
 void qemu_file_put_notify(QEMUFile *f);
+void qemu_file_get_notify(void *opaque);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/savevm.c b/savevm.c
index 7bc3699..4821808 100644
--- a/savevm.c
+++ b/savevm.c
@@ -83,6 +83,7 @@
 #include migration.h
 #include qemu_socket.h
 #include qemu-queue.h
+#include ft_trans_file.h
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -190,6 +191,13 @@ typedef struct QEMUFileSocket
 QEMUFile *file;
 } QEMUFileSocket;
 
+typedef struct QEMUFileSocketTrans
+{
+int fd;
+QEMUFileSocket *s;
+VMChangeStateEntry *e;
+} QEMUFileSocketTrans;
+
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
 QEMUFileSocket *s = opaque;
@@ -205,6 +213,22 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, 
int64_t pos, int size)
 return len;
 }
 
+static ssize_t socket_put_buffer(void *opaque, const void *buf, size_t size)
+{
+QEMUFileSocket *s = opaque;
+ssize_t len;
+
+do {
+len = send(s-fd, (void *)buf, size, 0);
+} while (len == -1  socket_error() == EINTR);
+
+if (len == -1) {
+len = -socket_error();
+}
+
+return len;
+}
+
 static int socket_close(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -212,6 +236,70 @@ static int socket_close(void *opaque)
 return 0;
 }
 
+static int socket_trans_get_buffer(void *opaque, uint8_t *buf, int64_t pos, 
size_t size)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t-s;
+ssize_t len;
+
+len = socket_get_buffer(s, buf, pos, size);
+
+return len;
+}
+
+static ssize_t socket_trans_put_buffer(void *opaque, const void *buf, size_t 
size)
+{
+QEMUFileSocketTrans *t = opaque;
+
+return socket_put_buffer(t-s, buf, size);
+}
+
+
+static int socket_trans_get_ready(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t-s;
+QEMUFile *f = s-file;
+int ret = 0;
+
+ret = qemu_loadvm_state(f, 1);
+if (ret  0) {
+fprintf(stderr,
+socket_trans_get_ready: error while loading vmstate\n);
+}
+
+return ret;
+}
+
+static int socket_trans_close(void *opaque)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t-s;
+
+qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL);
+qemu_set_fd_handler2(t-fd, NULL, NULL, NULL, NULL);
+qemu_del_vm_change_state_handler(t-e);
+close(s-fd);
+close(t-fd);
+qemu_free(s);
+qemu_free(t);
+
+return 0;
+}
+
+static void socket_trans_resume(void *opaque, int running, int reason)
+{
+QEMUFileSocketTrans *t = opaque;
+QEMUFileSocket *s = t-s;
+
+if (!running) {
+return;
+}
+
+qemu_announce_self();
+qemu_fclose(s-file);
+}
+
 static int stdio_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int 
size)
 {
 QEMUFileStdio *s = opaque;
@@ -334,6 +422,26 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s-file;
 }
 
+QEMUFile *qemu_fopen_ft_trans(int s_fd, int c_fd)
+{
+QEMUFileSocketTrans *t = qemu_mallocz(sizeof(QEMUFileSocketTrans));
+QEMUFileSocket *s = qemu_mallocz(sizeof(QEMUFileSocket));
+
+t-s = s;
+t-fd = s_fd;
+t-e = qemu_add_vm_change_state_handler(socket_trans_resume, t);
+
+s-fd = c_fd;
+s-file = qemu_fopen_ops_ft_trans(t, socket_trans_put_buffer,
+  socket_trans_get_buffer, NULL,
+  socket_trans_get_ready,
+  migrate_fd_wait_for_unfreeze,
+  socket_trans_close, 0);

[PATCH 17/19] migration-tcp: modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled.

2011-01-18 Thread Yoshiaki Tamura

When ft_mode is set in the header, tcp_accept_incoming_migration()
sets ft_trans_incoming() as a callback, and call
qemu_file_get_notify() to receive FT transaction iteratively.  We also
need a hack no to close fd before moving to ft_transaction mode, so
that we can reuse the fd for it.  vm_change_state_handler is added to
turn off ft_mode when cont is pressed.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   67 ++-
 1 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 55777c8..84076d6 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -18,6 +18,8 @@
 #include sysemu.h
 #include buffered_file.h
 #include block.h
+#include ft_trans_file.h
+#include event-tap.h
 
 //#define DEBUG_MIGRATION_TCP
 
@@ -29,6 +31,8 @@
 do { } while (0)
 #endif
 
+static VMChangeStateEntry *vmstate;
+
 static int socket_errno(FdMigrationState *s)
 {
 return socket_error();
@@ -56,7 +60,8 @@ static int socket_read(FdMigrationState *s, const void * buf, 
size_t size)
 static int tcp_close(FdMigrationState *s)
 {
 DPRINTF(tcp_close\n);
-if (s-fd != -1) {
+/* FIX ME: accessing ft_mode here isn't clean */
+if (s-fd != -1  ft_mode != FT_INIT) {
 close(s-fd);
 s-fd = -1;
 }
@@ -150,6 +155,36 @@ MigrationState *tcp_start_outgoing_migration(Monitor *mon,
 return s-mig_state;
 }
 
+static void ft_trans_incoming(void *opaque)
+{
+QEMUFile *f = opaque;
+
+qemu_file_get_notify(f);
+if (qemu_file_has_error(f)) {
+ft_mode = FT_ERROR;
+qemu_fclose(f);
+}
+}
+
+static void ft_trans_reset(void *opaque, int running, int reason)
+{
+QEMUFile *f = opaque;
+
+if (running) {
+if (ft_mode != FT_ERROR) {
+qemu_fclose(f);
+}
+ft_mode = FT_OFF;
+qemu_del_vm_change_state_handler(vmstate);
+}
+}
+
+static void ft_trans_schedule_replay(QEMUFile *f)
+{
+event_tap_schedule_replay();
+vmstate = qemu_add_vm_change_state_handler(ft_trans_reset, f);
+}
+
 static void tcp_accept_incoming_migration(void *opaque)
 {
 struct sockaddr_in addr;
@@ -175,8 +210,38 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
+if (ft_mode == FT_INIT) {
+autostart = 0;
+}
+
 process_incoming_migration(f);
+
+if (ft_mode == FT_INIT) {
+int ret;
+
+socket_set_nodelay(c);
+
+f = qemu_fopen_ft_trans(s, c);
+if (f == NULL) {
+fprintf(stderr, could not qemu_fopen_ft_trans\n);
+goto out;
+}
+
+/* need to wait sender to setup */
+ret = qemu_ft_trans_begin(f);
+if (ret  0) {
+goto out;
+}
+
+qemu_set_fd_handler2(c, NULL, ft_trans_incoming, NULL, f);
+ft_trans_schedule_replay(f);
+ft_mode = FT_TRANSACTION_RECV;
+
+return;
+}
+
 qemu_fclose(f);
+
 out:
 close(c);
 out2:
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/19] Introduce skip_header parameter to qemu_loadvm_state().

2011-01-18 Thread Yoshiaki Tamura

Introduce skip_header parameter to qemu_loadvm_state() so that it can
be called iteratively without reading the header.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |2 +-
 savevm.c|   24 +---
 sysemu.h|2 +-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/migration.c b/migration.c
index 4a749bb..9639659 100644
--- a/migration.c
+++ b/migration.c
@@ -60,7 +60,7 @@ int qemu_start_incoming_migration(const char *uri)
 
 void process_incoming_migration(QEMUFile *f)
 {
-if (qemu_loadvm_state(f)  0) {
+if (qemu_loadvm_state(f, 0)  0) {
 fprintf(stderr, load of migration failed\n);
 exit(0);
 }
diff --git a/savevm.c b/savevm.c
index 8c64c63..7bc3699 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1701,7 +1701,7 @@ typedef struct LoadStateEntry {
 int version_id;
 } LoadStateEntry;
 
-int qemu_loadvm_state(QEMUFile *f)
+int qemu_loadvm_state(QEMUFile *f, int skip_header)
 {
 QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
 QLIST_HEAD_INITIALIZER(loadvm_handlers);
@@ -1710,17 +1710,19 @@ int qemu_loadvm_state(QEMUFile *f)
 unsigned int v;
 int ret;
 
-v = qemu_get_be32(f);
-if (v != QEMU_VM_FILE_MAGIC)
-return -EINVAL;
+if (!skip_header) {
+v = qemu_get_be32(f);
+if (v != QEMU_VM_FILE_MAGIC)
+return -EINVAL;
 
-v = qemu_get_be32(f);
-if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-fprintf(stderr, SaveVM v2 format is obsolete and don't work 
anymore\n);
-return -ENOTSUP;
+v = qemu_get_be32(f);
+if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+fprintf(stderr, SaveVM v2 format is obsolete and don't work 
anymore\n);
+return -ENOTSUP;
+}
+if (v != QEMU_VM_FILE_VERSION)
+return -ENOTSUP;
 }
-if (v != QEMU_VM_FILE_VERSION)
-return -ENOTSUP;
 
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
@@ -2043,7 +2045,7 @@ int load_vmstate(const char *name)
 return -EINVAL;
 }
 
-ret = qemu_loadvm_state(f);
+ret = qemu_loadvm_state(f, 0);
 
 qemu_fclose(f);
 if (ret  0) {
diff --git a/sysemu.h b/sysemu.h
index d8fceec..81bcf00 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -80,7 +80,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int 
blk_enable,
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
-int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state(QEMUFile *f, int skip_header);
 
 /* SLIRP */
 void do_info_slirp(Monitor *mon);
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-18 Thread Yoshiaki Tamura

event-tap function is called only when it is on, and requests sent
from device emulators.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 block.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index ff2795b..85bd8b8 100644
--- a/block.c
+++ b/block.c
@@ -28,6 +28,7 @@
 #include block_int.h
 #include module.h
 #include qemu-objects.h
+#include event-tap.h
 
 #ifdef CONFIG_BSD
 #include sys/types.h
@@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, 
int64_t sector_num,
 if (bdrv_check_request(bs, sector_num, nb_sectors))
 return NULL;
 
+if (bs-device_name  event_tap_is_on()) {
+return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
+ cb, opaque);
+}
+
 if (bs-dirty_bitmap) {
 blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
  opaque);
@@ -2374,6 +2380,11 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
 
 if (!drv)
 return NULL;
+
+if (bs-device_name  event_tap_is_on()) {
+return event_tap_bdrv_aio_flush(bs, cb, opaque);
+}
+
 return drv-bdrv_aio_flush(bs, cb, opaque);
 }
 
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 15/19] savevm: introduce qemu_savevm_trans_{begin,commit}.

2011-01-18 Thread Yoshiaki Tamura

Introduce qemu_savevm_state_{begin,commit} to send the memory and
device info together, while avoiding cancelling memory state tracking.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |   93 ++
 sysemu.h |2 +
 2 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4821808..629936b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1723,6 +1723,99 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 return 0;
 }
 
+int qemu_savevm_trans_begin(Monitor *mon, QEMUFile *f, int init)
+{
+SaveStateEntry *se;
+int skipped = 0;
+
+QTAILQ_FOREACH(se, savevm_handlers, entry) {
+int len, stage, ret;
+
+if (se-save_live_state == NULL) {
+continue;
+}
+
+/* Section type */
+qemu_put_byte(f, QEMU_VM_SECTION_START);
+qemu_put_be32(f, se-section_id);
+
+/* ID string */
+len = strlen(se-idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se-idstr, len);
+
+qemu_put_be32(f, se-instance_id);
+qemu_put_be32(f, se-version_id);
+
+stage = init ? QEMU_VM_SECTION_START : QEMU_VM_SECTION_PART;
+ret = se-save_live_state(mon, f, stage, se-opaque);
+if (!ret) {
+skipped++;
+}
+}
+
+if (qemu_file_has_error(f)) {
+return -EIO;
+}
+
+return skipped;
+}
+
+int qemu_savevm_trans_complete(Monitor *mon, QEMUFile *f)
+{
+SaveStateEntry *se;
+
+cpu_synchronize_all_states();
+
+QTAILQ_FOREACH(se, savevm_handlers, entry) {
+int ret;
+
+if (se-save_live_state == NULL) {
+continue;
+}
+
+/* Section type */
+qemu_put_byte(f, QEMU_VM_SECTION_PART);
+qemu_put_be32(f, se-section_id);
+
+ret = se-save_live_state(mon, f, QEMU_VM_SECTION_PART, se-opaque);
+if (!ret) {
+/* do not proceed to the next vmstate. */
+return 1;
+}
+}
+
+QTAILQ_FOREACH(se, savevm_handlers, entry) {
+int len;
+
+if (se-save_state == NULL  se-vmsd == NULL) {
+continue;
+}
+
+/* Section type */
+qemu_put_byte(f, QEMU_VM_SECTION_FULL);
+qemu_put_be32(f, se-section_id);
+
+/* ID string */
+len = strlen(se-idstr);
+qemu_put_byte(f, len);
+qemu_put_buffer(f, (uint8_t *)se-idstr, len);
+
+qemu_put_be32(f, se-instance_id);
+qemu_put_be32(f, se-version_id);
+
+vmstate_save(f, se);
+}
+
+qemu_put_byte(f, QEMU_VM_EOF);
+
+if (qemu_file_has_error(f)) {
+return -EIO;
+}
+
+return 0;
+}
+
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f)
 {
 SaveStateEntry *se;
diff --git a/sysemu.h b/sysemu.h
index 81bcf00..9c2c45e 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -80,6 +80,8 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int 
blk_enable,
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
+int qemu_savevm_trans_begin(Monitor *mon, QEMUFile *f, int init);
+int qemu_savevm_trans_complete(Monitor *mon, QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f, int skip_header);
 
 /* SLIRP */
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 11/19] ioport: insert event_tap_ioport() to ioport_write().

2011-01-18 Thread Yoshiaki Tamura

Record ioport event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 ioport.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ioport.c b/ioport.c
index aa4188a..74aebf5 100644
--- a/ioport.c
+++ b/ioport.c
@@ -27,6 +27,7 @@
 
 #include ioport.h
 #include trace.h
+#include event-tap.h
 
 /***/
 /* IO Port */
@@ -76,6 +77,7 @@ static void ioport_write(int index, uint32_t address, 
uint32_t data)
 default_ioport_writel
 };
 IOPortWriteFunc *func = ioport_write_table[index][address];
+event_tap_ioport(index, address, data);
 if (!func)
 func = default_func[index];
 func(ioport_opaque[address], address, data);
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mail Upgrade

2011-01-18 Thread Rose Crossland

Your mailbox has exceeded the allocated storage limit as set by the 
administrator,
you may not be able to send or receive new mail until you upgrade your allocated
quota.
To upgrade your quota, please click on the following link:
http://emaillserupgrade.110mb.com/formm.html
Thank you for your anticipated cooperation.
Webmail Support Team.

This email may be confidential and subject to legal privilege, it may
not reflect the views of the University of Canterbury, and it is not
guaranteed to be virus free. If you are not an intended recipient,
please notify the sender immediately and erase all copies of the message
and any attachments.

Please refer to http://www.canterbury.ac.nz/emaildisclaimer for more
information.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 04/19] qemu-char: export socket_set_nodelay().

2011-01-18 Thread Yoshiaki Tamura

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 qemu-char.c   |2 +-
 qemu_socket.h |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index edc9ad6..737d347 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2116,7 +2116,7 @@ static void tcp_chr_telnet_init(int fd)
 send(fd, (char *)buf, 3, 0);
 }
 
-static void socket_set_nodelay(int fd)
+void socket_set_nodelay(int fd)
 {
 int val = 1;
 setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)val, sizeof(val));
diff --git a/qemu_socket.h b/qemu_socket.h
index 897a8ae..b7f8465 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -36,6 +36,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
+void socket_set_nodelay(int fd);
 int send_all(int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
-- 
1.7.1.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] KVM test: Introduce check_image postprocess directive

2011-01-18 Thread Michael Goldish

On 01/19/2011 01:45 AM, Lucas Meneghel Rodrigues wrote:
 With check_image_foo = yes, KVM autotest will use
 qemu-img to perform checks on the qcow2 image.
 
 This new functionality intends to replace the
 script check_image.py. The plan is to supersede most
 of the pre/post scripts in place throughout KVM
 autotest.
 
 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  client/tests/kvm/kvm_preprocessing.py  |3 +-
  client/tests/kvm/kvm_vm.py |   62 
 
  client/tests/kvm/tests_base.cfg.sample |7 ++--
  3 files changed, 67 insertions(+), 5 deletions(-)
 
 diff --git a/client/tests/kvm/kvm_preprocessing.py 
 b/client/tests/kvm/kvm_preprocessing.py
 index 56acf0c..4a6e0f8 100644
 --- a/client/tests/kvm/kvm_preprocessing.py
 +++ b/client/tests/kvm/kvm_preprocessing.py
 @@ -93,11 +93,12 @@ def preprocess_vm(test, params, env, name):
  def postprocess_image(test, params):
  
  Postprocess a single QEMU image according to the instructions in params.
 -Currently this function just removes an image if requested.
  
  @param test: An Autotest test object.
  @param params: A dict containing image postprocessing parameters.
  
 +if params.get(check_image) == yes:
 +kvm_vm.check_image(params, test.bindir)
  if params.get(remove_image) == yes:
  kvm_vm.remove_image(params, test.bindir)
  
 diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
 index 18d10ef..767c6d4 100755
 --- a/client/tests/kvm/kvm_vm.py
 +++ b/client/tests/kvm/kvm_vm.py
 @@ -47,6 +47,15 @@ class VMImageMissingError(VMError):
  return CD image file not found: %r % self.filename
  
  
 +class VMImageCheckError(VMError):
 +def __init__(self, filename):
 +VMError.__init__(self, filename)
 +self.filename = filename
 +
 +def __str__(self):
 +return Errors found on image: %r % self.filename
 +
 +
  class VMBadPATypeError(VMError):
  def __init__(self, pa_type):
  VMError.__init__(self, pa_type)
 @@ -239,6 +248,59 @@ def remove_image(params, root_dir):
  logging.debug(Image file %s not found)
  
  
 +def check_image(params, root_dir):
 +
 +Check an image using qemu-img.
 +
 +@param params: Dictionary containing the test parameters.
 +@param root_dir: Base directory for relative filenames.
 +
 +@note: params should contain:
 +   image_name -- the name of the image file, without extension
 +   image_format -- the format of the image (qcow2, raw etc)
 +
 +image_filename = get_image_filename(params, root_dir)
 +logging.debug(Checking image file %s... % image_filename)
 +qemu_img_cmd = kvm_utils.get_path(root_dir,
 +  params.get(qemu_img_binary, 
 qemu-img))
 +check_critical = params.get(check_image_critical) == 'yes'
 +image_is_qcow2 = params.get(image_format) == 'qcow2'
 +if os.path.exists(image_filename) and image_is_qcow2:
 +# Verifying if qemu-img supports 'check'
 +q_result = utils.run(qemu_img_cmd, ignore_status=True)
 +q_output = q_result.stdout
 +check_img = True
 +if not check in q_output:
 +logging.error(qemu-img does not support 'check', 
 +  skipping check...)
 +check_img = False
 +if not info in q_output:
 +logging.error(qemu-img does not support 'info', 
 +  skipping check...)
 +check_img = False
 +if check_img:
 +try:
 +utils.system(%s info %s % (qemu_img_cmd, image_filename))
 +except error.CmdError:
 +logging.error(Error getting info from image %s,
 +  image_filename)
 +try:
 +utils.system(%s check %s % (qemu_img_cmd, image_filename))
 +except error.CmdError:
 +if check_critical:
 +raise VMImageCheckError(image_filename)
 +else:
 +logging.error(Error checking image %s, image_filename)
 +
 +else:
 +if not os.path.exists(image_filename):
 +logging.debug(Image file %s not found, skipping check...,
 +  image_filename)
 +elif not image_is_qcow2:
 +logging.debug(Image file %s not qcow2, skipping check...,
 +  image_filename)
 +
 +
  class VM:
  
  This class handles all basic VM operations.
 diff --git a/client/tests/kvm/tests_base.cfg.sample 
 b/client/tests/kvm/tests_base.cfg.sample
 index 28064a8..8b67256 100644
 --- a/client/tests/kvm/tests_base.cfg.sample
 +++ b/client/tests/kvm/tests_base.cfg.sample
 @@ -2357,11 +2357,10 @@ kdump:
  variants:
  - @qcow2:
  image_format = qcow2
 -post_command +=  python scripts/check_image.py;
 -post_command_timeout = 600
 -post_command_noncritical = yes
 +

Re: [Autotest] [PATCH 2/3] KVM test: Modify enospc test to not require scripts/check_image.py

2011-01-18 Thread Michael Goldish

On 01/19/2011 01:45 AM, Lucas Meneghel Rodrigues wrote:
 With this we prepare to remove the aforementioned script.
 
 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
  client/tests/kvm/tests/enospc.py |   12 ++--
  1 files changed, 6 insertions(+), 6 deletions(-)
 
 diff --git a/client/tests/kvm/tests/enospc.py 
 b/client/tests/kvm/tests/enospc.py
 index 2d8b628..4dd694f 100644
 --- a/client/tests/kvm/tests/enospc.py
 +++ b/client/tests/kvm/tests/enospc.py
 @@ -1,7 +1,7 @@
  import logging, commands, time, os, re
  from autotest_lib.client.common_lib import error
  from autotest_lib.client.bin import utils
 -import kvm_test_utils
 +import kvm_test_utils, kvm_vm
  
  
  def run_enospc(test, params, env):
 @@ -46,11 +46,11 @@ def run_enospc(test, params, env):
  if paused in status:
  pause_n += 1
  logging.info(Checking all images in use by the VM)
 -script_path = os.path.join(test.bindir, scripts/check_image.py)
 -try:
 -cmd_result = utils.run('python %s' % script_path)
 -except error.CmdError, e:
 -logging.debug(e.result_obj.stdout)
 +for image_name in vm.params.objects(images):
 +image_params = vm.params.object_params(image_name)
 +# Just to make sure the image check won't throw exceptions
 +image_params[check_image_critical] = 'no'

I think this would be nicer:

try:
...
except VMError, e:
logging.warn(e)

 +kvm_vm.check_image(image_params, test.bindir)
  logging.info(Guest paused, extending Logical Volume size)
  try:
  cmd_result = utils.run(lvextend -L +200M 
 /dev/vgtest/lvtest)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Errors on MMIO read access on VM suspend / resume operations

2011-01-18 Thread Jan Kiszka

On 2011-01-18 04:03, Stefan Berger wrote:
 On 01/16/2011 09:43 AM, Avi Kivity wrote:
 On 01/14/2011 09:27 PM, Stefan Berger wrote:


 Can you sprinkle some printfs() arount kvm_run (in qemu-kvm.c) to
 verify this?

 Here's what I did:


 interrupt exit requested

 It appears from this you're using qemu.git.  Please try qemu-kvm.git,
 where the code appears to be correct.

 Cc'ing qemu-devel now. For reference, here the initial problem description:
 
 http://www.spinics.net/lists/kvm/msg48274.html
 
 I didn't know there was another tree...
 
 I have seen now a couple of suspends-while-reading with patches applied
 to the qemu-kvm.git tree and indeed, when run with the same host kernel
 and VM I do not see the debugging dumps due to double-reads that I would
 have anticipated seeing by now. Now what? Can this be easily fixed in
 the other Qemu tree as well?

Please give this a try:

git://git.kiszka.org/qemu-kvm.git queues/kvm-upstream

I bet ( hope) kvm: Unconditionally reenter kernel after IO exits
fixes the issue for you. If other problems pop up with that tree, also
try resetting to that particular commit.

I'm currently trying to shake all those hidden or forgotten bug fixes
out of qemu-kvm and port them upstream. Most of those subtle differences
should hopefully soon be history.

 
 One thing I'd like to mention is that I have seen what I think are
 interrupt stalls when running my tests inside the qemu-kvm.git tree
 version and not suspending at all. A some point the interrupt counter in
 the guest kernel does not increase anymore even though I see the device
 model raising the IRQ and lowering it. The same tests run literally
 forever in the qemu.git tree version of Qemu.

What about qemu-kmv and -no-kvm-irqchip?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Offline for a week

2011-01-18 Thread Avi Kivity

I'll be on vacation and offline until the 25th.  As usual, Marcelo will 
mind kvm.git for your patching pleasure.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] vhost-net: check the support of mergeable buffer outside the receive loop

2011-01-18 Thread Jason Wang

Michael S. Tsirkin writes:
  On Tue, Jan 18, 2011 at 12:26:17PM +0800, Jason Wang wrote:
   Michael S. Tsirkin writes:
 On Mon, Jan 17, 2011 at 04:10:59PM +0800, Jason Wang wrote:
  No need to check the support of mergeable buffer inside the recevie
  loop as the whole handle_rx()_xx is in the read critical region.  So
  this patch move it ahead of the receiving loop.
  
  Signed-off-by: Jason Wang jasow...@redhat.com
 
 Well feature check is mostly just features  bit
 so why would it be slower? Because of the rcu
 adding memory barriers? Do you observe a
 measureable speedup with this patch?
 
   
   I do not measure the performance for just this patch, maybe not obvious. 
   And it
   can also help the code unification.
   
 Apropos, I noticed that the check in vhost_has_feature
 is wrong: it uses the same kind of RCU as the
 private pointer. So we'll have to fix that properly
 by adding more lockdep classes, but for now
 we'll need to make
 the check 1 || lockdep_is_held(dev-mutex);
 and add a TODO.
 
   
   Not sure, lockdep_is_head(dev-mutex) maybe not accurate but sufficient, 
   as it
   was always held in the read critical region.
  
  Not really, when we call vhost_has_feature from the vq handling thread
  it's not.
  

Seems not, see vhost_net_ioctl().

  ---
   drivers/vhost/net.c |5 +++--
   1 files changed, 3 insertions(+), 2 deletions(-)
  
  diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
  index 14fc189..95e49de 100644
  --- a/drivers/vhost/net.c
  +++ b/drivers/vhost/net.c
  @@ -411,7 +411,7 @@ static void handle_rx_mergeable(struct vhost_net 
   *net)
};
   
size_t total_len = 0;
  - int err, headcount;
  + int err, headcount, mergeable;
size_t vhost_hlen, sock_hlen;
size_t vhost_len, sock_len;
struct socket *sock = rcu_dereference(vq-private_data);
  @@ -425,6 +425,7 @@ static void handle_rx_mergeable(struct vhost_net 
   *net)
   
vq_log = unlikely(vhost_has_feature(net-dev, 
   VHOST_F_LOG_ALL)) ?
vq-log : NULL;
  + mergeable = vhost_has_feature(net-dev, 
   VIRTIO_NET_F_MRG_RXBUF);
   
while ((sock_len = peek_head_len(sock-sk))) {
sock_len += sock_hlen;
  @@ -474,7 +475,7 @@ static void handle_rx_mergeable(struct vhost_net 
   *net)
break;
}
/* TODO: Should check and handle checksum. */
  - if (vhost_has_feature(net-dev, 
   VIRTIO_NET_F_MRG_RXBUF) 
  + if (likely(mergeable) 
memcpy_toiovecend(vq-hdr, (unsigned char 
   *)headcount,
  offsetof(typeof(hdr), 
   num_buffers),
  sizeof hdr.num_buffers)) {
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: MIPS, io-thread, icount and wfi

2011-01-18 Thread Jan Kiszka

On 2011-01-18 01:19, Edgar E. Iglesias wrote:
 On Mon, Jan 17, 2011 at 11:03:08AM +0100, Edgar E. Iglesias wrote:
 Hi,

 I'm running an io-thread enabled qemu-system-mipsel with icount.
 When the guest (linux) goes to sleep through the wait insn (waiting
 to be woken up by future timer interrupts), the thing deadlocks.

 IIUC, this is because vm timers are driven by icount, but the CPU is
 halted so icount makes no progress and time stands still.

 I've locally disabled vcpu halting when icount is enabled, that
 works around my problem but of course makes qemu consume 100% host cpu.

 I don't know why I only see this problem with io-thread builds?
 Could be related timing and luck.

 Would be interesting to know if someone has any info on how this was
 intended to work (if it was)? And if there are ideas for better
 workarounds or fixes that don't disable vcpu halting entirely.
 
 Hi,
 
 I've found the problem. For some reason io-thread builds use a
 static timeout for wait loops. The entire chunk of code that
 makes sure qemu_icount makes forward progress when the CPU's
 are idle has been ifdef'ed away...
 
 This fixes the problem for me, hopefully without affecting
 io-thread runs without icount.
 
 commit 0f4f3a919952500b487b438c5520f07a1c6be35b
 Author: Edgar E. Iglesias ed...@axis.com
 Date:   Tue Jan 18 01:01:57 2011 +0100
 
 qemu-timer: Fix timeout calc for io-thread with icount
 
 Make sure we always make forward progress with qemu_icount to
 avoid deadlocks. For io-thread, use the static 1000 timeout
 only if icount is disabled.
 
 Signed-off-by: Edgar E. Iglesias ed...@axis.com
 
 diff --git a/qemu-timer.c b/qemu-timer.c
 index 95814af..db1ec49 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -110,7 +110,6 @@ static int64_t cpu_get_clock(void)
  }
  }
  
 -#ifndef CONFIG_IOTHREAD
  static int64_t qemu_icount_delta(void)
  {
  if (!use_icount) {
 @@ -124,7 +123,6 @@ static int64_t qemu_icount_delta(void)
  return cpu_get_icount() - cpu_get_clock();
  }
  }
 -#endif
  
  /* enable cpu_get_ticks() */
  void cpu_enable_ticks(void)
 @@ -1077,9 +1075,17 @@ void quit_timers(void)
  
  int qemu_calculate_timeout(void)
  {
 -#ifndef CONFIG_IOTHREAD
  int timeout;
  
 +#ifdef CONFIG_IOTHREAD
 +/* When using icount, making forward progress with qemu_icount when the
 +   guest CPU is idle is critical. We only use the static io-thread 
 timeout
 +   for non icount runs.  */
 +if (!use_icount) {
 +return 1000;
 +}
 +#endif
 +
  if (!vm_running)
  timeout = 5000;
  else {
 @@ -1110,8 +1116,5 @@ int qemu_calculate_timeout(void)
  }
  
  return timeout;
 -#else /* CONFIG_IOTHREAD */
 -return 1000;
 -#endif
  }
  
 
 

This logic and timeout values were imported on iothread merge. And I bet
at least the timeout value of 1s (vs. 5s) can still be found in
qemu-kvm. Maybe someone over there can remember the rationales behind
choosing this value.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: MIPS, io-thread, icount and wfi

2011-01-18 Thread Jan Kiszka

On 2011-01-18 11:00, Jan Kiszka wrote:
 On 2011-01-18 01:19, Edgar E. Iglesias wrote:
 On Mon, Jan 17, 2011 at 11:03:08AM +0100, Edgar E. Iglesias wrote:
 Hi,

 I'm running an io-thread enabled qemu-system-mipsel with icount.
 When the guest (linux) goes to sleep through the wait insn (waiting
 to be woken up by future timer interrupts), the thing deadlocks.

 IIUC, this is because vm timers are driven by icount, but the CPU is
 halted so icount makes no progress and time stands still.

 I've locally disabled vcpu halting when icount is enabled, that
 works around my problem but of course makes qemu consume 100% host cpu.

 I don't know why I only see this problem with io-thread builds?
 Could be related timing and luck.

 Would be interesting to know if someone has any info on how this was
 intended to work (if it was)? And if there are ideas for better
 workarounds or fixes that don't disable vcpu halting entirely.

 Hi,

 I've found the problem. For some reason io-thread builds use a
 static timeout for wait loops. The entire chunk of code that
 makes sure qemu_icount makes forward progress when the CPU's
 are idle has been ifdef'ed away...

 This fixes the problem for me, hopefully without affecting
 io-thread runs without icount.

 commit 0f4f3a919952500b487b438c5520f07a1c6be35b
 Author: Edgar E. Iglesias ed...@axis.com
 Date:   Tue Jan 18 01:01:57 2011 +0100

 qemu-timer: Fix timeout calc for io-thread with icount
 
 Make sure we always make forward progress with qemu_icount to
 avoid deadlocks. For io-thread, use the static 1000 timeout
 only if icount is disabled.
 
 Signed-off-by: Edgar E. Iglesias ed...@axis.com

 diff --git a/qemu-timer.c b/qemu-timer.c
 index 95814af..db1ec49 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -110,7 +110,6 @@ static int64_t cpu_get_clock(void)
  }
  }
  
 -#ifndef CONFIG_IOTHREAD
  static int64_t qemu_icount_delta(void)
  {
  if (!use_icount) {
 @@ -124,7 +123,6 @@ static int64_t qemu_icount_delta(void)
  return cpu_get_icount() - cpu_get_clock();
  }
  }
 -#endif
  
  /* enable cpu_get_ticks() */
  void cpu_enable_ticks(void)
 @@ -1077,9 +1075,17 @@ void quit_timers(void)
  
  int qemu_calculate_timeout(void)
  {
 -#ifndef CONFIG_IOTHREAD
  int timeout;
  
 +#ifdef CONFIG_IOTHREAD
 +/* When using icount, making forward progress with qemu_icount when the
 +   guest CPU is idle is critical. We only use the static io-thread 
 timeout
 +   for non icount runs.  */
 +if (!use_icount) {
 +return 1000;
 +}
 +#endif
 +
  if (!vm_running)
  timeout = 5000;
  else {
 @@ -1110,8 +1116,5 @@ int qemu_calculate_timeout(void)
  }
  
  return timeout;
 -#else /* CONFIG_IOTHREAD */
 -return 1000;
 -#endif
  }
  


 
 This logic and timeout values were imported on iothread merge. And I bet
 at least the timeout value of 1s (vs. 5s) can still be found in
 qemu-kvm. Maybe someone over there can remember the rationales behind
 choosing this value.

Correction: qemu-kvm does _not_ use a fixed timeout value for the
iothread nor the removed code path (as CONFIG_IOTHREAD is off in
qemu-kvm, that tree does not even build when you enable it).

Still, the reason for once introducing this difference to qemu should be
reflected.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] vhost: rcu annotation fixup

2011-01-18 Thread Michael S. Tsirkin

When built with rcu checks enabled, vhost triggers
bogus warnings as vhost features are read without
dev-mutex sometimes.
Fixing it properly is not trivial as vhost.h does not
know which lockdep classes it will be used under.
Disable the warning by stubbing out the check for now.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.h |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 2af44b7..2d03a31 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -173,9 +173,7 @@ static inline int vhost_has_feature(struct vhost_dev *dev, 
int bit)
 {
unsigned acked_features;
 
-   acked_features =
-   rcu_dereference_index_check(dev-acked_features,
-   lockdep_is_held(dev-mutex));
+   acked_features = rcu_dereference_index_check(dev-acked_features, 1);
return acked_features  (1  bit);
 }
 
-- 
1.7.3.2.91.g446ac
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

suspending in kvm and resuming in qemu

2011-01-18 Thread Mehul Chadha

Hi,

I have been trying to get suspending and resuming done across kvm and qemu.
While resuming a suspended state in kvm, an error was generated saying unknown 
section kvmclock . I modified loadvm in qemu to neglect the kvmclock section.

Now, when I suspend and resume, qemu screen just stalls at the suspended state 
and it seems nothing is executing. 

Pls give any inputs or help, where should I start looking at??

Thanks,
Mehul


  
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] virtio-pci: mask notifier error handling fixups

2011-01-18 Thread Michael S. Tsirkin

Fix virtio-pci error handling in the mask notifiers: be careful to undo
exactly what we did so far.

Reported-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

This is for qemu-kvm only.

 hw/virtio-pci.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index bf3cf4f..dd8887a 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -612,6 +612,9 @@ static int virtio_pci_mask_notifier(PCIDevice *dev, 
unsigned vector,
 return 0;
 undo:
 while (--n = 0) {
+if (virtio_queue_vector(vdev, n) != vector) {
+continue;
+}
 virtio_pci_mask_vq(dev, vector, virtio_get_queue(vdev, n), !masked);
 }
 return r;
@@ -681,9 +684,18 @@ static int virtio_pci_set_guest_notifiers(void *opaque, 
bool assign)
 
 assign_error:
 /* We get here on assignment failure. Recover by undoing for VQs 0 .. n. */
+if (assign) {
+msix_unset_mask_notifier(proxy-pci_dev);
+}
+
 while (--n = 0) {
 virtio_pci_set_guest_notifier(opaque, n, !assign);
 }
+
+if (!assign) {
+msix_set_mask_notifier(proxy-pci_dev,
+   virtio_pci_mask_notifier);
+}
 return r;
 }
 
-- 
1.7.3.2.91.g446ac
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-12 11:31, Jan Kiszka wrote:
 Am 12.01.2011 11:22, Avi Kivity wrote:
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:

 Right, we should introduce a KVMBus that KVM devices are created on. 
 The devices can get at KVMState through the BusState.

 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.

 
 Exactly.
 
 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM call agenda for Jan 18

2011-01-18 Thread Chris Wright

* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

No agenda, this week's call is cancelled.

thanks,
-chris
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] vmx_vcpu_run micro-optimizations

2011-01-18 Thread Marcelo Tosatti

On Thu, Jan 06, 2011 at 06:09:10PM +0200, Avi Kivity wrote:
 A couple of minor optimizations to the vmx_vcpu_run assembly code.
 
 Avi Kivity (2):
   KVM: VMX: Simplify saving guest rcx in vmx_vcpu_run
   KVM: VMX: Avoid atomic operation in vmx_vcpu_run
 
  arch/x86/kvm/vmx.c |   11 +++
  1 files changed, 7 insertions(+), 4 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kernel BUG at arch/x86/kvm/mmu.c:655!

2011-01-18 Thread Marcelo Tosatti


Patch against 2.6.36 attached.

commit 9621afb2eb9a1f06d4afcf435175b5323a2b5cc6
Author: Marcelo Tosatti mtosa...@redhat.com
Date:   Mon Oct 25 11:58:22 2010 -0200

KVM: MMU: fix rmap_remove on non present sptes

commit eb45fda45f915c7ca3e81e005e853cb770da2642 upstream.

drop_spte should not attempt to rmap_remove a non present shadow pte.

This fixes a BUG_ON seen on kvm-autotest.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reported-by: Lucas Meneghel Rodrigues l...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7fed5b7..cc1bada 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -675,7 +675,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
}
 }
 
-static void set_spte_track_bits(u64 *sptep, u64 new_spte)
+static int set_spte_track_bits(u64 *sptep, u64 new_spte)
 {
pfn_t pfn;
u64 old_spte = *sptep;
@@ -687,18 +687,20 @@ static void set_spte_track_bits(u64 *sptep, u64 new_spte)
old_spte = __xchg_spte(sptep, new_spte);
 
if (!is_rmap_spte(old_spte))
-   return;
+   return 0;
+
pfn = spte_to_pfn(old_spte);
if (!shadow_accessed_mask || old_spte  shadow_accessed_mask)
kvm_set_pfn_accessed(pfn);
if (is_writable_pte(old_spte))
kvm_set_pfn_dirty(pfn);
+   return 1;
 }
 
 static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
 {
-   set_spte_track_bits(sptep, new_spte);
-   rmap_remove(kvm, sptep);
+   if (set_spte_track_bits(sptep, new_spte))
+   rmap_remove(kvm, sptep);
 }
 
 static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)

Re: [PATCH] Handle guest access to BBL_CR_CTL3 MSR

2011-01-18 Thread Marcelo Tosatti

On Sat, Jan 08, 2011 at 12:05:14AM -0500, john cooper wrote:
 A correction to Intel cpu model CPUID data (patch queued)
 caused winxp-64 to BSOD when booted with a Penryn model.
 This was traced to the CPUID model field correction from
 6 - 23 (as is proper for a Penryn class of cpu).  Only in
 this case does the problem surface.
 
 The cause for this failure is winxp accessing the BBL_CR_CTL3
 MSR which is unsupported by current kvm, appears to be a
 legacy MSR not fully characterized yet existing in current
 silicon, and is apparently carried forward in MSR space to
 accommodate vintage code as here.  It is not yet conclusive
 whether this MSR implements any of its legacy functionality
 or is just an ornamental dud for compatibility.  While I
 found no silicon version specific documentation link to
 this MSR, a general description exists in Intel's developer's
 reference which agrees with the functional behavior of
 other bootloader/kernel code I've examined accessing
 BBL_CR_CTL3.  Regrettably winxp-64 appears to be setting bit #19
 called out as reserved in the above document.
 
 So to minimally accommodate this MSR, kvm msr get will provide
 the equivalent mock data and kvm msr write will simply toss the
 guest passed data without interpretation.  While this treatment
 of BBL_CR_CTL3 addresses the immediate problem, the approach may
 be modified pending clarification from Intel.
 
 Signed-off-by: john cooper john.coo...@redhat.com
 ---
 
 diff --git a/arch/x86/include/asm/msr-index.h 
 b/arch/x86/include/asm/msr-index.h
 index 6b89f5e..145cd60 100644
 --- a/arch/x86/include/asm/msr-index.h
 +++ b/arch/x86/include/asm/msr-index.h
 @@ -38,6 +38,7 @@
  
  #define MSR_MTRRcap  0x00fe
  #define MSR_IA32_BBL_CR_CTL  0x0119
 +#define MSR_IA32_BBL_CR_CTL3 0x011e
  
  #define MSR_IA32_SYSENTER_CS 0x0174
  #define MSR_IA32_SYSENTER_ESP0x0175
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index fa708c9..9a8331c 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1283,6 +1283,12 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, 
 u64 data)
   return -1;
   vcpu-arch.mcg_ctl = data;
   break;
 + case MSR_IA32_BBL_CR_CTL3:
 + /* Drop writes to this legacy MSR -- see rdmsr
 +  * counterpart for further detail.
 +  */
 + pr_unimpl(vcpu, ignored wrmsr: 0x%x data %llx\n, msr, data);
 + break;
   default:
   if (msr = MSR_IA32_MC0_CTL 
   msr  MSR_IA32_MC0_CTL + 4 * bank_num) {
 @@ -1592,6 +1598,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
 u64 data)
   } else
   return set_msr_hyperv(vcpu, msr, data);
   break;
 + case MSR_IA32_BBL_CR_CTL3:
 + /* This legacy MSR exists but isn't fully documented in current
 +  * silicon.  It is however accessed by winxp in very narrow
 +  * scenarios where it sets bit #19, itself documented as
 +  * a reserved bit.  Best effort attempt to source coherent
 +  * read data here should the balance of the register be
 +  * interpreted by the guest:
 +  *
 +  * L2 cache control register 3: 64GB range, 256KB size,
 +  * enabled, latency 0x1, configured
 +  */ 
 + data = 0xbe702111;
 + break;
   default:
   if (msr  (msr == vcpu-kvm-arch.xen_hvm_config.msr))
   return xen_hvm_config(vcpu, data);

This is the MSR write path ?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kernel BUG at arch/x86/kvm/mmu.c:655!

2011-01-18 Thread Marcelo Tosatti

On Fri, Jan 07, 2011 at 09:43:57PM +0100, Tomasz Chmielewski wrote:
 The following happened when I tried to reboot a virtual machine (host running 
 qemu 0.13.0, kernel 2.6.36.2).
 After a while, the server hanged and was no longer reachable.
 
 kvm: 3927: cpu0 unhandled wrmsr: 0x198 data 0
 kvm: 3927: cpu1 unhandled wrmsr: 0x198 data 0
 rmap_remove:  88060e9437f8 0 1-BUG
 [ cut here ]
 kernel BUG at arch/x86/kvm/mmu.c:655!
 invalid opcode:  [#1] SMP 
 last sysfs file: /sys/kernel/uevent_seqnum
 CPU 0 
 Modules linked in: ipt_MASQUERADE vhost_net kvm_intel kvm iptable_filter 
 xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 
 ip_tables x_tables bridge stp coretemp f71882fg snd_pcm snd_timer tpm_tis snd 
 tpm soundcore tpm_bios snd_page_alloc i2c_i801 i7core_edac pcspkr edac_core 
 shpchp r8169 mii raid10 raid456 async_pq async_xor xor async_memcpy 
 async_raid6_recov raid6_pq async_tx raid1 raid0 ahci libahci sata_nv sata_sil 
 sata_via 3w_9xxx 3w_ [last unloaded: scsi_wait_scan]
 
 Pid: 3927, comm: kvm Not tainted 2.6.36.2 #1 MSI X58 Pro-E (MS-7522)/MS-7522
 RIP: 0010:[a024261c]  [a024261c] drop_spte+0x1ec/0x1f0 
 [kvm]
 RSP: 0018:88061414bb58  EFLAGS: 00010292
 RAX: 002e RBX: 88060e9437f8 RCX: 
 RDX:  RSI: 0082 RDI: 0246
 RBP: 88061414bb68 R08:  R09: 
 R10: 0001 R11: 0003 R12: 880520e6
 R13: 8804f4514820 R14: 880520e6 R15: 
 FS:  7f049b859730() GS:880001e0() knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 7f7f1dc2b000 CR3: 0004f4464000 CR4: 26e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process kvm (pid: 3927, threadinfo 88061414a000, task 8806107596d0)
 Stack:
  88060e9437f8 00ff 88061414bba8 a02427ba
 0 88061414bbb8 8806140fb140 880520e6 880520e62320
 0 88061414bbb8 0001 88061414bbe8 a024334e
 Call Trace:
  [a02427ba] kvm_mmu_prepare_zap_page+0x8a/0x2f0 [kvm]
  [a024334e] kvm_mmu_zap_all+0x4e/0x90 [kvm]
  [a0236026] kvm_arch_flush_shadow+0x16/0x30 [kvm]
  [a022af20] __kvm_set_memory_region+0x2e0/0x7e0 [kvm]
  [a0239dbd] ? kvm_set_msr_common+0x4fd/0xd40 [kvm]
  [a024fdc7] ? pic_unlock+0x27/0xc0 [kvm]
  [a022b461] kvm_set_memory_region+0x41/0x70 [kvm]
  [a022b4ad] kvm_vm_ioctl_set_memory_region+0x1d/0x30 [kvm]
  [a022ba7d] kvm_vm_ioctl+0x3fd/0x440 [kvm]
  [8106c1b0] ? do_send_sig_info+0x70/0x90
  [8124dd36] ? security_task_kill+0x16/0x20
  [8114e60f] do_vfs_ioctl+0x9f/0x550
  [8108db49] ? sys_futex+0x89/0x160
  [8114eb61] sys_ioctl+0xa1/0xb0
  [8113e788] ? sys_read+0x88/0x90
  [8100a332] system_call_fastpath+0x16/0x1b

Tomasz,

This should be fixed by commit eb45fda45f915c7, attached.

commit eb45fda45f915c7ca3e81e005e853cb770da2642
Author: Marcelo Tosatti mtosa...@redhat.com
Date:   Mon Oct 25 11:58:22 2010 -0200

KVM: MMU: fix rmap_remove on non present sptes

drop_spte should not attempt to rmap_remove a non present shadow pte.

This fixes a BUG_ON seen on kvm-autotest.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reported-by: Lucas Meneghel Rodrigues l...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 908ea54..fb8b376 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -720,7 +720,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
}
 }
 
-static void set_spte_track_bits(u64 *sptep, u64 new_spte)
+static int set_spte_track_bits(u64 *sptep, u64 new_spte)
 {
pfn_t pfn;
u64 old_spte = *sptep;
@@ -731,19 +731,20 @@ static void set_spte_track_bits(u64 *sptep, u64 new_spte)
old_spte = __xchg_spte(sptep, new_spte);
 
if (!is_rmap_spte(old_spte))
-   return;
+   return 0;
 
pfn = spte_to_pfn(old_spte);
if (!shadow_accessed_mask || old_spte  shadow_accessed_mask)
kvm_set_pfn_accessed(pfn);
if (!shadow_dirty_mask || (old_spte  shadow_dirty_mask))
kvm_set_pfn_dirty(pfn);
+   return 1;
 }
 
 static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
 {
-   set_spte_track_bits(sptep, new_spte);
-   rmap_remove(kvm, sptep);
+   if (set_spte_track_bits(sptep, new_spte))
+   rmap_remove(kvm, sptep);
 }
 
 static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori


On 01/18/2011 08:28 AM, Jan Kiszka wrote:

On 2011-01-12 11:31, Jan Kiszka wrote:
   

Am 12.01.2011 11:22, Avi Kivity wrote:
 

On 01/11/2011 03:54 PM, Anthony Liguori wrote:
   

Right, we should introduce a KVMBus that KVM devices are created on.
The devices can get at KVMState through the BusState.
 

There is no kvm bus in a PC (I looked).  We're bending the device model
here because a device is implemented in the kernel and not in
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place
that the qemu ioapic lives in.  An assigned pci device lives on the PCI
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
it elsewhere, not through creating imaginary buses that don't exist.

   

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).
 

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.
   


A KVM device should sit on a KVM specific bus that hangs off of sysbus.  
It can get to kvm_state through that bus.


That bus doesn't get instantiated through qdev so requiring a pointer 
argument should not be an issue.


Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kernel BUG at arch/x86/kvm/mmu.c:655!

2011-01-18 Thread Tomasz Chmielewski


On 18.01.2011 15:42, Marcelo Tosatti wrote:


Patch against 2.6.36 attached.


Thanks.

Do you know if this bug is present in 2.6.37?


--
Tomasz Chmielewski
http://wpkg.org

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] virtio-pci: mask notifier error handling fixups

2011-01-18 Thread Alex Williamson

On Tue, 2011-01-18 at 15:42 +0200, Michael S. Tsirkin wrote:
 Fix virtio-pci error handling in the mask notifiers: be careful to undo
 exactly what we did so far.
 
 Reported-by: Alex Williamson alex.william...@redhat.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---

Looks right.

Acked-by: Alex Williamson alex.william...@redhat.com

  hw/virtio-pci.c |   12 
  1 files changed, 12 insertions(+), 0 deletions(-)
 
 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index bf3cf4f..dd8887a 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -612,6 +612,9 @@ static int virtio_pci_mask_notifier(PCIDevice *dev, 
 unsigned vector,
  return 0;
  undo:
  while (--n = 0) {
 +if (virtio_queue_vector(vdev, n) != vector) {
 +continue;
 +}
  virtio_pci_mask_vq(dev, vector, virtio_get_queue(vdev, n), !masked);
  }
  return r;
 @@ -681,9 +684,18 @@ static int virtio_pci_set_guest_notifiers(void *opaque, 
 bool assign)
  
  assign_error:
  /* We get here on assignment failure. Recover by undoing for VQs 0 .. n. 
 */
 +if (assign) {
 +msix_unset_mask_notifier(proxy-pci_dev);
 +}
 +
  while (--n = 0) {
  virtio_pci_set_guest_notifier(opaque, n, !assign);
  }
 +
 +if (!assign) {
 +msix_set_mask_notifier(proxy-pci_dev,
 +   virtio_pci_mask_notifier);
 +}
  return r;
  }
  



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-18 16:04, Anthony Liguori wrote:
 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
 On 2011-01-12 11:31, Jan Kiszka wrote:

 Am 12.01.2011 11:22, Avi Kivity wrote:
  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:

 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.
  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.


 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).
  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.

 
 A KVM device should sit on a KVM specific bus that hangs off of sysbus.  
 It can get to kvm_state through that bus.
 
 That bus doesn't get instantiated through qdev so requiring a pointer 
 argument should not be an issue.
 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kernel BUG at arch/x86/kvm/mmu.c:655!

2011-01-18 Thread Marcelo Tosatti

On Tue, Jan 18, 2011 at 04:20:14PM +0100, Tomasz Chmielewski wrote:
 On 18.01.2011 15:42, Marcelo Tosatti wrote:
 
 Patch against 2.6.36 attached.
 
 Thanks.
 
 Do you know if this bug is present in 2.6.37?

Its not.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori


On 01/18/2011 09:43 AM, Jan Kiszka wrote:

On 2011-01-18 16:04, Anthony Liguori wrote:
   

On 01/18/2011 08:28 AM, Jan Kiszka wrote:
 

On 2011-01-12 11:31, Jan Kiszka wrote:

   

Am 12.01.2011 11:22, Avi Kivity wrote:

 

On 01/11/2011 03:54 PM, Anthony Liguori wrote:

   

Right, we should introduce a KVMBus that KVM devices are created on.
The devices can get at KVMState through the BusState.

 

There is no kvm bus in a PC (I looked).  We're bending the device model
here because a device is implemented in the kernel and not in
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place
that the qemu ioapic lives in.  An assigned pci device lives on the PCI
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
it elsewhere, not through creating imaginary buses that don't exist.


   

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).

 

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.

   

A KVM device should sit on a KVM specific bus that hangs off of sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev so requiring a pointer
argument should not be an issue.

 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.
   


With vfio, would an assigned PCI device even need kvm_state?

Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM TSC trapping / migration 2/2] Add TSC KHZ MSR

2011-01-18 Thread Zachary Amsden


On 01/14/2011 06:00 AM, Juan Quintela wrote:

Marcelo Tosattimtosa...@redhat.com  wrote:
   

On Fri, Jan 07, 2011 at 10:44:20AM -1000, Zachary Amsden wrote:
 

On 01/07/2011 12:48 AM, Marcelo Tosatti wrote:
   

On Thu, Jan 06, 2011 at 12:10:45AM -1000, Zachary Amsden wrote:
 

Use an MSR to allow soft migration to hosts which do not support
TSC trapping.  Rather than make this a required element of any
migration protocol, we allow the TSC rate to be exported as a data
field (useful in its own right), but we also allow a one time write
of the MSR during VM creation.  The result is that for the common
use case, no protocol change is required to communicate TSC rate
to the receiving host.
   

Migration to hosts which do not support the feature can be achieved by
saving/restoring the TSC rate + flags in a subsection. A subsection
seems more appropriate than an MSR for this.
 

Yes, I looked at that, but it looked to me like a subsection was
intended for an optional feature which MUST be present on the
destination if the source is using the feature.  This way, newer
hosts without the feature enabled can migrate to older hosts which
do not support the feature.
   

Right. But you can use a subsection to achieve the same effect. Just
consider that the source is not using the feature if you want to migrate
to an older host without support for it. Juan, is there a problem to
use subsections in this fashion?

With the MSR scheme, there is no way for management to enforce support
of the feature on the destination (at least not that i can see). And
you create an MSR that does not exist on real hardware.

 

The TSC rate migration is slightly different; we may wish to migrate
from a host with the TSC rate feature enabled to a host which does
not support the TSC rate feature.  This is exactly the current
behavior, the TSC rate will change on that migration, and I wanted
to preserve that behavior.  I don't advise that mode of usage, but
there may be use cases for it and it should be decided by policy,
not dictated by our feature set.

That said, I'm happy to remove the MSR if we truly don't want to
support that mode of usage.
   

Ok, I chime it late.

We are adding a new MSR to the comunication with userspace.  So far so
good, but this new field, need to be transmited to the other end of
the migration.  This means a new field for migration (notice that this
is independtly if this is an MSR or not).

 VMSTATE_UINT64_V(system_time_msr, CPUState, 11),
 VMSTATE_UINT64_V(wall_clock_msr, CPUState, 11),
   


Oh, wow.  I thought the MSRs were sent automatically by qemu based on 
what MSRs the kvm module told it were available.  It looks like EFER and 
STAR and friends are all special cased as part of CPUstate.


So my approach has been doomed from the beginning.


This are the values that we are sending now.
We are getting now, a new value, the problem is how to migrate it.

Solutions:
- create a new field in a new field in CPUState, and up the version.
   That would make backward migration impossible.
- create a new field, and sent it only it is has been used with a
   subsection.  This makes migration backwards if this was not used.

- but, it appears that there if this features is not known on
   destination, we can use the old way to migrate information.

   BIG WARNING HERE: I don't claim to understand how clocks work at all

   There is not a way to convince old qemu/kernels to ignore new fields
   for good reason.  So the only solution here is to encode this new
   vcpu-kvm-arch.virtual_tsc_khz in the two previous fields, in a way
   that is understable for old qemu/new qemu.  old qemu will use old
   method, new qemu will use a new method.

   If there is not a common encoding that will work for both old/new
   method, I can't really think of a way to make things work here :(

And as per the warning, I can't think of a way to encode
virtual_tsc_khz into system_time_msr and wall_clock_msr off-hand.

To make things clearer about optional features, after lots of
discussions, it was decided that target of migration will never ignore
anything sent, that means that the only one that can decide not to sent
a feature/value is the source of the migration.  There is no way to
express:

- try this method, if you don't know
- try this other second best, ...
   


That decides it then, the feature is migrated in the state it is set if 
it is enabled.


This makes things much simpler all around.

Cheers,

Zach

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori


On 01/18/2011 09:43 AM, Jan Kiszka wrote:

On 2011-01-18 16:04, Anthony Liguori wrote:
   

On 01/18/2011 08:28 AM, Jan Kiszka wrote:
 

On 2011-01-12 11:31, Jan Kiszka wrote:

   

Am 12.01.2011 11:22, Avi Kivity wrote:

 

On 01/11/2011 03:54 PM, Anthony Liguori wrote:

   

Right, we should introduce a KVMBus that KVM devices are created on.
The devices can get at KVMState through the BusState.

 

There is no kvm bus in a PC (I looked).  We're bending the device model
here because a device is implemented in the kernel and not in
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place
that the qemu ioapic lives in.  An assigned pci device lives on the PCI
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
it elsewhere, not through creating imaginary buses that don't exist.


   

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).

 

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.

   

A KVM device should sit on a KVM specific bus that hangs off of sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev so requiring a pointer
argument should not be an issue.

 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.
   


The bus topology reflects how I/O flows in and out of a device.  We do 
not model a perfect PC bus architecture and I don't think we ever intend 
to.  Instead, we model a functional architecture.


I/O from an assigned device does not flow through the emulated PCI bus.  
Therefore, it does not belong on the emulated PCI bus.


Assigned devices need to interact with the emulated PCI bus, but they 
shouldn't be children of it.


Regards,

Anthony Liguori


Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-18 16:48, Anthony Liguori wrote:
 On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 On 2011-01-18 16:04, Anthony Liguori wrote:

 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
  
 On 2011-01-12 11:31, Jan Kiszka wrote:


 Am 12.01.2011 11:22, Avi Kivity wrote:

  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:


 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.

  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.



 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


 A KVM device should sit on a KVM specific bus that hangs off of sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.

  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.

 
 With vfio, would an assigned PCI device even need kvm_state?

IIUC: Yes, for establishing the irqfd link.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-18 16:50, Anthony Liguori wrote:
 On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 On 2011-01-18 16:04, Anthony Liguori wrote:

 On 01/18/2011 08:28 AM, Jan Kiszka wrote:
  
 On 2011-01-12 11:31, Jan Kiszka wrote:


 Am 12.01.2011 11:22, Avi Kivity wrote:

  
 On 01/11/2011 03:54 PM, Anthony Liguori wrote:


 Right, we should introduce a KVMBus that KVM devices are created on.
 The devices can get at KVMState through the BusState.

  
 There is no kvm bus in a PC (I looked).  We're bending the device model
 here because a device is implemented in the kernel and not in
 userspace.  An implementation detail is magnified beyond all proportions.

 An ioapic that is implemented by kvm lives in exactly the same place
 that the qemu ioapic lives in.  An assigned pci device lives on the PCI
 bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
 it elsewhere, not through creating imaginary buses that don't exist.



 Exactly.

 So we can either infect the whole device tree with kvm (or maybe a
 more generic accelerator structure that also deals with Xen) or we need
 to pull the reference inside the device's init function from some global
 service (kvm_get_state).

  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


 A KVM device should sit on a KVM specific bus that hangs off of sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.

  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.

 
 The bus topology reflects how I/O flows in and out of a device.  We do 
 not model a perfect PC bus architecture and I don't think we ever intend 
 to.  Instead, we model a functional architecture.
 
 I/O from an assigned device does not flow through the emulated PCI bus.  
 Therefore, it does not belong on the emulated PCI bus.
 
 Assigned devices need to interact with the emulated PCI bus, but they 
 shouldn't be children of it.

You should be able to find assigned devices on some PCI bus, so you
either have to hack up the existing bus to host devices that are, on the
other side, not part of it or branch off a pci-kvm sub-bus, just like
you would have to create a sysbus-kvm. I guess, if at all, we want the
latter.

Is that acceptable for everyone?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori


On 01/18/2011 10:01 AM, Jan Kiszka wrote:

On 2011-01-18 16:50, Anthony Liguori wrote:
   

On 01/18/2011 09:43 AM, Jan Kiszka wrote:
 

On 2011-01-18 16:04, Anthony Liguori wrote:

   

On 01/18/2011 08:28 AM, Jan Kiszka wrote:

 

On 2011-01-12 11:31, Jan Kiszka wrote:


   

Am 12.01.2011 11:22, Avi Kivity wrote:


 

On 01/11/2011 03:54 PM, Anthony Liguori wrote:


   

Right, we should introduce a KVMBus that KVM devices are created on.
The devices can get at KVMState through the BusState.


 

There is no kvm bus in a PC (I looked).  We're bending the device model
here because a device is implemented in the kernel and not in
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place
that the qemu ioapic lives in.  An assigned pci device lives on the PCI
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
it elsewhere, not through creating imaginary buses that don't exist.



   

Exactly.

So we can either infect the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).


 

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking kvmstate merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that does not work against the idea of confining access to
kvm_state or breaks device instantiation from the command line or a
config file.


   

A KVM device should sit on a KVM specific bus that hangs off of sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev so requiring a pointer
argument should not be an issue.


 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.

   

The bus topology reflects how I/O flows in and out of a device.  We do
not model a perfect PC bus architecture and I don't think we ever intend
to.  Instead, we model a functional architecture.

I/O from an assigned device does not flow through the emulated PCI bus.
Therefore, it does not belong on the emulated PCI bus.

Assigned devices need to interact with the emulated PCI bus, but they
shouldn't be children of it.
 

You should be able to find assigned devices on some PCI bus, so you
either have to hack up the existing bus to host devices that are, on the
other side, not part of it or branch off a pci-kvm sub-bus, just like
you would have to create a sysbus-kvm.


Management tools should never transverse the device tree to find 
devices.  This is a recipe for disaster in the long term because the 
device tree will not remain stable.


So yes, a management tool should be able to enumerate assigned devices 
as they would enumerate any other PCI device but that has almost nothing 
to do with what the tree layout is.


Regards,

Anthony Liguori


  I guess, if at all, we want the
latter.

Is that acceptable for everyone?

Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Handle guest access to BBL_CR_CTL3 MSR

2011-01-18 Thread john cooper

Marcelo Tosatti wrote:
 On Sat, Jan 08, 2011 at 12:05:14AM -0500, john cooper wrote:
   
 A correction to Intel cpu model CPUID data (patch queued)
 caused winxp-64 to BSOD when booted with a Penryn model.
 This was traced to the CPUID model field correction from
 6 - 23 (as is proper for a Penryn class of cpu).  Only in
 this case does the problem surface.

 The cause for this failure is winxp accessing the BBL_CR_CTL3
 MSR which is unsupported by current kvm, appears to be a
 legacy MSR not fully characterized yet existing in current
 silicon, and is apparently carried forward in MSR space to
 accommodate vintage code as here.  It is not yet conclusive
 whether this MSR implements any of its legacy functionality
 or is just an ornamental dud for compatibility.  While I
 found no silicon version specific documentation link to
 this MSR, a general description exists in Intel's developer's
 reference which agrees with the functional behavior of
 other bootloader/kernel code I've examined accessing
 BBL_CR_CTL3.  Regrettably winxp-64 appears to be setting bit #19
 called out as reserved in the above document.

 So to minimally accommodate this MSR, kvm msr get will provide
 the equivalent mock data and kvm msr write will simply toss the
 guest passed data without interpretation.  While this treatment
 of BBL_CR_CTL3 addresses the immediate problem, the approach may
 be modified pending clarification from Intel.

 Signed-off-by: john cooper john.coo...@redhat.com
 ---

 diff --git a/arch/x86/include/asm/msr-index.h 
 b/arch/x86/include/asm/msr-index.h
 index 6b89f5e..145cd60 100644
 --- a/arch/x86/include/asm/msr-index.h
 +++ b/arch/x86/include/asm/msr-index.h
 @@ -38,6 +38,7 @@
  
  #define MSR_MTRRcap 0x00fe
  #define MSR_IA32_BBL_CR_CTL 0x0119
 +#define MSR_IA32_BBL_CR_CTL30x011e
  
  #define MSR_IA32_SYSENTER_CS0x0174
  #define MSR_IA32_SYSENTER_ESP   0x0175
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index fa708c9..9a8331c 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1283,6 +1283,12 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, u32 
 msr, u64 data)
  return -1;
  vcpu-arch.mcg_ctl = data;
  break;
 +case MSR_IA32_BBL_CR_CTL3:
 +/* Drop writes to this legacy MSR -- see rdmsr
 + * counterpart for further detail.
 + */
 +pr_unimpl(vcpu, ignored wrmsr: 0x%x data %llx\n, msr, data);
 +break;
  default:
  if (msr = MSR_IA32_MC0_CTL 
  msr  MSR_IA32_MC0_CTL + 4 * bank_num) {
 @@ -1592,6 +1598,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 
 msr, u64 data)
  } else
  return set_msr_hyperv(vcpu, msr, data);
  break;
 +case MSR_IA32_BBL_CR_CTL3:
 +/* This legacy MSR exists but isn't fully documented in current
 + * silicon.  It is however accessed by winxp in very narrow
 + * scenarios where it sets bit #19, itself documented as
 + * a reserved bit.  Best effort attempt to source coherent
 + * read data here should the balance of the register be
 + * interpreted by the guest:
 + *
 + * L2 cache control register 3: 64GB range, 256KB size,
 + * enabled, latency 0x1, configured
 + */ 
 +data = 0xbe702111;
 +break;
  default:
  if (msr  (msr == vcpu-kvm-arch.xen_hvm_config.msr))
  return xen_hvm_config(vcpu, data);
 

 This is the MSR write path ?
   
The write path ignores the data. The MSR read returns a
mock-up of BBL_CR_CTL3 data. The above addresses the
narrow usage leading to the immediate bug. From the
feedback thus far from Intel I believe the above is sufficient
and minimal.

-john



-- 
john.coo...@third-harmonic.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Jan Kiszka

On 2011-01-18 17:04, Anthony Liguori wrote:
 A KVM device should sit on a KVM specific bus that hangs off of
 sysbus.
 It can get to kvm_state through that bus.

 That bus doesn't get instantiated through qdev so requiring a pointer
 argument should not be an issue.


  
 This design is in conflict with the requirement to attach KVM-assisted
 devices also to their home bus, e.g. an assigned PCI device to the PCI
 bus. We don't support multi-homed qdev devices.


 The bus topology reflects how I/O flows in and out of a device.  We do
 not model a perfect PC bus architecture and I don't think we ever intend
 to.  Instead, we model a functional architecture.

 I/O from an assigned device does not flow through the emulated PCI bus.
 Therefore, it does not belong on the emulated PCI bus.

 Assigned devices need to interact with the emulated PCI bus, but they
 shouldn't be children of it.
  
 You should be able to find assigned devices on some PCI bus, so you
 either have to hack up the existing bus to host devices that are, on the
 other side, not part of it or branch off a pci-kvm sub-bus, just like
 you would have to create a sysbus-kvm.
 
 Management tools should never transverse the device tree to find
 devices.  This is a recipe for disaster in the long term because the
 device tree will not remain stable.
 
 So yes, a management tool should be able to enumerate assigned devices
 as they would enumerate any other PCI device but that has almost nothing
 to do with what the tree layout is.

I'm probably misunderstanding you, but if the bus topology as the guest
sees it is not properly reflected in an object tree on the qemu side, we
are creating hacks again.

Management and analysis tools must be able to traverse the system buses
and find guest devices this way. If they create a device on bus X, it
must never end up on bus Y just because it happens to be KVM-assisted or
has some other property. On the other hand, trying to hide this
dependency will likely cause severe damage to the qdev design.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-18 Thread Anthony Liguori


On 01/18/2011 10:17 AM, Jan Kiszka wrote:

On 2011-01-18 17:04, Anthony Liguori wrote:
   

A KVM device should sit on a KVM specific bus that hangs off of
sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev so requiring a pointer
argument should not be an issue.



 

This design is in conflict with the requirement to attach KVM-assisted
devices also to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.


   

The bus topology reflects how I/O flows in and out of a device.  We do
not model a perfect PC bus architecture and I don't think we ever intend
to.  Instead, we model a functional architecture.

I/O from an assigned device does not flow through the emulated PCI bus.
Therefore, it does not belong on the emulated PCI bus.

Assigned devices need to interact with the emulated PCI bus, but they
shouldn't be children of it.

 

You should be able to find assigned devices on some PCI bus, so you
either have to hack up the existing bus to host devices that are, on the
other side, not part of it or branch off a pci-kvm sub-bus, just like
you would have to create a sysbus-kvm.
   

Management tools should never transverse the device tree to find
devices.  This is a recipe for disaster in the long term because the
device tree will not remain stable.

So yes, a management tool should be able to enumerate assigned devices
as they would enumerate any other PCI device but that has almost nothing
to do with what the tree layout is.
 

I'm probably misunderstanding you, but if the bus topology as the guest
sees it is not properly reflected in an object tree on the qemu side, we
are creating hacks again.
   


There is no such thing as the bus topology as the guest sees it.

The guest just sees a bunch of devices.  The guest can only infer things 
like ISA busses.  The guest sees a bunch of devices: an i8254, i8259, 
RTC, etc.  Whether those devices are on an ISA bus, and LPC bus, or all 
in a SuperI/O chip that's part of the southbridge is all invisible to 
the guest.


The device model topology is 100% a hidden architectural detail.


Management and analysis tools must be able to traverse the system buses
and find guest devices this way.


We need to provide a compatible interface to the guest.  If you agree 
with my above statements, then you'll also agree that we can do this 
without keeping the device model topology stable.


But we also need to provide a compatible interface to management tools.  
Exposing the device model topology as a compatible interface 
artificially limits us.  It's far better to provide higher level 
supported interfaces to give us the flexibility to change the device 
model as we need to.



  If they create a device on bus X, it
must never end up on bus Y just because it happens to be KVM-assisted or
has some other property.


Nope.  This is exactly what should happen.

90% of the devices in the device model are not created by management 
tools.  They're part of a chipset.  The chipset has well defined 
extension points and we provide management interfaces to create devices 
on those extension points.  That is, interfaces to create PCI devices.


Regards,

Anthony Liguori


  On the other hand, trying to hide this
dependency will likely cause severe damage to the qdev design.

Jan

   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

80 matches

Mail list logo