Re: [Qemu-devel] DOS VM problem with QEMU-KVM and newer kernels

2012-04-16 Thread Roedel, Joerg
On Mon, Apr 16, 2012 at 12:25:37PM +0200, Jan Kiszka wrote:
 On 2012-04-15 11:44, Avi Kivity wrote:
  Jan, Joerg, was an AMD erratum published for the bug?
 
 It wasn't an erratum but a documented feature limitation in the AMD
 architecture that was simply ignored by the old code.

Right. But there is an erratum on K8 (only) which Kevin ran into. It is
documented as Erratum 701 and the bug is that no EXITINTINFO is stored
on a task-switch intercept on K8.
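
A hedged sketch of how a hypervisor could compensate (illustrative only, not
the actual KVM fix; svm->injected_event is a hypothetical field remembering
what was written to EVENTINJ before VMRUN):

/* On K8 (family 0xf) erratum 701 means EXITINTINFO is not written on a
 * task-switch intercept, so fall back to the event we injected ourselves. */
if (svm->vmcb->control.exit_code == SVM_EXIT_TASK_SWITCH &&
    !(svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_VALID) &&
    boot_cpu_data.x86 == 0xf)
        svm->vmcb->control.exit_int_info = svm->injected_event;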


Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-26 Thread Roedel, Joerg
On Fri, Aug 26, 2011 at 12:24:23AM -0400, David Gibson wrote:
 On Thu, Aug 25, 2011 at 08:25:45AM -0500, Alexander Graf wrote:
  On 25.08.2011, at 07:31, Roedel, Joerg wrote:

   For mmio we could stop the guest and replace the mmio region with a
   region that is filled with 0xff, no?
  
  Sure, but that happens in user space. The question is how does
  kernel space enforce an MMIO region to not be mapped after the
  hotplug event occurred? Keep in mind that user space is pretty much
  untrusted here - it doesn't have to be QEMU. It could just as well
  be a generic user space driver. And that can just ignore hotplug
  events.
 
 We're saying you hard yank the mapping from the userspace process.
 That is, you invalidate all its PTEs mapping the MMIO space, and don't
 let it fault them back in.
 
 As I see it there are two options: (a) make subsequent accesses from
 userspace or the guest result in either a SIGBUS that userspace must
 either deal with or die, or (b) replace the mapping with a dummy RO
 mapping containing 0xff, with any trapped writes emulated as nops.
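
A minimal kernel-side sketch of that hard yank, assuming the vma covering the
BAR mapping is at hand (zap_vma_ptes() is the existing helper for dropping
PTEs; the surrounding plumbing is illustrative):

/* Drop all PTEs covering the stale BAR mapping; without a
 * vm_ops->fault handler the next access raises SIGBUS. */
zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);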

The biggest problem with this approach is that it has to happen in the
context of the given process. Linux can't really modify an mm which
belongs to another context in a safe way.

The more I think about this, the more I come to the conclusion that it
would be best to just kill the process accessing the device if it is
manually de-assigned from vfio. It should be a non-standard path anyway, so it
doesn't make a lot of sense to implement complicated handling semantics
for it, no?
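
A rough sketch of that idea, with vdev->task as a hypothetical field recording
the owning task:

/* Illustrative only: on manual de-assignment, terminate the owner
 * instead of trying to revoke its mappings one by one. */
static void vfio_force_deassign(struct vfio_device *vdev)
{
        if (vdev->task)
                send_sig(SIGKILL, vdev->task, 1);
}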

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-26 Thread Roedel, Joerg
On Fri, Aug 26, 2011 at 12:20:00AM -0400, David Gibson wrote:
 On Wed, Aug 24, 2011 at 01:03:32PM +0200, Roedel, Joerg wrote:
  On Wed, Aug 24, 2011 at 05:33:00AM -0400, David Gibson wrote:
   On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg wrote:
  
I don't see a reason to make this meta-grouping static. It would harm
flexibility on x86. I think it makes things easier on power but there
are options on that platform to get the dynamic solution too.
   
   I think several people are misreading what Ben means by static.  I
 would prefer to say 'persistent', in that the meta-group's lifetime is
   not tied to an fd, but they can be freely created, altered and removed
   during runtime.
  
  Even if it can be altered at runtime, from a usability perspective it is
  certainly best to handle these groups directly in qemu. Or are there
  strong reasons to do it somewhere else?
 
 Funny, Ben and I think usability demands it be the other way around.

The reason is that you mean the usability for the programmer and I mean
it for the actual user of qemu :)

 If the meta-groups are transient - that is lifetime tied to an fd -
 then any program that wants to use meta-groups *must* know the
 interfaces for creating one, whatever they are.
 
 But if they're persistent, the admin can use other tools to create the
 meta-group then just hand it to a program to use, since the interfaces
 for _using_ a meta-group are identical to those for an atomic group.
 
 This doesn't preclude a program from being meta-group aware, and
 creating its own if it wants to, of course.  My guess is that qemu
 would not want to build its own meta-groups, but libvirt probably
 would.

Doing it in libvirt makes it really hard for a plain user of qemu to
assign more than one device to a guest. What I want is that a user just
types

qemu -device vfio,host=00:01.0 -device vfio,host=00:02.0 ...

and it just works. Qemu creates the meta-groups and they are
automatically destroyed when qemu exits. That the programs are not aware
of meta-groups is not a big problem because all software using vfio
still needs to be written :)

Btw, with this concept the programmer can still decide not to use
meta-groups and just multiplex the mappings to all open device-fds it
uses.
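
As a sketch of the fd-based flow (VFIO_GROUP_MERGE is a hypothetical ioctl
name; the interface is not settled in this thread):

/* qemu-internal sketch: open one fd per group and bind the second
 * group into the address space of the first. */
int g1 = open("/dev/vfio/42", O_RDWR);
int g2 = open("/dev/vfio/43", O_RDWR);
if (ioctl(g1, VFIO_GROUP_MERGE, &g2) < 0)   /* hypothetical ioctl */
        perror("merging groups");
/* Closing both fds on exit dissolves the meta-group automatically. */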

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-25 Thread Roedel, Joerg
On Wed, Aug 24, 2011 at 10:56:13AM -0400, Alex Williamson wrote:
 On Wed, 2011-08-24 at 10:43 +0200, Joerg Roedel wrote:
  A side-note: Might it be better to expose assigned devices in a guest on
  a separate bus? This will make it easier to emulate an IOMMU for the
  guest inside qemu.
 
 I think we want that option, sure.  A lot of guests aren't going to
 support hotplugging buses though, so I think our default "map the entire
 guest" model should still be using bus 0.  The ACPI gets a lot more
 complicated for that model too; dynamic SSDTs?  Thanks,

Ok, if only AMD-Vi should be emulated then it is not strictly
necessary. For this IOMMU we can specify that devices on the same bus
belong to different IOMMUs. So we can implement an IOMMU that handles
internal qemu-devices and one that handles pass-through devices.
Not sure if this is possible with VT-d too. Okay, VT-d emulation would
also require the emulation of a PCIe bridge for the devices, no?

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-25 Thread Roedel, Joerg
Hi Alex,

On Wed, Aug 24, 2011 at 05:13:49PM -0400, Alex Williamson wrote:
 Is this roughly what you're thinking of for the iommu_group component?
 Adding a dev_to_group iommu ops callback lets us consolidate the sysfs
 support in the iommu base.  Would AMD-Vi do something similar (or
 exactly the same) for group #s?  Thanks,

The concept looks good, I have some comments, though. On AMD-Vi the
implementation would look a bit different because there is a
data-structure where the information can be gathered from, so no need for
PCI bus scanning there.

 diff --git a/drivers/base/iommu.c b/drivers/base/iommu.c
 index 6e6b6a1..6b54c1a 100644
 --- a/drivers/base/iommu.c
 +++ b/drivers/base/iommu.c
 @@ -17,20 +17,56 @@
   */
  
  #include <linux/bug.h>
 +#include <linux/device.h>
  #include <linux/types.h>
  #include <linux/module.h>
  #include <linux/slab.h>
  #include <linux/errno.h>
  #include <linux/iommu.h>
 +#include <linux/pci.h>
  
  static struct iommu_ops *iommu_ops;
  
 +static ssize_t show_iommu_group(struct device *dev,
 + struct device_attribute *attr, char *buf)
 +{
 + return sprintf(buf, "%lx", iommu_dev_to_group(dev));

Probably add a 0x prefix so userspace knows the format?
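
I.e. something like:

return sprintf(buf, "0x%lx", iommu_dev_to_group(dev));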

 +}
 +static DEVICE_ATTR(iommu_group, S_IRUGO, show_iommu_group, NULL);
 +
 +static int add_iommu_group(struct device *dev, void *unused)
 +{
 + if (iommu_dev_to_group(dev) >= 0)
 + return device_create_file(dev, &dev_attr_iommu_group);
 +
 + return 0;
 +}
 +
 +static int device_notifier(struct notifier_block *nb,
 +unsigned long action, void *data)
 +{
 + struct device *dev = data;
 +
 + if (action == BUS_NOTIFY_ADD_DEVICE)
 + return add_iommu_group(dev, NULL);
 +
 + return 0;
 +}
 +
 +static struct notifier_block device_nb = {
 + .notifier_call = device_notifier,
 +};
 +
  void register_iommu(struct iommu_ops *ops)
  {
   if (iommu_ops)
   BUG();
  
   iommu_ops = ops;
 +
 + /* FIXME - non-PCI, really want for_each_bus() */
 + bus_register_notifier(&pci_bus_type, &device_nb);
 + bus_for_each_dev(&pci_bus_type, NULL, NULL, add_iommu_group);
  }

We need to solve this differently. ARM is starting to use the iommu-api
too and this definitely does not work there. One possible solution might
be to make the iommu-ops per-bus.

  bool iommu_found(void)
 @@ -94,6 +130,14 @@ int iommu_domain_has_cap(struct iommu_domain *domain,
  }
  EXPORT_SYMBOL_GPL(iommu_domain_has_cap);
  
 +long iommu_dev_to_group(struct device *dev)
 +{
 + if (iommu_ops->dev_to_group)
 + return iommu_ops->dev_to_group(dev);
 + return -ENODEV;
 +}
 +EXPORT_SYMBOL_GPL(iommu_dev_to_group);

Please rename this to iommu_device_group(). The dev_to_group name
suggests a conversion but it is actually just a property of the device.
Also the return type should not be long but something that fits into
32bit on all platforms. Since you use -ENODEV, probably s32 is a good
choice.
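
I.e., roughly (a sketch of the suggested shape, not a tested patch):

s32 iommu_device_group(struct device *dev)
{
        if (iommu_ops->dev_to_group)
                return iommu_ops->dev_to_group(dev);
        return -ENODEV;
}
EXPORT_SYMBOL_GPL(iommu_device_group);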

 +
  int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, int gfp_order, int prot)
  {
 diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
 index f02c34d..477259c 100644
 --- a/drivers/pci/intel-iommu.c
 +++ b/drivers/pci/intel-iommu.c
 @@ -404,6 +404,7 @@ static int dmar_map_gfx = 1;
  static int dmar_forcedac;
  static int intel_iommu_strict;
  static int intel_iommu_superpage = 1;
 +static int intel_iommu_no_mf_groups;
  
  #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
  static DEFINE_SPINLOCK(device_domain_lock);
 @@ -438,6 +439,10 @@ static int __init intel_iommu_setup(char *str)
  printk(KERN_INFO
  "Intel-IOMMU: disable supported super page\n");
   intel_iommu_superpage = 0;
 + } else if (!strncmp(str, "no_mf_groups", 12)) {
 + printk(KERN_INFO
 + "Intel-IOMMU: disable separate groups for "
 + "multifunction devices\n");
 + intel_iommu_no_mf_groups = 1;

This should really be a global iommu option and not be VT-d specific.

  
  str += strcspn(str, ",");
 @@ -3902,6 +3907,52 @@ static int intel_iommu_domain_has_cap(struct 
 iommu_domain *domain,
   return 0;
  }
  
 +/* Group numbers are arbitrary.  Devices with the same group number
 + * indicate the iommu cannot differentiate between them.  To avoid
 + * tracking used groups we just use the seg|bus|devfn of the lowest
 + * level we're able to differentiate devices */
 +static long intel_iommu_dev_to_group(struct device *dev)
 +{
 + struct pci_dev *pdev = to_pci_dev(dev);
 + struct pci_dev *bridge;
 + union {
 + struct {
 + u8 devfn;
 + u8 bus;
 + u16 segment;
 + } pci;
 + u32 group;
 + } id;
 +
 + if (iommu_no_mapping(dev))
 + return -ENODEV;
 +
 +

Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-25 Thread Roedel, Joerg
On Wed, Aug 24, 2011 at 11:07:46AM -0400, Alex Williamson wrote:
 On Wed, 2011-08-24 at 10:52 +0200, Roedel, Joerg wrote:
  On Tue, Aug 23, 2011 at 01:08:29PM -0400, Alex Williamson wrote:
   On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote:
  
Handling it through fds is a good idea. This makes sure that everything
belongs to one process. I am not really sure yet if we go the way to
just bind plain groups together or if we create meta-groups. The
meta-groups thing seems somewhat cleaner, though.
   
   I'm leaning towards binding because we need to make it dynamic, but I
   don't really have a good picture of the lifecycle of a meta-group.
  
  In my view the life-cycle of the meta-group is a subrange of the
  qemu-instance's life-cycle.
 
 I guess I mean the lifecycle of a super-group that's actually exposed as
 a new group in sysfs.  Who creates it?  How?  How are groups dynamically
 added and removed from the super-group?  The group merging makes sense
 to me because it's largely just an optimization that qemu will try to
 merge groups.  If it works, great.  If not, it manages them separately.
 When all the devices from a group are unplugged, unmerge the group if
 necessary.

Right. The super-group thing is an optimization.

 We need to try the polite method of attempting to hot unplug the device
 from qemu first, which the current vfio code already implements.  We can
 then escalate if it doesn't respond.  The current code calls abort in
 qemu if the guest doesn't respond, but I agree we should also be
 enforcing this at the kernel interface.  I think the problem with the
 hard-unplug is that we don't have a good revoke mechanism for the mmio
 mmaps.

For mmio we could stop the guest and replace the mmio region with a
region that is filled with 0xff, no?
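
A userspace sketch of that trick, assuming bar_va and bar_len describe the
stale BAR mapping:

/* Replace the BAR mapping with an anonymous page filled with 0xff so
 * reads return all-ones like real hardware after a hot unplug; the
 * final mprotect() lets writes trap so they can be emulated as nops. */
void *dummy = mmap(bar_va, bar_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
if (dummy != MAP_FAILED) {
        memset(dummy, 0xff, bar_len);
        mprotect(dummy, bar_len, PROT_READ);
}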

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-25 Thread Roedel, Joerg
On Thu, Aug 25, 2011 at 11:38:09AM -0400, Don Dutile wrote:

 On 08/25/2011 06:54 AM, Roedel, Joerg wrote:
  We need to solve this differently. ARM is starting to use the iommu-api
   too and this definitely does not work there. One possible solution might
  be to make the iommu-ops per-bus.
 
 When you think of a system where there isn't just one bus-type
 with iommu support, it makes more sense.
 Additionally, it also allows the long-term architecture to use different types
 of IOMMUs on each bus segment -- think per-PCIe-switch/bridge IOMMUs --
 esp. 'tuned' IOMMUs -- ones better geared for networks, ones better geared
 for direct-attach disk hba's.

Not sure how likely it is to have different types of IOMMUs within a
given bus-type. But if they become reality we can multiplex in the
iommu-api without much hassle :)
For now, something like bus_set_iommu() or bus_register_iommu() would
provide a nice way to do bus-specific setups for a given iommu
implementation.
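
A sketch of what such a per-bus registration could look like (illustrative;
it assumes struct bus_type grows an iommu_ops pointer):

int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
{
        if (bus->iommu_ops)     /* only one IOMMU driver per bus type */
                return -EBUSY;
        bus->iommu_ops = ops;
        return 0;
}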

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-24 Thread Roedel, Joerg
On Tue, Aug 23, 2011 at 01:08:29PM -0400, Alex Williamson wrote:
 On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote:

  Handling it through fds is a good idea. This makes sure that everything
  belongs to one process. I am not really sure yet if we go the way to
  just bind plain groups together or if we create meta-groups. The
  meta-groups thing seems somewhat cleaner, though.
 
 I'm leaning towards binding because we need to make it dynamic, but I
 don't really have a good picture of the lifecycle of a meta-group.

In my view the life-cycle of the meta-group is a subrange of the
qemu-instance's life-cycle.

  Putting the process to sleep (which would be uninterruptible) seems bad.
  The process would sleep until the guest releases the device-group, which
  can take days or months.
  The best thing (and the most intrusive :-) ) is to change PCI core to
  allow unbindings to fail, I think. But this probably further complicates
  the way to upstream VFIO...
 
 Yes, it's not ideal but I think it's sufficient for now and if we later
 get support for returning an error from release, we can set a timeout
 after notifying the user to make use of that.  Thanks,

Ben had the idea of just forcing a hard-unplug of this device from the
guest. That's probably the best way to deal with it, I think. VFIO
sends a notification to qemu that the device is gone and qemu informs
the guest in some way about it.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-24 Thread Roedel, Joerg
On Tue, Aug 23, 2011 at 07:35:37PM -0400, Benjamin Herrenschmidt wrote:
 On Tue, 2011-08-23 at 15:18 +0200, Roedel, Joerg wrote:

  Hmm, good idea. But as far as I know the hotplug-event needs to be in
  the guest _before_ the device is actually unplugged (so that the guest
  can unbind its driver first). That somehow brings back the sleep-idea
  and the timeout in the .release function.
 
 That's for normal assisted hotplug, but don't we support hard hotplug ?
 I mean, things like cardbus, thunderbolt (if we ever support that)
 etc... will need it and some platforms do support hard hotplug of PCIe
 devices.
 
 (That's why drivers should never spin on MMIO waiting for a 1 bit to
 clear without a timeout :-)

Right, that's probably the best semantics for this issue then. The worst
thing that happens is that the admin crashes the guest.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-24 Thread Roedel, Joerg
On Tue, Aug 23, 2011 at 12:54:27PM -0400, aafabbri wrote:
 On 8/23/11 4:04 AM, Joerg Roedel joerg.roe...@amd.com wrote:
  That makes uiommu basically the same as the meta-groups, right?
 
 Yes, functionality seems the same, thus my suggestion to keep uiommu
 explicit.  Is there some need for group-groups besides defining sets of
 groups which share IOMMU resources?
 
 I do all this stuff (bringing up sets of devices which may share IOMMU
 domain) dynamically from C applications.  I don't really want some static
 (boot-time or sysfs fiddling) supergroup config unless there is a good
 reason KVM/power needs it.
 
 As you say in your next email, doing it all from ioctls is very easy,
 programmatically.

I don't see a reason to make this meta-grouping static. It would harm
flexibility on x86. I think it makes things easier on power but there
are options on that platform to get the dynamic solution too.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-24 Thread Roedel, Joerg
On Wed, Aug 24, 2011 at 05:33:00AM -0400, David Gibson wrote:
 On Wed, Aug 24, 2011 at 11:14:26AM +0200, Roedel, Joerg wrote:

  I don't see a reason to make this meta-grouping static. It would harm
  flexibility on x86. I think it makes things easier on power but there
  are options on that platform to get the dynamic solution too.
 
 I think several people are misreading what Ben means by static.  I
  would prefer to say 'persistent', in that the meta-group's lifetime is
 not tied to an fd, but they can be freely created, altered and removed
 during runtime.

Even if it can be altered at runtime, from a usability perspective it is
certainly best to handle these groups directly in qemu. Or are there
strong reasons to do it somewhere else?

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-23 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 05:03:53PM -0400, Benjamin Herrenschmidt wrote:
 
  I am in favour of /dev/vfio/$GROUP. If multiple devices should be
  assigned to a guest, there can also be an ioctl to bind a group to an
  address-space of another group (certainly needs some care to not allow
  that both groups belong to different processes).
  
  Btw, a problem we haven't talked about yet entirely is
  driver-deassignment. User space can decide to de-assign the device from
  vfio while a fd is open on it. With PCI there is no way to let this fail
  (the .release function returns void last time I checked). Is this a
  problem, and if yes, how do we handle that?
 
 We can treat it as a hard unplug (like a cardbus gone away).
 
 IE. Dispose of the direct mappings (switch to MMIO emulation) and return
  all ff's from reads (& ignore writes).
 
 Then send an unplug event via whatever mechanism the platform provides
 (ACPI hotplug controller on x86 for example, we haven't quite sorted out
 what to do on power for hotplug yet).

Hmm, good idea. But as far as I know the hotplug-event needs to be in
the guest _before_ the device is actually unplugged (so that the guest
can unbind its driver first). That somehow brings back the sleep-idea
and the timeout in the .release function.
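
A sketch of that sleep-plus-timeout idea for the .release path
(vfio_notify_userspace() and the vdev fields are hypothetical):

static void vfio_pci_release(struct pci_dev *pdev)
{
        struct vfio_dev *vdev = pci_get_drvdata(pdev);

        /* ask userspace to give the device back, then wait a bounded
         * time for the refcount to drop before tearing it down anyway */
        vfio_notify_userspace(vdev);
        wait_event_timeout(vdev->release_wait,
                           atomic_read(&vdev->refcnt) == 0,
                           msecs_to_jiffies(10000));
}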

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-23 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 03:17:00PM -0400, Alex Williamson wrote:
 On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote:

  I am in favour of /dev/vfio/$GROUP. If multiple devices should be
  assigned to a guest, there can also be an ioctl to bind a group to an
  address-space of another group (certainly needs some care to not allow
  that both groups belong to different processes).
 
 That's an interesting idea.  Maybe an interface similar to the current
 uiommu interface, where you open() the 2nd group fd and pass the fd via
 ioctl to the primary group.  IOMMUs that don't support this would fail
 the attach device callback, which would fail the ioctl to bind them.  It
 will need to be designed so any group can be removed from the super-set
 and the remaining group(s) still works.  This feels like something that
 can be added after we get an initial implementation.

Handling it through fds is a good idea. This makes sure that everything
belongs to one process. I am not really sure yet if we go the way to
just bind plain groups together or if we create meta-groups. The
meta-groups thing seems somewhat cleaner, though.

  Btw, a problem we haven't talked about yet entirely is
  driver-deassignment. User space can decide to de-assign the device from
  vfio while a fd is open on it. With PCI there is no way to let this fail
  (the .release function returns void last time I checked). Is this a
  problem, and if yes, how do we handle that?
 
 The current vfio has the same problem, we can't unbind a device from
 vfio while it's attached to a guest.  I think we'd use the same solution
 too; send out a netlink packet for a device removal and have the .remove
 call sleep on a wait_event(, refcnt == 0).  We could also set a timeout
 and SIGBUS the PIDs holding the device if they don't return it
 willingly.  Thanks,

Putting the process to sleep (which would be uninterruptible) seems bad.
The process would sleep until the guest releases the device-group, which
can take days or months.
The best thing (and the most intrusive :-) ) is to change PCI core to
allow unbindings to fail, I think. But this probably further complicates
the way to upstream VFIO...

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-22 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 06:51:35AM -0400, Avi Kivity wrote:
 On 08/22/2011 01:46 PM, Joerg Roedel wrote:
  That does not work. The bridge in question may not even be visible as a
  PCI device, so you can't link to it. This is the case on a few PCIe
  cards which only have a PCIx chip and a PCIe-2-PCIx bridge to implement
  the PCIe interface (yes, I have seen those cards).
 
 How does the kernel detect that devices behind the invisible bridge must 
 be assigned as a unit?

On the AMD IOMMU side this information is stored in the IVRS ACPI table.
Not sure about the VT-d side, though.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-22 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 08:42:35AM -0400, Avi Kivity wrote:
 On 08/22/2011 03:36 PM, Roedel, Joerg wrote:
  On the AMD IOMMU side this information is stored in the IVRS ACPI table.
  Not sure about the VT-d side, though.
 
 I see.  There is no sysfs node representing it?

No. It also doesn't exist as a 'struct pci_dev'. This caused problems in
the AMD IOMMU driver in the past and I needed to fix that. That's where I
know it from :)

 I'd rather not add another meaningless identifier.

Well, I don't think it's really meaningless, but we need some way to
communicate the information about device groups to userspace.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-22 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 09:06:07AM -0400, Avi Kivity wrote:
 On 08/22/2011 03:55 PM, Roedel, Joerg wrote:

  Well, I don't think its really meaningless, but we need some way to
  communicate the information about device groups to userspace.
 
 I mean the contents of the group descriptor.  There are enough 42s in 
 the kernel, it's better if we can replace a synthetic number with 
 something meaningful.

If we only look at PCI then a Segment:Bus:Dev.Fn number would be
sufficient, of course. But the idea was to make it generic enough so
that it works with !PCI too.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] kvm PCI assignment VFIO ramblings

2011-08-22 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 09:17:41AM -0400, Avi Kivity wrote:
 On 08/22/2011 04:15 PM, Roedel, Joerg wrote:
  On Mon, Aug 22, 2011 at 09:06:07AM -0400, Avi Kivity wrote:
On 08/22/2011 03:55 PM, Roedel, Joerg wrote:
 
  Well, I don't think it's really meaningless, but we need some way to
  communicate the information about device groups to userspace.
  
I mean the contents of the group descriptor.  There are enough 42s in
the kernel, it's better if we can replace a synthetic number with
something meaningful.
 
  If we only look at PCI then a Segment:Bus:Dev.Fn number would be
  sufficient, of course. But the idea was to make it generic enough so
  that it works with !PCI too.
 
 
 We could make it an arch defined string instead of a symlink.  So it 
 doesn't return 42, rather something that can be used by the admin to 
 figure out what the problem was.

Well, ok, it would certainly differ from the in-kernel representation
then and introduce new architecture dependencies into libvirt. But if
the 'group-string' is more meaningful to users then it's certainly good.
Suggestions?

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] semantics of -cpu host and check/enforce

2011-06-11 Thread Roedel, Joerg
On Fri, Jun 10, 2011 at 05:36:37PM -0400, Eduardo Habkost wrote:

 We have 3 sets of cpu features that may or may not be included in
 '-cpu host':
 
 A) Features that are supported by the host and that KVM can already
emulate, or don't need KVM support to be used;
 B) Features that may be not supported by the host but can be emulated by
KVM (e.g. the SVM features, or x2apic);
 C) Features that are supported by the host but KVM can't emulate.
Divided in:
C1) features we can't emulate and we know about it (e.g. dtes64)[1]
C2) features we possibly can't emulate but we don't even know about it
(e.g. features added to recent CPUs).
 
 It seems obvious that all the features in group A must always be
 included in '-cpu host', but what about features in the B or C groups?
 
 
 About group B: it looks like we are not being consistent. For example,
 svm_features has every bit enabled when using '-cpu host' even if the
 host doesn't support them; in other cases (e.g. x2apic), it is not
 enabled by '-cpu host' unless the host already supports it.

SVM is a special feature. We can't just forward the host-cpuid to the
guest because every SVM feature we provide to the guest needs emulation
in the kernel module. The kernel-module will tell qemu-kvm about its
feature set via the GET_SUPPORTED_CPUID ioctl. So the idea behind -cpu
host and SVM features is that qemu-kvm enables all of them and masks out
everything that is not supported by the kernel module.

Note that the kernel might even emulate features that are not supported
on the host, like the vmcb-clean-bits, so we really need to ask the
kernel what is supported for the guest.
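
A sketch of that masking step (cpuid_leaf_edx() and get_supported_cpuid() are
hypothetical helpers around the real KVM_GET_SUPPORTED_CPUID ioctl):

/* -cpu host starts with all SVM feature bits set (-1) and keeps only
 * what the kernel module reports as emulatable. */
struct kvm_cpuid2 *supported = get_supported_cpuid(kvm_fd);
uint32_t kvm_svm_mask = cpuid_leaf_edx(supported, 0x8000000A);
guest_svm_features &= kvm_svm_mask;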

Regards,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] drop -enable-nesting

2011-05-31 Thread Roedel, Joerg
On Tue, May 31, 2011 at 04:58:16AM -0400, Avi Kivity wrote:
 On 05/31/2011 11:44 AM, Daniel P. Berrange wrote:
I think it's safe to drop -enable-nesting immediately.  Dan, does
libvirt make use of it?
 
  Yes, but it should be safe to drop it. Currently, if the user specifies
  a CPU with the 'svm' flag present in libvirt guest XML, then we will
  pass args '-cpu +svm -enable-nesting'. So if we drop --enable-nesting,
  then libvirt will simply omit it and everything should still work because
  we have still got +svm set.
 
 But qemu will complain about an option it can't parse.

The best choice is probably to keep the option and make it a nop for the
lifetime of qemu-kvm. Optionally qemu-kvm can print a warning about the
deprecated option.
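
Something like this in the option-parsing switch (illustrative; the enum name
is assumed, not verified against qemu-kvm's option table):

case QEMU_OPTION_enable_nesting:   /* accepted, but a deliberate nop */
    fprintf(stderr, "Warning: -enable-nesting is deprecated and has no "
                    "effect; use -cpu ...,+svm instead\n");
    break;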

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] [PATCH 3/7] cpu model bug fixes and definition corrections: Add kvm emulated x2apic flag to config defined cpu models

2011-05-30 Thread Roedel, Joerg
On Sat, May 28, 2011 at 04:39:13AM -0400, Jan Kiszka wrote:

 Jörg, how to deal with -enable-nesting in qemu-kvm to align behavior
 with upstream?

My personal preference is to just remove it. In upstream-qemu it is
enabled/disabled by +/-svm. -enable-nesting is just a historic thing
which can be wiped out.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] drop -enable-nesting

2011-05-30 Thread Roedel, Joerg
On Mon, May 30, 2011 at 11:04:02AM -0400, Jan Kiszka wrote:
 On 2011-05-30 16:38, Nadav Har'El wrote:
  On Mon, May 30, 2011, Jan Kiszka wrote about drop -enable-nesting (was: 
  [PATCH 3/7] cpu model bug fixes and definition corrections...):
  On 2011-05-30 10:18, Roedel, Joerg wrote:
  On Sat, May 28, 2011 at 04:39:13AM -0400, Jan Kiszka wrote:
 
  Jörg, how to deal with -enable-nesting in qemu-kvm to align behavior
  with upstream?
 
  My personal preference is to just remove it. In upstream-qemu it is
  enabled/disabled by +/-svm. -enable-nesting is just a historic thing
  which can be wiped out.
  
  -enable-nesting could remain as a synonym for enabling either VMX or SVM
  in the guest, depending on what was available in the host (because KVM now
  supports both nested SVM and nested VMX, but not SVM-on-VMX or vice versa).
 
 Why? Once nesting is stable (I think SVM already is), there is no reason
 for an explicit enable. And you can always mask it out via -cpu.
 
 BTW, what are the defaults for SVM right now in qemu-kvm and upstream?
 Enable if the modeled CPU supports it?

qemu-kvm still needs -enable-nesting, otherwise it is disabled. Upstream
qemu should enable it unconditionally (can be disabled with -cpu ,-svm).

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 08/10] ahci: add ahci emulation

2010-11-19 Thread Roedel, Joerg
On Fri, Nov 19, 2010 at 04:12:43AM -0500, Gerd Hoffmann wrote:
  +static void ahci_check_irq(AHCIState *s)
 
  MSI support would be nice to have.
 
  Alrighty, I added MSI support. But I only have a single vector in use
  atm as nvec > 0 doesn't get written to the PCI config space so the code
  trips on that later on.
 
 For multiple vectors MSI-X is the best choice as it fixes a few design 
 issues MSI has when it comes to multiple vectors.  I think linux never 
 ever uses MSI with multiple vectors anyway.

Linux doesn't even support MSI with multiple vectors today.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()

2010-10-18 Thread Roedel, Joerg
(Sorry for the late reply)

On Thu, Oct 07, 2010 at 08:48:06AM -0400, Anthony Liguori wrote:
 On 10/07/2010 03:42 AM, Roedel, Joerg wrote:
  On Wed, Oct 06, 2010 at 03:24:59PM -0400, Anthony Liguori wrote:
 
  +qemu_compat_version = machine->compat_version;
  +
 if (display_type == DT_NOGRAPHIC) {
 if (default_parallel)
 add_device_config(DEV_PARALLEL, "null");
  -- 
  1.7.0.4
 
   
  Looks fine to me, given CPUs are not in qdev. Anthony?
 
 
  The idea is fine, but why not just add the default CPU to the machine
  description?
   
  If I remember correctly the reason was that the machine description was
  not accessible in the cpuid initialization path because it is a function
  local variable.
 
 Not tested at all but I think the attached patch addresses it in a 
 pretty nice way.
 
 There's a couple ways you could support your patch on top of this.  You 
 could add a kvm_cpu_model to the machine structure that gets defaulted 
 too if kvm_enabled().  You could also introduce a new KVM machine type 
 that gets defaulted to if no explicit machine is specified.

I had something similar in mind but then I realized that we need at
least a cpu_model and a cpu_model_kvm to distinguish between the TCG and
the KVM case.
Further the QEMUMachine data structure is used for all architectures in
QEMU and the model-names only make sense for x86. So I decided for the
compat-version way (which doesn't mean I object to this one ;-) )

Joerg

 From d2370c88cef4b07d48ba3c4804e35ae2db8db7c0 Mon Sep 17 00:00:00 2001
 From: Anthony Liguori aligu...@us.ibm.com
 Date: Thu, 7 Oct 2010 07:43:42 -0500
 Subject: [PATCH] machine: make default cpu model part of machine structure
 
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com
 
 diff --git a/hw/boards.h b/hw/boards.h
 index 6f0f0d7..8c6ef27 100644
 --- a/hw/boards.h
 +++ b/hw/boards.h
 @@ -16,6 +16,7 @@ typedef struct QEMUMachine {
  const char *name;
  const char *alias;
  const char *desc;
 +const char *cpu_model;
  QEMUMachineInitFunc *init;
  int use_scsi;
  int max_cpus;
 diff --git a/hw/pc.c b/hw/pc.c
 index 69b13bf..0826107 100644
 --- a/hw/pc.c
 +++ b/hw/pc.c
 @@ -866,14 +866,6 @@ void pc_cpus_init(const char *cpu_model)
  int i;
  
  /* init CPUs */
 -if (cpu_model == NULL) {
 -#ifdef TARGET_X86_64
 -cpu_model = "qemu64";
 -#else
 -cpu_model = "qemu32";
 -#endif
 -}
 -
  for(i = 0; i < smp_cpus; i++) {
  pc_new_cpu(cpu_model);
  }
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 12359a7..919b4d6 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -204,17 +204,22 @@ static void pc_init_isa(ram_addr_t ram_size,
  const char *initrd_filename,
  const char *cpu_model)
  {
 -if (cpu_model == NULL)
 -cpu_model = "486";
  pc_init1(ram_size, boot_device,
   kernel_filename, kernel_cmdline,
   initrd_filename, cpu_model, 0);
  }
  
 +#ifdef TARGET_X86_64
 +#define DEF_CPU_MODEL "qemu64"
 +#else
 +#define DEF_CPU_MODEL "qemu32"
 +#endif
 +
  static QEMUMachine pc_machine = {
  .name = "pc-0.13",
  .alias = "pc",
  .desc = "Standard PC",
 +.cpu_model = DEF_CPU_MODEL,
  .init = pc_init_pci,
  .max_cpus = 255,
  .is_default = 1,
 @@ -223,6 +228,7 @@ static QEMUMachine pc_machine = {
  static QEMUMachine pc_machine_v0_12 = {
  .name = "pc-0.12",
  .desc = "Standard PC",
 +.cpu_model = DEF_CPU_MODEL,
  .init = pc_init_pci,
  .max_cpus = 255,
  .compat_props = (GlobalProperty[]) {
 @@ -242,6 +248,7 @@ static QEMUMachine pc_machine_v0_12 = {
  static QEMUMachine pc_machine_v0_11 = {
  .name = "pc-0.11",
  .desc = "Standard PC, qemu 0.11",
 +.cpu_model = DEF_CPU_MODEL,
  .init = pc_init_pci,
  .max_cpus = 255,
  .compat_props = (GlobalProperty[]) {
 @@ -277,6 +284,7 @@ static QEMUMachine pc_machine_v0_11 = {
  static QEMUMachine pc_machine_v0_10 = {
  .name = "pc-0.10",
  .desc = "Standard PC, qemu 0.10",
 +.cpu_model = DEF_CPU_MODEL,
  .init = pc_init_pci,
  .max_cpus = 255,
  .compat_props = (GlobalProperty[]) {
 @@ -324,6 +332,7 @@ static QEMUMachine pc_machine_v0_10 = {
  static QEMUMachine isapc_machine = {
  .name = "isapc",
  .desc = "ISA-only PC",
 +.cpu_model = "486",
  .init = pc_init_isa,
  .max_cpus = 1,
  };
 diff --git a/vl.c b/vl.c
 index df414ef..3a55cc8 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -2904,6 +2904,10 @@ int main(int argc, char **argv, char **envp)
  }
  qemu_add_globals();
  
 +if (cpu_model == NULL) {
 +cpu_model = machine->cpu_model;
 +}
 +
  machine->init(ram_size, boot_devices,
kernel_filename, kernel_cmdline, initrd_filename, 
 cpu_model);
  
 -- 
 1.7.0.4
 


-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

[Qemu-devel] Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()

2010-10-07 Thread Roedel, Joerg
On Wed, Oct 06, 2010 at 03:24:59PM -0400, Anthony Liguori wrote:
  +qemu_compat_version = machine->compat_version;
  +
if (display_type == DT_NOGRAPHIC) {
if (default_parallel)
 add_device_config(DEV_PARALLEL, "null");
  -- 
  1.7.0.4
   
  Looks fine to me, given CPUs are not in qdev. Anthony?
 
 
 The idea is fine, but why not just add the default CPU to the machine 
 description?

If I remember correctly the reason was that the machine description was
not accessible in the cpuid initialization path because it is a function
local variable. I could have made it a global variable but considered
the compat_version approach simpler. The qemu_compat_version might also
be useful at other places.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 3/3] Add svm cpuid features

2010-09-28 Thread Roedel, Joerg
On Mon, Sep 27, 2010 at 12:22:13PM -0400, Avi Kivity wrote:
   On 09/27/2010 05:40 PM, Roedel, Joerg wrote:
  On Mon, Sep 27, 2010 at 11:02:23AM -0400, Avi Kivity wrote:
  On 09/27/2010 04:58 PM, Avi Kivity wrote:
   On 09/27/2010 03:16 PM, Joerg Roedel wrote:
  This patch adds the svm cpuid feature flags to the qemu
  initialization path. It also adds the svm features available
  on phenom to its cpu-definition and extends the host cpu
  type to support all svm features KVM can provide.


  This should really be all svm features since we'll later mask them
  with kvm capabilities.  This way, if we later add more capabilities,
  we automatically gain support in userspace.
 
  Yes, that is what -cpu host does with these patches applied. The
  svm_features variable is set to -1 in this case and masked out later
  with the KVM capabilities.
 
 
 Well, running current uq/master I get:
 
has_svm: can't execute cpuid 0x8000000a
kvm: no hardware support
 
 ?

Weird, it worked here as I tested it. I had it on qemu/master and with
all three patches. But patch 1 should not make the difference. I'll take a
look; have you pushed the failing uq/master? What was your command line?

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 3/3] Add svm cpuid features

2010-09-28 Thread Roedel, Joerg
On Tue, Sep 28, 2010 at 05:37:58AM -0400, Avi Kivity wrote:
   On 09/28/2010 11:28 AM, Roedel, Joerg wrote:

  Weird, it worked here as I tested it. I had it on qemu/master and with
  all three patches. But patch 1 should not make the difference. I'll take a
  look; have you pushed the failing uq/master?
 
 Yes, 8fe6a21c76.
 
  What was your command line?
 
 qemu-system-x86_64 -m 2G -cpu kvm64,+svm,+npt -enable-kvm ...
 
 Note this is qemu.git, so -enable-kvm is needed.

Ok, I apparently forgot to force the CPUID xlevel to be 0x8000000A when
SVM is enabled, probably because I only tested CPU models where xlevel
already defaults to 0x8000000A. Attached is a fix, thanks for catching
this.

Joerg


From d18d973d037f1af1869e114aea1f076ea3a1ee7a Mon Sep 17 00:00:00 2001
From: Joerg Roedel joerg.roe...@amd.com
Date: Tue, 28 Sep 2010 11:58:49 +0200
Subject: [PATCH 1/1] qemu: Force xlevel to at least 0x8000000A if SVM is enabled

This patch forces the extended CPUID level to be at least
0x8000000A if SVM is enabled in the CPU model.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 target-i386/cpuid.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 0e0bf60..0630fe1 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -714,6 +714,11 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
     x86_cpu_def->ext3_features &= ~minus_ext3_features;
     x86_cpu_def->kvm_features &= ~minus_kvm_features;
     x86_cpu_def->svm_features &= ~minus_svm_features;
+    if ((x86_cpu_def->ext3_features & CPUID_EXT3_SVM) &&
+        (x86_cpu_def->xlevel < 0x8000000A)) {
+        /* Force xlevel to at least 0x8000000A if SVM enabled */
+        x86_cpu_def->xlevel = 0x8000000A;
+    }
     if (check_cpuid) {
         if (check_features_against_host(x86_cpu_def) && enforce_cpuid)
             goto error;
-- 
1.7.0.4


-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 3/3] Add svm cpuid features

2010-09-27 Thread Roedel, Joerg
On Mon, Sep 27, 2010 at 11:02:23AM -0400, Avi Kivity wrote:
   On 09/27/2010 04:58 PM, Avi Kivity wrote:
   On 09/27/2010 03:16 PM, Joerg Roedel wrote:
  This patch adds the svm cpuid feature flags to the qemu
   initialization path. It also adds the svm features available
  on phenom to its cpu-definition and extends the host cpu
  type to support all svm features KVM can provide.
 
 
  This should really be all svm features since we'll later mask them 
  with kvm capabilities.  This way, if we later add more capabilities, 
  we automatically gain support in userspace.

Yes, that is what -cpu host does with these patches applied. The
svm_features variable is set to -1 in this case and masked out later
with the KVM capabilities.

 
  (or rather, all svm features that don't need save/restore support in 
  userspace; I don't think any do, beyond svm itself)
 
 
 I applied 2 and 3 (to branch uq/master).  If you want to add more svm 
 features, please send a patch.  For patch 1, I'd like a review by 
 someone who understands the compat code.

Great, thanks.

Joerg
-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 3/3] Add svm cpuid features

2010-09-16 Thread Roedel, Joerg
On Thu, Sep 16, 2010 at 10:06:48AM -0400, Avi Kivity wrote:
   On 09/14/2010 05:52 PM, Joerg Roedel wrote:
  This patch adds the svm cpuid feature flags to the qemu
  initialization path. It also adds the svm features available
  on phenom to its cpu-definition and extends the host cpu
  type to support all svm features KVM can provide.
 
 
 
 Does phenom really have vmcbclean?  I thought it was a really new feature.

No, but we could emulate it on almost all hardware that has SVM. It's
basically the same as with x2apic.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()

2010-09-16 Thread Roedel, Joerg
On Thu, Sep 16, 2010 at 10:03:19AM -0400, Avi Kivity wrote:
   On 09/14/2010 05:52 PM, Joerg Roedel wrote:
  As requested by Alex this patch makes kvm64 the default CPU
  model when qemu is started with -enable-kvm.
 
 
 
 This breaks compatibility; if started with -M 0.13 or prior we should 
 default to qemu64.

Ok, I can change that. But it's ok to make kvm64 the default for
anything > 0.13?

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 3/3] Add svm cpuid features

2010-09-16 Thread Roedel, Joerg
On Thu, Sep 16, 2010 at 10:17:09AM -0400, Alexander Graf wrote:
 Roedel, Joerg wrote:
  On Thu, Sep 16, 2010 at 10:06:48AM -0400, Avi Kivity wrote:

On 09/14/2010 05:52 PM, Joerg Roedel wrote:
  
  This patch adds the svm cpuid feature flags to the qemu
  initialization path. It also adds the svm features available
  on phenom to its cpu-definition and extends the host cpu
  type to support all svm features KVM can provide.
 
 

  Does phenom really have vmcbclean?  I thought it was a really new feature.
  
 
   No, but we could emulate it on almost all hardware that has SVM. It's
   basically the same as with x2apic.

 
 -cpu non-host is not about what we could emulate, but what the
 hardware really is. Features not supported by the particular CPU do not
 belong there.

I'll remove it then. It would have been nice to have it there because it
will improve performance of svm emulation. But if it is about emulating
the real cpu then I'll just drop it.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 3/3] Add svm cpuid features

2010-09-16 Thread Roedel, Joerg
On Thu, Sep 16, 2010 at 10:28:01AM -0400, Alexander Graf wrote:
 Roedel, Joerg wrote:
  I'll remove it then. It would have been nice to have it there because it
  will improve performance of svm emulation. But if it is about emulating
  the real cpu then I'll just drop it.
 
 speed == -cpu host :). If the performance increase is significant, we
 can always add a new cpu type for a cpu that does support those flags so
 people can use it there and still be migration safe. In fact, that would
 even benefit you guys as you'd sell more new machines for speed ;).

Well, since it's emulated completely in software it would be available on
all hosts that start with -cpu phenom (at least if they use the same kvm
version). So this would not break migration.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()

2010-09-14 Thread Roedel, Joerg
On Tue, Sep 14, 2010 at 11:58:03AM -0400, Alexander Graf wrote:

  +if (kvm_enabled())
  +cpu_model = DEFAULT_KVM_CPU_MODEL;
  +else
  +cpu_model = DEFAULT_QEMU_CPU_MODEL;

 
 Braces :(.

Okay, here is the new patch:

From f49e78edbd4143d05128228d9cc24bd5abc3abf1 Mon Sep 17 00:00:00 2001
From: Joerg Roedel joerg.roe...@amd.com
Date: Tue, 14 Sep 2010 16:52:11 +0200
Subject: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()

As requested by Alex this patch makes kvm64 the default CPU
model when qemu is started with -enable-kvm.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 hw/pc.c |   20 +++-
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..a6355f3 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -40,6 +40,16 @@
 #include "sysbus.h"
 #include "sysemu.h"
 #include "blockdev.h"
+#include "kvm.h"
+
+
+#ifdef TARGET_X86_64
+#define DEFAULT_KVM_CPU_MODEL "kvm64"
+#define DEFAULT_QEMU_CPU_MODEL "qemu64"
+#else
+#define DEFAULT_KVM_CPU_MODEL "kvm32"
+#define DEFAULT_QEMU_CPU_MODEL "qemu32"
+#endif
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -867,11 +877,11 @@ void pc_cpus_init(const char *cpu_model)
 
 /* init CPUs */
 if (cpu_model == NULL) {
-#ifdef TARGET_X86_64
-cpu_model = "qemu64";
-#else
-cpu_model = "qemu32";
-#endif
+if (kvm_enabled()) {
+cpu_model = DEFAULT_KVM_CPU_MODEL;
+} else {
+cpu_model = DEFAULT_QEMU_CPU_MODEL;
+}
 }
 
 for(i = 0; i < smp_cpus; i++) {
-- 
1.7.0.4


-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] [PATCH 1/3] qemu-kvm: Invert svm-flag setting logic

2010-09-07 Thread Roedel, Joerg
On Tue, Sep 07, 2010 at 08:59:52AM -0400, Avi Kivity wrote:
   On 09/07/2010 03:55 PM, Alexander Graf wrote:
 
  Was it broken?  How?
  When migrating inside l2 context, we're missing information. My idea back 
  then was to force the l1 guest out of l2 context every time we want to 
  migrate, but I'm not sure this has happened.
 
 I thought that was implemented, but not sure (though it isn't clean 
 architecturally).
 
  Even then I'm not sure we're transferring the GIF.
 
 Argh, we aren't.

Migration with nested-svm doesn't work reliably today. But that is on my
list.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] [PATCH 1/3] qemu-kvm: Invert svm-flag setting logic

2010-09-07 Thread Roedel, Joerg
On Tue, Sep 07, 2010 at 10:14:46AM -0400, Alexander Graf wrote:
 
 On 07.09.2010, at 16:12, Roedel, Joerg wrote:
 
  On Tue, Sep 07, 2010 at 09:51:33AM -0400, Avi Kivity wrote:
   On 09/07/2010 03:23 PM, Alexander Graf wrote:
  
  I think we should get rid of kvm_nested and -enable-nesting. Instead, we 
  should enable the SVM bit in the host and qemu64 cpu types, but not 
  in kvm64. This way users are safe to not use nested svm, but can choose 
  to do so if they like.
  
  Also, it should be possible to do something like -cpu kvm64,flags=+svm. 
  Without having to mess with -enable-nesting.
  
  I agree, -enable-nesting is redundant with -cpu ,+svm.
  
   This patchset makes -enable-nesting practically a synonym for -cpu ,+svm.
  So we can safely remove -enable-nesting in the future.
 
 Why not just not introduce it? :) It's always hard to get rid of legacy.

Because it's already there? The patches are against qemu-kvm.

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] [PATCH 1/3] qemu-kvm: Invert svm-flag setting logic

2010-09-07 Thread Roedel, Joerg
On Tue, Sep 07, 2010 at 10:16:10AM -0400, Alexander Graf wrote:
 Oh, because this is for qemu-kvm. Uh - how about a nice little patch
 that makes things work in qemu.git, leave out the -enable-nesting
 piece and just keep the -enable-nesting backwards compat patch in
 qemu-kvm.git - or maybe even remove it there.

I can do this. Is it better to merge all this stuff into qemu.git
instead of qemu-kvm.git? Then I'll target qemu.git again.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




Re: [Qemu-devel] [PATCH 1/3] qemu-kvm: Invert svm-flag setting logic

2010-09-07 Thread Roedel, Joerg
On Tue, Sep 07, 2010 at 09:51:33AM -0400, Avi Kivity wrote:
   On 09/07/2010 03:23 PM, Alexander Graf wrote:
 
  I think we should get rid of kvm_nested and -enable-nesting. Instead, we 
  should enable the SVM bit in the host and qemu64 cpu types, but not in 
  kvm64. This way users are safe to not use nested svm, but can choose to 
  do so if they like.
 
  Also, it should be possible to do something like -cpu kvm64,flags=+svm. 
  Without having to mess with -enable-nesting.
 
 I agree, -enable-nesting is redundant with -cpu ,+svm.

This patchset makes -enable-nesting practically a synonym for -cpu ,+svm.
So we can safely remove -enable-nesting in the future.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632




[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-28 Thread Roedel, Joerg
On Thu, May 27, 2010 at 02:53:18PM -0400, Jan Kiszka wrote:
 Based on the KVM code (which is known to work perfectly :) ), I think
 you are right: SVM apparently clears the valid bit in EVENTINJ during
 VMRUN once it starts processing the injection, not after it as it's the
 case in current QEMU. But better ask the experts: Jörg, Gleb?

SVM always clears the vmcb.eventinj on vmrun because every exception is
injected right after vmrun has finished and the cpu is in guest mode. It can
happen (for example if taking the exception causes a page fault) that
the vmcb.eventinj field is copied to vmcb.exit_int_info.
Also note that at this point there is a difference between hardware svm
and the nested-svm implementation in kvm. The hardware always takes the
exception first before checking for any other intercept condition. This
basically means that exit_int_info is only set when the injected event
could not be delivered due to other conditions in the guest (page
fault, nested page-fault, ...).
In nested-svm you can get a valid exit_int_info when an interrupt or nmi
is pending too. In the software implementation these intercepts are
taken before the event is delivered and you find the event in
vmcb.exit_int_info.
This is not forbidden in the svm architecture and I have not found a
hypervisor that has a problem with this different behavior. I have a
patch here which changes this in nested-svm, but it introduces more
problems than it fixes.

Joerg





[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-28 Thread Roedel, Joerg
On Thu, May 27, 2010 at 06:20:00PM -0400, Jan Kiszka wrote:
 Erik van der Kouwe wrote:
  Problem is: I'm compiling in Linux and testing in MINIX. Testing on the
  real hardware would require a reboot everytime. Moreover, it might screw
  up my system if I make bad mistakes (the MINIX filesystem is easily
  corrupted).
 
 Use Linux+KVM as host OS, it can also run VMMs as guests (aka nested
 SVM). And you could even debug those guests just like when you would run
 QEMU in emulation mode. In contrast to SVM emulation, nesting is fairly
 stable AFAIK. And it is faster.

At least it is more stable than any other nested-svm implementation I
know of ;-)
There are issues with kvmclock when you run kvm-on-kvm and you should
not expect windows-based hypervisors to run without problems. Besides
that, for running kvm-on-kvm and xen-on-kvm it is indeed fairly
stable :-)

  Linux source tree (2.6.31-ubuntu), arch/x86/kvm/svm.c, end of function
  nested_svm_vmrun. Here event_inj and event_inj_err are copied from a
  different VMCB, effectively clearing the value set by the CPU. Maybe
  this isn't were I should have been looking though?

The interesting part is in nested_svm_vmexit. There you have this piece
of code:

/*
 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
 * to make sure that we do not lose injected events. So check event_inj
 * here and copy it to exit_int_info if it is valid.
 * Exit_int_info and event_inj can't be both valid because the case
 * below only happens on a VMRUN instruction intercept which has
 * no valid exit_int_info set.
 */
if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
        struct vmcb_control_area *nc = &nested_vmcb->control;

        nc->exit_int_info     = vmcb->control.event_inj;
        nc->exit_int_info_err = vmcb->control.event_inj_err;
}

and a few lines later:

nested_vmcb->control.event_inj = 0;
nested_vmcb->control.event_inj_err = 0;

... which takes care of this situation. The vmcb.eventinj field is _defined_ to
be zero on a #vmexit.

Joerg





[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-28 Thread Roedel, Joerg
On Fri, May 28, 2010 at 02:10:59AM -0400, Jan Kiszka wrote:
 Erik van der Kouwe wrote:

  In my experience, if I provide the -enable-kvm switch then the guest VMM
  never detects the presence of virtualization support. Does this only
  work on AMD hardware? Or do I need to supply some additional parameter
  to make it work?
 
 Yes, forgot to mention: -enable-nesting, and you need qemu-kvm. This
 feature hasn't been merged upstream yet.

And the svm-emulation is only available on AMD hardware.

Joerg





[Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-28 Thread Roedel, Joerg
On Fri, May 28, 2010 at 03:45:09AM -0400, Erik van der Kouwe wrote:
 This is ok, the problem is the event_inj field rather than the 
 exit_int_info field. From what I've seen the SVM specification neither 
 specifies that the CPU writes to this field nor does it explicitly 
 forbid it. Given the unclarity of the specification it may be safest to 
 deal with this in the same way as the hardware does (although I don't 
 know which way this is, it seems intuitively unlikely that the hardware 
 would set event_inj to valid).

The AMD64 Architecture Programmer's Manual Volume 2 states in section
15.19:

When an event is injected by means of this mechanism, the VMRUN
instruction causes the guest to unconditionally take the
specified exception or interrupt before executing the first
guest instruction.

Which implicitly means that. But it could be documented more explicitly,
that's right :)

Joerg





Re: [Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-28 Thread Roedel, Joerg
On Fri, May 28, 2010 at 09:30:19AM -0400, Erik van der Kouwe wrote:
 Would be nice to have nested VMX support though, given that Intel CPUs 
 are so much more common than AMDs. I've been searching on the Dell 
 website (in the Netherlands) for a laptop recently and I couldn't find a 
 single AMD model.

That doesn't mean they are uncommon ;-) There are many vendors that
offer notebooks with AMD processors.

Joerg





Re: [Qemu-devel] Re: SVM emulation: EVENTINJ marked valid when a pagefault happens while issuing a software interrupt

2010-05-28 Thread Roedel, Joerg
On Fri, May 28, 2010 at 09:20:31AM -0400, Jamie Lokier wrote:
 Roedel, Joerg wrote:
  On Fri, May 28, 2010 at 02:10:59AM -0400, Jan Kiszka wrote:
   Erik van der Kouwe wrote:
  
In my experience, if I provide the -enable-kvm switch then the guest VMM
never detects the presence of virtualization support. Does this only
work on AMD hardware? Or do I need to supply some additional parameter
to make it work?
   
   Yes, forgot to mention: -enable-nesting, and you need qemu-kvm. This
   feature hasn't been merged upstream yet.
  
  And the svm-emulation is only available on AMD hardware.
 
 I assume you mean nested SVM emulation in a KVM guest is only
 available on real AMD hardware?

Right.

 Is this due to something inherent, or just a limitation of the KVM
 code not handling all the necessary traps in kvm-intel?

The SVM and VMX extensions are architecturally very different in many
details. This makes it very hard to emulate VMX on SVM or vice versa. I
am not even sure if it is possible at all to emulate one extension in an
architecturally complete way using the other extension.

Joerg