Virtualization DevRoom at FOSDEM 2013

2012-11-16 Thread Chris Wright
Following on the heels of a successful KVM Forum and oVirt Workshop,
FOSDEM will be hosting a Virtualization DevRoom in February.  If you've
been to FOSDEM before, you know this is about developers and code, not
products.

Presentation proposals are due by December 16th 2012.

The full details are here:

 http://osvc.v2.cs.unibo.it/index.php/Main_Page

With the relevant topics being:

Topics covered will include, but are not limited to:
 - machine virtualization (e.g. KVM, Xen, VirtualBox,...)
 - network virtualization (e.g. openvstack, vale, vde, Open vSwitch,...)
 - process level virtualization, flexible kernels (e.g. rump anykernel, view-os, ...)
 - virt management (e.g. ganeti, libvirt, ovirt, XCP, ...)

thanks,
-chris
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] QEMU was not selected for Google Summer of Code this year

2012-03-16 Thread Chris Wright
* Natalia Portillo (clau...@claunia.com) wrote:
 QEMU hosted on Haiku would be interesting.

The fun of Haiku
especially when it is
hosting QEMU


Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-07 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 02/07/2012 07:18 AM, Avi Kivity wrote:
 On 02/07/2012 02:51 PM, Anthony Liguori wrote:
 On 02/07/2012 06:40 AM, Avi Kivity wrote:
 On 02/07/2012 02:28 PM, Anthony Liguori wrote:
 
 It's a potential source of exploits
 (from bugs in KVM or in hardware). I can see people wanting to be
 selective with access because of that.
 
 As is true of the rest of the kernel.
 
 If you want finer-grained access control, that's exactly why we have things
 like LSM and SELinux. You can add the appropriate LSM hooks into the KVM
 infrastructure and set up default SELinux policies appropriately.
 
 LSMs protect objects, not syscalls. There isn't an object to protect here
 (except the fake /dev/kvm object).
 
 A VM can be an object.
 
 Not really, it's not accessible in a namespace. How would you label it?

A VM, vcpu, etc are all objects.  The labelling can be implicit based on
the security context of the process creating the object.  You could create
simplistic rules such as a process may have the ability KVM__VM_CREATE
(this is roughly analogous to the PROC__EXECMEM policy control that
allows some processes to create executable writable memory mappings, or
SHM__CREATE for a process that can create a shared memory segment).
Adding some label mgmt to the object (add ->security and some callbacks to
do ->alloc/init/free), and then checks on the object itself would allow
for finer grained protection.  If there was any VM lookup (although the
original example explicitly ties a process to a vm and a thread to a
vcpu) the finer grained check would certainly be useful to verify that
the process can access the VM.
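
A rough userspace sketch of the labelling idea above, not kernel code.  It
assumes a label is just a number copied from the creating process, and that
policy is a bitmask per process.  All names here (sid_t, struct task_ctx,
struct vm_object, PERM_VM_CREATE, PERM_VM_ACCESS, vm_object_alloc and
friends) are hypothetical; a real implementation would go through LSM hooks
and SELinux policy lookups instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical security identifier, standing in for an SELinux context. */
typedef unsigned int sid_t;

/* Hypothetical permission bits, loosely modelled on KVM__VM_CREATE. */
enum { PERM_VM_CREATE = 1u << 0, PERM_VM_ACCESS = 1u << 1 };

/* Toy "process": a security context plus the permissions policy grants it. */
struct task_ctx {
    sid_t sid;
    unsigned int perms;
};

/* Toy VM object carrying a security blob, like a ->security field. */
struct vm_object {
    sid_t *security;    /* label, set implicitly at creation time */
};

/* Create a VM: coarse check first, then label from the creator's context. */
struct vm_object *vm_object_alloc(const struct task_ctx *creator)
{
    struct vm_object *vm;

    assert(creator != NULL);
    if (!(creator->perms & PERM_VM_CREATE))
        return NULL;                    /* coarse-grained denial */

    vm = malloc(sizeof(*vm));
    if (!vm)
        return NULL;
    vm->security = malloc(sizeof(*vm->security));
    if (!vm->security) {
        free(vm);
        return NULL;
    }
    *vm->security = creator->sid;       /* implicit labelling */
    return vm;
}

/* Finer-grained check on the object itself, e.g. after a VM lookup. */
bool vm_object_may_access(const struct task_ctx *task,
                          const struct vm_object *vm)
{
    assert(task != NULL && vm != NULL && vm->security != NULL);
    return (task->perms & PERM_VM_ACCESS) && *vm->security == task->sid;
}

/* Release the object together with its label. */
void vm_object_free(struct vm_object *vm)
{
    if (!vm)
        return;
    free(vm->security);
    free(vm);
}
```

The create-time check is the coarse rule; the per-object check is what the
label makes possible once a VM can be looked up by something other than its
creator.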

 Labels can originate from userspace, IIUC, so I think it's possible for QEMU
 (or whatever the userspace is) to set the label for the VM while it's
 creating it. I think this is how most of the labeling for X and things of
 that nature works.

For X, the policy enforcement is done in the X server.  There is
assistance from the kernel for doing policy server queries (can foo do
bar?), but it's up to the X server to actually care enough to ask and
then fail a request that doesn't comply.  I'm not sure that's the model
here.

thanks,
-chris


Re: [PATCH] intel-iommu: Add device info into list before doing context mapping

2011-12-21 Thread Chris Wright
* Hao, Xudong (xudong@intel.com) wrote:
 Yes, Chris, thanks for your comments.
 How about this one?

Yes, it gets the locking right.

Also makes host device and guest assigned device go through the same order:

alloc_devinfo and init
lock; place info on lists; unlock
domain_context_mapping()

The patch itself is whitespace damaged and does not apply.  Please fix
and feel free to add my:

Acked-by: Chris Wright chr...@sous-sol.org


Re: [PATCH] intel-iommu: Add device info into list before doing context mapping

2011-12-21 Thread Chris Wright
* Chris Wright (chr...@sous-sol.org) wrote:
 * Hao, Xudong (xudong@intel.com) wrote:
  Yes, Chris, thanks for your comments.
  How about this one?
 
 Yes, it gets the locking right.

Sorry, I missed one other problem on the error path.  You need to also
update pdev->dev.archdata.iommu to NULL (otherwise it is left pointing
to freed memory).
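
For illustration only, here is a userspace model of that ordering and of the
error path: allocate, publish on the lists under the lock, map, and on
failure unlink and clear the back-pointer under the lock before freeing.
None of this is the intel-iommu code.  The list helpers, the assert-only
"lock", and every toy_* name are assumptions made for the sketch, and
toy_map_ok/toy_map_fail merely stand in for domain_context_mapping().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly-linked list, standing in for list_head. */
struct node { struct node *prev, *next; };

void node_init(struct node *head) { head->prev = head->next = head; }

void node_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

void node_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Toy lock: only records whether it is held, so misuse trips an assert. */
static bool toy_lock_held;
static void toy_lock(void)   { assert(!toy_lock_held); toy_lock_held = true; }
static void toy_unlock(void) { assert(toy_lock_held);  toy_lock_held = false; }

struct dev_info   { struct node link, global; };
struct toy_device { struct dev_info *iommu; };  /* models archdata.iommu */
struct toy_domain { struct node devices; };

/* Global list of all device infos, initialised as an empty circular list. */
struct node toy_global_list = { &toy_global_list, &toy_global_list };

/* Hypothetical mapping callbacks: one succeeds, one fails. */
int toy_map_ok(struct toy_domain *d, struct toy_device *dev)
{ (void)d; (void)dev; return 0; }
int toy_map_fail(struct toy_domain *d, struct toy_device *dev)
{ (void)d; (void)dev; return -1; }

int toy_add_dev_info(struct toy_domain *d, struct toy_device *dev,
                     int (*map)(struct toy_domain *, struct toy_device *))
{
    struct dev_info *info = calloc(1, sizeof(*info));  /* alloc and init */
    int ret;

    if (!info)
        return -1;

    toy_lock();                         /* lock; place info on lists; unlock */
    node_add(&info->link, &d->devices);
    node_add(&info->global, &toy_global_list);
    dev->iommu = info;
    toy_unlock();

    ret = map(d, dev);                  /* context mapping comes last */
    if (ret) {
        toy_lock();                     /* undo under the same lock */
        node_del(&info->link);
        node_del(&info->global);
        dev->iommu = NULL;              /* no dangling pointer after free */
        toy_unlock();
        free(info);
        return ret;
    }
    return 0;
}
```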

 Also makes host device and guest assigned device go through the same order:
 
 alloc_devinfo and init
 lock; place info on lists; unlock
 domain_context_mapping()
 
 The patch itself is whitespace damaged and does not apply.  Please fix
 and feel free to add my:
 
 Acked-by: Chris Wright chr...@sous-sol.org


Re: [PATCH] intel-iommu: Add device info into list before doing context mapping

2011-12-20 Thread Chris Wright
* Hao, Xudong (xudong@intel.com) wrote:
 @@ -2282,6 +2276,14 @@ static int domain_add_dev_info(struct dmar_domain *domain,
   pdev->dev.archdata.iommu = info;
   spin_unlock_irqrestore(&device_domain_lock, flags);
  
 + ret = domain_context_mapping(domain, pdev, translation);
 + if (ret) {
 + list_del(&info->link);
 + list_del(&info->global);

At the very least, this is not correct locking.

 + free_devinfo_mem(info);
 + return ret;
 + }
 +


Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-30 Thread Chris Wright
* Peter Zijlstra (a.p.zijls...@chello.nl) wrote:
 On Wed, 2011-11-30 at 21:52 +0530, Dipankar Sarma wrote:
  
  Also, if at all topology changes due to migration or host kernel decisions,
  we can make use of something like VPHN (virtual processor home node)
  capability on Power systems to have guest kernel update its topology
  knowledge. You can refer to that in
  arch/powerpc/mm/numa.c. 
 
 I think that fail^Wfeature of PPC is terminally broken. You simply
 cannot change the topology after the fact. 

Agreed, there are too many things that consult topology once and never
look back.


Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode

2011-11-30 Thread Chris Wright
* Ben Hutchings (bhutchi...@solarflare.com) wrote:
 On Wed, 2011-11-30 at 09:34 -0800, Greg Rose wrote:
  On 11/29/2011 9:19 AM, Ben Hutchings wrote:
   On Tue, 2011-11-29 at 16:35 +, Ben Hutchings wrote:
  
   Maybe I missed something!
 [...]
   If not, please explain what the new model *is*.
  
  The new model is to incorporate a VEB into the NIC.  The current model 
  doesn't address any of the requirements of a VEB in the NIC and this 
  proposed set of patches allow us to set MAC filters for the *ports* on 
  the internal NIC VEB.  Consider the PF and each of the VFs as just a 
  port on the VEB.  We need the ability to set L2 filters (MAC, MC and 
  VLAN) for each of the ports on that VEB.  There is no currently 
  supported method for doing this.  So yes, this is a new model although 
  it's a fairly simple one.
 
 Explain precisely how the VEB changes the existing model.  Explain how
 the existing MAC filter and VF filter APIs interact with port filters on
 the VEB.  Refer to any relevant standards.

I agree that it's confusing.  Couldn't you simplify your ascii art
(hopefully removing hw assumptions about receive processing, and
completely ignoring vlans for the moment) to something like:

                      |RX
                      v
+---------------------+---------------------+
|             +-------+-------+             |
|             | RX MAC filter |             |
|             |and port select|             |
|             +---------------+             |
|            /        |        \            |
|           /         |         \           |
|  match 0 /          | match 1  \ match 2  |
|         /           |           \         |
|        /            |            \        |
|       v             v             v       |
+-------+-------------+-------------+-------+
        |             |             |
       PF           VF 1          VF 2

And there's an unclear number of ways to update RX MAC filter and port
select table.

1) PF ndo_set_mac_addr
I expect that to be implicit to match 0.

2) PF ndo_set_rx_mode
Less clear, but I'd still expect these to implicitly match 0.

3) PF ndo_set_vf_mac
I expect these to be an explicit match to VF N (given the interface
specifies which VF's MAC is being programmed).

4) VF ndo_set_mac_addr
This one may or may not be allowed (setting MAC+port if the VF is owned
by a guest is likely not allowed), but would expect an implicit VF N.

5) VF ndo_set_rx_mode
Same as 4) above.

6) PF or VF? ndo_set_rx_filter_addr
The new proposal, which has an explicit VF, although when it's VF_SELF
I'm not clear if this is just the same as 5) above?

Have I missed anything?
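
To make the implicit vs explicit distinction concrete, here is a toy model
of a single exact-match table with port select.  It is a sketch under
assumptions, not any driver's code: ports are numbered as in the diagram
(PF is 0, VF N is N), the table is a small fixed array, and all toy_* names
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PORT_PF      0      /* "match 0" in the diagram; VF N is port N */
#define TOY_PORT_NONE  (-1)     /* no filter matched */
#define TOY_MAX_FILTERS 16

struct toy_filter   { uint8_t mac[6]; int port; int used; };
struct toy_rx_table { struct toy_filter entry[TOY_MAX_FILTERS]; };

void toy_rx_table_init(struct toy_rx_table *t) { memset(t, 0, sizeof(*t)); }

/* Add or update an exact-match entry; returns 0, or -1 if the table is full. */
int toy_rx_filter_set(struct toy_rx_table *t, const uint8_t mac[6], int port)
{
    int i, free_slot = -1;

    assert(t != NULL && mac != NULL);
    for (i = 0; i < TOY_MAX_FILTERS; i++) {
        if (t->entry[i].used && memcmp(t->entry[i].mac, mac, 6) == 0) {
            t->entry[i].port = port;    /* existing address: retarget it */
            return 0;
        }
        if (!t->entry[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    memcpy(t->entry[free_slot].mac, mac, 6);
    t->entry[free_slot].port = port;
    t->entry[free_slot].used = 1;
    return 0;
}

/* RX side: pick the destination port for a frame's destination MAC. */
int toy_rx_port_select(const struct toy_rx_table *t, const uint8_t mac[6])
{
    int i;

    for (i = 0; i < TOY_MAX_FILTERS; i++)
        if (t->entry[i].used && memcmp(t->entry[i].mac, mac, 6) == 0)
            return t->entry[i].port;
    return TOY_PORT_NONE;
}

/* Cases 1) and 2): the PF programs its own filters, the port is implicit. */
int toy_pf_set_mac(struct toy_rx_table *t, const uint8_t mac[6])
{
    return toy_rx_filter_set(t, mac, TOY_PORT_PF);
}

/* Case 3): the PF programs a VF's address, the port is explicit. */
int toy_pf_set_vf_mac(struct toy_rx_table *t, int vf, const uint8_t mac[6])
{
    return toy_rx_filter_set(t, mac, vf);
}
```

Cases 4) and 5) would be the same helper called with the VF's own port
number filled in implicitly.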

thanks,
chris


Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode

2011-11-30 Thread Chris Wright
* Ben Hutchings (bhutchi...@solarflare.com) wrote:
 On Wed, 2011-11-30 at 13:04 -0800, Chris Wright wrote:
  I agree that it's confusing.  Couldn't you simplify your ascii art
  (hopefully removing hw assumptions about receive processing, and
  completely ignoring vlans for the moment) to something like:
 
                        |RX
                        v
  +---------------------+---------------------+
  |             +-------+-------+             |
  |             | RX MAC filter |             |
  |             |and port select|             |
  |             +---------------+             |
  |            /        |        \            |
  |           /         |         \           |
  |  match 0 /          | match 1  \ match 2  |
  |         /           |           \         |
  |        /            |            \        |
  |       v             v             v       |
  +-------+-------------+-------------+-------+
          |             |             |
         PF           VF 1          VF 2
  
  And there's an unclear number of ways to update RX MAC filter and port
  select table.
  
  1) PF ndo_set_mac_addr
  I expect that to be implicit to match 0.
  
  2) PF ndo_set_rx_mode
  Less clear, but I'd still expect these to implicitly match 0
  
  3) PF ndo_set_vf_mac
  I expect these to be an explicit match to VF N (given the interface
  specifies which VF's MAC is being programmed).
 
 I'm not sure whether this is supposed to implicitly add to the MAC
 filter or whether that has to be changed too.  That's the main
 difference between my models (a) and (b).

I see now.  I wasn't entirely clear on the difference before.  It's also
going to be hw specific.  I think (Intel folks can verify) that the
Intel SR-IOV devices have a single global unicast exact match table,
for example.

 There's also PF ndo_set_vf_vlan.

Right, although I had mentioned I was trying to limit just to MAC
filtering to simplify.

  4) VF ndo_set_mac_addr
  This one may or may not be allowed (setting MAC+port if the VF is owned
  by a guest is likely not allowed), but would expect an implicit VF N.
  
  5) VF ndo_set_rx_mode
  Same as 4) above.
 
 So this is where we are today.

Cool, good that we agree there.

  6) PF or VF? ndo_set_rx_filter_addr
  The new proposal, which has an explicit VF, although when it's VF_SELF
  I'm not clear if this is just the same as 5) above?
  
  Have I missed anything?
 
 Any physical port can be bridged to a mixture of guests with and without
 their own VFs.  Packets sent from a guest with a VF to the address of a
 guest without a VF need to be forwarded to the PF rather than the
 physical port, but none of the drivers currently get to know about those
 addresses.

To clarify, do you mean something like this?

              physical port
                    |
    +---------------+---------------+
    |            +--+--+            |
    |            | VEB |            |
    |            +-----+            |
    |           /   |   \           |
    |         /     |     \         |
    |       /       |       \       |
    +-----+---------+---------+-----+
          |         |         |
         PF       VF 1      VF 2
          |         |         |
      +---+---+    VM4    +---+---+
      |  sw   |           |macvtap|
      | switch|           +---+---+
      +-+-+-+-+               |
       /  |  \               VM5
      /   |   \
    VM1  VM2  VM3

This has VMs 1-3 hanging off the PF via a linux bridge (traditional hv
switching), VM4 directly owning VF1 (pci device assignment), and VM5
indirectly owning VF2 (macvtap passthrough, that started this whole
thing).

So, I'm understanding you saying that VM4 or VM4 sending a packet to VM1
goes into VEB, out PF, and into linux bridging code, right?  At which
point the PF is in promiscuous mode (btw, same does not work if bridge is
attached to VF, at least for some VFs, due to lack of promiscuous mode).

 Packets sent from a guest with a VF to the address of another guest with
 a VF need to be forwarded similarly, but the driver should be able to
 infer that from (3).

Right, and that works currently for the case where both guests are like
VM4, they directly own the VF via PCI device assignment.  But for VM4
to talk to VM5, VF3 is not in promiscuous mode and has a different MAC
address than VM5's vNIC.  If the embedded bridge does not learn, and
nobody programmed it to fwd frames for VM5 via VF3...

I believe this is what Roopa's patch will allow.  The question now is
whether there's a better way to handle this?

In my mind, we'd model the NIC's embedded bridge as, well, a bridge.
And set anti-spoofing, port mirroring, port mac/vlan filtering, etc via
that bridge.
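
As a sketch of what "model it as a bridge" could mean for forwarding, here
is a toy non-learning embedded bridge with a statically programmed
forwarding table.  The assumptions are mine: unknown unicast goes to the
uplink (physical port), PF is port 0, VF N is port N, and every toy_* name
is hypothetical rather than an existing kernel or driver interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TOY_UPLINK = -1, TOY_PF = 0 };   /* VF N is port N */
#define TOY_FDB_SIZE 8

/* Statically programmed forwarding database; the bridge never learns. */
struct toy_veb {
    struct { uint8_t mac[6]; int port; } fdb[TOY_FDB_SIZE];
    int count;
};

void toy_veb_init(struct toy_veb *v) { memset(v, 0, sizeof(*v)); }

/* Program "frames for mac leave via port"; returns -1 when the table is full. */
int toy_veb_program(struct toy_veb *v, const uint8_t mac[6], int port)
{
    assert(v != NULL && mac != NULL);
    if (v->count == TOY_FDB_SIZE)
        return -1;
    memcpy(v->fdb[v->count].mac, mac, 6);
    v->fdb[v->count].port = port;
    v->count++;
    return 0;
}

/* Forwarding decision for a frame sent by a guest through its VF. */
int toy_veb_forward(const struct toy_veb *v, const uint8_t dst[6])
{
    int i;

    assert(v != NULL && dst != NULL);
    for (i = 0; i < v->count; i++)
        if (memcmp(v->fdb[i].mac, dst, 6) == 0)
            return v->fdb[i].port;
    return TOY_UPLINK;                  /* unknown address: out the wire */
}
```

In this model a frame for a guest behind the software switch leaves on the
uplink until somebody programs that guest's address against the PF port,
which is the gap being discussed.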

thanks,
-chris


Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode

2011-11-30 Thread Chris Wright
* Sridhar Samudrala (s...@us.ibm.com) wrote:
 On 11/30/2011 3:00 PM, Chris Wright wrote:
               physical port
                     |
     +---------------+---------------+
     |            +--+--+            |
     |            | VEB |            |
     |            +-----+            |
     |           /   |   \           |
     |         /     |     \         |
     |       /       |       \       |
     +-----+---------+---------+-----+
           |         |         |
          PF       VF 1      VF 2
           |         |         |
       +---+---+    VM4    +---+---+
       |  sw   |           |macvtap|
       | switch|           +---+---+
       +-+-+-+-+               |
        /  |  \               VM5
       /   |   \
     VM1  VM2  VM3
 
 This has VMs 1-3 hanging off the PF via a linux bridge (traditional hv
 switching), VM4 directly owning VF1 (pci device assignment), and VM5
 indirectly owning VF2 (macvtap passthrough, that started this whole
 thing).
 
 So, I'm understanding you saying that VM4 or VM4 sending a packet to VM1
 goes into VEB, out PF, and into linux bridging code, right?  At which
 point the PF is in promiscuous mode (btw, same does not work if bridge is
 attached to VF, at least for some VFs, due to lack of promiscuous mode).
 
 Packets sent from a guest with a VF to the address of another guest with
 a VF need to be forwarded similarly, but the driver should be able to
 infer that from (3).
 Right, and that works currently for the case where both guests are like
 VM4, they directly own the VF via PCI device assignment.  But for VM4
 to talk to VM5, VF3 is not in promiscuous mode and has a different MAC
 address than VM5's vNIC.  If the embedded bridge does not learn, and
 nobody programmed it to fwd frames for VM5 via VF3...
 I think you are referring to VF2. There is no VF3 in your picture.

*sigh*  (also meant 'VM4 or VM5' up above, not 'VM4 or VM4')...

 In macvtap passthru mode, VF2 will be set to the same mac address as VM5's
 MAC.  So VM4 should be able to talk to VM5.

yes (i think macvtap in bridging or vepa mode w/ single VM has that issue,
not passthru)

 I believe this is what Roopa's patch will allow.  The question now is
 whether there's a better way to handle this?
 My understanding is that Roopa's patch will allow setting additional mac
 addresses to VM5 without the need to put VF5 in promiscuous mode.

Thanks for your corrections Sridhar.

cheers,
-chris


Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-21 Thread Chris Wright
* Peter Zijlstra (a.p.zijls...@chello.nl) wrote:
 On Mon, 2011-11-21 at 21:30 +0530, Bharata B Rao wrote:
  
  In the original post of this mail thread, I proposed a way to export
  guest RAM ranges (Guest Physical Address-GPA) and their corresponding
  host virtual mappings (Host Virtual Address-HVA) from QEMU (via QEMU
  monitor). The idea was to use these GPA to HVA mappings from tools like
  libvirt to bind specific parts of the guest RAM to different host nodes.
  This needed an extension to existing mbind() to allow binding memory of
  a process (QEMU) from a different process (libvirt). This was needed
  since we wanted to do all this from libvirt.
  
  Hence I was coming from that background when I asked for extending
  ms_mbind() to take a tid parameter. If QEMU community thinks that NUMA
  binding should all be done from outside of QEMU, it is needed, otherwise
  what you have should be sufficient. 
 
 That's just retarded, and no you won't get such extensions. Poking at
 another process's virtual address space is just daft. Esp. if there's no
 actual reason for it.

Need to separate the binding vs the policy mgmt.  The policy mgmt could
still be done outside, whereas the binding could still be done from w/in
QEMU.  A simple monitor interface to rebalance vcpu memory allocations
to different nodes could very well schedule vcpu thread work in QEMU.

So, I agree, even if there is some external policy mgmt, it could still
easily work w/ QEMU to use Peter's proposed interface.

thanks,
-chris


Re: KVM device assignment and user privileges

2011-11-20 Thread Chris Wright
* Avi Kivity (a...@redhat.com) wrote:
 On 11/20/2011 04:58 PM, Sasha Levin wrote:
  Hi all,
 
  I've been working on adding device assignment to KVM tools, and started
  with the basics of just getting a device assigned using the
  KVM_ASSIGN_PCI_DEVICE ioctl.
 
  What I've figured is that unprivileged users can request any PCI device
  to be assigned to him, including devices which he shouldn't be touching.
 
  In my case, it happened with the VGA card, where an unprivileged user
  simply called KVM_ASSIGN_PCI_DEVICE with the bus, seg and fn of the VGA
  card and caused the display on the host to go apeshit.
 
  Was it supposed to work this way? 
 
 No, of course not.

Indeed.  A device is typically owned by a host OS driver which precludes
device assignment from working.  If it's not, the unprivileged guest
will not have access to the device's config space or resource bars as
they are only rw for a privileged user.  And similarly, /dev/kvm was
typically left as 0644.  As you can see, it's fragile.

  I couldn't find any security checks in
  the code paths of KVM_ASSIGN_PCI_DEVICE and it looks like any user can
  invoke it with any parameters he'd want - enabling him to kill the host.
 
 Alex, Chris?

The security checks were removed some time back.  The expectation was
that there was nothing an unprivileged user could usefully do w/ the
assign device ioctl, and the assign irq ioctl only works after assign
device.  It's built on an overly fragile set of assumptions, however.
Avi, the simplest short term thing to do now might be simply revert:

48bb09e KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq

While it's a regression for existing unprivileged users, it's better than
a hole.  And in the meantime, we can come up w/ something better to
replace it with.
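
The shape of the check that a revert would bring back, as a userspace toy
rather than the actual KVM code: refuse the request outright unless the
caller holds the capability, before looking at anything else.  The names
toy_caller, TOY_CAP_SYS_RAWIO and toy_assign_irq are hypothetical, and the
error codes are only what I'd assume for such a path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAP_SYS_RAWIO (1u << 0)     /* hypothetical capability bit */

/* Toy caller: just the set of capabilities it holds. */
struct toy_caller { unsigned int caps; };

static bool toy_capable(const struct toy_caller *c, unsigned int cap)
{
    return (c->caps & cap) != 0;
}

/* Toy assign-irq request: privilege check first, then state checks. */
int toy_assign_irq(const struct toy_caller *c, bool device_assigned)
{
    assert(c != NULL);
    if (!toy_capable(c, TOY_CAP_SYS_RAWIO))
        return -EPERM;                  /* unprivileged: stop here */
    if (!device_assigned)
        return -EINVAL;                 /* only valid after assign device */
    return 0;
}
```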

thanks,
-chris


Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-11-08 Thread Chris Wright
* Alexander Graf (ag...@suse.de) wrote:
 On 29.10.2011, at 20:45, Bharata B Rao wrote:
  As guests become NUMA aware, it becomes important for the guests to
  have correct NUMA policies when they run on NUMA aware hosts.
  Currently limited support for NUMA binding is available via libvirt
  where it is possible to apply a NUMA policy to the guest as a whole.
  However multinode guests would benefit if guest memory belonging to
  different guest nodes are mapped appropriately to different host NUMA nodes.
  
  To achieve this we would need QEMU to expose information about
  guest RAM ranges (Guest Physical Address - GPA) and their host virtual
  address mappings (Host Virtual Address - HVA). Using GPA and HVA, any
  external tool like libvirt would be able to divide the guest RAM as per
  the guest NUMA node geometry and bind guest memory nodes to corresponding
  host memory nodes using HVA. This needs both QEMU (and libvirt) changes
  as well as changes in the kernel.
 
 Ok, let's take a step back here. You are basically growing libvirt into a
 memory resource manager that knows how much memory is available on which
 nodes and how these nodes would possibly fit into the host's memory layout.
 
 Shouldn't that be the kernel's job? It seems to me that architecturally the
 kernel is the place I would want my memory resource controls to be in.

I think that both Peter and Andrea are looking at this.  Before we commit
an API to QEMU that has a different semantic than a possible new kernel
interface (that perhaps QEMU could use directly to inform kernel of the
binding/relationship between vcpu thread and its memory at VM startup)
it would be useful to see what these guys are working on...

thanks,
-chris


[PATCH v2 0/2] KVM: remove host and guest pv mmu support

2011-11-01 Thread Chris Wright
This feature hasn't been in use for some years now.  The host side bits
have been deprecated for almost a year.  The guest side would only get used
on old hosts, and it's slower than shadow or hw assisted paging.

Time to remove it.

Chris Wright (2):
  KVM Guest: remove KVM guest pv mmu support
  KVM: remove KVM host pv mmu support

 Documentation/feature-removal-schedule.txt |    9 --
 arch/x86/include/asm/kvm_host.h            |   13 --
 arch/x86/kernel/kvm.c                      |  181
 arch/x86/kvm/mmu.c                         |  135 -
 arch/x86/kvm/x86.c                         |   12 --
 5 files changed, 0 insertions(+), 350 deletions(-)


Changes since RFC:

- v2 rebase to b796a09c

thanks,
-chris


[PATCH v2 1/2] KVM Guest: remove KVM guest pv mmu support

2011-11-01 Thread Chris Wright
This has not been used for some years now.  It's time to remove it.

Signed-off-by: Chris Wright chr...@redhat.com
---

- v2 rebase to b796a09c

 arch/x86/kernel/kvm.c |  181 -
 1 files changed, 0 insertions(+), 181 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a9c2116..f0c6fd6 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,8 +39,6 @@
 #include <asm/desc.h>
 #include <asm/tlbflush.h>
 
-#define MMU_QUEUE_SIZE 1024
-
 static int kvmapf = 1;
 
 static int parse_no_kvmapf(char *arg)
@@ -60,21 +58,10 @@ static int parse_no_stealacc(char *arg)
 
 early_param("no-steal-acc", parse_no_stealacc);
 
-struct kvm_para_state {
-   u8 mmu_queue[MMU_QUEUE_SIZE];
-   int mmu_queue_len;
-};
-
-static DEFINE_PER_CPU(struct kvm_para_state, para_state);
 static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
 static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
 static int has_steal_clock = 0;
 
-static struct kvm_para_state *kvm_para_state(void)
-{
-   return &per_cpu(para_state, raw_smp_processor_id());
-}
-
 /*
  * No need for any IO delay on KVM
  */
@@ -271,151 +258,6 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
    }
 }
 
-static void kvm_mmu_op(void *buffer, unsigned len)
-{
-   int r;
-   unsigned long a1, a2;
-
-   do {
-   a1 = __pa(buffer);
-   a2 = 0;   /* on i386 __pa() always returns <4G */
-   r = kvm_hypercall3(KVM_HC_MMU_OP, len, a1, a2);
-   buffer += r;
-   len -= r;
-   } while (len);
-}
-
-static void mmu_queue_flush(struct kvm_para_state *state)
-{
-   if (state->mmu_queue_len) {
-   kvm_mmu_op(state->mmu_queue, state->mmu_queue_len);
-   state->mmu_queue_len = 0;
-   }
-}
-
-static void kvm_deferred_mmu_op(void *buffer, int len)
-{
-   struct kvm_para_state *state = kvm_para_state();
-
-   if (paravirt_get_lazy_mode() != PARAVIRT_LAZY_MMU) {
-   kvm_mmu_op(buffer, len);
-   return;
-   }
-   if (state->mmu_queue_len + len > sizeof state->mmu_queue)
-   mmu_queue_flush(state);
-   memcpy(state->mmu_queue + state->mmu_queue_len, buffer, len);
-   state->mmu_queue_len += len;
-}
-
-static void kvm_mmu_write(void *dest, u64 val)
-{
-   __u64 pte_phys;
-   struct kvm_mmu_op_write_pte wpte;
-
-#ifdef CONFIG_HIGHPTE
-   struct page *page;
-   unsigned long dst = (unsigned long) dest;
-
-   page = kmap_atomic_to_page(dest);
-   pte_phys = page_to_pfn(page);
-   pte_phys <<= PAGE_SHIFT;
-   pte_phys += (dst & ~(PAGE_MASK));
-#else
-   pte_phys = (unsigned long)__pa(dest);
-#endif
-   wpte.header.op = KVM_MMU_OP_WRITE_PTE;
-   wpte.pte_val = val;
-   wpte.pte_phys = pte_phys;
-
-   kvm_deferred_mmu_op(&wpte, sizeof wpte);
-}
-
-/*
- * We only need to hook operations that are MMU writes.  We hook these so that
- * we can use lazy MMU mode to batch these operations.  We could probably
- * improve the performance of the host code if we used some of the information
- * here to simplify processing of batched writes.
- */
-static void kvm_set_pte(pte_t *ptep, pte_t pte)
-{
-   kvm_mmu_write(ptep, pte_val(pte));
-}
-
-static void kvm_set_pte_at(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, pte_t pte)
-{
-   kvm_mmu_write(ptep, pte_val(pte));
-}
-
-static void kvm_set_pmd(pmd_t *pmdp, pmd_t pmd)
-{
-   kvm_mmu_write(pmdp, pmd_val(pmd));
-}
-
-#if PAGETABLE_LEVELS >= 3
-#ifdef CONFIG_X86_PAE
-static void kvm_set_pte_atomic(pte_t *ptep, pte_t pte)
-{
-   kvm_mmu_write(ptep, pte_val(pte));
-}
-
-static void kvm_pte_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
-{
-   kvm_mmu_write(ptep, 0);
-}
-
-static void kvm_pmd_clear(pmd_t *pmdp)
-{
-   kvm_mmu_write(pmdp, 0);
-}
-#endif
-
-static void kvm_set_pud(pud_t *pudp, pud_t pud)
-{
-   kvm_mmu_write(pudp, pud_val(pud));
-}
-
-#if PAGETABLE_LEVELS == 4
-static void kvm_set_pgd(pgd_t *pgdp, pgd_t pgd)
-{
-   kvm_mmu_write(pgdp, pgd_val(pgd));
-}
-#endif
-#endif /* PAGETABLE_LEVELS >= 3 */
-
-static void kvm_flush_tlb(void)
-{
-   struct kvm_mmu_op_flush_tlb ftlb = {
-   .header.op = KVM_MMU_OP_FLUSH_TLB,
-   };
-
-   kvm_deferred_mmu_op(&ftlb, sizeof ftlb);
-}
-
-static void kvm_release_pt(unsigned long pfn)
-{
-   struct kvm_mmu_op_release_pt rpt = {
-   .header.op = KVM_MMU_OP_RELEASE_PT,
-   .pt_phys = (u64)pfn << PAGE_SHIFT,
-   };
-
-   kvm_mmu_op(&rpt, sizeof rpt);
-}
-
-static void kvm_enter_lazy_mmu(void)
-{
-   paravirt_enter_lazy_mmu();
-}
-
-static void kvm_leave_lazy_mmu(void)
-{
-   struct kvm_para_state *state = kvm_para_state();
-
-   mmu_queue_flush(state);
-   paravirt_leave_lazy_mmu();
-}
-
 static void

[PATCH v2 2/2] KVM: remove KVM host pv mmu support

2011-11-01 Thread Chris Wright
The host side pv mmu support has been marked for feature removal in
January 2011.  It's not in use, is slower than shadow or hardware
assisted paging, and a maintenance burden.  It's November 2011, time to
remove it.

Signed-off-by: Chris Wright chr...@redhat.com
---

- v2 rebase to b796a09c

 Documentation/feature-removal-schedule.txt |    9 --
 arch/x86/include/asm/kvm_host.h            |   13 ---
 arch/x86/kvm/mmu.c                         |  135
 arch/x86/kvm/x86.c                         |   12 ---
 4 files changed, 0 insertions(+), 169 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index d5ac362..877f897 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -397,15 +397,6 @@ Who:   anybody or Florian Mickler flor...@mickler.org
 
 
 
-What:  KVM paravirt mmu host support
-When:  January 2011
-Why:   The paravirt mmu host support is slower than non-paravirt mmu, both
-   on newer and older hardware.  It is already not exposed to the guest,
-   and kept only for live migration purposes.
-Who:   Avi Kivity a...@redhat.com
-
-
-
 What:  iwlwifi 50XX module parameters
 When:  3.0
 Why:   The ..50 modules parameters were used to configure 5000 series and
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c1f19de..6d83264 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -244,13 +244,6 @@ struct kvm_mmu_page {
    struct rcu_head rcu;
 };
 
-struct kvm_pv_mmu_op_buffer {
-   void *ptr;
-   unsigned len;
-   unsigned processed;
-   char buf[512] __aligned(sizeof(long));
-};
-
 struct kvm_pio_request {
    unsigned long count;
    int in;
@@ -347,10 +340,6 @@ struct kvm_vcpu_arch {
     */
    struct kvm_mmu *walk_mmu;
 
-   /* only needed in kvm_pv_mmu_op() path, but it's hot so
-    * put it here to avoid allocation */
-   struct kvm_pv_mmu_op_buffer mmu_op_buffer;
-
    struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
    struct kvm_mmu_memory_cache mmu_page_cache;
    struct kvm_mmu_memory_cache mmu_page_header_cache;
@@ -667,8 +656,6 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 
 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
  const void *val, int bytes);
-int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
- gpa_t addr, unsigned long *ret);
 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 extern bool tdp_enabled;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e9534ce..a9b3a32 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2028,20 +2028,6 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
 
-static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
-{
-   struct kvm_mmu_page *sp;
-   struct hlist_node *node;
-   LIST_HEAD(invalid_list);
-
-   for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) {
-   pgprintk("%s: zap %llx %x\n",
-   __func__, gfn, sp->role.word);
-   kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
-   }
-   kvm_mmu_commit_zap_page(kvm, &invalid_list);
-}
-
 static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
 {
    int slot = memslot_id(kvm, gfn);
@@ -4004,127 +3990,6 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
    return nr_mmu_pages;
 }
 
-static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer,
-   unsigned len)
-{
-   if (len > buffer->len)
-   return NULL;
-   return buffer->ptr;
-}
-
-static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer,
-   unsigned len)
-{
-   void *ret;
-
-   ret = pv_mmu_peek_buffer(buffer, len);
-   if (!ret)
-   return ret;
-   buffer->ptr += len;
-   buffer->len -= len;
-   buffer->processed += len;
-   return ret;
-}
-
-static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu,
-gpa_t addr, gpa_t value)
-{
-   int bytes = 8;
-   int r;
-
-   if (!is_long_mode(vcpu) && !is_pae(vcpu))
-   bytes = 4;
-
-   r = mmu_topup_memory_caches(vcpu);
-   if (r)
-   return r;
-
-   if (!emulator_write_phys(vcpu, addr, &value, bytes))
-   return -EFAULT;
-
-   return 1;
-}
-
-static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu)
-{
-   (void)kvm_set_cr3(vcpu, kvm_read_cr3(vcpu));
-   return 1;
-}
-
-static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr)
-{
-   spin_lock(&vcpu->kvm->mmu_lock);
-   mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT);
-   spin_unlock(&vcpu->kvm->mmu_lock

[PATCH RFC 0/2] KVM: remove host and guest pv mmu support

2011-10-20 Thread Chris Wright
This feature hasn't been in use for some years now.  The host side bits
have been deprecated for almost a year.  The guest side would only get
used on old hosts, and it's slower than shadow or hw assisted paging.

Time to remove it.

 Documentation/feature-removal-schedule.txt |9 --
 arch/x86/include/asm/kvm_host.h|   13 --
 arch/x86/kernel/kvm.c  |  181 
 arch/x86/kvm/mmu.c |  135 -
 arch/x86/kvm/x86.c |   12 --
 5 files changed, 0 insertions(+), 350 deletions(-)


[PATCH RFC 1/2] KVM Guest: remove KVM guest pv mmu support

2011-10-20 Thread Chris Wright
This has not been used for some years now.  It's time to remove it.
Will also make some pv patching improvements easier.

Signed-off-by: Chris Wright <chr...@redhat.com>
---
 arch/x86/kernel/kvm.c |  181 -
 1 files changed, 0 insertions(+), 181 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a9c2116..f0c6fd6 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,8 +39,6 @@
 #include <asm/desc.h>
 #include <asm/tlbflush.h>
 
-#define MMU_QUEUE_SIZE 1024
-
 static int kvmapf = 1;
 
 static int parse_no_kvmapf(char *arg)
@@ -60,21 +58,10 @@ static int parse_no_stealacc(char *arg)
 
 early_param("no-steal-acc", parse_no_stealacc);
 
-struct kvm_para_state {
-   u8 mmu_queue[MMU_QUEUE_SIZE];
-   int mmu_queue_len;
-};
-
-static DEFINE_PER_CPU(struct kvm_para_state, para_state);
 static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
 static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
 static int has_steal_clock = 0;
 
-static struct kvm_para_state *kvm_para_state(void)
-{
-   return &per_cpu(para_state, raw_smp_processor_id());
-}
-
 /*
  * No need for any IO delay on KVM
  */
@@ -271,151 +258,6 @@ do_async_page_fault(struct pt_regs *regs, unsigned long 
error_code)
}
 }
 
-static void kvm_mmu_op(void *buffer, unsigned len)
-{
-   int r;
-   unsigned long a1, a2;
-
-   do {
-   a1 = __pa(buffer);
-   a2 = 0;   /* on i386 __pa() always returns <4G */
-   r = kvm_hypercall3(KVM_HC_MMU_OP, len, a1, a2);
-   buffer += r;
-   len -= r;
-   } while (len);
-}
-
-static void mmu_queue_flush(struct kvm_para_state *state)
-{
-   if (state->mmu_queue_len) {
-   kvm_mmu_op(state->mmu_queue, state->mmu_queue_len);
-   state->mmu_queue_len = 0;
-   }
-}
-
-static void kvm_deferred_mmu_op(void *buffer, int len)
-{
-   struct kvm_para_state *state = kvm_para_state();
-
-   if (paravirt_get_lazy_mode() != PARAVIRT_LAZY_MMU) {
-   kvm_mmu_op(buffer, len);
-   return;
-   }
-   if (state->mmu_queue_len + len > sizeof state->mmu_queue)
-   mmu_queue_flush(state);
-   memcpy(state->mmu_queue + state->mmu_queue_len, buffer, len);
-   state->mmu_queue_len += len;
-}
-
-static void kvm_mmu_write(void *dest, u64 val)
-{
-   __u64 pte_phys;
-   struct kvm_mmu_op_write_pte wpte;
-
-#ifdef CONFIG_HIGHPTE
-   struct page *page;
-   unsigned long dst = (unsigned long) dest;
-
-   page = kmap_atomic_to_page(dest);
-   pte_phys = page_to_pfn(page);
-   pte_phys <<= PAGE_SHIFT;
-   pte_phys += (dst & ~(PAGE_MASK));
-#else
-   pte_phys = (unsigned long)__pa(dest);
-#endif
-   wpte.header.op = KVM_MMU_OP_WRITE_PTE;
-   wpte.pte_val = val;
-   wpte.pte_phys = pte_phys;
-
-   kvm_deferred_mmu_op(&wpte, sizeof wpte);
-}
-
-/*
- * We only need to hook operations that are MMU writes.  We hook these so that
- * we can use lazy MMU mode to batch these operations.  We could probably
- * improve the performance of the host code if we used some of the information
- * here to simplify processing of batched writes.
- */
-static void kvm_set_pte(pte_t *ptep, pte_t pte)
-{
-   kvm_mmu_write(ptep, pte_val(pte));
-}
-
-static void kvm_set_pte_at(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, pte_t pte)
-{
-   kvm_mmu_write(ptep, pte_val(pte));
-}
-
-static void kvm_set_pmd(pmd_t *pmdp, pmd_t pmd)
-{
-   kvm_mmu_write(pmdp, pmd_val(pmd));
-}
-
-#if PAGETABLE_LEVELS >= 3
-#ifdef CONFIG_X86_PAE
-static void kvm_set_pte_atomic(pte_t *ptep, pte_t pte)
-{
-   kvm_mmu_write(ptep, pte_val(pte));
-}
-
-static void kvm_pte_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
-{
-   kvm_mmu_write(ptep, 0);
-}
-
-static void kvm_pmd_clear(pmd_t *pmdp)
-{
-   kvm_mmu_write(pmdp, 0);
-}
-#endif
-
-static void kvm_set_pud(pud_t *pudp, pud_t pud)
-{
-   kvm_mmu_write(pudp, pud_val(pud));
-}
-
-#if PAGETABLE_LEVELS == 4
-static void kvm_set_pgd(pgd_t *pgdp, pgd_t pgd)
-{
-   kvm_mmu_write(pgdp, pgd_val(pgd));
-}
-#endif
-#endif /* PAGETABLE_LEVELS >= 3 */
-
-static void kvm_flush_tlb(void)
-{
-   struct kvm_mmu_op_flush_tlb ftlb = {
-   .header.op = KVM_MMU_OP_FLUSH_TLB,
-   };
-
-   kvm_deferred_mmu_op(&ftlb, sizeof ftlb);
-}
-
-static void kvm_release_pt(unsigned long pfn)
-{
-   struct kvm_mmu_op_release_pt rpt = {
-   .header.op = KVM_MMU_OP_RELEASE_PT,
-   .pt_phys = (u64)pfn << PAGE_SHIFT,
-   };
-
-   kvm_mmu_op(&rpt, sizeof rpt);
-}
-
-static void kvm_enter_lazy_mmu(void)
-{
-   paravirt_enter_lazy_mmu();
-}
-
-static void kvm_leave_lazy_mmu(void)
-{
-   struct kvm_para_state *state = kvm_para_state();
-
-   mmu_queue_flush(state

[PATCH RFC 2/2] KVM: remove KVM host pv mmu support

2011-10-20 Thread Chris Wright
The host side pv mmu support was scheduled for removal in January 2011
(see feature-removal-schedule.txt).  It's not in use, is slower than
shadow or hardware assisted paging, and is a maintenance burden.  It's
October 2011, time to remove it.

Signed-off-by: Chris Wright <chr...@redhat.com>
---
 Documentation/feature-removal-schedule.txt |9 --
 arch/x86/include/asm/kvm_host.h|   13 ---
 arch/x86/kvm/mmu.c |  135 
 arch/x86/kvm/x86.c |   12 ---
 4 files changed, 0 insertions(+), 169 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt 
b/Documentation/feature-removal-schedule.txt
index 4dc4654..75f88a5 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -397,15 +397,6 @@ Who:   anybody or Florian Mickler flor...@mickler.org
 
 
 
-What:  KVM paravirt mmu host support
-When:  January 2011
-Why:   The paravirt mmu host support is slower than non-paravirt mmu, both
-   on newer and older hardware.  It is already not exposed to the guest,
-   and kept only for live migration purposes.
-Who:   Avi Kivity a...@redhat.com
-
-
-
 What:  iwlwifi 50XX module parameters
 When:  3.0
 Why:   The ..50 modules parameters were used to configure 5000 series and
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dd51c83..8c9ce69 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -241,13 +241,6 @@ struct kvm_mmu_page {
struct rcu_head rcu;
 };
 
-struct kvm_pv_mmu_op_buffer {
-   void *ptr;
-   unsigned len;
-   unsigned processed;
-   char buf[512] __aligned(sizeof(long));
-};
-
 struct kvm_pio_request {
unsigned long count;
int in;
@@ -343,10 +336,6 @@ struct kvm_vcpu_arch {
 */
struct kvm_mmu *walk_mmu;
 
-   /* only needed in kvm_pv_mmu_op() path, but it's hot so
-* put it here to avoid allocation */
-   struct kvm_pv_mmu_op_buffer mmu_op_buffer;
-
struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
@@ -666,8 +655,6 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, 
unsigned long cr3);
 
 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
  const void *val, int bytes);
-int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
- gpa_t addr, unsigned long *ret);
 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 extern bool tdp_enabled;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8e8da79..0a45bc1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2005,20 +2005,6 @@ static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t 
gfn)
return r;
 }
 
-static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
-{
-   struct kvm_mmu_page *sp;
-   struct hlist_node *node;
-   LIST_HEAD(invalid_list);
-
-   for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) {
-   pgprintk("%s: zap %llx %x\n",
-__func__, gfn, sp->role.word);
-   kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
-   }
-   kvm_mmu_commit_zap_page(kvm, &invalid_list);
-}
-
 static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
 {
int slot = memslot_id(kvm, gfn);
@@ -3958,127 +3944,6 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm 
*kvm)
return nr_mmu_pages;
 }
 
-static void *pv_mmu_peek_buffer(struct kvm_pv_mmu_op_buffer *buffer,
-   unsigned len)
-{
-   if (len > buffer->len)
-   return NULL;
-   return buffer->ptr;
-}
-
-static void *pv_mmu_read_buffer(struct kvm_pv_mmu_op_buffer *buffer,
-   unsigned len)
-{
-   void *ret;
-
-   ret = pv_mmu_peek_buffer(buffer, len);
-   if (!ret)
-   return ret;
-   buffer->ptr += len;
-   buffer->len -= len;
-   buffer->processed += len;
-   return ret;
-}
-
-static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu,
-gpa_t addr, gpa_t value)
-{
-   int bytes = 8;
-   int r;
-
-   if (!is_long_mode(vcpu) && !is_pae(vcpu))
-   bytes = 4;
-
-   r = mmu_topup_memory_caches(vcpu);
-   if (r)
-   return r;
-
-   if (!emulator_write_phys(vcpu, addr, &value, bytes))
-   return -EFAULT;
-
-   return 1;
-}
-
-static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu)
-{
-   (void)kvm_set_cr3(vcpu, kvm_read_cr3(vcpu));
-   return 1;
-}
-
-static int kvm_pv_mmu_release_pt(struct kvm_vcpu *vcpu, gpa_t addr)
-{
-   spin_lock(&vcpu->kvm->mmu_lock);
-   mmu_unshadow(vcpu->kvm, addr >> PAGE_SHIFT);
-   spin_unlock(&vcpu->kvm->mmu_lock);
-   return 1;
-}
-
-static int

Re: [libvirt] Qemu/KVM is 3x slower under libvirt

2011-09-29 Thread Chris Wright
* Reeted (ree...@shiftmail.org) wrote:
 On 09/29/11 02:39, Chris Wright wrote:
 Can you help narrow down what is happening during the additional 12
 seconds in the guest?  For example, does a quick simple boot to single
 user mode happen at the same boot speed w/ and w/out vhost_net?
 
 Not tried (would probably be too short to measure effectively) but
 I'd guess it would be the same as for multiuser, see also the FC6
 sub-thread
 
 I'm guessing (hoping) that it's the network bring-up that is slow.
 Are you using dhcp to get an IP address?  Does static IP have the same
 slow down?
 
 It's all static IP.
 
 And please see my previous post, 1 hour before yours, regarding
 Fedora Core 6: the bring-up of eth0 in Fedora Core 6 is not
 particularly faster or slower than the rest. This is an overall
 system slowdown (I'd say either CPU or disk I/O) not related to the
 network (apart from being triggered by vhost_net).

OK, I re-read it (pretty sure FC6 had the old dhclient, which is why
I wondered).  That is odd.  No ideas are springing to mind.

thanks,
-chris


Re: [PATCH] qemu-kvm: device assignment: add 82599 PCIe Cap struct quirk

2011-09-28 Thread Chris Wright
* Donald Dutile (ddut...@redhat.com) wrote:
 commit f9c29774d2174df6ffc20becec20928948198914
 changed the PCIe Capability structure version check
 from if > 2 fail, to if ==1, size=x, if ==2, size=y,
 else fail.
 Turns out the 82599's VF has an errata where its
 PCIe Cap struct version is 0, which now fails device assignment
 due to the else fallout, where before, it would blissfully work.
 
 Add a quirk: if version=0 and intel-82599, set size to version 2 struct.
 
 Signed-off-by: Donald_Dutile ddut...@redhat.com

(not pretty, but neither is the hw errata...)

Acked-by: Chris Wright chr...@redhat.com
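
For readers following along, the version check plus quirk described in the
commit message can be sketched roughly as below.  This is only an
illustration, not the code from the patch: the function name is made up, the
sizes 0x14 and 0x3c are the version 1 and version 2 sizes used in
hw/device-assignment.c, and the 82599 VF device ID (0x10ed) is an assumption
that should be checked against the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL     0x8086
#define PCI_DEVICE_ID_82599_VF  0x10ed  /* assumed ID of the 82599 VF */

/*
 * Hypothetical helper: pick the PCIe capability structure size from the
 * version field.  Returns -1 for versions we do not know how to handle.
 */
static int pcie_cap_size(uint8_t version, uint16_t vendor, uint16_t device)
{
    assert(version <= 0xf);     /* the version field is only 4 bits wide */

    if (version == 1)
        return 0x14;            /* version 1 structure */
    if (version == 2)
        return 0x3c;            /* version 2 structure */
    /* Quirk: the 82599 VF reports version 0; treat it as version 2. */
    if (version == 0 && vendor == PCI_VENDOR_ID_INTEL &&
        device == PCI_DEVICE_ID_82599_VF)
        return 0x3c;
    return -1;                  /* anything else still fails assignment */
}
```

Keying the quirk on vendor/device ID keeps every other version 0 device on
the failure path, which is what the stricter check intended.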


Re: [libvirt] Qemu/KVM is 3x slower under libvirt

2011-09-28 Thread Chris Wright
* Reeted (ree...@shiftmail.org) wrote:
 On 09/28/11 11:28, Daniel P. Berrange wrote:
 On Wed, Sep 28, 2011 at 11:19:43AM +0200, Reeted wrote:
 On 09/28/11 09:51, Daniel P. Berrange wrote:
 You could have equivalently used
 
   -netdev tap,ifname=tap0,script=no,downscript=no,id=hostnet0,vhost=on
   -device 
  virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:05:36:60,bus=pci.0,addr=0x3
 It's this! It's this!! (thanks for the line)
 
 It raises boot time by 10-13 seconds
 Ok, that is truly bizarre and I don't really have any explanation
 for why that is. I guess you could try 'vhost=off' too and see if that
 makes the difference.
 
 YES!
 It's the vhost. With vhost=on it takes about 12 seconds more time to boot.

Can you help narrow down what is happening during the additional 12
seconds in the guest?  For example, does a quick simple boot to single
user mode happen at the same boot speed w/ and w/out vhost_net?

I'm guessing (hoping) that it's the network bring-up that is slow.
Are you using dhcp to get an IP address?  Does static IP have the same
slow down?

If it's just dhcp, can you recompile qemu with this patch and see if it
causes the same slowdown you saw w/ vhost?

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 0b03b57..0c864f7 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -496,7 +496,7 @@ static int receive_header(VirtIONet *n, struct iovec *iov, 
int iovcnt,
 if (n->has_vnet_hdr) {
 memcpy(hdr, buf, sizeof(*hdr));
 offset = sizeof(*hdr);
-work_around_broken_dhclient(hdr, buf + offset, size - offset);
+//work_around_broken_dhclient(hdr, buf + offset, size - offset);
 }
 
 /* We only ever receive a struct virtio_net_hdr from the tapfd,


Re: how to assign a pci device to guest [with qemu.git upstream]?

2011-09-28 Thread Chris Wright
* Ren, Yongjie (yongjie@intel.com) wrote:
 I'm using kvm and qemu upstream on https://github.com/avikivity
 The following command line was right for me about three weeks ago, but now I 
 meet some error.
 # qemu-system-x86_64 -m 1024 -smp 2 -device pci-assign,host=0e:00.0 -hda 
 /root/rhel6u1.img
 output error is like following.
 qemu-system-x86_64: -device pci-assign,host=0d:00.0: Parameter 'driver' 
 expects a driver name
 Try with argument '?' for a list.

Looks like you don't have device assignment support compiled in.
Start with the basics (assuming tree has hw/device-assignment.c):

did your ./configure output show:

KVM device assig. yes

and does your binary agree?

qemu-system-x86_64 -device ? 2>&1 | grep pci-assign

thanks,
-chris


Re: how to assign a pci device to guest [with qemu.git upstream]?

2011-09-28 Thread Chris Wright
* Ren, Yongjie (yongjie@intel.com) wrote:
 Chris,
 Thanks very much for your kind help. 
 I can't find hw/device-assignment.c in the qemu.git tree.
 Avi,
 I clone qemu from git://github.com/avikivity/qemu.git 
 So device assignment is not available. But qemu-kvm.git has device-assignment 
 code before kernel.org is down.
 Any update for this issue?

Are you using the master branch?  I noticed the github web defaults to
the memory/queue branch.

thanks,
-chris
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to assign a pci device to guest [with qemu.git upstream]?

2011-09-28 Thread Chris Wright
* Chris Wright (chr...@sous-sol.org) wrote:
 * Ren, Yongjie (yongjie@intel.com) wrote:
  Chris,
  Thanks very much for your kind help. 
  I can't find hw/device-assignment.c in the qemu.git tree.
  Avi,
  I clone qemu from git://github.com/avikivity/qemu.git 
  So device assignment is not available. But qemu-kvm.git has 
  device-assignment code before kernel.org is down.
  Any update for this issue?
 
 Are you using the master branch?  I noticed the github web defaults to
 the memory/queue branch.

BTW, if you hadn't used branches much before, something like this will
get you what you want:

$ git checkout -b master origin/master

Now you'll be on the master branch (and it should track upstream master
properly).

thanks,
-chris


Re: inter VM / PF-VF communication

2011-09-23 Thread Chris Wright
* Sagar Borikar (sagar.bori...@gmail.com) wrote:
 Sorry if I am not keeping up on the subject but wanted to know whether
 there is any effort going on for inter VM communication / PF-VF
 communication (in case of SR-IOV)
 I see that most of SR-IOV capable NIC supports mailboxes for that
 purpose to avoid the security hole.
 Xen has virtual device implementation for the same. Should I presume
 that this kind of effort is not on the radar and HW needs to own the
 responsibility of closing the security loopholes opened up
 by VF?

We do not support this, and have no plans to.  Most cards have managed to
do this in hw.

thanks,
-chris


Re: PCI-passthrough - issues/questions/ideas

2011-09-23 Thread Chris Wright
* Patrick Ringl (patri...@freenet.de) wrote:
 Hi,
 
 I just wanted to introduce a problem I currently face including some
 questions regarding my temporary fix.
 Anyway, I have a PCI-device that I want to passthrough to a hvm
 guest. Now there are several problems that add up:
 
 a) the PCI-device is bound to a PCI-to-PCI bridge (which in turn is
 directly attached to the rootbus) (mainboard has a AMD 970/SB950
 chipset).
 [since pciIsParent shows that the secondary bus equals the device's bus]:
 
 bridge:
  lspci -s00:14.4 -vvv | grep Bus:
  Bus: primary=00, secondary=07, subordinate=07, sec-latency=64
 
 PCI-device
  07:06.0 Multimedia controller: Philips Semiconductors SAA7146 (rev 01)
 
 b) neither the bridge nor the PCI device itself have the currently
 implemented reset functionality that you trigger in pciResetDevice
 
 c) the PCI-device is mapped through the PCI bridge (IOMMU-wise):
 
 ACPI IOMMU dump:
  [1.121239] AMD-Vi:   DEV_SELECT devid: 00:14.4 flags: 00
  [1.121274] AMD-Vi:   DEV_ALIAS_RANGE devid: 07:00.0
 flags: 00 devid_to: 00:14.4
  [1.121311] AMD-Vi:   DEV_RANGE_END devid: 07:1f.7
 
 
 What I did to get (temporarily and in a rather hackish (maybe even
 wrong) manner) rid of the problem, is to ignore the error thrown in
 pciResetDevice when no reset had been possible at all.
 
 if (ret < 0) {
     /*
     -- I know what you did last summer!

     virErrorPtr err = virGetLastError();
     pciReportError(VIR_ERR_INTERNAL_ERROR,
                    _("Unable to reset PCI device %s: %s"),
                    dev->name,
                    err ? err->message : _("no FLR, PM reset or bus reset available"));
     */
     ret = 0;
 }
 
 
 Concludingly I'd ask the following questions:
 
 a) Why is a secondary_bus_reset a bad idea if the device in
 question's _primary_ bus is the root bus?

That's not the issue the code is guarding against.  It is guarding
against issuing a secondary bus reset on the root bus.  Your dev->bus
should be 7, not 0.  However, the part that should be failing is
pciTrySecondaryBusReset().  And this will fail if there are other devices
on that bus (07) that are not assigned to your guest, because a secondary
bus reset will reset _all_ devices on the secondary bus.

 b) Why would it be a bad idea adding some sort of 'override
 attribute' to the guest's config, so libvirt may intentionally skip
 the reset? What are the possible consequences if no reset takes
 place at all?

The problem with skipping the reset is primarily a security concern.
Device state will leak between users of the device.  An override is
possible, you can discuss that with libvirt developers to see if they'd
support an insecure flag like that.

 c) What options do I have besides implementing c) or just going the
 dirty way and ignore the case when no reset is possible (like I
 described above)?

You can try assigning all devices on bus 7 to the guest.  This should
allow a sbus reset to be issued.
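
To make the constraint concrete, here is a rough sketch of the kind of check
involved.  It is not libvirt's actual code; the struct and function names
are invented for illustration, and it only models the two rules mentioned
above: never reset the root bus, and only reset a secondary bus when every
device on it is assigned to the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a PCI device for this sketch. */
struct pci_dev_info {
    uint8_t bus;               /* bus number the device sits on */
    bool assigned_to_guest;    /* true if handed to the guest in question */
};

/*
 * A secondary bus reset hits _all_ devices on that bus, so it is only
 * acceptable when none of them belongs to someone else.
 */
static bool secondary_bus_reset_allowed(const struct pci_dev_info *devs,
                                        size_t ndevs, uint8_t bus)
{
    size_t i;

    assert(devs != NULL || ndevs == 0);

    if (bus == 0)
        return false;          /* never issue the reset on the root bus */

    for (i = 0; i < ndevs; i++) {
        if (devs[i].bus == bus && !devs[i].assigned_to_guest)
            return false;      /* an innocent bystander would be reset */
    }
    return true;
}
```

With one unassigned device left on bus 7 the check refuses; once everything
on bus 7 is assigned to the guest it passes.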

thanks,
-chris


Re: Memory API code review

2011-09-14 Thread Chris Wright
* Avi Kivity (a...@redhat.com) wrote:
 I would like to carry out an online code review of the memory API so that
 more people are familiar with the internals, and perhaps even to catch some
 bugs or deficiency.  I'd like to use the next kvm conference call slot for
 this (Tuesday 1400 UTC) since many people already have it reserved in the
 schedule.
 
 It would be great if people from the wider qemu community be present, rather
 than the usual "x86 is everything" crowd (+Jan) that usually participates in
 the kvm weekly call.
 
 Juan, Chris, can we dedicate next week's call to this?

Yup, sounds like a good idea.


Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Chris Wright
* Aaron Fabbri (aafab...@cisco.com) wrote:
 On 8/26/11 7:07 AM, Alexander Graf ag...@suse.de wrote:
  Forget the KVM case for a moment and think of a user space device driver. I 
  as
  a user am not root. But I as a user when having access to /dev/vfioX want to
  be able to access the device and manage it - and only it. The admin of that
  box needs to set it up properly for me to be able to access it.
  
  So having two steps is really the correct way to go:
  
* create VFIO group
* use VFIO group
  
  because the two are done by completely different users.
 
 This is not the case for my userspace drivers using VFIO today.
 
 Each process will open vfio devices on the fly, and they need to be able to
 share IOMMU resources.

How do you share IOMMU resources w/ multiple processes, are the processes
sharing memory?

 So I need the ability to dynamically bring up devices and assign them to a
 group.  The number of actual devices and how they map to iommu domains is
 not known ahead of time.  We have a single piece of silicon that can expose
 hundreds of pci devices.

This does not seem fundamentally different from the KVM use case.

We have 2 kinds of groupings.

1) low-level system or topology grouping

   Some may have multiple devices in a single group

   * the PCIe-PCI bridge example
   * the POWER partitionable endpoint

   Many will not

   * singleton group, e.g. typical x86 PCIe function (majority of
 assigned devices)

   Not sure it makes sense to have these administratively defined as
   opposed to system defined.

2) logical grouping

   * multiple low-level groups (singleton or otherwise) attached to same
 process, allowing things like single set of io page tables where
 applicable.

   These are nominally administratively defined.  In the KVM case, there
   is likely a privileged task (i.e. libvirtd) involved w/ making the
   device available to the guest and can do things like group merging.
   In your userspace case, perhaps it should be directly exposed.
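
As a rough illustration of the two levels (not the VFIO interface, which was
still being designed at this point), the data model could look something
like the sketch below.  All names and the fixed array limit are made up:
low-level groups have membership fixed by the system, and a logical group
just collects pointers to them so they can share one set of io page tables.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS_PER_LOGICAL 16   /* arbitrary limit for the sketch */

/* 1) Low-level group: membership is dictated by system topology. */
struct sys_group {
    int id;
    int ndevs;   /* 1 for a singleton, more behind e.g. a PCIe-PCI bridge */
};

/* 2) Logical group: low-level groups merged to share io page tables. */
struct logical_group {
    const struct sys_group *members[MAX_GROUPS_PER_LOGICAL];
    size_t nmembers;
};

/*
 * Merge a whole low-level group into a logical group.  Members are held
 * as const pointers, so merging can never change which devices belong to
 * a system-defined group.  Returns 0 on success, -1 otherwise.
 */
static int logical_group_merge(struct logical_group *lg,
                               const struct sys_group *g)
{
    size_t i;

    assert(lg != NULL && g != NULL);

    for (i = 0; i < lg->nmembers; i++) {
        if (lg->members[i]->id == g->id)
            return -1;                      /* already merged */
    }
    if (lg->nmembers == MAX_GROUPS_PER_LOGICAL)
        return -1;                          /* sketch-only capacity limit */
    lg->members[lg->nmembers++] = g;
    return 0;
}

/* Number of devices that end up behind the shared io page tables. */
static int logical_group_ndevs(const struct logical_group *lg)
{
    int total = 0;
    size_t i;

    for (i = 0; i < lg->nmembers; i++)
        total += lg->members[i]->ndevs;
    return total;
}
```

The point of the split is that only the second level needs to be driven by
the admin or the application; the first is read-only.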

 In my case, the only administrative task would be to give my processes/users
 access to the vfio groups (which are initially singletons), and the
 application actually opens them and needs the ability to merge groups
 together to conserve IOMMU resources (assuming we're not going to expose
 uiommu).

I agree, we definitely need to expose _some_ way to do this.

thanks,
-chris


Re: kvm PCI assignment VFIO ramblings

2011-08-26 Thread Chris Wright
* Aaron Fabbri (aafab...@cisco.com) wrote:
 On 8/26/11 12:35 PM, Chris Wright chr...@sous-sol.org wrote:
  * Aaron Fabbri (aafab...@cisco.com) wrote:
  Each process will open vfio devices on the fly, and they need to be able to
  share IOMMU resources.
  
  How do you share IOMMU resources w/ multiple processes, are the processes
  sharing memory?
 
 Sorry, bad wording.  I share IOMMU domains *within* each process.

Ah, got it.  Thanks.

 E.g. If one process has 3 devices and another has 10, I can get by with two
 iommu domains (and can share buffers among devices within each process).
 
 If I ever need to share devices across processes, the shared memory case
 might be interesting.
 
  
  So I need the ability to dynamically bring up devices and assign them to a
  group.  The number of actual devices and how they map to iommu domains is
  not known ahead of time.  We have a single piece of silicon that can expose
  hundreds of pci devices.
  
  This does not seem fundamentally different from the KVM use case.
  
  We have 2 kinds of groupings.
  
  1) low-level system or topology grouping
  
 Some may have multiple devices in a single group
  
 * the PCIe-PCI bridge example
 * the POWER partitionable endpoint
  
 Many will not
  
 * singleton group, e.g. typical x86 PCIe function (majority of
   assigned devices)
  
 Not sure it makes sense to have these administratively defined as
 opposed to system defined.
  
  2) logical grouping
  
 * multiple low-level groups (singleton or otherwise) attached to same
   process, allowing things like single set of io page tables where
   applicable.
  
     These are nominally administratively defined.  In the KVM case, there
 is likely a privileged task (i.e. libvirtd) involved w/ making the
 device available to the guest and can do things like group merging.
 In your userspace case, perhaps it should be directly exposed.
 
 Yes.  In essence, I'd rather not have to run any other admin processes.
 Doing things programmatically, on the fly, from each process, is the
 cleanest model right now.

I don't see an issue w/ this.  As long as it cannot add devices to the
system defined groups, it's not a privileged operation.  So we still
need the iommu domain concept exposed in some form to logically put
groups into a single iommu domain (if desired).  In fact, I believe Alex
covered this in his most recent recap:

  ...The group fd will provide interfaces for enumerating the devices
  in the group, returning a file descriptor for each device in the group
  (the device fd), binding groups together, and returning a file
  descriptor for iommu operations (the iommu fd).

thanks,
-chris


Re: gfx card passthrough broken with latest head

2011-08-23 Thread Chris Wright
* André Weidemann (andre.weidem...@web.de) wrote:
<snip>
 git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
<snip>
 ./configure --audio-drv-list=alsa --target-list=x86_64-softmmu
 --enable-kvm-device-assignment
 
 ERROR: unknown option --enable-kvm-device-assignment
<snip>
 How come so many revisions do not support device assignment? Is there
 a trick to enable it?

Bisecting qemu-kvm userspace is tricky.  The upstream qemu repo
(git://git.qemu.org/qemu.git) does not have PCI device assignment
support.  The qemu-kvm repo does regular merges w/ the upstream qemu
repo.  As you bisect through the qemu-kvm repo history, you are likely
to land on a commit that is from upstream (meaning a tree w/out
downstream qemu-kvm additions, like device assignment).

Depending on where you suspect the issue is coming from, you can be
careful to bisect only through the qemu-kvm tree (by skipping back to a
merge point), or you can remerge the qemu-kvm tree to the qemu tree when
you bisect into the qemu tree.

Note...gfx assignment has many issues associated w/ it and often does
not work.  You can check out Allen Kay's presentation at the recent KVM
Forum for some examples: http://goo.gl/Hyk13

thanks,
-chris


Re: [PATCH v3] pci: correct pci config size default for cap version 2 endpoints

2011-08-08 Thread Chris Wright
* Don Dutile (ddut...@redhat.com) wrote:
 On 07/24/2011 06:58 AM, Michael S. Tsirkin wrote:
 On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote:
 On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote:
 On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote:
 * Alex Williamson (alex.william...@redhat.com) wrote:
 On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
 * Donald Dutile (ddut...@redhat.com) wrote:
 +} else if (version == 2) {
 +/* don't include slot cap/stat/ctrl 2 regs; only support 
 endpoints */
 +size = 0x34;
 
 That doesn't look correct to me.  The size is fixed, just that some
 registers are Reserved Zero when they do not apply (e.g. endpoint only).
 
 Apparently it can be interpreted differently.  In this case, we've seen
 a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
 0x3c bytes, we extend 8 bytes past the legacy config space area :(
 
 Wow, that device sounds broken to me.  The spec is pretty clear.
 
 Yes, I agree it's broken. Looks like something that
 happens when a device is designed in parallel with the spec.
 
 What bothers me is this patch seems to make devices that do behave
 correctly out of spec (registers will be writeable by default) -
 correct?
 
 How about we check for overflow and only do the hacks
 if it happens?
 
 Also, the code to initialize slot and root control registers is still
 there: it would seem that running it will corrupt memory beyond the
 config array?
 
 I take this last bit back: registers we touch are at offset < 0x34.
 Sorry about the noise. But the question about read-only registers
 still stands.
 
 Also, where does the magic 0x34 come from? I'm guessing this is
 simply what's left till the end of the config space.
 So let's be as conservative and specific as possible with
 this hack:
 
 I believe the spec leaves room for interpretation, and thus the
 resulting 'broken' device.  As I read the spec, the size of the struct can be:

Yeah, I can see how it might be misinterpreted, however, it's made
really clear in the config space test spec.  This structure is meant to
be full size.  Perhaps something like Michael suggested (and, if really
paranoid, add a pci vendor/device id check to quirk it).  I haven't come
across many devices that have this wrong.
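
For reference, the arithmetic behind the magic 0x34 and the "check for
overflow" idea can be sketched as follows.  This is only an illustration
with made-up names, not the patch that was eventually applied: it keeps the
full 0x3c size for well-behaved devices and shrinks it only when the
capability would run past the 256-byte legacy config space, as with the tg3
capability at offset 0xcc (0xcc + 0x3c = 0x108, 8 bytes too far).

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CONFIG_SPACE_SIZE  0x100  /* legacy config space, in bytes */
#define PCIE_CAP_V2_SIZE       0x3c   /* full version 2 structure size */

/*
 * Hypothetical helper: size to use for a version 2 PCIe capability at
 * config space offset 'pos'.  Only truncate when the full structure
 * would overflow the legacy config space.
 */
static uint8_t pcie_v2_cap_size(uint8_t pos)
{
    unsigned int end = (unsigned int)pos + PCIE_CAP_V2_SIZE;

    assert(pos >= 0x40);   /* capabilities live after the standard header */

    if (end > PCI_CONFIG_SPACE_SIZE)
        return (uint8_t)(PCI_CONFIG_SPACE_SIZE - pos);  /* broken device */
    return PCIE_CAP_V2_SIZE;                            /* spec-conforming */
}
```

For pos = 0xcc this yields 0x100 - 0xcc = 0x34, which matches the size in
the proposed patch, while every capability that fits keeps the full size.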

thanks,
-chris


Re: [PATCH v3] pci: correct pci config size default for cap version 2 endpoints

2011-07-22 Thread Chris Wright
* Donald Dutile (ddut...@redhat.com) wrote:
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 36ad6b0..34db52e 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice 
 *pci_dev)
  }
  
  if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
 -uint8_t version;
 +uint8_t version, size;
  uint16_t type, devctl, lnkcap, lnksta;
  uint32_t devcap;
 -int size = 0x3c; /* version 2 size */
  
  version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
  version &= PCI_EXP_FLAGS_VERS;
  if (version == 1) {
  size = 0x14;
 -} else if (version > 2) {
 +} else if (version == 2) {
 +/* don't include slot cap/stat/ctrl 2 regs; only support 
 endpoints */
 +size = 0x34;

That doesn't look correct to me.  The size is fixed, just that some
registers are Reserved Zero when they do not apply (e.g. endpoint only).


Re: [PATCH v3] pci: correct pci config size default for cap version 2 endpoints

2011-07-22 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote:
  * Donald Dutile (ddut...@redhat.com) wrote:
   diff --git a/hw/device-assignment.c b/hw/device-assignment.c
   index 36ad6b0..34db52e 100644
   --- a/hw/device-assignment.c
   +++ b/hw/device-assignment.c
   @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice 
   *pci_dev)
}

if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) {
   -uint8_t version;
   +uint8_t version, size;
uint16_t type, devctl, lnkcap, lnksta;
uint32_t devcap;
   -int size = 0x3c; /* version 2 size */

version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS);
version &= PCI_EXP_FLAGS_VERS;
if (version == 1) {
size = 0x14;
   -} else if (version > 2) {
   +} else if (version == 2) {
   +/* don't include slot cap/stat/ctrl 2 regs; only support 
   endpoints */
   +size = 0x34;
  
  That doesn't look correct to me.  The size is fixed, just that some
  registers are Reserved Zero when they do not apply (e.g. endpoint only).
 
 Apparently it can be interpreted differently.  In this case, we've seen
 a tg3 device expose a v2 PCI express capability at offset 0xcc.  Using
 0x3c bytes, we extend 8 bytes past the legacy config space area :(

Wow, that device sounds broken to me.  The spec is pretty clear.


Re: [PATCH 0/2] Introduce iommu_commit() function

2011-06-23 Thread Chris Wright
* David Woodhouse (dw...@infradead.org) wrote:
 I'd much rather KVM just gave us a list of the pages to map, in a single
 call.

This makes most sense to me.


Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking

2011-06-22 Thread Chris Wright
* Izik Eidus (izik.ei...@ravellosystems.com) wrote:
 On 6/22/2011 3:21 AM, Chris Wright wrote:
 * Nai Xia (nai@gmail.com) wrote:
 +   if (!shadow_dirty_mask) {
 +   WARN(1, "KVM: do NOT try to test dirty bit in EPT\n");
 +   goto out;
 +   }
 This should never fire with the dirty_update() notifier test, right?
 And that means that this whole optimization is for the shadow mmu case,
 arguably the legacy case.
 
 Hi Chris,
 AMD npt does track the dirty bit in the nested page tables,
 so the shadow_dirty_mask should not be 0 in that case...

Yeah, momentary lapse... ;)


KVM call minutes for June 21

2011-06-21 Thread Chris Wright
concerns about backwards compat
- https://bugzilla.redhat.com/show_bug.cgi?id=689672
  - f13 host can no longer run f14 guest after qemu update
- this particular bug is older f13 which includes patched qemu...
- could be useful to fingerprint the guest (lspci, etc)
  - sounds simple enough, need someone who's inclined to do it

state of image streaming/block copy
- live block copy and image streaming overlap
  - attempting to unify
- some confusion over next steps
- need to clarify differing requirements (shared storage vs. generic storage)
- stefan to summarize solution proposal on list/wiki

guest agent api current verbs and future roadmap?
- pretty happy w/ current verbs, future intention to keep it simple,
  high-level
- should be working on windows guests


Re: [PATCH] mmu_notifier, kvm: Introduce dirty bit tracking in spte and mmu notifier to help KSM dirty bit tracking

2011-06-21 Thread Chris Wright
* Nai Xia (nai@gmail.com) wrote:
 Introduced kvm_mmu_notifier_test_and_clear_dirty(), 
 kvm_mmu_notifier_dirty_update()
 and their mmu_notifier interfaces to support KSM dirty bit tracking, which 
 brings
 significant performance gain in volatile pages scanning in KSM.
 Currently, kvm_mmu_notifier_dirty_update() returns 0 if and only if intel EPT 
 is
 enabled to indicate that the dirty bits of underlying sptes are not updated by
 hardware.

Did you test with each of EPT, NPT and shadow?

 Signed-off-by: Nai Xia nai@gmail.com
 Acked-by: Izik Eidus izik.ei...@ravellosystems.com
 ---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/mmu.c  |   36 +
  arch/x86/kvm/mmu.h  |3 +-
  arch/x86/kvm/vmx.c  |1 +
  include/linux/kvm_host.h|2 +-
  include/linux/mmu_notifier.h|   48 
 +++
  mm/mmu_notifier.c   |   33 ++
  virt/kvm/kvm_main.c |   27 ++
  8 files changed, 149 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index d2ac8e2..f0d7aa0 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -848,6 +848,7 @@ extern bool kvm_rebooting;
  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
  int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 +int kvm_test_and_clear_dirty_hva(struct kvm *kvm, unsigned long hva);
  void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index aee3862..a5a0c51 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -979,6 +979,37 @@ out:
   return young;
  }
  
 +/*
 + * Caller is supposed to SetPageDirty(), it's not done inside this.
 + */
 +static
 +int kvm_test_and_clear_dirty_rmapp(struct kvm *kvm, unsigned long *rmapp,
 +unsigned long data)
 +{
 + u64 *spte;
 + int dirty = 0;
 +
 + if (!shadow_dirty_mask) {
 + WARN(1, "KVM: do NOT try to test dirty bit in EPT\n");
 + goto out;
 + }

This should never fire with the dirty_update() notifier test, right?
And that means that this whole optimization is for the shadow mmu case,
arguably the legacy case.

 +
 + spte = rmap_next(kvm, rmapp, NULL);
 + while (spte) {
 + int _dirty;
 + u64 _spte = *spte;
 + BUG_ON(!(_spte & PT_PRESENT_MASK));
 + _dirty = _spte & PT_DIRTY_MASK;
 + if (_dirty) {
 + dirty = 1;
 + clear_bit(PT_DIRTY_SHIFT, (unsigned long *)spte);

Is this sufficient (not losing dirty state ever)?
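
For reference, a userspace sketch of doing the test and the clear as one
atomic step, so nothing can slip in between reading the bit and clearing
it.  This uses C11 atomics rather than the kernel's bitops, the helper
name is hypothetical, and it only covers the bit operation itself, not
anything else that may be needed to keep dirty state accurate:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_DIRTY_SHIFT 6
#define PT_DIRTY_MASK  (1ULL << PT_DIRTY_SHIFT)

/*
 * Hypothetical helper: atomically clear the dirty bit in *spte and
 * report whether it was set beforehand.  atomic_fetch_and() returns the
 * value held before the AND, so the test and the clear cannot be
 * separated by a concurrent update.
 */
static bool spte_test_and_clear_dirty(_Atomic uint64_t *spte)
{
    uint64_t old = atomic_fetch_and(spte, ~PT_DIRTY_MASK);

    return (old & PT_DIRTY_MASK) != 0;
}
```

All other bits in the entry are left untouched by the AND mask.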

 + }
 + spte = rmap_next(kvm, rmapp, spte);
 + }
 +out:
 + return dirty;
 +}
 +
  #define RMAP_RECYCLE_THRESHOLD 1000
  
  static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 @@ -1004,6 +1035,11 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long 
 hva)
   return kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp);
  
  
 +int kvm_test_and_clear_dirty_hva(struct kvm *kvm, unsigned long hva)
 +{
 + return kvm_handle_hva(kvm, hva, 0, kvm_test_and_clear_dirty_rmapp);
 +}
 +
  #ifdef MMU_DEBUG
  static int is_empty_shadow_page(u64 *spt)
  {
 diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
 index 7086ca8..b8d01c3 100644
 --- a/arch/x86/kvm/mmu.h
 +++ b/arch/x86/kvm/mmu.h
 @@ -18,7 +18,8 @@
  #define PT_PCD_MASK (1ULL << 4)
  #define PT_ACCESSED_SHIFT 5
  #define PT_ACCESSED_MASK (1ULL << PT_ACCESSED_SHIFT)
 -#define PT_DIRTY_MASK (1ULL << 6)
 +#define PT_DIRTY_SHIFT 6
 +#define PT_DIRTY_MASK (1ULL << PT_DIRTY_SHIFT)
  #define PT_PAGE_SIZE_MASK (1ULL << 7)
  #define PT_PAT_MASK (1ULL << 7)
  #define PT_GLOBAL_MASK (1ULL << 8)
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index d48ec60..b407a69 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -4674,6 +4674,7 @@ static int __init vmx_init(void)
   kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
   VMX_EPT_EXECUTABLE_MASK);
   kvm_enable_tdp();
 + kvm_dirty_update = 0;

Doesn't the above shadow_dirty_mask==0ull tell us this same info?

   } else
   kvm_disable_tdp();
  
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 31ebb59..2036bae 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -53,7 +53,7 @@
  struct kvm;
  struct kvm_vcpu;
  extern struct kmem_cache *kvm_vcpu_cache;
 -
 +extern int kvm_dirty_update;
  /*
   * It would be nice to use something smarter than a linear search, TBD...
   * Thankfully we dont expect many devices to register (famous last words :),
 diff --git 

Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread Chris Wright
* padmanabh ratnakar (pratnaka...@gmail.com) wrote:
 On Tue, Jun 7, 2011 at 4:04 AM, Chris Wright chr...@sous-sol.org wrote:
  * Alex Williamson (alex.william...@redhat.com) wrote:
  On Mon, 2011-06-06 at 14:39 +0530, padmanabh ratnakar wrote:
   Hi,
           I am using linux kernel 2.6.39. I have a IBM x3650 M3 system.
   I have used following boot options -
   intel_iommu=on iommu=pt
  
   I was loading/unloading my NIC driver(be2net) with num_vfs=7.
  
   After some iterations I get following DMAR errors -
   Jun  4 03:50:20 rhel6 kernel: Uhhuh. NMI received for unknown reason
   2d on CPU 0.
   Jun  4 03:50:20 rhel6 kernel: Do you have a strange power saving mode 
   enabled?
   Jun  4 03:50:20 rhel6 kernel: Dazed and confused, but trying to continue
   Jun  4 03:50:20 rhel6 kernel: DRHD: handling fault status reg 2
   Jun  4 03:50:20 rhel6 kernel: DMAR:[DMA Read] Request device [1a:00.2]
   fault addr 78077000
   Jun  4 03:50:20 rhel6 kernel: DMAR:[fault reason 02] Present bit in
   context entry is clear
  
   I was trying to debug this. I dont understand iommu code much.
   The physical address belongs the printed PCI function and there should
   not have been an error.
  
   I am unable to see pci_dev(pdev) of VFs getting removed from
   si_domain->devices list(intel-iommu.c)
   when driver gets unloaded calling pci_disable_sriov() freeing VF pdevs.
   Looks like issue happens when freed pdev is allocated again and
   as it is already in list,
   required initializations dont happen.
  
   I dont know if my understanding is correct. Can anyone point me to
   what the issue may be?
 
  Yes, that's correct.  The (now replaced) check identity_mapping()
  will succeed when the pci_dev is recycled (it's freed, but never
  removed from the list, this is an issue with passthrough mode and device
  creation/destruction).  This false match happens w/ a brand new pci_dev
  which still has default 32bit DMA mask, so it is removed from pt domain.
  During removal domain_remove_one_dev_info() test that matches only
  on bus/devfn (now also segment) will match despite the fact that the
  info->pdev != pdev->dev.archdata.iommu.  Then...Oops
 
  Typically devices are removed from the domain via
  drivers/pci/intel-iommu.c:device_notifier(), which is called as the
  device is unbound from the driver.  However, this seems to get skipped
  when running in passthrough mode, so I'm not sure where that's supposed
  to occur.  Does it happen w/o passthrough?
 
 I had tried without passthrough on RHEL 6.1 GA kernel. Was seeing
 hangs and panics. Will check if non passthrough mode works on latest kernel.
 
  If you blacklist the driver then a create/delete may do similar (haven't
  tested that idea).
 
  Also note that some
  intel-iommu fixes have rolled into 3.0.0-rc2, you might want to update
  and see if anything is better there.  Thanks,
 
  The change in identity_mapping() means we won't demote to 32-bit DMA
  (drop out of pt domain), so I don't think we'll see the same issue.
 
 For testing I had made a hack in 2.6.39 kernel which will prevent
 demoting to 32bit DMA mask
 and thereby prevent calling of domain_remove_one_dev_info() for the
 specific VF device I was using
 and it had worked.
 So as you said I may not hit the issue in latest kernel. Will try that.

I think we still leak the list entry though.  Bottom line is that we
need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy.
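
Roughly what I mean, as a userspace sketch: one notifier that handles
both events and keeps the per-domain list in step with device lifetime,
so no stale pointer is left behind to be matched against a recycled
pci_dev.  All names here are stand-ins for illustration, not the
intel-iommu structures:

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the ADD_DEVICE / DEL_DEVICE bus notifications. */
enum dev_event { DEV_ADD, DEV_DEL };

struct dev { int bus, devfn; };      /* hypothetical stand-in for pci_dev */

struct dev_info {
    struct dev *pdev;
    struct dev_info *next;
};

struct domain { struct dev_info *devices; };

/* Add the device on DEV_ADD, unlink and free its entry on DEV_DEL. */
static int domain_notify(struct domain *d, enum dev_event ev, struct dev *pdev)
{
    struct dev_info **pp, *info;

    if (ev == DEV_ADD) {
        info = malloc(sizeof(*info));
        if (!info)
            return -1;
        info->pdev = pdev;
        info->next = d->devices;
        d->devices = info;
        return 0;
    }

    /* Match on the device pointer itself, not only on bus/devfn. */
    for (pp = &d->devices; *pp; pp = &(*pp)->next) {
        if ((*pp)->pdev == pdev) {
            info = *pp;
            *pp = info->next;
            free(info);
            return 0;
        }
    }
    return -1;                       /* device was never on the list */
}

static size_t domain_count(const struct domain *d)
{
    size_t n = 0;
    const struct dev_info *i;

    for (i = d->devices; i; i = i->next)
        n++;
    return n;
}
```

The point is only that removal is driven by the delete event rather than
by driver unbind, so the entry goes away whenever the device does.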

thanks,
-chris


Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread Chris Wright
* David Woodhouse (dw...@infradead.org) wrote:
 On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
  I think we still leak the list entry though.  Bottom line is that we
  need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
  happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy. 
 
 Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
 should figure out the matching DMAR unit directly from the ACPI table at
 ADD_DEVICE time, and store it in pdev->archdata.iommu.
 
 I saw patches which were going in that direction...

Cool, where are they?  I'm working on something similar, and missed them.

thanks,
-chris


Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-07 Thread Chris Wright
* David Woodhouse (dw...@infradead.org) wrote:
 On Tue, 2011-06-07 at 08:10 -0700, Chris Wright wrote:
  * David Woodhouse (dw...@infradead.org) wrote:
   On Tue, 2011-06-07 at 06:38 -0700, Chris Wright wrote:
I think we still leak the list entry though.  Bottom line is that we
need to handle hotplug ADD_DEVICE and DEL_DEVICE notifications.  We
happen to pick up ADD_DEVICE by accident, but it's all pretty sloppy. 
   
   Yeah, keeping a list of possible stale 'pci_dev' pointers is stupid. We
   should figure out the matching DMAR unit directly from the ACPI table at
   ADD_DEVICE time, and store it in pdev->archdata.iommu.
   
   I saw patches which were going in that direction...
  
  Cool, where are they?  I'm working on something similar, and missed them.
 
 [PATCH] pci, dmar: Update dmar units devices list during hotplug

Oh yeah, thanks for the reminder.

thanks,
-chris


KVM call agenda for June 7

2011-06-06 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


Re: Seeing DMAR errors after multiple load/unload with SR-IOV

2011-06-06 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 On Mon, 2011-06-06 at 14:39 +0530, padmanabh ratnakar wrote:
  Hi,
  I am using linux kernel 2.6.39. I have a IBM x3650 M3 system.
  I have used following boot options -
  intel_iommu=on iommu=pt
  
  I was loading/unloading my NIC driver(be2net) with num_vfs=7.
  
  After some iterations I get following DMAR errors -
  Jun  4 03:50:20 rhel6 kernel: Uhhuh. NMI received for unknown reason
  2d on CPU 0.
  Jun  4 03:50:20 rhel6 kernel: Do you have a strange power saving mode 
  enabled?
  Jun  4 03:50:20 rhel6 kernel: Dazed and confused, but trying to continue
  Jun  4 03:50:20 rhel6 kernel: DRHD: handling fault status reg 2
  Jun  4 03:50:20 rhel6 kernel: DMAR:[DMA Read] Request device [1a:00.2]
  fault addr 78077000
  Jun  4 03:50:20 rhel6 kernel: DMAR:[fault reason 02] Present bit in
  context entry is clear
  
  I was trying to debug this. I dont understand iommu code much.
  The physical address belongs the printed PCI function and there should
  not have been an error.
  
  I am unable to see pci_dev(pdev) of VFs getting removed from
  si_domain->devices list(intel-iommu.c)
  when driver gets unloaded calling pci_disable_sriov() freeing VF pdevs.
  Looks like issue happens when freed pdev is allocated again and
  as it is already in list,
  required initializations dont happen.
  
  I dont know if my understanding is correct. Can anyone point me to
  what the issue may be?

Yes, that's correct.  The (now replaced) check identity_mapping()
will succeed when the pci_dev is recycled (it's freed, but never
removed from the list, this is an issue with passthrough mode and device
creation/destruction).  This false match happens w/ a brand new pci_dev
which still has default 32bit DMA mask, so it is removed from pt domain.
During removal domain_remove_one_dev_info() test that matches only
on bus/devfn (now also segment) will match despite the fact that the
info->pdev != pdev->dev.archdata.iommu.  Then...Oops

 Typically devices are removed from the domain via
 drivers/pci/intel-iommu.c:device_notifier(), which is called as the
 device is unbound from the driver.  However, this seems to get skipped
 when running in passthrough mode, so I'm not sure where that's supposed
 to occur.  Does it happen w/o passthrough?

If you blacklist the driver then a create/delete may do similar (haven't
tested that idea).

 Also note that some
 intel-iommu fixes have rolled into 3.0.0-rc2, you might want to update
 and see if anything is better there.  Thanks,

The change in identity_mapping() means we won't demote to 32-bit DMA
(drop out of pt domain), so I don't think we'll see the same issue.

thanks,
-chris


KVM call minutes for Apr 26

2011-04-26 Thread Chris Wright
Tools for resource accounting the virtual machines.
- Luis Castro was not on the call

Status of glib tree - next steps?
- full conversion done in tree
- still targeting 0.15

status of QCFG
- code generator rewritten to be more generic and useful
- merge core infrastructure first
  - to not block other work waiting on full conversion
- still need to complete full conversion

qemu-kvm merge
- status
  - review and merge/feedback pending from Avi on current outstanding patches
  - still have some 60 patches
- break them into a few smaller series
- next steps, specifically:
  - upstreaming in-kernel irqchip support
  - MSI/MSI-X (cleanup and make mergable)
  - this is a decent amount of work, Jan is solo...anyone want to help?
- need to be careful of regressions
- add tests to avi's autotest run (e.g., cpu hotplug)
  - cpu hotplug test initiated from host side
  - online needs some cooperation in linux
  - still unclear on what's supported, windows apparently only supports online

autotest
- had autotest test day, feedback coming on list
- some issues with getting set up
- having basic common config could be useful

KVM Forum reminder
- send in your proposals


Re: [PATCH v2] intel-iommu: Fix use after release during device attach

2011-04-21 Thread Chris Wright
* Jan Kiszka (jan.kis...@siemens.com) wrote:
 On 2011-01-04 11:42, Jan Kiszka wrote:
  Am 10.12.2010 19:44, Chris Wright wrote:
  * Jan Kiszka (jan.kis...@siemens.com) wrote:
  --- a/drivers/pci/intel-iommu.c
  +++ b/drivers/pci/intel-iommu.c
  @@ -3627,9 +3627,9 @@ static int intel_iommu_attach_device(struct
  iommu_domain *domain,
 
    pte = dmar_domain->pgd;
    if (dma_pte_present(pte)) {
  -  free_pgtable_page(dmar_domain->pgd);
    dmar_domain->pgd = (struct dma_pte *)
 phys_to_virt(dma_pte_addr(pte));
 
  While here, might as well remove the unnecessary cast.
 
  +  free_pgtable_page(pte);
 }
    dmar_domain->agaw--;
 }
 
  Reviewed-by: Sheng Yang sh...@linux.intel.com
 
  Acked-by: Chris Wright chr...@sous-sol.org
 
  CC iommu mailing list and David.
 
  Ping...
 
  I think this fix also qualifies for stable (.35 and .36).
 
 
  Still not merged?
 
  David, do you plan to pick this one up?
 
  thanks,
  -chris
  
  Hmm, still no reaction. Trying David's Intel address now...
  
  Jan
  
 
 Walking through my old queues, I came across this one again.
 
 Given the still lacking reaction from the official maintainer, I'm a
 bit confused about the state of intel-iommu. Is it unmaintained? Should
 this bug fix better be routed through the KVM tree as its only in-tree
 user? Please enlighten me.
 
 Note that the patch became stable material for 35..38 in the meantime,
 and it should go into 39 before release as well.
 
 Thanks,
 Jan
 
 ---8
 
 Obtain the new pgd pointer before releasing the page containing this
 value. Remove unneeded cast at this chance as well.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Acked-by: Chris Wright chr...@sous-sol.org

 ---
  drivers/pci/intel-iommu.c |5 ++---
  1 files changed, 2 insertions(+), 3 deletions(-)
 
 v1-v2: Clean up cast as suggested by Chris.
 
 diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
 index 505c1c7..b3e5c43 100644
 --- a/drivers/pci/intel-iommu.c
 +++ b/drivers/pci/intel-iommu.c
 @@ -3607,9 +3607,8 @@ static int intel_iommu_attach_device(struct 
 iommu_domain *domain,
  
   pte = dmar_domain->pgd;
   if (dma_pte_present(pte)) {
 - free_pgtable_page(dmar_domain->pgd);
 - dmar_domain->pgd = (struct dma_pte *)
 - phys_to_virt(dma_pte_addr(pte));
 + dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte));
 + free_pgtable_page(pte);
   }
   dmar_domain->agaw--;
   }
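
The ordering issue in isolation, as a small userspace sketch (the struct
and function are hypothetical stand-ins, not the kernel's dma_pte code):
the pointer to the next level lives inside the page being released, so
it has to be read out before that page is freed.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one page-table level. */
struct level {
    struct level *next;   /* next-lower level, stored inside this page */
};

/* Drop the top level: fetch the pointer it holds first, then free it. */
static void drop_top_level(struct level **top)
{
    struct level *old = *top;

    *top = old->next;     /* read while 'old' is still valid */
    free(old);            /* only now release the old top level */
}
```

Swapping the two statements would read old->next from freed memory,
which is the use after release the patch fixes.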


KVM call minutes for Apr 5

2011-04-05 Thread Chris Wright
KVM Forum
- save the date is out, cfp will follow later this week
- abstracts due in 6wks, 2wk review period, notifications by end of May

Improving process to scale project
- Trivial patch bot
- Sub-maintainership

Trivial patch monkeys^Wteam
- small/simple patches posted can fall through the cracks (esp. for
  areas that aren't well maintained)
- patches should be simple, easy to review (
- aiming to gather a team, so that the position can rotate
- patch submitter can rest assured
- Stefan and possibly Mike Roth are volunteering to get this started
- Cc: qemu-triv...@nongnu.org to send patches to the Trivial patch monkey
- details here:
  
  http://wiki.qemu.org/Contribute/TrivialPatches

Sub-maintainership
- have MAINTAINERS file
  - need to add git tree URLs
  - needs another pass to make sure there are no missing subsystems
  - make it clearer how maintained the subsystems are
- adding a wiki page to show how to become a subsystem maintainer
  - one valuable step...write testing around the subsystem
- means you've had to learn the subsystem (builds expertise)
- allows for regression testing the subsystem (esp. validating new patches)
- sub-maintainers sometimes disappear
  - can add another maintainer
  - actively poke the maintainer when patches are languishing
  - if you're going to be away, be sure to let list or backup know
- systematic patch tracking would help, patchwork doesn't quite cut it
- who receives pull request
  - list + blue swirl/aurelien for tcg, anthony picking up plenty of
other bits
- infrastructure subsystems (qdev, migration, etc..)
  - big invasive changes done externally, effective flag day for full merge
  - subsystem localized change (e.g. vmstate fix for a specific device)
maintainers can work it out, be sure to have both
- facilitating patch review and hopefully improving subsystem over time

kvm-autotest
- roadmap...refactor to centralize testing (handle the xen-autotest split off)
- internally at RH, lmr and cleber maintain autotest server to test
  branches (testing qemu.git daily)
  - have good automation for installs and testing
- seems more QA focused than developers
  - plenty of benefit for developers, so lack of developer use partly
cultural/visibility...
  - kvm-autotest team always looking for feedback to improve for
developer use case
- kvm-autotest day to have folks use it, write test, give feedback?
  - startup cost is/was steep, the day might be too much handholding
  - install-fest? (to get it installed and up and running)
- buildbot or autotest for testing patches to verify building and working
- one goal is to reduce mailing list load (patch resubmission because
  they haven't handled basic cases that buildbot or autotest would have
  caught)
- fedora-virt test day coming up on April 14th.  lucas will be on hand and
  we can piggy back on that to include kvm-autotest install and virt testing
- kvm autotest run before qemu pull request and post merge to track
  regressions, more frequent testing helps developers see breakage
  quickly
  - qemu.git daily testing already, only the sanity test subset 
- run more comprehensive stable set of tests on weekends
- one issue is the large number of known failures, need to make these
  easier to identify (and fix the failures one way or another)
- create database and verify (regressions) against that
  - red/yellow/green (yellow shows area was already broken)
- autotest can be run against server, not just on laptop
- how to do remote client display testing (e.g. spice client)
  - dogtail and LDTP
  - graphics could be tested w/ screenshot compares
- WHQL testing automated as well


Re: [PATCH] device-assignment: Reset device on system reset

2011-03-17 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
  static void reset_assigned_device(DeviceState *dev)
  {
 -PCIDevice *d = DO_UPCAST(PCIDevice, qdev, dev);
 +PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, dev);
 +AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 +char reset_file[64];
 +const char reset[] = "1";
 +int fd, ret;
 +
 +snprintf(reset_file, sizeof(reset_file),
 + "/sys/bus/pci/devices/:%02x:%02x.%01x/reset",
 + adev->host.bus, adev->host.dev, adev->host.func);

need to consider segment: %04x:..., adev-host.seg, ...
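
Something along these lines, as a standalone sketch.  The helper name
and its plain integer parameters are made up for illustration; in the
patch the values would come from adev->host:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the pci-sysfs reset path for a device,
 * including the PCI segment (domain) as the leading %04x field.
 * Returns 0 on success, -1 if the path did not fit in 'buf'.
 */
static int pci_reset_path(char *buf, size_t len, unsigned int seg,
                          unsigned int bus, unsigned int dev,
                          unsigned int func)
{
    int n = snprintf(buf, len,
                     "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/reset",
                     seg, bus, dev, func);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Checking the snprintf() return value also catches truncation, so a short
buffer is reported instead of silently opening the wrong file.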

 +/*
 + * Issue a device reset via pci-sysfs.  Note that we use write(2) here
 + * and ignore the return value because some kernels have a bug that
 + * returns 0 rather than bytes written on success, sending us into an
 + * infinite retry loop using other write mechanisms.
 + */
 +fd = open(reset_file, O_WRONLY);
 +if (fd != -1) {
 +ret = write(fd, reset, strlen(reset));
 +close(fd);
 +}

This will probably fail when it's managed by libvirt.  I expect it
will need some file ownership and security label mgmt added to the
device assignment path.


Re: [PATCH v2] device-assignment: Reset device on system reset

2011-03-17 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 On system reset, we currently try to quiesce DMA by clearing the
 command register.  This assumes that nothing re-enables bus master
 support without first de-programming the device.  Use a bigger
 hammer to help the guest not shoot itself by issuing a function
 reset via sysfs on each system reset.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com

Looks good.

Acked-by: Chris Wright chr...@redhat.com


Re: [PATCH] device-assignment: Reset device on system reset

2011-03-17 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 On Thu, 2011-03-17 at 14:12 -0700, Chris Wright wrote:
  * Alex Williamson (alex.william...@redhat.com) wrote:
   +fd = open(reset_file, O_WRONLY);
   +if (fd != -1) {
   +ret = write(fd, reset, strlen(reset));
   +close(fd);
   +}
  
  This will probably fail when it's managed by libvirt.  I expect it
  will need some file ownership and security label mgmt added to the
  device assignment path.
 
 Already posted a patch for adding file rights, seems to be sufficient:
 
 https://www.redhat.com/archives/libvir-list/2011-March/msg00823.html

Awesome, I missed that patch, thanks Alex!

thanks,
-chris


KVM call minutes for Mar 15

2011-03-15 Thread Chris Wright
QAPI -- http://wiki.qemu.org/Features/QAPI
- please review!
- Anthony would like to see feedback and plans to commit in a week
  (assuming agreement and no major issues in review)
- some concern about the maintainability of code generation
  - but still nothing concrete on the list, need to review and discuss
on the list
- some concern that implementation details may change the wire protocol
  - introduces a new mechanism for new signals (mask by default and
enabled explicitly)
  - disagreement over when/how to introduce new extensions
- libvirt feedback?
  - no protocol level changes
- old and new versions are testable with test suite and proves this
- c library implementation is critical to have unit tests and test
  driven development
  - thread safe?
- no shared state, no statics.
- threading model requires lock for the qmp session
  - licensing?
- LGPL
  - forwards/backwards compat?
- designed with that in mind see wiki:
  
  http://wiki.qemu.org/Features/QAPI

QCFG -- http://wiki.qemu.org/Features/QCFG
- command line args translation to objects is complex and buggy
- schema + code generator to formalize this
- formally describe each command line option and generate code
  to build and validate objects
- provides systematic way to document command line options
- automatically 
- device_add does multiple conversions to go from qmp to qemuopts to
  objects
- move to basic c structures, and autogenerated marshalling code
- no plan to do this work soon, late in 0.15 cycle
  - same as qapi, fork a tree, do mass conversion and merge for 0.16 cycle
- qmp server mode to take all configuration commands before actually
  starting the guest
- can provide a config file 
- qdev...
  - could just bridge to setting and getting qdev properties
  - OR get to point where device objects go directly to qdev device init
- why not move command line to qmp instead of new schema?
  - single schema
- considerations for -M (didn't capture all of these)
- for all the details:
  
  http://wiki.qemu.org/Features/QCFG

Merging big changes
- in the past, evolving in tree has not worked well, leaving partial
  conversions
- QAPI/QCFG method of doing changes in external tree hopes to set new precedent
  - preserve patch/review on list
  - do full conversion
  - provide strong testing to show it works

Kemari merge plans
- just needs some ACKs
- Juan, Anthony, anybody else who is familiar with migration to review?

switch from gpxe to ipxe
- possible 0.15 release w/ ipxe (Alex looking into it)
- Michael Brown has been helpful in fixing bugs, so compat
- Alex will send out mail soon on the details
- ipxe releases?  not yet, there are plans for it, should be coming RSN
- Stefan volunteers to help test


Re: [Qemu-devel] KVM call minutes for Mar 15

2011-03-15 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 03/15/2011 09:53 AM, Chris Wright wrote:
  QAPI
snip
 - c library implementation is critical to have unit tests and test
driven development
- thread safe?
  - no shared state, no statics.
  - threading model requires lock for the qmp session
- licensing?
  - LGPL
- forwards/backwards compat?
  - designed with that in mind see wiki:
 
http://wiki.qemu.org/Features/QAPI
 
 One neat feature of libqmp is that once libvirt has a better QMP
 passthrough interface, we can create a QmpSession that uses libvirt.
 
 It would look something like:
 
 QmpSession *libqmp_session_new_libvirt(virDomainPtr dom);

Looks like you mean this?

       -- request QmpSession -->
client                           libvirt
       <-- return QmpSession ---

client <-> QmpSession <-> QMP <-> QEMU

So bypassing libvirt completely to actually use the session?

Currently, it's more like:

client <-> QemuMonitorCommand <-> libvirt <-> QMP <-> QEMU

 The QmpSession returned by this call can then be used with all of
 the libqmp interfaces.  This means we can still exercise our test
 suite with a guest launched through libvirt.  It also should make
 the libvirt pass through interface a bit easier to consume by third
 parties.

This sounds like it's something libvirt folks should be involved with.
At the very least, this mode is there now and considered basically
unstable/experimental/developer use:

 Qemu monitor command '%s' executed; libvirt results may be unpredictable!

So likely some concern about making it easier to use, esp. assuming
that third parties above are mgmt apps, not just developers.

thanks,
-chris


KVM call minutes for Mar 8

2011-03-08 Thread Chris Wright
QAPI merge plans
- should be 100% back compat
- qmp moved over
- hmp moved over
- 1st pass, core infrastructure (includes test framework)
- 2nd pass, command conversion
- 3rd pass, more controversial bits
- adds dependencies: glib and python
- some testing based on kvm-unit-test micro-os instance (e.g. added a balloon
  and run commands against it to test)
  - add more functionality here? (kvm autotest is slow, above is quick)
- will hit some point where full functionality is needed
  - have a mini linux to do this (lags where driver updates are part of test)
- generated code can obfuscate the debugging process
  - code generator has some ugly corners (python writing C...)
  - but generated code should be debuggable, readable, etc.
- some grumbling regarding glib dependency
  - reducing NIH and relying on external functionality is a solid way to
grow qemu as a project

Read wiki here and review closely:

  http://wiki.qemu.org/Features/QAPI

virt-agent
- json string converted to command (and vice versa)
- add to qmp schema - allows generated marshalling code to sanity check in/out
- problem with qmp not being bi-directional (rpc - in, events - out)
  - posted events allow migration to save and send unposted events
- any issues with guest agent interface extensibility
  - will add command to return schema
  - can add (optional) parameters to commands
- make libqmp a shared object for 0.16 (too much going on for 0.15)
- can terminate in qemu (e.g. vnc server internally qmp client to interact
  with guest cut 'n paste) or externally proxying to/from endpoint
- possibly revisit dynamic schema in future
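
To make the "json string converted to command (and vice versa)" and
"(optional) parameters" items concrete, here is a minimal sketch. It borrows
the execute/arguments layout that QMP uses on the wire; whether the guest
agent ends up with the same keys is an assumption, and the COMMANDS table
and both command names are hypothetical.

```python
import json

# Hypothetical command table: name -> (required params, optional params).
COMMANDS = {
    "ping": ((), ()),
    "file-read": (("path",), ("count",)),
}


def decode_command(text):
    """Turn a JSON string into (name, arguments), validated against COMMANDS."""
    msg = json.loads(text)
    name = msg["execute"]
    args = msg.get("arguments", {})
    if name not in COMMANDS:
        raise ValueError("unknown command %r" % name)
    required, optional = COMMANDS[name]
    missing = [p for p in required if p not in args]
    unknown = [p for p in args if p not in required and p not in optional]
    if missing or unknown:
        raise ValueError("bad arguments: missing=%r unknown=%r"
                         % (missing, unknown))
    return name, args


def encode_command(name, **args):
    """Inverse direction: build the JSON string for a command."""
    msg = {"execute": name}
    if args:
        msg["arguments"] = args
    return json.dumps(msg, sort_keys=True)
```

Adding a new optional parameter only touches the table, so older callers
that never send it keep working.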

glib, main loop, events
- (context was setfd changes from amit)
- iothread work is more critical to do first and get merged
- glib work starting just in qapi

iothread merge?
- progressing slowly, marcelo working on it
- have found regressions (signal handling code) (ifdef'd away for now)


Re: when use sriov, guest os could not access the vf device assigned

2011-03-04 Thread Chris Wright
* lidong chen (chen.lidong.ker...@gmail.com) wrote:
 guest os could not access the vf assigned ,and print this error message .
 PCI: device :00:06.0 has unknown header type 7f, ignoring.
 PCI: device :00:07.0 has unknown header type 7f, ignoring.
 PCI: device :00:08.0 has unknown header type 7f, ignoring.
 
 the reason is the config file /sys/bus/pci/devices/xx/config of pci
 device could not access correctly after guestos start,
 the content qemu-kvm read from /sys/bus/pci/devices/xx/config is all FF.

This is most likely a combination of two bugs, both of which have since
been fixed (starting in v0.8.3).  What version of libvirt are you using?

One is the 82599 VF has an erratum that it does not show that it supports
Function Level Reset (FLR -- SR-IOV VFs are required to support FLR).
The second is libvirt had buggy handling of device reset for devices
that don't support FLR.  IIRC, what you are seeing is the result of a
secondary bus reset resetting all devices on that bus (including the PF).
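
If you want to check both symptoms by hand, a sketch like the one below
works on the raw bytes of the sysfs config file. It reports whether the
read came back as all FF (which is where "header type 7f" comes from) and
whether the device advertises FLR in its capability list. The register
offsets and bits are the standard PCI ones; the function names are made up
for this example, and this is not what libvirt or qemu-kvm actually run.

```python
import struct

PCI_STATUS = 0x06              # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10     # status bit 4: capability list present
PCI_HEADER_TYPE = 0x0E
PCI_CAPABILITY_PTR = 0x34
PCI_CAP_ID_EXP = 0x10          # PCI Express capability
PCI_CAP_ID_AF = 0x13           # PCI Advanced Features capability
PCI_EXP_DEVCAP_FLR = 1 << 28   # Device Capabilities: FLR supported
PCI_AF_CAP_FLR = 0x02          # AF capabilities byte: FLR supported


def looks_dead(config):
    """True if every byte read back is 0xFF (device not responding)."""
    return len(config) > 0 and all(b == 0xFF for b in config)


def header_type(config):
    """Header type with the multifunction bit masked off."""
    return config[PCI_HEADER_TYPE] & 0x7F


def advertises_flr(config):
    """Walk the capability list and report whether FLR is advertised."""
    if looks_dead(config) or len(config) < 0x40:
        return False
    (status,) = struct.unpack_from("<H", config, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return False
    pos = config[PCI_CAPABILITY_PTR] & 0xFC
    seen = set()
    while pos >= 0x40 and pos + 8 <= len(config) and pos not in seen:
        seen.add(pos)
        cap_id, nxt = config[pos], config[pos + 1]
        if cap_id == PCI_CAP_ID_EXP:
            (devcap,) = struct.unpack_from("<I", config, pos + 4)
            if devcap & PCI_EXP_DEVCAP_FLR:
                return True
        elif cap_id == PCI_CAP_ID_AF:
            if config[pos + 3] & PCI_AF_CAP_FLR:
                return True
        pos = nxt & 0xFC
    return False
```

Feed it open("/sys/bus/pci/devices/<addr>/config", "rb").read() as root;
without privileges only the first 64 bytes are readable, so the capability
walk finds nothing. For a VF with the erratum above, a False result is the
expected (misleading) answer.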

Try upgrading libvirt.

thanks,
-chris


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-28 Thread Chris Wright
* James Neave (robo...@gmail.com) wrote:
 HOLY CRAP IT WORKS 8@


Hey, great! ;)

 ...almost...
 
 OK, clear_emulator_capabilities=0 solved the IRQ problem (which was,
 as it turns out, the rawio problem)
 My VM came up, both the tuners were there and after the firmware
 install I was able to tune and watch the slowest TV in the world over
 VNC.
 
 Thank god for that, i was really starting to believe that slashing out
 a lot of cash on my 890FX board and the fancy DDR3 ram it needed was a
 colossal waste of money.
 Sigh of relief
 
 Well, thank you all so much for helping me to get to this point!
 
 And yes, I did say almost works
 
 Looks like I've run straight into Chris' ref counting problem when
 shutting the guest down.
 Some sort of critical error barf was on the servers' screen when I
 shut down the guest, appeared to be very similar to Chris' example, in
 amd_iommu.c
 
 I'd post it but the server locks up after it's been shown and needed
 resetting. No idea how I would post that bit of dmesg as it gets reset
 after each boot.
 
 Is there a solution for this at the moment or will I have to wait for
 it to be patched?

No solution at the moment.  Will keep you posted.

thanks,
-chris


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-25 Thread Chris Wright
* James Neave (robo...@gmail.com) wrote:
 On Fri, Feb 25, 2011 at 12:06 AM, Chris Wright chr...@sous-sol.org wrote:
  * James Neave (robo...@gmail.com) wrote:
  OK, here's my latest dmesg with amd_iommu_dump and debug with no quiet
  http://pastebin.com/JxEwvqRA
 
  Yeah, that's what I expected:
 
  [    0.724403] AMD-Vi:   DEV_ALIAS_RANGE                 devid: 08:00.0 
  flags: 00 devid_to: 00:14.4
  [    0.724439] AMD-Vi:   DEV_RANGE_END           devid: 08:1f.7
 
  That basically says 08:00.0 - 08:1f.7 will show up as 00:14.4 (and
  should all go into same iommu domain).
 
  I've just figured out a sequence of echo DEV > PATH commands to call
  for 14.4 gets me past the claimed by pci-stub error and gets me to
  the failed to assign IRQ error.
  I'm going to narrow down the required sequence and then post it.
 
  Kind of afraid to ask, but does it include:
 
  (assuming 1002 4384 is the pci to pci bridge)
  echo 1002 4384 > /sys/bus/pci/drivers/pci-stub/new_id
  echo 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/unbind
 
  (this has the side effect of detaching the bridge from its domain)
 
 Exact sequence is:
 
 echo 1002 4384 > /sys/bus/pci/drivers/pci-stub/new_id
 echo 0000:00:14.4 > /sys/bus/pci/devices/0000:00:14.4/driver/unbind

OK, same, since driver is a symlink to pci-stub.

 I take it this is a bad thing then?

It just means the amd iommu driver might be susceptible to a refcounting
issue.  Indeed, here's what I get when I do that (assigning a device below
the PCI-PCI bridge, then shutting down the guest):

[  406.535873] [ cut here ]
[  406.536864] kernel BUG at arch/x86/kernel/amd_iommu.c:2460!
[  406.536864] invalid opcode:  [#1] SMP 
[  406.536864] last sysfs file: 
/sys/devices/pci:00/:00:14.4/:03:06.0/device
[  406.536864] CPU 0 
[  406.536864] Modules linked in: kvm_amd kvm e1000e bnx2
[  406.536864] 
[  406.536864] Pid: 4265, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #61 
Toonie/Toonie
[  406.536864] RIP: 0010:[81025e53]  [81025e53] 
amd_iommu_domain_destroy+0x75/0x9d
[  406.536864] RSP: 0018:88013507fb78  EFLAGS: 00010202
[  406.536864] RAX: 8801346ebeb8 RBX: 8801346ebeb8 RCX: 00014f67
[  406.536864] RDX: 0202 RSI: 0202 RDI: 81a118a0
[  406.536864] RBP: 88013507fba8 R08:  R09: 88007900f8e8
[  406.536864] R10: 88013507f8d8 R11: 0006 R12: 8801346ebea8
[  406.536864] R13: 8800783b73a8 R14: 0202 R15: 880135089570
[  406.536864] FS:  7fe794db76e0() GS:88007fc0() 
knlGS:
[  406.536864] CS:  0010 DS:  ES:  CR0: 8005003b
[  406.536864] CR2:  CR3: 7c6fb000 CR4: 06f0
[  406.536864] DR0:  DR1:  DR2: 
[  406.536864] DR3:  DR6: 0ff0 DR7: 0400
[  406.536864] Process qemu-system-x86 (pid: 4265, threadinfo 88013507e000, 
task 88013496b090)
[  406.536864] Stack:
[  406.536864]  0009 880135089570 88007c734ca0 
0001
[  406.536864]  88007c74e3c8 0002 88013507fbc8 
813013b7
[  406.536864]  0001 880135089570 88013507fbe8 a003f
d81
[  406.536864] Call Trace:
[  406.536864]  [813013b7] iommu_domain_free+0x16/0x22
[  406.536864]  [a003fd81] kvm_iommu_unmap_guest+0x22/0x28 [kvm]
[  406.536864]  [a00440fd] kvm_arch_destroy_vm+0x15/0x119 [kvm]
[  406.536864]  [a003af59] kvm_put_kvm+0xde/0x103 [kvm]
[  406.536864]  [a003b64e] kvm_vcpu_release+0x13/0x17 [kvm]
[  406.536864]  [810e893a] fput+0x11b/0x1bc
[  406.536864]  [810e5db9] filp_close+0x67/0x72
[  406.536864]  [81040505] put_files_struct+0x70/0xc3
[  406.536864]  [8104058c] exit_files+0x34/0x39
[  406.536864]  [810418ec] do_exit+0x267/0x72e
[  406.536864]  [8104994c] ? lock_timer_base+0x26/0x4a
[  406.536864]  [8104be04] ? freezing+0xe/0x10
[  406.536864]  [81041e47] sys_exit_group+0x0/0x16
[  406.536864]  [8104dfce] get_signal_to_deliver+0x31c/0x33b
[  406.536864]  [81001fc6] do_notify_resume+0x8b/0x6c3
[  406.536864]  [8104be41] ? set_tsk_thread_flag+0xd/0xf
[  406.536864]  [8104e6b5] ? sys_rt_sigtimedwait+0x18e/0x208
[  406.536864]  [810efe00] ? path_put+0x1d/0x22
[  406.536864]  [81002c58] int_signal+0x12/0x17
[  406.536864] Code: 00 00 00 4c 89 eb 4d 8b 6d 00 49 8d 44 24 10 48 39
c3 75 df 4c 89 f6 48 c7 c7 a0 18 a1 81 e8 fa b5 56 00 41 83 7c 24 64 00
74 04 0f 0b eb fe 4c 89 e7 e8 2c f5 ff ff 4c 89 e7 e8 9e e2 ff ff 49 
[  406.536864] RIP  [81025e53] amd_iommu_domain_destroy+0x75/0x9d
[  406.536864]  RSP 88013507fb78
[  406.854138] ---[ end trace 13c9f9241c8b376b ]---
[  406.859182] Fixing recursive fault but reboot is needed!

  I assume this means that 00:14.4 is still left

Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-25 Thread Chris Wright
* James Neave (robo...@gmail.com) wrote:
 On Fri, Feb 25, 2011 at 11:02 PM, James Neave robo...@gmail.com wrote:
  On Fri, Feb 25, 2011 at 10:47 PM, James Neave robo...@gmail.com wrote:
  On Fri, Feb 25, 2011 at 12:06 AM, Chris Wright chr...@sous-sol.org wrote:
  * James Neave (robo...@gmail.com) wrote:
  OK, here's my latest dmesg with amd_iommu_dump and debug with no quiet
  http://pastebin.com/JxEwvqRA
 
  Yeah, that's what I expected:
 
  [    0.724403] AMD-Vi:   DEV_ALIAS_RANGE                 devid: 08:00.0 
  flags: 00 devid_to: 00:14.4
  [    0.724439] AMD-Vi:   DEV_RANGE_END           devid: 08:1f.7
 
  That basically says 08:00.0 - 08:1f.7 will show up as 00:14.4 (and
  should all go into same iommu domain).
 
  I've just figured out a sequence of echo DEV > PATH commands to call
  for 14.4 gets me past the claimed by pci-stub error and gets me to
  the failed to assign IRQ error.
  I'm going to narrow down the required sequence and then post it.
 
  Kind of afraid to ask, but does it include:
 
  (assuming 1002 4384 is the pci to pci bridge)
  echo 1002 4384 > /sys/bus/pci/drivers/pci-stub/new_id
  echo 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/unbind
 
  (this has the side effect of detaching the bridge from its domain)
 
  thanks,
  -chris
 
 
  Exact sequence is:
 
  echo 1002 4384 > /sys/bus/pci/drivers/pci-stub/new_id
  echo 0000:00:14.4 > /sys/bus/pci/devices/0000:00:14.4/driver/unbind
 
  I take it this is a bad thing then?
 
  I assume this means that 00:14.4 is still left claimed by pci-stub?
 
  Yes
 
  How are you determining this?  The lspci paste above has pci-stub for all
  of them.  The easiest thing might be to start with manually disabling
  host driver and reassigning pci-stub to: 00:14.4, 08: 06.2,3 and 0e.0
  Then giving the guest only 08:06.1.
 
  I determined it by being half asleep and not reading it properly... .
  You're right, all 5 devices were using pci-stub
 
  libvirtError: this function is not supported by the connection driver:
  Unable to reset PCI device :00:14.4: no FLR, PM reset or bus reset
  available
 
  Right, libvirt is more restrictive than qemu-kvm (forgot you were using
  libvirt here).
 
  What does that libvirt error mean? I can't find a definition.
  Am I limiting myself by using libvirt? Would not using it help and how
  would I go about not using it?
 
  Trouble now is that
  with shared IRQ we don't have a good way to handle that right now.
 
  Game over then?
  I've tried assigning the USB devices before, I couldn't do it because
  qemu doesn't support USB2 devices.
  I don't really understand where this IRQ conflict is, the firewire and
  the USB2 device share IRQ22 but I'm assigning them both to the VM?
  Is that still a problem?
  I don't suppose there's any way to change which IRQ they use in the
  BIOS or with a command is there?
 
  I don't know if it means anything but this page:
 
  http://linuxtv.org/wiki/index.php/Hauppauge_WinTV-HVR-2200
 
  Has the lspci output for the HVR-2200 which mentions MSI and IRQ255.
  My knowledge is very limited on this subject so I don't know if that's
  meaningless looking at the output from another person's lspci.
 
  Anything left to try?
 
  Regardless, many thanks for your help,
 
  James.
 
 
  On the off chance I tried disabling the firewire in the BIOS, which
  leaves only my tuner card using IRQ 20, 21 and 22.
  No difference, still complains about IRQs:
 
  Using raw in/out ioport access (sysfs - Input/output error)
  Failed to assign irq for hostdev0: Operation not permitted
  Perhaps you are assigning a device that shares an IRQ with another device?
 
  It does say Operation not permitted and that only perhaps I am
  assigning a device that shares an IRQ.
  Perhaps IRQ conflict is not the problem? They really are sitting on
  their own. Another permissions problem perhaps?
 
  Regards,
 
  James.
 
 
 I'm reading something about this error message being related to
 libvirt and CAP_SYS_RAWIO?

Depending on how new your libvirt is, you can force it to stop dropping
capabilities.  Look for the config item clear_emulator_capabilities
in /etc/libvirt/qemu.conf.  Setting this to 0 would verify that's the
problem (and not a real shared irq...i thought i saw sharing on
/proc/interrupts though).
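
A quick way to see what your qemu.conf currently says, as a sketch: the
helper below takes the file's text and returns the effective value. The
function name is made up, and the default of 1 (capabilities cleared) when
the key is absent or commented out is my assumption about libvirt's
behaviour, so check it against your version.

```python
def clear_emulator_capabilities(conf_text, default=1):
    """Return the effective clear_emulator_capabilities value.

    conf_text is the content of /etc/libvirt/qemu.conf.  default is what
    we assume libvirt does when the key is not set (an assumption).
    """
    value = default
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line or "=" not in line:
            continue
        key, raw = (part.strip() for part in line.split("=", 1))
        if key == "clear_emulator_capabilities":
            value = int(raw)                   # last assignment wins
    return value
```

Usage would be clear_emulator_capabilities(open("/etc/libvirt/qemu.conf").read());
a result of 0 means libvirt should leave the emulator's capabilities alone.
Remember that libvirtd has to be restarted after editing the file.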

 
 http://www.mail-archive.com/kvm@vger.kernel.org/msg34338.html
 http://www.google.co.uk/#hl=enxhr=tq=libvirt+CAP_SYS_RAWIOcp=21pf=psclient=psyaq=faqi=aql=oq=libvirt+CAP_SYS_RAWIOpbx=1fp=2d8e3f69fec095f4
 
 When I patch libvirt to not drop the capabilities, everything works
 as expected.

Well, that's a good point.  We fixed that a while ago, but I'm not sure
your kernel has that fix.

2.6.35.10-dmar (btw, random nitpick, dmar == intel dma remapping engine,
aka vt-d not amd iommu ;)

This was fixed in 2.6.36, commit:

48bb09e KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq

The last 2.6.35 stable release is 2.6.35.9 and does not have that fix.
So unless your .10-dmar has it, you could

Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-24 Thread Chris Wright
* James Neave (robo...@gmail.com) wrote:
 libvirtError: this function is not supported by the connection driver:
 Unable to reset PCI device :00:14.4: no FLR, PM reset or bus reset
 available

Right, libvirt is more restrictive than qemu-kvm (forgot you were using
libvirt here).

 There is nothing written to test.log when you try to start the VM with
 00:14.4 attached.
 
 At this point libvirt goes screwy and I have to restart it before I
 can remove 00:14.4 from the VM.

I assume this means that 00:14.4 is still left claimed by pci-stub?

 Failed to assign irq for hostdev0: Operation not permitted
 Perhaps you are assigning a device that shares an IRQ with another device?
 kvm: -device 
 pci-assign,host=08:06.0,id=hostdev0,configfd=58,bus=pci.0,addr=0x6:

Believe it or not this is progress ;)  You have passed the point that
it was failing before (the iommu domain issue).  Trouble now is that
with shared IRQ we don't have a good way to handle that right now.

 Device 'pci-assign' could not be initialized


 2011-02-23 19:21:13.958: shutting down
 
 dmesg:
 http://pastebin.com/70D26xp4
 
 This bit is different:
 
 [  201.625221] uhci_hcd :08:06.0: remove, state 4
 [  201.625237] usb usb4: USB disconnect, address 1
 [  201.625514] uhci_hcd :08:06.0: USB bus 4 deregistered
 [  201.625595] uhci_hcd :08:06.0: PCI INT A disabled
 [  201.626028] pci-stub :08:06.0: claimed by stub
 [  201.631922] uhci_hcd :08:06.1: remove, state 4
 [  201.631937] usb usb9: USB disconnect, address 1
 [  201.632195] uhci_hcd :08:06.1: USB bus 9 deregistered
 [  201.632274] uhci_hcd :08:06.1: PCI INT B disabled
 [  201.632419] pci-stub :08:06.1: claimed by stub
 [  201.638160] ehci_hcd :08:06.2: remove, state 1
 [  201.638172] usb usb10: USB disconnect, address 1
 [  201.638178] usb 10-1: USB disconnect, address 2
 [  201.721626] dvb-usb: Hauppauge Nova-T 500 Dual DVB-T successfully
 deinitialized and disconnected.
 [  201.721990] ehci_hcd :08:06.2: USB bus 10 deregistered
 [  201.722126] ehci_hcd :08:06.2: PCI INT C disabled
 [  201.725042] pci-stub :08:06.2: claimed by stub
 [  201.731830] firewire_ohci :08:0e.0: PCI INT A disabled
 [  201.731838] firewire_ohci: Removed fw-ohci device.
 [  201.732536] pci-stub :08:0e.0: claimed by stub
 [  202.303880] device vnet0 entered promiscuous mode
 [  202.305184] virbr0: topology change detected, propagating
 [  202.305193] virbr0: port 1(vnet0) entering forwarding state
 [  202.305199] virbr0: port 1(vnet0) entering forwarding state
 [  202.433007] pci-stub :08:06.0: PCI INT A - GSI 20 (level, low) - IRQ 
 20
 [  202.470076] pci-stub :08:06.0: restoring config space at offset
 0x1 (was 0x210, writing 0x211)
 [  202.697270] assign device 0:8:6.0
 [  202.697325] deassign device 0:8:6.0
 [  202.730080] pci-stub :08:06.0: restoring config space at offset
 0x1 (was 0x210, writing 0x211)
 [  202.730107] pci-stub :08:06.0: PCI INT A disabled
 
 This time the pci-stub claimed lines are not all bunched up and there
 is only one per device, rather than three per device.
 Also for the first time it says assign device 0:8:6.0 rather than
 assign device 0:8:6.0 failed
 It then immediately deassigns the device and stops.
 
 test.log shows:
 
 Failed to assign irq for hostdev0: Operation not permitted
 Perhaps you are assigning a device that shares an IRQ with another device?
 
 lspci -vv for the relevant devices shows:
 http://pastebin.com/EUtUMj8x
 
 00:14.4 now appears to be using pci-stub as its driver, as well as
 08:06.1, 2, 3 but not 0e.0

How are you determining this?  The lspci paste above has pci-stub for all
of them.  The easiest thing might be to start with manually disabling
host driver and reassigning pci-stub to: 00:14.4, 08: 06.2,3 and 0e.0
Then giving the guest only 08:06.1.
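
For the manual rebinding, the sysfs writes can be sketched like this. The
three paths are the usual pci-stub ones (new_id, the device's driver/unbind,
and pci-stub's bind); the helper names are made up, and the plan assumes
the device is currently bound to a host driver other than pci-stub.

```python
def stub_rebind_plan(address, vendor_id, device_id):
    """Return the (sysfs path, value) writes that hand a device to pci-stub.

    address is the full PCI address including the domain, for example
    "0000:08:06.2"; vendor_id and device_id are hex strings as lspci -n
    prints them.
    """
    dev = "/sys/bus/pci/devices/%s" % address
    stub = "/sys/bus/pci/drivers/pci-stub"
    return [
        (stub + "/new_id", "%s %s" % (vendor_id, device_id)),
        (dev + "/driver/unbind", address),
        (stub + "/bind", address),
    ]


def apply_plan(plan):
    """Perform the writes.  Needs root and a real device."""
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)
```

For the bridge, assuming 1002 4384 really is its ID, that would be
stub_rebind_plan("0000:00:14.4", "1002", "4384"). Print the plan and read it
before calling apply_plan(); if the device has no driver when new_id is
written, pci-stub may claim it right away and the later steps will fail.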

 Anyway, that's all for now.

Thanks for testing.

 I think I'll try 'amd_iommu_dump' next, does it write to dmesg?

Yes it does.


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-24 Thread Chris Wright
* James Neave (robo...@gmail.com) wrote:
 OK, here's my latest dmesg with amd_iommu_dump and debug with no quiet
 http://pastebin.com/JxEwvqRA

Yeah, that's what I expected:

[    0.724403] AMD-Vi:   DEV_ALIAS_RANGE                 devid: 08:00.0 flags: 00 devid_to: 00:14.4
[    0.724439] AMD-Vi:   DEV_RANGE_END           devid: 08:1f.7

That basically says 08:00.0 - 08:1f.7 will show up as 00:14.4 (and
should all go into same iommu domain).
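
If you want to pull these ranges out of a long dmesg, a small sketch along
these lines would do. It only understands the DEV_ALIAS_RANGE /
DEV_RANGE_END pair shown above, expects each log entry on a single line,
and the function name is made up.

```python
import re

ALIAS_RE = re.compile(
    r"DEV_ALIAS_RANGE\s+devid:\s*(\S+)\s+flags:\s*(\S+)\s+devid_to:\s*(\S+)")
END_RE = re.compile(r"DEV_RANGE_END\s+devid:\s*(\S+)")


def parse_alias_ranges(dmesg_text):
    """Return [(first_devid, last_devid, alias_devid)] from the IVRS dump."""
    ranges = []
    pending = None                      # (first devid, alias) awaiting an END
    for line in dmesg_text.splitlines():
        m = ALIAS_RE.search(line)
        if m:
            pending = (m.group(1), m.group(3))
            continue
        m = END_RE.search(line)
        if m and pending:
            ranges.append((pending[0], m.group(1), pending[1]))
            pending = None
    return ranges
```

On the two lines above it yields one entry: first 08:00.0, last 08:1f.7,
alias 00:14.4.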

 I've just figured out a sequence of echo DEV > PATH commands to call
 for 14.4 gets me past the claimed by pci-stub error and gets me to
 the failed to assign IRQ error.
 I'm going to narrow down the required sequence and then post it.

Kind of afraid to ask, but does it include:

(assuming 1002 4384 is the pci to pci bridge)
echo 1002 4384 > /sys/bus/pci/drivers/pci-stub/new_id
echo 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/unbind

(this has the side effect of detaching the bridge from its domain)

thanks,
-chris


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-24 Thread Chris Wright
* James Neave (robo...@gmail.com) wrote:
 Just out of interest, what kind of mileage would I expect out of
 buying a shiny new PCIe tuner?

Hard to say.  One advantage would be if it's using MSI or MSI-X
interrupts.

 Can I pass through PCIe?

Often, yes (still some caveats w.r.t. extended config space I believe).

 Would it work better because it wouldn't be
 behind a bridge? WOULD it not be behind a bridge?

You should have a PCIe slot that does not sit behind a PCI-PCI bridge.

 As much as I'd hate to solve a problem with the application of money... :(

If you just want _one_ tuner to go to the guest, you should be able to
do that by unbinding the other devices and giving the guest just the one
usb controller (assuming just assigning the usb device itself is hitting
usb/qemu stack limitations).  The trick is to be sure to unbind any host
devices that are sharing interrupts with the one device you want the
guest to have.  With USB controllers you just have to be sure you know
which ports they map to so you don't kill a keyboard, mouse, external
disk, etc...
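
To find out which host devices share a line with the one you care about,
something like this sketch over /proc/interrupts is enough. It assumes the
layout where each numbered row is: IRQ number, one count per CPU, a single
chip-name token, then a comma-separated list of handlers. Other kernels
print extra columns, so treat the parsing as an assumption; the function
name is made up.

```python
def shared_irqs(interrupts_text):
    """Map IRQ number -> handler names, for IRQs with two or more handlers."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue                          # skip NMI, LOC, ERR, ...
        # ncpus counts, one chip token, then everything else is the names
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue
        names = [n.strip() for n in fields[-1].split(",")]
        if len(names) > 1:
            shared[int(label.strip())] = names
    return shared
```

Run it as shared_irqs(open("/proc/interrupts").read()). Any handler listed
next to the device you want to assign belongs to a host driver that needs
unbinding first.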

 (OT question, on mailing lists should I use Reply All or just reply
 and change the To address to kvm.vger.kernel.org?)

Reply all is best.

thanks,
-chris


Re: Is PCI pass-through possible with host+kvm on latest linux but guest on an older linux?

2011-02-23 Thread Chris Wright
* Chigurupati, Chaks (ch...@wichorus.com) wrote:
 If my hardware is VT-d capable and the host is latest linux+kvm with all
 the needed VT-d support but the guest is an older linux (say 2.6.27), will
 I be able to use PCI pass-through to hot-plug a PCI device from one guest
 to another guest? Any comments/thoughts are appreciated.

The basic requirement in the guest to do what you describe is that
it has hotplug capability and has the driver for the device you want
to assign to it.  All the rest of the requirements are on the host sw
and hw (linux+kvm capable of device assignment, which latest should be,
and hw VT-d or AMD IOMMU).

thanks,
-chris


KVM call minutes for Feb 22

2011-02-22 Thread Chris Wright
0.14 recap
- keeping schedule on wiki was helpful
- changelog was helpful
- testing (could use even more emphasis, could be improved)
- -rc cycles
  - -rc2 and final release just hours

0.15
- tentative date July 1st
- qapi
- qed features
- virtagent?
  - depends on whether to terminate in qemu vs external
- terminating w/in qemu is close to feature complete
- using QMP (kinda, QObject - JSON marshalling, still use HTTP)
- QMP is not bi-directional XMLRPC, one way with event posting
- XMLRPC + server logic add to the basic QEMU side attack surface
  - splitting out to external process
- state associated with guest in external process complicates live migration
  - e.g. handling in-process command in server
  - guest client reconnects during migration
  - can virtagent features be stateless 
- Avi's favorite Lua based extension language coming RSN ;)
  - let's use copy and paste as a concrete example
- usecase to help define the requirements and expose
  architectural
- Jes will do this, make concrete counter proposal to hosting
  virtagent server in qemu
  - splitting QEMU into more modular components is a large architectural
step, but better step

Block format acceptance
- qcow3 wiki starting

GSoC projects
- only 3 so far, mentoring organization applications Feb 28th
  - can update app 
- please add your thoughts here so that we can have a successful
- Luiz will send out a note as more explicit reminder

gpxe vs ipxe
- gpxe still stagnant
- ipxe accepting patches (e.g. igbvf)
- perhaps switch in 0.15 (Alex take a look)


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-22 Thread Chris Wright
* James Neave (robo...@gmail.com) wrote:
 On Tue, Feb 22, 2011 at 1:51 AM, Chris Wright chr...@sous-sol.org wrote:
  * James Neave (robo...@gmail.com) wrote:
  Does anybody know the debug kernel switches for iommu?
 
  Two helpful kernel commandline options are:
 
  amd_iommu_dump debug (and drop quiet)
 
  The problem is when you attach the device (function) you're getting
  stuck up in conflicts with the existing domain for that function.
 
  My guess is that all the functions are behind a PCI to PCI bridge, so the
  alias lookup is finding a conflict.
 
 Yes, it's behind a PCI-PCI bridge I think, here's the blurb from an
 earlier email:

Sorry, I missed that in your original mail, thanks for reposting.

 cat /proc/interrupts
 http://pastebin.com/LQdB3hms
 
 lspci -vvv
 http://pastebin.com/GJDkC8B4

 00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)

 lspci -t -v
 http://pastebin.com/Ftx8Hfjt

Yup, that's what I expected:

 +-14.4-[08]--+-06.0  VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller
 |            +-06.1  VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller
 |            +-06.2  VIA Technologies, Inc. USB 2.0
 |            \-0e.0  Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller

I'd now expect to see (if you boot with amd_iommu_dump) some IVRS
details showing an alias range entry basically showing 08:* pointing
back to 00:14.4.  This means that from the point of view of the IOMMU the
devices 08:06.0, 08:06.1, 08:06.2, 08:0e.0 will all show up as if they
are 00:14.4.

When you assign a device to a guest, the guest VM gets an IOMMU domain
(a context to manage IOMMU page table mappings) and the device is put
into that guest's IOMMU domain.  However, if the device is behind a
PCI-PCI bridge it will appear as an alias for the bridge itself.  The
bridge is a PCI device with an IOMMU domain.  When trying to assign a
device to a guest there's some sanity checking to verify that the device
(or its alias) aren't already under some IOMMU domain other than the
guest VM's IOMMU domain.

I suspect this is what you are hitting.  You could test this theory by
adding 2 more devices to your guest -- the firewire device (08:0e.0)
and the PCI-PCI bridge itself (00:14.4).
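
The sanity check described above can be modelled in a few lines. This is a
toy, not the amd_iommu code: the class, its methods and the domain names
"host" and "guest" are all made up, and it only captures the rule that a
device and its alias must not already sit in a different domain.

```python
class DomainConflict(Exception):
    """Raised where the real driver would refuse the attach."""


class ToyIommu:
    """Toy model: a device ID belongs to at most one IOMMU domain."""

    def __init__(self, aliases):
        self.aliases = aliases      # devid -> devid the IOMMU actually sees
        self.domain_of = {}         # devid -> domain name

    def effective(self, devid):
        return self.aliases.get(devid, devid)

    def attach(self, devid, domain):
        ids = {devid, self.effective(devid)}
        for d in ids:
            current = self.domain_of.get(d)
            if current is not None and current != domain:
                raise DomainConflict("%s is already in domain %r"
                                     % (d, current))
        for d in ids:
            self.domain_of[d] = domain

    def detach(self, devid):
        self.domain_of.pop(devid, None)
```

With 08:06.0 aliased to 00:14.4 and the bridge still in a "host" domain,
attach("08:06.0", "guest") raises; once the bridge is detached (or assigned
to the same guest) the attach goes through.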

thanks,
-chris


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-21 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 I don't know why you're getting -EBUSY for this device, but maybe we can
 start from a clean slate and see if it helps.  Here's what I would
 suggest:

I bet this is an AMD IOMMU box.  Can we get full dmesg?


Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2

2011-02-21 Thread Chris Wright
* James Neave (robo...@gmail.com) wrote:
 Finally, here is the very latest dmesg:
 http://pastebin.com/9HE61K62

OK, this is an AMD IOMMU box.

[0.00] ACPI: IVRS cfcf9830 000E0 (v01  AMD RD890S 00202031 
AMD  )

It's discovered and enabled properly:

[0.698992] AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40
[0.710287] AMD-Vi: Lazy IO/TLB flushing enabled

 Does anybody know the debug kernel switches for iommu?

Two helpful kernel commandline options are:

amd_iommu_dump debug (and drop quiet)

The problem is when you attach the device (function) you're getting
stuck up in conflicts with the existing domain for that function.

My guess is that all the functions are behind a PCI to PCI bridge, so the alias
lookup is finding a conflict.

thanks,
-chris


Re: KVM Test report, kernel a685b38... qemu 671d89d...

2011-02-16 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 On Wed, 2011-02-16 at 11:10 +0200, Avi Kivity wrote:
  On 02/16/2011 11:05 AM, Hao, Xudong wrote:
   Hi, all,
   This is KVM test result against kvm.git 
   a685b38e272587e644fedd37269ddb82df21c052, and qemu-kvm.git 
   671d89d6411655bb4f8058ce6eb86bb0bb8ec978.
  
   Currently qemu-kvm can build successfully on RHEL5, and Qcow image create 
   failure issue also got fixed, our nightly testing resumed. One VT-d 
   device assignment issue opened on latest KVM.
  
   New issue:
   1. [VT-d] VT-d device passthrough fail to guest
   https://bugzilla.kernel.org/show_bug.cgi?id=29232
  
 
 Extremely reproducible.  Looks like it's a result of this kernel change:
 
 commit 47970b1b2aa64464bc0a9543e86361a622ae7c03
 Author: Chris Wright chr...@sous-sol.org
 Date:   Thu Feb 10 15:58:56 2011 -0800
 
 pci: use security_capable() when checking capablities during config space 
 re
 
 Eric Paris noted that commit de139a3 (pci: check caps from sysfs file
 open to read device dependent config space) caused the capability check
 to bypass security modules and potentially auditing.  Rectify this by
 calling security_capable() when checking the open file's capabilities
 for config space reads.
 
 Reported-by: Eric Paris epa...@redhat.com
 Signed-off-by: Chris Wright chr...@sous-sol.org
 Signed-off-by: James Morris jmor...@namei.org
 
 Chris, why isn't this working for us?  Thanks,

It's a broken patch, the fix is floating about.  Linus reverted it and I
supplied this patch after the revert:


From 683034fca7b8c322f87b8b4f664f1ae0b5fc Mon Sep 17 00:00:00 2001
From: Chris Wright chr...@sous-sol.org
Date: Mon, 14 Feb 2011 19:12:00 -0500
Subject: [PATCH] pci: use security_capable() when checking capablities during 
config space read

This reintroduces commit 47970b1b which was subsequently reverted
as f00eaeea.  The original change was broken and caused X startup
failures and generally made privileged processes incapable of reading
device dependent config space.  The normal capable() interface returns
true on success, but the LSM interface returns 0 on success.  This thinko
is now fixed in this patch, and has been confirmed to work properly.

So, once again...Eric Paris noted that commit de139a3 (pci: check caps
from sysfs file open to read device dependent config space) caused the
capability check to bypass security modules and potentially auditing.
Rectify this by calling security_capable() when checking the open file's
capabilities for config space reads.

Reported-by: Eric Paris epa...@redhat.com
Tested-by: Dave Young hidave.darks...@gmail.com
Acked-by: James Morris jmor...@namei.org
Cc: Dave Airlie airl...@gmail.com
Cc: Alex Riesen raa.l...@gmail.com
Cc: Sedat Dilek sedat.di...@googlemail.com
Cc: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Chris Wright chr...@sous-sol.org
---
 drivers/pci/pci-sysfs.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 8ecaac9..ea25e5b 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -23,6 +23,7 @@
 #include <linux/mm.h>
 #include <linux/fs.h>
 #include <linux/capability.h>
+#include <linux/security.h>
 #include <linux/pci-aspm.h>
 #include <linux/slab.h>
 #include "pci.h"
@@ -368,7 +369,7 @@ pci_read_config(struct file *filp, struct kobject *kobj,
 	u8 *data = (u8*) buf;
 
 	/* Several chips lock up trying to read undefined config space */
-	if (cap_raised(filp->f_cred->cap_effective, CAP_SYS_ADMIN)) {
+	if (security_capable(filp->f_cred, CAP_SYS_ADMIN) == 0) {
 		size = dev->cfg_size;
 	} else if (dev->hdr_type == PCI_HEADER_TYPE_CARDBUS) {
 		size = 128;
-- 
1.7.3.4
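
The return-value mix-up described in the commit message can be shown with a small userspace model in Python. This is only a sketch of the two conventions, not kernel code: `lsm_style`, `config_size_broken` and `config_size_fixed` are hypothetical names, and the fallback size of 64 is an assumed stand-in for the unprivileged limit.

```python
CAP_SYS_ADMIN = 21  # numeric value from linux/capability.h
EPERM = 1


def lsm_style(effective_caps, cap):
    """LSM-hook convention: 0 means allowed, a negative errno means denied."""
    return 0 if cap in effective_caps else -EPERM


def config_size_broken(effective_caps, cfg_size):
    # Bug: the LSM-style result is used as a boolean, so 0 (allowed) is falsy
    # and privileged callers fall through to the restricted size.
    if lsm_style(effective_caps, CAP_SYS_ADMIN):
        return cfg_size
    return 64  # assumed unprivileged limit, for illustration only


def config_size_fixed(effective_caps, cfg_size):
    # Fix: compare against 0 explicitly, as the patch does.
    if lsm_style(effective_caps, CAP_SYS_ADMIN) == 0:
        return cfg_size
    return 64


privileged = {CAP_SYS_ADMIN}
print(config_size_broken(privileged, 256), config_size_fixed(privileged, 256))
```

With the broken check the privileged caller is limited and the unprivileged caller is not, which matches the reported symptom of privileged processes losing access to device dependent config space.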


KVM call minutes for Feb 15

2011-02-15 Thread Chris Wright
QAPI and QMP
- Anthony adding a new wiki page to describe all of this
- specified in formal schema using JSON
  - includes documentation in javadoc-like syntax
  - can generate api (possibly protocol) docs
  - documenting each command and expected errors
- creates marshalling functions and C interfaces
- can generate C library
  - facilitates unit tests/regression tests
- new and old code both exist in Anthony's tree
  - allows unit tests to run on both to verify
  - will remove old and force a flag day on merging in for 0.15
- still need to convert human monitor commands
  - goal to convert all of human monitor to QMP
- events?
  - still not consumable from internal use
  - model signals and slots
- similar to notifier lists, but can pass arbitrary data
- client connects to signal via QMP
  - how to extend?
- optional parameters (ABI bump)
  - no way to know if client is aware of and consuming the optional
parameters
- add new events
  - client required to register for new events when they know about
them, server can generate different logic based on clients
capability
- first release may not include shared library (lack of libconf/autotool)
  - could 
- QMP session in default well-known location
  - allows iteration of all running QMP sessions
  - per-user directory to handle user-level isolation

qdev future
- have an object model, but can't do polymorphism (i.e. bus level)
- could use more oop style, use GObject, use C++...no great ideas
- no major qdev plans for 0.15
- would be useful to have the ability to do device level unit testing
  - cleaner device model, better encapsulation
  - this is both the device side interfaces, but also interfaces back to qemu
  - ability to do something like a virtual PCI bus to be a test harness
to interact with a device
  - back to the GObject, oop, C++ questions?
- IDL based code generation to generate VMState in effort to make
  migration more verifiable
- VMState
  - need to focus on serialized guest visible state
- start with all state and remove obviously internal only state
- start with only guest visible state (structure separation)
  - verifiable
- need a qdev tree maintainer?
- some disagreement on exactly how much 
- qdev autodoc patches? (posted and ack'd multiple times)

bad patches committed that are not on list
- please inform of specific incidents, this should not be happening

SeaBIOS update?
- w/out we will have features that can't be used 
- need a release..
  - 0.15 will need good planning and dates and communication with Kevin

0.14-rc2 tagged please review for any missing patches, 0.14.0 likely
tagged late today

revisit new -> old migration
- Amit offers virtio-serial patches and some legwork
- tabled discussion to list, possibly next week's call


KVM call agenda for Feb 15

2011-02-14 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


KVM call minutes for Feb 8

2011-02-08 Thread Chris Wright
Automated builds and testing
- found broken 32-bit
- luiz suggested running against maintainer trees
- daniel gollub offered to take on maintenance
- integration with kvm-autotest?
  - lucas, daniel, stefan...
  - testing each git commit is probably overkill and too expensive
  - current autotest run (each 48-hours to batch it up)
  - stefan currently running once a day, autotest run is 3 hours, so
daily should work
- need an integration tree to run build test on?
  - probably still too early

QEMU testing
- kvm unit tests
  - small standalone kernel that exercises paths that have shown bugs
http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=summary
- Michael Roth recently sent RFC for qtest
  (http://www.mail-archive.com/qemu-devel@nongnu.org/msg54191.html)
  - test module (->init(), ->run()) which runs in place of vcpu threads to
set up a test framework to do targeted testing, for example, of devices
  - normal C code, access to qemu internal functions
  - not just functional device testing, but can also do fuzz testing
  - looking for feedback/users/test developers/etc
- PPC (just kernel + initrd to boot, and verify boots are identical)
  - full install in many cases is too long, and can trigger other issues
(alex had examples of emulation being slow enough that login screen
times out)
- tcg basic testing to verify qemu-kvm patch isn't breaking tcg

Cross version migration (new-old version migration thread)
- downstreams want this, support this upstream?
- versions vs. subsections (subsections should allow this to work)
- (as usual) more vmstate conversion needed
- qdev/vmstate both examples of partially completed work that need more
  attention 


KVM call agenda for Feb 8

2011-02-07 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


KVM call minutes for Feb 1

2011-02-01 Thread Chris Wright
KVM upstream merge: status, plans, coordination
- Jan has a git tree, consolidating
- qemu-kvm io threading is still an issue
- Anthony wants to just merge
  - concerns with non-x86 arch and merge
  - concerns with big-bang patch merge and following stability
- post 0.14 conversion to glib mainloop, non-upstreamed qemu-kvm will be
  a problem if it's not there by then
- testing and nuances are still an issue (e.g. stefan berger's mmio read issue)
- qemu-kvm still evolving, needs to get sync'd or it will keep diverging
- 2 implementations of main init, cpu init, Jan has merged them into one
  - qemu-kvm-x86.c file that's only a few hundred lines
- review as one patch to see the fundamental difference

QMP support status for 0.14
- declare QMP fully supported
  - caveats: specific errors aren't guaranteed yet (primarily documentation)
  - human monitor passthrough command is best effort
- device tree structure is not reliable, use name not path
- will send out patch to update qmp-commands.hx to document this (and Cc
  libvirt)
- schema file (json subset which is python) and code generator to
  generate code with C structures, also generates client library for
  test cases (can test against new and old qmp server to verify hasn't
  changed)
  - HMP implemented in terms of QMP only
  - at the end should have a test framework to test all commands
  - glib/gtest framework

0.14 stable fork today
already posted 0.14 patches?
- will pick up all those patches before forking, fork at the end of the day
- will grab latest SeaBIOS and vgabios

SeaBIOS update for 0.14 (AHCI boot capable version)
- need to check if (and why) AHCI is disabled by default 
  - assuming no fundamental issues, could be enabled and become an
experimental new 0.14 feature

Summer of code 2011
- http://wiki.qemu.org/Google_Summer_of_Code_2011
- update wiki page with project ideas (let Anthony or Luiz know if you
  want to be a mentor)
- application is due at end of the month
- mentors...be prepared that projects may take longer than just the
  summer of code to complete
- join #qemu-gsoc on OFTC for gsoc discussions

Going to FOSDEM?  agraf will be there...


KVM call agenda for Jan 25

2011-01-24 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


Re: KVM call agenda for Jan 18

2011-01-18 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

No agenda, this week's call is cancelled.

thanks,
-chris


KVM call agenda for Jan 18

2011-01-17 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


KVM call minutes for Jan 11

2011-01-11 Thread Chris Wright
KVM Forum 2011
- expand the scope? yes, continue up the stack
- how long?  2 days (maybe 2 1/2 - 3 space permitting)
- where?  Vancouver with LinuxCon

Spice guest agent:
- virt agent, matahari, spice agent...what is in spice agent?
- spice char device
  - mouse, copy 'n paste, screen resolution change
- could be generic (at least input and copy/paste)
  - send protocol details of what is being sent
- need to look at how difficult it is to split it out from spice
  (how to split out in qemu vs. libspice)
- goal to converge on common framework
- more discussion on char device vs. protocol
  - eg. mouse_set breaks if mouse channel is part pv and part spice specific
- Alon will send link to protocol and try to propose new interfaces

migration and block devices:
- need to invalidate data after first read on target,
  because it can be stale
- close + reopen is what was done for NFS
- iscsi: can issue ioctl(BLKFLSBUF) to flush, but it's CAP_SYS_ADMIN only
- O_DIRECT to avoid cache (concerns that it's not guaranteed)
- agree change the default (cache=none for 

qemu patch queue is long:
- slow to return from break
- patience and more patch review will help make sure things are applied
  and don't fall through cracks


Re: [PATCH] device-assignment: chmod the rom file before opening read/write

2011-01-04 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 The PCI sysfs rom file is exposed read-only by default, but we need
 to write to it to enable and disable the ROM around the read.  When
 running as root, the code works fine as is, but when running
 de-privileged via libvirt, the fopen("r+") will fail if the file
 doesn't have owner write permissions.  libvirt already gives us
 ownership of the file, so we can toggle this around the short
 usage window ourselves.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com

Acked-by: Chris Wright chr...@redhat.com

 ---
 
  hw/device-assignment.c |   17 +++--
  1 files changed, 11 insertions(+), 6 deletions(-)
 
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 8446cd4..da0a4d7 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1866,16 +1866,18 @@ static void 
 assigned_dev_load_option_rom(AssignedDevice *dev)
  return;
  }
  
 -if (access(rom_file, F_OK)) {
 -fprintf(stderr, "pci-assign: Insufficient privileges for %s\n",
 -rom_file);
 +/* The ROM file is typically mode 0400, ensure that it's at least 0600
 + * for the following fopen to succeed when qemu is de-privileged. */
 +if (chmod(rom_file, (st.st_mode & ALLPERMS) | S_IRUSR | S_IWUSR)) {
 +fprintf(stderr, "pci-assign: Insufficient privileges for %s (%s)\n",
 +rom_file, strerror(errno));
  return;
  }
  
  /* Write 1 to the ROM file to enable it */
  fp = fopen(rom_file, "r+");
  if (fp == NULL) {
 -return;
 +goto restore_rom;
  }
  val = 1;
  if (fwrite(&val, 1, 1, fp) != 1) {
 @@ -1895,17 +1897,20 @@ static void 
 assigned_dev_load_option_rom(AssignedDevice *dev)
  "or load from file with romfile=\n", rom_file);
  qemu_ram_free(dev->dev.rom_offset);
  dev->dev.rom_offset = 0;
 -goto close_rom;
 +goto disable_rom;
  }
  
  pci_register_bar(&dev->dev, PCI_ROM_SLOT,
   st.st_size, 0, pci_map_option_rom);
 -close_rom:
 +disable_rom:
  /* Write 0 to disable ROM */
  fseek(fp, 0, SEEK_SET);
  val = 0;
  if (!fwrite(&val, 1, 1, fp)) {

Nitpick...could you unify this? (!= 1, like the enabling write check)

  DEBUG("%s\n", "Failed to disable pci-sysfs rom file");
  }
 +close_rom:
  fclose(fp);
 +restore_rom:
 +chmod(rom_file, st.st_mode & ALLPERMS);
  }
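
The pattern in this patch (widen the owner permissions, do the short read/write, then put the original mode back) can be sketched in userspace Python. This is an illustration on an ordinary file, not the qemu code: `write_flag_with_temp_perms` is a hypothetical helper, and it assumes the caller owns the file, as the patch description says libvirt arranges.

```python
import os
import stat


def write_flag_with_temp_perms(path, flag):
    """Temporarily add owner read/write, write one byte, restore the mode."""
    original_mode = stat.S_IMODE(os.stat(path).st_mode)
    # Ensure at least 0600 so opening for update works when de-privileged.
    os.chmod(path, original_mode | stat.S_IRUSR | stat.S_IWUSR)
    try:
        with open(path, "r+b") as fp:
            fp.write(bytes([flag]))
    finally:
        # Always restore the original permissions, even if the write fails.
        os.chmod(path, original_mode)
```

The `finally` block plays the role of the `restore_rom` label: every exit path after the first chmod ends by restoring the saved mode.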
 


Re: Query on IOMMU

2010-12-22 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
 I have few (may be stupid) questions on this 
 
  From: Chris Wright [chr...@sous-sol.org]
  That's the issue.  The IOMMU has a set of page tables for each DeviceID.
  For most devices, the DeviceID is the same as the Bus:Dev.Func (the PCI
  address) of the device.  But this does not always work.  One example is
  when a device is behind a PCI-to-PCI Bridge.  In that case, the device
  memory read/write requests (attempts to DMA) will appear as if they came
  from the bridge.
 
 Oh I see, I can understand this part.
 
  
  00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
  Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
  
  That's the bridge that sits between your e100 and the IOMMU.
 
 Can you please explain how did you make out the device 01:05:0 is behind the 
 bridge?
 01:05.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 
 (rev 0c)

A PCI bridge has config space that states what busses are behind it.
The bridge at 00:14.4 is a bridge between bus 0 and bus 1, you can tell
from this line:

 Bus: primary=00, secondary=01, subordinate=01, sec-latency=64

There are no other devices behind that bridge (so theoretically you could
safely use it for device assignment).

 If you can explain this, I will try to find if the other network card
 also sits behind the bridge or not.

The other network interface card you have (03:00.0) is a PCIe device,
its upstream is the PCIe port.  It should not have the aliasing issue,
and should work.

00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express 
gpp port F)
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.  RTL8111/8168B PCI 
Express Gigabit Ethernet controller

 I would like to know the same thing
 for the PCIe GPU card connected to my machine. If GPU card is also sitting
 behind the bridge then the hardware may be useless for the project. :(

The GPU is also in a PCIe port, here:

00:02.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express 
gpp port B)
Bus: primary=00, secondary=06, subordinate=06, sec-latency=0

06:00.0 VGA compatible controller: nVidia Corporation G86 [Quadro NVS 290]

 Please explain how to find out this information.

Using lspci -t you can see the topology pretty easily.  Otherwise you
can sift through lspci output to find the topology.

thanks,
-chris


Re: Query on IOMMU

2010-12-22 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
 Is the answer 
 
 All PCI buses located behind a PCI-PCI bridge must reside between the 
 secondary bus number and the subordinate bus number (inclusive).
 
 00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
   Bus: primary=00, secondary=01, subordinate=01, 
 sec-latency=64
 
 So all the PCI devices between secondary (01) and subordinate (01) (in this 
 case same) are behind the PCI Bridge. Correct me if I am wrong.

That's correct.  You'll find secondary < subordinate when there's
another bridge downstream.

 01:05.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 
 (rev 0c)
 As Bus ID is 01 this ethernet controller is behind the PCI Bridge

Yup.

 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B 
 PCI Express Gigabit Ethernet controller (rev 06)
 As Bus: 03, I can assume this is not behind the PCI Bridge
 
 But if subordinate would have been, say 03 or 04, then even this ethernet 
 card (03:00:0) would be behind the PCI Bridge.
 
 Am I correct?

That's right.
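
The rule confirmed above can be written down as a short check. The following Python sketch parses the `Bus:` line that lspci prints for a bridge and tests whether a device's bus number falls in the inclusive [secondary, subordinate] range; the function names are hypothetical and the parsing assumes the line format quoted in this thread.

```python
import re


def bridge_bus_range(bus_line):
    """Return (secondary, subordinate) from an lspci bridge 'Bus:' line."""
    m = re.search(r"secondary=([0-9a-fA-F]+), subordinate=([0-9a-fA-F]+)", bus_line)
    if m is None:
        raise ValueError("not a bridge Bus: line")
    return int(m.group(1), 16), int(m.group(2), 16)


def is_behind(device_addr, bus_line):
    """True if the device's bus lies in [secondary, subordinate]."""
    # Accept both "01:05.0" and "0000:01:05.0": the bus is the field
    # immediately before "device.function".
    bus = int(device_addr.split(":")[-2], 16)
    secondary, subordinate = bridge_bus_range(bus_line)
    return secondary <= bus <= subordinate


sbx00 = "Bus: primary=00, secondary=01, subordinate=01, sec-latency=64"
print(is_behind("01:05.0", sbx00), is_behind("03:00.0", sbx00))
```

Note that this only says which bridge a device sits behind; whether that matters for the IOMMU depends on whether it is a conventional PCI bridge or a PCIe port, as discussed in the other replies.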

thanks,
-chris


Re: Query on IOMMU

2010-12-22 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
  From: Chris Wright [chr...@sous-sol.org]
  I would like to know the same thing
  for the PCIe GPU card connected to my machine. If GPU card is also sitting
  behind the bridge then the hardware may be useless for the project. :(
 
  The GPU is also in a PCIe port, here:
 
  00:02.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI 
  express gpp port B)
 Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
 
  06:00.0 VGA compatible controller: nVidia Corporation G86 [Quadro NVS 290]
 
 As the secondary and subordinate are 06, it means GPU pass through won't work.

No, it just means there are no more bridges behind 00:02.0.  The GPU is
a PCIe device in a PCIe port (which happens to look a lot like a
bridge).  So, while GPU assignment has some tricky issues, I don't think
you'll be stopped by the IOMMU.

  Please explain how to find out this information.
 
  Using lspci -t you can see the topology pretty easily.  Otherwise you
  can sift through lspci output to find the topology.
 
 Thanks a lot Chris for explaining everything.

You're welcome.  Good luck with your project.

thanks,
-chris


Re: KVM call agenda for Dec 21

2010-12-21 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

No agenda, today's call is cancelled.

Also, given people's holiday and vacation schedules, next week's call is
cancelled.  Talk again after the New Year.

thanks,
-chris


Re: Query on IOMMU

2010-12-21 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
 I am facing a problem with enabling the IOMMU.
 
 Dec 21 15:50:57 prasad-kvm kernel: [0.00] Aperture pointing to e820 
 RAM. Ignoring.
 Dec 21 15:50:57 prasad-kvm kernel: [0.00] Your BIOS doesn't leave a 
 aperture memory hole
 Dec 21 15:50:57 prasad-kvm kernel: [0.00] Please enable the IOMMU 
 option in the BIOS setup
 
 Dec 21 15:50:57 prasad-kvm kernel: [2.790913] pci :01:05.0: Firmware 
 left e100 interrupts enabled; disabling
 Dec 21 15:50:57 prasad-kvm kernel: [2.791941] pci :00:00.2: PCI INT A 
 - GSI 55 (level, low) - IRQ 55
 Dec 21 15:50:57 prasad-kvm kernel: [2.792775] AMD-Vi: Enabling IOMMU at 
 :00:00.2 cap 0x40
 Dec 21 15:50:57 prasad-kvm kernel: [2.800989] AMD-Vi: Lazy IO/TLB 
 flushing enabled
 
 
 I have enabled IOMMU in the BIOS, but I am not sure why it is still asking to 
 enabled IOMMU in BIOS. Do I need to worry about this?

It's unfortunate wording.  It's telling you that the GART is missing,
which is fine because you have an IOMMU.

 Besides I don't see the DMAR message similar to the one mentioned on the link 
 http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM

That wiki page is specific to Intel VT-d.  You have an AMD box with IOMMU,
so all looks fine.

Are you interested in using the IOMMU to do direct PCI device assignment
to a guest?

thanks,
-chris


Re: Query on IOMMU

2010-12-21 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
  From: Chris Wright [chr...@sous-sol.org]
 
  I have enabled IOMMU in the BIOS, but I am not sure why it is still asking 
  to enabled IOMMU in BIOS. Do I need to worry about this?
 
  It's unfortunate wording.  It's telling you that the GART is missing,
  which is fine because you have an IOMMU.
 
  Besides I don't see the DMAR message similar to the one mentioned on the 
  link
  http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM
 
  That wiki page is specific to Intel VT-d.  You have an AMD box with IOMMU,
  so all looks fine.
 
 Yes I am using AMD processor and ASUS motherboard. Both of them have the 
 IOMMU support, atleast it is mentioned on the Xen VT-d

Looks like we need some additional info in the wiki.  Care to create an
account and add the info?

  Are you interested in using the IOMMU to do direct PCI device assignment
  to a guest?
 
 Thanks a lot for your reply. Yes I am interested in working on GPU 
 pass-through to Virtual Machine. But for now I am trying to pass-through a 
 network card to VM.
 
 r...@prasad-kvm:~/VMDisks# qemu-system-x86_64 -hda Ubuntu-10.10-amd64.img -m 
 1024M -device pci-assign,host=01:05.0
 Failed to assign device (null) : Device or resource busy
 *** The driver 'pci-stub' is occupying your device 0000:01:05.0.
 ***
 *** You can try the following commands to free it:
 ***
 *** $ echo 8086 1229 > /sys/bus/pci/drivers/pci-stub/new_id
 *** $ echo 0000:01:05.0 > /sys/bus/pci/drivers/pci-stub/unbind
 *** $ echo 0000:01:05.0 > /sys/bus/pci/drivers/pci-stub/bind
 *** $ echo 8086 1229 > /sys/bus/pci/drivers/pci-stub/remove_id
 ***

Heh, this error is a little odd.  It's telling you the pci-stub
driver already has this device.  Then it's telling you to unbind it
from pci-stub, and bind it to pci-stub.  That error message is meant to
tell you that the real host driver (in your case e100) has the device,
unbind from it, and bind to pci-stub.
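
The intended sequence (register the ID with pci-stub, unbind from the real host driver, bind to pci-stub, drop the ID again) can be spelled out as a list of sysfs writes. This Python sketch only builds and prints the commands, it does not touch sysfs; `stub_rebind_plan` is a hypothetical helper, and it assumes the device address is written with its PCI domain prefix (0000:), which this archive appears to have dropped from the transcript.

```python
def stub_rebind_plan(vendor, device, addr, host_driver):
    """Return (sysfs_path, value) writes that move a device to pci-stub."""
    base = "/sys/bus/pci/drivers"
    ids = f"{vendor} {device}"
    return [
        (f"{base}/pci-stub/new_id", ids),        # let pci-stub claim this ID
        (f"{base}/{host_driver}/unbind", addr),  # release from the host driver
        (f"{base}/pci-stub/bind", addr),         # hand the device to pci-stub
        (f"{base}/pci-stub/remove_id", ids),     # stop claiming further devices
    ]


for path, value in stub_rebind_plan("8086", "1229", "0000:01:05.0", "e100"):
    print(f"echo {value} > {path}")
```

The only difference from the misleading error text is the second step, which names the host driver (e100 here) instead of pci-stub.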

 qemu-system-x86_64: -device pci-assign,host=01:05.0: Device 'pci-assign' 
 could not be initialized
 r...@prasad-kvm:~/VMDisks# echo 8086 1229 > /sys/bus/pci/drivers/pci-stub/new_id
 r...@prasad-kvm:~/VMDisks# echo 0000:01:05.0 > /sys/bus/pci/drivers/pci-stub/unbind
 r...@prasad-kvm:~/VMDisks# echo 0000:01:05.0 > /sys/bus/pci/drivers/pci-stub/bind
 r...@prasad-kvm:~/VMDisks# echo 8086 1229 > /sys/bus/pci/drivers/pci-stub/remove_id
 r...@prasad-kvm:~/VMDisks# qemu-system-x86_64 -hda Ubuntu-10.10-amd64.img -m 1024M -device pci-assign,host=01:05.0
 Failed to assign device (null) : Device or resource busy
 *** The driver 'pci-stub' is occupying your device 0000:01:05.0.
 
 
 [  605.015852] e100 :01:05.0: BAR 0: can't reserve [mem 
 0xf9cff000-0xf9cf]
 [  605.015855] kvm_vm_ioctl_assign_device: Could not get access to device 
 regions

This is what is returning -EBUSY and triggering the error message.

 [  667.410228] e100 :01:05.0: PCI INT A disabled
 [  700.500278] pci-stub: invalid id string 
 [  707.730636] pci-stub :01:05.0: claimed by stub
 [  734.755491] pci-stub :01:05.0: PCI INT A - GSI 20 (level, low) - IRQ 
 20
 [  734.790077] pci-stub :01:05.0: restoring config space at offset 0xf 
 (was 0x38080100, writing 0x3808010b)
 [  734.790095] pci-stub :01:05.0: restoring config space at offset 0xc 
 (was 0x0, writing 0xf9ce)
 [  734.790113] pci-stub :01:05.0: restoring config space at offset 0x6 
 (was 0x0, writing 0xf9cc)
 [  734.790123] pci-stub :01:05.0: restoring config space at offset 0x5 
 (was 0x1, writing 0xac01)
 [  734.790132] pci-stub :01:05.0: restoring config space at offset 0x4 
 (was 0x0, writing 0xf9cff000)
 [  734.790142] pci-stub :01:05.0: restoring config space at offset 0x3 
 (was 0x0, writing 0x4010)
 [  734.790153] pci-stub :01:05.0: restoring config space at offset 0x1 
 (was 0x290, writing 0x2900113)
 [  735.173647] assign device 0:1:5.0 failed
 [  735.173688] pci-stub :01:05.0: PCI INT A disabled
 [  768.850519] pci-stub :01:05.0: claimed by stub
 [  775.855376] pci-stub :01:05.0: PCI INT A - GSI 20 (level, low) - IRQ 
 20
 [  775.890080] pci-stub :01:05.0: restoring config space at offset 0xf 
 (was 0x38080100, writing 0x3808010b)
 [  775.890097] pci-stub :01:05.0: restoring config space at offset 0xc 
 (was 0x0, writing 0xf9ce)
 [  775.890115] pci-stub :01:05.0: restoring config space at offset 0x6 
 (was 0x0, writing 0xf9cc)
 [  775.890126] pci-stub :01:05.0: restoring config space at offset 0x5 
 (was 0x1, writing 0xac01)
 [  775.890135] pci-stub :01:05.0: restoring config space at offset 0x4 
 (was 0x0, writing 0xf9cff000)
 [  775.890144] pci-stub :01:05.0: restoring config space at offset 0x3 
 (was 0x0, writing 0x4010)
 [  775.890155] pci-stub :01:05.0: restoring config space at offset 0x1 
 (was 0x290, writing 0x2900113)
 [  776.275188] assign device 0:1:5.0 failed
 [  776.275230] pci-stub :01:05.0: PCI INT

Re: Query on IOMMU

2010-12-21 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
  From: kvm-ow...@vger.kernel.org [kvm-ow...@vger.kernel.org] on behalf of 
  Chris Wright [chr...@sous-sol.org]
  Yes I am using AMD processor and ASUS motherboard. Both of them have the 
  IOMMU support, atleast it is mentioned on the Xen VT-d
 
  Looks like we need some additional info in the wiki.  Care to create an
  account and add the info?
 
 Sure I would love to.

Thanks, you can use the VT-d portion as an example.

The useful dmesg info will be AMD-Vi: messages, the important line
is this one:

AMD-Vi: Enabling IOMMU at ...

(and if you boot with amd_iommu_dump you'll get extra debugging info)
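
A wiki example could show how to check for that line. The following is a minimal Python sketch that scans saved dmesg text for the AMD-Vi enable message; `amd_iommu_enabled` is a hypothetical helper, and the sample lines are the ones quoted earlier in this thread.

```python
def amd_iommu_enabled(dmesg_text):
    """True if any dmesg line reports the AMD IOMMU being enabled."""
    return any(
        "AMD-Vi: Enabling IOMMU at" in line
        for line in dmesg_text.splitlines()
    )


sample = (
    "[2.792775] AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40\n"
    "[2.800989] AMD-Vi: Lazy IO/TLB flushing enabled\n"
)
print(amd_iommu_enabled(sample))
```

On a live system the text would come from the output of dmesg rather than a string literal.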

  Thanks a lot for your reply. Yes I am interested in working on GPU 
  pass-through to Virtual Machine. But for now I am trying to pass-through a 
  network card to VM.

Great, GPU assignment has plenty of issues ;)

<snip>
 It still fails with the same error, here is the screen shot.
 
 r...@prasad-kvm:/sys# uptime 
  17:29:11 up 2 min,  3 users,  load average: 0.93, 0.52, 0.20
 
 r...@prasad-kvm:/sys# ls -l /sys/bus/pci/devices/0000:01:05.0/driver
 lrwxrwxrwx 1 root root 0 2010-12-21 17:26 /sys/bus/pci/devices/0000:01:05.0/driver -> ../../../../bus/pci/drivers/e100
 
 r...@prasad-kvm:/sys# lsmod | grep pci_stub
 
 r...@prasad-kvm:/sys# modprobe pci_stub
 
 r...@prasad-kvm:/sys# lsmod | grep pci_stub
 pci_stub                1590  0 
 
 r...@prasad-kvm:/sys# echo 8086 1229 > /sys/bus/pci/drivers/pci-stub/new_id
 
 r...@prasad-kvm:/sys# echo 0000:01:05.0 > /sys/bus/pci/drivers/e100/unbind
 
 r...@prasad-kvm:/sys# echo 0000:01:05.0 > /sys/bus/pci/drivers/pci-stub/bind
 
 r...@prasad-kvm:/sys# echo 8086 1229 > /sys/bus/pci/drivers/pci-stub/remove_id
 
 r...@prasad-kvm:/sys# ls -l /sys/bus/pci/devices/0000:01:05.0/driver
 lrwxrwxrwx 1 root root 0 2010-12-21 17:31 /sys/bus/pci/devices/0000:01:05.0/driver -> ../../../../bus/pci/drivers/pci-stub
 
 r...@prasad-kvm:~/VMDisks# modprobe kvm_amd
 
 r...@prasad-kvm:~/VMDisks# lsmod | grep -i kvm
 kvm_amd56416  0 
 kvm   348987  1 kvm_amd
 
 r...@prasad-kvm:~/VMDisks# qemu-system-x86_64 -hda Ubuntu-10.10-amd64.img -m 
 1024M -device pci-assign,host=01:05.0
 Failed to assign device (null) : Device or resource busy
 *** The driver 'pci-stub' is occupying your device 0000:01:05.0.
 ***
 *** You can try the following commands to free it:
 ***
 *** $ echo 8086 1229 > /sys/bus/pci/drivers/pci-stub/new_id
 *** $ echo 0000:01:05.0 > /sys/bus/pci/drivers/pci-stub/unbind
 *** $ echo 0000:01:05.0 > /sys/bus/pci/drivers/pci-stub/bind
 *** $ echo 8086 1229 > /sys/bus/pci/drivers/pci-stub/remove_id
 ***
 qemu-system-x86_64: -device pci-assign,host=01:05.0: Device 'pci-assign' 
 could not be initialized
 r...@prasad-kvm:~/VMDisks# echo $?
 1
 r...@prasad-kvm:~/VMDisks# 
 
 The VM does not boot.

Are you still seeing the same errors in dmesg?  Your first dmesg showed
that the e100 driver couldn't allocate BAR0:

e100 :01:05.0: BAR 0: can't reserve [mem 0xf9cff000-0xf9cf]

If the host driver can't, then kvm_vm_ioctl_assign_device() will fail as
well.  Seems as if there's a resource conflict on your machine.

Can you include a full dmesg, /proc/iomem, and lspci -vvv -?

thanks,
-chris


Re: Query on IOMMU

2010-12-21 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
 Besides, when I insert the pci_stub module, it emits a message:
 [   49.197112] pci-stub: invalid id string 
 I don't know why.

It's just a broken error message.  The commit b439b1d (PCI: pci-stub: add
pci_stub.ids parameter) created that.  I looked at it very briefly a few
weeks ago and didn't see the issue.  It's cosmetic, and not related to
the failure you are seeing.

thanks,
-chris


Re: Query on IOMMU

2010-12-21 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
  From: Chris Wright [chr...@sous-sol.org]
  Sent: 21 December 2010 19:29
  To: Prasad Joshi
  Cc: Chris Wright; kvm@vger.kernel.org; Tejun Heo
  Subject: Re: Query on IOMMU
 
  * Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
  Besides, when I insert the pci_stub module, it emits a message:
  [   49.197112] pci-stub: invalid id string 
  I don't know why.
 
  It's just a broken error message.  The commit b439b1d (PCI: pci-stub: add
  pci_stub.ids parameter) created that.  I looked at it very briefly a few
  weeks ago and didn't see the issue.  It's cosmetic, and not related to
  the failure you are seeing.
 
 Is it okay to add the following line to section 4, "unbind device from host
 kernel driver (example PCI device 01:00.0)"?
 
 * If the PCI Stub Driver is compiled as a module, then load the module using
 modprobe pci_stub.
 
 When I compiled the kernel I selected it as a kernel module. As the driver
 was not loaded, I could not see the entries in the /sys file system. I could
 figure that out only after reading a few things. It would be good to add a
 note mentioning this fact.
 
 Let me know if I should add it or not.

Yes, that sounds fine.
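
As an illustration of what such a note is guarding against, here is a small
sketch of checking whether the pci-stub driver directory is visible before
writing to its new_id/bind files.  The helper name and the sysfs_root
parameter are hypothetical; on a real system the root would be /sys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return true if <sysfs_root>/bus/pci/drivers/<driver> exists as a directory.
 * sysfs_root is normally "/sys"; it is a parameter only so the check can be
 * exercised against a scratch directory.
 */
bool pci_driver_present(const char *sysfs_root, const char *driver)
{
    char path[256];
    struct stat st;
    int n;

    assert(sysfs_root != NULL && driver != NULL);
    n = snprintf(path, sizeof(path), "%s/bus/pci/drivers/%s", sysfs_root, driver);
    if (n < 0 || (size_t)n >= sizeof(path))
        return false;   /* path would not fit in the buffer */
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

If pci_driver_present("/sys", "pci-stub") is false, the driver is neither
built in nor loaded, and modprobe pci_stub is the missing step.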

thanks,
-chris


Re: Query on IOMMU

2010-12-21 Thread Chris Wright
* Prasad Joshi (p.g.jo...@student.reading.ac.uk) wrote:
 The following condition from __attach_device() returns the error.
 static int __attach_device(struct device *dev,
                            struct protection_domain *domain)
 {
 ...
         if (alias_data->domain != NULL &&
             alias_data->domain != domain)
                 goto out_unlock;
 ...
 }

That's the issue.  The IOMMU has a set of page tables for each DeviceID.
For most devices, the DeviceID is the same as the Bus:Dev.Func (the PCI
address) of the device.  But this does not always work.  One example is
when a device is behind a PCI-to-PCI Bridge.  In that case, the device
memory read/write requests (attempts to DMA) will appear as if they came
from the bridge.

00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64

That's the bridge that sits between your e100 and the IOMMU.

thanks,
-chris


KVM call agenda for Dec 21

2010-12-20 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


Re: KVM call agenda for Dec 14

2010-12-14 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.

No agenda, today's call is cancelled.


Re: [Qemu-devel] KVM call agenda for Dec 14

2010-12-14 Thread Chris Wright
* Jes Sorensen (jes.soren...@redhat.com) wrote:
 Any chance you could fix your cronjob to send out the CFA a day earlier?
 15 hrs before is a bit short notice.

Sure.


KVM call agenda for Dec 14

2010-12-13 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


Re: [PATCH] intel-iommu: Fix use after release during device attach

2010-12-10 Thread Chris Wright
* Jan Kiszka (jan.kis...@siemens.com) wrote:
  --- a/drivers/pci/intel-iommu.c
  +++ b/drivers/pci/intel-iommu.c
  @@ -3627,9 +3627,9 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 
                  pte = dmar_domain->pgd;
                  if (dma_pte_present(pte)) {
  -                       free_pgtable_page(dmar_domain->pgd);
                          dmar_domain->pgd = (struct dma_pte *)
                                  phys_to_virt(dma_pte_addr(pte));

While here, might as well remove the unnecessary cast.

  +                       free_pgtable_page(pte);
                  }
                  dmar_domain->agaw--;
          }
 
  Reviewed-by: Sheng Yang <sh...@linux.intel.com>

Acked-by: Chris Wright <chr...@sous-sol.org>

  CC iommu mailing list and David.
  
  Ping...
  
  I think this fix also qualifies for stable (.35 and .36).
  
 
 Still not merged?

David, do you plan to pick this one up?
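
For anyone skimming the thread, the ordering problem the patch fixes can be
shown in a few lines of userspace C.  This is only a sketch: struct level and
drop_top_level() are invented stand-ins for the page-table page and the loop
body, using malloc/free instead of the kernel's page allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for one page-table level: each level points at the next lower one. */
struct level {
    struct level *next;
};

/*
 * Drop the top level of the table and return the new top.
 * The next-level pointer is read BEFORE the old top is freed; reading it
 * afterwards would be a use after free.
 */
struct level *drop_top_level(struct level *top)
{
    struct level *next;

    assert(top != NULL);
    next = top->next;   /* 1. read what we still need from the old page */
    free(top);          /* 2. only then release it */
    return next;
}
```

The unpatched code did step 2 first and then derived the new pgd from the
page it had just released.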

thanks,
-chris


Re: [Qemu-devel] KVM call agenda for Dec 7

2010-12-07 Thread Chris Wright
* Jes Sorensen (jes.soren...@redhat.com) wrote:
 On 12/07/10 00:51, Chris Wright wrote:
  Please send in any agenda items you are interested in covering.
  
  thanks,
  -chris
  
 
 No agenda, no replies
 
 Call canceled I presume?

Indeed, next week, then pick up next year...


KVM call agenda for Dec 7

2010-12-06 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris


Re: [RFC PATCH 1/3] kvm: keep track of which task is running a KVM vcpu

2010-12-03 Thread Chris Wright
* Rik van Riel (r...@redhat.com) wrote:
 On 12/02/2010 08:18 PM, Chris Wright wrote:
 * Rik van Riel (r...@redhat.com) wrote:
 Keep track of which task is running a KVM vcpu.  This helps us
 figure out later what task to wake up if we want to boost a
 vcpu that got preempted.
 
 Unfortunately there are no guarantees that the same task
 always keeps the same vcpu, so we can only track the task
 across a single run of the vcpu.
 
 So shouldn't it be confined to KVM_RUN?  The other vcpu_load callers aren't
 always a vcpu in a useful runnable state.
 
 Yeah, probably.  If you want I can move the setting of
 vcpu->task to kvm_vcpu_ioctl.

Or maybe setting in sched_out and unsetting in sched_in.


Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v2)

2010-12-03 Thread Chris Wright
* Srivatsa Vaddagiri (va...@linux.vnet.ibm.com) wrote:
 On Thu, Dec 02, 2010 at 11:14:16AM -0800, Chris Wright wrote:
  Perhaps it should be a VM level option.  And then invert the notion.
  Create one idle domain w/out hlt trap.  Give that VM a vcpu per pcpu
  (pin in place probably).  And have that VM do nothing other than hlt.
  Then it's always runnable according to scheduler, and can consume the
  extra work that CFS wants to give away.
 
 That's not sufficient. Let's say we have 3 guests A, B, C that need to be
 rate limited to 25% on a single cpu system. We create this idle guest
 D that is a 100% cpu hog as per the above definition. Now when one of the
 guests is idle, what ensures that the idle cycles of A are given only
 to D and not partly to B/C?

Yeah, I pictured priorities handling this.


Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v2)

2010-12-03 Thread Chris Wright
* Srivatsa Vaddagiri (va...@linux.vnet.ibm.com) wrote:
 On Fri, Dec 03, 2010 at 05:27:52PM +0530, Srivatsa Vaddagiri wrote:
  On Thu, Dec 02, 2010 at 11:14:16AM -0800, Chris Wright wrote:
   Perhaps it should be a VM level option.  And then invert the notion.
   Create one idle domain w/out hlt trap.  Give that VM a vcpu per pcpu
   (pin in place probably).  And have that VM do nothing other than hlt.
   Then it's always runnable according to scheduler, and can consume the
   extra work that CFS wants to give away.
  
  That's not sufficient. Let's say we have 3 guests A, B, C that need to be
  rate limited to 25% on a single cpu system. We create this idle guest D
  that is a 100% cpu hog as per the above definition. Now when one of the
  guests is idle, what ensures that the idle cycles of A are given only to D
  and not partly to B/C?
 
 To tackle this problem, I was thinking of having a fill-thread associated
 with each vcpu (i.e. both belong to the same cgroup). The fill-thread
 consumes idle cycles left by the vcpu, but otherwise doesn't compete with it
 for cycles.

That's what Marcelo's suggestion does w/out a fill thread.


Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v2)

2010-12-03 Thread Chris Wright
* Srivatsa Vaddagiri (va...@linux.vnet.ibm.com) wrote:
 On Fri, Dec 03, 2010 at 09:28:25AM -0800, Chris Wright wrote:
  * Srivatsa Vaddagiri (va...@linux.vnet.ibm.com) wrote:
   On Thu, Dec 02, 2010 at 11:14:16AM -0800, Chris Wright wrote:
    Perhaps it should be a VM level option.  And then invert the notion.
    Create one idle domain w/out hlt trap.  Give that VM a vcpu per pcpu
    (pin in place probably).  And have that VM do nothing other than hlt.
    Then it's always runnable according to scheduler, and can consume the
    extra work that CFS wants to give away.
   
   That's not sufficient. Let's say we have 3 guests A, B, C that need to be
   rate limited to 25% on a single cpu system. We create this idle guest
   D that is a 100% cpu hog as per the above definition. Now when one of the
   guests is idle, what ensures that the idle cycles of A are given only
   to D and not partly to B/C?
  
  Yeah, I pictured priorities handling this.
 
 All guests are of equal priority in this case (that's how we are able to
 divide time into 25% chunks), so unless we dynamically boost D's priority
 based on how idle other VMs are, it's not going to be easy!

Right, I think there has to be an external mgmt entity.  Because num
vcpus is not static.  So priorities have to be rebalanced at vcpu
create/destroy time.

thanks,
-chris


Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v2)

2010-12-03 Thread Chris Wright
* Srivatsa Vaddagiri (va...@linux.vnet.ibm.com) wrote:
 On Fri, Dec 03, 2010 at 09:29:06AM -0800, Chris Wright wrote:
  That's what Marcelo's suggestion does w/out a fill thread.
 
 There's one complication though even with that. How do we compute the
 real utilization of VM (given that it will appear to be burning 100% cycles)?
 We need to have scheduler discount the cycles burnt post halt-exit, so more
 stuff is needed than those simple 3-4 lines!

Heh, was just about to say the same thing ;)


Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v2)

2010-12-03 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
 On 12/03/2010 11:58 AM, Chris Wright wrote:
 * Srivatsa Vaddagiri (va...@linux.vnet.ibm.com) wrote:
 On Fri, Dec 03, 2010 at 09:29:06AM -0800, Chris Wright wrote:
 That's what Marcelo's suggestion does w/out a fill thread.
 There's one complication though even with that. How do we compute the
 real utilization of VM (given that it will appear to be burning 100% cycles)?
 We need to have scheduler discount the cycles burnt post halt-exit, so more
 stuff is needed than those simple 3-4 lines!
 Heh, was just about to say the same thing ;)
 
 My first reaction is that it's not terribly important to account the
 non-idle time in the guest because of the use-case for this model.

Depends on the chargeback model.  This would put guest vcpu runtime vs
host running guest vcpu time really out of skew.  ('course w/out steal
and that time it's already out of skew).  But I think most models are
more uptime based rather than actual runtime now.

 Eventually, it might be nice to have idle time accounting but I
 don't see it as a critical feature here.
 
 Non-idle time simply isn't as meaningful here as it normally would
 be.  If you have 10 VMs in a normal environment and saw that you had
 only 50% CPU utilization, you might be inclined to add more VMs.

Who is "you"?  Cloud user, or cloud service provider's scheduler?
On the user side, 50% cpu utilization wouldn't trigger me to add new
VMs.  On the host side, 50% cpu utilization would have to be measured
solely in terms of guest vcpu count.

 But if you're offering deterministic execution, it doesn't matter if
 you only have 50% utilization.  If you add another VM, the guests
 will get exactly the same impact as if they were using 100%
 utilization.

Sorry, didn't follow here?

thanks,
-chris


Re: [RFC PATCH 2/3] sched: add yield_to function

2010-12-03 Thread Chris Wright
* Rik van Riel (r...@redhat.com) wrote:
 On 12/02/2010 07:50 PM, Chris Wright wrote:
 +/*
 + * Yield the CPU, giving the remainder of our time slice to task p.
 + * Typically used to hand CPU time to another thread inside the same
 + * process, eg. when p holds a resource other threads are waiting for.
 + * Giving priority to p may help get that resource released sooner.
 + */
 +void yield_to(struct task_struct *p)
 +{
 +       unsigned long flags;
 +       struct sched_entity *se = &p->se;
 +       struct rq *rq;
 +       struct cfs_rq *cfs_rq;
 +       u64 remain = slice_remain(current);
 +
 +       rq = task_rq_lock(p, &flags);
 +       if (task_running(rq, p) || task_has_rt_policy(p))
 +               goto out;
 +       cfs_rq = cfs_rq_of(se);
 +       se->vruntime -= remain;
 +       if (se->vruntime < cfs_rq->min_vruntime)
 +               se->vruntime = cfs_rq->min_vruntime;
 
 Should these details all be in sched_fair?  Seems like the wrong layer
 here.  And would that condition go the other way?  If new vruntime is
 smaller than min, then it becomes new cfs_rq->min_vruntime?
 
 That would be nice.  Unfortunately, EXPORT_SYMBOL() does
 not seem to work right from sched_fair.c, which is included
 from sched.c instead of being built from the makefile!

Add a ->yield_to() to properly isolate (only relevant then in
sched_fair)?


Re: [PATCH 2/5] pci: MSI-X capability is 12 bytes, not 16, MSI is 10 bytes

2010-12-03 Thread Chris Wright
* Alex Williamson (alex.william...@redhat.com) wrote:
 Signed-off-by: Alex Williamson <alex.william...@redhat.com>
 ---
 
  hw/pci.h |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/hw/pci.h b/hw/pci.h
 index 34955d8..7c52637 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -124,8 +124,8 @@ enum {
  
  #define PCI_CAPABILITY_CONFIG_MAX_LENGTH 0x60
  #define PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR 0x40
 -#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0x10
 -#define PCI_CAPABILITY_CONFIG_MSIX_LENGTH 0x10
 +#define PCI_CAPABILITY_CONFIG_MSI_LENGTH 0xa

This is variable length.

 +#define PCI_CAPABILITY_CONFIG_MSIX_LENGTH 0x0c
  
  typedef int (*msix_mask_notifier_func)(PCIDevice *, unsigned vector,
  int masked);
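
On the MSI length being variable: the size follows from two bits in the
Message Control word.  A minimal sketch, assuming the flag names from Linux's
pci_regs.h; msi_cap_length() itself is a hypothetical helper, not an existing
QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Message Control bits (names follow Linux's pci_regs.h). */
#define PCI_MSI_FLAGS_64BIT   0x0080  /* 64-bit message address capable */
#define PCI_MSI_FLAGS_MASKBIT 0x0100  /* per-vector masking capable */

/* Length in bytes of an MSI capability, derived from its Message Control word. */
uint8_t msi_cap_length(uint16_t msg_ctrl)
{
    uint8_t len = 10;   /* ID, next ptr, control, 32-bit address, data */

    if (msg_ctrl & PCI_MSI_FLAGS_64BIT)
        len += 4;       /* upper 32 bits of the message address */
    if (msg_ctrl & PCI_MSI_FLAGS_MASKBIT)
        len += 10;      /* 2 reserved bytes + mask bits + pending bits */

    /* The four layouts the two flags can produce. */
    assert(len == 10 || len == 14 || len == 20 || len == 24);
    return len;
}
```

So 0xa is only right for a 32-bit capability without per-vector masking; the
MSI-X capability, by contrast, is a fixed 12 bytes.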
 

