Re: [PATCH 0/15] KVM: optimize for MMIO handled

2011-06-08 Thread Xiao Guangrong
On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:

 Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
 network connect to the netperf server, the bandwidth of our network
 is 100M.

 
 I see the reason, thank you!
 
 I used virtio-net and you used e1000.
 You are using e1000 to see the MMIO performance change, right?
 

Hi Takuya,

I have now run the performance test with virtio-net: performance improves
only very slightly, and there is no *regression* ;-)

The reason is that virtio-net generates very little MMIO.

ept = 1:

Before patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    972.21
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    971.01
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    974.44
16384  87380 

After patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    973.45
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    973.63
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    976.25
16384  87380 

ept = 0, bypass_guest_pf=0:

Before patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    975.16
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    979.95
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    984.03
16384  87380 

After patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    974.30
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    976.33
16384  87380 

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    981.45
16384  87380 
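
To put rough numbers on "improved very little", the transaction rates above can be averaged per configuration. This is only a sketch: the rates are copied from the netperf runs in this mail, while the helper names `mean()` and `percent_change()` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be non-zero). */
double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative change from 'before' to 'after', in percent. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Transaction rates (per second) copied from the runs above. */
const double ept1_before[] = { 972.21, 971.01, 974.44 };
const double ept1_after[]  = { 973.45, 973.63, 976.25 };
const double ept0_before[] = { 975.16, 979.95, 984.03 };
const double ept0_after[]  = { 974.30, 976.33, 981.45 };
```

With these inputs the means are about 972.55 vs. 974.44 for ept = 1 (roughly +0.19%) and 979.71 vs. 977.36 for ept = 0 (roughly -0.24%). With only three runs each, both differences are smaller than the spread between individual runs.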

assigned EHCI USB headset not working

2011-06-08 Thread André Weidemann

Hi,
I am using the latest clone of the qemu-kvm git tree with kernel 2.6.35.7.

Since assigning PCI sound cards did not yield any usable results, I
assigned a USB headset to a Windows 7 VM.
I used the following two command-line options to enable the EHCI controller
inside the VM and to assign the device to it:


...
-device usb-ehci,id=ehci \
-device usb-host,vendorid=046d,productid=0a01,bus=ehci.0 \
...

Right after starting the VM I see the following output:
...
Booting from Hard Disk...
Booting from :7c00
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 3 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 3 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 3 interfaces claimed for configuration 1
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1


info usb on the monitor looks like this:

  Device 0.1, Port 1, Speed 1.5 Mb/s, Product Microsoft Wireless 
Desktop Rece

  Device 1.1, Port 1, Speed 480 Mb/s, Product Logitech USB Headset


The sound device shows up under Windows 7 and drivers are installed
automatically. Unfortunately, it does not work. None of the players I tried
even started playing the sound file, although they detected the
DirectSound device.


When connected to a natively running Windows7, the USB headset works 
the way it's supposed to.



Any help is greatly appreciated.

Regards
 André
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pci-assign: Do not reset the device unless the kernel supports it

2011-06-08 Thread Jan Kiszka
On 2011-06-07 20:46, Alex Williamson wrote:
 On Tue, 2011-06-07 at 10:14 +0200, Jan Kiszka wrote:
 On 2011-06-07 10:06, Avi Kivity wrote:
 On 06/07/2011 01:04 AM, Jan Kiszka wrote:
 On 2011-06-06 23:48, Alex Williamson wrote:
  On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
  From: Jan Kiszka <jan.kis...@siemens.com>

  At least kernels 2.6.38 and 2.6.39 do not properly support issuing a
  reset on an assigned device and corrupt its config space. Prevent
  this by checking for a host kernel with the required support,
 tagged by
  the to-be-introduced KVM_CAP_DEVICE_RESET.

  Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?  I
 guess
  we don't have an option to do that for .38 since stable is done there,
  but there are also some intel-iommu breakages that won't make
 stable for
  that release.  It seems like the userspace invoked reset resolves
 known,
  demonstrable issues of devices continuing to DMA into guest memory
 while
  ed78661f is mostly a theoretical change.

 Easier would be this patch. But I don't mind reverting the problematic
 commit in 39, whatever is preferred. We should just resolve the issue
 finally.

 Kernel problems should be solved in the kernel (with exceptions of
 course, but don't see the need here).

 Then please file a revert for stable ASAP.
 
 How's this?  For stable only of course.  Thanks,
 
 Alex
 
 Revert "KVM: Save/restore state of assigned PCI device"
 
 From: Alex Williamson <alex.william...@redhat.com>
 
 This reverts ed78661f2614d3c9f69c23e280db3bafdabdf5bb as it assumes
 the saved PCI state will remain valid for the entire length of time
 that it is attached to a guest.  This fails when userspace makes use
 of the pci-sysfs reset interface, which invalidates the saved device
 state, leaving nothing to be restored after the device is reset on
 de-assignment.  This leaves the device in an unusable state.
 
 3.0.0 will add an interface for KVM to save the PCI state in a

[ It will be called 3.0. :) ]

 buffer unaffected by other callers of pci_reset_function(), but the
 most appropriate stable fix seems to be reverting this change since
 the original assumption about the device saved state persisting is
 incorrect.
 
 Signed-off-by: Alex Williamson <alex.william...@redhat.com>
 ---
 
  virt/kvm/assigned-dev.c |    5 +----
  1 files changed, 1 insertions(+), 4 deletions(-)
 
 
 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index ae72ae6..e3f1235 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -197,8 +197,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
  {
   kvm_free_assigned_irq(kvm, assigned_dev);
  
 - __pci_reset_function(assigned_dev->dev);
 - pci_restore_state(assigned_dev->dev);
 + pci_reset_function(assigned_dev->dev);
  
   pci_release_regions(assigned_dev->dev);
   pci_disable_device(assigned_dev->dev);
 @@ -515,7 +514,6 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
   }
  
   pci_reset_function(dev);
 - pci_save_state(dev);
  
   match->assigned_dev_id = assigned_dev->assigned_dev_id;
   match->host_segnr = assigned_dev->segnr;
 @@ -546,7 +544,6 @@ out:
   mutex_unlock(&kvm->lock);
   return r;
  out_list_del:
 - pci_restore_state(dev);
   list_del(&match->list);
   pci_release_regions(dev);
  out_disable:
 
 
 

Acked-by: Jan Kiszka <jan.kis...@siemens.com>

Jan





Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

2011-06-08 Thread Alexander Graf

On 07.06.2011, at 15:00, Xiao Guangrong wrote:

 If the page fault is caused by mmio, we can cache the mmio info, later, we do
 not need to walk guest page table and quickly know it is a mmio fault while we
 emulate the mmio instruction
 
 Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
 ---
 arch/x86/include/asm/kvm_host.h |5 +++
 arch/x86/kvm/mmu.c  |   21 +--
 arch/x86/kvm/mmu.h  |   23 +
 arch/x86/kvm/paging_tmpl.h  |   21 ++-
 arch/x86/kvm/x86.c  |   52 ++
 arch/x86/kvm/x86.h  |   36 +++
 6 files changed, 126 insertions(+), 32 deletions(-)
 
 

[...]

 +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 +gpa_t *gpa, struct x86_exception *exception,
 +bool write)
 +{
 + u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 +
 + if (vcpu_match_mmio_gva(vcpu, gva) &&
 +   check_write_user_access(vcpu, write, access,
 +   vcpu->arch.access)) {
 + *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
 + (gva & (PAGE_SIZE - 1));
 + return 1;

Hrm. Let me try to understand what you're doing.

Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then we walk 
either the NPT or the guest PT to resolve the GPA to the fault and send off an 
MMIO.
Within that path, you remember the GVA->GPA mapping for the last MMIO request. 
If the next MMIO request is on the same GVA and kernel/user permissions still 
apply, you simply bypass the resolution. So far so good.

Now, what happens when the GVA is not identical to the GVA it was before? It's 
probably a purely theoretic case, but imagine the following:

  1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
  2) guest remaps page 0x1000 to GPA 0x2000
  3) guest issues MMIO on GVA 0x1000

That would break with your current implementation, right? It sounds pretty 
theoretic, but imagine the following:

  1) guest user space 1 maps MMIO region A to 0x1000
  2) guest user space 2 maps MMIO region B to 0x1000
  3) guest user space 1 issues MMIO on 0x1000
  4) context switch; going to user space 2
  5) user space 2 issues MMIO on 0x1000

That case could at least be identified by also comparing the guest's cr3 value 
during this hack. And considering things like UIO or microkernels, it's not too 
unlikely :).
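
To illustrate the cr3 comparison, here is a rough user-space sketch. It is not the patch's code: `struct mmio_cache`, `mmio_cache_fill()` and `mmio_cache_lookup()` are names made up for this example, and the cache is reduced to a single entry tagged with the guest's cr3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT 12
#define GUEST_PAGE_SIZE  (1ULL << GUEST_PAGE_SHIFT)

/* Hypothetical single-entry cache: last MMIO translation, tagged with cr3. */
struct mmio_cache {
    bool     valid;
    uint64_t cr3;   /* address space the entry belongs to */
    uint64_t gva;   /* page-aligned guest virtual address */
    uint64_t gfn;   /* guest frame number of the MMIO page */
};

/* Remember the translation of the last MMIO access. */
void mmio_cache_fill(struct mmio_cache *c, uint64_t cr3,
                     uint64_t gva, uint64_t gpa)
{
    c->valid = true;
    c->cr3 = cr3;
    c->gva = gva & ~(GUEST_PAGE_SIZE - 1);
    c->gfn = gpa >> GUEST_PAGE_SHIFT;
}

/* Hit only if both the page and the address space (cr3) match. */
bool mmio_cache_lookup(const struct mmio_cache *c, uint64_t cr3,
                       uint64_t gva, uint64_t *gpa)
{
    assert(gpa != NULL);
    if (!c->valid || c->cr3 != cr3 ||
        c->gva != (gva & ~(GUEST_PAGE_SIZE - 1)))
        return false;
    *gpa = (c->gfn << GUEST_PAGE_SHIFT) | (gva & (GUEST_PAGE_SIZE - 1));
    return true;
}
```

In this model, user space 2 running with a different cr3 misses the entry that user space 1 created for the same GVA, so it falls back to the normal page-table walk.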


Alex



Re: [PATCH 0/15] KVM: optimize for MMIO handled

2011-06-08 Thread Takuya Yoshikawa
On Wed, 08 Jun 2011 14:22:36 +0800
Xiao Guangrong <xiaoguangr...@cn.fujitsu.com> wrote:

 On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:
 
  Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
  network connect to the netperf server, the bandwidth of our network
  is 100M.
 
  
  I see the reason, thank you!
  
  I used virtio-net and you used e1000.
  You are using e1000 to see the MMIO performance change, right?
  
 
 Hi Takuya,
 
 Now, i have done the performance test for virtio-net, the performance is
 improved very little, and it is not *regression* ;-)
 
 The reason is, MMIO generated by virtio-net is very very little.
 

Yes, so I thought you had chosen e1000 for this test :)

Thanks,
  Takuya



Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

2011-06-08 Thread Xiao Guangrong
On 06/08/2011 04:22 PM, Alexander Graf wrote:

 +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 +   gpa_t *gpa, struct x86_exception *exception,
 +   bool write)
 +{
 +u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 +
 +if (vcpu_match_mmio_gva(vcpu, gva) &&
 +  check_write_user_access(vcpu, write, access,
 +  vcpu->arch.access)) {
 +*gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
 +(gva & (PAGE_SIZE - 1));
 +return 1;
 

Hi Alexander,

Thanks for your review!

 Hrm. Let me try to understand what you're doing.
 
 Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then we walk 
 either the NPT or the guest PT to resolve the GPA to the fault and send off 
 an MMIO.
 Within that path, you remember the GVA->GPA mapping for the last MMIO 
 request. If the next MMIO request is on the same GVA and kernel/user 
 permissions still apply, you simply bypass the resolution. So far so good.
 

This patch also introduces vcpu_clear_mmio_info(), which clears the MMIO cache
info on the vcpu; it is called whenever the guest flushes the TLB (CR3 reload
or INVLPG).

 Now, what happens when the GVA is not identical to the GVA it was before? 
 It's probably a purely theoretic case, but imagine the following:
 
   1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
   2) guest remaps page 0x1000 to GPA 0x2000
   3) guest issues MMIO on GVA 0x1000
 

If the guest modifies its page structures then, per the x86 TLB rules, it has
to flush the TLB to make sure the CPU uses the new mapping.

So when you remap GVA 0x1000 to GPA 0x2000, you have to flush the TLB; that
clears the MMIO cache info, and the later access is handled correctly.
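
The remap scenario can be replayed with a toy model. This is only a sketch under assumptions: `struct toy_vcpu` and all `toy_*` functions are invented for illustration, the guest has exactly one mapping, and `toy_clear_mmio_info()` merely stands in for what vcpu_clear_mmio_info() does on a flush.

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT 12
#define GUEST_PAGE_SIZE  (1ULL << GUEST_PAGE_SHIFT)

/* Toy vcpu: a single guest mapping plus the cached MMIO translation. */
struct toy_vcpu {
    uint64_t map_gva;    /* the one page-table entry this model knows */
    uint64_t map_gpa;
    uint64_t mmio_gva;   /* cached page; 0 means nothing is cached */
    uint64_t mmio_gfn;
    unsigned walks;      /* how often the slow path had to walk */
};

/* Stand-in for the clearing step: forget the cached translation. */
void toy_clear_mmio_info(struct toy_vcpu *v)
{
    v->mmio_gva = 0;
}

/* Guest rewrites its mapping and flushes (INVLPG or CR3 reload). */
void toy_remap_and_flush(struct toy_vcpu *v, uint64_t gva, uint64_t gpa)
{
    v->map_gva = gva & ~(GUEST_PAGE_SIZE - 1);
    v->map_gpa = gpa & ~(GUEST_PAGE_SIZE - 1);
    toy_clear_mmio_info(v);
}

/* Translate an MMIO access, taking the cached entry when it matches. */
uint64_t toy_mmio_gva_to_gpa(struct toy_vcpu *v, uint64_t gva)
{
    uint64_t page = gva & ~(GUEST_PAGE_SIZE - 1);
    uint64_t off  = gva & (GUEST_PAGE_SIZE - 1);

    if (v->mmio_gva && v->mmio_gva == page)
        return (v->mmio_gfn << GUEST_PAGE_SHIFT) | off;

    assert(page == v->map_gva);   /* the model covers one mapping only */
    v->walks++;                   /* simulated page-table walk */
    v->mmio_gva = page;
    v->mmio_gfn = v->map_gpa >> GUEST_PAGE_SHIFT;
    return v->map_gpa | off;
}
```

Two accesses to GVA 0x1000 cost one walk in this model; after the remap plus flush, the third access walks again and resolves to the new GPA instead of the stale one.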

 That would break with your current implementation, right? It sounds pretty 
 theoretic, but imagine the following:
 
   1) guest user space 1 maps MMIO region A to 0x1000
   2) guest user space 2 maps MMIO region B to 0x1000
   3) guest user space 1 issues MMIO on 0x1000
   4) context switch; going to user space 2
   5) user space 2 issues MMIO on 0x1000
 

Also, on a context switch CR3 is reloaded, so the MMIO cache info is cleared
as well, right? :-)


Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

2011-06-08 Thread Alexander Graf

On 08.06.2011, at 10:58, Xiao Guangrong wrote:

 On 06/08/2011 04:22 PM, Alexander Graf wrote:
 
 +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 +  gpa_t *gpa, struct x86_exception *exception,
 +  bool write)
 +{
 +   u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 +
 +   if (vcpu_match_mmio_gva(vcpu, gva) &&
 + check_write_user_access(vcpu, write, access,
 + vcpu->arch.access)) {
 +   *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
 +   (gva & (PAGE_SIZE - 1));
 +   return 1;
 
 
 Hi Alexander,
 
 Thanks for your review!
 
 Hrm. Let me try to understand what you're doing.
 
 Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then we walk 
 either the NPT or the guest PT to resolve the GPA to the fault and send off 
 an MMIO.
 Within that path, you remember the GVA->GPA mapping for the last MMIO 
 request. If the next MMIO request is on the same GVA and kernel/user 
 permissions still apply, you simply bypass the resolution. So far so good.
 
 
 In this patch, we also introduced vcpu_clear_mmio_info() that clears mmio 
 cache info on the vcpu,
 and it is called when guest flush tlb (reload CR3 or INVLPG). 

Ah, that one solved the SPT case then of course.

 
 Now, what happens when the GVA is not identical to the GVA it was before? 
 It's probably a purely theoretic case, but imagine the following:
 
  1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
  2) guest remaps page 0x1000 to GPA 0x2000
  3) guest issues MMIO on GVA 0x1000
 
 
 If guest modify the page structure, base on x86 tlb rules, we should flush 
 tlb to ensure the cpu use
 the new mapping.
 
 When you remap GVA 0x1000 to 0x2000, you should flush tlb, then mmio cache 
 info is cleared, so the later
 access is right.
 
 That would break with your current implementation, right? It sounds pretty 
 theoretic, but imagine the following:
 
  1) guest user space 1 maps MMIO region A to 0x1000
  2) guest user space 2 maps MMIO region B to 0x1000
  3) guest user space 1 issues MMIO on 0x1000
  4) context switch; going to user space 2
  5) user space 2 issues MMIO on 0x1000
 
 
 Also, when context switched, CR3 is reloaded, mmio cache info can be cleared 
 too. right? :-)

Only when using SPT. In the NPT case, you will never see cr3 getting reloaded 
or INVLPG :).


Alex



Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

2011-06-08 Thread Xiao Guangrong
On 06/08/2011 05:18 PM, Alexander Graf wrote:


 Also, when context switched, CR3 is reloaded, mmio cache info can be cleared 
 too. right? :-)
 
 Only when using SPT. In the NPT case, you will never see cr3 getting reloaded 
 or INVLPG :).
 

In the NPT case we only cache the GPA; the GVA is not cached
(vcpu.arch.mmio_gva is always 0) ;-)
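
A small sketch of why a zero mmio_gva is harmless. The field names follow this thread, but `struct toy_mmio_info`, the `toy_*` helpers and the `nested_paging` flag are assumptions made for illustration, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT 12
#define GUEST_PAGE_SIZE  (1ULL << GUEST_PAGE_SHIFT)

/* Toy model of the two cached fields discussed in this thread. */
struct toy_mmio_info {
    uint64_t mmio_gva;  /* 0 when no GVA is cached (always so with NPT) */
    uint64_t mmio_gfn;  /* guest frame number of the last MMIO page */
};

/* Cache an MMIO access; with nested paging the GVA is not recorded. */
void toy_cache_mmio(struct toy_mmio_info *m, bool nested_paging,
                    uint64_t gva, uint64_t gpa)
{
    assert(m != NULL);
    m->mmio_gva = nested_paging ? 0 : (gva & ~(GUEST_PAGE_SIZE - 1));
    m->mmio_gfn = gpa >> GUEST_PAGE_SHIFT;
}

/* A zero mmio_gva never matches, so no stale GVA hit is possible. */
bool toy_match_gva(const struct toy_mmio_info *m, uint64_t gva)
{
    return m->mmio_gva && m->mmio_gva == (gva & ~(GUEST_PAGE_SIZE - 1));
}

/* Matching by GPA does not depend on the guest's address space. */
bool toy_match_gpa(const struct toy_mmio_info *m, uint64_t gpa)
{
    return m->mmio_gfn && m->mmio_gfn == (gpa >> GUEST_PAGE_SHIFT);
}
```

With nested paging the GVA check always fails in this model, so only the GPA match can short-cut the fault, and a GPA means the same thing regardless of which guest process is running.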


Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

2011-06-08 Thread Alexander Graf

On 08.06.2011, at 11:33, Xiao Guangrong wrote:

 On 06/08/2011 05:18 PM, Alexander Graf wrote:
 
 
 Also, when context switched, CR3 is reloaded, mmio cache info can be 
 cleared too. right? :-)
 
 Only when using SPT. In the NPT case, you will never see cr3 getting 
 reloaded or INVLPG :).
 
 
 In the NPT case, we only cache the GPA, GVA is not cached (vcpu.arch.mmio_gva 
 is always 0) ;-)

Ah, very nice! :)


Alex



Re: assigned EHCI USB headset not working

2011-06-08 Thread Gerd Hoffmann

  Hi,


The sound device shows up under Windows7 and drivers are installed
automatically. Unfortunately it does not work. All the players I tried,
did not even start playing the sound file, although they detected the
DirectSound Device.


Isochronous transfers from usb-linux via EHCI are flaky for reasons not yet
tracked down.


Any reason why you don't just plug in a virtual sound card?  The HDA 
emulation should work fine with win7.


cheers,
  Gerd



Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-08 Thread Mark Wu
On 06/01/2011 07:57 PM, Rusty Russell wrote:
 On Wed,  1 Jun 2011 03:24:29 -0400, Mark Wu <d...@redhat.com> wrote:
 Current index allocation in virtio-blk is based on a monotonically
  increasing variable index. It could cause some confusion about 
 disk name in the case of hot-plugging disks. And it's impossible
 to find the lowest available index by just maintaining a simple
 index. So it's changed to use ida to allocate index via referring
 to the index allocation in scsi disk.
 
 Signed-off-by: Mark Wu <d...@redhat.com>
 
 Hi Mark,
 
 I don't believe that we do disk probes in parallel, so the spinlock 
 is unnecessary.  Otherwise, this looks good.
 
 Thanks, Rusty.
Hi Rusty,
Yes, I can't think of a case where disks are probed in parallel either, but
going by the following commit, I think we still need to take the lock for
safety. What's your opinion?

commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
Author: Tejun Heo <t...@kernel.org>
Date:   Sat Feb 21 11:04:45 2009 +0900

[SCSI] sd: revive sd_index_lock

Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
use ida instead of idr incorrectly removed sd_index_lock around id
allocation and free.  idr/ida do have internal locks but they protect
their free object lists not the allocation itself.  The caller is
responsible for that.  This missing synchronization led to the same id
being assigned to multiple devices leading to oops.
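
The point of that commit can be mimicked in user space. The following is only a sketch under assumptions: a 64-bit bitmap stands in for the ida, a C11 `atomic_flag` stands in for the spinlock, and every `toy_*` name is invented for this example rather than taken from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_MAX_IDS 64

/* Toy stand-ins for vd_index_lock and vd_index_ida. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
static uint64_t toy_used;   /* bit n set => index n is in use */

static void toy_spin_lock(void)
{
    while (atomic_flag_test_and_set(&toy_lock))
        ;   /* spin until the flag was clear */
}

static void toy_spin_unlock(void)
{
    atomic_flag_clear(&toy_lock);
}

/* Allocate the lowest free index, or return -1 if none is left. */
int toy_index_get(void)
{
    int id = -1;

    toy_spin_lock();
    for (int i = 0; i < TOY_MAX_IDS; i++) {
        if (!(toy_used & (1ULL << i))) {
            toy_used |= 1ULL << i;   /* test and set under the lock */
            id = i;
            break;
        }
    }
    toy_spin_unlock();
    return id;
}

/* Release an index so that a later allocation can reuse it. */
void toy_index_put(int id)
{
    assert(id >= 0 && id < TOY_MAX_IDS);
    toy_spin_lock();
    toy_used &= ~(1ULL << id);
    toy_spin_unlock();
}
```

With indexes 0, 1 and 2 handed out, releasing 1 makes the next allocation return 1 again, which is the lowest-free behaviour a monotonically increasing counter cannot provide. Without the lock, two callers could both see the same clear bit between the test and the set and end up with the same index.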



Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-08 Thread Mark Wu
On 06/02/2011 06:34 AM, Michael S. Tsirkin wrote:
 On Wed, Jun 01, 2011 at 04:25:48AM -0400, Mark Wu wrote:
 On 06/01/2011 03:24 AM, Mark Wu wrote:
 -   if (index_to_minor(index) >= 1 << MINORBITS)
 -   return -ENOSPC;
 +   do {
 +   if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
 +   return err;
 +
 There's a problem in above code: err is not initialized before
 using, so change it to return -1;
 +   do {
 +   if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
 +   return -1;
 
 Not -1. Pls return -ENOMEM.
 
Hi Michael,
Thanks for pointing that out. This is the revised patch.


From ffe49efd20938952a09d5a87fe694a6f62937756 Mon Sep 17 00:00:00 2001
From: Mark Wu <d...@redhat.com>
Date: Wed, 8 Jun 2011 08:25:53 -0400
Subject: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

Current index allocation in virtio-blk is based on a monotonically
increasing variable index. It could cause some confusion about disk
name in the case of hot-plugging disks. And it's impossible to find the
lowest available index by just maintaining a simple index. So it's
changed to use ida to allocate index via referring to the index
allocation in scsi disk.

Signed-off-by: Mark Wu <d...@redhat.com>
---
 drivers/block/virtio_blk.c |   37 ++++++++++++++++++++++++++++++++-----
 1 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 079c088..f13b758 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -8,10 +8,14 @@
 #include <linux/scatterlist.h>
 #include <linux/string_helpers.h>
 #include <scsi/scsi_cmnd.h>
+#include <linux/idr.h>
 
 #define PART_BITS 4
 
-static int major, index;
+static int major;
+static DEFINE_SPINLOCK(vd_index_lock);
+static DEFINE_IDA(vd_index_ida);
+
 struct workqueue_struct *virtblk_wq;
 
 struct virtio_blk
@@ -23,6 +27,7 @@ struct virtio_blk
 
 	/* The disk structure for the kernel. */
 	struct gendisk *disk;
+	u32 index;
 
 	/* Request tracking. */
 	struct list_head reqs;
@@ -343,12 +348,26 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	struct request_queue *q;
 	int err;
 	u64 cap;
-	u32 v, blk_size, sg_elems, opt_io_size;
+	u32 v, blk_size, sg_elems, opt_io_size, index;
 	u16 min_io_size;
 	u8 physical_block_exp, alignment_offset;
 
-	if (index_to_minor(index) >= 1 << MINORBITS)
-		return -ENOSPC;
+	do {
+		if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
+			return -ENOMEM;
+
+		spin_lock(&vd_index_lock);
+		err = ida_get_new(&vd_index_ida, &index);
+		spin_unlock(&vd_index_lock);
+	} while (err == -EAGAIN);
+
+	if (err)
+		return err;
+
+	if (index_to_minor(index) >= 1 << MINORBITS) {
+		err = -ENOSPC;
+		goto out_free_index;
+	}
 
 	/* We need to know how many segments before we allocate. */
 	err = virtio_config_val(vdev, VIRTIO_BLK_F_SEG_MAX,
@@ -421,7 +440,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	vblk->disk->private_data = vblk;
 	vblk->disk->fops = &virtblk_fops;
 	vblk->disk->driverfs_dev = &vdev->dev;
-	index++;
+	vblk->index = index;
 
 	/* configure queue flush support */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
@@ -516,6 +535,10 @@ out_free_vq:
 	vdev->config->del_vqs(vdev);
 out_free_vblk:
 	kfree(vblk);
+out_free_index:
+	spin_lock(&vd_index_lock);
+	ida_remove(&vd_index_ida, index);
+	spin_unlock(&vd_index_lock);
 out:
 	return err;
 }
@@ -538,6 +561,10 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
 	mempool_destroy(vblk->pool);
 	vdev->config->del_vqs(vdev);
+
+	spin_lock(&vd_index_lock);
+	ida_remove(&vd_index_ida, vblk->index);
+	spin_unlock(&vd_index_lock);
 	kfree(vblk);
 }
 
 static const struct virtio_device_id id_table[] = {
-- 
1.7.1


Implementing Virtio Net driver for Solaris

2011-06-08 Thread Conor Murphy
Hi,

I'm in the middle of writing a network driver for Solaris 10 to use a VirtIO
backend. I've gotten the basics working and throughput between two VMs on the
same host is ~4x faster than when using the rtls interface.

What I'm looking for is some guidance as to which of the features
(CSUM, MRG_RXBUF, HOST_TSO, GUEST_TSO) give the most bang for the buck, i.e.
which should I look at implementing first?
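
As background for the question: whichever subset gets implemented, the driver ends up with the intersection of the bits the host offers and the bits it acknowledges, and as far as the virtio spec goes the TSO bits depend on the matching checksum bits. A minimal sketch, assuming 32-bit feature words; `negotiate_features()`, `feature_enabled()` and `FEAT()` are hypothetical helpers, only the bit numbers come from the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined for the virtio network device. */
#define VIRTIO_NET_F_CSUM        0
#define VIRTIO_NET_F_GUEST_CSUM  1
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_HOST_TSO4   11
#define VIRTIO_NET_F_MRG_RXBUF   15

#define FEAT(bit) (1u << (bit))

/* Hypothetical helper: is a given feature bit set in a feature word? */
bool feature_enabled(uint32_t features, unsigned bit)
{
    assert(bit < 32);
    return (features >> bit) & 1u;
}

/*
 * Hypothetical helper: intersect what the host offers with what the
 * driver implements, then drop TSO bits whose checksum prerequisite
 * did not survive the intersection.
 */
uint32_t negotiate_features(uint32_t host, uint32_t driver)
{
    uint32_t f = host & driver;

    if (!feature_enabled(f, VIRTIO_NET_F_CSUM))
        f &= ~FEAT(VIRTIO_NET_F_HOST_TSO4);
    if (!feature_enabled(f, VIRTIO_NET_F_GUEST_CSUM))
        f &= ~FEAT(VIRTIO_NET_F_GUEST_TSO4);
    return f;
}
```

In this sketch a driver that acknowledges HOST_TSO without CSUM ends up with no TSO at all, which suggests checksum offload has to come before either TSO direction, while MRG_RXBUF is independent of both.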

Thanks,
Conor



[PATCH 10/12] kvm: Clean up stubs

2011-06-08 Thread Jan Kiszka
No one references kvm_check_extension, kvm_has_vcpu_events, and
kvm_has_robust_singlestep outside KVM code.

kvm_update_guest_debug is never called, thus has no job besides
returning an error.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-stub.c |   18 +-----------------
 1 files changed, 1 insertions(+), 17 deletions(-)

diff --git a/kvm-stub.c b/kvm-stub.c
index 1c95452..1e835c6 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -42,11 +42,6 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
     return -ENOSYS;
 }
 
-int kvm_check_extension(KVMState *s, unsigned int extension)
-{
-    return 0;
-}
-
 int kvm_init(void)
 {
     return -ENOSYS;
@@ -78,16 +73,6 @@ int kvm_has_sync_mmu(void)
     return 0;
 }
 
-int kvm_has_vcpu_events(void)
-{
-    return 0;
-}
-
-int kvm_has_robust_singlestep(void)
-{
-    return 0;
-}
-
 int kvm_has_many_ioeventfds(void)
 {
     return 0;
@@ -99,8 +84,7 @@ void kvm_setup_guest_memory(void *start, size_t size)
 
 int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap)
 {
-    tb_flush(env);
-    return 0;
+    return -ENOSYS;
 }
 
 int kvm_insert_breakpoint(CPUState *current_env, target_ulong addr,
-- 
1.7.1



[PATCH 12/12] Remove unneeded kvm.h from cpu-exec.c

2011-06-08 Thread Jan Kiszka
This was obsoleted by 6792a57bf1.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 cpu-exec.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 6ddd8dd..9bb6405 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -20,7 +20,6 @@
 #include "exec.h"
 #include "disas.h"
 #include "tcg.h"
-#include "kvm.h"
 #include "qemu-barrier.h"
 
 #if defined(__sparc__) && !defined(CONFIG_SOLARIS)
-- 
1.7.1



[PATCH 06/12] kvm: Drop useless zero-initializations

2011-06-08 Thread Jan Kiszka
Backing KVMState is already zero-initialized.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-all.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 106eb3a..4a9910a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -764,28 +764,23 @@ int kvm_init(void)
     }
 #endif
 
-    s->vcpu_events = 0;
 #ifdef KVM_CAP_VCPU_EVENTS
     s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif
 
-    s->robust_singlestep = 0;
 #ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
     s->robust_singlestep =
         kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
 #endif
 
-    s->debugregs = 0;
 #ifdef KVM_CAP_DEBUGREGS
     s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
 #endif
 
-    s->xsave = 0;
 #ifdef KVM_CAP_XSAVE
     s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
 #endif
 
-    s->xcrs = 0;
 #ifdef KVM_CAP_XCRS
     s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
 #endif
-- 
1.7.1
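
Why the explicit assignments are redundant can be shown with a small stand-alone sketch. It assumes the state comes from a zeroing allocator; `calloc()` stands in here for whatever QEMU uses, and `struct toy_kvm_state` with its helpers is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for KVMState with a few of the capability flags. */
struct toy_kvm_state {
    int vcpu_events;
    int robust_singlestep;
    int debugregs;
};

/* calloc() returns zero-filled memory, so every flag starts out as 0. */
struct toy_kvm_state *toy_state_new(void)
{
    return calloc(1, sizeof(struct toy_kvm_state));
}

/* True if none of the capability flags has been set yet. */
bool toy_state_all_clear(const struct toy_kvm_state *s)
{
    assert(s != NULL);
    return !s->vcpu_events && !s->robust_singlestep && !s->debugregs;
}
```

A flag only needs to be written when the matching capability check is compiled in; when the #ifdef is not taken, the field simply keeps its initial zero.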



[PATCH 01/12] Add kernel header update script

2011-06-08 Thread Jan Kiszka
This helper pulls the required kernel headers for KVM and vhost into a
specified directory. The update is triggered via

scripts/update-linux-headers.sh LINUX_PATH

and will place the output under linux-headers/linux and linux-headers/asm-*.
It also imports the COPYING to care for headers without an explicit license.

CC: Alexander Graf <ag...@suse.de>
CC: Christoph Hellwig <h...@lst.de>
CC: Peter Maydell <peter.mayd...@linaro.org>
CC: Andreas Färber <andreas.faer...@web.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 linux-headers/README            |    2 +
 scripts/update-linux-headers.sh |   55 +++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/README
 create mode 100755 scripts/update-linux-headers.sh

diff --git a/linux-headers/README b/linux-headers/README
new file mode 100644
index 0000000..5c9026b
--- /dev/null
+++ b/linux-headers/README
@@ -0,0 +1,2 @@
+Automatically imported Linux kernel headers.
+Only use scripts/update-linux-headers.sh to update!
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
new file mode 100755
index 0000000..e5f45b2
--- /dev/null
+++ b/scripts/update-linux-headers.sh
@@ -0,0 +1,55 @@
+#!/bin/sh -e
+#
+# Update Linux kernel headers QEMU requires from a specified kernel tree.
+#
+# Copyright (C) 2011 Siemens AG
+#
+# Authors:
+#  Jan Kiszka <jan.kis...@siemens.com>
+#
+# This work is licensed under the terms of the GNU GPL version 2.
+# See the COPYING file in the top-level directory.
+
+tmpdir="$TMPDIR/.tmp-hdrs-$$"
+linux="$1"
+output="$2"
+
+if [ -z "$linux" -o ! -d "$linux" ]; then
+    cat << EOF
+usage: update-linux-headers.sh LINUX_PATH [OUTPUT_PATH]
+
+LINUX_PATH  Linux kernel directory to obtain the headers from
+OUTPUT_PATH output directory, usually the qemu source tree (default: $PWD)
+EOF
+    exit 1
+fi
+
+if [ -z "$output" ]; then
+    output="$PWD"
+fi
+
+for arch in x86 powerpc s390; do
+    make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch headers_install
+
+    rm -rf "$output/linux-headers/asm-$arch"
+    mkdir -p "$output/linux-headers/asm-$arch"
+    for header in kvm.h kvm_para.h; do
+        cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
+    done
+    if [ $arch = x86 ]; then
+        cp "$tmpdir/include/asm/hyperv.h" "$output/linux-headers/asm-x86"
+    fi
+done
+
+rm -rf "$output/linux-headers/linux"
+mkdir -p "$output/linux-headers/linux"
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+    cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
+done
+if [ -L "$linux/source" ]; then
+    cp "$linux/source/COPYING" "$output/linux-headers"
+else
+    cp "$linux/COPYING" "$output/linux-headers"
+fi
+
+rm -rf "$tmpdir"
-- 
1.7.1



[PATCH 00/12] [uq/master] Import linux headers and some cleanups

2011-06-08 Thread Jan Kiszka
Licensing of the virtio headers is now clarified. So we can finally
resolve the clumsy and constantly buggy #ifdef'ery around old KVM and
virtio headers. Recent example: current qemu-kvm does not build against
2.6.32 headers.

This series introduces an import mechanism for all required Linux
headers so that the appropriate versions can be kept safely inside the
QEMU tree. I've incorporated all the valuable review comments on the
first version and rebased the result over current uq/master after
rebasing that one over current QEMU master.

Please note that I had no chance to test-build PPC or s390.

Besides the header topic, this series also includes a few assorted KVM
cleanup patches so that my queue is empty again.

CC: Alexander Graf ag...@suse.de
CC: Andreas Färber andreas.faer...@web.de
CC: Christoph Hellwig h...@lst.de
CC: Eduardo Habkost ehabk...@redhat.com
CC: Peter Maydell peter.mayd...@linaro.org

Jan Kiszka (12):
  Add kernel header update script
  Import kernel headers
  Switch build system to accompanied kernel headers
  kvm: Drop CONFIG_KVM_PARA
  kvm: ppc: Drop CONFIG_KVM_PPC_PVR
  kvm: Drop useless zero-initializations
  kvm: Drop KVM_CAP build dependencies
  kvm: x86: Drop KVM_CAP build dependencies
  kvm: ppc: Drop KVM_CAP build dependencies
  kvm: Clean up stubs
  kvm: x86: Pass KVMState to kvm_arch_get_supported_cpuid
  Remove unneeded kvm.h from cpu-exec.c

 Makefile.target  |4 +-
 configure|  149 +--
 cpu-exec.c   |1 -
 hw/kvmclock.c|9 -
 kvm-all.c|   13 -
 kvm-stub.c   |   18 +-
 kvm.h|2 +-
 linux-headers/COPYING|  356 +++
 linux-headers/README |2 +
 linux-headers/asm-powerpc/kvm.h  |  275 
 linux-headers/asm-powerpc/kvm_para.h |   53 +++
 linux-headers/asm-s390/kvm.h |   44 ++
 linux-headers/asm-s390/kvm_para.h|   17 +
 linux-headers/asm-x86/hyperv.h   |  193 
 linux-headers/asm-x86/kvm.h  |  324 ++
 linux-headers/asm-x86/kvm_para.h |   79 
 linux-headers/linux/kvm.h|  804 ++
 linux-headers/linux/kvm_para.h   |   29 ++
 linux-headers/linux/vhost.h  |  130 ++
 linux-headers/linux/virtio_config.h  |   54 +++
 linux-headers/linux/virtio_ring.h|  163 +++
 scripts/update-linux-headers.sh  |   55 +++
 target-i386/cpuid.c  |   20 +-
 target-i386/kvm.c|  123 +-
 target-ppc/kvm.c |   23 -
 target-s390x/cpu.h   |   10 -
 target-s390x/op_helper.c |1 +
 27 files changed, 2630 insertions(+), 321 deletions(-)
 create mode 100644 linux-headers/COPYING
 create mode 100644 linux-headers/README
 create mode 100644 linux-headers/asm-powerpc/kvm.h
 create mode 100644 linux-headers/asm-powerpc/kvm_para.h
 create mode 100644 linux-headers/asm-s390/kvm.h
 create mode 100644 linux-headers/asm-s390/kvm_para.h
 create mode 100644 linux-headers/asm-x86/hyperv.h
 create mode 100644 linux-headers/asm-x86/kvm.h
 create mode 100644 linux-headers/asm-x86/kvm_para.h
 create mode 100644 linux-headers/linux/kvm.h
 create mode 100644 linux-headers/linux/kvm_para.h
 create mode 100644 linux-headers/linux/vhost.h
 create mode 100644 linux-headers/linux/virtio_config.h
 create mode 100644 linux-headers/linux/virtio_ring.h
 create mode 100755 scripts/update-linux-headers.sh
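
As a rough sanity check after running the import, one could verify that every file named in the diffstat above is present. This is only a sketch: the function name `check_imported_headers` is hypothetical and not part of the series, and the file list is copied from the `create mode` lines above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the series): report which of the files
# listed in the cover letter's diffstat are missing below "$1/linux-headers".
check_imported_headers() {
    root=$1
    status=0
    for f in \
        COPYING README \
        asm-powerpc/kvm.h asm-powerpc/kvm_para.h \
        asm-s390/kvm.h asm-s390/kvm_para.h \
        asm-x86/hyperv.h asm-x86/kvm.h asm-x86/kvm_para.h \
        linux/kvm.h linux/kvm_para.h linux/vhost.h \
        linux/virtio_config.h linux/virtio_ring.h
    do
        # Quote the path so that a tree location containing spaces still works.
        if ! [ -f "$root/linux-headers/$f" ]; then
            echo "missing: linux-headers/$f"
            status=1
        fi
    done
    return $status
}
```

Calling `check_imported_headers "$PWD"` from the top of a QEMU tree would print one line per missing file and return non-zero if any are absent.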



[PATCH 04/12] kvm: Drop CONFIG_KVM_PARA

2011-06-08 Thread Jan Kiszka
The kvm_para.h header is now always available.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 configure |1 -
 hw/kvmclock.c |9 -
 target-i386/kvm.c |   26 +-
 3 files changed, 1 insertions(+), 35 deletions(-)

diff --git a/configure b/configure
index 0e1dc46..ed54db9 100755
--- a/configure
+++ b/configure
@@ -3218,7 +3218,6 @@ case $target_arch2 in
   \( "$target_arch2" = "x86_64" -a "$cpu" = "i386"   \) -o \
   \( "$target_arch2" = "i386"   -a "$cpu" = "x86_64" \) \) ; then
   echo "CONFIG_KVM=y" >> $config_target_mak
-  echo "CONFIG_KVM_PARA=y" >> $config_target_mak
   if test "$vhost_net" = "yes" ; then
 echo "CONFIG_VHOST_NET=y" >> $config_target_mak
   fi
diff --git a/hw/kvmclock.c b/hw/kvmclock.c
index 004c4ad..692ad18 100644
--- a/hw/kvmclock.c
+++ b/hw/kvmclock.c
@@ -17,8 +17,6 @@
 #include "kvm.h"
 #include "kvmclock.h"
 
-#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
-
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
 
@@ -120,10 +118,3 @@ static void kvmclock_register_device(void)
 }
 
 device_init(kvmclock_register_device);
-
-#else /* !(CONFIG_KVM_PARA && KVM_CAP_ADJUST_CLOCK) */
-
-void kvmclock_create(void)
-{
-}
-#endif /* !(CONFIG_KVM_PARA && KVM_CAP_ADJUST_CLOCK) */
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 1ae2d61..0efcf97 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -18,6 +18,7 @@
 #include <sys/utsname.h>
 
 #include <linux/kvm.h>
+#include <linux/kvm_para.h>
 
 #include "qemu-common.h"
 #include "sysemu.h"
@@ -29,10 +30,6 @@
 #include "hw/apic.h"
 #include "ioport.h"
 
-#ifdef CONFIG_KVM_PARA
-#include <linux/kvm_para.h>
-#endif
-//
 //#define DEBUG_KVM
 
 #ifdef DEBUG_KVM
@@ -62,9 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool has_msr_star;
 static bool has_msr_hsave_pa;
-#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
 static bool has_msr_async_pf_en;
-#endif
 static int lm_capable_kernel;
 
 static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
@@ -92,7 +87,6 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
 return cpuid;
 }
 
-#ifdef CONFIG_KVM_PARA
 struct kvm_para_features {
 int cap;
 int feature;
@@ -118,7 +112,6 @@ static int get_para_features(CPUState *env)
 
 return features;
 }
-#endif
 
 
 uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
@@ -128,9 +121,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function,
 int i, max;
 uint32_t ret = 0;
 uint32_t cpuid_1_edx;
-#ifdef CONFIG_KVM_PARA
 int has_kvm_features = 0;
-#endif
 
 max = 1;
 while ((cpuid = try_get_cpuid(env->kvm_state, max)) == NULL) {
@@ -140,11 +131,9 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
 for (i = 0; i < cpuid->nent; ++i) {
 if (cpuid->entries[i].function == function &&
 cpuid->entries[i].index == index) {
-#ifdef CONFIG_KVM_PARA
 if (cpuid->entries[i].function == KVM_CPUID_FEATURES) {
 has_kvm_features = 1;
 }
-#endif
 switch (reg) {
 case R_EAX:
 ret = cpuid->entries[i].eax;
@@ -177,12 +166,10 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function,
 
 qemu_free(cpuid);
 
-#ifdef CONFIG_KVM_PARA
 /* fallback for older kernels */
 if (!has_kvm_features && (function == KVM_CPUID_FEATURES)) {
 ret = get_para_features(env);
 }
-#endif
 
 return ret;
 }
@@ -377,9 +364,7 @@ int kvm_arch_init_vcpu(CPUState *env)
 uint32_t limit, i, j, cpuid_i;
 uint32_t unused;
 struct kvm_cpuid_entry2 *c;
-#ifdef CONFIG_KVM_PARA
 uint32_t signature[3];
-#endif
 
 env->cpuid_features = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
 
@@ -397,7 +382,6 @@ int kvm_arch_init_vcpu(CPUState *env)
 
 cpuid_i = 0;
 
-#ifdef CONFIG_KVM_PARA
 /* Paravirtualization CPUIDs */
 memcpy(signature, "KVMKVMKVM\0\0\0", 12);
 c = &cpuid_data.entries[cpuid_i++];
@@ -418,8 +402,6 @@ int kvm_arch_init_vcpu(CPUState *env)
 has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);
 #endif
 
-#endif
-
 cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
 for (i = 0; i <= limit; i++) {
@@ -931,12 +913,10 @@ static int kvm_put_msrs(CPUState *env, int level)
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
   env->system_time_msr);
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
-#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
 if (has_msr_async_pf_en) {
 kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN,
   env->async_pf_en_msr);
 }
-#endif
 }
 #ifdef KVM_CAP_MCE
 if (env->mcg_cap) {
@@ -1172,11 +1152,9 @@ static int kvm_get_msrs(CPUState *env)
 #endif
 msrs[n++].index = MSR_KVM_SYSTEM_TIME;
 msrs[n++].index = MSR_KVM_WALL_CLOCK;
-#if defined(CONFIG_KVM_PARA)  

[PATCH 08/12] kvm: x86: Drop KVM_CAP build dependencies

2011-06-08 Thread Jan Kiszka
No longer needed with accompanied kernel headers.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 target-i386/kvm.c |   67 ++--
 1 files changed, 3 insertions(+), 64 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 0efcf97..1c2d32c 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -94,9 +94,7 @@ struct kvm_para_features {
 { KVM_CAP_CLOCKSOURCE, KVM_FEATURE_CLOCKSOURCE },
 { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY },
 { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
-#ifdef KVM_CAP_ASYNC_PF
 { KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
-#endif
 { -1, -1 }
 };
 
@@ -193,7 +191,6 @@ static void kvm_unpoison_all(void *param)
 }
 }
 
-#ifdef KVM_CAP_MCE
 static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
 {
 HWPoisonPage *page;
@@ -239,7 +236,6 @@ static void kvm_mce_inject(CPUState *env, 
target_phys_addr_t paddr, int code)
cpu_x86_support_mca_broadcast(env) ?
MCE_INJECT_BROADCAST : 0);
 }
-#endif /* KVM_CAP_MCE */
 
 static void hardware_memory_error(void)
 {
@@ -249,7 +245,6 @@ static void hardware_memory_error(void)
 
 int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr)
 {
-#ifdef KVM_CAP_MCE
 ram_addr_t ram_addr;
 target_phys_addr_t paddr;
 
@@ -269,9 +264,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void 
*addr)
 }
 kvm_hwpoison_page_add(ram_addr);
 kvm_mce_inject(env, paddr, code);
-} else
-#endif /* KVM_CAP_MCE */
-{
+} else {
 if (code == BUS_MCEERR_AO) {
 return 0;
 } else if (code == BUS_MCEERR_AR) {
@@ -285,7 +278,6 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void 
*addr)
 
 int kvm_arch_on_sigbus(int code, void *addr)
 {
-#ifdef KVM_CAP_MCE
 if ((first_cpu->mcg_cap & MCG_SER_P) && addr && code == BUS_MCEERR_AO) {
 ram_addr_t ram_addr;
 target_phys_addr_t paddr;
@@ -300,9 +292,7 @@ int kvm_arch_on_sigbus(int code, void *addr)
 }
 kvm_hwpoison_page_add(ram_addr);
 kvm_mce_inject(first_cpu, paddr, code);
-} else
-#endif /* KVM_CAP_MCE */
-{
+} else {
 if (code == BUS_MCEERR_AO) {
 return 0;
 } else if (code == BUS_MCEERR_AR) {
@@ -316,7 +306,6 @@ int kvm_arch_on_sigbus(int code, void *addr)
 
 static int kvm_inject_mce_oldstyle(CPUState *env)
 {
-#ifdef KVM_CAP_MCE
 if (!kvm_has_vcpu_events() && env->exception_injected == EXCP12_MCHK) {
 unsigned int bank, bank_num = env->mcg_cap & 0xff;
 struct kvm_x86_mce mce;
@@ -342,7 +331,6 @@ static int kvm_inject_mce_oldstyle(CPUState *env)
 
 return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, &mce);
 }
-#endif /* KVM_CAP_MCE */
 return 0;
 }
 
@@ -398,9 +386,7 @@ int kvm_arch_init_vcpu(CPUState *env)
 c->eax = env->cpuid_kvm_features & kvm_arch_get_supported_cpuid(env,
 KVM_CPUID_FEATURES, 0, R_EAX);
 
-#ifdef KVM_CAP_ASYNC_PF
 has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);
-#endif
 
 cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
@@ -481,7 +467,6 @@ int kvm_arch_init_vcpu(CPUState *env)
 
 cpuid_data.cpuid.nent = cpuid_i;
 
-#ifdef KVM_CAP_MCE
 if (((env->cpuid_version >> 8)&0xF) >= 6
  && (env->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
  && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
@@ -508,7 +493,6 @@ int kvm_arch_init_vcpu(CPUState *env)
 
 env->mcg_cap = mcg_cap;
 }
-#endif
 
 qemu_add_vm_change_state_handler(cpu_update_state, env);
 
@@ -600,7 +584,6 @@ int kvm_arch_init(KVMState *s)
  * that case we need to stick with the default, i.e. a 256K maximum BIOS
  * size.
  */
-#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
 if (kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
 /* Allows up to 16M BIOSes. */
 identity_base = 0xfeffc000;
@@ -610,7 +593,7 @@ int kvm_arch_init(KVMState *s)
 return ret;
 }
 }
-#endif
+
 /* Set TSS base one page after EPT identity map. */
 ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, identity_base + 0x1000);
 if (ret < 0) {
@@ -745,7 +728,6 @@ static int kvm_put_fpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
-#ifdef KVM_CAP_XSAVE
 #define XSAVE_CWD_RIP 2
 #define XSAVE_CWD_RDP 4
 #define XSAVE_MXCSR   6
@@ -753,11 +735,9 @@ static int kvm_put_fpu(CPUState *env)
 #define XSAVE_XMM_SPACE   40
 #define XSAVE_XSTATE_BV   128
 #define XSAVE_YMMH_SPACE  144
-#endif
 
 static int kvm_put_xsave(CPUState *env)
 {
-#ifdef KVM_CAP_XSAVE
 int i, r;
 struct kvm_xsave* xsave;
 uint16_t cwd, swd, twd, fop;
@@ -788,14 +768,10 @@ static int kvm_put_xsave(CPUState *env)
 r = kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
 qemu_free(xsave);
 return r;
-#else
-return kvm_put_fpu(env);
-#endif
 }
 
 static int 

[PATCH 11/12] kvm: x86: Pass KVMState to kvm_arch_get_supported_cpuid

2011-06-08 Thread Jan Kiszka
kvm_arch_get_supported_cpuid checks for global cpuid restrictions; it
does not require any CPUState reference. Changing its interface allows
calling it before any VCPU is initialized.

CC: Eduardo Habkost ehabk...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm.h   |2 +-
 target-i386/cpuid.c |   20 
 target-i386/kvm.c   |   30 +++---
 3 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/kvm.h b/kvm.h
index d565dba..243b063 100644
--- a/kvm.h
+++ b/kvm.h
@@ -157,7 +157,7 @@ bool kvm_arch_stop_on_emulation_error(CPUState *env);
 
 int kvm_check_extension(KVMState *s, unsigned int extension);
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
+uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
   uint32_t index, int reg);
 void kvm_cpu_synchronize_state(CPUState *env);
 void kvm_cpu_synchronize_post_reset(CPUState *env);
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 79e7580..e1ae3af 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -1144,10 +1144,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 7:
 if (kvm_enabled()) {
-*eax = kvm_arch_get_supported_cpuid(env, 0x7, count, R_EAX);
-*ebx = kvm_arch_get_supported_cpuid(env, 0x7, count, R_EBX);
-*ecx = kvm_arch_get_supported_cpuid(env, 0x7, count, R_ECX);
-*edx = kvm_arch_get_supported_cpuid(env, 0x7, count, R_EDX);
+KVMState *s = env->kvm_state;
+
+*eax = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EAX);
+*ebx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EBX);
+*ecx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_ECX);
+*edx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EDX);
 } else {
 *eax = 0;
 *ebx = 0;
@@ -1179,10 +1181,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 }
 if (kvm_enabled()) {
-*eax = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EAX);
-*ebx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EBX);
-*ecx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_ECX);
-*edx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EDX);
+KVMState *s = env->kvm_state;
+
+*eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX);
+*ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX);
+*ecx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_ECX);
+*edx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EDX);
 } else {
 *eax = 0;
 *ebx = 0;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 1c2d32c..5ebb054 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -98,12 +98,12 @@ struct kvm_para_features {
 { -1, -1 }
 };
 
-static int get_para_features(CPUState *env)
+static int get_para_features(KVMState *s)
 {
 int i, features = 0;
 
 for (i = 0; i < ARRAY_SIZE(para_features) - 1; i++) {
-if (kvm_check_extension(env->kvm_state, para_features[i].cap)) {
+if (kvm_check_extension(s, para_features[i].cap)) {
 features |= (1 << para_features[i].feature);
 }
 }
@@ -112,7 +112,7 @@ static int get_para_features(CPUState *env)
 }
 
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
+uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
   uint32_t index, int reg)
 {
 struct kvm_cpuid2 *cpuid;
@@ -122,7 +122,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function,
 int has_kvm_features = 0;
 
 max = 1;
-while ((cpuid = try_get_cpuid(env->kvm_state, max)) == NULL) {
+while ((cpuid = try_get_cpuid(s, max)) == NULL) {
 max *= 2;
 }
 
@@ -153,7 +153,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function,
 /* On Intel, kvm returns cpuid according to the Intel spec,
  * so add missing bits according to the AMD spec:
  */
-cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
+cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
 ret |= cpuid_1_edx & 0x183f7ff;
 break;
 }
@@ -166,7 +166,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function,
 
 /* fallback for older kernels */
 if (!has_kvm_features && (function == KVM_CPUID_FEATURES)) {
-ret = get_para_features(env);
+ret = get_para_features(s);
 }
 
 return ret;
@@ -349,25 +349,25 @@ int kvm_arch_init_vcpu(CPUState *env)
 struct kvm_cpuid2 cpuid;
 struct kvm_cpuid_entry2 entries[100];
 

[PATCH 03/12] Switch build system to accompanied kernel headers

2011-06-08 Thread Jan Kiszka
This helps reduce our build-time checks for feature support in the
available Linux kernel headers. It also helps users who do not have
sufficiently recent headers installed on their build machine.

Consequently, the patch removes any build-time checks for kvm and vhost
in configure, the --kerneldir switch, and KVM_CFLAGS. Kernel headers are
supposed to be provided by QEMU only.

s390 needs some extra love as it carries redefinitions from kernel
headers.

CC: Alexander Graf ag...@suse.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.target  |4 +-
 configure|  151 ++
 target-s390x/cpu.h   |   10 ---
 target-s390x/op_helper.c |1 +
 4 files changed, 21 insertions(+), 145 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 5c22df8..be9c0e8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -14,7 +14,7 @@ endif
 
 TARGET_PATH=$(SRC_PATH)/target-$(TARGET_BASE_ARCH)
 $(call set-vpath, $(SRC_PATH):$(TARGET_PATH):$(SRC_PATH)/hw)
-QEMU_CFLAGS+= -I.. -I$(TARGET_PATH) -DNEED_CPU_H
+QEMU_CFLAGS+= -I.. -I../linux-headers -I$(TARGET_PATH) -DNEED_CPU_H
 
 include $(SRC_PATH)/Makefile.objs
 
@@ -37,8 +37,6 @@ ifndef CONFIG_HAIKU
 LIBS+=-lm
 endif
 
-kvm.o kvm-all.o vhost.o vhost_net.o kvmclock.o: QEMU_CFLAGS+=$(KVM_CFLAGS)
-
 config-target.h: config-target.h-timestamp
 config-target.h-timestamp: config-target.mak
 
diff --git a/configure b/configure
index d38b952..0e1dc46 100755
--- a/configure
+++ b/configure
@@ -113,8 +113,7 @@ curl=
 curses=
 docs=
 fdt=
-kvm=
-kvm_para=
+kvm=yes
 nptl=
 sdl=
 vnc=yes
@@ -130,7 +129,7 @@ xen=
 xen_ctrl_version=
 linux_aio=
 attr=
-vhost_net=
+vhost_net=yes
 xfs=
 
 gprof=no
@@ -165,7 +164,6 @@ guest_base=
 uname_release=
 io_thread=no
 mixemu=no
-kerneldir=
 aix=no
 blobs=yes
 pkgversion=
@@ -712,8 +710,6 @@ for opt do
   ;;
   --disable-blobs) blobs="no"
   ;;
-  --kerneldir=*) kerneldir="$optarg"
-  ;;
   --with-pkgversion=*) pkgversion=" ($optarg)"
   ;;
   --disable-docs) docs="no"
@@ -1001,7 +997,6 @@ echo "  --disable-attr           disables attr and xattr support"
 echo "  --enable-attr            enable attr and xattr support"
 echo "  --enable-io-thread       enable IO thread"
 echo "  --disable-blobs          disable installing provided firmware blobs"
-echo "  --kerneldir=PATH         look for kernel includes in PATH"
 echo "  --enable-docs            enable documentation build"
 echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
@@ -1766,124 +1761,6 @@ EOF
 fi
 
 ##
-# kvm probe
-if test "$kvm" != "no" ; then
-    cat > $TMPC <<EOF
-#include <linux/kvm.h>
-#if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12
-#error Invalid KVM version
-#endif
-EOF
-    must_have_caps="KVM_CAP_USER_MEMORY \
-                    KVM_CAP_DESTROY_MEMORY_REGION_WORKS \
-                    KVM_CAP_COALESCED_MMIO \
-                    KVM_CAP_SYNC_MMU \
-                   "
-    if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) ; then
-      must_have_caps="$caps \
-                      KVM_CAP_SET_TSS_ADDR \
-                      KVM_CAP_EXT_CPUID \
-                      KVM_CAP_CLOCKSOURCE \
-                      KVM_CAP_NOP_IO_DELAY \
-                      KVM_CAP_PV_MMU \
-                      KVM_CAP_MP_STATE \
-                      KVM_CAP_USER_NMI \
-                     "
-    fi
-    for c in $must_have_caps ; do
-      cat >> $TMPC <<EOF
-#if !defined($c)
-#error Missing KVM capability $c
-#endif
-EOF
-    done
-    cat >> $TMPC <<EOF
-int main(void) { return 0; }
-EOF
-  if test "$kerneldir" != "" ; then
-      kvm_cflags=-I"$kerneldir"/include
-      if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) \
-         -a -d "$kerneldir/arch/x86/include" ; then
-            kvm_cflags="$kvm_cflags -I$kerneldir/arch/x86/include"
-       elif test "$cpu" = "ppc" -a -d "$kerneldir/arch/powerpc/include" ; then
-           kvm_cflags="$kvm_cflags -I$kerneldir/arch/powerpc/include"
-       elif test "$cpu" = "s390x" -a -d "$kerneldir/arch/s390/include" ; then
-           kvm_cflags="$kvm_cflags -I$kerneldir/arch/s390/include"
-        elif test -d "$kerneldir/arch/$cpu/include" ; then
-            kvm_cflags="$kvm_cflags -I$kerneldir/arch/$cpu/include"
-      fi
-  else
-    kvm_cflags=`$pkg_config --cflags kvm-kmod 2>/dev/null`
-  fi
-  if compile_prog "$kvm_cflags" "" ; then
-    kvm=yes
-    cat > $TMPC <<EOF
-#include <linux/kvm_para.h>
-int main(void) { return 0; }
-EOF
-    if compile_prog "$kvm_cflags" "" ; then
-      kvm_para=yes
-    fi
-  else
-    if test "$kvm" = "yes" ; then
-      if has awk && has grep; then
-        kvmerr=`LANG=C $cc $QEMU_CFLAGS -o $TMPE $kvm_cflags $TMPC 2>&1 \
-       | grep "error: " \
-       | awk -F "error: " '{if (NR>1) printf(", "); printf("%s",$2);}'`
-        if test "$kvmerr" != "" ; then
-          echo -e "${kvmerr}\n\
-NOTE: To enable KVM support, update your kernel to 2.6.29+ or install \
-recent 

[PATCH 09/12] kvm: ppc: Drop KVM_CAP build dependencies

2011-06-08 Thread Jan Kiszka
No longer needed with accompanied kernel headers.

CC: Alexander Graf ag...@suse.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 target-ppc/kvm.c |   14 --
 1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 0500e3f..21f35af 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -65,18 +65,10 @@ static void kvm_kick_env(void *env)
 
 int kvm_arch_init(KVMState *s)
 {
-#ifdef KVM_CAP_PPC_UNSET_IRQ
 cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
-#endif
-#ifdef KVM_CAP_PPC_IRQ_LEVEL
 cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
-#endif
-#ifdef KVM_CAP_PPC_SEGSTATE
 cap_segstate = kvm_check_extension(s, KVM_CAP_PPC_SEGSTATE);
-#endif
-#ifdef KVM_CAP_PPC_BOOKE_SREGS
 cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS);
-#endif
 
 if (!cap_interrupt_level) {
 fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -217,7 +209,6 @@ int kvm_arch_get_registers(CPUState *env)
 return ret;
 }
 
-#ifdef KVM_CAP_PPC_BOOKE_SREGS
 if (sregs.u.e.features & KVM_SREGS_E_BASE) {
 env->spr[SPR_BOOKE_CSRR0] = sregs.u.e.csrr0;
 env->spr[SPR_BOOKE_CSRR1] = sregs.u.e.csrr1;
@@ -314,7 +305,6 @@ int kvm_arch_get_registers(CPUState *env)
 env->spr[SPR_BOOKE_PID2] = sregs.u.e.impl.fsl.pid2;
 }
 }
-#endif
 }
 
 if (cap_segstate) {
@@ -323,7 +313,6 @@ int kvm_arch_get_registers(CPUState *env)
 return ret;
 }
 
-#ifdef KVM_CAP_PPC_SEGSTATE
 ppc_store_sdr1(env, sregs.u.s.sdr1);
 
 /* Sync SLB */
@@ -346,7 +335,6 @@ int kvm_arch_get_registers(CPUState *env)
 env->IBAT[0][i] = sregs.u.s.ppc32.ibat[i] & 0x;
 env->IBAT[1][i] = sregs.u.s.ppc32.ibat[i] >> 32;
 }
-#endif
 }
 
 return 0;
@@ -525,7 +513,6 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int 
buf_len)
 {
 uint32_t *hc = (uint32_t*)buf;
 
-#ifdef KVM_CAP_PPC_GET_PVINFO
 struct kvm_ppc_pvinfo pvinfo;
 
 if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
@@ -534,7 +521,6 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int 
buf_len)
 
 return 0;
 }
-#endif
 
 /*
  * Fallback to always fail hypercalls:
-- 
1.7.1



[PATCH 07/12] kvm: Drop KVM_CAP build dependencies

2011-06-08 Thread Jan Kiszka
No longer needed with accompanied kernel headers. We are only left with
build dependencies that are controlled by kvm arch headers.

CC: Alexander Graf ag...@suse.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c |8 
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 4a9910a..cbc2532 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -757,21 +757,17 @@ int kvm_init(void)
 s->coalesced_mmio = kvm_check_extension(s, KVM_CAP_COALESCED_MMIO);
 
 s->broken_set_mem_region = 1;
-#ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
 ret = kvm_check_extension(s, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
 if (ret > 0) {
 s->broken_set_mem_region = 0;
 }
-#endif
 
 #ifdef KVM_CAP_VCPU_EVENTS
 s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif
 
-#ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
 s->robust_singlestep =
 kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
-#endif
 
 #ifdef KVM_CAP_DEBUGREGS
 s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
@@ -850,7 +846,6 @@ static void kvm_handle_io(uint16_t port, void *data, int 
direction, int size,
 }
 }
 
-#ifdef KVM_CAP_INTERNAL_ERROR_DATA
 static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 {
 fprintf(stderr, "KVM internal error.");
@@ -877,7 +872,6 @@ static int kvm_handle_internal_error(CPUState *env, struct 
kvm_run *run)
  */
 return -1;
 }
-#endif
 
 void kvm_flush_coalesced_mmio_buffer(void)
 {
@@ -1008,11 +1002,9 @@ int kvm_cpu_exec(CPUState *env)
 (uint64_t)run->hw.hardware_exit_reason);
 ret = -1;
 break;
-#ifdef KVM_CAP_INTERNAL_ERROR_DATA
 case KVM_EXIT_INTERNAL_ERROR:
 ret = kvm_handle_internal_error(env, run);
 break;
-#endif
 default:
 DPRINTF("kvm_arch_handle_exit\n");
 ret = kvm_arch_handle_exit(env, run);
-- 
1.7.1



[PATCH 05/12] kvm: ppc: Drop CONFIG_KVM_PPC_PVR

2011-06-08 Thread Jan Kiszka
Required header support is now unconditionally available.

CC: Alexander Graf ag...@suse.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 configure|1 -
 target-ppc/kvm.c |9 -
 2 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/configure b/configure
index ed54db9..0947f98 100755
--- a/configure
+++ b/configure
@@ -3221,7 +3221,6 @@ case $target_arch2 in
   if test "$vhost_net" = "yes" ; then
 echo "CONFIG_VHOST_NET=y" >> $config_target_mak
   fi
-  echo "CONFIG_KVM_PPC_PVR=y" >> $config_target_mak
 fi
 esac
 if test "$target_bigendian" = "yes" ; then
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index e7b1b10..0500e3f 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -104,21 +104,12 @@ static int kvm_arch_sync_sregs(CPUState *cenv)
 }
 }
 
-#if !defined(CONFIG_KVM_PPC_PVR)
-if (1) {
-fprintf(stderr, "kvm error: missing PVR setting capability\n");
-return -ENOSYS;
-}
-#endif
-
 ret = kvm_vcpu_ioctl(cenv, KVM_GET_SREGS, &sregs);
 if (ret) {
 return ret;
 }
 
-#ifdef CONFIG_KVM_PPC_PVR
 sregs.pvr = cenv->spr[SPR_PVR];
-#endif
 return kvm_vcpu_ioctl(cenv, KVM_SET_SREGS, &sregs);
 }
 
-- 
1.7.1



Re: [PATCH 01/12] Add kernel header update script

2011-06-08 Thread Peter Maydell
On 8 June 2011 15:10, Jan Kiszka jan.kis...@siemens.com wrote:
> --- /dev/null
> +++ b/scripts/update-linux-headers.sh
> @@ -0,0 +1,55 @@
> +#!/bin/sh -e
> +#

> +if [ -z $output ]; then
> +    output=$PWD
> +fi

> +    mkdir -p $output/linux-headers/asm-$arch

This script is rather lacking in quoting throughout. As a random
example, this looks like it will break if you run the script from
a directory with a space in the path.

> +tmpdir=$TMPDIR/.tmp-hdrs-$$

Better (safer) to use mktemp, I think.

> if [ -z $linux -o ! -d $linux ]; then

test -o is obsolescent in POSIX; use
 if [ -z $linux ] || ! [ -d $linux ] ; then
instead.

-- PMM
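
The quoting and `test -o` points above can be illustrated with a small sketch. It is not taken from the patch: the function name `check_linux_dir` and the demo directory are hypothetical, and `mktemp -d` is used here only to get a scratch location for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "$1" is non-empty and names a directory.
# The double quotes keep a path with spaces as a single word, and "||"
# between two separate tests replaces the obsolescent "-o" operator.
check_linux_dir() {
    if [ -z "$1" ] || ! [ -d "$1" ]; then
        return 1
    fi
    return 0
}

# Demonstration with a directory whose name contains a space.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/linux tree"
if check_linux_dir "$demo_dir/linux tree"; then
    echo "accepted: path with a space"
fi
rm -rf "$demo_dir"
```

With the unquoted form, `[ -d $demo_dir/linux tree ]` would hand `test` two separate words instead of one path, which is the breakage described above.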


Re: [PATCH 01/12] Add kernel header update script

2011-06-08 Thread Jan Kiszka
On 2011-06-08 16:33, Peter Maydell wrote:
> On 8 June 2011 15:10, Jan Kiszka jan.kis...@siemens.com wrote:
>> --- /dev/null
>> +++ b/scripts/update-linux-headers.sh
>> @@ -0,0 +1,55 @@
>> +#!/bin/sh -e
>> +#
>
>> +if [ -z $output ]; then
>> +    output=$PWD
>> +fi
>
>> +    mkdir -p $output/linux-headers/asm-$arch
>
> This script is rather lacking in quoting throughout. As a random
> example, this looks like it will break if you run the script from
> a directory with a space in the path.

True.

>
>> +tmpdir=$TMPDIR/.tmp-hdrs-$$
>
> Better (safer) to use mktemp, I think.

Is that portable? I don't think so.

>
>> if [ -z $linux -o ! -d $linux ]; then
>
> test -o is obsolescent in POSIX; use
>  if [ -z $linux ] || ! [ -d $linux ] ; then
> instead.
 

OK.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH 01/12] Add kernel header update script

2011-06-08 Thread Peter Maydell
2011/6/8 Jan Kiszka jan.kis...@siemens.com:
> On 2011-06-08 16:33, Peter Maydell wrote:
>> On 8 June 2011 15:10, Jan Kiszka jan.kis...@siemens.com wrote:
>>> +tmpdir=$TMPDIR/.tmp-hdrs-$$

>> Better (safer) to use mktemp, I think.

> Is that portable? I don't think so.

We don't expect every random end user to run this script, though,
right? We already use mktemp in scripts/refresh-pxe-roms.sh, for
instance.

-- PMM
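
For reference, a minimal sketch of the `mktemp` approach under discussion. The `trap` cleanup is an addition for illustration and is not part of the proposed script; as noted above, `mktemp` itself is not specified by POSIX, so its availability on the developer's machine is an assumption.

```shell
#!/bin/sh -e
# Create a private scratch directory. mktemp -d chooses an unpredictable
# name, unlike the guessable $TMPDIR/.tmp-hdrs-$$ pattern.
tmpdir=$(mktemp -d)

# Remove the scratch directory when the shell exits, including early exits
# triggered by "-e" when a command fails.
trap 'rm -rf "$tmpdir"' EXIT

# Stand-in for the real work (the patch runs "make ... headers_install" here).
mkdir -p "$tmpdir/include/linux"
echo "scratch directory: $tmpdir"
```

Registering the cleanup with `trap` means a failing `make` or `cp` does not leave the temporary tree behind, which a trailing `rm -rf` alone would not guarantee under `sh -e`.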


[PATCH v2 01/12] Add kernel header update script

2011-06-08 Thread Jan Kiszka
This helper pulls the required kernel headers for KVM and vhost into a
specified directory. The update is triggered via

scripts/update-linux-headers.sh LINUX_PATH

and will place the output under linux-headers/linux and linux-headers/asm-*.
It also imports the COPYING to care for headers without an explicit license.

CC: Alexander Graf ag...@suse.de
CC: Christoph Hellwig h...@lst.de
CC: Peter Maydell peter.mayd...@linaro.org
CC: Andreas Färber andreas.faer...@web.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Changes in v2:
 - add quoting
 - use mktemp
 - avoid -o for test

 linux-headers/README|2 +
 scripts/update-linux-headers.sh |   55 +++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/README
 create mode 100755 scripts/update-linux-headers.sh

diff --git a/linux-headers/README b/linux-headers/README
new file mode 100644
index 000..5c9026b
--- /dev/null
+++ b/linux-headers/README
@@ -0,0 +1,2 @@
+Automatically imported Linux kernel headers.
+Only use scripts/update-linux-headers.sh to update!
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
new file mode 100755
index 000..e43f385
--- /dev/null
+++ b/scripts/update-linux-headers.sh
@@ -0,0 +1,55 @@
+#!/bin/sh -e
+#
+# Update Linux kernel headers QEMU requires from a specified kernel tree.
+#
+# Copyright (C) 2011 Siemens AG
+#
+# Authors:
+#  Jan Kiszka        <jan.kis...@siemens.com>
+#
+# This work is licensed under the terms of the GNU GPL version 2.
+# See the COPYING file in the top-level directory.
+
+tmpdir=`mktemp -d`
+linux="$1"
+output="$2"
+
+if [ -z "$linux" ] || ! [ -d "$linux" ]; then
+    cat << EOF
+usage: update-kernel-headers.sh LINUX_PATH [OUTPUT_PATH]
+
+LINUX_PATH      Linux kernel directory to obtain the headers from
+OUTPUT_PATH     output directory, usually the qemu source tree (default: $PWD)
+EOF
+    exit 1
+fi
+
+if [ -z "$output" ]; then
+    output="$PWD"
+fi
+
+for arch in x86 powerpc s390; do
+    make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch headers_install
+
+    rm -rf "$output/linux-headers/asm-$arch"
+    mkdir -p "$output/linux-headers/asm-$arch"
+    for header in kvm.h kvm_para.h; do
+        cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
+    done
+    if [ $arch == x86 ]; then
+        cp "$tmpdir/include/asm/hyperv.h" "$output/linux-headers/asm-x86"
+    fi
+done
+
+rm -rf "$output/linux-headers/linux"
+mkdir -p "$output/linux-headers/linux"
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+    cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
+done
+if [ -L "$linux/source" ]; then
+    cp "$linux/source/COPYING" "$output/linux-headers"
+else
+    cp "$linux/COPYING" "$output/linux-headers"
+fi
+
+rm -rf "$tmpdir"
-- 
1.7.1


Re: [PATCH v2 01/12] Add kernel header update script

2011-06-08 Thread Peter Maydell
On 8 June 2011 16:06, Jan Kiszka jan.kis...@siemens.com wrote:
> +    if [ $arch == x86 ]; then

This should be a single '=' -- '==' is a bashism. The 'checkbashisms'
script (available in 'devscripts' package on debian and ubuntu)
catches this:

cam-vm-266:maverick:testing$ checkbashisms scripts/update-linux-headers.sh
possible bashism in
/home/petmay01/linaro/qemu-from-laptop/qemu/scripts/update-linux-headers.sh
line 39 (should be 'b = a'):
if [ $arch == x86 ]; then
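
As a small sketch of the portable form (the helper name is_x86 and the
echoed messages are invented for this illustration; only the single '='
comparison corresponds to the suggested change):

```shell
# Sketch: a string comparison that works in any POSIX sh (dash, ash, bash).
# is_x86 is a hypothetical helper used only for this illustration.
is_x86() {
    # Single '=' is the POSIX test operator; '==' is a bash/ksh extension.
    # Quoting "$1" keeps the test well-formed even for an empty argument.
    [ "$1" = x86 ]
}

for arch in x86 powerpc s390; do
    if is_x86 "$arch"; then
        echo "$arch: also copy hyperv.h"
    else
        echo "$arch: kvm headers only"
    fi
done
```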


Otherwise looks good.

-- PMM


[PATCH v3 01/12] Add kernel header update script

2011-06-08 Thread Jan Kiszka
This helper pulls the required kernel headers for KVM and vhost into a
specified directory. The update is triggered via

scripts/update-linux-headers.sh LINUX_PATH

and will place the output under linux-headers/linux and linux-headers/asm-*.
It also imports the COPYING to care for headers without an explicit license.

CC: Alexander Graf ag...@suse.de
CC: Christoph Hellwig h...@lst.de
CC: Peter Maydell peter.mayd...@linaro.org
CC: Andreas Färber andreas.faer...@web.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Changes in v3:
 - remove bashism

Changes in v2:
 - add quoting
 - use mktemp
 - avoid -o for test

 linux-headers/README            |    2 +
 scripts/update-linux-headers.sh |   55 +++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/README
 create mode 100755 scripts/update-linux-headers.sh

diff --git a/linux-headers/README b/linux-headers/README
new file mode 100644
index 000..5c9026b
--- /dev/null
+++ b/linux-headers/README
@@ -0,0 +1,2 @@
+Automatically imported Linux kernel headers.
+Only use scripts/update-linux-headers.sh to update!
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
new file mode 100755
index 000..9d2a4bc
--- /dev/null
+++ b/scripts/update-linux-headers.sh
@@ -0,0 +1,55 @@
+#!/bin/sh -e
+#
+# Update Linux kernel headers QEMU requires from a specified kernel tree.
+#
+# Copyright (C) 2011 Siemens AG
+#
+# Authors:
+#  Jan Kiszka        <jan.kis...@siemens.com>
+#
+# This work is licensed under the terms of the GNU GPL version 2.
+# See the COPYING file in the top-level directory.
+
+tmpdir=`mktemp -d`
+linux="$1"
+output="$2"
+
+if [ -z "$linux" ] || ! [ -d "$linux" ]; then
+    cat << EOF
+usage: update-kernel-headers.sh LINUX_PATH [OUTPUT_PATH]
+
+LINUX_PATH      Linux kernel directory to obtain the headers from
+OUTPUT_PATH     output directory, usually the qemu source tree (default: $PWD)
+EOF
+    exit 1
+fi
+
+if [ -z "$output" ]; then
+    output="$PWD"
+fi
+
+for arch in x86 powerpc s390; do
+    make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch headers_install
+
+    rm -rf "$output/linux-headers/asm-$arch"
+    mkdir -p "$output/linux-headers/asm-$arch"
+    for header in kvm.h kvm_para.h; do
+        cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
+    done
+    if [ $arch = x86 ]; then
+        cp "$tmpdir/include/asm/hyperv.h" "$output/linux-headers/asm-x86"
+    fi
+done
+
+rm -rf "$output/linux-headers/linux"
+mkdir -p "$output/linux-headers/linux"
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+    cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
+done
+if [ -L "$linux/source" ]; then
+    cp "$linux/source/COPYING" "$output/linux-headers"
+else
+    cp "$linux/COPYING" "$output/linux-headers"
+fi
+
+rm -rf "$tmpdir"


Re: [PATCH v3 01/12] Add kernel header update script

2011-06-08 Thread Peter Maydell
On 8 June 2011 17:22, Jan Kiszka jan.kis...@siemens.com wrote:
> This helper pulls the required kernel headers for KVM and vhost into a
> specified directory. The update is triggered via
>
>     scripts/update-linux-headers.sh LINUX_PATH
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
> ---
>
> Changes in v3:
>  - remove bashism

Thanks; can't see any problems in this version.

Reviewed-by: Peter Maydell peter.mayd...@linaro.org

-- PMM


Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-08 Thread Brad Campbell

On 08/06/11 11:59, Eric Dumazet wrote:

> Well, a bisection definitely should help, but needs a lot of time in
> your case.

Yes. compile, test, crash, walk out to the other building to press
reset, lather, rinse, repeat.


I need a reset button on the end of a 50M wire, or a hardware watchdog!

Actually it's not so bad. If I turn off slub debugging the kernel panics 
and reboots itself.


This.. :
[2.913034] netconsole: remote ethernet address 00:16:cb:a7:dd:d1
[2.913066] netconsole: device eth0 not up yet, forcing it
[3.660062] Refined TSC clocksource calibration: 3213.422 MHz.
[3.660118] Switching to clocksource tsc
[   63.200273] r8169 :03:00.0: eth0: unable to load firmware patch 
rtl_nic/rtl8168e-1.fw (-2)

[   63.223513] r8169 :03:00.0: eth0: link down
[   63.223556] r8169 :03:00.0: eth0: link down

..is slowing down reboots considerably. 3.0-rc does _not_ like some 
timing hardware in my machine. Having said that, at least it does not 
randomly panic on SCSI like 2.6.39 does.


Ok, I've ruled out TCPMSS. Found out where it was being set and neutered 
it. I've replicated it with only the single DNAT rule.




> Could you try following patch, because this is the 'usual suspect' I had
> yesterday :
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 46cbd28..9f548f9 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
>  	}
>  
> +#if 0
>  	if (fastpath &&
>  	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
>  		memmove(skb->head + size, skb_shinfo(skb),
> @@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		off = nhead;
>  		goto adjust_others;
>  	}
> -
> +#endif
>  	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
>  	if (!data)
>  		goto nodata;





Nope.. that's not it. <sigh> That might have changed the characteristic 
of the fault slightly, but unfortunately I got caught with a couple of 
fsck's, so I only got to test it 3 times tonight.


It's unfortunate that this is a production system, so I can only take it 
down between about 9pm and 1am. That would normally be pretty 
productive, except that an fsck of a 14TB ext4 can take 30 minutes if it 
panics at the wrong time.


I'm out of time tonight, but I'll have a crack at some bisection 
tomorrow night. Now I just have to go back far enough that it works, and 
be near enough not to have to futz around with /proc /sys or drivers.


I really, really, really appreciate you guys helping me with this. It 
has been driving me absolutely bonkers. If I'm ever in the same town as 
any of you, dinner and drinks are on me.



restricting users to only power control of VMs

2011-06-08 Thread Iordan Iordanov

Hi,

As the subject suggests, we are wondering whether there is any way to 
restrict certain classes of users from performing any action other than 
powering a VM up and down, and resetting it?


If this can't be done with KVM, does anybody have suggestions on how 
this can be accomplished? The only way I can think of is with a setuid 
binary that can only start VMs and send reset and shutdown commands to 
its monitor socket. However, this does seem hackish and can be insecure 
if it's not written perfectly.


Cheers,
Iordan


differencing disks support

2011-06-08 Thread Iordan Iordanov
Does KVM support or plan to support differencing disks (where there is a 
read-only source disk, and each person running a virtual machine can 
save block-level changes that their virtual machine is making to the 
disk in a separate differencing image)?


If so, can somebody suggest how I may make use of this feature (i.e. 
building the newest version from source, and any other requirements).


Thanks!
Iordan


Re: differencing disks support

2011-06-08 Thread Dan VerWeire
On Wed, Jun 8, 2011 at 4:09 PM, Iordan Iordanov ior...@cdf.toronto.edu wrote:
> Does KVM support or plan to support differencing disks (where there is a
> read-only source disk, and each person running a virtual machine can save
> block-level changes that their virtual machine is making to the disk in a
> separate differencing image)?
>
> If so, can somebody suggest how I may make use of this feature (i.e.
> building the newest version from source, and any other requirements).
>
> Thanks!
> Iordan


I believe you could accomplish this with LVM2 snapshots. You would
create an LVM volume with the base install or set of data or whatever.
Then create snapshots of the original volume. Have your guests use the
snapshot volumes.
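
A rough sketch of what that could look like. It only prints the lvcreate
commands rather than running them, since creating snapshots needs root
and an existing volume group. The volume group vg0, the origin volume
base, the guest names and the 2G copy-on-write size are all assumptions
made up for this example.

```shell
# Sketch only: print (not run) one lvcreate command per guest snapshot.
# vg0, base, guest1/guest2 and 2G are placeholder values.
snapshot_cmd() {
    # $1 = volume group, $2 = origin LV, $3 = snapshot name, $4 = COW size
    echo "lvcreate --snapshot --size $4 --name $3 /dev/$1/$2"
}

for guest in guest1 guest2; do
    snapshot_cmd vg0 base "$guest" 2G
done
```

Each guest would then be pointed at its own snapshot device (for example
/dev/vg0/guest1 under the assumed names) instead of the base volume.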

This page mentions doing it with Xen in the last paragraph:
http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html

If you need support in qemu specifically for some reason, that's out
of my realm. Hope this helps though.

Dan VerWeire


[PATCH 1/2] kvm tools: Fix some SDL keyboard translations

2011-06-08 Thread Sasha Levin
This patch adds the previously unmapped '<', '>', '|', '-', '+' and '='
keys, which are quite useful in Linux.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/ui/sdl.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c
index 2e7c395..30fd511 100644
--- a/tools/kvm/ui/sdl.c
+++ b/tools/kvm/ui/sdl.c
@@ -20,7 +20,8 @@ static u8 keymap[255] = {
[17]= 0x3e, /* 8 */
[18]= 0x46, /* 9 */
[19]= 0x45, /* 9 */
-
+   [20]= 0x4e, /* - */
+   [21]= 0x55, /* + */
[22]= 0x66, /* backspace */
 
[24]= 0x15, /* q */
@@ -47,6 +48,8 @@ static u8 keymap[255] = {
[46]= 0x4b, /* l */
 
[50]= 0x12, /* left shift */
+   [51]= 0x5d, /* | */
+
 
[52]= 0x1a, /* z */
[53]= 0x22, /* x */
@@ -55,8 +58,9 @@ static u8 keymap[255] = {
[56]= 0x32, /* b */
[57]= 0x31, /* n */
[58]= 0x3a, /* m */
-
-   [61]= 0x4e, /* - */
+   [59]= 0x41, /* < */
+   [60]= 0x49, /* > */
+   [61]= 0x4a, /* / */
[62]= 0x59, /* right shift */
[65]= 0x29, /* space */
 };
-- 
1.7.5.3



[PATCH 2/2] kvm tools: Use double buffering with SDL

2011-06-08 Thread Sasha Levin
Page flip every time we copy the buffer over instead of invalidating
rects.
This should improve performance by letting hardware do the page
flipping.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/ui/sdl.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c
index 30fd511..59a6aa6 100644
--- a/tools/kvm/ui/sdl.c
+++ b/tools/kvm/ui/sdl.c
@@ -91,7 +91,7 @@ static void *sdl__thread(void *p)
 	if (!guest_screen)
 		die("Unable to create SDL RBG surface");
 
-	flags = SDL_HWSURFACE | SDL_ASYNCBLIT | SDL_HWACCEL;
+	flags = SDL_HWSURFACE | SDL_ASYNCBLIT | SDL_HWACCEL | SDL_DOUBLEBUF;
 
 	screen = SDL_SetVideoMode(fb->width, fb->height, fb->depth, flags);
 	if (!screen)
@@ -99,7 +99,7 @@ static void *sdl__thread(void *p)
 
 	for (;;) {
 		SDL_BlitSurface(guest_screen, NULL, screen, NULL);
-		SDL_UpdateRect(screen, 0, 0, 0, 0);
+		SDL_Flip(screen);
 
 		while (SDL_PollEvent(&ev)) {
 			switch (ev.type) {
-- 
1.7.5.3



Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-08 Thread Eric Dumazet
On Thursday 09 June 2011 at 01:02 +0800, Brad Campbell wrote:
> On 08/06/11 11:59, Eric Dumazet wrote:
>
> > Well, a bisection definitely should help, but needs a lot of time in
> > your case.
>
> Yes. compile, test, crash, walk out to the other building to press
> reset, lather, rinse, repeat.
>
> I need a reset button on the end of a 50M wire, or a hardware watchdog!
>
> Actually it's not so bad. If I turn off slub debugging the kernel panics
> and reboots itself.
>
> This.. :
> [2.913034] netconsole: remote ethernet address 00:16:cb:a7:dd:d1
> [2.913066] netconsole: device eth0 not up yet, forcing it
> [3.660062] Refined TSC clocksource calibration: 3213.422 MHz.
> [3.660118] Switching to clocksource tsc
> [   63.200273] r8169 :03:00.0: eth0: unable to load firmware patch
> rtl_nic/rtl8168e-1.fw (-2)
> [   63.223513] r8169 :03:00.0: eth0: link down
> [   63.223556] r8169 :03:00.0: eth0: link down
>
> ..is slowing down reboots considerably. 3.0-rc does _not_ like some
> timing hardware in my machine. Having said that, at least it does not
> randomly panic on SCSI like 2.6.39 does.
>
> Ok, I've ruled out TCPMSS. Found out where it was being set and neutered
> it. I've replicated it with only the single DNAT rule.
>
> > Could you try following patch, because this is the 'usual suspect' I had
> > yesterday :
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 46cbd28..9f548f9 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> >  		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
> >  	}
> >  
> > +#if 0
> >  	if (fastpath &&
> >  	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
> >  		memmove(skb->head + size, skb_shinfo(skb),
> > @@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> >  		off = nhead;
> >  		goto adjust_others;
> >  	}
> > -
> > +#endif
> >  	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
> >  	if (!data)
> >  		goto nodata;
>
> Nope.. that's not it. <sigh> That might have changed the characteristic
> of the fault slightly, but unfortunately I got caught with a couple of
> fsck's, so I only got to test it 3 times tonight.
>
> It's unfortunate that this is a production system, so I can only take it
> down between about 9pm and 1am. That would normally be pretty
> productive, except that an fsck of a 14TB ext4 can take 30 minutes if it
> panics at the wrong time.
>
> I'm out of time tonight, but I'll have a crack at some bisection
> tomorrow night. Now I just have to go back far enough that it works, and
> be near enough not to have to futz around with /proc /sys or drivers.
>
> I really, really, really appreciate you guys helping me with this. It
> has been driving me absolutely bonkers. If I'm ever in the same town as
> any of you, dinner and drinks are on me.

Hmm, I wonder if kmemcheck could help you, but it's slow as hell, so not
appropriate for production :(





Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-08 Thread Rusty Russell
On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu d...@redhat.com wrote:
> Hi Rusty,
> Yes, I can't figure out an instance of disk probing in parallel either, but as
> per the following commit, I think we still need use lock for safety. What's
> your opinion?
>
> commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
> Author: Tejun Heo t...@kernel.org
> Date:   Sat Feb 21 11:04:45 2009 +0900
>
>     [SCSI] sd: revive sd_index_lock
>
>     Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
>     use ida instead of idr incorrectly removed sd_index_lock around id
>     allocation and free.  idr/ida do have internal locks but they protect
>     their free object lists not the allocation itself.  The caller is
>     responsible for that.  This missing synchronization led to the same id
>     being assigned to multiple devices leading to oops.

I'm confused.  Tejun, Greg, anyone can probes happen in parallel?

If so, I'll have to review all my drivers.

Thanks,
Rusty.



Re: virtio scsi host draft specification, v3

2011-06-08 Thread Rusty Russell
On Tue, 07 Jun 2011 15:43:49 +0200, Paolo Bonzini pbonz...@redhat.com wrote:
> Hi all,
>
> after some preliminary discussion on the QEMU mailing list, I present a
> draft specification for a virtio-based SCSI host (controller, HBA, you
> name it).

OK, I'm impressed.  This is very well written and it doesn't make any of
the obvious mistakes wrt. virtio.

Unfortunately, I know almost nothing of SCSI, so I have to leave it to
others to decide if this is actually useful and sufficient.

I assume you have an implementation, as well?

Thanks,
Rusty.


[PATCH] tun: do not put self in waitq if doing a nonblock read

2011-06-08 Thread Amos Kong
Perf shows a relatively high rate (about 8%) of contention in
spin_lock_irqsave() when running netperf between an external host and a
guest. This is mainly because of lock contention between tun_do_read()
and tun_xmit_skb(), so this patch does not put the reader into the
waitqueue when doing a nonblocking read, which reduces this contention.
After this patch, it drops to 4%.

Signed-off-by: Jason Wang jasow...@redhat.com
Signed-off-by: Amos Kong ak...@redhat.com
---
 drivers/net/tun.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 74e9405..95dbff4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -817,7 +817,8 @@ static ssize_t tun_do_read(struct tun_struct *tun,
 
 	tun_debug(KERN_INFO, tun, "tun_chr_read\n");
 
-	add_wait_queue(&tun->wq.wait, &wait);
+	if (unlikely(!noblock))
+		add_wait_queue(&tun->wq.wait, &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
@@ -848,7 +849,8 @@ static ssize_t tun_do_read(struct tun_struct *tun,
 	}
 
 	current->state = TASK_RUNNING;
-	remove_wait_queue(&tun->wq.wait, &wait);
+	if (unlikely(!noblock))
+		remove_wait_queue(&tun->wq.wait, &wait);
 
 	return ret;
 }



Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-08 Thread Greg KH
On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote:
> On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu d...@redhat.com wrote:
> > Hi Rusty,
> > Yes, I can't figure out an instance of disk probing in parallel either, but as
> > per the following commit, I think we still need use lock for safety. What's
> > your opinion?
> >
> > commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
> > Author: Tejun Heo t...@kernel.org
> > Date:   Sat Feb 21 11:04:45 2009 +0900
> >
> >     [SCSI] sd: revive sd_index_lock
> >
> >     Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
> >     use ida instead of idr incorrectly removed sd_index_lock around id
> >     allocation and free.  idr/ida do have internal locks but they protect
> >     their free object lists not the allocation itself.  The caller is
> >     responsible for that.  This missing synchronization led to the same id
> >     being assigned to multiple devices leading to oops.
>
> I'm confused.  Tejun, Greg, anyone can probes happen in parallel?
>
> If so, I'll have to review all my drivers.

I know we've tried it in the past, at the PCI device level, and ran into
some issues, but I don't remember if that code ever made it into the
mainline kernel or not.

greg k-h


RE: [PATCH] KVM: Adjust shadow paging to work when SMEP=1 and CR0.WP=0

2011-06-08 Thread Li, Xin
Do we have test cases with guest.wp=0 in KVM test suite?
Thanks!
-Xin

> -----Original Message-----
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Monday, June 06, 2011 9:19 PM
> To: Marcelo Tosatti; kvm@vger.kernel.org; Yang, Wei Y; Shan, Haitao; Li, Xin
> Subject: [PATCH] KVM: Adjust shadow paging to work when SMEP=1 and CR0.WP=0
>
> When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
> the kernel to write to them).  Unfortunately this also allows the kernel
> to fetch from these pages, even if CR4.SMEP is set.
>
> Adjust for this by also setting NX on the spte in these circumstances.
>
> Signed-off-by: Avi Kivity a...@redhat.com
> ---
>
> Turned out a little more complicated than I thought.
>
>  Documentation/virtual/kvm/mmu.txt |   18 ++++++++++++++++++
>  arch/x86/include/asm/kvm_host.h   |    1 +
>  arch/x86/kvm/mmu.c                |   14 +++++++++++++-
>  3 files changed, 32 insertions(+), 1 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index f46aa58..5dc972c 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -165,6 +165,10 @@ Shadow pages contain the following information:
>      Contains the value of efer.nxe for which the page is valid.
>    role.cr0_wp:
>      Contains the value of cr0.wp for which the page is valid.
> +  role.smep_andnot_wp:
> +    Contains the value of cr4.smep && !cr0.wp for which the page is valid
> +    (pages for which this is true are different from other pages; see the
> +    treatment of cr0.wp=0 below).
>    gfn:
>      Either the guest page table containing the translations shadowed by this
>      page, or the base page frame for linear translations.  See role.direct.
> @@ -317,6 +321,20 @@ on fault type:
>  
>  (user write faults generate a #PF)
>  
> +In the first case there is an additional complication if CR4.SMEP is
> +enabled: since we've turned the page into a kernel page, the kernel may now
> +execute it.  We handle this by also setting spte.nx.  If we get a user
> +fetch or read fault, we'll change spte.u=1 and spte.nx=gpte.nx back.
> +
> +To prevent an spte that was converted into a kernel page with cr0.wp=0
> +from being written by the kernel after cr0.wp has changed to 1, we make
> +the value of cr0.wp part of the page role.  This means that an spte created
> +with one value of cr0.wp cannot be used when cr0.wp has a different value -
> +it will simply be missed by the shadow page lookup code.  A similar issue
> +exists when an spte created with cr0.wp=0 and cr4.smep=0 is used after
> +changing cr4.smep to 1.  To avoid this, the value of !cr0.wp && cr4.smep
> +is also made a part of the page role.
> +
>  Large pages
>  ===========
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index fc38eca..c7e7f53 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -205,6 +205,7 @@ union kvm_mmu_page_role {
>  		unsigned invalid:1;
>  		unsigned nxe:1;
>  		unsigned cr0_wp:1;
> +		unsigned smep_andnot_wp:1;
>  	};
>  };
>  
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2d14434..823f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1985,8 +1985,17 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		spte |= PT_WRITABLE_MASK;
>  
>  		if (!vcpu->arch.mmu.direct_map
> -		    && !(pte_access & ACC_WRITE_MASK))
> +		    && !(pte_access & ACC_WRITE_MASK)) {
>  			spte &= ~PT_USER_MASK;
> +			/*
> +			 * If we converted a user page to a kernel page,
> +			 * so that the kernel can write to it when cr0.wp=0,
> +			 * then we should prevent the kernel from executing it
> +			 * if SMEP is enabled.
> +			 */
> +			if (!kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
> +				spte |= PT64_NX_MASK;
> +		}
>  
>  		/*
>  		 * Optimization: for pte sync, if spte was writable the hash
> @@ -2955,6 +2964,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
>  int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>  {
>  	int r;
> +	bool smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
>  	ASSERT(vcpu);
>  	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
>  
> @@ -2969,6 +2979,8 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>  
>  	vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
>  	vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
> +	vcpu->arch.mmu.base_role.smep_andnot_wp
> +		= smep && !is_write_protection(vcpu);
>  
>  	return r;
>  }
> --
> 1.7.5.3
