[PATCH] Merge branch 'qemu-cvs'

2009-02-19 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* qemu-cvs:
  Fix cpu_physical_memory_rw() for 64-bit I/O accesses
  Avoid running audio ctl's when vm is not running
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: external module: backward compatibility for compound_head()

2009-02-19 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kernel/external-module-compat-comm.h b/kernel/external-module-compat-comm.h
index 6e9a90a..527ab58 100644
--- a/kernel/external-module-compat-comm.h
+++ b/kernel/external-module-compat-comm.h
@@ -734,3 +734,16 @@ int kvm_pcidev_msi_enabled(struct pci_dev *dev);
 #define kvm_pcidev_msi_enabled(dev) (dev)->msi_enabled
 
 #endif
+
+/* compound_head() was introduced in 2.6.22 */
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,22)
+
+static inline struct page *compound_head(struct page *page)
+{
+	if (PageCompound(page))
+		page = (struct page *)page_private(page);
+	return page;
+}
+
+#endif


[PATCH] KVM: MMU: remove redundant check in mmu_set_spte

2009-02-19 Thread Avi Kivity
From: Joerg Roedel joerg.roe...@amd.com

The following code flow is unnecessary:

if (largepage)
	was_rmapped = is_large_pte(*shadow_pte);
else
	was_rmapped = 1;

The is_large_pte() function will always evaluate to one here because the
(largepage && !is_large_pte) case is already handled in the first
if-clause. So we can remove this check and set was_rmapped to one always
here.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
Acked-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ef060ec..c90b4b2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1791,12 +1791,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
pgprintk("hfn old %lx new %lx\n",
 spte_to_pfn(*shadow_pte), pfn);
rmap_remove(vcpu->kvm, shadow_pte);
-   } else {
-   if (largepage)
-   was_rmapped = is_large_pte(*shadow_pte);
-   else
-   was_rmapped = 1;
-   }
+   } else
+   was_rmapped = 1;
}
if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
  dirty, largepage, global, gfn, pfn, speculative, true)) {


Re: With -vnc option, can I still use ctrl+alt + n?

2009-02-19 Thread Neo Jia
On Wed, Feb 18, 2009 at 11:38 PM, Tomasz Chmielewski man...@wpkg.org wrote:
 Neo Jia schrieb:

 hi,

 I am trying kvm-84 and with -vnc option I can't use ctrl + alt + n
 key to get the qemu system console. Is there any way to make this work?

 Use Qemu/KVM monitor and its sendkey function.

Sorry, could you specify the command line option? I don't know how to
get into monitor.

Thanks,
Neo



 For example:

 sendkey alt-f3


 --
 Tomasz Chmielewski
 http://wpkg.org




-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: With -vnc option, can I still use ctrl+alt + n?

2009-02-19 Thread Alexander Graf


On 19.02.2009, at 09:11, Neo Jia neo...@gmail.com wrote:

On Wed, Feb 18, 2009 at 11:38 PM, Tomasz Chmielewski  
man...@wpkg.org wrote:

Neo Jia schrieb:


hi,

I am trying kvm-84 and with -vnc option I can't use ctrl + alt + n
key to get the qemu system console. Is there any way to make this
work?


Use Qemu/KVM monitor and its sendkey function.


Sorry, could you specify the command line option? I don't know how to
get into monitor.


You could pass -monitor stdio to qemu. That gives you the monitor on  
the shell you started qemu from.


Are you using a mac to access vnc? Try ctrl-apple-2 then.

Alex




Thanks,
Neo




For example:

sendkey alt-f3


--
Tomasz Chmielewski
http://wpkg.org







Re: With -vnc option, can I still use ctrl+alt + n?

2009-02-19 Thread Dongsheng Song
2009/2/19 Alexander Graf ag...@suse.de:

 You could pass -monitor stdio to qemu. That gives you the monitor on the
 shell you started qemu from.


So if I run kvm as a daemon:

kvm -name Linux-x64 -smp 2 -m 2048M -hda hda.img \
-cdrom ../../var/iso/debian-500-amd64-DVD-1.iso \
-net nic,vlan=0,macaddr=52:54:00:12:34:00,model=e1000 \
-net tap,vlan=0,ifname=tap00,script=no \
-net nic,vlan=1,macaddr=52:54:00:12:34:10,model=e1000 \
-net tap,vlan=1,ifname=tap10,script=no \
-vnc :10 -daemonize

After I log out of the host, I can't get to the qemu system console anymore?

---
Dongsheng Song


Re: With -vnc option, can I still use ctrl+alt + n?

2009-02-19 Thread Neo Jia
On Thu, Feb 19, 2009 at 12:22 AM, Alexander Graf ag...@suse.de wrote:

 On 19.02.2009, at 09:11, Neo Jia neo...@gmail.com wrote:

 On Wed, Feb 18, 2009 at 11:38 PM, Tomasz Chmielewski man...@wpkg.org
 wrote:

 Neo Jia schrieb:

 hi,

 I am trying kvm-84 and with -vnc option I can't use ctrl + alt + n
 key to get the qemu system console. Is there any way to make this work?

 Use Qemu/KVM monitor and its sendkey function.

 Sorry, could you specify the command line option? I don't know how to
 get into monitor.

 You could pass -monitor stdio to qemu. That gives you the monitor on the
 shell you started qemu from.

Yes, this works for me. Since I am already giving stdio to the serial port, I need
to move the serial port somewhere else.

I am going to start another thread about what I just found in the qemu monitor.

Thanks,
Neo


 Are you using a mac to access vnc? Try ctrl-apple-2 then.

 Alex



 Thanks,
 Neo



 For example:

 sendkey alt-f3


 --
 Tomasz Chmielewski
 http://wpkg.org








-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!


qemu info registers doesn't match the one I saw from kgdb?

2009-02-19 Thread Neo Jia
hi,

I am seeing something different between info registers from qemu
monitor window vs. kgdb. This is a 32-bit Linux guest running on
KVM-84.

When I just break into the guest kernel with kgdb, I tried the
following commands:

(qemu) info registers
EAX=00010060 EBX=c0471e3c ECX= EDX=02fd
ESI=02fd EDI=c04c5d20 EBP=c0471ddc ESP=c0471ddc
EIP=c021129b EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b   00c0f300
CS =0060   00c09b00
SS =0068   00c09300
DS =007b   00c0f300
FS =   
GS =   
LDT=   
TR =   8b00
GDT= c0407a80 00ff
IDT= c0464000 07ff
CR0=80050033 CR2= CR3=004aa000 CR4=
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
FCW=037f FSW= [ST=0] FTW=00 MXCSR=
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=

But from Windbg, I got:

(gdb) info registers
eax0x0  0x0
ecx0xc  0xc
edx0x0  0x0
ebx0x0  0x0
esp0xc0471f14   0xc0471f14
ebp0xc0471fc0   0xc0471fc0
esi0xc04ac07a   0xc04ac07a
edi0xc04ad1f9   0xc04ad1f9
eip0xc047a853   0xc047a853 setup_arch+1036
eflags 0x86 [ PF SF ]
cs 0x60 0x60
ss 0x68 0x68
ds 0xc049007b   0xc049007b
es 0x7b 0x7b
fs 0x   0x
gs 0x   0x

So, which one is correct? Do we still maintain the info registers on qemu?

Thanks,
Neo

-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!


Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)

2009-02-19 Thread Gerd Hoffmann
Anthony Liguori wrote:
 Are you suggesting that one should use cpufreq on a CPU without a
 constant tsc?  Isn't this just asking for trouble?

Depends on the (guest) clock source ;)

tsc isn't going to do well obviously.

kvmclock is designed to handle tsc frequency changes just fine.
And with the kvm-84 kernel module it actually works correctly.

HTH,
  Gerd




Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)

2009-02-19 Thread Tomasz Chmielewski

Gerd Hoffmann schrieb:

Anthony Liguori wrote:

Are you suggesting that one should use cpufreq on a CPU without a
constant tsc?  Isn't this just asking for trouble?


Depends on the (guest) clock source ;)

tsc isn't going to do well obviously.

kvmclock is designed to handle tsc frequency changes just fine.
And with the kvm-84 kernel module it actually works correctly.


So with Linux virtio guests I may have luck, but not so with Windows, 
which can't (yet?) use kvm-clock. Correct?
(it may be some time before I'm able to upgrade and check how it really 
works).



--
Tomasz Chmielewski
http://wpkg.org



[PATCH 1/1] kvm: bios: make MMIO address page aligned in guest

2009-02-19 Thread Han, Weidong
The MMIO regions of some devices are not page aligned, such as some EHCI
controllers and the virtual Realtek NIC in the guest. The current guest
BIOS doesn't guarantee that MMIO start addresses are page aligned.
This may cause device assignment to fail, because KVM only allows
page-aligned memory slots to be registered. For example, assigning an
EHCI controller (MMIO size 1 KB) fails alongside the virtual Realtek NIC
(MMIO size 256 bytes): the NIC's MMIO in the guest starts at 0xf2001000,
so the EHCI controller's MMIO starts at 0xf2001400, within the same page.

MMIO addresses in the guest are allocated by the guest BIOS. This patch
makes MMIO addresses page aligned in the BIOS, which fixes the issue.

Signed-off-by: Weidong Han weidong@intel.com
---
 bios/rombios32.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/bios/rombios32.c b/bios/rombios32.c
index 9d2eaaa..4dea066 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -967,6 +967,9 @@ static void pci_bios_init_device(PCIDevice *d)
 *paddr = (*paddr + size - 1) & ~(size - 1);
 pci_set_io_region_addr(d, i, *paddr);
 *paddr += size;
+/* make memory address page aligned */
+if (!(val & PCI_ADDRESS_SPACE_IO))
+*paddr = (*paddr + 0xfff) & 0xfffff000;
 }
 }
 break;
-- 
1.6.0.4




Re: qemu info registers doesn't match the one I saw from kgdb?

2009-02-19 Thread Jan Kiszka
Neo Jia wrote:
 hi,
 
 I am seeing something different between info registers from qemu
 monitor window vs. kgdb. This is a 32-bit Linux guest running on
 KVM-84.
 
 When I just break into the guest kernel with kgdb, I tried the
 following commands:
 
 (qemu) info registers
 EAX=00010060 EBX=c0471e3c ECX= EDX=02fd
 ESI=02fd EDI=c04c5d20 EBP=c0471ddc ESP=c0471ddc
 EIP=c021129b EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =007b   00c0f300
 CS =0060   00c09b00
 SS =0068   00c09300
 DS =007b   00c0f300
 FS =   
 GS =   
 LDT=   
 TR =   8b00
 GDT= c0407a80 00ff
 IDT= c0464000 07ff
 CR0=80050033 CR2= CR3=004aa000 CR4=
 DR0= DR1= DR2= DR3=
 DR6=0ff0 DR7=0400
 FCW=037f FSW= [ST=0] FTW=00 MXCSR=
 FPR0=  FPR1= 
 FPR2=  FPR3= 
 FPR4=  FPR5= 
 FPR6=  FPR7= 
 XMM00= XMM01=
 XMM02= XMM03=
 XMM04= XMM05=
 XMM06= XMM07=
 
 But from Windbg, I got:
 
 (gdb) info registers
 eax0x0  0x0
 ecx0xc  0xc
 edx0x0  0x0
 ebx0x0  0x0
 esp0xc0471f14   0xc0471f14
 ebp0xc0471fc0   0xc0471fc0
 esi0xc04ac07a   0xc04ac07a
 edi0xc04ad1f9   0xc04ad1f9
 eip0xc047a853   0xc047a853 setup_arch+1036
 eflags 0x86 [ PF SF ]
 cs 0x60 0x60
 ss 0x68 0x68
 ds 0xc049007b   0xc049007b
 es 0x7b 0x7b
 fs 0x   0x
 gs 0x   0x
 
 So, which one is correct? Do we still maintain the info registers on qemu?

Yes, we do maintain them (for now only in the kvm tree, upstream is yet
lacking a few patches). But you have to keep in mind that, when you take
a snapshot of the guest running inside Windbg via info registers (or
via the built-in gdbstub), you actually debug Windbg itself, no longer
the guest kernel code Windbg is interrupting. That's why you see
different EIP values...

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)

2009-02-19 Thread Gerd Hoffmann
Tomasz Chmielewski wrote:
 So with Linux virtio guests I may have luck, but not so with Windows,
 which can't (yet?) use kvm-clock. Correct?

tsc isn't the only clocksource, there are also hpet and acpi (pm timer),
they shouldn't have trouble with tsc freq changes.  Dunno what windows
uses by default.

cheers,
  Gerd



Re: copyless virtio net thoughts?

2009-02-19 Thread Rusty Russell
On Thursday 19 February 2009 02:54:06 Arnd Bergmann wrote:
 On Wednesday 18 February 2009, Rusty Russell wrote:
 
  2) Direct NIC attachment
  This is particularly interesting with SR-IOV or other multiqueue nics,
  but for boutique cases or benchmarks, could be for normal NICs.  So
  far I have some very sketched-out patches: for the attached nic 
  dev_alloc_skb() gets an skb from the guest (which supplies them via
  some kind of AIO interface), and a branch in netif_receive_skb()
  which returned it to the guest.  This bypasses all firewalling in
  the host though; we're basically having the guest process drive
  the NIC directly.   
 
 If this is not passing the PCI device directly to the guest, but
 uses your concept, wouldn't it still be possible to use the firewalling
 in the host? You can always inspect the headers, drop the frame, etc
 without copying the whole frame at any point.

It's possible, but you don't want routing or parsing, etc: the NIC
is just directly attached to the guest.

You could do it in qemu or whatever, but it would not be the kernel scheme
(netfilter/iptables).

  3) Direct interguest networking
  Anthony has been thinking here: vmsplice has already been mentioned.
  The idea of passing directly from one guest to another is an
  interesting one: using dma engines might be possible too.  Again,
  host can't firewall this traffic.  Simplest as a dedicated internal
  lan NIC, but we could theoretically do a fast-path for certain MAC
  addresses on a general guest NIC. 
 
 Another option would be to use an SR-IOV adapter from multiple guests,
 with a virtual ethernet bridge in the adapter. This moves the overhead
 from the CPU to the bus and/or adapter, so it may or may not be a real
 benefit depending on the workload.

Yes, I guess this should work.  Even different SR-IOV adapters will simply
send to one another.  I'm not sure this obviates the desire to have direct
inter-guest which is more generic though.

Thanks!
Rusty.


[PATCH] kvm mmu: fix another largepage memory leak

2009-02-19 Thread Joerg Roedel
In the paging_fetch function rmap_remove is called after setting a large
pte to non-present. This causes rmap_remove to not drop the reference to
the large page. The result is a memory leak of that page.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/paging_tmpl.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7314c09..0f11792 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -306,9 +306,9 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
continue;
 
if (is_large_pte(*sptep)) {
+   rmap_remove(vcpu->kvm, sptep);
set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
kvm_flush_remote_tlbs(vcpu->kvm);
-   rmap_remove(vcpu->kvm, sptep);
}
 
if (level == PT_DIRECTORY_LEVEL
-- 
1.5.6.4




Re: copyless virtio net thoughts?

2009-02-19 Thread Rusty Russell
On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
 On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
  
  2) Direct NIC attachment This is particularly interesting with SR-IOV or
  other multiqueue nics, but for boutique cases or benchmarks, could be for
  normal NICs.  So far I have some very sketched-out patches: for the
  attached nic dev_alloc_skb() gets an skb from the guest (which supplies
  them via some kind of AIO interface), and a branch in netif_receive_skb()
  which returned it to the guest.  This bypasses all firewalling in the
  host though; we're basically having the guest process drive the NIC
  directly.
 
 Hi Rusty,
 
 Can I clarify that the idea with utilising SR-IOV would be to assign
 virtual functions to guests? That is, something conceptually similar to
 PCI pass-through in Xen (although I'm not sure that anyone has virtual
 function pass-through working yet).

Not quite: I think PCI passthrough IMHO is the *wrong* way to do it: it makes 
migration complicated (if not impossible), and requires emulation or the same NIC 
on the destination host.

This would be the *host* seeing the virtual functions as multiple NICs, then
the ability to attach a given NIC directly to a process.

This isn't guest-visible: the kvm process is configured to connect directly to 
a NIC, rather than (say) bridging through the host.

 If so, wouldn't this also be useful
 on machines that have multiple NICs?

Yes, but mainly as a benchmark hack AFAICT :)

Hope that clarifies,
Rusty.


Re: copyless virtio net thoughts?

2009-02-19 Thread Chris Wright
* Simon Horman (ho...@verge.net.au) wrote:
 On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
  2) Direct NIC attachment This is particularly interesting with SR-IOV or
  other multiqueue nics, but for boutique cases or benchmarks, could be for
  normal NICs.  So far I have some very sketched-out patches: for the
  attached nic dev_alloc_skb() gets an skb from the guest (which supplies
  them via some kind of AIO interface), and a branch in netif_receive_skb()
  which returned it to the guest.  This bypasses all firewalling in the
  host though; we're basically having the guest process drive the NIC
  directly.
 
 Can I clarify that the idea with utilising SR-IOV would be to assign
 virtual functions to guests? That is, something conceptually similar to
 PCI pass-through in Xen (although I'm not sure that anyone has virtual
 function pass-through working yet). If so, wouldn't this also be useful
 on machines that have multiple NICs?

This would be the typical usecase for sr-iov.  But I think Rusty is
referring to giving a nic directly to a guest but the guest is still
seeing a virtio nic (not pass-through/device-assignment).  So there's
no bridge, and zero copy so the dma buffers are supplied by guest,
but host has the driver for the physical nic or the VF.

thanks,
-chris


Re: [PATCH] kvm mmu: fix another largepage memory leak

2009-02-19 Thread Marcelo Tosatti

On Thu, Feb 19, 2009 at 12:18:56PM +0100, Joerg Roedel wrote:
 In the paging_fetch function rmap_remove is called after setting a large
 pte to non-present. This causes rmap_remove to not drop the reference to
 the large page. The result is a memory leak of that page.
 
 Signed-off-by: Joerg Roedel joerg.roe...@amd.com
 ---
  arch/x86/kvm/paging_tmpl.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
 index 7314c09..0f11792 100644
 --- a/arch/x86/kvm/paging_tmpl.h
 +++ b/arch/x86/kvm/paging_tmpl.h
 @@ -306,9 +306,9 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
   continue;
  
   if (is_large_pte(*sptep)) {
 + rmap_remove(vcpu->kvm, sptep);
   set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
   kvm_flush_remote_tlbs(vcpu->kvm);
 - rmap_remove(vcpu->kvm, sptep);
   }
  
   if (level == PT_DIRECTORY_LEVEL
 -- 

ACK



Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)

2009-02-19 Thread Tomasz Chmielewski

Marcelo Tosatti schrieb:

On Wed, Feb 18, 2009 at 09:02:31PM +0100, Tomasz Chmielewski wrote:

Marcelo Tosatti schrieb:

On Wed, Feb 18, 2009 at 08:18:50PM +0100, Tomasz Chmielewski wrote:

Marcelo Tosatti schrieb:

- what CPU frequency will the guests show? Current host 
frequency? Host  frequency from the moment the guest booted (i.e. 
right now the guest  will show 1GHz even if the host is running 
at 2GHz, or the way around)?

Host frequency from the moment the guest booted, since the guest does
not receive frequency change notifications.
Is it possible (or is it planned) to pass frequency to the guest (the 
 one which is displayed in /proc/cpuinfo)?

Possible, not planned AFAIK.

Possible, right now? How?


Write a paravirt notification scheme.


That's a bit low level.

I was thinking of a parameter to kvm (binary) which would pass the value 
to the guest.



--
Tomasz Chmielewski
http://wpkg.org



Re: copyless virtio net thoughts?

2009-02-19 Thread Arnd Bergmann
On Thursday 19 February 2009, Rusty Russell wrote:

 Not quite: I think PCI passthrough IMHO is the *wrong* way to do it:
 it makes migrate complicated (if not impossible), and requires
 emulation or the same NIC on the destination host.  
 
 This would be the *host* seeing the virtual functions as multiple
 NICs, then the ability to attach a given NIC directly to a process.

I guess what you mean then is what Intel calls VMDq, not SR-IOV.
Eddie has some slides about this at
http://docs.huihoo.com/kvm/kvmforum2008/kdf2008_7.pdf .

The latest network cards support both operation modes, and it
appears to me that there is a place for both. VMDq gives you
the best performance without limiting flexibility, while SR-IOV
performance in theory can be even better, but sacrificing a
lot of flexibility and potentially local (guest-to-guest)
performance.

AFAICT, any card that supports SR-IOV should also allow a VMDq
like model, as you describe.

Arnd 


[PATCH] kvm/qemu: use statfs to determine size of huge pages

2009-02-19 Thread Joerg Roedel
The current method of finding out the size of huge pages no longer works
reliably. Current Linux supports more than one huge page size,
but /proc/meminfo only shows one of the supported sizes.
The page size actually in use can be found by calling statfs() on the
hugetlbfs mount. This patch changes kvm/qemu to use statfs instead of
parsing /proc/meminfo.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 qemu/sysemu.h |2 +-
 qemu/vl.c |   42 --
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/qemu/sysemu.h b/qemu/sysemu.h
index 19464cf..4333495 100644
--- a/qemu/sysemu.h
+++ b/qemu/sysemu.h
@@ -99,7 +99,7 @@ extern int graphic_rotate;
 extern int no_quit;
 extern int semihosting_enabled;
 extern int old_param;
-extern int hpagesize;
+extern long hpagesize;
 extern const char *bootp_filename;
 
 #ifdef USE_KQEMU
diff --git a/qemu/vl.c b/qemu/vl.c
index bbd7aa3..b8c1162 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -61,6 +61,7 @@
 #include <sys/ioctl.h>
 #include <sys/resource.h>
 #include <sys/socket.h>
+#include <sys/vfs.h>
 #include <netinet/in.h>
 #include <net/if.h>
 #if defined(__NetBSD__)
@@ -254,7 +255,7 @@ const char *mem_path = NULL;
 #ifdef MAP_POPULATE
 int mem_prealloc = 1;  /* force preallocation of physical target memory */
 #endif
-int hpagesize = 0;
+long hpagesize = 0;
 const char *cpu_vendor_string;
 #ifdef TARGET_ARM
 int old_param = 0;
@@ -4717,32 +4718,29 @@ void qemu_get_launch_info(int *argc, char ***argv, int *opt_daemonize, const cha
 }
 
 #ifdef USE_KVM
-static int gethugepagesize(void)
+
+#define HUGETLBFS_MAGIC   0x958458f6
+
+static long gethugepagesize(const char *path)
 {
-int ret, fd;
-char buf[4096];
-const char *needle = "Hugepagesize:";
-char *size;
-unsigned long hugepagesize;
+struct statfs fs;
+int ret;
 
-fd = open("/proc/meminfo", O_RDONLY);
-if (fd < 0) {
-   perror("open");
-   exit(0);
+do {
+   ret = statfs(path, &fs);
+} while (ret != 0 && errno == EINTR);
+
+if (ret != 0) {
+   perror("statfs");
+   return 0;
 }
 
-ret = read(fd, buf, sizeof(buf));
-if (ret < 0) {
-   perror("read");
-   exit(0);
+if (fs.f_type != HUGETLBFS_MAGIC) {
+   fprintf(stderr, "Path not on HugeTLBFS: %s\n", path);
+   return 0;
 }
 
-size = strstr(buf, needle);
-if (!size)
-   return 0;
-size += strlen(needle);
-hugepagesize = strtol(size, NULL, 0);
-return hugepagesize;
+return fs.f_bsize;
 }
 
 static void *alloc_mem_area(size_t memory, unsigned long *len, const char *path)
@@ -4762,7 +4760,7 @@ static void *alloc_mem_area(size_t memory, unsigned long *len, const char *path)
 if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
 
-hpagesize = gethugepagesize() * 1024;
+hpagesize = gethugepagesize(path);
 if (!hpagesize)
return NULL;
 
-- 
1.5.6.4




Re: [PATCH v3.14] Report IRQ injection status to userspace.

2009-02-19 Thread Avi Kivity

Gleb Natapov wrote:

IRQ injection status is either -1 (if no CPU was found
that should accept the interrupt, because the IRQ was masked,
the IOAPIC was misconfigured, or ...) or >= 0, in which case the
number indicates how many CPUs the interrupt was injected into.
If the value is 0, it means that the interrupt was coalesced
and probably should be reinjected.
  


Applied, thanks.  I hacked kvm_set_msi() to return 1 always, please 
follow up with a fix to that.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Recent kvm and vmware server comparisons?

2009-02-19 Thread Hans de Bruin

Martin Maurer wrote:

I suppose no-one has any?



VMware includes in its EULA (End User License Agreement) a prohibition for any 
licensee to publish benchmark results without VMware's approval.
(see https://www.vmware.com/tryvmware/eula.php)

Maybe this is a reason why all published VMWare benchmarks look quite similar :-)

I would love to see a comparison, but due to these restrictions it's hard to get
independent results.



Why compare kvm to vmware and not to real hardware? The results can then
be compared to vmware/hardware and hyper-v/hardware.


--
Hans


[ kvm-Bugs-2617499 ] Patch from upstream attached

2009-02-19 Thread SourceForge.net
Bugs item #2617499, was opened at 2009-02-19 12:33
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2617499&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jeff (toxxic)
Assigned to: Nobody/Anonymous (nobody)
Summary: Patch from upstream attached

Initial Comment:

This is a patch, derived from the QEMU subversion repository.  It fixes this 
problem, which could potentially cause data corruption.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2617499&group_id=180599


[ kvm-Bugs-2617499 ] Patch from upstream attached

2009-02-19 Thread SourceForge.net
Bugs item #2617499, was opened at 2009-02-19 12:33
Message generated for change (Comment added) made by toxxic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2617499&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Deleted
Resolution: None
Priority: 1
Private: No
Submitted By: Jeff (toxxic)
Assigned to: Nobody/Anonymous (nobody)
Summary: Patch from upstream attached

Initial Comment:

This is a patch, derived from the QEMU subversion repository.  It fixes this 
problem, which could potentially cause data corruption.

--

Comment By: Jeff (toxxic)
Date: 2009-02-19 12:35

Message:
Blah...  This was supposed to be attached to bug #2556746.  



--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2617499&group_id=180599


[ kvm-Bugs-2556746 ] FreeBSD/PC-BSD text screen corruption

2009-02-19 Thread SourceForge.net
Bugs item #2556746, was opened at 2009-02-02 04:19
Message generated for change (Comment added) made by toxxic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2556746&group_id=180599

Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Tim Knowles (knowlet)
Assigned to: Nobody/Anonymous (nobody)
Summary: FreeBSD/PC-BSD text screen corruption 

Initial Comment:
Using either kvm-83, kvm-82 or kvm-81, I am unable to install FreeBSD or PC-BSD
due to screen corruption (screenshot attached).  The initial boot menu is shown
and is legible.  Once you have selected the boot option and the boot process
continues, the screen becomes corrupted.  I initially discovered the problem
when setting up an LVM-backed guest in virt-manager, but I have included a
minimal command line below that allows you to trigger it.

1) It would appear that this problem was introduced in kvm-81 (kvm-80 does not
exhibit the problem with FreeBSD or PC-BSD, but I have not tested any other
versions of kvm).
2) If I use the -no-kvm switch with kvm-83, this problem does not occur.

Details:
Host: 1 x Intel Core i7 920, Fedora 10 64-bit, 6 GB memory (Dell Studio XPS 435)
kvm-83: self compiled - gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)
cmd line:  /usr/local/bin/qemu-system-x86_64 -m 512 -cdrom 
7.1-RELEASE-amd64-dvd1.iso

Guests:
FreeBSD 7.1
PC-BSD 7.0.2

PS: I'd also like to add my thanks for creating KVM; it's a fabulous tool. Many
thanks

--

Comment By: Jeff (toxxic)
Date: 2009-02-19 12:56

Message:

The QEMU subversion browser can generate a patch for this issue:

http://svn.savannah.gnu.org/viewvc/trunk/exec.c?r1=6601&r2=6628&pathrev=6628&root=qemu&view=patch

This patch installs cleanly against qemu/exec.c in KVM-81 and KVM-84.  

--

Comment By: Aurelien Jarno (aurel32)
Date: 2009-02-18 13:38

Message:
This is fixed in revision 6628 of QEMU, so it will probably reach KVM soon.
Relying on a workaround like the ones suggested earlier in this thread is a
bad idea, as the screen is probably not the only thing affected by this bug:
some data may be corrupted as well.

--

Comment By: Radek Hladik (kedarius)
Date: 2009-02-04 11:06

Message:
Confirming the problem too. 
kvm-83-2.fc11.x86_64
libvirt-0.6.0-1.fc11.x86_64
virt-manager-0.6.1-1.fc11.x86_64
qemu-0.9.1-12.fc11.x86_64

For libvirt and virt-manager users, this is how to use the workaround
mentioned by toxxic:
Press 6 in the boot menu and type
set console=comconsole
then open View->Serial Consoles and type
boot
(choose xterm as the terminal type)



--

Comment By: Jeff (toxxic)
Date: 2009-02-03 23:57

Message:

I can confirm this happens when using VNC for the console.

Here's a workaround:

Start kvm with a -serial flag.  You're going to use it as a serial
console.
qemu-system-x86_64 -serial telnet::2226,server,nowait -cdrom
7.1-RELEASE-amd64-disc1.iso [...]

Then connect to port 2226:
telnet localhost 2226

Then boot the FreeBSD CD. When the (legible) boot loader menu comes up,
choose "6. Escape to loader prompt".

At the OK prompt, type:  set console=comconsole

The OK prompt will now appear in your telnet session.  Type boot and hit
return.  Continue with legible FreeBSD install via your telnet session.

You may want to set up a serial console on the FreeBSD system that you
installed, as well.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2556746&group_id=180599


kvm-userspace build break (linux/types.h)

2009-02-19 Thread Hollis Blanchard
A recent kernel merge breaks kvm-userspace build:
make[1]: Entering directory `/root/hollisb/kvm-userspace.git/libkvm'
gcc -m64 -D__x86_64__ -MMD -MF ./.libkvm.d -g -fomit-frame-pointer 
-Wall  -fno-stack-protector   -I /root/hollisb/kvm-userspace.git/kernel/include 
  -c -o libkvm.o libkvm.c
In file included from /usr/include/bits/fcntl.h:24,
 from /usr/include/fcntl.h:34,
 from libkvm.c:30:
/usr/include/sys/types.h:46: error: conflicting types for ‘loff_t’
/usr/include/linux/types.h:30: error: previous declaration of ‘loff_t’ 
was here
/usr/include/sys/types.h:62: error: conflicting types for ‘dev_t’
/usr/include/linux/types.h:13: error: previous declaration of ‘dev_t’ 
was here
[...]

I built like so:
./configure
make -C kernel LINUX=/path/to/kvm.git sync
make

The problem appears to be 00bfddaf7f68a6551319b536f052040c370756b0 and
cef3767852a9b1a7ff4a8dfe0969e2d32eb728df, both from Jaswinder Singh
Rajput jaswin...@infradead.org: 
-#include <asm/types.h>
+#include <linux/types.h>

With these changes, libkvm.c ends up
including /usr/include/linux/types.h, instead of the
intended ../kernel/include/linux/types.h.

Avi, suggestions? More make sync hacks?

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: kvm-userspace build break (linux/types.h)

2009-02-19 Thread Joerg Roedel
On Thu, Feb 19, 2009 at 03:50:14PM -0600, Hollis Blanchard wrote:
> A recent kernel merge breaks kvm-userspace build:
> make[1]: Entering directory `/root/hollisb/kvm-userspace.git/libkvm'
> gcc -m64 -D__x86_64__ -MMD -MF ./.libkvm.d -g -fomit-frame-pointer
> -Wall  -fno-stack-protector   -I
> /root/hollisb/kvm-userspace.git/kernel/include   -c -o libkvm.o libkvm.c
> In file included from /usr/include/bits/fcntl.h:24,
>  from /usr/include/fcntl.h:34,
>  from libkvm.c:30:
> /usr/include/sys/types.h:46: error: conflicting types for ‘loff_t’
> /usr/include/linux/types.h:30: error: previous declaration of
> ‘loff_t’ was here
> /usr/include/sys/types.h:62: error: conflicting types for ‘dev_t’
> /usr/include/linux/types.h:13: error: previous declaration of ‘dev_t’
> was here
> [...]
>
> I built like so:
> ./configure
> make -C kernel LINUX=/path/to/kvm.git sync
> make
>
> The problem appears to be 00bfddaf7f68a6551319b536f052040c370756b0 and
> cef3767852a9b1a7ff4a8dfe0969e2d32eb728df, both from Jaswinder Singh
> Rajput jaswin...@infradead.org:
> -#include <asm/types.h>
> +#include <linux/types.h>
>
> With these changes, libkvm.c ends up
> including /usr/include/linux/types.h, instead of the
> intended ../kernel/include/linux/types.h.

I had the same problem some weeks ago. IIRC I fixed it with some include
reordering in libkvm.h.

Joerg


Re: copyless virtio net thoughts?

2009-02-19 Thread Simon Horman
On Thu, Feb 19, 2009 at 10:06:17PM +1030, Rusty Russell wrote:
> On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > >
> > > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > > normal NICs.  So far I have some very sketched-out patches: for the
> > > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > > which returned it to the guest.  This bypasses all firewalling in the
> > > host though; we're basically having the guest process drive the NIC
> > > directly.
> >
> > Hi Rusty,
> >
> > Can I clarify that the idea with utilising SR-IOV would be to assign
> > virtual functions to guests? That is, something conceptually similar to
> > PCI pass-through in Xen (although I'm not sure that anyone has virtual
> > function pass-through working yet).
>
> Not quite: I think PCI passthrough IMHO is the *wrong* way to do it: it
> makes migrate complicated (if not impossible), and requires emulation or
> the same NIC on the destination host.
>
> This would be the *host* seeing the virtual functions as multiple NICs,
> then the ability to attach a given NIC directly to a process.
>
> This isn't guest-visible: the kvm process is configured to connect
> directly to a NIC, rather than (say) bridging through the host.

Hi Rusty, Hi Chris,

Thanks for the clarification.

I think that the approach that Xen recommends for migration is to
use a bonding device that accesses the pass-through device if present
and a virtual nic.

The idea that you outline above does sound somewhat cleaner :-)

> > If so, wouldn't this also be useful on machines that have multiple
> > NICs?
>
> Yes, but mainly as a benchmark hack AFAICT :)

Ok, I was under the impression that at least in the Xen world it
was something people actually used. But I could easily be mistaken.

> Hope that clarifies, Rusty.

On Thu, Feb 19, 2009 at 03:37:52AM -0800, Chris Wright wrote:
> * Simon Horman (ho...@verge.net.au) wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > > 2) Direct NIC attachment This is particularly interesting with SR-IOV or
> > > other multiqueue nics, but for boutique cases or benchmarks, could be for
> > > normal NICs.  So far I have some very sketched-out patches: for the
> > > attached nic dev_alloc_skb() gets an skb from the guest (which supplies
> > > them via some kind of AIO interface), and a branch in netif_receive_skb()
> > > which returned it to the guest.  This bypasses all firewalling in the
> > > host though; we're basically having the guest process drive the NIC
> > > directly.
> >
> > Can I clarify that the idea with utilising SR-IOV would be to assign
> > virtual functions to guests? That is, something conceptually similar to
> > PCI pass-through in Xen (although I'm not sure that anyone has virtual
> > function pass-through working yet). If so, wouldn't this also be useful
> > on machines that have multiple NICs?
>
> This would be the typical usecase for sr-iov.  But I think Rusty is
> referring to giving a nic directly to a guest but the guest is still
> seeing a virtio nic (not pass-through/device-assignment).  So there's
> no bridge, and zero copy so the dma buffers are supplied by guest,
> but host has the driver for the physical nic or the VF.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



centos 5.x on kvm-83 doesn't think pentium pro has fast system calls

2009-02-19 Thread Steven Stovall
Why wouldn't SEP be recognized by kvm-83 running a CentOS 5.x guest on a Pentium Pro?

Steven


RE: kvm-userspace build break (linux/types.h)

2009-02-19 Thread Zhang, Xiantao
For x86 and ia64, linux/types.h is rewritten to asm/types.h when the source
is synced; see kernel/x86/hack-module.awk for details.
Xiantao

Joerg Roedel wrote:
> On Thu, Feb 19, 2009 at 03:50:14PM -0600, Hollis Blanchard wrote:
> > A recent kernel merge breaks kvm-userspace build:
> > make[1]: Entering directory `/root/hollisb/kvm-userspace.git/libkvm'
> > gcc -m64 -D__x86_64__ -MMD -MF ./.libkvm.d -g -fomit-frame-pointer
> > -Wall  -fno-stack-protector   -I
> > /root/hollisb/kvm-userspace.git/kernel/include   -c -o libkvm.o libkvm.c
> > In file included from /usr/include/bits/fcntl.h:24,
> >  from /usr/include/fcntl.h:34,
> >  from libkvm.c:30:
> > /usr/include/sys/types.h:46: error: conflicting types for 'loff_t'
> > /usr/include/linux/types.h:30: error: previous declaration of
> > 'loff_t' was here
> > /usr/include/sys/types.h:62: error: conflicting types for 'dev_t'
> > /usr/include/linux/types.h:13: error: previous declaration of 'dev_t'
> > was here
> > [...]
> >
> > I built like so:
> > ./configure
> > make -C kernel LINUX=/path/to/kvm.git sync
> > make
> >
> > The problem appears to be 00bfddaf7f68a6551319b536f052040c370756b0 and
> > cef3767852a9b1a7ff4a8dfe0969e2d32eb728df, both from Jaswinder Singh
> > Rajput jaswin...@infradead.org:
> > -#include <asm/types.h>
> > +#include <linux/types.h>
> >
> > With these changes, libkvm.c ends up
> > including /usr/include/linux/types.h, instead of the
> > intended ../kernel/include/linux/types.h.
>
> I had the same problem some weeks ago. IIRC I fixed it with some
> include reordering in libkvm.h.
>
> Joerg



Re: Recent kvm and vmware server comparisons?

2009-02-19 Thread Thomas Fjellstrom
On Thursday 19 February 2009, Hans de Bruin wrote:
> Martin Maurer wrote:
> > I suppose no-one has any?
> >
> > VMware includes in its EULA (End User License Agreement) a prohibition
> > for any licensee to publish benchmark results without VMware's approval.
> > (see https://www.vmware.com/tryvmware/eula.php)
> >
> > Maybe this is a reason why all published VMWare benchmarks looks quite
> > similar :-)
> >
> > I would love to see a comparison but due to this restrictions it´s hard
> > to get independent results.
>
> Why compare kvm to vmware and not to real hardware? The results can than
> be compared to vmware/hardware and hyper-v/hardware.

hyper-v doesn't provide network or disk io ;)

-- 
Thomas Fjellstrom
tfjellst...@shaw.ca


[PATCH v10 3/7] PCI: reserve bus range for SR-IOV device

2009-02-19 Thread Yu Zhao
Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/iov.c   |   34 ++
 drivers/pci/pci.h   |5 +
 drivers/pci/probe.c |3 +++
 3 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 3bca8f8..0b80437 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -14,6 +14,16 @@
 #include "pci.h"
 
 
+static inline void virtfn_bdf(struct pci_dev *dev, int id, u8 *busnr, u8 *devfn)
+{
+   u16 bdf;
+
+   bdf = (dev->bus->number << 8) + dev->devfn +
+ dev->sriov->offset + dev->sriov->stride * id;
+   *busnr = bdf >> 8;
+   *devfn = bdf & 0xff;
+}
+
 static int sriov_init(struct pci_dev *dev, int pos)
 {
int i;
@@ -208,3 +218,27 @@ void pci_restore_iov_state(struct pci_dev *dev)
if (dev->sriov)
sriov_restore_state(dev);
 }
+
+/**
+ * pci_iov_bus_range - find bus range used by Virtual Function
+ * @bus: the PCI bus
+ *
+ * Returns max number of buses (exclude current one) used by Virtual
+ * Functions.
+ */
+int pci_iov_bus_range(struct pci_bus *bus)
+{
+   int max = 0;
+   u8 busnr, devfn;
+   struct pci_dev *dev;
+
+   list_for_each_entry(dev, &bus->devices, bus_list) {
+   if (!dev->sriov)
+   continue;
+   virtfn_bdf(dev, dev->sriov->total - 1, &busnr, &devfn);
+   if (busnr > max)
+   max = busnr;
+   }
+
+   return max ? max - bus->number : 0;
+}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index b24c9e2..2cf32f5 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -217,6 +217,7 @@ extern void pci_iov_release(struct pci_dev *dev);
 extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
enum pci_bar_type *type);
 extern void pci_restore_iov_state(struct pci_dev *dev);
+extern int pci_iov_bus_range(struct pci_bus *bus);
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
 {
@@ -234,6 +235,10 @@ static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 static inline void pci_restore_iov_state(struct pci_dev *dev)
 {
 }
+static inline int pci_iov_bus_range(struct pci_bus *bus)
+{
+   return 0;
+}
 #endif /* CONFIG_PCI_IOV */
 
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 03b6f29..4c8abd0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1078,6 +1078,9 @@ unsigned int __devinit pci_scan_child_bus(struct pci_bus *bus)
for (devfn = 0; devfn < 0x100; devfn += 8)
pci_scan_slot(bus, devfn);
 
+   /* Reserve buses for SR-IOV capability. */
+   max += pci_iov_bus_range(bus);
+
/*
 * After performing arch-dependent fixup of the bus, look behind
 * all PCI-to-PCI bridges on this bus.
-- 
1.6.1

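The bus arithmetic in virtfn_bdf() and pci_iov_bus_range() above can be tried out in isolation. What follows is a minimal userspace sketch, not kernel code: vf_bdf() and vf_extra_buses() are hypothetical helpers that take plain integers instead of struct pci_dev, and the PF location, First VF Offset, VF Stride and TotalVFs values used with them are illustrative, not taken from a real device.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as virtfn_bdf(): the routing ID (bus << 8 | devfn) of
 * VF `id` is the PF's routing ID + First VF Offset + id * VF Stride.
 * Because the sum is a 16-bit routing ID, a large offset/stride carries
 * out of devfn into the bus number.
 */
static void vf_bdf(uint8_t pf_bus, uint8_t pf_devfn, uint16_t offset,
                   uint16_t stride, int id, uint8_t *busnr, uint8_t *devfn)
{
    uint16_t rid = (uint16_t)((pf_bus << 8) + pf_devfn + offset + stride * id);

    *busnr = (uint8_t)(rid >> 8);   /* upper 8 bits: bus number */
    *devfn = (uint8_t)(rid & 0xff); /* lower 8 bits: device/function */
}

/*
 * Per-device step of pci_iov_bus_range(): how many buses beyond the PF's
 * own bus the last VF (id = total - 1) occupies. total must be non-zero,
 * which sriov_init() guarantees before any VF is set up.
 */
static int vf_extra_buses(uint8_t pf_bus, uint8_t pf_devfn, uint16_t offset,
                          uint16_t stride, uint16_t total)
{
    uint8_t busnr, devfn;

    vf_bdf(pf_bus, pf_devfn, offset, stride, total - 1, &busnr, &devfn);
    return busnr > pf_bus ? busnr - pf_bus : 0;
}
```

With the PF at 02:00.0, an offset of 128 and a stride of 2, VF 0 lands at 02:10.0; 64 VFs still fit on bus 2, while 128 VFs spill onto bus 3, so pci_scan_child_bus() would have to reserve one extra bus.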


[PATCH v10 0/7] PCI: Linux kernel SR-IOV support

2009-02-19 Thread Yu Zhao
Greetings,

The following patches add support for the SR-IOV capability to the
Linux kernel. With these patches, a PCI device that has the capability
can be turned into multiple devices from the software perspective, which
benefits KVM and serves other purposes such as QoS and security.

The SR-IOV specification can be found at (PCI-SIG membership required):
  http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf

Devices that support SR-IOV are available from the following vendors:
  http://download.intel.com/design/network/ProdBrf/320025.pdf
  http://www.myri.com/vlsi/Lanai_Z8ES_Datasheet.pdf
  http://www.neterion.com/products/pdfs/X3100ProductBrief.pdf

Physical Function driver patches for the Intel 82576 NIC are available:
  http://patchwork.kernel.org/patch/8063/
  http://patchwork.kernel.org/patch/8064/
  http://patchwork.kernel.org/patch/8065/
  http://patchwork.kernel.org/patch/8066/

Major changes from v9 to v10:
  1, minor fix in pci_restore_iov_state().
  2, respin against the latest tree.

Yu Zhao (7):
  PCI: initialize and release SR-IOV capability
  PCI: restore saved SR-IOV state
  PCI: reserve bus range for SR-IOV device
  PCI: add SR-IOV API for Physical Function driver
  PCI: handle SR-IOV Virtual Function Migration
  PCI: document SR-IOV sysfs entries
  PCI: manual for SR-IOV user and driver developer

 Documentation/ABI/testing/sysfs-bus-pci |   27 ++
 Documentation/DocBook/kernel-api.tmpl   |1 +
 Documentation/PCI/pci-iov-howto.txt |   99 +
 drivers/pci/Kconfig |   13 +
 drivers/pci/Makefile|3 +
 drivers/pci/iov.c   |  711 +++
 drivers/pci/pci.c   |8 +
 drivers/pci/pci.h   |   53 +++
 drivers/pci/probe.c |7 +
 include/linux/pci.h |   28 ++
 include/linux/pci_regs.h|   33 ++
 11 files changed, 983 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/PCI/pci-iov-howto.txt
 create mode 100644 drivers/pci/iov.c



[PATCH v10 2/7] PCI: restore saved SR-IOV state

2009-02-19 Thread Yu Zhao
Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/iov.c |   29 +
 drivers/pci/pci.c |1 +
 drivers/pci/pci.h |4 
 3 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index e6736d4..3bca8f8 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -128,6 +128,25 @@ static void sriov_release(struct pci_dev *dev)
dev->sriov = NULL;
 }
 
+static void sriov_restore_state(struct pci_dev *dev)
+{
+   int i;
+   u16 ctrl;
+   struct pci_sriov *iov = dev->sriov;
+
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_CTRL, &ctrl);
+   if (ctrl & PCI_SRIOV_CTRL_VFE)
+   return;
+
+   for (i = PCI_SRIOV_RESOURCES; i <= PCI_SRIOV_RESOURCE_END; i++)
+   pci_update_resource(dev, i);
+
+   pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+   if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
+   msleep(100);
+}
+
 /**
  * pci_iov_init - initialize the IOV capability
  * @dev: the PCI device
@@ -179,3 +198,13 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno,
return dev->sriov->pos + PCI_SRIOV_BAR +
4 * (resno - PCI_SRIOV_RESOURCES);
 }
+
+/**
+ * pci_restore_iov_state - restore the state of the IOV capability
+ * @dev: the PCI device
+ */
+void pci_restore_iov_state(struct pci_dev *dev)
+{
+   if (dev->sriov)
+   sriov_restore_state(dev);
+}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2eba2a5..8e21912 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -773,6 +773,7 @@ pci_restore_state(struct pci_dev *dev)
}
pci_restore_pcix_state(dev);
pci_restore_msi_state(dev);
+   pci_restore_iov_state(dev);
 
return 0;
 }
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 451db74..b24c9e2 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -216,6 +216,7 @@ extern int pci_iov_init(struct pci_dev *dev);
 extern void pci_iov_release(struct pci_dev *dev);
 extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
enum pci_bar_type *type);
+extern void pci_restore_iov_state(struct pci_dev *dev);
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
 {
@@ -230,6 +231,9 @@ static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 {
return 0;
 }
+static inline void pci_restore_iov_state(struct pci_dev *dev)
+{
+}
 #endif /* CONFIG_PCI_IOV */
 
 #endif /* DRIVERS_PCI_H */
-- 
1.6.1

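sriov_restore_state() only reprograms the capability when VF Enable is clear, and it writes System Page Size before Control, presumably so the page size is in place before VFs are re-enabled. The sketch below mimics that ordering against an in-memory config space. struct fake_dev and the cfg_* helpers are hypothetical, and the register offsets (0x08 for Control, 0x20 for System Page Size) and the VF Enable bit (0x1) are assumed from the SR-IOV 1.0 capability layout rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed equivalents of PCI_SRIOV_CTRL, PCI_SRIOV_SYS_PGSIZE and
 * PCI_SRIOV_CTRL_VFE (offsets relative to the SR-IOV capability). */
#define SRIOV_CTRL       0x08
#define SRIOV_SYS_PGSIZE 0x20
#define SRIOV_CTRL_VFE   0x0001

/* Hypothetical in-memory stand-in for a device's extended config space. */
struct fake_dev {
    uint8_t  cfg[4096];
    int      pos;           /* offset of the SR-IOV capability */
    uint32_t saved_pgsz;    /* plays the role of iov->pgsz */
    uint16_t saved_ctrl;    /* plays the role of iov->ctrl */
    int      bars_restored; /* counts the pci_update_resource() loop */
};

static uint16_t cfg_read16(const struct fake_dev *d, int off)
{
    uint16_t v;

    memcpy(&v, d->cfg + off, sizeof(v));
    return v;
}

static void cfg_write16(struct fake_dev *d, int off, uint16_t v)
{
    memcpy(d->cfg + off, &v, sizeof(v));
}

static void cfg_write32(struct fake_dev *d, int off, uint32_t v)
{
    memcpy(d->cfg + off, &v, sizeof(v));
}

/* Same order as sriov_restore_state(): do nothing if VFs are still
 * enabled; otherwise restore the VF BARs, then System Page Size, then
 * Control. The kernel code also sleeps 100 ms when VFE gets set. */
static void restore_sriov(struct fake_dev *d)
{
    if (cfg_read16(d, d->pos + SRIOV_CTRL) & SRIOV_CTRL_VFE)
        return;

    d->bars_restored++;
    cfg_write32(d, d->pos + SRIOV_SYS_PGSIZE, d->saved_pgsz);
    cfg_write16(d, d->pos + SRIOV_CTRL, d->saved_ctrl);
}
```

Calling restore_sriov() a second time is a no-op, because the first call left VF Enable set, which mirrors the early return in the patch.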


[PATCH v10 1/7] PCI: initialize and release SR-IOV capability

2009-02-19 Thread Yu Zhao
Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/Kconfig  |   13 
 drivers/pci/Makefile |3 +
 drivers/pci/iov.c|  181 ++
 drivers/pci/pci.c|7 ++
 drivers/pci/pci.h|   37 ++
 drivers/pci/probe.c  |4 +
 include/linux/pci.h  |8 ++
 include/linux/pci_regs.h |   33 +
 8 files changed, 286 insertions(+), 0 deletions(-)
 create mode 100644 drivers/pci/iov.c

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 2a4501d..e8ea3e8 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -59,3 +59,16 @@ config HT_IRQ
   This allows native hypertransport devices to use interrupts.
 
   If unsure say Y.
+
+config PCI_IOV
+   bool "PCI IOV support"
+   depends on PCI
+   select PCI_MSI
+   default n
+   help
+ PCI-SIG I/O Virtualization (IOV) Specifications support.
+ Single Root IOV: allows the Physical Function driver to enable
+ the hardware capability, so the Virtual Function is accessible
+ via the PCI Configuration Space using its own Bus, Device and
+ Function Numbers. Each Virtual Function also has the PCI Memory
+ Space to map the device specific register set.
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 3d07ce2..ba99282 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -29,6 +29,9 @@ obj-$(CONFIG_DMAR) += dmar.o iova.o intel-iommu.o
 
 obj-$(CONFIG_INTR_REMAP) += dmar.o intr_remapping.o
 
+# PCI IOV support
+obj-$(CONFIG_PCI_IOV) += iov.o
+
 #
 # Some architectures use the generic PCI setup functions
 #
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
new file mode 100644
index 000..e6736d4
--- /dev/null
+++ b/drivers/pci/iov.c
@@ -0,0 +1,181 @@
+/*
+ * drivers/pci/iov.c
+ *
+ * Copyright (C) 2009 Intel Corporation, Yu Zhao yu.z...@intel.com
+ *
+ * PCI Express I/O Virtualization (IOV) support.
+ *   Single Root IOV 1.0
+ */
+
+#include <linux/pci.h>
+#include <linux/mutex.h>
+#include <linux/string.h>
+#include <linux/delay.h>
+#include "pci.h"
+
+
+static int sriov_init(struct pci_dev *dev, int pos)
+{
+   int i;
+   int rc;
+   int nres;
+   u32 pgsz;
+   u16 ctrl, total, offset, stride;
+   struct pci_sriov *iov;
+   struct resource *res;
+   struct pci_dev *pdev;
+
+   if (dev->pcie_type != PCI_EXP_TYPE_RC_END &&
+   dev->pcie_type != PCI_EXP_TYPE_ENDPOINT)
+   return -ENODEV;
+
+   pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl);
+   if (ctrl & PCI_SRIOV_CTRL_VFE) {
+   pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
+   ssleep(1);
+   }
+
+   pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
+   if (!total)
+   return 0;
+
+   list_for_each_entry(pdev, &dev->bus->devices, bus_list)
+   if (pdev->sriov)
+   break;
+   if (list_empty(&dev->bus->devices) || !pdev->sriov)
+   pdev = NULL;
+
+   ctrl = 0;
+   if (!pdev && pci_ari_enabled(dev->bus))
+   ctrl |= PCI_SRIOV_CTRL_ARI;
+
+   pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
+   pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total);
+   pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, &offset);
+   pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, &stride);
+   if (!offset || (total > 1 && !stride))
+   return -EIO;
+
+   pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, &pgsz);
+   i = PAGE_SHIFT > 12 ? PAGE_SHIFT - 12 : 0;
+   pgsz &= ~((1 << i) - 1);
+   if (!pgsz)
+   return -EIO;
+
+   pgsz &= ~(pgsz - 1);
+   pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
+
+   nres = 0;
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = dev->resource + PCI_SRIOV_RESOURCES + i;
+   i += __pci_read_base(dev, pci_bar_unknown, res,
+pos + PCI_SRIOV_BAR + i * 4);
+   if (!res->flags)
+   continue;
+   if (resource_size(res) & (PAGE_SIZE - 1)) {
+   rc = -EIO;
+   goto failed;
+   }
+   res->end = res->start + resource_size(res) * total - 1;
+   nres++;
+   }
+
+   iov = kzalloc(sizeof(*iov), GFP_KERNEL);
+   if (!iov) {
+   rc = -ENOMEM;
+   goto failed;
+   }
+
+   iov->pos = pos;
+   iov->nres = nres;
+   iov->ctrl = ctrl;
+   iov->total = total;
+   iov->offset = offset;
+   iov->stride = stride;
+   iov->pgsz = pgsz;
+   iov->self = dev;
+   pci_read_config_dword(dev, pos + PCI_SRIOV_CAP, &iov->cap);
+   pci_read_config_byte(dev, pos + PCI_SRIOV_FUNC_LINK, &iov->link);
+
+   if (pdev)
+   iov->pdev = pci_dev_get(pdev);
+   else {
+   iov->pdev = dev;
+  

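sriov_init() picks the System Page Size by clearing every Supported Page Sizes bit below the host page size and then keeping the lowest remaining bit (pgsz &= ~(pgsz - 1)); it also rejects VF BARs whose per-VF size is not page aligned before stretching the BAR to cover all TotalVFs. A userspace sketch of both checks follows. pick_sys_pgsize() and vf_bar_span() are hypothetical names, and the sample register value 0x553 (4K, 8K, 64K, 256K, 1M and 4M pages) is only illustrative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit n of Supported Page Sizes advertises 2^(n + 12)-byte pages.
 * Returns what sriov_init() would write to System Page Size: the
 * smallest supported size that is at least the host page size
 * (1 << page_shift), or 0 where sriov_init() returns -EIO.
 */
static uint32_t pick_sys_pgsize(uint32_t supported, unsigned int page_shift)
{
    unsigned int i = page_shift > 12 ? page_shift - 12 : 0;

    supported &= ~((1u << i) - 1);        /* drop sizes below a host page */
    if (!supported)
        return 0;
    return supported & ~(supported - 1);  /* keep only the lowest set bit */
}

/*
 * A VF BAR is accepted only if its per-VF size is a multiple of the host
 * page size; the PF-side resource then spans per_vf_size * total bytes
 * (res->end = res->start + resource_size(res) * total - 1).
 */
static int vf_bar_span(uint64_t per_vf_size, uint64_t page_size,
                       uint16_t total, uint64_t *span)
{
    if (per_vf_size & (page_size - 1))
        return -1;                        /* sriov_init() fails with -EIO */
    *span = per_vf_size * total;
    return 0;
}
```

With 4 KiB host pages (page_shift 12), 0x553 yields 0x1 (4 KiB); with 64 KiB host pages (page_shift 16) it yields 0x10 (64 KiB), and a device advertising only 4K and 8K pages (0x3) would make sriov_init() fail.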
[PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver

2009-02-19 Thread Yu Zhao
Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/iov.c   |  348 +++
 drivers/pci/pci.h   |3 +
 include/linux/pci.h |   14 ++
 3 files changed, 365 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 0b80437..8096fc9 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -13,6 +13,8 @@
 #include <linux/delay.h>
 #include "pci.h"
 
+#define VIRTFN_ID_LEN  8
+
 
 static inline void virtfn_bdf(struct pci_dev *dev, int id, u8 *busnr, u8 *devfn)
 {
@@ -24,6 +26,319 @@ static inline void virtfn_bdf(struct pci_dev *dev, int id, u8 *busnr, u8 *devfn)
*devfn = bdf & 0xff;
 }
 
+static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
+{
+   int rc;
+   struct pci_bus *child;
+
+   if (bus->number == busnr)
+   return bus;
+
+   child = pci_find_bus(pci_domain_nr(bus), busnr);
+   if (child)
+   return child;
+
+   child = pci_add_new_bus(bus, NULL, busnr);
+   if (!child)
+   return NULL;
+
+   child->subordinate = busnr;
+   child->dev.parent = bus->bridge;
+   rc = pci_bus_add_child(child);
+   if (rc) {
+   pci_remove_bus(child);
+   return NULL;
+   }
+
+   return child;
+}
+
+static void virtfn_remove_bus(struct pci_bus *bus, int busnr)
+{
+   struct pci_bus *child;
+
+   if (bus->number == busnr)
+   return;
+
+   child = pci_find_bus(pci_domain_nr(bus), busnr);
+   BUG_ON(!child);
+
+   if (list_empty(&child->devices))
+   pci_remove_bus(child);
+}
+
+static int virtfn_add(struct pci_dev *dev, int id, int reset)
+{
+   int i;
+   int rc;
+   u64 size;
+   u8 busnr, devfn;
+   char buf[VIRTFN_ID_LEN];
+   struct pci_dev *virtfn;
+   struct resource *res;
+   struct pci_sriov *iov = dev->sriov;
+
+   virtfn = alloc_pci_dev();
+   if (!virtfn)
+   return -ENOMEM;
+
+   virtfn_bdf(dev, id, &busnr, &devfn);
+   mutex_lock(&iov->pdev->sriov->lock);
+   virtfn->bus = virtfn_add_bus(dev->bus, busnr);
+   if (!virtfn->bus) {
+   kfree(virtfn);
+   mutex_unlock(&iov->pdev->sriov->lock);
+   return -ENOMEM;
+   }
+
+   virtfn->sysdata = dev->bus->sysdata;
+   virtfn->dev.parent = dev->dev.parent;
+   virtfn->dev.bus = dev->dev.bus;
+   virtfn->devfn = devfn;
+   virtfn->hdr_type = PCI_HEADER_TYPE_NORMAL;
+   virtfn->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+   virtfn->error_state = pci_channel_io_normal;
+   virtfn->current_state = PCI_UNKNOWN;
+   virtfn->is_pcie = 1;
+   virtfn->pcie_type = PCI_EXP_TYPE_ENDPOINT;
+   virtfn->dma_mask = 0x;
+   virtfn->vendor = dev->vendor;
+   virtfn->subsystem_vendor = dev->subsystem_vendor;
+   virtfn->class = dev->class;
+   pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
+   pci_read_config_byte(virtfn, PCI_REVISION_ID, &virtfn->revision);
+   pci_read_config_word(virtfn, PCI_SUBSYSTEM_ID,
+&virtfn->subsystem_device);
+
+   dev_set_name(&virtfn->dev, "%04x:%02x:%02x.%d",
+pci_domain_nr(virtfn->bus), busnr,
+PCI_SLOT(devfn), PCI_FUNC(devfn));
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+   res = dev->resource + PCI_SRIOV_RESOURCES + i;
+   if (!res->parent)
+   continue;
+   virtfn->resource[i].name = pci_name(virtfn);
+   virtfn->resource[i].flags = res->flags;
+   size = resource_size(res);
+   do_div(size, iov->total);
+   virtfn->resource[i].start = res->start + size * id;
+   virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
+   rc = request_resource(res, &virtfn->resource[i]);
+   BUG_ON(rc);
+   }
+
+   if (reset)
+   pci_execute_reset_function(virtfn);
+
+   pci_device_add(virtfn, virtfn->bus);
+   mutex_unlock(&iov->pdev->sriov->lock);
+
+   virtfn->physfn = pci_dev_get(dev);
+
+   rc = pci_bus_add_device(virtfn);
+   if (rc)
+   goto failed1;
+   sprintf(buf, "%d", id);
+   rc = sysfs_create_link(&iov->dev.kobj, &virtfn->dev.kobj, buf);
+   if (rc)
+   goto failed1;
+   rc = sysfs_create_link(&virtfn->dev.kobj, &dev->dev.kobj, "physfn");
+   if (rc)
+   goto failed2;
+
+   kobject_uevent(&virtfn->dev.kobj, KOBJ_CHANGE);
+
+   return 0;
+
+failed2:
+   sysfs_remove_link(&iov->dev.kobj, buf);
+failed1:
+   pci_dev_put(dev);
+   mutex_lock(&iov->pdev->sriov->lock);
+   pci_remove_bus_device(virtfn);
+   virtfn_remove_bus(dev->bus, busnr);
+   mutex_unlock(&iov->pdev->sriov->lock);
+
+   return rc;
+}
+
+static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+{
+   u8 busnr, devfn;
+   char buf[VIRTFN_ID_LEN];
+   struct 

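virtfn_add() gives each VF an equal 1/TotalVFs slice of the PF's VF BAR window and names the new device from the domain, the bus number and the slot/function bits of devfn. A userspace sketch of just that arithmetic is below; struct range, vf_bar_slice() and vf_name() are hypothetical, and the window address and sizes in the example are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same bit layout as the kernel's PCI_SLOT() and PCI_FUNC() macros. */
#define SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define FUNC(devfn) ((devfn) & 0x07)

struct range {
    uint64_t start, end;
};

/* As in virtfn_add(): size = window / TotalVFs (the do_div() step), and
 * VF `id` starts size * id bytes into the window. */
static struct range vf_bar_slice(uint64_t win_start, uint64_t win_size,
                                 uint16_t total, int id)
{
    uint64_t size = win_size / total;
    struct range r;

    r.start = win_start + size * (uint64_t)id;
    r.end = r.start + size - 1;
    return r;
}

/* Device name in the "%04x:%02x:%02x.%d" format passed to dev_set_name(). */
static void vf_name(char *buf, size_t len, unsigned int domain,
                    uint8_t busnr, uint8_t devfn)
{
    snprintf(buf, len, "%04x:%02x:%02x.%d", domain, (unsigned int)busnr,
             (unsigned int)SLOT(devfn), FUNC(devfn));
}
```

For a 128 KiB window at 0xf0000000 shared by 8 VFs, VF 3 gets 0xf000c000-0xf000ffff, and a VF at bus 2, devfn 0x81 is named 0000:02:10.1.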
[PATCH v10 5/7] PCI: handle SR-IOV Virtual Function Migration

2009-02-19 Thread Yu Zhao
Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/iov.c   |  119 +++
 drivers/pci/pci.h   |4 ++
 include/linux/pci.h |6 +++
 3 files changed, 129 insertions(+), 0 deletions(-)
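sriov_migration_task() walks the VF Migration State Array from InitialVFs up to NumVFs: a VF in MigrateIn is switched to Available and added, while a VF in MigrateOut is removed and set to Unavailable. The userspace sketch here mimics one pass over a plain byte array. The enum names, struct vfm_counts and vfm_pass() are hypothetical, and the state encodings are assumed to match the patch's PCI_SRIOV_VFM_* constants.

```c
#include <assert.h>
#include <stdint.h>

/* VF Migration State encodings (assumed equal to PCI_SRIOV_VFM_*). */
enum {
    VFM_UA = 0x0, /* Inactive.Unavailable */
    VFM_MI = 0x1, /* Dormant.MigrateIn */
    VFM_MO = 0x2, /* Active.MigrateOut */
    VFM_AV = 0x3, /* Active.Available */
};

/* Counters standing in for virtfn_add() / virtfn_remove() calls. */
struct vfm_counts {
    int added;
    int removed;
};

/*
 * One pass of the loop in sriov_migration_task(), run over a plain array
 * instead of the ioremap()ed state array. On hardware the re-read after
 * each write reports what the device accepted; with a plain array it
 * just returns the value that was written.
 */
static void vfm_pass(uint8_t *mstate, int initial, int nr_virtfn,
                     struct vfm_counts *c)
{
    for (int i = initial; i < nr_virtfn; i++) {
        if (mstate[i] == VFM_MI) {
            mstate[i] = VFM_AV;         /* writeb(PCI_SRIOV_VFM_AV, ...) */
            if (mstate[i] == VFM_AV)
                c->added++;             /* virtfn_add(iov->self, i, 1) */
        } else if (mstate[i] == VFM_MO) {
            c->removed++;               /* virtfn_remove(iov->self, i, 1) */
            mstate[i] = VFM_UA;         /* writeb(PCI_SRIOV_VFM_UA, ...) */
            if (mstate[i] == VFM_AV)
                c->added++;             /* virtfn_add(iov->self, i, 0) */
        }
    }
}
```

With InitialVFs = 1 and states {MI, MI, MO, MI}, VF 0 is left alone, VFs 1 and 3 become Available and are added, and VF 2 is removed and marked Unavailable.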

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 8096fc9..063fe74 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -206,6 +206,97 @@ static void sriov_release_dev(struct device *dev)
iov->nr_virtfn = 0;
 }
 
+static int sriov_migration(struct pci_dev *dev)
+{
+   u16 status;
+   struct pci_sriov *iov = dev->sriov;
+
+   if (!iov->nr_virtfn)
+   return 0;
+
+   if (!(iov->cap & PCI_SRIOV_CAP_VFM))
+   return 0;
+
+   pci_read_config_word(iov->self, iov->pos + PCI_SRIOV_STATUS, &status);
+   if (!(status & PCI_SRIOV_STATUS_VFM))
+   return 0;
+
+   schedule_work(&iov->mtask);
+
+   return 1;
+}
+
+static void sriov_migration_task(struct work_struct *work)
+{
+   int i;
+   u8 state;
+   u16 status;
+   struct pci_sriov *iov = container_of(work, struct pci_sriov, mtask);
+
+   for (i = iov->initial; i < iov->nr_virtfn; i++) {
+   state = readb(iov->mstate + i);
+   if (state == PCI_SRIOV_VFM_MI) {
+   writeb(PCI_SRIOV_VFM_AV, iov->mstate + i);
+   state = readb(iov->mstate + i);
+   if (state == PCI_SRIOV_VFM_AV)
+   virtfn_add(iov->self, i, 1);
+   } else if (state == PCI_SRIOV_VFM_MO) {
+   virtfn_remove(iov->self, i, 1);
+   writeb(PCI_SRIOV_VFM_UA, iov->mstate + i);
+   state = readb(iov->mstate + i);
+   if (state == PCI_SRIOV_VFM_AV)
+   virtfn_add(iov->self, i, 0);
+   }
+   }
+
+   pci_read_config_word(iov->self, iov->pos + PCI_SRIOV_STATUS, &status);
+   status &= ~PCI_SRIOV_STATUS_VFM;
+   pci_write_config_word(iov->self, iov->pos + PCI_SRIOV_STATUS, status);
+}
+
+static int sriov_enable_migration(struct pci_dev *dev, int nr_virtfn)
+{
+   int bir;
+   u32 table;
+   resource_size_t pa;
+   struct pci_sriov *iov = dev->sriov;
+
+   if (nr_virtfn <= iov->initial)
+       return 0;
+
+   pci_read_config_dword(dev, iov->pos + PCI_SRIOV_VFM, &table);
+   bir = PCI_SRIOV_VFM_BIR(table);
+   if (bir > PCI_STD_RESOURCE_END)
+       return -EIO;
+
+   table = PCI_SRIOV_VFM_OFFSET(table);
+   if (table + nr_virtfn > pci_resource_len(dev, bir))
+       return -EIO;
+
+   pa = pci_resource_start(dev, bir) + table;
+   iov->mstate = ioremap(pa, nr_virtfn);
+   if (!iov->mstate)
+       return -ENOMEM;
+
+   INIT_WORK(&iov->mtask, sriov_migration_task);
+
+   iov->ctrl |= PCI_SRIOV_CTRL_VFM | PCI_SRIOV_CTRL_INTR;
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+
+   return 0;
+}
+
+static void sriov_disable_migration(struct pci_dev *dev)
+{
+   struct pci_sriov *iov = dev->sriov;
+
+   iov->ctrl &= ~(PCI_SRIOV_CTRL_VFM | PCI_SRIOV_CTRL_INTR);
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+
+   cancel_work_sync(&iov->mtask);
+   iounmap(iov->mstate);
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
int rc;
@@ -294,6 +385,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
goto failed2;
}
 
+   if (iov->cap & PCI_SRIOV_CAP_VFM) {
+       rc = sriov_enable_migration(dev, nr_virtfn);
+       if (rc)
+           goto failed2;
+   }
+
    kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
    iov->nr_virtfn = nr_virtfn;
 
@@ -325,6 +422,9 @@ static void sriov_disable(struct pci_dev *dev)
    if (!iov->nr_virtfn)
        return;
 
+   if (iov->cap & PCI_SRIOV_CAP_VFM)
+       sriov_disable_migration(dev);
+
    for (i = 0; i < iov->nr_virtfn; i++)
        virtfn_remove(dev, i, 0);
 
@@ -590,3 +690,22 @@ void pci_disable_sriov(struct pci_dev *dev)
sriov_disable(dev);
 }
 EXPORT_SYMBOL_GPL(pci_disable_sriov);
+
+/**
+ * pci_sriov_migration - notify SR-IOV core of Virtual Function Migration
+ * @dev: the PCI device
+ *
+ * Returns IRQ_HANDLED if the IRQ is handled, or IRQ_NONE if not.
+ *
+ * The Physical Function driver is responsible for registering an IRQ
+ * handler for the VF Migration Interrupt Message Number, and for calling
+ * this function when the hardware generates that interrupt.
+ */
+irqreturn_t pci_sriov_migration(struct pci_dev *dev)
+{
+   if (!dev->sriov)
+       return IRQ_NONE;
+
+   return sriov_migration(dev) ? IRQ_HANDLED : IRQ_NONE;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_migration);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9bbf868..6764f02 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1,6 +1,8 @@
 #ifndef 
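The bounds checks that sriov_enable_migration() performs before mapping the VF Migration State Array can be sketched in user space. This is a minimal illustration, not kernel code: the helper vfm_locate() and the macros VFM_BIR/VFM_OFFSET/STD_RESOURCE_END are hypothetical stand-ins, assumed to mirror PCI_SRIOV_VFM_BIR(), PCI_SRIOV_VFM_OFFSET() and PCI_STD_RESOURCE_END (bits 2:0 of the register select the BAR, the remaining bits give the offset, and the array holds one state byte per VF, as the readb()/writeb() calls in the patch imply). The sketch stops before the ioremap() step.

```c
#include <stdint.h>

/* Assumed to mirror PCI_SRIOV_VFM_BIR()/PCI_SRIOV_VFM_OFFSET():
 * bits 2:0 select the BAR (BIR), the remaining bits are the offset
 * of the VF Migration State Array within that BAR. */
#define VFM_BIR(x)        ((x) & 7u)
#define VFM_OFFSET(x)     ((x) & ~7u)
#define STD_RESOURCE_END  5   /* standard BARs are 0..5 */

/* Hypothetical helper: decode the VF Migration State Array Offset
 * register value 'reg' and check that an array of nr_virtfn state
 * bytes (one per VF) fits inside a BAR of bar_len bytes.
 * Returns 0 and fills *bir and *offset on success, -1 otherwise. */
static int vfm_locate(uint32_t reg, uint64_t bar_len, unsigned nr_virtfn,
                      unsigned *bir, uint32_t *offset)
{
    unsigned b = VFM_BIR(reg);
    uint32_t off = VFM_OFFSET(reg);

    if (b > STD_RESOURCE_END)
        return -1;                          /* not a standard BAR */
    if ((uint64_t)off + nr_virtfn > bar_len)
        return -1;                          /* array would overrun the BAR */

    *bir = b;
    *offset = off;
    return 0;
}
```

For example, a (made-up) register value of 0x1002 decodes to BAR 2 at offset 0x1000, which passes the check for 64 VFs in an 8 KiB BAR; a BIR of 6 or an offset too close to the end of the BAR is rejected, matching the two -EIO paths in the patch.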

[PATCH v10 6/7] PCI: document SR-IOV sysfs entries

2009-02-19 Thread Yu Zhao
Signed-off-by: Yu Zhao yu.z...@intel.com
---
 Documentation/ABI/testing/sysfs-bus-pci |   27 +++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index ceddcff..84dc100 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -9,3 +9,30 @@ Description:
that some devices may have malformatted data.  If the
underlying VPD has a writable section then the
corresponding section of this file will be writable.
+
+What:  /sys/bus/pci/devices/.../virtfn/N
+Date:  February 2009
+Contact:   Yu Zhao yu.z...@intel.com
+Description:
+   This symbolic link appears when the hardware supports the
+   SR-IOV capability and the Physical Function driver has
+   enabled it. The symbolic link points to the PCI device sysfs
+   entry of the Virtual Function whose index is N (0...MaxVFs-1).
+
+What:  /sys/bus/pci/devices/.../virtfn/dep_link
+Date:  February 2009
+Contact:   Yu Zhao yu.z...@intel.com
+Description:
+   This symbolic link appears when the hardware supports the
+   SR-IOV capability, the Physical Function driver has enabled
+   it, and this device has vendor-specific dependencies on
+   other devices. The symbolic link points to the PCI device
+   sysfs entry of the Physical Function this device depends on.
+
+What:  /sys/bus/pci/devices/.../physfn
+Date:  February 2009
+Contact:   Yu Zhao yu.z...@intel.com
+Description:
+   This symbolic link appears when a device is a Virtual
+   Function. The symbolic link points to the PCI device sysfs
+   entry of the Physical Function this device is associated with.
-- 
1.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v10 7/7] PCI: manual for SR-IOV user and driver developer

2009-02-19 Thread Yu Zhao
Signed-off-by: Yu Zhao yu.z...@intel.com
---
 Documentation/DocBook/kernel-api.tmpl |    1 +
 Documentation/PCI/pci-iov-howto.txt   |   99 +
 2 files changed, 100 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/PCI/pci-iov-howto.txt

diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 5818ff7..506e611 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -251,6 +251,7 @@ X!Edrivers/pci/hotplug.c
 -->
 !Edrivers/pci/probe.c
 !Edrivers/pci/rom.c
+!Edrivers/pci/iov.c
  </sect1>
  <sect1><title>PCI Hotplug Support Library</title>
 !Edrivers/pci/hotplug/pci_hotplug_core.c
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
new file mode 100644
index 000..fc73ef5
--- /dev/null
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -0,0 +1,99 @@
+   PCI Express I/O Virtualization Howto
+   Copyright (C) 2009 Intel Corporation
+   Yu Zhao yu.z...@intel.com
+
+
+1. Overview
+
+1.1 What is SR-IOV
+
+Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
+capability which makes one physical device appear as multiple virtual
+devices. The physical device is referred to as the Physical Function (PF)
+while the virtual devices are referred to as Virtual Functions (VFs).
+Allocation of VFs can be dynamically controlled by the PF via registers
+encapsulated in the capability. By default, this feature is not enabled
+and the PF behaves as a traditional PCIe device. Once it is turned on,
+each VF's PCI configuration space can be accessed by its own Bus, Device
+and Function Number (Routing ID). Each VF also has its own PCI Memory
+Space, which is used to map its register set. The VF device driver
+operates on this register set, so the VF is functional and appears as a
+real PCI device.
+
+2. User Guide
+
+2.1 How can I enable SR-IOV capability
+
+The device driver (PF driver) controls enabling and disabling of the
+capability via the API provided by the SR-IOV core. If the hardware
+has the SR-IOV capability, loading its PF driver enables it together
+with all VFs associated with the PF.
+
+2.2 How can I use the Virtual Functions
+
+VFs are treated as hot-plugged PCI devices in the kernel, so they
+should work in the same way as real PCI devices. A VF requires a
+device driver, just as a normal PCI device does.
+
+3. Developer Guide
+
+3.1 SR-IOV API
+
+To enable SR-IOV capability:
+   int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
+   'nr_virtfn' is the number of VFs to be enabled.
+
+To disable SR-IOV capability:
+   void pci_disable_sriov(struct pci_dev *dev);
+
+To notify the SR-IOV core of Virtual Function Migration:
+   irqreturn_t pci_sriov_migration(struct pci_dev *dev);
+
+3.2 Usage example
+
+The following piece of code illustrates the usage of the SR-IOV API.
+
+static int __devinit dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+   pci_enable_sriov(dev, NR_VIRTFN);
+
+   ...
+
+   return 0;
+}
+
+static void __devexit dev_remove(struct pci_dev *dev)
+{
+   pci_disable_sriov(dev);
+
+   ...
+}
+
+static int dev_suspend(struct pci_dev *dev, pm_message_t state)
+{
+   ...
+
+   return 0;
+}
+
+static int dev_resume(struct pci_dev *dev)
+{
+   ...
+
+   return 0;
+}
+
+static void dev_shutdown(struct pci_dev *dev)
+{
+   ...
+}
+
+static struct pci_driver dev_driver = {
+   .name =     "SR-IOV Physical Function driver",
+   .id_table = dev_id_table,
+   .probe =    dev_probe,
+   .remove =   __devexit_p(dev_remove),
+   .suspend =  dev_suspend,
+   .resume =   dev_resume,
+   .shutdown = dev_shutdown,
+};
-- 
1.6.1
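The Routing ID mentioned in the overview is a 16-bit value, bus in the high byte and devfn in the low byte. Under the PCI-SIG SR-IOV specification, VF i (0-based) is reached at PF RID + First VF Offset + i * VF Stride, where First VF Offset and VF Stride are fields of the PF's SR-IOV capability that this howto does not describe. The sketch below assumes that rule; the function names are hypothetical and not part of the kernel API.

```c
#include <stdint.h>

/* Hypothetical helper: Routing ID of VF number vf_index (0-based),
 * computed from the PF's Routing ID and the First VF Offset and
 * VF Stride fields of its SR-IOV capability. Because devfn occupies
 * the low 8 bits, a sum past devfn 0xff carries into the bus number. */
static uint16_t vf_routing_id(uint16_t pf_rid, uint16_t first_vf_offset,
                              uint16_t vf_stride, unsigned vf_index)
{
    return (uint16_t)(pf_rid + first_vf_offset + vf_stride * vf_index);
}

/* Split a Routing ID into bus, device (slot) and function numbers. */
static unsigned rid_bus(uint16_t rid)  { return rid >> 8; }
static unsigned rid_slot(uint16_t rid) { return (rid >> 3) & 0x1f; }
static unsigned rid_func(uint16_t rid) { return rid & 0x7; }
```

With a PF at 03:00.0 (RID 0x0300), a First VF Offset of 0x80 and a VF Stride of 2, VF 0 lands at 03:10.0 and VF 1 at 03:10.2; an offset of 0x100 would push VF 0 onto bus 4.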



[ kvm-Bugs-2609423 ] Segmentation fault when creating guest on PAE host

2009-02-19 Thread SourceForge.net
Bugs item #2609423, was opened at 2009-02-17 07:30
Message generated for change (Comment added) made by jiajun
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2609423&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: Segmentation fault when creating guest on PAE host

Initial Comment:
Environment:
Kernel Commit: f0080da24a9990eff13cce5d0ee68e5f139725ce
Userspace Commit: 56fea7f2df7f9e70b9449832b96ba1b9a760423f
Host Kernel Version: 2.6.29-rc2

When creating a guest on a PAE host, the qemu process hits a segmentation fault.

[r...@vt-nhm1 ~]# qemu -m 256 -hda /share/xvs/var/ia32p_SMP.img
Segmentation fault

qemu[9998]: segfault at 6c65746e ip 081a0550 sp a6279888 error 6 in qemu-system-x86_64[8048000+1a8000]



--

Comment By: Jiajun Xu (jiajun)
Date: 2009-02-19 23:59

Message:
The bug is fixed by kvm.userspace commit 68592ae18de1d45918542242e918085ca7f2e93c.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2609423&group_id=180599


Re: [Qemu-devel] [PATCH 1/5] kvm/powerpc: Enable MPIC for E500 platform.

2009-02-19 Thread Aurelien Jarno
On Tue, Feb 17, 2009 at 04:55:51PM +0200, Blue Swirl wrote:
> On 2/17/09, Liu Yu yu@freescale.com wrote:
> > MPIC and OpenPIC have very similar design.
> >  So a lot of code can be reused.
> >
> >  Modification mainly include:
> >  1. keep struct openpic_t to the maximum size of both MPIC and OpenPIC.
> >  2. endianess swap.
> >     MPIC has the same endianess as target, so no need to swap for MPIC.
>
> I don't think this is correct, the host can still be different endian
> from target.
>

I do not agree. As long as we don't manipulate host memory, the host
endianness is irrelevant. The values are simply passed by value, so they
don't need to be swapped.
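A small host-side illustration of the argument (not QEMU code; all function names are hypothetical): extracting a byte from a value with shifts and masks gives the same result on any host, while reading the value's bytes back from memory exposes the host's byte order. Only the second kind of access would need a host-dependent swap.

```c
#include <stdint.h>
#include <string.h>

/* Value-based access: shifts operate on the number itself, so the
 * result is identical on big- and little-endian hosts. */
static uint8_t msb_by_value(uint32_t v)
{
    return (uint8_t)(v >> 24);
}

/* Memory-based access: the first byte of the in-memory representation
 * depends on the host's byte order. */
static uint8_t first_byte_in_memory(uint32_t v)
{
    uint8_t bytes[4];
    memcpy(bytes, &v, sizeof bytes);
    return bytes[0];
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    uint16_t one = 1;
    uint8_t b;
    memcpy(&b, &one, 1);
    return b == 1;
}
```

For 0x11223344, msb_by_value() always yields 0x11, whereas first_byte_in_memory() yields 0x44 on a little-endian host and 0x11 on a big-endian one.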

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html