[PATCH] Merge branch 'qemu-cvs'
From: Avi Kivity <a...@redhat.com>

* qemu-cvs:
  Fix cpu_physical_memory_rw() for 64-bit I/O accesses
  Avoid running audio ctl's when vm is not running
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: external module: backward compatibility for compound_head()
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/kernel/external-module-compat-comm.h b/kernel/external-module-compat-comm.h
index 6e9a90a..527ab58 100644
--- a/kernel/external-module-compat-comm.h
+++ b/kernel/external-module-compat-comm.h
@@ -734,3 +734,16 @@ int kvm_pcidev_msi_enabled(struct pci_dev *dev);
 #define kvm_pcidev_msi_enabled(dev)	(dev)->msi_enabled
 #endif
+
+/* compound_head() was introduced in 2.6.22 */
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,22)
+
+static inline struct page *compound_head(struct page *page)
+{
+	if (PageCompound(page))
+		page = (struct page *)page_private(page);
+	return page;
+}
+
+#endif
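The compat block works because LINUX_VERSION_CODE and KERNEL_VERSION() pack a version into a single integer, so a plain `<` compares versions. A userspace sketch of that comparison, assuming the packing used by `<linux/version.h>` in this era; `needs_compound_head_compat()` is a hypothetical helper, not part of the kvm tree:

```c
#include <assert.h>

/* Same packing as the kernel's <linux/version.h> macro of this era:
 * version in bits 16 and up, patchlevel in bits 8-15, sublevel in
 * bits 0-7. Redefined here so the sketch builds in userspace. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper: nonzero when the fallback compound_head()
 * would be compiled in, i.e. for kernels older than 2.6.22. */
static int needs_compound_head_compat(int linux_version_code)
{
    return linux_version_code < KERNEL_VERSION(2, 6, 22);
}
```

With this encoding, a 2.6.21 build gets the fallback compound_head(), while 2.6.22 and later use the kernel's own definition.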
[PATCH] KVM: MMU: remove redundant check in mmu_set_spte
From: Joerg Roedel <joerg.roe...@amd.com>

The following code flow is unnecessary:

	if (largepage)
		was_rmapped = is_large_pte(*shadow_pte);
	else
		was_rmapped = 1;

The is_large_pte() function will always evaluate to one here because the (largepage && !is_large_pte) case is already handled in the first if-clause. So we can remove this check and always set was_rmapped to one here.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
Acked-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ef060ec..c90b4b2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1791,12 +1791,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			pgprintk("hfn old %lx new %lx\n",
 				 spte_to_pfn(*shadow_pte), pfn);
 			rmap_remove(vcpu->kvm, shadow_pte);
-		} else {
-			if (largepage)
-				was_rmapped = is_large_pte(*shadow_pte);
-			else
-				was_rmapped = 1;
-		}
+		} else
+			was_rmapped = 1;
 	}
 	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
 		      dirty, largepage, global, gfn, pfn, speculative, true)) {
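The argument in the commit message can be checked mechanically. Below is a hypothetical model (not KVM code) of only the branch structure in mmu_set_spte(), with the is_large_pte() and spte_to_pfn() checks reduced to booleans. It assumes was_rmapped starts at 0 and that the first clause is the (largepage && !is_large_pte) case the message refers to; enumerating all eight inputs compares the old and new code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of how was_rmapped is set for an already present
 * shadow PTE. Only the branch structure is kept:
 *   largepage  - the new mapping is a large page
 *   large_spte - stands in for is_large_pte(*shadow_pte)
 *   same_pfn   - stands in for pfn == spte_to_pfn(*shadow_pte)
 */
static int was_rmapped_old(bool largepage, bool large_spte, bool same_pfn)
{
    int was_rmapped = 0;

    if (largepage && !large_spte) {
        /* old PTE page is unlinked; was_rmapped stays 0 */
    } else if (!same_pfn) {
        /* rmap_remove() path; was_rmapped stays 0 */
    } else {
        if (largepage)
            was_rmapped = large_spte; /* always true on this path */
        else
            was_rmapped = 1;
    }
    return was_rmapped;
}

static int was_rmapped_new(bool largepage, bool large_spte, bool same_pfn)
{
    int was_rmapped = 0;

    if (largepage && !large_spte) {
        /* unchanged by the patch */
    } else if (!same_pfn) {
        /* unchanged by the patch */
    } else
        was_rmapped = 1;
    return was_rmapped;
}

/* Compares both versions over all 8 input combinations. */
static bool versions_agree(void)
{
    for (int bits = 0; bits < 8; bits++) {
        bool lp = bits & 1, ls = bits & 2, sp = bits & 4;

        if (was_rmapped_old(lp, ls, sp) != was_rmapped_new(lp, ls, sp))
            return false;
    }
    return true;
}
```

versions_agree() returns true: whenever the removed inner branch runs with largepage set, the first clause has already ruled out !large_spte, so it assigns 1 exactly like the simplified code.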
Re: With -vnc option, can I still use ctrl+alt + n?
On Wed, Feb 18, 2009 at 11:38 PM, Tomasz Chmielewski <man...@wpkg.org> wrote:
> Neo Jia schrieb:
>> hi,
>> I am trying kvm-84 and with -vnc option I can't use ctrl + alt + n key to get the qemu system console. Is there anyway to make this work?
> Use Qemu/KVM monitor and it's sendkey function.

Sorry, could you specify the command line option? I don't know how to get into the monitor.

Thanks,
Neo

> For example:
> sendkey alt-f3
>
> --
> Tomasz Chmielewski
> http://wpkg.org

--
I would remember that if researchers were not ambitious probably today we haven't the technology we are using!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: With -vnc option, can I still use ctrl+alt + n?
On 19.02.2009, at 09:11, Neo Jia <neo...@gmail.com> wrote:
> On Wed, Feb 18, 2009 at 11:38 PM, Tomasz Chmielewski <man...@wpkg.org> wrote:
>> Neo Jia schrieb:
>>> hi,
>>> I am trying kvm-84 and with -vnc option I can't use ctrl + alt + n key to get the qemu system console. Is there anyway to make this work?
>> Use Qemu/KVM monitor and it's sendkey function.
> Sorry, could you specify the command line option? I don't know how to get into monitor.

You could pass -monitor stdio to qemu. That gives you the monitor on the shell you started qemu from.

Are you using a Mac to access vnc? Try ctrl-apple-2 then.

Alex
Re: With -vnc option, can I still use ctrl+alt + n?
2009/2/19 Alexander Graf <ag...@suse.de>:
> You could pass -monitor stdio to qemu. That gives you the monitor on the shell you started qemu from.

So what if I run kvm as a daemon:

kvm -name Linux-x64 -smp 2 -m 2048M -hda hda.img \
    -cdrom ../../var/iso/debian-500-amd64-DVD-1.iso \
    -net nic,vlan=0,macaddr=52:54:00:12:34:00,model=e1000 \
    -net tap,vlan=0,ifname=tap00,script=no \
    -net nic,vlan=1,macaddr=52:54:00:12:34:10,model=e1000 \
    -net tap,vlan=1,ifname=tap10,script=no \
    -vnc :10 -daemonize

After I log out of the host, can I no longer get to the qemu system console?

---
Dongsheng Song
Re: With -vnc option, can I still use ctrl+alt + n?
On Thu, Feb 19, 2009 at 12:22 AM, Alexander Graf <ag...@suse.de> wrote:
> On 19.02.2009, at 09:11, Neo Jia <neo...@gmail.com> wrote:
>> Sorry, could you specify the command line option? I don't know how to get into monitor.
> You could pass -monitor stdio to qemu. That gives you the monitor on the shell you started qemu from.

Yes, this works for me. As I am giving stdio to the serial port, I need to change the serial port to something else. I am going to start another thread about what I just found in the qemu monitor.

Thanks,
Neo
qemu info registers doesn't match the one I saw from kgdb?
hi,

I am seeing a difference between info registers in the qemu monitor window and in kgdb. This is a 32-bit Linux guest running on KVM-84. Right after breaking into the guest kernel with kgdb, I tried the following commands:

(qemu) info registers
EAX=00010060 EBX=c0471e3c ECX= EDX=02fd
ESI=02fd EDI=c04c5d20 EBP=c0471ddc ESP=c0471ddc
EIP=c021129b EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00c0f300
CS =0060 00c09b00
SS =0068 00c09300
DS =007b 00c0f300
FS =
GS =
LDT=
TR = 8b00
GDT= c0407a80 00ff
IDT= c0464000 07ff
CR0=80050033 CR2= CR3=004aa000 CR4=
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
FCW=037f FSW= [ST=0] FTW=00 MXCSR=
FPR0= FPR1=
FPR2= FPR3=
FPR4= FPR5=
FPR6= FPR7=
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=

But from kgdb, I got:

(gdb) info registers
eax     0x0         0x0
ecx     0xc         0xc
edx     0x0         0x0
ebx     0x0         0x0
esp     0xc0471f14  0xc0471f14
ebp     0xc0471fc0  0xc0471fc0
esi     0xc04ac07a  0xc04ac07a
edi     0xc04ad1f9  0xc04ad1f9
eip     0xc047a853  0xc047a853 <setup_arch+1036>
eflags  0x86        [ PF SF ]
cs      0x60        0x60
ss      0x68        0x68
ds      0xc049007b  0xc049007b
es      0x7b        0x7b
fs      0x          0x
gs      0x          0x

So, which one is correct? Do we still maintain info registers in qemu?

Thanks,
Neo
Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)
Anthony Liguori wrote:
> Are you suggesting that one should use cpufreq on a CPU without a constant tsc? Isn't this just asking for trouble?

Depends on the (guest) clock source ;)

tsc isn't going to do well, obviously. kvmclock is designed to handle tsc frequency changes just fine, and with the kvm-84 kernel module it actually works correctly.

HTH,
  Gerd
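Why tsc does badly here: a guest using the tsc clocksource converts cycle counts to time with the frequency it calibrated at boot. Assuming a non-constant TSC that ticks at the current core frequency, every host frequency change skews guest time by the ratio of the two frequencies. A back-of-the-envelope sketch; `tsc_delta_to_ns()` is a hypothetical helper, not the kernel's clocksource code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Converts a TSC delta to nanoseconds using the frequency (in kHz)
 * the guest calibrated once at boot:
 *   ns = cycles * 10^9 / (khz * 10^3) = cycles * 10^6 / khz
 * Assumes the delta is small enough that cycles * 10^6 fits in 64 bits.
 */
static uint64_t tsc_delta_to_ns(uint64_t tsc_delta, uint64_t calibrated_khz)
{
    return tsc_delta * 1000000ULL / calibrated_khz;
}
```

For example, a guest calibrated at 2 GHz (2,000,000 kHz) computes tsc_delta_to_ns(1000000000, 2000000) = 500,000,000 ns after one real second at 1 GHz, so its clock runs at half speed. kvmclock is designed around the host, not the guest, supplying the TSC-to-nanoseconds scaling, which the host can update when the frequency changes.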
Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)
Gerd Hoffmann schrieb:
> Anthony Liguori wrote:
>> Are you suggesting that one should use cpufreq on a CPU without a constant tsc? Isn't this just asking for trouble?
> Depends on the (guest) clock source ;)
> tsc isn't going to do well obviously. kvmclock is designed to handle tsc frequency changes just fine. And with the kvm-84 kernel module it actually works correctly.

So with Linux virtio guests I may have luck, but not with Windows, which can't (yet?) use kvm-clock. Correct?

(It may be some time before I'm able to upgrade and check how it really works.)

--
Tomasz Chmielewski
http://wpkg.org
[PATCH 1/1] kvm: bios: make MMIO address page aligned in guest
The MMIO regions of some devices are not page aligned, for example some EHCI controllers and the virtual Realtek NIC in the guest. The current guest BIOS doesn't guarantee that MMIO start addresses are page aligned. This can make device assignment fail, because KVM only allows registering page-aligned memory slots.

For example, assigning an EHCI controller (MMIO size 1KB) fails when a virtual Realtek NIC (MMIO size 256 bytes) is also present: the Realtek NIC's MMIO in the guest starts at 0xf2001000, so the EHCI controller's MMIO starts at 0xf2001400, which is not page aligned.

MMIO addresses in the guest are allocated by the guest BIOS. This patch makes them page aligned in the BIOS, which fixes the issue.

Signed-off-by: Weidong Han <weidong@intel.com>
---
 bios/rombios32.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/bios/rombios32.c b/bios/rombios32.c
index 9d2eaaa..4dea066 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -967,6 +967,9 @@ static void pci_bios_init_device(PCIDevice *d)
                 *paddr = (*paddr + size - 1) & ~(size - 1);
                 pci_set_io_region_addr(d, i, *paddr);
                 *paddr += size;
+                /* make memory address page aligned */
+                if (!(val & PCI_ADDRESS_SPACE_IO))
+                    *paddr = (*paddr + 0xfff) & 0xfffff000;
             }
         }
         break;
-- 
1.6.0.4
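The address arithmetic from the commit message can be replayed with a simplified model of the allocation step in the hunk above. `assign_mmio()` is a hypothetical helper: it covers only memory BARs with power-of-two sizes and keeps a single running address:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of the memory-BAR branch of pci_bios_init_device():
 * align the running address to the BAR size (a power of two), assign
 * it, advance past the BAR and, if page_align is set, round up to the
 * next 4 KiB boundary as the patch does. Returns the assigned address.
 */
static uint32_t assign_mmio(uint32_t *paddr, uint32_t size, int page_align)
{
    uint32_t addr;

    *paddr = (*paddr + size - 1) & ~(size - 1);
    addr = *paddr;
    *paddr += size;
    if (page_align)
        *paddr = (*paddr + 0xfff) & 0xfffff000;
    return addr;
}
```

Starting at 0xf2001000 with the 256-byte Realtek BAR followed by the 1 KiB EHCI BAR, the EHCI region is placed at 0xf2001400 without the extra rounding and at 0xf2002000 with it, so each region starts on its own page.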
Re: qemu info registers doesn't match the one I saw from kgdb?
Neo Jia wrote:
> hi, I am seeing something different between info registers from qemu monitor window vs. kgdb. This is a 32-bit Linux guest running on KVM-84.
> [...]
> So, which one is correct? Do we still maintain the info registers on qemu?

Yes, we do maintain them (for now only in the kvm tree; upstream is still lacking a few patches). But keep in mind that when you take a snapshot of the guest running inside kgdb via info registers (or via the built-in gdbstub), you are actually debugging kgdb itself, no longer the guest kernel code that kgdb has interrupted. That's why you see different EIP values...

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)
Tomasz Chmielewski wrote:
> So with Linux virtio guests I may have luck, but not so with Windows, which can't (yet?) use kvm-clock. Correct?

tsc isn't the only clocksource; there are also hpet and acpi (pm timer), which shouldn't have trouble with tsc frequency changes. Dunno what Windows uses by default.

cheers,
  Gerd
Re: copyless virtio net thoughts?
On Thursday 19 February 2009 02:54:06 Arnd Bergmann wrote:
> On Wednesday 18 February 2009, Rusty Russell wrote:
>> 2) Direct NIC attachment
>> This is particularly interesting with SR-IOV or other multiqueue nics, but for boutique cases or benchmarks, could be for normal NICs. So far I have some very sketched-out patches: for the attached nic dev_alloc_skb() gets an skb from the guest (which supplies them via some kind of AIO interface), and a branch in netif_receive_skb() which returned it to the guest. This bypasses all firewalling in the host though; we're basically having the guest process drive the NIC directly.
>
> If this is not passing the PCI device directly to the guest, but uses your concept, wouldn't it still be possible to use the firewalling in the host? You can always inspect the headers, drop the frame, etc without copying the whole frame at any point.

It's possible, but you don't want routing or parsing, etc.: the NIC is just directly attached to the guest.

You could do it in qemu or whatever, but it would not be the kernel scheme (netfilter/iptables).

>> 3) Direct interguest networking
>> Anthony has been thinking here: vmsplice has already been mentioned. The idea of passing directly from one guest to another is an interesting one: using dma engines might be possible too. Again, host can't firewall this traffic. Simplest as a dedicated internal lan NIC, but we could theoretically do a fast-path for certain MAC addresses on a general guest NIC.
>
> Another option would be to use an SR-IOV adapter from multiple guests, with a virtual ethernet bridge in the adapter. This moves the overhead from the CPU to the bus and/or adapter, so it may or may not be a real benefit depending on the workload.

Yes, I guess this should work. Even different SR-IOV adapters will simply send to one another. I'm not sure this obviates the desire for direct inter-guest networking, which is more generic, though.

Thanks!
Rusty.
[PATCH] kvm mmu: fix another largepage memory leak
In the paging_fetch function, rmap_remove is called after setting a large pte to non-present. This causes rmap_remove to not drop the reference to the large page. The result is a memory leak of that page.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
---
 arch/x86/kvm/paging_tmpl.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7314c09..0f11792 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -306,9 +306,9 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			continue;
 
 		if (is_large_pte(*sptep)) {
+			rmap_remove(vcpu->kvm, sptep);
 			set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
 			kvm_flush_remote_tlbs(vcpu->kvm);
-			rmap_remove(vcpu->kvm, sptep);
 		}
 
 		if (level == PT_DIRECTORY_LEVEL
-- 
1.5.6.4
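Why the order matters, as a toy model: the sketch below is not KVM code, and `model_rmap_remove()`, `model_set_nonpresent()` and the structs are hypothetical. It assumes, as the commit message implies, that rmap_remove() no longer releases anything once the spte has already been set to non-present:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one shadow PTE mapping a large page that holds
 * a single reference on behalf of the mapping. */
struct model_page { int refcount; };
struct model_spte { bool present; struct model_page *page; };

/* Assumption: like rmap_remove(), only acts on a still-present spte. */
static void model_rmap_remove(struct model_spte *spte)
{
    if (!spte->present)
        return;                 /* nothing recognisable: reference kept */
    spte->page->refcount--;     /* drop the mapping's page reference */
}

static void model_set_nonpresent(struct model_spte *spte)
{
    spte->present = false;
}

/* Order before the patch: clear the spte, then rmap_remove(). */
static int refcount_old_order(void)
{
    struct model_page pg = { .refcount = 1 };
    struct model_spte s = { .present = true, .page = &pg };

    model_set_nonpresent(&s);
    model_rmap_remove(&s);
    return pg.refcount;
}

/* Order after the patch: rmap_remove() first, then clear the spte. */
static int refcount_new_order(void)
{
    struct model_page pg = { .refcount = 1 };
    struct model_spte s = { .present = true, .page = &pg };

    model_rmap_remove(&s);
    model_set_nonpresent(&s);
    return pg.refcount;
}
```

In the old order the reference count stays at 1 after the mapping is gone, which is the leak; in the patched order it drops to 0.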
Re: copyless virtio net thoughts?
On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
>> 2) Direct NIC attachment
>> This is particularly interesting with SR-IOV or other multiqueue nics, [...] This bypasses all firewalling in the host though; we're basically having the guest process drive the NIC directly.
>
> Hi Rusty,
>
> Can I clarify that the idea with utilising SR-IOV would be to assign virtual functions to guests? That is, something conceptually similar to PCI pass-through in Xen (although I'm not sure that anyone has virtual function pass-through working yet).

Not quite: IMHO PCI passthrough is the *wrong* way to do it: it makes migration complicated (if not impossible), and requires emulation or the same NIC on the destination host.

This would be the *host* seeing the virtual functions as multiple NICs, then the ability to attach a given NIC directly to a process.

This isn't guest-visible: the kvm process is configured to connect directly to a NIC, rather than (say) bridging through the host.

> If so, wouldn't this also be useful on machines that have multiple NICs?

Yes, but mainly as a benchmark hack AFAICT :)

Hope that clarifies,
Rusty.
Re: copyless virtio net thoughts?
* Simon Horman (ho...@verge.net.au) wrote:
> On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > 2) Direct NIC attachment [...]
>
> Can I clarify that the idea with utilising SR-IOV would be to assign virtual functions to guests? That is, something conceptually similar to PCI pass-through in Xen (although I'm not sure that anyone has virtual function pass-through working yet). If so, wouldn't this also be useful on machines that have multiple NICs?

This would be the typical use case for SR-IOV. But I think Rusty is referring to giving a NIC directly to a guest while the guest still sees a virtio NIC (not pass-through/device assignment). So there's no bridge, and it's zero copy: the DMA buffers are supplied by the guest, but the host has the driver for the physical NIC or the VF.

thanks,
-chris
Re: [PATCH] kvm mmu: fix another largepage memory leak
On Thu, Feb 19, 2009 at 12:18:56PM +0100, Joerg Roedel wrote:
> In the paging_fetch function rmap_remove is called after setting a large pte to non-present. This causes rmap_remove to not drop the reference to the large page. The result is a memory leak of that page.
>
> Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
> [...]

ACK
Re: Houston, we have May 15, 1953 (says guest when host uses cpufreq, and dies)
Marcelo Tosatti schrieb:
> On Wed, Feb 18, 2009 at 09:02:31PM +0100, Tomasz Chmielewski wrote:
>> Marcelo Tosatti schrieb:
>>> On Wed, Feb 18, 2009 at 08:18:50PM +0100, Tomasz Chmielewski wrote:
>>>> Marcelo Tosatti schrieb:
>>>>>> - what CPU frequency will the guests show? Current host frequency? Host frequency from the moment the guest booted (i.e. right now the guest will show 1GHz even if the host is running at 2GHz, or the way around)?
>>>>> Host frequency from the moment the guest booted, since the guest does not receive frequency change notifications.
>>>> Is it possible (or is it planned) to pass frequency to the guest (the one which is displayed in /proc/cpuinfo)?
>>> Possible, not planned AFAIK.
>> Possible, right now? How?
> Write a paravirt notification scheme.

That's a bit low level. I was thinking of a parameter to kvm (the binary) which would pass the value to the guest.

--
Tomasz Chmielewski
http://wpkg.org
Re: copyless virtio net thoughts?
On Thursday 19 February 2009, Rusty Russell wrote:
> Not quite: I think PCI passthrough IMHO is the *wrong* way to do it: it makes migrate complicated (if not impossible), and requires emulation or the same NIC on the destination host.
> This would be the *host* seeing the virtual functions as multiple NICs, then the ability to attach a given NIC directly to a process.

I guess what you mean then is what Intel calls VMDq, not SR-IOV. Eddie has some slides about this at http://docs.huihoo.com/kvm/kvmforum2008/kdf2008_7.pdf .

The latest network cards support both operation modes, and it appears to me that there is a place for both. VMDq gives you the best performance without limiting flexibility, while SR-IOV performance can in theory be even better, but at the cost of a lot of flexibility and potentially of local (guest-to-guest) performance.

AFAICT, any card that supports SR-IOV should also allow a VMDq-like model, as you describe.

Arnd
[PATCH] kvm/qemu: use statfs to determine size of huge pages
The current method of finding out the size of huge pages does not work reliably anymore. Current Linux supports more than one huge page size, but /proc/meminfo only shows one of the supported sizes. The page size actually in use can be found by calling statfs on the hugetlbfs path. This patch changes kvm/qemu to use statfs instead of parsing /proc/meminfo.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
---
 qemu/sysemu.h |    2 +-
 qemu/vl.c     |   42 ++++++++++++++++++++----------------------
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/qemu/sysemu.h b/qemu/sysemu.h
index 19464cf..4333495 100644
--- a/qemu/sysemu.h
+++ b/qemu/sysemu.h
@@ -99,7 +99,7 @@ extern int graphic_rotate;
 extern int no_quit;
 extern int semihosting_enabled;
 extern int old_param;
-extern int hpagesize;
+extern long hpagesize;
 extern const char *bootp_filename;
 
 #ifdef USE_KQEMU
diff --git a/qemu/vl.c b/qemu/vl.c
index bbd7aa3..b8c1162 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -61,6 +61,7 @@
 #include <sys/ioctl.h>
 #include <sys/resource.h>
 #include <sys/socket.h>
+#include <sys/vfs.h>
 #include <netinet/in.h>
 #include <net/if.h>
 #if defined(__NetBSD__)
@@ -254,7 +255,7 @@ const char *mem_path = NULL;
 #ifdef MAP_POPULATE
 int mem_prealloc = 1;  /* force preallocation of physical target memory */
 #endif
-int hpagesize = 0;
+long hpagesize = 0;
 const char *cpu_vendor_string;
 #ifdef TARGET_ARM
 int old_param = 0;
@@ -4717,32 +4718,29 @@ void qemu_get_launch_info(int *argc, char ***argv, int *opt_daemonize, const cha
 }
 
 #ifdef USE_KVM
-static int gethugepagesize(void)
+
+#define HUGETLBFS_MAGIC       0x958458f6
+
+static long gethugepagesize(const char *path)
 {
-    int ret, fd;
-    char buf[4096];
-    const char *needle = "Hugepagesize:";
-    char *size;
-    unsigned long hugepagesize;
+    struct statfs fs;
+    int ret;
 
-    fd = open("/proc/meminfo", O_RDONLY);
-    if (fd < 0) {
-        perror("open");
-        exit(0);
+    do {
+        ret = statfs(path, &fs);
+    } while (ret != 0 && errno == EINTR);
+
+    if (ret != 0) {
+        perror("statfs");
+        return 0;
     }
 
-    ret = read(fd, buf, sizeof(buf));
-    if (ret < 0) {
-        perror("read");
-        exit(0);
+    if (fs.f_type != HUGETLBFS_MAGIC) {
+        fprintf(stderr, "Path not on HugeTLBFS: %s\n", path);
+        return 0;
     }
 
-    size = strstr(buf, needle);
-    if (!size)
-        return 0;
-    size += strlen(needle);
-    hugepagesize = strtol(size, NULL, 0);
-    return hugepagesize;
+    return fs.f_bsize;
 }
 
 static void *alloc_mem_area(size_t memory, unsigned long *len, const char *path)
@@ -4762,7 +4760,7 @@ static void *alloc_mem_area(size_t memory, unsigned long *len, const char *path)
     if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
         return NULL;
 
-    hpagesize = gethugepagesize() * 1024;
+    hpagesize = gethugepagesize(path);
     if (!hpagesize)
         return NULL;
 
-- 
1.5.6.4
Re: [PATCH v3.14] Report IRQ injection status to userspace.
Gleb Natapov wrote:
> IRQ injection status is either -1 (if there was no CPU found that should accept the interrupt because the IRQ was masked, the ioapic was misconfigured, or ...) or >= 0, in which case the number indicates to how many CPUs the interrupt was injected. If the value is 0, it means that the interrupt was coalesced and probably should be reinjected.

Applied, thanks. I hacked kvm_set_msi() to always return 1; please follow up with a fix to that.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
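A sketch of how a caller might interpret the status convention described in the patch. How the value actually reaches userspace (which ioctl or structure field) is not shown in this message, so the enum and `classify_irq_status()` below are hypothetical:

```c
#include <assert.h>

/* Hypothetical names for the three outcomes described in the patch. */
enum irq_outcome {
    IRQ_NOT_DELIVERED, /* -1: no CPU would accept it (masked, bad ioapic setup, ...) */
    IRQ_COALESCED,     /*  0: merged with a pending interrupt; may need reinjection */
    IRQ_DELIVERED      /* >0: injected into that many CPUs */
};

static enum irq_outcome classify_irq_status(int status)
{
    if (status < 0)
        return IRQ_NOT_DELIVERED;
    if (status == 0)
        return IRQ_COALESCED;
    return IRQ_DELIVERED;
}
```

With the temporary kvm_set_msi() change mentioned in the reply, every MSI injection reports 1 and would therefore classify as IRQ_DELIVERED until the follow-up fix.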
Re: Recent kvm and vmware server comparisons?
Martin Maurer wrote:
>> I suppose no-one has any?
> VMware includes in its EULA (End User License Agreement) a prohibition for any licensee to publish benchmark results without VMware's approval. (see https://www.vmware.com/tryvmware/eula.php)
> Maybe this is a reason why all published VMware benchmarks look quite similar :-)
> I would love to see a comparison, but due to these restrictions it's hard to get independent results.

Why compare kvm to VMware and not to real hardware? The results can then be compared to vmware/hardware and hyper-v/hardware.

--
Hans
[ kvm-Bugs-2617499 ] Patch from upstream attached
Bugs item #2617499, was opened at 2009-02-19 12:33
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2617499&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jeff (toxxic)
Assigned to: Nobody/Anonymous (nobody)
Summary: Patch from upstream attached

Initial Comment:
This is a patch, derived from the QEMU subversion repository. It fixes this problem, which could potentially cause data corruption.
[ kvm-Bugs-2617499 ] Patch from upstream attached
Bugs item #2617499, was opened at 2009-02-19 12:33
Message generated for change (Comment added) made by toxxic
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2617499&group_id=180599

Category: None
Group: None
Status: Deleted
Resolution: None
Priority: 1
Private: No
Submitted By: Jeff (toxxic)
Assigned to: Nobody/Anonymous (nobody)
Summary: Patch from upstream attached

Initial Comment:
This is a patch, derived from the QEMU subversion repository. It fixes this problem, which could potentially cause data corruption.

----------------------------------------------------------------------
Comment By: Jeff (toxxic)
Date: 2009-02-19 12:35

Message:
Blah... This was supposed to be attached to bug #2556746.
[ kvm-Bugs-2556746 ] FreeBSD/PC-BSD text screen corruption
Bugs item #2556746, was opened at 2009-02-02 04:19
Message generated for change (Comment added) made by toxxic
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2556746&group_id=180599

Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Tim Knowles (knowlet)
Assigned to: Nobody/Anonymous (nobody)
Summary: FreeBSD/PC-BSD text screen corruption

Initial Comment:
Using kvm-83, kvm-82 or kvm-81, I am unable to install FreeBSD or PC-BSD due to screen corruption (screenshot attached). The initial boot menu is shown and is legible, but once you select the boot option and the boot process continues, the screen becomes corrupted.

I initially discovered the problem when setting up an LVM-backed guest in virt-manager, but I have attached a minimal command line below that allows you to trigger it.

1) It would appear that this problem was introduced in kvm-81 (kvm-80 does not exhibit the problem with FreeBSD or PC-BSD, but I have not tested any other versions of kvm).
2) If I use the -no-kvm switch with kvm-83, this problem does not occur.

Details:
Host: 1 x Intel Core i7 920, Fedora 10 64-bit, 6GB memory (Dell Studio XPS 435)
kvm-83: self compiled - gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)
cmd line: /usr/local/bin/qemu-system-x86_64 -m 512 -cdrom 7.1-RELEASE-amd64-dvd1.iso
Guests: FreeBSD 7.1, PC-BSD 7.0.2

PS: I'd also like to add my thanks for creating KVM; it's a fabulous tool.

Many thanks

----------------------------------------------------------------------
Comment By: Jeff (toxxic)
Date: 2009-02-19 12:56

Message:
The QEMU subversion browser can generate a patch for this issue:
http://svn.savannah.gnu.org/viewvc/trunk/exec.c?r1=6601&r2=6628&pathrev=6628&root=qemu&view=patch

This patch applies cleanly against qemu/exec.c in KVM-81 and KVM-84.

----------------------------------------------------------------------
Comment By: Aurelien Jarno (aurel32)
Date: 2009-02-18 13:38

Message:
This is fixed in revision 6628 of QEMU, so probably soon in KVM. Any workaround to this bug, such as the one suggested earlier in this thread, is a bad idea, as the screen is probably not the only thing affected by this bug. This means that some data can be corrupted.

----------------------------------------------------------------------
Comment By: Radek Hladik (kedarius)
Date: 2009-02-04 11:06

Message:
Confirming the problem too.
kvm-83-2.fc11.x86_64
libvirt-0.6.0-1.fc11.x86_64
virt-manager-0.6.1-1.fc11.x86_64
qemu-0.9.1-12.fc11.x86_64

For libvirt and virt-manager users, this is how to use the workaround mentioned by toxxic: press 6 at boot, type "set console=comconsole", use view->serial consoles and type "boot" (choose xterm as the terminal type).

----------------------------------------------------------------------
Comment By: Jeff (toxxic)
Date: 2009-02-03 23:57

Message:
I can confirm this happens when using VNC for the console. Here's a workaround:

Start kvm with a -serial flag. You're going to use it as a serial console.

qemu-system-x86_64 -serial telnet::2226,server,nowait -cdrom 7.1-RELEASE-amd64-disc1.iso [...]

Then connect to port 2226:

telnet localhost 2226

Then boot the FreeBSD CD, and when the (legible) boot loader comes up, choose "6. Escape to loader prompt". At the OK prompt, type:

set console=comconsole

The OK prompt will now appear in your telnet session. Type boot and hit return, then continue with a legible FreeBSD install via your telnet session.

You may want to set up a serial console on the FreeBSD system that you installed, as well.
kvm-userspace build break (linux/types.h)
A recent kernel merge breaks kvm-userspace build:

make[1]: Entering directory `/root/hollisb/kvm-userspace.git/libkvm'
gcc -m64 -D__x86_64__ -MMD -MF ./.libkvm.d -g -fomit-frame-pointer -Wall -fno-stack-protector -I /root/hollisb/kvm-userspace.git/kernel/include -c -o libkvm.o libkvm.c
In file included from /usr/include/bits/fcntl.h:24,
                 from /usr/include/fcntl.h:34,
                 from libkvm.c:30:
/usr/include/sys/types.h:46: error: conflicting types for ‘loff_t’
/usr/include/linux/types.h:30: error: previous declaration of ‘loff_t’ was here
/usr/include/sys/types.h:62: error: conflicting types for ‘dev_t’
/usr/include/linux/types.h:13: error: previous declaration of ‘dev_t’ was here
[...]

I built like so:
./configure
make -C kernel LINUX=/path/to/kvm.git sync
make

The problem appears to be 00bfddaf7f68a6551319b536f052040c370756b0 and cef3767852a9b1a7ff4a8dfe0969e2d32eb728df, both from Jaswinder Singh Rajput <jaswin...@infradead.org>:

-#include <asm/types.h>
+#include <linux/types.h>

With these changes, libkvm.c ends up including /usr/include/linux/types.h instead of the intended ../kernel/include/linux/types.h.

Avi, suggestions? More make sync hacks?

-- 
Hollis Blanchard
IBM Linux Technology Center
Re: kvm-userspace build break (linux/types.h)
On Thu, Feb 19, 2009 at 03:50:14PM -0600, Hollis Blanchard wrote:
> A recent kernel merge breaks kvm-userspace build:
> [...]
> With these changes, libkvm.c ends up including /usr/include/linux/types.h,
> instead of the intended ../kernel/include/linux/types.h.

I had the same problem some weeks ago. IIRC I fixed it with some include reordering in libkvm.h.

Joerg
Re: copyless virtio net thoughts?
On Thu, Feb 19, 2009 at 10:06:17PM +1030, Rusty Russell wrote:
> On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > > 2) Direct NIC attachment
> > > This is particularly interesting with SR-IOV or other multiqueue nics, but for boutique cases or benchmarks, could be for normal NICs. So far I have some very sketched-out patches: for the attached nic dev_alloc_skb() gets an skb from the guest (which supplies them via some kind of AIO interface), and a branch in netif_receive_skb() which returned it to the guest. This bypasses all firewalling in the host though; we're basically having the guest process drive the NIC directly.
> >
> > Hi Rusty,
> >
> > Can I clarify that the idea with utilising SR-IOV would be to assign virtual functions to guests? That is, something conceptually similar to PCI pass-through in Xen (although I'm not sure that anyone has virtual function pass-through working yet).
>
> Not quite: I think PCI passthrough IMHO is the *wrong* way to do it: it makes migrate complicated (if not impossible), and requires emulation or the same NIC on the destination host.
>
> This would be the *host* seeing the virtual functions as multiple NICs, then the ability to attach a given NIC directly to a process.
>
> This isn't guest-visible: the kvm process is configured to connect directly to a NIC, rather than (say) bridging through the host.

Hi Rusty, Hi Chris,

Thanks for the clarification.

I think that the approach that Xen recommends for migration is to use a bonding device that accesses the pass-through device if present and a virtual nic.

The idea that you outline above does sound somewhat cleaner :-)

> > If so, wouldn't this also be useful on machines that have multiple NICs?
>
> Yes, but mainly as a benchmark hack AFAICT :)

Ok, I was under the impression that at least in the Xen world it was something people actually used. But I could easily be mistaken.

> Hope that clarifies,
> Rusty.
On Thu, Feb 19, 2009 at 03:37:52AM -0800, Chris Wright wrote:
> * Simon Horman (ho...@verge.net.au) wrote:
> > On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
> > > 2) Direct NIC attachment
> > > [...]
> >
> > Can I clarify that the idea with utilising SR-IOV would be to assign virtual functions to guests? [...]
> > If so, wouldn't this also be useful on machines that have multiple NICs?
>
> This would be the typical usecase for sr-iov. But I think Rusty is referring to giving a nic directly to a guest but the guest is still seeing a virtio nic (not pass-through/device-assignment). So there's no bridge, and zero copy so the dma buffers are supplied by guest, but host has the driver for the physical nic or the VF.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/
  W: www.valinux.co.jp/en
CentOS 5.x on kvm-83 doesn't think Pentium Pro has fast system calls
Why wouldn't SEP be recognized by kvm-83 running a CentOS 5.x guest on a ppro?

Steven
RE: kvm-userspace build break (linux/types.h)
For x86 and ia64, linux/types.h is changed to asm/types.h when syncing the source; see kernel/x86/hack-module.awk for the details.

Xiantao

Joerg Roedel wrote:
> On Thu, Feb 19, 2009 at 03:50:14PM -0600, Hollis Blanchard wrote:
>> A recent kernel merge breaks kvm-userspace build:
>> [...]
>> With these changes, libkvm.c ends up including /usr/include/linux/types.h,
>> instead of the intended ../kernel/include/linux/types.h.
>
> I had the same problem some weeks ago. IIRC I fixed it with some include
> reordering in libkvm.h.
>
> Joerg
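Xiantao's reply says the sync step rewrites the include so that libkvm.c never reaches /usr/include/linux/types.h. A minimal sketch of that kind of line-level substitution, assuming the rule is a plain replacement of `#include <linux/types.h>` by `#include <asm/types.h>`; the real rule lives in kernel/x86/hack-module.awk and may differ, and `rewrite_types_include` is a hypothetical name used only here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper modelling the substitution described for the sync
 * step (the actual awk rule may differ). Copies `line` into `out`
 * (capacity `outsz`), replacing a leading "#include <linux/types.h>"
 * with "#include <asm/types.h>" and keeping anything that follows.
 * Returns 1 if rewritten, 0 if copied unchanged, -1 if `out` is too small. */
static int rewrite_types_include(const char *line, char *out, size_t outsz)
{
	static const char from[] = "#include <linux/types.h>";
	static const char to[] = "#include <asm/types.h>";
	size_t flen = sizeof(from) - 1;
	int n;

	if (strncmp(line, from, flen) == 0) {
		n = snprintf(out, outsz, "%s%s", to, line + flen);
		return (n < 0 || (size_t)n >= outsz) ? -1 : 1;
	}
	n = snprintf(out, outsz, "%s", line);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Applied to each line of a synced header, only the offending directive changes; every other line is copied through untouched.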
Re: Recent kvm and vmware server comparisons?
On Thursday 19 February 2009, Hans de Bruin wrote:
> Martin Maurer wrote:
> >> I suppose no-one has any?
> >
> > VMware includes in its EULA (End User License Agreement) a prohibition for any licensee to publish benchmark results without VMware's approval (see https://www.vmware.com/tryvmware/eula.php).
> > Maybe this is a reason why all published VMware benchmarks look quite similar :-)
> > I would love to see a comparison, but due to these restrictions it's hard to get independent results.
>
> Why compare kvm to vmware and not to real hardware? The results can then be compared to vmware/hardware and hyper-v/hardware.

hyper-v doesn't provide network or disk io ;)

-- 
Thomas Fjellstrom
tfjellst...@shaw.ca
[PATCH v10 3/7] PCI: reserve bus range for SR-IOV device
Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 drivers/pci/iov.c   |   34 ++
 drivers/pci/pci.h   |    5 +
 drivers/pci/probe.c |    3 +++
 3 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 3bca8f8..0b80437 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -14,6 +14,16 @@
 #include "pci.h"
 
 
+static inline void virtfn_bdf(struct pci_dev *dev, int id, u8 *busnr, u8 *devfn)
+{
+	u16 bdf;
+
+	bdf = (dev->bus->number << 8) + dev->devfn +
+		dev->sriov->offset + dev->sriov->stride * id;
+	*busnr = bdf >> 8;
+	*devfn = bdf & 0xff;
+}
+
 static int sriov_init(struct pci_dev *dev, int pos)
 {
 	int i;
@@ -208,3 +218,27 @@ void pci_restore_iov_state(struct pci_dev *dev)
 	if (dev->sriov)
 		sriov_restore_state(dev);
 }
+
+/**
+ * pci_iov_bus_range - find bus range used by Virtual Function
+ * @bus: the PCI bus
+ *
+ * Returns max number of buses (exclude current one) used by Virtual
+ * Functions.
+ */
+int pci_iov_bus_range(struct pci_bus *bus)
+{
+	int max = 0;
+	u8 busnr, devfn;
+	struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (!dev->sriov)
+			continue;
+		virtfn_bdf(dev, dev->sriov->total - 1, &busnr, &devfn);
+		if (busnr > max)
+			max = busnr;
+	}
+
+	return max ? max - bus->number : 0;
+}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index b24c9e2..2cf32f5 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -217,6 +217,7 @@ extern void pci_iov_release(struct pci_dev *dev);
 extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 				enum pci_bar_type *type);
 extern void pci_restore_iov_state(struct pci_dev *dev);
+extern int pci_iov_bus_range(struct pci_bus *bus);
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
 {
@@ -234,6 +235,10 @@ static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 static inline void pci_restore_iov_state(struct pci_dev *dev)
 {
 }
+static inline int pci_iov_bus_range(struct pci_bus *bus)
+{
+	return 0;
+}
 #endif /* CONFIG_PCI_IOV */
 
 #endif /* DRIVERS_PCI_H */
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 03b6f29..4c8abd0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1078,6 +1078,9 @@ unsigned int __devinit pci_scan_child_bus(struct pci_bus *bus)
 	for (devfn = 0; devfn < 0x100; devfn += 8)
 		pci_scan_slot(bus, devfn);
 
+	/* Reserve buses for SR-IOV capability. */
+	max += pci_iov_bus_range(bus);
+
 	/*
 	 * After performing arch-dependent fixup of the bus, look behind
 	 * all PCI-to-PCI bridges on this bus.
-- 
1.6.1
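The arithmetic in virtfn_bdf() and pci_iov_bus_range() can be checked outside the kernel. A minimal userspace sketch, assuming a single PF on the bus; `vf_routing_id` and `vf_bus_range` are hypothetical names, and the offset/stride values in the usage below are illustrative, not taken from any real device.

```c
#include <assert.h>
#include <stdint.h>

/* Model of virtfn_bdf(): a VF's 16-bit Routing ID is the PF's
 * (bus << 8) + devfn plus First VF Offset plus VF Stride * id.
 * The high byte is the VF's bus number, the low byte its devfn. */
static void vf_routing_id(uint8_t pf_bus, uint8_t pf_devfn,
			  uint16_t offset, uint16_t stride, int id,
			  uint8_t *busnr, uint8_t *devfn)
{
	uint16_t bdf = (uint16_t)((pf_bus << 8) + pf_devfn + offset + stride * id);

	*busnr = bdf >> 8;
	*devfn = bdf & 0xff;
}

/* Model of pci_iov_bus_range() for one PF: how many buses beyond the PF's
 * own are needed to reach the last VF (id = total - 1). */
static int vf_bus_range(uint8_t pf_bus, uint8_t pf_devfn,
			uint16_t offset, uint16_t stride, int total)
{
	uint8_t busnr, devfn;

	if (total <= 0)
		return 0;
	vf_routing_id(pf_bus, pf_devfn, offset, stride, total - 1, &busnr, &devfn);
	return busnr > pf_bus ? busnr - pf_bus : 0;
}
```

For example, a PF at 02:00.0 with offset 0x180, stride 1 and four VFs places its VFs at Routing IDs 0x380 to 0x383, all on bus 3, so one extra bus must be reserved; this is why pci_scan_child_bus() adds the result to `max`.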
[PATCH v10 0/7] PCI: Linux kernel SR-IOV support
Greetings,

The following patches are intended to support the SR-IOV capability in the Linux kernel. With these patches, a PCI device with this capability can be turned into multiple devices from the software perspective, which will benefit KVM and serve other purposes such as QoS, security, etc.

The SR-IOV specification can be found at (membership required):
http://www.pcisig.com/members/downloads/specifications/iov/sr-iov1.0_11Sep07.pdf

Devices that support SR-IOV are available from the following vendors:
http://download.intel.com/design/network/ProdBrf/320025.pdf
http://www.myri.com/vlsi/Lanai_Z8ES_Datasheet.pdf
http://www.neterion.com/products/pdfs/X3100ProductBrief.pdf

Physical Function driver patches for the Intel 82576 NIC are available:
http://patchwork.kernel.org/patch/8063/
http://patchwork.kernel.org/patch/8064/
http://patchwork.kernel.org/patch/8065/
http://patchwork.kernel.org/patch/8066/

Major changes from v9 to v10:
1. minor fix in pci_restore_iov_state().
2. respin against the latest tree.

Yu Zhao (7):
  PCI: initialize and release SR-IOV capability
  PCI: restore saved SR-IOV state
  PCI: reserve bus range for SR-IOV device
  PCI: add SR-IOV API for Physical Function driver
  PCI: handle SR-IOV Virtual Function Migration
  PCI: document SR-IOV sysfs entries
  PCI: manual for SR-IOV user and driver developer

 Documentation/ABI/testing/sysfs-bus-pci |   27 ++
 Documentation/DocBook/kernel-api.tmpl   |    1 +
 Documentation/PCI/pci-iov-howto.txt     |   99 +
 drivers/pci/Kconfig                     |   13 +
 drivers/pci/Makefile                    |    3 +
 drivers/pci/iov.c                       |  711 +++
 drivers/pci/pci.c                       |    8 +
 drivers/pci/pci.h                       |   53 +++
 drivers/pci/probe.c                     |    7 +
 include/linux/pci.h                     |   28 ++
 include/linux/pci_regs.h                |   33 ++
 11 files changed, 983 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/PCI/pci-iov-howto.txt
 create mode 100644 drivers/pci/iov.c
[PATCH v10 2/7] PCI: restore saved SR-IOV state
Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 drivers/pci/iov.c |   29 +
 drivers/pci/pci.c |    1 +
 drivers/pci/pci.h |    4 
 3 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index e6736d4..3bca8f8 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -128,6 +128,25 @@ static void sriov_release(struct pci_dev *dev)
 	dev->sriov = NULL;
 }
 
+static void sriov_restore_state(struct pci_dev *dev)
+{
+	int i;
+	u16 ctrl;
+	struct pci_sriov *iov = dev->sriov;
+
+	pci_read_config_word(dev, iov->pos + PCI_SRIOV_CTRL, &ctrl);
+	if (ctrl & PCI_SRIOV_CTRL_VFE)
+		return;
+
+	for (i = PCI_SRIOV_RESOURCES; i <= PCI_SRIOV_RESOURCE_END; i++)
+		pci_update_resource(dev, i);
+
+	pci_write_config_dword(dev, iov->pos + PCI_SRIOV_SYS_PGSIZE, iov->pgsz);
+	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+	if (iov->ctrl & PCI_SRIOV_CTRL_VFE)
+		msleep(100);
+}
+
 /**
  * pci_iov_init - initialize the IOV capability
  * @dev: the PCI device
@@ -179,3 +198,13 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 	return dev->sriov->pos + PCI_SRIOV_BAR +
 		4 * (resno - PCI_SRIOV_RESOURCES);
 }
+
+/**
+ * pci_restore_iov_state - restore the state of the IOV capability
+ * @dev: the PCI device
+ */
+void pci_restore_iov_state(struct pci_dev *dev)
+{
+	if (dev->sriov)
+		sriov_restore_state(dev);
+}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2eba2a5..8e21912 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -773,6 +773,7 @@ pci_restore_state(struct pci_dev *dev)
 	}
 	pci_restore_pcix_state(dev);
 	pci_restore_msi_state(dev);
+	pci_restore_iov_state(dev);
 
 	return 0;
 }
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 451db74..b24c9e2 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -216,6 +216,7 @@ extern int pci_iov_init(struct pci_dev *dev);
 extern void pci_iov_release(struct pci_dev *dev);
 extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 				enum pci_bar_type *type);
+extern void pci_restore_iov_state(struct pci_dev *dev);
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
 {
@@ -230,6 +231,9 @@ static inline int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 {
 	return 0;
 }
+static inline void pci_restore_iov_state(struct pci_dev *dev)
+{
+}
 #endif /* CONFIG_PCI_IOV */
 
 #endif /* DRIVERS_PCI_H */
-- 
1.6.1
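The ordering in sriov_restore_state() matters: nothing is touched while VF Enable is still set, and when state is restored, the VF BARs and System Page Size are written before the Control register that may re-enable the VFs. A userspace model of just that decision and ordering, assuming PCI_SRIOV_CTRL_VFE is bit 0 (`SKETCH_CTRL_VFE` below); `struct sriov_model` and `model_restore` are hypothetical stand-ins for config-space accesses, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of PCI_SRIOV_CTRL_VFE (VF Enable) for this sketch. */
#define SKETCH_CTRL_VFE 0x01

/* Hypothetical register model; each *_step field records the order
 * (1, 2, 3) in which that part of the state was written. */
struct sriov_model {
	uint16_t ctrl;		/* live SR-IOV Control register */
	uint32_t sys_pgsz;	/* live System Page Size register */
	int next_step;		/* write counter */
	int bars_step, pgsz_step, ctrl_step;
};

/* Mirrors the logic of sriov_restore_state(). Returns -1 if VF Enable is
 * still set (nothing restored), 1 if the saved Control value re-enables
 * VFs (the patch then sleeps 100 ms), and 0 otherwise. */
static int model_restore(struct sriov_model *hw, uint16_t saved_ctrl,
			 uint32_t saved_pgsz)
{
	if (hw->ctrl & SKETCH_CTRL_VFE)
		return -1;

	hw->bars_step = ++hw->next_step;	/* stands in for pci_update_resource() */
	hw->sys_pgsz = saved_pgsz;
	hw->pgsz_step = ++hw->next_step;
	hw->ctrl = saved_ctrl;			/* VF Enable is written last */
	hw->ctrl_step = ++hw->next_step;

	return (saved_ctrl & SKETCH_CTRL_VFE) ? 1 : 0;
}
```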
[PATCH v10 1/7] PCI: initialize and release SR-IOV capability
Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 drivers/pci/Kconfig      |   13 
 drivers/pci/Makefile     |    3 +
 drivers/pci/iov.c        |  181 ++
 drivers/pci/pci.c        |    7 ++
 drivers/pci/pci.h        |   37 ++
 drivers/pci/probe.c      |    4 +
 include/linux/pci.h      |    8 ++
 include/linux/pci_regs.h |   33 +
 8 files changed, 286 insertions(+), 0 deletions(-)
 create mode 100644 drivers/pci/iov.c

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 2a4501d..e8ea3e8 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -59,3 +59,16 @@ config HT_IRQ
 	   This allows native hypertransport devices to use interrupts.
 
 	   If unsure say Y.
+
+config PCI_IOV
+	bool "PCI IOV support"
+	depends on PCI
+	select PCI_MSI
+	default n
+	help
+	  PCI-SIG I/O Virtualization (IOV) Specifications support.
+	  Single Root IOV: allows the Physical Function driver to enable
+	  the hardware capability, so the Virtual Function is accessible
+	  via the PCI Configuration Space using its own Bus, Device and
+	  Function Numbers. Each Virtual Function also has the PCI Memory
+	  Space to map the device specific register set.
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 3d07ce2..ba99282 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -29,6 +29,9 @@ obj-$(CONFIG_DMAR) += dmar.o iova.o intel-iommu.o
 
 obj-$(CONFIG_INTR_REMAP) += dmar.o intr_remapping.o
 
+# PCI IOV support
+obj-$(CONFIG_PCI_IOV) += iov.o
+
 #
 # Some architectures use the generic PCI setup functions
 #
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
new file mode 100644
index 000..e6736d4
--- /dev/null
+++ b/drivers/pci/iov.c
@@ -0,0 +1,181 @@
+/*
+ * drivers/pci/iov.c
+ *
+ * Copyright (C) 2009 Intel Corporation, Yu Zhao <yu.z...@intel.com>
+ *
+ * PCI Express I/O Virtualization (IOV) support.
+ *   Single Root IOV 1.0
+ */
+
+#include <linux/pci.h>
+#include <linux/mutex.h>
+#include <linux/string.h>
+#include <linux/delay.h>
+#include "pci.h"
+
+
+static int sriov_init(struct pci_dev *dev, int pos)
+{
+	int i;
+	int rc;
+	int nres;
+	u32 pgsz;
+	u16 ctrl, total, offset, stride;
+	struct pci_sriov *iov;
+	struct resource *res;
+	struct pci_dev *pdev;
+
+	if (dev->pcie_type != PCI_EXP_TYPE_RC_END &&
+	    dev->pcie_type != PCI_EXP_TYPE_ENDPOINT)
+		return -ENODEV;
+
+	pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl);
+	if (ctrl & PCI_SRIOV_CTRL_VFE) {
+		pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
+		ssleep(1);
+	}
+
+	pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
+	if (!total)
+		return 0;
+
+	list_for_each_entry(pdev, &dev->bus->devices, bus_list)
+		if (pdev->sriov)
+			break;
+	if (list_empty(&dev->bus->devices) || !pdev->sriov)
+		pdev = NULL;
+
+	ctrl = 0;
+	if (!pdev && pci_ari_enabled(dev->bus))
+		ctrl |= PCI_SRIOV_CTRL_ARI;
+
+	pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, ctrl);
+	pci_write_config_word(dev, pos + PCI_SRIOV_NUM_VF, total);
+	pci_read_config_word(dev, pos + PCI_SRIOV_VF_OFFSET, &offset);
+	pci_read_config_word(dev, pos + PCI_SRIOV_VF_STRIDE, &stride);
+	if (!offset || (total > 1 && !stride))
+		return -EIO;
+
+	pci_read_config_dword(dev, pos + PCI_SRIOV_SUP_PGSIZE, &pgsz);
+	i = PAGE_SHIFT > 12 ? PAGE_SHIFT - 12 : 0;
+	pgsz &= ~((1 << i) - 1);
+	if (!pgsz)
+		return -EIO;
+
+	pgsz &= ~(pgsz - 1);
+	pci_write_config_dword(dev, pos + PCI_SRIOV_SYS_PGSIZE, pgsz);
+
+	nres = 0;
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		res = dev->resource + PCI_SRIOV_RESOURCES + i;
+		i += __pci_read_base(dev, pci_bar_unknown, res,
+				     pos + PCI_SRIOV_BAR + i * 4);
+		if (!res->flags)
+			continue;
+		if (resource_size(res) & (PAGE_SIZE - 1)) {
+			rc = -EIO;
+			goto failed;
+		}
+		res->end = res->start + resource_size(res) * total - 1;
+		nres++;
+	}
+
+	iov = kzalloc(sizeof(*iov), GFP_KERNEL);
+	if (!iov) {
+		rc = -ENOMEM;
+		goto failed;
+	}
+
+	iov->pos = pos;
+	iov->nres = nres;
+	iov->ctrl = ctrl;
+	iov->total = total;
+	iov->offset = offset;
+	iov->stride = stride;
+	iov->pgsz = pgsz;
+	iov->self = dev;
+	pci_read_config_dword(dev, pos + PCI_SRIOV_CAP, &iov->cap);
+	pci_read_config_byte(dev, pos + PCI_SRIOV_FUNC_LINK, &iov->link);
+
+	if (pdev)
+		iov->pdev = pci_dev_get(pdev);
+	else {
+		iov->pdev = dev;
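Two pieces of sriov_init() are pure arithmetic: choosing the System Page Size and stretching each VF BAR to cover every VF. In the SR-IOV capability, bit n of Supported Page Sizes means a page of 4KB << n is supported; the patch drops sizes smaller than the host page and keeps the lowest remaining bit. A userspace sketch of both steps; `sriov_pick_pgsz` and `sriov_bar_end` are hypothetical names, and the sketch returns 0 where the patch returns -EIO.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the page-size selection in sriov_init(). `supported` is the
 * Supported Page Sizes bitmap (bit n = 4KB << n); `page_shift` is the host
 * PAGE_SHIFT (12 for 4KB pages). Returns the single bit to program into
 * System Page Size, or 0 if no supported size is at least the host page. */
static uint32_t sriov_pick_pgsz(uint32_t supported, unsigned int page_shift)
{
	unsigned int i = page_shift > 12 ? page_shift - 12 : 0;

	supported &= ~((1u << i) - 1);		/* drop sizes below the host page */
	if (!supported)
		return 0;
	return supported & ~(supported - 1);	/* keep only the lowest set bit */
}

/* Each VF BAR in the capability describes one VF's aperture; sriov_init()
 * extends the resource so it covers all `total` VFs back to back.
 * Returns the inclusive end address, as stored in res->end. */
static uint64_t sriov_bar_end(uint64_t start, uint64_t per_vf_size, uint16_t total)
{
	return start + per_vf_size * total - 1;
}
```

With 4KB, 8KB and 64KB supported (bitmap 0x13), a 4KB-page host programs 0x01 while a 64KB-page host programs 0x10; the patch also rejects a per-VF BAR size that is not a multiple of PAGE_SIZE before extending it.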
[PATCH v10 4/7] PCI: add SR-IOV API for Physical Function driver
Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 drivers/pci/iov.c   |  348 +++
 drivers/pci/pci.h   |    3 +
 include/linux/pci.h |   14 ++
 3 files changed, 365 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 0b80437..8096fc9 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -13,6 +13,8 @@
 #include <linux/delay.h>
 #include "pci.h"
 
+#define VIRTFN_ID_LEN	8
+
 
 static inline void virtfn_bdf(struct pci_dev *dev, int id, u8 *busnr, u8 *devfn)
 {
@@ -24,6 +26,319 @@ static inline void virtfn_bdf(struct pci_dev *dev, int id, u8 *busnr, u8 *devfn)
 	*devfn = bdf & 0xff;
 }
 
+static struct pci_bus *virtfn_add_bus(struct pci_bus *bus, int busnr)
+{
+	int rc;
+	struct pci_bus *child;
+
+	if (bus->number == busnr)
+		return bus;
+
+	child = pci_find_bus(pci_domain_nr(bus), busnr);
+	if (child)
+		return child;
+
+	child = pci_add_new_bus(bus, NULL, busnr);
+	if (!child)
+		return NULL;
+
+	child->subordinate = busnr;
+	child->dev.parent = bus->bridge;
+	rc = pci_bus_add_child(child);
+	if (rc) {
+		pci_remove_bus(child);
+		return NULL;
+	}
+
+	return child;
+}
+
+static void virtfn_remove_bus(struct pci_bus *bus, int busnr)
+{
+	struct pci_bus *child;
+
+	if (bus->number == busnr)
+		return;
+
+	child = pci_find_bus(pci_domain_nr(bus), busnr);
+	BUG_ON(!child);
+
+	if (list_empty(&child->devices))
+		pci_remove_bus(child);
+}
+
+static int virtfn_add(struct pci_dev *dev, int id, int reset)
+{
+	int i;
+	int rc;
+	u64 size;
+	u8 busnr, devfn;
+	char buf[VIRTFN_ID_LEN];
+	struct pci_dev *virtfn;
+	struct resource *res;
+	struct pci_sriov *iov = dev->sriov;
+
+	virtfn = alloc_pci_dev();
+	if (!virtfn)
+		return -ENOMEM;
+
+	virtfn_bdf(dev, id, &busnr, &devfn);
+	mutex_lock(&iov->pdev->sriov->lock);
+	virtfn->bus = virtfn_add_bus(dev->bus, busnr);
+	if (!virtfn->bus) {
+		kfree(virtfn);
+		mutex_unlock(&iov->pdev->sriov->lock);
+		return -ENOMEM;
+	}
+
+	virtfn->sysdata = dev->bus->sysdata;
+	virtfn->dev.parent = dev->dev.parent;
+	virtfn->dev.bus = dev->dev.bus;
+	virtfn->devfn = devfn;
+	virtfn->hdr_type = PCI_HEADER_TYPE_NORMAL;
+	virtfn->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+	virtfn->error_state = pci_channel_io_normal;
+	virtfn->current_state = PCI_UNKNOWN;
+	virtfn->is_pcie = 1;
+	virtfn->pcie_type = PCI_EXP_TYPE_ENDPOINT;
+	virtfn->dma_mask = 0x;
+	virtfn->vendor = dev->vendor;
+	virtfn->subsystem_vendor = dev->subsystem_vendor;
+	virtfn->class = dev->class;
+	pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
+	pci_read_config_byte(virtfn, PCI_REVISION_ID, &virtfn->revision);
+	pci_read_config_word(virtfn, PCI_SUBSYSTEM_ID,
+			     &virtfn->subsystem_device);
+
+	dev_set_name(&virtfn->dev, "%04x:%02x:%02x.%d",
+		     pci_domain_nr(virtfn->bus), busnr,
+		     PCI_SLOT(devfn), PCI_FUNC(devfn));
+
+	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
+		res = dev->resource + PCI_SRIOV_RESOURCES + i;
+		if (!res->parent)
+			continue;
+		virtfn->resource[i].name = pci_name(virtfn);
+		virtfn->resource[i].flags = res->flags;
+		size = resource_size(res);
+		do_div(size, iov->total);
+		virtfn->resource[i].start = res->start + size * id;
+		virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
+		rc = request_resource(res, &virtfn->resource[i]);
+		BUG_ON(rc);
+	}
+
+	if (reset)
+		pci_execute_reset_function(virtfn);
+
+	pci_device_add(virtfn, virtfn->bus);
+	mutex_unlock(&iov->pdev->sriov->lock);
+
+	virtfn->physfn = pci_dev_get(dev);
+
+	rc = pci_bus_add_device(virtfn);
+	if (rc)
+		goto failed1;
+	sprintf(buf, "%d", id);
+	rc = sysfs_create_link(&iov->dev.kobj, &virtfn->dev.kobj, buf);
+	if (rc)
+		goto failed1;
+	rc = sysfs_create_link(&virtfn->dev.kobj, &dev->dev.kobj, "physfn");
+	if (rc)
+		goto failed2;
+
+	kobject_uevent(&virtfn->dev.kobj, KOBJ_CHANGE);
+
+	return 0;
+
+failed2:
+	sysfs_remove_link(&iov->dev.kobj, buf);
+failed1:
+	pci_dev_put(dev);
+	mutex_lock(&iov->pdev->sriov->lock);
+	pci_remove_bus_device(virtfn);
+	virtfn_remove_bus(dev->bus, busnr);
+	mutex_unlock(&iov->pdev->sriov->lock);
+
+	return rc;
+}
+
+static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+{
+	u8 busnr, devfn;
+	char buf[VIRTFN_ID_LEN];
+	struct
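virtfn_add() gives VF `id` its own slice of each aggregate VF BAR and names the new device domain:bus:slot.function. A userspace sketch of both computations, assuming PCI_SLOT(devfn) is `devfn >> 3` and PCI_FUNC(devfn) is `devfn & 7` (the standard devfn encoding); `struct range`, `vf_bar_slice` and `vf_name` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct range { uint64_t start, end; };	/* inclusive, like struct resource */

/* Model of the BAR loop in virtfn_add(): the PF's aggregate VF BAR is cut
 * into `total` equal slices (do_div() in the patch) and VF `id` receives
 * slice number `id`. `total` must be non-zero. */
static struct range vf_bar_slice(struct range agg, uint16_t total, int id)
{
	uint64_t size = (agg.end - agg.start + 1) / total;
	struct range r;

	r.start = agg.start + size * id;
	r.end = r.start + size - 1;
	return r;
}

/* Builds the VF name with the same format string as dev_set_name() in the
 * patch, splitting devfn into slot (upper 5 bits) and function (lower 3). */
static void vf_name(char *buf, size_t len, int domain, uint8_t busnr, uint8_t devfn)
{
	snprintf(buf, len, "%04x:%02x:%02x.%d", (unsigned int)domain,
		 (unsigned int)busnr, (unsigned int)(devfn >> 3), devfn & 7);
}
```

For example, an aggregate BAR of 0xd0000000-0xd001ffff shared by 8 VFs gives each VF 16KB, and VF 3 gets 0xd000c000-0xd000ffff; a VF at bus 0x03, devfn 0x83 in domain 0 is named 0000:03:10.3.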
[PATCH v10 5/7] PCI: handle SR-IOV Virtual Function Migration
Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 drivers/pci/iov.c   |  119 +++
 drivers/pci/pci.h   |    4 ++
 include/linux/pci.h |    6 +++
 3 files changed, 129 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 8096fc9..063fe74 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -206,6 +206,97 @@ static void sriov_release_dev(struct device *dev)
 	iov->nr_virtfn = 0;
 }
 
+static int sriov_migration(struct pci_dev *dev)
+{
+	u16 status;
+	struct pci_sriov *iov = dev->sriov;
+
+	if (!iov->nr_virtfn)
+		return 0;
+
+	if (!(iov->cap & PCI_SRIOV_CAP_VFM))
+		return 0;
+
+	pci_read_config_word(iov->self, iov->pos + PCI_SRIOV_STATUS, &status);
+	if (!(status & PCI_SRIOV_STATUS_VFM))
+		return 0;
+
+	schedule_work(&iov->mtask);
+
+	return 1;
+}
+
+static void sriov_migration_task(struct work_struct *work)
+{
+	int i;
+	u8 state;
+	u16 status;
+	struct pci_sriov *iov = container_of(work, struct pci_sriov, mtask);
+
+	for (i = iov->initial; i < iov->nr_virtfn; i++) {
+		state = readb(iov->mstate + i);
+		if (state == PCI_SRIOV_VFM_MI) {
+			writeb(PCI_SRIOV_VFM_AV, iov->mstate + i);
+			state = readb(iov->mstate + i);
+			if (state == PCI_SRIOV_VFM_AV)
+				virtfn_add(iov->self, i, 1);
+		} else if (state == PCI_SRIOV_VFM_MO) {
+			virtfn_remove(iov->self, i, 1);
+			writeb(PCI_SRIOV_VFM_UA, iov->mstate + i);
+			state = readb(iov->mstate + i);
+			if (state == PCI_SRIOV_VFM_AV)
+				virtfn_add(iov->self, i, 0);
+		}
+	}
+
+	pci_read_config_word(iov->self, iov->pos + PCI_SRIOV_STATUS, &status);
+	status &= ~PCI_SRIOV_STATUS_VFM;
+	pci_write_config_word(iov->self, iov->pos + PCI_SRIOV_STATUS, status);
+}
+
+static int sriov_enable_migration(struct pci_dev *dev, int nr_virtfn)
+{
+	int bir;
+	u32 table;
+	resource_size_t pa;
+	struct pci_sriov *iov = dev->sriov;
+
+	if (nr_virtfn <= iov->initial)
+		return 0;
+
+	pci_read_config_dword(dev, iov->pos + PCI_SRIOV_VFM, &table);
+	bir = PCI_SRIOV_VFM_BIR(table);
+	if (bir > PCI_STD_RESOURCE_END)
+		return -EIO;
+
+	table = PCI_SRIOV_VFM_OFFSET(table);
+	if (table + nr_virtfn > pci_resource_len(dev, bir))
+		return -EIO;
+
+	pa = pci_resource_start(dev, bir) + table;
+	iov->mstate = ioremap(pa, nr_virtfn);
+	if (!iov->mstate)
+		return -ENOMEM;
+
+	INIT_WORK(&iov->mtask, sriov_migration_task);
+
+	iov->ctrl |= PCI_SRIOV_CTRL_VFM | PCI_SRIOV_CTRL_INTR;
+	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+
+	return 0;
+}
+
+static void sriov_disable_migration(struct pci_dev *dev)
+{
+	struct pci_sriov *iov = dev->sriov;
+
+	iov->ctrl &= ~(PCI_SRIOV_CTRL_VFM | PCI_SRIOV_CTRL_INTR);
+	pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+
+	cancel_work_sync(&iov->mtask);
+	iounmap(iov->mstate);
+}
+
 static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 {
 	int rc;
@@ -294,6 +385,12 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 		goto failed2;
 	}
 
+	if (iov->cap & PCI_SRIOV_CAP_VFM) {
+		rc = sriov_enable_migration(dev, nr_virtfn);
+		if (rc)
+			goto failed2;
+	}
+
 	kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
 	iov->nr_virtfn = nr_virtfn;
 
@@ -325,6 +422,9 @@ static void sriov_disable(struct pci_dev *dev)
 	if (!iov->nr_virtfn)
 		return;
 
+	if (iov->cap & PCI_SRIOV_CAP_VFM)
+		sriov_disable_migration(dev);
+
 	for (i = 0; i < iov->nr_virtfn; i++)
 		virtfn_remove(dev, i, 0);
 
@@ -590,3 +690,22 @@ void pci_disable_sriov(struct pci_dev *dev)
 	sriov_disable(dev);
 }
 EXPORT_SYMBOL_GPL(pci_disable_sriov);
+
+/**
+ * pci_sriov_migration - notify SR-IOV core of Virtual Function Migration
+ * @dev: the PCI device
+ *
+ * Returns IRQ_HANDLED if the IRQ is handled, or IRQ_NONE if not.
+ *
+ * Physical Function driver is responsible to register IRQ handler using
+ * VF Migration Interrupt Message Number, and call this function when the
+ * interrupt is generated by the hardware.
+ */
+irqreturn_t pci_sriov_migration(struct pci_dev *dev)
+{
+	if (!dev->sriov)
+		return IRQ_NONE;
+
+	return sriov_migration(dev) ? IRQ_HANDLED : IRQ_NONE;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_migration);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9bbf868..6764f02 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1,6 +1,8 @@
 #ifndef
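The loop in sriov_migration_task() is a small state machine over the VF Migration State Array. The sketch below replays it in userspace, assuming the state encodings Inactive.Unavailable = 0, Dormant.MigrateIn = 1, Active.MigrateOut = 2 and Active.Available = 3 (believed to match include/linux/pci_regs.h, but treat them as assumptions). `struct vfm_array`, `vfm_scan` and the `device_ack` hook are hypothetical: the hook stands in for the hardware, which decides what a write leaves behind.

```c
#include <assert.h>
#include <stdint.h>

/* VF Migration State values, assumed to match pci_regs.h. */
enum { VFM_UA = 0x0, VFM_MI = 0x1, VFM_MO = 0x2, VFM_AV = 0x3 };

/* Hypothetical stand-in for the ioremapped state array. If device_ack is
 * NULL a write is stored as-is; otherwise the hook decides the value. */
struct vfm_array {
	uint8_t state[16];
	uint8_t (*device_ack)(int vf, uint8_t written);
};

static void vfm_write(struct vfm_array *a, int vf, uint8_t v)
{
	a->state[vf] = a->device_ack ? a->device_ack(vf, v) : v;
}

/* Mirrors sriov_migration_task(): added[]/removed[] record where the kernel
 * would call virtfn_add()/virtfn_remove(); add_reset[] records the `reset`
 * argument passed to virtfn_add(). */
static void vfm_scan(struct vfm_array *a, int initial, int nr_virtfn,
		     int *added, int *add_reset, int *removed)
{
	for (int i = initial; i < nr_virtfn; i++) {
		uint8_t state = a->state[i];

		if (state == VFM_MI) {
			/* Migrate In: request Available, add the VF if granted */
			vfm_write(a, i, VFM_AV);
			if (a->state[i] == VFM_AV) {
				added[i] = 1;
				add_reset[i] = 1;
			}
		} else if (state == VFM_MO) {
			/* Migrate Out: remove the VF, then release the slot */
			removed[i] = 1;
			vfm_write(a, i, VFM_UA);
			if (a->state[i] == VFM_AV) {
				/* still Available: re-add without reset, as the patch does */
				added[i] = 1;
				add_reset[i] = 0;
			}
		}
	}
}

/* Example device behaviour for illustration only: every entry reads back
 * as Active.Available regardless of what was written. */
static uint8_t device_stays_available(int vf, uint8_t written)
{
	(void)vf;
	(void)written;
	return VFM_AV;
}
```

Only entries from `initial` upward are scanned, matching the patch, which starts the loop at iov->initial.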
[PATCH v10 6/7] PCI: document SR-IOV sysfs entries
Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-pci |   27 +++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index ceddcff..84dc100 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -9,3 +9,30 @@ Description:
 		that some devices may have malformatted data.  If the
 		underlying VPD has a writable section then the
 		corresponding section of this file will be writable.
+
+What:		/sys/bus/pci/devices/.../virtfn/N
+Date:		February 2009
+Contact:	Yu Zhao <yu.z...@intel.com>
+Description:
+		This symbol link appears when hardware supports SR-IOV
+		capability and Physical Function driver has enabled it.
+		The symbol link points to the PCI device sysfs entry of
+		Virtual Function whose index is N (0...MaxVFs-1).
+
+What:		/sys/bus/pci/devices/.../virtfn/dep_link
+Date:		February 2009
+Contact:	Yu Zhao <yu.z...@intel.com>
+Description:
+		This symbol link appears when hardware supports SR-IOV
+		capability and Physical Function driver has enabled it,
+		and this device has vendor specific dependencies with
+		others. The symbol link points to the PCI device sysfs
+		entry of Physical Function this device depends on.
+
+What:		/sys/bus/pci/devices/.../physfn
+Date:		February 2009
+Contact:	Yu Zhao <yu.z...@intel.com>
+Description:
+		This symbol link appears when a device is Virtual Function.
+		The symbol link points to the PCI device sysfs entry of
+		Physical Function this device associates with.
-- 
1.6.1
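A userspace tool could consume these entries with plain POSIX calls. A minimal sketch, assuming the layout documented above (numbered links under `<PF>/virtfn/` and a `physfn` link in each VF directory) and that VF indices are contiguous from 0; `count_vf_links` and `read_physfn` are hypothetical helpers, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Counts consecutive VF links pf_dir/virtfn/0, /1, ... and stops at the
 * first missing index (assumes indices are contiguous). */
static int count_vf_links(const char *pf_dir)
{
	char path[4096];
	struct stat st;
	int n = 0;

	for (;;) {
		snprintf(path, sizeof(path), "%s/virtfn/%d", pf_dir, n);
		if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
			break;
		n++;
	}
	return n;
}

/* Reads the target of vf_dir/physfn into buf (len must be > 0) and
 * NUL-terminates it. Returns the target length, or -1 on error, for
 * example when the device is not a Virtual Function. */
static ssize_t read_physfn(const char *vf_dir, char *buf, size_t len)
{
	char path[4096];
	ssize_t n;

	snprintf(path, sizeof(path), "%s/physfn", vf_dir);
	n = readlink(path, buf, len - 1);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}
```

lstat() is used so that a dangling link still counts as present; resolving the target to a PCI address is left to the caller.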
[PATCH v10 7/7] PCI: manual for SR-IOV user and driver developer
Signed-off-by: Yu Zhao yu.z...@intel.com
---
 Documentation/DocBook/kernel-api.tmpl |    1 +
 Documentation/PCI/pci-iov-howto.txt   |   99 +
 2 files changed, 100 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/PCI/pci-iov-howto.txt

diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 5818ff7..506e611 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -251,6 +251,7 @@ X!Edrivers/pci/hotplug.c
 -->
 !Edrivers/pci/probe.c
 !Edrivers/pci/rom.c
+!Edrivers/pci/iov.c
      </sect1>
      <sect1><title>PCI Hotplug Support Library</title>
 !Edrivers/pci/hotplug/pci_hotplug_core.c
diff --git a/Documentation/PCI/pci-iov-howto.txt b/Documentation/PCI/pci-iov-howto.txt
new file mode 100644
index 000..fc73ef5
--- /dev/null
+++ b/Documentation/PCI/pci-iov-howto.txt
@@ -0,0 +1,99 @@
+		PCI Express I/O Virtualization Howto
+		Copyright (C) 2009 Intel Corporation
+		    Yu Zhao yu.z...@intel.com
+
+
+1. Overview
+
+1.1 What is SR-IOV
+
+Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
+capability which makes one physical device appear as multiple virtual
+devices. The physical device is referred to as the Physical Function (PF)
+while the virtual devices are referred to as Virtual Functions (VFs).
+Allocation of VFs can be dynamically controlled by the PF via
+registers encapsulated in the capability. By default, this feature is
+not enabled and the PF behaves as a traditional PCIe device. Once it is
+turned on, each VF's PCI configuration space can be accessed by its own
+Bus, Device and Function Number (Routing ID). Each VF also has PCI
+Memory Space, which is used to map its register set. The VF device driver
+operates on this register set so that the VF is functional and appears
+as a real PCI device.
+
+2. User Guide
+
+2.1 How can I enable SR-IOV capability
+
+The device driver (PF driver) controls enabling and disabling of the
+capability via the API provided by the SR-IOV core. If the hardware
+has SR-IOV capability, loading its PF driver would enable it and all
+VFs associated with the PF.
+
+2.2 How can I use the Virtual Functions
+
+VFs are treated as hot-plugged PCI devices in the kernel, so they
+should work in the same way as real PCI devices. A VF requires a
+device driver, just as a normal PCI device does.
+
+3. Developer Guide
+
+3.1 SR-IOV API
+
+To enable SR-IOV capability:
+	int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
+	'nr_virtfn' is the number of VFs to be enabled.
+
+To disable SR-IOV capability:
+	void pci_disable_sriov(struct pci_dev *dev);
+
+To notify the SR-IOV core of Virtual Function migration:
+	irqreturn_t pci_sriov_migration(struct pci_dev *dev);
+
+3.2 Usage example
+
+The following piece of code illustrates the usage of the SR-IOV API.
+
+static int __devinit dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	pci_enable_sriov(dev, NR_VIRTFN);
+
+	...
+
+	return 0;
+}
+
+static void __devexit dev_remove(struct pci_dev *dev)
+{
+	pci_disable_sriov(dev);
+
+	...
+}
+
+static int dev_suspend(struct pci_dev *dev, pm_message_t state)
+{
+	...
+
+	return 0;
+}
+
+static int dev_resume(struct pci_dev *dev)
+{
+	...
+
+	return 0;
+}
+
+static void dev_shutdown(struct pci_dev *dev)
+{
+	...
+}
+
+static struct pci_driver dev_driver = {
+	.name =		"SR-IOV Physical Function driver",
+	.id_table =	dev_id_table,
+	.probe =	dev_probe,
+	.remove =	__devexit_p(dev_remove),
+	.suspend =	dev_suspend,
+	.resume =	dev_resume,
+	.shutdown =	dev_shutdown,
+};
-- 
1.6.1
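Section 1.1 of the howto says each VF is reached through its own Routing ID. In the SR-IOV specification that RID is derived from the PF's RID plus the First VF Offset and VF Stride fields of the SR-IOV capability. The C sketch below shows only that arithmetic; the helper names are hypothetical (not part of the API added by this series), and it assumes a 0-based VF index, matching the virtfn index N, so index 0 is the first VF.

```c
#include <stdint.h>

/* Hypothetical helper (not a kernel API): RID of the VF with 0-based
 * index vf_index, from the PF's RID and the First VF Offset / VF Stride
 * capability fields. RIDs are 16 bits, so the sum is truncated. */
static uint16_t sriov_vf_rid(uint16_t pf_rid, uint16_t first_vf_offset,
                             uint16_t vf_stride, unsigned int vf_index)
{
    return (uint16_t)(pf_rid + first_vf_offset + vf_stride * vf_index);
}

/* A RID packs Bus (8 bits), Device (5 bits) and Function (3 bits). */
static unsigned int rid_bus(uint16_t rid) { return rid >> 8; }
static unsigned int rid_dev(uint16_t rid) { return (rid >> 3) & 0x1f; }
static unsigned int rid_fn(uint16_t rid)  { return rid & 0x7; }
```

With illustrative values, a PF at 01:00.0 (RID 0x0100) with First VF Offset 0x80 and VF Stride 2 places the VF at index 1 at RID 0x0182, i.e. 01:10.2. Because the sum can carry into the bus byte, VFs may sit on a higher bus number than their PF.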
[ kvm-Bugs-2609423 ] Segmentation fault when creating guest on PAE host
Bugs item #2609423, was opened at 2009-02-17 07:30
Message generated for change (Comment added) made by jiajun
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2609423&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: Segmentation fault when creating guest on PAE host

Initial Comment:
Environment:
Kernel Commit: f0080da24a9990eff13cce5d0ee68e5f139725ce
Userspace Commit: 56fea7f2df7f9e70b9449832b96ba1b9a760423f
Host Kernel Version: 2.6.29-rc2

When creating a guest on a PAE host, the qemu process hits a segmentation fault.

[r...@vt-nhm1 ~]# qemu -m 256 -hda /share/xvs/var/ia32p_SMP.img
Segmentation fault

qemu[9998]: segfault at 6c65746e ip 081a0550 sp a6279888 error 6 in qemu-system-x86_64[8048000+1a8000]

--

Comment By: Jiajun Xu (jiajun)
Date: 2009-02-19 23:59

Message:
The bug is fixed by kvm.userspace commit 68592ae18de1d45918542242e918085ca7f2e93c.
Re: [Qemu-devel] [PATCH 1/5] kvm/powerpc: Enable MPIC for E500 platform.
On Tue, Feb 17, 2009 at 04:55:51PM +0200, Blue Swirl wrote:
> On 2/17/09, Liu Yu yu@freescale.com wrote:
> > MPIC and OpenPIC have a very similar design, so a lot of code can be reused.
> > The modifications mainly include:
> > 1. keeping struct openpic_t at the maximum size of both MPIC and OpenPIC.
> > 2. endianness swap.
> >    MPIC has the same endianness as the target, so there is no need to swap for MPIC.
>
> I don't think this is correct; the host can still have a different endianness from the target.

I do not agree. As long as we don't manipulate host memory, the host endianness is irrelevant. The values are simply passed by value; they don't need to be swapped.

-- 
Aurelien Jarno	GPG: 1024D/F1BCDB73
aurel...@aurel32.net	http://www.aurel32.net
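Aurelien's point can be illustrated with a small C sketch. This is not QEMU's openpic code; the helper pic_reg_to_target() and its boolean flags are hypothetical. A byte swap computed on a value with shifts gives the same result on any host, so whether to swap depends only on whether the device's register byte order matches the target's. Host byte order would only matter if the code reinterpreted memory, for example through a pointer cast.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte-swap a 32-bit value using shifts and masks. It operates on the
 * value, not on memory, so the result is identical on big- and
 * little-endian hosts. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: convert a register value between device order
 * and target order. The decision uses only device vs. target
 * endianness; the host's byte order never enters into it. */
static uint32_t pic_reg_to_target(uint32_t v, bool device_big_endian,
                                  bool target_big_endian)
{
    return device_big_endian == target_big_endian ? v : swap32(v);
}
```

Under these assumptions, a big-endian MPIC on a big-endian e500 target takes the no-swap branch, while a little-endian OpenPIC on a big-endian target is swapped, on any host.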