Re: [RFC/PATCH 1/1] virtio: Introduce MMIO ops
On 30.04.20 13:11, Srivatsa Vaddagiri wrote:
> * Will Deacon [2020-04-30 11:41:50]:
>> On Thu, Apr 30, 2020 at 04:04:46PM +0530, Srivatsa Vaddagiri wrote:
>>> If CONFIG_VIRTIO_MMIO_OPS is defined, then I expect this to be
>>> unconditionally set to 'magic_qcom_ops' that uses a
>>> hypervisor-supported interface for IO (for example:
>>> message_queue_send() and message_queue_receive() hypercalls).
>>
>> Hmm, but then how would such a kernel work as a guest under all the
>> spec-compliant hypervisors out there?
>
> Ok, I see your point, and yes, for better binary compatibility the ops
> have to be set based on runtime detection of hypervisor capabilities.
>
>> Ok. I guess the other option is to standardize on a new virtio
>> transport (like ivshmem2-virtio)? I haven't looked at that, but I
>> suppose it depends on what your hypervisor folks are willing to
>> accommodate.
>
> I believe ivshmem2_virtio requires the hypervisor to support PCI device
> emulation (for life-cycle management of VMs), which our hypervisor may
> not support. A simple shared memory and doorbell or message-queue based
> transport will work for us.

As written in our private conversation, a mapping of the ivshmem2 device
discovery to a platform mechanism (device tree etc.) and maybe even of
the register access for doorbell and life-cycle management to something
hypercall-like would be imaginable. What would count more from the
virtio perspective is a common mapping on a shared memory transport.

That said, I also warned about all the features that PCI already defines
(such as message-based interrupts) which you may have to add when going
a different way for the shared memory device.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] Re: [PATCH 5/5] virtio: Add bounce DMA ops
On 29.04.20 12:45, Michael S. Tsirkin wrote:
> On Wed, Apr 29, 2020 at 12:26:43PM +0200, Jan Kiszka wrote:
>> On 29.04.20 12:20, Michael S. Tsirkin wrote:
>>> On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
>>>> That would still not work, I think, where swiotlb is used for
>>>> pass-through devices (when private memory is fine) as well as for
>>>> virtio devices (when shared memory is required).
>>>
>>> So that is a separate question. When there are multiple untrusted
>>> devices, at the moment it looks like a single bounce buffer is used.
>>> Which to me seems like a security problem: I think we should protect
>>> untrusted devices from each other.
>>
>> Definitely. That's the model we have for ivshmem-virtio as well.
>>
>> Jan
>
> Want to try implementing that?

The desire is definitely there, currently "just" not the time.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: [PATCH 5/5] virtio: Add bounce DMA ops
On 29.04.20 12:20, Michael S. Tsirkin wrote:
> On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
>> That would still not work, I think, where swiotlb is used for
>> pass-through devices (when private memory is fine) as well as for
>> virtio devices (when shared memory is required).
>
> So that is a separate question. When there are multiple untrusted
> devices, at the moment it looks like a single bounce buffer is used.
> Which to me seems like a security problem: I think we should protect
> untrusted devices from each other.

Definitely. That's the model we have for ivshmem-virtio as well.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
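[Editor's note] The per-device protection Michael asks for above would mean carving the bounce area into regions that are private to each untrusted device, rather than sharing one global swiotlb pool. A rough user-space sketch of that idea follows; all names are hypothetical, and a real implementation would live in the kernel's swiotlb/DMA code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE	4096
#define REGION_SIZE	1024

/* One statically reserved pool, as with swiotlb, but handed out in
 * per-device slices so one untrusted device can never observe another
 * device's bounced data. */
static uint8_t bounce_pool[POOL_SIZE];
static size_t pool_used;

struct bounce_region {
	uint8_t *base;
	size_t size;
};

/* Give each device its own private slice of the pool. */
static int bounce_region_alloc(struct bounce_region *r)
{
	if (pool_used + REGION_SIZE > POOL_SIZE)
		return -1;		/* pool exhausted */
	r->base = bounce_pool + pool_used;
	r->size = REGION_SIZE;
	pool_used += REGION_SIZE;
	return 0;
}

/* "Map for DMA": bounce the buffer into this device's own region and
 * hand the device only that region's address. */
static void *bounce_map(struct bounce_region *r, const void *buf, size_t len)
{
	if (len > r->size)
		return NULL;
	memcpy(r->base, buf, len);
	return r->base;
}
```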
Re: VIRTIO adoption in other hypervisors
On 28.02.20 17:47, Alex Bennée wrote:
> Jan Kiszka writes:
>> On 28.02.20 11:30, Jan Kiszka wrote:
>>> On 28.02.20 11:16, Alex Bennée wrote:
>>>> Hi,
>>>>
>>>> I believe there has been some development work for supporting VIRTIO
>>>> on Xen, although it seems to have stalled according to:
>>>> https://wiki.xenproject.org/wiki/Virtio_On_Xen
>>>>
>>>> Recently at KVM Forum there was Jan's talk about Inter-VM shared
>>>> memory which proposed ivshmemv2 as a VIRTIO transport:
>>>> https://events19.linuxfoundation.org/events/kvm-forum-2019/program/schedule/
>>>>
>>>> As I understood it, this would allow Xen (and other hypervisors) a
>>>> simple way to carry virtio traffic between guest and end point.
>>
>> And to clarify the scope of this effort: virtio-over-ivshmem is not
>> the fastest option to offer virtio to a guest (static "DMA" window),
>> but it is the simplest one from the hypervisor PoV and, thus, also
>> likely the easiest one to argue over when it comes to security and
>> safety.
>
> So to drill down on this: is this a particular problem with type-1
> hypervisors?

Well, this typing doesn't help here (like it rarely does). There are
kvm-based setups that are stripped down and hardened in a way where
other folks would rather think of "type 1". I just had a discussion
around such a model for a cloud scenario that runs on kvm.

> It seems to me any KVM-like run loop trivially supports a range of
> virtio devices by virtue of trapping accesses to the signalling area of
> a virtqueue and allowing the VMM to handle the transaction whichever
> way it sees fit. I've not quite understood the way Xen interfaces to
> QEMU, aside from it being different from everything else.
>
> Moreover, it seems the type-1 hypervisors are more interested in
> providing better isolation between segments of a system, whereas VIRTIO
> currently assumes either the VMM or the hypervisor has full access to
> the full guest address space. I've seen quite a lot of slides that want
> to isolate sections of device emulation to separate processes or even
> separate guest VMs.

The point is in fact not only whether to trap IO accesses or to ask the
guest to rather target something like ivshmem (in fact, that is where
the use cases I have in mind deviated from those of that cloud
operator). It is specifically the question of how the backend should be
able to transfer data to/from the frontend. If you want to isolate the
two from each other (driver VMs/domains/etc.), you either need a complex
virtual IOMMU (or "grant tables") or a static DMA window (like ivshmem).
The former is more efficient with large transfers, the latter is much
simpler and therefore more robust.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
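[Editor's note] The "static DMA window" alternative Jan sketches here can be illustrated with a toy frontend/backend pair: both sides see one fixed shared-memory region, data is copied into it, and a doorbell flag signals the other side. All names are made up for illustration; in a real ivshmem setup the window would be a mapped PCI BAR and the doorbell a register that raises an interrupt in the peer:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIN_SIZE 256

/* The fixed shared window both domains agreed on at setup time. */
static struct {
	uint32_t doorbell;	/* stand-in for the ivshmem doorbell */
	uint32_t len;
	uint8_t data[WIN_SIZE];
} window;

/* Frontend: "DMA" is just a copy into the static window, then ring. */
static int frontend_send(const void *buf, uint32_t len)
{
	if (len > WIN_SIZE || window.doorbell)
		return -1;	/* too big, or previous message not acked */
	memcpy(window.data, buf, len);
	window.len = len;
	window.doorbell = 1;
	return 0;
}

/* Backend: on doorbell, copy the data out of the window. No virtual
 * IOMMU or grant table is needed, because only the window is shared. */
static uint32_t backend_receive(void *buf, uint32_t max)
{
	uint32_t len;

	if (!window.doorbell)
		return 0;
	len = window.len < max ? window.len : max;
	memcpy(buf, window.data, len);
	window.doorbell = 0;	/* ack: frontend may reuse the window */
	return len;
}
```

The extra copy through the window is exactly the efficiency cost mentioned above; the payoff is that neither side ever maps the other's private memory.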
Re: VIRTIO adoption in other hypervisors
On 28.02.20 11:30, Jan Kiszka wrote:
> On 28.02.20 11:16, Alex Bennée wrote:
>> Hi,
>>
>> I'm currently trying to get my head around virtio and was wondering
>> how widespread adoption of virtio is amongst the various hypervisors
>> and emulators out there. Obviously I'm familiar with QEMU, both via
>> KVM and even when just doing plain emulation (although with some
>> restrictions). As far as I'm aware, the various Rust-based VMMs have
>> varying degrees of support for virtio devices over KVM as well.
>> CrosVM specifically is embracing virtio for multi-process device
>> emulation.
>>
>> I believe there has been some development work for supporting VIRTIO
>> on Xen, although it seems to have stalled according to:
>> https://wiki.xenproject.org/wiki/Virtio_On_Xen
>>
>> Recently at KVM Forum there was Jan's talk about Inter-VM shared
>> memory which proposed ivshmemv2 as a VIRTIO transport:
>> https://events19.linuxfoundation.org/events/kvm-forum-2019/program/schedule/
>>
>> As I understood it, this would allow Xen (and other hypervisors) a
>> simple way to carry virtio traffic between guest and end point.

And to clarify the scope of this effort: virtio-over-ivshmem is not the
fastest option to offer virtio to a guest (static "DMA" window), but it
is the simplest one from the hypervisor PoV and, thus, also likely the
easiest one to argue over when it comes to security and safety.

Jan

>> So some questions:
>>
>> - Am I missing anything out in that summary?
>> - How about Hyper-V and the OSX equivalent?
>> - Do any other type-1 hypervisors support virtio?
>
> Off the top of my head, some other hypervisors with virtio support
> (irrespective of any classification):
>
> https://wiki.freebsd.org/bhyve
> https://projectacrn.org/
> http://www.xhypervisor.org/
> https://www.opensynergy.com/automotive-hypervisor/
>
> But there are likely more.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: VIRTIO adoption in other hypervisors
On 28.02.20 11:16, Alex Bennée wrote:
> Hi,
>
> I'm currently trying to get my head around virtio and was wondering
> how widespread adoption of virtio is amongst the various hypervisors
> and emulators out there. Obviously I'm familiar with QEMU, both via
> KVM and even when just doing plain emulation (although with some
> restrictions). As far as I'm aware, the various Rust-based VMMs have
> varying degrees of support for virtio devices over KVM as well. CrosVM
> specifically is embracing virtio for multi-process device emulation.
>
> I believe there has been some development work for supporting VIRTIO
> on Xen, although it seems to have stalled according to:
> https://wiki.xenproject.org/wiki/Virtio_On_Xen
>
> Recently at KVM Forum there was Jan's talk about Inter-VM shared
> memory which proposed ivshmemv2 as a VIRTIO transport:
> https://events19.linuxfoundation.org/events/kvm-forum-2019/program/schedule/
>
> As I understood it, this would allow Xen (and other hypervisors) a
> simple way to carry virtio traffic between guest and end point.
>
> So some questions:
>
> - Am I missing anything out in that summary?
> - How about Hyper-V and the OSX equivalent?
> - Do any other type-1 hypervisors support virtio?

Off the top of my head, some other hypervisors with virtio support
(irrespective of any classification):

https://wiki.freebsd.org/bhyve
https://projectacrn.org/
http://www.xhypervisor.org/
https://www.opensynergy.com/automotive-hypervisor/

But there are likely more.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: [PATCH] tools/virtio: Fix build
On 13.10.19 14:20, Michael S. Tsirkin wrote:
> On Sun, Oct 13, 2019 at 02:01:03PM +0200, Jan Kiszka wrote:
>> On 13.10.19 13:52, Michael S. Tsirkin wrote:
>>> On Sun, Oct 13, 2019 at 11:03:30AM +0200, Jan Kiszka wrote:
>>>> From: Jan Kiszka
>>>>
>>>> Various changes in the recent kernel versions broke the build due to
>>>> missing function and header stubs.
>>>>
>>>> Signed-off-by: Jan Kiszka
>>>
>>> Thanks!
>>> I think it's already fixed in the vhost tree.
>>> That tree also includes a bugfix for the test.
>>> Can you pls give it a spin and report?
>>
>> Mostly fixed: the xen_domain stub is missing.
>>
>> Jan
>
> That's in xen/xen.h. Do you still see any build errors?

Commit ca16cf7b30ca79eeca4d612af121e664ee7d8737 lacks this - forgot to
add it in some commit?

Jan
Re: [PATCH] tools/virtio: Fix build
On 13.10.19 13:52, Michael S. Tsirkin wrote:
> On Sun, Oct 13, 2019 at 11:03:30AM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka
>>
>> Various changes in the recent kernel versions broke the build due to
>> missing function and header stubs.
>>
>> Signed-off-by: Jan Kiszka
>
> Thanks!
> I think it's already fixed in the vhost tree.
> That tree also includes a bugfix for the test.
> Can you pls give it a spin and report?

Mostly fixed: the xen_domain stub is missing.

Jan

> Thanks!
>
>> ---
>>  tools/virtio/crypto/hash.h       | 0
>>  tools/virtio/linux/dma-mapping.h | 2 ++
>>  tools/virtio/linux/kernel.h      | 2 ++
>>  3 files changed, 4 insertions(+)
>>  create mode 100644 tools/virtio/crypto/hash.h
>>
>> diff --git a/tools/virtio/crypto/hash.h b/tools/virtio/crypto/hash.h
>> new file mode 100644
>> index ..e69de29bb2d1
>> diff --git a/tools/virtio/linux/dma-mapping.h b/tools/virtio/linux/dma-mapping.h
>> index f91aeb5fe571..db96cb4bf877 100644
>> --- a/tools/virtio/linux/dma-mapping.h
>> +++ b/tools/virtio/linux/dma-mapping.h
>> @@ -29,4 +29,6 @@ enum dma_data_direction {
>>  #define dma_unmap_single(...) do { } while (0)
>>  #define dma_unmap_page(...) do { } while (0)
>>
>> +#define dma_max_mapping_size(d) 0
>> +
>>  #endif
>> diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
>> index 6683b4a70b05..ccf321173210 100644
>> --- a/tools/virtio/linux/kernel.h
>> +++ b/tools/virtio/linux/kernel.h
>> @@ -141,4 +141,6 @@ static inline void free_page(unsigned long addr)
>>  #define list_for_each_entry(a, b, c) while (0)
>>  /* end of stubs */
>>
>> +#define xen_domain() 0
>> +
>>  #endif /* KERNEL_H */
>> --
>> 2.16.4
[PATCH] tools/virtio: Fix build
From: Jan Kiszka

Various changes in the recent kernel versions broke the build due to
missing function and header stubs.

Signed-off-by: Jan Kiszka
---
 tools/virtio/crypto/hash.h       | 0
 tools/virtio/linux/dma-mapping.h | 2 ++
 tools/virtio/linux/kernel.h      | 2 ++
 3 files changed, 4 insertions(+)
 create mode 100644 tools/virtio/crypto/hash.h

diff --git a/tools/virtio/crypto/hash.h b/tools/virtio/crypto/hash.h
new file mode 100644
index ..e69de29bb2d1
diff --git a/tools/virtio/linux/dma-mapping.h b/tools/virtio/linux/dma-mapping.h
index f91aeb5fe571..db96cb4bf877 100644
--- a/tools/virtio/linux/dma-mapping.h
+++ b/tools/virtio/linux/dma-mapping.h
@@ -29,4 +29,6 @@ enum dma_data_direction {
 #define dma_unmap_single(...) do { } while (0)
 #define dma_unmap_page(...) do { } while (0)
 
+#define dma_max_mapping_size(d) 0
+
 #endif
diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
index 6683b4a70b05..ccf321173210 100644
--- a/tools/virtio/linux/kernel.h
+++ b/tools/virtio/linux/kernel.h
@@ -141,4 +141,6 @@ static inline void free_page(unsigned long addr)
 #define list_for_each_entry(a, b, c) while (0)
 /* end of stubs */
 
+#define xen_domain() 0
+
 #endif /* KERNEL_H */
--
2.16.4
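[Editor's note] For context, the tools/virtio harness compiles kernel virtio code in user space by shadowing kernel headers with minimal stubs, which is why each new kernel API needs a one-line stand-in like the two added above. The pattern, shown here with the patch's actual stub values plus a hypothetical consumer function (`pick_buffer_size` is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* In tools/virtio, headers like linux/kernel.h are shadowed by local
 * copies full of no-op stand-ins so driver code links in user space.
 * These two are the stubs the patch adds: */
#define xen_domain()		0	/* the harness is never a Xen guest */
#define dma_max_mapping_size(d)	0	/* no mapping-size limit reported */

/* Hypothetical driver-ish code under test: it can use the stubbed
 * APIs unchanged, and the stubs make both checks collapse away. */
static size_t pick_buffer_size(void *dev, size_t wanted)
{
	size_t limit = dma_max_mapping_size(dev);

	if (xen_domain())
		return 0;	/* path never taken in the harness */
	return (limit && wanted > limit) ? limit : wanted;
}
```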
[PATCH v5 3/7] x86/jailhouse: Enable PCI mmconfig access in inmates
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory-mapped PCI configuration space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG, used pcibios_last_bus]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevche...@gmail.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c    | 8 ++++++++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			       phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+							int end, u64 addr);
 
 extern struct list_head pci_mmcfg_list;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..fa183a131edc 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,14 @@ static int __init jailhouse_pci_arch_init(void)
 	if (pcibios_last_bus < 0)
 		pcibios_last_bus = 0xff;
 
+#ifdef CONFIG_PCI_MMCONFIG
+	if (setup_data.pci_mmconfig_base) {
+		pci_mmconfig_add(0, 0, pcibios_last_bus,
+				 setup_data.pci_mmconfig_base);
+		pci_mmcfg_arch_init();
+	}
+#endif
+
 	return 0;
 }
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 	return new;
 }
 
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-							int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+						 int end, u64 addr)
 {
 	struct pci_mmcfg_region *new;
--
2.13.6
[PATCH v5 5/7] x86: Consolidate PCI_MMCONFIG configs
From: Jan Kiszka <jan.kis...@siemens.com>

Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), we
have two PCI_MMCONFIG entries, one from the original i386 and another
from x86_64. This consolidates both entries into a single one.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c19f5342ec2b..8986a6b6e3df 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,10 @@ config PCI_DIRECT
 	depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
 
 config PCI_MMCONFIG
-	def_bool y
-	depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+	bool "Support mmconfig PCI config space access" if X86_64
+	default y
+	depends on PCI && (ACPI || SFI)
+	depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
 
 config PCI_OLPC
 	def_bool y
@@ -2657,11 +2659,6 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI
 
-config PCI_MMCONFIG
-	bool "Support mmconfig PCI config space access"
-	default y
-	depends on X86_64 && PCI && (ACPI || SFI)
-
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
--
2.13.6
[PATCH v5 0/7] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI
Basic x86 support [1] for running Linux as a secondary Jailhouse [2]
guest is currently pending in the tip tree. This series builds on top
and enhances the PCI support for x86 and also ARM guests (ARM[64] does
not require platform patches and works already).

Key elements of this series are:
- detection of Jailhouse via a device tree hypervisor node
- function-level PCI scan if Jailhouse is detected
- MMCONFIG support for x86 guests

As most changes affect x86, I would suggest to route the series also via
tip after the necessary acks are collected.

Changes in v5:
- fix build breakage of patch 6 on i386

Changes in v4:
- split up Kconfig changes
- respect pcibios_last_bus during mmconfig setup
- cosmetic changes requested by Andy

Changes in v3:
- avoided duplicate scans of PCI functions under Jailhouse
- reformatted PCI_MMCONFIG condition and rephrased related commit log

Changes in v2:
- adjusted commit log and include ordering in patch 2
- rebased over Linus' master

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Juergen Gross <jgr...@suse.com>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (6):
  jailhouse: Provide detection for non-x86 systems
  PCI: Scan all functions when running over Jailhouse
  x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 MAINTAINERS                                     |  7 +++++++
 arch/x86/Kconfig                                | 12 +++++++-----
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 arch/x86/include/asm/pci_x86.h                  |  2 ++
 arch/x86/kernel/Makefile                        |  2 +-
 arch/x86/kernel/cpu/amd.c                       |  2 +-
 arch/x86/kernel/jailhouse.c                     |  8 ++++++++
 arch/x86/pci/legacy.c                           |  4 +++-
 arch/x86/pci/mmconfig-shared.c                  |  4 ++--
 drivers/pci/probe.c                             | 22 +++++++++++++++++++---
 include/linux/hypervisor.h                      | 17 +++++++++++++++--
 12 files changed, 74 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt
--
2.13.6
[PATCH v5 6/7] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 code depend upon MMCONFIG and
ACPI, instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig          | 6 +++++-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8986a6b6e3df..b53340e71f84 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
 config PCI_MMCONFIG
 	bool "Support mmconfig PCI config space access" if X86_64
 	default y
-	depends on PCI && (ACPI || SFI)
+	depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST)
 	depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
 
 config PCI_OLPC
@@ -2659,6 +2659,10 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI
 
+config MMCONF_FAM10H
+	def_bool y
+	depends on X86_64 && PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
 	obj-$(CONFIG_GART_IOMMU)	+= amd_gart_64.o aperture_64.o
 	obj-$(CONFIG_CALGARY_IOMMU)	+= pci-calgary_64.o tce_64.o
 
-	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
+	obj-$(CONFIG_MMCONF_FAM10H)	+= mmconf-fam10h_64.o
 	obj-y				+= vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 
 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
 	/* do this for boot cpu */
 	if (c == &boot_cpu_data)
 		check_enable_amd_mmconf_dmi();
--
2.13.6
[PATCH v5 4/7] x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
From: Jan Kiszka <jan.kis...@siemens.com>

Allow PCI_MMCONFIG to be enabled when only SFI is present and make this
option default on. This will help consolidating both variants into one
Kconfig statement.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..c19f5342ec2b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2659,7 +2659,8 @@ config PCI_DOMAINS
 
 config PCI_MMCONFIG
 	bool "Support mmconfig PCI config space access"
-	depends on X86_64 && PCI && ACPI
+	default y
+	depends on X86_64 && PCI && (ACPI || SFI)
 
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
--
2.13.6
[PATCH v5 2/7] PCI: Scan all functions when running over Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0. Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate there
are more functions.

The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which means
a Linux guest won't find them.

Change Linux PCI probing so it scans all function numbers when running
as a guest over Jailhouse. This is technically prohibited by the spec,
so it is possible that PCI devices without the Multi-Function Device bit
set may have unexpected behavior in response to this probe.

Derived from original patch by Benedikt Spranger.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Andy Shevchenko <andy.shevche...@gmail.com>
---
 arch/x86/pci/legacy.c |  4 +++-
 drivers/pci/probe.c   | 22 +++++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
 #include <linux/init.h>
 #include <linux/export.h>
 #include <linux/pci.h>
+#include <linux/hypervisor.h>
 #include <asm/pci_x86.h>
 
 /*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+	int stride = jailhouse_paravirt() ? 1 : 8;
 	int devfn;
 	u32 l;
 
 	if (pci_find_bus(0, busn))
 		return;
 
-	for (devfn = 0; devfn < 256; devfn += 8) {
+	for (devfn = 0; devfn < 256; devfn += stride) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..3c365dc996e7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
 #include <linux/pci-aspm.h>
 #include <linux/aer.h>
 #include <linux/acpi.h>
+#include <linux/hypervisor.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
 #include "pci.h"
@@ -2518,14 +2519,29 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 {
 	unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
 	unsigned int start = bus->busn_res.start;
-	unsigned int devfn, cmax, max = start;
+	unsigned int devfn, fn, cmax, max = start;
 	struct pci_dev *dev;
+	int nr_devs;
 
 	dev_dbg(&bus->dev, "scanning bus\n");
 
 	/* Go find them, Rover! */
-	for (devfn = 0; devfn < 0x100; devfn += 8)
-		pci_scan_slot(bus, devfn);
+	for (devfn = 0; devfn < 256; devfn += 8) {
+		nr_devs = pci_scan_slot(bus, devfn);
+
+		/*
+		 * The Jailhouse hypervisor may pass individual functions of a
+		 * multi-function device to a guest without passing function 0.
+		 * Look for them as well.
+		 */
+		if (jailhouse_paravirt() && nr_devs == 0) {
+			for (fn = 1; fn < 8; fn++) {
+				dev = pci_scan_single_device(bus, devfn + fn);
+				if (dev)
+					dev->multifunction = 1;
+			}
+		}
+	}
 
 	/* Reserve buses for SR-IOV capability */
 	used_buses = pci_iov_bus_range(bus);
--
2.13.6
[PATCH v5 1/7] jailhouse: Provide detection for non-x86 systems
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Juergen Gross <jgr...@suse.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Reviewed-by: Juergen Gross <jgr...@suse.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 include/linux/hypervisor.h                      | 17 +++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index ..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */
 
 #ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
 #include <asm/x86_init.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 	x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+	return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
 
 #endif /* __LINUX_HYPEVISOR_H */
--
2.13.6
[PATCH v5 7/7] MAINTAINERS: Add entry for Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4623caf8d72d..6dc0b8f3ae0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7523,6 +7523,13 @@ Q: http://patchwork.linuxtv.org/project/linux-media/list/
 S:	Maintained
 F:	drivers/media/dvb-frontends/ix2505v*
 
+JAILHOUSE HYPERVISOR INTERFACE
+M:	Jan Kiszka <jan.kis...@siemens.com>
+L:	jailhouse-...@googlegroups.com
+S:	Maintained
+F:	arch/x86/kernel/jailhouse.c
+F:	arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M:	Guenter Roeck <li...@roeck-us.net>
 L:	linux-hw...@vger.kernel.org
--
2.13.6
[PATCH v4 2/7] PCI: Scan all functions when running over Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0. Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate there
are more functions.

The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which means
a Linux guest won't find them.

Change Linux PCI probing so it scans all function numbers when running
as a guest over Jailhouse. This is technically prohibited by the spec,
so it is possible that PCI devices without the Multi-Function Device bit
set may have unexpected behavior in response to this probe.

Derived from original patch by Benedikt Spranger.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Andy Shevchenko <andy.shevche...@gmail.com>
---
 arch/x86/pci/legacy.c |  4 +++-
 drivers/pci/probe.c   | 22 +++++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
 #include <linux/init.h>
 #include <linux/export.h>
 #include <linux/pci.h>
+#include <linux/hypervisor.h>
 #include <asm/pci_x86.h>
 
 /*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+	int stride = jailhouse_paravirt() ? 1 : 8;
 	int devfn;
 	u32 l;
 
 	if (pci_find_bus(0, busn))
 		return;
 
-	for (devfn = 0; devfn < 256; devfn += 8) {
+	for (devfn = 0; devfn < 256; devfn += stride) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..3c365dc996e7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
 #include <linux/pci-aspm.h>
 #include <linux/aer.h>
 #include <linux/acpi.h>
+#include <linux/hypervisor.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
 #include "pci.h"
@@ -2518,14 +2519,29 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 {
 	unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
 	unsigned int start = bus->busn_res.start;
-	unsigned int devfn, cmax, max = start;
+	unsigned int devfn, fn, cmax, max = start;
 	struct pci_dev *dev;
+	int nr_devs;
 
 	dev_dbg(&bus->dev, "scanning bus\n");
 
 	/* Go find them, Rover! */
-	for (devfn = 0; devfn < 0x100; devfn += 8)
-		pci_scan_slot(bus, devfn);
+	for (devfn = 0; devfn < 256; devfn += 8) {
+		nr_devs = pci_scan_slot(bus, devfn);
+
+		/*
+		 * The Jailhouse hypervisor may pass individual functions of a
+		 * multi-function device to a guest without passing function 0.
+		 * Look for them as well.
+		 */
+		if (jailhouse_paravirt() && nr_devs == 0) {
+			for (fn = 1; fn < 8; fn++) {
+				dev = pci_scan_single_device(bus, devfn + fn);
+				if (dev)
+					dev->multifunction = 1;
+			}
+		}
+	}
 
 	/* Reserve buses for SR-IOV capability */
 	used_buses = pci_iov_bus_range(bus);
--
2.13.6
[PATCH v4 0/7] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI
Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).

Key elements of this series are:
 - detection of Jailhouse via device tree hypervisor node
 - function-level PCI scan if Jailhouse is detected
 - MMCONFIG support for x86 guests

As most changes affect x86, I would suggest to route the series also
via tip after the necessary acks are collected.

Changes in v4:
 - split up Kconfig changes
 - respect pcibios_last_bus during mmconfig setup
 - cosmetic changes requested by Andy

Changes in v3:
 - avoided duplicate scans of PCI functions under Jailhouse
 - reformatted PCI_MMCONFIG condition and rephrased related commit log

Changes in v2:
 - adjusted commit log and include ordering in patch 2
 - rebased over Linus master

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Juergen Gross <jgr...@suse.com>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (6):
  jailhouse: Provide detection for non-x86 systems
  PCI: Scan all functions when running over Jailhouse
  x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 MAINTAINERS                                     |  7 +++++++
 arch/x86/Kconfig                                | 12 +++++++-----
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 arch/x86/include/asm/pci_x86.h                  |  2 ++
 arch/x86/kernel/Makefile                        |  2 +-
 arch/x86/kernel/cpu/amd.c                       |  2 +-
 arch/x86/kernel/jailhouse.c                     |  8 ++++++++
 arch/x86/pci/legacy.c                           |  4 +++-
 arch/x86/pci/mmconfig-shared.c                  |  4 ++--
 drivers/pci/probe.c                             | 22 +++++++++++++++++++---
 include/linux/hypervisor.h                      | 17 +++++++++++++++--
 12 files changed, 74 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

--
2.13.6
[PATCH v4 3/7] x86/jailhouse: Enable PCI mmconfig access in inmates
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration
space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG, used pcibios_last_bus]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c    | 8 ++++++++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			       phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+							int end, u64 addr);

 extern struct list_head pci_mmcfg_list;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..fa183a131edc 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,14 @@ static int __init jailhouse_pci_arch_init(void)
 	if (pcibios_last_bus < 0)
 		pcibios_last_bus = 0xff;

+#ifdef CONFIG_PCI_MMCONFIG
+	if (setup_data.pci_mmconfig_base) {
+		pci_mmconfig_add(0, 0, pcibios_last_bus,
+				 setup_data.pci_mmconfig_base);
+		pci_mmcfg_arch_init();
+	}
+#endif
+
 	return 0;
 }
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 	return new;
 }

-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-							int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+						 int end, u64 addr)
 {
 	struct pci_mmcfg_region *new;
--
2.13.6
[PATCH v4 4/7] x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
From: Jan Kiszka <jan.kis...@siemens.com>

Allow to enable PCI_MMCONFIG when only SFI is present and make this
option default on. This will help consolidating both into one Kconfig
statement.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..c19f5342ec2b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2659,7 +2659,8 @@ config PCI_DOMAINS

 config PCI_MMCONFIG
 	bool "Support mmconfig PCI config space access"
-	depends on X86_64 && PCI && ACPI
+	default y
+	depends on X86_64 && PCI && (ACPI || SFI)

 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
--
2.13.6
[PATCH v4 6/7] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure
the latter can be built without having to enable ACPI as well.
Primarily, we need to make the AMD mmconf-fam10h_64 depend upon
MMCONFIG and ACPI, instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig          | 6 +++++-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8986a6b6e3df..08a3236cb6f2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
 config PCI_MMCONFIG
 	bool "Support mmconfig PCI config space access" if X86_64
 	default y
-	depends on PCI && (ACPI || SFI)
+	depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST)
 	depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)

 config PCI_OLPC
@@ -2659,6 +2659,10 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI

+config MMCONF_FAM10H
+	def_bool y
+	depends on PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
 	obj-$(CONFIG_GART_IOMMU)	+= amd_gart_64.o aperture_64.o
 	obj-$(CONFIG_CALGARY_IOMMU)	+= pci-calgary_64.o tce_64.o

-	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
+	obj-$(CONFIG_MMCONF_FAM10H)	+= mmconf-fam10h_64.o
 	obj-y				+= vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)

 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
 	/* do this for boot cpu */
 	if (c == &boot_cpu_data)
 		check_enable_amd_mmconf_dmi();
--
2.13.6
[PATCH v4 5/7] x86: Consolidate PCI_MMCONFIG configs
From: Jan Kiszka <jan.kis...@siemens.com>

Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), we
have two PCI_MMCONFIG entries, one from the original i386 and another
from x86_64. This consolidates both entries into a single one.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c19f5342ec2b..8986a6b6e3df 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,10 @@ config PCI_DIRECT
 	depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))

 config PCI_MMCONFIG
-	def_bool y
-	depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+	bool "Support mmconfig PCI config space access" if X86_64
+	default y
+	depends on PCI && (ACPI || SFI)
+	depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)

 config PCI_OLPC
 	def_bool y
@@ -2657,10 +2659,6 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI

-config PCI_MMCONFIG
-	bool "Support mmconfig PCI config space access"
-	default y
-	depends on X86_64 && PCI && (ACPI || SFI)
-
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
--
2.13.6
[PATCH v4 1/7] jailhouse: Provide detection for non-x86 systems
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Juergen Gross <jgr...@suse.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Reviewed-by: Juergen Gross <jgr...@suse.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 include/linux/hypervisor.h                      | 17 +++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index ..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */

 #ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
 #include <asm/x86_init.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 	x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+	return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */

 #endif /* __LINUX_HYPEVISOR_H */
--
2.13.6
[PATCH v4 7/7] MAINTAINERS: Add entry for Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4623caf8d72d..6dc0b8f3ae0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7523,6 +7523,13 @@
 Q:	http://patchwork.linuxtv.org/project/linux-media/list/
 S:	Maintained
 F:	drivers/media/dvb-frontends/ix2505v*

+JAILHOUSE HYPERVISOR INTERFACE
+M:	Jan Kiszka <jan.kis...@siemens.com>
+L:	jailhouse-...@googlegroups.com
+S:	Maintained
+F:	arch/x86/kernel/jailhouse.c
+F:	arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M:	Guenter Roeck <li...@roeck-us.net>
 L:	linux-hw...@vger.kernel.org
--
2.13.6
Re: [PATCH v3 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates
On 2018-03-01 11:31, Andy Shevchenko wrote:
> On Thu, Mar 1, 2018 at 7:40 AM, Jan Kiszka <jan.kis...@siemens.com> wrote:
>
>> Use the PCI mmconfig base address exported by jailhouse in boot
>> parameters in order to access the memory mapped PCI configuration space.
>
>> --- a/arch/x86/kernel/jailhouse.c
>> +++ b/arch/x86/kernel/jailhouse.c
>> @@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
>> 	if (pcibios_last_bus < 0)
>> 		pcibios_last_bus = 0xff;
>>
>> +#ifdef CONFIG_PCI_MMCONFIG
>> +	if (setup_data.pci_mmconfig_base) {
>
>> +		pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
>
> Hmm... Shouldn't be pcibios_last_bus instead of 0xff?

Indeed. Thanks,
Jan

>> +		pci_mmcfg_arch_init();
>> +	}
>> +#endif

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
[PATCH v3 5/6] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure
the latter can be built without having to enable ACPI as well.
Primarily, we need to make the AMD mmconf-fam10h_64 depend upon
MMCONFIG and ACPI, instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig          | 6 +++++-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index aef9d67ac186..b8e73e748acc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
 config PCI_MMCONFIG
 	bool "Support mmconfig PCI config space access" if X86_64
 	default y
-	depends on PCI && (ACPI || SFI) && (X86_64 || (PCI_GOANY || PCI_GOMMCONFIG))
+	depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST) && (X86_64 || (PCI_GOANY || PCI_GOMMCONFIG))

 config PCI_OLPC
 	def_bool y
@@ -2658,6 +2658,10 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI

+config MMCONF_FAM10H
+	def_bool y
+	depends on PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
 	obj-$(CONFIG_GART_IOMMU)	+= amd_gart_64.o aperture_64.o
 	obj-$(CONFIG_CALGARY_IOMMU)	+= pci-calgary_64.o tce_64.o

-	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
+	obj-$(CONFIG_MMCONF_FAM10H)	+= mmconf-fam10h_64.o
 	obj-y				+= vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)

 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
 	/* do this for boot cpu */
 	if (c == &boot_cpu_data)
 		check_enable_amd_mmconf_dmi();
--
2.13.6
[PATCH v3 0/6] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI
Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).

Key elements of this series are:
 - detection of Jailhouse via device tree hypervisor node
 - function-level PCI scan if Jailhouse is detected
 - MMCONFIG support for x86 guests

As most changes affect x86, I would suggest to route the series also
via tip after the necessary acks are collected.

Changes in v3:
 - avoided duplicate scans of PCI functions under Jailhouse
 - reformatted PCI_MMCONFIG condition and rephrased related commit log

Changes in v2:
 - adjusted commit log and include ordering in patch 2
 - rebased over Linus master

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (5):
  jailhouse: Provide detection for non-x86 systems
  PCI: Scan all functions when running over Jailhouse
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 MAINTAINERS                                     |  7 +++++++
 arch/x86/Kconfig                                | 11 ++++-------
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 arch/x86/include/asm/pci_x86.h                  |  2 ++
 arch/x86/kernel/Makefile                        |  2 +-
 arch/x86/kernel/cpu/amd.c                       |  2 +-
 arch/x86/kernel/jailhouse.c                     |  7 +++++++
 arch/x86/pci/legacy.c                           |  4 +++-
 arch/x86/pci/mmconfig-shared.c                  |  4 ++--
 drivers/pci/probe.c                             | 22 +++++++++++++++++++---
 include/linux/hypervisor.h                      | 17 +++++++++++++++--
 12 files changed, 72 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

--
2.13.6
[PATCH v3 4/6] x86: Consolidate PCI_MMCONFIG configs
From: Jan Kiszka <jan.kis...@siemens.com>

Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), we
have two PCI_MMCONFIG entries, one from the original i386 and another
from x86_64. This consolidates both entries into a single one.

The logic for x86_32, where this option was not under user control,
remains identical. On x86_64, PCI_MMCONFIG becomes additionally
configurable for SFI systems even if ACPI was disabled. This just
simplifies the logic without restricting the configurability in any
way.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..aef9d67ac186 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,9 @@ config PCI_DIRECT
 	depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))

 config PCI_MMCONFIG
-	def_bool y
-	depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+	bool "Support mmconfig PCI config space access" if X86_64
+	default y
+	depends on PCI && (ACPI || SFI) && (X86_64 || (PCI_GOANY || PCI_GOMMCONFIG))

 config PCI_OLPC
 	def_bool y
@@ -2657,10 +2658,6 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI

-config PCI_MMCONFIG
-	bool "Support mmconfig PCI config space access"
-	depends on X86_64 && PCI && ACPI
-
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
--
2.13.6
[PATCH v3 1/6] jailhouse: Provide detection for non-x86 systems
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 include/linux/hypervisor.h                      | 17 +++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index ..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */

 #ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
 #include <asm/x86_init.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 	x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+	return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */

 #endif /* __LINUX_HYPEVISOR_H */
--
2.13.6
[PATCH v3 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration
space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c    | 7 +++++++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			       phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+							int end, u64 addr);

 extern struct list_head pci_mmcfg_list;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..7fe2a73da0b3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
 	if (pcibios_last_bus < 0)
 		pcibios_last_bus = 0xff;

+#ifdef CONFIG_PCI_MMCONFIG
+	if (setup_data.pci_mmconfig_base) {
+		pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
+		pci_mmcfg_arch_init();
+	}
+#endif
+
 	return 0;
 }
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 	return new;
 }

-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-							int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+						 int end, u64 addr)
 {
 	struct pci_mmcfg_region *new;
--
2.13.6
[PATCH v3 2/6] PCI: Scan all functions when running over Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0. Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate
there are more functions.

The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which
means a Linux guest won't find them.

Change Linux PCI probing so it scans all function numbers when
running as a guest over Jailhouse.

This is technically prohibited by the spec, so it is possible that
PCI devices without the Multi-Function Device bit set may have
unexpected behavior in response to this probe.

Derived from original patch by Benedikt Spranger.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
---
 arch/x86/pci/legacy.c |  4 +++-
 drivers/pci/probe.c   | 22 +++++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
 #include <linux/init.h>
 #include <linux/pci.h>
 #include <linux/errno.h>
+#include <linux/hypervisor.h>
 #include <asm/pci_x86.h>

 /*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)

 void pcibios_scan_specific_bus(int busn)
 {
+	int stride = jailhouse_paravirt() ? 1 : 8;
 	int devfn;
 	u32 l;

 	if (pci_find_bus(0, busn))
 		return;

-	for (devfn = 0; devfn < 256; devfn += 8) {
+	for (devfn = 0; devfn < 256; devfn += stride) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..da22d6d216f8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
 #include <linux/cpumask.h>
 #include <linux/aer.h>
 #include <linux/acpi.h>
+#include <linux/hypervisor.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
 #include "pci.h"
@@ -2518,14 +2519,29 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 {
 	unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
 	unsigned int start = bus->busn_res.start;
-	unsigned int devfn, cmax, max = start;
+	unsigned int devfn, fn, cmax, max = start;
 	struct pci_dev *dev;
+	int nr_devs;

 	dev_dbg(&bus->dev, "scanning bus\n");

 	/* Go find them, Rover! */
-	for (devfn = 0; devfn < 0x100; devfn += 8)
-		pci_scan_slot(bus, devfn);
+	for (devfn = 0; devfn < 0x100; devfn += 8) {
+		nr_devs = pci_scan_slot(bus, devfn);
+
+		/*
+		 * The Jailhouse hypervisor may pass individual functions of a
+		 * multi-function device to a guest without passing function 0.
+		 * Look for them as well.
+		 */
+		if (jailhouse_paravirt() && nr_devs == 0) {
+			for (fn = 1; fn < 8; fn++) {
+				dev = pci_scan_single_device(bus, devfn + fn);
+				if (dev)
+					dev->multifunction = 1;
+			}
+		}
+	}

 	/* Reserve buses for SR-IOV capability */
 	used_buses = pci_iov_bus_range(bus);
--
2.13.6
[PATCH v3 6/6] MAINTAINERS: Add entry for Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 93a12af4f180..4b889f282c77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7521,6 +7521,13 @@
 Q:	http://patchwork.linuxtv.org/project/linux-media/list/
 S:	Maintained
 F:	drivers/media/dvb-frontends/ix2505v*

+JAILHOUSE HYPERVISOR INTERFACE
+M:	Jan Kiszka <jan.kis...@siemens.com>
+L:	jailhouse-...@googlegroups.com
+S:	Maintained
+F:	arch/x86/kernel/jailhouse.c
+F:	arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M:	Guenter Roeck <li...@roeck-us.net>
 L:	linux-hw...@vger.kernel.org
--
2.13.6
Re: [PATCH v2 2/6] PCI: Scan all functions when running over Jailhouse
On 2018-02-28 09:44, Thomas Gleixner wrote:
> On Wed, 28 Feb 2018, Jan Kiszka wrote:
>
>> From: Jan Kiszka <jan.kis...@siemens.com>
>>
>> Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
>> have a function 0. Therefore, Linux scans for devices at function 0
>> (devfn 0/8/16/...) and only scans for other functions if function 0
>> has its Multi-Function Device bit set or ARI or SR-IOV indicate
>> there are more functions.
>>
>> The Jailhouse hypervisor may pass individual functions of a
>> multi-function device to a guest without passing function 0, which
>> means a Linux guest won't find them.
>>
>> Change Linux PCI probing so it scans all function numbers when
>> running as a guest over Jailhouse.
>
>> void pcibios_scan_specific_bus(int busn)
>> {
>> +	int stride = jailhouse_paravirt() ? 1 : 8;
>> 	int devfn;
>> 	u32 l;
>>
>> 	if (pci_find_bus(0, busn))
>> 		return;
>>
>> -	for (devfn = 0; devfn < 256; devfn += 8) {
>> +	for (devfn = 0; devfn < 256; devfn += stride) {
>> 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>> 		    l != 0x0000 && l != 0xffff) {
>> 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn,
>> 			    l);
>
> Shouldn't that take the situation into account where the MFD bit is set on
> a regular devfn, i.e. (devfn % 8) == 0? In that case you'd scan the
> subfunctions twice.

Good point, and it also applies to pci_scan_child_bus_extend. Will add
some filters.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
[PATCH v2 1/6] jailhouse: Provide detection for non-x86 systems
[PATCH v2 1/6] jailhouse: Provide detection for non-x86 systems
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 include/linux/hypervisor.h                      | 17 +++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index 000000000000..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */
 
 #ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
 #include <asm/x86_init.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 	x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+	return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
 
 #endif /* __LINUX_HYPEVISOR_H */
-- 
2.13.6
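For reference, a minimal non-root cell device tree fragment matching the binding above might look as follows (only the node name and compatible string come from the binding; everything else is illustrative):

```dts
/ {
	/* Top-level node that jailhouse_paravirt() probes for */
	hypervisor {
		compatible = "jailhouse,cell";
	};
};
```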
[PATCH v2 0/6] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI
Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest is
currently pending in the tip tree. This builds on top and enhances the PCI
support for x86 and also ARM guests (ARM[64] does not require platform
patches and works already).

Key elements of this series are:
- detection of Jailhouse via device tree hypervisor node
- function-level PCI scan if Jailhouse is detected
- MMCONFIG support for x86 guests

As most changes affect x86, I would suggest to route the series also via tip
after the necessary acks are collected.

Changes in v2:
- adjusted commit log and include ordering in patch 2
- rebased over Linus master

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (5):
  jailhouse: Provide detection for non-x86 systems
  PCI: Scan all functions when running over Jailhouse
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 MAINTAINERS                                     |  7 +++++++
 arch/x86/Kconfig                                | 11 ++++++-----
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 arch/x86/include/asm/pci_x86.h                  |  2 ++
 arch/x86/kernel/Makefile                        |  2 +-
 arch/x86/kernel/cpu/amd.c                       |  2 +-
 arch/x86/kernel/jailhouse.c                     |  7 +++++++
 arch/x86/pci/legacy.c                           |  4 +++-
 arch/x86/pci/mmconfig-shared.c                  |  4 ++--
 drivers/pci/probe.c                             |  4 +++-
 include/linux/hypervisor.h                      | 17 +++++++++++++--
 12 files changed, 56 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

-- 
2.13.6
[PATCH v2 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot parameters
in order to access the memory mapped PCI configuration space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c    | 7 +++++++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			       phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+							int end, u64 addr);
 
 extern struct list_head pci_mmcfg_list;
 
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..7fe2a73da0b3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
 	if (pcibios_last_bus < 0)
 		pcibios_last_bus = 0xff;
 
+#ifdef CONFIG_PCI_MMCONFIG
+	if (setup_data.pci_mmconfig_base) {
+		pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
+		pci_mmcfg_arch_init();
+	}
+#endif
+
 	return 0;
 }
 
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 	return new;
 }
 
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-							int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+						 int end, u64 addr)
 {
 	struct pci_mmcfg_region *new;
 
-- 
2.13.6
[PATCH v2 4/6] x86: Consolidate PCI_MMCONFIG configs
From: Jan Kiszka <jan.kis...@siemens.com>

Not sure if those two worked by design or just by chance so far. In any
case, it's at least cleaner and clearer to express this in a single config
statement.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..63e85e7da12e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,9 @@ config PCI_DIRECT
 	depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
 
 config PCI_MMCONFIG
-	def_bool y
-	depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+	bool "Support mmconfig PCI config space access" if X86_64
+	default y
+	depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
 
 config PCI_OLPC
 	def_bool y
@@ -2657,10 +2658,6 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI
 
-config PCI_MMCONFIG
-	bool "Support mmconfig PCI config space access"
-	depends on X86_64 && PCI && ACPI
-
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
-- 
2.13.6
[PATCH v2 2/6] PCI: Scan all functions when running over Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to have
a function 0. Therefore, Linux scans for devices at function 0 (devfn
0/8/16/...) and only scans for other functions if function 0 has its
Multi-Function Device bit set or ARI or SR-IOV indicate there are more
functions.

The Jailhouse hypervisor may pass individual functions of a multi-function
device to a guest without passing function 0, which means a Linux guest
won't find them.

Change Linux PCI probing so it scans all function numbers when running as
a guest over Jailhouse. This is technically prohibited by the spec, so it
is possible that PCI devices without the Multi-Function Device bit set may
have unexpected behavior in response to this probe.

Based on patch by Benedikt Spranger, adding Jailhouse probing to avoid
changing the behavior in the absence of the hypervisor.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
---
 arch/x86/pci/legacy.c | 4 +++-
 drivers/pci/probe.c   | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
 #include <linux/init.h>
 #include <linux/export.h>
 #include <linux/pci.h>
+#include <asm/jailhouse_para.h>
 #include <asm/pci_x86.h>
 
 /*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+	int stride = jailhouse_paravirt() ? 1 : 8;
 	int devfn;
 	u32 l;
 
 	if (pci_find_bus(0, busn))
 		return;
 
-	for (devfn = 0; devfn < 256; devfn += 8) {
+	for (devfn = 0; devfn < 256; devfn += stride) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..ce728251ae36 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
 #include <linux/pci-aspm.h>
 #include <linux/aer.h>
 #include <linux/acpi.h>
+#include <linux/hypervisor.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
 #include "pci.h"
@@ -2517,6 +2518,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 					      unsigned int available_buses)
 {
 	unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
+	unsigned int stride = jailhouse_paravirt() ? 1 : 8;
 	unsigned int start = bus->busn_res.start;
 	unsigned int devfn, cmax, max = start;
 	struct pci_dev *dev;
@@ -2524,7 +2526,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 	dev_dbg(&bus->dev, "scanning bus\n");
 
 	/* Go find them, Rover! */
-	for (devfn = 0; devfn < 0x100; devfn += 8)
+	for (devfn = 0; devfn < 0x100; devfn += stride)
 		pci_scan_slot(bus, devfn);
 
 	/* Reserve buses for SR-IOV capability */
-- 
2.13.6
[PATCH v2 5/6] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig          | 6 +++++-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 63e85e7da12e..5b0ac52e357a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
 
 config PCI_MMCONFIG
 	bool "Support mmconfig PCI config space access" if X86_64
 	default y
-	depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
+	depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
 
 config PCI_OLPC
 	def_bool y
@@ -2658,6 +2658,10 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI
 
+config MMCONF_FAM10H
+	def_bool y
+	depends on PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
 	obj-$(CONFIG_GART_IOMMU)	+= amd_gart_64.o aperture_64.o
 	obj-$(CONFIG_CALGARY_IOMMU)	+= pci-calgary_64.o tce_64.o
 
-	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
+	obj-$(CONFIG_MMCONF_FAM10H)	+= mmconf-fam10h_64.o
 	obj-y				+= vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 
 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
 	/* do this for boot cpu */
 	if (c == &boot_cpu_data)
 		check_enable_amd_mmconf_dmi();
-- 
2.13.6
[PATCH v2 6/6] MAINTAINERS: Add entry for Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 93a12af4f180..4b889f282c77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7521,6 +7521,13 @@ Q:	http://patchwork.linuxtv.org/project/linux-media/list/
 S:	Maintained
 F:	drivers/media/dvb-frontends/ix2505v*
 
+JAILHOUSE HYPERVISOR INTERFACE
+M:	Jan Kiszka <jan.kis...@siemens.com>
+L:	jailhouse-...@googlegroups.com
+S:	Maintained
+F:	arch/x86/kernel/jailhouse.c
+F:	arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M:	Guenter Roeck <li...@roeck-us.net>
 L:	linux-hw...@vger.kernel.org
-- 
2.13.6
Re: [PATCH 2/6] pci: Scan all functions when probing while running over Jailhouse
On 2018-02-22 21:57, Bjorn Helgaas wrote:
> On Mon, Jan 22, 2018 at 07:12:46AM +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kis...@siemens.com>
>>
>> PCI and PCIBIOS probing only scans devices at function number 0/8/16/...
>> Subdevices (e.g. multiqueue) have function numbers which are not a
>> multiple of 8.
>
> Suggested text:
>
>   Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
>   have a function 0.  Therefore, Linux scans for devices at function 0
>   (devfn 0/8/16/...) and only scans for other functions if function 0
>   has its Multi-Function Device bit set or ARI or SR-IOV indicate
>   there are more functions.
>
>   The Jailhouse hypervisor may pass individual functions of a
>   multi-function device to a guest without passing function 0, which
>   means a Linux guest won't find them.
>
>   Change Linux PCI probing so it scans all function numbers when
>   running as a guest over Jailhouse.
>
>   This is technically prohibited by the spec, so it is possible that
>   PCI devices without the Multi-Function Device bit set may have
>   unexpected behavior in response to this probe.
>
>> The simple hypervisor Jailhouse passes subdevices directly w/o providing
>> a virtual PCI topology like KVM. As a consequence a PCI passthrough from
>> Jailhouse to a guest will not be detected by Linux.
>>
>> Based on patch by Benedikt Spranger, adding Jailhouse probing to avoid
>> changing the behavior in the absence of the hypervisor.
>>
>> CC: Benedikt Spranger <b.spran...@linutronix.de>
>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
>
> With subject change to:
>
>   PCI: Scan all functions when running over Jailhouse
>
> Acked-by: Bjorn Helgaas <bhelg...@google.com>

Thanks, all suggestions picked up for next round.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: [PATCH 2/6] pci: Scan all functions when probing while running over Jailhouse
On 2018-02-23 14:23, Andy Shevchenko wrote:
> On Mon, Jan 22, 2018 at 8:12 AM, Jan Kiszka <jan.kis...@siemens.com> wrote:
>
>>  #include <linux/export.h>
>>  #include <linux/pci.h>
>>  #include <asm/pci_x86.h>
>> +#include <asm/jailhouse_para.h>
>
> Keep it in order?

Done.

>>  #include <linux/irqdomain.h>
>>  #include <linux/pm_runtime.h>
>> +#include <linux/hypervisor.h>
>
> Ditto.

Despite the context suggesting it, this file has no ordering.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: [PATCH 4/6] x86: Consolidate PCI_MMCONFIG configs
On 2018-01-28 18:26, Andy Shevchenko wrote:
> On Mon, Jan 22, 2018 at 8:12 AM, Jan Kiszka <jan.kis...@siemens.com> wrote:
>> From: Jan Kiszka <jan.kis...@siemens.com>
>>
>> Not sure if those two worked by design or just by chance so far. In any
>> case, it's at least cleaner and clearer to express this in a single
>> config statement.
>
> Congrats! You found by the way a bug in
>
> commit e279b6c1d329e50b766bce96aacc197eae8a053b
> Author: Sam Ravnborg <s...@ravnborg.org>
> Date:   Tue Nov 6 20:41:05 2007 +0100
>
>     x86: start unification of arch/x86/Kconfig.*
>
> ...and proper fix seems to split PCI stuff to common + X86_32 only +
> X86_64 only

Hmm, is that a change request on this patch?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
[PATCH 6/6] MAINTAINERS: Add entry for Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 426ba037d943..dd51a2012b36 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7468,6 +7468,13 @@ Q:	http://patchwork.linuxtv.org/project/linux-media/list/
 S:	Maintained
 F:	drivers/media/dvb-frontends/ix2505v*
 
+JAILHOUSE HYPERVISOR INTERFACE
+M:	Jan Kiszka <jan.kis...@siemens.com>
+L:	jailhouse-...@googlegroups.com
+S:	Maintained
+F:	arch/x86/kernel/jailhouse.c
+F:	arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M:	Guenter Roeck <li...@roeck-us.net>
 L:	linux-hw...@vger.kernel.org
-- 
2.13.6
[PATCH 0/6] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI
Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest is
currently pending in the tip tree. This builds on top and enhances the PCI
support for x86 and also ARM guests (ARM[64] does not require platform
patches and works already).

Key elements of this series are:
- detection of Jailhouse via device tree hypervisor node
- function-level PCI scan if Jailhouse is detected
- MMCONFIG support for x86 guests

As most changes affect x86, I would suggest to route the series also via tip
after the necessary acks are collected.

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (5):
  jailhouse: Provide detection for non-x86 systems
  pci: Scan all functions when probing while running over Jailhouse
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 MAINTAINERS                                     |  7 +++++++
 arch/x86/Kconfig                                | 11 ++++++-----
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 arch/x86/include/asm/pci_x86.h                  |  2 ++
 arch/x86/kernel/Makefile                        |  2 +-
 arch/x86/kernel/cpu/amd.c                       |  2 +-
 arch/x86/kernel/jailhouse.c                     |  7 +++++++
 arch/x86/pci/legacy.c                           |  4 +++-
 arch/x86/pci/mmconfig-shared.c                  |  4 ++--
 drivers/pci/probe.c                             |  4 +++-
 include/linux/hypervisor.h                      | 17 +++++++++++++--
 12 files changed, 56 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

-- 
2.13.6
[PATCH 5/6] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig          | 6 +++++-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f2038417a590..77ba0eb0a258 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2597,7 +2597,7 @@ config PCI_DIRECT
 
 config PCI_MMCONFIG
 	bool "Support mmconfig PCI config space access" if X86_64
 	default y
-	depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
+	depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
 
 config PCI_OLPC
 	def_bool y
@@ -2612,6 +2612,10 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI
 
+config MMCONF_FAM10H
+	def_bool y
+	depends on PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index aed9296dccd3..b2c9e230e2fe 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -143,6 +143,6 @@ ifeq ($(CONFIG_X86_64),y)
 	obj-$(CONFIG_GART_IOMMU)	+= amd_gart_64.o aperture_64.o
 	obj-$(CONFIG_CALGARY_IOMMU)	+= pci-calgary_64.o tce_64.o
 
-	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
+	obj-$(CONFIG_MMCONF_FAM10H)	+= mmconf-fam10h_64.o
 	obj-y				+= vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ea831c858195..47edf599f6fd 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -690,7 +690,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 
 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
 	/* do this for boot cpu */
 	if (c == &boot_cpu_data)
 		check_enable_amd_mmconf_dmi();
-- 
2.13.6
[PATCH 4/6] x86: Consolidate PCI_MMCONFIG configs
From: Jan Kiszka <jan.kis...@siemens.com>

Not sure if those two worked by design or just by chance so far. In any
case, it's at least cleaner and clearer to express this in a single config
statement.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 423e4b64e683..f2038417a590 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2595,8 +2595,9 @@ config PCI_DIRECT
 	depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
 
 config PCI_MMCONFIG
-	def_bool y
-	depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+	bool "Support mmconfig PCI config space access" if X86_64
+	default y
+	depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
 
 config PCI_OLPC
 	def_bool y
@@ -2611,10 +2612,6 @@ config PCI_DOMAINS
 	def_bool y
 	depends on PCI
 
-config PCI_MMCONFIG
-	bool "Support mmconfig PCI config space access"
-	depends on X86_64 && PCI && ACPI
-
 config PCI_CNB20LE_QUIRK
 	bool "Read CNB20LE Host Bridge Windows" if EXPERT
 	depends on PCI
-- 
2.13.6
[PATCH 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot parameters
in order to access the memory mapped PCI configuration space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c    | 7 +++++++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			       phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+							int end, u64 addr);
 
 extern struct list_head pci_mmcfg_list;
 
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..7fe2a73da0b3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
 	if (pcibios_last_bus < 0)
 		pcibios_last_bus = 0xff;
 
+#ifdef CONFIG_PCI_MMCONFIG
+	if (setup_data.pci_mmconfig_base) {
+		pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
+		pci_mmcfg_arch_init();
+	}
+#endif
+
 	return 0;
 }
 
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 	return new;
 }
 
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-							int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+						 int end, u64 addr)
 {
 	struct pci_mmcfg_region *new;
 
-- 
2.13.6
[PATCH 1/6] jailhouse: Provide detection for non-x86 systems
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 arch/x86/include/asm/jailhouse_para.h           |  2 +-
 include/linux/hypervisor.h                      | 17 +++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index 000000000000..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */
 
 #ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
 #include <asm/x86_init.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 	x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+	return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
 
 #endif /* __LINUX_HYPEVISOR_H */
-- 
2.13.6
[PATCH 2/6] pci: Scan all functions when probing while running over Jailhouse
From: Jan Kiszka <jan.kis...@siemens.com>

PCI and PCIBIOS probing only scans devices at function number 0/8/16/...
Subdevices (e.g. multiqueue) have function numbers which are not a
multiple of 8.

The simple hypervisor Jailhouse passes subdevices directly w/o providing
a virtual PCI topology like KVM. As a consequence a PCI passthrough from
Jailhouse to a guest will not be detected by Linux.

Based on patch by Benedikt Spranger, adding Jailhouse probing to avoid
changing the behavior in the absence of the hypervisor.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/pci/legacy.c | 4 +++-
 drivers/pci/probe.c   | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..a7b0476b4f44 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -5,6 +5,7 @@
 #include <linux/export.h>
 #include <linux/pci.h>
 #include <asm/pci_x86.h>
+#include <asm/jailhouse_para.h>
 
 /*
  * Discover remaining PCI buses in case there are peer host bridges.
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+	int stride = jailhouse_paravirt() ? 1 : 8;
 	int devfn;
 	u32 l;
 
 	if (pci_find_bus(0, busn))
 		return;
 
-	for (devfn = 0; devfn < 256; devfn += 8) {
+	for (devfn = 0; devfn < 256; devfn += stride) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 14e0ea1ff38b..60ad14c8245f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -17,6 +17,7 @@
 #include <linux/acpi.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
+#include <linux/hypervisor.h>
 #include "pci.h"
 
 #define CARDBUS_LATENCY_TIMER	176	/* secondary latency timer */
@@ -2454,6 +2455,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 					      unsigned int available_buses)
 {
 	unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
+	unsigned int stride = jailhouse_paravirt() ? 1 : 8;
 	unsigned int start = bus->busn_res.start;
 	unsigned int devfn, cmax, max = start;
 	struct pci_dev *dev;
@@ -2461,7 +2463,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 	dev_dbg(&bus->dev, "scanning bus\n");
 
 	/* Go find them, Rover! */
-	for (devfn = 0; devfn < 0x100; devfn += 8)
+	for (devfn = 0; devfn < 0x100; devfn += stride)
 		pci_scan_slot(bus, devfn);
 
 	/* Reserve buses for SR-IOV capability. */
-- 
2.13.6
Re: [PATCH v4 0/6] virtio core DMA API conversion
On 2015-11-10 03:18, Andy Lutomirski wrote:
> On Mon, Nov 9, 2015 at 6:04 PM, Benjamin Herrenschmidt wrote:
>> I thus go back to my original statement, it's a LOT easier to handle if
>> the device itself is self describing, indicating whether it is set to
>> bypass a host iommu or not. For L1->L2, well, that wouldn't be the
>> first time qemu/VFIO plays tricks with the passed through device
>> configuration space...
>
> Which leaves the special case of Xen, where even preexisting devices
> don't bypass the IOMMU.  Can we keep this specific to powerpc and
> sparc?  On x86, this problem is basically nonexistent, since the IOMMU
> is properly self-describing.
>
> IOW, I think that on x86 we should assume that all virtio devices
> honor the IOMMU.

From the guest driver POV, that is OK because either there is no IOMMU to
program (the current situation with qemu), there can be one that doesn't
need it (the current situation with qemu and iommu=on) or there is (Xen)
or will be (future qemu) one that requires it.

>> Note that the above can be solved via some kind of compromise: The
>> device self describes the ability to honor the iommu, along with the
>> property (or ACPI table entry) that indicates whether or not it does.
>>
>> IE. We could use the revision or ProgIf field of the config space for
>> example. Or something in virtio config. If it's an "old" device, we
>> know it always bypass. If it's a new device, we know it only bypasses
>> if the corresponding property is in. I still would have to sort out the
>> openbios case for mac among others but it's at least a workable
>> direction.
>>
>> BTW. Don't you have a similar problem on x86 that today qemu claims
>> that everything honors the iommu in ACPI ?
>
> Only on a single experimental configuration, and that can apparently
> just be fixed going forward without any real problems being caused.

BTW, I once tried to describe the current situation on QEMU x86 with IOMMU
enabled via ACPI. While you can easily add IOMMU device exceptions to the
static tables, the fun starts when considering device hotplug for virtio.
Unless I missed some trick, ACPI doesn't seem like being designed for that
level of flexibility. You would have to reserve a complete PCI bus, declare
that one as not being IOMMU-governed, and then only add new virtio devices
to that bus. Possible, but a lot of restrictions that existing management
software would have to be aware of as well.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Re: RFC: virtio-peer shared memory based peer communication device
On 2015-09-18 23:11, Paolo Bonzini wrote:
> On 18/09/2015 18:29, Claudio Fontana wrote:
>> this is a first RFC for virtio-peer 0.1, which is still very much a work
>> in progress:
>>
>> https://github.com/hw-claudio/virtio-peer/wiki
>>
>> It is also available as PDF there, but the text is reproduced here for
>> commenting:
>>
>> Peer shared memory communication device (virtio-peer)
>
> Apart from the windows idea, how does virtio-peer compare to virtio-rpmsg?

rpmsg is a very specialized thing. It targets single AMP cores, assuming that those have full access to the main memory. And it is also a centralized approach where all messages go through the main Linux instance.

I suspect we could cover that use case as well with a generic inter-VM shared memory device, but I haven't thought through all the details yet.

Jan
Re: RFC: virtio-peer shared memory based peer communication device
On 2015-09-21 14:13, Michael S. Tsirkin wrote:
> On Fri, Sep 18, 2015 at 06:29:27PM +0200, Claudio Fontana wrote:
>> Hello,
>>
>> this is a first RFC for virtio-peer 0.1, which is still very much a work
>> in progress:
>>
>> https://github.com/hw-claudio/virtio-peer/wiki
>>
>> It is also available as PDF there, but the text is reproduced here for
>> commenting:
>>
>> Peer shared memory communication device (virtio-peer)
>>
>> General Overview
>>
>> (I recommend looking at the PDF for some clarifying pictures)
>>
>> The Virtio Peer shared memory communication device (virtio-peer) is a
>> virtual device which allows high-performance, low-latency guest-to-guest
>> communication. It uses a new queue extension feature tentatively called
>> VIRTIO_F_WINDOW which indicates that descriptor tables, available and
>> used rings and Queue Data reside in physical memory ranges called
>> Windows, each identified with a unique identifier called WindowID.
>
> So if I had to summarize the difference from regular virtio, I'd say the
> main one is that this uses window id + offset instead of the physical
> address.
>
> My question is - why do it?
>
> All windows are in memory space, are they not?
>
> How about the guest using full physical addresses, and the hypervisor
> sending the window physical address to VM2?
>
> VM2 can use that to find both window id and offset.
>
> This way at least VM1 can use regular virtio without changes.

What would be the value of having different drivers in VM1 and VM2, specifically if both run Linux?

Jan
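MST's alternative - VM1 uses full physical addresses, and VM2 recovers the WindowID and offset from a hypervisor-provided table of window base addresses - boils down to a range lookup. A minimal sketch of that lookup on the VM2 side; the struct layout and names are invented for illustration, not taken from the virtio-peer draft:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window table as the hypervisor might report it to VM2.
 * Field names and layout are illustrative, not part of any spec. */
struct shmem_window {
    uint32_t id;       /* WindowID from the virtio-peer draft */
    uint64_t base;     /* guest-physical base of the window in VM2 */
    uint64_t size;
};

/* Resolve a full guest-physical address (as VM1 would place it in a
 * descriptor) into a (WindowID, offset) pair on the VM2 side.
 * Returns 0 on success, -1 if the address falls into no window. */
static int resolve_window(const struct shmem_window *wins, size_t n,
                          uint64_t gpa, uint32_t *id, uint64_t *off)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= wins[i].base && gpa - wins[i].base < wins[i].size) {
            *id = wins[i].id;
            *off = gpa - wins[i].base;
            return 0;
        }
    }
    return -1;
}
```

With such a lookup in VM2's driver, VM1 could indeed run an unmodified virtio driver, which is the point of MST's suggestion.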
Re: rfc: vhost user enhancements for vm2vm communication
On 2015-09-03 10:08, Michael S. Tsirkin wrote: > On Tue, Sep 01, 2015 at 06:28:28PM +0200, Jan Kiszka wrote: >> On 2015-09-01 18:02, Michael S. Tsirkin wrote: >>> On Tue, Sep 01, 2015 at 05:34:37PM +0200, Jan Kiszka wrote: >>>> On 2015-09-01 16:34, Michael S. Tsirkin wrote: >>>>> On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote: >>>>>> On 2015-09-01 11:24, Michael S. Tsirkin wrote: >>>>>>> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote: >>>>>>>> On 2015-09-01 10:01, Michael S. Tsirkin wrote: >>>>>>>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote: >>>>>>>>>> Leaving all the implementation and interface details aside, this >>>>>>>>>> discussion is first of all about two fundamentally different >>>>>>>>>> approaches: >>>>>>>>>> static shared memory windows vs. dynamically remapped shared windows >>>>>>>>>> (a >>>>>>>>>> third one would be copying in the hypervisor, but I suppose we all >>>>>>>>>> agree >>>>>>>>>> that the whole exercise is about avoiding that). Which way do we >>>>>>>>>> want or >>>>>>>>>> have to go? >>>>>>>>>> >>>>>>>>>> Jan >>>>>>>>> >>>>>>>>> Dynamic is a superset of static: you can always make it static if you >>>>>>>>> wish. Static has the advantage of simplicity, but that's lost once you >>>>>>>>> realize you need to invent interfaces to make it work. Since we can >>>>>>>>> use >>>>>>>>> existing IOMMU interfaces for the dynamic one, what's the >>>>>>>>> disadvantage? >>>>>>>> >>>>>>>> Complexity. Having to emulate even more of an IOMMU in the hypervisor >>>>>>>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this >>>>>>>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that >>>>>>>> sense, generic grant tables would be more appealing. >>>>>>> >>>>>>> That's not how we do things for KVM, PV features need to be >>>>>>> modular and interchangeable with emulation. 
>>>>>> >>>>>> I know, and we may have to make some compromise for Jailhouse if that >>>>>> brings us valuable standardization and broad guest support. But we will >>>>>> surely not support an arbitrary amount of IOMMU models for that reason. >>>>>> >>>>>>> >>>>>>> If you just want something that's cross-platform and easy to >>>>>>> implement, just build a PV IOMMU. Maybe use virtio for this. >>>>>> >>>>>> That is likely required to keep the complexity manageable and to allow >>>>>> static preconfiguration. >>>>> >>>>> Real IOMMU allow static configuration just fine. This is exactly >>>>> what VFIO uses. >>>> >>>> Please specify more precisely which feature in which IOMMU you are >>>> referring to. Also, given that you refer to VFIO, I suspect we have >>>> different thing in mind. I'm talking about an IOMMU device model, like >>>> the one we have in QEMU now for VT-d. That one is not at all >>>> preconfigured by the host for VFIO. >>> >>> I really just mean that VFIO creates a mostly static IOMMU configuration. >>> >>> It's configured by the guest, not the host. >> >> OK, that resolves my confusion. >> >>> >>> I don't see host control over configuration as being particularly important. >> >> We do, see below. >> >>> >>> >>>>> >>>>>> Well, we could declare our virtio-shmem device to be an IOMMU device >>>>>> that controls access of a remote VM to RAM of the one that owns the >>>>>> device. In the static case, this access may at most be enabled/disabled >>>>>> but not moved around. The static regions would have to be discoverable >>>>>> for the VM (register read-back), and the guest's firmware will likely >>>>>> have to declare those ranges
Re: rfc: vhost user enhancements for vm2vm communication
On 2015-09-03 10:37, Michael S. Tsirkin wrote:
> On Thu, Sep 03, 2015 at 10:21:28AM +0200, Jan Kiszka wrote:
>> On 2015-09-03 10:08, Michael S. Tsirkin wrote:
>>> IOW if you wish, you actually can create a shared memory device,
>>> make it accessible to the IOMMU and place some or all data there.
>>
>> Actually, that could also be something more sophisticated, including
>> virtio-net, IF that device will be able to express its DMA window
>> restrictions (a bit like 32-bit PCI devices being restricted to <4G
>> addresses or ISA devices <1M).
>>
>> Jan
>
> Actually, it's the bus restriction, not the device restriction.
>
> So if you want to use bounce buffers in the name of security or
> real-time requirements, you should be able to do this if virtio uses
> the DMA API.

Bounce buffers would only be the simplest option (though fine for the low-rate traffic that we also have in mind, like virtual consoles). Given properly sized regions, even if fixed, and the right communication stacks, you can directly allocate application buffers in those regions and avoid most, if not all, copying.

In any case, if we manage to address this variation along with your proposal, that would help tremendously.

Jan
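The "allocate application buffers directly in the shared region" idea can be illustrated with a deliberately minimal bump allocator over a fixed, statically mapped window; this is only a sketch of the concept (no freeing, no concurrency), with all names invented here:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for a statically mapped shared window in which
 * application buffers are placed directly, instead of bouncing. */
struct shm_region {
    uint8_t *base;   /* start of the mapped shared window */
    size_t size;
    size_t next;     /* next free offset; this sketch never frees */
};

/* Carve a cache-line-aligned buffer out of the shared region.
 * Returns NULL when the region is exhausted, at which point a real
 * stack would fall back to bounce buffering. */
static void *shm_alloc(struct shm_region *r, size_t len)
{
    size_t aligned = (r->next + 63) & ~(size_t)63;
    if (aligned + len > r->size)
        return NULL;
    r->next = aligned + len;
    return r->base + aligned;
}
```

A real communication stack would add a free list and producer/consumer synchronization, but the zero-copy property comes solely from the buffers living inside the shared window from the start.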
Re: rfc: vhost user enhancements for vm2vm communication
On 2015-08-31 16:11, Michael S. Tsirkin wrote: > Hello! > During the KVM forum, we discussed supporting virtio on top > of ivshmem. No, not on top of ivshmem. On top of shared memory. Our model is different from the simplistic ivshmem. > I have considered it, and came up with an alternative > that has several advantages over that - please see below. > Comments welcome. > > - > > Existing solutions to userspace switching between VMs on the > same host are vhost-user and ivshmem. > > vhost-user works by mapping memory of all VMs being bridged into the > switch memory space. > > By comparison, ivshmem works by exposing a shared region of memory to all VMs. > VMs are required to use this region to store packets. The switch only > needs access to this region. > > Another difference between vhost-user and ivshmem surfaces when polling > is used. With vhost-user, the switch is required to handle > data movement between VMs, if using polling, this means that 1 host CPU > needs to be sacrificed for this task. > > This is easiest to understand when one of the VMs is > used with VF pass-through. This can be schematically shown below: > > +-- VM1 --++---VM2---+ > | virtio-pci +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- > NIC > +-++-+ > > > With ivshmem in theory communication can happen directly, with two VMs > polling the shared memory region. > > > I won't spend time listing advantages of vhost-user over ivshmem. > Instead, having identified two advantages of ivshmem over vhost-user, > below is a proposal to extend vhost-user to gain the advantages > of ivshmem. > > > 1: virtio in guest can be extended to allow support > for IOMMUs. This provides guest with full flexibility > about memory which is readable or write able by each device. 
> By setting up a virtio device for each other VM we need to > communicate to, guest gets full control of its security, from > mapping all memory (like with current vhost-user) to only > mapping buffers used for networking (like ivshmem) to > transient mappings for the duration of data transfer only. > This also allows use of VFIO within guests, for improved > security. > > vhost user would need to be extended to send the > mappings programmed by guest IOMMU. > > 2. qemu can be extended to serve as a vhost-user client: > remote VM mappings over the vhost-user protocol, and > map them into another VM's memory. > This mapping can take, for example, the form of > a BAR of a pci device, which I'll call here vhost-pci - > with bus address allowed > by VM1's IOMMU mappings being translated into > offsets within this BAR within VM2's physical > memory space. > > Since the translation can be a simple one, VM2 > can perform it within its vhost-pci device driver. > > While this setup would be the most useful with polling, > VM1's ioeventfd can also be mapped to > another VM2's irqfd, and vice versa, such that VMs > can trigger interrupts to each other without need > for a helper thread on the host. > > > The resulting channel might look something like the following: > > +-- VM1 --+ +---VM2---+ > | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC > +-+ +-+ > > comparing the two diagrams, a vhost-user thread on the host is > no longer required, reducing the host CPU utilization when > polling is active. At the same time, VM2 can not access all of VM1's > memory - it is limited by the iommu configuration setup by VM1. > > > Advantages over ivshmem: > > - more flexibility, endpoint VMs do not have to place data at any > specific locations to use the device, in practice this likely > means less data copies. 
> - better standardization/code reuse > virtio changes within guests would be fairly easy to implement > and would also benefit other backends, besides vhost-user > standard hotplug interfaces can be used to add and remove these > channels as VMs are added or removed. > - migration support > It's easy to implement since ownership of memory is well defined. > For example, during migration VM2 can notify hypervisor of VM1 > by updating dirty bitmap each time is writes into VM1 memory. > > Thanks, > This sounds like a different interface to a concept very similar to Xen's grant table, no? Well, there might be benefits for some use cases, for ours this is too dynamic, in fact. We'd like to avoid remappings during runtime controlled by guest activities, which is clearly required for this model. Another shortcoming: If VM1 does not trust (security or safety-wise) VM2 while preparing a message for it, it has to keep the buffer invisible for VM2 until it is completed and signed, hashed etc. That means it has to reprogram the IOMMU frequently. With the concept we discussed at KVM Forum, there would be shared memory mapped read-only to VM2 while being R/W for VM1. That would
Re: rfc: vhost user enhancements for vm2vm communication
On 2015-09-01 10:01, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>> Leaving all the implementation and interface details aside, this
>> discussion is first of all about two fundamentally different approaches:
>> static shared memory windows vs. dynamically remapped shared windows (a
>> third one would be copying in the hypervisor, but I suppose we all agree
>> that the whole exercise is about avoiding that). Which way do we want or
>> have to go?
>>
>> Jan
>
> Dynamic is a superset of static: you can always make it static if you
> wish. Static has the advantage of simplicity, but that's lost once you
> realize you need to invent interfaces to make it work. Since we can use
> existing IOMMU interfaces for the dynamic one, what's the disadvantage?

Complexity. Having to emulate even more of an IOMMU in the hypervisor (we already have to do a bit for VT-d IR in Jailhouse) and doing this per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that sense, generic grant tables would be more appealing.

But what we would actually need is an interface that is only *optionally* configured by a guest for dynamic scenarios, otherwise preconfigured by the hypervisor for static setups. And we need guests that support both. That's the challenge.

Jan
Re: rfc: vhost user enhancements for vm2vm communication
On 2015-09-01 18:02, Michael S. Tsirkin wrote: > On Tue, Sep 01, 2015 at 05:34:37PM +0200, Jan Kiszka wrote: >> On 2015-09-01 16:34, Michael S. Tsirkin wrote: >>> On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote: >>>> On 2015-09-01 11:24, Michael S. Tsirkin wrote: >>>>> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote: >>>>>> On 2015-09-01 10:01, Michael S. Tsirkin wrote: >>>>>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote: >>>>>>>> Leaving all the implementation and interface details aside, this >>>>>>>> discussion is first of all about two fundamentally different >>>>>>>> approaches: >>>>>>>> static shared memory windows vs. dynamically remapped shared windows (a >>>>>>>> third one would be copying in the hypervisor, but I suppose we all >>>>>>>> agree >>>>>>>> that the whole exercise is about avoiding that). Which way do we want >>>>>>>> or >>>>>>>> have to go? >>>>>>>> >>>>>>>> Jan >>>>>>> >>>>>>> Dynamic is a superset of static: you can always make it static if you >>>>>>> wish. Static has the advantage of simplicity, but that's lost once you >>>>>>> realize you need to invent interfaces to make it work. Since we can use >>>>>>> existing IOMMU interfaces for the dynamic one, what's the disadvantage? >>>>>> >>>>>> Complexity. Having to emulate even more of an IOMMU in the hypervisor >>>>>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this >>>>>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that >>>>>> sense, generic grant tables would be more appealing. >>>>> >>>>> That's not how we do things for KVM, PV features need to be >>>>> modular and interchangeable with emulation. >>>> >>>> I know, and we may have to make some compromise for Jailhouse if that >>>> brings us valuable standardization and broad guest support. But we will >>>> surely not support an arbitrary amount of IOMMU models for that reason. 
>>>> >>>>> >>>>> If you just want something that's cross-platform and easy to >>>>> implement, just build a PV IOMMU. Maybe use virtio for this. >>>> >>>> That is likely required to keep the complexity manageable and to allow >>>> static preconfiguration. >>> >>> Real IOMMU allow static configuration just fine. This is exactly >>> what VFIO uses. >> >> Please specify more precisely which feature in which IOMMU you are >> referring to. Also, given that you refer to VFIO, I suspect we have >> different thing in mind. I'm talking about an IOMMU device model, like >> the one we have in QEMU now for VT-d. That one is not at all >> preconfigured by the host for VFIO. > > I really just mean that VFIO creates a mostly static IOMMU configuration. > > It's configured by the guest, not the host. OK, that resolves my confusion. > > I don't see host control over configuration as being particularly important. We do, see below. > > >>> >>>> Well, we could declare our virtio-shmem device to be an IOMMU device >>>> that controls access of a remote VM to RAM of the one that owns the >>>> device. In the static case, this access may at most be enabled/disabled >>>> but not moved around. The static regions would have to be discoverable >>>> for the VM (register read-back), and the guest's firmware will likely >>>> have to declare those ranges reserved to the guest OS. >>>> In the dynamic case, the guest would be able to create an alternative >>>> mapping. >>> >>> >>> I don't think we want a special device just to support the >>> static case. It might be a bit less code to write, but >>> eventually it should be up to the guest. >>> Fundamentally, it's policy that host has no business >>> dictating. >> >> "A bit less" is to be validated, and I doubt its just "a bit". But if >> KVM and its guests will also support some PV-IOMMU that we can reuse for >> our scenarios, than that is fine. KVM would not have to mandate support >> for it while we would, that's all. 
> > Someone will have to do this work. > >
Re: rfc: vhost user enhancements for vm2vm communication
On 2015-09-01 11:24, Michael S. Tsirkin wrote: > On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote: >> On 2015-09-01 10:01, Michael S. Tsirkin wrote: >>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote: >>>> Leaving all the implementation and interface details aside, this >>>> discussion is first of all about two fundamentally different approaches: >>>> static shared memory windows vs. dynamically remapped shared windows (a >>>> third one would be copying in the hypervisor, but I suppose we all agree >>>> that the whole exercise is about avoiding that). Which way do we want or >>>> have to go? >>>> >>>> Jan >>> >>> Dynamic is a superset of static: you can always make it static if you >>> wish. Static has the advantage of simplicity, but that's lost once you >>> realize you need to invent interfaces to make it work. Since we can use >>> existing IOMMU interfaces for the dynamic one, what's the disadvantage? >> >> Complexity. Having to emulate even more of an IOMMU in the hypervisor >> (we already have to do a bit for VT-d IR in Jailhouse) and doing this >> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that >> sense, generic grant tables would be more appealing. > > That's not how we do things for KVM, PV features need to be > modular and interchangeable with emulation. I know, and we may have to make some compromise for Jailhouse if that brings us valuable standardization and broad guest support. But we will surely not support an arbitrary amount of IOMMU models for that reason. > > If you just want something that's cross-platform and easy to > implement, just build a PV IOMMU. Maybe use virtio for this. That is likely required to keep the complexity manageable and to allow static preconfiguration. Well, we could declare our virtio-shmem device to be an IOMMU device that controls access of a remote VM to RAM of the one that owns the device. In the static case, this access may at most be enabled/disabled but not moved around. 
The static regions would have to be discoverable for the VM (register read-back), and the guest's firmware will likely have to declare those ranges reserved to the guest OS. In the dynamic case, the guest would be able to create an alternative mapping. We would probably have to define a generic page table structure for that. Or do you rather have some MPU-like control structure in mind, more similar to the memory region descriptions vhost is already using?

Also not yet clear to me is how the vhost-pci device and the translations it will have to do should look for VM2.

>> But what we would actually need is an interface that is only
>> *optionally* configured by a guest for dynamic scenarios, otherwise
>> preconfigured by the hypervisor for static setups. And we need guests
>> that support both. That's the challenge.
>>
>> Jan
>
> That's already there for IOMMUs: vfio does the static setup by default,
> enabling iommu by guests is optional.

Cannot yet follow how vfio comes into play regarding some preconfigured virtual IOMMU.

Jan
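The "register read-back" discovery mentioned above can be sketched as a device register model in which, in static mode, writes from the guest are dropped and reads return the hypervisor-fixed values; the register names and the `static_mode` switch are purely illustrative assumptions, not an existing interface:

```c
#include <stdint.h>

/* Hypothetical region registers of a virtio-shmem-style device.
 * In static mode the hypervisor fixes base/size; the guest discovers
 * them by reading back whatever it tried to write. */
struct shmem_regs {
    uint64_t region_base;
    uint64_t region_size;
    int static_mode;      /* set by the hypervisor, read-only to guest */
};

/* Guest-visible write to the base register. */
static void reg_write_base(struct shmem_regs *r, uint64_t val)
{
    if (!r->static_mode)
        r->region_base = val;  /* dynamic case: guest may remap */
    /* static case: write is dropped; a subsequent read returns the
     * preconfigured value, which is how the guest detects the region */
}
```

The same pattern covers the enable/disable-only semantics of the static case: the guest probes by writing, and the read-back tells it what the hypervisor actually permits.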
Re: rfc: vhost user enhancements for vm2vm communication
On 2015-09-01 16:34, Michael S. Tsirkin wrote: > On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote: >> On 2015-09-01 11:24, Michael S. Tsirkin wrote: >>> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote: >>>> On 2015-09-01 10:01, Michael S. Tsirkin wrote: >>>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote: >>>>>> Leaving all the implementation and interface details aside, this >>>>>> discussion is first of all about two fundamentally different approaches: >>>>>> static shared memory windows vs. dynamically remapped shared windows (a >>>>>> third one would be copying in the hypervisor, but I suppose we all agree >>>>>> that the whole exercise is about avoiding that). Which way do we want or >>>>>> have to go? >>>>>> >>>>>> Jan >>>>> >>>>> Dynamic is a superset of static: you can always make it static if you >>>>> wish. Static has the advantage of simplicity, but that's lost once you >>>>> realize you need to invent interfaces to make it work. Since we can use >>>>> existing IOMMU interfaces for the dynamic one, what's the disadvantage? >>>> >>>> Complexity. Having to emulate even more of an IOMMU in the hypervisor >>>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this >>>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that >>>> sense, generic grant tables would be more appealing. >>> >>> That's not how we do things for KVM, PV features need to be >>> modular and interchangeable with emulation. >> >> I know, and we may have to make some compromise for Jailhouse if that >> brings us valuable standardization and broad guest support. But we will >> surely not support an arbitrary amount of IOMMU models for that reason. >> >>> >>> If you just want something that's cross-platform and easy to >>> implement, just build a PV IOMMU. Maybe use virtio for this. >> >> That is likely required to keep the complexity manageable and to allow >> static preconfiguration. 
> > Real IOMMU allow static configuration just fine. This is exactly > what VFIO uses. Please specify more precisely which feature in which IOMMU you are referring to. Also, given that you refer to VFIO, I suspect we have different thing in mind. I'm talking about an IOMMU device model, like the one we have in QEMU now for VT-d. That one is not at all preconfigured by the host for VFIO. > >> Well, we could declare our virtio-shmem device to be an IOMMU device >> that controls access of a remote VM to RAM of the one that owns the >> device. In the static case, this access may at most be enabled/disabled >> but not moved around. The static regions would have to be discoverable >> for the VM (register read-back), and the guest's firmware will likely >> have to declare those ranges reserved to the guest OS. >> In the dynamic case, the guest would be able to create an alternative >> mapping. > > > I don't think we want a special device just to support the > static case. It might be a bit less code to write, but > eventually it should be up to the guest. > Fundamentally, it's policy that host has no business > dictating. "A bit less" is to be validated, and I doubt its just "a bit". But if KVM and its guests will also support some PV-IOMMU that we can reuse for our scenarios, than that is fine. KVM would not have to mandate support for it while we would, that's all. > >> We would probably have to define a generic page table structure >> for that. Or do you rather have some MPU-like control structure in mind, >> more similar to the memory region descriptions vhost is already using? > > I don't care much. Page tables use less memory if a lot of memory needs > to be covered. OTOH if you want to use virtio (e.g. to allow command > batching) that likely means commands to manipulate the IOMMU, and > maintaining it all on the host. You decide. I don't care very much about the dynamic case as we won't support it anyway. 
However, if the configuration concept used for it is applicable to static mode as well, then we could reuse it. But preconfiguration will require a register-based region description, I suspect.

>> Also not yet clear to me is how the vhost-pci device and the
>> translations it will have to do should look for VM2.
>
> I think we can use vhost-pci BAR + VM1 bus address as the
> VM2 physical address. In other words, all memory exposed to
> virtio-pci by VM1 through its IOMMU is
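The "simple translation" MST describes - VM1 bus addresses permitted by VM1's IOMMU showing up as offsets inside VM2's vhost-pci BAR - amounts to offset arithmetic over the mapping table that vhost-user would have communicated. A hedged sketch, with all struct and field names invented for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* One IOMMU-allowed range of VM1, as the vhost-user protocol might
 * have reported it to the vhost-pci device model (names illustrative). */
struct vhost_pci_map {
    uint64_t bus_addr;   /* start of the allowed range in VM1 bus space */
    uint64_t bar_off;    /* corresponding offset within the BAR */
    uint64_t len;
};

/* Translate a VM1 bus address into a VM2 physical address inside the
 * vhost-pci BAR. Returns -1 if VM1's IOMMU does not allow the address,
 * which is exactly how VM2's access stays limited to what VM1 mapped. */
static int bus_to_vm2_phys(const struct vhost_pci_map *m, size_t n,
                           uint64_t bar_base, uint64_t bus_addr,
                           uint64_t *vm2_phys)
{
    for (size_t i = 0; i < n; i++) {
        if (bus_addr >= m[i].bus_addr &&
            bus_addr - m[i].bus_addr < m[i].len) {
            *vm2_phys = bar_base + m[i].bar_off +
                        (bus_addr - m[i].bus_addr);
            return 0;
        }
    }
    return -1;
}
```

As MST notes, this is cheap enough that VM2 can perform it inside its vhost-pci driver without hypervisor involvement on the data path.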
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-29 01:21, Benjamin Herrenschmidt wrote:
> On Tue, 2015-07-28 at 15:43 -0700, Andy Lutomirski wrote:
>> New QEMU always advertises this feature flag. If iommu=on, QEMU's
>> virtio devices refuse to work unless the driver acknowledges the flag.
>
> This should be configurable.

Advertisement of that flag must be configurable, or we can no longer run older guests, which don't know the flag and would thus reject the device. The only precondition: there must be no IOMMU if we turn it off.

Jan
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-29 10:17, Paolo Bonzini wrote:

On 29/07/2015 02:47, Andy Lutomirski wrote:

If new kernels ignore the IOMMU for devices that don't set the flag and there are physical devices that already exist and don't set the flag, then those devices won't work reliably on most modern non-virtual platforms, PPC included.

Are there many virtio physical devices out there? We are talking about a virtio flag, right? Or have you been considering something else?

Yes, virtio flag. I dislike having a virtio flag at all, but so far no one has come up with any better ideas. If there was a reliable, cross-platform mechanism for per-device PCI bus properties, I'd be all for using that instead.

No, a virtio flag doesn't make sense. That would create the risk of subtly breaking old guests over new setups. I wouldn't suggest this.

Jan
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-28 15:06, Michael S. Tsirkin wrote:

On Tue, Jul 28, 2015 at 02:46:20PM +0200, Paolo Bonzini wrote:

On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:

That is an experimental feature (it's x-iommu), so it can change. The plan was:
- for PPC, virtio never honors the IOMMU
- for non-PPC, either have virtio always honor the IOMMU, or enforce that virtio is not under the IOMMU.

I dislike having PPC special-cased. In fact, today x86 guests also assume that virtio bypasses the IOMMU, I believe. In fact *all* guests do.

This doesn't matter much, since the only guests that implement an IOMMU in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind of stability.

Hmm, I think Jan (cc) said it was already used out there.

Yes, no known issues with vt-d emulation for almost a year now. Error reporting could be improved, and interrupt remapping is still missing, but those are minor issues in this context.

In my testing setups, I also have virtio devices in use, passed through to an L2 guest, but only in 1:1 mapping so that their broken IOMMU support causes no practical problems.

Jan
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-28 19:15, Paolo Bonzini wrote:

On 28/07/2015 18:42, Jan Kiszka wrote:

On the other hand, interrupt remapping is absolutely necessary for production use, hence my point that x86 does not promise API stability.

Well, we currently implement the features that the Q35 used to expose. Adding interrupt remapping will require a new chipset and/or a hack switch to ignore compatibility.

Isn't the VT-d register space separate from other Q35 features and backwards-compatible? You could even add it to PIIX in theory just by adding a DMAR.

Yes, it's practically working, but it's not accurate w.r.t. how that hardware looked in reality. It's not like, for example, SMRAM, where the registers are in the northbridge configuration space and move around in every chipset generation.

(Any kind of stability actually didn't include crashes; those are not expected :))

The Google patches for userspace PIC and IOAPIC are proceeding well, so hopefully we can have interrupt remapping soon.

If the day had 48 hours... I'd love to look into this, first adding QEMU support for the new irqchip architecture. I hope I can squeeze in some time for that...

Google also had an intern that was looking at it.

Great!

Jan
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-28 18:36, Paolo Bonzini wrote:

On 28/07/2015 15:11, Jan Kiszka wrote:

This doesn't matter much, since the only guests that implement an IOMMU in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind of stability.

Hmm, I think Jan (cc) said it was already used out there.

Yes, no known issues with vt-d emulation for almost a year now. Error reporting could be improved, and interrupt remapping is still missing, but those are minor issues in this context.

On the other hand, interrupt remapping is absolutely necessary for production use, hence my point that x86 does not promise API stability.

Well, we currently implement the features that the Q35 used to expose. Adding interrupt remapping will require a new chipset and/or a hack switch to ignore compatibility.

(Any kind of stability actually didn't include crashes; those are not expected :))

The Google patches for userspace PIC and IOAPIC are proceeding well, so hopefully we can have interrupt remapping soon.

If the day had 48 hours... I'd love to look into this, first adding QEMU support for the new irqchip architecture.

Jan
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-28 19:10, Andy Lutomirski wrote: On Tue, Jul 28, 2015 at 9:44 AM, Jan Kiszka jan.kis...@siemens.com wrote: The ability to have virtio on systems with IOMMU in place makes testing much more efficient for us. Ideally, we would have it in non-identity mapping scenarios as well, e.g. to start secondary Linux instances in the test VMs, giving them their own virtio devices. And we will eventually have this need on ARM as well. Virtio needs to be backward compatible, so the change to put these devices under IOMMU control could be advertised during feature negotiations and controlled on QEMU side via a device property. Newer guest drivers would have to acknowledge that they support virtio via IOMMUs. Older ones would refuse to work, and the admin could instead spawn VMs with this feature disabled. The trouble is that this is really a property of the bus and not of the device. If you build a virtio device that physically plugs into a PCIe slot, the device has no concept of an IOMMU in the first place. If one were to build a real virtio device today, it would be broken because every IOMMU would start to translate its requests. Already from that POV, we really need to introduce a feature flag "I will be IOMMU-translated" so that a potential physical implementation can carry it unconditionally. Similarly, if you take an L0-provided IOMMU-supporting device and pass it through to L2 using current QEMU on L1 (with Q35 emulation and iommu enabled), then, from L2's perspective, the device is 1:1 no matter what the device thinks. IOW, I think the original design was wrong and now we have to deal with it. I think the best solution would be to teach QEMU to fix its ACPI tables so that 1:1 virtio devices are actually exposed as 1:1. Only the current drivers are broken. And we can easily tell them apart from newer ones via feature flags. Sorry, don't get the problem.
Jan
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-28 18:11, Andy Lutomirski wrote: On Jul 28, 2015 6:11 AM, Jan Kiszka jan.kis...@siemens.com wrote: On 2015-07-28 15:06, Michael S. Tsirkin wrote: On Tue, Jul 28, 2015 at 02:46:20PM +0200, Paolo Bonzini wrote: On 28/07/2015 12:12, Benjamin Herrenschmidt wrote: That is an experimental feature (it's x-iommu), so it can change. The plan was: - for PPC, virtio never honors IOMMU - for non-PPC, either have virtio always honor IOMMU, or enforce that virtio is not under IOMMU. I dislike having PPC special cased. In fact, today x86 guests also assume that virtio bypasses IOMMU I believe. In fact *all* guests do. This doesn't matter much, since the only guests that implement an IOMMU in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind of stability. Hmm I think Jan (cc) said it was already used out there. Yes, no known issues with vt-d emulation for almost a year now. Error reporting could be improved, and interrupt remapping is still missing, but those are minor issues in this context. In my testing setups, I also have virtio devices in use, passed through to an L2 guest, but only in 1:1 mapping so that their broken IOMMU support causes no practical problems. How are you getting 1:1 to work? Is it something that L0 QEMU can advertise to L1? If so, can we just do that unconditionally, which would make my patch work? The guest hypervisor is Jailhouse and the guest is the root cell that loaded the hypervisor, thus continues with identity mappings. You usually don't have 1:1 mapping with other setups - maybe with some Xen configuration? Dunno. I have no objection to 1:1 devices in general. It's only devices that the PCI code on the guest identifies as not 1:1 but that are nonetheless 1:1 that cause problems. The ability to have virtio on systems with IOMMU in place makes testing much more efficient for us. Ideally, we would have it in non-identity mapping scenarios as well, e.g. 
to start secondary Linux instances in the test VMs, giving them their own virtio devices. And we will eventually have this need on ARM as well. Virtio needs to be backward compatible, so the change to put these devices under IOMMU control could be advertised during feature negotiations and controlled on QEMU side via a device property. Newer guest drivers would have to acknowledge that they support virtio via IOMMUs. Older ones would refuse to work, and the admin could instead spawn VMs with this feature disabled. Jan
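The negotiation scheme proposed here can be sketched as a device-side check (a hedged illustration: the flag name and the helper below are made up for this sketch, although a feature bit with exactly this meaning was later standardized as VIRTIO_F_IOMMU_PLATFORM, bit 33):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Illustrative feature bit: "this device's DMA is translated by the
 * IOMMU".  A flag with this semantic was later standardized as
 * VIRTIO_F_IOMMU_PLATFORM (bit 33); name and check here are a sketch,
 * not taken from the spec. */
#define F_IOMMU_TRANSLATED (1ULL << 33)

/* Device-side check at feature-negotiation time: a device that sits
 * behind an emulated IOMMU must refuse drivers that did not
 * acknowledge the flag, instead of silently corrupting DMA. */
bool device_accepts_features(uint64_t offered, uint64_t acked,
                             bool behind_iommu)
{
    if (acked & ~offered)
        return false;   /* driver acked a feature we never offered */
    if (behind_iommu && !(acked & F_IOMMU_TRANSLATED))
        return false;   /* old driver would bypass translation */
    return true;
}
```

An old driver (ack mask without the flag) simply fails negotiation on an IOMMU-protected device, which is the "refuse to work" behavior described above.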
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-28 20:22, Andy Lutomirski wrote: On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka jan.kis...@siemens.com wrote: On 2015-07-28 19:10, Andy Lutomirski wrote: The trouble is that this is really a property of the bus and not of the device. If you build a virtio device that physically plugs into a PCIe slot, the device has no concept of an IOMMU in the first place. If one would build a real virtio device today, it would be broken because every IOMMU would start to translate its requests. Already from that POV, we really need to introduce a feature flag I will be IOMMU-translated so that a potential physical implementation can carry it unconditionally. Except that, with my patches, it would work correctly. ISTM the thing I haven't looked at your patches yet - they make the virtio PCI driver in Linux IOMMU-compatible? Perfect - except for a compatibility check, right? that's broken right now is QEMU and the virtio_pci driver. My patches fix the driver. Last year that would have been the end of the story except for PPC. Now we have to deal with QEMU. Similarly, if you take an L0-provided IOMMU-supporting device and pass it through to L2 using current QEMU on L1 (with Q35 emulation and iommu enabled), then, from L2's perspective, the device is 1:1 no matter what the device thinks. IOW, I think the original design was wrong and now we have to deal with it. I think the best solution would be to teach QEMU to fix its ACPI tables so that 1:1 virtio devices are actually exposed as 1:1. Only the current drivers are broken. And we can easily tell them apart from newer ones via feature flags. Sorry, don't get the problem. I still don't see how feature flags solve the problem. Suppose we added a feature flag meaning respects IOMMU. Bad case 1: Build a malicious device that advertises non-IOMMU-respecting virtio. Plug it in behind an IOMMU. Host starts leaking physical addresses to the device (and the device doesn't work, of course). 
Maybe that's only barely a security problem, but still... I don't see right now how critical such a hypothetical case could be. But the OS / its drivers could still decide to refuse talking to such a device. Bad case 2: Use current QEMU w/ IOMMU enabled. Assign a virtio device provided by L0 QEMU to L2. L1 crashes. I consider *that* to be a security problem, although in practice no one will configure their system that way because it has zero chance of actually working. Nonetheless, the device does work if L1 accesses it directly? The issue is vfio doesn't notice that the device doesn't respect the IOMMU because respects-IOMMU is a property of the PCI bus and the platform IOMMU, and vfio assumes it works correctly. I would have no problem with rejecting configurations in future QEMU that try to expose unconfined virtio devices in the presence of IOMMU emulation. Once we can do better, it's just about letting the guest know about the difference. The current situation is indeed just broken, we don't need to discuss this as we can't change history to prevent this. Bad case 2: Some hypothetical well-behaved new QEMU provides a virtio device that *does* respect the IOMMU and sets the feature flag. They emulate Q35 with an IOMMU. They boot Linux 4.1. Data corruption in the guest. No. In that case, the feature negotiation of virtio-with-iommu-support would have failed for older drivers, and the device would have never been used by the guest. We could make the rule that *all* virtio-pci devices (except on PPC) respect the bus rules. We'd have to fix QEMU so that virtio devices on Q35 iommu=on systems set up a PCI topology where the devices *aren't* behind the IOMMU or are protected by RMRRs or whatever. Then old kernels would work correctly on new hosts, new kernels would work correctly except on old iommu-providing hosts, and Xen would work. 
I don't see a point in doing anything about old QEMU with IOMMU enabled and virtio devices plugged except declaring such setups broken. No one should have configured this for production purposes, only for test setups (like us, with knowledge of the limitations). In fact, on Xen, it's impossible without colossal hacks to support non-IOMMU-respecting virtio devices because Xen acts as an intermediate IOMMU between the Linux dom0 guest and the actual host. The QEMU host doesn't even know that Xen is involved. This is why Xen and virtio don't currently work together (without my patches): the device thinks it doesn't respect the IOMMU, the driver thinks the device doesn't respect the IOMMU, and they're both wrong. TL;DR: I think there are only two cases. Either a device respects the IOMMU or a device doesn't know whether it respects the IOMMU. The latter case is problematic. See above, the latter is only problematic on setups that actually use an IOMMU. If that includes Xen, then no one should use it until virtio can declare itself IOMMU
Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On 2015-07-28 21:24, Andy Lutomirski wrote: On Tue, Jul 28, 2015 at 12:06 PM, Jan Kiszka jan.kis...@siemens.com wrote: On 2015-07-28 20:22, Andy Lutomirski wrote: On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka jan.kis...@siemens.com wrote: On 2015-07-28 19:10, Andy Lutomirski wrote: The trouble is that this is really a property of the bus and not of the device. If you build a virtio device that physically plugs into a PCIe slot, the device has no concept of an IOMMU in the first place. If one would build a real virtio device today, it would be broken because every IOMMU would start to translate its requests. Already from that POV, we really need to introduce a feature flag I will be IOMMU-translated so that a potential physical implementation can carry it unconditionally. Except that, with my patches, it would work correctly. ISTM the thing I haven't looked at your patches yet - they make the virtio PCI driver in Linux IOMMU-compatible? Perfect - except for a compatibility check, right? Yes. (virtio_pci_legacy, anyway. Presumably virtio_pci_modern is easy to adapt, too.) that's broken right now is QEMU and the virtio_pci driver. My patches fix the driver. Last year that would have been the end of the story except for PPC. Now we have to deal with QEMU. Similarly, if you take an L0-provided IOMMU-supporting device and pass it through to L2 using current QEMU on L1 (with Q35 emulation and iommu enabled), then, from L2's perspective, the device is 1:1 no matter what the device thinks. IOW, I think the original design was wrong and now we have to deal with it. I think the best solution would be to teach QEMU to fix its ACPI tables so that 1:1 virtio devices are actually exposed as 1:1. Only the current drivers are broken. And we can easily tell them apart from newer ones via feature flags. Sorry, don't get the problem. I still don't see how feature flags solve the problem. Suppose we added a feature flag meaning respects IOMMU. 
Bad case 1: Build a malicious device that advertises non-IOMMU-respecting virtio. Plug it in behind an IOMMU. Host starts leaking physical addresses to the device (and the device doesn't work, of course). Maybe that's only barely a security problem, but still... I don't see right now how critical such a hypothetical case could be. But the OS / its drivers could still decide to refuse talking to such a device. How does the OS know it's such a device as opposed to a QEMU-supplied thing? It can restrict itself to virtio devices exposing the feature if it feels uncomfortable that it might be talking to some evil piece of silicon (instead of the hypervisor, which has to be trusted anyway). Bad case 2: Some hypothetical well-behaved new QEMU provides a virtio device that *does* respect the IOMMU and sets the feature flag. They emulate Q35 with an IOMMU. They boot Linux 4.1. Data corruption in the guest. No. In that case, the feature negotiation of virtio-with-iommu-support would have failed for older drivers, and the device would have never been used by the guest. So are you suggesting that newer virtio devices always provide this feature flag and, if supplied by QEMU with iommu=on, simply refuse to operate if the driver doesn't support that flag? Exactly. That could work as long as QEMU with the current (broken?) iommu=on never exposes such a device. QEMU would have to be adjusted first so that all its virtio-pci device models take IOMMUs into account - if they exist or not. Only then it could expose the feature and expect the guest to acknowledge it. For compat reasons, QEMU should still be able to expose virtio devices without the flag set - but then without any IOMMU emulation enabled as well. That would prevent the current setup we are using today, but it's trivial to update the guest kernel to a newer virtio driver which would restore our scenario again. We could make the rule that *all* virtio-pci devices (except on PPC) respect the bus rules.
We'd have to fix QEMU so that virtio devices on Q35 iommu=on systems set up a PCI topology where the devices *aren't* behind the IOMMU or are protected by RMRRs or whatever. Then old kernels would work correctly on new hosts, new kernels would work correctly except on old iommu-providing hosts, and Xen would work. I don't see a point in doing anything about old QEMU with IOMMU enabled and virtio devices plugged except declaring such setups broken. No one should have configured this for production purposes, only for test setups (like us, with knowledge of the limitations). I'm fine with that. In fact, I proposed these patches before QEMU had this feature in the first place. In fact, on Xen, it's impossible without colossal hacks to support non-IOMMU-respecting virtio devices because Xen acts as an intermediate IOMMU between the Linux dom0 guest and the actual host. The QEMU host doesn't even know that Xen is involved
Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 2015-04-27 at 14:35, Jan Kiszka wrote: On 2015-04-27 at 12:17, Stefan Hajnoczi wrote: On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie l...@snabb.co wrote: On 24 April 2015 at 15:22, Stefan Hajnoczi stefa...@gmail.com wrote: The motivation for making VM-to-VM fast is that while software switches on the host are efficient today (thanks to vhost-user), there is no efficient solution if the software switch is a VM. I see. This sounds like a noble goal indeed. I would love to run the software switch as just another VM in the long term. It would make it much easier for the various software switches to coexist in the world. The main technical risk I see in this proposal is that eliminating the memory copies might not have the desired effect. I might be tempted to keep the copies but prevent the kernel from having to inspect the vrings (more like vhost-user). But that is just a hunch and I suppose the first step would be a prototype to check the performance anyway. For what it is worth here is my view of networking performance on x86 in the Haswell+ era: https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow Thanks. I've been thinking about how to eliminate the VM <-> host <-> VM switching and instead achieve just VM <-> VM. The holy grail of VM-to-VM networking is an exitless I/O path. In other words, packets can be transferred between VMs without any vmexits (this requires a polling driver). Here is how it works. QEMU gets -device vhost-user so that a VM can act as the vhost-user server: VM1 (virtio-net guest driver) <-> VM2 (vhost-user device) VM1 has a regular virtio-net PCI device. VM2 has a vhost-user device and plays the host role instead of the normal virtio-net guest driver role. The ugly thing about this is that VM2 needs to map all of VM1's guest RAM so it can access the vrings and packet data.
The solution to this is something like the Shared Buffers BAR but this time it contains not just the packet data but also the vring, let's call it the Shared Virtqueues BAR. The Shared Virtqueues BAR eliminates the need for vhost-net on the host because VM1 and VM2 communicate directly using virtqueue notify or polling vring memory. Virtqueue notify works by connecting an eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have an ioeventfd that is irqfd for VM1 to signal completions. We had such a discussion before: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658 Would be great to get this ball rolling again. Jan But one challenge would remain even then (unless both sides only poll): exit-free inter-VM signaling, no? But that's a hardware issue first of all. Jan
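The ioeventfd/irqfd coupling described above can be sketched in host userspace (a minimal single-process model under the assumption that the same eventfd a virtqueue notify kicks in VM1 is the one injected into VM2; in reality the fd would be registered with KVM as ioeventfd on one side and irqfd on the other):

```c
#include <stdint.h>
#include <sys/eventfd.h>

/* Create the shared notification fd; nonblocking so an empty counter
 * can be observed instead of blocking. */
int make_notify_fd(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* VM1 side: what a virtqueue notify would trigger via ioeventfd. */
int kick(int fd)
{
    return eventfd_write(fd, 1);
}

/* VM2 side: what would arrive as an irqfd-injected interrupt.  The
 * eventfd counter accumulates kicks between reads; returns 0 if no
 * kick is pending. */
uint64_t receive(int fd)
{
    uint64_t val = 0;
    if (eventfd_read(fd, &val) < 0)
        return 0;
    return val;
}
```

Note how coalescing falls out for free: two kicks before a read are delivered as a single wakeup with a count of 2, which is why polling-friendly drivers tolerate this path well.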
Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 2015-04-27 at 12:17, Stefan Hajnoczi wrote: On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie l...@snabb.co wrote: On 24 April 2015 at 15:22, Stefan Hajnoczi stefa...@gmail.com wrote: The motivation for making VM-to-VM fast is that while software switches on the host are efficient today (thanks to vhost-user), there is no efficient solution if the software switch is a VM. I see. This sounds like a noble goal indeed. I would love to run the software switch as just another VM in the long term. It would make it much easier for the various software switches to coexist in the world. The main technical risk I see in this proposal is that eliminating the memory copies might not have the desired effect. I might be tempted to keep the copies but prevent the kernel from having to inspect the vrings (more like vhost-user). But that is just a hunch and I suppose the first step would be a prototype to check the performance anyway. For what it is worth here is my view of networking performance on x86 in the Haswell+ era: https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow Thanks. I've been thinking about how to eliminate the VM <-> host <-> VM switching and instead achieve just VM <-> VM. The holy grail of VM-to-VM networking is an exitless I/O path. In other words, packets can be transferred between VMs without any vmexits (this requires a polling driver). Here is how it works. QEMU gets -device vhost-user so that a VM can act as the vhost-user server: VM1 (virtio-net guest driver) <-> VM2 (vhost-user device) VM1 has a regular virtio-net PCI device. VM2 has a vhost-user device and plays the host role instead of the normal virtio-net guest driver role. The ugly thing about this is that VM2 needs to map all of VM1's guest RAM so it can access the vrings and packet data. The solution to this is something like the Shared Buffers BAR but this time it contains not just the packet data but also the vring, let's call it the Shared Virtqueues BAR.
The Shared Virtqueues BAR eliminates the need for vhost-net on the host because VM1 and VM2 communicate directly using virtqueue notify or polling vring memory. Virtqueue notify works by connecting an eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have an ioeventfd that is irqfd for VM1 to signal completions. We had such a discussion before: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658 Would be great to get this ball rolling again. Jan
Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 2015-04-27 at 15:01, Stefan Hajnoczi wrote: On Mon, Apr 27, 2015 at 1:55 PM, Jan Kiszka jan.kis...@siemens.com wrote: On 2015-04-27 at 14:35, Jan Kiszka wrote: On 2015-04-27 at 12:17, Stefan Hajnoczi wrote: On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie l...@snabb.co wrote: On 24 April 2015 at 15:22, Stefan Hajnoczi stefa...@gmail.com wrote: The motivation for making VM-to-VM fast is that while software switches on the host are efficient today (thanks to vhost-user), there is no efficient solution if the software switch is a VM. I see. This sounds like a noble goal indeed. I would love to run the software switch as just another VM in the long term. It would make it much easier for the various software switches to coexist in the world. The main technical risk I see in this proposal is that eliminating the memory copies might not have the desired effect. I might be tempted to keep the copies but prevent the kernel from having to inspect the vrings (more like vhost-user). But that is just a hunch and I suppose the first step would be a prototype to check the performance anyway. For what it is worth here is my view of networking performance on x86 in the Haswell+ era: https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow Thanks. I've been thinking about how to eliminate the VM <-> host <-> VM switching and instead achieve just VM <-> VM. The holy grail of VM-to-VM networking is an exitless I/O path. In other words, packets can be transferred between VMs without any vmexits (this requires a polling driver). Here is how it works. QEMU gets -device vhost-user so that a VM can act as the vhost-user server: VM1 (virtio-net guest driver) <-> VM2 (vhost-user device) VM1 has a regular virtio-net PCI device. VM2 has a vhost-user device and plays the host role instead of the normal virtio-net guest driver role. The ugly thing about this is that VM2 needs to map all of VM1's guest RAM so it can access the vrings and packet data.
The solution to this is something like the Shared Buffers BAR but this time it contains not just the packet data but also the vring, let's call it the Shared Virtqueues BAR. The Shared Virtqueues BAR eliminates the need for vhost-net on the host because VM1 and VM2 communicate directly using virtqueue notify or polling vring memory. Virtqueue notify works by connecting an eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have an ioeventfd that is irqfd for VM1 to signal completions. We had such a discussion before: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658 Would be great to get this ball rolling again. Jan But one challenge would remain even then (unless both sides only poll): exit-free inter-VM signaling, no? But that's a hardware issue first of all. To start with ioeventfd-irqfd can be used. It incurs a light-weight exit in VM1 and interrupt injection in VM2. For networking the cost is mitigated by NAPI drivers which switch between interrupts and polling. During notification-heavy periods the guests would use polling anyway. A hardware solution would be some kind of inter-guest interrupt injection. I don't know VMX well enough to know whether that is possible on Intel CPUs. Today, we have posted interrupts to avoid the vm-exit on the target CPU, but there is nothing yet (to the best of my knowledge) to avoid the exit on the sender side (unless we ignore security). That's the same problem with intra-guest IPIs, BTW. For throughput and given NAPI patterns, that's probably not an issue as you noted. It may be for latency, though, when almost every cycle counts. Jan
Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 2015-04-27 at 16:36, Luke Gorrie wrote: On 27 April 2015 at 16:30, Jan Kiszka jan.kis...@siemens.com wrote: Today, we have posted interrupts to avoid the vm-exit on the target CPU, but there is nothing yet (to the best of my knowledge) to avoid the exit on the sender side (unless we ignore security). That's the same problem with intra-guest IPIs, BTW. For throughput and given NAPI patterns, that's probably not an issue as you noted. It may be for latency, though, when almost every cycle counts. Poll-mode networking applications (DPDK, Snabb Switch, etc) are typically busy-looping to poll the vring. They may have a very short usleep() between checks to save power but they don't wait on their eventfd. So for those particular applications latency is on the order of tens of microseconds even without guest exits. That's one side, don't forget the others (the normal guests). Jan
Re: [PATCH RFC 00/11] qemu: towards virtio-1 host support
On 2014-10-22 22:34, Benjamin Herrenschmidt wrote: On Wed, 2014-10-22 at 16:17 +0200, Jan Kiszka wrote: I thought about this again, and I'm not sure anymore if we can use ACPI to black-list the incompatible virtio devices. Reason: hotplug. To my understanding, the ACPI DRHD tables won't change during runtime when a device shows up or disappears. We would have to isolate virtio devices from the rest of the system by using separate buses for it (and avoid listing those in any DRHD table) and enforce that they only get plugged into those buses. I suppose that is not desirable. Maybe it's better to fix virtio w.r.t. IOMMUs. I always go back to my initial proposal which is to define that current virtio always bypasses any iommu (which is what it does really) and have it expose via a new capability if that isn't the case. That means fixing that Xen thingy to allow qemu to know what to expose I assume but that seems to be the less bad approach. Just one thing to consider: feature negotiation happens after guest startup. If we run a virtio device under IOMMU control, what will we have to do when the guest says it does not support such devices? Simply reject operation? Jan
Re: [PATCH RFC 00/11] qemu: towards virtio-1 host support
On 2014-10-22 10:44, Michael S. Tsirkin wrote: On Wed, Oct 08, 2014 at 11:04:28AM +0200, Cornelia Huck wrote: On Tue, 07 Oct 2014 18:24:22 -0700 Andy Lutomirski l...@amacapital.net wrote: On 10/07/2014 07:39 AM, Cornelia Huck wrote: This patchset aims to get us some way to implement virtio-1 compliant and transitional devices in qemu. Branch available at git://github.com/cohuck/qemu virtio-1 I've mainly focused on: - endianness handling - extended feature bits - virtio-ccw new/changed commands At the risk of some distraction, would it be worth thinking about a solution to the IOMMU bypassing mess as part of this? I think that is a whole different issue. virtio-1 is basically done - we just need to implement it - while the IOMMU/DMA stuff certainly needs more discussion. Therefore, I'd like to defer to the other discussion thread here. I agree, let's do a separate thread for this. I also think it's up to the hypervisors at this point. People talked about using ACPI to report IOMMU bypass to guest. If that happens, we don't need a feature bit. I thought about this again, and I'm not sure anymore if we can use ACPI to black-list the incompatible virtio devices. Reason: hotplug. To my understanding, the ACPI DRHD tables won't change during runtime when a device shows up or disappears. We would have to isolate virtio devices from the rest of the system by using separate buses for it (and avoid listing those in any DRHD table) and enforce that they only get plugged into those buses. I suppose that is not desirable. Maybe it's better to fix virtio w.r.t. IOMMUs. Jan
Re: Using virtio for inter-VM communication
On 2014-06-17 07:24, Paolo Bonzini wrote: On 15/06/2014 08:20, Jan Kiszka wrote: I think implementing Xen hypercalls in jailhouse for grant table and event channels would actually make a lot of sense. The Xen implementation is 2.5kLOC and I think it should be possible to compact it noticeably, especially if you limit yourself to 64-bit guests. At least the grant table model seems unsuited for Jailhouse. It allows a guest to influence the mapping of another guest during runtime. This we want (or even have) to avoid in Jailhouse. IIRC implementing the grant table hypercalls with copies is inefficient but valid. Back to #1: This is what Rusty is suggesting for virtio. Nothing to win with grant tables then. And if we really have to copy, I would prefer to use a standard. I guess we need to play with prototypes to assess feasibility and impact on existing code. Jan
Re: Using virtio for inter-VM communication
On 2014-06-13 10:45, Paolo Bonzini wrote: On 13/06/2014 08:23, Jan Kiszka wrote: That would preserve zero-copy capabilities (as long as you can work against the shared mem directly, e.g. doing DMA from a physical NIC or storage device into it) and keep the hypervisor out of the loop. This seems ill thought out. How will you program a NIC via the virtio protocol without a hypervisor? And how will you make it safe? You'll need an IOMMU. But if you have an IOMMU you don't need shared memory. Scenarios behind this are things like driver VMs: You pass through the physical hardware to a driver guest that talks to the hardware and relays data via one or more virtual channels to other VMs. This confines a certain set of security and stability risks to the driver VM. I think implementing Xen hypercalls in jailhouse for grant table and event channels would actually make a lot of sense. The Xen implementation is 2.5kLOC and I think it should be possible to compact it noticeably, especially if you limit yourself to 64-bit guests. At least the grant table model seems unsuited for Jailhouse. It allows a guest to influence the mapping of another guest during runtime. This we want (or even have) to avoid in Jailhouse. I'm therefore more in favor of a model where the shared memory region is defined on cell (guest) creation by adding a virtual device that comes with such a region. Jan It should also be almost enough to run Xen PVH guests as jailhouse partitions. If later Xen starts to support virtio, you will get that for free. Paolo
Re: Using virtio for inter-VM communication
On 2014-06-13 02:47, Rusty Russell wrote: Jan Kiszka jan.kis...@siemens.com writes: On 2014-06-12 04:27, Rusty Russell wrote: Henning Schild henning.sch...@siemens.com writes: It was also never implemented, and remains a thought experiment. However, implementing it in lguest should be fairly easy. The reason why a trusted helper, i.e. additional logic in the hypervisor, is not our favorite solution is that we'd like to keep the hypervisor as small as possible. I wouldn't exclude such an approach categorically, but we have to weigh the costs (lines of code, additional hypervisor interface) carefully against the gain (existing specifications and guest driver infrastructure). Reasonable, but I think you'll find it is about the minimal implementation in practice. Unfortunately, I don't have time during the next 6 months to implement it myself :( Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA working group discussion): What speaks against introducing an alternative encoding of addresses inside virtio data structures? The idea of this flag was to replace guest-physical addresses with offsets into a shared memory region associated with or part of a virtio device. We would also need a way of defining the shared memory region. But that's not the problem. If such a feature is not accepted by the guest? How do you fall back? Depends on the hypervisor and its scope, but it should be quite straightforward: full-featured ones like KVM could fall back to slow copying, specialized ones like Jailhouse would clear FEATURES_OK if the guest driver does not accept it (because there would be no ring walking or copying code in Jailhouse), thus refusing to activate the device. That would be absolutely fine for application domains of specialized hypervisors (often embedded, customized guests etc.).
The shared memory regions could be exposed as BARs (PCI) or additional address ranges (device tree) and addressed in the redefined guest address fields via some region index and offset.

We don't add features which unmake the standard.

That would preserve zero-copy capabilities (as long as you can work against the shared mem directly, e.g. doing DMA from a physical NIC or storage device into it) and keep the hypervisor out of the loop.

This seems ill thought out. How will you program a NIC via the virtio protocol without a hypervisor? And how will you make it safe? You'll need an IOMMU. But if you have an IOMMU, you don't need shared memory.

Scenarios behind this are things like driver VMs: you pass through the physical hardware to a driver guest that talks to the hardware and relays data via one or more virtual channels to other VMs. This confines a certain set of security and stability risks to the driver VM.

Is it too invasive to existing infrastructure or does it have some other pitfalls?

You'll have to convince every vendor to implement your addition to the standard. Which is easier than inventing a completely new system, but it's not quite virtio.

It would be an optional addition, a feature all three sides (host and the communicating guests) would have to agree on. I think we would only have to agree on extending the spec to enable this - after demonstrating it via an implementation, of course. Thanks, Jan

-- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
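To make the region-index-plus-offset idea above concrete, here is a minimal sketch in C of one possible encoding for the redefined guest address fields. The exact split (16-bit region index in the top bits, 48-bit offset below) and the function names are illustrative assumptions, not anything defined by the virtio spec or this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: top 16 bits select a shared-memory region
 * (e.g. a BAR index), the low 48 bits are an offset into it. */
#define SHMEM_REGION_SHIFT 48
#define SHMEM_OFFSET_MASK  ((1ULL << SHMEM_REGION_SHIFT) - 1)

static uint64_t shmem_encode(uint16_t region, uint64_t offset)
{
    return ((uint64_t)region << SHMEM_REGION_SHIFT) |
           (offset & SHMEM_OFFSET_MASK);
}

static uint16_t shmem_region(uint64_t addr)
{
    return (uint16_t)(addr >> SHMEM_REGION_SHIFT);
}

static uint64_t shmem_offset(uint64_t addr)
{
    return addr & SHMEM_OFFSET_MASK;
}
```

A guest that negotiated such a feature would store `shmem_encode(region, offset)` in descriptor address fields instead of a guest-physical address; the peer resolves it by indexing its view of the same regions.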
Re: Using virtio for inter-VM communication
On 2014-06-12 04:27, Rusty Russell wrote: Henning Schild henning.sch...@siemens.com writes: Hi, I am working on the Jailhouse[1] project and am currently looking at inter-VM communication. We want to connect guests directly with virtual consoles based on shared memory. The code complexity in the hypervisor should be minimal; it should just make the shared memory discoverable and provide a signaling mechanism.

Hi Henning, The virtio assumption was that the host can see all of guest memory. This simplifies things significantly, and makes it efficient. If you don't have this, *someone* needs to do a copy. Usually the guest OS does a bounce buffer into your shared region. Goodbye performance. Or you can play remapping tricks. Goodbye performance again.

My preferred model is to have a trusted helper (i.e. host) which understands how to copy between virtio rings. The backend guest (to steal Xen vocab) R/O maps the descriptor, avail ring and used rings in the guest. It then asks the trusted helper to do various operations (copy into writable descriptor, copy out of readable descriptor, mark used). The virtio ring itself acts as a grant table.

Note: that helper mechanism is completely protocol agnostic. It was also explicitly designed into the virtio mechanism (with its 4k boundaries for data structures and its 'len' field to indicate how much was written into the descriptor). It was also never implemented, and remains a thought experiment. However, implementing it in lguest should be fairly easy.

The reason why a trusted helper, i.e. additional logic in the hypervisor, is not our favorite solution is that we'd like to keep the hypervisor as small as possible. I wouldn't exclude such an approach categorically, but we have to weigh the costs (lines of code, additional hypervisor interface) carefully against the gain (existing specifications and guest driver infrastructure).
Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA working group discussion): What speaks against introducing an alternative encoding of addresses inside virtio data structures? The idea of this flag was to replace guest-physical addresses with offsets into a shared memory region associated with or part of a virtio device. That would preserve zero-copy capabilities (as long as you can work against the shared mem directly, e.g. doing DMA from a physical NIC or storage device into it) and keep the hypervisor out of the loop. Is it too invasive to existing infrastructure or does it have some other pitfalls? Jan

-- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux
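One practical consequence of working against the shared memory directly: a backend must bounds-check every descriptor before touching it, since a malicious or buggy peer controls the offsets it places in the ring. A minimal sketch of such a check (hypothetical helper name; written with a subtraction so `offset + len` cannot overflow):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Accept a descriptor only if [offset, offset + len) lies entirely
 * inside the shared region of the given size. */
static bool desc_in_region(uint64_t offset, uint64_t len,
                           uint64_t region_size)
{
    return len <= region_size && offset <= region_size - len;
}
```

Without such a check, DMA into the region from a physical NIC could be redirected outside the memory the two guests actually agreed to share.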
Re: virtio PCI on KVM without IO BARs
On 2013-02-28 16:24, Michael S. Tsirkin wrote: Another problem with PIO is support for physical virtio devices, and nested virt: KVM currently programs all PIO accesses to cause vm exit, so using this device in a VM will be slow.

Not answering your question, but support for programming direct PIO access into KVM's I/O bitmap would be feasible. Such a feature may have some value for assigned devices that use PIO more heavily. They cause lengthy user-space exits so far. Jan

-- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
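For reference, the mechanism alluded to here is the VMX I/O bitmap: one bit per port, where a set bit forces a VM exit on access and a clear bit lets the guest touch the port directly. A toy model of the lookup (not KVM code; names invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per I/O port, 65536 ports -> 8 KiB of bitmap
 * (VMX splits this across two 4 KiB pages). */
static uint8_t io_bitmap[65536 / 8];

static bool pio_causes_exit(uint16_t port)
{
    return io_bitmap[port / 8] & (1u << (port % 8));
}

static void pio_trap(uint16_t port)          /* intercept the port */
{
    io_bitmap[port / 8] |= 1u << (port % 8);
}

static void pio_allow_direct(uint16_t port)  /* pass it through */
{
    io_bitmap[port / 8] &= ~(1u << (port % 8));
}
```

"Programming direct PIO access" in Jan's sense would mean clearing the bits for a passed-through device's port range instead of trapping every access into user space.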
Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection
On 2012-08-13 10:35, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This is required to get past the following assert with: commit 1523ed9e1d46b0b54540049d491475ccac7e6421 Author: Jan Kiszka jan.kis...@siemens.com Date: Thu May 17 10:32:39 2012 -0300 virtio/vhost: Add support for KVM in-kernel MSI injection

Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Cc: Jan Kiszka jan.kis...@siemens.com Cc: Paolo Bonzini pbonz...@redhat.com Cc: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org ---
hw/msix.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c index 800fc32..c1e6dc3 100644 --- a/hw/msix.c +++ b/hw/msix.c
@@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev) { int vector;
+    if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier)
+        return;
+
     assert(dev->msix_vector_use_notifier && dev->msix_vector_release_notifier);

I seem to remember pointing out that there is a bug somewhere in the reset code which deactivates a non-active vhost instance, no? Jan

-- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection
On 2012-08-13 20:03, Michael S. Tsirkin wrote: On Mon, Aug 13, 2012 at 02:06:10PM +0200, Jan Kiszka wrote: On 2012-08-13 10:35, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org This is required to get past the following assert with: commit 1523ed9e1d46b0b54540049d491475ccac7e6421 Author: Jan Kiszka jan.kis...@siemens.com Date: Thu May 17 10:32:39 2012 -0300 virtio/vhost: Add support for KVM in-kernel MSI injection

Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Cc: Jan Kiszka jan.kis...@siemens.com Cc: Paolo Bonzini pbonz...@redhat.com Cc: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org ---
hw/msix.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c index 800fc32..c1e6dc3 100644 --- a/hw/msix.c +++ b/hw/msix.c
@@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev) { int vector;
+    if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier)
+        return;
+
     assert(dev->msix_vector_use_notifier && dev->msix_vector_release_notifier);

I seem to remember pointing out that there is a bug somewhere in the reset code which deactivates a non-active vhost instance, no? Jan

Could not find it. Could you dig it up pls?

http://thread.gmane.org/gmane.linux.scsi.target.devel/2277/focus=2309

Jan

-- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [PATCH RFC V8 17/17] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
On 2012-05-02 12:09, Raghavendra K T wrote: From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

KVM_HC_KICK_CPU hypercall added to wake up a halted vcpu in a paravirtual spinlock enabled guest. KVM_FEATURE_PV_UNHALT enables the guest to check whether pv spinlock can be enabled in the guest. Thanks Alex for KVM_HC_FEATURES inputs and Vatsa for rewriting KVM_HC_KICK_CPU.

This contains valuable documentation for features that are already supported. Can you break them out and post them as a separate patch already? One comment on them below.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com ---
Documentation/virtual/kvm/cpuid.txt | 4 ++
Documentation/virtual/kvm/hypercalls.txt | 60 ++
2 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt index 8820685..062dff9 100644 --- a/Documentation/virtual/kvm/cpuid.txt +++ b/Documentation/virtual/kvm/cpuid.txt
@@ -39,6 +39,10 @@
 KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF || 4 || async pf can be enabled by || || writing to msr 0x4b564d02
+KVM_FEATURE_PV_UNHALT || 6 || guest checks this feature bit + || || before enabling paravirtualized + || || spinlock support.
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side || || per-cpu warps are expected in || || kvmclock.

diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt new file mode 100644 index 000..bc3f14a --- /dev/null +++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,60 @@
+KVM Hypercalls Documentation
+============================
+The template for each hypercall is:
+1. Hypercall name, value.
+2. Architecture(s)
+3. Status (deprecated, obsolete, active)
+4. Purpose
+
+1. KVM_HC_VAPIC_POLL_IRQ
+
+Value: 1
+Architecture: x86
+Purpose: None

Purpose: Trigger guest exit so that the host can check for pending interrupts on reentry.

+2.
KVM_HC_MMU_OP
+
+Value: 2
+Architecture: x86
+Status: deprecated
+Purpose: Support MMU operations such as writing to PTE, flushing TLB, releasing PT.
+
+3. KVM_HC_FEATURES
+
+Value: 3
+Architecture: PPC
+Status: active
+Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid is used to enumerate which hypercalls are available. On PPC, either a device tree based lookup (which is also what EPAPR dictates) or a KVM-specific enumeration mechanism (which is this hypercall) can be used.
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+
+Value: 4
+Architecture: PPC
+Status: active
+Purpose: To enable communication between the hypervisor and guest, there is a shared page that contains parts of supervisor-visible register state. The guest can map this shared page to access its supervisor registers through memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+
+Value: 5
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wake up a vcpu from HLT state.
+
+Usage example: A vcpu of a paravirtualized guest that is busy-waiting in guest kernel mode for an event to occur (e.g. a spinlock to become available) can execute the HLT instruction once it has busy-waited for more than a threshold time interval. Execution of the HLT instruction would cause the hypervisor to put the vcpu to sleep until the occurrence of an appropriate event. Another vcpu of the same guest can wake up the sleeping vcpu by issuing the KVM_HC_KICK_CPU hypercall, specifying the APIC ID of the vcpu to be woken up.
+
+TODO:
+1. More information on input and output needed?
+2. Add more detail to the purpose of hypercalls.

Thanks, Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
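The HLT/KICK protocol in the usage example above can be modeled in plain C. The stubs below stand in for the real HLT instruction and the KVM_HC_KICK_CPU hypercall that the host would handle; only the hypercall number comes from the quoted documentation, everything else is an illustrative assumption:

```c
#include <assert.h>
#include <stdbool.h>

#define KVM_HC_KICK_CPU 5   /* hypercall value from the quoted doc */
#define NCPUS 4

static bool halted[NCPUS];

/* Stand-in for the HLT instruction: park the vcpu. */
static void halt_self(int cpu)
{
    halted[cpu] = true;
}

/* Stand-in for the KVM_HC_KICK_CPU hypercall handled by the host:
 * wake the vcpu identified by its APIC ID. */
static void kick_cpu(int apic_id)
{
    halted[apic_id] = false;
}

/* Slow path of a pv spinlock waiter: spin up to a threshold, then HLT. */
static void spin_then_halt(int cpu, volatile const int *locked, int threshold)
{
    for (int i = 0; i < threshold; i++) {
        if (!*locked)
            return;       /* lock became free while spinning */
    }
    halt_self(cpu);       /* give up the CPU; the lock holder kicks us */
}
```

The real lock-release path would pair the unlock with a `kick_cpu()` on the first halted waiter, which is exactly the handoff KVM_HC_KICK_CPU exists for.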
Re: [PATCH (repost) RFC 2/2] virtio-pci: recall and return msix notifications on ISR read
On 2011-11-02 21:11, Michael S. Tsirkin wrote: The MSI-X spec requires that a device can be operated with all vectors masked, by polling pending bits. Add APIs to recall an MSI-X notification, and make polling mode possible in virtio-pci by clearing the pending bits and setting ISR appropriately on ISR read.

Signed-off-by: Michael S. Tsirkin m...@redhat.com ---
hw/msix.c | 26 ++
hw/msix.h | 3 +++
hw/virtio-pci.c | 11 ++-
3 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c index 63b41b9..fe967c9 100644 --- a/hw/msix.c +++ b/hw/msix.c
@@ -349,6 +349,32 @@ void msix_notify(PCIDevice *dev, unsigned vector) stl_le_phys(address, data); }
+/* Recall outstanding MSI-X notifications for a vector, if possible.
+ * Return true if any were outstanding. */
+bool msix_recall(PCIDevice *dev, unsigned vector)
+{
+    bool ret;
+    if (vector >= dev->msix_entries_nr)
+        return false;
+    ret = msix_is_pending(dev, vector);
+    msix_clr_pending(dev, vector);
+    return ret;
+}

I would prefer to have a single API instead, to clarify the tight relation: bool msi[x]_set_notify(PCIDevice *dev, unsigned vector, unsigned level). It would return true for level=1 if the message was either sent directly or queued (we could deliver false if it was already queued, but I see no use case for this yet).

Also, I don't see the generic value of some msix_recall_all. I think it's better handled in a single loop over all vectors at the caller site, clearing the individual interrupt reason bits on a per-vector basis there. msix_recall_all is only useful in the virtio case where you have one vector for reason A and all the rest for B. Once you had multiple reason-C vectors as well, it would not help anymore. Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
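The recall semantics under discussion boil down to a test-and-clear on a per-vector pending bit. A minimal model (not QEMU code; names invented) of what the proposed msix_recall() does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSIX_NVEC 64

static uint64_t msix_pending;   /* one pending bit per vector */

static void msix_set_pending_bit(unsigned vector)
{
    msix_pending |= 1ULL << vector;
}

/* Test-and-clear, matching the proposed msix_recall() semantics:
 * report whether a notification was outstanding, and retract it. */
static bool msix_recall_model(unsigned vector)
{
    bool was_pending;

    if (vector >= MSIX_NVEC)
        return false;
    was_pending = msix_pending & (1ULL << vector);
    msix_pending &= ~(1ULL << vector);
    return was_pending;
}
```

Jan's suggested single API would fold this and msix_notify() into one level-triggered entry point, so callers never see the set/clear pair as separate operations.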
Re: IO APIC emulation failure with qemu-kvm
On 2011-02-04 14:35, Ravi Kumar Kulkarni wrote: Hi all, I'm initializing the local and IO APIC for a proprietary operating system running in a virtualized environment. I'm facing some problems with qemu-kvm, but the code runs fine with qemu.

Does it also run fine with qemu-kvm and -no-kvm-irqchip? What versions of the kernel and qemu-kvm are you using? If not the latest git, does updating change the picture?

When I run my kernel image with qemu-kvm, it gives an emulation failure ("trying to execute code outside ROM or RAM") at fec0 (the IO APIC base address), but the same code runs fine with qemu. Can anyone please point me to where the problem might be or how to find this out?

Start with capturing the activity of your guest via ftrace, enabling all kvm:* events. You may also try to attach gdb to qemu and analyze the different code paths in both versions (specifically if you have debugging symbols for your guest). BTW, is your OS doing any fancy [IO]APIC relocations? Jan

-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux