Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 5/27/2016 4:19 PM, Lan Tianyu wrote:
> As for the individual issue of 288vcpu support, there are already issues
> with 64vcpu guests at the moment. While it is certainly fine to remove
> the hard limit at 255 vcpus, there is a lot of other work required to
> even get 128vcpu guests stable.

Could you give some pointers to these issues? We are enabling support for more vcpus, and a guest can basically boot with 255 vcpus without IR support. It would be very helpful to learn about the known issues.

Hi Andrew:
We are designing vIOMMU support for Xen. Increasing the vcpu limit from 128 to 255 can also be done in parallel, since it doesn't depend on vIOMMU support. From your previous comment "there is a lot of other work required to even get 128vcpu guests stable", you seem to have some concerns about the stability of 128 vcpus. I wonder what we need to do before starting the work of increasing the vcpu number from 128 to 255?

___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 7/5/2016 9:57 PM, Jan Beulich wrote:
> On 05.07.16 at 15:37, wrote:
>> Hi Stefano, Andrew and Jan:
>> Could you give us more guidance here on how to move virtual iommu
>> development forward? Thanks.
>
> Due to ...
>
>> On 6/29/2016 11:04 AM, Tian, Kevin wrote:
>>> Please let us know your thoughts. If no one has an explicit objection to
>>> the rough idea above, we'll go write the high level design doc for more
>>> detailed discussion.
>
> ... this I actually expected we'd get to see something, rather than our
> input being waited for.

OK, I get it. Since there was no response, we wanted to double-confirm that we are on the right track.
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
>>> On 05.07.16 at 15:37, wrote:
> Hi Stefano, Andrew and Jan:
> Could you give us more guides here to move forward virtual iommu
> development? Thanks. Due to ...
> On 6/29/2016 11:04 AM, Tian, Kevin wrote:
>> Please let us know your thoughts. If no one has explicit objection based
>> on above rough idea, we'll go to write the high level design doc for more
>> detail discussion.

... this I actually expected we'd get to see something, rather than our input being waited for.

Jan
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
Hi Stefano, Andrew and Jan:
Could you give us more guidance here on how to move virtual iommu development forward? Thanks.

On 6/29/2016 11:04 AM, Tian, Kevin wrote:
>> From: Lan, Tianyu
>> Sent: Sunday, June 26, 2016 9:43 PM
>>
>> On 6/8/2016 4:11 PM, Tian, Kevin wrote:
>>> It makes sense... I thought you used this security issue against
>>> placing vIOMMU in Qemu, which made me a bit confused earlier. :-)
>>>
>>> We are still thinking about the feasibility of some staging plan, e.g.
>>> first implementing some vIOMMU features w/o dependency on root-complex
>>> in Xen (HVM only) and then later enabling the full vIOMMU feature w/
>>> root-complex in Xen (covering HVMLite). If we can reuse most code
>>> between the two stages while shortening time-to-market by half (e.g.
>>> from 2yr to 1yr), it's still worth pursuing. Will report back soon
>>> once the idea is consolidated...
>>>
>>> Thanks
>>> Kevin
>>
>> After discussion with Kevin, we drafted a staging plan for implementing
>> vIOMMU in Xen based on the Qemu host bridge. Both virtual devices and
>> passthrough devices use one vIOMMU in Xen. Your comments are much
>> appreciated.
>
> The rationale here is to separate BIOS structures from the actual vIOMMU
> emulation. The vIOMMU will always be emulated in the Xen hypervisor,
> regardless of where Q35 emulation is done or whether it's HVM or HVMLite.
> The staging plan is more about the BIOS structure reporting, which is Q35
> specific. For now we first target Qemu Q35 emulation, with a set of vIOMMU
> ops introduced, as Tianyu listed below, to help Qemu and Xen interact.
> Later, when Xen Q35 emulation is ready, the reporting can be done in Xen.
>
> The main limitation of this model is the DMA emulation of Qemu virtual
> devices, which needs to query the Xen vIOMMU for every virtual DMA. That
> is probably fine for virtual devices, which are normally not used for
> performance-critical workloads. There may also be some chance to cache
> translations within Qemu, e.g. via ATS (may not be worth it though...).
>
>> 1. Enable Q35 support in hvmloader.
>> In the real world, VT-d support starts from Q35, and an OS may assume
>> that VT-d only exists on Q35 or newer platforms. Q35 support therefore
>> seems necessary for vIOMMU support.
>>
>> Regardless of whether the Q35 host bridge is in Qemu or in the Xen
>> hypervisor, hvmloader needs to be compatible with Q35 and build Q35 ACPI
>> tables.
>>
>> Qemu already has Q35 emulation, so the hvmloader work can start with
>> Qemu. When the host bridge in Xen is ready, these changes can also be
>> reused.
>>
>> 2. Implement vIOMMU in Xen based on the Qemu host bridge.
>> Add a new device type "Xen iommu" in Qemu as a wrapper for the vIOMMU
>> hypercalls used to communicate with the Xen vIOMMU.
>>
>> It is in charge of:
>> 1) Querying vIOMMU capabilities (e.g. interrupt remapping, DMA
>> translation, SVM and so on)
>> 2) Creating the vIOMMU with a predefined base address for the IOMMU unit
>> registers
>> 3) Notifying hvmloader to populate the related content in the ACPI DMAR
>> table (add vIOMMU info to struct hvm_info_table)
>> 4) Handling DMA translation requests from virtual devices and returning
>> the translated address
>> 5) Attaching/detaching hotplug devices to/from the vIOMMU
>>
>> New hypercalls for the vIOMMU, which are also necessary when the host
>> bridge is in Xen:
>> 1) Query vIOMMU capabilities
>> 2) Create vIOMMU (IOMMU unit register base as a parameter)
>> 3) Translate a virtual device's DMA
>> 4) Attach/detach a hotplug device to/from the vIOMMU
>
> We don't need 4). Hotplug devices are automatically handled by a vIOMMU
> with the INCLUDE_ALL flag set (which should be the case if we only have
> one vIOMMU in Xen). We don't need to further notify this event to the Xen
> vIOMMU. And once we have Xen Q35 emulation in place, possibly only 3) is
> required.
>
>> All IOMMU emulation will be done in Xen:
>> 1) DMA translation
>> 2) Interrupt remapping
>> 3) Shared Virtual Memory (SVM)
>
> Please let us know your thoughts. If no one has an explicit objection to
> the rough idea above, we'll go write the high level design doc for more
> detailed discussion.
>
> Thanks
> Kevin
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> From: Lan, Tianyu
> Sent: Sunday, June 26, 2016 9:43 PM
>
> On 6/8/2016 4:11 PM, Tian, Kevin wrote:
>> It makes sense... I thought you used this security issue against
>> placing vIOMMU in Qemu, which made me a bit confused earlier. :-)
>>
>> We are still thinking about the feasibility of some staging plan, e.g.
>> first implementing some vIOMMU features w/o dependency on root-complex
>> in Xen (HVM only) and then later enabling the full vIOMMU feature w/
>> root-complex in Xen (covering HVMLite). If we can reuse most code
>> between the two stages while shortening time-to-market by half (e.g.
>> from 2yr to 1yr), it's still worth pursuing. Will report back soon
>> once the idea is consolidated...
>>
>> Thanks
>> Kevin
>
> After discussion with Kevin, we drafted a staging plan for implementing
> vIOMMU in Xen based on the Qemu host bridge. Both virtual devices and
> passthrough devices use one vIOMMU in Xen. Your comments are much
> appreciated.

The rationale here is to separate BIOS structures from the actual vIOMMU emulation. The vIOMMU will always be emulated in the Xen hypervisor, regardless of where Q35 emulation is done or whether it's HVM or HVMLite. The staging plan is more about the BIOS structure reporting, which is Q35 specific. For now we first target Qemu Q35 emulation, with a set of vIOMMU ops introduced, as Tianyu listed below, to help Qemu and Xen interact. Later, when Xen Q35 emulation is ready, the reporting can be done in Xen.

The main limitation of this model is the DMA emulation of Qemu virtual devices, which needs to query the Xen vIOMMU for every virtual DMA. That is probably fine for virtual devices, which are normally not used for performance-critical workloads. There may also be some chance to cache translations within Qemu, e.g. via ATS (may not be worth it though...).

> 1. Enable Q35 support in hvmloader.
> In the real world, VT-d support starts from Q35, and an OS may assume
> that VT-d only exists on Q35 or newer platforms. Q35 support therefore
> seems necessary for vIOMMU support.
>
> Regardless of whether the Q35 host bridge is in Qemu or in the Xen
> hypervisor, hvmloader needs to be compatible with Q35 and build Q35 ACPI
> tables.
>
> Qemu already has Q35 emulation, so the hvmloader work can start with
> Qemu. When the host bridge in Xen is ready, these changes can also be
> reused.
>
> 2. Implement vIOMMU in Xen based on the Qemu host bridge.
> Add a new device type "Xen iommu" in Qemu as a wrapper for the vIOMMU
> hypercalls used to communicate with the Xen vIOMMU.
>
> It is in charge of:
> 1) Querying vIOMMU capabilities (e.g. interrupt remapping, DMA
> translation, SVM and so on)
> 2) Creating the vIOMMU with a predefined base address for the IOMMU unit
> registers
> 3) Notifying hvmloader to populate the related content in the ACPI DMAR
> table (add vIOMMU info to struct hvm_info_table)
> 4) Handling DMA translation requests from virtual devices and returning
> the translated address
> 5) Attaching/detaching hotplug devices to/from the vIOMMU
>
> New hypercalls for the vIOMMU, which are also necessary when the host
> bridge is in Xen:
> 1) Query vIOMMU capabilities
> 2) Create vIOMMU (IOMMU unit register base as a parameter)
> 3) Translate a virtual device's DMA
> 4) Attach/detach a hotplug device to/from the vIOMMU

We don't need 4). Hotplug devices are automatically handled by a vIOMMU with the INCLUDE_ALL flag set (which should be the case if we only have one vIOMMU in Xen). We don't need to further notify this event to the Xen vIOMMU. And once we have Xen Q35 emulation in place, possibly only 3) is required.

> All IOMMU emulation will be done in Xen:
> 1) DMA translation
> 2) Interrupt remapping
> 3) Shared Virtual Memory (SVM)

Please let us know your thoughts. If no one has an explicit objection to the rough idea above, we'll go write the high level design doc for more detailed discussion.

Thanks
Kevin
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: Tuesday, June 07, 2016 6:07 PM
>
> On Tue, 7 Jun 2016, Tian, Kevin wrote:
>>> I think of QEMU as a provider of complex, high level emulators, such as
>>> the e1000, Cirrus VGA, SCSI controllers, etc., which don't necessarily
>>> need to be fast.
>>
>> Earlier you said Qemu imposes security issues. Here you said Qemu can
>> still provide complex emulators. Does it mean that the security issue in
>> Qemu simply comes from the part which should be moved into Xen? Any
>> elaboration here?
>
> It imposes security issues because, although it doesn't have to run as
> root anymore, QEMU still has to run with fully privileged libxc and
> xenstore handles. In other words, a malicious guest breaking into QEMU
> would have relatively easy access to the whole host. There is a design
> to solve this, see Ian Jackson's talk at FOSDEM this year:
>
> https://fosdem.org/2016/schedule/event/virt_iaas_qemu_for_xen_secure_by_default/
> https://fosdem.org/2016/schedule/event/virt_iaas_qemu_for_xen_secure_by_default/attachments/other/921/export/events/attachments/virt_iaas_qemu_for_xen_secure_by_default/other/921/talk.txt
>
> Other solutions to this issue are stubdoms, or simply using only PV
> guests and HVMlite guests.
>
> Irrespective of the problematic security angle, which is unsolved, I
> think of QEMU as a provider of complex emulators, as I wrote above.
>
> Does it make sense?

It makes sense... I thought you used this security issue against placing vIOMMU in Qemu, which made me a bit confused earlier. :-)

We are still thinking about the feasibility of some staging plan, e.g. first implementing some vIOMMU features w/o dependency on root-complex in Xen (HVM only) and then later enabling the full vIOMMU feature w/ root-complex in Xen (covering HVMLite). If we can reuse most code between the two stages while shortening time-to-market by half (e.g. from 2yr to 1yr), it's still worth pursuing.

Will report back soon once the idea is consolidated...

Thanks
Kevin
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On Tue, 7 Jun 2016, Tian, Kevin wrote:
> > I think of QEMU as a provider of complex, high level emulators, such as
> > the e1000, Cirrus VGA, SCSI controllers, etc., which don't necessarily
> > need to be fast.
>
> Earlier you said Qemu imposes security issues. Here you said Qemu can
> still provide complex emulators. Does it mean that the security issue in
> Qemu simply comes from the part which should be moved into Xen? Any
> elaboration here?

It imposes security issues because, although it doesn't have to run as root anymore, QEMU still has to run with fully privileged libxc and xenstore handles. In other words, a malicious guest breaking into QEMU would have relatively easy access to the whole host. There is a design to solve this, see Ian Jackson's talk at FOSDEM this year:

https://fosdem.org/2016/schedule/event/virt_iaas_qemu_for_xen_secure_by_default/
https://fosdem.org/2016/schedule/event/virt_iaas_qemu_for_xen_secure_by_default/attachments/other/921/export/events/attachments/virt_iaas_qemu_for_xen_secure_by_default/other/921/talk.txt

Other solutions to this issue are stubdoms, or simply using only PV guests and HVMlite guests.

Irrespective of the problematic security angle, which is unsolved, I think of QEMU as a provider of complex emulators, as I wrote above.

Does it make sense?
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
>>> On 07.06.16 at 07:14, wrote:
> After some internal discussion with Tianyu/Eddie, I realized my earlier
> description was incomplete, taking only passthrough devices into
> consideration (as you saw, it was mainly about the interaction between
> vIOMMU and pIOMMU). However, from the guest p.o.v., all the devices
> should be covered by the vIOMMU to match today's physical platform,
> including:
>
> 1) DMA-capable virtual devices in Qemu, in Dom0 user space
> 2) PV devices, in Dom0 kernel space
> 3) Passthrough devices, in the Xen hypervisor
>
> A natural implementation is to have the vIOMMU together with where the
> DMA is emulated, which leads to a possible design with multiple vIOMMUs
> in multiple layers:
>
> 1) vIOMMU in Dom0 user space
> 2) vIOMMU in Dom0 kernel space
> 3) vIOMMU in the Xen hypervisor
>
> Of course we may come up with an option to still keep all vIOMMUs in the
> Xen hypervisor, which however means every vDMA operation in Qemu or a BE
> driver would need to issue a Xen hypercall to get the vIOMMU's approval.
> I haven't thought through how big/complex this issue is, but at first
> glance it does look like a limitation.
>
> So, likely we'll have to consider the presence of multiple vIOMMUs, each
> in different layers, regardless of whether the root-complex is in Qemu or
> Xen. There need to be some interface abstractions to allow the
> vIOMMU/root-complex to communicate with each other. Well, not an easy
> task...

Right - for DMA-capable devices emulated in qemu, it would seem natural to have them go through a vIOMMU in qemu. Whether that vIOMMU implementation would have to consult the hypervisor (or perhaps even just be a wrapper around various hypercalls, i.e. backed by an implementation in the hypervisor) would be an independent aspect. Otoh, having the vIOMMU only in qemu, and requiring round trips through qemu for any of the hypervisor's internal purposes, doesn't seem like a good idea to me.

And finally I don't see the relevance of PV devices here: Their nature makes it that they could easily be left completely independent of a vIOMMU (as long as there's no plan to bypass a virtualization level in the nested case, i.e. a PV frontend in L2 with a backend living in L0).

Jan
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> From: Stefano Stabellini > Sent: Saturday, June 04, 2016 1:15 AM > > On Fri, 3 Jun 2016, Andrew Cooper wrote: > > On 03/06/16 12:17, Tian, Kevin wrote: > > >> Very sorry for the delay. > > >> > > >> There are multiple interacting issues here. On the one side, it would > > >> be useful if we could have a central point of coordination on > > >> PVH/HVMLite work. Roger - as the person who last did HVMLite work, > > >> would you mind organising that? > > >> > > >> For the qemu/xen interaction, the current state is woeful and a tangled > > >> mess. I wish to ensure that we don't make any development decisions > > >> which makes the situation worse. > > >> > > >> In your case, the two motivations are quite different I would recommend > > >> dealing with them independently. > > >> > > >> IIRC, the issue with more than 255 cpus and interrupt remapping is that > > >> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs > > >> can't be programmed to generate x2apic interrupts? In principle, if you > > >> don't have an IOAPIC, are there any other issues to be considered? What > > >> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC > > >> deliver xapic interrupts? > > > The key is the APIC ID. There is no modification to existing PCI MSI and > > > IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send > > > interrupt message containing 8bit APIC ID, which cannot address >255 > > > cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to > > > enable >255 cpus with x2apic mode. > > > > Thanks for clarifying. > > > > > > > > If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC cannot > > > deliver interrupts to all cpus in the system if #cpu > 255. > > > > Ok. So not ideal (and we certainly want to address it), but this isn't > > a complete show stopper for a guest. > > > > >> On the other side of things, what is IGD passthrough going to look like > > >> in Skylake? 
Is there any device-model interaction required (i.e. the > > >> opregion), or will it work as a completely standalone device? What are > > >> your plans with the interaction of virtual graphics and shared virtual > > >> memory? > > >> > > > The plan is to use a so-called universal pass-through driver in the guest > > > which only accesses standard PCI resource (w/o opregion, PCH/MCH, etc.) > > > > This is fantastic news. > > > > > > > > > > > Here is a brief of potential usages relying on vIOMMU: > > > > > > a) enable >255 vcpus on Xeon Phi, as the initial purpose of this thread. > > > It requires interrupt remapping capability present on vIOMMU; > > > > > > b) support guest SVM (Shared Virtual Memory), which relies on the > > > 1st level translation table capability (GVA->GPA) on vIOMMU. pIOMMU > > > needs to enable both 1st level and 2nd level translation in nested > > > mode (GVA->GPA->HPA) for passthrough device. IGD passthrough is > > > the main usage today (to support OpenCL 2.0 SVM feature). In the > > > future SVM might be used by other I/O devices too; > > > > > > c) support VFIO-based user space driver (e.g. DPDK) in the guest, > > > which relies on the 2nd level translation capability (IOVA->GPA) on > > > vIOMMU. pIOMMU 2nd level becomes a shadowing structure of > > > vIOMMU 2nd level by replacing GPA with HPA (becomes IOVA->HPA); > > > > All of these look like interesting things to do. I know there is a lot > > of interest for b). > > > > As a quick aside, does Xen currently boot on a Phi? Last time I looked > > at the Phi manual, I would expect Xen to crash on boot because of MCXSR > > differences from more-common x86 hardware. Tianyu can correct me for the detail info. Xen can boot on Xeon Phi. However we need a hacky patch in guest Linux kernel to disable dependency check around interrupt remapping. Otherwise guest kernel boot will fail. Now we're suffering from some performance issue. 
While the analysis is ongoing, could you elaborate on the limitation you see with 64vcpu guests? It would be helpful to know whether we are hunting the same problem or not...

> >
> > And below are my thoughts on the viability of implementing vIOMMU in Qemu:
> >
> > a) enable >255 vcpus:
> >
> > o Enable Q35 in Qemu-Xen;
> > o Add interrupt remapping in Qemu vIOMMU;
> > o Virtual interrupt injection in hypervisor needs to know virtual
> > interrupt remapping (IR) structure, since IR is behind vIOAPIC/vMSI,
> > which requires new hypervisor interfaces as Andrew pointed out:
> > * either for hypervisor to query IR from Qemu which is not good;
> > * or for Qemu to register IR info to hypervisor which means
> > partial IR knowledge implemented in hypervisor (then why not putting
> > whole IR emulation in Xen?)
> >
> > b) support SVM
> >
> > o Enable Q35 in Qemu-Xen;
> > o Add 1st level translation capability in Qemu vIOMMU;
> > o VT-d context entry points to guest 1st level translation table
> >
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On Fri, 3 Jun 2016, Andrew Cooper wrote: > On 03/06/16 12:17, Tian, Kevin wrote: > >> Very sorry for the delay. > >> > >> There are multiple interacting issues here. On the one side, it would > >> be useful if we could have a central point of coordination on > >> PVH/HVMLite work. Roger - as the person who last did HVMLite work, > >> would you mind organising that? > >> > >> For the qemu/xen interaction, the current state is woeful and a tangled > >> mess. I wish to ensure that we don't make any development decisions > >> which makes the situation worse. > >> > >> In your case, the two motivations are quite different I would recommend > >> dealing with them independently. > >> > >> IIRC, the issue with more than 255 cpus and interrupt remapping is that > >> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs > >> can't be programmed to generate x2apic interrupts? In principle, if you > >> don't have an IOAPIC, are there any other issues to be considered? What > >> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC > >> deliver xapic interrupts? > > The key is the APIC ID. There is no modification to existing PCI MSI and > > IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send > > interrupt message containing 8bit APIC ID, which cannot address >255 > > cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to > > enable >255 cpus with x2apic mode. > > Thanks for clarifying. > > > > > If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC cannot > > deliver interrupts to all cpus in the system if #cpu > 255. > > Ok. So not ideal (and we certainly want to address it), but this isn't > a complete show stopper for a guest. > > >> On the other side of things, what is IGD passthrough going to look like > >> in Skylake? Is there any device-model interaction required (i.e. the > >> opregion), or will it work as a completely standalone device? 
What are > >> your plans with the interaction of virtual graphics and shared virtual > >> memory? > >> > > The plan is to use a so-called universal pass-through driver in the guest > > which only accesses standard PCI resource (w/o opregion, PCH/MCH, etc.) > > This is fantastic news. > > > > > > > Here is a brief of potential usages relying on vIOMMU: > > > > a) enable >255 vcpus on Xeon Phi, as the initial purpose of this thread. > > It requires interrupt remapping capability present on vIOMMU; > > > > b) support guest SVM (Shared Virtual Memory), which relies on the > > 1st level translation table capability (GVA->GPA) on vIOMMU. pIOMMU > > needs to enable both 1st level and 2nd level translation in nested > > mode (GVA->GPA->HPA) for passthrough device. IGD passthrough is > > the main usage today (to support OpenCL 2.0 SVM feature). In the > > future SVM might be used by other I/O devices too; > > > > c) support VFIO-based user space driver (e.g. DPDK) in the guest, > > which relies on the 2nd level translation capability (IOVA->GPA) on > > vIOMMU. pIOMMU 2nd level becomes a shadowing structure of > > vIOMMU 2nd level by replacing GPA with HPA (becomes IOVA->HPA); > > All of these look like interesting things to do. I know there is a lot > of interest for b). > > As a quick aside, does Xen currently boot on a Phi? Last time I looked > at the Phi manual, I would expect Xen to crash on boot because of MCXSR > differences from more-common x86 hardware. 
> > > > > > > And below is my thought viability of implementing vIOMMU in Qemu: > > > > a) enable >255 vcpus: > > > > o Enable Q35 in Qemu-Xen; > > o Add interrupt remapping in Qemu vIOMMU; > > o Virtual interrupt injection in hypervisor needs to know virtual > > interrupt remapping (IR) structure, since IR is behind vIOAPIC/vMSI, > > which requires new hypervisor interfaces as Andrew pointed out: > > * either for hypervisor to query IR from Qemu which is not > > good; > > * or for Qemu to register IR info to hypervisor which means > > partial IR knowledge implemented in hypervisor (then why not putting > > whole IR emulation in Xen?) > > > > b) support SVM > > > > o Enable Q35 in Qemu-Xen; > > o Add 1st level translation capability in Qemu vIOMMU; > > o VT-d context entry points to guest 1st level translation table > > which is nest-translated by 2nd level translation table so vIOMMU > > structure can be directly linked. It means: > > * Xen IOMMU driver enables nested mode; > > * Introduce a new hypercall so Qemu vIOMMU can register > > GPA root of guest 1st level translation table which is then written > > to context entry in pIOMMU; > > > > c) support VFIO-based user space driver > > > > o Enable Q35 in Qemu-Xen; > > o Leverage existing 2nd level translation implementation in Qemu > > vIOMMU; > > o Change Xen IOMMU to support (IOVA->HPA) translation which > > means decouple current logic from P2M layer (only for GPA->HPA); > > o
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
>>> On 03.06.16 at 15:51, wrote:
> As a quick aside, does Xen currently boot on a Phi? Last time I looked
> at the Phi manual, I would expect Xen to crash on boot because of MCXSR
> differences from more-common x86 hardware.

It does boot, as per the reports we've got. Perhaps, much like I did until I was explicitly told there's a significant difference, you're mixing up the earlier non-self-booting one with KNL?

Jan
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 03/06/16 14:09, Lan, Tianyu wrote:
>
> On 6/3/2016 7:17 PM, Tian, Kevin wrote:
>>> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>>> Sent: Friday, June 03, 2016 2:59 AM
>>>
>>> On 02/06/16 16:03, Lan, Tianyu wrote:
On 5/27/2016 4:19 PM, Lan Tianyu wrote:
> On May 26, 2016 at 19:35, Andrew Cooper wrote:
>> On 26/05/16 09:29, Lan Tianyu wrote:
>>
>> To be viable going forwards, any solution must work with PVH/HVMLite as
>> much as HVM. This alone negates qemu as a viable option.
>>
>> From a design point of view, having Xen needing to delegate to qemu to
>> inject an interrupt into a guest seems backwards.
>
> Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu, and the
> qemu virtual iommu can't work for it. We have to rewrite the virtual
> iommu in Xen, right?
>
>> A whole lot of this would be easier to reason about if/when we get a
>> basic root port implementation in Xen, which is necessary for HVMLite,
>> and which will make the interaction with qemu rather more clean. It is
>> probably worth coordinating work in this area.
>
> The virtual iommu should also be under the basic root port in Xen, right?
>
>> As for the individual issue of 288vcpu support, there are already issues
>> with 64vcpu guests at the moment. While it is certainly fine to remove
>> the hard limit at 255 vcpus, there is a lot of other work required to
>> even get 128vcpu guests stable.
>
> Could you give some pointers to these issues? We are enabling support for
> more vcpus and a guest can basically boot with 255 vcpus without IR
> support. It's very helpful to learn about known issues.
>
> We will also add more tests for 128 vcpus into our regular testing to
> find related bugs. Increasing max vcpus to 255 should be a good start.

> Hi Andrew:
> Could you give more input about the issues with 64 vcpus and what needs
> to be done to make 128vcpu guests stable? We hope to do something to
> improve this. What's the progress of the PCI host bridge in Xen? In your
> opinion, we should do that first, right? Thanks.

>>> Very sorry for the delay.
>>>
>>> There are multiple interacting issues here. On the one side, it would
>>> be useful if we could have a central point of coordination on
>>> PVH/HVMLite work. Roger - as the person who last did HVMLite work,
>>> would you mind organising that?
>>>
>>> For the qemu/xen interaction, the current state is woeful and a tangled
>>> mess. I wish to ensure that we don't make any development decisions
>>> which makes the situation worse.
>>>
>>> In your case, the two motivations are quite different I would recommend
>>> dealing with them independently.
>>>
>>> IIRC, the issue with more than 255 cpus and interrupt remapping is that
>>> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
>>> can't be programmed to generate x2apic interrupts? In principle, if you
>>> don't have an IOAPIC, are there any other issues to be considered? What
>>> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
>>> deliver xapic interrupts?
>>
>> The key is the APIC ID. There is no modification to existing PCI MSI and
>> IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send an
>> interrupt message containing an 8bit APIC ID, which cannot address >255
>> cpus. Interrupt remapping supports 32bit APIC IDs, so it's necessary to
>> enable >255 cpus with x2apic mode.
>>
>> If the LAPIC is in x2apic mode while interrupt remapping is disabled,
>> the IOAPIC cannot deliver interrupts to all cpus in the system if
>> #cpu > 255.
>
> Another key factor: the Linux kernel disables x2apic mode when the max
> APIC id is > 255 and there is no interrupt remapping function. The reason
> for this is what Kevin said. So booting up >255 cpus relies on interrupt
> remapping.

That is an implementation decision of Linux, not an architectural requirement.
We need to carefully distinguish the two (even if it doesn't affect the planned outcome from Xen's point of view), as Linux is not the only operating system we virtualise.

One interesting issue in this area is plain, no-frills HVMLite domains, which have an LAPIC but no IOAPIC, as they have no legacy devices/PCI bus/etc. In this scenario, no vIOMMU would be required for x2apic mode, even if the domain had >255 vcpus.

~Andrew
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 03/06/16 12:17, Tian, Kevin wrote:
>> Very sorry for the delay.
>>
>> There are multiple interacting issues here. On the one side, it would
>> be useful if we could have a central point of coordination on
>> PVH/HVMLite work. Roger - as the person who last did HVMLite work,
>> would you mind organising that?
>>
>> For the qemu/xen interaction, the current state is woeful and a tangled
>> mess. I wish to ensure that we don't make any development decisions
>> which makes the situation worse.
>>
>> In your case, the two motivations are quite different I would recommend
>> dealing with them independently.
>>
>> IIRC, the issue with more than 255 cpus and interrupt remapping is that
>> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
>> can't be programmed to generate x2apic interrupts? In principle, if you
>> don't have an IOAPIC, are there any other issues to be considered? What
>> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
>> deliver xapic interrupts?
>
> The key is the APIC ID. There is no modification to existing PCI MSI and
> IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send an
> interrupt message containing an 8bit APIC ID, which cannot address >255
> cpus. Interrupt remapping supports 32bit APIC IDs, so it's necessary to
> enable >255 cpus with x2apic mode.

Thanks for clarifying.

> If the LAPIC is in x2apic mode while interrupt remapping is disabled,
> the IOAPIC cannot deliver interrupts to all cpus in the system if
> #cpu > 255.

Ok. So not ideal (and we certainly want to address it), but this isn't a complete show stopper for a guest.

>> On the other side of things, what is IGD passthrough going to look like
>> in Skylake? Is there any device-model interaction required (i.e. the
>> opregion), or will it work as a completely standalone device? What are
>> your plans with the interaction of virtual graphics and shared virtual
>> memory?
>
> The plan is to use a so-called universal pass-through driver in the guest
> which only accesses standard PCI resources (w/o opregion, PCH/MCH, etc.)

This is fantastic news.

> Here is a brief list of potential usages relying on vIOMMU:
>
> a) enable >255 vcpus on Xeon Phi, as the initial purpose of this thread.
> It requires interrupt remapping capability present on vIOMMU;
>
> b) support guest SVM (Shared Virtual Memory), which relies on the
> 1st level translation table capability (GVA->GPA) on vIOMMU. pIOMMU
> needs to enable both 1st level and 2nd level translation in nested
> mode (GVA->GPA->HPA) for the passthrough device. IGD passthrough is
> the main usage today (to support the OpenCL 2.0 SVM feature). In the
> future SVM might be used by other I/O devices too;
>
> c) support VFIO-based user space drivers (e.g. DPDK) in the guest,
> which relies on the 2nd level translation capability (IOVA->GPA) on
> vIOMMU. The pIOMMU 2nd level becomes a shadowing structure of the
> vIOMMU 2nd level by replacing GPA with HPA (becomes IOVA->HPA);

All of these look like interesting things to do. I know there is a lot of interest for b).

As a quick aside, does Xen currently boot on a Phi? Last time I looked at the Phi manual, I would expect Xen to crash on boot because of MCXSR differences from more-common x86 hardware.

> And below are my thoughts on the viability of implementing vIOMMU in Qemu:
>
> a) enable >255 vcpus:
>
> o Enable Q35 in Qemu-Xen;
> o Add interrupt remapping in Qemu vIOMMU;
> o Virtual interrupt injection in the hypervisor needs to know the virtual
> interrupt remapping (IR) structure, since IR is behind vIOAPIC/vMSI,
> which requires new hypervisor interfaces as Andrew pointed out:
> * either for the hypervisor to query IR from Qemu, which is not good;
> * or for Qemu to register IR info to the hypervisor, which means
> partial IR knowledge implemented in the hypervisor (then why not put
> the whole IR emulation in Xen?)
>
> b) support SVM
>
> o Enable Q35 in Qemu-Xen;
> o Add 1st level translation capability in Qemu vIOMMU;
> o VT-d context entry points to the guest 1st level translation table,
> which is nest-translated by the 2nd level translation table so the
> vIOMMU structure can be directly linked. It means:
> * Xen IOMMU driver enables nested mode;
> * Introduce a new hypercall so Qemu vIOMMU can register the
> GPA root of the guest 1st level translation table, which is then written
> to the context entry in the pIOMMU;
>
> c) support VFIO-based user space drivers
>
> o Enable Q35 in Qemu-Xen;
> o Leverage the existing 2nd level translation implementation in Qemu
> vIOMMU;
> o Change Xen IOMMU to support (IOVA->HPA) translation, which
> means decoupling the current logic from the P2M layer (only for
> GPA->HPA);
> o As part of the shadowing approach, the Xen IOMMU driver needs to
> know both (IOVA->GPA) and (GPA->HPA) info to update the (IOVA->HPA)
> mapping in case either one is changed. So a new interface is required
> for Qemu vIOMMU to propagate
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 6/3/2016 7:17 PM, Tian, Kevin wrote: From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: Friday, June 03, 2016 2:59 AM On 02/06/16 16:03, Lan, Tianyu wrote: On 5/27/2016 4:19 PM, Lan Tianyu wrote: On 2016年05月26日 19:35, Andrew Cooper wrote: On 26/05/16 09:29, Lan Tianyu wrote: To be viable going forwards, any solution must work with PVH/HVMLite as much as HVM. This alone negates qemu as a viable option. From a design point of view, having Xen needing to delegate to qemu to inject an interrupt into a guest seems backwards. Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and the qemu virtual iommu can't work for it. We have to rewrite virtual iommu in the Xen, right? A whole lot of this would be easier to reason about if/when we get a basic root port implementation in Xen, which is necessary for HVMLite, and which will make the interaction with qemu rather more clean. It is probably worth coordinating work in this area. The virtual iommu also should be under basic root port in Xen, right? As for the individual issue of 288vcpu support, there are already issues with 64vcpu guests at the moment. While it is certainly fine to remove the hard limit at 255 vcpus, there is a lot of other work required to even get 128vcpu guests stable. Could you give some points to these issues? We are enabling more vcpus support and it can boot up 255 vcpus without IR support basically. It's very helpful to learn about known issues. We will also add more tests for 128 vcpus into our regular test to find related bugs. Increasing max vcpu to 255 should be a good start. Hi Andrew: Could you give more inputs about issues with 64 vcpus and what needs to be done to make 128vcpu guest stable? We hope to do somethings to improve them. What's progress of PCI host bridge in Xen? From your opinion, we should do that first, right? Thanks. Very sorry for the delay. There are multiple interacting issues here. 
On the one side, it would be useful if we could have a central point of coordination on PVH/HVMLite work. Roger - as the person who last did HVMLite work, would you mind organising that? For the qemu/xen interaction, the current state is woeful and a tangled mess. I wish to ensure that we don't make any development decisions which make the situation worse. In your case, the two motivations are quite different; I would recommend dealing with them independently. IIRC, the issue with more than 255 cpus and interrupt remapping is that you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs can't be programmed to generate x2apic interrupts? In principle, if you don't have an IOAPIC, are there any other issues to be considered? What happens if you configure the LAPICs in x2apic mode, but have the IOAPIC deliver xapic interrupts? The key is the APIC ID. There is no modification to existing PCI MSI and IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send interrupt messages containing an 8-bit APIC ID, which cannot address >255 cpus. Interrupt remapping supports 32-bit APIC IDs, so it's necessary for enabling >255 cpus with x2apic mode. If the LAPIC is in x2apic mode while interrupt remapping is disabled, the IOAPIC cannot deliver interrupts to all cpus in the system if #cpu > 255. Another key factor: the Linux kernel disables x2apic mode when the max APIC ID is > 255 and there is no interrupt remapping function. The reason for this is what Kevin said. So booting up >255 cpus relies on interrupt remapping.
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Friday, June 03, 2016 2:59 AM > > On 02/06/16 16:03, Lan, Tianyu wrote: > > On 5/27/2016 4:19 PM, Lan Tianyu wrote: > >> On 2016年05月26日 19:35, Andrew Cooper wrote: > >>> On 26/05/16 09:29, Lan Tianyu wrote: > >>> > >>> To be viable going forwards, any solution must work with PVH/HVMLite as > >>> much as HVM. This alone negates qemu as a viable option. > >>> > >>> From a design point of view, having Xen needing to delegate to qemu to > >>> inject an interrupt into a guest seems backwards. > >>> > >> > >> Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and > >> the qemu virtual iommu can't work for it. We have to rewrite virtual > >> iommu in the Xen, right? > >> > >>> > >>> A whole lot of this would be easier to reason about if/when we get a > >>> basic root port implementation in Xen, which is necessary for HVMLite, > >>> and which will make the interaction with qemu rather more clean. It is > >>> probably worth coordinating work in this area. > >> > >> The virtual iommu also should be under basic root port in Xen, right? > >> > >>> > >>> As for the individual issue of 288vcpu support, there are already > >>> issues > >>> with 64vcpu guests at the moment. While it is certainly fine to remove > >>> the hard limit at 255 vcpus, there is a lot of other work required to > >>> even get 128vcpu guests stable. > >> > >> > >> Could you give some points to these issues? We are enabling more vcpus > >> support and it can boot up 255 vcpus without IR support basically. It's > >> very helpful to learn about known issues. > >> > >> We will also add more tests for 128 vcpus into our regular test to find > >> related bugs. Increasing max vcpu to 255 should be a good start. > > > > Hi Andrew: > > Could you give more inputs about issues with 64 vcpus and what needs to > > be done to make 128vcpu guest stable? We hope to do somethings to > > improve them. 
> > > > What's progress of PCI host bridge in Xen? From your opinion, we should > > do that first, right? Thanks. > > Very sorry for the delay. > > There are multiple interacting issues here. On the one side, it would > be useful if we could have a central point of coordination on > PVH/HVMLite work. Roger - as the person who last did HVMLite work, > would you mind organising that? > > For the qemu/xen interaction, the current state is woeful and a tangled > mess. I wish to ensure that we don't make any development decisions > which makes the situation worse. > > In your case, the two motivations are quite different I would recommend > dealing with them independently. > > IIRC, the issue with more than 255 cpus and interrupt remapping is that > you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs > can't be programmed to generate x2apic interrupts? In principle, if you > don't have an IOAPIC, are there any other issues to be considered? What > happens if you configure the LAPICs in x2apic mode, but have the IOAPIC > deliver xapic interrupts? The key is the APIC ID. There is no modification to existing PCI MSI and IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send interrupt message containing 8bit APIC ID, which cannot address >255 cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to enable >255 cpus with x2apic mode. If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC cannot deliver interrupts to all cpus in the system if #cpu > 255. > > On the other side of things, what is IGD passthrough going to look like > in Skylake? Is there any device-model interaction required (i.e. the > opregion), or will it work as a completely standalone device? What are > your plans with the interaction of virtual graphics and shared virtual > memory? > The plan is to use a so-called universal pass-through driver in the guest which only accesses standard PCI resource (w/o opregion, PCH/MCH, etc.) 
Here is a brief list of potential usages relying on vIOMMU: a) enable >255 vcpus on Xeon Phi, as the initial purpose of this thread. It requires interrupt remapping capability present on vIOMMU; b) support guest SVM (Shared Virtual Memory), which relies on the 1st level translation table capability (GVA->GPA) on vIOMMU. pIOMMU needs to enable both 1st level and 2nd level translation in nested mode (GVA->GPA->HPA) for passthrough device. IGD passthrough is the main usage today (to support OpenCL 2.0 SVM feature). In the future SVM might be used by other I/O devices too; c) support VFIO-based user space driver (e.g. DPDK) in the guest, which relies on the 2nd level translation capability (IOVA->GPA) on vIOMMU. pIOMMU 2nd level becomes a shadowing structure of vIOMMU 2nd level by replacing GPA with HPA (becomes IOVA->HPA); And below are my thoughts on the viability of implementing vIOMMU in Qemu: a) enable >255 vcpus: o Enable Q35 in Qemu-Xen; o Add interrupt remapping in Qemu vIOMMU; o Virtual interrupt
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 02/06/16 16:03, Lan, Tianyu wrote: > On 5/27/2016 4:19 PM, Lan Tianyu wrote: >> On 2016年05月26日 19:35, Andrew Cooper wrote: >>> On 26/05/16 09:29, Lan Tianyu wrote: >>> >>> To be viable going forwards, any solution must work with PVH/HVMLite as >>> much as HVM. This alone negates qemu as a viable option. >>> >>> From a design point of view, having Xen needing to delegate to qemu to >>> inject an interrupt into a guest seems backwards. >>> >> >> Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and >> the qemu virtual iommu can't work for it. We have to rewrite virtual >> iommu in the Xen, right? >> >>> >>> A whole lot of this would be easier to reason about if/when we get a >>> basic root port implementation in Xen, which is necessary for HVMLite, >>> and which will make the interaction with qemu rather more clean. It is >>> probably worth coordinating work in this area. >> >> The virtual iommu also should be under basic root port in Xen, right? >> >>> >>> As for the individual issue of 288vcpu support, there are already >>> issues >>> with 64vcpu guests at the moment. While it is certainly fine to remove >>> the hard limit at 255 vcpus, there is a lot of other work required to >>> even get 128vcpu guests stable. >> >> >> Could you give some points to these issues? We are enabling more vcpus >> support and it can boot up 255 vcpus without IR support basically. It's >> very helpful to learn about known issues. >> >> We will also add more tests for 128 vcpus into our regular test to find >> related bugs. Increasing max vcpu to 255 should be a good start. > > Hi Andrew: > Could you give more inputs about issues with 64 vcpus and what needs to > be done to make 128vcpu guest stable? We hope to do somethings to > improve them. > > What's progress of PCI host bridge in Xen? From your opinion, we should > do that first, right? Thanks. Very sorry for the delay. There are multiple interacting issues here. 
On the one side, it would be useful if we could have a central point of coordination on PVH/HVMLite work. Roger - as the person who last did HVMLite work, would you mind organising that? For the qemu/xen interaction, the current state is woeful and a tangled mess. I wish to ensure that we don't make any development decisions which make the situation worse. In your case, the two motivations are quite different; I would recommend dealing with them independently. IIRC, the issue with more than 255 cpus and interrupt remapping is that you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs can't be programmed to generate x2apic interrupts? In principle, if you don't have an IOAPIC, are there any other issues to be considered? What happens if you configure the LAPICs in x2apic mode, but have the IOAPIC deliver xapic interrupts? On the other side of things, what is IGD passthrough going to look like in Skylake? Is there any device-model interaction required (i.e. the opregion), or will it work as a completely standalone device? What are your plans with the interaction of virtual graphics and shared virtual memory? ~Andrew
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 5/27/2016 4:19 PM, Lan Tianyu wrote: On 2016年05月26日 19:35, Andrew Cooper wrote: On 26/05/16 09:29, Lan Tianyu wrote: To be viable going forwards, any solution must work with PVH/HVMLite as much as HVM. This alone negates qemu as a viable option. From a design point of view, having Xen needing to delegate to qemu to inject an interrupt into a guest seems backwards. Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and the qemu virtual iommu can't work for it. We have to rewrite virtual iommu in the Xen, right? A whole lot of this would be easier to reason about if/when we get a basic root port implementation in Xen, which is necessary for HVMLite, and which will make the interaction with qemu rather more clean. It is probably worth coordinating work in this area. The virtual iommu also should be under basic root port in Xen, right? As for the individual issue of 288vcpu support, there are already issues with 64vcpu guests at the moment. While it is certainly fine to remove the hard limit at 255 vcpus, there is a lot of other work required to even get 128vcpu guests stable. Could you give some points to these issues? We are enabling more vcpus support and it can boot up 255 vcpus without IR support basically. It's very helpful to learn about known issues. We will also add more tests for 128 vcpus into our regular test to find related bugs. Increasing max vcpu to 255 should be a good start. Hi Andrew: Could you give more inputs about issues with 64 vcpus and what needs to be done to make 128vcpu guest stable? We hope to do somethings to improve them. What's progress of PCI host bridge in Xen? From your opinion, we should do that first, right? Thanks. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On Thu, May 26, 2016 at 12:35 PM, Andrew Cooperwrote: > On 26/05/16 09:29, Lan Tianyu wrote: >> Hi All: >> We try pushing virtual iommu support for Xen guest and there are some >> features blocked by it. >> >> Motivation: >> --- >> 1) Add SVM(Shared Virtual Memory) support for Xen guest >> To support iGFX pass-through for SVM enabled devices, it requires >> virtual iommu support to emulate related registers and intercept/handle >> guest SVM configure in the VMM. >> >> 2) Increase max vcpu support for one VM. >> >> So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance >> Computing) cloud computing, it requires more vcpus support in a single >> VM. The usage model is to create just one VM on a machine with the >> same number vcpus as logical cpus on the host and pin vcpu on each >> logical cpu in order to get good compute performance. >> >> Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and >> supports 288 logical cpus. So we hope VM can support 288 vcpu >> to meet HPC requirement. >> >> Current Linux kernel requires IR(interrupt remapping) when MAX APIC >> ID is > 255 because interrupt only can be delivered among 0~255 cpus >> without IR. IR in VM relies on the virtual iommu support. >> >> KVM Virtual iommu support status >> >> Current, Qemu has a basic virtual iommu to do address translation for >> virtual device and it only works for the Q35 machine type. KVM reuses it >> and Redhat is adding IR to support more than 255 vcpus. >> >> How to add virtual iommu for Xen? >> - >> First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't >> support Q35 so far. Enabling Q35 for Xen seems not a short term task. >> Anthony did some related jobs before. >> >> I'd like to see your comments about how to implement virtual iommu for Xen. >> >> 1) Reuse Qemu virtual iommu or write a separate one for Xen? >> 2) Enable Q35 for Xen to reuse Qemu virtual iommu? >> >> Your comments are very appreciated. Thanks a lot. 
> > To be viable going forwards, any solution must work with PVH/HVMLite as > much as HVM. This alone negates qemu as a viable option. There's a big difference between "suboptimal" and "not viable". Obviously it would be nice to be able to have HVMLite do graphics pass-through, but if this functionality ends up being HVM-only, is that really such a huge issue? If, as Paul seems to indicate, the extra work to get the functionality in Xen isn't very large, then it's worth pursuing; but I don't think we should take other options off the table. -George
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> From: Paul Durrant [mailto:paul.durr...@citrix.com] > Sent: Friday, May 27, 2016 4:47 PM > > > > > > A whole lot of this would be easier to reason about if/when we get a > > > basic root port implementation in Xen, which is necessary for HVMLite, > > > and which will make the interaction with qemu rather more clean. It is > > > probably worth coordinating work in this area. > > > > Would it make Xen too complex? Qemu also has its own root port > > implementation, and then you need some tricks within Qemu to not > > use its own root port but instead registering to Xen root port. Why is > > such movement more clean? > > > > Upstream QEMU already registers PCI BDFs with Xen, and Xen already handles > cf8 and cfc > accesses (to turn them into single config space read/write ioreqs). So, it > really isn't much > of a leap to put the root port implementation in Xen. > > Paul > Thanks for your information. I didn't realize that fact. Curious: is anyone already working on basic root port support in Xen? If yes, what's the current progress? Thanks Kevin
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> -Original Message- > From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of > Tian, Kevin > Sent: 27 May 2016 09:35 > To: Andrew Cooper; Lan, Tianyu; jbeul...@suse.com; sstabell...@kernel.org; > Ian Jackson; xen-de...@lists.xensource.com; Eddie Dong; Nakajima, Jun; > yang.zhang...@gmail.com; Anthony Perard > Subject: Re: [Xen-devel] Discussion about virtual iommu support for Xen > guest > > > From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > > Sent: Thursday, May 26, 2016 7:36 PM > > > > On 26/05/16 09:29, Lan Tianyu wrote: > > > Hi All: > > > We try pushing virtual iommu support for Xen guest and there are some > > > features blocked by it. > > > > > > Motivation: > > > --- > > > 1) Add SVM(Shared Virtual Memory) support for Xen guest > > > To support iGFX pass-through for SVM enabled devices, it requires > > > virtual iommu support to emulate related registers and intercept/handle > > > guest SVM configure in the VMM. > > > > > > 2) Increase max vcpu support for one VM. > > > > > > So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance > > > Computing) cloud computing, it requires more vcpus support in a single > > > VM. The usage model is to create just one VM on a machine with the > > > same number vcpus as logical cpus on the host and pin vcpu on each > > > logical cpu in order to get good compute performance. > > > > > > Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and > > > supports 288 logical cpus. So we hope VM can support 288 vcpu > > > to meet HPC requirement. > > > > > > Current Linux kernel requires IR(interrupt remapping) when MAX APIC > > > ID is > 255 because interrupt only can be delivered among 0~255 cpus > > > without IR. IR in VM relies on the virtual iommu support. > > > > > > KVM Virtual iommu support status > > > > > > Current, Qemu has a basic virtual iommu to do address translation for > > > virtual device and it only works for the Q35 machine type. 
KVM reuses it > > > and Redhat is adding IR to support more than 255 vcpus. > > > > > > How to add virtual iommu for Xen? > > > - > > > First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't > > > support Q35 so far. Enabling Q35 for Xen seems not a short term task. > > > Anthony did some related jobs before. > > > > > > I'd like to see your comments about how to implement virtual iommu for > Xen. > > > > > > 1) Reuse Qemu virtual iommu or write a separate one for Xen? > > > 2) Enable Q35 for Xen to reuse Qemu virtual iommu? > > > > > > Your comments are very appreciated. Thanks a lot. > > > > To be viable going forwards, any solution must work with PVH/HVMLite as > > much as HVM. This alone negates qemu as a viable option. > > KVM wants things done in Qemu as much as possible. Now Xen may > have more things moved into hypervisor instead for HVMLite. The end > result is that many new platform features from IHVs will require > double effort in the future (nvdimm is another example) which means > much longer enabling path to bring those new features to customers. > > I can understand the importance of covering HVMLite in Xen community, > but is it really the only factor to negate Qemu option? > > > > > From a design point of view, having Xen needing to delegate to qemu to > > inject an interrupt into a guest seems backwards. > > > > > > A whole lot of this would be easier to reason about if/when we get a > > basic root port implementation in Xen, which is necessary for HVMLite, > > and which will make the interaction with qemu rather more clean. It is > > probably worth coordinating work in this area. > > Would it make Xen too complex? Qemu also has its own root port > implementation, and then you need some tricks within Qemu to not > use its own root port but instead registering to Xen root port. Why is > such movement more clean? 
> Upstream QEMU already registers PCI BDFs with Xen, and Xen already handles cf8 and cfc accesses (to turn them into single config space read/write ioreqs). So, it really isn't much of a leap to put the root port implementation in Xen. Paul > > > > > > As for the individual issue of 288vcpu support, there are already issues > > with 64vcpu guests at the moment. While it is certainly fine to remove > > the hard limit at 255 vcpus, there is a lot of other work required to > > even get 128vcpu guests stable. > > > > Thanks > Kevin
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com] > Sent: Thursday, May 26, 2016 7:36 PM > > On 26/05/16 09:29, Lan Tianyu wrote: > > Hi All: > > We try pushing virtual iommu support for Xen guest and there are some > > features blocked by it. > > > > Motivation: > > --- > > 1) Add SVM(Shared Virtual Memory) support for Xen guest > > To support iGFX pass-through for SVM enabled devices, it requires > > virtual iommu support to emulate related registers and intercept/handle > > guest SVM configure in the VMM. > > > > 2) Increase max vcpu support for one VM. > > > > So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance > > Computing) cloud computing, it requires more vcpus support in a single > > VM. The usage model is to create just one VM on a machine with the > > same number vcpus as logical cpus on the host and pin vcpu on each > > logical cpu in order to get good compute performance. > > > > Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and > > supports 288 logical cpus. So we hope VM can support 288 vcpu > > to meet HPC requirement. > > > > Current Linux kernel requires IR(interrupt remapping) when MAX APIC > > ID is > 255 because interrupt only can be delivered among 0~255 cpus > > without IR. IR in VM relies on the virtual iommu support. > > > > KVM Virtual iommu support status > > > > Current, Qemu has a basic virtual iommu to do address translation for > > virtual device and it only works for the Q35 machine type. KVM reuses it > > and Redhat is adding IR to support more than 255 vcpus. > > > > How to add virtual iommu for Xen? > > - > > First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't > > support Q35 so far. Enabling Q35 for Xen seems not a short term task. > > Anthony did some related jobs before. > > > > I'd like to see your comments about how to implement virtual iommu for Xen. > > > > 1) Reuse Qemu virtual iommu or write a separate one for Xen? 
> > 2) Enable Q35 for Xen to reuse Qemu virtual iommu? > > > > Your comments are very appreciated. Thanks a lot. > > To be viable going forwards, any solution must work with PVH/HVMLite as > much as HVM. This alone negates qemu as a viable option. KVM wants things done in Qemu as much as possible. Now Xen may have more things moved into hypervisor instead for HVMLite. The end result is that many new platform features from IHVs will require double effort in the future (nvdimm is another example) which means much longer enabling path to bring those new features to customers. I can understand the importance of covering HVMLite in Xen community, but is it really the only factor to negate Qemu option? > > From a design point of view, having Xen needing to delegate to qemu to > inject an interrupt into a guest seems backwards. > > > A whole lot of this would be easier to reason about if/when we get a > basic root port implementation in Xen, which is necessary for HVMLite, > and which will make the interaction with qemu rather more clean. It is > probably worth coordinating work in this area. Would it make Xen too complex? Qemu also has its own root port implementation, and then you need some tricks within Qemu to not use its own root port but instead registering to Xen root port. Why is such movement more clean? > > > As for the individual issue of 288vcpu support, there are already issues > with 64vcpu guests at the moment. While it is certainly fine to remove > the hard limit at 255 vcpus, there is a lot of other work required to > even get 128vcpu guests stable. > Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 2016年05月26日 19:35, Andrew Cooper wrote: > On 26/05/16 09:29, Lan Tianyu wrote: > > To be viable going forwards, any solution must work with PVH/HVMLite as > much as HVM. This alone negates qemu as a viable option. > > From a design point of view, having Xen needing to delegate to qemu to > inject an interrupt into a guest seems backwards. > Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and the qemu virtual iommu can't work for it. We have to rewrite virtual iommu in the Xen, right? > > A whole lot of this would be easier to reason about if/when we get a > basic root port implementation in Xen, which is necessary for HVMLite, > and which will make the interaction with qemu rather more clean. It is > probably worth coordinating work in this area. The virtual iommu also should be under basic root port in Xen, right? > > As for the individual issue of 288vcpu support, there are already issues > with 64vcpu guests at the moment. While it is certainly fine to remove > the hard limit at 255 vcpus, there is a lot of other work required to > even get 128vcpu guests stable. Could you give some pointers to these issues? We are enabling more vcpus support and it can basically boot up 255 vcpus without IR support. It's very helpful to learn about known issues. We will also add more tests for 128 vcpus into our regular test to find related bugs. Increasing max vcpu to 255 should be a good start. > > ~Andrew > -- Best regards Tianyu Lan
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> From: Yang Zhang [mailto:yang.zhang...@gmail.com] > Sent: Friday, May 27, 2016 10:26 AM > > On 2016/5/26 16:29, Lan Tianyu wrote: > > Hi All: > > We try pushing virtual iommu support for Xen guest and there are some > > features blocked by it. > > > > Motivation: > > --- > > 1) Add SVM(Shared Virtual Memory) support for Xen guest > > To support iGFX pass-through for SVM enabled devices, it requires > > virtual iommu support to emulate related registers and intercept/handle > > guest SVM configure in the VMM. > > IIRC, SVM needs the nested IOMMU support not only virtual iommu. Correct > me if i am wrong. > Nesting is in the physical IOMMU. You don't need to present nesting in the vIOMMU. Thanks Kevin
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
> From: Lan, Tianyu > Sent: Friday, May 27, 2016 10:27 AM > > On 2016年05月26日 16:42, Dong, Eddie wrote: > > If enabling virtual Q35 solves the problem, it has the advantage: When more > > and more virtual IOMMU features come (likely), we can reuse the KVM code for Xen. > > How big is the effort for virtual Q35? > > I think most of the effort is to rebuild all ACPI tables for Q35 and add > Q35 support in hvmloader. My concern is about the new ACPI tables' > compatibility, especially with Windows guests. > Another question is how tightly this vIOMMU implementation is bound to Q35? Can it work with the old chipset too, and if yes, how big is the effort compared to other options? Thanks Kevin
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 2016/5/26 16:29, Lan Tianyu wrote: > Hi All: > We try pushing virtual iommu support for Xen guest and there are some > features blocked by it. > > Motivation: > --- > 1) Add SVM(Shared Virtual Memory) support for Xen guest > To support iGFX pass-through for SVM enabled devices, it requires > virtual iommu support to emulate related registers and intercept/handle > guest SVM configure in the VMM. IIRC, SVM needs nested IOMMU support, not only a virtual iommu. Correct me if I am wrong. > 2) Increase max vcpu support for one VM. > > So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance Computing) cloud computing, it requires more vcpus support in a single VM. The usage model is to create just one VM on a machine with the same number vcpus as logical cpus on the host and pin vcpu on each logical cpu in order to get good compute performance. > > Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and supports 288 logical cpus. So we hope VM can support 288 vcpu to meet HPC requirement. > > Current Linux kernel requires IR(interrupt remapping) when MAX APIC ID is > 255 because interrupt only can be delivered among 0~255 cpus without IR. IR in VM relies on the virtual iommu support. > > KVM Virtual iommu support status > > Current, Qemu has a basic virtual iommu to do address translation for virtual device and it only works for the Q35 machine type. KVM reuses it and Redhat is adding IR to support more than 255 vcpus. > > How to add virtual iommu for Xen? > - > First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't support Q35 so far. Enabling Q35 for Xen seems not a short term task. Anthony did some related jobs before. > > I'd like to see your comments about how to implement virtual iommu for Xen. > > 1) Reuse Qemu virtual iommu or write a separate one for Xen? > 2) Enable Q35 for Xen to reuse Qemu virtual iommu? > > Your comments are very appreciated. Thanks a lot. -- best regards yang
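[Editor's note] The Qemu vIOMMU mentioned in the thread is the emulated intel-iommu device, which only works on the Q35 machine type. As a hedged illustration, the invocation might be assembled as below; note that the `intremap=on` and `kernel-irqchip=split` option syntax landed in QEMU releases after this 2016 thread, so this reflects later QEMU rather than what was available at the time:

```python
# Sketch of a QEMU command line enabling the emulated Intel IOMMU with
# interrupt remapping (IR). Option syntax is from QEMU releases newer
# than this thread (~2.8+); helper name is purely illustrative.

def qemu_viommu_args(vcpus: int) -> list:
    return [
        "qemu-system-x86_64",
        # The emulated intel-iommu device requires the Q35 machine type;
        # interrupt remapping additionally requires the split irqchip.
        "-machine", "q35,kernel-irqchip=split",
        "-device", "intel-iommu,intremap=on",
        # With IR in place the guest can address more than 255 vcpus.
        "-smp", str(vcpus),
    ]
```

This is the configuration KVM/Redhat were building on for >255-vcpu guests; the question in the thread is whether Xen should reuse this path or implement a vIOMMU of its own.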
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 2016-05-26 16:42, Dong, Eddie wrote: > If enabling virtual Q35 solves the problem, it has the advantage: When more > and more virtual IOMMU feature comes (likely), we can reuse the KVM code for > Xen. > How big is the effort for virtual Q35? I think most of the effort is rebuilding all the ACPI tables for Q35 and adding Q35 support to hvmloader. My concern is about compatibility issues with the new ACPI tables, especially with Windows guests. -- Best regards Tianyu Lan > > Thx Eddie >
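[Editor's note] Among the ACPI tables that would need to be produced for a vIOMMU is the DMAR table, through which the guest OS discovers the (virtual) VT-d hardware and its interrupt-remapping capability. A rough sketch of its fixed layout per the VT-d specification (the OEM/creator identifiers here are invented placeholders, not what hvmloader or Qemu emit):

```python
import struct

def dmar_table(host_addr_width: int, intr_remap: bool) -> bytes:
    """Pack the fixed portion of an ACPI DMAR table: the standard 36-byte
    ACPI header, then host address width, flags, and 10 reserved bytes.
    DRHD and other remapping structures would be appended after this.
    OEM/creator IDs below are illustrative placeholders."""
    flags = 0x1 if intr_remap else 0x0  # bit 0: interrupt remapping supported
    length = 48                         # 36-byte header + 12 fixed DMAR bytes
    body = struct.pack(
        "<4sIBB6s8sI4sI",
        b"DMAR", length,
        1,                              # revision
        0,                              # checksum placeholder, patched below
        b"XENHVM", b"DMARTBL\x00", 1,   # OEM id / table id / revision (made up)
        b"HVML", 1,                     # creator id / revision (made up)
    ) + struct.pack("<BB10s", host_addr_width - 1, flags, b"\x00" * 10)
    # ACPI checksum rule: all bytes of the table must sum to 0 mod 256.
    csum = (-sum(body)) & 0xFF
    return body[:9] + bytes([csum]) + body[10:]
```

The compatibility worry in the thread is real: Windows guests validate these tables strictly, so getting the checksums, lengths, and structure entries exactly right matters.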
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
On 26/05/16 09:29, Lan Tianyu wrote: > Hi All: > We try pushing virtual iommu support for Xen guest and there are some > features blocked by it. > > Motivation: > --- > 1) Add SVM(Shared Virtual Memory) support for Xen guest > To support iGFX pass-through for SVM enabled devices, it requires > virtual iommu support to emulate related registers and intercept/handle > guest SVM configure in the VMM. > > 2) Increase max vcpu support for one VM. > > So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance > Computing) cloud computing, it requires more vcpus support in a single > VM. The usage model is to create just one VM on a machine with the > same number vcpus as logical cpus on the host and pin vcpu on each > logical cpu in order to get good compute performance. > > Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and > supports 288 logical cpus. So we hope VM can support 288 vcpu > to meet HPC requirement. > > Current Linux kernel requires IR(interrupt remapping) when MAX APIC > ID is > 255 because interrupt only can be delivered among 0~255 cpus > without IR. IR in VM relies on the virtual iommu support. > > KVM Virtual iommu support status > > Current, Qemu has a basic virtual iommu to do address translation for > virtual device and it only works for the Q35 machine type. KVM reuses it > and Redhat is adding IR to support more than 255 vcpus. > > How to add virtual iommu for Xen? > - > First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't > support Q35 so far. Enabling Q35 for Xen seems not a short term task. > Anthony did some related jobs before. > > I'd like to see your comments about how to implement virtual iommu for Xen. > > 1) Reuse Qemu virtual iommu or write a separate one for Xen? > 2) Enable Q35 for Xen to reuse Qemu virtual iommu? > > Your comments are very appreciated. Thanks a lot. To be viable going forwards, any solution must work with PVH/HVMLite as much as HVM. 
This alone negates qemu as a viable option. From a design point of view, having Xen needing to delegate to qemu to inject an interrupt into a guest seems backwards. A whole lot of this would be easier to reason about if/when we get a basic root port implementation in Xen, which is necessary for HVMLite, and which will make the interaction with qemu rather more clean. It is probably worth coordinating work in this area. As for the individual issue of 288vcpu support, there are already issues with 64vcpu guests at the moment. While it is certainly fine to remove the hard limit at 255 vcpus, there is a lot of other work required to even get 128vcpu guests stable. ~Andrew
Re: [Xen-devel] Discussion about virtual iommu support for Xen guest
If enabling virtual Q35 solves the problem, it has an advantage: as more and more virtual IOMMU features come along (which is likely), we can reuse the KVM code for Xen. How big is the effort for virtual Q35? Thx Eddie > -----Original Message----- > From: Lan, Tianyu > Sent: Thursday, May 26, 2016 4:30 PM > To: jbeul...@suse.com; sstabell...@kernel.org; ian.jack...@eu.citrix.com; > xen-de...@lists.xensource.com; Tian, Kevin; Dong, > Eddie ; Nakajima, Jun ; > yang.zhang...@gmail.com; anthony.per...@citrix.com > Subject: Discussion about virtual iommu support for Xen guest > > Hi All: > We try pushing virtual iommu support for Xen guest and there are some > features blocked by it. > > Motivation: > --- > 1) Add SVM(Shared Virtual Memory) support for Xen guest To support iGFX > pass-through for SVM enabled devices, it requires virtual iommu support to > emulate related registers and intercept/handle guest SVM configure in the > VMM. > > 2) Increase max vcpu support for one VM. > > So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance > Computing) cloud computing, it requires more vcpus support in a single VM. > The usage model is to create just one VM on a machine with the same number > vcpus as logical cpus on the host and pin vcpu on each logical cpu in order > to get > good compute performance. > > Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and supports > 288 logical cpus. So we hope VM can support 288 vcpu to meet HPC > requirement. > > Current Linux kernel requires IR(interrupt remapping) when MAX APIC ID is > > 255 because interrupt only can be delivered among 0~255 cpus without IR. IR in > VM relies on the virtual iommu support. > > KVM Virtual iommu support status > > Current, Qemu has a basic virtual iommu to do address translation for virtual > device and it only works for the Q35 machine type. KVM reuses it and Redhat is > adding IR to support more than 255 vcpus. > > How to add virtual iommu for Xen?
> - > First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't > support Q35 so far. Enabling Q35 for Xen seems not a short term task. > Anthony did some related jobs before. > > I'd like to see your comments about how to implement virtual iommu for Xen. > > 1) Reuse Qemu virtual iommu or write a separate one for Xen? > 2) Enable Q35 for Xen to reuse Qemu virtual iommu? > > Your comments are very appreciated. Thanks a lot. > -- > Best regards > Tianyu Lan
[Xen-devel] Discussion about virtual iommu support for Xen guest
Hi All: We are trying to push virtual iommu support for Xen guests, and there are some features blocked on it. Motivation: --- 1) Add SVM (Shared Virtual Memory) support for Xen guests. To support iGFX pass-through for SVM-enabled devices, virtual iommu support is required to emulate the related registers and to intercept/handle guest SVM configuration in the VMM. 2) Increase the max vcpu count supported in one VM. So far, the max vcpu count for a Xen HVM guest is 128. HPC (High Performance Computing) cloud computing requires support for more vcpus in a single VM. The usage model is to create just one VM on a machine, with the same number of vcpus as logical cpus on the host, and to pin each vcpu to a logical cpu in order to get good compute performance. Intel Xeon Phi KNL (Knights Landing) is dedicated to the HPC market and supports 288 logical cpus, so we hope a VM can support 288 vcpus to meet the HPC requirement. The current Linux kernel requires IR (interrupt remapping) when the maximum APIC ID is above 255, because without IR interrupts can only be delivered among cpus 0~255. IR in a VM relies on virtual iommu support. KVM virtual iommu support status: Currently, Qemu has a basic virtual iommu that does address translation for virtual devices, and it only works with the Q35 machine type. KVM reuses it, and Redhat is adding IR on top to support more than 255 vcpus. How to add a virtual iommu for Xen? - The first idea that came to my mind is to reuse the Qemu virtual iommu, but Xen doesn't support Q35 so far, and enabling Q35 for Xen does not look like a short-term task. Anthony did some related work before. I'd like to see your comments on how to implement a virtual iommu for Xen. 1) Reuse the Qemu virtual iommu or write a separate one for Xen? 2) Enable Q35 for Xen to reuse the Qemu virtual iommu? Your comments are very much appreciated. Thanks a lot. -- Best regards Tianyu Lan
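[Editor's note] The "IR required above 255 cpus" constraint discussed throughout the thread comes from the MSI address encoding. A hedged sketch of the two layouts involved, with bit positions per the Intel SDM (xAPIC MSI address) and the VT-d specification (remappable format) — this is an illustration, not actual Xen or Linux code:

```python
# Why interrupt remapping (IR) is needed once APIC IDs exceed 255:
# the compatibility MSI address has only 8 bits for the destination,
# while the remappable format indirects through a remapping table
# whose entries hold a full 32-bit x2APIC destination ID.

MSI_BASE = 0xFEE00000

def compat_msi_address(dest_id: int) -> int:
    """Compatibility-format MSI address: the destination APIC ID sits in
    bits 19:12, so only IDs 0-255 can be encoded; a 288-vcpu guest's
    higher cpus are simply unreachable without IR."""
    if not 0 <= dest_id <= 0xFF:
        raise ValueError("APIC ID above 255 requires interrupt remapping")
    return MSI_BASE | (dest_id << 12)

def remappable_msi_address(handle: int) -> int:
    """Remappable-format MSI address: bit 4 marks the remappable format,
    and a 16-bit handle indexes an interrupt remapping table entry,
    which stores the real (up to 32-bit) destination."""
    if not 0 <= handle < (1 << 16):
        raise ValueError("IRTE handle is 16 bits")
    return (MSI_BASE
            | ((handle & 0x7FFF) << 5)      # handle[14:0] in bits 19:5
            | (1 << 4)                      # interrupt format = remappable
            | (((handle >> 15) & 1) << 2))  # handle[15] in bit 2
```

Since the remapping table lives in the (v)IOMMU, exposing IR to the guest is exactly what requires the vIOMMU work proposed in this thread.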