Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-08-02 Thread Lan, Tianyu

On 5/27/2016 4:19 PM, Lan Tianyu wrote:

> As for the individual issue of 288vcpu support, there are already issues
> with 64vcpu guests at the moment. While it is certainly fine to remove
> the hard limit at 255 vcpus, there is a lot of other work required to
> even get 128vcpu guests stable.


Could you give some pointers to these issues? We are enabling support
for more vcpus, and a guest can basically boot with 255 vcpus without IR
support. It would be very helpful to learn about the known issues.


Hi Andrew:
We are designing vIOMMU support for Xen. Increasing the vcpu
limit from 128 to 255 can also be implemented in parallel, since it doesn't
need vIOMMU support. From your previous comment, "there is a lot of other
work required to even get 128vcpu guests stable", you have some concerns
about the stability of 128-vcpu guests. I wonder what we need to do before
starting the work of increasing the vcpu number from 128 to 255?



Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-07-05 Thread Lan, Tianyu



On 7/5/2016 9:57 PM, Jan Beulich wrote:

On 05.07.16 at 15:37,  wrote:

Hi Stefano, Andrew and Jan:
Could you give us more guidance here to move the virtual iommu
development forward? Thanks.


Due to ...


On 6/29/2016 11:04 AM, Tian, Kevin wrote:

Please let us know your thoughts. If no one has an explicit objection to
the rough idea above, we'll go ahead and write the high-level design doc
for more detailed discussion.


... this I actually expected we'd get to see something, rather than
our input being waited for.


OK, I get it. Since there was no response, we wanted to double-check that
we are on the right track.




Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-07-05 Thread Jan Beulich
>>> On 05.07.16 at 15:37,  wrote:
> Hi Stefano, Andrew and Jan:
> Could you give us more guidance here to move the virtual iommu
> development forward? Thanks.

Due to ...

> On 6/29/2016 11:04 AM, Tian, Kevin wrote:
>> Please let us know your thoughts. If no one has an explicit objection to
>> the rough idea above, we'll go ahead and write the high-level design doc
>> for more detailed discussion.

... this I actually expected we'd get to see something, rather than
our input being waited for.

Jan




Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-07-05 Thread Lan, Tianyu

Hi Stefano, Andrew and Jan:
Could you give us more guidance here to move the virtual iommu
development forward? Thanks.


On 6/29/2016 11:04 AM, Tian, Kevin wrote:

From: Lan, Tianyu
Sent: Sunday, June 26, 2016 9:43 PM

On 6/8/2016 4:11 PM, Tian, Kevin wrote:

It makes sense... I thought you used this security issue against
placing vIOMMU in Qemu, which made me a bit confused earlier. :-)

We are still thinking about the feasibility of a staging plan, e.g. first
implementing some vIOMMU features w/o a dependency on the root-complex in
Xen (HVM only) and then later enabling the full vIOMMU feature w/
root-complex in Xen (covering HVMLite). If we can reuse most code
between the two stages while shortening time-to-market by half (e.g. from
2yr to 1yr), it's still worth pursuing. We will report back soon
once the idea is consolidated...

Thanks Kevin



After discussion with Kevin, we have drafted a staging plan for implementing
vIOMMU in Xen based on the Qemu host bridge. Both virtual devices and
passthrough devices use one vIOMMU in Xen. Your comments are much
appreciated.


The rationale here is to separate BIOS structures from the actual vIOMMU
emulation. The vIOMMU will always be emulated in the Xen hypervisor,
regardless of where Q35 emulation is done or whether it's HVM or HVMLite.
The staging plan is more about the BIOS structure reporting, which is Q35
specific. For now we first target Qemu Q35 emulation, with a set of vIOMMU
ops introduced as Tianyu listed below to help Qemu and Xen interact. Later,
when Xen Q35 emulation is ready, the reporting can be done in Xen.

The main limitation of this model is DMA emulation for Qemu virtual
devices, which needs to query the Xen vIOMMU for every virtual DMA. That is
probably fine for virtual devices, which are normally not used for
performance-critical purposes. There may also be a chance to cache some
translations within Qemu, e.g. via ATS (though it may not be worth it...).
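As an illustration of the caching idea above, here is a minimal sketch of a
small, ATS-style translation cache Qemu could keep so that not every virtual
DMA needs a hypercall. It is purely hypothetical: xen_viommu_translate()
stands in for a translation hypercall that is still to be defined (see the
sketch further below), and the cache would have to be flushed on every guest
vIOMMU invalidation.

#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define IOTLB_ENTRIES 256
#define PAGE_SHIFT    12

struct iotlb_entry {
    bool     valid;
    uint16_t bdf;        /* source-id of the emulated device */
    uint64_t iova_pfn;
    uint64_t gpa_pfn;
};

static struct iotlb_entry iotlb[IOTLB_ENTRIES];

/* Hypothetical wrapper around the (to be defined) translation hypercall. */
int xen_viommu_translate(uint32_t viommu_id, uint16_t bdf,
                         uint64_t iova, uint64_t *gpa);

static int dma_translate_cached(uint32_t viommu_id, uint16_t bdf,
                                uint64_t iova, uint64_t *gpa)
{
    uint64_t pfn = iova >> PAGE_SHIFT;
    uint64_t off = iova & ((1ULL << PAGE_SHIFT) - 1);
    struct iotlb_entry *e = &iotlb[pfn % IOTLB_ENTRIES];
    int rc;

    if (e->valid && e->bdf == bdf && e->iova_pfn == pfn) {
        *gpa = (e->gpa_pfn << PAGE_SHIFT) | off;   /* cache hit */
        return 0;
    }

    rc = xen_viommu_translate(viommu_id, bdf, iova, gpa);
    if (rc)
        return rc;                                 /* no mapping: fault the device */

    e->valid    = true;                            /* fill the entry */
    e->bdf      = bdf;
    e->iova_pfn = pfn;
    e->gpa_pfn  = *gpa >> PAGE_SHIFT;
    return 0;
}

/* Must be called on any guest-initiated IOTLB/device-TLB invalidation. */
static void iotlb_flush(void)
{
    memset(iotlb, 0, sizeof(iotlb));
}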



1. Enable Q35 support in hvmloader.
On real hardware, VT-d support starts with Q35, and OSes may assume
that VT-d only exists on Q35 or newer platforms.
Q35 support therefore seems necessary for vIOMMU support.

Regardless of whether the Q35 host bridge is in Qemu or in the Xen
hypervisor, hvmloader needs to be compatible with Q35 and build Q35 ACPI
tables.

Qemu already has Q35 emulation, so the hvmloader work can start with
Qemu. When the host bridge in Xen is ready, these changes can also be reused.
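For reference, below is a minimal sketch (not the actual hvmloader code) of
the ACPI DMAR structures hvmloader would need to emit for a single vIOMMU
unit. The field layout follows the VT-d specification (the real tables are
byte-packed); the register base address is whatever value is chosen when the
vIOMMU is created.

#include <stdint.h>

struct acpi_header {             /* standard 36-byte ACPI table header */
    char     signature[4];       /* "DMAR" */
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
};

struct acpi_dmar {
    struct acpi_header header;
    uint8_t  host_address_width; /* (address width in bits) - 1 */
    uint8_t  flags;              /* bit 0: INTR_REMAP */
    uint8_t  reserved[10];
    /* remapping structures (DRHD, ...) follow */
};

struct acpi_dmar_drhd {          /* DMA Remapping Hardware Unit Definition */
    uint16_t type;               /* 0 = DRHD */
    uint16_t length;
    uint8_t  flags;              /* bit 0: INCLUDE_PCI_ALL (covers hotplug too) */
    uint8_t  reserved;
    uint16_t segment;            /* PCI segment number */
    uint64_t base_address;       /* vIOMMU register base, e.g. 0xfed90000 */
    /* device scope entries follow (none needed with INCLUDE_PCI_ALL) */
};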

2. Implement vIOMMU in Xen based on the Qemu host bridge.
Add a new device type "Xen iommu" in Qemu as a wrapper around vIOMMU
hypercalls to communicate with the Xen vIOMMU.

It's in charge of:
1) Query vIOMMU capabilities (e.g. interrupt remapping, DMA translation,
SVM and so on)
2) Create the vIOMMU with a predefined base address for the IOMMU unit
registers
3) Notify hvmloader to populate the related content in the ACPI DMAR
table (add vIOMMU info to struct hvm_info_table)
4) Handle DMA translation requests from virtual devices and return the
translated address (a rough sketch of this path follows below)
5) Attach/detach hotplugged devices to/from the vIOMMU
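A rough, self-contained sketch of what the DMA translation path in 4) could
look like on the Qemu side. All names here (XENVIOMMU_translate,
xen_viommu_hypercall(), struct xen_viommu_translate_req) are hypothetical
placeholders for a hypercall interface that still has to be defined:

#include <stdint.h>

#define XENVIOMMU_translate 3          /* hypothetical sub-op number */

struct xen_viommu_translate_req {
    uint32_t viommu_id;                /* which vIOMMU unit */
    uint16_t bdf;                      /* source-id of the virtual device */
    uint16_t pad;
    uint64_t iova;                     /* IN: address the device tried to DMA to */
    uint64_t translated;               /* OUT: guest physical address */
    uint32_t permissions;              /* OUT: read/write allowed */
};

/* Placeholder for the actual hypercall mechanism. */
int xen_viommu_hypercall(unsigned int op, void *arg);

/* Called by emulated devices (directly or via a cache) for DMA accesses. */
int xen_viommu_translate(uint32_t viommu_id, uint16_t bdf,
                         uint64_t iova, uint64_t *gpa)
{
    struct xen_viommu_translate_req req = {
        .viommu_id = viommu_id,
        .bdf       = bdf,
        .iova      = iova,
    };
    int rc = xen_viommu_hypercall(XENVIOMMU_translate, &req);

    if (rc)
        return rc;                     /* no/forbidden mapping: fault the device */
    *gpa = req.translated;
    return 0;
}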


New hypercalls for the vIOMMU that will also be necessary when the host
bridge is in Xen:
1) Query vIOMMU capabilities
2) Create vIOMMU (IOMMU unit register base as a parameter)
3) Virtual device DMA translation
4) Attach/detach a hotplugged device to/from the vIOMMU


We don't need 4). A hotplugged device is automatically handled by the vIOMMU
when the INCLUDE_ALL flag is set (which should be the case if we only have one
vIOMMU in Xen), so we don't need to further notify the Xen vIOMMU of this
event.

And once we have Xen Q35 emulation in place, possibly only 3) will be
required.




All IOMMU emulations will be done in Xen
1) DMA translation
2) Interrupt remapping
3) Shared Virtual Memory (SVM)


Please let us know your thoughts. If no one has an explicit objection to
the rough idea above, we'll go ahead and write the high-level design doc
for more detailed discussion.

Thanks
Kevin





Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-28 Thread Tian, Kevin
> From: Lan, Tianyu
> Sent: Sunday, June 26, 2016 9:43 PM
> 
> On 6/8/2016 4:11 PM, Tian, Kevin wrote:
> > It makes sense... I thought you used this security issue against
> > placing vIOMMU in Qemu, which made me a bit confused earlier. :-)
> >
> > We are still thinking about the feasibility of a staging plan, e.g. first
> > implementing some vIOMMU features w/o a dependency on the root-complex in
> > Xen (HVM only) and then later enabling the full vIOMMU feature w/
> > root-complex in Xen (covering HVMLite). If we can reuse most code
> > between the two stages while shortening time-to-market by half (e.g. from
> > 2yr to 1yr), it's still worth pursuing. We will report back soon
> > once the idea is consolidated...
> >
> > Thanks Kevin
> 
> 
> After discussion with Kevin, we have drafted a staging plan for implementing
> vIOMMU in Xen based on the Qemu host bridge. Both virtual devices and
> passthrough devices use one vIOMMU in Xen. Your comments are much
> appreciated.

The rationale here is to separate BIOS structures from the actual vIOMMU
emulation. The vIOMMU will always be emulated in the Xen hypervisor,
regardless of where Q35 emulation is done or whether it's HVM or HVMLite.
The staging plan is more about the BIOS structure reporting, which is Q35
specific. For now we first target Qemu Q35 emulation, with a set of vIOMMU
ops introduced as Tianyu listed below to help Qemu and Xen interact. Later,
when Xen Q35 emulation is ready, the reporting can be done in Xen.

The main limitation of this model is DMA emulation for Qemu virtual
devices, which needs to query the Xen vIOMMU for every virtual DMA. That is
probably fine for virtual devices, which are normally not used for
performance-critical purposes. There may also be a chance to cache some
translations within Qemu, e.g. via ATS (though it may not be worth it...).

> 
> 1. Enable Q35 support in hvmloader.
> On real hardware, VT-d support starts with Q35, and OSes may assume
> that VT-d only exists on Q35 or newer platforms.
> Q35 support therefore seems necessary for vIOMMU support.
> 
> Regardless of whether the Q35 host bridge is in Qemu or in the Xen
> hypervisor, hvmloader needs to be compatible with Q35 and build Q35 ACPI
> tables.
> 
> Qemu already has Q35 emulation, so the hvmloader work can start with
> Qemu. When the host bridge in Xen is ready, these changes can also be reused.
> 
> 2. Implement vIOMMU in Xen based on the Qemu host bridge.
> Add a new device type "Xen iommu" in Qemu as a wrapper around vIOMMU
> hypercalls to communicate with the Xen vIOMMU.
> 
> It's in charge of:
> 1) Query vIOMMU capabilities (e.g. interrupt remapping, DMA translation,
> SVM and so on)
> 2) Create the vIOMMU with a predefined base address for the IOMMU unit
> registers
> 3) Notify hvmloader to populate the related content in the ACPI DMAR
> table (add vIOMMU info to struct hvm_info_table)
> 4) Handle DMA translation requests from virtual devices and return the
> translated address
> 5) Attach/detach hotplugged devices to/from the vIOMMU
> 
> 
> New hypercalls for the vIOMMU that will also be necessary when the host
> bridge is in Xen:
> 1) Query vIOMMU capabilities
> 2) Create vIOMMU (IOMMU unit register base as a parameter)
> 3) Virtual device DMA translation
> 4) Attach/detach a hotplugged device to/from the vIOMMU

We don't need 4). A hotplugged device is automatically handled by the vIOMMU
when the INCLUDE_ALL flag is set (which should be the case if we only have one
vIOMMU in Xen), so we don't need to further notify the Xen vIOMMU of this
event.

And once we have Xen Q35 emulation in place, possibly only 3) will be
required.

> 
> 
> All IOMMU emulations will be done in Xen
> 1) DMA translation
> 2) Interrupt remapping
> 3) Shared Virtual Memory (SVM)

Please let us know your thoughts. If no one has an explicit objection to
the rough idea above, we'll go ahead and write the high-level design doc
for more detailed discussion.

Thanks
Kevin



Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-08 Thread Tian, Kevin
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: Tuesday, June 07, 2016 6:07 PM
> 
> On Tue, 7 Jun 2016, Tian, Kevin wrote:
> > > I think of QEMU as a provider of complex, high level emulators, such as
> > > the e1000, Cirrus VGA, SCSI controllers, etc., which don't necessarily
> > > need to be fast.
> >
> > Earlier you said Qemu imposes security issues. Here you said Qemu can
> > still provide complex emulators. Does it mean that security issue in Qemu
> > simply comes from the part which should be moved into Xen? Any
> > elaboration here?
> 
> It imposes security issues because, although it doesn't have to run as
> root anymore, QEMU still has to run with fully privileged libxc and
> xenstore handles. In other words, a malicious guest breaking into QEMU
> would have relatively easy access to the whole host. There is a design
> to solve this, see Ian Jackson's talk at FOSDEM this year:
> 
> https://fosdem.org/2016/schedule/event/virt_iaas_qemu_for_xen_secure_by_default/
> https://fosdem.org/2016/schedule/event/virt_iaas_qemu_for_xen_secure_by_default/attachments/other/921/export/events/attachments/virt_iaas_qemu_for_xen_secure_by_default/other/921/talk.txt
> 
> Other solutions to solve this issue are stubdoms or simply using PV
> guests and HVMlite guests only.
> 
> Irrespective of the problematic security angle, which is unsolved, I
> think of QEMU as a provider of complex emulators, as I wrote above.
> 
> Does it make sense?

It makes sense... I thought you used this security issue against placing 
vIOMMU in Qemu, which made me a bit confused earlier. :-)

We are still thinking about the feasibility of a staging plan, e.g. first
implementing some vIOMMU features w/o a dependency on the root-complex in Xen
(HVM only) and then later enabling the full vIOMMU feature w/ root-complex in
Xen (covering HVMLite). If we can reuse most code between the two stages while
shortening time-to-market by half (e.g. from 2yr to 1yr), it's still worth
pursuing. We will report back soon once the idea is consolidated...

Thanks
Kevin



Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-07 Thread Stefano Stabellini
On Tue, 7 Jun 2016, Tian, Kevin wrote:
> > I think of QEMU as a provider of complex, high level emulators, such as
> > the e1000, Cirrus VGA, SCSI controllers, etc., which don't necessarily
> > need to be fast.
> 
> Earlier you said Qemu imposes security issues. Here you said Qemu can 
> still provide complex emulators. Does it mean that security issue in Qemu
> simply comes from the part which should be moved into Xen? Any
> elaboration here?

It imposes security issues because, although it doesn't have to run as
root anymore, QEMU still has to run with fully privileged libxc and
xenstore handles. In other words, a malicious guest breaking into QEMU
would have relatively easy access to the whole host. There is a design
to solve this, see Ian Jackson's talk at FOSDEM this year:

https://fosdem.org/2016/schedule/event/virt_iaas_qemu_for_xen_secure_by_default/
https://fosdem.org/2016/schedule/event/virt_iaas_qemu_for_xen_secure_by_default/attachments/other/921/export/events/attachments/virt_iaas_qemu_for_xen_secure_by_default/other/921/talk.txt

Other solutions to solve this issue are stubdoms or simply using PV
guests and HVMlite guests only.

Irrespective of the problematic security angle, which is unsolved, I
think of QEMU as a provider of complex emulators, as I wrote above.

Does it make sense?



Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-07 Thread Jan Beulich
>>> On 07.06.16 at 07:14,  wrote:
> After some internal discussion with Tianyu/Eddie, I realized my earlier
> description was incomplete, as it only took passthrough devices into
> consideration (as you saw, it was mainly about the interaction between
> vIOMMU and pIOMMU). However, from the guest's p.o.v., all the devices should
> be covered by a vIOMMU to match today's physical platform, including:
> 
> 1) DMA-capable virtual device in Qemu, in Dom0 user space
> 2) PV devices, in Dom0 kernel space
> 3) Passthrough devices, in Xen hypervisor
> 
> A natural implementation is to have the vIOMMU sit where the DMA is
> emulated, which ends up as a possible design with multiple vIOMMUs in
> multiple layers:
> 
> 1) vIOMMU in Dom0 user
> 2) vIOMMU in Dom0 kernel
> 3) vIOMMU in Xen hypervisor
> 
> Of course we may come up with an option to still keep all vIOMMUs in the Xen
> hypervisor, which however means every vDMA operation in Qemu or a
> BE driver needs to issue a Xen hypercall to get the vIOMMU's approval. I
> haven't thought through how big/complex this issue is, but it does look like
> a limitation at first glance.
> 
> So, likely we'll have to consider the presence of multiple vIOMMUs, each in
> a different layer, regardless of whether the root-complex is in Qemu or Xen.
> There needs to be some interface abstraction to allow the vIOMMU and
> root-complex to communicate with each other. Well, not an easy task...

Right - for DMA-capable devices emulated in qemu, it would seem
natural to have them go through a vIOMMU in qemu. Whether
that vIOMMU implementation would have to consult the hypervisor
(or perhaps even just be a wrapper around various hypercalls, i.e.
backed by an implementation in the hypervisor) would be an
independent aspect.

Otoh, having vIOMMU in only qemu, and requiring round trips
through qemu for any of the hypervisor's internal purposes doesn't
seem like a good idea to me.

And finally I don't see the relevance of PV devices here: Their
nature makes it that they could easily be left completely independent
of a vIOMMU (as long as there's no plan to bypass a virtualization
level in the nested case, i.e. a PV frontend in L2 with a backend
living in L0).

Jan




Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-06 Thread Tian, Kevin
> From: Stefano Stabellini
> Sent: Saturday, June 04, 2016 1:15 AM
> 
> On Fri, 3 Jun 2016, Andrew Cooper wrote:
> > On 03/06/16 12:17, Tian, Kevin wrote:
> > >> Very sorry for the delay.
> > >>
> > >> There are multiple interacting issues here.  On the one side, it would
> > >> be useful if we could have a central point of coordination on
> > >> PVH/HVMLite work.  Roger - as the person who last did HVMLite work,
> > >> would you mind organising that?
> > >>
> > >> For the qemu/xen interaction, the current state is woeful and a tangled
> > >> mess.  I wish to ensure that we don't make any development decisions
> > >> which makes the situation worse.
> > >>
> > >> In your case, the two motivations are quite different I would recommend
> > >> dealing with them independently.
> > >>
> > >> IIRC, the issue with more than 255 cpus and interrupt remapping is that
> > >> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
> > >> can't be programmed to generate x2apic interrupts?  In principle, if you
> > >> don't have an IOAPIC, are there any other issues to be considered?  What
> > >> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
> > >> deliver xapic interrupts?
> > > The key is the APIC ID. There is no modification to existing PCI MSI and
> > > IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send
> > > interrupt message containing 8bit APIC ID, which cannot address >255
> > > cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to
> > > enable >255 cpus with x2apic mode.
> >
> > Thanks for clarifying.
> >
> > >
> > > If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC cannot
> > > deliver interrupts to all cpus in the system if #cpu > 255.
> >
> > Ok.  So not ideal (and we certainly want to address it), but this isn't
> > a complete show stopper for a guest.
> >
> > >> On the other side of things, what is IGD passthrough going to look like
> > >> in Skylake?  Is there any device-model interaction required (i.e. the
> > >> opregion), or will it work as a completely standalone device?  What are
> > >> your plans with the interaction of virtual graphics and shared virtual
> > >> memory?
> > >>
> > > The plan is to use a so-called universal pass-through driver in the guest
> > > which only accesses standard PCI resource (w/o opregion, PCH/MCH, etc.)
> >
> > This is fantastic news.
> >
> > >
> > > 
> > > Here is a brief of potential usages relying on vIOMMU:
> > >
> > > a) enable >255 vcpus on Xeon Phi, as the initial purpose of this thread.
> > > It requires interrupt remapping capability present on vIOMMU;
> > >
> > > b) support guest SVM (Shared Virtual Memory), which relies on the
> > > 1st level translation table capability (GVA->GPA) on vIOMMU. pIOMMU
> > > needs to enable both 1st level and 2nd level translation in nested
> > > mode (GVA->GPA->HPA) for passthrough device. IGD passthrough is
> > > the main usage today (to support OpenCL 2.0 SVM feature). In the
> > > future SVM might be used by other I/O devices too;
> > >
> > > c) support VFIO-based user space driver (e.g. DPDK) in the guest,
> > > which relies on the 2nd level translation capability (IOVA->GPA) on
> > > vIOMMU. pIOMMU 2nd level becomes a shadowing structure of
> > > vIOMMU 2nd level by replacing GPA with HPA (becomes IOVA->HPA);
> >
> > All of these look like interesting things to do.  I know there is a lot
> > of interest for b).
> >
> > As a quick aside, does Xen currently boot on a Phi?  Last time I looked
> > at the Phi manual, I would expect Xen to crash on boot because of MCXSR
> > differences from more-common x86 hardware.

Tianyu can correct me on the details. Xen can boot on Xeon Phi. However,
we need a hacky patch in the guest Linux kernel to disable the dependency
check around interrupt remapping; otherwise the guest kernel will fail to boot.

Now we're suffering from some performance issues. While that analysis is
ongoing, could you elaborate on the limitations you see with 64-vcpu guests?
It would help us tell whether we are hunting the same problem or not...

> >
> > >
> > > 
> > > And below is my thought viability of implementing vIOMMU in Qemu:
> > >
> > > a) enable >255 vcpus:
> > >
> > >   o Enable Q35 in Qemu-Xen;
> > >   o Add interrupt remapping in Qemu vIOMMU;
> > >   o Virtual interrupt injection in hypervisor needs to know virtual
> > > interrupt remapping (IR) structure, since IR is behind vIOAPIC/vMSI,
> > > which requires new hypervisor interfaces as Andrew pointed out:
> > >   * either for hypervisor to query IR from Qemu which is not
> > > good;
> > >   * or for Qemu to register IR info to hypervisor which means
> > > partial IR knowledge implemented in hypervisor (then why not putting
> > > whole IR emulation in Xen?)
> > >
> > > b) support SVM
> > >
> > >   o Enable Q35 in Qemu-Xen;
> > >   o Add 1st level translation capability in Qemu vIOMMU;
> > >   o VT-d context entry points to guest 1st level translation table
> > 

Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-03 Thread Stefano Stabellini
On Fri, 3 Jun 2016, Andrew Cooper wrote:
> On 03/06/16 12:17, Tian, Kevin wrote:
> >> Very sorry for the delay.
> >>
> >> There are multiple interacting issues here.  On the one side, it would
> >> be useful if we could have a central point of coordination on
> >> PVH/HVMLite work.  Roger - as the person who last did HVMLite work,
> >> would you mind organising that?
> >>
> >> For the qemu/xen interaction, the current state is woeful and a tangled
> >> mess.  I wish to ensure that we don't make any development decisions
> >> which makes the situation worse.
> >>
> >> In your case, the two motivations are quite different I would recommend
> >> dealing with them independently.
> >>
> >> IIRC, the issue with more than 255 cpus and interrupt remapping is that
> >> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
> >> can't be programmed to generate x2apic interrupts?  In principle, if you
> >> don't have an IOAPIC, are there any other issues to be considered?  What
> >> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
> >> deliver xapic interrupts?
> > The key is the APIC ID. There is no modification to existing PCI MSI and
> > IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send
> > interrupt message containing 8bit APIC ID, which cannot address >255
> > cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to
> > enable >255 cpus with x2apic mode.
> 
> Thanks for clarifying.
> 
> >
> > If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC cannot
> > deliver interrupts to all cpus in the system if #cpu > 255.
> 
> Ok.  So not ideal (and we certainly want to address it), but this isn't
> a complete show stopper for a guest.
> 
> >> On the other side of things, what is IGD passthrough going to look like
> >> in Skylake?  Is there any device-model interaction required (i.e. the
> >> opregion), or will it work as a completely standalone device?  What are
> >> your plans with the interaction of virtual graphics and shared virtual
> >> memory?
> >>
> > The plan is to use a so-called universal pass-through driver in the guest
> > which only accesses standard PCI resource (w/o opregion, PCH/MCH, etc.)
> 
> This is fantastic news.
> 
> >
> > 
> > Here is a brief of potential usages relying on vIOMMU:
> >
> > a) enable >255 vcpus on Xeon Phi, as the initial purpose of this thread. 
> > It requires interrupt remapping capability present on vIOMMU;
> >
> > b) support guest SVM (Shared Virtual Memory), which relies on the
> > 1st level translation table capability (GVA->GPA) on vIOMMU. pIOMMU
> > needs to enable both 1st level and 2nd level translation in nested
> > mode (GVA->GPA->HPA) for passthrough device. IGD passthrough is
> > the main usage today (to support OpenCL 2.0 SVM feature). In the
> > future SVM might be used by other I/O devices too;
> >
> > c) support VFIO-based user space driver (e.g. DPDK) in the guest,
> > which relies on the 2nd level translation capability (IOVA->GPA) on 
> > vIOMMU. pIOMMU 2nd level becomes a shadowing structure of
> > vIOMMU 2nd level by replacing GPA with HPA (becomes IOVA->HPA);
> 
> All of these look like interesting things to do.  I know there is a lot
> of interest for b).
> 
> As a quick aside, does Xen currently boot on a Phi?  Last time I looked
> at the Phi manual, I would expect Xen to crash on boot because of MCXSR
> differences from more-common x86 hardware.
> 
> >
> > 
> > And below is my thought viability of implementing vIOMMU in Qemu:
> >
> > a) enable >255 vcpus:
> >
> > o Enable Q35 in Qemu-Xen;
> > o Add interrupt remapping in Qemu vIOMMU;
> > o Virtual interrupt injection in hypervisor needs to know virtual
> > interrupt remapping (IR) structure, since IR is behind vIOAPIC/vMSI,
> > which requires new hypervisor interfaces as Andrew pointed out:
> > * either for hypervisor to query IR from Qemu which is not
> > good;
> > * or for Qemu to register IR info to hypervisor which means
> > partial IR knowledge implemented in hypervisor (then why not putting
> > whole IR emulation in Xen?)
> >
> > b) support SVM
> >
> > o Enable Q35 in Qemu-Xen;
> > o Add 1st level translation capability in Qemu vIOMMU;
> > o VT-d context entry points to guest 1st level translation table
> > which is nest-translated by 2nd level translation table so vIOMMU
> > structure can be directly linked. It means:
> > * Xen IOMMU driver enables nested mode;
> > * Introduce a new hypercall so Qemu vIOMMU can register
> > GPA root of guest 1st level translation table which is then written
> > to context entry in pIOMMU;
> >
> > c) support VFIO-based user space driver
> >
> > o Enable Q35 in Qemu-Xen;
> > o Leverage existing 2nd level translation implementation in Qemu 
> > vIOMMU;
> > o Change Xen IOMMU to support (IOVA->HPA) translation which
> > means decouple current logic from P2M layer (only for GPA->HPA);
> > o 

Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-03 Thread Jan Beulich
>>> On 03.06.16 at 15:51,  wrote:
> As a quick aside, does Xen currently boot on a Phi?  Last time I looked
> at the Phi manual, I would expect Xen to crash on boot because of MCXSR
> differences from more-common x86 hardware.

It does boot, as per the reports we've got. Perhaps, much like I did
until I was explicitly told there's a significant difference, you're
mixing up the earlier non-self-booting one with KNL?

Jan




Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-03 Thread Andrew Cooper
On 03/06/16 14:09, Lan, Tianyu wrote:
>
>
> On 6/3/2016 7:17 PM, Tian, Kevin wrote:
>>> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
>>> Sent: Friday, June 03, 2016 2:59 AM
>>>
>>> On 02/06/16 16:03, Lan, Tianyu wrote:
 On 5/27/2016 4:19 PM, Lan Tianyu wrote:
> On 2016年05月26日 19:35, Andrew Cooper wrote:
>> On 26/05/16 09:29, Lan Tianyu wrote:
>>
>> To be viable going forwards, any solution must work with
>> PVH/HVMLite as
>> much as HVM.  This alone negates qemu as a viable option.
>>
>> From a design point of view, having Xen needing to delegate to
>> qemu to
>> inject an interrupt into a guest seems backwards.
>>
>
> Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and
> the qemu virtual iommu can't work for it. We have to rewrite virtual
> iommu in the Xen, right?
>
>>
>> A whole lot of this would be easier to reason about if/when we get a
>> basic root port implementation in Xen, which is necessary for
>> HVMLite,
>> and which will make the interaction with qemu rather more clean. 
>> It is
>> probably worth coordinating work in this area.
>
> The virtual iommu also should be under basic root port in Xen, right?
>
>>
>> As for the individual issue of 288vcpu support, there are already
>> issues
>> with 64vcpu guests at the moment. While it is certainly fine to
>> remove
>> the hard limit at 255 vcpus, there is a lot of other work
>> required to
>> even get 128vcpu guests stable.
>
>
> Could you give some points to these issues? We are enabling more
> vcpus
> support and it can boot up 255 vcpus without IR support basically.
> It's
> very helpful to learn about known issues.
>
> We will also add more tests for 128 vcpus into our regular test to
> find
> related bugs. Increasing max vcpu to 255 should be a good start.

 Hi Andrew:
 Could you give more inputs about issues with 64 vcpus and what
 needs to
 be done to make 128vcpu guest stable? We hope to do somethings to
 improve them.

 What's progress of PCI host bridge in Xen? From your opinion, we
 should
 do that first, right? Thanks.
>>>
>>> Very sorry for the delay.
>>>
>>> There are multiple interacting issues here.  On the one side, it would
>>> be useful if we could have a central point of coordination on
>>> PVH/HVMLite work.  Roger - as the person who last did HVMLite work,
>>> would you mind organising that?
>>>
>>> For the qemu/xen interaction, the current state is woeful and a tangled
>>> mess.  I wish to ensure that we don't make any development decisions
>>> which makes the situation worse.
>>>
>>> In your case, the two motivations are quite different I would recommend
>>> dealing with them independently.
>>>
>>> IIRC, the issue with more than 255 cpus and interrupt remapping is that
>>> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
>>> can't be programmed to generate x2apic interrupts?  In principle, if
>>> you
>>> don't have an IOAPIC, are there any other issues to be considered? 
>>> What
>>> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
>>> deliver xapic interrupts?
>>
>> The key is the APIC ID. There is no modification to existing PCI MSI and
>> IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send
>> interrupt message containing 8bit APIC ID, which cannot address >255
>> cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to
>> enable >255 cpus with x2apic mode.
>>
>> If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC
>> cannot
>> deliver interrupts to all cpus in the system if #cpu > 255.
>
> Another key factor: the Linux kernel disables x2apic mode when the max APIC
> ID is > 255 and there is no interrupt remapping. The reason for this is what
> Kevin said. So booting up >255 cpus relies on interrupt remapping.

That is an implementation decision of Linux, not an architectural
requirement.

We need to carefully distinguish the two (even if it doesn't affect the
planned outcome from Xen's point of view), as Linux is not the only
operating system we virtualise.


One interesting issue in this area is plain, no-frills HVMLite domains,
which have an LAPIC but no IOAPIC, as they have no legacy devices/PCI
bus/etc.  In this scenario, no vIOMMU would be required for x2apic mode,
even if the domain had >255 vcpus.

~Andrew



Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-03 Thread Andrew Cooper
On 03/06/16 12:17, Tian, Kevin wrote:
>> Very sorry for the delay.
>>
>> There are multiple interacting issues here.  On the one side, it would
>> be useful if we could have a central point of coordination on
>> PVH/HVMLite work.  Roger - as the person who last did HVMLite work,
>> would you mind organising that?
>>
>> For the qemu/xen interaction, the current state is woeful and a tangled
>> mess.  I wish to ensure that we don't make any development decisions
>> which makes the situation worse.
>>
>> In your case, the two motivations are quite different I would recommend
>> dealing with them independently.
>>
>> IIRC, the issue with more than 255 cpus and interrupt remapping is that
>> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
>> can't be programmed to generate x2apic interrupts?  In principle, if you
>> don't have an IOAPIC, are there any other issues to be considered?  What
>> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
>> deliver xapic interrupts?
> The key is the APIC ID. There is no modification to existing PCI MSI and
> IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send
> interrupt message containing 8bit APIC ID, which cannot address >255
> cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to
> enable >255 cpus with x2apic mode.

Thanks for clarifying.

>
> If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC cannot
> deliver interrupts to all cpus in the system if #cpu > 255.

Ok.  So not ideal (and we certainly want to address it), but this isn't
a complete show stopper for a guest.

>> On the other side of things, what is IGD passthrough going to look like
>> in Skylake?  Is there any device-model interaction required (i.e. the
>> opregion), or will it work as a completely standalone device?  What are
>> your plans with the interaction of virtual graphics and shared virtual
>> memory?
>>
> The plan is to use a so-called universal pass-through driver in the guest
> which only accesses standard PCI resource (w/o opregion, PCH/MCH, etc.)

This is fantastic news.

>
> 
> Here is a brief of potential usages relying on vIOMMU:
>
> a) enable >255 vcpus on Xeon Phi, as the initial purpose of this thread. 
> It requires interrupt remapping capability present on vIOMMU;
>
> b) support guest SVM (Shared Virtual Memory), which relies on the
> 1st level translation table capability (GVA->GPA) on vIOMMU. pIOMMU
> needs to enable both 1st level and 2nd level translation in nested
> mode (GVA->GPA->HPA) for passthrough device. IGD passthrough is
> the main usage today (to support OpenCL 2.0 SVM feature). In the
> future SVM might be used by other I/O devices too;
>
> c) support VFIO-based user space driver (e.g. DPDK) in the guest,
> which relies on the 2nd level translation capability (IOVA->GPA) on 
> vIOMMU. pIOMMU 2nd level becomes a shadowing structure of
> vIOMMU 2nd level by replacing GPA with HPA (becomes IOVA->HPA);

All of these look like interesting things to do.  I know there is a lot
of interest for b).

As a quick aside, does Xen currently boot on a Phi?  Last time I looked
at the Phi manual, I would expect Xen to crash on boot because of MCXSR
differences from more-common x86 hardware.

>
> 
> And below is my thought viability of implementing vIOMMU in Qemu:
>
> a) enable >255 vcpus:
>
>   o Enable Q35 in Qemu-Xen;
>   o Add interrupt remapping in Qemu vIOMMU;
>   o Virtual interrupt injection in hypervisor needs to know virtual
> interrupt remapping (IR) structure, since IR is behind vIOAPIC/vMSI,
> which requires new hypervisor interfaces as Andrew pointed out:
>   * either for hypervisor to query IR from Qemu which is not
> good;
>   * or for Qemu to register IR info to hypervisor which means
> partial IR knowledge implemented in hypervisor (then why not putting
> whole IR emulation in Xen?)
>
> b) support SVM
>
>   o Enable Q35 in Qemu-Xen;
>   o Add 1st level translation capability in Qemu vIOMMU;
>   o VT-d context entry points to guest 1st level translation table
> which is nest-translated by 2nd level translation table so vIOMMU
> structure can be directly linked. It means:
>   * Xen IOMMU driver enables nested mode;
>   * Introduce a new hypercall so Qemu vIOMMU can register
> GPA root of guest 1st level translation table which is then written
> to context entry in pIOMMU;
>
> c) support VFIO-based user space driver
>
>   o Enable Q35 in Qemu-Xen;
>   o Leverage existing 2nd level translation implementation in Qemu 
> vIOMMU;
>   o Change Xen IOMMU to support (IOVA->HPA) translation which
> means decouple current logic from P2M layer (only for GPA->HPA);
>   o As a means of shadowing approach, Xen IOMMU driver needs to
> know both (IOVA->GPA) and (GPA->HPA) info to update (IOVA->HPA)
> mapping in case of any one is changed. So new interface is required
> for Qemu vIOMMU to propagate 

Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-03 Thread Lan, Tianyu



On 6/3/2016 7:17 PM, Tian, Kevin wrote:

From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
Sent: Friday, June 03, 2016 2:59 AM

On 02/06/16 16:03, Lan, Tianyu wrote:

On 5/27/2016 4:19 PM, Lan Tianyu wrote:

On 2016年05月26日 19:35, Andrew Cooper wrote:

On 26/05/16 09:29, Lan Tianyu wrote:

To be viable going forwards, any solution must work with PVH/HVMLite as
much as HVM.  This alone negates qemu as a viable option.

From a design point of view, having Xen needing to delegate to qemu to
inject an interrupt into a guest seems backwards.



Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and
the qemu virtual iommu can't work for it. We have to rewrite virtual
iommu in the Xen, right?



A whole lot of this would be easier to reason about if/when we get a
basic root port implementation in Xen, which is necessary for HVMLite,
and which will make the interaction with qemu rather more clean.  It is
probably worth coordinating work in this area.


The virtual iommu also should be under basic root port in Xen, right?



As for the individual issue of 288vcpu support, there are already
issues
with 64vcpu guests at the moment. While it is certainly fine to remove
the hard limit at 255 vcpus, there is a lot of other work required to
even get 128vcpu guests stable.



Could you give some points to these issues? We are enabling more vcpus
support and it can boot up 255 vcpus without IR support basically. It's
very helpful to learn about known issues.

We will also add more tests for 128 vcpus into our regular test to find
related bugs. Increasing max vcpu to 255 should be a good start.


Hi Andrew:
Could you give more inputs about issues with 64 vcpus and what needs to
be done to make 128vcpu guest stable? We hope to do somethings to
improve them.

What's progress of PCI host bridge in Xen? From your opinion, we should
do that first, right? Thanks.


Very sorry for the delay.

There are multiple interacting issues here.  On the one side, it would
be useful if we could have a central point of coordination on
PVH/HVMLite work.  Roger - as the person who last did HVMLite work,
would you mind organising that?

For the qemu/xen interaction, the current state is woeful and a tangled
mess.  I wish to ensure that we don't make any development decisions
which makes the situation worse.

In your case, the two motivations are quite different I would recommend
dealing with them independently.

IIRC, the issue with more than 255 cpus and interrupt remapping is that
you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
can't be programmed to generate x2apic interrupts?  In principle, if you
don't have an IOAPIC, are there any other issues to be considered?  What
happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
deliver xapic interrupts?


The key is the APIC ID. There is no modification to existing PCI MSI and
IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send
interrupt message containing 8bit APIC ID, which cannot address >255
cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to
enable >255 cpus with x2apic mode.

If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC cannot
deliver interrupts to all cpus in the system if #cpu > 255.


Another key factor: the Linux kernel disables x2apic mode when the max APIC
ID is > 255 and there is no interrupt remapping. The reason for this is what
Kevin said. So booting up >255 cpus relies on interrupt remapping.



Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-03 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Friday, June 03, 2016 2:59 AM
> 
> On 02/06/16 16:03, Lan, Tianyu wrote:
> > On 5/27/2016 4:19 PM, Lan Tianyu wrote:
> >> On 2016年05月26日 19:35, Andrew Cooper wrote:
> >>> On 26/05/16 09:29, Lan Tianyu wrote:
> >>>
> >>> To be viable going forwards, any solution must work with PVH/HVMLite as
> >>> much as HVM.  This alone negates qemu as a viable option.
> >>>
> >>> From a design point of view, having Xen needing to delegate to qemu to
> >>> inject an interrupt into a guest seems backwards.
> >>>
> >>
> >> Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and
> >> the qemu virtual iommu can't work for it. We have to rewrite virtual
> >> iommu in the Xen, right?
> >>
> >>>
> >>> A whole lot of this would be easier to reason about if/when we get a
> >>> basic root port implementation in Xen, which is necessary for HVMLite,
> >>> and which will make the interaction with qemu rather more clean.  It is
> >>> probably worth coordinating work in this area.
> >>
> >> The virtual iommu also should be under basic root port in Xen, right?
> >>
> >>>
> >>> As for the individual issue of 288vcpu support, there are already
> >>> issues
> >>> with 64vcpu guests at the moment. While it is certainly fine to remove
> >>> the hard limit at 255 vcpus, there is a lot of other work required to
> >>> even get 128vcpu guests stable.
> >>
> >>
> >> Could you give some points to these issues? We are enabling more vcpus
> >> support and it can boot up 255 vcpus without IR support basically. It's
> >> very helpful to learn about known issues.
> >>
> >> We will also add more tests for 128 vcpus into our regular test to find
> >> related bugs. Increasing max vcpu to 255 should be a good start.
> >
> > Hi Andrew:
> > Could you give more inputs about issues with 64 vcpus and what needs to
> > be done to make 128vcpu guest stable? We hope to do somethings to
> > improve them.
> >
> > What's progress of PCI host bridge in Xen? From your opinion, we should
> > do that first, right? Thanks.
> 
> Very sorry for the delay.
> 
> There are multiple interacting issues here.  On the one side, it would
> be useful if we could have a central point of coordination on
> PVH/HVMLite work.  Roger - as the person who last did HVMLite work,
> would you mind organising that?
> 
> For the qemu/xen interaction, the current state is woeful and a tangled
> mess.  I wish to ensure that we don't make any development decisions
> which makes the situation worse.
> 
> In your case, the two motivations are quite different I would recommend
> dealing with them independently.
> 
> IIRC, the issue with more than 255 cpus and interrupt remapping is that
> you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
> can't be programmed to generate x2apic interrupts?  In principle, if you
> don't have an IOAPIC, are there any other issues to be considered?  What
> happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
> deliver xapic interrupts?

The key is the APIC ID. There is no modification to existing PCI MSI and
IOAPIC with the introduction of x2apic. PCI MSI/IOAPIC can only send
interrupt message containing 8bit APIC ID, which cannot address >255
cpus. Interrupt remapping supports 32bit APIC ID so it's necessary to
enable >255 cpus with x2apic mode.

If LAPIC is in x2apic while interrupt remapping is disabled, IOAPIC cannot
deliver interrupts to all cpus in the system if #cpu > 255.
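To illustrate the point (this is not Xen code; layouts are abridged from the
VT-d specification and the remaining IRTE fields are left at zero): the
compatibility-format MSI address only has room for an 8-bit destination APIC
ID, while an interrupt remapping table entry (IRTE) carries a full 32-bit
destination ID.

#include <stdint.h>

/* Compatibility-format MSI address: bits 19:12 hold the destination APIC ID,
 * which is only 8 bits wide -- hence the 255-cpu limit. */
static inline uint32_t msi_compat_addr(uint8_t dest_apic_id)
{
    return 0xfee00000u | ((uint32_t)dest_apic_id << 12);
}

/* Abridged 128-bit interrupt remapping table entry: with remapping enabled,
 * the destination field is a full 32 bits, so any x2APIC ID can be targeted. */
struct irte {
    uint64_t lo;   /* bit 0: Present; bits 23:16: Vector; bits 63:32: Dest ID */
    uint64_t hi;   /* bits 15:0: Source ID; 17:16: SQ; 19:18: SVT */
};

static inline struct irte irte_remapped(uint8_t vector, uint32_t dest_x2apic_id)
{
    struct irte e = {
        .lo = 1ULL                               /* Present */
            | ((uint64_t)vector << 16)           /* Vector, bits 23:16 */
            | ((uint64_t)dest_x2apic_id << 32),  /* full 32-bit destination */
        .hi = 0,                                 /* source-id validation omitted */
    };
    return e;
}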

> 
> On the other side of things, what is IGD passthrough going to look like
> in Skylake?  Is there any device-model interaction required (i.e. the
> opregion), or will it work as a completely standalone device?  What are
> your plans with the interaction of virtual graphics and shared virtual
> memory?
> 

The plan is to use a so-called universal pass-through driver in the guest
which only accesses standard PCI resources (w/o opregion, PCH/MCH, etc.)


Here is a brief list of potential usages relying on vIOMMU:

a) enable >255 vcpus on Xeon Phi, as the initial purpose of this thread. 
It requires interrupt remapping capability present on vIOMMU;

b) support guest SVM (Shared Virtual Memory), which relies on the
1st level translation table capability (GVA->GPA) on vIOMMU. pIOMMU
needs to enable both 1st level and 2nd level translation in nested
mode (GVA->GPA->HPA) for passthrough device. IGD passthrough is
the main usage today (to support OpenCL 2.0 SVM feature). In the
future SVM might be used by other I/O devices too;

c) support VFIO-based user space driver (e.g. DPDK) in the guest,
which relies on the 2nd level translation capability (IOVA->GPA) on 
vIOMMU. pIOMMU 2nd level becomes a shadowing structure of
vIOMMU 2nd level by replacing GPA with HPA (becomes IOVA->HPA);
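As a conceptual sketch of the shadowing in c) (not Xen code; all function
names are hypothetical): the pIOMMU 2nd-level table for the device is
maintained as the composition of the guest's IOVA->GPA mapping with Xen's
GPA->HPA (p2m) mapping, and has to be refreshed whenever either side changes.

#include <stdint.h>
#include <stdbool.h>

typedef uint64_t iova_t, gfn_t, mfn_t;

/* Hypothetical lookups into the two source mappings. */
bool viommu_2nd_level_lookup(uint16_t bdf, iova_t iova, gfn_t *gfn);
bool p2m_lookup(gfn_t gfn, mfn_t *mfn);
void piommu_map(uint16_t bdf, iova_t iova, mfn_t mfn);
void piommu_unmap(uint16_t bdf, iova_t iova);

/* Called whenever the guest updates its vIOMMU 2nd-level entry for 'iova',
 * or the p2m entry backing the target GFN changes. */
static void shadow_update(uint16_t bdf, iova_t iova)
{
    gfn_t gfn;
    mfn_t mfn;

    if (viommu_2nd_level_lookup(bdf, iova, &gfn) && p2m_lookup(gfn, &mfn))
        piommu_map(bdf, iova, mfn);      /* resulting IOVA -> HPA entry */
    else
        piommu_unmap(bdf, iova);
}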


And below are my thoughts on the viability of implementing vIOMMU in Qemu:

a) enable >255 vcpus:

o Enable Q35 in Qemu-Xen;
o Add interrupt remapping in Qemu vIOMMU;
o Virtual interrupt 

Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-02 Thread Andrew Cooper
On 02/06/16 16:03, Lan, Tianyu wrote:
> On 5/27/2016 4:19 PM, Lan Tianyu wrote:
>> On 2016年05月26日 19:35, Andrew Cooper wrote:
>>> On 26/05/16 09:29, Lan Tianyu wrote:
>>>
>>> To be viable going forwards, any solution must work with PVH/HVMLite as
>>> much as HVM.  This alone negates qemu as a viable option.
>>>
>>> From a design point of view, having Xen needing to delegate to qemu to
>>> inject an interrupt into a guest seems backwards.
>>>
>>
>> Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and
>> the qemu virtual iommu can't work for it. We have to rewrite virtual
>> iommu in the Xen, right?
>>
>>>
>>> A whole lot of this would be easier to reason about if/when we get a
>>> basic root port implementation in Xen, which is necessary for HVMLite,
>>> and which will make the interaction with qemu rather more clean.  It is
>>> probably worth coordinating work in this area.
>>
>> The virtual iommu also should be under basic root port in Xen, right?
>>
>>>
>>> As for the individual issue of 288vcpu support, there are already
>>> issues
>>> with 64vcpu guests at the moment. While it is certainly fine to remove
>>> the hard limit at 255 vcpus, there is a lot of other work required to
>>> even get 128vcpu guests stable.
>>
>>
>> Could you give some points to these issues? We are enabling more vcpus
>> support and it can boot up 255 vcpus without IR support basically. It's
>> very helpful to learn about known issues.
>>
>> We will also add more tests for 128 vcpus into our regular test to find
>> related bugs. Increasing max vcpu to 255 should be a good start.
>
> Hi Andrew:
> Could you give more inputs about issues with 64 vcpus and what needs to
> be done to make 128vcpu guest stable? We hope to do somethings to
> improve them.
>
> What's progress of PCI host bridge in Xen? From your opinion, we should
> do that first, right? Thanks.

Very sorry for the delay.

There are multiple interacting issues here.  On the one side, it would
be useful if we could have a central point of coordination on
PVH/HVMLite work.  Roger - as the person who last did HVMLite work,
would you mind organising that?

For the qemu/xen interaction, the current state is woeful and a tangled
mess.  I wish to ensure that we don't make any development decisions
which makes the situation worse.

In your case, the two motivations are quite different; I would recommend
dealing with them independently.

IIRC, the issue with more than 255 cpus and interrupt remapping is that
you can only use x2apic mode with more than 255 cpus, and IOAPIC RTEs
can't be programmed to generate x2apic interrupts?  In principle, if you
don't have an IOAPIC, are there any other issues to be considered?  What
happens if you configure the LAPICs in x2apic mode, but have the IOAPIC
deliver xapic interrupts?

On the other side of things, what is IGD passthrough going to look like
in Skylake?  Is there any device-model interaction required (i.e. the
opregion), or will it work as a completely standalone device?  What are
your plans with the interaction of virtual graphics and shared virtual
memory?

~Andrew



Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-06-02 Thread Lan, Tianyu

On 5/27/2016 4:19 PM, Lan Tianyu wrote:

On 2016年05月26日 19:35, Andrew Cooper wrote:

On 26/05/16 09:29, Lan Tianyu wrote:

To be viable going forwards, any solution must work with PVH/HVMLite as
much as HVM.  This alone negates qemu as a viable option.

From a design point of view, having Xen needing to delegate to qemu to
inject an interrupt into a guest seems backwards.



Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu and
the qemu virtual iommu can't work for it. We have to rewrite virtual
iommu in the Xen, right?



A whole lot of this would be easier to reason about if/when we get a
basic root port implementation in Xen, which is necessary for HVMLite,
and which will make the interaction with qemu rather more clean.  It is
probably worth coordinating work in this area.


The virtual iommu also should be under basic root port in Xen, right?



As for the individual issue of 288vcpu support, there are already issues
with 64vcpu guests at the moment. While it is certainly fine to remove
the hard limit at 255 vcpus, there is a lot of other work required to
even get 128vcpu guests stable.



Could you give some pointers to these issues? We are enabling support for
more vcpus, and a guest can basically boot with 255 vcpus without IR support.
It's very helpful to learn about known issues.

We will also add more tests for 128 vcpus into our regular test to find
related bugs. Increasing max vcpu to 255 should be a good start.


Hi Andrew:
Could you give more input about the issues with 64 vcpus and what needs to
be done to make 128-vcpu guests stable? We hope to do something to
improve them.

What's the progress of the PCI host bridge in Xen? In your opinion, we should
do that first, right? Thanks.












Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-31 Thread George Dunlap
On Thu, May 26, 2016 at 12:35 PM, Andrew Cooper
 wrote:
> On 26/05/16 09:29, Lan Tianyu wrote:
>> Hi All:
>> We try pushing virtual iommu support for Xen guest and there are some
>> features blocked by it.
>>
>> Motivation:
>> ---
>> 1) Add SVM(Shared Virtual Memory) support for Xen guest
>> To support iGFX pass-through for SVM enabled devices, it requires
>> virtual iommu support to emulate related registers and intercept/handle
>> guest SVM configure in the VMM.
>>
>> 2) Increase max vcpu support for one VM.
>>
>> So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance
>> Computing) cloud computing, it requires more vcpus support in a single
>> VM. The usage model is to create just one VM on a machine with the
>> same number vcpus as logical cpus on the host and pin vcpu on each
>> logical cpu in order to get good compute performance.
>>
>> Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and
>> supports 288 logical cpus. So we hope VM can support 288 vcpu
>> to meet HPC requirement.
>>
>> Current Linux kernel requires IR(interrupt remapping) when MAX APIC
>> ID is > 255 because interrupt only can be delivered among 0~255 cpus
>> without IR. IR in VM relies on the virtual iommu support.
>>
>> KVM Virtual iommu support status
>> 
>> Currently, Qemu has a basic virtual iommu to do address translation for
>> virtual devices, and it only works with the Q35 machine type. KVM reuses it,
>> and Redhat is adding IR to support more than 255 vcpus.
>>
>> How to add virtual iommu for Xen?
>> -
>> The first idea that came to my mind is to reuse the Qemu virtual iommu, but
>> Xen doesn't support Q35 so far. Enabling Q35 for Xen doesn't seem to be a
>> short-term task. Anthony did some related work before.
>>
>> I'd like to see your comments about how to implement virtual iommu for Xen.
>>
>> 1) Reuse Qemu virtual iommu or write a separate one for Xen?
>> 2) Enable Q35 for Xen to reuse Qemu virtual iommu?
>>
>> Your comments are very appreciated. Thanks a lot.
>
> To be viable going forwards, any solution must work with PVH/HVMLite as
> much as HVM.  This alone negates qemu as a viable option.

There's a big difference between "suboptimal" and "not viable".
Obviously it would be nice to be able to have HVMLite do graphics
pass-through, but if this functionality ends up being HVM-only, is
that really such a huge issue?

If, as Paul seems to indicate, the extra work to get the functionality
in Xen isn't very large, then it's worth pursuing; but I don't think
we should take other options off the table.

 -George



Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-27 Thread Tian, Kevin
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: Friday, May 27, 2016 4:47 PM
> > >
> > > A whole lot of this would be easier to reason about if/when we get a
> > > basic root port implementation in Xen, which is necessary for HVMLite,
> > > and which will make the interaction with qemu rather more clean.  It is
> > > probably worth coordinating work in this area.
> >
> > Would it make Xen too complex? Qemu also has its own root port
> > implementation, and then you need some tricks within Qemu to not
> > use its own root port but instead register with the Xen root port. Why is
> > such a move cleaner?
> >
> 
> Upstream QEMU already registers PCI BDFs with Xen, and Xen already handles
> cf8 and cfc accesses (to turn them into single config space read/write
> ioreqs). So, it really isn't much of a leap to put the root port
> implementation in Xen.
> 
>   Paul
> 

Thanks for the information; I didn't realize that.

Out of curiosity, is anyone already working on basic root port support
in Xen? If so, what's the current progress?

Thanks
Kevin


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-27 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> Tian, Kevin
> Sent: 27 May 2016 09:35
> To: Andrew Cooper; Lan, Tianyu; jbeul...@suse.com; sstabell...@kernel.org;
> Ian Jackson; xen-de...@lists.xensource.com; Eddie Dong; Nakajima, Jun;
> yang.zhang...@gmail.com; Anthony Perard
> Subject: Re: [Xen-devel] Discussion about virtual iommu support for Xen
> guest
> 
> > From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> > Sent: Thursday, May 26, 2016 7:36 PM
> >
> > On 26/05/16 09:29, Lan Tianyu wrote:
> > > Hi All:
> > > We try pushing virtual iommu support for Xen guest and there are some
> > > features blocked by it.
> > >
> > > Motivation:
> > > ---
> > > 1) Add SVM(Shared Virtual Memory) support for Xen guest
> > > To support iGFX pass-through for SVM enabled devices, it requires
> > > virtual iommu support to emulate related registers and intercept/handle
> > > guest SVM configure in the VMM.
> > >
> > > 2) Increase max vcpu support for one VM.
> > >
> > > So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance
> > > Computing) cloud computing, it requires more vcpus support in a single
> > > VM. The usage model is to create just one VM on a machine with the
> > > same number vcpus as logical cpus on the host and pin vcpu on each
> > > logical cpu in order to get good compute performance.
> > >
> > > Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and
> > > supports 288 logical cpus. So we hope VM can support 288 vcpu
> > > to meet HPC requirement.
> > >
> > > Current Linux kernel requires IR(interrupt remapping) when MAX APIC
> > > ID is > 255 because interrupt only can be delivered among 0~255 cpus
> > > without IR. IR in VM relies on the virtual iommu support.
> > >
> > > KVM Virtual iommu support status
> > > 
> > > Current, Qemu has a basic virtual iommu to do address translation for
> > > virtual device and it only works for the Q35 machine type. KVM reuses it
> > > and Redhat is adding IR to support more than 255 vcpus.
> > >
> > > How to add virtual iommu for Xen?
> > > -
> > > First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't
> > > support Q35 so far. Enabling Q35 for Xen seems not a short term task.
> > > Anthony did some related jobs before.
> > >
> > > I'd like to see your comments about how to implement virtual iommu for
> Xen.
> > >
> > > 1) Reuse Qemu virtual iommu or write a separate one for Xen?
> > > 2) Enable Q35 for Xen to reuse Qemu virtual iommu?
> > >
> > > Your comments are very appreciated. Thanks a lot.
> >
> > To be viable going forwards, any solution must work with PVH/HVMLite as
> > much as HVM.  This alone negates qemu as a viable option.
> 
> KVM wants things done in Qemu as much as possible. Now Xen may
> have more things moved into hypervisor instead for HVMLite. The end
> result is that many new platform features from IHVs will require
> double effort in the future (nvdimm is another example) which means
> much longer enabling path to bring those new features to customers.
> 
> I can understand the importance of covering HVMLite in Xen community,
> but is it really the only factor to negate Qemu option?
> 
> >
> > From a design point of view, having Xen needing to delegate to qemu to
> > inject an interrupt into a guest seems backwards.
> >
> >
> > A whole lot of this would be easier to reason about if/when we get a
> > basic root port implementation in Xen, which is necessary for HVMLite,
> > and which will make the interaction with qemu rather more clean.  It is
> > probably worth coordinating work in this area.
> 
> Would it make Xen too complex? Qemu also has its own root port
> implementation, and then you need some tricks within Qemu to not
> use its own root port but instead registering to Xen root port. Why is
> such movement more clean?
> 

Upstream QEMU already registers PCI BDFs with Xen, and Xen already handles cf8 
and cfc accesses (to turn them into single config space read/write ioreqs). So, 
it really isn't much of a leap to put the root port implementation in Xen.

  Paul
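
For illustration, a minimal sketch of the cf8/cfc decode step that such
forwarding involves (illustrative names only, not the actual Xen code):

/* Illustrative only.  A write to port 0xCF8 selects bus/devfn and a
 * dword-aligned config space offset; the subsequent access to one of
 * ports 0xCFC..0xCFF is the config read/write that can then be forwarded
 * as a single ioreq to whichever device model owns that BDF. */
#include <stdint.h>

struct pci_cfg_addr {
    uint8_t  bus;
    uint8_t  devfn;   /* device in bits 7:3, function in bits 2:0 */
    uint16_t reg;     /* config space offset */
};

static struct pci_cfg_addr decode_cf8(uint32_t cf8, uint16_t port)
{
    struct pci_cfg_addr a;

    a.bus   = (cf8 >> 16) & 0xff;
    a.devfn = (cf8 >> 8) & 0xff;
    /* CF8 bits 7:2 give a dword-aligned offset; the low two bits come
     * from which of 0xCFC..0xCFF was actually accessed. */
    a.reg   = (cf8 & 0xfc) | (port & 3);

    return a;
}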

> >
> >
> > As for the individual issue of 288vcpu support, there are already issues
> > with 64vcpu guests at the moment.  While it is certainly fine to remove
> > the hard limit at 255 vcpus, there is a lot of other work required to
> > even get 128vcpu guests stable.
> >
> 
> Thanks
> Kevin
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-27 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Thursday, May 26, 2016 7:36 PM
> 
> On 26/05/16 09:29, Lan Tianyu wrote:
> > Hi All:
> > We try pushing virtual iommu support for Xen guest and there are some
> > features blocked by it.
> >
> > Motivation:
> > ---
> > 1) Add SVM(Shared Virtual Memory) support for Xen guest
> > To support iGFX pass-through for SVM enabled devices, it requires
> > virtual iommu support to emulate related registers and intercept/handle
> > guest SVM configure in the VMM.
> >
> > 2) Increase max vcpu support for one VM.
> >
> > So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance
> > Computing) cloud computing, it requires more vcpus support in a single
> > VM. The usage model is to create just one VM on a machine with the
> > same number vcpus as logical cpus on the host and pin vcpu on each
> > logical cpu in order to get good compute performance.
> >
> > Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and
> > supports 288 logical cpus. So we hope VM can support 288 vcpu
> > to meet HPC requirement.
> >
> > Current Linux kernel requires IR(interrupt remapping) when MAX APIC
> > ID is > 255 because interrupt only can be delivered among 0~255 cpus
> > without IR. IR in VM relies on the virtual iommu support.
> >
> > KVM Virtual iommu support status
> > 
> > Current, Qemu has a basic virtual iommu to do address translation for
> > virtual device and it only works for the Q35 machine type. KVM reuses it
> > and Redhat is adding IR to support more than 255 vcpus.
> >
> > How to add virtual iommu for Xen?
> > -
> > First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't
> > support Q35 so far. Enabling Q35 for Xen seems not a short term task.
> > Anthony did some related jobs before.
> >
> > I'd like to see your comments about how to implement virtual iommu for Xen.
> >
> > 1) Reuse Qemu virtual iommu or write a separate one for Xen?
> > 2) Enable Q35 for Xen to reuse Qemu virtual iommu?
> >
> > Your comments are very appreciated. Thanks a lot.
> 
> To be viable going forwards, any solution must work with PVH/HVMLite as
> much as HVM.  This alone negates qemu as a viable option.

KVM wants as much as possible done in Qemu, while Xen may now move
more things into the hypervisor for HVMLite. The end result is that
many new platform features from IHVs will require double the effort in
the future (nvdimm is another example), which means a much longer
enabling path to bring those features to customers.

I can understand the importance of covering HVMLite in the Xen
community, but is it really the only factor that rules out the Qemu
option?

> 
> From a design point of view, having Xen needing to delegate to qemu to
> inject an interrupt into a guest seems backwards.
> 
> 
> A whole lot of this would be easier to reason about if/when we get a
> basic root port implementation in Xen, which is necessary for HVMLite,
> and which will make the interaction with qemu rather more clean.  It is
> probably worth coordinating work in this area.

Would it make Xen too complex? Qemu also has its own root port
implementation, so you would need some tricks within Qemu to bypass
its own root port and register with the Xen root port instead. Why is
such a move cleaner?

> 
> 
> As for the individual issue of 288vcpu support, there are already issues
> with 64vcpu guests at the moment.  While it is certainly fine to remove
> the hard limit at 255 vcpus, there is a lot of other work required to
> even get 128vcpu guests stable.
> 

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-27 Thread Lan Tianyu
On 2016-05-26 19:35, Andrew Cooper wrote:
> On 26/05/16 09:29, Lan Tianyu wrote:
> 
> To be viable going forwards, any solution must work with PVH/HVMLite as
> much as HVM.  This alone negates qemu as a viable option.
> 
> From a design point of view, having Xen needing to delegate to qemu to
> inject an interrupt into a guest seems backwards.
>

Sorry, I am not familiar with HVMlite. HVMlite doesn't use Qemu, so
the Qemu virtual iommu can't work for it. We would have to implement
the virtual iommu in Xen itself, right?

> 
> A whole lot of this would be easier to reason about if/when we get a
> basic root port implementation in Xen, which is necessary for HVMLite,
> and which will make the interaction with qemu rather more clean.  It is
> probably worth coordinating work in this area.

The virtual iommu should also sit under the basic root port in Xen, right?

> 
> As for the individual issue of 288vcpu support, there are already issues
> with 64vcpu guests at the moment. While it is certainly fine to remove
> the hard limit at 255 vcpus, there is a lot of other work required to
> even get 128vcpu guests stable.


Could you give some pointers to these issues? We are enabling support
for more vcpus, and a guest can basically boot with 255 vcpus even
without IR support. It would be very helpful to learn about the known
issues.

We will also add more 128-vcpu tests to our regular testing to find
related bugs. Increasing the max vcpu count to 255 should be a good start.





> 
> ~Andrew
> 


-- 
Best regards
Tianyu Lan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-27 Thread Tian, Kevin
> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> Sent: Friday, May 27, 2016 10:26 AM
> 
> On 2016/5/26 16:29, Lan Tianyu wrote:
> > Hi All:
> > We try pushing virtual iommu support for Xen guest and there are some
> > features blocked by it.
> >
> > Motivation:
> > ---
> > 1) Add SVM(Shared Virtual Memory) support for Xen guest
> > To support iGFX pass-through for SVM enabled devices, it requires
> > virtual iommu support to emulate related registers and intercept/handle
> > guest SVM configure in the VMM.
> 
> IIRC, SVM needs the nested IOMMU support not only virtual iommu. Correct
> me if i am wrong.
> 

The nesting is in the physical IOMMU. You don't need to present nesting in the vIOMMU.
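
(To make that concrete, a conceptual sketch only -- not Xen or VT-d code:
with nesting, the physical IOMMU walks a guest-owned first-level table and
a host-owned second-level table for each PASID-tagged request, so the
vIOMMU only has to expose the first-level/PASID registers:)

#include <stdint.h>

typedef uint64_t addr_t;
typedef addr_t (*walk_fn)(addr_t);

/* Both walks are performed by the physical IOMMU hardware; they are shown
 * as function pointers purely for illustration. */
static addr_t nested_translate(addr_t process_va,
                               walk_fn first_level,  /* guest-managed: VA -> guest PA */
                               walk_fn second_level) /* host-managed: guest PA -> host PA */
{
    addr_t gpa = first_level(process_va);
    return second_level(gpa);
}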

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-27 Thread Tian, Kevin
> From: Lan, Tianyu
> Sent: Friday, May 27, 2016 10:27 AM
> 
> On 2016-05-26 16:42, Dong, Eddie wrote:
> > If enabling virtual Q35 solves the problem, it has the advantage: When
> > more and more virtual IOMMU feature comes (likely), we can reuse the KVM
> > code for Xen.
> > How big is the effort for virtual Q35?
> 
> I think the most effort are to rebuild all ACPI tables for Q35 and add
> Q35 support in the hvmloader. My concern is about new ACPI tables'
> compatibility issue. Especially with Windows guest.
> 

Another question is how tightly this vIOMMU implementation is bound to Q35.
Can it also work with the old chipset, and if so, how big is the effort
compared to the other options?

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-27 Thread Yang Zhang

On 2016/5/26 16:29, Lan Tianyu wrote:

Hi All:
We try pushing virtual iommu support for Xen guest and there are some
features blocked by it.

Motivation:
---
1) Add SVM(Shared Virtual Memory) support for Xen guest
To support iGFX pass-through for SVM enabled devices, it requires
virtual iommu support to emulate related registers and intercept/handle
guest SVM configure in the VMM.


IIRC, SVM needs nested IOMMU support, not just a virtual iommu. Correct
me if I am wrong.




2) Increase max vcpu support for one VM.

So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance
Computing) cloud computing, it requires more vcpus support in a single
VM. The usage model is to create just one VM on a machine with the
same number vcpus as logical cpus on the host and pin vcpu on each
logical cpu in order to get good compute performance.

Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and
supports 288 logical cpus. So we hope VM can support 288 vcpu
to meet HPC requirement.

Current Linux kernel requires IR(interrupt remapping) when MAX APIC
ID is > 255 because interrupt only can be delivered among 0~255 cpus
without IR. IR in VM relies on the virtual iommu support.

KVM Virtual iommu support status

Current, Qemu has a basic virtual iommu to do address translation for
virtual device and it only works for the Q35 machine type. KVM reuses it
and Redhat is adding IR to support more than 255 vcpus.

How to add virtual iommu for Xen?
-
First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't
support Q35 so far. Enabling Q35 for Xen seems not a short term task.
Anthony did some related jobs before.

I'd like to see your comments about how to implement virtual iommu for Xen.

1) Reuse Qemu virtual iommu or write a separate one for Xen?
2) Enable Q35 for Xen to reuse Qemu virtual iommu?

Your comments are very appreciated. Thanks a lot.




--
best regards
yang

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-26 Thread Lan Tianyu
On 2016-05-26 16:42, Dong, Eddie wrote:
> If enabling virtual Q35 solves the problem, it has the advantage: When more 
> and more virtual IOMMU feature comes (likely), we can reuse the KVM code for 
> Xen.
> How big is the effort for virtual Q35?

I think most of the effort is to rebuild all the ACPI tables for Q35 and
add Q35 support to hvmloader. My concern is about compatibility issues
with the new ACPI tables, especially with Windows guests.
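
For reference, a rough sketch of the main new table involved -- the VT-d
DMAR table that the ACPI builder would have to emit to expose a vIOMMU
(field layout follows the VT-d spec; the struct names are illustrative,
not hvmloader/libacpi code):

#include <stdint.h>

struct acpi_header {               /* standard 36-byte ACPI table header */
    char     signature[4];         /* "DMAR" */
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    uint32_t creator_id;
    uint32_t creator_revision;
};

struct acpi_dmar {
    struct acpi_header header;
    uint8_t  host_address_width;   /* supported physical address width - 1 */
    uint8_t  flags;                /* bit 0: interrupt remapping supported */
    uint8_t  reserved[10];
    /* followed by remapping structures, at minimum one DRHD */
};

struct acpi_dmar_drhd {
    uint16_t type;                 /* 0 = DRHD */
    uint16_t length;
    uint8_t  flags;                /* bit 0: INCLUDE_PCI_ALL */
    uint8_t  reserved;
    uint16_t segment;
    uint64_t register_base;        /* MMIO base of the (v)IOMMU register set */
    /* optional device scope entries follow */
};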

-- 
Best regards
Tianyu Lan

> 
> Thx Eddie
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-26 Thread Andrew Cooper
On 26/05/16 09:29, Lan Tianyu wrote:
> Hi All:
> We try pushing virtual iommu support for Xen guest and there are some
> features blocked by it.
>
> Motivation:
> ---
> 1) Add SVM(Shared Virtual Memory) support for Xen guest
> To support iGFX pass-through for SVM enabled devices, it requires
> virtual iommu support to emulate related registers and intercept/handle
> guest SVM configure in the VMM.
>
> 2) Increase max vcpu support for one VM.
>
> So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance
> Computing) cloud computing, it requires more vcpus support in a single
> VM. The usage model is to create just one VM on a machine with the
> same number vcpus as logical cpus on the host and pin vcpu on each
> logical cpu in order to get good compute performance.
>
> Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and
> supports 288 logical cpus. So we hope VM can support 288 vcpu
> to meet HPC requirement.
>
> Current Linux kernel requires IR(interrupt remapping) when MAX APIC
> ID is > 255 because interrupt only can be delivered among 0~255 cpus
> without IR. IR in VM relies on the virtual iommu support.
>
> KVM Virtual iommu support status
> 
> Current, Qemu has a basic virtual iommu to do address translation for
> virtual device and it only works for the Q35 machine type. KVM reuses it
> and Redhat is adding IR to support more than 255 vcpus.
>
> How to add virtual iommu for Xen?
> -
> First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't
> support Q35 so far. Enabling Q35 for Xen seems not a short term task.
> Anthony did some related jobs before.
>
> I'd like to see your comments about how to implement virtual iommu for Xen.
>
> 1) Reuse Qemu virtual iommu or write a separate one for Xen?
> 2) Enable Q35 for Xen to reuse Qemu virtual iommu?
>
> Your comments are very appreciated. Thanks a lot.

To be viable going forwards, any solution must work with PVH/HVMLite as
much as HVM.  This alone negates qemu as a viable option.

From a design point of view, having Xen needing to delegate to qemu to
inject an interrupt into a guest seems backwards.


A whole lot of this would be easier to reason about if/when we get a
basic root port implementation in Xen, which is necessary for HVMLite,
and which will make the interaction with qemu rather more clean.  It is
probably worth coordinating work in this area.


As for the individual issue of 288vcpu support, there are already issues
with 64vcpu guests at the moment.  While it is certainly fine to remove
the hard limit at 255 vcpus, there is a lot of other work required to
even get 128vcpu guests stable.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-26 Thread Dong, Eddie
If enabling virtual Q35 solves the problem, it has an advantage: as more and
more virtual IOMMU features come along (which is likely), we can reuse the
KVM code for Xen. How big is the effort for virtual Q35?

Thx Eddie

> -Original Message-
> From: Lan, Tianyu
> Sent: Thursday, May 26, 2016 4:30 PM
> To: jbeul...@suse.com; sstabell...@kernel.org; ian.jack...@eu.citrix.com;
> xen-de...@lists.xensource.com; Tian, Kevin ; Dong,
> Eddie ; Nakajima, Jun ;
> yang.zhang...@gmail.com; anthony.per...@citrix.com
> Subject: Discussion about virtual iommu support for Xen guest
> 
> Hi All:
> We try pushing virtual iommu support for Xen guest and there are some
> features blocked by it.
> 
> Motivation:
> ---
> 1) Add SVM(Shared Virtual Memory) support for Xen guest To support iGFX
> pass-through for SVM enabled devices, it requires virtual iommu support to
> emulate related registers and intercept/handle guest SVM configure in the
> VMM.
> 
> 2) Increase max vcpu support for one VM.
> 
> So far, max vcpu for Xen hvm guest is 128. For HPC(High Performance
> Computing) cloud computing, it requires more vcpus support in a single VM.
> The usage model is to create just one VM on a machine with the same number
> vcpus as logical cpus on the host and pin vcpu on each logical cpu in order 
> to get
> good compute performance.
> 
> Intel Xeon phi KNL(Knights Landing) is dedicated to HPC market and supports
> 288 logical cpus. So we hope VM can support 288 vcpu to meet HPC
> requirement.
> 
> Current Linux kernel requires IR(interrupt remapping) when MAX APIC ID is >
> 255 because interrupt only can be delivered among 0~255 cpus without IR. IR in
> VM relies on the virtual iommu support.
> 
> KVM Virtual iommu support status
> 
> Current, Qemu has a basic virtual iommu to do address translation for virtual
> device and it only works for the Q35 machine type. KVM reuses it and Redhat is
> adding IR to support more than 255 vcpus.
> 
> How to add virtual iommu for Xen?
> -
> First idea came to my mind is to reuse Qemu virtual iommu but Xen didn't
> support Q35 so far. Enabling Q35 for Xen seems not a short term task.
> Anthony did some related jobs before.
> 
> I'd like to see your comments about how to implement virtual iommu for Xen.
> 
> 1) Reuse Qemu virtual iommu or write a separate one for Xen?
> 2) Enable Q35 for Xen to reuse Qemu virtual iommu?
> 
> Your comments are very appreciated. Thanks a lot.
> --
> Best regards
> Tianyu Lan
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] Discussion about virtual iommu support for Xen guest

2016-05-26 Thread Lan Tianyu
Hi All:
We are trying to push virtual iommu support for Xen guests, and several
features are blocked on it.

Motivation:
---
1) Add SVM (Shared Virtual Memory) support for Xen guests
Supporting iGFX pass-through for SVM-enabled devices requires virtual
iommu support to emulate the related registers and to intercept/handle
the guest's SVM configuration in the VMM.

2) Increase the maximum vcpu count for a single VM.

So far, the maximum vcpu count for a Xen hvm guest is 128. HPC (High
Performance Computing) cloud computing requires support for more vcpus
in a single VM. The usage model is to create just one VM on a machine,
with the same number of vcpus as logical cpus on the host, and to pin
each vcpu to a logical cpu in order to get good compute performance.

Intel Xeon Phi KNL (Knights Landing) is dedicated to the HPC market and
supports 288 logical cpus, so we hope a VM can support 288 vcpus to meet
the HPC requirement.

The current Linux kernel requires IR (interrupt remapping) when the
maximum APIC ID is > 255, because without IR interrupts can only be
delivered to cpus with APIC IDs 0~255. IR in a VM relies on virtual
iommu support.
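
(As a minimal illustration of where the 255 limit comes from -- assuming
the standard xAPIC MSI address layout and the VT-d remappable format; the
names below are made up, this is not Xen code:)

#include <stdint.h>

#define MSI_ADDR_BASE  0xfee00000u

/* Compatibility format: only an 8-bit destination APIC ID fits in the
 * address (bits 19:12), so a device interrupt can target at most APIC
 * IDs 0..255. */
static uint32_t msi_addr_compat(uint8_t dest_apic_id)
{
    return MSI_ADDR_BASE | ((uint32_t)dest_apic_id << 12);
}

/* Remappable format: the same bits instead carry a 16-bit handle into
 * the interrupt remapping table (bit 4 = format, bits 19:5 = handle[14:0],
 * bit 2 = handle[15]); the IRTE it points at holds a 32-bit destination
 * ID, which is what makes more than 255 cpus reachable. */
static uint32_t msi_addr_remappable(uint16_t handle)
{
    return MSI_ADDR_BASE | (1u << 4)
           | (((uint32_t)handle & 0x7fff) << 5)
           | (((uint32_t)handle >> 15) << 2);
}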

KVM Virtual iommu support status

Currently, Qemu has a basic virtual iommu that does address translation
for virtual devices, and it only works with the Q35 machine type. KVM
reuses it, and Red Hat is adding IR to it in order to support more than
255 vcpus.

How to add virtual iommu for Xen?
-
The first idea that came to my mind is to reuse the Qemu virtual iommu,
but Xen doesn't support Q35 so far, and enabling Q35 for Xen doesn't seem
like a short-term task. Anthony did some related work on this before.

I'd like to see your comments about how to implement virtual iommu for Xen.

1) Reuse Qemu virtual iommu or write a separate one for Xen?
2) Enable Q35 for Xen to reuse Qemu virtual iommu?

Your comments are much appreciated. Thanks a lot.
-- 
Best regards
Tianyu Lan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel