On Wed, Oct 26, 2022 at 8:18 PM Michal Orzel <michal.or...@amd.com> wrote:

> Hi Rahul,
>


Hello all

[sorry for the possible format issues]


>
> On 26/10/2022 16:33, Rahul Singh wrote:
> >
> >
> > Hi Julien,
> >
> >> On 26 Oct 2022, at 2:36 pm, Julien Grall <jul...@xen.org> wrote:
> >>
> >>
> >>
> >> On 26/10/2022 14:17, Rahul Singh wrote:
> >>> Hi All,
> >>
> >> Hi Rahul,
> >>
> >>> At Arm, we started to implement a POC to support 2 levels of page
> >>> tables/nested translation in SMMUv3. To support nested translation
> >>> for a guest OS, Xen needs to expose a virtual IOMMU. If we pass
> >>> through a device that is behind an IOMMU to the guest, and the
> >>> virtual IOMMU is enabled for the guest, there is a need to add the
> >>> IOMMU binding for the device in the passthrough node as per [1].
> >>> This email is to get an agreement on how to add the IOMMU binding
> >>> for the guest OS.
> >>> Before I explain how to add the IOMMU binding, let me give a brief
> >>> overview of how we will add support for the virtual IOMMU on Arm.
> >>> In order to implement a virtual IOMMU, Xen needs SMMUv3 nested
> >>> translation support. SMMUv3 hardware supports two stages of
> >>> translation. Each stage of translation can be independently enabled.
> >>> An incoming address is logically translated from VA to IPA in
> >>> stage 1, then the IPA is input to stage 2, which translates the IPA
> >>> to the output PA. Stage 1 is intended to be used by a software
> >>> entity (guest OS) to provide isolation or translation to buffers
> >>> within the entity, for example, DMA isolation within an OS. Stage 2
> >>> is intended to be available in systems supporting the
> >>> Virtualization Extensions and is intended to virtualize device DMA
> >>> to guest VM address spaces. When both stage 1 and stage 2 are
> >>> enabled, the translation configuration is called nesting.
> >>> Stage 1 translation support is required to provide isolation between
> >>> different devices within the guest OS. Xen already supports stage 2
> >>> translation, but there is no support for stage 1 translation for
> >>> guests. We will add support for guests to configure the stage 1
> >>> translation via a virtual IOMMU. Xen will emulate the SMMU hardware
> >>> and expose the virtual SMMU to the guest. The guest can use the
> >>> native SMMU driver to configure the stage 1 translation. When the
> >>> guest configures the SMMU for stage 1, Xen will trap the access and
> >>> configure the hardware accordingly.
> >>> Now back to the question of how we can add the IOMMU binding between
> >>> the virtual IOMMU and the master devices so that guests can
> >>> configure the IOMMU correctly. The solution that I am suggesting is
> >>> as below:
> >>> For dom0, while handling the DT node (handle_node()), Xen will
> >>> replace the phandle in the "iommus" property with the virtual IOMMU
> >>> node phandle.
> >> Below, you said that each IOMMU may have a different ID space. So
> >> shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect
> >> the user to specify the mapping?
> >
> > Yes, you are right, we need to create one vIOMMU per pIOMMU for dom0.
> > This also helps in the ACPI case, where we don't need to modify the
> > tables to delete the pIOMMU entries and create one vIOMMU. In this
> > case, there is no need to replace the phandle, as Xen creates the
> > vIOMMU with the same phandle and the same base address as the pIOMMU.
> >
> > For domU guests one vIOMMU per guest will be created.
> >
> >>
> >>> For domU guests, when passing through the device to the guest as per
> >>> [2], add the below property, which is required to describe the
> >>> generic device tree binding for IOMMUs and their master(s), in the
> >>> partial device tree node:
> >>> "iommus = <&magic_phandle 0xvMasterID>"
> >>>      • magic_phandle will be the phandle (vIOMMU phandle in xl) that
> >>>        will be documented so that the user can set it in the partial
> >>>        DT node (0xfdea).
> >>
> >> Does this mean only one IOMMU will be supported in the guest?
> >
> > Yes.
> >
> >>
> >>>      • vMasterID will be the virtual master ID that the user will
> >>>        provide.
> >>> The partial device tree will look like this:
> >>> /dts-v1/;
> >>>  / {
> >>>     /* #*cells are here to keep DTC happy */
> >>>     #address-cells = <2>;
> >>>     #size-cells = <2>;
> >>>       aliases {
> >>>         net = &mac0;
> >>>     };
> >>>       passthrough {
> >>>         compatible = "simple-bus";
> >>>         ranges;
> >>>         #address-cells = <2>;
> >>>         #size-cells = <2>;
> >>>         mac0: ethernet@10000000 {
> >>>             compatible = "calxeda,hb-xgmac";
> >>>             reg = <0 0x10000000 0 0x1000>;
> >>>             interrupts = <0 80 4  0 81 4  0 82 4>;
> >>>            iommus = <0xfdea 0x01>;
> >>>         };
> >>>     };
> >>> };
> >>>  In xl.cfg we need to define a new option to inform Xen about the
> >>> vMasterID to pMasterID mapping and to which IOMMU device the master
> >>> device is connected, so that Xen can configure the right IOMMU. This
> >>> is required if the system has devices that have the same master ID
> >>> but are behind different IOMMUs.
> >>
> >> In xl.cfg, we already pass the device-tree node path to passthrough.
> >> So Xen should already have all the information about the IOMMU and
> >> Master-ID. So it doesn't seem necessary for Device-Tree.
> >>
> >> For ACPI, I would have expected the information to be found in the
> >> IOREQ.
> >>
> >> So can you add more context why this is necessary for everyone?
> >
> > We have information for the IOMMU and Master-ID, but we don't have
> > information for linking vMaster-ID to pMaster-ID. The device tree
> > node will be used to assign the device to the guest and configure
> > the Stage-2 translation. The guest will use the vMaster-ID to
> > configure the vIOMMU during boot. Xen needs information to link
> > vMaster-ID to pMaster-ID to configure the corresponding pIOMMU. As I
> > mentioned, we need the vMaster-ID in case a system has 2 identical
> > Master-IDs, each one connected to a different SMMU and assigned to
> > the guest.
>
> I think the proposed solution would work and I would just like to clear
> some issues.
>
> Please correct me if I'm wrong:
>
> In the xl config file we already need to specify dtdev to point to the
> device path in host dtb.
> In the partial device tree we specify the vMasterId as well as magic
> phandle.
> Isn't it that we already have all the information necessary without the
> need for iommu_devid_map?
> For me it looks like the partial dtb provides vMasterID and dtdev provides
> pMasterID as well as physical phandle to SMMU.
>
> Having said that, I can also understand that specifying everything in one
> place using iommu_devid_map can be easier
> and reduces the need for device tree parsing.
>
> Apart from that, what is the reason for exposing only one vSMMU to the
> guest instead of one vSMMU per pSMMU?
> In the latter solution, the whole issue with handling devices with the
> same stream ID but belonging to different SMMUs would be gone. It would
> also result in a more natural-looking device tree. Normally a guest
> would see e.g. both SMMUs, and exposing only one can be misleading.
>

I also have the same question. As I understand from the earlier answers,
there are going to be identity vSMMU <-> pSMMU mappings for Dom0, so why
diverge for DomU?
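
To make the ambiguity concrete, here is a minimal sketch (plain Python,
purely illustrative, not Xen code; the device names, base addresses and
stream IDs are made-up values) of why a single vSMMU needs an extra
vMasterID mapping while one vSMMU per pSMMU does not:

```python
# Hypothetical example values: two devices share stream ID 0x10 but sit
# behind different physical SMMUs (identified here by base address).
devices = [
    {"name": "dev_a", "psmmu": 0x4F000000, "sid": 0x10},
    {"name": "dev_b", "psmmu": 0x5F000000, "sid": 0x10},
]


def conflicts(devs, key):
    """Return pairs of device names whose guest-visible key collides."""
    seen, clashes = {}, []
    for dev in devs:
        k = key(dev)
        if k in seen:
            clashes.append((seen[k], dev["name"]))
        else:
            seen[k] = dev["name"]
    return clashes


# One vSMMU for the whole guest: the stream ID alone identifies a master.
single_vsmmu = conflicts(devices, key=lambda d: d["sid"])

# One vSMMU per pSMMU: the (SMMU, stream ID) pair identifies a master.
per_psmmu = conflicts(devices, key=lambda d: (d["psmmu"], d["sid"]))
```

With these inputs, single_vsmmu reports dev_a and dev_b as colliding,
while per_psmmu is empty, so no renumbering would be needed in that case.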

I am also wondering how this solution would work for IPMMU-VMSA Gen3
(Gen4), which also supports two stages of translation, so nested
translation should be possible in general, although there might be some
pitfalls (yes, I understand that the code to emulate access to the control
registers would differ from SMMUv3, but some other code could be
common).
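
What I have in mind as hardware-independent is the nesting model itself:
the guest owns stage 1 (VA -> IPA) and Xen owns stage 2 (IPA -> PA). A
minimal sketch of that model, assuming flat single-level tables with 4K
pages (real SMMUv3 and IPMMU-VMSA tables are multi-level, and all names
here are hypothetical):

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(table, addr):
    """Look up one stage: page number -> page number, offset preserved."""
    page = addr >> PAGE_SHIFT
    if page not in table:
        raise LookupError(f"translation fault at {addr:#x}")
    return (table[page] << PAGE_SHIFT) | (addr & PAGE_MASK)


def nested_translate(stage1, stage2, va):
    """Stage 1 (guest-owned): VA -> IPA, then stage 2 (Xen-owned): IPA -> PA."""
    ipa = translate(stage1, va)
    return translate(stage2, ipa)


# Example tables: the guest maps VA page 0x1 to IPA page 0x40000,
# Xen maps IPA page 0x40000 to PA page 0x80000.
stage1 = {0x1: 0x40000}
stage2 = {0x40000: 0x80000}
pa = nested_translate(stage1, stage2, 0x1234)
```

Only the code that walks or programs the actual tables and emulates the
control registers would have to be specific to each IOMMU.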




>
> >>
> >>>  iommu_devid_map = [ "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS",
> >>>                      "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" ]
> >>>      • PMASTER_ID is the physical master ID of the device from the
> >>>        physical DT.
> >>>      • VMASTER_ID is the virtual master ID that the user will
> >>>        configure in the partial device tree.
> >>>      • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU
> >>>        device to which this device is connected.



If iommu_devid_map is the way to go, I have a question: would this
configuration cover the following cases?
1. Device has several stream IDs
2. Several devices share the stream ID (or several stream IDs)
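
To illustrate what I mean, below is a rough sketch of how the proposed
"PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" strings could be parsed.
This is not existing xl code; the function names are hypothetical, and
the default of VMASTER_ID = PMASTER_ID when the optional part is omitted
is my assumption, since the proposal does not say what the default is.

```python
from collections import defaultdict


def parse_iommu_devid_map(entries):
    """Parse "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" strings.

    Returns a list of (pmaster, vmaster, iommu_base) tuples.
    """
    result = []
    for entry in entries:
        ids, base = entry.split(",")
        pmaster, _, vmaster = ids.strip().partition("@")
        p = int(pmaster, 0)
        # Assumption: a missing @VMASTER_ID means "same as PMASTER_ID".
        v = int(vmaster, 0) if vmaster else p
        result.append((p, v, int(base.strip(), 0)))
    return result


def shared_vmaster_ids(parsed):
    """Return vMasterIDs that map to more than one (pMasterID, IOMMU) pair."""
    by_vmaster = defaultdict(set)
    for p, v, base in parsed:
        by_vmaster[v].add((p, base))
    return {v: sorted(t) for v, t in by_vmaster.items() if len(t) > 1}


# Case 1: one device with two stream IDs needs one entry per stream ID.
multi_sid = parse_iommu_devid_map(
    ["0x10@0x1,0x4f000000", "0x11@0x2,0x4f000000"]
)
```

For case 1 the format seems workable with one entry per stream ID, but
nothing in an entry says which dtdev it belongs to. For case 2, a single
entry would have to stand for several devices, and the format gives no
way to express that either.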




> >>
> >> Below you give an example for a platform device. How would that fit
> >> in the context of PCI passthrough?
> >
> > In the PCI passthrough case, xl will create the "iommu-map" property
> > in the vpci host bridge node with a phandle to the vIOMMU node.
> > The vSMMUv3 node will be created in xl.
> >
> >>
> >>>  Example: Let's say the user wants to assign the below physical
> >>> device in DT to the guest.
> >>>  iommu@4f000000 {
> >>>                 compatible = "arm,smmu-v3";
> >>>                 interrupts = <0x00 0xe4 0xf04>;
> >>>                 interrupt-parent = <0x01>;
> >>>                 #iommu-cells = <0x01>;
> >>>                 interrupt-names = "combined";
> >>>                 reg = <0x00 0x4f000000 0x00 0x40000>;
> >>>                 phandle = <0xfdeb>;
> >>>                 name = "iommu";
> >>> };
> >>
> >> So I guess this node will be written by Xen. How will you handle the
> >> case where there are extra properties to be added (e.g. dma-coherent)?
> >
> > In this example this is the physical IOMMU node. The vIOMMU node will
> > be created by xl during guest creation.
> >>
> >>>  test@10000000 {
> >>>      compatible = "viommu-test";
> >>>      iommus = <0xfdeb 0x10>;
> >>
> >> I am a bit confused. Here you use 0xfdeb for the phandle but below...
> >
> > Here 0xfdeb is the physical IOMMU node phandle...
> >>
> >>>      interrupts = <0x00 0xff 0x04>;
> >>>      reg = <0x00 0x10000000 0x00 0x1000>;
> >>>      name = "viommu-test";
> >>> };
> >>>  The partial Device tree node will be like this:
> >>>  / {
> >>>     /* #*cells are here to keep DTC happy */
> >>>     #address-cells = <2>;
> >>>     #size-cells = <2>;
> >>>       passthrough {
> >>>         compatible = "simple-bus";
> >>>         ranges;
> >>>         #address-cells = <2>;
> >>>         #size-cells = <2>;
> >>>      test@10000000 {
> >>>              compatible = "viommu-test";
> >>>              reg = <0 0x10000000 0 0x1000>;
> >>>              interrupts = <0 80 4  0 81 4  0 82 4>;
> >>>              iommus = <0xfdea 0x01>;
> >>
> >> ... you use 0xfdea. Does this mean 'xl' will rewrite the phandle?
> >
> > but here the user has to set the "iommus" property with the magic
> > phandle as explained earlier. 0xfdea is the magic phandle.
> >
> > Regards,
> > Rahul
>
> ~Michal
>
>
>

-- 
Regards,

Oleksandr Tyshchenko
