On Wed, Oct 26, 2022 at 8:18 PM Michal Orzel <michal.or...@amd.com> wrote:
> Hi Rahul,
>

Hello all

[sorry for the possible format issues]

> On 26/10/2022 16:33, Rahul Singh wrote:
> >
> > Hi Julien,
> >
> >> On 26 Oct 2022, at 2:36 pm, Julien Grall <jul...@xen.org> wrote:
> >>
> >> On 26/10/2022 14:17, Rahul Singh wrote:
> >>> Hi All,
> >>
> >> Hi Rahul,
> >>
> >>> At Arm, we started to implement the POC to support 2 levels of page tables/nested translation in SMMUv3.
> >>> To support nested translation for guest OS Xen needs to expose the virtual IOMMU. If we passthrough the
> >>> device to the guest that is behind an IOMMU and virtual IOMMU is enabled for the guest there is a need to
> >>> add IOMMU binding for the device in the passthrough node as per [1]. This email is to get an agreement on
> >>> how to add the IOMMU binding for guest OS.
> >>> Before I will explain how to add the IOMMU binding let me give a brief overview of how we will add support for virtual
> >>> IOMMU on Arm. In order to implement virtual IOMMU Xen need SMMUv3 Nested translation support. SMMUv3 hardware
> >>> supports two stages of translation. Each stage of translation can be independently enabled. An incoming address is logically
> >>> translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is
> >>> intended to be used by a software entity( Guest OS) to provide isolation or translation to buffers within the entity, for example,
> >>> DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is
> >>> intended to virtualize device DMA to guest VM address spaces. When both stage 1 and stage 2 are enabled, the translation
> >>> configuration is called nesting.
> >>> Stage 1 translation support is required to provide isolation between different devices within the guest OS. XEN already supports
> >>> Stage 2 translation but there is no support for Stage 1 translation for guests.
> >>> We will add support for guests to configure the Stage 1 translation via virtual IOMMU. XEN will emulate the SMMU
> >>> hardware and exposes the virtual SMMU to the guest.
> >>> Guest can use the native SMMU driver to configure the stage 1 translation. When the guest configures the SMMU for Stage 1,
> >>> XEN will trap the access and configure the hardware accordingly.
> >>> Now back to the question of how we can add the IOMMU binding between the virtual IOMMU and the master devices so that
> >>> guests can configure the IOMMU correctly. The solution that I am suggesting is as below:
> >>> For dom0, while handling the DT node(handle_node()) Xen will replace the phandle in the "iommus" property with the virtual
> >>> IOMMU node phandle.
> >>
> >> Below, you said that each IOMMUs may have a different ID space. So shouldn't we expose one vIOMMU per pIOMMU? If not, how do you expect the user to specify the mapping?
> >
> > Yes you are right we need to create one vIOMMU per pIOMMU for dom0. This also helps in the ACPI case
> > where we don't need to modify the tables to delete the pIOMMU entries and create one vIOMMU.
> > In this case, no need to replace the phandle as Xen create the vIOMMU with the same pIOMMU
> > phandle and same base address.
> >
> > For domU guests one vIOMMU per guest will be created.
> >
> >>
> >>> For domU guests, when passthrough the device to the guest as per [2], add the below property in the partial device tree
> >>> node that is required to describe the generic device tree binding for IOMMUs and their master(s)
> >>> "iommus = < &magic_phandle 0xvMasterID>"
> >>> • magic_phandle will be the phandle ( vIOMMU phandle in xl) that will be documented so that the user can set that in partial DT node (0xfdea).
> >>
> >> Does this mean only one IOMMU will be supported in the guest?
> >
> > Yes.
> >
> >>
> >>> • vMasterID will be the virtual master ID that the user will provide.
> >>> The partial device tree will look like this:
> >>> /dts-v1/;
> >>> / {
> >>>     /* #*cells are here to keep DTC happy */
> >>>     #address-cells = <2>;
> >>>     #size-cells = <2>;
> >>>     aliases {
> >>>         net = &mac0;
> >>>     };
> >>>     passthrough {
> >>>         compatible = "simple-bus";
> >>>         ranges;
> >>>         #address-cells = <2>;
> >>>         #size-cells = <2>;
> >>>         mac0: ethernet@10000000 {
> >>>             compatible = "calxeda,hb-xgmac";
> >>>             reg = <0 0x10000000 0 0x1000>;
> >>>             interrupts = <0 80 4 0 81 4 0 82 4>;
> >>>             iommus = <0xfdea 0x01>;
> >>>         };
> >>>     };
> >>> };
> >>> In xl.cfg we need to define a new option to inform Xen about vMasterId to pMasterId mapping and to which IOMMU device
> >>> the master device is connected so that Xen can configure the right IOMMU. This is required if the system has devices that have
> >>> the same master ID but behind a different IOMMU.
> >>
> >> In xl.cfg, we already pass the device-tree node path to passthrough. So Xen should already have all the information about the IOMMU and Master-ID. So it doesn't seem necessary for Device-Tree.
> >>
> >> For ACPI, I would have expected the information to be found in the IOREQ.
> >>
> >> So can you add more context why this is necessary for everyone?
> >
> > We have information for IOMMU and Master-ID but we don't have information for linking vMaster-ID to pMaster-ID.
> > The device tree node will be used to assign the device to the guest and configure the Stage-2 translation. Guest will use the
> > vMaster-ID to configure the vIOMMU during boot. Xen needs information to link vMaster-ID to pMaster-ID to configure
> > the corresponding pIOMMU. As I mention we need vMaster-ID in case a system could have 2 identical Master-ID but
> > each one connected to a different SMMU and assigned to the guest.
>
> I think the proposed solution would work and I would just like to clear some issues.
>
> Please correct me if I'm wrong:
>
> In the xl config file we already need to specify dtdev to point to the device path in host dtb.
> In the partial device tree we specify the vMasterId as well as magic phandle.
> Isn't it that we already have all the information necessary without the need for iommu_devid_map?
> For me it looks like the partial dtb provides vMasterID and dtdev provides pMasterID as well as physical phandle to SMMU.
>
> Having said that, I can also understand that specifying everything in one place using iommu_devid_map can be easier
> and reduces the need for device tree parsing.
>
> Apart from that, what is the reason of exposing only one vSMMU to guest instead of one vSMMU per pSMMU?
> In the latter solution, the whole issue with handling devices with the same stream ID but belonging to different SMMUs
> would be gone. It would also result in a more natural way of the device tree look. Normally a guest would see
> e.g. both SMMUs and exposing only one can be misleading.
>

I also have the same question. From the earlier answers, as I understand it, there are going to be identity vSMMU <-> pSMMU mappings for Dom0, so why diverge for DomU?

Also, I am wondering how this solution would work for IPMMU-VMSA Gen3 (Gen4), which also supports two stages of translation, so nested translation could be possible in general, although there might be some pitfalls (yes, I understand that the code to emulate access to control registers would differ from SMMUv3, but some other code could be common).

> >>
> >>> iommu_devid_map = [ "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" , "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS"]
> >>> • PMASTER_ID is the physical master ID of the device from the physical DT.
> >>> • VMASTER_ID is the virtual master Id that the user will configure in the partial device tree.
> >>> • IOMMU_BASE_ADDRESS is the base address of the physical IOMMU device to which this device is connected.

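To make the entry format above concrete, here is a minimal Python sketch of how such entries could be parsed and turned into a lookup table. It is not taken from xl or libxl: DevidMapEntry, parse_devid_map_entry and build_lookup are hypothetical names, and two behaviours are my assumptions rather than part of the proposal: an omitted VMASTER_ID defaults to the physical ID, and vMasterIDs must be unique because the guest sees a single vIOMMU. The second SMMU base address (0x5f000000) is made up for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class DevidMapEntry(NamedTuple):
    """One parsed "PMASTER_ID[@VMASTER_ID],IOMMU_BASE_ADDRESS" entry."""
    pmaster_id: int
    vmaster_id: int
    iommu_base: int


def parse_devid_map_entry(spec: str) -> DevidMapEntry:
    """Parse a single iommu_devid_map string (hypothetical helper)."""
    ids, comma, base = spec.partition(",")
    if not comma or not base.strip():
        raise ValueError(f"missing IOMMU_BASE_ADDRESS in {spec!r}")
    pmaster, at, vmaster = ids.partition("@")
    # Base 0 accepts both "0x10" and "16".
    pmaster_id = int(pmaster.strip(), 0)
    # Assumption: without "@VMASTER_ID" the virtual ID equals the physical ID.
    vmaster_id = int(vmaster.strip(), 0) if at else pmaster_id
    return DevidMapEntry(pmaster_id, vmaster_id, int(base.strip(), 0))


def build_lookup(specs: Iterable[str]) -> Dict[int, Tuple[int, int]]:
    """Map vMasterID -> (pIOMMU base address, pMasterID)."""
    lookup: Dict[int, Tuple[int, int]] = {}
    for spec in specs:
        entry = parse_devid_map_entry(spec)
        # Assumption: one vIOMMU per guest, so a vMasterID may appear only once.
        if entry.vmaster_id in lookup:
            raise ValueError(f"duplicate vMasterID {entry.vmaster_id:#x}")
        lookup[entry.vmaster_id] = (entry.iommu_base, entry.pmaster_id)
    return lookup


if __name__ == "__main__":
    # Same physical master ID (0x10) behind two different SMMUs.
    table = build_lookup(["0x10@0x01,0x4f000000", "0x10@0x02,0x5f000000"])
    for vid, (base, pid) in sorted(table.items()):
        print(f"vMasterID {vid:#x} -> SMMU@{base:#x}, pMasterID {pid:#x}")
```

Keying the table on the vMasterID alone is what would let two devices with the same physical master ID, each behind a different SMMU, coexist behind the single vIOMMU.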

If iommu_devid_map is the way to go, I have a question: would this configuration cover the following cases?
1. A device has several stream IDs
2. Several devices share the same stream ID (or several stream IDs)

> >>
> >> Below you give an example for Platform device. How would that fit in the context of PCI passthrough?
> >
> > In PCI passthrough case, xl will create the "iommu-map" property in vpci host bridge node with phandle to vIOMMU node.
> > vSMMUv3 node will be created in xl.
> >
> >>
> >>> Example: Let's say the user wants to assign the below physical device in DT to the guest.
> >>> iommu@4f000000 {
> >>>     compatible = "arm,smmu-v3";
> >>>     interrupts = <0x00 0xe4 0xf04>;
> >>>     interrupt-parent = <0x01>;
> >>>     #iommu-cells = <0x01>;
> >>>     interrupt-names = "combined";
> >>>     reg = <0x00 0x4f000000 0x00 0x40000>;
> >>>     phandle = <0xfdeb>;
> >>>     name = "iommu";
> >>> };
> >>
> >> So I guess this node will be written by Xen. How will you handle the case where there are extra properties to be added (e.g. dma-coherent)?
> >
> > In this example this is physical IOMMU node. vIOMMU node will be created by xl during guest creation.
> >>
> >>> test@10000000 {
> >>>     compatible = "viommu-test";
> >>>     iommus = <0xfdeb 0x10>;
> >>
> >> I am a bit confused. Here you use 0xfdeb for the phandle but below...
> >
> > Here 0xfdeb is the physical IOMMU node phandle...
> >>
> >>>     interrupts = <0x00 0xff 0x04>;
> >>>     reg = <0x00 0x10000000 0x00 0x1000>;
> >>>     name = "viommu-test";
> >>> };
> >>> The partial Device tree node will be like this:
> >>> / {
> >>>     /* #*cells are here to keep DTC happy */
> >>>     #address-cells = <2>;
> >>>     #size-cells = <2>;
> >>>     passthrough {
> >>>         compatible = "simple-bus";
> >>>         ranges;
> >>>         #address-cells = <2>;
> >>>         #size-cells = <2>;
> >>>         test@10000000 {
> >>>             compatible = "viommu-test";
> >>>             reg = <0 0x10000000 0 0x1000>;
> >>>             interrupts = <0 80 4 0 81 4 0 82 4>;
> >>>             iommus = <0xfdea 0x01>;
> >>
> >> ... you use 0xfdea.
> >> Does this mean 'xl' will rewrite the phandle?
> >
> > but here user has to set the "iommus" property with magic phandle as explained earlier. 0xfdea is magic phandle.
> >
> > Regards,
> > Rahul
>
> ~Michal
>

--
Regards,

Oleksandr Tyshchenko