Re: [PATCH 1/2] arm64: dts: stratix10: Add Stratix10 SMMU support
Hi Robin,

On 08/08/2018 12:38 PM, Robin Murphy wrote:

On 25/07/18 19:24, thor.tha...@linux.intel.com wrote:

From: Thor Thayer

Add SMMU support to the Stratix10 Device Tree which includes adding the SMMU node and adding IOMMU stream ids to the SMMU peripherals. Update bindings.

Signed-off-by: Thor Thayer
---
This patch is dependent on the patch series "iommu/arm-smmu: Add runtime pm/sleep support" (https://patchwork.ozlabs.org/cover/946160/)
---
 .../devicetree/bindings/iommu/arm,smmu.txt        | 25 ++
 arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi | 30 ++
 2 files changed, 55 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 7c71a6ed465a..8e3fe0594e3e 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -18,6 +18,7 @@ conditions.
   "arm,mmu-500"
   "cavium,smmu-v2"
   "qcom,-smmu-v2", "qcom,smmu-v2"
+  "altr,smmu-v2"

Can we guarantee that no Altera SoC will ever exist with a different SMMU implementation, configuration, or clock tree? If we must have compatibles for SoC-specific integrations, I'd be a lot happier if they were actually SoC-specific, i.e. at least "altr,stratix10-smmu", or even something like "altr,gx5500-smmu" if there's a chance of new incompatible designs being added to the Stratix 10 family in future.

Good point. I'll get a better compatible string if I pursue this.

I'm still dubious that we actually need this for MMU-500, though, since we will always need the TCU clock enabled to do anything, and given the difficulty in associating particular TBU clocks with whichever domains might cause allocations into which TBU's TLBs, it seems highly unlikely that it'll ever be feasible to work at a granularity finer than "all of the clocks".
And at that point the names don't really matter, and we merely need something like the proposed of_clk_bulk_get()[1], which works fine regardless of how many TBUs and distinct clocks exist for a particular MMU-500 configuration and integration.

Yes, I would prefer to use the standard arm,mmu-500 but with the changes proposed by [2] that this patch was dependent on, it seemed I would need to make a new structure and type. I like the patch series for devm_clk_bulk_get_all() that includes of_clk_bulk_get(). That would enable my patch to work with minor changes to add bulk_clk to arm-smmu.c. However, the patchset doesn't seem to have been accepted yet.

   depending on the particular implementation and/or the version of the architecture implemented.

@@ -179,3 +180,27 @@ conditions.
      < SMMU_MDP_AHB_CLK>;
   clock-names = "bus", "iface";
  };
+
+ /* Stratix10 arm,smmu-v2 implementation */
+ smmu5: iommu@fa00 {
+  compatible = "altr,smmu-v2", "arm,mmu-500",
+   "arm,smmu-v2";
+  reg = <0xfa00 0x4>;
+  #global-interrupts = <2>;
+  #iommu-cells = <1>;
+  clocks = < STRATIX10_L4_MAIN_CLK>;
+  clock-names = "masters";

This isn't documented as an actual property, and if it also clocks the TCU then I'm not sure it's really the most accurate name.

Robin.

[1] https://patchwork.kernel.org/patch/10427095/

In the patch I'll remove the clock-names. I'll keep track of the status of this patch (and [3] from the same patchset). I tested a simple patch that uses devm_clk_bulk_get_all() from these bulk_clk patches and it works well with the standard "arm,mmu-500" compatible.

This patch has dependencies on [2]. It seems like [2] could use the bulk_clk patches in [1] above (in particular [2]'s patch 1/4). The function arm_smmu_fill_clk_data() wouldn't be needed because everything happens in devm_clk_bulk_get_all(). However, those bulk_clk patches have been hanging out there since May. I'm unclear on how to proceed.
Do I continue with the dependency on [2] or create a new patch adding the bulk clocks changes in [1] (and [3])?

Thanks for reviewing!

Thor

[2] https://patchwork.kernel.org/patch/10546677/
[3] https://patchwork.kernel.org/patch/10427079/

+  interrupt-parent = <>;
+  interrupts = <0 128 4>, /* Global Secure Fault */
+   <0 129 4>, /* Global Non-secure Fault */
+   /* Non-secure Context Interrupts (32) */
+   <0 138 4>, <0 139 4>, <0 140 4>, <0 141 4>,
+   <0 142 4>, <0 143 4>, <0 144 4>, <0 145 4>,
+   <0 146 4>, <0 147 4>, <0 148 4>, <0 149 4>,
+   <0 150 4>, <0 151 4>, <0 152 4>, <0 153 4>,
+   <0 154 4>, <0 155 4>, <0 156 4>, <0 157 4>,
+   <0 158 4>, <0 159 4>, <0 160 4>, <0 161 4>,
+   <0 162 4>, <0 163 4>, <0 164 4>, <0 165 4>,
+   <0 166 4>, <0 167 4>, <0 168 4>, <0 169
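For illustration, Robin's naming suggestion applied to the node discussed in this thread might look like the sketch below. This is a hypothetical fragment, not the actual patch: the phandle label (&clkmgr), unit address, and reg size are placeholders, and it assumes a driver that bulk-gets every clock in the node so that clock-names can be dropped.

```dts
/* Hypothetical sketch: SoC-specific compatible, falling back to generics */
smmu: iommu@fa000000 {
	compatible = "altr,stratix10-smmu", "arm,mmu-500", "arm,smmu-v2";
	reg = <0xfa000000 0x40000>;		/* placeholder address/size */
	#global-interrupts = <2>;
	#iommu-cells = <1>;
	/* With a bulk get of all clocks, no clock-names property is needed */
	clocks = <&clkmgr STRATIX10_L4_MAIN_CLK>;
};
```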
Re: IOAT DMA w/IOMMU
On 16/08/18 12:53 PM, Kit Chow wrote:
> On 08/16/2018 10:21 AM, Logan Gunthorpe wrote:
>> On 16/08/18 11:16 AM, Kit Chow wrote:
>>> I only have access to intel hosts for testing (and possibly an AMD
>>> host currently collecting dust) and am not sure how to go about getting
>>> the proper test coverage for other architectures.
>> Well, I thought you were only changing the Intel IOMMU implementation...
>> So testing on Intel hardware seems fine to me.
> For the ntb change, I wasn't sure if there was some other arch where
> ntb_async_tx_submit would work ok with the pci bar address but
> would break with the dma map (assuming map_resource has been implemented).
> If its all x86 at the moment, I guess I'm good to go. Please confirm.

I expect lots of arches have issues with dma_map_resource(); it's pretty new, and if it didn't work correctly on x86 I imagine many other arches are wrong too. If an arch doesn't work, someone will have to fix that arch's IOMMU implementation. But using dma_map_resource() in ntb_transport is definitely correct compared to what was done before.

Logan

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: IOAT DMA w/IOMMU
On 08/16/2018 10:21 AM, Logan Gunthorpe wrote:

On 16/08/18 11:16 AM, Kit Chow wrote:

I only have access to intel hosts for testing (and possibly an AMD host currently collecting dust) and am not sure how to go about getting the proper test coverage for other architectures.

Well, I thought you were only changing the Intel IOMMU implementation... So testing on Intel hardware seems fine to me.

For the ntb change, I wasn't sure if there was some other arch where ntb_async_tx_submit would work ok with the pci bar address but would break with the dma map (assuming map_resource has been implemented). If its all x86 at the moment, I guess I'm good to go. Please confirm.

Thanks
Kit

Any suggestions? Do you anticipate any issues with dma mapping the pci bar address in ntb_async_tx_submit on non-x86 archs? Do you know of folks who have ntb test setups with non-x86 hosts who might be able to help?

I expect other IOMMUs will need to be updated to properly support dma_map_resource but that will probably need to be done on a case by case basis as the need arises. Unfortunately it seems the work to add dma_map_resource() didn't really consider that the IOMMUs had to have proper support and probably should have errored out instead of just passing along the physical address if the IOMMU didn't have support.

I don't know of anyone with an NTB setup on anything but x86 at the moment.

Logan
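For readers following along, the ntb change under discussion amounts to mapping the peer MW (PCI BAR) address through the DMA API instead of handing the raw physical address to the DMA engine. The following is a rough kernel-context sketch only (not compilable standalone; variable and label names are illustrative, not the actual ntb_transport code):

```c
/* Illustrative sketch of the idea discussed above, not the actual patch:
 * map the peer MW (PCI BAR) address for the DMA device before issuing
 * the copy, and bail out if the mapping fails.
 */
dma_addr_t dest;

dest = dma_map_resource(dma_dev->dev, dest_phys_addr, len,
			DMA_FROM_DEVICE, 0);
if (dma_mapping_error(dma_dev->dev, dest))
	goto err_cpu_copy;	/* fall back to a CPU memcpy */

/* ... build and submit the DMA descriptor using 'dest' ... */

dma_unmap_resource(dma_dev->dev, dest, len, DMA_FROM_DEVICE, 0);
```

The key point from the thread: on IOMMUs without proper dma_map_resource() support this call may silently pass through the physical address, which is why per-arch IOMMU fixes are needed.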
Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU
On 15/08/18 20:56, Dmitry Osipenko wrote:
On Friday, 3 August 2018 18:43:41 MSK Robin Murphy wrote:
On 02/08/18 19:24, Dmitry Osipenko wrote:
On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
On 27/07/18 15:10, Dmitry Osipenko wrote:
On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:

The proposed solution adds a new option to the base device driver structure that allows device drivers to explicitly convey to the drivers core that the implicit IOMMU backing for devices must not happen.

Why is IOMMU mapping a problem for the Tegra GPU driver? If we add something like this then it should not be the choice of the device driver, but of the user and/or the firmware.

Agreed, and it would still need somebody to configure an identity domain so that transactions aren't aborted immediately. We currently allow the identity domain to be used by default via a command-line option, so I guess we'd need a way for firmware to request that on a per-device basis.

The IOMMU mapping itself is not a problem, the problem is the management of the IOMMU. For Tegra we don't want anything to intrude into the IOMMU activities because:

1) GPU HW require additional configuration for the IOMMU usage and dumb mapping of the allocations simply doesn't work.

Generally, that's already handled by the DRM drivers allocating their own unmanaged domains. The only problem we really need to solve in that regard is that currently the device DMA ops don't get updated when moving away from the managed domain.
That's been OK for the VFIO case where the device is bound to a different driver which we know won't make any explicit DMA API calls, but for the more general case of IOMMU-aware drivers we could certainly do with a bit of cooperation between the IOMMU API, DMA API, and arch code to update the DMA ops dynamically to cope with intermediate subsystems making DMA API calls on behalf of devices they don't know the intimate details of.

2) Older Tegra generations have limited resources and capabilities in regards to IOMMU usage; allocating an IOMMU domain per-device is just impossible, for example.

3) HW performs context switches and so particular allocations have to be assigned to a particular context's IOMMU domain.

I understand Qualcomm SoCs have a similar thing too, and AFAICS that case just doesn't fit into the current API model at all. We need the IOMMU driver to somehow know about the specific details of which devices have magic associations with specific contexts, and we almost certainly need a more expressive interface than iommu_domain_alloc() to have any hope of reliable results.

This is correct for Qualcomm GPUs - the GPU hardware context switching requires a specific context and there are some restrictions around secure contexts as well. We don't really care if the DMA attaches to a context just as long as it doesn't attach to the one(s) we care about. Perhaps a "valid context" mask would work in from the DT or the device struct to give the subsystems a clue as to which domains they were allowed to use. I recognize that there isn't a one-size-fits-all solution to this problem so I'm open to different ideas.

Designating whether implicit IOMMU backing is appropriate for a device via a device-tree property sounds a bit awkward because that would be a kind of software description (of a custom Linux driver model), while device-tree is supposed to describe HW. What about granting IOMMU drivers the ability to decide whether the implicit backing for a device is appropriate?
Like this:

bool implicit_iommu_for_dma_is_allowed(struct device *dev)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (!group)
		return false;

	iommu_group_put(group);

	if (!ops->implicit_iommu_for_dma_is_allowed)
		return true;

	return ops->implicit_iommu_for_dma_is_allowed(dev);
}

Then arch_setup_dma_ops() could have a clue whether implicit IOMMU backing for a device is appropriate. Guys, does it sound good to you or maybe you have something else on your mind? Even if it's not an ideal solution, it fixes the immediate problem and should be good enough for a starter.

To me that looks like a step in the wrong direction that won't help at all in actually addressing the underlying issues. If the GPU driver wants to explicitly control IOMMU mappings instead of relying on the IOMMU_DOMAIN_DMA abstraction, then it should use its own unmanaged domain. At that point it shouldn't matter if a DMA ops domain was allocated, since the GPU device will no longer be attached to it. It is not obvious
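The unmanaged-domain approach Robin describes can be sketched as follows. This is a kernel-context sketch only (not compilable standalone); it shows the general shape of a driver allocating and attaching its own domain, with error labels and variables as illustrative placeholders:

```c
/* Kernel-context sketch: the driver takes explicit control of its IOMMU
 * mappings by attaching its own unmanaged domain, replacing whatever
 * default DMA domain the core set up. */
struct iommu_domain *domain;

domain = iommu_domain_alloc(dev->bus);		/* IOMMU_DOMAIN_UNMANAGED */
if (!domain)
	return -ENOMEM;

if (iommu_attach_device(domain, dev))
	goto free_domain;

/* From here the driver manages all mappings explicitly. */
iommu_map(domain, iova, paddr, size, IOMMU_READ | IOMMU_WRITE);
```

This is why the allocation of a DMA ops domain becomes moot once the device is re-attached elsewhere, which is the crux of Robin's objection.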
Re: IOAT DMA w/IOMMU
On 16/08/18 11:16 AM, Kit Chow wrote:
> I only have access to intel hosts for testing (and possibly an AMD
> host currently collecting dust) and am not sure how to go about getting
> the proper test coverage for other architectures.

Well, I thought you were only changing the Intel IOMMU implementation... So testing on Intel hardware seems fine to me.

> Any suggestions? Do you anticipate any issues with dma mapping the pci bar
> address in ntb_async_tx_submit on non-x86 archs? Do you know of folks
> who have ntb test setups with non-x86 hosts who might be able to help?

I expect other IOMMUs will need to be updated to properly support dma_map_resource but that will probably need to be done on a case by case basis as the need arises. Unfortunately it seems the work to add dma_map_resource() didn't really consider that the IOMMUs had to have proper support and probably should have errored out instead of just passing along the physical address if the IOMMU didn't have support.

I don't know of anyone with an NTB setup on anything but x86 at the moment.

Logan
Re: IOAT DMA w/IOMMU
On 08/09/2018 02:36 PM, Logan Gunthorpe wrote:

On 09/08/18 03:31 PM, Eric Pilmore wrote:

On Thu, Aug 9, 2018 at 12:35 PM, Logan Gunthorpe wrote:

Hey,

On 09/08/18 12:51 PM, Eric Pilmore wrote:

Was wondering if anybody here has used IOAT DMA engines with an IOMMU turned on (Xeon based system)? My specific question is really whether it is possible to DMA (w/IOAT) to a PCI BAR address as the destination without having to map that address to the IOVA space of the DMA engine first (assuming the IOMMU is on)?

I haven't tested this scenario but my guess would be that IOAT would indeed go through the IOMMU and the PCI BAR address would need to be properly mapped into the IOAT's IOVA. The fact that you see DMAR errors is probably a good indication that this is the case. I really don't know why you'd want to DMA something without mapping it.

The thought was to avoid paying the price of having to go through yet another translation and also because it was not believed to be necessary anyway since the DMA device could go straight to a PCI BAR address without the need for a mapping. We have been playing with two DMA engines, IOAT and PLX. The PLX does not have any issues going straight to the PCI BAR address, but unlike IOAT, PLX is sitting "in the PCI tree".

Yes, you can do true P2P transactions with the PLX DMA engine. (Though you do have to be careful, as technically you need to translate to a PCI bus address, not the CPU physical address, which happen to be the same on x86; and you need to watch that ACS doesn't mess it up.) It doesn't sound like it's possible to avoid the IOMMU when using the IOAT, so you need the mapping. I would not expect the extra translation in the IOMMU to be noticeable, performance wise.

Logan

Hi Logan,

Based on Dave Jiang's suggestion, I am looking into upstreaming the ntb change to dma map the pci bar address in ntb_async_tx_submit() (along with the intel iommu change to support dma_map_resource).
I only have access to intel hosts for testing (and possibly an AMD host currently collecting dust) and am not sure how to go about getting the proper test coverage for other architectures. Any suggestions? Do you anticipate any issues with dma mapping the pci bar address in ntb_async_tx_submit on non-x86 archs? Do you know of folks who have ntb test setups with non-x86 hosts who might be able to help?

Thanks
Kit
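Logan's caveat earlier in the thread, that true P2P needs the peer BAR's bus address rather than its CPU physical address, can be made concrete with a short kernel-context sketch (not compilable standalone; peer_pdev and bar are illustrative names):

```c
/* Illustrative only: the two views of a peer device's BAR. */
phys_addr_t phys = pci_resource_start(peer_pdev, bar);	/* CPU physical */
pci_bus_addr_t bus = pci_bus_address(peer_pdev, bar);	/* PCI bus address */

/* On x86 these happen to be equal, which can hide bugs; behind a
 * translating host bridge they differ, so programming 'phys' into a
 * peer DMA engine is only accidentally correct on x86. */
```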
Re: [PATCH v3 1/2] iommu/arm-smmu-v3: fix unexpected CMD_SYNC timeout
On 2018-08-16 10:18 AM, Will Deacon wrote:

On Thu, Aug 16, 2018 at 04:21:17PM +0800, Leizhen (ThunderTown) wrote:

On 2018/8/15 20:26, Robin Murphy wrote:

On 15/08/18 11:23, Zhen Lei wrote:

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1d64710..3f5c236 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -566,7 +566,7 @@ struct arm_smmu_device {

 	int		gerr_irq;
 	int		combined_irq;
-	atomic_t	sync_nr;
+	u32		sync_nr;

 	unsigned long	ias; /* IPA */
 	unsigned long	oas; /* PA */
@@ -775,6 +775,11 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
 	return 0;
 }

+static inline void arm_smmu_cmdq_sync_set_msidata(u64 *cmd, u32 msidata)

If we *are* going to go down this route then I think it would make sense to move the msiaddr and CMDQ_SYNC_0_CS_MSI logic here as well; i.e. arm_smmu_cmdq_build_cmd() always generates a "normal" SEV-based sync command, then calling this guy would convert it to an MSI-based one. As-is, having bits of mutually-dependent data handled across two separate places just seems too messy and error-prone.

Yes. How about creating a new function "arm_smmu_cmdq_build_sync_msi_cmd"?

static inline
void arm_smmu_cmdq_build_sync_msi_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
{
	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
	cmd[1] = ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
}

None of this seems justified given the numbers from John, so please just do the simple thing and build the command with the lock held.

Agreed - sorry if my wording was unclear, but that suggestion was only for the possibility of it proving genuinely worthwhile to build the command outside the lock. Since that isn't the case, I definitely prefer the simpler approach too.

Robin.
Re: [PATCH v3 1/2] iommu/arm-smmu-v3: fix unexpected CMD_SYNC timeout
On Thu, Aug 16, 2018 at 04:21:17PM +0800, Leizhen (ThunderTown) wrote:
> On 2018/8/15 20:26, Robin Murphy wrote:
> > On 15/08/18 11:23, Zhen Lei wrote:
> >> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> >> index 1d64710..3f5c236 100644
> >> --- a/drivers/iommu/arm-smmu-v3.c
> >> +++ b/drivers/iommu/arm-smmu-v3.c
> >> @@ -566,7 +566,7 @@ struct arm_smmu_device {
> >>
> >> 	int		gerr_irq;
> >> 	int		combined_irq;
> >> -	atomic_t	sync_nr;
> >> +	u32		sync_nr;
> >>
> >> 	unsigned long	ias; /* IPA */
> >> 	unsigned long	oas; /* PA */
> >> @@ -775,6 +775,11 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
> >> 	return 0;
> >> }
> >>
> >> +static inline void arm_smmu_cmdq_sync_set_msidata(u64 *cmd, u32 msidata)
> >
> > If we *are* going to go down this route then I think it would make sense
> > to move the msiaddr and CMDQ_SYNC_0_CS_MSI logic here as well; i.e.
> > arm_smmu_cmdq_build_cmd() always generates a "normal" SEV-based sync
> > command, then calling this guy would convert it to an MSI-based one.
> > As-is, having bits of mutually-dependent data handled across two
> > separate places just seems too messy and error-prone.
>
> Yes. How about creating a new function "arm_smmu_cmdq_build_sync_msi_cmd"?
>
> static inline
> void arm_smmu_cmdq_build_sync_msi_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
> {
> 	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
> 	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
> 	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
> 	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
> 	cmd[1] = ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
> }

None of this seems justified given the numbers from John, so please just do the simple thing and build the command with the lock held.

Will
Re: [PATCH v3 1/2] iommu/arm-smmu-v3: fix unexpected CMD_SYNC timeout
On 2018/8/15 20:26, Robin Murphy wrote:
> On 15/08/18 11:23, Zhen Lei wrote:
>> The condition "(int)(VAL - sync_idx) >= 0" to break the loop in function
>> __arm_smmu_sync_poll_msi requires that sync_idx must be increased
>> monotonically according to the sequence of the CMDs in the cmdq.
>>
>> But ".msidata = atomic_inc_return_relaxed(&smmu->sync_nr)" is not protected
>> by the spinlock, so the following scenario may appear:
>>
>> cpu0                        cpu1
>> msidata=0
>>                             msidata=1
>>                             insert cmd1
>> insert cmd0
>>                             smmu executes cmd1
>> smmu executes cmd0
>> poll timeout, because msidata=1 is overridden by cmd0, which means VAL=0, sync_idx=1.
>>
>> This is not a functional problem; it just makes the caller wait for a long
>> time until TIMEOUT. It rarely happens, because any other CMD_SYNCs
>> during the waiting period will break it.
>>
>> Signed-off-by: Zhen Lei
>> ---
>> drivers/iommu/arm-smmu-v3.c | 12 
>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 1d64710..3f5c236 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -566,7 +566,7 @@ struct arm_smmu_device {
>>
>> 	int		gerr_irq;
>> 	int		combined_irq;
>> -	atomic_t	sync_nr;
>> +	u32		sync_nr;
>>
>> 	unsigned long	ias; /* IPA */
>> 	unsigned long	oas; /* PA */
>> @@ -775,6 +775,11 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
>> 	return 0;
>> }
>>
>> +static inline void arm_smmu_cmdq_sync_set_msidata(u64 *cmd, u32 msidata)
>
> If we *are* going to go down this route then I think it would make sense to
> move the msiaddr and CMDQ_SYNC_0_CS_MSI logic here as well; i.e.
> arm_smmu_cmdq_build_cmd() always generates a "normal" SEV-based sync command,
> then calling this guy would convert it to an MSI-based one. As-is, having
> bits of mutually-dependent data handled across two separate places just seems
> too messy and error-prone.

Yes. How about creating a new function "arm_smmu_cmdq_build_sync_msi_cmd"?
static inline
void arm_smmu_cmdq_build_sync_msi_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
{
	cmd[0] = FIELD_PREP(CMDQ_0_OP, ent->opcode);
	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
	cmd[1] = ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
}

> That said, I still don't think that just building the whole command under
> the lock is really all that bad - even when it doesn't get optimised into
> one of the assignments, that memset you call out is only a single
> "stp xzr, xzr, ...", and a couple of extra branches doesn't seem a huge
> deal compared to the DSB and MMIO accesses (and potentially polling) that
> we're about to do anyway. I've tried hacking things up enough to convince
> GCC to inline a specialisation of the relevant switch case when ent->opcode
> is known, and that reduces the "overhead" down to just a handful of ALU
> instructions. I still need to try cleaning said hack up and double-check
> that it doesn't have any adverse impact on all the other SMMUv3 stuff in
> development, but watch this space...
>
> Robin.
>> +{
>> +	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIDATA, msidata);
>> +}
>> +
>>  /* High-level queue accessors */
>>  static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>>  {
>> @@ -836,7 +841,6 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
>>  	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV);
>>  	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH);
>>  	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB);
>> -	cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIDATA, ent->sync.msidata);
>>  	cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
>>  	break;
>>  default:
>> @@ -947,7 +951,6 @@ static int __arm_smmu_cmdq_issue_sync_msi(struct arm_smmu_device *smmu)
>>  	struct arm_smmu_cmdq_ent ent = {
>>  		.opcode = CMDQ_OP_CMD_SYNC,
>>  		.sync = {
>> -			.msidata = atomic_inc_return_relaxed(&smmu->sync_nr),
>>  			.msiaddr = virt_to_phys(&smmu->sync_count),
>>  		},
>>  	};
>> @@ -955,6 +958,8 @@ static int __arm_smmu_cmdq_issue_sync_msi(struct arm_smmu_device *smmu)
>>  	arm_smmu_cmdq_build_cmd(cmd, &ent);
>>
>>  	spin_lock_irqsave(&smmu->cmdq.lock, flags);
>> +	ent.sync.msidata = ++smmu->sync_nr;
>> +	arm_smmu_cmdq_sync_set_msidata(cmd, ent.sync.msidata);
>>  	arm_smmu_cmdq_insert_cmd(smmu, cmd);
>>  	spin_unlock_irqrestore(&smmu->cmdq.lock, flags);
>>
>> @@ -2179,7 +2184,6 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)