Re: [PATCH] dt-bindings: arm-smmu: Fix json-schema syntax
On Mon, 21 Jun 2021 16:00:36 +0200, Thierry Reding wrote:
> Commit 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible
> string") introduced a jsonschema syntax error as a result of a rebase
> gone wrong. Fix it.

Applied, thanks!

[1/1] dt-bindings: arm-smmu: Fix json-schema syntax
      commit: bf3ec9deaa33889630722c47f7bb86ba58872ea7

Best regards,
--
Krzysztof Kozlowski

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
On Wed, Jun 23, 2021 at 11:31 AM Jason Wang wrote:
>
> On 2021/6/22 4:14 PM, Yongji Xie wrote:
> > On Tue, Jun 22, 2021 at 3:50 PM Jason Wang wrote:
> >>
> >> On 2021/6/22 3:22 PM, Yongji Xie wrote:
> >>>> We need to fix a way to propagate the error to the userspace.
> >>>>
> >>>> E.g. if we want to stop the device, we will delay the status reset
> >>>> until we get a response from the userspace?
> >>>
> >>> I didn't get how to delay the status reset. And should it be a DoS
> >>> that we want to fix if the userspace doesn't give a response forever?
> >>
> >> You're right. So let's make set_status() able to fail first, then
> >> propagate its failure via VHOST_VDPA_SET_STATUS.
> >>
> > OK. So we only need to propagate the failure in the vhost-vdpa case,
> > right?
>
> I think not, we need to deal with the reset for virtio as well:
>
> E.g. in register_virtio_device(), we have:
>
>         /* We always start by resetting the device, in case a previous
>          * driver messed it up.  This also tests that code path a
>          * little. */
>         dev->config->reset(dev);
>
> We probably need to make reset able to fail and then fail
> register_virtio_device() as well.

OK, looks like virtio_add_status() and virtio_device_ready() [1] should
also be modified if we need to propagate the failure in the virtio-vdpa
case. Or do we only need to care about the reset case?

[1] https://lore.kernel.org/lkml/20210517093428.670-1-xieyon...@bytedance.com/

Thanks,
Yongji
Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
On 2021/6/22 4:14 PM, Yongji Xie wrote:
> On Tue, Jun 22, 2021 at 3:50 PM Jason Wang wrote:
>> On 2021/6/22 3:22 PM, Yongji Xie wrote:
>>>> We need to fix a way to propagate the error to the userspace.
>>>>
>>>> E.g. if we want to stop the device, we will delay the status reset
>>>> until we get a response from the userspace?
>>>
>>> I didn't get how to delay the status reset. And should it be a DoS
>>> that we want to fix if the userspace doesn't give a response forever?
>>
>> You're right. So let's make set_status() able to fail first, then
>> propagate its failure via VHOST_VDPA_SET_STATUS.
>>
> OK. So we only need to propagate the failure in the vhost-vdpa case,
> right?

I think not, we need to deal with the reset for virtio as well:

E.g. in register_virtio_device(), we have:

        /* We always start by resetting the device, in case a previous
         * driver messed it up.  This also tests that code path a
         * little. */
        dev->config->reset(dev);

We probably need to make reset able to fail and then fail
register_virtio_device() as well.

Thanks
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
Konrad Rzeszutek Wilk wrote on Tue, Jun 22, 2021 at 05:58:14PM -0400:
> On Tue, Jun 22, 2021 at 04:48:24PM +0900, 'Dominique MARTINET' wrote:
> > Thanks, that should be good.
> >
> > Do you want me to send a follow-up patch with the two extra checks
> >   (tlb_addr & (IO_TLB_SIZE - 1)) > swiotlb_align_offset(dev, orig_addr)
> >   tlb_offset < alloc_size
> >
> > or are we certain this can't ever happen?
>
> I would love more patches and I saw the previous one you posted.
>
> But we only got two (or one) weeks before the next merge window opens
> so I am sending to Linus the one that was tested with NVMe and crypto
> (see above).
>
> That is the
> https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=stable/for-linus-5.14
>
> And then after Linus releases the 5.14 - I would love to take your
> cleanup on top of that and test it?

That sounds good to me, will send with proper formatting after release.
--
Dominique
Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
On 2021-06-21 15:32, Lu Baolu wrote:
> Hi Robin,
>
> On 2021/6/21 19:59, Robin Murphy wrote:
>> On 2021-06-21 11:34, John Garry wrote:
>>> On 21/06/2021 11:00, Lu Baolu wrote:
>>>>     void iommu_set_dma_strict(bool force)
>>>>     {
>>>>             if (force == true)
>>>>                     iommu_dma_strict = true;
>>>>             else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
>>>>                     iommu_dma_strict = true;
>>>>     }
>>>
>>> So we would use iommu_set_dma_strict(true) for a) and b), but
>>> iommu_set_dma_strict(false) for c).
>>>
>>>> Yes. We need to distinguish the "must" and "nice-to-have" cases of
>>>> setting strict mode.
>>>
>>> Then I am not sure what you want to do with the accompanying print
>>> for c). It was:
>>>
>>>     "IOMMU batching is disabled due to virtualization"
>>>
>>> And now is from this series:
>>>
>>>     "IOMMU batching disallowed due to virtualization"
>>>
>>> Using iommu_get_dma_strict(domain) is not appropriate here to know
>>> the current mode (so we know whether to print).
>>>
>>> Note that this change would mean that the current series would
>>> require non-trivial rework, which would be unfortunate so late in
>>> the cycle.
>>>
>>>> This patch series looks good to me and I have added my reviewed-by.
>>>> Probably we could make another patch series to improve it so that
>>>> the kernel optimization should not override the user setting.
>>>
>>> On a personal level I would be happy with that approach, but I think
>>> it's better to not start changing things right away in a follow-up
>>> series. So how about we add this patch (which replaces 6/6 "iommu:
>>> Remove mode argument from iommu_set_dma_strict()")? Robin, any
>>> opinion?
>>
>> For me it boils down to whether there are any realistic workloads
>> where non-strict mode *would* still perform better under
>> virtualisation.
>
> At present, we see that strict mode has better performance in the
> virtualization environment because it will make the shadow page table
> management more efficient. When the hardware supports nested
> translation, we may have to re-evaluate this since there's no need for
> a shadowing page table anymore.

I guess I was assuming that in most cases, proper nested mode could look
distinct enough that we'd be able to treat it differently in the first
place. For instance, if it's handing guest tables directly to the
hardware, would the host have any reason to still set the "caching mode"
ID bit?

Robin.
Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
On 2021-06-22 17:06, Doug Anderson wrote:
> Hi,
>
> On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy wrote:
>> Hi Doug,
>>
>> On 2021-06-22 00:52, Douglas Anderson wrote:
>>> This patch attempts to put forward a proposal for enabling non-strict
>>> DMA on a device-by-device basis. The patch series requests non-strict
>>> DMA for the Qualcomm SDHCI controller as a first device to enable,
>>> getting a nice bump in performance with what's believed to be a very
>>> small drop in security / safety (see the patch for the full argument).
>>>
>>> As part of this patch series I end up slightly cleaning up some of
>>> the interactions between the PCI subsystem and the IOMMU subsystem,
>>> but I don't go all the way to fully remove all the tentacles.
>>> Specifically this patch series only concerns itself with a single
>>> aspect: strict vs. non-strict mode for the IOMMU. I'm hoping that
>>> this will be easier to talk about / reason about for more subsystems
>>> compared to overall deciding what it means for a device to be
>>> "external" or "untrusted".
>>>
>>> If something like this patch series ends up being landable, it will
>>> undoubtedly need coordination between many maintainers to land. I
>>> believe it's fully bisectable but later patches in the series
>>> definitely depend on earlier ones. Sorry for the long CC list. :(
>>
>> Unfortunately, this doesn't work. In normal operation, the default
>> domains should be established long before individual drivers are even
>> loaded (if they are modules), let alone anywhere near probing. The
>> fact that iommu_probe_device() sometimes gets called far too late off
>> the back of driver probe is an unfortunate artefact of the original
>> probe-deferral scheme, and causes other problems like potentially
>> malformed groups - I've been forming a plan to fix that for a while
>> now, so I for one really can't condone anything trying to rely on it.
>> Non-deterministic behaviour based on driver probe order for
>> multi-device groups is part of the existing problem, and your proposal
>> seems equally vulnerable to that too.
>
> Doh! :( I definitely can't say I understand the iommu subsystem
> amazingly well. It was working for me, but I could believe that I was
> somehow violating a rule somewhere.
>
> I'm having a bit of a hard time understanding where the problem is
> though. Is there any chance that you missed the part of my series where
> I introduced a "pre_probe" step? Specifically, I see this:
>
> * really_probe() is called w/ a driver and a device.
> * -> calls dev->bus->dma_configure() w/ a "struct device *"
> * -> eventually calls iommu_probe_device() w/ the device.

This...

> * -> calls iommu_alloc_default_domain() w/ the device
> * -> calls iommu_group_alloc_default_domain()
> * -> always allocates a new domain
>
> ...so we always have a "struct device" when a domain is allocated if
> that domain is going to be associated with a device.
>
> I will agree that iommu_probe_device() is called before the driver
> probe, but unless I missed something it's after the device driver is
> loaded. ...and assuming something like patch #1 in this series looks OK
> then iommu_probe_device() will be called after "pre_probe".
>
> So assuming I'm not missing something, I'm not actually relying on the
> IOMMU getting init off the back of driver probe.

...is implicitly that. Sorry that it's not obvious. The "proper" flow is
that iommu_probe_device() is called for everything which already exists
during the IOMMU driver's own probe when it calls bus_set_iommu(), then
at BUS_NOTIFY_ADD_DEVICE time for everything which appears subsequently.
The only trouble is, to observe it in action on a DT-based system you'd
currently have to go back at least 4 years, before 09515ef5ddad...

Basically there were two issues: firstly we need the of_xlate step
before add_device (now probe_device) for a DT-based IOMMU driver to know
whether it should care about the given device or not. When -EPROBE_DEFER
was the only tool we had to ensure probe ordering, and resolving the
"iommus" DT property the only place to decide that, delaying it all
until driver probe time was the only reasonable option, however ugly.
The iommu_probe_device() "replay" in {of,acpi}_iommu_configure() is
merely doing its best to fake up the previous behaviour. Try binding a
dummy driver to your device first, then unbind it and bind the real one,
and you'll see that iommu_probe_device() doesn't run the second or
subsequent times. Now that we have fw_devlink to take care of ordering,
the main reason for this weirdness is largely gone, so I'm keen to start
getting rid of the divergence again as far as possible. Fundamentally,
IOMMU drivers are supposed to be aware of all devices which the kernel
knows about, regardless of whether they have a driver available or not.

The second issue is that when we have multiple IOMMU instances, the
initial bus_set_iommu() "replay" is only useful for the first instance,
so devices managed by other instances which aren't up and running yet
will be glossed over. Currently this ends up being papered over by the
solution to the first
[GIT PULL] (stable) stable/for-linus-5.14
Hey Linus,

Please git pull the following branch:

 git pull git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git stable/for-linus-5.14

which has the fix for the regression in the DMA operations where the
offset was ignored and corruptions would appear.

Going forward there will be cleanups to make the offset and alignment
logic clearer, and better test-cases to help with this.

Bumyong Lee (1):
      swiotlb: manipulate orig_addr when tlb_addr has offset

 kernel/dma/swiotlb.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
On Tue, Jun 22, 2021 at 04:48:24PM +0900, 'Dominique MARTINET' wrote:
> Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:
> > The beauty of 'devel' and 'linux-next' is that they can be reshuffled
> > and mangled. I pushed the original patch from Bumyong there and will
> > let it sit for a day and then create a stable branch and give it to
> > Linus.
>
> Thanks, that should be good.
>
> Do you want me to send a follow-up patch with the two extra checks
>   (tlb_addr & (IO_TLB_SIZE - 1)) > swiotlb_align_offset(dev, orig_addr)
>   tlb_offset < alloc_size
>
> or are we certain this can't ever happen?

I would love more patches and I saw the previous one you posted.

But we only got two (or one) weeks before the next merge window opens so
I am sending to Linus the one that was tested with NVMe and crypto (see
above).

That is the
https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=stable/for-linus-5.14

And then after Linus releases the 5.14 - I would love to take your
cleanup on top of that and test it?

> (I didn't see any hit in dmesg when I ran with these, but my opinion is
> better safe than sorry...)
>
> > Then I need to expand the test-regression bucket so that this does
> > not happen again. Dominique, how easy would it be to purchase one of
> > those devices?
>
> My company is making such a device, but it's not on the market yet (was
> planned for August, with some delay in provisioning it'll probably be a
> bit late), and would mean buying from Japan so I'm not sure how
> convenient that would be...
>
> These are originally NXP devices so I assume Horia would have better
> suggestions, if you would?
>
> > I was originally thinking to create a crypto device in QEMU to
> > simulate this but that may take longer to write than just getting the
> > real thing.
> >
> > Or I could create some fake devices with weird offsets and write a
> > driver for it to exercise this.. like this one I had done some time
> > ago that needs some brushing off.
>
> Just a fake device with fake offsets as a test is probably good enough,
> ideally would need to exercise both failures we've seen (offset in
> dma_sync_single_for_device like caam does and in the original mapping
> (I assume?) like the NVMe driver does), but that sounds possible :)

Yup. Working on that now.

> Thanks again!
> --
> Dominique
Re: [PATCH v14 01/12] swiotlb: Refactor swiotlb init functions
On Sat, 19 Jun 2021, Claire Chang wrote:
> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
> initialization to make the code reusable.
>
> Signed-off-by: Claire Chang
> Reviewed-by: Christoph Hellwig
> Tested-by: Stefano Stabellini
> Tested-by: Will Deacon

Acked-by: Stefano Stabellini

> ---
>  kernel/dma/swiotlb.c | 50 +++++++++++++++++++++++++-------------------------
>  1 file changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 52e2ac526757..1f9b2b9e7490 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
>  	memset(vaddr, 0, bytes);
>  }
>  
> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> +				    unsigned long nslabs, bool late_alloc)
>  {
> +	void *vaddr = phys_to_virt(start);
>  	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> +
> +	mem->nslabs = nslabs;
> +	mem->start = start;
> +	mem->end = mem->start + bytes;
> +	mem->index = 0;
> +	mem->late_alloc = late_alloc;
> +	spin_lock_init(&mem->lock);
> +	for (i = 0; i < mem->nslabs; i++) {
> +		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> +		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> +		mem->slots[i].alloc_size = 0;
> +	}
> +	memset(vaddr, 0, bytes);
> +}
> +
> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> +{
>  	struct io_tlb_mem *mem;
>  	size_t alloc_size;
>  
> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  	if (!mem)
>  		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
>  		      __func__, alloc_size, PAGE_SIZE);
> -	mem->nslabs = nslabs;
> -	mem->start = __pa(tlb);
> -	mem->end = mem->start + bytes;
> -	mem->index = 0;
> -	spin_lock_init(&mem->lock);
> -	for (i = 0; i < mem->nslabs; i++) {
> -		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> -		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> -		mem->slots[i].alloc_size = 0;
> -	}
> +
> +	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>  
>  	io_tlb_default_mem = mem;
>  	if (verbose)
> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
>  int
>  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>  {
> -	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>  	struct io_tlb_mem *mem;
> +	unsigned long bytes = nslabs << IO_TLB_SHIFT;
>  
>  	if (swiotlb_force == SWIOTLB_NO_FORCE)
>  		return 0;
> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>  	if (!mem)
>  		return -ENOMEM;
>  
> -	mem->nslabs = nslabs;
> -	mem->start = virt_to_phys(tlb);
> -	mem->end = mem->start + bytes;
> -	mem->index = 0;
> -	mem->late_alloc = 1;
> -	spin_lock_init(&mem->lock);
> -	for (i = 0; i < mem->nslabs; i++) {
> -		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> -		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> -		mem->slots[i].alloc_size = 0;
> -	}
> -
> +	memset(mem, 0, sizeof(*mem));
>  	set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
> -	memset(tlb, 0, bytes);
> +	swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>  
>  	io_tlb_default_mem = mem;
>  	swiotlb_print_info();
> --
> 2.32.0.288.g62a8d224e6-goog
Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
Hi,

On Tue, Jun 22, 2021 at 1:06 PM Saravana Kannan wrote:
>
> On Tue, Jun 22, 2021 at 1:02 PM Rob Herring wrote:
> >
> > On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy wrote:
> > > >
> > > > Hi Doug,
> > > >
> > > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > > >
> > > > > This patch attempts to put forward a proposal for enabling
> > > > > non-strict DMA on a device-by-device basis. The patch series
> > > > > requests non-strict DMA for the Qualcomm SDHCI controller as a
> > > > > first device to enable, getting a nice bump in performance with
> > > > > what's believed to be a very small drop in security / safety
> > > > > (see the patch for the full argument).
> > > > >
> > > > > As part of this patch series I end up slightly cleaning up some
> > > > > of the interactions between the PCI subsystem and the IOMMU
> > > > > subsystem, but I don't go all the way to fully remove all the
> > > > > tentacles. Specifically this patch series only concerns itself
> > > > > with a single aspect: strict vs. non-strict mode for the IOMMU.
> > > > > I'm hoping that this will be easier to talk about / reason
> > > > > about for more subsystems compared to overall deciding what it
> > > > > means for a device to be "external" or "untrusted".
> > > > >
> > > > > If something like this patch series ends up being landable, it
> > > > > will undoubtedly need coordination between many maintainers to
> > > > > land. I believe it's fully bisectable but later patches in the
> > > > > series definitely depend on earlier ones. Sorry for the long CC
> > > > > list. :(
> > > >
> > > > Unfortunately, this doesn't work. In normal operation, the
> > > > default domains should be established long before individual
> > > > drivers are even loaded (if they are modules), let alone anywhere
> > > > near probing. The fact that iommu_probe_device() sometimes gets
> > > > called far too late off the back of driver probe is an
> > > > unfortunate artefact of the original probe-deferral scheme, and
> > > > causes other problems like potentially malformed groups - I've
> > > > been forming a plan to fix that for a while now, so I for one
> > > > really can't condone anything trying to rely on it.
> > > > Non-deterministic behaviour based on driver probe order for
> > > > multi-device groups is part of the existing problem, and your
> > > > proposal seems equally vulnerable to that too.
> > >
> > > Doh! :( I definitely can't say I understand the iommu subsystem
> > > amazingly well. It was working for me, but I could believe that I
> > > was somehow violating a rule somewhere.
> > >
> > > I'm having a bit of a hard time understanding where the problem is
> > > though. Is there any chance that you missed the part of my series
> > > where I introduced a "pre_probe" step? Specifically, I see this:
> > >
> > > * really_probe() is called w/ a driver and a device.
> > > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > > * -> eventually calls iommu_probe_device() w/ the device.
> > > * -> calls iommu_alloc_default_domain() w/ the device
> > > * -> calls iommu_group_alloc_default_domain()
> > > * -> always allocates a new domain
> > >
> > > ...so we always have a "struct device" when a domain is allocated
> > > if that domain is going to be associated with a device.
> > >
> > > I will agree that iommu_probe_device() is called before the driver
> > > probe, but unless I missed something it's after the device driver
> > > is loaded. ...and assuming something like patch #1 in this series
> > > looks OK then iommu_probe_device() will be called after
> > > "pre_probe".
> > >
> > > So assuming I'm not missing something, I'm not actually relying on
> > > the IOMMU getting init off the back of driver probe.
> > >
> > > > FWIW we already have a go-faster knob for people who want to
> > > > tweak the security/performance compromise for specific devices,
> > > > namely the sysfs interface for changing a group's domain type
> > > > before binding the relevant driver(s). Is that something you
> > > > could use in your application, say from an initramfs script?
> > >
> > > We've never had an initramfs script in Chrome OS. I don't know all
> > > the history of why (I'm trying to check), but I'm nearly certain it
> > > was a conscious decision. Probably it has to do with the fact that
> > > we're not trying to build a generic distribution where a single
> > > boot source can boot a huge variety of hardware. We generally have
> > > one kernel for a class of devices. I believe avoiding the initramfs
> > > just keeps things simpler.
> > >
> > > I think trying to revamp Chrome OS to switch to an initramfs type
> > > system would be a pretty big undertaking since (as I understand it)
> > > you can't just run a little command and then return to the normal
> > > boot flow. Once you switch to initramfs you're committing to
> > > finding / setting up the rootfs yourself and on Chrome OS I believe
> > > that means
Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
On Tue, Jun 22, 2021 at 1:02 PM Rob Herring wrote:
>
> On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy wrote:
> > >
> > > Hi Doug,
> > >
> > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > >
> > > > This patch attempts to put forward a proposal for enabling
> > > > non-strict DMA on a device-by-device basis. The patch series
> > > > requests non-strict DMA for the Qualcomm SDHCI controller as a
> > > > first device to enable, getting a nice bump in performance with
> > > > what's believed to be a very small drop in security / safety (see
> > > > the patch for the full argument).
> > > >
> > > > As part of this patch series I end up slightly cleaning up some
> > > > of the interactions between the PCI subsystem and the IOMMU
> > > > subsystem, but I don't go all the way to fully remove all the
> > > > tentacles. Specifically this patch series only concerns itself
> > > > with a single aspect: strict vs. non-strict mode for the IOMMU.
> > > > I'm hoping that this will be easier to talk about / reason about
> > > > for more subsystems compared to overall deciding what it means
> > > > for a device to be "external" or "untrusted".
> > > >
> > > > If something like this patch series ends up being landable, it
> > > > will undoubtedly need coordination between many maintainers to
> > > > land. I believe it's fully bisectable but later patches in the
> > > > series definitely depend on earlier ones. Sorry for the long CC
> > > > list. :(
> > >
> > > Unfortunately, this doesn't work. In normal operation, the default
> > > domains should be established long before individual drivers are
> > > even loaded (if they are modules), let alone anywhere near probing.
> > > The fact that iommu_probe_device() sometimes gets called far too
> > > late off the back of driver probe is an unfortunate artefact of the
> > > original probe-deferral scheme, and causes other problems like
> > > potentially malformed groups - I've been forming a plan to fix that
> > > for a while now, so I for one really can't condone anything trying
> > > to rely on it. Non-deterministic behaviour based on driver probe
> > > order for multi-device groups is part of the existing problem, and
> > > your proposal seems equally vulnerable to that too.
> >
> > Doh! :( I definitely can't say I understand the iommu subsystem
> > amazingly well. It was working for me, but I could believe that I was
> > somehow violating a rule somewhere.
> >
> > I'm having a bit of a hard time understanding where the problem is
> > though. Is there any chance that you missed the part of my series
> > where I introduced a "pre_probe" step? Specifically, I see this:
> >
> > * really_probe() is called w/ a driver and a device.
> > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > * -> eventually calls iommu_probe_device() w/ the device.
> > * -> calls iommu_alloc_default_domain() w/ the device
> > * -> calls iommu_group_alloc_default_domain()
> > * -> always allocates a new domain
> >
> > ...so we always have a "struct device" when a domain is allocated if
> > that domain is going to be associated with a device.
> >
> > I will agree that iommu_probe_device() is called before the driver
> > probe, but unless I missed something it's after the device driver is
> > loaded. ...and assuming something like patch #1 in this series looks
> > OK then iommu_probe_device() will be called after "pre_probe".
> >
> > So assuming I'm not missing something, I'm not actually relying on
> > the IOMMU getting init off the back of driver probe.
> >
> > > FWIW we already have a go-faster knob for people who want to tweak
> > > the security/performance compromise for specific devices, namely
> > > the sysfs interface for changing a group's domain type before
> > > binding the relevant driver(s). Is that something you could use in
> > > your application, say from an initramfs script?
> >
> > We've never had an initramfs script in Chrome OS. I don't know all
> > the history of why (I'm trying to check), but I'm nearly certain it
> > was a conscious decision. Probably it has to do with the fact that
> > we're not trying to build a generic distribution where a single boot
> > source can boot a huge variety of hardware. We generally have one
> > kernel for a class of devices. I believe avoiding the initramfs just
> > keeps things simpler.
> >
> > I think trying to revamp Chrome OS to switch to an initramfs type
> > system would be a pretty big undertaking since (as I understand it)
> > you can't just run a little command and then return to the normal
> > boot flow. Once you switch to initramfs you're committing to
> > finding / setting up the rootfs yourself and on Chrome OS I believe
> > that means a whole bunch of dm-verity work.
> >
> > ...so probably the initramfs is a no-go for me, but I'm still
> > crossing my fingers that the pre_probe() might be legit if you take a
> > second look at it?
>
> Couldn't you have a driver
Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote: > Hi, > > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy wrote: > > > > Hi Doug, > > > > On 2021-06-22 00:52, Douglas Anderson wrote: > > > > > > This patch attempts to put forward a proposal for enabling non-strict > > > DMA on a device-by-device basis. The patch series requests non-strict > > > DMA for the Qualcomm SDHCI controller as a first device to enable, > > > getting a nice bump in performance with what's believed to be a very > > > small drop in security / safety (see the patch for the full argument). > > > > > > As part of this patch series I am end up slightly cleaning up some of > > > the interactions between the PCI subsystem and the IOMMU subsystem but > > > I don't go all the way to fully remove all the tentacles. Specifically > > > this patch series only concerns itself with a single aspect: strict > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier > > > to talk about / reason about for more subsystems compared to overall > > > deciding what it means for a device to be "external" or "untrusted". > > > > > > If something like this patch series ends up being landable, it will > > > undoubtedly need coordination between many maintainers to land. I > > > believe it's fully bisectable but later patches in the series > > > definitely depend on earlier ones. Sorry for the long CC list. :( > > > > Unfortunately, this doesn't work. In normal operation, the default > > domains should be established long before individual drivers are even > > loaded (if they are modules), let alone anywhere near probing. The fact > > that iommu_probe_device() sometimes gets called far too late off the > > back of driver probe is an unfortunate artefact of the original > > probe-deferral scheme, and causes other problems like potentially > > malformed groups - I've been forming a plan to fix that for a while now, > > so I for one really can't condone anything trying to rely on it. 
> > Non-deterministic behaviour based on driver probe order for multi-device > > groups is part of the existing problem, and your proposal seems equally > > vulnerable to that too. > > Doh! :( I definitely can't say I understand the iommu subsystem > amazingly well. It was working for me, but I could believe that I was > somehow violating a rule somewhere. > > I'm having a bit of a hard time understanding where the problem is > though. Is there any chance that you missed the part of my series > where I introduced a "pre_probe" step? Specifically, I see this: > > * really_probe() is called w/ a driver and a device. > * -> calls dev->bus->dma_configure() w/ a "struct device *" > * -> eventually calls iommu_probe_device() w/ the device. > * -> calls iommu_alloc_default_domain() w/ the device > * -> calls iommu_group_alloc_default_domain() > * -> always allocates a new domain > > ...so we always have a "struct device" when a domain is allocated if > that domain is going to be associated with a device. > > I will agree that iommu_probe_device() is called before the driver > probe, but unless I missed something it's after the device driver is > loaded. ...and assuming something like patch #1 in this series looks > OK then iommu_probe_device() will be called after "pre_probe". > > So assuming I'm not missing something, I'm not actually relying the > IOMMU getting init off the back of driver probe. > > > > FWIW we already have a go-faster knob for people who want to tweak the > > security/performance compromise for specific devices, namely the sysfs > > interface for changing a group's domain type before binding the relevant > > driver(s). Is that something you could use in your application, say from > > an initramfs script? > > We've never had an initramfs script in Chrome OS. I don't know all the > history of why (I'm trying to check), but I'm nearly certain it was a > conscious decision. 
> Probably it has to do with the fact that we're not
> trying to build a generic distribution where a single boot source can
> boot a huge variety of hardware. We generally have one kernel for a
> class of devices. I believe avoiding the initramfs just keeps things
> simpler.
>
> I think trying to revamp Chrome OS to switch to an initramfs type
> system would be a pretty big undertaking since (as I understand it)
> you can't just run a little command and then return to the normal boot
> flow. Once you switch to initramfs you're committing to finding /
> setting up the rootfs yourself and on Chrome OS I believe that means a
> whole bunch of dm-verity work.
>
> ...so probably the initramfs is a no-go for me, but I'm still crossing
> my fingers that the pre_probe() might be legit if you take a second
> look at it?

Couldn't you have a driver flag that has the same effect as twiddling sysfs? At the beginning of probe, check the flag and go set the underlying sysfs setting in the device. Though you may want this to be per device, not per driver. To do that early, I th
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
On Tue, Jun 22, 2021 at 9:46 AM Doug Anderson wrote: > > Hi, > > On Mon, Jun 21, 2021 at 7:56 PM Saravana Kannan wrote: > > > > On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson > > wrote: > > > > > > In the patch ("drivers: base: Add bits to struct device to control > > > iommu strictness") we add the ability for devices to tell us about > > > their IOMMU strictness requirements. Let's now take that into account > > > in the IOMMU layer. > > > > > > A few notes here: > > > * Presumably this is always how iommu_get_dma_strict() was intended to > > > behave. Had this not been the intention then it never would have > > > taken a domain as a parameter. > > > * The iommu_set_dma_strict() feels awfully non-symmetric now. That > > > function sets the _default_ strictness globally in the system > > > whereas iommu_get_dma_strict() returns the value for a given domain > > > (falling back to the default). Presumably, at least, the fact that > > > iommu_set_dma_strict() doesn't take a domain makes this obvious. > > > > > > The function iommu_get_dma_strict() should now make it super obvious > > > where strictness comes from and who overides who. Though the function > > > changed a bunch to make the logic clearer, the only two new rules > > > should be: > > > * Devices can force strictness for themselves, overriding the cmdline > > > "iommu.strict=0" or a call to iommu_set_dma_strict(false)). > > > * Devices can request non-strictness for themselves, assuming there > > > was no cmdline "iommu.strict=1" or a call to > > > iommu_set_dma_strict(true). 
> > > > > > Signed-off-by: Douglas Anderson > > > --- > > > > > > drivers/iommu/iommu.c | 56 +-- > > > include/linux/iommu.h | 2 ++ > > > 2 files changed, 45 insertions(+), 13 deletions(-) > > > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > > index 808ab70d5df5..0c84a4c06110 100644 > > > --- a/drivers/iommu/iommu.c > > > +++ b/drivers/iommu/iommu.c > > > @@ -28,8 +28,19 @@ > > > static struct kset *iommu_group_kset; > > > static DEFINE_IDA(iommu_group_ida); > > > > > > +enum iommu_strictness { > > > + IOMMU_DEFAULT_STRICTNESS = -1, > > > + IOMMU_NOT_STRICT = 0, > > > + IOMMU_STRICT = 1, > > > +}; > > > +static inline enum iommu_strictness bool_to_strictness(bool strictness) > > > +{ > > > + return (enum iommu_strictness)strictness; > > > +} > > > + > > > static unsigned int iommu_def_domain_type __read_mostly; > > > -static bool iommu_dma_strict __read_mostly = true; > > > +static enum iommu_strictness cmdline_dma_strict __read_mostly = > > > IOMMU_DEFAULT_STRICTNESS; > > > +static enum iommu_strictness driver_dma_strict __read_mostly = > > > IOMMU_DEFAULT_STRICTNESS; > > > static u32 iommu_cmd_line __read_mostly; > > > > > > struct iommu_group { > > > @@ -69,7 +80,6 @@ static const char * const > > > iommu_group_resv_type_string[] = { > > > }; > > > > > > #define IOMMU_CMD_LINE_DMA_API BIT(0) > > > -#define IOMMU_CMD_LINE_STRICT BIT(1) > > > > > > static int iommu_alloc_default_domain(struct iommu_group *group, > > > struct device *dev); > > > @@ -336,25 +346,38 @@ early_param("iommu.passthrough", > > > iommu_set_def_domain_type); > > > > > > static int __init iommu_dma_setup(char *str) > > > { > > > - int ret = kstrtobool(str, &iommu_dma_strict); > > > + bool strict; > > > + int ret = kstrtobool(str, &strict); > > > > > > if (!ret) > > > - iommu_cmd_line |= IOMMU_CMD_LINE_STRICT; > > > + cmdline_dma_strict = bool_to_strictness(strict); > > > return ret; > > > } > > > early_param("iommu.strict", iommu_dma_setup); > > > > > > void 
iommu_set_dma_strict(bool strict) > > > { > > > - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) > > > - iommu_dma_strict = strict; > > > + /* A driver can request strictness but not the other way around */ > > > + if (driver_dma_strict != IOMMU_STRICT) > > > + driver_dma_strict = bool_to_strictness(strict); > > > } > > > > > > bool iommu_get_dma_strict(struct iommu_domain *domain) > > > { > > > - /* only allow lazy flushing for DMA domains */ > > > - if (domain->type == IOMMU_DOMAIN_DMA) > > > - return iommu_dma_strict; > > > + /* Non-DMA domains or anyone forcing it to strict makes it strict > > > */ > > > + if (domain->type != IOMMU_DOMAIN_DMA || > > > + cmdline_dma_strict == IOMMU_STRICT || > > > + driver_dma_strict == IOMMU_STRICT || > > > + domain->force_strict) > > > + return true; > > > + > > > + /* Anyone requesting non-strict (if no forces) makes it > > > non-strict */ > > > + if (cmdline_dma_strict == IOMMU_NOT_STRICT || > > > + driver_dma_strict == IOMMU_NOT_STRICT || > > > + domain->request_non_strict) > > > +
Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
Hi, On Tue, Jun 22, 2021 at 10:46 AM John Garry wrote: > > On 22/06/2021 00:52, Douglas Anderson wrote: > > > > This patch attempts to put forward a proposal for enabling non-strict > > DMA on a device-by-device basis. The patch series requests non-strict > > DMA for the Qualcomm SDHCI controller as a first device to enable, > > getting a nice bump in performance with what's believed to be a very > > small drop in security / safety (see the patch for the full argument). > > > > As part of this patch series I am end up slightly cleaning up some of > > the interactions between the PCI subsystem and the IOMMU subsystem but > > I don't go all the way to fully remove all the tentacles. Specifically > > this patch series only concerns itself with a single aspect: strict > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier > > to talk about / reason about for more subsystems compared to overall > > deciding what it means for a device to be "external" or "untrusted". > > > > If something like this patch series ends up being landable, it will > > undoubtedly need coordination between many maintainers to land. I > > believe it's fully bisectable but later patches in the series > > definitely depend on earlier ones. Sorry for the long CC list. :( > > > > JFYI, In case to missed it, and I know it's not the same thing as you > want, above, but the following series will allow you to build the kernel > to default to lazy mode: > > https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.ga...@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e > > So iommu.strict=0 would be no longer always required for arm64. Excellent! I'm never a fan of command line parameters as a replacement for Kconfig. They are great for debugging or for cases where the user of the kernel and the person compiling the kernel are not the same (like with off-the-shelf Linux distros) but aren't great for setting a default for embedded environments. 
I actually think that something like my patchset may be even more important atop yours. Do you agree? If the default is non-strict it could be extra important to be able to mark a certain device to be in "strict" mode.

...also, unfortunately I probably won't be able to use your patchset for my use case. I think we want the extra level of paranoia by default and really only want to allow non-strict mode for devices that we're really convinced are safe.

-Doug
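To make the precedence concrete, here is a rough, self-contained model of the resolution logic proposed in patch 4/6 of this series (this is an illustrative sketch of the thread's proposal, not merged kernel code; the function name `resolve_strictness` is hypothetical):

```c
#include <assert.h>

/* Tri-state used by the proposed patch: -1 means "unspecified". */
enum iommu_strictness {
	IOMMU_DEFAULT_STRICTNESS = -1,
	IOMMU_NOT_STRICT = 0,
	IOMMU_STRICT = 1,
};

/*
 * Rough model of the proposed iommu_get_dma_strict() rules: any
 * force-strict (cmdline, driver, or device) wins; otherwise any
 * explicit non-strict request wins; otherwise strict is the default.
 */
static int resolve_strictness(enum iommu_strictness cmdline,
			      enum iommu_strictness driver,
			      int dev_force_strict,
			      int dev_request_non_strict)
{
	if (cmdline == IOMMU_STRICT || driver == IOMMU_STRICT ||
	    dev_force_strict)
		return 1;
	if (cmdline == IOMMU_NOT_STRICT || driver == IOMMU_NOT_STRICT ||
	    dev_request_non_strict)
		return 0;
	return 1; /* nobody said anything: strict by default */
}
```

In particular, a device that forces strict stays strict even with iommu.strict=0 on the command line or a lazy build-time default, which is exactly the property wanted on top of John's series.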
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
Hi, On Tue, Jun 22, 2021 at 11:45 AM Rajat Jain wrote: > > On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson > wrote: > > > > In the patch ("drivers: base: Add bits to struct device to control > > iommu strictness") we add the ability for devices to tell us about > > their IOMMU strictness requirements. Let's now take that into account > > in the IOMMU layer. > > > > A few notes here: > > * Presumably this is always how iommu_get_dma_strict() was intended to > > behave. Had this not been the intention then it never would have > > taken a domain as a parameter. > > * The iommu_set_dma_strict() feels awfully non-symmetric now. That > > function sets the _default_ strictness globally in the system > > whereas iommu_get_dma_strict() returns the value for a given domain > > (falling back to the default). Presumably, at least, the fact that > > iommu_set_dma_strict() doesn't take a domain makes this obvious. > > > > The function iommu_get_dma_strict() should now make it super obvious > > where strictness comes from and who overides who. Though the function > > changed a bunch to make the logic clearer, the only two new rules > > should be: > > * Devices can force strictness for themselves, overriding the cmdline > > "iommu.strict=0" or a call to iommu_set_dma_strict(false)). > > * Devices can request non-strictness for themselves, assuming there > > was no cmdline "iommu.strict=1" or a call to > > iommu_set_dma_strict(true). > > Along the same lines, I believe a platform (device tree / ACPI) should > also be able to have a say in this. I assume in your proposal, a > platform would expose a property in device tree which the device > driver would need to parse and then use it to set these bits in the > "struct device"? Nothing would prevent creating a device tree or ACPI property that caused either "force-strict" or "request-non-strict" from being set if everyone agrees that it's a good idea. 
I wouldn't reject the idea myself, but I do worry that we'd devolve into the usual bikeshed for exactly how this should look. I talked about this a bit in my response to Saravana, but basically:

* If there was some generic property, would we call it "untrusted", "external", or something else?

* How do you describe "trust" in a generic "objective" way? It's not really boolean and trying to describe exactly how trustworthy something should be considered is hard.

* At least for the device tree there's a general requirement that it describes the hardware and not so much how the software should configure the hardware. As I understand it there is _some_ leeway here where it's OK to describe how the hardware was designed for the OS to configure it, but it's a pretty high bar and a hard sell. In general the device tree isn't supposed to be used to describe "policy". In other words: if one OS might decide on one setting and another OS on another then it doesn't really belong in the device tree.

* In general the kernel is also not really supposed to have policy hardcoded in either, though it feels like we can get away with having a good default/sane policy and allowing overriding the policy with command line parameters (like iommu.strict). In the case where something has to be configured at bootup there's not many ways to do better.

tl;dr: I have no plans to try to make an overarching property, but my patch series does allow subsystems to come up with and easily implement their own rules as it makes sense. While this might seem hodgepodge I prefer to see it as "flexible" since I'm not convinced that we're going to be able to come up with an overarching trust framework.

-Doug
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson wrote: > > In the patch ("drivers: base: Add bits to struct device to control > iommu strictness") we add the ability for devices to tell us about > their IOMMU strictness requirements. Let's now take that into account > in the IOMMU layer. > > A few notes here: > * Presumably this is always how iommu_get_dma_strict() was intended to > behave. Had this not been the intention then it never would have > taken a domain as a parameter. > * The iommu_set_dma_strict() feels awfully non-symmetric now. That > function sets the _default_ strictness globally in the system > whereas iommu_get_dma_strict() returns the value for a given domain > (falling back to the default). Presumably, at least, the fact that > iommu_set_dma_strict() doesn't take a domain makes this obvious. > > The function iommu_get_dma_strict() should now make it super obvious > where strictness comes from and who overides who. Though the function > changed a bunch to make the logic clearer, the only two new rules > should be: > * Devices can force strictness for themselves, overriding the cmdline > "iommu.strict=0" or a call to iommu_set_dma_strict(false)). > * Devices can request non-strictness for themselves, assuming there > was no cmdline "iommu.strict=1" or a call to > iommu_set_dma_strict(true). Along the same lines, I believe a platform (device tree / ACPI) should also be able to have a say in this. I assume in your proposal, a platform would expose a property in device tree which the device driver would need to parse and then use it to set these bits in the "struct device"? 
Thanks, Rajat > > Signed-off-by: Douglas Anderson > --- > > drivers/iommu/iommu.c | 56 +-- > include/linux/iommu.h | 2 ++ > 2 files changed, 45 insertions(+), 13 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 808ab70d5df5..0c84a4c06110 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -28,8 +28,19 @@ > static struct kset *iommu_group_kset; > static DEFINE_IDA(iommu_group_ida); > > +enum iommu_strictness { > + IOMMU_DEFAULT_STRICTNESS = -1, > + IOMMU_NOT_STRICT = 0, > + IOMMU_STRICT = 1, > +}; > +static inline enum iommu_strictness bool_to_strictness(bool strictness) > +{ > + return (enum iommu_strictness)strictness; > +} > + > static unsigned int iommu_def_domain_type __read_mostly; > -static bool iommu_dma_strict __read_mostly = true; > +static enum iommu_strictness cmdline_dma_strict __read_mostly = > IOMMU_DEFAULT_STRICTNESS; > +static enum iommu_strictness driver_dma_strict __read_mostly = > IOMMU_DEFAULT_STRICTNESS; > static u32 iommu_cmd_line __read_mostly; > > struct iommu_group { > @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = { > }; > > #define IOMMU_CMD_LINE_DMA_API BIT(0) > -#define IOMMU_CMD_LINE_STRICT BIT(1) > > static int iommu_alloc_default_domain(struct iommu_group *group, > struct device *dev); > @@ -336,25 +346,38 @@ early_param("iommu.passthrough", > iommu_set_def_domain_type); > > static int __init iommu_dma_setup(char *str) > { > - int ret = kstrtobool(str, &iommu_dma_strict); > + bool strict; > + int ret = kstrtobool(str, &strict); > > if (!ret) > - iommu_cmd_line |= IOMMU_CMD_LINE_STRICT; > + cmdline_dma_strict = bool_to_strictness(strict); > return ret; > } > early_param("iommu.strict", iommu_dma_setup); > > void iommu_set_dma_strict(bool strict) > { > - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) > - iommu_dma_strict = strict; > + /* A driver can request strictness but not the other way around */ > + if (driver_dma_strict != IOMMU_STRICT) 
> + driver_dma_strict = bool_to_strictness(strict); > } > > bool iommu_get_dma_strict(struct iommu_domain *domain) > { > - /* only allow lazy flushing for DMA domains */ > - if (domain->type == IOMMU_DOMAIN_DMA) > - return iommu_dma_strict; > + /* Non-DMA domains or anyone forcing it to strict makes it strict */ > + if (domain->type != IOMMU_DOMAIN_DMA || > + cmdline_dma_strict == IOMMU_STRICT || > + driver_dma_strict == IOMMU_STRICT || > + domain->force_strict) > + return true; > + > + /* Anyone requesting non-strict (if no forces) makes it non-strict */ > + if (cmdline_dma_strict == IOMMU_NOT_STRICT || > + driver_dma_strict == IOMMU_NOT_STRICT || > + domain->request_non_strict) > + return false; > + > + /* Nobody said anything, so it's strict by default */ > return true; > } > EXPORT_SYMBOL_GPL(iommu_get_dma_strict); > @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev) > > static int iommu_group_alloc_default_domain(
Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush
On 2021-06-22 15:27, Sai Prakash Ranjan wrote: Hi Robin, On 2021-06-22 17:41, Robin Murphy wrote: On 2021-06-22 08:11, Sai Prakash Ranjan wrote: Hi Robin, On 2021-06-21 21:15, Robin Murphy wrote: On 2021-06-18 03:51, Sai Prakash Ranjan wrote: Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context with tlb_flush_all() callback in partial walk flush to improve unmap performance on select few platforms where the cost of over-invalidation is less than the unmap latency. I still think this doesn't belong anywhere near io-pgtable at all. It's a driver-internal decision how exactly it implements a non-leaf invalidation, and that may be more complex than a predetermined boolean decision. For example, I've just realised for SMMUv3 we can't invalidate multiple levels of table at once with a range command, since if we assume the whole thing is mapped at worst-case page granularity we may fail to invalidate any parts which are mapped as intermediate-level blocks. If invalidating a 1GB region (with 4KB granule) means having to fall back to 256K non-range commands, we may not want to invalidate by VA then, even though doing so for a 2MB region is still optimal. It's also quite feasible that drivers might want to do this for leaf invalidations too - if you don't like issuing 512 commands to invalidate 2MB, do you like issuing 511 commands to invalidate 2044KB? - and at that point the logic really has to be in the driver anyway. Ok I will move this to tlb_flush_walk() functions in the drivers. In the previous v1 thread, you suggested to make the choice in iommu_get_dma_strict() test, I assume you meant the test in iommu_dma_init_domain() with a flag or was it the leaf driver(ex:arm-smmu.c) test of iommu_get_dma_strict() in init_domain? Yes, I meant literally inside the same condition where we currently set "pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;" in arm_smmu_init_domain_context(). Ok got it, thanks. I am still a bit confused on where this flag would be? 
Should this be a part of struct iommu_domain?

Well, if you were to rewrite the config with an alternative set of flush_ops at that point it would be implicit. For a flag, probably either in arm_smmu_domain or arm_smmu_impl. Maybe a flag would be less useful than generalising straight to a "maximum number of by-VA invalidations it's worth sending individually" threshold value?

But then we would still need some flag to make this implementation specific (qcom specific for now) and this threshold would just be another condition although it would have been useful if this was generic enough.

Well, for that approach I assume we could do something like special-case 0, or if it's a mutable per-domain value maybe just initialise it to SIZE_MAX or whatever such that it would never be reached in practice. Whichever way, it was meant to be implied that anything at the domain level would still be subject to final adjustment by the init_context hook.

Robin.

It's clear to me what overall shape and separation of responsibility is most logical, but beyond that I don't have a particularly strong opinion on the exact implementation; I've just been chucking ideas around :)

Your ideas are very informative and useful :)

Thanks,
Sai
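Robin's threshold idea could be modelled roughly like this (a sketch only: the struct and function names are hypothetical, and in a real driver the decision would sit in the tlb_flush_walk path, still subject to final adjustment by the init_context hook):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical per-domain tunable: the maximum number of individual
 * by-VA invalidation commands worth issuing for one walk flush.
 * 0 means "always flush the whole context"; SIZE_MAX means the
 * flush-all fallback is never taken in practice.
 */
struct inv_policy {
	size_t max_va_invs;
};

/* Return 1 to use tlb_flush_all(), 0 to invalidate page by page. */
static int use_flush_all(const struct inv_policy *p,
			 size_t size, size_t granule)
{
	return size / granule > p->max_va_invs;
}
```

With a 4KB granule and a threshold of, say, 1024, a 1GB walk flush (262144 commands) falls back to a full context invalidate while a 2MB one (512 commands) is still done by VA, matching the trade-off discussed above.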
Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
On 22/06/2021 00:52, Douglas Anderson wrote:
This patch attempts to put forward a proposal for enabling non-strict DMA on a device-by-device basis. The patch series requests non-strict DMA for the Qualcomm SDHCI controller as a first device to enable, getting a nice bump in performance with what's believed to be a very small drop in security / safety (see the patch for the full argument).

As part of this patch series I am end up slightly cleaning up some of the interactions between the PCI subsystem and the IOMMU subsystem but I don't go all the way to fully remove all the tentacles. Specifically this patch series only concerns itself with a single aspect: strict vs. non-strict mode for the IOMMU. I'm hoping that this will be easier to talk about / reason about for more subsystems compared to overall deciding what it means for a device to be "external" or "untrusted".

If something like this patch series ends up being landable, it will undoubtedly need coordination between many maintainers to land. I believe it's fully bisectable but later patches in the series definitely depend on earlier ones. Sorry for the long CC list. :(

JFYI, in case you missed it, and I know it's not the same thing as you want, above, but the following series will allow you to build the kernel to default to lazy mode:

https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.ga...@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e

So iommu.strict=0 would no longer always be required for arm64.

Thanks,
John

Douglas Anderson (6):
  drivers: base: Add the concept of "pre_probe" to drivers
  drivers: base: Add bits to struct device to control iommu strictness
  PCI: Indicate that we want to force strict DMA for untrusted devices
  iommu: Combine device strictness requests with the global default
  iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
  mmc: sdhci-msm: Request non-strict IOMMU mode

 drivers/base/dd.c             | 10 +--
 drivers/iommu/dma-iommu.c     |  2 +-
 drivers/iommu/iommu.c         | 56 +++
 drivers/mmc/host/sdhci-msm.c  |  8 +
 drivers/pci/probe.c           |  4 ++-
 include/linux/device.h        | 11 +++
 include/linux/device/driver.h |  9 ++
 include/linux/iommu.h         |  2 ++
 8 files changed, 85 insertions(+), 17 deletions(-)
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
Hi, On Tue, Jun 22, 2021 at 9:53 AM Doug Anderson wrote: > > Hi, > > On Mon, Jun 21, 2021 at 7:05 PM Lu Baolu wrote: > > > > On 6/22/21 7:52 AM, Douglas Anderson wrote: > > > @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device > > > *dev) > > > > > > static int iommu_group_alloc_default_domain(struct bus_type *bus, > > > struct iommu_group *group, > > > - unsigned int type) > > > + unsigned int type, > > > + struct device *dev) > > > { > > > struct iommu_domain *dom; > > > > > > @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct > > > bus_type *bus, > > > if (!dom) > > > return -ENOMEM; > > > > > > + /* Save the strictness requests from the device */ > > > + if (dev && type == IOMMU_DOMAIN_DMA) { > > > + dom->request_non_strict = dev->request_non_strict_iommu; > > > + dom->force_strict = dev->force_strict_iommu; > > > + } > > > + > > > > An iommu default domain might be used by multiple devices which might > > have different "strict" attributions. Then who could override who? > > My gut instinct would be that if multiple devices were part of a given > domain that it would be combined like this: > > 1. Any device that requests strict makes the domain strict force strict. > > 2. To request non-strict all of the devices in the domain would have > to request non-strict. > > To do that I'd have to change my patchset obviously, but I don't think > it should be hard. We can just keep a count of devices and a count of > the strict vs. non-strict requests? If there are no other blockers > I'll try to do that in my v2. One issue, I guess, is that we might need to transition a non-strict domain to become strict. Is that possible? We'd end up with an extra "flush queue" that we didn't need to allocate, but are there other problems? Actually, in general would it be possible to transition between strict and non-strict at runtime as long as we called init_iova_flush_queue()? 
Maybe that's a better solution than all this--we just boot in strict mode and can just transition to non-strict mode after-the-fact?

-Doug
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
Hi,

On Mon, Jun 21, 2021 at 7:05 PM Lu Baolu wrote:
>
> On 6/22/21 7:52 AM, Douglas Anderson wrote:
> > @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device
> > *dev)
> >
> > static int iommu_group_alloc_default_domain(struct bus_type *bus,
> > struct iommu_group *group,
> > - unsigned int type)
> > + unsigned int type,
> > + struct device *dev)
> > {
> > struct iommu_domain *dom;
> >
> > @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct
> > bus_type *bus,
> > if (!dom)
> > return -ENOMEM;
> >
> > + /* Save the strictness requests from the device */
> > + if (dev && type == IOMMU_DOMAIN_DMA) {
> > + dom->request_non_strict = dev->request_non_strict_iommu;
> > + dom->force_strict = dev->force_strict_iommu;
> > + }
> > +
>
> An iommu default domain might be used by multiple devices which might
> have different "strict" attributions. Then who could override who?

My gut instinct would be that if multiple devices were part of a given domain that it would be combined like this:

1. Any device that requests strict makes the domain strict.

2. To request non-strict all of the devices in the domain would have to request non-strict.

To do that I'd have to change my patchset obviously, but I don't think it should be hard. We can just keep a count of devices and a count of the strict vs. non-strict requests? If there are no other blockers I'll try to do that in my v2.

-Doug
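The counting scheme Doug sketches could look something like the following (purely illustrative: the struct and helper names are hypothetical and not from the posted patches):

```c
#include <assert.h>

/*
 * Hypothetical per-group bookkeeping for the proposed v2: a group is
 * forced strict if any member device forces strict, and only becomes
 * non-strict if every member device asked for non-strict.
 */
struct group_strictness {
	unsigned int ndevs;          /* devices in the group */
	unsigned int nforce_strict;  /* devices forcing strict */
	unsigned int nrequest_lazy;  /* devices requesting non-strict */
};

static void group_add_device(struct group_strictness *g,
			     int force_strict, int request_non_strict)
{
	g->ndevs++;
	if (force_strict)
		g->nforce_strict++;
	else if (request_non_strict)
		g->nrequest_lazy++;
}

/*
 * Returns 1 (strict) unless no device forces strict and all member
 * devices requested non-strict.
 */
static int group_is_strict(const struct group_strictness *g)
{
	if (g->nforce_strict)
		return 1;
	return g->nrequest_lazy != g->ndevs;
}
```

Keeping counts rather than a single boolean means the answer stays correct in whatever order devices are added to the group, which sidesteps the probe-order non-determinism Robin raised.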
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
Hi, On Mon, Jun 21, 2021 at 7:56 PM Saravana Kannan wrote: > > On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson > wrote: > > > > In the patch ("drivers: base: Add bits to struct device to control > > iommu strictness") we add the ability for devices to tell us about > > their IOMMU strictness requirements. Let's now take that into account > > in the IOMMU layer. > > > > A few notes here: > > * Presumably this is always how iommu_get_dma_strict() was intended to > > behave. Had this not been the intention then it never would have > > taken a domain as a parameter. > > * The iommu_set_dma_strict() feels awfully non-symmetric now. That > > function sets the _default_ strictness globally in the system > > whereas iommu_get_dma_strict() returns the value for a given domain > > (falling back to the default). Presumably, at least, the fact that > > iommu_set_dma_strict() doesn't take a domain makes this obvious. > > > > The function iommu_get_dma_strict() should now make it super obvious > > where strictness comes from and who overides who. Though the function > > changed a bunch to make the logic clearer, the only two new rules > > should be: > > * Devices can force strictness for themselves, overriding the cmdline > > "iommu.strict=0" or a call to iommu_set_dma_strict(false)). > > * Devices can request non-strictness for themselves, assuming there > > was no cmdline "iommu.strict=1" or a call to > > iommu_set_dma_strict(true). 
> > > > Signed-off-by: Douglas Anderson > > --- > > > > drivers/iommu/iommu.c | 56 +-- > > include/linux/iommu.h | 2 ++ > > 2 files changed, 45 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index 808ab70d5df5..0c84a4c06110 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -28,8 +28,19 @@ > > static struct kset *iommu_group_kset; > > static DEFINE_IDA(iommu_group_ida); > > > > +enum iommu_strictness { > > + IOMMU_DEFAULT_STRICTNESS = -1, > > + IOMMU_NOT_STRICT = 0, > > + IOMMU_STRICT = 1, > > +}; > > +static inline enum iommu_strictness bool_to_strictness(bool strictness) > > +{ > > + return (enum iommu_strictness)strictness; > > +} > > + > > static unsigned int iommu_def_domain_type __read_mostly; > > -static bool iommu_dma_strict __read_mostly = true; > > +static enum iommu_strictness cmdline_dma_strict __read_mostly = > > IOMMU_DEFAULT_STRICTNESS; > > +static enum iommu_strictness driver_dma_strict __read_mostly = > > IOMMU_DEFAULT_STRICTNESS; > > static u32 iommu_cmd_line __read_mostly; > > > > struct iommu_group { > > @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] > > = { > > }; > > > > #define IOMMU_CMD_LINE_DMA_API BIT(0) > > -#define IOMMU_CMD_LINE_STRICT BIT(1) > > > > static int iommu_alloc_default_domain(struct iommu_group *group, > > struct device *dev); > > @@ -336,25 +346,38 @@ early_param("iommu.passthrough", > > iommu_set_def_domain_type); > > > > static int __init iommu_dma_setup(char *str) > > { > > - int ret = kstrtobool(str, &iommu_dma_strict); > > + bool strict; > > + int ret = kstrtobool(str, &strict); > > > > if (!ret) > > - iommu_cmd_line |= IOMMU_CMD_LINE_STRICT; > > + cmdline_dma_strict = bool_to_strictness(strict); > > return ret; > > } > > early_param("iommu.strict", iommu_dma_setup); > > > > void iommu_set_dma_strict(bool strict) > > { > > - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) > > - iommu_dma_strict = 
strict; > > + /* A driver can request strictness but not the other way around */ > > + if (driver_dma_strict != IOMMU_STRICT) > > + driver_dma_strict = bool_to_strictness(strict); > > } > > > > bool iommu_get_dma_strict(struct iommu_domain *domain) > > { > > - /* only allow lazy flushing for DMA domains */ > > - if (domain->type == IOMMU_DOMAIN_DMA) > > - return iommu_dma_strict; > > + /* Non-DMA domains or anyone forcing it to strict makes it strict */ > > + if (domain->type != IOMMU_DOMAIN_DMA || > > + cmdline_dma_strict == IOMMU_STRICT || > > + driver_dma_strict == IOMMU_STRICT || > > + domain->force_strict) > > + return true; > > + > > + /* Anyone requesting non-strict (if no forces) makes it non-strict > > */ > > + if (cmdline_dma_strict == IOMMU_NOT_STRICT || > > + driver_dma_strict == IOMMU_NOT_STRICT || > > + domain->request_non_strict) > > + return false; > > + > > + /* Nobody said anything, so it's strict by default */ > > If iommu.strict is not set in the command line, upstream treats it as > iommu.strict=1. Meaning, no drivers can override it. > > If I understand it correctly, with your series, if iommu.strict=1 is > no
Re: [PATCH] dt-bindings: arm-smmu: Fix json-schema syntax
On Mon, Jun 21, 2021 at 7:58 AM Thierry Reding wrote:
>
> From: Thierry Reding
>
> Commit 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible
> string") introduced a jsonschema syntax error as a result of a rebase
> gone wrong. Fix it.
>
> Fixes: 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible string")
> Reported-by: Rob Herring
> Signed-off-by: Thierry Reding
> ---
> Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 6 ++
> 1 file changed, 2 insertions(+), 4 deletions(-)

Acked-by: Rob Herring
Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
Hi, On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy wrote: > > Hi Doug, > > On 2021-06-22 00:52, Douglas Anderson wrote: > > > > This patch attempts to put forward a proposal for enabling non-strict > > DMA on a device-by-device basis. The patch series requests non-strict > > DMA for the Qualcomm SDHCI controller as a first device to enable, > > getting a nice bump in performance with what's believed to be a very > > small drop in security / safety (see the patch for the full argument). > > > > As part of this patch series I am end up slightly cleaning up some of > > the interactions between the PCI subsystem and the IOMMU subsystem but > > I don't go all the way to fully remove all the tentacles. Specifically > > this patch series only concerns itself with a single aspect: strict > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier > > to talk about / reason about for more subsystems compared to overall > > deciding what it means for a device to be "external" or "untrusted". > > > > If something like this patch series ends up being landable, it will > > undoubtedly need coordination between many maintainers to land. I > > believe it's fully bisectable but later patches in the series > > definitely depend on earlier ones. Sorry for the long CC list. :( > > Unfortunately, this doesn't work. In normal operation, the default > domains should be established long before individual drivers are even > loaded (if they are modules), let alone anywhere near probing. The fact > that iommu_probe_device() sometimes gets called far too late off the > back of driver probe is an unfortunate artefact of the original > probe-deferral scheme, and causes other problems like potentially > malformed groups - I've been forming a plan to fix that for a while now, > so I for one really can't condone anything trying to rely on it. 
> Non-deterministic behaviour based on driver probe order for multi-device > groups is part of the existing problem, and your proposal seems equally > vulnerable to that too. Doh! :( I definitely can't say I understand the iommu subsystem amazingly well. It was working for me, but I could believe that I was somehow violating a rule somewhere. I'm having a bit of a hard time understanding where the problem is though. Is there any chance that you missed the part of my series where I introduced a "pre_probe" step? Specifically, I see this:
* really_probe() is called w/ a driver and a device.
* -> calls dev->bus->dma_configure() w/ a "struct device *"
* -> eventually calls iommu_probe_device() w/ the device.
* -> calls iommu_alloc_default_domain() w/ the device
* -> calls iommu_group_alloc_default_domain()
* -> always allocates a new domain
...so we always have a "struct device" when a domain is allocated if that domain is going to be associated with a device. I will agree that iommu_probe_device() is called before the driver probe, but unless I missed something it's after the device driver is loaded. ...and assuming something like patch #1 in this series looks OK then iommu_probe_device() will be called after "pre_probe". So assuming I'm not missing something, I'm not actually relying on the IOMMU getting initialized off the back of driver probe. > FWIW we already have a go-faster knob for people who want to tweak the > security/performance compromise for specific devices, namely the sysfs > interface for changing a group's domain type before binding the relevant > driver(s). Is that something you could use in your application, say from > an initramfs script? We've never had an initramfs script in Chrome OS. I don't know all the history of why (I'm trying to check), but I'm nearly certain it was a conscious decision. Probably it has to do with the fact that we're not trying to build a generic distribution where a single boot source can boot a huge variety of hardware.
We generally have one kernel for a class of devices. I believe avoiding the initramfs just keeps things simpler. I think trying to revamp Chrome OS to switch to an initramfs type system would be a pretty big undertaking since (as I understand it) you can't just run a little command and then return to the normal boot flow. Once you switch to initramfs you're committing to finding / setting up the rootfs yourself and on Chrome OS I believe that means a whole bunch of dm-verity work. ...so probably the initramfs is a no-go for me, but I'm still crossing my fingers that the pre_probe() might be legit if you take a second look at it? -Doug
Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush
Hi Robin, On 2021-06-22 17:41, Robin Murphy wrote: On 2021-06-22 08:11, Sai Prakash Ranjan wrote: Hi Robin, On 2021-06-21 21:15, Robin Murphy wrote: On 2021-06-18 03:51, Sai Prakash Ranjan wrote: Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context with tlb_flush_all() callback in partial walk flush to improve unmap performance on select few platforms where the cost of over-invalidation is less than the unmap latency. I still think this doesn't belong anywhere near io-pgtable at all. It's a driver-internal decision how exactly it implements a non-leaf invalidation, and that may be more complex than a predetermined boolean decision. For example, I've just realised for SMMUv3 we can't invalidate multiple levels of table at once with a range command, since if we assume the whole thing is mapped at worst-case page granularity we may fail to invalidate any parts which are mapped as intermediate-level blocks. If invalidating a 1GB region (with 4KB granule) means having to fall back to 256K non-range commands, we may not want to invalidate by VA then, even though doing so for a 2MB region is still optimal. It's also quite feasible that drivers might want to do this for leaf invalidations too - if you don't like issuing 512 commands to invalidate 2MB, do you like issuing 511 commands to invalidate 2044KB? - and at that point the logic really has to be in the driver anyway. Ok I will move this to tlb_flush_walk() functions in the drivers. In the previous v1 thread, you suggested to make the choice in iommu_get_dma_strict() test, I assume you meant the test in iommu_dma_init_domain() with a flag or was it the leaf driver(ex:arm-smmu.c) test of iommu_get_dma_strict() in init_domain? Yes, I meant literally inside the same condition where we currently set "pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;" in arm_smmu_init_domain_context(). Ok got it, thanks. I am still a bit confused on where this flag would be? Should this be a part of struct iommu_domain? 
Well, if you were to rewrite the config with an alternative set of flush_ops at that point it would be implicit. For a flag, probably either in arm_smmu_domain or arm_smmu_impl. Maybe a flag would be less useful than generalising straight to a "maximum number of by-VA invalidations it's worth sending individually" threshold value? But then we would still need some flag to make this implementation specific (qcom specific for now) and this threshold would just be another condition although it would have been useful if this was generic enough. It's clear to me what overall shape and separation of responsibility is most logical, but beyond that I don't have a particularly strong opinion on the exact implementation; I've just been chucking ideas around :) Your ideas are very informative and useful :) Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
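Robin's "maximum number of by-VA invalidations it's worth sending individually" idea boils down to a simple threshold check. A toy Python model (not driver code: the 4KB granule matches the example in the thread, but the 512-command threshold is an assumed tuning value):

```python
GRANULE = 4096   # 4KB leaf granule, as in the 1GB example above
THRESHOLD = 512  # assumed cutoff: max per-VA commands worth issuing

def pick_invalidation(size, granule=GRANULE, threshold=THRESHOLD):
    """Return which strategy a driver might pick for a non-leaf flush of
    `size` bytes, assuming worst-case page-granularity mappings (so one
    invalidation command per granule)."""
    ncmds = size // granule
    return "inv-by-va" if ncmds <= threshold else "flush-all"

# 2MB / 4KB = 512 commands: still worth invalidating by VA.
print(pick_invalidation(2 << 20))   # inv-by-va
# 1GB / 4KB = 262144 (256K) commands: cheaper to over-invalidate.
print(pick_invalidation(1 << 30))   # flush-all
```

With such a threshold the boolean quirk disappears: a platform like the one Sai describes would just set a very low threshold so partial walk flushes always fall back to flush-all.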
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
On Tue, Jun 22, 2021 at 2:17 AM Geert Uytterhoeven wrote: > > Hi Rob, > > On Tue, Jun 15, 2021 at 9:16 PM Rob Herring wrote: > > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > > same size as the list is redundant and can be dropped. Note that is DT > > schema specific behavior and not standard json-schema behavior. The tooling > > will fixup the final schema adding any unspecified minItems/maxItems. > > > > This condition is partially checked with the meta-schema already, but > > only if both 'minItems' and 'maxItems' are equal to the 'items' length. > > An improved meta-schema is pending. > > > Signed-off-by: Rob Herring > > > --- a/Documentation/devicetree/bindings/net/stm32-dwmac.yaml > > +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.yaml > > @@ -46,7 +46,6 @@ properties: > > > >clocks: > > minItems: 3 > > -maxItems: 5 > > items: > >- description: GMAC main clock > >- description: MAC TX clock > > While resolving the conflict with commit fea99822914039c6 > ("dt-bindings: net: document ptp_ref clk in dwmac") in soc/for-next, > I noticed the following construct for clock-names: > > clock-names: > minItems: 3 > maxItems: 6 > contains: > enum: > - stmmaceth > - mac-clk-tx > - mac-clk-rx > - ethstp > - eth-ck > - ptp_ref > > Should this use items instead of enum, and drop maxItems, or is this > a valid construct to support specifying the clocks in random order? > If the latter, it does mean that the order of clock-names may not > match the order of the clock descriptions. 'contains' is true if one or more entries match the strings. So it is really saying one of these is required. That's not really much of a constraint. There's 'minContains' and 'maxContains' in newer json-schema versions (not yet supported) that could add some constraints if there has to be at least N entries from contains. An 'items' schema (as opposed to a list) would say all items have to match one of the strings. I'm sure that's too strict. 
TLDR: clocks for this binding are a mess and the above is probably all we can do here. Rob
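The distinction Rob describes — 'contains' requiring only that at least one entry match the enum, versus an 'items' schema constraining every entry — can be shown with a small stand-alone model of the two checks (plain Python, not the dt-schema tooling; the clock names come from the binding quoted above):

```python
ALLOWED = {"stmmaceth", "mac-clk-tx", "mac-clk-rx",
           "ethstp", "eth-ck", "ptp_ref"}

def contains_ok(names, min_items=3, max_items=6):
    # 'contains' semantics: length bounds plus *at least one* entry
    # drawn from the enum -- a weak, order-independent constraint.
    return (min_items <= len(names) <= max_items
            and any(n in ALLOWED for n in names))

def items_schema_ok(names):
    # An 'items' schema (one schema applied to all entries) would demand
    # *every* entry match -- likely too strict, as Rob notes.
    return all(n in ALLOWED for n in names)

clocks = ["eth-ck", "made-up-clk", "stmmaceth"]
print(contains_ok(clocks))      # True: one known name is enough
print(items_schema_ok(clocks))  # False: "made-up-clk" fails
```

This is why 'contains' "is not really much of a constraint": a single recognised name satisfies it, and without minContains/maxContains support nothing pins down how many enum members must appear.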
[PATCH v2] dma-mapping: use vmalloc_to_page for vmalloc addresses
This commit fixes an incorrect conversion from cpu_addr to a page address in cases where we get a virtual address that was allocated in the vmalloc range. In that case, virt_to_page() cannot convert the address properly and returns an incorrect page address. We need to detect such cases and obtain the page address using vmalloc_to_page() instead. Signed-off-by: Roman Skakun Reviewed-by: Andrii Anisov --- Hey! Thanks for the suggestions, Christoph! I updated the patch according to your advice. But I'm surprised that nobody has caught this problem in the common code before. It looks a bit strange to me. kernel/dma/ops_helpers.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c index 910ae69cae77..782728d8a393 100644 --- a/kernel/dma/ops_helpers.c +++ b/kernel/dma/ops_helpers.c @@ -5,6 +5,14 @@ */ #include +static struct page *cpu_addr_to_page(void *cpu_addr) +{ + if (is_vmalloc_addr(cpu_addr)) + return vmalloc_to_page(cpu_addr); + else + return virt_to_page(cpu_addr); +} + /* * Create scatter-list for the already allocated DMA buffer. */ @@ -12,7 +20,7 @@ int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { - struct page *page = virt_to_page(cpu_addr); + struct page *page = cpu_addr_to_page(cpu_addr); int ret; ret = sg_alloc_table(sgt, 1, GFP_KERNEL); @@ -43,7 +51,7 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, return -ENXIO; return remap_pfn_range(vma, vma->vm_start, - page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff, + page_to_pfn(cpu_addr_to_page(cpu_addr)) + vma->vm_pgoff, user_count << PAGE_SHIFT, vma->vm_page_prot); #else return -ENXIO; -- 2.25.1
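The reason virt_to_page() goes wrong here is that it is pure arithmetic against the kernel's linear map, while vmalloc mappings are not physically contiguous and need an actual page-table lookup. A toy Python model of the dispatch in cpu_addr_to_page() (the base addresses and the fake page table are invented purely for illustration):

```python
PAGE_SHIFT = 12
LINEAR_BASE = 0xffff_8000_0000_0000    # assumed linear-map start
VMALLOC_BASE = 0xffff_c000_0000_0000   # assumed vmalloc area start
vmalloc_ptes = {VMALLOC_BASE: 0x1234}  # fake page table: vaddr -> pfn

def is_vmalloc_addr(va):
    return va >= VMALLOC_BASE

def virt_to_pfn(va):
    # Only correct for linear-map addresses: fixed arithmetic, no lookup.
    return (va - LINEAR_BASE) >> PAGE_SHIFT

def vmalloc_to_pfn(va):
    # Needs a real lookup, because vmalloc mappings are not contiguous.
    return vmalloc_ptes[va & ~((1 << PAGE_SHIFT) - 1)]

def cpu_addr_to_pfn(va):
    # Mirrors the patch's cpu_addr_to_page(): pick the converter by range.
    return vmalloc_to_pfn(va) if is_vmalloc_addr(va) else virt_to_pfn(va)

print(hex(cpu_addr_to_pfn(VMALLOC_BASE)))          # 0x1234 via the lookup
print(hex(cpu_addr_to_pfn(LINEAR_BASE + 0x5000)))  # 0x5 via arithmetic
```

Applying the linear-map arithmetic to the vmalloc address would yield a nonsense frame number, which is exactly the bug the patch avoids.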
Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush
On 2021-06-22 08:11, Sai Prakash Ranjan wrote: Hi Robin, On 2021-06-21 21:15, Robin Murphy wrote: On 2021-06-18 03:51, Sai Prakash Ranjan wrote: Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context with tlb_flush_all() callback in partial walk flush to improve unmap performance on select few platforms where the cost of over-invalidation is less than the unmap latency. I still think this doesn't belong anywhere near io-pgtable at all. It's a driver-internal decision how exactly it implements a non-leaf invalidation, and that may be more complex than a predetermined boolean decision. For example, I've just realised for SMMUv3 we can't invalidate multiple levels of table at once with a range command, since if we assume the whole thing is mapped at worst-case page granularity we may fail to invalidate any parts which are mapped as intermediate-level blocks. If invalidating a 1GB region (with 4KB granule) means having to fall back to 256K non-range commands, we may not want to invalidate by VA then, even though doing so for a 2MB region is still optimal. It's also quite feasible that drivers might want to do this for leaf invalidations too - if you don't like issuing 512 commands to invalidate 2MB, do you like issuing 511 commands to invalidate 2044KB? - and at that point the logic really has to be in the driver anyway. Ok I will move this to tlb_flush_walk() functions in the drivers. In the previous v1 thread, you suggested to make the choice in iommu_get_dma_strict() test, I assume you meant the test in iommu_dma_init_domain() with a flag or was it the leaf driver(ex:arm-smmu.c) test of iommu_get_dma_strict() in init_domain? Yes, I meant literally inside the same condition where we currently set "pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;" in arm_smmu_init_domain_context(). I am still a bit confused on where this flag would be? Should this be a part of struct iommu_domain? 
Well, if you were to rewrite the config with an alternative set of flush_ops at that point it would be implicit. For a flag, probably either in arm_smmu_domain or arm_smmu_impl. Maybe a flag would be less useful than generalising straight to a "maximum number of by-VA invalidations it's worth sending individually" threshold value? It's clear to me what overall shape and separation of responsibility is most logical, but beyond that I don't have a particularly strong opinion on the exact implementation; I've just been chucking ideas around :) Cheers, Robin.
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
On 2021-06-22 00:52, Douglas Anderson wrote: In the patch ("drivers: base: Add bits to struct device to control iommu strictness") we add the ability for devices to tell us about their IOMMU strictness requirements. Let's now take that into account in the IOMMU layer. A few notes here: * Presumably this is always how iommu_get_dma_strict() was intended to behave. Had this not been the intention then it never would have taken a domain as a parameter. FWIW strictness does have the semantic of being a per-domain property, but mostly in the sense that it's only relevant to IOMMU_DOMAIN_DMA domains, so the main thing was encapsulating that check rather than duplicating it all over callsites. * The iommu_set_dma_strict() feels awfully non-symmetric now. That function sets the _default_ strictness globally in the system whereas iommu_get_dma_strict() returns the value for a given domain (falling back to the default). Presumably, at least, the fact that iommu_set_dma_strict() doesn't take a domain makes this obvious. It *is* asymmetric - one is for IOMMU core code and individual driver internals to know whether they need to do whatever bits of setting up a flush queue for a given domain they are responsible for, while the other is specifically for two drivers to force the global default in order to preserve legacy driver-specific behaviour. Maybe that should have been called something like iommu_set_dma_default_strict instead... :/ Robin. The function iommu_get_dma_strict() should now make it super obvious where strictness comes from and who overides who. Though the function changed a bunch to make the logic clearer, the only two new rules should be: * Devices can force strictness for themselves, overriding the cmdline "iommu.strict=0" or a call to iommu_set_dma_strict(false)). * Devices can request non-strictness for themselves, assuming there was no cmdline "iommu.strict=1" or a call to iommu_set_dma_strict(true). 
Signed-off-by: Douglas Anderson --- drivers/iommu/iommu.c | 56 +-- include/linux/iommu.h | 2 ++ 2 files changed, 45 insertions(+), 13 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 808ab70d5df5..0c84a4c06110 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -28,8 +28,19 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); +enum iommu_strictness { + IOMMU_DEFAULT_STRICTNESS = -1, + IOMMU_NOT_STRICT = 0, + IOMMU_STRICT = 1, +}; +static inline enum iommu_strictness bool_to_strictness(bool strictness) +{ + return (enum iommu_strictness)strictness; +} + static unsigned int iommu_def_domain_type __read_mostly; -static bool iommu_dma_strict __read_mostly = true; +static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS; +static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS; static u32 iommu_cmd_line __read_mostly; struct iommu_group { @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = { }; #define IOMMU_CMD_LINE_DMA_API BIT(0) -#define IOMMU_CMD_LINE_STRICT BIT(1) static int iommu_alloc_default_domain(struct iommu_group *group, struct device *dev); @@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type); static int __init iommu_dma_setup(char *str) { - int ret = kstrtobool(str, &iommu_dma_strict); + bool strict; + int ret = kstrtobool(str, &strict); if (!ret) - iommu_cmd_line |= IOMMU_CMD_LINE_STRICT; + cmdline_dma_strict = bool_to_strictness(strict); return ret; } early_param("iommu.strict", iommu_dma_setup); void iommu_set_dma_strict(bool strict) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + /* A driver can request strictness but not the other way around */ + if (driver_dma_strict != IOMMU_STRICT) + driver_dma_strict = bool_to_strictness(strict); } bool iommu_get_dma_strict(struct iommu_domain *domain) { - /* only allow lazy flushing for 
DMA domains */ - if (domain->type == IOMMU_DOMAIN_DMA) - return iommu_dma_strict; + /* Non-DMA domains or anyone forcing it to strict makes it strict */ + if (domain->type != IOMMU_DOMAIN_DMA || + cmdline_dma_strict == IOMMU_STRICT || + driver_dma_strict == IOMMU_STRICT || + domain->force_strict) + return true; + + /* Anyone requesting non-strict (if no forces) makes it non-strict */ + if (cmdline_dma_strict == IOMMU_NOT_STRICT || + driver_dma_strict == IOMMU_NOT_STRICT || + domain->request_non_strict) + return false; + + /* Nobody said anything, so it's strict by default *
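The precedence that the reworked iommu_get_dma_strict() implements above can be summarised in a small model (a hypothetical Python rendering of the patch's three-step decision tree, not kernel code):

```python
DEFAULT, NOT_STRICT, STRICT = -1, 0, 1  # mirrors enum iommu_strictness

def dma_strict(domain_is_dma, cmdline=DEFAULT, driver=DEFAULT,
               force_strict=False, request_non_strict=False):
    # 1. Non-DMA domains, or anyone *forcing* strict, make it strict.
    if (not domain_is_dma or cmdline == STRICT or driver == STRICT
            or force_strict):
        return True
    # 2. Otherwise, anyone requesting non-strict makes it non-strict.
    if (cmdline == NOT_STRICT or driver == NOT_STRICT
            or request_non_strict):
        return False
    # 3. Nobody said anything: strict by default.
    return True

# cmdline iommu.strict=1 overrides a device's non-strict request:
print(dma_strict(True, cmdline=STRICT, request_non_strict=True))  # True
# device requests non-strict and nothing forces strict:
print(dma_strict(True, request_non_strict=True))                  # False
```

This makes the two new rules from the commit message visible: a device's force_strict beats every non-strict request, while a device's request_non_strict only wins when neither the command line nor a driver insisted on strict.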
Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
Hi Doug, On 2021-06-22 00:52, Douglas Anderson wrote: This patch attempts to put forward a proposal for enabling non-strict DMA on a device-by-device basis. The patch series requests non-strict DMA for the Qualcomm SDHCI controller as a first device to enable, getting a nice bump in performance with what's believed to be a very small drop in security / safety (see the patch for the full argument). As part of this patch series I am end up slightly cleaning up some of the interactions between the PCI subsystem and the IOMMU subsystem but I don't go all the way to fully remove all the tentacles. Specifically this patch series only concerns itself with a single aspect: strict vs. non-strict mode for the IOMMU. I'm hoping that this will be easier to talk about / reason about for more subsystems compared to overall deciding what it means for a device to be "external" or "untrusted". If something like this patch series ends up being landable, it will undoubtedly need coordination between many maintainers to land. I believe it's fully bisectable but later patches in the series definitely depend on earlier ones. Sorry for the long CC list. :( Unfortunately, this doesn't work. In normal operation, the default domains should be established long before individual drivers are even loaded (if they are modules), let alone anywhere near probing. The fact that iommu_probe_device() sometimes gets called far too late off the back of driver probe is an unfortunate artefact of the original probe-deferral scheme, and causes other problems like potentially malformed groups - I've been forming a plan to fix that for a while now, so I for one really can't condone anything trying to rely on it. Non-deterministic behaviour based on driver probe order for multi-device groups is part of the existing problem, and your proposal seems equally vulnerable to that too. 
FWIW we already have a go-faster knob for people who want to tweak the security/performance compromise for specific devices, namely the sysfs interface for changing a group's domain type before binding the relevant driver(s). Is that something you could use in your application, say from an initramfs script? Thanks, Robin. Douglas Anderson (6): drivers: base: Add the concept of "pre_probe" to drivers drivers: base: Add bits to struct device to control iommu strictness PCI: Indicate that we want to force strict DMA for untrusted devices iommu: Combine device strictness requests with the global default iommu: Stop reaching into PCIe devices to decide strict vs. non-strict mmc: sdhci-msm: Request non-strict IOMMU mode drivers/base/dd.c | 10 +-- drivers/iommu/dma-iommu.c | 2 +- drivers/iommu/iommu.c | 56 +++ drivers/mmc/host/sdhci-msm.c | 8 + drivers/pci/probe.c | 4 ++- include/linux/device.h| 11 +++ include/linux/device/driver.h | 9 ++ include/linux/iommu.h | 2 ++ 8 files changed, 85 insertions(+), 17 deletions(-)
Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch
On 2021-06-21 18:51, Ross Philipson wrote: On 6/18/21 2:32 PM, Robin Murphy wrote: On 2021-06-18 17:12, Ross Philipson wrote: The IOMMU should always be set to default translated type after the PMRs are disabled to protect the MLE from DMA. Signed-off-by: Ross Philipson --- drivers/iommu/intel/iommu.c | 5 + drivers/iommu/iommu.c | 6 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index be35284..4f0256d 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev) */ static int device_def_domain_type(struct device *dev) { + /* Do not allow identity domain when Secure Launch is configured */ + if (slaunch_get_flags() & SL_FLAG_ACTIVE) + return IOMMU_DOMAIN_DMA; Is this specific to Intel? It seems like it could easily be done commonly like the check for untrusted external devices. It is currently Intel only but that will change. I will look into what you suggest. Yeah, it's simple and unobtrusive enough that I reckon it's worth going straight to the common version if it's worth doing at all. 
+ if (dev_is_pci(dev)) { struct pci_dev *pdev = to_pci_dev(dev); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 808ab70d..d49b7dd 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -23,6 +23,7 @@ #include #include #include +#include #include static struct kset *iommu_group_kset; @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line) { if (cmd_line) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; + + /* Do not allow identity domain when Secure Launch is configured */ + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE)) + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; Quietly ignoring the setting and possibly leaving iommu_def_domain_type uninitialised (note that 0 is not actually a usable type) doesn't seem great. AFAICS this probably warrants similar treatment to the Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA event though passthrough was requested. Or perhaps something more is needed here? mem_encrypt_active() case - there doesn't seem a great deal of value in trying to save users from themselves if they care about measured boot yet explicitly pass options which may compromise measured boot. If you really want to go down that route there's at least the sysfs interface you'd need to nobble as well, not to mention the various ways of completely disabling IOMMUs... Doing a secure launch with the kernel is not a general purpose user use case. A lot of work is done to secure the environment. Allowing passthrough mode would leave the secure launch kernel exposed to DMA. I think what we are trying to do here is what we intend though there may be a better way or perhaps it is incomplete as you suggest. On second thoughts this is overkill anyway - if you do hook iommu_get_def_domain_type(), you're done (in terms of the kernel-managed setting, at least); it doesn't matter what iommu_def_domain_type gets set to if will never get used. 
However, since this isn't really a per-device thing, it might be more semantically appropriate to leave that alone and instead only massage the default type in iommu_subsys_init(), as for memory encryption. When you say "secure the environment", what's the actual threat model here, i.e. who's securing what against whom? If it's a device lockdown type thing where the system owner wants to defend against the end user trying to mess with the software stack or gain access to parts they shouldn't, then possibly you can trust the command line, but there are definitely other places which need consideration. If on the other hand it's more about giving the end user confidence that their choice of software stack isn't being interfered with by a malicious host or external third parties, then it probably leans towards the opposite being true... If the command line *is* within the threat model, consider "iommu=off" and/or "intel_iommu=off" for example: I don't know how PMRs work, but I can only imagine that that's liable to leave things either wide open, or blocked to the point of no DMA working at all, neither of which seems to be what you want. I'm guessing "intel_iommu=tboot_noforce" might have some relevant implications too. It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on !SECURE_LAUNCH for clarity though. This came from a specific request to not make disabling IOMMU modes build time dependent. This is because a secure launch enabled kernel can also be booted as a general purpose kernel in cases where this is desired. Ah, thanks for clarif
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
Hi Rob, On Tue, Jun 15, 2021 at 9:16 PM Rob Herring wrote: > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > same size as the list is redundant and can be dropped. Note that is DT > schema specific behavior and not standard json-schema behavior. The tooling > will fixup the final schema adding any unspecified minItems/maxItems. > > This condition is partially checked with the meta-schema already, but > only if both 'minItems' and 'maxItems' are equal to the 'items' length. > An improved meta-schema is pending. > Signed-off-by: Rob Herring > --- a/Documentation/devicetree/bindings/net/stm32-dwmac.yaml > +++ b/Documentation/devicetree/bindings/net/stm32-dwmac.yaml > @@ -46,7 +46,6 @@ properties: > >clocks: > minItems: 3 > -maxItems: 5 > items: >- description: GMAC main clock >- description: MAC TX clock While resolving the conflict with commit fea99822914039c6 ("dt-bindings: net: document ptp_ref clk in dwmac") in soc/for-next, I noticed the following construct for clock-names: clock-names: minItems: 3 maxItems: 6 contains: enum: - stmmaceth - mac-clk-tx - mac-clk-rx - ethstp - eth-ck - ptp_ref Should this use items instead of enum, and drop maxItems, or is this a valid construct to support specifying the clocks in random order? If the latter, it does mean that the order of clock-names may not match the order of the clock descriptions. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
On Tue, Jun 22, 2021 at 3:50 PM Jason Wang wrote: > > > On 2021/6/22 3:22 PM, Yongji Xie wrote: > >> We need to fix a way to propagate the error to the userspace. > >> > >> E.g. if we want to stop the device, we will delay the status reset until > >> we get a response from the userspace? > >> > > I didn't get how to delay the status reset. And should it be a DoS > > that we want to fix if the userspace doesn't give a response forever? > > > You're right. So let's make set_status() able to fail first, then propagate > its failure via VHOST_VDPA_SET_STATUS. > OK. So we only need to propagate the failure in the vhost-vdpa case, right? Thanks, Yongji
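The direction agreed above — letting set_status() fail and surfacing that failure through VHOST_VDPA_SET_STATUS — implies waiting on the userspace daemon with some bound, so an unresponsive daemon can't hang the caller forever. A toy Python model (the queue-based "userspace", the 1-second timeout, and the -1 error return are all illustrative assumptions, not the VDUSE interface):

```python
import queue
import threading

class VduseDevModel:
    """Toy model: set_status() forwards the request to a userspace
    handler and fails if no acknowledgement arrives in time, so the
    caller can propagate an error instead of blocking forever."""
    TIMEOUT = 1.0  # assumed bound; avoids the unresponsive-daemon DoS

    def __init__(self):
        self.requests = queue.Queue()
        self.acks = queue.Queue()
        self.status = None

    def set_status(self, status):
        self.requests.put(status)
        try:
            self.acks.get(timeout=self.TIMEOUT)
        except queue.Empty:
            return -1  # would become the VHOST_VDPA_SET_STATUS failure
        self.status = status
        return 0

dev = VduseDevModel()

def userspace_daemon():
    # A cooperative daemon: take one request and acknowledge it.
    dev.requests.get()
    dev.acks.put(True)

threading.Thread(target=userspace_daemon, daemon=True).start()
print(dev.set_status(0))  # 0: reset acknowledged in time
```

With no daemon running, set_status() times out and returns the error instead of resetting the status, which is the "delay the status reset until we get a response" behaviour discussed in the thread.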
Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
On 2021/6/22 3:22 PM, Yongji Xie wrote: We need to fix a way to propagate the error to the userspace. E.g. if we want to stop the device, we will delay the status reset until we get a response from the userspace? I didn't get how to delay the status reset. And should it be a DoS that we want to fix if the userspace doesn't give a response forever? You're right. So let's make set_status() able to fail first, then propagate its failure via VHOST_VDPA_SET_STATUS. + } +} + +static size_t vduse_vdpa_get_config_size(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->config_size; +} + +static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, + void *buf, unsigned int len) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + memcpy(buf, dev->config + offset, len); +} + +static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int offset, + const void *buf, unsigned int len) +{ + /* Now we only support read-only configuration space */ +} + +static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->generation; +} + +static int vduse_vdpa_set_map(struct vdpa_device *vdpa, + struct vhost_iotlb *iotlb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + int ret; + + ret = vduse_domain_set_map(dev->domain, iotlb); + if (ret) + return ret; + + ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX); + if (ret) { + vduse_domain_clear_map(dev->domain, iotlb); + return ret; + } + + return 0; +} + +static void vduse_vdpa_free(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + dev->vdev = NULL; +} + +static const struct vdpa_config_ops vduse_vdpa_config_ops = { + .set_vq_address = vduse_vdpa_set_vq_address, + .kick_vq= vduse_vdpa_kick_vq, + .set_vq_cb = vduse_vdpa_set_vq_cb, + .set_vq_num = vduse_vdpa_set_vq_num, + .set_vq_ready = vduse_vdpa_set_vq_ready, + .get_vq_ready = vduse_vdpa_get_vq_ready, + .set_vq_state = 
vduse_vdpa_set_vq_state, + .get_vq_state = vduse_vdpa_get_vq_state, + .get_vq_align = vduse_vdpa_get_vq_align, + .get_features = vduse_vdpa_get_features, + .set_features = vduse_vdpa_set_features, + .set_config_cb = vduse_vdpa_set_config_cb, + .get_vq_num_max = vduse_vdpa_get_vq_num_max, + .get_device_id = vduse_vdpa_get_device_id, + .get_vendor_id = vduse_vdpa_get_vendor_id, + .get_status = vduse_vdpa_get_status, + .set_status = vduse_vdpa_set_status, + .get_config_size= vduse_vdpa_get_config_size, + .get_config = vduse_vdpa_get_config, + .set_config = vduse_vdpa_set_config, + .get_generation = vduse_vdpa_get_generation, + .set_map= vduse_vdpa_set_map, + .free = vduse_vdpa_free, +}; + +static dma_addr_t vduse_dev_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return vduse_domain_map_page(domain, page, offset, size, dir, attrs); +} + +static void vduse_dev_unmap_page(struct device *dev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs); +} + +static void *vduse_dev_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_addr, gfp_t flag, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long iova; + void *addr; + + *dma_addr = DMA_MAPPING_ERROR; + addr = vduse_domain_alloc_coherent(domain, size, + (dma_addr_t *)&iova, flag, attrs); + if (!addr) + return NULL; + + *dma_addr = (dma_addr_t)iova; + + return addr; +} + +static void vduse_dev_free_coherent(struct device *dev, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs) +{ + struct vduse_dev *vdev = 
dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->dom
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:
> The beauty of 'devel' and 'linux-next' is that they can be reshuffled and
> mangled. I pushed the original patch from Bumyong there and will let
> it sit for a day and then create a stable branch and give it to Linus.

Thanks, that should be good.

Do you want me to send a follow-up patch with the two extra checks:

  (tlb_addr & (IO_TLB_SIZE - 1)) > swiotlb_align_offset(dev, orig_addr)
  tlb_offset < alloc_size

or are we certain this can't ever happen? (I didn't see any hit in dmesg
when I ran with these, but my opinion is better safe than sorry...)

> Then I need to expand the test-regression bucket so that this does not
> happen again. Dominique, how easy would it be to purchase one of those
> devices?

My company is making such a device, but it's not on the market yet (it was
planned for August; with some procurement delays it will probably be a bit
late), and it would mean buying from Japan, so I'm not sure how convenient
that would be... These are originally NXP devices, so I assume Horia would
have better suggestions?

> I was originally thinking to create a crypto device in QEMU to simulate
> this but that may take longer to write than just getting the real thing.
>
> Or I could create some fake devices with weird offsets and write a driver
> for it to exercise this.. like this one I had done some time ago that
> needs some brushing off.

A fake device with fake offsets as a test is probably good enough. Ideally
it would need to exercise both failures we've seen (an offset in
dma_sync_single_for_device like caam does, and in the original mapping
(I assume?) like the NVMe driver does), but that sounds possible :)

Thanks again!
-- 
Dominique
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 2/5] ACPI: Move IOMMU setup code out of IORT
Hi Jean,

On 6/18/21 5:20 PM, Jean-Philippe Brucker wrote:
> Extract the code that sets up the IOMMU infrastructure from IORT, since
> it can be reused by VIOT. Move it one level up into a new
> acpi_iommu_configure_id() function, which calls the IORT parsing
> function which in turn calls the acpi_iommu_fwspec_init() helper.
>
> Signed-off-by: Jean-Philippe Brucker

Reviewed-by: Eric Auger

Thanks

Eric

> ---
>  include/acpi/acpi_bus.h   |  3 ++
>  include/linux/acpi_iort.h |  8 ++---
>  drivers/acpi/arm64/iort.c | 74 +--
>  drivers/acpi/scan.c       | 73 +-
>  4 files changed, 86 insertions(+), 72 deletions(-)
>
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 3a82faac5767..41f092a269f6 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -588,6 +588,9 @@ struct acpi_pci_root {
>
>  bool acpi_dma_supported(struct acpi_device *adev);
>  enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
> +int acpi_iommu_fwspec_init(struct device *dev, u32 id,
> +                           struct fwnode_handle *fwnode,
> +                           const struct iommu_ops *ops);
>  int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset,
>                         u64 *size);
>  int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index f7f054833afd..f1f0842a2cb2 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev);
>  int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
>  /* IOMMU interface */
>  int iort_dma_get_ranges(struct device *dev, u64 *size);
> -const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
> -                                                const u32 *id_in);
> +int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
>  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
>  phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
>  #else
> @@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device *dev) { }
>  /* IOMMU interface */
>  static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
>  { return -ENODEV; }
> -static inline const struct iommu_ops *iort_iommu_configure_id(
> -                       struct device *dev, const u32 *id_in)
> -{ return NULL; }
> +static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
> +{ return -ENODEV; }
>  static inline
>  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
>  { return 0; }
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index a940be1cf2af..487d1095030d 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -806,23 +806,6 @@ static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev)
>         return NULL;
>  }
>
> -static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev)
> -{
> -       struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> -
> -       return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
> -}
> -
> -static inline int iort_add_device_replay(struct device *dev)
> -{
> -       int err = 0;
> -
> -       if (dev->bus && !device_iommu_mapped(dev))
> -               err = iommu_probe_device(dev);
> -
> -       return err;
> -}
> -
>  /**
>   * iort_iommu_msi_get_resv_regions - Reserved region driver helper
>   * @dev: Device from iommu_get_resv_regions()
> @@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type)
>         }
>  }
>
> -static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,
> -                              struct fwnode_handle *fwnode,
> -                              const struct iommu_ops *ops)
> -{
> -       int ret = iommu_fwspec_init(dev, fwnode, ops);
> -
> -       if (!ret)
> -               ret = iommu_fwspec_add_ids(dev, &streamid, 1);
> -
> -       return ret;
> -}
> -
>  static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
>  {
>         struct acpi_iort_root_complex *pci_rc;
> @@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
>                 return iort_iommu_driver_enabled(node->type) ?
>                        -EPROBE_DEFER : -ENODEV;
>
> -       return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
> +       return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
>  }
>
>  struct iort_pci_alias_info {
> @@ -1020,24 +991,13 @@ static int iort_nc_iommu_map_id(struct device *dev,
>   * @dev: device to configure
>   * @id_in: optional input id const value pointer
>   *
> - * Returns: iommu_ops pointer on configuration success
> - *          NULL on configuration failure
> + * Returns: 0 on success, <0 on failure
>   */
> -const struct iommu_ops *iort_iommu_configure_id(stru
Re: [PATCH v5 4/5] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
Hi Jean,

On 6/18/21 5:20 PM, Jean-Philippe Brucker wrote:
> Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
> virtual platforms, but isn't currently possible. The overflow check in
> iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
> a limit address instead of a size, so callers don't have to fake a size
> to work around the check.
>
> The base and limit parameters are being phased out, because:
> * they are redundant for x86 callers. dma-iommu already reserves the
>   first page, and the upper limit is already in domain->geometry.
> * they can now be obtained from dev->dma_range_map on Arm.
> But removing them on Arm isn't completely straightforward so is left for
> future work. As an intermediate step, simplify the x86 callers by
> passing dummy limits.
>
> Signed-off-by: Jean-Philippe Brucker

Reviewed-by: Eric Auger

Eric

> ---
>  include/linux/dma-iommu.h   |  4 ++--
>  arch/arm64/mm/dma-mapping.c |  2 +-
>  drivers/iommu/amd/iommu.c   |  2 +-
>  drivers/iommu/dma-iommu.c   | 12 ++--
>  drivers/iommu/intel/iommu.c |  5 +
>  5 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 6e75a2d689b4..758ca4694257 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
>  void iommu_put_dma_cookie(struct iommu_domain *domain);
>
>  /* Setup call for arch DMA mapping code */
> -void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
> +void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
>
>  /* The DMA API isn't _quite_ the whole story, though... */
>  /*
> @@ -50,7 +50,7 @@ struct msi_msg;
>  struct device;
>
>  static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base,
> -                                      u64 size)
> +                                      u64 dma_limit)
>  {
>  }
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 4bf1dd3eb041..6719f9efea09 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>
>         dev->dma_coherent = coherent;
>         if (iommu)
> -               iommu_setup_dma_ops(dev, dma_base, size);
> +               iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
>
>  #ifdef CONFIG_XEN
>         if (xen_swiotlb_detect())
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 3ac42bbdefc6..216323fb27ef 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev)
>         /* Domains are initialized for this device - have a look what we ended
>            up with */
>         domain = iommu_get_domain_for_dev(dev);
>         if (domain->type == IOMMU_DOMAIN_DMA)
> -               iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
> +               iommu_setup_dma_ops(dev, 0, U64_MAX);
>         else
>                 set_dma_ops(dev, NULL);
>  }
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..c62e19bed302 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev)
>   * iommu_dma_init_domain - Initialise a DMA mapping domain
>   * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
>   * @base: IOVA at which the mappable address space starts
> - * @size: Size of IOVA space
> + * @limit: Last address of the IOVA space
>   * @dev: Device the domain is being initialised for
>   *
> - * @base and @size should be exact multiples of IOMMU page granularity to
> + * @base and @limit + 1 should be exact multiples of IOMMU page granularity to
>   * avoid rounding surprises. If necessary, we reserve the page at address 0
>   * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
>   * any change which could make prior IOVAs invalid will fail.
>   */
>  static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
> -                                u64 size, struct device *dev)
> +                                dma_addr_t limit, struct device *dev)
>  {
>         struct iommu_dma_cookie *cookie = domain->iova_cookie;
>         unsigned long order, base_pfn;
> @@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>         /* Check the domain allows at least some access to the device... */
>         if (domain->geometry.force_aperture) {
>                 if (base > domain->geometry.aperture_end ||
> -                   base + size <= domain->geometry.aperture_start) {
> +                   limit < domain->geometry.aperture_start) {
>                         pr_warn("specified DMA range outside IOMMU capability\n");
>                         return -EFAULT;
>
Re: [PATCH 1/1] dma-mapping: remove trailing spaces and tabs
Thanks, applied to the dma-mapping tree.
Re: [PATCH] dma debug: report -EEXIST errors in add_dma_entry
Thanks, applied to the dma-mapping tree.
Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
On Tue, Jun 22, 2021 at 1:07 PM Jason Wang wrote:
>
> On 2021/6/21 6:41 PM, Yongji Xie wrote:
> > On Mon, Jun 21, 2021 at 5:14 PM Jason Wang wrote:
> >>
> >> On 2021/6/15 10:13 PM, Xie Yongji wrote:
> >>> This VDUSE driver enables implementing vDPA devices in userspace.
> >>> The vDPA device's control path is handled in kernel and the data
> >>> path is handled in userspace.
> >>>
> >>> A message mechanism is used by the VDUSE driver to forward some control
> >>> messages such as starting/stopping the datapath to userspace. Userspace
> >>> can use read()/write() to receive/reply to those control messages.
> >>>
> >>> And some ioctls are introduced to help userspace to implement the
> >>> data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
> >>> descriptors referring to the vDPA device's iova regions. Then userspace
> >>> can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
> >>> and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
> >>> and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
> >>> ioctls can be used to inject interrupts and set up the kickfd for
> >>> virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
> >>> configuration space and inject a config interrupt.
> >>>
> >>> Signed-off-by: Xie Yongji
> >>> ---
> >>>  Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
> >>>  drivers/vdpa/Kconfig                               |   10 +
> >>>  drivers/vdpa/Makefile                              |    1 +
> >>>  drivers/vdpa/vdpa_user/Makefile                    |    5 +
> >>>  drivers/vdpa/vdpa_user/vduse_dev.c                 | 1453
> >>>  include/uapi/linux/vduse.h                         |  143 ++
> >>>  6 files changed, 1613 insertions(+)
> >>>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
> >>>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
> >>>  create mode 100644 include/uapi/linux/vduse.h
> >>>
> >>> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> index 9bfc2b510c64..acd95e9dcfe7 100644
> >>> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> >>> @@ -300,6 +300,7 @@ Code  Seq#    Include File                          Comments
> >>>  'z'   10-4F  drivers/s390/crypto/zcrypt_api.h              conflict!
> >>>  '|'   00-7F  linux/media.h
> >>>  0x80  00-1F  linux/fb.h
> >>> +0x81  00-1F  linux/vduse.h
> >>>  0x89  00-06  arch/x86/include/asm/sockios.h
> >>>  0x89  0B-DF  linux/sockios.h
> >>>  0x89  E0-EF  linux/sockios.h                               SIOCPROTOPRIVATE range
> >>> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> >>> index a503c1b2bfd9..6e23bce6433a 100644
> >>> --- a/drivers/vdpa/Kconfig
> >>> +++ b/drivers/vdpa/Kconfig
> >>> @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
> >>>           vDPA block device simulator which terminates IO request in a
> >>>           memory buffer.
> >>>
> >>> +config VDPA_USER
> >>> +       tristate "VDUSE (vDPA Device in Userspace) support"
> >>> +       depends on EVENTFD && MMU && HAS_DMA
> >>> +       select DMA_OPS
> >>> +       select VHOST_IOTLB
> >>> +       select IOMMU_IOVA
> >>> +       help
> >>> +         With VDUSE it is possible to emulate a vDPA Device
> >>> +         in a userspace program.
> >>> +
> >>>  config IFCVF
> >>>         tristate "Intel IFC VF vDPA driver"
> >>>         depends on PCI_MSI
> >>> diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> >>> index 67fe7f3d6943..f02ebed33f19 100644
> >>> --- a/drivers/vdpa/Makefile
> >>> +++ b/drivers/vdpa/Makefile
> >>> @@ -1,6 +1,7 @@
> >>>  # SPDX-License-Identifier: GPL-2.0
> >>>  obj-$(CONFIG_VDPA) += vdpa.o
> >>>  obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
> >>> +obj-$(CONFIG_VDPA_USER) += vdpa_user/
> >>>  obj-$(CONFIG_IFCVF)    += ifcvf/
> >>>  obj-$(CONFIG_MLX5_VDPA) += mlx5/
> >>>  obj-$(CONFIG_VP_VDPA)  += virtio_pci/
> >>> diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
> >>> new file mode 100644
> >>> index ..260e0b26af99
> >>> --- /dev/null
> >>> +++ b/drivers/vdpa/vdpa_user/Makefile
> >>> @@ -0,0 +1,5 @@
> >>> +# SPDX-License-Identifier: GPL-2.0
> >>> +
> >>> +vduse-y := vduse_dev.o iova_domain.o
> >>> +
> >>> +obj-$(CONFIG_VDPA_USER) += vduse.o
> >>> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> >>> new file mode 100644
> >>> index ..5271cbd15e28
> >>> --- /dev/null
> >>> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> >>> @@ -0,0 +1,1453 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/*
> >>> + * VDUSE: vDPA Device in Userspace
> >>> + *
> >>> + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved.
> >>> + *
> >>> + * Author: Xie Yongji
> >>> + *
> >>> + */
> >>> +
> >>> +
Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush
Hi Robin,

On 2021-06-21 21:15, Robin Murphy wrote:
> On 2021-06-18 03:51, Sai Prakash Ranjan wrote:
>> Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context
>> with tlb_flush_all() callback in partial walk flush to improve unmap
>> performance on select few platforms where the cost of over-invalidation
>> is less than the unmap latency.
>
> I still think this doesn't belong anywhere near io-pgtable at all. It's
> a driver-internal decision how exactly it implements a non-leaf
> invalidation, and that may be more complex than a predetermined boolean
> decision. For example, I've just realised for SMMUv3 we can't invalidate
> multiple levels of table at once with a range command, since if we assume
> the whole thing is mapped at worst-case page granularity we may fail to
> invalidate any parts which are mapped as intermediate-level blocks. If
> invalidating a 1GB region (with 4KB granule) means having to fall back
> to 256K non-range commands, we may not want to invalidate by VA then,
> even though doing so for a 2MB region is still optimal.
>
> It's also quite feasible that drivers might want to do this for leaf
> invalidations too - if you don't like issuing 512 commands to invalidate
> 2MB, do you like issuing 511 commands to invalidate 2044KB? - and at that
> point the logic really has to be in the driver anyway.

Ok, I will move this to the tlb_flush_walk() functions in the drivers.

In the previous v1 thread, you suggested making the choice in the
iommu_get_dma_strict() test. I assume you meant the test in
iommu_dma_init_domain() with a flag, or was it the leaf driver's
(e.g. arm-smmu.c) test of iommu_get_dma_strict() in init_domain? I am
still a bit confused about where this flag would be. Should it be part
of struct iommu_domain?

Thanks,
Sai

>> Signed-off-by: Sai Prakash Ranjan
>> ---
>>  drivers/iommu/io-pgtable-arm.c | 3 ++-
>>  include/linux/io-pgtable.h     | 5 +
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index 87def58e79b5..5d362f2214bd 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -768,7 +768,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>>         if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
>>                             IO_PGTABLE_QUIRK_NON_STRICT |
>>                             IO_PGTABLE_QUIRK_ARM_TTBR1 |
>> -                           IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
>> +                           IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
>> +                           IO_PGTABLE_QUIRK_TLB_INV_ALL))
>>                 return NULL;
>>
>>         data = arm_lpae_alloc_pgtable(cfg);
>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
>> index 4d40dfa75b55..45441592a0e6 100644
>> --- a/include/linux/io-pgtable.h
>> +++ b/include/linux/io-pgtable.h
>> @@ -82,6 +82,10 @@ struct io_pgtable_cfg {
>>   *
>>   * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
>>   *     attributes set in the TCR for a non-coherent page-table walker.
>> + *
>> + * IO_PGTABLE_QUIRK_TLB_INV_ALL: Use TLBIALL/TLBIASID to invalidate
>> + *     entire context for partial walk flush to increase unmap
>> + *     performance on select few platforms.
>>   */
>>  #define IO_PGTABLE_QUIRK_ARM_NS         BIT(0)
>>  #define IO_PGTABLE_QUIRK_NO_PERMS       BIT(1)
>>  #define IO_PGTABLE_QUIRK_NON_STRICT     BIT(4)
>>  #define IO_PGTABLE_QUIRK_ARM_TTBR1      BIT(5)
>>  #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
>> +#define IO_PGTABLE_QUIRK_TLB_INV_ALL    BIT(7)
>>
>>         unsigned long quirks;
>>         unsigned long pgsize_bitmap;
>>         unsigned int ias;

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation