Re: Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
On Wed, Jun 30, 2021 at 5:51 PM Stefan Hajnoczi wrote: > > On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote: > > On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi wrote: > > > > > > On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote: > > > > +/* ioctls */ > > > > + > > > > +struct vduse_dev_config { > > > > + char name[VDUSE_NAME_MAX]; /* vduse device name */ > > > > + __u32 vendor_id; /* virtio vendor id */ > > > > + __u32 device_id; /* virtio device id */ > > > > + __u64 features; /* device features */ > > > > + __u64 bounce_size; /* bounce buffer size for iommu */ > > > > + __u16 vq_size_max; /* the max size of virtqueue */ > > > > > > The VIRTIO specification allows per-virtqueue sizes. A device can have > > > two virtqueues, where the first one allows up to 1024 descriptors and > > > the second one allows only 128 descriptors, for example. > > > > > > > Good point! But it looks like virtio-vdpa/virtio-pci doesn't support > > that now. All virtqueues have the same maximum size. > > I see struct vpda_config_ops only supports a per-device max vq size: > u16 (*get_vq_num_max)(struct vdpa_device *vdev); > > virtio-pci supports per-virtqueue sizes because the struct > virtio_pci_common_cfg->queue_size register is per-queue (controlled by > queue_select). > Oh, yes. I miss queue_select. > I guess this is a question for Jason: will vdpa will keep this limitation? > If yes, then VDUSE can stick to it too without running into problems in > the future. > > > > > + __u16 padding; /* padding */ > > > > + __u32 vq_num; /* the number of virtqueues */ > > > > + __u32 vq_align; /* the allocation alignment of virtqueue's > > > > metadata */ > > > > > > I'm not sure what this is? > > > > > > > This will be used by vring_create_virtqueue() too. > > If there is no official definition for the meaning of this value then > "/* same as vring_create_virtqueue()'s vring_align parameter */" would > be clearer. That way the reader knows what to research in order to > understand how this field works. > OK. > I don't remember but maybe it was used to support vrings when the > host/guest have non-4KB page sizes. I wonder if anyone has an official > definition for this value? Not sure. Maybe we might need some alignment which is less than PAGE_SIZE sometimes. Thanks, Yongji ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4] swiotlb: fix implicit debugfs declarations
Factor out the debugfs bits from rmem_swiotlb_device_init() into a separate rmem_swiotlb_debugfs_init() to fix the implicit debugfs declarations. Fixes: 461021875c50 ("swiotlb: Add restricted DMA pool initialization") Reported-by: kernel test robot Signed-off-by: Claire Chang --- kernel/dma/swiotlb.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 0ffbaae9fba2..b7f76bca89bf 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -712,6 +712,21 @@ late_initcall(swiotlb_create_default_debugfs); #endif #ifdef CONFIG_DMA_RESTRICTED_POOL + +#ifdef CONFIG_DEBUG_FS +static void rmem_swiotlb_debugfs_init(struct reserved_mem *rmem) +{ + struct io_tlb_mem *mem = rmem->priv; + + mem->debugfs = debugfs_create_dir(rmem->name, debugfs_dir); + swiotlb_create_debugfs_files(mem); +} +#else +static void rmem_swiotlb_debugfs_init(struct reserved_mem *rmem) +{ +} +#endif + struct page *swiotlb_alloc(struct device *dev, size_t size) { struct io_tlb_mem *mem = dev->dma_io_tlb_mem; @@ -766,11 +781,7 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem, rmem->priv = mem; - if (IS_ENABLED(CONFIG_DEBUG_FS)) { - mem->debugfs = - debugfs_create_dir(rmem->name, debugfs_dir); - swiotlb_create_debugfs_files(mem); - } + rmem_swiotlb_debugfs_init(rmem); } dev->dma_io_tlb_mem = mem; -- 2.32.0.93.g670b81a890-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] iommu/arm: Cleanup resources in case of probe error path
> + > +err_pm_disable: > + pm_runtime_disable(dev); > return ret; > } Should it be pm_runtime_force_suspend()? qcom_iommu_device_remove() doesn't use pm_runtime_disable(dev). 875 static int qcom_iommu_device_remove(struct platform_device *pdev) 876 { ... 881 >---pm_runtime_force_suspend(&pdev->dev); 882 >---platform_set_drvdata(pdev, NULL); 883 >---iommu_device_sysfs_remove(&qcom_iommu->iommu); 884 >---iommu_device_unregister(&qcom_iommu->iommu); ... 887 } -KR ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH rdma-next v1 1/2] lib/scatterlist: Fix wrong update of orig_nents
On Wed, Jun 30, 2021 at 07:44:21PM +0200, Thierry Reding wrote: > On Wed, Jun 30, 2021 at 01:12:26PM +0200, Marek Szyprowski wrote: > > Hi Leon, > > > > On 29.06.2021 10:40, Leon Romanovsky wrote: > > > From: Maor Gottlieb > > > > > > orig_nents should represent the number of entries with pages, > > > but __sg_alloc_table_from_pages sets orig_nents as the number of > > > total entries in the table. This is wrong when the API is used for > > > dynamic allocation where not all the table entries are mapped with > > > pages. It wasn't observed until now, since RDMA umem who uses this > > > API in the dynamic form doesn't use orig_nents implicit or explicit > > > by the scatterlist APIs. > > > > > > Fix it by: > > > 1. Set orig_nents as number of entries with pages also in > > > __sg_alloc_table_from_pages. > > > 2. Add a new field total_nents to reflect the total number of entries > > > in the table. This is required for the release flow (sg_free_table). > > > This filed should be used internally only by scatterlist. > > > > > > Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation > > > of SG table from pages") > > > Signed-off-by: Maor Gottlieb > > > Signed-off-by: Leon Romanovsky > > > > This patch landed in linux-next 20210630 as commit a52724456928 > > ("lib/scatterlist: Fix wrong update of orig_nents"). It causes serious > > regression in DMA-IOMMU integration, which can be observed for example > > on ARM Juno board during boot: > > > > Unable to handle kernel paging request at virtual address 00376f42a6e40454 > > Mem abort info: > > ESR = 0x9604 > > EC = 0x25: DABT (current EL), IL = 32 bits > > SET = 0, FnV = 0 > > EA = 0, S1PTW = 0 > > FSC = 0x04: level 0 translation fault > > Data abort info: > > ISV = 0, ISS = 0x0004 > > CM = 0, WnR = 0 > > [00376f42a6e40454] address between user and kernel address ranges > > Internal error: Oops: 9604 [#1] PREEMPT SMP > > Modules linked in: > > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.13.0-next-20210630+ #3585 > > Hardware name: ARM Juno development board (r1) (DT) > > pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) > > pc : __sg_free_table+0x60/0xa0 > > lr : __sg_free_table+0x7c/0xa0 > > .. > > Call trace: > > __sg_free_table+0x60/0xa0 > > sg_free_table+0x1c/0x28 > > iommu_dma_alloc+0xc8/0x388 > > dma_alloc_attrs+0xcc/0xf0 > > dmam_alloc_attrs+0x68/0xb8 > > sil24_port_start+0x60/0xe0 > > ata_host_start.part.32+0xbc/0x208 > > ata_host_activate+0x64/0x150 > > sil24_init_one+0x1e8/0x268 > > local_pci_probe+0x3c/0xa0 > > pci_device_probe+0x128/0x1c8 > > really_probe+0x138/0x2d0 > > __driver_probe_device+0x78/0xd8 > > driver_probe_device+0x40/0x110 > > __driver_attach+0xcc/0x118 > > bus_for_each_dev+0x68/0xc8 > > driver_attach+0x20/0x28 > > bus_add_driver+0x168/0x1f8 > > driver_register+0x60/0x110 > > __pci_register_driver+0x5c/0x68 > > sil24_pci_driver_init+0x20/0x28 > > do_one_initcall+0x84/0x450 > > kernel_init_freeable+0x31c/0x38c > > kernel_init+0x20/0x120 > > ret_from_fork+0x10/0x18 > > Code: d37be885 6b01007f 5284 54a2 (f8656813) > > ---[ end trace 4ba4f0c9c48711a1 ]--- > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b > > > > It looks that some changes to the scatterlist structures are missing > > outside of the lib/scatterlist.c. > > > > For now I would suggest to revert this change. > > I see a very similar crash on Tegra during the HDA driver's probe. > > Leon, let me know if you need a tester for a replacement patch. With a great pleasure, I'll contact you offline when we prepare it. For now, this patch will be dropped. Thanks > > Thanks, > Thierry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH rdma-next v1 1/2] lib/scatterlist: Fix wrong update of orig_nents
On Wed, Jun 30, 2021 at 01:12:26PM +0200, Marek Szyprowski wrote: > Hi Leon, > > On 29.06.2021 10:40, Leon Romanovsky wrote: > > From: Maor Gottlieb > > > > orig_nents should represent the number of entries with pages, > > but __sg_alloc_table_from_pages sets orig_nents as the number of > > total entries in the table. This is wrong when the API is used for > > dynamic allocation where not all the table entries are mapped with > > pages. It wasn't observed until now, since RDMA umem who uses this > > API in the dynamic form doesn't use orig_nents implicit or explicit > > by the scatterlist APIs. > > > > Fix it by: > > 1. Set orig_nents as number of entries with pages also in > > __sg_alloc_table_from_pages. > > 2. Add a new field total_nents to reflect the total number of entries > > in the table. This is required for the release flow (sg_free_table). > > This filed should be used internally only by scatterlist. > > > > Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of > > SG table from pages") > > Signed-off-by: Maor Gottlieb > > Signed-off-by: Leon Romanovsky > > This patch landed in linux-next 20210630 as commit a52724456928 > ("lib/scatterlist: Fix wrong update of orig_nents"). It causes serious > regression in DMA-IOMMU integration, which can be observed for example > on ARM Juno board during boot: > > Unable to handle kernel paging request at virtual address 00376f42a6e40454 > Mem abort info: > ESR = 0x9604 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > FSC = 0x04: level 0 translation fault > Data abort info: > ISV = 0, ISS = 0x0004 > CM = 0, WnR = 0 > [00376f42a6e40454] address between user and kernel address ranges > Internal error: Oops: 9604 [#1] PREEMPT SMP > Modules linked in: > CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.13.0-next-20210630+ #3585 > Hardware name: ARM Juno development board (r1) (DT) > pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) > pc : __sg_free_table+0x60/0xa0 > lr : __sg_free_table+0x7c/0xa0 > .. > Call trace: > __sg_free_table+0x60/0xa0 > sg_free_table+0x1c/0x28 > iommu_dma_alloc+0xc8/0x388 > dma_alloc_attrs+0xcc/0xf0 > dmam_alloc_attrs+0x68/0xb8 > sil24_port_start+0x60/0xe0 > ata_host_start.part.32+0xbc/0x208 > ata_host_activate+0x64/0x150 > sil24_init_one+0x1e8/0x268 > local_pci_probe+0x3c/0xa0 > pci_device_probe+0x128/0x1c8 > really_probe+0x138/0x2d0 > __driver_probe_device+0x78/0xd8 > driver_probe_device+0x40/0x110 > __driver_attach+0xcc/0x118 > bus_for_each_dev+0x68/0xc8 > driver_attach+0x20/0x28 > bus_add_driver+0x168/0x1f8 > driver_register+0x60/0x110 > __pci_register_driver+0x5c/0x68 > sil24_pci_driver_init+0x20/0x28 > do_one_initcall+0x84/0x450 > kernel_init_freeable+0x31c/0x38c > kernel_init+0x20/0x120 > ret_from_fork+0x10/0x18 > Code: d37be885 6b01007f 5284 54a2 (f8656813) > ---[ end trace 4ba4f0c9c48711a1 ]--- > Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b > > It looks that some changes to the scatterlist structures are missing > outside of the lib/scatterlist.c. > > For now I would suggest to revert this change. I see a very similar crash on Tegra during the HDA driver's probe. Leon, let me know if you need a tester for a replacement patch. Thanks, Thierry signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
Hi Will and Claire, On Wed, Jun 30, 2021 at 12:43:48PM +0100, Will Deacon wrote: > On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote: > > On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor wrote: > > > > > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote: > > > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > > > > use it to determine whether to bounce the data or not. This will be > > > > useful later to allow for different pools. > > > > > > > > Signed-off-by: Claire Chang > > > > Reviewed-by: Christoph Hellwig > > > > Tested-by: Stefano Stabellini > > > > Tested-by: Will Deacon > > > > Acked-by: Stefano Stabellini > > > > > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce > > > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to > > > get to an X session consistently (although not every single time), > > > presumably due to a crash in the AMDGPU driver that I see in dmesg. > > > > > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy > > > to provide any further information, debug, or test patches as necessary. > > > > Are you using swiotlb=force? or the swiotlb_map is called because of > > !dma_capable? > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93) > > The command line is in the dmesg: > > | Kernel command line: initrd=\amd-ucode.img > initrd=\initramfs-linux-next-llvm.img > root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp > irqpoll > > but I worry that this looks _very_ similar to the issue reported by Qian > Cai which we thought we had fixed. Nathan -- is the failure deterministic? Yes, for the most part. It does not happen every single boot so when I was bisecting, I did a series of seven boots and only considered the revision good when all seven of them made it to LightDM's greeter. My results that I notated show most bad revisions failed anywhere from four to six times. > > `BUG: unable to handle page fault for address: 003a8290` and > > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory > > (maybe dev->dma_io_tlb_mem) was corrupted? > > The dev->dma_io_tlb_mem should be set here > > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528) > > through device_initialize. > > I'm less sure about this. 'dma_io_tlb_mem' should be pointing at > 'io_tlb_default_mem', which is a page-aligned allocation from memblock. > The spinlock is at offset 0x24 in that structure, and looking at the > register dump from the crash: > > Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006 > Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX: > RCX: 8900572ad580 > Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c > RDI: 1d17 > Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c > R09: > Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 > R12: 0212 > Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 > R15: 0020 > Jun 29 18:28:42 hp-4300G kernel: FS: 7f1f8898ea40() > GS:89005728() knlGS: > Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: ES: CR0: > 80050033 > Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d > CR4: 00350ee0 > Jun 29 18:28:42 hp-4300G kernel: Call Trace: > Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50 > Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0 > > Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and > RDX pointing at the spinlock. Yet RAX is holding junk :/ > > I agree that enabling KASAN would be a good idea, but I also think we > probably need to get some more information out of swiotlb_tbl_map_single() > to see see what exactly is going wrong in there. I can certainly enable KASAN and if there is any debug print I can add or dump anything, let me know! Cheers, Nathan ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/3] iommu: dart: Add DART iommu driver
Looks really good! Just a few minor comments. With them addressed, Reviewed-by: Alyssa Rosenzweig > + Say Y here if you are using an Apple SoC with a DART IOMMU. Nit: Do we need to spell out "with a DART IOMMU"? Don't all the apple socs need DART? > +/* > + * This structure is used to identify a single stream attached to a domain. > + * It's used as a list inside that domain to be able to attach multiple > + * streams to a single domain. Since multiple devices can use a single stream > + * it additionally keeps track of how many devices are represented by this > + * stream. Once that number reaches zero it is detached from the IOMMU domain > + * and all translations from this stream are disabled. > + * > + * @dart: DART instance to which this stream belongs > + * @sid: stream id within the DART instance > + * @num_devices: count of devices attached to this stream > + * @stream_head: list head for the next stream > + */ > +struct apple_dart_stream { > + struct apple_dart *dart; > + u32 sid; > + > + u32 num_devices; > + > + struct list_head stream_head; > +}; It wasn't obvious to me why we can get away without reference counting. Looking ahead it looks like we assert locks in each case. Maybe add that to the comment? ``` > +static void apple_dart_hw_set_ttbr(struct apple_dart *dart, u16 sid, u16 idx, > +phys_addr_t paddr) > +{ > + writel(DART_TTBR_VALID | (paddr >> DART_TTBR_SHIFT), > +dart->regs + DART_TTBR(sid, idx)); > +} ``` Should we be checking alignment here? Something like BUG_ON(paddr & ((1 << DART_TTBR_SHIFT) - 1)); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 1/3] iommu: io-pgtable: add DART pagetable format
Reviewed-by: Alyssa Rosenzweig ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/3] dt-bindings: iommu: add DART iommu bindings
Reviewed-by: Alyssa Rosenzweig ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path
On 2021-06-30 14:48, Marek Szyprowski wrote: On 30.06.2021 14:59, Will Deacon wrote: On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote: On 08.06.2021 18:45, Amey Narkhede wrote: If device registration fails, remove sysfs attribute and if setting bus callbacks fails, unregister the device and cleanup the sysfs attribute. Signed-off-by: Amey Narkhede This patch landed in linux-next some time ago as commit 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error path"). After bisecting and some manual searching I finally found that it is responsible for breaking s2idle on DragonBoard 410c. Here is the log (captured with no_console_suspend): # time rtcwake -s10 -mmem rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan 1 00:02:13 1970 PM: suspend entry (s2idle) Filesystems sync: 0.002 seconds Freezing user space processes ... (elapsed 0.006 seconds) done. OOM killer disabled. Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done. Unable to handle kernel NULL pointer dereference at virtual address 0070 Mem abort info: ESR = 0x9606 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x0006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000 [0070] pgd=080085c3c003, p4d=080085c3c003, pud=080088dcf003, pmd= Internal error: Oops: 9606 [#1] PREEMPT SMP Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511 snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2 snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci qnoc_msm8916 videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592 Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) pc : msm_runtime_suspend+0x1c/0x60 [msm] lr : msm_pm_suspend+0x18/0x38 [msm] ... Call trace: msm_runtime_suspend+0x1c/0x60 [msm] msm_pm_suspend+0x18/0x38 [msm] dpm_run_callback+0x84/0x378 I wonder if we're missing a pm_runtime_disable() call on the failure path? i.e. something like the diff below... I've checked and it doesn't fix anything. What's happened previously? Has an IOMMU actually failed to probe, or is this a fiddly "code movement unveils latent bug elsewhere" kind of thing? There doesn't look to be much capable of going wrong in msm_runtime_suspend() itself, so is the DRM driver also in a broken half-probed state where it's left its pm_runtime_ops behind without its drvdata being valid? Robin. Will --->8 diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c index 25ed444ff94d..ce8f354755d0 100644 --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c @@ -836,14 +836,14 @@ static int qcom_iommu_device_probe(struct platform_device *pdev) ret = devm_of_platform_populate(dev); if (ret) { dev_err(dev, "Failed to populate iommu contexts\n"); - return ret; + goto err_pm_disable; } ret = iommu_device_sysfs_add(&qcom_iommu->iommu, dev, NULL, dev_name(dev)); if (ret) { dev_err(dev, "Failed to register iommu in sysfs\n"); - return ret; + goto err_pm_disable; } ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev); @@ -869,6 +869,9 @@ static int qcom_iommu_device_probe(struct platform_device *pdev) err_sysfs_remove: iommu_device_sysfs_remove(&qcom_iommu->iommu); + +err_pm_disable: + pm_runtime_disable(dev); return ret; } Best regards ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3] swiotlb: fix implicit debugfs_create_dir declaration
On 2021-06-30 05:06, Claire Chang wrote: Factor out the debugfs bits from rmem_swiotlb_device_init() into a separate rmem_swiotlb_debugfs_init(). Fixes: 461021875c50 ("swiotlb: Add restricted DMA pool initialization") Reported-by: kernel test robot Signed-off-by: Claire Chang --- kernel/dma/swiotlb.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 0ffbaae9fba2..355dc98d381c 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -709,6 +709,16 @@ static int __init swiotlb_create_default_debugfs(void) late_initcall(swiotlb_create_default_debugfs); +static void rmem_swiotlb_debugfs_init(const char *name, struct io_tlb_mem *mem) Nit: maybe just pass in the reserved_mem, then dereference rmem->name and rmem->priv inside here? +{ + mem->debugfs = debugfs_create_dir(name, debugfs_dir); + swiotlb_create_debugfs_files(mem); +} +#else +__maybe_unused static void rmem_swiotlb_debugfs_init(const char *name, +struct io_tlb_mem *mem) +{ +} #endif #ifdef CONFIG_DMA_RESTRICTED_POOL @@ -766,11 +776,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem, rmem->priv = mem; - if (IS_ENABLED(CONFIG_DEBUG_FS)) { - mem->debugfs = - debugfs_create_dir(rmem->name, debugfs_dir); - swiotlb_create_debugfs_files(mem); - } + if (IS_ENABLED(CONFIG_DEBUG_FS)) + rmem_swiotlb_debugfs_init(rmem->name, mem); Just make the call unconditional (and drop the __maybe_unused annotation above) - the stub already does nothing for the DEBUG_FS=n case, so the IS_ENABLED() is pointless. Robin. } dev->dma_io_tlb_mem = mem; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path
On 30.06.2021 14:59, Will Deacon wrote: > On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote: >> On 08.06.2021 18:45, Amey Narkhede wrote: >>> If device registration fails, remove sysfs attribute >>> and if setting bus callbacks fails, unregister the device >>> and cleanup the sysfs attribute. >>> >>> Signed-off-by: Amey Narkhede >> This patch landed in linux-next some time ago as commit 249c9dc6aa0d >> ("iommu/arm: Cleanup resources in case of probe error path"). After >> bisecting and some manual searching I finally found that it is >> responsible for breaking s2idle on DragonBoard 410c. Here is the log >> (captured with no_console_suspend): >> >> # time rtcwake -s10 -mmem >> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan 1 00:02:13 1970 >> PM: suspend entry (s2idle) >> Filesystems sync: 0.002 seconds >> Freezing user space processes ... (elapsed 0.006 seconds) done. >> OOM killer disabled. >> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done. >> Unable to handle kernel NULL pointer dereference at virtual address >> 0070 >> Mem abort info: >> ESR = 0x9606 >> EC = 0x25: DABT (current EL), IL = 32 bits >> SET = 0, FnV = 0 >> EA = 0, S1PTW = 0 >> FSC = 0x06: level 2 translation fault >> Data abort info: >> ISV = 0, ISS = 0x0006 >> CM = 0, WnR = 0 >> user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000 >> [0070] pgd=080085c3c003, p4d=080085c3c003, >> pud=080088dcf003, pmd= >> Internal error: Oops: 9606 [#1] PREEMPT SMP >> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b >> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511 >> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon >> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm >> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital >> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem >> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2 >> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci qnoc_msm8916 >> videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem >> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592 >> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) >> pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) >> pc : msm_runtime_suspend+0x1c/0x60 [msm] >> lr : msm_pm_suspend+0x18/0x38 [msm] >> ... >> Call trace: >> msm_runtime_suspend+0x1c/0x60 [msm] >> msm_pm_suspend+0x18/0x38 [msm] >> dpm_run_callback+0x84/0x378 > I wonder if we're missing a pm_runtime_disable() call on the failure path? > i.e. something like the diff below... I've checked and it doesn't fix anything. > Will > > --->8 > > diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c > b/drivers/iommu/arm/arm-smmu/qcom_iommu.c > index 25ed444ff94d..ce8f354755d0 100644 > --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c > +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c > @@ -836,14 +836,14 @@ static int qcom_iommu_device_probe(struct > platform_device *pdev) > ret = devm_of_platform_populate(dev); > if (ret) { > dev_err(dev, "Failed to populate iommu contexts\n"); > - return ret; > + goto err_pm_disable; > } > > ret = iommu_device_sysfs_add(&qcom_iommu->iommu, dev, NULL, > dev_name(dev)); > if (ret) { > dev_err(dev, "Failed to register iommu in sysfs\n"); > - return ret; > + goto err_pm_disable; > } > > ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, > dev); > @@ -869,6 +869,9 @@ static int qcom_iommu_device_probe(struct platform_device > *pdev) > > err_sysfs_remove: > iommu_device_sysfs_remove(&qcom_iommu->iommu); > + > +err_pm_disable: > + pm_runtime_disable(dev); > return ret; > } > Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path
On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote: > On 08.06.2021 18:45, Amey Narkhede wrote: > > If device registration fails, remove sysfs attribute > > and if setting bus callbacks fails, unregister the device > > and cleanup the sysfs attribute. > > > > Signed-off-by: Amey Narkhede > > This patch landed in linux-next some time ago as commit 249c9dc6aa0d > ("iommu/arm: Cleanup resources in case of probe error path"). After > bisecting and some manual searching I finally found that it is > responsible for breaking s2idle on DragonBoard 410c. Here is the log > (captured with no_console_suspend): > > # time rtcwake -s10 -mmem > rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan 1 00:02:13 1970 > PM: suspend entry (s2idle) > Filesystems sync: 0.002 seconds > Freezing user space processes ... (elapsed 0.006 seconds) done. > OOM killer disabled. > Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done. > Unable to handle kernel NULL pointer dereference at virtual address > 0070 > Mem abort info: > ESR = 0x9606 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > FSC = 0x06: level 2 translation fault > Data abort info: > ISV = 0, ISS = 0x0006 > CM = 0, WnR = 0 > user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000 > [0070] pgd=080085c3c003, p4d=080085c3c003, > pud=080088dcf003, pmd= > Internal error: Oops: 9606 [#1] PREEMPT SMP > Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b > venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511 > snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon > qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm > venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital > videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem > snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2 > snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci qnoc_msm8916 > videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem > CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592 > Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) > pc : msm_runtime_suspend+0x1c/0x60 [msm] > lr : msm_pm_suspend+0x18/0x38 [msm] > ... > Call trace: > msm_runtime_suspend+0x1c/0x60 [msm] > msm_pm_suspend+0x18/0x38 [msm] > dpm_run_callback+0x84/0x378 I wonder if we're missing a pm_runtime_disable() call on the failure path? i.e. something like the diff below... Will --->8 diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c index 25ed444ff94d..ce8f354755d0 100644 --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c @@ -836,14 +836,14 @@ static int qcom_iommu_device_probe(struct platform_device *pdev) ret = devm_of_platform_populate(dev); if (ret) { dev_err(dev, "Failed to populate iommu contexts\n"); - return ret; + goto err_pm_disable; } ret = iommu_device_sysfs_add(&qcom_iommu->iommu, dev, NULL, dev_name(dev)); if (ret) { dev_err(dev, "Failed to register iommu in sysfs\n"); - return ret; + goto err_pm_disable; } ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev); @@ -869,6 +869,9 @@ static int qcom_iommu_device_probe(struct platform_device *pdev) err_sysfs_remove: iommu_device_sysfs_remove(&qcom_iommu->iommu); + +err_pm_disable: + pm_runtime_disable(dev); return ret; } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path
Hi, On 08.06.2021 18:45, Amey Narkhede wrote: > If device registration fails, remove sysfs attribute > and if setting bus callbacks fails, unregister the device > and cleanup the sysfs attribute. > > Signed-off-by: Amey Narkhede This patch landed in linux-next some time ago as commit 249c9dc6aa0d ("iommu/arm: Cleanup resources in case of probe error path"). After bisecting and some manual searching I finally found that it is responsible for breaking s2idle on DragonBoard 410c. Here is the log (captured with no_console_suspend): # time rtcwake -s10 -mmem rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan 1 00:02:13 1970 PM: suspend entry (s2idle) Filesystems sync: 0.002 seconds Freezing user space processes ... (elapsed 0.006 seconds) done. OOM killer disabled. Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done. Unable to handle kernel NULL pointer dereference at virtual address 0070 Mem abort info: ESR = 0x9606 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x0006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=8ad08000 [0070] pgd=080085c3c003, p4d=080085c3c003, pud=080088dcf003, pmd= Internal error: Oops: 9606 [#1] PREEMPT SMP Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511 snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2 snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci qnoc_msm8916 videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592 Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) pc : msm_runtime_suspend+0x1c/0x60 [msm] lr : msm_pm_suspend+0x18/0x38 [msm] ... Call trace: msm_runtime_suspend+0x1c/0x60 [msm] msm_pm_suspend+0x18/0x38 [msm] dpm_run_callback+0x84/0x378 __device_suspend+0x118/0x680 dpm_suspend+0x150/0x4f0 dpm_suspend_start+0x98/0xa0 suspend_devices_and_enter+0xfc/0xaf0 pm_suspend+0x2b0/0x3d0 state_store+0x84/0x108 kobj_attr_store+0x14/0x28 sysfs_kf_write+0x60/0x70 kernfs_fop_write_iter+0x124/0x1a8 new_sync_write+0xe8/0x1b0 vfs_write+0x1e8/0x450 ksys_write+0x64/0xf0 __arm64_sys_write+0x14/0x20 invoke_syscall+0x40/0xf8 el0_svc_common+0x60/0x100 do_el0_svc_compat+0x1c/0x48 el0_svc_compat+0x20/0x30 el0t_32_sync_handler+0xec/0x140 el0t_32_sync+0x168/0x16c Code: 910003fd f9000bf3 f9403c02 52800040 (f9403842) ---[ end trace 215b72fcd7026947 ]--- Reverting it on top of linux-next fixes s2idle oepration on that board. > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 -- > drivers/iommu/arm/arm-smmu/arm-smmu.c | 15 --- > drivers/iommu/arm/arm-smmu/qcom_iommu.c | 13 +++-- > 3 files changed, 35 insertions(+), 7 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index 54b2f27b81d4..de2499754025 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -3669,10 +3669,20 @@ static int arm_smmu_device_probe(struct > platform_device *pdev) > ret = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev); > if (ret) { > dev_err(dev, "Failed to register iommu\n"); > - return ret; > + goto err_sysfs_remove; > } > > - return arm_smmu_set_bus_ops(&arm_smmu_ops); > + ret = arm_smmu_set_bus_ops(&arm_smmu_ops); > + if (ret) > + goto err_unregister_device; > + > + return 0; > + > +err_unregister_device: > + iommu_device_unregister(&smmu->iommu); > +err_sysfs_remove: > + iommu_device_sysfs_remove(&smmu->iommu); > + return ret; > } > > static int arm_smmu_device_remove(struct platform_device *pdev) > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c > b/drivers/iommu/arm/arm-smmu/arm-smmu.c > index 6f72c4d208ca..88a3023676ce 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > @@ -2164,7 +2164,7 @@ static int arm_smmu_device_probe(struct platform_device > *pdev) > err = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev); > if (err) { > dev_err(dev, "Failed to register iommu\n"); > - return err; > + goto err_sysfs_remove; > } > > platform_set_drvdata(pdev, smmu); > @@ -2187,10 +2187,19 @@ static int arm_smmu_device_probe(struct > pla
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Wed, Jun 30, 2021 at 05:17:27PM +0800, Claire Chang wrote: > On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor wrote: > > > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote: > > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > > > use it to determine whether to bounce the data or not. This will be > > > useful later to allow for different pools. > > > > > > Signed-off-by: Claire Chang > > > Reviewed-by: Christoph Hellwig > > > Tested-by: Stefano Stabellini > > > Tested-by: Will Deacon > > > Acked-by: Stefano Stabellini > > > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce > > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to > > get to an X session consistently (although not every single time), > > presumably due to a crash in the AMDGPU driver that I see in dmesg. > > > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy > > to provide any further information, debug, or test patches as necessary. > > Are you using swiotlb=force? or the swiotlb_map is called because of > !dma_capable? > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93) The command line is in the dmesg: | Kernel command line: initrd=\amd-ucode.img initrd=\initramfs-linux-next-llvm.img root=PARTUUID=8680aa0c-cf09-4a69-8cf3-970478040ee7 rw intel_pstate=no_hwp irqpoll but I worry that this looks _very_ similar to the issue reported by Qian Cai which we thought we had fixed. Nathan -- is the failure deterministic? > `BUG: unable to handle page fault for address: 003a8290` and > the fact it crashed at `_raw_spin_lock_irqsave` look like the memory > (maybe dev->dma_io_tlb_mem) was corrupted? > The dev->dma_io_tlb_mem should be set here > (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528) > through device_initialize. I'm less sure about this. 'dma_io_tlb_mem' should be pointing at 'io_tlb_default_mem', which is a page-aligned allocation from memblock. The spinlock is at offset 0x24 in that structure, and looking at the register dump from the crash: Jun 29 18:28:42 hp-4300G kernel: RSP: 0018:adb4013db9e8 EFLAGS: 00010006 Jun 29 18:28:42 hp-4300G kernel: RAX: 003a8290 RBX: RCX: 8900572ad580 Jun 29 18:28:42 hp-4300G kernel: RDX: 89005653f024 RSI: 000c RDI: 1d17 Jun 29 18:28:42 hp-4300G kernel: RBP: 0a20d000 R08: 000c R09: Jun 29 18:28:42 hp-4300G kernel: R10: 0a20d000 R11: 89005653f000 R12: 0212 Jun 29 18:28:42 hp-4300G kernel: R13: 1000 R14: 0002 R15: 0020 Jun 29 18:28:42 hp-4300G kernel: FS: 7f1f8898ea40() GS:89005728() knlGS: Jun 29 18:28:42 hp-4300G kernel: CS: 0010 DS: ES: CR0: 80050033 Jun 29 18:28:42 hp-4300G kernel: CR2: 003a8290 CR3: 0001020d CR4: 00350ee0 Jun 29 18:28:42 hp-4300G kernel: Call Trace: Jun 29 18:28:42 hp-4300G kernel: _raw_spin_lock_irqsave+0x39/0x50 Jun 29 18:28:42 hp-4300G kernel: swiotlb_tbl_map_single+0x12b/0x4c0 Then that correlates with R11 holding the 'dma_io_tlb_mem' pointer and RDX pointing at the spinlock. Yet RAX is holding junk :/ I agree that enabling KASAN would be a good idea, but I also think we probably need to get some more information out of swiotlb_tbl_map_single() to see see what exactly is going wrong in there. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH rdma-next v1 1/2] lib/scatterlist: Fix wrong update of orig_nents
On Wed, Jun 30, 2021 at 01:12:26PM +0200, Marek Szyprowski wrote: > Hi Leon, > > On 29.06.2021 10:40, Leon Romanovsky wrote: > > From: Maor Gottlieb > > > > orig_nents should represent the number of entries with pages, > > but __sg_alloc_table_from_pages sets orig_nents as the number of > > total entries in the table. This is wrong when the API is used for > > dynamic allocation where not all the table entries are mapped with > > pages. It wasn't observed until now, since RDMA umem who uses this > > API in the dynamic form doesn't use orig_nents implicit or explicit > > by the scatterlist APIs. > > > > Fix it by: > > 1. Set orig_nents as number of entries with pages also in > > __sg_alloc_table_from_pages. > > 2. Add a new field total_nents to reflect the total number of entries > > in the table. This is required for the release flow (sg_free_table). > > This filed should be used internally only by scatterlist. > > > > Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of > > SG table from pages") > > Signed-off-by: Maor Gottlieb > > Signed-off-by: Leon Romanovsky <...> > For now I would suggest to revert this change. Thanks for the report, we will drop this patch. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH rdma-next v1 1/2] lib/scatterlist: Fix wrong update of orig_nents
Hi Leon, On 29.06.2021 10:40, Leon Romanovsky wrote: > From: Maor Gottlieb > > orig_nents should represent the number of entries with pages, > but __sg_alloc_table_from_pages sets orig_nents as the number of > total entries in the table. This is wrong when the API is used for > dynamic allocation where not all the table entries are mapped with > pages. It wasn't observed until now, since RDMA umem who uses this > API in the dynamic form doesn't use orig_nents implicit or explicit > by the scatterlist APIs. > > Fix it by: > 1. Set orig_nents as number of entries with pages also in > __sg_alloc_table_from_pages. > 2. Add a new field total_nents to reflect the total number of entries > in the table. This is required for the release flow (sg_free_table). > This filed should be used internally only by scatterlist. > > Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of > SG table from pages") > Signed-off-by: Maor Gottlieb > Signed-off-by: Leon Romanovsky This patch landed in linux-next 20210630 as commit a52724456928 ("lib/scatterlist: Fix wrong update of orig_nents"). It causes serious regression in DMA-IOMMU integration, which can be observed for example on ARM Juno board during boot: Unable to handle kernel paging request at virtual address 00376f42a6e40454 Mem abort info: ESR = 0x9604 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x0004 CM = 0, WnR = 0 [00376f42a6e40454] address between user and kernel address ranges Internal error: Oops: 9604 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.13.0-next-20210630+ #3585 Hardware name: ARM Juno development board (r1) (DT) pstate: 8005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) pc : __sg_free_table+0x60/0xa0 lr : __sg_free_table+0x7c/0xa0 .. Call trace: __sg_free_table+0x60/0xa0 sg_free_table+0x1c/0x28 iommu_dma_alloc+0xc8/0x388 dma_alloc_attrs+0xcc/0xf0 dmam_alloc_attrs+0x68/0xb8 sil24_port_start+0x60/0xe0 ata_host_start.part.32+0xbc/0x208 ata_host_activate+0x64/0x150 sil24_init_one+0x1e8/0x268 local_pci_probe+0x3c/0xa0 pci_device_probe+0x128/0x1c8 really_probe+0x138/0x2d0 __driver_probe_device+0x78/0xd8 driver_probe_device+0x40/0x110 __driver_attach+0xcc/0x118 bus_for_each_dev+0x68/0xc8 driver_attach+0x20/0x28 bus_add_driver+0x168/0x1f8 driver_register+0x60/0x110 __pci_register_driver+0x5c/0x68 sil24_pci_driver_init+0x20/0x28 do_one_initcall+0x84/0x450 kernel_init_freeable+0x31c/0x38c kernel_init+0x20/0x120 ret_from_fork+0x10/0x18 Code: d37be885 6b01007f 5284 54a2 (f8656813) ---[ end trace 4ba4f0c9c48711a1 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b It looks that some changes to the scatterlist structures are missing outside of the lib/scatterlist.c. For now I would suggest to revert this change. > --- > include/linux/scatterlist.h | 8 ++-- > lib/scatterlist.c | 32 > 2 files changed, 14 insertions(+), 26 deletions(-) > > diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h > index 6f70572b2938..1c889141eb91 100644 > --- a/include/linux/scatterlist.h > +++ b/include/linux/scatterlist.h > @@ -35,8 +35,12 @@ struct scatterlist { > > struct sg_table { > struct scatterlist *sgl;/* the list */ > - unsigned int nents; /* number of mapped entries */ > - unsigned int orig_nents;/* original size of list */ > + unsigned int nents; /* number of DMA mapped entries */ > + unsigned int orig_nents;/* number of CPU mapped entries */ > + /* The fields below should be used internally only by > + * scatterlist implementation. > + */ > + unsigned int total_nents; /* number of total entries in the table > */ > }; > > /* > diff --git a/lib/scatterlist.c b/lib/scatterlist.c > index a59778946404..6db70a1e7dd0 100644 > --- a/lib/scatterlist.c > +++ b/lib/scatterlist.c > @@ -192,33 +192,26 @@ static void sg_kfree(struct scatterlist *sg, unsigned > int nents) > void __sg_free_table(struct sg_table *table, unsigned int max_ents, >unsigned int nents_first_chunk, sg_free_fn *free_fn) > { > - struct scatterlist *sgl, *next; > + struct scatterlist *sgl, *next = NULL; > unsigned curr_max_ents = nents_first_chunk ?: max_ents; > > if (unlikely(!table->sgl)) > return; > > sgl = table->sgl; > - while (table->orig_nents) { > - unsigned int alloc_size = table->orig_nents; > - unsigned int
Re: [PATCH] iommu/arm-smmu-v3: Add default domain quirk for Arm Fast Models
On 2021-06-30 09:56, Marc Zyngier wrote: On Tue, 29 Jun 2021 18:34:40 +0100, Will Deacon wrote: On Fri, Jun 18, 2021 at 05:24:49PM +0100, Robin Murphy wrote: Arm Fast Models are still implementing legacy virtio-pci devices behind the SMMU, which continue to be problematic as "real hardware" devices (from the point of view of the simulated system) without the mitigating VIRTIO_F_ACCESS_PLATFORM feature. Since we now have the ability to force passthrough on a device-specific basis, let's use it to work around this particular oddity so that people who just want to boot Linux on a model don't have to fiddle around with things to avoid the SMMU getting in the way by default (and I don't have to keep telling them which things...) Signed-off-by: Robin Murphy --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++ 1 file changed, 15 insertions(+) Any chance of getting the fastmodels updated instead? It feels like it has to happen *eventually*, and then there would be no need for this bodge. That'd be ideal. What are the chances of that happening before the Sun turns into a black hole? We'll try making some noise again internally and see where that goes, but given the progress over the last 4 years I'd be inclined to posit a new theory of the universe eventually ending in one giant event 0x10. diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 54b2f27b81d4..13cf16e8f45b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2649,6 +2649,20 @@ static int arm_smmu_dev_disable_feature(struct device *dev, } } +static int arm_smmu_def_domain_type(struct device *dev) +{ + if (dev_is_pci(dev)) { + struct pci_dev *pdev = to_pci_dev(dev); + + /* Legacy virtio-block devices on Arm Fast Models */ + if (pdev->vendor == 0x1af4 && pdev->device == 0x1001 && + pdev->subsystem_vendor == 0x00ff && pdev->subsystem_device == 0x0002) + return IOMMU_DOMAIN_IDENTITY; + } + + return 0; +} Could this be expressed as a PCI quirk instead? It would at least keep the ID matching out of the SMMU driver... I don't think so - getting the information from the PCI layer to the IOMMU layer at the right place and time to influence the default domain choice would seemingly need some uglier and even less popular infrastructure adding. It's something that only the IOMMU layer has any need to be aware of, and the def_domain_type callback essentially exists for this kind of special-casing - compare intel-iommu's device_def_domain_type() for example. If we accept a workaround at all, I do believe this is the least-worst option. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
Hi Will, On 2021-03-25 23:03, Will Deacon wrote: On Tue, Mar 09, 2021 at 12:10:44PM +0530, Sai Prakash Ranjan wrote: On 2021-02-05 17:38, Sai Prakash Ranjan wrote: > On 2021-02-04 03:16, Will Deacon wrote: > > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote: > > > On 2021-02-01 23:50, Jordan Crouse wrote: > > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote: > > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon wrote: > > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote: > > > > > > > On 2021-01-29 14:35, Will Deacon wrote: > > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote: > > > > > > > > > +#define IOMMU_LLC(1 << 6) > > > > > > > > > > > > > > > > On reflection, I'm a bit worried about exposing this because I think it > > > > > > > > will > > > > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a > > > > > > > > MAIR > > > > > > > > set up for this memory type). Now, we also have that issue for the PTW, > > > > > > > > but > > > > > > > > since we always use cache maintenance (i.e. the streaming API) for > > > > > > > > publishing the page-tables to a non-coheren walker, it works out. > > > > > > > > However, > > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent > > > > > > > > allocation, then they're potentially in for a nasty surprise due to the > > > > > > > > mismatched outer-cacheability attributes. > > > > > > > > > > > > > > > > > > > > > > Can't we add the syscached memory type similar to what is done on android? > > > > > > > > > > > > Maybe. How does the GPU driver map these things on the CPU side? > > > > > > > > > > Currently we use writecombine mappings for everything, although there > > > > > are some cases that we'd like to use cached (but have not merged > > > > > patches that would give userspace a way to flush/invalidate) > > > > > > > > > > > > > LLC/system cache doesn't have a relationship with the CPU cache. Its > > > > just a > > > > little accelerator that sits on the connection from the GPU to DDR and > > > > caches > > > > accesses. The hint that Sai is suggesting is used to mark the buffers as > > > > 'no-write-allocate' to prevent GPU write operations from being cached in > > > > the LLC > > > > which a) isn't interesting and b) takes up cache space for read > > > > operations. > > > > > > > > Its easiest to think of the LLC as a bonus accelerator that has no cost > > > > for > > > > us to use outside of the unfortunate per buffer hint. > > > > > > > > We do have to worry about the CPU cache w.r.t I/O coherency (which is a > > > > different hint) and in that case we have all of concerns that Will > > > > identified. > > > > > > > > > > For mismatched outer cacheability attributes which Will > > > mentioned, I was > > > referring to [1] in android kernel. > > > > I've lost track of the conversation here :/ > > > > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also > > mapped > > into the CPU and with what attributes? Rob said "writecombine for > > everything" -- does that mean ioremap_wc() / MEMREMAP_WC? > > > > Rob answered this. > > > Finally, we need to be careful when we use the word "hint" as > > "allocation > > hint" has a specific meaning in the architecture, and if we only > > mismatch on > > those then we're actually ok. But I think IOMMU_LLC is more than > > just a > > hint, since it actually drives eviction policy (i.e. it enables > > writeback). > > > > Sorry for the pedantry, but I just want to make sure we're all talking > > about the same things! > > > > Sorry for the confusion which probably was caused by my mentioning of > android, NWA(no write allocate) is an allocation hint which we can > ignore > for now as it is not introduced yet in upstream. > Any chance of taking this forward? We do not want to miss out on small fps gain when the product gets released. Do we have a solution to the mismatched virtual alias? Sorry for the long delay on this thread. For mismatched virtual alias question, wasn't this already discussed in stretch when initial support for system cache [1] (which was reverted by you) was added? Excerpt from there, "As seen in downstream kernels there are few non-coherent devices which would not want to allocate in system cache, and therefore would want Inner/Outer non-cached memory. So, we may want to either override the attributes per-device, or as you suggested we may want to introduce another memory type 'sys-cached' that can be added with its separate infra." As for DMA API usage, we do not have any upstream users (video will be one if they decide to upstream that). [1] https://patchwork.kernel.org/project/linux-arm-msm/patch/20180615105329.26800-1-vivek.gau...@codeaurora.org/ Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: Re: [PATCH v8 10/10] Documentation: Add documentation for VDUSE
On Tue, Jun 29, 2021 at 01:43:11PM +0800, Yongji Xie wrote: > On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi wrote: > > On Tue, Jun 15, 2021 at 10:13:31PM +0800, Xie Yongji wrote: > > > + static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len) > > > + { > > > + int fd; > > > + void *addr; > > > + size_t size; > > > + struct vduse_iotlb_entry entry; > > > + > > > + entry.start = iova; > > > + entry.last = iova + 1; > > > > Why +1? > > > > I expected the request to include *len so that VDUSE can create a bounce > > buffer for the full iova range, if necessary. > > > > The function is used to translate iova to va. And the *len is not > specified by the caller. Instead, it's used to tell the caller the > length of the contiguous iova region from the specified iova. And the > ioctl VDUSE_IOTLB_GET_FD will get the file descriptor to the first > overlapped iova region. So using iova + 1 should be enough here. Does the entry.last field have any purpose with VDUSE_IOTLB_GET_FD? I wonder why userspace needs to assign a value at all if it's always +1. > > > > + fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry); > > > + if (fd < 0) > > > + return NULL; > > > + > > > + size = entry.last - entry.start + 1; > > > + *len = entry.last - iova + 1; > > > + addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED, > > > + fd, entry.offset); > > > + close(fd); > > > + if (addr == MAP_FAILED) > > > + return NULL; > > > + > > > + /* do something to cache this iova region */ > > > > How is userspace expected to manage iotlb mmaps? When should munmap(2) > > be called? > > > > The simple way is using a list to store the iotlb mappings. And we > should call the munmap(2) for the old mappings when VDUSE_UPDATE_IOTLB > or VDUSE_STOP_DATAPLANE message is received. Thanks for explaining. It would be helpful to have a description of IOTLB operation in this document. > > Should userspace expect VDUSE_IOTLB_GET_FD to return a full chunk of > > guest RAM (e.g. multiple gigabytes) that can be cached permanently or > > will it return just enough pages to cover [start, last)? > > > > It should return one iotlb mapping that covers [start, last). In > vhost-vdpa cases, it might be a full chunk of guest RAM. In > virtio-vdpa cases, it might be the whole bounce buffer or one coherent > mapping (produced by dma_alloc_coherent()). Great, thanks. Adding something about this to the documentation would help others implementing VDUSE devices or libraries. > > > + > > > + return addr + iova - entry.start; > > > + } > > > + > > > +- VDUSE_DEV_GET_FEATURES: Get the negotiated features > > > > Are these VIRTIO feature bits? Please explain how feature negotiation > > works. There must be a way for userspace to report the device's > > supported feature bits to the kernel. > > > > Yes, these are VIRTIO feature bits. Userspace will specify the > device's supported feature bits when creating a new VDUSE device with > ioctl(VDUSE_CREATE_DEV). Can the VDUSE device influence feature bit negotiation? For example, if the VDUSE virtio-blk device does not implement discard/write-zeroes, how does QEMU or the guest find out about this? > > > +- VDUSE_DEV_UPDATE_CONFIG: Update the configuration space and inject a > > > config interrupt > > > > Does this mean the contents of the configuration space are cached by > > VDUSE? > > Yes, but the kernel will also store the same contents. > > > The downside is that the userspace code cannot generate the > > contents on demand. Most devices doin't need to generate the contents > > on demand, so I think this is okay but I had expected a different > > interface: > > > > kernel->userspace VDUSE_DEV_GET_CONFIG > > userspace->kernel VDUSE_DEV_INJECT_CONFIG_IRQ > > > > The problem is how to handle the failure of VDUSE_DEV_GET_CONFIG. We > will need lots of modification of virtio codes to support that. So to > make it simple, we choose this way: > > userspace -> kernel VDUSE_DEV_SET_CONFIG > userspace -> kernel VDUSE_DEV_INJECT_CONFIG_IRQ > > > I think you can leave it the way it is, but I wanted to mention this in > > case someone thinks it's important to support generating the contents of > > the configuration space on demand. > > > > Sorry, I didn't get you here. Can't VDUSE_DEV_SET_CONFIG and > VDUSE_DEV_INJECT_CONFIG_IRQ achieve that? If the contents of the configuration space change continuously, then the VDUSE_DEV_SET_CONFIG approach is inefficient and might have race conditions. For example, imagine a device where the driver can read a timer from the configuration space. I think the VIRTIO device model allows that although I'm not aware of any devices that do something like it today. The problem is that VDUSE_DEV_SET_CONFIG would have to be called f
Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
On Tue, Jun 29, 2021 at 10:59:51AM +0800, Yongji Xie wrote: > On Mon, Jun 28, 2021 at 9:02 PM Stefan Hajnoczi wrote: > > > > On Tue, Jun 15, 2021 at 10:13:30PM +0800, Xie Yongji wrote: > > > +/* ioctls */ > > > + > > > +struct vduse_dev_config { > > > + char name[VDUSE_NAME_MAX]; /* vduse device name */ > > > + __u32 vendor_id; /* virtio vendor id */ > > > + __u32 device_id; /* virtio device id */ > > > + __u64 features; /* device features */ > > > + __u64 bounce_size; /* bounce buffer size for iommu */ > > > + __u16 vq_size_max; /* the max size of virtqueue */ > > > > The VIRTIO specification allows per-virtqueue sizes. A device can have > > two virtqueues, where the first one allows up to 1024 descriptors and > > the second one allows only 128 descriptors, for example. > > > > Good point! But it looks like virtio-vdpa/virtio-pci doesn't support > that now. All virtqueues have the same maximum size. I see struct vpda_config_ops only supports a per-device max vq size: u16 (*get_vq_num_max)(struct vdpa_device *vdev); virtio-pci supports per-virtqueue sizes because the struct virtio_pci_common_cfg->queue_size register is per-queue (controlled by queue_select). I guess this is a question for Jason: will vdpa will keep this limitation? If yes, then VDUSE can stick to it too without running into problems in the future. > > > + __u16 padding; /* padding */ > > > + __u32 vq_num; /* the number of virtqueues */ > > > + __u32 vq_align; /* the allocation alignment of virtqueue's metadata > > > */ > > > > I'm not sure what this is? > > > > This will be used by vring_create_virtqueue() too. If there is no official definition for the meaning of this value then "/* same as vring_create_virtqueue()'s vring_align parameter */" would be clearer. That way the reader knows what to research in order to understand how this field works. I don't remember but maybe it was used to support vrings when the host/guest have non-4KB page sizes. I wonder if anyone has an official definition for this value? signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch
On 6/21/21 5:15 PM, Andy Lutomirski wrote: On Mon, Jun 21, 2021 at 10:51 AM Ross Philipson wrote: On 6/18/21 2:32 PM, Robin Murphy wrote: On 2021-06-18 17:12, Ross Philipson wrote: The IOMMU should always be set to default translated type after the PMRs are disabled to protect the MLE from DMA. Signed-off-by: Ross Philipson --- drivers/iommu/intel/iommu.c | 5 + drivers/iommu/iommu.c | 6 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index be35284..4f0256d 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev) */ static int device_def_domain_type(struct device *dev) { +/* Do not allow identity domain when Secure Launch is configured */ +if (slaunch_get_flags() & SL_FLAG_ACTIVE) +return IOMMU_DOMAIN_DMA; Is this specific to Intel? It seems like it could easily be done commonly like the check for untrusted external devices. It is currently Intel only but that will change. I will look into what you suggest. + if (dev_is_pci(dev)) { struct pci_dev *pdev = to_pci_dev(dev); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 808ab70d..d49b7dd 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -23,6 +23,7 @@ #include #include #include +#include #include static struct kset *iommu_group_kset; @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line) { if (cmd_line) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; -iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; + +/* Do not allow identity domain when Secure Launch is configured */ +if (!(slaunch_get_flags() & SL_FLAG_ACTIVE)) +iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; Quietly ignoring the setting and possibly leaving iommu_def_domain_type uninitialised (note that 0 is not actually a usable type) doesn't seem great. AFAICS this probably warrants similar treatment to the Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA event though passthrough was requested. Or perhaps something more is needed here? mem_encrypt_active() case - there doesn't seem a great deal of value in trying to save users from themselves if they care about measured boot yet explicitly pass options which may compromise measured boot. If you really want to go down that route there's at least the sysfs interface you'd need to nobble as well, not to mention the various ways of completely disabling IOMMUs... Doing a secure launch with the kernel is not a general purpose user use case. A lot of work is done to secure the environment. Allowing passthrough mode would leave the secure launch kernel exposed to DMA. I think what we are trying to do here is what we intend though there may be a better way or perhaps it is incomplete as you suggest. I don't really like all these special cases. Generically, what you're trying to do is (AFAICT) to get the kernel to run in a mode in which it does its best not to trust attached devices. Nothing about this is specific to Secure Launch. There are plenty of scenarios in which this the case: - Virtual devices in a VM host outside the TCB, e.g. VDUSE, Xen device domains (did I get the name right), whatever tricks QEMU has, etc. - SRTM / DRTM technologies (including but not limited to Secure Launch -- plain old Secure Boot can work like this too). - Secure guest technologies, including but not limited to TDX and SEV. - Any computer with a USB-C port or other external DMA-capable port. - Regular computers in which the admin wants to enable this mode for whatever reason. Can you folks all please agree on a coordinated way for a Linux kernel to configure itself appropriately? Or to be configured via initramfs, boot option, or some other trusted source of configuration supplied at boot time? We don't need a whole bunch of if (TDX), if (SEV), if (secure launch), if (I have a USB-C port with PCIe exposed), if (running on Xen), and similar checks all over the place. I replied to Robin Murphy in another thread. As far as the IOMMU is concerned, I think we need to rethink our approach. As to the other technologies you mention here, we have not considered special casing anything at this point. Thanks Ross ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch
On 6/22/21 7:06 AM, Robin Murphy wrote: On 2021-06-21 18:51, Ross Philipson wrote: On 6/18/21 2:32 PM, Robin Murphy wrote: On 2021-06-18 17:12, Ross Philipson wrote: The IOMMU should always be set to default translated type after the PMRs are disabled to protect the MLE from DMA. Signed-off-by: Ross Philipson --- drivers/iommu/intel/iommu.c | 5 + drivers/iommu/iommu.c | 6 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index be35284..4f0256d 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev) */ static int device_def_domain_type(struct device *dev) { + /* Do not allow identity domain when Secure Launch is configured */ + if (slaunch_get_flags() & SL_FLAG_ACTIVE) + return IOMMU_DOMAIN_DMA; Is this specific to Intel? It seems like it could easily be done commonly like the check for untrusted external devices. It is currently Intel only but that will change. I will look into what you suggest. Yeah, it's simple and unobtrusive enough that I reckon it's worth going straight to the common version if it's worth doing at all. + if (dev_is_pci(dev)) { struct pci_dev *pdev = to_pci_dev(dev); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 808ab70d..d49b7dd 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -23,6 +23,7 @@ #include #include #include +#include #include static struct kset *iommu_group_kset; @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line) { if (cmd_line) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; + + /* Do not allow identity domain when Secure Launch is configured */ + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE)) + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; Quietly ignoring the setting and possibly leaving iommu_def_domain_type uninitialised (note that 0 is not actually a usable type) doesn't seem great. AFAICS this probably warrants similar treatment to the Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA event though passthrough was requested. Or perhaps something more is needed here? mem_encrypt_active() case - there doesn't seem a great deal of value in trying to save users from themselves if they care about measured boot yet explicitly pass options which may compromise measured boot. If you really want to go down that route there's at least the sysfs interface you'd need to nobble as well, not to mention the various ways of completely disabling IOMMUs... Doing a secure launch with the kernel is not a general purpose user use case. A lot of work is done to secure the environment. Allowing passthrough mode would leave the secure launch kernel exposed to DMA. I think what we are trying to do here is what we intend though there may be a better way or perhaps it is incomplete as you suggest. On second thoughts this is overkill anyway - if you do hook iommu_get_def_domain_type(), you're done (in terms of the kernel-managed setting, at least); it doesn't matter what iommu_def_domain_type gets set to if will never get used. However, since this isn't really a per-device thing, it might be more semantically appropriate to leave that alone and instead only massage the default type in iommu_subsys_init(), as for memory encryption. When you say "secure the environment", what's the actual threat model here, i.e. who's securing what against whom? If it's a device lockdown type thing where the system owner wants to defend against the end user trying to mess with the software stack or gain access to parts they shouldn't, then possibly you can trust the command line, but there are definitely other places which need consideration. If on the other hand it's more about giving the end user confidence that their choice of software stack isn't being interfered with by a malicious host or external third parties, then it probably leans towards the opposite being true... If the command line *is* within the threat model, consider "iommu=off" and/or "intel_iommu=off" for example: I don't know how PMRs work, but I can only imagine that that's liable to leave things either wide open, or blocked to the point of no DMA working at all, neither of which seems to be what you want. I'm guessing "intel_iommu=tboot_noforce" might have some relevant implications too. Thank you for you suggestions and feedback. Sorry we did not get back sooner. After the comments from you and Andy Lutomirski we decided we needed to re-imagine what we are trying to accomplish here and how else we might approach it. Ross It might be reasonable to make IOMMU_DEFAULT_P
Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Wed, Jun 30, 2021 at 9:43 AM Nathan Chancellor wrote: > > On Thu, Jun 24, 2021 at 11:55:20PM +0800, Claire Chang wrote: > > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > > use it to determine whether to bounce the data or not. This will be > > useful later to allow for different pools. > > > > Signed-off-by: Claire Chang > > Reviewed-by: Christoph Hellwig > > Tested-by: Stefano Stabellini > > Tested-by: Will Deacon > > Acked-by: Stefano Stabellini > > This patch as commit af452ec1b1a3 ("swiotlb: Use is_swiotlb_force_bounce > for swiotlb data bouncing") causes my Ryzen 3 4300G system to fail to > get to an X session consistently (although not every single time), > presumably due to a crash in the AMDGPU driver that I see in dmesg. > > I have attached logs at af452ec1b1a3 and f127c9556a8e and I am happy > to provide any further information, debug, or test patches as necessary. Are you using swiotlb=force? or the swiotlb_map is called because of !dma_capable? (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/dma/direct.h#n93) `BUG: unable to handle page fault for address: 003a8290` and the fact it crashed at `_raw_spin_lock_irqsave` look like the memory (maybe dev->dma_io_tlb_mem) was corrupted? The dev->dma_io_tlb_mem should be set here (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/probe.c#n2528) through device_initialize. I can't tell what happened from the logs, but maybe we could try KASAN to see if it provides more clue. Thanks, Claire > > Cheers, > Nathan ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 3/6] vfio: remove the unused mdev iommu hook
> From: Joerg Roedel > Sent: Monday, May 17, 2021 11:35 PM > > On Mon, May 17, 2021 at 10:35:00AM -0300, Jason Gunthorpe wrote: > > Well, I'm sorry, but there is a huge other thread talking about the > > IOASID design in great detail and why this is all needed. Jumping into > > this thread without context and basically rejecting all the > > conclusions that were reached over the last several weeks is really > > not helpful - especially since your objection is not technical. > > > > I think you should wait for Intel to put together the /dev/ioasid uAPI > > proposal and the example use cases it should address then you can give > > feedback there, with proper context. > > Yes, I think the next step is that someone who read the whole thread > writes up the conclusions and a rough /dev/ioasid API proposal, also > mentioning the use-cases it addresses. Based on that we can discuss the > implications this needs to have for IOMMU-API and code. > > From the use-cases I know the mdev concept is just fine. But if there is > a more generic one we can talk about it. > Although /dev/iommu v2 proposal is still in progress, I think there are enough background gathered in v1 to resume this discussion now. In a nutshell /dev/iommu requires two sets of services from the iommu layer: - for an kernel-managed I/O page table via map/unmap; - for an user-managed I/O page table via bind/invalidate and nested on a kernel-managed parent I/O page table; Each I/O page table could be attached by multiple devices. /dev/iommu maintains device specific routing information (RID, or RID+PASID) for where to install the I/O page table in the IOMMU for each attached device. Kernel-managed page table is represented by iommu domain. Existing IOMMU-API allows /dev/iommu to attach a RID device to iommu domain. A new interface is required, e.g. iommu_attach_device_pasid(domain, dev, pasid), to cover (RID+PASID) attaching. Once attaching succeeds, no change to following map/unmap which are domain-wide thus applied to both RID and RID+PASID. In case of dev_iotlb invalidation is required, the iommu driver is responsible for handling it for every attached RID or RID+PASID if ats is enabled. to take one example, the parent (RID1) has three work queues. WQ1 is for parent's own DMA-API usage, with WQ2 (PASID-x) assigned to VM1 and WQ3 (PASID-y) assigned to VM2. VM2 is also assigned with another device (RID2). In this case there are three kernel-managed I/O page tables (IOVA in kernel, GPA for VM1 and GPA for VM2), thus RID1 is attached to three domains: RID1 --- domain1 (default, IOVA) | | | |-- [RID1] | |-- domain2 (vm1, GPA) | | | |-- [RID1, PASID-x] | |-- domain3 (vm2, GPA) | | | |-- [RID1, PASID-y] | | | |-- [RID2] The iommu layer should maintain above attaching status per device and per iommu domain. There is no mdev/subdev concept in the iommu layer. It's just about RID or PASID. User-manage I/O page table might be represented by a new object which describes: - routing information (RID or RID+PASID) - pointer to iommu_domain of the parent I/O page table (inherit the domain ID in iotlb due to nesting) - address of the I/O page table There might be chance to share the structure with native SVA which also has page table managed outside of iommu subsystem. But we can leave it and figure out until coding. And a new set of IOMMU-API: - iommu_{un}bind_pgtable(domain, dev, addr); - iommu_{un}bind_pgtable_pasid(domain, dev, addr, pasid); - iommu_cache_invalidate(domain, dev, invalid_info); - and APIs for registering fault handler and completing faults; (here 'domain' is the one representing the parent I/O page table) Because nesting essentially creates a new reference to the parent I/O page table, iommu_bind_pgtable_pasid() implicitly calls __iommu_attach_ device_pasid() to setup the connection between the parent domain and the new [RID,PASID]. It's not necessary to do so for iommu_bind_pgtable() since the RID is already attached when the parent I/O page table is created. In consequence the example topology is updated as below, with guest SVA enabled in both vm1 and vm2: RID1 --- domain1 (default, IOVA) | | | |-- [RID1] | |-- domain2 (vm1, GPA) | | | |-- [RID1, PASID-x] | |-- [RID1, PASID-a] // nested for vm1 process1 | |-- [RID1, PASID-b] // nested for vm1 process2 | |-- domain3 (vm2, GPA) | | | |-- [RID1, PASID-y] | |-- [RID1, PASID-c] // nested for vm2 process1 | | | |-- [RID2] | |-- [RID2, PASID-a] // nested for vm2 process2 Thoughts? Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/arm-smmu-v3: Add default domain quirk for Arm Fast Models
On Tue, 29 Jun 2021 18:34:40 +0100, Will Deacon wrote: > > On Fri, Jun 18, 2021 at 05:24:49PM +0100, Robin Murphy wrote: > > Arm Fast Models are still implementing legacy virtio-pci devices behind > > the SMMU, which continue to be problematic as "real hardware" devices > > (from the point of view of the simulated system) without the mitigating > > VIRTIO_F_ACCESS_PLATFORM feature. > > > > Since we now have the ability to force passthrough on a device-specific > > basis, let's use it to work around this particular oddity so that people > > who just want to boot Linux on a model don't have to fiddle around with > > things to avoid the SMMU getting in the way by default (and I don't have > > to keep telling them which things...) > > > > Signed-off-by: Robin Murphy > > --- > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++ > > 1 file changed, 15 insertions(+) > > Any chance of getting the fastmodels updated instead? It feels like it > has to happen *eventually*, and then there would be no need for this bodge. That'd be ideal. What are the chances of that happening before the Sun turns into a black hole? > Will > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > index 54b2f27b81d4..13cf16e8f45b 100644 > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > @@ -2649,6 +2649,20 @@ static int arm_smmu_dev_disable_feature(struct > > device *dev, > > } > > } > > > > +static int arm_smmu_def_domain_type(struct device *dev) > > +{ > > + if (dev_is_pci(dev)) { > > + struct pci_dev *pdev = to_pci_dev(dev); > > + > > + /* Legacy virtio-block devices on Arm Fast Models */ > > + if (pdev->vendor == 0x1af4 && pdev->device == 0x1001 && > > + pdev->subsystem_vendor == 0x00ff && pdev->subsystem_device > > == 0x0002) > > + return IOMMU_DOMAIN_IDENTITY; > > + } > > + > > + return 0; > > +} Could this be expressed as a PCI quirk instead? It would at least keep the ID matching out of the SMMU driver... Thanks, M. -- Without deviation from the norm, progress is not possible. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v5 7/8] iommu/arm-smmu: Get associated RMR info and install bypass SMR
> -Original Message- > From: Jon Nettleton [mailto:j...@solid-run.com] > Sent: 29 June 2021 17:26 > To: Robin Murphy > Cc: Shameerali Kolothum Thodi ; > linux-arm-kernel ; ACPI Devel Maling > List ; iommu@lists.linux-foundation.org; Linuxarm > ; Steven Price ; Guohanjun > (Hanjun Guo) ; yangyicong > ; sami.muja...@arm.com; wanghuiqiang > > Subject: Re: [PATCH v5 7/8] iommu/arm-smmu: Get associated RMR info and > install bypass SMR > [...] > Shameer, > > Sorry for the delays. Here is a diff of the changes that should > address the issues pointed out by Robin, > I have tested that this works as expected on my HoneyComb LX2160A. Ok. Thanks for that. > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c > b/drivers/iommu/arm/arm-smmu/arm-smmu.c > index ab7b9db77625..a358bd326d0b 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > @@ -2068,29 +2068,21 @@ static void > arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device *smmu) > struct list_head rmr_list; > struct iommu_resv_region *e; > int i, cnt = 0; > - u32 smr; > u32 reg; > > INIT_LIST_HEAD(&rmr_list); > if (iommu_dma_get_rmrs(dev_fwnode(smmu->dev), &rmr_list)) > return; > > - reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0); > + /* Rather than trying to look at existing mappings that > +* are setup by the firmware and then invalidate the ones > +* that do no have matching RMR entries, just disable the > +* SMMU until it gets enabled again in the reset routine. > +*/ > > - if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg & > ARM_SMMU_sCR0_CLIENTPD)) { > - /* > -* SMMU is already enabled and disallowing bypass, so > preserve > -* the existing SMRs > -*/ > - for (i = 0; i < smmu->num_mapping_groups; i++) { > - smr = arm_smmu_gr0_read(smmu, > ARM_SMMU_GR0_SMR(i)); > - if (!FIELD_GET(ARM_SMMU_SMR_VALID, smr)) > - continue; > - smmu->smrs[i].id = > FIELD_GET(ARM_SMMU_SMR_ID, smr); > - smmu->smrs[i].mask = > FIELD_GET(ARM_SMMU_SMR_MASK, smr); > - smmu->smrs[i].valid = true; > - } > - } > + reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0); > + reg &= ~ARM_SMMU_sCR0_CLIENTPD; > + arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sCR0, reg); > > list_for_each_entry(e, &rmr_list, list) { > u32 sid = e->fw_data.rmr.sid; > @@ -2100,25 +2092,16 @@ static void > arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device *smmu) > continue; > if (smmu->s2crs[i].count == 0) { > smmu->smrs[i].id = sid; > - smmu->smrs[i].mask = ~0; > + smmu->smrs[i].mask = 0; > smmu->smrs[i].valid = true; > } > smmu->s2crs[i].count++; > smmu->s2crs[i].type = S2CR_TYPE_BYPASS; > smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT; > - smmu->s2crs[i].cbndx = 0xff; > > cnt++; > } > > - if ((reg & ARM_SMMU_sCR0_USFCFG) && !(reg & > ARM_SMMU_sCR0_CLIENTPD)) { > - /* Remove the valid bit for unused SMRs */ > - for (i = 0; i < smmu->num_mapping_groups; i++) { > - if (smmu->s2crs[i].count == 0) > - smmu->smrs[i].valid = false; > - } > - } > - > dev_notice(smmu->dev, "\tpreserved %d boot mapping%s\n", cnt, >cnt == 1 ? "" : "s"); > iommu_dma_put_rmrs(dev_fwnode(smmu->dev), &rmr_list); > > Please include that in your next patch series. Let me know if you > want me to send you the patch direct > off the list. No problem, I will take this in next. Thanks, Shameer ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 01/24] dt-bindings: mediatek: mt8195: Add binding for MM IOMMU
On Wed, 2021-06-30 at 08:26 +0200, Krzysztof Kozlowski wrote: > On 30/06/2021 04:34, Yong Wu wrote: > > This patch adds descriptions for mt8195 IOMMU which also use ARM > > Short-Descriptor translation table format. > > > > In mt8195, there are two smi-common HW and IOMMU, one is for vdo(video > > output), the other is for vpp(video processing pipe). They connects > > with different smi-larbs, then some setting(larbid_remap) is different. > > Differentiate them with the compatible string. > > > > Something like this: > > > > IOMMU(VDO) IOMMU(VPP) > >| | > > SMI_COMMON_VDO SMI_COMMON_VPP > > --- > > | | ... | | ... > > larb0 larb2 ...larb1 larb3... > > > > Another change is that we have a new IOMMU that is for infra master like > > PCIe and USB. The infra master don't have the larb and ports, thus we > > rename the port header file to mt8195-memory-port.h rather than > > mt8195-larb-port.h. > > > > Also, the IOMMU is not only for MM, thus, we don't call it "m4u" which > > means "MultiMedia Memory Management UNIT". thus, use the "iommu" as the > > compatiable string. > > > > Signed-off-by: Yong Wu > > --- > > .../bindings/iommu/mediatek,iommu.yaml| 7 + > > .../dt-bindings/memory/mt8195-memory-port.h | 390 ++ > > 2 files changed, 397 insertions(+) > > create mode 100644 include/dt-bindings/memory/mt8195-memory-port.h > > > > I understand this will go through IOMMU tree. Do you know about any > further patches for memory controllers which will need the header? This header file will only be used in dtsi file. the iommu masters will use these port definitions in the dtsi. The SMI driver no need include this header file. By the way, the mt8195 SMI patches is at: https://lore.kernel.org/linux-mediatek/20210616114346.18812-1-yong...@mediatek.com/ > > Acked-by: Krzysztof Kozlowski Thanks very much for your quick review. > > > Best regards, > Krzysztof > > ___ > Linux-mediatek mailing list > linux-media...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-mediatek ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC] /dev/ioasid uAPI proposal
On Mon, Jun 07, 2021 at 11:14:24AM -0300, Jason Gunthorpe wrote: > "non-coherent DMA" is some general euphemism that evokes images of > embedded platforms that don't have coherent DMA at all and have low > cost ways to regain coherence. This is not at all what we are talking > about here at all. It literally is the same way of working. And not just low-end embedded platforms use this, but a lot of older server platforms did as well. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC] /dev/ioasid uAPI proposal
On Fri, Jun 04, 2021 at 08:58:05AM -0300, Jason Gunthorpe wrote: > On Fri, Jun 04, 2021 at 09:11:03AM +0800, Jason Wang wrote: > > > nor do any virtio drivers implement the required platform specific > > > cache flushing to make no-snoop TLPs work. > > > > I don't get why virtio drivers needs to do that. I think DMA API should hide > > those arch/platform specific stuffs from us. > > It is not arch/platform stuff. If the device uses no-snoop then a > very platform specific recovery is required in the device driver. Well, the proper way to support NO_SNOOP DMA would be to force the DMA API into supporting the device as if the platform was not DMA coherent, probably on a per-call basis. It is just that no one bothered to actually do the work an people just kept piling hacks over hacks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC] /dev/ioasid uAPI proposal
On Wed, Jun 09, 2021 at 09:47:42AM -0300, Jason Gunthorpe wrote: > I can vaugely understand this rational for vfio, but not at all for > the platform's iommu driver, sorry. Agreed. More importantly the dependency is not for the platform iommu driver but just for the core iommu code, which is always built in if enabled. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu