Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:
> > On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > > > I think we agreed (or agree to disagree and commit) for device
> > > > > > types that we have for SIOV, VFIO based approach works well
> > > > > > without having to re-invent another way to do the same things.
> > > > > > Not looking for a shortcut by any means, but we need to plan
> > > > > > around existing hardware though. Looks like vDPA took some
> > > > > > shortcuts then to not abstract iommu uAPI instead :-)? When all
> > > > > > necessary hardware was available.. This would be a solved puzzle.
> > > > >
> > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as
> > > > > the "home" and needs to stand alone.
> > > > >
> > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> > > >
> > > > So just to make this clear, I did check internally if there are any
> > > > plans for vDPA + SVM. There are none at the moment.
> > >
> > > Not SVM, SIOV.
> >
> > ... And that included.. I should have said vDPA + PASID, No current
> > plans. I have no idea who set expectations with you. Can you please
> > put me in touch with that person, privately is fine.
>
> It was the team that argued VDPA had to be done through VFIO - SIOV
> and PASID was one of their reasons it had to be VFIO, check the list
> archives

Humm... I could search the archives, but the point is I'm confirming that
there is no forward looking plan! And whoever did was basing it on a
hypothetical strawman argument that wasn't grounded in reality.
If they didn't plan to use it, bit of a strawman argument, right? This
doesn't need to continue like the debates :-) Pun intended :-)

I don't think it makes any sense to have an abstract strawman design
discussion. Yi is looking into PASID management alone. The rest of the
IOMMU related topics should wait until we have another *real* use case
requiring consolidation.

Contrary to your argument, vDPA went with a half-blown, device-only
IOMMU user without considering existing abstractions like containers in
VFIO, and that is part of the reason the gap is big at the moment. You
might not agree, but that's beside the point.

Rather than pivoting around hypothetical, strawman, non-intersecting
use cases, suggesting an architecture without having done a proof of
concept to validate the proposal should stop. We have to ground
ourselves in reality. VFIO and mdev seem to be the right candidates for
the use cases we have so far for SIOV, and they address them well. Now
you might disagree, but as noted we all agreed to move past this.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason,

> From: Jason Wang
> Sent: Thursday, October 22, 2020 10:56 AM
> [...]
> If you (Intel) don't have a plan to do vDPA, you should not prevent
> other vendors from implementing PASID capable hardware through a
> non-VFIO subsystem/uAPI on top of your SIOV architecture. Isn't it?

yes, that's true.

> So if Intel is willing to collaborate on the POC, I'd be happy to help.
> E.g it's not hard to have a PASID capable virtio device through qemu,
> and we can start from there.

actually, I'm already doing a PoC to move the PASID allocation/free
interface out of VFIO, so that other users could use it as well. I think
this is also what you replied previously. :-) I'll send it out when it's
ready and seek your help to mature it. Does that sound good to you?

Regards,
Yi Liu

> Thanks
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/22 11:54 AM, Liu, Yi L wrote:
> Hi Jason,
>
>> From: Jason Wang
>> Sent: Thursday, October 22, 2020 10:56 AM
>> [...]
>> If you (Intel) don't have a plan to do vDPA, you should not prevent
>> other vendors from implementing PASID capable hardware through a
>> non-VFIO subsystem/uAPI on top of your SIOV architecture. Isn't it?
>
> yes, that's true.
>
>> So if Intel is willing to collaborate on the POC, I'd be happy to help.
>> E.g it's not hard to have a PASID capable virtio device through qemu,
>> and we can start from there.
>
> actually, I'm already doing a PoC to move the PASID allocation/free
> interface out of VFIO, so that other users could use it as well. I think
> this is also what you replied previously. :-) I'll send it out when it's
> ready and seek your help to mature it. Does that sound good to you?

Yes, fine with me.

Thanks

> Regards,
> Yi Liu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 03:24:42PM -0300, Jason Gunthorpe wrote:
>
> > Contrary to your argument, vDPA went with a half blown device only
> > iommu user without considering existing abstractions like containers
>
> VDPA IOMMU was done *for Intel*, as the kind of half-architected thing
> you are advocating should be allowed for IDXD here. Not sure why you
> think bashing that work is going to help your case here.

I'm not bashing that work, sorry if it comes out that way, but it just
feels like double standards.

I'm not sure why you tie IDXD and vDPA together here. How IDXD uses
native SVM is orthogonal to how we achieve mdev passthrough to the guest
and vSVM. We visited that exact thing multiple times. Doing SVM is quite
simple and doesn't carry the weight of the long list of things we need
to accomplish for mdev passthrough (Kevin explained this in detail not
too long ago). For SVM, all that is needed is access to the hardware,
MMIO, and bind_mm to get a PASID bound with the IOMMU.

For IDXD, creating passthrough devices for guest access and vSVM is done
through the VFIO path. For guest SVM, we expose mdevs to the guest OS,
and idxd in the guest provides vSVM services. vSVM is *not* built around
the native SVM interfaces.
WARNING in dma_map_page_attrs
Hello, syzbot found the following issue on: HEAD commit:c4d6fe73 Merge tag 'xarray-5.9' of git://git.infradead.org.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=14862ff050 kernel config: https://syzkaller.appspot.com/x/.config?x=7d790573d3e379c4 dashboard link: https://syzkaller.appspot.com/bug?extid=34dc2fea3478e659af01 compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+34dc2fea3478e659a...@syzkaller.appspotmail.com infiniband syz1: set active infiniband syz1: added vcan0 [ cut here ] WARNING: CPU: 1 PID: 9851 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149 Modules linked in: CPU: 1 PID: 9851 Comm: syz-executor.1 Not tainted 5.9.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149 Code: 80 3c 10 00 0f 85 ed 01 00 00 48 8b 1d 36 c3 fa 0c e9 2d fc ff ff 48 89 c3 e9 d1 fd ff ff e8 04 12 12 00 0f 0b e8 fd 11 12 00 <0f> 0b 49 c7 c4 ff ff ff ff e9 d5 fd ff ff e8 ea 11 12 00 48 8d 7b RSP: 0018:c90001546c68 EFLAGS: 00010246 RAX: 0004 RBX: 894d0040 RCX: c9000dbe4000 RDX: 0004 RSI: 815d3b03 RDI: 88806a988b00 RBP: 8880236cc400 R08: 0002 R09: R10: 0002 R11: R12: ea8db300 R13: 88806a9886e8 R14: 04b8 R15: 0002 FS: 7f678fae2700() GS:88802ce0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f299a39b190 CR3: 69f31000 CR4: 00350ee0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: dma_map_single_attrs include/linux/dma-mapping.h:279 [inline] ib_dma_map_single include/rdma/ib_verbs.h:3967 [inline] ib_mad_post_receive_mads+0x23f/0xd60 drivers/infiniband/core/mad.c:2715 ib_mad_port_start drivers/infiniband/core/mad.c:2862 [inline] ib_mad_port_open drivers/infiniband/core/mad.c:3016 [inline] ib_mad_init_device+0x72b/0x1400 
drivers/infiniband/core/mad.c:3092 add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:680 enable_device_and_get+0x1d5/0x3c0 drivers/infiniband/core/device.c:1301 ib_register_device drivers/infiniband/core/device.c:1376 [inline] ib_register_device+0x7a7/0xa40 drivers/infiniband/core/device.c:1335 rxe_register_device+0x46d/0x570 drivers/infiniband/sw/rxe/rxe_verbs.c:1182 rxe_add+0x12fe/0x16d0 drivers/infiniband/sw/rxe/rxe.c:247 rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:507 rxe_newlink drivers/infiniband/sw/rxe/rxe.c:269 [inline] rxe_newlink+0xb7/0xe0 drivers/infiniband/sw/rxe/rxe.c:250 nldev_newlink+0x30e/0x540 drivers/infiniband/core/nldev.c:1555 rdma_nl_rcv_msg+0x367/0x690 drivers/infiniband/core/netlink.c:195 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0x2f2/0x440 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:671 sys_sendmsg+0x6e8/0x810 net/socket.c:2353 ___sys_sendmsg+0xf3/0x170 net/socket.c:2407 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45d9f9 Code: bd b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 8b b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7f678fae1c88 EFLAGS: 0246 ORIG_RAX: 002e RAX: ffda RBX: 0071f480 RCX: 0045d9f9 RDX: RSI: 2200 RDI: 0003 RBP: 004aab13 R08: R09: R10: R11: 0246 R12: 0075bf00 R13: 7ffc6f9b8bbf R14: 7f678fac2000 R15: 0003 --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. 
syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
[PATCH v4 7/7] mm: Remove examples from enum zone_type comment
We can't really list every setup in common code. On top of that, they are
unlikely to stay true for long, as things change in the arch trees
independently of this comment.

Suggested-by: Christoph Hellwig
Signed-off-by: Nicolas Saenz Julienne
---
 include/linux/mmzone.h | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..9d0c454d23cd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -354,26 +354,6 @@ enum zone_type {
	 * DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
	 * platforms may need both zones as they support peripherals with
	 * different DMA addressing limitations.
-	 *
-	 * Some examples:
-	 *
-	 * - i386 and x86_64 have a fixed 16M ZONE_DMA and ZONE_DMA32 for the
-	 *   rest of the lower 4G.
-	 *
-	 * - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
-	 *   the specific device.
-	 *
-	 * - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-	 *   lower 4G.
-	 *
-	 * - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
-	 *   depending on the specific device.
-	 *
-	 * - s390 uses ZONE_DMA fixed to the lower 2G.
-	 *
-	 * - ia64 and riscv only use ZONE_DMA32.
-	 *
-	 * - parisc uses neither.
	 */
 #ifdef CONFIG_ZONE_DMA
	ZONE_DMA,
--
2.28.0
Re: [PATCH v3 07/14] iommu/ioasid: Add an iterator API for ioasid_set
On Mon, Sep 28, 2020 at 02:38:34PM -0700, Jacob Pan wrote:
> Users of an ioasid_set may not keep track of all the IOASIDs allocated
> under the set. When collective actions are needed for each IOASID, it
> is useful to iterate over all the IOASIDs within the set. For example,
> when the ioasid_set is freed, the user might perform the same cleanup
> operation on each IOASID.
>
> This patch adds an API to iterate all the IOASIDs within the set.
>
> Signed-off-by: Jacob Pan

Could add a short description of the function parameters, but

Reviewed-by: Jean-Philippe Brucker

> ---
>  drivers/iommu/ioasid.c | 17 +
>  include/linux/ioasid.h |  9 +
>  2 files changed, 26 insertions(+)
>
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index cf8c7d34e2de..9628e78b2ab4 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -701,6 +701,23 @@ int ioasid_adjust_set(struct ioasid_set *set, int quota)
>  EXPORT_SYMBOL_GPL(ioasid_adjust_set);
>
>  /**
> + * ioasid_set_for_each_ioasid - Iterate over all the IOASIDs within the set
> + *
> + * Caller must hold a reference of the set and handles its own locking.
> + */
> +void ioasid_set_for_each_ioasid(struct ioasid_set *set,
> +				void (*fn)(ioasid_t id, void *data),
> +				void *data)
> +{
> +	struct ioasid_data *entry;
> +	unsigned long index;
> +
> +	xa_for_each(&set->xa, index, entry)
> +		fn(index, data);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_for_each_ioasid);
> +
> +/**
>  * ioasid_find - Find IOASID data
>  * @set: the IOASID set
>  * @ioasid: the IOASID to find
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> index 0a5e82148eb9..aab58bc26714 100644
> --- a/include/linux/ioasid.h
> +++ b/include/linux/ioasid.h
> @@ -75,6 +75,9 @@ int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
>  void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
>  int ioasid_attach_data(ioasid_t ioasid, void *data);
>  void ioasid_detach_data(ioasid_t ioasid);
> +void ioasid_set_for_each_ioasid(struct ioasid_set *sdata,
> +				void (*fn)(ioasid_t id, void *data),
> +				void *data);
>  #else /* !CONFIG_IOASID */
>  static inline void ioasid_install_capacity(ioasid_t total)
>  {
> @@ -131,5 +134,11 @@ static inline int ioasid_attach_data(ioasid_t ioasid, void *data)
>  static inline void ioasid_detach_data(ioasid_t ioasid)
>  {
>  }
> +
> +static inline void ioasid_set_for_each_ioasid(struct ioasid_set *sdata,
> +					      void (*fn)(ioasid_t id, void *data),
> +					      void *data)
> +{
> +}
>  #endif /* CONFIG_IOASID */
>  #endif /* __LINUX_IOASID_H */
> --
> 2.7.4
>
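The callback-per-ID pattern this patch adds can be sketched in a few lines of userspace C. This is a minimal model, not the kernel code: the names are illustrative and a plain array stands in for the xarray that the real ioasid_set_for_each_ioasid() walks, with the caller holding a set reference and doing its own locking.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the iterator in the patch above: the set owns the
 * storage of allocated IDs, and callers pass a callback invoked once per
 * ID. All names here are illustrative stand-ins for the kernel API.
 */
typedef unsigned int ioasid_t;

struct mock_ioasid_set {
	ioasid_t ids[8];   /* allocated IOASIDs; the kernel uses an xarray */
	size_t nr_ids;
};

static void mock_set_for_each_ioasid(struct mock_ioasid_set *set,
				     void (*fn)(ioasid_t id, void *data),
				     void *data)
{
	for (size_t i = 0; i < set->nr_ids; i++)
		fn(set->ids[i], data);
}

/* A callback of the "collective action" kind the API is meant for. */
static void count_id(ioasid_t id, void *data)
{
	(void)id;
	(*(int *)data)++;
}
```

The design point is that the set never exposes its internal storage; cleanup-on-free can then be one call with a teardown callback.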
Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub
On Mon, Oct 19, 2020 at 01:18:22PM -0400, Arvind Sankar wrote: > On Mon, Oct 19, 2020 at 04:51:53PM +0200, Daniel Kiper wrote: > > On Fri, Oct 16, 2020 at 04:51:51PM -0400, Arvind Sankar wrote: > > > On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote: > > > > > > > > I am discussing with Ross the other option. We can create > > > > .rodata.mle_header section and put it at fixed offset as > > > > kernel_info is. So, we would have, e.g.: > > > > > > > > arch/x86/boot/compressed/vmlinux.lds.S: > > > > .rodata.kernel_info KERNEL_INFO_OFFSET : { > > > > *(.rodata.kernel_info) > > > > } > > > > ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, > > > > "kernel_info at bad address!") > > > > > > > > .rodata.mle_header MLE_HEADER_OFFSET : { > > > > *(.rodata.mle_header) > > > > } > > > > ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header > > > > at bad address!") > > > > > > > > arch/x86/boot/compressed/sl_stub.S: > > > > #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET) > > > > > > > > .section ".rodata.mle_header", "a" > > > > > > > > SYM_DATA_START(mle_header) > > > > .long 0x9082ac5a/* UUID0 */ > > > > .long 0x74a7476f/* UUID1 */ > > > > .long 0xa2555c0f/* UUID2 */ > > > > .long 0x42b651cb/* UUID3 */ > > > > .long 0x0034/* MLE header size */ > > > > .long 0x00020002/* MLE version 2.2 */ > > > > .long mleh_rva(sl_stub_entry)/* Linear entry point of MLE > > > > (virt. address) */ > > > > .long 0x/* First valid page of MLE */ > > > > .long 0x/* Offset within binary of first byte of > > > > MLE */ > > > > .long 0x/* Offset within binary of last byte + 1 > > > > of MLE */ > > > > .long 0x0223/* Bit vector of MLE-supported > > > > capabilities */ > > > > .long 0x/* Starting linear address of command > > > > line (unused) */ > > > > .long 0x/* Ending linear address of command line > > > > (unused) */ > > > > SYM_DATA_END(mle_header) > > > > > > > > Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere. 
> > > > Anyway, is it acceptable?
> >
> > What do you think about my MLE_HEADER_OFFSET and related stuff proposal?
>
> I'm wondering if it would be easier to just allow relocations in these
> special "header" sections. I need to check how easy/hard it is to do
> that without triggering linker warnings.

Ross and I are still bouncing some ideas around. We came to the
conclusion that putting mle_header into the kernel .rodata.kernel_info
section, or even into the arch/x86/boot/compressed/kernel_info.S file,
would be the easiest thing to do at this point. Of course I would
suggest some renaming too, e.g. .rodata.kernel_info to
.rodata.kernel_headers, etc. Does it make sense for you?

Daniel
Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub
On 10/21/20 12:18 PM, Arvind Sankar wrote: > On Wed, Oct 21, 2020 at 05:28:33PM +0200, Daniel Kiper wrote: >> On Mon, Oct 19, 2020 at 01:18:22PM -0400, Arvind Sankar wrote: >>> On Mon, Oct 19, 2020 at 04:51:53PM +0200, Daniel Kiper wrote: On Fri, Oct 16, 2020 at 04:51:51PM -0400, Arvind Sankar wrote: > On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote: >> >> I am discussing with Ross the other option. We can create >> .rodata.mle_header section and put it at fixed offset as >> kernel_info is. So, we would have, e.g.: >> >> arch/x86/boot/compressed/vmlinux.lds.S: >> .rodata.kernel_info KERNEL_INFO_OFFSET : { >> *(.rodata.kernel_info) >> } >> ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info >> at bad address!") >> >> .rodata.mle_header MLE_HEADER_OFFSET : { >> *(.rodata.mle_header) >> } >> ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header at >> bad address!") >> >> arch/x86/boot/compressed/sl_stub.S: >> #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET) >> >> .section ".rodata.mle_header", "a" >> >> SYM_DATA_START(mle_header) >> .long 0x9082ac5a/* UUID0 */ >> .long 0x74a7476f/* UUID1 */ >> .long 0xa2555c0f/* UUID2 */ >> .long 0x42b651cb/* UUID3 */ >> .long 0x0034/* MLE header size */ >> .long 0x00020002/* MLE version 2.2 */ >> .long mleh_rva(sl_stub_entry)/* Linear entry point of MLE >> (virt. address) */ >> .long 0x/* First valid page of MLE */ >> .long 0x/* Offset within binary of first byte of >> MLE */ >> .long 0x/* Offset within binary of last byte + 1 >> of MLE */ >> .long 0x0223/* Bit vector of MLE-supported >> capabilities */ >> .long 0x/* Starting linear address of command line >> (unused) */ >> .long 0x/* Ending linear address of command line >> (unused) */ >> SYM_DATA_END(mle_header) >> >> Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere. >> Anyway, is it acceptable? What do you think about my MLE_HEADER_OFFSET and related stuff proposal? 
>>>
>>> I'm wondering if it would be easier to just allow relocations in these
>>> special "header" sections. I need to check how easy/hard it is to do
>>> that without triggering linker warnings.
>>
>> Ross and I are still bouncing some ideas around. We came to the
>> conclusion that putting mle_header into the kernel .rodata.kernel_info
>> section or even the arch/x86/boot/compressed/kernel_info.S file would
>> be the easiest thing to do at this point. Of course I would suggest
>> some renaming too. E.g. .rodata.kernel_info to .rodata.kernel_headers,
>> etc. Does it make sense for you?
>>
>> Daniel
>
> I haven't been able to come up with any different options that don't
> require post-processing of the kernel image. Allowing relocations in
> specific sections seems to not be possible with lld, and anyway would
> require the fields to be 64-bit sized so it doesn't really help.
>
> Putting mle_header into kernel_info seems like a reasonable thing to me,
> and if you do that, putting it into kernel_info.S would seem to be
> necessary? Would you also have a fixed field with the offset of the
> mle_header from kernel_info?

That seems like a reasonable place for it to go.

> That seems nicer than having the bootloader scan the variable data for
> magic strings.

Yes, kernel_info will have a field with the offset of the mle_header. I
agree that would be nicer.

Thanks
Ross
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/22 1:51 AM, Raj, Ashok wrote:
> On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote:
>> On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:
>>> On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
>>>> On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
>>>>> On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
>>>>>> On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
>>>>>>> I think we agreed (or agree to disagree and commit) for device
>>>>>>> types that we have for SIOV, VFIO based approach works well
>>>>>>> without having to re-invent another way to do the same things.
>>>>>>> Not looking for a shortcut by any means, but we need to plan
>>>>>>> around existing hardware though. Looks like vDPA took some
>>>>>>> shortcuts then to not abstract iommu uAPI instead :-)? When all
>>>>>>> necessary hardware was available.. This would be a solved puzzle.
>>>>>>
>>>>>> I think it is the opposite, vIOMMU and related has outgrown VFIO as
>>>>>> the "home" and needs to stand alone.
>>>>>>
>>>>>> Apparently the HW that will need PASID for vDPA is Intel HW, so if
>>>>>
>>>>> So just to make this clear, I did check internally if there are any
>>>>> plans for vDPA + SVM. There are none at the moment.
>>>>
>>>> Not SVM, SIOV.
>>>
>>> ... And that included.. I should have said vDPA + PASID, No current
>>> plans. I have no idea who set expectations with you. Can you please
>>> put me in touch with that person, privately is fine.
>>
>> It was the team that argued VDPA had to be done through VFIO - SIOV
>> and PASID was one of their reasons it had to be VFIO, check the list
>> archives
>
> Humm... I could search the archives, but the point is I'm confirming
> that there is no forward looking plan! And whoever did was basing it on
> a hypothetical strawman argument that wasn't grounded in reality.
>
> If they didn't plan to use it, bit of a strawman argument, right? This
> doesn't need to continue like the debates :-) Pun intended :-)
>
> I don't think it makes any sense to have an abstract strawman argument
> design discussion. Yi is looking into PASID management alone.
> Rest of the IOMMU related topics should wait until we have another
> *real* use requiring consolidation.
>
> Contrary to your argument, vDPA went with a half blown device only
> iommu user without considering existing abstractions like containers
> and such in VFIO is part of the reason the gap is big at the moment.
> And you might not agree, but that's beside the point.

Can you explain why it must care about VFIO abstractions? vDPA is
trying to hide device details, which is fundamentally different from
what VFIO wants to do. vDPA allows the parent to deal with IOMMU
stuffs, and if necessary, the parent can talk with IOMMU drivers
directly via IOMMU APIs.

> Rather than pivot ourselves around hypothetical, strawman,
> non-intersecting, suggesting architecture without having done a proof
> of concept to validate the proposal should stop. We have to ground
> ourselves in reality.

The reality is VFIO should not be the only user for (v)SVA/SIOV/PASID.
The kernel already has users like GPU or uacce.

> The use cases we have so far for SIOV, VFIO and mdev seem to be the
> right candidates and address them well. Now you might disagree, but as
> noted we all agreed to move past this.

The mdev is not perfect for sure, but it's another topic.

If you (Intel) don't have a plan to do vDPA, you should not prevent
other vendors from implementing PASID capable hardware through a
non-VFIO subsystem/uAPI on top of your SIOV architecture. Isn't it?

So if Intel is willing to collaborate on the POC, I'd be happy to help.
E.g it's not hard to have a PASID capable virtio device through qemu,
and we can start from there.

Thanks
[PATCH v4 0/7] arm64: Default to 32-bit wide ZONE_DMA
Using two distinct DMA zones turned out to be problematic. Here's an
attempt to go back to a saner default. I tested this on both a RPi4 and
QEMU.

---

Changes since v3:
 - Drop patch adding define in dma-mapping
 - Address small review changes
 - Update Ard's patch
 - Add new patch removing examples from mmzone.h

Changes since v2:
 - Introduce Ard's patch
 - Improve OF dma-ranges parsing function
 - Add unit test for OF function
 - Address small changes
 - Move crashkernel reservation later in boot process

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
  arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (6):
  arm64: mm: Move reserve_crashkernel() into mem_init()
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_get_max_cpu_address()
  of: unittest: Add test for of_dma_get_max_cpu_address()
  arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
  mm: Remove examples from enum zone_type comment

 arch/arm64/mm/init.c      | 16 ++--
 drivers/acpi/arm64/iort.c | 52 +++
 drivers/of/address.c      | 42 +++
 drivers/of/unittest.c     | 18 ++
 include/linux/acpi_iort.h |  4 +++
 include/linux/mmzone.h    | 20 ---
 include/linux/of.h        |  7 ++
 7 files changed, 130 insertions(+), 29 deletions(-)

--
2.28.0
Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub
On Wed, Oct 21, 2020 at 05:28:33PM +0200, Daniel Kiper wrote: > On Mon, Oct 19, 2020 at 01:18:22PM -0400, Arvind Sankar wrote: > > On Mon, Oct 19, 2020 at 04:51:53PM +0200, Daniel Kiper wrote: > > > On Fri, Oct 16, 2020 at 04:51:51PM -0400, Arvind Sankar wrote: > > > > On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote: > > > > > > > > > > I am discussing with Ross the other option. We can create > > > > > .rodata.mle_header section and put it at fixed offset as > > > > > kernel_info is. So, we would have, e.g.: > > > > > > > > > > arch/x86/boot/compressed/vmlinux.lds.S: > > > > > .rodata.kernel_info KERNEL_INFO_OFFSET : { > > > > > *(.rodata.kernel_info) > > > > > } > > > > > ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, > > > > > "kernel_info at bad address!") > > > > > > > > > > .rodata.mle_header MLE_HEADER_OFFSET : { > > > > > *(.rodata.mle_header) > > > > > } > > > > > ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header > > > > > at bad address!") > > > > > > > > > > arch/x86/boot/compressed/sl_stub.S: > > > > > #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET) > > > > > > > > > > .section ".rodata.mle_header", "a" > > > > > > > > > > SYM_DATA_START(mle_header) > > > > > .long 0x9082ac5a/* UUID0 */ > > > > > .long 0x74a7476f/* UUID1 */ > > > > > .long 0xa2555c0f/* UUID2 */ > > > > > .long 0x42b651cb/* UUID3 */ > > > > > .long 0x0034/* MLE header size */ > > > > > .long 0x00020002/* MLE version 2.2 */ > > > > > .long mleh_rva(sl_stub_entry)/* Linear entry point of > > > > > MLE (virt. 
> > > > > address) */
> > > > > 	.long 0x00000000	/* First valid page of MLE */
> > > > > 	.long 0x00000000	/* Offset within binary of first byte of MLE */
> > > > > 	.long 0x00000000	/* Offset within binary of last byte + 1 of MLE */
> > > > > 	.long 0x0223	/* Bit vector of MLE-supported capabilities */
> > > > > 	.long 0x00000000	/* Starting linear address of command line (unused) */
> > > > > 	.long 0x00000000	/* Ending linear address of command line (unused) */
> > > > > SYM_DATA_END(mle_header)
> > > > >
> > > > > Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere.
> > > > > Anyway, is it acceptable?
> > >
> > > What do you think about my MLE_HEADER_OFFSET and related stuff proposal?
> > >
> > I'm wondering if it would be easier to just allow relocations in these
> > special "header" sections. I need to check how easy/hard it is to do
> > that without triggering linker warnings.
>
> Ross and I are still bouncing some ideas around. We came to the
> conclusion that putting mle_header into the kernel .rodata.kernel_info
> section or even the arch/x86/boot/compressed/kernel_info.S file would
> be the easiest thing to do at this point. Of course I would suggest
> some renaming too. E.g. .rodata.kernel_info to .rodata.kernel_headers,
> etc. Does it make sense for you?
>
> Daniel

I haven't been able to come up with any different options that don't
require post-processing of the kernel image. Allowing relocations in
specific sections seems to not be possible with lld, and anyway would
require the fields to be 64-bit sized so it doesn't really help.

Putting mle_header into kernel_info seems like a reasonable thing to me,
and if you do that, putting it into kernel_info.S would seem to be
necessary? Would you also have a fixed field with the offset of the
mle_header from kernel_info? That seems nicer than having the
bootloader scan the variable data for magic strings.
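The "fixed offset field instead of magic-string scan" idea the thread converges on can be sketched in userspace C. The struct layouts and names below are hypothetical stand-ins, not the real kernel_info or MLE header formats: the point is only that a header at a known offset carries the relative offset of the other header, so a bootloader does one bounded lookup rather than scanning variable-size data for a UUID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical mock of the proposal: kernel_info sits at a fixed image
 * offset and holds the offset of mle_header, which is identified by its
 * UUID words (values taken from the listing in the thread).
 */
struct mock_kernel_info {
	uint32_t size;                /* size of this structure */
	uint32_t mle_header_offset;   /* offset of mle_header from image start */
};

struct mock_mle_header {
	uint32_t uuid[4];             /* identifies the header */
	uint32_t header_size;
};

/* Locate the MLE header through the fixed offset field at image start.
 * memcpy() is used so the reads are alignment-safe on any buffer. */
static struct mock_mle_header read_mle_header(const uint8_t *image)
{
	struct mock_kernel_info ki;
	struct mock_mle_header mh;

	memcpy(&ki, image, sizeof(ki));
	memcpy(&mh, image + ki.mle_header_offset, sizeof(mh));
	return mh;
}
```

With this shape, the bootloader's work is two bounded reads; the UUID then serves only as a sanity check rather than a search key.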
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote:
> I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native
> SVM is orthogonal to how we achieve mdev passthrough to guest and
> vSVM.

Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed
on the VDPA side as well, I think that is why JasonW brought this up
in the first place.

We may not see vSVA for VDPA, but that seems like some special sub-mode
of all the other vIOMMU and PASID stuff, and not a completely distinct
thing.

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 08:32:18PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote:
>
> > I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native
> > SVM is orthogonal to how we achieve mdev passthrough to guest and
> > vSVM.
>
> Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed
> on the VDPA side as well, I think that is why JasonW brought this up
> in the first place.

True. To that effect we are working on moving PASID allocation outside
of VFIO, so that both agents, VFIO and vDPA with PASID, when that
becomes available, can share one way to allocate and manage PASIDs from
user space. Since the IOASID code is almost standalone, this is
possible.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 10:51:46AM -0700, Raj, Ashok wrote:
> > If they didn't plan to use it, bit of a strawman argument, right?
> > This doesn't need to continue like the debates :-) Pun intended :-)
>
> I don't think it makes any sense to have an abstract strawman argument
> design discussion. Yi is looking into for pasid management alone. Rest
> of the IOMMU related topics should wait until we have another
> *real* use requiring consolidation.

Actually I'm really annoyed right now that the other Intel team wasted
quite a lot of the rest of our time arguing about vDPA and vfio with no
actual interest in this technology. So you'll excuse me if I'm not
particularly enamored with this discussion right now.

> Contrary to your argument, vDPA went with a half blown device only
> iommu user without considering existing abstractions like containers

VDPA IOMMU was done *for Intel*, as the kind of half-architected thing
you are advocating should be allowed for IDXD here. Not sure why you
think bashing that work is going to help your case here.

I'm saying Intel needs to get its architecture together and stop
creating this mess across the kernel to support Intel devices.

Jason
Re: [PATCH v3 03/14] iommu/ioasid: Add a separate function for detach data
On Mon, Sep 28, 2020 at 02:38:30PM -0700, Jacob Pan wrote: > IOASID private data can be cleared by ioasid_attach_data() with a NULL > data pointer. A common use case is for a caller to free the data > afterward. ioasid_attach_data() calls synchronize_rcu() before return > such that free data can be sure without outstanding readers. > However, since synchronize_rcu() may sleep, ioasid_attach_data() cannot > be used under spinlocks. > > This patch adds ioasid_detach_data() as a separate API where > synchronize_rcu() is called only in this case. ioasid_attach_data() can > then be used under spinlocks. In addition, this change makes the API > symmetrical. > > Signed-off-by: Jacob Pan A typo below, but Reviewed-by: Jean-Philippe Brucker > --- > drivers/iommu/intel/svm.c | 4 ++-- > drivers/iommu/ioasid.c| 54 > ++- > include/linux/ioasid.h| 5 - > 3 files changed, 50 insertions(+), 13 deletions(-) > > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c > index 2c5645f0737a..06a16bee7b65 100644 > --- a/drivers/iommu/intel/svm.c > +++ b/drivers/iommu/intel/svm.c > @@ -398,7 +398,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, > struct device *dev, > list_add_rcu(>list, >devs); > out: > if (!IS_ERR_OR_NULL(svm) && list_empty(>devs)) { > - ioasid_attach_data(data->hpasid, NULL); > + ioasid_detach_data(data->hpasid); > kfree(svm); > } > > @@ -441,7 +441,7 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid) >* the unbind, IOMMU driver will get notified >* and perform cleanup. 
>*/ > - ioasid_attach_data(pasid, NULL); > + ioasid_detach_data(pasid); > kfree(svm); > } > } > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c > index 5f63af07acd5..6cfbdfb492e0 100644 > --- a/drivers/iommu/ioasid.c > +++ b/drivers/iommu/ioasid.c > @@ -272,24 +272,58 @@ int ioasid_attach_data(ioasid_t ioasid, void *data) > > spin_lock(_allocator_lock); > ioasid_data = xa_load(_allocator->xa, ioasid); > - if (ioasid_data) > - rcu_assign_pointer(ioasid_data->private, data); > - else > + > + if (!ioasid_data) { > ret = -ENOENT; > - spin_unlock(_allocator_lock); > + goto done_unlock; > + } > > - /* > - * Wait for readers to stop accessing the old private data, so the > - * caller can free it. > - */ > - if (!ret) > - synchronize_rcu(); > + if (ioasid_data->private) { > + ret = -EBUSY; > + goto done_unlock; > + } > + rcu_assign_pointer(ioasid_data->private, data); > + > +done_unlock: > + spin_unlock(_allocator_lock); > > return ret; > } > EXPORT_SYMBOL_GPL(ioasid_attach_data); > > /** > + * ioasid_detach_data - Clear the private data of an ioasid > + * > + * @ioasid: the IOASIDD to clear private data IOASID > + */ > +void ioasid_detach_data(ioasid_t ioasid) > +{ > + struct ioasid_data *ioasid_data; > + > + spin_lock(_allocator_lock); > + ioasid_data = xa_load(_allocator->xa, ioasid); > + > + if (!ioasid_data) { > + pr_warn("IOASID %u not found to detach data from\n", ioasid); > + goto done_unlock; > + } > + > + if (ioasid_data->private) { > + rcu_assign_pointer(ioasid_data->private, NULL); > + goto done_unlock; > + } > + > +done_unlock: > + spin_unlock(_allocator_lock); > + /* > + * Wait for readers to stop accessing the old private data, > + * so the caller can free it. 
> + */ > + synchronize_rcu(); > +} > +EXPORT_SYMBOL_GPL(ioasid_detach_data); > + > +/** > * ioasid_alloc - Allocate an IOASID > * @set: the IOASID set > * @min: the minimum ID (inclusive) > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h > index 9c44947a68c8..c7f649fa970a 100644 > --- a/include/linux/ioasid.h > +++ b/include/linux/ioasid.h > @@ -40,7 +40,7 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid, > int ioasid_register_allocator(struct ioasid_allocator_ops *allocator); > void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator); > int ioasid_attach_data(ioasid_t ioasid, void *data); > - > +void ioasid_detach_data(ioasid_t ioasid); > #else /* !CONFIG_IOASID */ > static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, > ioasid_t max, void *private) > @@ -72,5 +72,8 @@ static inline int ioasid_attach_data(ioasid_t ioasid, void > *data) > return -ENOTSUPP; > } > > +static inline void ioasid_detach_data(ioasid_t ioasid) > +{ > +} > #endif /* CONFIG_IOASID */ > #endif /* __LINUX_IOASID_H */ > -- > 2.7.4 >
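For readers following the API change: after this patch, ioasid_attach_data() refuses to overwrite existing data (returning -EBUSY) and only ioasid_detach_data() pays the synchronize_rcu() cost, which is what makes attach safe under spinlocks. A minimal userspace sketch of those semantics (the toy_* names and single-slot table are invented for illustration; RCU and locking are elided):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical one-entry stand-in for the IOASID xarray. */
struct toy_ioasid {
    unsigned int id;
    int allocated;
    void *private;   /* cleared only by detach, as in the patch */
};

static struct toy_ioasid slot;

static int toy_attach_data(unsigned int ioasid, void *data)
{
    if (!slot.allocated || slot.id != ioasid)
        return -ENOENT;
    if (slot.private)
        return -EBUSY;   /* attach no longer silently replaces data */
    slot.private = data;
    return 0;
}

static void toy_detach_data(unsigned int ioasid)
{
    if (!slot.allocated || slot.id != ioasid)
        return;
    slot.private = NULL;
    /* the real code calls synchronize_rcu() here, after unlocking,
     * so the caller may then free the old data */
}
```

The symmetry is the point: one function sets, the other clears, and the sleeping grace-period wait lives only on the detach path.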
Re: [PATCH v3 08/14] iommu/ioasid: Add reference counting functions
On Mon, Sep 28, 2020 at 02:38:35PM -0700, Jacob Pan wrote: > There can be multiple users of an IOASID, each user could have hardware > contexts associated with the IOASID. In order to align lifecycles, > reference counting is introduced in this patch. It is expected that when > an IOASID is being freed, each user will drop a reference only after its > context is cleared. > > Signed-off-by: Jacob Pan > --- > drivers/iommu/ioasid.c | 117 > + > include/linux/ioasid.h | 24 ++ > 2 files changed, 141 insertions(+) > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c > index 9628e78b2ab4..828cc44b1b1c 100644 > --- a/drivers/iommu/ioasid.c > +++ b/drivers/iommu/ioasid.c > @@ -16,8 +16,26 @@ static ioasid_t ioasid_capacity = PCI_PASID_MAX; > static ioasid_t ioasid_capacity_avail = PCI_PASID_MAX; > static DEFINE_XARRAY_ALLOC(ioasid_sets); > > +enum ioasid_state { > + IOASID_STATE_INACTIVE, > + IOASID_STATE_ACTIVE, > + IOASID_STATE_FREE_PENDING, > +}; > + > +/** > + * struct ioasid_data - Meta data about ioasid > + * > + * @id: Unique ID > + * @users: Number of active users > + * @state: Track state of the IOASID > + * @set: ioasid_set of the IOASID belongs to > + * @private: Private data associated with the IOASID > + * @rcu: For free after RCU grace period > + */ > struct ioasid_data { > ioasid_t id; > + refcount_t users; > + enum ioasid_state state; > struct ioasid_set *set; > void *private; > struct rcu_head rcu; > @@ -511,6 +529,8 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t > min, ioasid_t max, > goto exit_free; > } > data->id = id; > + data->state = IOASID_STATE_ACTIVE; > + refcount_set(>users, 1); > > /* Store IOASID in the per set data */ > if (xa_err(xa_store(>xa, id, data, GFP_ATOMIC))) { > @@ -560,6 +580,14 @@ static void ioasid_free_locked(struct ioasid_set *set, > ioasid_t ioasid) > if (WARN_ON(!xa_load(_sets, data->set->id))) > return; > > + /* Free is already in progress */ > + if (data->state == IOASID_STATE_FREE_PENDING) > + return; 
But the previous call to ioasid_free_locked() dropped a reference, then returned because more refs were held. Shouldn't this call also dec_and_test() the reference and call ioasid_do_free_locked() if necessary? > + > + data->state = IOASID_STATE_FREE_PENDING; > + if (!refcount_dec_and_test(&data->users)) > + return; > + > + ioasid_do_free_locked(data); > } > > @@ -717,6 +745,95 @@ void ioasid_set_for_each_ioasid(struct ioasid_set *set, > } > EXPORT_SYMBOL_GPL(ioasid_set_for_each_ioasid); > > +int ioasid_get_locked(struct ioasid_set *set, ioasid_t ioasid) > +{ > + struct ioasid_data *data; > + > + data = xa_load(&active_allocator->xa, ioasid); > + if (!data) { > + pr_err("Trying to get unknown IOASID %u\n", ioasid); > + return -EINVAL; > + } > + if (data->state == IOASID_STATE_FREE_PENDING) { > + pr_err("Trying to get IOASID being freed%u\n", ioasid); Strange placement of the %u > + return -EBUSY; > + } > + > + /* Check set ownership if the set is non-null */ > + if (set && data->set != set) { > + pr_err("Trying to get IOASID %u outside the set\n", ioasid); > + /* data found but does not belong to the set */ > + return -EACCES; > + } > + refcount_inc(&data->users); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(ioasid_get_locked); If this is a public-facing API, let's add a lockdep_assert_held() to make sure callers do hold the right lock. Same for ioasid_put_locked(). Thanks, Jean > + > +/** > + * ioasid_get - Obtain a reference to an ioasid > + * @set: the ioasid_set to check permission against if not NULL > + * @ioasid: the ID to remove > + * > + > + * Return: 0 on success, error if failed.
> + */ > +int ioasid_get(struct ioasid_set *set, ioasid_t ioasid) > +{ > + int ret; > + > + spin_lock(&ioasid_allocator_lock); > + ret = ioasid_get_locked(set, ioasid); > + spin_unlock(&ioasid_allocator_lock); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(ioasid_get); > + > +bool ioasid_put_locked(struct ioasid_set *set, ioasid_t ioasid) > +{ > + struct ioasid_data *data; > + > + data = xa_load(&active_allocator->xa, ioasid); > + if (!data) { > + pr_err("Trying to put unknown IOASID %u\n", ioasid); > + return false; > + } > + if (set && data->set != set) { > + pr_err("Trying to drop IOASID %u outside the set\n", ioasid); > + return false; > + } > + if (!refcount_dec_and_test(&data->users)) > + return false; > + > + ioasid_do_free_locked(data); > + > + return true; > +} > +EXPORT_SYMBOL_GPL(ioasid_put_locked); > + > +/** > + * ioasid_put - Release a reference to an ioasid > + * @set: the ioasid_set
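To make the intended lifecycle concrete, here is a compile-and-run model of the alloc/get/free/put interaction discussed above (the toy_* names are invented; the xarray, set-ownership checks, and locking are elided): allocation holds one reference, free marks the IOASID pending and drops that reference, and the last put performs the actual free.

```c
#include <assert.h>
#include <errno.h>

enum toy_state { TOY_ACTIVE, TOY_FREE_PENDING, TOY_FREED };

struct toy_ioasid_data {
    int users;
    enum toy_state state;
};

static void toy_alloc(struct toy_ioasid_data *d)
{
    d->users = 1;            /* allocation holds the initial reference */
    d->state = TOY_ACTIVE;
}

static int toy_get(struct toy_ioasid_data *d)
{
    if (d->state != TOY_ACTIVE)
        return -EBUSY;       /* no new users once a free is pending */
    d->users++;
    return 0;
}

static int toy_put(struct toy_ioasid_data *d)
{
    if (--d->users)
        return 0;
    d->state = TOY_FREED;    /* the last reference actually frees */
    return 1;
}

static void toy_free(struct toy_ioasid_data *d)
{
    if (d->state == TOY_FREE_PENDING)
        return;              /* free already in progress */
    d->state = TOY_FREE_PENDING;
    toy_put(d);              /* drop the allocation reference */
}
```

This mirrors the design being reviewed, including the early return on a repeated free that the comment above questions.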
Re: [PATCH v3 05/14] iommu/ioasid: Redefine IOASID set and allocation APIs
On Mon, Sep 28, 2020 at 02:38:32PM -0700, Jacob Pan wrote: > ioasid_set was introduced as an arbitrary token that is shared by a > group of IOASIDs. For example, two IOASIDs allocated via the same > ioasid_set pointer belong to the same set. > > For guest SVA usages, system-wide IOASID resources need to be > partitioned such that each VM can have its own quota and being managed > separately. ioasid_set is the perfect candidate for meeting such > requirements. This patch redefines and extends ioasid_set with the > following new fields: > - Quota > - Reference count > - Storage of its namespace > - The token is now stored in the ioasid_set with types > > Basic ioasid_set level APIs are introduced that wire up these new data. > Existing users of IOASID APIs are converted where a host IOASID set is > allocated for bare-metal usage. > > Signed-off-by: Liu Yi L > Signed-off-by: Jacob Pan > --- > drivers/iommu/intel/iommu.c | 26 +++-- > drivers/iommu/intel/pasid.h | 1 + > drivers/iommu/intel/svm.c | 25 ++-- > drivers/iommu/ioasid.c | 277 > > include/linux/ioasid.h | 53 +++-- > 5 files changed, 333 insertions(+), 49 deletions(-) > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index e7bcb299e51e..872391890323 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -104,6 +104,9 @@ > */ > #define INTEL_IOMMU_PGSIZES (~0xFFFUL) > > +/* PASIDs used by host SVM */ > +struct ioasid_set *host_pasid_set; > + > static inline int agaw_to_level(int agaw) > { > return agaw + 2; > @@ -3147,8 +3150,8 @@ static void intel_vcmd_ioasid_free(ioasid_t ioasid, > void *data) >* Sanity check the ioasid owner is done at upper layer, e.g. VFIO >* We can only free the PASID when all the devices are unbound. 
>*/ > - if (ioasid_find(NULL, ioasid, NULL)) { > - pr_alert("Cannot free active IOASID %d\n", ioasid); > + if (IS_ERR(ioasid_find(host_pasid_set, ioasid, NULL))) { > + pr_err("IOASID %d to be freed but not in system set\n", ioasid); > return; > } > vcmd_free_pasid(iommu, ioasid); > @@ -,8 +3336,17 @@ static int __init init_dmars(void) > goto free_iommu; > > /* PASID is needed for scalable mode irrespective to SVM */ > - if (intel_iommu_sm) > + if (intel_iommu_sm) { > ioasid_install_capacity(intel_pasid_max_id); > + /* We should not run out of IOASIDs at boot */ > + host_pasid_set = ioasid_set_alloc(NULL, PID_MAX_DEFAULT, > + IOASID_SET_TYPE_NULL); > + if (IS_ERR_OR_NULL(host_pasid_set)) { > + pr_err("Failed to allocate host PASID set %lu\n", Printing a negative value as unsigned > + PTR_ERR(host_pasid_set)); > + intel_iommu_sm = 0; > + } > + } > > /* >* for each drhd > @@ -3381,7 +3393,7 @@ static int __init init_dmars(void) > disable_dmar_iommu(iommu); > free_dmar_iommu(iommu); > } > - > + ioasid_set_put(host_pasid_set); > kfree(g_iommus); > > error: > @@ -5163,7 +5175,7 @@ static void auxiliary_unlink_device(struct dmar_domain > *domain, > domain->auxd_refcnt--; > > if (!domain->auxd_refcnt && domain->default_pasid > 0) > - ioasid_free(domain->default_pasid); > + ioasid_free(host_pasid_set, domain->default_pasid); > } > > static int aux_domain_add_dev(struct dmar_domain *domain, > @@ -5181,7 +5193,7 @@ static int aux_domain_add_dev(struct dmar_domain > *domain, > int pasid; > > /* No private data needed for the default pasid */ > - pasid = ioasid_alloc(NULL, PASID_MIN, > + pasid = ioasid_alloc(host_pasid_set, PASID_MIN, >pci_max_pasids(to_pci_dev(dev)) - 1, >NULL); > if (pasid == INVALID_IOASID) { > @@ -5224,7 +5236,7 @@ static int aux_domain_add_dev(struct dmar_domain > *domain, > spin_unlock(>lock); > spin_unlock_irqrestore(_domain_lock, flags); > if (!domain->auxd_refcnt && domain->default_pasid > 0) > - ioasid_free(domain->default_pasid); > + 
ioasid_free(host_pasid_set, domain->default_pasid); > > return ret; > } > diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h > index c9850766c3a9..ccdc23446015 100644 > --- a/drivers/iommu/intel/pasid.h > +++ b/drivers/iommu/intel/pasid.h > @@ -99,6 +99,7 @@ static inline bool pasid_pte_is_present(struct pasid_entry > *pte) > } > > extern u32 intel_pasid_max_id; > +extern struct ioasid_set *host_pasid_set; > int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp); > void intel_pasid_free_id(int pasid); > void
Re: [PATCH v3 06/14] iommu/ioasid: Introduce API to adjust the quota of an ioasid_set
On Mon, Sep 28, 2020 at 02:38:33PM -0700, Jacob Pan wrote: > Each ioasid_set is given a quota during allocation. As system > administrators balance resources among VMs, we shall support the > adjustment of quota at runtime. The new quota cannot be less than the > outstanding IOASIDs already allocated within the set. The extra quota > will be returned to the system-wide IOASID pool if the new quota is > smaller than the existing one. > > Signed-off-by: Jacob Pan Minor comments below, but Reviewed-by: Jean-Philippe Brucker > --- > drivers/iommu/ioasid.c | 47 +++ > include/linux/ioasid.h | 6 ++ > 2 files changed, 53 insertions(+) > > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c > index 61e25c2375ab..cf8c7d34e2de 100644 > --- a/drivers/iommu/ioasid.c > +++ b/drivers/iommu/ioasid.c > @@ -654,6 +654,53 @@ void ioasid_set_put(struct ioasid_set *set) > EXPORT_SYMBOL_GPL(ioasid_set_put); > > /** > + * ioasid_adjust_set - Adjust the quota of an IOASID set > + * @set: IOASID set to be assigned > + * @quota: Quota allowed in this set > + * > + * Return 0 on success. If the new quota is smaller than the number of > + * IOASIDs already allocated, -EINVAL will be returned. No change will be > + * made to the existing quota. > + */ > +int ioasid_adjust_set(struct ioasid_set *set, int quota) @quota probably doesn't need to be signed (since it's the same as an ioasid_t, which is unsigned). 
> +{ > + int ret = 0; > + > + if (quota <= 0) > + return -EINVAL; > + > + spin_lock(&ioasid_allocator_lock); > + if (set->nr_ioasids > quota) { > + pr_err("New quota %d is smaller than outstanding IOASIDs %d\n", > + quota, set->nr_ioasids); > + ret = -EINVAL; > + goto done_unlock; > + } > + > + if ((quota > set->quota) && > + (quota - set->quota > ioasid_capacity_avail)) { misaligned > + ret = -ENOSPC; > + goto done_unlock; > + } > + > + /* Return the delta back to system pool */ > + ioasid_capacity_avail += set->quota - quota; > + > + /* > + * May have a policy to prevent giving all available IOASIDs > + * to one set. But we don't enforce here, it should be in the > + * upper layers. > + */ I think here would be a fine place to check for fairness. But anyway, we don't have to worry about this yet, so I'd just drop the comment. Thanks, Jean > + set->quota = quota; > + > +done_unlock: > + spin_unlock(&ioasid_allocator_lock); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(ioasid_adjust_set); > + > +/** > * ioasid_find - Find IOASID data > * @set: the IOASID set > * @ioasid: the IOASID to find > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h > index 1ae213b660f0..0a5e82148eb9 100644 > --- a/include/linux/ioasid.h > +++ b/include/linux/ioasid.h > @@ -62,6 +62,7 @@ struct ioasid_allocator_ops { > void ioasid_install_capacity(ioasid_t total); > ioasid_t ioasid_get_capacity(void); > struct ioasid_set *ioasid_set_alloc(void *token, ioasid_t quota, int type); > +int ioasid_adjust_set(struct ioasid_set *set, int quota); > void ioasid_set_get(struct ioasid_set *set); > void ioasid_set_put(struct ioasid_set *set); > > @@ -99,6 +100,11 @@ static inline struct ioasid_set *ioasid_set_alloc(void > *token, ioasid_t quota, i > return ERR_PTR(-ENOTSUPP); > } > > +static inline int ioasid_adjust_set(struct ioasid_set *set, int quota) > +{ > + return -ENOTSUPP; > +} > + > static inline void ioasid_set_put(struct ioasid_set *set) > { > } > -- > 2.7.4 >
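The quota arithmetic in this patch is self-contained enough to model directly. A runnable sketch (the toy_* names are invented and all locking is elided) of how a set's quota and the system-wide pool trade the delta:

```c
#include <assert.h>
#include <errno.h>

struct toy_set {
    int quota;        /* current per-set quota */
    int nr_ioasids;   /* outstanding allocations in the set */
};

static int toy_capacity_avail;   /* system-wide free IOASIDs */

static int toy_adjust_set(struct toy_set *set, int quota)
{
    if (quota <= 0)
        return -EINVAL;
    if (set->nr_ioasids > quota)
        return -EINVAL;   /* can't shrink below what's already allocated */
    if (quota > set->quota && quota - set->quota > toy_capacity_avail)
        return -ENOSPC;   /* growth must fit the global pool */

    /* shrinking returns the delta to the pool, growing takes it */
    toy_capacity_avail += set->quota - quota;
    set->quota = quota;
    return 0;
}
```

Note the single signed expression `set->quota - quota` handles both directions, which is the same trick the patch uses.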
Re: [PATCH v3 04/14] iommu/ioasid: Support setting system-wide capacity
On Mon, Sep 28, 2020 at 02:38:31PM -0700, Jacob Pan wrote: > IOASID is a system-wide resource that could vary on different systems. > The default capacity is 20 bits as defined in the PCI-E specifications. > This patch adds a function to allow adjusting system IOASID capacity. > For VT-d this is set during boot as part of the Intel IOMMU > initialization. > > Signed-off-by: Jacob Pan Reviewed-by: Jean-Philippe Brucker > --- > drivers/iommu/intel/iommu.c | 5 + > drivers/iommu/ioasid.c | 20 > include/linux/ioasid.h | 11 +++ > 3 files changed, 36 insertions(+) > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 18ed3b3c70d7..e7bcb299e51e 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -42,6 +42,7 @@ > #include > #include > #include > +#include (not in alphabetical order) > #include > #include > #include > @@ -3331,6 +3332,10 @@ static int __init init_dmars(void) > if (ret) > goto free_iommu; > > + /* PASID is needed for scalable mode irrespective to SVM */ > + if (intel_iommu_sm) > + ioasid_install_capacity(intel_pasid_max_id); > + > /* >* for each drhd >* enable fault log > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c > index 6cfbdfb492e0..4277cb17e15b 100644 > --- a/drivers/iommu/ioasid.c > +++ b/drivers/iommu/ioasid.c > @@ -10,6 +10,10 @@ > #include > #include > > +/* Default to PCIe standard 20 bit PASID */ > +#define PCI_PASID_MAX 0x10 > +static ioasid_t ioasid_capacity = PCI_PASID_MAX; > +static ioasid_t ioasid_capacity_avail = PCI_PASID_MAX; > struct ioasid_data { > ioasid_t id; > struct ioasid_set *set; > @@ -17,6 +21,22 @@ struct ioasid_data { > struct rcu_head rcu; > }; > > +void ioasid_install_capacity(ioasid_t total) > +{ > + if (ioasid_capacity && ioasid_capacity != PCI_PASID_MAX) { > + pr_warn("IOASID capacity is already set.\n"); > + return; > + } > + ioasid_capacity = ioasid_capacity_avail = total; > +} > +EXPORT_SYMBOL_GPL(ioasid_install_capacity); > + > +ioasid_t 
ioasid_get_capacity(void) > +{ > + return ioasid_capacity; > +} > +EXPORT_SYMBOL_GPL(ioasid_get_capacity); > + > /* > * struct ioasid_allocator_data - Internal data structure to hold information > * about an allocator. There are two types of allocators: > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h > index c7f649fa970a..7fc320656be2 100644 > --- a/include/linux/ioasid.h > +++ b/include/linux/ioasid.h > @@ -32,6 +32,8 @@ struct ioasid_allocator_ops { > #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 } > > #if IS_ENABLED(CONFIG_IOASID) > +void ioasid_install_capacity(ioasid_t total); > +ioasid_t ioasid_get_capacity(void); > ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max, > void *private); > void ioasid_free(ioasid_t ioasid); > @@ -42,6 +44,15 @@ void ioasid_unregister_allocator(struct > ioasid_allocator_ops *allocator); > int ioasid_attach_data(ioasid_t ioasid, void *data); > void ioasid_detach_data(ioasid_t ioasid); > #else /* !CONFIG_IOASID */ > +static inline void ioasid_install_capacity(ioasid_t total) > +{ > +} > + > +static inline ioasid_t ioasid_get_capacity(void) > +{ > + return 0; > +} > + > static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, > ioasid_t max, void *private) > { > -- > 2.7.4 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan
From: Ard Biesheuvel We recently introduced a 1 GB sized ZONE_DMA to cater for platforms incorporating masters that can address less than 32 bits of DMA, in particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has peripherals that can only address up to 1 GB (and its PCIe host bridge can only access the bottom 3 GB). Instructing the DMA layer about these limitations is straightforward, even though we had to fix some issues regarding memory limits set in the IORT for named components, and regarding the handling of ACPI _DMA methods. However, the DMA layer also needs to be able to allocate memory that is guaranteed to meet those DMA constraints, for bounce buffering as well as allocating the backing for consistent mappings. This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately, it turns out that having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and potentially in other places where allocations cannot cross zone boundaries. Therefore, we should avoid having two separate DMA zones when possible. So let's do an early scan of the IORT, and only create the ZONE_DMA if we encounter any devices that need it. This puts the burden on the firmware to describe such limitations in the IORT, which may be redundant (and less precise) if _DMA methods are also being provided. However, it should be noted that this situation is highly unusual for arm64 ACPI machines. Also, the DMA subsystem still gives precedence to the _DMA method if implemented, and so we will not lose the ability to perform streaming DMA outside the ZONE_DMA if the _DMA method permits it.
Cc: Jeremy Linton Cc: Lorenzo Pieralisi Cc: Nicolas Saenz Julienne Cc: Rob Herring Cc: Christoph Hellwig Cc: Robin Murphy Cc: Hanjun Guo Cc: Sudeep Holla Cc: Anshuman Khandual Signed-off-by: Ard Biesheuvel [nsaenz: Rebased, removed documentation change and add declaration in acpi_iort.h] Signed-off-by: Nicolas Saenz Julienne --- Changes since v3: - Use min_not_zero() - Check ACPI revision - Remove unnecessary #ifdef in zone_sizes_init() arch/arm64/mm/init.c | 3 ++- drivers/acpi/arm64/iort.c | 52 +++ include/linux/acpi_iort.h | 4 +++ 3 files changed, 58 insertions(+), 1 deletion(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 94e38f99748b..f5d4f85506e4 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -190,7 +191,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) #ifdef CONFIG_ZONE_DMA dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL)); - zone_dma_bits = min(32U, dt_zone_dma_bits); + zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_iort_get_zone_dma_size()); arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit); #endif diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 9929ff50c0c0..05fe4a076bab 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -1718,3 +1718,55 @@ void __init acpi_iort_init(void) iort_init_platform_devices(); } + +#ifdef CONFIG_ZONE_DMA +/* + * Check the IORT whether any devices exist whose DMA mask is < 32 bits. + * If so, return the smallest value encountered, or 32 otherwise. 
+ */ +unsigned int __init acpi_iort_get_zone_dma_size(void) +{ + struct acpi_table_iort *iort; + struct acpi_iort_node *node, *end; + acpi_status status; + u8 limit = 32; + int i; + + if (acpi_disabled) + return limit; + + status = acpi_get_table(ACPI_SIG_IORT, 0, + (struct acpi_table_header **)&iort); + if (ACPI_FAILURE(status)) + return limit; + + node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset); + end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length); + + for (i = 0; i < iort->node_count; i++) { + if (node >= end) + break; + + switch (node->type) { + struct acpi_iort_named_component *ncomp; + struct acpi_iort_root_complex *rc; + + case ACPI_IORT_NODE_NAMED_COMPONENT: + ncomp = (struct acpi_iort_named_component *)node->node_data; + limit = min_not_zero(limit, ncomp->memory_address_limit); + break; + + case ACPI_IORT_NODE_PCI_ROOT_COMPLEX: + if (node->revision < 1) + break; + + rc = (struct acpi_iort_root_complex *)node->node_data; + limit = min_not_zero(limit, rc->memory_address_limit); + break; + } + node = ACPI_ADD_PTR(struct acpi_iort_node, node, node->length); + } +
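The walk relies on min_not_zero() because a memory_address_limit of zero in the IORT means "not specified", not "zero-bit DMA". A small standalone illustration of the aggregation (the sample limit values are made up):

```c
#include <assert.h>

typedef unsigned char u8;

/* min_not_zero(), as used above: pick the smaller value, but treat
 * zero as "no limit specified" rather than as the minimum. */
static u8 toy_min_not_zero(u8 a, u8 b)
{
    if (!a)
        return b;
    if (!b)
        return a;
    return a < b ? a : b;
}

/* Fold a list of per-node address limits, starting from 32 bits,
 * the way the IORT walk above does. */
static u8 toy_zone_dma_size(const u8 *limits, int n)
{
    u8 limit = 32;
    int i;

    for (i = 0; i < n; i++)
        limit = toy_min_not_zero(limit, limits[i]);
    return limit;
}
```

If no node carries a limit below 32 bits, the result is 32 and no separate ZONE_DMA is needed.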
[PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms incorporating masters that can address less than 32 bits of DMA, in particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has peripherals that can only address up to 1 GB (and its PCIe host bridge can only access the bottom 3 GB). The DMA layer also needs to be able to allocate memory that is guaranteed to meet those DMA constraints, for bounce buffering as well as allocating the backing for consistent mappings. This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately, it turns out that having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and potentially in other places where allocations cannot cross zone boundaries. Therefore, we should avoid having two separate DMA zones when possible. So, with the help of of_dma_get_max_cpu_address(), get the topmost physical address accessible to all DMA masters in the system and use that information to fine-tune ZONE_DMA's size. In the absence of addressing-limited masters ZONE_DMA will span the whole 32-bit address space; otherwise, in the case of the Raspberry Pi 4 it'll only span the 30-bit address space, and have ZONE_DMA32 cover the rest of the 32-bit address space. Signed-off-by: Nicolas Saenz Julienne --- Changes since v3: - Simplify code for readability.
Changes since v2: - Updated commit log by shamelessly copying Ard's ACPI commit log arch/arm64/mm/init.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 410721fc4fc0..94e38f99748b 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -42,8 +42,6 @@ #include #include -#define ARM64_ZONE_DMA_BITS 30 - /* * We need to be able to catch inadvertent references to memstart_addr * that occur (potentially in generic code) before arm64_memblock_init() @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits) static void __init zone_sizes_init(unsigned long min, unsigned long max) { unsigned long max_zone_pfns[MAX_NR_ZONES] = {0}; + unsigned int __maybe_unused dt_zone_dma_bits; #ifdef CONFIG_ZONE_DMA - zone_dma_bits = ARM64_ZONE_DMA_BITS; + dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL)); + zone_dma_bits = min(32U, dt_zone_dma_bits); arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit); #endif -- 2.28.0
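The hunk above reduces to zone_dma_bits = min(32, ilog2(highest DMA-addressable CPU address)). A quick standalone check of that computation (toy_ilog2 mirrors the kernel's ilog2 for non-zero input; the sample addresses are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* ilog2() for a non-zero 64-bit value: index of the highest set bit. */
static unsigned int toy_ilog2(uint64_t v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* zone_dma_bits = min(32, ilog2(max_cpu_address)), as in the hunk. */
static unsigned int toy_zone_dma_bits(uint64_t max_cpu_address)
{
    unsigned int dt_bits = toy_ilog2(max_cpu_address);

    return dt_bits < 32 ? dt_bits : 32;
}
```

With no addressing-limited masters, of_dma_get_max_cpu_address() returns PHYS_ADDR_MAX and ZONE_DMA spans the full 32-bit space; on the Pi 4 the 1 GB limit yields 30 bits.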
[PATCH v4 4/7] of: unittest: Add test for of_dma_get_max_cpu_address()
Introduce a test for of_dma_get_max_cpu_address(); it uses the same DT data as the rest of the dma-ranges unit tests. Signed-off-by: Nicolas Saenz Julienne --- Changes since v3: - Remove HAS_DMA guards drivers/of/unittest.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index 06cc988faf78..b9a4d047a95e 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -869,6 +869,23 @@ static void __init of_unittest_changeset(void) #endif } +static void __init of_unittest_dma_get_max_cpu_address(void) +{ + struct device_node *np; + phys_addr_t cpu_addr; + + np = of_find_node_by_path("/testcase-data/address-tests"); + if (!np) { + pr_err("missing testcase data\n"); + return; + } + + cpu_addr = of_dma_get_max_cpu_address(np); + unittest(cpu_addr == 0x5000, +"of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting %x)\n", +&cpu_addr, 0x5000); +} + static void __init of_unittest_dma_ranges_one(const char *path, u64 expect_dma_addr, u64 expect_paddr) { @@ -3266,6 +3283,7 @@ static int __init of_unittest(void) of_unittest_changeset(); of_unittest_parse_interrupts(); of_unittest_parse_interrupts_extended(); + of_unittest_dma_get_max_cpu_address(); of_unittest_parse_dma_ranges(); of_unittest_pci_dma_ranges(); of_unittest_match_node(); -- 2.28.0
[PATCH v4 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()
crashkernel might reserve memory located in ZONE_DMA. We plan to delay ZONE_DMA's initialization after unflattening the devicetree and ACPI's boot table initialization, so move it later in the boot process. Specifically into mem_init(), this is the last place crashkernel will be able to reserve the memory before the page allocator kicks in. There isn't any apparent reason for doing this earlier. Signed-off-by: Nicolas Saenz Julienne --- arch/arm64/mm/init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 095540667f0f..fc4ab0d6d5d2 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -386,8 +386,6 @@ void __init arm64_memblock_init(void) else arm64_dma32_phys_limit = PHYS_MASK + 1; - reserve_crashkernel(); - reserve_elfcorehdr(); high_memory = __va(memblock_end_of_DRAM() - 1) + 1; @@ -508,6 +506,8 @@ void __init mem_init(void) else swiotlb_force = SWIOTLB_NO_FORCE; + reserve_crashkernel(); + set_max_mapnr(max_pfn - PHYS_PFN_OFFSET); #ifndef CONFIG_SPARSEMEM_VMEMMAP -- 2.28.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 2/7] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
zone_dma_bits's initialization happens earlier than it's actually needed, in arm64_memblock_init(). So move it into the more suitable zone_sizes_init(). Signed-off-by: Nicolas Saenz Julienne --- arch/arm64/mm/init.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index fc4ab0d6d5d2..410721fc4fc0 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -190,6 +190,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) unsigned long max_zone_pfns[MAX_NR_ZONES] = {0}; #ifdef CONFIG_ZONE_DMA + zone_dma_bits = ARM64_ZONE_DMA_BITS; + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit); #endif #ifdef CONFIG_ZONE_DMA32 @@ -376,11 +378,6 @@ void __init arm64_memblock_init(void) early_init_fdt_scan_reserved_mem(); - if (IS_ENABLED(CONFIG_ZONE_DMA)) { - zone_dma_bits = ARM64_ZONE_DMA_BITS; - arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS); - } - if (IS_ENABLED(CONFIG_ZONE_DMA32)) arm64_dma32_phys_limit = max_zone_phys(32); else -- 2.28.0
[PATCH v4 3/7] of/address: Introduce of_dma_get_max_cpu_address()
Introduce of_dma_get_max_cpu_address(), which provides the highest CPU physical address addressable by all DMA masters in the system. It's especially useful for setting memory zone sizes at early boot time. Signed-off-by: Nicolas Saenz Julienne --- Changes since v3: - use u64 with cpu_end Changes since v2: - Use PHYS_ADDR_MAX - return phys_dma_t - Rename function - Correct subject - Add support to start parsing from an arbitrary device node in order for the function to work with unit tests drivers/of/address.c | 42 ++ include/linux/of.h | 7 +++ 2 files changed, 49 insertions(+) diff --git a/drivers/of/address.c b/drivers/of/address.c index eb9ab4f1e80b..47dfe5881e18 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map) } #endif /* CONFIG_HAS_DMA */ +/** + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA + * @np: The node to start searching from or NULL to start from the root + * + * Gets the highest CPU physical address that is addressable by all DMA masters + * in the sub-tree pointed by np, or the whole tree if NULL is passed. If no + * DMA constrained device is found, it returns PHYS_ADDR_MAX.
+ */ +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np) +{ + phys_addr_t max_cpu_addr = PHYS_ADDR_MAX; + struct of_range_parser parser; + phys_addr_t subtree_max_addr; + struct device_node *child; + struct of_range range; + const __be32 *ranges; + u64 cpu_end = 0; + int len; + + if (!np) + np = of_root; + + ranges = of_get_property(np, "dma-ranges", ); + if (ranges && len) { + of_dma_range_parser_init(, np); + for_each_of_range(, ) + if (range.cpu_addr + range.size > cpu_end) + cpu_end = range.cpu_addr + range.size; + + if (max_cpu_addr > cpu_end) + max_cpu_addr = cpu_end; + } + + for_each_available_child_of_node(np, child) { + subtree_max_addr = of_dma_get_max_cpu_address(child); + if (max_cpu_addr > subtree_max_addr) + max_cpu_addr = subtree_max_addr; + } + + return max_cpu_addr; +} + /** * of_dma_is_coherent - Check if device is coherent * @np:device node diff --git a/include/linux/of.h b/include/linux/of.h index 481ec0467285..db8db8f2c967 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id, const char *map_name, const char *map_mask_name, struct device_node **target, u32 *id_out); +phys_addr_t of_dma_get_max_cpu_address(struct device_node *np); + #else /* CONFIG_OF */ static inline void of_core_init(void) @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id, return -EINVAL; } +static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np) +{ + return PHYS_ADDR_MAX; +} + #define of_match_ptr(_ptr) NULL #define of_match_node(_matches, _node) NULL #endif /* CONFIG_OF */ -- 2.28.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/4] iommu/mediatek: Add iotlb_sync_range() support
On 2020-10-19 12:30, Chao Hao wrote:
> MTK_IOMMU driver writes one page entry and does tlb flush at a time
> currently. More optimal would be to aggregate the writes and flush BUS
> buffer in the end.

That's exactly what iommu_iotlb_gather_add_page() is meant to achieve.
Rather than jumping straight into hacking up a new API to go round the
back of the existing API design, it would be far better to ask the
question of why that's not behaving as expected.

> For 50MB buffer mapping, if mtk_iommu driver use iotlb_sync_range()
> instead of tlb_add_range() and tlb_flush_walk/leaf(), it can increase
> 50% performance or more (depending on size of every page size) in
> comparison to flushing after each page entry update. So we prefer to
> use iotlb_sync_range() to replace iotlb_sync(), tlb_add_range() and
> tlb_flush_walk/leaf() for MTK platforms.

In the case of mapping, it sounds like what you actually want to do is
hook up .iotlb_sync_map and generally make IO_PGTABLE_QUIRK_TLBI_ON_MAP
cleverer, because the current implementation is as dumb as it could
possibly be. In fact if we simply passed an address range to
.iotlb_sync_map, io-pgtable probably wouldn't need to be involved at
all any more.

Robin.
Signed-off-by: Chao Hao
---
 drivers/iommu/mtk_iommu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 785b228d39a6..d3400c15ff7b 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -224,6 +224,11 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
 	}
 }
 
+static void __mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size)
+{
+	mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL);
+}
+
 static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
 					    unsigned long iova, size_t granule,
 					    void *cookie)
@@ -536,6 +541,7 @@ static const struct iommu_ops mtk_iommu_ops = {
 	.map		= mtk_iommu_map,
 	.unmap		= mtk_iommu_unmap,
 	.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
+	.iotlb_sync_range = __mtk_iommu_tlb_flush_range_sync,
 	.iotlb_sync	= mtk_iommu_iotlb_sync,
 	.iova_to_phys	= mtk_iommu_iova_to_phys,
 	.probe_device	= mtk_iommu_probe_device,
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:
> On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > > I think we agreed (or agree to disagree and commit) for device types that
> > > > > we have for SIOV, VFIO based approach works well without having to re-invent
> > > > > another way to do the same things. Not looking for a shortcut by any means,
> > > > > but we need to plan around existing hardware though. Looks like vDPA took
> > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all
> > > > > necessary hardware was available.. This would be a solved puzzle.
> > > >
> > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as
> > > > the "home" and needs to stand alone.
> > > >
> > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> > >
> > > So just to make this clear, I did check internally if there are any plans
> > > for vDPA + SVM. There are none at the moment.
> >
> > Not SVM, SIOV.
>
> ... And that included.. I should have said vDPA + PASID, No current plans.
> I have no idea who set expectations with you. Can you please put me in touch
> with that person, privately is fine.

It was the team that argued VDPA had to be done through VFIO - SIOV
and PASID was one of their reasons it had to be VFIO, check the list
archives

If they didn't plan to use it, bit of a strawman argument, right?

Jason