Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 29, 2018 at 10:53:32AM -0800, Linus Torvalds wrote: > Most of the high-performance IO is already using SG lists anyway, no? > Disk/networking/whatever. Networking basically never uses S/G lists. Block I/O mostly uses it, and graphics / media seems to have a fair amount of S/G uses, including very, erm special ones. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 29, 2018 at 10:31 AM Christoph Hellwig wrote: > > > Or, better yet, plan on removing the single-page dma mappign entirely > > at a later date, and make the issue moot. > > What would be the replacement? Build a S/G list for every single page > mapping? Not sure that would create a lot of happy campers.. It's what we ended up doing with some other cases, and it didn't really end up hurting as much as I thought it would. I'm thinking of the vfs functions that end up turning "buf, len" into struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len }; and then passing it around as a single-entry iov instead (not even that - they end up being an iov_iter, which is not just the iov, but the whole "what _kind_ of iov" indirection) Maybe a very similar model could be used for just simplifying the core dma mapping setup: sure, people will want to do single-area dma, but how bad would it be to just turn them into single-entry SG lists on stack, and then the dma-maping internally would just always see that? Most of the high-performance IO is already using SG lists anyway, no? Disk/networking/whatever. But just an idea. And the "map_sg()" error handling isn't actually any better, I think. It returns zero on error, no? So it's not improving the error handling. The whole dma-mapping layer seems full of those kinds of "inspired error handling choices" ;) Linus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 29, 2018 at 09:44:05AM -0800, Linus Torvalds wrote: > No. Really. If there's no iotlb, then you just mark that one page > reserved. It simply doesn't get used. It doesn't mean you suddenly > need a swiotlb. Sure, we could just skip that page entirely based on dma_to_phys. > But whatever. It's independent from the patch series under discussion. > Make dma_mapping_error() at least return a real error (eg -EINVAL, or > whatever is the common error), and we can maybe do this later. Ok, I'll do that. > Or, better yet, plan on removing the single-page dma mappign entirely > at a later date, and make the issue moot. What would be the replacement? Build a S/G list for every single page mapping? Not sure that would create a lot of happy campers.. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 29, 2018 at 8:23 AM Christoph Hellwig wrote: > > We can. At least in theory. The problem is that depending on the > crazy mapping from physical and kernel virtual address to dma addresses > these might be pages at pretty random places. Look at fun like > arch/x86/pci/sta2x11-fixup.c for how ugly these mappings could look. > > It also means that we might have setup swiotlb on just about every > 32-bit architecture, even if it has no real addressing limit except for > the one we imposed. No. Really. If there's no iotlb, then you just mark that one page reserved. It simply doesn't get used. It doesn't mean you suddenly need a swiotlb. If there *is* a iotlb, none of this should matter, because you'd just never map anything into that page. But whatever. It's independent from the patch series under discussion. Make dma_mapping_error() at least return a real error (eg -EINVAL, or whatever is the common error), and we can maybe do this later. Or, better yet, plan on removing the single-page dma mappign entirely at a later date, and make the issue moot. Linus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 11:19:15AM -0800, Linus Torvalds wrote: > Let me just paste it back in here: > > "Which is what we ALREADY do for these exact reasons. If the DMA > mappings means that you'd need to add one more page to that list of > reserved pages, then so be it." > > So no, I'm not at all confused. > > Let me re-iterate: the argument that all addresses have to be dma'able is > garbage. > > *Exactly* as with kmalloc and limited virtual addresses, we can limit > physical addresses. We can. At least in theory. The problem is that depending on the crazy mapping from physical and kernel virtual address to dma addresses these might be pages at pretty random places. Look at fun like arch/x86/pci/sta2x11-fixup.c for how ugly these mappings could look. It also means that we might have setup swiotlb on just about every 32-bit architecture, even if it has no real addressing limit except for the one we imposed. I don't really see how this is a win for us just to be able to report more detailed error codes, which would be nice to have, but the lack of which hasn't really harmed us. So as far as I'm concerned I'd go either with the series that we are discussing here, or change the map_page method to return an errno and the dma_addr_t in the argument. As Davem pointed out that can lead to less optimal code, but it would still be better than the indirect call we have. But then again I think the series as posted here might and up much simpler and good enough without opening up this rathole. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 12:48 PM Russell King - ARM Linux wrote: > > On Wed, Nov 28, 2018 at 11:27:17AM -0800, David Miller wrote: > > From: Linus Torvalds > > Date: Wed, 28 Nov 2018 10:00:06 -0800 > > > > > Not all memory is accessible even to the kernel. If you have memory > > > that shows up in the last page of phys_addr_t, you just mark it > > > reserved at boot-time. > > > > It's not the physical memory at the end that needs to be reserved. > > > > It's the IOMMU mapping arena. > > True, if and only if you have an IOMMU. > > Where there isn't an IOMMU, then we'd have to reserve every page that > that translates to a bus address in the top 4K of dma_addr_t on any > bus in the system - that means knowing early in the kernel > initialisation about all buses in the system so we can detect and > reserve these pages. > The arch and platform differences/confusion as to "what is DMA error value" is the reason why dma_mapping_error is part dma ops. I understand that iy would make sense to reduce the overhead of an additional call, however, I am not sure if would be trivial change to address this. I was down this path of trying to address the missing mapping error checks a few years ago and introduced dma_mapping_error checks in the DMA DEBUG API. As you might already know that there is no common definition for the mapping error. Quick look at the defines shows: #define CALGARY_MAPPING_ERROR 0 #define S390_MAPPING_ERROR (~(dma_addr_t) 0x0) #define SPARC_MAPPING_ERROR (~(dma_addr_t)0x0) This is the reason why there is a arch or in some cases bus specific mapping_error ops is needed. We could unify this and fix all these first. I haven't looked at the patch set closely, maybe you are already doing this. thanks, -- Shuah ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 11:27:17AM -0800, David Miller wrote: > From: Linus Torvalds > Date: Wed, 28 Nov 2018 10:00:06 -0800 > > > Not all memory is accessible even to the kernel. If you have memory > > that shows up in the last page of phys_addr_t, you just mark it > > reserved at boot-time. > > It's not the physical memory at the end that needs to be reserved. > > It's the IOMMU mapping arena. True, if and only if you have an IOMMU. Where there isn't an IOMMU, then we'd have to reserve every page that that translates to a bus address in the top 4K of dma_addr_t on any bus in the system - that means knowing early in the kernel initialisation about all buses in the system so we can detect and reserve these pages. I don't think that's trivial to do. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 11:19:15AM -0800, Linus Torvalds wrote: > On Wed, Nov 28, 2018, 10:08 Russell King - ARM Linux wrote: > > > > > > > You already cannot do that kmalloc(), exactly because of ERR_PTR(). > > > > I'm very sorry, but I think you are confused. > > > > kmalloc() returns a _virtual_ address, which quite rightly must not be > > in the top 4K of 4GB, exactly due to ERR_PTR(). That is fine. > > > > However, that is a completely different kettle of fish from a physical > > or DMA address - neither of which are virtual addresses. > > > > You cut away the part that showed that I'm not in the least confused. > > Let me just paste it back in here: > > "Which is what we ALREADY do for these exact reasons. If the DMA > mappings means that you'd need to add one more page to that list of > reserved pages, then so be it." We're not talking about just one more page. We're talking about _every_ page that _may_ map to a bus address in the range of 4GB - 4K (a point I've already made earlier in this discussion.) It's not just about physical addresses, it's about bus addresses, and there are buses out there that have a translation between physical address and bus address without IOMMUs and the like. Can we know ahead of time about all these translations? It means moving that information out of various bus drivers into core architecture code (yuck, when you have multiple different platforms) or into DT (which means effectively breaking every damn platform that's using older DT files) - something that we don't have to do today. I remain 100% opposed to your idea. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
From: Linus Torvalds Date: Wed, 28 Nov 2018 10:00:06 -0800 > Not all memory is accessible even to the kernel. If you have memory > that shows up in the last page of phys_addr_t, you just mark it > reserved at boot-time. It's not the physical memory at the end that needs to be reserved. It's the IOMMU mapping arena. That basically requires changes to every IOMMU implementation in the tree. Not saying it's hard or impossible, but it is real meticulous work that needs to be done in order to realize this. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 06:08:41PM +, Russell King - ARM Linux wrote: > On Wed, Nov 28, 2018 at 10:00:06AM -0800, Linus Torvalds wrote: > > On Wed, Nov 28, 2018 at 9:45 AM Russell King - ARM Linux > > wrote: > > > > > > > I don't think this is a huge deal, but ERR_PTR() has been hugely > > > > successful elsewhere. And I'm not hugely convinced about all these > > > > "any address can be valid" arguments. How the hell do you generate a > > > > random dma address in the last page that isn't even page-aligned? > > > > > > kmalloc() a 64-byte buffer, dma_map_single() that buffer. > > > > No. > > > > You already cannot do that kmalloc(), exactly because of ERR_PTR(). > > I'm very sorry, but I think you are confused. > > kmalloc() returns a _virtual_ address, which quite rightly must not be > in the top 4K of 4GB, exactly due to ERR_PTR(). That is fine. > > However, that is a completely different kettle of fish from a physical > or DMA address - neither of which are virtual addresses. > > Now, say we have 1GB of RAM which starts at 0xc000 _physical_. > The kernel is configured with a 2GB/2GB user/kernel split, which means > all 1GB of RAM is mapped as lowmem from 0x8000 to 0xbfff > inclusive. This means kmalloc() can return any address in that range. > > ERR_PTR() will work correctly on any of those pointers, meaning that > none of them will be seen as an error. > > However, map any virtual address in the range of 0xb000 to > 0xbfff into DMA space, and the resulting DMA address could well > be in the range of 0xf000 to 0x - precisely the range > of addresses that you are advocating to be used for error codes. > > > The whole argument of "every possible piece of memory is DMA'able" is > > just wrong. > > I'm very sorry, but I do not buy your argument - you are conflating > virtual addresses which ERR_PTR() deals in with physical and bus > addresses - and if you persist down this route, you will cause > regressions. Here's another case: i.MX6 with 4GB of RAM. Devices are mapped to 0-0x0fff physical, RAM is mapped to 0x1000-0x physical. The last 256MB of RAM is not accessible as this is a 32-bit device. DMA addresses are the same as physical addresses. While the final physical page will be highmem in a normal kernel, and thus will not be available for kmalloc(), that doesn't mean it can't happen. A crashdump kernel loaded high in physical memory (eg, last 512MB and given the last 512MB to play around in) would have the top 512MB as lowmem, and therefore available for kmalloc(). If a page is available in lowmem, it's available for kmalloc(), and we can't say that we will never allocate memory from such a page for DMA - if we do and we're using an IS_ERR_VALUE() scheme, it _will_ break if that happens as memory will end up being mapped by the DMA API but dma_mapping_error() will see it as a failure. It won't be an obvious breakage, because it depends on the right conditions happening - a kmalloc() from the top page of physical RAM and that being passed to dma_map_single(). IOW, it's not something that a quick boot test would find, it's something that is likely to cause failures after a system has been running for a period of time. There are other situations where there are possibilities - such as: dma_map_page(dev, page, offset, size, direction) If 'page' is a highmem page which happens to be the top page in the 4GB space, and offset is non-zero, and there's a 1:1 mapping between physical address and DMA address, the returned value will be 0xf000 + offset - within the "last 4095 values are errors" range. Networking uses this for fragments - the packet fragment list is a list of pages, offsets and sizes - we have sendpage() that may end up finding that last page, and TCP-sized packets may be generated from it which would certianly result in non-zero offsets being passed to dma_map_page(). So, whatever way _I_ look at it, I find your proposal to be unsafe and potentially regression causing, and I *completely* and strongly oppose it in its current form. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018, 10:08 Russell King - ARM Linux > > > You already cannot do that kmalloc(), exactly because of ERR_PTR(). > > I'm very sorry, but I think you are confused. > > kmalloc() returns a _virtual_ address, which quite rightly must not be > in the top 4K of 4GB, exactly due to ERR_PTR(). That is fine. > > However, that is a completely different kettle of fish from a physical > or DMA address - neither of which are virtual addresses. > You cut away the part that showed that I'm not in the least confused. Let me just paste it back in here: "Which is what we ALREADY do for these exact reasons. If the DMA mappings means that you'd need to add one more page to that list of reserved pages, then so be it." So no, I'm not at all confused. Let me re-iterate: the argument that all addresses have to be dma'able is garbage. *Exactly* as with kmalloc and limited virtual addresses, we can limit physical addresses. Linus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 10:00:06AM -0800, Linus Torvalds wrote: > On Wed, Nov 28, 2018 at 9:45 AM Russell King - ARM Linux > wrote: > > > > > I don't think this is a huge deal, but ERR_PTR() has been hugely > > > successful elsewhere. And I'm not hugely convinced about all these > > > "any address can be valid" arguments. How the hell do you generate a > > > random dma address in the last page that isn't even page-aligned? > > > > kmalloc() a 64-byte buffer, dma_map_single() that buffer. > > No. > > You already cannot do that kmalloc(), exactly because of ERR_PTR(). I'm very sorry, but I think you are confused. kmalloc() returns a _virtual_ address, which quite rightly must not be in the top 4K of 4GB, exactly due to ERR_PTR(). That is fine. However, that is a completely different kettle of fish from a physical or DMA address - neither of which are virtual addresses. Now, say we have 1GB of RAM which starts at 0xc000 _physical_. The kernel is configured with a 2GB/2GB user/kernel split, which means all 1GB of RAM is mapped as lowmem from 0x8000 to 0xbfff inclusive. This means kmalloc() can return any address in that range. ERR_PTR() will work correctly on any of those pointers, meaning that none of them will be seen as an error. However, map any virtual address in the range of 0xb000 to 0xbfff into DMA space, and the resulting DMA address could well be in the range of 0xf000 to 0x - precisely the range of addresses that you are advocating to be used for error codes. > The whole argument of "every possible piece of memory is DMA'able" is > just wrong. I'm very sorry, but I do not buy your argument - you are conflating virtual addresses which ERR_PTR() deals in with physical and bus addresses - and if you persist down this route, you will cause regressions. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 9:45 AM Russell King - ARM Linux wrote: > > > I don't think this is a huge deal, but ERR_PTR() has been hugely > > successful elsewhere. And I'm not hugely convinced about all these > > "any address can be valid" arguments. How the hell do you generate a > > random dma address in the last page that isn't even page-aligned? > > kmalloc() a 64-byte buffer, dma_map_single() that buffer. No. You already cannot do that kmalloc(), exactly because of ERR_PTR(). Not all memory is accessible even to the kernel. If you have memory that shows up in the last page of phys_addr_t, you just mark it reserved at boot-time. Which is what we ALREADY do for these exact reasons. If the DMA mappings means that you'd need to add one more page to that list of reserved pages, then so be it. The whole argument of "every possible piece of memory is DMA'able" is just wrong. Linus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Wed, Nov 28, 2018 at 08:47:05AM -0800, Linus Torvalds wrote: > On Tue, Nov 27, 2018 at 11:41 PM Christoph Hellwig wrote: > > > > On Fri, Nov 23, 2018 at 07:55:11AM +0100, Christoph Hellwig wrote: > > > Well, I can tweak the last patch to return -EINVAL from dma_mapping_error > > > instead of the old 1 is as bool true. The callers should all be fine, > > > although I'd have to audit them. Still wouldn't help with being able to > > > return different errors. > > > > Any opinions? I'd really like to make some forward progress on this > > series. > > So I do think that yes, dma_mapping_error() should return an error > code, not 0/1. > > But I was really hoping that the individual drivers themselves could > return error codes. Right now the patch-series has code like this: > > ret = needs_bounce(dev, dma_addr, size); > if (ret < 0) > - return ARM_MAPPING_ERROR; > + return DMA_MAPPING_ERROR; > > which while it all makes sense in the context of this patch-series, I > *really* think it would have been so much nicer to return the error > code 'ret' instead (which in this case is -E2BIG). > > I don't think this is a huge deal, but ERR_PTR() has been hugely > successful elsewhere. And I'm not hugely convinced about all these > "any address can be valid" arguments. How the hell do you generate a > random dma address in the last page that isn't even page-aligned? kmalloc() a 64-byte buffer, dma_map_single() that buffer. If you have RAM that maps to a _bus_ address in the top page of 4GB of a 32-bit bus address, then you lose. Simples. Subsystems like I2C, SPI, USB etc all deal with small kmalloc'd buffers and their drivers make use of DMA. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Tue, Nov 27, 2018 at 11:41 PM Christoph Hellwig wrote: > > On Fri, Nov 23, 2018 at 07:55:11AM +0100, Christoph Hellwig wrote: > > Well, I can tweak the last patch to return -EINVAL from dma_mapping_error > > instead of the old 1 is as bool true. The callers should all be fine, > > although I'd have to audit them. Still wouldn't help with being able to > > return different errors. > > Any opinions? I'd really like to make some forward progress on this > series. So I do think that yes, dma_mapping_error() should return an error code, not 0/1. But I was really hoping that the individual drivers themselves could return error codes. Right now the patch-series has code like this: ret = needs_bounce(dev, dma_addr, size); if (ret < 0) - return ARM_MAPPING_ERROR; + return DMA_MAPPING_ERROR; which while it all makes sense in the context of this patch-series, I *really* think it would have been so much nicer to return the error code 'ret' instead (which in this case is -E2BIG). I don't think this is a huge deal, but ERR_PTR() has been hugely successful elsewhere. And I'm not hugely convinced about all these "any address can be valid" arguments. How the hell do you generate a random dma address in the last page that isn't even page-aligned? Linus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Fri, Nov 23, 2018 at 07:55:11AM +0100, Christoph Hellwig wrote: > On Thu, Nov 22, 2018 at 09:55:25AM -0800, Linus Torvalds wrote: > > No, the big immediate benefit of allowing "return -EINVAL" etc is > > simply legibility and error avoidance. > > Well, I can tweak the last patch to return -EINVAL from dma_mapping_error > instead of the old 1 is as bool true. The callers should all be fine, > although I'd have to audit them. Still wouldn't help with being able to > return different errors. Any opinions? I'd really like to make some forward progress on this series. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Fri, Nov 23, 2018 at 02:03:13PM +0100, Joerg Roedel wrote: > On Fri, Nov 23, 2018 at 11:01:55AM +, Russell King - ARM Linux wrote: > > Yuck. So, if we have a 4GB non-PAE 32-bit system, or a PAE system > > where we have valid memory across the 4GB boundary and no IOMMU, > > we have to reserve the top 4K page in the first 4GB of RAM? > > But that is only needed when dma_addr_t is 32bit anyway, no? You said: But we can easily work around that by reserving the top 4k of the first 4GB of IOVA address space in the allocator, no? Then these values are never returned as valid DMA handles. To me, your proposal equates to this in code: int dma_error(dma_addr_t addr) { return (addr & ~(dma_addr_t)0xfff) == 0xf000 ? (s32)addr : 0; } Hence, the size of dma_addr_t would be entirely irrelevant. This makes dma_addr_t values in the range of 0xf000 to 0x special, irrespective of whether dma_addr_t is 32-bit or 64-bit. If that's not what you meant, then I'm afraid your statement wasn't worded very well - so please can you re-word to state more precisely what your proposal is? > > Rather than inventing magic cookies like this, I'd much rather we > > sanitised the API so that we have functions that return success or > > an error code, rather than trying to shoe-horn some kind of magic > > error codes into dma_addr_t and subtly break systems in the process. > > Sure, but is has the obvious downside that we need to touch every driver > that uses these functions, and that are probably a lot of drivers. So we have two options: - change the interface - subtly break drivers In any case, I disagree that we need to touch every driver. Today, drivers just do: if (dma_mapping_error(dev, addr)) and, because dma_mapping_error() returns a boolean, they only care about the true/falseness. If we're going to start returning error codes, then there's no point just returning error codes unless we have some way for drivers to use them. Yes, the simple way would be to make dma_mapping_error() translate addr to an error code, and that maintains compatibility with existing drivers. If, instead, we revamped the DMA API, and had the *new* mapping functions which returned an error code, then the existing mapping functions can be implemented as part of compatibility rather trivially: dma_addr_t dma_map_single(...) { dma_addr_t addr; int error; error = dma_map_single_error(..., ); if (error) addr = DMA_MAPPING_ERROR; return addr; } which means that if drivers want access to the error code, they use dma_map_single_error(), meanwhile existing drivers just get on with life as it _currently_ is, with the cookie-based all-ones error code - without introducing any of this potential breakage. Remember, existing drivers would need modification in _any_ case to make use of a returned error code, so the argument that we'd need to touch every driver doesn't really stand up. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Fri, Nov 23, 2018 at 11:01:55AM +, Russell King - ARM Linux wrote: > Yuck. So, if we have a 4GB non-PAE 32-bit system, or a PAE system > where we have valid memory across the 4GB boundary and no IOMMU, > we have to reserve the top 4K page in the first 4GB of RAM? But that is only needed when dma_addr_t is 32bit anyway, no? > Rather than inventing magic cookies like this, I'd much rather we > sanitised the API so that we have functions that return success or > an error code, rather than trying to shoe-horn some kind of magic > error codes into dma_addr_t and subtly break systems in the process. Sure, but is has the obvious downside that we need to touch every driver that uses these functions, and that are probably a lot of drivers. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Fri, Nov 23, 2018 at 11:49:18AM +0100, Joerg Roedel wrote: > On Thu, Nov 22, 2018 at 05:52:15PM +, Robin Murphy wrote: > > Unfortunately, with things like the top-down IOVA allocator, and 32-bit > > systems in general, "the top 4095" values may well still be valid addresses > > - we're relying on a 1-byte mapping of the very top byte of memory/IOVA > > space being sufficiently ridiculous that no real code would ever do that, > > but even a 4-byte mapping of the top 4 bytes is within the realms of the > > plausible (I've definitely seen the USB layer make 8-byte mappings from any > > old offset within a page, for example). > > But we can easily work around that by reserving the top 4k of the first > 4GB of IOVA address space in the allocator, no? Then these values are > never returned as valid DMA handles. Yuck. So, if we have a 4GB non-PAE 32-bit system, or a PAE system where we have valid memory across the 4GB boundary and no IOMMU, we have to reserve the top 4K page in the first 4GB of RAM? Linus' IS_DMA_ERR() solution is way more preferable, but unfortunately it still falls short, because it knocks out the top 4K of every DMA capable bus. A DMA capable bus may involve some translation of the address (eg, by simple offsetting) which means that we'd need to potentially knock out multiple pages from the page allocator for each of those pages that correspond to the top 4K of any DMA capable bus. Knowing that at the right time at boot will be fun! It also sounds wasteful to me. Rather than inventing magic cookies like this, I'd much rather we sanitised the API so that we have functions that return success or an error code, rather than trying to shoe-horn some kind of magic error codes into dma_addr_t and subtly break systems in the process. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 05:52:15PM +, Robin Murphy wrote: > Unfortunately, with things like the top-down IOVA allocator, and 32-bit > systems in general, "the top 4095" values may well still be valid addresses > - we're relying on a 1-byte mapping of the very top byte of memory/IOVA > space being sufficiently ridiculous that no real code would ever do that, > but even a 4-byte mapping of the top 4 bytes is within the realms of the > plausible (I've definitely seen the USB layer make 8-byte mappings from any > old offset within a page, for example). But we can easily work around that by reserving the top 4k of the first 4GB of IOVA address space in the allocator, no? Then these values are never returned as valid DMA handles. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 06:05:26PM +, Russell King - ARM Linux wrote: > An alternative idea would be to migrate away from the > dma_map_single() and dma_map_page() interfaces that return a > dma_addr_t, and instead have them return an error code or zero > on success. See here for a proposal: https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030912.html That is just the attr variants, but that would be a start. Dave didn't particularly like it, though. > note the simpler unmap API, which inherently guarantees that the > parameters to the map could be carried over to the unmap - without > our many driver authors having to think about it. The problem is that we can often derive some or all parameters from field already inherent in the upper layer or hardware interface. So for these cases your version would bloat the required data structures. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 09:55:25AM -0800, Linus Torvalds wrote: > No, the big immediate benefit of allowing "return -EINVAL" etc is > simply legibility and error avoidance. Well, I can tweak the last patch to return -EINVAL from dma_mapping_error instead of the old 1 is as bool true. The callers should all be fine, although I'd have to audit them. Still wouldn't help with being able to return different errors. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 09:55:25AM -0800, Linus Torvalds wrote: > On Thu, Nov 22, 2018 at 9:52 AM Robin Murphy wrote: > > > > Unfortunately, with things like the top-down IOVA allocator, and 32-bit > > systems in general, "the top 4095" values may well still be valid > > addresses - > > Ugh. > > > The only immediate benefit I can see is that we could distinguish cases > > like the first which can never possibly succeed, and thus callers could > > actually give up instead of doing what various subsystems currently do > > and retry the exact same mapping indefinitely on the apparent assumption > > that errors must be transient. > > No, the big immediate benefit of allowing "return -EINVAL" etc is > simply legibility and error avoidance. > > It's basically how pretty much all the rest of the kernel returns > errors, so not only is it very obvious, it's also what people do > without even thinking.. So it would be good to work. An alternative idea would be to migrate away from the dma_map_single() and dma_map_page() interfaces that return a dma_addr_t, and instead have them return an error code or zero on success. I've thought for some time that our DMA interfaces aren't particularly friendly, especially after we had the stuff with PCI DMA which migrated its way into the DMA API: DEFINE_DMA_UNMAP_ADDR DEFINE_DMA_UNMAP_LEN dma_unmap_* When coupled that with the requirement that dma_unmap_*() should be called with the same parameters as dma_map_*(), I wonder why we never did: struct dma_map_state { dma_addr_t dma_addr; whatever's needed; } int dma_map_single(struct device *dev, struct dma_map_state *state, void *cpu, size_t len, enum dma_data_direction dir); void dma_unmap_single(struct device *dev, struct dma_map_state *state); note the simpler unmap API, which inherently guarantees that the parameters to the map could be carried over to the unmap - without our many driver authors having to think about it. That also paves the way for dma_map_single() to return an error code or zero. However, I fear that boat sailed long ago - but maybe its worth thinking along these lines if we want to sanitise the API now? -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 9:52 AM Robin Murphy wrote: > > Unfortunately, with things like the top-down IOVA allocator, and 32-bit > systems in general, "the top 4095" values may well still be valid > addresses - Ugh. > The only immediate benefit I can see is that we could distinguish cases > like the first which can never possibly succeed, and thus callers could > actually give up instead of doing what various subsystems currently do > and retry the exact same mapping indefinitely on the apparent assumption > that errors must be transient. No, the big immediate benefit of allowing "return -EINVAL" etc is simply legibility and error avoidance. It's basically how pretty much all the rest of the kernel returns errors, so not only is it very obvious, it's also what people do without even thinking.. So it would be good to work. Linus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On 22/11/2018 17:09, Linus Torvalds wrote: On Thu, Nov 22, 2018 at 9:07 AM Russell King - ARM Linux wrote: I'm afraid that won't work very well - 32 bit platforms with 64-bit addresses (LPAE) would have dma_addr_t as a 64-bit value, which wouldn't fit into an unsigned long. Good point. So we'd have to have a special IS_DMA_ERR() function that takes a dma_addr_t and checks the same "is it the top 4095 values". Because I'd still prefer to allow people to return the _actual_ error, and to have "return -Exyz" as the error case. That part still DTRT even with dma_addr_t. Unfortunately, with things like the top-down IOVA allocator, and 32-bit systems in general, "the top 4095" values may well still be valid addresses - we're relying on a 1-byte mapping of the very top byte of memory/IOVA space being sufficiently ridiculous that no real code would ever do that, but even a 4-byte mapping of the top 4 bytes is within the realms of the plausible (I've definitely seen the USB layer make 8-byte mappings from any old offset within a page, for example). Thus we'd have to go with the extra complication of detecting and carving out problematic memory maps in those corner cases as Russell alluded to, for the sake of better error reporting in places where, in the majority of cases, there's not really all that many ways to go wrong. Off the top of my head, I guess: -EINVAL if the address is outside the device's DMA mask (and there is no IOMMU or bounce buffer). -ENOSPC if there is an IOMMU or bounce buffer, but no free IOVA/buffer space currently. -ESOMETHINGWENTWRONGWITHIOMMUPAGETABLES or similar, because it's not like the caller can treat that as anything other than an opaque failure anyway. The only immediate benefit I can see is that we could distinguish cases like the first which can never possibly succeed, and thus callers could actually give up instead of doing what various subsystems currently do and retry the exact same mapping indefinitely on the apparent assumption that errors must be transient. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 08:50:47AM -0800, Linus Torvalds wrote: > On Thu, Nov 22, 2018 at 6:03 AM Christoph Hellwig wrote: > > > > The 0 return doesn't work for direct mappings that have ram at address > > zero and a lot of IOMMUs that start allocating bus space from address > > zero, so we can't consolidate on that, but I think we can move everyone > > to all-Fs, which the patch here does. > > Hmm. Maybe not limit it to just all ones, but actually use the > (standard, for the kernel) IS_ERR_VALUE()? > > That basically reserves the last 4095 values in an unsigned long for > error values. > > Then those functions could actually return *what* error they > encountered, using just plain > > return -ENOMEM; > > or whatever? Linus, I'm afraid that won't work very well - 32 bit platforms with 64-bit addresses (LPAE) would have dma_addr_t as a 64-bit value, which wouldn't fit into an unsigned long. IS_ERR_VALUE() would cast a 64-bit DMA address down to a 32-bit pointer (effectively masking with 0x). It would have the effect of making (eg) 0x_fVVV an error, where are any of the top 32-bits of a 64-bit bus address, and VVV is the error code value. That could be a problem if you hit it in several places throughout your available RAM... we'd have to mark every top page of RAM in a naturally aligned 4GB as unusable, as well as block the top page in natually aligned 4GB blocks from IOMMUs... and then what about buses that have weird offsets... So, I don't think the IS_ERR_VALUE() would work very well. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 09:09:47AM -0800, Linus Torvalds wrote: > On Thu, Nov 22, 2018 at 9:07 AM Russell King - ARM Linux > wrote: > > > > I'm afraid that won't work very well - 32 bit platforms with 64-bit > > addresses (LPAE) would have dma_addr_t as a 64-bit value, which > > wouldn't fit into an unsigned long. > > Good point. So we'd have to have a special IS_DMA_ERR() function that > takes a dma_addr_t and checks the same "is it the top 4095 values". No problem with that. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 9:07 AM Russell King - ARM Linux wrote: > > I'm afraid that won't work very well - 32 bit platforms with 64-bit > addresses (LPAE) would have dma_addr_t as a 64-bit value, which > wouldn't fit into an unsigned long. Good point. So we'd have to have a special IS_DMA_ERR() function that takes a dma_addr_t and checks the same "is it the top 4095 values". Because I'd still prefer to allow people to return the _actual_ error, and to have "return -Exyz" as the error case. That part still DTRT even with dma_addr_t. Linus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: remove the ->mapping_error method from dma_map_ops V2
On Thu, Nov 22, 2018 at 6:03 AM Christoph Hellwig wrote: > > The 0 return doesn't work for direct mappings that have ram at address > zero and a lot of IOMMUs that start allocating bus space from address > zero, so we can't consolidate on that, but I think we can move everyone > to all-Fs, which the patch here does. Hmm. Maybe not limit it to just all ones, but actually use the (standard, for the kernel) IS_ERR_VALUE()? That basically reserves the last 4095 values in an unsigned long for error values. Then those functions could actually return *what* error they encountered, using just plain return -ENOMEM; or whatever? Linus ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu