Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 10:53:32AM -0800, Linus Torvalds wrote:
> Most of the high-performance IO is already using SG lists anyway, no?
> Disk/networking/whatever.

Networking basically never uses S/G lists.  Block I/O mostly uses it,
and graphics / media seems to have a fair amount of S/G uses, including
very, erm special ones.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-29 Thread Linus Torvalds
On Thu, Nov 29, 2018 at 10:31 AM Christoph Hellwig  wrote:
>
> > Or, better yet, plan on removing the single-page dma mappign entirely
> > at a later date, and make the issue moot.
>
> What would be the replacement?  Build a S/G list for every single page
> mapping?  Not sure that would create a lot of happy campers..

It's what we ended up doing with some other cases, and it didn't
really end up hurting as much as I thought it would.

I'm thinking of the vfs functions that end up turning "buf, len" into

struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };

and then passing it around as a single-entry iov instead (not even
that - they end up being an iov_iter, which is not just the iov, but
the whole "what _kind_ of iov" indirection)

Maybe a very similar model could be used for just simplifying the core
dma mapping setup: sure, people will want to do single-area dma, but
how bad would it be to just turn them into single-entry SG lists on
stack, and then the dma-maping internally would just always see that?

Most of the high-performance IO is already using SG lists anyway, no?
Disk/networking/whatever.

But just an idea. And the "map_sg()" error handling isn't actually any
better, I think. It returns zero on error, no? So it's not improving
the error handling.

The whole dma-mapping layer seems full of those kinds of "inspired
error handling choices" ;)

  Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 09:44:05AM -0800, Linus Torvalds wrote:
> No. Really. If there's no iotlb, then you just mark that one page
> reserved. It simply doesn't get used. It doesn't mean you suddenly
> need a swiotlb.

Sure, we could just skip that page entirely based on dma_to_phys.

> But whatever. It's independent from the patch series under discussion.
> Make dma_mapping_error() at least return a real error (eg -EINVAL, or
> whatever is the common error), and we can maybe do this later.

Ok, I'll do that.

> Or, better yet, plan on removing the single-page dma mappign entirely
> at a later date, and make the issue moot.

What would be the replacement?  Build a S/G list for every single page
mapping?  Not sure that would create a lot of happy campers..
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-29 Thread Linus Torvalds
On Thu, Nov 29, 2018 at 8:23 AM Christoph Hellwig  wrote:
>
> We can.  At least in theory.  The problem is that depending on the
> crazy mapping from physical and kernel virtual address to dma addresses
> these might be pages at pretty random places.  Look at fun like
> arch/x86/pci/sta2x11-fixup.c for how ugly these mappings could look.
>
> It also means that we might have setup swiotlb on just about every
> 32-bit architecture, even if it has no real addressing limit except for
> the one we imposed.

No. Really. If there's no iotlb, then you just mark that one page
reserved. It simply doesn't get used. It doesn't mean you suddenly
need a swiotlb.

If there *is* a iotlb, none of this should matter, because you'd just
never map anything into that page.

But whatever. It's independent from the patch series under discussion.
Make dma_mapping_error() at least return a real error (eg -EINVAL, or
whatever is the common error), and we can maybe do this later.

Or, better yet, plan on removing the single-page dma mappign entirely
at a later date, and make the issue moot.

  Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-29 Thread Christoph Hellwig
On Wed, Nov 28, 2018 at 11:19:15AM -0800, Linus Torvalds wrote:
> Let me just paste it back in here:
> 
>   "Which is what we ALREADY do for these exact reasons. If the DMA
> mappings means that you'd need to add one more page to that list of
> reserved pages, then so be it."
> 
> So no, I'm not at all confused.
> 
> Let me re-iterate: the argument that all addresses have to be dma'able is
> garbage.
> 
> *Exactly* as with kmalloc and limited virtual addresses, we can limit
> physical addresses.

We can.  At least in theory.  The problem is that depending on the
crazy mapping from physical and kernel virtual address to dma addresses
these might be pages at pretty random places.  Look at fun like
arch/x86/pci/sta2x11-fixup.c for how ugly these mappings could look.

It also means that we might have setup swiotlb on just about every
32-bit architecture, even if it has no real addressing limit except for
the one we imposed.  I don't really see how this is a win for us just
to be able to report more detailed error codes, which would be nice
to have, but the lack of which hasn't really harmed us.

So as far as I'm concerned I'd go either with the series that we are
discussing here, or change the map_page method to return an errno
and the dma_addr_t in the argument.  As Davem pointed out that can lead
to less optimal code, but it would still be better than the indirect
call we have.  But then again I think the series as posted here might
and up much simpler and good enough without opening up this rathole.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Shuah Khan
On Wed, Nov 28, 2018 at 12:48 PM Russell King - ARM Linux
 wrote:
>
> On Wed, Nov 28, 2018 at 11:27:17AM -0800, David Miller wrote:
> > From: Linus Torvalds 
> > Date: Wed, 28 Nov 2018 10:00:06 -0800
> >
> > > Not all memory is accessible even to the kernel. If you have memory
> > > that shows up in the last page of phys_addr_t, you just mark it
> > > reserved at boot-time.
> >
> > It's not the physical memory at the end that needs to be reserved.
> >
> > It's the IOMMU mapping arena.
>
> True, if and only if you have an IOMMU.
>
> Where there isn't an IOMMU, then we'd have to reserve every page that
> that translates to a bus address in the top 4K of dma_addr_t on any
> bus in the system - that means knowing early in the kernel
> initialisation about all buses in the system so we can detect and
> reserve these pages.
>

The arch and platform differences/confusion as to "what is DMA error
value" is the reason why dma_mapping_error is part dma ops. I
understand that iy would make sense to reduce the overhead of an
additional call, however, I am not sure if would be trivial change to
address this.

I was down this path of trying to address the missing mapping error
checks a few years ago and introduced dma_mapping_error checks in the
DMA DEBUG API. As you might already know that there is no common
definition for the mapping error.

Quick look at the defines shows:

#define CALGARY_MAPPING_ERROR 0
#define S390_MAPPING_ERROR  (~(dma_addr_t) 0x0)
#define SPARC_MAPPING_ERROR (~(dma_addr_t)0x0)

This is the reason why there is a arch or in some cases bus specific
mapping_error ops is needed.
We could unify this and fix all these first. I haven't looked at the
patch set closely, maybe you are already doing this.

thanks,
-- Shuah
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Russell King - ARM Linux
On Wed, Nov 28, 2018 at 11:27:17AM -0800, David Miller wrote:
> From: Linus Torvalds 
> Date: Wed, 28 Nov 2018 10:00:06 -0800
> 
> > Not all memory is accessible even to the kernel. If you have memory
> > that shows up in the last page of phys_addr_t, you just mark it
> > reserved at boot-time.
> 
> It's not the physical memory at the end that needs to be reserved.
> 
> It's the IOMMU mapping arena.

True, if and only if you have an IOMMU.

Where there isn't an IOMMU, then we'd have to reserve every page that
that translates to a bus address in the top 4K of dma_addr_t on any
bus in the system - that means knowing early in the kernel
initialisation about all buses in the system so we can detect and
reserve these pages.

I don't think that's trivial to do.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Russell King - ARM Linux
On Wed, Nov 28, 2018 at 11:19:15AM -0800, Linus Torvalds wrote:
> On Wed, Nov 28, 2018, 10:08 Russell King - ARM Linux  wrote:
> 
> > >
> > > You already cannot do that kmalloc(), exactly because of ERR_PTR().
> >
> > I'm very sorry, but I think you are confused.
> >
> > kmalloc() returns a _virtual_ address, which quite rightly must not be
> > in the top 4K of 4GB, exactly due to ERR_PTR().  That is fine.
> >
> > However, that is a completely different kettle of fish from a physical
> > or DMA address - neither of which are virtual addresses.
> >
> 
> You cut away the part that showed that I'm not in the least confused.
> 
> Let me just paste it back in here:
> 
>   "Which is what we ALREADY do for these exact reasons. If the DMA
> mappings means that you'd need to add one more page to that list of
> reserved pages, then so be it."

We're not talking about just one more page.  We're talking about
_every_ page that _may_ map to a bus address in the range of 4GB - 4K
(a point I've already made earlier in this discussion.)

It's not just about physical addresses, it's about bus addresses,
and there are buses out there that have a translation between physical
address and bus address without IOMMUs and the like.

Can we know ahead of time about all these translations?  It means
moving that information out of various bus drivers into core
architecture code (yuck, when you have multiple different platforms)
or into DT (which means effectively breaking every damn platform
that's using older DT files) - something that we don't have to do
today.

I remain 100% opposed to your idea.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread David Miller
From: Linus Torvalds 
Date: Wed, 28 Nov 2018 10:00:06 -0800

> Not all memory is accessible even to the kernel. If you have memory
> that shows up in the last page of phys_addr_t, you just mark it
> reserved at boot-time.

It's not the physical memory at the end that needs to be reserved.

It's the IOMMU mapping arena.

That basically requires changes to every IOMMU implementation in
the tree.  Not saying it's hard or impossible, but it is real
meticulous work that needs to be done in order to realize this.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Russell King - ARM Linux
On Wed, Nov 28, 2018 at 06:08:41PM +, Russell King - ARM Linux wrote:
> On Wed, Nov 28, 2018 at 10:00:06AM -0800, Linus Torvalds wrote:
> > On Wed, Nov 28, 2018 at 9:45 AM Russell King - ARM Linux
> >  wrote:
> > >
> > > > I don't think this is a huge deal, but ERR_PTR() has been hugely
> > > > successful elsewhere. And I'm not hugely convinced about all these
> > > > "any address can be valid" arguments. How the hell do you generate a
> > > > random dma address in the last page that isn't even page-aligned?
> > >
> > > kmalloc() a 64-byte buffer, dma_map_single() that buffer.
> > 
> > No.
> > 
> > You already cannot do that kmalloc(), exactly because of ERR_PTR().
> 
> I'm very sorry, but I think you are confused.
> 
> kmalloc() returns a _virtual_ address, which quite rightly must not be
> in the top 4K of 4GB, exactly due to ERR_PTR().  That is fine.
> 
> However, that is a completely different kettle of fish from a physical
> or DMA address - neither of which are virtual addresses.
> 
> Now, say we have 1GB of RAM which starts at 0xc000 _physical_.
> The kernel is configured with a 2GB/2GB user/kernel split, which means
> all 1GB of RAM is mapped as lowmem from 0x8000 to 0xbfff
> inclusive.  This means kmalloc() can return any address in that range.
> 
> ERR_PTR() will work correctly on any of those pointers, meaning that
> none of them will be seen as an error.
> 
> However, map any virtual address in the range of 0xb000 to
> 0xbfff into DMA space, and the resulting DMA address could well
> be in the range of 0xf000 to 0x - precisely the range
> of addresses that you are advocating to be used for error codes.
> 
> > The whole argument of "every possible piece of memory is DMA'able" is
> > just wrong.
> 
> I'm very sorry, but I do not buy your argument - you are conflating
> virtual addresses which ERR_PTR() deals in with physical and bus
> addresses - and if you persist down this route, you will cause
> regressions.

Here's another case:

i.MX6 with 4GB of RAM.  Devices are mapped to 0-0x0fff physical,
RAM is mapped to 0x1000-0x physical.  The last 256MB of
RAM is not accessible as this is a 32-bit device.  DMA addresses are
the same as physical addresses.

While the final physical page will be highmem in a normal kernel,
and thus will not be available for kmalloc(), that doesn't mean it
can't happen.  A crashdump kernel loaded high in physical memory
(eg, last 512MB and given the last 512MB to play around in) would
have the top 512MB as lowmem, and therefore available for kmalloc().

If a page is available in lowmem, it's available for kmalloc(), and
we can't say that we will never allocate memory from such a page for
DMA - if we do and we're using an IS_ERR_VALUE() scheme, it _will_
break if that happens as memory will end up being mapped by the DMA
API but dma_mapping_error() will see it as a failure.

It won't be an obvious breakage, because it depends on the right
conditions happening - a kmalloc() from the top page of physical
RAM and that being passed to dma_map_single().  IOW, it's not something
that a quick boot test would find, it's something that is likely to
cause failures after a system has been running for a period of time.

There are other situations where there are possibilities - such as:

dma_map_page(dev, page, offset, size, direction)

If 'page' is a highmem page which happens to be the top page in the
4GB space, and offset is non-zero, and there's a 1:1 mapping between
physical address and DMA address, the returned value will be
0xf000 + offset - within the "last 4095 values are errors"
range.

Networking uses this for fragments - the packet fragment list is
a list of pages, offsets and sizes - we have sendpage() that may
end up finding that last page, and TCP-sized packets may be
generated from it which would certianly result in non-zero offsets
being passed to dma_map_page().

So, whatever way _I_ look at it, I find your proposal to be unsafe
and potentially regression causing, and I *completely* and strongly
oppose it in its current form.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Linus Torvalds
On Wed, Nov 28, 2018, 10:08 Russell King - ARM Linux  >
> > You already cannot do that kmalloc(), exactly because of ERR_PTR().
>
> I'm very sorry, but I think you are confused.
>
> kmalloc() returns a _virtual_ address, which quite rightly must not be
> in the top 4K of 4GB, exactly due to ERR_PTR().  That is fine.
>
> However, that is a completely different kettle of fish from a physical
> or DMA address - neither of which are virtual addresses.
>

You cut away the part that showed that I'm not in the least confused.

Let me just paste it back in here:

  "Which is what we ALREADY do for these exact reasons. If the DMA
mappings means that you'd need to add one more page to that list of
reserved pages, then so be it."

So no, I'm not at all confused.

Let me re-iterate: the argument that all addresses have to be dma'able is
garbage.

*Exactly* as with kmalloc and limited virtual addresses, we can limit
physical addresses.

  Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Russell King - ARM Linux
On Wed, Nov 28, 2018 at 10:00:06AM -0800, Linus Torvalds wrote:
> On Wed, Nov 28, 2018 at 9:45 AM Russell King - ARM Linux
>  wrote:
> >
> > > I don't think this is a huge deal, but ERR_PTR() has been hugely
> > > successful elsewhere. And I'm not hugely convinced about all these
> > > "any address can be valid" arguments. How the hell do you generate a
> > > random dma address in the last page that isn't even page-aligned?
> >
> > kmalloc() a 64-byte buffer, dma_map_single() that buffer.
> 
> No.
> 
> You already cannot do that kmalloc(), exactly because of ERR_PTR().

I'm very sorry, but I think you are confused.

kmalloc() returns a _virtual_ address, which quite rightly must not be
in the top 4K of 4GB, exactly due to ERR_PTR().  That is fine.

However, that is a completely different kettle of fish from a physical
or DMA address - neither of which are virtual addresses.

Now, say we have 1GB of RAM which starts at 0xc000 _physical_.
The kernel is configured with a 2GB/2GB user/kernel split, which means
all 1GB of RAM is mapped as lowmem from 0x8000 to 0xbfff
inclusive.  This means kmalloc() can return any address in that range.

ERR_PTR() will work correctly on any of those pointers, meaning that
none of them will be seen as an error.

However, map any virtual address in the range of 0xb000 to
0xbfff into DMA space, and the resulting DMA address could well
be in the range of 0xf000 to 0x - precisely the range
of addresses that you are advocating to be used for error codes.

> The whole argument of "every possible piece of memory is DMA'able" is
> just wrong.

I'm very sorry, but I do not buy your argument - you are conflating
virtual addresses which ERR_PTR() deals in with physical and bus
addresses - and if you persist down this route, you will cause
regressions.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Linus Torvalds
On Wed, Nov 28, 2018 at 9:45 AM Russell King - ARM Linux
 wrote:
>
> > I don't think this is a huge deal, but ERR_PTR() has been hugely
> > successful elsewhere. And I'm not hugely convinced about all these
> > "any address can be valid" arguments. How the hell do you generate a
> > random dma address in the last page that isn't even page-aligned?
>
> kmalloc() a 64-byte buffer, dma_map_single() that buffer.

No.

You already cannot do that kmalloc(), exactly because of ERR_PTR().

Not all memory is accessible even to the kernel. If you have memory
that shows up in the last page of phys_addr_t, you just mark it
reserved at boot-time.

Which is what we ALREADY do for these exact reasons. If the DMA
mappings means that you'd need to add one more page to that list of
reserved pages, then so be it.

The whole argument of "every possible piece of memory is DMA'able" is
just wrong.

Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Russell King - ARM Linux
On Wed, Nov 28, 2018 at 08:47:05AM -0800, Linus Torvalds wrote:
> On Tue, Nov 27, 2018 at 11:41 PM Christoph Hellwig  wrote:
> >
> > On Fri, Nov 23, 2018 at 07:55:11AM +0100, Christoph Hellwig wrote:
> > > Well, I can tweak the last patch to return -EINVAL from dma_mapping_error
> > > instead of the old 1 is as bool true.  The callers should all be fine,
> > > although I'd have to audit them.  Still wouldn't help with being able to
> > > return different errors.
> >
> > Any opinions?  I'd really like to make some forward progress on this
> > series.
> 
> So I do think that yes, dma_mapping_error() should return an error
> code, not 0/1.
> 
> But I was really hoping that the individual drivers themselves could
> return error codes. Right now the patch-series has code like this:
> 
>   ret = needs_bounce(dev, dma_addr, size);
>   if (ret < 0)
> - return ARM_MAPPING_ERROR;
> + return DMA_MAPPING_ERROR;
> 
> which while it all makes sense in the context of this patch-series, I
> *really* think it would have been so much nicer to return the error
> code 'ret' instead (which in this case is -E2BIG).
> 
> I don't think this is a huge deal, but ERR_PTR() has been hugely
> successful elsewhere. And I'm not hugely convinced about all these
> "any address can be valid" arguments. How the hell do you generate a
> random dma address in the last page that isn't even page-aligned?

kmalloc() a 64-byte buffer, dma_map_single() that buffer.  If you
have RAM that maps to a _bus_ address in the top page of 4GB of a
32-bit bus address, then you lose.  Simples.

Subsystems like I2C, SPI, USB etc all deal with small kmalloc'd
buffers and their drivers make use of DMA.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-28 Thread Linus Torvalds
On Tue, Nov 27, 2018 at 11:41 PM Christoph Hellwig  wrote:
>
> On Fri, Nov 23, 2018 at 07:55:11AM +0100, Christoph Hellwig wrote:
> > Well, I can tweak the last patch to return -EINVAL from dma_mapping_error
> > instead of the old 1 is as bool true.  The callers should all be fine,
> > although I'd have to audit them.  Still wouldn't help with being able to
> > return different errors.
>
> Any opinions?  I'd really like to make some forward progress on this
> series.

So I do think that yes, dma_mapping_error() should return an error
code, not 0/1.

But I was really hoping that the individual drivers themselves could
return error codes. Right now the patch-series has code like this:

  ret = needs_bounce(dev, dma_addr, size);
  if (ret < 0)
- return ARM_MAPPING_ERROR;
+ return DMA_MAPPING_ERROR;

which while it all makes sense in the context of this patch-series, I
*really* think it would have been so much nicer to return the error
code 'ret' instead (which in this case is -E2BIG).

I don't think this is a huge deal, but ERR_PTR() has been hugely
successful elsewhere. And I'm not hugely convinced about all these
"any address can be valid" arguments. How the hell do you generate a
random dma address in the last page that isn't even page-aligned?

 Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-27 Thread Christoph Hellwig
On Fri, Nov 23, 2018 at 07:55:11AM +0100, Christoph Hellwig wrote:
> On Thu, Nov 22, 2018 at 09:55:25AM -0800, Linus Torvalds wrote:
> > No, the big immediate benefit of allowing "return -EINVAL" etc is
> > simply legibility and error avoidance.
> 
> Well, I can tweak the last patch to return -EINVAL from dma_mapping_error
> instead of the old 1 is as bool true.  The callers should all be fine,
> although I'd have to audit them.  Still wouldn't help with being able to
> return different errors.

Any opinions?  I'd really like to make some forward progress on this
series.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-23 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 02:03:13PM +0100, Joerg Roedel wrote:
> On Fri, Nov 23, 2018 at 11:01:55AM +, Russell King - ARM Linux wrote:
> > Yuck.  So, if we have a 4GB non-PAE 32-bit system, or a PAE system
> > where we have valid memory across the 4GB boundary and no IOMMU,
> > we have to reserve the top 4K page in the first 4GB of RAM?
> 
> But that is only needed when dma_addr_t is 32bit anyway, no?

You said:

  But we can easily work around that by reserving the top 4k of the first
  4GB of IOVA address space in the allocator, no? Then these values are
  never returned as valid DMA handles.

To me, your proposal equates to this in code:

int dma_error(dma_addr_t addr)
{
return (addr & ~(dma_addr_t)0xfff) == 0xf000 ? (s32)addr : 0; 
}

Hence, the size of dma_addr_t would be entirely irrelevant.  This
makes dma_addr_t values in the range of 0xf000 to 0x special,
irrespective of whether dma_addr_t is 32-bit or 64-bit.

If that's not what you meant, then I'm afraid your statement wasn't
worded very well - so please can you re-word to state more precisely
what your proposal is?

> > Rather than inventing magic cookies like this, I'd much rather we
> > sanitised the API so that we have functions that return success or
> > an error code, rather than trying to shoe-horn some kind of magic
> > error codes into dma_addr_t and subtly break systems in the process.
> 
> Sure, but is has the obvious downside that we need to touch every driver
> that uses these functions, and that are probably a lot of drivers.

So we have two options:
- change the interface
- subtly break drivers

In any case, I disagree that we need to touch every driver.  Today,
drivers just do:

if (dma_mapping_error(dev, addr))

and, because dma_mapping_error() returns a boolean, they only care about
the true/falseness.  If we're going to start returning error codes, then
there's no point just returning error codes unless we have some way for
drivers to use them.  Yes, the simple way would be to make
dma_mapping_error() translate addr to an error code, and that maintains
compatibility with existing drivers.

If, instead, we revamped the DMA API, and had the *new* mapping functions
which returned an error code, then the existing mapping functions can be
implemented as part of compatibility rather trivially:

dma_addr_t dma_map_single(...)
{
dma_addr_t addr;
int error;

error = dma_map_single_error(..., );
if (error)
addr = DMA_MAPPING_ERROR;
return addr;
}

which means that if drivers want access to the error code, they use
dma_map_single_error(), meanwhile existing drivers just get on with life
as it _currently_ is, with the cookie-based all-ones error code - without
introducing any of this potential breakage.

Remember, existing drivers would need modification in _any_ case to make
use of a returned error code, so the argument that we'd need to touch
every driver doesn't really stand up.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-23 Thread Joerg Roedel
On Fri, Nov 23, 2018 at 11:01:55AM +, Russell King - ARM Linux wrote:
> Yuck.  So, if we have a 4GB non-PAE 32-bit system, or a PAE system
> where we have valid memory across the 4GB boundary and no IOMMU,
> we have to reserve the top 4K page in the first 4GB of RAM?

But that is only needed when dma_addr_t is 32bit anyway, no?

> Rather than inventing magic cookies like this, I'd much rather we
> sanitised the API so that we have functions that return success or
> an error code, rather than trying to shoe-horn some kind of magic
> error codes into dma_addr_t and subtly break systems in the process.

Sure, but is has the obvious downside that we need to touch every driver
that uses these functions, and that are probably a lot of drivers.


Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-23 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 11:49:18AM +0100, Joerg Roedel wrote:
> On Thu, Nov 22, 2018 at 05:52:15PM +, Robin Murphy wrote:
> > Unfortunately, with things like the top-down IOVA allocator, and 32-bit
> > systems in general, "the top 4095" values may well still be valid addresses
> > - we're relying on a 1-byte mapping of the very top byte of memory/IOVA
> > space being sufficiently ridiculous that no real code would ever do that,
> > but even a 4-byte mapping of the top 4 bytes is within the realms of the
> > plausible (I've definitely seen the USB layer make 8-byte mappings from any
> > old offset within a page, for example).
> 
> But we can easily work around that by reserving the top 4k of the first
> 4GB of IOVA address space in the allocator, no? Then these values are
> never returned as valid DMA handles.

Yuck.  So, if we have a 4GB non-PAE 32-bit system, or a PAE system
where we have valid memory across the 4GB boundary and no IOMMU,
we have to reserve the top 4K page in the first 4GB of RAM?

Linus' IS_DMA_ERR() solution is way more preferable, but unfortunately
it still falls short, because it knocks out the top 4K of every DMA
capable bus.

A DMA capable bus may involve some translation of the address (eg, by
simple offsetting) which means that we'd need to potentially knock out
multiple pages from the page allocator for each of those pages that
correspond to the top 4K of any DMA capable bus.  Knowing that at the
right time at boot will be fun!  It also sounds wasteful to me.

Rather than inventing magic cookies like this, I'd much rather we
sanitised the API so that we have functions that return success or
an error code, rather than trying to shoe-horn some kind of magic
error codes into dma_addr_t and subtly break systems in the process.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-23 Thread Joerg Roedel
On Thu, Nov 22, 2018 at 05:52:15PM +, Robin Murphy wrote:
> Unfortunately, with things like the top-down IOVA allocator, and 32-bit
> systems in general, "the top 4095" values may well still be valid addresses
> - we're relying on a 1-byte mapping of the very top byte of memory/IOVA
> space being sufficiently ridiculous that no real code would ever do that,
> but even a 4-byte mapping of the top 4 bytes is within the realms of the
> plausible (I've definitely seen the USB layer make 8-byte mappings from any
> old offset within a page, for example).

But we can easily work around that by reserving the top 4k of the first
4GB of IOVA address space in the allocator, no? Then these values are
never returned as valid DMA handles.


Regards,

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Christoph Hellwig
On Thu, Nov 22, 2018 at 06:05:26PM +, Russell King - ARM Linux wrote:
> An alternative idea would be to migrate away from the
> dma_map_single() and dma_map_page() interfaces that return a
> dma_addr_t, and instead have them return an error code or zero
> on success.

See here for a proposal:

https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030912.html

That is just the attr variants, but that would be a start.  Dave didn't
particularly like it, though.

> note the simpler unmap API, which inherently guarantees that the
> parameters to the map could be carried over to the unmap - without
> our many driver authors having to think about it.

The problem is that we can often derive some or all parameters from
field already inherent in the upper layer or hardware interface.  So
for these cases your version would bloat the required data structures.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Christoph Hellwig
On Thu, Nov 22, 2018 at 09:55:25AM -0800, Linus Torvalds wrote:
> No, the big immediate benefit of allowing "return -EINVAL" etc is
> simply legibility and error avoidance.

Well, I can tweak the last patch to return -EINVAL from dma_mapping_error
instead of the old 1 is as bool true.  The callers should all be fine,
although I'd have to audit them.  Still wouldn't help with being able to
return different errors.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Russell King - ARM Linux
On Thu, Nov 22, 2018 at 09:55:25AM -0800, Linus Torvalds wrote:
> On Thu, Nov 22, 2018 at 9:52 AM Robin Murphy  wrote:
> >
> > Unfortunately, with things like the top-down IOVA allocator, and 32-bit
> > systems in general, "the top 4095" values may well still be valid
> > addresses -
> 
> Ugh.
> 
> > The only immediate benefit I can see is that we could distinguish cases
> > like the first which can never possibly succeed, and thus callers could
> > actually give up instead of doing what various subsystems currently do
> > and retry the exact same mapping indefinitely on the apparent assumption
> > that errors must be transient.
> 
> No, the big immediate benefit of allowing "return -EINVAL" etc is
> simply legibility and error avoidance.
> 
> It's basically how pretty much all the rest of the kernel returns
> errors, so not only is it very obvious, it's also what people do
> without even thinking.. So it would be good to work.

An alternative idea would be to migrate away from the
dma_map_single() and dma_map_page() interfaces that return a
dma_addr_t, and instead have them return an error code or zero
on success.

I've thought for some time that our DMA interfaces aren't particularly
friendly, especially after we had the stuff with PCI DMA which migrated
its way into the DMA API:

DEFINE_DMA_UNMAP_ADDR
DEFINE_DMA_UNMAP_LEN
dma_unmap_*

When coupled that with the requirement that dma_unmap_*() should be
called with the same parameters as dma_map_*(), I wonder why we never
did:

struct dma_map_state {
dma_addr_t dma_addr;
whatever's needed;
}

int dma_map_single(struct device *dev, struct dma_map_state *state,
   void *cpu, size_t len, enum dma_data_direction dir);
void dma_unmap_single(struct device *dev, struct dma_map_state *state);

note the simpler unmap API, which inherently guarantees that the
parameters to the map could be carried over to the unmap - without
our many driver authors having to think about it.

That also paves the way for dma_map_single() to return an error code
or zero.

However, I fear that boat sailed long ago - but maybe its worth
thinking along these lines if we want to sanitise the API now?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Linus Torvalds
On Thu, Nov 22, 2018 at 9:52 AM Robin Murphy  wrote:
>
> Unfortunately, with things like the top-down IOVA allocator, and 32-bit
> systems in general, "the top 4095" values may well still be valid
> addresses -

Ugh.

> The only immediate benefit I can see is that we could distinguish cases
> like the first which can never possibly succeed, and thus callers could
> actually give up instead of doing what various subsystems currently do
> and retry the exact same mapping indefinitely on the apparent assumption
> that errors must be transient.

No, the big immediate benefit of allowing "return -EINVAL" etc is
simply legibility and error avoidance.

It's basically how pretty much all the rest of the kernel returns
errors, so not only is it very obvious, it's also what people do
without even thinking.. So it would be good to work.

   Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Robin Murphy

On 22/11/2018 17:09, Linus Torvalds wrote:

On Thu, Nov 22, 2018 at 9:07 AM Russell King - ARM Linux
 wrote:


I'm afraid that won't work very well - 32 bit platforms with 64-bit
addresses (LPAE) would have dma_addr_t as a 64-bit value, which
wouldn't fit into an unsigned long.


Good point. So we'd have to have a special IS_DMA_ERR() function that
takes a dma_addr_t and checks the same "is it the top 4095 values".

Because I'd still prefer to allow people to return the _actual_ error,
and to have "return -Exyz" as the error case. That part still DTRT
even with dma_addr_t.


Unfortunately, with things like the top-down IOVA allocator, and 32-bit 
systems in general, "the top 4095" values may well still be valid 
addresses - we're relying on a 1-byte mapping of the very top byte of 
memory/IOVA space being sufficiently ridiculous that no real code would 
ever do that, but even a 4-byte mapping of the top 4 bytes is within the 
realms of the plausible (I've definitely seen the USB layer make 8-byte 
mappings from any old offset within a page, for example).


Thus we'd have to go with the extra complication of detecting and 
carving out problematic memory maps in those corner cases as Russell 
alluded to, for the sake of better error reporting in places where, in 
the majority of cases, there's not really all that many ways to go 
wrong. Off the top of my head, I guess:


-EINVAL if the address is outside the device's DMA mask (and there is no 
IOMMU or bounce buffer).


-ENOSPC if there is an IOMMU or bounce buffer, but no free IOVA/buffer 
space currently.


-ESOMETHINGWENTWRONGWITHIOMMUPAGETABLES or similar, because it's not 
like the caller can treat that as anything other than an opaque failure 
anyway.


The only immediate benefit I can see is that we could distinguish cases 
like the first which can never possibly succeed, and thus callers could 
actually give up instead of doing what various subsystems currently do 
and retry the exact same mapping indefinitely on the apparent assumption 
that errors must be transient.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Russell King - ARM Linux
On Thu, Nov 22, 2018 at 08:50:47AM -0800, Linus Torvalds wrote:
> On Thu, Nov 22, 2018 at 6:03 AM Christoph Hellwig  wrote:
> >
> > The 0 return doesn't work for direct mappings that have ram at address
> > zero and a lot of IOMMUs that start allocating bus space from address
> > zero, so we can't consolidate on that, but I think we can move everyone
> > to all-Fs, which the patch here does.
> 
> Hmm. Maybe not limit it to just all ones, but actually use the
> (standard, for the kernel) IS_ERR_VALUE()?
> 
> That basically reserves the last 4095 values in an unsigned long for
> error values.
> 
> Then those functions could actually return *what* error they
> encountered, using just plain
> 
> return -ENOMEM;
> 
> or whatever?

Linus,

I'm afraid that won't work very well - 32 bit platforms with 64-bit
addresses (LPAE) would have dma_addr_t as a 64-bit value, which
wouldn't fit into an unsigned long.

IS_ERR_VALUE() would cast a 64-bit DMA address down to a 32-bit
pointer (effectively masking with 0x).  It would have the
effect of making (eg) 0x_fVVV an error, where 
are any of the top 32-bits of a 64-bit bus address, and VVV is the
error code value.

That could be a problem if you hit it in several places throughout
your available RAM... we'd have to mark every top page of RAM in
a naturally aligned 4GB as unusable, as well as block the top page
in natually aligned 4GB blocks from IOMMUs... and then what about
buses that have weird offsets...

So, I don't think the IS_ERR_VALUE() would work very well.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Russell King - ARM Linux
On Thu, Nov 22, 2018 at 09:09:47AM -0800, Linus Torvalds wrote:
> On Thu, Nov 22, 2018 at 9:07 AM Russell King - ARM Linux
>  wrote:
> >
> > I'm afraid that won't work very well - 32 bit platforms with 64-bit
> > addresses (LPAE) would have dma_addr_t as a 64-bit value, which
> > wouldn't fit into an unsigned long.
> 
> Good point. So we'd have to have a special IS_DMA_ERR() function that
> takes a dma_addr_t and checks the same "is it the top 4095 values".

No problem with that.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Linus Torvalds
On Thu, Nov 22, 2018 at 9:07 AM Russell King - ARM Linux
 wrote:
>
> I'm afraid that won't work very well - 32 bit platforms with 64-bit
> addresses (LPAE) would have dma_addr_t as a 64-bit value, which
> wouldn't fit into an unsigned long.

Good point. So we'd have to have a special IS_DMA_ERR() function that
takes a dma_addr_t and checks the same "is it the top 4095 values".

Because I'd still prefer to allow people to return the _actual_ error,
and to have "return -Exyz" as the error case. That part still DTRT
even with dma_addr_t.

  Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: remove the ->mapping_error method from dma_map_ops V2

2018-11-22 Thread Linus Torvalds
On Thu, Nov 22, 2018 at 6:03 AM Christoph Hellwig  wrote:
>
> The 0 return doesn't work for direct mappings that have ram at address
> zero and a lot of IOMMUs that start allocating bus space from address
> zero, so we can't consolidate on that, but I think we can move everyone
> to all-Fs, which the patch here does.

Hmm. Maybe not limit it to just all ones, but actually use the
(standard, for the kernel) IS_ERR_VALUE()?

That basically reserves the last 4095 values in an unsigned long for
error values.

Then those functions could actually return *what* error they
encountered, using just plain

return -ENOMEM;

or whatever?

   Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu