Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On 3 June 2015 at 15:07, Hans Verkuil wrote: > On 06/03/15 10:41, Russell King - ARM Linux wrote: >> On Wed, Jun 03, 2015 at 08:39:55AM +0200, Hans Verkuil wrote: >>> Hi Sumit, >>> >>> On 05/05/2015 04:41 PM, Sumit Semwal wrote: Hi Russell, everyone, First up, sincere apologies for being awol for sometime; had some personal / medical things to take care of, and then I thought I'd wait for the merge window to get over before beginning to discuss this again. On 11 February 2015 at 21:53, Russell King - ARM Linux wrote: > On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote: >> Hello, >> >> On 2015-02-11 12:12, Russell King - ARM Linux wrote: >>> Which is a damn good reason to NAK it - by that admission, it's a >>> half-baked >>> idea. >>> >>> If all we want to know is whether the importer can accept only >>> contiguous >>> memory or not, make a flag to do that, and allow the exporter to test >>> this >>> flag. Don't over-engineer this to make it _seem_ like it can do >>> something >>> that it actually totally fails with. >>> >>> As I've already pointed out, there's a major problem if you have already >>> had a less restrictive attachment which has an active mapping, and a new >>> more restrictive attachment comes along later. >>> >>> It seems from Rob's descriptions that we also need another flag in the >>> importer to indicate whether it wants to have a valid struct page in the >>> scatter list, or whether it (correctly) uses the DMA accessors on the >>> scatter list - so that exporters can reject importers which are buggy. >> >> Okay, but flag-based approach also have limitations. > > Yes, the flag-based approach doesn't let you describe in detail what > the importer can accept - which, given the issues that I've raised > is a *good* thing. We won't be misleading anyone into thinking that > we can do something that's really half-baked, and which we have no > present requirement for. > > This is precisely what Linus talks about when he says "don't over- > engineer" - if we over-engineer this, we end up with something that > sort-of works, and that's a bad thing. > > The Keep It Simple approach here makes total sense - what are our > current requirements - to be able to say that an importer can only accept: > - contiguous memory rather than a scatterlist > - scatterlists with struct page pointers > > Does solving that need us to compare all the constraints of each and > every importer, possibly ending up with constraints which can't be > satisfied? No. Does the flag approach satisfy the requirements? Yes. > So, for basic constraint-sharing, we'll just go with the flag based approach, with a flag (best place for it is still dev->dma_params I suppose) for denoting contiguous or scatterlist. Is that agreed, then? Also, with this idea, of course, there won't be any helpers for trying to calculate constraints; it would be totally the exporter's responsibility to handle it via the attach() dma_buf_op if it wishes to. >>> >>> What's wrong with the proposed max_segment_count? Many media devices do >>> have a limited max_segment_count and that should be taken into account. >> >> So what happens if you have a dma_buf exporter, and several dma_buf >> importers. One dma_buf importer attaches to the exporter, and asks >> for the buffer, and starts making use of the buffer. This export has >> many scatterlist segments. >> >> Another dma_buf importer attaches to the same buffer, and now asks for >> the buffer, but the number of scatterlist segments exceeds it >> requirement. 
So, in the midst of all the various directions this discussion has taken, I seem to have missed reiterating the base premise for this suggestion [1] - that we can use this information to help implement deferred allocation logic, so that all the importers can attach first, and the exporter can do the actual allocation on the first map() call. This is also in line with the prescribed usage of the dma_buf_attach() / dma_buf_map_attachment() sequence - ideally speaking, all participating 'importers' of a dma_buf should only attach first, and then map() at a 'later' time, which is usually right before actually using the buffer.

Note: at present, both the DRI and V4L subsystems don't do that; while proposing this RFC I had deliberately kept that separate, as it is a related but orthogonal problem to solve. I guess I should address that in parallel.

>> You can't reallocate the buffer because it's in-use by another importer.
>> There is no way to revoke the buffer from the other importer. So there
>> is no way to satisfy this importer's requirements.

You're right; but in a deferred allocation mechanism, this constraint-sharing can at least help decide on the most restrictive set of requirements.
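To make the deferred-allocation idea concrete, here is a minimal exporter-side sketch. It assumes the dma_buf_ops signatures of this era; dev_wants_contiguous() stands in for the flag proposed in this thread (somewhere in dev->dma_params), and it, my_alloc_contig() and my_alloc_pages() are hypothetical helpers, not existing kernel API.

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Hypothetical helpers standing in for the proposed flag and the
 * exporter's own allocators; not existing kernel API. */
extern bool dev_wants_contiguous(struct device *dev);
extern struct sg_table *my_alloc_contig(size_t size);	/* ERR_PTR on failure */
extern struct sg_table *my_alloc_pages(size_t size);	/* ERR_PTR on failure */

struct my_buffer {
	struct sg_table *sgt;	/* backing storage, NULL until the first map() */
	size_t size;
	bool need_contig;	/* some attached importer needs contiguous memory */
};

static int my_attach(struct dma_buf *dmabuf, struct device *dev,
		     struct dma_buf_attachment *attach)
{
	struct my_buffer *buf = dmabuf->priv;

	/* Once the backing storage exists it cannot be reallocated, so
	 * refuse late attachers instead of silently breaking them. */
	if (buf->sgt)
		return -EBUSY;

	if (dev_wants_contiguous(dev))
		buf->need_contig = true;
	return 0;
}

static struct sg_table *my_map(struct dma_buf_attachment *attach,
			       enum dma_data_direction dir)
{
	struct my_buffer *buf = attach->dmabuf->priv;

	if (!buf->sgt) {
		/* First map(): allocate to the strictest attached need. */
		struct sg_table *sgt = buf->need_contig ?
			my_alloc_contig(buf->size) : my_alloc_pages(buf->size);

		if (IS_ERR(sgt))
			return sgt;
		buf->sgt = sgt;
	}

	if (!dma_map_sg(attach->dev, buf->sgt->sgl, buf->sgt->nents, dir))
		return ERR_PTR(-ENOMEM);
	return buf->sgt;
}

The only point of the sketch is the ordering: constraints are gathered at attach() time, and the allocation decision is taken once, at the first map_dma_buf() call.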
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On 06/03/15 10:41, Russell King - ARM Linux wrote: > On Wed, Jun 03, 2015 at 08:39:55AM +0200, Hans Verkuil wrote: >> Hi Sumit, >> >> On 05/05/2015 04:41 PM, Sumit Semwal wrote: >>> Hi Russell, everyone, >>> >>> First up, sincere apologies for being awol for sometime; had some >>> personal / medical things to take care of, and then I thought I'd wait >>> for the merge window to get over before beginning to discuss this >>> again. >>> >>> On 11 February 2015 at 21:53, Russell King - ARM Linux >>> wrote: On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote: > Hello, > > On 2015-02-11 12:12, Russell King - ARM Linux wrote: >> Which is a damn good reason to NAK it - by that admission, it's a >> half-baked >> idea. >> >> If all we want to know is whether the importer can accept only contiguous >> memory or not, make a flag to do that, and allow the exporter to test >> this >> flag. Don't over-engineer this to make it _seem_ like it can do >> something >> that it actually totally fails with. >> >> As I've already pointed out, there's a major problem if you have already >> had a less restrictive attachment which has an active mapping, and a new >> more restrictive attachment comes along later. >> >> It seems from Rob's descriptions that we also need another flag in the >> importer to indicate whether it wants to have a valid struct page in the >> scatter list, or whether it (correctly) uses the DMA accessors on the >> scatter list - so that exporters can reject importers which are buggy. > > Okay, but flag-based approach also have limitations. Yes, the flag-based approach doesn't let you describe in detail what the importer can accept - which, given the issues that I've raised is a *good* thing. We won't be misleading anyone into thinking that we can do something that's really half-baked, and which we have no present requirement for. This is precisely what Linus talks about when he says "don't over- engineer" - if we over-engineer this, we end up with something that sort-of works, and that's a bad thing. The Keep It Simple approach here makes total sense - what are our current requirements - to be able to say that an importer can only accept: - contiguous memory rather than a scatterlist - scatterlists with struct page pointers Does solving that need us to compare all the constraints of each and every importer, possibly ending up with constraints which can't be satisfied? No. Does the flag approach satisfy the requirements? Yes. >>> >>> So, for basic constraint-sharing, we'll just go with the flag based >>> approach, with a flag (best place for it is still dev->dma_params I >>> suppose) for denoting contiguous or scatterlist. Is that agreed, then? >>> Also, with this idea, of course, there won't be any helpers for trying >>> to calculate constraints; it would be totally the exporter's >>> responsibility to handle it via the attach() dma_buf_op if it wishes >>> to. >> >> What's wrong with the proposed max_segment_count? Many media devices do >> have a limited max_segment_count and that should be taken into account. > > So what happens if you have a dma_buf exporter, and several dma_buf > importers. One dma_buf importer attaches to the exporter, and asks > for the buffer, and starts making use of the buffer. This export has > many scatterlist segments. > > Another dma_buf importer attaches to the same buffer, and now asks for > the buffer, but the number of scatterlist segments exceeds it's > requirement. > > You can't reallocate the buffer because it's in-use by another importer. 
> There is no way to revoke the buffer from the other importer. So there > is no way to satisfy this importer's requirements. > > What I'm showing is that the idea that exporting these parameters fixes > some problem is just an illusion - it may work for the single importer > case, but doesn't for the multiple importer case. > > Importers really have two choices here: either they accept what the > exporter is giving them, or they reject it. I agree completely with that. > The other issue here is that DMA scatterlists are _not_ really that > determinable in terms of number of entries when it comes to systems with > system IOMMUs. System IOMMUs, which should be integrated into the DMA > API, are permitted to coalesce entries in the physical page range. For > example: > > nsg = 128; > n = dma_map_sg(dev, sg, nsg, DMA_TO_DEVICE); > > Here, n might be 4 if the system IOMMU has been able to coalesce the 128 > entries down to 4 IOMMU entries - and that means for DMA purposes, only > the first four scatterlist entries should be walked (this is why > dma_map_sg() returns a positive integer when mapping.) > > Each struct device has a set of parameters which control how the IOMMU > entries are coalesc
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Wed, Jun 03, 2015 at 08:39:55AM +0200, Hans Verkuil wrote: > Hi Sumit, > > On 05/05/2015 04:41 PM, Sumit Semwal wrote: > > Hi Russell, everyone, > > > > First up, sincere apologies for being awol for sometime; had some > > personal / medical things to take care of, and then I thought I'd wait > > for the merge window to get over before beginning to discuss this > > again. > > > > On 11 February 2015 at 21:53, Russell King - ARM Linux > > wrote: > >> On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote: > >>> Hello, > >>> > >>> On 2015-02-11 12:12, Russell King - ARM Linux wrote: > Which is a damn good reason to NAK it - by that admission, it's a > half-baked > idea. > > If all we want to know is whether the importer can accept only contiguous > memory or not, make a flag to do that, and allow the exporter to test > this > flag. Don't over-engineer this to make it _seem_ like it can do > something > that it actually totally fails with. > > As I've already pointed out, there's a major problem if you have already > had a less restrictive attachment which has an active mapping, and a new > more restrictive attachment comes along later. > > It seems from Rob's descriptions that we also need another flag in the > importer to indicate whether it wants to have a valid struct page in the > scatter list, or whether it (correctly) uses the DMA accessors on the > scatter list - so that exporters can reject importers which are buggy. > >>> > >>> Okay, but flag-based approach also have limitations. > >> > >> Yes, the flag-based approach doesn't let you describe in detail what > >> the importer can accept - which, given the issues that I've raised > >> is a *good* thing. We won't be misleading anyone into thinking that > >> we can do something that's really half-baked, and which we have no > >> present requirement for. > >> > >> This is precisely what Linus talks about when he says "don't over- > >> engineer" - if we over-engineer this, we end up with something that > >> sort-of works, and that's a bad thing. > >> > >> The Keep It Simple approach here makes total sense - what are our > >> current requirements - to be able to say that an importer can only accept: > >> - contiguous memory rather than a scatterlist > >> - scatterlists with struct page pointers > >> > >> Does solving that need us to compare all the constraints of each and > >> every importer, possibly ending up with constraints which can't be > >> satisfied? No. Does the flag approach satisfy the requirements? Yes. > >> > > > > So, for basic constraint-sharing, we'll just go with the flag based > > approach, with a flag (best place for it is still dev->dma_params I > > suppose) for denoting contiguous or scatterlist. Is that agreed, then? > > Also, with this idea, of course, there won't be any helpers for trying > > to calculate constraints; it would be totally the exporter's > > responsibility to handle it via the attach() dma_buf_op if it wishes > > to. > > What's wrong with the proposed max_segment_count? Many media devices do > have a limited max_segment_count and that should be taken into account. So what happens if you have a dma_buf exporter, and several dma_buf importers. One dma_buf importer attaches to the exporter, and asks for the buffer, and starts making use of the buffer. This export has many scatterlist segments. Another dma_buf importer attaches to the same buffer, and now asks for the buffer, but the number of scatterlist segments exceeds it's requirement. 
You can't reallocate the buffer because it's in-use by another importer. There is no way to revoke the buffer from the other importer. So there is no way to satisfy this importer's requirements. What I'm showing is that the idea that exporting these parameters fixes some problem is just an illusion - it may work for the single importer case, but doesn't for the multiple importer case. Importers really have two choices here: either they accept what the exporter is giving them, or they reject it. The other issue here is that DMA scatterlists are _not_ really that determinable in terms of number of entries when it comes to systems with system IOMMUs. System IOMMUs, which should be integrated into the DMA API, are permitted to coalesce entries in the physical page range. For example: nsg = 128; n = dma_map_sg(dev, sg, nsg, DMA_TO_DEVICE); Here, n might be 4 if the system IOMMU has been able to coalesce the 128 entries down to 4 IOMMU entries - and that means for DMA purposes, only the first four scatterlist entries should be walked (this is why dma_map_sg() returns a positive integer when mapping.) Each struct device has a set of parameters which control how the IOMMU entries are coalesced: struct device_dma_parameters { /* * a low level driver may set these to teach IOMMU code about * sg limitations.
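To make the "walk only the first n entries" rule concrete, a small sketch using only the standard DMA API (hw_write_desc() is a hypothetical stand-in for programming the device):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Hypothetical descriptor writer standing in for real device programming. */
extern void hw_write_desc(int idx, dma_addr_t addr, unsigned int len);

static int queue_to_device(struct device *dev, struct scatterlist *sg, int nsg)
{
	struct scatterlist *s;
	int n, i;

	n = dma_map_sg(dev, sg, nsg, DMA_TO_DEVICE);
	if (!n)
		return -ENOMEM;

	/*
	 * Walk only the 'n' entries the mapping produced, not the 'nsg'
	 * entries that went in: a system IOMMU may have coalesced several
	 * input segments into a single mapped entry.
	 */
	for_each_sg(sg, s, n, i)
		hw_write_desc(i, sg_dma_address(s), sg_dma_len(s));

	return n;
}

For reference, the structure the quoted text trails off into looked roughly like this in kernels of that era (the max_segment_count being discussed in this thread would be an addition to it):

struct device_dma_parameters {
	/*
	 * a low level driver may set these to teach IOMMU code about
	 * sg limitations.
	 */
	unsigned int max_segment_size;
	unsigned long segment_boundary_mask;
};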
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
Hi Sumit, On 05/05/2015 04:41 PM, Sumit Semwal wrote: > Hi Russell, everyone, > > First up, sincere apologies for being awol for sometime; had some > personal / medical things to take care of, and then I thought I'd wait > for the merge window to get over before beginning to discuss this > again. > > On 11 February 2015 at 21:53, Russell King - ARM Linux > wrote: >> On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote: >>> Hello, >>> >>> On 2015-02-11 12:12, Russell King - ARM Linux wrote: Which is a damn good reason to NAK it - by that admission, it's a half-baked idea. If all we want to know is whether the importer can accept only contiguous memory or not, make a flag to do that, and allow the exporter to test this flag. Don't over-engineer this to make it _seem_ like it can do something that it actually totally fails with. As I've already pointed out, there's a major problem if you have already had a less restrictive attachment which has an active mapping, and a new more restrictive attachment comes along later. It seems from Rob's descriptions that we also need another flag in the importer to indicate whether it wants to have a valid struct page in the scatter list, or whether it (correctly) uses the DMA accessors on the scatter list - so that exporters can reject importers which are buggy. >>> >>> Okay, but flag-based approach also have limitations. >> >> Yes, the flag-based approach doesn't let you describe in detail what >> the importer can accept - which, given the issues that I've raised >> is a *good* thing. We won't be misleading anyone into thinking that >> we can do something that's really half-baked, and which we have no >> present requirement for. >> >> This is precisely what Linus talks about when he says "don't over- >> engineer" - if we over-engineer this, we end up with something that >> sort-of works, and that's a bad thing. >> >> The Keep It Simple approach here makes total sense - what are our >> current requirements - to be able to say that an importer can only accept: >> - contiguous memory rather than a scatterlist >> - scatterlists with struct page pointers >> >> Does solving that need us to compare all the constraints of each and >> every importer, possibly ending up with constraints which can't be >> satisfied? No. Does the flag approach satisfy the requirements? Yes. >> > > So, for basic constraint-sharing, we'll just go with the flag based > approach, with a flag (best place for it is still dev->dma_params I > suppose) for denoting contiguous or scatterlist. Is that agreed, then? > Also, with this idea, of course, there won't be any helpers for trying > to calculate constraints; it would be totally the exporter's > responsibility to handle it via the attach() dma_buf_op if it wishes > to. What's wrong with the proposed max_segment_count? Many media devices do have a limited max_segment_count and that should be taken into account. Although to be fair I have to say that in practice the limited segment counts are all well above what is needed for a normal capture buffer, but it might be reached if you have video strides > line width, so each line has its own segment (or two, if there is a page boundary in the middle of the line). It's a somewhat contrived use-case, though, although I have done this once myself. Anyway, the worst-case number of segments would be: height * ((bytesperline + PAGE_SIZE - 1) / PAGE_SIZE) That might well exceed the max segment count for some devices. 
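To put hypothetical numbers on that formula: a 1080-line capture with a padded stride of 5120 bytes on 4 KiB pages gives 1080 * ((5120 + 4095) / 4096) = 1080 * 2 = 2160 segments in the worst case, which is the kind of count that can exceed the segment limit of some DMA engines.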
I think I have also seen devices in the past that have a coarse-grained IOMMU that uses very large segment sizes, but have a very limited segment count. We don't have media drivers like that, and to be honest I am not sure whether we should bother too much with this corner case. >>> Frankly, if we want to make it really portable and sharable between devices, >>> then IMO we should get rid of struct scatterlist and replace it with simple >>> array of pfns in dma_buf. This way at least the problem of missing struct >>> page will be solved and the buffer representation will be also a bit more >>> compact. >> >> ... and move the mapping and unmapping of the PFNs to the importer, >> which IMHO is where it should already be (so the importer can decide >> when it should map the buffer itself independently of getting the >> details of the buffer.) >> >>> Such solution however also requires extending dma-mapping API to handle >>> mapping and unmapping of such pfn arrays. The advantage of this approach >>> is the fact that it will be completely new API, so it can be designed >>> well from the beginning. >> >> As far as I'm aware, there was no big discussion of the dma_buf API - >> it's something that just appeared one day (I don't remember seeing it >> discussed.) So, that may well be a good thing if it means we can get >> these kinds of details better hammered out. >> >> However, I don't share yo
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 03, 2015 at 10:42:26PM +0100, Arnd Bergmann wrote:
> Right, if you have a private iommu, there is no problem. The tricky part
> is using a single driver for the system-level translation and the gpu
> private mappings when there is only one type of iommu in the system.

You've got a problem anyway with this approach. If you look at my figure 2
and apply it to this scenario, you have two MMUs stacked on top of each
other. That's something that (afaik) we don't support, but it's entirely
possible that will come along with ARM64.

It may not be nice to have to treat GPUs specially, but I think we really
do need to, and forget the idea that the GPU's IOMMU (as opposed to a
system MMU) should appear in a generic form in DT.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 10:42 PM, Arnd Bergmann wrote: >> Again assuming I'm not confused can't we just solve this by pushing the >> dma api abstraction down one layer for just the gpu, and let it use its >> private iommmu directly? Steps for binding a buffer would be: >> 1. dma_map_sg >> 2. Noodle the dma_addr_t out of the sg table and feed those into a 2nd >> level mapping set up through the iommu api for the gpu-private mmu. > > If you want to do that, you run into the problem of telling the driver > core about it. We associate the device with an iommu in the device > tree, describing there how it is wired up. > > The driver core creates a platform_device for this and checks if it > an iommu mapping is required or wanted for the device, which is then > set up. When the device driver wants to create its own iommu mapping, > this conflicts with the one that is already there. We can't just > skip the iommu setup for all devices because it may be needed sometimes, > and I don't really want to see hacks where the driver core knows which > devices are GPUs and skips the mapping for them, which would be a > layering violation. I don't think you get a choice but to make gpus a special case. There's a bunch of cases why the iommu private to the gpu is special: - If there's gpu-private iommu at all you have a nice security problem, and you must scan your cmd stream to make sure no gpu access goes to arbitrary system memory. We kinda consider isolation between clients optional, but isolation to everything else is mandatory. And scanning the cmd stream in software has such big implications on the design of your driver that you essentially need 2 different drivers. Even if the IP block otherwise matches. - If your iommu supports multiple address space then the gpu must know. We've already covered this case. So trying to wedge the dma api between the gpu and its private iommu is imo the layering violation here. Imo the dma api only should control an iommu for the gpu if: - the iommu is shared (so can't be used for isolation and you need the full blwon cmd scanner) - it's a 2nd level iommu (e.g. what we have on i915) and there is another private iommu. Note that with private I only mean no other device can use it, I don't mean whether it's on the same IP block or not (we even have an iommu abstraction in i915 because the pagetable walkers are pretty much separate from everything else and evolve mostly independently). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tuesday 03 February 2015 12:35:34 Rob Clark wrote:
>
> I can't think of cases outside of GPU's.. if it were more common I'd
> be in favor of teaching dma api about multiple contexts, but right now
> I think that would just amount to forcing a lot of churn on everyone
> else for the benefit of GPU's.

We have a couple of users of the iommu API at the moment outside of GPUs:

* drivers/media/platform/omap3isp/isp.c
* drivers/remoteproc/remoteproc_core.c
* drivers/infiniband/hw/usnic/usnic_uiom.c
* drivers/vfio/

I assume we will see a few more over time. The vfio case is the most
important one here, since that is what the iommu API was designed for.

Arnd
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tuesday 03 February 2015 21:04:35 Daniel Vetter wrote: > - On many soc people love to reuse iommus with the same or similar > interface all over the place. The solution thus far adopted on arm > platforms is to write an iommu driver for those and then implement the > dma-api on top of this iommu. > > But if we unconditionally do this then we rob the gpu driver's ability > to control its private iommu like it wants to, because a lot of the > functionality is lost behind the dma api abstraction. I believe in case of rockchips, msm, exynos and tegra, the same iommu is used directly by the DRM driver while it is used implicitly by the dma-mapping API. We have run into some problems with this in 3.19, but they should be solvable. > Again assuming I'm not confused can't we just solve this by pushing the > dma api abstraction down one layer for just the gpu, and let it use its > private iommmu directly? Steps for binding a buffer would be: > 1. dma_map_sg > 2. Noodle the dma_addr_t out of the sg table and feed those into a 2nd > level mapping set up through the iommu api for the gpu-private mmu. If you want to do that, you run into the problem of telling the driver core about it. We associate the device with an iommu in the device tree, describing there how it is wired up. The driver core creates a platform_device for this and checks if it an iommu mapping is required or wanted for the device, which is then set up. When the device driver wants to create its own iommu mapping, this conflicts with the one that is already there. We can't just skip the iommu setup for all devices because it may be needed sometimes, and I don't really want to see hacks where the driver core knows which devices are GPUs and skips the mapping for them, which would be a layering violation. > Again, this is what i915 and all the ttm based drivers already do, except > that we don't use the generic iommu interfaces but have our own (i915 has > its interface in i915_gem_gtt.c, ttm just calls them tt for translation > tables ...). Right, if you have a private iommu, there is no problem. The tricky part is using a single driver for the system-level translation and the gpu private mappings when there is only one type of iommu in the system. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 03, 2015 at 12:35:34PM -0500, Rob Clark wrote: > On Tue, Feb 3, 2015 at 11:58 AM, Russell King - ARM Linux > wrote: > > > > Okay, but switching contexts is not something which the DMA API has > > any knowledge of (so it can't know which context to associate with > > which mapping.) While it knows which device, it has no knowledge > > (nor is there any way for it to gain knowledge) about contexts. > > > > My personal view is that extending the DMA API in this way feels quite > > dirty - it's a violation of the DMA API design, which is to (a) demark > > the buffer ownership between CPU and DMA agent, and (b) to translate > > buffer locations into a cookie which device drivers can use to instruct > > their device to access that memory. To see why, consider... that you > > map a buffer to a device in context A, and then you switch to context B, > > which means the dma_addr_t given previously is no longer valid. You > > then try to unmap it... which is normally done using the (now no longer > > valid) dma_addr_t. > > > > It seems to me that to support this at DMA API level, we would need to > > completely revamp the DMA API, which IMHO isn't going to be nice. (It > > would mean that we end up with three APIs - the original PCI DMA API, > > the existing DMA API, and some new DMA API.) > > > > Do we have any views on how common this feature is? > > > > I can't think of cases outside of GPU's.. if it were more common I'd > be in favor of teaching dma api about multiple contexts, but right now > I think that would just amount to forcing a lot of churn on everyone > else for the benefit of GPU's. > > IMHO it makes more sense for GPU drivers to bypass the dma api if they > need to. Plus, sooner or later, someone will discover that with some > trick or optimization they can get moar fps, but the extra layer of > abstraction will just be getting in the way. See my other reply, but all existing full-blown drivers don't bypass the dma api. Instead it's just a two-level scheme: 1. First level is dma api. Might or might not contain a system iommu. 2. 2nd level is the gpu-private iommu which is also used for per context address spaces. Thus far all drivers just rolled their own drivers for this (it's kinda fused to the chips on x86 hw anyway), but it looks like using the iommu api gives us a somewhat suitable abstraction for code sharing. Imo you need both, otherwise we start leaking stuff like cpu cache flushing all over the place. Looking at i915 (where the dma api assumes that everything is coherent, which is kinda not the case) that won't be pretty. And there's still the issue that you might nest a system iommu and a 2nd level iommu for per-context pagetables (this is real and what's going on right now on intel hw). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 03, 2015 at 05:36:59PM +0100, Arnd Bergmann wrote: > On Tuesday 03 February 2015 11:22:01 Rob Clark wrote: > > On Tue, Feb 3, 2015 at 11:12 AM, Arnd Bergmann wrote: > > > I agree for the case you are describing here. From what I understood > > > from Rob was that he is looking at something more like: > > > > > > Fig 3 > > > CPU--L1cache--L2cache--Memory--IOMMU-device > > > > > > where the IOMMU controls one or more contexts per device, and is > > > shared across GPU and non-GPU devices. Here, we need to use the > > > dmap-mapping interface to set up the IO page table for any device > > > that is unable to address all of system RAM, and we can use it > > > for purposes like isolation of the devices. There are also cases > > > where using the IOMMU is not optional. > > > > > > Actually, just to clarify, the IOMMU instance is specific to the GPU.. > > not shared with other devices. Otherwise managing multiple contexts > > would go quite badly.. > > > > But other devices have their own instance of the same IOMMU.. so same > > driver could be used. > > I think from the driver perspective, I'd view those two cases as > identical. Not sure if Russell agrees with that. Imo whether the iommu is private to the device and required for gpu functionality like context switching or shared across a bunch of devices is fairly important. Assuming I understand this discussion correctly we have two different things pulling in opposite directions: - From a gpu functionality perspective we want to give the gpu driver full control over the device-private iommu, pushing it out of the control of the dma api. dma_map_sg would just map to whatever bus addresses that iommu would need to use for generating access cycles. This is the design used by every gpu driver we have in upstream thus far (where you always have some on-gpu iommu/pagetable walker thing), on top of whatever system iommu that might be there or not (which is then managed by the dma apis). - On many soc people love to reuse iommus with the same or similar interface all over the place. The solution thus far adopted on arm platforms is to write an iommu driver for those and then implement the dma-api on top of this iommu. But if we unconditionally do this then we rob the gpu driver's ability to control its private iommu like it wants to, because a lot of the functionality is lost behind the dma api abstraction. Again assuming I'm not confused can't we just solve this by pushing the dma api abstraction down one layer for just the gpu, and let it use its private iommmu directly? Steps for binding a buffer would be: 1. dma_map_sg 2. Noodle the dma_addr_t out of the sg table and feed those into a 2nd level mapping set up through the iommu api for the gpu-private mmu. Again, this is what i915 and all the ttm based drivers already do, except that we don't use the generic iommu interfaces but have our own (i915 has its interface in i915_gem_gtt.c, ttm just calls them tt for translation tables ...). Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
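A minimal sketch of the two-step binding described above, with the DMA API handling the system-level part and the generic iommu API standing in for the GPU-private MMU (gpu_domain and the caller-chosen iova are placeholders; iommu_map() is shown with its signature from this era, and the second level simply consumes whatever bus address the first level produced):

#include <linux/dma-mapping.h>
#include <linux/iommu.h>
#include <linux/scatterlist.h>

static int gpu_bind_buffer(struct device *dev, struct iommu_domain *gpu_domain,
			   struct scatterlist *sgl, int nents,
			   unsigned long iova)
{
	struct scatterlist *s;
	int n, i, ret;

	/* Step 1: the DMA API deals with caches and any system IOMMU. */
	n = dma_map_sg(dev, sgl, nents, DMA_BIDIRECTIONAL);
	if (!n)
		return -ENOMEM;

	/* Step 2: feed the resulting bus addresses into the GPU-private MMU.
	 * Assumes segments are aligned to the domain's page size, as a real
	 * driver would guarantee; error unwinding is omitted. */
	for_each_sg(sgl, s, n, i) {
		ret = iommu_map(gpu_domain, iova,
				(phys_addr_t)sg_dma_address(s),
				sg_dma_len(s), IOMMU_READ | IOMMU_WRITE);
		if (ret)
			return ret;
		iova += sg_dma_len(s);
	}
	return 0;
}

Whether the second level is driven through the generic iommu API, as here, or through driver-private code like i915_gem_gtt.c or ttm's translation tables, the shape is the same.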
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 11:58 AM, Russell King - ARM Linux wrote: > > Okay, but switching contexts is not something which the DMA API has > any knowledge of (so it can't know which context to associate with > which mapping.) While it knows which device, it has no knowledge > (nor is there any way for it to gain knowledge) about contexts. > > My personal view is that extending the DMA API in this way feels quite > dirty - it's a violation of the DMA API design, which is to (a) demark > the buffer ownership between CPU and DMA agent, and (b) to translate > buffer locations into a cookie which device drivers can use to instruct > their device to access that memory. To see why, consider... that you > map a buffer to a device in context A, and then you switch to context B, > which means the dma_addr_t given previously is no longer valid. You > then try to unmap it... which is normally done using the (now no longer > valid) dma_addr_t. > > It seems to me that to support this at DMA API level, we would need to > completely revamp the DMA API, which IMHO isn't going to be nice. (It > would mean that we end up with three APIs - the original PCI DMA API, > the existing DMA API, and some new DMA API.) > > Do we have any views on how common this feature is? > I can't think of cases outside of GPU's.. if it were more common I'd be in favor of teaching dma api about multiple contexts, but right now I think that would just amount to forcing a lot of churn on everyone else for the benefit of GPU's. IMHO it makes more sense for GPU drivers to bypass the dma api if they need to. Plus, sooner or later, someone will discover that with some trick or optimization they can get moar fps, but the extra layer of abstraction will just be getting in the way. BR, -R -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 03, 2015 at 11:22:01AM -0500, Rob Clark wrote: > On Tue, Feb 3, 2015 at 11:12 AM, Arnd Bergmann wrote: > > I agree for the case you are describing here. From what I understood > > from Rob was that he is looking at something more like: > > > > Fig 3 > > CPU--L1cache--L2cache--Memory--IOMMU-device > > > > where the IOMMU controls one or more contexts per device, and is > > shared across GPU and non-GPU devices. Here, we need to use the > > dmap-mapping interface to set up the IO page table for any device > > that is unable to address all of system RAM, and we can use it > > for purposes like isolation of the devices. There are also cases > > where using the IOMMU is not optional. > > > Actually, just to clarify, the IOMMU instance is specific to the GPU.. > not shared with other devices. Otherwise managing multiple contexts > would go quite badly.. > > But other devices have their own instance of the same IOMMU.. so same > driver could be used. Okay, so that is my Fig.2 case, and we don't have to worry about Fig.3. One thing I forgot in Fig.1/2 which my original did have were to mark the system MMU as optional. (Think an ARM64 with SMMU into a 32-bit peripheral bus.) Do we support stacked MMUs in the DMA API? We may need to if we keep IOMMUs in the DMA API. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 03, 2015 at 05:12:40PM +0100, Arnd Bergmann wrote: > On Tuesday 03 February 2015 15:54:04 Russell King - ARM Linux wrote: > > On Tue, Feb 03, 2015 at 04:31:13PM +0100, Arnd Bergmann wrote: > > > The dma_map_* interfaces assign the virtual addresses internally, > > > using typically either a global address space for all devices, or one > > > address space per device. > > > > We shouldn't be doing one address space per device for precisely this > > reason. We should be doing one address space per *bus*. I did have > > a nice diagram to illustrate the point in my previous email, but I > > deleted it, I wish I hadn't... briefly: > > > > Fig. 1. > > +--+ > > |+-+ device | > > CPU--L1cache--L2cache--Memory--SysMMU---IOMMU--> | > > |+-+ | > > +--+ > > > > Fig.1 represents what I'd call the "GPU" issue that we're talking about > > in this thread. > > > > Fig. 2. > > CPU--L1cache--L2cache--Memory--SysMMU-IOMMU--device > > > > The DMA API should be responsible (at the very least) for everything on > > the left of "" in and should be providing a dma_addr_t which is > > representative of what the device (in Fig.1) as a whole sees. That's > > the "system" part. > > > > I believe this is the approach which is taken by x86 and similar platforms, > > simply because they tend not to have an IOMMU on individual devices (and > > if they did, eg, on a PCI card, it's clearly the responsibility of the > > device driver.) > > > > Whether the DMA API also handles the IOMMU in Fig.1 or 2 is questionable. > > For fig.2, it is entirely possible that the same device could appear > > without an IOMMU, and in that scenario, you would want the IOMMU to be > > handled transparently. > > > > However, by doing so for everything, you run into exactly the problem > > which is being discussed here - the need to separate out the cache > > coherency from the IOMMU aspects. You probably also have a setup very > > similar to fig.1 (which is certainly true of Vivante GPUs.) > > > > If you have the need to separately control both, then using the DMA API > > to encapsulate both does not make sense - at which point, the DMA API > > should be responsible for the minimum only - in other words, everything > > to the left of (so including the system MMU.) The control of > > the device IOMMU should be the responsibility of device driver in this > > case. > > > > So, dma_map_sg() would be responsible for dealing with the CPU cache > > coherency issues, and setting up the system MMU. dma_sync_*() would > > be responsible for the CPU cache coherency issues, and dma_unmap_sg() > > would (again) deal with the CPU cache and tear down the system MMU > > mappings. > > > > Meanwhile, the device driver has ultimate control over its IOMMU, the > > creation and destruction of mappings and context switches at the > > appropriate times. > > I agree for the case you are describing here. From what I understood > from Rob was that he is looking at something more like: > > Fig 3 > CPU--L1cache--L2cache--Memory--IOMMU-device > > where the IOMMU controls one or more contexts per device, and is > shared across GPU and non-GPU devices. Here, we need to use the > dmap-mapping interface to set up the IO page table for any device > that is unable to address all of system RAM, and we can use it > for purposes like isolation of the devices. There are also cases > where using the IOMMU is not optional. Okay, but switching contexts is not something which the DMA API has any knowledge of (so it can't know which context to associate with which mapping.) 
While it knows which device, it has no knowledge (nor is there any way for it to gain knowledge) about contexts. My personal view is that extending the DMA API in this way feels quite dirty - it's a violation of the DMA API design, which is to (a) demark the buffer ownership between CPU and DMA agent, and (b) to translate buffer locations into a cookie which device drivers can use to instruct their device to access that memory. To see why, consider... that you map a buffer to a device in context A, and then you switch to context B, which means the dma_addr_t given previously is no longer valid. You then try to unmap it... which is normally done using the (now no longer valid) dma_addr_t. It seems to me that to support this at DMA API level, we would need to completely revamp the DMA API, which IMHO isn't going to be nice. (It would mean that we end up with three APIs - the original PCI DMA API, the existing DMA API, and some new DMA API.) Do we have any views on how common this feature is? -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tuesday 03 February 2015 11:22:01 Rob Clark wrote: > On Tue, Feb 3, 2015 at 11:12 AM, Arnd Bergmann wrote: > > I agree for the case you are describing here. From what I understood > > from Rob was that he is looking at something more like: > > > > Fig 3 > > CPU--L1cache--L2cache--Memory--IOMMU-device > > > > where the IOMMU controls one or more contexts per device, and is > > shared across GPU and non-GPU devices. Here, we need to use the > > dmap-mapping interface to set up the IO page table for any device > > that is unable to address all of system RAM, and we can use it > > for purposes like isolation of the devices. There are also cases > > where using the IOMMU is not optional. > > > Actually, just to clarify, the IOMMU instance is specific to the GPU.. > not shared with other devices. Otherwise managing multiple contexts > would go quite badly.. > > But other devices have their own instance of the same IOMMU.. so same > driver could be used. I think from the driver perspective, I'd view those two cases as identical. Not sure if Russell agrees with that. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 3, 2015 at 11:12 AM, Arnd Bergmann wrote: > I agree for the case you are describing here. From what I understood > from Rob was that he is looking at something more like: > > Fig 3 > CPU--L1cache--L2cache--Memory--IOMMU-device > > where the IOMMU controls one or more contexts per device, and is > shared across GPU and non-GPU devices. Here, we need to use the > dmap-mapping interface to set up the IO page table for any device > that is unable to address all of system RAM, and we can use it > for purposes like isolation of the devices. There are also cases > where using the IOMMU is not optional. Actually, just to clarify, the IOMMU instance is specific to the GPU.. not shared with other devices. Otherwise managing multiple contexts would go quite badly.. But other devices have their own instance of the same IOMMU.. so same driver could be used. BR, -R -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tuesday 03 February 2015 15:54:04 Russell King - ARM Linux wrote: > On Tue, Feb 03, 2015 at 04:31:13PM +0100, Arnd Bergmann wrote: > > The dma_map_* interfaces assign the virtual addresses internally, > > using typically either a global address space for all devices, or one > > address space per device. > > We shouldn't be doing one address space per device for precisely this > reason. We should be doing one address space per *bus*. I did have > a nice diagram to illustrate the point in my previous email, but I > deleted it, I wish I hadn't... briefly: > > Fig. 1. > +--+ > |+-+ device | > CPU--L1cache--L2cache--Memory--SysMMU---IOMMU--> | > |+-+ | > +--+ > > Fig.1 represents what I'd call the "GPU" issue that we're talking about > in this thread. > > Fig. 2. > CPU--L1cache--L2cache--Memory--SysMMU-IOMMU--device > > The DMA API should be responsible (at the very least) for everything on > the left of "" in and should be providing a dma_addr_t which is > representative of what the device (in Fig.1) as a whole sees. That's > the "system" part. > > I believe this is the approach which is taken by x86 and similar platforms, > simply because they tend not to have an IOMMU on individual devices (and > if they did, eg, on a PCI card, it's clearly the responsibility of the > device driver.) > > Whether the DMA API also handles the IOMMU in Fig.1 or 2 is questionable. > For fig.2, it is entirely possible that the same device could appear > without an IOMMU, and in that scenario, you would want the IOMMU to be > handled transparently. > > However, by doing so for everything, you run into exactly the problem > which is being discussed here - the need to separate out the cache > coherency from the IOMMU aspects. You probably also have a setup very > similar to fig.1 (which is certainly true of Vivante GPUs.) > > If you have the need to separately control both, then using the DMA API > to encapsulate both does not make sense - at which point, the DMA API > should be responsible for the minimum only - in other words, everything > to the left of (so including the system MMU.) The control of > the device IOMMU should be the responsibility of device driver in this > case. > > So, dma_map_sg() would be responsible for dealing with the CPU cache > coherency issues, and setting up the system MMU. dma_sync_*() would > be responsible for the CPU cache coherency issues, and dma_unmap_sg() > would (again) deal with the CPU cache and tear down the system MMU > mappings. > > Meanwhile, the device driver has ultimate control over its IOMMU, the > creation and destruction of mappings and context switches at the > appropriate times. I agree for the case you are describing here. From what I understood from Rob was that he is looking at something more like: Fig 3 CPU--L1cache--L2cache--Memory--IOMMU-device where the IOMMU controls one or more contexts per device, and is shared across GPU and non-GPU devices. Here, we need to use the dmap-mapping interface to set up the IO page table for any device that is unable to address all of system RAM, and we can use it for purposes like isolation of the devices. There are also cases where using the IOMMU is not optional. So unlike the scenario you describe, the driver cannot at the same time control the cache (using the dma-mapping API) and the I/O page tables (using the iommu API or some internal functions). 
Arnd
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tue, Feb 03, 2015 at 04:31:13PM +0100, Arnd Bergmann wrote: > On Tuesday 03 February 2015 15:22:05 Russell King - ARM Linux wrote: > > Don't we already have those in the DMA API? dma_sync_*() ? > > > > dma_map_sg() - sets up the system MMU and deals with initial cache > > coherency handling. Device IOMMU being the responsibility of the > > GPU driver. > > dma_sync_*() works with whatever comes out of dma_map_*(), true, > but this is not what they want to do here. > > > The GPU can then do dma_sync_*() on the scatterlist as is necessary > > to synchronise the cache coherency (while respecting the ownership > > rules - which are very important on ARM to follow as some sync()s are > > destructive to any dirty data in the CPU cache.) > > > > dma_unmap_sg() tears down the system MMU and deals with the final cache > > handling. > > > > Why do we need more DMA API interfaces? > > The dma_map_* interfaces assign the virtual addresses internally, > using typically either a global address space for all devices, or one > address space per device. We shouldn't be doing one address space per device for precisely this reason. We should be doing one address space per *bus*. I did have a nice diagram to illustrate the point in my previous email, but I deleted it, I wish I hadn't... briefly: Fig. 1. +--+ |+-+ device | CPU--L1cache--L2cache--Memory--SysMMU---IOMMU--> | |+-+ | +--+ Fig.1 represents what I'd call the "GPU" issue that we're talking about in this thread. Fig. 2. CPU--L1cache--L2cache--Memory--SysMMU-IOMMU--device The DMA API should be responsible (at the very least) for everything on the left of "" in and should be providing a dma_addr_t which is representative of what the device (in Fig.1) as a whole sees. That's the "system" part. I believe this is the approach which is taken by x86 and similar platforms, simply because they tend not to have an IOMMU on individual devices (and if they did, eg, on a PCI card, it's clearly the responsibility of the device driver.) Whether the DMA API also handles the IOMMU in Fig.1 or 2 is questionable. For fig.2, it is entirely possible that the same device could appear without an IOMMU, and in that scenario, you would want the IOMMU to be handled transparently. However, by doing so for everything, you run into exactly the problem which is being discussed here - the need to separate out the cache coherency from the IOMMU aspects. You probably also have a setup very similar to fig.1 (which is certainly true of Vivante GPUs.) If you have the need to separately control both, then using the DMA API to encapsulate both does not make sense - at which point, the DMA API should be responsible for the minimum only - in other words, everything to the left of (so including the system MMU.) The control of the device IOMMU should be the responsibility of device driver in this case. So, dma_map_sg() would be responsible for dealing with the CPU cache coherency issues, and setting up the system MMU. dma_sync_*() would be responsible for the CPU cache coherency issues, and dma_unmap_sg() would (again) deal with the CPU cache and tear down the system MMU mappings. Meanwhile, the device driver has ultimate control over its IOMMU, the creation and destruction of mappings and context switches at the appropriate times. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. 
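Spelling out that division of labour as code, with the standard streaming DMA API on the system side and placeholder calls for however the driver programs its own device IOMMU:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Placeholders for however the driver manages its device IOMMU. */
extern int gpu_mmu_insert(struct sg_table *sgt, int nents);
extern void gpu_mmu_remove(struct sg_table *sgt);

/* Import: the DMA API covers CPU caches and the system MMU; the driver then
 * programs its own IOMMU from the mapped entries. */
static int import_buffer(struct device *dev, struct sg_table *sgt)
{
	int n = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);

	if (!n)
		return -ENOMEM;
	return gpu_mmu_insert(sgt, n);
}

/* CPU access: take ownership back before touching the data, hand it to the
 * device again afterwards (some syncs are destructive to dirty cache lines,
 * so the ordering matters). */
static void cpu_access(struct device *dev, struct sg_table *sgt)
{
	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	/* ... CPU reads or writes the buffer here ... */
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
}

/* Release: tear down the device IOMMU first, then the system-side mapping
 * and the final cache handling. */
static void release_buffer(struct device *dev, struct sg_table *sgt)
{
	gpu_mmu_remove(sgt);
	dma_unmap_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
}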
Re: [Linaro-mm-sig] [RFCv3 2/2] dma-buf: add helpers for sharing attacher constraints with dma-parms
On Tuesday 03 February 2015 15:22:05 Russell King - ARM Linux wrote: > On Tue, Feb 03, 2015 at 03:52:48PM +0100, Arnd Bergmann wrote: > > On Tuesday 03 February 2015 14:41:09 Russell King - ARM Linux wrote: > > > I'd go as far as saying that the "DMA API on top of IOMMU" is more > > > intended to be for a system IOMMU for the bus in question, rather > > > than a device-level IOMMU. > > > > > > If an IOMMU is part of a device, then the device should handle it > > > (maybe via an abstraction) and not via the DMA API. The DMA API should > > > be handing the bus addresses to the device driver which the device's > > > IOMMU would need to generate. (In other words, in this circumstance, > > > the DMA API shouldn't give you the device internal address.) > > > > Exactly. And the abstraction that people choose at the moment is the > > iommu API, for better or worse. It makes a lot of sense to use this > > API if the same iommu is used for other devices as well (which is > > the case on Tegra and probably a lot of others). Unfortunately the > > iommu API lacks support for cache management, and probably other things > > as well, because this was not an issue for the original use case > > (device assignment on KVM/x86). > > > > This could be done by adding explicit or implied cache management > > to the IOMMU mapping interfaces, or by extending the dma-mapping > > interfaces in a way that covers the use case of the device managing > > its own address space, in addition to the existing coherent and > > streaming interfaces. > > Don't we already have those in the DMA API? dma_sync_*() ? > > dma_map_sg() - sets up the system MMU and deals with initial cache > coherency handling. Device IOMMU being the responsibility of the > GPU driver. dma_sync_*() works with whatever comes out of dma_map_*(), true, but this is not what they want to do here. > The GPU can then do dma_sync_*() on the scatterlist as is necessary > to synchronise the cache coherency (while respecting the ownership > rules - which are very important on ARM to follow as some sync()s are > destructive to any dirty data in the CPU cache.) > > dma_unmap_sg() tears down the system MMU and deals with the final cache > handling. > > Why do we need more DMA API interfaces? The dma_map_* interfaces assign the virtual addresses internally, using typically either a global address space for all devices, or one address space per device. There are multiple things that this cannot do, and that is why the drivers use the iommu API directly: - use one address space per 'struct mm' - map user memory with bus_address == user_address - map memory into the GPU without having a permanent kernel mapping - map memory first, and do the initial cache flushes later Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
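To make the first and third of those points concrete, a sketch of the kind of thing drivers reach for the iommu API to do and cannot express through dma_map_*(): one iommu domain per GPU context, with pages mapped into it without ever acquiring a permanent kernel mapping (iommu API as it looked around this time; error unwinding omitted; the gpu_context type is a placeholder):

#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/iommu.h>
#include <linux/mm.h>
#include <linux/slab.h>

struct gpu_context {
	struct iommu_domain *domain;	/* one address space per context */
};

static struct gpu_context *gpu_context_create(struct bus_type *bus)
{
	struct gpu_context *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);

	if (!ctx)
		return NULL;
	ctx->domain = iommu_domain_alloc(bus);
	if (!ctx->domain) {
		kfree(ctx);
		return NULL;
	}
	return ctx;
}

/* Map 'npages' freshly allocated highmem pages at 'iova'; no kmap()/vmap()
 * is ever needed, the pages only exist for the device. */
static int gpu_context_populate(struct gpu_context *ctx, unsigned long iova,
				unsigned int npages)
{
	unsigned int i;

	for (i = 0; i < npages; i++) {
		struct page *page = alloc_page(GFP_HIGHUSER);

		if (!page)
			return -ENOMEM;
		if (iommu_map(ctx->domain, iova + i * PAGE_SIZE,
			      page_to_phys(page), PAGE_SIZE,
			      IOMMU_READ | IOMMU_WRITE))
			return -ENOMEM;
	}
	return 0;
}

What this sketch cannot do is exactly what the thread keeps returning to: the CPU cache maintenance that dma_map_sg() would otherwise have performed.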