[RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-11-28 Thread Daniel Vetter
On Mon, Nov 28, 2011 at 08:47:31AM +0100, Marek Szyprowski wrote:
> On Tuesday, November 08, 2011 7:43 PM Daniel Vetter wrote:
> > Thanks for the clarification. I think this is another reason why
> > get_scatterlist should return the sg_list already mapped into the device
> > address space - it's more consistent with the other dma apis. Another
> > reason to completely hide everything but mapped addresses is crazy stuff
> > like this
> > 
> > mem <---> tiling iommu <-+-> gpu
> >                          |
> >                          +-> scanout engine
> >                          |
> >                          +-> mpeg decoder
> > 
> > where it doesn't really make sense to talk about the memory backing the
> > dma buffer because that's smeared all over the place due to tiling. IIRC
> > for the case of omap these devices can also access memory through other
> > paths and iommus that don't tile (but just remap like a normal iommu)
> 
> I really don't get why you want to force the exporter to map the buffer into
> the client's dma address space. Only the client device might know all the quirks
> required to do this correctly. The exporter should only provide a scatter-list
> with the memory that belongs to the exported buffer (which might be pinned). How
> do you want to solve the following case - the gpu hardware from your diagram
> and a simple usb webcam with a generic driver. The application would like to
> export a buffer from the webcam to the scanout engine. How would the generic
> webcam driver know how to set up the tiler to create correct mappings for
> the GPU/scanout? IMHO only a GPU driver is capable of doing that, assuming
> it got just a scatter list from the webcam driver.

You're correct that only the gpu knows how to put things into the tiler
(and maybe other devices that have access to it). Let me expand my diagram
so that your webcam fits into the picture.

mem <-+-> tiling iommu <-+-> gpu
      |                  |
      |                  +-> scanout engine
      |                  |
      |                  +-> mpeg decoder
      |
      +-> direct dma
      |
      +-> iommu A <-+-> usb hci
                    |
                    +-> other devices
                    |
                    ...

A few notes:
- might not be exactly what omap really looks like
- the devices behind the tiler have different device address space windows to
  access the different paths to main memory. No other device can access
  the tiler, iirc.
- your webcam doesn't show up on this diagram because we can't dma from its
  memory, we can only zero-copy from the memory the usb hci transferred the
  frame to.

Now when e.g. the scanout engine calls get_scatterlist, you only call
dma_map_sg (which does nothing, because there's no iommu that's managed by
the core kernel code for it). The scanout engine will then complain that
your stuff is not contiguous and bail out. Or it is indeed contiguous and
things Just Work.

The much more interesting case is when the mpeg decoder and the gpu share
a buffer (think video on rotating cube or whatever other gui transition
you fancy). Then the omap tiler code can check whether the device sits
behind the tiler (e.g. with some omap-specific device tree attribute) and
hand out a linear view to a tiled buffer.
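
To make that concrete, here's a rough sketch of what the exporter side could
look like, assuming an attachment object that carries the importing struct
device as in this RFC. All the tiler-specific names (tiler_buffer,
device_is_behind_tiler, tiler_map_linear) are made up for illustration; this
is not actual omap code:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/err.h>

struct tiler_buffer {		/* made-up exporter private data */
	struct sg_table *sgt;	/* pinned backing pages */
};

static bool device_is_behind_tiler(struct device *dev);	/* made-up check */
static struct sg_table *tiler_map_linear(struct tiler_buffer *buf,
					 struct device *dev); /* made-up helper */

/*
 * Hypothetical exporter-side get_scatterlist for the omap tiler case:
 * hand out a linear view through the tiler to devices that sit behind
 * it, fall back to a plain dma_map_sg() for everybody else.
 */
static struct sg_table *tiler_get_scatterlist(struct dma_buf_attachment *attach,
					      enum dma_data_direction dir)
{
	struct tiler_buffer *buf = attach->dmabuf->priv;

	/* e.g. based on some omap-specific device tree attribute */
	if (device_is_behind_tiler(attach->dev))
		return tiler_map_linear(buf, attach->dev);

	/* ordinary path: map the pinned backing pages for this device */
	if (!dma_map_sg(attach->dev, buf->sgt->sgl, buf->sgt->nents, dir))
		return ERR_PTR(-ENOMEM);

	return buf->sgt;
}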

In other words, wherever you're currently calling one of the map/unmap
dma api variants, you would call get/put_scatterlist (or better, the new
name I'm proposing). So I also don't see your argument that only the
client knows how to map something into its address space.
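
I.e. at the importer's call sites the change is mechanical. A sketch, with
get_scatterlist/put_scatterlist spelled as in this discussion and
program_scanout_base invented as a stand-in for the hardware programming:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/err.h>

static void program_scanout_base(dma_addr_t base);	/* invented hw hook */

/*
 * Sketch of a scanout driver importing a buffer.  The sg_table comes
 * back already mapped into scanout_dev's dma address space, so the
 * driver only ever looks at sg_dma_address()/sg_dma_len().
 */
static int scanout_import(struct device *scanout_dev, struct dma_buf *dmabuf)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_attach(dmabuf, scanout_dev);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	sgt = get_scatterlist(attach);	/* replaces the old dma_map_sg() call */
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		return PTR_ERR(sgt);
	}

	if (sgt->nents != 1) {
		/* scanout needs contiguous memory: complain and bail out */
		put_scatterlist(attach, sgt);
		dma_buf_detach(dmabuf, attach);
		return -EINVAL;
	}

	program_scanout_base(sg_dma_address(sgt->sgl));
	return 0;
}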

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-11-28 Thread Marek Szyprowski
Hello,

I'm sorry for the late reply, I must have missed this mail...

On Tuesday, November 08, 2011 7:43 PM Daniel Vetter wrote:

> On Tue, Nov 08, 2011 at 05:55:17PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Nov 08, 2011 at 06:42:27PM +0100, Daniel Vetter wrote:
> > > Actually I think the importer should get a _mapped_ scatterlist when it
> > > calls get_scatterlist. The simple reason is that for strange stuff like
> > > memory remapped into e.g. omap's TILER there's no sensible notion of
> > > an address in physical memory. For the USB example I think the right
> > > approach is to attach the usb hci to the dma_buf; after all that is the
> > > device that will read the data and move it over the usb bus to the udl
> > > device. Similarly for any other device that sits behind a bus that can't
> > > do dma (or where it doesn't make sense to do dma).
> > >
> > > Imo if there's a use-case where the client needs to frob the sg_list
> > > before calling dma_map_sg, we have an issue with the dma subsystem in
> > > general.
> >
> > Let's clear something up about the DMA API, which I think is causing some
> > misunderstanding here.  For this purpose, I'm going to talk about
> > dma_map_single(), but the same applies to the scatterlist and _page
> > variants as well.
> >
> > dma = dma_map_single(dev, cpuaddr, size, dir);
> >
> > dev := the device _performing_ the DMA operation.  You are quite correct
> >        that in the case of a USB peripheral device, the device is normally
> >        the USB HCI device.
> > 
> > dma := dma address to be programmed into 'dev' which corresponds (by some
> >        means) with 'cpuaddr'.  This may not be the physical address due
> >        to bus offset translations or mappings set up in IOMMUs.
> >
> > Therefore, it is wrong to talk about a 'physical address' when talking
> > about the DMA API.
> >
> > We can take this one step further.  Let's say that the USB HCI is not
> > capable of performing memory accesses itself, but it is connected to a
> > separate DMA engine device:
> >
> > mem <---> dma engine <---> usb hci <---> usb peripheral
> >
> > (such setups do exist, but despite having such implementations I've never
> > tried to support them.)
> >
> > In this case, the dma engine, in response to control signals from the
> > USB host controller, will generate the appropriate bus address to access
> > memory and transfer the data into the USB HCI device.
> >
> > So, in this case, the struct device to be used for mapping memory for
> > transfers to the usb peripheral is the DMA engine device, not the USB HCI
> > device nor the USB peripheral device.
> 
> Thanks for the clarification. I think this is another reason why
> get_scatterlist should return the sg_list already mapped into the device
> address space - it's more consistent with the other dma apis. Another
> reason to completely hide everything but mapped addresses is crazy stuff
> like this
> 
>   mem <---> tiling iommu <-+-> gpu
>                            |
>                            +-> scanout engine
>                            |
>                            +-> mpeg decoder
> 
> where it doesn't really make sense to talk about the memory backing the
> dma buffer because that's smeared all over the place due to tiling. IIRC
> for the case of omap these devices can also access memory through other
> paths and iommus that don't tile (but just remap like a normal iommu)

I really don't get why you want to force the exporter to map the buffer into
the client's dma address space. Only the client device might know all the quirks
required to do this correctly. The exporter should only provide a scatter-list
with the memory that belongs to the exported buffer (which might be pinned). How
do you want to solve the following case - the gpu hardware from your diagram
and a simple usb webcam with a generic driver. The application would like to
export a buffer from the webcam to the scanout engine. How would the generic
webcam driver know how to set up the tiler to create correct mappings for
the GPU/scanout? IMHO only a GPU driver is capable of doing that, assuming
it got just a scatter list from the webcam driver.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center





[RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-11-08 Thread Daniel Vetter
On Tue, Nov 08, 2011 at 05:55:17PM +0000, Russell King - ARM Linux wrote:
> On Tue, Nov 08, 2011 at 06:42:27PM +0100, Daniel Vetter wrote:
> > Actually I think the importer should get a _mapped_ scatterlist when it
> > calls get_scatterlist. The simple reason is that for strange stuff like
> > memory remapped into e.g. omap's TILER there's no sensible notion of
> > an address in physical memory. For the USB example I think the right
> > approach is to attach the usb hci to the dma_buf; after all that is the
> > device that will read the data and move it over the usb bus to the udl
> > device. Similarly for any other device that sits behind a bus that can't
> > do dma (or where it doesn't make sense to do dma).
> > 
> > Imo if there's a use-case where the client needs to frob the sg_list
> > before calling dma_map_sg, we have an issue with the dma subsystem in
> > general.
> 
> Let's clear something up about the DMA API, which I think is causing some
> misunderstanding here.  For this purpose, I'm going to talk about
> dma_map_single(), but the same applies to the scatterlist and _page
> variants as well.
> 
>   dma = dma_map_single(dev, cpuaddr, size, dir);
> 
> dev := the device _performing_ the DMA operation.  You are quite correct
>        that in the case of a USB peripheral device, the device is normally
>        the USB HCI device.
> 
> dma := dma address to be programmed into 'dev' which corresponds (by some
>        means) with 'cpuaddr'.  This may not be the physical address due
>        to bus offset translations or mappings set up in IOMMUs.
> 
> Therefore, it is wrong to talk about a 'physical address' when talking
> about the DMA API.
> 
> We can take this one step further.  Let's say that the USB HCI is not
> capable of performing memory accesses itself, but it is connected to a
> separate DMA engine device:
> 
>   mem <---> dma engine <---> usb hci <---> usb peripheral
> 
> (such setups do exist, but despite having such implementations I've never
> tried to support them.)
> 
> In this case, the dma engine, in response to control signals from the
> USB host controller, will generate the appropriate bus address to access
> memory and transfer the data into the USB HCI device.
> 
> So, in this case, the struct device to be used for mapping memory for
> transfers to the usb peripheral is the DMA engine device, not the USB HCI
> device nor the USB peripheral device.

Thanks for the clarification. I think this is another reason why
get_scatterlist should return the sg_list already mapped into the device
address space - it's more consistent with the other dma apis. Another
reason to completely hide everything but mapped addresses is crazy stuff
like this

mem <---> tiling iommu <-+-> gpu
                         |
                         +-> scanout engine
                         |
                         +-> mpeg decoder

where it doesn't really make sense to talk about the memory backing the
dma buffer because that's smeared all over the place due to tiling. IIRC
for the case of omap these devices can also access memory through other
paths and iommus that don't tile (but just remap like a normal iommu)
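
As a sketch of what "hide everything but mapped addresses" could mean at the
interface level - invented names, not the actual structs from this RFC:

#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

struct dma_buf_attachment;	/* carries the importing struct device */

/*
 * Illustrative ops table: the exporter never exposes struct page or
 * physical addresses, only an sg_table whose entries are dma addresses
 * already valid for the attached device.
 */
struct dma_buf_ops_sketch {
	/* returns an sg_table mapped into the attached device's space */
	struct sg_table *(*get_scatterlist)(struct dma_buf_attachment *attach,
					    enum dma_data_direction dir);
	/* undoes the mapping; the exporter may unpin or move the buffer */
	void (*put_scatterlist)(struct dma_buf_attachment *attach,
				struct sg_table *sgt,
				enum dma_data_direction dir);
};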

-Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-11-08 Thread Daniel Vetter
On Tue, Nov 08, 2011 at 10:59:56AM -0600, Clark, Rob wrote:
> On Thu, Nov 3, 2011 at 3:04 AM, Marek Szyprowski
> > 2. The dma-mapping api is very limited in the area of dynamic buffer
> > management; this API has clearly been designed for static buffer
> > allocation and mapping.
> >
> > It looks like fully dynamic buffer management requires a complete change of
> > v4l2 api principles (V4L3?) and a completely new DMA API interface. That's
> > probably the reason why none of the GPU drivers rely on the DMA-mapping API
> > and all implement custom solutions for managing the mappings.
> >
> > This reminds me of one more issue I've noticed in the current dma buf
> > proof-of-concept. You assumed that the exporter will be responsible for
> > mapping the buffer into the io address space of all the client devices.
> > What if the device needs additional custom hooks/hacks during the mappings?
> > This will be a serious problem for the current GPU drivers for example.
> > IMHO the API will be much clearer if each client driver maps the scatter
> > list gathered from the dma buf by itself. Only the client driver has the
> > complete knowledge of how to do this correctly for this particular device.
> > This way it will also work with devices that don't do real DMA (like for
> > example USB devices that copy all data from usb packets to the target
> > buffer with the cpu).
> 
> The exporter doesn't map.. it returns a scatterlist to the importer.
> But the exporter does allocate and pin backing pages.  And it is
> preferable if the exporter has the opportunity to wait until as much
> as possible is known about the various importing devices, to know if it
> must allocate contiguous pages, or pages in a certain range.

Actually I think the importer should get a _mapped_ scatterlist when it
calls get_scatterlist. The simple reason is that for strange stuff like
memory remapped into e.g. omap's TILER there's no sensible notion of
an address in physical memory. For the USB example I think the right
approach is to attach the usb hci to the dma_buf; after all that is the
device that will read the data and move it over the usb bus to the udl
device. Similarly for any other device that sits behind a bus that can't
do dma (or where it doesn't make sense to do dma).
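
For the USB example, the attach would look roughly like this. A sketch:
udl_attach_buf is invented, but bus->controller is the usual way to get at
the DMA-capable struct device behind a usb peripheral:

#include <linux/dma-buf.h>
#include <linux/usb.h>

/*
 * Sketch: a udl-style driver for a display hanging off USB attaches
 * the USB host controller - the device that actually touches memory -
 * to the dma_buf, not the USB peripheral itself.
 */
static struct dma_buf_attachment *udl_attach_buf(struct usb_device *udev,
						 struct dma_buf *dmabuf)
{
	struct device *hci_dev = udev->bus->controller;

	return dma_buf_attach(dmabuf, hci_dev);
}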

Imo if there's a use-case where the client needs to frob the sg_list
before calling dma_map_sg, we have an issue with the dma subsystem in
general.

> That said, on a platform where everything had iommus or somehow
> didn't have any particular memory requirements, or where the exporter
> had the strictest requirements (or at least knew of the strictest
> requirements), then the exporter is free to allocate/pin the backing
> pages earlier, such as even before the buffer is exported.

Yeah, I think the important thing is that the dma_buf api should allow
decent buffer management. If certain subsystems ignore that and just
allocate up-front, no problem for me. But given how graphics drivers
for essentially all OSes have moved to dynamic buffer management, I expect
decoders, encoders, v4l devices and whatever else might sit in a graphics
pipeline to follow.

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-11-08 Thread Russell King - ARM Linux
On Tue, Nov 08, 2011 at 06:42:27PM +0100, Daniel Vetter wrote:
> Actually I think the importer should get a _mapped_ scatterlist when it
> calls get_scatterlist. The simple reason is that for strange stuff like
> memory remapped into e.g. omap's TILER there's no sensible notion of
> an address in physical memory. For the USB example I think the right
> approach is to attach the usb hci to the dma_buf; after all that is the
> device that will read the data and move it over the usb bus to the udl
> device. Similarly for any other device that sits behind a bus that can't
> do dma (or where it doesn't make sense to do dma).
> 
> Imo if there's a use-case where the client needs to frob the sg_list
> before calling dma_map_sg, we have an issue with the dma subsystem in
> general.

Let's clear something up about the DMA API, which I think is causing some
misunderstanding here.  For this purpose, I'm going to talk about
dma_map_single(), but the same applies to the scatterlist and _page
variants as well.

dma = dma_map_single(dev, cpuaddr, size, dir);

dev := the device _performing_ the DMA operation.  You are quite correct
       that in the case of a USB peripheral device, the device is normally
       the USB HCI device.

dma := dma address to be programmed into 'dev' which corresponds (by some
       means) with 'cpuaddr'.  This may not be the physical address due
       to bus offset translations or mappings set up in IOMMUs.

Therefore, it is wrong to talk about a 'physical address' when talking
about the DMA API.

We can take this one step further.  Let's say that the USB HCI is not
capable of performing memory accesses itself, but it is connected to a
separate DMA engine device:

mem <---> dma engine <---> usb hci <---> usb peripheral

(such setups do exist, but despite having such implementations I've never
tried to support them.)

In this case, the dma engine, in response to control signals from the
USB host controller, will generate the appropriate bus address to access
memory and transfer the data into the USB HCI device.

So, in this case, the struct device to be used for mapping memory for
transfers to the usb peripheral is the DMA engine device, not the USB HCI
device nor the USB peripheral device.
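
In code, the upshot is simply which struct device goes into the mapping call.
A minimal sketch for the topology above; 'dmaengine_dev' stands in for however
the driver gets at the DMA engine's struct device:

#include <linux/dma-mapping.h>

/*
 * Map a cpu buffer for the mem <---> dma engine <---> usb hci chain:
 * the mapping is done against the DMA engine, the device performing
 * the memory accesses, not the USB HCI or the USB peripheral.
 */
static int usb_transfer_one(struct device *dmaengine_dev,
			    void *cpuaddr, size_t size)
{
	dma_addr_t dma;

	/* map against the device that masters the bus: the dma engine */
	dma = dma_map_single(dmaengine_dev, cpuaddr, size, DMA_TO_DEVICE);
	if (dma_mapping_error(dmaengine_dev, dma))
		return -ENOMEM;

	/* ... program 'dma' into the dma engine and run the transfer ... */

	dma_unmap_single(dmaengine_dev, dma, size, DMA_TO_DEVICE);
	return 0;
}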

