Re: How to efficiently handle DMA and cache on ARMv7 ? (was Is get_user_pages() enough to prevent pages from being swapped out ?)

2009-08-26 Thread David Xiao
On Tue, 2009-08-25 at 16:17 -0700, Laurent Pinchart wrote:
 On Wednesday 26 August 2009 00:02:48 David Xiao wrote:
  On Tue, 2009-08-25 at 05:53 -0700, Steven Walter wrote:
   On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
   Linux <li...@arm.linux.org.uk> wrote:
   [...]
  
As far as userspace DMA coherency, the only way you could do it with
current kernel APIs is by using get_user_pages(), creating a
scatterlist from those, and then passing it to dma_map_sg().  While the
device has ownership of the SG, userspace must _not_ touch the buffer
until after DMA has completed.
  
   [...]
  
   Would that work on a processor with VIVT caches?  It seems not.  In
   particular, dma_map_page uses page_address to get a virtual address to
   pass to map_single().  map_single() in turn uses this address to
   perform cache maintenance.  Since page_address() returns the kernel
   virtual address, I don't see how any cache-lines for the userspace
   virtual address would get invalidated (for the DMA_FROM_DEVICE case).
  
   If that's true, then what is the correct way to allow DMA to/from a
   userspace buffer with a VIVT cache?  If not true, what am I missing?
 
  page_address() basically returns page->virtual, which records the
  virtual/physical mapping for both user and kernel space; the only thing
  that matters there is whether the page is in highmem or not.
 
 I'm not sure I get it. Are you implying that a physical page will then be
 mapped to the same address in all contexts (kernelspace and userspace
 processes)? Is that even possible? And if not, how could page->virtual
 store both the initial kernel map and all the userspace mappings?
 
Sorry for the confusion: page_address() indeed only returns the kernel
virtual address. To support VIVT cache maintenance for user space
mappings, the dma_map_sg()/dma_map_page() functions, or even struct
scatterlist, would seem to need modification so that the user virtual
address can be passed in, I think.
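
To make the aliasing problem concrete, here is a toy userspace model, not
kernel code. It assumes a direct-mapped VIVT cache with 32-byte lines and
two made-up virtual addresses (one "user", one "kernel") that alias the
same physical byte. All names and sizes are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy direct-mapped VIVT cache; geometry is an assumption for illustration. */
#define LINE_SHIFT 5u   /* 32-byte lines */
#define NUM_LINES  64u

struct vivt_line {
    bool     valid;
    uint32_t vtag;      /* tag derived from the VIRTUAL address */
    uint8_t  data;      /* cached copy of the byte held in RAM */
};

struct vivt_cache {
    struct vivt_line line[NUM_LINES];
};

static uint32_t vivt_index(uint32_t va)
{
    return (va >> LINE_SHIFT) % NUM_LINES;
}

/* CPU load through va; ram_byte is what RAM currently holds for that byte. */
static uint8_t vivt_read(struct vivt_cache *c, uint32_t va, uint8_t ram_byte)
{
    struct vivt_line *l = &c->line[vivt_index(va)];
    uint32_t tag = va >> LINE_SHIFT;

    if (!(l->valid && l->vtag == tag)) {    /* miss: fill from RAM */
        l->valid = true;
        l->vtag = tag;
        l->data = ram_byte;
    }
    return l->data;
}

/* Invalidate by virtual address: only a line tagged with this va matches. */
static void vivt_invalidate(struct vivt_cache *c, uint32_t va)
{
    struct vivt_line *l = &c->line[vivt_index(va)];

    if (l->valid && l->vtag == (va >> LINE_SHIFT))
        l->valid = false;
}

/* What userspace reads after a DMA_FROM_DEVICE transfer into the page. */
static uint8_t user_read_after_dma(bool invalidate_user_alias)
{
    const uint32_t user_va = 0x40001000u;   /* hypothetical user mapping */
    const uint32_t kern_va = 0xc0201000u;   /* hypothetical kernel mapping */
    struct vivt_cache c;
    uint8_t ram = 0x11;

    memset(&c, 0, sizeof(c));
    vivt_read(&c, user_va, ram);            /* userspace caches the old data */
    vivt_invalidate(&c, kern_va);           /* maintenance via kernel address */
    if (invalidate_user_alias)
        vivt_invalidate(&c, user_va);       /* maintenance via user address */
    ram = 0x22;                             /* device writes new data to RAM */
    return vivt_read(&c, user_va, ram);
}
```

In this model user_read_after_dma(false) returns the stale 0x11, because
invalidating by the kernel address never matches the line tagged with the
user address; user_read_after_dma(true) returns the fresh 0x22.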

David


--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How to efficiently handle DMA and cache on ARMv7 ? (was Is get_user_pages() enough to prevent pages from being swapped out ?)

2009-08-25 Thread David Xiao
On Tue, 2009-08-25 at 05:53 -0700, Steven Walter wrote:
 On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
 Linux <li...@arm.linux.org.uk> wrote:
 [...]
  As far as userspace DMA coherency, the only way you could do it with
  current kernel APIs is by using get_user_pages(), creating a scatterlist
  from those, and then passing it to dma_map_sg().  While the device has
  ownership of the SG, userspace must _not_ touch the buffer until after
  DMA has completed.
 [...]
 
 Would that work on a processor with VIVT caches?  It seems not.  In
 particular, dma_map_page uses page_address to get a virtual address to
 pass to map_single().  map_single() in turn uses this address to
 perform cache maintenance.  Since page_address() returns the kernel
 virtual address, I don't see how any cache-lines for the userspace
 virtual address would get invalidated (for the DMA_FROM_DEVICE case).
 
 If that's true, then what is the correct way to allow DMA to/from a
 userspace buffer with a VIVT cache?  If not true, what am I missing?

page_address() basically returns page->virtual, which records the
virtual/physical mapping for both user and kernel space; the only thing
that matters there is whether the page is in highmem or not.

David 




Re: How to efficiently handle DMA and cache on ARMv7 ? (was Is get_user_pages() enough to prevent pages from being swapped out ?)

2009-08-11 Thread David Xiao
On Tue, 2009-08-11 at 02:31 -0700, Catalin Marinas wrote:
 On Thu, 2009-08-06 at 22:59 -0700, David Xiao wrote:
  The V7 speculative prefetching will then probably apply to the DMA
  coherency issue in general, for both kernel and user space DMA. Could
  this be addressed by calling dma_cache_maint() inside
  dma_unmap_sg/single() when the direction is
  DMA_FROM_DEVICE/DMA_BIDIRECTIONAL, to invalidate the related cache lines
  in case any were filled by prefetching? This assumes
  dma_unmap_sg/single() is called after each DMA operation has completed.
 
 Theoretically, with speculative prefetching on ARMv7 and the FROM_DEVICE
 case we need to invalidate the corresponding D-cache lines both before
 and after the DMA transfer, i.e. in both dma_map_sg and dma_unmap_sg,
 otherwise there is a risk of stale data in the cache.
 
The dma_map_sg() code already calls dma_cache_maint() to invalidate
the cache lines in the DMA_FROM_DEVICE/DMA_BIDIRECTIONAL direction
cases. The suggestion was to do something similar in dma_unmap_sg()
to deal with the speculative prefetching on ARMv7, and Russell has
other postings discussing the details of this in terms of
feasibility, etc.
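
The ordering issue can be illustrated with a toy userspace model, not
kernel code. It assumes a single cache line in front of a single RAM word,
and a speculative prefetch that happens to fire while the device still
owns the buffer. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy system: one RAM word and one cache line that may hold a copy of it. */
struct toy_system {
    uint32_t ram;           /* memory word, written by the device */
    uint32_t cached;        /* CPU's cached copy */
    bool     line_valid;
};

/* Demand load or speculative prefetch: fill the line if it is not valid. */
static void cache_fill(struct toy_system *s)
{
    if (!s->line_valid) {
        s->cached = s->ram;
        s->line_valid = true;
    }
}

static void cache_invalidate(struct toy_system *s)
{
    s->line_valid = false;
}

static uint32_t cpu_read(struct toy_system *s)
{
    cache_fill(s);
    return s->cached;
}

/* One DMA_FROM_DEVICE transfer with a prefetch in the middle of it. */
static uint32_t dma_from_device(bool invalidate_on_unmap)
{
    struct toy_system s = { .ram = 0xAAAAAAAAu, .cached = 0, .line_valid = false };

    cpu_read(&s);               /* buffer was cached before the transfer */
    cache_invalidate(&s);       /* "map" step: invalidate before DMA starts */
    cache_fill(&s);             /* speculative prefetch during the transfer */
    s.ram = 0xBBBBBBBBu;        /* device finishes writing the new data */
    if (invalidate_on_unmap)
        cache_invalidate(&s);   /* "unmap" step: invalidate after DMA ends */
    return cpu_read(&s);
}
```

In this model dma_from_device(false) returns the old 0xAAAAAAAA because the
prefetched line survives the transfer, while dma_from_device(true) returns
the device's 0xBBBBBBBB.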

Furthermore, duplicate MMU mappings in the kernel bring more twists to
this problem as explained in this email chain as well, especially in the
case of DMA-coherent memory (dma_alloc_coherent()).

David   




Re: How to efficiently handle DMA and cache on ARMv7 ? (was Is get_user_pages() enough to prevent pages from being swapped out ?)

2009-08-07 Thread David Xiao
On Fri, 2009-08-07 at 13:28 -0700, Russell King - ARM Linux wrote:

 The kernel direct mapping maps all system (low) memory with normal
 memory cacheable attributes.
 
 So using vmalloc, dma_alloc_coherent, using pages in userspace all
 create duplicate mappings of pages.
 

If we do want to remove all these duplicate mappings, as part of a
solution to deal with the speculative prefetching, probably one way is
to not map all the RAM into the direct-mapped space at paging_init()
time, and instead map it on demand from the different upper layer
allocation functions, such as vmalloc/dma_alloc_coherent/do_brk/kmalloc/
get_free_pages/etc. The distinction between upper layer allocation
functions and non-upper layer ones must then be made clear, though.

I know that mapping the RAM at paging_init() time can take advantage of
1M section mappings most of the time, and thus save many 1KB L2 page
tables. But a lot of memory still ends up being remapped with L2 page
tables later on, and 1KB might not be as precious as it used to be
either :-)
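
For a rough idea of the cost, a back-of-the-envelope sketch. It counts
hardware L2 descriptors only and assumes section-aligned RAM, 4KB small
pages and 4-byte descriptors; any extra per-table bookkeeping the kernel
keeps is not included. The function name is made up.

```c
#include <stddef.h>

#define SECTION_SIZE  (1024u * 1024u)  /* one first-level section maps 1MB */
#define SMALL_PAGE    4096u            /* one L2 descriptor maps 4KB */
#define L2_ENTRY_SIZE 4u               /* bytes per L2 descriptor */

/* Bytes of L2 tables needed to map ram_bytes with 4KB pages, no sections. */
static size_t l2_table_bytes(size_t ram_bytes)
{
    size_t sections = ram_bytes / SECTION_SIZE;
    size_t entries_per_table = SECTION_SIZE / SMALL_PAGE;   /* 256 entries */

    return sections * entries_per_table * L2_ENTRY_SIZE;    /* 1KB per MB */
}
```

With these assumptions, mapping 256MB of RAM entirely with small pages
costs 256KB of L2 tables, i.e. about 0.1% of the memory being mapped.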

David





Re: How to efficiently handle DMA and cache on ARMv7 ? (was Is get_user_pages() enough to prevent pages from being swapped out ?)

2009-08-06 Thread David Xiao
On Thu, 2009-08-06 at 06:06 -0700, Laurent Pinchart wrote:
 Hi Ben,
 
 On Thursday 06 August 2009 13:46:19 Ben Dooks wrote:
  On Thu, Aug 06, 2009 at 12:08:21PM +0200, Laurent Pinchart wrote:
 [snip]
  
   The second problem is to ensure cache coherency. As the userspace
   application will read data from the video buffers, those buffers will end
   up being cached in the processor's data cache. The driver does need to
   invalidate the cache before starting the DMA operation (userspace could
   in theory write to the buffers, but the data will be overwritten by DMA
   anyway, so there's no need to clean the cache).
 
  You'll need to clean the write buffers, otherwise the CPU may have data
  queued that it has yet to write back to memory.
 
 Good points, thanks.

I thought this should have been taken care of by the CPU-specific
dma_inv_range routine. However, in arch/arm/mm/cache-v7.c,
v7_dma_inv_range does not drain the write buffer, whereas
v6_dma_inv_range does so at the end of all the cache maintenance
operations.
So this is probably something Russell can clarify.

 
   As the cache is of the VIPT (Virtual Index Physical Tag) type, cache
   invalidation can either be done globally (in which case the cache is
   flushed instead of being invalidated) or based on virtual addresses. In
   the latter case the processor will need to look physical addresses up,
   either in the TLB or through a hardware table walk.
  
   I can see three solutions to the DMA/cache problem.
  
   1. Flushing the whole data cache right before starting the DMA transfer.
   There's no API for that in the ARM architecture, so a whole I+D cache
   flush is required. This is quite costly, we're talking about around 30
   flushes per second, but it doesn't involve the MMU. That's the solution
   that I currently use.
  
   2. Invalidating only the cache lines that store video buffer data. This
   requires a TLB lookup or a hardware table walk, so the userspace
   application MM context needs to be available (no problem there as we're
   flushing in userspace context) and all pages need to be mapped properly.
   This can be a problem as, as Hugh pointed out, pages can still be
   unmapped from the userspace context after get_user_pages() returns. I
   have experienced one oops due to a kernel paging request failure:
 
  If you already know the virtual addresses of the buffers, why do you need
  a TLB lookup (or am I being dense here?)
 
 The virtual address is used to compute the cache line index, and the
 physical address is then used when comparing the cache line tag. So the
 processor (or actually the CP15 coprocessor if I'm not wrong) does a TLB
 lookup to get the physical address during cache invalidation/flushing.
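
The index/tag split can be sketched as follows. This is a generic
illustration, not a description of any particular core: the geometry
struct, the function names and the example values are all assumptions.

```c
#include <stdint.h>

/* Hypothetical cache geometry: line size in bytes and number of sets. */
struct cache_geometry {
    uint32_t line_bytes;
    uint32_t num_sets;
};

/* VIPT: the set index is taken from the VIRTUAL address... */
static uint32_t vipt_set_index(const struct cache_geometry *g, uint32_t va)
{
    return (va / g->line_bytes) % g->num_sets;
}

/* ...and the tag compared within that set comes from the PHYSICAL address,
 * which is why a translation (TLB hit or table walk) is needed. */
static uint32_t vipt_tag(const struct cache_geometry *g, uint32_t pa)
{
    return pa / (g->line_bytes * g->num_sets);
}
```

For example, with 64-byte lines and 128 sets, virtual address 0x00012040
selects set 1, and a physical address of 0x80004040 gives tag 0x40002.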
 
   Unable to handle kernel paging request at virtual address
   44e12000 pgd = c8698000
   [44e12000] *pgd=8a4fd031, *pte=8cfda1cd, *ppte=
   Internal error: Oops: 817 [#1] PREEMPT
   PC is at v7_dma_inv_range+0x2c/0x44
  
   Fixing this requires more investigation, and I'm not sure how to proceed
   to find out if the page fault is really caused by pages being unmapped
   from the userspace context. Help would be appreciated.
  
   3. Mark the pages as non-cacheable. Depending on how the buffers are then
   used by userspace, the additional cache misses might destroy any benefit
   I would get from not flushing the cache before DMA. I'm not sure how to
   mark a bunch of pages as non-cacheable though. What usually happens is
   that video drivers allocate DMA-coherent memory themselves, but in this
   case I need to deal with an arbitrary buffer allocated by userspace. If
   someone has any experience with this, it would be appreciated.
 

Another approach is to work from the opposite direction: the kernel
allocates a non-cached buffer and then mmap()s it into user space. I have
done that in a similar situation to try to achieve zero-copy.


David

