Re: Question about lima kernel MM implementation

2018-02-18 Thread Qiang Yu
>
>
> >
> > Current MM:
> > 1. drm_gem_cma_object, only supports contiguous memory
>
> Please note that drm_gem_cma_object only looks at memory after the MMU
> has done the mapping. If you have a good IOMMU driver that correctly
> registers the dma_ops, then you can allocate memory from anywhere and
> still import it into the lima driver via the
> drm_gem_cma_prime_import_sg_table() helper attached to the
> gem_prime_import_sg_table hook.
>
Thanks, good to know this. But some Mali400/450 SoCs don't have an IOMMU
at all, so I can't rely on one.

Regards,
Qiang


>
> > 3. buffers are not actually allocated at GEM_CREATE time, but in the
> > CPU page fault handler and in task-submit buffer validation, which
> > ensures there are no GPU page faults
> > 4. in the shrinker handler, free unused pages in the pool; if that is
> > still not enough, swap some idle buffers to disk
> >
> > 3 & 4 apply to both dma_alloc buffers and alloc_page buffers.
> >
> > Thanks,
> > Qiang
>
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Question about lima kernel MM implementation

2018-02-15 Thread Liviu Dudau
On Tue, Feb 13, 2018 at 09:34:26PM +0800, Qiang Yu wrote:
> Hi guys,
> 
> I'm working on the Lima project for the ARM Mali400/450 GPU. The lima
> kernel driver currently uses CMA for all buffers, but the Mali400/450
> GPU has an MMU for each vertex/fragment shader processor, so I want to
> extend the lima kernel driver to support non-contiguous memory.
>
> After investigating the MM methods currently used by several Linux
> kernel DRM drivers, I can't find one that exactly matches lima's needs.
> So I'd like to hear some advice from you, in case I have misunderstood
> the current MMs or there is a better approach. If no existing MM fits,
> I may have to write one for lima.
> 
> About the Mali400/450 GPU:
> 1. it has separate vertex and fragment shader processors; 1 vertex
> processor and 1~4 fragment processors are grouped to process an
> OpenGL draw
> 2. each processor has an MMU that works independently
> 3. Mali400/450 will work with different display DRM drivers; some
> display DRM drivers support non-contiguous framebuffers and some
> do not
> 
> My requirements:
> 1. support non-contiguous memory allocation for GPU buffers
> 2. also support contiguous memory allocation, for exporting to some
> display DRM drivers as framebuffers
> 3. no GPU page faults, for better performance and to avoid handling
> page faults on multiple MMUs; CPU page faults are OK
> 4. ideally, support swapping buffers to disk when memory is full
> 
> Current MM:
> 1. drm_gem_cma_object, only supports contiguous memory

Please note that drm_gem_cma_object only looks at memory after the MMU
has done the mapping. If you have a good IOMMU driver that correctly
registers the dma_ops, then you can allocate memory from anywhere and
still import it into the lima driver via the
drm_gem_cma_prime_import_sg_table() helper attached to the
gem_prime_import_sg_table hook.
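To illustrate the point (a toy userspace model in Python, not kernel code; all names here are made up): the IOMMU maps scattered physical pages into one contiguous device-address (IOVA) range, so a consumer that expects "contiguous" memory only ever sees contiguous IOVAs.

```python
# Toy model: an IOMMU maps scattered physical pages into one
# contiguous device-address (IOVA) range.

PAGE = 4096

class Iommu:
    def __init__(self):
        self.table = {}        # IOVA page index -> physical page address
        self.next_iova = 0

    def map_sg(self, phys_pages):
        """Map a scatter list of physical pages; return one contiguous IOVA."""
        base = self.next_iova
        for i, pa in enumerate(phys_pages):
            self.table[base // PAGE + i] = pa
        self.next_iova = base + len(phys_pages) * PAGE
        return base

    def translate(self, iova):
        # Device-visible address -> backing physical address.
        return self.table[iova // PAGE] + (iova % PAGE)

# Physically scattered pages (deliberately non-contiguous).
pages = [0x10000, 0x8000, 0x30000]
iommu = Iommu()
iova = iommu.map_sg(pages)

# The device sees one contiguous range starting at `iova`.
assert iommu.translate(iova) == 0x10000
assert iommu.translate(iova + PAGE + 4) == 0x8004
```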

> 2. drm_gem_get_pages
>   1) needs to be combined with the CMA method for contiguous memory
>   2) when shrinking is needed, swapping idle buffers to disk and
>   putting the pages back is something I'd need to implement myself
>   3) introduces an additional shmem layer
> 3. TTM with TTM_PL_SYSTEM only
>   1) no contiguous memory support
>   2) too complicated, since we don't need TTM's other functions
>   3) needs GPU page faults to populate memory?
>   4) no page pool for cached memory
> 
> My plan:
> 1. for contiguous memory allocation, use dma_alloc_*
> 2. for non-contiguous memory allocation, use a page pool filled by
> alloc_page

You should probably try to figure out what your primary memory allocator
is. Most of the time you don't want the GPU driver to allocate the
memory; you want it to come from a library that takes into account all
the constraints of the devices in the chain (GPU + display driver).
There is more to memory allocation for GPUs than contiguous memory
(alignment, buffer encoding, etc.).
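A small sketch of that idea (illustrative only, not any real allocator API; the constraint values are hypothetical): a central allocator has to satisfy the union of every device's constraints, not just the GPU's.

```python
# Sketch: merge per-device allocation constraints into one request.
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def merge_constraints(devices):
    """Combine per-device constraints into a single allocation request."""
    merged = {"align": 1, "contiguous": False}
    for dev in devices:
        # Alignment must satisfy every device -> least common multiple.
        merged["align"] = lcm(merged["align"], dev.get("align", 1))
        # One device needing physically contiguous memory forces it.
        merged["contiguous"] = merged["contiguous"] or dev.get("contiguous", False)
    return merged

gpu = {"align": 64}                            # hypothetical GPU constraint
display = {"align": 4096, "contiguous": True}  # hypothetical scanout constraint
req = merge_constraints([gpu, display])
assert req == {"align": 4096, "contiguous": True}
```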

Best regards,
Liviu

> 3. buffers are not actually allocated at GEM_CREATE time, but in the
> CPU page fault handler and in task-submit buffer validation, which
> ensures there are no GPU page faults
> 4. in the shrinker handler, free unused pages in the pool; if that is
> still not enough, swap some idle buffers to disk
>
> 3 & 4 apply to both dma_alloc buffers and alloc_page buffers.
> 
> Thanks,
> Qiang



-- 
"I would like to fix the world but no one gives me the source code."


Re: Question about lima kernel MM implementation

2018-02-15 Thread Qiang Yu
>
>
> >
> > My requirements:
> > 1. support non-contiguous memory allocation for GPU buffers
> > 2. also support contiguous memory allocation, for exporting to some
> > display DRM drivers as framebuffers
>
> btw, I think etnaviv deals w/ contiguous scanout buffer by just
> importing the scanout buffer from the other display drm driver.  So I
> think you could avoid having to allocate these buffers.
>
Right, after looking at the renderonly lib of imx+etnaviv, it creates a
dumb buffer on the display DRM driver and exports it to etnaviv. With
this method, it seems lima just needs to support non-contiguous memory
allocation.

Thanks,
Qiang


>
> > 3. no GPU page faults, for better performance and to avoid handling
> > page faults on multiple MMUs; CPU page faults are OK
> > 4. ideally, support swapping buffers to disk when memory is full
> >
> > Current MM:
> > 1. drm_gem_cma_object, only supports contiguous memory
> > 2. drm_gem_get_pages
> >   1) needs to be combined with the CMA method for contiguous memory
> >   2) when shrinking is needed, swapping idle buffers to disk and
> >   putting the pages back is something I'd need to implement myself
> >   3) introduces an additional shmem layer
> > 3. TTM with TTM_PL_SYSTEM only
> >   1) no contiguous memory support
> >   2) too complicated, since we don't need TTM's other functions
> >   3) needs GPU page faults to populate memory?
> >   4) no page pool for cached memory
> >
> > My plan:
> > 1. for contiguous memory allocation, use dma_alloc_*
> > 2. for non-contiguous memory allocation, use a page pool filled by
> > alloc_page
> > 3. buffers are not actually allocated at GEM_CREATE time, but in the
> > CPU page fault handler and in task-submit buffer validation, which
> > ensures there are no GPU page faults
> > 4. in the shrinker handler, free unused pages in the pool; if that is
> > still not enough, swap some idle buffers to disk
> >
> > 3 & 4 apply to both dma_alloc buffers and alloc_page buffers.
> >
> > Thanks,
> > Qiang


Re: Question about lima kernel MM implementation

2018-02-14 Thread Rob Clark
On Tue, Feb 13, 2018 at 8:34 AM, Qiang Yu  wrote:
> Hi guys,
>
> I'm working on the Lima project for the ARM Mali400/450 GPU. The lima
> kernel driver currently uses CMA for all buffers, but the Mali400/450
> GPU has an MMU for each vertex/fragment shader processor, so I want to
> extend the lima kernel driver to support non-contiguous memory.
>
> After investigating the MM methods currently used by several Linux
> kernel DRM drivers, I can't find one that exactly matches lima's needs.
> So I'd like to hear some advice from you, in case I have misunderstood
> the current MMs or there is a better approach. If no existing MM fits,
> I may have to write one for lima.
>
> About the Mali400/450 GPU:
> 1. it has separate vertex and fragment shader processors; 1 vertex
> processor and 1~4 fragment processors are grouped to process an
> OpenGL draw
> 2. each processor has an MMU that works independently
> 3. Mali400/450 will work with different display DRM drivers; some
> display DRM drivers support non-contiguous framebuffers and some
> do not
>
> My requirements:
> 1. support non-contiguous memory allocation for GPU buffers
> 2. also support contiguous memory allocation, for exporting to some
> display DRM drivers as framebuffers

btw, I think etnaviv deals w/ contiguous scanout buffer by just
importing the scanout buffer from the other display drm driver.  So I
think you could avoid having to allocate these buffers.

(iirc, etnaviv does need contiguous buffers internally for a few
things, like cmdstream (?) and mmu pagetables)

BR,
-R

> 3. no GPU page faults, for better performance and to avoid handling
> page faults on multiple MMUs; CPU page faults are OK
> 4. ideally, support swapping buffers to disk when memory is full
>
> Current MM:
> 1. drm_gem_cma_object, only supports contiguous memory
> 2. drm_gem_get_pages
>   1) needs to be combined with the CMA method for contiguous memory
>   2) when shrinking is needed, swapping idle buffers to disk and
>   putting the pages back is something I'd need to implement myself
>   3) introduces an additional shmem layer
> 3. TTM with TTM_PL_SYSTEM only
>   1) no contiguous memory support
>   2) too complicated, since we don't need TTM's other functions
>   3) needs GPU page faults to populate memory?
>   4) no page pool for cached memory
>
> My plan:
> 1. for contiguous memory allocation, use dma_alloc_*
> 2. for non-contiguous memory allocation, use a page pool filled by
> alloc_page
> 3. buffers are not actually allocated at GEM_CREATE time, but in the
> CPU page fault handler and in task-submit buffer validation, which
> ensures there are no GPU page faults
> 4. in the shrinker handler, free unused pages in the pool; if that is
> still not enough, swap some idle buffers to disk
>
> 3 & 4 apply to both dma_alloc buffers and alloc_page buffers.
>
> Thanks,
> Qiang
>
>


Question about lima kernel MM implementation

2018-02-14 Thread Qiang Yu
Hi guys,

I'm working on the Lima project for the ARM Mali400/450 GPU. The lima
kernel driver currently uses CMA for all buffers, but the Mali400/450
GPU has an MMU for each vertex/fragment shader processor, so I want to
extend the lima kernel driver to support non-contiguous memory.

After investigating the MM methods currently used by several Linux
kernel DRM drivers, I can't find one that exactly matches lima's needs.
So I'd like to hear some advice from you, in case I have misunderstood
the current MMs or there is a better approach. If no existing MM fits,
I may have to write one for lima.

About the Mali400/450 GPU:
1. it has separate vertex and fragment shader processors; 1 vertex
processor and 1~4 fragment processors are grouped to process an
OpenGL draw
2. each processor has an MMU that works independently
3. Mali400/450 will work with different display DRM drivers; some
display DRM drivers support non-contiguous framebuffers and some
do not

My requirements:
1. support non-contiguous memory allocation for GPU buffers
2. also support contiguous memory allocation, for exporting to some
display DRM drivers as framebuffers
3. no GPU page faults, for better performance and to avoid handling
page faults on multiple MMUs; CPU page faults are OK
4. ideally, support swapping buffers to disk when memory is full

Current MM:
1. drm_gem_cma_object, only supports contiguous memory
2. drm_gem_get_pages
  1) needs to be combined with the CMA method for contiguous memory
  2) when shrinking is needed, swapping idle buffers to disk and
  putting the pages back is something I'd need to implement myself
  3) introduces an additional shmem layer
3. TTM with TTM_PL_SYSTEM only
  1) no contiguous memory support
  2) too complicated, since we don't need TTM's other functions
  3) needs GPU page faults to populate memory?
  4) no page pool for cached memory

My plan:
1. for contiguous memory allocation, use dma_alloc_*
2. for non-contiguous memory allocation, use a page pool filled by
alloc_page
3. buffers are not actually allocated at GEM_CREATE time, but in the
CPU page fault handler and in task-submit buffer validation, which
ensures there are no GPU page faults
4. in the shrinker handler, free unused pages in the pool; if that is
still not enough, swap some idle buffers to disk

3 & 4 apply to both dma_alloc buffers and alloc_page buffers.
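The policy in items 2-4 can be sketched in userspace Python (a model of the allocation/reclaim logic only, not kernel code; all class and function names are made up): buffers get no pages at GEM_CREATE, pages come from a pool at first CPU fault or task-submit validation, and the shrinker drains unused pool pages before "swapping" an idle buffer.

```python
# Userspace model of the deferred-allocation + shrinker plan.

class PagePool:
    def __init__(self):
        self.free = []         # cached, currently unused pages
        self.next_page = 0

    def get_page(self):
        # Reuse a cached page if possible, else "allocate" a new one.
        if self.free:
            return self.free.pop()
        self.next_page += 1
        return self.next_page

class Buffer:
    def __init__(self, npages, pool):
        self.npages, self.pool = npages, pool
        self.pages = None      # nothing allocated at GEM_CREATE time
        self.idle = True

    def validate(self):
        # Called from the CPU fault handler or task-submit validation;
        # once this has run, the GPU can never fault on this buffer.
        if self.pages is None:
            self.pages = [self.pool.get_page() for _ in range(self.npages)]
        self.idle = False

def shrink(pool, buffers):
    # Free unused pool pages first; if that reclaims nothing, drop the
    # pages of one idle buffer (standing in for "swap it to disk").
    freed = len(pool.free)
    pool.free.clear()
    if freed == 0:
        for buf in buffers:
            if buf.idle and buf.pages:
                buf.pages, freed = None, buf.npages
                break
    return freed

pool = PagePool()
buf = Buffer(4, pool)
assert buf.pages is None          # deferred allocation at create time
buf.validate()
assert len(buf.pages) == 4        # backed on first use
buf.idle = True
assert shrink(pool, [buf]) == 4   # pool empty -> idle buffer swapped out
assert buf.pages is None
```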

Thanks,
Qiang