On Mon, 18 Feb 2008 13:11:27 +0100 Dodji Seketeli <[EMAIL PROTECTED]> babbled:
> Hello,
>
> I have been investigating the use of DMA to transfer pixmaps from
> system ram to the video memory of the glamo chip, on gta02.
>
> Problem statement
> =================
>
> The bus that interconnects the glamo GPU to the CPU of the s3c2442
> chip is very slow. Some measurements of the throughput when
> memcpy-ing data from system ram to the video ram showed that it goes
> at 30MB/sec, whereas doing copies within system ram goes at around
> 130MB/sec, if not higher.

i measured 160mb/sec for system -> system memcpy()'s, so 30 vs 160...
sucketh muchly :)

> The KAA acceleration architecture we are using allows us to use
> hardware capabilities to accelerate the transfer of pixmaps from
> system ram to video ram. That transfer is called pixmap migration in
> X lingo.
>
> On the other hand, the s3c2442 SoC has a DMA module that can perform
> data copies without using the processor. In our case, using that DMA
> module would not help us go faster than 30MB/sec, because that speed
> is a bus limitation. It will, however (hopefully), free the processor
> to do something else during the transfer.

indeedily :) the 30mb/sec means that the cpu is locked into wasting
cycles doing the copy, which really slows down uploads of data - and
since xrender and such are not accelerated yet (and at best can only
be partly accelerated), anything "fancy" will require lots of uploads.
*IF* the x client and xserver are implemented correctly, we get the
uploads "for free", as long as the client has something it can do
while the upload happens (for example - re-calculating the next frame
of animation). i do know evas does this in its rendering model, so if
my benchmark numbers are right, this might give about a 40-50%
framerate increase. not shabby. if you block and wait for the transfer
before continuing, we do free up the cpu to go idle or run some other
process outside the x client doing the drawing/upload and the xserver,
but you won't see framerate increases.
> What is needed
> ==============
>
> So we need the framebuffer driver to expose entry points to perform
> pixmap copies using the DMA module of the s3c2442 chip. Those entry
> points would then be called from within the Xglamo server to perform
> the pixmap migrations.
>
> What has been done
> ==================
>
> I spent the last week understanding the s3c2442 DMA module and kernel
> API. I started putting together an implementation of a DMA-based
> pixmap copy. You can find the patch I have written attached to bug
> http://bugzilla.openmoko.org/cgi-bin/bugzilla/show_bug.cgi?id=1234 .
>
> There are a few gotchas in that patch, so here is what it does:
>
> 1/ It creates a new blocking ioctl (called
> FBIO_GLAMO_UPLOAD_PIXMAP_TO_VRAM) in the framebuffer device driver.
> That ioctl is meant to copy a pixmap that resides in system ram to a
> destination in video ram.
>
> 2/ The pixmap is first copied into an in-kernel buffer that is DMA
> friendly. The DMA transfer then copies the pixmap from this in-kernel
> buffer to the destination in vram. The ultimate implementation should
> get rid of this copy into the in-kernel DMA-friendly buffer. I needed
> to do this for now, for the sake of simplicity, to get things going
> and have a chance to get the whole chain working first.
>
> 3/ The pixmap is then transferred to vram _line by line_ using DMA.
> The line-by-line part is important to notice here because it is bad
> performance-wise. But as a pixmap copy must inherently be done line
> by line, there is no easy way around that for now. Ultimately,
> though, I should be using a bounce buffer in offscreen vram. A bounce
> buffer is a buffer allocated in offscreen (non-visible) vram. The
> pixmap data would be bulk copied in there first. Then, when that is
> done, the driver would use the glamo blitter to do the proper copy of
> the pixmap (line by line) to the actual final destination in vram.
> That way, the DMA won't be done line by line, and the processor won't
> be used to do the copy either.
>
> 4/ I was obliged to hack the s3c24xx DMA API a bit to make it support
> the type of DMA transfer I needed. There are actually two types of
> DMA transfer supported by the s3c2442 module: software mode and
> hardware mode. In software mode, the software triggers the transfer,
> whereas in hardware mode, it is the device the data is transferred to
> (or from) that triggers the transfer. That requires special wiring
> between the device and the s3c2442 chip.
> In the case of the glamo chip, though, from the s3c2442 DMA module's
> perspective, accessing vram is like accessing normal memory, so there
> is no special wiring in place to do hardware mode DMA. We must
> therefore do software mode DMA. Unfortunately, software mode DMA was
> not really supported by the s3c24xx DMA API that is in the kernel
> right now, so I hacked it a bit to support it. That is in the patch
> attached to the bug I referred to earlier; look in the file
> linux-2.6.24/arch/arm/plat-s3c24xx/dma.c.
>
> 5/ I wrote a test application named test-glamo-dma to test/debug the
> whole thing outside of X. Its OE package source is also attached to
> the bug.
>
> What needs to be done
> =====================
>
> Well, continue hacking on this and make it actually usable from an
> Xglamo perspective. When that is done, make Xglamo use it, and see if
> it is fast enough.
>
> That's all folks.
>
> Thanks for reading so far :-)
>
> Dodji.

sounds great. :) all the bits are there - just need to work, then be
streamlined :)

-- 
Carsten Haitzler (The Rasterman) <[EMAIL PROTECTED]>

