Hello,

I have been investigating the use of DMA to transfer pixmaps from system ram to the video memory of the glamo chip, on gta02.

Problem statement
=================

The bus that connects the glamo GPU to the CPU of the s3c2442 chip is very slow. Measurements of the throughput when memcpy-ing data from system ram to video ram showed that it goes at about 30MB/sec, whereas copies within system ram go at around 130MB/sec, if not higher.

The Kaa acceleration architecture we are using allows us to use hardware capabilities to accelerate the transfer of pixmaps from system ram to video ram. That transfer is called pixmap migration in X lingo.

On the other hand, the s3c2442 SoC has a DMA module that can perform data copies without using the processor. In our case, using that DMA module would not help us go faster than 30MB/sec, because that speed is a bus limitation. It will however (hopefully) free the processor to do something else during the transfer.

What is needed
==============

So we need the framebuffer driver to expose an entry point that performs pixmap copies using the DMA module of the s3c2442 chip. That entry point would then be called from within the Xglamo server to perform the pixmap migrations.

What has been done
==================

I spent the last week understanding the s3c2442 DMA module and its kernel api. I started putting together an implementation of a DMA based pixmap copy. You can find the patch I have written attached to bug http://bugzilla.openmoko.org/cgi-bin/bugzilla/show_bug.cgi?id=1234 .

There are a few gotchas in that patch, so here is what it does:

1/ It adds a new blocking ioctl (called FBIO_GLAMO_UPLOAD_PIXMAP_TO_VRAM) to the framebuffer device driver. That ioctl is meant to copy a pixmap that resides in system ram to a destination in video ram.

2/ The pixmap is first copied into an in-kernel buffer that is DMA friendly. The DMA transfer then copies the pixmap from this in-kernel buffer to the destination, in vram. The ultimate implementation should get rid of this copy to the in-kernel DMA friendly buffer.
I needed to do this for now, for the sake of simplicity, to get things going and to have a chance to get the whole chain working first.

3/ The pixmap is then transferred to vram, _line by line_, using DMA. The line by line part is important to notice because it is bad performance wise: each line needs its own DMA transfer. But as a pixmap copy is inherently done line by line, there is no easy way around that for now. Ultimately though, I should be using a bounce buffer in offscreen vram. A bounce buffer is a buffer allocated in offscreen (non visible) vram. The pixmap data would be bulk copied in there first. Then, when that is done, the driver would use the glamo blitter to do the proper copy of the pixmap (line by line) to the actual final destination in vram. That way, the DMA won't be done line by line, and the processor won't be used to do the copy either.

4/ I was obliged to hack the s3c24xx DMA api a bit to make it support the type of DMA transfer I needed. There are actually two types of DMA transfer supported by the s3c2442 module: software mode and hardware mode. In software mode, the software triggers the transfer, whereas in hardware mode, it is the device the data is transferred to (or from) that triggers the transfer. Hardware mode requires special wiring between the device and the s3c2442 chip. In the case of the glamo chip though, from the s3c2442 DMA module's perspective, accessing vram is like accessing normal memory, so there is no special wiring in place to do hardware mode DMA. We must then do software mode DMA. Unfortunately, software mode DMA was not really supported by the s3c24xx DMA api that is in the kernel right now, so I hacked it a bit to support it. That is in the patch attached to the bug I referred to earlier; look in the file linux-2.6.24/arch/arm/plat-s3c24xx/dma.c.

5/ I wrote a test application named test-glamo-dma to test/debug the whole thing outside of X. Its OE package source is also attached to the bug.
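
To illustrate why the copy in 3/ ends up being done line by line, here is a small userspace sketch. It is not code from the patch: struct pixmap_region and copy_pixmap() are names made up for this example, and memcpy() stands in for one DMA transfer. The point it tries to show is that a single bulk transfer is only possible when both the source and the destination are tightly packed (pitch equal to the line length). That is typically not the case for a destination inside a wider vram surface, and it is what a packed bounce buffer would give us.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a pixmap area, for illustration only. */
struct pixmap_region {
    uint8_t *base;  /* first byte of the first line */
    size_t pitch;   /* bytes from the start of one line to the next */
};

/*
 * Copy `height` lines of `line_bytes` bytes each from src to dst.
 * Each memcpy() call stands in for one DMA transfer.
 * Returns the number of transfers that were needed.
 */
static unsigned copy_pixmap(struct pixmap_region dst,
                            struct pixmap_region src,
                            size_t line_bytes, unsigned height)
{
    /* A line can never be longer than the pitch of its buffer. */
    assert(line_bytes <= src.pitch && line_bytes <= dst.pitch);

    if (height == 0 || line_bytes == 0)
        return 0;

    /* Both sides tightly packed: one bulk transfer is enough. */
    if (src.pitch == line_bytes && dst.pitch == line_bytes) {
        memcpy(dst.base, src.base, line_bytes * height);
        return 1;
    }

    /* Otherwise every line needs its own transfer. */
    for (unsigned y = 0; y < height; y++)
        memcpy(dst.base + (size_t)y * dst.pitch,
               src.base + (size_t)y * src.pitch,
               line_bytes);
    return height;
}
```

With a packed source and a destination whose pitch is wider than the line, copy_pixmap() issues one transfer per line. With a packed destination of the same size, which is the role the bounce buffer would play, it issues a single transfer.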

What needs to be done
=====================

Well, continue hacking on this and make it actually usable from an Xglamo perspective. When that is done, make Xglamo use it, and see if it is fast enough.

That's all folks. Thanks for reading so far :-)

Dodji.

-- 
OpenedHand Ltd.
Unit R, Homesdale Business Center / 216 - 218 Homesdale Road /
Bromley / BR1 2QZ / United Kingdom.
Tel,fax: +44 (0) 208 819 6559
Expert Open Source For Consumer Devices - http://www.openedhand.com

