On 11/23/05, Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
> Hi !
>
> I just popped to this list for the first time, so pardon me if that's
> already been asked/discussed, but I wanted to raise a couple of things
> that I think are important for proper operations on things like powerpc
> platforms (or other non-x86 & big endian)
>
> - VGA access. While it's generally always possible to generate the
> legacy VGA io cycles, it's inconvenient due to the hard decoding. The
> VGA memory cycles however cannot be produced on the bus on a number of
> platforms. For example, a Mac can only issue PCI MMIO cycles in the
> range 0x80000000 ... 0xff000000, that excludes the VGA "memory" hole.
> It would be very useful if the card defined an additional PCI BAR of,
> let's say, 128k, that contains an alias of the VGA space. That would
> allow usage of legacy VGA modes & text mode on platforms that don't have
> the memory hole. As far as the PCI interface implementation is
> concerned, it's basically a matter of decoding accesses through that BAR
> exactly as if they were coming from the low VGA addresses. To simplify
> the decoding, you may want to implement a 1MB BAR that only "catches"
> addresses in the VGA range. In a similar vein, all VGA IO registers
> should be available via a relocatable BAR, either in MMIO space along
> with the other card registers, or via a separate IO BAR. That way, it's
> possible to use all VGA features without having to do hard-decoding,
> thus allowing several VGA cards without the complicated ping-pong'ing of
> IOs.

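To make the aliasing idea above concrete, here is a minimal C sketch of the two decode options (a 128k alias BAR, or a 1MB BAR that only claims the VGA range). It is an illustration under stated assumptions, not the card's actual design: it assumes the standard legacy VGA memory window of 0xA0000 to 0xBFFFF, and the names `VGA_MEM_BASE`, `VGA_MEM_SIZE`, `alias128k_decode` and `alias1m_decode` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy VGA memory window: 128 KiB at 0xA0000..0xBFFFF. */
#define VGA_MEM_BASE 0x000A0000u
#define VGA_MEM_SIZE 0x00020000u

/*
 * Hypothetical 128 KiB alias BAR: offset 0 in the BAR corresponds to
 * legacy address 0xA0000. Returns false if the offset is outside the BAR.
 */
static bool alias128k_decode(uint32_t bar_offset, uint32_t *vga_addr)
{
    assert(vga_addr != NULL);
    if (bar_offset >= VGA_MEM_SIZE)
        return false;
    *vga_addr = VGA_MEM_BASE + bar_offset;
    return true;
}

/*
 * Hypothetical 1 MiB BAR: the low 20 address bits are passed through
 * unchanged, and only offsets that fall in the VGA window are claimed.
 */
static bool alias1m_decode(uint32_t bar_offset, uint32_t *vga_addr)
{
    assert(vga_addr != NULL);
    if (bar_offset < VGA_MEM_BASE ||
        bar_offset >= VGA_MEM_BASE + VGA_MEM_SIZE)
        return false;
    *vga_addr = bar_offset;
    return true;
}
```

With the 1MB variant the decoder does no address arithmetic at all, which is presumably the simplification the message has in mind; the cost is a larger BAR of which only 128k is used.
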
The VGA stuff will be accessible that way, but for non-x86 platforms, I
figured we'd just disable it and use a graphical console from boot.
Coming from Sun boxes, I'm used to us writing Forth code to handle this
from power-on.

>
> - It should be possible to disable decoding of legacy IO and legacy VGA
> (hard coded addresses) and it should be possible to entirely operate the
> card without ever having to issue a legacy cycle to a hard coded
> address. For the reasons above among others. For example, if you require
> hard-decoding, you require a costly software arbitration between all VGA
> cards in the machine, flip/flop'ing MEM/IO enable bits in the PCI
> command registers & VGA forwarding in P2P bridges. This is especially
> bad if any card wants to be interrupt driven since you may end up with
> another card disabling access to yours in order to get the VGA resources
> and you taking an interrupt ... Typically, ATI cards for example have a
> register in MMIO space that contains bits that allow selective disabling
> of VGA IO and VGA mem decoding.

I agree, so worry not. My last graphics chip didn't support VGA or even
I/O space accesses at all. My attitude toward VGA is that it is an
add-on just to support the larger userbase of x86 users. Although it's
something we're accounting for from the beginning, it's going to look
like an afterthought, because we're trying to minimize it in every way
we can.

>
> - Endianness. X, among others (MacOS too, at least MacOS, I'm not sure
> about X) requires a big endian framebuffer on big endian platforms.
> Since PCI is little endian and the framebuffer generally too, it's very
> impractical to implement big endian scanout. Thus cards usually
> have front-end swappers on the PCI <-> VRAM path.
> For example, ATIs can
> program up to 2 "apertures" with different swapper settings, and
> Radeons can additionally program up to 8 "surfaces" which allow you to
> define regions of the VRAM space that are addressed with different
> swapper settings and different tiling settings (oh yes, tiled
> framebuffer, that's something else you might want to consider :). It's
> fairly important to provide something flexible here. For example, when
> implementing EXA (the new X.org driver API oriented toward faster
> compositing) support for Radeons, we had to use the surfaces. EXA can
> require access to up to 3 different pixmaps at one time (source, mask
> and destination) for a composite software fallback. Since those can be
> of different bpp, they may need different swapper settings and that
> setting itself might have to be different from the bpp of the front
> buffer. Similarly, the Z buffer, for 3D operations, or textures, might
> have a different bit depth than the front buffer and thus may have to be
> accessed with different byteswapping settings.

Given my background, it's going to end up being more flexible than you
need it to be.

> - DMA. We have been doing various experiments with EXA and DMA. Since
> EXA uses aggressive caching of pixmaps in the framebuffer, it can get
> pretty bad at "thrashing", that is, moving pixmaps in/out of the
> framebuffer. Thus it's pretty important to have a rather fast path for
> blitting pixmaps in both directions to/from system memory. The best way
> to do something "universal" here is to have a scatter/gather DMA engine
> that can source its SG list from either an address in system memory or
> from the VRAM (faster). Something like a list of source/dest/size. It's
> also important that the DMA engine be able to do endian swapping on the
> way in and out. The 3 typical types of swapping that are useful in
> practice are 16 bits swap, 32 bits swap, and half dbl word swap.
> They are respectively
>
> A B C D -> B A D C
> A B C D -> D C B A
> A B C D -> C D A B
>
> (The latter being generally used to "fix up" a 16 bits pixel value that
> was transferred using a 32 bits swapper). The problem with using a DMA
> engine is that there is some cost to establish/tear down DMA mappings
> for pixmaps, thus it's not very efficient for small transfers.

Already thought of. Stick around and watch the specs. If we forget
something, remind us!

> In addition, cards like ATI provide a PCI GART which sort-of simulates
> an AGP card on PCI and PCI-Express. It's fairly handy. The idea is to
> be able to define a linear range of the card's address space that is
> mapped to the system memory via an AGP-like GART table (or IOMMU), thus
> allowing you to dynamically bind/unbind portions of system memory into
> card space in a linear way.

Although nothing is set in stone, there may be grounds for not
supporting AGP. Or if we do, it won't be right away. But there's still
time to debate that issue.

> It's still fairly important that blit operations, even local to the card
> space, can do byteswapping as well, using any of the 3 swapper settings
> highlighted above. Especially when using a GART, as the pixmaps stored
> in the GART space by the host may be in a different endianness than the
> native front buffer format.

Our blit engine can't touch host memory, for security reasons. But I
have specified mechanisms to do all the proper swapping on various sorts
of image upload and download operations.

> Voila ! That's all that came to mind today, I hope this is useful.

These things you mention are very important issues, things that not
everyone is aware of, so it's good that you point them out. I suspect
that there are deeper issues that even I haven't considered, so please
keep them coming. And if they're left out of the prototype, you can
help us to figure that out and make sure they're in there.

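For reference, the three swapper modes listed in the DMA discussion above, and the "list of source/dest/size" idea, can be modelled in a few lines of C. This is only a software sketch under assumptions, not the engine specified for the card: bytes A B C D are taken to be a 32-bit value with A as the most significant byte, sizes are counted in 32-bit words, and the names `swap16`, `swap32`, `swap_half_dword`, `apply_swap`, `struct sg_entry` and `sg_copy` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum swap_mode { SWAP_NONE, SWAP_16, SWAP_32, SWAP_HALF_DWORD };

/* A B C D -> B A D C: swap the bytes within each 16-bit half. */
static uint32_t swap16(uint32_t v)
{
    return ((v & 0x00FF00FFu) << 8) | ((v & 0xFF00FF00u) >> 8);
}

/* A B C D -> D C B A: full 32-bit byte reversal. */
static uint32_t swap32(uint32_t v)
{
    return (v << 24) | ((v & 0x0000FF00u) << 8) |
           ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

/* A B C D -> C D A B: exchange the two 16-bit halves. */
static uint32_t swap_half_dword(uint32_t v)
{
    return (v << 16) | (v >> 16);
}

static uint32_t apply_swap(enum swap_mode mode, uint32_t v)
{
    switch (mode) {
    case SWAP_16:         return swap16(v);
    case SWAP_32:         return swap32(v);
    case SWAP_HALF_DWORD: return swap_half_dword(v);
    default:              return v;
    }
}

/* One scatter/gather element: source, destination, size in words. */
struct sg_entry {
    const uint32_t *src;
    uint32_t *dst;
    size_t words;
};

/* Software model of walking an SG list with a swapper on the path. */
static void sg_copy(const struct sg_entry *list, size_t n,
                    enum swap_mode mode)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t w = 0; w < list[i].words; w++)
            list[i].dst[w] = apply_swap(mode, list[i].src[w]);
}
```

The "fix up" remark in the quoted text falls out of these definitions: applying the half dbl word swap to a value that already went through the 32 bits swap gives the same result as a single 16 bits swap, i.e. `swap_half_dword(swap32(v)) == swap16(v)`.
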
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
