[Open-graphics] PCI interfaces considerations

Benjamin Herrenschmidt Thu, 24 Nov 2005 15:34:17 -0800

Hi !

I just popped onto this list for the first time, so pardon me if this
has already been asked or discussed, but I wanted to raise a couple of
things that I think are important for proper operation on platforms
like PowerPC (or other non-x86 & big endian platforms).


- VGA access. While it's generally always possible to generate the
legacy VGA IO cycles, it's inconvenient due to the hard decoding. The
VGA memory cycles, however, cannot be produced on the bus on a number of
platforms. For example, a Mac can only issue PCI MMIO cycles in the
range 0x80000000 ... 0xff000000, which excludes the VGA "memory" hole.
It would be very useful if the card defined an additional PCI BAR of,
let's say, 128k, that contains an alias of the VGA space. That would
allow usage of legacy VGA modes & text mode on platforms that don't have
the memory hole. As far as the PCI interface implementation is
concerned, it's basically a matter of decoding accesses through that BAR
exactly as if they were coming from the low VGA addresses. To simplify
the decoding, you may want to implement a 1MB BAR that only "catches"
addresses in the VGA range. In a similar vein, all VGA IO registers
should be available via a relocatable BAR, either in MMIO space along
with the other card registers, or via a separate IO BAR. That way, it's
possible to use all VGA features without having to do hard-decoding,
thus allowing several VGA cards without the complicated ping-pong'ing of
IOs.
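
To make the BAR aliasing idea a bit more concrete, here is a rough
software model of the decode, not a hardware description. The names
(vga_alias_decode, vga_small_bar_decode, VGA_ALIAS_MISS) are made up for
the example. It assumes the 1MB BAR maps offset N to legacy address N,
and the 128k BAR maps offset 0 to 0xA0000.

```c
#include <stdint.h>

#define ALIAS_BAR_SIZE 0x00100000u  /* hypothetical 1MB alias BAR */
#define SMALL_BAR_SIZE 0x00020000u  /* hypothetical 128k alias BAR */
#define VGA_MEM_BASE   0x000A0000u  /* legacy VGA window: 0xA0000..0xBFFFF */
#define VGA_ALIAS_MISS UINT32_MAX   /* returned when the BAR ignores the access */

/*
 * 1MB variant: the low 20 bits of the offset are the legacy address.
 * 0xA0000..0xBFFFF is exactly the set of 20-bit addresses whose
 * bits 19:17 are 101b, so the "catch" is a 3-bit compare.
 */
uint32_t vga_alias_decode(uint32_t bar_offset)
{
    if (bar_offset >= ALIAS_BAR_SIZE)
        return VGA_ALIAS_MISS;
    if ((bar_offset >> 17) != 0x5u)
        return VGA_ALIAS_MISS;
    return bar_offset;
}

/* 128k variant: every offset in the BAR hits, rebased onto 0xA0000. */
uint32_t vga_small_bar_decode(uint32_t bar_offset)
{
    if (bar_offset >= SMALL_BAR_SIZE)
        return VGA_ALIAS_MISS;
    return VGA_MEM_BASE + bar_offset;
}
```

With the 1MB variant, the host computes BAR base + 0xB8000 to reach the
text buffer, which keeps the offsets identical to the legacy addresses
at the cost of a larger window.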

 - It should be possible to disable decoding of legacy IO and legacy VGA
(hard coded addresses), and it should be possible to operate the card
entirely without ever having to issue a legacy cycle to a hard coded
address, for the reasons above among others. For example, if you require
hard-decoding, you require a costly software arbitration between all VGA
cards in the machine, flip-flopping the MEM/IO enable bits in the PCI
command registers & VGA forwarding in P2P bridges. This is especially
bad if any card wants to be interrupt driven, since you may end up with
another card disabling access to yours in order to get the VGA resources
just as you take an interrupt ... Typically, ATI cards for example have
a register in MMIO space that contains bits that allow selective
disabling of VGA IO and VGA mem decoding.
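
As an illustration only, the selective disable could be modelled like
this. The bit names and their positions are invented for the example and
do not describe any real card's register layout; the address ranges are
the usual legacy VGA ones (IO 0x3B0-0x3BB and 0x3C0-0x3DF, memory
0xA0000-0xBFFFF).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits in a hypothetical MMIO control register. */
#define LEGACY_VGA_IO_DISABLE  (1u << 0)
#define LEGACY_VGA_MEM_DISABLE (1u << 1)

/* True if the port is one of the hard-decoded legacy VGA IO ports. */
static bool is_vga_io_port(uint32_t port)
{
    return (port >= 0x3B0u && port <= 0x3BBu) ||
           (port >= 0x3C0u && port <= 0x3DFu);
}

/* True if the address is inside the legacy VGA memory hole. */
static bool is_vga_mem_addr(uint32_t addr)
{
    return addr >= 0xA0000u && addr <= 0xBFFFFu;
}

/* Does the card claim a legacy IO cycle, given its control register? */
bool claims_legacy_io(uint32_t ctrl, uint32_t port)
{
    return !(ctrl & LEGACY_VGA_IO_DISABLE) && is_vga_io_port(port);
}

/* Does the card claim a legacy memory cycle, given its control register? */
bool claims_legacy_mem(uint32_t ctrl, uint32_t addr)
{
    return !(ctrl & LEGACY_VGA_MEM_DISABLE) && is_vga_mem_addr(addr);
}
```

With both bits set, the card never responds to a hard coded address, so
no arbitration between cards is needed and its interrupt handler can
always reach the registers through the relocatable BARs.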

 - Endianness. X, among others (MacOS too, at least MacOS, I'm not sure
about X), requires a big endian framebuffer on big endian platforms.
Since PCI is little endian, and generally the framebuffer too, it's very
impractical to implement scanout of a big endian framebuffer. Thus cards
usually have front-end swappers on the PCI <-> VRAM path. For example,
ATIs can program up to 2 "apertures" with different swapper settings,
and Radeons can additionally program up to 8 "surfaces", which define
regions of the VRAM space that are addressed with different swapper
settings and different tiling settings (oh yes, tiled framebuffers,
that's something else you might want to consider :). It's fairly
important to provide something flexible here. For example, when
implementing EXA (the new X.org driver API oriented toward faster
compositing) support for Radeons, we had to use the surfaces. EXA can
require access to up to 3 different pixmaps at one time (source, mask
and destination) for a composite software fallback. Since those can be
of different bpp, they may need different swapper settings, and those
settings might in turn have to differ from the one that matches the bpp
of the front buffer. Similarly, the Z buffer for 3D operations, or
textures, might have a different bit depth than the front buffer and
thus may have to be accessed with different byteswapping settings.
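
Here is a small sketch of what a per-surface swapper lookup might look
like, as a software model. The struct, the function name and the
example layout are all made up for illustration; real Radeon surface
registers are laid out differently and also carry tiling settings, which
are left out here.

```c
#include <stddef.h>
#include <stdint.h>

enum swap_mode { SWAP_NONE, SWAP_16, SWAP_32 };

/* One surface: a VRAM range [start, end) with its own swapper setting. */
struct surface {
    uint32_t start;
    uint32_t end;
    enum swap_mode swap;
};

/* Made-up layout: 32bpp front buffer, 16bpp pixmap, 8bpp mask. */
static const struct surface example_surfaces[] = {
    { 0x000000u, 0x400000u, SWAP_32 },
    { 0x400000u, 0x500000u, SWAP_16 },
    { 0x500000u, 0x540000u, SWAP_NONE },
};
#define EXAMPLE_COUNT (sizeof example_surfaces / sizeof example_surfaces[0])

/*
 * Pick the swapper setting for a host access at a given VRAM offset.
 * Offsets outside every surface use the aperture-wide fallback.
 */
enum swap_mode swap_for_offset(const struct surface *tbl, size_t count,
                               uint32_t offset, enum swap_mode fallback)
{
    for (size_t i = 0; i < count; i++) {
        if (offset >= tbl[i].start && offset < tbl[i].end)
            return tbl[i].swap;
    }
    return fallback;
}
```

The point of the table is that source, mask and destination can each
sit in their own surface, so a software fallback can touch all three
without reprogramming a single global swapper between accesses.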

 - DMA. We have been doing various experiments with EXA and DMA. Since
EXA aggressively caches pixmaps in the framebuffer, it can get pretty
bad at "thrashing", that is, moving pixmaps in and out of the
framebuffer. Thus it's pretty important to have a rather fast path for
blitting pixmaps in both directions to/from system memory. The best way
to do something "universal" here is to have a scatter/gather DMA engine
that can source its SG list from either an address in system memory or
from the VRAM (faster). Something like a list of source/dest/size. It's
also important that the DMA engine be able to do endian swapping on the
way in and out. The 3 typical types of swapping that are useful in
practice are 16 bits swap, 32 bits swap, and half dbl word swap. They
are respectively:

    A B C D -> B A D C
    A B C D -> D C B A
    A B C D -> C D A B

(The latter being generally used to "fix up" a 16 bits pixel value that
was transferred using a 32 bits swapper). The problem with using a DMA
engine is that there is some cost to establishing/tearing down DMA
mappings for pixmaps, thus it's not very efficient for small transfers.
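
The three swaps and the SG list can be sketched in software as follows.
This is only a model: a 32-bit value written 0xAABBCCDD stands for the
byte group A B C D, and the sg_entry layout and function names are
assumptions for the example (a real engine would take bus addresses,
not pointers).

```c
#include <stddef.h>
#include <stdint.h>

enum swap_mode { SWAP_NONE, SWAP_16, SWAP_32, SWAP_HALF_DWORD };

/* A B C D -> B A D C */
uint32_t swap_16(uint32_t v)
{
    return ((v & 0xFF00FF00u) >> 8) | ((v & 0x00FF00FFu) << 8);
}

/* A B C D -> D C B A */
uint32_t swap_32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* A B C D -> C D A B */
uint32_t swap_half_dword(uint32_t v)
{
    return (v >> 16) | (v << 16);
}

uint32_t apply_swap(uint32_t v, enum swap_mode mode)
{
    switch (mode) {
    case SWAP_16:         return swap_16(v);
    case SWAP_32:         return swap_32(v);
    case SWAP_HALF_DWORD: return swap_half_dword(v);
    default:              return v;
    }
}

/* One scatter/gather entry: source, destination, size in 32-bit words. */
struct sg_entry {
    const uint32_t *src;
    uint32_t *dst;
    size_t words;
};

/* Software stand-in for the engine walking the list with one swap mode. */
void sg_copy(const struct sg_entry *list, size_t count, enum swap_mode mode)
{
    for (size_t i = 0; i < count; i++)
        for (size_t w = 0; w < list[i].words; w++)
            list[i].dst[w] = apply_swap(list[i].src[w], mode);
}
```

Note that swap_half_dword(swap_32(v)) equals swap_16(v), which is why
the half dbl word swap repairs 16 bits pixels that went through a 32
bits swapper.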

In addition, cards like ATI's provide a PCI GART which sort-of simulates
an AGP card on PCI and PCI-Express. It's fairly handy. The idea is to
be able to define a linear range of the card's address space that is
mapped to system memory via an AGP-like GART table (or IOMMU), thus
allowing portions of system memory to be dynamically bound/unbound into
card space in a linear way.
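
A rough model of such a table, for illustration. The 4k page size, the
struct layout, the function names and the example addresses are all
assumptions made for this sketch, not a description of ATI's PCI GART.

```c
#include <stddef.h>
#include <stdint.h>

#define GART_PAGE_SHIFT 12u                     /* assumes 4k pages */
#define GART_PAGE_SIZE  (1u << GART_PAGE_SHIFT)
#define GART_INVALID    UINT64_MAX              /* marks an unbound page */

struct gart {
    uint32_t  base;       /* start of the linear window in card space */
    size_t    num_pages;  /* window size in pages */
    uint64_t *table;      /* table[i]: system address of page i */
};

/* Bind one page of system memory into the window (address is page aligned). */
void gart_bind(struct gart *g, size_t page, uint64_t sys_addr)
{
    g->table[page] = sys_addr & ~(uint64_t)(GART_PAGE_SIZE - 1u);
}

void gart_unbind(struct gart *g, size_t page)
{
    g->table[page] = GART_INVALID;
}

/* Translate a card-space address to a system address, or GART_INVALID. */
uint64_t gart_translate(const struct gart *g, uint32_t card_addr)
{
    if (card_addr < g->base)
        return GART_INVALID;
    uint32_t off = card_addr - g->base;
    size_t page = off >> GART_PAGE_SHIFT;
    if (page >= g->num_pages || g->table[page] == GART_INVALID)
        return GART_INVALID;
    return g->table[page] | (off & (GART_PAGE_SIZE - 1u));
}

/* Made-up example: a 4 page window with pages 0 and 2 bound. */
static uint64_t example_table[4] = {
    0x0007F000u, GART_INVALID, 0x00123000u, GART_INVALID
};
static struct gart example_gart = { 0x10000000u, 4, example_table };
```

The card sees pages 0 and 2 as part of one linear range even though
they are scattered in system memory, which is what makes the window
convenient as a blit source or destination.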

It's still fairly important that blit operations, even local to the card
space, can do byteswapping as well, using any of the 3 swapper settings
described above. This matters especially when using a GART, as the
pixmaps stored in the GART space by the host may be in a different
endianness than the native front buffer format.

Voila ! That's all that came to mind today, I hope this is useful.

Cheers,
Ben.


