Thomas Roell wrote:
> In your message of 15 August 2002 you write:
>
> > The second is that when
> > compositing images with window contents, placing the compositing software
> > right next to the frame buffer means that only the pixels which are
> > translucent in the source image need be read from the frame buffer.
>
> Actually that is incorrect. Reading data pixel-wise from the
> framebuffer can be significantly slower than reading it in
> sequential order. The reason is that the HW can use burst-mode
> access (on the bus as well as in SRAM) to read data more efficiently.
>
I agree. When accessing the frame buffer, if possible you should only read and
write longs (4 bytes), on longword boundaries (bottom 2 bits = 0). The extra
instructions required to pack/unpack the pixels are well worth it, assuming a
fast modern machine. In some cases where the CPU is fast and the memory
subsystem relatively slow (e.g. a modern Pentium machine), that approach can be
faster even when accessing main memory.
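As a sketch of what I mean (this is illustrative C, not code from any product; the helper name and the little-endian packing are my assumptions):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: copy a row of 8-bit pixels into a longword-aligned
 * framebuffer-like destination using 32-bit stores.  The shift/OR packing
 * costs a few extra instructions, but it replaces four byte writes with
 * one aligned long write. */
static void copy_row_packed(uint32_t *dst, const uint8_t *src, size_t npixels)
{
    size_t i;
    for (i = 0; i + 4 <= npixels; i += 4) {
        /* Pack four pixels (little-endian byte order) into one 32-bit word. */
        uint32_t word = (uint32_t)src[i]
                      | (uint32_t)src[i + 1] << 8
                      | (uint32_t)src[i + 2] << 16
                      | (uint32_t)src[i + 3] << 24;
        dst[i / 4] = word;          /* one aligned 32-bit store */
    }
    /* Tail pixels (npixels not a multiple of 4) would need a
     * read-modify-write of the last long; omitted for brevity. */
}
```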
Even faster is to read and write quadwords (8 bytes) on quadword boundaries
(bottom 3 bits = 0). On an x86 machine the practical way to do that is to use
MMX. You have to write assembler to do that, but gcc does seem to support
embedded assembler nicely; my own MMX experience is with the MS compilers. Of
course such assembler is not portable; it would have to be a compile-time
option. On all x86 machines, though, you can determine at run time whether the
processor supports MMX. (Please don't ask me how: I did it at a previous
company and I don't have access to that source code - honest! I believe the
Intel developer web site can help there.)
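For what it's worth, with a modern gcc the run-time check can be done without writing the CPUID assembler yourself (this is my sketch, not the code I wrote back then; the underlying mechanism is CPUID leaf 1, EDX bit 23):

```c
/* Runtime check for MMX support.  With gcc on x86 the easy way is
 * __builtin_cpu_supports(); on non-x86 targets we simply report 0. */
static int have_mmx(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();   /* harmless if the runtime already did it */
    return __builtin_cpu_supports("mmx") != 0;
#else
    return 0;
#endif
}
```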
The previous company was DOME imaging systems, now part of Planar Systems. The
product was the imaging library DIMPL. I don't think I'm giving any trade
secrets away here. Here's a URL with some benchmarks:
http://www.dome.com/products/software.html
I don't have any financial interest in this anymore since I don't work at
DOME or own any DOME stock; Planar Systems bought it all (thank you Planar!).
I do still take "pride of authorship", though: check out the 8-bit JPEG
decompression figures; the world's fastest, we believe. The other operations
are comparably fast. It's not open source and I don't think they have a Linux
port yet; the medical industry is incredibly slow to change. Most medical
imaging viewing workstations run Solaris. The medical companies have been
porting to NT to run on cheaper x86 hardware. You would think they would wise
up and "port" (recompile) their code to Linux and be done with it ...
Depending on the PCI bus controllers (in the host and on the PCI card) you need
to write quadwords on quadword boundaries to get "burst" cycles on the bus.
DIMPL does this using MMX where available. DIMPL also uses MMX for a lot of
other operations including JPEG decompression.
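A portable sketch of the quadword idea (plain C standing in for the MMX `movq` loop; the function name is mine):

```c
#include <stdint.h>
#include <stddef.h>

/* Move data 8 bytes at a time through uint64_t loads/stores on
 * 8-byte-aligned addresses.  On x86 the same pattern done with MMX
 * registers (movq) is what lets the PCI controller merge the stores
 * into burst cycles; a good compiler will turn this loop into similar
 * wide moves. */
static void copy_quadwords(void *dst, const void *src, size_t nbytes)
{
    uint64_t *d = (uint64_t *)dst;
    const uint64_t *s = (const uint64_t *)src;
    size_t n = nbytes / 8;      /* assumes dst/src are 8-byte aligned */
    while (n--)
        *d++ = *s++;
    /* the nbytes % 8 tail bytes are omitted for brevity */
}
```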
Without understanding Render too well, I would guess that some Render
operations could be sped up using MMX. I'm not volunteering to do it, but for
anyone interested I think you'll find that using SIMD and coding MMX is great
fun. But then again I'm old school, and coding assembler still strikes me as
more fun than say Java.
If MMX is already being used in Render, or this has already been tried, I'm
sorry to imply that no one thought of it before.
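To make the guess concrete, here is the kind of Render operation I have in mind: the per-channel OVER compositing step (premultiplied alpha). I'm not claiming this is how Render implements it; it's just narrow independent integer arithmetic, which is exactly what MMX packs eight-to-a-register:

```c
#include <stdint.h>

/* One premultiplied 8-bit channel of the OVER operator:
 *   dst' = src + (1 - alpha_src) * dst
 * scaled to 0..255 with rounding.  A SIMD version would do 8 of these
 * channels per 64-bit MMX register. */
static uint8_t over_channel(uint8_t src, uint8_t src_alpha, uint8_t dst)
{
    uint32_t t = (uint32_t)(255 - src_alpha) * dst + 127; /* rounded /255 */
    return (uint8_t)(src + t / 255);
}
```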
Peter Kaczowka
>
> > Of course, you could also look at the Matrox code and implement similar
> > acceleration for whatever video card you like; that will make Render a lot
> > faster, and doing some of the common cases isn't really that much work.
>
> That is one of my major headaches in implementing X Render. Modern
> Graphics HW split up the 2D engine from the 3D engine. The matrox chip
> you refer to is really one of the last ones, where the 2D engine was
> combined with the 3D engine.
>
> Hence you will have to implement XRender on a typical 3D engine that
> has been geared towards Direct3D/OpenGL. And this is where the
> headache starts:
>
> 1. Source and Mask in reality have to be implemented as texture maps,
>    where the Mask would probably be an alpha texture. PseudoColor maps
>    could in theory be implemented as palettized textures, but few
>    graphics chips still implement them, as having LUTs for trilinear
>    texture filtering gets kind of expensive. Now the issue with using
>    textures for Source and Mask is that they are textures and pixmaps
>    at the same time, to which one can render (including by the
>    current composite operation). For most HW, textures either live in
>    dedicated memory (where they cannot be drawn to), are in a
>    different format (tiled memory organisation), or have certain size
>    restrictions that make them difficult to keep in HW. But
>    essentially they are read-only objects, rather than the read&write
>    paradigm of XRender.
>
> 2. After combining Source and Mask, the 3D engine's blender is used
>    for combining the result with the destination. Direct3D/OpenGL use
>    a powerful but simple scheme of a source factor and a destination
>    factor, where with modern graphics engines those factors are
>    independently controllable for the RGB and the alpha channel(s).
>    XRender instead uses a single operator, defined in a way that
>    often makes it impossible to map the functionality to existing
>    (and most likely future) HW. An example is PictOpSaturate. This
>    operator is not implementable in HW without changing the texture
>    formats on the fly.
>
> 3. The way picture objects are associated with pixmaps forces an
>    X-Server implementation to support many new pixmap formats. If an
>    X-Server implements HW caching of pixmaps, this is a very
>    significant problem.
>
> My suggestions for making XRender implementable on real existing
> HW (without having to resort to the statement that only a few fast
> paths will in reality be HW accelerated) would be:
>
> - make Source and Mask readonly objects, associated with a
> picture. Then a driver can convert the data into usable formats and
> directly make use of them.
>
> - replace the notion of PseudoColor pictures with simple intensity
>   formats (they could do almost the same, and they are supported by
>   HW).
>
> - rework the operators so that they are mappable to existing HW.
>
> - have picture objects independent of pixmaps.
>
> As we tried to implement XRender (and we have given up more than
> three times, because of the convolutedness and the fact that pretty
> much all applications seen out there use only the alpha blending
> capability), today one can maybe only use the source/destination
> blending capability along with a scratch texture surface. Caching of
> glyphs is next to impossible given the semantics of the operators and
> the read&write nature of pictures. With that in mind, and the fact
> that for most operations the cost of setting up the graphics engine
> is higher than simply doing it in SW, it seems that a software-only
> implementation makes way more sense than attempting to map this
> concept onto real HW.
>
> - Thomas
> --
> Thomas Roell         /\       The deer leaps high,
> Xi Graphics         / \/\   _ the deer leaps far,
> [EMAIL PROTECTED]  / /  \ \ \ what is it to do,
>                   /  Oelch!  \ \ it has time to spare.
_______________________________________________
Render mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/render