[Open-graphics] High Performance Graphics Hardware Design Requirements

Roland Nagtegaal Tue, 15 Mar 2005 06:37:08 -0800

Hello Mr. Miller, and the others,

Years ago I followed the development of the GGI project, which attempted
to improve the Linux graphics subsystem to, say, IRIX levels. Although
they never succeeded, it turned out that graphics hardware is generally
designed in such a way that it is very hard, or impossible, for the OS to
provide direct access to graphics hardware for multiple programs
simultaneously in a fast, secure and stable manner. Apparently SGI did
make such secure but still fast hardware.


It is quite possible that you have already taken this into account in
your design, but I still want to bring a (very short) paper on this
subject to your attention.

I have included below a web page by Linas Vepstas about designing
graphics hardware in such a way that it is easier for the OS to make
graphics access crash-proof, secure, but still fast.
* Crash-proof as in not being able to crash the graphics hardware by
feeding it bogus commands;
* Secure as in providing protected graphics contexts for different
programs, which I think would be needed for integrating SELinux with the
X window system without making it slow;
* Fast as in low overhead for context switching when different programs
access the graphics hardware, because the hardware facilitates saving
and restoring the graphics context.

I hope it is of some use, and also that making secure and stable
hardware _can_ be combined with being fast.
I don't know Linas Vepstas at all, but from time to time I try to get
the right information to the right people.

Thank you,

Here it is:
------------------------------------------------------------------------

High Performance Graphics Hardware Design Requirements

This page attempts to spell out graphics hardware design requirements
needed to build high-performance graphics subsystems. This page is
intended for h/w graphics chip and board designers, as well as graphics
software sub-system designers and graphics device driver writers. Its
intent is to broaden the understanding of hardware design principles
needed to create high-performance graphics subsystems. These principles
are well known to high-end folks, but are sorely lacking in the Wintel
PC clone marketplace. 

This page is motivated by discussions on the
comp.os.linux.development.system USENET group, and the efforts of the
Linux GGI group, where it has been discovered that most PC-class/MS
Windows graphics hardware is sorely lacking in important graphics
features. Current work on hardware-accelerated 3D centers around the
Mesa OpenGL implementation. The Graphics Advocacy page provides the
Linux background for accelerated 3D graphics. 


Basic Principles
The single most fundamental concept of high-performance graphics
hardware design is that the graphics program must have direct access to
the hardware. Depending on your experience, this may sound either
obvious, or a damned-fool bad idea. To people writing computer games,
and to people building hardware, this is obvious. To people writing
operating systems and graphics applications, who are used to device
drivers, libraries and windowing systems, this sounds stupid. In fact,
both camps are correct: fast access is direct access, and yes, with
improperly designed hardware, it is dangerous. 

The high-end Unix graphics hardware community has learned that both
worlds are possible: direct access from user-level programs (usually
through libraries) for performance, coupled to protected system modes
that prevent out-of-control or malicious programs from hanging the
system and locking up the hardware. However, to create such a system,
certain principles must be adhered to in the raster chip, bus interface
chip, and graphics card design. These principles are not terribly hard,
and in fact are sometimes deceptively simple and obvious. However, many
schedules have been slipped due to a misunderstanding of the required
functions. The repercussions of these principles affect the hardware,
the graphics system, the operating system, the window system, and the
graphics application. "Minor" hardware bugs in these areas are not
easily worked around in software; indeed, it may not be even possible to
work around them. 

There are two basic principles: (1) the distinction between a protected
mode, to which only the operating system has access, and user-level
drawing commands, which any program can bang on; and (2) context
switching, whereby one graphics application can be stopped and another
restarted, all without hanging the graphics adapter or losing or
scrambling the state of the hardware. All of the other principles
follow from these two.

Without further ado, the list: 
Protected Mode
        Certain graphics h/w registers/functions, such as cursor control
        and colormap load, must be segregated into a distinct address
        space from other functions, such as area clear and line drawing.
        This allows the operating system to protect *privileged
        functions*, such as cursor movement or colormap loading, from
        *user space programs*, which want to have direct access to
        hardware registers for line drawing and area clear for (obvious)
        performance reasons. Such functions must be separated by at
        least 4K bytes, since most CPUs do not allow fine-grained
        memory protection (e.g. Intel x86, PowerPC, MIPS, SPARC only
        allow protection for 1K-4K byte pages.) 
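
The page-granularity rule above can be checked mechanically. What follows
is only a minimal sketch in Python, not taken from any real chip: the
register names, their offsets and the 4096-byte page size are all
assumptions made up for illustration. It reports any page of the register
aperture that holds both privileged and user-level registers, and lists the
pages an OS could map into a user process without exposing privileged
functions.

```python
PAGE_SIZE = 4096  # assumed MMU page size in bytes

# Hypothetical register map: name -> (byte offset in the aperture, privileged?)
REGISTER_MAP = {
    "CURSOR_XY":     (0x0000, True),
    "COLORMAP_LOAD": (0x0010, True),
    "LINE_START_XY": (0x1000, False),
    "LINE_END_XY":   (0x1004, False),
    "AREA_CLEAR":    (0x1008, False),
}


def mixed_pages(register_map, page_size=PAGE_SIZE):
    """Return the page numbers that hold both privileged and user registers."""
    kinds_by_page = {}
    for offset, privileged in register_map.values():
        kinds_by_page.setdefault(offset // page_size, set()).add(privileged)
    return sorted(page for page, kinds in kinds_by_page.items() if len(kinds) == 2)


def user_mappable_pages(register_map, page_size=PAGE_SIZE):
    """Return the pages that contain no privileged register at all."""
    has_privileged = {}
    for offset, privileged in register_map.values():
        page = offset // page_size
        has_privileged[page] = has_privileged.get(page, False) or privileged
    return sorted(page for page, priv in has_privileged.items() if not priv)
```

With the layout above, page 0 stays with the kernel and page 1 can be
handed to applications. Moving the hypothetical CURSOR_XY register to
offset 0x100C would make page 1 mixed, and no page would be safe to map.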
        
Hardware Cursor
        It is impossible to build a high-performance graphics subsystem
        if the cursor needs to be drawn using software. This is not much
        of an issue, since many DACs today support hardware cursors,
        and many/most graphics cards provide this function. 
        
Atomic Operations
        All drawing (i.e. non-protected) operations must be atomic. This
        allows the operating system to suspend one program that is
        drawing, and start up another program that is drawing, without
        hanging the graphics hardware. For example, if it requires three
        registers to be written to draw a line or clear an area
        (start-xy, end-xy, and "command"), it must be possible for the
        software to write the start/end points, and never get around to
        writing the command, without hanging the hardware. (If the
        command is never written, then the line is never drawn). 
        
        In particular, this requires that command words be written last,
        and not first. For commands that require multiple registers to
        be written, it must be possible to break off the command at any
        point without hanging the hardware (i.e. it must be possible to
        write some of the registers, without writing all of them,
        without indefinitely hanging the hardware). If only a partial
        command is written, then no operation is performed. 
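
The command-written-last rule can be illustrated with a toy software model.
This is a sketch under stated assumptions, not a description of any actual
adapter: the class `LineEngine`, its method names and the "DRAW_LINE"
opcode are hypothetical. Parameter writes only latch values; work starts
when the command register is written, and an incomplete or unknown command
is simply a no-op.

```python
class LineEngine:
    """Toy model of a drawing engine whose command register is written last."""

    def __init__(self):
        self.start_xy = None
        self.end_xy = None
        self.busy = False   # never set by parameter writes in this model
        self.drawn = []     # lines the modelled hardware has drawn

    def write_start(self, xy):
        # Latching a parameter starts no work and cannot hang anything.
        self.start_xy = xy

    def write_end(self, xy):
        self.end_xy = xy

    def write_command(self, opcode):
        """Trigger the operation; return True only if a line was drawn."""
        if opcode != "DRAW_LINE":
            return False    # unknown command: ignored
        if self.start_xy is None or self.end_xy is None:
            return False    # partial command: no operation performed
        self.drawn.append((self.start_xy, self.end_xy))
        return True
```

A program that writes both endpoints and is then suspended forever leaves
`drawn` empty and `busy` false, so the next program finds the engine idle.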
        
Interruptible Operations
        All drawing (i.e. user-level) operations must be interruptible.
        That is, if a command requires that multiple registers must be
        written, it must be possible to start writing data for this
        command, and then break this off and perform another command
        instead. Thus, for example, it must be possible to specify the
        line endpoints, then specify clear-area extents, then clear the
        area, then move the cursor, and then ask for the line to be
        drawn (software may have reloaded the line endpoints first).
        Such interrupted operations must NOT leave the hardware in an
        unknown or hung state. 
        
        This, together with the atomic-operations requirement above, and
        the readable registers requirement below, allows a multi-tasking
        operating system to stop a drawing process at any time (on an
        instruction-by-instruction basis), put it to sleep, and then
        allow another drawing process to run and do its drawing.
        Non-atomic, non-interruptible drawing operations require the
        drawing program to obtain a lock, do its stuff, then release
        the lock when it's done. In general, locks are undesirable: they
        are slow. Even if a lock were fast, just having to take one costs
        CPU cycles that we would rather spend on drawing stuff.
        
        Note that after the operating system has suspended one client,
        it may do housekeeping functions, such as updating the cursor or
        the colormap, before allowing other processes to run. Thus, it
        must be possible to execute privileged commands that interrupt
        user commands. 
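
The interleaving described in this entry (line endpoints, then a complete
area clear, then a cursor move, then the line command) can be walked
through in a small model. This is a hedged sketch: the `Adapter` class, the
register names and the opcodes are invented for illustration, and the model
assumes the line registers keep their values across the interruption,
which the text does not require.

```python
class Adapter:
    """Toy adapter with independent parameter registers per command."""

    def __init__(self):
        self.regs = {"line_start": None, "line_end": None, "clear_extent": None}
        self.cursor = (0, 0)
        self.log = []  # operations the modelled hardware has performed

    def write(self, reg, value):
        self.regs[reg] = value

    def command(self, opcode):
        if opcode == "DRAW_LINE":
            needed = ("line_start", "line_end")
        elif opcode == "CLEAR_AREA":
            needed = ("clear_extent",)
        else:
            return False  # unknown opcode is ignored
        if any(self.regs[r] is None for r in needed):
            return False  # incomplete command is ignored
        self.log.append((opcode,) + tuple(self.regs[r] for r in needed))
        return True

    def move_cursor(self, xy):
        """Privileged operation; touches no user-level drawing state."""
        self.cursor = xy


def interrupted_sequence():
    a = Adapter()
    a.write("line_start", (0, 0))            # begin a line ...
    a.write("line_end", (9, 9))
    a.write("clear_extent", (0, 0, 100, 100))
    a.command("CLEAR_AREA")                   # ... break off for a clear
    a.move_cursor((5, 5))                     # kernel housekeeping in between
    a.command("DRAW_LINE")                    # ... then finish the line
    return a
```

In this model the log ends up holding the clear followed by the line, and
the cursor move leaves no trace in the drawing state.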
        
Readable Registers
        All registers must be readable. This is vital for a
        multi-tasking operating system. This allows the operating system
        to stop a graphics process, and save its graphics hardware
        context. It then allows the OS to restore a possibly different
        context from a different graphics process, allowing it to run,
        then stopping it, saving, etc. 
        
        The concept introduced here is of "context switching" or
        "multi-tasking". Basically, a graphics program can be suspended
        at any time, and another graphics program can be started exactly
        where it last left off. In order to be able to restart another
        process precisely where it left off, it must be possible to set
        the graphics hardware into the exact same state where the last
        program left off. To be able to get back to the exact same
        state, it must be possible to somehow read and save this state. 
        
        Note that high-end hardware usually provides features that not
        only make it possible to read and restore state information, but
        also make this operation extremely fast. Hardware that does
        support save/restore usually supports this at sub-millisecond
        speeds, thus allowing hundreds of context switches per second,
        while still leaving the CPU and graphics card 90% free so
        that drawing can continue with hardly any slowdown.
        
        Note that more modern high-end hardware allows multiple
        graphics contexts: these can be saved to, and restored from
        special RAM areas on the card, without having to move all of the
        context information over the bus. 
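
Why readable registers matter can be shown with a save/restore sketch. The
register names and the `context_switch` helper below are assumptions for
illustration only; the point is simply that a context can be captured by
reading every register and reinstated by writing each one back, which is
impossible if any register is write-only.

```python
class GraphicsRegisters:
    """Toy register file in which every register can be read back."""

    NAMES = ("fg_color", "bg_color", "line_start", "line_end", "raster_op")

    def __init__(self):
        self._values = {name: 0 for name in self.NAMES}

    def read(self, name):
        return self._values[name]

    def write(self, name, value):
        if name not in self._values:
            raise KeyError(name)
        self._values[name] = value


def save_context(hw):
    """Read every register; only possible because all of them are readable."""
    return {name: hw.read(name) for name in hw.NAMES}


def restore_context(hw, context):
    for name in hw.NAMES:
        hw.write(name, context[name])


def context_switch(hw, saved, outgoing, incoming):
    """Save the outgoing process's state, then load the incoming one's."""
    saved[outgoing] = save_context(hw)
    fresh = {name: 0 for name in hw.NAMES}  # state for a first-time process
    restore_context(hw, saved.get(incoming, fresh))
```

Here `saved` is a plain dictionary keyed by process name. Hardware with
on-card context storage would keep the equivalent data in adapter RAM
instead of moving it across the bus.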
        
Window Clipping Planes
        Window clipping planes prevent a program from drawing outside of
        its window boundaries. This function isn't absolutely required,
        but is almost so. A graphics program can achieve much higher
        performance by not worrying about whether it is drawing outside
        of its window boundaries, or whether it is obscured by another
        window. In addition, clipping planes provide an important
        security function: they prevent errant or intentionally
        malicious programs from drawing where they should not. Thus, an
        out-of-control program will not scribble all over the screen. 
        
        The update of window clipping planes must be a reserved,
        protected operation. That is, the control of window clipping
        planes must be segregated into a different address space than
        other user-mode drawing operations. 
        
        Note that some graphics hardware provides user-mode clipping
        registers. These are NOT what we are talking about here. Yes, it
        is nice to have user-mode clip registers, but these cannot be
        used by the operating system to prevent out-of-control or
        malicious programs from drawing where they shouldn't. 
        
        Note that hardware that supports directly-addressable frame
        buffers should also support clip tests against data written to
        the directly addressable areas. 
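
A per-pixel clip test of this kind can be sketched in software. The model
below assumes one window-id value per pixel and an id of 0 for "no
window"; the `Framebuffer` class and its method names are hypothetical. The
privileged call assigns screen regions to windows, and the user-level call
succeeds only where the plane carries the caller's own id.

```python
class Framebuffer:
    """Toy frame buffer with a per-pixel window-id (clipping) plane."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [[0] * width for _ in range(height)]
        self.wid = [[0] * width for _ in range(height)]  # 0 = no window

    def set_window(self, wid, x, y, w, h):
        """Privileged: assign a rectangle of the clipping plane to a window."""
        for row in range(max(y, 0), min(y + h, self.height)):
            for col in range(max(x, 0), min(x + w, self.width)):
                self.wid[row][col] = wid

    def draw_pixel(self, wid, x, y, color):
        """User-level: the write lands only where the plane matches `wid`."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            return False
        if self.wid[y][x] != wid:
            return False  # outside the window, or obscured by another one
        self.pixels[y][x] = color
        return True
```

Because the test is made per pixel, an application can draw without
checking its own bounds or its stacking order; stray writes are dropped
rather than scribbled over a neighbouring window.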
        
Per-Window Double Buffering
        This is not strictly a requirement, but frankly, for
        high-performance, animated 3D hardware, full-screen double
        buffering sucks. It is painful to support in the operating
        system, in the graphics subsystem, and basically looks bad once
        you have two or more windows animating at the same time. 
        
Per-Window Multiple Colormaps 
        Again, not strictly a requirement, but if you want things to
        look nice on the screen, you have got to allow applications to
        set their own private colormaps, without ruining everything for
        the other windows on the screen.
        
FIFOs
        Another non-requirement, but the fact is that most high-end
        graphics hardware employs FIFOs to buffer drawing commands
        between the central CPU and the graphics hardware. These FIFOs
        are typically anywhere from 64 Bytes to 64 KBytes long. This
        allows the CPU to write commands to the graphics adapter without
        having to wait for it to finish, and it allows the graphics
        hardware to process drawing commands without having to wait for
        the CPU to provide more commands. As long as the buffer never
        accumulates more than one-tenth of a second worth of drawing
        commands, any delays or lags become essentially un-noticeable to
        the user. 
        
        Three common designs are seen: FIFOs in hardware (on the
        graphics adapter), FIFOs in user memory, and "ping-pong"
        buffers. FIFOs on the graphics card can present a problem: when
        a context switch occurs, the FIFO contents must be saved and
        restored. They can be moved either to other memory on the
        graphics card, or they can be sent across the bus, back to the
        system. FIFOs in user memory present a different problem: data and
        pointers can be corrupted by the user program (accidentally or
        maliciously). Of course, it must not be possible to hang the
        hardware due to corrupt data in the FIFO. 
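
The buffering, the save/restore on context switch, and the rule that
corrupt entries must not hang the hardware can be combined in one small
model. This is a sketch under assumptions: the opcode set in
`VALID_OPCODES` and the `CommandFifo` interface are invented here, and a
real adapter would of course do this in hardware.

```python
from collections import deque

VALID_OPCODES = {"LINE", "CLEAR", "SWAP"}  # hypothetical command set


class CommandFifo:
    """Toy bounded command FIFO between the CPU and the adapter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def push(self, opcode, args=()):
        """CPU side: returns False when full, so the caller has to wait."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((opcode, tuple(args)))
        return True

    def pop_and_execute(self, executed, rejected):
        """Adapter side: consume one entry; bad entries are dropped, not hung on."""
        if not self.entries:
            return False
        opcode, args = self.entries.popleft()
        target = executed if opcode in VALID_OPCODES else rejected
        target.append((opcode, args))
        return True

    def save(self):
        """Context switch: copy the pending entries out of the FIFO."""
        return list(self.entries)

    def restore(self, snapshot):
        self.entries = deque(snapshot)
```

Dropping an unrecognised entry is just one possible policy. What the
requirement demands is only that the consumer always makes progress,
whatever garbage it is fed.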
        
Hardware Contexts
        Yet another non-requirement. However, almost all high-end
        hardware keeps considerable graphics context information on the
        hardware itself. Just as is the case with FIFOs, this context
        information must be saved and restored when a context switch
        occurs. Again, this context is moved either to another memory
        location on the adapter, or is sent back across the bus to the
        system for temporary storage in the kernel.

Well, that's all. There are in fact a large variety of more detailed
design issues, but these are too numerous to be discussed in this
overview. All of the principles discussed above are well-known and
understood in the high-end (UNIX) graphics hardware community. All of
these have been discussed and written about in public forums and
journals. However, many of these are rare, have low circulation, or are
out-of-print. This is the ultimate reason for the existence of this
page. See Bibliography below. 


Kernel Considerations
The operating system kernel must address each of the hardware design
considerations expressed above. In particular, the kernel on SGI Irix
and IBM RS/6000 AIX systems supports the following functions: 
Grant and Retract
        A user application is granted direct access to the drawing
        subsystem for the very first time by registering itself with the
        kernel. The kernel returns addresses to the drawing subsystem
        hardware. 
Graphics Faults
        Access control to the graphics hardware is governed by a
        mechanism similar in many ways to the page-fault mechanism. Let
        us review page-faulting: when the CPU attempts to touch a page
        which is not in real memory (is in the swap space, for
        instance), the CPU receives an interrupt. The interrupt handler
        puts the process to sleep, and issues a read request to the
        disk. When the disk has found the requested page, that page is
        loaded into real memory, the virtual page tables are updated,
        and the process is marked "ready-to-run". When a time slice is
        available, the kernel will schedule the process and allow it to
        run again. 
        
        A graphics fault proceeds in a similar manner: as long as there
        are no other graphics processes that want to access the
        hardware, the current process can bang away at it. Periodically,
        however (typically, every 4 milliseconds), the graphics
        time-slice expires. The kernel looks to see if there are any
        other graphics processes that want to run. If so, then it
        retracts write permission to the graphics hardware from the
        first process, performs the graphics context switch, and then
        grants address access to the second process. At this point, if
        the first process attempts to touch the graphics i/o space, an
        interrupt will be generated. The first process will be put to
        sleep. The kernel will then schedule another process to run (not
        necessarily another graphics process). Graphics time-slice
        scheduling and regular process scheduling typically run
        independently of each other. 
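
The grant/retract cycle described in this entry can be sketched as a small
state machine. This is an illustrative model only, not the Irix or AIX
implementation: the class `GraphicsKernel` and its methods are
hypothetical, process ids are plain strings, and the actual saving and
restoring of hardware state is reduced to a counter.

```python
class GraphicsKernel:
    """Toy model of grant/retract scheduling for a single graphics adapter."""

    def __init__(self):
        self.owner = None          # process currently granted access
        self.waiting = []          # processes asleep after a graphics fault
        self.context_switches = 0

    def touch_hardware(self, pid):
        """`pid` touches graphics I/O space; returns True if access is allowed."""
        if self.owner is None:
            self.owner = pid       # first grant
        if self.owner == pid:
            return True
        if pid not in self.waiting:
            self.waiting.append(pid)  # graphics fault: sleep until granted
        return False

    def time_slice_expired(self):
        """Periodic tick: hand the adapter to the next waiter, if there is one."""
        if not self.waiting:
            return self.owner      # nobody else wants it; owner keeps going
        self.owner = self.waiting.pop(0)  # retract, context switch, grant
        self.context_switches += 1
        return self.owner
```

Note that the process losing access is not put to sleep at the moment of
retraction. As in the text, it only faults, and sleeps, if it touches the
hardware again before access has come back round to it.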
        
Cursor
        The kernel must provide interfaces to allow a special process
        (typically, the X Server) to update the position of the cursor. 
        
WID Management
        Most high-end graphics hardware has window-id (WID) planes.
        These planes control not only which hardware color palette is
        used for pixel color lookup, but also typically provide hardware
        clipping so that a process cannot draw outside of its window and
        corrupt the screen. 
        
        The kernel must provide interfaces to manage these clipping
        planes, and/or take over management itself. In particular, if a
        window is moved (e.g. the user picks it up with the mouse and
        moves it), the WID planes must be updated to reflect the new
        window position. Window ID updates are by definition a
        privileged operation: user processes must not be allowed to
        twiddle with them, as this would allow them to corrupt window
        contents accidentally or intentionally. If the corruption is
        accidental, then it is merely ugly: the user sees crap drawn all
        over the screen, where it shouldn't be. A malicious example
        might be a rogue program running on a CIA/NSA machine attempting
        to read confidential information from another window. 
        
Context Management
        If the graphics hardware has hardware contexts or hardware
        FIFOs, then the kernel must shuffle this data around during a
        context switch. If the adapter does not have a lot of memory on
        it, then this data must be copied back across the bus, and
        stored in some temporary location within the kernel. This memory
        must, of course, be cleaned up if the graphics process exits.
        
Double Buffering
        All high-end graphics hardware supports hardware double
        buffering. Some support hardware quad-buffering (for
        double-buffered stereo viewing). Buffer swaps need to be
        synchronized with vertical retrace interrupts, so that image
        tearing does not occur. The kernel is often involved with
        synchronizing the swap with the retrace interrupt. 
        
        Furthermore, the kernel must count the number of pending buffer
        swaps for a graphics process, and put it to sleep if there are
        two. A graphics program is still typically allowed to write to a
        FIFO or buffer while there is one pending, outstanding swap
        request. But any more than that, and things get ugly. For
        example, we once allowed a program to issue 600 buffer swaps
        without putting it to sleep. It then proceeded to buffer swap 60
        times a second for the next ten seconds, while everybody
        wondered why it couldn't be control-C'd, and otherwise acted
        unexpectedly! Never mind that what it was drawing was 10 seconds
        out of date with respect to the current position of the mouse! 
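
The swap-counting rule can be sketched as follows. This is one possible
reading of the text, stated as an assumption: a process may keep running
with one swap outstanding, and is put to sleep when a second one is queued.
The `SwapTracker` class and its method names are hypothetical, and the
model performs at most one swap per process per retrace.

```python
MAX_PENDING_SWAPS = 2  # assumed threshold: sleep once two swaps are pending


class SwapTracker:
    """Toy per-process count of buffer swaps awaiting vertical retrace."""

    def __init__(self):
        self.pending = {}      # pid -> queued swaps not yet performed
        self.sleeping = set()

    def request_swap(self, pid):
        """Queue a swap; returns False if the process was put to sleep."""
        self.pending[pid] = self.pending.get(pid, 0) + 1
        if self.pending[pid] >= MAX_PENDING_SWAPS:
            self.sleeping.add(pid)
            return False
        return True

    def vertical_retrace(self):
        """Retrace interrupt: perform one swap per process, wake sleepers."""
        for pid in list(self.pending):
            if self.pending[pid] > 0:
                self.pending[pid] -= 1
            if self.pending[pid] < MAX_PENDING_SWAPS:
                self.sleeping.discard(pid)
```

With this cap a program can never run more than two frames ahead of the
display, which avoids the 600-swap backlog described in the anecdote.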
        



Bibliography
Many of the above principles are discussed in greater detail in the
following classical references. If my memory serves me correctly, the
papers by Voorhies and by Rhoden are particularly descriptive of the
issues and possible solutions. Yes, these would appear to be very old,
but, if anything, they illustrate how Unix and Unix workstations have at
times enjoyed a ten-year lead in technology over PCs and PC operating
systems. 
     1. Akeley, Kurt and Tom Jermoluk, "High Performance Polygon
        Rendering", Conference Proceedings, SIGGRAPH 1988, vol 22 no.
        4, pp 239-246.

     2. Doyle, Brian, "All About Multi-Processing for Unix
        Workstations", Conference Proceedings, NCGA 1990, pp 228-253.
        (National Computer Graphics Association).

     3. Haletky, Edward H. and Linas Vepstas, "Integration of GL with
        the X Window System", Conference Proceedings, Xhibition 1991,
        pp 105-113.

     4. Norrod, Forest and Larry Thayer, "An Advanced VLSI Chip Set for
        Very High Speed Graphics Rendering", Conference Proceedings,
        NCGA 1991, pp 1-10.

     5. Rhoden, Desi and Chris Wilcox, "Hardware Acceleration for Window
        Systems", Conference Proceedings, SIGGRAPH 1989, vol 23 no. 3,
        pp 61-67.

     6. Stewart, Don, "VLSI: Key to Four Basic Strategies for Improving
        Workstation Graphics", Conference Proceedings, NCGA 1990, pp
        302-308.

     7. Vepstas, Linas, "Porting OpenGL to New Hardware Platforms",
        Course Notes, OpenGL, SIGGRAPH 1992.

     8. Voorhies, Douglas, David Kirk and Olin Lathrop, "Virtual
        Graphics", Conference Proceedings, SIGGRAPH 1988, vol 22 no. 4,
        pp 247-253.

________________________________________________________________________
Last updated 18 February 1996 by Linas Vepstas. 
Linas can be reached at [EMAIL PROTECTED] 
See also Linas Web Page
-- 
Roland Nagtegaal <[EMAIL PROTECTED]>
Universiteit Leiden, Instituut Lorentz
