Brett Johnson wrote:
|                                                ...  The whole point of all
| these posts I've been writing is that only *one* indirection needs to exist,
| and it might as well be at the libGL layer rather than at the driver layer.

You're correct that one indirection (in addition to loading the
address of the dispatch table) suffices, but there are also reasons to
use more, or to put them some place other than in libGL.

Let's assume for a moment that there's only one dispatch table.  Then
any state change must modify the table entries for all the OpenGL
functions it affects.  For some state changes, this is a *lot* of
table entries.  If state changes are frequent compared to rendering
commands, then the cost of modifying the table entries can be higher
than the cost of double indirections (or inline tests) for the
rendering commands.
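The cost asymmetry can be sketched in C (all names invented for illustration; real drivers have far larger tables):

```c
#include <assert.h>

/* Hypothetical sketch: one global dispatch table as an array of
 * function pointers, indexed by command. */
typedef void (*gl_fn)(void);

enum { CMD_VERTEX3F, CMD_NORMAL3F, CMD_COLOR3F, CMD_COUNT };

static void vertex3f_lit(void)   {}
static void vertex3f_unlit(void) {}
static void normal3f_lit(void)   {}
static void normal3f_unlit(void) {}
static void color3f_lit(void)    {}
static void color3f_unlit(void)  {}

static gl_fn dispatch[CMD_COUNT] = {
    vertex3f_unlit, normal3f_unlit, color3f_unlit
};

/* A single state change (toggling lighting here) must rewrite the
 * table entry of every command whose fast path depends on it. */
static int set_lighting(int enabled)
{
    dispatch[CMD_VERTEX3F] = enabled ? vertex3f_lit : vertex3f_unlit;
    dispatch[CMD_NORMAL3F] = enabled ? normal3f_lit : normal3f_unlit;
    dispatch[CMD_COLOR3F]  = enabled ? color3f_lit  : color3f_unlit;
    return 3;  /* entries rewritten; in a real driver, possibly hundreds */
}
```

With only three commands the rewrite is cheap, but the cost grows with the number of affected entries, not with the number of state bits changed.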

In theory, you could solve this by keeping a set of dispatch tables,
each handling an interesting combination of modes.  A state-setting
command would change just the current context's pointer to the
dispatch table.  This requires two loads (one for the table's address,
and one for the function address) to dispatch each command; in return,
both state changes and command dispatching are relatively fast.
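A minimal sketch of that scheme in C (names invented; not any real libGL's internals):

```c
#include <assert.h>

/* Hypothetical sketch: one precomputed dispatch table per interesting
 * state combination.  A state change swaps a single pointer, and
 * every dispatch costs two loads and an indirect jump. */
typedef void (*gl_fn)(void);

enum { CMD_VERTEX, CMD_COUNT };

static int calls_lit, calls_unlit;
static void vertex_lit(void)   { calls_lit++; }
static void vertex_unlit(void) { calls_unlit++; }

static gl_fn table_lit[CMD_COUNT]   = { vertex_lit };
static gl_fn table_unlit[CMD_COUNT] = { vertex_unlit };

/* The context's pointer to the current table. */
static gl_fn *current_table = table_unlit;

/* State changes touch only this one pointer. */
static void enable_lighting(void)  { current_table = table_lit; }
static void disable_lighting(void) { current_table = table_unlit; }

/* Dispatch: load the table pointer, load the entry, jump. */
static void gl_vertex(void) { current_table[CMD_VERTEX](); }
```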

In practice, the number of unique dispatch tables you'd need to do
this is probably prohibitively high.  But you can apply the same
strategy hierarchically; for instance, all the commands that you
expect to be executed infrequently can be lumped together in one or
more subtables, and high-frequency commands could be left in the
highest-level dispatch table.  Then state-setting commands would
change one high-level dispatch table pointer and just a few
lower-level subtable pointers.  However, the choice of the best
hierarchy of tables is likely to be highly machine-dependent, so these
tricks need to be done in the driver, not in libGL.
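The hierarchical variant might look roughly like this in C (a sketch with invented names; the actual split between hot and cold commands is exactly the machine-dependent choice mentioned above):

```c
#include <assert.h>

/* Hypothetical sketch: two-level dispatch.  Hot commands stay in the
 * top-level table; rarely-used commands sit behind a subtable
 * pointer, so a state change swaps only a pointer or two and the
 * cold subtable can be shared between states. */
typedef void (*gl_fn)(void);

static void vertex_state_a(void)  {}
static void vertex_state_b(void)  {}
static void get_string_impl(void) {}
static void finish_impl(void)     {}

struct cold_table { gl_fn get_string; gl_fn finish; };
struct hot_table  { gl_fn vertex; struct cold_table *cold; };

static struct cold_table cold    = { get_string_impl, finish_impl };
static struct hot_table  state_a = { vertex_state_a, &cold };
static struct hot_table  state_b = { vertex_state_b, &cold };

/* Per-context pointer to the current top-level table. */
static struct hot_table *current = &state_a;

/* A state change updates one top-level pointer; the infrequent
 * commands behind `cold` need no per-state copies. */
static void switch_to_state_b(void) { current = &state_b; }

/* Dispatch for a hot command: table pointer, element, jump. */
static void gl_vertex(void) { current->vertex(); }
```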

Other people know the low-level details of current implementations
better than I do, but my guess would be that the best tradeoff looks
something like this:

        The per-thread information needed for OpenGL rendering is a
        pointer to the current rendering context and a pointer to the
        current dispatch table.

        Core OpenGL commands are dispatched by loading the pointer to
        the current dispatch table from the per-thread data area,
        loading the appropriate element of the table, and jumping to
        it.  This transfers control to the driver associated with the
        current rendering context.  Commands in the driver fetch their
        arguments in the usual way, and load the pointer to the
        current rendering context from the per-thread data area if
        they need it.
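        A sketch of that dispatch path in C (all names invented; real
        libGL stubs are typically hand-written assembly, but the loads
        are the same):

```c
#include <assert.h>

/* Hypothetical sketch: the per-thread data area holds two pointers,
 * and a libGL entry point needs only two loads and an indirect
 * jump to reach the driver. */
typedef void (*gl_fn)(void);

struct context { int id; };

enum { IDX_CLEAR, IDX_COUNT };

/* Per-thread data: current context and current dispatch table. */
static _Thread_local struct context *tls_context;
static _Thread_local gl_fn *tls_dispatch;

static int clear_calls;
static void driver_clear(void)
{
    /* Driver code loads the context from the per-thread area only
     * when it needs it. */
    struct context *ctx = tls_context;
    (void)ctx;
    clear_calls++;
}

static gl_fn driver_table[IDX_COUNT] = { driver_clear };

/* libGL entry point: load table pointer, load element, jump. */
static void glClear_stub(void) { tls_dispatch[IDX_CLEAR](); }

/* MakeCurrent binds both per-thread pointers. */
static void make_current(struct context *ctx, gl_fn *table)
{
    tls_context = ctx;
    tls_dispatch = table;
}
```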

        The dispatch table is maintained entirely by the driver
        associated with the current context.  State changes may cause
        the thread's dispatch-table pointer to change (thus swapping
        in a new table that's potentially entirely different from the
        old one), or may cause individual entries in the current
        dispatch table to change.

        The dispatch table contains entries for both core commands and
        extension commands (for the driver associated with the current
        context).

        The mapping between command and table index needs to be
        identical across all OpenGL implementations.  This allows a
        single libGL to interpret the dispatch tables from any driver
        in the most efficient way (using constant offsets for
        dispatching).  The table indices should be maintained in a
        registry, just like OpenGL core and extension enumerants, and
        allocated in small chunks so that the table memory is used
        efficiently.  [Note:  I've just mentioned this for
        completeness; it's not a part of the opengl-base
        specification.]

        In addition to entry points for core OpenGL commands and
        previously-registered extensions, libGL should include a
        number of reserved entry points for extension commands that
        were registered after the time libGL was compiled.  Each of
        these entry points is associated with a table index variable. 
        GetProcAddress works by asking the driver to map a command
        name into a table index, storing that value in the index
        variable associated with the next-available reserved entry
        point, and returning the address of that entry point.  (If no
        more reserved entry points are available, GetProcAddress
        returns NULL.)
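        The reserved-entry-point mechanism could be sketched like this
        in C (names and the registered index are invented for the
        example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: each reserved entry point pairs with a
 * table-index variable that GetProcAddress fills in later. */
typedef void (*gl_fn)(void);

static int foo_calls;
static void driver_glFooEXT(void) { foo_calls++; }

/* Dispatch table; pretend the driver implements "glFooEXT" at the
 * registered index 5. */
static gl_fn dispatch[8] = { [5] = driver_glFooEXT };

/* Two reserved entry points, each dispatching through its index
 * variable (a variable index instead of a constant offset). */
static int reserved_idx[2] = { -1, -1 };
static void reserved0(void) { dispatch[reserved_idx[0]](); }
static void reserved1(void) { dispatch[reserved_idx[1]](); }
static gl_fn reserved_stub[2] = { reserved0, reserved1 };
static int next_reserved;

/* Stand-in for asking the driver to map a name to a table index. */
static int driver_lookup(const char *name)
{
    return strcmp(name, "glFooEXT") == 0 ? 5 : -1;
}

static gl_fn gl_get_proc_address(const char *name)
{
    int idx = driver_lookup(name);
    if (idx < 0 || next_reserved >= 2)
        return NULL;  /* unknown command, or no reserved stubs left */
    reserved_idx[next_reserved] = idx;
    return reserved_stub[next_reserved++];
}
```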

        The registry of dispatch-table indices for extension commands
        guarantees that an efficient dispatch process is possible for
        all contexts that support a given extension.  The entire setup
        also preserves the nice property that
        glGetProcAddress("glFoo")==&glFoo, for both core and extension
        commands, for all contexts.  Finally, the dispatch process is
        very nearly as efficient for new extensions as it is for core
        commands (the only difference is indexing the dispatch table
        with a variable rather than a constant).

        Note that if an application uses glGetProcAddress to get the
        address of an extension function, and then calls that function
        when the current context does not support the extension, libGL
        will jump through a nonexistent dispatch table entry. 
        Personally, I say ``We gave 'em the rope; let 'em hang,'' but
        with additional overhead in the reserved entry-point functions
        we could check dispatch table length, check for null table
        entries, etc.  This might be useful when debugging apps that
        fail to check the extensions string properly.

Allen
