Brett Johnson wrote:
| ... The whole point of all
| these posts I've been writing is that only *one* indirection needs to exist,
| and it might as well be at the libGL layer rather than at the driver layer.

You're correct that one indirection (in addition to loading the
address of the dispatch table) suffices, but there are also reasons to
use more, or to put them some place other than in libGL.

Let's assume for a moment that there's only one dispatch table. Then
any state change must modify the table entries for all the OpenGL
functions it affects. For some state changes, this is a *lot* of
table entries. If state changes are frequent compared to rendering
commands, then the cost of modifying the table entries can be higher
than the cost of double indirections (or inline tests) for the
rendering commands.

In theory, you could solve this by keeping a set of dispatch tables,
each handling an interesting combination of modes. A state-setting
command would change just the current context's pointer to the
dispatch table. This requires two loads (one for the table's address,
and one for the function address) to dispatch each command; in return,
both state changes and command dispatching are relatively fast.
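
A minimal C sketch of this scheme follows. The command and mode
names, the counters standing in for real rendering work, and the
void(void) signatures are all assumptions made for illustration;
nothing here is taken from an actual libGL or driver.

```c
#include <assert.h>

/* Hypothetical command indices; a real table has hundreds of entries. */
enum { CMD_BEGIN, CMD_VERTEX, CMD_COUNT };

/* Hypothetical mode combinations, each with its own prebuilt table. */
enum mode { MODE_UNLIT, MODE_LIT, MODE_COUNT };

typedef void (*cmd_fn)(void);   /* simplified: real commands take arguments */

/* Counters stand in for real rendering work. */
int begins, unlit_vertices, lit_vertices;

static void begin_any(void)    { begins++; }
static void vertex_unlit(void) { unlit_vertices++; }
static void vertex_lit(void)   { lit_vertices++; }

static const cmd_fn tables[MODE_COUNT][CMD_COUNT] = {
    [MODE_UNLIT] = { [CMD_BEGIN] = begin_any, [CMD_VERTEX] = vertex_unlit },
    [MODE_LIT]   = { [CMD_BEGIN] = begin_any, [CMD_VERTEX] = vertex_lit   },
};

/* The context holds only a pointer to the table currently in effect. */
struct context { const cmd_fn *dispatch; };
struct context ctx = { tables[MODE_UNLIT] };

/* State change: one pointer store, however many commands are affected. */
void set_mode(enum mode m)
{
    assert((unsigned)m < (unsigned)MODE_COUNT);
    ctx.dispatch = tables[m];
}

/* Dispatch: load the table pointer, load the entry, call it. */
void sketch_Begin(void)  { ctx.dispatch[CMD_BEGIN](); }
void sketch_Vertex(void) { ctx.dispatch[CMD_VERTEX](); }
```

Here set_mode() costs one pointer store no matter how many commands
the mode affects, and each sketch_* call costs the two loads
described above.
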

In practice, the number of unique dispatch tables you'd need to do
this is probably prohibitively high. But you can apply the same
strategy hierarchically; for instance, all the commands that you
expect to be executed infrequently can be lumped together in one or
more subtables, and high-frequency commands could be left in the
highest-level dispatch table. Then state-setting commands would
change one high-level dispatch table pointer and just a few
lower-level subtable pointers. However, the choice of the best
hierarchy of tables is likely to be highly machine-dependent, so these
tricks need to be done in the driver, not in libGL.
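
One possible reading of the hierarchical variant, sketched in C. The
split into one frequent and one infrequent command, the mode names,
and the forwarding stub are illustrative assumptions, not a
description of any real driver. The top-level entry for the
infrequent command forwards through a driver-private subtable, so
only that command pays for the extra indirection.

```c
#include <assert.h>

typedef void (*cmd_fn)(void);   /* simplified: real commands take arguments */

enum { CMD_VERTEX, CMD_READ_PIXELS, CMD_COUNT };  /* top-level indices */
enum { SUB_READ_PIXELS, SUB_COUNT };              /* subtable indices */
enum { LIGHT_MODES = 2, PACK_MODES = 2 };         /* hypothetical mode counts */

/* Counters stand in for real work. */
int fast_vertices, lit_vertices, plain_reads, packed_reads;

static void vertex_fast(void) { fast_vertices++; }
static void vertex_lit(void)  { lit_vertices++; }
static void read_plain(void)  { plain_reads++; }
static void read_packed(void) { packed_reads++; }

static void read_pixels_stub(void);   /* defined below */

/* One top-level table per mode that affects the frequent command. */
static const cmd_fn top_tables[LIGHT_MODES][CMD_COUNT] = {
    { vertex_fast, read_pixels_stub },
    { vertex_lit,  read_pixels_stub },
};

/* One subtable per mode that affects only the infrequent command. */
static const cmd_fn sub_tables[PACK_MODES][SUB_COUNT] = {
    { read_plain }, { read_packed }
};

struct context {
    const cmd_fn *top;   /* the table libGL dispatches through */
    const cmd_fn *sub;   /* driver-private subtable */
};
struct context ctx = { top_tables[0], sub_tables[0] };

/* Infrequent command: the top-level entry forwards through the subtable. */
static void read_pixels_stub(void) { ctx.sub[SUB_READ_PIXELS](); }

/* State changes swap one pointer each. */
void set_light_mode(unsigned m) { assert(m < LIGHT_MODES); ctx.top = top_tables[m]; }
void set_pack_mode(unsigned m)  { assert(m < PACK_MODES);  ctx.sub = sub_tables[m]; }

/* libGL-side entry points always go through the top-level table. */
void sketch_Vertex(void)     { ctx.top[CMD_VERTEX](); }
void sketch_ReadPixels(void) { ctx.top[CMD_READ_PIXELS](); }
```

In this sketch the number of tables is LIGHT_MODES + PACK_MODES
rather than their product, which is the saving the hierarchy is
meant to buy.
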

Other people know the low-level details of current implementations
better than I do, but my guess would be that the best tradeoff looks
something like this:

The per-thread information needed for OpenGL rendering is a
pointer to the current rendering context and a pointer to the
current dispatch table.

Core OpenGL commands are dispatched by loading the pointer to
the current dispatch table from the per-thread data area,
loading the appropriate element of the table, and jumping to
it. This transfers control to the driver associated with the
current rendering context. Commands in the driver fetch their
arguments in the usual way, and load the pointer to the
current rendering context from the per-thread data area if
they need it.

The dispatch table is maintained entirely by the driver
associated with the current context. State changes may cause
the thread's dispatch-table pointer to change (thus swapping
in a new table that's potentially entirely different from the
old one), or may cause individual entries in the current
dispatch table to change.

The dispatch table contains entries for both core commands and
extension commands (for the driver associated with the current
context).

The mapping between command and table index needs to be
identical across all OpenGL implementations. This allows a
single libGL to interpret the dispatch tables from any driver
in the most efficient way (using constant offsets for
dispatching). The table indices should be maintained in a
registry, just like OpenGL core and extension enumerants, and
allocated in small chunks so that the table memory is used
efficiently. [Note: I've just mentioned this for
completeness; it's not a part of the opengl-base
specification.]

In addition to entry points for core OpenGL commands and
previously-registered extensions, libGL should include a
number of reserved entry points for extension commands that
were registered after the time libGL was compiled. Each of
these entry points is associated with a table index variable.
GetProcAddress works by asking the driver to map a command
name into a table index, storing that value in the index
variable associated with the next-available reserved entry
point, and returning the address of that entry point. (If no
more reserved entry points are available, GetProcAddress
returns NULL.)

The registry of dispatch-table indices for extension commands
guarantees that an efficient dispatch process is possible for
all contexts that support a given extension. The entire setup
also preserves the nice property that
glGetProcAddress("glFoo")==&glFoo, for both core and extension
commands, for all contexts. Finally, the dispatch process is
very nearly as efficient for new extensions as it is for core
commands (the only difference is indexing the dispatch table
with a variable rather than a constant).

Note that if an application uses glGetProcAddress to get the
address of an extension function, and then calls that function
when the current context does not support the extension, libGL
will jump through a dispatch table entry that is null or lies
past the end of the table.
Personally, I say ``We gave 'em the rope; let 'em hang,'' but
with additional overhead in the reserved entry-point functions
we could check dispatch table length, check for null table
entries, etc. This might be useful when debugging apps that
fail to check the extensions string properly.
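
The pieces above can be put together in a rough C sketch. Everything
named here is hypothetical: the sketch_* entry points, the index
values, driver_index_for() standing in for the driver's name-to-index
query, and the two toy driver tables. All commands are given a
void(void) signature, whereas a real reserved entry point would have
to pass arguments through untouched. Returning the same entry point
for a repeated name, and NULL for an unknown name, are assumptions on
my part. The reserved entry points use the checked variant mentioned
in the last item.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*cmd_fn)(void);   /* simplified: real commands take arguments */

struct dispatch_table {
    size_t length;              /* number of entries the driver provides */
    const cmd_fn *entry;
};

/* Per-thread state; the context pointer is omitted from this sketch. */
static _Thread_local const struct dispatch_table *current_table;

void sketch_MakeCurrent(const struct dispatch_table *t) { current_table = t; }

/* Registry-assigned table indices (the values are made up). */
enum { IDX_FLUSH = 0, IDX_FOO_EXT = 1 };

/* Core entry point: constant index, two loads and a call. */
void sketch_Flush(void)
{
    assert(current_table != NULL);
    current_table->entry[IDX_FLUSH]();
}

/* Reserved entry points: each indexes the table through its own variable. */
#define RESERVED 2
static size_t reserved_index[RESERVED];
static size_t reserved_used;
int bad_calls;                  /* calls rejected by the checks below */

static void call_checked(size_t slot)
{
    const struct dispatch_table *t = current_table;
    size_t i = reserved_index[slot];
    if (t == NULL || i >= t->length || t->entry[i] == NULL) {
        bad_calls++;            /* a debugging libGL might log or abort here */
        return;
    }
    t->entry[i]();
}
static void reserved0(void) { call_checked(0); }
static void reserved1(void) { call_checked(1); }
static const cmd_fn reserved_entry[RESERVED] = { reserved0, reserved1 };

/* Stand-in for the driver's name-to-index query; -1 means unknown. */
static long driver_index_for(const char *name)
{
    return strcmp(name, "glFooEXT") == 0 ? IDX_FOO_EXT : -1;
}

cmd_fn sketch_GetProcAddress(const char *name)
{
    long idx = driver_index_for(name);
    if (idx < 0)
        return NULL;
    for (size_t s = 0; s < reserved_used; s++)   /* name already bound */
        if (reserved_index[s] == (size_t)idx)
            return reserved_entry[s];
    if (reserved_used == RESERVED)
        return NULL;                             /* no reserved slots left */
    reserved_index[reserved_used] = (size_t)idx;
    return reserved_entry[reserved_used++];
}

/* Two toy drivers: A supports the extension, B has a shorter table. */
int flush_calls, foo_calls;
static void drv_flush(void) { flush_calls++; }
static void drv_foo(void)   { foo_calls++; }
static const cmd_fn entries_a[] = { drv_flush, drv_foo };
static const cmd_fn entries_b[] = { drv_flush };
const struct dispatch_table table_a = { 2, entries_a };
const struct dispatch_table table_b = { 1, entries_b };
```

With table_a current, the pointer returned for "glFooEXT" reaches
drv_foo; with table_b current, the same pointer trips the length
check and only bumps bad_calls. Dropping the checks in call_checked()
gives the faster "let 'em hang" behavior.
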

Allen