I've just checked in dxmpi as a new (what do you call it? project?) in the OpenDX source repository. This is in preparation for making it available, but it's not yet "officially" released, and isn't readable by anyone but me. I still have not written any significant documentation (I've been putting it off on the rationale that I'll be able to do it at home while helping care for our soon-to-be-born. Yeah, right.). I'll give a little overview here, and if there's interest, I'll be happy to discuss it at greater length.
Basically, it works by starting an instance of the DX executive on each MPI host, with one (rank 0) designated the master. The master communicates with the UI or can be driven directly in the script language (though rather than typing into it directly, a separate program is used to connect the keyboard to dxexec). Absent any use of the OpenDX-MPI extensions, the master performs as a completely normal OpenDX exec. The MPI extensions to OpenDX come in two forms: modules that appear to run on the master but which use MPI internally to run in parallel, and macros that are run in parallel on the slaves at the behest of a "RunOnSlaves" module that runs on the master. To support the first type, I've put in an API that allows modules to register and call remote procedures and to pass DX objects around. For the second, there's a set of new modules that provide tools for distributing data from the master to the slaves, leaving the distributed data as named objects in the slaves' caches, where macros run on the slaves by RunOnSlaves can retrieve it by name.

As an example of the first type, I've implemented a parallel regular-grid importer (DRegularImporter) that works like this. As input, it receives the name of an ad-hoc data description file that describes the layout of data found in a separate file. It registers a remote procedure with the slaves, and then invokes that procedure on the slaves, passing them the data description and a destination object name. Each slave, knowing how many slaves there are and its own rank among them, is able to determine what portion of the data it's responsible for (essentially partitioning the data as part of importing it, and including a "ghost zone" of overlap with neighboring partitions). Each imports its part, saves it in the cache under the destination object name (the same on all slaves), and returns the bounding box and min/max of its partition to the master. The master (still in the guts of the DRegularImporter module) gathers this info into an output Group object where member i is the box and min/max of the partition residing on slave i.

Now suppose you want to isosurface and render that data. You create a macro on the slaves that accesses the data by the name under which DRegularImporter left it, along with an isovalue (again by name, from the cache). The macro sends the data through Isosurface and Color and whatever, then passes the result to the SlaveDone module, which indicates to the master that execution of the macro is complete and returns the object. On the master, a scalar interactor feeds the input of Broadcast(value, name), which broadcasts the interactor's value to the slaves under the associated name. The dummy output of Broadcast and the output of DRegularImporter are passed to the "control" inputs of RunOnSlaves(macro name, control ...); these just serve to ensure correct module sequencing. RunOnSlaves causes the slaves to run the named macro, waits for their results, and gathers them together in a group where member i contains the result from slave i. This could then be passed directly into Image for rendering. (I've pasted a rough sketch of this in script language at the end of this note.)

There are a couple of other cool pieces to this, most notably a distributed parallel renderer and a new streamline module that runs on distributed vector fields. There are also a lot of holes to fix; for example, errors in slave macros very often cause system hangs, and there is no integration of the notion of slave macros into the UI. Lots of things.

Greg
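
P.S. To make the second example a little more concrete, here's a rough sketch in the script language of what the slave macro and the master-side program might look like. Take it as illustration rather than gospel: the exact input orders of DRegularImporter, Broadcast, RunOnSlaves, and SlaveDone aren't meant to be definitive, and "GetNamedObject" is just a stand-in name for whatever the cache-retrieval tool ends up being called.

    // Runs on each slave when the master executes RunOnSlaves("isoSlave", ...).
    // GetNamedObject is a placeholder for the module that pulls a named object
    // out of the slave's cache.
    macro isoSlave() -> (done)
    {
        part = GetNamedObject("mydata");     // partition left by DRegularImporter
        ival = GetNamedObject("isovalue");   // scalar broadcast from the master
        surf = Isosurface(part, ival);
        surf = Color(surf, "yellow");
        done = SlaveDone(surf);              // hand the result back to the master
    }

    // On the master (a plain variable stands in for the scalar interactor):
    isovalue = 0.5;
    boxes = DRegularImporter("mydata.desc", "mydata");  // partitions the data across the slaves
    dummy = Broadcast(isovalue, "isovalue");             // leaves the value in the slaves' caches
    group = RunOnSlaves("isoSlave", boxes, dummy);       // member i = result from slave i

    // In the UI the group would just feed the Image tool; in script mode:
    camera = AutoCamera(group);
    image = Render(group, camera);
    Display(image);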
