Victor Eijkhout <[email protected]> writes:

> On Jan 2, 2014, at 10:12 PM, Jed Brown <[email protected]> wrote:
>
>> the execution model is BSP
>
> No it's not. There are no barriers or syncs.
You have a communication phase followed by computation, followed by more
communication (in general). It looks like BSP without explicit barriers
(which would be semantically meaningless if you added them). An example of
a less structured communication pattern is Mark's asynchronous Gauss-Seidel:

http://crd.lbl.gov/assets/Uploads/ANAG/gs.pdf

> The callbacks are there just to make the code uniform. I've edited the
> document to reflect that in the MPI case you can dispense with them.

I thought you were targeting hybrid MPI/threads?

>> your transformation is recognizing a common
>> pattern of communication into temporary buffers, followed by
>> computation, followed by post-communication and putting a declarative
>> syntax on it
>
> Somewhat simplified, but not wrong. I'm kind of interested in the
> question what practically relevant algorithms do not conform to that
> model.

The asynchronous Gauss-Seidel above is one example. More simply, how does
MatMult_MPI look in your model? Note the overlapped communication and
computation; see the sketch further down.

Also, you can't implement an efficient setup for your communication pattern
without richer semantics; see PetscCommBuildTwoSided_Ibarrier (sketched at
the end of this message) and the paper

http://unixer.de/publications/img/hoefler-dsde-protocols.pdf

>> Your abstraction is not uniform
>> if you need to index into owned parts of shared data structures or
>> perform optimizations like cooperative prefetch.
>
> Not sure what you're saying here. In case you dug into my code deeply,
> at the moment gathering the halo region involves one send-to-self that
> should be optimized away in the future. Look, it's a prototype, all
> right? I don't care about that.

What does the hybrid case look like? Do you prepare separate work buffers
for each thread, or do the threads work on parts of a shared data structure,
perhaps cooperating at a higher frequency than the MPI? I thought you were
creating strict independence and private buffers, which would be MPI-like
semantics using threads (totally possible, and I think usually a good thing,
but most threaded programming models are explicitly trying to avoid it).

>> If you're going to use
>> separate buffers, why is a system like MPI not sufficient?
>
> 1. No idea why you're so fixated on buffers. I've just been going for
> the simplest implementation that makes it work on this one example. It
> can all be optimized.

What would the optimized version look like? Someone has to decide how to
index into the data structures. You're not providing much uniformity
relative to MPI+X (e.g., X=OpenMP) if the threaded part is always sorted
out by the user in an application-dependent way.

> 2. Why is MPI not sufficient: because you have to spell out too
> much. Calculating a halo in the case of a block-cyclically distributed
> vector is way too much work. An extension of IMP would make this very
> easy to specify, and all the hard work is done under the covers.

VecScatter and PetscSF are less intrusive interfaces on top of MPI. I agree
that MPI_Isend is pretty low level, but what are you providing relative to
these less intrusive abstractions? You said that you were NOT looking for
"a better notation for VecScatters", so let's assume that such interfaces
are available to the MPI programmer.
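To make the MatMult_MPI question concrete, here is a minimal sketch of the
overlapped pattern expressed with the VecScatter interface (not the actual
PETSc source; the function name and the arguments Ad, Ao, ctx, and xghost
are placeholders for the diagonal block, off-diagonal block, scatter
context, and ghost work vector that the real MPIAIJ matrix keeps
internally):

#include <petscmat.h>

/* Sketch: y = A*x with A distributed by rows; Ad holds the locally owned
   columns, Ao the columns whose x entries live on other processes. */
static PetscErrorCode MatMultOverlapped(Mat Ad,Mat Ao,VecScatter ctx,
                                        Vec x,Vec xghost,Vec y)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* start moving the ghost entries of x */
  ierr = VecScatterBegin(ctx,x,xghost,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  /* purely local product runs while the messages are in flight */
  ierr = MatMult(Ad,x,y);CHKERRQ(ierr);
  /* finish the communication, then add the off-process contribution */
  ierr = VecScatterEnd(ctx,x,xghost,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = MatMultAdd(Ao,xghost,y,y);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The local product runs while the ghost values are in flight; a strict
communicate-then-compute split loses exactly that overlap.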
>> What semantic does your abstraction provide for hybrid
>> distributed/shared memory that imperative communication systems cannot?
>
> How about having the exact same code that I've shown you, except that
> you specify that the processes are organized as N nodes times C cores?
>
> Right now I've not implemented hybrid programming, but it shouldn't be
> hard.

How does the code the user writes in the callback remain the same for MPI
and threads without the copies that AMPI or shared-memory MPI would have
made, and without requiring the user to deal with the threads explicitly
(working on shared data structures as with OpenMP/TBB/pthreads)? I'm
looking for a precise statement of something user code can be ignorant of
with your model while still reaping the benefits of a model where that
information is used explicitly (e.g., a semantic not possible with
shared-memory MPI/AMPI, and possible with MPI+X only at some onerous
complexity that cannot easily be tucked into a generic library function).
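On the setup point above, here is a hedged sketch of the NBX protocol from
the Hoefler et al. paper, which is what PetscCommBuildTwoSided_Ibarrier is
based on. Each rank knows the ranks it must send to but not the ranks it
will receive from, and the termination decision comes from MPI_Issend plus
MPI_Ibarrier rather than an Alltoall of counts (requires MPI-3; the
function name is made up and the payload is a single int per message to
keep the sketch short):

#include <mpi.h>
#include <stdlib.h>

/* Dynamic sparse data exchange: discover who sends to me without
   exchanging message counts globally. */
static void sparse_exchange(MPI_Comm comm,int nto,const int *to,const int *sdata)
{
  MPI_Request *sreq    = (MPI_Request*)malloc(nto*sizeof(MPI_Request));
  MPI_Request  barrier = MPI_REQUEST_NULL;
  int          i,tag = 0,barrier_started = 0,done = 0;

  /* Synchronous sends: completion implies the message has been matched. */
  for (i=0; i<nto; i++) MPI_Issend(&sdata[i],1,MPI_INT,to[i],tag,comm,&sreq[i]);

  while (!done) {
    int        flag;
    MPI_Status status;

    /* Receive anything that has arrived, from any source. */
    MPI_Iprobe(MPI_ANY_SOURCE,tag,comm,&flag,&status);
    if (flag) {
      int rdata;
      MPI_Recv(&rdata,1,MPI_INT,status.MPI_SOURCE,tag,comm,MPI_STATUS_IGNORE);
      /* ... status.MPI_SOURCE is a communication partner; record rdata ... */
    }
    if (!barrier_started) {
      int sent;
      MPI_Testall(nto,sreq,&sent,MPI_STATUSES_IGNORE);
      if (sent) {MPI_Ibarrier(comm,&barrier); barrier_started = 1;}
    } else {
      /* The barrier completes only after every rank's sends have matched,
         so once it does, no further messages are coming. */
      MPI_Test(&barrier,&done,MPI_STATUS_IGNORE);
    }
  }
  free(sreq);
}

Completion of the synchronous sends tells a rank that its own messages have
been matched; the nonblocking barrier turns that local fact into the global
statement that nobody is still waiting to be discovered. That is the kind of
setup that needs richer semantics than "communicate into buffers, then
compute".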