Jeff Hammond <[email protected]> writes:

> On Tue, Jul 10, 2018 at 11:27 AM, Richard Tran Mills <[email protected]>
> wrote:
>
>> On Mon, Jul 9, 2018 at 10:04 AM, Jed Brown <[email protected]> wrote:
>>
>>> Jeff Hammond <[email protected]> writes:
>>>
>>> > This is the textbook Wrong Way to write OpenMP and the reason that the
>>> > thread-scalability of DOE applications using MPI+OpenMP sucks.  It
>>> > leads to codes that do fork-join far too often and suffer from death
>>> > by Amdahl, unless you do a second pass where you fuse all the OpenMP
>>> > regions and replace the serial regions between them with critical
>>> > sections or similar.
>>> >
>>> > This isn't how you'd write MPI, is it?  No, you'd figure out how to
>>> > decompose your data properly to exploit locality and then implement an
>>> > algorithm that minimizes communication and synchronization.  Do that
>>> > with OpenMP.
>>>
>>> The applications that would call PETSc do not do this decomposition and
>>> the OpenMP programming model does not provide a "communicator" or
>>> similar abstraction to associate the work done by the various threads.
>>> It's all implicit.
>>
>> This is perhaps the biggest single reason that I hate OpenMP.
>>
>> --Richard
>>
>>> The idea with PETSc's threadcomm was to provide an
>>> object for this, but nobody wanted to call PETSc that way.  It's clear
>>> that applications using OpenMP are almost exclusively interested in its
>>> incrementalism, not in doing it right.  It's also pretty clear that the
>>> OpenMP forum agrees, otherwise they would be providing abstractions for
>>> performing collective operations across module boundaries within a
>>> parallel region.
>>>
>>> So the practical solution is to use OpenMP the way everyone else does,
>>> even if the performance is not good, because at least it works with the
>>> programming model the application has chosen.
>
> The counter argument is that users who want this level of control are
> empowered to implement exactly what they need using the explicit threading
> model.
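To be concrete about the fork-join point above, here is a minimal sketch
(hypothetical kernels, not from any real application) of the two styles:
fork-join per loop with serial code between regions, versus one fused
region where the formerly-serial step is demoted to a `single` (a critical
section works similarly):

#define N 1000000
static double x[N], y[N], dot;

void fork_join_style(void) {
  /* Textbook style: one parallel region per loop, serial code between. */
  #pragma omp parallel for
  for (int i = 0; i < N; i++) y[i] = 2.0 * x[i];

  dot = 0.0;                     /* serial code between regions */

  #pragma omp parallel for reduction(+ : dot)
  for (int i = 0; i < N; i++) dot += x[i] * y[i];
}

void fused_style(void) {
  /* Fused style: one parallel region; the serial step runs on one
   * thread inside a `single` (which carries an implicit barrier)
   * instead of tearing the region down and forking again. */
  #pragma omp parallel
  {
    #pragma omp for
    for (int i = 0; i < N; i++) y[i] = 2.0 * x[i];

    #pragma omp single
    dot = 0.0;

    #pragma omp for reduction(+ : dot)
    for (int i = 0; i < N; i++) dot += x[i] * y[i];
  }
}

The two are semantically equivalent; the fused version just pays for one
fork-join instead of two, which is the whole point of the second pass
Jeff describes.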
There is no standard for "explicit threading" crossing module boundaries.
A communicator would provide an explicit way to say "these threads
participate in this collective operation".  It's fragile to build it on
top of a programming model that does not have such a concept, particularly
when you wish to support multiple libraries and callback interfaces
(necessary for nonlinear solvers and other types of extensible
composition).

> A thread communicator is just the set of threads that work together
> and one can implement workshare and collective operations using that
> information.  Given how inefficiently GOMP implements barriers, folks
> should probably be rolling their own barriers anyways.  If folks are
> deeply unhappy with this suggestion because of the work it requires,
> then perhaps DOE needs to fund somebody to write an open-source
> collectives library for OpenMP.

What set of threads would these hypothetical collectives be collective on?

> For what it's worth, there's an active effort to make the teams construct
> valid in general (i.e. not just in the context of 'target'), which is
> intended to make NUMA programming easier.  Currently, teams are defined by
> the implementation, but it may be possible to allow them to be
> user-defined.  The challenge is that teams were initially created to
> support the OpenCL execution model that does not permit synchronization
> between work groups, so there may be objections to giving users control of
> their definition.
>
> Jeff
>
> --
> Jeff Hammond
> [email protected]
> http://jeffhammond.github.io/
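For what "rolling your own" might look like: below is a minimal sketch of
a thread communicator, an explicit set of participating threads plus a
sense-reversing barrier over exactly that set.  All names are hypothetical;
nothing like this exists in OpenMP itself.

#include <stdatomic.h>

typedef struct {
  int nthreads;        /* number of participating threads */
  atomic_int arrived;  /* count of threads at the barrier */
  atomic_int sense;    /* global sense, flipped each round */
} threadcomm;

void threadcomm_init(threadcomm *c, int nthreads) {
  c->nthreads = nthreads;
  atomic_init(&c->arrived, 0);
  atomic_init(&c->sense, 0);
}

/* Each participating thread keeps its own local sense, initialized to 0
 * to match the communicator's initial global sense. */
void threadcomm_barrier(threadcomm *c, int *local_sense) {
  *local_sense = !*local_sense;
  if (atomic_fetch_add(&c->arrived, 1) == c->nthreads - 1) {
    /* Last to arrive: reset the count, then release everyone. */
    atomic_store(&c->arrived, 0);
    atomic_store(&c->sense, *local_sense);
  } else {
    while (atomic_load(&c->sense) != *local_sense)
      ;  /* spin until this round's sense flips */
  }
}

Collectives follow the same pattern: stash partial results indexed by
rank-in-communicator, barrier, combine.  That is the sort of thing an
open-source collectives library would have to package up, and it still
begs the question of who defines the membership.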
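For reference, under OpenMP 4.5 the teams construct is valid only when
strictly nested inside target, which is exactly the restriction the effort
Jeff mentions would relax.  A sketch of current usage (hypothetical
function):

/* The league of teams is created on the target device; `distribute`
 * splits the iterations across teams, and there is deliberately no
 * construct that synchronizes one team with another, mirroring OpenCL
 * work-groups. */
void scale(int n, double *x, double a) {
  #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
  for (int i = 0; i < n; i++)
    x[i] *= a;
}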
