Re: [petsc-dev] PETSc and threads

Barry Smith Mon, 19 Jan 2015 16:54:06 -0800

> On Jan 19, 2015, at 5:23 PM, Jed Brown <[email protected]> wrote:
> 
> Barry Smith <[email protected]> writes:
>>   Whose legacy applications? Presumably all PETSc legacy applications
>>   which are written for pure MPI are either MPI scalable or have
>>   flaws in the non-PETSc part that make them non-MPI scalable but
>>   presumably that could be fixed?  
> 
> Or they have already bought into threads for other reasons.  People
> rarely design applications based on unbiased data.


   Not that many, and mostly half-assed, I would say. In most cases I believe 
they can just turn off the use of threads in their code and be fine. That is, 
few of them are using threads due to "large amounts of data that cannot be 
parallelized"; most are doing it just because they were told to use threads. 
I don't buy the legacy argument except for a tiny number of apps we can safely 
ignore, or hack if need be by using shared memory and turning off some of the 
MPI processes during the user's "threaded" portions of the code. 

> 
>>   Are you talking about applications with for example redundant
>>   storage of mesh info, hence it is not MPI scalable? Well it is not
>>   going to be MPI + thread scalable either (though the threads will
>>   help for a while as you said). 
> 
> Correct, but if the number of nodes is not increasing, they'll be able
> to live with certain types of non-scalability for a while, especially if
> their goal is science/engineering output rather than "scalability" on
> DOE's computers.
> 
>>   And as you noted before even the redundant mesh info business can
>>   be handled with the MPI window stuff just as well if not better
>>   than threads anyways. Be more specific with what legacy
>>   applications and what features of those apps.
> 
> Apps that already committed to threads.  And it doesn't have to be
> better even in their own tests.  I recently reviewed a paper written by
> an esteemed colleague that is promoting a "new" threading model that
> performs uniformly worse than the naive MPI implementation they have had
> for years.  And it was presented as a positive result because "MPI won't
> scale", despite the new thing showing worse performance and worse
> scalability on every numerical example in the paper.

   I hope you recommended rejecting the paper; it should be rejected. This kind 
of crap just makes our life harder.

> 
> If we take the hard-line approach that PETSc will not support threads in
> any form, we're going to be swimming upstream for a sizable class of
> apps and libraries.  I think we should encourage people to choose
> processes, but for those that buy into threads for whatever reason,
> logical or not, we would be a better library if we can offer them good
> performance.
> 
    There is a cost to PETSc in terms of "lost cool new functionality", 
improved solvers, and improved code when we devote/waste intellectual energy 
and time on "mucking with threads" to make them reasonable. Is that cost worth 
the effort? I doubt it. I'd much rather have you working on other stuff than 
on make-work.

   On the other hand, there is the political cost in refusing to do what the 
sheeple tell us to do. So why don't we just say we're doing thread stuff but 
not bother? Or, if that won't work, just drop the damn OpenMP pragmas into the 
code (strip out the threadcomm stuff) and call it a day? That requires little 
intellectual energy or time compared to "doing threads right" but satisfies 
your criteria.
  
> Also note that if the vendors stick with their silly high-latency
> node-wide coherence, a larger fraction of users would be able to run
> their science of interest on a single node, in which case we could
> provide practical parallel solvers to applications that are not
> interested in dealing with MPI's idiosyncrasies.

   Maybe if we improved our API to hide more of "MPI's idiosyncrasies", such as 
revisiting our "FormFunctionLocal" to make it friendly, and even going as far as 
having a simplified MPI_Comm-less API (that assumes MPI_COMM_WORLD or a fixed 
subset) for everything? If we did this then the user (without even knowing) 
would actually be writing an application that was not restricted to one node, 
and yet they would have the "simplicity" of no MPI in their application or 
parallelism in their model. 

   As far as I am concerned, if you measure the total memory bandwidth of the 
system, then based on that predict the 'solver' performance and get some 
reasonable percentage (> 80?) of the predicted that scales, then you are done. 
You don't need to justify using MPI instead of threads (or any model versus any 
other model) because switching the model (at whatever cost) will only give you 
a factor of 100/80 = 1.25, hence a 25 percent improvement, assuming the 
alternative model is perfect, which it won't be. The problem with people 
pushing the thread model is that they "think" that it magically solves the 
memory bandwidth problem when it doesn't help at all. So when people question 
us, why can't we just answer by asking them how threads improve the accessible 
memory bandwidth? 
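
   To make the arithmetic above concrete, here is a minimal sketch of the 
bandwidth argument. It assumes the solver is purely memory-bandwidth bound; the 
function names and the sample numbers (bytes moved, bandwidth, measured time) 
are hypothetical and chosen only to reproduce the 80 percent case. They are not 
measurements from any real machine or PETSc run.

```python
def predicted_time(bytes_moved, bandwidth_bytes_per_s):
    """Bandwidth-limited lower bound on runtime, in seconds.

    Assumption: the solver's cost is dominated by memory traffic, so no
    programming model can finish faster than bytes / bandwidth.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return bytes_moved / bandwidth_bytes_per_s


def achieved_fraction(bytes_moved, bandwidth_bytes_per_s, measured_time_s):
    """Fraction of the bandwidth-predicted performance actually achieved."""
    if measured_time_s <= 0:
        raise ValueError("measured time must be positive")
    return predicted_time(bytes_moved, bandwidth_bytes_per_s) / measured_time_s


def max_gain_from_switching(fraction):
    """Upper bound on the speedup from switching to any other model,
    assuming the alternative reaches 100% of the predicted performance."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction


if __name__ == "__main__":
    # Hypothetical numbers: 8 GB of traffic, 40 GB/s measured bandwidth,
    # and a solve that took 0.25 s against a predicted 0.2 s.
    frac = achieved_fraction(8e9, 40e9, 0.25)
    gain = max_gain_from_switching(frac)
    print(f"achieved {frac:.0%} of predicted; best possible gain {gain:.2f}x")
```

   With these inputs the code is already at about 80 percent of what the 
memory system allows, so even a perfect alternative model is bounded at roughly 
1.25x. The bound depends only on the measured bandwidth, not on whether the 
parallelism comes from processes or threads.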

   You are the one who believes the pure MPI model is the best technically now 
and into the future; you don't need to compromise so easily.

   Barry
