> On Feb 13, 2018, at 8:56 PM, Mark Adams <mfad...@lbl.gov> wrote:
> 
> I agree with Matt; flat 64 will be faster, I would expect, but this code has 
> global metadata that would have to be replicated in a full-scale run.

  Use MPI 3 shared memory to expose the "global metadata" and forget this 
thread nonsense.
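
   For concreteness, a minimal sketch of the MPI-3 shared-memory approach, 
assuming the "global metadata" is a read-only double array that fits once per 
node. The array name, its size, and the fill loop are illustrative assumptions, 
not anything from this thread; the point is the calls MPI_Comm_split_type, 
MPI_Win_allocate_shared, and MPI_Win_shared_query:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm       node_comm;
  MPI_Win        win;
  MPI_Aint       size;
  int            disp_unit, node_rank;
  double        *metadata;              /* hypothetical shared table */
  const MPI_Aint nentries = 1000;       /* hypothetical table size   */

  MPI_Init(&argc, &argv);

  /* One communicator per shared-memory node */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);
  MPI_Comm_rank(node_comm, &node_rank);

  /* Rank 0 on each node provides the memory; the others allocate 0 bytes */
  size = (node_rank == 0) ? nentries * (MPI_Aint)sizeof(double) : 0;
  MPI_Win_allocate_shared(size, (int)sizeof(double), MPI_INFO_NULL,
                          node_comm, &metadata, &win);

  /* Everyone else gets a direct pointer to rank 0's segment */
  if (node_rank != 0)
    MPI_Win_shared_query(win, 0, &size, &disp_unit, &metadata);

  MPI_Win_fence(0, win);
  if (node_rank == 0) {                 /* fill the table exactly once */
    for (MPI_Aint i = 0; i < nentries; i++) metadata[i] = (double)i;
  }
  MPI_Win_fence(0, win);                /* stores now visible node-wide */

  printf("node rank %d reads metadata[42] = %g\n", node_rank, metadata[42]);

  MPI_Win_free(&win);
  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}

Every rank then reads the single copy through plain loads, so the table costs 
one node's worth of memory instead of one copy per MPI process, with no 
threads involved.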

> We are just doing single-socket tests now (I think).
> 
> We have been tracking down what look like compiler bugs, and we have only 
> taken a look at peak performance to make sure we are not wasting our time 
> with threads.

   You are wasting your time. There are better ways to deal with global 
metadata than with threads.

> 
> I agree, 16x4 vs. 64 would be interesting to see.
> 
> Mark
> 
> 
> 
> On Tue, Feb 13, 2018 at 2:02 PM, Kong, Fande <fande.k...@inl.gov> wrote:
> Curious about the comparison of 16x4 vs. 64.
> 
> Fande,
> 
> On Tue, Feb 13, 2018 at 11:44 AM, Bakytzhan Kallemov <bkalle...@lbl.gov> 
> wrote:
> Hi,
> I am not sure about the 64 flat run; 
> unfortunately I did not save the logs, since it is easy to rerun. But for 16, 
> here is the plot I got of KSPSolve time for different numbers of threads.
> Baky
> 
> On 02/13/2018 10:28 AM, Matthew Knepley wrote:
>> On Tue, Feb 13, 2018 at 11:30 AM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>> > On Feb 13, 2018, at 10:12 AM, Mark Adams <mfad...@lbl.gov> wrote:
>> >
>> > FYI, we were able to get hypre with threads working on KNL on Cori by 
>> > going down to -O1 optimization. We are getting about 2x speedup with 4 
>> > threads and 16 MPI processes per socket. Not bad.
>> 
>>   In other words, using 16 MPI processes with 4 threads per process is twice 
>> as fast as running with 64 MPI processes? Could you send the -log_view 
>> output for these two cases?
>> 
>> Is that what you mean? I took it to mean
>> 
>>   We ran 16 MPI processes and got time T.
>>   We ran 16 MPI processes with 4 threads each and got time T/2.
>> 
>> I would likely eat my shirt if 16x4 was 2x faster than 64.
>> 
>>   Matt
>>  
>> 
>> >
>> > The error, flatlined or slightly diverging hypre solves, occurred even 
>> > in flat MPI runs with openmp=1.
>> 
>>   But the answers are wrong as soon as you turn on OpenMP?
>> 
>>    Thanks
>> 
>>     Barry
>> 
>> 
>> >
>> > We are going to test the Haswell nodes next.
>> >
>> > On Thu, Jan 25, 2018 at 4:16 PM, Mark Adams <mfad...@lbl.gov> wrote:
>> > Baky (cc'ed) is getting a strange error on Cori/KNL at NERSC. Using maint, 
>> > it runs fine with -with-openmp=0, and it runs fine with -with-openmp=1 and 
>> > gamg, but with hypre and -with-openmp=1, even running flat MPI, the solver 
>> > seems to flatline (see the attached output, and notice that the residual 
>> > starts to creep after a few time steps).
>> >
>> > Maybe you can suggest a hypre test that I can run?
>> >
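
   For concreteness, a minimal standalone check in the spirit of the PETSc KSP 
tutorials is sketched below: a 1D Laplacian solved with KSPSolve, where the 
preconditioner and monitoring come from the command line, so the same source 
can be run against builds with and without -with-openmp=1 and the residual 
histories compared under -pc_type hypre -ksp_monitor_true_residual. The 
problem, its size, and the right-hand side are illustrative choices, and error 
checking is omitted for brevity; this is not a test anyone in the thread 
proposed.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, n = 10000, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);
  PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL);

  /* 1D Laplacian (tridiagonal), distributed by rows */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  /* Solver configuration comes entirely from the options database,
     e.g. -pc_type hypre -ksp_monitor_true_residual */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

Running it as, say, ./test -n 100000 -pc_type hypre -ksp_monitor_true_residual 
in both builds and diffing the monitor output would show whether the creeping 
residual appears as soon as the library is built with OpenMP.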
>> 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
> 
> 
> 
