I am not sure about the 64-process flat MPI run.

Unfortunately I did not save the logs, since it's easy to rerun. For 16 MPI processes, though, here is the plot I got of KSPSolve time for different numbers of threads.


    > FYI, we were able to get hypre with threads working on KNL on
    Cori by going down to -O1 optimization. We are getting about 2x
    speedup with 4 threads and 16 MPI processes per socket. Not bad.

      In other words, using 16 MPI processes with 4 threads per process
    is twice as fast as running with 64 MPI processes?  Could you send
    the -log_view output for these two cases?

Is that what you mean? I took it to mean

  We ran 16 MPI processes and got time T.
  We ran 16 MPI processes with 4 threads each and got time T/2.

I would likely eat my shirt if 16x4 was 2x faster than 64.
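
To make the two readings concrete, here is a minimal sketch of the comparison. The function name and the numbers are placeholders I made up for illustration (T is normalized to 1.0); they are not measurements from Cori. The second reading cannot be evaluated until someone posts the KSPSolve time from -log_view for the 64-process flat run.

```python
def speedup(baseline_time, threaded_time):
    """Return baseline KSPSolve time divided by threaded KSPSolve time."""
    if baseline_time <= 0 or threaded_time <= 0:
        raise ValueError("times must be positive")
    return baseline_time / threaded_time


# Placeholder values only, NOT measurements: T is normalized to 1.0.
T = 1.0
t_16x1 = T       # 16 MPI processes, 1 thread each
t_16x4 = T / 2   # 16 MPI processes, 4 threads each (the reported ~2x)
t_64x1 = None    # 64 flat MPI processes: unknown, no log was saved

# Reading 1: same rank count, threads on vs. off.
print("16x4 vs 16x1:", speedup(t_16x1, t_16x4))

# Reading 2: same core count, hybrid vs. flat MPI. Needs the missing run.
if t_64x1 is not None:
    print("16x4 vs 64x1:", speedup(t_64x1, t_16x4))
```

Only the first ratio follows from the numbers quoted so far; the second is the one Barry is asking about.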


    > The error, flatlined or slightly diverging hypre solves,
    occurred even in flat MPI runs with openmp=1.

      But the answers are wrong as soon as you turn on OpenMP?



    > We are going to test the Haswell nodes next.
    > Baky (cc'ed) is getting a strange error on Cori/KNL at NERSC.
    Using maint it runs fine with --with-openmp=0, it runs fine with
    --with-openmp=1 and gamg, but with hypre and --with-openmp=1, even
    running with flat MPI, the solver seems to flatline (see attached and
    notice that the residual starts to creep after a few time steps).
    > Maybe you can suggest a hypre test that I can run?

