My name has been called. Mark, if you're having issues with Crusher, please contact Veronica Vergara (vergar...@ornl.gov). You can cc me (justin.ch...@amd.com) on those emails.
On Mon, Jan 24, 2022 at 1:49 PM Barry Smith <bsm...@petsc.dev> wrote:
>
> On Jan 24, 2022, at 2:46 PM, Mark Adams <mfad...@lbl.gov> wrote:
>
> Yea, CG/Jacobi is as close to a benchmark code as we could want. I could
> run this on one processor to get cleaner numbers.
>
> Is there a designated ECP technical support contact?
>
> Mark, you've forgotten you work for DOE. There isn't a non-ECP
> technical support contact.
>
> But if this is an AMD machine, then maybe contact Matt's student Justin
> Chang?
>
> On Mon, Jan 24, 2022 at 2:18 PM Barry Smith <bsm...@petsc.dev> wrote:
>
>>
>> I think you should contact the Crusher ECP technical support team, tell
>> them you are getting dismal performance, and ask if you should expect
>> better. Don't waste time flogging a dead horse.
>>
>> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>
>> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zh...@gmail.com>
>> wrote:
>>
>>> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <junchao.zh...@gmail.com>
>>>> wrote:
>>>>
>>>>> Mark, I think you can benchmark individual vector operations, and once
>>>>> we get reasonable profiling results, we can move to solvers etc.
>>>>>
>>>>
>>>> Can you suggest a code to run, or are you suggesting making a vector
>>>> benchmark code?
>>>>
>>> Make a vector benchmark code, testing vector operations that would be
>>> used in your solver.
>>> Also, we can run MatMult() to see if the profiling result is reasonable.
>>> Only once we get solid results on basic operations is it useful to
>>> run big codes.
>>>
>>
>> So we have to make another throw-away code? Why not just look at the
>> vector ops in Mark's actual code?
>>
>>    Matt
>>
>>>
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>
>>>>>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsm...@petsc.dev>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Here, except for VecNorm, the GPU is used effectively, in that most
>>>>>>> of the time is spent doing real work on the GPU.
>>>>>>>
>>>>>>> VecNorm  402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33  30230   225393      0 0.00e+00    0 0.00e+00 100
>>>>>>>
>>>>>>> Even the dots are very effective; only the VecNorm flop rate over
>>>>>>> the full time is much, much lower than VecDot's. Is that somehow due
>>>>>>> to the use of GPU or CPU MPI in the allreduce?
>>>>>>>
>>>>>>
>>>>>> The VecNorm GPU rate is relatively high on Crusher, and the CPU rate
>>>>>> is about the same as the other vec ops. I don't know what to make of
>>>>>> that.
>>>>>>
>>>>>> But Crusher is clearly not crushing it.
>>>>>>
>>>>>> Junchao: Perhaps we should ask Kokkos if they have any experience
>>>>>> with Crusher that they can share. They could very well find some
>>>>>> low-level magic.
>>>>>>
>>>>>>>
>>>>>>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>
>>>>>>>> Mark, can we compare with Spock?
>>>>>>>>
>>>>>>>
>>>>>>> Looks much better. This puts two processes per GPU because there are
>>>>>>> only 4.
>>>>>>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>>>>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>