Adding Suyash. I found a problem: with ex56, which has a crappy decomposition, using one MPI process per GPU is much faster than using 8 per GPU (64 total). (I am looking at ex13 to see how much of this is due to the decomposition.) If you use only 8 processes it seems that all 8 are placed on the first GPU, but adding -c8 (srun's cpus-per-task flag) seems to fix this. Now the numbers are looking reasonable.
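For anyone who wants to check the rank-to-GPU mapping directly, below is a minimal MPI + HIP sketch (an illustration only, not the PETSc or Slurm binding mechanism, and not the actual fix, which was just -c8) of the usual local-rank round-robin assignment. If every rank prints GPU 0, they are all sharing the first GCD.

/* Illustrative only: print (and enforce) a round-robin local-rank -> GPU mapping.
   Compile with hipcc (or cc plus the ROCm include/lib flags) and an MPI wrapper. */
#include <mpi.h>
#include <hip/hip_runtime_api.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int world_rank, local_rank, ndev = 0, dev = -1;
  MPI_Comm local;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  /* communicator of the ranks sharing this node */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &local);
  MPI_Comm_rank(local, &local_rank);

  hipGetDeviceCount(&ndev);
  if (ndev > 0) hipSetDevice(local_rank % ndev); /* round-robin over visible GCDs */
  hipGetDevice(&dev);

  printf("world rank %d (local rank %d) -> GPU %d of %d\n", world_rank, local_rank, dev, ndev);

  MPI_Comm_free(&local);
  MPI_Finalize();
  return 0;
}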
On Mon, Jan 24, 2022 at 3:24 PM Barry Smith <bsm...@petsc.dev> wrote:
>
>    For this, to start, someone can run
>
>       src/vec/vec/tutorials/performance.c
>
> and compare the performance to that in the technical report "Evaluation of PETSc on a Heterogeneous Architecture: the OLCF Summit System, Part I: Vector Node Performance" (Google to find it). One does not have to, and shouldn't, do an extensive study right now that compares everything; instead one should run a very small number of different problem sizes (make them big) and compare those sizes with what Summit gives. Note you will need to make sure that performance.c uses the Kokkos backend.
>
> One hopes for better performance than Summit; if one gets tons worse we know something is very wrong somewhere. I'd love to see some comparisons.
>
>   Barry
>
> On Jan 24, 2022, at 3:06 PM, Justin Chang <jychan...@gmail.com> wrote:
>
> Also, do you guys have an OLCF liaison? That's actually your better bet if you do.
>
> Performance issues with ROCm/Kokkos are pretty common in apps besides just PETSc. We have several teams actively working on rectifying this. However, I think performance issues can be quicker to identify if we had a more "official" and reproducible PETSc GPU benchmark, which I've already expressed to some folks in this thread, and as others have already commented on, such a task is difficult. Hopefully I will have more time soon to illustrate what I am thinking.
>
> On Mon, Jan 24, 2022 at 1:57 PM Justin Chang <jychan...@gmail.com> wrote:
>
>> My name has been called.
>>
>> Mark, if you're having issues with Crusher, please contact Veronica Vergara (vergar...@ornl.gov). You can cc me (justin.ch...@amd.com) in those emails.
>>
>> On Mon, Jan 24, 2022 at 1:49 PM Barry Smith <bsm...@petsc.dev> wrote:
>>
>>> On Jan 24, 2022, at 2:46 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>> Yea, CG/Jacobi is as close to a benchmark code as we could want. I could run this on one processor to get cleaner numbers.
>>>
>>> Is there a designated ECP technical support contact?
>>>
>>> Mark, you've forgotten you work for DOE. There isn't a non-ECP technical support contact.
>>>
>>> But if this is an AMD machine then maybe contact Matt's student Justin Chang?
>>>
>>> On Mon, Jan 24, 2022 at 2:18 PM Barry Smith <bsm...@petsc.dev> wrote:
>>>
>>>> I think you should contact the Crusher ECP technical support team, tell them you are getting dismal performance, and ask if you should expect better. Don't waste time flogging a dead horse.
>>>>
>>>> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>>
>>>> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>>>
>>>>> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>
>>>>>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>>>>>
>>>>>>> Mark, I think you can benchmark individual vector operations, and once we get reasonable profiling results, we can move to solvers etc.
>>>>>>
>>>>>> Can you suggest a code to run or are you suggesting making a vector benchmark code?
>>>>>
>>>>> Make a vector benchmark code, testing the vector operations that would be used in your solver.
>>>>> Also, we can run MatMult() to see if the profiling result is reasonable.
>>>>> Only once we get some solid results on basic operations is it useful to run big codes.
>>>>
>>>> So we have to make another throw-away code? Why not just look at the vector ops in Mark's actual code?
>>>>
>>>>    Matt
>>>>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>
>>>>>>>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsm...@petsc.dev> wrote:
>>>>>>>>
>>>>>>>>> Here, except for VecNorm, the GPU is used effectively in that most of the time is spent doing real work on the GPU:
>>>>>>>>>
>>>>>>>>> VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230 225393 0 0.00e+00 0 0.00e+00 100
>>>>>>>>>
>>>>>>>>> Even the dots are very effective; only the VecNorm flop rate over the full time is much, much lower than the VecDot. Which is somehow due to the use of the GPU or CPU MPI in the allreduce?
>>>>>>>>
>>>>>>>> The VecNorm GPU rate is relatively high on Crusher and the CPU rate is about the same as the other vec ops. I don't know what to make of that.
>>>>>>>>
>>>>>>>> But Crusher is clearly not crushing it.
>>>>>>>>
>>>>>>>> Junchao: Perhaps we should ask Kokkos if they have any experience with Crusher that they can share. They could very well find some low-level magic.
>>>>>>>>
>>>>>>>>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>>>
>>>>>>>>>> Mark, can we compare with Spock?
>>>>>>>>>
>>>>>>>>> Looks much better. This puts two processes/GPU because there are only 4.
>>>>>>>>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
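Since the thread keeps circling the question of a dedicated vector benchmark, here is a minimal sketch of the kind of micro-benchmark Junchao describes (an illustration only; it is not performance.c, ex56, or Mark's code). The vector type comes from the options database, so -vec_type kokkos selects the Kokkos backend and -log_view gives the per-operation numbers; the size and repeat count below are arbitrary, and the dot/norm calls force a host sync, so the wall-clock loop is only a rough aggregate.

/* Minimal vector micro-benchmark sketch (not performance.c).
   Run with e.g.:  srun -n8 ./vecbench -n 12000000 -vec_type kokkos -log_view */
static char help[] = "Times VecAXPY/VecDot/VecNorm for a given size.\n";

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            x, y;
  PetscInt       n = 1000000, i, nits = 100;
  PetscScalar    dot;
  PetscReal      nrm;
  PetscLogDouble t0, t1;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, help); if (ierr) return ierr;
  ierr = PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL);CHKERRQ(ierr);

  ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
  ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);  /* honors -vec_type kokkos */
  ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = VecSet(y, 2.0);CHKERRQ(ierr);

  /* warm up the GPU before timing */
  ierr = VecAXPY(y, 1.0, x);CHKERRQ(ierr);
  ierr = VecDot(x, y, &dot);CHKERRQ(ierr);

  ierr = PetscTime(&t0);CHKERRQ(ierr);
  for (i = 0; i < nits; i++) {
    ierr = VecAXPY(y, 1.0, x);CHKERRQ(ierr);        /* streaming kernel */
    ierr = VecDot(x, y, &dot);CHKERRQ(ierr);        /* reduction + allreduce */
    ierr = VecNorm(y, NORM_2, &nrm);CHKERRQ(ierr);  /* reduction + allreduce */
  }
  ierr = PetscTime(&t1);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "n=%" PetscInt_FMT ": %g s per AXPY+dot+norm\n",
                     n, (double)((t1 - t0)/nits));CHKERRQ(ierr);

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}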
0 KSP Residual norm 1.410853326294e+00 1 KSP Residual norm 3.308114929726e+00 2 KSP Residual norm 5.268571496560e+00 3 KSP Residual norm 6.149538104592e+00 4 KSP Residual norm 5.850118312153e+00 5 KSP Residual norm 6.691885163871e+00 6 KSP Residual norm 6.804517562756e+00 7 KSP Residual norm 7.197858569937e+00 8 KSP Residual norm 7.822478314857e+00 9 KSP Residual norm 8.202105022638e+00 10 KSP Residual norm 8.939894492312e+00 11 KSP Residual norm 9.429993430012e+00 12 KSP Residual norm 9.605492804767e+00 13 KSP Residual norm 9.640964280678e+00 14 KSP Residual norm 9.298652856327e+00 15 KSP Residual norm 8.688325517281e+00 16 KSP Residual norm 8.103300011658e+00 17 KSP Residual norm 7.657056535579e+00 18 KSP Residual norm 7.274740905565e+00 19 KSP Residual norm 6.989367099698e+00 20 KSP Residual norm 6.693292777717e+00 21 KSP Residual norm 6.264239515746e+00 22 KSP Residual norm 5.946230942315e+00 23 KSP Residual norm 5.461074143939e+00 24 KSP Residual norm 5.000139937199e+00 25 KSP Residual norm 4.690147106850e+00 26 KSP Residual norm 4.340975483114e+00 27 KSP Residual norm 4.216407821646e+00 28 KSP Residual norm 4.075379410030e+00 29 KSP Residual norm 4.093724077948e+00 30 KSP Residual norm 3.972717435085e+00 31 KSP Residual norm 3.757728119779e+00 32 KSP Residual norm 3.540607563741e+00 33 KSP Residual norm 3.431062851880e+00 34 KSP Residual norm 3.450360009855e+00 35 KSP Residual norm 3.593502735404e+00 36 KSP Residual norm 3.780832581840e+00 37 KSP Residual norm 3.905447434318e+00 38 KSP Residual norm 3.984131419229e+00 39 KSP Residual norm 3.945938933976e+00 40 KSP Residual norm 3.553422818113e+00 41 KSP Residual norm 2.938844893302e+00 42 KSP Residual norm 2.809545432521e+00 43 KSP Residual norm 2.953724603153e+00 44 KSP Residual norm 2.944856948692e+00 45 KSP Residual norm 2.714548772425e+00 46 KSP Residual norm 2.757853041702e+00 47 KSP Residual norm 2.802728332990e+00 48 KSP Residual norm 2.733707284580e+00 49 KSP Residual norm 2.795310289754e+00 50 KSP Residual norm 2.885286206575e+00 51 KSP Residual norm 2.840587445960e+00 52 KSP Residual norm 2.986739512809e+00 53 KSP Residual norm 3.038967844916e+00 54 KSP Residual norm 3.120224614592e+00 55 KSP Residual norm 3.252584908500e+00 56 KSP Residual norm 3.329078354051e+00 57 KSP Residual norm 3.493538794345e+00 58 KSP Residual norm 3.693624595560e+00 59 KSP Residual norm 3.946156830176e+00 60 KSP Residual norm 4.372813538537e+00 61 KSP Residual norm 4.793425118505e+00 62 KSP Residual norm 5.506707673470e+00 63 KSP Residual norm 6.150469745023e+00 64 KSP Residual norm 7.009152654362e+00 65 KSP Residual norm 8.253999190110e+00 66 KSP Residual norm 9.773686873303e+00 67 KSP Residual norm 1.174201878873e+01 68 KSP Residual norm 1.396810766198e+01 69 KSP Residual norm 1.531938038251e+01 70 KSP Residual norm 1.513815060009e+01 71 KSP Residual norm 1.351504569209e+01 72 KSP Residual norm 1.189818271063e+01 73 KSP Residual norm 1.055982729886e+01 74 KSP Residual norm 9.291111182468e+00 75 KSP Residual norm 8.994372539499e+00 76 KSP Residual norm 9.974014612561e+00 77 KSP Residual norm 1.127854042048e+01 78 KSP Residual norm 1.252496528261e+01 79 KSP Residual norm 1.418696243993e+01 80 KSP Residual norm 1.532377955119e+01 81 KSP Residual norm 1.370656960788e+01 82 KSP Residual norm 1.180429013782e+01 83 KSP Residual norm 1.003617095145e+01 84 KSP Residual norm 8.394450117817e+00 85 KSP Residual norm 6.899686914524e+00 86 KSP Residual norm 6.179350449619e+00 87 KSP Residual norm 5.565154073979e+00 88 KSP Residual norm 5.150487367510e+00 89 
KSP Residual norm 4.999864016175e+00 90 KSP Residual norm 4.869910941255e+00 91 KSP Residual norm 4.744777237912e+00 92 KSP Residual norm 4.753059736768e+00 93 KSP Residual norm 4.746021509746e+00 94 KSP Residual norm 4.676154678970e+00 95 KSP Residual norm 4.667939895068e+00 96 KSP Residual norm 4.982168193998e+00 97 KSP Residual norm 5.376230525346e+00 98 KSP Residual norm 6.027223402693e+00 99 KSP Residual norm 6.688770388651e+00 100 KSP Residual norm 7.685272624683e+00 101 KSP Residual norm 8.540315337448e+00 102 KSP Residual norm 9.039414712941e+00 103 KSP Residual norm 9.412267211525e+00 104 KSP Residual norm 9.404393063521e+00 105 KSP Residual norm 9.809809633962e+00 106 KSP Residual norm 1.019997954431e+01 107 KSP Residual norm 1.032798037382e+01 108 KSP Residual norm 1.018368040001e+01 109 KSP Residual norm 9.032578302284e+00 110 KSP Residual norm 7.511728677100e+00 111 KSP Residual norm 6.320399999215e+00 112 KSP Residual norm 5.638446159168e+00 113 KSP Residual norm 5.503768021011e+00 114 KSP Residual norm 5.781512507352e+00 115 KSP Residual norm 6.668193746580e+00 116 KSP Residual norm 8.289840511454e+00 117 KSP Residual norm 9.602543908825e+00 118 KSP Residual norm 9.885225641874e+00 119 KSP Residual norm 9.475771653754e+00 120 KSP Residual norm 9.253307705621e+00 121 KSP Residual norm 9.188703825743e+00 122 KSP Residual norm 8.982425406803e+00 123 KSP Residual norm 9.029965071148e+00 124 KSP Residual norm 8.936472797372e+00 125 KSP Residual norm 8.847701213231e+00 126 KSP Residual norm 8.850219067523e+00 127 KSP Residual norm 8.883966846716e+00 128 KSP Residual norm 8.822082961919e+00 129 KSP Residual norm 9.144573911170e+00 130 KSP Residual norm 9.210998384025e+00 131 KSP Residual norm 8.767074129481e+00 132 KSP Residual norm 8.653932024226e+00 133 KSP Residual norm 8.738817183375e+00 134 KSP Residual norm 8.847719520860e+00 135 KSP Residual norm 8.823379882635e+00 136 KSP Residual norm 8.688648621431e+00 137 KSP Residual norm 8.766604393781e+00 138 KSP Residual norm 8.961220512489e+00 139 KSP Residual norm 9.038789268757e+00 140 KSP Residual norm 9.255097048034e+00 141 KSP Residual norm 9.457532840426e+00 142 KSP Residual norm 9.353035188344e+00 143 KSP Residual norm 8.972079650141e+00 144 KSP Residual norm 8.990246637705e+00 145 KSP Residual norm 9.133606744913e+00 146 KSP Residual norm 9.284449139694e+00 147 KSP Residual norm 9.446523116163e+00 148 KSP Residual norm 9.392983045581e+00 149 KSP Residual norm 9.190311275931e+00 150 KSP Residual norm 8.637696807809e+00 151 KSP Residual norm 8.246041171334e+00 152 KSP Residual norm 7.974442084343e+00 153 KSP Residual norm 7.819232318105e+00 154 KSP Residual norm 7.908790010611e+00 155 KSP Residual norm 8.281392146382e+00 156 KSP Residual norm 8.711804633156e+00 157 KSP Residual norm 8.972428309154e+00 158 KSP Residual norm 8.821322938720e+00 159 KSP Residual norm 8.694550793978e+00 160 KSP Residual norm 8.497087628681e+00 161 KSP Residual norm 8.342289866176e+00 162 KSP Residual norm 8.323833824628e+00 163 KSP Residual norm 8.340846763041e+00 164 KSP Residual norm 8.938969817866e+00 165 KSP Residual norm 9.072018746931e+00 166 KSP Residual norm 9.382200283204e+00 167 KSP Residual norm 9.618709771467e+00 168 KSP Residual norm 9.816042710750e+00 169 KSP Residual norm 1.006175118406e+01 170 KSP Residual norm 1.013405891235e+01 171 KSP Residual norm 9.945457958847e+00 172 KSP Residual norm 1.006028462918e+01 173 KSP Residual norm 1.001712718542e+01 174 KSP Residual norm 9.950326839565e+00 175 KSP Residual norm 9.870606457184e+00 
176 KSP Residual norm 9.505672324164e+00 177 KSP Residual norm 9.422406293510e+00 178 KSP Residual norm 9.180050627762e+00 179 KSP Residual norm 8.686064400557e+00 180 KSP Residual norm 8.568532139747e+00 181 KSP Residual norm 8.734731645402e+00 182 KSP Residual norm 9.018967477404e+00 183 KSP Residual norm 9.460079286079e+00 184 KSP Residual norm 9.448953574953e+00 185 KSP Residual norm 9.685497063794e+00 186 KSP Residual norm 9.869855710508e+00 187 KSP Residual norm 1.003302047960e+01 188 KSP Residual norm 9.564028860536e+00 189 KSP Residual norm 9.013288033632e+00 190 KSP Residual norm 8.750427764456e+00 191 KSP Residual norm 8.903646907458e+00 192 KSP Residual norm 9.285007079918e+00 193 KSP Residual norm 9.424801141906e+00 194 KSP Residual norm 9.291833173642e+00 195 KSP Residual norm 8.991571624860e+00 196 KSP Residual norm 8.694508731874e+00 197 KSP Residual norm 9.031462542355e+00 198 KSP Residual norm 9.496643154125e+00 199 KSP Residual norm 9.284160146520e+00 200 KSP Residual norm 8.742226063537e+00
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=12288000, cols=12288000, bs=3
    total: nonzeros=982938168, allocated nonzeros=995328000
    total number of mallocs used during MatSetValues calls=0
      using I-node (on process 0) routines: found 512000 nodes, limit used is 5
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/tutorials/data/../ex56 on a arch-olcf-crusher named crusher002 with 8 processors, by adams Tue Jan 25 08:15:31 2022
Using Petsc Development GIT revision: v3.16.3-684-g003dbea9e0  GIT Date: 2022-01-24 12:23:30 -0600

                         Max       Max/Min     Avg       Total
Time (sec):           7.811e+00     1.000   7.811e+00
Objects:              1.900e+01     1.000   1.900e+01
Flop:                 5.331e+10     1.000   5.331e+10  4.265e+11
Flop/sec:             6.825e+09     1.000   6.825e+09  5.460e+10
MPI Messages:         1.432e+03     1.005   1.426e+03  1.141e+04
MPI Message Lengths:  1.187e+08     1.002   8.310e+04  9.480e+08
MPI Reductions:       6.450e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.8604e+00  87.8%  1.3230e+09   0.3%  9.500e+01   0.8%  2.101e+06       21.1%  1.800e+01   2.8%
 1:           Setup: 6.2347e-03   0.1%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.000e+00   0.3%
 2:           Solve: 9.4447e-01  12.1%  4.2516e+11  99.7%  1.131e+04  99.2%  6.616e+04       78.9%  6.060e+02  94.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase
      %F - percent flop in this phase
      %M - percent messages in this phase
      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event              Count      Time (sec)      Flop                             --- Global ---  --- Stage ----   Total     GPU    - CpuToGpu -   - GpuToCpu -  GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct %T %F %M %L %R  %T %F %M %L %R  Mflop/s  Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided         4 1.0 8.7896e-01 534.8 0.00e+00 0.0 3.8e+01 8.0e+00 4.0e+00  8  0  0  0  1   9  0  40   0  22      0       0   0 0.00e+00   0 0.00e+00   0
BuildTwoSidedF        4 1.0 8.7908e-01 509.8 0.00e+00 0.0 9.5e+01 2.1e+06 4.0e+00  8  0  1 21  1   9  0 100 100  22      0       0   0 0.00e+00   0 0.00e+00   0
MatAssemblyBegin      2 1.0 8.7524e-01 4.1 0.00e+00 0.0 3.8e+01 5.2e+06 2.0e+00 10  0  0 21  0  11  0  40  99  11      0       0   0 0.00e+00   0 0.00e+00   0
MatAssemblyEnd        2 1.0 3.5833e-01 1.0 1.55e+06 0.0 0.0e+00 0.0e+00 4.0e+00  5  0  0  0  1   5  0   0   0  22     17       0   0 0.00e+00   0 0.00e+00   0
VecSet                1 1.0 3.8404e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0
VecAssemblyBegin      2 1.0 4.1764e-03 1.1 0.00e+00 0.0 5.7e+01 3.8e+04 2.0e+00  0  0  0  0  0   0  0  60   1  11      0       0   0 0.00e+00   0 0.00e+00   0
VecAssemblyEnd        2 1.0 5.2523e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0
SFSetGraph            1 1.0 4.5518e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0

--- Event Stage 1: Setup

KSPSetUp              1 1.0 7.0851e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0  99  0   0   0 100      0       0   0 0.00e+00   0 0.00e+00   0
PCSetUp               1 1.0 6.2920e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0

--- Event Stage 2: Solve

BuildTwoSided         1 1.0 9.1706e-05 1.6 0.00e+00 0.0 5.6e+01 4.0e+00 1.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0
MatMult             200 1.0 6.7831e-01 1.0 4.91e+10 1.0 1.1e+04 6.6e+04 1.0e+00  9 92 99 79  0  71 92 100 100   0 579635 1014212   1 2.04e-04   0 0.00e+00 100
MatView               1 1.0 7.8531e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0
KSPSolve              1 1.0 9.4550e-01 1.0 5.31e+10 1.0 1.1e+04 6.6e+04 6.0e+02 12 100 99 79 94 100 100 100 100 100 449667  893741   1 2.04e-04   0 0.00e+00 100
PCApply             201 1.0 1.6966e-01 1.0 3.09e+08 1.0 0.0e+00 0.0e+00 2.0e+00  2  1  0  0  0  18  1   0   0   0  14558  163941   0 0.00e+00   0 0.00e+00 100
VecTDot             401 1.0 5.3642e-02 1.3 1.23e+09 1.0 0.0e+00 0.0e+00 4.0e+02  1  2  0  0 62   5  2   0   0  66 183716  353914   0 0.00e+00   0 0.00e+00 100
VecNorm             201 1.0 2.2219e-02 1.1 6.17e+08 1.0 0.0e+00 0.0e+00 2.0e+02  0  1  0  0 31   2  1   0   0  33 222325  303155   0 0.00e+00   0 0.00e+00 100
VecCopy               2 1.0 2.3551e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0
VecSet                1 1.0 9.8740e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0
VecAXPY             400 1.0 2.3017e-02 1.1 1.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   2  2   0   0   0 427091  514744   0 0.00e+00   0 0.00e+00 100
VecAYPX             199 1.0 1.1312e-02 1.1 6.11e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  1   0   0   0 432323  532889   0 0.00e+00   0 0.00e+00 100
VecPointwiseMult    201 1.0 1.0471e-02 1.1 3.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  1   0   0   0 235882  290088   0 0.00e+00   0 0.00e+00 100
VecScatterBegin     200 1.0 1.8458e-01 1.1 0.00e+00 0.0 1.1e+04 6.6e+04 1.0e+00  2  0 99 79  0  19  0 100 100   0      0       0   1 2.04e-04   0 0.00e+00   0
VecScatterEnd       200 1.0 1.9007e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0
SFSetUp               1 1.0 1.3015e-03 1.3 0.00e+00 0.0 1.1e+02 1.7e+04 1.0e+00  0  0  1  0  0   0  0   1   0   0      0       0   0 0.00e+00   0 0.00e+00   0
SFPack              200 1.0 1.7309e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  18  0   0   0   0      0       0   1 2.04e-04   0 0.00e+00   0
SFUnpack            200 1.0 2.3165e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0   0      0       0   0 0.00e+00   0 0.00e+00   0

--- Event Stage 3: Unknown

--- Event Stage 4: Unknown

--- Event Stage 5: Unknown

--- Event Stage 6: Unknown

---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              1         8096     0.
              Matrix     3              3   1554675244     0.
      Preconditioner     1              1          872     0.
              Viewer     2              1          840     0.
              Vector     4              8     74208728     0.
           Index Set     2              2       235076     0.
   Star Forest Graph     1              1         1200     0.

--- Event Stage 1: Setup

              Vector     4              1     12289784     0.

--- Event Stage 2: Solve

              Vector     1              0            0     0.

--- Event Stage 3: Unknown

--- Event Stage 4: Unknown

--- Event Stage 5: Unknown

--- Event Stage 6: Unknown

========================================================================================================================
Average time to get PetscTime(): 3.51e-08
Average time for MPI_Barrier(): 2.7172e-06
Average time for zero size MPI_Send(): 8.326e-06
#PETSc Option Table entries:
-alpha 1.e-3
-ksp_converged_reason
-ksp_max_it 200
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mat_type aijkokkos
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_max_it 2
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-ne 159
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold -0.01
-pc_type jacobi
-use_gpu_aware_mpi true
-use_mat_nearnullspace false
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --download-p4est=1
--with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4 PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-25 12:50:33 on login2
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------
Using C compiler: cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O
Using Fortran compiler: ftn -fPIC -g
-----------------------------------------
Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/include -I/opt/rocm-4.5.0/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -L/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lp4est -lsc -lz -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------