Re: [petsc-dev] HDF5 + p4est regression

2023-12-20 Thread Mark Adams
Cool, it's not in main is it? On Wed, Dec 20, 2023 at 8:51 AM Matthew Knepley wrote: > That is the problem Berend is asking about. Toby and I have to fix it. > > Matt > > On Wed, Dec 20, 2023 at 7:49 AM Mark Adams wrote: > >> I seem to have a regression with writing H

[petsc-dev] HDF5 + p4est regression

2023-12-20 Thread Mark Adams
I seem to have a regression with writing HDF5 meshes with DMForest. I am going to try to bisect, but wanted to see if anyone has any ideas. Thanks, Mark [0]PETSC ERROR: - Error Message -- [0]PETSC ERROR: Invalid argument
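For context, writing a mesh to HDF5 goes through roughly the following viewer calls (a minimal sketch against a recent PETSc built with HDF5; the function and file names are illustrative, not the code from the thread):

    #include <petscdm.h>
    #include <petscviewerhdf5.h>

    /* Sketch: write a DM (here assumed to be a DMForest, or the DMPlex it converts to)
       to an HDF5 file. This is the code path the reported regression affects. */
    static PetscErrorCode WriteMeshHDF5(DM dm, const char *filename)
    {
      PetscViewer viewer;

      PetscFunctionBeginUser;
      PetscCall(PetscViewerHDF5Open(PetscObjectComm((PetscObject)dm), filename, FILE_MODE_WRITE, &viewer));
      PetscCall(DMView(dm, viewer));
      PetscCall(PetscViewerDestroy(&viewer));
      PetscFunctionReturn(PETSC_SUCCESS);
    }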

Re: [petsc-dev] PetscHMapI passing

2023-11-27 Thread Mark Adams
To add a method to PETSc there are several places where you need to add headers and registration. I find an existing method that looks like mine (a DMPlexGet-something method, say), search for it with git grep, and clone its code everywhere that method appears. Mark On Mon,
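As an illustration of what the cloned code ends up looking like, here is a hypothetical DMPlexGet-style getter skeleton (the name DMPlexGetFoo and its body are made up; the header declaration, manual page, and Fortran stubs still have to be added in the other places mentioned above):

    #include <petscdmplex.h>

    /* Hypothetical getter following the usual PETSc conventions. */
    PetscErrorCode DMPlexGetFoo(DM dm, PetscInt *foo)
    {
      PetscFunctionBegin;
      PetscValidHeaderSpecific(dm, DM_CLASSID, 1);
      PetscAssertPointer(foo, 2);
      *foo = 0; /* a real method would read state from the DMPlex implementation */
      PetscFunctionReturn(PETSC_SUCCESS);
    }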

Re: [petsc-dev] [petsc-users] performance regression with GAMG

2023-10-11 Thread Mark Adams
> data" error or hangs, and going back to the square graph aggressive > coarsening brings us back the old performance. So we'd be keen to have that > branch merged indeed > Many thanks for your assistance with this > Stephan > > On 05/10/2023 01:11, Mark Adams wrote:

Re: [petsc-dev] [petsc-users] performance regression with GAMG

2023-10-06 Thread Mark Adams
did not send an empty message. Just a guess. Thanks, Mark On Fri, Oct 6, 2023 at 12:30 AM Pierre Jolivet wrote: > > > On Oct 6, 2023, at 3:48 AM, Mark Adams wrote: > >  > Pierre, (moved to dev) > > It looks like there is a subtle bug in the new MatFilter. > My gue

Re: [petsc-dev] [petsc-users] performance regression with GAMG

2023-10-05 Thread Mark Adams
, as seen in the graph, did not match the cached communication lists. The old way just created a whole new matrix, which took care of that. Mark On Thu, Oct 5, 2023 at 8:51 PM Mark Adams wrote: > Fantastic, it will get merged soon. > > Thank you for your diligence and patience. > Thi

Re: [petsc-dev] bad cpu/MPI performance problem

2023-01-09 Thread Mark Adams
is possible that in your run the nroots is always >= 0 and some >> MPI bug is causing the problem but this doesn't change the fact that the >> current code is buggy and needs to be fixed before blaming some other bug >> for the problem. >> >> >> >>

Re: [petsc-dev] bad cpu/MPI performance problem

2023-01-08 Thread Mark Adams
ave a bug fix suggestion for me to try? Thanks > Barry > > Yes it is possible that in your run the nroots is always >= 0 and some MPI > bug is causing the problem but this doesn't change the fact that the > current code is buggy and needs to be fixed before blaming some other

Re: [petsc-dev] bad cpu/MPI performance problem

2023-01-08 Thread Mark Adams
2.3259668003585e+00 (plot ID 0) 0 SNES Function norm 5.415286407365e-03 srun: Job step aborted: Waiting up to 32 seconds for job step to finish. slurmstepd: error: *** STEP 245100.0 ON crusher002 CANCELLED AT 2023-01-08T15:32:43 DUE TO TIME LIMIT *** > > Thanks, > > Matt >

Re: [petsc-dev] bad cpu/MPI performance problem

2023-01-08 Thread Mark Adams
y study with 2x the nodes and 1/2 the cores per node, so I am doing that. That seems to be running well up to 32 nodes but my one 128 node job that ran today timed out and it looks like it is fouling internally somehow. Thanks, > > On Jan 8, 2023, at 12:21 PM, Mark Adams wrote: > >

[petsc-dev] bad cpu/MPI performance problem

2023-01-08 Thread Mark Adams
I am running on Crusher, CPU only, 64 cores per node with Plex/PetscFE. In going up to 64 nodes, something really catastrophic is happening. I understand I am not using the machine the way it was intended, but I just want to see if there are any options that I could try for a quick fix/help. In a

Re: [petsc-dev] do all MIS algorithms require a symmetric structure?

2022-09-19 Thread Mark Adams
differently and goes through some hoops that the old MIS does not do that are not as easy as the MatDuplicate to avoid. Mark > > On Sep 18, 2022, at 6:58 PM, Mark Adams wrote: > > > > On Sun, Sep 18, 2022 at 6:19 PM Barry Smith wrote: > >> >> Mark, >> >&

Re: [petsc-dev] do all MIS algorithms require a symmetric structure?

2022-09-18 Thread Mark Adams
will see a PtAP if you use aggressive coarsening in the new code) And -pc_gamg_threshold 0 will filter (zeros only). Use < 0 for no filtering. The old code also had this optimization to not create a graph for bs==1 and no filter, Mark > > > > On Sep 18, 2022, at 4:21 PM, Mark Adams w
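For example, the filtering behaviour described above can be selected with options, either on the command line or programmatically (a sketch; option names should be checked against the GAMG version in use):

    #include <petscsys.h>

    /* Sketch: control GAMG graph filtering. */
    static PetscErrorCode SetGAMGGraphOptions(void)
    {
      PetscFunctionBeginUser;
      PetscCall(PetscOptionsSetValue(NULL, "-pc_gamg_threshold", "0"));      /* filter exact zeros only */
      /* PetscCall(PetscOptionsSetValue(NULL, "-pc_gamg_threshold", "-1")); */ /* negative value: no filtering */
      PetscFunctionReturn(PETSC_SUCCESS);
    }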

Re: [petsc-dev] do all MIS algorithms require a symmetric structure?

2022-09-18 Thread Mark Adams
doc/summary?doi=10.1.1.37.3040=8 This paper discusses the parallel algotithm. The algorithm is fine, but I could see the implementations could be bad. Its not that much code. Mark > > Barry > > > On Sep 18, 2022, at 4:21 PM, Mark Adams wrote: > > > > On Sun, Sep 18,

Re: [petsc-dev] do all MIS algorithms require a symmetric structure?

2022-09-18 Thread Mark Adams
On Sun, Sep 18, 2022 at 4:02 PM Barry Smith wrote: > > Mark, > >Do all MIS algorithms in PETSc require a symmetric graph structure? > And parallel ones can hang if not structurally symmetric? > Yes, > >When used sequentially I guess it never hangs but it may not produce a >

Re: [petsc-dev] large cost of symmetrizing for GAMG?

2022-09-18 Thread Mark Adams
oo damn long due to lack of > good tuning they will do something else. In other words each user should > not need an algebraic multigrid expert consulting with them on every step > of their project. > > > > On Sep 18, 2022, at 12:25 PM, Mark Adams wrote: > > You re

Re: [petsc-dev] large cost of symmetrizing for GAMG?

2022-09-18 Thread Mark Adams
clear separation of concerns between > the PC and KSP makes doing a good thing more difficult because in the > original KSP/PC design I didn't think about this type of concern. > > > > On Sep 17, 2022, at 10:12 PM, Mark Adams wrote: > > > > On Sat, Sep 17, 2022 at 5:52

Re: [petsc-dev] large cost of symmetrizing for GAMG?

2022-09-17 Thread Mark Adams
me for a while and this user just reminded me and I have a little cleanup MR going What do you think about just removing this and always reuse? Mark > > > > > > On Sep 17, 2022, at 1:43 PM, Mark Adams wrote: > > I don't see a problem here other than the network

Re: [petsc-dev] large cost of symmetrizing for GAMG?

2022-09-17 Thread Mark Adams
at 10:12 AM Barry Smith wrote: > > Sure, but have you ever seen such a large jump in time in going from one > to two MPI ranks, and are there any algorithms to do the aggregation that > would not require this very expensive parallel symmetrization? > > On Sep 17, 2022, at 9

Re: [petsc-dev] large cost of symmetrizing for GAMG?

2022-09-17 Thread Mark Adams
Symmetrizing the graph makes a transpose and then adds the two. I imagine adding two different matrices is expensive. On Fri, Sep 16, 2022 at 8:30 PM Barry Smith wrote: > > Mark, > >I have runs of GAMG on one and two ranks with -pc_gamg_symmetrize_graph > because the matrix is far from symmetric and
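A minimal sketch of the symmetrization as described, assuming the graph is a Mat called G (this illustrates the idea; it is not the GAMG source):

    #include <petscmat.h>

    /* Form the symmetrized graph S = G + G^T. */
    static PetscErrorCode SymmetrizeGraph(Mat G, Mat *S)
    {
      PetscFunctionBeginUser;
      PetscCall(MatTranspose(G, MAT_INITIAL_MATRIX, S));
      /* adding two matrices with different nonzero patterns is the expensive part */
      PetscCall(MatAXPY(*S, 1.0, G, DIFFERENT_NONZERO_PATTERN));
      PetscFunctionReturn(PETSC_SUCCESS);
    }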

Re: [petsc-dev] VecNest

2022-07-12 Thread Mark Adams
It looks like the RHS is zero in the 2nd case (0 KSP unpreconditioned resid norm 0.e+00), but the true residual is the same. It looks like you added "nest_subvec" to our example. You can start by looking at the vectors with -vec_view (there is code that you can view vectors
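A quick way to confirm whether the right-hand side of a VecNest really is zero is to print the norms of the nest vector and its sub-vectors (a sketch; the vector name b is an assumption):

    #include <petscvec.h>

    static PetscErrorCode CheckNestRHS(Vec b)
    {
      PetscInt  n, i;
      Vec      *sub;
      PetscReal nrm;

      PetscFunctionBeginUser;
      PetscCall(VecNorm(b, NORM_2, &nrm));
      PetscCall(PetscPrintf(PetscObjectComm((PetscObject)b), "||b|| = %g\n", (double)nrm));
      PetscCall(VecNestGetSubVecs(b, &n, &sub));
      for (i = 0; i < n; i++) {
        PetscCall(VecNorm(sub[i], NORM_2, &nrm));
        PetscCall(PetscPrintf(PetscObjectComm((PetscObject)b), "||b_%" PetscInt_FMT "|| = %g\n", i, (double)nrm));
      }
      PetscFunctionReturn(PETSC_SUCCESS);
    }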

Re: [petsc-dev] Hiptmair--Xu as PCCOMPOSITE

2022-07-07 Thread Mark Adams
ML has/had a method derived from Hiptmair in AMG but we don't have an interface to it and ML is in a funny state wrt PETSc. Manteuffel, et al, preferred to avoid all the projecting back and forth, which always looked fragile to me in an AMG context, and step on the null space of each element with

Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-27 Thread Mark Adams
On Mon, Jun 27, 2022 at 2:16 PM Matthew Martineau wrote: > Theoretically the access patterns can be worse, but our sparse operations > output matrices with unordered columns, so the fine matrix being sorted > shouldn’t impact the overall performance. > And this "unsorted" is really two _sorted_

Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-27 Thread Mark Adams
(4 weeks ago) | | |\ \ \ \ | | | * | | | b68380d8d6b - DMPlexCreatePartitionerGraph_{Overlap,Native}: fix indexing for cell start > 0 (4 weeks ago) | | |/ / / / ** | | | | | e24d7920346 - Fixing issue with PCReset_AMGX (5 days ago) * On Sun, Jun 26, 2022 at 10:16 AM Mark Adams wrote: > > >

Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-26 Thread Mark Adams
On Sat, Jun 25, 2022 at 9:39 AM Barry Smith wrote: > > Does AMGX require sorted column indices? (Python indentation notation > below) > > If not > just use MatMPIAIJGetLocalMatMerge instead of MatMPIAIJGetLocalMat. > > Ugh, I worked on this this AM without rebasing over main and lost my
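Barry's suggestion amounts to something like the following (a sketch; which call is appropriate depends on whether AMGX actually needs sorted column indices):

    #include <petscmat.h>

    /* Get the on-rank local matrix of a parallel AIJ matrix.
       MatMPIAIJGetLocalMatMerge skips the column sorting that MatMPIAIJGetLocalMat does. */
    static PetscErrorCode GetLocalMatrix(Mat Pmat, PetscBool need_sorted_columns, Mat *localA)
    {
      PetscFunctionBeginUser;
      if (need_sorted_columns) PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, localA));
      else PetscCall(MatMPIAIJGetLocalMatMerge(Pmat, MAT_INITIAL_MATRIX, NULL, localA));
      PetscFunctionReturn(PETSC_SUCCESS);
    }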

Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-25 Thread Mark Adams
On Fri, Jun 24, 2022 at 1:54 PM Barry Smith wrote: > > > On Jun 24, 2022, at 1:38 PM, Mark Adams wrote: > > I am rearranging the code for clarity from the repo but I have: > > PetscBool is_dev_ptrs; > PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX,

Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-24 Thread Mark Adams
un 24, 2022 at 10:00 AM Barry Smith wrote: > > > On Jun 24, 2022, at 8:58 AM, Mark Adams wrote: > > And before we move to the MR, I think Matt found a clear problem: > > * PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, >localA)); > returns "localA seqaij"

Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-24 Thread Mark Adams
inter is a device mapped pointer but that it is invalid" Matt, let's just comment out the REUSE line and add another INITIAL line (destroying the old Mat of course), and let's press on. We can keep the debugging code for now. We (PETSc) can work on this independently, Thanks, Mark On Fri, Jun 2

Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-24 Thread Mark Adams
achieve is that when we are parallel, we fetch > the local part of A and the device pointer to the matrix values from that > structure so that we can pass to AmgX. Preferring whichever API calls are > the most efficient. > > > *From:* Stefano Zampini > *Sent:* 23 June 2022 20:5

Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-23 Thread Mark Adams
> Currently if one changes the nonzero matrix of the parallel matrix one is > likely to get random confusing crashes due to memory corruption. But likely > not the problem here. > > On Jun 23, 2022, at 2:23 PM, Mark Adams wrote: > > We have a bug in the AMGx test snes_tests

[petsc-dev] MatMPIAIJGetLocalMat problem with GPUs

2022-06-23 Thread Mark Adams
We have a bug in the AMGx test snes_tests-ex13_amgx in parallel. Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass in the code below, where the local matrix is created (INITIAL), but in the next pass, when "REUSE" is used, he sees an invalid pointer. Matt found that it does
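The calling sequence being described is roughly the following (a sketch of the pattern, not the actual AMGx wrapper code):

    #include <petscmat.h>

    /* First call creates the local matrix; later calls reuse it. The thread reports
       that the reused matrix hands back an invalid (device) pointer. */
    static PetscErrorCode UpdateLocalMatrix(Mat Pmat, Mat *localA)
    {
      PetscFunctionBeginUser;
      if (!*localA) PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, localA)); /* works */
      else PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, localA));            /* fails here */
      PetscFunctionReturn(PETSC_SUCCESS);
    }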

[petsc-dev] AIJ type question

2022-06-16 Thread Mark Adams
I have a test failing: ksp_ksp_tutorials-ex7_gamg_cuda_nsize-2 ( https://gitlab.com/petsc/petsc/-/jobs/2601658676) I have this code that I want to be true here (2 processors): PetscCall(PetscObjectTypeCompare((PetscObject)Gmat, MATMPIAIJ, )); But it is returning false. Should I have a
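The third argument of the quoted call (a PetscBool pointer) was eaten by the archiver. One possible explanation for the false result is that on a CUDA build the matrix type is a derived type such as MATMPIAIJCUSPARSE, so an exact type compare fails while a base-type compare succeeds (a sketch; the flag name is an assumption):

    #include <petscmat.h>

    static PetscErrorCode CheckGraphType(Mat Gmat)
    {
      PetscBool ismpiaij;

      PetscFunctionBeginUser;
      /* exact match: false for derived types such as MATMPIAIJCUSPARSE or MATMPIAIJKOKKOS */
      PetscCall(PetscObjectTypeCompare((PetscObject)Gmat, MATMPIAIJ, &ismpiaij));
      /* base-type match: true for anything derived from MPIAIJ */
      PetscCall(PetscObjectBaseTypeCompare((PetscObject)Gmat, MATMPIAIJ, &ismpiaij));
      PetscFunctionReturn(PETSC_SUCCESS);
    }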

[petsc-dev] new format error

2022-06-09 Thread Mark Adams
I rebased over master and now this branch fails source code check: SETERRQ() with trailing newline -- 1346 src/ksp/pc/impls/amgx/amgx.cxx:169: SETERRQ(amgx->comm, PETSC_ERR_LIB, "%s\n", msg); \ Any idea how to best fix
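The checker is objecting to the trailing newline in the format string; the usual fix is simply to drop it, since PETSc adds its own line breaks when printing errors:

    /* flagged by the source checker: trailing newline in the format string */
    SETERRQ(amgx->comm, PETSC_ERR_LIB, "%s\n", msg);

    /* accepted */
    SETERRQ(amgx->comm, PETSC_ERR_LIB, "%s", msg);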

Re: [petsc-dev] kokkos hang after rebase over main, maybe, Crusher

2022-05-25 Thread Mark Adams
Transient failure; it works now. On Tue, May 24, 2022 at 10:13 PM Mark Adams wrote: > I was working on Crusher yesterday and I think I rebased over main and now > I am hanging here. > > Any ideas? > > (gdb) bt > #0 0x7fff81bd5547 in sched_yield () from

[petsc-dev] kokkos hang after rebase over main, maybe, Crusher

2022-05-24 Thread Mark Adams
I was working on Crusher yesterday and I think I rebased over main and now I am hanging here. Any ideas? (gdb) bt #0 0x7fff81bd5547 in sched_yield () from /lib64/libc.so.6 #1 0x7fff79e43665 in ?? () from /opt/rocm-5.1.0/hsa/lib/libhsa-runtime64.so.1 #2 0x7fff79e382f4 in ?? () from

Re: [petsc-dev] odd log behavior

2022-05-17 Thread Mark Adams
/* memcpy(,,sizeof(PetscLogDouble)); */ /* } */ /* } */ /* #endif */ On Tue, May 17, 2022 at 7:17 AM Mark Adams wrote: > Can I get some advice on how to add/hack an event now into the reported > events? > > I am noticing that all of the difference

Re: [petsc-dev] odd log behavior

2022-05-16 Thread Mark Adams
ind. > >Barry > > > On May 16, 2022, at 7:31 PM, Mark Adams wrote: > > I am not sure I understand the logic, we print the ratio of max/min. > I report max and look at the ratio to see if I might be catching some load > imbalance or whatever. Is there a problem wi

Re: [petsc-dev] odd log behavior

2022-05-16 Thread Mark Adams
rrectly the inner operations, and > thus feels it is the best default. > > Barry > > > On Apr 27, 2022, at 10:08 AM, Mark Adams wrote: > > > > On Tue, Apr 26, 2022 at 8:00 PM Barry Smith wrote: > >> >> The current nan output has to be replaced to

Re: [petsc-dev] Is the ASM preconditioner algebraic or geometric in the context of DMPlex (unstructured) ?

2022-05-08 Thread Mark Adams
On Sun, May 8, 2022 at 12:44 AM markwinpe wrote: > Dear PETSc’s developers, > > In the context of unstructured applications developed by PETSc’s > DMPlex, if the ASM is used for preconditioning, is it algebraic (based on > matrix graph) or geometric (based on Metis partition)? >

Re: [petsc-dev] odd log behavior

2022-04-27 Thread Mark Adams
Apr 26, 2022, at 3:49 PM, Matthew Knepley wrote: > > On Tue, Apr 26, 2022 at 12:03 PM Mark Adams wrote: > >> Well, Nans are a clear sign that something is very wrong. >> > > Barry chose them so that it could not be mistaken for an actual number. > >

Re: [petsc-dev] odd log behavior

2022-04-26 Thread Mark Adams
. Roman wrote: > > > > You have to add -log_view_gpu_time > > See https://gitlab.com/petsc/petsc/-/merge_requests/5056 > > > > Jose > > > > > >> On 26 Apr 2022, at 16:39, Mark Adams wrote: > >> > >> I'm seeing this on Perlmutter wit

[petsc-dev] odd log behavior

2022-04-26 Thread Mark Adams
I'm seeing this on Perlmutter with Kokkos-CUDA. Nans in most log timing data except the two 'Solve' lines. Just cg/jacobi on snes/ex56. Any ideas? VecTDot2 1.0 nan nan 1.20e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan-nan 0 0.00e+000

Re: [petsc-dev] How does PETSc define the overlap in an algebraic ASM preconditioner ?

2022-04-12 Thread Mark Adams
You can see https://petsc.org/main/docs/manualpages/PC/PCASM.html It is very simple, as you can see, but you are limited to one block per MPI process. If you want more, you need to create them yourself and use https://petsc.org/main/docs/manualpages/PC/PCASMSetLocalSubdomains.html#PCASMSetLocalSubdomains
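A sketch of setting more than one block per MPI rank with PCASMSetLocalSubdomains (the two index sets are placeholders; building them depends on your mesh or partition):

    #include <petscksp.h>

    /* Give this rank two local ASM blocks described by index sets is0 and is1. */
    static PetscErrorCode SetTwoBlocksPerRank(KSP ksp, IS is0, IS is1)
    {
      PC pc;
      IS subdomains[2];

      PetscFunctionBeginUser;
      subdomains[0] = is0;
      subdomains[1] = is1;
      PetscCall(KSPGetPC(ksp, &pc));
      PetscCall(PCSetType(pc, PCASM));
      /* the second IS array (the non-overlapping "local" parts) may be NULL */
      PetscCall(PCASMSetLocalSubdomains(pc, 2, subdomains, NULL));
      PetscFunctionReturn(PETSC_SUCCESS);
    }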

Re: [petsc-dev] PETSc amg solver with gpu seems run slowly

2022-03-28 Thread Mark Adams
tation in the future, that way, I can compare the difference > between NVIDIA and AMD GPU. It seems there are many benchmark cases I can > do in the future. > > Regards, > Qi > > > > > On Wed, Mar 23, 2022 at 9:39 AM Mark Adams wrote: > >> A few points, but firs

Re: [petsc-dev] PETSc amg solver with gpu seems run slowly

2022-03-22 Thread Mark Adams
A few points, but first this is a nice start. If you are interested in working on benchmarking that would be great. If so, read on. * Barry pointed out the SOR issues that are thrashing the memory system. This solve would run faster on the CPU (maybe, 9M eqs is a lot). * Most applications run for
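One concrete follow-up to the SOR point is to pick a GPU-friendly level smoother, e.g. Chebyshev/Jacobi (a sketch; the exact option prefix depends on how the solver is nested):

    #include <petscsys.h>

    /* Sketch: avoid SOR smoothing on the GPU by selecting Chebyshev/Jacobi on the MG levels. */
    static PetscErrorCode PreferGPUFriendlySmoother(void)
    {
      PetscFunctionBeginUser;
      PetscCall(PetscOptionsSetValue(NULL, "-mg_levels_ksp_type", "chebyshev"));
      PetscCall(PetscOptionsSetValue(NULL, "-mg_levels_pc_type", "jacobi"));
      PetscFunctionReturn(PETSC_SUCCESS);
    }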

Re: [petsc-dev] MatSetPreallocationCOO remove attached objects?

2022-03-01 Thread Mark Adams
can be merged. >>> >>> On Mar 1, 2022, at 2:47 PM, Junchao Zhang >>> wrote: >>> >>> I realized this problem but did not expect someone would run into it :) >>> Let me think again. >>> >>> --Junchao Zhang >>> >>> >>> On Tue, Mar 1, 2022 at 1:33 PM Mark Adams wrote: >>> >>>> I have a container attached to my matrix and it seems to go away after >>>> a call to MatSetPreallocationCOO. >>>> Does that sound plausible? >>>> >>> >>>

Re: [petsc-dev] MatSetPreallocationCOO remove attached objects?

2022-03-01 Thread Mark Adams
ifferences and why both exist but I think there are some subtle reasons > why there are both and don't know if they can be merged. > > On Mar 1, 2022, at 2:47 PM, Junchao Zhang wrote: > > I realized this problem but did not expect someone would run into it :) > Let me think again. > >

[petsc-dev] MatSetPreallocationCOO remove attached objects?

2022-03-01 Thread Mark Adams
I have a container attached to my matrix and it seems to go away after a call to MatSetPreallocationCOO. Does that sound plausible?
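A small check that demonstrates the question (a sketch; the container name, data pointer, and COO arrays are placeholders):

    #include <petscmat.h>

    static PetscErrorCode CheckContainerSurvives(Mat A, PetscCount ncoo, PetscInt *coo_i, PetscInt *coo_j, void *mydata)
    {
      PetscContainer container;
      PetscObject    found;

      PetscFunctionBeginUser;
      PetscCall(PetscContainerCreate(PetscObjectComm((PetscObject)A), &container));
      PetscCall(PetscContainerSetPointer(container, mydata));
      PetscCall(PetscObjectCompose((PetscObject)A, "my_container", (PetscObject)container));
      PetscCall(PetscContainerDestroy(&container)); /* the matrix now holds the reference */

      PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));

      PetscCall(PetscObjectQuery((PetscObject)A, "my_container", &found));
      PetscCheck(found, PetscObjectComm((PetscObject)A), PETSC_ERR_PLIB, "container was dropped by MatSetPreallocationCOO");
      PetscFunctionReturn(PETSC_SUCCESS);
    }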

Re: [petsc-dev] funny diff from MR

2022-01-30 Thread Mark Adams
Moved onto slack On Sun, Jan 30, 2022 at 6:07 PM Mark Adams wrote: > I am just adding a flag to hypre in !4781 > <https://gitlab.com/petsc/petsc/-/merge_requests/4781> and I am getting > some significant diffs. > This just looks buggy to me. > I can ask Ruipeng. Maybe I am

[petsc-dev] funny diff from MR

2022-01-30 Thread Mark Adams
I am just adding a flag to hypre in !4781 and I am getting some significant diffs. This just looks buggy to me. I can ask Ruipeng. Maybe I am doing something wrong, like using it on CPU runs, but I see diffs on CPU and GPU CI tests. Any

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-28 Thread Mark Adams
On Wed, Jan 26, 2022 at 2:51 PM Barry Smith wrote: > > I have added a mini-MR to print out the key so we can see if it is 0 or > some crazy number. https://gitlab.com/petsc/petsc/-/merge_requests/4766 > Well, after all of our MRs (Junchao's in particular) I am not seeing this MPI error. So

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
> On Jan 26, 2022, at 2:25 PM, Mark Adams wrote: > > I have used valgrind here. I did not run it on this MPI error. I will. > > On Wed, Jan 26, 2022 at 10:56 AM Barry Smith wrote: > >> >> Any way to run with valgrind (or a HIP variant of valgrind)? It looks >>

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
seen this before with buggy MPIs. > > On Wed, Jan 26, 2022 at 1:29 PM Mark Adams wrote: > >> >> >> On Wed, Jan 26, 2022 at 1:54 PM Justin Chang wrote: >> >>> Couple suggestions: >>> >>> 1. Set the environment variable "export

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
> > > Are the crashes reproducible in the same place with identical runs? > > I have not seen my reproducer work and it fails in MatAssemblyEnd, not finding a table entry. I can't tell if it is the same error every time.

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
t 9:56 AM Barry Smith wrote: > >> >> Any way to run with valgrind (or a HIP variant of valgrind)? It looks >> like a memory corruption issue and tracking down exactly when the >> corruption begins is 3/4's of the way to finding the exact cause. >> >>

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
On Wed, Jan 26, 2022 at 2:25 PM Mark Adams wrote: > I have used valgrind here. I did not run it on this MPI error. I will. > > On Wed, Jan 26, 2022 at 10:56 AM Barry Smith wrote: > >> >> Any way to run with valgrind (or a HIP variant of valgrind)? It looks >>

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
ins is 3/4's of the way to finding the exact cause. > > Are the crashes reproducible in the same place with identical runs? > > > On Jan 26, 2022, at 10:46 AM, Mark Adams wrote: > > I think it is an MPI bug. It works with GPU aware MPI turned off. > I am sure Summit w

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
run 6 nodes (48 mpi ranks). If it fails, then run the same problem on > Summit with 8 nodes to see if it still fails. If yes, it is likely a bug of > our own. > > --Junchao Zhang > > > On Wed, Jan 26, 2022 at 8:44 AM Mark Adams wrote: > >> I am not able to reproduc

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
3.c:169 [15]PETSC ERROR: PETSc Option Table entries: [15]PETSC ERROR: -benchmark_it 10 On Wed, Jan 26, 2022 at 7:26 AM Mark Adams wrote: > The GPU aware MPI is dying going 1 to 8 nodes, 8 processes per node. > I will make a minimum reproducer. start with 2 nodes, one process on each > node

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-26 Thread Mark Adams
es almost perfectly, but the overall flop rate is only half of what it > should be at 64). > > On Jan 25, 2022, at 9:24 PM, Mark Adams wrote: > > It looks like we have our instrumentation and job configuration in decent > shape so on to scaling with AMG. > In using multiple nodes I

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-25 Thread Mark Adams
> > > Note that Mark's logs have been switching back and forth between > -use_gpu_aware_mpi and changing number of ranks -- we won't have that > information if we do manual timing hacks. This is going to be a routine > thing we'll need on the mailing list and we need the provenance to go with >

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-25 Thread Mark Adams
Here are two runs, without and with -log_view, respectively. My new timer is "Solve time =". There is about a 10% difference. On Tue, Jan 25, 2022 at 12:53 PM Mark Adams wrote: > BTW, a -device_view would be great. > > On Tue, Jan 25, 2022 at 12:30 PM Mark Adams wrote: > >>

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-25 Thread Mark Adams
BTW, a -device_view would be great. On Tue, Jan 25, 2022 at 12:30 PM Mark Adams wrote: > > > On Tue, Jan 25, 2022 at 11:56 AM Jed Brown wrote: > >> Barry Smith writes: >> >> > Thanks Mark, far more interesting. I've improved the formatting to >> make

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-25 Thread Mark Adams
On Tue, Jan 25, 2022 at 11:56 AM Jed Brown wrote: > Barry Smith writes: > > > Thanks Mark, far more interesting. I've improved the formatting to > make it easier to read (and fixed width font for email reading) > > > > * Can you do same run with say 10 iterations of Jacobi PC? > > > > *

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-25 Thread Mark Adams
> > > > > VecPointwiseMult 201 1.0 1.0471e-02 1.1 3.09e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 1 1 0 0 0 235882 290088 0 0.00e+000 > 0.00e+00 100 > > VecScatterBegin 200 1.0 1.8458e-01 1.1 0.00e+00 0.0 1.1e+04 6.6e+04 > 1.0e+00 2 0 99 79 0 19 0100100 0

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-25 Thread Mark Adams
ronica >> Vergara (vergar...@ornl.gov). You can cc me (justin.ch...@amd.com) in >> those emails >> >> On Mon, Jan 24, 2022 at 1:49 PM Barry Smith wrote: >> >>> >>> >>> On Jan 24, 2022, at 2:46 PM, Mark Adams wrote: >>> >>> Ye

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
On Mon, Jan 24, 2022 at 2:57 PM Justin Chang wrote: > My name has been called. > > Mark, if you're having issues with Crusher, please contact Veronica > Vergara (vergar...@ornl.gov). You can cc me (justin.ch...@amd.com) in > those emails > I have worked with Veronica before. I'll ask Todd if we

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
>> >> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams wrote: >> >>> >>> >>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang >>> wrote: >>> >>>> Mark, I think you can benchmark individual vector operations, and once >>>

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
> --Junchao Zhang > > > On Mon, Jan 24, 2022 at 12:09 PM Mark Adams wrote: > >> >> >> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith wrote: >> >>> >>> Here except for VecNorm the GPU is used effectively in that most of >>> the tim

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
very well find some low level magic. > > > On Jan 24, 2022, at 12:14 PM, Mark Adams wrote: > > > >> Mark, can we compare with Spock? >> > > Looks much better. This puts two processes/GPU because there are only 4. > > > >

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
> > Mark, can we compare with Spock? > Looks much better. This puts two processes/GPU because there are only 4. DM Object: box 8 MPI processes type: plex box in 3 dimensions: Number of 0-cells per rank: 274625 274625 274625 274625 274625 274625 274625 274625 Number of 1-cells per rank:

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-24 Thread Mark Adams
> Mark, > > Can you run both with GPU aware MPI? > > Perlmuter fails with GPU aware MPI. I think there are know problems with this that are being worked on. And here is Crusher with GPU aware MPI. DM Object: box 8 MPI processes type: plex box in 3 dimensions: Number of 0-cells per

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-23 Thread Mark Adams
Ugh, try again. Still a big difference, but less. Mat-vec does not change much. On Sun, Jan 23, 2022 at 7:12 PM Barry Smith wrote: > > You have debugging turned on on crusher but not permutter > > On Jan 23, 2022, at 6:37 PM, Mark Adams wrote: > > * Perlmutter is roug

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-23 Thread Mark Adams
t /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:449 [48]PETSC ERROR: #5 PetscInitialize_Common() at /global/u2/m/madams/petsc/src/sys/objects/pinit.c:963 [48]PETSC ERROR: #6 PetscInitialize() at /global/u2/m/madams/petsc/src/sys/objects/pinit.c:1238 On Sun, Jan 23, 2022 at 8:58

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-23 Thread Mark Adams
0 0.00e+000 > 0.00e+00 100 > VecPointwiseMult 402 1.0 3.5694e-01 1.0 8.43e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 3 1 0 0 18,67538,633 0 0.00e+000 > 0.00e+00 100 > > > > On Jan 22, 2022, at 12:40 PM, Mark Adams wrote: > > And I have a new MR with if you want to see what I've done so far. > > >

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
And I have a new MR with if you want to see what I've done so far.

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
On Sat, Jan 22, 2022 at 12:29 PM Jed Brown wrote: > Mark Adams writes: > > >> > >> > >> > >> > VecPointwiseMult 402 1.0 2.9605e-01 3.6 1.05e+08 1.0 0.0e+00 > 0.0e+00 > >> 0.0e+00 0 0 0 0 0 5 1 0 0 0 22515 70608

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
So where are we as far as timers? See the latest examples (with 160 CHARACTER) Jed, "(I don't trust these timings)." what do you think? No sense in doing an MR if it is still nonsense. On Sat, Jan 22, 2022 at 12:16 PM Jed Brown wrote: > Mark Adams writes: > > > as far as

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
> > > > > VecPointwiseMult 402 1.0 2.9605e-01 3.6 1.05e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 5 1 0 0 0 22515 70608 0 0.00e+000 > 0.00e+00 100 > > VecScatterBegin 400 1.0 1.6791e-01 6.0 0.00e+00 0.0 3.7e+05 1.6e+04 > 0.0e+00 0 0 62 54 0 2 0100100 0 0

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
--- On Sat, Jan 22, 2022 at 11:10 AM Junchao Zhang wrote: > > > > On Sat, Jan 22, 2022 at 10:04 AM Mark Adams wrote: > >> Logging GPU flops should be inside of PetscLogGpuTim

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
On Sat, Jan 22, 2022 at 10:25 AM Jed Brown wrote: > Mark Adams writes: > > > On Fri, Jan 21, 2022 at 9:55 PM Barry Smith wrote: > > > >> > >> Interesting, Is this with all native Kokkos kernels or do some kokkos > >> kernels use rocm? > >

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
> On Jan 21, 2022, at 9:37 PM, Mark Adams wrote: > > Here is one with 2M / GPU. Getting better. > > On Fri, Jan 21, 2022 at 9:17 PM Barry Smith wrote: > >> >>Matt is correct, vectors are way too small. >> >>BTW: Now would be a good time to

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
But in particular look at the VecTDot and VecNorm CPU flop >> rates compared to the GPU, much lower, this tells me the MPI_Allreduce is >> likely hurting performance in there also a great deal. It would be good to >> see a single MPI rank job to compare to see performance without

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-22 Thread Mark Adams
On Fri, Jan 21, 2022 at 9:55 PM Barry Smith wrote: > > Interesting, Is this with all native Kokkos kernels or do some kokkos > kernels use rocm? > Ah, good question. I often run with tpl=0 but I did not specify here on Crusher. In looking at the log files I see

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-21 Thread Mark Adams
> > >But in particular look at the VecTDot and VecNorm CPU flop > rates compared to the GPU, much lower, this tells me the MPI_Allreduce is > likely hurting performance in there also a great deal. It would be good to > see a single MPI rank job to compare to see performance without the

Re: [petsc-dev] Kokkos/Crusher perforance

2022-01-21 Thread Mark Adams
and performance on VecOps. > >Also Report 2. > > Barry > > > On Jan 21, 2022, at 7:58 PM, Matthew Knepley wrote: > > On Fri, Jan 21, 2022 at 6:41 PM Mark Adams wrote: > >> I am looking at performance of a CG/Jacobi solve on a 3D Q2 Laplacian >> (e

[petsc-dev] Kokkos/Crusher perforance

2022-01-21 Thread Mark Adams
I am looking at performance of a CG/Jacobi solve on a 3D Q2 Laplacian (ex13) on one Crusher node (8 GPUs on 4 GPU sockets, MI250X or is it MI200?). This is with a 16M equation problem. GPU-aware MPI and non GPU-aware MPI are similar (mat-vec is a little faster w/o, the total is about the same,

[petsc-dev] Is -mat_type hypre supposed to work on Crusher

2022-01-19 Thread Mark Adams
This is not critical. I don't need MatFDColoringCreate, but is this supposed to work? It does work w/o *-mat_type hypre* 12:57 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ srun -n2 -N1 ./ex19 -dm_vec_type hip -da_refine 1 -snes_monitor_short -ksp_norm_type unpreconditioned

Re: [petsc-dev] Using PETSC with GPUs

2022-01-14 Thread Mark Adams
; -Wl,-rpath,/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib > -L/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib > -Wl,-rpath,/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/lib/gcc/ppc64le-redhat-linux/8 > -L/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/lib/gcc/

Re: [petsc-dev] Using PETSC with GPUs

2022-01-14 Thread Mark Adams
There are a few things: * GPU have higher latencies and so you basically need a large enough problem to get GPU speedup * I assume you are assembling the matrix on the CPU. The copy of data to the GPU takes time and you really should be creating the matrix on the GPU * I agree with Barry, Roughly
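On the second point, "creating the matrix on the GPU" usually means selecting a device matrix type so that assembly and the solve stay on the device (a sketch assuming a CUDA build; Kokkos or HIP types would be analogous):

    #include <petscmat.h>

    static PetscErrorCode CreateDeviceMatrix(MPI_Comm comm, PetscInt n, Mat *A)
    {
      PetscFunctionBeginUser;
      PetscCall(MatCreate(comm, A));
      PetscCall(MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n));
      PetscCall(MatSetType(*A, MATAIJCUSPARSE)); /* or MATAIJKOKKOS, MATAIJHIPSPARSE */
      PetscCall(MatSetFromOptions(*A));          /* allow -mat_type to override */
      PetscCall(MatSetUp(*A));
      PetscFunctionReturn(PETSC_SUCCESS);
    }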

Re: [petsc-dev] user makefile

2022-01-14 Thread Mark Adams
, you can add additional packages with one > line. > > > > PACKAGES := $(petsc.pc) $(x.pc) $(y.pc) > > > > where the x.pc variable can either be the plain package name (to look > for it in default locations) or a path to the x.pc file. > > > > Mark Adams write

Re: [petsc-dev] user makefile

2022-01-14 Thread Mark Adams
022, at 21:49, Barry Smith wrote: > >> > >> > >> > >> > https://petsc.org/release/docs/manual/getting_started/?highlight=user%20makefile > >> > >> Search for Writing Application Codes with PETSc > >> > >> Perhaps this need

Re: [petsc-dev] user makefile

2022-01-14 Thread Mark Adams
; >> Search for Writing Application Codes with PETSc >> >> Perhaps this needs to clearer or have links to it from the FAQ. >> >> >> >> On Jan 13, 2022, at 9:38 PM, Mark Adams wrote: >> >> I am finding it pretty hard to find an example of a m

Re: [petsc-dev] user makefile

2022-01-14 Thread Mark Adams
or have links to it from the FAQ. > > > > On Jan 13, 2022, at 9:38 PM, Mark Adams wrote: > > I am finding it pretty hard to find an example of a makefile target to > build an app with PETSc. > > I can not find it on the docs page. > > With Google: > > Victor ha

Re: [petsc-dev] user makefile

2022-01-14 Thread Mark Adams
ing Application Codes with PETSc > > Perhaps this needs to clearer or have links to it from the FAQ. > > > > On Jan 13, 2022, at 9:38 PM, Mark Adams wrote: > > I am finding it pretty hard to find an example of a makefile target to > build an app with PETSc. > > I c

[petsc-dev] user makefile

2022-01-13 Thread Mark Adams
I am finding it pretty hard to find an example of a makefile target to build an app with PETSc. I cannot find it on the docs page. With Google: Victor has a little example that looks fine ($PETSC_LIB). (sort of, see below) Another one is bigger and has this at the end: -${CLINKER} $< -o

Re: [petsc-dev] PETSc init eats too much CUDA memory

2022-01-08 Thread Mark Adams
cuda-memcheck is a valgrind clone, but like valgrind it does not report usage as it goes, only in a report at the end. On Fri, Jan 7, 2022 at 10:23 PM Barry Smith wrote: > > Doesn't Nvidia supply a "valgrind" like tool that will allow tracking > memory usage? I'm pretty sure I've seen one; it

Re: [petsc-dev] cuda with kokkos-cuda build fail

2022-01-07 Thread Mark Adams
-ex19_cuda.counts ok snes_tutorials-ex19_cuda ok diff-snes_tutorials-ex19_cuda 12:17 nid002872 main= perlmutter:~/petsc$ On Fri, Jan 7, 2022 at 2:23 PM Mark Adams wrote: > And it looks universal: > > 11:21 nid001544 main= perlmutter:~/petsc$ make > PETSC_ARCH=arch-perlmutter-opt-gcc-ko

Re: [petsc-dev] cuda with kokkos-cuda build fail

2022-01-07 Thread Mark Adams
utorials/ex1.c:45 > # [0]PETSC ERROR: PETSc Option Table entries: > # [0]PETSC ERROR: -check_pointer_intensity 0 > # [0]PETSC ERROR: -dm_landau_amr_levels_max 2,1 > # [0]PETSC ERROR: -dm_landau_device_type cuda > # [0]PETSC ERROR: -dm_landau_ion_charges 1,18 > > On Fri, Jan 7, 2022

Re: [petsc-dev] cuda with kokkos-cuda build fail

2022-01-07 Thread Mark Adams
s_max 2,1 # [0]PETSC ERROR: -dm_landau_device_type cuda # [0]PETSC ERROR: -dm_landau_ion_charges 1,18 On Fri, Jan 7, 2022 at 1:52 PM Junchao Zhang wrote: > > > > On Fri, Jan 7, 2022 at 11:17 AM Mark Adams wrote: > >> These are cuda/cusparse tests. The Kokkos versions
