Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
Valgrind was not useful, just an MPI abort message with output from 128 processes. Can we merge my MR, and then I can test your branch? On Wed, Jan 26, 2022 at 2:51 PM Barry Smith wrote: > I have added a mini-MR to print out the key so we can see if it is 0 or some crazy number.

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Barry Smith
I have added a mini-MR to print out the key so we can see if it is 0 or some crazy number. https://gitlab.com/petsc/petsc/-/merge_requests/4766 Note that the table data structure is not sent through MPI, so if MPI is the culprit, it is not just that MPI is putting incorrect (or no)
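A minimal sketch of the kind of guarded lookup the mini-MR is after (illustrative only, not the actual MR diff; PETSc's tables are built on klib's khash, and the helper name below is hypothetical):

  #include <stdio.h>
  #include "khash.h"                        /* klib khash, which PETSc bundles internally */

  KHASH_MAP_INIT_INT(i2i, int)              /* integer -> integer map */

  /* Hypothetical helper: look up `key` and, if it is missing, print it so we
     can see whether a zero or garbage key is being requested. */
  static int lookup_or_report(khash_t(i2i) *h, int key, int *val)
  {
    khiter_t it = kh_get(i2i, h, key);
    if (it == kh_end(h)) {                  /* key not found in the table */
      fprintf(stderr, "missing key %d (table has %d entries)\n", key, (int)kh_size(h));
      return -1;
    }
    *val = kh_val(h, it);
    return 0;
  }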

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
On Wed, Jan 26, 2022 at 2:32 PM Justin Chang wrote: > rocgdb requires "-ggdb" in addition to "-g" Ah, OK. > What happens if you lower AMD_LOG_LEVEL to something like 1 or 2? I was hoping AMD_LOG_LEVEL could at least give you something like a "stacktrace" showing what the last

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
> Are the crashes reproducible in the same place with identical runs? I have not seen my reproducer work, and it fails in MatAssemblyEnd with a table entry not being found. I can't tell if it is the same error every time.

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Justin Chang
rocgdb requires "-ggdb" in addition to "-g". What happens if you lower AMD_LOG_LEVEL to something like 1 or 2? I was hoping AMD_LOG_LEVEL could at least give you something like a "stacktrace" showing what the last successful HIP/HSA call was. I believe it should also show line numbers in the code.
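Besides rocgdb and AMD_LOG_LEVEL, a complementary trick (not mentioned in the thread; the macro name below is illustrative) is to check the return code of every HIP runtime call so the first failing call reports its own file and line:

  #include <hip/hip_runtime.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Wrap every HIP runtime call so a failure pinpoints itself immediately. */
  #define HIP_CHECK(call)                                              \
    do {                                                               \
      hipError_t err_ = (call);                                        \
      if (err_ != hipSuccess) {                                        \
        fprintf(stderr, "HIP error '%s' at %s:%d\n",                   \
                hipGetErrorString(err_), __FILE__, __LINE__);          \
        abort();                                                       \
      }                                                                \
    } while (0)

  /* Example use: HIP_CHECK(hipDeviceSynchronize()); after a kernel launch
     forces any asynchronous error to surface at a known source location. */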

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
On Wed, Jan 26, 2022 at 1:54 PM Justin Chang wrote: > A couple of suggestions: 1. Set the environment variable "export AMD_LOG_LEVEL=3" <- this will tell you everything that's happening at the HIP level (memcpys, mallocs, kernel execution times, etc.) Hmm, my reproducer uses 2 nodes and

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
On Wed, Jan 26, 2022 at 2:25 PM Mark Adams wrote: > I have used valgrind here. I did not run it on this MPI error. I will. > On Wed, Jan 26, 2022 at 10:56 AM Barry Smith wrote: >> Any way to run with valgrind (or a HIP variant of valgrind)? It looks like a memory corruption issue

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
I have used valgrind here. I did not run it on this MPI error. I will. On Wed, Jan 26, 2022 at 10:56 AM Barry Smith wrote: > Any way to run with valgrind (or a HIP variant of valgrind)? It looks like a memory corruption issue, and tracking down exactly when the corruption begins is 3/4

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Justin Chang
A couple of suggestions: 1. Set the environment variable "export AMD_LOG_LEVEL=3" <- this will tell you everything that's happening at the HIP level (memcpys, mallocs, kernel execution times, etc.). 2. Try rocgdb; AFAIK this is the closest "HIP variant of valgrind" that we officially support. There are

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Barry Smith
Any way to run with valgrind (or a HIP variant of valgrind)? It looks like a memory corruption issue, and tracking down exactly when the corruption begins is 3/4 of the way to finding the exact cause. Are the crashes reproducible in the same place with identical runs? > On Jan 26, 2022,

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
I think it is an MPI bug. It works with GPU-aware MPI turned off. I am sure Summit will be fine. We have had users fix this error by switching their MPI. On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang wrote: > I don't know if this is due to bugs in the petsc/kokkos backend. See if you can run 6

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Junchao Zhang
I don't know if this is due to bugs in the petsc/kokkos backend. See if you can run on 6 nodes (48 MPI ranks). If it fails, then run the same problem on Summit with 8 nodes to see if it still fails. If yes, it is likely a bug of our own. --Junchao Zhang On Wed, Jan 26, 2022 at 8:44 AM Mark Adams

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
I am not able to reproduce this with a small problem: 2 nodes, or less refinement, works. This is from the 8-node test, the -dm_refine 5 version. I see that it comes from PtAP. This is on the fine grid. (I was thinking it could be on a reduced grid with idle processors, but no.) [15]PETSC ERROR:

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
The GPU-aware MPI is dying going from 1 to 8 nodes, 8 processes per node. I will make a minimal reproducer, starting with 2 nodes, one process on each node. On Tue, Jan 25, 2022 at 10:19 PM Barry Smith wrote: > So the MPI is killing you in going from 8 to 64. (The GPU flop rate scales almost
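A minimal sketch of the kind of two-rank test being described (illustrative, not the actual reproducer): pass a device pointer straight to MPI_Send/MPI_Recv with no host staging, which only succeeds when the MPI is GPU-aware.

  #include <mpi.h>
  #include <hip/hip_runtime.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                          /* ~8 MB of doubles */
    double *dbuf;
    hipMalloc((void **)&dbuf, n * sizeof(double));  /* device allocation, never staged on the host */

    if (rank == 0) {
      hipMemset(dbuf, 0, n * sizeof(double));
      MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);  /* device pointer handed to MPI */
    } else if (rank == 1) {
      MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("rank 1 received the device buffer\n");
    }

    hipFree(dbuf);
    MPI_Finalize();
    return 0;
  }

Run with one MPI process on each of two nodes, as described above.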