Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases
Mark, do you have a reproducer using petsc examples?

On Tue, Nov 15, 2022, 12:49 PM Mark Adams wrote:

> Junchao, this is the same problem that I have been having, right?
>
> On Tue, Nov 15, 2022 at 11:56 AM Fackler, Philip via petsc-users
> <petsc-users@mcs.anl.gov> wrote:
>
>> I built petsc with:
>>
>> $ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0 --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos --download-kokkos-kernels
>>
>> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all
>>
>> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install
>>
>> Then I build xolotl in a separate build directory (after checking out the "feature-petsc-kokkos" branch) with:
>>
>> $ cmake -DCMAKE_BUILD_TYPE=Debug -DKokkos_DIR=$HOME/build/petsc/debug/install -DPETSC_DIR=$HOME/build/petsc/debug/install
>>
>> $ make -j4 SystemTester
>>
>> Then, from the xolotl build directory, run (for example):
>>
>> $ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v
>>
>> Note that this test case will use the parameter file '/benchmarks/params_system_NE_4.txt', which has the command-line arguments for petsc in its "petscArgs=..." line. If you look at '/test/system/SystemTester.cpp', all the system test cases follow the same naming convention, with their corresponding parameter files under '/benchmarks'.
>>
>> The failure happens with the NE_4 case (which is 2D) and the PSI_3 case (which is 1D).
>>
>> Let me know if this is still unclear.
>>
>> Thanks,
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>>
>> ------------------------------
>> From: Junchao Zhang
>> Sent: Tuesday, November 15, 2022 00:16
>> To: Fackler, Philip
>> Cc: petsc-users@mcs.anl.gov; Blondel, Sophie
>> Subject: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO interface crashes in some cases
>>
>> Hi, Philip,
>>   Can you tell me instructions to build Xolotl to reproduce the error?
>> --Junchao Zhang
>>
>> On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users
>> <petsc-users@mcs.anl.gov> wrote:
>>
>> In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use the COO interface for preallocating and setting values in the Jacobian matrix. I have found that with some of our test cases, using more than one MPI rank results in a crash. Way down in the preconditioner code in petsc, a Mat gets computed that has "null" for the "productsymbolic" member of its "ops". It's pretty far removed from where we compute the Jacobian entries, so I haven't been able (so far) to track it back to an error in my code. I'd appreciate some help with this from someone who is more familiar with the petsc guts so we can figure out what I'm doing wrong. (I'm assuming it's a bug in Xolotl.)
>>
>> Note that this is using the kokkos backend for Mat and Vec in petsc, but with a serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only multiple MPI rank run.
>>
>> Here's a paste of the error output showing the relevant parts of the call stack:
>>
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] No support for this operation for this object type
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] No method productsymbolic for Mat of type (null)
>> [...]
Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases
Junchao, this is the same problem that I have been having, right?

On Tue, Nov 15, 2022 at 11:56 AM Fackler, Philip via petsc-users
<petsc-users@mcs.anl.gov> wrote:

> I built petsc with:
>
> $ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0 --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos --download-kokkos-kernels
>
> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all
>
> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install
>
> Then I build xolotl in a separate build directory (after checking out the "feature-petsc-kokkos" branch) with:
>
> $ cmake -DCMAKE_BUILD_TYPE=Debug -DKokkos_DIR=$HOME/build/petsc/debug/install -DPETSC_DIR=$HOME/build/petsc/debug/install
>
> $ make -j4 SystemTester
>
> Then, from the xolotl build directory, run (for example):
>
> $ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v
>
> Note that this test case will use the parameter file '/benchmarks/params_system_NE_4.txt', which has the command-line arguments for petsc in its "petscArgs=..." line. If you look at '/test/system/SystemTester.cpp', all the system test cases follow the same naming convention, with their corresponding parameter files under '/benchmarks'.
>
> The failure happens with the NE_4 case (which is 2D) and the PSI_3 case (which is 1D).
>
> Let me know if this is still unclear.
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
>
> ------------------------------
> From: Junchao Zhang
> Sent: Tuesday, November 15, 2022 00:16
> To: Fackler, Philip
> Cc: petsc-users@mcs.anl.gov; Blondel, Sophie <sblon...@utk.edu>
> Subject: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO interface crashes in some cases
>
> Hi, Philip,
>   Can you tell me instructions to build Xolotl to reproduce the error?
> --Junchao Zhang
>
> On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users
> <petsc-users@mcs.anl.gov> wrote:
>
> In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use the COO interface for preallocating and setting values in the Jacobian matrix. I have found that with some of our test cases, using more than one MPI rank results in a crash. Way down in the preconditioner code in petsc, a Mat gets computed that has "null" for the "productsymbolic" member of its "ops". It's pretty far removed from where we compute the Jacobian entries, so I haven't been able (so far) to track it back to an error in my code. I'd appreciate some help with this from someone who is more familiar with the petsc guts so we can figure out what I'm doing wrong. (I'm assuming it's a bug in Xolotl.)
>
> Note that this is using the kokkos backend for Mat and Vec in petsc, but with a serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only multiple MPI rank run.
>
> Here's a paste of the error output showing the relevant parts of the call stack:
>
> [ERROR] [0]PETSC ERROR:
> [ERROR] No support for this operation for this object type
> [ERROR] [0]PETSC ERROR:
> [ERROR] No method productsymbolic for Mat of type (null)
> [...]
Re: [petsc-users] On PCFIELDSPLIT and its implementation
Thanks, I'll do it then :)

On Tue, Nov 15, 2022, 19:25 Jed Brown wrote:

> You do if preconditioners (like AMG) will use it or if using functions
> like MatSetValuesBlocked(). If you have uniform block structure, it
> doesn't hurt.
>
> Edoardo Alinovi writes:
>
>> Hi Guys,
>>
>> Very quick one. Do I need to set the block size with MPIAIJ?
Re: [petsc-users] On PCFIELDSPLIT and its implementation
You do if preconditioners (like AMG) will use it or if using functions like
MatSetValuesBlocked(). If you have uniform block structure, it doesn't hurt.

Edoardo Alinovi writes:

> Hi Guys,
>
> Very quick one. Do I need to set the block size with MPIAIJ?
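To make the block-size convention concrete, here is a minimal sketch in plain Python (not PETSc code; the function name is mine). With a uniform block size bs, block row bi covers scalar rows bs*bi through bs*bi + bs - 1, which is the indexing that blocked insertion routines like MatSetValuesBlocked() rely on:

```python
def blocked_to_scalar_indices(block_indices, bs):
    """Expand block (row or column) indices into the scalar indices
    they cover, for a matrix with uniform block size bs.

    Block index bi maps to scalar indices bs*bi, ..., bs*bi + bs - 1.
    """
    return [bs * bi + c for bi in block_indices for c in range(bs)]
```

For example, with bs=3, block rows [0, 2] cover scalar rows [0, 1, 2, 6, 7, 8]; a blocked insertion of an m x n grid of blocks touches exactly the cross product of the expanded row and column index lists.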
Re: [petsc-users] On PCFIELDSPLIT and its implementation
Hi Guys,

Very quick one: do I need to set the block size with MPIAIJ?
Re: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.
Can you paste the -log_view result so I can see what functions are used?

--Junchao Zhang

On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip wrote:

> Yes, most (but not all) of our system test cases fail with the kokkos/cuda
> or cuda backends. All of them pass with the CPU-only kokkos backend.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
>
> ------------------------------
> From: Junchao Zhang
> Sent: Monday, November 14, 2022 19:34
> To: Fackler, Philip
> Cc: xolotl-psi-developm...@lists.sourceforge.net; petsc-users@mcs.anl.gov; Blondel, Sophie; Zhang, Junchao; Roth, Philip
> Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.
>
> Hi, Philip,
>   Sorry to hear that. It seems you could run the same code on CPUs but not
> on GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend);
> is that right?
>
> --Junchao Zhang
>
> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users
> <petsc-users@mcs.anl.gov> wrote:
>
> This is an issue I've brought up before (and discussed in person with
> Richard). I wanted to bring it up again because I'm hitting the limits of
> what I know to do, and I need help figuring this out.
>
> The problem can be reproduced using Xolotl's "develop" branch built
> against a petsc build with kokkos and kokkos-kernels enabled. Then, either
> add the relevant kokkos options to the "petscArgs=" line in the system
> test parameter file(s), or just replace the system test parameter files
> with the ones from the "feature-petsc-kokkos" branch. See here the files
> that begin with "params_system_".
>
> Note that those files use the "kokkos" options, but the problem is similar
> using the corresponding cuda/cusparse options. I've already tried building
> kokkos-kernels with no TPLs and got slightly different results, but the
> same problem.
>
> Any help would be appreciated.
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
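When two backends "diverge", a common first triage step (my own sketch, not something proposed in the thread) is an elementwise comparison of the two solution vectors under a combined relative/absolute tolerance, which separates genuine divergence from harmless floating-point reordering noise:

```python
def max_mismatch(a, b, rtol=1e-10, atol=1e-14):
    """Compare two result vectors elementwise.

    Returns the largest violation of |x - y| <= atol + rtol*|y|,
    or 0.0 if every entry is within tolerance. A large return value
    suggests genuine divergence rather than roundoff reordering.
    """
    worst = 0.0
    for x, y in zip(a, b):
        excess = abs(x - y) - (atol + rtol * abs(y))
        if excess > worst:
            worst = excess
    return worst
```

The tolerances here are illustrative defaults; for iterative solvers they would typically be loosened toward the solver's convergence tolerance before concluding that a backend is actually wrong.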
Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases
I built petsc with:

$ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0 --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos --download-kokkos-kernels

$ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all

$ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install

Then I build xolotl in a separate build directory (after checking out the "feature-petsc-kokkos" branch) with:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DKokkos_DIR=$HOME/build/petsc/debug/install -DPETSC_DIR=$HOME/build/petsc/debug/install

$ make -j4 SystemTester

Then, from the xolotl build directory, run (for example):

$ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v

Note that this test case will use the parameter file '/benchmarks/params_system_NE_4.txt', which has the command-line arguments for petsc in its "petscArgs=..." line. If you look at '/test/system/SystemTester.cpp', all the system test cases follow the same naming convention, with their corresponding parameter files under '/benchmarks'.

The failure happens with the NE_4 case (which is 2D) and the PSI_3 case (which is 1D).

Let me know if this is still unclear.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

------------------------------
From: Junchao Zhang
Sent: Tuesday, November 15, 2022 00:16
To: Fackler, Philip
Cc: petsc-users@mcs.anl.gov; Blondel, Sophie
Subject: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO interface crashes in some cases

Hi, Philip,
  Can you tell me instructions to build Xolotl to reproduce the error?
--Junchao Zhang

On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users
<petsc-users@mcs.anl.gov> wrote:

In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use the COO interface for preallocating and setting values in the Jacobian matrix. I have found that with some of our test cases, using more than one MPI rank results in a crash. Way down in the preconditioner code in petsc, a Mat gets computed that has "null" for the "productsymbolic" member of its "ops". It's pretty far removed from where we compute the Jacobian entries, so I haven't been able (so far) to track it back to an error in my code. I'd appreciate some help with this from someone who is more familiar with the petsc guts so we can figure out what I'm doing wrong. (I'm assuming it's a bug in Xolotl.)

Note that this is using the kokkos backend for Mat and Vec in petsc, but with a serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only multiple MPI rank run.

Here's a paste of the error output showing the relevant parts of the call stack:

[ERROR] [0]PETSC ERROR:
[ERROR] - Error Message
--
[ERROR] [1]PETSC ERROR:
[ERROR] - Error Message
--
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] No support for this operation for this object type
[ERROR] [1]PETSC ERROR:
[ERROR] No support for this operation for this object type
[ERROR] [0]PETSC ERROR:
[ERROR] No method productsymbolic for Mat of type (null)
[ERROR] No method productsymbolic for Mat of type (null)
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] See https://petsc.org/release/faq/ for trouble shooting.
[ERROR] See https://petsc.org/release/faq/ for trouble shooting.
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +
[ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] Unknown Name on a named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[ERROR] Unknown Name on a named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
[ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices --with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install
[ERROR]
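For readers unfamiliar with the COO interface mentioned above, the assembly pattern it implements can be sketched conceptually in plain Python (this is an illustration of the semantics, not PETSc code: the class and method names are mine). The sparsity pattern is fixed once from coordinate (i, j) arrays, each assembly pass supplies only a values array in the same order, and repeated (i, j) pairs are summed, mirroring the behavior of MatSetPreallocationCOO() followed by MatSetValuesCOO():

```python
class COOMatrix:
    """Toy model of COO-style matrix assembly.

    The (i, j) pattern is supplied once up front ("preallocation");
    each later assembly pass supplies one value per pattern entry,
    and values at repeated coordinates accumulate.
    """

    def __init__(self, rows, cols):
        # "Preallocation": remember the coordinate pattern once.
        self.pattern = list(zip(rows, cols))
        self.entries = {}

    def set_values_coo(self, values):
        # One value per preallocated coordinate, in pattern order;
        # duplicates are summed into the same stored entry.
        if len(values) != len(self.pattern):
            raise ValueError("values must match the preallocated pattern")
        self.entries = {}
        for (i, j), v in zip(self.pattern, values):
            self.entries[(i, j)] = self.entries.get((i, j), 0.0) + v
```

For example, a pattern with (0, 0) listed twice stores the sum of the two supplied values at entry (0, 0). In parallel runs, PETSc additionally migrates entries whose (i, j) lands on another rank's rows, which is where multi-rank behavior can differ from the serial case.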
Re: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.
Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

------------------------------
From: Junchao Zhang
Sent: Monday, November 14, 2022 19:34
To: Fackler, Philip
Cc: xolotl-psi-developm...@lists.sourceforge.net; petsc-users@mcs.anl.gov; Blondel, Sophie; Zhang, Junchao; Roth, Philip
Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

Hi, Philip,
  Sorry to hear that. It seems you could run the same code on CPUs but not on GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend); is that right?

--Junchao Zhang

On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users
<petsc-users@mcs.anl.gov> wrote:

This is an issue I've brought up before (and discussed in person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out.

The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_".

Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem.

Any help would be appreciated.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory