Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases

2022-11-15 Thread Junchao Zhang
Mark,
Do you have a reproducer using petsc examples?

On Tue, Nov 15, 2022, 12:49 PM Mark Adams  wrote:

> Junchao, this is the same problem that I have been having, right?
>
> On Tue, Nov 15, 2022 at 11:56 AM Fackler, Philip via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> I built petsc with:
>>
>> $ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug
>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0
>> --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices
>> --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos
>> --download-kokkos-kernels
>>
>> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all
>>
>> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install
>>
>>
>> Then I build xolotl in a separate build directory (after checking out the
>> "feature-petsc-kokkos" branch) with:
>>
>> $ cmake -DCMAKE_BUILD_TYPE=Debug
>> -DKokkos_DIR=$HOME/build/petsc/debug/install
>> -DPETSC_DIR=$HOME/build/petsc/debug/install 
>>
>> $ make -j4 SystemTester
>>
>>
>> Then, from the xolotl build directory, run (for example):
>>
>> $ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v
>>
>> Note that this test case will use the parameter file
>> '/benchmarks/params_system_NE_4.txt' which has the command-line
>> arguments for petsc in its "petscArgs=..." line. If you look at
>> '/test/system/SystemTester.cpp' all the system test cases
>> follow the same naming convention with their corresponding parameter files
>> under '/benchmarks'.
>>
>> The failure happens with the NE_4 case (which is 2D) and the PSI_3 case
>> (which is 1D).
>>
>> Let me know if this is still unclear.
>>
>> Thanks,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> --
>> *From:* Junchao Zhang 
>> *Sent:* Tuesday, November 15, 2022 00:16
>> *To:* Fackler, Philip 
>> *Cc:* petsc-users@mcs.anl.gov ; Blondel, Sophie
>> 
>> *Subject:* [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with
>> COO interface crashes in some cases
>>
>> Hi, Philip,
>>   Can you give me instructions to build Xolotl so I can reproduce the error?
>> --Junchao Zhang
>>
>>
>> On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>> In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use
>> the COO interface for preallocating and setting values in the Jacobian
>> matrix. I have found that with some of our test cases, using more than one
>> MPI rank results in a crash. Way down in the preconditioner code in petsc a
>> Mat gets computed that has "null" for the "productsymbolic" member of its
>> "ops". It's pretty far removed from where we compute the Jacobian entries,
>> so I haven't been able (so far) to track it back to an error in my code.
>> I'd appreciate some help with this from someone who is more familiar with
>> the petsc guts so we can figure out what I'm doing wrong. (I'm assuming
>> it's a bug in Xolotl.)
>>
>> Note that this is using the kokkos backend for Mat and Vec in petsc, but
>> with a serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only
>> multiple MPI rank run.
>>
>> Here's a paste of the error output showing the relevant parts of the call
>> stack:
>>
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] - Error Message
>> --
>> [ERROR] [1]PETSC ERROR:
>> [ERROR] - Error Message
>> --
>> [ERROR] [1]PETSC ERROR:
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] No support for this operation for this object type
>> [ERROR] [1]PETSC ERROR:
>> [ERROR] No support for this operation for this object type
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] No method productsymbolic for Mat of type (null)
>> [ERROR] No method productsymbolic for Mat of type (null)
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] [1]PETSC ERROR:
>> [ERROR] See https://petsc.org/release/faq/ for trouble shooting.
>> [ERROR] See https://petsc.org/release/faq/ for trouble shooting.
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] [1]PETSC ERROR:
>> [ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT
>> Date: 2022-10-28 14:39:41 +
>> [ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT
>> Date: 2022-10-28 14:39:41 +
>> [ERROR] [1]PETSC ERROR:
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
>> [ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
>> [ERROR] [1]PETSC ERROR:
>> [ERROR] [0]PETSC ERROR:
>> [ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc
>> PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc
>> --with-cxx=mpicxx --with-fc=0 --with-cudac=0
>> 

Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases

2022-11-15 Thread Mark Adams
Junchao, this is the same problem that I have been having, right?

On Tue, Nov 15, 2022 at 11:56 AM Fackler, Philip via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> I built petsc with:
>
> $ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug
> --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0
> --prefix=$HOME/build/petsc/debug/install --with-64-bit-indices
> --with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos
> --download-kokkos-kernels
>
> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all
>
> $ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install
>
>
> Then I build xolotl in a separate build directory (after checking out the
> "feature-petsc-kokkos" branch) with:
>
> $ cmake -DCMAKE_BUILD_TYPE=Debug
> -DKokkos_DIR=$HOME/build/petsc/debug/install
> -DPETSC_DIR=$HOME/build/petsc/debug/install 
>
> $ make -j4 SystemTester
>
>
> Then, from the xolotl build directory, run (for example):
>
> $ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v
>
> Note that this test case will use the parameter file
> '/benchmarks/params_system_NE_4.txt' which has the command-line
> arguments for petsc in its "petscArgs=..." line. If you look at
> '/test/system/SystemTester.cpp' all the system test cases
> follow the same naming convention with their corresponding parameter files
> under '/benchmarks'.
>
> The failure happens with the NE_4 case (which is 2D) and the PSI_3 case
> (which is 1D).
>
> Let me know if this is still unclear.
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> --
> *From:* Junchao Zhang 
> *Sent:* Tuesday, November 15, 2022 00:16
> *To:* Fackler, Philip 
> *Cc:* petsc-users@mcs.anl.gov ; Blondel, Sophie <
> sblon...@utk.edu>
> *Subject:* [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO
> interface crashes in some cases
>
> Hi, Philip,
>   Can you give me instructions to build Xolotl so I can reproduce the error?
> --Junchao Zhang
>
>
> On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use
> the COO interface for preallocating and setting values in the Jacobian
> matrix. I have found that with some of our test cases, using more than one
> MPI rank results in a crash. Way down in the preconditioner code in petsc a
> Mat gets computed that has "null" for the "productsymbolic" member of its
> "ops". It's pretty far removed from where we compute the Jacobian entries,
> so I haven't been able (so far) to track it back to an error in my code.
> I'd appreciate some help with this from someone who is more familiar with
> the petsc guts so we can figure out what I'm doing wrong. (I'm assuming
> it's a bug in Xolotl.)
>
> Note that this is using the kokkos backend for Mat and Vec in petsc, but
> with a serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only
> multiple MPI rank run.
>
> Here's a paste of the error output showing the relevant parts of the call
> stack:
>
> [ERROR] [0]PETSC ERROR:
> [ERROR] - Error Message
> --
> [ERROR] [1]PETSC ERROR:
> [ERROR] - Error Message
> --
> [ERROR] [1]PETSC ERROR:
> [ERROR] [0]PETSC ERROR:
> [ERROR] No support for this operation for this object type
> [ERROR] [1]PETSC ERROR:
> [ERROR] No support for this operation for this object type
> [ERROR] [0]PETSC ERROR:
> [ERROR] No method productsymbolic for Mat of type (null)
> [ERROR] No method productsymbolic for Mat of type (null)
> [ERROR] [0]PETSC ERROR:
> [ERROR] [1]PETSC ERROR:
> [ERROR] See https://petsc.org/release/faq/ for trouble shooting.
> [ERROR] See https://petsc.org/release/faq/ for trouble shooting.
> [ERROR] [0]PETSC ERROR:
> [ERROR] [1]PETSC ERROR:
> [ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT
> Date: 2022-10-28 14:39:41 +
> [ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT
> Date: 2022-10-28 14:39:41 +
> [ERROR] [1]PETSC ERROR:
> [ERROR] [0]PETSC ERROR:
> [ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
> [ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
> [ERROR] [1]PETSC ERROR:
> [ERROR] [0]PETSC ERROR:
> [ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc
> PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc
> --with-cxx=mpicxx --with-fc=0 --with-cudac=0
> --prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices
> --with-shared-libraries
> --with-kokkos-dir=/home/4pf/build/kokkos/serial/install
> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
> [ERROR] 

Re: [petsc-users] On PCFIELDSPLIT and its implementation

2022-11-15 Thread Edoardo alinovi
Thanks, I'll do it then :)

On Tue, Nov 15, 2022, 19:25 Jed Brown  wrote:

> You do if preconditioners (like AMG) will use it or if using functions
> like MatSetValuesBlocked(). If you have uniform block structure, it doesn't
> hurt.
>
> Edoardo alinovi  writes:
>
> > Hi Guys,
> >
> > Very quick one. Do I need to set the block size with MPIAIJ?
>


Re: [petsc-users] On PCFIELDSPLIT and its implementation

2022-11-15 Thread Jed Brown
You do if preconditioners (like AMG) will use it or if using functions like 
MatSetValuesBlocked(). If you have uniform block structure, it doesn't hurt.

Edoardo alinovi  writes:

> Hi Guys,
>
> Very quick one. Do I need to set the block size with MPIAIJ?
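
For readers unfamiliar with the blocked API: below is a minimal sketch of
setting the block size on an MPIAIJ matrix and inserting one block per rank
with MatSetValuesBlocked(). The block size, sizes, and values are invented
purely for illustration; they are not taken from any code discussed in this
thread.

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat         A;
    PetscMPIInt rank;
    PetscInt    bs = 2, brow;
    PetscScalar vals[4] = {1.0, 2.0, 3.0, 4.0}; /* one dense 2x2 block, row-major */

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetType(A, MATMPIAIJ));
    PetscCall(MatSetSizes(A, bs, bs, PETSC_DETERMINE, PETSC_DETERMINE)); /* bs rows per rank */
    PetscCall(MatSetBlockSize(A, bs)); /* declare the uniform block structure before setup */
    PetscCall(MatSetUp(A));

    brow = rank; /* each rank inserts its own diagonal block, addressed in block indices */
    PetscCall(MatSetValuesBlocked(A, 1, &brow, 1, &brow, vals, INSERT_VALUES));
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

    PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
  }

Run with any number of ranks; each rank contributes a 2x2 diagonal block, and
preconditioners that exploit block structure (like the AMG case Jed mentions)
pick the block size up from the Mat.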


Re: [petsc-users] On PCFIELDSPLIT and its implementation

2022-11-15 Thread Edoardo alinovi
Hi Guys,

Very quick one. Do I need to set the block size with MPIAIJ?


Re: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.

2022-11-15 Thread Junchao Zhang
Can you paste the -log_view result so I can see which functions are used?

--Junchao Zhang
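
(Assuming the petsc options for these runs go through the parameter file's
"petscArgs=..." line, as described in the other thread in this digest, the
profile being asked for can be produced by appending -log_view to that line,
for example:

  petscArgs=<existing petsc options> -log_view

and then re-running the failing case; the log is printed when PetscFinalize()
is called.)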


On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip  wrote:

> Yes, most (but not all) of our system test cases fail with the kokkos/cuda
> or cuda backends. All of them pass with the CPU-only kokkos backend.
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> --
> *From:* Junchao Zhang 
> *Sent:* Monday, November 14, 2022 19:34
> *To:* Fackler, Philip 
> *Cc:* xolotl-psi-developm...@lists.sourceforge.net <
> xolotl-psi-developm...@lists.sourceforge.net>; petsc-users@mcs.anl.gov <
> petsc-users@mcs.anl.gov>; Blondel, Sophie ; Zhang,
> Junchao ; Roth, Philip 
> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec
> diverging when running on CUDA device.
>
> Hi, Philip,
>   Sorry to hear that.  It seems you could run the same code on CPUs but
> not on GPUs (with either the petsc/Kokkos backend or the petsc/cuda
> backend), is that right?
>
> --Junchao Zhang
>
>
> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> This is an issue I've brought up before (and discussed in-person with
> Richard). I wanted to bring it up again because I'm hitting the limits of
> what I know to do, and I need help figuring this out.
>
> The problem can be reproduced using Xolotl's "develop" branch built
> against a petsc build with kokkos and kokkos-kernels enabled. Then, either
> add the relevant kokkos options to the "petscArgs=" line in the system test
> parameter file(s), or just replace the system test parameter files with the
> ones from the "feature-petsc-kokkos" branch. See the files in that branch
> that begin with "params_system_".
>
> Note that those files use the "kokkos" options, but the problem is similar
> using the corresponding cuda/cusparse options. I've already tried building
> kokkos-kernels with no TPLs and got slightly different results, but the
> same problem.
>
> Any help would be appreciated.
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>
>


Re: [petsc-users] [EXTERNAL] Re: Using multiple MPI ranks with COO interface crashes in some cases

2022-11-15 Thread Fackler, Philip via petsc-users
I built petsc with:

$ ./configure PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug 
--with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-debugging=0 
--prefix=$HOME/build/petsc/debug/install --with-64-bit-indices 
--with-shared-libraries --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --download-kokkos 
--download-kokkos-kernels

$ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug all

$ make PETSC_DIR=$PWD PETSC_ARCH=arch-kokkos-serial-debug install


Then I build xolotl in a separate build directory (after checking out the 
"feature-petsc-kokkos" branch) with:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DKokkos_DIR=$HOME/build/petsc/debug/install 
-DPETSC_DIR=$HOME/build/petsc/debug/install 

$ make -j4 SystemTester


Then, from the xolotl build directory, run (for example):

$ mpirun -n 2 ./test/system/SystemTester -t System/NE_4 -- -v

Note that this test case will use the parameter file 
'/benchmarks/params_system_NE_4.txt' which has the command-line 
arguments for petsc in its "petscArgs=..." line. If you look at 
'/test/system/SystemTester.cpp' all the system test cases follow 
the same naming convention with their corresponding parameter files under 
'/benchmarks'.

The failure happens with the NE_4 case (which is 2D) and the PSI_3 case (which 
is 1D).

Let me know if this is still unclear.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang 
Sent: Tuesday, November 15, 2022 00:16
To: Fackler, Philip 
Cc: petsc-users@mcs.anl.gov ; Blondel, Sophie 

Subject: [EXTERNAL] Re: [petsc-users] Using multiple MPI ranks with COO 
interface crashes in some cases

Hi, Philip,
  Can you give me instructions to build Xolotl so I can reproduce the error?
--Junchao Zhang


On Mon, Nov 14, 2022 at 12:24 PM Fackler, Philip via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
In Xolotl's "feature-petsc-kokkos" branch, I have moved our code to use the COO 
interface for preallocating and setting values in the Jacobian matrix. I have 
found that with some of our test cases, using more than one MPI rank results in 
a crash. Way down in the preconditioner code in petsc a Mat gets computed that 
has "null" for the "productsymbolic" member of its "ops". It's pretty far 
removed from where we compute the Jacobian entries, so I haven't been able (so 
far) to track it back to an error in my code. I'd appreciate some help with 
this from someone who is more familiar with the petsc guts so we can figure out 
what I'm doing wrong. (I'm assuming it's a bug in Xolotl.)

Note that this is using the kokkos backend for Mat and Vec in petsc, but with a 
serial-only build of kokkos and kokkos-kernels. So, it's a CPU-only multiple 
MPI rank run.
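
For readers who have not used the COO interface: a minimal, self-contained
sketch of the two calls involved is below. This is not Xolotl code; the matrix
layout and values are invented purely to show the
MatSetPreallocationCOO()/MatSetValuesCOO() pattern.

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat         A;
    PetscMPIInt rank;
    PetscInt    coo_i[1], coo_j[1];
    PetscScalar coo_v[1];

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

    /* One row per rank; each rank contributes only its own diagonal entry */
    coo_i[0] = rank;
    coo_j[0] = rank;
    coo_v[0] = (PetscScalar)(rank + 1);

    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetSizes(A, 1, 1, PETSC_DETERMINE, PETSC_DETERMINE));
    PetscCall(MatSetFromOptions(A)); /* -mat_type aij, aijkokkos, aijcusparse, ... */
    PetscCall(MatSetPreallocationCOO(A, 1, coo_i, coo_j)); /* preallocate from the COO pattern */
    PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));   /* fill values; may be repeated, e.g. per Jacobian evaluation */

    PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
  }

Running it under an invocation like the "mpirun -n 2" used for the
SystemTester exercises the COO calls on more than one rank, though without any
of the preconditioner machinery where the productsymbolic error appears.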

Here's a paste of the error output showing the relevant parts of the call stack:

[ERROR] [0]PETSC ERROR:
[ERROR] - Error Message 
--
[ERROR] [1]PETSC ERROR:
[ERROR] - Error Message 
--
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] No support for this operation for this object type
[ERROR] [1]PETSC ERROR:
[ERROR] No support for this operation for this object type
[ERROR] [0]PETSC ERROR:
[ERROR] No method productsymbolic for Mat of type (null)
[ERROR] No method productsymbolic for Mat of type (null)
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] See https://petsc.org/release/faq/ for trouble shooting.
[ERROR] See https://petsc.org/release/faq/ for trouble shooting.
[ERROR] [0]PETSC ERROR:
[ERROR] [1]PETSC ERROR:
[ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 
2022-10-28 14:39:41 +
[ERROR] Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 
2022-10-28 14:39:41 +
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[ERROR] Unknown Name on a  named PC0115427 by 4pf Mon Nov 14 13:22:01 2022
[ERROR] [1]PETSC ERROR:
[ERROR] [0]PETSC ERROR:
[ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc 
PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc 
--with-cxx=mpicxx --with-fc=0 --with-cudac=0 
--prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices 
--with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install 
--with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/serial/install
[ERROR] Configure options PETSC_DIR=/home/4pf/repos/petsc 
PETSC_ARCH=arch-kokkos-serial-debug --with-debugging=1 --with-cc=mpicc 
--with-cxx=mpicxx --with-fc=0 --with-cudac=0 
--prefix=/home/4pf/build/petsc/serial-debug/install --with-64-bit-indices 
--with-shared-libraries --with-kokkos-dir=/home/4pf/build/kokkos/serial/install 

Re: [petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.

2022-11-15 Thread Fackler, Philip via petsc-users
Yes, most (but not all) of our system test cases fail with the kokkos/cuda or 
cuda backends. All of them pass with the CPU-only kokkos backend.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang 
Sent: Monday, November 14, 2022 19:34
To: Fackler, Philip 
Cc: xolotl-psi-developm...@lists.sourceforge.net 
; petsc-users@mcs.anl.gov 
; Blondel, Sophie ; Zhang, Junchao 
; Roth, Philip 
Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging 
when running on CUDA device.

Hi, Philip,
  Sorry to hear that.  It seems you could run the same code on CPUs but not on 
GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend), is that 
right?

--Junchao Zhang


On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
This is an issue I've brought up before (and discussed in-person with Richard). 
I wanted to bring it up again because I'm hitting the limits of what I know to 
do, and I need help figuring this out.

The problem can be reproduced using Xolotl's "develop" branch built against a 
petsc build with kokkos and kokkos-kernels enabled. Then, either add the 
relevant kokkos options to the "petscArgs=" line in the system test parameter 
file(s), or just replace the system test parameter files with the ones from the 
"feature-petsc-kokkos" branch. See the files in that branch that begin with 
"params_system_".

Note that those files use the "kokkos" options, but the problem is similar 
using the corresponding cuda/cusparse options. I've already tried building 
kokkos-kernels with no TPLs and got slightly different results, but the same 
problem.
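
To make the options in question concrete (assuming Xolotl drives the solve
through a DM, so the DM-prefixed options apply; the authoritative lines are
whatever the "feature-petsc-kokkos" parameter files actually contain), the
kokkos backend is typically selected with something along the lines of:

  petscArgs=<existing options> -dm_mat_type aijkokkos -dm_vec_type kokkos

with -dm_mat_type aijcusparse -dm_vec_type cuda as the corresponding
cuda/cusparse variant mentioned above.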

Any help would be appreciated.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory