Re: [petsc-dev] HYPRE setup for GPU and CPU solves and other GPU solver questions

2023-05-01 Thread Junchao Zhang
On Mon, May 1, 2023 at 8:14 AM Andrew Ho  wrote:

> Hi,
>
> I noticed that when I compile PETSc/HYPRE with GPU support, it demands
> that I use GPU vectors/matrices (in the form of either
> VECCUDA/MATMPIAIJCUSPARSE,  VECHIP/MATMPIAIJHIPSPARSE, or
> VECKOKKOS/MATMPIAIJKOKKOS). However, I would like to do some
> comparisons/tuning vs. CPU solvers for my particular application, and when
> I try to pass in regular CPU vectors/sparse matrices, PETSc complains:
>
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to
> enable PETSc device support, for example, in some cases, -vec_type cuda
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.19.1-126-g02e62876438
>  GIT Date: 2023-04-30 09:10:41 -0500
> [0]PETSC ERROR: ./myprog on a  named butter by pb Sun Apr 30 23:07:05 2023
> [0]PETSC ERROR: Configure options --prefix=/home/pb/opt/petsc/3.19.1
> --with-debugging=0 --with-large-file-io=1 --with-hdf5-dir=/home/pb/.local
> --download-hypre --download-metis --download-parmetis
> --download-superlu_dist --COPTFLAGS=-O3 --FOPTFLAGS=-O3 --CXXOPTFLAGS=-O3
> --CUDAOPTFLAGS=-O3 --with-64-bit-indices=1 --download-slepc
> --download-fblaslapack --with-cuda --with-kokkos-dir=/home/pb/.local
> --with-kokkos-kernels-dir=/home/pb/.local --with-openmp
> [0]PETSC ERROR: #1 VecGetArrayForHYPRE() at
> /scratch/pb/code/third_party/petsc/petsc/src/vec/vec/impls/hypre/vhyp.c:95
> [0]PETSC ERROR: #2 VecHYPRE_IJVectorPushVecRead() at
> /scratch/pb/code/third_party/petsc/petsc/src/vec/vec/impls/hypre/vhyp.c:138
> [0]PETSC ERROR: #3 PCApply_HYPRE() at
> /scratch/pb/code/third_party/petsc/petsc/src/ksp/pc/impls/hypre/hypre.c:433
> [0]PETSC ERROR: #4 PCApply() at
> /scratch/pb/code/third_party/petsc/petsc/src/ksp/pc/interface/precon.c:441
>
> [0]PETSC ERROR: #5 KSP_PCApply() at
> /scratch/pb/code/third_party/petsc/petsc/include/petsc/private/kspimpl.h:381
> [0]PETSC ERROR: #6 KSPInitialResidual() at
> /scratch/pb/code/third_party/petsc/petsc/src/ksp/ksp/interface/itres.c:64
> [0]PETSC ERROR: #7 KSPSolve_GMRES() at
> /scratch/pb/code/third_party/petsc/petsc/src/ksp/ksp/impls/gmres/gmres.c:226
> [0]PETSC ERROR: #8 KSPSolve_Private() at
> /scratch/pb/code/third_party/petsc/petsc/src/ksp/ksp/interface/itfunc.c:898
> [0]PETSC ERROR: #9 KSPSolve() at
> /scratch/pb/code/third_party/petsc/petsc/src/ksp/ksp/interface/itfunc.c:1070
> [0]PETSC ERROR: #10 main() at
> /home/pb/code/misc/kokkos_test/src/main.cpp:257
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -pc_hypre_type boomeramg (source: command line)
> [0]PETSC ERROR: -pc_type hypre (source: command line)
> [0]PETSC ERROR: -use_gpu_aware_mpi 0 (source: command line)
>
> Do I have to use a separate PETSc/HYPRE build to do the CPU/GPU
> comparisons?
>
Yes, that is the case for now. Hypre recently added preliminary support
for running CPU and GPU tests with a single build. I did some experiments
with it in petsc: I got rid of the error you showed, but did not get the
expected convergence rate, so there are still bugs.


>
> I was also wondering if the base PETSc KSP solvers use any kind of GPU
> acceleration, and if this is configurable at runtime.
>
Unlike Hypre, if petsc is configured with GPU support, it can still run
CPU-only tests (even on machines with GPUs).


>
> I tried solving a 2D poisson system, and whether I allocated CPU or GPU
> vectors the KSP solver will complete in roughly the same non-trivial amount
> of time, which makes me suspect that either it is never being solved on the
> GPU or is always solved on the GPU since I'm only using a single CPU core.
> This is with PETSC compiled with CUDA/Kokkos support.
>
You can add -log_view -log_view_gpu_time to compare the profiling results.
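For example, two otherwise identical runs can be compared like this (a sketch
only; it assumes the application picks up its vector/matrix types from the
options database and that PETSc was built with CUDA, so the exact type names
may differ):

# CPU-only solve; a PETSc build with GPU support can still run this
./myprog -vec_type standard -mat_type aij -ksp_type gmres -pc_type jacobi -log_view -log_view_gpu_time

# the same solve with GPU vectors/matrices
./myprog -vec_type cuda -mat_type aijcusparse -ksp_type gmres -pc_type jacobi -log_view -log_view_gpu_time

The -log_view tables then show, per event, how much of the time and flops were
on the GPU.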


Re: [petsc-dev] I will be converting .rst files to .md; let me know if you have any outstanding MR with changes

2023-03-24 Thread Junchao Zhang
I have https://gitlab.com/petsc/petsc/-/merge_requests/6225, but my change
is tiny, so you can go ahead and I will revise it later.

--Junchao Zhang


On Fri, Mar 24, 2023 at 7:08 AM Barry Smith  wrote:

>
> I will be converting .rst files to .md; please let me know if you have any
> outstanding MR with changes so there are no conflicts.
>
>   Barry
>
>


Re: [petsc-dev] Apply for Google Summer of Code 2023?

2023-02-23 Thread Junchao Zhang
Karl,
 Thanks for the effort. I feel we should have had fewer projects and
instead given a good introduction to one or two. Lessons for next year.

--Junchao Zhang


On Wed, Feb 22, 2023 at 10:48 PM Karl Rupp  wrote:

> Dear all,
>
> unfortunately our application for the Google Summer of Code 2023 got
> rejected. I haven't received any feedback on the reasons yet; however,
> looking at our GSoC ideas list I can see that we haven't done a good
> enough job to describe our GSoC-projects.
>
> Well, we can take this as input for a better application next year :-)
>
> Best regards,
> Karli
>
>
> On 2/7/23 18:37, Karl Rupp wrote:
> > Dear all,
> >
> > thanks for all the input and help. Our application has been submitted,
> > let's keep our fingers crossed.
> >
> > Also, this is a friendly reminder to fill out the details on the
> > GSoC-topics:
> >   https://gitlab.com/petsc/petsc/-/issues/?search=GSoC
> > Part of the evaluation is whether our ideas are properly communicated.
> :-)
> >
> > Thanks and best regards,
> > Karli
> >
> >
> >
> > On 2/6/23 20:24, Karl Rupp wrote:
> >> Hello all,
> >>
> >> thanks for proposing projects. I've created the suggestions so far as
> >> 'issues' in the issue tracker on Gitlab, prefixed by 'GSoC:'. Please
> >> add a better description to your suggestions so that applicants get a
> >> better idea of what that project is all about and how to get started.
> :-)
> >>
> >> Also, Satish, Junchao, Jed, and Matt should have received invitations
> >> to join the PETSc org for GSoC 2023. Please join today, as we need to
> >> apply by tomorrow (Tuesday) 18:00 UTC.
> >>
> >> I've got one question regarding payment processing; since that is a
> >> bit sensitive, I'll send it to the private list petsc-maint.
> >>
> >> Thanks and best regards,
> >> Karli
> >>
> >>
> >>
> >> On 2/4/23 20:46, Matthew Knepley wrote:
> >>> On Fri, Feb 3, 2023 at 6:28 PM Jed Brown  wrote:
> >>>
> >>> Thanks for proposing this. Some ideas:
> >>>
> >>> * DMPlex+libCEED automation
> >>> * Pipelined Krylov methods using Rust async
> >>> * Differentiable programming using Enzyme with PETSc
> >>>
> >>>
> >>> I like all those.
> >>>
> >>>Matt
> >>>
> >>> Karl Rupp  writes:
> >>>
> >>>  > Dear PETSc developers,
> >>>  >
> >>>  > in order to attract students to PETSc development, I'm thinking
> >>> about a
> >>>  > PETSc application for Google Summer of Code (GSoC) 2023:
> >>>  > https://summerofcode.withgoogle.com/programs/2023
> >>>  >
> >>>  > The org application deadline is February 7, i.e. in 4 days. This
> >>>  > application is - roughly speaking - a form with a state of
> intent
> >>> and a
> >>>  > justification why the project is a good fit for GSoC. I've done
> >>> this in
> >>>  > the past (~2010-12) and can do the paperwork again this year.
> >>>  >
> >>>  > What is required:
> >>>  >   - PETSc developers, who are willing to act as mentors
> >>> throughout the
> >>>  > program.
> >>>  >   - A few good project ideas (e.g. MATDENSE for GPUs) for
> >>>  > contributors/students to work on
> >>>  >
> >>>  > It used to be that new organizations will get at most 2
> >>> contributor
> >>>  > slots assigned. That's fair, because one must not
> >>> underestimate the
> >>>  > effort that goes into mentoring.
> >>>  >
> >>>  > Thoughts? Shall we apply (yes/no)? If yes, are you willing to be
> >>> mentor?
> >>>  > The more mentors, the better; it underlines the importance of
> the
> >>>  > project and indicates that contributors will find a good
> >>> environment.
> >>>  >
> >>>  > Thanks and best regards,
> >>>  > Karli
> >>>
> >>>
> >>>
> >>> --
> >>> What most experimenters take for granted before they begin their
> >>> experiments is infinitely more interesting than any results to which
> >>> their experiments lead.
> >>> -- Norbert Wiener
> >>>
> >>> https://www.cse.buffalo.edu/~knepley/
>


Re: [petsc-dev] Apply for Google Summer of Code 2023?

2023-02-04 Thread Junchao Zhang
On Sat, Feb 4, 2023 at 2:00 PM Satish Balay  wrote:

> BTW: ANL summer student application process is also in progress - and
> it could be easier process [for Junchao] than google to get a student
>
> [If I remember correctly - there is a category where students are at no
> cost to the project]
>
I applied for the SRP (Sustainable Research Pathways) program. It incurs
no cost. If it goes well, I will have a student this summer.


> Satish
>
>
> On Fri, 3 Feb 2023, Junchao Zhang wrote:
>
> > On Fri, Feb 3, 2023 at 1:31 PM Karl Rupp  wrote:
> >
> > > Dear PETSc developers,
> > >
> > > in order to attract students to PETSc development, I'm thinking about a
> > > PETSc application for Google Summer of Code (GSoC) 2023:
> > >   https://summerofcode.withgoogle.com/programs/2023
> > >
> > > The org application deadline is February 7, i.e. in 4 days. This
> > > application is - roughly speaking - a form with a state of intent and a
> > > justification why the project is a good fit for GSoC. I've done this in
> > > the past (~2010-12) and can do the paperwork again this year.
> > >
> > > What is required:
> > >   - PETSc developers, who are willing to act as mentors throughout the
> >
> > Hi, Karl, I am happy to act as a mentor
> >
> >
> > >
> > > program.
> > >   - A few good project ideas (e.g. MATDENSE for GPUs) for
> > > contributors/students to work on
> > >
> > * make I, J in AIJ able to have different types, i.e., I in 64-bit but J
> in
> > 32-bit.
> > * MATBAIJ/SBAIJ on GPUs
> > * Support CUDA-12 (we do not now)
> >
> >
> > >
> > > It used to be that new organizations will get at most 2 contributor
> > > slots assigned. That's fair, because one must not underestimate the
> > > effort that goes into mentoring.
> > >
> > > Thoughts? Shall we apply (yes/no)? If yes, are you willing to be
> mentor?
> > > The more mentors, the better; it underlines the importance of the
> > > project and indicates that contributors will find a good environment.
> > >
> > > Thanks and best regards,
> > > Karli
> > >
> >
>
>


Re: [petsc-dev] Apply for Google Summer of Code 2023?

2023-02-03 Thread Junchao Zhang
On Fri, Feb 3, 2023 at 1:31 PM Karl Rupp  wrote:

> Dear PETSc developers,
>
> in order to attract students to PETSc development, I'm thinking about a
> PETSc application for Google Summer of Code (GSoC) 2023:
>   https://summerofcode.withgoogle.com/programs/2023
>
> The org application deadline is February 7, i.e. in 4 days. This
> application is - roughly speaking - a form with a state of intent and a
> justification why the project is a good fit for GSoC. I've done this in
> the past (~2010-12) and can do the paperwork again this year.
>
> What is required:
>   - PETSc developers, who are willing to act as mentors throughout the

Hi, Karl, I am happy to act as a mentor


>
> program.
>   - A few good project ideas (e.g. MATDENSE for GPUs) for
> contributors/students to work on
>
* Make I, J in AIJ able to have different types, i.e., I in 64-bit but J in
32-bit.
* MATBAIJ/SBAIJ on GPUs
* Support CUDA-12 (which we do not support now)


>
> It used to be that new organizations will get at most 2 contributor
> slots assigned. That's fair, because one must not underestimate the
> effort that goes into mentoring.
>
> Thoughts? Shall we apply (yes/no)? If yes, are you willing to be mentor?
> The more mentors, the better; it underlines the importance of the
> project and indicates that contributors will find a good environment.
>
> Thanks and best regards,
> Karli
>


Re: [petsc-dev] Swarm tag error

2022-11-23 Thread Junchao Zhang
From my reading, the code actually does not need multiple tags. You can
just let _get_tags() return a constant (say 0), or use your modulo
MPI_TAG_UB approach.

541 for (i = 0; i < np; ++i) PetscCallMPI(MPI_Isend(&de->messages_to_be_sent[i], 1, MPIU_INT, de->neighbour_procs[i], de->send_tags[i], de->comm, &de->_requests[i]));
542 for (i = 0; i < np; ++i) PetscCallMPI(MPI_Irecv(&de->messages_to_be_recvieved[i], 1, MPIU_INT, de->neighbour_procs[i], de->recv_tags[i], de->comm, &de->_requests[np + i]));
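
For reference, a minimal sketch of the modulo idea (a hypothetical helper, not
the actual data_ex.c code; the real per-pair tags are whatever _get_tags()
computes):

#include <mpi.h>

/* Fold a unique process-pair index into the range of valid tags reported by the MPI. */
int PairTag(MPI_Comm comm, int rank_a, int rank_b, int size)
{
  void *attr   = NULL;
  int   flag   = 0;
  int   tag_ub = 32767; /* MPI guarantees tags up to at least 32767 */

  MPI_Comm_get_attr(comm, MPI_TAG_UB, &attr, &flag);
  if (flag) tag_ub = *(int *)attr;
  return (int)(((long long)rank_a * size + rank_b) % tag_ub);
}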

--Junchao Zhang


On Tue, Nov 22, 2022 at 11:59 PM Matthew Knepley  wrote:

> On Tue, Nov 22, 2022 at 11:23 PM Junchao Zhang 
> wrote:
>
>> I don't understand why you need so many tags.  Is the
>> communication pattern actually MPI_Alltoallv, but you implemented it in
>> MPI_Send/Recv?
>>
>
> I am preserving the original design from Dave until we do a more thorough
> rewrite. I think he is using a different tag for each pair of processes to
> make debugging easier.
>
> I don't think Alltoallv is appropriate most of the time. If you had a lot
> of particles with a huge spread of velocities then you could get that, but
> most
> scenarios I think look close to nearest neighbor.
>
>   Thanks,
>
>   Matt
>
>
>> --Junchao Zhang
>>
>>
>> On Mon, Nov 21, 2022 at 2:37 PM Matthew Knepley 
>> wrote:
>>
>>> In data_ex.c, Swarm uses a distinct tag for each pair of processes. If
>>> the number of processes exceeds 1024, there are > 1024^2 tags which exceeds
>>> MPI_TAG_UB on Intel MPI.
>>>
>>> My solution is going to be to use that process pair number modulo
>>> MPI_TAG_UB. Does anyone have a slicker suggestion?
>>>
>>>   Thanks,
>>>
>>>   Matt
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>


Re: [petsc-dev] Swarm tag error

2022-11-22 Thread Junchao Zhang
I don't understand why you need so many tags.  Is the communication pattern
actually MPI_Alltoallv, but you implemented it in MPI_Send/Recv?

--Junchao Zhang


On Mon, Nov 21, 2022 at 2:37 PM Matthew Knepley  wrote:

> In data_ex.c, Swarm uses a distinct tag for each pair of processes. If the
> number of processes exceeds 1024, there are > 1024^2 tags which exceeds
> MPI_TAG_UB on Intel MPI.
>
> My solution is going to be to use that process pair number modulo
> MPI_TAG_UB. Does anyone have a slicker suggestion?
>
>   Thanks,
>
>   Matt
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>


Re: [petsc-dev] [petsc-users] Regarding the status of MatSolve on GPUs

2022-10-19 Thread Junchao Zhang
Sherry, sorry to ping you again about this issue.

--Junchao Zhang


On Tue, Oct 11, 2022 at 11:04 AM Junchao Zhang 
wrote:

> Hi, Sherry,
>   A petsc user wants to call MatSolve(mat, b, x) multiple times with
> different b on GPUs. In petsc, the code is like
>
> PetscScalar *bptr = NULL;
> VecGetArray(b, &bptr);
> pdgssvx3d(.., bptr, ..);
>
> Note VecGetArray() returns a host pointer. If vector b's latest data is on
> GPU, PETSc needs to do a device to host memory copy. Now we want to save
> this memory copy and directly pass b's device pointer (obtained
> via VecCUDAGetArray()) to superlu_dist. But I did not find a mechanism for
> me to tell superlu_dist that bptr is a device pointer.
> Do you have suggestions?
>
> Thanks
> On Thu, Oct 6, 2022 at 2:32 PM Sajid Ali 
> wrote:
>
>> Hi PETSc-developers,
>>
>> Does PETSc currently provide (either native or third party support) for
>> MatSolve that can be performed entirely on a GPU given a factored matrix?
>> i.e. a direct solver that would store the factors L and U on the device and
>> use the GPU to solve the linear system. It does not matter if the GPU is
>> not used for the factorization as we intend to solve the same linear system
>> for 100s of iterations and thus try to prevent GPU->CPU transfers for the
>> MatSolve phase.
>>
>> Currently, I've built PETSc@main (commit 9c433d, 10/03) with
>> superlu-dist@develop, both of which are configured with CUDA. With this,
>> I'm seeing that each call to PCApply/MatSolve involves one GPU->CPU
>> transfer. Is it possible to avoid this?
>>
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Scientific Computing Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io
>>
>


Re: [petsc-dev] [petsc-users] Regarding the status of MatSolve on GPUs

2022-10-11 Thread Junchao Zhang
Hi, Sherry,
  A petsc user wants to call MatSolve(mat, b, x) multiple times with
different b on GPUs. In petsc, the code is like

PetscScalar *bptr = NULL;
VecGetArray(b, &bptr);
pdgssvx3d(.., bptr, ..);

Note VecGetArray() returns a host pointer. If vector b's latest data is on
the GPU, PETSc needs to do a device-to-host memory copy. Now we want to save
this memory copy and directly pass b's device pointer (obtained
via VecCUDAGetArray()) to superlu_dist. But I did not find a mechanism
to tell superlu_dist that bptr is a device pointer.
Do you have suggestions?
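
For concreteness, the device path on the PETSc side would look roughly like
the sketch below (a hypothetical wrapper, assuming a CUDA build and a PETSc
version where petscvec.h declares the CUDA accessors and PetscCall()). The
solver call is elided because, as noted above, there is currently no way to
tell superlu_dist that the pointer lives on the GPU:

#include <petscvec.h>

/* Borrow b's device array so no device-to-host copy is triggered, then return it. */
PetscErrorCode SolveWithDeviceRHS(Vec b)
{
  PetscScalar *bptr = NULL;

  PetscFunctionBegin;
  PetscCall(VecCUDAGetArray(b, &bptr));      /* device pointer; data stays on the GPU */
  /* ... hand bptr to a solver entry point that accepts device memory ... */
  PetscCall(VecCUDARestoreArray(b, &bptr));  /* marks the device data as modified */
  PetscFunctionReturn(0);
}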

Thanks
On Thu, Oct 6, 2022 at 2:32 PM Sajid Ali 
wrote:

> Hi PETSc-developers,
>
> Does PETSc currently provide (either native or third party support) for
> MatSolve that can be performed entirely on a GPU given a factored matrix?
> i.e. a direct solver that would store the factors L and U on the device and
> use the GPU to solve the linear system. It does not matter if the GPU is
> not used for the factorization as we intend to solve the same linear system
> for 100s of iterations and thus try to prevent GPU->CPU transfers for the
> MatSolve phase.
>
> Currently, I've built PETSc@main (commit 9c433d, 10/03) with
> superlu-dist@develop, both of which are configured with CUDA. With this,
> I'm seeing that each call to PCApply/MatSolve involves one GPU->CPU
> transfer. Is it possible to avoid this?
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
>


Re: [petsc-dev] PetscSFCount is not compatible with MPI_Count

2022-03-29 Thread Junchao Zhang
Yes, the problem is from "Mac + 64-bit indices + --with-clanguage=C++".
In Fande's case, PetscInt (int64_t) is "long long int" but MPI_Count is
"long int". Though both are 64-bit, they are different types for C++ overload
resolution.

$ cat test.c
extern int foo(long long *);
void bar() {long a = 0; foo(&a);}

$ g++ -c test.c
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior
is deprecated [-Wdeprecated]
test.c:2:25: error: no matching function for call to 'foo'
void bar() {long a = 0; foo(&a);}
                        ^~~
test.c:1:12: note: candidate function not viable: no known conversion from
'long *' to 'long long *' for 1st argument
extern int foo(long long *);

$ gcc -c test.c
test.c:2:29: warning: incompatible pointer types passing 'long *' to
parameter of type 'long long *' [-Wincompatible-pointer-types]
void bar() {long a = 0; foo(&a);}
                            ^~
test.c:1:27: note: passing argument to parameter here
extern int foo(long long *);
                          ^
1 warning generated.

I would just use MPI types for MPI arguments. Also, it looks like we need a
64-bit CI job on Mac.

--Junchao Zhang


On Tue, Mar 29, 2022 at 7:07 PM Satish Balay  wrote:

> On Tue, 29 Mar 2022, Junchao Zhang wrote:
>
> > On Tue, Mar 29, 2022 at 4:59 PM Satish Balay via petsc-dev <
> > petsc-dev@mcs.anl.gov> wrote:
> >
> > > We do have such builds in CI - don't know why CI didn't catch it.
> > >
> > > $ grep with-64-bit-indices=1 *.py
> > > arch-ci-freebsd-cxx-cmplx-64idx-dbg.py:  '--with-64-bit-indices=1',
> > > arch-ci-linux-cuda-double-64idx.py:'--with-64-bit-indices=1',
> > > arch-ci-linux-cxx-cmplx-pkgs-64idx.py:  '--with-64-bit-indices=1',
> > > arch-ci-linux-pkgs-64idx.py:  '--with-64-bit-indices=1',
> > > arch-ci-opensolaris-misc.py:  '--with-64-bit-indices=1',
> > >
> > > It implies these CI jobs do not have a recent MPI (like MPICH-4.x )
> that
> > supports MPI-4 large count? It looks we need to have one.
>
> And a Mac
>
> I can't reproduce on linux [even with latest clang]
>
> Satish
>
> >
> >
> > >
> > > Satish
> > >
> > > On Tue, 29 Mar 2022, Fande Kong wrote:
> > >
> > > > OK, I attached the configure log here so that we have more
> information.
> > > >
> > > > I feel like we should do
> > > >
> > > > typedef MPI_Count PetscSFCount
> > > >
> > > > Do we have the target of 64-bit-indices with C++ in CI? I was
> > > > surprised that I am the only guy who saw this issue
> > > >
> > > > Thanks,
> > > >
> > > > Fande
> > > >
> > > > On Tue, Mar 29, 2022 at 2:50 PM Satish Balay 
> wrote:
> > > >
> > > > > What MPI is this? How to reproduce?
> > > > >
> > > > > Perhaps its best if you can send the relevant logs.
> > > > >
> > > > > The likely trigger code in sfneighbor.c:
> > > > >
> > > > > >>>>
> > > > > /* A convenience temporary type */
> > > > > #if defined(PETSC_HAVE_MPI_LARGE_COUNT) &&
> > > defined(PETSC_USE_64BIT_INDICES)
> > > > >   typedef PetscInt PetscSFCount;
> > > > > #else
> > > > >   typedef PetscMPIInt  PetscSFCount;
> > > > > #endif
> > > > >
> > > > > This change is at
> https://gitlab.com/petsc/petsc/-/commit/c87b50c4628
> > > > >
> > > > > Hm - if MPI supported LARGE_COUNT - perhaps it also provides a type
> > > that
> > > > > should go with it which we could use - instead of PetscInt?
> > > > >
> > > > >
> > > > > Perhaps it should be: "typedef long PetscSFCount;"
> > > > >
> > > > > Satish
> > > > >
> > > > >
> > > > > On Tue, 29 Mar 2022, Fande Kong wrote:
> > > > >
> > > > > > It seems correct according to
> > > > > >
> > > > > > #define PETSC_SIZEOF_LONG 8
> > > > > >
> > > > > > #define PETSC_SIZEOF_LONG_LONG 8
> > > > > >
> > > > > >
> > > > > > Can not convert from "non-constant" to "constant"?
> > > > > >
> > > > > > Fande
> > > > > >
> > > > > > On Tue, Mar 29, 2022 at 2:22 PM Fande Kong 
> > > wrote:
> > > > > >
> > &

Re: [petsc-dev] PetscSFCount is not compatible with MPI_Count

2022-03-29 Thread Junchao Zhang
On Tue, Mar 29, 2022 at 5:25 PM Satish Balay via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:

> I'm not sure why we have PetscSFCount - and not always use MPI_Count.
>
> Maybe this would work?
>
> Perhaps Junchao can clarify
>
I used MPIU_Ineighbor_alltoallv() to wrap MPI_Ineighbor_alltoallv() or
MPI_Ineighbor_alltoallv_c(), depending on whether PETSc uses 64-bit indices
and the MPI supports large counts, so the input count array arguments are of
type int* or MPI_Count*. I defined PetscSFCount for convenience so that I can
declare variables that work in both cases.
I had thought that with 64-bit indices I could pass a PetscInt* (long long *)
where an MPI_Count* (long *) is expected, but from Fande's report that is
apparently wrong. Fande's fix is right. I will create an MR with it.
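
Put together, the intent is roughly the following sketch (assembled from the
snippets quoted in this thread; the actual mpiutils.h / sfneighbor.c code may
differ):

#if defined(PETSC_HAVE_MPI_LARGE_COUNT) && defined(PETSC_USE_64BIT_INDICES)
  /* large-count path: count/displacement arrays must match the MPI-4 _c routine */
  typedef MPI_Count PetscSFCount;
  #define MPIU_Ineighbor_alltoallv(a,b,c,d,e,f,g,h,i,j) MPI_Ineighbor_alltoallv_c(a,b,c,d,e,f,g,h,i,j)
#else
  /* classic path: count/displacement arrays are plain int */
  typedef PetscMPIInt PetscSFCount;
  #define MPIU_Ineighbor_alltoallv(a,b,c,d,e,f,g,h,i,j) MPI_Ineighbor_alltoallv(a,b,c,d,e,f,g,h,i,j)
#endif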


>
> Satish
>
> ---
>
> diff --git a/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c
> b/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c
> index 5dc2e8c0b2..10f42fc302 100644
> --- a/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c
> +++ b/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c
> @@ -1,12 +1,7 @@
>  #include <../src/vec/is/sf/impls/basic/sfpack.h>
>  #include <../src/vec/is/sf/impls/basic/sfbasic.h>
>
> -/* A convenience temporary type */
> -#if defined(PETSC_HAVE_MPI_LARGE_COUNT) &&
> defined(PETSC_USE_64BIT_INDICES)
> -  typedef PetscInt PetscSFCount;
> -#else
> -  typedef PetscMPIInt  PetscSFCount;
> -#endif
> +typedef MPI_Count PetscSFCount;
>
>  typedef struct {
>SFBASICHEADER;
>
>
> On Tue, 29 Mar 2022, Fande Kong wrote:
>
> > OK, this works for me.
> >
> > (moose) kongf@FN428781 petsc1 % git diff
> >
> > diff --git a/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c
> > b/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c
> > index 5dc2e8c0b2..c2cc72dfa9 100644
> > --- a/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c
> > +++ b/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c
> >
> > @@ -3,7 +3,7 @@
> >
> >
> >
> >  /* A convenience temporary type */
> >
> >  #if defined(PETSC_HAVE_MPI_LARGE_COUNT) &&
> defined(PETSC_USE_64BIT_INDICES)
> >
> > -  typedef PetscInt PetscSFCount;
> >
> > +  typedef MPI_Count PetscSFCount;
> >
> >  #else
> >
> >typedef PetscMPIInt  PetscSFCount;
> >
> >  #endif
> >
> > On Tue, Mar 29, 2022 at 3:49 PM Fande Kong  wrote:
> >
> > > OK, I attached the configure log here so that we have more information.
> > >
> > > I feel like we should do
> > >
> > > typedef MPI_Count PetscSFCount
> > >
> > > Do we have the target of 64-bit-indices with C++ in CI? I was
> > > surprised that I am the only guy who saw this issue
> > >
> > > Thanks,
> > >
> > > Fande
> > >
> > > On Tue, Mar 29, 2022 at 2:50 PM Satish Balay 
> wrote:
> > >
> > >> What MPI is this? How to reproduce?
> > >>
> > >> Perhaps its best if you can send the relevant logs.
> > >>
> > >> The likely trigger code in sfneighbor.c:
> > >>
> > >> 
> > >> /* A convenience temporary type */
> > >> #if defined(PETSC_HAVE_MPI_LARGE_COUNT) &&
> > >> defined(PETSC_USE_64BIT_INDICES)
> > >>   typedef PetscInt PetscSFCount;
> > >> #else
> > >>   typedef PetscMPIInt  PetscSFCount;
> > >> #endif
> > >>
> > >> This change is at https://gitlab.com/petsc/petsc/-/commit/c87b50c4628
> > >>
> > >> Hm - if MPI supported LARGE_COUNT - perhaps it also provides a type
> that
> > >> should go with it which we could use - instead of PetscInt?
> > >>
> > >>
> > >> Perhaps it should be: "typedef long PetscSFCount;"
> > >>
> > >> Satish
> > >>
> > >>
> > >> On Tue, 29 Mar 2022, Fande Kong wrote:
> > >>
> > >> > It seems correct according to
> > >> >
> > >> > #define PETSC_SIZEOF_LONG 8
> > >> >
> > >> > #define PETSC_SIZEOF_LONG_LONG 8
> > >> >
> > >> >
> > >> > Can not convert from "non-constant" to "constant"?
> > >> >
> > >> > Fande
> > >> >
> > >> > On Tue, Mar 29, 2022 at 2:22 PM Fande Kong 
> wrote:
> > >> >
> > >> > > Hi All,
> > >> > >
> > >> > > When building PETSc with 64 bit indices, it seems that
> PetscSFCount is
> > >> > > 64-bit integer while MPI_Count is still 32 bit.
> > >> > >
> > >> > > typedef long MPI_Count;
> > >> > >
> > >> > > typedef PetscInt   PetscSFCount;
> > >> > >
> > >> > >
> > >> > >  I had the following errors. Do I have a bad MPI?
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Fande
> > >> > >
> > >> > >
> > >> > >
> > >>
> Users/kongf/projects/moose6/petsc1/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c:171:18:
> > >> > > error: no matching function for call to
> 'MPI_Ineighbor_alltoallv_c'
> > >> > >
> > >> > >
> > >>
> PetscCallMPI(MPIU_Ineighbor_alltoallv(rootbuf,dat->rootcounts,dat->rootdispls,unit,leafbuf,dat->leafcounts,dat->leafdispls,unit,distcomm,req));
> > >> > >
> > >> > >
> > >>
> ^~~~
> > >> > >
> > >>
> /Users/kongf/projects/moose6/petsc1/include/petsc/private/mpiutils.h:97:79:
> > >> > > note: expanded from macro 'MPIU_Ineighbor_alltoallv'
> > >> > >   #define 

Re: [petsc-dev] PetscSFCount is not compatible with MPI_Count

2022-03-29 Thread Junchao Zhang
On Tue, Mar 29, 2022 at 4:59 PM Satish Balay via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:

> We do have such builds in CI - don't know why CI didn't catch it.
>
> $ grep with-64-bit-indices=1 *.py
> arch-ci-freebsd-cxx-cmplx-64idx-dbg.py:  '--with-64-bit-indices=1',
> arch-ci-linux-cuda-double-64idx.py:'--with-64-bit-indices=1',
> arch-ci-linux-cxx-cmplx-pkgs-64idx.py:  '--with-64-bit-indices=1',
> arch-ci-linux-pkgs-64idx.py:  '--with-64-bit-indices=1',
> arch-ci-opensolaris-misc.py:  '--with-64-bit-indices=1',
>
It implies these CI jobs do not have a recent MPI (like MPICH 4.x) that
supports MPI-4 large counts? It looks like we need to have one.


>
> Satish
>
> On Tue, 29 Mar 2022, Fande Kong wrote:
>
> > OK, I attached the configure log here so that we have more information.
> >
> > I feel like we should do
> >
> > typedef MPI_Count PetscSFCount
> >
> > Do we have the target of 64-bit-indices with C++ in CI? I was
> > surprised that I am the only guy who saw this issue
> >
> > Thanks,
> >
> > Fande
> >
> > On Tue, Mar 29, 2022 at 2:50 PM Satish Balay  wrote:
> >
> > > What MPI is this? How to reproduce?
> > >
> > > Perhaps its best if you can send the relevant logs.
> > >
> > > The likely trigger code in sfneighbor.c:
> > >
> > > 
> > > /* A convenience temporary type */
> > > #if defined(PETSC_HAVE_MPI_LARGE_COUNT) &&
> defined(PETSC_USE_64BIT_INDICES)
> > >   typedef PetscInt PetscSFCount;
> > > #else
> > >   typedef PetscMPIInt  PetscSFCount;
> > > #endif
> > >
> > > This change is at https://gitlab.com/petsc/petsc/-/commit/c87b50c4628
> > >
> > > Hm - if MPI supported LARGE_COUNT - perhaps it also provides a type
> that
> > > should go with it which we could use - instead of PetscInt?
> > >
> > >
> > > Perhaps it should be: "typedef long PetscSFCount;"
> > >
> > > Satish
> > >
> > >
> > > On Tue, 29 Mar 2022, Fande Kong wrote:
> > >
> > > > It seems correct according to
> > > >
> > > > #define PETSC_SIZEOF_LONG 8
> > > >
> > > > #define PETSC_SIZEOF_LONG_LONG 8
> > > >
> > > >
> > > > Can not convert from "non-constant" to "constant"?
> > > >
> > > > Fande
> > > >
> > > > On Tue, Mar 29, 2022 at 2:22 PM Fande Kong 
> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > When building PETSc with 64 bit indices, it seems that
> PetscSFCount is
> > > > > 64-bit integer while MPI_Count is still 32 bit.
> > > > >
> > > > > typedef long MPI_Count;
> > > > >
> > > > > typedef PetscInt   PetscSFCount;
> > > > >
> > > > >
> > > > >  I had the following errors. Do I have a bad MPI?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Fande
> > > > >
> > > > >
> > > > >
> > >
> Users/kongf/projects/moose6/petsc1/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c:171:18:
> > > > > error: no matching function for call to 'MPI_Ineighbor_alltoallv_c'
> > > > >
> > > > >
> > >
> PetscCallMPI(MPIU_Ineighbor_alltoallv(rootbuf,dat->rootcounts,dat->rootdispls,unit,leafbuf,dat->leafcounts,dat->leafdispls,unit,distcomm,req));
> > > > >
> > > > >
> > >
> ^~~~
> > > > >
> > >
> /Users/kongf/projects/moose6/petsc1/include/petsc/private/mpiutils.h:97:79:
> > > > > note: expanded from macro 'MPIU_Ineighbor_alltoallv'
> > > > >   #define MPIU_Ineighbor_alltoallv(a,b,c,d,e,f,g,h,i,j)
> > > > > MPI_Ineighbor_alltoallv_c(a,b,c,d,e,f,g,h,i,j)
> > > > >
> > > > > ^
> > > > > /Users/kongf/projects/moose6/petsc1/include/petscerror.h:407:32:
> note:
> > > > > expanded from macro 'PetscCallMPI'
> > > > > PetscMPIInt _7_errorcode = __VA_ARGS__;
> > > > >  \
> > > > >^~~
> > > > > /Users/kongf/mambaforge3/envs/moose/include/mpi_proto.h:945:5:
> note:
> > > > > candidate function not viable: no known conversion from
> 'PetscSFCount
> > > *'
> > > > > (aka 'long long *') to 'const MPI_Count *' (aka 'const long *')
> for 2nd
> > > > > argument
> > > > > int MPI_Ineighbor_alltoallv_c(const void *sendbuf, const MPI_Count
> > > > > sendcounts[],
> > > > > ^
> > > > >
> > >
> /Users/kongf/projects/moose6/petsc1/src/vec/is/sf/impls/basic/neighbor/sfneighbor.c:195:18:
> > > > > error: no matching function for call to 'MPI_Ineighbor_alltoallv_c'
> > > > >
> > > > >
> > >
> PetscCallMPI(MPIU_Ineighbor_alltoallv(leafbuf,dat->leafcounts,dat->leafdispls,unit,rootbuf,dat->rootcounts,dat->rootdispls,unit,distcomm,req));
> > > > >
> > > > >
> > >
> ^~~~
> > > > >
> > >
> /Users/kongf/projects/moose6/petsc1/include/petsc/private/mpiutils.h:97:79:
> > > > > note: expanded from macro 'MPIU_Ineighbor_alltoallv'
> > > > >   #define MPIU_Ineighbor_alltoallv(a,b,c,d,e,f,g,h,i,j)
> > > > > MPI_Ineighbor_alltoallv_c(a,b,c,d,e,f,g,h,i,j)
> > > > >
> > > > > ^
> > > > > 

Re: [petsc-dev] MatSetPreallocationCOO remove attached objects?

2022-03-01 Thread Junchao Zhang
I met errors and I don't know how to fix them:
https://gitlab.com/petsc/petsc/-/jobs/2151255655
The errors are all hypre-related. @Stefano Zampini might know more.
Perhaps we can assert in MatHeaderMerge(A,C) that A does not contain
composed objects? MatHeaderMerge() is vague on what will be kept and
what will be discarded.

--Junchao Zhang


On Tue, Mar 1, 2022 at 3:00 PM Mark Adams  wrote:

> I can attach my containers (3!) after this call.
> Actually better structure in my code but this should be fixed.
> Thanks
>
> On Tue, Mar 1, 2022 at 3:06 PM Barry Smith  wrote:
>
>>
>>   These might not need to be deleted but could possibly be moved over
>>
>> ierr = PetscFunctionListDestroy(&((PetscObject)A)->qlist);CHKERRQ(ierr);
>>   ierr = PetscObjectListDestroy(&((PetscObject)A)->olist);CHKERRQ(ierr);
>>   ierr = PetscComposedQuantitiesDestroy((PetscObject)A);CHKERRQ(ierr);
>>
>> also MatHeaderReplace() exists. I struggle to understand the exact
>> differences and why both exist but I think there are some subtle reasons
>> why there are both and don't know if they can be merged.
>>
>> On Mar 1, 2022, at 2:47 PM, Junchao Zhang 
>> wrote:
>>
>> I realized this problem but did not expect someone would run into it :)
>> Let me think again.
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Mar 1, 2022 at 1:33 PM Mark Adams  wrote:
>>
>>> I have a container attached to my matrix and it seems to go away after a
>>> call to MatSetPreallocationCOO.
>>> Does that sound plausible?
>>>
>>
>>


Re: [petsc-dev] MatSetPreallocationCOO remove attached objects?

2022-03-01 Thread Junchao Zhang
I realized this problem but did not expect someone would run into it :)
Let me think again.

--Junchao Zhang


On Tue, Mar 1, 2022 at 1:33 PM Mark Adams  wrote:

> I have a container attached to my matrix and it seems to go away after a
> call to MatSetPreallocationCOO.
> Does that sound plausible?
>


Re: [petsc-dev] Current status of using streams within PETSc

2022-02-15 Thread Junchao Zhang
Besides the MPI synchronization issue, we need new async APIs like
VecAXPYAsync() to pass scalars produced on the device.

--Junchao Zhang


On Tue, Feb 15, 2022 at 10:11 AM Jed Brown  wrote:

> Note that operations that don't have communication (like VecAXPY and
> VecPointwiseMult) are already non-blocking on streams. (A recent Thrust
> update helped us recover what had silently become blocking in a previous
> release.) For multi-rank, operations like MatMult require communication and
> MPI doesn't have a way to make it nonblocking. We've had some issues/bugs
> with NVSHMEM to bypass MPI.
>
> MPI implementors have been really skeptical of placing MPI operations on
> streams (like NCCL/RCCL or NVSHMEM). Cray's MPI doesn't have anything to do
> with streams, device memory is cachable on the host, and RDMA operations
> are initiated on the host without device logic being involved. I feel like
> it's going to take company investment or a very enterprising systems
> researcher to make the case for getting messaging to play well with
> streams. Perhaps it's a better use of time to focus on reducing latency of
> notifying the host when RDMA completes and reducing kernel launch time. In
> short, there are many unanswered questions regarding truly asynchronous
> Krylov solvers. But in the most obvious places for async, it works
> currently.
>
> Jacob Faibussowitsch  writes:
>
> > New code can (and absolutely should) use it right away,
> PetscDeviceContext has been fully functional since its merger. Remember
> though that it works on a “principled parallelism” model; the caller is
> responsible for proper serialization.
> >
> > Existing code? Not so much. In broad strokes the following sections need
> support before parallelism can be achieved from user-code:
> >
> > 1. Vec - WIP (feature complete, now in bug-fixing stage)
> > 2. PetscSF - TODO
> > 3. Mat - TODO
> > 4. KSP/PC  - TODO
> >
> > Seeing as each MR thus far for this has taken me roughly 3-4 months to
> merge, and with the later sections requiring enormous rewrites and API
> changes I don’t expect this to be finished for at least 2 years… Once the
> Vec MR is merged you could theoretically run with
> -device_context_stream_type default_blocking and achieve “asynchronous”
> compute but nothing would work properly as every other part of petsc
> expects to be synchronous.
> >
> > That being said I would be happy to give a demo to people on how they
> can integrate PetscDeviceContext into their code on the next developers
> meeting. It would go a long way to cutting down the timeline.
> >
> >> On Feb 15, 2022, at 02:02, Stefano Zampini 
> wrote:
> >>
> >> Jacob
> >>
> >> what is the current status of the async support in PETSc?
> >> Can you summarize here? Is there any documentation available?
> >>
> >> Thanks
> >> --
> >> Stefano
>


Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Junchao Zhang
I don't know if this is due to bugs in the petsc/kokkos backend. See if you
can run 6 nodes (48 MPI ranks). If it fails, then run the same problem on
Summit with 8 nodes to see if it still fails. If yes, it is likely a bug of
our own.

--Junchao Zhang


On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:

> I am not able to reproduce this with a small problem. 2 nodes or less
> refinement works. This is from the 8 node test, the -dm_refine 5 version.
> I see that it comes from PtAP.
> This is on the fine grid. (I was thinking it could be on a reduced grid
> with idle processors, but no)
>
> [15]PETSC ERROR: Argument out of range
> [15]PETSC ERROR: Key <= 0
> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb
>  GIT Date: 2022-01-25 09:20:51 -0500
> [15]PETSC ERROR:
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
> --with-fc=ftn --with-fortran-bindings=0
> LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0
> --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
> --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
> --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
> --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
> --download-p4est=1
> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
> PETSC_ARCH=arch-olcf-crusher
> [15]PETSC ERROR: #1 PetscTableFind() at
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
> [15]PETSC ERROR: #5 MatAssemblyEnd() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
> [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
> [15]PETSC ERROR: #9 MatProductSymbolic() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
> [15]PETSC ERROR: #10 MatPtAP() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
> [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
> [15]PETSC ERROR: #12 PCSetUp_GAMG() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
> [15]PETSC ERROR: #13 PCSetUp() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
> [15]PETSC ERROR: #14 KSPSetUp() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
> [15]PETSC ERROR: #15 KSPSolve_Private() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
> [15]PETSC ERROR: #16 KSPSolve() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
> [15]PETSC ERROR: #17 SNESSolve_KSPONLY() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:51
> [15]PETSC ERROR: #18 SNESSolve() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4810
> [15]PETSC ERROR: #19 main() at ex13.c:169
> [15]PETSC ERROR: PETSc Option Table entries:
> [15]PETSC ERROR: -benchmark_it 10
>
> On Wed, Jan 26, 2022 at 7:26 AM Mark Adams  wrote:
>
>> The GPU aware MPI is dying going 1 to 8 nodes, 8 processes per node.
>> I will make a minimum reproducer. start with 2 nodes, one process on each
>> node.
>>
>>
>> On Tue, Jan 25, 2022 at 10:19 PM Barry Smith  wrote:
>>
>>>
>>>   So the MPI is killing you in going from 8 to 64. (The GPU flop rate
>>> scales almost perfectly, but the overall flop rate is only half of what it
>>> should be at 64).
>>>
>>> On Jan 25, 2022, at 9:24 PM, Mark Adams  wrote:
>>>
>>> It looks like we have our instrumentation and job c

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-24 Thread Junchao Zhang
On Mon, Jan 24, 2022 at 12:55 PM Mark Adams  wrote:

>
>
> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang 
> wrote:
>
>> Mark, I think you can benchmark individual vector operations, and once we
>> get reasonable profiling results, we can move to solvers etc.
>>
>
> Can you suggest a code to run or are you suggesting making a vector
> benchmark code?
>
Make a vector benchmark code testing the vector operations that would be used
in your solver.
Also, we can run MatMult() to see if the profiling results are reasonable.
Only once we get some solid results on basic operations is it useful to
run big codes.
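
A minimal sketch of such a benchmark (a hypothetical standalone test, not an
existing PETSc example; it assumes a PETSc recent enough to provide
PetscCall()):

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec         x, y;
  PetscInt    n = 1000000, i, its = 400;  /* its ~ the per-solve counts in the logs above */
  PetscScalar dot;
  PetscReal   nrm;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL));
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetSizes(x, PETSC_DECIDE, n));
  PetscCall(VecSetFromOptions(x));        /* -vec_type standard, cuda, or kokkos */
  PetscCall(VecDuplicate(x, &y));
  PetscCall(VecSet(x, 1.0));
  PetscCall(VecSet(y, 2.0));
  for (i = 0; i < its; i++) {
    PetscCall(VecAXPY(y, 1.5, x));
    PetscCall(VecTDot(x, y, &dot));
    PetscCall(VecNorm(y, NORM_2, &nrm));
  }
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&y));
  PetscCall(PetscFinalize());             /* run with -log_view -log_view_gpu_time */
  return 0;
}

Running it once with -vec_type standard and once with -vec_type kokkos (or
cuda), each with -log_view -log_view_gpu_time, gives a per-operation
comparison before touching the full solver.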


>
>
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams  wrote:
>>
>>>
>>>
>>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith  wrote:
>>>
>>>>
>>>>   Here except for VecNorm the GPU is used effectively in that most of
>>>> the time is time is spent doing real work on the GPU
>>>>
>>>> VecNorm  402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00
>>>> 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393  0
>>>> 0.00e+000 0.00e+00 100
>>>>
>>>> Even the dots are very effective, only the VecNorm flop rate over the
>>>> full time is much much lower than the vecdot. Which is somehow due to the
>>>> use of the GPU or CPU MPI in the allreduce?
>>>>
>>>
>>> The VecNorm GPU rate is relatively high on Crusher and the CPU rate is
>>> about the same as the other vec ops. I don't know what to make of that.
>>>
>>> But Crusher is clearly not crushing it.
>>>
>>> Junchao: Perhaps we should ask Kokkos if they have any experience with
>>> Crusher that they can share. They could very well find some low level magic.
>>>
>>>
>>>
>>>>
>>>>
>>>> On Jan 24, 2022, at 12:14 PM, Mark Adams  wrote:
>>>>
>>>>
>>>>
>>>>> Mark, can we compare with Spock?
>>>>>
>>>>
>>>>  Looks much better. This puts two processes/GPU because there are only
>>>> 4.
>>>> 
>>>>
>>>>
>>>>


Re: [petsc-dev] Kokkos/Crusher performance

2022-01-24 Thread Junchao Zhang
Mark, I think you can benchmark individual vector operations, and once we
get reasonable profiling results, we can move to solvers etc.

--Junchao Zhang


On Mon, Jan 24, 2022 at 12:09 PM Mark Adams  wrote:

>
>
> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith  wrote:
>
>>
>>   Here except for VecNorm the GPU is used effectively in that most of the
>> time is time is spent doing real work on the GPU
>>
>> VecNorm  402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00
>> 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393  0 0.00e+000
>> 0.00e+00 100
>>
>> Even the dots are very effective, only the VecNorm flop rate over the
>> full time is much much lower than the vecdot. Which is somehow due to the
>> use of the GPU or CPU MPI in the allreduce?
>>
>
> The VecNorm GPU rate is relatively high on Crusher and the CPU rate is
> about the same as the other vec ops. I don't know what to make of that.
>
> But Crusher is clearly not crushing it.
>
> Junchao: Perhaps we should ask Kokkos if they have any experience with
> Crusher that they can share. They could very well find some low level magic.
>
>
>
>>
>>
>> On Jan 24, 2022, at 12:14 PM, Mark Adams  wrote:
>>
>>
>>
>>> Mark, can we compare with Spock?
>>>
>>
>>  Looks much better. This puts two processes/GPU because there are only 4.
>> 
>>
>>
>>


Re: [petsc-dev] Kokkos/Crusher performance

2022-01-23 Thread Junchao Zhang
On Sun, Jan 23, 2022 at 11:22 PM Barry Smith  wrote:

>
>
> On Jan 24, 2022, at 12:16 AM, Junchao Zhang 
> wrote:
>
>
>
> On Sun, Jan 23, 2022 at 10:44 PM Barry Smith  wrote:
>
>>
>>   Junchao,
>>
>>  Without GPU aware MPI, is it moving the entire vector to the CPU and
>> doing the scatter and moving everything back or does it just move up
>> exactly what needs to be sent to the other ranks and move back exactly what
>> it received from other ranks?
>>
> It only moves entries needed, using a kernel to pack/unpack them.
>
>
> Ok, that pack kernel is Kokkos? How come the pack times take so
> little time compared to the MPI sends in the logs? Those times are much
> smaller than the VecScatter times. Is the logging correct for how much
> stuff is sent up and down?
>
Yes, the pack/unpack kernels are kokkos.  I need to check the profiling.


>
>
>> It is moving 4.74e+02 * 1e+6 bytes total data up and then down. Is
>> that a reasonable amount?
>>
>> Why is it moving 800 distinct counts up and 800 distinct counts down
>> when the MatMult is done 400 times, shouldn't it be 400 counts?
>>
>>   Mark,
>>
>>  Can you run both with GPU aware MPI?
>>
>>
>>   Norm, AXPY, pointwisemult roughly the same.
>>
>>
>> On Jan 23, 2022, at 11:24 PM, Mark Adams  wrote:
>>
>> Ugh, try again. Still a big difference, but less.  Mat-vec does not
>> change much.
>>
>> On Sun, Jan 23, 2022 at 7:12 PM Barry Smith  wrote:
>>
>>>
>>>  You have debugging turned on on crusher but not permutter
>>>
>>> On Jan 23, 2022, at 6:37 PM, Mark Adams  wrote:
>>>
>>> * Perlmutter is roughly 5x faster than Crusher on the one node 2M eq
>>> test. (small)
>>> This is with 8 processes.
>>>
>>> * The next largest version of this test, 16M eq total and 8 processes,
>>> fails in memory allocation in the mat-mult setup in the Kokkos Mat.
>>>
>>> * If I try to run with 64 processes on Perlmutter I get this error in
>>> initialization. These nodes have 160 Gb of memory.
>>> (I assume this is related to these large memory requirements from
>>> loading packages, etc)
>>>
>>> Thanks,
>>> Mark
>>>
>>> + srun -n64 -N1 --cpu-bind=cores --ntasks-per-core=1 ../ex13
>>> -dm_plex_box_faces 4,4,4 -petscpartitioner_simple_process_grid 4,4,4
>>> -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1
>>> -dm_refine 6 -dm_view -pc_type jacobi -log
>>> _view -ksp_view -use_gpu_aware_mpi false -dm_mat_type aijkokkos
>>> -dm_vec_type kokkos -log_trace
>>> + tee jac_out_001_kokkos_Perlmutter_6_8.txt
>>> [48]PETSC ERROR: - Error Message
>>> --
>>> [48]PETSC ERROR: GPU error
>>> [48]PETSC ERROR: cuda error 2 (cudaErrorMemoryAllocation) : out of memory
>>> [48]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>>> shooting.
>>> [48]PETSC ERROR: Petsc Development GIT revision: v3.16.3-683-gbc458ed4d8
>>>  GIT Date: 2022-01-22 12:18:02 -0600
>>> [48]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/data/../ex13
>>> on a arch-perlmutter-opt-gcc-kokkos-cuda named nid001424 by madams Sun Jan
>>> 23 15:19:56 2022
>>> [48]PETSC ERROR: Configure options --CFLAGS="   -g -DLANDAU_DIM=2
>>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2
>>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler
>>> -rdynamic -DLANDAU_DIM=2 -DLAN
>>> DAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --with-cc=cc --with-cxx=CC
>>> --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91
>>> --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
>>> --COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3"
>>>  --with-debugging=0 --download-metis --download-parmetis --with-cuda=1
>>> --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1
>>> --with-zlib=1 --download-kokkos --download-kokkos-kernels
>>> --with-kokkos-kernels-tpl=0 --with-
>>> make-np=8 PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
>>> [48]PETSC ERROR: #1 initialize() at
>>> /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:72
>>> [48]PETSC ERROR: #2 initialize() at
>>> /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:343
>>> [48]PETSC ERRO

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-23 Thread Junchao Zhang
On Sun, Jan 23, 2022 at 10:44 PM Barry Smith  wrote:

>
>   Junchao,
>
>  Without GPU aware MPI, is it moving the entire vector to the CPU and
> doing the scatter and moving everything back or does it just move up
> exactly what needs to be sent to the other ranks and move back exactly what
> it received from other ranks?
>
It only moves entries needed, using a kernel to pack/unpack them.

>
> It is moving 4.74e+02 * 1e+6 bytes total data up and then down. Is
> that a reasonable amount?
>
> Why is it moving 800 distinct counts up and 800 distinct counts down
> when the MatMult is done 400 times, shouldn't it be 400 counts?
>
>   Mark,
>
>  Can you run both with GPU aware MPI?
>
>
>   Norm, AXPY, pointwisemult roughly the same.
>
>
> On Jan 23, 2022, at 11:24 PM, Mark Adams  wrote:
>
> Ugh, try again. Still a big difference, but less.  Mat-vec does not change
> much.
>
> On Sun, Jan 23, 2022 at 7:12 PM Barry Smith  wrote:
>
>>
>>  You have debugging turned on on crusher but not permutter
>>
>> On Jan 23, 2022, at 6:37 PM, Mark Adams  wrote:
>>
>> * Perlmutter is roughly 5x faster than Crusher on the one node 2M eq
>> test. (small)
>> This is with 8 processes.
>>
>> * The next largest version of this test, 16M eq total and 8 processes,
>> fails in memory allocation in the mat-mult setup in the Kokkos Mat.
>>
>> * If I try to run with 64 processes on Perlmutter I get this error in
>> initialization. These nodes have 160 Gb of memory.
>> (I assume this is related to these large memory requirements from loading
>> packages, etc)
>>
>> Thanks,
>> Mark
>>
>> + srun -n64 -N1 --cpu-bind=cores --ntasks-per-core=1 ../ex13
>> -dm_plex_box_faces 4,4,4 -petscpartitioner_simple_process_grid 4,4,4
>> -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1
>> -dm_refine 6 -dm_view -pc_type jacobi -log
>> _view -ksp_view -use_gpu_aware_mpi false -dm_mat_type aijkokkos
>> -dm_vec_type kokkos -log_trace
>> + tee jac_out_001_kokkos_Perlmutter_6_8.txt
>> [48]PETSC ERROR: - Error Message
>> --
>> [48]PETSC ERROR: GPU error
>> [48]PETSC ERROR: cuda error 2 (cudaErrorMemoryAllocation) : out of memory
>> [48]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [48]PETSC ERROR: Petsc Development GIT revision: v3.16.3-683-gbc458ed4d8
>>  GIT Date: 2022-01-22 12:18:02 -0600
>> [48]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/data/../ex13 on
>> a arch-perlmutter-opt-gcc-kokkos-cuda named nid001424 by madams Sun Jan 23
>> 15:19:56 2022
>> [48]PETSC ERROR: Configure options --CFLAGS="   -g -DLANDAU_DIM=2
>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2
>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler
>> -rdynamic -DLANDAU_DIM=2 -DLAN
>> DAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --with-cc=cc --with-cxx=CC
>> --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91
>> --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
>> --COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3"
>>  --with-debugging=0 --download-metis --download-parmetis --with-cuda=1
>> --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1
>> --with-zlib=1 --download-kokkos --download-kokkos-kernels
>> --with-kokkos-kernels-tpl=0 --with-
>> make-np=8 PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
>> [48]PETSC ERROR: #1 initialize() at
>> /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:72
>> [48]PETSC ERROR: #2 initialize() at
>> /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:343
>> [48]PETSC ERROR: #3 PetscDeviceInitializeTypeFromOptions_Private() at
>> /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:319
>> [48]PETSC ERROR: #4 PetscDeviceInitializeFromOptions_Internal() at
>> /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:449
>> [48]PETSC ERROR: #5 PetscInitialize_Common() at
>> /global/u2/m/madams/petsc/src/sys/objects/pinit.c:963
>> [48]PETSC ERROR: #6 PetscInitialize() at
>> /global/u2/m/madams/petsc/src/sys/objects/pinit.c:1238
>>
>>
>> On Sun, Jan 23, 2022 at 8:58 AM Mark Adams  wrote:
>>
>>>
>>>
>>> On Sat, Jan 22, 2022 at 6:22 PM Barry Smith  wrote:
>>>

I cleaned up Mark's last run and put it in a fixed-width font. I
 realize this may be too difficult but it would be great to have identical
 runs to compare with on Summit.

>>>
>>> I was planning on running this on Perlmutter today, as well as some
>>> sanity checks like all GPUs are being used. I'll try PetscDeviceView.
>>>
>>> Junchao modified the timers and all GPU > CPU now, but he seemed to move
>>> the timers more outside and Barry wants them tight on the "kernel".
>>> I think Junchao is going to work on that so I will hold off.
>>> (I removed the the Kokkos wait stuff and seemed to run a little faster
>>> but I am not sure how deterministic the timers are, and I did a test 

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-23 Thread Junchao Zhang
On Sat, Jan 22, 2022 at 9:00 PM Junchao Zhang 
wrote:

>
>
>
> On Sat, Jan 22, 2022 at 5:00 PM Barry Smith  wrote:
>
>>
>>   The GPU flop rate (when 100 percent flops on the GPU) should always be
>> higher than the overall flop rate (the previous column). For large problems
>> they should be similar, for small problems the GPU one may be much higher.
>>
>>   If the CPU one is higher (when 100 percent flops on the GPU) something
>> must be wrong with the logging. I looked at the code for the two cases and
>> didn't see anything obvious.
>>
>>   Junchao and Jacob,
>>   I think some of the timing code in the Kokkos interface is wrong.
>>
>> *  The PetscLogGpuTimeBegin/End should be inside the viewer access
>> code not outside it. (The GPU time is an attempt to best time the kernels,
>> not other processing around the use of the kernels, that other stuff is
>> captured in the general LogEventBegin/End.
>>
> What about potential host to device memory copy before calling a kernel?
Should we count it in the kernel time?

Good point
>
>> *  The use of WaitForKokkos() is confusing and seems inconsistent.
>>
> I need to have a look. Until now, I have not paid much attention to kokkos
> profiling.
>
>>  -For example it is used in VecTDot_SeqKokkos() which I would
>> think has a barrier anyways because it puts a scalar result into update?
>>  -Plus PetscLogGpuTimeBegin/End is suppose to already have
>> suitable system (that Hong added) to ensure the kernel is complete; reading
>> the manual page and looking at Jacobs cupmcontext.hpp it seems to be there
>> so I don't think WaitForKokkos() is needed in most places (or is Kokkos
>> asynchronous and needs this for correctness?)
>> But these won't explain the strange result of overall flop rate being
>> higher than GPU flop rate.
>>
>>   Barry
>>
>>
>>
>>
>>
>> On Jan 22, 2022, at 11:44 AM, Mark Adams  wrote:
>>
>> I am getting some funny timings and I'm trying to figure it out.
>> I figure the GPU flop rates are a bit higher because the timers are inside
>> of the CPU timers, but *some are a lot bigger or inverted*
>>
>> --- Event Stage 2: KSP Solve only
>>
>> MatMult  400 1.0 1.0094e+01 1.2 1.07e+11 1.0 3.7e+05 6.1e+04
>> 0.0e+00  2 55 62 54  0  68 91100100  0 671849   857147  0 0.00e+000
>> 0.00e+00 100
>> MatView2 1.0 4.5257e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
>> 2.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
>> 0.00e+00  0
>> KSPSolve   2 1.0 1.4591e+01 1.1 1.18e+11 1.0 3.7e+05 6.1e+04
>> 1.2e+03  2 60 62 54 60 100100100100100 512399   804048  0 0.00e+000
>> 0.00e+00 100
>> SFPack   400 1.0 2.4545e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
>> 0.00e+00  0
>> SFUnpack 400 1.0 9.4637e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
>> 0.00e+00  0
>> VecTDot  802 1.0 3.0577e+00 2.1 3.36e+09 1.0 0.0e+00 0.0e+00
>> 8.0e+02  0  2  0  0 40  13  3  0  0 67 *69996   488328*  0 0.00e+00
>>0 0.00e+00 100
>> VecNorm  402 1.0 1.9597e+00 3.4 1.69e+09 1.0 0.0e+00 0.0e+00
>> 4.0e+02  0  1  0  0 20   6  1  0  0 33 54744   571507  0 0.00e+000
>> 0.00e+00 100
>> VecCopy4 1.0 1.7143e-0228.6 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
>> 0.00e+00  0
>> VecSet 4 1.0 3.8051e-0316.9 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
>> 0.00e+00  0
>> VecAXPY  800 1.0 8.6160e-0113.6 3.36e+09 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  2  0  0  0   6  3  0  0  0 *247787   448304*  0 0.00e+00
>>0 0.00e+00 100
>> VecAYPX  398 1.0 1.6831e+0031.1 1.67e+09 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  1  0  0  0   5  1  0  0  0 63107   77030  0 0.00e+000
>> 0.00e+00 100
>> VecPointwiseMult 402 1.0 3.8729e-01 9.3 8.43e+08 1.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   2  1  0  0  0 138502   262413  0 0.00e+000
>> 0.00e+00 100
>> VecScatterBegin  400 1.0 1.1947e+0035.1 0.00e+00 0.0 3.7e+05 6.1e+04
>> 0.0e+00  0  0 62 54  0   5  0100100  0 0   0  0 0.00e+000
>> 0.00e+00  0
>> VecScatterEnd    400 1.0 6.2969e+00 8.8 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-22 Thread Junchao Zhang
On Sat, Jan 22, 2022 at 5:00 PM Barry Smith  wrote:

>
>   The GPU flop rate (when 100 percent flops on the GPU) should always be
> higher than the overall flop rate (the previous column). For large problems
> they should be similar, for small problems the GPU one may be much higher.
>
>   If the CPU one is higher (when 100 percent flops on the GPU) something
> must be wrong with the logging. I looked at the code for the two cases and
> didn't see anything obvious.
>
>   Junchao and Jacob,
>   I think some of the timing code in the Kokkos interface is wrong.
>
> *  The PetscLogGpuTimeBegin/End should be inside the viewer access
> code, not outside it. (The GPU time is an attempt to time just the kernels,
> not the other processing around the use of the kernels; that other work is
> captured in the general LogEventBegin/End.)
>
Good point

> *  The use of WaitForKokkos() is confusing and seems inconsistent.
>
I need to have a look. Until now, I have not paid much attention to kokkos
profiling.

>  -For example, it is used in VecTDot_SeqKokkos(), which I would
> think has a barrier anyway because it puts a scalar result into update?
>  -Plus, PetscLogGpuTimeBegin/End is supposed to already have a
> suitable system (that Hong added) to ensure the kernel is complete; reading
> the manual page and looking at Jacob's cupmcontext.hpp it seems to be there,
> so I don't think WaitForKokkos() is needed in most places (or is Kokkos
> asynchronous and needs this for correctness?)
> But these won't explain the strange result of overall flop rate being
> higher than GPU flop rate.
>
>   Barry
>
>
>
>
>
> On Jan 22, 2022, at 11:44 AM, Mark Adams  wrote:
>
> I am getting some funny timings and I'm trying to figure it out.
> I figure the GPU flop rates are a bit higher because the timers are inside
> of the CPU timers, but *some are a lot bigger or inverted*
>
> --- Event Stage 2: KSP Solve only
>
> MatMult  400 1.0 1.0094e+01 1.2 1.07e+11 1.0 3.7e+05 6.1e+04
> 0.0e+00  2 55 62 54  0  68 91100100  0 671849   857147  0 0.00e+000
> 0.00e+00 100
> MatView2 1.0 4.5257e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
> 0.00e+00  0
> KSPSolve   2 1.0 1.4591e+01 1.1 1.18e+11 1.0 3.7e+05 6.1e+04
> 1.2e+03  2 60 62 54 60 100100100100100 512399   804048  0 0.00e+000
> 0.00e+00 100
> SFPack   400 1.0 2.4545e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
> 0.00e+00  0
> SFUnpack 400 1.0 9.4637e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
> 0.00e+00  0
> VecTDot  802 1.0 3.0577e+00 2.1 3.36e+09 1.0 0.0e+00 0.0e+00
> 8.0e+02  0  2  0  0 40  13  3  0  0 67 *69996   488328*  0 0.00e+00
>  0 0.00e+00 100
> VecNorm  402 1.0 1.9597e+00 3.4 1.69e+09 1.0 0.0e+00 0.0e+00
> 4.0e+02  0  1  0  0 20   6  1  0  0 33 54744   571507  0 0.00e+000
> 0.00e+00 100
> VecCopy4 1.0 1.7143e-0228.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
> 0.00e+00  0
> VecSet 4 1.0 3.8051e-0316.9 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0   0  0 0.00e+000
> 0.00e+00  0
> VecAXPY  800 1.0 8.6160e-0113.6 3.36e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   6  3  0  0  0 *247787   448304*  0 0.00e+00
>0 0.00e+00 100
> VecAYPX  398 1.0 1.6831e+0031.1 1.67e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   5  1  0  0  0 63107   77030  0 0.00e+000
> 0.00e+00 100
> VecPointwiseMult 402 1.0 3.8729e-01 9.3 8.43e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   2  1  0  0  0 138502   262413  0 0.00e+000
> 0.00e+00 100
> VecScatterBegin  400 1.0 1.1947e+0035.1 0.00e+00 0.0 3.7e+05 6.1e+04
> 0.0e+00  0  0 62 54  0   5  0100100  0 0   0  0 0.00e+000
> 0.00e+00  0
> VecScatterEnd400 1.0 6.2969e+00 8.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0  10  0  0  0  0 0   0  0 0.00e+000
> 0.00e+00  0
> PCApply  402 1.0 3.8758e-01 9.3 8.43e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   2  1  0  0  0 138396   262413  0 0.00e+000
> 0.00e+00 100
>
> ---
>
>
> On Sat, Jan 22, 2022 at 11:10 AM Junchao Zhang 
> wrote:
>
>>
>>
>>
>> On Sat, Jan

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-22 Thread Junchao Zhang
On Sat, Jan 22, 2022 at 10:04 AM Mark Adams  wrote:

> Logging GPU flops should be inside of PetscLogGpuTimeBegin()/End()  right?
>
No, PetscLogGpuTime() does not know the flops of the caller.
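
In other words, the flop count is reported by the operation itself with a
separate PetscLogGpuFlops() call next to the timer. A minimal sketch of the
usual pairing follows; the function name and the kernel launch are placeholders,
and 2*n is just the standard dot-product flop estimate:

#include <petscvec.h>

/* Sketch only: a VecTDot-like device operation that logs GPU time and GPU flops. */
static PetscErrorCode VecTDot_Device_Sketch(Vec x, Vec y, PetscScalar *z)
{
  PetscErrorCode ierr;
  PetscInt       n;

  PetscFunctionBegin;
  ierr = VecGetLocalSize(x, &n);CHKERRQ(ierr);
  ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr);  /* times only the device kernel */
  /* ... launch the device dot-product kernel that fills *z here ... */
  ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
  ierr = PetscLogGpuFlops(2.0*n);CHKERRQ(ierr); /* the caller reports its own flop count */
  PetscFunctionReturn(0);
}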


>
> On Fri, Jan 21, 2022 at 9:47 PM Barry Smith  wrote:
>
>>
>>   Mark,
>>
>>   Fix the logging before you run more. It will help with seeing the true
>> disparity between the MatMult and the vector ops.
>>
>>
>> On Jan 21, 2022, at 9:37 PM, Mark Adams  wrote:
>>
>> Here is one with 2M / GPU. Getting better.
>>
>> On Fri, Jan 21, 2022 at 9:17 PM Barry Smith  wrote:
>>
>>>
>>>Matt is correct, vectors are way too small.
>>>
>>>BTW: Now would be a good time to run some of the Report I benchmarks
>>> on Crusher to get a feel for the kernel launch times and performance on
>>> VecOps.
>>>
>>>Also Report 2.
>>>
>>>   Barry
>>>
>>>
>>> On Jan 21, 2022, at 7:58 PM, Matthew Knepley  wrote:
>>>
>>> On Fri, Jan 21, 2022 at 6:41 PM Mark Adams  wrote:
>>>
 I am looking at performance of a CG/Jacobi solve on a 3D Q2 Laplacian
 (ex13) on one Crusher node (8 GPUs on 4 GPU sockets, MI250X or is it
 MI200?).
 This is with a 16M equation problem. GPU-aware MPI and non GPU-aware
 MPI are similar (mat-vec is a little faster w/o, the total is about the
 same, call it noise)

 I found that MatMult was about 3x faster using 8 cores/GPU, that is all
 64 cores on the node, than when using 1 core/GPU. With the same size
 problem of course.
 I was thinking MatMult should be faster with just one MPI process. Oh
 well, worry about that later.

 The bigger problem, and I have observed this to some extent with the
 Landau TS/SNES/GPU-solver on the V/A100s, is that the vector operations are
 expensive or crazy expensive.
 You can see (attached) and the times here that the solve is dominated
 by not-mat-vec:


 
 EventCount  Time (sec) Flop
  --- Global ---  --- Stage   *Total   GPU *   - CpuToGpu -
   - GpuToCpu - GPU
Max Ratio  Max Ratio   Max  Ratio  Mess   AvgLen
  Reduct  %T %F %M %L %R  %T %F %M %L %R *Mflop/s Mflop/s* Count   Size
   Count   Size  %F

 ---
 17:15 main=
 /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ grep "MatMult
  400" jac_out_00*5_8_gpuawaremp*
 MatMult  400 1.0 *1.2507e+00* 1.3 1.34e+10 1.1 3.7e+05
 1.6e+04 0.0e+00  1 55 62 54  0  27 91100100  0 *668874   0*  0
 0.00e+000 0.00e+00 100
 17:15 main=
 /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ grep "KSPSolve
   2" jac_out_001*_5_8_gpuawaremp*
 KSPSolve   2 1.0 *4.4173e+00* 1.0 1.48e+10 1.1 3.7e+05
 1.6e+04 1.2e+03  4 60 62 54 61 100100100100100 *208923   1094405*
  0 0.00e+000 0.00e+00 100

 Notes about the flop counters here:
 * MatMult flops are not logged as GPU flops, but something is
 logged nonetheless.
 * The GPU flop rate is 5x the total flop rate  in KSPSolve :\
 * I think these nodes have an FP64 peak flop rate of 200 Tflops, so we
 are at < 1%.

>>>
>>> This looks complicated, so just a single remark:
>>>
>>> My understanding of the benchmarking of vector ops led by Hannah was
>>> that you needed to be much
>>> bigger than 16M to hit peak. I need to get the tech report, but on 8
>>> GPUs I would think you would be
>>> at 10% of peak or something right off the bat at these sizes. Barry, is
>>> that right?
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
 Anyway, not sure how to proceed but I thought I would share.
 Maybe ask the Kokkos guys if they have looked at Crusher.

 Mark

>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> 
>>>
>>>
>>> 
>>
>>
>>


Re: [petsc-dev] Kokkos/Crusher performance

2022-01-21 Thread Junchao Zhang
On Fri, Jan 21, 2022 at 8:08 PM Barry Smith  wrote:

>
>   Junchao, Mark,
>
>  Some of the logging information is nonsensical: MatMult says all
> flops are done on the GPU (last column), but the GPU flop rate is zero.
>
>  It looks like MatMult_SeqAIJKokkos() is missing
> PetscLogGpuTimeBegin()/End(); in fact, all the operations in
> aijkok.kokkos.cxx seem to be missing it. This might explain the crazy 0 GPU
> flop rate. Can this be fixed ASAP?
>
I will add this profiling temporarily.  I may use Kokkos' own profiling APIs
later.
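
A rough sketch of where those calls would sit in a MatMult-style routine; the
function below is a placeholder, not the actual aijkok.kokkos.cxx code, and the
real version launches the KokkosSparse kernel where the comment is:

#include <petscmat.h>

/* Sketch only: wrap just the device kernel with the GPU timer and report its flops. */
static PetscErrorCode MatMult_SeqAIJKokkos_Sketch(Mat A, Vec xin, Vec yout)
{
  PetscErrorCode ierr;
  MatInfo        info;

  PetscFunctionBegin;
  ierr = MatGetInfo(A, MAT_LOCAL, &info);CHKERRQ(ierr);
  ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr);
  /* ... KokkosSparse spmv kernel computing yout = A*xin goes here ... */
  ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
  ierr = PetscLogGpuFlops(2.0*info.nz_used);CHKERRQ(ierr); /* ~2 flops per stored nonzero */
  PetscFunctionReturn(0);
}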


>
>  Regarding VecOps, it sure looks like the kernel launches are killing
> performance.
>
>    But in particular, look at the VecTDot and VecNorm CPU flop
> rates compared to the GPU rates; they are much lower, which tells me the
> MPI_Allreduce is likely also hurting performance there a great deal. It
> would be good to see a single-MPI-rank job to compare, to see performance
> without the MPI overhead.
>
>
>
>
>
>
>
> On Jan 21, 2022, at 6:41 PM, Mark Adams  wrote:
>
> I am looking at performance of a CG/Jacobi solve on a 3D Q2 Laplacian
> (ex13) on one Crusher node (8 GPUs on 4 GPU sockets, MI250X or is it
> MI200?).
> This is with a 16M equation problem. GPU-aware MPI and non GPU-aware MPI
> are similar (mat-vec is a little faster w/o, the total is about the same,
> call it noise)
>
> I found that MatMult was about 3x faster using 8 cores/GPU, that is all 64
> cores on the node, than when using 1 core/GPU. With the same size problem
> of course.
> I was thinking MatMult should be faster with just one MPI process. Oh
> well, worry about that later.
>
> The bigger problem, and I have observed this to some extent with the
> Landau TS/SNES/GPU-solver on the V/A100s, is that the vector operations are
> expensive or crazy expensive.
> You can see (attached) and the times here that the solve is dominated by
> not-mat-vec:
>
>
> 
> EventCount  Time (sec) Flop
>--- Global ---  --- Stage   *Total   GPU *   - CpuToGpu -   -
> GpuToCpu - GPU
>Max Ratio  Max Ratio   Max  Ratio  Mess   AvgLen
>  Reduct  %T %F %M %L %R  %T %F %M %L %R *Mflop/s Mflop/s* Count   Size
> Count   Size  %F
>
> ---
> 17:15 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$
> grep "MatMult  400" jac_out_00*5_8_gpuawaremp*
> MatMult  400 1.0 *1.2507e+00* 1.3 1.34e+10 1.1 3.7e+05
> 1.6e+04 0.0e+00  1 55 62 54  0  27 91100100  0 *668874   0*  0
> 0.00e+000 0.00e+00 100
> 17:15 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$
> grep "KSPSolve   2" jac_out_001*_5_8_gpuawaremp*
> KSPSolve   2 1.0 *4.4173e+00* 1.0 1.48e+10 1.1 3.7e+05
> 1.6e+04 1.2e+03  4 60 62 54 61 100100100100100 *208923   1094405*  0
> 0.00e+000 0.00e+00 100
>
> Notes about the flop counters here:
> * MatMult flops are not logged as GPU flops, but something is logged
> nonetheless.
> * The GPU flop rate is 5x the total flop rate  in KSPSolve :\
> * I think these nodes have an FP64 peak flop rate of 200 Tflops, so we are
> at < 1%.
>
> Anyway, not sure how to proceed but I thought I would share.
> Maybe ask the Kokkos guys if they have looked at Crusher.
>
> Mark
>
>
> 
>
>
>


Re: [petsc-dev] Gitlab workflow discussion with GitLab developers

2022-01-20 Thread Junchao Zhang
*  Email notification when one is mentioned or added as a reviewer
*  Color text in comment box
*  Click a failed job, run the job with the *updated* branch
*  Allow one to reorder commits (e.g., the fixup commits generated from
applying review comments) and mark commits that should be fixed up
*  Easily retarget a branch, e.g., from main to release (currently I have
to check out the branch on my local machine, rebase, then push)

--Junchao Zhang


On Thu, Jan 20, 2022 at 7:05 PM Barry Smith  wrote:

>
>   I got asked to go over some of my Gitlab workflow uses next week with
> some Gitlab developers; they do this to understand how Gitlab is used, how
> it can be improved etc.
>
>   If anyone has ideas on topics I should hit, let me know. I will hit them
> on the brokenness of appropriate code owners not being automatically added
> as reviewers, on support for people outside of the PETSc group to set more
> things when they make MRs, and on being able to easily add non-PETSc folks
> as reviewers.
>
>   Barry
>
>


Re: [petsc-dev] cuda with kokkos-cuda build fail

2022-01-07 Thread Junchao Zhang
On Fri, Jan 7, 2022 at 11:17 AM Mark Adams  wrote:

> These are cuda/cusparse tests. The Kokkos versions are fine and cusparse
> w/o a Kokkos build is fine.
>
> I do have some #ifdefs in the code. Maybe something snuck into the #ifdef
> KOKKOS, but I can't imagine what that could even be.
>
> I have had problems with very large "cuda" jobs (on Summit with 21 MPI
> processes per GPU) running out of "resources" with a Kokkos build, which
> went away with a pure CUDA build (i.e., w/o Kokkos), but these are tiny tests.
>
If Kokkos is initialized on every MPI rank, then each rank will consume
resources on the GPU.

>
> I will try it again.
>
> Thanks,
>
>
> On Fri, Jan 7, 2022 at 12:06 PM Junchao Zhang 
> wrote:
>
>> It failed when you did not even pass any vec/mat kokkos options?  That does
>> not make sense; you need to double-check it.
>> --Junchao Zhang
>>
>>
>> On Thu, Jan 6, 2022 at 9:33 PM Mark Adams  wrote:
>>
>>> I seem to have a regression with using aijcusparse in a kokkos build.
>>> It's OK with a straight CUDA build.
>>>
>>> # [0]PETSC ERROR: - Error Message
>>> --
>>> # [0]PETSC ERROR: GPU error
>>> # [0]PETSC ERROR: cuBLAS error 13 (CUBLAS_STATUS_EXECUTION_FAILED)
>>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>>> shooting.
>>> # [0]PETSC ERROR: Petsc Development GIT revision:
>>> v3.16.3-511-g96172674f3  GIT Date: 2022-01-06 23:44:32 +
>>> # [0]PETSC ERROR:
>>> /global/u2/m/madams/petsc_install/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/tests/ts/utils/dmplexlandau/tutorials/runex1_cuda/../ex1
>>> on a arch-perlmutter-opt-gcc-kokkos-cuda named nid003188 by madams Thu Jan
>>>  6 19:29:06 2022
>>> # [0]PETSC ERROR: Configure options --CFLAGS="   -g -DLANDAU_DIM=2
>>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2
>>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler
>>> -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
>>> --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91
>>> --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
>>> --COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3"
>>> --with-debugging=0 --download-metis --download-parmetis --with-cuda=1
>>> --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1
>>> --with-zlib=1 --download-kokkos --download-kokkos-kernels
>>> --with-kokkos-kernels-tpl=0 --with-make-np=8
>>> PETSC_DIR=/global/homes/m/madams/petsc_install/petsc
>>> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
>>> # [0]PETSC ERROR: #1 VecNorm_SeqCUDA() at
>>> /global/u2/m/madams/petsc_install/petsc/src/vec/vec/impls/seq/seqcuda/
>>> veccuda2.cu:994
>>> # [0]PETSC ERROR: #2 VecNorm() at
>>> /global/u2/m/madams/petsc_install/petsc/src/vec/vec/interface/rvector.c:228
>>> # [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() at
>>> /global/u2/m/madams/petsc_install/petsc/src/snes/impls/ls/ls.c:179
>>> # [0]PETSC ERROR: #4 SNESSolve() at
>>> /global/u2/m/madams/petsc_install/petsc/src/snes/interface/snes.c:4810
>>> # [0]PETSC ERROR: #5 TSStep_ARKIMEX() at
>>> /global/u2/m/madams/petsc_install/petsc/src/ts/impls/arkimex/arkimex.c:845
>>> # [0]PETSC ERROR: #6 TSStep() at
>>> /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3572
>>> # [0]PETSC ERROR: #7 TSSolve() at
>>> /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3971
>>> # [0]PETSC ERROR: #8 main() at
>>> /global/u2/m/madams/petsc_install/petsc/src/ts/utils/dmplexlandau/tutorials/ex1.c:45
>>> # [0]PETSC ERROR: PETSc Option Table entries:
>>> # [0]PETSC ERROR: -check_pointer_intensity 0
>>> # [0]PETSC ERROR: -dm_landau_amr_levels_max 2,1
>>> # [0]PETSC ERROR: -dm_landau_device_type cuda
>>> # [0]PETSC ERROR: -dm_landau_ion_charges 1,18
>>> # [0]PETSC ERROR: -dm_landau_ion_masses 2,4
>>> # [0]PETSC ERROR: -dm_landau_n 1.00018,1,1e-5
>>> # [0]PETSC ERROR: -dm_landau_n_0 1e20
>>> # [0]PETSC ERROR: -dm_landau_num_species_grid 1,2
>>> # [0]PETSC ERROR: -dm_landau_thermal_temps 5,5,.5
>>> # [0]PETSC ERROR: -dm_landau_type p4est
>>> # [0]PETSC ERROR: -dm_mat_type aijcusparse
>>> # [0]PETSC ERROR: -dm_preallocate_only false
>>> # [0]PETSC ERROR: -dm_vec_type cuda
>>> # [0]PETSC ERROR: -error_output

Re: [petsc-dev] cuda with kokkos-cuda build fail

2022-01-07 Thread Junchao Zhang
It failed when you did not even pass any vec/mat kokkos options?  That does
not make sense; you need to double-check it.
--Junchao Zhang


On Thu, Jan 6, 2022 at 9:33 PM Mark Adams  wrote:

> I seem to have a regression with using aijcusparse in a kokkos build. It's
> OK with a straight CUDA build.
>
> # [0]PETSC ERROR: - Error Message
> --
> # [0]PETSC ERROR: GPU error
> # [0]PETSC ERROR: cuBLAS error 13 (CUBLAS_STATUS_EXECUTION_FAILED)
> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> # [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-511-g96172674f3
>  GIT Date: 2022-01-06 23:44:32 +
> # [0]PETSC ERROR:
> /global/u2/m/madams/petsc_install/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/tests/ts/utils/dmplexlandau/tutorials/runex1_cuda/../ex1
> on a arch-perlmutter-opt-gcc-kokkos-cuda named nid003188 by madams Thu Jan
>  6 19:29:06 2022
> # [0]PETSC ERROR: Configure options --CFLAGS="   -g -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler
> -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
> --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91
> --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
> --COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3"
> --with-debugging=0 --download-metis --download-parmetis --with-cuda=1
> --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1
> --with-zlib=1 --download-kokkos --download-kokkos-kernels
> --with-kokkos-kernels-tpl=0 --with-make-np=8
> PETSC_DIR=/global/homes/m/madams/petsc_install/petsc
> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
> # [0]PETSC ERROR: #1 VecNorm_SeqCUDA() at
> /global/u2/m/madams/petsc_install/petsc/src/vec/vec/impls/seq/seqcuda/
> veccuda2.cu:994
> # [0]PETSC ERROR: #2 VecNorm() at
> /global/u2/m/madams/petsc_install/petsc/src/vec/vec/interface/rvector.c:228
> # [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() at
> /global/u2/m/madams/petsc_install/petsc/src/snes/impls/ls/ls.c:179
> # [0]PETSC ERROR: #4 SNESSolve() at
> /global/u2/m/madams/petsc_install/petsc/src/snes/interface/snes.c:4810
> # [0]PETSC ERROR: #5 TSStep_ARKIMEX() at
> /global/u2/m/madams/petsc_install/petsc/src/ts/impls/arkimex/arkimex.c:845
> # [0]PETSC ERROR: #6 TSStep() at
> /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3572
> # [0]PETSC ERROR: #7 TSSolve() at
> /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3971
> # [0]PETSC ERROR: #8 main() at
> /global/u2/m/madams/petsc_install/petsc/src/ts/utils/dmplexlandau/tutorials/ex1.c:45
> # [0]PETSC ERROR: PETSc Option Table entries:
> # [0]PETSC ERROR: -check_pointer_intensity 0
> # [0]PETSC ERROR: -dm_landau_amr_levels_max 2,1
> # [0]PETSC ERROR: -dm_landau_device_type cuda
> # [0]PETSC ERROR: -dm_landau_ion_charges 1,18
> # [0]PETSC ERROR: -dm_landau_ion_masses 2,4
> # [0]PETSC ERROR: -dm_landau_n 1.00018,1,1e-5
> # [0]PETSC ERROR: -dm_landau_n_0 1e20
> # [0]PETSC ERROR: -dm_landau_num_species_grid 1,2
> # [0]PETSC ERROR: -dm_landau_thermal_temps 5,5,.5
> # [0]PETSC ERROR: -dm_landau_type p4est
> # [0]PETSC ERROR: -dm_mat_type aijcusparse
> # [0]PETSC ERROR: -dm_preallocate_only false
> # [0]PETSC ERROR: -dm_vec_type cuda
> # [0]PETSC ERROR: -error_output_stdout
> # [0]PETSC ERROR: -ksp_type preonly
> # [0]PETSC ERROR: -malloc_dump
> # [0]PETSC ERROR: -mat_cusparse_use_cpu_solve
> # [0]PETSC ERROR: -nox
> # [0]PETSC ERROR: -nox_warning
> # [0]PETSC ERROR: -pc_type lu
> # [0]PETSC ERROR: -petscspace_degree 3
> # [0]PETSC ERROR: -petscspace_poly_tensor 1
> # [0]PETSC ERROR: -snes_converged_reason
> # [0]PETSC ERROR: -snes_monitor
> # [0]PETSC ERROR: -snes_rtol 1.e-14
> # [0]PETSC ERROR: -snes_stol 1.e-14
> # [0]PETSC ERROR: -ts_adapt_clip .5,1.25
> # [0]PETSC ERROR: -ts_adapt_scale_solve_failed 0.75
> # [0]PETSC ERROR: -ts_adapt_time_step_increase_delay 5
> # [0]PETSC ERROR: -ts_arkimex_type 1bee
> # [0]PETSC ERROR: -ts_dt 1.e-1
> # [0]PETSC ERROR: -ts_max_snes_failures -1
> # [0]PETSC ERROR: -ts_max_steps 1
> # [0]PETSC ERROR: -ts_max_time 1
> # [0]PETSC ERROR: -ts_monitor
> # [0]PETSC ERROR: -ts_rtol 1e-1
> # [0]PETSC ERROR: -ts_type arkimex
> # [0]PETSC ERROR: -use_gpu_aware_mpi 0
> # [0]PETSC ERROR: End of Error Message ---send entire
> error message to petsc-ma...@mcs.anl.gov--
>


Re: [petsc-dev] Questions around benchmarking and data loading with PETSc

2021-12-11 Thread Junchao Zhang
I expected TACO to be better, since its website says "It uses novel compiler
techniques to get performance competitive with hand-optimized kernels".

--Junchao Zhang


On Sat, Dec 11, 2021 at 5:56 PM Rohan Yadav  wrote:

> Sorry, what’s surprising about this? 40 MPI ranks on a single node should
> give similar performance to 40 threads. Both PETSc and TACO are using a
> row-based parallelism strategy, so it should line up.
>
> Rohan Yadav
>
> On Dec 11, 2021, at 6:44 PM, Junchao Zhang 
> wrote:
>
> 
>
> On Sat, Dec 11, 2021 at 5:09 PM Rohan Yadav  wrote:
>
>> > Did you mean with 1 rank or 40 mpi ranks, petsc's performance is close
>> to 1 thread or 40 threads of TACO?
>>
>> The 1 rank time is the same as taco 1 thread, and the 40 rank time is the
>> same as taco 40 threads.
>>
> Interesting. TACO is supposed to give an optimized SpMV.
>
>
>>
>> Rohan
>>
>> On Sat, Dec 11, 2021 at 6:07 PM Junchao Zhang 
>> wrote:
>>
>>>
>>>
>>> On Sat, Dec 11, 2021, 4:22 PM Rohan Yadav  wrote:
>>>
>>>> Thanks all for the help, the main problem was the lack of optimization
>>>> flags in the default build provided by my system. A manual installation
>>>> with optimization flags delivers performance equal to the single node
>>>> benchmark I discussed before.
>>>>
>>> Did you mean with 1 rank or 40 mpi ranks, petsc's performance is close
>>> to 1 thread or 40 threads of TACO?
>>>
>>>>
>>>> Rohan
>>>>
>>>> On Sat, Dec 11, 2021 at 4:04 PM Rohan Yadav 
>>>> wrote:
>>>>
>>>>> > The matrix market file in text format is not good for load.  One
>>>>> should convert it to petsc binary format (only once), and use the new
>>>>> binary file  afterwards.
>>>>>
>>>>> Yes, I understand this. The point I'm trying to make is that using
>>>>> PETSc to even perform the initial conversion from matrix market to the
>>>>> binary format was prohibitively slow using `MatSetValues`.
>>>>>
>>>>> > I meant 10 lines of code without any function call, which can be
>>>>> thought of as a textbook implementation of SpMV. As a baseline, one can
>>>>> apply optimizations to it.  PETSc does not do sophisticated sparse matrix
>>>>> optimization itself, instead it relies on third-party libraries.  I
>>>>> remember we had OSKI from Berkeley for CPU, and on GPU we use cuSparse,
>>>>> hipSparse, MKLSparse or Kokkos-Kernels. If TACO is good, then petsc can 
>>>>> add
>>>>> an interface to it too.
>>>>>
>>>>> Yes, this is what I expected. Given that PETSc uses high-performance
>>>>> kernels for the sparse matrix operation itself, I was surprised that
>>>>> the single-thread performance of PETSc was not closer to a baseline
>>>>> like TACO. This performance will likely improve when I compile PETSc
>>>>> with optimization flags.
>>>>>
>>>>> Rohan
>>>>>
>>>>> On Sat, Dec 11, 2021 at 1:04 PM Junchao Zhang 
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Dec 11, 2021 at 10:28 AM Rohan Yadav 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Junchao,
>>>>>>>
>>>>>>> Thanks for the response!
>>>>>>>
>>>>>>> > You can use https://petsc.org/main/src/mat/tests/ex72.c.html to
>>>>>>> convert a Matrix Market file into a petsc binary file. And then in
>>>>>>> your test, load the binary matrix, following this example
>>>>>>> https://petsc.org/main/src/mat/tutorials/ex1.c.html
>>>>>>>
>>>>>>> I tried an example like this, but the performance was too slow (it
>>>>>>> would process ~2000-3000 calls to `SetValue` a second), which is not
>>>>>>> reasonable for loading matrices with millions of non-zeros.
>>>>>>>
>>>>>> The matrix market file in text format is not good for load.  One
>>>>>> should convert it to petsc binary format (only once), and use the new
>>>>>> binary file  afterwards.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> > I

Re: [petsc-dev] Questions around benchmarking and data loading with PETSc

2021-12-11 Thread Junchao Zhang
On Sat, Dec 11, 2021 at 5:09 PM Rohan Yadav  wrote:

> > Did you mean with 1 rank or 40 mpi ranks, petsc's performance is close
> to 1 thread or 40 threads of TACO?
>
> The 1 rank time is the same as taco 1 thread, and the 40 rank time is the
> same as taco 40 threads.
>
Interesting. TACO is supposed to give an optimized SpMV.


>
> Rohan
>
> On Sat, Dec 11, 2021 at 6:07 PM Junchao Zhang 
> wrote:
>
>>
>>
>> On Sat, Dec 11, 2021, 4:22 PM Rohan Yadav  wrote:
>>
>>> Thanks all for the help, the main problem was the lack of optimization
>>> flags in the default build provided by my system. A manual installation
>>> with optimization flags delivers performance equal to the single node
>>> benchmark I discussed before.
>>>
>> Did you mean with 1 rank or 40 mpi ranks, petsc's performance is close to
>> 1 thread or 40 threads of TACO?
>>
>>>
>>> Rohan
>>>
>>> On Sat, Dec 11, 2021 at 4:04 PM Rohan Yadav 
>>> wrote:
>>>
>>>> > The matrix market file in text format is not good for load.  One
>>>> should convert it to petsc binary format (only once), and use the new
>>>> binary file  afterwards.
>>>>
>>>> Yes, I understand this. The point I'm trying to make is that using
>>>> PETSc to even perform the initial conversion from matrix market to the
>>>> binary format was prohibitively slow using `MatSetValues`.
>>>>
>>>> > I meant 10 lines of code without any function call, which can be
>>>> thought of as a textbook implementation of SpMV. As a baseline, one can
>>>> apply optimizations to it.  PETSc does not do sophisticated sparse matrix
>>>> optimization itself, instead it relies on third-party libraries.  I
>>>> remember we had OSKI from Berkeley for CPU, and on GPU we use cuSparse,
>>>> hipSparse, MKLSparse or Kokkos-Kernels. If TACO is good, then petsc can add
>>>> an interface to it too.
>>>>
>>>> Yes, this is what I expected. Given that PETSc uses high-performance
>>>> kernels for the sparse matrix operation itself, I was surprised that
>>>> the single-thread performance of PETSc was not closer to a baseline like
>>>> TACO. This performance will likely improve when I compile PETSc with
>>>> optimization flags.
>>>>
>>>> Rohan
>>>>
>>>> On Sat, Dec 11, 2021 at 1:04 PM Junchao Zhang 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Dec 11, 2021 at 10:28 AM Rohan Yadav 
>>>>> wrote:
>>>>>
>>>>>> Hi Junchao,
>>>>>>
>>>>>> Thanks for the response!
>>>>>>
>>>>>> > You can use https://petsc.org/main/src/mat/tests/ex72.c.html to
>>>>>> convert a Matrix Market file into a petsc binary file. And then in
>>>>>> your test, load the binary matrix, following this example
>>>>>> https://petsc.org/main/src/mat/tutorials/ex1.c.html
>>>>>>
>>>>>> I tried an example like this, but the performance was too slow (it
>>>>>> would process ~2000-3000 calls to `SetValue` a second), which is not
>>>>>> reasonable for loading matrices with millions of non-zeros.
>>>>>>
>>>>> The matrix market file in text format is not good for load.  One
>>>>> should convert it to petsc binary format (only once), and use the new
>>>>> binary file  afterwards.
>>>>>
>>>>>
>>>>>>
>>>>>> > I don't know what "No Races" means, but it seems you'd better also
>>>>>> verify the result of SpMV.
>>>>>>
>>>>>> This is a correct implementation of SpMV. The no-races is fine as it
>>>>>> parallelizes over the rows of the matrix, and thus does not need
>>>>>> synchronization between writes to the output.
>>>>>>
>>>>>> > You can think petsc's default CSR spmv is the baseline,  which is
>>>>>> done in ~10 lines of code.
>>>>>>
>>>>>> I'm sorry, but I don't think that is a reasonable statement w.r.t.
>>>>>> the lines of code making it a good baseline. The TACO compiler also can 
>>>>>> be
>>>>>> used in 10 lines of code to compute an SpMV, or any other 
>>>>>> state

Re: [petsc-dev] Questions around benchmarking and data loading with PETSc

2021-12-11 Thread Junchao Zhang
On Sat, Dec 11, 2021, 4:22 PM Rohan Yadav  wrote:

> Thanks all for the help, the main problem was the lack of optimization
> flags in the default build provided by my system. A manual installation
> with optimization flags delivers performance equal to the single node
> benchmark I discussed before.
>
Did you mean that with 1 rank or 40 MPI ranks, petsc's performance is close
to that of 1 thread or 40 threads of TACO, respectively?

>
> Rohan
>
> On Sat, Dec 11, 2021 at 4:04 PM Rohan Yadav  wrote:
>
>> > The matrix market file in text format is not good for load.  One should
>> convert it to petsc binary format (only once), and use the new binary file
>> afterwards.
>>
>> Yes, I understand this. The point I'm trying to make is that using PETSc
>> to even perform the initial conversion from matrix market to the binary
>> format was prohibitively slow using `MatSetValues`.
>>
>> > I meant 10 lines of code without any function call, which can be
>> thought of as a textbook implementation of SpMV. As a baseline, one can
>> apply optimizations to it.  PETSc does not do sophisticated sparse matrix
>> optimization itself, instead it relies on third-party libraries.  I
>> remember we had OSKI from Berkeley for CPU, and on GPU we use cuSparse,
>> hipSparse, MKLSparse or Kokkos-Kernels. If TACO is good, then petsc can add
>> an interface to it too.
>>
>> Yes, this is what I expected. Given that PETSc uses high-performance
>> kernels for the sparse matrix operation itself, I was surprised that
>> the single-thread performance of PETSc was not closer to a baseline like
>> TACO. This performance will likely improve when I compile PETSc with
>> optimization flags.
>>
>> Rohan
>>
>> On Sat, Dec 11, 2021 at 1:04 PM Junchao Zhang 
>> wrote:
>>
>>>
>>>
>>>
>>> On Sat, Dec 11, 2021 at 10:28 AM Rohan Yadav 
>>> wrote:
>>>
>>>> Hi Junchao,
>>>>
>>>> Thanks for the response!
>>>>
>>>> > You can use https://petsc.org/main/src/mat/tests/ex72.c.html to
>>>> convert a Matrix Market file into a petsc binary file. And then in
>>>> your test, load the binary matrix, following this example
>>>> https://petsc.org/main/src/mat/tutorials/ex1.c.html
>>>>
>>>> I tried an example like this, but the performance was too slow (it
>>>> would process ~2000-3000 calls to `SetValue` a second), which is not
>>>> reasonable for loading matrices with millions of non-zeros.
>>>>
>>> The matrix market file in text format is not good for load.  One should
>>> convert it to petsc binary format (only once), and use the new binary file
>>> afterwards.
>>>
>>>
>>>>
>>>> > I don't know what "No Races" means, but it seems you'd better also
>>>> verify the result of SpMV.
>>>>
>>>> This is a correct implementation of SpMV. The no-races is fine as it
>>>> parallelizes over the rows of the matrix, and thus does not need
>>>> synchronization between writes to the output.
>>>>
>>>> > You can think petsc's default CSR spmv is the baseline,  which is
>>>> done in ~10 lines of code.
>>>>
>>>> I'm sorry, but I don't think that is a reasonable statement w.r.t.
>>>> the lines of code making it a good baseline. The TACO compiler also can be
>>>> used in 10 lines of code to compute an SpMV, or any other state-of-the-art
>>>> library could wrap an SpMV implementation behind a single function call.
>>>> I'm wondering if this performance I'm seeing using PETSc is expected, or if
>>>> I've misconfigured or am misusing the system in some way.
>>>>
>>> I meant 10 lines of code without any function call, which can be thought
>>> of as a textbook implementation of SpMV. As a baseline, one can apply
>>> optimizations to it.  PETSc does not do sophisticated sparse matrix
>>> optimization itself, instead it relies on third-party libraries.  I
>>> remember we had OSKI from Berkeley for CPU, and on GPU we use cuSparse,
>>> hipSparse, MKLSparse or Kokkos-Kernels. If TACO is good, then petsc can add
>>> an interface to it too.
>>>
>>>
>>>> Rohan
>>>>
>>>>
>>>> On Fri, Dec 10, 2021 at 11:39 PM Junchao Zhang 
>>>> wrote:
>>>>
>>>>> On Fri, Dec 10, 2021 at 8:05 PM Rohan Yadav 
>>>>> wrote:
>>>>>
>>>>>> Hi, I’m Ro

Re: [petsc-dev] Questions around benchmarking and data loading with PETSc

2021-12-11 Thread Junchao Zhang
On Sat, Dec 11, 2021 at 10:28 AM Rohan Yadav  wrote:

> Hi Junchao,
>
> Thanks for the response!
>
> > You can use https://petsc.org/main/src/mat/tests/ex72.c.html to convert
> a Matrix Market file into a petsc binary file. And then in your test,
> load the binary matrix, following this example
> https://petsc.org/main/src/mat/tutorials/ex1.c.html
>
> I tried an example like this, but the performance was too slow (it would
> process ~2000-3000 calls to `SetValue` a second), which is not reasonable
> for loading matrices with millions of non-zeros.
>
The matrix market file in text format is not good for loading.  One should
convert it to petsc binary format (only once), and use the new binary file
afterwards.
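
For the load step itself, a minimal sketch of reading an already-converted
binary file; the file name is just the one from the benchmark above, and error
handling is trimmed:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* open the binary file produced once by the Matrix Market converter */
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "arabic-2005.petsc", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);   /* e.g. -mat_type aij */
  ierr = MatLoad(A, viewer);CHKERRQ(ierr);     /* fast parallel load of the binary matrix */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  /* ... time MatMult on A here ... */
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}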


>
> > I don't know what "No Races" means, but it seems you'd better also
> verify the result of SpMV.
>
> This is a correct implementation of SpMV. The no-races is fine as it
> parallelizes over the rows of the matrix, and thus does not need
> synchronization between writes to the output.
>
> > You can think petsc's default CSR spmv is the baseline,  which is done
> in ~10 lines of code.
>
> I'm sorry, but I don't think that is a reasonable statement w.r.t. the
> lines of code making it a good baseline. The TACO compiler also can be used
> in 10 lines of code to compute an SpMV, or any other state-of-the-art
> library could wrap an SpMV implementation behind a single function call.
> I'm wondering if this performance I'm seeing using PETSc is expected, or if
> I've misconfigured or am misusing the system in some way.
>
I meant 10 lines of code without any function call, which can be thought of
as a textbook implementation of SpMV. As a baseline, one can apply
optimizations to it.  PETSc does not do sophisticated sparse matrix
optimization itself; instead, it relies on third-party libraries.  I
remember we had OSKI from Berkeley for CPU, and on GPU we use cuSparse,
hipSparse, MKLSparse, or Kokkos-Kernels. If TACO is good, then petsc can add
an interface to it too.
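
For reference, the kind of textbook CSR kernel being referred to, as a sketch
assuming the usual rowptr/colidx/val arrays are already filled:

/* y = A*x for an m-row CSR matrix; rowptr has m+1 entries */
void csr_spmv(int m, const int *rowptr, const int *colidx,
              const double *val, const double *x, double *y)
{
  for (int i = 0; i < m; i++) {
    double sum = 0.0;
    for (int k = rowptr[i]; k < rowptr[i+1]; k++) sum += val[k] * x[colidx[k]];
    y[i] = sum;
  }
}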


> Rohan
>
>
> On Fri, Dec 10, 2021 at 11:39 PM Junchao Zhang 
> wrote:
>
>> On Fri, Dec 10, 2021 at 8:05 PM Rohan Yadav 
>> wrote:
>>
>>> Hi, I’m Rohan, a student working on compilation techniques for
>>> distributed tensor computations. I’m looking at using PETSc as a baseline
>>> for experiments I’m running, and want to understand if I’m using PETSc as
>>> it was intended to achieve high performance, and if the performance I’m
>>> seeing is expected. Currently, I’m just looking at SpMV operations.
>>>
>>>
>>> My experiments are run on the Lassen Supercomputer (
>>> https://hpc.llnl.gov/hardware/platforms/lassen). The system has 40
>>> CPUs, 4 V100s and an Infiniband interconnect. A visualization of the
>>> architecture is here:
>>> https://hpc.llnl.gov/sites/default/files/power9-AC922systemDiagram2_1.png
>>> .
>>>
>>>
>>> As of now, I’m trying to understand the single-node performance of
>>> PETSc, as the scaling performance onto multiple nodes appears to be as I
>>> expect. I’m using the arabic-2005 sparse matrix from the SuiteSparse matrix
>>> collection, detailed here: https://sparse.tamu.edu/LAW/arabic-2005. As
>>> a trusted baseline, I am comparing against SpMV code generated by the TACO
>>> compiler (
>>> http://tensor-compiler.org/codegen.html?expr=y(i)%20=%20A(i,j)%20*%20x(j)=y:d:0;A:ds:0,1;x:d:0=split:i:i0:i1:32;reorder:i0:i1:j;parallelize:i0:CPU%20Thread:No%20Races)
>>> .
>>>
>> I don't know what "No Races" means, but it seems you'd better also verify
>> the result of SpMV.
>>
>>>
>>> My experiments find that PETSc is roughly 4 times slower on a single
>>> thread and node than the kernel generated by TACO:
>>>
>>>
>>> PETSc: 1 Thread: 5694.72 ms, 1 Node 40 threads: 262.6 ms.
>>>
>>> TACO: 1 Thread: 1341 ms, 1 Node 40 threads: 86 ms.
>>>
>> You can think petsc's default CSR spmv is the baseline,  which is done in
>> ~10 lines of code.
>>
>>>
>>> My code using PETSc is here:
>>> https://github.com/rohany/taco/blob/9e0e30b16bfba5319b15b2d1392f35376952f838/petsc/benchmark.cpp#L38
>>> .
>>>
>>>
>>> Runs from 1 thread and 1 node with -log_view are attached to the email.
>>> The command lines for each were as follows:
>>>
>>>
>>> 1 node 1 thread: `jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20
>>> -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`
>>>
>>> 1 node 40 threads: `jsrun -n 40 -c 1 -r 

Re: [petsc-dev] Questions around benchmarking and data loading with PETSc

2021-12-10 Thread Junchao Zhang
On Fri, Dec 10, 2021 at 8:05 PM Rohan Yadav  wrote:

> Hi, I’m Rohan, a student working on compilation techniques for distributed
> tensor computations. I’m looking at using PETSc as a baseline for
> experiments I’m running, and want to understand if I’m using PETSc as it
> was intended to achieve high performance, and if the performance I’m seeing
> is expected. Currently, I’m just looking at SpMV operations.
>
>
> My experiments are run on the Lassen Supercomputer (
> https://hpc.llnl.gov/hardware/platforms/lassen). The system has 40 CPUs,
> 4 V100s and an Infiniband interconnect. A visualization of the architecture
> is here:
> https://hpc.llnl.gov/sites/default/files/power9-AC922systemDiagram2_1.png.
>
>
> As of now, I’m trying to understand the single-node performance of PETSc,
> as the scaling performance onto multiple nodes appears to be as I expect.
> I’m using the arabic-2005 sparse matrix from the SuiteSparse matrix
> collection, detailed here: https://sparse.tamu.edu/LAW/arabic-2005. As a
> trusted baseline, I am comparing against SpMV code generated by the TACO
> compiler (
> http://tensor-compiler.org/codegen.html?expr=y(i)%20=%20A(i,j)%20*%20x(j)=y:d:0;A:ds:0,1;x:d:0=split:i:i0:i1:32;reorder:i0:i1:j;parallelize:i0:CPU%20Thread:No%20Races)
> .
>
I don't know what "No Races" means, but it seems you'd better also verify
the result of SpMV.

>
> My experiments find that PETSc is roughly 4 times slower on a single
> thread and node than the kernel generated by TACO:
>
>
> PETSc: 1 Thread: 5694.72 ms, 1 Node 40 threads: 262.6 ms.
>
> TACO: 1 Thread: 1341 ms, 1 Node 40 threads: 86 ms.
>
You can think of petsc's default CSR spmv as the baseline, which is done in
~10 lines of code.

>
> My code using PETSc is here:
> https://github.com/rohany/taco/blob/9e0e30b16bfba5319b15b2d1392f35376952f838/petsc/benchmark.cpp#L38
> .
>
>
> Runs from 1 thread and 1 node with -log_view are attached to the email.
> The command lines for each were as follows:
>
>
> 1 node 1 thread: `jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup
> 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`
>
> 1 node 40 threads: `jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20
> -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`
>
>
>
> In addition to these benchmarking concerns, I wanted to share my
> experiences trying to load data from Matrix Market files into PETSc, which
> ended up being much more difficult than I anticipated. Essentially, trying
> to iterate through the Matrix Market files and using `write` to insert
> entries into a `Mat` was extremely slow. In order to get reasonable
> performance, I had to use an external utility to basically construct a CSR
> matrix, and then pass the arrays from the CSR Matrix into
> `MatCreateSeqAIJWithArrays`. I couldn’t find any more guidance on PETSc
> forums or Google, so I wanted to know if this was the right way to go.
>
>
> Thanks,
>
>
> Rohan Yadav
>
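
For the MatCreateSeqAIJWithArrays route mentioned above, a minimal sketch of
the call; the CSR arrays are assumed to be built elsewhere and must not be
freed until the Mat is destroyed, since they are not copied:

#include <petscmat.h>

/* Sketch: wrap existing CSR arrays (ia: m+1 row pointers, ja: column indices,
   a: values) in a sequential AIJ matrix without copying them. */
static PetscErrorCode MatFromCSR(PetscInt m, PetscInt n, PetscInt *ia, PetscInt *ja,
                                 PetscScalar *a, Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, m, n, ia, ja, a, A);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}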


Re: [petsc-dev] kokkos error

2021-11-21 Thread Junchao Zhang
You can add -v to the offending command line to see what happened, i.e.,
how nvcc_wrapper passed options to g++.

--Junchao Zhang


On Sun, Nov 21, 2021 at 12:05 PM Mark Adams  wrote:

> Any idea what is going wrong with this?
>
> Using PETSC_DIR=/global/homes/m/madams/petsc and
> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> gmake[3]: [/global/homes/m/madams/petsc/lib/petsc/conf/rules:301:
> ex3k.PETSc] Error 2 (ignored)
> ***Error detected during compile or
> link!***
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
> /global/homes/m/madams/petsc/src/snes/tutorials ex3k
>
> *
>
> PATH=/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib:/global/common/software/nersc/shasta2105/arm-forge/21.0.2-linux-x86_64/bin:/global/common/software/nersc/cos1.3/cuda/11.3.0/bin:/opt/cray/pe/perftools/21.10.0/bin:/opt/cray/pe/papi/
> 6.0.0.10/bin:/opt/cray/pe/gcc/9.3.0/bin:/opt/cray/pe/craype/2.7.11/bin:/global/common/software/nersc/shasta2105/python/3.8-anaconda-2021.05/bin:/global/homes/m/madams/.local/perlmutter/3.8-anaconda-2021.05/bin:/global/common/software/nersc/shasta2105/cmake/git-master/bin:/global/common/software/nersc/bin:/opt/cray/libfabric/1.11.0.4.79/bin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/opt/cray/pe/bin:`dirname
> <http://6.0.0.10/bin:/opt/cray/pe/gcc/9.3.0/bin:/opt/cray/pe/craype/2.7.11/bin:/global/common/software/nersc/shasta2105/python/3.8-anaconda-2021.05/bin:/global/homes/m/madams/.local/perlmutter/3.8-anaconda-2021.05/bin:/global/common/software/nersc/shasta2105/cmake/git-master/bin:/global/common/software/nersc/bin:/opt/cray/libfabric/1.11.0.4.79/bin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/opt/cray/pe/bin:dirname>
> /global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc`
> NVCC_WRAPPER_DEFAULT_COMPILER=CC
> /global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/bin/nvcc_wrapper
> --expt-extended-lambda -g -Xcompiler -rdynamic -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -Xcompiler -fPIC -ccbin CC
> -std=c++17 -gencode arch=compute_80,code=sm_80  -Wno-deprecated-gpu-targets
> -I/opt/cray/pe/mpich/8.1.10/ofi/gnu/9.1/include -I/opt/cray/pe/libsci/
> 21.08.1.2/GNU/9.1/x86_64/include -I/opt/cray/pe/pmi/6.0.14/include
> -I/opt/cray/pe/dsmml/0.2.2/dsmml//include
> -I/opt/cray/xpmem/2.2.40-7.0.1.0_3.1__g1d7a24d.shasta/include
>  -I/global/homes/m/madams/petsc/include
> -I/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/include
> -I/global/common/software/nersc/cos1.3/cuda/11.3.0/include-g
> -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -fPIC -O3 -g
> -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -O3
> -L/opt/cray/pe/mpich/8.1.10/ofi/gnu/9.1/lib
> -L/opt/cray/pe/mpich/8.1.10/gtl/lib -L/opt/cray/pe/libsci/
> 21.08.1.2/GNU/9.1/x86_64/lib -L/opt/cray/pe/pmi/6.0.14/lib
> -L/opt/cray/pe/dsmml/0.2.2/dsmml//lib
> -L/opt/cray/xpmem/2.2.40-7.0.1.0_3.1__g1d7a24d.shasta/lib64
> -Wl,--as-needed,-lmpi_gnu_91,--no-as-needed
> -Wl,--as-needed,-lsci_gnu_82_mpi,--no-as-needed
> -Wl,--as-needed,-lsci_gnu_82,--no-as-needed
> -Wl,--as-needed,-ldsmml,--no-as-needed  -lpmi -lpmi2 -lmpi_gtl_cuda -ldl
> -lxpmem   ex3k.kokkos.cxx
>  
> -Wl,-rpath,/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib
> -L/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib
> -Wl,-rpath,/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib
> -L/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib
> -Wl,-rpath,/global/common/software/nersc/cos1.3/cuda/11.3.0/lib64
> -L/global/common/software/nersc/cos1.3/cuda/11.3.0/lib64
> -L/global/common/software/nersc/cos1.3/cuda/11.3.0/lib64/stubs
> -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/ofi/gnu/9.1/lib
> -L/opt/cray/pe/mpich/8.1.10/ofi/gnu/9.1/lib
> -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/gtl/lib
> -L/opt/cray/pe/mpich/8.1.10/gtl/lib -Wl,-rpath,/opt/cray/pe/libsci/
> 21.08.1.2/GNU/9.1/x86_64/lib -L/opt/cray/pe/libsci/
> 21.08.1.2/GNU/9.1/x86_64/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.14/lib
> -L/opt/cray/pe/pmi/6.0.14/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib
> -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib
> -Wl,-rpath,/opt/cray/xpmem/2.2.40-7.0.1.0_3.1__g1d7a24d.shasta/lib64
> -L/opt/cray/xpmem/2.2.40-7.0.1.0_3.1__g1d7a24d.shasta/lib64
> -Wl,-rpath,/opt/cray/pe/gcc/9.3.0/snos/lib/gcc/x86_64-suse-linux/9.3.0
> -L/opt/cray/pe/gcc/9.3.0/snos/lib/gcc/x86_64-suse-linux/9.3.0
> -Wl,-rpath,/opt/cray/pe/gcc/9.3.0/snos/lib64
> -L

Re: [petsc-dev] CUDA + OpenMP on Summit with Hypre

2021-11-13 Thread Junchao Zhang
On Sat, Nov 13, 2021 at 2:24 PM Mark Adams  wrote:

> I have a user that wants CUDA + Hypre on Summit and they want to use OpenMP
> in their code. I configured with openmp but without thread safety and got
> this error.
>
> Maybe there is no need for us to do anything with omp in our
> configuration. Not sure.
>
> 15:08 main= summit:/gpfs/alpine/csc314/scratch/adams/petsc$ make
> PETSC_DIR=/gpfs/alpine/world-shared/geo127/petsc/arch-opt-gcc9.1.0-omp-cuda11.0.3
> PETSC_ARCH="" check
> Running check examples to verify correct installation
> Using
> PETSC_DIR=/gpfs/alpine/world-shared/geo127/petsc/arch-opt-gcc9.1.0-omp-cuda11.0.3
> and PETSC_ARCH=
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
> [1] (280696) Warning: Could not find key lid0:0:2 in cache
> <=
> [1] (280696) Warning: Could not find key qpn0:0:0:2 in cache
> <=
> Unable to connect queue-pairs
> [h37n08:280696] Error: common_pami.c:1094 - ompi_common_pami_init() 1:
> Unable to create 1 PAMI communication context(s) rc=1
>
I don't know what petsc's thread safety is, but this error seems to be in
the environment.  You can report it to OLCF help.


> --
> No components were able to be opened in the pml framework.
>
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
>
>   Host:  h37n08
>   Framework: pml
> --
> [h37n08:280696] PML pami cannot be selected
> 1,5c1,16
> < lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
> <   0 SNES Function norm 0.0406612
> <   1 SNES Function norm 4.12227e-06
> <   2 SNES Function norm 6.098e-11
> < Number of SNES iterations = 2
> ---
> > [1] (280721) Warning: Could not find key lid0:0:2 in cache
> <=
> > [1] (280721) Warning: Could not find key qpn0:0:0:2 in cache
> <=
> > Unable to connect queue-pairs
> > [h37n08:280721] Error: common_pami.c:1094 - ompi_common_pami_init() 1:
> Unable to create 1 PAMI communication context(s) rc=1
> >
> --
> > No components were able to be opened in the pml framework.
> >
> > This typically means that either no components of this type were
> > installed, or none of the installed components can be loaded.
> > Sometimes this means that shared libraries required by these
> > components are unable to be found/loaded.
> >
> >   Host:  h37n08
> >   Framework: pml
> >
> --
> > [h37n08:280721] PML pami cannot be selected
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
> Possible problem with ex19 running with hypre, diffs above
> =
> 2,15c2,15
> <   0 SNES Function norm 2.391552133017e-01
> < 0 KSP Residual norm 2.325621076120e-01
> < 1 KSP Residual norm 1.654206318674e-02
> < 2 KSP Residual norm 7.202836119880e-04
> < 3 KSP Residual norm 1.796861424199e-05
> < 4 KSP Residual norm 2.461332992052e-07
> <   1 SNES Function norm 6.826585648929e-05
> < 0 KSP Residual norm 2.347339172985e-05
> < 1 KSP Residual norm 8.356798075993e-07
> < 2 KSP Residual norm 1.844045309619e-08
> < 3 KSP Residual norm 5.336386977405e-10
> < 4 KSP Residual norm 2.662608472862e-11
> <   2 SNES Function norm 6.549682264799e-11
> < Number of SNES iterations = 2
> ---
> > [0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is
> not GPU-aware. For better performance, please use a GPU-aware MPI.
> > [0]PETSC ERROR: If you do not care, add option -use_gpu_aware_mpi 0. To
> not see the message again, add the option to your .petscrc, OR add it to
> the env var PETSC_OPTIONS.
> > [0]PETSC ERROR: If you do care, for IBM Spectrum MPI on OLCF Summit, you
> may need jsrun --smpiargs=-gpu.
> > [0]PETSC ERROR: For OpenMPI, you need to configure it --with-cuda (
> https://www.open-mpi.org/faq/?category=buildcuda)
> > [0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (
> http://mvapich.cse.ohio-state.edu/userguide/gdr/)
> > [0]PETSC ERROR: For Cray-MPICH, you need to set
> MPICH_RDMA_ENABLED_CUDA=1 (
> https://www.olcf.ornl.gov/tutorials/gpudirect-mpich-enabled-cuda/)
> >
> --
> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> > with errorcode 76.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > 

Re: [petsc-dev] HIP / hypre

2021-11-10 Thread Junchao Zhang
Justin,
  We would like to have a wrapper over CUDA/HIP since with that we only need
to maintain one code base.  Scott (Cc'ed) may introduce his work/thoughts
along this line and coordinate with Jacob and your team.

--Junchao Zhang


On Wed, Nov 10, 2021 at 12:50 PM Jacob Faibussowitsch 
wrote:

> What’s the plan going forward with this unified cuda/hip branch?
>
>
> The end goal is to integrate PetscDevice and PetscDeviceContext — new
> objects which encapsulate physical devices and device-side sets of
> operations, respectively — into PETSc. PetscDeviceContext provides a
> framework for enqueueing work on device streams, but is far more
> extensible. For example, I extend it to a very basic graph-based
> "PetscCallGraph" implementation in
> https://gitlab.com/petsc/petsc/-/merge_requests/4217. I believe Junchao
> is also currently working on integrating SYCL into the PetscDevice
> framework.
>
> Is this related to what some of us have been hearing about PETSc
> eventually going with a unified wrapper over both the CUDA and HIP API?
>
>
> The unified CUDA-HIP wrapper exists mainly because both APIs are similar
> enough that it made sense to wrap them, and it is the man in the middle
> between direct cuda/hip calls and the higher-level
> PetscDevice/PetscDeviceContext API.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Nov 10, 2021, at 12:28, Justin Chang  wrote:
>
> Jacob,
>
> What’s the plan going forward with this unified cuda/hip branch? Is this
> related to what some of us have been hearing about PETSc eventually going
> with a unified wrapper over both the CUDA and HIP API?
>
> We’re interested in making the HIP port more mature, including having
> support for the HIP part of HYPRE, but we’re unsure what direction you guys
> want to go with the GPU route.
>
> Thanks,
> Justin
>
> On Wed, Nov 10, 2021 at 12:19 PM Justin Chang  wrote:
>
>> Cc’ing Paul since I misspelled his email address initially
>>
>> On Wed, Nov 10, 2021 at 12:17 PM Jacob Faibussowitsch <
>> jacob@gmail.com> wrote:
>>
>>> I’m in the process of implementing asynchronous GPU support for petsc. A
>>> side effect of this is that I unify the cuda/hip interface such that
>>> anywhere we have cuda-like code we will automatically also get the hip
>>> variant.
>>>
>>> The scaffolding is in include/petsc/private/cupminterface.hpp, but for
>>> concrete examples see the jacobf/2021-10-21/veccupm-async branch for
>>> the WIP port of VecSeq in src/vec/vec/impls/seq/seqcupm/veccupm.hpp.
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>> On Nov 10, 2021, at 11:50, Justin Chang  wrote:
>>>
>>> Paul Bauman was also involved with the HIP port of HYPRE. Several of us
>>> at AMD are interested in getting HIP support for PETSc in general, and
>>> having HYPRE support would greatly help
>>>
>>> On Wed, Nov 10, 2021 at 11:47 AM Stefano Zampini <
>>> stefano.zamp...@gmail.com> wrote:
>>>
>>>> I did the work last summer. It's already available in 3.16
>>>>
>>>> Il Mer 10 Nov 2021, 20:44 Mark Adams  ha scritto:
>>>>
>>>>> Hypre has released HIP support and Ulrike says:
>>>>>
>>>>> I just want to let you know that hypre can now be used through PETSc
>>>>> with GPUs (both Nvidia and AMD).
>>>>>
>>>>> I am guessing we have some work to do to make this happen.
>>>>>
>>>>> What should I do?
>>>>>
>>>>
>>>
>


Re: [petsc-dev] OpenMP

2021-11-06 Thread Junchao Zhang
On Sat, Nov 6, 2021 at 3:51 PM Mark Adams  wrote:

> Yea, that is a bit inscrutable, but I see mumps is the main/only user of
> this:
>
> /* if using PETSc OpenMP support, we only call MUMPS on master ranks.
> Before/after the call, we change/restore CPUs the master ranks can run on */
>
> And I see _OPENMP is a macro for the release date (yyyymm) of the OMP
> version. It's not clear what the value for v5.0 is (
> https://www.openmp.org/specifications/)
>
{200505,"2.5"},{200805,"3.0"},{201107,"3.1"},{201307,"4.0"},{201511,"4.5"},{201811,"5.0"},{202011,"5.1"}
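
So a test for OpenMP >= 5.0 is just a comparison against the 5.0 date,
something like:

#if defined(_OPENMP) && _OPENMP >= 201811   /* 201811 corresponds to OpenMP 5.0 */
  /* code that relies on OpenMP 5.0 features */
#endif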

On Sat, Nov 6, 2021 at 4:27 PM Junchao Zhang 
> wrote:
>
>>
>>
>> On Sat, Nov 6, 2021 at 5:51 AM Mark Adams  wrote:
>>
>>> Two questions on OMP:
>>>
>>> * Can I test for the version of OMP? I want >= 5 and I see this, which
>>> looks promising:
>>> include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* &&
>>> !defined(_WIN32)
>>>
>>> * What is the difference between HAVE_OPENMP and
>>> HAVE_OPENMP_SUPPORT.
>>>
>> # this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we have
>> # facilities to support running PETSc in flat-MPI mode and third party
>> # libraries in MPI+OpenMP hybrid mode
>> if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found and self.hwloc.found:
>>   # Apple pthread does not provide this functionality
>>   if self.function.check('pthread_barrier_init', libraries = 'pthread'):
>>     self.addDefine('HAVE_OPENMP_SUPPORT', 1)
>>
>>
>>> Thanks,
>>> Mark
>>>
>>


Re: [petsc-dev] OpenMP

2021-11-06 Thread Junchao Zhang
On Sat, Nov 6, 2021 at 5:51 AM Mark Adams  wrote:

> Two questions on OMP:
>
> * Can I test for the version of OMP? I want >= 5 and I see this, which
> looks promising:
> include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* &&
> !defined(_WIN32)
>
> * What is the difference between HAVE_OPENMP and
> HAVE_OPENMP_SUPPORT.
>
# this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we have
# facilities to support running PETSc in flat-MPI mode and third party
# libraries in MPI+OpenMP hybrid mode
if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found and self.hwloc.found:
  # Apple pthread does not provide this functionality
  if self.function.check('pthread_barrier_init', libraries = 'pthread'):
    self.addDefine('HAVE_OPENMP_SUPPORT', 1)


> Thanks,
> Mark
>


Re: [petsc-dev] cuda-memcheck finds an error on Summit

2021-09-26 Thread Junchao Zhang
Mark,
   without cuda-memcheck, did the test run?
--Junchao Zhang


On Sun, Sep 26, 2021 at 12:38 PM Mark Adams  wrote:

> FYI, I am getting this with cuda-memcheck on Summit with CUDA 11.0.3:
>
> jsrun -n 48 -a 6 -c 6 -g 1 -r 6 --smpiargs -gpu cuda-memcheck ../ex13-cu
> -dm_plex_box_faces 4,6,12 -petscpartitioner_simple_node_grid 2,2,2
> -dm_plex_box_upper 2,3,6 -petscpartitioner_simple_process_grid 2,3,6
> -dm_refine 3 -dm_mat_type aijcusparse -dm_vec_type cuda -dm_view
>
> This job runs with Kokkos and it runs with these 8 nodes with -dm_refine 2
> instead of 3. And it runs with the 1 node version of this test.
>
> Thanks,
> Mark
>
> [8]PETSC ERROR: - Error Message
> --
> [8]PETSC ERROR: GPU error
> [8]PETSC ERROR: cuda error 715 (cudaErrorIllegalInstruction) : an illegal
> instruction was encountered
> [8]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [8]PETSC ERROR: Petsc Development GIT revision: v3.15.4-943-g83e1f11c26
>  GIT Date: 2021-09-25 18:33:40 -0400
> [8]PETSC ERROR: ../ex13-cu on a arch-summit-opt-gnu-cuda named f10n17 by
> Unknown Sun Sep 26 13:29:03 2021
> [8]PETSC ERROR: Configure options --with-fc=mpifort --with-cc=mpicc
> --with-cxx=mpiCC --CFLAGS="-fPIC -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10
> -DLANDAU_MAX_Q=4" --CXXFLAGS="-fPIC -g -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler
> -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --FCFLAGS="-fPIC
> -g" --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-ssl=0 --with-batch=0
> --with-mpiexec="jsrun -g1" --with-cuda=1 --with-cudac=nvcc
> --with-cuda-arch=70 --download-p4est=1 --download-zlib
> --with-blaslapack-lib="-L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-9.1.0/netlib-lapack-3.9.1-e6vxode53ghsjrop2dfwlq77s3dvkr7t/lib64
> -lblas -llapack" --with-x=0 --with-64-bit-indices=0 --with-debugging=0
> PETSC_ARCH=arch-summit-opt-gnu-cuda
> [8]PETSC ERROR: #1 PetscSFLinkSyncStream_CUDA() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/impls/basic/cuda/
> sfcuda.cu:872
> [8]PETSC ERROR: #2 PetscSFLinkSyncStreamBeforeCallMPI() at
> /gpfs/alpine/csc314/scratch/adams/petsc/include/../src/vec/is/sf/impls/basic/sfpack.h:344
> [8]PETSC ERROR: #3 PetscSFLinkStartRequests_MPI() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/impls/basic/sfmpi.c:40
> [8]PETSC ERROR: #4 PetscSFLinkStartCommunication() at
> /gpfs/alpine/csc314/scratch/adams/petsc/include/../src/vec/is/sf/impls/basic/sfpack.h:267
> [8]PETSC ERROR: #5 PetscSFBcastBegin_Basic() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:191
> [8]PETSC ERROR: #6 PetscSFBcastWithMemTypeBegin() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/interface/sf.c:1493
> [8]PETSC ERROR: #7 DMGlobalToLocalBegin() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:2613
> [8]PETSC ERROR: #8 SNESComputeJacobian_DMLocal() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/utils/dmlocalsnes.c:119
> [8]PETSC ERROR: #9 SNESComputeJacobian() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:2824
> [8]PETSC ERROR: #10 SNESSolve_KSPONLY() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:43
> [8]PETSC ERROR: #11 SNESSolve() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4769
> [8]PETSC ERROR: #12 main() at ex13.c:169
>


Re: [petsc-dev] cuSparse vector performance issue

2021-09-25 Thread Junchao Zhang
On Sat, Sep 25, 2021 at 4:45 PM Mark Adams  wrote:

> I am testing my Landau code, which is MPI serial, but with many
> independent MPI processes driving each GPU, in an MPI parallel harness code
> (Landau ex2).
>
> Vector operations with Kokkos Kernels and cuSparse are about the same (KK
> is faster) and a bit expensive with one process / GPU. About the same as my
> Jacobian construction, which is expensive but optimized on the GPU.  (I am
> using arkimex adaptive TS. I am guessing that it does a lot of vector ops,
> because there are a lot.)
>
> With 14 or 15 processes, all doing the same MPI serial problem, cuSparse
> is about 2.5x more expensive than KK. KK does degrade by about 15% from the
> one processor case. So KK is doing fine, but something bad is
> happening with cuSparse.
>
Do AIJKOKKOS and AIJCUSPARSE use different algorithms? I don't know.  To find
out exactly, the best approach is to ask Peng@nvidia to help profile the
code.
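A cheap first check before involving a vendor profiler is to run the identical case with both back ends and compare the per-event times reported by -log_view. A hypothetical invocation (the launcher layout, executable name, and problem options are placeholders; -dm_mat_type, -dm_vec_type, and -log_view are standard PETSc options):

# cuSPARSE back end
jsrun -n 14 -g 1 ./app <problem options> -dm_mat_type aijcusparse -dm_vec_type cuda -log_view
# Kokkos Kernels back end
jsrun -n 14 -g 1 ./app <problem options> -dm_mat_type aijkokkos -dm_vec_type kokkos -log_view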


>
> Anyone have any thoughts on this?
>
> Thanks,
> Mark
>
>


Re: [petsc-dev] Cannot locate file: share/petsc/datafiles/matrices/small

2021-09-14 Thread Junchao Zhang
Yes, we can turn it off.  Code with no real users is just a
maintenance burden.

--Junchao Zhang

On Tue, Sep 14, 2021 at 10:45 AM Barry Smith  wrote:

>
>   Ok, so it could be a bug in PETSc, but if it appears with particular MPI
> implementations shouldn't we turn off the support in those cases we know it
> will fail?
>
>   Barry
>
>
> On Sep 14, 2021, at 11:10 AM, Junchao Zhang 
> wrote:
>
> MPI one-sided is tricky and needs careful synchronization (like OpenMP).
> Incorrect code can work with one interface yet fail with another.
>
> --Junchao Zhang
>
>
> On Tue, Sep 14, 2021 at 10:01 AM Barry Smith  wrote:
>
>>
>>It sounds reproducible and related to using a particular versions of
>> OpenMPI and even particular interfaces.
>>
>>   Barry
>>
>>On Tue, Sep 14, 2021 at 2:35 AM Stefano Zampini <
>> stefano.zamp...@gmail.com> wrote:
>>>
>>> I can reproduce it even with OpenMPI 4.1.1 on a different machine
>>> (Ubuntu 18 + AMD Milan + clang from AOCC) and it is definitely an OpenMPI
>>> bug in the vader BTL. If I use tcp, everything runs smoothly.
>>>
>>>
>>>
>>>
>> On Sep 14, 2021, at 10:54 AM, Junchao Zhang 
>> wrote:
>>
>> Without a standalone, valid MPI example that reproduces the error, we
>> cannot be sure it is an OpenMPI bug.
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Sep 14, 2021 at 6:17 AM Matthew Knepley 
>> wrote:
>>
>>> Okay, we have to send this to OpenMPI. Volunteers?
>>>
>>> Maybe we should note this in the FAQ, or installation, so we remember
>>> how to fix it if someone else asks?
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>> On Tue, Sep 14, 2021 at 2:35 AM Stefano Zampini <
>>> stefano.zamp...@gmail.com> wrote:
>>>
>>>> I can reproduce it even with OpenMPI 4.1.1 on a different machine
>>>> (Ubuntu 18 + AMD Milan + clang from AOCC) and it is definitely an OpenMPI
>>>> bug in the vader BTL. If I use tcp, everything runs smoothly.
>>>>
>>>> zampins@kanary:~/Devel/petsc$ cat
>>>> /home/zampins/local/etc/openmpi-mca-params.conf | grep btl
>>>> btl = tcp,self
>>>> zampins@kanary:~/Devel/petsc$ make -f gmakefile.test
>>>> vec_is_sf_tutorials-ex1_4
>>>> Using MAKEFLAGS:
>>>> TEST arch-debug/tests/counts/vec_is_sf_tutorials-ex1_4.counts
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-dynamic
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-dynamic
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-allocate
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-allocate
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-create
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-create
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-dynamic
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-dynamic
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-allocate
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-allocate
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-create
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-create
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-dynamic
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-dynamic
>>>>  ok
>>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-allocate
>>>>  ok
>>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-allocate
>>>>
>>>>
>>>> zampins@kanary:~/Devel/petsc$ cat
>>>> /home/zampins/local/etc/openmpi-mca-params.conf | grep btl
>>>> btl = vader,tcp,self
>>>> zampins@kanary:~/Devel/petsc$ make -f gmakefile.test

Re: [petsc-dev] Cannot locate file: share/petsc/datafiles/matrices/small

2021-09-14 Thread Junchao Zhang
MPI one-sided is tricky and needs careful synchronization (like OpenMP).
An incorrect code could work in one interface but fail in another.

--Junchao Zhang


On Tue, Sep 14, 2021 at 10:01 AM Barry Smith  wrote:

>
>It sounds reproducible and related to using a particular versions of
> OpenMPI and even particular interfaces.
>
>   Barry
>
>On Tue, Sep 14, 2021 at 2:35 AM Stefano Zampini <
> stefano.zamp...@gmail.com> wrote:
>>
>> I can reproduce it even with OpenMPI 4.1.1 on a different machine (Ubuntu
>> 18 + AMD Milan + clang from AOCC) and it is definitely an OpenMPI bug in
>> the vader BTL. If I use tcp, everything runs smoothly.
>>
>>
>>
>>
> On Sep 14, 2021, at 10:54 AM, Junchao Zhang 
> wrote:
>
> Without a standalone, valid MPI example that reproduces the error, we
> cannot be sure it is an OpenMPI bug.
>
> --Junchao Zhang
>
>
> On Tue, Sep 14, 2021 at 6:17 AM Matthew Knepley  wrote:
>
>> Okay, we have to send this to OpenMPI. Volunteers?
>>
>> Maybe we should note this in the FAQ, or installation, so we remember how
>> to fix it if someone else asks?
>>
>>   Thanks,
>>
>>  Matt
>>
>> On Tue, Sep 14, 2021 at 2:35 AM Stefano Zampini <
>> stefano.zamp...@gmail.com> wrote:
>>
>>> I can reproduce it even with OpenMPI 4.1.1 on a different machine
>>> (Ubuntu 18 + AMD Milan + clang from AOCC) and it is definitely an OpenMPI
>>> bug in the vader BTL. If I use tcp, everything runs smoothly.
>>>
>>> zampins@kanary:~/Devel/petsc$ cat
>>> /home/zampins/local/etc/openmpi-mca-params.conf | grep btl
>>> btl = tcp,self
>>> zampins@kanary:~/Devel/petsc$ make -f gmakefile.test
>>> vec_is_sf_tutorials-ex1_4
>>> Using MAKEFLAGS:
>>> TEST arch-debug/tests/counts/vec_is_sf_tutorials-ex1_4.counts
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-dynamic
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-dynamic
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-allocate
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-allocate
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-create
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-create
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-dynamic
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-dynamic
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-allocate
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-allocate
>>>  ok vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-create
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-create
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-dynamic
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-dynamic
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-allocate
>>>  ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-allocate
>>>
>>>
>>> zampins@kanary:~/Devel/petsc$ cat
>>> /home/zampins/local/etc/openmpi-mca-params.conf | grep btl
>>> btl = vader,tcp,self
>>> zampins@kanary:~/Devel/petsc$ make -f gmakefile.test
>>> vec_is_sf_tutorials-ex1_4
>>> Using MAKEFLAGS:
>>> TEST arch-debug/tests/counts/vec_is_sf_tutorials-ex1_4.counts
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>> not ok
>>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>> # Error code: 1
>>> # 43,46c43,46
>>> # < [0] 0: 4001 2000 2002 3002 4002
>>> # < [1] 0: 1001 3000
>>> # < [2] 0: 2001 4000
>>> # < [3] 0: 3001 1000
>>> # ---
>>> # > [0] 0: 2002 2146435072 2 2146435072 38736240
>>> # > [1] 0: 3000 2146435072
>>> # > [2] 0: 2001 2146435072
>>> # > [3] 0: 3001 2146435072
>>>  ok
>>> vec_is_sf_tutorials-ex1_4+sf_win

Re: [petsc-dev] Cannot locate file: share/petsc/datafiles/matrices/small

2021-09-14 Thread Junchao Zhang
Without a standalone, valid MPI example that reproduces the error, we cannot
be sure it is an OpenMPI bug.
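For reference, a standalone reproducer would be something like the sketch below (hypothetical code, not taken from the PETSc tests; it only exercises the fence-synchronized MPI_Accumulate pattern on a created window, i.e. the sf_window_flavor-create plus sf_window_sync-fence combination that fails in the runs quoted in this thread):

/* build with mpicc and run with >= 2 ranks */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int     rank, size, one = 1, *base;
  MPI_Win win;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  base  = (int*)malloc(sizeof(int));
  *base = 0;
  /* each rank exposes one int through an MPI window created over user memory */
  MPI_Win_create(base, sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

  MPI_Win_fence(0, win);
  /* every rank adds 1 into the int exposed by the next rank */
  MPI_Accumulate(&one, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, MPI_SUM, win);
  MPI_Win_fence(0, win);

  printf("[%d] value after accumulate: %d\n", rank, *base);
  MPI_Win_free(&win);
  free(base);
  MPI_Finalize();
  return 0;
}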

--Junchao Zhang


On Tue, Sep 14, 2021 at 6:17 AM Matthew Knepley  wrote:

> Okay, we have to send this to OpenMPI. Volunteers?
>
> Maybe we should note this in the FAQ, or installation, so we remember how
> to fix it if someone else asks?
>
>   Thanks,
>
>  Matt
>
> On Tue, Sep 14, 2021 at 2:35 AM Stefano Zampini 
> wrote:
>
>> I can reproduce it even with OpenMPI 4.1.1 on a different machine (Ubuntu
>> 18 + AMD Milan + clang from AOCC) and it is definitely an OpenMPI bug in
>> the vader BTL. If I use tcp, everything runs smoothly.
>>
>> zampins@kanary:~/Devel/petsc$ cat
>> /home/zampins/local/etc/openmpi-mca-params.conf | grep btl
>> btl = tcp,self
>> zampins@kanary:~/Devel/petsc$ make -f gmakefile.test
>> vec_is_sf_tutorials-ex1_4
>> Using MAKEFLAGS:
>> TEST arch-debug/tests/counts/vec_is_sf_tutorials-ex1_4.counts
>>  ok vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>  ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-dynamic
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-dynamic
>>  ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-allocate
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-allocate
>>  ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-create
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-create
>>  ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-dynamic
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-dynamic
>>  ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-allocate
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-allocate
>>  ok vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-create
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-create
>>  ok vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-dynamic
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-dynamic
>>  ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-allocate
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-lock_sf_window_flavor-allocate
>>
>>
>> zampins@kanary:~/Devel/petsc$ cat
>> /home/zampins/local/etc/openmpi-mca-params.conf | grep btl
>> btl = vader,tcp,self
>> zampins@kanary:~/Devel/petsc$ make -f gmakefile.test
>> vec_is_sf_tutorials-ex1_4
>> Using MAKEFLAGS:
>> TEST arch-debug/tests/counts/vec_is_sf_tutorials-ex1_4.counts
>>  ok vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>> not ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>> # Error code: 1
>> # 43,46c43,46
>> # < [0] 0: 4001 2000 2002 3002 4002
>> # < [1] 0: 1001 3000
>> # < [2] 0: 2001 4000
>> # < [3] 0: 3001 1000
>> # ---
>> # > [0] 0: 2002 2146435072 2 2146435072 38736240
>> # > [1] 0: 3000 2146435072
>> # > [2] 0: 2001 2146435072
>> # > [3] 0: 3001 2146435072
>>  ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-dynamic
>> not ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-dynamic
>> # Error code: 1
>> # 43,46c43,46
>> # < [0] 0: 4001 2000 2002 3002 4002
>> # < [1] 0: 1001 3000
>> # < [2] 0: 2001 4000
>> # < [3] 0: 3001 1000
>> # ---
>> # > [0] 0: 2002 2146435072 2 2146435072 0
>> # > [1] 0: 3000 2146435072
>> # > [2] 0: 2001 2146435072
>> # > [3] 0: 3001 2146435072
>>  ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-allocate
>>  ok
>> diff-vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-allocate
>> # retrying
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-create
>> not ok
>> vec_is_sf_tutorials-ex1_4+sf_window_sync-active_sf_window_flavor-create #
>> Error code: 98
>> # [1]PETSC ERROR: - Error Message
>> --
>> # [1]PETSC ERROR: General MPI error
>> # [1]PETSC ERROR: MPI error 6 MPI_ERR_RANK: invalid rank
>> # [1]PETSC ERROR: See https://petsc.org/rel

Re: [petsc-dev] Cannot locate file: share/petsc/datafiles/matrices/small

2021-09-13 Thread Junchao Zhang
Hi, Stefano,
   Ping you again to see if you want to resolve this problem before
petsc-3.16

--Junchao Zhang


On Sun, Sep 12, 2021 at 3:06 PM Antonio T. sagitter <
sagit...@fedoraproject.org> wrote:

> Unfortunately, that is not possible. I must use the OpenMPI provided by the
> Fedora build system (these rpm builds of PETSc are for Fedora's
> repositories); downloading external software is not permitted.
>
> On 9/12/21 21:10, Pierre Jolivet wrote:
> >
> >> On 12 Sep 2021, at 8:56 PM, Matthew Knepley  >> <mailto:knep...@gmail.com>> wrote:
> >>
> >> On Sun, Sep 12, 2021 at 2:49 PM Antonio T. sagitter
> >> mailto:sagit...@fedoraproject.org>> wrote:
> >>
> >> Those attached are configure.log/make.log from a MPI build in
> >> Fedora 34
> >> x86_64 where the error below occurred.
> >>
> >>
> >> This is OpenMPI 4.1.0. Is that the only MPI you build? My first
> >> inclination is that this is an MPI implementation bug.
> >>
> >> Junchao, do we have an OpenMPI build in the CI?
> >
> > config/examples/arch-ci-linux-cuda-double-64idx.py:
> >   '--download-openmpi=1',
> > config/examples/arch-ci-linux-pkgs-dbg-ftn-interfaces.py:
> >   '--download-openmpi=1',
> > config/examples/arch-ci-linux-pkgs-opt.py:  '--download-openmpi=1',
> >
> > config/BuildSystem/config/packages/OpenMPI.py uses version 4.1.0 as well.
> > I’m not sure PETSc is to blame here, Antonio. You may want to try to
> > ditch the OpenMPI shipped by your package manager and try
> > --download-openmpi as well, just for a quick sanity check.
> >
> > Thanks,
> > Pierre
> >
>
> --
> ---
> Antonio Trande
> Fedora Project
> mailto: sagit...@fedoraproject.org
> GPG key: 0x29FBC85D7A51CC2F
> GPG key server: https://keyserver1.pgp.com/
>


Re: [petsc-dev] Cannot locate file: share/petsc/datafiles/matrices/small

2021-09-12 Thread Junchao Zhang
An old issue with SF_Window is at
https://gitlab.com/petsc/petsc/-/issues/555, though that one is a different
error.

--Junchao Zhang


On Sun, Sep 12, 2021 at 2:20 PM Junchao Zhang 
wrote:

> We have met SF + window errors before.  Stefano wrote the code, which I don't
> think was worth doing. SF with MPI one-sided is hard to get correct (it is
> essentially shared-memory programming), performs poorly, and has no users.
> I would suggest we just disable the test and the feature.  Stefano, what do
> you think?
>
> --Junchao Zhang
>
>
> On Sun, Sep 12, 2021 at 2:10 PM Pierre Jolivet  wrote:
>
>>
>> On 12 Sep 2021, at 8:56 PM, Matthew Knepley  wrote:
>>
>> On Sun, Sep 12, 2021 at 2:49 PM Antonio T. sagitter <
>> sagit...@fedoraproject.org> wrote:
>>
>>> Those attached are configure.log/make.log from a MPI build in Fedora 34
>>> x86_64 where the error below occurred.
>>>
>>
>> This is OpenMPI 4.1.0. Is that the only MPI you build? My first
>> inclination is that this is an MPI implementation bug.
>>
>> Junchao, do we have an OpenMPI build in the CI?
>>
>>
>> config/examples/arch-ci-linux-cuda-double-64idx.py:
>>  '--download-openmpi=1',
>> config/examples/arch-ci-linux-pkgs-dbg-ftn-interfaces.py:
>>  '--download-openmpi=1',
>> config/examples/arch-ci-linux-pkgs-opt.py:  '--download-openmpi=1',
>>
>> config/BuildSystem/config/packages/OpenMPI.py uses version 4.1.0 as well.
>> I’m not sure PETSc is to blame here, Antonio. You may want to try to ditch
>> the OpenMPI shipped by your package manager and try --download-openmpi as
>> well, just for a quick sanity check.
>>
>> Thanks,
>> Pierre
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> On 9/12/21 19:18, Antonio T. sagitter wrote:
>>> > Okay. I will try to set correctly the DATAFILESPATH options.
>>> >
>>> > I see even this error:
>>> >
>>> > not ok
>>> > vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>>> #
>>> > Error code: 68
>>> >
>>> > #PetscSF Object: 4 MPI processes
>>> >
>>> > #  type: window
>>> >
>>> > #  [0] Number of roots=3, leaves=2, remote ranks=2
>>> >
>>> > #  [0] 0 <- (3,1)
>>> >
>>> > #  [0] 1 <- (1,0)
>>> >
>>> > #  [1] Number of roots=2, leaves=3, remote ranks=2
>>> >
>>> > #  [1] 0 <- (0,1)
>>> >
>>> > #  [1] 1 <- (2,0)
>>> >
>>> > #  [1] 2 <- (0,2)
>>> >
>>> > #  [2] Number of roots=2, leaves=3, remote ranks=3
>>> >
>>> > #  [2] 0 <- (1,1)
>>> >
>>> > #  [2] 1 <- (3,0)
>>> >
>>> > #  [2] 2 <- (0,2)
>>> >
>>> > #  [3] Number of roots=2, leaves=3, remote ranks=2
>>> >
>>> > #  [3] 0 <- (2,1)
>>> >
>>> > #  [3] 1 <- (0,0)
>>> >
>>> > #  [3] 2 <- (0,2)
>>> >
>>> > #  [0] Roots referenced by my leaves, by rank
>>> >
>>> > #  [0] 1: 1 edges
>>> >
>>> > #  [0]1 <- 0
>>> >
>>> > #  [0] 3: 1 edges
>>> >
>>> > #  [0]0 <- 1
>>> >
>>> > #  [1] Roots referenced by my leaves, by rank
>>> >
>>> > #  [1] 0: 2 edges
>>> >
>>> > #  [1]0 <- 1
>>> >
>>> > #  [1]2 <- 2
>>> >
>>> > #  [1] 2: 1 edges
>>> >
>>> > #  [1]1 <- 0
>>> >
>>> > #  [2] Roots referenced by my leaves, by rank
>>> >
>>> > #  [2] 0: 1 edges
>>> >
>>> > #  [2]2 <- 2
>>> >
>>> > #  [2] 1: 1 edges
>>> >
>>> > #  [2]0 <- 1
>>> >
>>> > #  [2] 3: 1 edges
>>> >
>>> > #  [2]1 <- 0
>>> >
>>> > #  [3] Roots referenced by my leaves, by rank
>>> >
>>> > #  [3] 0: 2 edges
>>> >
>>> > #  [3]1 <- 0
>>> >
>>> > #  [3]2 <- 2
>>> >
>>> > #  [3] 2: 1 edges
>>> >
>>> > #  [3]0 <- 1
>>> >
>>> 

Re: [petsc-dev] Cannot locate file: share/petsc/datafiles/matrices/small

2021-09-12 Thread Junchao Zhang
We have met SF + window errors before.  Stefano wrote the code, which I don't
think was worth doing. SF with MPI one-sided is hard to get correct (it is
essentially shared-memory programming), performs poorly, and has no users.
I would suggest we just disable the test and the feature.  Stefano, what do
you think?

--Junchao Zhang


On Sun, Sep 12, 2021 at 2:10 PM Pierre Jolivet  wrote:

>
> On 12 Sep 2021, at 8:56 PM, Matthew Knepley  wrote:
>
> On Sun, Sep 12, 2021 at 2:49 PM Antonio T. sagitter <
> sagit...@fedoraproject.org> wrote:
>
>> Those attached are configure.log/make.log from a MPI build in Fedora 34
>> x86_64 where the error below occurred.
>>
>
> This is OpenMPI 4.1.0. Is that the only MPI you build? My first
> inclination is that this is an MPI implementation bug.
>
> Junchao, do we have an OpenMPI build in the CI?
>
>
> config/examples/arch-ci-linux-cuda-double-64idx.py:
>  '--download-openmpi=1',
> config/examples/arch-ci-linux-pkgs-dbg-ftn-interfaces.py:
>  '--download-openmpi=1',
> config/examples/arch-ci-linux-pkgs-opt.py:  '--download-openmpi=1',
>
> config/BuildSystem/config/packages/OpenMPI.py uses version 4.1.0 as well.
> I’m not sure PETSc is to blame here, Antonio. You may want to try to ditch
> the OpenMPI shipped by your package manager and try --download-openmpi as
> well, just for a quick sanity check.
>
> Thanks,
> Pierre
>
>   Thanks,
>
>  Matt
>
>
>> On 9/12/21 19:18, Antonio T. sagitter wrote:
>> > Okay. I will try to set correctly the DATAFILESPATH options.
>> >
>> > I see even this error:
>> >
>> > not ok
>> > vec_is_sf_tutorials-ex1_4+sf_window_sync-fence_sf_window_flavor-create
>> #
>> > Error code: 68
>> >
>> > #PetscSF Object: 4 MPI processes
>> >
>> > #  type: window
>> >
>> > #  [0] Number of roots=3, leaves=2, remote ranks=2
>> >
>> > #  [0] 0 <- (3,1)
>> >
>> > #  [0] 1 <- (1,0)
>> >
>> > #  [1] Number of roots=2, leaves=3, remote ranks=2
>> >
>> > #  [1] 0 <- (0,1)
>> >
>> > #  [1] 1 <- (2,0)
>> >
>> > #  [1] 2 <- (0,2)
>> >
>> > #  [2] Number of roots=2, leaves=3, remote ranks=3
>> >
>> > #  [2] 0 <- (1,1)
>> >
>> > #  [2] 1 <- (3,0)
>> >
>> > #  [2] 2 <- (0,2)
>> >
>> > #  [3] Number of roots=2, leaves=3, remote ranks=2
>> >
>> > #  [3] 0 <- (2,1)
>> >
>> > #  [3] 1 <- (0,0)
>> >
>> > #  [3] 2 <- (0,2)
>> >
>> > #  [0] Roots referenced by my leaves, by rank
>> >
>> > #  [0] 1: 1 edges
>> >
>> > #  [0]1 <- 0
>> >
>> > #  [0] 3: 1 edges
>> >
>> > #  [0]0 <- 1
>> >
>> > #  [1] Roots referenced by my leaves, by rank
>> >
>> > #  [1] 0: 2 edges
>> >
>> > #  [1]0 <- 1
>> >
>> > #  [1]2 <- 2
>> >
>> > #  [1] 2: 1 edges
>> >
>> > #  [1]1 <- 0
>> >
>> > #  [2] Roots referenced by my leaves, by rank
>> >
>> > #  [2] 0: 1 edges
>> >
>> > #  [2]2 <- 2
>> >
>> > #  [2] 1: 1 edges
>> >
>> > #  [2]0 <- 1
>> >
>> > #  [2] 3: 1 edges
>> >
>> > #  [2]1 <- 0
>> >
>> > #  [3] Roots referenced by my leaves, by rank
>> >
>> > #  [3] 0: 2 edges
>> >
>> > #  [3]1 <- 0
>> >
>> > #  [3]2 <- 2
>> >
>> > #  [3] 2: 1 edges
>> >
>> > #  [3]0 <- 1
>> >
>> > #  current flavor=CREATE synchronization=FENCE MultiSF
>> sort=rank-order
>> >
>> > #current info=MPI_INFO_NULL
>> >
>> > #[buildhw-x86-09:1135574] *** An error occurred in MPI_Accumulate
>> >
>> > #[buildhw-x86-09:1135574] *** reported by process [3562602497,3]
>> >
>> > #[buildhw-x86-09:1135574] *** on win rdma window 4
>> >
>> > #[buildhw-x86-09:1135574] *** MPI_ERR_RMA_RANGE: invalid RMA
>> address
>> > range
>> >
>> > #[buildhw-x86-09:1135574] *** MPI_ERRORS_ARE_FATAL (processes in
>> > this win will now abort,
>> >
>> > #[buildhw-x86-09:1135574] ***and potentially your MPI job)
>> >
>> > #[buildhw-x86-09.iad2.fedoraproject.org:1135567] 3 more processes
>> > have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
>> >
>> > #[buildhw-x86-09.iad2.fedoraproject.org:1135567] Set MCA parameter
>> > "orte_base_help_aggregate" to 0 to see all help / error messages
>> >
>> > Looks like an error related to OpenMPI-4*:
>> > https://github.com/open-mpi/ompi/issues/6374
>> >
>>
>> --
>> ---
>> Antonio Trande
>> Fedora Project
>> mailto: sagit...@fedoraproject.org
>> GPG key: 0x29FBC85D7A51CC2F
>> GPG key server: https://keyserver1.pgp.com/
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>


Re: [petsc-dev] help on Summit

2021-08-30 Thread Junchao Zhang
Can you use less fancy 'static const int'?
--Junchao Zhang


On Mon, Aug 30, 2021 at 1:02 PM Jacob Faibussowitsch 
wrote:

> No luck with C++14
>
>
> TL;DR: you need the host and device compilers to either both use c++17 or
> neither use c++17.
>
> Long version:
> C++17 among other things changed how static constexpr member variables for
> classes worked. Previously if I had a class with a static constexpr member
> variable I would have to not only declare it inline within the class, but
> also define it within an executable otherwise the variable would not
> actually have any physical memory address:
>
> // foo.hpp
> class foo
> {
>   static constexpr int bar = 5;
> };
>
> // foo.cpp
> int foo::bar;
>
> In c++17 however this changed because you can have static “inline”
> variables. All this does is force the compiler to define the variable for you
> instead. The issue of course is that static constexpr implicitly makes that
> variable inline in c++17. So to sum it up:
>
> 1. The c++17 compiler (nvcc) sees the static constexpr variable, goes “hmm
> ok I will define this in some undefined location”.
> 2. The c++11/14 compiler comes along, sees your hand-coded definition of
> the variable and goes “ah but I think I’ve seen this before, I’ll ignore
> it”. This silent rejection is due to the hand-coded definition idiom being
> deprecated from c++17 onwards. Stupid, I know.
> 3. The linker (driven by the c++11/14 compiler, since PETSc links using the
> host compiler) comes along and now suddenly cannot find the literal
> definition, because it doesn’t know what the c++17 compiler did. Disaster!
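A minimal sketch of that mismatch (illustrative only, not the PETSc source):

// foo.hpp
class foo
{
public:
  // In C++17 a static constexpr data member is implicitly inline: this declaration
  // is already a definition, and the compiler emits it in a location of its choosing.
  static constexpr int bar = 5;
};

// foo.cpp
// Pre-C++17 this out-of-line definition was required to give bar an address;
// from C++17 on it is deprecated and redundant. Translation units built with
// -std=c++14 and -std=c++17 therefore disagree about who defines bar, which is
// the multiple-definition / missing-symbol trouble seen in the link errors below.
int foo::bar;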
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Aug 30, 2021, at 10:12, Mark Adams  wrote:
>
> No luck with C++14
>
>CUDAC
> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>CUDAC.dep
> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>  CLINKER arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3
> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/impls/cupm/cuda/cupmcontext.o:(.rodata._ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE[_ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE]+0x0):
> multiple definition of
> `Petsc::CUPMInterface<(Petsc::CUPMDeviceKind)0>::cupmStreamNonBlocking'
> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/interface/cupminterface.o:(.rodata+0x44):
> first defined here
> /usr/bin/ld: link errors found, deleting executable
> `arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3'
> collect2: error: ld returned 1 exit status
> gmake[3]: *** [gmakefile:113:
> arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3] Error 1
> gmake[2]: ***
> [/gpfs/alpine/csc314/scratch/adams/petsc2/lib/petsc/conf/rules:50: libs]
> Error 2
> **ERROR*
>   Error during compile, check
> arch-summit-hypre-cuda-dbg/lib/petsc/conf/make.log
>   Send it and arch-summit-hypre-cuda-dbg/lib/petsc/conf/configure.log to
> petsc-ma...@mcs.anl.gov
> ********
> gmake[1]: *** [makefile:40: all] Error 1
>
> On Mon, Aug 30, 2021 at 10:50 AM Mark Adams  wrote:
>
>> Stefano suggested C++14 in configure. I was using C++11.
>>
>> On Mon, Aug 30, 2021 at 10:46 AM Junchao Zhang 
>> wrote:
>>
>>>  Petsc::CUPMInterface
>>> @Jacob Faibussowitsch 
>>> --Junchao Zhang
>>>
>>>
>>> On Mon, Aug 30, 2021 at 9:35 AM Mark Adams  wrote:
>>>
>>>> I was running fine this AM and am bouncing between modules to help two
>>>> apps (ECP milestone season) at the same time and something broke. I did
>>>> update main and I get the same error in main and a hypre branch of
>>>> Stefano's.
>>>> I started with a clean build and checked my modules...
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>CC arch-summit-hypre-cuda-dbg/obj/tao/interface/taosolver.o
>>>>   CC arch-summit-hypre-cuda-dbg/obj/ts/interface/ts.o
>>>>CUDAC
>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/dense/seq/cuda/densecuda.o
>>>>CUDAC.dep
>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/dense/seq/cuda/densecuda.o
>>>>CUDAC
>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparseband.o
>>>>CUDAC.dep
>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparseband.o
>>>>CUDAC
>>>> arch-summit-h

Re: [petsc-dev] help on Summit

2021-08-30 Thread Junchao Zhang
 Petsc::CUPMInterface
@Jacob Faibussowitsch 
--Junchao Zhang


On Mon, Aug 30, 2021 at 9:35 AM Mark Adams  wrote:

> I was running fine this AM and am bouncing between modules to help two
> apps (ECP milestone season) at the same time and something broke. I did
> update main and I get the same error in main and a hypre branch of
> Stefano's.
> I started with a clean build and checked my modules...
> Any ideas?
>
> Thanks,
> Mark
>
>CC arch-summit-hypre-cuda-dbg/obj/tao/interface/taosolver.o
>   CC arch-summit-hypre-cuda-dbg/obj/ts/interface/ts.o
>CUDAC
> arch-summit-hypre-cuda-dbg/obj/mat/impls/dense/seq/cuda/densecuda.o
>CUDAC.dep
> arch-summit-hypre-cuda-dbg/obj/mat/impls/dense/seq/cuda/densecuda.o
>CUDAC
> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparseband.o
>CUDAC.dep
> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparseband.o
>CUDAC
> arch-summit-hypre-cuda-dbg/obj/ts/utils/dmplexlandau/cuda/landaucu.o
>CUDAC.dep
> arch-summit-hypre-cuda-dbg/obj/ts/utils/dmplexlandau/cuda/landaucu.o
>CUDAC
> arch-summit-hypre-cuda-dbg/obj/vec/vec/impls/seq/seqcuda/veccuda2.o
>CUDAC.dep
> arch-summit-hypre-cuda-dbg/obj/vec/vec/impls/seq/seqcuda/veccuda2.o
>CUDAC
> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.o
>CUDAC.dep
> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.o
>CUDAC
> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o
>CUDAC.dep
> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o
>CUDAC
> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>CUDAC.dep
> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>  CLINKER arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3
> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/impls/cupm/cuda/cupmcontext.o:(.rodata._ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE[_ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE]+0x0):
> multiple definition of
> `Petsc::CUPMInterface<(Petsc::CUPMDeviceKind)0>::cupmStreamNonBlocking'
> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/interface/cupminterface.o:(.rodata+0x44):
> first defined here
> /usr/bin/ld: link errors found, deleting executable
> `arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3'
> collect2: error: ld returned 1 exit status
> gmake[3]: *** [gmakefile:113:
> arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3] Error 1
> gmake[2]: ***
> [/gpfs/alpine/csc314/scratch/adams/petsc2/lib/petsc/conf/rules:50: libs]
> Error 2
> **ERROR*
>   Error during compile, check
> arch-summit-hypre-cuda-dbg/lib/petsc/conf/make.log
>   Send it and arch-summit-hypre-cuda-dbg/lib/petsc/conf/configure.log to
> petsc-ma...@mcs.anl.gov
> 
> gmake[1]: *** [makefile:40: all] Error 1
> make: *** [GNUmakefile:9: all] Error 2
>


Re: [petsc-dev] PETSC_COMM_WORLD not a PETSc communicator

2021-08-16 Thread Junchao Zhang
Barry,
  Thanks for the PetscRegisterFinalize() suggestion. I made an MR at
https://gitlab.com/petsc/petsc/-/merge_requests/4238
   In rare cases, if I do need to duplicate communicators, I now free them
through PetscRegisterFinalize().
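The pattern is roughly the following minimal sketch (hypothetical names, written in the ierr/CHKERRQ style of this era; it is not the code in MR !4238):

#include <petscsys.h>

/* a communicator duplicated lazily and released automatically during PetscFinalize() */
static MPI_Comm dupcomm = MPI_COMM_NULL;

static PetscErrorCode FreeDupComm(void)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  if (dupcomm != MPI_COMM_NULL) {ierr = MPI_Comm_free(&dupcomm);CHKERRMPI(ierr);}
  PetscFunctionReturn(0);
}

static PetscErrorCode GetDupComm(MPI_Comm *comm)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  if (dupcomm == MPI_COMM_NULL) {
    ierr = MPI_Comm_dup(PETSC_COMM_WORLD,&dupcomm);CHKERRMPI(ierr);
    ierr = PetscRegisterFinalize(FreeDupComm);CHKERRQ(ierr); /* FreeDupComm() runs inside PetscFinalize() */
  }
  *comm = dupcomm;
  PetscFunctionReturn(0);
}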

--Junchao Zhang


On Sun, Aug 15, 2021 at 12:50 PM Barry Smith  wrote:

>
>Junchao,
>
> Thanks for checking this.
>
> Could you use PetscRegisterFinalize()?
>
>  Barry
>
>
> On Aug 14, 2021, at 10:46 PM, Junchao Zhang 
> wrote:
>
>
>
> On Thu, Aug 12, 2021 at 11:22 AM Barry Smith  wrote:
>
>>
>>  User visible communicators generally do not have a keyval attached.
>> Rather the keyval is attached to the inner communicator; because we don't
>> want both PETSc and the user doing MPI operations on the same communicator
>> (to prevent mixing up tags).
>>
>>   I think PetscShmCommGet() is wrong. I think it should not call
>> MPI_Comm_get_attr(globcomm,Petsc_Counter_keyval,,); but should
>> call PetscCommDuplicate() and use that communicator to stash the pshmcomm;
>> then you would not have the problem you are having.
>>
> Barry, I thought it over and the problem is:  if I PetscCommDuplicate()
> the outer comm in PetscShmCommGet(), then I cannot find a place to destroy
> the inner communicator (note that petsc inner communicators are usually
> destroyed along with petsc objects; doing what you said breaks that rule).
>
> PetscShmCommGet() was designed to help doing OpenMP multithreading on a
> communicator that some petsc objects live in. So requiring the input
> communicator to be petsc comm is not totally nonsense.
>
> I tried another approach: Don't check whether the input comm in
> PetscShmCommGet(globcomm,) is an outer comm or an inner comm. We
> instead check its Petsc_ShmComm_keyval. If it does not have one, we just
> create one for it (along with a new shared memory communicator)
> With that, one is able to call PetscShmCommGet(PETSC_COMM_WORLD, ...).
> The problem is we attached an attribute to PETSC_COMM_WORLD. It is deleted
> inside MPI_Finalize().   PETSc -malloc_dump complains of unfreed memory
> (since I used PetscMalloc inside PetscShmCommGet). I could
> bypass PetscMalloc and directly use malloc() to avoid this situation. Is
> it worth it?
>
>
>>
>>
>> Barry
>>
>>
>>
>> > On Aug 12, 2021, at 11:05 AM, Pierre Jolivet  wrote:
>> >
>> > Hello,
>> > Is there a specific reason why PETSC_COMM_WORLD is not a PETSc
>> communicator, i.e., has no Petsc_Counter_keyval attached?
>> > ierr =
>> PetscOmpCtrlCreate(PETSC_COMM_WORLD,nthreads,);CHKERRQ(ierr);
>> > yields
>> > [0]PETSC ERROR: Bad MPI communicator supplied must be a PETSc
>> communicator
>> > [0]PETSC ERROR: #1 PetscShmCommGet() at src/sys/utils/mpishm.c:60
>> > [0]PETSC ERROR: #2 PetscOmpCtrlCreate() at src/sys/utils/mpishm.c:340
>> >
>> > Thanks,
>> > Pierre
>>
>>
>


Re: [petsc-dev] PETSC_COMM_WORLD not a PETSc communicator

2021-08-14 Thread Junchao Zhang
On Thu, Aug 12, 2021 at 11:22 AM Barry Smith  wrote:

>
>  User visible communicators generally do not have a keyval attached.
> Rather the keyval is attached to the inner communicator; because we don't
> want both PETSc and the user doing MPI operations on the same communicator
> (to prevent mixing up tags).
>
>   I think PetscShmCommGet() is wrong. I think it should not call
> MPI_Comm_get_attr(globcomm,Petsc_Counter_keyval,,); but should
> call PetscCommDuplicate() and use that communicator to stash the pshmcomm;
> then you would not have the problem you are having.
>
Barry, I thought it over and the problem is:  if I PetscCommDuplicate()
the outer comm in PetscShmCommGet(), then I cannot find a place to destroy
the inner communicator (note that petsc inner communicators are usually
destroyed along with petsc objects; doing what you said breaks that rule).

PetscShmCommGet() was designed to help doing OpenMP multithreading on a
communicator that some petsc objects live in. So requiring the input
communicator to be petsc comm is not totally nonsense.

I tried another approach: Don't check whether the input comm in
PetscShmCommGet(globcomm,) is an outer comm or an inner comm. We
instead check its Petsc_ShmComm_keyval. If it does not have one, we just
create one for it (along with a new shared memory communicator)
With that, one is able to call PetscShmCommGet(PETSC_COMM_WORLD, ...).  The
problem is we attached an attribute to PETSC_COMM_WORLD. It is deleted
inside MPI_Finalize().   PETSc -malloc_dump complains of unfreed memory
(since I used PetscMalloc inside PetscShmCommGet). I could
bypass PetscMalloc and directly use malloc() to avoid this situation. Is it
worth it?
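For context, the attribute mechanism being discussed looks roughly like this standalone sketch (hypothetical plain-MPI code, not the PETSc source); the delete callback attached to MPI_COMM_WORLD only runs inside MPI_Finalize(), after PETSc's -malloc_dump accounting has already happened:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* delete callback: invoked when the attribute is removed, here only inside MPI_Finalize() */
static int DeleteAttr(MPI_Comm comm, int keyval, void *attr, void *extra_state)
{
  printf("freeing cached attribute\n");
  free(attr);
  return MPI_SUCCESS;
}

int main(int argc, char **argv)
{
  int   keyval, flag;
  void *attr;

  MPI_Init(&argc, &argv);
  MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, DeleteAttr, &keyval, NULL);

  MPI_Comm_get_attr(MPI_COMM_WORLD, keyval, &attr, &flag);
  if (!flag) {                                   /* first call: create and cache a payload */
    attr = malloc(16);
    MPI_Comm_set_attr(MPI_COMM_WORLD, keyval, attr);
  }

  MPI_Finalize();                                /* DeleteAttr fires in here, not earlier */
  return 0;
}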


>
>
> Barry
>
>
>
> > On Aug 12, 2021, at 11:05 AM, Pierre Jolivet  wrote:
> >
> > Hello,
> > Is there a specific reason why PETSC_COMM_WORLD is not a PETSc
> communicator, i.e., has no Petsc_Counter_keyval attached?
> > ierr = PetscOmpCtrlCreate(PETSC_COMM_WORLD,nthreads,);CHKERRQ(ierr);
> > yields
> > [0]PETSC ERROR: Bad MPI communicator supplied must be a PETSc
> communicator
> > [0]PETSC ERROR: #1 PetscShmCommGet() at src/sys/utils/mpishm.c:60
> > [0]PETSC ERROR: #2 PetscOmpCtrlCreate() at src/sys/utils/mpishm.c:340
> >
> > Thanks,
> > Pierre
>
>


Re: [petsc-dev] building on Spock

2021-08-13 Thread Junchao Zhang
Yes, the two files are just petsc headers.

--Junchao Zhang


On Fri, Aug 13, 2021 at 5:17 PM Mark Adams  wrote:

> I seem to be getting Kokkos includes in my install but there is no kokkos
> in the configure and I started with a clean PETSc arch directory and
> install directory. Does this make sense?
>
> 18:12 main= /gpfs/alpine/csc314/scratch/adams/petsc$ ll
> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray/include/
>  | grep -i kokko
> -rw-rw-r-- 1 adams adams7074 May 27 06:57 petscdmda_kokkos.hpp
> -rw-rw-r-- 1 adams adams1636 May 27 06:57 petscvec_kokkos.hpp
> 18:12 main= /gpfs/alpine/csc314/scratch/adams/petsc$
>
>
> On Fri, Aug 13, 2021 at 2:27 PM Mark Adams  wrote:
>
>>
>>
>> On Fri, Aug 13, 2021 at 2:01 PM Matthew Knepley 
>> wrote:
>>
>>> On Fri, Aug 13, 2021 at 1:49 PM Mark Adams  wrote:
>>>
>>>> I was building on Spock a few weeks ago, but am getting these errors
>>>> now.
>>>> I have a setup with this environment and get this error.
>>>>
>>>> Any ideas?
>>>>
>>>
>>> Here is the error:
>>>
>>> Executing: cc  -o /tmp/petsc-6izfgpkr/config.setCompilers/conftest
>>> -L/opt/rocm-4.2.0/lib -lhsa-runtime64 -L${ROCM_PATH}/lib -lamdhip64
>>> -lhsa-runtime64  /tmp/petsc-6izfgpkr/config.setCompilers/conftest.o
>>> Possible ERROR while running linker: exit code 1
>>> stderr:
>>> ld.lld: error: unable to find library -lmpi_gtl_hsa
>>> clang-12: error: linker command failed with exit code 1 (use -v to see
>>> invocation)
>>> Linker output before filtering:
>>>
>>> ld.lld: error: unable to find library -lmpi_gtl_hsa
>>> clang-12: error: linker command failed with exit code 1 (use -v to see
>>> invocation)
>>> :
>>> Linker output after filtering:
>>>
>>> ld.lld: error: unable to find library -lmpi_gtl_hsa
>>> clang-12: error: linker command failed with exit code 1 (use -v to see
>>> invocation):
>>>   Error testing C compiler: Cannot compile/link C with cc.
>>> Deleting "CC"
>>>
>>> That -l looks like it is put in directly by 'cc'. How does anything work
>>> on that machine?
>>>
>>
>> It looks like this comes from an env variable that I have:
>>
>> export PE_MPICH_GTL_LIBS_amd_gfx908="-lmpi_gtl_hsa"
>>
>>
>>>
>>>Matt
>>>
>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> module load craype-accel-amd-gfx908
>>>> module load rocm
>>>> module load emacs
>>>> module load zlib
>>>> module load autoconf automake libtool
>>>>
>>>> ## These must be set before compiling so the executable picks up GTL
>>>> export
>>>> PE_MPICH_GTL_DIR_amd_gfx908="-L/opt/cray/pe/mpich/8.1.4/gtl/lib"
>>>> export PE_MPICH_GTL_LIBS_amd_gfx908="-lmpi_gtl_hsa"
>>>>
>>>> ## These must be set before running
>>>> export MPIR_CVAR_GPU_EAGER_DEVICE_MEM=0
>>>> export MPICH_GPU_SUPPORT_ENABLED=1
>>>> export MPICH_SMP_SINGLE_COPY_MODE=CMA
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>


Re: [petsc-dev] Kokkos make error on Spock

2021-07-17 Thread Junchao Zhang
Mark,  I can reproduce this error with PrgEnv-cray, i.e., using the Cray
compiler (clang-11).  Previously I used PrgEnv-gnu, which did not have this
error.
It is probably a problem with Spock, but I am not sure.

--Junchao Zhang


On Sat, Jul 17, 2021 at 10:17 AM Mark Adams  wrote:

> And I can run a fortran test, with warnings, but C tests fail:
>
> 11:15 jczhang/fix-cray-mpicxx-includes/main=
> /gpfs/alpine/csc314/scratch/adams/petsc$ make
> PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
> PETSC_ARCH=arch-spock-opt-cray-kokkos -f gmakefile test
> search='ts_utils_dmplexlandau_tutorials-ex1f90_0'
> Using MAKEFLAGS: -- search=ts_utils_dmplexlandau_tutorials-ex1f90_0
> PETSC_ARCH=arch-spock-opt-cray-kokkos
> PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
>   FC
> arch-spock-opt-cray-kokkos/tests/ts/utils/dmplexlandau/tutorials/ex1f90.o
>  FLINKER
> arch-spock-opt-cray-kokkos/tests/ts/utils/dmplexlandau/tutorials/ex1f90
> /opt/cray/pe/cce/11.0.4/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld:
> warning: alignment 128 of symbol
> `$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in
> /opt/cray/pe/cce/11.0.4/cce/x86_64/lib/libmodules.so is smaller than 256 in
> arch-spock-opt-cray-kokkos/tests/ts/utils/dmplexlandau/tutorials/ex1f90.o
> /opt/cray/pe/cce/11.0.4/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld:
> warning: alignment 64 of symbol `$data_init$iso_c_binding_' in
> /opt/cray/pe/cce/11.0.4/cce/x86_64/lib/libmodules.so is smaller than 256 in
> arch-spock-opt-cray-kokkos/tests/ts/utils/dmplexlandau/tutorials/ex1f90.o
> TEST
> arch-spock-opt-cray-kokkos/tests/counts/ts_utils_dmplexlandau_tutorials-ex1f90_0.counts
>  ok ts_utils_dmplexlandau_tutorials-ex1f90_0
>  ok diff-ts_utils_dmplexlandau_tutorials-ex1f90_0
>
>
> On Sat, Jul 17, 2021 at 10:53 AM Mark Adams  wrote:
>
>> HUmm, I can not reproduce this.
>>
>> 10:49 jczhang/fix-cray-mpicxx-includes/main=
>> /gpfs/alpine/csc314/scratch/adams/petsc$ make
>> PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
>> PETSC_ARCH=arch-spock-dbg-kokkos check
>> Running check examples to verify correct installation
>> Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and
>> PETSC_ARCH=arch-spock-dbg-kokkos
>> gmake[3]:
>> [/gpfs/alpine/csc314/scratch/adams/petsc/lib/petsc/conf/rules:301:
>> ex19.PETSc] Error 2 (ignored)
>> ***Error detected during compile or
>> link!***
>> See http://www.mcs.anl.gov/petsc/documentation/faq.html
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex19
>>
>> *
>> cc -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lamdhip64 -lhsa-runtime64
>>  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
>> -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O0  -fPIC
>> -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
>> -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include
>> -I/gpfs/alpine/csc314/scratch/adams/petsc/include
>> -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-dbg-kokkos/include
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include
>>  -I/sw/spock/spack-envs/views/rocm-4.1.0/includeex19.c
>>  -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-dbg-kokkos/lib
>> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-dbg-kokkos/lib
>> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-dbg-kokkos/lib
>> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-dbg-kokkos/lib
>> -Wl,-rpath,/sw/spock/spack-envs/views/rocm-4.1.0/lib
>> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib
>> -Wl,-rpath,/opt/gcc/8.1.0/snos/lib64 -L/opt/gcc/8.1.0/snos/lib64
>> -Wl,-rpath,/opt/cray/pe/libsci/21.04.1.1/CRAY/9.0/x86_64/lib
>> -L/opt/cray/pe/libsci/21.04.1.1/CRAY/9.0/x86_64/lib
>> -Wl,-rpath,/opt/cray/pe/mpich/8.1.4/ofi/cray/9.1/lib
>> -L/opt/cray/pe/mpich/8.1.4/ofi/cray/9.1/lib
>> -Wl,-rpath,/opt/cray/pe/mpich/8.1.4/gtl/lib
>> -L/opt/cray/pe/mpich/8.1.4/gtl/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.10/lib
>> -L/opt/cray/pe/pmi/6.0.10/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.1.4/dsmml/lib
>> -L/opt/cray/pe/dsmml/0.1.4/dsmml/lib
>> -Wl,-rpath,/opt/cray/pe/cce/11.0.4/cce/x86_64/lib
>> -L/opt/cray/pe/cce/11.0.4/cce/x86_64/lib
>> -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.7__g3cf3325.shasta/lib64
>> -L/opt/cray/xpmem/2.2.40-2.1_2.7__g3cf3325.shasta/lib64
>> -Wl,-rpath,/opt/cray/pe/cce/11.0.4/cce-clang/x86_64/lib/clang/11.0.0/lib/linux
>> -L/opt/cray/pe/cce/11.0.4/cce-clang/x86_64/lib/clang/11.0.0/lib/linux
>> -W

Re: [petsc-dev] Kokkos make error on Spock

2021-07-16 Thread Junchao Zhang
Mark,  I configured with Fortran bindings enabled, using main + MR !4175
<https://gitlab.com/petsc/petsc/-/merge_requests/4175>

'--with-mpiexec=srun',
'--with-shared-libraries=1',
'--with-cc=cc',
'--with-cxx=CC',
'--with-fc=ftn',
'--with-fortran-bindings',
'--with-hip',
'--with-hipc=hipcc',
'--with-debugging',
'--CPPFLAGS=-I${ROCM_PATH}/include',
'--CXXPPFLAGS=-I${ROCM_PATH}/include',
'--CC_LINKER_FLAGS=-L${ROCM_PATH}/lib -lamdhip64 -lhsa-runtime64',
'--CXX_LINKER_FLAGS=-L${ROCM_PATH}/lib -lamdhip64 -lhsa-runtime64',
'--FC_LINKER_FLAGS=-L${ROCM_PATH}/lib -lamdhip64 -lhsa-runtime64',
'--COPTFLAGS=-g -O0',
'--CXXOPTFLAGS=-g -O0',
'--FOPTFLAGS=-g -O0',
'--download-kokkos',
'--download-kokkos-kernels',
'--download-kokkos-commit=3.4.01',
'--download-kokkos-kernels-commit=3.4.01',
'--with-kokkos-hip-arch=VEGA908',

and 'make check' ran smoothly on a compute node

$ make check
Running check examples to verify correct installation
Using PETSC_DIR=/ccs/home/jczhang/petsc and
PETSC_ARCH=arch-spock-cray-kokkos-dbg
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
C/C++ example src/snes/tutorials/ex3k run successfully with kokkos-kernels
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process

Completed test examples


--Junchao Zhang


On Fri, Jul 16, 2021 at 6:04 PM Mark Adams  wrote:

> And I find that this error, on non-Kokkos C tests, is fixed by turning the
> fortran bindings off:
>
> ld.lld: error:
> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib/libpetsc.so:
> undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa
> [--no-allow-shlib-undefined]
> ld.lld: error:
> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib/libpetsc.so:
> undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa
> [--no-allow-shlib-undefined]
> ld.lld: error:
> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib/libpetsc.so:
> undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa
> [--no-allow-shlib-undefined]
>
> On Fri, Jul 16, 2021 at 3:53 PM Mark Adams  wrote:
>
>> Not complex. THis has some overlap with my problem w/o Kokkos.
>>
>> On Fri, Jul 16, 2021 at 12:54 PM Junchao Zhang 
>> wrote:
>>
>>> Do you use complex? Please post your configure.log.
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Fri, Jul 16, 2021 at 9:47 AM Mark Adams  wrote:
>>>
>>>> The simple Kokkos example is failing for me on Spock.
>>>> Any ideas?
>>>> Thanks,
>>>>
>>>> 10:44 main *=
>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make
>>>> PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
>>>> PETSC_ARCH=arch-spock-opt-cray-kokkos ex3k
>>>> MPICH_CXX="hipcc" OMPI_CXX="hipcc" CC
>>>> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lhsa-runtime64
>>>> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lamdhip64 -lhsa-runtime64
>>>>  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
>>>> -fstack-protector -fvisibility=hidden -g -O2   -fPIC -Wall -Wwrite-strings
>>>> -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O3
>>>> -std=c++14  -I/gpfs/alpine/csc314/scratch/adams/petsc/include
>>>> -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/include
>>>> -I/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/include
>>>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include
>>>>  -I/sw/spock/spack-envs/views/rocm-4.1.0/includeex3k.kokkos.cxx
>>>>  
>>>> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/lib
>>>> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/lib
>>>> -Wl,-rpath,/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib
>>>> -L/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib
>>>> -Wl,-rpath,/sw/spock/spack-envs/views/rocm-4.1.0/lib
>>>> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib
>>>> -Wl,-rpath,/opt/gcc/8.1.0/snos/lib64 -L/opt/gcc/8.1.0/snos/lib64
>>>> -Wl,-rpath,/opt/cray/pe/libsci/21.04.1.1/CRAY/9.0/x86_64/lib
>>>> -L/opt/cray/pe/libsci/21.04.1.1/CRAY/9.0/x86_64/lib
>>>> -Wl,-rpath,/opt/cray/pe/mpich/8.1.4/ofi/cray/9.1/lib
>>>> -L/opt/cray/pe/mpich/8.1.4/ofi/cray/9.1/lib
>>>> -Wl,-rpath,/opt/cray/pe/mpich/8.1.4/gtl/lib
>>>> -L/opt/cray/pe/mpich/8.1.4/gtl/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.10/lib

Re: [petsc-dev] Kokkos make error on Spock

2021-07-16 Thread Junchao Zhang
I don't understand this when linking ex19
   ld.lld: error:
/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/lib/libpetsc.so:
undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa
[--no-allow-shlib-undefined]
Do you have a stale libpetsc.so?

For ex3k, it is a makefile problem (which I am fixing). If you do not directly
build an executable from *.kokkos.cxx, then you can avoid this problem.
For example, snes/tests/ex13 works with kokkos options on Spock (see the sketch below).
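Something like the following invocation, for example (hypothetical; the mesh options are placeholders borrowed from runs earlier in this archive, and only the back-end options matter here):

srun -n 2 ./ex13 -dm_plex_box_faces 2,2,2 -dm_refine 1 -dm_mat_type aijkokkos -dm_vec_type kokkos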

--Junchao Zhang


On Fri, Jul 16, 2021 at 2:53 PM Mark Adams  wrote:

> Not complex. THis has some overlap with my problem w/o Kokkos.
>
> On Fri, Jul 16, 2021 at 12:54 PM Junchao Zhang 
> wrote:
>
>> Do you use complex? Please post your configure.log.
>>
>> --Junchao Zhang
>>
>>
>> On Fri, Jul 16, 2021 at 9:47 AM Mark Adams  wrote:
>>
>>> The simple Kokkos example is failing for me on Spock.
>>> Any ideas?
>>> Thanks,
>>>
>>> 10:44 main *=
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make
>>> PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
>>> PETSC_ARCH=arch-spock-opt-cray-kokkos ex3k
>>> MPICH_CXX="hipcc" OMPI_CXX="hipcc" CC
>>> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lhsa-runtime64
>>> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lamdhip64 -lhsa-runtime64
>>>  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
>>> -fstack-protector -fvisibility=hidden -g -O2   -fPIC -Wall -Wwrite-strings
>>> -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O3
>>> -std=c++14  -I/gpfs/alpine/csc314/scratch/adams/petsc/include
>>> -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/include
>>> -I/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/include
>>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include
>>>  -I/sw/spock/spack-envs/views/rocm-4.1.0/includeex3k.kokkos.cxx
>>>  
>>> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/lib
>>> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/lib
>>> -Wl,-rpath,/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib
>>> -L/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib
>>> -Wl,-rpath,/sw/spock/spack-envs/views/rocm-4.1.0/lib
>>> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib
>>> -Wl,-rpath,/opt/gcc/8.1.0/snos/lib64 -L/opt/gcc/8.1.0/snos/lib64
>>> -Wl,-rpath,/opt/cray/pe/libsci/21.04.1.1/CRAY/9.0/x86_64/lib
>>> -L/opt/cray/pe/libsci/21.04.1.1/CRAY/9.0/x86_64/lib
>>> -Wl,-rpath,/opt/cray/pe/mpich/8.1.4/ofi/cray/9.1/lib
>>> -L/opt/cray/pe/mpich/8.1.4/ofi/cray/9.1/lib
>>> -Wl,-rpath,/opt/cray/pe/mpich/8.1.4/gtl/lib
>>> -L/opt/cray/pe/mpich/8.1.4/gtl/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.10/lib
>>> -L/opt/cray/pe/pmi/6.0.10/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.1.4/dsmml/lib
>>> -L/opt/cray/pe/dsmml/0.1.4/dsmml/lib
>>> -Wl,-rpath,/opt/cray/pe/cce/11.0.4/cce/x86_64/lib
>>> -L/opt/cray/pe/cce/11.0.4/cce/x86_64/lib
>>> -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.7__g3cf3325.shasta/lib64
>>> -L/opt/cray/xpmem/2.2.40-2.1_2.7__g3cf3325.shasta/lib64
>>> -Wl,-rpath,/opt/cray/pe/cce/11.0.4/cce-clang/x86_64/lib/clang/11.0.0/lib/linux
>>> -L/opt/cray/pe/cce/11.0.4/cce-clang/x86_64/lib/clang/11.0.0/lib/linux
>>> -Wl,-rpath,/opt/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0
>>> -L/opt/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0
>>> -Wl,-rpath,/opt/cray/pe/cce/11.0.4/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib
>>> -L/opt/cray/pe/cce/11.0.4/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib
>>> -lpetsc -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse
>>> -lhipblas -lrocsparse -lrocsolver -lrocblas -lamdhip64 -lhsa-runtime64
>>> -lstdc++ -ldl -lpmi -lsci_cray_mpi -lsci_cray -lmpifort_cray -lmpi_cray
>>> -lmpi_gtl_hsa -lxpmem -ldsmml -lpgas-shmem -lquadmath -lcrayacc_amdgpu
>>> -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread
>>> -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64
>>> -lquadmath -lstdc++ -ldl -o ex3k
>>> In file included from ex3k.kokkos.cxx:3:
>>> In file included from
>>> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscdmda_kokkos.hpp:4:
>>> In file included from
>>> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscvec_kokkos.hpp:14:
>>> In file included from
>>> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscvec.h:

Re: [petsc-dev] Kokkos make error on Spock

2021-07-16 Thread Junchao Zhang
Do you use complex? Please post your configure.log.

--Junchao Zhang


On Fri, Jul 16, 2021 at 9:47 AM Mark Adams  wrote:

> The simple Kokkos example is failing for me on Spock.
> Any ideas?
> Thanks,
>
> 10:44 main *= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$
> make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
> PETSC_ARCH=arch-spock-opt-cray-kokkos ex3k
> MPICH_CXX="hipcc" OMPI_CXX="hipcc" CC
> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lhsa-runtime64
> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lamdhip64 -lhsa-runtime64
>  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> -fstack-protector -fvisibility=hidden -g -O2   -fPIC -Wall -Wwrite-strings
> -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O3
> -std=c++14  -I/gpfs/alpine/csc314/scratch/adams/petsc/include
> -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/include
> -I/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/include
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include
>  -I/sw/spock/spack-envs/views/rocm-4.1.0/includeex3k.kokkos.cxx
>  
> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/lib
> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/lib
> -Wl,-rpath,/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib
> -L/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/lib
> -Wl,-rpath,/sw/spock/spack-envs/views/rocm-4.1.0/lib
> -L/sw/spock/spack-envs/views/rocm-4.1.0/lib
> -Wl,-rpath,/opt/gcc/8.1.0/snos/lib64 -L/opt/gcc/8.1.0/snos/lib64
> -Wl,-rpath,/opt/cray/pe/libsci/21.04.1.1/CRAY/9.0/x86_64/lib
> -L/opt/cray/pe/libsci/21.04.1.1/CRAY/9.0/x86_64/lib
> -Wl,-rpath,/opt/cray/pe/mpich/8.1.4/ofi/cray/9.1/lib
> -L/opt/cray/pe/mpich/8.1.4/ofi/cray/9.1/lib
> -Wl,-rpath,/opt/cray/pe/mpich/8.1.4/gtl/lib
> -L/opt/cray/pe/mpich/8.1.4/gtl/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.10/lib
> -L/opt/cray/pe/pmi/6.0.10/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.1.4/dsmml/lib
> -L/opt/cray/pe/dsmml/0.1.4/dsmml/lib
> -Wl,-rpath,/opt/cray/pe/cce/11.0.4/cce/x86_64/lib
> -L/opt/cray/pe/cce/11.0.4/cce/x86_64/lib
> -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.7__g3cf3325.shasta/lib64
> -L/opt/cray/xpmem/2.2.40-2.1_2.7__g3cf3325.shasta/lib64
> -Wl,-rpath,/opt/cray/pe/cce/11.0.4/cce-clang/x86_64/lib/clang/11.0.0/lib/linux
> -L/opt/cray/pe/cce/11.0.4/cce-clang/x86_64/lib/clang/11.0.0/lib/linux
> -Wl,-rpath,/opt/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0
> -L/opt/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0
> -Wl,-rpath,/opt/cray/pe/cce/11.0.4/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib
> -L/opt/cray/pe/cce/11.0.4/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib
> -lpetsc -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse
> -lhipblas -lrocsparse -lrocsolver -lrocblas -lamdhip64 -lhsa-runtime64
> -lstdc++ -ldl -lpmi -lsci_cray_mpi -lsci_cray -lmpifort_cray -lmpi_cray
> -lmpi_gtl_hsa -lxpmem -ldsmml -lpgas-shmem -lquadmath -lcrayacc_amdgpu
> -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread
> -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64
> -lquadmath -lstdc++ -ldl -o ex3k
> In file included from ex3k.kokkos.cxx:3:
> In file included from
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscdmda_kokkos.hpp:4:
> In file included from
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscvec_kokkos.hpp:14:
> In file included from
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscvec.h:9:
> In file included from
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscsys.h:42:
> In file included from
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscsystypes.h:255:
> In file included from
> /gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/include/Kokkos_Complex.hpp:47:
> In file included from
> /gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/include/Kokkos_Atomic.hpp:212:
> /gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/include/impl/Kokkos_Atomic_Exchange.hpp:259:11:
> error: no member named 'lock_address_host_space' in namespace
> 'Kokkos::Impl::Kokkos::Impl'; did you mean simply 'lock_address_host_space'?
>   while (!Impl::lock_address_host_space((void*)dest))
>   ^
>   lock_address_host_space
> /gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray-kokkos/include/Kokkos_HostSpace.hpp:83:6:
> note: 'lock_address_host_space' declared here
> bool lock_address_host_space(void* ptr);
>  ^
> In file included from ex3k.kokkos.cxx:3:
> In file included from
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscd

Re: [petsc-dev] VecGetArrayAndMemType

2021-06-23 Thread Junchao Zhang
Mark,
  I am not sure what your problem is.  If it is a regression, can you
bisect it?
--Junchao Zhang


On Wed, Jun 23, 2021 at 4:04 PM Mark Adams  wrote:

> I also tried commenting out the second VecView, so there is just one step
> in the file, and the .h5 file is only 8 bytes smaller and the .xmf file
> goes from 5373  bytes to 3090 bytes.
>
> On Wed, Jun 23, 2021 at 4:01 PM Mark Adams  wrote:
>
>> It is not a device issue but it is a regression.
>>
>> Landau ex1 is tiny and just calls VecView before and after the TSsolve,
>> which is one time step. If you add "*-dm_view hdf5:f.h5 -vec_view
>> hdf5:f.h5::append -dm_landau_Ez 10.*" to landau/ex1 (see below), you get
>> an h5 file with two time steps, as it should be.
>> This is a huge electric field, Ez=10, which makes the electron
>> distribution (u_e) get visibly pulled off center.
>> In Visit, both time steps have identical data that is clearly after the
>> solve and not the initial condition (see attached).
>>
>> I ran this again with -ex1_ts_max_steps 0 and get the expected result of
>> two steps/frames with the symmetric initial condition in both. This is
>> correct behavior.
>>
>> Any ideas?
>> Thanks
>>
>> diff --git a/src/ts/utils/dmplexlandau/tutorials/ex1.c
>> b/src/ts/utils/dmplexlandau/tutorials/ex1.c
>> index 9e4c8f1b61..31dfda2fad 100644
>> --- a/src/ts/utils/dmplexlandau/tutorials/ex1.c
>> +++ b/src/ts/utils/dmplexlandau/tutorials/ex1.c
>> @@ -66,6 +66,6 @@ int main(int argc, char **argv)
>>test:
>>  suffix: 0
>>  requires: p4est !complex
>> -args: -petscspace_degree 3 -petscspace_poly_tensor 1 -dm_landau_type
>> p4est -dm_landau_ion_masses 2,4 -dm_landau_ion_charges 1,18
>> -dm_landau_thermal_temps 5,5,.5 -dm_landau_n 1.00018,1,1e-5 -dm_landau_n_0
>> 1e20 -ex1_ts_monitor -ex1_snes_rtol 1.e-14 -ex1_snes_stol 1.e-14
>> -ex1_snes_monitor -ex1_snes_converged_reason -ex1_ts_type arkimex
>> -ex1_ts_arkimex_type 1bee -ex1_ts_max_snes_failures -1 -ex1_ts_rtol 1e-1
>> -ex1_ts_dt 1.e-1 -ex1_ts_max_time 1 -ex1_ts_adapt_clip .5,1.25
>> -ex1_ts_adapt_scale_solve_failed 0.75
>> -ex1_ts_adapt_time_step_increase_delay 5 -ex1_ts_max_steps 1 -ex1_pc_type
>> lu -ex1_ksp_type preonly -dm_landau_amr_levels_max 7
>> -dm_landau_domain_radius 5 -dm_landau_amr_re_levels 0 -dm_landau_re_radius
>> 1 -dm_landau_amr_z_refine1 1 -dm_landau_amr_z_refine2 0
>> -dm_landau_amr_post_refine 0 -dm_landau_z_radius1 .1 -dm_landau_z_radius2
>> .1 -dm_refine 1 -dm_landau_gpu_assembly false
>> +args: -petscspace_degree 3 -petscspace_poly_tensor 1 -dm_landau_type
>> p4est -dm_landau_ion_masses 2,4 -dm_landau_ion_charges 1,18
>> -dm_landau_thermal_temps 5,5,.5 -dm_landau_n 1.00018,1,1e-5 -dm_landau_n_0
>> 1e20 -ex1_ts_monitor -ex1_snes_rtol 1.e-14 -ex1_snes_stol 1.e-14
>> -ex1_snes_monitor -ex1_snes_converged_reason -ex1_ts_type arkimex
>> -ex1_ts_arkimex_type 1bee -ex1_ts_max_snes_failures -1 -ex1_ts_rtol 1e-1
>> -ex1_ts_dt 1.e-1 -ex1_ts_max_time 1 -ex1_ts_adapt_clip .5,1.25
>> -ex1_ts_adapt_scale_solve_failed 0.75
>> -ex1_ts_adapt_time_step_increase_delay 5 -ex1_ts_max_steps 1 -ex1_pc_type
>> lu -ex1_ksp_type preonly -dm_landau_amr_levels_max 7
>> -dm_landau_domain_radius 5 -dm_landau_amr_re_levels 0 -dm_landau_re_radius
>> 1 -dm_landau_amr_z_refine1 1 -dm_landau_amr_z_refine2 0
>> -dm_landau_amr_post_refine 0 -dm_landau_z_radius1 .1 -dm_landau_z_radius2
>> .1 -dm_refine 1 -dm_landau_gpu_assembly false *-dm_view hdf5:f.h5
>> -vec_view hdf5:f.h5::append -dm_landau_Ez 10.*
>>
>>  TEST*/
>>
>> On Wed, Jun 23, 2021 at 1:38 PM Mark Adams  wrote:
>>
>>> Landau ex1 should work. I will test.
>>>
>>> On Wed, Jun 23, 2021 at 10:47 AM Matthew Knepley 
>>> wrote:
>>>
>>>> On Wed, Jun 23, 2021 at 10:44 AM Junchao Zhang 
>>>> wrote:
>>>>
>>>>> Use VecGetArrayRead/Write() to get up-to-date host pointers to the
>>>>> vector array.
>>>>>
>>>>
>>>> I think Mark is saying that those are not working. We do call
>>>> VecGetArrayRead() in the HDF5 code.
>>>>
>>>> Mark, it seems like a small broken code example is necessary.
>>>>
>>>>   Thanks,
>>>>
>>>> Matt
>>>>
>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Wed, Jun 23, 2021 at 9:15 AM Mark Adams  wrote:
>>>>>
>>>>>> First, there seem to be two pages for VecGetArrayAndMemType (one has
>>>>>> a pointer to the other).
>>>>>>
>>>>>> So I need to get a CPU array for HDF5 viewing. Totally broken for
>>>>>> devices.
>>>>>>
>>>>>> I don't find a VecGetArrayCpu[HOST] that does the right thing.
>>>>>>
>>>>>> Perhaps have VecGetArrayAndMemType return a valid CPU pointer when
>>>>>> "mtype==NULL"?
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>


Re: [petsc-dev] VecGetArrayAndMemType

2021-06-23 Thread Junchao Zhang
Use VecGetArrayRead/Write() to get up-to-date host pointers to the vector
array.
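
For instance, a minimal sketch (assuming a Vec v and the usual PetscErrorCode
ierr are in scope):

  const PetscScalar *a;
  ierr = VecGetArrayRead(v,&a);CHKERRQ(ierr);   /* for device Vec types, syncs the latest data to the host */
  /* ... read-only host access to a[], e.g. handing it to a viewer ... */
  ierr = VecRestoreArrayRead(v,&a);CHKERRQ(ierr);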

--Junchao Zhang


On Wed, Jun 23, 2021 at 9:15 AM Mark Adams  wrote:

> First, there seem to be two pages for VecGetArrayAndMemType (one has a
> pointer to the other).
>
> So I need to get a CPU array for HDF5 viewing. Totally broken for devices.
>
> I don't find a VecGetArrayCpu[HOST] that does the right thing.
>
> Perhaps have VecGetArrayAndMemType return a valid CPU pointer when
> "mtype==NULL"?
>
> Mark
>


Re: [petsc-dev] kokkos test fail in branch

2021-06-05 Thread Junchao Zhang
This problem was fixed in
https://gitlab.com/petsc/petsc/-/merge_requests/4056, and is waiting for
!3411 <https://gitlab.com/petsc/petsc/-/merge_requests/3411> :)
--Junchao Zhang


On Sat, Jun 5, 2021 at 9:42 PM Barry Smith  wrote:

>
>   Looks like the MPI libraries are not being passed to the NVCC internal
> compiler (gcc). This would normally be setup in MPI.py, please send
> configure.log
>
>   Barry
>
>
> On Jun 5, 2021, at 1:18 PM, Mark Adams  wrote:
>
> Ah, there was an old one there. I removed it and I can not compile ex3k. I
> thought check always rebuilt executables. Now I get a link error:
>
> 14:12 barry/2020-11-11/cleanup-matsetvaluesdevice *=
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make
> PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
> PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 ex3k
>
> PATH=/sw/sources/lsf-tools/2.0/summit/bin:/sw/summit/xalt/1.2.1/bin:/sw/summit/forge/20.0.1/bin:/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/bin:/sw/summit/gcc/6.4.0/bin:/sw/summit/cuda/10.1.243/bin:/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/cmake-3.20.2-24ualfzy6em6ws5sbiu7rlgcuionodrm/bin:/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/darshan-runtime-3.1.7-cnvxicgf5j4ap64qi6v5gxp67hmrjz43/bin:/sw/sources/hpss/bin:/opt/ibm/spectrumcomputing/lsf/
> 10.1.0.9/linux3.10-glibc2.17-ppc64le-csm/etc:/opt/ibm/spectrumcomputing/lsf/10.1.0.9/linux3.10-glibc2.17-ppc64le-csm/bin:/opt/ibm/csm/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibm/flightlog/bin:/opt/ibutils/bin:/opt/ibm/spectrum_mpi/jsm_pmix/bin:/opt/puppetlabs/bin:/usr/lpp/mmfs/bin:`dirname
> nvcc` NVCC_WRAPPER_DEFAULT_COMPILER=gcc
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/bin/nvcc_wrapper
> --expt-extended-lambda -Xcompiler -rdynamic -lineinfo -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4
> -Xcompiler -fPIC -O3  -gencode arch=compute_70,code=sm_70
>  
> -I/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/include
> -Wno-deprecated-gpu-targets
>  -I/gpfs/alpine/csc314/scratch/adams/petsc/include
> -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/include
> -I/sw/summit/cuda/10.1.243/include
>  
> -I/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/include
>   -fPIC -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC
> -DLANDAU_MAX_Q=4 -Werror=maybe-uninitialized -O0   ex3k.kokkos.cxx
>  
> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib
> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib
> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib
> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib
> -L/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/netlib-lapack-3.8.0-wcabdyqhdi5rooxbkqa6x5d7hxyxwdkm/lib64
> -Wl,-rpath,/sw/summit/cuda/10.1.243/lib64 -L/sw/summit/cuda/10.1.243/lib64
> -lpetsc -lkokkoskernels -lkokkoscontainers -lkokkoscore -lp4est -lsc -lblas
> -llapack -ltriangle -lm -lz -lcudart -lcufft -lcublas -lcusparse -lcusolver
> -lcurand -lstdc++ -ldl -o ex3k
> nvcc_wrapper - *warning* you have set multiple optimization flags (-O*),
> only the last is used because nvcc can only accept a single optimization
> setting.
>
> */usr/bin/ld: /tmp/tmpxft_e0c2_-10_ex3k.kokkos.o: undefined
> reference to symbol 
> 'ompi_mpi_comm_self'*/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libmpi_ibm.so.3:
> error adding symbols: DSO missing from command line
> collect2: error: ld returned 1 exit status
> make: *** [ex3k] Error 1
>
> On Sat, Jun 5, 2021 at 12:16 PM Junchao Zhang 
> wrote:
>
>> $rm ex3k
>> $make ex3k
>> and run again?
>>
>> --Junchao Zhang
>>
>>
>> On Sat, Jun 5, 2021 at 10:25 AM Mark Adams  wrote:
>&

Re: [petsc-dev] kokkos test fail in branch

2021-06-05 Thread Junchao Zhang
$rm ex3k
$make ex3k
and run again?

--Junchao Zhang


On Sat, Jun 5, 2021 at 10:25 AM Mark Adams  wrote:

> This is posted in Barry's MR, but I get this error with Kokkos-cuda on
> Summit. Failing to open a shared lib.
> Thoughts?
> Mark
>
> 11:15 barry/2020-11-11/cleanup-matsetvaluesdevice=
> /gpfs/alpine/csc314/scratch/adams/petsc$ make
> PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
> PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 check
> Running check examples to verify correct installation
> Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and
> PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> gmake[3]: [runex3k_kokkos] Error 127 (ignored)
> 1,25c1,2
> < atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=1
> < Vec Object: Exact Solution 2 MPI processes
> <   type: mpikokkos
> < Process [0]
> < 0.
> < 0.015625
> < 0.125
> < Process [1]
> < 0.421875
> < 1.
> < Vec Object: Forcing function 2 MPI processes
> <   type: mpikokkos
> < Process [0]
> < 1e-72
> < 1.50024
> < 3.01563
> < Process [1]
> < 4.67798
> < 7.
> <   0 SNES Function norm 5.414682427127e+00
> <   1 SNES Function norm 2.952582418265e-01
> <   2 SNES Function norm 4.502293658739e-04
> <   3 SNES Function norm 1.389665806646e-09
> < Number of SNES iterations = 3
> < Norm of error 1.49752e-10 Iterations 3
> ---
>
>
> *> ./ex3k: error while loading shared libraries: libpetsc.so.3.015: cannot
> open shared object file: No such file or directory> ./ex3k: error while
> loading shared libraries: libpetsc.so.3.015: cannot open shared object
> file: No such file or directory*
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
>


Re: [petsc-dev] [petsc-users] ex5k.kokkos compile error

2021-06-03 Thread Junchao Zhang
(Moved to petsc-dev)

Mark,
   Is it because your branch is out of date?  petsc should use nvcc_wrapper
to compile ex5k.kokkos.cxx.  See mine
$cd ~/petsc/src/mat/tutorials
$ make ex5k
PATH=/home/jczhang/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-8.3.0/gdb-9.1-bzjzolog57gom5anscfcb3oe6uqr6s6m/bin:/home/jczhang/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-8.3.0/bear-2.2.0-xhfk3moyh7jled6o62f3om557iia6oun/bin:/home/jczhang/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-8.3.0/perl-5.30.3-2boymujkmxezayez4emfw5sw5wuqgar6/bin:/home/jczhang/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-8.3.0/openmpi-4.0.2-7vmyqiyk4iyvdeoqpux7fyoce6mjt7iw/bin:/home/jczhang/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-8.3.0/cuda-10.2.89-uy4hlcnd7svrcahaguwx4bzl7ujoqx2v/bin:/nfs/gce/software/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.3.0/cmake-3.20.0-vov726r/bin:/nfs/gce/software/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.3.0/gcc-8.3.0-fjpc5ys/bin:/home/jczhang/soft/bin:/home/jczhang/spack/bin:/home/jczhang/arm/forge/20.2/bin:/home/jczhang/.vscode-server/data/User/globalStorage/llvm-vs-code-extensions.vscode-clangd/install/10.0.0/clangd_10.0.0/bin:/home/jczhang/.vscode-server/bin/054a9295330880ed74ceaedda236253b4f39a335/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:`dirname
nvcc` NVCC_WRAPPER_DEFAULT_COMPILER=gcc
/home/jczhang/petsc/linux-kokkos-dbg/bin/nvcc_wrapper --expt-extended-lambda
-Xcompiler -fPIC -g  -gencode arch=compute_70,code=sm_70
 
-I/home/jczhang/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-8.3.0/openmpi-4.0.2-7vmyqiyk4iyvdeoqpux7fyoce6mjt7iw/include
-Wno-deprecated-gpu-targets  -I/home/jczhang/petsc/include
-I/home/jczhang/petsc/linux-kokkos-dbg/include
 
-I/home/jczhang/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-8.3.0/openmpi-4.0.2-7vmyqiyk4iyvdeoqpux7fyoce6mjt7iw/include
  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
-fstack-protector -fvisibility=hidden -g -O0   ex5k.kokkos.cxx
 -Wl,-rpath,/home/jczhang/petsc/linux-kokkos-dbg/lib
-L/home/jczhang/petsc/linux-kokkos-dbg/lib
-Wl,-rpath,/home/jczhang/petsc/linux-kokkos-dbg/lib
-L/home/jczhang/petsc/linux-kokkos-dbg/lib -lpetsc -lkokkoskernels
-lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lcufft
-lcublas -lcusparse -lcusolver -lcurand -lX11 -lquadmath -lstdc++ -ldl -o
ex5k

--Junchao Zhang


On Thu, Jun 3, 2021 at 8:32 AM Mark Adams  wrote:

> I am getting this error:
>
> 09:22 adams/landau-mass-opt=
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/tutorials$ make
> PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc
> PETSC_ARCH=arch-summit-opt-gnu-kokkos-notpl-cuda10 ex5k.kokkos
> mpicxx -fPIC -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10
> -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -O0   -fPIC -g -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DPETSC_HAVE_CUDA_ATOMIC -DLANDAU_MAX_Q=4 -O0
> -fPIC-I/gpfs/alpine/csc314/scratch/adams/petsc/include
> -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/include
> -I/sw/summit/cuda/10.1.243/include ex5k.kokkos.cxx
>  
> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib
> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib
> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib
> -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib
> -L/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/netlib-lapack-3.8.0-wcabdyqhdi5rooxbkqa6x5d7hxyxwdkm/lib64
> -Wl,-rpath,/sw/summit/cuda/10.1.243/lib64 -L/sw/summit/cuda/10.1.243/lib64
> -lpetsc -lkokkoskernels -lkokkoscontainers -lkokkoscore -lp4est -lsc -lblas
> -llapack -ltriangle -lm -lz -lcudart -lcufft -lcublas -lcusparse -lcusolver
> -lcurand -lstdc++ -ldl -o ex5k.kokkos
> In file included from
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/include/KokkosCore_Config_SetupBackend.hpp:47:0,
>  from
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/include/Kokkos_Macros.hpp:109,
>  from
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/include/Kokkos_Core_fwd.hpp:52,
>  from
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/include/Kokkos_Core.hpp:51,
>  from ex5k.kokkos.cxx:10:
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/include/setup/Kokkos_Setup_Cuda.hpp:54:2:
> error: #error "KOKKOS_ENABLE_CUDA defined but the compiler is not defining
> the __CUDACC__ macro as expected"
>  #error \
>   ^
> In file included from
> /sw/summit/cuda/10.1.243/include/crt/co

Re: [petsc-dev] [petsc-users] strange segv

2021-05-29 Thread Junchao Zhang
try gcc/6.4.0
--Junchao Zhang


On Sat, May 29, 2021 at 9:50 PM Mark Adams  wrote:

> And I hit grief using gcc-8.1.1 and get this error:
>
> /autofs/nccs-svm1_sw/summit/gcc/8.1.1/include/c++/8.1.1/type_traits(347):
> error: identifier "__ieee128" is undefined
>
> Any ideas?
>
> On Sat, May 29, 2021 at 10:39 PM Mark Adams  wrote:
>
>> And  valgrind sees this. I think the jump to the function is going to the
>> wrong place.
>> I'm giving up on PGI but can try newer versions of GCC. (what is the deal
>> with the range of major releases, 4-10?)
>> (as I said this looks like an error that a user is getting so I'd like to
>> figure it out).
>>
>> 0 SNES Function norm 4.974994975313e-03
>> ==77820== Invalid read of size 4
>> ==77820==at 0x7E69068: LandauKokkosJacobian (in
>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-kokkos-notpl-cuda10/lib/libpetsc.so.3.015.0)
>> ==77820==by 0x7C598AF: LandauFormJacobian_Internal (plexland.c:212)
>> ==77820==by 0x7C728D3: LandauIJacobian (plexland.c:2107)
>> ==77820==by 0x7C8C26B: TSComputeIJacobian (ts.c:934)
>> ==77820==by 0x7E28337: SNESTSFormJacobian_Theta (theta.c:1007)
>> ==77820==by 0x7CBBFD3: SNESTSFormJacobian (ts.c:4415)
>> ==77820==by 0x7AD84BF: SNESComputeJacobian (snes.c:2824)
>> ==77820==by 0x7BA945B: SNESSolve_NEWTONLS (ls.c:222)
>> ==77820==by 0x7AF336F: SNESSolve (snes.c:4769)
>> ==77820==by 0x7E19D13: TSTheta_SNESSolve (theta.c:185)
>> ==77820==by 0x7E1A8B7: TSStep_Theta (theta.c:223)
>> ==77820==by 0x7CB093F: TSStep (ts.c:3571)
>> ==77820==  Address 0x96fff690 is in a --- anonymous segment
>> ==77820==
>> [0]PETSC ERROR:
>> 
>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>> probably memory access out of range
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see
>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
>> X to find memory corruption errors
>> [0]PETSC ERROR: likely location of problem given in stack below
>> [0]PETSC ERROR: -  Stack Frames
>> 
>> [0]PETSC ERROR: The EXACT line numbers in the error traceback are not
>> available.
>> [0]PETSC ERROR: instead the line number of the start of the function is
>> given.
>> [0]PETSC ERROR: #1 LandauKokkosJacobian() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ts/utils/dmplexlandau/kokkos/landau.kokkos.cxx:272
>>
>> On Sat, May 29, 2021 at 8:46 PM Mark Adams  wrote:
>>
>>>
>>>
>>> On Sat, May 29, 2021 at 7:48 PM Barry Smith  wrote:
>>>
>>>>
>>>>I don't see why it is not running the Kokkos check. Here is the rule
>>>> right below the CUDA rule that is apparently running.
>>>>
>>>> check_build:
>>>> -@echo "Running check examples to verify correct installation"
>>>> -@echo "Using PETSC_DIR=${PETSC_DIR} and
>>>> PETSC_ARCH=${PETSC_ARCH}"
>>>> +@cd src/snes/tutorials >/dev/null; ${OMAKE_SELF}
>>>> PETSC_ARCH=${PETSC_ARCH}  PETSC_DIR=${PETSC_DIR} clean-legacy
>>>> +@cd src/snes/tutorials >/dev/null; ${OMAKE_SELF}
>>>> PETSC_ARCH=${PETSC_ARCH}  PETSC_DIR=${PETSC_DIR} testex19
>>>> +@if [ "${HYPRE_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ]
>>>> &&  [ "${PETSC_SCALAR}" = "real" ]; then \
>>>>   cd src/snes/tutorials >/dev/null; ${OMAKE_SELF}
>>>> PETSC_ARCH=${PETSC_ARCH}  PETSC_DIR=${PETSC_DIR}
>>>> DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex19_hypre; \
>>>>  fi;
>>>> +@if [ "${CUDA_LIB}" != "" ] && [ "${PETSC_WITH_BATCH}" = "" ]
>>>> &&  [ "${PETSC_SCALAR}" = "real" ]; then \
>>>>   cd src/snes/tutorials >/dev/null; ${OMAKE_SELF}
>>>> PETSC_ARCH=${PETSC_ARCH}  PETSC_DIR=${PETSC_DIR}
>>>> DIFF=${PETSC_DIR}/lib/petsc/bin/petscdiff runex19_cuda; \
>>>>  fi;
>>>> +@if [ "${KOKKOS_KERNELS_LIB}" != "" ] && [
>>>> "${PETSC_WITH_BATCH}" = "" ] &&  [ "${PETSC_SCALAR}" = "rea

Re: [petsc-dev] VS Code debugger and PETSc

2021-05-29 Thread Junchao Zhang
I don't have experience with that.  I think the main problem with petsc is
that one usually needs to debug with multiple MPI ranks.
For light debug, I use gdb or tmpi <https://github.com/Azrael3000/tmpi>;
for heavy debug, I use ddt on servers (need license).
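
For example, a rough sketch of the light-weight route (the executable name is
a placeholder; assumes gdb and an X display are available so each rank can
open its own debugger window):

  $ mpiexec -n 2 ./myapp -start_in_debugger gdb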

--Junchao Zhang


On Fri, May 28, 2021 at 3:14 PM Aagaard, Brad T via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:

> Does anyone have any experience getting -start_in_debugger to launch the
> debugger integrated into VS Code?
>
>
>
> Thanks,
>
> Brad
>
>
>


Re: [petsc-dev] Problem with the reorganization of complex with clang

2021-05-16 Thread Junchao Zhang
I checked and found PetscComplex was correctly defined to std::complex, and
clang-6.0 was also fine using the GNU C++ library. The problem is that the
gcc-4.8.5 C++ library clang picked up is rather old and conforms only to
C++11, whereas clang-6.0 defaults to C++14, which changed the rules for
*constexpr*.
Workarounds include:

   - define PETSC_SKIP_CXX_COMPLEX_FIX in the offending *.cxx file (see the
     sketch below).
   - add CXXOPTFLAGS=-std=c++11
   - update clang-6.0 or gcc-4.8.5 (from 2015) on that machine.
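
For example, the first workaround is just a define placed ahead of the PETSc
headers in that file (a sketch; the header shown is only an example):

  #define PETSC_SKIP_CXX_COMPLEX_FIX
  #include <petscsys.h>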

--Junchao Zhang


On Fri, May 14, 2021 at 10:42 AM Satish Balay  wrote:

> You can login to: isdp001.cels.anl.gov - using your MCS account
> credentials [and then setup ~/.ssh/authorized_keys]
>
> Satish
>
> On Fri, 14 May 2021, Junchao Zhang wrote:
>
> > Satish, how to access this machine? I want to know why complex is screwed
> > up.
> >
> > --Junchao Zhang
> >
> >
> > On Thu, May 13, 2021 at 7:08 PM Matthew Knepley 
> wrote:
> >
> > > Nope. I will use your fix.
> > >
> > >   Thanks,
> > >
> > >  Matt
> > >
> > > On Thu, May 13, 2021 at 7:55 PM Matthew Knepley 
> wrote:
> > >
> > >> I am going to try just including petscsys.h and see if it works.
> > >>
> > >>   Thanks,
> > >>
> > >>  Matt
> > >>
> > >> On Thu, May 13, 2021 at 6:23 PM Satish Balay 
> wrote:
> > >>
> > >>> This gets the build going...
> > >>>
> > >>> diff --git a/src/sys/dll/cxx/demangle.cxx
> b/src/sys/dll/cxx/demangle.cxx
> > >>> index 31810ea15f..793a97d285 100644
> > >>> --- a/src/sys/dll/cxx/demangle.cxx
> > >>> +++ b/src/sys/dll/cxx/demangle.cxx
> > >>> @@ -1,3 +1,4 @@
> > >>> +#define PETSC_SKIP_COMPLEX
> > >>>  #include 
> > >>>
> > >>>  #ifdef PETSC_HAVE_CXXABI_H
> > >>>
> > >>> Satish
> > >>>
> > >>> On Thu, 13 May 2021, Satish Balay wrote:
> > >>>
> > >>> > > CXX arch-ci-linux-clang-avx/obj/sys/dll/cxx/demangle.o
> > >>> >
> > >>> > It is built with a c++ compiler - so __cplusplus should be defined.
> > >>> [PETSC_HAVE_CXXABI is not defined]
> > >>> >
> > >>> > Do you need to build this sourcefile file in a clanguage=C build?
> > >>> >
> > >>> > I'm not sure if a c++/complex build  is checked with this compiler.
> > >>> >
> > >>> > [eventhough its clang build - I see the compiler is using system
> > >>> incldues aka from gcc-4.8.5 - so perhaps some things don't work?]
> > >>> >
> > >>> > One option is to add the following to this sourcefile:
> > >>> >
> > >>> > #define PETSC_SKIP_COMPLEX
> > >>> >
> > >>> > Satish
> > >>> >
> > >>> >
> > >>> > On Thu, 13 May 2021, Matthew Knepley wrote:
> > >>> >
> > >>> > > In this CI run (linux-clang-avg):
> > >>> > >
> > >>> > >   https://gitlab.com/petsc/petsc/-/jobs/1260342204
> > >>> > >
> > >>> > > The compile fails building a C++ file, demangle.cxx. It fails at
> the
> > >>> first
> > >>> > > line,
> > >>> > > including , down in petscsystypes.h.
> It
> > >>> bombs
> > >>> > > during the definition of complex because it looks like the
> compiler
> > >>> is not
> > >>> > > defining __cplusplus, and thus takes the wrong branch. Is this
> what
> > >>> is
> > >>> > > happening?
> > >>> > > I cannot access this machine.
> > >>> > >
> > >>> > >   Thanks,
> > >>> > >
> > >>> > >   Matt
> > >>> > >
> > >>> > >
> > >>> >
> > >>> >
> > >>>
> > >>>
> > >>
> > >> --
> > >> What most experimenters take for granted before they begin their
> > >> experiments is infinitely more interesting than any results to which
> their
> > >> experiments lead.
> > >> -- Norbert Wiener
> > >>
> > >> https://www.cse.buffalo.edu/~knepley/
> > >> <http://www.cse.buffalo.edu/~knepley/>
> > >>
> > >
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> > > experiments is infinitely more interesting than any results to which
> their
> > > experiments lead.
> > > -- Norbert Wiener
> > >
> > > https://www.cse.buffalo.edu/~knepley/
> > > <http://www.cse.buffalo.edu/~knepley/>
> > >
> >
>
>


Re: [petsc-dev] Problem with the reorganization of complex with clang

2021-05-14 Thread Junchao Zhang
Satish, how to access this machine? I want to know why complex is screwed
up.

--Junchao Zhang


On Thu, May 13, 2021 at 7:08 PM Matthew Knepley  wrote:

> Nope. I will use your fix.
>
>   Thanks,
>
>  Matt
>
> On Thu, May 13, 2021 at 7:55 PM Matthew Knepley  wrote:
>
>> I am going to try just including petscsys.h and see if it works.
>>
>>   Thanks,
>>
>>  Matt
>>
>> On Thu, May 13, 2021 at 6:23 PM Satish Balay  wrote:
>>
>>> This gets the build going...
>>>
>>> diff --git a/src/sys/dll/cxx/demangle.cxx b/src/sys/dll/cxx/demangle.cxx
>>> index 31810ea15f..793a97d285 100644
>>> --- a/src/sys/dll/cxx/demangle.cxx
>>> +++ b/src/sys/dll/cxx/demangle.cxx
>>> @@ -1,3 +1,4 @@
>>> +#define PETSC_SKIP_COMPLEX
>>>  #include 
>>>
>>>  #ifdef PETSC_HAVE_CXXABI_H
>>>
>>> Satish
>>>
>>> On Thu, 13 May 2021, Satish Balay wrote:
>>>
>>> > > CXX arch-ci-linux-clang-avx/obj/sys/dll/cxx/demangle.o
>>> >
>>> > It is built with a c++ compiler - so __cplusplus should be defined.
>>> [PETSC_HAVE_CXXABI is not defined]
>>> >
>>> > Do you need to build this sourcefile file in a clanguage=C build?
>>> >
>>> > I'm not sure if a c++/complex build  is checked with this compiler.
>>> >
>>> > [eventhough its clang build - I see the compiler is using system
>>> incldues aka from gcc-4.8.5 - so perhaps some things don't work?]
>>> >
>>> > One option is to add the following to this sourcefile:
>>> >
>>> > #define PETSC_SKIP_COMPLEX
>>> >
>>> > Satish
>>> >
>>> >
>>> > On Thu, 13 May 2021, Matthew Knepley wrote:
>>> >
>>> > > In this CI run (linux-clang-avg):
>>> > >
>>> > >   https://gitlab.com/petsc/petsc/-/jobs/1260342204
>>> > >
>>> > > The compile fails building a C++ file, demangle.cxx. It fails at the
>>> first
>>> > > line,
>>> > > including , down in petscsystypes.h. It
>>> bombs
>>> > > during the definition of complex because it looks like the compiler
>>> is not
>>> > > defining __cplusplus, and thus takes the wrong branch. Is this what
>>> is
>>> > > happening?
>>> > > I cannot access this machine.
>>> > >
>>> > >   Thanks,
>>> > >
>>> > >   Matt
>>> > >
>>> > >
>>> >
>>> >
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>


Re: [petsc-dev] Compilation errors with kokkos and --with-scalar-type complex

2021-04-21 Thread Junchao Zhang
On Wed, Apr 21, 2021 at 5:23 AM Stefano Zampini 
wrote:

> Incidentally, I found PETSc does not compile when configured using
>
> --with-scalar-type=complex --with-kokkos-dir=
> --with-kokkos_kernels-dir=...
>
With cuda or not?

>
> Some fixes are trivial, others require some more thought.
>
> Thanks
> --
> Stefano
>


Re: [petsc-dev] -with-kokkos-cuda-arch=AMPERE80 nonsense

2021-04-05 Thread Junchao Zhang
On Mon, Apr 5, 2021 at 7:33 PM Jeff Hammond  wrote:

> NVCC has supported multi-versioned "fat" binaries since I worked for
> Argonne.  Libraries should figure out what the oldest hardware they are
> about is and then compile for everything from that point forward.  Kepler
> (3.5) is oldest version any reasonable person should be thinking about at
> this point.  The oldest thing I know of in the DOE HPC fleet is Pascal
> (6.x).  Volta and Turing are 7.x and Ampere is 8.x.
>
> The biggest architectural changes came with unified memory (
> https://developer.nvidia.com/blog/unified-memory-in-cuda-6/) and
> cooperative (https://developer.nvidia.com/blog/cooperative-groups/ in
> CUDA 9) but Kokkos doesn't use the latter.  Both features can be used on
> quite old GPU architectures, although the performance is better on newer
> ones.
>
> I haven't dug into what Kokkos and PETSc are doing but the direct use of
> this stuff in CUDA is well-documented, certainly as well as the CPU
> switches for x86 binaries in the Intel compiler are.
>
>
> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
>
> Devices with the same major revision number are of the same core
> architecture. The major revision number is 8 for devices based on the NVIDIA
> Ampere GPU architecture, 7 for devices based on the Volta architecture, 6
> for devices based on the Pascal architecture, 5 for devices based on the
> Maxwell architecture, 3 for devices based on the Kepler architecture, 2
> for devices based on the Fermi architecture, and 1 for devices based on
> the Tesla architecture.
>
Kokkos has config options Kokkos_ARCH_TURING75,
Kokkos_ARCH_VOLTA70, Kokkos_ARCH_VOLTA72. Any idea how one can map
compute capability versions to arch names?
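
A small sketch of querying the capability at runtime (cudaDeviceGetAttribute
is described further down in the quoted text; the mapping in the comment is
only my assumption that the Kokkos arch macro names embed the major/minor
digits):

  #include <cuda_runtime.h>

  int main(void)
  {
    int major = 0, minor = 0;
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0); /* device 0 */
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, 0);
    /* e.g. 7.0 -> Kokkos_ARCH_VOLTA70, 7.5 -> Kokkos_ARCH_TURING75, 8.0 -> Kokkos_ARCH_AMPERE80 */
    return 0;
  }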


>
>
>
> https://docs.nvidia.com/cuda/pascal-compatibility-guide/index.html#building-pascal-compatible-apps-using-cuda-8-0
>
> https://docs.nvidia.com/cuda/volta-compatibility-guide/index.html#building-volta-compatible-apps-using-cuda-9-0
>
> https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html#building-turing-compatible-apps-using-cuda-10-0
>
> https://docs.nvidia.com/cuda/ampere-compatibility-guide/index.html#building-ampere-compatible-apps-using-cuda-11-0
>
> Programmatic querying can be done with the following (
> https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html):
>
> cudaDeviceGetAttribute
>
>-
>
>cudaDevAttrComputeCapabilityMajor
>
> :
>Major compute capability version number;
>-
>
>cudaDevAttrComputeCapabilityMinor
>
> :
>Minor compute capability version number;
>
> The compiler help tells me this, which can be cross-referenced with CUDA
> documentation above.
>
> $ /usr/local/cuda-10.0/bin/nvcc -h
>
>
> Usage  : nvcc [options] 
>
>
> ...
>
>
> Options for steering GPU code generation.
>
> =
>
>
> --gpu-architecture   (-arch)
>
>
> Specify the name of the class of NVIDIA 'virtual' GPU
> architecture for which
>
> the CUDA input files must be compiled.
>
> With the exception as described for the shorthand below, the
> architecture
>
> specified with this option must be a 'virtual' architecture (such
> as compute_50).
>
> Normally, this option alone does not trigger assembly of the
> generated PTX
>
> for a 'real' architecture (that is the role of nvcc option
> '--gpu-code',
>
> see below); rather, its purpose is to control preprocessing and
> compilation
>
> of the input to PTX.
>
> For convenience, in case of simple nvcc compilations, the
> following shorthand
>
> is supported.  If no value for option '--gpu-code' is specified,
> then the
>
> value of this option defaults to the value of
> '--gpu-architecture'.  In this
>
> situation, as only exception to the description above, the value
> specified
>
> for '--gpu-architecture' may be a 'real' architecture (such as a
> sm_50),
>
> in which case nvcc uses the specified 'real' architecture and its
> closest
>
> 'virtual' architecture as effective architecture values.  For
> example, 'nvcc
>
> --gpu-architecture=sm_50' is equivalent to 'nvcc
> --gpu-architecture=compute_50
>
> --gpu-code=sm_50,compute_50'.
>
> Allowed values for this option:
> 'compute_30','compute_32','compute_35',
>
>
> 'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
>
>
> 'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
>
>
> 'sm_37','sm_50','sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72',
>
>  

Re: [petsc-dev] Possible SF bug

2021-04-01 Thread Junchao Zhang
Matt,
It is easy for me to tell it is not an SF bug, since the code passes a null
pointer (as leafdata) to an SF with 9 roots and leaves. But I could not
figure out what the root cause is.  Here is what I found.
In DMSetUp(forest) >> DMSetUp_p4est >> DMPforestGetPlex
>> DMConvert_pforest_plex >> .. >> DMShareDiscretization,  we have
 dmB->sectionSF = dmA->sectionSF;
which makes the two DMs, forest and ((DM_Forest_pforest*) ((DM_Forest*)
forest->data)->data)->plex, share the same sectionSF. But in the test,
there are two places to populate the sectionSF based on different DMs.

1) In DMGlobalToLocal(forest, g, INSERT_VALUES, l), the SF is built based
on forest and its local/globalSection.
2) In VecView(g, viewer), which does
 ierr = VecGetDM(vec,);CHKERRQ(ierr);

  ierr = DMPforestGetPlex(dm,);CHKERRQ(ierr);
  ierr = VecSetDM(vec,plex);CHKERRQ(ierr);
  ierr = VecView_Plex(vec,viewer);CHKERRQ(ierr);
  ierr = VecSetDM(vec,dm);CHKERRQ(ierr);

VecView_Plex() calls VecView_Plex_HDF5_Internal(), which builds the SF
based on plex's local/globalSection.

Depending on which function is called first, we get different SFs. The
crashed one did 1) first and then 2). The 'good'  one did 2) and then 1).
But  even the good one is wrong, since it gives an empty SF (thus not
crashing the code).

--Junchao Zhang


On Tue, Mar 30, 2021 at 5:44 AM Matthew Knepley  wrote:

> On Mon, Mar 29, 2021 at 11:05 PM Junchao Zhang 
> wrote:
>
>> Matt,
>>   I can reproduce the error. Let me see what is wrong.
>>
>
> Thanks! It might be a bug in Plex or Forest as well, but it is hard for me
> to tell.
>
>Matt
>
>
>>   Thanks.
>> --Junchao Zhang
>>
>>
>> On Mon, Mar 29, 2021 at 2:16 PM Matthew Knepley 
>> wrote:
>>
>>> Junchao,
>>>
>>> I have an SF problem, which I think is a caching bug, but it is hard to
>>> see what is happening in the internals. I have made a small example which
>>> should help you see what is wrong. It is attached.
>>>
>>> If you run without arguments, you get
>>>
>>> master *:~/Downloads/tmp/Salac$ ./forestHDF
>>> [0]PETSC ERROR: - Error Message
>>> --
>>> [0]PETSC ERROR: Null argument, when expecting valid pointer
>>> [0]PETSC ERROR: Trying to copy to a null pointer
>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.14.5-879-g03cacdc99d
>>>  GIT Date: 2021-03-22 01:02:08 +
>>> [0]PETSC ERROR: ./forestHDF on a arch-master-debug named
>>> MacBook-Pro.fios-router.home by knepley Mon Mar 29 15:14:16 2021
>>> [0]PETSC ERROR: Configure options --PETSC_ARCH=arch-master-debug
>>> --download-bamg --download-chaco --download-ctetgen --download-egads
>>> --download-eigen --download-exodusii --download-fftw --download-hpddm
>>> --download-libpng --download-metis --download-ml --download-mumps
>>> --download-netcdf --download-opencascade --download-p4est
>>> --download-parmetis --download-pnetcdf --download-scalapack
>>> --download-slepc --download-suitesparse --download-superlu_dist
>>> --download-triangle --with-cmake-exec=/PETSc3/petsc/apple/bin/cmake
>>> --with-ctest-exec=/PETSc3/petsc/apple/bin/ctest
>>> --with-hdf5-dir=/PETSc3/petsc/apple --with-mpi-dir=/PETSc3/petsc/apple
>>> --with-shared-libraries --with-slepc --with-zlib --download-tetgen
>>> [0]PETSC ERROR: #1 PetscMemcpy() at
>>> /PETSc3/petsc/petsc-dev/include/petscsys.h:1798
>>> [0]PETSC ERROR: #2 UnpackAndInsert_PetscReal_1_1() at
>>> /PETSc3/petsc/petsc-dev/src/vec/is/sf/impls/basic/sfpack.c:426
>>> [0]PETSC ERROR: #3 ScatterAndInsert_PetscReal_1_1() at
>>> /PETSc3/petsc/petsc-dev/src/vec/is/sf/impls/basic/sfpack.c:426
>>> [0]PETSC ERROR: #4 PetscSFLinkScatterLocal() at
>>> /PETSc3/petsc/petsc-dev/src/vec/is/sf/impls/basic/sfpack.c:1248
>>> [0]PETSC ERROR: #5 PetscSFBcastBegin_Basic() at
>>> /PETSc3/petsc/petsc-dev/src/vec/is/sf/impls/basic/sfbasic.c:193
>>> [0]PETSC ERROR: #6 PetscSFBcastWithMemTypeBegin() at
>>> /PETSc3/petsc/petsc-dev/src/vec/is/sf/interface/sf.c:1493
>>> [0]PETSC ERROR: #7 DMGlobalToLocalBegin() at
>>> /PETSc3/petsc/petsc-dev/src/dm/interface/dm.c:2565
>>> [0]PETSC ERROR: #8 VecView_Plex_HDF5_Internal() at
>>> /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plexhdf5.c:251
>>> [0]PETSC ERROR: #9 VecView_Plex() at
>>> /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plex.c:385
>>>

Re: [petsc-dev] Possible SF bug

2021-03-29 Thread Junchao Zhang
Matt,
  I can reproduce the error. Let me see what is wrong.
  Thanks.
--Junchao Zhang


On Mon, Mar 29, 2021 at 2:16 PM Matthew Knepley  wrote:

> Junchao,
>
> I have an SF problem, which I think is a caching bug, but it is hard to
> see what is happening in the internals. I have made a small example which
> should help you see what is wrong. It is attached.
>
> If you run without arguments, you get
>
> master *:~/Downloads/tmp/Salac$ ./forestHDF
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Null argument, when expecting valid pointer
> [0]PETSC ERROR: Trying to copy to a null pointer
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.14.5-879-g03cacdc99d
>  GIT Date: 2021-03-22 01:02:08 +
> [0]PETSC ERROR: ./forestHDF on a arch-master-debug named
> MacBook-Pro.fios-router.home by knepley Mon Mar 29 15:14:16 2021
> [0]PETSC ERROR: Configure options --PETSC_ARCH=arch-master-debug
> --download-bamg --download-chaco --download-ctetgen --download-egads
> --download-eigen --download-exodusii --download-fftw --download-hpddm
> --download-libpng --download-metis --download-ml --download-mumps
> --download-netcdf --download-opencascade --download-p4est
> --download-parmetis --download-pnetcdf --download-scalapack
> --download-slepc --download-suitesparse --download-superlu_dist
> --download-triangle --with-cmake-exec=/PETSc3/petsc/apple/bin/cmake
> --with-ctest-exec=/PETSc3/petsc/apple/bin/ctest
> --with-hdf5-dir=/PETSc3/petsc/apple --with-mpi-dir=/PETSc3/petsc/apple
> --with-shared-libraries --with-slepc --with-zlib --download-tetgen
> [0]PETSC ERROR: #1 PetscMemcpy() at
> /PETSc3/petsc/petsc-dev/include/petscsys.h:1798
> [0]PETSC ERROR: #2 UnpackAndInsert_PetscReal_1_1() at
> /PETSc3/petsc/petsc-dev/src/vec/is/sf/impls/basic/sfpack.c:426
> [0]PETSC ERROR: #3 ScatterAndInsert_PetscReal_1_1() at
> /PETSc3/petsc/petsc-dev/src/vec/is/sf/impls/basic/sfpack.c:426
> [0]PETSC ERROR: #4 PetscSFLinkScatterLocal() at
> /PETSc3/petsc/petsc-dev/src/vec/is/sf/impls/basic/sfpack.c:1248
> [0]PETSC ERROR: #5 PetscSFBcastBegin_Basic() at
> /PETSc3/petsc/petsc-dev/src/vec/is/sf/impls/basic/sfbasic.c:193
> [0]PETSC ERROR: #6 PetscSFBcastWithMemTypeBegin() at
> /PETSc3/petsc/petsc-dev/src/vec/is/sf/interface/sf.c:1493
> [0]PETSC ERROR: #7 DMGlobalToLocalBegin() at
> /PETSc3/petsc/petsc-dev/src/dm/interface/dm.c:2565
> [0]PETSC ERROR: #8 VecView_Plex_HDF5_Internal() at
> /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plexhdf5.c:251
> [0]PETSC ERROR: #9 VecView_Plex() at
> /PETSc3/petsc/petsc-dev/src/dm/impls/plex/plex.c:385
> [0]PETSC ERROR: #10 VecView_p4est() at
> /PETSc3/petsc/petsc-dev/src/dm/impls/forest/p4est/pforest.c:4922
> [0]PETSC ERROR: #11 VecView() at
> /PETSc3/petsc/petsc-dev/src/vec/vec/interface/vector.c:613
> [0]PETSC ERROR: #12 main() at
> /Users/knepley/Downloads/tmp/Salac/forestHDF.c:53
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -malloc_debug
> [0]PETSC ERROR: End of Error Message ---send entire
> error message to petsc-ma...@mcs.anl.gov--
> application called MPI_Abort(MPI_COMM_SELF, 53001) - process 0
> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=53001
>
> If you run with
>
>   ./forestHDF -write_early
>
> or
>
>   ./forestHDF -no_g2l
>
> Then it is fine. Thus it appears to me that if you run a G2L at the wrong
> time, something is incorrectly cached.
>
>   Thanks,
>
> Matt
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>


Re: [petsc-dev] configureLibrary fails for c++11 projects

2021-03-23 Thread Junchao Zhang
Can we combine CXXPPFLAGS and CXXFLAGS into one CXXFLAGS?
--Junchao Zhang


On Tue, Mar 23, 2021 at 11:38 AM Patrick Sanan 
wrote:

> I had a related (I think) issue trying to build with Kokkos. Those headers
> throw an #error if they're expecting OpenMP and the compiler doesn't have
> the OpenMP flag. I have an open MR here (number 60^2!) which thus adds the
> OpenMP flag to the CXXPPFLAGS:
> https://gitlab.com/petsc/petsc/-/merge_requests/3600
>
>
> My collaborator at CSCS was testing with the latest Kokkos and ran into an
> even hairier version of this problem trying to use CUDA - the Kokkos
> headers now apparently check that you're using nvcc. He has some workaround
> which I'll review and hopefully be able to submit.
>
>
> On 23.03.2021, at 17:04, Stefano Zampini wrote:
>
> The check fails within buildsystem when running mpicc -E (which uses
> CXXPPFLAGS). The package header needs C++11 to be included properly;
> C++11 is also needed at preprocessing time.
>
> On Tue, Mar 23, 2021 at 18:59 Satish Balay wrote:
>
>> -std=cxx11 for sure is a compile flag. But I don't really know if it's
>> also needed at pre-process stage and/or at link stage.
>>
>> And for compile stage both CXXFLAGS and CXXPPFLAGS should get
>> used. [PETSc makefiles make sure this is the case]
>>
>> And for link stage CXXFLAGS and LDFLAGS get used [but then sometimes
>> we have CLINKER, and FLINKER - and they certainly don't use CXXFLAGS -
>> so -std=cxx11 isn't really needed at link time?
>>
>> So the previous default of CXXPPFLAGS=-std=cxx11 looks reasonable to me.
>>
>> However if this project is not using PETSc makefiles - it should make
>> sure all compile flags are grabbed.
>>
>> # lib/petsc/conf/variables
>> PETSC_CXXCPPFLAGS   = ${PETSC_CC_INCLUDES} ${PETSCFLAGS}
>> ${CXXPP_FLAGS} ${CXXPPFLAGS}
>> CXXCPPFLAGS = ${PETSC_CXXCPPFLAGS}
>> PETSC_CXXCOMPILE_SINGLE = ${CXX} -o $*.o -c ${CXX_FLAGS} ${CXXFLAGS}
>> ${CXXCPPFLAGS}
>>
>> # lib/petsc/conf/rules
>> .cpp.o .cxx.o .cc.o .C.o:
>> ${PETSC_CXXCOMPILE_SINGLE} `pwd`/$<
>>
>> # gmakefile.test
>> PETSC_COMPILE.cxx = $(call quiet,CXX) -c $(CXX_FLAGS) $(CXXFLAGS)
>> $(CXXCPPFLAGS) $(CXX_DEPFLAGS)
>>
>> # lib/petsc/conf/test
>> LINK.cc = $(CXXLINKER) $(CXX_FLAGS) $(CXXFLAGS) $(CXXCPPFLAGS) $(LDFLAGS)
>>
>> Satish
>>
>>
>> On Tue, 23 Mar 2021, Junchao Zhang wrote:
>>
>> > I would rather directly change the project to use CXXFLAGS instead of
>> > CXXPPFLAGS.
>> >
>> > --Junchao Zhang
>> >
>> >
>> > On Tue, Mar 23, 2021 at 10:01 AM Satish Balay via petsc-dev <
>> > petsc-dev@mcs.anl.gov> wrote:
>> >
>> > > On Tue, 23 Mar 2021, Stefano Zampini wrote:
>> > >
>> > > > Just tried out of main, and and the include tests of a c++11
>> project fail
>> > > > Below my fix, if we agree on, I'll make a MR
>> > > >
>> > > > diff --git a/config/BuildSystem/config/compilers.py
>> > > > b/config/BuildSystem/config/compilers.py
>> > > > index c96967e..44e4657 100644
>> > > > --- a/config/BuildSystem/config/compilers.py
>> > > > +++ b/config/BuildSystem/config/compilers.py
>> > > > @@ -527,6 +527,8 @@ class Configure(config.base.Configure):
>> > > >  if self.setCompilers.checkCompilerFlag(flag, includes,
>> > > > body+body14):
>> > > >newflag = getattr(self.setCompilers,LANG+'FLAGS') + ' ' +
>> > > flag #
>> > > > append flag to the old
>> > > >setattr(self.setCompilers,LANG+'FLAGS',newflag)
>> > > > +  newflag = getattr(self.setCompilers,LANG+'PPFLAGS') + '
>> ' +
>> > > flag
>> > > > # append flag to the old
>> > > > +  setattr(self.setCompilers,LANG+'PPFLAGS',newflag)
>> > >
>> > >
>> > >
>> https://gitlab.com/petsc/petsc/commit/ead1aa4045d7bca177e78933b9ca25145fc3c574
>> > >
>> > >   self.setCompilers.CXXPPFLAGS += ' ' + flag
>> > >   newflag = getattr(self.setCompilers,LANG+'FLAGS') + ' ' +
>> flag #
>> > > append flag to the old
>> > >   setattr(self.setCompilers,LANG+'FLAGS',newflag)
>> > >
>> > > So the old code was setting 'PPFLAGS' - but this commit changed to
>> > > 'FLAGS'. Maybe this flag is needed at both compile time and link time?
>> > >
>&

Re: [petsc-dev] configureLibrary fails for c++11 projects

2021-03-23 Thread Junchao Zhang
I would rather directly change the project to use CXXFLAGS instead of
CXXPPFLAGS.

--Junchao Zhang


On Tue, Mar 23, 2021 at 10:01 AM Satish Balay via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:

> On Tue, 23 Mar 2021, Stefano Zampini wrote:
>
> > Just tried out of main, and and the include tests of a c++11 project fail
> > Below my fix, if we agree on, I'll make a MR
> >
> > diff --git a/config/BuildSystem/config/compilers.py
> > b/config/BuildSystem/config/compilers.py
> > index c96967e..44e4657 100644
> > --- a/config/BuildSystem/config/compilers.py
> > +++ b/config/BuildSystem/config/compilers.py
> > @@ -527,6 +527,8 @@ class Configure(config.base.Configure):
> >  if self.setCompilers.checkCompilerFlag(flag, includes,
> > body+body14):
> >newflag = getattr(self.setCompilers,LANG+'FLAGS') + ' ' +
> flag #
> > append flag to the old
> >setattr(self.setCompilers,LANG+'FLAGS',newflag)
> > +  newflag = getattr(self.setCompilers,LANG+'PPFLAGS') + ' ' +
> flag
> > # append flag to the old
> > +  setattr(self.setCompilers,LANG+'PPFLAGS',newflag)
>
>
> https://gitlab.com/petsc/petsc/commit/ead1aa4045d7bca177e78933b9ca25145fc3c574
>
>   self.setCompilers.CXXPPFLAGS += ' ' + flag
>   newflag = getattr(self.setCompilers,LANG+'FLAGS') + ' ' + flag #
> append flag to the old
>   setattr(self.setCompilers,LANG+'FLAGS',newflag)
>
> So the old code was setting 'PPFLAGS' - but this commit changed to
> 'FLAGS'. Maybe this flag is needed at both compile time and link time?
>
> So this project is somehow using CXXPPFLAGS - but not CXXFLAGS?
>
> I'm fine with adding it to PPFLAGS - duplicate listing hopefully shouldn't
> cause grief.
>
> Satish
>
> >cxxdialect = 'C++14'
> >self.addDefine('HAVE_'+LANG+'_DIALECT_CXX14',1)
> >self.addDefine('HAVE_'+LANG+'_DIALECT_CXX11',1)
> > @@ -546,6 +548,8 @@ class Configure(config.base.Configure):
> >  if self.setCompilers.checkCompilerFlag(flag, includes, body):
> >newflag = getattr(self.setCompilers,LANG+'FLAGS') + ' ' +
> flag #
> > append flag to the old
> >setattr(self.setCompilers,LANG+'FLAGS',newflag)
> > +  newflag = getattr(self.setCompilers,LANG+'PPFLAGS') + ' ' +
> flag
> > # append flag to the old
> > +  setattr(self.setCompilers,LANG+'PPFLAGS',newflag)
> >cxxdialect = 'C++11'
> >self.addDefine('HAVE_'+LANG+'_DIALECT_CXX11',1)
> >break
> >
> >
> >
>
>


Re: [petsc-dev] Commit squashing in MR

2021-03-03 Thread Junchao Zhang
Oh, graph is an alias in my .gitconfig

[alias]
graph = log --graph --decorate --abbrev-commit --pretty=oneline
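
If anyone wants the same alias, it can be added with a one-liner (adjust to
taste):

  $ git config --global alias.graph "log --graph --decorate --abbrev-commit --pretty=oneline"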

--Junchao Zhang


On Wed, Mar 3, 2021 at 1:51 PM Mark Adams  wrote:

>
>
> On Tue, Mar 2, 2021 at 10:02 PM Junchao Zhang 
> wrote:
>
>> I am a naive git user, so I use interactive git rebase.  Suppose I am on
>> the branch I want to modify,
>>
>> 1) Use git graph to locate an upstream commit to be used as the base
>> $ git graph
>>
>
> Humm 
>
> 14:49 adams/cusparse-lu-landau= /gpfs/alpine/csc314/scratch/adams/petsc$
> git --version
> git version 2.20.1
> 14:49 adams/cusparse-lu-landau= /gpfs/alpine/csc314/scratch/adams/petsc$
> git graph
> git: 'graph' is not a git command. See 'git --help'.
>
> The most similar commands are
> branch
> grep
>
>


Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Junchao Zhang
I am a naive git user, so I use interactive git rebase.  Suppose I am on
the branch I want to modify,

1) Use git graph to locate an upstream commit to be used as the base
$ git graph
* 0d5433e9 (HEAD -> jczhang/sf-change-api) SF: rename SFCreateEmbeddedSF to
SFCreateEmbeddedRootSF
* e7314fbb SF: add an MPI_Op argument to SFBcast
* 83df288d Replace MPIU_REPLACE with MPI_REPLACE
*   b434c516 Merge branch 'barry/2021-02-02/petscsf-communication-specific'
into 'main'
|\
| * 62152ded (barry/2021-02-02/petscsf-communication-specific)
PetscSFView() never called viewer for the specific type (bug), hence many
output files were incorrect.
* |   a4f5d9b4 Merge branch 'jose/upgrade-magma' into 'main'

2) Suppose we choose b434c516 as the base. All commits we want to squash
are after it.  Do interactive git rebase. It shows a screen for you to
edit.  Read the help, which is helpful for new users
  $ git rebase -i b434c516
pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
pick e7314fbb SF: add an MPI_Op argument to SFBcast
pick 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF

# Rebase b434c516..0d5433e9 onto b434c516 (3 commands)
#
# Commands:
# p, pick  = use commit
# r, reword  = use commit, but edit the commit message
# e, edit  = use commit, but stop for amending
# s, squash  = use commit, but meld into previous commit
# f, fixup  = like "squash", but discard this commit's log message
# x, exec  = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop  = remove commit
# l, label  = label current HEAD with a name
# t, reset  = reset HEAD to a label
# m, merge [-C  | -c ]  [# ]
# .   create a merge commit using the original merge commit's
# .   message (or the oneline, if no original merge commit was
# .   specified). Use -c  to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

3) Suppose we want to squash the last two commits to 83df288d, replace
their pick with s (or f, see the help for difference), save and exit the
screen
pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
s e7314fbb SF: add an MPI_Op argument to SFBcast
s 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF

A new screen shows up

# This is a combination of 3 commits.
# This is the 1st commit message:

Replace MPIU_REPLACE with MPI_REPLACE

Since we believe all MPI implementations support MPI_REPLACE

# This is the commit message #2:

SF: add an MPI_Op argument to SFBcast

# This is the commit message #3:

SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.

4) Edit the commit message as you want, save and exit, done!
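
One caveat (my addition): the rebase rewrites history, so if the branch was
already pushed to GitLab it has to be force-pushed afterwards, e.g.

  $ git push -f origin jczhang/sf-change-api

(branch name taken from the example above).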

--Junchao Zhang


On Tue, Mar 2, 2021 at 6:19 PM Blaise A Bourdin  wrote:

> Hi,
>
> This is not technically a petsc question.
> It would be great to have a short section in the PETSc integration
> workflow document explaining how to squash commits in a MR for git-impaired
> developers like me.
>
> Anybody wants to pitch in, or explain me how to do this?
>
> Regards,
> Blaise
>
> --
> A.K. & Shirley Barton Professor of  Mathematics
> Adjunct Professor of Mechanical Engineering
> Adjunct of the Center for Computation & Technology
> Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803,
> USA
> Tel. +1 (225) 578 1612, Fax  +1 (225) 578 4276 Web
> http://www.math.lsu.edu/~bourdin
>
>


Re: [petsc-dev] Understanding Vecscatter with Kokkos Vecs

2021-02-19 Thread Junchao Zhang
Even if ISCUDA were simple to add, the PetscSFSetUp algorithm and many of the
functions involved run on the host (and are not simple to parallelize on the
GPU). The indices passed to VecScatter are analyzed and re-grouped; even
though they are eventually copied to the device, they are likely not in their
original form. So copying the indices from device to host and building the
VecScatter there seems the easiest approach.
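
A minimal sketch of that workaround (assuming the indices have already been
copied from the device view into a host array h_idx, e.g. with
Kokkos::deep_copy(); the names from, to, n, h_idx, BuildScatterFromHostCopy
are hypothetical):

  #include <petscvec.h>
  static PetscErrorCode BuildScatterFromHostCopy(Vec from,Vec to,PetscInt n,const PetscInt *h_idx,VecScatter *scatter)
  {
    IS             is_from;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = ISCreateGeneral(PETSC_COMM_SELF,n,h_idx,PETSC_COPY_VALUES,&is_from);CHKERRQ(ierr);
    /* indices are analyzed/re-grouped on the host here; they are moved to the
       device inside the SF when the scatter is first applied to device data */
    ierr = VecScatterCreate(from,is_from,to,NULL,scatter);CHKERRQ(ierr);
    ierr = ISDestroy(&is_from);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }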

The Kokkos-related functions are experimental. We need to decide whether
they are good or not.

--Junchao Zhang


On Fri, Feb 19, 2021 at 4:32 AM Patrick Sanan 
wrote:

> Thanks! That helps a lot.
>
> I assume "no," but is ISCUDA simple to add?
>
> More on what I'm trying to do, in case I'm missing an obvious approach:
>
> I'm working on a demo code that uses an external library, based on Kokkos,
> as a solver - I create a Vec of type KOKKOS and populate it with the
> solution data from the library, by getting access to the raw Kokkos view
> with VecKokkosGetDeviceView() * .
>
> I then want to reorder that solution data into PETSc-native ordering (for
> a velocity-pressure DMStag), so I create a pair of ISs and a VecScatter to
> do that.
>
> The issue is that to create this scatter, I need to use information
> (essentially, an element-to-index map) from the external library's
> mesh-management object, which lives on the device. This doesn't work (when
> host != device), because of course the ISs live on the host and to create
> them I need to provide host arrays of indices.
>
> Am I stuck, for now, with sending the index information from
> the device to the host, using it to create the IS, and then having
> essentially the same information go back to the device when I use the
> scatter?
>
> * As an aside, it looks like some of these Kokkos-related functions and
> types are missing man pages - if you have time to add them, even as stubs,
> that'd be great (if not let me know and I'll just try to formally do it, so
> that at least the existence of the functions in the API is reflected on the
> website).
>
> Am 18.02.2021 um 23:17 schrieb Junchao Zhang :
>
>
> On Thu, Feb 18, 2021 at 4:04 PM Fande Kong  wrote:
>
>>
>>
>> On Thu, Feb 18, 2021 at 1:55 PM Junchao Zhang 
>> wrote:
>>
>>> VecScatter (i.e., SF, the two are the same thing) setup (building
>>> various index lists, rank lists) is done on the CPU.  is1, is2 must be host
>>> data.
>>>
>>
>> Just out of curiosity, is1 and is2 can not be created on a GPU device in
>> the first place? That being said, it is technically impossible? Or we just
>> did not implement them yet?
>>
> Simply because we do not have an ISCUDA class.
>
>
>>
>> Fande,
>>
>>
>>> When the SF is used to communicate device data, indices are copied to
>>> the device..
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Thu, Feb 18, 2021 at 11:50 AM Patrick Sanan 
>>> wrote:
>>>
>>>> I'm trying to understand how VecScatters work with GPU-native Kokkos
>>>> Vecs.
>>>>
>>>> Specifically, I'm interested in what will happen in code like in
>>>> src/vec/vec/tests/ex22.c,
>>>>
>>>> ierr = VecScatterCreate(x,is1,y,is2,);CHKERRQ(ierr);
>>>>
>>>> (from
>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/vec/vec/tests/ex22.c#L44
>>>> )
>>>>
>>>> Here, x and y can be set to type KOKKOS using -vec_type kokkos at the
>>>> command line. But is1 and is2 are (I think), always
>>>> CPU/host data. Assuming that the scatter itself can happen on the GPU,
>>>> the indices must make it to the device somehow - are they copied there when
>>>> the scatter is created? Is there a way to create the scatter using indices
>>>> already on the GPU (Maybe using SF more directly)?
>>>>
>>>>
>


Re: [petsc-dev] Understanding Vecscatter with Kokkos Vecs

2021-02-18 Thread Junchao Zhang
On Thu, Feb 18, 2021 at 4:04 PM Fande Kong  wrote:

>
>
> On Thu, Feb 18, 2021 at 1:55 PM Junchao Zhang 
> wrote:
>
>> VecScatter (i.e., SF, the two are the same thing) setup (building various
>> index lists, rank lists) is done on the CPU.  is1, is2 must be host data.
>>
>
> Just out of curiosity, is1 and is2 can not be created on a GPU device in
> the first place? That being said, it is technically impossible? Or we just
> did not implement them yet?
>
Simply because we do not have an ISCUDA class.


>
> Fande,
>
>
>> When the SF is used to communicate device data, indices are copied to the
>> device..
>>
>> --Junchao Zhang
>>
>>
>> On Thu, Feb 18, 2021 at 11:50 AM Patrick Sanan 
>> wrote:
>>
>>> I'm trying to understand how VecScatters work with GPU-native Kokkos
>>> Vecs.
>>>
>>> Specifically, I'm interested in what will happen in code like in
>>> src/vec/vec/tests/ex22.c,
>>>
>>> ierr = VecScatterCreate(x,is1,y,is2,);CHKERRQ(ierr);
>>>
>>> (from
>>> https://gitlab.com/petsc/petsc/-/blob/master/src/vec/vec/tests/ex22.c#L44
>>> )
>>>
>>> Here, x and y can be set to type KOKKOS using -vec_type kokkos at the
>>> command line. But is1 and is2 are (I think), always
>>> CPU/host data. Assuming that the scatter itself can happen on the GPU,
>>> the indices must make it to the device somehow - are they copied there when
>>> the scatter is created? Is there a way to create the scatter using indices
>>> already on the GPU (Maybe using SF more directly)?
>>>
>>>


Re: [petsc-dev] Understanding Vecscatter with Kokkos Vecs

2021-02-18 Thread Junchao Zhang
VecScatter (i.e., SF, the two are the same thing) setup (building various
index lists, rank lists) is done on the CPU.  is1, is2 must be host data.
When the SF is used to communicate device data, indices are copied to the
device..

--Junchao Zhang


On Thu, Feb 18, 2021 at 11:50 AM Patrick Sanan 
wrote:

> I'm trying to understand how VecScatters work with GPU-native Kokkos Vecs.
>
> Specifically, I'm interested in what will happen in code like in
> src/vec/vec/tests/ex22.c,
>
> ierr = VecScatterCreate(x,is1,y,is2,);CHKERRQ(ierr);
>
> (from
> https://gitlab.com/petsc/petsc/-/blob/master/src/vec/vec/tests/ex22.c#L44)
>
> Here, x and y can be set to type KOKKOS using -vec_type kokkos at the
> command line. But is1 and is2 are (I think), always
> CPU/host data. Assuming that the scatter itself can happen on the GPU, the
> indices must make it to the device somehow - are they copied there when the
> scatter is created? Is there a way to create the scatter using indices
> already on the GPU (Maybe using SF more directly)?
>
>


Re: [petsc-dev] aijkokkos solvers

2020-12-24 Thread Junchao Zhang
When is the deadline of your SC paper?
--Junchao Zhang


On Thu, Dec 24, 2020 at 6:44 PM Mark Adams  wrote:

> It does not look like aijkokkos is equipped with solves the way
> aijcusparse is.
>
> I would like to get a GPU direct solver for an SC paper on the Landau
> stuff with Cuda and Kokkos backends. This would be a good opportunity to
> get the kinks out of our whole TS on GPUs and publish it.  We should equip
> Kokkos with solvers anyway. (Kokkos kernels also has a Gauss-Seidel which
> would be handy).
>
> Any thoughts?
> Mark
>


Re: [petsc-dev] Building PETSc on LLNL Lassen

2020-12-13 Thread Junchao Zhang
Jacob,
  Do you need to add  'CUDAFLAGS=-ccbin xlc++' to specify the host compiler
for CUDA? Note in cuda.py I added

if self.compilers.cxxdialect in ['C++11','C++14']: # nvcc is a C++ compiler so it is always good to add -std=xxx. It is even crucial when using thrust complex (see MR 2822)
  self.setCompilers.CUDAFLAGS += ' -std=' + self.compilers.cxxdialect.lower()

 In your configure.log, there are

#define PETSC_HAVE_CXX_DIALECT_CXX11 1
#define PETSC_HAVE_CXX_DIALECT_CXX14 1


I guess without -ccbin, nvcc uses gcc by default and your gcc does not
support C++14.
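
For example, something like this (untested; it is your configure line from
below with the suggested flag added):

  ./configure --with-cc=mpicc --with-cxx=mpiCC --with-fc=mpifort --with-cuda \
    --with-debugging=1 CUDAFLAGS='-ccbin xlc++' PETSC_ARCH=arch-linux-c-debug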

--Junchao Zhang


On Sun, Dec 13, 2020 at 1:25 PM Jacob Faibussowitsch 
wrote:

> Hello All,
>
> Does anyone have any experience building petsc with cuda support on
> Lassen? I’ve been having trouble building with ibm xl compilers +
> spectrum-mpi + nvcc. NVCC seems to not like -std=c++14 argument,
> complaining that its configured host compiler doesn’t support it, but
> compiling the following “test.cc":
>
> #include 
>
> int main(int argc, char **argv)
> {
>
>   int i = 1;
>   i += argc;
>   return(i);
> }
>
> With mpicc -std=c++14 test.cc produces zero errors.
> 
>
> Modules loaded:
>
> module load xl/2020.11.12-cuda-11.1.1
>
> module load spectrum-mpi
> module load cuda/11.1.1
> module load python/3.8.2
> module load cmake
> module load valgrind
> module load lapack
>
> My configure commands:
>
> ./configure  --with-cc=mpicc --with-cxx=mpiCC --with-fc=mpifort
> --with-cuda --with-debugging=1 PETSC_ARCH=arch-linux-c-debug
>
> The error:
>
> TESTING: findMPIInc from
> config.packages.MPI(config/BuildSystem/config/packages/MPI.py:636)
>   
> ***
>  UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log for
> details):
>
> ---
> Bad compiler flag:
> -I/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/include
>
> ***
>
> The actual configure.log error:
>
> Executing: nvcc -c -o
> /var/tmp/petsc-2v0k4k61/config.setCompilers/conftest.o
> -I/var/tmp/petsc-2v0k4\
> k61/config.setCompilers -I/var/tmp/petsc-2v0k4k61/config.types  -g
> -std=c++14 -I/usr/tce/packages/s\
> pectrum-mpi/ibm/spectrum-mpi-rolling-release/include  
> -Wno-deprecated-gpu-targets
> /var/tmp/petsc-2v\
> 0k4k61/config.setCompilers/conftest.cu
> Possible ERROR while running compiler:
> stderr:
> nvcc warning : The -std=c++14 flag is not supported with the configured
> host compiler. Flag will be\
>  ignored.
> Source:
> #include "confdefs.h"
> #include "conffix.h"
>
> int main() {
> ;
>   return 0;
> }
>   Rejecting compiler flag
> -I/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/include  due
> to
> nvcc warning : The -std=c++14 flag is not supported with the configured
> host compiler. Flag will be ignored.
>
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>


Re: [petsc-dev] cusparse error

2020-12-09 Thread Junchao Zhang
It could be GPU resource contention; note that this test uses nsize=8.
--Junchao Zhang


On Wed, Dec 9, 2020 at 7:15 PM Mark Adams  wrote:

> And this is a Cuda 11 complex build:
> https://gitlab.com/petsc/petsc/-/jobs/901108135
>
> On Wed, Dec 9, 2020 at 8:11 PM Mark Adams  wrote:
>
>> My MR is generating an error. Tee error message says cusparse has not
>> been initialized, so I added a cuparse init, but I still get the error
>> (appended, *adams/landau-gpu-assembly
>> <https://gitlab.com/petsc/petsc/-/tree/adams/landau-gpu-assembly>*).
>> Any ideas would be appreciated.
>>
>> I am trying to reproduce this on Summit and it fails with a timeout limit
>> of 60s, but it only runs for a few seconds (see timers). Any ideas?
>>
>> 19:58 adams/landau-gpu-assembly= ~/petsc$ make -f gmakefile test
>> search='ksp_ksp_tutorials-ex71_bddc_cusparse'
>> PETSC_ARCH=arch-summit-opt-gnu-cuda
>> Using MAKEFLAGS: PETSC_ARCH=arch-summit-opt-gnu-cuda
>> search=ksp_ksp_tutorials-ex71_bddc_cusparse
>> TEST
>> arch-summit-opt-gnu-cuda/tests/counts/ksp_ksp_tutorials-ex71_bddc_cusparse.counts
>> not ok ksp_ksp_tutorials-ex71_bddc_cusparse # Exceeded timeout limit of
>> 60 s
>>  ok ksp_ksp_tutorials-ex71_bddc_cusparse # SKIP Command failed so no diff
>>
>> # -
>> #   Summary
>> # -
>> # FAILED ksp_ksp_tutorials-ex71_bddc_cusparse
>> # success 0/1 tests (0.0%)
>> # failed 1/1 tests (100.0%)
>> # todo 0/1 tests (0.0%)
>> # skip 0/1 tests (0.0%)
>> #
>> # Wall clock time for tests: 3 sec
>> # Approximate CPU time (not incl. build time): 3.14 sec
>>
>>
>>
>>
>>
>> not ok ksp_ksp_tutorials-ex71_bddc_cusparse # Error code: 201
>> 2391 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2391># [1]PETSC
>> ERROR: - Error Message
>> --
>> 2392 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2392># [1]PETSC
>> ERROR: GPU error
>> 2393 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2393># [1]PETSC
>> ERROR: cuSPARSE error 1 (CUSPARSE_STATUS_NOT_INITIALIZED) : initialization
>> error
>> 2394 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2394># [1]PETSC
>> ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> 2395 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2395># [1]PETSC
>> ERROR: Petsc Development GIT revision: v3.14.2-85-gd60087d GIT Date:
>> 2020-12-09 17:49:59 -0500
>> 2396 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2396># [1]PETSC
>> ERROR: ../ex71 on a named frog by petsc Wed Dec 9 18:41:10 2020
>> 2397 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2397># [1]PETSC
>> ERROR: Configure options --package-prefix-hash=/home/petsc/petsc-hash-pkgs
>> --with-make-test-np=2 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g
>> -O" --with-scalar-type=complex --with-precision=single
>> --with-cuda-dir=/usr/local/cuda-11.0 PETSC_ARCH=arch-ci-linux-cuda11-complex
>> 2398 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2398># [1]PETSC
>> ERROR: #1 MatConvert_SeqAIJ_SeqAIJCUSPARSE() line 2708 in
>> /home/petsc/builds/KFnbdjNX/0/petsc/petsc/src/mat/impls/aij/seq/seqcusparse/
>> aijcusparse.cu
>> 2399 <https://gitlab.com/petsc/petsc/-/jobs/901108135#L2399># [1]PETSC
>> ERROR: #2 MatCreate_SeqAIJCUSPARSE() line 2739 in
>> /home/petsc/builds/KFnbdjNX/0/petsc/petsc/src/mat/impls/aij/seq/seqcusparse/
>> aijcusparse.cu
>>
>


Re: [petsc-dev] Job openings

2020-11-20 Thread Junchao Zhang
I think we can just send to both petsc-announce and petsc-users. First,
there are not many such emails. Second, if there are, users should be
happy to see them.
I receive 10+ ad emails daily, and I don't mind receiving an extra 5 emails
per month :)

--Junchao Zhang

On Fri, Nov 20, 2020 at 7:27 PM Barry Smith  wrote:

>
>   PETSc announce has more people than petsc-users but it is not clear that
> everyone on petsc-users is on petsc-announce. Everyone should join
> petsc-announce but they may not.
>
>   We could send them to both with the same label but then many people will
> get two emails which is annoying.
>
>   Maybe use the labels   [PETSc Job opening] and [PETSc Release] to give
> people an easier filter.
>
>
>An approach which is probably not simple is that anything sent to
> petsc-announce is also sent to everyone on petsc-users who IS NOT on
> petsc-announce so everyone gets only exactly one copy regardless of whether
> they are on both or either.
>
>1)  Maybe we could just manually remove everyone from announce who is
> in users and make sure that anything sent to announce also gets sent to
> users.
> 2)  Or whenever anyone joins users we sign them up for announce
> automatically and then only send such message to announce (the webpage
> could indicate you will
>   automatically also be added to announce. This seems the least
> painful, but then someone now needs to add to announce everyone who is on
> users but not
>   on announce.
>
>People could get fancy with filters to get only one copy but that is
> obnoxious to expect them to do that.
>
>Barry
>
>
>
> On Nov 20, 2020, at 1:27 PM, Junchao Zhang 
> wrote:
>
> The usefulness depends on how many users subscribe to petsc-announce.
>
> Since there are not many such emails, I think it is fine to send to
> petsc-users. And in these emails, we can always add a link to a job section
> on the petsc website.  Once petsc users get used to this, they may go to
> the website later when they are finding jobs.
>
> --Junchao Zhang
>
>
> On Fri, Nov 20, 2020 at 1:04 PM Matthew Knepley  wrote:
>
>> That is a good idea. Anyone against this?
>>
>>   Thanks,
>>
>> Matt
>>
>> On Fri, Nov 20, 2020 at 1:26 PM Barry Smith  wrote:
>>
>>>
>>>   Maybe something as simple for petsc-announce
>>>
>>>  Subject:[Release] 
>>>  Subject:[Job opening] 
>>>
>>>Then when you send out the most recent job opening you can include in
>>> the message something like
>>>
>>> "The PETSc announce mailing list will continue to be low volume. We
>>> will now tag each message in the subject line with [Release], [Job
>>> opening],  or possibly other tags so you can have your mail program filter
>>> out messages you are not interested in.
>>>
>>> Thanks for your continued support,"
>>>
>>>
>>>
>>> On Nov 20, 2020, at 9:45 AM, Matthew Knepley  wrote:
>>>
>>> I got the second email in less than one month about sending a job
>>> opening to the PETSc list.
>>>
>>> 1) Should we have some policy about this?
>>>
>>> I think we should encourage it, but in a way that does not produce noise
>>> for people. I think there are no other good outlets for computational jobs.
>>>
>>> 2) Should we have a section of the website for this?
>>>
>>> I would like something that just selected some petsc-users mail from the
>>> archive with a query in the URL.
>>>
>>> 3) If we encourage it, should we have a special header for job posts in
>>> the mailing list?
>>>
>>> This would facilitate 2).
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
>


Re: [petsc-dev] Job openings

2020-11-20 Thread Junchao Zhang
The usefulness depends on how many users subscribe to petsc-announce.

Since there are not many such emails, I think it is fine to send them to
petsc-users. And in these emails, we can always add a link to a job section
on the petsc website.  Once petsc users get used to this, they may go to
the website later when they are looking for jobs.

--Junchao Zhang


On Fri, Nov 20, 2020 at 1:04 PM Matthew Knepley  wrote:

> That is a good idea. Anyone against this?
>
>   Thanks,
>
> Matt
>
> On Fri, Nov 20, 2020 at 1:26 PM Barry Smith  wrote:
>
>>
>>   Maybe something as simple for petsc-announce
>>
>>  Subject:[Release] 
>>  Subject:[Job opening] 
>>
>>Then when you send out the most recent job opening you can include in
>> the message something like
>>
>> "The PETSc announce mailing list will continue to be low volume. We
>> will now tag each message in the subject line with [Release], [Job
>> opening],  or possibly other tags so you can have your mail program filter
>> out messages you are not interested in.
>>
>> Thanks for your continued support,"
>>
>>
>>
>> On Nov 20, 2020, at 9:45 AM, Matthew Knepley  wrote:
>>
>> I got the second email in less than one month about sending a job opening
>> to the PETSc list.
>>
>> 1) Should we have some policy about this?
>>
>> I think we should encourage it, but in a way that does not produce noise
>> for people. I think there are no other good outlets for computational jobs.
>>
>> 2) Should we have a section of the website for this?
>>
>> I would like something that just selected some petsc-users mail from the
>> archive with a query in the URL.
>>
>> 3) If we encourage it, should we have a special header for job posts in
>> the mailing list?
>>
>> This would facilitate 2).
>>
>>   Thanks,
>>
>>  Matt
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>


Re: [petsc-dev] [petsc-users] new book introducing PETSc for PDEs

2020-10-30 Thread Junchao Zhang
Ed,
  I agree with everything you said.  My thought is that we don't need to add
each of your examples to the corresponding src/XX/tutorials/.  Your repo can
be a standalone directory; we just need PETSc CI to be able to run the examples.
--Junchao Zhang


On Fri, Oct 30, 2020 at 9:00 PM Ed Bueler  wrote:

> Junchao--
>
> > I was wondering if it is feasible to add your example programs to PETSc
> tests so that readers will always be able to run your code.
>
> Thanks for asking.  There was a deliberate idea here, which I want
> to explain, and the petsc-dev list is the right spot.  (Sorry if this is
> more than you want to know.)
>
> First, the example programs are in a completely public spot:
> https://github.com/bueler/p4pdes
> No one needs to own the book to run the codes, for example.  I welcome
> corrections/feedback/improvements through the issues interface at that
> repo, whether or not connected they are connected to the book text.
>
> In fact, here are things one may observe about the petsc tutorial
> src/XX/tutorials/exN.c) examples:
>
> 1.  They may or may not point to a clear document(s) which can help a
> beginner know how they are designed.
> 2.  They don't have a uniform style because of different authorship.
> 3.  They are not ordered by difficulty in any clear way.  (E.g. ex1.c may
> not be the best example to start with, and the beginner would not be able
> to grep to find "easy" even if they can find some function from the API
> that way.)
> 4.  Their features evolve over time as developers work with the examples
> as regression/feature tests.  (See src/snes/tutorials/ex5.c.)
>
> Note that all of these facts are exactly what petsc devs would want!  That
> is, the way the set of examples in the petsc tree are structured helps with
> fast development by a diverse dev team.
>
> However the same facts make the examples less friendly to those who don't
> already know petsc.  Thus my opinion about the book's example codes is that
> a single source of stable examples, ordered by difficulty, closely tied to
> beginner documentation, of uniform style, and kinda boring to most petsc
> devs, is something I can supply and maintain.  So I'll be acting as editor
> to preserve the intent and simplicity of the examples.
>
> Does that make sense?
>
> Needless to say, fork my repo all you want!  The MIT license is nice and
> permissive.
>
> Ed
>
>
>
> On Fri, Oct 30, 2020 at 4:50 PM Junchao Zhang 
> wrote:
>
>> Prof. Ed Bueler,
>>Congratulations on your book. I am eager to read it.
>>I was wondering if it is feasible to add your example programs to
>> PETSc tests so that readers will always be able to run your code.
>> --Junchao Zhang
>>
>>
>> On Thu, Oct 29, 2020 at 8:29 PM Ed Bueler  wrote:
>>
>>> All --
>>>
>>> SIAM Press just published my new book "PETSc for Partial Differential
>>> Equations: Numerical Solutions in C and Python":
>>>
>>>   https://my.siam.org/Store/Product/viewproduct/?ProductId=32850137
>>>
>>> The book is available both as a paperback and an e-book with working
>>> links.  A SIAM member discount is available, of course.
>>>
>>> This book is a genuine introduction which does not assume you have used
>>> PETSc before, and which should make sense even if your differential
>>> equations knowledge is basic.  The prerequisites are a bit of programming
>>> in C and a bit of numerical linear algebra, roughly like the main ideas of
>>> Trefethen and Bau, but even that is reviewed and summarized.  I've made an
>>> effort to introduce discretizations from the beginning, especially finite
>>> differences and elements.
>>>
>>> The book is based on a collection of example programs at
>>> https://github.com/bueler/p4pdes.  Most of these codes call PETSc
>>> directly through the C API, but the last two chapters have Python codes
>>> using UFL and Firedrake.  Nonetheless the book contains ideas, mathematical
>>> and computational; it complements, but does not replace, the PETSc User's
>>> Manual and the tutorial examples in the PETSc source.  Concepts are
>>> explained and illustrated, with sufficient context to facilitate further
>>> development. Performance (optimality) and parallel scalability are the
>>> primary goals, so preconditioners including multigrid are central threads,
>>> and run-time solver options are explored in both the text and the exercises.
>>>
>>> Here is the place to appreciate the usual PETSc suspects for their
>>> comments on drafts, and help in writing this book: Barry, Jed, Matt, Dave,
>>> Rich, Lois, Patrick, Mark, Satish, David K., and many others.  Also let me
>>> say that SIAM Press has nothing but professionals who are nice to work with
>>> too; send them your book idea!
>>>
>>> Ed
>>>
>>> --
>>> Ed Bueler
>>> Dept of Mathematics and Statistics
>>> University of Alaska Fairbanks
>>> Fairbanks, AK 99775-6660
>>> 306C Chapman
>>>
>>
>
> --
> Ed Bueler
> Dept of Mathematics and Statistics
> University of Alaska Fairbanks
> Fairbanks, AK 99775-6660
> 306C Chapman
>


Re: [petsc-dev] [petsc-users] new book introducing PETSc for PDEs

2020-10-30 Thread Junchao Zhang
Prof. Ed Bueler,
   Congratulations on your book. I am eager to read it.
   I was wondering if it is feasible to add your example programs to PETSc
tests so that readers will always be able to run your code.
--Junchao Zhang


On Thu, Oct 29, 2020 at 8:29 PM Ed Bueler  wrote:

> All --
>
> SIAM Press just published my new book "PETSc for Partial Differential
> Equations: Numerical Solutions in C and Python":
>
>   https://my.siam.org/Store/Product/viewproduct/?ProductId=32850137
>
> The book is available both as a paperback and an e-book with working
> links.  A SIAM member discount is available, of course.
>
> This book is a genuine introduction which does not assume you have used
> PETSc before, and which should make sense even if your differential
> equations knowledge is basic.  The prerequisites are a bit of programming
> in C and a bit of numerical linear algebra, roughly like the main ideas of
> Trefethen and Bau, but even that is reviewed and summarized.  I've made an
> effort to introduce discretizations from the beginning, especially finite
> differences and elements.
>
> The book is based on a collection of example programs at
> https://github.com/bueler/p4pdes.  Most of these codes call PETSc
> directly through the C API, but the last two chapters have Python codes
> using UFL and Firedrake.  Nonetheless the book contains ideas, mathematical
> and computational; it complements, but does not replace, the PETSc User's
> Manual and the tutorial examples in the PETSc source.  Concepts are
> explained and illustrated, with sufficient context to facilitate further
> development. Performance (optimality) and parallel scalability are the
> primary goals, so preconditioners including multigrid are central threads,
> and run-time solver options are explored in both the text and the exercises.
>
> Here is the place to appreciate the usual PETSc suspects for their
> comments on drafts, and help in writing this book: Barry, Jed, Matt, Dave,
> Rich, Lois, Patrick, Mark, Satish, David K., and many others.  Also let me
> say that SIAM Press has nothing but professionals who are nice to work with
> too; send them your book idea!
>
> Ed
>
> --
> Ed Bueler
> Dept of Mathematics and Statistics
> University of Alaska Fairbanks
> Fairbanks, AK 99775-6660
> 306C Chapman
>


Re: [petsc-dev] Kokkos initialize

2020-10-16 Thread Junchao Zhang
Let me have a look.  cupminit.inc is a template for CUDA and HIP. It is OK
if you see some symbols twice.
--Junchao Zhang


On Fri, Oct 16, 2020 at 8:22 AM Mark Adams  wrote:

> Junchao, I see this in cupminit.inc (twice)
>
> #if defined(PETSC_HAVE_KOKKOS)
> ierr = PetscKokkosInitialize_Private();CHKERRQ(ierr);
> PetscBeganKokkos = PETSC_TRUE;
> #endif
>
> And I see
>
> ierr = PetscKokkosInitializeCheck();CHKERRQ(ierr);
>
> In the Kokkos operators.
>
> Are these redundant?
>
> On Thu, Oct 15, 2020 at 10:44 PM Mark Adams  wrote:
>
>> Si it seems like these two calls in cupminit.inc are
>> inconsistent with lazy:
>>
>> 22:41 adams/gamg-reduce-opt-cuda *= ~/petsc$ git grep PetscBeganKokkos
>> src/sys/objects/cupminit.inc:PetscBeganKokkos = PETSC_TRUE;
>> src/sys/objects/cupminit.inc:PetscBeganKokkos = PETSC_TRUE;
>>
>> I can do an MR to remove these if that is the case.
>>
>> Mark
>>
>> On Thu, Oct 15, 2020 at 8:34 PM Barry Smith  wrote:
>>
>>>
>>>
>>>   I thought the plan was that Kokkos also had a lazy initialization but
>>> perhaps it does not and needs to be fixed.
>>>
>>>   Barry
>>>
>>> > On Oct 15, 2020, at 6:49 PM, Mark Adams  wrote:
>>> >
>>> > I am running a on SUMMIt with a Kokkos cuda configuration and while
>>> debugging with ddt I noticed that it spent a long time in KokkosInit, but I
>>> was not using Kokkos. KokkosInit was call in PETSc's GPU init, which seems
>>> logical enough, but it would be better if it is not called if you are not
>>> using Kokkos.
>>> >
>>> > I recall seeing places where Kokkos is checked when calling a Kokkos
>>> method (ie, lazy initialization). Do we have policy on whether we are being
>>> lazy with KokkosInit or not?
>>> >
>>> > Mark
>>>
>>>


Re: [petsc-dev] Kokkos error on SUMMIT

2020-10-02 Thread Junchao Zhang
On Fri, Oct 2, 2020 at 3:02 PM Junchao Zhang 
wrote:

>
>
> On Fri, Oct 2, 2020 at 2:59 PM Mark Adams  wrote:
>
>>
>>
>> On Fri, Oct 2, 2020 at 3:15 PM Barry Smith  wrote:
>>
>>>
>>>   Mark,
>>>
>>>   Looks like you are building Kokkos without CUDA.
>>
>>
>> Yes. This is a CPU build of Kokkos.
>>
>>
>>> You don't have --with-cuda on configure line that is used by Kokkos to
>>> determine what version to build.
>>>
>>>   Junchao,
>>>
>>>   I guess you need to test Kokkos Kernels without CUDA and HIP and make
>>> a few changes.
>>>
>>
>> I'm trying with OpenMP right now. If you want CPU runs then asking for
>> OMP is not terrible.
>>
>>
> That is an interesting feature of Kokkos.
>
By design, in such a build even 'device' memory is just host memory. The
computation is still diverted to the Kokkos backend.  In other words, petsc
supports multithreading through Kokkos.


>
>
>>
>>>   Barry
>>>
>>>
>>>
>>>
>>> #if defined(PETSC_HAVE_CUDA)
>>>   #define WaitForKokkos() PetscCUDASynchronize ? (Kokkos::fence(),0) : 0;
>>> #elif defined(PETSC_HAVE_HIP)
>>>   #define WaitForKokkos() PetscHIPSynchronize ? (Kokkos::fence(),0) : 0;
>>> #endif
>>>
>>>
>>>
>>> > On Oct 2, 2020, at 11:47 AM, Mark Adams  wrote:
>>> >
>>> >
>>> > 
>>>
>>>


Re: [petsc-dev] Kokkos error on SUMMIT

2020-10-02 Thread Junchao Zhang
On Fri, Oct 2, 2020 at 2:59 PM Mark Adams  wrote:

>
>
> On Fri, Oct 2, 2020 at 3:15 PM Barry Smith  wrote:
>
>>
>>   Mark,
>>
>>   Looks like you are building Kokkos without CUDA.
>
>
> Yes. This is a CPU build of Kokkos.
>
>
>> You don't have --with-cuda on configure line that is used by Kokkos to
>> determine what version to build.
>>
>>   Junchao,
>>
>>   I guess you need to test Kokkos Kernels without CUDA and HIP and make a
>> few changes.
>>
>
> I'm trying with OpenMP right now. If you want CPU runs then asking for OMP
> is not terrible.
>
>
That is an interesting feature of Kokkos.


>
>>   Barry
>>
>>
>>
>>
>> #if defined(PETSC_HAVE_CUDA)
>>   #define WaitForKokkos() PetscCUDASynchronize ? (Kokkos::fence(),0) : 0;
>> #elif defined(PETSC_HAVE_HIP)
>>   #define WaitForKokkos() PetscHIPSynchronize ? (Kokkos::fence(),0) : 0;
>> #endif
>>
>>
>>
>> > On Oct 2, 2020, at 11:47 AM, Mark Adams  wrote:
>> >
>> >
>> > 
>>
>>


Re: [petsc-dev] CI format failure

2020-09-24 Thread Junchao Zhang
It would be better if the tool also printed out line and column numbers,
and the reason each line is flagged.
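
For reference, the rule being flagged looks like this (hypothetical lines,
not the actual ones from petscaijdevice.h):

  if(row < 0) continue;   /* flagged: no space after "if" */
  if (row < 0) continue;  /* accepted */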

--Junchao Zhang


On Thu, Sep 24, 2020 at 11:16 AM Satish Balay via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:

> The relevant part:
>
> No space after if, for or while -
> include/petscaijdevice.h
>
> Satish
>
> On Thu, 24 Sep 2020, Mark Adams wrote:
>
> > I have a failure here but I can't see what the problem is:
> >
> > https://gitlab.com/petsc/petsc/-/jobs/755956828
> >
> > Am I missing something?
> >
> > Thanks,
> > Mark
> >
>
>


Re: [petsc-dev] MPI derived datatype use in PETSc

2020-09-23 Thread Junchao Zhang
DMPlex has MPI_Type_create_struct().  But for matrices and vectors, we only
use MPIU_SCALAR.

In petsc, we always pack non-contiguous data before calling MPI, since most
index sets are irregular, so using MPI_Type_indexed() etc. probably would not
provide any benefit. The only place I can think of that could benefit from
derived datatypes is DMDA: the ghost points can be described with
MPI_Type_vector(), which would save the packing/unpacking and the associated
buffers.
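
As a rough illustration only (not something petsc currently does; xm and ym
are hypothetical local sizes of a 2D block stored row-major), one ghost
column could be described without packing as:

  MPI_Datatype ghost_col;
  /* ym blocks of 1 scalar each, separated by a stride of xm scalars */
  MPI_Type_vector(ym,1,xm,MPIU_SCALAR,&ghost_col);
  MPI_Type_commit(&ghost_col);
  /* use ghost_col directly in MPI_Isend()/MPI_Irecv() instead of a packed buffer */
  MPI_Type_free(&ghost_col);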

--Junchao Zhang


On Wed, Sep 23, 2020 at 12:30 PM Victor Eijkhout 
wrote:

> The Ohio mvapich people are working on getting better performance out of
> MPI datatypes. I notice that there are 5 million lines in the petsc source
> that reference MPI datatypes. So just as a wild guess:
>
> Optimizations on MPI Datatypes seem to be beneficial mostly if you’re
> sending blocks of at least a kilobyte each. Is that a plausible usage
> scenario? What is the typical use of MPI Datatypes in PETSc, and what type
> of datatype would most benefit from optimization?
>
> Victor.


Re: [petsc-dev] Question about MPICH device we use

2020-07-23 Thread Junchao Zhang
On Thu, Jul 23, 2020 at 11:35 PM Satish Balay via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:

> On Thu, 23 Jul 2020, Jeff Hammond wrote:
>
> > Open-MPI refuses to let users over subscribe without an extra flag to
> > mpirun.
>
> Yes - and when using this flag - it lets the run through - but there is
> still performance degradation in oversubscribe mode.
>
> > I think Intel MPI has an option for blocking poll that supports
> > oversubscription “nicely”.
>
> What option is this? Is it compile time option or something for mpiexec?
>
I only found configure-time options:
  --enable-nemesis-dbg-nolocal, alias for --enable-dbg-nolocal
  --enable-dbg-nolocal          enables debugging mode where shared-memory
                                communication is disabled

Satish
>
> > MPICH might have a “no local” option that
> > disables shared memory, in which case nemesis over libfabric with the
> > sockets or TCP provider _might_ do the right thing. But you should ask
> > MPICH people for details.
> >
> > Jeff
> >
> > On Thu, Jul 23, 2020 at 12:40 PM Jed Brown  wrote:
> >
> > > I think we should default to ch3:nemesis when --download-mpich, and
> only
> > > do ch3:sock when requested (which we would do in CI).
> > >
> > > Satish Balay via petsc-dev  writes:
> > >
> > > > Primarily because ch3:sock performance does not degrade in
> oversubscribe
> > > mode - which is developer friendly - i.e on your laptop.
> > > >
> > > > And folks doing optimized runs should use a properly tuned MPI for
> their
> > > setup anyway.
> > > >
> > > > In this case --download-mpich-device=ch3:nemesis is likely
> appropriate
> > > if using --download-mpich [and not using a separate/optimized MPI]
> > > >
> > > > Having defaults that satisfy all use cases is not practical.
> > > >
> > > > Satish
> > > >
> > > > On Wed, 22 Jul 2020, Matthew Knepley wrote:
> > > >
> > > >> We default to ch3:sock. Scott MacLachlan just had a long thread on
> the
> > > >> Firedrake list where it ended up that reconfiguring using
> ch3:nemesis
> > > had a
> > > >> 2x performance boost on his 16-core proc, and noticeable effect on
> the 4
> > > >> core speedup.
> > > >>
> > > >> Why do we default to sock?
> > > >>
> > > >>   Thanks,
> > > >>
> > > >>  Matt
> > > >>
> > > >>
> > >
> >
>


Re: [petsc-dev] How do I see the gcov results?

2020-06-25 Thread Junchao Zhang
No, I am not opposed; that is the plan. PETSc's script gcov.py works
correctly, and we need to move it to codecov.io.

--Junchao Zhang


On Thu, Jun 25, 2020 at 9:34 AM Aagaard, Brad T via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:

> Are you opposed to using codecov.io to compile the results and generate
> plots?
>
> Brad
>
> On 6/24/20, 4:17 PM, "petsc-dev on behalf of Scott Kruger" <
> petsc-dev-boun...@mcs.anl.gov on behalf of kru...@txcorp.com> wrote:
>
>
>
> For more detail, Stage 4 of the pipeline ("analyze-pipeline") has all
> of
> the gcov data and you can download it from the right side after
> clicking
> "Download" from "Job Artifacts" tab.  This is handled by the
> .gitlab-ci.yml file (search for gcov).
>
> If someone knows how gcov outputs it's data and how to upgrade the
> lib/petsc/bin/maint/gcov.py to read in the data as gitlab organizes it
> and then output the html/figures, then we'd have it done (locally.  To
> upload to wiki or other gitlab display would require more work on the
> gitlab-ci.yml file).
>
> I spent quite a few hours on it, and got stuck.  It requires
> understanding gcov to a degree that was interfering with other
> priorities.
>
> If someone has the knowledge or inclination, it's a good problem to
> solve.
>
> Scott
>
>
>
> On 6/24/20 2:39 PM, Satish Balay via petsc-dev wrote:
> > Its not yet setup in the current CI
> >
> > Satish
> >
> > On Wed, 24 Jun 2020, Matthew Knepley wrote:
> >
> >> Thanks,
> >>
> >> Matt
> >>
> >>
>
> --
> Tech-X Corporation   kru...@txcorp.com
> 5621 Arapahoe Ave, Suite A   Phone: (720) 974-1841
> Boulder, CO 80303Fax:   (303) 448-7756
>
>


Re: [petsc-dev] Prediscusion of appropriate communication tool for discussion of PETSc 4 aka the Grand Refactorization

2020-06-18 Thread Junchao Zhang
A dedicated mailing list has all these functionalities and makes it easier
to follow discussion threads.
--Junchao Zhang


On Thu, Jun 18, 2020 at 9:27 PM Barry Smith  wrote:

>
>I'd like to start a discussion of PETSc 4.0 aka the Grand
> Refactorization but to have that discussion we need to discuss what tool to
> use for that discussion.
>
>So this discussion is not about PETSc 4.0, please don't discuss it here.
>
>What do people recommend to use for the discussion
>
>   * dedicated mailing list
>   * slack channel(s)
>   * zulip channel(s)
>   * something else?
>
>   I'd like a single tool that anyone can join at any time, see the full
> history, can attach files, search, not cost more money the we are already
> paying, etc.
>
>   I expect this discussion to take maybe a week and then the actual
> discussion to take on the order of two months.
>
>Thanks
>
>  Barry
>
>


Re: [petsc-dev] https://developer.nvidia.com/nccl

2020-06-16 Thread Junchao Zhang
It should be renamed NCL (NVIDIA Communications Library), as it adds
point-to-point communication in addition to collectives. I am not sure
whether to implement it in petsc, as no exascale machine uses NVIDIA GPUs.

--Junchao Zhang


On Tue, Jun 16, 2020 at 6:44 PM Matthew Knepley  wrote:

> It would seem to make more sense to just reverse-engineering this as
> another MPI impl.
>
>Matt
>
> On Tue, Jun 16, 2020 at 6:22 PM Barry Smith  wrote:
>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>


Re: [petsc-dev] Preferred Windows Installation

2020-06-15 Thread Junchao Zhang
On Mon, Jun 15, 2020 at 8:33 PM Jacob Faibussowitsch 
wrote:

> And if one needs windows native/libraries - then dealing with windows and
> its quirks is unavoidable.
>
> WSL2 allows you to run windows binaries natively inside WSL I believe
> https://docs.microsoft.com/en-us/windows/wsl/interop#run-windows-tools-from-linux
>  without
> breaking the illusion of linux.
>
The website you linked only says "Run Windows tools from Linux" (such as
notepad), not "any windows binaries/apps".


>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
> On Jun 15, 2020, at 8:27 PM, Satish Balay  wrote:
>
> And if one needs windows native/libraries - then dealing with windows and
> its quirks is unavoidable. Its orthogonal to cygwin.
>
> Satish
>
> On Mon, 15 Jun 2020, Satish Balay via petsc-dev wrote:
>
> Sure - if WSL is sufficient for a use case that is fine. Its a simpler way
> to install something equivalent to a Linux VM on windows.
>
> cygwin instructions are for building native windows binaries with MS
> compilers. [usable with other MS native libraries]. If this is not the
> use-case - its easier to just use linux - or linux equvalent like WSL
>
> Satish
>
>
> On Mon, 15 Jun 2020, Jacob Faibussowitsch wrote:
>
> Hello All,
>
> Having recently had to assist a coworker in setting up a petsc install on
> windows and running into a whole host of issues with getting Cygwin and an
> overly aggressive windows defender (of all things) to play nice I
> discovered WSL, specifically WSL2. With regards to ease-of-use and install
> time, WSL2 was by far easier to do than Cygwin. The only out of the
> ordinary step required was turning on virtualization in the BIOS but this
> seems like it is not a common step, and after installing an ubuntu distro
> it was smooth sailing.
>
> The only performance hiccup that I have so far encountered when using WSL2
> is that I/O performance when pulling from the windows filesystem in
> /mnt/c/foo/bar is somewhat slower than just moving files within the VM
> itself, but in my opinion this is relatively minor. Additionally while
> there is no current way to use CUDA on WSL, NVIDIA has apparently already
> started a limited test-release for WSL2.
>
> Currently, from the installation page it seems like Cygwin is the
> preferred method of installing petsc on windows but if it is this easy to
> get things up and running with WSL2 (and the above performance qualms are
> satisfied) then we should consider making it the default.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
>
>
>
>
>


Re: [petsc-dev] Configure Downloads Wrong MUMPS

2020-06-06 Thread Junchao Zhang
In mumps.py, change
self.version = '5.3.1'
to
self.minversion = '5.2.1'

If we support older MUMPS versions, we can lower the minversion even further.

--Junchao Zhang


On Sat, Jun 6, 2020 at 3:16 PM Jacob Faibussowitsch 
wrote:

> Hello All,
>
> As the title suggest configure downloads the wrong version of MUMPS
> (5.2.1) using —download-mumps even after —with-clean and manually rm -rf
> /petsc-arch/externalpackages/mumps_folder. The configure warning message
> says that 5.3 is the version that petsc is tested with. FWIW I am using
> MUMPS with slepc (but slepc is installed through petsc), so perhaps there
> is a version mismatch between what slepc and petsc expect?
>
> ===
>
>  Trying to download
> https://bitbucket.org/petsc/pkg-mumps/get/v5.2.1-p2.tar.gz for MUMPS
>   
> ===
> ===
>
> Warning: Using version 5.2.1 of package mumps PETSc is tested with 5.3
>
> Suggest using --download-mumps for a compatible MUMPS
>
>   
> ===
>
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
>


Re: [petsc-dev] Valgrind MPI-Related Errors

2020-06-02 Thread Junchao Zhang
I guess Jacob already used MPICH, since MPIDI_CH3_EagerContigShortSend() is
from MPICH.

--Junchao Zhang


On Tue, Jun 2, 2020 at 9:38 AM Satish Balay via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:

> use --download-mpich for valgrind.
>
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>
> Satish
>
> On Tue, 2 Jun 2020, Karl Rupp wrote:
>
> > Hi Jacob,
> >
> > the recommendation in the past was to use MPICH as it is (was?)
> > valgrind-clean. Which MPI do you use? OpenMPI used to have these kinds of
> > issues. (My information might be outdated)
> >
> > Best regards,
> > Karli
> >
> > On 6/2/20 2:43 AM, Jacob Faibussowitsch wrote:
> > > Hello All,
> > >
> > > TL;DR: valgrind always complains about "Syscall param write(buf)
> points to
> > > uninitialised byte(s)” for a LOT of MPI operations in petsc code,
> making
> > > debugging using valgrind fairly annoying since I have to sort through
> a ton
> > > of unrelated stuff. I have built valgrind from source, used apt install
> > > valgrind, apt install valgrind-mpi to no avail.
> > >
> > > I am using valgrind from docker. Dockerfile is attached below as well.
> I
> > > have been unsuccessfully trying to resolve these local valgrind
> errors, but
> > > I am running out of ideas. Googling the issue has also not provided
> entirely
> > > applicable solutions. Here is an example of the error:
> > >
> > > $ make -f gmakefile test VALGRIND=1
> > > ...
> > > #==54610== Syscall param write(buf) points to uninitialised byte(s)
> > > #==54610==at 0x6F63317: write (write.c:26)
> > > #==54610==by 0x9056AC9: MPIDI_CH3I_Sock_write (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x9059FCD: MPIDI_CH3_iStartMsg (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x903F298: MPIDI_CH3_EagerContigShortSend (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x9049479: MPID_Send (in
> /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x8FC9B2A: MPIC_Send (in
> /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x8F86F2E: MPIR_Bcast_intra_binomial (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x8EE204E: MPIR_Bcast_intra_auto (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x8EE21F4: MPIR_Bcast_impl (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x8F887FB: MPIR_Bcast_intra_smp (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x8EE206E: MPIR_Bcast_intra_auto (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x8EE21F4: MPIR_Bcast_impl (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x8EE2A6F: PMPI_Bcast (in
> /usr/local/lib/libmpi.so.12.1.8)
> > > #==54610==by 0x4B377B8: PetscOptionsInsertFile (options.c:525)
> > > #==54610==by 0x4B39291: PetscOptionsInsert (options.c:672)
> > > #==54610==by 0x4B5B1EF: PetscInitialize (pinit.c:996)
> > > #==54610==by 0x10A6BA: main (ex9.c:75)
> > > #==54610==  Address 0x1ffeffa944 is on thread 1's stack
> > > #==54610==  in frame #3, created by MPIDI_CH3_EagerContigShortSend
> (???:)
> > > #==54610==  Uninitialised value was created by a stack allocation
> > > #==54610==at 0x903F200: MPIDI_CH3_EagerContigShortSend (in
> > > /usr/local/lib/libmpi.so.12.1.8)
> > >
> > > There are probably 20 such errors every single time, regardless of
> what code
> > > is being run. I have tried using apt install valgrind, apt install
> > > valgrind-mpi, and building valgrind from source:
> > >
> > > # VALGRIND
> > > WORKDIR /
> > > RUN git clone git://sourceware.org/git/valgrind.git
> > > WORKDIR /valgrind
> > > RUN git pull
> > > RUN ./autogen.sh
> > > RUN ./configure --with-mpicc=/usr/local/bin/mpicc
> > > RUN make -j 5
> > > RUN make install
> > >
> > > None of the those approaches lead to these errors disappearing.
> Perhaps I am
> > > missing some funky MPI args?
> > >
> > > Best regards,
> > >
> > > Jacob Faibussowitsch
> > > (Jacob Fai - booss - oh - vitch)
> > > Cell: (312) 694-3391
> > >
> > >
> >
> >
>


Re: [petsc-dev] Should PetscSignalHandlerDefault avoid calling MPI_Abort?

2020-05-06 Thread Junchao Zhang
John,
  I had an MR at https://gitlab.com/petsc/petsc/-/merge_requests/2745.
Currently we cannot agree on a solution. The concern is that if we call
_Exit() instead of MPI_Abort() in the signal handler, then some MPIs (or
batch systems) might not be able to kill all MPI processes.
  I prefer _Exit(), because it solves the problem you reported (which
actually happened).
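
For illustration, the async-signal-safe pattern being discussed looks roughly
like the following generic sketch (not PETSc's actual handler); only calls
such as write() and _exit() are async-signal-safe, unlike MPI_Abort() or
printf():

  #include <unistd.h>
  #include <signal.h>

  static void handler(int sig)
  {
    static const char msg[] = "caught fatal signal, exiting\n";
    if (write(STDERR_FILENO,msg,sizeof(msg)-1) < 0) { /* best-effort message */ }
    _exit(128 + sig); /* _exit(), not exit(): skips non-reentrant cleanup */
  }

  int main(void)
  {
    signal(SIGSEGV,handler);
    /* ... application ... */
    return 0;
  }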

--Junchao Zhang


On Wed, May 6, 2020 at 10:22 AM John Peterson  wrote:

> Hi Junchao,
>
> I was just wondering if there was any update on this? I saw your question
> on the discuss@mpich thread, but I gather you have not received a
> response yet.
>
> --
> John
>
>
> On Tue, Apr 21, 2020 at 10:09 PM Junchao Zhang 
> wrote:
>
>>   I don't see problems calling _exit in PetscSignalHandlerDefault. Let me
>> try it first.
>> --Junchao Zhang
>>
>>
>> On Tue, Apr 21, 2020 at 3:17 PM John Peterson 
>> wrote:
>>
>>> Hi,
>>>
>>> I started a thread on disc...@mpich.org regarding some hanging canceled
>>> jobs that we were seeing:
>>>
>>> https://lists.mpich.org/pipermail/discuss/2020-April/005910.html
>>>
>>> It turns out that there are some fairly strict rules about what types of
>>> functions (asynchronous-safe only) can be called from signal handlers, and
>>> MPI_Abort(), at least the mpich implementation of it, apparently does not
>>> fall into that category. I wonder if you have any comments on this. One
>>> possibility might be might be to just call "_exit" from
>>> PetscSignalHandlerDefault rather than PETSCABORT, not sure what other
>>> issues that would cause, however.
>>>
>>> Thanks,
>>> John
>>>
>>
>
>


Re: [petsc-dev] Should PetscSignalHandlerDefault avoid calling MPI_Abort?

2020-04-21 Thread Junchao Zhang
  I don't see problems calling _exit in PetscSignalHandlerDefault. Let me
try it first.
--Junchao Zhang


On Tue, Apr 21, 2020 at 3:17 PM John Peterson  wrote:

> Hi,
>
> I started a thread on disc...@mpich.org regarding some hanging canceled
> jobs that we were seeing:
>
> https://lists.mpich.org/pipermail/discuss/2020-April/005910.html
>
> It turns out that there are some fairly strict rules about what types of
> functions (asynchronous-safe only) can be called from signal handlers, and
> MPI_Abort(), at least the mpich implementation of it, apparently does not
> fall into that category. I wonder if you have any comments on this. One
> possibility might be might be to just call "_exit" from
> PetscSignalHandlerDefault rather than PETSCABORT, not sure what other
> issues that would cause, however.
>
> Thanks,
> John
>


Re: [petsc-dev] CUDA + OMP make error

2020-04-13 Thread Junchao Zhang
Matrix assembly on the GPU is probably more important. Do you have an example
for me to play with, to see what GPU interface we should have?
--Junchao Zhang

On Mon, Apr 13, 2020 at 5:44 PM Mark Adams  wrote:

> I was looking into assembling matrices with threads. I have a coloring to
> avoid conflicts.
>
> Turning off all the logging seems way overkill and for methods that can
> get called in a thread then we could use PETSC_HAVE_THREADSAFTEY (thingy)
> to protect logging functions. So one can still get timings for the whole
> assembly process, just not for MatSetValues. Few people are going to do
> this. I don't think it will be a time sink, and if it is we just revert
> back to saying 'turn logging off'. I don't see a good argument for
> insisting on turning off logging, it is pretty important, if we just say
> that we are going to protect methods as needed.
>
> It is not a big deal, I am just exploring this idea. It is such a basic
> concept in shared memory sparse linear algebra that it seems like a good
> thing to be able to support and have in an example to say we can assemble
> matrices in threads (not that it is a great idea). We have all the tools
> (eg, coloring methods) that it is just a matter of protecting code a few
> methods. I use DMPlex MatClosure instead of MatSetValues and this is where
> I die now with non-thread safe code. We have an idea, from Jed, on how to
> fix it.
>
> Anyway, thanks for your help, but I think we should hold off on doing
> anything until we have some consensus that this would be a good idea to put
> some effort into getting a thread safe PETSc that can support OMP matrix
> assembly with a nice compact example.
>
> Thanks again,
> Mark
>
> On Mon, Apr 13, 2020 at 5:44 PM Junchao Zhang 
> wrote:
>
>> Mark,
>>  I saw you had "--with-threadsaftey --with-log=0".  Do you really want to
>> call petsc from multiple threads (in contrast to letting petsc call
>> other libraries, e.g., BLAS, doing multithreading)?  If not, you can
>> drop --with-threadsaftey.
>>  I have https://gitlab.com/petsc/petsc/-/merge_requests/2714 that should
>> fix your original compilation errors.
>>
>> --Junchao Zhang
>>
>> On Mon, Apr 13, 2020 at 2:07 PM Mark Adams  wrote:
>>
>>> https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
>>>
>>> and I see this on my Mac:
>>>
>>> 14:23 1 mark/feature-xgc-interface-rebase *= ~/Codes/petsc$
>>> ../arch-macosx-gnu-O-omp.py
>>>
>>>
>>>
>>> ===
>>>  Configuring PETSc to compile on your system
>>>
>>>
>>> ===
>>> ===
>>>
>>>
>>>Warning: PETSC_ARCH from environment does not match
>>> command-line or name of script.
>>>
>>>  Warning: Using from command-line or
>>> name of script: arch-macosx-gnu-O-omp, ignoring environment:
>>> arch-macosx-gnu-g
>>>
>>> ===
>>>
>>>
>>>  TESTING: configureLibraryOptions from
>>> PETSc.options.libraryOptions(config/PETSc/options/libraryOptions.py:37)
>>>
>>>
>>>  
>>> ***
>>>  UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log
>>> for details):
>>>
>>> ---
>>> Must use --with-log=0 with --with-threadsafety
>>>
>>> ***
>>>
>>>
>>> On Mon, Apr 13, 2020 at 2:54 PM Junchao Zhang 
>>> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Mon, Apr 13, 2020 at 12:06 PM Mark Adams  wrote:
>>>>
>>>>> BTW, I can build on SUMMIT with logging and OMP, apparently. I also
>>>>> seem to be able to build with debugging. Both of which are not allowed
>>>>> according the the docs. I am puzzled.
>>>>>
>>>>  What are "the docs"?
>>>>
>>>>>
>>>>> On Mon, Apr 13, 2020 at 12:05 PM Mark Adams  wrote:
>>>>>
>>>>>> I think the problem is that you have to turn off logging with o

Re: [petsc-dev] CUDA + OMP make error

2020-04-13 Thread Junchao Zhang
Mark,
 I saw you had "--with-threadsafety --with-log=0".  Do you really want to
call petsc from multiple threads (as opposed to letting petsc call
other libraries, e.g. BLAS, that do the multithreading)?  If not, you can
drop --with-threadsafety.
 I have https://gitlab.com/petsc/petsc/-/merge_requests/2714 that should
fix your original compilation errors.

--Junchao Zhang

On Mon, Apr 13, 2020 at 2:07 PM Mark Adams  wrote:

> https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
>
> and I see this on my Mac:
>
> 14:23 1 mark/feature-xgc-interface-rebase *= ~/Codes/petsc$
> ../arch-macosx-gnu-O-omp.py
>
>
>
> ===
>  Configuring PETSc to compile on your system
>
>
> ===
> ===
>
>
>Warning: PETSC_ARCH from environment does not match
> command-line or name of script.
>
>  Warning: Using from command-line or
> name of script: arch-macosx-gnu-O-omp, ignoring environment:
> arch-macosx-gnu-g
>
> ===
>
>
>  TESTING: configureLibraryOptions from
> PETSc.options.libraryOptions(config/PETSc/options/libraryOptions.py:37)
>
>
>  
> ***
>  UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log for
> details):
>
> ---
> Must use --with-log=0 with --with-threadsafety
>
> ***
>
>
> On Mon, Apr 13, 2020 at 2:54 PM Junchao Zhang 
> wrote:
>
>>
>>
>>
>> On Mon, Apr 13, 2020 at 12:06 PM Mark Adams  wrote:
>>
>>> BTW, I can build on SUMMIT with logging and OMP, apparently. I also seem
>>> to be able to build with debugging. Both of which are not allowed according
>>> the the docs. I am puzzled.
>>>
>>  What are "the docs"?
>>
>>>
>>> On Mon, Apr 13, 2020 at 12:05 PM Mark Adams  wrote:
>>>
>>>> I think the problem is that you have to turn off logging with openmp
>>>> and the (newish) GPU timers did not protect their timers.
>>>>
>>>> I don't see a good reason to require logging be turned off with OMP. We
>>>> could use PETSC_HAVE_THREADSAFETY to protect logs that we care about (eg,
>>>> in MatSetValues) and as users discover more things that they want to call
>>>> in an OMP thread block, then tell them to turn logging off and we will fix
>>>> it when we can.
>>>>
>>>> Any thoughts on the idea of letting users keep logging with openmp?
>>>>
>>>> On Mon, Apr 13, 2020 at 11:40 AM Junchao Zhang 
>>>> wrote:
>>>>
>>>>> Yes. Looks we need to include petsclog.h. Don't know why OMP
>>>>> triggered the error.
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Mon, Apr 13, 2020 at 9:59 AM Mark Adams  wrote:
>>>>>
>>>>>> Should I do an MR to fix this?
>>>>>>
>>>>>


Re: [petsc-dev] CUDA + OMP make error

2020-04-13 Thread Junchao Zhang
On Mon, Apr 13, 2020 at 12:06 PM Mark Adams  wrote:

> BTW, I can build on SUMMIT with logging and OMP, apparently. I also seem
> to be able to build with debugging. Both of which are not allowed according
> the the docs. I am puzzled.
>
 What are "the docs"?

>
> On Mon, Apr 13, 2020 at 12:05 PM Mark Adams  wrote:
>
>> I think the problem is that you have to turn off logging with openmp and
>> the (newish) GPU timers did not protect their timers.
>>
>> I don't see a good reason to require logging be turned off with OMP. We
>> could use PETSC_HAVE_THREADSAFETY to protect logs that we care about (eg,
>> in MatSetValues) and as users discover more things that they want to call
>> in an OMP thread block, then tell them to turn logging off and we will fix
>> it when we can.
>>
>> Any thoughts on the idea of letting users keep logging with openmp?
>>
>> On Mon, Apr 13, 2020 at 11:40 AM Junchao Zhang 
>> wrote:
>>
>>> Yes. Looks we need to include petsclog.h. Don't know why OMP
>>> triggered the error.
>>> --Junchao Zhang
>>>
>>>
>>> On Mon, Apr 13, 2020 at 9:59 AM Mark Adams  wrote:
>>>
>>>> Should I do an MR to fix this?
>>>>
>>>


Re: [petsc-dev] CUDA + OMP make error

2020-04-13 Thread Junchao Zhang
Yes, it looks like we need to include petsclog.h. I don't know why OMP
triggered the error.
--Junchao Zhang


On Mon, Apr 13, 2020 at 9:59 AM Mark Adams  wrote:

> Should I do an MR to fix this?
>


Re: [petsc-dev] Thank you!

2020-04-10 Thread Junchao Zhang
Ali,
 Congratulations on your paper; the credit for PCVPBJACOBI goes to Barry.
--Junchao Zhang


On Thu, Apr 9, 2020 at 6:12 PM Ali Reza Khaz'ali 
wrote:

> Dear PETSc team,
>
>
>
> I just want to thank you for implementing the PCVPBJACOBI into the PETSc
> library, which I used to solve the compositional fluid flow problem within
> the fractured petroleum reservoirs. I have published my code on the GitHub (
> https://github.com/khazali/Osiris). Additionally, a related paper
> explaining the details of the code has become online recently (
> https://authors.elsevier.com/a/1atND6gSGqY68l). Thanking you in the
> acknowledgement part was the least that I could do.
>
>
>
> Thanks again, and best wishes,
>
> Dr. Ali Reza Khaz’ali
>
> Assistant Professor of Petroleum Engineering & Director of Student Affairs,
>
> Department of Chemical Engineering
>
> Isfahan University of Technology
>
> Isfahan, Iran
>

