[petsc-users] Vec Ownership ranges with Global Section Offsets

2023-01-05 Thread Nicholas Arnold-Medabalimi
Hi Petsc Users,

I'm working with a dmplex system with a subsampled mesh distributed with an
overlap of 1.

I'm encountering unusual behavior when using VecGetOwnershipRange to
adjust the offset obtained from a global section. The logic of the
following code is to get the offset needed to index a global vector,
while still being able to detect overlap cells and skip them, counting
only the owned cells.

call DMGetGlobalSection(dmplex, section, ierr)
call VecGetArrayF90(stateVec, stateVecV, ierr)
call VecGetOwnershipRange(stateVec, oStart, oEnd, ierr)
do i = c0, (c1-1)
   call PetscSectionGetOffset(section, i, offset, ierr)
   write(*,*) 'cell', i, 'offset', offset, 'oStart', oStart, offset-oStart
   ! a negative offset marks an unowned (overlap) cell, so skip it
   if (offset < 0) then
      cycle
   endif
   offset = offset - oStart
   plexcells = plexcells + 1
   stateVecV(offset) = ...  ! assigned value elided here
enddo

I'm noticing some very weird results that I've appended below. The
GetOffset documentation notes that a negative offset indicates an unowned
point (which I use to cycle). However, the offset subtraction with oStart
can yield an illegal index for the vector access. I see that the
documentation for GetOwnershipRange notes that the range may be
"ill-defined", but I wanted to check whether this is the kind of ill-defined
behavior I should expect, or whether something is terribly wrong with my
PetscSection. (Both the Vec and the Section were produced by
DMPlexDistributeField, so by definition they should have synchronized
section information.) I was wondering what output is possible here and what
the best way to index the vector is. I'm thinking of subtracting the offset
of cell 0, perhaps?
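The indexing logic in question can be sketched in plain Python. It assumes (an assumption on my part, not something confirmed in this thread) the common PETSc convention that a global section encodes the offset of an unowned point as -(offset+1), so the true offset can be recovered before deciding whether to skip:

```python
def local_index(offset, o_start):
    """Return (idx, owned): the local-array index for an owned point,
    or the decoded true global offset for an unowned one.

    Assumes the global section encodes unowned points as -(offset+1)
    (a convention assumed here, not confirmed in this thread)."""
    if offset < 0:
        true_offset = -(offset + 1)   # e.g. -771 decodes to 770
        return true_offset, False     # unowned: skip when filling the Vec
    return offset - o_start, True     # owned: index into the local array

# examples using the rank-1 numbers from the output below
print(local_index(2640, 2640))  # cell 3, owned  -> (0, True)
print(local_index(-771, 2640))  # cell 15, ghost -> (770, False)
```

With this decoding, the -771 on rank 1 is just the encoded form of a remote offset rather than a corrupt section entry.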

on rank 0

 cell   0 offset   0 oStart   0   0
 cell   1 offset  55 oStart   0  55
 cell   2 offset 110 oStart   0 110
 cell   3 offset 165 oStart   0 165
 cell   4 offset 220 oStart   0 220
 cell   5 offset 275 oStart   0 275
 cell   6 offset 330 oStart   0 330
 cell   7 offset 385 oStart   0 385
 cell   8 offset 440 oStart   0 440
 cell   9 offset 495 oStart   0 495
 cell  10 offset 550 oStart   0 550
 cell  11 offset 605 oStart   0 605
 cell  12 offset 660 oStart   0 660
 cell  13 offset 715 oStart   0 715

and on rank 1
 cell   0 offset 2475 oStart 2640  -165
 cell   1 offset 2530 oStart 2640  -110
 cell   2 offset 2585 oStart 2640   -55
 cell   3 offset 2640 oStart 2640     0
 cell   4 offset 2695 oStart 2640    55
 cell   5 offset 2750 oStart 2640   110
 cell   6 offset 2805 oStart 2640   165
 cell   7 offset 2860 oStart 2640   220
 cell   8 offset 2915 oStart 2640   275
 cell   9 offset 2970 oStart 2640   330
 cell  10 offset 3025 oStart 2640   385
 cell  11 offset 3080 oStart 2640   440
 cell  12 offset 3135 oStart 2640   495
 cell  13 offset 3190 oStart 2640   550
 cell  14 offset 3245 oStart 2640   605
 cell  15 offset -771 oStart 2640 -3411


Sincerely
Nicholas

-- 
Nicholas Arnold-Medabalimi

Ph.D. Candidate
Computational Aeroscience Lab
University of Michigan


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Barry Smith

  So Jed's "everyone" now consists of "no one" and Jed can stop complaining 
that "everyone" thinks it is a bad idea.



> On Jan 5, 2023, at 11:50 PM, Junchao Zhang  wrote:
> 
> 
> 
> 
> On Thu, Jan 5, 2023 at 10:32 PM Barry Smith wrote:
>> 
>> 
>> > On Jan 5, 2023, at 3:42 PM, Jed Brown wrote:
>> > 
>> > Mark Adams  writes:
>> > 
>> >> Support of HIP and CUDA hardware together would be crazy, 
>> > 
>> > I don't think it's remotely crazy. libCEED supports both together and it's 
>> > very convenient when testing on a development machine that has one of each 
>> > brand GPU and simplifies binary distribution for us and every package that 
>> > uses us. Every day I wish PETSc could build with both simultaneously, but 
>> > everyone tells me it's silly.
>> 
>>   Not everyone at all; just a subset of everyone. Junchao is really the 
>> hold-out :-)
> I am not; instead I think we should try (I fully agree it can ease binary 
> distribution).  But Satish needs to install such a machine first :)
> There are issues out of our control if we want to mix GPUs in execution.  For 
> example, how do we do VecAXPY on a cuda vector and a hip vector? Shall we do it 
> on the host? Also, there are no gpu-aware MPI implementations supporting 
> messages between cuda memory and hip memory.
>> 
>>   I just don't care about "binary packages" :-); I think they are an archaic 
>> and bad way of thinking about code distribution (but yes the alternatives 
>> need lots of work to make them flawless, but I think that is where the work 
>> should go in the packaging world.)
>> 
>>I go further and think one should be able to automatically use a CUDA 
>> vector on a HIP device as well, it is not hard in theory but requires 
>> thinking about how we handle classes and subclasses a little to make it 
>> straightforward; or perhaps Jacob has fixed that also?
>  



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Junchao Zhang
On Thu, Jan 5, 2023 at 10:32 PM Barry Smith  wrote:

>
>
> > On Jan 5, 2023, at 3:42 PM, Jed Brown  wrote:
> >
> > Mark Adams  writes:
> >
> >> Support of HIP and CUDA hardware together would be crazy,
> >
> > I don't think it's remotely crazy. libCEED supports both together and
> it's very convenient when testing on a development machine that has one of
> each brand GPU and simplifies binary distribution for us and every package
> that uses us. Every day I wish PETSc could build with both simultaneously,
> but everyone tells me it's silly.
>
>   Not everyone at all; just a subset of everyone. Junchao is really the
> hold-out :-)
>
I am not; instead I think we should try (I fully agree it can ease binary
distribution).  But Satish needs to install such a machine first :)
There are issues out of our control if we want to mix GPUs in execution.
For example, how do we do VecAXPY on a cuda vector and a hip vector? Shall
we do it on the host? Also, there are no gpu-aware MPI implementations
supporting messages between cuda memory and hip memory.
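One way to read "do it on the host": stage both device vectors in host memory, perform the AXPY there, and copy the result back. A minimal illustration (plain Python/NumPy standing in for device buffers; the transfer hooks are hypothetical, and none of this is PETSc API):

```python
import numpy as np

def mixed_axpy(alpha, x_dev, y_dev, to_host, from_host):
    """y = alpha*x + y across two different device backends by staging
    on the host. to_host/from_host are hypothetical transfer hooks
    (wrappers around cudaMemcpy/hipMemcpy in a real implementation)."""
    x = to_host(x_dev)        # D2H copy from backend A
    y = to_host(y_dev)        # D2H copy from backend B
    y = alpha * x + y         # the actual AXPY, on the host
    return from_host(y)       # H2D copy back to backend B

# with identity hooks this degenerates to a plain NumPy axpy
ident = lambda a: np.asarray(a, dtype=float)
print(mixed_axpy(2.0, [1.0, 2.0], [10.0, 20.0], ident, ident))  # [12. 24.]
```

The obvious cost is two extra host transfers per operation, which is why staging is a fallback rather than a design goal.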

>
>   I just don't care about "binary packages" :-); I think they are an
> archaic and bad way of thinking about code distribution (but yes the
> alternatives need lots of work to make them flawless, but I think that is
> where the work should go in the packaging world.)
>
>I go further and think one should be able to automatically use a CUDA
> vector on a HIP device as well, it is not hard in theory but requires
> thinking about how we handle classes and subclasses a little to make it
> straightforward; or perhaps Jacob has fixed that also?


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Barry Smith



> On Jan 5, 2023, at 3:42 PM, Jed Brown  wrote:
> 
> Mark Adams  writes:
> 
>> Support of HIP and CUDA hardware together would be crazy, 
> 
> I don't think it's remotely crazy. libCEED supports both together and it's 
> very convenient when testing on a development machine that has one of each 
> brand GPU and simplifies binary distribution for us and every package that 
> uses us. Every day I wish PETSc could build with both simultaneously, but 
> everyone tells me it's silly.

  Not everyone at all; just a subset of everyone. Junchao is really the 
hold-out :-)

  I just don't care about "binary packages" :-); I think they are an archaic 
and bad way of thinking about code distribution (but yes the alternatives need 
lots of work to make them flawless, but I think that is where the work should 
go in the packaging world.)

   I go further and think one should be able to automatically use a CUDA vector 
on a HIP device as well, it is not hard in theory but requires thinking about 
how we handle classes and subclasses a little to make it straightforward; or 
perhaps Jacob has fixed that also?



Re: [petsc-users] setting a vector with VecSetValue versus VecSetValues

2023-01-05 Thread Barry Smith

  Note there is also VecSetValuesLocal() that takes ghosted local indices 
(ghosted local indices are different from your meaning of "local indices"). See 
https://petsc.org/release/docs/manualpages/Vec/VecSetValuesLocal/  
https://petsc.org/release/docs/manualpages/Vec/VecSetLocalToGlobalMapping/

  Barry


> On Jan 5, 2023, at 12:41 PM, Alfredo Jaramillo  
> wrote:
> 
> omg... for some reason I was thinking it takes local indices. Yes.. modifying 
> that line the code works well.
> 
> thank you!
> Alfredo
> 
>> On Thu, Jan 5, 2023 at 10:38 AM Junchao Zhang wrote:
>> VecSetValue() also needs global indices, so you need PetscInt gl_row = 
>> (PetscInt)(i)+rstart; 
>> 
>> --Junchao Zhang
>> 
>> 
>> On Thu, Jan 5, 2023 at 11:25 AM Alfredo Jaramillo wrote:
>>> dear PETSc developers,
>>> 
>>> I have a code where I copy an array to a distributed petsc vector with the 
>>> next lines:
>>> 
>>> 1    for (int i = 0; i < ndof_local; i++) {
>>> 2      PetscInt gl_row = (PetscInt)(i) + rstart;
>>> 3      PetscScalar val = (PetscScalar)u[i];
>>> 4      VecSetValues(x, 1, &gl_row, &val, INSERT_VALUES);
>>> 5    }
>>> 
>>> // for (int i = 0; i < ndof_local; i++) {
>>> //   PetscInt gl_row = (PetscInt)(i);
>>> //   PetscScalar val = (PetscScalar)u[i];
>>> //   VecSetValue(x, gl_row, val, INSERT_VALUES);
>>> // }
>>> 
>>> VecAssemblyBegin(x);
>>> VecAssemblyEnd(x);
>>> 
>>> This works as expected. If, instead of using lines 1-5, I use the lines 
>>> where VecSetValue is used with local indices, then the vector is null on 
>>> all the processes but rank 0, and the piece of information at rank zero is 
>>> incorrect.
>>> 
>>> What could I be doing wrong?
>>> 
>>> best regards
>>> Alfredo
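The two index flavors in this thread can be summarized with a toy local-to-global map (illustrative Python, not PETSc API; `make_l2g` is a hypothetical helper): plain local indices 0..n-1 correspond to the owned global range rstart..rstart+n-1, while a ghosted local numbering additionally appends the global indices of the ghost entries, which is the numbering VecSetValuesLocal() consumes once a mapping is attached.

```python
def make_l2g(rstart, nlocal, ghosts):
    """Toy ghosted local-to-global map (hypothetical helper, not PETSc
    API): owned entries first, then the global indices of the ghosts."""
    return list(range(rstart, rstart + nlocal)) + list(ghosts)

# a rank with rstart=10 owning 3 entries and ghosting global entries 2 and 42
l2g = make_l2g(10, 3, [2, 42])
print(l2g)     # [10, 11, 12, 2, 42]
print(l2g[0])  # ghosted local index 0 -> global 10, i.e. rstart + 0
```

This is why Alfredo's plain local index i only works once shifted by rstart: without a mapping attached, VecSetValue interprets every index as global.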



Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-05 Thread Junchao Zhang
Jacob, is it because the cuda arch is too old?

--Junchao Zhang


On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry  wrote:

> I'm seeing the same thing on latest main with a different machine and
> -sm52 card, cuda 11.8. make check fails with the below, where the indicated
> line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(,
> static_cast(device->deviceId)));   in the initialize function.
>
>
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 2,17c2,46
> <   0 SNES Function norm 2.391552133017e-01
> < 0 KSP Residual norm 2.928487269734e-01
> < 1 KSP Residual norm 1.876489580142e-02
> < 2 KSP Residual norm 3.291394847944e-03
> < 3 KSP Residual norm 2.456493072124e-04
> < 4 KSP Residual norm 1.161647147715e-05
> < 5 KSP Residual norm 1.285648407621e-06
> <   1 SNES Function norm 6.846805706142e-05
> < 0 KSP Residual norm 2.292783790384e-05
> < 1 KSP Residual norm 2.100673631699e-06
> < 2 KSP Residual norm 2.121341386147e-07
> < 3 KSP Residual norm 2.455932678957e-08
> < 4 KSP Residual norm 1.753095730744e-09
> < 5 KSP Residual norm 7.489214418904e-11
> <   2 SNES Function norm 2.103908447865e-10
> < Number of SNES iterations = 2
> ---
> > [0]PETSC ERROR: - Error Message
> --
> > [0]PETSC ERROR: GPU error
> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
> supported
> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used!
> Could be the program crashed before they were used or a spelling mistake,
> etc!
> > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source:
> command line
> > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
> > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source:
> environment
> > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
> source: command line
> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb
>  GIT Date: 2023-01-05 17:22:48 +
> > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry
> Thu Jan  5 17:25:17 2023
> > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
> > [0]PETSC ERROR: #1 initialize() at
> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
> > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/
> cupmcontext.cu:10
> > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
> /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
> > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
> at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
> > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> > [0]PETSC ERROR: #7 GetHandleDispatch_() at
> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
> > [0]PETSC ERROR: #8 create() at
> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
> > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at
> /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
> > [0]PETSC ERROR: #10 VecSetType() at
> /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
> > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at
> /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
> > [0]PETSC ERROR: #12 DMCreateGlobalVector() at
> /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
> > [0]PETSC ERROR: #13 main() at ex19.c:149
>
>
> On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry  wrote:
>
>> I'm trying to compile the cuda example
>>
>> ./config/examples/arch-ci-linux-cuda-double-64idx.py
>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>
>> and running make test passes the test ok
>> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy
>> but the eager variant fails, pasted below.
>>
>> I get a similar error running my client code, pasted after. There when
>> running with -info, it seems that some lazy initialization happens first,
>> and i also call VecCreateSeqCuda which seems to have no issue.
>>
>> Any idea? This happens to be with an -sm 3.5 device if it matters,
>> otherwise it's a recent cuda compiler+driver.
>>
>>
>> petsc test code output:
>>
>>
>>
>> not ok
>> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager #
>> Error code: 97
>> # [0]PETSC ERROR: - Error 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jed Brown
Junchao Zhang  writes:

>> I don't think it's remotely crazy. libCEED supports both together and it's
>> very convenient when testing on a development machine that has one of each
>> brand GPU and simplifies binary distribution for us and every package that
>> uses us. Every day I wish PETSc could build with both simultaneously, but
>> everyone tells me it's silly.
>>
>
> So an executable supports both GPUs,  but a running instance supports one
> or both at the same time?

I personally only have reason to instantiate one at a time within a given 
executable, though libCEED supports both instantiated at the same time.


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Matthew Knepley
On Thu, Jan 5, 2023 at 3:42 PM Jed Brown  wrote:

> Mark Adams  writes:
>
> > Support of HIP and CUDA hardware together would be crazy,
>
> I don't think it's remotely crazy. libCEED supports both together and it's
> very convenient when testing on a development machine that has one of each
> brand GPU and simplifies binary distribution for us and every package that
> uses us. Every day I wish PETSc could build with both simultaneously, but
> everyone tells me it's silly.
>

This is how I always understood our plan. I think it is crazy _not_ to do
this.

   Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-05 Thread Mark Lohry
I'm seeing the same thing on latest main with a different machine and -sm52
card, cuda 11.8. make check fails with the below, where the indicated line
249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(,
static_cast(device->deviceId)));   in the initialize function.


Running check examples to verify correct installation
Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
2,17c2,46
<   0 SNES Function norm 2.391552133017e-01
< 0 KSP Residual norm 2.928487269734e-01
< 1 KSP Residual norm 1.876489580142e-02
< 2 KSP Residual norm 3.291394847944e-03
< 3 KSP Residual norm 2.456493072124e-04
< 4 KSP Residual norm 1.161647147715e-05
< 5 KSP Residual norm 1.285648407621e-06
<   1 SNES Function norm 6.846805706142e-05
< 0 KSP Residual norm 2.292783790384e-05
< 1 KSP Residual norm 2.100673631699e-06
< 2 KSP Residual norm 2.121341386147e-07
< 3 KSP Residual norm 2.455932678957e-08
< 4 KSP Residual norm 1.753095730744e-09
< 5 KSP Residual norm 7.489214418904e-11
<   2 SNES Function norm 2.103908447865e-10
< Number of SNES iterations = 2
---
> [0]PETSC ERROR: - Error Message
--
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
supported
> [0]PETSC ERROR: WARNING! There are option(s) set that were not used!
Could be the program crashed before they were used or a spelling mistake,
etc!
> [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source:
command line
> [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
> [0]PETSC ERROR: Option left: name:-nox_warning (no value) source:
environment
> [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
source: command line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb
 GIT Date: 2023-01-05 17:22:48 +
> [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu
Jan  5 17:25:17 2023
> [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
> [0]PETSC ERROR: #1 initialize() at
/home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
/home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/
cupmcontext.cu:10
> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
/home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
/home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
/home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> [0]PETSC ERROR: #7 GetHandleDispatch_() at
/home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
> [0]PETSC ERROR: #8 create() at
/home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
> [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at
/home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
> [0]PETSC ERROR: #10 VecSetType() at
/home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
> [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at
/home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
> [0]PETSC ERROR: #12 DMCreateGlobalVector() at
/home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
> [0]PETSC ERROR: #13 main() at ex19.c:149


On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry  wrote:

> I'm trying to compile the cuda example
>
> ./config/examples/arch-ci-linux-cuda-double-64idx.py
> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>
> and running make test passes the test ok
> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy
> but the eager variant fails, pasted below.
>
> I get a similar error running my client code, pasted after. There when
> running with -info, it seems that some lazy initialization happens first,
> and i also call VecCreateSeqCuda which seems to have no issue.
>
> Any idea? This happens to be with an -sm 3.5 device if it matters,
> otherwise it's a recent cuda compiler+driver.
>
>
> petsc test code output:
>
>
>
> not ok
> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager #
> Error code: 97
> # [0]PETSC ERROR: - Error Message
> --
> # [0]PETSC ERROR: GPU error
> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
> supported
> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> # [0]PETSC ERROR: Petsc 

Re: [petsc-users] How to install in /usr/lib64 instead of /usr/lib?

2023-01-05 Thread Satish Balay via petsc-users
For now - perhaps the following patch...

Satish

---

diff --git a/config/install.py b/config/install.py
index 017bb736542..00f857f939e 100755
--- a/config/install.py
+++ b/config/install.py
@@ -76,9 +76,9 @@ class Installer(script.Script):
 self.archBinDir= os.path.join(self.rootDir, self.arch, 'bin')
 self.archLibDir= os.path.join(self.rootDir, self.arch, 'lib')
 self.destIncludeDir= os.path.join(self.destDir, 'include')
-self.destConfDir   = os.path.join(self.destDir, 'lib','petsc','conf')
-self.destLibDir= os.path.join(self.destDir, 'lib')
-self.destBinDir= os.path.join(self.destDir, 'lib','petsc','bin')
+self.destConfDir   = os.path.join(self.destDir, 'lib64','petsc','conf')
+self.destLibDir= os.path.join(self.destDir, 'lib64')
+self.destBinDir= os.path.join(self.destDir, 'lib64','petsc','bin')
 self.installIncludeDir = os.path.join(self.installDir, 'include')
 self.installBinDir = os.path.join(self.installDir, 'lib','petsc','bin')
 self.rootShareDir  = os.path.join(self.rootDir, 'share')

On Thu, 5 Jan 2023, Fellype via petsc-users wrote:

> Hi,
> I'm building petsc from sources on a 64-bit Slackware Linux and I would like 
> to know how to install the libraries in /usr/lib64 instead of /usr/lib. Is it 
> possible? I've not found an option like --libdir=DIR to pass to ./configure.
> 
> Regards,
> Fellype
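Assuming Satish's diff is saved as lib64.patch in the PETSc source root, applying it might look like the following sketch (paths and the patch filename are illustrative; the patch must be in place before re-running the install step):

```shell
# apply the patch shown above (saved as lib64.patch), then redo the
# install step so config/install.py uses the lib64 paths
cd "$PETSC_DIR"
git apply lib64.patch        # or: patch -p1 < lib64.patch
make install                 # re-runs config/install.py with the new paths
```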



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Junchao Zhang
On Thu, Jan 5, 2023 at 2:42 PM Jed Brown  wrote:

> Mark Adams  writes:
>
> > Support of HIP and CUDA hardware together would be crazy,
>
> I don't think it's remotely crazy. libCEED supports both together and it's
> very convenient when testing on a development machine that has one of each
> brand GPU and simplifies binary distribution for us and every package that
> uses us. Every day I wish PETSc could build with both simultaneously, but
> everyone tells me it's silly.
>

So an executable supports both GPUs,  but a running instance supports one
or both at the same time?


[petsc-users] How to install in /usr/lib64 instead of /usr/lib?

2023-01-05 Thread Fellype via petsc-users
Hi,
I'm building petsc from sources on a 64-bit Slackware Linux and I would like to 
know how to install the libraries in /usr/lib64 instead of /usr/lib. Is it 
possible? I've not found an option like --libdir=DIR to pass to ./configure.

Regards,
Fellype

[petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

2023-01-05 Thread Mark Lohry
I'm trying to compile the cuda example

./config/examples/arch-ci-linux-cuda-double-64idx.py
--with-cudac=/usr/local/cuda-11.5/bin/nvcc

and running make test passes the test ok
diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy
but the eager variant fails, pasted below.

I get a similar error running my client code, pasted after. There when
running with -info, it seems that some lazy initialization happens first,
and i also call VecCreateSeqCuda which seems to have no issue.

Any idea? This happens to be with an -sm 3.5 device if it matters,
otherwise it's a recent cuda compiler+driver.


petsc test code output:



not ok
sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager #
Error code: 97
# [0]PETSC ERROR: - Error Message
--
# [0]PETSC ERROR: GPU error
# [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
supported
# [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
# [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
# [0]PETSC ERROR: ../ex1 on a  named lancer by mlohry Thu Jan  5 15:22:33
2023
# [0]PETSC ERROR: Configure options
--package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2
--download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g
-O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1
--with-cuda=1 --with-precision=double --with-clanguage=c
--with-cudac=/usr/local/cuda-11.5/bin/nvcc
PETSC_ARCH=arch-ci-linux-cuda-double-64idx
# [0]PETSC ERROR: #1 CUPMAwareMPI_() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
# [0]PETSC ERROR: #2 initialize() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
# [0]PETSC ERROR: #3 init_device_id_() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
# [0]PETSC ERROR: #4 getDevice() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
# [0]PETSC ERROR: #5 PetscDeviceCreate() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
# [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
# [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
# [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
# [0]PETSC ERROR: #9 PetscInitialize_Common() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
# [0]PETSC ERROR: #10 PetscInitialize() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
# [0]PETSC ERROR: #11 main() at
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
# [0]PETSC ERROR: PETSc Option Table entries:
# [0]PETSC ERROR: -default_device_type host
# [0]PETSC ERROR: -device_enable eager
# [0]PETSC ERROR: End of Error Message ---send entire
error message to petsc-ma...@mcs.anl.gov--





solver code output:



[0]  PetscDetermineInitialFPTrap(): Floating point trapping is off by
default 0
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
host available, initializing
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host
initialized, default device id 0, view FALSE, init type lazy
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
cuda available, initializing
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda
initialized, default device id 0, view FALSE, init type lazy
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
hip not available
[0]  PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
sycl not available
[0]  PetscInitialize_Common(): PETSc successfully started: number of
processors = 1
[0]  PetscGetHostName(): Rejecting domainname, likely is NIS
lancer.(none)
[0]  PetscInitialize_Common(): Running on machine: lancer
# [Info] Petsc initialization complete.
# [Trace] Timing: Starting solver...
# [Info] RNG initial conditions have mean 0.04, renormalizing.
# [Trace] Timing: PetscTimeIntegrator initialization...
# [Trace] Timing: Allocating Petsc CUDA arrays...
[0]  PetscCommDuplicate(): Duplicating a communicator 2 3 max tags =
1
[0]  configure(): Configured device 0
[0]  PetscCommDuplicate(): Using internal PETSc 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jed Brown
Mark Adams  writes:

> Support of HIP and CUDA hardware together would be crazy, 

I don't think it's remotely crazy. libCEED supports both together and it's very 
convenient when testing on a development machine that has one of each brand GPU 
and simplifies binary distribution for us and every package that uses us. Every 
day I wish PETSc could build with both simultaneously, but everyone tells me 
it's silly.


Re: [petsc-users] error when trying to compile with HPDDM

2023-01-05 Thread Alfredo Jaramillo
Hi Pierre, no, I don't really need that flag. I removed it and the
installation process went well. I just noticed a "minor" detail when
building SLEPc:

"gmake[3]: warning: jobserver unavailable: using -j1.  Add '+' to parent
make rule"

so the compilation of that library went slow.

Thanks,
Alfredo


On Thu, Jan 5, 2023 at 12:39 PM Pierre Jolivet  wrote:

>
>
> On 5 Jan 2023, at 7:06 PM, Matthew Knepley  wrote:
>
> On Thu, Jan 5, 2023 at 11:36 AM Alfredo Jaramillo <
> ajaramillopa...@gmail.com> wrote:
>
>> Dear developers,
>> I'm trying to compile petsc together with the HPDDM library. A series of
>> errors appeared:
>>
>> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:
>> In static member function ‘static constexpr __float128
>> std::numeric_limits<__float128>::min()’:
>> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:54:57:
>> error: unable to find numeric literal operator ‘operator""Q’
>>54 | static constexpr __float128 min() noexcept { return
>> FLT128_MIN; }
>>
>> I'm attaching the log files to this email.
>>
>
> Pierre,
>
> It looks like we may need to test for FLT_MIN and FLT_MAX in configure
> since it looks like Alfredo's headers do not have them.
> Is this correct?
>
>
> We could do that, but I bet this is a side effect of the fact that Alfredo
> is using --with-cxx-dialect=C++11.
> Alfredo, did you get that flag from someone else’s configure, or do you
> know what that flag is doing?
> - If yes, do you really need to stick to -std=c++11?
> - If no, please look at
> https://gitlab.com/petsc/petsc/-/issues/1284#note_1173803107 and consider
> removing that flag, or at least changing the option to
> --with-cxx-dialect=11. If compilation still fails, please send the
> up-to-date configure.log/make.log
>
> Thanks,
> Pierre
>
>   Thanks,
>
>  Matt
>
>
>> Could you please help me with this?
>>
>> best regards
>> Alfredo
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>
>
>


Re: [petsc-users] error when trying to compile with HPDDM

2023-01-05 Thread Pierre Jolivet


> On 5 Jan 2023, at 7:06 PM, Matthew Knepley  wrote:
> 
> On Thu, Jan 5, 2023 at 11:36 AM Alfredo Jaramillo wrote:
>> Dear developers,
>> I'm trying to compile petsc together with the HPDDM library. A series of 
>> errors appeared:
>> 
>> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:
>>  In static member function ‘static constexpr __float128 
>> std::numeric_limits<__float128>::min()’:
>> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:54:57:
>>  error: unable to find numeric literal operator ‘operator""Q’
>>54 | static constexpr __float128 min() noexcept { return FLT128_MIN; }
>> 
>> I'm attaching the log files to this email.
> 
> Pierre,
> 
> It looks like we may need to test for FLT_MIN and FLT_MAX in configure since 
> it looks like Alfredo's headers do not have them.
> Is this correct?

We could do that, but I bet this is a side effect of the fact that Alfredo is 
using --with-cxx-dialect=C++11.
Alfredo, did you get that flag from someone else’s configure, or do you know 
what that flag is doing?
- If yes, do you really need to stick to -std=c++11?
- If no, please look at 
https://gitlab.com/petsc/petsc/-/issues/1284#note_1173803107 and consider 
removing that flag, or at least changing the option to --with-cxx-dialect=11. 
If compilation still fails, please send the up-to-date configure.log/make.log

Thanks,
Pierre

>   Thanks,
> 
>  Matt
>  
>> Could you please help me with this?
>> 
>> best regards
>> Alfredo
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ 



Re: [petsc-users] error when trying to compile with HPDDM

2023-01-05 Thread Matthew Knepley
On Thu, Jan 5, 2023 at 11:36 AM Alfredo Jaramillo 
wrote:

> Dear developers,
> I'm trying to compile petsc together with the HPDDM library. A series of
> errors appeared:
>
> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:
> In static member function ‘static constexpr __float128
> std::numeric_limits<__float128>::min()’:
> /home/ajaramillo/petsc/x64-openmpi-aldaas2021/include/HPDDM_specifications.hpp:54:57:
> error: unable to find numeric literal operator ‘operator""Q’
>54 | static constexpr __float128 min() noexcept { return
> FLT128_MIN; }
>
> I'm attaching the log files to this email.
>

Pierre,

It looks like we may need to test for FLT_MIN and FLT_MAX in configure
since it looks like Alfredo's headers do not have them.
Is this correct?

  Thanks,

 Matt


> Could you please help me with this?
>
> best regards
> Alfredo
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] setting a vector with VecSetValue versus VecSetValues

2023-01-05 Thread Alfredo Jaramillo
omg... for some reason I was thinking it takes local indices. Yes,
modifying that line, the code works well.

thank you!
Alfredo

On Thu, Jan 5, 2023 at 10:38 AM Junchao Zhang 
wrote:

> VecSetValue() also needs global indices, so you need PetscInt gl_row = (
> PetscInt)(i)+rstart;
>
> --Junchao Zhang
>
>
> On Thu, Jan 5, 2023 at 11:25 AM Alfredo Jaramillo <
> ajaramillopa...@gmail.com> wrote:
>
>> dear PETSc developers,
>>
>> I have a code where I copy an array to a distributed petsc vector with
>> the next lines:
>>
>> 1 for (int i = 0; i < ndof_local; i++) {
>> 2 PetscInt gl_row = (PetscInt)(i)+rstart;
>> 3 PetscScalar val = (PetscScalar)u[i];
>> 4 VecSetValues(x,1,&gl_row,&val,INSERT_VALUES);
>> 5 }
>>
>> // for (int i = 0; i < ndof_local; i++) {
>> // PetscInt gl_row = (PetscInt)(i);
>> // PetscScalar val = (PetscScalar)u[i];
>> // VecSetValue(x,gl_row,val,INSERT_VALUES);
>> // }
>>
>> VecAssemblyBegin(x);
>> VecAssemblyEnd(x);
>>
>> This works as expected. If, instead of using lines 1-5, I use the lines
>> where VecSetValue is used with local indices, then the vector is null on
>> all the processes but rank 0, and the piece of information at rank zero is
>> incorrect.
>>
>> What could I be doing wrong?
>>
>> best regards
>> Alfredo
>>
>


Re: [petsc-users] setting a vector with VecSetValue versus VecSetValues

2023-01-05 Thread Junchao Zhang
VecSetValue() also needs global indices, so you need PetscInt gl_row = (
PetscInt)(i)+rstart;

--Junchao Zhang


On Thu, Jan 5, 2023 at 11:25 AM Alfredo Jaramillo 
wrote:

> dear PETSc developers,
>
> I have a code where I copy an array to a distributed petsc vector with the
> next lines:
>
> 1 for (int i = 0; i < ndof_local; i++) {
> 2 PetscInt gl_row = (PetscInt)(i)+rstart;
> 3 PetscScalar val = (PetscScalar)u[i];
> 4 VecSetValues(x,1,&gl_row,&val,INSERT_VALUES);
> 5 }
>
> // for (int i = 0; i < ndof_local; i++) {
> // PetscInt gl_row = (PetscInt)(i);
> // PetscScalar val = (PetscScalar)u[i];
> // VecSetValue(x,gl_row,val,INSERT_VALUES);
> // }
>
> VecAssemblyBegin(x);
> VecAssemblyEnd(x);
>
> This works as expected. If, instead of using lines 1-5, I use the lines
> where VecSetValue is used with local indices, then the vector is null on
> all the processes but rank 0, and the piece of information at rank zero is
> incorrect.
>
> What could I be doing wrong?
>
> best regards
> Alfredo
>


[petsc-users] setting a vector with VecSetValue versus VecSetValues

2023-01-05 Thread Alfredo Jaramillo
dear PETSc developers,

I have a code where I copy an array to a distributed petsc vector with the
next lines:

1 for (int i = 0; i < ndof_local; i++) {
2 PetscInt gl_row = (PetscInt)(i)+rstart;
3 PetscScalar val = (PetscScalar)u[i];
4 VecSetValues(x,1,&gl_row,&val,INSERT_VALUES);
5 }

// for (int i = 0; i < ndof_local; i++) {
// PetscInt gl_row = (PetscInt)(i);
// PetscScalar val = (PetscScalar)u[i];
// VecSetValue(x,gl_row,val,INSERT_VALUES);
// }

VecAssemblyBegin(x);
VecAssemblyEnd(x);

This works as expected. If, instead of using lines 1-5, I use the lines
where VecSetValue is used with local indices, then the vector is null on
all the processes but rank 0, and the piece of information at rank zero is
incorrect.

What could I be doing wrong?

best regards
Alfredo


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Junchao Zhang
On Thu, Jan 5, 2023 at 10:24 AM Mark Lohry  wrote:

>
> I am thinking something like MatSeqAIJGetArrayAndMemType
>>
>
> Isn't the "MemType" of the matrix an invariant on creation? e.g. a user
> shouldn't care what memtype a pointer is for, just that if a device matrix
> was created it returns device pointers, if a host matrix was created it
> returns host pointers.
>
> Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType,
> isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will
> cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return
> me those raw device pointers?
>
Yeah, I forgot I added it :). On "a user shouldn't care what memtype a
pointer is":  yes if you can, otherwise you can use mtype to differentiate
your code path.


>
>
>
>
> On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang 
> wrote:
>
>>
>>
>> On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry  wrote:
>>
>>>
 A workaround is to let petsc build the matrix and allocate the memory,
 then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.

>>>
>>> Junchao, looking at the code for this it seems to only return a pointer
>>> to the value array, but not pointers to the column and row index arrays, is
>>> that right?
>>>
>> Yes, that is correct.
>> I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const
>> PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
>> which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if
>> A is a device matrix, otherwise (a,i, j) on host and mtype =
>> PETSC_MEMTYPE_HOST.
>> We currently have similar things like
>> VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding
>> MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).
>>
>> It looks like you need (a, i, j) for assembly, but the above function
>> only works for an assembled matrix.
>>
>>
>>>
>>>
>>> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch 
>>> wrote:
>>>
 We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not
 both


 CUPM works with both enabled simultaneously, I don’t think there are
 any direct restrictions for it. Vec at least was fully usable with both
 cuda and hip (though untested) last time I checked.

 Best regards,

 Jacob Faibussowitsch
 (Jacob Fai - booss - oh - vitch)

 On Jan 5, 2023, at 00:09, Junchao Zhang 
 wrote:

 



 On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley 
 wrote:

> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
> wrote:
>
>>
>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>
>>> Oh, is the device backend not known at compile time?
>>>
>> Currently it is known at compile time.
>>
>
> Are you sure? I don't think it is known at compile time.
>
 We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not
 both


>
>   Thanks,
>
>  Matt
>
>
>> Or multiple backends can be alive at once?
>>>
>>
>> Some petsc developers (Jed and Barry) want to support this, but we
>> are incapable now.
>>
>>
>>>
>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
>>> wrote:
>>>


 On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:

> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE 
>> on AMD
>> GPUs, ...
>
>
> Wouldn't one function suffice? Assuming these are contiguous
> arrays in CSR format, they're just raw device pointers in all cases.
>
 But we need to know what device it is (to dispatch to either
 petsc-CUDA or petsc-HIP backend)


>
> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
> junchao.zh...@gmail.com> wrote:
>
>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays()
>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), 
>> but
>> then we would need another for MATMPIAIJCUSPARSE, and then for 
>> HIPSPARSE on
>> AMD GPUs, ...
>>
>> The real problem I think is to deal with multiple MPI ranks.
>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus 
>> is
>> discouraged for users to do so.
>>
>> A workaround is to let petsc build the matrix and allocate the
>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array 
>> and fill
>> it up.
>>
>> We recently added routines to support matrix assembly on GPUs,
>> see if MatSetValuesCOO
>> 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jacob Faibussowitsch
Isn't the "MemType" of the matrix an invariant on creation? e.g. a user shouldn't care what memtype a pointerYes, the user generally only cares host or device. Sometimes internally though we care if it is host or device, and if device, what kind of device. PetscMemtype is a bit overloaded in this sense.if a host matrix was created it returns host pointers.Many matrix “types” allow conversion. So what was once a host matrix can become a device matrix and vice versa. In these cases you’d want to know which pointers are returned.Best regards,Jacob Faibussowitsch(Jacob Fai - booss - oh - vitch)On Jan 5, 2023, at 16:24, Mark Lohry  wrote:I am thinking something like MatSeqAIJGetArrayAndMemTypeIsn't the "MemType" of the matrix an invariant on creation? e.g. a user shouldn't care what memtype a pointer is for, just that if a device matrix was created it returns device pointers, if a host matrix was created it returns host pointers.Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType, isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return me those raw device pointers?On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang  wrote:On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry  wrote:A workaround is to let petsc build the matrix and allocate the 
memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and 
fill it up.Junchao, looking at the code for this it seems to only return a pointer to the value array, but not pointers to the column and row index arrays, is that right?Yes, that is correct. I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype), which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if A is a device matrix, otherwise (a,i, j) on host and mtype = PETSC_MEMTYPE_HOST.We currently have similar things like VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).It looks like you need (a, i, j) for assembly, but the above function only works for an assembled matrix.    On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch  wrote:We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both CUPM works with both enabled simultaneously, I don’t think there are any direct restrictions for it. Vec at least was fully usable with both cuda and hip (though untested) last time I checked. Best regards,Jacob Faibussowitsch(Jacob Fai - booss - oh - vitch)On Jan 5, 2023, at 00:09, Junchao Zhang  wrote:On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley  wrote:On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang  wrote:On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:Oh, is the device backend not known at compile time? Currently it is known at compile time.Are you sure? I don't think it is known at compile time.We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both    Thanks,     Matt Or multiple backends can be alive at once?Some petsc developers (Jed and Barry) want to support this, but we are incapable now. 
On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang  wrote:On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...Wouldn't one function suffice? Assuming these are contiguous arrays in CSR format, they're just raw device pointers in all cases.But we need to know what device it is (to dispatch to either petsc-CUDA or petsc-HIP backend) On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang  wrote:No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...The real problem I think is to deal with multiple MPI ranks. Providing the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged for users to do so.A workaround is to let petsc build the matrix and allocate the memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.We recently added routines to support matrix assembly on GPUs, see if MatSetValuesCOO helps--Junchao ZhangOn Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:I have a sparse matrix constructed in non-petsc code using a standard CSR representation where I compute the Jacobian to be used in an implicit TS context. In the CPU world I callMatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, colidxptr, valptr, Jac);which as I understand it -- (1) never copies/allocates that information, and the matrix Jac is just a non-owning view into 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Mark Lohry
I am thinking something like MatSeqAIJGetArrayAndMemType
>

Isn't the "MemType" of the matrix an invariant on creation? e.g. a user
shouldn't care what memtype a pointer is for, just that if a device matrix
was created it returns device pointers, if a host matrix was created it
returns host pointers.

Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType,
isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will
cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return
me those raw device pointers?





On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang 
wrote:

>
>
> On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry  wrote:
>
>>
>>> A workaround is to let petsc build the matrix and allocate the memory,
>>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>>>
>>
>> Junchao, looking at the code for this it seems to only return a pointer
>> to the value array, but not pointers to the column and row index arrays, is
>> that right?
>>
> Yes, that is correct.
> I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const
> PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
> which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if
> A is a device matrix, otherwise (a,i, j) on host and mtype =
> PETSC_MEMTYPE_HOST.
> We currently have similar things like
> VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding
> MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).
>
> It looks like you need (a, i, j) for assembly, but the above function only
> works for an assembled matrix.
>
>
>>
>>
>> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch 
>> wrote:
>>
>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>>
>>>
>>> CUPM works with both enabled simultaneously, I don’t think there are any
>>> direct restrictions for it. Vec at least was fully usable with both cuda
>>> and hip (though untested) last time I checked.
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>> On Jan 5, 2023, at 00:09, Junchao Zhang  wrote:
>>>
>>> 
>>>
>>>
>>>
>>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley 
>>> wrote:
>>>
 On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
 wrote:

>
> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>
>> Oh, is the device backend not known at compile time?
>>
> Currently it is known at compile time.
>

 Are you sure? I don't think it is known at compile time.

>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>>
>>>

   Thanks,

  Matt


> Or multiple backends can be alive at once?
>>
>
> Some petsc developers (Jed and Barry) want to support this, but we are
> incapable now.
>
>
>>
>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>>>
 Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE 
> on AMD
> GPUs, ...


 Wouldn't one function suffice? Assuming these are contiguous arrays
 in CSR format, they're just raw device pointers in all cases.

>>> But we need to know what device it is (to dispatch to either
>>> petsc-CUDA or petsc-HIP backend)
>>>
>>>

 On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
 junchao.zh...@gmail.com> wrote:

> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
> then we
> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on 
> AMD
> GPUs, ...
>
> The real problem I think is to deal with multiple MPI ranks.
> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
> discouraged for users to do so.
>
> A workaround is to let petsc build the matrix and allocate the
> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array 
> and fill
> it up.
>
> We recently added routines to support matrix assembly on GPUs, see
> if MatSetValuesCOO helps
>
> --Junchao Zhang
>
>
> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry 
> wrote:
>
>> I have a sparse matrix constructed in non-petsc code using a
>> standard CSR representation where I compute the Jacobian to be used 
>> in an
>> implicit TS context. In the CPU world I call
>>
>> 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Junchao Zhang
On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry  wrote:

>
>> A workaround is to let petsc build the matrix and allocate the memory,
>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>>
>
> Junchao, looking at the code for this it seems to only return a pointer to
> the value array, but not pointers to the column and row index arrays, is
> that right?
>
Yes, that is correct.
I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const
PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if
A is a device matrix, otherwise (a,i, j) on host and mtype =
PETSC_MEMTYPE_HOST.
We currently have similar things like
VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding
MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).

It looks like you need (a, i, j) for assembly, but the above function only
works for an assembled matrix.


>
>
> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch 
> wrote:
>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>>
>> CUPM works with both enabled simultaneously, I don’t think there are any
>> direct restrictions for it. Vec at least was fully usable with both cuda
>> and hip (though untested) last time I checked.
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>> On Jan 5, 2023, at 00:09, Junchao Zhang  wrote:
>>
>> 
>>
>>
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley  wrote:
>>
>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
>>> wrote:
>>>

 On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:

> Oh, is the device backend not known at compile time?
>
 Currently it is known at compile time.

>>>
>>> Are you sure? I don't think it is known at compile time.
>>>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>>
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
 Or multiple backends can be alive at once?
>

 Some petsc developers (Jed and Barry) want to support this, but we are
 incapable now.


>
> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
> wrote:
>
>>
>>
>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>>
>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
 we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on 
 AMD
 GPUs, ...
>>>
>>>
>>> Wouldn't one function suffice? Assuming these are contiguous arrays
>>> in CSR format, they're just raw device pointers in all cases.
>>>
>> But we need to know what device it is (to dispatch to either
>> petsc-CUDA or petsc-HIP backend)
>>
>>
>>>
>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
>>> junchao.zh...@gmail.com> wrote:
>>>
 No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
 GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
 then we
 would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
 GPUs, ...

 The real problem I think is to deal with multiple MPI ranks.
 Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
 discouraged for users to do so.

 A workaround is to let petsc build the matrix and allocate the
 memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and 
 fill
 it up.

 We recently added routines to support matrix assembly on GPUs, see
 if MatSetValuesCOO helps

 --Junchao Zhang


 On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:

> I have a sparse matrix constructed in non-petsc code using a
> standard CSR representation where I compute the Jacobian to be used 
> in an
> implicit TS context. In the CPU world I call
>
> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
> rowidxptr, colidxptr, valptr, Jac);
>
> which as I understand it -- (1) never copies/allocates that
> information, and the matrix Jac is just a non-owning view into the 
> already
> allocated CSR, (2) I can write directly into the original data 
> structures
> and the Mat just "knows" about it, although it still needs a call to
> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far 
> this
> works great with GAMG.
>
> I have the same CSR representation filled in GPU data allocated
> with cudaMalloc and filled on-device. Is there an equivalent Mat
> constructor for GPU arrays, or some other way to avoid unnecessary 
> copies?
>
> Thanks,
> 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Mark Lohry
>
>
> A workaround is to let petsc build the matrix and allocate the memory,
> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>

Junchao, looking at the code for this it seems to only return a pointer to
the value array, but not pointers to the column and row index arrays, is
that right?

On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch 
wrote:

> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>
>
> CUPM works with both enabled simultaneously, I don’t think there are any
> direct restrictions for it. Vec at least was fully usable with both cuda
> and hip (though untested) last time I checked.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Jan 5, 2023, at 00:09, Junchao Zhang  wrote:
>
> 
>
>
>
> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley  wrote:
>
>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
>> wrote:
>>
>>>
>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>>
 Oh, is the device backend not known at compile time?

>>> Currently it is known at compile time.
>>>
>>
>> Are you sure? I don't think it is known at compile time.
>>
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>
>
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Or multiple backends can be alive at once?

>>>
>>> Some petsc developers (Jed and Barry) want to support this, but we are
>>> incapable now.
>>>
>>>

 On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
 wrote:

>
>
> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>
>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>> GPUs, ...
>>
>>
>> Wouldn't one function suffice? Assuming these are contiguous arrays
>> in CSR format, they're just raw device pointers in all cases.
>>
> But we need to know what device it is (to dispatch to either
> petsc-CUDA or petsc-HIP backend)
>
>
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang 
>> wrote:
>>
>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
>>> then we
>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>> GPUs, ...
>>>
>>> The real problem I think is to deal with multiple MPI ranks.
>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
>>> discouraged for users to do so.
>>>
>>> A workaround is to let petsc build the matrix and allocate the
>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and 
>>> fill
>>> it up.
>>>
>>> We recently added routines to support matrix assembly on GPUs, see if
>>> MatSetValuesCOO helps
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:
>>>
 I have a sparse matrix constructed in non-petsc code using a
 standard CSR representation where I compute the Jacobian to be used in 
 an
 implicit TS context. In the CPU world I call

 MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
 rowidxptr, colidxptr, valptr, Jac);

 which as I understand it -- (1) never copies/allocates that
 information, and the matrix Jac is just a non-owning view into the 
 already
 allocated CSR, (2) I can write directly into the original data 
 structures
 and the Mat just "knows" about it, although it still needs a call to
 MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
 works great with GAMG.

 I have the same CSR representation filled in GPU data allocated
 with cudaMalloc and filled on-device. Is there an equivalent Mat
 constructor for GPU arrays, or some other way to avoid unnecessary 
 copies?

 Thanks,
 Mark

>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jacob Faibussowitsch
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both

CUPM works with both enabled simultaneously, I don’t think there are any direct restrictions for it. Vec at least was fully usable with both cuda and hip (though untested) last time I checked.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Jan 5, 2023, at 00:09, Junchao Zhang wrote:
> [...]



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Mark Adams
Support of HIP and CUDA hardware together would be crazy, but supporting
(Kokkos) OpenMP and a device backend would require something that looks
like two device back-ends and long term CPUs (eg, Grace) may not go away
wrt kernels.
PETSc does not support OMP well, of course, but that support could grow if
that is where hardware and applications go.


On Wed, Jan 4, 2023 at 7:38 PM Mark Lohry  wrote:

> You have my condolences if you have to support all those things
> simultaneously.
>
> On Wed, Jan 4, 2023, 7:27 PM Matthew Knepley  wrote:
>
>> On Wed, Jan 4, 2023 at 7:22 PM Junchao Zhang 
>> wrote:
>>
>>> We don't have a machine for us to test with both "--with-cuda --with-hip"
>>>
>>
>> Yes, but your answer suggested that the structure of the code prevented
>> this combination.
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley 
>>> wrote:
>>>
 On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang 
 wrote:

> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley 
> wrote:
>
>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
>> wrote:
>>
>>>
>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>>
 Oh, is the device backend not known at compile time?

>>> Currently it is known at compile time.
>>>
>>
>> Are you sure? I don't think it is known at compile time.
>>
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not
> both
>

 Where is the logic for that in the code? This seems like a crazy design.

   Thanks,

 Matt


>   Thanks,
>>
>>  Matt
>>
>>
>>> Or multiple backends can be alive at once?

>>>
>>> Some petsc developers (Jed and Barry) want to support this, but we
>>> are incapable now.
>>>
>>>

 On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
 wrote:

>
>
> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry 
> wrote:
>
>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but
>>> then we would need another for MATMPIAIJCUSPARSE, and then for 
>>> HIPSPARSE on
>>> AMD GPUs, ...
>>
>>
>> Wouldn't one function suffice? Assuming these are contiguous
>> arrays in CSR format, they're just raw device pointers in all cases.
>>
> But we need to know what device it is (to dispatch to either
> petsc-CUDA or petsc-HIP backend)
>
>
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
>> junchao.zh...@gmail.com> wrote:
>>
>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays()
>>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), 
>>> but
>>> then we would need another for MATMPIAIJCUSPARSE, and then for 
>>> HIPSPARSE on
>>> AMD GPUs, ...
>>>
>>> The real problem I think is to deal with multiple MPI ranks.
>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus 
>>> is
>>> discouraged for users to do so.
>>>
>>> A workaround is to let petsc build the matrix and allocate the
>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array 
>>> and fill
>>> it up.
>>>
>>> We recently added routines to support matrix assembly on GPUs,
>>> see if MatSetValuesCOO helps
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry 
>>> wrote:
>>>
 I have a sparse matrix constructed in non-petsc code using a
 standard CSR representation where I compute the Jacobian to be 
 used in an
 implicit TS context. In the CPU world I call

 MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
 rowidxptr, colidxptr, valptr, Jac);

 which as I understand it -- (1) never copies/allocates that
 information, and the matrix Jac is just a non-owning view into the 
 already
 allocated CSR, (2) I can write directly into the original data 
 structures
 and the Mat just "knows" about it, although it still needs a call 
 to
 MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far 
 this
 works great with GAMG.

 I have the same CSR representation filled in GPU data allocated
 with cudaMalloc and filled on-device. Is there an equivalent Mat
 constructor for GPU arrays, or some other way to avoid unnecessary 
 copies?

 Thanks,