Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-07 Thread Junchao Zhang
I see.  Thanks a lot.
--Junchao Zhang



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-07 Thread Mark Lohry
I've worked on a few different codes doing matrix assembly on the GPU
independently of petsc. In all instances, to plug into petsc all I need are
the device CSR pointers and some guarantee that they don't move around (on
my first try without SetPreallocation on the CPU, I saw the value array
pointer move after the first solve). It would also be nice to have a
guarantee there aren't any unnecessary copies, since memory constraints are
always a concern.

Here I call
MatCreateSeqAIJCUSPARSE
MatSeqAIJSetPreallocationCSR (filled using a preexisting CSR on host using
the correct index arrays and zeros for values)
MatSeqAIJGetCSRAndMemType (grab the allocated device CSR pointers and use
those directly)

Then in the Jacobian evaluation routine I fill that CSR directly with no
calls to MatSetValues, just

MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);

afterward to put it in the correct state.

In this code to fill the CSR coefficients, each GPU thread gets one row and
fills it. No race conditions to contend with. Technically I'm duplicating
some computations (a given dof could fill its own row and column) but this
is much faster than the linear solver anyway.

Other mesh based codes did GPU assembly using either coloring or mutexes,
but still just need the CSR value array to fill.


On Fri, Jan 6, 2023, 9:44 PM Junchao Zhang  wrote:

> Hi, Mark, could you say a few words about how you assemble matrices on
> GPUs?  We ported MatSetValues like routines to GPUs but did not continue
> this approach since we have to resolve data races between GPU threads.

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-06 Thread Junchao Zhang
On Fri, Jan 6, 2023 at 7:35 PM Mark Lohry  wrote:

> Many thanks Junchao -- with combinations of SetPreallocation I was able to
> grab allocated pointers out of petsc. Now I have all the jacobian
> construction on device with no copies.
>
Hi, Mark, could you say a few words about how you assemble matrices on
GPUs?  We ported MatSetValues like routines to GPUs but did not continue
this approach since we have to resolve data races between GPU threads.


>
> On Fri, Jan 6, 2023 at 12:27 AM Barry Smith  wrote:
>
>>
>>   So Jed's "everyone" now consists of "no one" and Jed can stop
>> complaining that "everyone" thinks it is a bad idea.
>>
>>
>>
>> On Jan 5, 2023, at 11:50 PM, Junchao Zhang 
>> wrote:
>>
>>
>>
>>
>> On Thu, Jan 5, 2023 at 10:32 PM Barry Smith  wrote:
>>
>>>
>>>
>>> > On Jan 5, 2023, at 3:42 PM, Jed Brown  wrote:
>>> >
>>> > Mark Adams  writes:
>>> >
>>> >> Support of HIP and CUDA hardware together would be crazy,
>>> >
>>> > I don't think it's remotely crazy. libCEED supports both together and
>>> it's very convenient when testing on a development machine that has one of
>>> each brand GPU and simplifies binary distribution for us and every package
>>> that uses us. Every day I wish PETSc could build with both simultaneously,
>>> but everyone tells me it's silly.
>>>
>>>   Not everyone at all; just a subset of everyone. Junchao is really the
>>> hold-out :-)
>>>
>> I am not, instead I think we should try (I fully agree it can ease binary
>> distribution).  But satish needs to install such a machine first :)
>> There are issues out of our control if we want to mix GPUs in execution.
>> For example, how to do VexAXPY on a cuda vector and a hip vector? Shall we
>> do it on the host? Also, there are no gpu-aware MPI implementations
>> supporting messages between cuda memory and hip memory.
>>
>>>
>>>   I just don't care about "binary packages" :-); I think they are an
>>> archaic and bad way of thinking about code distribution (but yes the
>>> alternatives need lots of work to make them flawless, but I think that is
>>> where the work should go in the packaging world.)
>>>
>>>I go further and think one should be able to automatically use a CUDA
>>> vector on a HIP device as well, it is not hard in theory but requires
>>> thinking about how we handle classes and subclasses a little to make it
>>> straightforward; or perhaps Jacob has fixed that also?
>>
>>
>>
>>
>>


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-06 Thread Mark Lohry
Well, I think it's a moderately crazy idea unless it's less painful to
implement than I'm thinking. Is there a use case for a mixed device system
where one petsc executable might be addressing both a HIP and CUDA device
beyond some Frankenstein test system somebody cooked up? In all my code I
implicitly assume I either have one host with one device or one host with
zero devices. I guess you can support these weird scenarios, but why?
Life is hard enough supporting one device compiler with one host compiler.

Many thanks Junchao -- with combinations of SetPreallocation I was able to
grab allocated pointers out of petsc. Now I have all the Jacobian
construction on device with no copies.



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Barry Smith

  So Jed's "everyone" now consists of "no one" and Jed can stop complaining 
that "everyone" thinks it is a bad idea.






Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Junchao Zhang
On Thu, Jan 5, 2023 at 10:32 PM Barry Smith  wrote:

>
>   Not everyone at all; just a subset of everyone. Junchao is really the
> hold-out :-)
>
I am not; instead, I think we should try (I fully agree it can ease binary
distribution). But Satish needs to install such a machine first :)
There are issues out of our control if we want to mix GPUs in execution.
For example, how do we do VecAXPY on a CUDA vector and a HIP vector? Shall
we do it on the host? Also, there are no GPU-aware MPI implementations
supporting messages between CUDA memory and HIP memory.



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Barry Smith



> On Jan 5, 2023, at 3:42 PM, Jed Brown  wrote:
> 
> Mark Adams  writes:
> 
>> Support of HIP and CUDA hardware together would be crazy, 
> 
> I don't think it's remotely crazy. libCEED supports both together and it's 
> very convenient when testing on a development machine that has one of each 
> brand GPU and simplifies binary distribution for us and every package that 
> uses us. Every day I wish PETSc could build with both simultaneously, but 
> everyone tells me it's silly.

  Not everyone at all; just a subset of everyone. Junchao is really the 
hold-out :-)

  I just don't care about "binary packages" :-); I think they are an archaic
and bad way of thinking about code distribution (yes, the alternatives need
lots of work to make them flawless, but I think that is where the work
should go in the packaging world).

   I go further and think one should be able to automatically use a CUDA
vector on a HIP device as well; it is not hard in theory, but it requires
thinking about how we handle classes and subclasses a little to make it
straightforward -- or perhaps Jacob has fixed that also?



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jed Brown
Junchao Zhang  writes:

> So an executable supports both GPUs,  but a running instance supports one
> or both at the same time?

I personally only have reason to instantiate one at a time within a given 
executable, though libCEED supports both instantiated at the same time.


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Matthew Knepley
On Thu, Jan 5, 2023 at 3:42 PM Jed Brown  wrote:

> Mark Adams  writes:
>
> > Support of HIP and CUDA hardware together would be crazy,
>
> I don't think it's remotely crazy. libCEED supports both together and it's
> very convenient when testing on a development machine that has one of each
> brand GPU and simplifies binary distribution for us and every package that
> uses us. Every day I wish PETSc could build with both simultaneously, but
> everyone tells me it's silly.
>

This is how I always understood our plan. I think it is crazy _not_ to do
this.

   Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Junchao Zhang
On Thu, Jan 5, 2023 at 2:42 PM Jed Brown  wrote:

> Mark Adams  writes:
>
> > Support of HIP and CUDA hardware together would be crazy,
>
> I don't think it's remotely crazy. libCEED supports both together and it's
> very convenient when testing on a development machine that has one of each
> brand GPU and simplifies binary distribution for us and every package that
> uses us. Every day I wish PETSc could build with both simultaneously, but
> everyone tells me it's silly.
>

So an executable supports both GPUs,  but a running instance supports one
or both at the same time?


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jed Brown
Mark Adams  writes:

> Support of HIP and CUDA hardware together would be crazy, 

I don't think it's remotely crazy. libCEED supports both together and it's very 
convenient when testing on a development machine that has one of each brand GPU 
and simplifies binary distribution for us and every package that uses us. Every 
day I wish PETSc could build with both simultaneously, but everyone tells me 
it's silly.


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Junchao Zhang
On Thu, Jan 5, 2023 at 10:24 AM Mark Lohry  wrote:

>
>> I am thinking something like MatSeqAIJGetArrayAndMemType
>
> Isn't the "MemType" of the matrix an invariant on creation? e.g. a user
> shouldn't care what memtype a pointer is for, just that if a device matrix
> was created it returns device pointers, if a host matrix was created it
> returns host pointers.
>
> Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType,
> isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will
> cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return
> me those raw device pointers?
>
Yeah, I forgot I added it :). On "a user shouldn't care what memtype a
pointer is": yes, if you can; otherwise you can use mtype to differentiate
your code path.


>
>
>
>
> On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang 
> wrote:
>
>>
>>
>> On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry  wrote:
>>
>>>
 A workaround is to let petsc build the matrix and allocate the memory,
 then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.

>>>
>>> Junchao, looking at the code for this it seems to only return a pointer
>>> to the value array, but not pointers to the column and row index arrays, is
>>> that right?
>>>
>> Yes, that is correct.
>> I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const
>> PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
>> which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if
>> A is a device matrix, otherwise (a,i, j) on host and mtype =
>> PETSC_MEMTYPE_HOST.
>> We currently have similar things like
>> VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding
>> MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).
>>
>> It looks like you need (a, i, j) for assembly, but the above function
>> only works for an assembled matrix.
>>
>>
>>>
>>>
>>> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch 
>>> wrote:
>>>
 We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not
 both


 CUPM works with both enabled simultaneously, I don’t think there are
 any direct restrictions for it. Vec at least was fully usable with both
 cuda and hip (though untested) last time I checked.

 Best regards,

 Jacob Faibussowitsch
 (Jacob Fai - booss - oh - vitch)

 On Jan 5, 2023, at 00:09, Junchao Zhang 
 wrote:

 



 On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley 
 wrote:

> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
> wrote:
>
>>
>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>
>>> Oh, is the device backend not known at compile time?
>>>
>> Currently it is known at compile time.
>>
>
> Are you sure? I don't think it is known at compile time.
>
 We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not
 both


>
>   Thanks,
>
>  Matt
>
>
>> Or multiple backends can be alive at once?
>>>
>>
>> Some petsc developers (Jed and Barry) want to support this, but we
>> are incapable now.
>>
>>
>>>
>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
>>> wrote:
>>>


 On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:

> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE 
>> on AMD
>> GPUs, ...
>
>
> Wouldn't one function suffice? Assuming these are contiguous
> arrays in CSR format, they're just raw device pointers in all cases.
>
 But we need to know what device it is (to dispatch to either
 petsc-CUDA or petsc-HIP backend)


>
> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
> junchao.zh...@gmail.com> wrote:
>
>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays()
>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), 
>> but
>> then we would need another for MATMPIAIJCUSPARSE, and then for 
>> HIPSPARSE on
>> AMD GPUs, ...
>>
>> The real problem I think is to deal with multiple MPI ranks.
>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus 
>> is
>> discouraged for users to do so.
>>
>> A workaround is to let petsc build the matrix and allocate the
>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array 
>> and fill
>> it up.
>>
>> We recently added routines to support matrix assembly on GPUs;
>> see if MatSetValuesCOO helps.

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jacob Faibussowitsch

> Isn't the "MemType" of the matrix an invariant on creation? e.g. a user
> shouldn't care what memtype a pointer is for

Yes, the user generally only cares host or device. Sometimes internally,
though, we care whether it is host or device, and if device, what kind of
device. PetscMemtype is a bit overloaded in this sense.

> if a host matrix was created it returns host pointers.

Many matrix "types" allow conversion. So what was once a host matrix can
become a device matrix and vice versa. In these cases you'd want to know
which pointers are returned.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

On Jan 5, 2023, at 16:24, Mark Lohry  wrote: [...]

On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:

> I have a sparse matrix constructed in non-petsc code using a standard CSR
> representation where I compute the Jacobian to be used in an implicit TS
> context. In the CPU world I call
>
> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
> colidxptr, valptr, Jac);
>
> which as I understand it -- (1) never copies/allocates that information,
> and the matrix Jac is just a non-owning view into

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Mark Lohry
> I am thinking something like MatSeqAIJGetArrayAndMemType

Isn't the "MemType" of the matrix an invariant on creation? e.g. a user
shouldn't care what memtype a pointer is for, just that if a device matrix
was created it returns device pointers, if a host matrix was created it
returns host pointers.

Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType,
isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will
cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return
me those raw device pointers?
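For context, the flow being asked about might look like the sketch below. This is not a compilable program, just a condensed version of the call sequence discussed in this thread; the preallocation step and the fill kernel are placeholders for user code.

```
/* Sketch only: PETSc calls as discussed in this thread, not a
   self-contained program; the fill kernel is user-supplied. */
Mat             J;
const PetscInt *i, *j;
PetscScalar    *a;
PetscMemType    mtype;

MatCreateSeqAIJCUSPARSE(PETSC_COMM_SELF, nrows, ncols, 0, nnz_per_row, &J);
/* ... preallocate with the known CSR sparsity pattern ... */
MatSeqAIJGetCSRAndMemType(J, &i, &j, &a, &mtype);
/* For a CUSPARSE matrix, (i, j, a) should be device pointers that a
   user kernel can fill in place, e.g. one row per GPU thread. */
MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);
```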





On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang 
wrote:

>
>
> On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry  wrote:
>
>>
>>> A workaround is to let petsc build the matrix and allocate the memory,
>>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>>>
>>
>> Junchao, looking at the code for this it seems to only return a pointer
>> to the value array, but not pointers to the column and row index arrays, is
>> that right?
>>
> Yes, that is correct.
> I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const
> PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
> which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if
> A is a device matrix, otherwise (a, i, j) on host and mtype =
> PETSC_MEMTYPE_HOST.
> We currently have similar things like
> VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding
> MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).
>
> It looks like you need (a, i, j) for assembly, but the above function only
> works for an assembled matrix.
>
>
>>
>>
>> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch 
>> wrote:
>>
>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>>
>>>
>>> CUPM works with both enabled simultaneously, I don’t think there are any
>>> direct restrictions for it. Vec at least was fully usable with both cuda
>>> and hip (though untested) last time I checked.
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>> On Jan 5, 2023, at 00:09, Junchao Zhang  wrote:
>>>
>>> 
>>>
>>>
>>>
>>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley 
>>> wrote:
>>>
 On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
 wrote:

>
> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>
>> Oh, is the device backend not known at compile time?
>>
> Currently it is known at compile time.
>

 Are you sure? I don't think it is known at compile time.

>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>>
>>>

   Thanks,

  Matt


> Or multiple backends can be alive at once?
>>
>
> Some petsc developers (Jed and Barry) want to support this, but we are
> incapable now.
>
>
>>
>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>>>
 Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE 
> on AMD
> GPUs, ...


 Wouldn't one function suffice? Assuming these are contiguous arrays
 in CSR format, they're just raw device pointers in all cases.

>>> But we need to know what device it is (to dispatch to either
>>> petsc-CUDA or petsc-HIP backend)
>>>
>>>

 On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
 junchao.zh...@gmail.com> wrote:

> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
> then we
> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on 
> AMD
> GPUs, ...
>
> The real problem I think is to deal with multiple MPI ranks.
> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
> discouraged for users to do so.
>
> A workaround is to let petsc build the matrix and allocate the
> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array 
> and fill
> it up.
>
> We recently added routines to support matrix assembly on GPUs, see
> if MatSetValuesCOO helps
>
> --Junchao Zhang
>
>
> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry 
> wrote:
>
>> I have a sparse matrix constructed in non-petsc code using a
>> standard CSR representation where I compute the Jacobian to be used 
>> in an
>> implicit TS context. In the CPU world I call
>>
>> 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Junchao Zhang
On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry  wrote:

>
>> A workaround is to let petsc build the matrix and allocate the memory,
>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>>
>
> Junchao, looking at the code for this it seems to only return a pointer to
> the value array, but not pointers to the column and row index arrays, is
> that right?
>
Yes, that is correct.
I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const
PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if
A is a device matrix, otherwise (a, i, j) on host and mtype =
PETSC_MEMTYPE_HOST.
We currently have similar things like
VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding
MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).

It looks like you need (a, i, j) for assembly, but the above function only
works for an assembled matrix.


>
>
> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch 
> wrote:
>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>>
>> CUPM works with both enabled simultaneously, I don’t think there are any
>> direct restrictions for it. Vec at least was fully usable with both cuda
>> and hip (though untested) last time I checked.
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>> On Jan 5, 2023, at 00:09, Junchao Zhang  wrote:
>>
>> 
>>
>>
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley  wrote:
>>
>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
>>> wrote:
>>>

 On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:

> Oh, is the device backend not known at compile time?
>
 Currently it is known at compile time.

>>>
>>> Are you sure? I don't think it is known at compile time.
>>>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>>
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
 Or multiple backends can be alive at once?
>

 Some petsc developers (Jed and Barry) want to support this, but we are
 incapable now.


>
> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
> wrote:
>
>>
>>
>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>>
>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
 we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on 
 AMD
 GPUs, ...
>>>
>>>
>>> Wouldn't one function suffice? Assuming these are contiguous arrays
>>> in CSR format, they're just raw device pointers in all cases.
>>>
>> But we need to know what device it is (to dispatch to either
>> petsc-CUDA or petsc-HIP backend)
>>
>>
>>>
>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
>>> junchao.zh...@gmail.com> wrote:
>>>
 No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
 GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
 then we
 would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
 GPUs, ...

 The real problem I think is to deal with multiple MPI ranks.
 Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
 discouraged for users to do so.

 A workaround is to let petsc build the matrix and allocate the
 memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and 
 fill
 it up.

 We recently added routines to support matrix assembly on GPUs, see
 if MatSetValuesCOO helps

 --Junchao Zhang


 On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:

> I have a sparse matrix constructed in non-petsc code using a
> standard CSR representation where I compute the Jacobian to be used 
> in an
> implicit TS context. In the CPU world I call
>
> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
> rowidxptr, colidxptr, valptr, Jac);
>
> which as I understand it -- (1) never copies/allocates that
> information, and the matrix Jac is just a non-owning view into the 
> already
> allocated CSR, (2) I can write directly into the original data 
> structures
> and the Mat just "knows" about it, although it still needs a call to
> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far 
> this
> works great with GAMG.
>
> I have the same CSR representation filled in GPU data allocated
> with cudaMalloc and filled on-device. Is there an equivalent Mat
> constructor for GPU arrays, or some other way to avoid unnecessary 
> copies?
>
> Thanks,
> 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Mark Lohry
>
>
> A workaround is to let petsc build the matrix and allocate the memory,
> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>

Junchao, looking at the code for this it seems to only return a pointer to
the value array, but not pointers to the column and row index arrays, is
that right?

On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch 
wrote:

> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>
>
> CUPM works with both enabled simultaneously, I don’t think there are any
> direct restrictions for it. Vec at least was fully usable with both cuda
> and hip (though untested) last time I checked.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Jan 5, 2023, at 00:09, Junchao Zhang  wrote:
>
> 
>
>
>
> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley  wrote:
>
>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
>> wrote:
>>
>>>
>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>>
 Oh, is the device backend not known at compile time?

>>> Currently it is known at compile time.
>>>
>>
>> Are you sure? I don't think it is known at compile time.
>>
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>
>
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Or multiple backends can be alive at once?

>>>
>>> Some petsc developers (Jed and Barry) want to support this, but we are
>>> incapable now.
>>>
>>>

 On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
 wrote:

>
>
> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>
>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>> GPUs, ...
>>
>>
>> Wouldn't one function suffice? Assuming these are contiguous arrays
>> in CSR format, they're just raw device pointers in all cases.
>>
> But we need to know what device it is (to dispatch to either
> petsc-CUDA or petsc-HIP backend)
>
>
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang 
>> wrote:
>>
>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
>>> then we
>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>> GPUs, ...
>>>
>>> The real problem I think is to deal with multiple MPI ranks.
>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
>>> discouraged for users to do so.
>>>
>>> A workaround is to let petsc build the matrix and allocate the
>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and 
>>> fill
>>> it up.
>>>
>>> We recently added routines to support matrix assembly on GPUs, see if
>>>  MatSetValuesCOO helps
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:
>>>
 I have a sparse matrix constructed in non-petsc code using a
 standard CSR representation where I compute the Jacobian to be used in 
 an
 implicit TS context. In the CPU world I call

 MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
 rowidxptr, colidxptr, valptr, Jac);

 which as I understand it -- (1) never copies/allocates that
 information, and the matrix Jac is just a non-owning view into the 
 already
 allocated CSR, (2) I can write directly into the original data 
 structures
 and the Mat just "knows" about it, although it still needs a call to
 MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
 works great with GAMG.

 I have the same CSR representation filled in GPU data allocated
 with cudaMalloc and filled on-device. Is there an equivalent Mat
 constructor for GPU arrays, or some other way to avoid unnecessary 
 copies?

 Thanks,
 Mark

>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jacob Faibussowitsch
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both

CUPM works with both enabled simultaneously, I don’t think there are any
direct restrictions for it. Vec at least was fully usable with both cuda
and hip (though untested) last time I checked.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

On Jan 5, 2023, at 00:09, Junchao Zhang  wrote:





-- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.-- Norbert Wienerhttps://www.cse.buffalo.edu/~knepley/



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Mark Adams
Support of HIP and CUDA hardware together would be crazy, but supporting
(Kokkos) OpenMP plus a device backend would require something that looks
like two device back-ends, and long term CPUs (e.g., Grace) may not go
away as far as kernels are concerned.
PETSc does not support OMP well, of course, but that support could grow if
that is where hardware and applications go.


On Wed, Jan 4, 2023 at 7:38 PM Mark Lohry  wrote:

> You have my condolences if you have to support all those things
> simultaneously.
>
> On Wed, Jan 4, 2023, 7:27 PM Matthew Knepley  wrote:
>
>> On Wed, Jan 4, 2023 at 7:22 PM Junchao Zhang 
>> wrote:
>>
>>> We don't have a machine for us to test with both "--with-cuda --with-hip"
>>>
>>
>> Yes, but your answer suggested that the structure of the code prevented
>> this combination.
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley 
>>> wrote:
>>>
 On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang 
 wrote:

> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley 
> wrote:
>
>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
>> wrote:
>>
>>>
>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>>
 Oh, is the device backend not known at compile time?

>>> Currently it is known at compile time.
>>>
>>
>> Are you sure? I don't think it is known at compile time.
>>
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not
> both
>

 Where is the logic for that in the code? This seems like a crazy design.

   Thanks,

 Matt


>   Thanks,
>>
>>  Matt
>>
>>
>>> Or multiple backends can be alive at once?

>>>
>>> Some petsc developers (Jed and Barry) want to support this, but we
>>> are incapable now.
>>>
>>>

 On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
 wrote:

>
>
> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry 
> wrote:
>
>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but
>>> then we would need another for MATMPIAIJCUSPARSE, and then for 
>>> HIPSPARSE on
>>> AMD GPUs, ...
>>
>>
>> Wouldn't one function suffice? Assuming these are contiguous
>> arrays in CSR format, they're just raw device pointers in all cases.
>>
> But we need to know what device it is (to dispatch to either
> petsc-CUDA or petsc-HIP backend)
>
>
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
>> junchao.zh...@gmail.com> wrote:
>>
>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays()
>>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), 
>>> but
>>> then we would need another for MATMPIAIJCUSPARSE, and then for 
>>> HIPSPARSE on
>>> AMD GPUs, ...
>>>
>>> The real problem I think is to deal with multiple MPI ranks.
>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus 
>>> is
>>> discouraged for users to do so.
>>>
>>> A workaround is to let petsc build the matrix and allocate the
>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array 
>>> and fill
>>> it up.
>>>
>>> We recently added routines to support matrix assembly on GPUs,
>>> see if MatSetValuesCOO helps
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry 
>>> wrote:
>>>
 I have a sparse matrix constructed in non-petsc code using a
 standard CSR representation where I compute the Jacobian to be 
 used in an
 implicit TS context. In the CPU world I call

 MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
 rowidxptr, colidxptr, valptr, Jac);

 which as I understand it -- (1) never copies/allocates that
 information, and the matrix Jac is just a non-owning view into the 
 already
 allocated CSR, (2) I can write directly into the original data 
 structures
 and the Mat just "knows" about it, although it still needs a call 
 to
 MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far 
 this
 works great with GAMG.

 I have the same CSR representation filled in GPU data allocated
 with cudaMalloc and filled on-device. Is there an equivalent Mat
 constructor for GPU arrays, or some other way to avoid unnecessary 
 copies?

 Thanks,

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Mark Lohry
You have my condolences if you have to support all those things
simultaneously.

On Wed, Jan 4, 2023, 7:27 PM Matthew Knepley  wrote:

> On Wed, Jan 4, 2023 at 7:22 PM Junchao Zhang 
> wrote:
>
>> We don't have a machine for us to test with both "--with-cuda --with-hip"
>>
>
> Yes, but your answer suggested that the structure of the code prevented
> this combination.
>
>   Thanks,
>
>  Matt
>
>
>> --Junchao Zhang
>>
>>
>> On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley  wrote:
>>
>>> On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang 
>>> wrote:
>>>
 On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley 
 wrote:

> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
> wrote:
>
>>
>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>
>>> Oh, is the device backend not known at compile time?
>>>
>> Currently it is known at compile time.
>>
>
> Are you sure? I don't think it is known at compile time.
>
 We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not
 both

>>>
>>> Where is the logic for that in the code? This seems like a crazy design.
>>>
>>>   Thanks,
>>>
>>> Matt
>>>
>>>
   Thanks,
>
>  Matt
>
>
>> Or multiple backends can be alive at once?
>>>
>>
>> Some petsc developers (Jed and Barry) want to support this, but we
>> are incapable now.
>>
>>
>>>
>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
>>> wrote:
>>>


 On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:

> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE 
>> on AMD
>> GPUs, ...
>
>
> Wouldn't one function suffice? Assuming these are contiguous
> arrays in CSR format, they're just raw device pointers in all cases.
>
 But we need to know what device it is (to dispatch to either
 petsc-CUDA or petsc-HIP backend)


>
> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
> junchao.zh...@gmail.com> wrote:
>
>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays()
>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), 
>> but
>> then we would need another for MATMPIAIJCUSPARSE, and then for 
>> HIPSPARSE on
>> AMD GPUs, ...
>>
>> The real problem I think is to deal with multiple MPI ranks.
>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus 
>> is
>> discouraged for users to do so.
>>
>> A workaround is to let petsc build the matrix and allocate the
>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array 
>> and fill
>> it up.
>>
>> We recently added routines to support matrix assembly on GPUs,
>> see if MatSetValuesCOO helps
>>
>> --Junchao Zhang
>>
>>
>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry 
>> wrote:
>>
>>> I have a sparse matrix constructed in non-petsc code using a
>>> standard CSR representation where I compute the Jacobian to be used 
>>> in an
>>> implicit TS context. In the CPU world I call
>>>
>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
>>> rowidxptr, colidxptr, valptr, Jac);
>>>
>>> which as I understand it -- (1) never copies/allocates that
>>> information, and the matrix Jac is just a non-owning view into the 
>>> already
>>> allocated CSR, (2) I can write directly into the original data 
>>> structures
>>> and the Mat just "knows" about it, although it still needs a call to
>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far 
>>> this
>>> works great with GAMG.
>>>
>>> I have the same CSR representation filled in GPU data allocated
>>> with cudaMalloc and filled on-device. Is there an equivalent Mat
>>> constructor for GPU arrays, or some other way to avoid unnecessary 
>>> copies?
>>>
>>> Thanks,
>>> Mark
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>

>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Matthew Knepley
On Wed, Jan 4, 2023 at 7:22 PM Junchao Zhang 
wrote:

> We don't have a machine for us to test with both "--with-cuda --with-hip"
>

Yes, but your answer suggested that the structure of the code prevented
this combination.

  Thanks,

 Matt


> --Junchao Zhang
>
>
> On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley  wrote:
>
>> On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang 
>> wrote:
>>
>>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley 
>>> wrote:
>>>
 On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
 wrote:

>
> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>
>> Oh, is the device backend not known at compile time?
>>
> Currently it is known at compile time.
>

 Are you sure? I don't think it is known at compile time.

>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>>
>>
>> Where is the logic for that in the code? This seems like a crazy design.
>>
>>   Thanks,
>>
>> Matt
>>
>>
>>>   Thanks,

  Matt


> Or multiple backends can be alive at once?
>>
>
> Some petsc developers (Jed and Barry) want to support this, but we are
> incapable now.
>
>
>>
>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>>>
 Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE 
> on AMD
> GPUs, ...


 Wouldn't one function suffice? Assuming these are contiguous arrays
 in CSR format, they're just raw device pointers in all cases.

>>> But we need to know what device it is (to dispatch to either
>>> petsc-CUDA or petsc-HIP backend)
>>>
>>>

 On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
 junchao.zh...@gmail.com> wrote:

> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
> then we
> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on 
> AMD
> GPUs, ...
>
> The real problem I think is to deal with multiple MPI ranks.
> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
> discouraged for users to do so.
>
> A workaround is to let petsc build the matrix and allocate the
> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array 
> and fill
> it up.
>
> We recently added routines to support matrix assembly on GPUs, see
> if MatSetValuesCOO helps
>
> --Junchao Zhang
>
>
> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry 
> wrote:
>
>> I have a sparse matrix constructed in non-petsc code using a
>> standard CSR representation where I compute the Jacobian to be used 
>> in an
>> implicit TS context. In the CPU world I call
>>
>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
>> rowidxptr, colidxptr, valptr, Jac);
>>
>> which as I understand it -- (1) never copies/allocates that
>> information, and the matrix Jac is just a non-owning view into the 
>> already
>> allocated CSR, (2) I can write directly into the original data 
>> structures
>> and the Mat just "knows" about it, although it still needs a call to
>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far 
>> this
>> works great with GAMG.
>>
>> I have the same CSR representation filled in GPU data allocated
>> with cudaMalloc and filled on-device. Is there an equivalent Mat
>> constructor for GPU arrays, or some other way to avoid unnecessary 
>> copies?
>>
>> Thanks,
>> Mark
>>
>

 --
 What most experimenters take for granted before they begin their
 experiments is infinitely more interesting than any results to which their
 experiments lead.
 -- Norbert Wiener

 https://www.cse.buffalo.edu/~knepley/
 

>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 

Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Junchao Zhang
We don't have a machine for us to test with both "--with-cuda --with-hip"

--Junchao Zhang


On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley  wrote:

> On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang 
> wrote:
>
>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley  wrote:
>>
>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
>>> wrote:
>>>

 On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:

> Oh, is the device backend not known at compile time?
>
 Currently it is known at compile time.

>>>
>>> Are you sure? I don't think it is known at compile time.
>>>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>
> Where is the logic for that in the code? This seems like a crazy design.
>
>   Thanks,
>
> Matt
>
>
>>   Thanks,
>>>
>>>  Matt
>>>
>>>
 Or multiple backends can be alive at once?
>

 Some petsc developers (Jed and Barry) want to support this, but we are
 incapable now.


>
> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
> wrote:
>
>>
>>
>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>>
>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
 we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on 
 AMD
 GPUs, ...
>>>
>>>
>>> Wouldn't one function suffice? Assuming these are contiguous arrays
>>> in CSR format, they're just raw device pointers in all cases.
>>>
>> But we need to know what device it is (to dispatch to either
>> petsc-CUDA or petsc-HIP backend)
>>
>>
>>>
>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
>>> junchao.zh...@gmail.com> wrote:
>>>
 No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
 GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
 then we
 would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
 GPUs, ...

 The real problem I think is to deal with multiple MPI ranks.
 Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
 discouraged for users to do so.

 A workaround is to let petsc build the matrix and allocate the
 memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and 
 fill
 it up.

 We recently added routines to support matrix assembly on GPUs, see
 if MatSetValuesCOO helps

 --Junchao Zhang


 On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:

> I have a sparse matrix constructed in non-petsc code using a
> standard CSR representation where I compute the Jacobian to be used 
> in an
> implicit TS context. In the CPU world I call
>
> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
> rowidxptr, colidxptr, valptr, Jac);
>
> which as I understand it -- (1) never copies/allocates that
> information, and the matrix Jac is just a non-owning view into the 
> already
> allocated CSR, (2) I can write directly into the original data 
> structures
> and the Mat just "knows" about it, although it still needs a call to
> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far 
> this
> works great with GAMG.
>
> I have the same CSR representation filled in GPU data allocated
> with cudaMalloc and filled on-device. Is there an equivalent Mat
> constructor for GPU arrays, or some other way to avoid unnecessary 
> copies?
>
> Thanks,
> Mark
>

>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> 
>>>


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Matthew Knepley
On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang 
wrote:

> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley  wrote:
>
>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
>> wrote:
>>
>>>
>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>>
 Oh, is the device backend not known at compile time?

>>> Currently it is known at compile time.
>>>
>>
>> Are you sure? I don't think it is known at compile time.
>>
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>

Where is the logic for that in the code? This seems like a crazy design.

  Thanks,

Matt


>   Thanks,
>>
>>  Matt
>>
>>
>>> Or multiple backends can be alive at once?

>>>
>>> Some petsc developers (Jed and Barry) want to support this, but we are
>>> incapable now.
>>>
>>>

 On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
 wrote:

>
>
> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>
>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>> GPUs, ...
>>
>>
>> Wouldn't one function suffice? Assuming these are contiguous arrays
>> in CSR format, they're just raw device pointers in all cases.
>>
> But we need to know what device it is (to dispatch to either
> petsc-CUDA or petsc-HIP backend)
>
>
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang 
>> wrote:
>>
>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but 
>>> then we
>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>> GPUs, ...
>>>
>>> The real problem I think is to deal with multiple MPI ranks.
>>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
>>> discouraged for users to do so.
>>>
>>> A workaround is to let petsc build the matrix and allocate the
>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and 
>>> fill
>>> it up.
>>>
>>> We recently added routines to support matrix assembly on GPUs, see if
>>>  MatSetValuesCOO
>>> 
>>>  helps
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:
>>>
 I have a sparse matrix constructed in non-petsc code using a
 standard CSR representation where I compute the Jacobian to be used in 
 an
 implicit TS context. In the CPU world I call

 MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
 rowidxptr, colidxptr, valptr, Jac);

 which as I understand it -- (1) never copies/allocates that
 information, and the matrix Jac is just a non-owning view into the 
 already
 allocated CSR, (2) I can write directly into the original data 
 structures
 and the Mat just "knows" about it, although it still needs a call to
 MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
 works great with GAMG.

 I have the same CSR representation filled in GPU data allocated
 with cudaMalloc and filled on-device. Is there an equivalent Mat
 constructor for GPU arrays, or some other way to avoid unnecessary 
 copies?

 Thanks,
 Mark

>>>
>>



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Junchao Zhang
On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley  wrote:

> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
> wrote:
>
>>
>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>>
>>> Oh, is the device backend not known at compile time?
>>>
>> Currently it is known at compile time.
>>
>
> Are you sure? I don't think it is known at compile time.
>
We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP (or neither), but never both.


>
>   Thanks,
>
>  Matt
>
>
>> Or multiple backends can be alive at once?
>>>
>>
>> Some petsc developers (Jed and Barry) want to support this, but we are
>> incapable now.
>>
>>
>>>
>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
>>> wrote:
>>>


 On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:

> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>> GPUs, ...
>
>
> Wouldn't one function suffice? Assuming these are contiguous arrays in
> CSR format, they're just raw device pointers in all cases.
>
 But we need to know what device it is (to dispatch to either petsc-CUDA
 or petsc-HIP backend)


>
> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang 
> wrote:
>
>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then 
>> we
>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>> GPUs, ...
>>
>> The real problem I think is to deal with multiple MPI ranks.
>> Providing the split arrays for petsc MATMPIAIJ is not easy and thus is
>> discouraged for users to do so.
>>
>> A workaround is to let petsc build the matrix and allocate the
>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and 
>> fill
>> it up.
>>
>> We recently added routines to support matrix assembly on GPUs, see if
>>  MatSetValuesCOO
>> 
>>  helps
>>
>> --Junchao Zhang
>>
>>
>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:
>>
>>> I have a sparse matrix constructed in non-petsc code using a
>>> standard CSR representation where I compute the Jacobian to be used in 
>>> an
>>> implicit TS context. In the CPU world I call
>>>
>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
>>> colidxptr, valptr, Jac);
>>>
>>> which as I understand it -- (1) never copies/allocates that
>>> information, and the matrix Jac is just a non-owning view into the 
>>> already
>>> allocated CSR, (2) I can write directly into the original data 
>>> structures
>>> and the Mat just "knows" about it, although it still needs a call to
>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
>>> works great with GAMG.
>>>
>>> I have the same CSR representation filled in GPU data allocated with
>>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor 
>>> for
>>> GPU arrays, or some other way to avoid unnecessary copies?
>>>
>>> Thanks,
>>> Mark
>>>
>>
>


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Matthew Knepley
On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang 
wrote:

>
> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:
>
>> Oh, is the device backend not known at compile time?
>>
> Currently it is known at compile time.
>

Are you sure? I don't think it is known at compile time.

  Thanks,

 Matt


> Or multiple backends can be alive at once?
>>
>
> Some petsc developers (Jed and Barry) want to support this, but we are
> incapable now.
>
>
>>
>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>>>
 Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
> GPUs, ...


 Wouldn't one function suffice? Assuming these are contiguous arrays in
 CSR format, they're just raw device pointers in all cases.

>>> But we need to know what device it is (to dispatch to either petsc-CUDA
>>> or petsc-HIP backend)
>>>
>>>

 On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang 
 wrote:

> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then 
> we
> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
> GPUs, ...
>
> The real problem I think is to deal with multiple MPI ranks. Providing
> the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged
> for users to do so.
>
> A workaround is to let petsc build the matrix and allocate the memory,
> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>
> We recently added routines to support matrix assembly on GPUs, see if
> MatSetValuesCOO
> 
>  helps
>
> --Junchao Zhang
>
>
> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:
>
>> I have a sparse matrix constructed in non-petsc code using a standard
>> CSR representation where I compute the Jacobian to be used in an implicit
>> TS context. In the CPU world I call
>>
>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
>> colidxptr, valptr, Jac);
>>
>> which as I understand it -- (1) never copies/allocates that
>> information, and the matrix Jac is just a non-owning view into the 
>> already
>> allocated CSR, (2) I can write directly into the original data structures
>> and the Mat just "knows" about it, although it still needs a call to
>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
>> works great with GAMG.
>>
>> I have the same CSR representation filled in GPU data allocated with
>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor 
>> for
>> GPU arrays, or some other way to avoid unnecessary copies?
>>
>> Thanks,
>> Mark
>>
>



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Junchao Zhang
On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry  wrote:

> Oh, is the device backend not known at compile time?
>
Currently it is known at compile time.

Or multiple backends can be alive at once?
>

Some petsc developers (Jed and Barry) want to support this, but we are
not capable of it yet.


>
> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang 
> wrote:
>
>>
>>
>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:
>>
>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
 would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
 GPUs, ...
>>>
>>>
>>> Wouldn't one function suffice? Assuming these are contiguous arrays in
>>> CSR format, they're just raw device pointers in all cases.
>>>
>> But we need to know what device it is (to dispatch to either petsc-CUDA
>> or petsc-HIP backend)
>>
>>
>>>
>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang 
>>> wrote:
>>>
 No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
 GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
 would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
 GPUs, ...

 The real problem I think is to deal with multiple MPI ranks. Providing
 the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged
 for users to do so.

 A workaround is to let petsc build the matrix and allocate the memory,
 then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.

 We recently added routines to support matrix assembly on GPUs, see if
 MatSetValuesCOO
  helps

 --Junchao Zhang


 On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:

> I have a sparse matrix constructed in non-petsc code using a standard
> CSR representation where I compute the Jacobian to be used in an implicit
> TS context. In the CPU world I call
>
> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
> colidxptr, valptr, Jac);
>
> which as I understand it -- (1) never copies/allocates that
> information, and the matrix Jac is just a non-owning view into the already
> allocated CSR, (2) I can write directly into the original data structures
> and the Mat just "knows" about it, although it still needs a call to
> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
> works great with GAMG.
>
> I have the same CSR representation filled in GPU data allocated with
> cudaMalloc and filled on-device. Is there an equivalent Mat constructor 
> for
> GPU arrays, or some other way to avoid unnecessary copies?
>
> Thanks,
> Mark
>



Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Junchao Zhang
On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry  wrote:

> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>> GPUs, ...
>
>
> Wouldn't one function suffice? Assuming these are contiguous arrays in CSR
> format, they're just raw device pointers in all cases.
>
But we need to know what device it is (to dispatch to either petsc-CUDA or
petsc-HIP backend)


>
> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang 
> wrote:
>
>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs.
>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would
>> need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...
>>
>> The real problem I think is to deal with multiple MPI ranks. Providing
>> the split arrays for petsc MATMPIAIJ is not easy and thus is discouraged
>> for users to do so.
>>
>> A workaround is to let petsc build the matrix and allocate the memory,
>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>>
>> We recently added routines to support matrix assembly on GPUs, see if
>> MatSetValuesCOO
>>  helps
>>
>> --Junchao Zhang
>>
>>
>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:
>>
>>> I have a sparse matrix constructed in non-petsc code using a standard
>>> CSR representation where I compute the Jacobian to be used in an implicit
>>> TS context. In the CPU world I call
>>>
>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
>>> colidxptr, valptr, Jac);
>>>
>>> which as I understand it -- (1) never copies/allocates that information,
>>> and the matrix Jac is just a non-owning view into the already allocated
>>> CSR, (2) I can write directly into the original data structures and the Mat
>>> just "knows" about it, although it still needs a call to
>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
>>> works great with GAMG.
>>>
>>> I have the same CSR representation filled in GPU data allocated with
>>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor for
>>> GPU arrays, or some other way to avoid unnecessary copies?
>>>
>>> Thanks,
>>> Mark
>>>
>>


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Mark Lohry
>
> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
> GPUs, ...


Wouldn't one function suffice? Assuming these are contiguous arrays in CSR
format, they're just raw device pointers in all cases.

On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang 
wrote:

> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs.
> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would
> need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...
>
> The real problem I think is to deal with multiple MPI ranks. Providing the
> split arrays for petsc MATMPIAIJ is not easy and thus is discouraged for
> users to do so.
>
> A workaround is to let petsc build the matrix and allocate the memory,
> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>
> We recently added routines to support matrix assembly on GPUs, see if
> MatSetValuesCOO
>  helps
>
> --Junchao Zhang
>
>
> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:
>
>> I have a sparse matrix constructed in non-petsc code using a standard CSR
>> representation where I compute the Jacobian to be used in an implicit TS
>> context. In the CPU world I call
>>
>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
>> colidxptr, valptr, Jac);
>>
>> which as I understand it -- (1) never copies/allocates that information,
>> and the matrix Jac is just a non-owning view into the already allocated
>> CSR, (2) I can write directly into the original data structures and the Mat
>> just "knows" about it, although it still needs a call to
>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
>> works great with GAMG.
>>
>> I have the same CSR representation filled in GPU data allocated with
>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor for
>> GPU arrays, or some other way to avoid unnecessary copies?
>>
>> Thanks,
>> Mark
>>
>


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Junchao Zhang
No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs.
Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would
need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...

The real problem, I think, is dealing with multiple MPI ranks. Providing the
split arrays for petsc MATMPIAIJ is not easy, so users are discouraged from
doing so.

A workaround is to let petsc build the matrix and allocate the memory, then
you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.

We recently added routines to support matrix assembly on GPUs; see if
MatSetValuesCOO helps.

--Junchao Zhang


On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry  wrote:

> I have a sparse matrix constructed in non-petsc code using a standard CSR
> representation where I compute the Jacobian to be used in an implicit TS
> context. In the CPU world I call
>
> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
> colidxptr, valptr, Jac);
>
> which as I understand it -- (1) never copies/allocates that information,
> and the matrix Jac is just a non-owning view into the already allocated
> CSR, (2) I can write directly into the original data structures and the Mat
> just "knows" about it, although it still needs a call to
> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
> works great with GAMG.
>
> I have the same CSR representation filled in GPU data allocated with
> cudaMalloc and filled on-device. Is there an equivalent Mat constructor for
> GPU arrays, or some other way to avoid unnecessary copies?
>
> Thanks,
> Mark
>


[petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-04 Thread Mark Lohry
I have a sparse matrix constructed in non-petsc code using a standard CSR
representation where I compute the Jacobian to be used in an implicit TS
context. In the CPU world I call

MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
colidxptr, valptr, Jac);

which, as I understand it, (1) never copies or allocates that information,
so the matrix Jac is just a non-owning view into the already allocated
CSR, and (2) lets me write directly into the original data structures and the
Mat just "knows" about it, although it still needs a call to
MatAssemblyBegin/MatAssemblyEnd after the values are modified. So far this
works great with GAMG.

I have the same CSR representation filled in GPU data allocated with
cudaMalloc and filled on-device. Is there an equivalent Mat constructor for
GPU arrays, or some other way to avoid unnecessary copies?

Thanks,
Mark