Re: [petsc-users] [External] Re: MatVec on GPUs

Swarnava Ghosh Tue, 19 Oct 2021 19:01:48 -0700

Thanks, Matt!

Sincerely,
SG


On Tue, Oct 19, 2021 at 9:34 PM Matthew Knepley <[email protected]> wrote:

> On Tue, Oct 19, 2021 at 9:18 PM Swarnava Ghosh <[email protected]>
> wrote:
>
>> Thank you Junchao! Is it possible to determine how much time is being
>> spent on data transfer from the CPU mem to the GPU mem from the log?
>>
>
> It looks like
>
> VecCUDACopyTo        891 1.1 1.5322e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    842 6.23e+01    0 0.00e+00  0
>
> VecCUDACopyFrom      891 1.1 1.5837e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00  842 6.23e+01  0
>
> MatCUSPARSCopyTo     891 1.1 1.5229e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    842 1.93e+03    0 0.00e+00  0
>
>   Thanks,
>
>      Matt
>
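The three rows Matt points to can be turned into a rough estimate with a little arithmetic. The sketch below is only an illustration: it hard-codes the "Time (sec) Max" values of VecCUDACopyTo, VecCUDACopyFrom and MatCUSPARSCopyTo and the total run time from the log in this thread. The names CopyEvent, total_copy_time and percent_of_runtime are made up for the example and are not part of PETSc. It also assumes the three events do not overlap in time, which the log itself does not guarantee.

```c
#include <stddef.h>

/* One copy-related row of the -log_view event table (hypothetical struct). */
typedef struct {
    const char *event;   /* event name as printed in the log */
    double max_time_s;   /* "Time (sec) Max" column, in seconds */
} CopyEvent;

/* Values copied from the log quoted in this thread. */
static const CopyEvent copy_events[] = {
    { "VecCUDACopyTo",    1.5322e-02 },
    { "VecCUDACopyFrom",  1.5837e-02 },
    { "MatCUSPARSCopyTo", 1.5229e-01 },
};
static const size_t n_copy_events = sizeof copy_events / sizeof copy_events[0];

/* "Time (sec)" from the performance summary header of the same log. */
static const double total_runtime_s = 1.172e+02;

/* Sum the max time of every listed copy event (assumes no overlap). */
static double total_copy_time(const CopyEvent *events, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += events[i].max_time_s;
    }
    return sum;
}

/* Express a time as a percentage of the whole run. */
static double percent_of_runtime(double part_s, double total_s)
{
    return 100.0 * part_s / total_s;
}
```

With these numbers, total_copy_time(copy_events, n_copy_events) comes to about 0.183 s, and percent_of_runtime puts that at roughly 0.16% of the 117.2 s run.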
>
>>
>> ************************************************************************************************************************
>>
>> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
>> -fCourier9' to print this document            ***
>>
>>
>> ************************************************************************************************************************
>>
>>
>> ---------------------------------------------- PETSc Performance Summary:
>> ----------------------------------------------
>>
>>
>> /ccsopen/home/swarnava/MiniApp_xl_cu/bin/sq on a  named h49n15 with 4
>> processors, by swarnava Tue Oct 19 21:10:56 2021
>>
>> Using Petsc Release Version 3.15.0, Mar 30, 2021
>>
>>
>>                          Max       Max/Min     Avg       Total
>>
>> Time (sec):           1.172e+02     1.000   1.172e+02
>>
>> Objects:              1.160e+02     1.000   1.160e+02
>>
>> Flop:                 5.832e+10     1.125   5.508e+10  2.203e+11
>>
>> Flop/sec:             4.974e+08     1.125   4.698e+08  1.879e+09
>>
>> MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
>>
>> MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
>>
>> MPI Reductions:       1.320e+02     1.000
>>
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>>
>>                             e.g., VecAXPY() for real vectors of length N
>> --> 2N flop
>>
>>                             and VecAXPY() for complex vectors of length
>> N --> 8N flop
>>
>>
>> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages
>> ---  -- Message Lengths --  -- Reductions --
>>
>>                         Avg     %Total     Avg     %Total    Count   %Total
>>     Avg         %Total    Count   %Total
>>
>>  0:      Main Stage: 1.1725e+02 100.0%  2.2033e+11 100.0%  0.000e+00
>> 0.0%  0.000e+00        0.0%  1.140e+02  86.4%
>>
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>>
>> Phase summary info:
>>
>>    Count: number of times phase was executed
>>
>>    Time and Flop: Max - maximum over all processors
>>
>>                   Ratio - ratio of maximum to minimum over all processors
>>
>>    Mess: number of messages sent
>>
>>    AvgLen: average message length (bytes)
>>
>>    Reduct: number of global reductions
>>
>>    Global: entire computation
>>
>>    Stage: stages of a computation. Set stages with PetscLogStagePush()
>> and PetscLogStagePop().
>>
>>       %T - percent time in this phase         %F - percent flop in this
>> phase
>>
>>       %M - percent messages in this phase     %L - percent message
>> lengths in this phase
>>
>>       %R - percent reductions in this phase
>>
>>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time
>> over all processors)
>>
>>    GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max
>> GPU time over all processors)
>>
>>    CpuToGpu Count: total number of CPU to GPU copies per processor
>>
>>    CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per
>> processor)
>>
>>    GpuToCpu Count: total number of GPU to CPU copies per processor
>>
>>    GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per
>> processor)
>>
>>    GPU %F: percent flops on GPU in this event
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> BuildTwoSided          2 1.0 6.2501e-03 145.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  2   0  0  0  0  2     0       0      0 0.00e+00    0 0.00e+00  0
>> BuildTwoSidedF         2 1.0 6.2628e-03 123.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  2   0  0  0  0  2     0       0      0 0.00e+00    0 0.00e+00  0
>> VecDot             89991 1.1 3.4663e+00 1.2 1.67e+09 1.1 0.0e+00 0.0e+00 0.0e+00  3  3  0  0  0   3  3  0  0  0  1816    1841      0 0.00e+00 84992 6.80e-01 100
>> VecNorm            89991 1.1 5.5282e+00 1.2 1.67e+09 1.1 0.0e+00 0.0e+00 0.0e+00  4  3  0  0  0   4  3  0  0  0  1139    1148      0 0.00e+00 84992 6.80e-01 100
>> VecScale           89991 1.1 1.3902e+00 1.2 8.33e+08 1.1 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2265    2343   84992 6.80e-01    0 0.00e+00 100
>> VecCopy           178201 1.1 2.9825e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> VecSet              3589 1.1 1.0195e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> VecAXPY           179091 1.1 2.7456e+00 1.2 3.32e+09 1.1 0.0e+00 0.0e+00 0.0e+00  2  6  0  0  0   2  6  0  0  0  4564    4739   169142 1.35e+00    0 0.00e+00 100
>> VecCUDACopyTo        891 1.1 1.5322e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    842 6.23e+01    0 0.00e+00  0
>> VecCUDACopyFrom      891 1.1 1.5837e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00  842 6.23e+01  0
>> DMCreateMat            5 1.0 7.3491e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  1  0  0  0  5   1  0  0  0  6     0       0      0 0.00e+00    0 0.00e+00  0
>> SFSetGraph             5 1.0 3.5016e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> MatMult            89991 1.1 2.0423e+00 1.2 5.08e+10 1.1 0.0e+00 0.0e+00 0.0e+00  2 87  0  0  0   2 87  0  0  0 94039   105680   1683 2.00e+03    0 0.00e+00 100
>> MatCopy              891 1.1 1.3600e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> MatConvert             2 1.0 1.0489e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> MatScale               2 1.0 2.7950e-04 1.3 3.18e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4530       0      0 0.00e+00    0 0.00e+00  0
>> MatAssemblyBegin       7 1.0 6.3768e-03 68.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  2   0  0  0  0  2     0       0      0 0.00e+00    0 0.00e+00  0
>> MatAssemblyEnd         7 1.0 7.9870e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  3   0  0  0  0  4     0       0      0 0.00e+00    0 0.00e+00  0
>> MatCUSPARSCopyTo     891 1.1 1.5229e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    842 1.93e+03    0 0.00e+00  0
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>>
>> Object Type          Creations   Destructions     Memory  Descendants'
>> Mem.
>>
>> Reports information only for process 0.
>>
>>
>> --- Event Stage 0: Main Stage
>>
>>
>>               Vector    69             11        19112     0.
>>
>>     Distributed Mesh     3              0            0     0.
>>
>>            Index Set    12             10       187512     0.
>>
>>    IS L to G Mapping     3              0            0     0.
>>
>>    Star Forest Graph    11              0            0     0.
>>
>>      Discrete System     3              0            0     0.
>>
>>            Weak Form     3              0            0     0.
>>
>>    Application Order     1              0            0     0.
>>
>>               Matrix     8              0            0     0.
>>
>>        Krylov Solver     1              0            0     0.
>>
>>       Preconditioner     1              0            0     0.
>>
>>               Viewer     1              0            0     0.
>>
>>
>> ========================================================================================================================
>>
>> Average time to get PetscTime(): 4.32e-08
>>
>> Average time for MPI_Barrier(): 9.94e-07
>>
>> Average time for zero size MPI_Send(): 4.20135e-05
>>
>>
>> Sincerely,
>>
>> SG
>>
>> On Tue, Oct 19, 2021 at 12:28 AM Junchao Zhang <[email protected]>
>> wrote:
>>
>>>
>>>
>>>
>>> On Mon, Oct 18, 2021 at 10:56 PM Swarnava Ghosh <[email protected]>
>>> wrote:
>>>
>>>> I am trying to port parts of the following function to GPUs.
>>>> Essentially, the lines of code between the two "TODO..." comments should
>>>> be executed on the device. Here is the function:
>>>>
>>>> PetscScalar CalculateSpectralNodesAndWeights(LSDFT_OBJ *pLsdft, int p,
>>>> int LIp)
>>>> {
>>>>
>>>>   PetscInt N_qp;
>>>>   N_qp = pLsdft->N_qp;
>>>>
>>>>   int k;
>>>>   PetscScalar *a, *b;
>>>>   k=0;
>>>>
>>>>   PetscMalloc(sizeof(PetscScalar)*(N_qp+1), &a);
>>>>   PetscMalloc(sizeof(PetscScalar)*(N_qp+1), &b);
>>>>
>>>>   /*
>>>>    * TODO: COPY a, b, pLsdft->Vk, pLsdft->Vkm1, pLsdft->Vkp1,
>>>> pLsdft->LapPlusVeffOprloc, k,p,N_qp from HOST to DEVICE
>>>>    * DO THE FOLLOWING OPERATIONS ON DEVICE
>>>>    */
>>>>
>>>>   //zero out vectors
>>>>   VecZeroEntries(pLsdft->Vk);
>>>>   VecZeroEntries(pLsdft->Vkm1);
>>>>   VecZeroEntries(pLsdft->Vkp1);
>>>>
>>>>   VecSetValue(pLsdft->Vkm1, p, 1.0, INSERT_VALUES);
>>>>   MatMult(pLsdft->LapPlusVeffOprloc,pLsdft->Vkm1,pLsdft->Vk);
>>>>   VecDot(pLsdft->Vkm1, pLsdft->Vk, &a[0]);
>>>>   VecAXPY(pLsdft->Vk, -a[0], pLsdft->Vkm1);
>>>>   VecNorm(pLsdft->Vk, NORM_2, &b[0]);
>>>>   VecScale(pLsdft->Vk, 1.0 / b[0]);
>>>>
>>>>   for (k = 0; k < N_qp; k++) {
>>>>     MatMult(pLsdft->LapPlusVeffOprloc,pLsdft->Vk,pLsdft->Vkp1);
>>>>     VecDot(pLsdft->Vk, pLsdft->Vkp1, &a[k + 1]);
>>>>     VecAXPY(pLsdft->Vkp1, -a[k + 1], pLsdft->Vk);
>>>>     VecAXPY(pLsdft->Vkp1, -b[k], pLsdft->Vkm1);
>>>>     VecCopy(pLsdft->Vk, pLsdft->Vkm1);
>>>>     VecNorm(pLsdft->Vkp1, NORM_2, &b[k + 1]);
>>>>     VecCopy(pLsdft->Vkp1, pLsdft->Vk);
>>>>     VecScale(pLsdft->Vk, 1.0 / b[k + 1]);
>>>>   }
>>>>
>>>>   /*
>>>>    * TODO: Copy back a, b, pLsdft->Vk, pLsdft->Vkm1, pLsdft->Vkp1,
>>>> pLsdft->LapPlusVeffOprloc, k,p,N_qp from DEVICE to HOST
>>>>    */
>>>>
>>>>   /*
>>>>    * Some operation with a, and b on HOST
>>>>    *
>>>>    */
>>>>   TridiagEigenVecSolve_NodesAndWeights(pLsdft, a, b, N_qp, LIp);  // operation on the host
>>>>
>>>>   // free a,b
>>>>   PetscFree(a);
>>>>   PetscFree(b);
>>>>
>>>>   return 0;
>>>> }
>>>>
>>>> If I just use the command line options to set vectors Vk,Vkp1 and Vkm1
>>>> as cuda vectors and the matrix  LapPlusVeffOprloc as aijcusparse, will the
>>>> lines of code between the two "TODO" comments be entirely executed on the
>>>> device?
>>>>
>>> Yes, except VecSetValue(pLsdft->Vkm1, p, 1.0, INSERT_VALUES), which is
>>> done on the CPU by pulling the vector data down from the GPU and setting
>>> the value there. Subsequent vector operations will push the updated vector
>>> data to the GPU again.
>>>
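Junchao's description amounts to lazy synchronization between a host copy and a device copy of the data. The toy model below sketches that idea in plain C. It is not PETSc's implementation: ToyVec, Validity and all toy_* functions are hypothetical, the "device" array merely stands in for GPU memory, and the two counters play the role of the CpuToGpu and GpuToCpu Count columns in the log.

```c
#include <stddef.h>
#include <string.h>

#define TOY_N 4

/* Which copy of the data is currently up to date (toy stand-in). */
typedef enum { VALID_HOST, VALID_DEVICE, VALID_BOTH } Validity;

typedef struct {
    double host[TOY_N];
    double device[TOY_N];        /* stands in for GPU memory */
    Validity valid;
    int host_to_device_copies;   /* analogous to "CpuToGpu Count" */
    int device_to_host_copies;   /* analogous to "GpuToCpu Count" */
} ToyVec;

static void toy_init(ToyVec *v)
{
    memset(v, 0, sizeof *v);     /* all entries zero, counters zero */
    v->valid = VALID_BOTH;       /* both copies agree */
}

/* Copy device -> host only if the host copy is stale. */
static void toy_sync_to_host(ToyVec *v)
{
    if (v->valid == VALID_DEVICE) {
        memcpy(v->host, v->device, sizeof v->host);
        v->device_to_host_copies++;
        v->valid = VALID_BOTH;
    }
}

/* Copy host -> device only if the device copy is stale. */
static void toy_sync_to_device(ToyVec *v)
{
    if (v->valid == VALID_HOST) {
        memcpy(v->device, v->host, sizeof v->device);
        v->host_to_device_copies++;
        v->valid = VALID_BOTH;
    }
}

/* Host-side write, in the role VecSetValue plays in the thread. */
static void toy_set_value(ToyVec *v, size_t i, double x)
{
    toy_sync_to_host(v);         /* pull the data down first */
    v->host[i] = x;
    v->valid = VALID_HOST;       /* device copy is now stale */
}

/* Device-side operation, in the role of VecScale on a CUDA vector. */
static void toy_scale_on_device(ToyVec *v, double alpha)
{
    toy_sync_to_device(v);       /* push pending host changes first */
    for (size_t i = 0; i < TOY_N; i++) {
        v->device[i] *= alpha;
    }
    v->valid = VALID_DEVICE;     /* host copy is now stale */
}

/* Host-side read of one entry. */
static double toy_get_value(ToyVec *v, size_t i)
{
    toy_sync_to_host(v);
    return v->host[i];
}
```

In this model a host-side write between two device operations costs one device-to-host and one host-to-device copy, while back-to-back device operations cost none, which is why the answer singles out VecSetValue.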
>>>
>>>>
>>>> Sincerely,
>>>> Swarnava
>>>>
>>>>
>>>> On Mon, Oct 18, 2021 at 10:13 PM Swarnava Ghosh <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the clarification, Junchao.
>>>>>
>>>>> Sincerely,
>>>>> Swarnava
>>>>>
>>>>> On Mon, Oct 18, 2021 at 10:08 PM Junchao Zhang <
>>>>> [email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 18, 2021 at 8:47 PM Swarnava Ghosh <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Junchao,
>>>>>>>
>>>>>>> If I want to pass command line options as  -mymat_mat_type
>>>>>>> aijcusparse, should it be MatSetOptionsPrefix(A,"mymat"); or
>>>>>>> MatSetOptionsPrefix(A,"mymat_"); ? Could you please clarify?
>>>>>>>
>>>>>> My fault, it should be MatSetOptionsPrefix(A,"mymat_"), as seen in
>>>>>> mat/tests/ex62.c.
>>>>>> Thanks
>>>>>>
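The trailing underscore matters because the prefix is joined directly to the option name, with only the leading hyphen added. The helper below is a hypothetical illustration of that concatenation in plain C. It is not a PETSc function, and the name prefixed_option is invented for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "-<prefix><name>" in out.
 * Returns 0 on success, -1 if the result does not fit in outlen bytes.
 */
static int prefixed_option(char *out, size_t outlen,
                           const char *prefix, const char *name)
{
    int n = snprintf(out, outlen, "-%s%s", prefix ? prefix : "", name);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Under this rule the prefix "mymat_" with the option name "mat_type" yields -mymat_mat_type, whereas "mymat" yields -mymatmat_type, which would not match what is typed on the command line in this thread.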
>>>>>>
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Swarnava
>>>>>>>
>>>>>>> On Mon, Oct 18, 2021 at 9:23 PM Junchao Zhang <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> MatSetOptionsPrefix(A,"mymat")
>>>>>>>> VecSetOptionsPrefix(v,"myvec")
>>>>>>>>
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 18, 2021 at 8:04 PM Chang Liu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Junchao,
>>>>>>>>>
>>>>>>>>> Thank you for your answer. I tried MatConvert and it works. It did
>>>>>>>>> not work before because I had forgotten to convert a vector from mpi
>>>>>>>>> to mpicuda.
>>>>>>>>>
>>>>>>>>> For vectors there is no VecConvert, so I have to use VecDuplicate,
>>>>>>>>> VecSetType and VecCopy. Is there an easier option?
>>>>>>>>>
>>>>>>>> As Matt suggested, you could single out the matrix and vector with an
>>>>>>>> options prefix and set their types on the command line:
>>>>>>>>
>>>>>>>> MatSetOptionsPrefix(A,"mymat");
>>>>>>>> VecSetOptionsPrefix(v,"myvec");
>>>>>>>>
>>>>>>>> Then, -mymat_mat_type aijcusparse -myvec_vec_type cuda
>>>>>>>>
>>>>>>>> A simpler approach is to let MatCreateVecs(A,&v,NULL) set the vector
>>>>>>>> type automatically.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Chang
>>>>>>>>>
>>>>>>>>> On 10/18/21 5:23 PM, Junchao Zhang wrote:
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Mon, Oct 18, 2021 at 3:42 PM Chang Liu via petsc-users
>>>>>>>>> > <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> >     Hi Matt,
>>>>>>>>> >
>>>>>>>>> >     I have a related question. In my code I have many matrices and
>>>>>>>>> >     I want only one of them to live on the GPU, with the others
>>>>>>>>> >     staying in CPU memory.
>>>>>>>>> >
>>>>>>>>> >     I wonder if there is an easier way to copy an mpiaij matrix to
>>>>>>>>> >     mpiaijcusparse (in other words, copy the data to the GPU). I can
>>>>>>>>> >     think of creating a new mpiaijcusparse matrix and copying the
>>>>>>>>> >     data line by line, but I wonder if there is a better option.
>>>>>>>>> >
>>>>>>>>> >     I have tried MatCopy and MatConvert, but neither works.
>>>>>>>>> >
>>>>>>>>> > Did you use MatConvert(mat,matype,MAT_INPLACE_MATRIX,&mat)?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >     Chang
>>>>>>>>> >
>>>>>>>>> >     On 10/17/21 7:50 PM, Matthew Knepley wrote:
>>>>>>>>> >      > On Sun, Oct 17, 2021 at 7:12 PM Swarnava Ghosh
>>>>>>>>> >      > <[email protected]> wrote:
>>>>>>>>> >      >
>>>>>>>>> >      >     Do I need to convert the MATSEQBAIJ to a CUDA matrix in
>>>>>>>>> >      >     code?
>>>>>>>>> >      >
>>>>>>>>> >      >
>>>>>>>>> >      > You would need a call to MatSetFromOptions() to take that type
>>>>>>>>> >      > from the command line, and the type must not be hard-coded in
>>>>>>>>> >      > your application. It is generally a bad idea to hard-code the
>>>>>>>>> >      > implementation type.
>>>>>>>>> >      >
>>>>>>>>> >      >     If I do it from the command line, are the other MatVec
>>>>>>>>> >      >     calls then also ported to CUDA? I have many MatVec calls
>>>>>>>>> >      >     in my code, but I specifically want to port just one.
>>>>>>>>> >      >
>>>>>>>>> >      >
>>>>>>>>> >      > You can give that one matrix an options prefix to isolate it.
>>>>>>>>> >      >
>>>>>>>>> >      >    Thanks,
>>>>>>>>> >      >
>>>>>>>>> >      >       Matt
>>>>>>>>> >      >
>>>>>>>>> >      >     Sincerely,
>>>>>>>>> >      >     Swarnava
>>>>>>>>> >      >
>>>>>>>>> >      >     On Sun, Oct 17, 2021 at 7:07 PM Junchao Zhang
>>>>>>>>> >      >     <[email protected]> wrote:
>>>>>>>>> >      >
>>>>>>>>> >      >         You can do that with the command line options
>>>>>>>>> >      >         -mat_type aijcusparse -vec_type cuda
>>>>>>>>> >      >
>>>>>>>>> >      >         On Sun, Oct 17, 2021, 5:32 PM Swarnava Ghosh
>>>>>>>>> >      >         <[email protected]> wrote:
>>>>>>>>> >      >
>>>>>>>>> >      >             Dear PETSc team,
>>>>>>>>> >      >
>>>>>>>>> >      >             I have a question about using CUDA to accelerate a
>>>>>>>>> >      >             matrix-vector product. I have a sequential sparse
>>>>>>>>> >      >             matrix (MATSEQBAIJ type) and want to port a MatVec
>>>>>>>>> >      >             call to GPUs. Is there any code or example I can
>>>>>>>>> >      >             look at?
>>>>>>>>> >      >
>>>>>>>>> >      >             Sincerely,
>>>>>>>>> >      >             SG
>>>>>>>>> >      >
>>>>>>>>> >      >
>>>>>>>>> >      >
>>>>>>>>> >      > --
>>>>>>>>> >      > What most experimenters take for granted before they begin
>>>>>>>>> >      > their experiments is infinitely more interesting than any
>>>>>>>>> >      > results to which their experiments lead.
>>>>>>>>> >      > -- Norbert Wiener
>>>>>>>>> >      >
>>>>>>>>> >      > https://www.cse.buffalo.edu/~knepley/
>>>>>>>>> >
>>>>>>>>> >     --
>>>>>>>>> >     Chang Liu
>>>>>>>>> >     Staff Research Physicist
>>>>>>>>> >     +1 609 243 3438
>>>>>>>>> >     [email protected] <mailto:[email protected]>
>>>>>>>>> >     Princeton Plasma Physics Laboratory
>>>>>>>>> >     100 Stellarator Rd, Princeton NJ 08540, USA
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Chang Liu
>>>>>>>>> Staff Research Physicist
>>>>>>>>> +1 609 243 3438
>>>>>>>>> [email protected]
>>>>>>>>> Princeton Plasma Physics Laboratory
>>>>>>>>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>>>>>>>>
>>>>>>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
