Unfortunately, I am not able to reproduce such a failure with your input matrix.
I’ve used ex79, which I linked previously, and the system is solved properly.
$ ./ex79 -pc_type hypre -ksp_type hpddm -ksp_hpddm_type cg
-ksp_converged_reason -ksp_view_mat ascii::ascii_info -ksp_view_rhs
ascii::ascii_info
Linear solve converged due to CONVERGED_RTOL iterations 6
Mat Object: 1 MPI process
type: seqaijcusparse
rows=289, cols=289
total: nonzeros=2401, allocated nonzeros=2401
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Mat Object: 1 MPI process
type: seqdensecuda
rows=289, cols=10
total: nonzeros=2890, allocated nonzeros=2890
total number of mallocs used during MatSetValues calls=0
You mentioned in a subsequent email that you are interested in systems with at
most 1E6 unknowns and up to 1E4 right-hand sides.
I’m not sure you can expect significant gains from using a GPU for systems of
that size.
The fastest approach would probably indeed be -pc_type lu -ksp_type preonly
-ksp_matsolve_batch_size 100 (or a similar batch size, depending on the memory
available on your host).
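In case it is useful, here is a minimal sketch of that configuration done
through the API instead of the command line (the function name and the batch
size of 100 are placeholders to be tuned; ksp, B, and X are assumed to already
exist):

#include <petscksp.h>

/* Sketch only: ksp is built on your operator, B holds the right-hand sides as
   columns of a dense Mat, X the solutions. */
static PetscErrorCode SolveBatched(KSP ksp, Mat B, Mat X)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPSetType(ksp, KSPPREONLY));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCLU));
  PetscCall(KSPSetMatSolveBatchSize(ksp, 100)); /* process the 1E4 columns in chunks of 100 */
  PetscCall(KSPSetFromOptions(ksp));            /* still lets the command line override this */
  PetscCall(KSPMatSolve(ksp, B, X));
  PetscFunctionReturn(PETSC_SUCCESS);
}
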
Thanks,
Pierre
> On 15 Dec 2023, at 9:52 PM, Sreeram R Venkat <[email protected]> wrote:
>
> Here are the ksp_view files. I set the option -ksp_error_if_not_converged to
> try to get the vectors that caused the error. I noticed that some of the
> KSPMatSolves converge while others don't. In the code, the solves are called
> as:
>
> input vector v --> insert data of v into a dense mat --> KSPMatSolve() -->
> MatMatMult() --> KSPMatSolve() --> insert data of dense mat into output
> vector w --> output w
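>
> Roughly, the calls look like this (a simplified sketch with placeholder names;
> kspL is the Laplacian-like KSP, M the mass matrix, n the system size, m the
> number of columns; the real code has more bookkeeping):
>
> #include <petscksp.h>
>
> static PetscErrorCode ApplyBiharmonicLike(KSP kspL, Mat M, PetscInt n, PetscInt m, Vec v)
> {
>   PetscScalar *a;
>   Mat          V, W1, W2;
>
>   PetscFunctionBeginUser;
>   PetscCall(VecGetArray(v, &a)); /* v stores the m columns back to back */
>   PetscCall(MatCreateDense(PETSC_COMM_SELF, n, m, n, m, a, &V));
>   PetscCall(MatDuplicate(V, MAT_DO_NOT_COPY_VALUES, &W1));
>   PetscCall(KSPMatSolve(kspL, V, W1)); /* first Laplacian-like solve */
>   PetscCall(MatMatMult(M, W1, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &W2));
>   PetscCall(KSPMatSolve(kspL, W2, V)); /* second solve, written back into v's storage */
>   PetscCall(MatDestroy(&W2));
>   PetscCall(MatDestroy(&W1));
>   PetscCall(MatDestroy(&V));
>   PetscCall(VecRestoreArray(v, &a));
>   PetscFunctionReturn(PETSC_SUCCESS);
> }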
>
> The operator used in the KSP is a Laplacian-like operator, and the MatMatMult
> is with a mass matrix. The whole thing is supposed to be a solve with a
> biharmonic-like operator. I can also run it with only the first KSPMatSolve
> (i.e., just a Laplacian-like operator). In that case, the KSP reportedly
> converges after 0 iterations (see the next paragraph), but this causes
> problems in other parts of the code later on.
>
> I saw that sometimes the first KSPMatSolve "converges" after 0 iterations due
> to CONVERGED_RTOL. Then, the second KSPMatSolve produces a NaN/Inf. I tried
> setting -ksp_min_it, but that didn't seem to do anything.
>
> I'll keep trying different options and also try to get the MWE made (this
> KSPMatSolve is pretty performance-critical for us).
>
> Thanks for all your help,
> Sreeram
>
> On Fri, Dec 15, 2023 at 1:01 AM Pierre Jolivet <[email protected]
> <mailto:[email protected]>> wrote:
>>
>>> On 14 Dec 2023, at 11:45 PM, Sreeram R Venkat <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> Thanks, I will try to create a minimal reproducible example. This may take
>>> me some time though, as I need to figure out how to extract only the
>>> relevant parts (the full program this solve is used in is getting quite
>>> complex).
>>
>> You could just do -ksp_view_mat binary:Amat.bin -ksp_view_pmat
>> binary:Pmat.bin -ksp_view_rhs binary:rhs.bin and send me those three files
>> (I’m guessing you are using double-precision scalars with 32-bit PetscInt).
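>>
>> For reference, on my end I would read them back with something along these
>> lines (just a sketch; for the dense rhs.bin, I’d set the type to MATDENSE
>> before the MatLoad()):
>>
>> Mat         A;
>> PetscViewer viewer;
>>
>> PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Amat.bin", FILE_MODE_READ, &viewer));
>> PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>> PetscCall(MatLoad(A, viewer));
>> PetscCall(PetscViewerDestroy(&viewer));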
>>
>>> I'll also try out some of the BoomerAMG options to see if that helps.
>>
>> These should work (this is where all “PCMatApply()-ready” PCs are tested):
>> https://petsc.org/release/src/ksp/ksp/tutorials/ex79.c.html#line215
>> You can see it’s also testing PCHYPRE + KSPHPDDM on device (but not with
>> HIP).
>> I’m aware the performance should not be optimal (see your comment about
>> host/device copies). I have money to hire someone to work on this, but: a) I
>> need to find the right engineer/post-doc, b) I currently don't have good use
>> cases (of course, I could generate a synthetic benchmark, for science).
>> So even if you send me the three Mat, a MWE would be appreciated if the
>> KSPMatSolve() is performance-critical for you (see point b) above).
>>
>> Thanks,
>> Pierre
>>
>>> Thanks,
>>> Sreeram
>>>
>>> On Thu, Dec 14, 2023, 1:12 PM Pierre Jolivet <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>>
>>>>
>>>>> On 14 Dec 2023, at 8:02 PM, Sreeram R Venkat <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>>
>>>>> Hello Pierre,
>>>>>
>>>>> Thank you for your reply. I tried out the HPDDM CG as you said, and it
>>>>> seems to be doing the batched solves, but the KSP is not converging due
>>>>> to a NaN or Inf being generated. I also noticed there are a lot of
>>>>> host-to-device and device-to-host copies of the matrices (the non-batched
>>>>> KSP solve did not have any memcopies). I have attached dump.0 again.
>>>>> Could you please take a look?
>>>>
>>>> Yes, but you’d need to send me something I can run with your set of
>>>> options (if you are more comfortable doing this in private, you can remove
>>>> the list from Cc).
>>>> Not all BoomerAMG smoothers handle blocks of right-hand sides, and there
>>>> is not much error checking, so instead of erroring out, this may be why
>>>> you are getting garbage.
>>>>
>>>> Thanks,
>>>> Pierre
>>>>
>>>>> Thanks,
>>>>> Sreeram
>>>>>
>>>>> On Thu, Dec 14, 2023 at 12:42 AM Pierre Jolivet <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>>> Hello Sreeram,
>>>>>> KSPCG (PETSc implementation of CG) does not handle solves with multiple
>>>>>> columns at once.
>>>>>> There is only a single native PETSc KSP implementation which handles
>>>>>> solves with multiple columns at once: KSPPREONLY.
>>>>>> If you use --download-hpddm, you can use a CG (or GMRES, or more
>>>>>> advanced methods) implementation which handles solves with multiple
>>>>>> columns at once (via -ksp_type hpddm -ksp_hpddm_type cg or
>>>>>> KSPSetType(ksp, KSPHPDDM); KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG);).
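>>>>>> A minimal sketch of the latter (ksp is assumed to be created already,
>>>>>> B holds the right-hand sides as columns of a dense Mat, X the solutions):
>>>>>>
>>>>>> PetscCall(KSPSetType(ksp, KSPHPDDM));
>>>>>> PetscCall(KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG));
>>>>>> PetscCall(KSPSetFromOptions(ksp)); /* still overridable from the command line */
>>>>>> PetscCall(KSPMatSolve(ksp, B, X)); /* all columns treated in a single go */
>>>>>>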
>>>>>> I’m the main author of HPDDM; there is preliminary support for device
>>>>>> matrices, but if it’s not working as intended/not faster than column by
>>>>>> column, I’d be happy to have a deeper look (maybe in private), because
>>>>>> most (if not all) of my users interested in (pseudo-)block Krylov
>>>>>> solvers (i.e., solvers that treat all right-hand sides in a single go)
>>>>>> are using plain host matrices.
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre
>>>>>>
>>>>>> PS: you could have a look at
>>>>>> https://www.sciencedirect.com/science/article/abs/pii/S0898122121000055
>>>>>> to understand the philosophy behind block iterative methods in PETSc
>>>>>> (and in HPDDM). src/mat/tests/ex237.c, the benchmark I mentioned
>>>>>> earlier, was developed in the context of this paper to produce Figures
>>>>>> 2-3. Note that this paper is now slightly outdated: since then, PCHYPRE
>>>>>> and PCMG (among others) have been made “PCMatApply()-ready”.
>>>>>>
>>>>>>> On 13 Dec 2023, at 11:05 PM, Sreeram R Venkat <[email protected]
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>
>>>>>>> Hello Pierre,
>>>>>>>
>>>>>>> I am trying out the KSPMatSolve with the BoomerAMG preconditioner.
>>>>>>> However, I am noticing that it is still solving column by column (this
>>>>>>> is stated explicitly in the info dump attached). I looked at the code
>>>>>>> for KSPMatSolve_Private() and saw that as long as ksp->ops->matsolve is
>>>>>>> true, it should do the batched solve, though I'm not sure where that
>>>>>>> gets set.
>>>>>>>
>>>>>>> I am using the options -pc_type hypre -pc_hypre_type boomeramg when
>>>>>>> running the code.
>>>>>>>
>>>>>>> Can you please help me with this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sreeram
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 7, 2023 at 4:04 PM Mark Adams <[email protected]
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>> N.B., AMGX interface is a bit experimental.
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> On Thu, Dec 7, 2023 at 4:11 PM Sreeram R Venkat <[email protected]
>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>> Oh, in that case I will try out BoomerAMG. Getting AMGX to build
>>>>>>>>> correctly was also tricky, so hopefully the HYPRE build will be easier.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sreeram
>>>>>>>>>
>>>>>>>>> On Thu, Dec 7, 2023, 3:03 PM Pierre Jolivet <[email protected]
>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 7 Dec 2023, at 9:37 PM, Sreeram R Venkat <[email protected]
>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thank you Barry and Pierre; I will proceed with the first option.
>>>>>>>>>>>
>>>>>>>>>>> I want to use the AMGX preconditioner for the KSP. I will try it
>>>>>>>>>>> out and see how it performs.
>>>>>>>>>>
>>>>>>>>>> Just FYI, AMGX does not handle systems with multiple RHS, and thus
>>>>>>>>>> has no PCMatApply() implementation.
>>>>>>>>>> BoomerAMG does, and there is a PCMatApply_HYPRE_BoomerAMG()
>>>>>>>>>> implementation.
>>>>>>>>>> But let us know if you need assistance figuring things out.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Pierre
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sreeram
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Dec 7, 2023 at 2:02 PM Pierre Jolivet <[email protected]
>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>> To expand on Barry’s answer, we have observed repeatedly that
>>>>>>>>>>>> MatMatMult with MatAIJ performs better than MatMult with MatMAIJ;
>>>>>>>>>>>> you can reproduce this on your own with
>>>>>>>>>>>> https://petsc.org/release/src/mat/tests/ex237.c.html.
>>>>>>>>>>>> Also, I’m guessing you are using some sort of preconditioner
>>>>>>>>>>>> within your KSP.
>>>>>>>>>>>> Not all are “KSPMatSolve-ready”, i.e., they may treat blocks of
>>>>>>>>>>>> right-hand sides column by column, which is very inefficient.
>>>>>>>>>>>> You could run your code with -info dump and send us dump.0 to see
>>>>>>>>>>>> what needs to be done on our end to make things more efficient,
>>>>>>>>>>>> should you not be satisfied with the current performance of the
>>>>>>>>>>>> code.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Pierre
>>>>>>>>>>>>
>>>>>>>>>>>>> On 7 Dec 2023, at 8:34 PM, Barry Smith <[email protected]
>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Dec 7, 2023, at 1:17 PM, Sreeram R Venkat
>>>>>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have 2 sequential matrices M and R (both MATSEQAIJCUSPARSE of
>>>>>>>>>>>>>> size n x n) and a vector v of size n*m, v = [v_1, v_2, ..., v_m],
>>>>>>>>>>>>>> where each v_i has size n. The data for v can be stored either
>>>>>>>>>>>>>> in column-major or row-major order. Now, I want to do 2 types
>>>>>>>>>>>>>> of operations:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Matvecs of the form M*v_i = w_i, for i = 1..m.
>>>>>>>>>>>>>> 2. KSPSolves of the form R*x_i = v_i, for i = 1..m.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From what I have read in the documentation, I can think of 2
>>>>>>>>>>>>>> approaches.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Get the pointer to the data in v (column-major) and use it to
>>>>>>>>>>>>>> create a dense matrix V. Then do a MatMatMult with M*V = W, and
>>>>>>>>>>>>>> take the data pointer of W to create the vector w. For
>>>>>>>>>>>>>> KSPSolves, use KSPMatSolve with R and V (a rough sketch is
>>>>>>>>>>>>>> below).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Create a MATMAIJ using M/R and use that for matvecs directly
>>>>>>>>>>>>>> with the vector v. I don't know if KSPSolve with the MATMAIJ
>>>>>>>>>>>>>> will know that it is a multiple RHS system and act accordingly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which would be the more efficient option?
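>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For concreteness, here is roughly what I have in mind for
>>>>>>>>>>>>>> approach 1 (just a sketch with placeholder names; ksp is built
>>>>>>>>>>>>>> on R, and the restore/destroy calls are omitted):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PetscScalar       *varr;
>>>>>>>>>>>>>> const PetscScalar *warr;
>>>>>>>>>>>>>> Mat                V, W, X;
>>>>>>>>>>>>>> Vec                w;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PetscCall(VecGetArray(v, &varr));
>>>>>>>>>>>>>> PetscCall(MatCreateDense(PETSC_COMM_SELF, n, m, n, m, varr, &V)); /* wrap v, no copy */
>>>>>>>>>>>>>> PetscCall(MatMatMult(M, V, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &W)); /* operation 1: W = M*V */
>>>>>>>>>>>>>> PetscCall(MatDenseGetArrayRead(W, &warr));
>>>>>>>>>>>>>> PetscCall(VecCreateSeqWithArray(PETSC_COMM_SELF, 1, n * m, warr, &w)); /* view W's data as w */
>>>>>>>>>>>>>> PetscCall(MatDuplicate(V, MAT_DO_NOT_COPY_VALUES, &X));
>>>>>>>>>>>>>> PetscCall(KSPMatSolve(ksp, V, X)); /* operation 2: solve R*X = V */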
>>>>>>>>>>>>>
>>>>>>>>>>>>> Use 1.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As a side-note, I am also wondering if there is a way to use
>>>>>>>>>>>>>> row-major storage of the vector v.
>>>>>>>>>>>>>
>>>>>>>>>>>>> No
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The reason is that this could allow for more coalesced memory
>>>>>>>>>>>>>> access when doing matvecs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PETSc matrix-vector products use BLAS GEMV matrix-vector
>>>>>>>>>>>>> products for the computation, so in theory they should already be
>>>>>>>>>>>>> well-optimized.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Sreeram
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>> <dump.0>
>>>>>>
>>>>> <dump.0>
>>>>
>>
> <Pmat.bin><Amat.bin><rhs.bin>