> On 14 Dec 2023, at 11:45 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
> 
> Thanks, I will try to create a minimal reproducible example. This may take me 
> some time though, as I need to figure out how to extract only the relevant 
> parts (the full program this solve is used in is getting quite complex).

You could just do -ksp_view_mat binary:Amat.bin -ksp_view_pmat binary:Pmat.bin 
-ksp_view_rhs binary:rhs.bin and send me those three files (I’m guessing your 
are using double-precision scalars with 32-bit PetscInt).

> I'll also try out some of the BoomerAMG options to see if that helps.

These should work (this is where all “PCMatApply()-ready” PC are being tested): 
https://petsc.org/release/src/ksp/ksp/tutorials/ex79.c.html#line215
You can see it’s also testing PCHYPRE + KSPHPDDM on device (but not with HIP).
I’m aware the performance should not be optimal (see your comment about 
host/device copies), I’ve money to hire someone to work on this but: a) I need 
to find the correct engineer/post-doc, b) I currently don’t have good use cases 
(of course, I could generate a synthetic benchmark, for science).
So even if you send me the three Mat, a MWE would be appreciated if the 
KSPMatSolve() is performance-critical for you (see point b) from above).

Thanks,
Pierre

> Thanks,
> Sreeram
> 
> On Thu, Dec 14, 2023, 1:12 PM Pierre Jolivet <pie...@joliv.et 
> <mailto:pie...@joliv.et>> wrote:
>> 
>> 
>>> On 14 Dec 2023, at 8:02 PM, Sreeram R Venkat <srven...@utexas.edu 
>>> <mailto:srven...@utexas.edu>> wrote:
>>> 
>>> Hello Pierre,
>>> 
>>> Thank you for your reply. I tried out the HPDDM CG as you said, and it 
>>> seems to be doing the batched solves, but the KSP is not converging due to 
>>> a NaN or Inf being generated. I also noticed there are a lot of 
>>> host-to-device and device-to-host copies of the matrices (the non-batched 
>>> KSP solve did not have any memcopies). I have attached dump.0 again. Could 
>>> you please take a look?
>> 
>> Yes, but you’d need to send me something I can run with your set of options 
>> (if you are more confident doing this in private, you can remove the list 
>> from c/c).
>> Not all BoomerAMG smoothers handle blocks of right-hand sides, and there is 
>> not much error checking, so instead of erroring out, this may be the reason 
>> why you are getting garbage.
>> 
>> Thanks,
>> Pierre
>> 
>>> Thanks,
>>> Sreeram
>>> 
>>> On Thu, Dec 14, 2023 at 12:42 AM Pierre Jolivet <pie...@joliv.et 
>>> <mailto:pie...@joliv.et>> wrote:
>>>> Hello Sreeram,
>>>> KSPCG (PETSc implementation of CG) does not handle solves with multiple 
>>>> columns at once.
>>>> There is only a single native PETSc KSP implementation which handles 
>>>> solves with multiple columns at once: KSPPREONLY.
>>>> If you use --download-hpddm, you can use a CG (or GMRES, or more advanced 
>>>> methods) implementation which handles solves with multiple columns at once 
>>>> (via -ksp_type hpddm -ksp_hpddm_type cg or KSPSetType(ksp, KSPHPDDM); 
>>>> KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG);).
>>>> I’m the main author of HPDDM, there is preliminary support for device 
>>>> matrices, but if it’s not working as intended/not faster than column by 
>>>> column, I’d be happy to have a deeper look (maybe in private), because 
>>>> most (if not all) of my users interested in (pseudo-)block Krylov solvers 
>>>> (i.e., solvers that treat right-hand sides in a single go) are using plain 
>>>> host matrices.
>>>> 
>>>> Thanks,
>>>> Pierre
>>>> 
>>>> PS: you could have a look at 
>>>> https://www.sciencedirect.com/science/article/abs/pii/S0898122121000055 to 
>>>> understand the philosophy behind block iterative methods in PETSc (and in 
>>>> HPDDM), src/mat/tests/ex237.c, the benchmark I mentioned earlier, was 
>>>> developed in the context of this paper to produce Figures 2-3. Note that 
>>>> this paper is now slightly outdated, since then, PCHYPRE and PCMG (among 
>>>> others) have been made “PCMatApply()-ready”.
>>>> 
>>>>> On 13 Dec 2023, at 11:05 PM, Sreeram R Venkat <srven...@utexas.edu 
>>>>> <mailto:srven...@utexas.edu>> wrote:
>>>>> 
>>>>> Hello Pierre,
>>>>> 
>>>>> I am trying out the KSPMatSolve with the BoomerAMG preconditioner. 
>>>>> However, I am noticing that it is still solving column by column (this is 
>>>>> stated explicitly in the info dump attached). I looked at the code for 
>>>>> KSPMatSolve_Private() and saw that as long as ksp->ops->matsolve is true, 
>>>>> it should do the batched solve, though I'm not sure where that gets set. 
>>>>> 
>>>>> I am using the options -pc_type hypre -pc_hypre_type boomeramg when 
>>>>> running the code.
>>>>> 
>>>>> Can you please help me with this?
>>>>> 
>>>>> Thanks,
>>>>> Sreeram
>>>>> 
>>>>> 
>>>>> On Thu, Dec 7, 2023 at 4:04 PM Mark Adams <mfad...@lbl.gov 
>>>>> <mailto:mfad...@lbl.gov>> wrote:
>>>>>> N.B., AMGX interface is a bit experimental.
>>>>>> Mark
>>>>>> 
>>>>>> On Thu, Dec 7, 2023 at 4:11 PM Sreeram R Venkat <srven...@utexas.edu 
>>>>>> <mailto:srven...@utexas.edu>> wrote:
>>>>>>> Oh, in that case I will try out BoomerAMG. Getting AMGX to build 
>>>>>>> correctly was also tricky so hopefully the HYPRE build will be easier.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sreeram
>>>>>>> 
>>>>>>> On Thu, Dec 7, 2023, 3:03 PM Pierre Jolivet <pie...@joliv.et 
>>>>>>> <mailto:pie...@joliv.et>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 7 Dec 2023, at 9:37 PM, Sreeram R Venkat <srven...@utexas.edu 
>>>>>>>>> <mailto:srven...@utexas.edu>> wrote:
>>>>>>>>> 
>>>>>>>>> Thank you Barry and Pierre; I will proceed with the first option. 
>>>>>>>>> 
>>>>>>>>> I want to use the AMGX preconditioner for the KSP. I will try it out 
>>>>>>>>> and see how it performs.
>>>>>>>> 
>>>>>>>> Just FYI, AMGX does not handle systems with multiple RHS, and thus has 
>>>>>>>> no PCMatApply() implementation.
>>>>>>>> BoomerAMG does, and there is a PCMatApply_HYPRE_BoomerAMG() 
>>>>>>>> implementation.
>>>>>>>> But let us know if you need assistance figuring things out.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Pierre
>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Sreeram
>>>>>>>>> 
>>>>>>>>> On Thu, Dec 7, 2023 at 2:02 PM Pierre Jolivet <pie...@joliv.et 
>>>>>>>>> <mailto:pie...@joliv.et>> wrote:
>>>>>>>>>> To expand on Barry’s answer, we have observed repeatedly that 
>>>>>>>>>> MatMatMult with MatAIJ performs better than MatMult with MatMAIJ, 
>>>>>>>>>> you can reproduce this on your own with 
>>>>>>>>>> https://petsc.org/release/src/mat/tests/ex237.c.html.
>>>>>>>>>> Also, I’m guessing you are using some sort of preconditioner within 
>>>>>>>>>> your KSP.
>>>>>>>>>> Not all are “KSPMatSolve-ready”, i.e., they may treat blocks of 
>>>>>>>>>> right-hand sides column by column, which is very inefficient.
>>>>>>>>>> You could run your code with -info dump and send us dump.0 to see 
>>>>>>>>>> what needs to be done on our end to make things more efficient, 
>>>>>>>>>> should you not be satisfied with the current performance of the code.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Pierre
>>>>>>>>>> 
>>>>>>>>>>> On 7 Dec 2023, at 8:34 PM, Barry Smith <bsm...@petsc.dev 
>>>>>>>>>>> <mailto:bsm...@petsc.dev>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Dec 7, 2023, at 1:17 PM, Sreeram R Venkat <srven...@utexas.edu 
>>>>>>>>>>>> <mailto:srven...@utexas.edu>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I have 2 sequential matrices M and R (both MATSEQAIJCUSPARSE of 
>>>>>>>>>>>> size n x n) and a vector v of size n*m. v = [v_1 , v_2 ,... , v_m] 
>>>>>>>>>>>> where v_i has size n. The data for v can be stored either in 
>>>>>>>>>>>> column-major or row-major order.  Now, I want to do 2 types of 
>>>>>>>>>>>> operations:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Matvecs of the form M*v_i = w_i, for i = 1..m. 
>>>>>>>>>>>> 2. KSPSolves of the form R*x_i = v_i, for i = 1..m.
>>>>>>>>>>>> 
>>>>>>>>>>>> From what I have read on the documentation, I can think of 2 
>>>>>>>>>>>> approaches. 
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Get the pointer to the data in v (column-major) and use it to 
>>>>>>>>>>>> create a dense matrix V. Then do a MatMatMult with M*V = W, and 
>>>>>>>>>>>> take the data pointer of W to create the vector w. For KSPSolves, 
>>>>>>>>>>>> use KSPMatSolve with R and V.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. Create a MATMAIJ using M/R and use that for matvecs directly 
>>>>>>>>>>>> with the vector v. I don't know if KSPSolve with the MATMAIJ will 
>>>>>>>>>>>> know that it is a multiple RHS system and act accordingly.
>>>>>>>>>>>> 
>>>>>>>>>>>> Which would be the more efficient option?
>>>>>>>>>>> 
>>>>>>>>>>> Use 1. 
>>>>>>>>>>>> 
>>>>>>>>>>>> As a side-note, I am also wondering if there is a way to use 
>>>>>>>>>>>> row-major storage of the vector v.
>>>>>>>>>>> 
>>>>>>>>>>> No
>>>>>>>>>>> 
>>>>>>>>>>>> The reason is that this could allow for more coalesced memory 
>>>>>>>>>>>> access when doing matvecs.
>>>>>>>>>>> 
>>>>>>>>>>>   PETSc matrix-vector products use BLAS GMEV matrix-vector products 
>>>>>>>>>>> for the computation so in theory they should already be 
>>>>>>>>>>> well-optimized
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Sreeram
>>>>>>>>>> 
>>>>>>>> 
>>>>> <dump.0>
>>>> 
>>> <dump.0>
>> 

Reply via email to