> On 14 Dec 2023, at 11:45 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>
> Thanks, I will try to create a minimal reproducible example. This may take me some time though, as I need to figure out how to extract only the relevant parts (the full program this solve is used in is getting quite complex).

You could just do -ksp_view_mat binary:Amat.bin -ksp_view_pmat binary:Pmat.bin -ksp_view_rhs binary:rhs.bin and send me those three files (I’m guessing you are using double-precision scalars with 32-bit PetscInt).
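For reference, a minimal sketch of how those three dumps could be loaded back in (assuming a default build with double-precision scalars and 32-bit PetscInt; Pmat.bin would be read the same way as Amat.bin):

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         b;
  PetscViewer viewer;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* matrix dumped by -ksp_view_mat binary:Amat.bin */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Amat.bin", FILE_MODE_READ, &viewer));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatLoad(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  /* right-hand side dumped by -ksp_view_rhs binary:rhs.bin */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "rhs.bin", FILE_MODE_READ, &viewer));
  PetscCall(VecCreate(PETSC_COMM_WORLD, &b));
  PetscCall(VecLoad(b, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}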
> I'll also try out some of the BoomerAMG options to see if that helps.

These should work (this is where all “PCMatApply()-ready” PCs are being tested): https://petsc.org/release/src/ksp/ksp/tutorials/ex79.c.html#line215
You can see it’s also testing PCHYPRE + KSPHPDDM on device (but not with HIP).
I’m aware the performance should not be optimal (see your comment about host/device copies). I have money to hire someone to work on this, but: a) I need to find the right engineer/post-doc, and b) I currently don’t have good use cases (of course, I could generate a synthetic benchmark, for science).
So even if you send me the three Mat, a MWE would be appreciated if KSPMatSolve() is performance-critical for you (see point b) above).

Thanks,
Pierre

> Thanks,
> Sreeram
>
> On Thu, Dec 14, 2023, 1:12 PM Pierre Jolivet <pie...@joliv.et> wrote:
>>
>>> On 14 Dec 2023, at 8:02 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>
>>> Hello Pierre,
>>>
>>> Thank you for your reply. I tried out the HPDDM CG as you said, and it seems to be doing the batched solves, but the KSP is not converging due to a NaN or Inf being generated. I also noticed there are a lot of host-to-device and device-to-host copies of the matrices (the non-batched KSP solve did not have any memcopies). I have attached dump.0 again. Could you please take a look?
>>
>> Yes, but you’d need to send me something I can run with your set of options (if you are more comfortable doing this in private, you can remove the list from c/c).
>> Not all BoomerAMG smoothers handle blocks of right-hand sides, and there is not much error checking, so instead of erroring out, this may be the reason why you are getting garbage.
>>
>> Thanks,
>> Pierre
>>
>>> Thanks,
>>> Sreeram
>>>
>>> On Thu, Dec 14, 2023 at 12:42 AM Pierre Jolivet <pie...@joliv.et> wrote:
>>>> Hello Sreeram,
>>>> KSPCG (the PETSc implementation of CG) does not handle solves with multiple columns at once.
>>>> There is only a single native PETSc KSP implementation which handles solves with multiple columns at once: KSPPREONLY.
>>>> If you use --download-hpddm, you can use a CG (or GMRES, or more advanced methods) implementation which handles solves with multiple columns at once (via -ksp_type hpddm -ksp_hpddm_type cg, or KSPSetType(ksp, KSPHPDDM); KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG);).
>>>> I’m the main author of HPDDM; there is preliminary support for device matrices, but if it’s not working as intended/not faster than column by column, I’d be happy to have a deeper look (maybe in private), because most (if not all) of my users interested in (pseudo-)block Krylov solvers (i.e., solvers that treat right-hand sides in a single go) are using plain host matrices.
>>>>
>>>> Thanks,
>>>> Pierre
>>>>
>>>> PS: you could have a look at https://www.sciencedirect.com/science/article/abs/pii/S0898122121000055 to understand the philosophy behind block iterative methods in PETSc (and in HPDDM); src/mat/tests/ex237.c, the benchmark I mentioned earlier, was developed in the context of this paper to produce Figures 2-3. Note that this paper is now slightly outdated; since then, PCHYPRE and PCMG (among others) have been made “PCMatApply()-ready”.
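For concreteness, a minimal sketch of the KSPHPDDM setup described just above, assuming a PETSc build configured with --download-hpddm; R, B, and X are placeholder names, with B and X being MATDENSE matrices holding one right-hand side/solution per column:

#include <petscksp.h>

/* Solve R X = B for all columns of B at once with HPDDM's (pseudo-)block CG. */
static PetscErrorCode BlockSolve(Mat R, Mat B, Mat X)
{
  KSP ksp;

  PetscFunctionBeginUser;
  PetscCall(KSPCreate(PetscObjectComm((PetscObject)R), &ksp));
  PetscCall(KSPSetOperators(ksp, R, R));
  PetscCall(KSPSetType(ksp, KSPHPDDM));               /* or -ksp_type hpddm */
  PetscCall(KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG)); /* or -ksp_hpddm_type cg */
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPMatSolve(ksp, B, X)); /* all right-hand sides treated in a single go */
  PetscCall(KSPDestroy(&ksp));
  PetscFunctionReturn(PETSC_SUCCESS);
}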
>>>>
>>>>> On 13 Dec 2023, at 11:05 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>>>
>>>>> Hello Pierre,
>>>>>
>>>>> I am trying out KSPMatSolve with the BoomerAMG preconditioner. However, I am noticing that it is still solving column by column (this is stated explicitly in the attached info dump). I looked at the code for KSPMatSolve_Private() and saw that as long as ksp->ops->matsolve is true, it should do the batched solve, though I'm not sure where that gets set.
>>>>>
>>>>> I am using the options -pc_type hypre -pc_hypre_type boomeramg when running the code.
>>>>>
>>>>> Can you please help me with this?
>>>>>
>>>>> Thanks,
>>>>> Sreeram
>>>>>
>>>>> On Thu, Dec 7, 2023 at 4:04 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>> N.B., the AMGX interface is a bit experimental.
>>>>>> Mark
>>>>>>
>>>>>> On Thu, Dec 7, 2023 at 4:11 PM Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>>>>> Oh, in that case I will try out BoomerAMG. Getting AMGX to build correctly was also tricky, so hopefully the HYPRE build will be easier.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sreeram
>>>>>>>
>>>>>>> On Thu, Dec 7, 2023, 3:03 PM Pierre Jolivet <pie...@joliv.et> wrote:
>>>>>>>>
>>>>>>>>> On 7 Dec 2023, at 9:37 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>>>>>>>
>>>>>>>>> Thank you Barry and Pierre; I will proceed with the first option.
>>>>>>>>>
>>>>>>>>> I want to use the AMGX preconditioner for the KSP. I will try it out and see how it performs.
>>>>>>>>
>>>>>>>> Just FYI, AMGX does not handle systems with multiple RHS, and thus has no PCMatApply() implementation.
>>>>>>>> BoomerAMG does, and there is a PCMatApply_HYPRE_BoomerAMG() implementation.
>>>>>>>> But let us know if you need assistance figuring things out.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Pierre
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sreeram
>>>>>>>>>
>>>>>>>>> On Thu, Dec 7, 2023 at 2:02 PM Pierre Jolivet <pie...@joliv.et> wrote:
>>>>>>>>>> To expand on Barry’s answer, we have observed repeatedly that MatMatMult with MatAIJ performs better than MatMult with MatMAIJ; you can reproduce this on your own with https://petsc.org/release/src/mat/tests/ex237.c.html.
>>>>>>>>>> Also, I’m guessing you are using some sort of preconditioner within your KSP.
>>>>>>>>>> Not all are “KSPMatSolve-ready”, i.e., they may treat blocks of right-hand sides column by column, which is very inefficient.
>>>>>>>>>> You could run your code with -info dump and send us dump.0 to see what needs to be done on our end to make things more efficient, should you not be satisfied with the current performance of the code.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Pierre
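To make the MatMatMult-versus-MatMAIJ point above concrete, a minimal sketch of applying a sparse matrix to a whole block of vectors in one call; BlockMatVec is a hypothetical helper, A a MatAIJ, and V an n x m MATDENSE with one vector per column:

#include <petscmat.h>

/* W = A * V in a single product instead of m separate MatMult calls. */
static PetscErrorCode BlockMatVec(Mat A, Mat V, Mat *W)
{
  PetscFunctionBeginUser;
  if (!*W) PetscCall(MatMatMult(A, V, MAT_INITIAL_MATRIX, PETSC_DEFAULT, W));
  else PetscCall(MatMatMult(A, V, MAT_REUSE_MATRIX, PETSC_DEFAULT, W)); /* reuse on later calls */
  PetscFunctionReturn(PETSC_SUCCESS);
}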
>>>>>>>>>>
>>>>>>>>>>> On 7 Dec 2023, at 8:34 PM, Barry Smith <bsm...@petsc.dev> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Dec 7, 2023, at 1:17 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I have 2 sequential matrices M and R (both MATSEQAIJCUSPARSE of size n x n) and a vector v of size n*m, v = [v_1, v_2, ..., v_m], where each v_i has size n. The data for v can be stored either in column-major or row-major order. Now, I want to do 2 types of operations:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Matvecs of the form M*v_i = w_i, for i = 1..m.
>>>>>>>>>>>> 2. KSPSolves of the form R*x_i = v_i, for i = 1..m.
>>>>>>>>>>>>
>>>>>>>>>>>> From what I have read in the documentation, I can think of 2 approaches.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Get the pointer to the data in v (column-major) and use it to create a dense matrix V. Then do a MatMatMult with M*V = W, and take the data pointer of W to create the vector w. For KSPSolves, use KSPMatSolve with R and V.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Create a MATMAIJ using M/R and use that for matvecs directly with the vector v. I don't know if KSPSolve with the MATMAIJ will know that it is a multiple-RHS system and act accordingly.
>>>>>>>>>>>>
>>>>>>>>>>>> Which would be the more efficient option?
>>>>>>>>>>>
>>>>>>>>>>> Use 1.
>>>>>>>>>>>
>>>>>>>>>>>> As a side note, I am also wondering if there is a way to use row-major storage of the vector v.
>>>>>>>>>>>
>>>>>>>>>>> No.
>>>>>>>>>>>
>>>>>>>>>>>> The reason is that this could allow for more coalesced memory access when doing matvecs.
>>>>>>>>>>>
>>>>>>>>>>> PETSc matrix-vector products use BLAS GEMV for the computation, so in theory they should already be well optimized.
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Sreeram
>>>>>
>>>>> <dump.0>
>>>>
>>> <dump.0>
>>
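For readers following the thread, a minimal sketch of option 1 as recommended above: viewing the column-major storage of v as an n x m dense matrix without copying, then applying M (and a KSP built on R) to all columns at once. BlockOps and kspR are hypothetical names, and this is an untested illustration; for MATSEQAIJCUSPARSE operators one would use MatCreateDenseCUDA with a device pointer from VecCUDAGetArray instead.

#include <petscksp.h>

static PetscErrorCode BlockOps(Mat M, KSP kspR, Vec v, PetscInt n, PetscInt m)
{
  PetscScalar *varray;
  Mat          V, W = NULL, X;

  PetscFunctionBeginUser;
  /* wrap the n*m entries of v (column-major) as an n x m dense matrix, no copy */
  PetscCall(VecGetArray(v, &varray));
  PetscCall(MatCreateDense(PETSC_COMM_SELF, n, m, n, m, varray, &V));
  /* operation 1: W = M * V, i.e., w_i = M * v_i for all i at once */
  PetscCall(MatMatMult(M, V, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &W));
  /* operation 2: solve R * x_i = v_i for all i at once; kspR has R as operator */
  PetscCall(MatDuplicate(V, MAT_DO_NOT_COPY_VALUES, &X));
  PetscCall(KSPMatSolve(kspR, V, X));
  PetscCall(MatDestroy(&X));
  PetscCall(MatDestroy(&W));
  PetscCall(MatDestroy(&V));
  PetscCall(VecRestoreArray(v, &varray));
  PetscFunctionReturn(PETSC_SUCCESS);
}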