[petsc-users] ASM vs GASM
I am trying to set up a GASM preconditioner and am running into some difficulty getting convergence even with exact subdomain solvers. I am just doing things in serial for now; when I switch to ASM, I do get convergence, so I am wondering whether I am misunderstanding the GASM interface. If inner_ises and outer_ises define the “inner” and “outer” subdomains for GASM, should GASM and ASM do the same thing when configured via:

PCGASMSetSubdomains(pc, n_subdomains, inner_ises, outer_ises);

and

PCASMSetLocalSubdomains(pc, n_subdomains, outer_ises, inner_ises);

Thanks!

— Boyce
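For what it's worth, here is the index-set bookkeeping I have in mind as a tiny plain-Python sketch (no PETSc; the function and variable names are mine): the "inner" subdomains partition the unknowns, and each "outer" subdomain is its inner subdomain extended by some overlap. If I have the interfaces right, the inner/outer pair goes to PCGASMSetSubdomains in that order, and in the swapped (overlapping, non-overlapping) order to PCASMSetLocalSubdomains.

```python
def make_subdomains(n_dofs, block_size, overlap):
    """Build the two index-set families for a 1D DOF numbering:
    'inner' blocks partition range(n_dofs); each 'outer' block is the
    corresponding inner block extended by `overlap` DOFs on each side."""
    inner, outer = [], []
    for start in range(0, n_dofs, block_size):
        stop = min(start + block_size, n_dofs)
        inner.append(list(range(start, stop)))
        outer.append(list(range(max(0, start - overlap),
                                min(stop + overlap, n_dofs))))
    return inner, outer

inner_ises, outer_ises = make_subdomains(8, 4, 1)
# inner: [[0, 1, 2, 3], [4, 5, 6, 7]]      (non-overlapping partition)
# outer: [[0, 1, 2, 3, 4], [3, 4, 5, 6, 7]] (each extended by one DOF)
```

With overlap = 0 the two families coincide, which is one quick way to check whether the GASM and ASM configurations really are being handed equivalent data.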
Re: [petsc-users] FIELDSPLIT fields
> On Aug 15, 2018, at 10:07 PM, Smith, Barry F. wrote:
>
> Yes you can have "overlapping fields" with FIELDSPLIT, but I don't think you
> can use FIELDSPLIT for your case. You seem to have a geometric decomposition
> into regions. ASM and GASM are intended for such decompositions. Fieldsplit
> is for multiple fields that each live across the entire domain.

Basically there is one field that lives on the entire domain, and another field that lives only on a subdomain. Perhaps we could do GASM for the geometric split and FIELDSPLIT within the subdomain with the two fields.

> Barry
>
>> On Aug 15, 2018, at 7:42 PM, Griffith, Boyce Eugene wrote:
>>
>> Is it permissible to have overlapping fields in FIELDSPLIT? We are
>> specifically thinking about how to handle DOFs living on the interface
>> between two regions.
>>
>> Thanks!
>>
>> — Boyce
Re: [petsc-users] FIELDSPLIT fields
> On Aug 15, 2018, at 9:17 PM, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Wed, Aug 15, 2018 at 8:42 PM Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>> Is it permissible to have overlapping fields in FIELDSPLIT? We are specifically thinking about how to handle DOFs living on the interface between two regions.
>
> There is only 1 IS, so no way to do RASM, or any other thing on the overlap. This sort of thing was supposed to be handled by GASM.

There are three big blocks, with one interface between two of them. It may not make much difference which subdomain the interfacial DOFs are assigned to.

> I am not sure that works. If you want blocks that are not parallel, then you can probably use PCPATCH as soon as I get it merged.
>
>    Matt

Thanks!

— Boyce

> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
> https://www.cse.buffalo.edu/~knepley/
[petsc-users] FIELDSPLIT fields
Is it permissible to have overlapping fields in FIELDSPLIT? We are specifically thinking about how to handle DOFs living on the interface between two regions. Thanks! — Boyce
Re: [petsc-users] Neumann BC with non-symmetric matrix
> On Mar 1, 2016, at 10:56 AM, Jed Brown wrote:
>
> Boyce Griffith writes:
>> Jed, can you also do this for Stokes? It seems like something like RT0 is the right place to start.
>
> See, for example, Arnold, Falk, and Winther's 2007 paper on mixed FEM for elasticity with weakly imposed symmetry. It's the usual H(div) methodology and should apply equally well to Stokes. I'm not aware of any analysis or results of choosing quadrature to eliminate flux terms in these discretizations.

Two papers that are along the direction that I have in mind are:

http://onlinelibrary.wiley.com/doi/10.1002/fld.1566/abstract
http://onlinelibrary.wiley.com/doi/10.1002/fld.1723/abstract

I would love to know how to do this kind of thing on a SAMR or octree grid.

-- Boyce
Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()
Barry --

Another random thought --- are these smallish direct solves things that make sense to (try to) offload to a GPU?

Thanks,

-- Boyce

> On Jan 16, 2016, at 10:46 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
> Boyce,
>
> Of course anything is possible in software. But I expect an optimization to not rebuild common submatrices/factorizations requires a custom PCSetUp_ASM() rather than some PETSc option that we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE).
>
> I would start by copying PCSetUp_ASM(), stripping out all the setup stuff that doesn't relate to your code, and then mark identical domains so you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each one of those subdomains (but reuse a common one). The PCApply_ASM() should hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSP in PCSetUp_ASM() (and maybe the common submatrices) then PCDestroy_ASM() should also work unchanged.
>
> Good luck,
>
> Barry
>
>> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>>
>>> On Jan 16, 2016, at 7:00 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>
>>> Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves, and the matrix-vector products, as it should be. Not much can be done on speeding these up except running on machines with high memory bandwidth.
>>
>> Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks?
>>
>> Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance.
>>
>> -- Boyce
>>
>>> If you are using the master branch of PETSc, two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers, time and flops, etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example, open -f Safari filename.xml) or email the file.
>>>
>>> Barry
>>>
>>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:
>>>>
>>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>
>>>>> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won't.
>>>>
>>>> http://hpctoolkit.org/download/hpcviewer/
>>>>
>>>> Unzip HPCViewer for Mac OS X with the command line and drag the unzipped folder to Applications. You will be able to fire up HPCViewer from Launchpad. Point it to this attached directory. You will be able to see three different kinds of profiling under Calling Context View, Callers View, and Flat View.
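The reuse Barry describes can be sketched outside PETSc, too. The following toy (plain Python; the 2x2 "blocks" and all names are invented for illustration, this is not PCSetUp_ASM itself) factors each distinct subdomain matrix once and reuses that factorization for every identical block, which is the essence of the proposed customization:

```python
factor_count = 0     # counts how many factorizations are actually computed

def factor_2x2(block):
    """'Factor' a 2x2 block (a, b, c, d), here simply by forming its inverse."""
    global factor_count
    factor_count += 1
    a, b, c, d = block
    det = a * d - b * c
    return (d / det, -b / det, -c / det, a / det)

factor_cache = {}    # maps block entries -> cached factorization

def solve_block(block, rhs):
    """Solve block * x = rhs, reusing the factorization of identical blocks."""
    if block not in factor_cache:
        factor_cache[block] = factor_2x2(block)
    ia, ib, ic, id_ = factor_cache[block]
    return (ia * rhs[0] + ib * rhs[1], ic * rhs[0] + id_ * rhs[1])

# 100 subdomains, all sharing the same constant-coefficient block,
# pay for a single factorization:
blocks = [(2.0, 0.0, 0.0, 2.0)] * 100
solutions = [solve_block(blk, (2.0, 4.0)) for blk in blocks]
```

In PETSc terms, the cache key would be whatever marks subdomains as identical (e.g., "constant coefficient Stokes"), and the cached object would be the shared sub-KSP with its factored submatrix.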
Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()
> On Jan 16, 2016, at 7:00 PM, Barry Smith wrote:
>
> Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves, and the matrix-vector products, as it should be. Not much can be done on speeding these up except running on machines with high memory bandwidth.

Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks?

Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance.

-- Boyce

> If you are using the master branch of PETSc, two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers, time and flops, etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example, open -f Safari filename.xml) or email the file.
>
> Barry
>
>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote:
>>
>>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote:
>>>
>>> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won't.
>>
>> http://hpctoolkit.org/download/hpcviewer/
>>
>> Unzip HPCViewer for Mac OS X with the command line and drag the unzipped folder to Applications. You will be able to fire up HPCViewer from Launchpad. Point it to this attached directory. You will be able to see three different kinds of profiling under Calling Context View, Callers View, and Flat View.
Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()
> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S wrote:
>
>> On Jan 15, 2016, at 5:40 PM, Matthew Knepley wrote:
>>
>> I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered.
>
> Ok, I tried Barry's suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to a mere 1.6%. If I am getting it right, it's the PETSc options queries in KSPSolve() that are sucking up a nontrivial amount of time (14 - 1.6), and not KSPSetFromOptions() itself (1.6%).

Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX?

Thanks,

-- Boyce
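A minimal illustration of the amortization being discussed (plain Python; the classes and option names are invented, not PETSc's API): if an option is looked up once at setup rather than inside every solve, the lookup cost becomes independent of how many cheap subdomain solves are performed.

```python
class OptionsDB:
    """Toy options table that counts how often it is queried."""
    def __init__(self, table):
        self.table = table
        self.lookups = 0

    def find(self, key, default=None):
        self.lookups += 1
        return self.table.get(key, default)

class SubdomainSolver:
    def __init__(self, options):
        # Amortized: the option is queried once, at setup time.
        self.monitor = options.find("-ksp_monitor", False)

    def solve(self):
        # No options.find() calls in the hot path.
        if self.monitor:
            print("residual history ...")

options = OptionsDB({})
solver = SubdomainSolver(options)   # one lookup happens here
for _ in range(1000):               # many cheap subdomain solves
    solver.solve()                  # zero lookups per solve
```

With per-solve lookups, 1000 small subdomain solves would pay for 1000 table queries; hoisted to setup, they pay for one, which mirrors why the options checks inside KSPSolve dominate when ASM has many tiny blocks.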
Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()
> On Jan 16, 2016, at 4:06 PM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:
>
> Does Instruments save results somewhere (like in a cascade view) that I can send to Barry?

Yes --- "save as..." will save the current trace, and then you can open it back up.

-- Boyce

> --Amneet Bhalla
>
>> On Jan 16, 2016, at 1:04 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>>
>>> On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:
>>>
>>>> On Jan 16, 2016, at 10:21 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>
>>>>> On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>>>>>
>>>>>> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:
>>>>>>
>>>>>>> On Jan 15, 2016, at 5:40 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>>>>>
>>>>>>> I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered.
>>>>>>
>>>>>> Ok, I tried Barry's suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to a mere 1.6%. If I am getting it right, it's the PETSc options queries in KSPSolve() that are sucking up a nontrivial amount of time (14 - 1.6), and not KSPSetFromOptions() itself (1.6%).
>>>>>
>>>>> Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX?
>>>>
>>>> No, that is a different issue. In the short term I recommend that when running optimized/production you work with a PETSc with those options checks in KSPSolve commented out; you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve, which is why for your particular case PetscOptionsFindPair_Private() takes so much time.
>>>>
>>>> Now that you have eliminated this issue I would be very interested in seeing the HPCToolkit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using.
>>>
>>> Thanks Barry --- the best way, and the least back-and-forth way, would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a lightweight Java app. You can view the calling context (which PETSc function calls which internal PETSc routine) in a cascade form.
>>
>> Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-)
>>
>> Thanks,
>>
>> -- Boyce
>
>>> Let me know if you would like to try that.
>>>
>>>> Barry
>>>>
>>>> * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality.

Thanks,

-- Boyce
Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()
> On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:
>
> --Amneet Bhalla
>
>> On Jan 16, 2016, at 10:21 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>
>>> On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>>>
>>>> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:
>>>>
>>>>> On Jan 15, 2016, at 5:40 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>>>
>>>>> I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered.
>>>>
>>>> Ok, I tried Barry's suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to a mere 1.6%. If I am getting it right, it's the PETSc options queries in KSPSolve() that are sucking up a nontrivial amount of time (14 - 1.6), and not KSPSetFromOptions() itself (1.6%).
>>>
>>> Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX?
>>
>> No, that is a different issue. In the short term I recommend that when running optimized/production you work with a PETSc with those options checks in KSPSolve commented out; you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve, which is why for your particular case PetscOptionsFindPair_Private() takes so much time.
>>
>> Now that you have eliminated this issue I would be very interested in seeing the HPCToolkit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using.
>
> Thanks Barry --- the best way, and the least back-and-forth way, would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a lightweight Java app. You can view the calling context (which PETSc function calls which internal PETSc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling.

Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-)

Thanks,

-- Boyce

> Let me know if you would like to try that.
>
>> Barry
>>
>> * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality.

Thanks,

-- Boyce
Re: [petsc-users] HPCToolKit/HPCViewer on OS X
> On Jan 14, 2016, at 2:24 PM, Barry Smith <bsm...@mcs.anl.gov> wrote: > > > Matt is right, there is a lot of "missing" time from the output. Please > send the output from -ksp_view so we can see exactly what solver is being > used. > > From the output we have: > >Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC > is taking about 22% of the time) >Linear solver 77 % of the time (this is reasonable pretty much the entire > cost of the nonlinear solve is the linear solve) >Time to set up the preconditioner is 19% (10 + 9) >Time of iteration in KSP 35 % (this is the sum of the vector operations > and MatMult() and MatSolve()) > > So 77 - (19 + 35) = 23 % unexplained time inside the linear solver > (custom preconditioner???) > >Also getting the results with Instruments or HPCToolkit would be useful > (so long as we don't need to install HPCTool ourselves to see the results). Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. -- Boyce > > > Barry > >> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> >> wrote: >> >> >> >>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> >>> wrote: >>> >>> I see one hot spot: >> >> >> Here is with opt build >> >> >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
>> Use 'enscript -r -fCourier9' to print this document ***
>>
>> ---------------------------------------------------- PETSc Performance Summary: ----------------------------------------------------
>>
>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016
>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2  GIT Date: 2016-01-13 21:30:26 -0600
>>
>>                      Max        Max/Min   Avg        Total
>> Time (sec):          1.018e+00  1.0       1.018e+00
>> Objects:             2.935e+03  1.0       2.935e+03
>> Flops:               4.957e+08  1.0       4.957e+08  4.957e+08
>> Flops/sec:           4.868e+08  1.0       4.868e+08  4.868e+08
>> MPI Messages:        0.000e+00  0.0       0.000e+00  0.000e+00
>> MPI Message Lengths: 0.000e+00  0.0       0.000e+00  0.000e+00
>> MPI Reductions:      0.000e+00  0.0
>>
>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>     e.g., VecAXPY() for real vectors of length N --> 2N flops
>>     and VecAXPY() for complex vectors of length N --> 8N flops
>>
>> Summary of Stages:  ---- Time ----      ---- Flops ----     --- Messages ---   -- Message Lengths --   -- Reductions --
>>                     Avg        %Total   Avg        %Total   counts   %Total    Avg        %Total       counts   %Total
>>  0:  Main Stage:    1.0183e+00 100.0%   4.9570e+08 100.0%   0.000e+00  0.0%    0.000e+00  0.0%         0.000e+00  0.0%
>>
>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flops: Max - maximum over all processors
>>                   Ratio - ratio of maximum to minimum over all processors
>>   Mess: number of messages sent
>>   Avg. len: average message length (bytes)
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>      %T - percent time in this phase         %F - percent flops in this phase
>>      %M - percent messages in this phase     %L - percent message lengths in this phase
>>      %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>> --
Re: [petsc-users] HPCToolKit/HPCViewer on OS X
> On Jan 14, 2016, at 3:09 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>> On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>>
>>> On Jan 14, 2016, at 2:24 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>
>>> Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used.
>>>
>>> From the output we have:
>>>
>>>    Nonlinear solver 78% of the time (so your "setup code" outside of PETSc is taking about 22% of the time)
>>>    Linear solver 77% of the time (this is reasonable; pretty much the entire cost of the nonlinear solve is the linear solve)
>>>    Time to set up the preconditioner is 19% (10 + 9)
>>>    Time of iteration in KSP 35% (this is the sum of the vector operations and MatMult() and MatSolve())
>>>
>>>    So 77 - (19 + 35) = 23% unexplained time inside the linear solver (custom preconditioner???)
>>>
>>>    Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCToolkit ourselves to see the results).
>>
>> Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first.
>
> Just put a PetscLogEvent() in (or several) to track that part. Plus put an event or two outside the SNESSolve to track the outside-PETSc setup time. The PETSc time looks reasonable; at most I can only imagine any optimizations we could do bringing it down a small percentage.

Here is a bit more info about what we are trying to do:

This is a Vanka-type MG preconditioner for a Stokes-like system on a structured grid. (Currently just uniform grids, but hopefully soon with AMR.)
For the smoother, we are using damped Richardson + ASM with relatively small block subdomains --- e.g., all DOFs associated with 8x8 cells in 2D (~300 DOFs), or 8x8x8 in 3D (~2500 DOFs). Unfortunately, MG iteration counts really tank when using smaller subdomains. I can't remember whether we have quantified this carefully, but PCASM seems to bog down with smaller subdomains. A question is whether there are different implementation choices that could make the case of "lots of little subdomains" run faster. But before we get to that, Amneet and I should take a more careful look at overall solver performance.

(We are also starting to play around with PCFIELDSPLIT for this problem too, although we don't have many ideas about how to handle the Schur complement.)

Thanks,

-- Boyce

> Barry
>
>> -- Boyce
>>
>>> Barry
>>>
>>>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:
>>>>
>>>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>>>>>
>>>>> I see one hot spot:
>>>>
>>>> Here is with opt build
>>>>
>>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS.
Use 'enscript -r -fCourier9' to print this document ***

---------------------------------------------------- PETSc Performance Summary: ----------------------------------------------------

./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016
Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2  GIT Date: 2016-01-13 21:30:26 -0600

                     Max        Max/Min   Avg        Total
Time (sec):          1.018e+00  1.0       1.018e+00
Objects:             2.935e+03  1.0       2.935e+03
Flops:               4.957e+08  1.0       4.957e+08  4.957e+08
Flops/sec:           4.868e+08  1.0       4.868e+08  4.868e+08
MPI Messages:        0.000e+00  0.0       0.000e+00  0.000e+00
MPI Message Lengths: 0.000e+00  0.0       0.000e+00  0.000e+00
MPI Reductions:      0.000e+00  0.0

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
    e.g., VecAXPY() for real vectors of length N --> 2N flops
    and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:  ---- Time ----      ---- Flops ----     --- Messages ---   -- Message Lengths --   -- Reductions --
                    Avg        %Total   Avg        %Total   counts   %Total    Avg        %Total       counts   %Total
 0:  Main Stage:    1.0183e+00 100.0%   4.9570e+08 100.0%   0.000e+00  0.0%    0.000e+00  0.0%         0.000e+00  0.0%

See
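As a concrete (if drastically simplified) picture of the smoother being discussed, here is a plain-Python toy --- not our actual solver, and scalar 1D Poisson rather than Stokes: damped Richardson wrapped around exact solves on small non-overlapping blocks, i.e., the zero-overlap limit of ASM (block Jacobi).

```python
def matvec(x):
    """1D Poisson stencil: (A x)_i = 2 x_i - x_{i-1} - x_{i+1}."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i + 1 < n else 0.0) for i in range(n)]

def block_solve(rhs):
    """Exact solve of the 2/-1 tridiagonal block (Thomas algorithm)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        m = 2.0 + c[i - 1]             # pivot: 2 - (-1) * c[i-1]
        c[i] = -1.0 / m
        d[i] = (rhs[i] + d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def smooth(b, block_size, omega, sweeps):
    """Damped Richardson with exact solves on non-overlapping blocks."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        ax = matvec(x)
        r = [b[i] - ax[i] for i in range(n)]
        for s in range(0, n, block_size):
            e = min(s + block_size, n)
            z = block_solve(r[s:e])    # exact subdomain solve
            for i in range(s, e):
                x[i] += omega * z[i - s]
    return x
```

For this matrix, the restriction of A to any contiguous block is exactly the 2/-1 tridiagonal, so block_solve really is an exact subdomain solve; the interesting experiments (how the rate degrades as block_size shrinks or the domain grows) only require changing the two integer parameters.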
Re: [petsc-users] HPCToolKit/HPCViewer on OS X
I see one hot spot:

> On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S wrote:
>
>      ##########################################################
>      #                                                        #
>      #                      WARNING!!!                        #
>      #                                                        #
>      #  This code was compiled with a debugging option.       #
>      #  To get timing results run ./configure                 #
>      #  using --with-debugging=no; the performance will       #
>      #  be generally two or three times faster.               #
>      #                                                        #
>      ##########################################################
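For reference, the reconfigure step the banner refers to looks roughly like this (a sketch only; the optimization flags are illustrative and should be matched to your existing configuration options):

```shell
# Reconfigure PETSc without debugging for timing runs.
./configure --with-debugging=no \
            COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3"
make all
```

Keeping a separate PETSC_ARCH for the optimized build is a common way to switch between the debug and timing configurations without rebuilding.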