Barry --

Another random thought --- are these smallish direct solves things that make sense to (try to) offload to a GPU?
Thanks,

-- Boyce

> On Jan 16, 2016, at 10:46 PM, Barry Smith <[email protected]> wrote:
>
>   Boyce,
>
>    Of course anything is possible in software. But I expect an optimization to not rebuild common submatrices/factorizations requires a custom PCSetUp_ASM() rather than some PETSc option that we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE).
>
>    I would start by copying PCSetUp_ASM(), stripping out all the setup stuff that doesn't relate to your code, and then marking identical domains so you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each one of those subdomains (but reuse a common one). The PCApply_ASM() should hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSP in PCSetUp_ASM() (and maybe the common sub matrices) then the PCDestroy_ASM() should also work unchanged.
>
>   Good luck,
>
>   Barry
>
>> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene <[email protected]> wrote:
>>
>>
>>> On Jan 16, 2016, at 7:00 PM, Barry Smith <[email protected]> wrote:
>>>
>>>
>>>   Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves, and the matrix-vector products, as it should be. Not much can be done to speed these up except running on machines with high memory bandwidth.
>>
>> Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant-coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks?
>>
>> Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance.
>>
>> -- Boyce
>>
>>> If you are using the master branch of PETSc, two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers' time, flops, etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example, open -a Safari filename.xml) or email the file.
>>>
>>>   Barry
>>>
>>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith <[email protected]> wrote:
>>>>>
>>>>>   Either way is fine so long as I don't have to install a ton of stuff, which it sounds like I won't.
>>>>
>>>> http://hpctoolkit.org/download/hpcviewer/
>>>>
>>>> Unzip HPCViewer for MacOSX with the command line and drag the unzipped folder to Applications. You will be able to fire up HPCViewer from Launchpad. Point it to the attached directory. You will be able to see three different kinds of profiling under the Calling Context View, Callers View, and Flat View.
>>>>
>>>>
>>>>
>>>> <hpctoolkit-main2d-database.zip>
>>>>
>>>
>>
>
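P.S. To make sure I follow the reuse idea above, here is a rough sketch, not actual PCSetUp_ASM() code: the helper name SetUpSubdomainKSPs, the "identical" flags, and the per-block matrices are placeholders for whatever the customized setup would actually compute. Each block flagged as identical shares one KSP, and PetscObjectReference() keeps a uniform KSPDestroy() loop (as in PCDestroy_ASM()) balanced.

#include <petscksp.h>

/* Hypothetical helper, not part of PETSc: build one KSP per subdomain block,
   but reuse a single shared KSP for blocks flagged as identical. */
static PetscErrorCode SetUpSubdomainKSPs(PetscInt n,Mat *submat,const PetscBool *identical,KSP *subksp)
{
  PetscErrorCode ierr;
  KSP            shared = NULL;
  PetscInt       i;

  PetscFunctionBegin;
  for (i = 0; i < n; i++) {
    if (identical[i] && shared) {
      /* Reuse the shared solver; bump its reference count so a uniform
         KSPDestroy() loop over subksp[] stays balanced. */
      ierr = PetscObjectReference((PetscObject)shared);CHKERRQ(ierr);
      subksp[i] = shared;
    } else {
      PC pc;
      ierr = KSPCreate(PETSC_COMM_SELF,&subksp[i]);CHKERRQ(ierr);
      ierr = KSPSetOperators(subksp[i],submat[i],submat[i]);CHKERRQ(ierr);
      ierr = KSPSetType(subksp[i],KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(subksp[i],&pc);CHKERRQ(ierr);
      ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);          /* small direct solve per block */
      ierr = KSPSetUp(subksp[i]);CHKERRQ(ierr);         /* factor this block only once */
      if (identical[i]) shared = subksp[i];             /* first "identical" block becomes the shared one */
    }
  }
  PetscFunctionReturn(0);
}

The point being that the shared block is factored once in KSPSetUp(), and every duplicate block pays only a reference-count bump rather than another MatGetSubMatrices() plus LU factorization.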
