Barry --

Another random thought --- are these smallish direct solves the kind of thing 
that makes sense to (try to) offload to a GPU?

Thanks,

-- Boyce

> On Jan 16, 2016, at 10:46 PM, Barry Smith <[email protected]> wrote:
> 
> 
>  Boyce,
> 
>   Of course anything is possible in software. But I expect that an 
> optimization which avoids rebuilding common submatrices/factorizations 
> requires a custom PCSetUp_ASM() rather than some PETSc option that we could 
> add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE).
> 
>   I would start by copying PCSetUp_ASM(), stripping out all the setup stuff 
> that doesn't relate to your code, and then marking identical domains so you 
> don't need to call MatGetSubMatrices() on those domains and don't create a 
> new KSP for each one of those subdomains (but reuse a common one). 
> PCApply_ASM() should hopefully be reusable so long as you have created the 
> full array of KSP objects (some of which will be common). If you increase 
> the reference counts of the common KSP in PCSetUp_ASM() (and maybe the 
> common submatrices) then PCDestroy_ASM() should also work unchanged.
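
Just to make sure I follow, the bookkeeping you describe would look roughly 
like the helper sketched below?  (Names and signature are made up for 
illustration, and this is not the real PCSetUp_ASM(), which of course does 
far more: rep[i] would be the index of the first subdomain whose submatrix is 
identical to subdomain i, and submat[] would hold the matrices obtained from 
MatGetSubMatrices() for just those representative subdomains.)

#include <petscksp.h>

/* Sketch only: share one KSP among identical subdomains (made-up names). */
PetscErrorCode BuildSubKSPs(PetscInt nsub, Mat submat[],
                            const PetscInt rep[], KSP ksp[])
{
  PetscErrorCode ierr;
  PetscInt       i;

  PetscFunctionBegin;
  for (i = 0; i < nsub; i++) {
    if (rep[i] == i) {
      /* first subdomain of its kind: create and configure its own solver */
      ierr = KSPCreate(PETSC_COMM_SELF, &ksp[i]);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp[i], submat[i], submat[i]);CHKERRQ(ierr);
    } else {
      /* duplicate subdomain: reuse the representative's KSP and bump its
         reference count so that a later loop of KSPDestroy() calls (as in
         PCDestroy_ASM()) works unchanged */
      ksp[i] = ksp[rep[i]];
      ierr = PetscObjectReference((PetscObject)ksp[i]);CHKERRQ(ierr);
    }
  }
  PetscFunctionReturn(0);
}
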
> 
> Good luck,
> 
>  Barry
> 
>> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene <[email protected]> 
>> wrote:
>> 
>> 
>>> On Jan 16, 2016, at 7:00 PM, Barry Smith <[email protected]> wrote:
>>> 
>>> 
>>> Ok, I looked at your results in hpcviewer and don't see any surprises. The 
>>> PETSc time is in the little LU factorizations, the LU solves, and the 
>>> matrix-vector products, as it should be. Not much can be done to speed 
>>> these up except running on machines with high memory bandwidth.
>> 
>> Looks like the LU factorizations are about 25% of the run time for this 
>> particular case.  Many of these little subsystems are going to be identical 
>> (many will correspond to constant-coefficient Stokes), and it is fairly 
>> easy to figure out which are which.  How hard would it be to modify PCASM 
>> to allow for the specification of one or more "default" KSPs that can be 
>> used for specified blocks?
>> 
>> Of course, we'll also look into tweaking the subdomain solves --- it may not 
>> even be necessary to do exact subdomain solves to get reasonable MG 
>> performance.
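
(Following up on my own comment above: the first thing we will probably try 
is just switching the subdomain solves to something inexact via run-time 
options, along the lines of

    -sub_ksp_type preonly
    -sub_pc_type ilu

with the caveat that the exact option prefix depends on where the ASM 
preconditioner sits in our solver hierarchy, so some additional prefix may 
need to go in front of "sub_".)
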
>> 
>> -- Boyce
>> 
>>> If you are using the master branch of PETSc, two users gave us a nifty new 
>>> profiler that is "PETSc style" but shows the time, flops, etc. for the 
>>> hierarchy of PETSc solvers. You can run with -log_view :filename.xml:ascii_xml 
>>> and then open the file with a browser (for example, open -a Safari 
>>> filename.xml) or email the file.
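
(Amneet, concretely this would be something like

    ./main2d input2d -log_view :profile.xml:ascii_xml
    open -a Safari profile.xml

where the executable and input file names here are just placeholders for our 
actual driver, and the output file name is arbitrary.)
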
>>> 
>>> Barry
>>> 
>>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S <[email protected]> 
>>>> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith <[email protected]> wrote:
>>>>> 
>>>>> Either way is fine so long as I don't have to install a ton of stuff, 
>>>>> which it sounds like I won’t.
>>>> 
>>>> http://hpctoolkit.org/download/hpcviewer/
>>>> 
>>>> Unzip HPCViewer for Mac OS X from the command line and drag the unzipped 
>>>> folder to Applications. You will then be able to launch HPCViewer from 
>>>> Launchpad. Point it to the attached directory. You will be able to see 
>>>> three different kinds of profiling under Calling Context View, Callers 
>>>> View, and Flat View.
>>>> 
>>>> 
>>>> 
>>>> <hpctoolkit-main2d-database.zip>
>>>> 
>>> 
>> 
> 
