> On Aug 5, 2016, at 4:27 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> 
> 
>  I looked at the code (and read the manual page more carefully)
> 
>      PC_ASM_BASIC       - full interpolation and restriction
>      PC_ASM_RESTRICT    - full restriction, local processor interpolation
>      PC_ASM_INTERPOLATE - full interpolation, local processor restriction
>      PC_ASM_NONE        - local processor restriction and interpolation
> 
> 
> It is not doing what you and I assumed it is doing. The restrict and 
> interpolate are only short-circuited (skipped) across processes; any 
> restriction and interpolation within an MPI process is always done. Thus in 
> sequential runs the different variants will make no difference. I don't think 
> I would have written it this way.

Thanks, Barry --- I think that this explains some weird results we've been 
getting when trying to use PCASM with small subdomains as a smoother (e.g. 
performance degrades with larger overlaps). At least for convergence 
benchmarking, we can get away with using a simple implementation of RASM.
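
By "a simple implementation of RASM" I mean something along the lines of the
sketch below. The RASMCtx struct, the way its scatters and sub-KSPs would be
built from the overlapping/non-overlapping IS'es, and the RASMSweep name are
all hypothetical, but it shows the intended update: solve on the overlapping
blocks, then add the corrections back only on the non-overlapping rows.

#include <petscksp.h>

/* Hypothetical per-block data, assumed to be built elsewhere from the
   overlapping IS (scatter_overlap, sub-KSP) and the non-overlapping IS
   (scatter_restrict, which touches only the non-overlap rows). */
typedef struct {
  PetscInt    nblocks;
  VecScatter *scatter_overlap;  /* forward: global residual -> local overlapping vec */
  VecScatter *scatter_restrict; /* forward: local correction -> global, non-overlap rows only */
  KSP        *subksp;           /* solver for each overlapping block */
  Vec        *rloc, *xloc;      /* work vectors sized to each overlapping block */
} RASMCtx;

/* One sweep of restricted additive Schwarz: solve on the overlapping blocks,
   but accumulate corrections only on the non-overlapping portions. */
PetscErrorCode RASMSweep(RASMCtx *ctx, Vec r, Vec x)
{
  PetscErrorCode ierr;
  PetscInt       i;

  PetscFunctionBeginUser;
  for (i = 0; i < ctx->nblocks; i++) {
    ierr = VecScatterBegin(ctx->scatter_overlap[i], r, ctx->rloc[i], INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = VecScatterEnd(ctx->scatter_overlap[i], r, ctx->rloc[i], INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = KSPSolve(ctx->subksp[i], ctx->rloc[i], ctx->xloc[i]);CHKERRQ(ierr);
    ierr = VecScatterBegin(ctx->scatter_restrict[i], ctx->xloc[i], x, ADD_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = VecScatterEnd(ctx->scatter_restrict[i], ctx->xloc[i], x, ADD_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}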

Also, could this explain why the locally multiplicative version of PCASM seems 
to perform the same as (or worse than) the locally additive version?
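
(For reference, we request the locally multiplicative composition either with
-pc_asm_local_type multiplicative on the command line or with a call along the
lines of the sketch below; the UseMultiplicativeASM wrapper is just a
placeholder.)

#include <petscksp.h>

/* Sketch: switch a smoother's PCASM to the locally multiplicative variant. */
PetscErrorCode UseMultiplicativeASM(KSP smoother)
{
  PetscErrorCode ierr;
  PC             pc;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(smoother, &pc);CHKERRQ(ierr);
  ierr = PCASMSetLocalType(pc, PC_COMPOSITE_MULTIPLICATIVE);CHKERRQ(ierr); /* same as -pc_asm_local_type multiplicative */
  PetscFunctionReturn(0);
}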

>   Sorry I wasted your time, but it doesn't look like there is anything useful 
> for you with PCASM; it needs to be completely refactored.
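
For completeness, the way we have been handing subdomains to PCASM looks
roughly like the sketch below (the two-block, stride-based index layout and
the ConfigureASMSmoother helper are placeholders rather than our actual code);
once PCASM is refactored we should be able to reuse the same IS'es:

#include <petscksp.h>

/* Sketch: register explicit overlapping (is[]) and non-overlapping
   (is_local[]) subdomains with PCASM so that the restrict/interpolate
   variants can take effect. Splits this rank's contiguous row range
   [first_row, first_row + n_local_rows) into two blocks; assumes
   n_local_rows is comfortably larger than 2*overlap. */
PetscErrorCode ConfigureASMSmoother(PC pc, PetscInt first_row, PetscInt n_local_rows)
{
  PetscErrorCode ierr;
  const PetscInt nsub = 2, overlap = 2;  /* two blocks per rank, 2 rows of overlap */
  PetscInt       half = n_local_rows / 2, i;
  IS             is[2], is_local[2];

  PetscFunctionBeginUser;
  /* non-overlapping blocks tile this rank's rows exactly */
  ierr = ISCreateStride(PETSC_COMM_SELF, half, first_row, 1, &is_local[0]);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, n_local_rows - half, first_row + half, 1, &is_local[1]);CHKERRQ(ierr);
  /* overlapping blocks contain the corresponding non-overlapping block plus
     `overlap` extra rows toward the middle of the rank */
  ierr = ISCreateStride(PETSC_COMM_SELF, half + overlap, first_row, 1, &is[0]);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, n_local_rows - half + overlap, first_row + half - overlap, 1, &is[1]);CHKERRQ(ierr);

  ierr = PCSetType(pc, PCASM);CHKERRQ(ierr);
  ierr = PCASMSetLocalSubdomains(pc, nsub, is, is_local);CHKERRQ(ierr);
  ierr = PCASMSetType(pc, PC_ASM_RESTRICT);CHKERRQ(ierr); /* or PC_ASM_BASIC / _INTERPOLATE / _NONE */

  /* PCASM takes its own reference to each IS, so we can drop ours here */
  for (i = 0; i < nsub; i++) {
    ierr = ISDestroy(&is[i]);CHKERRQ(ierr);
    ierr = ISDestroy(&is_local[i]);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}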
> 
>  Barry
> 
> 
> 
> 
> 
>> On Aug 5, 2016, at 1:26 AM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>> 
>> 
>>> On Aug 4, 2016, at 9:52 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>> 
>>> 
>>>  The magic handling of _1_ etc is all done in 
>>> PetscOptionsFindPair_Private(), so you need to put a breakpoint in that 
>>> routine and see why the requested value is not located.
>> 
>> I haven’t tracked down the source of the problem with using _1_ etc, but I 
>> have checked to see what happens if I switch between 
>> basic/restrict/interpolate/none “manually” on each level, and I still see 
>> the same results for all choices.
>> 
>> I’ve checked the IS’es and am reasonably confident that they are being 
>> generated correctly for the “overlap” and “non-overlap” regions. It is 
>> definitely the case that each overlap region contains the corresponding 
>> non-overlap region, and the overlap region is bigger (by the proper amount) 
>> than the non-overlap region.
>> 
>> It looks like ksp/ksp/examples/tutorials/ex8.c uses PCASMSetLocalSubdomains 
>> to set up the subdomains for ASM. If I run this example using, e.g.,
>> 
>> ./ex8 -m 100 -n 100 -Mdomains 8 -Ndomains 8 -user_set_subdomains -ksp_rtol 
>> 1.0e-3 -ksp_monitor -pc_asm_type XXXX
>> 
>> I get the exact same results for all of the different ASM types. I checked 
>> (using -ksp_view) that the ASM type settings were being honored. Are these 
>> subdomains not being set up to include overlaps (in which case I guess all 
>> ASM versions would yield the same results)?
>> 
>> Thanks,
>> 
>> — Boyce
>> 
>>> 
>>> Barry
>>> 
>>> 
>>>> On Aug 4, 2016, at 9:46 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>> 
>>>> 
>>>>> On Aug 4, 2016, at 9:41 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>>> 
>>>>> 
>>>>>> On Aug 4, 2016, at 9:26 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On Aug 4, 2016, at 9:01 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On Aug 4, 2016, at 8:51 PM, Boyce Griffith <griff...@cims.nyu.edu> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Aug 4, 2016, at 8:42 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> History,
>>>>>>>>> 
>>>>>>>>> 1) I originally implemented the ASM with one subdomain per process
>>>>>>>>> 2) easily extended to support multiple domains per process
>>>>>>>>> 3) added -pc_asm_type restrict etc but it only worked for one 
>>>>>>>>> subdomain per process because it took advantage of the fact that 
>>>>>>>>> restrict etc could be achieved by simply dropping the parallel 
>>>>>>>>> communication in the vector scatters
>>>>>>>>> 4) Matt didn't like the restriction to one subdomain per process, so 
>>>>>>>>> he added an additional argument to PCASMSetLocalSubdomains() that 
>>>>>>>>> allowed passing in the overlapping and non-overlapping regions of 
>>>>>>>>> each domain (foolishly calling the non-overlapping index set is_local 
>>>>>>>>> even though local has nothing to do with it), so that the restrict 
>>>>>>>>> etc could be handled.
>>>>>>>>> 
>>>>>>>>> Unfortunately IMHO Matt made a mess of things because if you use 
>>>>>>>>> things like -pc_asm_blocks n or -pc_asm_overlap 1 etc it does not 
>>>>>>>>> handle -pc_asm_type restrict since it cannot track the is vs 
>>>>>>>>> is_local. The code needs to be refactored so that things like 
>>>>>>>>> -pc_asm_blocks and -pc_asm_overlap 1 can track the is vs is_local 
>>>>>>>>> index sets properly when the -pc_asm_type is set. Also the name 
>>>>>>>>> is_local needs to be changed to something meaningful like 
>>>>>>>>> is_nonoverlapping. This refactoring would also result in easier, 
>>>>>>>>> cleaner code than is currently there.
>>>>>>>>> 
>>>>>>>>> So basically, until PCASM is refactored properly to handle 
>>>>>>>>> restrict etc, you are stuck with being able to use the restrict etc 
>>>>>>>>> ONLY if you specifically supply the overlapping and non-overlapping 
>>>>>>>>> domains yourself with PCASMSetLocalSubdomains and curse at Matt 
>>>>>>>>> every day like we all do.
>>>>>>>> 
>>>>>>>> OK, got it. The reason I’m asking is that we are using PCASM in a 
>>>>>>>> custom smoother, and I noticed that basic/restrict/interpolate/none 
>>>>>>>> all give identical results. We are using PCASMSetLocalSubdomains to 
>>>>>>>> set up the subdomains.
>>>>>>> 
>>>>>>>  But are you setting different is and is_local (stupid name) and not 
>>>>>>> having PETSc compute the overlap in your custom code? If you are 
>>>>>>> setting them differently and not having PETSc compute overlap but 
>>>>>>> getting identical convergence, then something is wrong and you likely 
>>>>>>> have to run in the debugger to ensure that restrict etc is properly 
>>>>>>> being set and used.
>>>>>> 
>>>>>> Yes, we are computing both overlapping and non-overlapping IS’es.
>>>>>> 
>>>>>> I just double-checked, and somehow the ASMType setting is not making it 
>>>>>> from the command line into the solver configuration — sorry, I should 
>>>>>> have checked this more carefully before emailing the list. (I thought 
>>>>>> that the command line options were being captured correctly, since I am 
>>>>>> able to control the PC type and all of the sub-KSP/sub-PC settings.)
>>>>> 
>>>>> OK, so here is what appears to be happening. These solvers are named 
>>>>> things like “stokes_pc_level_0_”, “stokes_pc_level_1_”, … . If I use the 
>>>>> command-line argument
>>>>> 
>>>>>   -stokes_ib_pc_level_0_pc_asm_type basic
>>>>> 
>>>>> then the ASM settings are used, but if I do:
>>>>> 
>>>>>   -stokes_ib_pc_level_pc_asm_type basic
>>>>> 
>>>>> they are ignored. Any ideas? :-)
>>>> 
>>>> I should have said: we are playing around with a lot of different command 
>>>> line options that are being collectively applied to all of the level 
>>>> solvers, and these options for ASM are the only ones I’ve encountered so 
>>>> far that have to include the level number to have an effect.
>>>> 
>>>> Thanks,
>>>> 
>>>> — Boyce
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> — Boyce
>>>>> 
>>>>>>>> BTW, there is also this bit (which was easy to overlook in all of the 
>>>>>>>> repetitive convergence histories):
>>>>>>> 
>>>>>>> Yeah, better one question per email or we will miss them.
>>>>>>> 
>>>>>>> There is nothing that says that multiplicative will ALWAYS beat 
>>>>>>> additive, though intuitively you expect it to.
>>>>>> 
>>>>>> OK, so it's a similar story to the above: we have a custom MSM that, 
>>>>>> when used as an MG smoother, gives convergence rates that are about 2x 
>>>>>> those of PCASM, whereas when we use PCASM with MULTIPLICATIVE, it 
>>>>>> doesn’t seem to help.
>>>>>> 
>>>>>> However, now I am questioning whether the settings are getting 
>>>>>> propagated into PCASM… I’ll need to take another look.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> — Boyce
>>>>>> 
>>>>>>> 
>>>>>>> Barry
>>>>>>> 
>>>>>>>> 
>>>>>>>>>> Also, the MULTIPLICATIVE variant does not seem to behave as I would 
>>>>>>>>>> expect --- for this same example, if you switch from ADDITIVE to 
>>>>>>>>>> MULTIPLICATIVE, the solver converges slightly more slowly:
>>>>>>>>>> 
>>>>>>>>>> $ ./ex2 -m 32 -n 32 -pc_type asm -pc_asm_blocks 8 -ksp_view 
>>>>>>>>>> -ksp_monitor_true_residual -pc_asm_local_type MULTIPLICATIVE
>>>>>>>>>> 0 KSP preconditioned resid norm 7.467363913958e+00 true resid norm 
>>>>>>>>>> 1.166190378969e+01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>>>>> 1 KSP preconditioned resid norm 2.878371937592e+00 true resid norm 
>>>>>>>>>> 3.646367718253e+00 ||r(i)||/||b|| 3.126734522949e-01
>>>>>>>>>> 2 KSP preconditioned resid norm 1.666575161021e+00 true resid norm 
>>>>>>>>>> 1.940699059619e+00 ||r(i)||/||b|| 1.664135714560e-01
>>>>>>>>>> 3 KSP preconditioned resid norm 1.086140238220e+00 true resid norm 
>>>>>>>>>> 1.191473615464e+00 ||r(i)||/||b|| 1.021680196433e-01
>>>>>>>>>> 4 KSP preconditioned resid norm 7.939217314942e-01 true resid norm 
>>>>>>>>>> 8.059317628307e-01 ||r(i)||/||b|| 6.910807852344e-02
>>>>>>>>>> 5 KSP preconditioned resid norm 6.265169154675e-01 true resid norm 
>>>>>>>>>> 5.942294290555e-01 ||r(i)||/||b|| 5.095475316653e-02
>>>>>>>>>> 6 KSP preconditioned resid norm 5.164999302721e-01 true resid norm 
>>>>>>>>>> 4.585844476718e-01 ||r(i)||/||b|| 3.932329197203e-02
>>>>>>>>>> 7 KSP preconditioned resid norm 4.472399844370e-01 true resid norm 
>>>>>>>>>> 3.884049472908e-01 ||r(i)||/||b|| 3.330544946136e-02
>>>>>>>>>> 8 KSP preconditioned resid norm 3.445446366213e-01 true resid norm 
>>>>>>>>>> 4.008290378967e-01 ||r(i)||/||b|| 3.437080644166e-02
>>>>>>>>>> 9 KSP preconditioned resid norm 1.987509894375e-01 true resid norm 
>>>>>>>>>> 2.619628925380e-01 ||r(i)||/||b|| 2.246313271505e-02
>>>>>>>>>> 10 KSP preconditioned resid norm 1.084551743751e-01 true resid norm 
>>>>>>>>>> 1.354891040098e-01 ||r(i)||/||b|| 1.161809481995e-02
>>>>>>>>>> 11 KSP preconditioned resid norm 6.108303419460e-02 true resid norm 
>>>>>>>>>> 7.252267103275e-02 ||r(i)||/||b|| 6.218767736436e-03
>>>>>>>>>> 12 KSP preconditioned resid norm 3.641579250431e-02 true resid norm 
>>>>>>>>>> 4.069996187932e-02 ||r(i)||/||b|| 3.489992938829e-03
>>>>>>>>>> 13 KSP preconditioned resid norm 2.424898818735e-02 true resid norm 
>>>>>>>>>> 2.469590201945e-02 ||r(i)||/||b|| 2.117656127577e-03
>>>>>>>>>> 14 KSP preconditioned resid norm 1.792399391125e-02 true resid norm 
>>>>>>>>>> 1.622090905110e-02 ||r(i)||/||b|| 1.390931475995e-03
>>>>>>>>>> 15 KSP preconditioned resid norm 1.320657155648e-02 true resid norm 
>>>>>>>>>> 1.336753101147e-02 ||r(i)||/||b|| 1.146256327657e-03
>>>>>>>>>> 16 KSP preconditioned resid norm 7.398524571182e-03 true resid norm 
>>>>>>>>>> 9.747691680405e-03 ||r(i)||/||b|| 8.358576657974e-04
>>>>>>>>>> 17 KSP preconditioned resid norm 3.043993613039e-03 true resid norm 
>>>>>>>>>> 3.848714422908e-03 ||r(i)||/||b|| 3.300245390731e-04
>>>>>>>>>> 18 KSP preconditioned resid norm 1.767867968946e-03 true resid norm 
>>>>>>>>>> 1.736586340170e-03 ||r(i)||/||b|| 1.489110501585e-04
>>>>>>>>>> 19 KSP preconditioned resid norm 1.088792656005e-03 true resid norm 
>>>>>>>>>> 1.307506936484e-03 ||r(i)||/||b|| 1.121177948355e-04
>>>>>>>>>> 20 KSP preconditioned resid norm 4.622653682144e-04 true resid norm 
>>>>>>>>>> 5.718427718734e-04 ||r(i)||/||b|| 4.903511315013e-05
>>>>>>>>>> 21 KSP preconditioned resid norm 2.591703287585e-04 true resid norm 
>>>>>>>>>> 2.690982547548e-04 ||r(i)||/||b|| 2.307498497738e-05
>>>>>>>>>> 22 KSP preconditioned resid norm 1.596527181997e-04 true resid norm 
>>>>>>>>>> 1.715846687846e-04 ||r(i)||/||b|| 1.471326396435e-05
>>>>>>>>>> 23 KSP preconditioned resid norm 1.006766623019e-04 true resid norm 
>>>>>>>>>> 1.044525361282e-04 ||r(i)||/||b|| 8.956731080268e-06
>>>>>>>>>> 24 KSP preconditioned resid norm 5.349814270060e-05 true resid norm 
>>>>>>>>>> 6.598682341705e-05 ||r(i)||/||b|| 5.658323427037e-06
>>>>>>>>>> KSP Object: 1 MPI processes
>>>>>>>>>> type: gmres
>>>>>>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
>>>>>>>>>> Orthogonalization with no iterative refinement
>>>>>>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>>>>>>> maximum iterations=10000, initial guess is zero
>>>>>>>>>> tolerances:  relative=9.18274e-06, absolute=1e-50, divergence=10000.
>>>>>>>>>> left preconditioning
>>>>>>>>>> using PRECONDITIONED norm type for convergence test
>>>>>>>>>> PC Object: 1 MPI processes
>>>>>>>>>> type: asm
>>>>>>>>>> Additive Schwarz: total subdomain blocks = 8, amount of overlap = 1
>>>>>>>>>> Additive Schwarz: restriction/interpolation type - BASIC
>>>>>>>>>> Additive Schwarz: local solve composition type - MULTIPLICATIVE
>>>>>>>>>> Local solve is same for all blocks, in the following KSP and PC 
>>>>>>>>>> objects:
>>>>>>>>>> KSP Object:    (sub_)     1 MPI processes
>>>>>>>>>>  type: preonly
>>>>>>>>>>  maximum iterations=10000, initial guess is zero
>>>>>>>>>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>>>>>>  left preconditioning
>>>>>>>>>>  using NONE norm type for convergence test
>>>>>>>>>> PC Object:    (sub_)     1 MPI processes
>>>>>>>>>>  type: icc
>>>>>>>>>>    0 levels of fill
>>>>>>>>>>    tolerance for zero pivot 2.22045e-14
>>>>>>>>>>    using Manteuffel shift [POSITIVE_DEFINITE]
>>>>>>>>>>    matrix ordering: natural
>>>>>>>>>>    factor fill ratio given 1., needed 1.
>>>>>>>>>>      Factored matrix follows:
>>>>>>>>>>        Mat Object:             1 MPI processes
>>>>>>>>>>          type: seqsbaij
>>>>>>>>>>          rows=160, cols=160
>>>>>>>>>>          package used to perform factorization: petsc
>>>>>>>>>>          total: nonzeros=443, allocated nonzeros=443
>>>>>>>>>>          total number of mallocs used during MatSetValues calls =0
>>>>>>>>>>              block size is 1
>>>>>>>>>>  linear system matrix = precond matrix:
>>>>>>>>>>  Mat Object:       1 MPI processes
>>>>>>>>>>    type: seqaij
>>>>>>>>>>    rows=160, cols=160
>>>>>>>>>>    total: nonzeros=726, allocated nonzeros=726
>>>>>>>>>>    total number of mallocs used during MatSetValues calls =0
>>>>>>>>>>      not using I-node routines
>>>>>>>>>> linear system matrix = precond matrix:
>>>>>>>>>> Mat Object:   1 MPI processes
>>>>>>>>>> type: seqaij
>>>>>>>>>> rows=1024, cols=1024
>>>>>>>>>> total: nonzeros=4992, allocated nonzeros=5120
>>>>>>>>>> total number of mallocs used during MatSetValues calls =0
>>>>>>>>>>  not using I-node routines
>>>>>>>>>> Norm of error 0.000292304 iterations 24
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> -- Boyce
>>>>> 
>>>> 
>> 
