> On Aug 5, 2016, at 4:27 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>   I looked at the code (and read the manual page better)
>
>   PC_ASM_BASIC       - full interpolation and restriction
>   PC_ASM_RESTRICT    - full restriction, local processor interpolation
>   PC_ASM_INTERPOLATE - full interpolation, local processor restriction
>   PC_ASM_NONE        - local processor restriction and interpolation
>
>   It is not doing what you and I assumed it was doing. The restrict and
> interpolate are only short-circuited (skipped) across processes; any
> restriction and interpolation within an MPI process is always done. Thus in
> sequential runs the different variants will make no difference. I don't think
> I would have written it this way.
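For reference, here is a minimal sketch of selecting one of these variants through the C API; PCASMSetType and the PCASMType values are the standard PETSc names, while the helper function and its surrounding setup are purely illustrative:

    #include <petscksp.h>

    /* Illustrative sketch: choose the ASM variant on a KSP's preconditioner.
       Per the note above, in a sequential run all four PCASMType values
       behave identically, because restriction/interpolation within an MPI
       process is never skipped. */
    PetscErrorCode choose_asm_variant(KSP ksp)
    {
      PetscErrorCode ierr;
      PC             pc;

      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCASM);CHKERRQ(ierr);
      /* Alternatives: PC_ASM_BASIC, PC_ASM_INTERPOLATE, PC_ASM_NONE */
      ierr = PCASMSetType(pc, PC_ASM_RESTRICT);CHKERRQ(ierr);
      return 0;
    }

The same choice is exposed on the command line as -pc_asm_type basic|restrict|interpolate|none.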
Thanks, Barry --- I think that this explains some weird results we've been
getting when trying to use PCASM with small subdomains as a smoother (e.g.,
performance degrades with larger overlaps). At least for convergence
benchmarking, we can get away with using a simple implementation of RASM.

Also, could this explain why the locally multiplicative version of PCASM
seems to perform the same as (or worse than) the locally additive version?

> Sorry I wasted your time, but it doesn't look like there is anything useful
> for you with PCASM; it needs to be completely refactored.
>
>   Barry
>
>> On Aug 5, 2016, at 1:26 AM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>
>>> On Aug 4, 2016, at 9:52 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>
>>>   The magic handling of _1_ etc. is all done in
>>> PetscOptionsFindPair_Private(), so you need to put a break point in that
>>> routine and see why the requested value is not located.
>>
>> I haven’t tracked down the source of the problem with using _1_ etc., but I
>> have checked to see what happens if I switch between
>> basic/restrict/interpolate/none “manually” on each level, and I still see
>> the same results for all choices.
>>
>> I’ve checked the IS’es and am reasonably confident that they are being
>> generated correctly for the “overlap” and “non-overlap” regions. It is
>> definitely the case that each overlap region contains the corresponding
>> non-overlap region, and the overlap region is bigger (by the proper amount)
>> than the non-overlap region.
>>
>> It looks like ksp/ksp/examples/tutorials/ex8.c uses PCASMSetLocalSubdomains
>> to set up the subdomains for ASM. If I run this example using, e.g.,
>>
>> ./ex8 -m 100 -n 100 -Mdomains 8 -Ndomains 8 -user_set_subdomains -ksp_rtol 1.0e-3 -ksp_monitor -pc_asm_type XXXX
>>
>> I get exactly the same results for all of the different ASM types. I
>> checked (using -ksp_view) that the ASM type settings were being honored.
>> Are these subdomains not being set up to include overlaps (in which case I
>> guess all ASM versions would yield the same results)?
>>
>> Thanks,
>>
>> — Boyce
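To make the is/is_local pairing in ex8.c concrete, here is a minimal sketch, under the assumption of a single-process run with a 1D partition of N rows into nsub equal blocks, each extended by ov rows of overlap on either side; PCASMSetLocalSubdomains and ISCreateStride are the standard PETSc calls, and everything else (names, partitioning) is illustrative:

    /* Supply both the overlapping (is) and non-overlapping (is_local)
       index sets, so that restrict/interpolate can differ from basic. */
    PetscErrorCode set_asm_subdomains(PC pc, PetscInt N, PetscInt nsub, PetscInt ov)
    {
      PetscErrorCode ierr;
      PetscInt       bs = N / nsub, i;  /* assumes nsub divides N */
      IS             *is, *is_local;

      ierr = PetscMalloc2(nsub, &is, nsub, &is_local);CHKERRQ(ierr);
      for (i = 0; i < nsub; i++) {
        PetscInt start = i * bs;                  /* non-overlapping block */
        PetscInt lo    = PetscMax(start - ov, 0); /* widen by ov each side */
        PetscInt hi    = PetscMin(start + bs + ov, N);
        ierr = ISCreateStride(PETSC_COMM_SELF, hi - lo, lo, 1, &is[i]);CHKERRQ(ierr);
        ierr = ISCreateStride(PETSC_COMM_SELF, bs, start, 1, &is_local[i]);CHKERRQ(ierr);
      }
      ierr = PCASMSetLocalSubdomains(pc, nsub, is, is_local);CHKERRQ(ierr);
      /* PCASM takes its own reference to the ISes (in recent PETSc), so we
         can release ours here; ex8.c instead destroys them after the solve. */
      for (i = 0; i < nsub; i++) {
        ierr = ISDestroy(&is[i]);CHKERRQ(ierr);
        ierr = ISDestroy(&is_local[i]);CHKERRQ(ierr);
      }
      ierr = PetscFree2(is, is_local);CHKERRQ(ierr);
      return 0;
    }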
>>
>>>
>>>   Barry
>>>
>>>> On Aug 4, 2016, at 9:46 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>>
>>>>> On Aug 4, 2016, at 9:41 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>>>
>>>>>> On Aug 4, 2016, at 9:26 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>>>>
>>>>>>> On Aug 4, 2016, at 9:01 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>> On Aug 4, 2016, at 8:51 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>>>>>>
>>>>>>>>> On Aug 4, 2016, at 8:42 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>   History:
>>>>>>>>>
>>>>>>>>>   1) I originally implemented the ASM with one subdomain per process.
>>>>>>>>>   2) It was easily extended to support multiple subdomains per process.
>>>>>>>>>   3) I added -pc_asm_type restrict etc., but it only worked for one
>>>>>>>>> subdomain per process, because it took advantage of the fact that
>>>>>>>>> restrict etc. could be achieved by simply dropping the parallel
>>>>>>>>> communication in the vector scatters.
>>>>>>>>>   4) Matt didn't like the restriction to one subdomain per process, so
>>>>>>>>> he added an additional argument to PCASMSetLocalSubdomains() that
>>>>>>>>> allowed passing in the overlapping and non-overlapping regions of
>>>>>>>>> each domain (foolishly calling the non-overlapping index set
>>>>>>>>> is_local, even though it has nothing to do with locality), so that
>>>>>>>>> restrict etc. could be handled.
>>>>>>>>>
>>>>>>>>>   Unfortunately, IMHO Matt made a mess of things, because if you use
>>>>>>>>> things like -pc_asm_blocks n or -pc_asm_overlap 1 etc., it does not
>>>>>>>>> handle -pc_asm_type restrict, since it cannot track is vs. is_local.
>>>>>>>>> The code needs to be refactored so that things like -pc_asm_blocks
>>>>>>>>> and -pc_asm_overlap 1 can track the is vs. is_local index sets
>>>>>>>>> properly when -pc_asm_type is set. Also, the name is_local needs to
>>>>>>>>> be changed to something meaningful like is_nonoverlapping. This
>>>>>>>>> refactoring would also result in easier, cleaner code than is
>>>>>>>>> currently there.
>>>>>>>>>
>>>>>>>>>   So basically, until PCASM is refactored properly to handle restrict
>>>>>>>>> etc., you are stuck with being able to use restrict etc. ONLY if you
>>>>>>>>> specifically supply the overlapping and non-overlapping domains
>>>>>>>>> yourself with PCASMSetLocalSubdomains, and curse at Matt every day
>>>>>>>>> like we all do.
>>>>>>>>
>>>>>>>> OK, got it. The reason I’m asking is that we are using PCASM in a
>>>>>>>> custom smoother, and I noticed that basic/restrict/interpolate/none
>>>>>>>> all give identical results. We are using PCASMSetLocalSubdomains to
>>>>>>>> set up the subdomains.
>>>>>>>
>>>>>>>   But are you setting different is and is_local (stupid name) and not
>>>>>>> having PETSc compute the overlap in your custom code? If you are
>>>>>>> setting them differently and not having PETSc compute overlap but
>>>>>>> getting identical convergence, then something is wrong, and you likely
>>>>>>> have to run in the debugger to ensure that restrict etc. is properly
>>>>>>> being set and used.
>>>>>>
>>>>>> Yes, we are computing overlapping and non-overlapping IS’es.
>>>>>>
>>>>>> I just double-checked, and somehow the ASMType setting is not making it
>>>>>> from the command line into the solver configuration — sorry, I should
>>>>>> have checked this more carefully before emailing the list.
>>>>>> (I thought that the command line options were being captured correctly,
>>>>>> since I am able to control the PC type and all of the sub-KSP/sub-PC
>>>>>> settings.)
>>>>>
>>>>> OK, so here is what appears to be happening. These solvers are named
>>>>> things like “stokes_ib_pc_level_0_”, “stokes_ib_pc_level_1_”, … . If I
>>>>> use the command-line argument
>>>>>
>>>>> -stokes_ib_pc_level_0_pc_asm_type basic
>>>>>
>>>>> then the ASM settings are used, but if I do:
>>>>>
>>>>> -stokes_ib_pc_level_pc_asm_type basic
>>>>>
>>>>> they are ignored. Any ideas? :-)
>>>>
>>>> I should have said: we are playing around with a lot of different command
>>>> line options that are being collectively applied to all of the level
>>>> solvers, and these ASM options are the only ones I’ve encountered so far
>>>> that have to include the level number to have an effect.
>>>>
>>>> Thanks,
>>>>
>>>> — Boyce
>>>>
>>>>> Thanks,
>>>>>
>>>>> — Boyce
>>>>>
>>>>>>>> BTW, there is also this bit (which was easy to overlook in all of the
>>>>>>>> repetitive convergence histories):
>>>>>>>
>>>>>>>   Yeah, better one question per email or we will miss them.
>>>>>>>
>>>>>>>   There is nothing that says that multiplicative will ALWAYS beat
>>>>>>> additive, though intuitively you expect it to.
>>>>>>
>>>>>> OK, so similar story as above: we have a custom MSM that, when used as
>>>>>> an MG smoother, gives convergence rates that are about 2x PCASM,
>>>>>> whereas when we use PCASM with MULTIPLICATIVE, it doesn’t seem to help.
>>>>>>
>>>>>> However, now I am questioning whether the settings are getting
>>>>>> propagated into PCASM… I’ll need to take another look.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> — Boyce
>>>>>>
>>>>>>>   Barry
>>>>>>>
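For context on the prefix mechanics discussed above: each level solver presumably receives its prefix with KSPSetOptionsPrefix, along the lines of this sketch (the prefix string follows the thread; the loop and function name are illustrative). Why the numberless form of the ASM option fails to match here, while other collectively applied options work, is left unresolved in this exchange.

    /* Give each level smoother its own options prefix, so that, e.g.,
       -stokes_ib_pc_level_0_pc_asm_type restrict reaches level 0 only.
       The attached PC shares the KSP's prefix. */
    PetscErrorCode name_level_solvers(KSP *level_ksp, PetscInt nlevels)
    {
      PetscErrorCode ierr;
      PetscInt       l;
      char           prefix[64];

      for (l = 0; l < nlevels; l++) {
        ierr = PetscSNPrintf(prefix, sizeof(prefix), "stokes_ib_pc_level_%D_", l);CHKERRQ(ierr);
        ierr = KSPSetOptionsPrefix(level_ksp[l], prefix);CHKERRQ(ierr);
        ierr = KSPSetFromOptions(level_ksp[l]);CHKERRQ(ierr);
      }
      return 0;
    }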
>>>>>>>>>> Also, the MULTIPLICATIVE variant does not seem to behave as I would
>>>>>>>>>> expect --- for this same example, if you switch from ADDITIVE to
>>>>>>>>>> MULTIPLICATIVE, the solver converges slightly more slowly:
>>>>>>>>>>
>>>>>>>>>> $ ./ex2 -m 32 -n 32 -pc_type asm -pc_asm_blocks 8 -ksp_view -ksp_monitor_true_residual -pc_asm_local_type MULTIPLICATIVE
>>>>>>>>>> 0 KSP preconditioned resid norm 7.467363913958e+00 true resid norm 1.166190378969e+01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>>>>> 1 KSP preconditioned resid norm 2.878371937592e+00 true resid norm 3.646367718253e+00 ||r(i)||/||b|| 3.126734522949e-01
>>>>>>>>>> 2 KSP preconditioned resid norm 1.666575161021e+00 true resid norm 1.940699059619e+00 ||r(i)||/||b|| 1.664135714560e-01
>>>>>>>>>> 3 KSP preconditioned resid norm 1.086140238220e+00 true resid norm 1.191473615464e+00 ||r(i)||/||b|| 1.021680196433e-01
>>>>>>>>>> 4 KSP preconditioned resid norm 7.939217314942e-01 true resid norm 8.059317628307e-01 ||r(i)||/||b|| 6.910807852344e-02
>>>>>>>>>> 5 KSP preconditioned resid norm 6.265169154675e-01 true resid norm 5.942294290555e-01 ||r(i)||/||b|| 5.095475316653e-02
>>>>>>>>>> 6 KSP preconditioned resid norm 5.164999302721e-01 true resid norm 4.585844476718e-01 ||r(i)||/||b|| 3.932329197203e-02
>>>>>>>>>> 7 KSP preconditioned resid norm 4.472399844370e-01 true resid norm 3.884049472908e-01 ||r(i)||/||b|| 3.330544946136e-02
>>>>>>>>>> 8 KSP preconditioned resid norm 3.445446366213e-01 true resid norm 4.008290378967e-01 ||r(i)||/||b|| 3.437080644166e-02
>>>>>>>>>> 9 KSP preconditioned resid norm 1.987509894375e-01 true resid norm 2.619628925380e-01 ||r(i)||/||b|| 2.246313271505e-02
>>>>>>>>>> 10 KSP preconditioned resid norm 1.084551743751e-01 true resid norm 1.354891040098e-01 ||r(i)||/||b|| 1.161809481995e-02
>>>>>>>>>> 11 KSP preconditioned resid norm 6.108303419460e-02 true resid norm 7.252267103275e-02 ||r(i)||/||b|| 6.218767736436e-03
>>>>>>>>>> 12 KSP preconditioned resid norm 3.641579250431e-02 true resid norm 4.069996187932e-02 ||r(i)||/||b|| 3.489992938829e-03
>>>>>>>>>> 13 KSP preconditioned resid norm 2.424898818735e-02 true resid norm 2.469590201945e-02 ||r(i)||/||b|| 2.117656127577e-03
>>>>>>>>>> 14 KSP preconditioned resid norm 1.792399391125e-02 true resid norm 1.622090905110e-02 ||r(i)||/||b|| 1.390931475995e-03
>>>>>>>>>> 15 KSP preconditioned resid norm 1.320657155648e-02 true resid norm 1.336753101147e-02 ||r(i)||/||b|| 1.146256327657e-03
>>>>>>>>>> 16 KSP preconditioned resid norm 7.398524571182e-03 true resid norm 9.747691680405e-03 ||r(i)||/||b|| 8.358576657974e-04
>>>>>>>>>> 17 KSP preconditioned resid norm 3.043993613039e-03 true resid norm 3.848714422908e-03 ||r(i)||/||b|| 3.300245390731e-04
>>>>>>>>>> 18 KSP preconditioned resid norm 1.767867968946e-03 true resid norm 1.736586340170e-03 ||r(i)||/||b|| 1.489110501585e-04
>>>>>>>>>> 19 KSP preconditioned resid norm 1.088792656005e-03 true resid norm 1.307506936484e-03 ||r(i)||/||b|| 1.121177948355e-04
>>>>>>>>>> 20 KSP preconditioned resid norm 4.622653682144e-04 true resid norm 5.718427718734e-04 ||r(i)||/||b|| 4.903511315013e-05
>>>>>>>>>> 21 KSP preconditioned resid norm 2.591703287585e-04 true resid norm 2.690982547548e-04 ||r(i)||/||b|| 2.307498497738e-05
>>>>>>>>>> 22 KSP preconditioned resid norm 1.596527181997e-04 true resid norm 1.715846687846e-04 ||r(i)||/||b|| 1.471326396435e-05
>>>>>>>>>> 23 KSP preconditioned resid norm 1.006766623019e-04 true resid norm 1.044525361282e-04 ||r(i)||/||b|| 8.956731080268e-06
>>>>>>>>>> 24 KSP preconditioned resid norm 5.349814270060e-05 true resid norm 6.598682341705e-05 ||r(i)||/||b|| 5.658323427037e-06
>>>>>>>>>> KSP Object: 1 MPI processes
>>>>>>>>>>   type: gmres
>>>>>>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>>>>>>   maximum iterations=10000, initial guess is zero
>>>>>>>>>>   tolerances: relative=9.18274e-06, absolute=1e-50, divergence=10000.
>>>>>>>>>>   left preconditioning
>>>>>>>>>>   using PRECONDITIONED norm type for convergence test
>>>>>>>>>> PC Object: 1 MPI processes
>>>>>>>>>>   type: asm
>>>>>>>>>>     Additive Schwarz: total subdomain blocks = 8, amount of overlap = 1
>>>>>>>>>>     Additive Schwarz: restriction/interpolation type - BASIC
>>>>>>>>>>     Additive Schwarz: local solve composition type - MULTIPLICATIVE
>>>>>>>>>>     Local solve is same for all blocks, in the following KSP and PC objects:
>>>>>>>>>>   KSP Object: (sub_) 1 MPI processes
>>>>>>>>>>     type: preonly
>>>>>>>>>>     maximum iterations=10000, initial guess is zero
>>>>>>>>>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>>>>>>     left preconditioning
>>>>>>>>>>     using NONE norm type for convergence test
>>>>>>>>>>   PC Object: (sub_) 1 MPI processes
>>>>>>>>>>     type: icc
>>>>>>>>>>       0 levels of fill
>>>>>>>>>>       tolerance for zero pivot 2.22045e-14
>>>>>>>>>>       using Manteuffel shift [POSITIVE_DEFINITE]
>>>>>>>>>>       matrix ordering: natural
>>>>>>>>>>       factor fill ratio given 1., needed 1.
>>>>>>>>>>         Factored matrix follows:
>>>>>>>>>>           Mat Object: 1 MPI processes
>>>>>>>>>>             type: seqsbaij
>>>>>>>>>>             rows=160, cols=160
>>>>>>>>>>             package used to perform factorization: petsc
>>>>>>>>>>             total: nonzeros=443, allocated nonzeros=443
>>>>>>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>>>>>>             block size is 1
>>>>>>>>>>     linear system matrix = precond matrix:
>>>>>>>>>>     Mat Object: 1 MPI processes
>>>>>>>>>>       type: seqaij
>>>>>>>>>>       rows=160, cols=160
>>>>>>>>>>       total: nonzeros=726, allocated nonzeros=726
>>>>>>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>>>>>>       not using I-node routines
>>>>>>>>>>   linear system matrix = precond matrix:
>>>>>>>>>>   Mat Object: 1 MPI processes
>>>>>>>>>>     type: seqaij
>>>>>>>>>>     rows=1024, cols=1024
>>>>>>>>>>     total: nonzeros=4992, allocated nonzeros=5120
>>>>>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>>>>>     not using I-node routines
>>>>>>>>>> Norm of error 0.000292304 iterations 24
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -- Boyce