> On Aug 4, 2016, at 9:52 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>    The magic handling of _1_ etc is all done in PetscOptionsFindPair_Private() so you need to put a break point in that routine and see why the requested value is not located.
I haven’t tracked down the source of the problem with using _1_ etc, but I have checked to see what happens if I switch between basic/restrict/interpolate/none “manually” on each level, and I still see the same results for all choices. I’ve checked the IS’es and am reasonably confident that they are being generated correctly for the “overlap” and “non-overlap” regions. It is definitely the case that the overlap region contains the non-overlap region, and that the overlap region is bigger (by the proper amount) than the non-overlap region. (Sketches of the kind of subdomain setup we are doing, and of how the per-level solver prefixes are assigned, are appended at the very bottom of this message, below the quoted thread; both are illustrative rather than verbatim code.)

It looks like ksp/ksp/examples/tutorials/ex8.c uses PCASMSetLocalSubdomains to set up the subdomains for ASM. If I run this example using, e.g.,

./ex8 -m 100 -n 100 -Mdomains 8 -Ndomains 8 -user_set_subdomains -ksp_rtol 1.0e-3 -ksp_monitor -pc_asm_type XXXX

I get exactly the same results for all of the different ASM types. I checked (using -ksp_view) that the ASM type settings were being honored. Are these subdomains not being set up to include overlaps (in which case I guess all ASM versions would yield the same results)?

Thanks,

— Boyce

>
>    Barry
>
>
>> On Aug 4, 2016, at 9:46 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>
>>
>>> On Aug 4, 2016, at 9:41 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>
>>>
>>>> On Aug 4, 2016, at 9:26 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>>
>>>>>
>>>>> On Aug 4, 2016, at 9:01 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>
>>>>>>
>>>>>> On Aug 4, 2016, at 8:51 PM, Boyce Griffith <griff...@cims.nyu.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Aug 4, 2016, at 8:42 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>
>>>>>>>    History:
>>>>>>>
>>>>>>>    1) I originally implemented the ASM with one subdomain per process.
>>>>>>>    2) Easily extended to support multiple subdomains per process.
>>>>>>>    3) Added -pc_asm_type restrict etc., but it only worked for one subdomain per process because it took advantage of the fact that restrict etc. could be achieved by simply dropping the parallel communication in the vector scatters.
>>>>>>>    4) Matt didn't like the restriction to one subdomain per process, so he added an additional argument to PCASMSetLocalSubdomains() that allowed passing in the overlapping and non-overlapping regions of each domain (foolishly calling the non-overlapping index set is_local, even though it has nothing to do with locality), so that restrict etc. could be handled.
>>>>>>>
>>>>>>>    Unfortunately, IMHO Matt made a mess of things, because if you use things like -pc_asm_blocks n or -pc_asm_overlap 1 it does not handle -pc_asm_type restrict, since it cannot track is vs. is_local. The code needs to be refactored so that things like -pc_asm_blocks and -pc_asm_overlap 1 can track the is vs. is_local index sets properly when -pc_asm_type is set. Also, the name is_local needs to be changed to something meaningful like is_nonoverlapping. This refactoring would also result in easier, cleaner code than is currently there.
>>>>>>>
>>>>>>>    So basically, until PCASM is refactored properly to handle restrict etc., you are stuck with being able to use restrict etc. ONLY if you specifically supply the overlapping and non-overlapping subdomains yourself with PCASMSetLocalSubdomains, and curse at Matt every day like we all do.
>>>>>>
>>>>>> OK, got it.
>>>>>> The reason I’m asking is that we are using PCASM in a custom smoother, and I noticed that basic/restrict/interpolate/none all give identical results. We are using PCASMSetLocalSubdomains to set up the subdomains.
>>>>>
>>>>>    But are you setting different is and is_local (stupid name) and not having PETSc compute the overlap in your custom code? If you are setting them differently and not having PETSc compute the overlap, but are getting identical convergence, then something is wrong, and you likely have to run in the debugger to ensure that restrict etc. is properly being set and used.
>>>>
>>>> Yes, we are computing overlapping and non-overlapping IS’es.
>>>>
>>>> I just double-checked, and somehow the ASMType setting is not making it from the command line into the solver configuration — sorry, I should have checked this more carefully before emailing the list. (I thought that the command-line options were being captured correctly, since I am able to control the PC type and all of the sub-KSP/sub-PC settings.)
>>>
>>> OK, so here is what appears to be happening. These solvers are named things like “stokes_pc_level_0_”, “stokes_pc_level_1_”, … If I use the command-line argument
>>>
>>> -stokes_ib_pc_level_0_pc_asm_type basic
>>>
>>> then the ASM settings are used, but if I do:
>>>
>>> -stokes_ib_pc_level_pc_asm_type basic
>>>
>>> they are ignored. Any ideas? :-)
>>
>> I should have said: we are playing around with a lot of different command-line options that are being collectively applied to all of the level solvers, and these options for ASM are the only ones I’ve encountered so far that have to include the level number to have an effect.
>>
>> Thanks,
>>
>> — Boyce
>>
>>>
>>> Thanks,
>>>
>>> — Boyce
>>>
>>>>>> BTW, there is also this bit (which was easy to overlook in all of the repetitive convergence histories):
>>>>>
>>>>>    Yeah, better one question per email or we will miss them.
>>>>>
>>>>>    There is nothing that says that multiplicative will ALWAYS beat additive, though intuitively you expect it to.
>>>>
>>>> OK, so a similar story to the above: we have a custom MSM that, when used as an MG smoother, gives convergence rates that are about 2x those of PCASM, whereas when we use PCASM with MULTIPLICATIVE, it doesn’t seem to help.
>>>>
>>>> However, now I am questioning whether the settings are getting propagated into PCASM… I’ll need to take another look.
>>>>
>>>> Thanks,
>>>>
>>>> — Boyce
>>>>
>>>>>
>>>>>    Barry
>>>>>
>>>>>>
>>>>>>>> Also, the MULTIPLICATIVE variant does not seem to behave as I would expect --- for this same example, if you switch from ADDITIVE to MULTIPLICATIVE, the solver converges slightly more slowly:
>>>>>>>>
>>>>>>>> $ ./ex2 -m 32 -n 32 -pc_type asm -pc_asm_blocks 8 -ksp_view -ksp_monitor_true_residual -pc_asm_local_type MULTIPLICATIVE
>>>>>>>>   0 KSP preconditioned resid norm 7.467363913958e+00 true resid norm 1.166190378969e+01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>>>   1 KSP preconditioned resid norm 2.878371937592e+00 true resid norm 3.646367718253e+00 ||r(i)||/||b|| 3.126734522949e-01
>>>>>>>>   2 KSP preconditioned resid norm 1.666575161021e+00 true resid norm 1.940699059619e+00 ||r(i)||/||b|| 1.664135714560e-01
>>>>>>>>   3 KSP preconditioned resid norm 1.086140238220e+00 true resid norm 1.191473615464e+00 ||r(i)||/||b|| 1.021680196433e-01
>>>>>>>>   4 KSP preconditioned resid norm 7.939217314942e-01 true resid norm 8.059317628307e-01 ||r(i)||/||b|| 6.910807852344e-02
>>>>>>>>   5 KSP preconditioned resid norm 6.265169154675e-01 true resid norm 5.942294290555e-01 ||r(i)||/||b|| 5.095475316653e-02
>>>>>>>>   6 KSP preconditioned resid norm 5.164999302721e-01 true resid norm 4.585844476718e-01 ||r(i)||/||b|| 3.932329197203e-02
>>>>>>>>   7 KSP preconditioned resid norm 4.472399844370e-01 true resid norm 3.884049472908e-01 ||r(i)||/||b|| 3.330544946136e-02
>>>>>>>>   8 KSP preconditioned resid norm 3.445446366213e-01 true resid norm 4.008290378967e-01 ||r(i)||/||b|| 3.437080644166e-02
>>>>>>>>   9 KSP preconditioned resid norm 1.987509894375e-01 true resid norm 2.619628925380e-01 ||r(i)||/||b|| 2.246313271505e-02
>>>>>>>>  10 KSP preconditioned resid norm 1.084551743751e-01 true resid norm 1.354891040098e-01 ||r(i)||/||b|| 1.161809481995e-02
>>>>>>>>  11 KSP preconditioned resid norm 6.108303419460e-02 true resid norm 7.252267103275e-02 ||r(i)||/||b|| 6.218767736436e-03
>>>>>>>>  12 KSP preconditioned resid norm 3.641579250431e-02 true resid norm 4.069996187932e-02 ||r(i)||/||b|| 3.489992938829e-03
>>>>>>>>  13 KSP preconditioned resid norm 2.424898818735e-02 true resid norm 2.469590201945e-02 ||r(i)||/||b|| 2.117656127577e-03
>>>>>>>>  14 KSP preconditioned resid norm 1.792399391125e-02 true resid norm 1.622090905110e-02 ||r(i)||/||b|| 1.390931475995e-03
>>>>>>>>  15 KSP preconditioned resid norm 1.320657155648e-02 true resid norm 1.336753101147e-02 ||r(i)||/||b|| 1.146256327657e-03
>>>>>>>>  16 KSP preconditioned resid norm 7.398524571182e-03 true resid norm 9.747691680405e-03 ||r(i)||/||b|| 8.358576657974e-04
>>>>>>>>  17 KSP preconditioned resid norm 3.043993613039e-03 true resid norm 3.848714422908e-03 ||r(i)||/||b|| 3.300245390731e-04
>>>>>>>>  18 KSP preconditioned resid norm 1.767867968946e-03 true resid norm 1.736586340170e-03 ||r(i)||/||b|| 1.489110501585e-04
>>>>>>>>  19 KSP preconditioned resid norm 1.088792656005e-03 true resid norm 1.307506936484e-03 ||r(i)||/||b|| 1.121177948355e-04
>>>>>>>>  20 KSP preconditioned resid norm 4.622653682144e-04 true resid norm 5.718427718734e-04 ||r(i)||/||b|| 4.903511315013e-05
>>>>>>>>  21 KSP preconditioned resid norm 2.591703287585e-04 true resid norm 2.690982547548e-04 ||r(i)||/||b|| 2.307498497738e-05
>>>>>>>>  22 KSP preconditioned resid norm 1.596527181997e-04 true resid norm 1.715846687846e-04 ||r(i)||/||b|| 1.471326396435e-05
>>>>>>>>  23 KSP preconditioned resid norm 1.006766623019e-04 true resid norm 1.044525361282e-04 ||r(i)||/||b|| 8.956731080268e-06
>>>>>>>>  24 KSP preconditioned resid norm 5.349814270060e-05 true resid norm 6.598682341705e-05 ||r(i)||/||b|| 5.658323427037e-06
>>>>>>>> KSP Object: 1 MPI processes
>>>>>>>>   type: gmres
>>>>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>>>>   maximum iterations=10000, initial guess is zero
>>>>>>>>   tolerances:  relative=9.18274e-06, absolute=1e-50, divergence=10000.
>>>>>>>>   left preconditioning
>>>>>>>>   using PRECONDITIONED norm type for convergence test
>>>>>>>> PC Object: 1 MPI processes
>>>>>>>>   type: asm
>>>>>>>>     Additive Schwarz: total subdomain blocks = 8, amount of overlap = 1
>>>>>>>>     Additive Schwarz: restriction/interpolation type - BASIC
>>>>>>>>     Additive Schwarz: local solve composition type - MULTIPLICATIVE
>>>>>>>>     Local solve is same for all blocks, in the following KSP and PC objects:
>>>>>>>>   KSP Object: (sub_) 1 MPI processes
>>>>>>>>     type: preonly
>>>>>>>>     maximum iterations=10000, initial guess is zero
>>>>>>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>>>>     left preconditioning
>>>>>>>>     using NONE norm type for convergence test
>>>>>>>>   PC Object: (sub_) 1 MPI processes
>>>>>>>>     type: icc
>>>>>>>>       0 levels of fill
>>>>>>>>       tolerance for zero pivot 2.22045e-14
>>>>>>>>       using Manteuffel shift [POSITIVE_DEFINITE]
>>>>>>>>       matrix ordering: natural
>>>>>>>>       factor fill ratio given 1., needed 1.
>>>>>>>>         Factored matrix follows:
>>>>>>>>           Mat Object: 1 MPI processes
>>>>>>>>             type: seqsbaij
>>>>>>>>             rows=160, cols=160
>>>>>>>>             package used to perform factorization: petsc
>>>>>>>>             total: nonzeros=443, allocated nonzeros=443
>>>>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>>>>               block size is 1
>>>>>>>>     linear system matrix = precond matrix:
>>>>>>>>     Mat Object: 1 MPI processes
>>>>>>>>       type: seqaij
>>>>>>>>       rows=160, cols=160
>>>>>>>>       total: nonzeros=726, allocated nonzeros=726
>>>>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>>>>         not using I-node routines
>>>>>>>>   linear system matrix = precond matrix:
>>>>>>>>   Mat Object: 1 MPI processes
>>>>>>>>     type: seqaij
>>>>>>>>     rows=1024, cols=1024
>>>>>>>>     total: nonzeros=4992, allocated nonzeros=5120
>>>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>>>       not using I-node routines
>>>>>>>> Norm of error 0.000292304 iterations 24
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -- Boyce
>>>
>>
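P.S. For concreteness, here is a minimal sketch of the kind of subdomain setup being discussed above: both the overlapping index sets ("is") and the non-overlapping index sets ("is_local") are built explicitly and handed to PCASMSetLocalSubdomains, so that -pc_asm_type basic/restrict/interpolate/none can actually differ. The problem (a small 1-D Laplacian on one process, two subdomains, one grid point of overlap) and all of the sizes are made up purely for illustration; this is not our actual code.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PC             pc;
  IS             is[2], is_local[2];
  PetscInt       i, n = 16;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* Assemble a 1-D Laplacian (illustrative problem only). */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  for (i = 0; i < n; ++i) {
    if (i > 0)     {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n - 1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Non-overlapping subdomains: [0,8) and [8,16).
     Overlapping subdomains:     [0,9) and [7,16), i.e. one extra point each. */
  ierr = ISCreateStride(PETSC_COMM_SELF, 9, 0, 1, &is[0]);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, 9, 7, 1, &is[1]);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, 8, 0, 1, &is_local[0]);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, 8, 8, 1, &is_local[1]);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCASM);CHKERRQ(ierr);
  /* Hand PCASM both the overlapping ("is") and non-overlapping ("is_local")
     index sets, and do not ask PETSc to extend the overlap any further. */
  ierr = PCASMSetLocalSubdomains(pc, 2, is, is_local);CHKERRQ(ierr);
  ierr = PCASMSetOverlap(pc, 0);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = ISDestroy(&is[0]);CHKERRQ(ierr);
  ierr = ISDestroy(&is[1]);CHKERRQ(ierr);
  ierr = ISDestroy(&is_local[0]);CHKERRQ(ierr);
  ierr = ISDestroy(&is_local[1]);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Running this with -pc_asm_type basic versus -pc_asm_type restrict (plus -ksp_monitor) should give different iteration histories, since is and is_local actually differ.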
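P.P.S. Regarding the per-level option prefixes mentioned above: the level solvers get their prefixes along the following lines. The function name, the level_ksp array, and nlevels are made up for this sketch (they are not the names in our actual code); only the prefix pattern "stokes_ib_pc_level_<k>_" corresponds to the options discussed above.

#include <petscksp.h>

/* Illustrative only: give each level's smoother KSP a numbered prefix such as
   "stokes_ib_pc_level_0_", "stokes_ib_pc_level_1_", ... so that, e.g.,
   -stokes_ib_pc_level_0_pc_asm_type basic reaches the level-0 solver. */
PetscErrorCode SetLevelSolverPrefixes(KSP level_ksp[], PetscInt nlevels)
{
  char           prefix[64];
  PetscInt       l;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  for (l = 0; l < nlevels; ++l) {
    ierr = PetscSNPrintf(prefix, sizeof(prefix), "stokes_ib_pc_level_%D_", l);CHKERRQ(ierr);
    ierr = KSPSetOptionsPrefix(level_ksp[l], prefix);CHKERRQ(ierr);
    /* Pick up any -stokes_ib_pc_level_<l>_* options from the options database. */
    ierr = KSPSetFromOptions(level_ksp[l]);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}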