Once I knew the issue, I was able to get it worked out by doing a detailed read-through of our split IS generation. Working great (at least on this test problem) now!
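For reference, a minimal sketch of the consistency check discussed in the thread below: the local sizes of the split index sets should add up to the local row size of the Jacobian/preconditioning matrix on every rank. The helper name CheckSplitSizes and the arguments jac, n_splits, and is_split are hypothetical, not MOOSE code.

#include <petscis.h>
#include <petscmat.h>

/* Hypothetical helper (not MOOSE code): check that the split index sets account
 * for every local row of the (preconditioning) matrix. A mismatch like the
 * "8 dofs short" case described below shows up as total != m on some rank. */
static PetscErrorCode CheckSplitSizes(Mat jac, PetscInt n_splits, IS is_split[])
{
  PetscErrorCode ierr;
  PetscInt       m, i, total = 0;

  PetscFunctionBegin;
  ierr = MatGetLocalSize(jac, &m, NULL);
  CHKERRQ(ierr);
  for (i = 0; i < n_splits; ++i)
  {
    PetscInt len;
    ierr = ISGetLocalSize(is_split[i], &len);
    CHKERRQ(ierr);
    total += len;
  }
  if (total != m)
    SETERRQ2(PETSC_COMM_SELF,
             PETSC_ERR_PLIB,
             "Split IS local sizes sum to %D but the matrix has %D local rows",
             total,
             m);
  PetscFunctionReturn(0);
}

Such a check could be run right after the split ISes are generated, before they are handed to PCFIELDSPLIT.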
On Wed, Nov 9, 2022 at 12:45 PM Matthew Knepley <[email protected]> wrote:

On Wed, Nov 9, 2022 at 1:45 PM Alexander Lindsay <[email protected]> wrote:

Ok, I've figured out that we are definitely messing something up in our split index set generation. For process 1 our Jac/PMat local size is 14307, but split 0 IS local size is 4129 and split 1 IS local size is 10170, so that leaves us 8 dofs short.

I know now where I need to dig in our field decomposition. Thanks Matt for helping me process through this stuff!

Cool. The one piece of advice I have is to make the problem ruthlessly small. Even if it seems hard, it is worth the time to get it down to a size you can print to the screen.

Thanks,

Matt

On Tue, Nov 8, 2022 at 4:53 PM Alexander Lindsay <[email protected]> wrote:

This is from our DMCreateFieldDecomposition_Moose routine. The IS size on process 1 (which is the process from which I took the error in the original post) is reported as 4129, which is consistent with the local row size of A00.

Split '0' has local size 4129 on processor 1
Split '0' has local size 4484 on processor 6
Split '0' has local size 4471 on processor 12
Split '0' has local size 4040 on processor 14
Split '0' has local size 3594 on processor 20
Split '0' has local size 4423 on processor 22
Split '0' has local size 2791 on processor 27
Split '0' has local size 3014 on processor 29
Split '0' has local size 3183 on processor 30
Split '0' has local size 3328 on processor 3
Split '0' has local size 4689 on processor 4
Split '0' has local size 8016 on processor 8
Split '0' has local size 6367 on processor 10
Split '0' has local size 5973 on processor 17
Split '0' has local size 4431 on processor 18
Split '0' has local size 7564 on processor 25
Split '0' has local size 12504 on processor 9
Split '0' has local size 10081 on processor 11
Split '0' has local size 13808 on processor 24
Split '0' has local size 14049 on processor 31
Split '0' has local size 15324 on processor 7
Split '0' has local size 15337 on processor 15
Split '0' has local size 14849 on processor 19
Split '0' has local size 15660 on processor 23
Split '0' has local size 14728 on processor 26
Split '0' has local size 15724 on processor 28
Split '0' has local size 17249 on processor 5
Split '0' has local size 15519 on processor 13
Split '0' has local size 16511 on processor 16
Split '0' has local size 16496 on processor 21
Split '0' has local size 18291 on processor 2
Split '0' has local size 18042 on processor 0

On Mon, Nov 7, 2022 at 6:04 PM Matthew Knepley <[email protected]> wrote:

On Mon, Nov 7, 2022 at 5:48 PM Alexander Lindsay <[email protected]> wrote:

My understanding, looking at PCFieldSplitSetDefaults, is that our implementation of `createfielddecomposition` should get called, we'll set `fields`, and then (ignoring a possible user setting of the -pc_fieldsplit_%D_fields flag) PCFieldSplitSetIS will get called with whatever we did to `fields`. So yeah, just looking that over, I would assume we're not supplying two different index sets for rows and columns, or, put more precisely, we (MOOSE) are not really afforded the opportunity to. But my interpretation could very well be wrong.
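A minimal sketch of the interface being described, assuming two splits; the function name SetTwoSplits and the index sets is_u and is_p are hypothetical, not MOOSE code. Each split is registered with a single IS, which PCFIELDSPLIT then uses for both the rows and the columns of that block.

#include <petscksp.h>

/* Hedged sketch (not MOOSE code): register two splits by index set. Each split
 * gets exactly one IS, so the same index set defines that split's rows and
 * columns; separate row/column index sets cannot be supplied through this path. */
static PetscErrorCode SetTwoSplits(KSP ksp, IS is_u, IS is_p)
{
  PetscErrorCode ierr;
  PC             pc;

  PetscFunctionBegin;
  ierr = KSPGetPC(ksp, &pc);
  CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);
  CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "0", is_u);
  CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "1", is_p);
  CHKERRQ(ierr);
  PetscFunctionReturn(0);
}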
Oh wait. I read the error message again. It does not say that the whole selection is rectangular. It says

Local columns of A10 4137 do not equal local rows of A00 4129

So this is a parallel partitioning thing. Since A00 has 4129 local rows, it should have this many columns as well. However, A10 has 4137 local columns. How big is IS_0, on each process, that you pass in to PCFIELDSPLIT?

Thanks,

Matt
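A minimal sketch of one way to answer that question, reporting the local size of each split IS on every rank in rank order, as in the "Split '0' has local size ... on processor ..." listing above; the helper ReportSplitSizes and the arguments n_splits and is_split are hypothetical, not MOOSE code.

#include <petscis.h>

/* Hedged sketch (not MOOSE code): print each split IS's local size per rank.
 * PetscSynchronizedPrintf() defers output until the flush, so lines appear in
 * rank order. */
static PetscErrorCode ReportSplitSizes(MPI_Comm comm, PetscInt n_splits, IS is_split[])
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       i, len;

  PetscFunctionBegin;
  ierr = MPI_Comm_rank(comm, &rank);
  CHKERRQ(ierr);
  for (i = 0; i < n_splits; ++i)
  {
    ierr = ISGetLocalSize(is_split[i], &len);
    CHKERRQ(ierr);
    ierr = PetscSynchronizedPrintf(comm, "Split '%D' has local size %D on processor %d\n", i, len, rank);
    CHKERRQ(ierr);
  }
  ierr = PetscSynchronizedFlush(comm, PETSC_STDOUT);
  CHKERRQ(ierr);
  PetscFunctionReturn(0);
}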
On Mon, Nov 7, 2022 at 12:33 PM Matthew Knepley <[email protected]> wrote:

On Mon, Nov 7, 2022 at 2:09 PM Alexander Lindsay <[email protected]> wrote:

The libMesh/MOOSE specific code that identifies dof indices for ISCreateGeneral is in DMMooseGetEmbedding_Private. I can share that function (it's quite long) or more details if that could be helpful.

Sorry, I should have written more. The puzzling thing for me is that somehow it looks like the row and column index sets are not the same. I did not think PCFIELDSPLIT could do that. The PCFieldSplitSetIS() interface does not allow it. I was wondering how you were setting the ISes.

Thanks,

Matt

On Mon, Nov 7, 2022 at 10:55 AM Alexander Lindsay <[email protected]> wrote:

I'm not sure exactly what you mean, but I'll try to give more details. We have our own DM class (DM_Moose), and we set our own field and domain decomposition routines:

  dm->ops->createfielddecomposition = DMCreateFieldDecomposition_Moose;

  dm->ops->createdomaindecomposition = DMCreateDomainDecomposition_Moose;

The field and domain decomposition routines are as follows (see also https://github.com/idaholab/moose/blob/next/framework/src/utils/PetscDMMoose.C):

static PetscErrorCode
DMCreateFieldDecomposition_Moose(
    DM dm, PetscInt * len, char *** namelist, IS ** islist, DM ** dmlist)
{
  PetscErrorCode ierr;
  DM_Moose * dmm = (DM_Moose *)(dm->data);

  PetscFunctionBegin;
  /* Only called after DMSetUp(). */
  if (!dmm->_splitlocs)
    PetscFunctionReturn(0);
  *len = dmm->_splitlocs->size();
  if (namelist)
  {
    ierr = PetscMalloc(*len * sizeof(char *), namelist);
    CHKERRQ(ierr);
  }
  if (islist)
  {
    ierr = PetscMalloc(*len * sizeof(IS), islist);
    CHKERRQ(ierr);
  }
  if (dmlist)
  {
    ierr = PetscMalloc(*len * sizeof(DM), dmlist);
    CHKERRQ(ierr);
  }
  for (const auto & dit : *(dmm->_splitlocs))
  {
    unsigned int d = dit.second;
    std::string dname = dit.first;
    DM_Moose::SplitInfo & dinfo = (*dmm->_splits)[dname];
    if (!dinfo._dm)
    {
      ierr = DMCreateMoose(((PetscObject)dm)->comm, *dmm->_nl, &dinfo._dm);
      CHKERRQ(ierr);
      ierr = PetscObjectSetOptionsPrefix((PetscObject)dinfo._dm, ((PetscObject)dm)->prefix);
      CHKERRQ(ierr);
      std::string suffix = std::string("fieldsplit_") + dname + "_";
      ierr = PetscObjectAppendOptionsPrefix((PetscObject)dinfo._dm, suffix.c_str());
      CHKERRQ(ierr);
    }
    ierr = DMSetFromOptions(dinfo._dm);
    CHKERRQ(ierr);
    ierr = DMSetUp(dinfo._dm);
    CHKERRQ(ierr);
    if (namelist)
    {
      ierr = PetscStrallocpy(dname.c_str(), (*namelist) + d);
      CHKERRQ(ierr);
    }
    if (islist)
    {
      if (!dinfo._rembedding)
      {
        IS dembedding, lembedding;
        ierr = DMMooseGetEmbedding_Private(dinfo._dm, &dembedding);
        CHKERRQ(ierr);
        if (dmm->_embedding)
        {
          // Create a relative embedding into the parent's index space.
          ierr = ISEmbed(dembedding, dmm->_embedding, PETSC_TRUE, &lembedding);
          CHKERRQ(ierr);
          const PetscInt * lindices;
          PetscInt len, dlen, llen, *rindices, off, i;
          ierr = ISGetLocalSize(dembedding, &dlen);
          CHKERRQ(ierr);
          ierr = ISGetLocalSize(lembedding, &llen);
          CHKERRQ(ierr);
          if (llen != dlen)
            SETERRQ1(((PetscObject)dm)->comm, PETSC_ERR_PLIB, "Failed to embed split %D", d);
          ierr = ISDestroy(&dembedding);
          CHKERRQ(ierr);
          // Convert local embedding to global (but still relative) embedding
          ierr = PetscMalloc(llen * sizeof(PetscInt), &rindices);
          CHKERRQ(ierr);
          ierr = ISGetIndices(lembedding, &lindices);
          CHKERRQ(ierr);
          ierr = PetscMemcpy(rindices, lindices, llen * sizeof(PetscInt));
          CHKERRQ(ierr);
          ierr = ISDestroy(&lembedding);
          CHKERRQ(ierr);
          // We could get the index offset from a corresponding global vector, but subDMs don't yet
          // have global vectors
          ierr = ISGetLocalSize(dmm->_embedding, &len);
          CHKERRQ(ierr);

          ierr = MPI_Scan(&len,
                          &off,
                          1,
#ifdef PETSC_USE_64BIT_INDICES
                          MPI_LONG_LONG_INT,
#else
                          MPI_INT,
#endif
                          MPI_SUM,
                          ((PetscObject)dm)->comm);
          CHKERRQ(ierr);

          off -= len;
          for (i = 0; i < llen; ++i)
            rindices[i] += off;
          ierr = ISCreateGeneral(
              ((PetscObject)dm)->comm, llen, rindices, PETSC_OWN_POINTER, &(dinfo._rembedding));
          CHKERRQ(ierr);
        }
        else
        {
          dinfo._rembedding = dembedding;
        }
      }
      ierr = PetscObjectReference((PetscObject)(dinfo._rembedding));
      CHKERRQ(ierr);
      (*islist)[d] = dinfo._rembedding;
    }
    if (dmlist)
    {
      ierr = PetscObjectReference((PetscObject)dinfo._dm);
      CHKERRQ(ierr);
      (*dmlist)[d] = dinfo._dm;
    }
  }
  PetscFunctionReturn(0);
}
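As an aside on the offset computation in DMCreateFieldDecomposition_Moose above: a standalone sketch of the same pattern, in which an inclusive MPI_Scan of the local lengths, minus this rank's own length, gives the global offset at which this rank's locally numbered indices start. The helper name LocalToGlobalOffset is hypothetical; MPIU_INT is the PETSc MPI datatype matching PetscInt, standing in for the #ifdef PETSC_USE_64BIT_INDICES selection used in the MOOSE code.

#include <petscsys.h>

/* Hedged sketch (not MOOSE code): compute the global starting offset of this
 * rank's locally numbered indices from the local length on each rank. */
static PetscErrorCode LocalToGlobalOffset(MPI_Comm comm, PetscInt len, PetscInt *off)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MPI_Scan(&len, off, 1, MPIU_INT, MPI_SUM, comm);
  CHKERRQ(ierr);
  *off -= len; /* convert the inclusive scan into an exclusive one */
  PetscFunctionReturn(0);
}

The corresponding domain decomposition routine from the same file follows.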
static PetscErrorCode
DMCreateDomainDecomposition_Moose(
    DM dm, PetscInt * len, char *** namelist, IS ** innerislist, IS ** outerislist, DM ** dmlist)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* Use DMCreateFieldDecomposition_Moose() to obtain everything but outerislist, which is currently
   * PETSC_NULL. */
  if (outerislist)
    *outerislist = PETSC_NULL; /* FIX: allow mesh-based overlap. */
  ierr = DMCreateFieldDecomposition_Moose(dm, len, namelist, innerislist, dmlist);
  CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

On Thu, Nov 3, 2022 at 5:19 PM Matthew Knepley <[email protected]> wrote:

On Thu, Nov 3, 2022 at 7:52 PM Alexander Lindsay <[email protected]> wrote:

When performing field splits, I have errors like the following on quite a few (but not all) processes:

[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: Nonconforming object sizes
[1]PETSC ERROR: Local columns of A10 4137 do not equal local rows of A00 4129

We (MOOSE) have some code for identifying the index sets for each split. However, the code was written by authors who are no longer with us. Normally I would chase this down in a debugger, but this error only seems to crop up for pretty complex and large meshes. If anyone has an idea of what we might be doing wrong, that might help me chase this down faster. Intuitively I'm pretty perplexed that we could get ourselves into this pickle, as it almost appears that we have two different local dof index counts for a given block (0 in this case). More background, if helpful, can be found in https://github.com/idaholab/moose/issues/22359 as well as https://github.com/idaholab/moose/discussions/22468.

I should note that we are currently running with 3.16.6 as our PETSc submodule hash (we are talking about updating to 3.18 soon).

How are you specifying the blocks? I would not have thought this was possible.

Thanks,

Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
   -- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
