> On Aug 20, 2019, at 10:55 AM, zakaryah . via petsc-users 
> <[email protected]> wrote:
> 
> Thanks Barry - adding -matstash_legacy seemed to fix everything.  Does it 
> look like I also stumbled onto the pathological case?

  Yes, 
> 
> On Tue, Aug 20, 2019 at 2:07 AM Smith, Barry F. <[email protected]> wrote:
> 
>   Before the latest release we changed the way we handled communicating 
> matrix entries that needed to be sent between processors; normally that way 
> would be much faster but another user reported a pathologically case where 
> the PetscSortIntWithArrayPair() was producing a huge number of nested calls. 
> That user actually provided a fix (a better sort) that resolved his problem. 
> 
>  You can either run with -matstash_legacy (may need a prefix on the this your 
> matrix has a prefix) to get the old behavior or 
> 
>  You can switch to the master branch in the PETSc repository version to use 
> it.  Note we have switched to Gitlab for our repository. You can get the 
> master branch with  git clone https://gitlab.com/petsc/petsc
> 
>   Please let us know if this does not resolve your problem
> 
>   Barry
> 
> 
> 
> 
> 
> > On Aug 20, 2019, at 12:42 AM, zakaryah . via petsc-users 
> > <[email protected]> wrote:
> > 
> > I'm working on solving a nonlinear system with a DMComposite, which is 
> > composed of a DMRedundant and a DMDA.  The DMRedundant has a single dof, 
> > while the DMDA can be large (n grid points, where n is a few million).  The 
> > Jacobian is then (n+1)x(n+1).  I set it by calling MatGetLocalSubMatrix 4 
> > times, to get the submatrices J11 (1 x 1), J12 (1 x n), J21 (n x 1), and 
> > J22 (n x n).  In the past, setting the values for these matrices seemed to 
> > work fine - the Jacobian was correct, the code was valgrind quiet, etc.  
> > Since then, something has changed - I went from v. 3.11.0 to 3.11.3, my 
> > sysadmin may have fiddled with MPI, I'm experimenting with different 
> > problem sizes and number of processors, etc.
> > 
> > Now I am getting segfaults, and the valgrind trace has not been terribly 
> > illuminating.  Both valgrind and the debugger indicate the segfault 
> > originating in PetscSortIntWithArrayPair_Private, with the back trace 
> > showing nested calls to this function running *thousands* deep...  I'm not 
> > sure but I suspect that the trace to PetscSortIntWithArrayPair_Private 
> > originates in MatAssemblyBegin.
> > 
> > I've created a working example which is minimal for my purposes (although 
> > probably not fun for anyone else to inspect), and I've narrowed the issue 
> > down to the code which sets the values of J12, the submatrix with a single 
> > row (the DMRedundant dof) and many columns (n).  If I don't set any values 
> > in that submatrix, and change the function so that this Jacobian is 
> > correct, then the code goes back to being valgrind quiet, completing 
> > correctly (for the modified problem), and not throwing a segfault.  If I 
> > run in serial, it also works.
> > 
> > Searching the archive, I found an old thread about segfaults in 
> > MatAssemblyBegin when a very large number of values were set between 
> > flushes, and Matt said that even though there should not be a segfault, it 
> > is a good idea in practice to frequently flush the assembly within a loop.
> > 
> > I guess my question is whether you think the segfault I am getting is due 
> > to setting too many matrix values, and how I should go about flushing the 
> > assembly.  Since I add values to J12 using MatSetValuesLocal, do I need to 
> > call MatRestoreLocalSubmatrix to restore J12, MatAssemblyBegin and End with 
> > MAT_FLUSH_ASSEMBLY on the full Jacobian, then call MatGetLocalSubmatrix to 
> > get J12 again, all inside the loop?
> > 
> > One last point - when I call MatSetValuesLocal on J12, I do that on all 
> > processors, with each processor accessing only the columns which correspond 
> > to the grid points it owns.  Is there anything wrong with doing that?  Do I 
> > need to only set values on the processor which owns the DMRedundant?  If I 
> > call MatGetOwnershipRange on J12, it appears that all processes own the 
> > matrix's single row.
> > 
> > Thanks for any help you can share.
> > 
> > 
> > 
> 

Reply via email to