Please send your "simple" code that attempts to do the appropriate 
preallocation and we'll look at why it is failing.

  Barry


> On Nov 2, 2017, at 4:03 PM, zakaryah . <[email protected]> wrote:
> 
> I ran MatView with the smallest possible grid size.  The elements which are 
> nonzero after the first assembly but not after the matrix is created are 
> exactly the couplings, i.e. row and column 0 off the diagonal, which I'm 
> trying to preallocate, and eventually place in the second call to 
> FormCoupleLocations, although right now it's only being called once - not 
> sure why prealloc_only is set - maybe because I have 
> MAT_NEW_NONZERO_ALLOCATION_ERR set to false?  I'm not sure why the Jbh terms 
> result in so few additional mallocs - are columns allocated in chunks on the 
> fly?  For example, when my DMDA size is 375, adding values to Jhb results in 
> 375 additional mallocs, but adding values to Jbh only adds 25 mallocs.
> 
> I guess my FormCoupleLocations is pretty incomplete, because incrementing dnz 
> and onz doesn't seem to have any effect.  I also don't know how this function 
> should behave - as I said before, writing to dnz[i] and onz[i] where i goes 
> from rstart to rstart+nrows-1 causes a segfault on processors >0, and if I 
> inspect those values of dnz[i] and onz[i] by writing them to a file, they are 
> nonsense when processor >0.  Are dnz and onz local to the processor, i.e. 
> should my iterations go from 0 to nrows-1?  What is the general idea behind 
> setting the structure from within FormCoupleLocations?  I'm currently not 
> doing that at all.
> 
> Thanks for all the help!
> 
> On Thu, Nov 2, 2017 at 12:31 PM, Smith, Barry F. <[email protected]> wrote:
> 
> 
> > On Nov 1, 2017, at 10:32 PM, zakaryah . <[email protected]> wrote:
> >
> > I worked on the assumptions in my previous email and I at least partially 
> > implemented the function to assign the couplings.  For row 0, which is the 
> > redundant field, I set dnz[0] to end-start, and onz[0] to the size of the 
> > matrix minus dnz[0].  For all other rows, I just increment the existing 
> > values of dnz[i] and onz[i], since the coupling to the redundant field adds 
> > one extra element beyond what's allocated for the DMDA stencil.
> 
>   Sounds reasonable.
> >
> > I see in the source that the FormCoupleLocations function is called once if 
> > the DM has PREALLOC_ONLY set to true, but twice otherwise.  I assume that 
> > the second call is for setting the nonzero structure.
> 
>   Yes
> > Do I need to do this?
> 
>    You probably should.
> 
>    I would do a MatView() small DMDA on the matrix you obtain and then again 
> after you add the numerical values. This will show you what values are not 
> being properly allocated/put in when the matrix is created by the DM.
> >
> > In any case, something is still not right.  Even with the extra elements 
> > preallocated, the first assembly of the matrix is very slow.  I ran a test 
> > problem on a single process with -info, and got this:
> >
> > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 
> > used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> > 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode 
> > routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 
> > unneeded,629703 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> > 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using 
> > Inode routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 
> > unneeded,1 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> > 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode 
> > routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 
> > unneeded,629703 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> > 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using 
> > Inode routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 
> > unneeded,629704 used
> 
>   Yes, really bad.
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using 
> > Inode routines
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 
> > 30818112
> >
> > [0] DMGetDMSNES(): Creating new DMSNES
> >
> > [0] DMGetDMKSP(): Creating new DMKSP
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> >   0 SNES Function norm 2.302381528359e+00
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> > 30194064
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 
> > 126132 unneeded,647722 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using 
> > Inode routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 
> > unneeded,647722 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 
> > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> >
> >
> >
> > The 9610 mallocs during MatSetValues seem suspicious and are probably 
> > what's taking so long with larger problems.  601 of them are apparently in 
> > the call to set Jbh, and 9009 are in the call to set Jhb (b is the 
> > redundant field, h is the DMDA field).  If I run with more than one 
> > process, I get a segfault when a process which has rank greater than 0 sets 
> > dnz or onz in the FormCoupleLocations call.
> >
> >
> > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . <[email protected]> wrote:
> > Thanks Barry, that looks like exactly what I need.  I'm looking at pack.c 
> > and packm.c and I want to check my understanding of what my coupling 
> > function should do.  The relevant line in DMCreateMatrix_Composite_AIJ 
> > seems to be:
> >
> > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__nrows,__start,__end);
> >
> > and I infer that dnz and onz are the number of nonzero elements in the 
> > diagonal and off-diagonal submatrices, for each row of the DMComposite 
> > matrix.  I suppose I can just set each of these in a for loop, but can I 
> > use the arguments to FormCoupleLocations as the range for the loop?  Which 
> > ones - __rstart to __rstart+__nrows?  How can I determine the number of 
> > rows on each processor from within the function that I pass?  From the 
> > preallocation macros it looks like __start to __end describe the range of 
> > the columns of the diagonal submatrix - is that right?  It looks like the 
> > ranges will be specific to each processor.  Do I just set the values in dnz 
> > and onz, or do I need to reduce them?
> >
> > Thanks for all the help!  Maybe if I get things working I can carve out the 
> > core of the code to make an example program for DMRedundant/Composite.
> >
> 
> 

Reply via email to