> On Nov 1, 2017, at 10:32 PM, zakaryah . <[email protected]> wrote:
> 
> I worked on the assumptions in my previous email and I at least partially 
> implemented the function to assign the couplings.  For row 0, which is the 
> redundant field, I set dnz[0] to end-start, and onz[0] to the size of the 
> matrix minus dnz[0].  For all other rows, I just increment the existing 
> values of dnz[i] and onz[i], since the coupling to the redundant field adds 
> one extra element beyond what's allocated for the DMDA stencil.

  Sounds reasonable.
> 
> I see in the source that the FormCoupleLocations function is called once if 
> the DM has PREALLOC_ONLY set to true, but twice otherwise.  I assume that the 
> second call is for setting the nonzero structure.  

  Yes
> Do I need to do this?

   You probably should.

   I would do a MatView() small DMDA on the matrix you obtain and then again 
after you add the numerical values. This will show you what values are not 
being properly allocated/put in when the matrix is created by the DM.
> 
> In any case, something is still not right.  Even with the extra elements 
> preallocated, the first assembly of the matrix is very slow.  I ran a test 
> problem on a single process with -info, and got this:
> 
> 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 
> used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
> 1) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode 
> routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 
> unneeded,629703 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
> 9009) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using 
> Inode routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 
> used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
> 1) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode 
> routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 
> unneeded,629703 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
> 9009) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using 
> Inode routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 
> unneeded,629704 used

  Yes, really bad.
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
> 9010) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using 
> Inode routines
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 
> 30818112
> 
> [0] DMGetDMSNES(): Creating new DMSNES
> 
> [0] DMGetDMKSP(): Creating new DMKSP
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
>   0 SNES Function norm 2.302381528359e+00 
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 
> 30194064
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 126132 
> unneeded,647722 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
> 9010) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using 
> Inode routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 
> unneeded,647722 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
> 9010) < 0.6. Do not use CompressedRow routines.
> 
> 
> 
> The 9610 mallocs during MatSetValues seem suspicious and are probably what's 
> taking so long with larger problems.  601 of them are apparently in the call 
> to set Jbh, and 9009 are in the call to set Jhb (b is the redundant field, h 
> is the DMDA field).  If I run with more than one process, I get a segfault 
> when a process which has rank greater than 0 sets dnz or onz in the 
> FormCoupleLocations call.
> 
> 
> On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . <[email protected]> wrote:
> Thanks Barry, that looks like exactly what I need.  I'm looking at pack.c and 
> packm.c and I want to check my understanding of what my coupling function 
> should do.  The relevant line in DMCreateMatrix_Composite_AIJ seems to be:
> 
> (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__nrows,__start,__end);
> 
> and I infer that dnz and onz are the number of nonzero elements in the 
> diagonal and off-diagonal submatrices, for each row of the DMComposite 
> matrix.  I suppose I can just set each of these in a for loop, but can I use 
> the arguments to FormCoupleLocations as the range for the loop?  Which ones - 
> __rstart to __rstart+__nrows?  How can I determine the number of rows on each 
> processor from within the function that I pass?  From the preallocation 
> macros it looks like __start to __end describe the range of the columns of 
> the diagonal submatrix - is that right?  It looks like the ranges will be 
> specific to each processor.  Do I just set the values in dnz and onz, or do I 
> need to reduce them?
> 
> Thanks for all the help!  Maybe if I get things working I can carve out the 
> core of the code to make an example program for DMRedundant/Composite.
> 

Reply via email to