Please send your "simple" code that attempts to do the appropriate preallocation and we'll look at why it is failing.
Barry > On Nov 2, 2017, at 4:03 PM, zakaryah . <[email protected]> wrote: > > I ran MatView with the smallest possible grid size. The elements which are > nonzero after the first assembly but not after the matrix is created are > exactly the couplings, i.e. row and column 0 off the diagonal, which I'm > trying to preallocate, and eventually place in the second call to > FormCoupleLocations, although right now it's only being called once - not > sure why prealloc_only is set - maybe because I have > MAT_NEW_NONZERO_ALLOCATION_ERR set to false? I'm not sure why the Jbh terms > result in so few additional mallocs - are columns allocated in chunks on the > fly? For example, when my DMDA size is 375, adding values to Jhb results in > 375 additional mallocs, but adding values to Jbh only adds 25 mallocs. > > I guess my FormCoupleLocations is pretty incomplete, because incrementing dnz > and onz doesn't seem to have any effect. I also don't know how this function > should behave - as I said before, writing to dnz[i] and onz[i] where i goes > from rstart to rstart+nrows-1 causes a segfault on processors >0, and if I > inspect those values of dnz[i] and onz[i] by writing them to a file, they are > nonsense when processor >0. Are dnz and onz local to the processor, i.e. > should my iterations go from 0 to nrows-1? What is the general idea behind > setting the structure from within FormCoupleLocations? I'm currently not > doing that at all. > > Thanks for all the help! > > On Thu, Nov 2, 2017 at 12:31 PM, Smith, Barry F. <[email protected]> wrote: > > > > On Nov 1, 2017, at 10:32 PM, zakaryah . <[email protected]> wrote: > > > > I worked on the assumptions in my previous email and I at least partially > > implemented the function to assign the couplings. For row 0, which is the > > redundant field, I set dnz[0] to end-start, and onz[0] to the size of the > > matrix minus dnz[0]. For all other rows, I just increment the existing > > values of dnz[i] and onz[i], since the coupling to the redundant field adds > > one extra element beyond what's allocated for the DMDA stencil. > > Sounds reasonable. > > > > I see in the source that the FormCoupleLocations function is called once if > > the DM has PREALLOC_ONLY set to true, but twice otherwise. I assume that > > the second call is for setting the nonzero structure. > > Yes > > Do I need to do this? > > You probably should. > > I would do a MatView() small DMDA on the matrix you obtain and then again > after you add the numerical values. This will show you what values are not > being properly allocated/put in when the matrix is created by the DM. > > > > In any case, something is still not right. Even with the extra elements > > preallocated, the first assembly of the matrix is very slow. I ran a test > > problem on a single process with -info, and got this: > > > > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 > > used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode > > routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 > > unneeded,629703 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using > > Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 > > unneeded,1 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode > > routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 > > unneeded,629703 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using > > Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 > > unneeded,629704 used > > Yes, really bad. > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using > > Inode routines > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 > > 30818112 > > > > [0] DMGetDMSNES(): Creating new DMSNES > > > > [0] DMGetDMKSP(): Creating new DMKSP > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > 0 SNES Function norm 2.302381528359e+00 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 > > 30194064 > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: > > 126132 unneeded,647722 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using > > Inode routines > > > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 > > unneeded,647722 used > > > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 > > > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > > 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. > > > > > > > > The 9610 mallocs during MatSetValues seem suspicious and are probably > > what's taking so long with larger problems. 601 of them are apparently in > > the call to set Jbh, and 9009 are in the call to set Jhb (b is the > > redundant field, h is the DMDA field). If I run with more than one > > process, I get a segfault when a process which has rank greater than 0 sets > > dnz or onz in the FormCoupleLocations call. > > > > > > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . <[email protected]> wrote: > > Thanks Barry, that looks like exactly what I need. I'm looking at pack.c > > and packm.c and I want to check my understanding of what my coupling > > function should do. The relevant line in DMCreateMatrix_Composite_AIJ > > seems to be: > > > > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__nrows,__start,__end); > > > > and I infer that dnz and onz are the number of nonzero elements in the > > diagonal and off-diagonal submatrices, for each row of the DMComposite > > matrix. I suppose I can just set each of these in a for loop, but can I > > use the arguments to FormCoupleLocations as the range for the loop? Which > > ones - __rstart to __rstart+__nrows? How can I determine the number of > > rows on each processor from within the function that I pass? From the > > preallocation macros it looks like __start to __end describe the range of > > the columns of the diagonal submatrix - is that right? It looks like the > > ranges will be specific to each processor. Do I just set the values in dnz > > and onz, or do I need to reduce them? > > > > Thanks for all the help! Maybe if I get things working I can carve out the > > core of the code to make an example program for DMRedundant/Composite. > > > >
