On Fri, Jan 20, 2012 at 10:21 AM, Wen Jiang <jiangwen84 at gmail.com> wrote:
> Hi, Matt
>
> Could you tell me some more details about how to get a stack trace there?
> I know little about it. The job is submitted on the head node and runs on
> the compute nodes.
>

1) Always run serial problems until you understand what is happening.
2) Run with -start_in_debugger, and type 'cont' in the debugger (read about gdb).
3) When it stalls, press Ctrl-C and then type 'where'.

   Matt

> Thanks.
>
> On Fri, Jan 20, 2012 at 9:44 AM, Wen Jiang <jiangwen84 at gmail.com> wrote:
>
> > Hi Barry,
> >
> > Thanks for your suggestion. I just added MatSetOption(mat,
> > MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE) to my code, but I did not get
> > any error message about bad allocation, and my code is stuck
> > there. I attached the output file below. Thanks.
> >
>
> Run with -start_in_debugger and get a stack trace. Note that your stashes
> are enormous. You might consider calling
> MatAssemblyBegin/End(A, MAT_ASSEMBLY_FLUSH) during assembly.
>
>    Matt
>
>
> > [0] VecAssemblyBegin_MPI(): Stash has 210720 entries, uses 12 mallocs.
> > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
> > [5] MatAssemblyBegin_MPIAIJ(): Stash has 4806656 entries, uses 8 mallocs.
> > [6] MatAssemblyBegin_MPIAIJ(): Stash has 5727744 entries, uses 9 mallocs.
> > [4] MatAssemblyBegin_MPIAIJ(): Stash has 5964288 entries, uses 9 mallocs.
> > [7] MatAssemblyBegin_MPIAIJ(): Stash has 7408128 entries, uses 9 mallocs.
> > [3] MatAssemblyBegin_MPIAIJ(): Stash has 8123904 entries, uses 9 mallocs.
> > [2] MatAssemblyBegin_MPIAIJ(): Stash has 11544576 entries, uses 10 mallocs.
> > [0] MatStashScatterBegin_Private(): No of messages: 1
> > [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 107888648
> > [0] MatAssemblyBegin_MPIAIJ(): Stash has 13486080 entries, uses 10 mallocs.
> > [1] MatAssemblyBegin_MPIAIJ(): Stash has 16386048 entries, uses 10 mallocs.
> > [7] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0 unneeded,2514194 used
> > [7] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [7] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [7] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode routines
> > [7] PetscCommDuplicate(): Using internal PETSc communicator 47582902893600 339106512
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2514537 used
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [0] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 46968795675680 536030192
> > [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
> > [6] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0 unneeded,2499938 used
> > [6] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [6] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [6] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode routines
> > [6] PetscCommDuplicate(): Using internal PETSc communicator 47399146302496 509504096
> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0 unneeded,2525390 used
> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [5] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode routines
> > [5] PetscCommDuplicate(): Using internal PETSc communicator 47033309994016 520223440
> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used
> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [1] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> > [1] PetscCommDuplicate(): Using internal PETSc communicator 47149241441312 163068544
> > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2525733 used
> > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [2] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> > [2] PetscCommDuplicate(): Using internal PETSc communicator 47674980494368 119371056
>
>
> >> > Since my code never finishes, I cannot get the summary files by adding
> >> > -log_summary. Is there any other way to get a summary file?
> >>
> >> My guess is that you are running a larger problem on this system and
> >> your preallocation for the matrix is wrong, while in the small run you
> >> sent the preallocation is correct.
> >>
> >> Usually the only thing that causes it to take forever is not the
> >> parallel communication but the preallocation. After you create the
> >> matrix and set its preallocation, call
> >> MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE); then run. It
> >> will stop with an error message if the preallocation is wrong.
> >>
> >>    Barry
> >>
> >> > BTW, my codes are running without any problem on a shared-memory desktop
> >> > with any number of processes.
-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20120120/4d7fe702/attachment.htm>
