I think I found the problem. I configured PETSc with COPTFLAGS=-O3. I'll remove that option and try again.
Thanks! Harshad On Wed, Sep 14, 2016 at 10:06 AM, Harshad Sahasrabudhe <[email protected]> wrote: > Hi Barry, > > Thanks for your inputs. I tried to set a watchpoint on > ((_p_KSP*)ksp)->reason, but gdb says no symbol _p_KSP in context. > Basically, GDB isn't able to find the PETSc source code. I built PETSc with > --with-debugging=1 statically and -fPIC, but it seems the libpetsc.a I get > doesn't contain debugging symbols (checked using objdump -g). How do I get > PETSc library to have debugging info? > > Thanks, > Harshad > > On Tue, Sep 13, 2016 at 2:47 PM, Barry Smith <[email protected]> wrote: > >> >> > On Sep 13, 2016, at 1:01 PM, Harshad Sahasrabudhe <[email protected]> >> wrote: >> > >> > Hi Barry, >> > >> > I compiled with mpich configured using --enable-g=meminit to get rid of >> MPI errors in Valgrind. Doing this reduced the number of errors to 2. I >> have attached the Valgrind output. >> >> This isn't helpful but it seems not to be a memory corruption issue :-( >> > >> > I'm using GAMG+GMRES for in each linear iteration of SNES. The linear >> solver converges with CONVERGED_RTOL for the first 6 iterations and with >> CONVERGED_STEP_LENGTH after that. I'm still very confused about why this is >> happening. Any thoughts/ideas? >> >> Does this happen on one process? If so I would run in the debugger and >> track the variable to see everyplace the variable is changed, this would >> point to exactly what piece of code is changing the variable to this >> unexpected value. >> >> For example with lldb one can use watch http://lldb.llvm.org/tutorial. >> html to see each time a variable gets changed. Similar thing with gdb. >> >> The variable to watch is ksp->reason Once you get the hang of this it >> can take just a few minutes to track down the code that is making this >> unexpected value, though I understand if you haven't done it before it can >> be intimidating. >> >> Barry >> >> You can do the same thing in parallel (like on two processes) if you need >> to but it is more cumbersome since you need run multiple debuggers. You can >> have PETSc start up multiple debuggers with mpiexec -n 2 ./ex >> -start_in_debugger >> >> >> >> >> > >> > Thanks, >> > Harshad >> > >> > On Thu, Sep 8, 2016 at 11:26 PM, Barry Smith <[email protected]> >> wrote: >> > >> > Install your MPI with --download-mpich as a PETSc ./configure option, >> this will eliminate all the MPICH valgrind errors. Then send as an >> attachment the resulting valgrind file. >> > >> > I do not 100 % trust any code that produces such valgrind errors. >> > >> > Barry >> > >> > >> > >> > > On Sep 8, 2016, at 10:12 PM, Harshad Sahasrabudhe < >> [email protected]> wrote: >> > > >> > > Hi Barry, >> > > >> > > Thanks for the reply. My code is in C. I ran with Valgrind and found >> many "Conditional jump or move depends on uninitialized value(s)", "Invalid >> read" and "Use of uninitialized value" errors. I think all of them are from >> the libraries I'm using (LibMesh, Boost, MPI, etc.). I'm not really sure >> what I'm looking for in the Valgrind output. At the end of the file, I get: >> > > >> > > ==40223== More than 10000000 total errors detected. I'm not >> reporting any more. >> > > ==40223== Final error counts will be inaccurate. Go fix your program! >> > > ==40223== Rerun with --error-limit=no to disable this cutoff. Note >> > > ==40223== that errors may occur in your program without prior warning >> from >> > > ==40223== Valgrind, because errors are no longer being displayed. >> > > >> > > Can you give some suggestions on how I should proceed? >> > > >> > > Thanks, >> > > Harshad >> > > >> > > On Thu, Sep 8, 2016 at 1:59 PM, Barry Smith <[email protected]> >> wrote: >> > > >> > > This is very odd. CONVERGED_STEP_LENGTH for KSP is very >> specialized and should never occur with GMRES. >> > > >> > > Can you run with valgrind to make sure there is no memory >> corruption? http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> > > >> > > Is your code fortran or C? >> > > >> > > Barry >> > > >> > > > On Sep 8, 2016, at 10:38 AM, Harshad Sahasrabudhe < >> [email protected]> wrote: >> > > > >> > > > Hi, >> > > > >> > > > I'm using GAMG + GMRES for my Poisson problem. The solver converges >> with KSP_CONVERGED_STEP_LENGTH at a residual of 9.773346857844e-02, which >> is much higher than what I need (I need a tolerance of at least 1E-8). I am >> not able to figure out which tolerance I need to set to avoid convergence >> due to CONVERGED_STEP_LENGTH. >> > > > >> > > > Any help is appreciated! Output of -ksp_view and -ksp_monitor: >> > > > >> > > > 0 KSP Residual norm 3.121347818142e+00 >> > > > 1 KSP Residual norm 9.773346857844e-02 >> > > > Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1 >> > > > KSP Object: 1 MPI processes >> > > > type: gmres >> > > > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> > > > GMRES: happy breakdown tolerance 1e-30 >> > > > maximum iterations=10000, initial guess is zero >> > > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000 >> > > > left preconditioning >> > > > using PRECONDITIONED norm type for convergence test >> > > > PC Object: 1 MPI processes >> > > > type: gamg >> > > > MG: type is MULTIPLICATIVE, levels=2 cycles=v >> > > > Cycles per PCApply=1 >> > > > Using Galerkin computed coarse grid matrices >> > > > Coarse grid solver -- level ------------------------------- >> > > > KSP Object: (mg_coarse_) 1 MPI processes >> > > > type: preonly >> > > > maximum iterations=1, initial guess is zero >> > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> > > > left preconditioning >> > > > using NONE norm type for convergence test >> > > > PC Object: (mg_coarse_) 1 MPI processes >> > > > type: bjacobi >> > > > block Jacobi: number of blocks = 1 >> > > > Local solve is same for all blocks, in the following KSP >> and PC objects: >> > > > KSP Object: (mg_coarse_sub_) 1 MPI processes >> > > > type: preonly >> > > > maximum iterations=1, initial guess is zero >> > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > left preconditioning >> > > > using NONE norm type for convergence test >> > > > PC Object: (mg_coarse_sub_) 1 MPI processes >> > > > type: lu >> > > > LU: out-of-place factorization >> > > > tolerance for zero pivot 2.22045e-14 >> > > > using diagonal shift on blocks to prevent zero pivot >> [INBLOCKS] >> > > > matrix ordering: nd >> > > > factor fill ratio given 5, needed 1.91048 >> > > > Factored matrix follows: >> > > > Mat Object: 1 MPI processes >> > > > type: seqaij >> > > > rows=284, cols=284 >> > > > package used to perform factorization: petsc >> > > > total: nonzeros=7726, allocated nonzeros=7726 >> > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > using I-node routines: found 133 nodes, limit >> used is 5 >> > > > linear system matrix = precond matrix: >> > > > Mat Object: 1 MPI processes >> > > > type: seqaij >> > > > rows=284, cols=284 >> > > > total: nonzeros=4044, allocated nonzeros=4044 >> > > > total number of mallocs used during MatSetValues calls >> =0 >> > > > not using I-node routines >> > > > linear system matrix = precond matrix: >> > > > Mat Object: 1 MPI processes >> > > > type: seqaij >> > > > rows=284, cols=284 >> > > > total: nonzeros=4044, allocated nonzeros=4044 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > not using I-node routines >> > > > Down solver (pre-smoother) on level 1 >> ------------------------------- >> > > > KSP Object: (mg_levels_1_) 1 MPI processes >> > > > type: chebyshev >> > > > Chebyshev: eigenvalue estimates: min = 0.195339, max = >> 4.10212 >> > > > maximum iterations=2 >> > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> > > > left preconditioning >> > > > using nonzero initial guess >> > > > using NONE norm type for convergence test >> > > > PC Object: (mg_levels_1_) 1 MPI processes >> > > > type: sor >> > > > SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1 >> > > > linear system matrix = precond matrix: >> > > > Mat Object: () 1 MPI processes >> > > > type: seqaij >> > > > rows=9036, cols=9036 >> > > > total: nonzeros=192256, allocated nonzeros=192256 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > not using I-node routines >> > > > Up solver (post-smoother) same as down solver (pre-smoother) >> > > > linear system matrix = precond matrix: >> > > > Mat Object: () 1 MPI processes >> > > > type: seqaij >> > > > rows=9036, cols=9036 >> > > > total: nonzeros=192256, allocated nonzeros=192256 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > not using I-node routines >> > > > >> > > > Thanks, >> > > > Harshad >> > > >> > > >> > >> > >> > <valgrind.log.33199> >> >> >
