Valgrind excerpts are below; neither seems related, and both also appear with gmres.
I just can't believe bicgstab would quit on such a trivial problem...

Regards,
Dominik

==10423== Syscall param writev(vector[...]) points to uninitialised byte(s)
==10423==    at 0x6B5E789: writev (writev.c:56)
==10423==    by 0x515658: MPIDU_Sock_writev (sock_immed.i:610)
==10423==    by 0x517943: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==10423==    by 0x4FB756: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:509)
==10423==    by 0x4FD6E4: MPID_Isend (mpid_isend.c:118)
==10423==    by 0x4E502A: MPIC_Isend (helper_fns.c:210)
==10423==    by 0xED701D: MPIR_Alltoall (alltoall.c:420)
==10423==    by 0xED7880: PMPI_Alltoall (alltoall.c:685)
==10423==    by 0xE4BC95: SetUp__ (setup.c:122)
==10423==    by 0xE4C4E4: PartitionSmallGraph__ (weird.c:39)
==10423==    by 0xE497D0: ParMETIS_V3_PartKway (kmetis.c:131)
==10423==    by 0x71B774: MatPartitioningApply_Parmetis (pmetis.c:97)
==10423==    by 0x717D70: MatPartitioningApply (partition.c:236)
==10423==    by 0x5287D7: PetscSolver::LoadMesh(std::string const&) (PetscSolver.cxx:676)
==10423==    by 0x4C6157: CD3T10_USER::ProcessInputFile() (cd3t10mpi_main.cxx:321)
==10423==    by 0x4C3B57: main (cd3t10mpi_main.cxx:568)
==10423==  Address 0x71ce9b4 is 4 bytes inside a block of size 72 alloc'd
==10423==    at 0x4C28FAC: malloc (vg_replace_malloc.c:236)
==10423==    by 0xE5C792: GKmalloc__ (util.c:151)
==10423==    by 0xE57C09: PreAllocateMemory__ (memory.c:38)
==10423==    by 0xE496B3: ParMETIS_V3_PartKway (kmetis.c:116)
==10423==    by 0x71B774: MatPartitioningApply_Parmetis (pmetis.c:97)
==10423==    by 0x717D70: MatPartitioningApply (partition.c:236)
==10423==    by 0x5287D7: PetscSolver::LoadMesh(std::string const&) (PetscSolver.cxx:676)
==10423==    by 0x4C6157: CD3T10_USER::ProcessInputFile() (cd3t10mpi_main.cxx:321)
==10423==    by 0x4C3B57: main (cd3t10mpi_main.cxx:568)
==10423==
==10423== Conditional jump or move depends on uninitialised value(s)
==10423==    at 0x55B6510: inflateReset2 (in /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
==10423==    by 0x55B6605: inflateInit2_ (in /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
==10423==    by 0x5308C13: H5Z_filter_deflate (H5Zdeflate.c:110)
==10423==    by 0x5308170: H5Z_pipeline (H5Z.c:1103)
==10423==    by 0x518BB69: H5D_chunk_lock (H5Dchunk.c:2758)
==10423==    by 0x518CB20: H5D_chunk_read (H5Dchunk.c:1728)
==10423==    by 0x519BDDA: H5D_read (H5Dio.c:447)
==10423==    by 0x519C248: H5Dread (H5Dio.c:173)
==10423==    by 0x4CEB99: HDF5::HDF5Reader::readData(std::string const&) (HDF5Reader.cxx:634)
==10423==    by 0x4CE305: HDF5::HDF5Reader::read(std::vector<int, std::allocator<int> >&, std::string const&) (HDF5Reader.cxx:527)
==10423==    by 0x4C71A3: CD3T10_USER::SetupConstraints() (cd3t10mpi_main.cxx:404)
==10423==    by 0x4BA02A: CD3T10::Solve() (CD3T10mpi.cxx:658)
==10423==    by 0x4C3BE1: main (cd3t10mpi_main.cxx:590)
==10423==

On Fri, Aug 26, 2011 at 11:31 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> First run with valgrind
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind to make
> sure it does not have some "code bug cause".
>
> Do you get the same message on one process.
>
> My previous message still holds, if it is not a "code bug" then bi-CG-stab
> is just breaking down on that matrix and preconditioner combination.
>
> Barry
>
> On Aug 26, 2011, at 4:27 PM, Dominik Szczerba wrote:
>
>> Later in the message he only requested that I use "-ksp_norm_type
>> unpreconditioned". So I did, and the error comes back, now fully
>> documented below. As I wrote, it works fine with gmres, and the
>> problem is very simple, diagonal dominant steady state diffusion.
>>
>> Any hints are highly appreciated.
>>
>> Dominik
>>
>> #PETSc Option Table entries:
>> -ksp_converged_reason
>> -ksp_converged_use_initial_residual_norm
>> -ksp_monitor_true_residual
>> -ksp_norm_type unpreconditioned
>> -ksp_rtol 1e-3
>> -ksp_type bcgs
>> -ksp_view
>> -log_summary
>> -pc_type jacobi
>> #End of PETSc Option Table entries
>>   0 KSP preconditioned resid norm 1.166190378969e+01 true resid norm
>> 1.166190378969e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>   1 KSP preconditioned resid norm 5.658835826231e-01 true resid norm
>> 5.658835826231e-01 ||Ae||/||Ax|| 4.852411688762e-02
>> [0]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [0]PETSC ERROR: Petsc has generated inconsistent data!
>> [0]PETSC ERROR: Divide by zero!
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [2]PETSC ERROR: Petsc has generated inconsistent data!
>> [2]PETSC ERROR: Divide by zero!
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [2]PETSC ERROR: See docs/index.html for manual pages.
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>> [2]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [2]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>> [2]PETSC ERROR: Configure options
>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>> --download-f-blas-lapack=CD3T10::SaveSolution()
>> [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [0]PETSC ERROR: See docs/index.html for manual pages.
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>> [0]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [0]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>> [0]PETSC ERROR: Configure options
>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: KSPSolve_BCGS() line 75 in
>> src/ksp/ksp/impls/bcgs/[1]PETSC ERROR: --------------------- Error
>> Message ------------------------------------
>> [1]PETSC ERROR: Petsc has generated inconsistent data!
>> [1]PETSC ERROR: Divide by zero!
>> [1]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [1]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [1]PETSC ERROR: See docs/index.html for manual pages.
>> [1]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [1]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>> [1]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [1]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>> [1]PETSC ERROR: Configure options
>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>> [2]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>> [2]PETSC ERROR: User provided function() line 1215 in
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>> [3]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> [3]PETSC ERROR: Petsc has generated inconsistent data!
>> [3]PETSC ERROR: Divide by zero!
>> [3]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [3]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>> 13:37:48 CDT 2011
>> [3]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [3]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [3]PETSC ERROR: See docs/index.html for manual pages.
>> [3]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [3]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>> [3]PETSC ERROR: Libraries linked from
>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>> [3]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>> [3]PETSC ERROR: Configure options
>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>> --download-f-blas-lapack=bcgs.c
>> [0]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>> [0]PETSC ERROR: User provided function() line 1215 in
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>> 1 --download-mpich=1 --download-hypre=1 --with-parmetis=1
>> --download-parmetis=1 --with-x=0 --with-debugging=1
>> [1]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [1]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>> [1]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>> [1]PETSC ERROR: User provided function() line 1215 in
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>> 1 --download-mpich=1 --download-hypre=1 --with-parmetis=1
>> --download-parmetis=1 --with-x=0 --with-debugging=1
>> [3]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [3]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>> [3]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>> [3]PETSC ERROR: User provided function() line 1215 in
>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>> PetscSolver::Finalize()
>> PetscSolver::FinalizePetsc()
>>
>>
>> On Fri, Aug 26, 2011 at 11:17 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>> Are you sure that is the entire error message. It should print the routine
>>> and the line number where this happens.
>>>
>>> Likely it is at
>>>
>>>  do {
>>>    ierr = VecDot(R,RP,&rho);CHKERRQ(ierr);       /*   rho <- (r,rp)      */
>>>    beta = (rho/rhoold) * (alpha/omegaold);
>>>    ierr = VecAXPBYPCZ(P,1.0,-omegaold*beta,beta,R,V);CHKERRQ(ierr);  /* p <- r - omega * beta* v + beta * p */
>>>    ierr = KSP_PCApplyBAorAB(ksp,P,V,T);CHKERRQ(ierr);  /*   v <- K p   */
>>>    ierr = VecDot(V,RP,&d1);CHKERRQ(ierr);
>>>    if (d1 == 0.0) SETERRQ(((PetscObject)ksp)->comm,PETSC_ERR_PLIB,"Divide by zero");
>>>    alpha = rho / d1;                 /*   a <- rho / (v,rp)  */
>>>
>>> Which means bi-cg-stab has broken down. You'll need to consult references
>>> to Bi-CG-stab to see why this might happen (it can happen while GMRES is
>>> happy). It may be KSPBCGSL can proceed past this point with a problem
>>> values for
>>>   Options Database Keys:
>>> +  -ksp_bcgsl_ell <ell> Number of Krylov search directions
>>> -  -ksp_bcgsl_cxpol Use a convex function of the MR and OR polynomials
>>> after the BiCG step
>>> -  -ksp_bcgsl_xres <res> Threshold used to decide when to refresh computed
>>> residuals
>>>
>>> but most likely the preconditioner or matrix is bogus in some way since I
>>> think Bi-CG-stab rarely breaks down in practice.
>>>
>>>
>>> Barry
>>>
>>>
>>>
>>> On Aug 26, 2011, at 4:05 PM, Dominik Szczerba wrote:
>>>
>>>> When solving my linear system with -ksp_type bcgs I get:
>>>>
>>>>   0 KSP preconditioned resid norm 1.166190378969e+01 true resid norm
>>>> 1.166190378969e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>>   1 KSP preconditioned resid norm 5.658835826231e-01 true resid norm
>>>> 5.658835826231e-01 ||Ae||/||Ax|| 4.852411688762e-02
>>>> [1]PETSC ERROR: --------------------- Error Message
>>>> ------------------------------------
>>>> [1]PETSC ERROR: Petsc has generated inconsistent data!
>>>> [1]PETSC ERROR: Divide by zero!
>>>> [1]PETSC ERROR:
>>>> ------------------------------------------------------------------------
>>>> [1]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>>> 13:37:48 CDT 2011
>>>> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>> [1]PETSC ERROR: [3]PETSC ERROR: --------------------- Error Message
>>>> ------------------------------------
>>>> [3]PETSC ERROR: Petsc has generated inconsistent data!
>>>> [3]PETSC ERROR: Divide by zero!
>>>>
>>>> PETSc Option Table entries:
>>>> -ksp_converged_reason
>>>> -ksp_monitor
>>>> -ksp_monitor_true_residual
>>>> -ksp_norm_type unpreconditioned
>>>> -ksp_right_pc
>>>> -ksp_rtol 1e-3
>>>> -ksp_type bcgs
>>>> -ksp_view
>>>> -log_summary
>>>> -pc_type jacobi
>>>>
>>>> When solving the same system with GMRES all works fine. This is a
>>>> simple test diffusion problem. How can I find out what the problem is?
>>>>
>>>> Dominik
>>>
>>>
>
>
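
For reference, the breakdown test in the bcgs.c fragment quoted above (the `d1 == 0.0` check on (v, rp)) can be reproduced outside PETSc. What follows is only a minimal sketch under stated assumptions, not the PETSc implementation: it is unpreconditioned, works on a small dense row-major matrix, and the names `bicgstab_dense`, `bcgs_status` and `BCGS_MAX_N` are hypothetical. The guards on rho, (t,t) and omega are additions of this sketch; only the d1 check comes from the quoted code.

```c
#include <assert.h>
#include <stddef.h>

#define BCGS_MAX_N 16 /* hypothetical size cap so the sketch needs no malloc */

typedef enum { BCGS_CONVERGED, BCGS_BREAKDOWN, BCGS_MAX_ITS } bcgs_status;

static double dot(size_t n, const double *a, const double *b)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* y <- A x for a dense row-major n-by-n matrix */
static void matvec(size_t n, const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = 0.0;
        for (size_t j = 0; j < n; ++j) y[i] += A[i * n + j] * x[j];
    }
}

/* Unpreconditioned BiCGStab; x holds the initial guess on entry and the
 * current iterate on return. Stops when ||r|| <= rtol * ||b||. */
bcgs_status bicgstab_dense(size_t n, const double *A, const double *b,
                           double *x, double rtol, int max_its)
{
    double r[BCGS_MAX_N], rp[BCGS_MAX_N], s[BCGS_MAX_N], t[BCGS_MAX_N];
    double p[BCGS_MAX_N] = {0.0}, v[BCGS_MAX_N] = {0.0};
    double rho_old = 1.0, alpha = 1.0, omega_old = 1.0;

    assert(n > 0 && n <= BCGS_MAX_N);
    /* Compare squared norms so no sqrt (and no libm) is needed. */
    const double target = rtol * rtol * dot(n, b, b);

    matvec(n, A, x, r);
    for (size_t i = 0; i < n; ++i) {
        r[i] = b[i] - r[i]; /* r  <- b - A x          */
        rp[i] = r[i];       /* rp <- r (shadow resid) */
    }
    if (dot(n, r, r) <= target) return BCGS_CONVERGED;

    for (int it = 0; it < max_its; ++it) {
        double rho = dot(n, r, rp);              /* rho <- (r,rp) */
        if (rho == 0.0) return BCGS_BREAKDOWN;   /* sketch-only guard */
        double beta = (rho / rho_old) * (alpha / omega_old);
        for (size_t i = 0; i < n; ++i)           /* p <- r - omega*beta*v + beta*p */
            p[i] = r[i] - omega_old * beta * v[i] + beta * p[i];
        matvec(n, A, p, v);                      /* v <- A p */
        double d1 = dot(n, v, rp);
        if (d1 == 0.0) return BCGS_BREAKDOWN;    /* the "Divide by zero" case */
        alpha = rho / d1;                        /* a <- rho / (v,rp) */
        for (size_t i = 0; i < n; ++i) s[i] = r[i] - alpha * v[i];
        if (dot(n, s, s) <= target) {            /* half step already converged */
            for (size_t i = 0; i < n; ++i) x[i] += alpha * p[i];
            return BCGS_CONVERGED;
        }
        matvec(n, A, s, t);                      /* t <- A s */
        double tt = dot(n, t, t);
        if (tt == 0.0) return BCGS_BREAKDOWN;    /* sketch-only guard */
        double omega = dot(n, s, t) / tt;
        if (omega == 0.0) return BCGS_BREAKDOWN; /* sketch-only guard */
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i] + omega * s[i];
            r[i] = s[i] - omega * t[i];
        }
        if (dot(n, r, r) <= target) return BCGS_CONVERGED;
        rho_old = rho;
        omega_old = omega;
    }
    return BCGS_MAX_ITS;
}
```

Called with a zero initial guess on the nonsingular skew-symmetric matrix [0 1; -1 0] and b = (1, 0), the first step has p = r, so (A p, rp) = 0 and the routine reports a breakdown even though the system is perfectly solvable. That is one concrete way the method can stop where GMRES would not; whether anything similar is happening with the diffusion matrix in this thread is not something the sketch can tell.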
