OK, I got it: the initial guess. It was zero (why not?), but perturbing it slightly away from zero seems to work. I have seen this before with my own bicgstab implementation and always thought it was a bug. So either you have a similar one, or it is a feature of bicgstab :)
Dominik

On Fri, Aug 26, 2011 at 11:51 PM, Dominik Szczerba <dominik at itis.ethz.ch> wrote:
> Valgrind excerpts below, none seems related, and both are there also with
> gmres.
>
> I just can't believe bicgstab would quit on such a trivial problem...
>
> Regards,
> Dominik
>
> ==10423== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==10423==    at 0x6B5E789: writev (writev.c:56)
> ==10423==    by 0x515658: MPIDU_Sock_writev (sock_immed.i:610)
> ==10423==    by 0x517943: MPIDI_CH3_iSendv (ch3_isendv.c:84)
> ==10423==    by 0x4FB756: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:509)
> ==10423==    by 0x4FD6E4: MPID_Isend (mpid_isend.c:118)
> ==10423==    by 0x4E502A: MPIC_Isend (helper_fns.c:210)
> ==10423==    by 0xED701D: MPIR_Alltoall (alltoall.c:420)
> ==10423==    by 0xED7880: PMPI_Alltoall (alltoall.c:685)
> ==10423==    by 0xE4BC95: SetUp__ (setup.c:122)
> ==10423==    by 0xE4C4E4: PartitionSmallGraph__ (weird.c:39)
> ==10423==    by 0xE497D0: ParMETIS_V3_PartKway (kmetis.c:131)
> ==10423==    by 0x71B774: MatPartitioningApply_Parmetis (pmetis.c:97)
> ==10423==    by 0x717D70: MatPartitioningApply (partition.c:236)
> ==10423==    by 0x5287D7: PetscSolver::LoadMesh(std::string const&)
> (PetscSolver.cxx:676)
> ==10423==    by 0x4C6157: CD3T10_USER::ProcessInputFile()
> (cd3t10mpi_main.cxx:321)
> ==10423==    by 0x4C3B57: main (cd3t10mpi_main.cxx:568)
> ==10423==  Address 0x71ce9b4 is 4 bytes inside a block of size 72 alloc'd
> ==10423==    at 0x4C28FAC: malloc (vg_replace_malloc.c:236)
> ==10423==    by 0xE5C792: GKmalloc__ (util.c:151)
> ==10423==    by 0xE57C09: PreAllocateMemory__ (memory.c:38)
> ==10423==    by 0xE496B3: ParMETIS_V3_PartKway (kmetis.c:116)
> ==10423==    by 0x71B774: MatPartitioningApply_Parmetis (pmetis.c:97)
> ==10423==    by 0x717D70: MatPartitioningApply (partition.c:236)
> ==10423==    by 0x5287D7: PetscSolver::LoadMesh(std::string const&)
> (PetscSolver.cxx:676)
> ==10423==    by 0x4C6157: CD3T10_USER::ProcessInputFile()
> (cd3t10mpi_main.cxx:321)
> ==10423==    by 0x4C3B57: main (cd3t10mpi_main.cxx:568)
> ==10423==
> ==10423== Conditional jump or move depends on uninitialised value(s)
> ==10423==    at 0x55B6510: inflateReset2 (in
> /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
> ==10423==    by 0x55B6605: inflateInit2_ (in
> /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
> ==10423==    by 0x5308C13: H5Z_filter_deflate (H5Zdeflate.c:110)
> ==10423==    by 0x5308170: H5Z_pipeline (H5Z.c:1103)
> ==10423==    by 0x518BB69: H5D_chunk_lock (H5Dchunk.c:2758)
> ==10423==    by 0x518CB20: H5D_chunk_read (H5Dchunk.c:1728)
> ==10423==    by 0x519BDDA: H5D_read (H5Dio.c:447)
> ==10423==    by 0x519C248: H5Dread (H5Dio.c:173)
> ==10423==    by 0x4CEB99: HDF5::HDF5Reader::readData(std::string
> const&) (HDF5Reader.cxx:634)
> ==10423==    by 0x4CE305: HDF5::HDF5Reader::read(std::vector<int,
> std::allocator<int> >&, std::string const&) (HDF5Reader.cxx:527)
> ==10423==    by 0x4C71A3: CD3T10_USER::SetupConstraints()
> (cd3t10mpi_main.cxx:404)
> ==10423==    by 0x4BA02A: CD3T10::Solve() (CD3T10mpi.cxx:658)
> ==10423==    by 0x4C3BE1: main (cd3t10mpi_main.cxx:590)
> ==10423==
>
> On Fri, Aug 26, 2011 at 11:31 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>  First run with valgrind
>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind to
>> make sure it does not have some "code bug cause".
>>
>>   Do you get the same message on one process?
>>
>>   My previous message still holds, if it is not a "code bug" then bi-CG-stab
>> is just breaking down on that matrix and preconditioner combination.
>>
>>   Barry
>>
>> On Aug 26, 2011, at 4:27 PM, Dominik Szczerba wrote:
>>
>>> Later in the message he only requested that I use "-ksp_norm_type
>>> unpreconditioned". So I did, and the error comes back, now fully
>>> documented below. As I wrote, it works fine with gmres, and the
>>> problem is very simple, diagonally dominant steady state diffusion.
>>>
>>> Any hints are highly appreciated.
>>>
>>> Dominik
>>>
>>> #PETSc Option Table entries:
>>> -ksp_converged_reason
>>> -ksp_converged_use_initial_residual_norm
>>> -ksp_monitor_true_residual
>>> -ksp_norm_type unpreconditioned
>>> -ksp_rtol 1e-3
>>> -ksp_type bcgs
>>> -ksp_view
>>> -log_summary
>>> -pc_type jacobi
>>> #End of PETSc Option Table entries
>>>  0 KSP preconditioned resid norm 1.166190378969e+01 true resid norm
>>> 1.166190378969e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>  1 KSP preconditioned resid norm 5.658835826231e-01 true resid norm
>>> 5.658835826231e-01 ||Ae||/||Ax|| 4.852411688762e-02
>>> [0]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [0]PETSC ERROR: Petsc has generated inconsistent data!
>>> [0]PETSC ERROR: Divide by zero!
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [2]PETSC ERROR: Petsc has generated inconsistent data!
>>> [2]PETSC ERROR: Divide by zero!
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [2]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [2]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [2]PETSC ERROR: See docs/index.html for manual pages.
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [2]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [2]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [2]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=CD3T10::SaveSolution()
>>> [0]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [0]PETSC ERROR: See docs/index.html for manual pages.
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [0]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [0]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [0]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [0]PETSC ERROR: KSPSolve_BCGS() line 75 in
>>> src/ksp/ksp/impls/bcgs/[1]PETSC ERROR: --------------------- Error
>>> Message ------------------------------------
>>> [1]PETSC ERROR: Petsc has generated inconsistent data!
>>> [1]PETSC ERROR: Divide by zero!
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [1]PETSC ERROR: See docs/index.html for manual pages.
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [1]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [1]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [1]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=1 --download-mpich=1 --download-hypre=1
>>> --with-parmetis=1 --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [2]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [2]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [2]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [2]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> [3]PETSC ERROR: --------------------- Error Message
>>> ------------------------------------
>>> [3]PETSC ERROR: Petsc has generated inconsistent data!
>>> [3]PETSC ERROR: Divide by zero!
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>> 13:37:48 CDT 2011
>>> [3]PETSC ERROR: See docs/changes/index.html for recent updates.
>>> [3]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>> [3]PETSC ERROR: See docs/index.html for manual pages.
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: /home/dsz/build/framework-debug/trunk/bin/cd3t10mpi on
>>> a linux-gnu named tharsis by dsz Fri Aug 26 23:23:47 2011
>>> [3]PETSC ERROR: Libraries linked from
>>> /home/dsz/pack/petsc-3.1-p8/linux-gnu-c-debug/lib
>>> [3]PETSC ERROR: Configure run at Mon Jul 25 14:20:10 2011
>>> [3]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/dsz/pack/petsc-3.1-p8 PETSC_ARCH=linux-gnu-c-debug
>>> --download-f-blas-lapack=bcgs.c
>>> [0]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [0]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> 1 --download-mpich=1 --download-hypre=1 --with-parmetis=1
>>> --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [1]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [1]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [1]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [1]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> 1 --download-mpich=1 --download-hypre=1 --with-parmetis=1
>>> --download-parmetis=1 --with-x=0 --with-debugging=1
>>> [3]PETSC ERROR:
>>> ------------------------------------------------------------------------
>>> [3]PETSC ERROR: KSPSolve_BCGS() line 75 in src/ksp/ksp/impls/bcgs/bcgs.c
>>> [3]PETSC ERROR: KSPSolve() line 396 in src/ksp/ksp/interface/itfunc.c
>>> [3]PETSC ERROR: User provided function() line 1215 in
>>> "unknowndirectory/"/home/dsz/src/framework/trunk/solve/PetscSolver.cxx
>>> PetscSolver::Finalize()
>>> PetscSolver::FinalizePetsc()
>>>
>>>
>>> On Fri, Aug 26, 2011 at 11:17 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>  Are you sure that is the entire error message? It should print the
>>>> routine and the line number where this happens.
>>>>
>>>>   Likely it is at
>>>>
>>>>  do {
>>>>    ierr = VecDot(R,RP,&rho);CHKERRQ(ierr);       /*   rho <- (r,rp)      */
>>>>    beta = (rho/rhoold) * (alpha/omegaold);
>>>>    ierr = VecAXPBYPCZ(P,1.0,-omegaold*beta,beta,R,V);CHKERRQ(ierr);  /* p <- r - omega * beta* v + beta * p */
>>>>    ierr = KSP_PCApplyBAorAB(ksp,P,V,T);CHKERRQ(ierr);  /*   v <- K p           */
>>>>    ierr = VecDot(V,RP,&d1);CHKERRQ(ierr);
>>>>    if (d1 == 0.0) SETERRQ(((PetscObject)ksp)->comm,PETSC_ERR_PLIB,"Divide by zero");
>>>>    alpha = rho / d1;                 /*   a <- rho / (v,rp)  */
>>>>
>>>>  Which means bi-cg-stab has broken down. You'll need to consult references
>>>> to Bi-CG-stab to see why this might happen (it can happen while GMRES is
>>>> happy). It may be KSPBCGSL can proceed past this point with a problem
>>>> values for
>>>>   Options Database Keys:
>>>> +  -ksp_bcgsl_ell <ell> Number of Krylov search directions
>>>> -  -ksp_bcgsl_cxpol Use a convex function of the MR and OR polynomials
>>>> after the BiCG step
>>>> -  -ksp_bcgsl_xres <res> Threshold used to decide when to refresh computed
>>>> residuals
>>>>
>>>> but most likely the preconditioner or matrix is bogus in some way since I
>>>> think Bi-CG-stab rarely breaks down in practice.
>>>>
>>>>
>>>>  Barry
>>>>
>>>>
>>>>
>>>> On Aug 26, 2011, at 4:05 PM, Dominik Szczerba wrote:
>>>>
>>>>> When solving my linear system with -ksp_type bcgs I get:
>>>>>
>>>>>  0 KSP preconditioned resid norm 1.166190378969e+01 true resid norm
>>>>> 1.166190378969e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>>>  1 KSP preconditioned resid norm 5.658835826231e-01 true resid norm
>>>>> 5.658835826231e-01 ||Ae||/||Ax|| 4.852411688762e-02
>>>>> [1]PETSC ERROR: --------------------- Error Message
>>>>> ------------------------------------
>>>>> [1]PETSC ERROR: Petsc has generated inconsistent data!
>>>>> [1]PETSC ERROR: Divide by zero!
>>>>> [1]PETSC ERROR:
>>>>> ------------------------------------------------------------------------
>>>>> [1]PETSC ERROR: Petsc Release Version 3.1.0, Patch 8, Thu Mar 17
>>>>> 13:37:48 CDT 2011
>>>>> [1]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>>> [1]PETSC ERROR: [3]PETSC ERROR: --------------------- Error Message
>>>>> ------------------------------------
>>>>> [3]PETSC ERROR: Petsc has generated inconsistent data!
>>>>> [3]PETSC ERROR: Divide by zero!
>>>>>
>>>>> PETSc Option Table entries:
>>>>> -ksp_converged_reason
>>>>> -ksp_monitor
>>>>> -ksp_monitor_true_residual
>>>>> -ksp_norm_type unpreconditioned
>>>>> -ksp_right_pc
>>>>> -ksp_rtol 1e-3
>>>>> -ksp_type bcgs
>>>>> -ksp_view
>>>>> -log_summary
>>>>> -pc_type jacobi
>>>>>
>>>>> When solving the same system with GMRES all works fine. This is a
>>>>> simple test diffusion problem. How can I find out what the problem is?
>>>>>
>>>>> Dominik
>>>>
>>>>
>>
>>
>
