Hi,
My code was found to be giving error answer in one of the cluster, even
on single processor. No error msg was given. It used to be working fine.
I run the debug version and it gives the error msg:
[0]PETSC ERROR:
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 8 FPE: Floating Point
Exception,probably divide by zero
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[0]PETSC
ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
find memory corruption errors
[0]PETSC ERROR: likely location of problem given in stack below
[0]PETSC ERROR: --------------------- Stack Frames
------------------------------------
[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
[0]PETSC ERROR: INSTEAD the line number of the start of the function
[0]PETSC ERROR: is given.
[0]PETSC ERROR: [0] VecDot_Seq line 62 src/vec/vec/impls/seq/bvec1.c
[0]PETSC ERROR: [0] VecDot_MPI line 14 src/vec/vec/impls/mpi/pbvec.c
[0]PETSC ERROR: [0] VecDot line 118 src/vec/vec/interface/rvector.c
[0]PETSC ERROR: [0] KSPSolve_BCGS line 39 src/ksp/ksp/impls/bcgs/bcgs.c
[0]PETSC ERROR: [0] KSPSolve line 356 src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: --------------------- Error Message
------------------------------------
[0]PETSC ERROR: Signal received!
[0]PETSC ERROR:
------------------------------------------------------------------------
It happens after KSPSolve. There was no problem on other cluster. So how
should I debug to find the error?
I tried to compare the input matrix and vector between different cluster
but there are too many values.
--
Thank you
Yours sincerely,
TAY wee-beng