Recent versions of gfortran have a -fbacktrace option (-traceback is the ifort spelling) that, together with -g I think, would give you the (almost) exact source line that causes the problem. Without it you should still be able to run
  addr2line -e /wherever/bin/gipaw.x 0x44e576
to get the line of cgsolve_all that triggers the NaN or inf; note that addr2line only reports file and line if the executable was built with -g. However, I do not know the GIPAW code; maybe Davide can guess something from that.
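If you want to resolve every frame of the trace rather than just the failing address, something like the following might help. It is only a rough sketch: the function name addr2line_commands and the regular expression are my own, and it assumes the Open MPI frame lines have been un-wrapped so that each frame sits on one line. It does not run addr2line itself, it only prints the commands to run.

```python
import re
import shlex

# Matches one un-wrapped Open MPI backtrace frame, for example:
#   [chpc111:47092] [ 1] /path/to/gipaw.x(cgsolve_all_+0xff6) [0x44e576]
# (assumed layout, taken from the trace quoted in this thread)
FRAME = re.compile(
    r"\[\s*(?P<depth>\d+)\]\s+(?P<binary>\S+?)\((?P<symbol>[^)]*)\)"
    r"\s+\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


def addr2line_commands(trace, binary_suffix="gipaw.x"):
    """Return one addr2line command per frame located in the given binary.

    Frames from shared libraries (libc, libpthread) are skipped, because
    their runtime addresses cannot be passed to addr2line directly.
    """
    commands = []
    for match in FRAME.finditer(trace):
        binary = match.group("binary")
        if binary.endswith(binary_suffix):
            # -f also prints the enclosing function name, -e selects the executable
            commands.append(
                "addr2line -f -e {} {}".format(shlex.quote(binary), match.group("addr"))
            )
    return commands


if __name__ == "__main__":
    sample = (
        "[chpc111:47092] [ 0] /lib64/libpthread.so.0(+0x10eb0) [0x2b972490deb0]\n"
        "[chpc111:47092] [ 1] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/"
        "gipaw.x(cgsolve_all_+0xff6) [0x44e576]\n"
    )
    for command in addr2line_commands(sample):
        print(command)
```

This should be fine here because the addresses inside gipaw.x are small fixed ones (0x44e576 and so on), which suggests a non-relocated executable; for a position-independent binary the runtime addresses would need an offset correction first.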
cheers

On Tue, 25 Oct 2011 18:54:30 +0200, Carlo Nervi <carlo.nervi at unito.it> wrote:
> On 25/10/2011 18:22, Lorenzo Paulatto wrote:
>> You could try to compile using this option (all together, without
>> spaces):
>> -ffpe-trap=invalid,zero,overflow,underflow,denormal
>> to force the code to crash at the first appearance of NaN, this could
>> help track down the source of the problem.
>
> Thank you Lorenzo. I did what you suggested and I got many errors like
> this:
>
> [chpc111:47070] Signal: Floating point exception (8)
> [chpc111:47070] Signal code: Floating point divide-by-zero (3)
> [chpc111:47070] Failing at address: 0x44e576
> [chpc111:47092] [ 0] /lib64/libpthread.so.0(+0x10eb0) [0x2b972490deb0]
> [chpc111:47092] [ 1] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x(cgsolve_all_+0xff6) [0x44e576]
> [chpc111:47092] [ 2] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x(greenfunction_+0x1280) [0x44d1a0]
> [chpc111:47092] [ 3] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x(paramagnetic_correction_aug_+0x1e30) [0x4718f0]
> [chpc111:47092] [ 4] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x(suscept_crystal_+0x520f) [0x45c02f]
> [chpc111:47092] [ 5] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x(main+0x1a7) [0x43d227]
> [chpc111:47092] [ 6] /lib64/libc.so.6(__libc_start_main+0xed) [0x2b9724b3db0d]
> [chpc111:47092] [ 7] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x() [0x43d2fd]
> [chpc111:47092] *** End of error message ***
> [chpc111:47105] *** Process received signal ***
>
> and
>
> [chpc111:47076] *** End of error message ***
> [chpc111:47102] [ 3] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x(paramagnetic_correction_aug_+0x1e30) [0x4718f0]
> [chpc111:47102] [ 4] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x(suscept_crystal_+0x520f) [0x45c02f]
> [chpc111:47102] [ 5] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x(main+0x1a7) [0x43d227]
> [chpc111:47102] [ 6] /lib64/libc.so.6(__libc_start_main+0xed) [0x2ac89299db0d]
> [chpc111:47102] [ 7] /home/nervi/src/QE432/espresso-4.3.2/GIPAW/src/gipaw.x() [0x43d2fd]
> [chpc111:47102] *** End of error message ***
>
>
> But if I run mpirun -n 1 I got the following:
>
> Fortran runtime warning: IEEE 'denormal number' exception not supported.
> At line 739 of file suscept_crystal.f90 (unit = 99, file = '/tmp/ceresoli/benzene.gipaw_recover')
> Fortran runtime error: I/O past end of record on unformatted file
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 47850 on
> node chpc111 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
>
> Any hints?
> Carlo
>

-- 
Lorenzo Paulatto
IdR @ IMPMC/CNRS & Université Paris 6
phone: +33 (0)1 44275 084 / skype: paulatz
www: http://www-int.impmc.upmc.fr/~paulatto/
mail: 23-24/4?16 Boîte courrier 115, 4 place Jussieu 75252 Paris Cédex 05
