Hi Lukasz,
Thanks for the tip.
I tried using valgrind. However, I got a lot of errors at a few
locations. One complained of an uninitialised value at:
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
But I already initialize "ierr". Are these errors valid or can I hide them?
==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300== at 0x3C2A872849: _IO_file_fopen@@GLIBC_2.2.5 (in
/lib64/libc-2.12.so)
==17300== by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300== by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300== by 0xA726083: mca_mpool_hugepage_open (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300== by 0x65A83A1: mca_base_framework_components_open (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x6614041: mca_mpool_base_open (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x65B1EC0: mca_base_framework_open (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x5E11123: ompi_mpi_init (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300== by 0x5E31032: PMPI_Init (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300== by 0x5978E87: PMPI_INIT (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300== by 0xB29696: petscinitialize_ (zstart.c:316)
==17300== by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
==17300== Uninitialised value was created by a stack allocation
==17300== at 0x3C2A8E2C82: setmntent (in /lib64/libc-2.12.so)
==17300==
==17300== Conditional jump or move depends on uninitialised value(s)
==17300== at 0x3C2A87284F: _IO_file_fopen@@GLIBC_2.2.5 (in
/lib64/libc-2.12.so)
==17300== by 0x3C2A866D95: __fopen_internal (in /lib64/libc-2.12.so)
==17300== by 0x3C2A8E2CB3: setmntent (in /lib64/libc-2.12.so)
==17300== by 0xA726083: mca_mpool_hugepage_open (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/openmpi/mca_mpool_hugepage.so)
==17300== by 0x65A83A1: mca_base_framework_components_open (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x6614041: mca_mpool_base_open (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x65B1EC0: mca_base_framework_open (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libopen-pal.so.20.10.1)
==17300== by 0x5E11123: ompi_mpi_init (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300== by 0x5E31032: PMPI_Init (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi.so.20.10.1)
==17300== by 0x5978E87: PMPI_INIT (in
/home/tsltaywb/lib/openmpi-2.1.1/lib/libmpi_mpifh.so.20.11.0)
==17300== by 0xB29696: petscinitialize_ (zstart.c:316)
==17300== by 0xA80D2B: MAIN__ (ibm3d_high_Re.F90:63)
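Both traces above start inside ompi_mpi_init and end in glibc's setmntent, and valgrind says the uninitialised value was created by a stack allocation in setmntent, so the report does not concern the ierr argument (PetscInitialize only writes to it). Reports of this kind are often noise from the MPI library rather than bugs in user code, although that is an assumption worth checking. One way to hide them is a valgrind suppression file. A minimal sketch, assuming the executable is ./a.out as in the PETSc error output; the file name ompi_init.supp and the suppression name are hypothetical:

```shell
# Write a suppression entry matching the call chain shown in the traces.
# The file name and the entry name are made up for this sketch.
cat > ompi_init.supp <<'EOF'
{
   ompi_init_setmntent_cond
   Memcheck:Cond
   fun:_IO_file_fopen*
   ...
   fun:setmntent
   ...
   fun:ompi_mpi_init
}
EOF

# Re-run with the suppression file, but only if valgrind and ./a.out exist.
if command -v valgrind >/dev/null 2>&1 && [ -x ./a.out ]; then
    valgrind --track-origins=yes --suppressions=ompi_init.supp ./a.out
fi
```

Running valgrind with --gen-suppressions=all prints a ready-made entry for every report, which avoids writing them by hand. Open MPI installations also typically ship a share/openmpi/openmpi-valgrind.supp file that may already cover these reports.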
Thank you very much.
Yours sincerely,
================================================
TAY Wee-Beng (Zheng Weiming) 郑伟明
Personal research webpage:http://tayweebeng.wixsite.com/website
Youtube research
showcase:https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA
linkedin:www.linkedin.com/in/tay-weebeng
================================================
On 7/6/2017 3:22 PM, Lukasz Kaczmarczyk wrote:
On 7 Jun 2017, at 07:57, TAY wee-beng <[email protected]
<mailto:[email protected]>> wrote:
Hi,
I have been using PETSc together with my CFD code. There seems to be a bug
with the Intel compiler such that when I call some DM routines such
as DMLocalToLocalBegin, a segmentation violation occurs if full
optimization is used. I posted this question a while back. The
current workaround is to use -O1 -ip instead of -O3 -ipo -ip for
the source files which use DMLocalToLocalBegin etc.
Recently, I made some changes to the code, mainly additions.
However, depending on my options, some cases still go through the same
program path as before.
Now when I try to run those same cases, I get a segmentation
violation, which didn't happen before:
IIB_I_cell_no_uvw_total2 14 10 6 3
2 1
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
[0]PETSC ERROR: ./a.out
I can't debug using VS since the code has been optimized. I tried
to print messages (if (myid == 0) print "1") to pinpoint the error.
Strangely, after adding these print messages, the error disappears.
IIB_I_cell_no_uvw_total2 14 10 6 3
2 1
1
2
3
4
5
1 0.26873613 0.12620288 0.12949340 1.11422363 0.43983516E-06 -0.59311066E-01 0.25546227E+04
2 0.22236892 0.14528589 0.16939270 1.10459102 0.74556128E-02 -0.55168234E-01 0.25532419E+04
3 0.20764796 0.14832689 0.18780489 1.08039569 0.80299767E-02 -0.46972411E-01 0.25523174E+04
Can anyone give a logical explanation of why this is happening?
Moreover, if I remove the prints of 1 to 3 and only print 4 and 5,
the segmentation violation appears again.
I am using Intel Fortran 2016.1.150. I wonder if it would help to post
in the Intel Fortran forum.
I can provide more info if required.
You are very likely writing to memory you should not, for example by
exceeding the bounds of an array. Depending on your compilation options,
starting parameters, etc., you write in an uncontrolled way either to a
part of memory which belongs to your process or to a part protected by
the operating system. In the second case, you get a segmentation fault.
You can get correct results for some runs, but the bug is still there,
hiding in the dark.
To shed light on it, you need Valgrind. Compile the code with debugging
on and no optimisation, and start searching. You can also generate a
core file and backtrace the error in gdb/lldb.
Lukasz
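The steps suggested above could be sketched as follows. This is only an outline under stated assumptions: the compiler is Intel Fortran and the executable is ./a.out as in the thread, but the flag set (in particular -check bounds, which adds run-time array bounds checking and is not mentioned in the thread), the single-process run and the core file name are assumptions to adapt to the actual build.

```shell
# Debug flags for Intel Fortran: symbols, no optimisation, traceback on error,
# and run-time array bounds checking (an addition, not from the thread).
DEBUG_FFLAGS="-g -O0 -traceback -check bounds"
echo "rebuild every source file with: $DEBUG_FFLAGS"

# Allow core files in this shell; the hard limit may forbid it on some systems.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file size limit"

# Run the debug build under valgrind if both are available.
if command -v valgrind >/dev/null 2>&1 && [ -x ./a.out ]; then
    valgrind --track-origins=yes ./a.out
fi

# If a crash left a core file (assumed here to be named "core"),
# print the backtrace non-interactively.
if command -v gdb >/dev/null 2>&1 && [ -f core ]; then
    gdb -batch -ex bt ./a.out core
fi
```

With a debug build, the file and line numbers in the valgrind report or the gdb backtrace should point at the statement that writes out of range. That is the information the optimized build cannot give.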