https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92836

            Bug ID: 92836
           Summary: segfault with inquire()
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libfortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: abensonca at gmail dot com
  Target Milestone: ---

I'm trying to track down a segfault that appears to be associated with the 
inquire() statement. Unfortunately, it only occurs when running a massively 
parallel code that uses both MPI and OpenMP, and so far I have been unable to 
construct a simple test case that reproduces it. I'm also not 100% convinced 
that the problem is in the libgfortran code for inquire() rather than in my 
own code, but since inquire() should be thread safe, the problem could at 
least be in libgfortran.

So, I'm hoping that someone on the mailing list has suggestions for how to 
track this down as I've exhausted everything I can think of!

The segfault is triggered by:

inquire(file=fileName,exist=exists)

This works as expected most of the time, but occasionally (typically after 
thousands of calls in a code running with 128 MPI processes and 4 OpenMP 
threads each) it will segfault. The gdb trace for the failed thread is:

(gdb) where
#0  galacticus_error::galacticus_signal_handler_sigsegv () at ./work/buildMPI/galacticus.error.p.F90:328
#1  <signal handler called>
#2  find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0) at ../../../gcc-trunk/libgfortran/io/unix.c:1725
#3  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0) at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#4  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0) at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#5  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0) at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#6  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0) at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#7  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0) at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#8  0x00007f33ac1b8bbb in _gfortrani_find_file (file=<optimized out>, file_len=<optimized out>) at ../../../gcc-trunk/libgfortran/io/unix.c:1779
#9  0x00007f33ac1a601d in _gfortran_st_inquire (iqp=0x7ffed9b88bf0) at ../../../gcc-trunk/libgfortran/io/inquire.c:803

and the relevant line in unix.c is:

1725          if (st[0].st_dev == s->st_dev && st[0].st_ino == s->st_ino)

If I examine the variables used in this line I find:

(gdb) print st[0]
$2 = {st_dev = 24, st_ino = 12194159, st_nlink = 1, st_mode = 33188,
  st_uid = 509, st_gid = 100, __pad0 = 0, st_rdev = 0, st_size = 104168,
  st_blksize = 1048576, st_blocks = 208,
  st_atim = {tv_sec = 1575236480, tv_nsec = 343597308},
  st_mtim = {tv_sec = 1575236486, tv_nsec = 774663718},
  st_ctim = {tv_sec = 1575236486, tv_nsec = 774663718},
  __unused = {0, 0, 0}}
(gdb) print *s
Cannot access memory at address 0x7f33818d7eb0

which suggests that "s" is somehow corrupted. I don't see how this can happen, 
though, since the "unit_lock" lock is used by find_file(), which should 
prevent any modification to the tree of units.

One other OpenMP thread was waiting for unit_lock:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f33ab4a354f in _L_lock_1028 () from /home/abenson/Galacticus/Tools/lib/libpthread.so.0
#2  0x00007f33ab4a336b in __pthread_mutex_lock (mutex=0x7f33ac40b340 <_gfortrani_unit_lock>) at pthread_mutex_lock.c:61
#3  0x00007f33ac1b6f3c in __gthread_mutex_lock (__mutex=0x7f33ac40b340 <_gfortrani_unit_lock>) at ../libgcc/gthr-default.h:749
#4  _gfortrani_newunit_alloc () at ../../../gcc-trunk/libgfortran/io/unit.c:906
#5  0x00007f33ac1b706b in _gfortrani_get_unit (dtp=dtp@entry=0x7f33a1785800, do_create=do_create@entry=1) at ../../../gcc-trunk/libgfortran/io/unit.c:555
#6  0x00007f33ac1b4ebd in data_transfer_init (dtp=dtp@entry=0x7f33a1785800, read_flag=read_flag@entry=0) at ../../../gcc-trunk/libgfortran/io/transfer.c:2851
#7  0x00007f33ac1b5924 in _gfortran_st_write (dtp=dtp@entry=0x7f33a1785800) at ../../../gcc-trunk/libgfortran/io/transfer.c:4392

and was doing other writes to an internal file unit just prior to this.
