https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92836
            Bug ID: 92836
           Summary: segfault with inquire()
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libfortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: abensonca at gmail dot com
  Target Milestone: ---

I'm attempting to track down a segfault that seems to be associated with the
inquire() statement. Unfortunately it only happens when running a massively
parallel code, using both MPI and OpenMP parallelism, and I have as yet been
unable to make a simple test case that reproduces the bug. I also haven't
convinced myself 100% that the problem is really in the libgfortran code for
inquire() and not in my own code, but since it seems that inquire() should be
thread safe, the problem could at least be in libgfortran.

So, I'm hoping that someone on the mailing list has suggestions for how to
track this down, as I've exhausted everything I can think of!

The segfault is triggered by:

  inquire(file=fileName,exist=exists)

This works as expected most of the time, but occasionally (typically after
thousands of calls in a code running with 128 MPI processes and 4 OpenMP
threads each) it will segfault.

The gdb trace for the failed thread is:

(gdb) where
#0  galacticus_error::galacticus_signal_handler_sigsegv ()
    at ./work/buildMPI/galacticus.error.p.F90:328
#1  <signal handler called>
#2  find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0)
    at ../../../gcc-trunk/libgfortran/io/unix.c:1725
#3  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0)
    at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#4  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0)
    at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#5  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0)
    at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#6  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0)
    at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#7  0x00007f33ac1b7460 in find_file0 (u=<optimized out>, st=st@entry=0x7ffed9b88ad0)
    at ../../../gcc-trunk/libgfortran/io/unix.c:1741
#8  0x00007f33ac1b8bbb in _gfortrani_find_file (file=<optimized out>, file_len=<optimized out>)
    at ../../../gcc-trunk/libgfortran/io/unix.c:1779
#9  0x00007f33ac1a601d in _gfortran_st_inquire (iqp=0x7ffed9b88bf0)
    at ../../../gcc-trunk/libgfortran/io/inquire.c:803

and the relevant line in unix.c is:

1725      if (st[0].st_dev == s->st_dev && st[0].st_ino == s->st_ino)

If I examine the variables used in this line I find:

(gdb) print st[0]
$2 = {st_dev = 24, st_ino = 12194159, st_nlink = 1, st_mode = 33188,
  st_uid = 509, st_gid = 100, __pad0 = 0, st_rdev = 0, st_size = 104168,
  st_blksize = 1048576, st_blocks = 208,
  st_atim = {tv_sec = 1575236480, tv_nsec = 343597308},
  st_mtim = {tv_sec = 1575236486, tv_nsec = 774663718},
  st_ctim = {tv_sec = 1575236486, tv_nsec = 774663718},
  __unused = {0, 0, 0}}
(gdb) print *s
Cannot access memory at address 0x7f33818d7eb0

which suggests that "s" is somehow corrupted.

I don't see how this can happen, though, as the "unit_lock" lock is used by
find_file(), which should prevent any modification to the tree of units.

One other OpenMP thread was waiting for unit_lock:

#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f33ab4a354f in _L_lock_1028 ()
    from /home/abenson/Galacticus/Tools/lib/libpthread.so.0
#2  0x00007f33ab4a336b in __pthread_mutex_lock (mutex=0x7f33ac40b340 <_gfortrani_unit_lock>)
    at pthread_mutex_lock.c:61
#3  0x00007f33ac1b6f3c in __gthread_mutex_lock (__mutex=0x7f33ac40b340 <_gfortrani_unit_lock>)
    at ../libgcc/gthr-default.h:749
#4  _gfortrani_newunit_alloc ()
    at ../../../gcc-trunk/libgfortran/io/unit.c:906
#5  0x00007f33ac1b706b in _gfortrani_get_unit (dtp=dtp@entry=0x7f33a1785800, do_create=do_create@entry=1)
    at ../../../gcc-trunk/libgfortran/io/unit.c:555
#6  0x00007f33ac1b4ebd in data_transfer_init (dtp=dtp@entry=0x7f33a1785800, read_flag=read_flag@entry=0)
    at ../../../gcc-trunk/libgfortran/io/transfer.c:2851
#7  0x00007f33ac1b5924 in _gfortran_st_write (dtp=dtp@entry=0x7f33a1785800)
    at ../../../gcc-trunk/libgfortran/io/transfer.c:4392

and was doing other writes to an internal file unit just prior to this.
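
To make the locking assumption above concrete, here is a minimal C sketch of
the pattern the backtrace suggests: a recursive walk over a tree of nodes that
compares (st_dev, st_ino) pairs while a single mutex is held. This is not the
libgfortran source. The names node_t, tree_lock, find_by_stat and find_locked
are hypothetical and chosen only for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for a node in the tree of units
   (not the libgfortran type). */
typedef struct node {
    dev_t st_dev;
    ino_t st_ino;
    struct node *left, *right;
} node_t;

/* Hypothetical lock playing the role that unit_lock plays in the report. */
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Recursive search for the node whose device and inode match *st.
   The caller must hold tree_lock for the whole walk. */
static node_t *find_by_stat(node_t *n, const struct stat *st)
{
    node_t *hit;

    if (n == NULL)
        return NULL;
    /* Same kind of comparison as the quoted line 1725; dereferencing n
       is where a stale pointer would fault. */
    if (st->st_dev == n->st_dev && st->st_ino == n->st_ino)
        return n;
    hit = find_by_stat(n->left, st);
    if (hit != NULL)
        return hit;
    return find_by_stat(n->right, st);
}

/* Locked entry point: take the lock, walk the tree, release the lock. */
node_t *find_locked(node_t *root, const struct stat *st)
{
    node_t *hit;

    pthread_mutex_lock(&tree_lock);
    hit = find_by_stat(root, st);
    pthread_mutex_unlock(&tree_lock);
    return hit;
}
```

In a scheme like this the walk is only safe if every code path that frees or
relinks a node also holds the same lock. A single path that modifies the tree
without it could leave the walker holding a pointer to unmapped memory, which
would look like the "Cannot access memory" result for *s. Whether such a path
exists in libgfortran is exactly what I have not been able to establish.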