On 9/9/22 18:20, John Reiser wrote:
[[Aggressive snipping, but relevant details preserved.]]

No threading is used. Postgres is multi-process, and uses shared memory for the shared cache (through shm_open etc.).

Multi-process plus shm_open() IS THREADING!  Not pthreads, but multiple
execution contexts that read and write the same memory, which is
subject to the same types of synchronization errors as pthreads.
Perhaps --tool=drd and --tool=helgrind can help.


OK, but why would that break core files only with valgrind? Because when run directly, the core files work perfectly fine.


[[Another topic]]
Sure, but that's more of a workaround - it does not make the core file useful, it provides an alternative way to get to the same result. Plus it requires additional tooling/scripting, and I'd prefer keeping the tooling as simple as possible.

I made a specific suggestion that takes less than one hour: build a small test case that performs a short chain of subroutine calls, with the last routine generating a deliberate SIGABRT.  Run the test case under valgrind, get a core file from valgrind, and see if gdb gives the correct traceback from that core file.  The objective is to provide a strong clue about whether *every* core file generated by valgrind (in your environment) fails to work well with gdb.  Perhaps solving the problem that involves your larger and more-complex case can be subsumed by analyzing something that is much simpler.

Please perform that experiment and report the results here.


I did this experiment - attached is a simple .c file with a trivial example (a chain of three functions) that ends in a segfault (or, alternatively, an abort). When built like this:

  $ gcc valgrind-core-test.c -O0 -g

then the core file produced without valgrind is perfectly fine:

  $ gdb ./a.out core
  ...
  Core was generated by `./a.out'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
  6             *ptr = 'a';
  (gdb) bt
  #0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
  #1  0x0000005594350750 in f2 () at valgrind-core-test.c:13
  #2  0x0000005594350768 in f1 () at valgrind-core-test.c:18
  #3  0x0000005594350780 in main () at valgrind-core-test.c:23

but when run under valgrind it looks like this:

  $ gdb ./a.out vgcore.1395835
  ...
  Core was generated by `'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x0000000000108734 in ?? ()
  (gdb) bt
  #0  0x0000000000108734 in ?? ()
  #1  0x0000000000108780 in ?? ()
  #2  0x0000000000108644 in ?? ()

However, when I do this on x86 (Fedora 34, gcc 11.3.1, valgrind 3.18.1), it works just fine and the valgrind core file gives the same backtrace as the native one.

So perhaps this is specific to either gcc 10.2 or the aarch64 platform.


regards
Tomas

[[Attachment: valgrind-core-test.c]]
#include <stdlib.h>

char * f3(void)
{
        char *ptr = NULL;
        *ptr = 'a';
        return ptr;
        // abort();
}

void f2(void)
{
        (void) f3();
}

void f1(void)
{
        f2();
}

int main(void)
{
        f1();
        return 0;
}

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/10/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 10.2.1-6' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.1 20210110 (Debian 10.2.1-6) 
