OK, but why would that break core files only with valgrind? Because when run
directly, the core files work perfectly fine.

[Rhetorical]  Why are there bugs?

[Practical]  The operating system itself is the writer of ordinary
core files, which contain process state: register values, copies of
writable pages, partial information from read-only pages, etc.
Valgrind is an in-process emulator.  As far as the OS is concerned,
the process is valgrind, not postgresql.  The register values
are those of the valgrind emulator's own internal code, not of the target
program that valgrind is emulating.
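To make that concrete: on Linux an ELF core file is an ELF header plus a list of program headers.  PT_NOTE segments carry the register state and process information; PT_LOAD segments carry the dumped memory (writable mappings in full, read-only ones often only partly).  The sketch below is only an illustration, not code from valgrind or gdb; `summarize_core` and `struct core_summary` are hypothetical names, and it assumes a 64-bit core in the host's byte order, already read into memory, using glibc's <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Counts of the program-header kinds that matter in an ELF core file. */
struct core_summary {
    int load_writable;   /* PT_LOAD with PF_W: copies of writable pages */
    int load_readonly;   /* PT_LOAD without PF_W (may be only partly dumped) */
    int note;            /* PT_NOTE: registers, process info, etc. */
};

/* Summarize a 64-bit ELF core image held in memory.
 * Returns 0 on success, -1 if the buffer is not a well-formed 64-bit core.
 * Assumption: the file's byte order matches the host's. */
int summarize_core(const unsigned char *buf, size_t len,
                   struct core_summary *out)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);   /* memcpy avoids unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_type != ET_CORE)
        return -1;

    memset(out, 0, sizeof *out);
    for (size_t i = 0; i < (size_t)eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = (size_t)eh.e_phoff + i * (size_t)eh.e_phentsize;
        /* Bounds-check every program header before copying it. */
        if ((size_t)eh.e_phentsize < sizeof ph || off > len ||
            len - off < sizeof ph)
            return -1;
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type == PT_NOTE)
            out->note++;
        else if (ph.p_type == PT_LOAD && (ph.p_flags & PF_W))
            out->load_writable++;
        else if (ph.p_type == PT_LOAD)
            out->load_readonly++;
    }
    return 0;
}
```

`readelf -l` on a real core file lists the same program headers, which is a quick way to compare an OS-written core against a vgcore.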

For the core file to look as if it was generated by postgresql,
valgrind itself must write the core file.  The spec for the layout
of a core file (the C-language 'struct' that corresponds to the
sequence of bytes in the file) is rife with opportunities for bugs.
First, the spec is hard to find, or may refer to other documents
that are hard to access.  (What _exactly_ is the entire programmer-
visible register state?)
Then the spec is not executable (directly compilable).  Often the
spec or the C-language 'struct' is not updated in a timely manner
when the hardware or the OS changes.
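For a sense of how byte-exact this is, here is a hedged sketch of walking the notes inside one PT_NOTE segment.  Each note is an Elf64_Nhdr (three 32-bit words: name size, descriptor size, type) followed by the name and then the descriptor, each padded to a 4-byte boundary, which is the convention Linux core files use.  The descriptor of an NT_PRSTATUS note is the architecture-specific register struct, so a generic walker can only hand back raw bytes; interpreting them is exactly where an out-of-date spec bites.  `find_note` is a hypothetical helper, not a valgrind or gdb function.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round n up to the 4-byte alignment used for note names and
 * descriptors in Linux core files (an assumption for other systems). */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Walk the notes of one PT_NOTE segment already read into memory.
 * Returns a pointer to the descriptor of the first note whose type is
 * `type` and stores its size in *descsz; returns NULL if absent or
 * if the segment is malformed. */
const unsigned char *find_note(const unsigned char *seg, size_t len,
                               uint32_t type, size_t *descsz)
{
    size_t off = 0;
    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        memcpy(&nh, seg + off, sizeof nh);
        size_t name_off = off + sizeof nh;
        size_t desc_off = name_off + align4(nh.n_namesz);
        /* Reject a name or descriptor that runs past the segment. */
        if (desc_off > len || len - desc_off < nh.n_descsz)
            return NULL;
        if (nh.n_type == type) {
            *descsz = nh.n_descsz;
            return seg + desc_off;
        }
        off = desc_off + align4(nh.n_descsz);
        if (off > len)
            return NULL;
    }
    return NULL;
}
```

`readelf -n` prints the same notes by name and type, which makes it easy to see whether a vgcore carries the notes a debugger expects.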

In practice it is very easy for there to be a discrepancy involving
the presence, order, width, or alignment of various fields,
especially for condition codes, processor modes (32 or 64 bit?),
and optional register files or accelerators (floating point,
SIMD, vector units, etc.)
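A toy illustration of the order/alignment problem, using two made-up register structs (`regs_writer` and `regs_reader` are hypothetical, not taken from any real core-file spec).  They have the same fields in a different order, so a reader built against the wrong definition gets the first field right and garbage after it, with no error reported anywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout used by whatever wrote the core file. */
struct regs_writer {
    uint64_t pc;
    uint32_t flags;   /* condition codes placed before sp */
    uint64_t sp;      /* compiler inserts padding before this field */
};

/* Hypothetical (stale) layout assumed by whatever reads it. */
struct regs_reader {
    uint64_t pc;
    uint64_t sp;
    uint32_t flags;
};

/* Reinterpret the writer's raw bytes through the reader's struct,
 * the way a tool compiled against an outdated definition would. */
struct regs_reader misread(const struct regs_writer *w)
{
    struct regs_reader r;
    size_t n = sizeof r < sizeof *w ? sizeof r : sizeof *w;
    memset(&r, 0, sizeof r);
    memcpy(&r, w, n);   /* raw bytes, no field-by-field conversion */
    return r;
}
```

Only `pc` comes through intact; the reader's `sp` is assembled from `flags` plus padding bytes, and nothing in the bytes themselves reveals the mismatch.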



... attached is a simple .c file with a trivial example (3 functions) and a
segfault (or abort). ...
The core file produced without valgrind is perfectly fine:

   $ gdb ./a.out core
   ...
   Core was generated by `./a.out'.
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
   6        *ptr = 'a';
   (gdb) bt
   #0  0x0000005594350734 in f3 () at valgrind-core-test.c:6
   #1  0x0000005594350750 in f2 () at valgrind-core-test.c:13
   #2  0x0000005594350768 in f1 () at valgrind-core-test.c:18
   #3  0x0000005594350780 in main () at valgrind-core-test.c:23

but when run under valgrind it looks like this:

   $ gdb ./a.out vgcore.1395835
   ...
   Core was generated by `'.
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0  0x0000000000108734 in ?? ()
   (gdb) bt
   #0  0x0000000000108734 in ?? ()
   #1  0x0000000000108780 in ?? ()
   #2  0x0000000000108644 in ?? ()

However, when I do this on x86 (Fedora 34, gcc 11.3.1, valgrind 3.18.1) it 
works just fine and I get the same backtrace.

So perhaps this is specific to either gcc 10.2 or the aarch64 platform.

Bingo!  You now have the raw material for a very good bug report:
"valgrind-generated core files lose debugging info on aarch64".
Please file a bug report; see https://valgrind.org/support/bug_reports.html

(Also note that several program-counter values in the two tracebacks
agree in the lowest 12 bits (3 hex digits).  So this may reflect
some confusion about the placement ("relocation") of [groups of]
whole pages in the address space.)
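The arithmetic behind that observation, as a small C sketch.  It assumes 4 KiB pages (aarch64 kernels can also be configured for 16 KiB or 64 KiB pages), and `same_page_offset` and `relocation_delta` are hypothetical helpers written only for this illustration.

```c
#include <stdint.h>

/* Low 12 bits of an address: its offset within a 4 KiB page
 * (the 4 KiB page size is an assumption). */
static uint64_t page_offset(uint64_t addr) { return addr & 0xfffu; }

/* Nonzero if both addresses sit at the same offset within their pages. */
int same_page_offset(uint64_t a, uint64_t b)
{
    return page_offset(a) == page_offset(b);
}

/* If the same code was mapped at two different bases, the difference
 * between corresponding PCs is that base difference, a whole number
 * of pages. */
uint64_t relocation_delta(uint64_t native_pc, uint64_t vg_pc)
{
    return native_pc - vg_pc;
}
```

With the addresses from the two backtraces, the pairs 0x0000005594350734/0x0000000000108734 (f3) and 0x0000005594350780/0x0000000000108780 both differ by exactly 0x5594248000, a whole number of 4 KiB pages.  That is consistent with the same code having been mapped at a different base address.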


_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
