Hi,

I'm having some issues analyzing core files generated under valgrind. I do get the core file, but when I try to open it in gdb, all I get is entirely bogus information (backtrace etc.).

This is an rpi4 machine with 64-bit Debian, running a local build of valgrind 3.19.0 (built from source, not a package).

This is how I run the program (the postgres binary):

  valgrind --quiet --trace-children=yes --track-origins=yes \
  --read-var-info=yes --num-callers=20 --leak-check=no \
  --gen-suppressions=all --error-limit=no \
  --log-file=/tmp/valgrind.543917.log postgres \
  -D /home/debian/postgres/contrib/test_decoding/tmp_check_iso/data \
  -F -c listen_addresses= -k /tmp/pg_regress-n7HodE

I get a ~200MB core file in /tmp, which I try loading like this:

  gdb src/backend/postgres /tmp/valgrind.542299.log.core.542391

but all I get is this:

  Reading symbols from src/backend/postgres...
  [New LWP 542391]
  Cannot access memory at address 0xcc10cc00cbf0cc6
  Cannot access memory at address 0xcc10cc00cbf0cbe
  Core was generated by `'.
  Program terminated with signal SIGABRT, Aborted.
  #0  0x00000000049d42ac in ?? ()
  (gdb) bt
  #0  0x00000000049d42ac in ?? ()
  #1  0x0000000000400000 in dshash_dump (hash_table=0x0) at dshash.c:782
  #2  0x0000000000400000 in dshash_dump (hash_table=0x49c0e44) at dshash.c:782
  #3  0x0000000000000000 in ?? ()
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)

So the stack might be corrupt, for some reason? The first frames look entirely bogus too, though. The file size at least seems plausible - with 128MB of shared buffers, ~200MB is about what I'd expect.
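
I suppose I could also dump the raw note segments from the core with readelf (from binutils), to see whether the PID and registers recorded there look any saner than what gdb prints - something like:

  # list the ELF notes (NT_PRSTATUS, NT_FILE, ...) stored in the core
  readelf -n /tmp/valgrind.542299.log.core.542391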

The core is triggered by an assert in the source, and we even log a backtrace into the server log - that one looks much more plausible:

  TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 902, PID: 536049)
  (ExceptionalCondition+0x98)[0x8f5cec]
  (+0x57a574)[0x682574]
  (+0x579edc)[0x681edc]
  (ReorderBufferAddNewTupleCids+0x60)[0x6864dc]
  (SnapBuildProcessNewCid+0x94)[0x68b6a4]
  (heap2_decode+0x17c)[0x671584]
  (LogicalDecodingProcessRecord+0xbc)[0x670cd0]
  (+0x570f88)[0x678f88]
  (pg_logical_slot_get_changes+0x1c)[0x6790fc]
  (ExecMakeTableFunctionResult+0x29c)[0x4a92c0]
  (+0x3be638)[0x4c6638]
  (+0x3a2c14)[0x4aac14]
  (ExecScan+0x8c)[0x4aaca8]
  (+0x3bea14)[0x4c6a14]
  (+0x39ea60)[0x4a6a60]
  (+0x392378)[0x49a378]
  (+0x39520c)[0x49d20c]
  (standard_ExecutorRun+0x214)[0x49aad8]
  (ExecutorRun+0x64)[0x49a8b8]
  (+0x62e2ac)[0x7362ac]
  (PortalRun+0x27c)[0x735f08]
  (+0x626be8)[0x72ebe8]
  (PostgresMain+0x9a0)[0x733e9c]
  (+0x547be8)[0x64fbe8]
  (+0x547540)[0x64f540]
  (+0x542d30)[0x64ad30]
  (PostmasterMain+0x1460)[0x64a574]
  (+0x418888)[0x520888]
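
As an aside, the anonymous (+0x...) frames in that backtrace can presumably be resolved against the binary with addr2line - assuming this is the same build that produced the log, and using the parenthesized offsets for a PIE binary (for a non-PIE binary the bracketed absolute addresses should be the right thing). Roughly:

  # map a couple of the anonymous frames back to function/file:line
  addr2line -f -e src/backend/postgres 0x57a574 0x579edc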

Clearly, this is not the kind of issue valgrind is meant to detect (invalid memory access etc.), but an application bug. I've tried reproducing it without valgrind, but it only ever happens under valgrind - my theory is it's some sort of race condition, and valgrind changes the timing in a way that makes it much more likely to hit. That's why I need to analyze the core and inspect the state more closely.
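
One thing I haven't tried yet is skipping the core entirely and attaching gdb through valgrind's gdbserver while the process is still alive - if I understand the manual correctly, roughly:

  # make valgrind's gdbserver wait for a gdb connection before running
  valgrind --vgdb=yes --vgdb-error=0 ... postgres ...

  # then, from another terminal
  gdb src/backend/postgres
  (gdb) target remote | vgdb --pid=<pid of the failing backend>

but ideally I'd still like to understand why the core itself doesn't load.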

Any ideas what I might be doing wrong, or how I should be loading the core file?


thanks
Tomas


