Hi Tom and -hackers! On Thu, Mar 28, 2024 at 7:36 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > > Jakub Wartak <jakub.war...@enterprisedb.com> writes: > > While chasing some other bug I've learned that backtrace_functions > > might be misleading with top elog/ereport() address. > > That was understood from the beginning: this type of backtrace is > inherently pretty imprecise, and I doubt there is much that can > be done to make it better. IIRC the fundamental problem is it only > looks at global symbols, so static functions inherently defeat it. > It was argued that this is better than nothing, which is true, but > you have to take the info with a mountain of salt.
OK, point taken, thanks for describing the limitations, still I find backtrace_functions often the best thing we have primarily due its simplicity and ease of use (for customers). I found out simplest portable way to generate NOP (or any other instruction that makes the problem go away): with the reproducer, psql returns: psql:ri-collation-bug-example.sql:48: error: ERROR: XX000: cache lookup failed for collation 0 LOCATION: get_collation_isdeterministic, lsyscache.c:1062 logfile without patch: 2024-05-07 09:05:43.043 CEST [44720] ERROR: cache lookup failed for collation 0 2024-05-07 09:05:43.043 CEST [44720] BACKTRACE: postgres: postgres postgres [local] DELETE(+0x155883) [0x55ce5a86a883] postgres: postgres postgres [local] DELETE(RI_FKey_cascade_del+0x2b0) [0x55ce5ac6dfd0] postgres: postgres postgres [local] DELETE(+0x2d488b) [0x55ce5a9e988b] [..] $ addr2line -e /usr/pgsql18/bin/postgres -a -f 0x155883 0x0000000000155883 get_constraint_type.cold <<<<<< so it's wrong as the real function should be get_collation_isdeterministic logfile with attached patch: 2024-05-07 09:11:06.596 CEST [51185] ERROR: cache lookup failed for collation 0 2024-05-07 09:11:06.596 CEST [51185] BACKTRACE: postgres: postgres postgres [local] DELETE(+0x168bf0) [0x560e1cdfabf0] postgres: postgres postgres [local] DELETE(RI_FKey_cascade_del+0x2b0) [0x560e1d200c00] postgres: postgres postgres [local] DELETE(+0x2e90fb) [0x560e1cf7b0fb] [..] $ addr2line -e /usr/pgsql18/bin/postgres -a -f 0x168bf0 0x0000000000168bf0 get_collation_isdeterministic.cold <<< It's ok with the patch NOTE: in case one will be testing this: one cannot ./configure with --enable-debug as it prevents the compiler optimizations that actually end up with the ".cold" branch optimizations that cause backtrace() to return the wrong address. > I recall speculating about whether we could somehow invoke gdb > to get a more comprehensive and accurate backtrace, but I don't > really have a concrete idea how that could be made to work. Oh no, I'm more about micro-fix rather than doing some bigger revolution. The patch simply adds this one instruction in macro, so that now backtrace_functions behaves better: 0x0000000000773d28 <+105>: call 0x79243f <errfinish> 0x0000000000773d2d <+110>: movzbl -0x12(%rbp),%eax << this ends up being added by the patch 0x0000000000773d31 <+114>: call 0xdc1a0 <abort@plt> I'll put that as for PG18 in CF, but maybe it could be backpatched too - no hard opinion on that (I don't see how that ERROR/FATAL path could cause any performance regressions) -J.
v1-0001-Tweak-ereport-so-that-it-generates-proper-address.patch
Description: Binary data