After investigating this, I found that the root cause of the timeouts is
GDB using the wrong type of breakpoint (a 4-byte ARM breakpoint instead
of a 2-byte Thumb breakpoint), which leads to unexpected behavior.
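To see why the breakpoint size matters, here is a minimal Python sketch. It models a run of Thumb code as 2-byte instructions (real Thumb-2 code also has 32-bit instructions, which this sketch ignores) and writes a breakpoint over it. The trap byte patterns are placeholders, not the encodings GDB actually writes; only their sizes matter here.

```python
# Hypothetical model: a run of 16-bit Thumb instructions as raw bytes.
# Real Thumb-2 code also contains 32-bit instructions; this sketch ignores them.
THUMB_INSN_SIZE = 2

# Placeholder trap patterns; only their sizes matter. These are NOT the
# real encodings GDB writes.
ARM_BP = b"\xaa" * 4    # stands in for a 4-byte ARM-mode breakpoint
THUMB_BP = b"\xbb" * 2  # stands in for a 2-byte Thumb-mode breakpoint


def insert_breakpoint(code: bytearray, offset: int, bp: bytes) -> list:
    """Write bp over code at offset and return the offsets of every
    2-byte instruction slot the write touched."""
    end = offset + len(bp)
    code[offset:end] = bp
    first = offset - offset % THUMB_INSN_SIZE
    return list(range(first, end, THUMB_INSN_SIZE))


code = bytes(range(8))  # four hypothetical 2-byte Thumb instructions
print(insert_breakpoint(bytearray(code), 2, THUMB_BP))  # [2]
print(insert_breakpoint(bytearray(code), 2, ARM_BP))    # [2, 4]
```

With the Thumb-sized breakpoint only the probed instruction is replaced. The ARM-sized one also clobbers the following instruction, and in Thumb state those bytes are decoded as Thumb instructions, not as the ARM trap GDB meant to insert.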

The reason this happens is a bit more involved, though. GDB has a
couple of mechanisms for tracking the loading and unloading of shared
libraries in dynamically-linked binaries: one based on _dl_debug_state
and r_brk, and one based on stap probes.

Up until Ubuntu 18.04 (glibc 2.27), GDB could not use the stap probes
mechanism because it ran into a bug when parsing the stap probe
expressions. The probe check therefore failed, and GDB fell back to the
old _dl_debug_state/r_brk mechanism.

The _dl_debug_state/r_brk mechanism works because ld.so has an entry
for _dl_debug_state in its .dynsym section. ld.so is completely stripped
of mapping symbols (another way to tell ARM and Thumb code apart), which
are only available in the debug symbols file. Even so, GDB can still
tell whether _dl_debug_state is ARM or Thumb code, because the ELF
symbol itself carries a flag indicating so. That's why this fallback
mechanism works.
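I'm assuming the flag meant here is the ARM ELF ABI convention: bit 0 of an STT_FUNC symbol's value is set for Thumb functions, and the real code address has that bit cleared. The minimal Python sketch below decodes one little-endian Elf32_Sym entry. The symbol bytes are synthetic, not read from ld.so; a real tool would take them from the .dynsym section.

```python
import struct

# Elf32_Sym layout: st_name, st_value, st_size, st_info, st_other, st_shndx
ELF32_SYM = "<IIIBBH"  # little-endian, 16 bytes
STT_FUNC = 2           # symbol type: function
STB_GLOBAL = 1         # symbol binding: global


def decode_arm_func_symbol(raw: bytes) -> tuple:
    """Return (code address, is_thumb) for a 32-bit ARM function symbol.
    Bit 0 of an STT_FUNC symbol's value marks a Thumb function; the
    actual code address has that bit cleared."""
    _name, value, _size, info, _other, _shndx = struct.unpack(ELF32_SYM, raw)
    if (info & 0xF) != STT_FUNC:
        raise ValueError("not a function symbol")
    return value & ~1, bool(value & 1)


# Synthetic .dynsym entry for a Thumb function at 0x1234 (value 0x1235).
raw = struct.pack(ELF32_SYM, 0, 0x1235, 8, (STB_GLOBAL << 4) | STT_FUNC, 0, 9)
addr, is_thumb = decode_arm_func_symbol(raw)
print(hex(addr), is_thumb)  # 0x1234 True
```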

On Ubuntu 20.04, with glibc 2.31, GDB no longer runs into problems with
stap probes, so it uses this mechanism instead of the old
_dl_debug_state/r_brk one.

Both mechanisms work by having GDB insert breakpoints at specific
locations so that shared library events can be tracked. In the stap
probes case, however, there are no real symbols at those locations.

What we have instead is metadata containing the name of each probe and
its address. That address falls within some function; for example, the
init_start and init_complete probe points fall within dl_main. The probe
points do not seem to carry any information about whether the code at
that address is ARM or Thumb.
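For reference, this metadata lives in ELF notes in the .note.stapsdt section. Under the SystemTap SDT note layout as I understand it (owner "stapsdt", note type 3), each descriptor holds three addresses followed by NUL-terminated provider, name and argument strings. The minimal Python sketch below decodes one such descriptor. The bytes are synthetic, the addresses and argument string are made up, and "rtld" is assumed to be the provider name glibc uses for these probes.

```python
import struct


def parse_stapsdt_desc(desc: bytes, addr_size: int = 4) -> dict:
    """Decode the descriptor of a .note.stapsdt note: three target
    addresses (probe pc, .stapsdt.base, semaphore) followed by the
    NUL-terminated provider, probe name and argument strings."""
    word = "I" if addr_size == 4 else "Q"
    fmt = "<" + word * 3  # little-endian, as on armhf
    pc, base, semaphore = struct.unpack_from(fmt, desc)
    provider, name, args = desc[struct.calcsize(fmt):].split(b"\0")[:3]
    return {"pc": pc, "base": base, "semaphore": semaphore,
            "provider": provider.decode(), "name": name.decode(),
            "args": args.decode()}


# Hypothetical descriptor; the addresses and argument string are made up.
desc = (struct.pack("<III", 0x5A3C, 0x1F000, 0)
        + b"rtld\0init_start\0-4@r0 4@r1\0")
probe = parse_stapsdt_desc(desc)
print(probe["provider"], probe["name"], hex(probe["pc"]))
# The record holds only a plain address: nothing says ARM or Thumb.
```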

As before, the mapping symbols would tell us the mode, but ld.so is
stripped and doesn't carry them. GDB could instead look at the ELF
symbol of the function the probe sits in, except that these symbols are
not special in any way (unlike _dl_debug_state, they are not in
.dynsym), so they have been stripped as well. The ARM/Thumb information
is therefore completely gone, and GDB can no longer make the correct
decision.

With no information to go on, GDB defaults to ARM mode for the
breakpoint, which is wrong for Thumb code.
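The decision described above can be modeled as a lookup order: mapping symbols first, then the Thumb bit of the enclosing function's ELF symbol, then a default of ARM. This is a simplified Python sketch of that order as described in this comment, not GDB's actual code (GDB also honors user overrides such as `set arm force-mode`, among other cases). All addresses are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def mode_from_mapping_symbols(addr: int,
                              mapping: List[Tuple[int, str]]) -> Optional[str]:
    """mapping: (address, name) pairs sorted by address, names '$a', '$t'
    or '$d', possibly with a suffix such as '$t.1'. The closest mapping
    symbol at or below addr decides the mode; '$d' (data) decides nothing."""
    addrs = [a for a, _ in mapping]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    return {"$a": "arm", "$t": "thumb"}.get(mapping[i][1][:2])


def mode_from_function_symbols(addr: int,
                               funcs: List[Tuple[int, int]]) -> Optional[str]:
    """funcs: (st_value, st_size) pairs; bit 0 of st_value marks Thumb."""
    for value, size in funcs:
        start = value & ~1
        if start <= addr < start + size:
            return "thumb" if value & 1 else "arm"
    return None


def breakpoint_mode(addr, mapping, funcs) -> str:
    # 1) mapping symbols, 2) enclosing function symbol, 3) assume ARM.
    return (mode_from_mapping_symbols(addr, mapping)
            or mode_from_function_symbols(addr, funcs)
            or "arm")


PROBE = 0x1240              # hypothetical probe inside a Thumb function
MAPPING = [(0x1234, "$t")]  # present only in the debug symbols file
FUNCS = [(0x1235, 0x100)]   # Thumb function symbol (bit 0 set)
print(breakpoint_mode(PROBE, MAPPING, FUNCS))  # thumb
print(breakpoint_mode(PROBE, [], FUNCS))       # thumb
print(breakpoint_mode(PROBE, [], []))          # arm (the stripped ld.so case)
```

The last call is the situation on Ubuntu 20.04: with both sources of information stripped, the probe in Thumb code gets an ARM breakpoint.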

There are two possible solutions:

1 - Fall back to using _dl_debug_state/r_brk for armhf in GDB. GDB's
maintainers consider this bad, because it means using an outdated
mechanism instead of better interfaces.

2 - Don't strip the glibc/ld.so function symbols that contain stap
probes.

Right now, these are the functions that contain probes and in which GDB
needs to insert breakpoints of the correct (ARM or Thumb) type:

dl_main, _dl_map_object_from_fd, lose, dl_open_worker and
_dl_close_worker
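Picking the functions for option 2 amounts to checking which function address ranges contain a probe address. Minimal Python sketch: `functions_with_probes` is a hypothetical helper, all addresses are invented, and `_dl_start` is included only as an example of a function without probes. A real tool would read the ranges from the debug symbols' .symtab and the probe addresses from .note.stapsdt.

```python
# Hypothetical symbols, as they might appear in an unstripped ld.so
# (name, start address, size); every address here is made up.
SYMS = [
    ("dl_main", 0x1000, 0x800),
    ("_dl_start", 0x1800, 0x100),
    ("lose", 0x2000, 0x40),
    ("_dl_map_object_from_fd", 0x2040, 0x600),
    ("dl_open_worker", 0x3000, 0x200),
    ("_dl_close_worker", 0x3400, 0x300),
]
# Hypothetical probe addresses, as they might be read from .note.stapsdt.
PROBES = [0x1010, 0x1700, 0x2010, 0x2100, 0x3050, 0x3500]


def functions_with_probes(syms, probe_addrs):
    """Return the sorted names of functions whose [start, start + size)
    range contains at least one probe address."""
    return sorted({name for name, start, size in syms
                   if any(start <= pc < start + size for pc in probe_addrs)})


print(functions_with_probes(SYMS, PROBES))
# ['_dl_close_worker', '_dl_map_object_from_fd', 'dl_main', 'dl_open_worker', 'lose']
```

The resulting names could then be passed to something like binutils `strip --keep-symbol=NAME` when stripping ld.so; that is one possible approach, not a tested packaging change.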

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1927192

Title:
  gdb ftbfs on armhf, testsuite timeouts

To manage notifications about this bug go to:
https://bugs.launchpad.net/gdb/+bug/1927192/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
