I found a new approach. By emitting histograms dynamically according
to gaps in the address space, we can skip encoding long regions
of mostly-zero bins.
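
As a rough illustration of the idea (a minimal sketch, not the code in the
patch series), the splitting can be pictured as grouping sampled PC values
into ranges and starting a new histogram whenever the gap between
neighbouring samples exceeds a threshold. The function name
`split_into_ranges` and the `max_gap` parameter are hypothetical, chosen
only for this example.

```python
def split_into_ranges(samples, max_gap):
    """Group sample addresses into (low, high) ranges, high exclusive.

    Hypothetical helper: a new range starts whenever two neighbouring
    samples are more than max_gap bytes apart.
    """
    ranges = []
    low = prev = None
    for addr in sorted(set(samples)):
        if low is None:
            low = addr                      # first sample opens a range
        elif addr - prev > max_gap:
            ranges.append((low, prev + 1))  # close the current range
            low = addr                      # open a new one after the gap
        prev = addr
    if low is not None:
        ranges.append((low, prev + 1))      # close the final range
    return ranges


# Made-up sample addresses in two widely separated regions.
samples = [0x1000, 0x1004, 0x1010, 0x40000000, 0x40000008]
for low, high in split_into_ranges(samples, max_gap=0x100):
    print(hex(low), hex(high))
```

Each resulting range would get its own histogram record, so the empty
space between the two regions is never encoded.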

https://review.openocd.org/c/openocd/+/8739/ (plus the others)

Compared to increasing the single-histogram bucket count to fit the
target address space, this approach generates much smaller gmon files
for many systems.

In some cases, this approach may generate larger files than the previous
128K-bucket encoder, but it solves the problem where systems with a sparse
address space might end up with histogram bins larger than functions.
On ESP32-S3, it was common to see histogram bins larger than 200 bytes
when using only a few of the memory interfaces.
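
To show why a fixed bucket count gives coarse bins on a sparse address
space, here is a small worked calculation. It assumes "128K" means
128 * 1024 buckets, and the 32 MiB span is a made-up figure, not a real
ESP32-S3 memory map.

```python
def bytes_per_bin(low, high, buckets):
    """Bytes covered by each bin when one histogram spans [low, high)."""
    span = high - low
    return -(-span // buckets)  # ceiling division


BUCKETS = 128 * 1024             # assumption: 128K = 131072 buckets
low = 0x40000000                 # hypothetical lowest sampled address
high = low + 32 * 1024 * 1024    # hypothetical span of 32 MiB

print(bytes_per_bin(low, high, BUCKETS))  # 256
```

With bins that wide, several small functions can land in the same bin, so
their samples cannot be told apart.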

For compatibility with existing gprof builds (those predating a future
binutils 2.45 build), we round each histogram bin to 2 bytes. So the
result is not quite instruction-accurate on x86 and Xtensa, but it is
within 1 instruction. CPUs whose instruction sizes are divisible by two
are instruction-accurate.
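
A small sketch of what 2-byte bins mean for attribution, assuming a
straightforward mapping from PC to bin index. The helper names and the
addresses are hypothetical, and this is not the encoder from the patch.

```python
BIN_BYTES = 2  # bin width described above


def bin_index(pc, low):
    """Index of the 2-byte bin that a sampled PC falls into."""
    return (pc - low) // BIN_BYTES


def bin_range(index, low):
    """Address range [start, end) covered by a given bin."""
    start = low + index * BIN_BYTES
    return (start, start + BIN_BYTES)


low = 0x1000
# Two hypothetical 3-byte instructions at 0x1000 and 0x1003.
idx = bin_index(0x1003, low)
start, end = bin_range(idx, low)
print(idx, hex(start), hex(end))  # 1 0x1002 0x1004
```

The bin holding the sample at 0x1003 also covers 0x1002, the last byte of
the previous instruction, which is why odd-sized instructions are only
resolved to within one instruction.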

-Richard
