Improving profiler histogram resolution

Richard Allen Sat, 14 Dec 2024 13:19:05 -0800

Currently, OpenOCD's gmon profiling output is limited to a 128K-bucket
histogram. This works well when samples lie within roughly 128K instructions
of each other, but when a profile contains program counter samples with
extremes beyond that, we lose instruction-level granularity because each
bucket then covers multiple instructions. In many cases, the presence
of multiple instruction memories at sparse addresses makes 128K-bucket
profiles too coarse to be usable, as a single bucket can span multiple
functions.


One workaround is to specify the minimum and maximum address profiled,
but this is harder to work with: each memory must be profiled
individually, and the results are confusing because the filtered-out
samples are not removed from the histogram duration.

For example, the ESP32-S3 has ~512MB of instruction-bus address space
between its RTC instruction area, XIP range, and internal ROM0.
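To put a number on the loss of granularity, here is a rough sketch of the
bucket-width arithmetic. It assumes the histogram simply divides the sampled
address span evenly across the available buckets; the function name
bucket_width_bytes is hypothetical and is not taken from the OpenOCD source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of address space covered by one bucket when the span
 * [low_pc, high_pc) is divided evenly across at most max_buckets
 * buckets (rounded up). Hypothetical helper, for illustration only.
 */
uint64_t bucket_width_bytes(uint64_t low_pc, uint64_t high_pc,
                            uint64_t max_buckets)
{
    assert(high_pc > low_pc && max_buckets > 0);
    uint64_t span = high_pc - low_pc;
    return (span + max_buckets - 1) / max_buckets;
}
```

With a 512MB span and 128K buckets this gives 4096 bytes per bucket, which is
easily enough to cover several small functions. With a span of 64KB the same
limit still gives one byte per bucket.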

The following patch set first sorts the samples, then encodes one bucket
at a time. I've left the limiter in place but raised the limit to 256M buckets,
which still prevents giant gmon files on systems with very large instruction
address ranges. On my ESP32-S3, I often get profiles of 30-40MB.
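As a minimal sketch of the sort-then-encode idea (not the code from the
patches), the following sorts the PC samples once and then produces bucket
counts one at a time in address order, so the full histogram never has to be
held in memory. The struct bucket_iter type and both function names are
hypothetical, and the 16-bit saturating counter is an assumption based on the
counter size gmon histograms conventionally use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical streaming state; not an OpenOCD structure. */
struct bucket_iter {
    const uint32_t *samples; /* PC samples, sorted ascending */
    size_t n;                /* number of samples */
    size_t pos;              /* index of next unconsumed sample */
    uint64_t bucket_end;     /* exclusive end address of current bucket */
    uint32_t width;          /* bytes covered by each bucket */
};

int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Sort the samples in place and position the iterator at low_pc. */
void bucket_iter_init(struct bucket_iter *it, uint32_t *samples, size_t n,
                      uint32_t low_pc, uint32_t width)
{
    assert(width > 0);
    if (n > 1)
        qsort(samples, n, sizeof(samples[0]), cmp_u32);
    it->samples = samples;
    it->n = n;
    it->pos = 0;
    /* Skip samples that fall below the first bucket. */
    while (it->pos < n && samples[it->pos] < low_pc)
        it->pos++;
    it->bucket_end = (uint64_t)low_pc + width;
    it->width = width;
}

/* Return the count for the current bucket, then advance to the next. */
uint16_t bucket_iter_next(struct bucket_iter *it)
{
    uint32_t count = 0;
    while (it->pos < it->n && it->samples[it->pos] < it->bucket_end) {
        if (count < UINT16_MAX) /* saturate instead of wrapping */
            count++;
        it->pos++;
    }
    it->bucket_end += it->width;
    return (uint16_t)count;
}
```

A writer would call bucket_iter_next() once per bucket and append each count
to the output file. Because the samples are sorted, every sample is visited
exactly once, and the cost after sorting is linear in samples plus buckets.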

 Patches:
  https://review.openocd.org/c/openocd/+/8603/
  https://review.openocd.org/c/openocd/+/8604/
  https://review.openocd.org/c/openocd/+/8605/
  https://review.openocd.org/c/openocd/+/8277/

I also wanted to ask for advice about the gmon.out format - perhaps
there is a better way to encode these sparse profiles in gmon files? Would
one histogram per memory bus, or some other scheme, work better?

Thanks,
-Richard
