https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97542

            Bug ID: 97542
           Summary: Enable OpenMP efficient performance profiling via ITT
                    tracing
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vitaly.slobodskoy at huawei dot com
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49429
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49429&action=edit
OpenMP runtime changes

To optimize OpenMP workloads, it is important to have a dedicated
performance analysis tool that understands the specifics of the OpenMP runtime.
Typical OpenMP performance issues are:
        - Not all the performance-critical code is parallel
          * Serial time significantly affects scaling (Amdahl’s law)
        - Work balance is not good
          * Not all the cores are doing useful work
        - Overhead on
          * Synchronization
          * Scheduling
          * Thread creation
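
To make the Amdahl's law point concrete, here is a small sketch of the
bound. The function name is made up for illustration and is not part of the
attached patch.

```c
/* Amdahl's law: upper bound on speedup when a fraction of the run time
   stays serial.  serial_fraction is in [0, 1]; nthreads must be >= 1.  */
double
amdahl_speedup (double serial_fraction, unsigned nthreads)
{
  return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nthreads);
}
```

For example, amdahl_speedup (0.1, 16) gives 6.4: with 10% serial time,
16 threads cannot deliver more than a 6.4x speedup.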

A performance analysis tool should be able to distinguish the serially executed
portion from parallel execution within work-sharing constructs. Imbalance
within a parallel region can hardly be calculated without dedicated runtime
support.
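
As a rough illustration of why runtime support is needed, the sketch below
computes an imbalance figure for one parallel region from per-thread busy
times, which only the runtime can observe. The definition used here (average
time threads spend waiting for the slowest thread) and the function name are
assumptions for illustration, not the definition used by any particular tool
or by the attached patch.

```c
#include <stddef.h>

/* Hypothetical helper: busy[i] is the time thread i spent doing useful
   work inside one parallel region.  Returns the average time a thread
   spent idle while waiting for the slowest thread to finish.  */
double
region_imbalance (const double *busy, size_t nthreads)
{
  if (nthreads == 0)
    return 0.0;

  /* The region lasts as long as its slowest thread.  */
  double longest = busy[0];
  for (size_t i = 1; i < nthreads; i++)
    if (busy[i] > longest)
      longest = busy[i];

  /* Sum the idle time of every thread relative to the slowest one.  */
  double idle = 0.0;
  for (size_t i = 0; i < nthreads; i++)
    idle += longest - busy[i];

  return idle / (double) nthreads;
}
```

With busy times of 4, 2, 2 and 4 seconds on four threads, this yields an
average imbalance of 1 second per thread.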

The proposal is to instrument the GCC OpenMP runtime with the ITT API
(https://github.com/intel/ittapi), as was already done for LLVM
(https://github.com/llvm/llvm-project/tree/master/openmp/runtime/src/thirdparty/ittnotify),
to enable dedicated OpenMP support in tools such as Intel VTune
(https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/methodologies/openmp-code-analysis-method.html)
and others. This would enable the "Serial Time", "Parallel Time" and
"Imbalance Time" metrics and would allow performance tools to focus on serial
or parallel execution.

ITT is a lightweight API for source-based instrumentation. The open-source part
is simply a set of API declarations and a single .c file that loads a dynamic
ITT library (the so-called ITT collector, which anyone can easily create). To
enable tracing, the target application needs to be launched with the
"INTEL_LIBITTNOTIFY64=<collector>" environment variable set. Otherwise the ITT
calls do nothing and cause no noticeable runtime overhead.
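
The loading mechanism described above can be sketched roughly as follows.
This is a simplified illustration, not the actual ITT loader: only the
INTEL_LIBITTNOTIFY64 variable comes from the description above, while the
function names, the single hook, and the "collector_region_begin" symbol are
made up for the example.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tracing hook; the real ITT API has many entry points.  */
typedef void (*trace_fn) (const char *name);

static void
trace_noop (const char *name)
{
  (void) name;
}

/* Starts as a no-op, so calls stay cheap when no collector is loaded.  */
static trace_fn trace_region_begin = trace_noop;

/* Try to load a collector from PATH.  Returns 1 on success, 0 if PATH
   is empty or the library or its hook cannot be found.  */
int
trace_init_from (const char *path)
{
  if (path == NULL || *path == '\0')
    return 0;

  void *lib = dlopen (path, RTLD_LAZY);
  if (lib == NULL)
    return 0;

  /* "collector_region_begin" is a made-up symbol name.  */
  trace_fn fn = (trace_fn) dlsym (lib, "collector_region_begin");
  if (fn == NULL)
    {
      dlclose (lib);
      return 0;
    }

  trace_region_begin = fn;
  return 1;
}

/* Look up the collector named by the environment, if any.  */
int
trace_init (void)
{
  return trace_init_from (getenv ("INTEL_LIBITTNOTIFY64"));
}

/* Called from instrumented code; a no-op unless a collector is loaded.  */
void
trace_begin (const char *name)
{
  trace_region_begin (name);
}
```

When the variable is unset, each instrumented call costs only an indirect
call to an empty function, which matches the "no noticeable overhead"
behavior described above.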

Attached is the initial proposal for the ITT integration, enabling the
Serial/Parallel Time metrics:
        - core.patch contains the actual changes within the OpenMP runtime
        - Itt.patch is the integration of the ITT API (GPLv2 license is used)
        - autogenerated.patch is the list of files autogenerated by running
          "autoreconf" within the libgomp directory

This proposal adds a new "--disable-itt-instrumentation" configure option, which
completely disables (removes) all the tracing. Tracing is ON by default.
The OpenMP Imbalance Time calculation is not included in this patch.
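
One common way to let a configure option remove tracing entirely is to compile
the hooks away, roughly as sketched below. The macro ENABLE_ITT_SKETCH and the
other names are placeholders for this example. The names used in the attached
patch may differ.

```c
/* Counts how many times the tracing hook actually ran.  */
int itt_sketch_calls;

void
itt_sketch_hook (const char *name)
{
  (void) name;
  itt_sketch_calls++;
}

/* ENABLE_ITT_SKETCH stands in for whatever macro the configure script
   would define when instrumentation is enabled.  */
#ifdef ENABLE_ITT_SKETCH
# define ITT_SKETCH_REGION(name) itt_sketch_hook (name)
#else
# define ITT_SKETCH_REGION(name) ((void) 0)
#endif

/* An instrumented runtime entry point; returns the hook call count.  */
int
itt_sketch_run_region (void)
{
  ITT_SKETCH_REGION ("parallel");
  return itt_sketch_calls;
}
```

Built without the macro, the instrumentation point expands to nothing and the
hook is never called.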
