https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97542
Bug ID: 97542
Summary: Enable OpenMP efficient performance profiling via ITT tracing
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: vitaly.slobodskoy at huawei dot com
CC: jakub at gcc dot gnu.org
Target Milestone: ---

Created attachment 49429
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49429&action=edit
OpenMP runtime changes

To optimize OpenMP workloads, it is important to have a dedicated performance analysis tool that is aware of the OpenMP runtime specifics. The typical OpenMP performance issues are:

- Not all the performance-critical code is parallel
  * Serial time significantly affects scaling (Amdahl’s law)
- Work balance is not good
  * Not all the cores are doing useful work
- Overhead on
  * Synchronization
  * Scheduling
  * Thread creation

A performance analysis tool should be able to identify the serially executed portion and the parallel execution within a work-sharing construct. Imbalance within a parallel region can hardly be calculated without dedicated runtime support.

The proposal is to instrument the GCC OpenMP runtime with the ITT API (https://github.com/intel/ittapi), as was already done for LLVM (https://github.com/llvm/llvm-project/tree/master/openmp/runtime/src/thirdparty/ittnotify), to enable dedicated OpenMP support in tools such as Intel VTune (https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/methodologies/openmp-code-analysis-method.html) and others. This would enable the "Serial Time", "Parallel Time" and "Imbalance Time" metrics and would allow performance tools to focus on serial or parallel execution.

ITT is a lightweight API for source-based instrumentation. The open-source part is simply a set of APIs and a single .c file for loading a dynamic ITT library (the so-called ITT collector, which can easily be created by anyone).
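
To illustrate the "set of APIs plus a small loader" idea, here is a minimal sketch in C of a hook that binds itself on first use. It is not the ITT source: every function and variable name is hypothetical, and the collector is replaced by a counting stand-in where a real loader would open the dynamic library. The only name taken from this report is the INTEL_LIBITTNOTIFY64 environment variable.

```c
#include <stdlib.h>

/* Hypothetical hook type; the real ITT API has its own entry points. */
typedef void (*trace_hook_fn)(const char *event);

/* Number of events that reached the stand-in collector. */
static int delivered_events = 0;

/* Used when no collector is configured: does nothing. */
static void hook_noop(const char *event)
{
    (void)event;
}

/* Stand-in for a loaded collector: only counts events. */
static void hook_collector_standin(const char *event)
{
    (void)event;
    delivered_events++;
}

static void hook_first_call(const char *event);

/* All instrumentation points call through this pointer. */
static trace_hook_fn trace_hook = hook_first_call;

/* Bind the hook according to the collector path (NULL or "" means none).
   A real loader would open the library at collector_path here. */
void trace_bind(const char *collector_path)
{
    if (collector_path != NULL && collector_path[0] != '\0')
        trace_hook = hook_collector_standin;
    else
        trace_hook = hook_noop;
}

/* First call: read the environment once, rebind, then forward the event. */
static void hook_first_call(const char *event)
{
    trace_bind(getenv("INTEL_LIBITTNOTIFY64"));
    trace_hook(event);
}

/* What an instrumented runtime function would call. */
void trace_event(const char *event)
{
    trace_hook(event);
}

int trace_delivered_events(void)
{
    return delivered_events;
}
```

After the first event, each instrumentation point costs one indirect call to an empty function when no collector is configured, which is the general reason this style of instrumentation can stay cheap when tracing is off.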
To enable tracing, the target application needs to be launched with the "INTEL_LIBITTNOTIFY64=<collector>" environment variable set. Otherwise all the ITT calls do nothing and cause no noticeable runtime overhead.

Attaching the initial proposal for the ITT integration enabling the Serial/Parallel Time metrics:

- core.patch is the actual changes within the OpenMP runtime
- Itt.patch is the integration of the ITT API (GPLv2 license is used)
- autogenerated.patch is the list of autogenerated files resulting from running "autoreconf" within the libgomp directory

This proposal adds a new "--disable-itt-instrumentation" configure option which completely disables (removes) all the tracing. The tracing is ON by default. OpenMP Imbalance time calculation is not included in this patch.
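
As a rough illustration of how a configure option can remove tracing entirely rather than just silence it, here is a sketch in C using a preprocessor switch. The macro and function names (SKETCH_DISABLE_ITT, SKETCH_TRACE_EVENT, sketch_record_event, sketch_run_region) are hypothetical and are not taken from the attached patches. The sketch assumes the configure option ends up defining a macro that the runtime sources test.

```c
/* Hypothetical switch: assume configure defines SKETCH_DISABLE_ITT when
   tracing is disabled. Tracing is on when the macro is absent. */
#ifdef SKETCH_DISABLE_ITT
#define SKETCH_TRACE_EVENT(name) ((void)0)            /* compiles to nothing */
#else
#define SKETCH_TRACE_EVENT(name) sketch_record_event(name)
#endif

/* Number of events recorded so far. */
int sketch_recorded = 0;

/* Stand-in for a real tracing call: only counts events. */
void sketch_record_event(const char *name)
{
    (void)name;
    sketch_recorded++;
}

/* Stand-in for a runtime entry point with begin/end instrumentation.
   Returns the total number of events recorded so far. */
int sketch_run_region(void)
{
    SKETCH_TRACE_EVENT("region_begin");
    /* ... the parallel region's work would run here ... */
    SKETCH_TRACE_EVENT("region_end");
    return sketch_recorded;
}
```

With the macro undefined, each call to sketch_run_region records two events. Compiling with -DSKETCH_DISABLE_ITT leaves no tracing calls in the function at all.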