On Thu, Jan 12, 2017 at 11:43:43AM -0600, Roland Haas wrote:
> Does anyone know if the floating-point event counts reported by PAPI are summed over all threads inside an MPI rank? Or is it only the count on thread 0?
From the documentation, the answer seems to be "it depends":

In order to support threaded operation, the operating system must save and restore the counter hardware upon context switches among different threads or processes. However, OpenMP hides the concept of user and kernel level threads from the user. As a result, unless the user explicitly takes action to bind their thread to a kernel thread (sometimes called a Light Weight Process or LWP), the counts returned by PAPI will not necessarily be accurate.
To address this situation, PAPI treats every platform as if it is running on top of kernel threads.
Unbound, user level threads that call PAPI will function properly, but will most likely return unreliable or inaccurate event counts.
Fortunately, in the batch environments of the HPC community, there is no significant advantage to user level threads and thus kernel level threads are the default.
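If a per-rank total is what you are after, one option is to count on every OpenMP thread separately and add the results up yourself. Below is a rough sketch of that idea, not something taken from Cactus or from the PAPI documentation. The names sum_thread_counts, count_fp_ops_per_rank, thread_id and the HAVE_PAPI build flag are my own placeholders, and I am assuming that the PAPI_FP_OPS preset is available on your hardware and that each OpenMP thread is backed by its own kernel thread.

```c
#include <stddef.h>

/* Add up per-thread counter values to get one total for the MPI rank. */
long long sum_thread_counts(const long long *per_thread, size_t nthreads)
{
    long long total = 0;
    for (size_t i = 0; i < nthreads; ++i)
        total += per_thread[i];
    return total;
}

/* HAVE_PAPI is a hypothetical flag; build with something like
 *   cc -DHAVE_PAPI -fopenmp file.c -lpapi                      */
#ifdef HAVE_PAPI
#include <omp.h>
#include <papi.h>

/* Thread id callback handed to PAPI_thread_init (assumption: the OpenMP
 * thread number is good enough to tell threads apart). */
static unsigned long thread_id(void)
{
    return (unsigned long)omp_get_thread_num();
}

/* Runs work() on every OpenMP thread, counts PAPI_FP_OPS per thread, and
 * stores the sum over threads in *total. Returns PAPI_OK or an error code. */
int count_fp_ops_per_rank(void (*work)(void), long long *total)
{
    int rc = PAPI_library_init(PAPI_VER_CURRENT);
    if (rc != PAPI_VER_CURRENT)
        return rc < 0 ? rc : PAPI_EINVAL;

    rc = PAPI_thread_init(thread_id);
    if (rc != PAPI_OK)
        return rc;

    long long sum = 0;
    int status = PAPI_OK;

    #pragma omp parallel reduction(+:sum)
    {
        int event_set = PAPI_NULL;   /* each thread owns its event set */
        long long count = 0;

        int err = PAPI_create_eventset(&event_set);
        if (err == PAPI_OK)
            err = PAPI_add_event(event_set, PAPI_FP_OPS);
        if (err == PAPI_OK)
            err = PAPI_start(event_set);
        if (err == PAPI_OK) {
            work();
            err = PAPI_stop(event_set, &count);
        }

        if (err == PAPI_OK) {
            sum += count;            /* reduced over all threads */
        } else {
            #pragma omp critical
            status = err;
        }

        if (event_set != PAPI_NULL) {
            PAPI_cleanup_eventset(event_set);
            PAPI_destroy_eventset(&event_set);
        }
    }

    *total = sum;
    return status;
}
#endif /* HAVE_PAPI */
```

If the threads are not bound to kernel threads, the caveat from the documentation quoted above still applies and the summed value may not be meaningful.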
Frank
