Roland,

PAPI has a mechanism to count events either for a whole process or for a single thread. I believe Cactus switches PAPI to threaded mode by default. This works only for operating-system threads (e.g. OpenMP), not for user-level threads (e.g. FunHPC). I don't recall the details, but the PAPI API documentation should describe this.

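Whether a per-rank count covers all threads can be checked against a known operation count. A naive multiply of two n×n matrices (the operation DGEMM performs) needs 2n³ floating-point operations: one multiply and one add per inner-loop step. The minimal sketch below does not call PAPI. It only tallies, analytically, how those operations would be divided when the rows are split across `nthreads` threads; the function name and the contiguous row-block split are illustrative assumptions. If a rank's reported count lands near the "thread 0 only" figure rather than the "summed" figure, the counter is probably seeing only one thread. Real hardware counters may treat fused multiply-adds or vector instructions differently from this idealized tally, so expect rough rather than exact agreement.

```python
# Hypothetical sketch: no PAPI calls are made here. It computes, by hand,
# how the floating-point operations of a naive n x n matrix multiply are
# divided among worker threads that each own a contiguous block of rows.

def dgemm_flops_per_thread(n, nthreads):
    """Return a list with the flop count each thread performs for C = A*B.

    Each output element C[i][j] needs n multiplies and n adds (2*n flops),
    and each row of C has n elements, so a block of `rows` rows costs
    rows * n * 2 * n flops.
    """
    counts = []
    for t in range(nthreads):
        # Contiguous row block for thread t; also handles n % nthreads != 0.
        row_begin = t * n // nthreads
        row_end = (t + 1) * n // nthreads
        rows = row_end - row_begin
        counts.append(rows * n * 2 * n)
    return counts


n = 64
expected_total = 2 * n**3  # analytic total for the whole multiply
for nthreads in (1, 2, 4, 8):
    per_thread = dgemm_flops_per_thread(n, nthreads)
    # "summed" is what a count over all threads should report;
    # "thread 0 only" is what a count restricted to one thread would report.
    print(f"{nthreads} threads: summed = {sum(per_thread)}, "
          f"thread 0 only = {per_thread[0]}, expected = {expected_total}")
```

With n = 64 the summed column stays at 524288 for every thread count, while the thread-0 column shrinks to 524288 / nthreads (65536 for 8 threads). The same reasoning applies to the nranks × nthreads versus ncores × 1 comparison: only counts that include every thread should add up to the same total across ranks.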
In Cactus, we initially run some tests (probably a DGEMM) to check whether PAPI's counts are consistent with what we expect. Such a test might help answer your question. I think that handling multi-threading correctly requires the operating system to cooperate. On an HPC system, the kernel may have been modified in ways that cause problems. This is just a wild guess, though.

-erik

On Thu, Jan 12, 2017 at 12:43 PM, Roland Haas <rh...@illinois.edu> wrote:
> Hello all,
>
> does anyone know if the floating-point event counts reported by PAPI
> are summed over all threads inside an MPI rank? Or is it only the
> count on thread 0?
>
> I would hope for the former but suspect the latter.
>
> That is, if I were to run the same job using ncores cores, once with
> nranks MPI ranks and nthreads threads per rank and once with ncores
> MPI ranks and 1 thread per rank, would the sums over all *reported*
> event counts of all ranks (roughly, neglecting ghost zones etc.) agree?
>
> Yours,
> Roland
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://keys.gnupg.net.
>
> _______________________________________________
> Users mailing list
> Users@cactuscode.org
> http://cactuscode.org/mailman/listinfo/users

--
Erik Schnetter <schnet...@cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/