Hi,

We are using Slurm 14.11.9 with the HDF5 profile plugin. When profiling a single entity (energy, network or lustre), e.g. with

#SBATCH --acctg-freq=energy=10
#SBATCH --profile=energy

we get valid data in the HDF5 file.
But combining two or more entities, as in

#SBATCH --acctg-freq=energy=10,network=10
#SBATCH --profile=energy,network

causes the resulting HDF5 file to contain corrupt data. We verified that the data passed to H5Dwrite in the function put_hdf5_data (hdf5_api.c) is correct. Since the HDF5 version from the RHEL 6 x64 system RPM is not thread-safe, we recompiled Slurm against a thread-safe build of HDF5, but the data in the file is still wrong.

Does anyone else see this kind of problem?

Also, is it actually intended that several profiling metrics can be used in parallel? In each profile plugin, the call to acct_gather_profile_g_add_sample_data is protected by a pthread mutex, but there is no guard across different plugins.
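
To illustrate the kind of guard we mean, here is a minimal sketch. It is not Slurm code: shared_file_lock, add_sample_data_guarded, plugin_thread and run_two_plugins are hypothetical names, and a plain counter stands in for the shared HDF5 file. It only shows two "plugin" threads serializing their writes through one common mutex instead of one mutex per plugin.

```c
#include <pthread.h>

#define SAMPLES_PER_PLUGIN 100000

/* Hypothetical stand-in for the state of the shared HDF5 file. */
static long shared_samples_written = 0;

/* Hypothetical: ONE lock shared by all profile plugins,
 * instead of a separate mutex inside each plugin. */
static pthread_mutex_t shared_file_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a plugin's call to
 * acct_gather_profile_g_add_sample_data(). */
static void add_sample_data_guarded(void)
{
    pthread_mutex_lock(&shared_file_lock);
    shared_samples_written++;          /* the "write" to the shared file */
    pthread_mutex_unlock(&shared_file_lock);
}

/* Each thread plays the role of one plugin (e.g. energy or network). */
static void *plugin_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < SAMPLES_PER_PLUGIN; i++)
        add_sample_data_guarded();
    return NULL;
}

/* Runs two plugin threads concurrently and returns the total number
 * of samples written, or -1 if a thread could not be created. */
long run_two_plugins(void)
{
    pthread_t energy, network;

    shared_samples_written = 0;
    if (pthread_create(&energy, NULL, plugin_thread, NULL) != 0)
        return -1;
    if (pthread_create(&network, NULL, plugin_thread, NULL) != 0) {
        pthread_join(energy, NULL);
        return -1;
    }
    pthread_join(energy, NULL);
    pthread_join(network, NULL);
    return shared_samples_written;
}
```

With the common lock, run_two_plugins() should always return 2 * SAMPLES_PER_PLUGIN. If each thread used its own mutex, as the plugins do today, the increments could interleave and updates could be lost. Whether this is what actually corrupts the HDF5 file is only our guess.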

Thank you for your help,
Hendryk
