Hi,

we are using Slurm 14.11.9 with the HDF5 profile plugin. When profiling a single entity (energy, network or lustre) using e.g.

#SBATCH --acctg-freq=energy=10
#SBATCH --profile=energy

we get nice data in the HDF5 file. But combining two (or more) entities like

#SBATCH --acctg-freq=energy=10,network=10
#SBATCH --profile=energy,network

causes the resulting HDF5 file to contain corrupt data. We checked that the correct data is passed to the function put_hdf5_data (hdf5_api.c) at the point where H5Dwrite is called. Since the HDF5 version in the RHEL6 x64 system RPM is not thread-safe, we recompiled Slurm against a thread-safe build of HDF5, but the data in the file is still wrong.
Does anyone else see this kind of problem?

Moreover: is it even intended that several profiling metrics are used in parallel? In each profile plugin the call to acct_gather_profile_g_add_sample_data is protected by a pthread mutex, but there is no guard across different plugins.
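
To illustrate what we mean by a guard across plugins, here is a minimal sketch. It is not taken from the Slurm source: shared_profile_lock, guarded_add_sample, plugin_thread and run_two_plugins are hypothetical names, and a plain counter stands in for the shared HDF5 file state. It only shows the idea of one process-wide lock taken by every plugin instead of one lock per plugin.

```c
#include <assert.h>
#include <pthread.h>

#define SAMPLES_PER_PLUGIN 100000

/* Hypothetical: a single lock shared by ALL profiling plugins,
 * as opposed to one private mutex inside each plugin. */
static pthread_mutex_t shared_profile_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the shared HDF5 file state (assumption, not Slurm's struct). */
static long samples_written = 0;

/* Hypothetical wrapper: every plugin would call this instead of
 * writing a sample directly. */
static void guarded_add_sample(void)
{
    pthread_mutex_lock(&shared_profile_lock);
    samples_written++; /* stands in for the actual sample write */
    pthread_mutex_unlock(&shared_profile_lock);
}

/* Simulates one plugin (e.g. energy or network) polling in its own thread. */
static void *plugin_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < SAMPLES_PER_PLUGIN; i++)
        guarded_add_sample();
    return NULL;
}

/* Runs two simulated plugins concurrently and returns the sample count. */
static long run_two_plugins(void)
{
    pthread_t energy, network;
    int rc;

    samples_written = 0;
    rc = pthread_create(&energy, NULL, plugin_thread, NULL);
    assert(rc == 0);
    rc = pthread_create(&network, NULL, plugin_thread, NULL);
    assert(rc == 0);
    (void)rc; /* silences the unused warning when built with NDEBUG */

    pthread_join(energy, NULL);
    pthread_join(network, NULL);
    return samples_written;
}
```

With the shared lock in place, run_two_plugins() should return exactly 2 * SAMPLES_PER_PLUGIN; with only a per-plugin mutex, the two threads could interleave their updates to the shared state.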

Thank you for your help,
Hendryk
