Hi Nancy,
Removing ProfileHDF5DefaultProfile=filesystem allows the HDF5 plugin
to load, but there isn’t anything logged about the lustre filesystem
plugin.
Thank you,
Dan Milroy
From: Nancy Kritkausky
Sent: Wednesday, May 28, 2014 1:42 PM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Hi Dan,
You are right: if you have the following in your acct_gather.conf, it
doesn’t work.
ProfileHDF5DefaultProfile=filesystem <--------------------------REMOVE
I have the following in my slurm.conf
AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherFilesystemType=acct_gather_filesystem/lustre
And the following in my acct_gather.conf and it loads the libraries.
# options to AcctGatherProfileType/hdf5
ProfileHDF5Dir=/app/slurm/nlk/logs/nlk-slurm/profile_data
My error is slightly different from yours, but when I remove this
line slurmctld works. Both acct_gather_profile_hdf5 and
acct_gather_filesystem_lustre are loaded.
slurmctld: debug3: Trying to load plugin
/app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin
/app/slurm/nlk/install/master/lib/slurm/acct_gather_infiniband_none.so
slurmctld: AcctGatherInfiniband NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin
/app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
slurmctld: debug3: Success.
slurmctld: debug2: Reading acct_gather.conf file
/app/slurm/nlk/install/master/etc/acct_gather.conf
slurmctld: AcctGatherProfile hdf5 plugin loaded
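A quick way to confirm from a slurmctld debug log that each plugin load was followed by a "Success." line is sketched below. It is only an illustration: the function name loaded_plugins is hypothetical, and it assumes the debug3 wording shown in the log excerpt above.

```python
import os

MARKER = "Trying to load plugin"

def loaded_plugins(log_text):
    """Map plugin file name -> True if 'debug3: Success.' follows its load attempt."""
    # Collapse whitespace so a path wrapped onto the next line (as in email) still parses.
    flat = " ".join(log_text.split())
    results = {}
    for chunk in flat.split(MARKER)[1:]:
        words = chunk.split()
        if not words:
            continue
        # The first token after the marker is the plugin path.
        name = os.path.basename(words[0])
        results[name] = "debug3: Success." in chunk
    return results

# Excerpt taken from the log output quoted in this thread.
sample = """slurmctld: debug3: Trying to load plugin
/app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin
/app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
slurmctld: debug3: Success.
"""

for plugin, ok in loaded_plugins(sample).items():
    print(plugin, "loaded" if ok else "NOT loaded")
```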
Can you try this? I don’t have a Lustre file system, so I can’t check
whether it actually works. Could you try removing that line and see
whether you are able to start?
Nancy
From: Rod Schultz
Sent: Wednesday, May 28, 2014 11:22
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
This is very strange.
AcctGatherProfileType=acct_gather_profile/hdf5
Is exactly what I have in my slurm.conf and the plugin loads.
I’m afraid I have to disengage. I will be on vacation starting
tomorrow and have some other loose ends to deal with.
From: Daniel Milroy
Sent: Wednesday, May 28, 2014 11:12 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Hi Rod,
No, these messages are reported by slurmctld, which fails to start
if the parameters in my previous email are present in the
configuration files.
Dan Milroy
From: Rod Schultz
Sent: Wednesday, May 28, 2014 11:58 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Daniel,
Are these messages still coming from salloc?
If so, are the slurm libraries installed on the node on which
salloc was launched?
Try sbatch --wrap="srun --profile=task hostname"
This makes sure srun executes on the first node of the allocation.
The libraries should be there. --profile=task should work if you have
job_acct_gather configured.
Rod
From: Daniel Milroy
Sent: Wednesday, May 28, 2014 10:44 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Nancy,
Here’s the information you requested:
slurm.conf
AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherFilesystemType=acct_gather_filesystem/lustre
acct_gather.conf
ProfileHDF5Dir=/curc/slurm/slurm/acct
ProfileHDF5Default=Filesystem
Dan Milroy
From: Nancy Kritkausky
Sent: Wednesday, May 28, 2014 11:32 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Hello Daniel,
The syntax that is reported in the error message actually looks
okay. Can you provide the part of your slurm.conf that defines
AcctGatherProfileType, along with your acct_gather.conf? Maybe we can
try to re-create the problem.
Thanks,
Nancy
From: Daniel Milroy
Sent: Wednesday, May 28, 2014 10:08
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Hi Nancy and Rod,
I believe that slurm was built properly on the runtime system.
Slurm was configured with the --with-hdf5=yes option, and config.log
indicates that the hdf5 libs were found:
configure:20180: checking hdf5.h usability
configure:20180: gcc -c -g -O2 -pthread -fno-gcse -I/include conftest.c >&5
configure:20180: $? = 0
configure:20180: result: yes
configure:20180: checking hdf5.h presence
configure:20180: gcc -E -I/include conftest.c
configure:20180: $? = 0
configure:20180: result: yes
configure:20180: checking for hdf5.h
configure:20180: result: yes
configure:20188: checking for H5Fcreate in -lhdf5
configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse
-I/include -L/usr/lib64 conftest.c -lhdf5 -lm -lz -lhdf5 >&5
configure:20213: $? = 0
configure:20222: result: yes
configure:20234: checking for main in -lhdf5_hl
configure:20253: gcc -o conftest -g -O2 -pthread -fno-gcse
-I/include -L/usr/lib64 conftest.c -lhdf5_hl -lm -lz -lhdf5 >&5
configure:20253: $? = 0
configure:20262: result: yes
configure:20274: checking for matching HDF5 Fortran wrapper
configure:20278: result: /usr/bin/h5fc
The required shared object is in
/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_profile_hdf5.so.
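For a longer config.log, the same check can be scripted. The sketch below pairs each "checking ..." line with the next "result: ..." line; the function name check_results is hypothetical, and it assumes the "configure:NNNN:" line prefix seen in the excerpt above.

```python
import re

# Matches only the "checking ..." and "result: ..." lines of a config.log.
LINE = re.compile(r"^configure:\d+: (checking .+|result: .+)$")

def check_results(config_log_text):
    """Return {check description: result} for each checking/result pair."""
    results = {}
    pending = None
    for line in config_log_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # compiler command lines, exit codes, wrapped flags
        body = m.group(1)
        if body.startswith("checking "):
            pending = body[len("checking "):]
        elif pending is not None:
            results[pending] = body[len("result: "):]
            pending = None
    return results

# Excerpt taken from the config.log output quoted in this thread.
sample = """configure:20188: checking for H5Fcreate in -lhdf5
configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse
-I/include -L/usr/lib64 conftest.c -lhdf5 -lm -lz -lhdf5 >&5
configure:20213: $? = 0
configure:20222: result: yes
configure:20274: checking for matching HDF5 Fortran wrapper
configure:20278: result: /usr/bin/h5fc
"""

for check, result in check_results(sample).items():
    print(f"{check}: {result}")
```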
Thank you,
Dan Milroy
From: Nancy Kritkausky
Sent: Wednesday, May 28, 2014 10:55 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Dan,
You can check your installation to make sure the library is there.
The name of the library is acct_gather_profile_hdf5.so. It is
normally installed under /usr/lib64/slurm, but depending on your
configure options it could be elsewhere, including /usr/share. As Rod
said, if hdf5 is not installed, it will not be built.
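A minimal sketch of that check follows. The helper name find_plugin is hypothetical, and the two directories are only the locations mentioned in this thread; adjust them to match your own install prefix.

```python
from pathlib import Path

def find_plugin(plugin_name, search_dirs):
    """Return the paths under search_dirs where plugin_name exists as a file."""
    hits = []
    for directory in search_dirs:
        candidate = Path(directory) / plugin_name
        if candidate.is_file():
            hits.append(candidate)
    return hits

# Candidate plugin directories named in this thread (assumptions for any other site).
dirs = ["/usr/lib64/slurm", "/curc/slurm/slurm/14.03.3/lib/slurm"]
found = find_plugin("acct_gather_profile_hdf5.so", dirs)
if found:
    for path in found:
        print("found:", path)
else:
    print("acct_gather_profile_hdf5.so not found in", dirs)
```

An empty result suggests the plugin was not built or was installed under a different prefix.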
Hope this helps too,
Nancy
From: Rod Schultz
Sent: Wednesday, May 28, 2014 09:23
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Dan,
Do you have HDF5 installed on your system, on both the runtime system
and the system on which you built slurm?
At configure time, there is a dependency on hdf5 being installed.
The first couple of errors appear to be caused by not finding the
library. This is probably the result of a build problem.
The last few come from the continued parsing of acct_gather.conf.
The parsing of this file involves calling parsers in each
sub-account-gather plugin. If a plugin isn’t installed, its items in
the file are considered errors.
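The behaviour described above can be illustrated with a toy model. This is not Slurm's parser: parse_acct_gather and the plugin-to-keys table are hypothetical, and matching keys case-insensitively is a choice made for this sketch. It only shows why keys owned by a plugin that failed to load come back as unrecognized.

```python
def parse_acct_gather(lines, loaded_plugin_keys):
    """Toy model: accept only keys registered by plugins that actually loaded.

    loaded_plugin_keys maps a plugin name to the keys its parser accepts.
    Returns (options, errors), where errors is a list of (line number, key).
    """
    known = set()
    for keys in loaded_plugin_keys.values():
        known |= {k.lower() for k in keys}

    options, errors = {}, []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key.lower() not in known:
            errors.append((lineno, key))
        else:
            options[key] = value.strip()
    return options, errors

# The two lines from the acct_gather.conf quoted in this thread.
conf = ["ProfileHDF5Dir=/curc/slurm/slurm/acct", "ProfileHDF5Default=Filesystem"]
hdf5_keys = {"acct_gather_profile/hdf5": {"ProfileHDF5Dir", "ProfileHDF5Default"}}

print(parse_acct_gather(conf, hdf5_keys))  # plugin loaded: both keys accepted
print(parse_acct_gather(conf, {}))         # plugin missing: both keys are errors
```

With no plugin loaded, both lines are rejected, which mirrors the "unrecognized key" errors reported for lines 1 and 2 of acct_gather.conf.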
Rod
From: Daniel Milroy
Sent: Wednesday, May 28, 2014 8:37 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Hi Danny,
There wasn’t anything in the “Profiling Using HDF5 User Guide” that
indicated that I should load the plugin via spank. It was a result
of research into enabling the plugin since various combinations of
the parameters weren’t working.
Removing the reference to the lustre acct_gather shared object in
plugstack.conf and restarting the service yields:
error: Couldn't find the specified plugin name for
acct_gather_profile/hdf5 looking at all files
error: cannot find acct_gather_profile plugin for acct_gather_profile/hdf5
fatal: ProfileHDF5Default can not be set to NotSet, please specify a
valid option
error: Parsing error at unrecognized key: ProfileHDF5Dir
error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf
line 1: "ProfileHDF5Dir=/curc/slurm/slurm/acct"
error: Parsing error at unrecognized key: ProfileHDF5Default
error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf
line 2: "ProfileHDF5Default=Filesystem"
Regards,
Dan Milroy
From: Danny Auble
Sent: Tuesday, May 27, 2014 12:29 PM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
Dan, I wouldn't expect spank would be needed to load this plugin.
Try taking the line out of your plugstack.conf and see if that works
for you. Was there something in the documentation
(http://slurm.schedmd.com/hdf5_profile_user_guide.html) that led
you down this path?
Danny
On 05/23/2014 03:59 PM, Daniel Milroy wrote:
Hello,
I’ve been experiencing difficulties enabling the
AcctGatherProfileType/hdf5 plugin for the Lustre filesystem. So far
I’ve set the following parameters:
slurm.conf
AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherFilesystemType=acct_gather_filesystem/lustre
acct_gather.conf
ProfileHDF5Dir=/curc/slurm/slurm/acct
ProfileHDF5Default=Filesystem
plugstack.conf
required /curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so
Upon job submission, I receive the following error:
salloc: error: spank:
"/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so"
exports 0 symbols
salloc: error: spank: /curc/slurm/slurm/etc/plugstack.conf:2: Failed
to load plugin
/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so.
Aborting.
salloc: error: Failed to initialize plugin stack
Please let me know what I can do to properly enable this plugin.
Regards,
Dan Milroy