It should only load when running a job. And the output would be in the slurmd log.

If you have DebugFlags=filesystem you should get a little debug output, but debug3 should give you a message each time it reads from the file system, at the frequency you chose.
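
For reference, a minimal sketch of the slurm.conf settings this refers to. It assumes the standard SlurmdDebug and DebugFlags parameters; the values are illustrative and are not taken from Dan's actual configuration:

```conf
# slurm.conf (illustrative values, not from the original poster's setup)
# Raise slurmd verbosity so each filesystem read is logged
SlurmdDebug=debug3
# Enable debug output specific to the AcctGatherFilesystem plugin
DebugFlags=Filesystem
```

The daemons would need to pick up the change (for example by restarting slurmd) before the extra messages appear in the slurmd log.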

Perhaps you can send your slurmd log from a running job.

Danny


On 05/28/2014 03:13 PM, Daniel Milroy wrote:
Hi Nancy, et al.,

I've changed the parameter to the correct value.  Unfortunately I still see no 
evidence that the lustre plugin is loading successfully, even after increasing 
to debug level 6.  The HDF5 plugin is loading correctly, though.


Dan Milroy

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 2:49 PM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup


Documentation fixed for next release.

Quoting Nancy Kritkausky <[email protected]>:

Hi Dan,
Actually, the syntax for the acct_gather.conf should be “lustre” not
Filesystem.  This seems to work.  But, again I am only setting the
parameters and starting the slurmctld, I have no way to test the
actual functionality.  I think the documentation needs to be updated
to match the functionality.  Good luck, Nancy

ProfileHDF5DefaultProfile=lustre


From: Nancy Kritkausky [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 12:45
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Dan,
You are right, if you have the following in your acct_gather.conf it
doesn’t work.

ProfileHDF5DefaultProfile=filesystem   <--------------------------REMOVE

I have the following in my slurm.conf

AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherFilesystemType=acct_gather_filesystem/lustre

And the following in my acct_gather.conf and it loads the libraries.

# options to AcctGatherProfileType/hdf5
ProfileHDF5Dir=/app/slurm/nlk/logs/nlk-slurm/profile_data

My error is slightly different from yours, but when I remove this line
slurmctld works.  Both the acct_gather_profile_hdf5 and
acct_gather_filesystem_lustre plugins are loaded.


slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_infiniband_none.so
slurmctld: AcctGatherInfiniband NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
slurmctld: debug3: Success.
slurmctld: debug2: Reading acct_gather.conf file /app/slurm/nlk/install/master/etc/acct_gather.conf
slurmctld: AcctGatherProfile hdf5 plugin loaded

Can you try this?  I don’t have a Lustre file system, so I can’t check
whether it works.  Could you try removing that line and see whether you
are able to start?
Nancy

From: Rod Schultz [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 11:22
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

This is very strange.

AcctGatherProfileType=acct_gather_profile/hdf5

Is exactly what I have in my slurm.conf and the plugin loads.

I’m afraid I have to disengage. I will be on vacation starting
tomorrow and have some other loose ends to deal with.


From: Daniel Milroy [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 11:12 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Rod,

No, these messages are reported by slurmctld, which fails to start
if the parameters in my previous email are present in the
configuration files.


Dan Milroy

From: Rod Schultz [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 11:58 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Daniel,

Are these messages still coming from salloc?

If so, are the slurm libraries installed on the node upon which
salloc has been launched?

Try sbatch --wrap="srun --profile=task hostname"

This makes sure srun executes on the first node of the allocation.
The libraries should be there. --profile=task should work if you have
job_acct_gather configured.

Rod



From: Daniel Milroy [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 10:44 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Nancy,

Here’s the information you requested:

slurm.conf
    AcctGatherProfileType=acct_gather_profile/hdf5
    AcctGatherFilesystemType=acct_gather_filesystem/lustre

acct_gather.conf
    ProfileHDF5Dir=/curc/slurm/slurm/acct
    ProfileHDF5Default=Filesystem


Dan Milroy

From: Nancy Kritkausky [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 11:32 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hello Daniel,
The syntax that is reported in the error message actually looks
okay.  Can you provide the part of your slurm.conf that defines the
AcctGatherProfileType and your acct_gather.conf?  Maybe we can try
and re-create the problem.
Thanks,
Nancy

From: Daniel Milroy [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 10:08
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Nancy and Rod,

I believe that slurm was built properly on the runtime system.
Slurm was configured with the --with-hdf5=yes option, and config.log
indicates that the hdf5 libs were found:

configure:20180: checking hdf5.h usability
configure:20180: gcc -c -g -O2 -pthread -fno-gcse -I/include conftest.c >&5
configure:20180: $? = 0
configure:20180: result: yes
configure:20180: checking hdf5.h presence
configure:20180: gcc -E -I/include conftest.c
configure:20180: $? = 0
configure:20180: result: yes
configure:20180: checking for hdf5.h
configure:20180: result: yes
configure:20188: checking for H5Fcreate in -lhdf5
configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse -I/include -L/usr/lib64  conftest.c -lhdf5  -lm -lz  -lhdf5 >&5
configure:20213: $? = 0
configure:20222: result: yes
configure:20234: checking for main in -lhdf5_hl
configure:20253: gcc -o conftest -g -O2 -pthread -fno-gcse -I/include -L/usr/lib64  conftest.c -lhdf5_hl  -lm -lz  -lhdf5 >&5
configure:20253: $? = 0
configure:20262: result: yes
configure:20274: checking for matching HDF5 Fortran wrapper
configure:20278: result: /usr/bin/h5fc

The required shared object is in
/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_profile_hdf5.so.


Thank you,

Dan Milroy

From: Nancy Kritkausky [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 10:55 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan,
You can check your installation to make sure the library is there.
The name of the library is acct_gather_profile_hdf5.so.  It is
normally installed under /usr/lib64/slurm.  But depending on your
configure options it could be elsewhere, including /usr/share.  As Rod
said, if hdf5 is not installed, it will not be built.
Hope this helps too,
Nancy

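The library check described above could be run with a small shell helper like the sketch below. The function name plugin_present is hypothetical (it is not part of Slurm), and /usr/lib64/slurm is only the usual default mentioned above, so substitute the plugin directory of your own installation:

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): report whether a plugin file
# exists in the given plugin directory.
plugin_present() {
    dir="$1"
    name="$2"
    if [ -f "$dir/$name" ]; then
        echo "found: $dir/$name"
    else
        echo "missing: $dir/$name"
        return 1
    fi
}

# /usr/lib64/slurm is the common default; adjust to your install prefix.
plugin_present /usr/lib64/slurm acct_gather_profile_hdf5.so || true
```

If the installed location is unknown, the PluginDir entry in the output of scontrol show config should show where slurmctld and slurmd look for plugins.
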
From: Rod Schultz [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 09:23
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan,

Do you have HDF5 installed on your system? Both the runtime system
and the system upon which you built slurm.

At configure time, there is a dependency on hdf5 being installed.

The first couple of errors appear to be caused by not finding the
library. This is probably the result of a build problem.

The last few come from the continued parsing of acct_gather.conf.
The parsing of this file involves calling parsers in each
acct_gather sub-plugin. If the plugin isn’t installed, its items in
the file are considered errors.

Rod



From: Daniel Milroy [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 8:37 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Danny,

There wasn’t anything in the “Profiling Using HDF5 User Guide” that
indicated that I should load the plugin via spank.  It was a result
of research into enabling the plugin since various combinations of
the parameters weren’t working.

Removing the reference to the lustre acct_gather shared object in
plugstack.conf and restarting the service yields:

error: Couldn't find the specified plugin name for acct_gather_profile/hdf5 looking at all files
error: cannot find acct_gather_profile plugin for acct_gather_profile/hdf5
fatal: ProfileHDF5Default can not be set to NotSet, please specify a valid option
error: Parsing error at unrecognized key: ProfileHDF5Dir
error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 1: "ProfileHDF5Dir=/curc/slurm/slurm/acct"
error: Parsing error at unrecognized key: ProfileHDF5Default
error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 2: "ProfileHDF5Default=Filesystem"


Regards,

Dan Milroy

From: Danny Auble [mailto:[email protected]]
Sent: Tuesday, May 27, 2014 12:29 PM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan, I wouldn't expect spank would be needed to load this plugin.

Try taking the line out of your plugstack.conf and see if that works
for you.  Was there something in the documentation
(http://slurm.schedmd.com/hdf5_profile_user_guide.html) that led
you down this path?

Danny
On 05/23/2014 03:59 PM, Daniel Milroy wrote:
Hello,

I’ve been experiencing difficulties enabling the
AcctGatherProfileType/hdf5 plugin for the Lustre filesystem.  So far
I’ve set the following parameters:

slurm.conf
    AcctGatherProfileType=acct_gather_profile/hdf5
    AcctGatherFilesystemType=acct_gather_filesystem/lustre

acct_gather.conf
    ProfileHDF5Dir=/curc/slurm/slurm/acct
    ProfileHDF5Default=Filesystem

plugstack.conf
    required /curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so

Upon job submission, I receive the following error:
salloc: error: spank: "/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so" exports 0 symbols
salloc: error: spank: /curc/slurm/slurm/etc/plugstack.conf:2: Failed to load plugin /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so. Aborting.
salloc: error: Failed to initialize plugin stack

Please let me know what I can do to properly enable this plugin.


Regards,

Dan Milroy
