We may be talking about two different things, but as I pointed out in a 
previous email, the log does show the acct_gather_filesystem_lustre plugin 
loading when slurmctld first starts.  I'll show it again here; do you see this 
when you start slurmctld?  I am just starting it with "slurmctld -Dvvvv". 
  

>> slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
>> slurmctld: debug3: Success.
>> slurmctld: debug2: Reading acct_gather.conf file /app/slurm/nlk/install/master/etc/acct_gather.conf
>> slurmctld: AcctGatherProfile hdf5 plugin loaded
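
For anyone comparing their own startup output against the log above, a small script can list which plugins the daemon tried to load. This is only a minimal sketch: it assumes each log record sits on a single line and follows the "Trying to load plugin <path>" pattern seen in this thread, and the function name attempted_plugins is hypothetical, not part of Slurm.

```python
import re

# Matches records such as:
#   slurmctld: debug3: Trying to load plugin /path/to/acct_gather_profile_hdf5.so
# The pattern is an assumption based on the log excerpts in this thread.
LOAD_RE = re.compile(r"Trying to load plugin\s+(\S+\.so)\b")


def attempted_plugins(log_text):
    """Return the plugin file names (without directory) the daemon tried to load."""
    names = []
    for line in log_text.splitlines():
        # Drop mail quoting such as ">> " before matching.
        line = line.lstrip("> ").strip()
        match = LOAD_RE.search(line)
        if match:
            names.append(match.group(1).rsplit("/", 1)[-1])
    return names


# Sample records copied from the log in this thread, one record per line.
sample = """\
slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
slurmctld: debug3: Success.
"""

print(attempted_plugins(sample))
```

If acct_gather_filesystem_lustre.so is missing from the printed list, the daemon never attempted to load it, which points at slurm.conf rather than at the plugin itself.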

-----Original Message-----
From: Danny Auble [mailto:[email protected]] 
Sent: Wednesday, May 28, 2014 15:19
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup


It should only load when running a job.  And the output would be in the slurmd 
log.

If you have DebugFlags=filesystem you should get a little bit of debug, but 
debug3 should give you a message each time it reads from the file system on the 
frequency you chose.

Perhaps you can send your slurmd log during a running job.
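
To cut a long slurmd log down to the records that matter here, something like the following may help. It is a sketch under assumptions: the exact wording of the plugin's debug messages is not shown in this thread, so the default keywords are guesses, and the function name filter_log is hypothetical.

```python
def filter_log(lines, keywords=("filesystem", "lustre")):
    """Yield log lines that contain any keyword, compared case-insensitively."""
    lowered = tuple(k.lower() for k in keywords)
    for line in lines:
        text = line.lower()
        if any(k in text for k in lowered):
            yield line.rstrip("\n")


# Records copied from the slurmctld output elsewhere in this thread,
# used here only to demonstrate the filter.
sample = [
    "slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so",
    "slurmctld: debug3: Success.",
    "slurmctld: AcctGatherProfile hdf5 plugin loaded",
]

for record in filter_log(sample):
    print(record)
```

Because the function accepts any iterable of lines, an open file handle for the slurmd log can be passed in place of the sample list.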

Danny


On 05/28/2014 03:13 PM, Daniel Milroy wrote:
> Hi Nancy, et al.,
>
> I've changed the parameter to the correct value.  Unfortunately I still see 
> no evidence that the lustre plugin is loading successfully, even after 
> increasing to debug level 6.  The HDF5 plugin is loading correctly, though.
>
>
> Dan Milroy
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 2:49 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
>
> Documentation fixed for next release.
>
> Quoting Nancy Kritkausky <[email protected]>:
>
>> Hi Dan,
>> Actually, the syntax for the acct_gather.conf should be “lustre” not 
>> Filesystem.  This seems to work.  But, again I am only setting the 
>> parameters and starting the slurmctld, I have no way to test the 
>> actual functionality.  I think the documentation needs to be updated 
>> to match the functionality.  Good luck, Nancy
>>
>> ProfileHDF5DefaultProfile=lustre
>>
>>
>> From: Nancy Kritkausky [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 12:45
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hi Dan,
>> You are right, if you have the following in your acct_gather.conf it 
>> doesn’t work.
>>
>> ProfileHDF5DefaultProfile=filesystem   <--------------------------REMOVE
>>
>> I have the following in my slurm.conf
>>
>> AcctGatherProfileType=acct_gather_profile/hdf5
>> AcctGatherFilesystemType=acct_gather_filesystem/lustre
>>
>> And the following in my acct_gather.conf and it loads the libraries.
>>
>> # options to AcctGatherProfileType/hdf5 
>> ProfileHDF5Dir=/app/slurm/nlk/logs/nlk-slurm/profile_data
>>
>> My error is slightly different from yours, but when I remove this 
>> line slurmctld works.  Both the acct_gather_profile_hdf5 and 
>> acct_gather_filesystem_lustre plugins are loaded.
>>
>>
>> slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_infiniband_none.so
>> slurmctld: AcctGatherInfiniband NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
>> slurmctld: debug3: Success.
>> slurmctld: debug2: Reading acct_gather.conf file /app/slurm/nlk/install/master/etc/acct_gather.conf
>> slurmctld: AcctGatherProfile hdf5 plugin loaded
>>
>> Can you try this?  I don’t have a Lustre file system, so I can’t 
>> check whether it actually works, but could you try removing that line 
>> and see whether slurmctld starts?  Nancy
>>
>> From: Rod Schultz [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 11:22
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> This is very strange.
>>
>> AcctGatherProfileType=acct_gather_profile/hdf5
>>
>> Is exactly what I have in my slurm.conf and the plugin loads.
>>
>> I’m afraid I have to disengage. I will be on vacation starting 
>> tomorrow and have some other loose ends to deal with.
>>
>>
>> From: Daniel Milroy [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 11:12 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hi Rod,
>>
>> No, these messages are reported by slurmctld, which fails to start if 
>> the parameters in my previous email are present in the configuration 
>> files.
>>
>>
>> Dan Milroy
>>
>> From: Rod Schultz [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 11:58 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Daniel,
>>
>> Are these messages still coming from salloc?
>>
>> If so, are the slurm libraries installed on the node upon which 
>> salloc has been launched?
>>
>> Try sbatch --wrap="srun --profile=task hostname"
>>
>> This makes sure srun executes on the first node of the allocation.
>> The libraries should be there. --profile=task should work if you have 
>> job_acct_gather configured.
>>
>> Rod
>>
>>
>>
>> From: Daniel Milroy [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 10:44 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Nancy,
>>
>> Here’s the information you requested:
>>
>> slurm.conf
>>                  AcctGatherProfileType=acct_gather_profile/hdf5
>>                  AcctGatherFilesystemType=acct_gather_filesystem/lustre
>>
>> acct_gather.conf
>>                  ProfileHDF5Dir=/curc/slurm/slurm/acct
>>                  ProfileHDF5Default=Filesystem
>>
>>
>> Dan Milroy
>>
>> From: Nancy Kritkausky [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 11:32 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hello Daniel,
>> The syntax that is reported in the error message actually looks okay.  
>> Can you provide the part of your slurm.conf that defines the 
>> AcctGatherProfileType and your acct_gather.conf?  Maybe we can try 
>> and re-create the problem.  Thanks, Nancy
>>
>> From: Daniel Milroy [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 10:08
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hi Nancy and Rod,
>>
>> I believe that slurm was built properly on the runtime system.
>> Slurm was configured with the --with-hdf5=yes option, and config.log 
>> indicates that the hdf5 libs were found:
>>
>> configure:20180: checking hdf5.h usability
>> configure:20180: gcc -c -g -O2 -pthread -fno-gcse -I/include conftest.c >&5
>> configure:20180: $? = 0
>> configure:20180: result: yes
>> configure:20180: checking hdf5.h presence
>> configure:20180: gcc -E -I/include conftest.c
>> configure:20180: $? = 0
>> configure:20180: result: yes
>> configure:20180: checking for hdf5.h
>> configure:20180: result: yes
>> configure:20188: checking for H5Fcreate in -lhdf5
>> configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse -I/include -L/usr/lib64  conftest.c -lhdf5  -lm -lz  -lhdf5 >&5
>> configure:20213: $? = 0
>> configure:20222: result: yes
>> configure:20234: checking for main in -lhdf5_hl
>> configure:20253: gcc -o conftest -g -O2 -pthread -fno-gcse -I/include -L/usr/lib64  conftest.c -lhdf5_hl  -lm -lz  -lhdf5 >&5
>> configure:20253: $? = 0
>> configure:20262: result: yes
>> configure:20274: checking for matching HDF5 Fortran wrapper
>> configure:20278: result: /usr/bin/h5fc
>>
>> The required shared object is in
>> /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_profile_hdf5.so.
>>
>>
>> Thank you,
>>
>> Dan Milroy
>>
>> From: Nancy Kritkausky [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 10:55 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Dan,
>> You can check your installation to make sure the library is there.
>> The name of the library is acct_gather_profile_hdf5.so.  It is 
>> normally installed under /usr/lib64/slurm, but depending on your 
>> configure options it could be elsewhere, including /usr/share.  As Rod 
>> said, if hdf5 is not installed, it will not be built.
>> Hope this helps too,
>> Nancy
>> From: Rod Schultz [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 09:23
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Dan,
>>
>> Do you have HDF5 installed on both the runtime system and the system 
>> upon which you built slurm?
>>
>> At configure time, there is a dependency on hdf5 being installed.
>>
>> The first couple of errors appear to be caused by not finding the 
>> library. This is probably the result of a build problem.
>>
>> The last few come from the continued parsing of acct_gather.conf.
>> The parsing of this file involves calling parsers in each 
>> acct_gather sub-plugin. If the plugin isn’t installed, its items in 
>> the file are considered errors.
>>
>> Rod
>>
>>
>>
>> From: Daniel Milroy [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 8:37 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hi Danny,
>>
>> There wasn’t anything in the “Profiling Using HDF5 User Guide” that 
>> indicated that I should load the plugin via spank.  It was a result 
>> of research into enabling the plugin since various combinations of 
>> the parameters weren’t working.
>>
>> Removing the reference to the lustre acct_gather shared object in 
>> plugstack.conf and restarting the service yields:
>>
>> error: Couldn't find the specified plugin name for acct_gather_profile/hdf5 looking at all files
>> error: cannot find acct_gather_profile plugin for acct_gather_profile/hdf5
>> fatal: ProfileHDF5Default can not be set to NotSet, please specify a valid option
>> error: Parsing error at unrecognized key: ProfileHDF5Dir
>> error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 1: "ProfileHDF5Dir=/curc/slurm/slurm/acct"
>> error: Parsing error at unrecognized key: ProfileHDF5Default
>> error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 2: "ProfileHDF5Default=Filesystem"
>>
>>
>> Regards,
>>
>> Dan Milroy
>>
>> From: Danny Auble [mailto:[email protected]]
>> Sent: Tuesday, May 27, 2014 12:29 PM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Dan, I wouldn't expect spank would be needed to load this plugin.
>>
>> Try taking the line out of your plugstack.conf and see if that works 
>> for you.  Was there something in the documentation
>> (http://slurm.schedmd.com/hdf5_profile_user_guide.html) that led you 
>> down this path?
>>
>> Danny
>> On 05/23/2014 03:59 PM, Daniel Milroy wrote:
>> Hello,
>>
>> I’ve been experiencing difficulties enabling the
>> AcctGatherProfileType/hdf5 plugin for the Lustre filesystem.  So far 
>> I’ve set the following parameters:
>>
>> slurm.conf
>>                  AcctGatherProfileType=acct_gather_profile/hdf5
>>                  AcctGatherFilesystemType=acct_gather_filesystem/lustre
>>
>> acct_gather.conf
>>                  ProfileHDF5Dir=/curc/slurm/slurm/acct
>>                  ProfileHDF5Default=Filesystem
>>
>> plugstack.conf
>>                  required /curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so
>>
>> Upon job submission, I receive the following error:
>> salloc: error: spank: "/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so" exports 0 symbols
>> salloc: error: spank: /curc/slurm/slurm/etc/plugstack.conf:2: Failed to load plugin /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so. Aborting.
>> salloc: error: Failed to initialize plugin stack
>>
>> Please let me know what I can do to properly enable this plugin.
>>
>>
>> Regards,
>>
>> Dan Milroy
