Thanks Moe and Danny!

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: Wednesday, May 28, 2014 13:50
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup


Documentation fixed for next release.

Quoting Nancy Kritkausky <[email protected]>:

> Hi Dan,
> Actually, the value in acct_gather.conf should be “lustre”, not
> “Filesystem”.  This seems to work, but again, I am only setting the
> parameters and starting slurmctld; I have no way to test the
> actual functionality.  I think the documentation needs to be updated
> to match the functionality.  Good luck, Nancy
>
> ProfileHDF5DefaultProfile=lustre
>
>
> From: Nancy Kritkausky [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 12:45
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hi Dan,
> You are right: if you have the following in your acct_gather.conf, it
> doesn’t work.
>
> ProfileHDF5DefaultProfile=filesystem   <--------------------------REMOVE
>
> I have the following in my slurm.conf
>
> AcctGatherProfileType=acct_gather_profile/hdf5
> AcctGatherFilesystemType=acct_gather_filesystem/lustre
>
> And the following in my acct_gather.conf and it loads the libraries.
>
> # options to AcctGatherProfileType/hdf5 
> ProfileHDF5Dir=/app/slurm/nlk/logs/nlk-slurm/profile_data
>
> My error is slightly different from yours, but when I remove this line
> slurmctld works.  Both the acct_gather_profile_hdf5 and
> acct_gather_filesystem_lustre plugins are loaded.
>
>
> slurmctld: debug3: Trying to load plugin  
> /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
> slurmctld: debug3: Success.
> slurmctld: debug3: Trying to load plugin  
> /app/slurm/nlk/install/master/lib/slurm/acct_gather_infiniband_none.so
> slurmctld: AcctGatherInfiniband NONE plugin loaded
> slurmctld: debug3: Success.
> slurmctld: debug3: Trying to load plugin  
> /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
> slurmctld: debug3: Success.
> slurmctld: debug2: Reading acct_gather.conf file  
> /app/slurm/nlk/install/master/etc/acct_gather.conf
> slurmctld: AcctGatherProfile hdf5 plugin loaded
>
> Can you try this?  I don’t have a Lustre file system, so I can’t see
> whether it works.  Could you try removing that line and see if you
> are able to start?
> Nancy
>
> From: Rod Schultz [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 11:22
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> This is very strange.
>
> AcctGatherProfileType=acct_gather_profile/hdf5
>
> Is exactly what I have in my slurm.conf and the plugin loads.
>
> I’m afraid I have to disengage. I will be on vacation starting  
> tomorrow and have some other loose ends to deal with.
>
>
> From: Daniel Milroy [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 11:12 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hi Rod,
>
> No, these messages are reported by slurmctld, which fails to start  
> if the parameters in my previous email are present in the  
> configuration files.
>
>
> Dan Milroy
>
> From: Rod Schultz [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 11:58 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Daniel,
>
> Are these messages still coming from salloc?
>
> If so, are the slurm libraries installed on the node upon which
> salloc has been launched?
>
> Try sbatch --wrap="srun --profile=task hostname"
>
> This makes sure srun executes on the first node of the allocation.
> The libraries should be there. --profile=task should work if you have
> job_acct_gather configured.
>
> Rod
>
>
>
> From: Daniel Milroy [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 10:44 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Nancy,
>
> Here’s the information you requested:
>
> slurm.conf
>     AcctGatherProfileType=acct_gather_profile/hdf5
>     AcctGatherFilesystemType=acct_gather_filesystem/lustre
>
> acct_gather.conf
>     ProfileHDF5Dir=/curc/slurm/slurm/acct
>     ProfileHDF5Default=Filesystem
>
>
> Dan Milroy
>
> From: Nancy Kritkausky [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 11:32 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hello Daniel,
> The syntax that is reported in the error message actually looks
> okay.  Can you provide the part of your slurm.conf that defines
> AcctGatherProfileType, along with your acct_gather.conf?  Maybe we can
> try to re-create the problem.
> Thanks,
> Nancy
>
> From: Daniel Milroy [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 10:08
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hi Nancy and Rod,
>
> I believe that slurm was built properly on the runtime system.   
> Slurm was configured with the --with-hdf5=yes option, and config.log  
> indicates that the hdf5 libs were found:
>
> configure:20180: checking hdf5.h usability
> configure:20180: gcc -c -g -O2 -pthread -fno-gcse -I/include conftest.c >&5
> configure:20180: $? = 0
> configure:20180: result: yes
> configure:20180: checking hdf5.h presence
> configure:20180: gcc -E -I/include conftest.c
> configure:20180: $? = 0
> configure:20180: result: yes
> configure:20180: checking for hdf5.h
> configure:20180: result: yes
> configure:20188: checking for H5Fcreate in -lhdf5
> configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse  
> -I/include -L/usr/lib64  conftest.c -lhdf5  -lm -lz  -lhdf5 >&5
> configure:20213: $? = 0
> configure:20222: result: yes
> configure:20234: checking for main in -lhdf5_hl
> configure:20253: gcc -o conftest -g -O2 -pthread -fno-gcse  
> -I/include -L/usr/lib64  conftest.c -lhdf5_hl  -lm -lz  -lhdf5 >&5
> configure:20253: $? = 0
> configure:20262: result: yes
> configure:20274: checking for matching HDF5 Fortran wrapper
> configure:20278: result: /usr/bin/h5fc
>
> The required shared object is in  
> /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_profile_hdf5.so.
>
>
> Thank you,
>
> Dan Milroy
>
> From: Nancy Kritkausky [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 10:55 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Dan,
> You can check your installation to make sure the library is there.
> The name of the library is acct_gather_profile_hdf5.so.  It is
> normally installed under /usr/lib64/slurm, but depending on your
> configure options it could be elsewhere, including /usr/share.  As Rod
> said, if hdf5 is not installed, it will not be built.
> Hope this helps too,
> Nancy
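
A quick way to run the check Nancy describes is to search the likely plugin directories for acct_gather_profile_hdf5.so. The following is only a minimal sketch: the helper name find_plugin and the list of candidate directories are assumptions for illustration (the two paths are the ones mentioned in this thread), so substitute your own install prefix.

```python
from pathlib import Path

# Library name as given in the message above.
PLUGIN_NAME = "acct_gather_profile_hdf5.so"


def find_plugin(candidate_dirs, plugin_name=PLUGIN_NAME):
    """Return every regular file called plugin_name under the given directories."""
    hits = []
    for directory in candidate_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip prefixes that do not exist on this host
        # rglob searches the directory itself and all of its subdirectories
        hits.extend(sorted(p for p in root.rglob(plugin_name) if p.is_file()))
    return hits


if __name__ == "__main__":
    # Candidate directories are assumptions; adjust them to your site.
    candidates = ["/usr/lib64/slurm", "/curc/slurm/slurm/14.03.3/lib/slurm"]
    found = find_plugin(candidates)
    if found:
        for path in found:
            print(path)
    else:
        print(f"{PLUGIN_NAME} not found under {candidates}")
```

An empty result on the build host or on a runtime node would be consistent with the build-time HDF5 dependency Rod mentions, but it does not by itself prove that was the cause.
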
> From: Rod Schultz [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 09:23
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Dan,
>
> Do you have HDF5 installed on your system? Both the runtime system  
> and the system upon which you built slurm.
>
> At configure time, there is a dependency on hdf5 being installed.
>
> The first couple of errors appear to be caused by not finding the
> library. This is probably the result of a build problem.
>
> The last few come from continued parsing of acct_gather.conf.
> The parsing of this file involves calling parsers in each  
> sub-account-gather plugin. If the plugin isn’t installed, items in  
> the file are considered errors.
>
> Rod
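
Rod's point about cascading errors can be illustrated with a toy model. This is not Slurm's actual parser; parse_conf and the plugin-to-keys mapping are hypothetical names invented for this sketch. It only shows why a plugin that fails to load can turn otherwise valid lines into "unrecognized key" errors.

```python
def parse_conf(lines, loaded_plugins):
    """Toy model: accept a key only if some loaded plugin claims it.

    loaded_plugins maps a plugin name to the list of keys it understands.
    Returns (accepted, errors); errors holds (line_number, key) tuples.
    """
    known = {key for keys in loaded_plugins.values() for key in keys}
    accepted, errors = {}, []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in known:
            errors.append((lineno, key))  # no loaded plugin recognizes this key
        else:
            accepted[key] = value.strip()
    return accepted, errors


if __name__ == "__main__":
    conf = [
        "ProfileHDF5Dir=/curc/slurm/slurm/acct",
        "ProfileHDF5Default=Filesystem",
    ]
    # Illustrative key list for the hdf5 plugin, taken from this thread.
    hdf5 = {"acct_gather_profile/hdf5": ["ProfileHDF5Dir", "ProfileHDF5Default"]}
    print(parse_conf(conf, hdf5))  # plugin loaded: both keys accepted
    print(parse_conf(conf, {}))    # plugin missing: both keys reported as errors
```

In this model the configuration file is identical in both calls; only the set of loaded plugins changes, which matches the pattern in Dan's log where the parse errors follow the failure to find the plugin.
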
>
>
>
> From: Daniel Milroy [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 8:37 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hi Danny,
>
> There wasn’t anything in the “Profiling Using HDF5 User Guide” that  
> indicated that I should load the plugin via spank.  It was a result  
> of research into enabling the plugin since various combinations of  
> the parameters weren’t working.
>
> Removing the reference to the lustre acct_gather shared object in  
> plugstack.conf and restarting the service yields:
>
> error: Couldn't find the specified plugin name for acct_gather_profile/hdf5 looking at all files
> error: cannot find acct_gather_profile plugin for acct_gather_profile/hdf5
> fatal: ProfileHDF5Default can not be set to NotSet, please specify a valid option
> error: Parsing error at unrecognized key: ProfileHDF5Dir
> error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 1: "ProfileHDF5Dir=/curc/slurm/slurm/acct"
> error: Parsing error at unrecognized key: ProfileHDF5Default
> error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 2: "ProfileHDF5Default=Filesystem"
>
>
> Regards,
>
> Dan Milroy
>
> From: Danny Auble [mailto:[email protected]]
> Sent: Tuesday, May 27, 2014 12:29 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Dan, I wouldn't expect spank would be needed to load this plugin.
>
> Try taking the line out of your plugstack.conf and see if that works
> for you.  Was there something in the documentation
> (http://slurm.schedmd.com/hdf5_profile_user_guide.html) that led
> you down this path?
>
> Danny
> On 05/23/2014 03:59 PM, Daniel Milroy wrote:
> Hello,
>
> I’ve been experiencing difficulties enabling the  
> AcctGatherProfileType/hdf5 plugin for the Lustre filesystem.  So far  
> I’ve set the following parameters:
>
> slurm.conf
>     AcctGatherProfileType=acct_gather_profile/hdf5
>     AcctGatherFilesystemType=acct_gather_filesystem/lustre
>
> acct_gather.conf
>     ProfileHDF5Dir=/curc/slurm/slurm/acct
>     ProfileHDF5Default=Filesystem
>
> plugstack.conf
>     required /curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so
>
> Upon job submission, I receive the following error:
> salloc: error: spank: "/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so" exports 0 symbols
> salloc: error: spank: /curc/slurm/slurm/etc/plugstack.conf:2: Failed to load plugin /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so. Aborting.
> salloc: error: Failed to initialize plugin stack
>
> Please let me know what I can do to properly enable this plugin.
>
>
> Regards,
>
> Dan Milroy
