Hi Nancy, et al.,

I've changed the parameter to the correct value. Unfortunately, I still see no evidence that the lustre plugin is loading successfully, even after increasing the debug level to 6. The HDF5 plugin is loading correctly, though.

Dan Milroy

-----Original Message-----
From: [email protected]
Sent: Wednesday, May 28, 2014 2:49 PM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Documentation fixed for next release.

Quoting Nancy Kritkausky:
> Hi Dan,
> Actually, the syntax for the acct_gather.conf should be “lustre” not
> Filesystem. This seems to work. But again, I am only setting the
> parameters and starting slurmctld; I have no way to test the
> actual functionality. I think the documentation needs to be updated
> to match the functionality.
> Good luck,
> Nancy
>
> ProfileHDF5DefaultProfile=lustre
>
>
> From: Nancy Kritkausky
> Sent: Wednesday, May 28, 2014 12:45
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hi Dan,
> You are right: if you have the following in your acct_gather.conf, it
> doesn’t work.
>
> ProfileHDF5DefaultProfile=filesystem <--------------------------REMOVE
>
> I have the following in my slurm.conf:
>
> AcctGatherProfileType=acct_gather_profile/hdf5
> AcctGatherFilesystemType=acct_gather_filesystem/lustre
>
> And with the following in my acct_gather.conf, it loads the libraries:
>
> # options to AcctGatherProfileType/hdf5
> ProfileHDF5Dir=/app/slurm/nlk/logs/nlk-slurm/profile_data
>
> My error is slightly different from yours, but when I remove this line,
> slurmctld works. Both the acct_gather_profile_hdf5 and
> acct_gather_filesystem_lustre plugins are loaded.
>
>
> slurmctld: debug3: Trying to load plugin
> /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
> slurmctld: debug3: Success.
> slurmctld: debug3: Trying to load plugin
> /app/slurm/nlk/install/master/lib/slurm/acct_gather_infiniband_none.so
> slurmctld: AcctGatherInfiniband NONE plugin loaded
> slurmctld: debug3: Success.
> slurmctld: debug3: Trying to load plugin
> /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
> slurmctld: debug3: Success.
> slurmctld: debug2: Reading acct_gather.conf file
> /app/slurm/nlk/install/master/etc/acct_gather.conf
> slurmctld: AcctGatherProfile hdf5 plugin loaded
>
> Can you try this? I don’t have a lustre file system to see if it
> works, however. Could you try removing that line and see if you
> are able to start?
> Nancy
>
> From: Rod Schultz
> Sent: Wednesday, May 28, 2014 11:22
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> This is very strange.
>
> AcctGatherProfileType=acct_gather_profile/hdf5
>
> is exactly what I have in my slurm.conf, and the plugin loads.
>
> I’m afraid I have to disengage. I will be on vacation starting
> tomorrow and have some other loose ends to deal with.
>
>
> From: Daniel Milroy
> Sent: Wednesday, May 28, 2014 11:12 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hi Rod,
>
> No, these messages are reported by slurmctld, which fails to start
> if the parameters in my previous email are present in the
> configuration files.
>
>
> Dan Milroy
>
> From: Rod Schultz
> Sent: Wednesday, May 28, 2014 11:58 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Daniel,
>
> Are these messages still coming from salloc?
>
> If so, are the slurm libraries installed on the node upon which
> salloc has been launched?
>
> Try sbatch --wrap="srun --profile=task hostname"
>
> This makes sure srun executes on the first node of the allocation.
> The libraries should be there. --profile=task should work if you have
> job_acct_gather configured.
>
> Rod
>
>
>
> From: Daniel Milroy
> Sent: Wednesday, May 28, 2014 10:44 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Nancy,
>
> Here’s the information you requested:
>
> slurm.conf
> AcctGatherProfileType=acct_gather_profile/hdf5
> AcctGatherFilesystemType=acct_gather_filesystem/lustre
>
> acct_gather.conf
> ProfileHDF5Dir=/curc/slurm/slurm/acct
> ProfileHDF5Default=Filesystem
>
>
> Dan Milroy
>
> From: Nancy Kritkausky
> Sent: Wednesday, May 28, 2014 11:32 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hello Daniel,
> The syntax that is reported in the error message actually looks
> okay. Can you provide the part of your slurm.conf that defines
> AcctGatherProfileType, and your acct_gather.conf? Maybe we can try
> to re-create the problem.
> Thanks,
> Nancy
>
> From: Daniel Milroy
> Sent: Wednesday, May 28, 2014 10:08
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hi Nancy and Rod,
>
> I believe that slurm was built properly on the runtime system.
> Slurm was configured with the --with-hdf5=yes option, and config.log
> indicates that the hdf5 libs were found:
>
> configure:20180: checking hdf5.h usability
> configure:20180: gcc -c -g -O2 -pthread -fno-gcse -I/include conftest.c >&5
> configure:20180: $? = 0
> configure:20180: result: yes
> configure:20180: checking hdf5.h presence
> configure:20180: gcc -E -I/include conftest.c
> configure:20180: $? = 0
> configure:20180: result: yes
> configure:20180: checking for hdf5.h
> configure:20180: result: yes
> configure:20188: checking for H5Fcreate in -lhdf5
> configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse
> -I/include -L/usr/lib64 conftest.c -lhdf5 -lm -lz -lhdf5 >&5
> configure:20213: $? = 0
> configure:20222: result: yes
> configure:20234: checking for main in -lhdf5_hl
> configure:20253: gcc -o conftest -g -O2 -pthread -fno-gcse
> -I/include -L/usr/lib64 conftest.c -lhdf5_hl -lm -lz -lhdf5 >&5
> configure:20253: $? = 0
> configure:20262: result: yes
> configure:20274: checking for matching HDF5 Fortran wrapper
> configure:20278: result: /usr/bin/h5fc
>
> The required shared object is in
> /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_profile_hdf5.so.
>
>
> Thank you,
>
> Dan Milroy
>
> From: Nancy Kritkausky
> Sent: Wednesday, May 28, 2014 10:55 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Dan,
> You can check your installation to make sure the library is there.
> The name of the library is acct_gather_profile_hdf5.so. It is
> normally installed under /usr/lib64/slurm, but depending on your
> configure options it could be elsewhere, including /usr/share. As Rod
> said, if hdf5 is not installed, it will not be built.
> Hope this helps too,
> Nancy
> From: Rod Schultz
> Sent: Wednesday, May 28, 2014 09:23
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Dan,
>
> Do you have HDF5 installed on both the runtime system
> and the system upon which you built slurm?
>
> At configure time, there is a dependency on hdf5 being installed.
>
> The first couple of errors appear to be caused by not finding the
> library. This is probably the result of a build problem.
>
> The last few come from the continued parsing of acct_gather.conf.
> The parsing of this file involves calling parsers in each
> sub-account-gather plugin. If the plugin isn’t installed, items in
> the file are considered errors.
>
> Rod
>
>
>
> From: Daniel Milroy
> Sent: Wednesday, May 28, 2014 8:37 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Hi Danny,
>
> There wasn’t anything in the “Profiling Using HDF5 User Guide” that
> indicated that I should load the plugin via spank. I arrived at that
> while researching how to enable the plugin, since various combinations
> of the parameters weren’t working.
>
> Removing the reference to the lustre acct_gather shared object in
> plugstack.conf and restarting the service yields:
>
> error: Couldn't find the specified plugin name for
> acct_gather_profile/hdf5 looking at all files
> error: cannot find acct_gather_profile plugin for acct_gather_profile/hdf5
> fatal: ProfileHDF5Default can not be set to NotSet, please specify a
> valid option
> error: Parsing error at unrecognized key: ProfileHDF5Dir
> error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf
> line 1: "ProfileHDF5Dir=/curc/slurm/slurm/acct"
> error: Parsing error at unrecognized key: ProfileHDF5Default
> error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf
> line 2: "ProfileHDF5Default=Filesystem"
>
>
> Regards,
>
> Dan Milroy
>
> From: Danny Auble
> Sent: Tuesday, May 27, 2014 12:29 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
> Dan, I wouldn't expect spank to be needed to load this plugin.
>
> Try taking the line out of your plugstack.conf and see if that works
> for you. Was there something in the documentation
> (http://slurm.schedmd.com/hdf5_profile_user_guide.html) that led
> you down this path?
>
> Danny
> On 05/23/2014 03:59 PM, Daniel Milroy wrote:
> Hello,
>
> I’ve been experiencing difficulties enabling the
> AcctGatherProfileType/hdf5 plugin for the Lustre filesystem.
> So far I’ve set the following parameters:
>
> slurm.conf
> AcctGatherProfileType=acct_gather_profile/hdf5
> AcctGatherFilesystemType=acct_gather_filesystem/lustre
>
> acct_gather.conf
> ProfileHDF5Dir=/curc/slurm/slurm/acct
> ProfileHDF5Default=Filesystem
>
> plugstack.conf
> required /curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so
>
> Upon job submission, I receive the following error:
> salloc: error: spank:
> "/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so"
> exports 0 symbols
> salloc: error: spank: /curc/slurm/slurm/etc/plugstack.conf:2: Failed
> to load plugin
> /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so.
> Aborting.
> salloc: error: Failed to initialize plugin stack
>
> Please let me know what I can do to properly enable this plugin.
>
>
> Regards,
>
> Dan Milroy
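
The open question at the top of the thread is whether the lustre plugin is actually being loaded. One way to check is to scan the daemon's debug3 output for a "Trying to load plugin" line followed by "Success.", as in Nancy's slurmctld excerpt. The Python sketch below assumes exactly that log format (taken from her excerpt, not from Slurm documentation); `plugin_load_status` is a hypothetical helper, not part of Slurm. A plugin with no matching "Success." line is reported as not confirmed, which may also just mean the log was captured at too low a debug level.

```python
import re
import sys
from typing import Dict, Iterable, Optional

# Assumed log format, copied from the debug3 lines in Nancy's slurmctld excerpt:
#   slurmctld: debug3: Trying to load plugin /path/to/acct_gather_profile_hdf5.so
#   slurmctld: debug3: Success.
TRYING = re.compile(r"debug3: Trying to load plugin (\S+\.so)")
SUCCESS = re.compile(r"debug3: Success\.")


def plugin_load_status(log_lines: Iterable[str]) -> Dict[str, bool]:
    """Hypothetical helper: map each plugin path the daemon tried to load
    to True if a "Success." line followed before the next load attempt."""
    status: Dict[str, bool] = {}
    pending: Optional[str] = None  # most recent plugin still awaiting "Success."
    for line in log_lines:
        tried = TRYING.search(line)
        if tried:
            pending = tried.group(1)
            status[pending] = False
        elif pending is not None and SUCCESS.search(line):
            # Other lines (e.g. "AcctGatherInfiniband NONE plugin loaded") may
            # appear in between; only a "Success." line marks the load as done.
            status[pending] = True
            pending = None
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_plugins.py /path/to/daemon.log
    with open(sys.argv[1]) as log:
        for path, ok in plugin_load_status(log).items():
            print(("loaded         " if ok else "NOT CONFIRMED  ") + path)
```

The thread only shows these lines coming from slurmctld; whether the filesystem data is gathered by slurmd on the compute nodes, making its log the more relevant one to scan, is an assumption worth checking rather than something confirmed here.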
