We may be talking about two different things, but as I pointed out in a previous email, the acct_gather_filesystem_lustre plugin does show as loading when slurmctld first starts. I'll show it again here. Do you see this when you start slurmctld? I am just starting it with "slurmctld -Dvvvv".
>> slurmctld: debug3: Trying to load plugin
>> /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
>> slurmctld: debug3: Success.
>> slurmctld: debug2: Reading acct_gather.conf file
>> /app/slurm/nlk/install/master/etc/acct_gather.conf
>> slurmctld: AcctGatherProfile hdf5 plugin loaded

-----Original Message-----
From: Danny Auble [mailto:[email protected]]
Sent: Wednesday, May 28, 2014 15:19
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

It should only load when running a job. And the output would be in the slurmd log. If you have DebugFlags=filesystem you should get a little bit of debug, but debug3 should give you a message each time it reads from the file system on the frequency you chose.

Perhaps you can send your slurmd log during a running job.

Danny

On 05/28/2014 03:13 PM, Daniel Milroy wrote:
> Hi Nancy, et al.,
>
> I've changed the parameter to the correct value. Unfortunately I still see
> no evidence that the lustre plugin is loading successfully, even after
> increasing to debug level 6. The HDF5 plugin is loading correctly, though.
>
>
> Dan Milroy
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Wednesday, May 28, 2014 2:49 PM
> To: slurm-dev
> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>
>
> Documentation fixed for next release.
>
> Quoting Nancy Kritkausky <[email protected]>:
>
>> Hi Dan,
>> Actually, the syntax for the acct_gather.conf should be “lustre” not
>> Filesystem. This seems to work. But, again I am only setting the
>> parameters and starting the slurmctld, I have no way to test the
>> actual functionality. I think the documentation needs to be updated
>> to match the functionality.
>> Good luck, Nancy
>>
>> ProfileHDF5DefaultProfile=lustre
>>
>>
>> From: Nancy Kritkausky [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 12:45
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hi Dan,
>> You are right, if you have the following in your acct_gather.conf it
>> doesn’t work.
>>
>> ProfileHDF5DefaultProfile=filesystem <--------------------------REMOVE
>>
>> I have the following in my slurm.conf
>>
>> AcctGatherProfileType=acct_gather_profile/hdf5
>> AcctGatherFilesystemType=acct_gather_filesystem/lustre
>>
>> And the following in my acct_gather.conf and it loads the libraries.
>>
>> # options to AcctGatherProfileType/hdf5
>> ProfileHDF5Dir=/app/slurm/nlk/logs/nlk-slurm/profile_data
>>
>> My error is slightly different from yours, but when I remove this
>> line slurmctld works. Both the acct_gather_profile_hdf5 and
>> acct_gather_filesystem_lustre are loaded.
>>
>>
>> slurmctld: debug3: Trying to load plugin
>> /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /app/slurm/nlk/install/master/lib/slurm/acct_gather_infiniband_none.so
>> slurmctld: AcctGatherInfiniband NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so
>> slurmctld: debug3: Success.
>> slurmctld: debug2: Reading acct_gather.conf file
>> /app/slurm/nlk/install/master/etc/acct_gather.conf
>> slurmctld: AcctGatherProfile hdf5 plugin loaded
>>
>> Can you try this? I don’t have a lustre file system to see if it
>> works however. But, could you try removing that line and see if you
>> are able to start, Nancy
>>
>> From: Rod Schultz [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 11:22
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> This is very strange.
>>
>> AcctGatherProfileType=acct_gather_profile/hdf5
>>
>> Is exactly what I have in my slurm.conf and the plugin loads.
>>
>> I’m afraid I have to disengage. I will be on vacation starting
>> tomorrow and have some other loose ends to deal with.
>>
>>
>> From: Daniel Milroy [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 11:12 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hi Rod,
>>
>> No, these messages are reported by slurmctld, which fails to start if
>> the parameters in my previous email are present in the configuration
>> files.
>>
>>
>> Dan Milroy
>>
>> From: Rod Schultz [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 11:58 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Daniel,
>>
>> Are these messages still coming from salloc?
>>
>> If so, are the slurm libraries installed on the node upon which
>> salloc has been launched?
>>
>> Try sbatch --wrap="srun --profile=task hostname"
>>
>> This makes sure srun executes on the first node of the allocation.
>> The libraries should be there. --profile=task should work if you have
>> job_acct_gather configured.
>>
>> Rod
>>
>>
>>
>> From: Daniel Milroy [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 10:44 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Nancy,
>>
>> Here’s the information you requested:
>>
>> slurm.conf
>> AcctGatherProfileType=acct_gather_profile/hdf5
>> AcctGatherFilesystemType=acct_gather_filesystem/lustre
>>
>> acct_gather.conf
>> ProfileHDF5Dir=/curc/slurm/slurm/acct
>> ProfileHDF5Default=Filesystem
>>
>>
>> Dan Milroy
>>
>> From: Nancy Kritkausky [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 11:32 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hello Daniel,
>> The syntax that is reported in the error message actually looks okay.
>> Can you provide the part of your slurm.conf that defines the
>> acctGatherProfileType and your acct_gather.conf? Maybe we can try
>> and re-create the problem, Thanks, Nancy
>>
>> From: Daniel Milroy [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 10:08
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hi Nancy and Rod,
>>
>> I believe that slurm was built properly on the runtime system.
>> Slurm was configured with the --with-hdf5=yes option, and config.log
>> indicates that the hdf5 libs were found:
>>
>> configure:20180: checking hdf5.h usability
>> configure:20180: gcc -c -g -O2 -pthread -fno-gcse -I/include conftest.c >&5
>> configure:20180: $? = 0
>> configure:20180: result: yes
>> configure:20180: checking hdf5.h presence
>> configure:20180: gcc -E -I/include conftest.c
>> configure:20180: $? = 0
>> configure:20180: result: yes
>> configure:20180: checking for hdf5.h
>> configure:20180: result: yes
>> configure:20188: checking for H5Fcreate in -lhdf5
>> configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse -I/include -L/usr/lib64 conftest.c -lhdf5 -lm -lz -lhdf5 >&5
>> configure:20213: $? = 0
>> configure:20222: result: yes
>> configure:20234: checking for main in -lhdf5_hl
>> configure:20253: gcc -o conftest -g -O2 -pthread -fno-gcse -I/include -L/usr/lib64 conftest.c -lhdf5_hl -lm -lz -lhdf5 >&5
>> configure:20253: $? = 0
>> configure:20262: result: yes
>> configure:20274: checking for matching HDF5 Fortran wrapper
>> configure:20278: result: /usr/bin/h5fc
>>
>> The required shared object is in
>> /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_profile_hdf5.so.
>>
>>
>> Thank you,
>>
>> Dan Milroy
>>
>> From: Nancy Kritkausky [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 10:55 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Dan,
>> You can check your installation to make sure the library is there.
>> The name of the library is acct_gather_profile_hdf5.so.
>> It is normally installed under /usr/lib64/slurm. But depending on your
>> ./configure options it could be elsewhere, including /usr/share. As Rod
>> said, if hdf5 is not installed, it will not be built.
>> Hope this helps too,
>> Nancy
>> From: Rod Schultz [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 09:23
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Dan,
>>
>> Do you have HDF5 installed on your system? Both the runtime system
>> and the system upon which you built slurm.
>>
>> At configure time, there is a dependency on hdf5 being installed.
>>
>> The first couple of errors appear to be caused by not finding the
>> library. This is probably the result of a build problem.
>>
>> The last few are continued parsing of acct_gather.conf.
>> The parsing of this file involves calling parsers in each
>> sub-account-gather plugin. If the plugin isn’t installed, items in
>> the file are considered errors.
>>
>> Rod
>>
>>
>>
>> From: Daniel Milroy [mailto:[email protected]]
>> Sent: Wednesday, May 28, 2014 8:37 AM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Hi Danny,
>>
>> There wasn’t anything in the “Profiling Using HDF5 User Guide” that
>> indicated that I should load the plugin via spank. It was a result
>> of research into enabling the plugin since various combinations of
>> the parameters weren’t working.
>>
>> Removing the reference to the lustre acct_gather shared object in
>> plugstack.conf and restarting the service yields:
>>
>> error: Couldn't find the specified plugin name for acct_gather_profile/hdf5 looking at all files
>> error: cannot find acct_gather_profile plugin for acct_gather_profile/hdf5
>> fatal: ProfileHDF5Default can not be set to NotSet, please specify a valid option
>> error: Parsing error at unrecognized key: ProfileHDF5Dir
>> error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 1: "ProfileHDF5Dir=/curc/slurm/slurm/acct"
>> error: Parsing error at unrecognized key: ProfileHDF5Default
>> error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 2: "ProfileHDF5Default=Filesystem"
>>
>>
>> Regards,
>>
>> Dan Milroy
>>
>> From: Danny Auble [mailto:[email protected]]
>> Sent: Tuesday, May 27, 2014 12:29 PM
>> To: slurm-dev
>> Subject: [slurm-dev] Re: HDF5 Profile Plugin setup
>>
>> Dan, I wouldn't expect spank would be needed to load this plugin.
>>
>> Try taking the line out of your plugstack.conf and see if that works
>> for you. Was there something in the documentation
>> (http://slurm.schedmd.com/hdf5_profile_user_guide.html) that led you
>> down this path?
>>
>> Danny
>> On 05/23/2014 03:59 PM, Daniel Milroy wrote:
>> Hello,
>>
>> I’ve been experiencing difficulties enabling the
>> AcctGatherProfileType/hdf5 plugin for the Lustre filesystem.
>> So far I’ve set the following parameters:
>>
>> slurm.conf
>> AcctGatherProfileType=acct_gather_profile/hdf5
>> AcctGatherFilesystemType=acct_gather_filesystem/lustre
>>
>> acct_gather.conf
>> ProfileHDF5Dir=/curc/slurm/slurm/acct
>> ProfileHDF5Default=Filesystem
>>
>> plugstack.conf
>> required /curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so
>>
>> Upon job submission, I receive the following error:
>> salloc: error: spank: "/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so" exports 0 symbols
>> salloc: error: spank: /curc/slurm/slurm/etc/plugstack.conf:2: Failed to load plugin /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so. Aborting.
>> salloc: error: Failed to initialize plugin stack
>>
>> Please let me know what I can do to properly enable this plugin.
>>
>>
>> Regards,
>>
>> Dan Milroy
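
For anyone comparing their own daemon output against the logs quoted in this thread, here is a minimal sketch that may help. It assumes the "slurmctld -Dvvvv" output has been captured to a list of lines, and that each "debug3: Trying to load plugin" message carries the plugin path on the same line (the line wrapping in the emails above split some of them). The helper name plugin_load_results and the sample list are hypothetical and are not part of Slurm.

```python
import re

# Matches the "Trying to load plugin <path>" messages quoted in this thread.
LOAD_RE = re.compile(r"debug3: Trying to load plugin (\S+)")


def plugin_load_results(log_lines):
    """Map each plugin path to True if a 'debug3: Success.' line follows it.

    Hypothetical helper: it only pairs a load attempt with the next
    'Success.' line, which is how the logs in this thread are laid out.
    """
    results = {}
    pending = None
    for line in log_lines:
        match = LOAD_RE.search(line)
        if match:
            # A new load attempt; assume failure until 'Success.' is seen.
            pending = match.group(1)
            results[pending] = False
        elif pending is not None and "debug3: Success." in line:
            results[pending] = True
            pending = None
    return results


# Sample lines copied from the slurmctld output quoted above.
sample = [
    "slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_profile_hdf5.so",
    "slurmctld: debug3: Success.",
    "slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_infiniband_none.so",
    "slurmctld: AcctGatherInfiniband NONE plugin loaded",
    "slurmctld: debug3: Success.",
    "slurmctld: debug3: Trying to load plugin /app/slurm/nlk/install/master/lib/slurm/acct_gather_filesystem_lustre.so",
    "slurmctld: debug3: Success.",
]

if __name__ == "__main__":
    for path, ok in plugin_load_results(sample).items():
        print(("loaded " if ok else "FAILED ") + path)
```

The same function could be pointed at a slurmd log during a running job, as Danny suggests, provided the messages there follow the same pattern; that has not been checked here.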
