This is very strange.

AcctGatherProfileType=acct_gather_profile/hdf5

is exactly what I have in my slurm.conf, and the plugin loads.

I’m afraid I have to disengage. I will be on vacation starting tomorrow and have some other loose ends to deal with.

From: Daniel Milroy
Sent: Wednesday, May 28, 2014 11:12 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Rod,

No, these messages are reported by slurmctld, which fails to start if the parameters in my previous email are present in the configuration files.

Dan Milroy

From: Rod Schultz
Sent: Wednesday, May 28, 2014 11:58 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Daniel,

Are these messages still coming from salloc? If so, are the slurm libraries installed on the node upon which salloc has been launched?

Try

sbatch --wrap="srun --profile=task hostname"

This makes sure srun executes on the first node of the allocation. The libraries should be there. --profile=task should work if you have job_acct_gather configured.

Rod

From: Daniel Milroy
Sent: Wednesday, May 28, 2014 10:44 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Nancy,

Here’s the information you requested:

slurm.conf
AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherFilesystemType=acct_gather_filesystem/lustre

acct_gather.conf
ProfileHDF5Dir=/curc/slurm/slurm/acct
ProfileHDF5Default=Filesystem

Dan Milroy

From: Nancy Kritkausky
Sent: Wednesday, May 28, 2014 11:32 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hello Daniel,

The syntax that is reported in the error message actually looks okay. Can you provide the part of your slurm.conf that defines the AcctGatherProfileType and your acct_gather.conf?
Maybe we can try and re-create the problem.

Thanks,
Nancy

From: Daniel Milroy
Sent: Wednesday, May 28, 2014 10:08
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Nancy and Rod,

I believe that slurm was built properly on the runtime system. Slurm was configured with the --with-hdf5=yes option, and config.log indicates that the hdf5 libs were found:

configure:20180: checking hdf5.h usability
configure:20180: gcc -c -g -O2 -pthread -fno-gcse -I/include conftest.c >&5
configure:20180: $? = 0
configure:20180: result: yes
configure:20180: checking hdf5.h presence
configure:20180: gcc -E -I/include conftest.c
configure:20180: $? = 0
configure:20180: result: yes
configure:20180: checking for hdf5.h
configure:20180: result: yes
configure:20188: checking for H5Fcreate in -lhdf5
configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse -I/include -L/usr/lib64 conftest.c -lhdf5 -lm -lz -lhdf5 >&5
configure:20213: $? = 0
configure:20222: result: yes
configure:20234: checking for main in -lhdf5_hl
configure:20253: gcc -o conftest -g -O2 -pthread -fno-gcse -I/include -L/usr/lib64 conftest.c -lhdf5_hl -lm -lz -lhdf5 >&5
configure:20253: $? = 0
configure:20262: result: yes
configure:20274: checking for matching HDF5 Fortran wrapper
configure:20278: result: /usr/bin/h5fc

The required shared object is in /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_profile_hdf5.so.

Thank you,
Dan Milroy

From: Nancy Kritkausky
Sent: Wednesday, May 28, 2014 10:55 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan,

You can check your installation to make sure the library is there. The name of the library is acct_gather_profile_hdf5.so. It is normally installed under /usr/lib64/slurm, but depending on your configure options it could be elsewhere, including /usr/share. As Rod said, if hdf5 is not installed, it will not be built.
Hope this helps too,
Nancy

From: Rod Schultz
Sent: Wednesday, May 28, 2014 09:23
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan,

Do you have HDF5 installed on both the runtime system and the system upon which you built slurm? At configure time, there is a dependency on hdf5 being installed.

The first couple of errors appear to be caused by not finding the library. This is probably the result of a build problem.

The last few come from continued parsing of acct_gather.conf. The parsing of this file involves calling parsers in each acct_gather sub-plugin. If a plugin isn’t installed, its items in the file are considered errors.

Rod

From: Daniel Milroy
Sent: Wednesday, May 28, 2014 8:37 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Danny,

There wasn’t anything in the “Profiling Using HDF5 User Guide” that indicated that I should load the plugin via spank. It was a result of research into enabling the plugin, since various combinations of the parameters weren’t working.
Removing the reference to the lustre acct_gather shared object in plugstack.conf and restarting the service yields:

error: Couldn't find the specified plugin name for acct_gather_profile/hdf5 looking at all files
error: cannot find acct_gather_profile plugin for acct_gather_profile/hdf5
fatal: ProfileHDF5Default can not be set to NotSet, please specify a valid option
error: Parsing error at unrecognized key: ProfileHDF5Dir
error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 1: "ProfileHDF5Dir=/curc/slurm/slurm/acct"
error: Parsing error at unrecognized key: ProfileHDF5Default
error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf line 2: "ProfileHDF5Default=Filesystem"

Regards,
Dan Milroy

From: Danny Auble
Sent: Tuesday, May 27, 2014 12:29 PM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan,

I wouldn't expect spank would be needed to load this plugin. Try taking the line out of your plugstack.conf and see if that works for you. Was there something in the documentation (http://slurm.schedmd.com/hdf5_profile_user_guide.html) that led you down this path?

Danny

On 05/23/2014 03:59 PM, Daniel Milroy wrote:

Hello,

I’ve been experiencing difficulties enabling the AcctGatherProfileType/hdf5 plugin for the Lustre filesystem. So far I’ve set the following parameters:

slurm.conf
AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherFilesystemType=acct_gather_filesystem/lustre

acct_gather.conf
ProfileHDF5Dir=/curc/slurm/slurm/acct
ProfileHDF5Default=Filesystem

plugstack.conf
required /curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so

Upon job submission, I receive the following error:

salloc: error: spank: "/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so" exports 0 symbols
salloc: error: spank: /curc/slurm/slurm/etc/plugstack.conf:2: Failed to load plugin /curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so. Aborting.
salloc: error: Failed to initialize plugin stack

Please let me know what I can do to properly enable this plugin.

Regards,
Dan Milroy
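
Note for readers of this thread: the check Nancy suggests (confirming that the plugin's shared object is actually installed) could be scripted roughly as follows. This is only a sketch, not something posted by anyone in the thread. The helper names plugin_filename and find_plugin are hypothetical, and the mapping from a configured type such as acct_gather_profile/hdf5 to a file named acct_gather_profile_hdf5.so is an assumption drawn from the names quoted in the messages above. The directories are the ones mentioned in the thread and will differ on other installations.

```python
from pathlib import Path
from typing import Iterable, List


def plugin_filename(plugin_type: str) -> str:
    """Map a configured type to the expected shared-object name.

    Assumption (taken from the names quoted in the thread):
    'acct_gather_profile/hdf5' corresponds to 'acct_gather_profile_hdf5.so'.
    """
    return plugin_type.strip().replace("/", "_") + ".so"


def find_plugin(plugin_type: str, plugin_dirs: Iterable[str]) -> List[Path]:
    """Return every existing file for plugin_type under the given directories."""
    name = plugin_filename(plugin_type)
    return [Path(d) / name for d in plugin_dirs if (Path(d) / name).is_file()]


if __name__ == "__main__":
    # Directories mentioned in the thread; adjust for your own installation.
    dirs = ["/usr/lib64/slurm", "/curc/slurm/slurm/14.03.3/lib/slurm"]
    for plugin_type in ("acct_gather_profile/hdf5",
                        "acct_gather_filesystem/lustre"):
        hits = find_plugin(plugin_type, dirs)
        status = ", ".join(str(p) for p in hits) if hits else "not found"
        print(f"{plugin_type}: {status}")
```

The script only tests whether the file is present. As the messages above show, the shared object can exist on disk while slurmctld still reports that it cannot find the plugin, so a positive result rules out only the simplest cause.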
