Hi,
I have run into a problem using the slurmdb API and I'd like to bring it
to your attention.
This problem came up while porting our pipeline manager, which uses
DRMAA, to slurm and trying to enhance the slurm-drmaa library
(http://apps.man.poznan.pl/trac/slurm-drmaa) so that the drmaa_job_ps()
call correctly returns the status of completed jobs as well as jobs that
are running or queued.
I'm currently using slurm 14.11.6, which I know is old, but it is the
version that is installed and running on our cluster (and I don't
control which version is in use).
I do not know if this problem also exists in newer versions of slurm.
In a nutshell, I believe the problem is that
common/slurm_accounting_storage.c, which contains
slurm_acct_storage_init, appears to be compiled into _both_ libslurm.so
and libslurmdb.so, so there are two copies of its static data, one in
each shared library.
However, db_api/connection_functions.c, which contains
slurmdb_connection_get, is compiled only into libslurmdb.so, as the nm
output shows:
> nm /usr/cluster/lib/libslurm.so.28.0.0 | grep slurm_acct_storage_init
00000000000a690c T slurm_acct_storage_init
> nm /usr/cluster/lib/libslurmdb.so.28.0.0 | grep slurm_acct_storage_init
00000000000ab240 T slurm_acct_storage_init
> nm /usr/cluster/lib/libslurm.so.28.0.0 | grep slurmdb_connection_get
> nm /usr/cluster/lib/libslurmdb.so.28.0.0 | grep slurmdb_connection_get
000000000001d000 T slurmdb_connection_get
When I compile code like the following:
    slurm_acct_storage_init(NULL);
    conn = slurmdb_connection_get();
and link it with -lslurm -lslurmdb, the resulting program dumps core,
because the call to slurm_acct_storage_init initializes the static data
in libslurm.so but not in libslurmdb.so.
I have been able to work around this by calling the second copy of
slurm_acct_storage_init (the one in libslurmdb.so) through dlopen/dlsym,
but it seems like a bug that this code and its static data are built
into both of these shared libraries.
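For reference, here is a minimal sketch of what such a dlopen/dlsym workaround can look like. It assumes the 14.11 prototype int slurm_acct_storage_init(char *loc) and that libslurmdb can be opened by name (for example "libslurmdb.so.28", assuming that is its soname on this install). The helper names resolve_in_library and init_slurmdb_acct_storage are hypothetical and are not part of the Slurm API.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Look up `sym` in one particular shared library (and, failing that, in its
 * dependencies) instead of taking whichever definition the dynamic linker
 * bound first at start-up. If the library is already loaded, dlopen() just
 * returns the existing handle. The handle is deliberately never passed to
 * dlclose(), so the returned address stays valid. */
void *resolve_in_library(const char *lib_path, const char *sym)
{
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", lib_path, dlerror());
        return NULL;
    }

    dlerror();                      /* clear any stale error state */
    void *addr = dlsym(handle, sym);
    const char *err = dlerror();    /* NULL means the lookup succeeded */
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", sym, err);
        return NULL;
    }
    return addr;
}

/* ASSUMPTION: prototype as declared for slurm_acct_storage_init in
 * Slurm 14.11 (common/slurm_accounting_storage.h). */
typedef int (*acct_storage_init_fn)(char *loc);

/* Hypothetical helper: run the copy of slurm_acct_storage_init() that lives
 * inside libslurmdb, so that library's own static data gets initialized
 * before slurmdb_connection_get() is called. Returns -1 if the symbol
 * cannot be found. */
int init_slurmdb_acct_storage(const char *slurmdb_path)
{
    void *addr = resolve_in_library(slurmdb_path, "slurm_acct_storage_init");
    if (addr == NULL)
        return -1;

    acct_storage_init_fn init;
    *(void **)&init = addr;         /* object-to-function pointer idiom from dlsym(3) */
    return init(NULL);              /* NULL, as in the original snippet */
}
```

In the failing program, init_slurmdb_acct_storage("libslurmdb.so.28") would replace the direct slurm_acct_storage_init(NULL) call, before slurmdb_connection_get(). Because the executable is already linked against libslurmdb, dlopen() returns the already-loaded library rather than a fresh copy, and dlsym() on that handle searches libslurmdb itself before its dependencies. On older glibc versions the program also needs -ldl.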
Do you know if this has been fixed in more recent versions of slurm?
Or am I trying to use the API in a way that is not intended?
Should I file a bug report?
Thanks very much,
Bob Handsaker