Greetings!
I have been experiencing *configuration problems* while trying to set up job
accounting in a local *MySQL* database. I'd like to thank you in advance
for taking the time to read this.
(Note: mysqld is running in all of these scenarios.)
Everything seems to work properly when using text files to store
accounting information. However, I'd like to move the accounting and job
completion storage to a local MySQL installation. At first, I wasn't very
interested in using SlurmDBD, as it is not necessary for my project.
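For context, the direct (no SlurmDBD) setup described here corresponds roughly to the slurm.conf fragment below. It is an illustrative sketch with assumed values: the host, user, password and job completion database name are placeholders, not copied from my actual file. Only the parameter names are the standard slurm.conf ones for the MySQL accounting and job completion plugins.

```
# slurm.conf (sketch; values are assumptions/placeholders)
# Accounting written directly by slurmctld to MySQL, no SlurmDBD
AccountingStorageType=accounting_storage/mysql
AccountingStorageHost=localhost
# MySQL's default port
AccountingStoragePort=3306
# placeholder MySQL user and password
AccountingStorageUser=slurm
AccountingStoragePass=CHANGE_ME
# database name that also appears in the logs
AccountingStorageLoc=slurm_acct_db

# Job completion records, also stored in MySQL
JobCompType=jobcomp/mysql
JobCompHost=localhost
JobCompPort=3306
JobCompUser=slurm
JobCompPass=CHANGE_ME
# placeholder database name
JobCompLoc=slurm_jobcomp_db
```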
After configuring Slurm to use MySQL storage, slurmctld shows the following
messages:
slurmctld: debug3: Trying to load plugin /usr/lib64/slurm/accounting_storage_mysql.so
slurmctld: debug2: mysql_connect() called for db slurm_acct_db
slurmctld: debug4: (as_mysql_convert.c:771) query
show columns from "cluster_assoc_table" where Field='grp_cpus';
slurmctld: debug4: This could happen often and is expected.
mysql_query failed: 1146 Table 'slurm_acct_db.cluster_assoc_table' doesn't exist
show columns from "cluster_assoc_table" where Field='grp_cpus';
slurmctld: error: issue converting tables
slurmctld: Accounting storage MYSQL plugin failed
slurmctld: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
slurmctld: error: cannot create accounting_storage context for accounting_storage/mysql
slurmctld: debug: Association database appears down, reading from state file.
slurmctld: debug2: No association state file (/var/spool/assoc_mgr_state) to recover
slurmctld: fatal: You are running with a database but for some reason we have no TRES from it. This should only happen if the database is down and you don't have any state files.
I have tried to *delete* Slurm's database and let Slurm recreate it, but
that didn't help. I have also tried creating
*slurm_acct_db.cluster_assoc_table*, which is mentioned above, *manually*.
However, that only leads to a different error.
Also, *sacct* and *sacctmgr* display the same *error*:
sacctmgr: error: issue converting tables
sacctmgr: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
sacctmgr: error: cannot create accounting_storage context for accounting_storage/mysql
sacctmgr: error: Problem talking to the database: Interrupted system call
Then I decided to give *SlurmDBD* a try. After configuring it for my
setup, SlurmDBD itself appears to connect properly to the configured
MySQL database, as it prints:
slurmdbd: slurmdbd version 17.02.1-2 started
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: running rollup at Mon Mar 27 20:48:17 2017
slurmdbd: debug2: Everything rolled up
slurmdbd: debug4: got 0 commits
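For comparison, in the SlurmDBD layout the configuration is split across two files: slurmctld (via slurm.conf) talks only to slurmdbd, and only slurmdbd.conf points at MySQL. A minimal sketch of that split, assuming the default ports (6819 for slurmdbd, 3306 for MySQL) and placeholder credentials rather than my real values:

```
# slurm.conf (sketch): slurmctld sends accounting data to slurmdbd
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
# slurmdbd's default port (not MySQL's)
AccountingStoragePort=6819

# slurmdbd.conf (sketch): only slurmdbd connects to MySQL
# AuthType must match the one in slurm.conf
AuthType=auth/none
DbdHost=localhost
DbdPort=6819
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageHost=localhost
StoragePort=3306
# placeholder MySQL credentials
StorageUser=slurm
StoragePass=CHANGE_ME
StorageLoc=slurm_acct_db
```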
However, *slurmctld* is still unable to start. It prints the following
*messages*:
slurmctld: debug3: Trying to load plugin /usr/lib64/slurm/auth_none.so
slurmctld: debug: Null authentication plugin loaded
slurmctld: debug3: Success.
slurmctld: error: persistant connection experienced an error
slurmctld: error: persistant connection experienced an error
slurmctld: error: persistant connection experienced an error
//CONTINUES INDEFINITELY
Sometimes, after waiting long enough, *it shows*:
//AFTER MANY MESSAGES
slurmctld: error: persistant connection experienced an error
slurmctld: error: persistant connection experienced an error
slurmctld: error: persistant connection experienced an error
slurmctld: debug: No unpack method for msg type 13614
slurmctld: error: slurm_persist_conn_open: Failed to unpack persistent connection init resp message from localhost.localdomain:3306
slurmctld: error: slurm_persist_conn_open: No response to persist_init
slurmctld: error: slurm_persist_conn_open: No response to persist_init
slurmctld: error: slurm_persist_conn_open: No response to persist_init
//CONTINUES FOR SOME MORE SECONDS
slurmctld: error: slurm_persist_conn_open: No response to persist_init
slurmctld: error: slurm_persist_conn_open: No response to persist_init
slurmctld: error: slurm_persist_conn_open: No response to persist_init
slurmctld: error: slurmdbd: Sending PersistInit msg: No error
slurmctld: debug4: slurmdbd: There is no state save file to open by name /var/spool/dbd.messages
slurmctld: debug: Association database appears down, reading from state file.
slurmctld: debug2: No association state file (/var/spool/assoc_mgr_state) to recover
slurmctld: fatal: You are running with a database but for some reason we have no TRES from it. This should only happen if the database is down and you don't have any state files.
Running *sacct* or *sacctmgr* produces the *same error*: "error:
persistant connection experienced an error".
Once again, *thank you* for reading and for sharing any ideas that might be
helpful.
Cordially,
Lucas Barcellos Oliveira
Undergraduate - Computing and Information Engineering
Rio de Janeiro Federal University