Re: [slurm-users] [External] [slurm 20.02.3] don't suspend nodes in down state
Hi guys,

Thanks for your answers.

I would prefer not to patch the Slurm source code, as Jacek does, to keep things simple. But I think that is the way to go.
When I try the solutions that Florian and Angelos suggested, Slurm still thinks that the nodes are "powered down", even if they are not.
Still, it is better that Slurm only thinks they are down than that they actually power down while I am upgrading something.

What we really need is a state like "MAINT", for maintenance, which tells Slurm not to use the node but also not to power it down.

Thanks,
Herbert

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Florian Zillner
Sent: Wednesday, 26 August 2020 10:36
To: Slurm User Community List
Subject: Re: [slurm-users] [External] [slurm 20.02.3] don't suspend nodes in down state

Hi Herbert,

just like Angelos described, we also have logic in our poweroff script that checks whether the node is really IDLE and only sends the poweroff command if that is the case.

Excerpt:

    hosts=$(scontrol show hostnames $1)
    for host in $hosts; do
        scontrol show node $host | tr ' ' '\n' | grep -q 'State=IDLE+POWER$'
        if [[ $? == 1 ]]; then
            echo "node $host NOT IDLE" >>$OUTFILE
            continue
        else
            echo "node $host IDLE" >>$OUTFILE
        fi
        ssh $host poweroff
        ...
        sleep 1
        ...
    done

Best,
Florian

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Steininger, Herbert <herbert_steinin...@psych.mpg.de>
Sent: Monday, 24 August 2020 10:52
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [External] [slurm-users] [slurm 20.02.3] don't suspend nodes in down state

Hi,

how can I prevent Slurm from suspending nodes that I have set to the down state for maintenance?
I know about "SuspendExcNodes", but rolling out slurm.conf every time the set of nodes changes does not seem like the right way.
Is there a state I can set so that the nodes do not get suspended?

It has happened a few times that I was working on a server and, after our idle time (1h), Slurm decided to suspend the node.

TIA,
Herbert

--
Herbert Steininger
Head of IT & HPC Administrator
Max-Planck-Institut für Psychiatrie
Kraepelinstr. 2-10
80804 München
Tel +49 (0)89 / 30622-368
Mail herbert_steinin...@psych.mpg.de
Web https://www.psych.mpg.de
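The state check in Florian's excerpt can be pulled out into two small functions, so the decision can be tried against sample `scontrol show node` output without touching a real node. This is only a sketch: the function names `node_state` and `may_poweroff` are made up for illustration, the only state string taken from the thread is `IDLE+POWER`, and `DOWN+POWER` in the comment is an assumed example of what a down node might report.

```bash
#!/usr/bin/env bash
# Sketch only: node_state and may_poweroff are hypothetical helper names,
# not part of Slurm.

# Print the value of the first State= field found in the text on stdin
# (the text is expected to look like `scontrol show node <host>` output).
node_state() {
    tr ' ' '\n' | sed -n 's/^State=//p' | head -n 1
}

# Succeed only for the state string used in the excerpt (IDLE+POWER).
# Anything else (assumed example: DOWN+POWER) means "leave the node alone".
may_poweroff() {
    [ "$1" = "IDLE+POWER" ]
}

# Possible wiring inside a SuspendProgram, assuming scontrol is on PATH and
# $1 is the host list Slurm passes to the script:
#   for host in $(scontrol show hostnames "$1"); do
#       state=$(scontrol show node "$host" | node_state)
#       may_poweroff "$state" && ssh "$host" poweroff
#   done
```

As Herbert notes, this only stops the poweroff command from being sent; Slurm itself will still consider the node powered down.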
[slurm-users] [slurm 20.02.3] don't suspend nodes in down state
Hi,

how can I prevent Slurm from suspending nodes that I have set to the down state for maintenance?
I know about "SuspendExcNodes", but rolling out slurm.conf every time the set of nodes changes does not seem like the right way.
Is there a state I can set so that the nodes do not get suspended?

It has happened a few times that I was working on a server and, after our idle time (1h), Slurm decided to suspend the node.

TIA,
Herbert

--
Herbert Steininger
Head of IT & HPC Administrator
Max-Planck-Institut für Psychiatrie
Kraepelinstr. 2-10
80804 München
Tel +49 (0)89 / 30622-368
Mail herbert_steinin...@psych.mpg.de
Web https://www.psych.mpg.de
[slurm-users] Solved: Error upgrading slurmdbd from 19.05 to 20.02
Hi,

I just want to let you know that I solved the problem simply by renaming the columns back to 'pack_...' and starting slurmdbd again, which renamed them to 'het_...'.
slurmdbd is running again.

Best,
Herbert

-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Steininger, Herbert
Sent: Friday, 13 March 2020 11:49
To: Slurm User Community List
Subject: Re: [slurm-users] Error upgrading slurmdbd from 19.05 to 20.02

Hi,

I guess I found the problem.
It seems to come from this file:
src/plugins/accounting_storage/mysql/as_mysql_convert.c
in particular from here:

--- code ---
static int _convert_job_table_pre(mysql_conn_t *mysql_conn, char *cluster_name)
{
    int rc = SLURM_SUCCESS;
    char *query = NULL;

    if (db_curr_ver < 8) {
        /*
         * Change the names pack_job_id and pack_job_offset to be het_*
         */
        query = xstrdup_printf(
            "alter table \"%s_%s\" "
            "change pack_job_id het_job_id int unsigned not null, "
            "change pack_job_offset het_job_offset "
            "int unsigned not null;",
            cluster_name, job_table);
    }

    if (query) {
        if (debug_flags & DEBUG_FLAG_DB_QUERY)
            DB_DEBUG(mysql_conn->conn, "query\n%s", query);

        rc = mysql_db_query(mysql_conn, query);
        xfree(query);
        if (rc != SLURM_SUCCESS)
            error("%s: Can't convert %s_%s info: %m",
                  __func__, cluster_name, job_table);
    }

    return rc;
}
--- code ---

It checks whether the version is below "8" and, if so, renames the columns.

In the table the version is "7":

--- mysql ---
MariaDB [slurm_acct_db]> select * from convert_version_table;
+------------+---------+
| mod_time   | version |
+------------+---------+
| 1579853103 |       7 |
+------------+---------+
1 row in set (0.00 sec)
--- mysql ---

But in my table, I already have the right columns:

--- table ---
MariaDB [slurm_acct_db]> show columns from `mpip-cluster_job_table`;
+--------------------+---------------------+------+-----+------------+----------------+
| Field              | Type                | Null | Key | Default    | Extra          |
+--------------------+---------------------+------+-----+------------+----------------+
| job_db_inx         | bigint(20) unsigned | NO   | PRI | NULL       | auto_increment |
| mod_time           | bigint(20) unsigned | NO   |     | 0          |                |
| deleted            | tinyint(4)          | NO   |     | 0          |                |
| account            | tinytext            | YES  |     | NULL       |                |
| admin_comment      | text                | YES  |     | NULL       |                |
| array_task_str     | text                | YES  |     | NULL       |                |
| array_max_tasks    | int(10) unsigned    | NO   |     | 0          |                |
| array_task_pending | int(10) unsigned    | NO   |     | 0          |                |
| constraints        | text                | YES  |     | NULL       |                |
| cpus_req           | int(10) unsigned    | NO   |     | NULL       |                |
| derived_ec         | int(10) unsigned    | NO   |     | 0          |                |
| derived_es         | text                | YES  |     | NULL       |                |
| exit_code          | int(10) unsigned    | NO   |     | 0          |                |
| flags              | int(10) unsigned    | NO   |     | 0          |                |
| job_name           | tinytext            | NO   |     | NULL       |                |
| id_assoc           | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_array_job       | int(10) unsigned    | NO   | MUL | 0          |                |
| id_array_task      | int(10) unsigned    | NO   |     | 4294967294 |                |
| id_block           | tinytext            | YES  |     | NULL       |                |
| id_job             | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_qos             | int(10) unsigned    | NO   | MUL | 0          |                |
| id_resv            | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_wckey           | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_user            | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_group           | int(10) unsigned    | NO   |     | NULL       |                |
| het_job_id         | int(10) unsigned    | NO   | MUL | NULL       |                |
| het_job_offset     | int(10) unsigned
Re: [slurm-users] Error upgrading slurmdbd from 19.05 to 20.02
unsigned | NO   | MUL | NULL       |                |
| node_inx           | text                | YES  |     | NULL       |                |
| partition          | tinytext            | NO   |     | NULL       |                |
| priority           | int(10) unsigned    | NO   |     | NULL       |                |
| state              | int(10) unsigned    | NO   |     | NULL       |                |
| timelimit          | int(10) unsigned    | NO   |     | 0          |                |
| time_submit        | bigint(20) unsigned | NO   |     | 0          |                |
| time_eligible      | bigint(20) unsigned | NO   | MUL | 0          |                |
| time_start         | bigint(20) unsigned | NO   |     | 0          |                |
| time_end           | bigint(20) unsigned | NO   | MUL | 0          |                |
| time_suspended     | bigint(20) unsigned | NO   |     | 0          |                |
| gres_req           | text                | NO   |     | NULL       |                |
| gres_alloc         | text                | NO   |     | NULL       |                |
| gres_used          | text                | NO   |     | NULL       |                |
| wckey              | tinytext            | NO   |     | NULL       |                |
| work_dir           | text                | NO   |     | NULL       |                |
| system_comment     | text                | YES  |     | NULL       |                |
| track_steps        | tinyint(4)          | NO   |     | NULL       |                |
| tres_alloc         | text                | NO   |     | NULL       |                |
| tres_req           | text                | NO   |     | NULL       |                |
+--------------------+---------------------+------+-----+------------+----------------+
52 rows in set (0.00 sec)

MariaDB [slurm_acct_db]>
--- table ---

Does anybody know what could be done to get slurmdbd up?
Is there an option to prevent this MySQL conversion step?
Would I have to rebuild Slurm?

Thanks in advance,
Herbert

-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Steininger, Herbert
Sent: Thursday, 12 March 2020 16:01
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Error upgrading slurmdbd from 19.05 to 20.02

Hello,

while upgrading Slurm from 19.05 to 20.02, an error occurred when I tried to upgrade slurmdbd first.

The error is:

slurmdbd: debug: Munge authentication plugin loaded
slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
slurmdbd: debug2: Attempting to connect to slurmmaster:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debug2: innodb_log_file_size: 5242880
slurmdbd: debug2: innodb_lock_wait_timeout: 50
slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_log_file_size innodb_lock_wait_timeout
slurmdbd: pre-converting job table for mpip-cluster
slurmdbd: error: mysql_query failed: 1054 Unknown column 'pack_job_id' in 'mpip-cluster_job_table'
alter table "mpip-cluster_job_table" change pack_job_id het_job_id int unsigned not null, change pack_job_offset het_job_offset int unsigned not null;
slurmdbd: error: _convert_job_table_pre: Can't convert mpip-cluster_job_table info: Unknown error 1054
slurmdbd: error: issue converting tables before create
slurmdbd: Accounting storage MYSQL plugin failed
slurmdbd: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

How can I get the missing columns?

Thanks in advance,
Herbert
[slurm-users] Error upgrading slurmdbd from 19.05 to 20.02
Hello,

while upgrading Slurm from 19.05 to 20.02, an error occurred when I tried to upgrade slurmdbd first.

The error is:

slurmdbd: debug: Munge authentication plugin loaded
slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
slurmdbd: debug2: Attempting to connect to slurmmaster:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debug2: innodb_log_file_size: 5242880
slurmdbd: debug2: innodb_lock_wait_timeout: 50
slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_log_file_size innodb_lock_wait_timeout
slurmdbd: pre-converting job table for mpip-cluster
slurmdbd: error: mysql_query failed: 1054 Unknown column 'pack_job_id' in 'mpip-cluster_job_table'
alter table "mpip-cluster_job_table" change pack_job_id het_job_id int unsigned not null, change pack_job_offset het_job_offset int unsigned not null;
slurmdbd: error: _convert_job_table_pre: Can't convert mpip-cluster_job_table info: Unknown error 1054
slurmdbd: error: issue converting tables before create
slurmdbd: Accounting storage MYSQL plugin failed
slurmdbd: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

How can I get the missing columns?

Thanks in advance,
Herbert
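The failing statement in the log expects `pack_job_id` and `pack_job_offset` to exist, and the fix reported in the "Solved" message of this thread was to rename the `het_*` columns back so that slurmdbd could convert them itself. A rough sketch of how one might check which naming the job table currently has, and print the reverse rename for review, is below. The helper names `job_table_scheme` and `print_revert_sql` are invented for this example, and the column definition (`INT UNSIGNED NOT NULL`) is copied from the ALTER TABLE in the log rather than from any documented procedure. It prints SQL only and runs nothing against the database; taking a dump of `slurm_acct_db` before applying anything like it would be prudent.

```bash
#!/usr/bin/env bash
# Sketch only: job_table_scheme and print_revert_sql are hypothetical helpers.

# Read column names on stdin (first field of each line) and report which
# naming the job table uses: "pack", "het", "both", or "none".
job_table_scheme() {
    awk '
        $1 == "pack_job_id" { pack = 1 }
        $1 == "het_job_id"  { het = 1 }
        END {
            if (pack && het) print "both"
            else if (pack)   print "pack"
            else if (het)    print "het"
            else             print "none"
        }'
}

# Print (do not run) a statement that renames the het_* columns back to
# pack_*, i.e. the reverse of the ALTER TABLE shown in the slurmdbd log.
# $1 is the cluster name, for example mpip-cluster.
print_revert_sql() {
    printf 'ALTER TABLE `%s_job_table`\n' "$1"
    printf '  CHANGE het_job_id pack_job_id INT UNSIGNED NOT NULL,\n'
    printf '  CHANGE het_job_offset pack_job_offset INT UNSIGNED NOT NULL;\n'
}

# Possible use, assuming the mysql client can reach slurm_acct_db:
#   mysql -N -B -e 'SHOW COLUMNS FROM `mpip-cluster_job_table`' slurm_acct_db \
#       | job_table_scheme
#   print_revert_sql mpip-cluster
```

If the first command reports "het" while `convert_version_table` still says 7, the table is in the state described in this thread.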