Re: [slurm-users] [External] [slurm 20.02.3] don't suspend nodes in down state

2020-09-01 Thread Steininger, Herbert
Hi Guys,

Thanks for your answers.

I would prefer not to patch the Slurm source code, as Jacek does, to keep 
things simple.
But I think it is the way to go.

When I try the solutions that Florian and Angelos suggested, Slurm still thinks 
that the nodes are "powered down", even if they are not.
Well, it is better that Slurm only thinks they are down than that they 
actually power down while I am upgrading something.


What we really need is a state like "MAINT", for maintenance, which tells 
Slurm not to use the node but also not to power it down.

Thanks,
Herbert



From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Florian Zillner
Sent: Wednesday, 26 August 2020 10:36
To: Slurm User Community List 
Subject: Re: [slurm-users] [External] [slurm 20.02.3] don't suspend nodes in 
down state

Hi Herbert,

just like Angelos described, we also have logic in our poweroff script that 
checks if the node is really IDLE and only sends the poweroff command if that's 
the case.

Excerpt:
hosts=$(scontrol show hostnames "$1")
for host in $hosts; do
    # Only power off nodes whose State field is exactly IDLE+POWER
    if ! scontrol show node "$host" | tr ' ' '\n' | grep -q 'State=IDLE+POWER$'; then
        echo "node $host NOT IDLE" >>"$OUTFILE"
        continue
    fi
    echo "node $host IDLE" >>"$OUTFILE"
    ssh "$host" poweroff
    ...
    sleep 1
    ...
done

Best,
Florian
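
For reference, the state check in Florian's excerpt could be wrapped in a small 
helper so that it can be tried without touching a real node. This is only a 
sketch: the function name node_is_idle_power is hypothetical and not part of 
Florian's script, and it assumes the input looks like the output of 
"scontrol show node <host>" as the excerpt does.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): reads text in the
# format of "scontrol show node <host>" on stdin and succeeds only if it
# contains a State field that is exactly IDLE+POWER.
node_is_idle_power() {
    # Put every space-separated field on its own line, then require a
    # whole-line match so that e.g. State=IDLE+POWER+DRAIN is rejected.
    tr ' ' '\n' | grep -qx 'State=IDLE+POWER'
}

# Usage sketch inside a poweroff script (needs a working Slurm setup):
#   if scontrol show node "$host" | node_is_idle_power; then
#       ssh "$host" poweroff
#   fi
```

Because the match is anchored to the whole field, a node that was set to DOWN 
or DRAIN for maintenance fails the check and is skipped by the script, even 
though Slurm itself may still consider it powered down.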






[slurm-users] [slurm 20.02.3] don't suspend nodes in down state

2020-08-24 Thread Steininger, Herbert
Hi,

how can I prevent Slurm from suspending nodes that I have set to the down 
state for maintenance?
I know about "SuspendExcNodes", but rolling out a new slurm.conf every time 
this changes doesn't seem like the right way.
Is there a state that I can set so that the nodes don't get suspended?

It has happened a few times that I was working on a server and, after our 
idle time (1h), Slurm decided to suspend the node.

TIA,
Herbert

-- 
Herbert Steininger
Leiter EDV & HPC
Administrator
Max-Planck-Institut für Psychiatrie
Kraepelinstr.  2-10
80804 München  
Tel  +49 (0)89 / 30622-368
Mail   herbert_steinin...@psych.mpg.de
Web  https://www.psych.mpg.de





[slurm-users] Solved: Error upgrading slurmdbd from 19.05 to 20.02

2020-03-16 Thread Steininger, Herbert
Hi,

I just want to let you know that I solved the problem simply by renaming the 
columns back to 'pack_...' and starting slurmdbd again, which then renamed 
them to 'het_...'.
slurmdbd is running again.

Best,
Herbert
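
The message above does not show the exact statement that was used for the 
rename. As a hedged sketch only, a statement of that kind could be generated 
like this. The function name revert_het_columns_sql is hypothetical, the 
column types are copied from the conversion query in the error log, and the 
function only prints the SQL; it would be wise to back up the accounting 
database before running anything like it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) an ALTER TABLE statement that
# renames het_job_id/het_job_offset back to pack_job_id/pack_job_offset.
# $1 is the cluster name used as the table prefix.
revert_het_columns_sql() {
    # Backticks quote the table name, which is needed because the cluster
    # name may contain a hyphen. Types mirror slurmdbd's own rename query.
    printf 'ALTER TABLE `%s_job_table` CHANGE het_job_id pack_job_id INT UNSIGNED NOT NULL, CHANGE het_job_offset pack_job_offset INT UNSIGNED NOT NULL;\n' "$1"
}

# Print the statement for the cluster named in this thread.
revert_het_columns_sql "mpip-cluster"
```

The printed statement could then be reviewed and pasted into a MariaDB session 
on slurm_acct_db while slurmdbd is stopped.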


-----Original Message-----
From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Steininger, Herbert
Sent: Friday, 13 March 2020 11:49
To: Slurm User Community List 
Subject: Re: [slurm-users] Error upgrading slurmdbd from 19.05 to 20.02

Hi,

I guess I found the problem.

It seems to come from this file:
src/plugins/accounting_storage/mysql/as_mysql_convert.c
in particular from here:

--- code ---
static int _convert_job_table_pre(mysql_conn_t *mysql_conn, char *cluster_name)
{
	int rc = SLURM_SUCCESS;
	char *query = NULL;

	if (db_curr_ver < 8) {
		/*
		 * Change the names pack_job_id and pack_job_offset to be het_*
		 */
		query = xstrdup_printf(
			"alter table \"%s_%s\" "
			"change pack_job_id het_job_id int unsigned not null, "
			"change pack_job_offset het_job_offset "
			"int unsigned not null;",
			cluster_name, job_table);
	}

	if (query) {
		if (debug_flags & DEBUG_FLAG_DB_QUERY)
			DB_DEBUG(mysql_conn->conn, "query\n%s", query);

		rc = mysql_db_query(mysql_conn, query);
		xfree(query);
		if (rc != SLURM_SUCCESS)
			error("%s: Can't convert %s_%s info: %m",
			      __func__, cluster_name, job_table);
	}

	return rc;
}
--- code ---

It checks whether the stored version is below 8 and, if so, renames the 
pack_* columns to het_*.

In the convert_version_table, the version is 7:

--- mysql ---
MariaDB [slurm_acct_db]> select * from convert_version_table;
+------------+---------+
| mod_time   | version |
+------------+---------+
| 1579853103 |       7 |
+------------+---------+
1 row in set (0.00 sec)
--- mysql ---


But in my job table, I already have the renamed het_* columns:

--- table ---
MariaDB [slurm_acct_db]> show columns from `mpip-cluster_job_table`;
+--------------------+---------------------+------+-----+------------+----------------+
| Field              | Type                | Null | Key | Default    | Extra          |
+--------------------+---------------------+------+-----+------------+----------------+
| job_db_inx         | bigint(20) unsigned | NO   | PRI | NULL       | auto_increment |
| mod_time           | bigint(20) unsigned | NO   |     | 0          |                |
| deleted            | tinyint(4)          | NO   |     | 0          |                |
| account            | tinytext            | YES  |     | NULL       |                |
| admin_comment      | text                | YES  |     | NULL       |                |
| array_task_str     | text                | YES  |     | NULL       |                |
| array_max_tasks    | int(10) unsigned    | NO   |     | 0          |                |
| array_task_pending | int(10) unsigned    | NO   |     | 0          |                |
| constraints        | text                | YES  |     | NULL       |                |
| cpus_req           | int(10) unsigned    | NO   |     | NULL       |                |
| derived_ec         | int(10) unsigned    | NO   |     | 0          |                |
| derived_es         | text                | YES  |     | NULL       |                |
| exit_code          | int(10) unsigned    | NO   |     | 0          |                |
| flags              | int(10) unsigned    | NO   |     | 0          |                |
| job_name           | tinytext            | NO   |     | NULL       |                |
| id_assoc           | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_array_job       | int(10) unsigned    | NO   | MUL | 0          |                |
| id_array_task      | int(10) unsigned    | NO   |     | 4294967294 |                |
| id_block           | tinytext            | YES  |     | NULL       |                |
| id_job             | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_qos             | int(10) unsigned    | NO   | MUL | 0          |                |
| id_resv            | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_wckey           | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_user            | int(10) unsigned    | NO   | MUL | NULL       |                |
| id_group           | int(10) unsigned    | NO   |     | NULL       |                |
| het_job_id         | int(10) unsigned    | NO   | MUL | NULL       |                |
| het_job_offset     | int(10) unsigned

Re: [slurm-users] Error upgrading slurmdbd from 19.05 to 20.02

2020-03-13 Thread Steininger, Herbert
unsigned | NO   | MUL | NULL       |                |
| node_inx           | text                | YES  |     | NULL       |                |
| partition          | tinytext            | NO   |     | NULL       |                |
| priority           | int(10) unsigned    | NO   |     | NULL       |                |
| state              | int(10) unsigned    | NO   |     | NULL       |                |
| timelimit          | int(10) unsigned    | NO   |     | 0          |                |
| time_submit        | bigint(20) unsigned | NO   |     | 0          |                |
| time_eligible      | bigint(20) unsigned | NO   | MUL | 0          |                |
| time_start         | bigint(20) unsigned | NO   |     | 0          |                |
| time_end           | bigint(20) unsigned | NO   | MUL | 0          |                |
| time_suspended     | bigint(20) unsigned | NO   |     | 0          |                |
| gres_req           | text                | NO   |     | NULL       |                |
| gres_alloc         | text                | NO   |     | NULL       |                |
| gres_used          | text                | NO   |     | NULL       |                |
| wckey              | tinytext            | NO   |     | NULL       |                |
| work_dir           | text                | NO   |     | NULL       |                |
| system_comment     | text                | YES  |     | NULL       |                |
| track_steps        | tinyint(4)          | NO   |     | NULL       |                |
| tres_alloc         | text                | NO   |     | NULL       |                |
| tres_req           | text                | NO   |     | NULL       |                |
+--------------------+---------------------+------+-----+------------+----------------+
52 rows in set (0.00 sec)

MariaDB [slurm_acct_db]>

--- table ---
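
The mismatch described above (stored version 7, but columns already named 
het_*) can be summarized as a small decision rule. The following is only an 
illustrative sketch that mirrors the logic of _convert_job_table_pre() as 
quoted in this message; the function name check_pre_conversion and its output 
strings are hypothetical, and it does not talk to the database itself.

```shell
#!/bin/sh
# Hypothetical diagnostic mirroring the quoted C code: a stored version
# below 8 triggers the pack_* -> het_* rename, which can only succeed if
# the pack_job_id column still exists.
#   $1    : value of convert_version_table.version
#   stdin : column names of the job table, one per line
check_pre_conversion() {
    if [ "$1" -ge 8 ]; then
        echo "rename skipped"
    elif grep -qx 'pack_job_id'; then
        echo "rename possible"
    else
        echo "rename will fail"
    fi
}

# The situation from this message: version 7, columns already renamed.
printf 'id_group\nhet_job_id\nhet_job_offset\n' | check_pre_conversion 7
```

With version 7 and no pack_job_id column, the sketch reports that the rename 
will fail, which matches the "Unknown column 'pack_job_id'" error from 
slurmdbd.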

Does anybody know what could be done to get slurmdbd up?
Is there an option to prevent this MySQL schema upgrade?

Would I have to rebuild Slurm?


Thanks in Advance,
Herbert







[slurm-users] Error upgrading slurmdbd from 19.05 to 20.02

2020-03-12 Thread Steininger, Herbert
Hello,

While upgrading Slurm from 19.05 to 20.02, an error occurred when I tried to 
upgrade slurmdbd first.

The error is:
slurmdbd: debug:  Munge authentication plugin loaded
slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
slurmdbd: debug2: Attempting to connect to slurmmaster:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debug2: innodb_log_file_size: 5242880
slurmdbd: debug2: innodb_lock_wait_timeout: 50
slurmdbd: error: Database settings not recommended values: innodb_buffer_pool_size innodb_log_file_size innodb_lock_wait_timeout
slurmdbd: pre-converting job table for mpip-cluster
slurmdbd: error: mysql_query failed: 1054 Unknown column 'pack_job_id' in 'mpip-cluster_job_table'
alter table "mpip-cluster_job_table" change pack_job_id het_job_id int unsigned not null, change pack_job_offset het_job_offset int unsigned not null;
slurmdbd: error: _convert_job_table_pre: Can't convert mpip-cluster_job_table info: Unknown error 1054
slurmdbd: error: issue converting tables before create
slurmdbd: Accounting storage MYSQL plugin failed
slurmdbd: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

How can I get the missing columns back?

Thanks in Advance,
Herbert