Re: [slurm-users] User id inconsistency
Sorry for my confusion, I shouldn't try to write emails before coffee!

On Mon, Apr 19, 2021 at 7:43 AM Bruno Gomes Pessanha <bruno.pessa...@gmail.com> wrote:

> That is showing that I'm in different groups depending on how I run the
> command id.
>
> PS: I'm running the controller and workers in docker containers using
> privileged mode.
>
> Bruno
Re: [slurm-users] User id inconsistency
This is telling you you're root in the docker container, right?

On Mon, Apr 19, 2021 at 4:51 AM Bruno Gomes Pessanha <bruno.pessa...@gmail.com> wrote:

> Could somebody help me with this?
> Pretty strange behaviour. If I run "id" it shows different groups than if I
> run "id myuser":
>
> [root@ctrl-slurm /]# srun --pty -p local --uid myuser bash
>
> [myuser@node-slurm /]$ id
> uid=868295925(myuser) gid=0(root) groups=0(root),979(cgred)
>
> [myuser@node-slurm /]$ id myuser
> uid=868295925(myuser) gid=1001(myuser) groups=1001(myuser),978(docker)
>
> --
> Bruno
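For anyone reading this thread later: plain "id" reports the credentials attached to the running process, while "id myuser" looks the user up in the account database, so the two can legitimately disagree. Here is a minimal Python sketch of that distinction. The function name compare_credentials is made up for illustration, and the sketch only reports a mismatch; it does not explain why srun started the shell with gid 0.

```python
import os
import pwd


def compare_credentials():
    """Compare the groups this process holds with those the user database lists."""
    uid = os.getuid()
    # What plain `id` reports: the credentials attached to the running process.
    process_groups = set(os.getgroups()) | {os.getgid()}
    try:
        entry = pwd.getpwuid(uid)
    except KeyError:
        # The uid has no passwd entry, which can happen inside containers.
        return {"uid": uid, "process": sorted(process_groups),
                "database": None, "match": None}
    # What `id <user>` reports: primary gid plus supplementary groups
    # as recorded in the user database.
    database_groups = set(os.getgrouplist(entry.pw_name, entry.pw_gid))
    return {
        "uid": uid,
        "process": sorted(process_groups),
        "database": sorted(database_groups),
        "match": process_groups == database_groups,
    }
```

Running compare_credentials() inside the srun shell should return "match": False in a situation like the one above, since the process groups (0, 979) differ from the database groups (1001, 978).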
Re: [slurm-users] Do not upgrade mysql to 5.7.30!
According to a very quick web search, migrating from MySQL to MariaDB is (very) easy. Does anyone have any counter-experience with Slurm databases?

Thanks,
--dustin

On Thu, May 7, 2020 at 1:34 PM Christopher Samuel wrote:

> On 5/7/20 6:08 AM, Riebs, Andy wrote:
>
> > Alternatively, you could switch to MariaDB; I've been using that for years.
>
> Debian switched to only having MariaDB in 2017 with the release of
> Debian 9 (Stretch), as a derivative distro I'm surprised that Ubuntu
> still packages MySQL.
>
> I'd second Andy's suggestion.
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
[slurm-users] Do not upgrade mysql to 5.7.30!
Hi,

Ubuntu has made mysql 5.7.30 the default version. At least with Ubuntu 16.04, this causes severe problems with slurmdbd (v 17.x, 18.x, and 19.x; not sure about 20). Reverting to mysql 5.7.29 seems to make everything work okay again.

cheers,
--dustin
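If you want to guard against picking up the affected release by accident, something like the following Python sketch could be dropped into a pre-upgrade check. It is only an illustration: the function names, the KNOWN_BAD set, and the sample version string are assumptions, and the only release flagged is the one reported above.

```python
import re

# Assumption: 5.7.30 is the only release flagged, based on the report above.
KNOWN_BAD = {(5, 7, 30)}


def parse_mysql_version(text):
    """Extract (major, minor, patch) from a version string.

    The input is assumed to look like the result of SELECT VERSION(),
    for example '5.7.30-0ubuntu0.16.04.1' (a hypothetical sample).
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_known_bad(text):
    """Return True if the version string names a release listed in KNOWN_BAD."""
    return parse_mysql_version(text) in KNOWN_BAD
```

With this, is_known_bad("5.7.30") is True and is_known_bad("5.7.29") is False. How you obtain the version string (package manager, client, or query) is left open.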
[slurm-users] "sacctmgr add cluster" crashing slurmdbd
Hi,

I've just upgraded to slurm 19.05.5.

With either my old database, OR creating an entirely new database, I am unable to create a new 'cluster' entry in the database -- slurmdbd is segfaulting!

# sacctmgr add cluster test3
 Adding Cluster(s)
  Name = test3
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to mn001:6819: Connection refused
sacctmgr: error: slurmdbd: Getting response to message type: DBD_ADD_CLUSTERS
 Problem adding clusters: Unspecified error
sacctmgr: error: slurmdbd: Sending PersistInit msg: Connection refused

Meanwhile, running "slurmdbd -D -v -v -v -v -v", I see

[2020-05-05T18:17:19.503] debug4: 10(as_mysql_cluster.c:405) query
insert into txn_table (timestamp, action, name, actor, info) values (1588717037, 1405, 'test3', 'root', 'mod_time=1588717037, shares=1, grp_jobs=NULL, grp_jobs_accrue=NULL, grp_submit_jobs=NULL, grp_wall=NULL, max_jobs=NULL, max_jobs_accrue=NULL, min_prio_thresh=NULL, max_submit_jobs=NULL, max_wall_pj=NULL, priority=NULL, def_qos_id=NULL, qos=\',1,\', federation=\'\', fed_id=0, fed_state=0, features=\'\'');
slurmdbd: debug4: 10(as_mysql_assoc.c:635) query
select id_assoc from "test3_assoc_table" where user='' and deleted = 0 and acct='root';
[2020-05-05T18:17:19.506] debug4: 10(as_mysql_assoc.c:635) query
select id_assoc from "test3_assoc_table" where user='' and deleted = 0 and acct='root';
slurmdbd: debug4: 10(as_mysql_assoc.c:714) query
call get_parent_limits('assoc_table', 'root', 'test3', 0); select @par_id, @mj, @mja, @mpt, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos, @prio;
[2020-05-05T18:17:19.506] debug4: 10(as_mysql_assoc.c:714) query
call get_parent_limits('assoc_table', 'root', 'test3', 0); select @par_id, @mj, @mja, @mpt, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos, @prio;
Segmentation fault (core dumped)

Since this happens on a fresh new database, I just don't understand how I can get back to a basic functional state. This is exceedingly frustrating.

Thanks for any hints.

--dustin
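A note on the sacctmgr errors above: "Connection refused" is what a client sees when nothing is listening on the port, which fits slurmdbd having just crashed. A small Python sketch for checking that from the client side follows. The function name port_is_open and the timeout value are illustrative assumptions; the host and port in the usage note are taken from the error message.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling port_is_open("mn001", 6819) before and after running sacctmgr would show whether the daemon goes away at the moment of the request. It says nothing about why the daemon died; for that the slurmdbd debug output is still needed.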
Re: [slurm-users] slurmdbd crashes with segmentation fault following DBD_GET_ASSOCS
I tried upgrading Slurm to 18.08.9 and I am still getting this Segmentation Fault!

On Tue, May 5, 2020 at 2:39 PM Dustin Lang wrote:

> Hi,
>
> Apparently my colleague upgraded the mysql client and server, but, as far
> as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql
> release notes I don't see anything that looks suspicious there...
>
> cheers,
> --dustin
Re: [slurm-users] slurmdbd crashes with segmentation fault following DBD_GET_ASSOCS
Hi,

Apparently my colleague upgraded the mysql client and server, but, as far as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql release notes I don't see anything that looks suspicious there...

cheers,
--dustin
[slurm-users] slurmdbd crashes with segmentation fault following DBD_GET_ASSOCS
Hi,

We're running Slurm 17.11.12. Everything has been working fine, and then suddenly slurmctld is crashing and slurmdbd is crashing.

We use fair-share as part of the queuing policy, and previously set up accounts with sacctmgr; that has been working fine for months.

If I run slurmdbd in debug mode,

slurmdbd -D -v -v -v -v -v

it eventually (after being contacted by slurmctld) segfaults with:

...
slurmdbd: debug2: DBD_NODE_STATE: NODE:cn049 STATE:UP REASON:(null) TIME:1588695584
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_NODE_STATE: NODE:cn050 STATE:UP REASON:(null) TIME:1588695584
slurmdbd: debug4: got 0 commits
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_TRES: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_QOS: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_USERS: called
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_GET_ASSOCS: called
slurmdbd: debug4: 10(as_mysql_assoc.c:2033) query
call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0); select @par_id, @mj, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos;
Segmentation fault (core dumped)

It looks (running slurmdbd in gdb) like that segfault is coming from

https://github.com/SchedMD/slurm/blob/slurm-17-11-12-1/src/plugins/accounting_storage/mysql/as_mysql_assoc.c#L2073

and if I connect to the mysql database directly and call that stored procedure, I get

mysql> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0);

Value    Column
-----    ------
1        @par_id := id_assoc
NULL     @mj := max_jobs
NULL     @msj := max_submit_jobs
NULL     @mwpj := max_wall_pj
NULL     @def_qos_id := def_qos_id
,1,      @qos := qos
NULL     @delta_qos := REPLACE(CONCAT(delta_qos, @delta_qos), ',,', ',')
NULL     @mtpj := CONCAT(@mtpj, if (@mtpj != '' && max_tres_pj != '', ',', ''), max_tres_pj)
NULL     @mtpn := CONCAT(@mtpn, if (@mtpn != '' && max_tres_pn != '', ',', ''), max_tres_pn)
NULL     @mtmpj := CONCAT(@mtmpj, if (@mtmpj != '' && max_tres_mins_pj != '', ',', ''), max_tres_mins_pj)
NULL     @mtrm := CONCAT(@mtrm, if (@mtrm != '' && max_tres_run_mins != '', ',', ''), max_tres_run_mins)
(empty)  @my_acct_new := parent_acct

and if I run

mysql> call get_parent_limits('assoc_table', 'root', 'slurm_cluster', 0); select @par_id, @mj, @msj, @mwpj, @mtpj, @mtpn, @mtmpj, @mtrm, @def_qos_id, @qos, @delta_qos;

I get

+---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
| @par_id | @mj  | @msj | @mwpj | @mtpj | @mtpn | @mtmpj | @mtrm | @def_qos_id | @qos | @delta_qos |
+---------+------+------+-------+-------+-------+--------+-------+-------------+------+------------+
| 1       | NULL | NULL | NULL  | NULL  | NULL  | NULL   | NULL  | NULL        | ,1,  | NULL       |