[slurm-dev] Re: fairshare incrementing

Alan V. Cowles Wed, 28 Aug 2013 09:38:52 -0700


Hey guys,

We had a eureka moment, and have discovered the cause of the problem and how to get it back working. Now we need to prevent it from occurring again in the future.

Looking through the MySQL database led us to dead ends, as did every sacctmgr command we ran for the first couple of days. Finally we dug around in the transactions log and saw that there was a restart of slurmctld immediately after the last correctly functioning account was created. We actually wondered if you had to bounce the daemon in order to make priority work for users.

Trying to find more info that we could cull from the command line, I also attempted to run sview on our master node and found it couldn't launch... it was then I discovered that if I ran scontrol ping, we got the following response:


Slurmctld(primary/backup) at slurm-master/slurm-backup are DOWN/UP
*****************************************
** RESTORE SLURMCTLD DAEMON TO SERVICE **
*****************************************

Curious, we decided to restart it on the master node. We immediately entered a panic, as it took a bit to re-import all of the jobs we currently have in our queue, but after a few minutes we are now in this status:


Slurmctld(primary/backup) at slurm-master/slurm-backup are UP/DOWN
*****************************************
** RESTORE SLURMCTLD DAEMON TO SERVICE **
*****************************************

So from what we can tell, on or about June 17 the slurmctld on our master host failed and the backup host took off running with it. Meanwhile slurmdbd continued to run unfettered on our master node.
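Since the failover went unnoticed for roughly two months, one option is to watch the scontrol ping status line from cron. The following is a minimal sketch, not a tested monitoring setup: primary_state is a hypothetical helper name, and it assumes the two-controller output format shown above (other Slurm versions or single-controller setups may print a different line).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `scontrol ping` output on stdin and prints the
# state of the primary controller (UP or DOWN). It only understands the
# two-controller status line quoted in this thread, for example:
#   Slurmctld(primary/backup) at slurm-master/slurm-backup are DOWN/UP
primary_state() {
    # In a basic regex the unescaped parentheses are literal characters;
    # the first \( \) group captures the state before the slash.
    sed -n 's|^Slurmctld(primary/backup) at .* are \([A-Z]*\)/\([A-Z]*\)$|\1|p'
}

# Possible cron usage (assumes scontrol and mail are on PATH):
#   [ "$(scontrol ping | primary_state)" = "UP" ] || \
#       echo "primary slurmctld is not UP" | mail -s "slurm alert" root
```

The helper prints nothing if the status line is missing or has another shape, so the cron test above treats an unparseable answer the same as a DOWN primary.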

From what we can figure, the slurmctld on the backup node could not communicate properly with the slurmdbd on the master, as it was a remote host. Once we restored slurmctld to running on the master node with slurmdbd on the same host, priority fairshare began to work for all of our problematic users.

Now we want to find a way to make sure this doesn't happen again: perhaps allow slurmdbd to also run on the backup node, or have the backup node be able to make a remote call to the dbd host?


In slurmdbd.conf on the master node we have the following:

# slurmDBD info
DbdAddr=localhost
DbdHost=localhost

Would it help to put the IP address of the host itself and start slurmdbd on the backup node as well?

On this page for Ubuntu: http://manpages.ubuntu.com/manpages/jaunty/man5/slurmdbd.conf.5.html

We found a reference to a slurmdbdaddr value that should be placed in slurm.conf and possibly tells the slurmctld where the slurmdbd is running. Most references to it seem to be ancient (slurmdbd 1.3?), though, and perhaps this is no longer needed in Slurm 2.5.4+.
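In Slurm 2.x the slurm.conf parameter that names the slurmdbd host is AccountingStorageHost (used with AccountingStorageType=accounting_storage/slurmdbd). If that value is localhost, a slurmctld running on the backup node would look for slurmdbd on the backup node itself. Whether that is what happened here is an assumption, since our slurm.conf is not shown above. A minimal sketch for checking a slurm.conf follows; accounting_host and host_is_loopback are hypothetical helper names, and the match is case-sensitive even though Slurm itself accepts keys in any case.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a slurm.conf file.

# Print the value of AccountingStorageHost from the given file
# (the last occurrence wins if the key appears more than once).
accounting_host() {
    local conf=$1
    sed -n 's/^[[:space:]]*AccountingStorageHost[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf" | tail -n 1
}

# Succeed if the given host name only makes sense on the local machine.
host_is_loopback() {
    case "$1" in
        localhost|127.0.0.1|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage on each controller (the path is an assumption):
#   host=$(accounting_host /etc/slurm/slurm.conf)
#   host_is_loopback "$host" && echo "AccountingStorageHost=$host will not work from the backup node"
```

If the check fires, the likely fix is to name the master host explicitly, for example AccountingStorageHost=slurm-master in slurm.conf on both controllers and DbdHost=slurm-master in slurmdbd.conf.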


Any thoughts on config modifications we could make?

Thanks in advance.

AC



On 08/27/2013 11:24 AM, Alan V. Cowles wrote:

Hey guys,
We're still hung on our priority scheduling here, but I had a thought reading some other mailings to the list this morning. The syntax that others are using with user creation is not as basic as ours, and has other variables in place such as "fairshare=parent". Are these things that we need to specify when creating an account, or are they defaults? We are wondering if this is why our newer users aren't showing up with the correct accounts in squeue.
It still bugs us that accounts show up correctly in sacctmgr, just not in squeue for the purpose of enforcing priority. Could this be a bug corrected in later releases?
AC

On 08/23/2013 03:13 PM, Alan V. Cowles wrote:
Final update for the day: we have found what is causing priority to be overlooked, we just don't know what is causing that in turn...
[root@cluster-login ~]# squeue --format="%a %.7i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %R" |grep user1
(null)  181378  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181379  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181380  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181381  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181382  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181383  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181384  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181385  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181386  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
(null)  181387  lowmem  testbatc  user1  PENDING  0:00  UNLIMITED  1  (Priority)
Compared to:
[root@cluster-login ~]# squeue --format="%a %.7i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %R" |grep user2
account  181378  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181379  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181380  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181381  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181382  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181383  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181384  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181385  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181386  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
account  181387  lowmem  testbatc  user2  PENDING  0:00  UNLIMITED  1  (Priority)
We have tried to create new users and new accounts this afternoon, and all of them show (null) as their account when we break out the formatting rules on sacct.
sacctmgr add account accountname
sacctmgr add user username defaultaccount accountname
We even have one case where all users under an account are working fine except a user we added yesterday... so at some point in the past (logs aren't helping us thus far) the ability to actually sync up a user and an account for accounting purposes has left us. Also, I have failed to mention to this point that we are still running Slurm 2.5.4, my apologies for that.
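To see at a glance which users are affected, the squeue output can be filtered for the (null) account. This is a minimal sketch: null_account_users is a hypothetical helper name, and it assumes the same --format string as in the squeue commands above, where the account (%a) is the first field and the user (%.8u) is the fifth.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads squeue output on stdin (in the format used
# above) and prints each user that has at least one job with a (null)
# account, one user per line.
null_account_users() {
    awk '$1 == "(null)" { print $5 }' | sort -u
}

# Possible usage (assumes squeue is on PATH):
#   squeue --format="%a %.7i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %R" | null_account_users
```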
AC


On 08/23/2013 11:22 AM, Alan V. Cowles wrote:
Sorry to spam the list, but we wanted to keep updates in flux.
We managed to find the issue in the MySQL database we are using for job accounting: the column for that value was set to smallint(5), so it was rounding things off. Some SQL magic and we now have appropriate uids showing up. A new monkey wrench: some test jobs submitted by user3 below get their fairshare value of 5000 as expected, just not user2... we just cleared his jobs from the queue and submitted another 100 jobs for testing, and none of them got a fairshare value...
In his entire history of using our cluster he hasn't submitted over 5000 jobs, in fact:
[root@slurm-master ~]# sacct -c --format=user,jobid,jobname,start,elapsed,state,exitcode -u user2 |grep user2 | wc -l
2573

So we can't figure out why he's being overlooked.

AC


On 08/23/2013 10:31 AM, Alan V. Cowles wrote:
We think we may be onto something. In sacct we were looking at the jobs submitted by the users, and found that many users share the same uid number in the Slurm database. It seems to correlate with the size of the user's uid number in our LDAP directory... users whose uid numbers are greater than 65535 get truncated to that number... users with uid numbers below that keep their correct uid numbers (user2 in the sample output below).
[root@slurm-master ~]# sacct -c --format=User,uid,JobID,JobName,NodeList,Start,Elapsed,ExitCode,DerivedExitCode,state|grep user2|head
user2  27545  30548  bwa   node01-1  2013-07-08T13:04:25  00:00:48  0:0  COMPLETED
user2  27545  30571  bwa   node01-1  2013-07-08T15:18:00  00:00:48  0:0  COMPLETED
user2  27545  30573  bwa   node01-1  2013-07-09T09:40:59  00:00:48  0:0  COMPLETED
user2  27545  30618  grep  node01-1  2013-07-09T11:57:12  00:00:48  0:0  COMPLETED
user2  27545  30619  bc    node01-1  2013-07-09T11:58:08  00:00:48  0:0  CANCELLED
user2  27545  30620  du    node01-1  2013-07-09T11:58:19  00:00:48  0:0  COMPLETED
user2  27545  30621  wc    node01-1  2013-07-09T11:58:43  00:00:48  0:0  COMPLETED
user2  27545  30622  zcat  node01-1  2013-07-09T11:58:54  00:00:48  0:0  COMPLETED
user2  27545  30623  zcat  node01-1  2013-07-09T12:12:56  00:00:48  0:0  COMPLETED
user2  27545  30624  zcat  node01-1  2013-07-09T12:26:37  00:00:48  0:0  CANCELLED
[root@slurm-master ~]# sacct -c --format=User,uid,JobID,JobName,NodeList,Start,Elapsed,ExitCode,DerivedExitCode,state|grep user1|head
user1  65535  83  impute2_w+  node01-1  2013-04-17T09:29:47  00:00:48  0:0  FAILED
user1  65535  84  impute2_w+  node01-1  2013-04-17T09:30:17  00:00:48  0:0  FAILED
user1  65535  85  impute2_w+  node01-1  2013-04-17T09:30:40  00:00:48  0:0  FAILED
user1  65535  86  impute2_w+  node01-1  2013-04-17T09:40:45  00:00:48  0:0  FAILED
user1  65535  87  date        node01-1  2013-04-17T09:42:36  00:00:48  0:0  COMPLETED
user1  65535  88  hostname    node01-1  2013-04-17T09:42:37  00:00:48  0:0  COMPLETED
user1  65535  89  impute2_w+  node01-1  2013-04-17T09:48:50  00:00:48  0:0  FAILED
user1  65535  90  impute2_w+  node01-1  2013-04-17T09:48:56  00:00:48  0:0  FAILED
user1  65535  91  impute2_w+  node01-1  2013-04-17T09:49:56  00:00:48  0:0  FAILED
user1  65535  92  impute2_w+  node01-1  2013-04-17T09:50:06  00:00:48  0:0  FAILED
[root@slurm-master ~]# sacct -c --format=User,uid,JobID,JobName,NodeList,Start,Elapsed,ExitCode,DerivedExitCode,state|grep user3|head
user3  65535   5  script.sh  node09-1  2013-04-09T15:55:07  00:00:48  0:0  FAILED
user3  65535   6  script.sh  node09-1  2013-04-09T15:55:13  INVALID   0:0  COMPLETED
user3  65535   8  bash       node09-1  2013-04-09T15:57:34  00:00:48  0:0  COMPLETED
user3  65535   7  bash       node09-1  2013-04-09T15:57:21  00:00:48  0:0  COMPLETED
user3  65535  23  script.sh  node09-1  2013-04-09T16:10:02  00:00:48  0:0  COMPLETED
user3  65535  27  script.sh  node09-+  2013-04-09T16:18:33  00:00:48  0:0  CANCELLED
user3  65535  28  script.sh  node01-+  2013-04-09T16:18:55  00:00:48  0:0  CANCELLED
user3  65535  30  script.sh  node01-+  2013-04-09T16:34:12  00:00:48  0:0  CANCELLED
user3  65535  31  script.sh  node01-+  2013-04-09T16:34:17  00:00:48  0:0  CANCELLED
user3  65535  32  script.sh  node01-+  2013-04-09T16:34:21  00:00:48  0:0  CANCELLED
We are thinking perhaps this could lead to our major issues with the system and priority factoring.
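The ceiling of 65535 is the largest value an unsigned 16-bit integer can hold, which is also the range of an unsigned SMALLINT column in MySQL; in non-strict SQL mode MySQL stores out-of-range values as the nearest end of the range. A minimal sketch of that clamping follows, assuming the uid column behaves this way; clamp_to_smallint_unsigned is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: mimic how an out-of-range value is stored in an
# unsigned 16-bit column when the database clips instead of rejecting it.
clamp_to_smallint_unsigned() {
    local value=$1
    local max=65535          # 2^16 - 1
    if (( value > max )); then
        echo "$max"
    elif (( value < 0 )); then
        echo 0
    else
        echo "$value"
    fi
}

# Example: a uid below the ceiling is kept, a larger one collapses to 65535.
#   clamp_to_smallint_unsigned 27545
#   clamp_to_smallint_unsigned 100001
```

Every uid above the ceiling ends up as the same stored number, which would explain why user1 and user3 share uid 65535 while user2 keeps 27545.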
AC

On 08/23/2013 07:56 AM, Alan V. Cowles wrote:
Hey guys,
So in the past we had 3 prioritization factors in effect: partition, age and fairshare, and they were working wonderfully. Currently partition has no effect for us, as it's all one large shared partition, so everyone gets the same value there. So everything is balanced on age and fairshare. In the past age and fairshare worked splendidly, and we have it set, as I understand it, to refresh counters every 2 weeks... so basically everyone had a blank slate this past weekend. Our current issue is as follows...
A problematic user has submitted 70k jobs to a partition with 512 slots and she is currently consuming all slots... basically locking up the queue for anybody else that wants to try and work.
Normally fairshare kicks in and jumps other users to the top of the queue, but when a new user submitted 25 jobs (vs the 70k) he didn't get any fairshare weighting at all...
 JOBID  USER  PRIORITY  AGE  FAIRSHARE  JOBSIZE  PARTITION  QOS  NICE
162986  uid1      8371  371          0        0       8000    0     0
162987  uid1      8371  371          0        0       8000    0     0
162988  uid1      8371  371          0        0       8000    0     0
180698  uid2      8320  321          0        0       8000    0     0
180699  uid2      8320  321          0        0       8000    0     0
180700  uid2      8320  321          0        0       8000    0     0
180701  uid2      8320  321          0        0       8000    0     0
I'm used to seeing a user like that get 5000 fairshare to start out with... Thoughts?
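With the multifactor priority plugin, the PRIORITY column is the sum of the weighted factor columns, minus the nice value. A minimal sketch of that arithmetic follows; job_priority is a hypothetical helper name, and it assumes the partition column in the listing above reads 8000 with QOS and NICE both 0.

```shell
#!/usr/bin/env bash
# Illustration only: add up the weighted factors as sprio lists them.
# Arguments: age fairshare jobsize partition qos nice
job_priority() {
    local age=$1 fairshare=$2 jobsize=$3 partition=$4 qos=$5 nice=$6
    echo $(( age + fairshare + jobsize + partition + qos - nice ))
}

# First row of the listing: age 371, partition 8000, everything else 0.
#   job_priority 371 0 0 8000 0 0
# Same job if the expected fairshare of 5000 had been applied:
#   job_priority 371 5000 0 8000 0 0
```

For the uid1 rows this gives 8371, matching the listing. The uid2 rows come out one higher than the 8320 shown, presumably because the displayed factors are rounded. Under these assumptions, a fairshare of 5000 would lift a job well above every job that has none.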
AC
