Re: [slurm-users] Job runtime
I ran again with the time command in front of g09. The console output is

    Wed Mar 14 09:15:58 EDT 2018

    real    32m14.136s
    user    53m56.946s
    sys     2m17.855s
    Wed Mar 14 09:48:12 EDT 2018

So the wall clock time is roughly 32 minutes. g09 says

    Job cpu time: 0 days 0 hours 47 minutes 56.0 seconds.

If g09 reports user time, then that is different from the time command (about 6 min difference, 47m56s versus 53m57s).

On the other hand, slurm says

    [mahmood@rocks7 ~]$ sacct -j 32 --format=elapsed,ncpus,cputime,UserCPU
       Elapsed      NCPUS    CPUTime    UserCPU
    ---------- ---------- ---------- ----------
      00:32:14          2   01:04:28  53:56.955
      00:32:14          2   01:04:28  53:56.955

So Slurm's UserCPU agrees with the output of the time command, but the CPUTime value is not clear to me.

Regards,
Mahmood

On Wed, Mar 14, 2018 at 8:45 AM, Shenglong Wang wrote:
> Gaussian reports CPU time, sacct reports wall time here. Was Gaussian set up
> to run with 2 CPU cores?
>
> Best,
> Shenglong
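A note on the CPUTime column, as a sketch rather than a definitive reading of this job: sacct documents CPUTime as Elapsed multiplied by the CPU count, so it is the core time reserved for the job, not CPU time the program actually consumed (that is what UserCPU and SystemCPU report). The helper names `to_seconds` and `to_hms` below are made up for illustration; the input values are the ones from the sacct output above.

```shell
# Convert an HH:MM:SS string to seconds (awk treats "08" as decimal, not octal).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Convert a number of seconds back to HH:MM:SS.
to_hms() {
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

elapsed="00:32:14"   # Elapsed column from the sacct output above
ncpus=2              # NCPUS column from the sacct output above

# CPUTime = Elapsed * NCPUS
cputime=$(to_hms $(( $(to_seconds "$elapsed") * ncpus )))
echo "CPUTime = $cputime"
```

With two allocated cores, 32m14s of wall time gives 01:04:28, which matches the CPUTime column. The CPU time reported by g09 and by time can exceed the wall time because it is summed over both cores, and it stays below this upper bound.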
[slurm-users] scontrol return values
Hi fellow slurm users,

Today I noticed that scontrol returns 0 when it denies a drain request because no reason was supplied. This seems like wrong behavior to me: it should return 1 or some other error code so that scripts know that the node was not actually drained.

Thanks,
Eli

Slurm version: 2017.11.2
Platform: Debian
Command run:

    scontrol update node=${node} state=drain reason=""

or

    scontrol update node=${node} state=drain
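Until the exit status can be trusted, a script could confirm the node state itself after asking for the drain. The following is only a sketch under assumptions: `drain_node` and `is_drain_state` are hypothetical helper names, and it assumes that `sinfo -h -n <node> -o '%T'` prints a state containing "drain" (such as "drained" or "draining") once the request has been accepted.

```shell
# Return success if a node state string denotes a drain state,
# for example "drained", "draining" or "IDLE+DRAIN".
is_drain_state() {
    case "$1" in
        *drain*|*DRAIN*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: request a drain, then check that the state really
# changed instead of relying on scontrol's exit status alone.
drain_node() {
    node=$1
    reason=$2
    if [ -z "$reason" ]; then
        echo "drain_node: refusing to drain $node without a reason" >&2
        return 1
    fi
    scontrol update node="$node" state=drain reason="$reason" || return 1
    state=$(sinfo -h -n "$node" -o '%T') || return 1
    is_drain_state "$state"
}
```

A caller would then write something like `drain_node "$node" "disk replacement" || echo "drain failed"`, where the reason text is a placeholder.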
Re: [slurm-users] Job runtime
Gaussian reports CPU time, sacct reports wall time here. Was Gaussian set up to run with 2 CPU cores?

Best,
Shenglong

> On Mar 14, 2018, at 8:04 AM, Mahmood Naderan wrote:
>
> Hi,
> I see that slurm reports a 35 min duration for a completed job (g09) like this
>
> [mahmood@rocks7 ~]$ sacct -j 30 --format=start,end,elapsed,time
>               Start                 End    Elapsed  Timelimit
> ------------------- ------------------- ---------- ----------
> 2018-03-14T06:07:17 2018-03-14T06:42:30   00:35:13   01:00:00
> 2018-03-14T06:07:17 2018-03-14T06:42:30   00:35:13
>
> However, the program itself, which logs the run, says
>
> Job cpu time: 0 days 0 hours 48 minutes 5.9 seconds.
>
> The job script contains
>
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=2
>
> Any idea what explains that time difference?
>
> Regards,
> Mahmood
[slurm-users] Job runtime
Hi,

I see that slurm reports a 35 min duration for a completed job (g09) like this

    [mahmood@rocks7 ~]$ sacct -j 30 --format=start,end,elapsed,time
                  Start                 End    Elapsed  Timelimit
    ------------------- ------------------- ---------- ----------
    2018-03-14T06:07:17 2018-03-14T06:42:30   00:35:13   01:00:00
    2018-03-14T06:07:17 2018-03-14T06:42:30   00:35:13

However, the program itself, which logs the run, says

    Job cpu time: 0 days 0 hours 48 minutes 5.9 seconds.

The job script contains

    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=2

Any idea what explains that time difference?

Regards,
Mahmood
Re: [slurm-users] Memory allocation error
On Wednesday, 14 March 2018 9:14:45 PM AEDT Mahmood Naderan wrote:

> Thank you very much.

My pleasure, so glad it helped!

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: [slurm-users] Overriding 67 Million Job UID limit
On Wednesday, 7 March 2018 12:53:30 AM AEDT Ron Golberg wrote:

> I have many Jobs divided into job arrays - which makes me cross the Slurm’s
> 67 Million JOBuid limit.

This is a consequence of the addition of federation support; you can see why here:

https://slurm.schedmd.com/SLUG17/FederatedScheduling.pdf

[...]

> also, I have an external application that manages jobs sent to the
> scheduler, Is it ok for it to rely on the db_index for managing jobs on
> Slurm (is the accounting db always up-to date)?

You can't rely on the structure of the slurmdbd tables in MySQL; they are considered volatile and can change without notice, I'm afraid!

Unfortunately I don't really have enough of a clue for the other questions!

All the best,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: [slurm-users] Job reordering for users?
Jessica Nettelblad writes:

> Maybe look into if scontrol top is of any help. It was introduced in
> 16.05 and changed in 17.11 to take a job list. Note that NEWS and the
> man page have slightly different information on how to enable it, maybe
> the man page is only partially updated.
>
> We don't use this at our center, but thought I'd look into it if we
> change our policy.

Thanks, Jessica, this was what I was thinking of.

Cheers,
Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de
Re: [slurm-users] Memory allocation error
>If you put this in your script rather than the g09 command what does it say?
>
>ulimit -a

That was a very good hint. I first ran ssh to compute-0-1 and saw an "unlimited" value for "max memory size" and "virtual memory". Then I submitted the job with --mem=2000M and put the command in the slurm script. I then noticed that "max memory size" is about "200" while "virtual memory" is about 250.

I guessed that the problem is with the virtual memory which the program wants to allocate. After setting --mem=4000M, the error disappeared and now the job is running.

Thank you very much.

Regards,
Mahmood
Re: [slurm-users] Memory allocation error
On Wednesday, 14 March 2018 7:37:19 PM AEDT Mahmood Naderan wrote:

> I tried with --mem=2000M in the slurm script and put strace command in front
> of g09. Please see some last lines

Gaussian is trying to allocate more than 2GB of RAM in that case.

Unfortunately your strace doesn't show anything useful, as the failure is in a process that's forked off from the shell script that is g09, and your strace isn't following child processes.

If you put this in your script rather than the g09 command, what does it say?

    ulimit -a

Also, this code can be useful to find out how much memory Gaussian is actually using, though you'll need root access (or a helpful sysadmin there) to be able to use it.

https://github.com/gsauthof/cgmemtime

It will use a special cgroup (which it can configure for you) to track the memory usage of the whole Gaussian run for you.

All the best,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
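As a sketch of the two suggestions above, the lines below could go into the job script in place of the g09 command. The function name `report_limits` is made up for illustration; it prints only the two limits that matter here, and `ulimit -a` shows the full list. The strace line is left as a comment because its file names (`strace.out`, `input.com`) are placeholders, not taken from the original job.

```shell
# Print the memory-related limits as seen by the shell that runs the job.
# These can differ from what an interactive ssh session on the node reports.
report_limits() {
    echo "max memory size (kbytes): $(ulimit -m)"
    echo "virtual memory (kbytes):  $(ulimit -v)"
}

report_limits

# strace only follows child processes when given -f, for example:
#   strace -f -o strace.out g09 input.com
```

Running the same lines once over ssh and once inside a job submitted with --mem makes it easy to compare the two environments.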
Re: [slurm-users] Job reordering for users?
You can use the "nice" feature to "rearrange" jobs.

/M

On 2018-03-14 10:07, Loris Bennett wrote:
> Hi,
>
> I seem to remember reading something about users being able to change
> the priorities within a group of their own jobs. So if a user suddenly
> decided that a job submitted later should take precedence over a similar
> job submitted earlier, he/she would be able to switch the priorities.
>
> Is it just wishful thinking on my part or does something along those
> lines exist?
>
> Cheers,
> Loris

--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet
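One way the nice approach might look in practice, as a sketch only: a user raises the nice value of the jobs that should wait, so that their other pending jobs are considered first. The helper name `deprioritize`, the `DRY_RUN` variable and the job IDs in the usage line are hypothetical; the sketch assumes that `scontrol update jobid=<id> nice=<value>` is permitted for the job owner on the cluster in question. Ordinary users can normally only increase the nice value, not set a negative one.

```shell
# Hypothetical helper: apply a nice value to each listed job.
# Usage: deprioritize <nice_value> <jobid> [<jobid> ...]
# With DRY_RUN=1 the commands are printed instead of executed.
deprioritize() {
    nice_value=$1
    shift
    for jobid in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scontrol update jobid=$jobid nice=$nice_value"
        else
            scontrol update jobid="$jobid" nice="$nice_value" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1 deprioritize 100 1234 1235` prints the two scontrol commands without changing anything.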
[slurm-users] Job reordering for users?
Hi,

I seem to remember reading something about users being able to change the priorities within a group of their own jobs. So if a user suddenly decided that a job submitted later should take precedence over a similar job submitted earlier, he/she would be able to switch the priorities.

Is it just wishful thinking on my part or does something along those lines exist?

Cheers,
Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de