Re: [slurm-users] Job runtime

2018-03-14 Thread Mahmood Naderan
I ran again with the time command in front of g09.

The console output is

Wed Mar 14 09:15:58 EDT 2018
real    32m14.136s
user    53m56.946s
sys     2m17.855s
Wed Mar 14 09:48:12 EDT 2018


So the wall-clock time is roughly 32 minutes.
g09 says

 Job cpu time:   0 days  0 hours 47 minutes 56.0 seconds.

If g09 is reporting user CPU time, then its value differs from the one reported
by the time command (by roughly 6 minutes). On the other hand, Slurm says

[mahmood@rocks7 ~]$ sacct -j 32 --format=elapsed,ncpus,cputime,UserCPU
   Elapsed      NCPUS    CPUTime    UserCPU
---------- ---------- ---------- ----------
  00:32:14          2   01:04:28  53:56.955
  00:32:14          2   01:04:28  53:56.955


So Slurm apparently reports the same user time as the time command, but the
meaning of CPUTime is not clear to me.
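
One guess (an assumption on my part, not something I have verified in the docs):
CPUTime might simply be the number of allocated CPUs multiplied by the elapsed
wall time, i.e. allocated rather than consumed CPU time. A quick check:

echo $(( 2 * (32*60 + 14) ))   # 2 CPUs * 00:32:14 elapsed
# prints 3868 seconds = 01:04:28, which matches the CPUTime column;
# UserCPU then matches the user time from the time command.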

Regards,
Mahmood




On Wed, Mar 14, 2018 at 8:45 AM, Shenglong Wang  wrote:
> Gaussian reports CPU time, sacct reports wall time here. Was Gaussian set up
> to run with 2 CPU cores?
>
> Best,
> Shenglong
>



[slurm-users] scontrol return values

2018-03-14 Thread E.S. Rosenberg
Hi fellow slurm users,

Today I noticed that scontrol returns 0 when it denies a drain request
because no reason was supplied.
It seems to me that this is wrong behavior: it should return 1 or some
other non-zero exit code, so that scripts can tell that the node was not
actually drained.

Thanks,
Eli

Slurm version: 17.11.2
Platform: Debian
command run:
scontrol update node=${node} state=drain reason=""
or
scontrol update node=${node} state=drain
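
For context, a minimal sketch of the kind of check a calling script would do
(the node variable is a placeholder); with the current behavior the error
branch is never taken, even though the drain request was refused:

scontrol update node=${node} state=drain
if [ $? -ne 0 ]; then
    echo "failed to drain ${node}" >&2
    exit 1
fi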


Re: [slurm-users] Job runtime

2018-03-14 Thread Shenglong Wang
Gaussian reports CPU time, sacct reports wall time here. Was Gaussian set up to
run with 2 CPU cores?

Best,
Shenglong

> On Mar 14, 2018, at 8:04 AM, Mahmood Naderan  wrote:
> 
> Hi,
> I see that slurm reports a 35 min duration for a completed job (g09) like this
> 
> [mahmood@rocks7 ~]$ sacct -j 30 --format=start,end,elapsed,time
>               Start                 End    Elapsed  Timelimit
> ------------------- ------------------- ---------- ----------
> 2018-03-14T06:07:17 2018-03-14T06:42:30   00:35:13   01:00:00
> 2018-03-14T06:07:17 2018-03-14T06:42:30   00:35:13
> 
> 
> However, the program itself, which logs the run, says
> 
> Job cpu time:   0 days  0 hours 48 minutes  5.9 seconds.
> 
> the job script contains
> 
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=2
> 
> Does anyone have an idea about the time difference?
> 
> 
> Regards,
> Mahmood
> 




[slurm-users] Job runtime

2018-03-14 Thread Mahmood Naderan
Hi,
I see that slurm reports a 35 min duration for a completed job (g09) like this

[mahmood@rocks7 ~]$ sacct -j 30 --format=start,end,elapsed,time
              Start                 End    Elapsed  Timelimit
------------------- ------------------- ---------- ----------
2018-03-14T06:07:17 2018-03-14T06:42:30   00:35:13   01:00:00
2018-03-14T06:07:17 2018-03-14T06:42:30   00:35:13


However, the program itself, which logs the run, says

 Job cpu time:   0 days  0 hours 48 minutes  5.9 seconds.

the job script contains

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2

Does anyone have an idea about the time difference?
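
For reference, a fuller sketch of how such a two-core g09 job might be laid out
(the input/output file names and the %NProcShared directive are assumptions for
illustration, not taken from the actual job). With two shared-memory workers
busy, a CPU time close to twice the wall time is what one would expect:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=01:00:00

# %NProcShared=2 in the Gaussian input (assumed) lets g09 use both allocated cores
time g09 < input.gjf > output.log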


Regards,
Mahmood



Re: [slurm-users] Memory allocation error

2018-03-14 Thread Chris Samuel
On Wednesday, 14 March 2018 9:14:45 PM AEDT Mahmood Naderan wrote:

> Thank you very much.

My pleasure, so glad it helped!

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Overriding 67 Million Job UID limit

2018-03-14 Thread Chris Samuel
On Wednesday, 7 March 2018 12:53:30 AM AEDT Ron Golberg wrote:

> I have many jobs divided into job arrays - which makes me cross Slurm's
> 67 million job UID limit.

This is a consequence of the addition of federation support; you can see why 
here:

https://slurm.schedmd.com/SLUG17/FederatedScheduling.pdf

[...]
> also, I have an external application that manages jobs sent to the
> scheduler, Is it ok for it to rely on the db_index for managing jobs on
> Slurm (is the accounting db always up-to date)?

You can't rely on the structure of the slurmdbd tables in MySQL; they are 
considered volatile and can change without notice, I'm afraid!
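
If the external application only needs a stable job identifier, one alternative
(just a sketch, and the start time below is a placeholder) is to query the
accounting data through sacct instead of the raw tables, e.g.:

sacct --parsable2 --starttime=2018-03-01 --format=JobIDRaw,JobName,State,End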

Unfortunately I don't really have enough of a clue for the other questions!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Job reordering for users?

2018-03-14 Thread Loris Bennett
Jessica Nettelblad  writes:

> Maybe look into whether scontrol top is of any help. It was introduced in
> 16.05 and changed in 17.11 to take a job list. Note that NEWS and the
> man page have slightly different information on how to enable it; maybe
> the man page is only partially updated.
>
> We don't use this at our center, but I thought I'd look into it if we
> change our policy.

Thanks, Jessica, this was what I was thinking of.
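
For the record, a usage sketch (the job IDs are placeholders, and as Jessica
notes it may need to be enabled by the admins first); since 17.11 a
comma-separated job list is accepted:

scontrol top 1235,1236    # move these two pending jobs to the front of my own jobs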

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de



Re: [slurm-users] Memory allocation error

2018-03-14 Thread Mahmood Naderan
>If you put this in your script rather than the g09 command what does it say?
>
>ulimit -a

That was a very good hint. I first ran ssh to compute-0-1 and saw
"unlimited" value for "max memory size" and "virtual memory". Then I
submitted the job with --mem=2000M and put the command in the slurm
script.

I then noticed that "max memory size" is about "200" while
"virtual memory" is about 250. I guessed that the problem is with
the virtual memory which the program wants to allocate. By setting
--mem=4000M, the error disappeared and the job is now running.
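
For anyone hitting the same thing, the relevant part of the script ended up
looking roughly like this (a sketch; the input/output file names are
placeholders). Running ulimit -a inside the batch script shows the limits
imposed on the job itself, whereas ssh'ing to the node only shows the limits of
a fresh login shell, which is why the two disagreed:

#SBATCH --mem=4000M

ulimit -a                       # limits as seen by the job, not by an ssh shell
g09 < input.gjf > output.log    # placeholder file names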

Thank you very much.

Regards,
Mahmood



Re: [slurm-users] Memory allocation error

2018-03-14 Thread Chris Samuel
On Wednesday, 14 March 2018 7:37:19 PM AEDT Mahmood Naderan wrote:

> I tried with --mem=2000M in the slurm script and put the strace command in front
> of g09. Please see the last few lines

Gaussian is trying to allocate more than 2GB of RAM in that case.

Unfortunately your strace doesn't show anything useful, as the failure is in a 
process that's forked off from the shell script that is g09, and your strace 
isn't following child processes.
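
(For reference, strace can be told to follow forked children with -f and to
write everything to a file with -o, e.g.

strace -f -o /tmp/g09-trace.out g09 < input.gjf > output.log

where the file names are placeholders; that would also capture the system calls
of the child process that actually fails.)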

If you put this in your script rather than the g09 command, what does it say?

ulimit -a

Also this code can be useful to find out how much memory Gaussian is actually 
using, though you'll need root access (or a helpful sysadmin there) to be able 
to use it.

https://github.com/gsauthof/cgmemtime

It will use a special cgroup (which it can configure for you) to track the 
memory usage of the whole Gaussian run.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Job reordering for users?

2018-03-14 Thread Magnus Jonsson

You can use the "nice" feature to "rearrange" jobs.
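
For example (the job ID and nice values are placeholders; ordinary users can
only raise the nice value of their own jobs, i.e. lower their priority, so you
reorder by pushing the less urgent job back):

sbatch --nice=100 less-urgent-job.sh       # submit with reduced priority
scontrol update jobid=12345 nice=200       # push an already-queued job further back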

/M

On 2018-03-14 10:07, Loris Bennett wrote:

Hi,

I seem to remember reading something about users being able to change
the priorities within a group of their own jobs.  So if a user suddenly
decided that a job submitted later was more urgent than a similar job submitted earlier,
he/she would be able to switch the priorities.

Is it just wishful thinking on my part or does something along those
lines exist?

Cheers,

Loris



--
Magnus Jonsson, Developer, HPC2N, Umeå Universitet



[slurm-users] Job reordering for users?

2018-03-14 Thread Loris Bennett
Hi,

I seem to remember reading something about users being able to change
the priorities within a group of their own jobs.  So if a user suddenly
decided that a job submitted later was more urgent than a similar job submitted earlier,
he/she would be able to switch the priorities.

Is it just wishful thinking on my part or does something along those
lines exist?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de