On Thursday, 30 March 2017, at 09:19:59 (-0700),
Chad Cropper wrote:

> Yes, I have seen these. Thanks. But the issue is I only want this to
> run when the node is empty. Our workload is very serial and most
> jobs only use 1, 2, or 4 cores. So on a large node of 32 cores, we
> would have many jobs active. I only ever want this to run when no
> jobs exist on the node. My current best plan is to write Python to
> check for jobs on a node, if it's empty set it to drain, then ssh in
> as root and run the command.

First, the problem with "sudo echo 3 > /proc/sys/vm/drop_caches" is
that bash (like other shells) handles redirection of input/output
*before* spawning the command, so the redirection is performed by the
bash process (i.e., the user's shell), which does not have the root
access required to write to that file. You need a shell with root
privileges to do that.
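
You can see this for yourself from an unprivileged prompt; the error
comes from your own shell, not from sudo (the message below is
roughly what bash prints):

$ sudo echo 3 > /proc/sys/vm/drop_caches
bash: /proc/sys/vm/drop_caches: Permission denied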

The cleanest solution is simply:

sudo /bin/bash -c 'echo 3 > /proc/sys/vm/drop_caches'

if you want something that works from an unprivileged command prompt
(assuming the user has sudo access to run bash as root).
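
If you'd rather not spawn a root shell at all, another common idiom
is to let sudo run tee and have tee perform the privileged write:

echo 3 | sudo tee /proc/sys/vm/drop_caches

(tee also copies the "3" to stdout; append ">/dev/null" if that
bothers you.)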

Counting jobs on a node is simply a matter of:

squeue -h -w $HOSTNAME | wc -l

(-h suppresses the header line and -w restricts the listing to the
named node, so the line count is exactly the number of jobs there.)

Putting it all together, you can simply do the following (either in a
script run via sudo, via "bash -c" as shown above, or under something
like NHC):

[ "$(squeue -h -w $HOSTNAME | wc -l)" -eq 0 ] && echo 3 > /proc/sys/vm/drop_caches

(Note that you may need to tweak the above if, for example, your full
hostname as given by $HOSTNAME doesn't match your NodeName in SLURM.)
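
A typical tweak of that sort, if your NodeNames are short hostnames
but $HOSTNAME is fully qualified, is to strip the domain with a
parameter expansion:

[ "$(squeue -h -w ${HOSTNAME%%.*} | wc -l)" -eq 0 ] && echo 3 > /proc/sys/vm/drop_caches

And if you'd rather schedule it from root's cron than hook it into
NHC, a minimal sketch might look like the following (the schedule and
PATH here are placeholders to adjust; note also that cron doesn't set
$HOSTNAME, so the hostname command is used instead):

# /etc/cron.d/drop_caches
PATH=/usr/bin:/bin
0 * * * *  root  [ "$(squeue -h -w $(hostname -s) | wc -l)" -eq 0 ] && echo 3 > /proc/sys/vm/drop_caches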

HTH,
Michael

-- 
Michael E. Jennings <m...@lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-0200, Rm. 212      W: +1 (505) 606-0605
