On Thursday, 30 March 2017, at 09:19:59 (-0700), Chad Cropper wrote:

> Yes, I have seen these.  Thanks.  But the issue is I only want this to
> run when the node is empty.  Our workload is very serial, and most jobs
> only use 1, 2, or 4 cores.  So on a large node of 32 cores, we would have
> many jobs active.  I only ever want this to run when no jobs exist on
> the node.  My current best plan is to write Python to check for jobs
> on a node; if it's empty, set it to drain, then ssh in as root and run the
> command.
First, the problem with "sudo echo 3 > /proc/sys/vm/drop_caches" is that bash
(and other shells) handle redirection of input/output *before* spawning the
command, so the redirection is performed by the user's own shell, which does
not have the root access required to write to that file.  You need a shell
with root privileges to do the write.  If you want something that works from
an unprivileged command prompt (assuming the user has sudo access to run bash
as root), the cleanest solution is simply:

  sudo /bin/bash -c 'echo 3 > /proc/sys/vm/drop_caches'

Counting jobs on a node is simply a matter of:

  squeue -h -w $HOSTNAME | wc -l

Putting it all together, you can simply do (either in a script run via sudo,
via "bash -c" as shown above, or under something like NHC):

  [ `squeue -h -w $HOSTNAME | wc -l` -eq 0 ] && echo 3 > /proc/sys/vm/drop_caches

(Note that you may need to tweak the above if, for example, your full
hostname as given by $HOSTNAME doesn't match your NodeName in SLURM.)

HTH,
Michael

-- 
Michael E. Jennings <m...@lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-0200, Rm. 212                           W: +1 (505) 606-0605
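[Editor's note] The quoted poster's fuller plan (check for jobs on the node, drain it, flush caches as root) could be sketched in Python along these lines. This is a hedged sketch, not a tested tool: it assumes the script runs as root on the compute node itself, that `squeue` and `scontrol` are in $PATH, that the short hostname matches the SLURM NodeName, and the drain Reason string is made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch: drop page caches only when a node has no running SLURM jobs.

Assumptions (not from the original message): runs as root on the node,
squeue/scontrol are available, and the short hostname matches NodeName.
"""
import socket
import subprocess


def count_jobs(squeue_output: str) -> int:
    """Count jobs from `squeue -h -w <node>` output (one non-blank line per job)."""
    return len([line for line in squeue_output.splitlines() if line.strip()])


def main() -> None:
    # Short hostname; may need tweaking if NodeName differs (as the email notes).
    node = socket.gethostname().split(".")[0]

    out = subprocess.run(["squeue", "-h", "-w", node],
                         capture_output=True, text=True, check=True).stdout

    if count_jobs(out) == 0:
        # Drain first so no new job lands while we flush (hypothetical policy).
        subprocess.run(["scontrol", "update", f"NodeName={node}",
                        "State=DRAIN", "Reason=drop_caches"], check=True)
        # Requires root: same write the shell one-liner performs.
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
        subprocess.run(["scontrol", "update", f"NodeName={node}",
                        "State=RESUME"], check=True)


if __name__ == "__main__":
    main()
```

Running it under cron or NHC instead of ssh-as-root would avoid the manual step the poster described; either way, the job count check is the same `squeue -h -w` call shown above.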