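The shell expanded the environment variable $HOSTNAME on the master compute
node (eagle1) before the srun command was executed, so every task received
the already-expanded string "eagle1". Also, if /scratch is a common
filesystem rather than node-local disk, running mv on /scratch/Hello from
every node in parallel is a race: whichever host gets there first wins and
the others fail. You should probably use cp. Escaping the $ on its own is
not enough, because srun runs the command directly rather than through a
shell, so cp would receive the literal string $HOSTNAME. I haven't tested
this, but running the command through a shell on each node and asking for
the hostname there:

srun bash -c 'cp /scratch/Hello /scratch/$(hostname)'

might work.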
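
To see where the expansion happens without needing a Slurm allocation, here is a small sketch. run_direct is a hypothetical stand-in for srun (it only runs its arguments directly, with no shell in between), and NODE, eagle1 and eagle3 are illustrative values, with env used to fake a "remote" environment:

```shell
#!/bin/sh
# Hypothetical stand-in for srun: runs its arguments directly,
# with no shell in between.
run_direct() { "$@"; }

# Pretend this is the value on the batch node.
NODE=eagle1

# 1. Unescaped: the calling shell substitutes the value before the
#    command is ever launched.
expanded=$(run_direct echo "/scratch/$NODE")

# 2. Escaped: the text is passed through untouched, and since no shell
#    runs on the other side, it stays literal.
literal=$(run_direct echo "/scratch/\$NODE")

# 3. Deferred: single quotes protect the text from the calling shell, and
#    the sh started on the other side expands it in its own environment
#    (faked here with env).
deferred=$(run_direct env NODE=eagle3 sh -c 'echo "/scratch/$NODE"')

printf '%s\n' "$expanded" "$literal" "$deferred"
```

This prints /scratch/eagle1, then /scratch/$NODE, then /scratch/eagle3. Only the third form picks up the value from the side where the command actually runs, which is the behaviour you want from each task under srun.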

Bill.
--
Bill Barth, Ph.D., Director, HPC
[email protected]        |   Phone: (512) 232-7069
Office: ROC 1.435             |   Fax:   (512) 475-9445

On 9/16/14, 6:13 AM, "Mads Boye" <[email protected]> wrote:

>Hi.
>I am writing some getting-started scripts for my Slurm users and have run
>into a problem. I cannot figure out whether it is a bug or whether I am
>simply holding it wrong ;-)
>
>My script uses sbcast to copy a file to all allocated nodes.
>
>After that, I try to rename each file to the hostname of its node, and
>then cp all the files back to my home folder.
>
>It appears that the environment variable HOSTNAME holds the hostname of
>the SLURM_BATCHNODE on every node.
>
>Here is my slurm script:
>
>#!/bin/bash
>#SBATCH -N 5 ## Number of nodes allocated
>#SBATCH --job-name=sbcast
>
>echo "create file"
>touch /tmp/Hello
>echo "copy Hello to every nodes /scratch"
>sbcast -f /tmp/Hello /scratch/Hello
>echo "see if Hello is on every node"
>srun ls -l /scratch/Hello
>echo "rename Hello to node hostname"
>srun mv /scratch/Hello /scratch/$HOSTNAME
>srun echo $HOSTNAME
>srun ls -l /scratch/eagle*
>
>and here is the slurm-%jobid.out:
>
>mb@birdnest:~$ cat slurm-2139.out
>create file
>copy Hello to every nodes /scratch
>see if Hello is on every node
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:52 /scratch/Hello
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/Hello
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/Hello
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/Hello
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/Hello
>rename Hello to node hostname
>eagle1
>eagle1
>eagle1
>eagle1
>eagle1
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:52 /scratch/eagle1
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/eagle1
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/eagle1
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/eagle1
>-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/eagle1
>
>Am I doing something wrong or using the function in an unintended way?
>
>
>Best Regards,
>
>Mads.
