Hi Dennis,

Dennis Tants <[email protected]> writes:

> Hello Loris,
>
> On 10.07.2017 at 07:39, Loris Bennett wrote:
>> Hi Dennis,
>>
>> Dennis Tants <[email protected]> writes:
>>
>>> Hi list,
>>>
>>> I am a little bit lost right now and would appreciate your help.
>>> We have a little cluster with 16 nodes running with SLURM and it is
>>> doing everything we want, except a few
>>> little things I want to improve.
>>>
>>> So that is why I wanted to upgrade our old SLURM 15.X (don't know the
>>> exact version) to 17.02.4 on my test machine.
>>> I just deleted the old version completely with 'yum erase slurm-*'
>>> (CentOS 7 btw.) and built the new version with rpmbuild.
>>> Everything went fine so I started configuring a new slurm[dbd].conf.
>>> This time I also wanted to integrate backfill instead of FIFO
>>> and also use accounting (just to know which person uses the most
>>> resources). Because we had no databases yet I started
>>> slurmdbd and slurmctld without problems.
>>>
>>> Everything seemed fine with a simple MPI hello world test on one and two
>>> nodes.
>>> Now I wanted to enhance the script a bit more and include working in the
>>> local directory of the nodes, which is /work.
>>> To get everything up and running I used the script which I attached for
>>> you (it also includes the output after running the script).
>>> It should basically just copy all data to /work/tants/$SLURM_JOB_NAME
>>> before doing the MPI hello world.
>>> But it seems that srun does not know $SLURM_JOB_NAME even though it is
>>> there.
>>> /work/tants belongs to the correct user and has rwx permissions.
>>>
>>> So did I just configure something wrong or what happened here? Nearly
>>> the same example is working on our cluster with
>>> 15.X. The script is only for testing purposes, that's why there are so
>>> many echo commands in there.
>>> If you see any mistake or can recommend better configurations I would
>>> gladly hear them.
>>> Should you need any more information I will provide it.
>>> Thank you for your time!
>>
>> Shouldn't the variable be $SBATCH_JOB_NAME?
>>
>> Cheers,
>>
>> Loris
>>
>
> when I use "echo $SLURM_JOB_NAME" it will tell me the name I specified
> with #SBATCH -J.
> It is not working with srun in this version (it was working in 15.x).
>
> However, when I now use "echo $SBATCH_JOB_NAME" it is just a blank
> variable. As told by someone from the list,
> I used the command "env" to verify which variables are available. This
> list includes SLURM_JOB_NAME
> with the name I specified. So $SLURM_JOB_NAME shouldn't be a problem.
>
> Thank you for your suggestion though.
> Any other hints?
>
> Best regards,
> Dennis

The manpage of srun says the following:

  SLURM_JOB_NAME
      Same as -J, --job-name except within an existing allocation, in
      which case it is ignored to avoid using the batch job’s name as
      the name of each job step.

This sounds like it might mean that if you submit a job script via
sbatch and in this script call srun, the variable might not be defined.
However, the wording is a bit unclear and I have never tried this
myself.

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
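
P.S. An untested sketch of one way to sidestep the question: copy the job name into an ordinary shell variable in the batch script, where "env" already shows SLURM_JOB_NAME to be set, and hand srun only the expanded path. The job name hello_mpi, the node count, and the variable names job_name and work_dir are assumptions for illustration and are not taken from the attached script; the fallback value and the "command -v" guard are only there so that the fragment can be tried outside Slurm.

```shell
#!/bin/bash
#SBATCH -J hello_mpi   # hypothetical job name, not from the attached script
#SBATCH -N 2           # hypothetical node count

# Read the name once in the batch shell. The fallback after ":-" is an
# assumption that only matters when this runs outside a Slurm job.
job_name="${SLURM_JOB_NAME:-hello_mpi}"
work_dir="/work/tants/${job_name}"

echo "job name: ${job_name}"
echo "work dir: ${work_dir}"

# The batch shell expands ${work_dir} before srun is started, so the job
# step receives a literal path and does not have to look up the variable.
if command -v srun >/dev/null 2>&1; then
    srun --ntasks-per-node=1 mkdir -p "${work_dir}"
fi
```

If the directory is created this way but not with $SLURM_JOB_NAME written directly on the srun line, that would at least narrow down where the name gets lost.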
