Hi Dennis,

Dennis Tants <[email protected]> writes:

> Hi list,
>
> I am a little bit lost right now and would appreciate your help.
> We have a little cluster with 16 nodes running with SLURM and it is
> doing everything we want, except a few little things I want to
> improve.
>
> So that is why I wanted to upgrade our old SLURM 15.X (don't know the
> exact version) to 17.02.4 on my test machine.
> I just deleted the old version completely with 'yum erase slurm-*'
> (CentOS 7 btw.) and built the new version with rpmbuild.
> Everything went fine so I started configuring a new slurm[dbd].conf.
> This time I also wanted to integrate backfill instead of FIFO
> and also use accounting (just to know which person uses the most
> resources). Because we had no databases yet I started
> slurmdbd and slurmctld without problems.
>
> Everything seemed fine with a simple mpi hello world test on one and
> two nodes.
> Now I wanted to enhance the script a bit more and include working in
> the local directory of the nodes which is /work.
> To get everything up and running I used the script which I attached
> for you (it also includes the output after running the script).
> It should basically just copy all data to /work/tants/$SLURM_JOB_NAME
> before doing the mpi hello world.
> But it seems that srun does not know $SLURM_JOB_NAME even though it is
> there.
> /work/tants belongs to the correct user and has rwx permissions.
>
> So did I just configure something wrong or what happened here? Nearly
> the same example is working on our cluster with 15.X. The script is
> only for testing purposes, that's why there are so many echo commands
> in there.
> If you see any mistake or can recommend better configurations I would
> gladly hear them.
> Should you need any more information I will provide it.
> Thank you for your time!

Shouldn't the variable be $SBATCH_JOB_NAME?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email [email protected]
