Hi, any idea why the output of your job is incomplete? There is nothing after "Copying files...". Does the /work/tants directory exist on all the nodes? The variable $SLURM_JOB_NAME is expanded by bash before srun is started, so srun only sees "srun -N2 -n2 rm -rf /work/tants/mpicopytest".
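
To illustrate the expansion point, here is a minimal sketch that runs without Slurm. The function show_args is a hypothetical stand-in for srun that just prints the arguments it receives, and the job name is set by hand here, whereas in a real job Slurm would set it from "#SBATCH -J".

```shell
#!/bin/bash
# Hypothetical stand-in for srun: prints its arguments exactly as received.
show_args() {
    printf '%s\n' "$*"
}

# Assumption: set by hand for this demo; Slurm normally exports it in the job.
SLURM_JOB_NAME=mpicopytest

# Double quotes: bash substitutes the value before the command is called.
expanded=$(show_args rm -rf "/work/tants/$SLURM_JOB_NAME")

# Single quotes: the text is passed through literally, without expansion.
literal=$(show_args rm -rf '/work/tants/$SLURM_JOB_NAME')

echo "expanded: $expanded"   # expanded: rm -rf /work/tants/mpicopytest
echo "literal:  $literal"    # literal:  rm -rf /work/tants/$SLURM_JOB_NAME
```

So if the path looks right in the first form, the variable itself is probably not the problem. To check the directory question, something like "srun -N2 -n2 ls -ld /work/tants" inside the job should show whether it exists on each allocated node.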
Regards,
Carlos

On Mon, Jul 10, 2017 at 10:02 AM, Dennis Tants <[email protected]> wrote:
>
> Hello Loris,
>
> On 10.07.2017 at 07:39, Loris Bennett wrote:
> > Hi Dennis,
> >
> > Dennis Tants <[email protected]> writes:
> >
> >> Hi list,
> >>
> >> I am a little bit lost right now and would appreciate your help.
> >> We have a little cluster with 16 nodes running with SLURM and it is
> >> doing everything we want, except a few
> >> little things I want to improve.
> >>
> >> So that is why I wanted to upgrade our old SLURM 15.X (don't know the
> >> exact version) to 17.02.4 on my test machine.
> >> I just deleted the old version completely with 'yum erase slurm-*'
> >> (CentOS 7 btw.) and built the new version with rpmbuild.
> >> Everything went fine so I started configuring a new slurm[dbd].conf.
> >> This time I also wanted to integrate backfill instead of FIFO
> >> and also use accounting (just to know which person uses the most
> >> resources). Because we had no databases yet I started
> >> slurmdbd and slurmctld without problems.
> >>
> >> Everything seemed fine with a simple mpi hello world test on one and two
> >> nodes.
> >> Now I wanted to enhance the script a bit more and include working in the
> >> local directory of the nodes which is /work.
> >> To get everything up and running I used the script which I attached for
> >> you (it also includes the output after running the script).
> >> It should basically just copy all data to /work/tants/$SLURM_JOB_NAME
> >> before doing the mpi hello world.
> >> But it seems that srun does not know $SLURM_JOB_NAME even though it is
> >> there.
> >> /work/tants belongs to the correct user and has rwx permissions.
> >>
> >> So did I just configure something wrong or what happened here? Nearly
> >> the same example is working on our cluster with
> >> 15.X. The script is only for testing purposes, that's why there are so
> >> many echo commands in there.
> >> If you see any mistake or can recommend better configurations I would
> >> gladly hear them.
> >> Should you need any more information I will provide it.
> >> Thank you for your time!
> > Shouldn't the variable be $SBATCH_JOB_NAME?
> >
> > Cheers,
> >
> > Loris
> >
>
> when I use "echo $SLURM_JOB_NAME" it will tell me the name I specified
> with #SBATCH -J.
> It is not working with srun in this version (it was working in 15.x).
>
> However, when I now use "echo $SBATCH_JOB_NAME" it is just a blank
> variable. As told by someone from the list,
> I used the command "env" to verify which variables are available. This
> list includes SLURM_JOB_NAME
> with the name I specified. So $SLURM_JOB_NAME shouldn't be a problem.
>
> Thank you for your suggestion though.
> Any other hints?
>
> Best regards,
> Dennis
>
> --
> Dennis Tants
> Trainee: IT Specialist for System Integration
>
> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
> ZARM - Center of Applied Space Technology and Microgravity
>
> Universität Bremen
> Am Fallturm
> 28359 Bremen, Germany
>
> Phone: 0421 218 57940
> E-Mail: [email protected]
>
> www.zarm.uni-bremen.de
>

--
Carles Fenoy
