Hi Dennis,

Dennis Tants <[email protected]> writes:

> Hi list,
>
> I am a little bit lost right now and would appreciate your help.
> We have a little cluster with 16 nodes running with SLURM and it is
> doing everything we want, except a few
> little things I want to improve.
>
> So that is why I wanted to upgrade our old SLURM 15.X (don't know the
> exact version) to 17.02.4 on my test machine.
> I just deleted the old version completely with 'yum erase slurm-*'
> (CentOS 7 btw.) and built the new version with rpmbuild.
> Everything went fine so I started configuring a new slurm[dbd].conf.
> This time I also wanted to integrate backfill instead of FIFO
> and also use accounting (just to know which person uses the most
> resources). Because we had no databases yet I started
> slurmdbd and slurmctld without problems.
>
> Everything seemed fine with a simple mpi hello world test on one and two
> nodes.
> Now I wanted to enhance the script a bit more and include working in the
> local directory of the nodes which is /work.
> To get everything up and running I used the script which I attached for
> you (it also includes the output after running the script).
> It should basically just copy all data to /work/tants/$SLURM_JOB_NAME
> before doing the mpi hello world.
> But it seems that srun does not know $SLURM_JOB_NAME even though it is
> there.
> /work/tants belongs to the correct user and has rwx permissions.
>
> So did I just configure something wrong or what happened here? Nearly
> the same example is working on our cluster with
> 15.X. The script is only for testing purposes, that's why there are so
> many echo commands in there.
> If you see any mistake or can recommend better configurations I would
> gladly hear them.
> Should you need any more information I will provide it.
> Thank you for your time!

Shouldn't the variable be $SBATCH_JOB_NAME?
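
One way to check would be to print both candidates from inside the batch script and see which one is actually set in the job's environment. This is only a minimal sketch: the function name report_job_name_vars is made up for illustration, and it assumes the script runs under bash.

```shell
#!/bin/bash
# Hypothetical helper: print the value of each candidate job-name
# variable, or "<unset>" if it is not set in the current environment.
report_job_name_vars() {
    local name
    for name in SLURM_JOB_NAME SBATCH_JOB_NAME; do
        # ${!name-<unset>} is bash indirect expansion with a default:
        # it yields the value of the variable whose name is stored in
        # $name, or the literal "<unset>" if that variable is unset.
        printf '%s=%s\n' "$name" "${!name-<unset>}"
    done
}

report_job_name_vars
```

Calling the same function via srun (for example from a small wrapper script) would show whether the job steps see the same values as the batch script itself.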

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email [email protected]
