Hi,

Any idea why the output of your job is incomplete? There is nothing after
"Copying files...". Does the /work/tants directory exist on all the nodes?
The variable $SLURM_JOB_NAME is expanded by bash before srun runs, so srun
only sees "srun -N2 -n2 rm -rf /work/tants/mpicopytest".
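
To illustrate the expansion point, here is a minimal sketch that runs without
a cluster. It assumes nothing about your actual script: show_args is a
hypothetical stand-in for srun that just prints the arguments it receives, and
SLURM_JOB_NAME is set by hand instead of by Slurm.

```shell
#!/usr/bin/env bash
# Assumption: in a real job Slurm sets SLURM_JOB_NAME; it is set by hand here
# so the sketch runs without a cluster.
SLURM_JOB_NAME=mpicopytest

# Hypothetical stand-in for srun: prints the arguments it actually receives.
show_args() {
    printf '%s\n' "$*"
}

# Double-quoted (or unquoted): bash expands the variable before the command runs.
expanded=$(show_args rm -rf "/work/tants/$SLURM_JOB_NAME")

# Single-quoted: the command receives the literal text, with no expansion.
literal=$(show_args rm -rf '/work/tants/$SLURM_JOB_NAME')

echo "expanded: $expanded"
echo "literal:  $literal"
```

In the first case the command never sees the variable name at all, only its
value, so whether srun itself "knows" $SLURM_JOB_NAME should not matter for
that line of the script.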

Regards,
Carlos

On Mon, Jul 10, 2017 at 10:02 AM, Dennis Tants <
[email protected]> wrote:

>
> Hello Loris,
>
> On 10.07.2017 at 07:39, Loris Bennett wrote:
> > Hi Dennis,
> >
> > Dennis Tants <[email protected]> writes:
> >
> >> Hi list,
> >>
> >> I am a little bit lost right now and would appreciate your help.
> >> We have a little cluster with 16 nodes running with SLURM and it is
> >> doing everything we want, except a few
> >> little things I want to improve.
> >>
> >> So that is why I wanted to upgrade our old SLURM 15.X (don't know the
> >> exact version) to 17.02.4 on my test machine.
> >> I just deleted the old version completely with 'yum erase slurm-*'
> >> (CentOS 7, btw.) and built the new version with rpmbuild.
> >> Everything went fine so I started configuring a new slurm[dbd].conf.
> >> This time I also wanted to integrate backfill instead of FIFO
> >> and also use accounting (just to know which person uses the most
> >> resources). Because we had no databases yet I started
> >> slurmdbd and slurmctld without problems.
> >>
> >> Everything seemed fine with a simple mpi hello world test on one and two
> >> nodes.
> >> Now I wanted to enhance the script a bit more and include working in the
> >> local directory of the nodes which is /work.
> >> To get everything up and running I used the script which I attached for
> >> you (it also includes the output after running the script).
> >> It should basically just copy all data to /work/tants/$SLURM_JOB_NAME
> >> before doing the mpi hello world.
> >> But it seems that srun does not know $SLURM_JOB_NAME even though it is
> >> there.
> >> /work/tants belongs to the correct user and has rwx permissions.
> >>
> >> So did I just configure something wrong or what happened here? Nearly
> >> the same example is working on our cluster with
> >> 15.X. The script is only for testing purposes, that's why there are so
> >> many echo commands in there.
> >> If you see any mistakes or can recommend better configurations I would
> >> gladly hear them.
> >> Should you need any more information I will provide it.
> >> Thank you for your time!
> > Shouldn't the variable be $SBATCH_JOB_NAME?
> >
> > Cheers,
> >
> > Loris
> >
>
> When I use "echo $SLURM_JOB_NAME" it prints the name I specified
> with #SBATCH -J.
> It is not working with srun in this version (it was working in 15.x).
>
> However, when I now use "echo $SBATCH_JOB_NAME" the variable is just
> empty. As suggested by someone on the list,
> I used the command "env" to verify which variables are available. This
> list includes SLURM_JOB_NAME
> with the name I specified. So $SLURM_JOB_NAME shouldn't be a problem.
>
> Thank you for your suggestion though.
> Any other hints?
>
> Best regards,
> Dennis
>
> --
> Dennis Tants
> Trainee: IT specialist for system integration
>
> ZARM - Zentrum für angewandte Raumfahrttechnologie und Mikrogravitation
> ZARM - Center of Applied Space Technology and Microgravity
>
> Universität Bremen
> Am Fallturm
> 28359 Bremen, Germany
>
> Phone: 0421 218 57940
> E-Mail: [email protected]
>
> www.zarm.uni-bremen.de
>
>


-- 
--
Carles Fenoy
