Thank you so much for that suggestion! That led me straight to the issue. The file system that I have mounted on the head node is not visible to the compute nodes. All the jobs were failing because I was getting streams of "No such file or directory" errors. If I launch a job from a folder that is part of the OS, the job runs, because that same folder also exists on the compute nodes.
So, what is the best way for my compute nodes to write to the file system that I have set up on the head node? Thank you again.

Sincerely,
Adrian Reich

On Tue, Dec 9, 2014 at 11:57 AM, <[email protected]> wrote:
>
> Look at your SlurmctldLogFile (on the head node) and SlurmdLogFile (on the
> allocated node).
>
>
> Quoting Adrian Reich <[email protected]>:
>
>> Hello,
>>
>> I have set up a small SLURM cluster using the SLURM roll within Rocks.
>> Every time I try to submit an sbatch job it fails immediately and the job
>> quits. However, I can request resources using salloc and everything works.
>> How can I go about diagnosing where the issue is and what information can
>> I provide to help in the diagnosis? Thank you.
>>
>> Sincerely,
>> Adrian Reich
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
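
[Editor's note for the archive: one common way to share a head-node file system with compute nodes is an NFS export mounted at the same path on every node, so that job working directories resolve identically everywhere. The sketch below is illustrative only; the path `/export/data` and the subnet `10.1.1.0/24` are placeholders, not details from this thread, and a Rocks cluster may already manage its own exports.]

```shell
# /etc/exports on the head node -- /export/data and the compute
# subnet 10.1.1.0/24 are assumed placeholders
/export/data 10.1.1.0/255.255.255.0(rw,sync,no_root_squash)

# /etc/fstab on each compute node -- mount at the same path as on
# the head node so sbatch working directories exist everywhere
headnode:/export/data  /export/data  nfs  rw,hard  0 0
```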
