Thank you so much for that suggestion! That led me straight to the issue. The file system that I have mounted on the head node is not visible to the compute nodes. All the jobs were failing because I was getting streams of "No such file or directory" errors. If I launch a job from a folder that is part of the OS, the job runs, because that same folder also exists on the compute nodes.
So, what is the best way for my compute nodes to write to the file system that I have set up on the head node? Thank you again.

Sincerely,
Adrian Reich

On Tue, Dec 9, 2014 at 11:57 AM, <[email protected]> wrote:
>
> Look at your SlurmctldLogFile (on the head node) and SlurmdLogFile (on the
> allocated node).
>
>
> Quoting Adrian Reich <[email protected]>:
>
>> Hello,
>>
>> I have set up a small SLURM cluster using the SLURM roll within Rocks.
>> Every time I try to submit an sbatch job it fails immediately and the job
>> quits. However, I can request resources using salloc and everything works.
>> How can I go about diagnosing where the issue is and what information can
>> I provide to help in the diagnosis? Thank you.
>>
>> Sincerely,
>> Adrian Reich
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
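
[Editor's note for the archive: one common way to share a head-node file system with compute nodes is an NFS export mounted at the same path on every node, so that job working directories resolve identically everywhere. The sketch below is illustrative only; the path `/export/data` and the subnet `10.1.1.0/24` are placeholders, not details from this thread, and a Rocks cluster may already manage its own exports.]

```shell
# /etc/exports on the head node -- /export/data and the compute
# subnet 10.1.1.0/24 are assumed placeholders
/export/data 10.1.1.0/255.255.255.0(rw,sync,no_root_squash)

# /etc/fstab on each compute node -- mount at the same path as on
# the head node so sbatch working directories exist everywhere
headnode:/export/data  /export/data  nfs  rw,hard  0 0
```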
