Mark,

Thanks for all the explanations. Most of it I found by following the build
logs from the linpack executable. I did get the "hello world" program running.
I am now down to digging through the options to the srun and sbatch commands to
determine exactly what they do and how they affect running jobs. I am also
getting the programmers in my group to supply some other test programs.

Carl

----- Original Message -----
> Hi Carl,
> 
> On 02/08/12 23:25, Carl Schmidtmann wrote:
> >
> > I have now gotten slurm to the point of submitting jobs to my
> > BlueGeneQ but they fail with the following error:
> >
> > 2012-08-02 08:41:30.704 (FATAL) [0xfff801c8b70]
> > 10066:ibm.runjob.client.Job: Load failed on R00-IC-J07:
> > Application executable ELF header contains invalid value, errno 8
> > Exec format error
> 
> This error is because the executable that's been passed to srun (and
> consequently runjob) gets to the IO node and doesn't look like an
> appropriate executable, so doesn't get loaded. This is most likely
> because it hasn't been compiled using the right toolchain.
> 
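The "Exec format error" above comes down to what is in the file's ELF header. As a rough pre-flight check on the front end node, something like the sketch below could confirm that a file is at least a 64-bit big-endian PowerPC ELF binary before handing it to srun. The function name is_ppc64_elf is made up for illustration, and the check is an assumption about what is useful here: it rules out shell scripts and wrong-architecture binaries, but it cannot tell a binary built with the Blue Gene toolchain from one built with a native ppc64 compiler on the front end.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "$1" looks like a 64-bit,
# big-endian PowerPC ELF file. Header fields only; this does not
# prove the binary was built or linked for the compute nodes.
is_ppc64_elf() {
    # Dump the first 20 bytes of the file as hex pairs.
    hdr=$(od -An -tx1 -N20 "$1" 2>/dev/null)
    # Split the hex pairs into positional parameters $1..$20.
    set -- $hdr
    [ "$#" -ge 20 ] || return 1
    # Bytes 1-4: ELF magic number (0x7f 'E' 'L' 'F').
    [ "$1 $2 $3 $4" = "7f 45 4c 46" ] || return 1
    # Byte 5: EI_CLASS, 02 = 64-bit. Byte 6: EI_DATA, 02 = big-endian.
    [ "$5 $6" = "02 02" ] || return 1
    # Bytes 19-20: e_machine in big-endian order, 0x0015 = EM_PPC64.
    [ "${19} ${20}" = "00 15" ]
}
```

For example, `is_ppc64_elf ./hello-gcc || echo "not a ppc64 ELF binary"` would flag a shell script such as helloworld.sh immediately.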
> >
> > I get this error when trying to run - a simple "hello world" shell
> > script; a simple "hello world" compiled C program; a simple "hello
> > world" C program compiled with the mpi compiler. I am obviously
> > missing something simple here. Below I have included my
> > slurm.conf, bluegene.conf, the shell script and C source code
> > files.
> 
> The shell script won't run on the compute nodes of the Blue Gene (so it
> is expected that runjob will complain about it). From the Blue Gene/Q
> Application Development Redbook (section 1.3.6, Application development
> and debugging):
> 
> Shell scripts
> The CNK does not provide a mechanism for a command interpreter or shell
> when applications start on the Blue Gene/Q system. Only the executable
> program can be started. Therefore, if the application includes shell
> scripts that control workflow, the workflow must be adapted.
> For example, an application workflow shell script cannot be started with
> the runjob command. Instead, run the application workflow scripts on the
> front end node and start the runjob command only at the innermost shell
> script level where the main application binary is called.
> 
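The workflow adaptation described in the Redbook passage can be sketched roughly as follows: the control script stays on the front end node (for instance as the script given to sbatch), and only the compiled binaries are passed to the launcher. This is a minimal sketch under assumptions: the function names run_step and run_workflow and the LAUNCHER variable are invented for illustration, and the default of srun assumes srun wraps runjob as described earlier in this thread.

```shell
#!/bin/sh
# Hypothetical workflow wrapper. The shell logic runs on the front end
# node; only the binaries named on the command line go to the launcher.
# LAUNCHER is an illustrative variable, defaulting to srun.
LAUNCHER=${LAUNCHER:-srun}

# Launch one compute-node binary (plus any arguments) via the launcher.
run_step() {
    "$LAUNCHER" "$@"
}

# Run each binary in turn and stop at the first failure.
run_workflow() {
    for exe in "$@"; do
        run_step "$exe" || {
            echo "step failed: $exe" >&2
            return 1
        }
    done
}
```

Inside an allocation, `run_workflow ./hello-gcc ./hello-mpi` would then launch the two (hypothetically named) binaries one after the other, with the looping and error handling done by the shell on the front end rather than on the compute nodes.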
> 
> But your simple C program should run fine. You should be able to compile
> it using the supplied gcc, here:
> /bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc
> 
> Here's what I get:
> 
> $ /bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc -Wall -o hello-gcc hello.c
> $ salloc -N 1 bash
> salloc: Pending job allocation 27133
> salloc: job 27133 queued and waiting for resources
> salloc: job 27133 has been allocated resources
> salloc: Granted job allocation 27133
> salloc: Block RMP19Jl181150422 is ready for job
> $ srun hello-gcc
> Hello world.
> 
> 
> Alternatively if you wanted to use the xlc compiler you would use
> /opt/ibmcmp/vacpp/bg/12.1/bin/bgxlc (although your BG xlc compiler may
> be located elsewhere, this is just the default install location).
> 
> Seeing as you can launch the linpack executable, this looks like a
> compiler or perhaps linking issue. Your SLURM config is probably fine if
> you've gotten this far. I can e-mail you the hello world test program
> that I compiled if you want, to ensure that SLURM is all working
> correctly (it is 3.1MB - statically linked).
> 
> Hope that helps!
> Mark
> 
> >
> > The command I use to run them is:
> >
> > srun -N 256 ./helloworld.sh
> >
> > (I know 256 processors to run a shell script is pretty silly but I
> > will work on smaller blocks once I can run a job.)
> >
> > I am running slurm v2.4.2, BlueGene V1R1M1.
> >
> > The weird part is that the linpack executable that I am able to
> > run with the IBM 'runjob' command does execute from slurm, but it
> > doesn't see the extra processors and exits. I have tried
> > replicating how that is compiled by using the mpi compiler for
> > the hello.c program but the makefiles are very convoluted and I
> > have probably missed some flags somewhere.
> >
> > Is this just an issue with compiler flags or is there some slurm
> > setting that might affect this? I would have thought a shell
> > script would run to enable someone to control execution of
> > multiple executables in a job.
> >
> > Thanks for any pointers or suggestions,
> > Carl
> >
> 

-- 
Carl Schmidtmann 
Center for Integrated Research Computing 
University of Rochester 
