I have now gotten slurm to the point of submitting jobs to my BlueGeneQ but 
they fail with the following error:

2012-08-02 08:41:30.704 (FATAL) [0xfff801c8b70] 10066:ibm.runjob.client.Job: 
Load failed on R00-IC-J07: Application executable ELF header contains invalid 
value, errno 8 Exec format error

I get this error with everything I have tried to run: a simple "hello world"
shell script, a simple "hello world" compiled C program, and the same C program
compiled with the mpi compiler. I am obviously missing something simple here.
Below I have included my slurm.conf, bluegene.conf, the shell script and the C
source file.
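
One check that may help narrow this down is to look at the header the loader
is complaining about. The sketch below reads the first 20 bytes of a file and
reports whether it starts with the ELF magic and, if so, its class, byte order
and e_machine value. The offsets are the standard ELF header layout; the
function name elf_kind is made up for this example, and I am only assuming
that the loader checks these same fields.

```shell
#!/bin/bash
# elf_kind FILE: print "not-elf", or the ELF class, byte order and e_machine.
# (elf_kind is a hypothetical helper name, not part of slurm or runjob.)
elf_kind() {
    local f=$1 hdr class order machine
    local -a b
    # First 20 bytes as hex: e_ident (16 bytes), e_type (2), e_machine (2).
    hdr=$(od -An -v -tx1 -N20 "$f" | tr '\n' ' ')
    read -r -a b <<< "$hdr"
    # Anything too short, or without the 0x7f 'E' 'L' 'F' magic, is not ELF.
    if [ "${#b[@]}" -lt 20 ] || [ "${b[0]}${b[1]}${b[2]}${b[3]}" != "7f454c46" ]; then
        echo "not-elf"
        return 0
    fi
    # e_ident[4] is the class: 1 = 32-bit, 2 = 64-bit.
    case ${b[4]} in
        01) class=32 ;;
        02) class=64 ;;
        *)  class=unknown ;;
    esac
    # e_ident[5] is the byte order, which decides how to read e_machine.
    case ${b[5]} in
        01) order=little; machine=$((16#${b[19]}${b[18]})) ;;
        02) order=big;    machine=$((16#${b[18]}${b[19]})) ;;
        *)  order=unknown; machine=unknown ;;
    esac
    echo "elf class=$class order=$order machine=$machine"
}

# Report on every file named on the command line.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(elf_kind "$f")"
done
```

Saved under some name and run with helloworld.sh and the compiled hello
binary as arguments, this should report "not-elf" for the script, since a
shell script is plain text with no ELF header at all. For a 64-bit PowerPC
binary the ELF specification gives e_machine as 21 (EM_PPC64).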

The command I use to run them is:

srun -N 256 ./helloworld.sh

(I know 256 processors to run a shell script is pretty silly but I will work on 
smaller blocks once I can run a job.)

I am running slurm v2.4.2, BlueGene V1R1M1.

The weird part is that the linpack executable, which I am able to run with the
IBM 'runjob' command, does execute from slurm, but it doesn't see the extra
processors and exits. I have tried to replicate how it is compiled by using the
mpi compiler for the hello.c program, but the makefiles are very convoluted and
I have probably missed some flags somewhere.

Is this just an issue with compiler flags, or is there some slurm setting that
might affect this? I would have thought a shell script would be allowed to run,
so that someone could control the execution of multiple executables in a job.
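
To make that last point concrete, this is roughly the kind of wrapper I have
in mind. It is only a sketch: it assumes the script would be submitted with
sbatch and run somewhere that can call srun, and that each srun inside it
launches one executable on the allocated block, neither of which I have
confirmed on this system. run_steps and the LAUNCH variable are names made up
for the example; setting LAUNCH=echo just prints the steps instead of
launching them.

```shell
#!/bin/bash
#SBATCH -N 256

# LAUNCH is a made-up knob for this sketch: it defaults to srun, and
# LAUNCH=echo prints each step instead of launching it.
LAUNCH=${LAUNCH:-srun}

# run_steps EXE...: launch each executable in turn, stopping at the
# first one that fails.
run_steps() {
    local exe
    for exe in "$@"; do
        "$LAUNCH" "$exe" || return 1
    done
}

# The executables to run are passed as arguments to this script.
run_steps "$@"
```

The executables named on the command line would still have to be binaries
that the compute nodes can load; the wrapper only controls the order in which
they are launched.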

Thanks for any pointers or suggestions,
Carl

-- 
Carl Schmidtmann 
Center for Integrated Research Computing 
University of Rochester 

[cschmid7_local@bgqsn BlueGeneQ.HPL-base]$ grep -v '^#' /usr/local/slurm/2.4.2/etc/slurm.conf
ControlMachine=bgqsn
AuthType=auth/munge
CacheGroups=0
CryptoType=crypto/munge
Epilog=/usr/local/slurm/current/sbin/epilog.bash
MpiDefault=none
ProctrackType=proctrack/pgid
Prolog=/usr/local/slurm/current/sbin/prolog.bash
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/slurm/state/slurmd
SlurmUser=slurm
StateSaveLocation=/var/slurm/state
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/bluegene
AccountingStorageHost=localhost
AccountingStorageLoc=slurmacct
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
ClusterName=UR-BGQ
JobCompHost=localhost
JobCompLoc=slurmdb
JobCompType=jobcomp/slurmdbd
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmSchedLogFile=/var/log/slurm/slurmsched.log
SlurmSchedLogLevel=3
FrontendName=bgqsn State=UNKNOWN
NodeName=bg[0000x0001] CPUs=1024 State=UNKNOWN 
PartitionName=debug Nodes=bg[0000x0001] Default=YES MaxTime=INFINITE State=UP Shared=force

[cschmid7_local@bgqsn BlueGeneQ.HPL-base]$ grep -v '^#' /usr/local/slurm/2.4.2/etc/bluegene.conf
MloaderImage=/bgsys/drivers/ppcfloor/boot/firmware
Numpsets=4 # io semi-poor
BridgeAPILogFile=/var/log/slurm/bridgeapi.log
BridgeAPIVerbose=2
BasePartitionNodeCnt=512
NodeCardNodeCnt=32
LayoutMode=STATIC
MPs=0000 Type=Torus,Torus,Torus,Torus 32CNBlocks=0 64CNBlocks=0 128CNBlocks=0 256CNBlocks=2
MPs=0001 Type=Torus,Torus,Torus,Torus 32CNBlocks=0 64CNBlocks=0 128CNBlocks=0 256CNBlocks=2

[cschmid7_local@bgqsn BlueGeneQ.HPL-base]$ cat helloworld.sh
#!/bin/bash

/bin/echo "Hello World"

exit 0


[cschmid7_local@bgqsn BlueGeneQ.HPL-base]$ cat hello.c

#include <stdio.h>

int main( int argc, char** argv )
{
        printf( "Hello world.\n" );
        return 0;   /* 0 reports success; a non-zero status reads as a failure */

}
