On 21.06.2012, at 16:26, Semi wrote:

> Can you help me to debug this problem step by step? Compilation passed OK.
> First of all I want to understand why it fails even without SGE.
>
> 1) I defined in rungms:
>
>    set NNODES=2
>    set TARGET=mpi
>    /storage/openmpi-1.5_openib/bin/mpirun -np $NPROCS /storage/app/ymiller/gamess_openib/gamess.$VERNO.x $JOB
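For orientation, a minimal csh sketch of how such settings typically feed the MPI kickoff inside rungms. Only the two `set` lines, the mpirun path and the `@ NPROCS` arithmetic are taken from this thread; the argument handling (`$1`..`$3`) is an assumption about the script's structure:

    # Sketch of the rungms MPI kickoff (csh). NCPUS is assumed to come
    # from rungms's third command-line argument; GAMESS starts an equal
    # number of data servers, hence the doubling of NCPUS below.
    set JOB=$1
    set VERNO=$2
    set NCPUS=$3
    set NNODES=2
    set TARGET=mpi
    @ NPROCS = $NCPUS + $NCPUS
    /storage/openmpi-1.5_openib/bin/mpirun -np $NPROCS \
        /storage/app/ymiller/gamess_openib/gamess.$VERNO.x $JOB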
Did you compile Open MPI with tight integration, i.e. with --with-sge? It might be necessary (see the configure sketch below, after the quoted thread).

> 2) when I run
>
>    /storage/app/ymiller/gamess_openib/rungms exam01 00 2 > exam01.log

Is this already failing outside of SGE, and do you want to run here with 2 computing processes, i.e. 4 in total with the data servers?

-- Reuti

> I got error
>
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 911.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 13189 on
> node sge135 exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> @: Expression Syntax.

This may come from the line:

   @ NPROCS = $NCPUS + $NCPUS

Can you put some `echo` commands there to check the value of $NCPUS before the call (see the sketch below)?

-- Reuti

> On 6/21/2012 3:09 PM, Reuti wrote:
>> On 21.06.2012, at 12:32, Semi wrote:
>>
>>> I know how to run MPI on SGE. But GAMESS is more tricky than plain MPI.
>>> If you have such experience, please send me a detailed example, like in
>>> the attached link.
>>
>> Well, we also use GAMESS sometimes, but just with the default socket
>> communication.
>>
>> Nevertheless, what I remember is that using MPI you need one data server
>> per node plus the computing instances. I.e. in principle you are starting
>> more processes than granted slots, as some processes are used for
>> communication only.
>>
>> This explains the doubling of processes in the given link. If you do the
>> same in SGE, it's not working? In former times doubling the number of
>> processes was indeed a problem, as this would lead to more `qrsh -inherit
>> ...` calls than allowed by SGE. Recent MPI implementations make only a
>> one-time `qrsh -inherit ...` call to a slave node and use forks for
>> additional processes.
>>
>> What happens if you just run it with double the number of processes in SGE?
>>
>> -- Reuti
>>
>>> On 6/21/2012 1:08 PM, Dave Love wrote:
>>>> Semi <s...@bgu.ac.il> writes:
>>>>
>>>>> How can I run GAMESS compiled with MPI on SGE?
>>>>> I found only PBS:
>>>>> http://ccmst.gatech.edu/wiki/index.php?title=GAMESS
>>>>
>>>> There are multiple GAMESS variants (-US, -UK, PC-) you could mean, but
>>>> they should run like any other MPI job, though some are better behaved
>>>> than others.
>>>>
>>>> Is the problem not knowing how to configure SGE for MPI jobs generally?
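Regarding the --with-sge question above: a minimal sketch of rebuilding Open MPI with SGE support and verifying it afterwards. The prefix path is taken from the thread; the exact ompi_info output varies by Open MPI version:

    # Rebuild Open MPI with gridengine (tight integration) support:
    ./configure --prefix=/storage/openmpi-1.5_openib --with-sge
    make all install

    # Verify the gridengine components were built in; with support
    # compiled in, a line roughly like the following should appear:
    ompi_info | grep gridengine
    #   MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.5)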
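A sketch of the `echo` debugging suggested above, placed directly before the failing arithmetic in rungms. If $NCPUS is set but empty, the line expands to `@ NPROCS =  + `, which reproduces the `@: Expression Syntax.` error exactly:

    # Temporary debug output just before the arithmetic in rungms (csh);
    # an empty value between > < would explain the csh error.
    echo "NCPUS  = >$NCPUS<"
    echo "NNODES = >$NNODES<"
    @ NPROCS = $NCPUS + $NCPUS
    echo "NPROCS = >$NPROCS<"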
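And for the question in the quoted thread about running with double the number of processes under SGE, a hedged sketch of a submission script: the parallel environment name `mpi` is an assumption and must match one configured on the cluster. With 2 compute processes plus 2 data servers, mpirun starts 4 processes, so 4 slots are requested:

    #!/bin/csh
    #$ -N gamess_exam01
    #$ -cwd
    #$ -pe mpi 4
    # rungms is asked for 2 compute processes; with one data server each,
    # mpirun starts 4 processes, matching the 4 granted slots.
    /storage/app/ymiller/gamess_openib/rungms exam01 00 2 > exam01.log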