On 21.06.2012 at 16:26, Semi wrote:

> Can you help me to debug this problem step by step? Compilation passed OK.
> First of all I want understand why it fails even without SGE.
> 
> 1) I defined in rungms:
> set NNODES=2
> set TARGET=mpi
> /storage/openmpi-1.5_openib/bin/mpirun -np $NPROCS 
> /storage/app/ymiller/gamess_openib/gamess.$VERNO.x $JOB

Did you compile Open MPI with tight SGE integration, i.e. configured with 
--with-sge? It might be necessary.
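As a rough sketch (the prefix and the --with-openib flag are assumptions taken from the paths in your mpirun line, adjust to your installation), rebuilding Open MPI with tight SGE integration would look like:

```shell
# Hypothetical rebuild of Open MPI 1.5 with SGE support; prefix is an assumption.
./configure --prefix=/storage/openmpi-1.5_openib --with-sge --with-openib
make -j4 && make install

# Afterwards, check that the gridengine launcher component is present:
ompi_info | grep gridengine
```

If `ompi_info` shows no gridengine components, the build has no tight integration and mpirun will not do `qrsh -inherit` starts under SGE.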


> 2) when I run
> /storage/app/ymiller/gamess_openib/rungms exam01 00 2 > exam01.log

Is this already failing outside of SGE, and do you want to run here with 2 
computing processes, i.e. 4 in total with the data servers?

-- Reuti
 

> I got error
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 911.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 13189 on
> node sge135 exiting improperly. There are two reasons this could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> 
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> @: Expression Syntax.

This may come from the line:

@ NPROCS = $NCPUS + $NCPUS

Can you put some `echo` commands there to check the value of $NCPUS before the 
call?
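A minimal sketch of such a check inside rungms (csh syntax; the surrounding lines are assumptions based on the stock script):

```shell
# csh fragment for rungms: print NCPUS before the arithmetic that fails.
# An empty or non-numeric NCPUS makes the `@' builtin abort with
# "@: Expression Syntax." -- exactly the error shown above.
echo "NCPUS is '$NCPUS'"
@ NPROCS = $NCPUS + $NCPUS
echo "NPROCS is $NPROCS"
```

The quotes around `$NCPUS` in the first echo make an empty value visible as `''` instead of a blank line.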

-- Reuti


> On 6/21/2012 3:09 PM, Reuti wrote:
>> On 21.06.2012 at 12:32, Semi wrote:
>> 
>>> I know how to run MPI on SGE. But GAMESS is trickier than plain MPI.
>>> If you have such experience, please send me detailed example, like in 
>>> attached link.
>> Well, we also use GAMESS sometimes but just with the default socket 
>> communication.
>> 
>> Nevertheless, what I remember is that with MPI you need one data server 
>> per node in addition to the computing instances. I.e. in principle you are 
>> starting more processes than granted slots, as some processes are used for 
>> communication only.
>> 
>> This explains the doubling of processes in the given link. If you do the 
>> same in SGE, is it not working? In former times doubling the number of 
>> processes was indeed a problem, as it would lead to more `qrsh -inherit 
>> ...` calls than allowed by SGE. Recent MPI implementations make only a 
>> one-time `qrsh -inherit ...` call to a slave node and use forks for 
>> additional processes.
>> 
>> What happens if you just run it with the double amount of processes in SGE?
>> 
>> -- Reuti
>> 
>> 
>>> On 6/21/2012 1:08 PM, Dave Love wrote:
>>>> Semi <s...@bgu.ac.il> writes:
>>>> 
>>>>> How I can run GAMESS compiled with MPI on SGE?
>>>>> I found only PBS:
>>>>> http://ccmst.gatech.edu/wiki/index.php?title=GAMESS
>>>> There are multiple GAMESS variants (-US, -UK, PC-) you can try, but they
>>>> should run like any other MPI job, though some are better behaved than others.
>>>> 
>>>> Is the problem not knowing how to configure SGE for MPI jobs generally?
>>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> users@gridengine.org
>>> https://gridengine.org/mailman/listinfo/users
> 
> 

