Yes, I used the "use head node to compute" option when installing.
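
For what it's worth, here is roughly how I check what Torque hands to the job
before booting LAM. This is only a minimal sketch; the resource request and
paths below are illustrative, not copied from the script I attached earlier:

    #!/bin/sh
    #PBS -l nodes=2:ppn=2
    # Show the host list Torque generated for this job. With the head node
    # configured to compute, it should appear here alongside the client node.
    echo "Contents of PBS_NODEFILE:"
    cat $PBS_NODEFILE
    # Boot LAM over exactly those hosts and report what actually came up.
    lamboot -v $PBS_NODEFILE
    lamnodes

In my runs the head node never appears in that file, which is why I asked
whether that is expected.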

Michael Edwards wrote:
> The default setup on OSCAR is for the head node not to compute.
> 
> Did you select the "use head node to compute" option?  If so, there may
> very well be a bug; it is not a widely used option.
> 
> On 7/6/07, Filipe Garrett <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> I've recently set up an OSCAR cluster with a head node + 1 compute node (4 CPUs in total).
>> I've been trying to submit jobs with LAM/MPI, but some errors keep occurring. I've
>> noticed that the "bhost" file (pointed to by $PBS_NODEFILE) lists only the
>> client node. Is that normal, given that the head node is supposed to compute as well?
>>
>> I attach the script I'm using as well as the error and output logs.
>>
>> Thanks in advance,
>> FG
>>
>> tkill: setting prefix to (null)
>> tkill: setting suffix to (null)
>> tkill: got killname back: /tmp/[EMAIL PROTECTED]/lam-killfile
>> tkill: f_kill = "/tmp/[EMAIL PROTECTED]/lam-killfile"
>> tkill: nothing to kill: "/tmp/[EMAIL PROTECTED]/lam-killfile"
>> Job launched in molevol1.ub.edu at Fri Jul  6 18:11:31 2007
>> Shutting down LAM
>> hreq: sending HALT_PING to n0 (molevol1.ub.edu)
>> hreq: received HALT_ACK from n0 (molevol1.ub.edu)
>> hreq: sending HALT_DIE to n0 (molevol1.ub.edu)
>> lamhalt: sleeping to wait for lamds to die
>> lamhalt: local LAM daemon halted
>> LAM halted
>> Job finished at Fri Jul  6 18:11:32 2007
>>
>> n-1<17278> ssi:boot:open: opening
>> n-1<17278> ssi:boot:open: opening boot module globus
>> n-1<17278> ssi:boot:open: opened boot module globus
>> n-1<17278> ssi:boot:open: opening boot module rsh
>> n-1<17278> ssi:boot:open: opened boot module rsh
>> n-1<17278> ssi:boot:open: opening boot module slurm
>> n-1<17278> ssi:boot:open: opened boot module slurm
>> n-1<17278> ssi:boot:open: opening boot module tm
>> n-1<17278> ssi:boot:open: opened boot module tm
>> n-1<17278> ssi:boot:select: initializing boot module slurm
>> n-1<17278> ssi:boot:slurm: not running under SLURM
>> n-1<17278> ssi:boot:select: boot module not available: slurm
>> n-1<17278> ssi:boot:select: initializing boot module globus
>> n-1<17278> ssi:boot:globus: globus-job-run not found, globus boot will not run
>> n-1<17278> ssi:boot:select: boot module not available: globus
>> n-1<17278> ssi:boot:select: initializing boot module tm
>> n-1<17278> ssi:boot:tm: module initializing
>> n-1<17278> ssi:boot:tm:verbose: 1000
>> n-1<17278> ssi:boot:tm:priority: 50
>> n-1<17278> ssi:boot:select: boot module available: tm, priority: 50
>> n-1<17278> ssi:boot:select: initializing boot module rsh
>> n-1<17278> ssi:boot:rsh: module initializing
>> n-1<17278> ssi:boot:rsh:agent: /usr/bin/ssh
>> n-1<17278> ssi:boot:rsh:username: <same>
>> n-1<17278> ssi:boot:rsh:verbose: 1000
>> n-1<17278> ssi:boot:rsh:algorithm: linear
>> n-1<17278> ssi:boot:rsh:no_n: 0
>> n-1<17278> ssi:boot:rsh:no_profile: 0
>> n-1<17278> ssi:boot:rsh:fast: 0
>> n-1<17278> ssi:boot:rsh:ignore_stderr: 0
>> n-1<17278> ssi:boot:rsh:priority: 10
>> n-1<17278> ssi:boot:select: boot module available: rsh, priority: 10
>> n-1<17278> ssi:boot:select: finalizing boot module slurm
>> n-1<17278> ssi:boot:slurm: finalizing
>> n-1<17278> ssi:boot:select: closing boot module slurm
>> n-1<17278> ssi:boot:select: finalizing boot module globus
>> n-1<17278> ssi:boot:globus: finalizing
>> n-1<17278> ssi:boot:select: closing boot module globus
>> n-1<17278> ssi:boot:select: finalizing boot module rsh
>> n-1<17278> ssi:boot:rsh: finalizing
>> n-1<17278> ssi:boot:select: closing boot module rsh
>> n-1<17278> ssi:boot:select: selected boot module tm
>> n-1<17278> ssi:boot:tm: found the following 1 hosts:
>> n-1<17278> ssi:boot:tm:   n0 molevol1.ub.edu (cpu=1)
>> n-1<17278> ssi:boot:tm: starting RTE procs
>> n-1<17278> ssi:boot:base:linear_windowed: starting
>> n-1<17278> ssi:boot:base:linear_windowed: window size: 5
>> n-1<17278> ssi:boot:base:server: opening server TCP socket
>> n-1<17278> ssi:boot:base:server: opened port 47671
>> n-1<17278> ssi:boot:base:linear_windowed: booting n0 (molevol1.ub.edu)
>> n-1<17278> ssi:boot:tm: starting wipe on (molevol1.ub.edu)
>> n-1<17278> ssi:boot:tm: starting on n0 (molevol1.ub.edu): /opt/lam-7.1.2/bin/tkill -setsid -d -v
>> n-1<17278> ssi:boot:tm: successfully launched on n0 (molevol1.ub.edu)
>> n-1<17278> ssi:boot:tm: waiting for completion on n0 (molevol1.ub.edu)
>> n-1<17278> ssi:boot:tm: finished on n0 (molevol1.ub.edu)
>> n-1<17278> ssi:boot:tm: starting lamd on (molevol1.ub.edu)
>> n-1<17278> ssi:boot:tm: starting on n0 (molevol1.ub.edu): /opt/lam-7.1.2/bin/lamd -H 161.116.70.157 -P 47671 -n 0 -o 0 -d
>> n-1<17278> ssi:boot:tm: successfully launched on n0 (molevol1.ub.edu)
>> n-1<17278> ssi:boot:base:linear_windowed: finished launching
>> n-1<17278> ssi:boot:base:server: expecting connection from finite list
>> n-1<17280> ssi:boot:open: opening
>> n-1<17280> ssi:boot:open: opening boot module globus
>> n-1<17280> ssi:boot:open: opened boot module globus
>> n-1<17280> ssi:boot:open: opening boot module rsh
>> n-1<17280> ssi:boot:open: opened boot module rsh
>> n-1<17280> ssi:boot:open: opening boot module slurm
>> n-1<17280> ssi:boot:open: opened boot module slurm
>> n-1<17280> ssi:boot:open: opening boot module tm
>> n-1<17280> ssi:boot:open: opened boot module tm
>> n-1<17280> ssi:boot:select: initializing boot module slurm
>> n-1<17280> ssi:boot:slurm: not running under SLURM
>> n-1<17280> ssi:boot:select: boot module not available: slurm
>> n-1<17280> ssi:boot:select: initializing boot module globus
>> n-1<17280> ssi:boot:globus: globus-job-run not found, globus boot will not run
>> n-1<17280> ssi:boot:select: boot module not available: globus
>> n-1<17280> ssi:boot:select: initializing boot module tm
>> n-1<17280> ssi:boot:tm: module initializing
>> n-1<17280> ssi:boot:tm:verbose: 1000
>> n-1<17280> ssi:boot:tm:priority: 50
>> n-1<17280> ssi:boot:select: boot module available: tm, priority: 50
>> n-1<17280> ssi:boot:select: initializing boot module rsh
>> n-1<17280> ssi:boot:rsh: module initializing
>> n-1<17280> ssi:boot:rsh:agent: /usr/bin/ssh
>> n-1<17280> ssi:boot:rsh:username: <same>
>> n-1<17280> ssi:boot:rsh:verbose: 1000
>> n-1<17280> ssi:boot:rsh:algorithm: linear
>> n-1<17280> ssi:boot:rsh:no_n: 0
>> n-1<17280> ssi:boot:rsh:no_profile: 0
>> n-1<17280> ssi:boot:rsh:fast: 0
>> n-1<17280> ssi:boot:rsh:ignore_stderr: 0
>> n-1<17280> ssi:boot:rsh:priority: 10
>> n-1<17280> ssi:boot:select: boot module available: rsh, priority: 10
>> n-1<17280> ssi:boot:select: finalizing boot module slurm
>> n-1<17280> ssi:boot:slurm: finalizing
>> n-1<17280> ssi:boot:select: closing boot module slurm
>> n-1<17280> ssi:boot:select: finalizing boot module globus
>> n-1<17280> ssi:boot:globus: finalizing
>> n-1<17280> ssi:boot:select: closing boot module globus
>> n-1<17280> ssi:boot:select: finalizing boot module rsh
>> n-1<17280> ssi:boot:rsh: finalizing
>> n-1<17280> ssi:boot:select: closing boot module rsh
>> n-1<17280> ssi:boot:select: selected boot module tm
>> n-1<17280> ssi:boot:send_lamd: getting node ID from command line
>> n-1<17280> ssi:boot:send_lamd: getting agent haddr from command line
>> n-1<17280> ssi:boot:send_lamd: getting agent port from command line
>> n-1<17280> ssi:boot:send_lamd: getting node ID from command line
>> n-1<17280> ssi:boot:send_lamd: connecting to 161.116.70.157:47671, node id 0
>> n-1<17280> ssi:boot:send_lamd: sending dli_port 32787
>> n-1<17278> ssi:boot:base:server: got connection from 161.116.70.157
>> n-1<17278> ssi:boot:base:server: this connection is expected (n0)
>> n-1<17278> ssi:boot:base:server: remote lamd is at 161.116.70.157:32787
>> n-1<17278> ssi:boot:base:server: closing server socket
>> n-1<17278> ssi:boot:base:server: connecting to lamd at 161.116.70.157:40164
>> n-1<17278> ssi:boot:base:server: connected
>> n-1<17278> ssi:boot:base:server: sending number of links (1)
>> n-1<17278> ssi:boot:base:server: sending info: n0 (molevol1.ub.edu)
>> n-1<17278> ssi:boot:base:server: finished sending
>> n-1<17278> ssi:boot:base:server: disconnected from 161.116.70.157:40164
>> n-1<17278> ssi:boot:base:linear_windowed: finished
>> n-1<17278> ssi:boot:tm: all RTE procs started
>> n-1<17278> ssi:boot:tm: finalizing
>> n-1<17278> ssi:boot: Closing
>> n-1<17280> ssi:boot:tm: finalizing
>> n-1<17280> ssi:boot: Closing
>> /home/molevol/mrbayes-3.1.2/mb_mpi: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
>> /home/molevol/mrbayes-3.1.2/mb_mpi: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
>> /home/molevol/mrbayes-3.1.2/mb_mpi: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
>> /home/molevol/mrbayes-3.1.2/mb_mpi: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
>> /home/molevol/mrbayes-3.1.2/mb_mpi: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
>> /home/molevol/mrbayes-3.1.2/mb_mpi: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
>> /home/molevol/mrbayes-3.1.2/mb_mpi: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
>> /home/molevol/mrbayes-3.1.2/mb_mpi: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
>> -----------------------------------------------------------------------------
>> It seems that [at least] one of the processes that was started with
>> mpirun did not invoke MPI_INIT before quitting (it is possible that
>> more than one process did not invoke MPI_INIT -- mpirun was only
>> notified of the first one, which was on node n0).
>>
>> mpirun can *only* be used with MPI programs (i.e., programs that
>> invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program
>> to run non-MPI programs over the lambooted nodes.
>> -----------------------------------------------------------------------------
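
Regarding the "liblamf77mpi.so.0: cannot open shared object file" lines above:
my understanding is that the LAM libraries are probably not on the runtime
linker path where the processes start. I will check with something along these
lines (the /opt/lam-7.1.2 prefix comes from the log; the lib subdirectory and
the exact commands are assumptions on my part):

    # Confirm the MPI binary can resolve its LAM libraries on the node
    # where the processes actually run.
    ldd /home/molevol/mrbayes-3.1.2/mb_mpi | grep -i lam
    # If liblamf77mpi.so.0 is reported as "not found", point the runtime
    # linker at the LAM install before the mpirun line in the job script.
    export LD_LIBRARY_PATH=/opt/lam-7.1.2/lib:$LD_LIBRARY_PATH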
> 

