The odd thing is that OSCAR 4 passed all tests (including LAM on PBS). Anyway, I'll try your hint (lamboot $PBS_NODEFILE) as soon as possible and give you feedback.

Thanks in advance


On Fri, 2005-03-04 at 16:06, Jeff Squyres wrote:
It looks like the LAM installed with your OSCAR does not include 
support for PBS.  I'm not quite sure how that happened, but that is 
definitely causing everything to be launched on localhost (i.e., if 
there's no PBS support and you don't provide a hostfile, LAM will 
execute on just the localhost).

A workaround for this is to use lamboot / mpirun / lamhalt (instead of 
mpiexec), but to use the hostfile $PBS_NODEFILE, which is created 
specially for each PBS job and contains a list of all the nodes just in 
your job.  For example:

-----
lamboot $PBS_NODEFILE
mpirun ...
lamhalt
------
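For reference, the workaround above could be wrapped in a PBS submission script along these lines. This is only a sketch: the node count, walltime, process-count syntax, and the mpiblast arguments are illustrative assumptions, not values from this thread.

```shell
#!/bin/sh
#PBS -l nodes=4             # placeholder node count
#PBS -l walltime=1:00:00    # placeholder walltime

# PBS starts the job in $HOME; move to the submission directory
cd $PBS_O_WORKDIR

# Boot LAM on exactly the nodes PBS allocated to this job
lamboot $PBS_NODEFILE

# "C" is LAM shorthand for "one process per available CPU"
mpirun C mpiblast -p blastx -d nr -i frag.0 -o frag.0.out

# Shut the LAM daemons down before the job exits
lamhalt
```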

On Mar 4, 2005, at 9:46 AM, Salvatore Di Nardo wrote:

>
>
> How did you verify this?
>
>  cexec "ps -aux | grep mpiblast"
>  and I see that ALL processes are on the same node
>
>
>
>
> You should not need to do any patching.
>  So it should work, but it did not. Why?
>
>
> What version of OSCAR are you running, and what version of LAM/MPI? 
>  I simply installed Oscar 4, and I'm using the default LAM/MPI that 
> comes with Oscar 4
>
>
> Can you send the output of laminfo?
>
>
> ===============================================
>             LAM/MPI: 7.0.6
>              Prefix: /opt/lam-7.0.6
>        Architecture: i686-redhat-linux-gnu
>       Configured by: root
>       Configured on: Tue Nov  2 23:56:09 EST 2004
>      Configure host: headnode
>          C bindings: yes
>        C++ bindings: yes
>    Fortran bindings: yes
>         C profiling: yes
>       C++ profiling: yes
>  Fortran profiling: yes
>       ROMIO support: yes
>        IMPI support: no
>       Debug support: no
>        Purify clean: no
>            SSI boot: globus (Module v0.5)
>            SSI boot: rsh (Module v1.0)
>            SSI coll: lam_basic (Module v7.0)
>            SSI coll: smp (Module v1.0)
>             SSI rpi: crtcp (Module v1.0.1)
>             SSI rpi: lamd (Module v7.0)
>             SSI rpi: sysv (Module v7.0)
>             SSI rpi: tcp (Module v7.0)
>             SSI rpi: usysv (Module v7.0)
>  ===============================================
>
>
>
>  Just to explain it better: mpiblast works well if I don't use 
> torque/PBS, using
>
>  >lamboot <my_hostfile>
>  >mpirun -np <N> mpiblast -p blastx -d nr -i frag.0 -o frag.0.out6p_pbs
>  >lamhalt
>
>  As far as I understood, mpirun is unable to retrieve the list of free 
> nodes from PBS, so all jobs are sent to the first node reserved by PBS. 
> What I need is to be able to retrieve that list from PBS, and for PBS 
> to know that there are jobs running on those nodes. If I run mpiblast 
> outside the PBS environment, not only do I need to check beforehand 
> which nodes are free, but the main problem is that even if I find 
> free nodes, PBS can submit other jobs to them during mpiblast execution. 
>
>  Salvatore Di Nardo
>
>
>
>
