It looks like the LAM installed with your OSCAR does not include support for PBS. I'm not quite sure how that happened, but that is definitely causing everything to be launched on localhost (i.e., if there's no PBS support and you don't provide a hostfile, LAM will execute on just the localhost).

A workaround for this is to use lamboot / mpirun / lamhalt (instead of mpiexec), but to use the hostfile $PBS_NODEFILE, which is created specially for each PBS job and contains a list of all the nodes just in your job. For example:

-----
lamboot $PBS_NODEFILE
mpirun ...
lamhalt
------
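To make this concrete, a complete PBS submission script wrapping those three commands might look like the following. This is only a sketch: the job name, the nodes/ppn request, and the mpiblast arguments (database, input, output filenames) are illustrative examples, not values from your setup.

```shell
#!/bin/sh
# Hypothetical PBS job script (sketch) -- resource values are examples only.
#PBS -N mpiblast
#PBS -l nodes=4:ppn=1
#PBS -j oe

# PBS starts the job in $HOME; change to the submission directory.
cd $PBS_O_WORKDIR

# Boot the LAM universe only on the nodes PBS assigned to this job.
lamboot $PBS_NODEFILE

# "C" tells LAM's mpirun to start one process per CPU it booted on.
mpirun C mpiblast -p blastx -d nr -i frag.0 -o frag.0.out

# Tear down the LAM universe when the job finishes.
lamhalt
```

Submitted with `qsub`, this keeps all mpiblast processes on the nodes PBS reserved, and PBS knows those nodes are in use for the duration of the job.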

On Mar 4, 2005, at 9:46 AM, Salvatore Di Nardo wrote:



How did you verify this?

 cexec "ps -aux | grep mpiblast"
 and i see that ALL processes are on the same node




You should not need to do any patching.

So it should work, but it did not work. Why?


What version of OSCAR are you running, and what version of LAM/MPI?
I simply installed OSCAR 4, and I'm using the default LAM/MPI that comes with OSCAR 4.



Can you send the output of laminfo?


===============================================
           LAM/MPI: 7.0.6
            Prefix: /opt/lam-7.0.6
      Architecture: i686-redhat-linux-gnu
     Configured by: root
     Configured on: Tue Nov 2 23:56:09 EST 2004
    Configure host: headnode
        C bindings: yes
      C++ bindings: yes
  Fortran bindings: yes
       C profiling: yes
     C++ profiling: yes
 Fortran profiling: yes
     ROMIO support: yes
      IMPI support: no
     Debug support: no
      Purify clean: no
          SSI boot: globus (Module v0.5)
          SSI boot: rsh (Module v1.0)
          SSI coll: lam_basic (Module v7.0)
          SSI coll: smp (Module v1.0)
           SSI rpi: crtcp (Module v1.0.1)
           SSI rpi: lamd (Module v7.0)
           SSI rpi: sysv (Module v7.0)
           SSI rpi: tcp (Module v7.0)
           SSI rpi: usysv (Module v7.0)
===============================================



Just to explain it better: mpiblast works well if I don't use Torque/PBS, using

 >lamboot <my_hostfile>
 >mpirun -np <N> mpiblast -p blastx -d nr -i frag.0 -o frag.0.out6p_pbs
 >lamhalt

As far as I understood, mpirun is unable to retrieve the list of free nodes from PBS, so all jobs are sent to the first node reserved by PBS. What I need is to be able to retrieve the node list from PBS, and for PBS to know that there are jobs running on those nodes. If I run mpiblast outside of the PBS environment, not only do I need to check beforehand which nodes are free, but the main problem is that even if I find free nodes, PBS can submit other jobs to them during the mpiblast execution.

 Salvatore Di Nardo





--
{+} Jeff Squyres
{+} [EMAIL PROTECTED]
{+} http://www.lam-mpi.org/



_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users
