Ashish- Try running the pbs_shell test manually for me:
Starting as root:
    su - oscartst
    cd pbs
    qsub -l nodes=2:ppn=1 pbs_script.shell
    cat shelltest.err    (should be blank)
    cat shelltest.out
This should tell us if PBS itself is working ok or not.
Jeremy
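The check on the two result files can be sketched as a small shell helper. This is only an illustration: the function name check_shelltest is hypothetical, and it assumes the job leaves shelltest.err and shelltest.out in the directory the job was submitted from, as the cat commands above suggest.

```shell
#!/bin/sh
# Hypothetical helper (not part of OSCAR or PBS): summarise the result of the
# manual pbs_shell test. Assumes shelltest.out and shelltest.err are written
# to the directory passed as $1 (default: current directory).
check_shelltest() {
    dir=${1:-.}
    # The job may still be queued or running if no output file exists yet.
    if [ ! -e "$dir/shelltest.out" ]; then
        echo "no output yet"
        return 2
    fi
    # A non-empty error file means the job reported problems.
    # A missing or empty error file is treated as "blank".
    if [ -s "$dir/shelltest.err" ]; then
        echo "errors reported"
        return 1
    fi
    echo "ok"
}
```

After the job finishes, it could be run as `check_shelltest ~oscartst/pbs`; the contents of shelltest.out still need to be read by hand.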
At 01:32 PM 4/2/2003 +0000, Ashish Navaney wrote:
Jeremy -
Here's the output of "pbsnodes -a" with 2 compute nodes in the cluster:
[EMAIL PROTECTED] oscar-2.2]# pbsnodes -a
node1.theory
     state = free
     np = 1
     properties = all
     ntype = cluster

node2.theory
     state = free
     np = 1
     properties = all
     ntype = cluster
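Since the cluster test complains about "not enough free nodes", it can help to count how many nodes PBS reports as free. A minimal sketch, assuming the "state = free" line format shown in the output above; the function name count_free_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count nodes whose state is exactly "free" in
# `pbsnodes -a` output read from standard input. Nodes in any other state
# (for example "down" or "offline") are not counted.
count_free_nodes() {
    awk '$1 == "state" && $2 == "=" && $3 == "free" { n++ } END { print n + 0 }'
}
```

Usage would be `pbsnodes -a | count_free_nodes`; for a two-node test the result should be at least 2.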
- Using OSCAR 2.2 with Red Hat 7.3 on machines with a homogeneous configuration.
My hardware configuration :
P4 processor (2 GHz)
Mainboard KOB P4M266 NDFSMX (VIA chipset)
256 MB SDRAM
40 GB hard disk
Built-in Ethernet LAN 10BaseT/100BaseTX
External Ethernet card (D-Link DFE-538TX), 10/100 Mbps adapter
24-port 100 Mbps D-Link network switch
Note: I am using the external card for networking the cluster.
- The PVM test and the LAM/MPI test fail immediately during the cluster test (there is no timeout), and there is no error message.
- I have excluded HDF5 from the present configuration.
Here are the relevant PVM and LAM output and error files:
1)
/home/oscartst/pvm/pvmtest.out : blank
2)
/home/oscartst/pvm/pvmtest.err :
/var/spool/pbs/mom_priv/jobs/16.arjun.th.SC:pvmd:command not found
master1.c:37:18 : pvm3.h : No such file or directory
slave1.c:34:18 : pvm3.h : No such file or directory
/var/spool/pbs/mom_priv/jobs/16.arjun.th.SC: ./master1 : No such file or directory
pvmd3: no process killed
3)
/home/oscartst/lam/lamtest.out:
Running LAM/MPI test
MPI C Bindings Test -->
TEST FAILED!
Commands : mpicc cpi.c -o lam-cpi && mpirun C lam-cpi && lamclean
4)
/home/oscartst/lam/lamtest.err :
/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:lamboot : command not found
/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:mpicc : command not found
/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:lamhalt : command not found
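The "command not found" lines above mean the shell running the PBS job could not find pvmd, lamboot, mpicc, or lamhalt on its PATH. One way to confirm this is to list which of those commands are missing from the environment of the test user. A minimal sketch; the function name missing_commands is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command given as an argument
# that cannot be found in the current shell environment (PATH, builtins,
# functions). Prints nothing if all of them are found.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

Run as the test user, for example `missing_commands pvmd lamboot mpicc lamhalt`. Placing the same call in a small job script submitted with qsub would show what the job environment itself can see, which may differ from a login shell.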
Thx in advance, Ashish
On Wed, 02 Apr 2003 Jeremy Enos wrote:
Ashish-
Please send your "pbsnodes -a" output. What type of hardware are you running on?
thx-
Jeremy
At 03:43 AM 4/2/2003 +0000, Ashish Navaney wrote:
Hi,
Thanks for replying.
OSCAR 2.2 + Red Hat 7.3, all machines with the same configuration (P4).
1)
When I ran 'pbsnodes -a' on the cluster, the output showed that the nodes are coming online
(I don't have the exact output at this moment),
but I do remember that all the nodes were listed, and for each of them the 'properties' and 'state' were shown, with the state 'free' and 'ntype = cluster'.
2)
After that I reconfigured OSCAR without the HDF5 package, so the PBS HDF5 test didn't happen, but
now the MPICH (via PBS) test fails, i.e. it times out.
The same error message appears: "Checking for 2 free nodes...not enough free nodes...tests incomplete. There were some issues running some user tests. Please check ur logs."
When I change the switcher option to LAM/MPI, the LAM/MPI (via PBS) test still fails.
Any suggestions? I have been stuck on this for days now. I even tried OSCAR 2.1, but I get the same problem.
3)
Is there any way I can manually install MPICH / LAM to work with the OSCAR cluster even though the OSCAR MPI test fails?
I'm a final-year computer engineering student from India and need prompt help.
Thx in advance, Ashish Navaney ([EMAIL PROTECTED])
Message 1:
Ashish-
It sounds like your second node isn't coming online. Add it and then run a "pbsnodes -a" for me and send the output.
Jeremy
At 02:21 PM 3/30/2003 +0000, Ashish Navaney wrote:
Hi, I need some help urgently. I am trying OSCAR 2.2 on Red Hat 7.3.
With 1 server and 1 node the cluster tests successfully, but on adding even one more node the PBS HDF5 test fails during the cluster test. After the 30-second timeout the following message appears:
"Checking for 2 free nodes...not enough free nodes...tests incomplete. There were some issues running some user tests. Please check ur logs."
Also, when I delete the 2nd node the cluster passes the test.
Please help. Thx, Ashish Navaney ([EMAIL PROTECTED])
_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
