Ashish-
Try running the pbs_shell test manually for me:

Starting as root:
su - oscartst
cd pbs
qsub -l nodes=2:ppn=1 pbs_script.shell
cat shelltest.err (should be blank)
cat shelltest.out

This should tell us if PBS itself is working ok or not.
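
The check of the two result files could be wrapped in a small helper. This is only a sketch: it assumes the job leaves shelltest.err and shelltest.out in the directory it was submitted from (as the cat commands above suggest), and the name check_shelltest is made up for illustration.

```shell
#!/bin/sh
# check_shelltest DIR
# Hypothetical helper: inspects the files the pbs_shell test is assumed
# to leave in DIR. Returns 0 if shelltest.out exists and shelltest.err
# is empty (or missing), 1 otherwise.
check_shelltest() {
    dir=${1:-.}
    if [ ! -e "$dir/shelltest.out" ]; then
        echo "shelltest.out not found (the job may not have run yet)"
        return 1
    fi
    # -s is true only when the file exists and has a size greater than zero
    if [ -s "$dir/shelltest.err" ]; then
        echo "shelltest.err is not blank:"
        cat "$dir/shelltest.err"
        return 1
    fi
    echo "shelltest.err is blank; shelltest.out contains:"
    cat "$dir/shelltest.out"
    return 0
}
```

After the qsub job has finished, something like "check_shelltest ~/pbs" run as oscartst would do the same two cat steps in one go.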

Jeremy

At 01:32 PM 4/2/2003 +0000, Ashish Navaney wrote:
Jeremy -

Here's the output of "pbsnodes -a" with 2 compute nodes in the cluster:

[EMAIL PROTECTED] oscar-2.2]# pbsnodes -a
node1.theory
      state = free
      np = 1
      properties = all
      ntype = cluster

node2.theory
      state = free
      np = 1
      properties = all
      ntype = cluster
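
Since the cluster test complains about "not enough free nodes", it may help to count the free nodes directly from this listing. A minimal sketch, assuming the "state = free" line format shown above; count_free_nodes is a made-up name, not part of PBS or OSCAR.

```shell
#!/bin/sh
# count_free_nodes: reads `pbsnodes -a` style text on stdin and prints
# how many lines are exactly "state = free" (ignoring indentation).
# Nodes in any other state (for example "down") are not counted.
count_free_nodes() {
    awk '$1 == "state" && $2 == "=" && $3 == "free" { n++ } END { print n + 0 }'
}
```

It would be used as "pbsnodes -a | count_free_nodes". For the two-node listing above it prints 2.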

- Using OSCAR 2.2 with RH 7.3 on machines with homogeneous configurations.

My hardware configuration:

P4 processor (2 GHz)
Mainboard KOB P4M266 NDFSMX (VIA Chipset)
256 MB SDRAM
40 GB Harddisk
Built-in Ethernet LAN 10BaseT/100BaseTX
External Ethernet Card (D-Link DFE-538TX) 10/100 Mbps Adapter
24-port 100 Mbps D-Link Network Switch


Note: I am using the external card for networking the cluster.


- The PVM test and the LAM/MPI test fail immediately during the cluster
  test (there's no timeout) and there is no error message.

- I have excluded HDF5 from the present configuration.

Here are the relevant PVM and LAM output and error files:

1)

/home/oscartst/pvm/pvmtest.out : blank

2)

/home/oscartst/pvm/pvmtest.err :


/var/spool/pbs/mom_priv/jobs/16.arjun.th.SC:pvmd:command not found
master1.c:37:18 : pvm3.h : No such file or directory
slave1.c:34:18 : pvm3.h : No such file or directory
/var/spool/pbs/mom_priv/jobs/16.arjun.th.SC: ./master1 : No such file or directory
pvmd3: no process killed


3)

/home/oscartst/lam/lamtest.out:

Running LAM/MPI test

MPI C Bindings Test -->

TEST FAILED!

Commands : mpicc cpi.c -o lam-cpi && mpirun C lam-cpi && lamclean

4)

/home/oscartst/lam/lamtest.err :

/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:lamboot : command not found
/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:mpicc : command not found
/var/spool/pbs/mom_priv/jobs/17.arjun.th.SC:lamhalt : command not found
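
One possible reading of these "command not found" lines is that the shell running the job cannot find the binaries on its PATH. That is an interpretation, not something confirmed here. A small sketch to check it from inside a job script or a login shell on a node; missing_commands is a made-up helper name.

```shell
#!/bin/sh
# missing_commands CMD...
# Prints each command name that cannot be resolved through the current
# PATH. Returns 0 if every command was found, 1 if any were missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        # command -v is the POSIX way to ask the shell how it would resolve a name
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return $rc
}
```

For example, "missing_commands pvmd lamboot mpicc lamhalt" lists whichever of the commands named in the error files the current environment cannot see.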


Thx in advance, Ashish



On Wed, 02 Apr 2003 Jeremy Enos wrote :
Ashish-
Please send your "pbsnodes -a" output. What type of hardware are you running on?
thx-


Jeremy

At 03:43 AM 4/2/2003 +0000, Ashish Navaney wrote:
Hi,

thx for replying

OSCAR 2.2 + RH 7.3
All machines have the same configuration (P4).

1)

When I ran 'pbsnodes -a' on the cluster, the output showed that the nodes are coming online
(I don't have the exact output at the moment),
but I do remember that all the nodes were listed, and for each of them the 'properties' and 'state' showed as 'free' and 'ntype=cluster'.


2)

After that I reconfigured OSCAR without the HDF5 package, so the PBS HDF5 test didn't happen, but
now the MPICH (via PBS) test fails, i.e. it times out.
The same error message appears: "Checking for 2 free nodes...not enough free nodes...tests incomplete.There were some issues running some user tests. Please check ur logs."



When I change the switcher option to LAM/MPI, the LAM/MPI (via PBS) test still fails.


Any suggestions? I have been stuck on this for days now. I even tried OSCAR 2.1 but I get the same problem.

3)

Is there any way I can manually install MPICH / LAM to work with the OSCAR cluster even though the OSCAR MPI fails?

I'm a final-year computer engineering student from India and need prompt help.

Thx in advance
Ashish Navaney
([EMAIL PROTECTED])

Message 1 :
Ashish-
It sounds like your second node isn't coming online. Add it and then run a
"pbsnodes -a" for me and send the output.

Jeremy

At 02:21 PM 3/30/2003 +0000, Ashish Navaney wrote:
Hi,
I need some help urgently.
I'm trying OSCAR 2.2 on RH 7.3.

With 1 server and 1 node the cluster tests successfully,
but on adding even one more node the PBS HDF5 test fails during the
cluster test (the 30-second timeout).
The following message appears:

"Checking for 2 free nodes...not enough free nodes...tests incomplete.
There were some issues running some user tests. Please check ur logs."

Also, when I delete the 2nd node the cluster passes the test.

Please help.
thx
Ashish Navaney
([EMAIL PROTECTED])




_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
