Bernard Li wrote:

What's the output of 'pbsnodes -a'?


pbsnodes -a reports every node as state-unknown,down:

[EMAIL PROTECTED] oscar]# pbsnodes -a
cc001.pg-207.computing.dcu.ie
    state = state-unknown,down
    np = 1
    properties = all
    ntype = cluster

cc002.pg-207.computing.dcu.ie
    state = state-unknown,down
    np = 1
    properties = all
    ntype = cluster

cc003.pg-207.computing.dcu.ie
    state = state-unknown,down
    np = 1
    properties = all
    ntype = cluster

cc004.pg-207.computing.dcu.ie
    state = state-unknown,down
    np = 1
    properties = all
    ntype = cluster
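
For anyone who wants to check this kind of output without reading it by eye, here is a minimal sketch that parses text laid out like the listing above and lists the nodes that are not free. It assumes that an unindented line is a node name, that indented lines are "key = value" attributes, and that a usable node reports exactly "state = free". The function names and the two-node sample are illustrative only, not part of OSCAR or TORQUE.

```python
# Sample copied from the pbsnodes -a listing in this thread (two of the four nodes).
SAMPLE = """\
cc001.pg-207.computing.dcu.ie
    state = state-unknown,down
    np = 1
    properties = all
    ntype = cluster

cc002.pg-207.computing.dcu.ie
    state = state-unknown,down
    np = 1
    properties = all
    ntype = cluster
"""


def parse_pbsnodes(text):
    """Parse pbsnodes-style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank line between node records
        if not line[0].isspace():
            # Unindented line: start of a new node record.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and "=" in line:
            # Indented "key = value" attribute of the current node.
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def nodes_not_free(nodes):
    """Return the sorted names of nodes whose state is not exactly 'free' (assumed)."""
    return sorted(name for name, attrs in nodes.items()
                  if attrs.get("state") != "free")


if __name__ == "__main__":
    for name in nodes_not_free(parse_pbsnodes(SAMPLE)):
        print(name)
```

To run it against a live cluster, the captured text of pbsnodes -a would be passed in instead of SAMPLE.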


Is pbs_mom running on all your client nodes?


Running ps aux | grep pbs_mom on all nodes shows that it is.


I have tried moving the pbs_oscar alias from the private to the public address in /etc/hosts,
with no success.

To recap:

   * OSCAR version 4.2.1b5
   * Fedora Core 3
   * x86

   * Successfully passed test_cluster after the initial setup with the head node and two compute nodes. Happy days.
   * test_cluster fails after adding two new nodes. Both are up and alive: they can mount /home and pass the SSH pings, PVM, etc., but they fail PBS.

/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111

The run then fails with "Not enough free nodes".
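
A note on the error=111 line: on Linux, errno 111 is ECONNREFUSED, which matches the "Connection refused" message in the test_cluster output. The sketch below pulls the server name and error code out of such a line and offers a small TCP probe. The helper names are hypothetical, and the port is left as a parameter because the port pbs_server listens on at this site is not stated in the thread (15001 is the usual TORQUE default, taken here as an assumption).

```python
import errno
import os
import re
import socket

# Error line copied from the message above.
ERROR_LINE = "/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111"


def parse_connect_error(line):
    """Return (server, errno_code) from a 'cannot connect to server' line, else None."""
    match = re.search(r"cannot connect to server ([^\s,]+), error=(\d+)", line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    server, code = parse_connect_error(ERROR_LINE)
    # errno numbering is platform specific; on Linux 111 maps to ECONNREFUSED.
    print(server, code, errno.errorcode.get(code, "unknown"), os.strerror(code))
    # Example probe, assuming the default pbs_server port (not confirmed here):
    # print(can_connect(server, 15001))
```

Running the probe from a compute node as well as from the head node would show whether the pbs_oscar name in /etc/hosts points at an address where the server is actually listening.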


/nc

Cheers,

Bernard

Well, it was going well.

I added two more nodes
and now it fails:

[EMAIL PROTECTED] oscar]# testing/test_cluster
Performing root tests...
Maui service check:maui [PASSED]
Shutting down TORQUE Server:                               [  OK  ]
Connection refused
/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111
Torque node check [PASSED]
Starting TORQUE Server:                                    [  OK  ]
Torque service check:pbs_server [PASSED]
/home mounts [PASSED]

Preparing user tests...
Performing user tests...
SSH ping test [PASSED]
SSH server->node [PASSED]
SSH node->server [PASSED]
Checking for 4 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 4 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 4 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Torque default queue definition [PASSED]
Checking for 4 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Ganglia setup test [PASSED]
Ganglia node count test [PASSED]



_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users
