Bernard Li wrote:
What's the output of 'pbsnodes -a'?
'pbsnodes -a' reports every node as state-unknown,down:
[EMAIL PROTECTED] oscar]# pbsnodes -a
cc001.pg-207.computing.dcu.ie
state = state-unknown,down
np = 1
properties = all
ntype = cluster
cc002.pg-207.computing.dcu.ie
state = state-unknown,down
np = 1
properties = all
ntype = cluster
cc003.pg-207.computing.dcu.ie
state = state-unknown,down
np = 1
properties = all
ntype = cluster
cc004.pg-207.computing.dcu.ie
state = state-unknown,down
np = 1
properties = all
ntype = cluster
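For a quick summary of which nodes TORQUE considers unavailable, a listing like the one above can be parsed with a few lines of Python. This is only a sketch: the names parse_pbsnodes and nodes_not_free are made up for illustration, and it assumes the layout shown above (a node name on its own line, followed by "key = value" lines).

```python
# Sample text copied from the first two nodes of the listing above.
SAMPLE = """\
cc001.pg-207.computing.dcu.ie
state = state-unknown,down
np = 1
properties = all
ntype = cluster
cc002.pg-207.computing.dcu.ie
state = state-unknown,down
np = 1
properties = all
ntype = cluster
"""


def parse_pbsnodes(text):
    """Map each node name to its attributes from 'pbsnodes -a' style text."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " = " in line:
            # Attribute line; ignored if no node name has been seen yet.
            if current is not None:
                key, value = line.split(" = ", 1)
                nodes[current][key.strip()] = value.strip()
        else:
            # Any line without " = " is treated as a node name (assumption).
            current = line
            nodes[current] = {}
    return nodes


def nodes_not_free(nodes):
    """Return the sorted names of nodes whose state is not exactly 'free'."""
    return sorted(
        name for name, attrs in nodes.items() if attrs.get("state") != "free"
    )


print(nodes_not_free(parse_pbsnodes(SAMPLE)))
```

In practice the text would come from running 'pbsnodes -a' on the head node rather than from a hard-coded sample.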
Is pbs_mom running on all your client nodes?
Running 'ps aux | grep pbs_mom' on all nodes shows that it is.
I have tried moving the pbs_oscar alias from the private to the public
address in /etc/hosts,
with no success.
To recap:
* OSCAR version 4.2.1b5
* Fedora Core 3
* x86
- test_cluster passed after the initial setup with the head node
and two compute nodes. Happy days.
- The test fails after adding two new nodes, which are up and alive: they can
mount /home and pass the SSH pings, PVM, etc.,
but fail PBS:
/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111
It then fails with "Not enough free nodes".
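On Linux, error=111 is ECONNREFUSED: the name resolved and the host answered, but nothing was listening on the port. A rough way to tell "name does not resolve" apart from "connection refused" is sketched below. The function name probe is hypothetical, and the port 15001 is an assumption (the usual TORQUE pbs_server default); check your own configuration before relying on it.

```python
import socket

# Assumption: 15001 is the pbs_server port on this installation.
PBS_SERVER_PORT = 15001


def probe(host, port=PBS_SERVER_PORT, timeout=3.0):
    """Return (status, detail) for a TCP connection attempt to host:port."""
    try:
        # Shows which address an alias such as pbs_oscar actually maps to.
        address = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return ("unresolved", str(exc))
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return ("open", address)
    except ConnectionRefusedError:
        # ECONNREFUSED (111 on Linux): host reachable, no listener on the port.
        return ("refused", address)
    except OSError as exc:
        # Timeouts, unreachable networks, and other socket errors.
        return ("error", f"{address}: {exc}")
```

Calling probe("pbs_oscar") on the head node and on a compute node would show which address the alias resolves to on each machine and whether a server is accepting connections there.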
/nc
Cheers,
Bernard
Well, it was going well.
I added two more nodes,
and now it fails:
[EMAIL PROTECTED] oscar]# testing/test_cluster
Performing root tests...
Maui service check:maui                          [PASSED]
Shutting down TORQUE Server: [ OK ]
Connection refused
/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111
Torque node check                                [PASSED]
Starting TORQUE Server: [ OK ]
Torque service check:pbs_server                  [PASSED]
/home mounts                                     [PASSED]
Preparing user tests...
Performing user tests...
SSH ping test                                    [PASSED]
SSH server->node                                 [PASSED]
SSH node->server                                 [PASSED]
Checking for 4 free nodes:                       [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 4 free nodes:                       [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 4 free nodes:                       [FAILED]
Not enough free nodes. Tests incomplete.
Torque default queue definition                  [PASSED]
Checking for 4 free nodes:                       [FAILED]
Not enough free nodes. Tests incomplete.
Ganglia setup test                               [PASSED]
Ganglia node count test                          [PASSED]