Hi,

Ok, so I just built an OSCAR 3.0 cluster using Torque and the add-on Maui package. Things appeared to be working ok at first, but I've since come across a number of problems that make me think I don't have things configured properly. To complicate matters, I recently had to change the name of my master node, so I'm not sure whether the problems I am having now are caused by the name change or by a more general configuration problem. Here is some information on the problem.

In changing the name of the master node, here are the various files/settings I changed:
1) Changed /etc/sysconfig/network, /etc/hosts and /etc/sysconfig/network-scripts/ifcfg-eth1 to reflect the new name and IP.
2) Logged onto all other nodes and changed /etc/hosts to reflect the new name and IP address of the master.
3) Stopped pbs_mom and pbs_server, altered the /var/spool/pbs/server_priv/nodes file to include the new name of the master, and then recreated the pbs database using "/opt/pbs/sbin/pbs_server -t create".
4) Changed /etc/maui/maui.cfg to reflect that the server host is h2o.llnl.gov.
5) Rebooted the master and slave nodes, then tried re-issuing the "Complete cluster install" step from the GUI. I think that's where I may have screwed things up.
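
One way to double-check the steps above is to scan the files I edited for any leftover mention of the old master name. This is only a rough sketch: the file list is just the files named in the steps, the old name "oldmaster" is a placeholder (I haven't given the real one here), and a clean result would not prove that Torque or Maui keep no other copy of the name elsewhere.

```python
from pathlib import Path

# Files touched during the rename, as listed in the steps above.
CONFIG_FILES = [
    "/etc/hosts",
    "/etc/sysconfig/network",
    "/etc/sysconfig/network-scripts/ifcfg-eth1",
    "/var/spool/pbs/server_priv/nodes",
    "/etc/maui/maui.cfg",
]


def lines_mentioning(text, name):
    """Return (line_number, stripped_line) pairs for lines of text containing name."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if name in line:
            hits.append((number, line.strip()))
    return hits


def scan_files(paths, old_name):
    """Map each readable path to the lines in it that still mention old_name."""
    report = {}
    for path in paths:
        try:
            text = Path(path).read_text(errors="replace")
        except OSError:
            continue  # missing or unreadable file: skip it
        hits = lines_mentioning(text, old_name)
        if hits:
            report[path] = hits
    return report


if __name__ == "__main__":
    OLD_NAME = "oldmaster"  # hypothetical placeholder for the previous master name
    for path, hits in scan_files(CONFIG_FILES, OLD_NAME).items():
        for number, line in hits:
            print(f"{path}:{number}: {line}")
```

Run on the master and on each node, this prints any line that still carries the old name; no output means none of the listed files mention it.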


Now when I issue a "pbsnodes -a" command I get a listing that looks like this:
...
node3.cluster
     state = state-unknown,down
     np = 2
     properties = all
     ntype = cluster

All my nodes are reported as down even though the mpirun example tests run fine. Jobs submitted via qsub just hang, though, because no nodes are seen as available to run on.

The other concern is that before, when I had first configured the server using Torque, the properties list contained a lot more information. Perhaps my re-issuing the pbs_server -t create command is what blew away that information.

All I want is to be able to rename the server and have pbs (Torque) and maui work properly. If anyone has any ideas, I'm up for anything because at this point, nothing is working. Thanks in advance.
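
To see at a glance which nodes the server considers unavailable, the "pbsnodes -a" listing can be parsed with a few lines of script. This is a minimal sketch that assumes the output always has the shape shown above (an unindented node name followed by indented "key = value" lines, with blank lines between nodes). The node3.cluster entry in the sample is copied from my listing; node4.cluster and its "free" state are made up purely for contrast.

```python
SAMPLE = """\
node3.cluster
     state = state-unknown,down
     np = 2
     properties = all
     ntype = cluster

node4.cluster
     state = free
     np = 2
     properties = all
     ntype = cluster
"""  # node4.cluster is a hypothetical entry, not from the real cluster


def parse_pbsnodes(text):
    """Parse `pbsnodes -a` style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            current = None  # a blank line ends the current node block
            continue
        if not raw[0].isspace():
            current = raw.strip()  # an unindented line starts a new node
            nodes[current] = {}
        elif current is not None and "=" in raw:
            key, _, value = raw.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def down_nodes(nodes):
    """Return sorted names of nodes whose state includes down or state-unknown."""
    flagged = []
    for name, attrs in nodes.items():
        states = [s.strip() for s in attrs.get("state", "").split(",")]
        if "down" in states or "state-unknown" in states:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    for name in down_nodes(parse_pbsnodes(SAMPLE)):
        print(name)
```

To use it on a live system, the text captured from "pbsnodes -a" would be passed to parse_pbsnodes in place of SAMPLE.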

-Jen

