Jen-
Your client nodes need the server's new hostname set explicitly in their /var/spool/pbs/mom_priv/config files. Also, if you've run "pbs_server -t create", then you've blown away the pbs database that Torque's post_install script generated. You either need to set the appropriate environment and re-run Torque's post_install script by itself, or go in via the wizard and re-run the "complete cluster setup" step, which calls all package post_installs.
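
For reference, here is a rough Python sketch of what each node's mom_priv/config needs to say. It assumes your Torque version reads the server name from a "$pbsserver" line, and it uses h2o.llnl.gov only as the example name from your message. The function names are made up for illustration; they are not part of Torque or OSCAR.

```python
import socket


def render_mom_config(server_hostname):
    """Return mom_priv/config text that points pbs_mom at the given server."""
    # Assumption: pbs_mom takes the server name from a "$pbsserver" line.
    return "$pbsserver {}\n".format(server_hostname)


def write_mom_config(path, server_hostname=None):
    """Write the config to path; defaults to this machine's own hostname."""
    # With no name given, use what this host calls itself, so run it on the
    # server if you rely on the default.
    if server_hostname is None:
        server_hostname = socket.gethostname()
    with open(path, "w") as handle:
        handle.write(render_mom_config(server_hostname))


if __name__ == "__main__":
    # Print the line rather than touching /var/spool/pbs directly.
    print(render_mom_config("h2o.llnl.gov"), end="")
```

pbs_mom most likely has to be restarted on each node before it picks up the change.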


Notes:
The environment required to run pbs's post_install may just be OSCAR_HOME=/opt/oscar (or wherever you have it), but I'm not sure about that. You'll have to try it and see. Also, I think the Torque package's scripts directory has an update_mom_config script alongside post_install, which will push out the new configuration. However, you will need to correct the mom_priv/config file on the server first. Make sure you set the server hostname to whatever `hostname` returns on the server. Torque and Maui are both very particular about this.
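
If you want to script the re-run, something like the following Python sketch might do. It assumes OSCAR_HOME really is the only variable post_install needs, which I have not verified. The script path is left as a parameter because it depends on where your Torque package lives, and the function names are made up for illustration.

```python
import os
import subprocess


def oscar_env(oscar_home="/opt/oscar", base_env=None):
    """Return a copy of the environment with OSCAR_HOME set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: OSCAR_HOME is all that post_install looks for.
    env["OSCAR_HOME"] = oscar_home
    return env


def run_with_oscar_home(script_path, oscar_home="/opt/oscar"):
    """Run the given script with OSCAR_HOME set and return its exit code."""
    result = subprocess.run([script_path], env=oscar_env(oscar_home), check=False)
    return result.returncode
```

A non-zero return code from run_with_oscar_home would suggest the script wants more of the environment than this.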


   Jeremy

Jenny Aquilino wrote:

Hi,

Ok, so I just built an Oscar3.0 cluster using Torque and the add-on Maui package and although it appeared things were working ok to start out with, I've come across a number of problems that make me think I don't have things configured properly. To complicate matters, I recently had to change the name of my master node so I'm not sure whether the problems I am having now are because of the name change or a more general configuration problem. Here is some information on the problem.

In changing the name of the master node, here are the various files/settings I changed:
1) Changed /etc/sysconfig/network, /etc/hosts and /etc/sysconfig/network-scripts/ifcfg-eth1 to reflect the new name and IP.
2) Logged onto all other nodes and changed /etc/hosts to reflect new name and IP address of master.
3) Stopped pbs_mom and pbs_server and recreated the pbs database after altering the /var/spool/pbs/server_priv/nodes file to include the new name of the master using "/opt/pbs/sbin/pbs_server -t create"
4) Changed /etc/maui/maui.cfg to reflect that the server host is h2o.llnl.gov.
5) Tried re-issuing the "Complete cluster install" from the GUI after rebooting the master and slave nodes and I think that's where I may have screwed things up.
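
A quick way to double-check edits like the ones above is to scan the files for the new name. The Python sketch below does that, assuming a plain substring match is good enough. The file list only covers paths mentioned in this thread, and the function name is made up for illustration.

```python
import os

# Paths mentioned in this thread; adjust for your install.
FILES_TO_CHECK = [
    "/etc/sysconfig/network",
    "/etc/hosts",
    "/var/spool/pbs/mom_priv/config",
    "/etc/maui/maui.cfg",
]


def files_missing_hostname(hostname, paths=FILES_TO_CHECK):
    """Return the readable files in paths that never mention hostname."""
    missing = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # file not present on this machine
        try:
            with open(path, errors="replace") as handle:
                contents = handle.read()
        except OSError:
            continue  # e.g. not running as root
        if hostname not in contents:
            missing.append(path)
    return missing


if __name__ == "__main__":
    for path in files_missing_hostname("h2o.llnl.gov"):
        print("no mention of the new name in", path)
```

Files that are absent or unreadable are skipped silently, so an empty result is only reassuring when run as root on the machine in question.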


Now when I issue a "pbsnodes -a" command I get a listing that looks like this:
...
node3.cluster
    state = state-unknown,down
    np = 2
    properties = all
    ntype = cluster


All my nodes are reported as down even though the mpirun example tests run fine. Jobs submitted via qsub just hang, though, because pbs doesn't see any available nodes to run on. The other concern is that before, when I had configured the server using Torque, the properties list contained a lot more information. Perhaps my re-issuing the pbs_server -t create command is what blew away that information. All I want is to be able to rename the server and have pbs (Torque) and maui work properly. If anyone has any ideas, I'm up for anything because at this point, nothing is working. Thanks in advance.
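
To see at a glance which nodes are reported down, a listing like the one above can be parsed with a few lines of Python. This is only a sketch: it assumes each node name sits on its own line, followed by "key = value" lines, as in the sample, and the function names are made up for illustration.

```python
SAMPLE = """\
node3.cluster
state = state-unknown,down
np = 2
properties = all
ntype = cluster
"""


def parse_pbsnodes(text):
    """Turn "pbsnodes -a" style text into {node_name: {key: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " = " in line:
            # Attribute line; attach it to the most recent node name.
            if current is not None:
                key, value = line.split(" = ", 1)
                nodes[current][key] = value
        else:
            # Any other non-blank line is treated as a node name.
            current = line
            nodes[current] = {}
    return nodes


def down_nodes(nodes):
    """Return the sorted names of nodes whose state list includes "down"."""
    return sorted(
        name
        for name, attrs in nodes.items()
        if "down" in attrs.get("state", "").split(",")
    )


if __name__ == "__main__":
    print(down_nodes(parse_pbsnodes(SAMPLE)))
```

On a live system the text would come from running "pbsnodes -a" and capturing its output rather than from the SAMPLE string.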

-Jen


------------------------------------------------------- This SF.Net email is sponsored by: InterSystems CACHE FREE OODBMS DOWNLOAD - A multidimensional database that combines robust object and relational technologies, making it a perfect match for Java, C++,COM, XML, ODBC and JDBC. www.intersystems.com/match8 _______________________________________________ Oscar-users mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/oscar-users


