Hi Jeremy,
Thanks for the response and for pointing out the other hostname-specific files that need to be changed. I ended up uninstalling and re-installing Oscar on the server only, and that seemed to work. I had tried redoing the "complete cluster install" step, which seemed to fix the pbsnodes problem, but after I rebooted, "pbsnodes -a" showed all nodes as down again and didn't include the properties information. Uninstalling, making sure to remove the Torque RPMs by hand, and then reinstalling seemed to do the trick. If I ever have to rename an Oscar cluster again, I will try the suggestions you provided. Fortunately, I only had one client image built on this server, so to rebuild Oscar all I had to do was define clients (I saved a file with all the MACs before doing that), reassign their MACs and build a standard client image. This would have been much less of an option if the server had been in production for several iterations of builds. Thanks again for getting back to me to help.
-Jen =)

At 1:09 AM -0600 11/16/04, Jeremy Enos wrote:
Jen-
Your client nodes need to have the server's new hostname specifically put in their /var/spool/pbs/mom_priv/config files. Also, if you've run "pbs_server -t create", then you've blown away the pbs database generated in its post_install script. You either need to set the appropriate environment and re-run Torque's post_install script by itself, or go in via the wizard and re-run the "complete cluster setup" step, which calls all package post_installs.
Notes:
The environment required to run pbs's post_install may just be OSCAR_HOME=/opt/oscar (or wherever you have it), but I'm not sure about that. You'll have to try it and see. Also, I think there is an update_mom_config script in the Torque package's scripts directory alongside post_install, and it will push out the new configuration. However, you will need to correct the mom_priv/config file on the server first. Make sure you set the server hostname to whatever `hostname` returns on the server. Torque and Maui are both very particular about this.
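A minimal sketch of the hostname check described above. It assumes the mom config names the server on a `$pbsserver` or `$clienthost` line (check which directive your file actually uses); the function names are hypothetical helpers, not part of Torque or OSCAR.

```shell
#!/bin/sh
# Print the server name(s) that a pbs_mom config file points at.
# Assumption: the file uses "$pbsserver <host>" or "$clienthost <host>".
mom_config_servers() {
    awk '$1 == "$pbsserver" || $1 == "$clienthost" { print $2 }' "$1"
}

# Succeed only if the file names at least one server and every
# server entry equals the expected hostname.
check_mom_config() {
    config=$1
    expected=$2
    found=$(mom_config_servers "$config")
    [ -n "$found" ] || return 1
    for name in $found; do
        [ "$name" = "$expected" ] || return 1
    done
    return 0
}
```

For example, `check_mom_config /var/spool/pbs/mom_priv/config "$(hostname)"` returns non-zero if the file does not point at the name the server itself reports.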
Jeremy
Jenny Aquilino wrote:
Hi,
OK, so I just built an Oscar 3.0 cluster using Torque and the add-on Maui package, and although things appeared to be working OK to start with, I've come across a number of problems that make me think I don't have things configured properly. To complicate matters, I recently had to change the name of my master node, so I'm not sure whether the problems I am having now are because of the name change or a more general configuration problem. Here is some information on the problem.
In changing the name of the master node, here are the various files/settings I changed:
1) Changed /etc/sysconfig/network, /etc/hosts and /etc/sysconfig/network-scripts/ifcfg-eth1 to reflect the new name and IP.
2) Logged onto all other nodes and changed /etc/hosts to reflect new name and IP address of master.
3) Stopped pbs_mom and pbs_server, altered the /var/spool/pbs/server_priv/nodes file to include the new name of the master, and then recreated the pbs database using "/opt/pbs/sbin/pbs_server -t create".
4) Changed /etc/maui/maui.cfg to reflect that the server host is h2o.llnl.gov.
5) Tried re-issuing the "Complete cluster install" from the GUI after rebooting the master and slave nodes and I think that's where I may have screwed things up.
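A rename like the one in the steps above touches several files, so one way to double-check it is to search for leftover references to the old name. This is only a sketch: `find_stale_refs` is a hypothetical helper, the file list you pass is your own checklist, and `grep -w` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Print each readable file that still mentions the old hostname.
# Usage: find_stale_refs OLDNAME file...
find_stale_refs() {
    old=$1
    shift
    for f in "$@"; do
        # -F: fixed string, -w: whole word, -q: quiet (exit status only)
        if [ -r "$f" ] && grep -qwF -- "$old" "$f"; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

For example, `find_stale_refs OLDNAME /etc/hosts /etc/maui/maui.cfg /var/spool/pbs/server_priv/nodes /var/spool/pbs/mom_priv/config`, with OLDNAME replaced by the master's previous name, prints any of those files that still refer to it. Unreadable or missing files are skipped silently.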
Now when I issue a "pbsnodes -a" command I get a listing that looks like this:
...
node3.cluster
     state = state-unknown,down
     np = 2
     properties = all
     ntype = cluster
All my nodes are reported as down even though the mpirun example tests run fine. Jobs submitted via qsub just hang, though, because pbs doesn't see any available nodes to run on. The other concern is that before, when I had first configured the server using Torque, the properties list contained a lot more information. Perhaps my re-issuing the pbs_server -t create command is what blew away that information. All I want is to be able to rename the server and have pbs (Torque) and maui work properly. If anyone has any ideas, I'm up for anything because at this point, nothing is working. Thanks in advance.
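To see at a glance which nodes the server considers down, a listing in the format shown above can be filtered with a short script. This is a sketch under the assumption that each node name sits alone on its own line, followed by "key = value" attribute lines; `down_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Read "pbsnodes -a" style output on stdin and print the name of every
# node whose state list includes "down".
down_nodes() {
    awk '
        NF == 1 { node = $1 }    # a line with a single field is a node name
        $1 == "state" && $2 == "=" && $3 ~ /(^|,)down(,|$)/ { print node }
    '
}
```

Typical use would be `pbsnodes -a | down_nodes`. An empty result means no node is reported as down.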
-Jen
------------------------------------------------------- This SF.Net email is sponsored by: InterSystems CACHE FREE OODBMS DOWNLOAD - A multidimensional database that combines robust object and relational technologies, making it a perfect match for Java, C++,COM, XML, ODBC and JDBC. www.intersystems.com/match8 _______________________________________________ Oscar-users mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/oscar-users
