Bernard, I believe I have provided all the info that you requested. Thanks for your help.
>>Q: Was the job queued or was it running?
A: Here is an example of Q'd jobs ... When running the cluster test the job goes from running to Q. If the job is deleted and test_cluster is rerun, the job Qs again just as listed below.

                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
12.redrock      oscartst workq    shelltest    --     2   1   --   10000 Q   --

>>Q: Contents of /home/oscartst/torque:
A:
-rwxr-xr-x 1 oscartst oscartst  364 Jun 7 09:32 pbs_script.shell
-rwxr-xr-x 1 oscartst oscartst 1657 Jun 7 09:32 test_root
-rwxr-xr-x 1 oscartst oscartst 1015 Jun 7 09:32 test_user
No logs present.

>>Q: Torque related logs in /var/spool/pbs:
A: No torque logs found in any of the directories.
./pbs:
/aux
/checkpoint
/mom_logs
/mom_priv
pbs_environment
/sched_logs
/sched_priv
/server_logs
server_name
/server_priv
/spool
/undelivered

>>Q: Output of ./test_cluster.
A: Output below.

#-------------------------- ./test_cluster output -------------------------------------#
Performing root tests...
Maui service check:maui                 PASSED
Torque node check                       PASSED
Torque service check:pbs_server         PASSED
/home mounts
/home mounts eqoscarnode1.eqoscardomain
/home mounts eqoscarnode2.eqoscardomain
/home mounts                            PASSED
Preparing user tests...
Performing user tests...
SSH ping test                           PASSED
SSH server->node                        PASSED
SSH node->server                        PASSED
Ganglia setup test                      PASSED
Ganglia node count test                 PASSED
Torque default queue definition         PASSED
Checking for 2 free nodes:              FAILED
Not enough free nodes. Tests incomplete.
Checking for 2 free nodes:              FAILED
Not enough free nodes. Tests incomplete.
Checking for 2 free nodes:              FAILED
Not enough free nodes. Tests incomplete.
Checking for 2 free nodes:              FAILED
Not enough free nodes. Tests incomplete.
There were issues running some user test scripts.
Please check your logs located in /home/oscartst.
Run APItests...
Running Installation tests for pvm
PASS 2006-06-26T08:22:15Z pvmd-path-ls.apt
PASS 2006-06-26T08:22:15Z envvar-pvm_arch.apt
PASS 2006-06-26T08:22:15Z envvar-pvm_root.apt
PASS 2006-06-26T08:22:15Z pvmd-path-which.apt
PASS 2006-06-26T08:22:15Z modulecmd-path-ls.apt
PASS 2006-06-26T08:22:15Z pvm-module-list.apt
PASS 2006-06-26T08:22:15Z pvm-module-show-pvm_rsh.apt
PASS 2006-06-26T08:22:15Z pvm-module-show-pvm_arch.apt
PASS 2006-06-26T08:22:15Z pvm-module-show-pvm_root.apt

>>> "Bernard Li" <[EMAIL PROTECTED]> >>>
Hi Tyler:

One of the TORQUE tests is a shelltest, and that is probably the job you are seeing with qstat. Was the job queued or was it running? If you delete it and re-run the tests, does another shelltest get stuck?

Can you post the output of tests? I'd like to see a complete list of tests which passed/failed.

Also, did you check /home/oscartst/torque to see if there was anything there? You might also look for TORQUE related logs in /var/spool/pbs.

Cheers,

Bernard

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Tyler Cruickshank
Sent: Thu 22/06/2006 15:32
To: oscar-users@lists.sourceforge.net
Subject: [Oscar-users] No Free Nodes

Hi Bernard et al,

Thought I should re-send this message with the results of pbsnodes -a:

I successfully tested a 1 node cluster. I then added a second node and successfully completed Step 7, complete cluster setup. I ran into problems in Step 8, test cluster setup. I failed on a "Not enough free nodes" error. It checks for 2 free nodes, but can't find them. The message says to check error logs in /home/oscartst. The testing PASSes through the Torque default queue definition. I also PASS all of the PVM installation tests.

There are plenty of user list entries on the "not enough free nodes" error and based on the pieces of advice found there I checked a few things:

1) All directories are created in /home/oscartst, but no .out or .err files are created in any of the dirs. There aren't any logs to check.
2) http://localhost/ganglia shows that all nodes are up and it indicates the correct number of nodes. gexec is listed as OFF.

3) qstat -a indicates a single job open. The job name is shelltest. I can delete the job using qdel, but this does not enable me to rerun the test scripts with success.

4) pbsnodes -a indicates that the state of both nodes is free.

5) There was a suggestion to check /etc/hosts ... this looked good with the nodes and ips added to the file.

6) I rebooted the server node and retried the test. Same problems.

Thanks.

ty

_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users