Hi,

did you run ". settings.sh", pointing to /opt/gridengine/default/common/settings.sh?
Did you include the qmaster FQDN in /etc/hosts?
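A minimal sketch of those two checks, in case it helps. The helper name host_in_hosts_file is made up for illustration, the settings path is the one quoted above, and q.bwh.harvard.edu is the qmaster host named in the quoted message below; adjust all of them to your site. The comparison is case-sensitive, which is an assumption about how the entry was typed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME appears as a host name or alias
# (any field after the address) in a hosts-format file.
host_in_hosts_file() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v h="$name" '
        { sub(/#.*/, "") }                      # drop trailing comments
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Path taken from the thread; an assumption about where SGE is installed.
SETTINGS=/opt/gridengine/default/common/settings.sh
# Qmaster name taken from the thread; pass your own as the first argument.
QMASTER=${1:-q.bwh.harvard.edu}

if [ -r "$SETTINGS" ]; then
    . "$SETTINGS"                               # sets SGE_ROOT, PATH, etc.
    echo "sourced $SETTINGS (SGE_ROOT=$SGE_ROOT)"
else
    echo "settings file not found: $SETTINGS"
fi

if host_in_hosts_file "$QMASTER"; then
    echo "$QMASTER is listed in /etc/hosts"
else
    echo "$QMASTER is NOT listed in /etc/hosts"
fi
```

If both checks pass and qsub still reports "Connection refused", the qping command from the quoted message is the next thing to retry.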
On 3/28/2012 5:39 PM, Hung-Sheng Tsao (LaoTsao) Ph.D wrote:
Hi,

is this submit host on the public net or the private net of the Rocks cluster? Does this node run the same OS as the compute nodes?

-LT

Sent from my iPad

On Mar 28, 2012, at 17:05, Robert Chase <[email protected]> wrote:

Hello,

I followed your advice and copied the directory from one of the compute nodes to the new submit node. I opened the firewall on ports 536 and 537 and added execd and qmaster to the /etc/services file. I'm getting the following error messages when I use qping:

[root@galdev common]# qping q.bwh.harvard.edu 536 qmaster 1
endpoint q/qmaster/1 at port 536: can't find connection
got select error: Connection refused
got select error: closing "q/qmaster/1"
endpoint q/qmaster/1 at port 536: can't find connection
endpoint q/qmaster/1 at port 536: can't find connection

When I try to use qsub I get the following error:

[root@galdev jobs]# qsub simple.sh
error: commlib error: got select error (Connection refused)
Unable to run job: unable to send message to qmaster using port 536 on host "q.bwh.harvard.edu": got send error.
Exiting.

Any help would be greatly appreciated.

-Robert Chase

On Wed, Mar 28, 2012 at 6:51 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." <[email protected]> wrote:

Sorry, again: one can always add a login node to the Rocks cluster that will act as a submit node for SGE.

regards

On 3/28/2012 6:21 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:

On 3/28/2012 5:53 AM, Reuti wrote:

On 27.03.2012 at 23:27, Hung-sheng Tsao wrote:

Maybe just copy /opt/gridengine from one of the compute nodes.
Add this as a submit host from the frontend.

It may be good to add an explanation: to me it looks like the original poster installed a separate SGE cluster on just one machine, including the qmaster daemon, and hence it is just running locally, which explains the job ID being 1.

Sorry, if one just copies /opt/gridengine from the compute nodes, then it will have the full directories /opt/gridengine/default/common and /opt/gridengine/bin. Yes, there is also default/spool, which one could delete. The daemon should not run! Of course one will need the home directory, UIDs etc. from the Rocks frontend. IMHO, it is much simpler than installing a new version of SGE. Of course, if the submit host is not running the same CentOS/Red Hat as the compute nodes, that is another story.

regards

To add a submit host to an existing cluster it isn't necessary to have any daemon running on it, and installing a different version of SGE will most likely not work either, as the internal protocol changes between releases. I suggest to:

- Stop the daemons you started on the new submit host
- Remove the compilation you did
- Share the users from the existing cluster by NIS/LDAP (unless you want to define them all by hand on the new machine too)
- Mount /home from the existing cluster
- Mount /usr/sge or /opt/grid, wherever you have SGE installed in the existing cluster
- Add the machine in question as a submit host in the original cluster
- Source $SGE_ROOT/default/common/settings.sh during login on the submit machine

Then you should be able to submit jobs from this machine. As there is no built-in file staging in SGE, it's most common to share /home.

==

Nevertheless, it would be possible to have a separate single-machine cluster (with a different version of SGE) and use file staging (which you have to implement on your own), but it's too much overhead for adding just this particular machine, IMO. It's a suitable setup for combining clusters by the use of a transfer queue. I did it once and used the job context to name the files which have to be copied back and forth, and then copied them on my own in a starter method.

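The mount, register and source steps listed above can be sketched as a small script that only prints the commands, so nothing is changed until you have read them. The function name print_submit_host_plan and the host names frontend.local and newsubmit.local are made up for illustration, NFS exports of /home and the SGE tree from the frontend are an assumption, and /opt/gridengine is the location mentioned in this thread. Stopping the locally started daemons, removing the build and sharing users via NIS/LDAP depend on the site and are not shown.

```shell
#!/bin/sh
# Print (do not run) the commands for adding a submit host.
# Arguments: frontend host, new submit host, optional SGE root directory.
print_submit_host_plan() {
    frontend=$1
    newhost=$2
    sge_root=${3:-/opt/gridengine}    # assumption: location used in this thread
    cat <<EOF
# On $newhost: mount /home and the SGE tree from the existing cluster (NFS assumed)
mount -t nfs $frontend:/home /home
mount -t nfs $frontend:$sge_root $sge_root
# On $frontend: register $newhost as a submit host, then list the submit hosts
qconf -as $newhost
qconf -ss
# On $newhost, at login: load the SGE environment
. $sge_root/default/common/settings.sh
EOF
}

# Placeholder host names; pass the real ones as arguments.
print_submit_host_plan "${1:-frontend.local}" "${2:-newsubmit.local}"
```

Printing rather than executing keeps the sketch safe to run on any machine; the qconf commands in particular have to be run on a host that is already an admin host of the existing cluster.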
-- Reuti

LT

Sent from my iPhone

On Mar 27, 2012, at 4:36 PM, Robert Chase <[email protected]> wrote:

Hello,

A number of years ago, our group created a rocks cluster consisting of a head node, a data node and eight execution nodes. The eight execution nodes can only be accessed by the head node.

My goal is to add a submit node to the existing cluster. I have downloaded GE2011.11 and compiled from source without errors.

When I try the command:

qsub simple.sh

I get the error:

Unable to run job: warning: root your job is not allowed to run in any queue

When I look at qstat I get:

job-ID  prior    name       user  state  submit/start at      queue  slots  ja-task-ID
---------------------------------------------------------------------------------------
1       0.55500  simple.sh  root  qw     03/27/2012 09:41:11         1

I have added the new submit node to the list of submit nodes on the head node using the command qconf -as.

When I run qconf -ss on the new submit node I see the head node, the data node and the new submit node.

When I run qconf -ss on the head node, I see the head node, the data node, the new submit node and all eight execution nodes.

When I run qhost on the new submit node, I get:

HOSTNAME  ARCH  NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-----------------------------------------------------------
global    -     -     -     -       -       -       -

Other posts have asked about the output of qconf -sq all.q...

[root@HEADNODE jobs]# qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi mpich multicore orte
rerun                 FALSE
slots                 1,[compute-0-0.local=16],[compute-0-1.local=16], \
                      [compute-0-2.local=16],[compute-0-3.local=16], \
                      [compute-0-4.local=16],[compute-0-6.local=16], \
                      [compute-0-7.local=16]
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

[root@SUBMITNODE jobs]# qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 1
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

I would like to know how to get qsub working.

Thanks,
-Robert Paul Chase
Channing Labs

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

--
Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840
http://laotsao.blogspot.com/
http://laotsao.wordpress.com/
http://blogs.oracle.com/hstsao/
