Hello,

I followed your advice and copied the /opt/gridengine directory from one of the compute nodes to the new submit node. I opened the firewall on ports 536 and 537 and added the execd and qmaster services to the /etc/services file.
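For reference, here is what I added to /etc/services on the submit node; I used what I believe are the stock SGE service names, so adjust them if the installation expects something different:

    # /etc/services additions (assumed stock SGE service names)
    sge_qmaster     536/tcp     # port the qmaster listens on
    sge_execd       537/tcp     # port the execution daemons listen on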
I'm getting the following error messages when I use qping:

[root@galdev common]# qping q.bwh.harvard.edu 536 qmaster 1
endpoint q/qmaster/1 at port 536: can't find connection
got select error: Connection refused
got select error: closing "q/qmaster/1"
endpoint q/qmaster/1 at port 536: can't find connection
endpoint q/qmaster/1 at port 536: can't find connection

When I try to use qsub, I get the following error:

[root@galdev jobs]# qsub simple.sh
error: commlib error: got select error (Connection refused)
Unable to run job: unable to send message to qmaster using port 536 on host "q.bwh.harvard.edu": got send error.
Exiting.

Any help would be greatly appreciated.

-Robert Chase

On Wed, Mar 28, 2012 at 6:51 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." <[email protected]> wrote:
>
> sorry again
> one can always add a login node in Rocks Cluster that will act as a submit node
> for SGE
> regards
>
>
> On 3/28/2012 6:21 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
>>
>> On 3/28/2012 5:53 AM, Reuti wrote:
>>>
>>> On 27.03.2012 at 23:27, Hung-sheng Tsao wrote:
>>>
>>>> Maybe just copy the /opt/gridengine from one of the compute nodes
>>>> and add this as a submit host from the frontend.
>>>
>>> It may be good to add an explanation: to me it looks like the original
>>> poster installed a separate SGE cluster on just one machine, including the
>>> qmaster daemon, and hence it's just running locally, which explains the
>>> job ID of 1.
>>>
>> sorry, if one just copies /opt/gridengine from the compute nodes,
>> then it will have the full directory tree of /opt/gridengine/default/common
>> and /opt/gridengine/bin.
>> yes, there is also default/spool, which one could delete.
>>
>> the daemon should not run!
>>
>> of course one will need the home directories, uids, etc. from the Rocks
>> frontend.
>>
>> IMHO, it is much simpler than installing a new version of SGE.
>> of course, if the submit host is not running the same CentOS/Red Hat as the
>> compute nodes, that is another story.
>> regards
>>
>>> To add a submit host to an existing cluster it isn't necessary to have
>>> any daemon running on it, and installing a different version of SGE will
>>> most likely not work either, as the internal protocol changes between
>>> releases. I suggest to:
>>>
>>> - Stop the daemons you started on the new submit host
>>> - Remove the compilation you did
>>> - Share the users from the existing cluster by NIS/LDAP (unless you want
>>>   to define them all by hand on the new machine too)
>>> - Mount /home from the existing cluster
>>> - Mount /usr/sge or /opt/grid or wherever you have SGE installed in the
>>>   existing cluster
>>> - Add the machine in question as a submit host in the original cluster
>>> - Source $SGE_ROOT/default/common/settings.sh during login on the
>>>   submit machine
>>>
>>> Then you should be able to submit jobs from this machine.
>>>
>>> As there is no built-in file staging in SGE, it's most common to share
>>> /home.
>>>
>>> ==
>>>
>>> Nevertheless, it could be done to have a separate single-machine cluster
>>> (with a different version of SGE) and use file staging (which you would have
>>> to implement on your own), but it's too much overhead for adding just this
>>> particular machine IMO. It is a suitable setup for combining clusters by
>>> means of a transfer queue, though. I did it once and used the job context to
>>> name the files which have to be copied back and forth, copying them myself
>>> in a starter method.
>>>
>>> -- Reuti
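As I read Reuti's list, the shared-installation route boils down to roughly the following after stopping the locally started daemons. The NFS sources and mount points are my guesses at our layout, so they will likely need adjusting:

    # on the new submit node: use the cluster's shared copies instead of a local install
    mount headnode:/home /home                        # assumed export of user home directories
    mount headnode:/opt/gridengine /opt/gridengine    # assumed export of the cluster's $SGE_ROOT

    # on the existing qmaster (head node): register the new machine as a submit host
    qconf -as <submit-node-hostname>

    # back on the submit node: pick up the cluster's environment at login
    . /opt/gridengine/default/common/settings.sh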
>>>> LT
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Mar 27, 2012, at 4:36 PM, Robert Chase <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> A number of years ago, our group created a Rocks cluster consisting of
>>>>> a head node, a data node and eight execution nodes. The eight execution
>>>>> nodes can only be accessed from the head node.
>>>>>
>>>>> My goal is to add a submit node to the existing cluster. I have
>>>>> downloaded GE2011.11 and compiled it from source without errors. When I try
>>>>> the command:
>>>>>
>>>>> qsub simple.sh
>>>>>
>>>>> I get the error:
>>>>>
>>>>> Unable to run job: warning: root your job is not allowed to run in any queue
>>>>>
>>>>> When I look at qstat I get:
>>>>>
>>>>> job-ID  prior    name       user  state  submit/start at      queue  slots  ja-task-ID
>>>>> ---------------------------------------------------------------------------------------
>>>>>      1  0.55500  simple.sh  root  qw     03/27/2012 09:41:11             1
>>>>>
>>>>> I have added the new submit node to the list of submit hosts on the
>>>>> head node using the command
>>>>>
>>>>> qconf -as
>>>>>
>>>>> When I run qconf -ss on the new submit node, I see the head node, the
>>>>> data node and the new submit node.
>>>>>
>>>>> When I run qconf -ss on the head node, I see the head node, the data
>>>>> node, the new submit node and all eight execution nodes.
>>>>>
>>>>> When I run qhost on the new submit node, I get
>>>>>
>>>>> HOSTNAME  ARCH  NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>> ------------------------------------------------------------
>>>>> global    -     -     -     -       -       -       -
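(A cross-check that may help here, assuming the standard SGE layout: the client commands find their qmaster through $SGE_ROOT/$SGE_CELL/common/act_qmaster, so comparing this on the head node and on the new submit node shows whether the two machines are actually pointing at the same qmaster.)

    # run on both the head node and the new submit node and compare
    echo $SGE_ROOT
    cat $SGE_ROOT/default/common/act_qmaster   # hostname of the qmaster being contacted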
>>>>>
>>>>> Other posts have asked about the output of qconf -sq all.q:
>>>>>
>>>>> [root@HEADNODE jobs]# qconf -sq all.q
>>>>> qname                 all.q
>>>>> hostlist              @allhosts
>>>>> seq_no                0
>>>>> load_thresholds       np_load_avg=1.75
>>>>> suspend_thresholds    NONE
>>>>> nsuspend              1
>>>>> suspend_interval      00:05:00
>>>>> priority              0
>>>>> min_cpu_interval      00:05:00
>>>>> processors            UNDEFINED
>>>>> qtype                 BATCH INTERACTIVE
>>>>> ckpt_list             NONE
>>>>> pe_list               make mpi mpich multicore orte
>>>>> rerun                 FALSE
>>>>> slots                 1,[compute-0-0.local=16],[compute-0-1.local=16], \
>>>>>                       [compute-0-2.local=16],[compute-0-3.local=16], \
>>>>>                       [compute-0-4.local=16],[compute-0-6.local=16], \
>>>>>                       [compute-0-7.local=16]
>>>>> tmpdir                /tmp
>>>>> shell                 /bin/csh
>>>>> prolog                NONE
>>>>> epilog                NONE
>>>>> shell_start_mode      posix_compliant
>>>>> starter_method        NONE
>>>>> suspend_method        NONE
>>>>> resume_method         NONE
>>>>> terminate_method      NONE
>>>>> notify                00:00:60
>>>>> owner_list            NONE
>>>>> user_lists            NONE
>>>>> xuser_lists           NONE
>>>>> subordinate_list      NONE
>>>>> complex_values        NONE
>>>>> projects              NONE
>>>>> xprojects             NONE
>>>>> calendar              NONE
>>>>> initial_state         default
>>>>> s_rt                  INFINITY
>>>>> h_rt                  INFINITY
>>>>> s_cpu                 INFINITY
>>>>> h_cpu                 INFINITY
>>>>> s_fsize               INFINITY
>>>>> h_fsize               INFINITY
>>>>> s_data                INFINITY
>>>>> h_data                INFINITY
>>>>> s_stack               INFINITY
>>>>> h_stack               INFINITY
>>>>> s_core                INFINITY
>>>>> h_core                INFINITY
>>>>> s_rss                 INFINITY
>>>>> h_rss                 INFINITY
>>>>> s_vmem                INFINITY
>>>>> h_vmem                INFINITY
>>>>>
>>>>> [root@SUBMITNODE jobs]# qconf -sq all.q
>>>>> qname                 all.q
>>>>> hostlist              @allhosts
>>>>> seq_no                0
>>>>> load_thresholds       np_load_avg=1.75
>>>>> suspend_thresholds    NONE
>>>>> nsuspend              1
>>>>> suspend_interval      00:05:00
>>>>> priority              0
>>>>> min_cpu_interval      00:05:00
>>>>> processors            UNDEFINED
>>>>> qtype                 BATCH INTERACTIVE
>>>>> ckpt_list             NONE
>>>>> pe_list               make
>>>>> rerun                 FALSE
>>>>> slots                 1
>>>>> tmpdir                /tmp
>>>>> shell                 /bin/csh
>>>>> prolog                NONE
>>>>> epilog                NONE
>>>>> shell_start_mode      posix_compliant
>>>>> starter_method        NONE
>>>>> suspend_method        NONE
>>>>> resume_method         NONE
>>>>> terminate_method      NONE
>>>>> notify                00:00:60
>>>>> owner_list            NONE
>>>>> user_lists            NONE
>>>>> xuser_lists           NONE
>>>>> subordinate_list      NONE
>>>>> complex_values        NONE
>>>>> projects              NONE
>>>>> xprojects             NONE
>>>>> calendar              NONE
>>>>> initial_state         default
>>>>> s_rt                  INFINITY
>>>>> h_rt                  INFINITY
>>>>> s_cpu                 INFINITY
>>>>> h_cpu                 INFINITY
>>>>> s_fsize               INFINITY
>>>>> h_fsize               INFINITY
>>>>> s_data                INFINITY
>>>>> h_data                INFINITY
>>>>> s_stack               INFINITY
>>>>> h_stack               INFINITY
>>>>> s_core                INFINITY
>>>>> h_core                INFINITY
>>>>> s_rss                 INFINITY
>>>>> h_rss                 INFINITY
>>>>> s_vmem                INFINITY
>>>>> h_vmem                INFINITY
>>>>>
>>>>> I would like to know how to get qsub working.
>>>>>
>>>>> Thanks,
>>>>> -Robert Paul Chase
>>>>> Channing Labs
>
> --
> Hung-Sheng Tsao Ph.D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
>
> http://laotsao.blogspot.com/
> http://laotsao.wordpress.com/
> http://blogs.oracle.com/hstsao/
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
