On 28.03.2012 at 23:05, Robert Chase wrote:

> I followed your advice and copied the directory from one of the compute
> nodes to the new submit node. I opened the firewall on ports 536 and 537
> and added execd and qmaster to the /etc/services file. I'm getting the
> following error messages when I use qping:
>
> [root@galdev common]# qping q.bwh.harvard.edu 536 qmaster 1
> endpoint q/qmaster/1 at port 536: can't find connection
> got select error: Connection refused
> got select error: closing "q/qmaster/1"
> endpoint q/qmaster/1 at port 536: can't find connection
> endpoint q/qmaster/1 at port 536: can't find connection

Is the firewall on both ends disabled?

-- Reuti
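[For reference, a sketch of the pieces being checked here, assuming the
non-standard ports 536/537 this cluster uses (a stock SGE install defaults
to 6444/6445) and an iptables-based RHEL/CentOS firewall; names and ports
are taken from the prompts above and may need adapting:]

    # /etc/services entries on the submit node and the qmaster host
    sge_qmaster    536/tcp    # qmaster port for this cluster
    sge_execd      537/tcp    # execd port for this cluster

    # open the ports on an iptables firewall (or disable it for a test)
    iptables -I INPUT -p tcp --dport 536 -j ACCEPT
    iptables -I INPUT -p tcp --dport 537 -j ACCEPT
    service iptables save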
> When I try to use qsub I get the following error:
>
> [root@galdev jobs]# qsub simple.sh
> error: commlib error: got select error (Connection refused)
> Unable to run job: unable to send message to qmaster using port 536 on
> host "q.bwh.harvard.edu": got send error.
> Exiting.
>
> Any help would be greatly appreciated.
>
> -Robert Chase
>
> On Wed, Mar 28, 2012 at 6:51 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D."
> <[email protected]> wrote:
>
> Sorry again: one can always add a login node in a Rocks cluster that
> will act as a submit node for SGE.
>
> Regards
>
> On 3/28/2012 6:21 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
>
> On 3/28/2012 5:53 AM, Reuti wrote:
>
> On 27.03.2012 at 23:27, Hung-sheng Tsao wrote:
>
> Maybe just copy /opt/gridengine from one of the compute nodes and add
> this machine as a submit host from the frontend.
>
> It may be good to add an explanation: to me it looks like the original
> poster installed a separate SGE cluster on just one machine, including
> the qmaster daemon, so it's just running locally, which explains the
> job ID of 1.
>
> Sorry: if one just copies /opt/gridengine from a compute node, it will
> have the full directory tree, including /opt/gridengine/default/common
> and /opt/gridengine/bin. Yes, there is also default/spool, which one
> could delete.
>
> The daemon should not run!
>
> Of course one will need the home directories, UIDs etc. from the Rocks
> frontend.
>
> IMHO, it is much simpler than installing a new version of SGE. Of
> course, if the submit host is not running the same CentOS/RedHat as the
> compute nodes, that is another story.
>
> Regards
>
> To add a submit host to an existing cluster it isn't necessary to have
> any daemon running on it, and installing a different version of SGE will
> most likely not work either, as the internal protocol changes between
> releases. I suggest:
>
> - Stop the daemons you started on the new submit host.
> - Remove the compilation you did.
> - Share the users from the existing cluster by NIS/LDAP (unless you want
>   to define them all by hand on the new machine too).
> - Mount /home from the existing cluster.
> - Mount /usr/sge or /opt/grid, or wherever you have SGE installed in the
>   existing cluster.
> - Add the machine in question as a submit host in the original cluster.
> - Source $SGE_ROOT/default/common/settings.sh during login on the submit
>   machine.
>
> Then you should be able to submit jobs from this machine (a command-level
> sketch of these steps follows after this message).
>
> As there is no built-in file staging in SGE, it's most common to share
> /home.
>
> ==
>
> Nevertheless, it could be done with a separate single-machine cluster
> (with a different version of SGE) and file staging (which you would have
> to implement on your own), but that's too much overhead for adding just
> this particular machine, IMO. Such a setup is suitable for combining
> clusters by the use of a transfer queue; I did it once and used the job
> context to name the files which have to be copied back and forth, then
> copied them myself in a starter method.
>
> -- Reuti
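[Reuti's list, spelled out as commands. The hostnames, NFS exports and
the /opt/gridengine path are placeholders matching the Rocks layout
discussed in this thread (the submit host name is taken from the shell
prompts above), not a verified recipe:]

    # on the new submit host: stop any locally started SGE daemons first,
    # then mount the cluster's home directories and SGE installation
    mount headnode:/home /home
    mount headnode:/opt/gridengine /opt/gridengine

    # on the existing qmaster (head node): register the new submit host
    qconf -as galdev.bwh.harvard.edu

    # back on the submit host: pull in SGE_ROOT, PATH etc. at login
    . /opt/gridengine/default/common/settings.sh

    # jobs should now go to the existing qmaster
    qsub simple.sh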
> LT
>
> Sent from my iPhone
>
> On Mar 27, 2012, at 4:36 PM, Robert Chase <[email protected]> wrote:
>
> Hello,
>
> A number of years ago, our group created a Rocks cluster consisting of a
> head node, a data node and eight execution nodes. The eight execution
> nodes can only be accessed by the head node.
>
> My goal is to add a submit node to the existing cluster. I have
> downloaded GE2011.11 and compiled it from source without errors.
> When I try the command:
>
> qsub simple.sh
>
> I get the error:
>
> Unable to run job: warning: root your job is not allowed to run in any queue
>
> When I look at qstat I get:
>
> job-ID  prior    name       user  state  submit/start at      queue  slots  ja-task-ID
> ---------------------------------------------------------------------------------------
>      1  0.55500  simple.sh  root  qw     03/27/2012 09:41:11             1
>
> I have added the new submit node to the list of submit hosts on the head
> node using the command:
>
> qconf -as
>
> When I run qconf -ss on the new submit node, I see the head node, the
> data node and the new submit node.
>
> When I run qconf -ss on the head node, I see the head node, the data
> node, the new submit node and all eight execution nodes.
>
> When I run qhost on the new submit node, I get:
>
> HOSTNAME  ARCH  NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -----------------------------------------------------------
> global    -     -     -     -       -       -       -
>
> Other posts have asked about the output of qconf -sq all.q:
>
> [root@HEADNODE jobs]# qconf -sq all.q
> qname                 all.q
> hostlist              @allhosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make mpi mpich multicore orte
> rerun                 FALSE
> slots                 1,[compute-0-0.local=16],[compute-0-1.local=16], \
>                       [compute-0-2.local=16],[compute-0-3.local=16], \
>                       [compute-0-4.local=16],[compute-0-6.local=16], \
>                       [compute-0-7.local=16]
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> [root@SUBMITNODE jobs]# qconf -sq all.q
> (identical to the head node's output above, except for these two lines:)
> pe_list               make
> slots                 1
>
> I would like to know how to get qsub working.
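[One way to see which qmaster a machine is actually bound to, given the
two differing all.q outputs above; this assumes the standard SGE cell
layout, with SGE_CELL defaulting to "default":]

    # the active qmaster recorded for this host's SGE cell
    cat $SGE_ROOT/${SGE_CELL:-default}/common/act_qmaster

    # run this on both the head node and the submit node: if the two
    # names differ, the submit node is talking to its own local qmaster,
    # which would also explain the job ID of 1 and the reduced all.q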
> Thanks,
> -Robert Paul Chase
> Channing Labs

> --
> Hung-Sheng Tsao Ph.D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
>
> http://laotsao.blogspot.com/
> http://laotsao.wordpress.com/
> http://blogs.oracle.com/hstsao/

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
