Which version of Rocks? -LT

Sent from my iPad
On Mar 28, 2012, at 17:39, "Hung-Sheng Tsao (LaoTsao) Ph.D" <[email protected]> wrote:

> hi
> is this submit host on the public net or the private net of the Rocks cluster?
> is this node running the same OS as the compute nodes?
> -LT
>
> Sent from my iPad
>
> On Mar 28, 2012, at 17:05, Robert Chase <[email protected]> wrote:
>
>> Hello,
>>
>> I followed your advice and copied the directory from one of the compute
>> nodes to the new submit node. I opened the firewall on ports 536 and 537
>> and added execd and qmaster to the /etc/services file. I'm getting the
>> following error messages when I use qping:
>>
>> [root@galdev common]# qping q.bwh.harvard.edu 536 qmaster 1
>> endpoint q/qmaster/1 at port 536: can't find connection
>> got select error: Connection refused
>> got select error: closing "q/qmaster/1"
>> endpoint q/qmaster/1 at port 536: can't find connection
>> endpoint q/qmaster/1 at port 536: can't find connection
>>
>> When I try to use qsub I get the following error:
>>
>> [root@galdev jobs]# qsub simple.sh
>> error: commlib error: got select error (Connection refused)
>> Unable to run job: unable to send message to qmaster using port 536 on
>> host "q.bwh.harvard.edu": got send error.
>> Exiting.
>>
>> Any help would be greatly appreciated.
>>
>> -Robert Chase
>>
>> On Wed, Mar 28, 2012 at 6:51 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D."
>> <[email protected]> wrote:
>>
>> sorry again
>> one can always add a login node in a Rocks cluster that will act as a
>> submit node for SGE
>> regards
>>
>> On 3/28/2012 6:21 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
>>
>> On 3/28/2012 5:53 AM, Reuti wrote:
>> On 27.03.2012 at 23:27, Hung-sheng Tsao wrote:
>>
>> Maybe just copy the /opt/gridengine from one of the compute nodes and
>> add this as a submit host from the frontend.
>>
>> It may be good to add an explanation: to me it looks like the original
>> poster installed a separate SGE cluster on just one machine, including
>> the qmaster daemon, and hence it's just running locally, which explains
>> the job ID being 1.
>>
>> sorry, if one just copies /opt/gridengine from the compute nodes, then
>> it will have the full directory tree of /opt/gridengine/default/common
>> and /opt/gridengine/bin
>> yes, there is also default/spool, which one could delete
>>
>> the daemon should not run!
>>
>> of course one will need the home directories, uids, etc. from the Rocks
>> frontend
>>
>> IMHO, it is much simpler than installing a new version of SGE
>> of course, if the submit host is not running the same CentOS/RedHat as
>> the compute nodes, that is another story
>> regards
>>
>> To add a submit host to an existing cluster it isn't necessary to have
>> any daemon running on it, and installing a different version of SGE will
>> most likely not work either, as the internal protocol changes between
>> releases. I suggest to:
>>
>> - Stop the daemons you started on the new submit host
>> - Remove the compilation you did
>> - Share the users from the existing cluster by NIS/LDAP (unless you want
>>   to define them all by hand on the new machine too)
>> - Mount /home from the existing cluster
>> - Mount /usr/sge or /opt/grid, or wherever you have SGE installed in the
>>   existing cluster
>> - Add the machine in question as a submit host in the original cluster
>> - Source $SGE_ROOT/default/common/settings.sh during login on the submit
>>   machine
>>
>> Then you should be able to submit jobs from this machine.
>>
>> As there is no built-in file staging in SGE, it's most common to share
>> /home.
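Reuti's checklist above can be sketched as a short shell sequence. Everything here is a sketch under assumptions: `/opt/gridengine` as the shared SGE root (the Rocks default), `headnode` and `submitnode` as placeholder host names, and `check_sge_settings` as a hypothetical helper, not anything from the thread; adjust to your site.

```shell
#!/bin/sh
# Sketch of the submit-host checklist (assumed paths and host names).

# check_sge_settings: hypothetical helper that verifies the shared SGE tree
# is visible before anything tries to source its settings file.
check_sge_settings() {
    root="${1:-/opt/gridengine}"   # assumed install path (Rocks default)
    if [ -r "$root/default/common/settings.sh" ]; then
        echo "ok: $root/default/common/settings.sh"
    else
        echo "missing: $root/default/common/settings.sh"
    fi
}

# 1. On the new submit host: stop any locally started daemons, then mount
#    the cluster's shares (shown as comments; they need the real hosts):
#      mount headnode:/home /home
#      mount headnode:/opt/gridengine /opt/gridengine
# 2. On the qmaster host: register the new machine as a submit host:
#      qconf -as submitnode
# 3. On the submit host: verify the mount, then source the settings at login:
check_sge_settings /opt/gridengine
#      . /opt/gridengine/default/common/settings.sh
```

If step 3 reports "missing", the NFS mount (or the copy of /opt/gridengine) is the thing to fix before looking at SGE itself.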
>>
>> ==
>>
>> Nevertheless, it could be done to have a separate single-machine cluster
>> (with a different version of SGE) and use file staging (which you have
>> to implement on your own), but it's too much overhead for adding just
>> this particular machine IMO. It's a suitable setup for combining
>> clusters by the use of a transfer queue, though. I did it once and used
>> the job context to name the files which have to be copied back and
>> forth, and then copied them on my own in a starter method.
>>
>> -- Reuti
>>
>> LT
>>
>> Sent from my iPhone
>>
>> On Mar 27, 2012, at 4:36 PM, Robert Chase <[email protected]> wrote:
>>
>> Hello,
>>
>> A number of years ago, our group created a Rocks cluster consisting of
>> a head node, a data node and eight execution nodes. The eight execution
>> nodes can only be accessed by the head node.
>>
>> My goal is to add a submit node to the existing cluster. I have
>> downloaded GE2011.11 and compiled it from source without errors. When I
>> try the command:
>>
>> qsub simple.sh
>>
>> I get the error:
>>
>> Unable to run job: warning: root your job is not allowed to run in any queue
>>
>> When I look at qstat I get:
>>
>> job-ID  prior    name       user  state  submit/start at      queue  slots  ja-task-ID
>> --------------------------------------------------------------------------------------
>>      1  0.55500  simple.sh  root  qw     03/27/2012 09:41:11             1
>>
>> I have added the new submit node to the list of submit nodes on the
>> head node using the command:
>>
>> qconf -as
>>
>> When I run qconf -ss on the new submit node, I see the head node, the
>> data node and the new submit node.
>>
>> When I run qconf -ss on the head node, I see the head node, the data
>> node, the new submit node and all eight execution nodes.
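Reuti's diagnosis earlier in the thread (a second, purely local qmaster, which would explain the job ID of 1 and the differing qconf -ss views) can be checked directly: a standard SGE client decides which qmaster to contact from the act_qmaster file in its cell's common directory. A minimal sketch, with `show_qmaster` as a hypothetical helper and `/opt/gridengine/default/common` as the assumed cell path:

```shell
#!/bin/sh
# show_qmaster: print the qmaster host that clients using the given SGE
# cell "common" directory will contact (read from its act_qmaster file).
show_qmaster() {
    cell_common="$1"               # e.g. /opt/gridengine/default/common
    if [ -r "$cell_common/act_qmaster" ]; then
        cat "$cell_common/act_qmaster"
    else
        echo "no act_qmaster under $cell_common" >&2
        return 1
    fi
}

# On the submit node this should print the head node's name; if it prints
# the submit node itself, qsub/qstat are talking to a local qmaster, i.e.
# the "separate one-machine cluster" situation described above.
show_qmaster /opt/gridengine/default/common || true
```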
>>
>> When I run qhost on the new submit node, I get:
>>
>> HOSTNAME                ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global                  -             -     -       -       -       -       -
>>
>> Other posts have asked about the output of qconf -sq all.q:
>>
>> [root@HEADNODE jobs]# qconf -sq all.q
>> qname                 all.q
>> hostlist              @allhosts
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make mpi mpich multicore orte
>> rerun                 FALSE
>> slots                 1,[compute-0-0.local=16],[compute-0-1.local=16], \
>>                       [compute-0-2.local=16],[compute-0-3.local=16], \
>>                       [compute-0-4.local=16],[compute-0-6.local=16], \
>>                       [compute-0-7.local=16]
>> tmpdir                /tmp
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>>
>> [root@SUBMITNODE jobs]# qconf -sq all.q
>> qname                 all.q
>> hostlist              @allhosts
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make
>> rerun                 FALSE
>> slots                 1
>> tmpdir                /tmp
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>>
>> I would like to know how to get qsub working.
>>
>> Thanks,
>> -Robert Paul Chase
>> Channing Labs
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>>
>> --
>> Hung-Sheng Tsao Ph.D.
>> Founder & Principal
>> HopBit GridComputing LLC
>> cell: 9734950840
>>
>> http://laotsao.blogspot.com/
>> http://laotsao.wordpress.com/
>> http://blogs.oracle.com/hstsao/
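A closing note on the "Connection refused" qping output near the top of the thread: it means nothing answered on that host:port, so before suspecting SGE itself it is worth confirming the service entries are well-formed on both ends. A sketch of the conventional `sge_qmaster`/`sge_execd` lines on the ports used in this thread (536/537), written to a scratch file here rather than the real /etc/services:

```shell
#!/bin/sh
# Conventional /etc/services entries for SGE on the ports used in this
# thread (536/537). Written to a scratch file so the format can be checked
# without touching the real /etc/services.
cat > /tmp/sge-services.sample <<'EOF'
sge_qmaster     536/tcp         # Grid Engine qmaster
sge_execd       537/tcp         # Grid Engine execd
EOF
grep '^sge_' /tmp/sge-services.sample

# With these entries in the real /etc/services on both hosts, re-test from
# the submit host (hostname taken from the thread):
#   qping q.bwh.harvard.edu 536 qmaster 1
# "Connection refused" at that point means no qmaster is listening there:
# either a firewall still blocks 536/tcp or the qmaster uses another port.
```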
