On 27.03.2012, at 23:27, Hung-sheng Tsao wrote:

> Maybe just copy the /opt/ grid engine from one of the compute nodes.
> Add this as submit host from the frontend.
It may be good to add an explanation: to me it looks like the original poster installed a separate SGE cluster on just one machine, including the qmaster daemon, hence it's just running locally, which explains the job ID being 1. To add a submit host to an existing cluster it isn't necessary to have any daemon running on it, and installing a different version of SGE will most likely not work either, as the internal protocol changes between releases.

I suggest to:

- Stop the daemons you started on the new submit host
- Remove the compilation you did
- Share the users from the existing cluster by NIS/LDAP (unless you want to define them all by hand on the new machine too)
- Mount /home from the existing cluster
- Mount /usr/sge or /opt/grid or wherever you have SGE installed in the existing cluster
- Add the machine in question as submit host in the original cluster
- Source $SGE_ROOT/default/common/settings.sh during login on the submit machine

Then you should be able to submit jobs from this machine; a rough command sketch follows below. As there is no built-in file staging in SGE, it's most common to share /home.

==

Nevertheless it would be possible to have a separate single-machine cluster (with a different version of SGE) and use file staging (which you have to implement on your own), but it's too much overhead for adding just this particular machine IMO. Combining clusters by use of a transfer queue this way is a suitable setup, though. I did it once and used the job context to name the files which have to be copied back and forth, and then copied them on my own in a starter method; a sketch of that idea follows below as well.
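Putting the steps above together, the commands might look roughly like this. This is a minimal sketch: the hostnames (headnode, newsubmit.example.com), the mount points and the /opt/gridengine path are placeholders for wherever SGE actually lives in the existing cluster.

# On the existing head node: register the new machine as submit host.
qconf -as newsubmit.example.com

# On the new submit host: shut down the stray local qmaster while the
# environment still points at the local installation.
qconf -km

# Mount the shared directories from the existing cluster.
mount headnode:/home /home
mount headnode:/opt/gridengine /opt/gridengine

# Source the existing cluster's settings at login, e.g. in ~/.bash_profile:
. /opt/gridengine/default/common/settings.sh

# Submissions should now reach the original qmaster:
qsub simple.sh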
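And purely to illustrate the job-context idea, a starter method could look like the sketch below. Everything specific here is invented for the example: the context keys stagein/stageout, the host remotecluster and the /staging path. It also assumes the executing host is allowed to run qstat (i.e. it is a submit host as well), since the job context is not exported into the job's environment.

#!/bin/sh
# File-staging starter method (sketch). SGE hands the job script to the
# starter method as its arguments, so "$@" runs the real job.

# Read the job context from qstat -j, e.g.
#   context:    stagein=input.dat,stageout=result.dat
CONTEXT=$(qstat -j "$JOB_ID" | awk '/^context:/ {print $2}')
STAGEIN=$(echo "$CONTEXT" | tr ',' '\n' | sed -n 's/^stagein=//p')
STAGEOUT=$(echo "$CONTEXT" | tr ',' '\n' | sed -n 's/^stageout=//p')

# Copy the input into the job's scratch directory.
[ -n "$STAGEIN" ] && scp "remotecluster:/staging/$STAGEIN" "$TMPDIR/"

# Run the real job and remember its exit status.
"$@"
RC=$?

# Copy the result back and preserve the job's exit status.
[ -n "$STAGEOUT" ] && scp "$TMPDIR/$STAGEOUT" "remotecluster:/staging/"
exit $RC

The files would then be named at submission time, e.g. qsub -ac stagein=input.dat,stageout=result.dat simple.sh, with this script configured as the starter_method of the transfer queue.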
-- Reuti

> LT
>
> Sent from my iPhone
>
> On Mar 27, 2012, at 4:36 PM, Robert Chase <[email protected]> wrote:
>
>> Hello,
>>
>> A number of years ago, our group created a rocks cluster consisting of a
>> head node, a data node and eight execution nodes. The eight execution
>> nodes can only be accessed by the head node.
>>
>> My goal is to add a submit node to the existing cluster. I have
>> downloaded GE2011.11 and compiled from source without errors. When I try
>> the command:
>>
>> qsub simple.sh
>>
>> I get the error:
>>
>> Unable to run job: warning: root your job is not allowed to run in any queue
>>
>> When I look at qstat I get:
>>
>> job-ID  prior    name       user  state  submit/start at      queue  slots  ja-task-ID
>> ---------------------------------------------------------------------------------------
>>      1  0.55500  simple.sh  root  qw     03/27/2012 09:41:11             1
>>
>> I have added the new submit node to the list of submit nodes on the head
>> node using the command
>>
>> qconf -as
>>
>> When I run qconf -ss on the new submit node I see the head node, the
>> data node and the new submit node.
>>
>> When I run qconf -ss on the head node, I see the head node, the data
>> node, the new submit node and all eight execution nodes.
>>
>> When I run qhost on the new submit node, I get
>>
>> HOSTNAME  ARCH  NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> ----------------------------------------------------------
>> global    -     -     -     -       -       -       -
>>
>> Other posts have asked about the output of qconf -sq all.q...
>>
>> [root@HEADNODE jobs]# qconf -sq all.q
>> qname                 all.q
>> hostlist              @allhosts
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make mpi mpich multicore orte
>> rerun                 FALSE
>> slots                 1,[compute-0-0.local=16],[compute-0-1.local=16], \
>>                       [compute-0-2.local=16],[compute-0-3.local=16], \
>>                       [compute-0-4.local=16],[compute-0-6.local=16], \
>>                       [compute-0-7.local=16]
>> tmpdir                /tmp
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>>
>> [root@SUBMITNODE jobs]# qconf -sq all.q
>> qname                 all.q
>> hostlist              @allhosts
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make
>> rerun                 FALSE
>> slots                 1
>> tmpdir                /tmp
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>>
>> I would like to know how to get qsub working.
>>
>> Thanks,
>> -Robert Paul Chase
>> Channing Labs

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
