On 3/28/2012 5:53 AM, Reuti wrote:
Am 27.03.2012 um 23:27 schrieb Hung-sheng Tsao:

Maybe just copy the /opt/gridengine directory from one of the compute nodes,
then add this machine as a submit host from the frontend.
It may be good to add an explanation: to me it looks like the original poster 
installed a separate SGE cluster on just one machine, including the qmaster 
daemon, and hence it is just running locally, which explains the job ID of 1.
Sorry, if one just copies /opt/gridengine from a compute node, then it will
contain the full /opt/gridengine/default/common and /opt/gridengine/bin
directories. Yes, there is also default/spool, which one could delete.

The daemon should not run!

Of course, one will need the home directories, UIDs, etc. from the Rocks frontend.

IMHO, this is much simpler than installing a new version of SGE.
Of course, if the submit host is not running the same CentOS/Red Hat release as 
the compute nodes, that is another story.
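
A minimal sketch of this copy-based approach (hostnames and paths below are 
examples assuming a default Rocks layout, not commands from the original poster):

```sh
# Copy the SGE tree from a compute node to the new submit host
scp -r compute-0-0:/opt/gridengine /opt/

# The local spool directory is not needed on a submit-only host
rm -rf /opt/gridengine/default/spool

# Do NOT start sge_qmaster or sge_execd on the new host; instead,
# register it as a submit host from the frontend:
qconf -as newhost.local
```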
regards


To add a submit host to an existing cluster it isn't necessary to have any 
daemon running on it, and installing a different version of SGE will most 
likely not work either, as the internal protocol changes between releases. I 
suggest the following:

- Stop the daemons you started on the new submit host
- Remove the compilation you did
- Share the users from the existing cluster via NIS/LDAP (unless you want to 
define them all by hand on the new machine too)
- Mount /home from the existing cluster
- Mount /usr/sge or /opt/grid, wherever SGE is installed in the existing 
cluster
- Add the machine in question as a submit host in the original cluster
- Source $SGE_ROOT/default/common/settings.sh during login on the submit machine

Then you should be able to submit jobs from this machine.
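
The steps above might look roughly like this on the new submit host (the 
hostnames, mount paths, and init-script names are placeholders, not taken from 
the original cluster):

```sh
# On the new submit host: stop any locally started SGE daemons
/etc/init.d/sgemaster stop 2>/dev/null
/etc/init.d/sgeexecd  stop 2>/dev/null

# Mount the home directories and the SGE installation from the existing cluster
mount headnode:/home /home
mount headnode:/opt/gridengine /opt/gridengine

# On the qmaster host: register the new machine as a submit host
qconf -as submitnode.local

# On the submit host: pick up the cluster settings at login,
# e.g. from ~/.bash_profile
. /opt/gridengine/default/common/settings.sh
```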

As there is no built-in file staging in SGE, it's most common to share /home.

==

Nevertheless, it is possible to run a separate single-machine cluster (with a 
different version of SGE) and use file staging (which you would have to 
implement on your own), but IMO that is too much overhead for adding just this 
one machine. This kind of setup is suitable for combining clusters via a 
transfer queue, though. I did it once: I used the job context to name the files 
that had to be copied back and forth, and then copied them myself in a starter 
method.
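
A hypothetical sketch of that job-context idea (the context keys `stagein` and 
`stageout` and the helper function are my own illustration, not Reuti's actual 
script). Files to stage would be named in the job context at submission time, 
e.g. `qsub -ac stagein=/data/input.dat,stageout=/data/output.dat job.sh`, and a 
starter method would read the context back and copy the files:

```sh
# extract_ctx pulls one key out of a "key=value,key=value" context string,
# as it appears on the "context:" line of "qstat -j <jobid>" output.
extract_ctx() {
    # $1 = context string, $2 = key to look up
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

# In a real starter method, the context string would come from something like:
#   ctx=$(qstat -j "$JOB_ID" | sed -n 's/^context: *//p')
# The starter would then scp the stagein file over before executing the real
# job command ("$@"), and scp the stageout file back afterwards.
```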

-- Reuti


LT

Sent from my iPhone

On Mar 27, 2012, at 4:36 PM, Robert Chase <[email protected]> wrote:

Hello,

A number of years ago, our group created a Rocks cluster consisting of a head 
node, a data node, and eight execution nodes. The eight execution nodes can only 
be accessed from the head node.

My goal is to add a submit node to the existing cluster. I have downloaded 
GE2011.11 and compiled from source without errors. When I try the command:

qsub simple.sh

I get the error:

Unable to run job: warning: root your job is not allowed to run in any queue

When I look at qstat I get:

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.55500 simple.sh  root         qw    03/27/2012 09:41:11                                    1

I have added the new submit node to the list of submit nodes on the head node 
using the command

qconf -as

When I run qconf -ss on the new submit node I see the head node, the data node 
and the new submit node.

When I run qconf -ss on the head node, I see the head node, the data node, the 
new submit node and all eight execution nodes.

When I run qhost on the new submit node, I get

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -


Other posts have asked about the output of qconf -sq all.q...

[root@HEADNODE jobs]# qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi mpich multicore orte
rerun                 FALSE
slots                 1,[compute-0-0.local=16],[compute-0-1.local=16], \
                      [compute-0-2.local=16],[compute-0-3.local=16], \
                      [compute-0-4.local=16],[compute-0-6.local=16], \
                      [compute-0-7.local=16]
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY


[root@SUBMITNODE jobs]# qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 1
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

I would like to know how to get qsub working.

Thanks,
-Robert Paul Chase
Channing Labs
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

--
Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840

http://laotsao.blogspot.com/
http://laotsao.wordpress.com/
http://blogs.oracle.com/hstsao/


