Am 07.04.2012 um 21:16 schrieb Skip Coombe:

> Thanks for all the help - just a couple of quick questions and comments:
> 
> Can I share the $SGE_ROOT as an automount directory?

Automount and SGE don't work well together. It's better to use a hard mount.
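
If you share it via NFS, a plain hard mount in /etc/fstab on oak will do. For
example (assuming elm exports /opt/SGE; adjust server, export path and options
to your setup):

elm.tdi.local:/opt/SGE  /opt/SGE  nfs  hard,intr  0 0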


> That is, have only the execd running on the 2nd host without any locally stored
> data - all queues and config info are on the 1st host.

Yep. Then you should see both execution daemons alive in:

$ qhost
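
A host whose execd cannot be contacted shows only "-" in the load and memory
columns there, so with both daemons up the output should look roughly like this
(the numbers are illustrative only):

HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
elm.tdi.local           linux-x64       2  0.06    3.8G  847.0M    5.8G  160.0K
oak.tdi.local           linux-x64       2  0.05    3.8G  600.0M    5.8G    0.0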


> By /home I think you mean
> the settings.sh env vars.

No, the home directory of the users.

But sure: to get access to the SGE commands, settings.sh needs to be sourced
during login on the machine you log in to.
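
For example, in the shell profile of the users (this assumes the default cell
name "default" under your /opt/SGE/ge2011.11 installation):

. /opt/SGE/ge2011.11/default/common/settings.sh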

-- Reuti



> I have opened the ports with the shell var option.
> 
> Skip
> 
> 
> On Sat, Apr 7, 2012 at 3:04 PM, Reuti <[email protected]> wrote:
> Hi,
> 
> Am 07.04.2012 um 20:54 schrieb Skip Coombe:
> 
> > I followed your advice by removing the local configurations on both elm and oak,
> > but saw no difference in my results.
> >
> > I believe the clue is in this line from my trial submissions:
> >
> > queuename                      qtype resv/used/tot. load_avg arch          states
> > ---------------------------------------------------------------------------------
> > [email protected]            BIP   0/2/2          0.06     linux-x64
> >      35 0.55500 Sleeper    skip         r     04/07/2012 12:42:02     1
> >      36 0.55500 Sleeper    skip         r     04/07/2012 12:42:02     1
> > ---------------------------------------------------------------------------------
> > [email protected]            BIP   0/0/1          -NA-     -NA-          au  <<<<
> >
> > What does "state = au" mean?
> 
> `man qstat` (section "Full Format (with -f and -F)"):
> 
>       ·  the state of the queue - one of u(nknown) if the corresponding
>          sge_execd(8) cannot be contacted, a(larm), A(larm),
>          C(alendar suspended), s(uspended), S(ubordinate), d(isabled),
>          D(isabled), E(rror) or combinations thereof.
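> 
> So "au" combines a(larm) and u(nknown): the sge_execd on oak cannot be
> contacted and hence no load values arrive. A quick check on oak (the spool
> path assumes the default spool location under /opt/SGE/ge2011.11):
> 
> $ ps -e | grep sge_execd
> $ tail /opt/SGE/ge2011.11/default/spool/oak/messages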
> 
> 
> > Why is "tot = 2" for elm and "tot = 1" for oak when they are identical hosts
> 
> If you check the queue configuration, there will be something like:
> 
> $ qconf -sq all.q
> ...
> slots 1,[elm=2]
> 
> You can change it to read
> 
> $ qconf -sq all.q
> ...
> slots 2
> 
> as both machines have two cores if I get you right.
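> 
> To change it, open the queue configuration in an editor and adjust the slots
> line:
> 
> $ qconf -mq all.q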
> 
> 
> > configured the same way?
> 
> On elm you granted access to another execution daemon, but unless the
> execution daemon on oak contacts the qmaster, the qmaster is not aware that it
> is alive.
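> 
> So make sure the daemon on oak is really running and can reach the qmaster,
> e.g. (the exact name of the init script depends on what install_execd set up):
> 
> $ /etc/init.d/sgeexecd restart
> 
> Afterwards oak should report real load values in qhost and the "au" state
> should go away.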
> 
> -- Reuti
> 
> 
> 
> > Thanks,
> >
> > Skip
> >
> > On Sat, Apr 7, 2012 at 11:14 AM, Skip Coombe <[email protected]> wrote:
> > Hi Reuti,
> >
> > On Sat, Apr 7, 2012 at 8:27 AM, Reuti <[email protected]> wrote:
> > Hi,
> >
> > Am 07.04.2012 um 03:54 schrieb Skip Coombe:
> >
> > > (Sorry for incomplete message)
> > >
> > > I set up 2 hosts in one cluster on CentOS 5.4
> > >
> > > Linux version 2.6.18-308.1.1.el5 ([email protected]) (gcc 
> > > version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Wed Mar 7 04:16:51 EST 
> > > 2012
> > > Linux elm.tdi.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 
> > > x86_64 x86_64 x86_64 GNU/Linux
> > >
> > > ge2011.11 installed with ge2011.11-x64.tar with mostly default values 
> > > except db=classic (same domain)
> > >
> > > (all cmds on elm.tdi.local)
> > >
> > > $ qconf -sel
> > > elm.tdi.local
> > > oak.tdi.local
> > >
> > > but
> > >
> > > $ qconf -sconf oak
> > > configuration oak.tdi.local not defined
> >
> > If all the machines have the same OS, you don't need any local configuration
> > at all. In fact, it often leads to confusion about which settings are finally
> > used.
> >
> > $ qconf -dconf elm
> >
> > This removes the one from elm. Then for both machines the global
> > configuration is used (qconf -sconf).
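> >
> > Afterwards you can verify that no local configuration is left with
> >
> > $ qconf -sconfl
> >
> > and inspect the global one with qconf -sconf.
> >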
> > I will experiment with this, although I am unclear about the process. The 
> > only configuration I did after
> > identical installation on elm and oak was to add oak as an execution host 
> > on elm with
> > qconf -Ae ${oak-execution-host-spec}.
> >
> > How did you install SGE on each of them? Do they share a common directory on
> > both machines? Did you start the execd on oak by /etc/init.d/sgeexecd by hand?
> > Both installations were done identically using scripts install_qmaster 
> > followed by install_execd from
> > a common pathname (/opt/SGE/ge2011.11) but on separate hosts.
> >
> > /opt/SGE/ge2011.11/bin/linux-x64/sge_qmaster
> > /opt/SGE/ge2011.11/bin/linux-x64/sge_execd
> >
> > were both started by the installation scripts and are running on both hosts.
> >
> > Skip
> >
> >
> > -- Reuti
> >
> >
> > > I issued "qsub sleeper.sh 300" 6 times and expected to see 2 jobs being 
> > > executed on
> > > each host, but
> > >
> > > $ qstat -f
> > > queuename                      qtype resv/used/tot. load_avg arch          states
> > > ---------------------------------------------------------------------------------
> > > [email protected]            BIP   0/2/2          0.19     linux-x64
> > >      29 0.55500 Sleeper    skip         r     04/06/2012 15:31:32     1
> > >      30 0.55500 Sleeper    skip         r     04/06/2012 15:31:32     1
> > > ---------------------------------------------------------------------------------
> > > [email protected]            BIP   0/0/1          -NA-     -NA-          au
> > >
> > > ############################################################################
> > >  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> > > ############################################################################
> > >      31 0.55500 Sleeper    skip         qw    04/06/2012 15:31:26     1
> > >      32 0.55500 Sleeper    skip         qw    04/06/2012 15:31:27     1
> > >      33 0.55500 Sleeper    skip         qw    04/06/2012 15:31:28     1
> > >      34 0.55500 Sleeper    skip         qw    04/06/2012 15:31:29     1
> > >
> > >
> > >
> > >
> > > [skip@elm jobs]$ qstat -F
> > > queuename                      qtype resv/used/tot. load_avg arch          states
> > > ---------------------------------------------------------------------------------
> > > [email protected]            BIP   0/2/2          0.19     linux-x64
> > >         hl:load_avg=0.190000
> > >         hl:load_short=0.290000
> > >         hl:load_medium=0.190000
> > >         hl:load_long=0.150000
> > >         hl:arch=linux-x64
> > >         hl:num_proc=2
> > >         hl:mem_free=2.969G
> > >         hl:swap_free=5.750G
> > >         hl:virtual_free=8.718G
> > >         hl:mem_total=3.796G
> > >         hl:swap_total=5.750G
> > >         hl:virtual_total=9.546G
> > >         hl:mem_used=846.965M
> > >         hl:swap_used=160.000K
> > >         hl:virtual_used=847.121M
> > >         hl:cpu=1.000000
> > >         hl:m_topology=SCC
> > >         hl:m_topology_inuse=SCC
> > >         hl:m_socket=1
> > >         hl:m_core=2
> > >         hl:np_load_avg=0.095000
> > >         hl:np_load_short=0.145000
> > >         hl:np_load_medium=0.095000
> > >         hl:np_load_long=0.075000
> > >         qf:qname=all.q
> > >         qf:hostname=elm.tdi.local
> > >         qc:slots=0
> > >         qf:tmpdir=/tmp
> > >         qf:seq_no=0
> > >         qf:rerun=0.000000
> > >         qf:calendar=NONE
> > >         qf:s_rt=infinity
> > >         qf:h_rt=infinity
> > >         qf:s_cpu=infinity
> > >         qf:h_cpu=infinity
> > >         qf:s_fsize=infinity
> > >         qf:h_fsize=infinity
> > >         qf:s_data=infinity
> > >         qf:h_data=infinity
> > >         qf:s_stack=infinity
> > >         qf:h_stack=infinity
> > >         qf:s_core=infinity
> > >         qf:h_core=infinity
> > >         qf:s_rss=infinity
> > >         qf:h_rss=infinity
> > >         qf:s_vmem=infinity
> > >         qf:h_vmem=infinity
> > >         qf:min_cpu_interval=00:05:00
> > >      29 0.55500 Sleeper    skip         r     04/06/2012 15:31:32     1
> > >      30 0.55500 Sleeper    skip         r     04/06/2012 15:31:32     1
> > > ---------------------------------------------------------------------------------
> > > [email protected]            BIP   0/0/1          -NA-     -NA-          au
> > >         qf:qname=all.q
> > >         qf:hostname=oak.tdi.local
> > >         qc:slots=1
> > >         qf:tmpdir=/tmp
> > >         qf:seq_no=0
> > >         qf:rerun=0.000000
> > >         qf:calendar=NONE
> > >         qf:s_rt=infinity
> > >         qf:h_rt=infinity
> > >         qf:s_cpu=infinity
> > >         qf:h_cpu=infinity
> > >         qf:s_fsize=infinity
> > >         qf:h_fsize=infinity
> > >         qf:s_data=infinity
> > >         qf:h_data=infinity
> > >         qf:s_stack=infinity
> > >         qf:h_stack=infinity
> > >         qf:s_core=infinity
> > >         qf:h_core=infinity
> > >         qf:s_rss=infinity
> > >         qf:h_rss=infinity
> > >         qf:s_vmem=infinity
> > >         qf:h_vmem=infinity
> > >         qf:min_cpu_interval=00:05:00
> > >
> > > ############################################################################
> > >  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> > > ############################################################################
> > >      31 0.55500 Sleeper    skip         qw    04/06/2012 15:31:26     1
> > >      32 0.55500 Sleeper    skip         qw    04/06/2012 15:31:27     1
> > >      33 0.55500 Sleeper    skip         qw    04/06/2012 15:31:28     1
> > >      34 0.55500 Sleeper    skip         qw    04/06/2012 15:31:29     1
> > >
> > > also qmon cluster conf (on elm) only shows elm, but has both hosts in
> > > the execution hosts list and has a host group named "@allhosts" containing
> > > both.
> > >
> > > I'm probably overlooking something obvious. Any help will be appreciated.
> > >
> > > Skip Coombe
> > > [email protected]
> > >
> > >
> > >
> > > --
> > > Skip Coombe
> > > [email protected]
> > > 919.442.VLSI
> > >
> > >
> > >
> >
> >
> >
> >
> > --
> > Skip Coombe
> > [email protected]
> > 919.442.VLSI
> >
> >
> >
> >
> >
> >
> > --
> > Skip Coombe
> > [email protected]
> > 919.442.VLSI
> >
> >
> >
> 
> 
> 
> 
> -- 
> Skip Coombe
> [email protected]
> 919.442.VLSI
> 
> 
> 


