Hi Reuti,
It seems that the previous tests were wrong.
I realize that your doubts were right: only one slot was actually busy
despite all 16 being granted.
So I changed the job launcher to:
$qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
Note that (for some reason) it's mandatory to tell both the PE and mpiexec that
there are 20 slots to use.
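As a side note, to avoid hardcoding the 20 in two places, I could probably wrap
the call in a small job script and let SGE fill in the count via $NSLOTS (just a
sketch under that assumption, not what I ran above):

#!/bin/sh
# hypothetical wrapper script, e.g. newave.sh; $NSLOTS is set by SGE inside the job
mpiexec -np $NSLOTS newave170502_L

and submit it with something like:

$qsub -N $nameofthecase -pe orte 20 -cwd ./newave.sh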
Anyway, with the hardcoded -np 20 command, here is the output for a job with 20 slots:
$round_robin:
job with 20 slots
job launched as
$qsub -N $nameofthecase -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
$ ps -e f --cols=500
2390 ? Sl 0:00 /opt/sge6/bin/linux-x64/sge_execd
2835 ? S 0:00 \_ sge_shepherd-1 -bg
2837 ? Ss 0:00 \_ mpiexec -np 20 newave170502_L
2838 ? S 0:00 \_ /usr/bin/hydra_pmi_proxy
--control-port master:46220 --demux poll --pgid 0 --retries 10 --proxy-id 0
2840 ? R 1:18 | \_ newave170502_L
2841 ? S 0:54 | \_ newave170502_L
2842 ? S 1:07 | \_ newave170502_L
2843 ? S 0:52 | \_ newave170502_L
2844 ? S 1:07 | \_ newave170502_L
2845 ? S 1:08 | \_ newave170502_L
2846 ? S 0:00 | \_ newave170502_L
2847 ? S 0:00 | \_ newave170502_L
2848 ? S 0:00 | \_ newave170502_L
2849 ? S 0:00 | \_ newave170502_L
2839 ? Sl 0:00 \_ /opt/sge6/bin/linux-x64/qrsh
-inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:46220
--demux poll --pgid 0 --retries 10 --proxy-id 1
$ mpiexec --version
HYDRA build details:
Version: 1.4
Release Date: Thu Jun 16 16:41:08 CDT 2011
CC: gcc
-I/build/buildd/mpich2-1.4/src/mpl/include
-I/build/buildd/mpich2-1.4/src/mpl/include
-I/build/buildd/mpich2-1.4/src/openpa/src
-I/build/buildd/mpich2-1.4/src/openpa/src
-I/build/buildd/mpich2-1.4/src/mpid/ch3/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/include
-I/build/buildd/mpich2-1.4/src/mpid/common/datatype
-I/build/buildd/mpich2-1.4/src/mpid/common/datatype
-I/build/buildd/mpich2-1.4/src/mpid/common/locks
-I/build/buildd/mpich2-1.4/src/mpid/common/locks
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor
-I/build/buildd/mpich2-1.4/src/util/wrappers
-I/build/buildd/mpich2-1.4/src/util/wrappers -g -O2 -g -O2 -Wall -O2
-Wl,-Bsymbolic-functions -lrt -lcr -lpthread
CXX:
F77:
F90: gfortran -Wl,-Bsymbolic-functions
-lrt -lcr -lpthread
Configure options: '--build=x86_64-linux-gnu'
'--includedir=${prefix}/include' '--mandir=${prefix}/share/man'
'--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var'
'--libexecdir=${prefix}/lib/mpich2' '--srcdir=.'
'--disable-maintainer-mode' '--disable-dependency-tracking'
'--disable-silent-rules' '--enable-shared' '--prefix=/usr' '--enable-fc'
'--disable-rpath' '--sysconfdir=/etc/mpich2'
'--includedir=/usr/include/mpich2' '--docdir=/usr/share/doc/mpich2'
'--with-hwloc-prefix=system' '--enable-checkpointing'
'--with-hydra-ckpointlib=blcr' 'build_alias=x86_64-linux-gnu'
'MPICH2LIB_CFLAGS=-g -O2 -g -O2 -Wall' 'MPICH2LIB_CXXFLAGS=-g -O2 -g -O2
-Wall' 'MPICH2LIB_FFLAGS=-g -O2' 'MPICH2LIB_FCFLAGS='
'LDFLAGS=-Wl,-Bsymbolic-functions ' 'CPPFLAGS=
-I/build/buildd/mpich2-1.4/src/mpl/include
-I/build/buildd/mpich2-1.4/src/mpl/include
-I/build/buildd/mpich2-1.4/src/openpa/src
-I/build/buildd/mpich2-1.4/src/openpa/src
-I/build/buildd/mpich2-1.4/src/mpid/ch3/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/include
-I/build/buildd/mpich2-1.4/src/mpid/common/datatype
-I/build/buildd/mpich2-1.4/src/mpid/common/datatype
-I/build/buildd/mpich2-1.4/src/mpid/common/locks
-I/build/buildd/mpich2-1.4/src/mpid/common/locks
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/include
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor
-I/build/buildd/mpich2-1.4/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor
-I/build/buildd/mpich2-1.4/src/util/wrappers
-I/build/buildd/mpich2-1.4/src/util/wrappers' 'FFLAGS= -g -O2 -O2'
'FC=gfortran' 'CFLAGS= -g -O2 -g -O2 -Wall -O2' 'CXXFLAGS= -g -O2 -g -O2
-Wall -O2' '--disable-option-checking' 'CC=gcc' 'LIBS=-lrt -lcr -lpthread '
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge
none persist
Binding libraries available: hwloc plpa
Resource management kernels available: none slurm ll lsf sge pbs
Checkpointing libraries available: blcr
Demux engines available: poll select
$ ps -eLf
sgeadmin 2837 2835 2837 0 1 19:49 ? 00:00:00 mpiexec -np 20
newave170502_L
sgeadmin 2838 2837 2838 0 1 19:49 ? 00:00:00
/usr/bin/hydra_pmi_proxy --control-port master:46220 --demux poll --pgid 0
--retries 10 --proxy-id 0
sgeadmin 2839 2837 2839 0 3 19:49 ? 00:00:00
/opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy"
--control-port master:46220 --demux poll -
sgeadmin 2839 2837 2850 0 3 19:49 ? 00:00:00
/opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy"
--control-port master:46220 --demux poll -
sgeadmin 2839 2837 2851 0 3 19:49 ? 00:00:00
/opt/sge6/bin/linux-x64/qrsh -inherit -V node001 "/usr/bin/hydra_pmi_proxy"
--control-port master:46220 --demux poll -
sgeadmin 2840 2838 2840 98 1 19:49 ? 00:04:32 newave170502_L
sgeadmin 2841 2838 2841 89 1 19:49 ? 00:04:05 newave170502_L
sgeadmin 2842 2838 2842 93 1 19:49 ? 00:04:18 newave170502_L
sgeadmin 2843 2838 2843 88 1 19:49 ? 00:04:03 newave170502_L
sgeadmin 2844 2838 2844 93 1 19:49 ? 00:04:19 newave170502_L
sgeadmin 2845 2838 2845 94 1 19:49 ? 00:04:20 newave170502_L
sgeadmin 2846 2838 2846 69 1 19:49 ? 00:03:11 newave170502_L
sgeadmin 2847 2838 2847 69 1 19:49 ? 00:03:11 newave170502_L
sgeadmin 2848 2838 2848 69 1 19:49 ? 00:03:11 newave170502_L
sgeadmin 2849 2838 2849 69 1 19:49 ? 00:03:11 newave170502_L
sgeadmin 2858 2491 2858 0 1 19:54 pts/0 00:00:00 ps -eLf
$ cat /etc/hosts
127.0.0.1 ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
# Added by cloud-init
127.0.1.1 ip-10-17-48-113.ec2.internal ip-10-17-48-113
10.17.48.113 master
10.17.48.210 node001
$ which mpiexec
/usr/bin/mpiexec
$ cat newave.tim (this is an output file of the MPI app showing that 20 slots
are being used)
Programa Newave
Versao 17.5.2
Caso: PMO JANEIRO - 2011 29/12/2010 CVAR L25 A25 niveis para 31/12 NW
Versao 17.5.x
Data: 27-07-2013
Hora: 19h 49min 28.425sec
Numero de Processadores: 20 (<-- number of processors)
Everything runs fine. The job is divided equally between the 2 servers,
occupying 10 slots on each one.
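To cross-check what SGE actually granted against what Hydra used, I can also dump
the hostfile that SGE writes for the job (a small sketch; $PE_HOSTFILE is set by
SGE in the job environment, one line per host: host, slots, queue, processor range):

# inside any job started by SGE, e.g. via a tiny wrapper script:
cat $PE_HOSTFILE
# with $round_robin and 20 slots on two 16-slot hosts I would expect roughly
# (hypothetical output):
#   master  10 all.q@master  UNDEFINED
#   node001 10 all.q@node001 UNDEFINED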
Now, if I change the PE allocation rule to $fill_up and submit the same 20-slot
job, something weird happens.
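(For reference, the switch is just the allocation_rule attribute of the PE; I'm
assuming the PE is the "orte" one from the qsub line, so roughly:)

$ qconf -mp orte
# in the editor that opens, change
#   allocation_rule    $round_robin
# to
#   allocation_rule    $fill_up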
Let's see:
$fill_up
job with 20 slots
job launched as
$qsub -N $NOMECASO -b y -pe orte 20 -cwd mpiexec -np 20 newave170502_L
$ ps -e f --cols=500
2390 ? Sl 0:01 /opt/sge6/bin/linux-x64/sge_execd
2890 ? S 0:00 \_ sge_shepherd-2 -bg
2892 ? Ss 0:00 \_ mpiexec -np 20 newave170502_L
2893 ? S 0:00 \_ /usr/bin/hydra_pmi_proxy
--control-port master:37827 --demux poll --pgid 0 --retries 10 --proxy-id 0
2895 ? R 0:31 | \_ newave170502_L
2896 ? R 0:24 | \_ newave170502_L
2897 ? R 0:24 | \_ newave170502_L
2898 ? R 0:24 | \_ newave170502_L
2899 ? R 0:24 | \_ newave170502_L
2900 ? R 0:24 | \_ newave170502_L
2901 ? S 0:00 | \_ newave170502_L
2902 ? S 0:00 | \_ newave170502_L
2903 ? S 0:00 | \_ newave170502_L
2904 ? S 0:00 | \_ newave170502_L
2894 ? Sl 0:00 \_ /opt/sge6/bin/linux-x64/qrsh
-inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:37827
--demux poll --pgid 0 --retries 10 --proxy-id 1
$ qstat -f
queuename qtype resv/used/tot. load_avg arch
states
---------------------------------------------------------------------------------
all.q@master BIP 0/16/16 8.20 linux-x64
2 0.55500 pmo_2011-0 sgeadmin r 07/27/2013 20:01:11 16
---------------------------------------------------------------------------------
all.q@node001 BIP 0/4/16 8.24 linux-x64
2 0.55500 pmo_2011-0 sgeadmin r 07/27/2013 20:01:11 4
*** As you can see, the queue filled up the first server and used the 4
slots of the second, but the MPI job used 10 slots on the first server and 10
on the other one.
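(The 10/10 split is easy to confirm by counting the application processes on
each host, just a quick check:)

# run on master and again on node001:
$ ps -C newave170502_L --no-headers | wc -l
# should print 10 on each host if the split is really 10/10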
If I resubmit it, now with 16 slots:
job with 16 slots
$ ps -e f --cols=500
2932 ? S 0:00 \_ sge_shepherd-3 -bg
2934 ? Ss 0:00 \_ mpiexec -np 16 newave170502_L
2935 ? S 0:00 \_ /usr/bin/hydra_pmi_proxy
--control-port master:50693 --demux poll --pgid 0 --retries 10 --proxy-id 0
2937 ? S 0:00 | \_ newave170502_L
2938 ? S 0:00 | \_ newave170502_L
2939 ? S 0:00 | \_ newave170502_L
2940 ? S 0:00 | \_ newave170502_L
2941 ? S 0:00 | \_ newave170502_L
2942 ? S 0:00 | \_ newave170502_L
2943 ? S 0:00 | \_ newave170502_L
2944 ? S 0:00 | \_ newave170502_L
2936 ? Z 0:00 \_ [qrsh] <defunct>
$ qstat -f
queuename qtype resv/used/tot. load_avg arch
states
---------------------------------------------------------------------------------
all.q@master BIP 0/16/16 4.39 linux-x64
3 0.55500 pmo_2011-0 sgeadmin r 07/27/2013 20:12:26 16
---------------------------------------------------------------------------------
all.q@node001 BIP 0/0/16 4.67 linux-x64
$ ps -eLf
sgeadmin 2934 2932 2934 0 1 20:12 ? 00:00:00 mpiexec -np 16
newave170502_L
sgeadmin 2935 2934 2935 0 1 20:12 ? 00:00:00
/usr/bin/hydra_pmi_proxy --control-port master:50693 --demux poll --pgid 0
--retries 10 --proxy-id 0
sgeadmin 2936 2934 2936 0 1 20:12 ? 00:00:00 [qrsh] <defunct>
sgeadmin 2937 2935 2937 0 1 20:12 ? 00:00:00 newave170502_L
sgeadmin 2938 2935 2938 0 1 20:12 ? 00:00:00 newave170502_L
sgeadmin 2939 2935 2939 0 1 20:12 ? 00:00:00 newave170502_L
sgeadmin 2940 2935 2940 0 1 20:12 ? 00:00:00 newave170502_L
sgeadmin 2941 2935 2941 0 1 20:12 ? 00:00:00 newave170502_L
sgeadmin 2942 2935 2942 0 1 20:12 ? 00:00:00 newave170502_L
sgeadmin 2943 2935 2943 0 1 20:12 ? 00:00:00 newave170502_L
sgeadmin 2944 2935 2944 0 1 20:12 ? 00:00:00 newave170502_L
sgeadmin 2949 2491 2949 0 1 20:14 pts/0 00:00:00 ps -eLf
*** Again, as you can see, the queue filled up the first server and used no
slots of the second, but the MPI job used 8 slots on the first server, tried to
use 8 on the other one, and got an error...
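One thing I may try next (an untested sketch, not something I have run yet):
build an explicit Hydra machine file from what SGE actually granted and pass it
to mpiexec, so it cannot place ranks on hosts outside the allocation:

#!/bin/sh
# hypothetical wrapper: convert SGE's $PE_HOSTFILE ("host slots queue range")
# into Hydra's "host:count" machine file format and pass it explicitly
awk '{print $1":"$2}' $PE_HOSTFILE > $TMPDIR/machines
mpiexec -f $TMPDIR/machines -np $NSLOTS newave170502_L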
Comments?
All the best, and thank you so much for your time and effort in helping with
this one...
Sergio
On Sat, Jul 27, 2013 at 3:58 PM, Reuti <[email protected]> wrote:
> On 27.07.2013 at 16:25, Sergio Mafra wrote:
>
> > Reuti,
> >
> > Aggregating all data...
> >
> > My cluster has 2 servers (master and node001), with 16 slots each one.
> >
> > My mpi app is newave170502_L
> >
> > I ran 3 tests:
> >
> > 1. $round_robin using 32 slots: (ran ok)
> >
> > 2382 ? Sl 0:00 /opt/sge6/bin/linux-x64/sge_execd
> > 2817 ? S 0:00 \_ sge_shepherd-1 -bg
> > 2819 ? Ss 0:00 \_ mpiexec newave170502_L
> > 2820 ? S 0:00 \_ /usr/bin/hydra_pmi_proxy
> --control-port master:40945 --demux poll --pgid 0 --retries 10 --proxy-id 0
> > 2822 ? R 0:30 | \_ newave170502_L
> > 2821 ? Sl 0:00 \_ /opt/sge6/bin/linux-x64/qrsh
> -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:40945
> --demux poll --pgid 0 --ret
>
> As both nodes are used, this will succeed. I wonder why there is only one
> `newave170502` process. It should show 16 on each machine as child of the
> particular `hydra_pmi_proxy`.
>
> What is the output of:
>
> mpiexec --version
>
> Maybe the application is using threads in addition. Does:
>
> ps -eLf
>
> list more instances of the application?
>
>
> > 2. $fill_up with 16 slots: (aborted with error error: executing task of
> job 2 failed: execution daemon on host "node001" didn't accept task)
> >
> > 2842 ? S 0:00 \_ sge_shepherd-2 -bg
> > 2844 ? Ss 0:00 \_ mpiexec newave170502_L
> > 2845 ? S 0:00 \_ /usr/bin/hydra_pmi_proxy
> --control-port master:45562 --demux poll --pgid 0 --retries 10 --proxy-id 0
> > 2847 ? S 0:00 | \_ newave170502_L
> > 2846 ? Z 0:00 \_ [qrsh] <defunct>
>
> SGE allocated all slots to the "master" and none to "node001", as the
> submitted job can get the required amount of slots from only one machine,
> there is no need to spread another task on "node001". The question is: why
> is your application (or even the `mpiexec`) trying to do so? There were
> cases, where SGE was misled due to contradictory entries in:
>
> /etc/hosts
>
> having two or more different names for each machine.
>
> - What is the content of this file in your machines?
>
> - Is
>
> > 3. $fill_up with 18 slots (ran ok):
> >
> > 2382 ? Sl 0:01 /opt/sge6/bin/linux-x64/sge_execd
> > 2861 ? Sl 0:00 \_ sge_shepherd-3 -bg
> > 2862 ? Ss 0:00 \_
> /opt/sge6/utilbin/linux-x64/qrsh_starter
> /opt/sge6/default/spool/exec_spool_local/master/active_jobs/3.1/1.master
> > 2869 ? S 0:00 \_ /usr/bin/hydra_pmi_proxy
> --control-port node001:36673 --demux poll --pgid 0 --retries 10 --proxy-id 0
> > 2870 ? R 0:24 \_ newave170502_L
>
> While in former times (with the old MPICH(1)) each slave task needed its own
> `qrsh --inherit ...`, nowadays only one is used and all additional
> processes on the master or any slave node are forks.
>
> I guess even 17 would work, as it would need at least one slot from the
> other machine.
>
> - Is there any comment in the output of your application, how many
> processes were started for a computation?
>
> - Is the `mpiexec` a plain binary, or some kind of wrapper script?
>
> file `which mpiexec`
>
> If it's a symbolic link, it should point to mpiexec.hydra and the inquiry
> can be repeated.
>
> -- Reuti
>
>
> > ---------- Forwarded message ----------
> > From: Sergio Mafra <[email protected]>
> > Date: Sat, Jul 27, 2013 at 11:07 AM
> > Subject: Fwd: [gridengine users] Round Robin x Fill Up
> > To: Reuti <[email protected]>, "[email protected]" <
> [email protected]>
> >
> >
> > Appending to previous message.
> >
> > If I change to $fill_up and submit the same job using only 16 slots of
> 32 available slots. here comes the output:
> >
> > 2842 ? S 0:00 \_ sge_shepherd-2 -bg
> > 2844 ? Ss 0:00 \_ mpiexec newave170502_L
> > 2845 ? S 0:00 \_ /usr/bin/hydra_pmi_proxy
> --control-port master:45562 --demux poll --pgid 0 --retries 10 --proxy-id 0
> > 2847 ? S 0:00 | \_ newave170502_L
> > 2846 ? Z 0:00 \_ [qrsh] <defunct>
> > ---------- Forwarded message ----------
> > From: Sergio Mafra <[email protected]>
> > Date: Sat, Jul 27, 2013 at 10:58 AM
> > Subject: Re: [gridengine users] Round Robin x Fill Up
> > To: Reuti <[email protected]>
> > Cc: "[email protected]" <[email protected]>
> >
> >
> > Hi Reuti,
> >
> > >Do you start in your job script any `mpiexec` resp. `mpirun` or is this
> issued already inside the application you started? The question is,
> whether there is any additional "-hostlist", "-machinefile" or alike given
> as argument to this command and invalidating the generated $PE_HOSTFILE of
> SGE.
> >
> > The job is started using mpiexec, in this way:
> > $ qsub -N $nameofthecase -b y -pe orte $1 -cwd mpiexec newave170502_L
> > where newave170502_L is the name of mpi app.
> >
> > >You can also try the following:
> > >
> > >- revert the PE definition to allocate by $round_robin
> > >- submit a job
> > >- SSH to the master node of the parallel job
> > >- issue:
> > >
> > >ps -e f --cols=500
> > >
> > >(f w/o -)
> >
> > >- somewhere should be the `mpiexec` resp. `mpirun` command. Can you
> please post this line, it should be a child of the started job script.
> >
> > Here comes the output:
> >
> > 2382 ? Sl 0:00 /opt/sge6/bin/linux-x64/sge_execd
> > 2817 ? S 0:00 \_ sge_shepherd-1 -bg
> > 2819 ? Ss 0:00 \_ mpiexec newave170502_L
> > 2820 ? S 0:00 \_ /usr/bin/hydra_pmi_proxy
> --control-port master:40945 --demux poll --pgid 0 --retries 10 --proxy-id 0
> > 2822 ? R 0:30 | \_ newave170502_L
> > 2821 ? Sl 0:00 \_ /opt/sge6/bin/linux-x64/qrsh
> -inherit -V node001 "/usr/bin/hydra_pmi_proxy" --control-port master:40945
> --demux poll --pgid 0 --retries 10 --proxy-id 1
> >
> > All best,
> >
> > Sergio
> >
> >
> > On Sat, Jul 27, 2013 at 10:13 AM, Reuti <[email protected]>
> wrote:
> > Hi,
> >
> > On 26.07.2013 at 23:26, Sergio Mafra wrote:
> >
> > > Hi Reuti,
> > >
> > > Thanks for your prompt answer.
> > > Regarding your questions:
> > >
> > > > How does your application read the list of granted machines?
> > > > Did you compile MPI on your own (which implementation in detail)?
> > >
> > > I've got no control over or documentation for this app. It was designed
> by an Electrical Research Center for our purposes.
> > >
> > > > PS: I assume that with $round_robin simply all (or at least: many)
> nodes were allowed access.
> > >
> > > Yes. It's correct.
> > >
> > > >As now hosts are first filled before access to another one is
> granted, you might see the effect of the former (possibly wrong)
> distribution of slave tasks to the nodes
> > >
> > > So I understand that the app should be recompiled to take advantages
> of $fill_up option?
> >
> > Not necessarily; the version of MPI used is obviously prepared to run
> under the control of SGE, as it uses `qrsh -inherit ...` to start slave
> tasks on other nodes. Unfortunately also on machines/slots which weren't
> granted for this job and results in the error you mentioned first.
> >
> > Do you start in your job script any `mpiexec` resp. `mpirun` or is this
> issued already inside the application you started? The question is, whether
> there is any additional "-hostlist", "-machinefile" or alike given as
> argument to this command and invalidating the generated $PE_HOSTFILE of SGE.
> >
> > The MPI library should detect the granted allocation automatically, as
> it honors already that it's started under SGE.
> >
> > You can also try the following:
> >
> > - revert the PE definition to allocate by $round_robin
> > - submit a job
> > - SSH to the master node of the parallel job
> > - issue:
> >
> > ps -e f --cols=500
> >
> > (f w/o -)
> >
> > - somewhere should be the `mpiexec` resp. `mpirun` command. Can you
> please post this line, it should be a child of the started job script.
> >
> > -- Reuti
> >
> >
> > > All the best,
> > >
> > > Sergio
> > >
> > >
> > > On Fri, Jul 26, 2013 at 10:06 AM, Reuti <[email protected]>
> wrote:
> > > Hi,
> > >
> > > On 26.07.2013 at 14:22, Sergio Mafra wrote:
> > >
> > > > I'm using MIT StarCluster with mpich2 and OGE. Everything's ok.
> > > > But when I tried to change the strategy of distribution of work from
> Round Robin (default) to Fill Up... my problems began.
> > > > OGE keeps telling me that some nodes cannot receive tasks...
> > >
> > > On the one hand this is a good sign, as it confirms that your PE is
> defined to control slave tasks on the nodes.
> > >
> > >
> > > > "Error: executing task of job 9 failed: execution daemon on host
> "node002" didn't accept task"It seems that my mpi app always tries to run
> in all nodes of the cluster, no matter if OGE doesn't allow it to do it.
> > > > Does anybody know of a workaround?
> > >
> > > This indicates that your application tries to use a node in the
> cluster which wasn't granted to this job by SGE.
> > >
> > > How does your application read the list of granted machines?
> > >
> > > Did you compile MPI on your own (which implementation in detail)?
> > >
> > > -- Reuti
> > >
> > > PS: I assume that with $round_robin simply all (or at least: many)
> nodes were allowed access. As now hosts are first filled before access
> to another one is granted, you might see the effect of the former (possibly
> wrong) distribution of slave tasks to the nodes.
> > >
> >
> >
> >
> >
>
>