Re: [gridengine users] Q: Understanding of Loose and Tight Integration of PEs.

Lee, Wayne Tue, 01 Dec 2015 13:35:49 -0800

Hi Reuti,

I just wanted to thank you again for providing answers and comments regarding 
my inquiries about MPI and GE integration.   I was able to get our 3rd party 
Platform MPI application to be managed by GE.   I did this by creating a GE 
script setting the application's environment variables in the script along with 
setting MPI_TMPDIR=$TMPDIR in the script.   At this point, I've demonstrated to 
the application vendor that this works.   It is now up to them to get their 
wrapper scripts to build the GE submission scripts.


I wanted to also ask you if there are standardized ways to handled OpenMP 
applications with GE.   Is it similar to the ways we've been discussing with 
MPI based applications?   I'm also working on getting another 3rd party 
application to be managed by GE.   The vendor has stated that they use OpenMP 
for multi-threading (in a single process) and that they phased out the use of 
OpenMPI for multi-tasking (running multiple processes on different nodes across 
the network).  

>From what I have seen when running this application interactively, it looks 
>like the application allows one to specify the total number of tasks in a job. 
>  The application also allows one to specify a path to a "machinefile", the 
>number of CPUs to be used for computations on each node and the maximum amount 
>of memory the application is allowed to allocate.    I'm guessing that from a 
>GE perspective, the total number tasks could be thought of as the total number 
>of nodes to use where each node would use a specific number of CPUs.

Any thoughs/comments would be appreciated.

Regards,

-----
Wayne Lee


-----Original Message-----
From: Reuti [mailto:[email protected]] 
Sent: Saturday, November 21, 2015 8:26 AM
To: Lee, Wayne <[email protected]>
Cc: [email protected] Group <[email protected]>
Subject: Re: [gridengine users] Q: Understanding of Loose and Tight Integration 
of PEs.

Hi,

Am 21.11.2015 um 06:14 schrieb Lee, Wayne:

> Hi Reuti,
> 
> First of all.   Thanks (Danke) for your very quick and prompt reply.   I was 
> amazed that you were able to reply so quickly given the time I received your 
> reply, it must have been late at night for you.
> 
> Anyway, thank you for clarifying some things about the details regarding PEs. 
>  I'm attempting to see how use GE to manage an application using Platform MPI 
> version 9.1.2.   I did see a some posting from back in 2014 where you 
> provided some assistance to an individual attempting to use GE to manage a 
> Platform MPI application.    As far as I know, Platform MPI doesn't have 
> built in support for GE.

Correct. But they provide a MPICH(1) behavior to accept a plain list of nodes 
for each slot with the -hostfile option.


>   I would expect this given that Platform has LSF as their job scheduler and 
> I wouldn't expect Platform to support a competing job scheduler like GE.

Well, it's owned by IBM now and they even have a free license of Platform MPI: 
http://www.ibm.com/developerworks/downloads/im/mpi/ - and before it was at HP.


> After looking over the postings on the GE forum, I was able to submit the 
> Platform MPI applications to GE using the following "qsub" command.   Based 
> on what I did, I believe I've used "tight integration"

According to the `ps -e f` output you provided, it looks like a proper tight 
integration.


> since the slave processes were started up by "qrsh".   The job submitted used 
> a total of 16 CPU cores, one for each MPI process (rank) that was requested.  
>  The PE used was configured to distribute the processes in a round-robin 
> fashion.  
> 
> qsub -l myapp_program=1 -l mr_p=8 -h -l AHC_JobType=myapp -l 
> AHC_JobClass=prod -v 
> PATH=/apps/myapp/tools/linux_x86_64/platform/9.1.2/bin:$PATH -v 
> MPI_REMSH=ssh -v  \ MPI_TMPDIR=/tmp/ge -V -t 1-1 -N TEST_DATA -q 
> prod_myapp_rr_pmpi.low.q -V -cwd -j y -b y -o TEST_DATA.OUT -pe 
> prod_myapp_rr_pmpi 16  \ 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpirun -hostfile 
> /tmp/ge/pmpi_machines -np 16 -prot -aff=automatic:bandwidth \ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe 
> CEIBA-V3-LO12-4PV_E100

Can more than one job being run on  a node at the same time? I fear 
/tmp/ge/pmpi_machines might get overridden then, like any scratch data in 
MPI_TMPDIR.

> 
> Comments about the above qsub command:
> ==================================
> 
> 1. The above qsub is not what our users would execute to run the Platform MPI 
> application.   The application vendor provides a Python wrapper script which 
> essentially builds a qsub script and submits the script on behalf of the 
> user.   The Python wrapper script accepts various command-line arguments 
> which provides the end-user a choice of which job scheduler, application, and 
> MPI versions to use.   I have found that the vendor's wrapper script does has 
> some minor bugs in it as far as how it supports GE.   Hence, I decided to 
> bypass the script in order to ensure that the application could be tested 
> with GE.
> 
> 2. Most of the qsub command-line arguments you're familiar with already.   
> However, I wanted to focus on the "-v" environment variables and the "-V" 
> options.
> 
> - After seeing the postings from 2014, it was suggested to pass the PATH 
> environment variable to include the path to the Platform MPI binaries, 
> MPI_REMSH which specifies the protocol that Master and Slave processes in 
> Platform MPI would communicate.   In the case of the vendor's application, 
> they chose "ssh".    I'm not quite sure exactly what MPI_TMPDIR is used for 
> but I guess it is the location on all Slave and Master host for temporary 
> files to be written to by Platform MPI.   I just know that I set it to 
> "/tmp/ge" which is a directory on all the nodes I've tested the Platform MPI 
> application on.    The "/tmp/ge" location is also where I configured versions 
> of the "startmpi.sh", "stopmpi.sh", and "rsh" scripts write out the 
> $pe_hostfile to as the job was executed by GE.    From the past postings, it 
> was suggested that one set the MPI_TMPDIR variable to $TMPDIR which is what 
> GE sets.   However, when I attempted to set this in the above qsub command, 
> MPI_TMPDIR wouldn't g!
 et set.   So I am wondering how do I set "MPI_TMPDIR" so that it uses the GE's 
value for $TMPDIR?

If you set it on the command line, it will be expanded at submission time and 
at that time it's not known. Escaping it might put a plain $TMPDIR in the 
variable - also not the desired effect. Would it be possible to have a small 
wrapper script, where you could also use other SGE feature like putting fixed 
options to `qsub` therein and use some environment variables set by SGE:

#!/bin/sh
#$ -cwd
#$ -j y
export PATH=/apps/myapp/tools/linux_x86_64/platform/9.1.2/bin:$PATH
export MPI_TMPDIR=$TMPDIR
export HOSTFILE=$MPI_TMPDIR/machines
/apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpirun -hostfile $HOSTFILE 
-np $NSLOTS ...


> - I should point that I included the use of the "-V" option since I wanted 
> the qsub I used to inherit the shell environment I was using when I ran qsub. 
>    The shell environment that the application vendor recommends all of our 
> users to use is C-Shell (i.e. /bin/csh).

But the shell used in the application script is independent from the shell used 
on the command line. They suggest to use the csh for their scripts or also on 
the command line? Just the setting of the queue might need to be adjusted: 
"shell_start_mode" (see `man queue_conf` - I prefer "unix_behavior") and 
additional "shell" in some cases, depending on what behavior you prefer.


>     One thing I am wondering is should I set up this environment along with 
> the "PATH" to the Platform MPI binaries and the MPI_REMSH and MPI_TMPDIR 
> variables in a C-Shell script and get this script executed as part of the 
> "starter_method" for the queue I am using for these Platform MPI jobs?  Or is 
> there some other preferred methods for passing the login environment and 
> other environment variables so that the job will execute properly?

See above. Whether Platform's `mpirun` is executed in a csh or bash script 
should make no difference. I wonder about the intention of the vendor for its 
statement and whether its mandatory for his software or just a suggestion like 
a personal preference.


>   I hope I'm being clear here.   Also, if the application login shell is 
> C-shell, does that mean that if I define a script for the "starter_method", 
> that this script should be a C-shell based script?

No. You could even switch to the by -S requested shell by checking 
SGE_STARTER_SHELL_PATH (see `man queue_conf`), or hard code any shell therein: 
`exec` the desired shell in the "starter_method" with the main script as 
argument.

-- Reuti


> Additional information:
> ==================
> 
> Parallel Environment Used:
> ---------------------------------
> pe_name               prod_myapp_rr_pmpi
> slots                 9999
> used_slots            0
> bound_slots           0
> user_lists            NONE
> xuser_lists           NONE
> start_proc_args       /nfs/njs/ge/mpi/pmpi/startpmpi.sh -catch_rsh 
> $pe_hostfile
> stop_proc_args                /nfs/njs/ge/mpi/pmpi/stoppmpi.sh
> allocation_rule               $round_robin
> control_slaves                TRUE
> job_is_first_task     TRUE
> urgency_slots         min
> accounting_summary    TRUE
> daemon_forks_slaves   FALSE
> master_forks_slaves   FALSE
> 
> 
> Startpmpi.sh script used (This is a copy of the template startmpi.sh script 
> provided by GE.   I only show the parts I modified/added.   The rest remained 
> the same.)
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> --------------------------------------------------------
> 
> # trace machines file
> cat $machines
> 
> # make copy of $machines to /tmp/ge on each node.
> cp -p $machines /tmp/ge/pmpi_machines
> .
> .
> .
> if [ $catch_rsh = 1 ]; then
>   rsh_wrapper=$SGE_ROOT/mpi/pmpi/rsh      ### Changed location of rsh script
>   if [ ! -x $rsh_wrapper ]; then
>      echo "$me: can't execute $rsh_wrapper" >&2
>      echo "     maybe it resides at a file system not available at this 
> machine" >&2
>      exit 1
>   fi
> 
> #   rshcmd=rsh
>   rshcmd=ssh          ### Since Platform MPI application wants to use ssh 
> instead of rsh.
>   case "$ARC" in
>      hp*) rshcmd=remsh ;;
>      *) ;;
>   esac
>   # note: This could also be done using rcp, ftp or s.th.
>   #       else. We use a symbolic link since it is the
>   #       cheapest in case of a shared filesystem
>   #
>   ln -s $rsh_wrapper $TMPDIR/$rshcmd                  ### Hence an "ssh" link 
> to $SGE_ROOT/mpi/pmpi/rsh is created.
> fi
> .
> .
> .
> if [ $catch_hostname = 1 ]; then
>   hostname_wrapper=$SGE_ROOT/mpi/pmpi/hostname        ### Changed location of 
> hostname script
> .
> .
> .
> exit 0
> 
> Stoppmpi.sh script used (This is a copy of the template stopmpi.sh script 
> provided by GE.   I only show the parts I modified/added.   The rest remained 
> the same.)
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> --------------------------------------------------------
> 
> #
> # Just remove machine-file that was written by startpmpi.sh # #rm 
> $TMPDIR/machines
> rm /tmp/ge/pmpi_machines              ### Remove list of hosts from each node.
> 
> #rshcmd=rsh
> rshcmd=ssh                    ### Changed to ssh for Platform MPI application.
> case "$ARC" in
>   hp*) rshcmd=remsh ;;
>   *) ;;
> esac
> rm $TMPDIR/$rshcmd
> 
> exit 0
> 
> rsh script used (This is a copy of the template rsh script provided by GE.   
> I only show the parts I modified/added.   The rest remained the same.)
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> --------------------------------------------------------
> .
> .
> .
> if [ x$just_wrap = x ]; then
>   if [ $minus_n -eq 1 ]; then
>      echo $SGE_ROOT/bin/$ARC/qrsh -V -inherit -nostdin $rhost $cmd            
>                 ### -V option added in order to pass login environment to 
> qrsh.
>      exec $SGE_ROOT/bin/$ARC/qrsh -V -inherit -nostdin $rhost $cmd            
>         ### -V option added in order to pass login environment to qrsh.
>   else
>      echo $SGE_ROOT/bin/$ARC/qrsh -V -inherit $rhost $cmd                     
>         ### -V option added in order to pass login environment to qrsh.
>      exec $SGE_ROOT/bin/$ARC/qrsh -V -inherit $rhost $cmd                     
>         ### -V option added in order to pass login environment to qrsh.
>   fi
> else
> .
> .
> .
> 
> ps -ef f output form job submitted by qsub (Only partial listing shown 
> for Master and Slave nodes.)
> ----------------------------------------------------------------------
> --------------------------------------------------
> 
> Node n100
> ==================
> sgeadmin 14768     1  0 Oct14 ?        Sl   127:51 
> /nfs/njs/ge/bin/lx-amd64/sge_execd
> sgeadmin 32339 14768  0 21:13 ?        S      0:00  \_ sge_shepherd-1304 -bg
> csh_test 32402 32339  0 21:13 ?        Ss     0:00      \_ -csh -c 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpirun -hostfile 
> /tmp/ge/pmpi_machines -np 16 -prot -aff=automatic:bandwidth 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA 
> csh_test 32501 32402  0 21:13 ?        S      0:00          \_ 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpirun -hostfile 
> /tmp/ge/pmpi_machines -np 16 -prot -aff=automatic:bandwidth 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 32504 32501  0 21:13 ?        S      0:00              \_ 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 0 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test 32630 32504 20 21:13 ?        Rl     0:11              |   \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 32631 32504 97 21:13 ?        R      0:52              |   \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 32632 32504 97 21:13 ?        R      0:52              |   \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 32633 32504 97 21:13 ?        R      0:52              |   \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 32505 32501  0 21:13 ?        S      0:00              \_ cat
> csh_test 32506 32501  0 21:13 ?        Sl     0:00              \_ 
> /nfs/njs/ge/bin/lx-amd64/qrsh -V -inherit -nostdin 10.231.82.13 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 1 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test 32507 32501  0 21:13 ?        Sl     0:00              \_ 
> /nfs/njs/ge/bin/lx-amd64/qrsh -V -inherit -nostdin 10.231.82.215 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 2 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test 32508 32501  0 21:13 ?        Sl     0:00              \_ 
> /nfs/njs/ge/bin/lx-amd64/qrsh -V -inherit -nostdin 10.231.83.42 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 3 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> 
> Node n101
> ==================
> sgeadmin  7608     1  0 Aug05 ?        Sl   351:24 
> /nfs/njs/ge/bin/lx-amd64/sge_execd
> sgeadmin 25337  7608  0 21:13 ?        Sl     0:00  \_ sge_shepherd-1304 -bg
> csh_test 25338 25337  0 21:13 ?        Ss     0:00      \_ 
> /nfs/njs/ge/utilbin/lx-amd64/qrsh_starter 
> /tmp/ge/n101/active_jobs/1304.1/1.n101
> csh_test 25345 25338  0 21:13 ?        S      0:00          \_ csh -c 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 1 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test 25443 25345  0 21:13 ?        S      0:00              \_ 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 1 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test 25538 25443 98 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 25539 25443 98 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 25540 25443 98 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 25541 25443 98 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> 
> Node n102
> ==================
> sgeadmin  3647     1  0 Aug05 ?        Sl   346:57 
> /nfs/njs/ge/bin/lx-amd64/sge_execd
> sgeadmin 24051  3647  0 21:13 ?        Sl     0:00  \_ sge_shepherd-1304 -bg
> csh_test 24052 24051  0 21:13 ?        Ss     0:00      \_ 
> /nfs/njs/ge/utilbin/lx-amd64/qrsh_starter 
> /tmp/ge/n102/active_jobs/1304.1/1.n102
> csh_test 24059 24052  0 21:13 ?        S      0:00          \_ csh -c 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 2 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test 24157 24059  0 21:13 ?        S      0:00              \_ 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 2 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test 24252 24157 97 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 24253 24157 97 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 24254 24157 97 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test 24255 24157 97 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> 
> Node n103
> ==================
> sgeadmin  2412     1  0 Sep03 ?        Sl   250:56 
> /nfs/njs/ge/bin/lx-amd64/sge_execd
> sgeadmin  5569  2412  0 21:13 ?        Sl     0:00  \_ sge_shepherd-1304 -bg
> csh_test  5570  5569  0 21:13 ?        Ss     0:00      \_ 
> /nfs/njs/ge/utilbin/lx-amd64/qrsh_starter 
> /tmp/ge/n103/active_jobs/1304.1/1.n103
> csh_test  5577  5570  0 21:13 ?        S      0:00          \_ csh -c 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 3 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test  5675  5577  0 21:13 ?        S      0:00              \_ 
> /apps/myapp/tools/linux_x86_64/platform/9.1.2/bin/mpid 3 0 151060992 
> 10.231.82.15 40823 32501 /apps/myapp/tools/linux_x86_64/platform/9.1.2
> csh_test  5770  5675 99 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test  5771  5675 99 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test  5772  5675 99 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> csh_test  5773  5675 99 21:13 ?        R      0:52                  \_ 
> /apps/myapp/2014.1/bin/linux_x86_64/myapp_program_plmpi.exe TEST_DATA
> 
> 
> Kind Regards,
> 
> -----
> Wayne Lee
> 
> 
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Wednesday, November 18, 2015 4:26 PM
> To: Lee, Wayne
> Cc: [email protected] Group
> Subject: Re: [gridengine users] Q: Understanding of Loose and Tight 
> Integration of PEs.
> 
> Ups - fatal typo - it's late:
> 
> Am 18.11.2015 um 23:09 schrieb Reuti:
> 
>> Hi,
>> 
>> Am 18.11.2015 um 22:00 schrieb Lee, Wayne:
>> 
>>> To list,
>>> 
>>> I've been reading some of the information from various web links regarding 
>>> the differences between "loose" and "tight" integration associated with 
>>> Parallel Environments (PEs) within Grid Engine (GE).   One of the weblinks 
>>> I found which provides a really good explanation of this is "Dan 
>>> Templeton's PE Tight Integration 
>>> (https://blogs.oracle.com/templedf/entry/pe_tight_integration).  I would 
>>> like to just confirm my understanding of "loose"/"tight" integration as 
>>> well as what the role of the "rsh" wrapper is in the process. 
>>> 
>>> 1.       Essentially, as best as I can tell an application, regardless if 
>>> it is setup to use either "loose" or "tight" integration have the GE 
>>> "sge_execd" execution daemon start up the "Master" task that is part of a 
>>> parallel job application.   An example of this would be an MPI (eg. LAM, 
>>> Intel, Platform, Open, etc.) application.   So I'm assuming I would the 
>>> "sge_execd" daemon fork off a "sge_shepherd" process which in turn starts 
>>> up something like "mpirun" or some script.  Is this correct?
>> 
>> Yes.
>> 
>> But to be complete: in addition we first have to distinguish whether the MPI 
>> slave tasks can be started by an `ssh`/`rsh` (resp. `qrsh -inherit ...` for 
>> a tight integration) on its own, or whether they need some running daemons 
>> beforehand. Creating a tight integration for a daemon based setup is more 
>> convoluted by far, and my Howtos for PVM, LAM/MPI and early versions of 
>> MPICH2 are still available, but I wouldn't recommend to use it - unless you 
>> have some legacy applications which depend on this and you can't recompile 
>> them.
>> 
>> Recent versions of Intel MPI, Open MPI, MPICH2 and Platform MPI can achieve 
>> a tight integration with minimal effort. Let me know if you need more 
>> information about a specific one.
>> 
>> 
>>> 2.       The differences between the "loose" and "tight" integration is how 
>>> the parallel job application's "Slave" tasks are handled.   With "loose" 
>>> integration the slave tasks/processes are not managed and started by GE.   
>>> The application would start up the slave tasks via something like "rsh" or 
>>> "ssh".    An example of this is mpirun starting the various slave processes 
>>> to the various nodes listed in the "$pe_hostlist" provided by GE.  With 
>>> "tight" integration, the slave tasks/processes are managed and started by 
>>> GE but through the use of "qrsh".  Is this correct?
>> 
>> Yes.
>> 
>> 
>>> 3.       One of the things I was reading from the document discussing 
>>> "loose" and "tight" integration using LAM MPI was the differences in the 
>>> way they handle "accounting" and how the processes associated with a 
>>> parallel job are handled if deleted using qdel.    By "accounting", does 
>>> this mean that the GE is able to better keep track of where each of the 
>>> slave tasks are and how much resources are being used by the slave tasks?   
>>>  So does this mean that "tight" integration is preferable over "loose" 
>>> integration since one allows GE to better keep track of the resources used 
>>> by the slave tasks and one is able to better delete a "tight" integration 
>>> job in a "cleaner" manner?
>> 
>> Yes - absolutely.
>> 
>> 
>>> 4.       Continuing with "tight" integration.   Does this also mean that if 
>>> a parallel MPI application uses either "rsh" or "ssh" to facilitate the 
>>> communications between the Master and Slave tasks/processes, that 
>>> essentially, "qrsh", intercepts or replaces the communications performed by 
>>> "rsh" or "ssh"?     Hence this is why the "rsh" wrapper script is used to 
>>> facilitate the "tight" integration.   Is that correct?
>> 
>> The wrapper solution is only necessary in case the actual MPI library 
>> has now builtin support for SGE. In case of Open MPI (./configure 
>> --with-sge ...) and
> 
> Should read: [...] actual MPI library has no builtin support [...]
> 
> -- Reuti
> 
> 
>> MPICH2 the support is built in and you can find hints to set it up on their 
>> websites - no wrapper necessary and the start_/stop-_proc_args can be set to 
>> NONE (i.e.: they call `qrsh` directly, in case they discover that they are 
>> executed under SGE [by certain set environment variables]). The 
>> start_proc_args in the PE was/is used to set up the links to the wrapper(s) 
>> and reformat the $pe_hostfile, in case the parallel library understands only 
>> a different format*. This is necessary e.g. for MPICH(1).
>> 
>> *) In case you heard of the application Gaussian: I also create the 
>> "%lindaworkers=..." list of nodes for the input file line in the 
>> start_proc_args.
>> 
>> 
>>> 5.       I was reading from some of the postings in the GE archive from 
>>> someone named "Reuti" regarding the "rsh" wrapper script.   If I understood 
>>> what he wrote correctly, it doesn't matter if the Parallel MPI application 
>>> is using either "rsh" or "ssh", the "rsh" wrapper script provided by GE is 
>>> just to force the application so use GE's qrsh?    Am I stating this 
>>> correctly?    Another way to state this is that "rsh" is just a name.   The 
>>> name could be anything as long as your MPI application is configured to use 
>>> whatever name of the communications protocol is used by the application, 
>>> essentially the basic contents of the wrapper script won't change aside 
>>> from the name "rsh" and locations of scripts referenced by the wrapper 
>>> script.   Again, am I stating this correctly?
>> 
>> Yes to all.
>> 
>> 
>>> 6.       With regards to the various types and vendor's MPI implementation. 
>>>   What does it exactly mean that certain MPI implementations are GE aware?  
>>>  I tend to think that this means that parallel applications built with GE 
>>> aware MPI implementations know where to find the "$pe_hostfile" that GE 
>>> generates based on what resources the parallel application needs.   Is that 
>>> all to it for the MPI implementation to be GE aware?    I know that with 
>>> Intel or Open MPI, the PE environments that I've created don't really 
>>> require any special scripts for the "start_proc_args" and "stop_proc_args" 
>>> parameters in the PE.    However, based on what little I have seen, LAM and 
>>> Platform MPI implementations appear to require one to use scripts based on 
>>> ones like "startmpi.sh" and "stopmpi.sh" in order to setup the proper 
>>> formatted $pe_hostfile to be used by these MPI implementations.   Is my 
>>> understanding of this correct?
>> 
>> Yes. While LAM/MPI is daemon based, Platform MPI uses a plain call to the 
>> slave nodes and can be tightly integrated by the wrapper and setting `export 
>> MPI_REMSH=rsh`.
>> 
>> For a builtin tight integration the MPI library needs to a) discover under 
>> what queuing system it is running (set environment variables, cna be SGE, 
>> SLURM, LSF, PBS, ...), b) find and honor the $pe_hostfile automatically 
>> (resp. other files for other queuing systems), c) start `qrsh -inherit ...` 
>> to start something on the granted nodes (some implementations need -V here 
>> too (you can check the source of Open MPI for example), to forward some 
>> variable to the slaves - *not* `qsub -V ...` which I try to avoid, as a 
>> random adjustment to the user's shell might lead to a crash of the job when 
>> it finally starts and this can be really hard to investigate, as a new 
>> submission with a fresh shell might work again.
>> 
>> 
>>> 7.       I was looking at the following options for the "qconf -sconf" 
>>> (global configuration) from GE.   
>>> 
>>> qlogin_command             builtin
>>> qlogin_daemon                builtin
>>> rlogin_command              builtin
>>> rlogin_daemon                 builtin
>>> rsh_command                   builtin
>>> rsh_daemon                      builtin
>>> 
>>> I was attempting to fully understand how the above parameters are related 
>>> to the execution of Parallel application jobs in GE.   What I'm wonder here 
>>> is if the parallel application job I would want GE to manage requires and 
>>> uses "ssh" by default for communications between Master and Slave tasks, 
>>> does this mean, that the above parameters would need to be configured to 
>>> use "slogin", "ssh", "sshd", etc.?
>> 
>> No. These are two different things. With all of the above settings (before 
>> this question) you first configure SGE to intercept the `rsh` resp. `ssh` 
>> call (hence the application should never use an absolute path to start 
>> them). This will lead the to effect that `qrsh - inherit ...` will finally 
>> call the communication method which is configured by "rsh_command" and 
>> "rsh_daemon". If possible they should stay as "builtin". Then SGE will use 
>> its own internal communication to start the slave tasks, hence the cluster 
>> needs no `ssh` or `rsh` at all. In my clusters this is even disabled for 
>> normal users, and only admins can `ssh` to the nodes (if a user needs X11 
>> forwarding to a node, this would be special of course). To let users check a 
>> node they have to run an interactive job in a special queue, which grants 
>> only 10 seconds CPU time (while the wallclock time can be almost infinity).
>> 
>> Other settings for these parameters are covered in this document - also 
>> different kinds of communication can be set up for different nodes and 
>> direction of the calls:
>> 
>> https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
>> 
>> Let me know in case you need further details.
>> 
>> -- Reuti
>> 
>> 
>>> 
>>> Apologies for all the questions.   I just want to ensure I understand the 
>>> PEs a bit more.
>>> 
>>> Kind Regards,
>>> 
>>> -------
>>> Wayne Lee
>>> 
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>> 
>> 
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>> 
> 
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

Re: [gridengine users] Q: Understanding of Loose and Tight Integration of PEs.

Reply via email to