Hi,

On 18.11.2015 at 22:00, Lee, Wayne wrote:

> To list,
>  
> I’ve been reading some of the information from various web links regarding 
> the differences between “loose” and “tight” integration associated with 
> Parallel Environments (PEs) within Grid Engine (GE).   One of the web links I 
> found which provides a really good explanation of this is Dan Templeton’s “PE 
> Tight Integration” 
> (https://blogs.oracle.com/templedf/entry/pe_tight_integration).  I would like 
> to confirm my understanding of “loose”/“tight” integration as well as 
> the role of the “rsh” wrapper in the process. 
>  
> 1.       Essentially, as best as I can tell, an application, regardless of 
> whether it is set up to use “loose” or “tight” integration, has the GE 
> “sge_execd” execution daemon start up the “Master” task that is part of a 
> parallel job application.   An example of this would be an MPI (e.g. LAM, 
> Intel, Platform, Open, etc.) application.   So I’m assuming the “sge_execd” 
> daemon would fork off a “sge_shepherd” process which in turn starts up 
> something like “mpirun” or some script.  Is this correct?

Yes.

But to be complete: in addition we first have to distinguish whether the MPI 
slave tasks can be started by a plain `ssh`/`rsh` (resp. `qrsh -inherit ...` for 
a tight integration) on their own, or whether they need some daemons running 
beforehand. Creating a tight integration for a daemon-based setup is far more 
convoluted, and my Howtos for PVM, LAM/MPI and early versions of MPICH2 
are still available, but I wouldn't recommend using them - unless you have some 
legacy applications which depend on this and can't be recompiled.

Recent versions of Intel MPI, Open MPI, MPICH2 and Platform MPI can achieve a 
tight integration with minimal effort. Let me know if you need more information 
about a specific one.
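
For these MPI libraries the PE can stay very plain. As a sketch (the PE name 
and slot count are just examples I made up), the decisive entries for a tight 
integration are `control_slaves TRUE` and `start_/stop_proc_args NONE`:

```
$ qconf -sp orte
pe_name            orte
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
```

`control_slaves TRUE` allows `qrsh -inherit` to start tasks on the granted 
slave nodes, and `job_is_first_task FALSE` tells SGE that the master `mpirun` 
process itself does not consume one of the granted slots' tasks.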


>  2.       The difference between “loose” and “tight” integration is how 
> the parallel job application’s “Slave” tasks are handled.   With “loose” 
> integration the slave tasks/processes are not started and managed by GE; 
> the application starts the slave tasks itself via something like “rsh” or 
> “ssh”.    An example of this is mpirun starting the various slave processes 
> on the various nodes listed in the “$pe_hostfile” provided by GE.  With 
> “tight” integration, the slave tasks/processes are started and managed by GE 
> through the use of “qrsh”.  Is this correct?

Yes.
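
For illustration, the same logical step of starting one slave (hostnames and 
paths are made up) looks like this from the MPI library's point of view:

```
# loose: mpirun reaches out itself; GE never sees this process tree
rsh node02 /opt/mpi/bin/mpi_slave ...

# tight: the identical step goes through GE, so the slave becomes a
# child of the sge_execd on node02 - accounted, limited and killable
qrsh -inherit node02 /opt/mpi/bin/mpi_slave ...
```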


> 3.       One of the things I was reading in the document discussing “loose” 
> and “tight” integration using LAM MPI was the difference in the way they 
> handle “accounting”, and how the processes associated with a parallel job are 
> handled if deleted using qdel.    By “accounting”, does this mean that GE 
> is able to better keep track of where each of the slave tasks is and how 
> many resources are being used by the slave tasks?    So does this mean that 
> “tight” integration is preferable over “loose” integration, since it allows 
> GE to better keep track of the resources used by the slave tasks and one is 
> able to delete a “tight” integration job in a “cleaner” manner?

Yes - absolutely.


>  4.       Continuing with “tight” integration: does this also mean that if 
> a parallel MPI application uses either “rsh” or “ssh” to facilitate the 
> communications between the Master and Slave tasks/processes, then 
> essentially “qrsh” intercepts or replaces the communications performed by 
> “rsh” or “ssh”?     Hence the “rsh” wrapper script is used to 
> facilitate the “tight” integration.   Is that correct?

The wrapper solution is only necessary in case the actual MPI library has no 
builtin support for SGE. In case of Open MPI (./configure --with-sge ...) and 
MPICH2 the support is built in and you can find hints to set it up on their 
websites - no wrapper necessary, and start_proc_args/stop_proc_args can be set 
to NONE (i.e. they call `qrsh` directly once they discover that they are 
executed under SGE [by certain set environment variables]). The start_proc_args 
in the PE was/is used to set up the links to the wrapper(s) and reformat the 
$pe_hostfile, in case the parallel library understands only a different 
format*. This is necessary e.g. for MPICH(1).
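
Such a reformatting step can be as small as the sketch below (the helper name 
is made up; it turns each "host slots queue processor-range" line of the 
$pe_hostfile into a classic machinefile with one line per granted slot):

```shell
# Hypothetical start_proc_args helper: convert SGE's $pe_hostfile
# into a machinefile, repeating each hostname once per granted slot.
pe_hostfile_to_machinefile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# typical use inside a start_proc_args script:
#   pe_hostfile_to_machinefile "$PE_HOSTFILE" > "$TMPDIR/machines"
```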

*) In case you have heard of the application Gaussian: I also create the 
"%lindaworkers=..." list of nodes for the input file in the 
start_proc_args.
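
The wrapper itself is tiny. Below is a sketch of its logic, written as a shell 
function for illustration - in a real setup it is a standalone script which 
start_proc_args links into $TMPDIR under the name "rsh" (or "ssh"), so it 
shadows the real command on the PATH, and its last line is an `exec qrsh ...` 
instead of a plain call. The flag handling follows the traditional SGE wrapper; 
adjust it to your MPI's actual call syntax.

```shell
# Sketch of the classic "rsh" wrapper logic.
# MPI typically calls:  rsh [-n] <hostname> <command ...>
rsh_wrapper() {
    qrsh_opts="-inherit"        # reuse the allocation granted to this job
    if [ "$1" = "-n" ]; then
        qrsh_opts="$qrsh_opts -nostdin"   # rsh -n: no stdin for the remote command
        shift
    fi
    host=$1
    shift
    # a real wrapper script ends with:  exec qrsh $qrsh_opts "$host" "$@"
    qrsh $qrsh_opts "$host" "$@"
}
```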


> 5.       I was reading some of the postings in the GE archive from 
> someone named “Reuti” regarding the “rsh” wrapper script.   If I understood 
> what he wrote correctly, it doesn’t matter whether the parallel MPI 
> application is using “rsh” or “ssh”; the “rsh” wrapper script provided by GE 
> is just there to force the application to use GE’s qrsh?    Am I stating this 
> correctly?    
> Another way to state this is that “rsh” is just a name.   The name could be 
> anything, as long as it matches whatever remote-start command your MPI 
> application is configured to use; the basic contents of the wrapper script 
> won’t change aside from the name “rsh” and the locations of scripts 
> referenced by the wrapper script.   Again, am I stating this correctly?

Yes to all.


> 6.       With regard to the various types and vendors’ MPI implementations:   
> what does it exactly mean that certain MPI implementations are GE aware?   I 
> tend to think that this means that parallel applications built with GE aware 
> MPI implementations know where to find the “$pe_hostfile” that GE generates 
> based on what resources the parallel application needs.   Is that all there 
> is to it for an MPI implementation to be GE aware?    I know that with Intel 
> or Open MPI, the PE environments that I’ve created don’t really require any 
> special scripts for the “start_proc_args” and “stop_proc_args” parameters in 
> the PE.   However, based on what little I have seen, LAM and Platform MPI 
> implementations appear to require one to use scripts based on ones like 
> “startmpi.sh” and “stopmpi.sh” in order to set up the properly formatted 
> $pe_hostfile to be used by these MPI implementations.   Is my understanding 
> of this correct?

Yes. While LAM/MPI is daemon based, Platform MPI uses a plain call to the slave 
nodes and can be tightly integrated via the wrapper and by setting `export 
MPI_REMSH=rsh`.
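
For Platform MPI this could look like the following job-script fragment (the 
paths are assumptions; the wrapper is expected to come first on the PATH, e.g. 
linked into $TMPDIR by start_proc_args, and $NSLOTS is set by SGE):

```
# hypothetical job-script fragment - Platform MPI, tight integration
export MPI_REMSH=rsh          # use the wrapper, not /usr/bin/rsh
mpirun -np $NSLOTS -hostfile $TMPDIR/machines ./my_app
```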

For a builtin tight integration the MPI library needs to a) discover under what 
queuing system it is running (by set environment variables; can be SGE, SLURM, 
LSF, PBS, ...), b) find and honor the $pe_hostfile automatically (resp. other 
files for other queuing systems), and c) start `qrsh -inherit ...` to start 
something on the granted nodes (some implementations need -V here too - you can 
check the source of Open MPI for example - to forward some variables to the 
slaves). This is *not* `qsub -V ...`, which I try to avoid, as a random 
adjustment to the user's shell might lead to a crash of the job when it finally 
starts, and this can be really hard to investigate, as a new submission with a 
fresh shell might work again.
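
As a sketch of step a) - the variable names below are the usual ones, but an 
actual MPI library may check other variables in addition:

```shell
# Minimal sketch: guess the queuing system from characteristic
# environment variables the systems set for their jobs.
detect_queuing_system() {
    if [ -n "$SGE_ROOT" ] && [ -n "$PE_HOSTFILE" ]; then
        echo SGE       # $PE_HOSTFILE lists the granted hosts and slots
    elif [ -n "$SLURM_JOB_ID" ]; then
        echo SLURM
    elif [ -n "$LSB_JOBID" ]; then
        echo LSF
    elif [ -n "$PBS_NODEFILE" ]; then
        echo PBS
    else
        echo none
    fi
}
```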


> 7.       I was looking at the following options for the “qconf –sconf” 
> (global configuration) from GE.   
>  
> qlogin_command             builtin
> qlogin_daemon                builtin
> rlogin_command              builtin
> rlogin_daemon                 builtin
> rsh_command                   builtin
> rsh_daemon                      builtin
>  
> I was attempting to fully understand how the above parameters are related to 
> the execution of parallel application jobs in GE.   What I’m wondering here 
> is: if the parallel application job I would want GE to manage requires and 
> uses “ssh” by default for communications between Master and Slave tasks, does 
> this mean that the above parameters would need to be configured to use 
> “slogin”, “ssh”, “sshd”, etc.?

No. These are two different things. With all of the above (i.e. the wrapper 
setup discussed before this question) you first arrange that the `rsh` resp. 
`ssh` call is intercepted (hence the application should never use an absolute 
path to start them). This will lead to the effect that `qrsh -inherit ...` is 
called, which in turn uses the communication method configured by "rsh_command" 
and "rsh_daemon". If possible these should stay at "builtin": then SGE will use 
its own internal communication to start the slave tasks, and the cluster needs 
no `ssh` or `rsh` at all. In my clusters this is even disabled for normal 
users, and only admins can `ssh` to the nodes (if a user needs X11 forwarding 
to a node, this would be special of course). To let users check a node, they 
have to run an interactive job in a special queue which grants only 10 seconds 
of CPU time (while the wallclock time can be almost infinite).
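
Just for completeness: the classic non-builtin setup, in case an `ssh`-based 
startup is really wanted (e.g. for X11 forwarding), looks roughly like this - 
the paths vary per system, and the details are in the remote_startup(5) man 
page:

```
rsh_command     /usr/bin/ssh
rsh_daemon      /usr/sbin/sshd -i
rlogin_command  /usr/bin/ssh
rlogin_daemon   /usr/sbin/sshd -i
```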

Other settings for these parameters are covered in the following document - 
also different kinds of communication can be set up for different nodes and 
directions of the calls:

https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html

Let me know in case you need further details.

-- Reuti


> 
> Apologies for all the questions.   I just want to ensure I understand the PEs 
> a bit more.
>  
> Kind Regards,
>  
> -------
> Wayne Lee
>  
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

