Oh no, that's not right. mpirun launches its daemons using qrsh, and those daemons 
spawn the app's procs. SGE has no visibility of the app at all.
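
For illustration, a rough sketch of the launch chain as I understand it (host names 
are hypothetical):

  qsub -pe orte 8 job.sh                     # SGE allocates slots on, say, node01/node02
    `-- mpirun -np 8 a.out                   # runs on the first node of the allocation
          |-- qrsh -inherit node02 orted ... # one ORTE daemon launched per remote node
          |     `-- a.out (xN)               # the daemon, not SGE, forks the app procs
          `-- a.out (xN)                     # local procs are forked by mpirun itself

SGE only ever sees the qrsh calls and the daemons they start; the a.out processes are 
children of those daemons.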

> On Jul 26, 2017, at 7:46 AM, Kulshrestha, Vipul 
> <vipul_kulshres...@mentor.com> wrote:
> 
> Thanks Reuti & RHC for your responses.
> 
> My application does not rely on the actual value of m_mem_free; I used it only as 
> an example. In an open source SGE environment, we use the mem_free resource.
> 
> Now, I understand that SGE will allocate the requested resources (based on qsub 
> options) and then launch mpirun, which starts "a.out" on the allocated resources 
> using 'qrsh -inherit', so that SGE can keep track of all the launched 
> processes.
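> 
> For reference, the 'qrsh -inherit' path relies on the SGE parallel environment 
> permitting it; qconf shows the relevant settings (the values below are only an 
> example and may differ per cluster):
> 
>   qconf -sp orte
>   pe_name            orte
>   slots              999
>   allocation_rule    $fill_up
>   control_slaves     TRUE
>   job_is_first_task  FALSE
> 
> control_slaves TRUE is what allows mpirun to call 'qrsh -inherit'.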
> 
> I assume LSF integration works in a similar way.
> 
> Regards,
> Vipul
> 
> 
> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Reuti
> Sent: Wednesday, July 26, 2017 9:25 AM
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] Questions about integration with resource 
> distribution systems
> 
> 
>> On 26.07.2017, at 15:09, r...@open-mpi.org wrote:
>> 
>> mpirun doesn’t get access to that requirement, nor does it need to do so. 
>> SGE will use the requirement when determining the nodes to allocate.
> 
> m_mem_free appears to come from Univa GE and is not part of the open source 
> versions, so I can't comment on this for sure, but it seems to also set the 
> memory limit in cgroups.
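> 
> As an aside, a sketch of how the request might look with the open source complexes 
> instead (whether these act per slot or per job, and whether they are enforced at 
> all, depends on how the complexes are configured, so treat this only as an example):
> 
>   # mem_free is typically a load value used for scheduling, not a hard limit
>   qsub -pe orte 8 -b y -V -l mem_free=40G -cwd mpirun -np 8 a.out
> 
>   # h_vmem, where configured as a consumable, additionally enforces a memory limit
>   qsub -pe orte 8 -b y -V -l h_vmem=40G -cwd mpirun -np 8 a.out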
> 
> -- Reuti
> 
> 
>> mpirun just uses the nodes that SGE provides.
>> 
>> What your cmd line does is restrict the entire operation on each node 
>> (daemon + 8 procs) to 40GB of memory. OMPI does not support per-process 
>> restrictions other than binding to cpus.
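>> 
>> For reference, the cpu binding that is supported looks like the line below (the 
>> exact option names can vary between OMPI release series):
>> 
>>   # bind each of the 8 ranks to its own core and report the resulting bindings
>>   mpirun -np 8 --map-by core --bind-to core --report-bindings a.out
>> 
>> There is no analogous per-process memory cap available from mpirun itself.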
>> 
>> 
>>> On Jul 26, 2017, at 6:03 AM, Kulshrestha, Vipul 
>>> <vipul_kulshres...@mentor.com> wrote:
>>> 
>>> Thanks for a quick response.
>>> 
>>> I will try building OMPI as suggested.
>>> 
>>> On the integration with unsupported distribution systems, we cannot use a 
>>> script-based approach, because these machines often don’t have ssh 
>>> permission in customer environments. I will explore the path of writing an 
>>> orte component. At this stage, I don’t yet have a sense of the effort involved.
>>> 
>>> I guess my question 2 was not understood correctly. I used the command below 
>>> as an example for SGE and want to understand its expected behavior. With this 
>>> command, I expect 8 copies of a.out to be launched, each with access to 40 GB 
>>> of memory. Is that correct? I am doubtful, because I don’t understand how 
>>> mpirun gets access to information about the RAM requirement.
>>> 
>>> qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out
>>> 
>>> 
>>> Regards,
>>> Vipul
>>> 
>>> 
>>> 
>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of 
>>> r...@open-mpi.org
>>> Sent: Tuesday, July 25, 2017 8:16 PM
>>> To: Open MPI Users <users@lists.open-mpi.org>
>>> Subject: Re: [OMPI users] Questions about integration with resource 
>>> distribution systems
>>> 
>>> 
>>> On Jul 25, 2017, at 3:48 PM, Kulshrestha, Vipul 
>>> <vipul_kulshres...@mentor.com> wrote:
>>> 
>>> I have several questions about the integration of openmpi with resource 
>>> queuing systems.
>>> 
>>> 1.
>>> I understand that openmpi supports integration with various resource 
>>> distribution systems such as SGE, LSF, Torque, etc.
>>> 
>>> I need to build an openmpi application that can interact with a variety of 
>>> different resource distribution systems, since different customers have 
>>> different systems. Based on my research, it seems that I need to build a 
>>> different openmpi installation for each one, e.g. create an installation of 
>>> openmpi with Grid Engine and a different installation of openmpi with LSF. 
>>> Is there a way to build a generic installation of openmpi that can be used 
>>> with more than one distribution system via some generic mechanism?
>>> 
>>> Just to be clear: your application doesn’t depend on the environment in any 
>>> way. Only mpirun does - so if you are distributing an _application_, then 
>>> your question is irrelevant. 
>>> 
>>> If you are distributing OMPI itself, and therefore mpirun, then you can 
>>> build the various components if you first install the headers for each 
>>> environment on your system. That means you need one machine that has at 
>>> least the headers of all those resource managers installed. Then configure 
>>> OMPI --with-xxx pointing to each of the RM’s headers so all the 
>>> components get built. When the binary hits your customer’s machine, only 
>>> those components that have active libraries present will execute.
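>>> 
>>> As a sketch (the install paths are hypothetical, and the exact --with-* options 
>>> for your OMPI version are listed by ./configure --help), a configure line along 
>>> these lines builds SGE, LSF and Torque support side by side:
>>> 
>>>   ./configure --prefix=/opt/openmpi \
>>>       --with-sge \
>>>       --with-lsf=/opt/lsf/10.1 \
>>>       --with-tm=/opt/torque
>>> 
>>> At run time only the components whose libraries are found on the customer's 
>>> machine will activate.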
>>> 
>>> 
>>> 2.
>>> For integration with LSF/grid, how would I specify the memory (RAM) 
>>> requirement (or some other parameter) to bsub/qsub when launching the mpirun 
>>> command? Will something like the command below work to ensure that each of the 
>>> 8 copies of a.out has 40 GB of memory reserved for it by the grid engine?
>>> 
>>> qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out
>>> 
>>> You’ll have to provide something that is environment dependent, I’m afraid 
>>> - there is no standard out there.
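>>> 
>>> Purely as an illustration, a rough LSF analogue of the SGE line above might 
>>> look like this (whether the memory figure applies per process or per job, and 
>>> its units, are site-configurable in LSF, so check with the cluster admins):
>>> 
>>>   bsub -n 8 -R "rusage[mem=40960]" mpirun -np 8 a.out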
>>> 
>>> 
>>> 
>>> 3.
>>> Some of our customers use a custom distribution engine (some 
>>> non-industry-standard distribution engine). How can I integrate my openmpi 
>>> application with such a system? I would think that it should be possible to 
>>> do that if openmpi launched/managed the interaction with the distribution 
>>> engine using some kind of generic mechanism (say, a configurable 
>>> command to launch, monitor, and kill a job, plus a plugin to define these 
>>> operations with commands specific to the distribution engine in use). Does 
>>> such integration exist in openmpi?
>>> 
>>> Easiest solution is to write a script that reads the allocation and dumps 
>>> it into a file, and then provide that file as your hostfile on the mpirun 
>>> cmd line (or in the environment). We will then use ssh to perform the 
>>> launch. Otherwise, you’ll need to write at least an orte/mca/ras component 
>>> to get the allocation, and possibly an orte/mca/plm component if you want 
>>> to use the native launch mechanism in place of ssh.
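>>> 
>>> A minimal sketch of the hostfile route (the MY_SCHED_NODES variable and its 
>>> "host:slots,host:slots" format are made up here; substitute however your 
>>> scheduler actually exposes the allocation):
>>> 
>>>   #!/bin/sh
>>>   # turn e.g. "nodeA:8,nodeB:8" into an Open MPI hostfile
>>>   echo "$MY_SCHED_NODES" | tr ',' '\n' | \
>>>       awk -F: '{ print $1 " slots=" $2 }' > myhosts
>>>   mpirun --hostfile myhosts -np 16 a.out
>>> 
>>> With ssh available between the listed hosts, mpirun will launch its daemons 
>>> there itself; without ssh you are back to the ras/plm component route.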
>>> 
>>> 
>>> 
>>> 
>>> Thanks,
>>> Vipul
>>> 
>>> 
>> 
> 

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
