Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-31 Thread Kenneth A. Lloyd
I haven't used SGE or Oracle Grid Engine in ages, but apparently it is now
called Open Grid Engine
http://gridscheduler.sourceforge.net/


-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Rayson Ho
Sent: Friday, July 27, 2012 8:25 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] OpenMPI and SGE integration made more stable

On Fri, Jul 27, 2012 at 8:53 AM, Daniel Gruber <dgru...@univa.com> wrote:
> A while after u5 the open source repository was closed and most of the 
> German engineers from Sun/Oracle moved to Univa, working on Univa Grid 
> Engine. Currently you have the choice between Univa Grid Engine, Son 
> of Grid Engine (free academic project), and OGS.

Oracle Grid Engine is still alive, and in fact updates are still released by
Oracle from time to time.

(But of course it is not free, and since most people are looking for a free
download, it is usually not mentioned in the mailing list
discussions...)

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


> Daniel
>
>
>>
>> +-+--+
>> | Prof. Christoph van Wüllen  | Tele-Phone (+49) (0)631 205 2749 |
>> | TU Kaiserslautern, FB Chemie| Tele-Fax   (+49) (0)631 205 2750 |
>> | Erwin-Schrödinger-Str.  |  |
>> | D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de   |
>> ||
>> | HomePage:  http://www.chemie.uni-kl.de/vanwullen   |
>> +-+--+
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel






Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-27 Thread Rayson Ho
On Fri, Jul 27, 2012 at 8:53 AM, Daniel Gruber  wrote:
> A while after u5 the open source repository was closed and most of the
> German engineers from Sun/Oracle moved to Univa, working on Univa
> Grid Engine. Currently you have the choice between Univa Grid Engine,
> Son of Grid Engine (free academic project), and OGS.

Oracle Grid Engine is still alive, and in fact updates are still
released by Oracle from time to time.

(But of course it is not free, and since most people are looking for a
free download, it is usually not mentioned in the mailing list
discussions...)

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/


> Daniel



Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-27 Thread Daniel Gruber

On 27.07.2012 at 14:39, Christoph van Wüllen wrote:

> 
> On 27.07.2012 at 09:00, Daniel Gruber wrote:
> 
>> Setting the stack size limit to the vmem limit was a bug that was
>> fixed a long time ago, in 2009, for SGE 6.2u3, so it should work in
>> all later versions as well as in all SGE 6.2u5 successors
>> like Univa Grid Engine. The exact version number would therefore
>> be interesting.
> 
> According to the rpm file name, sun-sge-bin-linux24-x64-6.2-2_1.x86_64.rpm,
> I am using 6.2u2 which came with the hardware.
> 
> I guess 6.2u5 is the latest "free" version, since it is the basis
> for Open Grid Scheduler.

Yes, at Sun/Oracle this was the last version for which we released
free courtesy binaries.
A while after u5 the open source repository was closed and most of the
German engineers from Sun/Oracle moved to Univa, working on Univa
Grid Engine. Currently you have the choice between Univa Grid Engine,
Son of Grid Engine (free academic project), and OGS.

Daniel






Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-27 Thread Christoph van Wüllen

On 27.07.2012 at 09:00, Daniel Gruber wrote:

> Setting the stack size limit to the vmem limit was a bug that was
> fixed a long time ago, in 2009, for SGE 6.2u3, so it should work in
> all later versions as well as in all SGE 6.2u5 successors
> like Univa Grid Engine. The exact version number would therefore
> be interesting.

According to the rpm file name, sun-sge-bin-linux24-x64-6.2-2_1.x86_64.rpm,
I am using 6.2u2 which came with the hardware.

I guess 6.2u5 is the latest "free" version, since it is the basis
for Open Grid Scheduler.

+-+--+
| Prof. Christoph van Wüllen  | Tele-Phone (+49) (0)631 205 2749 |
| TU Kaiserslautern, FB Chemie| Tele-Fax   (+49) (0)631 205 2750 |
| Erwin-Schrödinger-Str.  |  |
| D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de   |
||
| HomePage:  http://www.chemie.uni-kl.de/vanwullen   |
+-+--+




Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-27 Thread Daniel Gruber

On 27.07.2012 at 14:25, Christoph van Wüllen wrote:

> 
> On 27.07.2012 at 06:11, Ralph Castain wrote:
> 
>> Been chatting off-list with the SGE folks - can you tell us what version of 
>> SGE you are using?
>> 
>> 
> SGE 6.2, the rpm says sun-sge-bin-linux24-x64-6.2-2_1.x86_64.rpm

Your version is 6.2u2; the next update (u3) fixed it.

> 
> The problem is that the address space limit set when requesting the
> resource h_vmem is automatically copied to the stack size limit.
> 
> However, it is much better not to touch the stack size limit and let
> it remain INFINITY.
> 

This was exactly the fix made in 2009.

Daniel

> It may sound harsh, but in my view SGE's behaviour is a bug rather
> than a feature.
> 
> Yours,
> 




Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-27 Thread Christoph van Wüllen

On 27.07.2012 at 06:11, Ralph Castain wrote:

> Been chatting off-list with the SGE folks - can you tell us what version of 
> SGE you are using?
> 
> 
SGE 6.2, the rpm says sun-sge-bin-linux24-x64-6.2-2_1.x86_64.rpm

The problem is that the address space limit set when requesting the
resource h_vmem is automatically copied to the stack size limit.

However, it is much better not to touch the stack size limit and let
it remain INFINITY.
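
To check whether a given job is affected, the two limits can be compared
from inside the job. The following is a minimal sketch, assuming a POSIX
system where the h_vmem request shows up as RLIMIT_AS as described above.
The function names format_limit and stack_follows_vmem are hypothetical
and not part of SGE or Open MPI.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Write a limit as a decimal byte count, or "unlimited" for RLIM_INFINITY. */
void format_limit(rlim_t value, char *buf, size_t len)
{
    if (value == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu", (unsigned long long)value);
}

/*
 * Return 1 if the soft stack limit is finite and equal to the soft
 * address space limit (the symptom described above), 0 if not,
 * and -1 if the limits cannot be read.
 */
int stack_follows_vmem(void)
{
    struct rlimit as, stack;

    if (getrlimit(RLIMIT_AS, &as) != 0 ||
        getrlimit(RLIMIT_STACK, &stack) != 0)
        return -1;
    return stack.rlim_cur != RLIM_INFINITY &&
           stack.rlim_cur == as.rlim_cur;
}
```

Calling stack_follows_vmem() at the start of a job script helper and
printing both limits with format_limit() should be enough to tell the
two cases apart.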

It may sound harsh, but in my view SGE's behaviour is a bug rather
than a feature.

Yours,

+-+--+
| Prof. Christoph van Wüllen  | Tele-Phone (+49) (0)631 205 2749 |
| TU Kaiserslautern, FB Chemie| Tele-Fax   (+49) (0)631 205 2750 |
| Erwin-Schrödinger-Str.  |  |
| D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de   |
||
| HomePage:  http://www.chemie.uni-kl.de/vanwullen   |
+-+--+




Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-27 Thread Daniel Gruber
Setting the stack size limit to the vmem limit was a bug that was
fixed a long time ago, in 2009, for SGE 6.2u3, so it should work in
all later versions as well as in all SGE 6.2u5 successors
like Univa Grid Engine. The exact version number would therefore
be interesting.

Daniel


On 26.07.2012 at 18:02, Christoph van Wüllen wrote:

> It is a long-standing problem that, due to a bug in Sun Grid Engine
> (setting the stack size limit equal to the address space limit),
> using qrsh from within OpenMPI fails if a large amount of memory is
> requested but the stack size is not explicitly set to a reasonably
> small value.
> 
> The best solution would be for SGE simply not to touch the stack
> size limit and to leave it at INFINITY.
> 
> However, I have tested that just reducing the stack size limit in
> file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
> execv'ing qrsh circumvents the problem. So just after exec_path is set
> by strdup(...), I inserted the lines
> 
>   {
>   struct rlimit rlim;
>   int l;
> 
>   /* act only when the launcher path ends in "/qrsh" */
>   l=strlen(exec_path);
>   if (l > 5 && !strcmp("/qrsh", exec_path + (l-5))) {
>     /* lower both the hard and the soft stack limit before the exec */
>     getrlimit(RLIMIT_STACK, &rlim);
>     if (rlim.rlim_max > 1000L) rlim.rlim_max=1000L;
>     if (rlim.rlim_cur > 1000L) rlim.rlim_cur=1000L;
>     setrlimit(RLIMIT_STACK, &rlim);
>   }
>   }
> 
> 
> It looks quick-and-dirty and it certainly is, but it solves a severe
> problem many users have with OpenMPI and SGE. Feel free to use this
> information as you like. Note that MPI worker jobs eventually
> spawned off on "distant" nodes do not suffer from the reduced stack
> size limit; only the qrsh command is affected.
> 
> Is this (still) of interest?
> 




Re: [OMPI devel] OpenMPI and SGE integration made more stable

2012-07-27 Thread Ralph Castain
Been chatting off-list with the SGE folks - can you tell us what version of SGE 
you are using?


On Jul 26, 2012, at 9:02 AM, Christoph van Wüllen wrote:

> It is a long-standing problem that, due to a bug in Sun Grid Engine
> (setting the stack size limit equal to the address space limit),
> using qrsh from within OpenMPI fails if a large amount of memory is
> requested but the stack size is not explicitly set to a reasonably
> small value.
> 
> The best solution would be for SGE simply not to touch the stack
> size limit and to leave it at INFINITY.
> 
> However, I have tested that just reducing the stack size limit in
> file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
> execv'ing qrsh circumvents the problem. So just after exec_path is set
> by strdup(...), I inserted the lines
> 
>   {
>   struct rlimit rlim;
>   int l;
> 
>   /* act only when the launcher path ends in "/qrsh" */
>   l=strlen(exec_path);
>   if (l > 5 && !strcmp("/qrsh", exec_path + (l-5))) {
>     /* lower both the hard and the soft stack limit before the exec */
>     getrlimit(RLIMIT_STACK, &rlim);
>     if (rlim.rlim_max > 1000L) rlim.rlim_max=1000L;
>     if (rlim.rlim_cur > 1000L) rlim.rlim_cur=1000L;
>     setrlimit(RLIMIT_STACK, &rlim);
>   }
>   }
> 
> 
> It looks quick-and-dirty and it certainly is, but it solves a severe
> problem many users have with OpenMPI and SGE. Feel free to use this
> information as you like. Note that MPI worker jobs eventually
> spawned off on "distant" nodes do not suffer from the reduced stack
> size limit; only the qrsh command is affected.
> 
> Is this (still) of interest?
> 




[OMPI devel] OpenMPI and SGE integration made more stable

2012-07-26 Thread Christoph van Wüllen
It is a long-standing problem that, due to a bug in Sun Grid Engine
(setting the stack size limit equal to the address space limit),
using qrsh from within OpenMPI fails if a large amount of memory is
requested but the stack size is not explicitly set to a reasonably
small value.

The best solution would be for SGE simply not to touch the stack
size limit and to leave it at INFINITY.

However, I have tested that just reducing the stack size limit in
file orte/mca/plm/rsh/plm_rsh_module.c, function ssh_child(), before
execv'ing qrsh circumvents the problem. So just after exec_path is set
by strdup(...), I inserted the lines

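   {
   struct rlimit rlim;
   int l;

   /* act only when the launcher path ends in "/qrsh" */
   l=strlen(exec_path);
   if (l > 5 && !strcmp("/qrsh", exec_path + (l-5))) {
     /* lower both the hard and the soft stack limit before the exec */
     getrlimit(RLIMIT_STACK, &rlim);
     if (rlim.rlim_max > 1000L) rlim.rlim_max=1000L;
     if (rlim.rlim_cur > 1000L) rlim.rlim_cur=1000L;
     setrlimit(RLIMIT_STACK, &rlim);
   }
   }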


It looks quick-and-dirty and it certainly is, but it solves a severe
problem many users have with OpenMPI and SGE. Feel free to use this
information as you like. Note that MPI worker jobs eventually
spawned off on "distant" nodes do not suffer from the reduced stack
size limit; only the qrsh command is affected.
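
For anyone who wants to try the idea outside the Open MPI tree, here is
a minimal sketch of the same workaround as two standalone helpers. It is
an illustration under assumptions, not the code in plm_rsh_module.c: the
names is_qrsh and cap_stack_limit are hypothetical, and the cap in bytes
is left to the caller.

```c
#include <string.h>
#include <sys/resource.h>

/* Return 1 if path ends in "/qrsh" (with something before it), else 0. */
int is_qrsh(const char *path)
{
    static const char suffix[] = "/qrsh";
    size_t n = strlen(path);
    size_t m = sizeof(suffix) - 1;

    return n > m && strcmp(path + (n - m), suffix) == 0;
}

/*
 * Lower the soft and hard stack limits to at most `cap` bytes.
 * Limits that are already below `cap` are left unchanged.
 * Returns 0 on success, -1 if getrlimit or setrlimit fails.
 */
int cap_stack_limit(rlim_t cap)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_max == RLIM_INFINITY || rlim.rlim_max > cap)
        rlim.rlim_max = cap;
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur > cap)
        rlim.rlim_cur = cap;
    return setrlimit(RLIMIT_STACK, &rlim);
}
```

In a launcher this would be called in the child process just before
execv, along the lines of: if (is_qrsh(exec_path)) cap_stack_limit(cap);
Since the limit is changed only in the child that execs qrsh, the parent
keeps its own limits.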

Is this (still) of interest?

+-+--+
| Prof. Christoph van Wüllen  | Tele-Phone (+49) (0)631 205 2749 |
| TU Kaiserslautern, FB Chemie| Tele-Fax   (+49) (0)631 205 2750 |
| Erwin-Schrödinger-Str.  |  |
| D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de   |
||
| HomePage:  http://www.chemie.uni-kl.de/vanwullen   |
+-+--+