Re: [OMPI users] srun and openmpi

2011-01-25 Thread Michael Di Domenico
Yes, I am setting the config correctly.  Our IB machines seem to run
just fine so far using srun and Open MPI v1.5.

As another data point, we enabled mpi-threads in Open MPI and that also
seems to trigger the srun/TCP behavior, but on the IB fabric.  Running
the program within an salloc rather than with a straight srun makes the
problem go away.
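For reference, a minimal sketch of the two launch modes being compared here
(node/task counts and the binary name are placeholders):

  # workaround: allocate the nodes, then launch with mpirun
  $ salloc -N 4
  $ mpirun -np 32 ./my_mpi_app

  # direct launch with srun (the case that triggers the TCP error)
  $ srun -n 32 --ntasks-per-node=8 ./my_mpi_app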



On Tue, Jan 25, 2011 at 2:59 PM, Nathan Hjelm  wrote:
> We are seeing a similar problem with our InfiniBand machines. After some
> investigation I discovered that we were not setting our slurm environment
> correctly (ref:
> https://computing.llnl.gov/linux/slurm/mpi_guide.html#open_mpi). Are you
> setting the ports in your slurm.conf and executing srun with --resv-ports?
>
> I have yet to see if this fixes the problem for LANL. Waiting on a sysadmin
> to modify the slurm.conf.
>
> -Nathan
> HPC-3, LANL
>
> On Tue, 25 Jan 2011, Michael Di Domenico wrote:
>
>> Thanks.  We're only seeing it on machines with Ethernet only as the
>> interconnect.  fortunately for us that only equates to one small
>> machine, but it's still annoying.  unfortunately, i don't have enough
>> knowledge to dive into the code to help fix, but i can certainly help
>> test
>>
>> On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm  wrote:
>>>
>>> I am seeing similar issues on our slurm clusters. We are looking into the
>>> issue.
>>>
>>> -Nathan
>>> HPC-3, LANL
>>>
>>> On Tue, 11 Jan 2011, Michael Di Domenico wrote:
>>>
 Any ideas on what might be causing this one?  Or at least what
 additional debug information someone might need?

 On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico
  wrote:
>
> I'm still testing the slurm integration, which seems to work fine so
> far.  However, i just upgraded another cluster to openmpi-1.5 and
> slurm 2.1.15 but this machine has no infiniband
>
> if i salloc the nodes and mpirun the command it seems to run and
> complete
> fine
> however if i srun the command i get
>
> [btl_tcp_endpoint:486] mca_btl_tcp_endpoint_recv_connect_ack received
> unexpected prcoess identifier
>
> the job does not seem to run, but exhibits two behaviors
> running a single process per node the job runs and does not present
> the error (srun -N40 --ntasks-per-node=1)
> running multiple processes per node, the job spits out the error but
> does not run (srun -n40 --ntasks-per-node=8)
>
> I copied the configs from the other machine, so (i think) everything
> should be configured correctly (but i can't rule it out)
>
> I saw (and reported) a similar error to above with the 1.4-dev branch
> (see mailing list) and slurm, I can't say whether they're related or
> not though
>
>
> On Mon, Jan 3, 2011 at 3:00 PM, Jeff Squyres 
> wrote:
>>
>> Yo Ralph --
>>
>> I see this was committed
>> https://svn.open-mpi.org/trac/ompi/changeset/24197.  Do you want to
>> add a
>> blurb in README about it, and/or have this executable compiled as part
>> of
>> the PSM MTL and then installed into $bindir (maybe named
>> ompi-psm-keygen)?
>>
>> Right now, it's only compiled as part of "make check" and not
>> installed,
>> right?
>>
>>
>>
>> On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote:
>>
>>> Run the program only once - it can be in the prolog of the job if you
>>> like. The output value needs to be in the env of every rank.
>>>
>>> You can reuse the value as many times as you like - it doesn't have
>>> to
>>> be unique for each job. There is nothing magic about the value
>>> itself.
>>>
>>> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:
>>>
 How early does this need to run? Can I run it as part of a task
 prolog, or does it need to be the shell env for each rank?  And does
 it need to run on one node or all the nodes in the job?

 On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain 
 wrote:
>
> Well, I couldn't do it as a patch - proved too complicated as the
> psm
> system looks for the value early in the boot procedure.
>
> What I can do is give you the attached key generator program. It
> outputs the envar required to run your program. So if you run the
> attached
> program and then export the output into your environment, you
> should be
> okay. Looks like this:
>
> $ ./psm_keygen
>
>
> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
> $
>
> You compile the program with the usual mpicc.
>
> Let me know if this solves the problem (or not).
> Ralph
>
>
>
>
> On Dec 30, 2010, at 

Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?

2011-01-25 Thread Reuti
On 25.01.2011 at 20:10, Will Glover wrote:

> Thanks for your response, Reuti.  Actually I had seen you mention the SGE 
> mailing list in response to a similar question but I can't for the life of me 
> find that list :(

The list was removed with the shutdown of the open source site by Oracle,
which moved GridEngine to a purely commercial product. But as you might know,
Univa stepped in, and we will see some findings shortly...

For now you can check Markmail: http://gridengine.markmail.org/ or the unindexed
archive at http://arc.liv.ac.uk/pipermail/gridengine-users/. The relevant threads
are a bit hidden in http://arc.liv.ac.uk/pipermail/gridengine-users/2009-December.txt
(search for "cutting") and http://arc.liv.ac.uk/pipermail/gridengine-users/2010-July.txt
(search for "varying"). There is also another solution explained there that uses
a load threshold.

(If the dynamic MPI-2 tasks are shorter than the background jobs: use the
solution with different nice values. If the background jobs are much shorter
than the MPI-2 tasks: get rid of the background jobs with a load threshold and
drain them.)


> As for using the background queue, just to clarify - is the idea to submit my 
> parallel job on a regular queue with 100 processors at nice 0

yep

> , but allow other 'background queue' jobs on the same processors at nice 19?

yep

>  Presumably, I'd still need mpi-2's dynamic process management to free up 
> processors when they are not needed (at the moment, they use 100% cpu idling 
> in MPI_Recv for example).

When they are really idling at 100% CPU, then you are correct: you have to
release them with an MPI-2 call.
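To illustrate (this is just a sketch, not code from the thread; the worker
structure is an assumption): a dynamically spawned worker can give its core
back by disconnecting from the parent and finalizing once its share of the
work is done, rather than sitting in MPI_Recv:

  /* hypothetical worker started via MPI_Comm_spawn */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Comm parent;
      MPI_Init(&argc, &argv);
      MPI_Comm_get_parent(&parent);

      /* ... receive and process the assigned tasks here ... */

      if (parent != MPI_COMM_NULL)
          MPI_Comm_disconnect(&parent);  /* detach from the spawning job */
      MPI_Finalize();                    /* process exits, core becomes free */
      return 0;
  }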


>  Did I understand you correctly?

yep.

This way you will minimize the otherwise wasted computing time and avoid idling 
cores.

-- Reuti


> -- 
> Will
> 
> --- On Tue, 1/25/11, Reuti  wrote:
> 
>> From: Reuti 
>> Subject: Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?
>> To: "Open MPI Users" 
>> Date: Tuesday, January 25, 2011, 9:27 AM
>> On 25.01.2011 at 12:32, Terry Dontje wrote:
>> 
>>> On 01/25/2011 02:17 AM, Will Glover wrote:
 Hi all,
 I tried a google/mailing list search for this but
>> came up with nothing, so here goes:
 
 Is there any level of automation between open
>> mpi's dynamic process management and the SGE queue
>> manager?  
 In particular, can I make a call to mpi_comm_spawn
>> and have SGE dynamically increase the number of slots? 
>> 
 This seems a little far fetched, but it would be
>> really useful if this is possible.  My application is
>> 'restricted' to coarse-grain task parallelism and involves a
>> work load that varies significantly during runtime (between
>> 1 and ~100 parallel tasks).  Dynamic process management
>> would maintain an optimal number of processors and reduce
>> idling.
 
 Many thanks,
 
>>> This is an interesting idea but no integration has
>> been done that would allow an MPI job to request more slots.
>> 
>> 
>> Similar ideas were on the former SGE mailing list a couple
>> of times - having varying resource requests over the
>> lifetime of a job (cores, memory, licenses, ...). This would
>> mean in the end to have some kind of real-time-queuing
>> system, as you have to have the necessary resources to be
>> free in time for sure.
>> 
>> Besides this also some syntax for either requesting a
>> "resource profile over time" when such a job is submitted
>> would be necessary, or to allow a job while it's running
>> issuing some kinds of commands to request/release resources
>> on demand.
>> 
>> If you have such a "resource profile over time" for a bunch
>> of jobs, it could then be extended to solve a cutting-stock
>> problem where the unit to be cut would be time, e.g. arrange
>> these 10 jobs that they finish in the least amount of time
>> all together - and you could predict exactly when each job
>> will end. This is getting really complex.
>> 
>> ==
>> 
>> What can be done in your situation: have some kind of
>> "background queue" with a nice value of 19, but the parallel
>> job you submit to a queue with the default nice value 0.
>> Although you request 100 cores and reserve them (i.e. the
>> background queue shouldn't be suspended in such a case of
>> course), the background queue will still run at full speed
>> when nothing else is running on the nodes. When some of the
>> parallel tasks are started on the nodes, they will get most
>> of the computing time (this means: oversubscription by
>> intention). The background queue can be used for less
>> important jobs. Such a setup is useful when your parallel
>> application isn't running in parallel all the time like in
>> your case.
>> 
>> -- Reuti
>> 
>> 
>>> -- 
>>> 
>>> Terry D. Dontje | Principal Software Engineer
>>> Developer Tools Engineering | +1.781.442.2631
>>> Oracle - Performance Technologies
>>> 95 Network Drive, Burlington, MA 01803
>>> Email terry.don...@oracle.com
>>> 
>>> 
>>> 
>>> 

Re: [OMPI users] srun and openmpi

2011-01-25 Thread Nathan Hjelm

We are seeing a similar problem with our InfiniBand machines. After some
investigation I discovered that we were not setting our slurm environment 
correctly (ref: 
https://computing.llnl.gov/linux/slurm/mpi_guide.html#open_mpi). Are you 
setting the ports in your slurm.conf and executing srun with --resv-ports?
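For anyone hitting the same thing, a sketch of what that setup looks like (the
port range and task count here are just examples; the linked guide has the
authoritative syntax):

  # slurm.conf: reserve a port range for srun-launched Open MPI jobs
  MpiParams=ports=12000-12999

  # launch with the reserved ports
  $ srun --resv-ports -n 32 ./my_mpi_app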

I have yet to see if this fixes the problem for LANL. Waiting on a sysadmin to 
modify the slurm.conf.

-Nathan
HPC-3, LANL

On Tue, 25 Jan 2011, Michael Di Domenico wrote:


Thanks.  We're only seeing it on machines with Ethernet only as the
interconnect.  fortunately for us that only equates to one small
machine, but it's still annoying.  unfortunately, i don't have enough
knowledge to dive into the code to help fix, but i can certainly help
test

On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm  wrote:

I am seeing similar issues on our slurm clusters. We are looking into the
issue.

-Nathan
HPC-3, LANL

On Tue, 11 Jan 2011, Michael Di Domenico wrote:


Any ideas on what might be causing this one?  Or at least what
additional debug information someone might need?

On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico
 wrote:


I'm still testing the slurm integration, which seems to work fine so
far.  However, i just upgraded another cluster to openmpi-1.5 and
slurm 2.1.15 but this machine has no infiniband

if i salloc the nodes and mpirun the command it seems to run and complete
fine
however if i srun the command i get

[btl_tcp_endpoint:486] mca_btl_tcp_endpoint_recv_connect_ack received
unexpected prcoess identifier

the job does not seem to run, but exhibits two behaviors
running a single process per node the job runs and does not present
the error (srun -N40 --ntasks-per-node=1)
running multiple processes per node, the job spits out the error but
does not run (srun -n40 --ntasks-per-node=8)

I copied the configs from the other machine, so (i think) everything
should be configured correctly (but i can't rule it out)

I saw (and reported) a similar error to above with the 1.4-dev branch
(see mailing list) and slurm, I can't say whether they're related or
not though


On Mon, Jan 3, 2011 at 3:00 PM, Jeff Squyres  wrote:


Yo Ralph --

I see this was committed
https://svn.open-mpi.org/trac/ompi/changeset/24197.  Do you want to add a
blurb in README about it, and/or have this executable compiled as part of
the PSM MTL and then installed into $bindir (maybe named ompi-psm-keygen)?

Right now, it's only compiled as part of "make check" and not installed,
right?



On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote:


Run the program only once - it can be in the prolog of the job if you
like. The output value needs to be in the env of every rank.

You can reuse the value as many times as you like - it doesn't have to
be unique for each job. There is nothing magic about the value itself.

On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:


How early does this need to run? Can I run it as part of a task
prolog, or does it need to be the shell env for each rank?  And does
it need to run on one node or all the nodes in the job?

On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain 
wrote:


Well, I couldn't do it as a patch - proved too complicated as the psm
system looks for the value early in the boot procedure.

What I can do is give you the attached key generator program. It
outputs the envar required to run your program. So if you run the attached
program and then export the output into your environment, you should be
okay. Looks like this:

$ ./psm_keygen

OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
$

You compile the program with the usual mpicc.

Let me know if this solves the problem (or not).
Ralph
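A possible way to wire this up with SLURM (a sketch, assuming psm_keygen is
available on the submit node; the task count is a placeholder): run the
generator once, export its output, and let srun propagate the environment to
every rank:

  $ export $(./psm_keygen)
  $ srun -n 32 ./my_psm_app

Since the value does not have to be unique per job, the same exported key can
also be set once, e.g. from a job prolog, as suggested above.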




On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote:


Sure, i'll give it a go

On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain 
wrote:


Ah, yes - that is going to be a problem. The PSM key gets generated
by mpirun as it is shared info - i.e., every proc has to get the same value.

I can create a patch that will do this for the srun direct-launch
scenario, if you want to try it. Would be later today, though.


On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote:


Well maybe not hooray, yet.  I might have jumped the gun a bit,
it's
looking like srun works in general, but perhaps not with PSM

With PSM i get this error, (at least now i know what i changed)

Error obtaining unique transport key from ORTE
(orte_precondition_transports not present in the environment)
PML add procs failed
--> Returned "Error" (-1) instead of "Success" (0)

Turn off PSM and srun works fine


On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain 
wrote:


Hooray!

On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote:


I think i take it all back.  I just tried it again and it seems
to
work now.  I'm not sure what I changed (between my first and
this
msg), but it does appear 

Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?

2011-01-25 Thread Will Glover
Thanks for your response, Reuti.  Actually I had seen you mention the SGE
mailing list in response to a similar question, but I can't for the life of me
find that list :(

As for using the background queue, just to clarify - is the idea to submit my
parallel job on a regular queue with 100 processors at nice 0, but allow other
'background queue' jobs on the same processors at nice 19?  Presumably, I'd
still need MPI-2's dynamic process management to free up processors when they
are not needed (at the moment, they use 100% CPU idling in MPI_Recv, for
example).  Did I understand you correctly?
-- 
Will

--- On Tue, 1/25/11, Reuti  wrote:

> From: Reuti 
> Subject: Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?
> To: "Open MPI Users" 
> Date: Tuesday, January 25, 2011, 9:27 AM
> On 25.01.2011 at 12:32, Terry Dontje wrote:
> 
> > On 01/25/2011 02:17 AM, Will Glover wrote:
> >> Hi all,
> >> I tried a google/mailing list search for this but
> came up with nothing, so here goes:
> >> 
> >> Is there any level of automation between open
> mpi's dynamic process management and the SGE queue
> manager?  
> >> In particular, can I make a call to mpi_comm_spawn
> and have SGE dynamically increase the number of slots? 
> 
> >> This seems a little far fetched, but it would be
> really useful if this is possible.  My application is
> 'restricted' to coarse-grain task parallelism and involves a
> work load that varies significantly during runtime (between
> 1 and ~100 parallel tasks).  Dynamic process management
> would maintain an optimal number of processors and reduce
> idling.
> >> 
> >> Many thanks,
> >> 
> > This is an interesting idea but no integration has
> been done that would allow an MPI job to request more slots.
> 
> 
> Similar ideas were on the former SGE mailing list a couple
> of times - having varying resource requests over the
> lifetime of a job (cores, memory, licenses, ...). This would
> mean in the end to have some kind of real-time-queuing
> system, as you have to have the necessary resources to be
> free in time for sure.
> 
> Besides this also some syntax for either requesting a
> "resource profile over time" when such a job is submitted
> would be necessary, or to allow a job while it's running
> issuing some kinds of commands to request/release resources
> on demand.
> 
> If you have such a "resource profile over time" for a bunch
> of jobs, it could then be extended to solve a cutting-stock
> problem where the unit to be cut would be time, e.g. arrange
> these 10 jobs that they finish in the least amount of time
> all together - and you could predict exactly when each job
> will end. This is getting really complex.
> 
> ==
> 
> What can be done in your situation: have some kind of
> "background queue" with a nice value of 19, but the parallel
> job you submit to a queue with the default nice value 0.
> Although you request 100 cores and reserve them (i.e. the
> background queue shouldn't be suspended in such a case of
> course), the background queue will still run at full speed
> when nothing else is running on the nodes. When some of the
> parallel tasks are started on the nodes, they will get most
> of the computing time (this means: oversubscription by
> intention). The background queue can be used for less
> important jobs. Such a setup is useful when your parallel
> application isn't running in parallel all the time like in
> your case.
> 
> -- Reuti
> 
> 
> > -- 
> > 
> > Terry D. Dontje | Principal Software Engineer
> > Developer Tools Engineering | +1.781.442.2631
> > Oracle - Performance Technologies
> > 95 Network Drive, Burlington, MA 01803
> > Email terry.don...@oracle.com
> > 
> > 
> > 
> 






Re: [OMPI users] srun and openmpi

2011-01-25 Thread Michael Di Domenico
Thanks.  We're only seeing it on machines with Ethernet as the only
interconnect.  Fortunately for us that only equates to one small
machine, but it's still annoying.  Unfortunately, I don't have enough
knowledge to dive into the code to help fix it, but I can certainly
help test.

On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm  wrote:
> I am seeing similar issues on our slurm clusters. We are looking into the
> issue.
>
> -Nathan
> HPC-3, LANL
>
> On Tue, 11 Jan 2011, Michael Di Domenico wrote:
>
>> Any ideas on what might be causing this one?  Or at least what
>> additional debug information someone might need?
>>
>> On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico
>>  wrote:
>>>
>>> I'm still testing the slurm integration, which seems to work fine so
>>> far.  However, i just upgraded another cluster to openmpi-1.5 and
>>> slurm 2.1.15 but this machine has no infiniband
>>>
>>> if i salloc the nodes and mpirun the command it seems to run and complete
>>> fine
>>> however if i srun the command i get
>>>
>>> [btl_tcp_endpoint:486] mca_btl_tcp_endpoint_recv_connect_ack received
>>> unexpected prcoess identifier
>>>
>>> the job does not seem to run, but exhibits two behaviors
>>> running a single process per node the job runs and does not present
>>> the error (srun -N40 --ntasks-per-node=1)
>>> running multiple processes per node, the job spits out the error but
>>> does not run (srun -n40 --ntasks-per-node=8)
>>>
>>> I copied the configs from the other machine, so (i think) everything
>>> should be configured correctly (but i can't rule it out)
>>>
>>> I saw (and reported) a similar error to above with the 1.4-dev branch
>>> (see mailing list) and slurm, I can't say whether they're related or
>>> not though
>>>
>>>
>>> On Mon, Jan 3, 2011 at 3:00 PM, Jeff Squyres  wrote:

 Yo Ralph --

 I see this was committed
 https://svn.open-mpi.org/trac/ompi/changeset/24197.  Do you want to add a
 blurb in README about it, and/or have this executable compiled as part of
 the PSM MTL and then installed into $bindir (maybe named ompi-psm-keygen)?

 Right now, it's only compiled as part of "make check" and not installed,
 right?



 On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote:

> Run the program only once - it can be in the prolog of the job if you
> like. The output value needs to be in the env of every rank.
>
> You can reuse the value as many times as you like - it doesn't have to
> be unique for each job. There is nothing magic about the value itself.
>
> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote:
>
>> How early does this need to run? Can I run it as part of a task
>> prolog, or does it need to be the shell env for each rank?  And does
>> it need to run on one node or all the nodes in the job?
>>
>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain 
>> wrote:
>>>
>>> Well, I couldn't do it as a patch - proved too complicated as the psm
>>> system looks for the value early in the boot procedure.
>>>
>>> What I can do is give you the attached key generator program. It
>>> outputs the envar required to run your program. So if you run the 
>>> attached
>>> program and then export the output into your environment, you should be
>>> okay. Looks like this:
>>>
>>> $ ./psm_keygen
>>>
>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954
>>> $
>>>
>>> You compile the program with the usual mpicc.
>>>
>>> Let me know if this solves the problem (or not).
>>> Ralph
>>>
>>>
>>>
>>>
>>> On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote:
>>>
 Sure, i'll give it a go

 On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain 
 wrote:
>
> Ah, yes - that is going to be a problem. The PSM key gets generated
> by mpirun as it is shared info - i.e., every proc has to get the same 
> value.
>
> I can create a patch that will do this for the srun direct-launch
> scenario, if you want to try it. Would be later today, though.
>
>
> On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote:
>
>> Well maybe not hooray, yet.  I might have jumped the gun a bit,
>> it's
>> looking like srun works in general, but perhaps not with PSM
>>
>> With PSM i get this error, (at least now i know what i changed)
>>
>> Error obtaining unique transport key from ORTE
>> (orte_precondition_transports not present in the environment)
>> PML add procs failed
>> --> Returned "Error" (-1) instead of "Success" (0)
>>
>> Turn off PSM and srun works fine
>>
>>
>> On Thu, Dec 30, 2010 at 5:13 PM, 

Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?

2011-01-25 Thread Reuti
On 25.01.2011 at 12:32, Terry Dontje wrote:

> On 01/25/2011 02:17 AM, Will Glover wrote:
>> Hi all,
>> I tried a google/mailing list search for this but came up with nothing, so 
>> here goes:
>> 
>> Is there any level of automation between open mpi's dynamic process 
>> management and the SGE queue manager?  
>> In particular, can I make a call to mpi_comm_spawn and have SGE dynamically 
>> increase the number of slots?  
>> This seems a little far fetched, but it would be really useful if this is 
>> possible.  My application is 'restricted' to coarse-grain task parallelism 
>> and involves a work load that varies significantly during runtime (between 1 
>> and ~100 parallel tasks).  Dynamic process management would maintain an 
>> optimal number of processors and reduce idling.
>> 
>> Many thanks,
>> 
> This is an interesting idea but no integration has been done that would allow 
> an MPI job to request more slots. 

Similar ideas came up on the former SGE mailing list a couple of times - having
varying resource requests over the lifetime of a job (cores, memory, licenses,
...). In the end this would require some kind of real-time queuing system, as
the necessary resources have to be guaranteed to be free at the right time.

Besides this, some syntax would also be necessary, either for requesting a
"resource profile over time" when such a job is submitted, or for allowing a
running job to issue commands that request/release resources on demand.

If you have such a "resource profile over time" for a bunch of jobs, it could
then be extended to solve a cutting-stock problem where the unit to be cut is
time, e.g. arrange these 10 jobs so that, taken together, they finish in the
least amount of time - and you could predict exactly when each job will end.
This is getting really complex.

==

What can be done in your situation: have some kind of "background queue" with a
nice value of 19, while the parallel job is submitted to a queue with the
default nice value of 0. Although you request 100 cores and reserve them (i.e.
the background queue shouldn't be suspended in such a case, of course), the
background queue will still run at full speed when nothing else is running on
the nodes. When some of the parallel tasks are started on the nodes, they will
get most of the computing time (this means: oversubscription by intention). The
background queue can be used for less important jobs. Such a setup is useful
when your parallel application isn't running in parallel all the time, as in
your case.
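As a rough illustration of such a setup (queue names, slot counts, and the PE
name are assumptions; the priority attribute is the nice value from
queue_conf(5)):

  # excerpt of the background queue's configuration (qconf -mq background.q)
  qname      background.q
  priority   19          # jobs in this queue run at nice 19
  slots      8

  # the parallel job goes to the default queue (nice 0), with reservation
  $ qsub -pe orte 100 -R y parallel_job.sh
  # less important work goes to the background queue
  $ qsub -q background.q low_priority_job.sh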

-- Reuti


> -- 
> 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 
> 
> 




[OMPI users] openmpi's mpi_comm_spawn integrated with sge?

2011-01-25 Thread Will Glover
Hi all,
I tried a Google/mailing list search for this but came up with nothing, so here
goes:

Is there any level of automation between Open MPI's dynamic process management
and the SGE queue manager?
In particular, can I make a call to mpi_comm_spawn and have SGE dynamically
increase the number of slots?
This seems a little far-fetched, but it would be really useful if it is
possible.  My application is 'restricted' to coarse-grain task parallelism and
involves a workload that varies significantly during runtime (between 1 and
~100 parallel tasks).  Dynamic process management would maintain an optimal
number of processors and reduce idling.
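For context, a minimal sketch of the parent side of what is being described
here, spawning a variable number of workers on demand (the worker binary and
counts are placeholders; note that nothing in this call talks to SGE, which is
exactly the missing piece being asked about):

  /* hypothetical parent: grows the job by spawning extra workers on demand */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Comm workers;
      int nworkers = 16;            /* decided at runtime from the workload */

      MPI_Init(&argc, &argv);

      /* the new processes land on slots the original allocation already
         holds; SGE is not asked for more */
      MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                     0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);

      /* ... hand out tasks over the intercommunicator 'workers' ... */

      MPI_Comm_disconnect(&workers);
      MPI_Finalize();
      return 0;
  }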

Many thanks,
-- 
Will Glover





[OMPI users] Serial Rapid IO ?

2011-01-25 Thread Mohamed Husain A.K
Dear all,
Is it possible to configure the cluster communication over Serial RapidIO?
Kindly help.

Mohamed Husain

