Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-27 Thread Ralph Castain
Let's chat off-list about it - I don't see exactly how this works, but it may 
be similar enough. 



Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-27 Thread Joshua Hursey
There is a 'self' checkpointer (CRS component) that does application-level 
checkpointing, exposed at the MPI level. I don't know how different it is from 
what you are working on, but maybe something like that could be harnessed. Note 
that I have not tested the 'self' checkpointer with the process migration 
support; it -should- work, but there might be some bugs to work out.

Documentation and examples at the link below:
  http://osl.iu.edu/research/ft/ompi-cr/examples.php#example-self

-- Josh


Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-26 Thread Ralph Castain
FWIW: I'm in the process of porting some code from a branch that allows apps to 
do on-demand checkpoint/recovery style operations at the app level. 
Specifically, it provides the ability to:

* request a "recovery image" - an application-level blob containing state info 
required for the app to recover its state.

* register a callback point for providing a "recovery image", either to store 
for later use (separate API is used to indicate when to acquire it) or to 
provide to another process upon request

This is at the RTE level, so someone would have to expose it via an appropriate 
MPI call if someone wants to use it at that layer (I'm open to changes to 
support that use, if someone is interested).
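
To make the two operations above concrete, here is a minimal Python sketch of a registry that brokers "recovery images". It is only an illustration of the idea: `RecoveryRegistry`, `register_provider`, `acquire`, `stored_image` and `request_image` are hypothetical names, not the runtime's API, and the image is just a JSON blob.

```python
import json
from typing import Callable, Dict, Optional

# Hypothetical names throughout; this is not the Open MPI runtime API.
ImageProvider = Callable[[], bytes]


class RecoveryRegistry:
    """Toy stand-in for a runtime layer that brokers recovery images."""

    def __init__(self) -> None:
        self._providers: Dict[str, ImageProvider] = {}
        self._stored: Dict[str, bytes] = {}

    def register_provider(self, proc: str, provider: ImageProvider) -> None:
        # The app registers a callback that can produce its recovery image.
        self._providers[proc] = provider

    def acquire(self, proc: str) -> None:
        # Separate call telling the runtime when to grab and store an image.
        self._stored[proc] = self._providers[proc]()

    def stored_image(self, proc: str) -> Optional[bytes]:
        # Last image stored for later use, or None if none was acquired.
        return self._stored.get(proc)

    def request_image(self, proc: str) -> bytes:
        # On-demand request (e.g. from another process): call the callback now.
        return self._providers[proc]()


# Usage: the app state is a plain dict serialized to an opaque blob.
state = {"step": 3}
registry = RecoveryRegistry()
registry.register_provider("rank0", lambda: json.dumps(state).encode())
registry.acquire("rank0")   # image stored while step == 3
state["step"] = 4           # the app keeps running
```

The stored image reflects the state at the time `acquire` was called, while `request_image` reflects the state at the time of the request; that is the distinction between "store for later use" and "provide upon request".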



Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-26 Thread Josh Hursey
There are some great comments in this thread. Process migration (like
many topics in systems) can get complex fast.

The Open MPI process migration implementation is checkpoint/restart
based (currently using BLCR), and uses an 'eager' style of migration.
This style of migration stops a process completely on the source
machine, checkpoints/terminates it, restarts it on the destination
machine, then rejoins it with the other running processes. I think the
only documentation that we have is at the webpage below (and my PhD
thesis, if you want the finer details):
  http://osl.iu.edu/research/ft/ompi-cr/
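
The ordering of the 'eager' steps described above can be sketched as a toy simulation in Python. This is only an illustration of the sequence (stop, checkpoint, terminate, restart, rejoin); `Proc`, `Job` and `migrate_eager` are hypothetical names, the "checkpoint" is a plain copy of a dict, and nothing here reflects how Open MPI or BLCR implement it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proc:
    rank: int
    host: str
    state: Dict[str, int]
    running: bool = True


@dataclass
class Job:
    procs: Dict[int, Proc] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def migrate_eager(job: Job, rank: int, dest: str) -> None:
    """Hypothetical eager migration: the process is fully stopped before it moves."""
    proc = job.procs[rank]
    src = proc.host

    proc.running = False                      # 1. stop completely on the source
    job.log.append(f"stop rank {rank} on {src}")

    snapshot = dict(proc.state)               # 2. checkpoint (copy of app state)
    job.log.append(f"checkpoint rank {rank}")

    del job.procs[rank]                       # 3. terminate the source copy
    job.log.append(f"terminate rank {rank} on {src}")

    restarted = Proc(rank, dest, snapshot)    # 4. restart on the destination
    job.log.append(f"restart rank {rank} on {dest}")

    job.procs[rank] = restarted               # 5. rejoin the running processes
    job.log.append(f"rejoin rank {rank}")


# Usage: move rank 0 from nodeA to nodeC while rank 1 stays put.
job = Job({0: Proc(0, "nodeA", {"iter": 10}), 1: Proc(1, "nodeB", {"iter": 10})})
migrate_eager(job, 0, "nodeC")
```

The point of the sketch is that the process makes no progress between steps 1 and 5, which is the cost a 'pre-copy' or 'live' style would try to reduce.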

We have wanted to experiment with a 'pre-copy' or 'live' migration
style, but have not had the necessary support from the underlying
checkpointer or time to devote to making it happen. I think BLCR is
working on including the necessary pieces in a future release (there
are papers where a development version of BLCR has done this with
LAM/MPI). So that might be something of interest.

Process migration techniques can benefit from fault prediction and
'good' target destination selection. Fault prediction allows us to
move processes away from soon-to-fail locations, but it can be
difficult to accurately predict failures. Open MPI has some hooks in
the runtime layer that support 'sensors' which might help here. Good
target destination selection is equally complex, but the idea here is
to move processes to a machine where they can continue supporting the
efficient execution of the application. So this might mean moving to
the least loaded machine, or moving to a machine with other processes
to reduce interprocess communication (something like dynamic load
balancing).
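
As a rough illustration of the two selection ideas above (least-loaded machine versus a machine that already hosts communication partners), here is a small Python sketch. The scoring formula, the `pick_target` name and the `peer_weight` parameter are assumptions made for the example, not anything Open MPI provides.

```python
from typing import Dict, Iterable


def pick_target(load: Dict[str, float],
                peers_on_host: Dict[str, int],
                exclude: Iterable[str] = (),
                peer_weight: float = 0.0) -> str:
    """Return the host with the lowest score = load - peer_weight * peers.

    peer_weight=0 gives plain least-loaded selection; a larger value
    favours hosts that already run processes this one communicates with.
    """
    skip = set(exclude)
    candidates = [h for h in load if h not in skip]
    if not candidates:
        raise ValueError("no candidate hosts")
    # The host name is the tie-breaker so the choice is deterministic.
    return min(candidates,
               key=lambda h: (load[h] - peer_weight * peers_on_host.get(h, 0), h))


# Usage: nodeA is predicted to fail, so it is excluded as a destination.
load = {"nodeA": 0.9, "nodeB": 0.2, "nodeC": 0.5}
peers = {"nodeC": 3}
least_loaded = pick_target(load, peers, exclude=["nodeA"])
near_peers = pick_target(load, peers, exclude=["nodeA"], peer_weight=0.2)
```

With `peer_weight=0` the least-loaded host wins; with a positive weight the three peers on nodeC outweigh its higher load. A real policy would need measured load and communication data, which is where the runtime 'sensors' might come in.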

So there are some ideas to get you started.

-- Josh


Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Rayson Ho
Don't know which SSI project you are referring to... I only know the
OpenSSI project, and I was one of the first who subscribed to its
mailing list (since 2001).

http://openssi.org/cgi-bin/view?page=openssi.html

I don't think those OpenSSI clusters are designed for tens of
thousands of nodes, and not sure if it scales well to even a thousand
nodes -- so IMO they have limited use for HPC clusters.

Rayson






-- 
Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/



Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Durga Choudhury
Is anything done at the kernel level portable (e.g. to Windows)? It
*can* be, in principle at least (by putting appropriate #ifdef's in
the code), but I am wondering if it is in reality.

Also, in 2005 there was an attempt to add SSI (Single System
Image) functionality to the then-current 2.6.10 kernel. The proposal
was very detailed and covered most of the bases of task creation, PID
allocation, etc. across a loosely tied cluster (without using fancy
hardware such as an RDMA fabric). Does anybody know if it was ever
implemented? Any pointers in this direction?

Thanks and regards
Durga





Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Rayson Ho
Srinivas,

There's also Kernel-Level Checkpointing vs. User-Level Checkpointing -
if you can checkpoint an MPI task and restart it on a new node, then
this is also "process migration".

Of course, doing a checkpoint & restart can be slower than pure
in-kernel process migration, but the advantage is that you don't need
any kernel support, and can in fact do all of it in user-space.
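
A minimal sketch of the user-space idea, in Python: the application serializes its own loop state to a blob and can later resume from that blob, which could just as well happen on another node. The `run` function and its parameters are hypothetical, and a real MPI task would also have to deal with in-flight messages, which this example ignores.

```python
import json
from typing import Optional, Tuple


def run(total_steps: int, stop_after: Optional[int] = None,
        checkpoint: Optional[bytes] = None) -> Tuple[Optional[int], bytes]:
    """Sum 0..total_steps-1, optionally stopping early and/or resuming.

    Returns (result, checkpoint bytes); result is None if stopped early.
    """
    # Resume from a checkpoint if one is given, otherwise start fresh.
    state = json.loads(checkpoint) if checkpoint else {"i": 0, "acc": 0}
    done_here = 0
    while state["i"] < total_steps:
        if stop_after is not None and done_here == stop_after:
            # Stopped early: hand back only the serialized state.
            return None, json.dumps(state).encode()
        state["acc"] += state["i"]
        state["i"] += 1
        done_here += 1
    return state["acc"], json.dumps(state).encode()


# Usage: run 4 steps "on the old node", then finish "on the new node".
_, image = run(10, stop_after=4)
resumed, _ = run(10, checkpoint=image)
direct, _ = run(10)
```

The resumed run produces the same result as an uninterrupted one, and everything needed to restart lives in `image`, with no kernel involvement.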

Rayson








Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-25 Thread Ralph Castain
It also depends on what part of migration interests you - are you wanting to 
look at the MPI part of the problem (reconnecting MPI transports, ensuring 
messages are not lost, etc.) or the RTE part of the problem (where to restart 
processes, detecting failures, etc.)?






Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-24 Thread Jeff Squyres
Be aware that process migration is a pretty complex issue.

Josh is probably the best one to answer your question directly, but he's out 
today.




-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] Related to project ideas in OpenMPI

2011-08-24 Thread srinivas kundaram
I am a final-year grad student looking for my final-year project in OpenMPI. We
are a group of 4 students.
I wanted to know how "Process Migration" of MPI processes works in
OpenMPI.
Can anyone suggest any ideas for a project related to process migration in
OpenMPI, or other topics in systems?



regards,
Srinivas Kundaram
srinu1...@gmail.com
+91-8149399160