Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Reuti
On 27.01.2011 at 16:10, Joshua Hursey wrote:

> 
> On Jan 27, 2011, at 9:47 AM, Reuti wrote:
> 
>> On 27.01.2011 at 15:23, Joshua Hursey wrote:
>> 
>>> The current version of Open MPI does not support continued operation of an 
>>> MPI application after process failure within a job. If a process dies, so 
>>> will the MPI job. Note that this is true of many MPI implementations out 
>>> there at the moment.
>>> 
>>> At Oak Ridge National Laboratory, we are working on a version of Open MPI 
>>> that will be able to run through process failure, if the application wishes 
>>> to do so. The semantics and interfaces needed to support this functionality 
>>> are being actively developed by the MPI Forum's Fault Tolerance Working 
>>> Group, and can be found at the wiki page below:
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization
>> 
>> I had a look at this document, but what is really covered - does the 
>> application have to react to the notification of a failed rank and act 
>> appropriately on its own?
> 
> Yes. This is to support application-based fault tolerance (ABFT). Libraries 
> could be developed on top of these semantics to hide some of the fault 
> handling. The purpose is to enable fault-tolerant MPI applications and 
> libraries to be built on top of MPI.
> 
> This document only covers run-through stabilization, not process recovery, at 
> the moment. So the application will have well-defined semantics to allow it 
> to continue processing without the failed process. Recovering the failed 
> process is not specified in this document. That is the subject of a 
> supplemental document in preparation - the two proposals are meant to be 
> complementary and build upon one another.
> 
>> 
>> Having a true ability to survive a dying process (i.e. rank) that might 
>> already have been computing for hours would mean having some kind of "rank 
>> RAID" or "rank Parchive". E.g. start 12 ranks when you need 10 - whichever 2 
>> ranks fail, your job will still finish in time.
> 
> Yes, that is one possible technique. Once a process failure occurs, the 
> application is notified via the existing error handling mechanisms. The 
> application is then responsible for determining how best to recover from that 
> process failure. This could include using MPI_Comm_spawn to create new 
> processes (useful in manager/worker applications), recovering the state from 
> an in-memory checksum, using spare processes in the communicator, rolling 
> back some/all ranks to an application-level checkpoint, ignoring the failure 
> and allowing the residual error to increase, aborting the job or a single 
> sub-communicator, ... the list goes on. But the purpose of the proposal is to 
> allow an application or library to start building such techniques based on 
> portable semantics and well-defined interfaces.
> 
> Does that help clarify?

Yes - thx.

-- Reuti


> If you would like to discuss the developing proposals further or have input 
> on how to make it better, I would suggest moving the discussion to the 
> MPI3-ft mailing list so other groups can participate that do not normally 
> follow the Open MPI lists. The mailing list information is below:
>  http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> 
> 
> -- Josh
> 
>> 
>> -- Reuti
>> 
>> 
>>> This work is on-going, but once we have a stable prototype we will assess 
>>> how to bring it back to the mainline Open MPI trunk. For the moment, there 
>>> is no public release of this branch, but once there is we will be sure to 
>>> announce it on the appropriate Open MPI mailing list for folks to start 
>>> playing around with it.
>>> 
>>> -- Josh
>>> 
>>> On Jan 27, 2011, at 9:11 AM, Kirk Stako wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I was wondering what support Open MPI has for allowing a job to
>>>> continue running when one or more processes in the job die
>>>> unexpectedly? Is there a special mpirun flag for this? Any other ways?
>>>> 
>>>> It seems obvious that collectives will fail once a process dies, but
>>>> would it be possible to create a new group (if you knew which ranks
>>>> are dead) that excludes the dead processes - then turn this group into
>>>> a working communicator?
>>>> 
>>>> Thanks,
>>>> Kirk
>>>> ___
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> 
>>> 
>>> 
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://users.nccs.gov/~jjhursey
>>> 
>>> 
>> 
> 
> 
> Joshua Hursey
> Postdoctoral Research Associate

Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Joshua Hursey

On Jan 27, 2011, at 9:47 AM, Reuti wrote:

> On 27.01.2011 at 15:23, Joshua Hursey wrote:
> 
>> The current version of Open MPI does not support continued operation of an 
>> MPI application after process failure within a job. If a process dies, so 
>> will the MPI job. Note that this is true of many MPI implementations out 
>> there at the moment.
>> 
>> At Oak Ridge National Laboratory, we are working on a version of Open MPI 
>> that will be able to run through process failure, if the application wishes 
>> to do so. The semantics and interfaces needed to support this functionality 
>> are being actively developed by the MPI Forum's Fault Tolerance Working 
>> Group, and can be found at the wiki page below:
>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization
> 
> I had a look at this document, but what is really covered - does the 
> application have to react to the notification of a failed rank and act 
> appropriately on its own?

Yes. This is to support application-based fault tolerance (ABFT). Libraries 
could be developed on top of these semantics to hide some of the fault handling. 
The purpose is to enable fault-tolerant MPI applications and libraries to be 
built on top of MPI.

This document only covers run-through stabilization, not process recovery, at 
the moment. So the application will have well-defined semantics to allow it to 
continue processing without the failed process. Recovering the failed process 
is not specified in this document. That is the subject of a supplemental 
document in preparation - the two proposals are meant to be complementary and 
build upon one another.

> 
> Having a true ability to survive a dying process (i.e. rank) that might 
> already have been computing for hours would mean having some kind of "rank 
> RAID" or "rank Parchive". E.g. start 12 ranks when you need 10 - whichever 2 
> ranks fail, your job will still finish in time.

Yes, that is one possible technique. Once a process failure occurs, the 
application is notified via the existing error handling mechanisms. The 
application is then responsible for determining how best to recover from that 
process failure. This could include using MPI_Comm_spawn to create new 
processes (useful in manager/worker applications), recovering the state from an 
in-memory checksum, using spare processes in the communicator, rolling back 
some/all ranks to an application-level checkpoint, ignoring the failure and 
allowing the residual error to increase, aborting the job or a single 
sub-communicator, ... the list goes on. But the purpose of the proposal is to 
allow an application or library to start building such techniques based on 
portable semantics and well-defined interfaces.
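
To make one of these options concrete, here is a minimal sketch of recovering 
state from an in-memory checksum. It is plain Python rather than MPI, it is not 
taken from the proposal or from Open MPI, and the function names are invented 
for illustration. It assumes integer data, one equally sized block per rank, a 
checksum held by a process that survives, and at most one lost block.

```python
def make_checksum(blocks):
    """Element-wise sum of equally sized integer blocks, one block per rank."""
    if len({len(block) for block in blocks}) > 1:
        raise ValueError("all blocks must have the same length")
    return [sum(column) for column in zip(*blocks)]


def recover_block(surviving_blocks, checksum):
    """Rebuild the single missing block as checksum minus the survivors' sum."""
    if any(len(block) != len(checksum) for block in surviving_blocks):
        raise ValueError("blocks and checksum must have the same length")
    recovered = list(checksum)
    for block in surviving_blocks:
        recovered = [r - b for r, b in zip(recovered, block)]
    return recovered


if __name__ == "__main__":
    # Hypothetical per-rank data; in a real job each block lives on its own rank.
    blocks = [[1, 2], [3, 4], [5, 6]]
    checksum = make_checksum(blocks)            # assumed to sit on a surviving rank
    survivors = [blocks[0], blocks[2]]          # pretend rank 1 has failed
    print(recover_block(survivors, checksum))   # [3, 4]
```

With floating-point data the same subtraction would only reproduce the lost 
block up to rounding error, and a second failure before the checksum is rebuilt 
could not be repaired this way.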

Does that help clarify?


If you would like to discuss the developing proposals further or have input on 
how to make it better, I would suggest moving the discussion to the MPI3-ft 
mailing list so other groups can participate that do not normally follow the 
Open MPI lists. The mailing list information is below:
  http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft


-- Josh

> 
> -- Reuti
> 
> 
>> This work is on-going, but once we have a stable prototype we will assess 
>> how to bring it back to the mainline Open MPI trunk. For the moment, there 
>> is no public release of this branch, but once there is we will be sure to 
>> announce it on the appropriate Open MPI mailing list for folks to start 
>> playing around with it.
>> 
>> -- Josh
>> 
>> On Jan 27, 2011, at 9:11 AM, Kirk Stako wrote:
>> 
>>> Hi,
>>> 
>>> I was wondering what support Open MPI has for allowing a job to
>>> continue running when one or more processes in the job die
>>> unexpectedly? Is there a special mpirun flag for this? Any other ways?
>>> 
>>> It seems obvious that collectives will fail once a process dies, but
>>> would it be possible to create a new group (if you knew which ranks
>>> are dead) that excludes the dead processes - then turn this group into
>>> a working communicator?
>>> 
>>> Thanks,
>>> Kirk
>>> 
>> 
>> 
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> 
>> 
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Ralph Castain

On Jan 27, 2011, at 7:47 AM, Reuti wrote:

> On 27.01.2011 at 15:23, Joshua Hursey wrote:
> 
>> The current version of Open MPI does not support continued operation of an 
>> MPI application after process failure within a job. If a process dies, so 
>> will the MPI job. Note that this is true of many MPI implementations out 
>> there at the moment.
>> 
>> At Oak Ridge National Laboratory, we are working on a version of Open MPI 
>> that will be able to run through process failure, if the application wishes 
>> to do so. The semantics and interfaces needed to support this functionality 
>> are being actively developed by the MPI Forum's Fault Tolerance Working 
>> Group, and can be found at the wiki page below:
>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization
> 
> I had a look at this document, but what is really covered - does the 
> application have to react to the notification of a failed rank and act 
> appropriately on its own?
> 
> Having a true ability to survive a dying process (i.e. rank) that might 
> already have been computing for hours would mean having some kind of "rank 
> RAID" or "rank Parchive". E.g. start 12 ranks when you need 10 - whichever 2 
> ranks fail, your job will still finish in time.

We have the run-time part of this done - of course, figuring out the MPI part 
of the problem is harder ;-)

> 
> -- Reuti
> 
> 
>> This work is on-going, but once we have a stable prototype we will assess 
>> how to bring it back to the mainline Open MPI trunk. For the moment, there 
>> is no public release of this branch, but once there is we will be sure to 
>> announce it on the appropriate Open MPI mailing list for folks to start 
>> playing around with it.
>> 
>> -- Josh
>> 
>> On Jan 27, 2011, at 9:11 AM, Kirk Stako wrote:
>> 
>>> Hi,
>>> 
>>> I was wondering what support Open MPI has for allowing a job to
>>> continue running when one or more processes in the job die
>>> unexpectedly? Is there a special mpirun flag for this? Any other ways?
>>> 
>>> It seems obvious that collectives will fail once a process dies, but
>>> would it be possible to create a new group (if you knew which ranks
>>> are dead) that excludes the dead processes - then turn this group into
>>> a working communicator?
>>> 
>>> Thanks,
>>> Kirk
>>> 
>> 
>> 
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> 
>> 




Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Reuti
On 27.01.2011 at 15:23, Joshua Hursey wrote:

> The current version of Open MPI does not support continued operation of an 
> MPI application after process failure within a job. If a process dies, so 
> will the MPI job. Note that this is true of many MPI implementations out 
> there at the moment.
> 
> At Oak Ridge National Laboratory, we are working on a version of Open MPI 
> that will be able to run through process failure, if the application wishes 
> to do so. The semantics and interfaces needed to support this functionality 
> are being actively developed by the MPI Forum's Fault Tolerance Working Group, 
> and can be found at the wiki page below:
>  
> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization

I had a look at this document, but what is really covered - does the application 
have to react to the notification of a failed rank and act appropriately on its 
own?

Having a true ability to survive a dying process (i.e. rank) that might already 
have been computing for hours would mean having some kind of "rank RAID" or 
"rank Parchive". E.g. start 12 ranks when you need 10 - whichever 2 ranks fail, 
your job will still finish in time.
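
A rough sketch of the bookkeeping behind such a scheme, in plain Python without 
any MPI calls (the function name and the slot/spare layout are made up for this 
illustration): 10 logical work slots are mapped onto 12 started ranks, and the 
slot of a failed rank is handed to the next free spare. How the spare would 
obtain the lost rank's state is deliberately left out.

```python
def replace_failed(assignment, spares, failed):
    """Hand the slots of failed ranks to free spares.

    assignment maps logical slot -> physical rank, spares lists idle ranks,
    failed lists ranks known to be dead. Returns (new_assignment, free_spares).
    """
    failed = set(failed)
    free = [rank for rank in spares if rank not in failed]
    new_assignment = {}
    for slot, rank in sorted(assignment.items()):
        if rank in failed:
            if not free:
                raise RuntimeError(f"no spare rank left for slot {slot}")
            rank = free.pop(0)
        new_assignment[slot] = rank
    return new_assignment, free


if __name__ == "__main__":
    # Start 12 ranks when 10 are needed: ranks 10 and 11 are idle spares.
    assignment = {slot: slot for slot in range(10)}
    new_assignment, free = replace_failed(assignment, [10, 11], failed=[3, 7])
    print(new_assignment[3], new_assignment[7], free)   # 10 11 []
```

Slots whose rank is still alive keep their rank, so only the failed slots need 
any state to be moved; a third failure raises an error because no spare is left.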

-- Reuti


> This work is on-going, but once we have a stable prototype we will assess how 
> to bring it back to the mainline Open MPI trunk. For the moment, there is no 
> public release of this branch, but once there is we will be sure to announce 
> it on the appropriate Open MPI mailing list for folks to start playing around 
> with it.
> 
> -- Josh
> 
> On Jan 27, 2011, at 9:11 AM, Kirk Stako wrote:
> 
>> Hi,
>> 
>> I was wondering what support Open MPI has for allowing a job to
>> continue running when one or more processes in the job die
>> unexpectedly? Is there a special mpirun flag for this? Any other ways?
>> 
>> It seems obvious that collectives will fail once a process dies, but
>> would it be possible to create a new group (if you knew which ranks
>> are dead) that excludes the dead processes - then turn this group into
>> a working communicator?
>> 
>> Thanks,
>> Kirk
>> 
> 
> 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> 
> 




Re: [OMPI users] allow job to survive process death

2011-01-27 Thread Joshua Hursey
The current version of Open MPI does not support continued operation of an MPI 
application after process failure within a job. If a process dies, so will the 
MPI job. Note that this is true of many MPI implementations out there at the 
moment.

At Oak Ridge National Laboratory, we are working on a version of Open MPI that 
will be able to run through process failure, if the application wishes to do 
so. The semantics and interfaces needed to support this functionality are being 
actively developed by the MPI Forum's Fault Tolerance Working Group, and can be 
found at the wiki page below:
  https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization

This work is on-going, but once we have a stable prototype we will assess how 
to bring it back to the mainline Open MPI trunk. For the moment, there is no 
public release of this branch, but once there is we will be sure to announce it 
on the appropriate Open MPI mailing list for folks to start playing around with 
it.

-- Josh

On Jan 27, 2011, at 9:11 AM, Kirk Stako wrote:

> Hi,
> 
> I was wondering what support Open MPI has for allowing a job to
> continue running when one or more processes in the job die
> unexpectedly? Is there a special mpirun flag for this? Any other ways?
> 
> It seems obvious that collectives will fail once a process dies, but
> would it be possible to create a new group (if you knew which ranks
> are dead) that excludes the dead processes - then turn this group into
> a working communicator?
> 
> Thanks,
> Kirk
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




[OMPI users] allow job to survive process death

2011-01-27 Thread Kirk Stako
Hi,

I was wondering what support Open MPI has for allowing a job to
continue running when one or more processes in the job die
unexpectedly? Is there a special mpirun flag for this? Any other ways?

It seems obvious that collectives will fail once a process dies, but
would it be possible to create a new group (if you knew which ranks
are dead) that excludes the dead processes - then turn this group into
a working communicator?
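
The bookkeeping part of this idea can be sketched without MPI (plain Python;
the function names are invented for illustration). It only computes which
ranks would remain and how they would be renumbered, in the same relative
order that MPI_Group_excl preserves. Whether an MPI library can actually
build a working communicator from such a group after a failure is a separate
question that this sketch does not answer.

```python
def surviving_ranks(world_size, dead_ranks):
    """Return the old ranks that remain, in ascending order."""
    dead = set(dead_ranks)
    for rank in dead:
        if not 0 <= rank < world_size:
            raise ValueError(f"rank {rank} is outside 0..{world_size - 1}")
    return [rank for rank in range(world_size) if rank not in dead]


def renumber(world_size, dead_ranks):
    """Map each surviving old rank to its rank in the shrunken group."""
    survivors = surviving_ranks(world_size, dead_ranks)
    return {old: new for new, old in enumerate(survivors)}


if __name__ == "__main__":
    # Hypothetical job of 6 ranks in which ranks 1 and 4 are known to be dead.
    print(surviving_ranks(6, [1, 4]))   # [0, 2, 3, 5]
    print(renumber(6, [1, 4]))          # {0: 0, 2: 1, 3: 2, 5: 3}
```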

Thanks,
Kirk