On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Just to clarify: I am not aware of any MPI that will allow you to relocate a
> process while it is running. You have to checkpoint the job, terminate it,
> and then restart the entire thing with the desired process on the new node.
>


Dear all,

For your information, MVAPICH2 supports live migration of MPI
processes, without the need to terminate and restart the whole job.

All the details are in the MVAPICH2 user guide:
  - How to configure MVAPICH2 for migration:
    http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-120004.4
  - How to trigger process migration:
    http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-760006.14.3

You can also check the paper "High Performance Pipelined Process
Migration with RDMA"
http://mvapich.cse.ohio-state.edu/static/media/publications/abstract/ouyangx-2011-ccgrid.pdf


Best regards,

Xavier



>
> On Mar 16, 2016, at 3:15 AM, Husen R <hus...@gmail.com> wrote:
>
> In the case of an MPI application (not Gromacs), how do I relocate the
> application from one node to another while it is running?
> I'm sorry, but as far as I know, the ompi-restart command is used to
> restart an application from a checkpoint file once the application has
> already terminated (is no longer running).
>
> Thanks
>
> regards,
>
> Husen
>
> On Wed, Mar 16, 2016 at 4:29 PM, Jeff Hammond <jeff.scie...@gmail.com>
> wrote:
>>
>> Just checkpoint-restart the app to relocate. The overhead will be lower
>> than trying to do it with MPI.
>>
>> Jeff
>>
>>
>> On Wednesday, March 16, 2016, Husen R <hus...@gmail.com> wrote:
>>>
>>> Hi Jeff,
>>>
>>> Thanks for the reply.
>>>
>>> After consulting the Gromacs docs, as you suggested, I found that Gromacs
>>> already supports checkpoint/restart. Thanks for the suggestion.
>>>
>>> Previously, I asked about checkpoint/restart in Open MPI because I want
>>> to checkpoint an MPI application and restart/migrate it while it is
>>> running. For example, I run an MPI application on nodes A, B and C in a
>>> cluster and I want to migrate the process running on node A to another
>>> node, let's say node C.
>>> Is there a way to do this with Open MPI? Thanks.
>>>
>>> Regards,
>>>
>>> Husen
>>>
>>>
>>>
>>>
>>> On Wed, Mar 16, 2016 at 12:37 PM, Jeff Hammond <jeff.scie...@gmail.com>
>>> wrote:
>>>>
>>>> Why do you need Open MPI to do this? Molecular dynamics trajectories are
>>>> trivial to checkpoint and restart at the application level. I'm sure
>>>> Gromacs already supports this. Please consult the Gromacs docs or user
>>>> support for details.
>>>>
>>>> Jeff
>>>>
>>>>
>>>> On Tuesday, March 15, 2016, Husen R <hus...@gmail.com> wrote:
>>>>>
>>>>> Dear Open MPI Users,
>>>>>
>>>>>
>>>>> Does the current stable release of Open MPI (v1.10 series) support
>>>>> fault tolerance features?
>>>>> According to the Open MPI FAQ, checkpoint/restart support was last
>>>>> released as part of the v1.6 series.
>>>>> I just want to make sure about this.
>>>>>
>>>>> By the way, is Open MPI able to checkpoint or restart an MPI
>>>>> application such as GROMACS automatically?
>>>>> Please, I really need help.
>>>>>
>>>>> Regards,
>>>>>
>>>>>
>>>>> Husen
>>>>
>>>>
>>>>
>>>> --
>>>> Jeff Hammond
>>>> jeff.scie...@gmail.com
>>>> http://jeffhammond.github.io/
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2016/03/28705.php
>>>
>>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.scie...@gmail.com
>> http://jeffhammond.github.io/
>>
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/03/28709.php
>
>
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/03/28710.php
>
>
>
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/03/28731.php
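
For readers unfamiliar with the application-level checkpoint/restart that
Jeff recommends in the quoted messages, here is a minimal, single-process
sketch of the idea. It is not Gromacs code and does not use any MPI or
Open MPI API; the function names (save_checkpoint, load_checkpoint, run),
the JSON state layout, and the toy "simulation" are all assumptions made
for illustration. To relocate a job with this approach, the checkpoint
file would presumably live on storage visible to the new node, and the job
would be restarted there after being stopped.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on the same filesystem, so a crash mid-write
    # never leaves a half-written checkpoint at `path`.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Advance a toy simulation, resuming from `path` if a checkpoint exists.

    `stop_after` (hypothetical, for demonstration only) ends this run early
    after that many steps, imitating a job that is terminated before it
    finishes.
    """
    state = load_checkpoint(path) or {"step": 0, "position": 0.0, "velocity": 1.0}
    dt = 0.5
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break
        state["position"] += state["velocity"] * dt
        state["step"] += 1
        steps_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

Calling run("ckpt.json", 10, 2, stop_after=5) and then run("ckpt.json", 10, 2)
resumes from the last saved step rather than from step 0; work done after
the most recent checkpoint is repeated, which is the usual trade-off when
choosing the checkpoint interval.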
