Re: [OMPI devel] Use unique collective ids for the checkpoint/restart code
Thanks for spotting the 'printf'. I removed it as it was for debugging in
a very early stage. I committed the patch without the 'printf' to svn.

Adrian

On Mon, Feb 03, 2014 at 12:42:39PM -0800, Ralph Castain wrote:
> Looks okay to me - I see you left a "printf" statement in
> plm_base_launch_support.c, so you might want to make that an
> opal_output_verbose or something.
>
> On Feb 3, 2014, at 12:19 PM, Adrian Reber wrote:
>
> > This patch
> >
> > https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=14ec7f42baab882e345948ff79c4f75f5084bbbf
> >
> > introduces unique collective ids for the checkpoint/restart code and
> > with this applied it seems to work pretty good. As this patch also
> > touches non-CR code it would be good if someone could have a look at it.
> >
> > With this patch applied the code seems to work up to the point where
> > orterun actually pauses all processes and tries to create the
> > checkpoints. The checkpoint creation does not work for me as CRS does
> > not yet include support for checkpoint/restart using CRIU which would be
> > my next step.
> >
> > Adrian
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] Use unique collective ids for the checkpoint/restart code
Looks okay to me - I see you left a "printf" statement in
plm_base_launch_support.c, so you might want to make that an
opal_output_verbose or something.

On Feb 3, 2014, at 12:19 PM, Adrian Reber wrote:

> This patch
>
> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=14ec7f42baab882e345948ff79c4f75f5084bbbf
>
> introduces unique collective ids for the checkpoint/restart code and
> with this applied it seems to work pretty good. As this patch also
> touches non-CR code it would be good if someone could have a look at it.
>
> With this patch applied the code seems to work up to the point where
> orterun actually pauses all processes and tries to create the
> checkpoints. The checkpoint creation does not work for me as CRS does
> not yet include support for checkpoint/restart using CRIU which would be
> my next step.
>
> Adrian
[OMPI devel] Use unique collective ids for the checkpoint/restart code
This patch

https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=14ec7f42baab882e345948ff79c4f75f5084bbbf

introduces unique collective ids for the checkpoint/restart code and
with this applied it seems to work pretty good. As this patch also
touches non-CR code it would be good if someone could have a look at it.

With this patch applied the code seems to work up to the point where
orterun actually pauses all processes and tries to create the
checkpoints. The checkpoint creation does not work for me as CRS does
not yet include support for checkpoint/restart using CRIU which would be
my next step.

Adrian