Re: [OMPI devel] Use unique collective ids for the checkpoint/restart code

2014-02-04 Adrian Reber
Thanks for spotting the 'printf'. I removed it, as it was only there for
debugging at a very early stage, and committed the patch without the
'printf' to svn.

Adrian

On Mon, Feb 03, 2014 at 12:42:39PM -0800, Ralph Castain wrote:
> Looks okay to me. I see you left a "printf" statement in
> plm_base_launch_support.c, so you might want to turn that into an
> opal_output_verbose call or something similar.
> 
> On Feb 3, 2014, at 12:19 PM, Adrian Reber  wrote:
> 
> > This patch
> > 
> > https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=14ec7f42baab882e345948ff79c4f75f5084bbbf
> > 
> > introduces unique collective ids for the checkpoint/restart code, and
> > with it applied things seem to work pretty well. As this patch also
> > touches non-CR code, it would be good if someone could have a look at it.
> > 
> > With this patch applied, the code seems to work up to the point where
> > orterun actually pauses all processes and tries to create the
> > checkpoints. Checkpoint creation does not work for me yet because CRS
> > does not include support for checkpoint/restart using CRIU; adding that
> > will be my next step.
> > 
> > Adrian
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Use unique collective ids for the checkpoint/restart code

2014-02-03 Ralph Castain
Looks okay to me. I see you left a "printf" statement in
plm_base_launch_support.c, so you might want to turn that into an
opal_output_verbose call or something similar.

On Feb 3, 2014, at 12:19 PM, Adrian Reber  wrote:

> This patch
> 
> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=14ec7f42baab882e345948ff79c4f75f5084bbbf
> 
> introduces unique collective ids for the checkpoint/restart code, and
> with it applied things seem to work pretty well. As this patch also
> touches non-CR code, it would be good if someone could have a look at it.
> 
> With this patch applied, the code seems to work up to the point where
> orterun actually pauses all processes and tries to create the
> checkpoints. Checkpoint creation does not work for me yet because CRS
> does not include support for checkpoint/restart using CRIU; adding that
> will be my next step.
> 
>   Adrian



[OMPI devel] Use unique collective ids for the checkpoint/restart code

2014-02-03 Adrian Reber
This patch

https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=14ec7f42baab882e345948ff79c4f75f5084bbbf

introduces unique collective ids for the checkpoint/restart code, and
with it applied things seem to work pretty well. As this patch also
touches non-CR code, it would be good if someone could have a look at it.

With this patch applied, the code seems to work up to the point where
orterun actually pauses all processes and tries to create the
checkpoints. Checkpoint creation does not work for me yet because CRS
does not include support for checkpoint/restart using CRIU; adding that
will be my next step.

Adrian