Re: [OMPI devel] C/R and orte_oob
On Mar 6, 2014, at 1:02 PM, Adrian Reber wrote:

> On Tue, Feb 18, 2014 at 03:46:58PM +0100, Adrian Reber wrote:
>>>>>> I tried to implement something like you described. It is not yet event
>>>>>> driven, but before continuing I wanted to get some feedback if it is at
>>>>>> least the right start:
>>>>>>
>>>>>> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=5048a9cec2cd0bc4867eadfd7e48412b73267706
>>>>>>
>>>>>> I looked at the other ORTE_OOB_* macros and tried to model my
>>>>>> functionality a bit after what I have seen there. Right now it is still
>>>>>> a simple function which just tries to call ft_event() on all oob
>>>>>> components. Does this look right so far?
>>>>>
>>>>> Sorry for the delay; yes, that looks like the right direction. I would
>>>>> suggest doing it via the current state machine, though, by simply
>>>>> defining another job or proc state in orte/mca/plm/plm_types.h, and then
>>>>> registering a callback function using the
>>>>> orte_state.add_job[proc]_state(state, function to be called,
>>>>> ORTE_ERR_PRI). Then you can activate it by calling
>>>>> ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in the
>>>>> proper order.
>>>>
>>>> What is a job/proc in the Open MPI context?
>>>
>>> A "job" is the entire application, while a "proc" is just one process in
>>> that application. In this case you could use either one as you are
>>> checkpointing the entire job, but all this activity is occurring inside
>>> each proc. So I'd suggest defining it as a proc state since it only really
>>> involves local actions.
>>>
>>> If you like, I can define the required code in the trunk and let you fill
>>> in the event functionality.
>>
>> That would be great.
>
> Thanks for your changes. When using --with-ft there are a few compiler
> errors which I tried to fix with the following patch:
>
> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=71521789ef9d248a7eef53030d2ec5de900faa4c

That looks okay, with the only caveat being that you wouldn't ordinarily pass the state_caddy_t into a function. It's just there to pass along the job, etc., in case the callback function needs to reference something. In this case, I can't think of anything the FT event function would need to know: you just want it to quiet all messaging.

>
> Adrian
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/03/14309.php
Re: [OMPI devel] C/R and orte_oob
On Tue, Feb 18, 2014 at 03:46:58PM +0100, Adrian Reber wrote:
>>>>> I tried to implement something like you described. It is not yet event
>>>>> driven, but before continuing I wanted to get some feedback if it is at
>>>>> least the right start:
>>>>>
>>>>> https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=5048a9cec2cd0bc4867eadfd7e48412b73267706
>>>>>
>>>>> I looked at the other ORTE_OOB_* macros and tried to model my
>>>>> functionality a bit after what I have seen there. Right now it is still
>>>>> a simple function which just tries to call ft_event() on all oob
>>>>> components. Does this look right so far?
>>>>
>>>> Sorry for the delay; yes, that looks like the right direction. I would
>>>> suggest doing it via the current state machine, though, by simply
>>>> defining another job or proc state in orte/mca/plm/plm_types.h, and then
>>>> registering a callback function using the
>>>> orte_state.add_job[proc]_state(state, function to be called,
>>>> ORTE_ERR_PRI). Then you can activate it by calling
>>>> ORTE_ACTIVATE_JOB[PROC]_STATE(NULL, state) and it will be handled in the
>>>> proper order.
>>>
>>> What is a job/proc in the Open MPI context?
>>
>> A "job" is the entire application, while a "proc" is just one process in
>> that application. In this case you could use either one as you are
>> checkpointing the entire job, but all this activity is occurring inside
>> each proc. So I'd suggest defining it as a proc state since it only really
>> involves local actions.
>>
>> If you like, I can define the required code in the trunk and let you fill
>> in the event functionality.
>
> That would be great.

Thanks for your changes. When using --with-ft there are a few compiler errors which I tried to fix with the following patch:

https://lisas.de/git/?p=open-mpi.git;a=commitdiff;h=71521789ef9d248a7eef53030d2ec5de900faa4c

Adrian
Re: [OMPI devel] autoconf warnings: openib BTL
But AF_IB is always declared, regardless of whether it is actually present in the kernel.

On Thu, Mar 6, 2014 at 5:56 PM, Ralph Castain wrote:

> Let me see if I can help translate. I think the problem here is Jeff's
> comment about a "run time check", which wasn't actually what he is
> proposing here.
>
> If you look at Jeff's proposed code, what he is saying is that you don't
> need to use AC_TRY_RUN; you can just build based on whether or not AF_IB
> is declared, and so AC_CHECK_DECLS is adequate. If the resulting code
> fails, then that's an error anyway. So you can just protect the code as he
> shows and be done with it.
>
> This would avoid all the warnings we are now receiving on the trunk, and
> do what you need. Make sense?
>
> On Thu, Mar 6, 2014 at 7:26 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> On Mar 6, 2014, at 4:08 AM, Vasily Filipov
>> wrote:
>>
>>>> #if HAVE_DECL_AF_IB
>>>>     rc = try_using_af_ib();
>>>>     if (OMPI_ERR_NOT_AVAILABLE == rc) {
>>>>         rc = try_the_other_way();
>>>>     }
>>>> #else
>>>>     rc = try_the_other_way();
>>>> #endif
>>>
>>> I mean I cannot use "another way" (a call to "try_the_other_way()") if
>>> the call to "try_using_af_ib" fails, because RDMACM was compiled for
>>> AF_IB usage (different fields in structs, different function prototypes).
>>
>> Ok, that means the implementation is reduced to:
>>
>> #if HAVE_DECL_AF_IB
>>     rc = try_using_af_ib();
>> #else
>>     rc = try_the_other_way();
>> #endif
>>
>> Right? If so, I don't see why you need the AC_TRY_RUN -- if RDMACM is
>> easily detectable as to which way it is compiled (because it has, for
>> example, different fields), then AC_CHECK_DECLS should be enough, right?
>>
>> I must be missing something...?
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] autoconf warnings: openib BTL
Let me see if I can help translate. I think the problem here is Jeff's comment about a "run time check", which wasn't actually what he is proposing here.

If you look at Jeff's proposed code, what he is saying is that you don't need to use AC_TRY_RUN; you can just build based on whether or not AF_IB is declared, and so AC_CHECK_DECLS is adequate. If the resulting code fails, then that's an error anyway. So you can just protect the code as he shows and be done with it.

This would avoid all the warnings we are now receiving on the trunk, and do what you need. Make sense?

On Thu, Mar 6, 2014 at 7:26 AM, Jeff Squyres (jsquyres) wrote:

> On Mar 6, 2014, at 4:08 AM, Vasily Filipov
> wrote:
>
>>> #if HAVE_DECL_AF_IB
>>>     rc = try_using_af_ib();
>>>     if (OMPI_ERR_NOT_AVAILABLE == rc) {
>>>         rc = try_the_other_way();
>>>     }
>>> #else
>>>     rc = try_the_other_way();
>>> #endif
>>
>> I mean I cannot use "another way" (a call to "try_the_other_way()") if
>> the call to "try_using_af_ib" fails, because RDMACM was compiled for
>> AF_IB usage (different fields in structs, different function prototypes).
>
> Ok, that means the implementation is reduced to:
>
> #if HAVE_DECL_AF_IB
>     rc = try_using_af_ib();
> #else
>     rc = try_the_other_way();
> #endif
>
> Right? If so, I don't see why you need the AC_TRY_RUN -- if RDMACM is
> easily detectable as to which way it is compiled (because it has, for
> example, different fields), then AC_CHECK_DECLS should be enough, right?
>
> I must be missing something...?
Re: [OMPI devel] autoconf warnings: openib BTL
On Mar 6, 2014, at 4:08 AM, Vasily Filipov wrote:

>> #if HAVE_DECL_AF_IB
>>     rc = try_using_af_ib();
>>     if (OMPI_ERR_NOT_AVAILABLE == rc) {
>>         rc = try_the_other_way();
>>     }
>> #else
>>     rc = try_the_other_way();
>> #endif
>
> I mean I cannot use "another way" (a call to "try_the_other_way()") if
> the call to "try_using_af_ib" fails, because RDMACM was compiled for
> AF_IB usage (different fields in structs, different function prototypes).

Ok, that means the implementation is reduced to:

#if HAVE_DECL_AF_IB
    rc = try_using_af_ib();
#else
    rc = try_the_other_way();
#endif

Right? If so, I don't see why you need the AC_TRY_RUN -- if RDMACM is easily detectable as to which way it is compiled (because it has, for example, different fields), then AC_CHECK_DECLS should be enough, right?

I must be missing something...?

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] autoconf warnings: openib BTL
On 05-Mar-14 18:08, Jeff Squyres (jsquyres) wrote:
> On Mar 3, 2014, at 10:59 PM, Vasily Filipov wrote:
>
>> Yes, it is possible, but there is a difference if I do it this way.
>> With the current implementation (in the trunk today), if AC_RUN_IFELSE
>> fails, the old RDMACM code is used. With the way you suggest, if we
>> postpone the decision to run time and the check fails, we have to abort
>> RDMACM entirely, because it was compiled to work with AF_IB.
>> So my question to you, taking all of the above into account: what's the
>> right way to implement it? What do you think?
>
> I'm not sure I understand. Can't you write something like:
>
> #if HAVE_DECL_AF_IB
>     rc = try_using_af_ib();
>     if (OMPI_ERR_NOT_AVAILABLE == rc) {
>         rc = try_the_other_way();
>     }
> #else
>     rc = try_the_other_way();
> #endif

I mean I cannot use "another way" (a call to "try_the_other_way()") if the call to "try_using_af_ib" fails, because RDMACM was compiled for AF_IB usage (different fields in structs, different function prototypes).