Re: [OMPI devel] problem in the ORTE notifier framework

2009-06-08 Thread Ralph Castain
I believe the concern here was that we aren't entirely sure just where you plan to do this. If we are talking about reporting errors, then there is less concern about adding cycles. For example, we already check to see if the IB driver has exceeded the limit on retries - adding more logic

Re: [OMPI devel] problem in the ORTE notifier framework

2009-06-08 Thread Sylvain Jeaugey
Ralph, Sorry for replying to this old thread, but it seems that my answer was stuck in the "postponed" folder. About the if-then, I thought it was 1 cycle. I mean, if you don't break the pipeline, i.e. use likely() or __builtin_expect() or something like that to be sure that the compiler

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Jeff Squyres
On May 28, 2009, at 8:48 AM, Jeff Squyres (jsquyres) wrote: Yes, the opal-sos branch has a variant of this as well. One thing I didn't mention: the opal-sos hg tree is unfortunately unrelated to the main ompi-svn-mirror, so you can't just push/pull between them. :-( Most of us OMPI

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Jeff Squyres
On May 28, 2009, at 2:55 AM, Nadia Derbey wrote: Well, it didn't because from what I understood, the MPI program needs to be changed (register a callback routine for the event, activate the event, etc.), and this is something we wanted to avoid. Combined with what Terry and Ralph already

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Jeff Squyres
On May 28, 2009, at 7:53 AM, Ralph Castain wrote: The code in 1.3 is definitely different from the trunk as it lags quite a bit behind. However, the trunk definitely does include the code I referenced. Not sure why the hg mirror wouldn't have it. I would have to defer to Jeff on that

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Ralph Castain
I agree with Terry here about being careful in pursuing this path. What I wouldn't want is to force anyone who wants to be notified of error events to also turn on peruse, which impacts the non-error code path. Again, I'm not entirely sure what you are trying to do here. As I

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Ralph Castain
The code in 1.3 is definitely different from the trunk as it lags quite a bit behind. However, the trunk definitely does include the code I referenced. Not sure why the hg mirror wouldn't have it. I would have to defer to Jeff on that question - could be a bug in the update macro that maintains

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Terry Dontje
Nadia Derbey wrote: On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote: Excellent points; Ralph and I chatted about this on the phone today -- we concur with George. Bull -- would peruse work for you? I think you mentioned before that it didn't seem attractive to you. Well,

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Sylvain Jeaugey
To be more complete, we pull Hg from http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/; are we mistaken? If not, the code in v1.3 seems to be different from the code in the trunk ... Sylvain On Thu, 28 May 2009, Nadia Derbey wrote: On Tue, 2009-05-26 at 17:24 -0600, Ralph

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Nadia Derbey
On Tue, 2009-05-26 at 17:24 -0600, Ralph Castain wrote:
> First, to answer Nadia's question: you will find that the init function for the module is already called when it is selected - see the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the trunk).
Strange? Our repository

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-28 Thread Nadia Derbey
On Wed, 2009-05-27 at 14:25 -0400, Jeff Squyres wrote:
> Excellent points; Ralph and I chatted about this on the phone today -- we concur with George.
>
> Bull -- would peruse work for you? I think you mentioned before that it didn't seem attractive to you.
Well, it didn't because from

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-27 Thread Ralph Castain
I think it depends upon what is being monitored. As I understand it, we could use the peruse link to generate notifications based on the number of times someone calls "MPI_Send", for example. I concur with George's concerns about performance in this area and would agree that using the peruse hooks

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-27 Thread Jeff Squyres
FYI, I moved the opal-sos repo this morning to bitbucket.org: http://bitbucket.org/jsquyres/opal-sos/overview/ If you already pulled from the old www.open-mpi.org copy, you can just edit your .hg/hgrc to set the new bitbucket URL and continue pulling / etc. (no need to get a new

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-27 Thread Jeff Squyres
This all sounds reasonable. If these are, indeed, on already-slow code paths, I doubt there will be much of an issue. If you want to talk about this higher bandwidth, let us know -- I can set up a Webex call pretty easily. On May 27, 2009, at 8:21 AM, Sylvain Jeaugey wrote: I thought an

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-27 Thread Sylvain Jeaugey
I thought an if-then was 1 cycle. I mean, if you don't break the pipeline, i.e. use likely() or __builtin_expect() or something like that to be sure that the compiler will generate assembly in the right way, it shouldn't be more than 1 cycle, perhaps less on some architectures like Itanium. But
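
For readers unfamiliar with the hint Sylvain mentions, here is a minimal sketch in C of marking an error check as unlikely so the compiler keeps the common path as the fall-through. `__builtin_expect` is the GCC/Clang builtin; the `LIKELY`/`UNLIKELY` macros, `notify_error()`, and `process_statuses()` are hypothetical names made up for this illustration and are not taken from the Open MPI tree. It makes no claim about the actual cycle cost, which depends on the compiler and the architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Branch-prediction hints. __builtin_expect is a GCC/Clang builtin;
 * on other compilers the macros fall back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical counter standing in for a real notifier call. */
static int notify_count = 0;

static void notify_error(int status)
{
    (void)status;
    notify_count++;
}

/* Hypothetical hot path: walk n status codes, where nonzero means error.
 * Returns the number of entries that succeeded. */
static size_t process_statuses(const int *statuses, size_t n)
{
    size_t ok = 0;
    assert(statuses != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(statuses[i] != 0)) {
            notify_error(statuses[i]);   /* rare, slow path */
            continue;
        }
        ok++;                            /* common, fast path */
    }
    return ok;
}
```

The hint only affects code layout and static prediction; the check itself is still executed on every iteration, which is the cost being debated in this thread.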

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-26 Thread Jeff Squyres
Sure, I can set up a Webex (with international dialins) if it would be useful. On May 26, 2009, at 7:24 PM, Ralph Castain wrote: First, to answer Nadia's question: you will find that the init function for the module is already called when it is selected - see the code in

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-26 Thread Ralph Castain
First, to answer Nadia's question: you will find that the init function for the module is already called when it is selected - see the code in orte/mca/base/notifier_base_select.c, lines 72-76 (in the trunk). It would be a good idea to tie into the sos work to avoid conflicts when it all gets

Re: [OMPI devel] problem in the ORTE notifier framework

2009-05-26 Thread Jeff Squyres
Nadia -- Sorry I didn't get to jump in on the other thread earlier. We have made considerable changes to the notifier framework in a branch to better support "SOS" functionality: https://www.open-mpi.org/hg/auth/hgwebdir.cgi/jsquyres/opal-sos Cisco and Indiana U. have been working on

[OMPI devel] problem in the ORTE notifier framework

2009-05-26 Thread Nadia Derbey
Hi, While having a look at the notifier framework under orte, I noticed that the way it is written, the init routine for the selected module cannot be called. Attached is a small patch that fixes this issue. Regards, Nadia ORTE notifier module init routine is never called: orte_notifier.init
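
The pattern under discussion, a framework selecting one module and then calling that module's init routine, can be sketched roughly as follows. This is an illustration only: `demo_module_t`, `demo_init()`, and `demo_select()` are hypothetical names, and the code is neither the attached patch nor the actual ORTE notifier selection logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module descriptor; not the real ORTE notifier structure. */
typedef struct {
    const char *name;
    int priority;
    int (*init)(void);   /* may be NULL if the module needs no setup */
} demo_module_t;

/* Hypothetical init routine that records how often it was called. */
static int demo_init_calls = 0;

static int demo_init(void)
{
    demo_init_calls++;
    return 0;            /* 0 means success in this sketch */
}

/* Pick the highest-priority module, then call its init routine.
 * Returns the selected module, or NULL if there is none or init fails. */
static const demo_module_t *demo_select(const demo_module_t *mods, size_t n)
{
    const demo_module_t *best = NULL;
    assert(mods != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || mods[i].priority > best->priority) {
            best = &mods[i];
        }
    }
    /* The step at issue in this thread: init must run for the winner. */
    if (best != NULL && best->init != NULL && best->init() != 0) {
        return NULL;
    }
    return best;
}
```

With two modules of priority 10 and 5, `demo_select()` returns the first and calls its init exactly once; a module whose init pointer is NULL is selected without any call.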