Re: [OMPI devel] Jan ORTE meeting

2008-12-04 Thread Richard Graham
How about if we start on this over e-mail and phone? A face-to-face meeting is good, but I am already booked Jan 5-9, maybe 12-13, Jan 16-Feb 6th, and Feb 8-11. I would prefer not to tack on something at the end of the MPI Forum meeting, as I will have been gone from home for most of the month

Re: [OMPI devel] Jan ORTE meeting

2008-12-04 Thread Tim Mattox
I have other meetings so far on Jan 21, and possibly Jan 6-8. So I would ask we not have the Jan/Feb ORTE meeting either of those weeks. On Thu, Dec 4, 2008 at 5:50 PM, Ralph Castain wrote: > > On Dec 4, 2008, at 3:25 PM, Jeff Squyres wrote: > >> I don't know who's interested, so

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
What specifically do you have in mind? After talking with Jeff I withdraw my request to change the approach. This is a good approach when one wants to send warnings to some sort of logging system, in addition to errors. Sending the data upstream like I suggested can't rely on the error

[OMPI devel] Jan ORTE meeting

2008-12-04 Thread Jeff Squyres
I don't know who's interested, so I thought I'd bring it up on the devel list: let's start the basics for the January ORTE meeting. We may be able to sketch out an agenda, but frankly, it may depend on how far we get in the December meeting. So we may not be able to fully decide that

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Jeff Squyres
A physical meeting about this in the near future is unlikely; I think we're all facing travel restrictions and constraints with the holidays coming up. How about a teleconf to discuss the following about the notifier: - what exactly is there today - why what is there today

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Ralph Castain
I'm beginning to believe that we need a design meeting specifically over this question. Too many unknowns exist, with significant potential problems lurking behind them. Frankly, this issue could have a major impact on how we operate, performance, and a variety of other factors going

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
On 12/4/08 2:28 PM, "Ralph Castain" wrote: > I guess you lost me on this one. How are the btl's going to push an error "up" > to a higher layer? The errors could contain an arbitrary amount of information > in them. Since the btl APIs currently only return ints, are you

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Ralph Castain
I guess you lost me on this one. How are the btl's going to push an error "up" to a higher layer? The errors could contain an arbitrary amount of information in them. Since the btl APIs currently only return ints, are you proposing that we change all the btl APIs to include a new error

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
Not exactly, it depends on what you push up the stack. If you push just an error code, then you are right, there is very little value. However, if you push up the error strings (or something like that), and have an upper layer interact with SLURM or Moab's error reporting system, the btl's don't

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Brian W. Barrett
That was my thought exactly. And since the point of the notifier component is to return a *useful* description of what failure the BTL had (like IB ran out of resource X again), that will be lost if we just push that up to the next layer. Just my $0.02, of course. Brian On Thu, 4 Dec 2008,

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Ralph Castain
Hmmm...only problem with that idea is that the entities being communicated to (e.g., SLURM, Moab) have no concept of MPI nor any way to communicate via that system. They do, however, have APIs that notifier can call, and know how to speak TCP via their own agreed-upon protocols. And many

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
Here is where I think we should reconsider accessing the notifier component in the btl. It creates dependencies in the btl that are not needed. The idea of a notifier component is a good one, but I would defer using it to upper layers, rather than embedding it in the guts of the communication

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
On 12/4/08 9:05 AM, "Jeff Squyres" wrote: > After reflecting on this a bit, there are two more things I should have > mentioned: > > 1. I think that moving the BTL's out into their own layer (or > whatever) should be a separate effort from re-introducing the RSL (or >

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Ralph Castain
On Dec 4, 2008, at 7:05 AM, Jeff Squyres wrote: After reflecting on this a bit, there are two more things I should have mentioned: 1. I think that moving the BTL's out into their own layer (or whatever) should be a separate effort from re-introducing the RSL (or something like it). To

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Ralph Castain
Yes, FTB utilizes the notifier framework. In addition, we have three other components getting ready to be added to that framework that will provide interfaces to Moab, SLURM, and a DOE monitoring program. The first two will require messaging capabilities to tell the schedulers about

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Jeff Squyres
I think you got it right. And I think we're pretty good in terms of BTL usage of ORTE and OPAL (to include the new "notifier" service that Ralph put in recently -- what the FTB will likely eventually use, I think...?); those interfaces and abstraction barriers are technologically