Re: [O-MPI devel] FYI: Failing intel tests
Josh Hursey wrote:
> So I have been working on a variety of platforms [x86 32-bit, x86 64-bit, PPC 64-bit]. As of SVN revision 6541, below are the current failures from the intel_tests suite. For all of these tests I used the "tcp,self" PTLs and the TEG PML.
> On x86-32, x86-64, PPC64: MPI_Send_self_f, MPI_Send_self_c, MPI_Send_init_self_c

This will be true for all platforms with the PTL code, as it does no buffering. The intel tests are incorrect in that they assume the implementation provides some degree of buffering (blocking sends are called before the receive is posted). The BTL code will buffer up to a configurable eager limit, which I've set by default to be large enough to pass the intel tests.

Tim
[O-MPI devel] FYI: Failing intel tests
So I have been working on a variety of platforms [x86 32-bit, x86 64-bit, PPC 64-bit]. As of SVN revision 6541, below are the current failures from the intel_tests suite. For all of these tests I used the "tcp,self" PTLs and the TEG PML.

On x86-32, x86-64, PPC64:
- MPI_Send_self_f
- MPI_Send_self_c
- MPI_Send_init_self_c

On x86-64, PPC64:
- MPI_Allreduce_loc_c
- MPI_Reduce_loc_c
- MPI_Reduce_scatter_loc_c
- MPI_Scan_loc_c

Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/
Re: [O-MPI devel] OMPI_MCA_ptl_base_exclude=sm
There are several ways to make this option the default. The simplest is to use the mca-params.conf file. In your home directory, create a .openmpi directory, and inside it create a file called mca-params.conf. You can add to this file all the options that you want to be the default behavior of your OMPI. Here is mine:

component_show_load_errors=0
ptl_tcp_if_include=eth0
ptl_tcp_if_exclude=lo
ptl_base_include=tcp,self
pml=uniq
mpi_yield_when_idle=0
mpi_event_tick_rate=0

Notice that you don't have to add the OMPI_MCA_ prefix in this file. You can add whatever parameters you want; just use the names as printed by the ompi_info tool.

george.

On Jul 18, 2005, at 2:49 PM, Greg Watson wrote:
> Currently I have to set this environment variable on bproc/x86_64 or mpirun just hangs. Would it be possible to make this the default setting until it's fixed?
> Greg
___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
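George's steps can be sketched as a couple of shell commands. The parameter names and values below are copied from his example; they are from the 2005-era tree and may differ in later releases.

```shell
# Create the per-user MCA defaults file that Open MPI reads at startup.
mkdir -p "$HOME/.openmpi"
cat > "$HOME/.openmpi/mca-params.conf" <<'EOF'
# Note: no OMPI_MCA_ prefix here -- use the names as printed by ompi_info
component_show_load_errors=0
ptl_tcp_if_include=eth0
ptl_tcp_if_exclude=lo
ptl_base_include=tcp,self
pml=uniq
mpi_yield_when_idle=0
mpi_event_tick_rate=0
EOF
```

With this in place, every mpirun under that user picks up these defaults without any OMPI_MCA_* environment variables.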
Re: [O-MPI devel] July meeting
Added. I've now got:

- ORTE sub group:
  - simplifying GPR access (if possible)
  - multi-cell RTE
  - external perspective from Eclipse guys

Since you and Ralph are not core OMPI developers and not the focus of the rest of the meeting, let's coordinate offline and come up with a time (and location?) for this.

On Jul 18, 2005, at 2:58 PM, Greg Watson wrote:
> Would you be interested in an external tool perspective on orte?
> Greg
>
> On Jul 18, 2005, at 9:34 AM, Jeff Squyres wrote:
> > [full agenda quoted; see the original "July meeting" posting below]

-- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
Re: [O-MPI devel] July meeting
On Jul 18, 2005, at 2:56 PM, Ralph Castain wrote: 1. RedStorm design - what are we going to do about the RTE? That's what I meant by "future plans", but I could certainly be more explicit. :-) 2. Multi-cell RTE - I've been working on this (finally set it aside to complete the scalable startup stuff first), but it is complex and might merit some discussion with those interested. Shirley. Added. -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
Re: [O-MPI devel] July meeting
Would you be interested in an external tool perspective on orte?

Greg

On Jul 18, 2005, at 9:34 AM, Jeff Squyres wrote:
> Here's the topics that I have for the July meeting (did I miss anything?). I don't really have a schedule -- only some things need to be assigned times (e.g., the collectives meeting(s)).
> [full agenda quoted; see the original "July meeting" posting below]
Re: [O-MPI devel] July meeting
Yo Jeff

Couple of things you might want to add:

1. RedStorm design - what are we going to do about the RTE?
2. Multi-cell RTE - I've been working on this (finally set it aside to complete the scalable startup stuff first), but it is complex and might merit some discussion with those interested.

On Mon, 2005-07-18 at 09:34, Jeff Squyres wrote:
> Here's the topics that I have for the July meeting (did I miss anything?). I don't really have a schedule -- only some things need to be assigned times (e.g., the collectives meeting(s)).
> [full agenda quoted; see the original "July meeting" posting below]
[O-MPI devel] OMPI_MCA_ptl_base_exclude=sm
Currently I have to set this environment variable on bproc/x86_64 or mpirun just hangs. Would it be possible to make this the default setting until it's fixed? Greg
Re: [O-MPI devel] processor affinity
Thanks to Edgar for copying me on this: it sounds like I asked at the right time. There were some OpenMP-internal discussions on this a couple of years ago, and a straw proposal, but at that time there was no obvious interest on the MPI side. So we'd have to go back and look at the ideas that came up at the time.

Barbara

> [Original Message]
> From: Jeff Squyres
> To: Open MPI Developers
> Cc: Barbara Chapman
> Date: 7/18/2005 10:44:15 AM
> Subject: Re: [O-MPI devel] processor affinity
>
> [message quoted in full; see "Re: [O-MPI devel] processor affinity" below]
Re: [O-MPI devel] collectives discussion @LANL
Torsten Hoefler wrote:
> [scheduling exchange quoted in full; see "Re: [O-MPI devel] collectives discussion @LANL" below]
> yes of course - so I make the reservation for 27th to 28th.
> Ralph, could you please give me Cindy's contact information?

I'll confirm these dates with Cindy and provide her your contact info.

Thanks,
Tim
Re: [O-MPI devel] collectives discussion @LANL
Tim is taking care of this arrangement. I believe (from a prior chat with Cindy) that she may already be familiar with her equivalent at your location.

> > As for tech assistance, contact Cindy at LANL (Ralph has her contact info -- indeed, she might proactively contact your people anyway, to set up for smooth sailing on the actual conference days).
> Ralph, could you please give me Cindy's contact information?
>
> Thanks,
> Torsten
Re: [O-MPI devel] processor affinity
Excellent. Seems like several people have thought of this at the same time (I was pinged about this by the IB vendors).

I know that others on the team have more experience in this area than I do, so I personally welcome all information. I've read a few papers on the topic (general/simplified consensus: memory and process affinity is good), but would appreciate any pointers to more material.

After the theory, however, we need to decide on an implementation strategy. As Rich mentioned, we can do this all via MCA parameters, or perhaps via MPI_Info or MPI attributes. Although I'm not sure how much of this can be set ahead of time and what needs to be done on a per-thread basis, I'm assuming that each thread will need to make some kind of function call (if MPI is going to handle it, then it will need to be an MPI function call that triggers some magic under the covers).

Any advice here from the Open MP community would also be appreciated...

On Jul 18, 2005, at 11:08 AM, Edgar Gabriel wrote:
> [message quoted in full; see "Re: [O-MPI devel] processor affinity" below]

-- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
[O-MPI devel] July meeting
Here's the topics that I have for the July meeting (did I miss anything?). I don't really have a schedule -- only some things need to be assigned times (e.g., the collectives meeting(s)).

- All: Current status, future plans/goals (presentation from each organization)
  - Jeff: IU
  - Tim: LANL
  - Ralph: ORTE
  - Rainer: HLRS
  - George: UTK
- All, Tim: overview of PML/BTL interface and design
  - Retire teg/PTL?
- All: SC, Euro PVM/MPI, LACSI planning
- All, Brian: what you need to know about the new configure.m4 system
- All, Jeff: build system changes
- All: discussion of others coming into the tree (e.g., IB vendors)
- All: processor/memory affinity
- All: project split issues
  - tools in wrong trees (abstraction issues)
  - wrapper compilers for ORTE/OPAL?
  - #include files
  - any final cleanup moving between opal/orte/ompi trees
- All: MCA issues
  - Framework hiding / re-entrance (e.g., ompi_info)
  - How to have multiple PMLs without re-initializing BTLs/PTLs?
  - Better guidelines for framework/component "open" calls
  - Separate function for framework/component MCA parameter registration?
  - Frameworks as DSOs
- Jeff, Ralph: simplifying GPR access (if possible)
- Red Storm sub-group: current status and future plans
- One-sided sub-group: current status, SC plans, future plans
- Fault tolerance sub-group:
  - Integrating LAM-style coordinated, synchronous checkpointing
  - How to do other kinds of checkpointing (e.g., FT-MPI, MPICH-V, etc.)
- Collectives sub-group (including Access Grid participants):
  - New framework (coll v2)
  - Using btl's
  - Striping
  - Non-blocking collectives
  - ...?

-- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
Re: [O-MPI devel] processor affinity
To add to this, I would suggest not coupling processor affinity and memory locality at the component level. At some level you do need to tie these together, but there are other components that also need to be considered, such as NIC locality, and probably other things too ...

Rich

On Jul 18, 2005, at 8:50 AM, Matt Leininger wrote:
> [message quoted in full; see "Re: [O-MPI devel] processor affinity" below]
Re: [O-MPI devel] collectives discussion @LANL
Hi,

> > > Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
> > yes, it's ok for me - I'll ask the vcc and reply to this mail if they have any problems.
> Can you do the 28th, too? I'd prefer the 27th and 28th if possible (vs. 26th and 27th) because the 26th is our first day there (Monday/25th is a travel day) and I'd like to get a bunch of other stuff out of the way before we attack collectives.

yes of course - so I make the reservation for 27th to 28th.

> As for tech assistance, contact Cindy at LANL (Ralph has her contact info -- indeed, she might proactively contact your people anyway, to set up for smooth sailing on the actual conference days).

Ralph, could you please give me Cindy's contact information?

Thanks,
Torsten
-- bash$ :(){ :|:&};: - pgp: http://www.unixer.de/htor-key.asc - Only wimps use tape backup. REAL men just upload their important stuff on ftp and let the rest of the world mirror it. (Linus Torvalds)
Re: [O-MPI devel] processor affinity
We currently have Barbara Chapman from the University of Houston as a guest scientist here at Stuttgart. Most of you might know that Barbara is working on compiler design and OpenMP issues. This afternoon she dropped in my office and asked me whether the Open MPI group has thought about/discussed processor affinity issues up to now (which we just did :-) ).

Anyway, I just wanted to point out that various people from the OpenMP community have been working/are still working on this issue, and that it might be interesting to exchange information and maybe coordinate the approaches. I cc'ed Barbara therefore also on this email...

Thanks
Edgar

Rich L. Graham wrote:
> On Jul 18, 2005, at 6:28 AM, Jeff Squyres wrote:
> > [exchange with Matt Leininger quoted; see "Re: [O-MPI devel] processor affinity" below]
> > Comments?
>
> If you set things up such that you can specify input parameters on where to put each process, you have the flexibility you want. The locality APIs I have seen all mimicked the IRIX API, which had these capabilities. If you want some ideas, look at LA-MPI; it does this - the implementation is pretty strange (just the coding), but it is there.
>
> Rich

--
==
Dr.-Ing. Edgar Gabriel
Clusters and Distributed Units
High Performance Computing Center Stuttgart (HLRS)
University of Stuttgart
Tel: +49 711 685 8039    http://www.hlrs.de/people/gabriel
Fax: +49 711 678 7626    e-mail: gabr...@hlrs.de
==
Re: [O-MPI devel] processor affinity
On Mon, 2005-07-18 at 08:28 -0400, Jeff Squyres wrote: > On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote: > > >> Generally speaking, if you launch <=N processes in a job on a node > >> (where N == number of CPUs on that node), then we set processor > >> affinity. We set each process's affinity to the CPU number according > >> to the VPID ordering of the procs in that job on that node. So if you > >> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would > >> go to processor 1, etc. (it's an easy, locally-determined ordering). > > > >You'd need to be careful with dual-core cpus. Say you launch a 4 > > task MPI job on a 4-socket dual core Opteron. You'd want to schedule > > the tasks on nodes 0, 2, 4, 6 - not 0, 1, 2, 3 to get maximum memory > > bandwidth to each MPI task. > > With the potential for non-trivial logic like this, perhaps the extra > work for a real framework would be justified, then. I agree. > > >Also, how would this work with hybrid MPI+threading (either pthreads > > or OpenMP) applications? Let's say you have an 8 or 16 cpu node and > > you > > start up 2 MPI tasks with 4 compute threads in each task. The optimum > > layout may not be running the MPI tasks on cpu's 0 and 1. Several > > hybrid applications that ran on ASC White and now Purple will have > > these > > requirements. > > Hum. Good question. The MPI API doesn't really address this -- the > MPI API is not aware of additional threads that are created until you > call an MPI function (and even then, we're not currently checking which > thread is calling -- that would just add latency). > > What do these applications do right now? Do they set their own > processor / memory affinity? This might actually be outside the scope > of MPI...? 
> (I'm not trying to shrug off responsibility, but this
> might be a case where the MPI simply doesn't have enough information,
> and to get that information [e.g., via MPI attributes or MPI info
> arguments] would be more hassle than the user just setting the affinity
> themselves...?)

We played around with setting processor affinity in our app a few years ago. It got a little ugly, but things have improved since then. I was thinking of having the app pass threading info to MPI (via info or attributes). This might be outside the scope of MPI now, but this should be the responsibility of the parallel programming language/method. Making it the app's responsibility to set processor affinity seems a bit too much of a low-level worry to put on application developers. Some discussion around what a memory/processor affinity framework should look like and be doing is a good starting point.

- Matt
Re: [O-MPI devel] Presentation of organization goals
I'd be glad to provide an overview of LANL's proposed '06 Open MPI development.

Tim

Jeff Squyres wrote:
> A month or two ago, I gave the LANL guys an overview of IU's goals with respect to OMPI. There were several "Ohhh! So *that's* why you guys were asking for X, Y, and Z..." from the LANL guys during my presentation. As such, I think it was really helpful for everyone to understand our motivations. It's not that we were previously hiding them -- we just never got around to explicitly sharing them.
>
> I think we should have similar presentations at the upcoming meeting. Can someone from each core organization (LANL, IU, UTK, HLRS) give a 10-15 minute presentation (or whatever) listing all your goals and ambitions with OMPI? Separate them into short- and long-term goals, if possible.
>
> For example, I'll talk about IU's MPI-2 one-sided efforts, our plans for PBS and SLURM, and our future efforts for fault tolerance/checkpoint restart.
>
> Thanks!
Re: [O-MPI devel] collectives discussion @LANL
On Jul 18, 2005, at 10:28 AM, Torsten Hoefler wrote:
> > Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
> yes, it's ok for me - I'll ask the vcc and reply to this mail if they have any problems.

Can you do the 28th, too? I'd prefer the 27th and 28th if possible (vs. 26th and 27th) because the 26th is our first day there (Monday/25th is a travel day) and I'd like to get a bunch of other stuff out of the way before we attack collectives.

As for tech assistance, contact Cindy at LANL (Ralph has her contact info -- indeed, she might proactively contact your people anyway, to set up for smooth sailing on the actual conference days).

-- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
Re: [O-MPI devel] collectives discussion @LANL
I am happy with either of the two dates... whichever is more convenient for LANL...

Jeff Squyres wrote:
> I'm happy with either. Europe -- comments?
> [earlier scheduling discussion snipped; see "Re: [O-MPI devel] collectives discussion @LANL" below]

--
==
Dr.-Ing. Edgar Gabriel
Clusters and Distributed Units
High Performance Computing Center Stuttgart (HLRS)
University of Stuttgart
Tel: +49 711 685 8039    http://www.hlrs.de/people/gabriel
Fax: +49 711 678 7626    e-mail: gabr...@hlrs.de
==
Re: [O-MPI devel] Presentation of organization goals
Excellent.

On Jul 18, 2005, at 10:01 AM, Ralph Castain wrote:
> I can talk about OpenRTE, if you want - pretty short, but could describe the research directions being discussed with others.
>
> On Mon, 2005-07-18 at 07:39, Jeff Squyres wrote:
> > A month or two ago, I gave the LANL guys an overview of IU's goals with respect to OMPI.
> > [remainder snipped; message quoted in full in "Re: [O-MPI devel] Presentation of organization goals" above]

-- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
Re: [O-MPI devel] collectives discussion @LANL
Hi,

> Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)

yes, it's ok for me - I'll ask the vcc and reply to this mail if they have any problems.

Bye,
Torsten
-- "Elk sausage contains no beef" ("Elchwurst enthält kein Rindfleisch") -- sign at IKEA
Re: [O-MPI devel] collectives discussion @LANL
I'm happy with either. Europe -- comments? On Jul 18, 2005, at 9:59 AM, Ralph Castain wrote: Cindy indicated she was willing to come in early - she needs about an hour to set things up. 8:30am might be more polite for her, but I suspect she'd be willing to do 8am to accommodate Europe if necessary. On Mon, 2005-07-18 at 07:48, Jeff Squyres wrote: Ok, since no one made a suggestion yet :-), I nominate the following two times: Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?) Thursday, 28 Jul, 8-11am MT (is that too early for Cindy? I seem to recall that she has to get in early to set things up...?) I think 6 hours is probably enough for this one meeting. Please propose alternate times by COB today so that we can have an upper bound on time-for-consensus. Three remote sites so far: - UTK - ? for Torsten - Stuttgart On Jul 18, 2005, at 9:28 AM, Ralph Castain wrote: > The day/time was never set that I know about. Cindy Sievers is still > holding the room, but we do need to let her know ASAP the times when > you want the system operational. > > > On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote: Did we ever set a > day/time for the collectives meeting at LANL next >> week? (we may have and I've forgotten...?) >> >> I ask because those of us involved in that meeting will need to >> reserve >> time on the Access Grid and coordinate with the remote sites for >> participation. >> >> AFAIK, we have 2 remote sites, right? >> >> - UTK, for Graham Fagg (and others?) >> - ?? for Torsten (Torsten: you mentioned where before, but I don't >> recall the name of the site) >> >> Are there others? > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
[O-MPI devel] SSIMS meeting
Jeff, I'm currently at the SciDAC2 SSIMS meeting talking about software integration, maintenance, and support funding (on the order of $2 million). I promised Rich I'd keep all informed. We've set up a IRC channel (separate email). Craig
Re: [O-MPI devel] collectives discussion @LANL
Hi, > Did we ever set a day/time for the collectives meeting at LANL next > week? (we may have and I've forgotten...?) I reserved the AccessGrid from the 25th to the 27th of July. > I ask because those of us involved in that meeting will need to reserve > time on the Access Grid and coordinate with the remote sites for > participation. yes, it should be no problem for me if the time/date does not change (we did not set a specific time as far as I remember). But I would need technical assistance - well, actually not me but the technicians from my regional video conferencing center. Is there anybody in charge of the basic technical arrangements? > AFAIK, we have 2 remote sites, right? > > - UTK, for Graham Fagg (and others?) > - ??? for Torsten (Torsten: you mentioned where before, but I don't > recall the name of the site) ??? is the Technical University of Dresden (our responsible video conferencing center - http://vcc.urz.tu-dresden.de/) Thanks, Torsten -- Dipl.-Inf. Torsten Hoefler bash$ :(){ :|:&};: address: Chemnitz University of Technology department of computer science/professorship of computer architecture Strasse der Nationen 62 | 09107 Chemnitz | Germany room: 014 | phone: +49 371 531 1660
Re: [O-MPI devel] Presentation of organization goals
I can talk about OpenRTE, if you want - pretty short, but could describe the research directions being discussed with others. On Mon, 2005-07-18 at 07:39, Jeff Squyres wrote: > A month or two ago, I gave the LANL guys an overview of IU's goals with > respect to OMPI. There were several "Ohhh! So *that's* why you guys > were asking for X, Y, and Z..." from the LANL guys during my > presentation. As such, I think it was really helpful for everyone to > understand our motivations. > > It's not that we were previously hiding them -- we just never got > around to explicitly sharing them. > > I think we should have similar presentations at the upcoming meeting. > Can someone from each core organization (LANL, IU, UTK, HLRS) give a > 10-15 minute presentation (or whatever) listing all your goals and > ambitions with OMPI? Separate them into short- and long-term goals, if > possible. > > For example, I'll talk about IU's MPI-2 one-sided efforts, our plans > for PBS and SLURM, and our future efforts for fault > tolerance/checkpoint restart. > > Thanks!
Re: [O-MPI devel] collectives discussion @LANL
Cindy indicated she was willing to come in early - she needs about an hour to set things up. 8:30am might be more polite for her, but I suspect she'd be willing to do 8am to accommodate Europe if necessary. On Mon, 2005-07-18 at 07:48, Jeff Squyres wrote: > Ok, since no one made a suggestion yet :-), I nominate the following > two times: > > Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?) > Thursday, 28 Jul, 8-11am MT > > (is that too early for Cindy? I seem to recall that she has to get in > early to set things up...?) > > I think 6 hours is probably enough for this one meeting. Please > propose alternate times by COB today so that we can have an upper bound > on time-for-consensus. > > Three remote sites so far: > > - UTK > - ? for Torsten > - Stuttgart > > > > On Jul 18, 2005, at 9:28 AM, Ralph Castain wrote: > > > The day/time was never set that I know about. Cindy Sievers is still > > holding the room, but we do need to let her know ASAP the times when > > you want the system operational. > > > > > > On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote:Did we ever set a > > day/time for the collectives meeting at LANL next > >> week? (we may have and I've forgotten...?) > >> > >> I ask because those of us involved in that meeting will need to > >> reserve > >> time on the Access Grid and coordinate with the remote sites for > >> participation. > >> > >> AFAIK, we have 2 remote sites, right? > >> > >> - UTK, for Graham Fagg (and others?) > >> - ?? for Torsten (Torsten: you mentioned where before, but I don't > >> recall the name of the site) > >> > >> Are there others? > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [O-MPI devel] collectives discussion @LANL
Please make sure that foreign nationals have the building this is going to take place in (presumably, ACL-west) on their paperwork, or else they can't participate. Rich On Jul 18, 2005, at 6:59 AM, Jeff Squyres wrote: Did we ever set a day/time for the collectives meeting at LANL next week? (we may have and I've forgotten...?) I ask because those of us involved in that meeting will need to reserve time on the Access Grid and coordinate with the remote sites for participation. AFAIK, we have 2 remote sites, right? - UTK, for Graham Fagg (and others?) - ?? for Torsten (Torsten: you mentioned where before, but I don't recall the name of the site) Are there others? -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/ ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [O-MPI devel] collectives discussion @LANL
Ok, since no one made a suggestion yet :-), I nominate the following two times: Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?) Thursday, 28 Jul, 8-11am MT (is that too early for Cindy? I seem to recall that she has to get in early to set things up...?) I think 6 hours is probably enough for this one meeting. Please propose alternate times by COB today so that we can have an upper bound on time-for-consensus. Three remote sites so far: - UTK - ? for Torsten - Stuttgart On Jul 18, 2005, at 9:28 AM, Ralph Castain wrote: The day/time was never set that I know about. Cindy Sievers is still holding the room, but we do need to let her know ASAP the times when you want the system operational. On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote: Did we ever set a day/time for the collectives meeting at LANL next week? (we may have and I've forgotten...?) I ask because those of us involved in that meeting will need to reserve time on the Access Grid and coordinate with the remote sites for participation. AFAIK, we have 2 remote sites, right? - UTK, for Graham Fagg (and others?) - ?? for Torsten (Torsten: you mentioned where before, but I don't recall the name of the site) Are there others? ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
[O-MPI devel] Presentation of organization goals
A month or two ago, I gave the LANL guys an overview of IU's goals with respect to OMPI. There were several "Ohhh! So *that's* why you guys were asking for X, Y, and Z..." from the LANL guys during my presentation. As such, I think it was really helpful for everyone to understand our motivations. It's not that we were previously hiding them -- we just never got around to explicitly sharing them. I think we should have similar presentations at the upcoming meeting. Can someone from each core organization (LANL, IU, UTK, HLRS) give a 10-15 minute presentation (or whatever) listing all your goals and ambitions with OMPI? Separate them into short- and long-term goals, if possible. For example, I'll talk about IU's MPI-2 one-sided efforts, our plans for PBS and SLURM, and our future efforts for fault tolerance/checkpoint restart. Thanks! -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
Re: [O-MPI devel] collectives discussion @LANL
The day/time was never set that I know about. Cindy Sievers is still holding the room, but we do need to let her know ASAP the times when you want the system operational. On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote: > Did we ever set a day/time for the collectives meeting at LANL next > week? (we may have and I've forgotten...?) > > I ask because those of us involved in that meeting will need to reserve > time on the Access Grid and coordinate with the remote sites for > participation. > > AFAIK, we have 2 remote sites, right? > > - UTK, for Graham Fagg (and others?) > - ?? for Torsten (Torsten: you mentioned where before, but I don't > recall the name of the site) > > Are there others?
Re: [O-MPI devel] collectives discussion @LANL
don't forget Stuttgart in the list of remote sites :-). Rainer will be at the meeting, but I can only attend through Access Grid. Thanks Edgar Jeff Squyres wrote: Did we ever set a day/time for the collectives meeting at LANL next week? (we may have and I've forgotten...?) I ask because those of us involved in that meeting will need to reserve time on the Access Grid and coordinate with the remote sites for participation. AFAIK, we have 2 remote sites, right? - UTK, for Graham Fagg (and others?) - ?? for Torsten (Torsten: you mentioned where before, but I don't recall the name of the site) Are there others? -- == Dr.-Ing. Edgar Gabriel Clusters and Distributed Units High Performance Computing Center Stuttgart (HLRS) University of Stuttgart Tel: +49 711 685 8039  http://www.hlrs.de/people/gabriel Fax: +49 711 678 7626  e-mail: gabr...@hlrs.de ==
Re: [O-MPI devel] processor affinity
Did a little digging into this last night, and finally figured out what you were getting at in your comments here. Yeah, I think an "affinity" framework would definitely be the best approach - can handle both cpu and memory, I imagine. Isn't clear how pressing that is as it is mostly an optimization issue, but you're welcome to create the framework if you like. On Sun, 2005-07-17 at 09:13, Jeff Squyres wrote: > It needs to be done in the launched process itself. So we'd either > have to extend rmaps (from my understanding of rmaps, that doesn't seem > like a good idea), or do something different. > > Perhaps the easiest thing to do is to add this to the LANL meeting > agenda...? Then we can have a whiteboard to discuss. :-) > > > > On Jul 17, 2005, at 10:26 AM, Ralph Castain wrote: > > > Wouldn't it belong in the rmaps framework? That's where we tell the > > launcher where to put each process - seems like a natural fit. > > > > > > On Jul 17, 2005, at 6:45 AM, Jeff Squyres wrote: > > > >> I'm thinking that we should add some processor affinity code to OMPI > >> -- > >> possibly in the orte layer (ORTE is the interface to the back-end > >> launcher, after all). This will really help on systems like opterons > >> (and others) to prevent processes from bouncing between processors, > >> and > >> potentially getting located far from "their" RAM. > >> > >> This has the potential to help even micro-benchmark results (e.g., > >> ping-pong). It's going to be quite relevant for my shared memory > >> collective work on mauve. > >> > >> > >> General scheme: > >> --- > >> > >> I think that somewhere in ORTE, we should actively set processor > >> affinity when: > >>- Supported by the OS > >>- Not disabled by the user (via MCA param) > >>- The node is not over-subscribed with processes from this job > >> > >> Generally speaking, if you launch <=N processes in a job on a node > >> (where N == number of CPUs on that node), then we set processor > >> affinity. 
We set each process's affinity to the CPU number according > >> to the VPID ordering of the procs in that job on that node. So if you > >> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would > >> go to processor 1, etc. (it's an easy, locally-determined ordering). > >> > >> Someday, we might want to make this scheme universe-aware (i.e., see > >> if > >> any other ORTE jobs are running on that node, and not schedule on any > >> processors that are already claimed by the processes on that(those) > >> job(s)), but I think single-job awareness is sufficient for the > >> moment. > >> > >> > >> Implementation: > >> --- > >> > >> We'll need relevant configure tests to figure out if the target system > >> has CPU affinity system calls. Those are simple to add. > >> > >> We could use simply #if statements for the affinity stuff or make it a > >> real framework. Since it's only 1 function call to set the affinity, > >> I > >> tend to lean towards the [simpler] #if solution, but could probably be > >> pretty easily convinced that a framework is the Right solution. I'm > >> on > >> the fence (and if someone convinces me, I'd volunteer for the extra > >> work to setup the framework). > >> > >> I'm not super-familiar with the processor-affinity stuff (e.g., for > >> best effect, should it be done after the fork and before the exec?), > >> so > >> I'm not sure exactly where this would go in ORTE. Potentially either > >> before new processes are exec'd (where we only have control of that in > >> some kinds of systems, like rsh/ssh) or right up very very near the > >> top > >> of orte_init(). > >> > >> Comments? > >> > >> -- > >> {+} Jeff Squyres > >> {+} The Open MPI Project > >> {+} http://www.open-mpi.org/ > >> > >> ___ > >> devel mailing list > >> de...@open-mpi.org > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > >> > > > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > >
Re: [O-MPI devel] processor affinity
On Jul 18, 2005, at 6:28 AM, Jeff Squyres wrote: On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote: Generally speaking, if you launch <=N processes in a job on a node (where N == number of CPUs on that node), then we set processor affinity. We set each process's affinity to the CPU number according to the VPID ordering of the procs in that job on that node. So if you launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would go to processor 1, etc. (it's an easy, locally-determined ordering). You'd need to be careful with dual-core CPUs. Say you launch a 4 task MPI job on a 4-socket dual core Opteron. You'd want to schedule the tasks on cores 0, 2, 4, 6 - not 0, 1, 2, 3 - to get maximum memory bandwidth to each MPI task. With the potential for non-trivial logic like this, perhaps the extra work for a real framework would be justified, then. Also, how would this work with hybrid MPI+threading (either pthreads or OpenMP) applications? Let's say you have an 8 or 16 cpu node and you start up 2 MPI tasks with 4 compute threads in each task. The optimum layout may not be running the MPI tasks on CPUs 0 and 1. Several hybrid applications that ran on ASC White and now Purple will have these requirements. Hum. Good question. The MPI API doesn't really address this -- the MPI API is not aware of additional threads that are created until you call an MPI function (and even then, we're not currently checking which thread is calling -- that would just add latency). What do these applications do right now? Do they set their own processor / memory affinity? This might actually be outside the scope of MPI...? (I'm not trying to shrug off responsibility, but this might be a case where the MPI simply doesn't have enough information, and to get that information [e.g., via MPI attributes or MPI info arguments] would be more hassle than the user just setting the affinity themselves...?) Comments?
If you set things up such that you can specify input parameters on where to put each process, you have the flexibility you want. The locality APIs I have seen all mimicked the IRIX API, which had these capabilities. If you want some ideas, look at LA-MPI, it does this - the implementation is pretty strange (just the coding), but it is there. Rich -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/ ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
[O-MPI devel] collectives discussion @LANL
Did we ever set a day/time for the collectives meeting at LANL next week? (we may have and I've forgotten...?) I ask because those of us involved in that meeting will need to reserve time on the Access Grid and coordinate with the remote sites for participation. AFAIK, we have 2 remote sites, right? - UTK, for Graham Fagg (and others?) - ?? for Torsten (Torsten: you mentioned where before, but I don't recall the name of the site) Are there others? -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/