Re: [O-MPI devel] FYI: Failing intel tests

2005-07-18 Thread Tim S. Woodall

Josh Hursey wrote:

So I have been working on a variety of platforms [x86 32-bit, x86 64-bit,
PPC 64-bit]. As of SVN revision 6541, below are the current failures
from the intel_tests suite. For all of these tests I used the "tcp,self"
PTLs and the TEG PML.


On x86-32, x86-64, PPC64
-
MPI_Send_self_f
MPI_Send_self_c
MPI_Send_init_self_c

 



This will be true for all platforms w/ the ptl code, as it does no
buffering. The intel tests are incorrect in that they assume the
implementation provides some degree of buffering (blocking sends are
called before the matching receive is posted).

The btl code will buffer up to a configurable eager limit, which I've
set by default to be large enough to pass the intel tests.

Tim
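
(For illustration, the pattern these self-send tests rely on looks
roughly like the minimal sketch below -- not taken from the intel
suite. A blocking send to self can only complete if the library
buffers the message eagerly, as the btl path does up to its eager
limit.)

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal illustrative sketch (not from the intel suite): a
     * blocking send to self before the matching receive is posted.
     * With eager buffering this completes; with no buffering (the
     * ptl path) it deadlocks in MPI_Send. */
    int main(int argc, char **argv)
    {
        int rank, sbuf = 42, rbuf = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Send(&sbuf, 1, MPI_INT, rank, 0, MPI_COMM_WORLD); /* may block forever */
        MPI_Recv(&rbuf, 1, MPI_INT, rank, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        printf("received %d\n", rbuf);
        MPI_Finalize();
        return 0;
    }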



[O-MPI devel] FYI: Failing intel tests

2005-07-18 Thread Josh Hursey
So I have been working on a variety of platforms [x86 32-bit, x86 64-bit,
PPC 64-bit]. As of SVN revision 6541, below are the current failures
from the intel_tests suite. For all of these tests I used the "tcp,self"
PTLs and the TEG PML.


On x86-32, x86-64, PPC64
-
MPI_Send_self_f
MPI_Send_self_c
MPI_Send_init_self_c

On x86-64, PPC64
-
MPI_Allreduce_loc_c
MPI_Reduce_loc_c
MPI_Reduce_scatter_loc_c
MPI_Scan_loc_c
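
(For reference, the *_loc_c tests presumably exercise reductions with
the MPI_MINLOC/MPI_MAXLOC operations on pair types; a minimal sketch
of that pattern, not taken from the suite itself, is below.)

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal illustrative sketch of a MAXLOC reduction on the
     * MPI_DOUBLE_INT pair type -- the kind of operation the *_loc_c
     * tests presumably cover. */
    int main(int argc, char **argv)
    {
        struct { double value; int rank; } in, out;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        in.value = (double) rank;  /* each rank contributes its rank */
        in.rank  = rank;

        MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            printf("max %f found on rank %d\n", out.value, out.rank);
        MPI_Finalize();
        return 0;
    }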



Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/



Re: [O-MPI devel] OMPI_MCA_ptl_base_exclude=sm

2005-07-18 Thread George Bosilca
There are several possibilities to make this option the default one.  
The simplest one is to use the mca-params.conf file. In your home  
directory create a .openmpi directory and inside create a file called  
mca-params.conf. You can add to this file all the options that you want
to be the default behavior of your OMPI.


Here is mine:
component_show_load_errors=0
ptl_tcp_if_include=eth0
ptl_tcp_if_exclude=lo
ptl_base_include=tcp,self
pml=uniq
mpi_yield_when_idle = 0
mpi_event_tick_rate=0

Notice that you don't have to add the OMPI_MCA_ prefix at the beginning.
You can add whatever options you want; just pick the names as printed by
the ompi_info tool.
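
(For example, the setting from this thread's subject line --
OMPI_MCA_ptl_base_exclude=sm in the environment -- would simply become
the line ptl_base_exclude=sm in mca-params.conf.)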


  george.


On Jul 18, 2005, at 2:49 PM, Greg Watson wrote:


Currently I have to set this environment variable on bproc/x86_64 or
mpirun just hangs. Would it be possible to make this the default
setting until it's fixed?

Greg





Re: [O-MPI devel] July meeting

2005-07-18 Thread Jeff Squyres

Added.

I've now got:

- ORTE sub group:
  - simplifying GPR access (if possible)
  - multi-cell RTE
  - external perspective from Eclipse guys

Since you and Ralph are not core OMPI developers and not the focus of 
the rest of the meeting, let's coordinate offline and come up with a 
time (and location?) for this.




On Jul 18, 2005, at 2:58 PM, Greg Watson wrote:


Would you be interested in an external tool perspective on orte?

Greg

On Jul 18, 2005, at 9:34 AM, Jeff Squyres wrote:


Here are the topics that I have for the July meeting (did I miss
anything?).  I don't really have a schedule -- only some things need to
be assigned times (e.g., the collectives meeting(s)).

- All: Current status, future plans/goals (presentation from each
   organization)
   - Jeff: IU
   - Tim: LANL
   - Ralph: ORTE
   - Rainer: HLRS
   - George: UTK
- All, Tim: overview of PML/BTL interface and design
   - Retire teg/PTL?
- All: SC, Euro PVM/MPI, LACSI planning
- All, Brian: what you need to know about the new configure.m4 system
- All, Jeff: build system changes
- All: discussion of others coming into the tree (e.g., IB vendors)
- All: processor/memory affinity
- All: project split issues
   - tools in wrong trees (abstraction issues)
   - wrapper compilers for ORTE/OPAL?
   - #include files
   - any final cleanup moving between opal/orte/ompi trees
- All: MCA issues
   - Framework hiding / re-entrance (e.g., ompi_info)
   - How to have multiple PML's without re-initializing BTLs/PTLs?
   - Better guidelines for framework/component "open" calls
   - Separate function for framework/component MCA parameter
registration?
   - Frameworks as DSOs

- Jeff, Ralph: simplifying GPR access (if possible)
- Red Storm sub-group: current status and future plans
- One-sided sub-group: current status, SC plans, future plans
- Fault tolerance sub-group:
   - Integrating LAM-style coordinated, synchronous checkpointing
   - How to do other kinds of checkpointing (e.g., FT-MPI, MPICH-V,
etc.)
- Collectives sub-group (including Access Grid participants):
   - New framework (coll v2)
   - Using btl's
   - Striping
   - Non-blocking collectives
   - ...?

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/




--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] July meeting

2005-07-18 Thread Jeff Squyres

On Jul 18, 2005, at 2:56 PM, Ralph Castain wrote:


 1. RedStorm design - what are we going to do about the RTE?


That's what I meant by "future plans", but I could certainly be more 
explicit.  :-)


 2. Multi-cell RTE - I've been working on this (finally set it aside 
to complete the scalable startup stuff first), but it is complex and 
might merit some discussion with those interested.


Shirley.  Added.

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] July meeting

2005-07-18 Thread Greg Watson

Would you be interested in an external tool perspective on orte?

Greg

On Jul 18, 2005, at 9:34 AM, Jeff Squyres wrote:


Here are the topics that I have for the July meeting (did I miss
anything?).  I don't really have a schedule -- only some things need to
be assigned times (e.g., the collectives meeting(s)).

- All: Current status, future plans/goals (presentation from each
   organization)
   - Jeff: IU
   - Tim: LANL
   - Ralph: ORTE
   - Rainer: HLRS
   - George: UTK
- All, Tim: overview of PML/BTL interface and design
   - Retire teg/PTL?
- All: SC, Euro PVM/MPI, LACSI planning
- All, Brian: what you need to know about the new configure.m4 system
- All, Jeff: build system changes
- All: discussion of others coming into the tree (e.g., IB vendors)
- All: processor/memory affinity
- All: project split issues
   - tools in wrong trees (abstraction issues)
   - wrapper compilers for ORTE/OPAL?
   - #include files
   - any final cleanup moving between opal/orte/ompi trees
- All: MCA issues
   - Framework hiding / re-entrance (e.g., ompi_info)
   - How to have multiple PML's without re-initializing BTLs/PTLs?
   - Better guidelines for framework/component "open" calls
   - Separate function for framework/component MCA parameter
registration?
   - Frameworks as DSOs

- Jeff, Ralph: simplifying GPR access (if possible)
- Red Storm sub-group: current status and future plans
- One-sided sub-group: current status, SC plans, future plans
- Fault tolerance sub-group:
   - Integrating LAM-style coordinated, synchronous checkpointing
   - How to do other kinds of checkpointing (e.g., FT-MPI, MPICH-V,  
etc.)

- Collectives sub-group (including Access Grid participants):
   - New framework (coll v2)
   - Using btl's
   - Striping
   - Non-blocking collectives
   - ...?

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/






Re: [O-MPI devel] July meeting

2005-07-18 Thread Ralph Castain
Yo Jeff

Couple of things you might want to add:

1. RedStorm design - what are we going to do about the RTE?

2. Multi-cell RTE - I've been working on this (finally set it aside to
complete the scalable startup stuff first), but it is complex and might
merit some discussion with those interested.


On Mon, 2005-07-18 at 09:34, Jeff Squyres wrote:

> Here are the topics that I have for the July meeting (did I miss 
> anything?).  I don't really have a schedule -- only some things need to 
> be assigned times (e.g., the collectives meeting(s)).
> 
> - All: Current status, future plans/goals (presentation from each
>organization)
>- Jeff: IU
>- Tim: LANL
>- Ralph: ORTE
>- Rainer: HLRS
>- George: UTK
> - All, Tim: overview of PML/BTL interface and design
>- Retire teg/PTL?
> - All: SC, Euro PVM/MPI, LACSI planning
> - All, Brian: what you need to know about the new configure.m4 system
> - All, Jeff: build system changes
> - All: discussion of others coming into the tree (e.g., IB vendors)
> - All: processor/memory affinity
> - All: project split issues
>- tools in wrong trees (abstraction issues)
>- wrapper compilers for ORTE/OPAL?
>- #include files
>- any final cleanup moving between opal/orte/ompi trees
> - All: MCA issues
>- Framework hiding / re-entrance (e.g., ompi_info)
>- How to have multiple PML's without re-initializing BTLs/PTLs?
>- Better guidelines for framework/component "open" calls
>- Separate function for framework/component MCA parameter 
> registration?
>- Frameworks as DSOs
> 
> - Jeff, Ralph: simplifying GPR access (if possible)
> - Red Storm sub-group: current status and future plans
> - One-sided sub-group: current status, SC plans, future plans
> - Fault tolerance sub-group:
>- Integrating LAM-style coordinated, synchronous checkpointing
>- How to do other kinds of checkpointing (e.g., FT-MPI, MPICH-V, etc.)
> - Collectives sub-group (including Access Grid participants):
>- New framework (coll v2)
>- Using btl's
>- Striping
>- Non-blocking collectives
>- ...?


[O-MPI devel] OMPI_MCA_ptl_base_exclude=sm

2005-07-18 Thread Greg Watson
Currently I have to set this environment variable on bproc/x86_64 or  
mpirun just hangs. Would it be possible to make this the default  
setting until it's fixed?


Greg


Re: [O-MPI devel] processor affinity

2005-07-18 Thread bmchap...@earthlink.net

Thanks to Edgar for copying me on this: it sounds like I asked at the right
time. There were some OpenMP-internal discussions on this a couple of years
ago, and a straw proposal, but at that time there was no obvious interest
on the MPI side. So we'd have to go back and look at the ideas that came up
at the time.
Barbara 

> [Original Message]
> From: Jeff Squyres 
> To: Open MPI Developers 
> Cc: Barbara Chapman 
> Date: 7/18/2005 10:44:15 AM
> Subject: Re: [O-MPI devel] processor affinity
>
> Excellent.  Seems like several people have thought of this at the same 
> time (I was pinged about this by the IB vendors).
>
> I know that others on the team have more experience in this area than I 
> do, so I personally welcome all information.  I've read a few papers on 
> the topic (general/simplified consensus: memory and process affinity is 
> good), but would appreciate any pointers to more material.
>
> After the theory, however, we need to decide on an implementation 
> strategy.  As Rich mentioned, we can do this all via MCA parameters, or 
> perhaps via MPI_Info or MPI attributes.  Although I'm not sure how much 
> of this can be set ahead of time and what needs to be done on a 
> per-thread basis, I'm assuming that each thread will need to make some 
> kind of function call (if MPI is going to handle it, then it will need 
> to be an MPI function call that triggers some magic under the covers).
>
> Any advice here from the Open MP community would also be appreciated...
>
>
> On Jul 18, 2005, at 11:08 AM, Edgar Gabriel wrote:
>
> > We currently have Barbara Chapman from the University of Houston as a
> > guest scientist here at Stuttgart. Most of you might know that Barbara
> > works on compiler design and OpenMP issues. This afternoon she
> > dropped in my office and asked me whether the Open MPI group has
> > thought about/discussed processor affinity issues up to now (which we
> > just did :-) ).
> >
> > Anyway, I just wanted to point out that various people in the OpenMP
> > community have been working/are still working on this issue, and that it
> > might be interesting to exchange information and maybe coordinate the
> > approaches. I have therefore also cc'ed Barbara on this email...
>
> -- 
> {+} Jeff Squyres
> {+} The Open MPI Project
> {+} http://www.open-mpi.org/




Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Tim S. Woodall

Torsten Hoefler wrote:


> Hi,
>
>> Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
>
> yes, it's ok for me - I'll ask the vcc and reply to this mail if they
> have any problems.
>
>> Can you do the 28th, too?  I'd prefer the 27th and 28th if possible
>> (vs. 26th and 27th) because the 26th is our first day there
>> (Monday/25th is a travel day) and I'd like to get a bunch of other
>> stuff out of the way before we attack collectives.
>
> yes of course - so I'll make the reservation for the 27th and 28th.
>
>> As for tech assistance, contact Cindy at LANL (Ralph has her contact
>> info -- indeed, she might proactively contact your people anyway, to
>> set up for smooth sailing on the actual conference days).
>
> Ralph, could you please give me Cindy's contact information?

I'll confirm these dates w/ Cindy and provide her your contact info.

Thanks,
Tim



Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Ralph Castain
Tim is taking care of this arrangement. I believe (from prior chat with
Cindy) that she may already be familiar with her equivalent at your
location.


> > As for tech assistance, contact Cindy at LANL (Ralph has her contact 
> > info -- indeed, she might proactively contact your people anyway, to 
> > setup for smooth sailing on the actual conference days).
> Ralph, could you please give me Cindy's contact information? 
> 
> Thanks,
>Torsten


Re: [O-MPI devel] processor affinity

2005-07-18 Thread Jeff Squyres
Excellent.  Seems like several people have thought of this at the same 
time (I was pinged about this by the IB vendors).


I know that others on the team have more experience in this area than I 
do, so I personally welcome all information.  I've read a few papers on 
the topic (general/simplified consensus: memory and process affinity is 
good), but would appreciate any pointers to more material.


After the theory, however, we need to decide on an implementation 
strategy.  As Rich mentioned, we can do this all via MCA parameters, or 
perhaps via MPI_Info or MPI attributes.  Although I'm not sure how much 
of this can be set ahead of time and what needs to be done on a 
per-thread basis, I'm assuming that each thread will need to make some 
kind of function call (if MPI is going to handle it, then it will need 
to be an MPI function call that triggers some magic under the covers).


Any advice here from the Open MP community would also be appreciated...


On Jul 18, 2005, at 11:08 AM, Edgar Gabriel wrote:


We currently have Barbara Chapman from the University of Houston as a
guest scientist here at Stuttgart. Most of you might know that Barbara
works on compiler design and OpenMP issues. This afternoon she
dropped in my office and asked me whether the Open MPI group has
thought about/discussed processor affinity issues up to now (which we
just did :-) ).

Anyway, I just wanted to point out that various people in the OpenMP
community have been working/are still working on this issue, and that it
might be interesting to exchange information and maybe coordinate the
approaches. I have therefore also cc'ed Barbara on this email...


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
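
(A minimal sketch of what such a per-thread call might look like on
Linux, assuming pthread_setaffinity_np is available -- illustrative
only, not a proposed OMPI interface. base_core and thread_id are
hypothetical inputs: the first core assigned to this MPI task and the
thread's index within it.)

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Hypothetical per-thread binding for a hybrid MPI+threads app:
     * each compute thread pins itself to its own core. */
    static int bind_thread_to_core(int base_core, int thread_id)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(base_core + thread_id, &mask);
        return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
    }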



[O-MPI devel] July meeting

2005-07-18 Thread Jeff Squyres
Here are the topics that I have for the July meeting (did I miss 
anything?).  I don't really have a schedule -- only some things need to 
be assigned times (e.g., the collectives meeting(s)).


- All: Current status, future plans/goals (presentation from each
  organization)
  - Jeff: IU
  - Tim: LANL
  - Ralph: ORTE
  - Rainer: HLRS
  - George: UTK
- All, Tim: overview of PML/BTL interface and design
  - Retire teg/PTL?
- All: SC, Euro PVM/MPI, LACSI planning
- All, Brian: what you need to know about the new configure.m4 system
- All, Jeff: build system changes
- All: discussion of others coming into the tree (e.g., IB vendors)
- All: processor/memory affinity
- All: project split issues
  - tools in wrong trees (abstraction issues)
  - wrapper compilers for ORTE/OPAL?
  - #include files
  - any final cleanup moving between opal/orte/ompi trees
- All: MCA issues
  - Framework hiding / re-entrance (e.g., ompi_info)
  - How to have multiple PML's without re-initializing BTLs/PTLs?
  - Better guidelines for framework/component "open" calls
  - Separate function for framework/component MCA parameter 
registration?

  - Frameworks as DSOs

- Jeff, Ralph: simplifying GPR access (if possible)
- Red Storm sub-group: current status and future plans
- One-sided sub-group: current status, SC plans, future plans
- Fault tolerance sub-group:
  - Integrating LAM-style coordinated, synchronous checkpointing
  - How to do other kinds of checkpointing (e.g., FT-MPI, MPICH-V, etc.)
- Collectives sub-group (including Access Grid participants):
  - New framework (coll v2)
  - Using btl's
  - Striping
  - Non-blocking collectives
  - ...?

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] processor affinity

2005-07-18 Thread Richard L. Graham

To add to this, I would suggest not coupling processor affinity and
memory locality at the component level.  At some level you do need
to tie these together, but there are other components that also need
to be considered, such as NIC locality, and probably other things
too ...

Rich

On Jul 18, 2005, at 8:50 AM, Matt Leininger wrote:


On Mon, 2005-07-18 at 08:28 -0400, Jeff Squyres wrote:

On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote:


Generally speaking, if you launch <=N processes in a job on a node
(where N == number of CPUs on that node), then we set processor
affinity.  We set each process's affinity to the CPU number according
to the VPID ordering of the procs in that job on that node.  So if you
launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
go to processor 1, etc. (it's an easy, locally-determined ordering).


   You'd need to be careful with dual-core cpus.  Say you launch a 4
task MPI job on a 4-socket dual core Opteron.  You'd want to schedule
the tasks on nodes 0, 2, 4, 6 - not 0, 1, 2, 3 to get maximum memory
bandwidth to each MPI task.


With the potential for non-trivial logic like this, perhaps the extra
work for a real framework would be justified, then.


   I agree.


   Also, how would this work with hybrid MPI+threading (either pthreads
or OpenMP) applications?  Let's say you have an 8 or 16 cpu node and you
start up 2 MPI tasks with 4 compute threads in each task.  The optimum
layout may not be running the MPI tasks on cpu's 0 and 1.  Several
hybrid applications that ran on ASC White and now Purple will have
these requirements.


Hum.  Good question.  The MPI API doesn't really address this -- the
MPI API is not aware of additional threads that are created until you
call an MPI function (and even then, we're not currently checking which
thread is calling -- that would just add latency).

What do these applications do right now?  Do they set their own
processor / memory affinity?  This might actually be outside the scope
of MPI...?  (I'm not trying to shrug off responsibility, but this
might be a case where the MPI simply doesn't have enough information,
and to get that information [e.g., via MPI attributes or MPI info
arguments] would be more hassle than the user just setting the affinity
themselves...?)


  We played around with setting processor affinity in our app a few
years ago.  It got a little ugly, but things have improved since then.

  I was thinking of having the app pass threading info to MPI (via info
or attributes).  This might be outside the scope of MPI now, but this
should be the responsibility of the parallel programming
language/method.  Making it the app's responsibility to set processor
affinity seems a bit too much of a low-level worry to put on application
developers.


   Some discussions around what a memory/processor affinity framework
should look like and be doing is a good starting point.

  - Matt






Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Torsten Hoefler
Hi,
> >> Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
> > yes, it's ok for me - I'll ask the vcc and reply to this mail if they
> > have any problems.
> 
> Can you do the 28th, too?  I'd prefer the 27th and 28th if possible 
> (vs. 26th and 27th) because the 26th is our first day there 
> (Monday/25th is a travel day) and I'd like to get a bunch of other 
> stuff out of the way before we attack collectives.
yes of course - so I'll make the reservation for the 27th and 28th.

> As for tech assistance, contact Cindy at LANL (Ralph has her contact 
> info -- indeed, she might proactively contact your people anyway, to 
> setup for smooth sailing on the actual conference days).
Ralph, could you please give me Cindy's contact information? 

Thanks,
   Torsten

-- 
 bash$ :(){ :|:&};: - pgp: http://www.unixer.de/htor-key.asc -
Only wimps use tape backup. REAL men just upload their important stuff
on ftp and let the rest of the world mirror it.
   (Linus Torvalds)



Re: [O-MPI devel] processor affinity

2005-07-18 Thread Edgar Gabriel
We currently have Barbara Chapman from the University of Houston as a
guest scientist here at Stuttgart. Most of you might know that Barbara
works on compiler design and OpenMP issues. This afternoon she
dropped in my office and asked me whether the Open MPI group has
thought about/discussed processor affinity issues up to now (which we
just did :-) ).

Anyway, I just wanted to point out that various people in the OpenMP
community have been working/are still working on this issue, and that it
might be interesting to exchange information and maybe coordinate the
approaches. I have therefore also cc'ed Barbara on this email...


Thanks
Edgar

Rich L. Graham wrote:

On Jul 18, 2005, at 6:28 AM, Jeff Squyres wrote:



On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote:



Generally speaking, if you launch <=N processes in a job on a node
(where N == number of CPUs on that node), then we set processor
affinity.  We set each process's affinity to the CPU number according
to the VPID ordering of the procs in that job on that node.  So if you
launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
go to processor 1, etc. (it's an easy, locally-determined ordering).


  You'd need to be careful with dual-core cpus.  Say you launch a 4
task MPI job on a 4-socket dual core Opteron.  You'd want to schedule
the tasks on nodes 0, 2, 4, 6 - not 0, 1, 2, 3 to get maximum memory
bandwidth to each MPI task.


With the potential for non-trivial logic like this, perhaps the extra
work for a real framework would be justified, then.


  Also, how would this work with hybrid MPI+threading (either pthreads
or OpenMP) applications?  Let's say you have an 8 or 16 cpu node and you
start up 2 MPI tasks with 4 compute threads in each task.  The optimum
layout may not be running the MPI tasks on cpu's 0 and 1.  Several
hybrid applications that ran on ASC White and now Purple will have
these requirements.


Hum.  Good question.  The MPI API doesn't really address this -- the
MPI API is not aware of additional threads that are created until you
call an MPI function (and even then, we're not currently checking which
thread is calling -- that would just add latency).

What do these applications do right now?  Do they set their own
processor / memory affinity?  This might actually be outside the scope
of MPI...?  (I'm not trying to shrug off responsibility, but this
might be a case where the MPI simply doesn't have enough information,
and to get that information [e.g., via MPI attributes or MPI info
arguments] would be more hassle than the user just setting the affinity
themselves...?)

Comments?



If you set things up such that you can specify input parameters on where
to put each process, you have the flexibility you want.  The locality APIs
I have seen all mimicked the IRIX API, which had these capabilities.  If you
want some ideas, look at LA-MPI; it does this - the implementation is
pretty strange (just the coding), but it is there.

Rich



--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



--
==
Dr.-Ing. Edgar Gabriel
Clusters and Distributed Units
High Performance Computing Center Stuttgart (HLRS)
University of Stuttgart
Tel: +49 711 685 8039     http://www.hlrs.de/people/gabriel
Fax: +49 711 678 7626     e-mail: gabr...@hlrs.de
==


Re: [O-MPI devel] processor affinity

2005-07-18 Thread Matt Leininger
On Mon, 2005-07-18 at 08:28 -0400, Jeff Squyres wrote:
> On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote:
> 
> >> Generally speaking, if you launch <=N processes in a job on a node
> >> (where N == number of CPUs on that node), then we set processor
> >> affinity.  We set each process's affinity to the CPU number according
> >> to the VPID ordering of the procs in that job on that node.  So if you
> >> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
> >> go to processor 1, etc. (it's an easy, locally-determined ordering).
> >
> >You'd need to be careful with dual-core cpus.  Say you launch a 4
> > task MPI job on a 4-socket dual core Opteron.  You'd want to schedule
> > the tasks on nodes 0, 2, 4, 6 - not 0, 1, 2, 3 to get maximum memory
> > bandwidth to each MPI task.
> 
> With the potential for non-trivial logic like this, perhaps the extra 
> work for a real framework would be justified, then.

   I agree.
> 
> >Also, how would this work with hybrid MPI+threading (either pthreads
> > or OpenMP) applications?  Let's say you have an 8 or 16 cpu node and 
> > you
> > start up 2 MPI tasks with 4 compute threads in each task.  The optimum
> > layout may not be running the MPI tasks on cpu's 0 and 1.  Several
> > hybrid applications that ran on ASC White and now Purple will have 
> > these
> > requirements.
> 
> Hum.  Good question.  The MPI API doesn't really address this -- the 
> MPI API is not aware of additional threads that are created until you 
> call an MPI function (and even then, we're not currently checking which 
> thread is calling -- that would just add latency).
> 
> What do these applications do right now?  Do they set their own 
> processor / memory affinity?  This might actually be outside the scope 
> of MPI...?  (I'm not trying to shrug off responsibility, but this 
> might be a case where the MPI simply doesn't have enough information, 
> and to get that information [e.g., via MPI attributes or MPI info 
> arguments] would be more hassle than the user just setting the affinity 
> themselves...?)

  We played around with setting processor affinity in our app a few
years ago.  It got a little ugly, but things have improved since then.

  I was thinking of having the app pass threading info to MPI (via info
or attributes).  This might be outside the scope of MPI now, but this
should be the responsibility of the parallel programming
language/method.  Making it the app's responsibility to set processor
affinity seems a bit too much of a low-level worry to put on application
developers.  


   Some discussions around what a memory/processor affinity framework
should look like and be doing is a good starting point.

  - Matt
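
(A concrete illustration of the dual-core concern above: striding
local ranks across sockets so each task gets a full socket's memory
bandwidth. local_rank_to_core and cores_per_socket are hypothetical
names, assuming cores are numbered consecutively within a socket.)

    /* Hypothetical mapping: on a 4-socket dual-core Opteron, 4 tasks
     * land on cores 0, 2, 4, 6 rather than 0, 1, 2, 3. */
    static int local_rank_to_core(int local_rank, int cores_per_socket)
    {
        return local_rank * cores_per_socket;  /* e.g. rank 1 -> core 2 */
    }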




Re: [O-MPI devel] Presentation of organization goals

2005-07-18 Thread Tim S. Woodall

I'd be glad to provide an overview of LANL's proposed '06 Open MPI
development.

Tim

Jeff Squyres wrote:

A month or two ago, I gave the LANL guys an overview of IU's goals with 
respect to OMPI.  There were several "Ohhh!  So *that's* why you guys 
were asking for X, Y, and Z..." from the LANL guys during my 
presentation.  As such, I think it was really helpful for everyone to 
understand our motivations.


It's not that we were previously hiding them -- we just never got 
around to explicitly sharing them.


I think we should have similar presentations at the upcoming meeting.  
Can someone from each core organization (LANL, IU, UTK, HLRS) give a 
10-15 minute presentation (or whatever) listing all your goals and 
ambitions with OMPI?  Separate them into short- and long-term goals, if 
possible.


For example, I'll talk about IU's MPI-2 one-sided efforts, our plans 
for PBS and SLURM, and our future efforts for fault 
tolerance/checkpoint restart.


Thanks!

 





Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Jeff Squyres

On Jul 18, 2005, at 10:28 AM, Torsten Hoefler wrote:


Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)

yes, it's ok for me - I'll ask the vcc and reply to this mail if they
have any problems.


Can you do the 28th, too?  I'd prefer the 27th and 28th if possible 
(vs. 26th and 27th) because the 26th is our first day there 
(Monday/25th is a travel day) and I'd like to get a bunch of other 
stuff out of the way before we attack collectives.


As for tech assistance, contact Cindy at LANL (Ralph has her contact 
info -- indeed, she might proactively contact your people anyway, to 
setup for smooth sailing on the actual conference days).


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Edgar Gabriel
I am happy with either of the two dates... whichever is more 
convenient for LANL...


Jeff Squyres wrote:


I'm happy with either.

Europe -- comments?


On Jul 18, 2005, at 9:59 AM, Ralph Castain wrote:


Cindy indicated she was willing to come in early - she needs about an 
hour to set things up. 8:30am might be more polite for her, but I 
suspect she'd be willing to do 8am to accommodate Europe if necessary.



On Mon, 2005-07-18 at 07:48, Jeff Squyres wrote:

Ok, since no one made a suggestion yet :-), I nominate the following
two times:

Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
Thursday, 28 Jul, 8-11am MT

(is that too early for Cindy?  I seem to recall that she has to get in
early to set things up...?)

I think 6 hours is probably enough for this one meeting.  Please
propose alternate times by COB today so that we can have an upper bound
on time-for-consensus.

Three remote sites so far:

- UTK
- ? for Torsten
- Stuttgart



On Jul 18, 2005, at 9:28 AM, Ralph Castain wrote:


The day/time was never set that I know about. Cindy Sievers is still
holding the room, but we do need to let her know ASAP the times when
you want the system operational.


On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote:

Did we ever set a day/time for the collectives meeting at LANL next
week?  (we may have and I've forgotten...?)

I ask because those of us involved in that meeting will need to reserve
time on the Access Grid and coordinate with the remote sites for
participation.

AFAIK, we have 2 remote sites, right?

- UTK, for Graham Fagg (and others?)
- ?? for Torsten (Torsten: you mentioned where before, but I don't
recall the name of the site)

Are there others?







--
==
Dr.-Ing. Edgar Gabriel
Clusters and Distributed Units
High Performance Computing Center Stuttgart (HLRS)
University of Stuttgart
Tel: +49 711 685 8039     http://www.hlrs.de/people/gabriel
Fax: +49 711 678 7626     e-mail: gabr...@hlrs.de
==


Re: [O-MPI devel] Presentation of organization goals

2005-07-18 Thread Jeff Squyres

Excellent.

On Jul 18, 2005, at 10:01 AM, Ralph Castain wrote:

 I can talk about OpenRTE, if you want - pretty short, but could 
describe the research directions being discussed with others.



 On Mon, 2005-07-18 at 07:39, Jeff Squyres wrote:

A month or two ago, I gave the LANL guys an overview of IU's goals with
respect to OMPI.  There were several "Ohhh!  So *that's* why you guys
were asking for X, Y, and Z..." from the LANL guys during my
presentation.  As such, I think it was really helpful for everyone to
understand our motivations.

It's not that we were previously hiding them -- we just never got
around to explicitly sharing them.

I think we should have similar presentations at the upcoming meeting.
Can someone from each core organization (LANL, IU, UTK, HLRS) give a
10-15 minute presentation (or whatever) listing all your goals and
ambitions with OMPI?  Separate them into short- and long-term goals, if
possible.

For example, I'll talk about IU's MPI-2 one-sided efforts, our plans
for PBS and SLURM, and our future efforts for fault
tolerance/checkpoint restart.

Thanks!



--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Torsten Hoefler
Hi,
> Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
yes, it's ok for me - I'll ask the vcc and reply to this mail if they
have any problems.

Bye,
   Torsten

-- 
"Elchwurst enthält kein Rindfleisch"
  Schild bei IKEA   


Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Jeff Squyres

I'm happy with either.

Europe -- comments?


On Jul 18, 2005, at 9:59 AM, Ralph Castain wrote:

 Cindy indicated she was willing to come in early - she needs about an 
hour to set things up. 8:30am might be more polite for her, but I 
suspect she'd be willing to do 8am to accommodate Europe if necessary.



 On Mon, 2005-07-18 at 07:48, Jeff Squyres wrote:

Ok, since no one made a suggestion yet :-), I nominate the following
two times:

Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
Thursday, 28 Jul, 8-11am MT

(is that too early for Cindy?  I seem to recall that she has to get in
early to set things up...?)

I think 6 hours is probably enough for this one meeting.  Please
propose alternate times by COB today so that we can have an upper bound
on time-for-consensus.

Three remote sites so far:

- UTK
- ? for Torsten
- Stuttgart



On Jul 18, 2005, at 9:28 AM, Ralph Castain wrote:

>  The day/time was never set that I know about. Cindy Sievers is still
> holding the room, but we do need to let her know ASAP the times when
> you want the system operational.
>
>
>  On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote:
>> Did we ever set a day/time for the collectives meeting at LANL next
>> week?  (we may have and I've forgotten...?)
>>
>> I ask because those of us involved in that meeting will need to
>> reserve
>> time on the Access Grid and coordinate with the remote sites for
>> participation.
>>
>> AFAIK, we have 2 remote sites, right?
>>
>> - UTK, for Graham Fagg (and others?)
>> - ?? for Torsten (Torsten: you mentioned where before, but I don't
>> recall the name of the site)
>>
>> Are there others?


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



[O-MPI devel] SSIMS meeting

2005-07-18 Thread Craig Rasmussen

Jeff,

I'm currently at the SciDAC2 SSIMS meeting talking about software  
integration, maintenance, and support funding (on the order of $2  
million).  I promised Rich I'd keep all informed.  We've set up an IRC
channel (separate email).


Craig



Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Torsten Hoefler
Hi,
> Did we ever set a day/time for the collectives meeting at LANL next 
> week?  (we may have and I've forgotten...?)
I reserved the AccessGrid from the 25th to the 27th of July. 

> I ask because those of us involved in that meeting will need to reserve 
> time on the Access Grid and coordinate with the remote sites for 
> participation.
yes, it should be no problem for me if the time/date does not change
(we did not set a specific time as far as I remember). But I would need
technical assistance - no, actually not me, but the technicians from my
regional video conferencing center. Is there anybody in charge of the
basic technical arrangements?

> AFAIK, we have 2 remote sites, right?
> 
> - UTK, for Graham Fagg (and others?)
> - ??? for Torsten (Torsten: you mentioned where before, but I don't 
> recall the name of the site)
??? is the Technical University of Dresden (our responsible video
conferencing center - http://vcc.urz.tu-dresden.de/)

Thanks,
   Torsten

-- 
Dipl.-Inf. Torsten Hoefler  bash$ :(){ :|:&};: 
address: Chemnitz University of Technology
 department of computer science/professorship of computer architecture
 Strasse der Nationen 62 | 09107 Chemnitz | Germany
room:014 | phone: +49 371 531 1660


Re: [O-MPI devel] Presentation of organization goals

2005-07-18 Thread Ralph Castain
I can talk about OpenRTE, if you want - pretty short, but could describe
the research directions being discussed with others.


On Mon, 2005-07-18 at 07:39, Jeff Squyres wrote:

> A month or two ago, I gave the LANL guys an overview of IU's goals with 
> respect to OMPI.  There were several "Ohhh!  So *that's* why you guys 
> were asking for X, Y, and Z..." from the LANL guys during my 
> presentation.  As such, I think it was really helpful for everyone to 
> understand our motivations.
> 
> It's not that we were previously hiding them -- we just never got 
> around to explicitly sharing them.
> 
> I think we should have similar presentations at the upcoming meeting.  
> Can someone from each core organization (LANL, IU, UTK, HLRS) give a 
> 10-15 minute presentation (or whatever) listing all your goals and 
> ambitions with OMPI?  Separate them into short- and long-term goals, if 
> possible.
> 
> For example, I'll talk about IU's MPI-2 one-sided efforts, our plans 
> for PBS and SLURM, and our future efforts for fault 
> tolerance/checkpoint restart.
> 
> Thanks!


Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Ralph Castain
Cindy indicated she was willing to come in early - she needs about an
hour to set things up. 8:30am might be more polite for her, but I
suspect she'd be willing to do 8am to accommodate Europe if necessary.


On Mon, 2005-07-18 at 07:48, Jeff Squyres wrote:

> Ok, since no one made a suggestion yet :-), I nominate the following 
> two times:
> 
> Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
> Thursday, 28 Jul, 8-11am MT
> 
> (is that too early for Cindy?  I seem to recall that she has to get in 
> early to set things up...?)
> 
> I think 6 hours is probably enough for this one meeting.  Please 
> propose alternate times by COB today so that we can have an upper bound 
> on time-for-consensus.
> 
> Three remote sites so far:
> 
> - UTK
> - ? for Torsten
> - Stuttgart
> 
> 
> 
> On Jul 18, 2005, at 9:28 AM, Ralph Castain wrote:
> 
> >  The day/time was never set that I know about. Cindy Sievers is still 
> > holding the room, but we do need to let her know ASAP the times when 
> > you want the system operational.
> >
> >
> >  On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote:Did we ever set a 
> > day/time for the collectives meeting at LANL next
> >> week?  (we may have and I've forgotten...?)
> >>
> >> I ask because those of us involved in that meeting will need to 
> >> reserve
> >> time on the Access Grid and coordinate with the remote sites for
> >> participation.
> >>
> >> AFAIK, we have 2 remote sites, right?
> >>
> >> - UTK, for Graham Fagg (and others?)
> >> - ?? for Torsten (Torsten: you mentioned where before, but I don't
> >> recall the name of the site)
> >>
> >> Are there others?


Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Rich L. Graham

Please make sure that foreign nationals have the building this is
going to take place in (presumably, ACL-west) on their paperwork,
or else they can't participate.

Rich

On Jul 18, 2005, at 6:59 AM, Jeff Squyres wrote:


Did we ever set a day/time for the collectives meeting at LANL next
week?  (we may have and I've forgotten...?)

I ask because those of us involved in that meeting will need to reserve
time on the Access Grid and coordinate with the remote sites for
participation.

AFAIK, we have 2 remote sites, right?

- UTK, for Graham Fagg (and others?)
- ?? for Torsten (Torsten: you mentioned where before, but I don't
recall the name of the site)

Are there others?

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/





Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Jeff Squyres
Ok, since no one made a suggestion yet :-), I nominate the following 
two times:


Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
Thursday, 28 Jul, 8-11am MT

(is that too early for Cindy?  I seem to recall that she has to get in 
early to set things up...?)


I think 6 hours is probably enough for this one meeting.  Please 
propose alternate times by COB today so that we can have an upper bound 
on time-for-consensus.


Three remote sites so far:

- UTK
- ? for Torsten
- Stuttgart



On Jul 18, 2005, at 9:28 AM, Ralph Castain wrote:

 The day/time was never set that I know about. Cindy Sievers is still 
holding the room, but we do need to let her know ASAP the times when 
you want the system operational.



 On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote:

Did we ever set a day/time for the collectives meeting at LANL next
week?  (we may have and I've forgotten...?)

I ask because those of us involved in that meeting will need to reserve
time on the Access Grid and coordinate with the remote sites for
participation.

AFAIK, we have 2 remote sites, right?

- UTK, for Graham Fagg (and others?)
- ?? for Torsten (Torsten: you mentioned where before, but I don't
recall the name of the site)

Are there others?



--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



[O-MPI devel] Presentation of organization goals

2005-07-18 Thread Jeff Squyres
A month or two ago, I gave the LANL guys an overview of IU's goals with 
respect to OMPI.  There were several "Ohhh!  So *that's* why you guys 
were asking for X, Y, and Z..." from the LANL guys during my 
presentation.  As such, I think it was really helpful for everyone to 
understand our motivations.


It's not that we were previously hiding them -- we just never got 
around to explicitly sharing them.


I think we should have similar presentations at the upcoming meeting.  
Can someone from each core organization (LANL, IU, UTK, HLRS) give a 
10-15 minute presentation (or whatever) listing all your goals and 
ambitions with OMPI?  Separate them into short- and long-term goals, if 
possible.


For example, I'll talk about IU's MPI-2 one-sided efforts, our plans 
for PBS and SLURM, and our future efforts for fault 
tolerance/checkpoint restart.


Thanks!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Ralph Castain
The day/time was never set that I know about. Cindy Sievers is still
holding the room, but we do need to let her know ASAP the times when you
want the system operational.


On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote:

> Did we ever set a day/time for the collectives meeting at LANL next 
> week?  (we may have and I've forgotten...?)
> 
> I ask because those of us involved in that meeting will need to reserve 
> time on the Access Grid and coordinate with the remote sites for 
> participation.
> 
> AFAIK, we have 2 remote sites, right?
> 
> - UTK, for Graham Fagg (and others?)
> - ?? for Torsten (Torsten: you mentioned where before, but I don't 
> recall the name of the site)
> 
> Are there others?


Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Edgar Gabriel
don't forget Stuttgart in the list of remote sites :-). Rainer will be 
at the meeting, but I can only attend through Access Grid.


Thanks
Edgar

Jeff Squyres wrote:

Did we ever set a day/time for the collectives meeting at LANL next 
week?  (we may have and I've forgotten...?)


I ask because those of us involved in that meeting will need to reserve 
time on the Access Grid and coordinate with the remote sites for 
participation.


AFAIK, we have 2 remote sites, right?

- UTK, for Graham Fagg (and others?)
- ?? for Torsten (Torsten: you mentioned where before, but I don't 
recall the name of the site)


Are there others?



--
==
Dr.-Ing. Edgar Gabriel
Clusters and Distributed Units
High Performance Computing Center Stuttgart (HLRS)
University of Stuttgart
Tel: +49 711 685 8039     http://www.hlrs.de/people/gabriel
Fax: +49 711 678 7626     e-mail: gabr...@hlrs.de
==


Re: [O-MPI devel] processor affinity

2005-07-18 Thread Ralph Castain
Did a little digging into this last night, and finally figured out what
you were getting at in your comments here. Yeah, I think an "affinity"
framework would definitely be the best approach - it can handle both cpu
and memory, I imagine. It isn't clear how pressing that is, as it is mostly
an optimization issue, but you're welcome to create the framework if you
like.


On Sun, 2005-07-17 at 09:13, Jeff Squyres wrote:

> It needs to be done in the launched process itself.  So we'd either 
> have to extend rmaps (from my understanding of rmaps, that doesn't seem 
> like a good idea), or do something different.
> 
> Perhaps the easiest thing to do is to add this to the LANL meeting 
> agenda...?  Then we can have a whiteboard to discuss.  :-)
> 
> 
> 
> On Jul 17, 2005, at 10:26 AM, Ralph Castain wrote:
> 
> > Wouldn't it belong in the rmaps framework? That's where we tell the
> > launcher where to put each process - seems like a natural fit.
> >
> >
> > On Jul 17, 2005, at 6:45 AM, Jeff Squyres wrote:
> >
> >> I'm thinking that we should add some processor affinity code to OMPI --
> >> possibly in the orte layer (ORTE is the interface to the back-end
> >> launcher, after all).  This will really help on systems like opterons
> >> (and others) to prevent processes from bouncing between processors, and
> >> potentially getting located far from "their" RAM.
> >>
> >> This has the potential to help even micro-benchmark results (e.g.,
> >> ping-pong).  It's going to be quite relevant for my shared memory
> >> collective work on mauve.
> >>
> >>
> >> General scheme:
> >> ---
> >>
> >> I think that somewhere in ORTE, we should actively set processor
> >> affinity when:
> >>- Supported by the OS
> >>- Not disabled by the user (via MCA param)
> >>- The node is not over-subscribed with processes from this job
> >>
> >> Generally speaking, if you launch <=N processes in a job on a node
> >> (where N == number of CPUs on that node), then we set processor
> >> affinity.  We set each process's affinity to the CPU number according
> >> to the VPID ordering of the procs in that job on that node.  So if you
> >> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
> >> go to processor 1, etc. (it's an easy, locally-determined ordering).
> >>
> >> Someday, we might want to make this scheme universe-aware (i.e., see if
> >> any other ORTE jobs are running on that node, and not schedule on any
> >> processors that are already claimed by the processes on that(those)
> >> job(s)), but I think single-job awareness is sufficient for the 
> >> moment.
> >>
> >>
> >> Implementation:
> >> ---
> >>
> >> We'll need relevant configure tests to figure out if the target system
> >> has CPU affinity system calls.  Those are simple to add.
> >>
> >> We could use simply #if statements for the affinity stuff or make it a
> >> real framework.  Since it's only 1 function call to set the affinity, I
> >> tend to lean towards the [simpler] #if solution, but could probably be
> >> pretty easily convinced that a framework is the Right solution.  I'm on
> >> the fence (and if someone convinces me, I'd volunteer for the extra
> >> work to setup the framework).
> >>
> >> I'm not super-familiar with the processor-affinity stuff (e.g., for
> >> best effect, should it be done after the fork and before the exec?), so
> >> I'm not sure exactly where this would go in ORTE.  Potentially either
> >> before new processes are exec'd (where we only have control of that in
> >> some kinds of systems, like rsh/ssh) or right up very very near the 
> >> top
> >> of orte_init().
> >>
> >> Comments?
> >>
> >> -- 
> >> {+} Jeff Squyres
> >> {+} The Open MPI Project
> >> {+} http://www.open-mpi.org/
> >>
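
(A minimal Linux-only sketch of the #if-style approach discussed above
-- illustrative only. bind_self_to_cpu is a hypothetical helper;
local_rank and ncpus would come from ORTE's local VPID ordering in a
real implementation, and the call would be guarded by a configure test
for sched_setaffinity.)

    #define _GNU_SOURCE
    #include <sched.h>

    /* Hypothetical helper: pin the calling process to one CPU based
     * on its local VPID ordering, e.g. called near the top of
     * orte_init(). */
    static int bind_self_to_cpu(int local_rank, int ncpus)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(local_rank % ncpus, &mask);
        return sched_setaffinity(0, sizeof(mask), &mask); /* pid 0 == self */
    }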


Re: [O-MPI devel] processor affinity

2005-07-18 Thread Rich L. Graham


On Jul 18, 2005, at 6:28 AM, Jeff Squyres wrote:


On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote:


Generally speaking, if you launch <=N processes in a job on a node
(where N == number of CPUs on that node), then we set processor
affinity.  We set each process's affinity to the CPU number according
to the VPID ordering of the procs in that job on that node.  So if you
launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
go to processor 1, etc. (it's an easy, locally-determined ordering).


   You'd need to be careful with dual-core cpus.  Say you launch a 4
task MPI job on a 4-socket dual core Opteron.  You'd want to schedule
the tasks on nodes 0, 2, 4, 6 - not 0, 1, 2, 3 to get maximum memory
bandwidth to each MPI task.


With the potential for non-trivial logic like this, perhaps the extra
work for a real framework would be justified, then.

   Also, how would this work with hybrid MPI+threading (either pthreads
or OpenMP) applications?  Let's say you have an 8 or 16 cpu node and you
start up 2 MPI tasks with 4 compute threads in each task.  The optimum
layout may not be running the MPI tasks on cpu's 0 and 1.  Several
hybrid applications that ran on ASC White and now Purple will have
these requirements.


Hum.  Good question.  The MPI API doesn't really address this -- the
MPI API is not aware of additional threads that are created until you
call an MPI function (and even then, we're not currently checking which
thread is calling -- that would just add latency).

What do these applications do right now?  Do they set their own
processor / memory affinity?  This might actually be outside the scope
of MPI...?  (I'm not trying to shrug off responsibility, but this
might be a case where the MPI simply doesn't have enough information,
and to get that information [e.g., via MPI attributes or MPI info
arguments] would be more hassle than the user just setting the affinity
themselves...?)

Comments?


If you set things up such that you can specify input parameters on where
to put each process, you have the flexibility you want.  The locality APIs
I have seen all mimicked the IRIX API, which had these capabilities.  If you
want some ideas, look at LA-MPI; it does this - the implementation is
pretty strange (just the coding), but it is there.

Rich



--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/





[O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Jeff Squyres
Did we ever set a day/time for the collectives meeting at LANL next 
week?  (we may have and I've forgotten...?)


I ask because those of us involved in that meeting will need to reserve 
time on the Access Grid and coordinate with the remote sites for 
participation.


AFAIK, we have 2 remote sites, right?

- UTK, for Graham Fagg (and others?)
- ?? for Torsten (Torsten: you mentioned where before, but I don't 
recall the name of the site)


Are there others?

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/