Re: [O-MPI devel] July meeting

2005-07-18 Thread Jeff Squyres

On Jul 18, 2005, at 2:56 PM, Ralph Castain wrote:


 1. RedStorm design - what are we going to do about the RTE?


That's what I meant by "future plans", but I could certainly be more 
explicit.  :-)


 2. Multi-cell RTE - I've been working on this (finally set it aside 
to complete the scalable startup stuff first), but it is complex and 
might merit some discussion with those interested.


Shirley.  Added.

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Ralph Castain
Tim is taking care of this arrangement. I believe (from a prior chat with
Cindy) that she may already be familiar with her counterpart at your
location.


> > As for tech assistance, contact Cindy at LANL (Ralph has her contact 
> > info -- indeed, she might proactively contact your people anyway, to 
> > setup for smooth sailing on the actual conference days).
> Ralph, could you please give me Cindy's contact information?
> 
> Thanks,
>Torsten


Re: [O-MPI devel] processor affinity

2005-07-18 Thread Matt Leininger
On Mon, 2005-07-18 at 08:28 -0400, Jeff Squyres wrote:
> On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote:
> 
> >> Generally speaking, if you launch <=N processes in a job on a node
> >> (where N == number of CPUs on that node), then we set processor
> >> affinity.  We set each process's affinity to the CPU number according
> >> to the VPID ordering of the procs in that job on that node.  So if you
> >> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
> >> go to processor 1, etc. (it's an easy, locally-determined ordering).
> >
> >   You'd need to be careful with dual-core CPUs.  Say you launch a
> > 4-task MPI job on a 4-socket dual-core Opteron.  You'd want to schedule
> > the tasks on CPUs 0, 2, 4, 6 - not 0, 1, 2, 3 - to get maximum memory
> > bandwidth to each MPI task.
> 
> With the potential for non-trivial logic like this, perhaps the extra 
> work for a real framework would be justified, then.

   I agree.
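
For concreteness, a minimal sketch of that stride idea, assuming Linux's
sched_setaffinity(); local_rank and stride here are hypothetical values
the launcher would supply (e.g., stride = 2 on the dual-core Opteron
above, so tasks land on CPUs 0, 2, 4, 6):

#define _GNU_SOURCE
#include <sched.h>

/* Sketch: pin the calling process to one core, striding across
 * sockets so each task gets its own memory path. */
static int bind_self(int local_rank, int stride)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(local_rank * stride, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);   /* 0 == self */
}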
> 
> >   Also, how would this work with hybrid MPI+threading (either pthreads
> > or OpenMP) applications?  Let's say you have an 8 or 16 CPU node and you
> > start up 2 MPI tasks with 4 compute threads in each task.  The optimum
> > layout may not be running the MPI tasks on CPUs 0 and 1.  Several
> > hybrid applications that ran on ASC White and now Purple will have these
> > requirements.
> 
> Hum.  Good question.  The MPI API doesn't really address this -- the 
> MPI API is not aware of additional threads that are created until you 
> call an MPI function (and even then, we're not currently checking which 
> thread is calling -- that would just add latency).
> 
> What do these applications do right now?  Do they set their own 
> processor / memory affinity?  This might actually be outside the scope 
> of MPI...?  (I'm not trying to shrug off responsibility, but this 
> might be a case where the MPI simply doesn't have enough information, 
> and to get that information [e.g., via MPI attributes or MPI info 
> arguments] would be more hassle than the user just setting the affinity 
> themselves...?)

  We played around with setting processor affinity in our app a few
years ago.  It got a little ugly, but things have improved since then.

  I was thinking of having the app pass threading info to MPI (via info
or attributes).  This might be outside the scope of MPI now, but it
should be the responsibility of the parallel programming
language/method.  Making it the app's responsibility to set processor
affinity seems like too low-level a worry to put on application
developers.
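
A hypothetical sketch of the application side of that, using a standard
MPI-2 info object; the key name and any library support for consuming it
are assumptions, not an existing interface:

#include <stdio.h>
#include <mpi.h>

/* Hypothetical: hint the MPI library about how many compute threads
 * this task will spawn.  "max_compute_threads" is an invented key -
 * the library would have to define and honor it. */
void hint_thread_count(int nthreads)
{
    MPI_Info info;
    char val[16];

    MPI_Info_create(&info);
    snprintf(val, sizeof(val), "%d", nthreads);
    MPI_Info_set(info, "max_compute_threads", val);
    /* ... hand 'info' to whatever call the library defines for this ... */
    MPI_Info_free(&info);
}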


   Some discussion of what a memory/processor affinity framework
should look like and do would be a good starting point.

  - Matt




Re: [O-MPI devel] Presentation of organization goals

2005-07-18 Thread Jeff Squyres

Excellent.

On Jul 18, 2005, at 10:01 AM, Ralph Castain wrote:

 I can talk about OpenRTE, if you want - pretty short, but could 
describe the research directions being discussed with others.



 On Mon, 2005-07-18 at 07:39, Jeff Squyres wrote:
A month or two ago, I gave the LANL guys an overview of IU's goals with

respect to OMPI.  There were several "Ohhh!  So *that's* why you guys
were asking for X, Y, and Z..." from the LANL guys during my
presentation.  As such, I think it was really helpful for everyone to
understand our motivations.

It's not that we were previously hiding them -- we just never got
around to explicitly sharing them.

I think we should have similar presentations at the upcoming meeting.
Can someone from each core organization (LANL, IU, UTK, HLRS) give a
10-15 minute presentation (or whatever) listing all your goals and
ambitions with OMPI?  Separate them into short- and long-term goals, if
possible.

For example, I'll talk about IU's MPI-2 one-sided efforts, our plans
for PBS and SLURM, and our future efforts for fault
tolerance/checkpoint restart.

Thanks!



--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Torsten Hoefler
Hi,
> Wednesday, 27 Jul, 8-11am MT (I think that's 4-7pm DE time, right?)
yes, that works for me - I'll ask the VCC and reply to this mail if they
have any problems.

Bye,
   Torsten

-- 
"Elchwurst enthält kein Rindfleisch"
  Schild bei IKEA   


[O-MPI devel] SSIMS meeting

2005-07-18 Thread Craig Rasmussen

Jeff,

I'm currently at the SciDAC2 SSIMS meeting talking about software
integration, maintenance, and support funding (on the order of $2
million).  I promised Rich I'd keep everyone informed.  We've set up an
IRC channel (separate email).


Craig



[O-MPI devel] Presentation of organization goals

2005-07-18 Thread Jeff Squyres
A month or two ago, I gave the LANL guys an overview of IU's goals with 
respect to OMPI.  There were several "Ohhh!  So *that's* why you guys 
were asking for X, Y, and Z..." from the LANL guys during my 
presentation.  As such, I think it was really helpful for everyone to 
understand our motivations.


It's not that we were previously hiding them -- we just never got 
around to explicitly sharing them.


I think we should have similar presentations at the upcoming meeting.  
Can someone from each core organization (LANL, IU, UTK, HLRS) give a 
10-15 minute presentation (or whatever) listing all your goals and 
ambitions with OMPI?  Separate them into short- and long-term goals, if 
possible.


For example, I'll talk about IU's MPI-2 one-sided efforts, our plans 
for PBS and SLURM, and our future efforts for fault 
tolerance/checkpoint restart.


Thanks!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Ralph Castain
The day/time was never set, as far as I know. Cindy Sievers is still
holding the room, but we do need to let her know ASAP the times when you
want the system operational.


On Mon, 2005-07-18 at 06:59, Jeff Squyres wrote:

> Did we ever set a day/time for the collectives meeting at LANL next 
> week?  (we may have and I've forgotten...?)
> 
> I ask because those of us involved in that meeting will need to reserve 
> time on the Access Grid and coordinate with the remote sites for 
> participation.
> 
> AFAIK, we have 2 remote sites, right?
> 
> - UTK, for Graham Fagg (and others?)
> - ?? for Torsten (Torsten: you mentioned where before, but I don't 
> recall the name of the site)
> 
> Are there others?


Re: [O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Edgar Gabriel
Don't forget Stuttgart in the list of remote sites :-). Rainer will be
at the meeting in person, but I can only attend through the Access Grid.


Thanks
Edgar

Jeff Squyres wrote:

Did we ever set a day/time for the collectives meeting at LANL next 
week?  (we may have and I've forgotten...?)


I ask because those of us involved in that meeting will need to reserve 
time on the Access Grid and coordinate with the remote sites for 
participation.


AFAIK, we have 2 remote sites, right?

- UTK, for Graham Fagg (and others?)
- ?? for Torsten (Torsten: you mentioned where before, but I don't 
recall the name of the site)


Are there others?



--
==
Dr.-Ing. Edgar Gabriel
Clusters and Distributed Units
High Performance Computing Center Stuttgart (HLRS)
University of Stuttgart
Tel: +49 711 685 8039    http://www.hlrs.de/people/gabriel
Fax: +49 711 678 7626    e-mail: gabr...@hlrs.de
==


Re: [O-MPI devel] processor affinity

2005-07-18 Thread Ralph Castain
Did a little digging into this last night, and finally figured out what
you were getting at in your comments here. Yeah, I think an "affinity"
framework would definitely be the best approach - it could handle both
CPU and memory, I imagine. It isn't clear how pressing that is, since it
is mostly an optimization issue, but you're welcome to create the
framework if you like.


On Sun, 2005-07-17 at 09:13, Jeff Squyres wrote:

> It needs to be done in the launched process itself.  So we'd either 
> have to extend rmaps (from my understanding of rmaps, that doesn't seem 
> like a good idea), or do something different.
> 
> Perhaps the easiest thing to do is to add this to the LANL meeting 
> agenda...?  Then we can have a whiteboard to discuss.  :-)
> 
> 
> 
> On Jul 17, 2005, at 10:26 AM, Ralph Castain wrote:
> 
> > Wouldn't it belong in the rmaps framework? That's where we tell the
> > launcher where to put each process - seems like a natural fit.
> >
> >
> > On Jul 17, 2005, at 6:45 AM, Jeff Squyres wrote:
> >
> >> I'm thinking that we should add some processor affinity code to OMPI --
> >> possibly in the orte layer (ORTE is the interface to the back-end
> >> launcher, after all).  This will really help on systems like Opterons
> >> (and others) to prevent processes from bouncing between processors, and
> >> potentially getting located far from "their" RAM.
> >>
> >> This has the potential to help even micro-benchmark results (e.g.,
> >> ping-pong).  It's going to be quite relevant for my shared memory
> >> collective work on mauve.
> >>
> >>
> >> General scheme:
> >> ---
> >>
> >> I think that somewhere in ORTE, we should actively set processor
> >> affinity when:
> >>- Supported by the OS
> >>- Not disabled by the user (via MCA param)
> >>- The node is not over-subscribed with processes from this job
> >>
> >> Generally speaking, if you launch <=N processes in a job on a node
> >> (where N == number of CPUs on that node), then we set processor
> >> affinity.  We set each process's affinity to the CPU number according
> >> to the VPID ordering of the procs in that job on that node.  So if you
> >> launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
> >> go to processor 1, etc. (it's an easy, locally-determined ordering).
> >>
> >> Someday, we might want to make this scheme universe-aware (i.e., see if
> >> any other ORTE jobs are running on that node, and not schedule on any
> >> processors that are already claimed by the processes on that(those)
> >> job(s)), but I think single-job awareness is sufficient for the moment.
> >>
> >>
> >> Implementation:
> >> ---
> >>
> >> We'll need relevant configure tests to figure out if the target system
> >> has CPU affinity system calls.  Those are simple to add.
> >>
> >> We could simply use #if statements for the affinity stuff or make it a
> >> real framework.  Since it's only 1 function call to set the affinity, I
> >> tend to lean towards the [simpler] #if solution, but could probably be
> >> pretty easily convinced that a framework is the Right solution.  I'm on
> >> the fence (and if someone convinces me, I'd volunteer for the extra
> >> work to set up the framework).
> >>
> >> I'm not super-familiar with the processor-affinity stuff (e.g., for
> >> best effect, should it be done after the fork and before the exec?), so
> >> I'm not sure exactly where this would go in ORTE.  Potentially either
> >> before new processes are exec'd (where we only have control of that in
> >> some kinds of systems, like rsh/ssh) or right up very very near the top
> >> of orte_init().
> >>
> >> Comments?
> >>
> >> -- 
> >> {+} Jeff Squyres
> >> {+} The Open MPI Project
> >> {+} http://www.open-mpi.org/
> >>
> >

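On the fork/exec question above, a minimal sketch of the usual Linux
pattern, assuming sched_setaffinity(): the mask set in the child survives
the exec, so binding between fork() and exec() works; 'cpu' is whatever
the mapper chose:

#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Sketch: bind the child to 'cpu' before exec'ing the MPI process;
 * the affinity mask is inherited across execvp(). */
static int launch_bound(char **argv, int cpu)
{
    pid_t pid = fork();
    if (pid == 0) {                       /* child */
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        sched_setaffinity(0, sizeof(mask), &mask);   /* 0 == self */
        execvp(argv[0], argv);
        _exit(127);                       /* only reached if exec failed */
    }
    return pid;                           /* parent: child pid, or -1 */
}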

Re: [O-MPI devel] processor affinity

2005-07-18 Thread Rich L. Graham


On Jul 18, 2005, at 6:28 AM, Jeff Squyres wrote:


On Jul 18, 2005, at 2:50 AM, Matt Leininger wrote:


Generally speaking, if you launch <=N processes in a job on a node
(where N == number of CPUs on that node), then we set processor
affinity.  We set each process's affinity to the CPU number according
to the VPID ordering of the procs in that job on that node.  So if you
launch VPIDs 5, 6, 7, 8 on a node, 5 would go to processor 0, 6 would
go to processor 1, etc. (it's an easy, locally-determined ordering).


   You'd need to be careful with dual-core CPUs.  Say you launch a
4-task MPI job on a 4-socket dual-core Opteron.  You'd want to schedule
the tasks on CPUs 0, 2, 4, 6 - not 0, 1, 2, 3 - to get maximum memory
bandwidth to each MPI task.


With the potential for non-trivial logic like this, perhaps the extra
work for a real framework would be justified, then.

   Also, how would this work with hybrid MPI+threading (either pthreads
or OpenMP) applications?  Let's say you have an 8 or 16 CPU node and you
start up 2 MPI tasks with 4 compute threads in each task.  The optimum
layout may not be running the MPI tasks on CPUs 0 and 1.  Several
hybrid applications that ran on ASC White and now Purple will have these
requirements.


Hum.  Good question.  The MPI API doesn't really address this -- the
MPI API is not aware of additional threads that are created until you
call an MPI function (and even then, we're not currently checking which
thread is calling -- that would just add latency).

What do these applications do right now?  Do they set their own
processor / memory affinity?  This might actually be outside the scope
of MPI...?  (I'm not trying to shrug off responsibility, but this
might be a case where the MPI simply doesn't have enough information,
and to get that information [e.g., via MPI attributes or MPI info
arguments] would be more hassle than the user just setting the affinity
themselves...?)

Comments?


If you set things up such that you can specify input parameters on where
to put each process, you have the flexibility you want.  The locality APIs
I have seen all mimicked the IRIX API, which had these capabilities.  If
you want some ideas, look at LA-MPI - it does this.  The implementation is
pretty strange (just the coding), but it is there.

Rich
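
A rough sketch of that input-parameter idea: bind each local rank
according to a user-supplied CPU list.  The environment variable name is
invented (not an existing OMPI parameter), and Linux's
sched_setaffinity() is assumed:

#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: bind according to a user-supplied map such as "0,2,4,6";
 * local_rank indexes into the list. */
static int bind_from_map(int local_rank)
{
    char *map = getenv("OMPI_MCA_affinity_cpu_map");   /* hypothetical */
    char *copy, *tok;
    int rc = 0, i;

    if (map == NULL)
        return 0;                         /* no map given: leave unbound */
    copy = strdup(map);
    tok = strtok(copy, ",");
    for (i = 0; tok != NULL && i < local_rank; i++)
        tok = strtok(NULL, ",");          /* walk to our entry */
    if (tok != NULL) {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(atoi(tok), &mask);
        rc = sched_setaffinity(0, sizeof(mask), &mask);  /* 0 == self */
    }
    free(copy);
    return rc;
}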



--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/





[O-MPI devel] collectives discussion @LANL

2005-07-18 Thread Jeff Squyres
Did we ever set a day/time for the collectives meeting at LANL next 
week?  (we may have and I've forgotten...?)


I ask because those of us involved in that meeting will need to reserve 
time on the Access Grid and coordinate with the remote sites for 
participation.


AFAIK, we have 2 remote sites, right?

- UTK, for Graham Fagg (and others?)
- ?? for Torsten (Torsten: you mentioned where before, but I don't 
recall the name of the site)


Are there others?

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/