Re: [OMPI devel] [devel-core] Collective Communications Optimization - Meeting Scheduled in Albuquerque!

2007-07-10 Thread Galen Shipman
Hotels near the airport / university area; I pulled this off of this
site: http://www.airnav.com/airport/KABQ

Hotel                                                         Miles  Price ($)
FAIRFIELD INN BY MARRIOTT ALBUQUERQUE UNIVERSITY AREA           4.8  79-80
COMFORT INN AIRPORT                                             1.3  52-101
COURTYARD BY MARRIOTT ALBUQUERQUE AIRPORT                       1.6  74-139
SLEEP INN AIRPORT                                               1.6  45-71
LA QUINTA INN ALBUQUERQUE AIRPORT                               1.4  69-91
AMERISUITES ALBUQUERQUE AIRPORT                                 1.5  59-109
RAMADA LTD ALBUQUERQUE AIRPORT                                  1.7  62-89
SUBURBAN EXTENDED STAY                                          4.6  51-52
HOWARD JOHNSON - ALBUQUERQUE (EAST)                             5.4  54-69
WYNDHAM ALBUQUERQUE AIRPORT                                     1.0  80-109
COMFORT INN EAST                                                6.3  55-56
PARK PLAZA HOTEL ALBUQUERQUE                                    4.7  60-61

Other hotels near Albuquerque International Sunport Airport

Hotel                                                         Miles  Price ($)
HILTON GARDEN INN ALBUQUERQUE AIRPORT                           1.2  85-180
BEST WESTERN INNSUITES (AIRPORT)                                1.2  49-71
HAMPTON INN ALBUQUERQUE AP                                      1.4  71-105
ESA ALBUQUERQUE-AIRPORT                                         1.6  54-70
HAWTHORN INN & SUITES - ALBUQUERQUE (AIRPORT)                   1.8  69-70
QUALITY SUITES ALBUQUERQUE                                      1.8  58-65
VAGABOND EXECUTIVE INN - FORMERLY THE AIRPORT UNIVERSITY INN    1.9  49-56
COUNTRY INN & SUITES BY CARLSON - ALBUQUERQUE AIRPORT           1.9  79-109
HOMEWOOD STE ALBUQUERQUE ARPT

On Jul 10, 2007, at 2:49 PM, Gil Bloch wrote:

What time do we plan to start on Aug. 6? I am trying to figure out  
if I have to be there the day before.


Also, is there any specific hotel you would recommend?

Regards,
Gil Bloch


-----Original Message-----
From: devel-core-boun...@open-mpi.org
[mailto:devel-core-boun...@open-mpi.org] On Behalf Of Galen Shipman

Sent: Monday, 09 July 2007 15:44
To: Open MPI Developers
Subject: Re: [devel-core] Collective Communications Optimization -
Meeting Scheduled in Albuquerque!



All,

I have confirmed the meeting to be held at the HPC facility at UNM on
Aug 6,7,8.

Here is a link to the HPC center:

http://www.hpc.unm.edu/

Here is the visitor information link:

http://www.hpc.unm.edu/info/visitor-information


I hope everyone who expressed interest is able to attend!

Thanks,

Galen







On Jun 29, 2007, at 6:23 PM, Galen Shipman wrote:



So we are looking at a change of venue for this meeting.
Santa Fe turned out to be a bit too costly in terms of hotel rooms
for some participants.
I am looking into getting the HPC conference room in Albuquerque.
This is a convenient location for most and the hotels are cheaper.
I am firming up the details with the new HPC director at UNM; the
dates will remain August 6, 7, and 8.

Thanks,

Galen


On Jun 6, 2007, at 2:43 PM, Galen Shipman wrote:


Updated Attendees as of  June 6th

(5 tentative, 12 confirmed):

Cisco
Jeff (tentative)

IU
Tim
Andrew (tentative)
Josh (tentative)
Torsten

LANL
Brian
Ollie
Galen

Mellanox
Gil

Myricom
Patrick (tentative)

ORNL
Rich

SNL
Ron

UH
Edgar

UT
George
Jelena (tentative)

SUN
Rolf


QLogic
Christian



On Jun 5, 2007, at 10:10 AM, Galen Shipman wrote:



Sorry for the duplicate (this one includes a reasonable subject line):

Okay, so we tried to get the Hilton at a reasonable rate; that didn't
happen. Instead we got the Eldorado Hotel:

http://www.eldoradohotel.com/

So the meeting will be held here.

The room rates at the hotel are probably a bit high, but there are a
number of other hotels in and around the area. I will try to get a
list from our admin.

I have the following attendees so far. If you are on the list and
marked as tentative, please let me know ASAP if you are definitely
coming. If you are on the list and not marked as tentative, then we
are expecting you, so please let me know today if you are unable to
make it.

This should be a good meeting; we will be located in the heart of
Santa Fe, so travel will be easier (you still need a car from ABQ, but
it is less than one hour) and there are lots of things to do/see
before and after the meetings.

Thanks,

Galen




Updated Attendees (15 in total):

Cisco
Jeff (tentative)

IU
Tim
Andrew (tentative)
Josh (tentative)
Torsten

LANL
Brian
Ollie
Galen

Mellanox
Gil

Myricom
Patrick (tentative)

ORNL
Rich

SNL
Ron

UH
Edgar

UT
George
Jelena

___
devel-core mailing list
devel-c...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel-core




Re: [OMPI devel] ticket 1023

2007-07-10 Thread Sharon Melamed
Hi Ralph,

My answer has two parts:

1. I'm not familiar with bproc, but I assume that when working with
bproc there is a component that reads the RMAPS information somehow and
launches the local process. In that case you can add the affinity there
according to the slot_list (a new member in the map) from RMAPS. In any
case you must add the mapping from the user map to the actual cpu_set
bitmap on the end node, because the head node does not have information
about the internal structure of each and every end node in the grid.

2. My changes did not change anything in the way that ORTE works today;
they just added some functionality. You don't have to use this new
functionality; you can still work as you do today.


Sharon.

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Ralph H Castain
Sent: Tuesday, July 10, 2007 6:31 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] ticket 1023

Ah, I see the problem. I think you have misunderstood the ODLS
framework.

The ODLS is the "Orte Daemon Launch Subsystem" and is used by the
orted to launch the local procs. Mpirun also accesses the ODLS, but
only to construct the add_procs message that gets sent to the daemons.

The problem is, therefore, that systems which do not use the orteds to
actually launch the backend processes will not have access to the ODLS
on the backend machines. Instead, they use their own internal
mechanism for launching the remote processes. Bproc is an example of
this mode of operation.

So if the mapping is in the ODLS component, then systems that do not
use the orted will not be able to map rank to processor. Does this
mean they cannot set affinity?

For example, this change appears to break bproc's ability to do
affinity since bproc launches the local procs outside of the orteds -
is this true, or can I set affinity without going through the ODLS?
That would be an issue for LANL, I believe.

Thanks
Ralph



On 7/10/07 9:18 AM, "Sharon Melamed"  wrote:

> Hi Ralph,
> 
> The responsibility for mapping rank to processor is in the ODLS
> component. I didn't touch the orted code.
> 
> If you don't use the orted, you still use the ODLS component (like the
> bproc ODLS). In any case you must have a component on the end machine
> that builds the orte_odls_child_t structure from the RMAPS information
> and launches the local processes. Currently this component is the ODLS.
> Most of my work is in the ODLS component, so if you decide to eliminate
> the orteds you must, somehow, preserve the ODLS functionality.
> 
> Sharon.
> 
>   
> 
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
> On Behalf Of Ralph H Castain
> Sent: Tuesday, July 10, 2007 4:43 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] ticket 1023
> 
> As I understood our original discussions, this would move
> responsibility for mapping rank to processor back into the orted - is
> that still true?
> 
> Reason I ask is to again clarify for people if we are doing so as it
> (a) impacts those systems that don't use our orteds (e.g., will
> affinity still work in those environments?); and (b) it will make
> elimination of the orteds just a little more difficult.
> 
> So could you please clarify for everyone what this code functionally
> does? All 1023 does is lay out syntax - it doesn't clearly state what
> happens where.
> 
> Thanks
> Ralph
> 
> 
> 
> On 7/10/07 7:32 AM, "Sharon Melamed"  wrote:
> 
>> Hello All,
>> 
>> In the recent few weeks I implemented ticket 1023
>> (https://svn.open-mpi.org/trac/ompi/ticket/1023).
>> 
>> In a few words, the purpose of ticket 1023 is to expand the hostfile
>> syntax to precisely specify slot location (in terms of virtual CPU ID
>> or socket:core notation) in the node and/or rank in MCW.
>> 
>> The code is in a temporary branch
>> https://svn.open-mpi.org/svn/ompi/tmp/sharon/
>> 
>> The changes are:
>> 
>> 1. In the RAS base component:
>>    a. Added a new list of orte_ras_proc_t structures.
>>    b. Each orte_ras_proc_t structure contains 3 members: node_name,
>>       rank and cpu_list.
>>    c. The cpu_list is a string representing the slot list from the
>>       hostfile, i.e.: if the SLOT token in the hostfile is
>>       SLOT=1@2:1,3:1-4, the slot_list string is: 2:1,3:7-9.
>> 
>> 2. In the RDS hostfile component:
>>    a. Added a new token, SLOT, to the lex parser.
>>    b. Fill the orte_ras_proc_t structure list according to the SLOT
>>       token in the hostfile.
>> 
>> 3. In the RMAPS round robin component:
>>    a. Added a new member to the orte_mapped_node_t structure -
>>       slot_list (similar to the slot_list in the orte_ras_proc_t
>>       structure).
>>    b. In orte_rmaps_rr_map, map the job according to hostfile ranks
>>       before mapping the job by slot or by node.
>>    c. In orte_rmaps_rr_map, arrange the MCW ranks according to the
>>       hostfile.

Re: [OMPI devel] Bproc support

2007-07-10 Thread Jeff Squyres

Do you feel like updating http://www.open-mpi.org/faq/?category=bproc ?

:D


On Jul 10, 2007, at 10:17 AM, Ralph H Castain wrote:


Yo all

I have upgraded the support for Bproc on the Open MPI trunk as of
r15328.

We now support Bproc environments that do not utilize resource
managers - in these cases, we will allow the user to launch on all
nodes upon which they have execution authority. Please note that, if
you login to your system via multiple windows, any applications you
execute in the different windows may well overlap their use of
resources, as Open MPI has no way of knowing what an mpirun in another
window is doing.

I also have attempted to provide BJS support, but I have no way of
testing it - I would appreciate feedback from anyone that does.

Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Jeff Squyres

Point taken.

Is this an accurate summary?

1. "Best practices" should be documented, to include sysadmins  
specifically itemizing what components should be used on their  
systems (e.g., in an environment variable or the system-wide MCA  
parameters file).


2. It may be useful to have some high-level parameters to specify a  
specific run-time environment, since ORTE has multiple, related  
frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or  
somesuch.



On Jul 10, 2007, at 9:08 AM, Ralph H Castain wrote:

Actually, I was talking specifically about configuration at build
time. I realize there are trade-offs here, and suspect we can find a
common ground.

The problem with using the options Jeff described is that they require
knowledge on the part of the builder as to what environments have had
their include files/libraries installed on the file system of this
particular machine. And unfortunately, not every component is
protected by these "sentinel" variables, nor does it appear possible
to do so in a "guaranteed safe" manner.

Note that I didn't say "installed on their machine". In most cases,
these alternative environments are not currently installed at all -
they are stale files, or were placed on the file system by someone
that wanted to look at their documentation, or whatever. The problem
is that Open MPI blindly picks them up and attempts to use them, with
sometimes disastrous and frequently unpredictable results.

Hence, the user can be "astonished" to find that an application that
worked perfectly yesterday suddenly segfaults today - because someone
decided one day, for example, to un-tar the bproc files in a public
place where we pick them up, and then someone else (perhaps a sys
admin or the user themselves) at some later time rebuilt Open MPI to
bring in an update.

Now imagine being a software provider who gets the call about a
problem with Open MPI and has to figure out what the heck happened...

My suggested solution may not be the best, which is why I put it out
there for discussion. One alternative might be for us to instruct sys
admins to put MCA params in their default param file that force
selection of the proper components for each framework. Thus, someone
with an lsf system would enter: pls=lsf ras=lsf sds=lsf in their
config file to ensure that only lsf was used.

The negative to that approach is that we would have to warn everyone
any time that list changed (e.g., a new component for a new
framework). Another option to help that problem, of course, would be
to set one mca param (say something like "enviro=lsf") that we would
use internally in Open MPI to set the individual components correctly
- i.e., we would hold the list of relevant frameworks internally since
(hopefully) we know what they should be for a given environment.

Anyway, I'm glad people are looking at this and suggesting solutions.
It is a problem that seems to be biting us recently and may become a
bigger issue as the user community grows.

Ralph


On 7/10/07 6:12 AM, "Bogdan Costescu" wrote:


On Tue, 10 Jul 2007, Jeff Squyres wrote:


Do either of these work for you?


Will report back in a bit, I'm now in the middle of an OS upgrade on
the cluster.

But my question was more like: is this a configuration that should
theoretically work ? Or in other words, are there known dependencies
on rsh that would make a rsh-less build not work or work with reduced
functionality ?


Most batch systems today set a sentinel environment variable that we
check for.


I think that we talk about slightly different things - my impression
was that the OP was asking about detection at config time, while your
statements make perfect sense to me if they are relative to detection
at run-time. If the OP was indeed asking about run-time detection,
then I apologize for the time you wasted on reading and replying to my
questions...


That's what the compile-time vs. run-time detection and selection is
supposed to be for.


Yes, I understand that, it's the same type of mechanism as in LAM/MPI
which it's not that foreign to me ;-)



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Jeff Squyres

On Jul 10, 2007, at 9:51 AM, Brian Barrett wrote:


Actually, there are --with-slurm/--without-slurm options. We default
to building slurm support automatically on linux and aix, but not on
other platforms.


On a mostly unrelated note...  We should probably also now build the
SLURM component for OS X, since SLURM is now available for OS X as
well.  And probably should also check for SLURM's srun and build if
we find it even if we aren't on Linux, AIX, or OS X.


Hah.  So SLURM isn't really a SLURM (Simple Linux Utility for  
Resource Management) after all, eh?  :-)


Point noted, though -- someone file a feature enhancement...

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] ticket 1023

2007-07-10 Thread Jeff Squyres

On Jul 10, 2007, at 11:18 AM, Sharon Melamed wrote:


The responsibility for mapping rank to processor is in the ODLS
component. I didn't touch the orted code.

If you don't use the orted, you still use the ODLS component (like the
bproc ODLS). In any case you must have a component on the end machine
that builds the orte_odls_child_t structure from the RMAPS information
and launches the local processes. Currently this component is the ODLS.
Most of my work is in the ODLS component, so if you decide to eliminate
the orteds you must, somehow, preserve the ODLS functionality.


...if you want processor affinity.

Is that correct?  If you don't care about processor affinity and
you're not using orteds/ODLS, then any extra mappings will be safely
ignored, right?


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] ticket 1023

2007-07-10 Thread Ralph H Castain
Ah, I see the problem. I think you have misunderstood the ODLS framework.

The ODLS is the "Orte Daemon Launch Subsystem" and is used by the orted to
launch the local procs. Mpirun also accesses the ODLS, but only to construct
the add_procs message that gets sent to the daemons.

The problem is, therefore, that systems which do not use the orteds to
actually launch the backend processes will not have access to the ODLS on
the backend machines. Instead, they use their own internal mechanism for
launching the remote processes. Bproc is an example of this mode of
operation.

So if the mapping is in the ODLS component, then systems that do not use the
orted will not be able to map rank to processor. Does this mean they cannot
set affinity?

For example, this change appears to break bproc's ability to do affinity
since bproc launches the local procs outside of the orteds - is this true,
or can I set affinity without going through the ODLS? That would be an issue
for LANL, I believe.

Thanks
Ralph



On 7/10/07 9:18 AM, "Sharon Melamed"  wrote:

> Hi Ralph,
> 
> The responsibility for mapping rank to processor is in the ODLS
> component. I didn't touch the orted code.
> 
> If you don't use the orted, you still use the ODLS component (like the
> bproc ODLS). In any case you must have a component on the end machine
> that builds the orte_odls_child_t structure from the RMAPS information
> and launches the local processes. Currently this component is the ODLS.
> Most of my work is in the ODLS component, so if you decide to eliminate
> the orteds you must, somehow, preserve the ODLS functionality.
> 
> Sharon.
> 
>   
> 
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> Behalf Of Ralph H Castain
> Sent: Tuesday, July 10, 2007 4:43 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] ticket 1023
> 
> As I understood our original discussions, this would move
> responsibility for mapping rank to processor back into the orted - is
> that still true?
> 
> Reason I ask is to again clarify for people if we are doing so as it
> (a) impacts those systems that don't use our orteds (e.g., will
> affinity still work in those environments?); and (b) it will make
> elimination of the orteds just a little more difficult.
> 
> So could you please clarify for everyone what this code functionally
> does? All 1023 does is lay out syntax - it doesn't clearly state what
> happens where.
> 
> Thanks
> Ralph
> 
> 
> 
> On 7/10/07 7:32 AM, "Sharon Melamed"  wrote:
> 
>> Hello All,
>> 
>> In the recent few weeks I implemented ticket 1023
>> (https://svn.open-mpi.org/trac/ompi/ticket/1023).
>> 
>> In a few words, the purpose of ticket 1023 is to expand the hostfile
>> syntax to precisely specify slot location (in terms of virtual CPU ID
>> or socket:core notation) in the node and/or rank in MCW.
>> 
>> The code is in a temporary branch
>> https://svn.open-mpi.org/svn/ompi/tmp/sharon/
>> 
>> The changes are:
>> 
>> 1. In the RAS base component:
>>    a. Added a new list of orte_ras_proc_t structures.
>>    b. Each orte_ras_proc_t structure contains 3 members: node_name,
>>       rank and cpu_list.
>>    c. The cpu_list is a string representing the slot list from the
>>       hostfile, i.e.: if the SLOT token in the hostfile is
>>       SLOT=1@2:1,3:1-4, the slot_list string is: 2:1,3:7-9.
>> 
>> 2. In the RDS hostfile component:
>>    a. Added a new token, SLOT, to the lex parser.
>>    b. Fill the orte_ras_proc_t structure list according to the SLOT
>>       token in the hostfile.
>> 
>> 3. In the RMAPS round robin component:
>>    a. Added a new member to the orte_mapped_node_t structure -
>>       slot_list (similar to the slot_list in the orte_ras_proc_t
>>       structure).
>>    b. In orte_rmaps_rr_map, map the job according to hostfile ranks
>>       before mapping the job by slot or by node.
>>    c. In orte_rmaps_rr_map, arrange the MCW ranks according to the
>>       hostfile.
>> 
>> 4. In the ODLS default module:
>>    a. Added slot_list to orte_odls_default_get_add_procs_data.
>>    b. Added slot_list to orte_odls_default_launch_local_procs.
>>    c. Added a new member to the child structure: a cpu_set bitmap
>>       (for PLPA).
>>    d. Added mapping of the slot_list string to a cpu_set bitmap in
>>       the child structure.
>> 
>> For more details you can browse the code.
>> 
>> I would like to merge these changes to the trunk as soon as possible
>> since, as I understood from Ralph Castain's emails, Open RTE will go
>> through a lot of changes in the near future, and since this is a
>> relatively small change I want to merge it before the big change.
>> 
>> Any comments?
>> 
>> Sharon.
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Brian Barrett

On Jul 10, 2007, at 7:09 AM, Tim Prins wrote:


Jeff Squyres wrote:

2. The "--enable-mca-no-build" option takes a comma-delimited list of
components that will then not be built.  Granted, this option isn't
exactly intuitive, but it was the best that we could think of at the
time to present a general solution for inhibiting the build of a
selected list of components.  Hence, "--enable-mca-no-build=pls-
slurm,ras-slurm" would inhibit building the SLURM RAS and PLS
components (note that the SLURM components currently do not require
any additional libraries, so a) there is no corresponding
--with[out]-slurm option, and b) they are usually always built).


Actually, there are --with-slurm/--without-slurm options. We default
to building slurm support automatically on linux and aix, but not on
other platforms.


On a mostly unrelated note...  We should probably also now build the  
SLURM component for OS X, since SLURM is now available for OS X as  
well.  And probably should also check for SLURM's srun and build if  
we find it even if we aren't on Linux, AIX, or OS X.


Brian


Re: [OMPI devel] ticket 1023

2007-07-10 Thread Ralph H Castain
As I understood our original discussions, this would move responsibility for
mapping rank to processor back into the orted - is that still true?

Reason I ask is to again clarify for people if we are doing so as it (a)
impacts those systems that don't use our orteds (e.g., will affinity still
work in those environments?); and (b) it will make elimination of the orteds
just a little more difficult.

So could you please clarify for everyone what this code functionally does?
All 1023 does is lay out syntax - it doesn't clearly state what happens
where.

Thanks
Ralph



On 7/10/07 7:32 AM, "Sharon Melamed"  wrote:

> Hello All,
> 
> In the recent few weeks I implemented ticket 1023
> (https://svn.open-mpi.org/trac/ompi/ticket/1023).
> 
> In a few words, the purpose of ticket 1023 is to expand the hostfile
> syntax to precisely specify slot location (in terms of virtual CPU ID
> or socket:core notation) in the node and/or rank in MCW.
> 
> The code is in a temporary branch
> https://svn.open-mpi.org/svn/ompi/tmp/sharon/
> 
> The changes are:
> 
> 1. In the RAS base component:
>    a. Added a new list of orte_ras_proc_t structures.
>    b. Each orte_ras_proc_t structure contains 3 members: node_name,
>       rank and cpu_list.
>    c. The cpu_list is a string representing the slot list from the
>       hostfile, i.e.: if the SLOT token in the hostfile is
>       SLOT=1@2:1,3:1-4, the slot_list string is: 2:1,3:7-9.
> 
> 2. In the RDS hostfile component:
>    a. Added a new token, SLOT, to the lex parser.
>    b. Fill the orte_ras_proc_t structure list according to the SLOT
>       token in the hostfile.
> 
> 3. In the RMAPS round robin component:
>    a. Added a new member to the orte_mapped_node_t structure -
>       slot_list (similar to the slot_list in the orte_ras_proc_t
>       structure).
>    b. In orte_rmaps_rr_map, map the job according to hostfile ranks
>       before mapping the job by slot or by node.
>    c. In orte_rmaps_rr_map, arrange the MCW ranks according to the
>       hostfile.
> 
> 4. In the ODLS default module:
>    a. Added slot_list to orte_odls_default_get_add_procs_data.
>    b. Added slot_list to orte_odls_default_launch_local_procs.
>    c. Added a new member to the child structure: a cpu_set bitmap
>       (for PLPA).
>    d. Added mapping of the slot_list string to a cpu_set bitmap in
>       the child structure.
> 
> For more details you can browse the code.
> 
> I would like to merge these changes to the trunk as soon as possible
> since, as I understood from Ralph Castain's emails, Open RTE will go
> through a lot of changes in the near future, and since this is a
> relatively small change I want to merge it before the big change.
> 
> Any comments?
> 
> Sharon.
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel





[OMPI devel] ticket 1023

2007-07-10 Thread Sharon Melamed
Hello All,

 

In the past few weeks I implemented ticket 1023
(https://svn.open-mpi.org/trac/ompi/ticket/1023).

In a few words, the purpose of ticket 1023 is to expand the hostfile
syntax to precisely specify slot location (in terms of virtual CPU ID
or socket:core notation) in the node and/or rank in MCW
(MPI_COMM_WORLD).

 

The code is in a temporary branch
https://svn.open-mpi.org/svn/ompi/tmp/sharon/ 

The changes are:

1. In the RAS base component:

   a. Added a new list of orte_ras_proc_t structures.

   b. Each orte_ras_proc_t structure contains 3 members: node_name,
      rank and cpu_list.

   c. The cpu_list is a string representing the slot list from the
      hostfile, i.e.: if the SLOT token in the hostfile is
      SLOT=1@2:1,3:1-4, the slot_list string is: 2:1,3:7-9.

 

2. In the RDS hostfile component:

   a. Added a new token, SLOT, to the lex parser.

   b. Fill the orte_ras_proc_t structure list according to the SLOT
      token in the hostfile.

 

3. In the RMAPS round robin component:

   a. Added a new member to the orte_mapped_node_t structure -
      slot_list (similar to the slot_list in the orte_ras_proc_t
      structure).

   b. In orte_rmaps_rr_map, map the job according to hostfile ranks
      before mapping the job by slot or by node.

   c. In orte_rmaps_rr_map, arrange the MCW ranks according to the
      hostfile.

 

4. In the ODLS default module:

   a. Added slot_list to orte_odls_default_get_add_procs_data.

   b. Added slot_list to orte_odls_default_launch_local_procs.

   c. Added a new member to the child structure: a cpu_set bitmap
      (for PLPA).

   d. Added mapping of the slot_list string to a cpu_set bitmap in
      the child structure (see the sketch after this list).
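
A minimal sketch of what the mapping in 4d could look like - this is
not the code in the branch; it assumes the slot list has already been
flattened to linear processor IDs (e.g., "2,7-9"), and the helper name
bind_to_slot_list is invented:

   /* Sketch only: bind the calling (just-forked) process to the CPUs
    * named in a flattened slot list such as "2,7-9".  Resolving
    * socket:core pairs to linear processor IDs is omitted here. */
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <plpa.h>

   static int bind_to_slot_list(const char *slot_list)
   {
       plpa_cpu_set_t mask;
       char *copy = strdup(slot_list), *tok, *save;
       int lo, hi, cpu;

       if (NULL == copy) return -1;
       PLPA_CPU_ZERO(&mask);
       for (tok = strtok_r(copy, ",", &save); NULL != tok;
            tok = strtok_r(NULL, ",", &save)) {
           if (2 == sscanf(tok, "%d-%d", &lo, &hi)) {
               for (cpu = lo; cpu <= hi; ++cpu) {
                   PLPA_CPU_SET(cpu, &mask);    /* CPU ID range */
               }
           } else {
               PLPA_CPU_SET(atoi(tok), &mask);  /* single CPU ID */
           }
       }
       free(copy);
       /* pid 0 means "the calling process" */
       return plpa_sched_setaffinity(0, sizeof(mask), &mask);
   }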

 

For more details you can browse the code.

 

I would like to merge these changes to the trunk as soon as possible
since, as I understood from Ralph Castain's emails, Open RTE will go
through a lot of changes in the near future, and since this is a
relatively small change I want to merge it before the big change.

 

Any comments?

 

Sharon.

 

   



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Tim Prins

Jeff Squyres wrote:
2. The "--enable-mca-no-build" option takes a comma-delimited list of  
components that will then not be built.  Granted, this option isn't  
exactly intuitive, but it was the best that we could think of at the  
time to present a general solution for inhibiting the build of a  
selected list of components.  Hence, "--enable-mca-no-build=pls- 
slurm,ras-slurm" would inhibit building the SLURM RAS and PLS  
components (note that the SLURM components currently do not require  
any additional libraries, so a) there is no corresponding --with[out]- 
slurm option, and b) they are usually always built).


Actually, there are --with-slurm/--without-slurm options. We default to 
building slurm support automatically on linux and aix, but not on other 
platforms.


Tim



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralph H Castain
Actually, I was talking specifically about configuration at build time. I
realize there are trade-offs here, and suspect we can find a common ground.
The problem with using the options Jeff described is that they require
knowledge on the part of the builder as to what environments have had their
include files/libraries installed on the file system of this particular
machine. And unfortunately, not every component is protected by these
"sentinel" variables, nor does it appear possible to do so in a "guaranteed
safe" manner.

Note that I didn't say "installed on their machine". In most cases, these
alternative environments are not currently installed at all - they are stale
files, or were placed on the file system by someone that wanted to look at
their documentation, or whatever. The problem is that Open MPI blindly picks
them up and attempts to use them, with sometimes disastrous and
frequently unpredictable results.

Hence, the user can be "astonished" to find that an application that worked
perfectly yesterday suddenly segfaults today - because someone decided one
day, for example, to un-tar the bproc files in a public place where we pick
them up, and then someone else (perhaps a sys admin or the user themselves)
at some later time rebuilt Open MPI to bring in an update.

Now imagine being a software provider who gets the call about a problem
with Open MPI and has to figure out what the heck happened...

My suggested solution may not be the best, which is why I put it out there
for discussion. One alternative might be for us to instruct sys admins to
put MCA params in their default param file that force selection of the
proper components for each framework. Thus, someone with an lsf system would
enter:  pls=lsf ras=lsf sds=lsf in their config file to ensure that only lsf
was used.
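
For concreteness, a sketch of what those entries would look like in the
system-wide parameter file (normally $prefix/etc/openmpi-mca-params.conf),
using the component names from the paragraph above:

   # Force the LSF components so that nothing else gets selected:
   pls = lsf
   ras = lsf
   sds = lsf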

The negative to that approach is that we would have to warn everyone any
time that list changed (e.g., a new component for a new framework). Another
option to help that problem, of course, would be to set one mca param (say
something like "enviro=lsf") that we would use internally in Open MPI to set
the individual components correctly - i.e., we would hold the list of
relevant frameworks internally since (hopefully) we know what they should be
for a given environment.

Anyway, I'm glad people are looking at this and suggesting solutions. It is
a problem that seems to be biting us recently and may become a bigger issue
as the user community grows.

Ralph


On 7/10/07 6:12 AM, "Bogdan Costescu" wrote:

> On Tue, 10 Jul 2007, Jeff Squyres wrote:
> 
>> Do either of these work for you?
> 
> Will report back in a bit, I'm now in the middle of an OS upgrade on
> the cluster.
> 
> But my question was more like: is this a configuration that should
> theoretically work ? Or in other words, are there known dependencies
> on rsh that would make a rsh-less build not work or work with reduced
> functionality ?
> 
>> Most batch systems today set a sentinel environment variable that we
>> check for.
> 
> I think that we talk about slightly different things - my impression
> was that the OP was asking about detection at config time, while your
> statements make perfect sense to me if they are relative to detection
> at run-time. If the OP was indeed asking about run-time detection,
> then I apologize for the time you wasted on reading and replying to my
> questions...
> 
>> That's what the compile-time vs. run-time detection and selection is
>> supposed to be for.
> 
> Yes, I understand that, it's the same type of mechanism as in LAM/MPI
> which it's not that foreign to me ;-)




Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Jeff Squyres

On Jul 10, 2007, at 6:07 AM, Bogdan Costescu wrote:


For example, I can readily find machines that are running TM, but
also have LSF and SLURM libraries installed (although those
environments are not "active" - the libraries in some cases are old
and stale, usually present because either someone wanted to look at
them or represent an old installation).


Whatever the outcome of this discussion is, please keep in mind that
this represents an exception rather than the rule. So the common cases
of no batch environment or one batch environment installed should work
as effortless as possible. Furthermore, keep in mind that there are
lots of people who don't compile themselves Open MPI, but rely on
packages compiled by others (Linux distributions, most likely) - so
don't make life harder for those who produce these packages.


FWIW, this is exactly the reason that we have the "auto as much as  
possible" behavior today; back in LAM/MPI, we had the problem that  
[many] users would say "I built LAM, but it doesn't support ABC, even  
though your manual says that it does!  LAM's a piece of junk!"  The  
sad fact is that most people assume that "./configure && make  
install" will do all the Right magic for their system; efforts at  
education seemed to fail.  So we took the path of least resistance  
and assumed that if we can find it on your system, we should use it.   
Specifically: it was more of a support issue than anything else.



1. ... we would only build support for those environments that the
builder specifies, and error out of the build process if multiple
conflicting environments are specified.


I think that Ralf's suggestion (auto unless forced) is better, as it
allows:
- a better chance of finding the environments for people who don't
have too much experience with building Open MPI or hate to RTFM
- control over what is built or not for people who know what they
are doing


This raises the issue of what to do with rsh, but I think we can
handle that one by simply building it wherever possible.


I've been meaning to ask this for some time: is it possible to get rid
of rsh support when building/running in an environment where rsh is
not used (like a TM-based one) ? I'm not trying to achieve security by
doing this (after all, a user can build a separate copy of Open MPI
with rsh support), but just to make sure that the programs that I
build are either using the "blessed" start-up mechanism or error out.


Do either of these work for you?

1. Use the --enable-mca-no-build option as I discussed in a mail a  
few minutes ago.

2. Remove the "mca_pls_rsh.*" files in $prefix/lib/openmpi.

2. We could laboriously go through all the components and ensure that
they check in their selection logic to see if that environment is
active.


I might be missing something in the design of batch systems or
software in general, but how do you decide that an environment is
active or not ?


Most batch systems today set a sentinel environment variable that we  
check for.
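
As a hedged illustration of that kind of check - the variables below
are the usual sentinels for each system, though a real component may
test more than this:

   /* Sketch only: run-time detection of an active batch environment. */
   #include <stdlib.h>

   static int running_under_slurm(void) { return NULL != getenv("SLURM_JOBID"); }
   static int running_under_tm(void)    { return NULL != getenv("PBS_JOBID");   }
   static int running_under_lsf(void)   { return NULL != getenv("LSB_JOBID");   }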



Can a library check if it's being used in a program ?
Or if that program actually runs ? And if a configuration file exists,
does it mean that the environment is actually active ?


We do not generally assume that the presence of a plugin means that  
that plugin can run in the current environment.  I thought that all  
framework selection logic was adapted to this philosophy, but  
apparently Ralph is indicating that some do not.  :-)



How to deal
with the case where there are several versions of the same batch
system installed, all using the same configuration files and therefore
being ready to run ?


We assume that Open MPI was built compiling/linking against the Right  
version.  There's not much else we can do if you build against the  
Wrong version.



And how about the case where there is a machine
reserved for compilations, where libraries are made available but
there is no batch system active ?


That's what the compile-time vs. run-time detection and selection is  
supposed to be for.  The presence of an OMPI component at run-time is  
not supposed to mean that it can run; it's supposed to be queried and  
the component can do whatever checks it wants to see if it is  
supposed to run, and then report "Yes, I can run" / "No, I cannot  
run" back to Open MPI.


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralf Wildenhues
* Jeff Squyres wrote on Tue, Jul 10, 2007 at 01:28:40PM CEST:
> On Jul 10, 2007, at 2:42 AM, Ralf Wildenhues wrote:
> 
> >> 1. The most obvious one (to me, at least) is to require that  
> >> people provide
> >> "--with-xx" when they build the system.
> >
> > I'll throw in another one for good measure: If --with-xx is given,
> > build with the component.  If --without-xx is given, disable it.
> > If neither is given, do as you currently do: enable it if you find
> > suitable libraries.
> 
> FWIW, we have this already:
[...]

Ah good, I must confess I wasn't aware of this existing functionality
(I don't need it myself).  Thanks.

> > In case the number of components gets too large, have a switch to
> > turn off automatic discovery even in the absence of --with* flags.
> 
> Did you mean the equivalent of the --enable-mca-no-build switch, or  
> disable *all* automatic discovery?

I meant: disable all automatic discovery.  But you guys are in a much
better position to decide which way is more useful.  All I wanted is for
you to be aware of existing possibilities (which you are, but that
wasn't obvious to me).

Cheers,
Ralf


Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Jeff Squyres

On Jul 10, 2007, at 2:42 AM, Ralf Wildenhues wrote:

1. The most obvious one (to me, at least) is to require that people
provide "--with-xx" when they build the system.


I'll throw in another one for good measure: If --with-xx is given,
build with the component.  If --without-xx is given, disable it.
If neither is given, do as you currently do: enable it if you find
suitable libraries.


FWIW, we have this already:

1. If a particular component needs some additional libraries such  
that we added a "--with-foo" switch to specify where those libraries  
can be found, there is also an implicit "--without-foo" switch that  
will disable that component.  E.g., "--without-tm" will inhibit the  
building of the TM RAS and PLS components.


2. The "--enable-mca-no-build" option takes a comma-delimited list of  
components that will then not be built.  Granted, this option isn't  
exactly intuitive, but it was the best that we could think of at the  
time to present a general solution for inhibiting the build of a  
selected list of components.  Hence, "--enable-mca-no-build=pls- 
slurm,ras-slurm" would inhibit building the SLURM RAS and PLS  
components (note that the SLURM components currently do not require  
any additional libraries, so a) there is no corresponding --with[out]- 
slurm option, and b) they are usually always built).
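
Concretely, the two mechanisms look like this on the configure command
line (other arguments elided):

   # Inhibit the TM RAS and PLS components via the implicit flag:
   ./configure --without-tm ...

   # Inhibit the SLURM RAS and PLS components by name:
   ./configure --enable-mca-no-build=pls-slurm,ras-slurm ...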



In case the number of components gets too large, have a switch to
turn off automatic discovery even in the absence of --with* flags.


Did you mean the equivalent of the --enable-mca-no-build switch, or  
disable *all* automatic discovery?  I'm not sure that disabling all  
automatic discovery will be useful -- you'd have to specifically list  
each component that would be built, and that list would be pretty  
darn long...


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Bogdan Costescu

On Mon, 9 Jul 2007, Ralph Castain wrote:

For example, I can readily find machines that are running TM, but 
also have LSF and SLURM libraries installed (although those 
environments are not "active" - the libraries in some cases are old 
and stale, usually present because either someone wanted to look at 
them or represent an old installation).


Whatever the outcome of this discussion is, please keep in mind that 
this represents an exception rather than the rule. So the common cases 
of no batch environment or one batch environment installed should work 
as effortless as possible. Furthermore, keep in mind that there are 
lots of people who don't compile themselves Open MPI, but rely on 
packages compiled by others (Linux distributions, most likely) - so 
don't make life harder for those who produce these packages.


1. ... we would only build support for those environments that the 
builder specifies, and error out of the build process if multiple 
conflicting environments are specified.


I think that Ralf's suggestion (auto unless forced) is better, as it 
allows:
- a better chance of finding the environments for people who don't 
have too much experience with building Open MPI or hate to RTFM
- control over what is built or not for people who know what they 
are doing


This raises the issue of what to do with rsh, but I think we can 
handle that one by simply building it wherever possible.


I've been meaning to ask this for some time: is it possible to get rid 
of rsh support when building/running in an environment where rsh is 
not used (like a TM-based one) ? I'm not trying to achieve security by 
doing this (after all, a user can build a separate copy of Open MPI 
with rsh support), but just to make sure that the programs that I 
build are either using the "blessed" start-up mechanism or error out.



2. We could laboriously go through all the components and ensure that they
check in their selection logic to see if that environment is active.


I might be missing something in the design of batch systems or 
software in general, but how do you decide that an environment is 
active or not ? Can a library check if it's being used in a program ? 
Or if that program actually runs ? And if a configuration file exists, 
does it mean that the environment is actually active ? How to deal 
with the case where there are several versions of the same batch 
system installed, all using the same configuration files and therefore 
being ready to run ? And how about the case where there is a machine 
reserved for compilations, where libraries are made available but 
there is no batch system active ?


--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de


Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralf Wildenhues
Hello Ralph,

* Ralph Castain wrote on Tue, Jul 10, 2007 at 03:51:06AM CEST:
> 
> The problem is that our Open MPI build system automatically detects the
> presence of those libraries, builds the corresponding components, and then
> links those libraries into our system. Unfortunately, this causes two
> side-effects:
[...]
> A couple of solutions come immediately to mind:
> 
> 1. The most obvious one (to me, at least) is to require that people provide
> "--with-xx" when they build the system.

I'll throw in another one for good measure: If --with-xx is given,
build with the component.  If --without-xx is given, disable it.
If neither is given, do as you currently do: enable it if you find
suitable libraries.

In case the number of components gets too large, have a switch to
turn off automatic discovery even in the absence of --with* flags.

It may be a bit more work on the Open MPI configury, but it may be
more convenient for your users.

2 cents from somebody who's not going to have to implement it.  ;-)

Cheers,
Ralf