Re: [OMPI devel] Configure and contrib pkgs

2008-03-03 Thread Jeff Squyres

On Mar 3, 2008, at 1:16 PM, Ralph H Castain wrote:

We have several options in configure that take lists as their argument. Yet
there appears to be no way for a user to find out valid members for those
lists.

For example, we have the option --enable-contrib-no-build. Is there some way
that the user and/or sys admin can find out what contributed packages are in
this distribution? Do they have to just leaf through the source code,
assuming that they know where the contributed packages are stored?


No.  This is probably worth a feature request.

Also, is there some way that configure can output "here are the packages I
am going to build" in a more concise format than we currently get? I'm not
knocking the current output, but it is rather verbose and hard to find just
the list of what is going to be built.


This is probably also worth a feature request.


I'm wondering about this since we now have so many things building by
default - if a user wants to -not- build them, they first have to (a) know
that they are being built, and (b) figure out how to tell configure not to
build them.

It isn't very obvious at the moment - at least, not to me, and I'm coming
across cases where people are simply misconfiguring the system out of
ignorance and/or not realizing what the system is doing.


Yep -- all good reasons.



Do we need to do something about this?

Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17681

2008-03-03 Thread Ralph H Castain
./orte/mca/errmgr/errmgr.h(135): warning #1286: invalid attribute for "orte_errmgr_base_module_abort_fn_t"

typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...) __opal_attribute_format__(__printf__, 2, 3);

I think the issue is that you can't apply attributes to the typedef for the
function pointer, but only to the particular instance of that function. At
one time, we also had an attribute indicating that a function did not
return. This would also apply to this function, but IIRC you cannot have
multiple attributes (or at least, we used to run into problems with it).

So I figured I would just let this ride for now and re-address it later.
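A minimal sketch of the placement Ralph describes, with the printf-style format attribute attached to a concrete function declaration rather than to the function-pointer typedef. It uses GCC's raw `__attribute__` syntax instead of OPAL's `__opal_attribute_format__` wrapper, and `demo_abort_report` / `demo_abort_fn_t` are hypothetical stand-ins (the sketch prints and returns instead of exiting so it stays testable). It does not settle whether a given compiler accepts the attribute on the typedef; the warning above only suggests that at least one compiler rejects it there.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for an errmgr abort routine. Argument 2 is the
 * printf-style format string and the variadic arguments start at 3, the
 * same (2, 3) indices used in errmgr.h. */
static int demo_abort_report(int error_code, const char *fmt, ...)
    __attribute__((format(printf, 2, 3)));

static int demo_abort_report(int error_code, const char *fmt, ...)
{
    va_list ap;

    fprintf(stderr, "[abort %d] ", error_code);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);

    /* A real abort would clean up and exit; returning keeps this testable. */
    return error_code;
}

/* The pointer typedef carries no attribute, matching the r17681 change. */
typedef int (*demo_abort_fn_t)(int error_code, const char *fmt, ...);

/* Module-style entry that points at the attributed function. */
static const demo_abort_fn_t demo_abort_fn = demo_abort_report;
```

With GCC's `-Wformat` (part of `-Wall`), a direct call such as `demo_abort_report(1, "%s", 42)` is diagnosed, while the same mistake made through `demo_abort_fn` is not, because the pointer type carries no attribute. That is the loss of caller checking Rainer points out.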

On 3/3/08 10:21 AM, "Rainer Keller"  wrote:

> Ralph,
> On Monday 03 March 2008 17:06, r...@osl.iu.edu wrote:
>> Author: rhc
>> Date: 2008-03-03 11:06:47 EST (Mon, 03 Mar 2008)
>> New Revision: 17681
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17681
>> 
>> Log:
>> Cleanup an attribute warning - not sure which one to set or where it should
>> go, so I'll leave that to someone more familiar with "attributes".
>> 
>> Ensure some debugging is only enabled when have_debug is set.
> 
>> Modified: trunk/orte/mca/errmgr/errmgr.h
>> ===
>> --- trunk/orte/mca/errmgr/errmgr.h (original)
>> +++ trunk/orte/mca/errmgr/errmgr.h 2008-03-03 11:06:47 EST (Mon, 03 Mar 2008)
>> @@ -132,7 +132,7 @@
>>   * itself, and then exit - it takes no other actions. The intent here is to provide
>>   * a last-ditch exit procedure that attempts to clean up a little.
>>   */
>> -typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...) __opal_attribute_format__(__printf__, 2, 3);
>> +typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...);
> 
> What was the warning requiring you to get rid of the attribute?
> 
> The attribute should help find errors in the callers to
> orte_errmgr_base_abort...
> 
> Maybe the help on
>   https://svn.open-mpi.org/trac/ompi/wiki/CompilerAttributes
> could be improved?
> 
> 
> Thanks,
> Rainer




Re: [OMPI devel] Ticket 1224: disable early completion in v1.2.x series

2008-03-03 Thread Jeff Squyres

On Mar 3, 2008, at 12:48 PM, Shipman, Galen M. wrote:


Unfortunately this adds an "if" to the critical path.
You should at least use OPAL_UNLIKELY.


I could have sworn there was no OPAL_UNLIKELY in the 1.2 series, which  
is why I didn't add it.  But I just checked right now and I see that  
it's there.  Doh!  So I'll add 2 UNLIKELY's and one LIKELY to the  
patch and amend the ticket (i.e., default to "will probably use early  
completion").
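The LIKELY/UNLIKELY idea can be sketched as follows, assuming OPAL_LIKELY and OPAL_UNLIKELY are thin wrappers around GCC's `__builtin_expect`. The `DEMO_` macros, the `demo_use_early_completion` flag and `demo_complete_send` (including its `local_done` condition) are hypothetical and are not OB1's actual code. The hint does not remove the extra `if`; it only tells the compiler which outcome to expect, so it can lay out the default (early-completion) path as the fast one.

```c
#if defined(__GNUC__)
/* Hint the expected outcome; the value of x is unchanged. */
#define DEMO_LIKELY(x)   __builtin_expect(!!(x), 1)
#define DEMO_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define DEMO_LIKELY(x)   (x)
#define DEMO_UNLIKELY(x) (x)
#endif

/* Hypothetical stand-in for the pml_ob1_use_early_completions MCA value;
 * 1 preserves the existing v1.2 behavior. */
static int demo_use_early_completion = 1;

/* Returns 1 if the send request would be completed early, 0 if it waits for
 * the normal completion path. local_done is a hypothetical condition under
 * which early completion would be allowed. */
static int demo_complete_send(int local_done)
{
    if (DEMO_LIKELY(demo_use_early_completion) && local_done) {
        return 1; /* expected fast path */
    }
    return 0;     /* early completion disabled or not yet possible */
}
```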


Before adding the UN/LIKELY's, I ran the following tests:

slightly older hardware (pre-woodcrest), netpipe 3.7.1, 1 byte sends:

1.63us with patch, disabled (use_early_completion==0)
 --> saw lots of 1.6xus and 1.9xus results -- nothing in between

1.54us with patch, enabled (use_early_completion==1)
 --> mostly 1.5x, 1.6x, 1.7xus results -- never 1.8x or higher

Saw about the same results with vanilla 1.2.5 (no patch) as with
use_early_completion==1 -- the difference is in the noise.


If someone else could verify these results, it would be great.




On Mar 3, 2008, at 12:28 PM, Jeff Squyres wrote:


The topic of the "early completion" behavior in OB1 for IB
optimizations has come up several times in the v1.2 series (it causes
problems in some scenarios).

- leave the default the way it is now (early completions enabled)
- add an MCA parameter for disabling early completions

I mention this now because I had a customer complain about it over the
weekend.  :-)

Gleb and I propose the patch in https://svn.open-mpi.org/trac/ompi/ticket/1224
for the v1.2 series.  The new OB1 MCA parameter
pml_ob1_use_early_completions defaults to 1 (preserving the same
behavior as the rest of the v1.2 series), but it can be set to 0 if
the early completions on IB are creating problems for specific
applications.
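The intended semantics of the new parameter (default 1, set to 0 to disable early completions) can be sketched as below. This is a hypothetical simplification: it assumes only Open MPI's convention that an MCA parameter can be supplied as an `OMPI_MCA_<name>` environment variable and reads it with `getenv`, whereas the real OB1 code registers the parameter through the MCA parameter framework. `demo_use_early_completions` is an illustrative name.

```c
#include <stdlib.h>

/* Hypothetical reader for the ticket 1224 parameter. Returns 1 (early
 * completions enabled, the existing v1.2 behavior) unless the environment
 * sets the value to 0. Note that atoi() also maps non-numeric text to 0. */
static int demo_use_early_completions(void)
{
    const char *val = getenv("OMPI_MCA_pml_ob1_use_early_completions");

    if (NULL == val || '\0' == val[0]) {
        return 1; /* unset: keep the default */
    }
    return 0 != atoi(val); /* any nonzero integer keeps it enabled */
}
```

On the command line, the same setting would normally be given as `mpirun --mca pml_ob1_use_early_completions 0 ...`.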

It would be good to get this functionality in a real release (e.g.,
v1.2.6).

Note that this MCA parameter is not necessary for the upcoming v1.3
series because of changes in ob1 and the openib btl.

--
Jeff Squyres
Cisco Systems






--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [RFC] Default hostfile MCA param

2008-03-03 Thread Ralph H Castain
I personally have no objection, but I would ask then that the wiki be
modified to cover this case. All I require is that someone define the syntax
to be used to indicate "this is a node I do -not- want used", or
alternatively a flag that indicates "all nodes below are -not- to be used".

Implementation isn't too hard once I have that...
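Once a syntax exists, the check Edgar asks for is small. The sketch below assumes the default hostfile has already been parsed into a hypothetical list of excluded node names. The thread deliberately leaves the hostfile syntax undefined, so none is shown here, and `demo_check_excluded` and its return convention are likewise hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the first requested node that appears in the
 * excluded list, or -1 if none does. In this sketch the caller would
 * report an error naming that node instead of launching the job. */
static int demo_check_excluded(const char *const *requested, size_t n_requested,
                               const char *const *excluded, size_t n_excluded)
{
    for (size_t i = 0; i < n_requested; ++i) {
        for (size_t j = 0; j < n_excluded; ++j) {
            if (0 == strcmp(requested[i], excluded[j])) {
                return (int)i;
            }
        }
    }
    return -1;
}
```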


On 3/3/08 9:44 AM, "Edgar Gabriel"  wrote:

> Ralph,
> 
> could this mechanism also be used to exclude a node, indicating that a job
> should never run there? Here is the problem that I face quite often: students
> working on their homework forget to allocate a partition on the cluster
> and just type mpirun. Because of that, all jobs end up running on the
> front-end node.
> 
> If we now had the ability to specify in a default hostfile that a job must
> never run on a specified node (e.g. the front-end node), users would get an
> error message when trying to do that. I am aware that that's a little
> ugly...
> 
> Thanks
> edgar
> 
> Ralph Castain wrote:
>> I forget all the formatting we are supposed to use, so I hope you'll all
>> just bear with me.
>> 
>> George brought up the fact that we used to have an MCA param to specify a
>> hostfile to use for a job. The hostfile behavior described on the wiki,
>> however, doesn't provide for that option. It associates a hostfile with a
>> specific app_context, and provides a detailed hierarchical layout of how
>> mpirun is to interpret that information.
>> 
>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
>> to replace the deprecated capability. If found, the system's behavior will
>> be:
>> 
>> 1. in a managed environment, the default hostfile will be used to filter the
>> discovered nodes to define the available node pool. Any hostfile and/or dash
>> host options provided to an app_context will be used to further filter the
>> node pool to define the specific nodes for use by that app_context. Thus,
>> nodes in the hostfile and dash host options given to an app_context -must-
>> also be in the default hostfile in order to be available for use by that
>> app_context - any nodes in the app_context options that are not in the
>> default hostfile will be ignored.
>> 
>> 2. in an unmanaged environment, the default hostfile will be used to define
>> the available node pool. Any hostfile and/or dash host options provided to
>> an app_context will be used to filter the node pool to define the specific
>> nodes for use by that app_context, subject to the previous caveat. However,
>> add-hostfile and add-host options will add nodes to the node pool for use
>> -only- by the associated app_context.
>> 
>> 
>> I believe this proposed behavior is consistent with that described on the
>> wiki, and would be relatively easy to implement. If nobody objects, I will
>> do so by end-of-day 3/6.
>> 
>> Comments, suggestions, objections - all are welcome!
>> Ralph
>> 
>> 
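The node-pool rules in the RFC above can be sketched as set operations over node names. This is a minimal illustration with hypothetical names (`demo_filter_nodes`, `demo_app_nodes`) using plain string arrays; it is not ORTE's actual allocation or mapping code. In a managed environment the pool would be the discovered nodes filtered by the default hostfile (rule 1); in an unmanaged one the pool is the default hostfile itself (rule 2). Either way, an app_context's hostfile/dash-host nodes survive only if they are in the pool, and add-host nodes go into that app_context's list only, leaving the shared pool untouched.

```c
#include <stddef.h>
#include <string.h>

static int demo_contains(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; ++i) {
        if (0 == strcmp(list[i], name)) {
            return 1;
        }
    }
    return 0;
}

/* Copy into out every entry of in that also appears in filter; entries not
 * in filter are ignored. out must hold n_in entries. Returns the count. */
static size_t demo_filter_nodes(const char *const *in, size_t n_in,
                                const char *const *filter, size_t n_filter,
                                const char **out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_in; ++i) {
        if (demo_contains(filter, n_filter, in[i])) {
            out[n_out++] = in[i];
        }
    }
    return n_out;
}

/* Nodes for one app_context: its requested nodes filtered by the pool, plus
 * any add-host nodes (unmanaged case, rule 2), which are appended to this
 * app_context's list only. out must hold n_app + n_add entries. */
static size_t demo_app_nodes(const char *const *pool, size_t n_pool,
                             const char *const *app, size_t n_app,
                             const char *const *add, size_t n_add,
                             const char **out)
{
    size_t n_out = demo_filter_nodes(app, n_app, pool, n_pool, out);
    for (size_t i = 0; i < n_add; ++i) {
        out[n_out++] = add[i];
    }
    return n_out;
}
```

For rule 1, the first call filters the discovered nodes by the default hostfile, and its output is then passed as `pool` to `demo_app_nodes`.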




Re: [OMPI devel] Ticket 1224: disable early completion in v1.2.x series

2008-03-03 Thread Shipman, Galen M.

Unfortunately this adds an "if" to the critical path.
You should at least use OPAL_UNLIKELY.


On Mar 3, 2008, at 12:28 PM, Jeff Squyres wrote:


The topic of the "early completion" behavior in OB1 for IB
optimizations has come up several times in the v1.2 series (it causes
problems in some scenarios).

- leave the default the way it is now (early completions enabled)
- add an MCA parameter for disabling early completions

I mention this now because I had a customer complain about it over the
weekend.  :-)

Gleb and I propose the patch in https://svn.open-mpi.org/trac/ompi/ticket/1224
for the v1.2 series.  The new OB1 MCA parameter
pml_ob1_use_early_completions defaults to 1 (preserving the same
behavior as the rest of the v1.2 series), but it can be set to 0 if
the early completions on IB are creating problems for specific
applications.

It would be good to get this functionality in a real release (e.g.,
v1.2.6).

Note that this MCA parameter is not necessary for the upcoming v1.3
series because of changes in ob1 and the openib btl.

--
Jeff Squyres
Cisco Systems





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17681

2008-03-03 Thread Rainer Keller
Ralph,
On Monday 03 March 2008 17:06, r...@osl.iu.edu wrote:
> Author: rhc
> Date: 2008-03-03 11:06:47 EST (Mon, 03 Mar 2008)
> New Revision: 17681
> URL: https://svn.open-mpi.org/trac/ompi/changeset/17681
>
> Log:
> Cleanup an attribute warning - not sure which one to set or where it should
> go, so I'll leave that to someone more familiar with "attributes".
>
> Ensure some debugging is only enabled when have_debug is set.

> Modified: trunk/orte/mca/errmgr/errmgr.h
> ===
> --- trunk/orte/mca/errmgr/errmgr.h (original)
> +++ trunk/orte/mca/errmgr/errmgr.h 2008-03-03 11:06:47 EST (Mon, 03 Mar 2008)
> @@ -132,7 +132,7 @@
>   * itself, and then exit - it takes no other actions. The intent here is to provide
>   * a last-ditch exit procedure that attempts to clean up a little.
>   */
> -typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...) __opal_attribute_format__(__printf__, 2, 3);
> +typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...);

What was the warning requiring you to get rid of the attribute?

The attribute should help find errors in the callers to 
orte_errmgr_base_abort...

Maybe the help on 
  https://svn.open-mpi.org/trac/ompi/wiki/CompilerAttributes
could be improved?


Thanks,
Rainer
-- 

Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
 HLRS                      Tel: ++49 (0)711-685 6 5858
 Nobelstrasse 19           Fax: ++49 (0)711-685 6 5832
 70550 Stuttgart           email: kel...@hlrs.de
 Germany                   AIM/Skype: rusraink


[OMPI devel] [RFC] Removal of orte_proc_table

2008-03-03 Thread Tim Prins

WHAT: Removal of orte_proc_table

WHY: It is the last 'orte' class, its implementation is an abstraction
violation (it assumes certain things about how the opal_hash_table is
implemented), and not much code is needed to remove it.


WHERE: This will necessitate minor changes in:
btl: tcp, sctp
oob: tcp
routed: unity, tree
grpcomm: basic
iof: svc

TIMEOUT: COB Wednesday, March 5.

DESCRIPTION:
After the orte changes, we were left with only one orte 'class'. So
Ralph and I discussed the possibility of removing it, and found that the
amount of code change needed to do so is relatively minor.


There are also a couple of other good reasons to remove or revamp it:
  1. The way the orte_proc_table was used was confusing (you created an
opal_hash_table and then used it as an orte_proc_table).

  2. It assumed things about the implementation of the opal_hash_table.

So with this in mind, we feel it would be good to remove it.

Attached is a patch that removes the usage of orte_proc_table. If there 
are no objections, I will commit it COB Wednesday (likely with a couple 
minor cleanups).
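The patch replaces `orte_hash_table_get_proc()` with `opal_hash_table_get_value_uint64()` keyed by `orte_util_hash_name()`. The sketch below shows one way such a 64-bit key could be derived, assuming a process name is a 32-bit jobid plus a 32-bit vpid. The `demo_` type and function are hypothetical, and the real `orte_process_name_t` layout and `orte_util_hash_name()` implementation may differ. If the real key is not a lossless packing, distinct names could collide, which is one reason to keep the full-name comparison after the lookup, as the patched `mca_oob_tcp_peer_lookup()` does.

```c
#include <stdint.h>

/* Hypothetical process name: job id plus rank (vpid) within that job. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} demo_proc_name_t;

/* Pack both fields into one 64-bit key: jobid in the high 32 bits, vpid in
 * the low 32 bits. With 32-bit fields this mapping is lossless, so distinct
 * names always yield distinct keys. */
static uint64_t demo_hash_name(const demo_proc_name_t *name)
{
    return ((uint64_t)name->jobid << 32) | (uint64_t)name->vpid;
}
```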


Index: orte/mca/oob/tcp/oob_tcp_peer.c
===
--- orte/mca/oob/tcp/oob_tcp_peer.c (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp_peer.c (working copy)
@@ -55,7 +55,7 @@
 #include "opal/util/net.h"
 #include "opal/util/error.h"

-#include "orte/class/orte_proc_table.h"
+#include "opal/class/opal_hash_table.h"
 #include "orte/util/name_fns.h"
 #include "orte/runtime/orte_globals.h"
 #include "orte/mca/errmgr/errmgr.h"
@@ -216,14 +216,14 @@
 mca_oob_tcp_peer_t * mca_oob_tcp_peer_lookup(const orte_process_name_t* name)
 {
 int rc;
-mca_oob_tcp_peer_t * peer, *old;
+mca_oob_tcp_peer_t * peer = NULL, *old;
 if (NULL == name) { /* can't look this one up */
 return NULL;
 }

 OPAL_THREAD_LOCK(&mca_oob_tcp_component.tcp_lock);
-peer = (mca_oob_tcp_peer_t*)orte_hash_table_get_proc(
-   &mca_oob_tcp_component.tcp_peers, name);
+opal_hash_table_get_value_uint64(&mca_oob_tcp_component.tcp_peers,
+ orte_util_hash_name(name), (void**)&peer);
 if (NULL != peer && 0 == orte_util_compare_name_fields(ORTE_NS_CMP_ALL,
 &peer->peer_name, name)) {
 OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
 return peer;
@@ -247,8 +247,8 @@
 peer->peer_retries = 0;

 /* add to lookup table */
-if(ORTE_SUCCESS != orte_hash_table_set_proc(&mca_oob_tcp_component.tcp_peers,
-&peer->peer_name, peer)) {
+if(OPAL_SUCCESS != opal_hash_table_set_value_uint64(&mca_oob_tcp_component.tcp_peers,
+orte_util_hash_name(&peer->peer_name), peer)) {
 MCA_OOB_TCP_PEER_RETURN(peer);
 OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
 return NULL;
Index: orte/mca/oob/tcp/oob_tcp_peer.h
===
--- orte/mca/oob/tcp/oob_tcp_peer.h (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp_peer.h (working copy)
@@ -93,7 +93,7 @@
 #define MCA_OOB_TCP_PEER_RETURN(peer)   \
 {   \
 mca_oob_tcp_peer_shutdown(peer);\
-orte_hash_table_remove_proc(&mca_oob_tcp_component.tcp_peers, &peer->peer_name); \
+opal_hash_table_remove_value_uint64(&mca_oob_tcp_component.tcp_peers, orte_util_hash_name(&peer->peer_name)); \
 OPAL_FREE_LIST_RETURN(&mca_oob_tcp_component.tcp_peer_free, \
   &peer->super);\
 }
Index: orte/mca/oob/tcp/oob_tcp.c
===
--- orte/mca/oob/tcp/oob_tcp.c  (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp.c  (working copy)
@@ -50,7 +50,6 @@
 #include "opal/util/net.h"
 #include "opal/class/opal_hash_table.h"

-#include "orte/class/orte_proc_table.h"
 #include "orte/mca/errmgr/errmgr.h"
 #include "orte/mca/rml/rml.h"
 #include "orte/util/name_fns.h"
@@ -1125,12 +1124,12 @@

 int mca_oob_tcp_resolve(mca_oob_tcp_peer_t* peer)
 {
-mca_oob_tcp_addr_t* addr;
+mca_oob_tcp_addr_t* addr = NULL;

 /* if the address is already cached - simply return it */
 OPAL_THREAD_LOCK(&mca_oob_tcp_component.tcp_lock);
-addr = (mca_oob_tcp_addr_t *)orte_hash_table_get_proc(&mca_oob_tcp_component.tcp_peer_names,
- &peer->peer_name);
+opal_hash_table_get_value_uint64(&mca_oob_tcp_component.tcp_peer_names,
+ orte_util_hash_name(&peer->peer_name), (void**)&addr);
 OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
 if(NULL != addr) {
  mca_oob_tcp_peer_resolved(peer, addr);
@@ -1459,23 +1458,26 @@
 int mca_oob_tcp_set_addr(const orte_process_name_t* name, const char* uri)
 {
 struct sockaddr_storage inaddr;
-mca_oob_tcp_addr_t* addr;
-mca_oob_tcp_peer_t* peer;
+mca_oob_tcp_addr_t* addr = NULL;
+mca_oob_tcp_peer_t* 

Re: [OMPI devel] vt configuration issues

2008-03-03 Thread Matthias Jurenz
Hello,

the 'make dist[clean]' problem should be fixed now. To prevent Make from
entering the 'ompi/contrib' directory of a disabled contributed package, the
Makefile variable OMPI_CONTRIB_DIST_SUBDIRS must not be set.
I've tested this fix as follows:

1. configure --enable-contrib-no-build=vt
   make distclean
   -> works
2. configure --enable-contrib-no-build=vt
   make dist   # the tarball doesn't contain VT sources (ompi/contrib/vt/vt)
   tar xfz 
   configure ; make ; make install
   -> works
3. test 1 in VPATH mode
4. test 2 in VPATH mode


Matthias


On Fri, 2008-02-29 at 13:10 +0100, Andreas Knüpfer wrote:

> On Thursday 28 February 2008, Jeff Squyres wrote:
> > I can't remember if I posted about this before or not -- should we
> > disable trunk/VT building by default while the configuration issues
> > are being worked out?
> 
> Which config issues are you referring to? Is it about the 'make distclean'
> that fails if you explicitly disabled VT before?
> 
> I see no easy fix for this, because you will get an incomplete set of
> Makefiles if you ask for an incomplete configure. Yet, by default everything
> should be fine.
> 
> Therefore, I'd like to keep it enabled by default ... well, of course I
> would ;)
> 
> Are there any other issues open with VT config?
> 
> Andreas
> 
> 

--
Matthias Jurenz,
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773

