Re: [OMPI devel] Configure and contrib pkgs
On Mar 3, 2008, at 1:16 PM, Ralph H Castain wrote:

> We have several options in configure that take lists as their argument. Yet there appears to be no way for a user to find out valid members for those lists.
>
> For example, we have the option --enable-contrib-no-build. Is there some way that the user and/or sys admin can find out what contributed packages are in this distribution? Do they have to just leaf through the source code, assuming that they know where the contributed packages are stored?

No. This is probably worth a feature request.

> Also, is there some way that configure can output "here are the packages I am going to build" in a more concise format than we currently get? I'm not knocking the current output, but it is rather verbose and hard to find just the list of what is going to be built.

This is probably also worth a feature request.

> I'm wondering about this since we now have so many things building by default - if a user wants to -not- build them, they first have to (a) know that they are being built, and (b) figure out how to tell configure not to build them. It isn't very obvious at the moment - at least, not to me, and I'm coming across cases where people are simply mis-configuring the system out of ignorance and/or not realizing what the system is doing.

Yep -- all good reasons.

> Do we need to do something about this?
>
> Ralph
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17681
./orte/mca/errmgr/errmgr.h(135): warning #1286: invalid attribute for "orte_errmgr_base_module_abort_fn_t"
typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...) __opal_attribute_format__(__printf__, 2, 3);

I think the issue is that you can't apply attributes to the type def for the function pointer, but only to the particular instance of that function.

At one time, we also had an attribute indicating that a function did not return. This would also apply to this function, but IIRC you cannot have multiple attributes (or at least, we used to run into problems with it). So I figured I would just let this ride for now and re-address it later.

On 3/3/08 10:21 AM, "Rainer Keller" wrote:

> Ralph,
>
> On Monday 03 March 2008 17:06, r...@osl.iu.edu wrote:
>> Author: rhc
>> Date: 2008-03-03 11:06:47 EST (Mon, 03 Mar 2008)
>> New Revision: 17681
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17681
>>
>> Log:
>> Cleanup an attribute warning - not sure which one to set or where it should
>> go, so I'll leave that to someone more familiar with "attributes".
>>
>> Ensure some debugging is only enabled when have_debug is set.
>>
>> Modified: trunk/orte/mca/errmgr/errmgr.h
>> ===
>> --- trunk/orte/mca/errmgr/errmgr.h (original)
>> +++ trunk/orte/mca/errmgr/errmgr.h 2008-03-03 11:06:47 EST (Mon, 03 Mar 2008)
>> @@ -132,7 +132,7 @@
>>   * itself, and then exit - it takes no other actions. The intent here is to provide
>>   * a last-ditch exit procedure that attempts to clean up a little.
>>   */
>> -typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...) __opal_attribute_format__(__printf__, 2, 3);
>> +typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...);
>
> What was the warning requiring You to get rid of the attribute?
>
> The attribute should help find errors in the callers to
> orte_errmgr_base_abort...
>
> Maybe the help on
> https://svn.open-mpi.org/trac/ompi/wiki/CompilerAttributes
> could be improved?
>
> Thanks,
> Rainer
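
For readers following the attribute discussion, here is a minimal sketch of the arrangement Ralph describes: the function-pointer typedef carries no attribute, while the format attribute sits on the declaration of one concrete function. All example_* names and the EXAMPLE_ATTRIBUTE_FORMAT macro are hypothetical stand-ins, not Open MPI code, and the macro only assumes a compiler that defines __GNUC__ and accepts the GCC format attribute.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for __opal_attribute_format__; it expands to
 * nothing on compilers that do not define __GNUC__. */
#if defined(__GNUC__)
#  define EXAMPLE_ATTRIBUTE_FORMAT(kind, fmt_idx, first_arg) \
       __attribute__((__format__(kind, fmt_idx, first_arg)))
#else
#  define EXAMPLE_ATTRIBUTE_FORMAT(kind, fmt_idx, first_arg)
#endif

/* Plain function-pointer type with no attribute attached. */
typedef void (*example_abort_fn_t)(int error_code, char *fmt, ...);

/* The attribute is placed on the function declaration instead, so direct
 * callers of example_abort() still get printf-style argument checking. */
void example_abort(int error_code, char *fmt, ...)
    EXAMPLE_ATTRIBUTE_FORMAT(__printf__, 2, 3);

/* The message is formatted into a buffer so the sketch has no side effects. */
static char example_last_msg[128];

void example_abort(int error_code, char *fmt, ...)
{
    va_list ap;
    int n = snprintf(example_last_msg, sizeof(example_last_msg),
                     "[%d] ", error_code);
    va_start(ap, fmt);
    vsnprintf(example_last_msg + n, sizeof(example_last_msg) - (size_t)n,
              fmt, ap);
    va_end(ap);
}

/* Calls routed through the pointer type are not format-checked. */
static example_abort_fn_t example_abort_module = example_abort;
```

The trade-off is the one Rainer points out: once the attribute is gone from the typedef, calls made through the module pointer lose the compile-time format check, and only direct calls to the attributed function keep it.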
Re: [OMPI devel] Ticket 1224: disable early completion in v1.2.x series
On Mar 3, 2008, at 12:48 PM, Shipman, Galen M. wrote:

> Unfortunately this adds an "if" to the critical path. You should at least use OPAL_UNLIKELY..

I could have sworn there was no OPAL_UNLIKELY in the 1.2 series, which is why I didn't add it. But I just checked right now and I see that it's there. Doh!

So I'll add 2 UNLIKELY's and one LIKELY to the patch and amend the ticket (i.e., default to "will probably use early completion").

Before adding the UN/LIKELY's, I ran the following tests: slightly older hardware (pre-woodcrest), netpipe 3.7.1, 1 byte sends:

1.63us with patch, disabled (use_early_completion==0)
   --> saw lots of 1.6xus and 1.9xus results -- nothing in between
1.54us with patch, enabled (use_early_completion==1)
   --> mostly 1.5x, 1.6x, 1.7xus results -- never 1.8x or higher

Saw about same results with vanilla 1.2.5 (no patch) as with use_early_completion==1 -- in the noise difference.

If someone else could verify these results, it would be great.

> On Mar 3, 2008, at 12:28 PM, Jeff Squyres wrote:
>
>> The topic of the "early completion" behavior in OB1 for IB optimizations has come up several times in the v1.2 series (it causes problems in some scenarios).
>>
>> - leave the default the way it is now (early completions enabled)
>> - add an MCA parameter for disabling early completions
>>
>> I mention this now because I had a customer complain about it over the weekend. :-)
>>
>> Gleb and I propose the patch in https://svn.open-mpi.org/trac/ompi/ticket/1224 for the v1.2 series. The new OB1 MCA parameter pml_ob1_use_early_completions defaults to 1 (preserving the same behavior as the rest of the v1.2 series), but it can be set to 0 if the early completions on IB are creating problems for specific applications.
>>
>> It would be good to get this functionality in a real release (e.g., v1.2.6).
>>
>> Note that this MCA parameter is not necessary for the upcoming v1.3 series because of changes in ob1 and the openib btl.
--
Jeff Squyres
Cisco Systems
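
As a rough illustration of the branch-hint idea in this thread, the sketch below wraps GCC's __builtin_expect in two macros and uses one of them to guard a rarely disabled feature flag. The EXAMPLE_* macros, the flag variable, and the function are hypothetical stand-ins; this is not the ticket 1224 patch, and the real OPAL_LIKELY / OPAL_UNLIKELY definitions may differ.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for OPAL_LIKELY / OPAL_UNLIKELY. On compilers
 * without __builtin_expect they fall back to the bare expression. */
#if defined(__GNUC__)
#  define EXAMPLE_LIKELY(expr)   __builtin_expect(!!(expr), 1)
#  define EXAMPLE_UNLIKELY(expr) __builtin_expect(!!(expr), 0)
#else
#  define EXAMPLE_LIKELY(expr)   (expr)
#  define EXAMPLE_UNLIKELY(expr) (expr)
#endif

/* Hypothetical mirror of the MCA parameter; 1 keeps the default behavior. */
static int example_use_early_completion = 1;

/* Returns true when a send may be marked complete early. */
static bool example_complete_early(bool send_is_eligible)
{
    /* The disabled case is expected to be rare, so it is hinted as unlikely
     * and the common path stays on the fall-through branch. */
    if (EXAMPLE_UNLIKELY(0 == example_use_early_completion)) {
        return false;
    }
    return send_is_eligible;
}
```

The hint changes only how the compiler lays out the branch, not the result, which is why it is a cheap way to soften the cost of an extra "if" on a hot path.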
Re: [OMPI devel] [RFC] Default hostfile MCA param
I personally have no objection, but I would ask then that the wiki be modified to cover this case. All I require is that someone define the syntax to be used to indicate "this is a node I do -not- want used", or alternatively a flag that indicates "all nodes below are -not- to be used".

Implementation isn't too hard once I have that...

On 3/3/08 9:44 AM, "Edgar Gabriel" wrote:

> Ralph,
>
> could this mechanism be used also to exclude a node, indicating to never
> run a job there? Here is the problem that I face quite often: students
> working on the homework forget to allocate a partition on the cluster,
> and just type mpirun. Because of that, all jobs end up running on the
> front-end node.
>
> If we would have now the ability to specify in a default hostfile, to
> never run a job on a specified node (e.g. the front end node), users
> would get an error message when trying to do that. I am aware that
> that's a little ugly...
>
> Thanks
> edgar
>
> Ralph Castain wrote:
>> I forget all the formatting we are supposed to use, so I hope you'll all
>> just bear with me.
>>
>> George brought up the fact that we used to have an MCA param to specify a
>> hostfile to use for a job. The hostfile behavior described on the wiki,
>> however, doesn't provide for that option. It associates a hostfile with a
>> specific app_context, and provides a detailed hierarchical layout of how
>> mpirun is to interpret that information.
>>
>> What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
>> to replace the deprecated capability. If found, the system's behavior will
>> be:
>>
>> 1. in a managed environment, the default hostfile will be used to filter the
>> discovered nodes to define the available node pool. Any hostfile and/or dash
>> host options provided to an app_context will be used to further filter the
>> node pool to define the specific nodes for use by that app_context. Thus,
>> nodes in the hostfile and dash host options given to an app_context -must-
>> also be in the default hostfile in order to be available for use by that
>> app_context - any nodes in the app_context options that are not in the
>> default hostfile will be ignored.
>>
>> 2. in an unmanaged environment, the default hostfile will be used to define
>> the available node pool. Any hostfile and/or dash host options provided to
>> an app_context will be used to filter the node pool to define the specific
>> nodes for use by that app_context, subject to the previous caveat. However,
>> add-hostfile and add-host options will add nodes to the node pool for use
>> -only- by the associated app_context.
>>
>>
>> I believe this proposed behavior is consistent with that described on the
>> wiki, and would be relatively easy to implement. If nobody objects, I will
>> do so by end-of-day 3/6.
>>
>> Comments, suggestions, objections - all are welcome!
>> Ralph
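
The filtering rule in the quoted RFC (nodes requested for an app_context are kept only if they also appear in the default hostfile, and the rest are ignored) can be sketched as a simple list intersection. This is only an illustration under that reading of the RFC; the example_* functions and the node names used with them are hypothetical and are not ORTE code.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears among the `n` entries of `list`, else 0. */
static int example_in_list(const char *name,
                           const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (0 == strcmp(name, list[i])) {
            return 1;
        }
    }
    return 0;
}

/* Copies into `out` every requested node that is also present in the
 * default hostfile, preserving order; returns how many were kept.
 * `out` must have room for n_requested entries. */
static size_t example_filter_nodes(const char *const *requested,
                                   size_t n_requested,
                                   const char *const *default_hosts,
                                   size_t n_default,
                                   const char **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n_requested; ++i) {
        if (example_in_list(requested[i], default_hosts, n_default)) {
            out[kept++] = requested[i];
        }
        /* Nodes missing from the default hostfile are skipped silently,
         * which is how the RFC words it ("will be ignored"). */
    }
    return kept;
}
```

With a default hostfile of node1, node2, node3 and a request for node2, frontend, node3, the sketch keeps node2 and node3 and drops frontend. Edgar's use case would additionally need an error rather than a silent drop, which is the syntax question Ralph raises above.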
Re: [OMPI devel] Ticket 1224: disable early completion in v1.2.x series
Unfortunately this adds an "if" to the critical path. You should at least use OPAL_UNLIKELY..

On Mar 3, 2008, at 12:28 PM, Jeff Squyres wrote:

> The topic of the "early completion" behavior in OB1 for IB optimizations has come up several times in the v1.2 series (it causes problems in some scenarios).
>
> - leave the default the way it is now (early completions enabled)
> - add an MCA parameter for disabling early completions
>
> I mention this now because I had a customer complain about it over the weekend. :-)
>
> Gleb and I propose the patch in https://svn.open-mpi.org/trac/ompi/ticket/1224 for the v1.2 series. The new OB1 MCA parameter pml_ob1_use_early_completions defaults to 1 (preserving the same behavior as the rest of the v1.2 series), but it can be set to 0 if the early completions on IB are creating problems for specific applications.
>
> It would be good to get this functionality in a real release (e.g., v1.2.6).
>
> Note that this MCA parameter is not necessary for the upcoming v1.3 series because of changes in ob1 and the openib btl.
>
> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17681
Ralph,

On Monday 03 March 2008 17:06, r...@osl.iu.edu wrote:
> Author: rhc
> Date: 2008-03-03 11:06:47 EST (Mon, 03 Mar 2008)
> New Revision: 17681
> URL: https://svn.open-mpi.org/trac/ompi/changeset/17681
>
> Log:
> Cleanup an attribute warning - not sure which one to set or where it should
> go, so I'll leave that to someone more familiar with "attributes".
>
> Ensure some debugging is only enabled when have_debug is set.
>
> Modified: trunk/orte/mca/errmgr/errmgr.h
> ===
> --- trunk/orte/mca/errmgr/errmgr.h (original)
> +++ trunk/orte/mca/errmgr/errmgr.h 2008-03-03 11:06:47 EST (Mon, 03 Mar 2008)
> @@ -132,7 +132,7 @@
>   * itself, and then exit - it takes no other actions. The intent here is to provide
>   * a last-ditch exit procedure that attempts to clean up a little.
>   */
> -typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...) __opal_attribute_format__(__printf__, 2, 3);
> +typedef void (*orte_errmgr_base_module_abort_fn_t)(int error_code, char *fmt, ...);

What was the warning requiring You to get rid of the attribute?

The attribute should help find errors in the callers to orte_errmgr_base_abort...

Maybe the help on https://svn.open-mpi.org/trac/ompi/wiki/CompilerAttributes could be improved?

Thanks,
Rainer

--
Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
HLRS                       Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19            Fax: ++49 (0)711-685 6 5832
70550 Stuttgart            email: kel...@hlrs.de
Germany                    AIM/Skype:rusraink
[OMPI devel] [RFC] Removal of orte_proc_table
WHAT: Removal of orte_proc_table

WHY: It is the last 'orte' class, its implementation is an abstraction violation since it assumes certain things about how the opal_hash_table is implemented, and it is not much code to remove it.

WHERE: This will necessitate minor changes in:
  btl: tcp, sctp
  oob: tcp
  routed: unity, tree
  grpcomm: basic
  iof: svc

TIMEOUT: COB Wednesday, March 5.

DESCRIPTION:
After the orte changes, we were left with only one orte 'class'. So Ralph and I discussed the possibility of removing it, and found that the amount of code change to do so is relatively minor. There are also a couple other good reasons to remove or revamp it:

1. The way the orte_proc_table was used was confusing (since you created an opal_hash_table and then used it as an orte_proc_table).
2. It assumed things about the implementation of the opal_hash_table.

So with this in mind, we feel it would be good to remove it.

Attached is a patch that removes the usage of orte_proc_table. If there are no objections, I will commit it COB Wednesday (likely with a couple minor cleanups).
Index: orte/mca/oob/tcp/oob_tcp_peer.c
===
--- orte/mca/oob/tcp/oob_tcp_peer.c (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp_peer.c (working copy)
@@ -55,7 +55,7 @@
 #include "opal/util/net.h"
 #include "opal/util/error.h"

-#include "orte/class/orte_proc_table.h"
+#include "opal/class/opal_hash_table.h"
 #include "orte/util/name_fns.h"
 #include "orte/runtime/orte_globals.h"
 #include "orte/mca/errmgr/errmgr.h"
@@ -216,14 +216,14 @@
 mca_oob_tcp_peer_t * mca_oob_tcp_peer_lookup(const orte_process_name_t* name)
 {
     int rc;
-    mca_oob_tcp_peer_t * peer, *old;
+    mca_oob_tcp_peer_t * peer = NULL, *old;
     if (NULL == name) { /* can't look this one up */
         return NULL;
     }

     OPAL_THREAD_LOCK(&mca_oob_tcp_component.tcp_lock);
-    peer = (mca_oob_tcp_peer_t*)orte_hash_table_get_proc(
-       &mca_oob_tcp_component.tcp_peers, name);
+    opal_hash_table_get_value_uint64(&mca_oob_tcp_component.tcp_peers,
+                                     orte_util_hash_name(name), (void**)&peer);
     if (NULL != peer && 0 == orte_util_compare_name_fields(ORTE_NS_CMP_ALL, &peer->peer_name, name)) {
         OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
         return peer;
@@ -247,8 +247,8 @@
     peer->peer_retries = 0;

     /* add to lookup table */
-    if(ORTE_SUCCESS != orte_hash_table_set_proc(&mca_oob_tcp_component.tcp_peers,
-        &peer->peer_name, peer)) {
+    if(OPAL_SUCCESS != opal_hash_table_set_value_uint64(&mca_oob_tcp_component.tcp_peers,
+        orte_util_hash_name(&peer->peer_name), peer)) {
         MCA_OOB_TCP_PEER_RETURN(peer);
         OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
         return NULL;
Index: orte/mca/oob/tcp/oob_tcp_peer.h
===
--- orte/mca/oob/tcp/oob_tcp_peer.h (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp_peer.h (working copy)
@@ -93,7 +93,7 @@
 #define MCA_OOB_TCP_PEER_RETURN(peer) \
 { \
     mca_oob_tcp_peer_shutdown(peer); \
-    orte_hash_table_remove_proc(&mca_oob_tcp_component.tcp_peers, &peer->peer_name); \
+    opal_hash_table_remove_value_uint64(&mca_oob_tcp_component.tcp_peers, orte_util_hash_name(&peer->peer_name)); \
     OPAL_FREE_LIST_RETURN(&mca_oob_tcp_component.tcp_peer_free, \
                           &peer->super); \
 }
Index: orte/mca/oob/tcp/oob_tcp.c
===
--- orte/mca/oob/tcp/oob_tcp.c (revision 17666)
+++ orte/mca/oob/tcp/oob_tcp.c (working copy)
@@ -50,7 +50,6 @@
 #include "opal/util/net.h"
 #include "opal/class/opal_hash_table.h"

-#include "orte/class/orte_proc_table.h"
 #include "orte/mca/errmgr/errmgr.h"
 #include "orte/mca/rml/rml.h"
 #include "orte/util/name_fns.h"
@@ -1125,12 +1124,12 @@
 int mca_oob_tcp_resolve(mca_oob_tcp_peer_t* peer)
 {
-    mca_oob_tcp_addr_t* addr;
+    mca_oob_tcp_addr_t* addr = NULL;

     /* if the address is already cached - simply return it */
     OPAL_THREAD_LOCK(&mca_oob_tcp_component.tcp_lock);
-    addr = (mca_oob_tcp_addr_t *)orte_hash_table_get_proc(&mca_oob_tcp_component.tcp_peer_names,
-        &peer->peer_name);
+    opal_hash_table_get_value_uint64(&mca_oob_tcp_component.tcp_peer_names,
+        orte_util_hash_name(&peer->peer_name), (void**)&addr);
     OPAL_THREAD_UNLOCK(&mca_oob_tcp_component.tcp_lock);
     if(NULL != addr) {
         mca_oob_tcp_peer_resolved(peer, addr);
@@ -1459,23 +1458,26 @@
 int mca_oob_tcp_set_addr(const orte_process_name_t* name, const char* uri)
 {
     struct sockaddr_storage inaddr;
-    mca_oob_tcp_addr_t* addr;
-    mca_oob_tcp_peer_t* peer;
+    mca_oob_tcp_addr_t* addr = NULL;
+    mca_oob_tcp_peer_t*
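
The pattern in the patch (turn a process name into a 64-bit key, then use a generic uint64-keyed table whose lookup reports a status and writes the value through an out-parameter) can be sketched in isolation. Everything below is a hypothetical stand-in: the two-field name layout, the packing used by example_hash_name, and the toy fixed-size table are assumptions for illustration and are not confirmed to match orte_util_hash_name or opal_hash_table.

```c
#include <stdint.h>

/* Hypothetical process name with two 32-bit fields. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_name_t;

/* Assumed packing: jobid in the high 32 bits, vpid in the low 32 bits. */
static uint64_t example_hash_name(const example_name_t *name)
{
    return ((uint64_t)name->jobid << 32) | (uint64_t)name->vpid;
}

/* Toy fixed-size table keyed by uint64_t, standing in for a real hash table. */
#define EXAMPLE_TABLE_SIZE 8

typedef struct {
    uint64_t keys[EXAMPLE_TABLE_SIZE];
    void    *values[EXAMPLE_TABLE_SIZE];
    int      used[EXAMPLE_TABLE_SIZE];
} example_table_t;

/* Stores or overwrites a value; returns 0 on success, -1 if the table is full. */
static int example_table_set(example_table_t *t, uint64_t key, void *value)
{
    for (int i = 0; i < EXAMPLE_TABLE_SIZE; ++i) {
        if (t->used[i] && t->keys[i] == key) {
            t->values[i] = value;
            return 0;
        }
    }
    for (int i = 0; i < EXAMPLE_TABLE_SIZE; ++i) {
        if (!t->used[i]) {
            t->used[i] = 1;
            t->keys[i] = key;
            t->values[i] = value;
            return 0;
        }
    }
    return -1;
}

/* Looks up a key; returns 0 and sets *value on a hit. On a miss it returns
 * -1 and leaves *value untouched, so callers should pre-set it to NULL. */
static int example_table_get(const example_table_t *t, uint64_t key,
                             void **value)
{
    for (int i = 0; i < EXAMPLE_TABLE_SIZE; ++i) {
        if (t->used[i] && t->keys[i] == key) {
            *value = t->values[i];
            return 0;
        }
    }
    return -1;
}
```

A lookup that leaves its out-parameter untouched on a miss is one plausible reason the patch now initializes peer and addr to NULL before the get call, since the following NULL checks would otherwise read an uninitialized pointer.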
Re: [OMPI devel] vt configuration issues
Hello,

the 'make dist[clean]' problem should be fixed now. To keep make from entering the 'ompi/contrib' directory of a disabled contributed package, the Makefile variable OMPI_CONTRIB_DIST_SUBDIRS must not be set.

I've tested this fix as follows:

1. configure --enable-contrib-no-build=vt
   make distclean
   -> works
2. configure --enable-contrib-no-build=vt
   make dist  # the tarball doesn't contain VT sources (ompi/contrib/vt/vt)
   tar xfz configure ; make ; make install
   -> works
3. test 1 in VPATH mode
4. test 2 in VPATH mode

Matthias

On Fri, 2008-02-29 at 13:10 +0100, Andreas Knüpfer wrote:
> On Thursday 28 February 2008, Jeff Squyres wrote:
> > I can't remember if I posted about this before or not -- should we
> > disable trunk/VT building by default while the configuration issues
> > are being worked out?
>
> Which config issues are you referring to? Is it about the 'make distclean'
> that fails if you explicitly disabled VT before?
>
> I see no easy fix for this, because you will get an incomplete set of
> Makefiles if you ask for an incomplete configure. Yet, by default everything
> should be fine.
>
> Therefore, I'd like to keep it enabled by default ... well, of course I
> would ;)
>
> Are there any other issues open with VT config?
>
> Andreas

--
Matthias Jurenz,
Center for Information Services and High Performance Computing (ZIH), TU Dresden,
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773