Re: [OMPI devel] [OMPI svn] svn:open-mpi r23767

2010-09-20 Thread Barrett, Brian W
Jeff - Sorry, was out of town last week. The patch undoes the discussion we had -- we always run component macros, even if the component couldn't build, to solve the issue of AC_CONFIG_FILES needing to always be run. So the first part of the patch is incorrect and should not be committed. We

Re: [OMPI devel] RFC: make hwloc first-class data

2010-09-23 Thread Barrett, Brian W
I unfortunately don't have many cycles to think about this before Oct 1, but I'm still a little concerned about the portability aspects of having hwloc be a first class citizen of OMPI - if we support a platform hwloc doesn't, that seems like it will still cause problems... Brian On Sep 22, 20

Re: [OMPI devel] Threading

2010-10-12 Thread Barrett, Brian W
On Oct 11, 2010, at 11:41 PM, Ralph Castain wrote: > Does anyone know of a reason why mpirun can -not- be threaded, assuming that > all threads block and do not continuously chew cpu? Is there an environment > where this would cause a problem? We don't have any machines at Sandia where I could

Re: [OMPI devel] Use of OPAL_PREFIX to relocate a lib

2010-10-26 Thread Barrett, Brian W
I'll take a look at fixing this the right way today. Since I wrote both the original autogen.sh that guaranteed static-components was ordered and PREFIX code, I had considered it to be a documented feature that there was strong ordering in the static-components list. So personally, I'd conside

Re: [OMPI devel] 1.5.x plans

2010-10-26 Thread Barrett, Brian W
On Oct 26, 2010, at 3:07 PM, Jeff Squyres wrote: > There seem to be 3 obvious options about moving forward (all assume that we > do 1.5.1 as described above): > > A. End the 1.5 line (i.e., work towards transitioning it to 1.6), and then > re-branch the trunk to be v1.7. > B. Sync the trunk

Re: [OMPI devel] 1.5.x plans

2010-10-27 Thread Barrett, Brian W
On Oct 27, 2010, at 9:14 AM, Jeff Squyres wrote: > On Oct 26, 2010, at 5:52 PM, Barrett, Brian W wrote: > >>> B. Sync the trunk to the 1.5 branch en masse. Stabilize that and call it >>> 1.5.2. >>> >>> Most people (including me) favored B. Rich wa

Re: [OMPI devel] Use of OPAL_PREFIX to relocate a lib

2010-10-27 Thread Barrett, Brian W
n On Oct 26, 2010, at 8:36 AM, Barrett, Brian W wrote: > I'll take a look at fixing this the right way today. > > Since I wrote both the original autogen.sh that guaranteed static-components > was ordered and PREFIX code, I had considered it to be a documented feature > that

Re: [OMPI devel] === CREATE FAILURE (trunk) ===

2010-11-03 Thread Barrett, Brian W
I'm pretty sure it was Shiqing's patch. The problem is that OPAL_DECLSPEC was added to event.h to export a couple of symbols, but none of the libevent files include opal_config.h, so OPAL_DECLSPEC isn't properly defined on Unix systems. I ran into this last night and was going to send an e-mai

Re: [OMPI devel] Change in OPAL / OMPI DPM system time during MPI_INIT

2010-11-22 Thread Barrett, Brian W
Um, the counter starts initialized at one. Brian On Nov 22, 2010, at 9:32 AM, Jeff Squyres wrote: > A user noticed a specific change that we made between 1.4.2 and 1.4.3: > >https://svn.open-mpi.org/trac/ompi/changeset/23448 > > which is from CMR https://svn.open-mpi.org/trac/ompi/ticket/2

Re: [OMPI devel] Change in OPAL / OMPI DPM system time during MPI_INIT

2010-11-22 Thread Barrett, Brian W
early in finalize. Brian On Nov 22, 2010, at 12:27 PM, Jeff Squyres wrote: > On Nov 22, 2010, at 11:35 AM, Barrett, Brian W wrote: > >> Um, the counter starts initialized at one. > > Does that mean that we should or should not leave that extra _decrement() in > th

[OMPI devel] Datatype question

2010-12-21 Thread Barrett, Brian W
All - I'm trying to follow up on James Dinan's one-sided datatype errors e-mail and running into some datatype issues from when the datatype engine was moved to OPAL (sigh). Accumulate needs to get at the underlying datatypes for a user-created datatype. Before the ddt move, one just walked bd

[OMPI devel] OFED question

2011-01-27 Thread Barrett, Brian W
All - On one of our clusters, we're seeing the following on one of our applications, I believe using Open MPI 1.4.3: [xxx:27545] *** An error occurred in MPI_Scatterv [xxx:27545] *** on communicator MPI COMMUNICATOR 5 DUP FROM 4 [xxx:27545] *** MPI_ERR_OTHER: known error not in list [xxx:27545]

Re: [OMPI devel] OFED question

2011-01-27 Thread Barrett, Brian W
formance Tools Group > Computer Science and Math Division > Oak Ridge National Laboratory > > > > > > > On Jan 27, 2011, at 5:56 PM, Barrett, Brian W wrote: > >> All - >> >> On one of our clusters, we're seeing the following on one of ou

Re: [OMPI devel] Minor OMPI SVN configuration change

2011-02-17 Thread Barrett, Brian W
Why did "we" make this change? It was originally this way, and we changed it to the no-auth way for a reason. Brian - Original Message - From: Jeff Squyres [mailto:jsquy...@cisco.com] Sent: Wednesday, February 16, 2011 09:24 AM To: Open Developers Subject: [OMPI devel] Minor OMPI SVN

[OMPI devel] trunk hwloc & static builds

2011-02-21 Thread Barrett, Brian W
All - The trunk currently doesn't link with --enable-static --disable-shared on Linux. The problem is that the component doesn't pass its dependencies into the wrapper compiler list. In particular, the xml support creates a dependency on libxml. --disable-xml solves the problem, but is stil

Re: [OMPI devel] Bug in openmpi-1.5/opal/config/opal_config_asm.m4

2011-02-23 Thread Barrett, Brian W
Thanks. I've applied the patch and will start the process of pushing it to the next 1.5 release. Brian On 2/23/11 11:04 AM, "Jay Fenlason" wrote: >I was recently handed >https://bugzilla.redhat.com/attachment.cgi?id=480307 >for which a kindly GCC expert attached the enclosed patch. Apparentl

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24449

2011-02-23 Thread Barrett, Brian W
George - You're right, I misread the patch. I've run into the same issue with gcc before, but not on x86. Jay, can you point us to the original bug report? I couldn't figure out how to get from the patch to the bug in your bugzilla. Brian On 2/23/11 2:57 PM, "George Bosilca" wrote: >If I un

[OMPI devel] Libevent visibility problem

2011-07-11 Thread Barrett, Brian W
Hi all - When libevent was made its own component last fall, it appears that the function renames and visibility settings were lost. This is proving rather problematic for a project I'm trying to get running with the trunk which uses libev (which provides a libevent compatibility layer). It wor

Re: [OMPI devel] Libevent visibility problem

2011-07-12 Thread Barrett, Brian W
On 7/11/11 4:31 PM, "Ralph Castain" wrote: >On Jul 11, 2011, at 2:51 PM, Barrett, Brian W wrote: > >> Hi all - >> >> When libevent was made its own component last fall, it appears that the >> function renames and visibility settings were lost. Thi

Re: [OMPI devel] [devel-core] RFC: extend MTL API

2011-07-12 Thread Barrett, Brian W
This makes sense to me. Brian On 7/1/11 8:45 AM, "Mike Dubman" wrote: >WHAT: Adding communicator add/delete callbacks to MTL. >WHY: MTL will be able to separate messages on different contexts. >WHEN: On trunk (later on v1.5 as well), Tuesday telconf, 5 July 2011 >TIMEOUT: Tuesday telconf, 12 Ju

Re: [OMPI devel] Libevent visibility problem

2011-07-12 Thread Barrett, Brian W
On 7/12/11 4:21 PM, "Ralph Castain" wrote: >On Jul 12, 2011, at 12:29 PM, Barrett, Brian W wrote: > >> On 7/11/11 4:31 PM, "Ralph Castain" wrote: >> >>> On Jul 11, 2011, at 2:51 PM, Barrett, Brian W wrote: >>> >>>> Hi all

Re: [OMPI devel] Libevent visibility problem

2011-07-14 Thread Barrett, Brian W
Looks good, thanks! Brian On 7/14/11 1:12 AM, "Ralph Castain" wrote: >Should be fixed in r24902 - let me know. > > >On Jul 12, 2011, at 4:30 PM, Barrett, Brian W wrote: > >> On 7/12/11 4:21 PM, "Ralph Castain" wrote: >> >>> On Jul 12, 2

[OMPI devel] RFC: MProbe addition

2011-07-21 Thread Barrett, Brian W
WHAT: Add the MPI-3 MProbe implementation to the trunk WHY: MPI-3 rocks? WHEN (timeout): Tuesday teleconf time, July 26 WHERE: The trunk WHO: me? Details: I have a complete mprobe implementation in tmp-public/bwb-mprobe2/ which is ready to come into the trunk. It matches the mprobe proposal

Re: [OMPI devel] Open MPI + HWLOC + Static build issue

2011-08-03 Thread Barrett, Brian W
On 8/3/11 10:51 AM, "Shamis, Pavel" wrote: >> >> Err.. I don't quite understand. How exactly are you configuring? If I >>do this: >> >> ./configure --prefix=/home/jsquyres/bogus --disable-mpi-f77 >>--disable-vt -- >> disable-io-romio --disable-mpi-cxx --disable-shared --enable-static >>--enab

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25015

2011-08-08 Thread Barrett, Brian W
On 8/8/11 9:34 AM, "Jeff Squyres" wrote: >On Aug 8, 2011, at 11:30 AM, Wesley Bland wrote: > >> The reason is because valgrind was complaining about uninitialized >>values that were passed into proc_get_epoch. I saw the same warnings >>from valgrind when I ran it. I added the code to initialize t

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25015

2011-08-08 Thread Barrett, Brian W
do that here because >this isn't a pointer. If it would make the code look better I can move >the first assignment to the top of the function where the other >initializations are. > >On Mon, Aug 8, 2011 at 11:41 AM, Barrett, Brian W >wrote: > >On 8/8/11 9:34 AM, "

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25015

2011-08-08 Thread Barrett, Brian W
coding an alternative >ORTE_SOMETHING_PRINT function, only for this use; > - having proc be defined as an opal_object, and set epoch to INVALID (or >even UNSET) into the constructor. This could induce changes at many >places, and there is always the risk that some changes are left >

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25234

2011-10-05 Thread Barrett, Brian W
I don't think we need to go that far; in fact, we really shouldn't use m4 macros to enforce license policies like that. But more importantly, we should remove that particular warning from this test, since the test is used in places other than SLURM, which don't have negative licensing impact. Bri

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25234

2011-10-05 Thread Barrett, Brian W
On 10/5/11 12:37 PM, "Jeff Squyres" wrote: >On Oct 5, 2011, at 2:30 PM, Barrett, Brian W wrote: > >> I don't think we need to go that far; in fact, we really shouldn't use >>m4 >> macros to enforce license policies like that. > >I'm not talk

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25234

2011-10-05 Thread Barrett, Brian W
On 10/5/11 2:22 PM, "Ralph Castain" wrote: >I thought I already had a check pmi m4 somewhere? Should have been in >that pmi component I committed a few months ago. I can check next week. You did :). LANL's moving some code around so that we can extend the ALPS ess to use PMI instead of cnos to

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-19 Thread Barrett, Brian W
I actually think it's worse than that. An ORTE error code can now have the same error code as an OMPI error. OMPI_ERR_REQUEST and ORTE_ERR_RECV_LESS_THAN_POSTED now share the same integer return code. Or, they should, if George hadn't made a mistake (see below). The sharing of return codes seem

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-19 Thread Barrett, Brian W
tly what the execution path was. > > george. > >PS: I'll fix the +/- issue. > >On Oct 19, 2011, at 14:09 , Jeff Squyres wrote: > >> Oy, yes, that is bad -- we cannot have overlapping ORTE and OMPI error >>codes. That seems like a very bad idea (in addition to t

Re: [OMPI devel] RFC: MCA param registration errors

2011-11-02 Thread Barrett, Brian W
I really don't like our show_help at every level behavior (look at what happens when MPI_INIT fails, you get a page per process of the same error message from each level of the call stack). If you want to show_help and abort on debug, that makes sense. It doesn't make any sense on a production bu

Re: [OMPI devel] RFC: MCA param registration errors

2011-11-02 Thread Barrett, Brian W
OMPI layer. > >I don't know how to entirely avoid the message issue Brian mentions - >I'll still have to say -something- when I get an error code, but I have >come up with some methods for reducing the clutter. > >On Nov 2, 2011, at 11:43 AM, Barrett, Brian W wrote

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25431

2011-11-04 Thread Barrett, Brian W
Why not? I use asserts all the time in OMPI code. A quick grep in ompi/mca says I'm not alone. There are a whole bunch of places where I "know" a fact, such as a pointer never being NULL or consistency checks between two values. These don't need to run in production; I've theoretically tested t
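A minimal sketch of the kind of debug-only consistency check described above, assuming a hypothetical helper (sketch_bytes_remaining is an illustrative name, not something from the Open MPI tree). The assert documents a fact the caller is supposed to guarantee and costs nothing once NDEBUG is defined:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the Open MPI tree.
 * The caller is expected to guarantee used <= total; the assert documents
 * and checks that invariant in debug builds only. */
size_t sketch_bytes_remaining(size_t total, size_t used)
{
    assert(used <= total);  /* compiled out when NDEBUG is defined */
    return total - used;
}
```

Compiling with -DNDEBUG turns the assert into a no-op, so a production build skips the comparison while a debug build still catches a violated invariant.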

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-07 Thread Barrett, Brian W
On 11/7/11 3:27 PM, "George Bosilca" wrote: > >On Nov 7, 2011, at 10:37 , Jeff Squyres wrote: > >> On Nov 7, 2011, at 10:16 AM, Nathan T. Hjelm wrote: >> >>> Yes, and I completely agree. I was simply trying to keep it consistent >>>in >>> case there is something I don't know about the heterogene

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-07 Thread Barrett, Brian W
On 11/7/11 8:58 PM, "George Bosilca" wrote: >I'll take as an example the atomic operations. They exist only on amd64, >and only if the compiler supports gcc-like assembly. However, the atomic >operation is defined in a global header with a very exciting name >atomic.h. If anybody else start using

Re: [OMPI devel] Remote key sizes

2011-11-08 Thread Barrett, Brian W
On 11/8/11 5:25 PM, "George Bosilca" wrote: >2. one sided: A quick look in the OSC seems to indicate there are some >special handling to be done in the RDMA one. Look at >ompi_osc_rdma_sendreq_t in osc_rdma_sendreq.h, it is using a trick to >store the remote segments. First, the mca_btl_base_segm

Re: [OMPI devel] [EXTERNAL] Re: Rename "vader" BTL to "xpmem"

2011-11-17 Thread Barrett, Brian W
On 11/17/11 6:29 AM, "Ralph Castain" wrote: >Frankly, the only vote that counts is Nathan's - it's his btl, and we >have never forcibly made someone rename their component. I would suggest >we not set that precedent. I'm comfortable with whatever he decides to >call it. I have no objection to a

Re: [OMPI devel] [EXTERNAL] [patch] One-sided communication with derived datatype fails on sparc64

2012-01-12 Thread Barrett, Brian W
George - This looks right to me, but the patches are in the datatype engine, so can you weigh in? Thanks, Brian On 1/11/12 10:04 PM, "Kawashima" wrote: >Hi Open MPI developers, > >We, Fujitsu, noticed that one-sided communication with some sort of >derived datatype fails on sparc64 machines.

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r25813

2012-01-30 Thread Barrett, Brian W
Actually, the MXM component is an MTL, not a PML, so Jeff's option wouldn't work. I'm not sure this is any better than saying "--mca pml cm" versus "--mca pml ob1", other than there's at least some rational names involved. However, it does seem awful silly to thread the enabled through the module;

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r25813

2012-01-30 Thread Barrett, Brian W
On 1/30/12 3:53 PM, "Jeff Squyres" wrote: >On Jan 30, 2012, at 5:38 PM, Barrett, Brian W wrote: > >> I'm not sure this is any better than saying "--mca pml cm" versus "--mca >> pml ob1", other than there's at least some rational names in

Re: [OMPI devel] [EXTERNAL] MPI-3: MPI_GET_LIBRARY_VERSION

2012-02-03 Thread Barrett, Brian W
Jeff, What's the jsquyres@svby-mpi063 tag for? It seems odd to have it there in a release tarball, at least as presented. Having the version number earlier seems like a good idea... Brian On 2/2/12 5:01 PM, "Jeff Squyres" wrote: >I just committed in https://svn.open-mpi.org/trac/ompi/changes

[OMPI devel] Matched probe support

2012-02-08 Thread Barrett, Brian W
All - With r25865, the trunk now has support for the MPI-3 matched probe functionality. Currently, all PMLs other than OB1 will throw a not-implemented error when mprobe, improbe, mrecv, or imrecv are called. I will be adding support to CM for matched probe (which will likely require changes to the

Re: [OMPI devel] [EXTERNAL] Re: trunk build failure on Altix [w/ WORK AROUND]

2012-02-20 Thread Barrett, Brian W
Hi Paul - Thanks for noticing this. I guess we don't have many Altix developers. I think I've fixed it on the trunk with r25968, plus r25967 to make sure the Altix component gets selected over the Linux component on Altix systems. I don't have an Altix to test on; can you give it a go and let me

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r25966

2012-02-20 Thread Barrett, Brian W
That's because Jeff forgot to copy the line: AC_CONFIG_FILES([ompi/mca/fbtl/posix/Makefile]) from whatever configure.m4 script he used as the base for his new macro :). Brian On 2/20/12 3:36 PM, "Ralph Castain" wrote: >I'm afraid this commit breaks the ability to build from a tarball. I >c

Re: [OMPI devel] [EXTERNAL] Re: RFC: ob1: fallback on put/send on rget failure

2012-03-19 Thread Barrett, Brian W
I'm not sure I'm the best one to comment on OB1 these days, but I didn't see anything obviously wrong. Brian On 3/19/12 9:32 AM, "Jeffrey Squyres" wrote: >George / Brian -- > >Can you guys comment on this patch? > > >On Mar 15, 2012, at 5:07 PM, Nathan Hjelm wrote: > >> What: Update ob1 to do t

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r26180

2012-03-23 Thread Barrett, Brian W
Gah; sorry about that. I thought I had tested that code path, but it appears not. Stupid flexibility of Open MPI :). Jeff's correct, it's supposed to be MPI_SOURCE. Thanks, Jeff! Brian On Mar 22, 2012, at 6:50 PM, Ralph Castain wrote: > Thanks! > > On Mar 22, 2012, at 6:12 PM, Jeffrey Squ

[OMPI devel] Remove Portals support?

2012-03-23 Thread Barrett, Brian W
Hi all - This is not an RFC, but more a question for the community. Is anyone still actively using the Portals MTL/BTLs? We're not at Sandia. I know ORNL was using it at one point. SNL probably can't do much in the way of support anymore, so if no one wants them, it might make sense to remo

[OMPI devel] Debugger question

2012-03-26 Thread Barrett, Brian W
Hi all - In ompi/debuggers/predefined_gap_test.c, there's a set of tests looking at all the fields in a window structure. The other source files in ompi/debuggers/ don't seem to use most of those fields (since they really shouldn't be useful to a debugger anyway). I removed some of the fields as p

Re: [OMPI devel] [EXTERNAL] Re: Debugger question

2012-03-26 Thread Barrett, Brian W
e or two of those fields with any other fields on the >new window structure? > > >On Mar 26, 2012, at 1:17 PM, Barrett, Brian W wrote: > >> Hi all - >> >> In ompi/debuggers/predefined_gap_test.c, there's set of tests looking at >> all the fields in a windo

Re: [OMPI devel] [EXTERNAL] Re: Remove Portals support?

2012-03-27 Thread Barrett, Brian W
eager to >>remove it. >> >> >> Pavel (Pasha) Shamis >> --- >> Application Performance Tools Group >> Computer Science and Math Division >> Oak Ridge National Laboratory >> >> >> >> >> >> >> On Mar 23, 20

[OMPI devel] MPI Conformance List

2012-04-03 Thread Barrett, Brian W
Hi all - Ralph and I put together a list of MPI conformance (against the 2.1, 2.2, and 3.0 documents). Could you please take a look at the list and either add items you think are missing, claim items you are working on, or mark items as complete. https://svn.open-mpi.org/trac/ompi/wiki/MPIConf

[OMPI devel] Developers Meeting

2012-04-03 Thread Barrett, Brian W
Hi all - There is discussion of attempting to have a developers meeting this summer. We haven't had one in a while and people thought it would be good to work through some of the ideas on how to implement features for 1.7. We don't have a location yet, but possibilities include Los Alamos and San

Re: [OMPI devel] [EXTERNAL] Re: Developers Meeting

2012-04-03 Thread Barrett, Brian W
On 4/3/12 11:08 AM, "Jeffrey Squyres" wrote: >On Apr 3, 2012, at 11:44 AM, Barrett, Brian W wrote: > >> There is discussion of attempting to have a developers meeting this >> summer. We haven't had one in a while and people thought it would be >>good >

Re: [OMPI devel] [EXTERNAL] Re: Developers Meeting

2012-04-06 Thread Barrett, Brian W
Developers Meeting >> >> I second Oak Ridge (or even UTK) sometime in June. >> >> -- Josh >> >> On Tue, Apr 3, 2012 at 3:07 PM, Barrett, Brian W wrote: >>> On 4/3/12 11:08 AM, "Jeffrey Squyres" wrote: >>> >>>> On Apr 3, 2012,

Re: [OMPI devel] [EXTERNAL] Re: r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread Barrett, Brian W
r26255 is awful as a patch. It doesn't work on any non-Linux platform, which is unpleasant. But worse, what does it possibly accomplish? In codes other than benchmarks, there's no advantage to aligning the pointer to 32 or 64 byte boundaries, as the malloced buffer very rarely is exactly what is

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r26329

2012-04-24 Thread Barrett, Brian W
And I think Jeff made me look at the code when you sent the RFC. Shame on Jeff for making me review the same code twice ;). Brian On 4/24/12 2:47 PM, "Nathan Hjelm" wrote: >This was RFC'd last month. No one objected :) > >-Nathan > >On Tue, 24 Apr 2012, Jeffrey Squyres wrote: > >> There's some

Re: [OMPI devel] [EXTERNAL] Re: libevent socket code

2012-05-01 Thread Barrett, Brian W
This isn't a static library problem and it can't be fixed by the cluster admins, it's a glibc design problem. There are a set of functions (those that use the services configured with nsswitch.conf, which includes host name resolution and user information resolution) that are implemented through a

Re: [OMPI devel] [EXTERNAL] Re: Unable to set flags using platform files in the 1.6 release

2012-05-23 Thread Barrett, Brian W
David - Where exactly the platform file gets evaluated depends on a number of things that the OMPI developers don't have a lot of control over. It was never meant to be used to set environment variables, only command line arguments. It looks like something bad has happened with ordering; I'm not

Re: [OMPI devel] [EXTERNAL] Re: Unable to set flags using platform files in the 1.6 release

2012-05-23 Thread Barrett, Brian W
ly, we're not going to 1.6 >at the moment. > >-david > >-- >David Gunter >HPC-3: Infrastructure Team >Los Alamos National Laboratory > > > > >On May 23, 2012, at 2:23 PM, Barrett, Brian W wrote: > >> David - >> >> Where exactly the p

Re: [OMPI devel] [EXTERNAL] Re: Unable to set flags using platform files in the 1.6 release

2012-05-23 Thread Barrett, Brian W
't pull any of the flags >out of it until later. I'm trying to see if I can adjust it. > >BTW: none of this changed from the 1.5 series, so this has been the >situation for a very long time. > > >On May 23, 2012, at 2:41 PM, Barrett, Brian W wrote: > >> Y

Re: [OMPI devel] [EXTERNAL] Re: Unable to set flags using platform files in the 1.6 release

2012-05-23 Thread Barrett, Brian W
g the ordering wrong there >was probably my fault; sorry. > >On May 23, 2012, at 4:53 PM, Ralph Castain wrote: > >> Ah, okay - didn't realize that ordering. I'll fix it - thanks! >> >> On May 23, 2012, at 2:49 PM, Barrett, Brian W wrote: >> >>

Re: [OMPI devel] [EXTERNAL] SVN trunk PSM MTL is busted

2012-05-24 Thread Barrett, Brian W
On 5/24/12 8:55 AM, "Jeff Squyres" wrote: >Per Brian's recent MTL updates, the PSM MTL is busted. I notice the >following when I run on a machine that has the PSM software stack >installed: > >[ompi_r00lez:19108] mca: base: component_find: unable to open >/scratch/local/jsquyres/bogus/lib/openmp

[OMPI devel] Component Maintainers

2012-06-07 Thread Barrett, Brian W
Hi all - As part of an effort to get 1.7 out the door quickly, we'd like to figure out which components are essentially "unmaintained". With that in mind, I've made a list of all components in the trunk. Ralph and I would appreciate it if everyone would "claim" the components their organization

[OMPI devel] OpenIB compile error

2012-06-20 Thread Barrett, Brian W
Hi all - I'm seeing the compile error with the OMPI trunk and OFED 15.3.1. Has anyone seen this before? I have vague recollections of seeing e-mail discussion on the issue, but can't find those e-mails now... Thanks, Brian In file included from ../../../../opal/mca/hwloc/hwloc.h:87,

[OMPI devel] Fwd: Component Maintainers

2012-06-25 Thread Barrett, Brian W
Hi all - As a reminder, we're going to be discussing supported components for 1.7 at the teleconference tomorrow. There are a number of components that are currently lacking a maintainer; please consider signing up to help distribute the load across all OMPI participants. Thanks, Brian & Ral

Re: [OMPI devel] [EXTERNAL] u_int32_t typo in nbc_internal.h?

2012-06-28 Thread Barrett, Brian W
Yes, I think that's right. Sorry, this is what happens when you use code from Torsten ;). I'll fix today. Brian On Jun 27, 2012, at 8:45 PM, Eugene Loh wrote: > ompi/mca/coll/libnbc/nbc_internal.h > > 259/* Schedule cache structures/functions */ > 260u_int32_t adler32(u_int32_t a

Re: [OMPI devel] [EXTERNAL] Trunk compilation broken

2012-07-02 Thread Barrett, Brian W
On 7/2/12 11:00 AM, "Nathan Hjelm" wrote: >With platform contrib/platform/lanl/tlss/debug-panasus I get an error: >make[2]: Entering directory >`/panfs/scratch/vol7/hjelmn/turing/ompi-trunk-git/ompi/tools/ompi_info' > CCLD ompi_info >../../../ompi/.libs/libmpi.so: undefined reference to `NBC_O

[OMPI devel] Open MPI 1.7 development and testing

2012-07-02 Thread Barrett, Brian W
Hello everyone - This morning the branch for the 1.7 release series was created (and all the other ancillary stuff that goes along with a branch also occurred). The nightly tarballs are available at the expected location (http://www.open-mpi.org/nightly/v1.7/), with the first generated this mornin

Re: [OMPI devel] [EXTERNAL] ibarrier failures on MTT

2012-07-05 Thread Barrett, Brian W
On 7/3/12 5:08 PM, "Eugene Loh" wrote: >I'll look at this more, but for now I'll just note that the new ibarrier >test is showing lots of failures on MTT (cisco and oracle). I was initializing the MPI_ERROR field of the request status after calling the request completion function, which was caus
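The ordering issue described above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: sketch_request_t and the sketch_* functions are hypothetical stand-ins, not Open MPI's request structures, and real code would also need proper synchronization between the completing thread and any waiter. The point is only that every status field is written before the request is marked complete:

```c
#include <stdbool.h>

/* Hypothetical, simplified request; not Open MPI's ompi_request_t. */
typedef struct {
    int  status_error;  /* stands in for the MPI_ERROR field of the status */
    bool complete;      /* a waiter may read the status once this is true */
} sketch_request_t;

/* Stand-in for the completion function: after this, waiters may proceed. */
void sketch_request_complete(sketch_request_t *req)
{
    req->complete = true;
}

/* Fill in the status first, then signal completion as the last step. */
void sketch_finish_request(sketch_request_t *req, int error_code)
{
    req->status_error = error_code;
    sketch_request_complete(req);
}

/* What a waiter would read; returns -1 if the request is not complete. */
int sketch_wait(const sketch_request_t *req)
{
    return req->complete ? req->status_error : -1;
}

/* Small driver: start with an untouched status, finish, then wait. */
int sketch_demo(int error_code)
{
    sketch_request_t req = { .status_error = -1, .complete = false };
    sketch_finish_request(&req, error_code);
    return sketch_wait(&req);
}
```

Swapping the two statements in sketch_finish_request would let a waiter that observes complete == true read a status that has not been written yet.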

Re: [OMPI devel] [EXTERNAL] Re: non-blocking barrier

2012-07-06 Thread Barrett, Brian W
Yeah, there was a bug in the code. Fixed now. Brian On 7/6/12 10:47 AM, "Richard Graham" wrote: >Forget what I just posted - I looked at George's words, and not the code >- wait() is the synchronization point, so George's response is correct. > >Rich > >-Original Message- >From: devel-

Re: [OMPI devel] [EXTERNAL] reduce_scatter_block failing on v1.7

2012-07-06 Thread Barrett, Brian W
On 7/6/12 2:31 PM, "Eugene Loh" wrote: >The new reduce_scatter_block test is segfaulting with v1.7 but not with >the trunk. When we drop down into MPI_Reduce_scatter_block and attempt >to call > >comm->c_coll.coll_reduce_scatter_block() > >it's NULL. (So is comm->c_coll.coll_reduce_scatter_bloc

Re: [OMPI devel] [EXTERNAL] MPI_Ibarrier fails

2012-07-11 Thread Barrett, Brian W
Thanks for the bug report. This has been fixed in r26784 of the trunk and should be in tonight's tarball. Brian On 7/11/12 6:09 AM, "Mikhail Kurnosov" wrote: >Hello, > >In the case of single process the MPI_Ibarrier call fails (seg. fault). >Request object does not initialized in this function

Re: [OMPI devel] [EXTERNAL] MPI_Ireduce_scatter_block hangs

2012-07-12 Thread Barrett, Brian W
Hello - Thank you for the bug report. This has been fixed in the trunk. Brian On 7/12/12 1:46 AM, "Mikhail Kurnosov" wrote: >Hello, > >In the case of single process the MPI_Ireduce_scatter_block is >segfaulting with v1.9a1r26786. > >But in other cases (commsize >= 2) processes hang in >MPI_Ir

Re: [OMPI devel] [EXTERNAL] non-blocking collectives, SPARC, and alignment

2012-07-16 Thread Barrett, Brian W
Eugene - It's unlikely that I will have time to fix this in the short term. The scheduling code is fairly localized in nbc.c if Oracle has some time to spend looking at these issues. If not, it might be best to remove the libnbc code from 1.7, as it's unfortunately clear that it's not as ready f

Re: [OMPI devel] [EXTERNAL] Re: MPI_Mprobe

2012-08-14 Thread Barrett, Brian W
On 8/8/12 11:28 PM, "Eugene Loh" wrote: >On 8/7/2012 5:45 AM, Jeff Squyres wrote: >> So the issue is when (for example) Fortran MPI_Recv says "hey, C ints >>are the same as Fortran INTEGERs, so I don't need a temporary MPI_Status >>buffer; I'll just use the INTEGER array that I was given, and pass

Re: [OMPI devel] [EXTERNAL] Re: MPI_Mprobe

2012-08-14 Thread Barrett, Brian W
On 8/14/12 8:30 AM, "Jeff Squyres" wrote: >On Aug 14, 2012, at 10:04 AM, Barrett, Brian W wrote: > >> That's incorrect. Fortran statuses should never be passed to C >> interfaces. If you look at testany_f.c, for example, a temporary status >> is create

[OMPI devel] "Fake" mpool usage

2012-10-18 Thread Barrett, Brian W
All - I'm trying to clean up the MX situation in 1.7 with regards to the fake mpool and have some questions. It looks like the point of the fake mpool is to translate a memory hook release into a free call in some other library. My question is why we're using the mpool to do that. Since opal al

Re: [OMPI devel] [EXTERNAL] [patch] SEGV on processing unexpected messages

2012-10-18 Thread Barrett, Brian W
I'm torn on this one. On the one hand, I think this is probably the most performant solution. On the other hand, it feels icky; a more clean solution would be to use hdr->type to determine the size to copy. What do others think? Brian On 10/17/12 9:06 PM, "Kawashima, Takahiro" wrote: >Hi Op
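A rough sketch of the "use hdr->type to determine the size to copy" alternative mentioned above. All type names, fields, and constants here are hypothetical and chosen for illustration; the real ob1 headers are laid out differently:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header kinds and layouts; the real ob1 headers differ. */
enum { SKETCH_HDR_MATCH = 1, SKETCH_HDR_RNDV = 2 };

typedef struct { uint8_t type; uint8_t flags; } sketch_common_hdr_t;

typedef struct {
    sketch_common_hdr_t common;
    uint16_t ctx;
    int32_t  src;
    int32_t  tag;
} sketch_match_hdr_t;

typedef struct {
    sketch_match_hdr_t match;
    uint64_t msg_length;
} sketch_rndv_hdr_t;

typedef union {
    sketch_common_hdr_t common;
    sketch_match_hdr_t  match;
    sketch_rndv_hdr_t   rndv;
} sketch_hdr_t;

/* Map a header type to the number of bytes that header occupies.
 * Unknown types yield 0 so the caller copies nothing. */
size_t sketch_hdr_size(uint8_t type)
{
    switch (type) {
    case SKETCH_HDR_MATCH: return sizeof(sketch_match_hdr_t);
    case SKETCH_HDR_RNDV:  return sizeof(sketch_rndv_hdr_t);
    default:               return 0;
    }
}

/* Copy only as many bytes as the header type calls for, instead of
 * always copying sizeof(sketch_hdr_t). Returns the bytes copied. */
size_t sketch_copy_hdr(sketch_hdr_t *dst, const void *src)
{
    sketch_common_hdr_t common;
    memcpy(&common, src, sizeof(common));  /* read the type safely */
    size_t len = sketch_hdr_size(common.type);
    memcpy(dst, src, len);
    return len;
}
```

The trade-off is the one raised in the message: a fixed-size copy avoids the branch on the type, while the type-driven copy never reads past the end of a short header.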

Re: [OMPI devel] [EXTERNAL] Re: Latency perf: v1.6 vs. v1.7 vs. trunk

2012-10-25 Thread Barrett, Brian W
Your first e-mail got eaten by our virus scanner (it doesn't like .bz2 files), but we could probably only register the libnbc progress function on first use, but it would slightly slow down all non blocking collectives. Probably worth it, but not sure I'll have time to add that code today. Brian

[OMPI devel] MX BTL segfaults

2012-10-25 Thread Barrett, Brian W
Hi all - The MX BTL segfaults during MPI_FINALIZE in the trunk (and did before my mpool change in r27485). I'm not really interested in fixing it; the problem does not occur with the MX MTL. Does anyone else have interest in fixing it? If not, should we remove it from the trunk (we already remo

Re: [OMPI devel] [EXTERNAL] Re: Latency perf: v1.6 vs. v1.7 vs. trunk

2012-10-26 Thread Barrett, Brian W
On 10/25/12 10:55 AM, "Jeff Squyres" wrote: >Something that might not be clear from my initial writeup: > >1. I had to go change C code to disable libnbc. Since non-blocking >collectives are part of MPI-3: > a) we have no convenient configure argument to not build the libnbc >coll component (t

Re: [OMPI devel] [EXTERNAL] Re: Compile-time MPI_Datatype checking

2012-10-31 Thread Barrett, Brian W
On 10/31/12 1:39 PM, "Paul Hargrove" wrote: >No, I don't have specific usage cases that concern me. > > >As I said a minute or two ago in a reply to Ralph, my concern is that the >Sandia codes provide an "existence proof" that "really smart people" can >write questionable code at times. So, I fe

Re: [OMPI devel] [EXTERNAL] Re: Compile-time MPI_Datatype checking

2012-10-31 Thread Barrett, Brian W
On 10/31/12 1:57 PM, "Dmitri Gribenko" wrote: >On Wed, Oct 31, 2012 at 9:51 PM, Barrett, Brian W >wrote: >> On 10/31/12 1:39 PM, "Paul Hargrove" wrote: >> >>>No, I don't have specific usage cases that concern me. >>> >>> >

Re: [OMPI devel] [EXTERNAL] Re: running top-level autogen.sh breaks romio in 1.6.3 tarball

2012-10-31 Thread Barrett, Brian W
I don't think this is actually old autotools, since those are the most recent. My guess is that there's an m4 file not being included in the tarball. I'll try to take a look, but we probably need to fix a Makefile in ROMIO. Brian On 10/31/12 2:46 PM, "Ralph Castain" wrote: >We've seen this be

Re: [OMPI devel] [EXTERNAL] Re: running top-level autogen.sh breaks romio in 1.6.3 tarball

2012-11-01 Thread Barrett, Brian W
David - Thanks for the bug report; the missing file will be in the tarball of the next release. Brian On 10/31/12 3:15 PM, "David Shrader" wrote: >Hello, > >Thank you for the reply! All of the autotools I am using have the same >or higher versions than those specified at >http://www.open-mpi.o

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r27580 - in trunk: ompi/mca/btl/openib ompi/mca/btl/wv ompi/mca/coll/ml opal/util/keyval orte/mca/rmaps/rank_file

2012-12-04 Thread Barrett, Brian W
We should never have to have the makefile extension. Just making sure the lex file gets included should work. When Nathan commits his patch, I'll take a look. Brian

Re: [OMPI devel] [EXTERNAL] RFC: Enable thread support by default

2012-12-10 Thread Barrett, Brian W
On 12/8/12 7:59 PM, "Ralph Castain" wrote: >WHAT:Enable both OPAL and libevent thread support by default > >WHY: We need to support threaded operations for MPI-3, and for >MPI_THREAD_MULTIPLE. > Enabling thread support by default is the only way to >ensure we fix all the

Re: [OMPI devel] [EXTERNAL] RFC: Enable thread support by default

2012-12-10 Thread Barrett, Brian W
On 12/10/12 11:25 AM, "Ralph Castain" wrote: > >On Dec 10, 2012, at 10:15 AM, "Barrett, Brian W" >wrote: > >> On 12/8/12 7:59 PM, "Ralph Castain" wrote: >> >>> WHAT:Enable both OPAL and libevent thread support by default >

Re: [OMPI devel] [EXTERNAL] RFC: Enable thread support by default

2012-12-10 Thread Barrett, Brian W
read support by default On Dec 10, 2012, at 10:35 AM, "Barrett, Brian W" wrote: > On 12/10/12 11:25 AM, "Ralph Castain" wrote: > >> >> On Dec 10, 2012, at 10:15 AM, "Barrett, Brian W" >> wrote: >> >>> On 12/8/12 7:59 PM, "

[OMPI devel] bcol basesmuma maintainer?

2013-01-02 Thread Barrett, Brian W
Hi all - Who's maintaining the bcol basesmuma component? I'd like to commit the attached patch, which cleans up some usage of process names, but want a second pair of eyeballs. The orte_namelist_t type is meant for places where the orte_process_name_t needs to be put on a list. In basesmuma, i

[OMPI devel] RFC: RTE Framework

2013-01-21 Thread Barrett, Brian W
Hi all - As discussed at the December developer's meeting, a number of us have been working on a framework in OMPI to encompass the RTE resources (typically provided by ORTE). This follows on work Oak Ridge did on the ORCA layer, which ended up having a number of technical challenges and was drop

Re: [OMPI devel] [EXTERNAL] Re: RTE Framework

2013-01-23 Thread Barrett, Brian W
l of granularity one can bring in additional capabilities. >> I have not looked in detail yet, but will in the near future. >> >> Thanks, >> Rich >> >> -Original Message- >> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On

Re: [OMPI devel] [EXTERNAL] Open MPI Configure Script

2013-01-28 Thread Barrett, Brian W
On 1/28/13 10:50 AM, "David Beer" wrote: > By way of introduction, I'm a TORQUE developer and I probably should've joined > this list - even if only to keep myself informed - years ago. > > At any rate, we're in the process of changing TORQUE so that it compiles using > g++ instead of gcc. We're

Re: [OMPI devel] [EXTERNAL] Open MPI Configure Script

2013-01-28 Thread Barrett, Brian W
On 1/28/13 11:54 AM, "David Beer" wrote: > checking for tm_init in -ltorque... no > configure: error: TM support requested but not found. Aborting > > Oddly enough, if you have already configured with an older version of TORQUE, > you can build open-mpi with TORQUE 4.2 installed, so it can find

Re: [OMPI devel] [EXTERNAL] trunk install failure [brbarret]

2013-01-29 Thread Barrett, Brian W
Thanks for noticing this. Fixed in the trunk. Brian On 1/28/13 11:15 PM, "Paul Hargrove" wrote: > Using tonight's trunk tarball (r27954) configured using "--with-devel-headers" > it looks like "make install" is trying to install rte_orte.h TWICE: > >> /usr/bin/install -c -m 644 ../../../../

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp

2013-02-01 Thread Barrett, Brian W
I don't think this is right either. Excluding a device that doesn't exist has many use cases, such as disabling a network that only exists on part of the cluster. I'm not sure about what to do with seq; it's more like include than exclude. Brian

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp

2013-02-04 Thread Barrett, Brian W
I'm confused; why is it disastrous to have an interface in if_exclude that doesn't exist? I can see it being a problem if we don't exclude something in the list, but the other way is (in my opinion) harmless but with a useful use case... Brian
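A small sketch of the argument above, assuming exclusion is a plain name match against the interfaces present on a node. The function names are hypothetical and this is not the TCP BTL's actual code. An exclude entry that matches nothing simply has no effect, which is why one list can be shared across a cluster where only some nodes have a given network.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `name` appears in the exclude list. */
static bool is_excluded(const char *name,
                        const char *const *exclude, size_t n_exclude)
{
    for (size_t i = 0; i < n_exclude; i++) {
        if (strcmp(name, exclude[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Copy the names of interfaces that are present and not excluded
 * into `out` (which must hold at least n_present entries) and
 * return how many were kept. Exclude entries with no matching
 * interface are silently ignored. */
static size_t filter_interfaces(const char *const *present, size_t n_present,
                                const char *const *exclude, size_t n_exclude,
                                const char **out)
{
    assert(present != NULL && out != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n_present; i++) {
        if (!is_excluded(present[i], exclude, n_exclude)) {
            out[kept++] = present[i];
        }
    }
    return kept;
}
```

For example, a node with interfaces "lo" and "eth0" and an exclude list of "lo" and "ib0" keeps only "eth0"; the "ib0" entry does nothing on that node. Whether a nonexistent entry should instead produce a warning or an error is the policy question under discussion in the thread, and this sketch does not settle it.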

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp

2013-02-05 Thread Barrett, Brian W
t I have on some machines, >every single MPI job hung because they tried to use those interfaces to >communicate with processes on other nodes that that interface could not >reach. > > > >On Feb 4, 2013, at 5:56 PM, "Barrett, Brian W" wrote: > >> I'm confused
