Re: [OMPI devel] Multirail + Open MPI 1.6.1 = very big latency for the first communication

2012-11-01 Thread TERRY DONTJE
IIRC, the first 16 or so messages over the openib btl use the send/recv API as opposed to rdma, which is significantly faster. I am not sure how 1.5.3 and multi-rail affect this, but I believe the preconnected case short circuits when one cuts over to use rdma for eager messages. --td
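A minimal sketch of the cutover described above, assuming a simple per-peer message counter. The names (peer_state_t, choose_eager_path, EAGER_RDMA_THRESHOLD) are hypothetical, and the threshold of 16 comes only from the "16 or so" estimate in the message, not from the openib source. It only illustrates why the first messages to a given peer pay the slower send/recv latency.

```c
/* Hypothetical threshold, taken from the "first 16 or so messages" estimate. */
#define EAGER_RDMA_THRESHOLD 16

typedef enum { PATH_SEND_RECV, PATH_EAGER_RDMA } eager_path_t;

/* Hypothetical per-peer state: how many eager messages have gone to this peer. */
typedef struct {
    unsigned messages_seen;
} peer_state_t;

/* Early messages to a peer use send/recv; once the counter reaches the
   threshold, later eager messages take the (faster) RDMA path. */
static eager_path_t choose_eager_path(peer_state_t *peer)
{
    if (peer->messages_seen < EAGER_RDMA_THRESHOLD) {
        peer->messages_seen++;
        return PATH_SEND_RECV;
    }
    return PATH_EAGER_RDMA;
}
```

In this model a latency benchmark that times only the first exchange with each peer always measures the send/recv path, which is consistent with the "very big latency for the first communication" in the thread title.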

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread TERRY DONTJE
Interestingly enough it worked for me for a while and then after many runs I started seeing the below too. --td On 7/26/2012 11:07 AM, Ralph Castain wrote: Hmmm...it was working for me, but I'll recheck. Thanks! On Jul 26, 2012, at 8:04 AM, George Bosilca wrote: r26868 seems to have some

Re: [hwloc-devel] hwloc v1.5rc1 coming soon

2012-07-10 Thread TERRY DONTJE
Is this also going to include the topology_solaris.c improvements? --td On 7/10/2012 3:16 PM, Brice Goglin wrote: I am preparing a v1.5rc1 release, so here's the current status in case somebody wants to comment. Major changes in v1.5: * instruction caches * lstopo becomes lstopo +

Re: [OMPI devel] openib compiler warnings

2012-07-10 Thread TERRY DONTJE
On 7/10/2012 1:57 PM, TERRY DONTJE wrote: On 7/10/2012 12:50 PM, Jeff Squyres wrote: I'm getting these compiler warnings on the SVN trunk HEAD (r26776): btl_openib.c: In function 'mca_btl_openib_put': btl_openib.c:1652: warning: assignment makes integer from pointer without a cast

Re: [OMPI devel] openib compiler warnings

2012-07-10 Thread TERRY DONTJE
On 7/10/2012 12:50 PM, Jeff Squyres wrote: I'm getting these compiler warnings on the SVN trunk HEAD (r26776): btl_openib.c: In function 'mca_btl_openib_put': btl_openib.c:1652: warning: assignment makes integer from pointer without a cast btl_openib.c: In function 'mca_btl_openib_get':

[OMPI devel] openib max_cqe

2012-07-05 Thread TERRY DONTJE
With Jeff's latest changes to how we set up the cq_size, I am now seeing error messages saying that my machine's memlocked limits are too low. I am concerned that it might be something else, because my max locked memory is unlimited on my machine. So if I do a run of -np 2 across two
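One way to check what the job itself sees, as opposed to what the interactive shell reports, is to query the limit from inside each rank. This is a sketch using POSIX getrlimit(RLIMIT_MEMLOCK); the helper names are hypothetical. A common cause of this kind of mismatch is processes started by a remote daemon inheriting a lower limit than the login shell, which this check would expose.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Returns 1 if the soft RLIMIT_MEMLOCK seen by this process is unlimited,
   0 if it is finite, and -1 if getrlimit() fails. */
static int memlock_unlimited(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY;
}

/* Prints the soft locked-memory limit as this process sees it. */
static void report_memlock(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit");
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("memlock soft limit: unlimited\n");
    else
        printf("memlock soft limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);
}
```

Calling report_memlock() early in each rank of the failing -np 2 run would show whether the remote process really has the unlimited value seen in the shell.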

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26707 - in trunk/ompi: config mca/btl/ofud mca/btl/openib mca/common/ofacm mca/common/ofautils mca/dpm

2012-07-02 Thread TERRY DONTJE
So is ofacm another replacement for ibcm and rdmacm? --td On 7/2/2012 11:20 AM, Nathan Hjelm wrote: Nice! Are we moving this to 1.7 as well? -Nathan On Mon, Jul 02, 2012 at 11:20:12AM -0400, svn-commit-mai...@open-mpi.org wrote: Author: pasha (Pavel Shamis) Date: 2012-07-02 11:20:12 EDT

Re: [hwloc-devel] HWLOC_NBMAXCPUS

2012-06-29 Thread TERRY DONTJE
DONTJE <terry.don...@oracle.com> wrote: I finally got access to the big machine again. The code you sent me seems to work nicely. Are you going to put it back into the hwloc trunk and 1.4 branches? --td On 6/25/2012 11:31 AM, TERRY DONTJE wrote: I'll test out the patc

Re: [hwloc-devel] HWLOC_NBMAXCPUS

2012-06-25 Thread TERRY DONTJE
I'll test out the patch once the test machine is available again. --td On 6/25/2012 3:42 AM, Brice Goglin wrote: Hello Terry, Here's a patch that should help. It cleans the code and makes all arrays dynamic. I artificially set the initial array sizes to 4 to exercise the code on our 24-way
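The general idea of replacing a fixed HWLOC_NBMAXCPUS-sized array with a dynamic one can be sketched as a growable array that doubles its capacity on demand. This is not the hwloc patch itself; cpu_array_t and its functions are hypothetical names. The initial capacity of 4 mirrors the trick described above of starting small so the growth path is exercised even on a modest machine.

```c
#include <stdlib.h>

/* Hypothetical growable array of CPU ids (no compile-time maximum). */
typedef struct {
    unsigned *ids;
    size_t count;
    size_t capacity;
} cpu_array_t;

static int cpu_array_init(cpu_array_t *a)
{
    a->count = 0;
    a->capacity = 4; /* deliberately small, to exercise cpu_array_push's growth */
    a->ids = malloc(a->capacity * sizeof *a->ids);
    return a->ids ? 0 : -1;
}

/* Appends one id, doubling the buffer when it is full. */
static int cpu_array_push(cpu_array_t *a, unsigned id)
{
    if (a->count == a->capacity) {
        size_t newcap = a->capacity * 2;
        unsigned *tmp = realloc(a->ids, newcap * sizeof *tmp);
        if (!tmp)
            return -1; /* old buffer is still valid and still owned by a */
        a->ids = tmp;
        a->capacity = newcap;
    }
    a->ids[a->count++] = id;
    return 0;
}

static void cpu_array_free(cpu_array_t *a)
{
    free(a->ids);
    a->ids = NULL;
    a->count = a->capacity = 0;
}
```

Doubling keeps the number of reallocations logarithmic, so a 1440-CPU machine needs only nine resizes starting from 4.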

Re: [OMPI devel] OpenIB compile error

2012-06-25 Thread TERRY DONTJE
On 6/25/2012 10:12 AM, Jeff Squyres wrote: On Jun 25, 2012, at 5:44 AM, TERRY DONTJE wrote: Hmmm, I guess I could see the thinking of tying ofud and openib btls configuring together. However it seems inconsistent to me that one btl doesn't allow you to control configuring

Re: [OMPI devel] OpenIB compile error

2012-06-25 Thread TERRY DONTJE
On 6/23/2012 6:32 AM, Jeff Squyres wrote: On Jun 22, 2012, at 11:26 PM, TERRY DONTJE wrote: 4. The behavior of --with[out]-verbs is as was described in a prior mail: - if --with-verbs is specified, all 3 verbs-based components must succeed - if --without-verbs is specified, all 4 verbs

Re: [OMPI devel] OpenIB compile error

2012-06-23 Thread TERRY DONTJE
On 6/22/2012 3:36 PM, Jeff Squyres wrote: To update everyone: there was much more discussion about this off-list. :-) We decided to do the following: 1. The name --with-verbs seems better than --with-openfabrics, if for no other reason than the name "openfabrics" encompasses more things

[OMPI devel] bug in r26626

2012-06-22 Thread TERRY DONTJE
It looks like compilation on 32 bit platforms is failing due to a missing field. It looks to me like, for some reason, r26626 deleted hdr_segkey in ompi/mca/osc/rdma/osc_rdma_header.h, which is used in the macros OMPI_OSC_RDMA_RDMA_INFO_HDR_NTOH and HTON. Is there a reason that hdr_segkey was

Re: [hwloc-devel] HWLOC_NBMAXCPUS

2012-06-21 Thread TERRY DONTJE
I guess I was looking at the wrong version of code since I now see that topology-linux.c has fixed this issue. I guess I will need to look to port this change over to solaris-topology.c --td On 6/21/2012 9:46 AM, TERRY DONTJE wrote: I see a couple places where HWLOC_NBMAXCPUS is defined

[hwloc-devel] HWLOC_NBMAXCPUS

2012-06-21 Thread TERRY DONTJE
I see a couple places where HWLOC_NBMAXCPUS is defined with a comment of "FIXME: drop". This static size just bit me on a machine that has 1440 CPUs. I can bump up the define in my clone, but I was wondering if this fixed size might change in the near future?

Re: [OMPI devel] OpenIB compile error

2012-06-21 Thread TERRY DONTJE
On 6/21/2012 6:38 AM, Jeff Squyres wrote: On Jun 21, 2012, at 6:11 AM, TERRY DONTJE wrote: As far as I understand it is not a reason to rename it. The OFED-lovin components should look at $with_openib. I agree with Pasha that the reason you give for renaming openib btl seems orthogonal

Re: [OMPI devel] OpenIB compile error

2012-06-21 Thread TERRY DONTJE
On 6/20/2012 5:02 PM, Jeff Squyres wrote: On Jun 20, 2012, at 4:25 PM, Shamis, Pavel wrote: I hate it ... As far as I understand it is not a reason to rename it. The OFED-lovin components should look at $with_openib. I agree with Pasha that the reason you give for renaming openib btl seems

[OMPI devel] openib Dynamic SL opensm-devel usage

2012-06-18 Thread TERRY DONTJE
I've run into an issue compiling openib's Dynamic SL support on a RH 6.2 based system with the Oracle Studio compilers. It turns out that if I compile btl_openib_connect_sl.c with the Oracle Studio compilers with the "-g" option, the compiler compiles some static inline functions in ib_types.h

Re: [OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread TERRY DONTJE
#endif + Void_t* _int_malloc(mstate av, size_t bytes) { -- End of Patch -- On 24 May 2012, at 6:54 AM, TERRY DONTJE wrote: I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv in opal_memory_ptmalloc2_int_free when OMPI trunk is compiled with icc 12.1.0 for

Re: [OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread TERRY DONTJE
timization_level 1 +#endif +#endif + Void_t* _int_malloc(mstate av, size_t bytes) { -- End of Patch -- On 24 May 2012, at 6:54 AM, TERRY DONTJE wrote: I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv in opal_memory_ptmalloc2_int_free when OMPI trunk is compil

[OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread TERRY DONTJE
I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv in opal_memory_ptmalloc2_int_free when OMPI trunk is compiled with icc 12.1.0 for 64 bit on linux. Just wondering if anyone has seen anything similar to this with a different version of icc. Other non-Intel compilers seem

Re: [OMPI devel] Non-zero exit status

2012-04-14 Thread TERRY DONTJE
't believe we are setting the env-var which is why I think we have a regression. It also seems very suspicious to me that both Oracle and IU are seeing the same condition in MTT. I'll look into this more on Monday. --td On Apr 13, 2012, at 4:32 PM, TERRY DONTJE wrote: I could see if l

Re: [OMPI devel] Non-zero exit status

2012-04-13 Thread TERRY DONTJE
I could see that if fewer than N processes exit with a non-zero exit code, ORTE may choose not to abort the job. However, if all N processes have exited or aborted, I expect everything to clean up and mpirun to exit. It does not do that at the moment, which I think is what is causing most of

Re: [OMPI devel] r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread TERRY DONTJE
there be other OSes that could run into the same problem? Or what happens if PTMALLOC2 is not used (does that happen)? --td On 4/13/2012 10:45 AM, TERRY DONTJE wrote: r26255 is forcing the use of __malloc_hook which is implemented in opal/mca/memory/linux however that is not compiled in the library

[OMPI devel] r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread TERRY DONTJE
r26255 is forcing the use of __malloc_hook, which is implemented in opal/mca/memory/linux; however, that is not compiled into the library when built on Solaris, thus causing a "referenced symbol not found" error when libmpi tries to load the openib btl. I am looking into how to fix this now, but if someone has a

[OMPI devel] trunk regressions

2012-04-09 Thread TERRY DONTJE
After looking at Oracle's MTT results, there seem to be one or more regressions between r26240 and r26249 detected by the ibm and intel test suites. An example of this is the failures in the comm_join, final and loop_spawn tests of the ibm test suite as seen in

Re: [OMPI devel] [EXTERNAL] Re: Developers Meeting

2012-04-09 Thread TERRY DONTJE
+1 here too. --td On 4/6/2012 11:19 PM, Barrett, Brian W wrote: Agreed. Brian On Apr 6, 2012, at 7:31 PM, Ralph Castain wrote: +1 for SJ - much easier to be someplace with a major airport. On Apr 5, 2012, at 7:54 AM, Gutierrez, Samuel K wrote: My vote is for San Jose. Sam

Re: [OMPI devel] OpenMPI and R

2012-04-06 Thread TERRY DONTJE
Have you tried to compile and run a simple MPI program with your installed Open MPI? If that works then you need to figure out what is being done by the Makefile when it is "testing if installed package can be loaded" and try and reproduce the issue manually. BTW, I normally configure my

Re: [OMPI devel] Intel test MPI_Keyval3_f now failing

2012-04-05 Thread TERRY DONTJE
bindings will be replaced in about 2 weeks, and the problem doesn't occur on my mpi3-fortran bitbucket. On Apr 5, 2012, at 7:03 AM, TERRY DONTJE wrote: I noticed both IU and Oracle are seeing failures on the trunk with Intel test MPI_Keyval3_f. This was with r26237 and the last successful MTT

[OMPI devel] Intel test MPI_Keyval3_f now failing

2012-04-05 Thread TERRY DONTJE
I noticed both IU and Oracle are seeing failures on the trunk with Intel test MPI_Keyval3_f. This was with r26237 and the last successful MTT run of this test was r26232. I looked at the log and nothing popped out at me. I'll try and narrow this down a little further but that won't be until

Re: [OMPI devel] SUSE verification

2012-03-21 Thread TERRY DONTJE
Sorry, the cc line below is for the Solaris Studio compilers; if you have gcc, replace "-G" with "-shared". thanks, --td On 3/21/2012 11:32 AM, TERRY DONTJE wrote: I ran into a problem on a Suse 10.1 system and was wondering if anyone has a version of Suse newer than 10.1 that c

[OMPI devel] SUSE verification

2012-03-21 Thread TERRY DONTJE
I ran into a problem on a Suse 10.1 system and was wondering if anyone has a version of Suse newer than 10.1 that can try the following test and send me the results. -testpci cat

Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-23 Thread TERRY DONTJE
On 2/22/2012 8:53 PM, Jeffrey Squyres wrote: Terry / Eugene -- Can you comment? Sorry I cannot. --td On Feb 22, 2012, at 3:16 PM, Paul H. Hargrove wrote: I think I have the beginning of a fix for this issue. I had not even noticed earlier that the error in event.h is from the C++

Re: [OMPI devel] 1.5 supported systems

2012-02-23 Thread TERRY DONTJE
I actually think the systems tested line for Solaris should read: - Oracle Solaris 10 and 11, 32 and 64 bit (SPARC, i386, x86_64), with Oracle Solaris Studio 12.2 and 12.3 --td On 2/22/2012 8:55 PM, Paul H. Hargrove wrote: Folks at Oracle should decide, but I suspect "Solaris 10" should be

Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread TERRY DONTJE
On 2/21/2012 5:55 AM, Jeff Squyres (jsquyres) wrote: That is truly bizarre "make" behavior. Heads up that in the upcoming fortran revamp, we *only* use FC. I.E., there's only mpifort wrapper compiler (mpif77 and mpif90 still exist, but only as sym links to mpifort, signifying that mpifort

Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread TERRY DONTJE
On 2/10/2012 11:50 AM, Jeff Squyres wrote: This is an open question to OMPI developers... It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when Xen is activated. This IP interface is only used to communicate with the local Xen instance(s); it is not used to communicate

Re: [OMPI devel] 1.4.5rc5 has been released

2012-02-08 Thread TERRY DONTJE
On 2/7/2012 9:57 PM, Paul H. Hargrove wrote: On 2/7/2012 2:37 PM, Paul H. Hargrove wrote: + "make check" fails atomics tests using GCCFSS-4.0.4 compilers on Solaris10/SPARC Originally reported in: http://www.open-mpi.org/community/lists/devel/2012/01/10234.php This is a matter of the

Re: [OMPI devel] 1.4.5rc5 has been released

2012-02-08 Thread TERRY DONTJE
On 2/7/2012 4:25 PM, Paul H. Hargrove wrote: On 2/7/2012 8:59 AM, Jeff Squyres wrote: This fixes all known issues. Well, not quite... I've SUCCESSFULLY retested 44 out of the 55 cpu/os/compiler/abi combinations currently on my list. I expect 9 more by the end of the day (the

Re: [OMPI devel] 1.4.5rc2 more Solaris Studio compiler results

2012-01-30 Thread TERRY DONTJE
On 1/29/2012 7:40 PM, Paul Hargrove wrote: I can additionally report success w/ ILP32 builds with both SS12.2 and 12.3 compilers on x86-64 and sun4v systems running Solaris and x86-64/Linux: solaris-10 Generic_137111-07/sun4v (*FLAGS="-m32 -xarch=sparc" for v8plus ABI) solaris-11

Re: [OMPI devel] 1.4.5rc2 now released

2012-01-20 Thread TERRY DONTJE
On 1/19/2012 5:22 PM, Paul H. Hargrove wrote: Minor documentation nit, which might apply to the 1.5 branch as well (didn't check). README says: - Open MPI does not support the Sparc v8 CPU target, which is the default on Sun Solaris. The v8plus (32 bit) or v9 (64 bit) targets must be

Re: [hwloc-devel] Solaris visibility issue

2012-01-18 Thread TERRY DONTJE
Ok, I tried the patch and it seems to work for me. On 1/18/2012 2:10 PM, Samuel Thibault wrote: TERRY DONTJE, on Wed 18 Jan 2012 19:23:03 +0100, wrote: Don't you need the function to make lstopo work? lstopo itself does not need it, only internal functions of libhwloc need it. Could you

Re: [hwloc-devel] Solaris visibility issue

2012-01-18 Thread TERRY DONTJE
On 1/18/2012 1:17 PM, Samuel Thibault wrote: TERRY DONTJE, on Wed 18 Jan 2012 18:52:50 +0100, wrote: Also, I tried to build v1.4 and the trunk and I keep getting linkage errors on lstopo-lstopo-draw.o complaining about hwloc_insert_object_by_cpuset being undefined. It is defined in ./src

Re: [hwloc-devel] topology-solaris-chiptype.c bugs on processortype

2012-01-18 Thread TERRY DONTJE
On 1/18/2012 12:59 PM, TERRY DONTJE wrote: On 1/18/2012 12:51 PM, Samuel Thibault wrote: Samuel Thibault, on Wed 18 Jan 2012 15:55:18 +0100, wrote: I'm getting an issue with the topology-solaris-chiptype.c work on Solaris 10: in the ProcessorType case, the returned value does not look

Re: [hwloc-devel] topology-solaris-chiptype.c bugs on processortype

2012-01-18 Thread TERRY DONTJE
On 1/18/2012 12:51 PM, Samuel Thibault wrote: Samuel Thibault, on Wed 18 Jan 2012 15:55:18 +0100, wrote: I'm getting an issue with the topology-solaris-chiptype.c work on Solaris 10: in the ProcessorType case, the returned value does not look like a readable string, I'm getting "¨", which

Re: [OMPI devel] missing file when running autogen - ALSO in 1.4.5rc1

2011-12-21 Thread TERRY DONTJE
Paul is more than likely right. The nightly runs Oracle does using MTT and tarballs do not run autogen.sh (which I believe is not expected anyway, right?). All other builds we do using autogen.* are from an svn workspace. --td On 12/20/2011 8:21 PM, Paul H. Hargrove wrote: Not too

Re: [OMPI devel] Nodes already filled when spawning

2011-12-15 Thread TERRY DONTJE
, we changed the default for RM-given allocations to be no-oversubscribe. So your MTTs may well fail if they weren't updated as all those tests oversubscribe the nodes, and are running in RM environments. On Dec 15, 2011, at 8:37 AM, TERRY DONTJE wrote: Last night MTT test results

[OMPI devel] Nodes already filled when spawning

2011-12-15 Thread TERRY DONTJE
Last night's MTT test results for 1.7a1r25652 from IU and Oracle are showing failures during some of the spawn tests; see http://www.open-mpi.org/mtt/index.php?do_redir=2036. Essentially, the tests are failing with the message: All nodes which are allocated for this job are already filled. I

Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)

2011-11-23 Thread TERRY DONTJE
On 11/23/2011 1:45 PM, Lukas Razik wrote: TERRY DONTJE<terry.don...@oracle.com> wrote Can you build OMPI as a 32 bit library and see if that works any better? So you mean I shall leave the whole OFED stack as 64 bit and build only openmpi as 32 bit? I believe the OFED user lib

Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)

2011-11-23 Thread TERRY DONTJE
On 11/23/2011 11:05 AM, Lukas Razik wrote: TERRY DONTJE<terry.don...@oracle.com> wrote: Nuts!!! Ok I am going to have to think about this a little more. Do you have the ability to configure and remake your ompi install? I might want to have you add some stuff to help me track thi

Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)

2011-11-23 Thread TERRY DONTJE
On 11/23/2011 10:11 AM, Lukas Razik wrote: TERRY DONTJE<terry.don...@oracle.com> wrote: Can you try running the benchmark with coalescing off? To do that add the following option to your mpirun line "-mca btl_openib_use_message_coalescing 0". I've tried this:

Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)

2011-11-23 Thread TERRY DONTJE
On 11/23/2011 9:57 AM, Lukas Razik wrote: TERRY DONTJE<terry.don...@oracle.com> wrote: On 11/22/2011 6:59 PM, Lukas Razik wrote: Roland Dreier<rol...@purestorage.com> wrote: On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik<li...@razik.name> wrote: #0

Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)

2011-11-23 Thread TERRY DONTJE
On 11/22/2011 6:59 PM, Lukas Razik wrote: Roland Dreier wrote: On Tue, Nov 22, 2011 at 3:05 PM, Lukas Razik wrote: #0 0xf8010229ba9c in mca_pml_ob1_send_request_start_copy (sendreq=0xb23200, bml_btl=0xb29050, size=0) at pml_ob1_sendreq.c:551

Re: [OMPI devel] Rename "vader" BTL to "xpmem"

2011-11-22 Thread TERRY DONTJE
So with the aliasing scheme, the code for openib would still live under ompi/mca/btl/openib, but you could access it with -mca btl ofrc? Ok, so when an error happens in the openib btl, how does it identify itself? Does it use openib or ofrc? This seems like there could be some user confusion by

Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)

2011-11-22 Thread TERRY DONTJE
On 11/22/2011 5:49 AM, TERRY DONTJE wrote: The error you are seeing is usually indicative of some code operating on memory that isn't aligned properly for a SPARC instruction being used. The address that is causing the failure is odd aligned which is more than likely the culprit. If you
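A small sketch of the alignment problem described above, assuming the faulting code reads a wide type through a misaligned pointer. The helpers is_aligned and load_u64 are hypothetical, not Open MPI functions. Copying through memcpy is one portable way to read such data without triggering a bus error on strict-alignment CPUs such as SPARC.

```c
#include <stdint.h>
#include <string.h>

/* Returns nonzero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Reads a 64-bit value from a possibly misaligned address. Dereferencing
   (uint64_t *)p directly at an odd address can raise SIGBUS on SPARC;
   memcpy lets the compiler emit loads that are safe for any alignment. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Checking the faulting pointer with is_aligned(ptr, 8) (or the width of the type being accessed) in a debug build is a quick way to confirm that an odd address is the culprit.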

Re: [OMPI devel] Rename "vader" BTL to "xpmem"

2011-11-17 Thread TERRY DONTJE
On 11/17/2011 9:54 AM, Ralph Castain wrote: On Nov 17, 2011, at 7:45 AM, TERRY DONTJE wrote: I could possibly buy your argument Ralph if this was a one off BTL that only Nathan (and his employer) is going to use. I am assuming though this is a more general protocol for a vendor specific

Re: [OMPI devel] Rename "vader" BTL to "xpmem"

2011-11-17 Thread TERRY DONTJE
vote that counts is Nathan's - it's his btl, and we have never forcibly made someone rename their component. I would suggest we not set that precedent. I'm comfortable with whatever he decides to call it. On Nov 17, 2011, at 7:00 AM, TERRY DONTJE wrote: +1 Isn't there precedent

Re: [OMPI devel] Rename "vader" BTL to "xpmem"

2011-11-17 Thread TERRY DONTJE
+1 Isn't there precedent with the other BTLs to name them based on the messaging protocol they are supporting instead of some movie character (tcp, openib, shmem, portals, ...). --td On 11/17/2011 8:11 AM, Jeff Squyres wrote: After having to explain to someone at SC for the umpteenth time

Re: [OMPI devel] r25470 (hwloc CMR) breaks v1.5

2011-11-16 Thread TERRY DONTJE
On 11/15/2011 10:16 PM, Jeff Squyres wrote: On Nov 14, 2011, at 10:17 PM, Eugene Loh wrote: I tried building v1.5. r25469 builds for me, r25470 does not. This is Friday's hwloc putback of CMR 2866. I'm on Solaris11/x86. The problem is basically: Doh! Making all in tools/ompi_info

Re: [OMPI devel] RFC: MCA param registration errors

2011-11-02 Thread TERRY DONTJE
On 11/1/2011 7:48 PM, Jeff Squyres wrote: So this was slightly different than the opinion that was discussed on the call today, which was 2. The rationale for #2 was to punish developers, but if such a bug did make it through to production, users wouldn't be annoyed with show_help messages

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25302

2011-10-18 Thread TERRY DONTJE
Strange - it ran fine for me on multiple tests. I'll check to see if something strange got into the mix and recommit. Not sure it is the same issue but it looks like all my MTT tests on the trunk r25308 are timing out. --td On Oct 17, 2011, at 8:51 PM, George Bosilca wrote: This commit

Re: [OMPI devel] [OMPI bugs] [Open MPI] #2888: base.h inclusion breaks Solaris build

2011-10-18 Thread TERRY DONTJE
BTW, I am working on a patch for this. Just want to validate there are no other loose ends. I remember there were a couple oddities about this issue. --td Never mind; I just read your text more carefully - 2887 caused the problem. Sent from my phone. No type good. On Oct 18, 2011, at

Re: [OMPI devel] [OMPI bugs] [Open MPI] #2888: base.h inclusion breaks Solaris build

2011-10-18 Thread TERRY DONTJE
Terry - Did #2887 fix this already? No it broke it. --td Sent from my phone. No type good. On Oct 18, 2011, at 6:19 AM, "Open MPI" wrote: #2888: base.h inclusion breaks Solaris build + Reporter: tdd | Owner:

Re: [hwloc-devel] CPU Model and type

2011-09-16 Thread TERRY DONTJE
On 9/14/2011 5:54 AM, Brice Goglin wrote: On 13/09/2011 22:06, TERRY DONTJE wrote: On 9/13/2011 9:54 AM, Brice Goglin wrote: On 13/09/2011 21:51, TERRY DONTJE wrote: Both type and model are character strings. An example of what I currently store in the sysinfo structures is: type

Re: [hwloc-devel] CPU Model and type

2011-09-14 Thread TERRY DONTJE
On 9/14/2011 5:54 AM, Brice Goglin wrote: On 13/09/2011 22:06, TERRY DONTJE wrote: On 9/13/2011 9:54 AM, Brice Goglin wrote: On 13/09/2011 21:51, TERRY DONTJE wrote: Both type and model are character strings. An example of what I currently store in the sysinfo structures is: type

Re: [hwloc-devel] CPU Model and type

2011-09-13 Thread TERRY DONTJE
On 9/13/2011 9:23 AM, Brice Goglin wrote: On 12/09/2011 21:01, Brice Goglin wrote: On 09/09/2011 13:25, TERRY DONTJE wrote: On 9/8/2011 3:10 PM, Brice Goglin wrote: Hello Terry, Indeed there's nothing like this as of today. We talked about it in the past but it's not very easy

Re: [hwloc-devel] CPU Model and type

2011-09-09 Thread TERRY DONTJE
s. Since one can look for the named object and it is either going to be there or not :-). Anyway, I'll play around with this for Solaris. Can I then email you the diff for a review? thanks, --td On 08/09/2011 20:57, TERRY DONTJE wrote: I wanted to verify that I am not overlooking some

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread TERRY DONTJE
e all help / error messages As you can see it is identical to the output in your test. george. On Aug 18, 2011, at 12:29 , TERRY DONTJE wrote: Just ran MPI_Errhandler_fatal_c with r25063 and it still fails. Everything is the same except I don't see the "readv failed.." message. Ha

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread TERRY DONTJE
george. On Aug 18, 2011, at 12:29 , TERRY DONTJE wrote: Just ran MPI_Errhandler_fatal_c with r25063 and it still fails. Everything is the same except I don't see the "readv failed.." message. Have you tried to run this code yourself? It is pretty simple and fails with one node using

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread TERRY DONTJE
Just ran MPI_Errhandler_fatal_c with r25063 and it still fails. Everything is the same except I don't see the "readv failed.." message. Have you tried to run this code yourself? It is pretty simple and fails with one node using np=4. --td On 8/18/2011 10:57 AM, Wesley Bland wrote: I

[OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread TERRY DONTJE
I am seeing the intel test suite tests MPI_Errhandler_fatal_c and MPI_Errhandler_fatal_f fail with an oob failure quite a bit. I had not seen this test failing under MTT until the epoch code was added, so I have a suspicion the epoch code might be at fault. Could someone familiar with the

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24830

2011-07-14 Thread Terry Dontje
people. Other question is do Oracle folks care about IB QoS and torus/mesh topologies w.r.t. OMPI, because otherwise the dynamic SL is irrelevant. It is not an extreme priority of ours but we would like to support it. --td -- YK On Jul 14, 2011, at 7:24 AM, Terry Dontje wrote: I do but my

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24830

2011-07-14 Thread Terry Dontje
On 7/14/2011 9:30 AM, Yevgeny Kliteynik wrote: On 14-Jul-11 4:21 PM, Paul H. Hargrove wrote: On 7/13/2011 11:42 PM, Yevgeny Kliteynik wrote: [adding Terry] On 14-Jul-11 2:49 AM, Eugene Loh wrote: On 7/13/2011 4:31 PM, Paul H. Hargrove wrote: On 7/13/2011 4:20 PM, Yevgeny Kliteynik wrote:

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24830

2011-07-14 Thread Terry Dontje
I do but my machine room's power is down so I don't have access to it right now. I will grope around once it comes up to see what it has. I also have sent email to our IB team for some direction. --td On 7/14/2011 2:42 AM, Yevgeny Kliteynik wrote: [adding Terry] On 14-Jul-11 2:49 AM,

Re: [OMPI devel] opal_init/finalize counter --> boolean

2011-07-11 Thread Terry Dontje
Trying to uplevel this a bit so I can figure out which of these paths makes sense to me. Is the only reason we want to make init and finalize asymmetric to support an abort case? Forgive me Ralph, I know you had posted this in one of the emails but I wanted to

Re: [hwloc-devel] Merging the PCI branch?

2011-04-06 Thread Terry Dontje
On 04/06/2011 08:49 AM, Brice Goglin wrote: On 31/03/2011 18:06, Jeff Squyres wrote: On Mar 28, 2011, at 5:26 PM, Brice Goglin wrote: libpci is needed to make this work. And only Linux gives you OS devices for now (we use sysfs to translate between pci devs and os devs). Is libpci

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
wrote: On Mar 16, 2011, at 6:50 AM, Terry Dontje wrote: K. When Ralph and I removed that code, it was on the educated guess that no one was using it (because it hasn't compiled right in a while). If we were wrong, it can be put back, but someone will need to update it and Ralph and I don't have

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
On 03/16/2011 06:34 AM, Terry Dontje wrote: On 03/16/2011 06:21 AM, Jeff Squyres wrote: On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote: I've seen this with the following: RH 4.6 / OFED 1.3.6 Errr... did you look at http://www.open-mpi.org/community/lists/devel/2011/03/9068.php? Yes I did

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
AM, "Terry Dontje" <terry.don...@oracle.com> wrote: On 03/16/2011 06:21 AM, Jeff Squyres wrote: On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote: I've seen this with the following: RH 4.6 / OFED 1.3.6 Errr... did you look at http://www.

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-16 Thread Terry Dontje
On 03/16/2011 06:21 AM, Jeff Squyres wrote: On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote: I've seen this with the following: RH 4.6 / OFED 1.3.6 Errr... did you look at http://www.open-mpi.org/community/lists/devel/2011/03/9068.php? Yes I did, and I will be talking with my group about

Re: [OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-15 Thread Terry Dontje
It looks to me like r24507 is what changed in btl_openib_connect_oob.c to include the two header files that are conflicting with each other. --td On 03/15/2011 01:39 PM, Terry Dontje wrote: While compiling btl_openib_connect_oob.c I am getting identifier redeclared: ib_gid_t. Looks like

[OMPI devel] trunk not compiling for btl_openib_connect_oob.c

2011-03-15 Thread Terry Dontje
While compiling btl_openib_connect_oob.c I am getting identifier redeclared: ib_gid_t. Looks like infiniband/mad.h defines this and then iba/types.h tries to redefine it. I am on Linux compiling with gcc. Is anyone else seeing the same issue or am I possibly dealing with some old s/w? --

Re: [OMPI devel] RFC: use ISO C99 style struct initialization

2011-01-19 Thread Terry Dontje
Hopefully we'll find out tomorrow but I think I vaguely remember an issue with the Studio compilers and this type of initialization style. --td On 01/19/2011 05:22 PM, Nathan Hjelm wrote: Done. I added the module orte/mca/debugger/dummy and I will remove it tomorrow. -Nathan HPC-3, LANL On
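For reference, the ISO C99 style under discussion is designated initialization. The sketch below uses a hypothetical example_module_t (not an actual Open MPI struct) to contrast it with positional C89 initialization; a compiler that lacks full C99 support would reject the designated form, which is the kind of issue the Studio concern above is about.

```c
#include <string.h>

/* Hypothetical component descriptor, used only to illustrate the syntax. */
typedef struct {
    const char *name;
    int priority;
    int (*init)(void);
    void (*finalize)(void);
} example_module_t;

static int example_init(void) { return 0; }

/* C89 positional style: every value must appear in declaration order. */
static example_module_t positional = { "dummy", 10, example_init, NULL };

/* ISO C99 designated style: members are named, order is free, and any
   omitted member is zero-initialized (here finalize becomes NULL). */
static example_module_t designated = {
    .priority = 10,
    .name     = "dummy",
    .init     = example_init,
};

/* Field-by-field comparison of two descriptors. */
static int modules_equal(const example_module_t *a, const example_module_t *b)
{
    return strcmp(a->name, b->name) == 0 && a->priority == b->priority &&
           a->init == b->init && a->finalize == b->finalize;
}
```

The main practical benefit is that adding or reordering struct members no longer silently shifts values into the wrong fields of every initializer.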

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Terry Dontje
On 01/18/2011 07:48 AM, Jeff Squyres wrote: IBCM is broken and disabled (has been for a long time). Did you mean RDMACM? No, I think I meant OMPI oob. Sorry,

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Terry Dontje
Could the issue have anything to do with how OMPI implements lazy connections with IBCM? Does setting the mca parameter mpi_preconnect_all to 1 change things? --td On 01/16/2011 04:12 AM, Doron Shoham wrote: Hi, The gather hangs only in the linear_sync algorithm but works with basic_linear

[OMPI devel] openib btl_openib_async_thread poll question

2010-12-21 Thread Terry Dontje
We're doing some testing with openib btl on a system with Solaris. It looks like Solaris can return POLLIN|POLLRDNORM in revents from a poll call. I looked at the manpages for Linux and it reads like Linux could possibly do this too. However the code in btl_openib_async_thread that checks
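The excerpt is cut off before it shows the actual check in btl_openib_async_thread, so the following is a sketch under an assumption: that the code compares revents for equality with POLLIN. Testing the bit with a mask, rather than comparing the whole field, tolerates additional flags such as POLLRDNORM. Both function names are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* exposes POLLRDNORM on strict-standard builds */
#include <poll.h>

/* Fragile (assumed pattern): fails when the OS reports POLLIN together
   with POLLRDNORM, as Solaris does and POSIX permits. */
static int readable_strict(short revents)
{
    return revents == POLLIN;
}

/* Robust: test only the bits of interest and ignore any others. */
static int readable_masked(short revents)
{
    return (revents & (POLLIN | POLLRDNORM)) != 0;
}
```

Error conditions (POLLERR, POLLHUP, POLLNVAL) can likewise be tested with their own masks, so one poll() result can be checked for several conditions independently.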

Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Terry Dontje
AM, Terry Dontje wrote: On 11/30/2010 09:00 AM, Jeff Squyres wrote: On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote: Can you make a v1.7 milestone on Trac, so I can move some of my tickets? Done. I have a question about Josh's recent ticket moves. One of them mentions 1.5 is stabilizing quic

Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Terry Dontje
On 11/30/2010 09:00 AM, Jeff Squyres wrote: On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote: Can you make a v1.7 milestone on Trac, so I can move some of my tickets? Done. I have a question about Josh's recent ticket moves. One of them mentions 1.5 is stabilizing quickly. Josh, can you

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r23998

2010-11-08 Thread Terry Dontje
Hmmm, it looks like you are right, so my original change probably is the right thing then. --td On 11/08/2010 08:13 AM, Jeff Squyres wrote: It doesn't look like is needed at all in libevent207.h. Should it just be removed? On Nov 8, 2010, at 6:18 AM, Terry Dontje wrote: In light

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r23998

2010-11-08 Thread Terry Dontje
In light of the push event changes upstream to libevent, libevent207.h probably should be modified to look like event.h, that is, wrap the include with an ifdef for C++. I did not do this in the original fix because everything pulling it in was also pulling in opal_config.h
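The C++ guard being suggested is the usual `extern "C"` wrapper that libevent's compatibility event.h places around its includes. The sketch below assumes that pattern; the guard macro and `hypothetical_event_api_version` are placeholders standing in for the real libevent include, not the contents of libevent207.h.

```c
/* Hypothetical header fragment modeled on the event.h convention. */
#ifndef HYPOTHETICAL_LIBEVENT_WRAPPER_H
#define HYPOTHETICAL_LIBEVENT_WRAPPER_H

#ifdef __cplusplus
extern "C" {
#endif

/* In the real header the libevent #include would sit here, so C++
 * translation units see C linkage for every declaration it pulls in.
 * A hypothetical declaration stands in for it in this sketch. */
int hypothetical_event_api_version(void);

#ifdef __cplusplus
}
#endif

#endif /* HYPOTHETICAL_LIBEVENT_WRAPPER_H */

/* Definition, normally in a separate .c file compiled as C. */
int hypothetical_event_api_version(void)
{
    return 207;
}
```

When the header is compiled as C, `__cplusplus` is undefined and the guard lines vanish; when it is included from C++, the symbol keeps its unmangled C name, so it links against the C object file.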

Re: [OMPI devel] v1.5.1: one idea

2010-10-11 Thread Terry Dontje
On 10/11/2010 06:11 AM, Jeff Squyres wrote: On Oct 10, 2010, at 7:49 AM, Terry Dontje wrote: At first glance this sounds like a sane approach, but didn't we start with this same approach with 1.5.0? I know it was kind of required for 1.5.0, but we did go off track with delivery. I

Re: [hwloc-devel] Support for solaris lgrp_affinity_set

2010-08-20 Thread Terry Dontje
Samuel Thibault wrote: Hello, Terry Dontje, on Fri 06 Aug 2010 13:11:30 -0400, wrote: Is anyone looking at replacing the Solaris processor_bind calls with lgrp_affinity_set calls in hwloc? I eventually added support for lgrp_affinity_set(). Not as a replacement for processor_bind

Re: [OMPI devel] Trunk Commit Heads-up: New Common Shared Memory Component

2010-08-12 Thread Terry Dontje
Sorry Rich, I didn't realize there was a graph attached at the end of the message. In other words, my comments are not applicable because I didn't know you were asking about the graph. I agree it would be nice to know what the graph was plotting. --td Terry Dontje wrote: Graham, Richard

Re: [OMPI devel] Trunk Commit Heads-up: New Common Shared Memory Component

2010-08-12 Thread Terry Dontje
Terry, On Aug 11, 2010, at 12:34 PM, Terry Dontje wrote: I've done some minor testing on Linux looking at resident and shared memory sizes for np=4, 8 and 16 jobs. I could not see any appreciable differences in process sizes between sysv, posix, or mmap usage in the SM btl. So I am
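The message does not say how the resident and shared sizes were measured. One plausible way on Linux, assumed here rather than taken from the thread, is to have each rank sample `/proc/self/statm` after the SM btl is set up. `read_statm_kib` is a hypothetical helper; in statm, the second field is resident pages and the third is resident pages backed by files or shared memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: reads the calling process's resident and shared
 * sizes from /proc/self/statm (Linux only) and converts pages to KiB.
 * statm fields begin: size resident shared ...
 * Returns 0 on success, -1 on failure. */
static int read_statm_kib(long *resident_kib, long *shared_kib)
{
    long size, resident, shared;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL) {
        return -1;
    }
    int n = fscanf(f, "%ld %ld %ld", &size, &resident, &shared);
    fclose(f);
    if (n != 3) {
        return -1;
    }
    long page_kib = sysconf(_SC_PAGESIZE) / 1024;
    *resident_kib = resident * page_kib;
    *shared_kib = shared * page_kib;
    return 0;
}
```

Sampling at the same point in each run (for example, right after initialization) keeps the sysv, posix, and mmap numbers comparable, since pages only count as resident once they have been touched.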

[hwloc-devel] Support for solaris lgrp_affinity_set

2010-08-06 Thread Terry Dontje
Is anyone looking at replacing the Solaris processor_bind calls with lgrp_affinity_set calls in hwloc? -- Oracle Terry D. Dontje | Principal Software Engineer Developer Tools Engineering | +1.650.633.7054 Oracle * - Performance Technologies* 95 Network Drive, Burlington, MA 01803 Email

Re: [OMPI devel] RFC: Add new Solaris sysinfo component

2010-08-03 Thread Terry Dontje
Graham, Richard L. wrote: Why do we need an RFC for this sort of component? It seems self-contained. Probably don't; just giving a heads-up. --td Rich On 8/3/10 6:59 AM, "Terry Dontje" <terry.don...@oracle.com> wrote: WHAT: Add new Solaris sysinfo component WHY

Re: [OMPI devel] RFC: Remove all other paffinity components

2010-05-18 Thread Terry Dontje
Jeff Squyres wrote: Just chatted with Ralph about this on the phone and he came up with a slightly better compromise... He points out that we really don't need *all* of the hwloc API (there's a bajillion tiny little accessor functions). We could provide a steady, OPAL/ORTE/OMPI-specific API

Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-05 Thread Terry Dontje
Jeff Squyres wrote: On May 4, 2010, at 9:53 AM, Ashley Pittman wrote: Point noted. But actually -- can you give specific reasons as to why a user should care? Keep in mind that this would be a short-lived fork'ed process -- not "spawn" in the MPI sense of the word. You might be

Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-04 Thread Terry Dontje
Ralph Castain wrote: On May 4, 2010, at 3:45 AM, Terry Dontje wrote: Is a configure-time test good enough? For example, are all Linuxes the same in this regard? That is, if you built OMPI on RH and configure enabled the new SysV SM, will those bits actually run correctly on other Linux systems

Re: [OMPI devel] System V Shared Memory for Open MPI: Request for Community Input and Testing

2010-05-04 Thread Terry Dontje
Is a configure-time test good enough? For example, are all Linuxes the same in this regard? That is, if you built OMPI on RH and configure enabled the new SysV SM, will those bits actually run correctly on other Linux systems? I think Jeff hinted at something similar when suggesting this may
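One way to address the configure-time concern is a run-time probe: instead of trusting what configure saw on the build host, each installation checks at startup whether System V segments actually work there. The sketch below assumes that approach; `sysv_shm_runtime_probe` is a hypothetical helper, not code from the proposed component.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical run-time probe. Returns 1 if this host lets us create a
 * System V segment, attach it, mark it for removal with IPC_RMID, and
 * keep using the existing attachment; returns 0 otherwise. Running it
 * at startup catches hosts whose kernel settings (shm limits, disabled
 * SysV IPC) differ from the build machine. */
static int sysv_shm_runtime_probe(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1) {
        return 0; /* e.g. size above the shm limit or SysV IPC disabled */
    }

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *) -1) {
        shmctl(id, IPC_RMID, NULL); /* do not leak the segment */
        return 0;
    }

    /* Marking the segment for removal while attached lets the kernel
     * reclaim it once the last process detaches, even after a crash. */
    if (shmctl(id, IPC_RMID, NULL) == -1) {
        shmdt(addr);
        return 0;
    }

    /* The existing attachment must remain usable after IPC_RMID. */
    volatile char *p = addr;
    p[0] = 42;
    int ok = (p[0] == 42);

    shmdt(addr);
    return ok;
}
```

In this sketch, a caller would select the SysV path only when the probe returns 1 and otherwise fall back to another mechanism such as mmap, so a binary built on one Linux distribution degrades gracefully on another.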

Re: [OMPI devel] RFC: Deprecate rankfile?

2010-04-16 Thread Terry Dontje
he history of this feature, and I don't see anything changing to improve that situation. Now if you implement a replacement... :-) I'll get right on that after you approve the RFC that I am also supposed to send out :-). -td On Apr 16, 2010, at 5:08 AM, Terry Dontje wrote: Jeff Squyres wrote: On Ap
