Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-19 Thread George Bosilca
I ran an entire battery of tests on these without any issues. Moreover, it is an OMPI-related thing, and these error messages were never used. Anyway, please let me know what exactly failed and I'll fix it asap. Thanks, george. On Oct 19, 2011, at 10:06 , Ralph Castain wrote: > If you are go

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-19 Thread George Bosilca
OK, just saw your commit. It makes sense: an OMPI component should return OMPI error codes. Thanks for the fix. george. On Oct 19, 2011, at 10:12 , George Bosilca wrote: > I run an entire battery of tests on these without any issues. Moreover it is > an OMPI related thing, and these

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25323

2011-10-19 Thread George Bosilca
Indeed, I removed some of the OMPI-level error codes. As you can see in the patch, they were defined but never used. I don't think they were worth an RFC, as they are never used, not only in the trunk but also in 1.5 and 1.4. And I did check, because I was wondering why they existed in the first pl

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-19 Thread George Bosilca
I don't know how you think the error codes work in Open MPI, so I'll take the liberty of describing it here so we all agree we're talking about the same thing. opal_strerror is a nice feature: it allows registering a range of error codes with a particular error converter. Every time you loo

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-19 Thread George Bosilca
Can I have an example of how the current trunk is broken due to this change? Thanks, george. On Oct 19, 2011, at 16:32 , Ralph Castain wrote: > I propose that we retain the rest of the changeset, but revert the OMPI > constants to bring back their ORTE equivalents. We clearly should scrub tho

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-19 Thread George Bosilca
; should be discussed in an RFC. What the current code does is not > consistent with the original intent. > > I don't agree that you shouldn't propagate error codes through OMPI; in > fact, the original intent of the design was to allow such propagation. > Again, suc

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-19 Thread George Bosilca
somebody feels like filing an RFC to remove them, please feel free to go ahead. george. On Oct 19, 2011, at 18:41 , George Bosilca wrote: > A careful reading of the committed patch, would have pointed out that none of > the concerns raised so far were true, the "old-way" beha

Re: [OMPI devel] Locality info

2011-10-19 Thread George Bosilca
Wonderful!!! We've been waiting for such functionality for a while. I do have some questions/remarks related to this patch. What is the my_node_rank in the orte_proc_info_t structure? Is there any difference between using the field my_node_rank or the vpid part of the my_daemon? What is the cor

Re: [OMPI devel] Problem-Bug with MPI_Intercomm_create()

2011-10-26 Thread George Bosilca
_Barrier(), >>>> i got the following error : >>>> >>>> *** An error occurred in MPI_Barrier >>>> *** on communicator >>>> *** MPI_ERR_INTERN: internal error >>>> *** MPI_ERRORS_ARE_FATAL: your

Re: [OMPI devel] RFC: MCA param registration errors

2011-11-01 Thread George Bosilca
This is a much saner solution. We [mostly] stayed away from calling exit deep inside our libraries; there is no reason to add it now. I'll vote in favor of show_help + return code. george. On Nov 1, 2011, at 15:14 , Jeff Squyres wrote: > We talked about this on the call today. > > A good sugg
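The "show_help + return code" option favored above can be sketched like this. It is only an illustration: show_help_sketch, register_param_sketch and the SKETCH_* codes are hypothetical stand-ins, not the OPAL show_help or MCA registration functions.

```c
#include <stdio.h>

/* Hypothetical sketch: report a parameter-registration failure with a
 * help message and hand an error code back to the caller, instead of
 * calling exit() deep inside the library. */
#define SKETCH_SUCCESS        0
#define SKETCH_ERR_BAD_PARAM (-5)

/* Stand-in for a show_help-style facility (not the real OPAL one). */
static void show_help_sketch(const char *topic, const char *detail)
{
    fprintf(stderr, "[%s] invalid value for parameter \"%s\"\n", topic, detail);
}

static int register_param_sketch(const char *name, int value, int min, int max)
{
    if (value < min || value > max) {
        show_help_sketch("param-registration", name);
        return SKETCH_ERR_BAD_PARAM;   /* the caller decides whether to abort */
    }
    return SKETCH_SUCCESS;
}
```

The library stays exit-free; the decision to abort is pushed up to code that knows whether the failure is fatal for the application.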

Re: [OMPI devel] RFC: MCA param registration errors

2011-11-01 Thread George Bosilca
> 3. show_help. if OPAL_ENABLE_DEBUG, exit(1), else return error code. > > > > On Nov 1, 2011, at 4:50 PM, George Bosilca wrote: > >> This is a much saner solution. We [mostly] stayed away from calling exit >> deep into our libraries, there is no reason to ad

Re: [OMPI devel] RFC: MCA param registration errors

2011-11-02 Thread George Bosilca
as some opposition to having show_help messages possibly coming up all > over the place I thought a fall back of only doing the show_help on > enable_debug builds was a reasonable middle ground. > > --td >> >> On Nov 1, 2011, at 7:30 PM, George Bosilca wrote: >>

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25445

2011-11-06 Thread George Bosilca
I might have missed some of the phone conferences, but this is a highly critical modification of one of the performance-critical subsystems of Open MPI. There was no RFC about it and no prior warning. This change impacts every other BTL and PML out there. Moreover, at this point there is no ass

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25450

2011-11-07 Thread George Bosilca
This commit introduced quite a few warnings on Mac OS X. A snippet is attached below. Btw, why do we need to build buffer event support in libevent? And why ssl ... ../../../../../../ompi/opal/mca/event/libevent2013/libevent/bufferevent_openssl.c: In function 'bio_bufferevent_read': ../../../..

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25450

2011-11-07 Thread George Bosilca
t 11:25 PM, Nathan T. Hjelm wrote: > >> Hmm, I didn't come across that during testing. You are right that we >> should't be compiling with ssl support. >> >> >> On Mon, 7 Nov 2011 01:17:50 -0500, George Bosilca >> wrote: >>> This c

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25445

2011-11-07 Thread George Bosilca
tl. >>> >>> Anyone else care to weigh in or do some measurements? >>> >>> -Nathan >>> >>> On Sun, 6 Nov 2011 23:05:57 -0500, George Bosilca >>> wrote: >>>> I might have missed some of the phone conferences, but this is a

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-07 Thread George Bosilca
A little bit of history: 1. r25305: added 2 atomic operations to OPAL. However, they only exist on amd64 and are only used in the vader BTL, which I assume only supports amd64. 2. r25334: The seg_key union got a new member ptr. This member is solely used in the vader BTL, as all other BTLs use

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-07 Thread George Bosilca
On Nov 7, 2011, at 10:37 , Jeff Squyres wrote: > On Nov 7, 2011, at 10:16 AM, Nathan T. Hjelm wrote: > >> Yes, and I completely agree. I was simply trying to keep it consistent in >> case there is something I don't know about the heterogeneous case. >> >> I increased the size of the 64 bit memb

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-07 Thread George Bosilca
On Nov 7, 2011, at 17:34 , Barrett, Brian W wrote: > A number of OMPI developer institutions are working on a new BTL > (different from vader) for the Cray XE series using the uGNI upper layer. > The rkeys in uGNI are 128 bytes. Thanks Brian for the clarification. Obviously there was no need fo

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-07 Thread George Bosilca
On Nov 7, 2011, at 17:34 , Barrett, Brian W wrote: > I'm not sure why they called it vader, but vader is a fairly straight > forward shared memory BTL. It differs from sm in two important ways: 1) > it uses the XPMEM driver instead of SysV for shared memory and 2) it uses > the the Nemesis queue

[OMPI devel] debugger confusion

2011-11-07 Thread George Bosilca
I was trying to understand how the debugger interface is supposed to work. And if I was confused before, that feeling never disappeared. There is one thing that I really can't figure out, and I hope that somebody (Jeff/Ralph/Rolf based on svn blame) can enlighten me. MPIR_debug_gate. In the doc

Re: [OMPI devel] debugger confusion

2011-11-07 Thread George Bosilca
widely adopted. > > I'd suggest being a little careful about making changes without consulting > people who use TV and "stat", at least - those are the ones most recently > tested. > > > On Nov 7, 2011, at 5:59 PM, George Bosilca wrote: > >> I was

Re: [OMPI devel] debugger confusion

2011-11-07 Thread George Bosilca
t is broken...and I don't recall seeing that question raised > to the community (and time given for a response) prior to the changes being > committed. :-) ? george. > > > On Nov 7, 2011, at 8:20 PM, George Bosilca wrote: > >> They better do conform to what they

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-07 Thread George Bosilca
Nov 2011 17:18:42 -0500, George Bosilca > wrote: >> A little bit of history: >> >> 1. r25305: added 2 atomic operations to OPAL. However, they only exists > on >> amd64 and are only used in the vader BTL, which I assume only supports >> amd64. > > Two thin

Re: [OMPI devel] debugger confusion

2011-11-08 Thread George Bosilca
On Nov 8, 2011, at 07:52 , Jeff Squyres wrote: > To be clear: that document simply standardizes what MPI implementations are > supposed to provide in their MPIR implementation (prior to this, MPI > implementations tended to have subtle differences between their MPIR > implementations, which we

[OMPI devel] Open MPI BOF

2011-11-08 Thread George Bosilca
Folks, Wednesday November 15th at 12:15 PST, we will have an Open MPI BOF. We will have two guest speakers: Rolf vandeVaart from NVIDIA and Shinji Sumimoto from the K-computer. If you are at SC, you are all invited to participate in this annual event. Blend for a moment with our user community,

Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
MPIR_Breakpoint, as the name indicates, is just a breakpoint used by the startup process or the MPI application to signal changes to the debugger. No return value, nothing more than a breakpoint. I wonder how the volatile got there; there is no such requirement on variables that cannot be ch

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread George Bosilca
Elements in an array are always stored in the expected [increasing] order, regardless of the endianness of the architecture. Moreover, by the C rules for unions, all members of a union start at the same address. It turns out there is no endianness conversion on the keys, so I suppose both p
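The two layout facts above can be checked with a small sketch. The union below is modeled loosely on the key union discussed in this thread; its name and the helper functions are illustrative, not the actual BTL segment definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 128-bit key union. All members of a C union begin at the
 * same address, and array elements sit at increasing addresses on both
 * little- and big-endian machines; endianness only changes the byte
 * order *inside* each element. */
union seg_key_sketch {
    uint32_t key32;
    uint64_t key64;
    struct { uint64_t value[2]; } key128;
};

/* 1 if key32, key64 and key128.value[0] all start at the union's address. */
static int members_share_address(const union seg_key_sketch *k)
{
    return (const void *)&k->key64 == (const void *)&k->key128.value[0]
        && (const void *)&k->key32 == (const void *)k;
}

/* 1 if value[1] sits exactly one uint64_t after value[0]. */
static int elements_increase(const union seg_key_sketch *k)
{
    return (const char *)&k->key128.value[1] - (const char *)&k->key128.value[0]
        == (ptrdiff_t)sizeof(uint64_t);
}

/* A value stored through key64 reads back unchanged as value[0]. */
static uint64_t first_word_after_store(uint64_t v)
{
    union seg_key_sketch k;
    k.key64 = v;
    return k.key128.value[0];
}
```

Because key64 and value[0] have the same type and address, code that only ever fills the first 64-bit word sees the same bits through either member without any byte swapping.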

Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-11-08 Thread George Bosilca
ptmalloc2/malloc.c in the various versions of OpenMPI that > are still being maintained. > > Larry Baker > US Geological Survey > 650-329-5608 > ba...@usgs.gov > > On 17 Oct 2011, at 8:18 PM, George Bosilca wrote: > >> Larry, >> >> Sorry for not

Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
tarter = 0; george. On Nov 8, 2011, at 17:43 , Ashley Pittman wrote: > > I think the volatiles are there to ensure the compiler doesn't optimise away > reads or function calls which has been a problem with this interface in the > past. > > On 8 Nov 201
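Ashley's point about volatile can be illustrated with a gate loop modeled loosely on MPIR_debug_gate. The variable and function names here are hypothetical, and the bounded spin exists only so the sketch cannot hang; this is not the MPIR specification.

```c
/* Illustrative sketch of why a debugger-controlled flag is usually
 * volatile. The debugger writes the variable from outside the program,
 * so the compiler must reload it on every loop iteration instead of
 * hoisting a single load out of the loop. */
volatile int sketch_debug_gate = 0;

/* Spin until the (external) debugger opens the gate, giving up after
 * max_spins iterations. Returns the number of iterations spent waiting. */
static long wait_for_gate(long max_spins)
{
    long spins = 0;
    while (sketch_debug_gate == 0 && spins < max_spins) {
        spins++;   /* without volatile, the gate could be read only once */
    }
    return spins;
}
```

A real startup gate would spin without a bound until the debugger flips the variable; the bound here is purely to keep the example safe to run.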

Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
uld ask a hard-core compiler guru about … george. > > -Paul > > On 11/8/2011 2:46 PM, George Bosilca wrote: >> This value is not even read by the debugger. It only check for it's >> existence in the startup process, so I guess we're safe here as well. >

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread George Bosilca
32; > uint64_t key64; > struct { uint64_t value[2] } key128; > }; > > -Nathan > > On Tue, 8 Nov 2011 17:22:48 -0500, George Bosilca > wrote: >> Elements in an array are always stored in the expected [increasing] > order, >> regardless of the endianess of the

Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
>> -Paul >> >> On 11/8/2011 2:46 PM, George Bosilca wrote: >>> This value is not even read by the debugger. It only check for it's >>> existence in the startup process, so I guess we're safe here as well. >> >> -- >> Paul H. Hargrov

Re: [OMPI devel] Remote key sizes

2011-11-08 Thread George Bosilca
On Nov 8, 2011, at 10:36 , Nathan T. Hjelm wrote: > On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart > wrote: >>> george. >>> >>> PS: Regarding the hand-copy instead of the memcpy, we tried to avoid >> using >>> memcpy in performance critical codes, especially when we know the size of >>> the
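The hand-copy idea quoted above can be sketched for a hypothetical 128-bit key. Both functions are illustrations only; whether the hand copy is actually faster depends on the compiler, since modern compilers often inline a constant-size memcpy into the same instructions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit remote key. */
struct rkey_sketch { uint64_t value[2]; };

/* Copy a fixed-size key with two plain assignments: no call, and the
 * size is visible to the compiler. */
static inline void rkey_copy_by_hand(struct rkey_sketch *dst,
                                     const struct rkey_sketch *src)
{
    dst->value[0] = src->value[0];
    dst->value[1] = src->value[1];
}

/* Generic equivalent, for comparison. */
static inline void rkey_copy_memcpy(struct rkey_sketch *dst,
                                    const struct rkey_sketch *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```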

Re: [OMPI devel] VT issue

2011-11-14 Thread George Bosilca
This is supposed to be an intrinsic, automatically replaced by GCC during the compilation process. If I run the same configure as you (on the same machine), I have in my opal_config.h: /* Whether C compiler supports __builtin_expect */ #define OPAL_C_HAVE_BUILTIN_EXPECT 1 /* Whether C++ compiler s
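A configure-guarded use of __builtin_expect might look like the sketch below. The guard is modeled on OPAL_C_HAVE_BUILTIN_EXPECT, but the SKETCH_* macro names and checked_div are hypothetical. When the intrinsic is unavailable the macros reduce to the plain expression, so only the branch-layout hint is lost.

```c
/* Sketch of likely/unlikely helpers guarded by a configure-style macro. */
#ifndef SKETCH_HAVE_BUILTIN_EXPECT
#  if defined(__GNUC__)
#    define SKETCH_HAVE_BUILTIN_EXPECT 1
#  else
#    define SKETCH_HAVE_BUILTIN_EXPECT 0
#  endif
#endif

#if SKETCH_HAVE_BUILTIN_EXPECT
#  define SKETCH_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define SKETCH_LIKELY(x)   (x)
#  define SKETCH_UNLIKELY(x) (x)
#endif

/* Example: the error path is marked unlikely. */
static int checked_div(int a, int b, int *out)
{
    if (SKETCH_UNLIKELY(b == 0)) {
        return -1;
    }
    *out = a / b;
    return 0;
}
```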

[OMPI devel] Fwd: [OMPI svn] svn:open-mpi r25476

2011-11-17 Thread George Bosilca
I guess I reached one of those corner cases that didn't get tested. I can't start any apps (not even hostname) after this commit using the rsh PLM (as soon as I add a hostfile). mpirun is blocked in an infinite loop (after it spawned the daemons) in orte_rmaps_base_compute_vpids. Attaching wi

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25476

2011-11-17 Thread George Bosilca
.indiana.edu from app number 0 > universe size 8 > > > I'll get a fresh checkout and see if I can replicate from that... > > On Nov 17, 2011, at 7:42 PM, George Bosilca wrote: > >> I guess I reach one of these corner-cases that didn't got tested. I can't >

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25476

2011-11-17 Thread George Bosilca
Hello, World, I am 0 of 2 on host odin090.cs.indiana.edu from app number 0 >>> universe size 8 >>> >>> >>> I'll get a fresh checkout and see if I can replicate from that... >>> >>> On Nov 17, 2011, at 7:42 PM, George Bosilca wrote: >

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25476

2011-11-18 Thread George Bosilca
t get it to fail, even with hostfile arguments. I'll try again in the > morning. > > On Nov 17, 2011, at 8:49 PM, George Bosilca wrote: > >> Maybe the issue is generated by how the hostile is specified. I used >> orte_default_hostfile= in my mca-params.conf. >>

Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2011-11-18 Thread George Bosilca
Dear Yuki and Takahiro, Thanks for the bug report and for the patch. I pushed a [nearly identical] patch to the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A special version for 1.4 has been prepared and attached to ticket #2916 (https://svn.open-mpi.org/trac/o

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25476

2011-11-18 Thread George Bosilca
On Nov 18, 2011, at 07:49 , Ralph Castain wrote: > That's a condition which should never be reached, but just to be safe, I have > added a "bozo check" that will cause the routine to error out with a message > if that situation occurs. I have tried everything - hostfile, dash-host, > bizarre co

Re: [OMPI devel] Retrying a MPI_SEND

2011-11-18 Thread George Bosilca
On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote: > Hello again. > > I was doing some trace into de PML_OB1 files. I start to follow a MPI_Ssend() > trying to find where a message is stored (in the sender) if it is not send > until the receiver post the recv, but i didn't find that place.

Re: [OMPI devel] Retrying a MPI_SEND

2011-11-18 Thread George Bosilca
On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote: > 2011/11/18 George Bosilca > > On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote: > >> Hello again. >> >> I was doing some trace into de PML_OB1 files. I start to follow a >> MPI_Ssend() trying to fi

Re: [OMPI devel] Retrying a MPI_SEND

2011-11-18 Thread George Bosilca
On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote: > > 2011/11/18 George Bosilca > > On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote: > >> 2011/11/18 George Bosilca >> >> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote: >> >>> Hell

Re: [OMPI devel] Rename "vader" BTL to "xpmem"

2011-11-22 Thread George Bosilca
-10! george. On Nov 22, 2011, at 08:38 , Jeff Squyres wrote: > 1. Personally, I would love to rename the openib BTL to something that makes > sense and is a current name. By "rename", I include "rename the directory" > -- so it would be ompi/mca/btl/ofrc, or something like that. > > 2. Goo

Re: [OMPI devel] Rename "vader" BTL to "xpmem"

2011-11-24 Thread George Bosilca
aming xpmem? Or ...? > > On Nov 22, 2011, at 11:37 AM, George Bosilca wrote: > >> -10! >> >> george. >> >> On Nov 22, 2011, at 08:38 , Jeff Squyres wrote: >> >>> 1. Personally, I would love to rename the openib BTL to something that >

Re: [OMPI devel] Rename "vader" BTL to "xpmem"

2011-11-24 Thread George Bosilca
'ing the whole concept? Or >> just renaming xpmem? Or ...? >> >> On Nov 22, 2011, at 11:37 AM, George Bosilca wrote: >> >>> -10! >>> >>> george. >>> >>> On Nov 22, 2011, at 08:38 , Jeff Squyres wrote: >>> >

Re: [OMPI devel] RFC: new btl descriptor flags

2011-11-29 Thread George Bosilca
These two functions aim at defining a memory layout (contiguous or not) that can be the target of a one-sided communication. I don't see why there is a need to know what type of communication it will be … What is so different in xpmem that requires the memory to be prepared based on the op

Re: [OMPI devel] RFC: new btl descriptor flags

2011-11-29 Thread George Bosilca
On Nov 29, 2011, at 15:52 , Nathan Hjelm wrote: > > > On Tue, 29 Nov 2011, George Bosilca wrote: > >> These two functions target at defining a memory layout (contiguous or not) >> that can be target for a one-sided communication. I don't see why there is a

Re: [OMPI devel] Retrying a MPI_SEND

2011-12-09 Thread George Bosilca
is triggered right before returning above the MPI layer; at the level where you placed your interception, you have all the freedom you need to handle the faults. george. > > Thanks for the help. > > Hugo > > 2011/11/19 Hugo Daniel Meyer > > > 2011/11/18 Geo

Re: [OMPI devel] Invalid free (btl_openib_endpoint.c, 448) in v1.5

2011-12-12 Thread George Bosilca
Do we have the same issue in the trunk? george. On Dec 12, 2011, at 12:49 , Mike Dubman wrote: > after removing my debug prints - the correct line is 448 > > On Mon, Dec 12, 2011 at 7:46 PM, Mike Dubman wrote: > > > Hi guys, > > The latest v1.5 from trunk, compiled in debug mode yields

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25621

2011-12-12 Thread George Bosilca
Broken !!! george. On Dec 12, 2011, at 15:52 , hje...@osl.iu.edu wrote: > Author: hjelmn > Date: 2011-12-12 15:52:51 EST (Mon, 12 Dec 2011) > New Revision: 25621 > URL: https://svn.open-mpi.org/trac/ompi/changeset/25621 > > Log: > enable ptmalloc with using uGNI > Added: > trunk/orte/mca/rm

Re: [OMPI devel] [OMPI svn] svn:open-mpi r25621

2011-12-12 Thread George Bosilca
aps_lb.c:530:9: error: too few arguments to function > ‘orte_rmaps_base_compute_vpids’ > ../../../../orte/mca/rmaps/base/rmaps_private.h:65:19: note: declared here > make[1]: *** [rmaps_lb.lo] Error 1 > make[1]: Leaving directory > `/home/jsquyres/svn/ompi3/orte/mca/rmaps/load_balance

Re: [OMPI devel] Invalid free (btl_openib_endpoint.c, 448) in v1.5

2011-12-13 Thread George Bosilca
We are investigating. A fix will hopefully be provided soon. Thanks for the report, george. On Dec 13, 2011, at 00:25 , Mike Dubman wrote: > nope > > On Mon, Dec 12, 2011 at 10:40 PM, George Bosilca wrote: > Do we have the same issue in the trunk? > > george. > >

[OMPI devel] Drastic change in ORTE behavior between trunk and 1.5

2011-12-13 Thread George Bosilca
I noticed today a drastic change in how ORTE deals with the hostfile between trunk and 1.5. 1. 1.5 and prior used the hostfile as a suggestion, a pool from which to pick the requested number of daemons during the launch. The current trunk spawns daemons on all the nodes provided in the hostfile

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25627

2011-12-14 Thread George Bosilca
Shiqing, This file seems to be there. $ pwd /home/bosilca/unstable/1.5/ompi $ svn info opal/mca/shmem/windows/.windows Path: opal/mca/shmem/windows/.windows Name: .windows URL: https://svn.open-mpi.org/svn/ompi/branches/v1.5/opal/mca/shmem/windows/.windows Repository Root: https://svn.open-mp

Re: [OMPI devel] Totalview broken with 1.5/trunk

2011-12-14 Thread George Bosilca
A comment in the commit suggests that the symbols were not linked into orterun if they were not accessed there. I guess this was the trick to make sure MPIR_Breakpoint is in there. Now that you pointed me to this commit, I have to disagree with it. Why the MPI debugging symbols have been delete

Re: [OMPI devel] Drastic change in ORTE behavior between trunk and 1.5

2011-12-14 Thread George Bosilca
ses, at a point where after a while we have to reboot the machines to liberate pids. On Dec 14, 2011, at 10:08 , Ralph Castain wrote: > On Dec 13, 2011, at 9:10 PM, George Bosilca wrote: > >> I noticed today a drastic change in how ORTE deal with the hostfile between >> trunk

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25661

2011-12-15 Thread George Bosilca
This patch is not correct. All these variables have been moved into the ORTE layer (they are declared in orte/mca/debugger/base/base.h), so they should in fact be removed from the MPI-level files. While I don't think moving them all into ORTE was a good choice, changing their definition in t

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread George Bosilca
On Dec 15, 2011, at 16:55 , Ashley Pittman wrote: > There is a problem with 1.5.5rc1 that prevents padb from loading the process > table start from the orterun process, what appears to be happening is that > MPIR_proctable and MPIR_proctable_size is present in both orterun itself and > also in

Re: [OMPI devel] MPIR attach from padb broken (1.5.5rc1)

2011-12-15 Thread George Bosilca
This is quite impressive. After digging a little bit more, it appears that orte/tools/orterun/debuggers.c is in the repository but is not compiled. Thus, I really don't see where the second definition is coming from. george. On Dec 15, 2011, at 17:02 , George Bo

Re: [OMPI devel] Retrying a MPI_SEND

2011-12-16 Thread George Bosilca
MPI_ERROR is set. Do you know where jumps the execution? or >> at least in which error handler? >> >> Thanks in advance. >> >> Hugo >> >> 2011/12/9 George Bosilca >> >> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote: >> >>> He

Re: [OMPI devel] Fwd: Troubles using MPI_Isend/MPI_Irecv/MPI_Waitany and MPI_Allreduce

2011-12-20 Thread George Bosilca
I checked the wait_any code, and I can only see one possible execution path to return MPI_UNDEFINED. All requests have to be marked as inactive, which only happens after the OMPI request completion function is called. This leads to the following question: are your threads waiting on common reque
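The single path to MPI_UNDEFINED described above can be sketched in a simplified, single-threaded form. The request struct, SKETCH_UNDEFINED and the -1 "still waiting" return are hypothetical; a real waitany blocks and drives progress instead of returning -1.

```c
#include <stddef.h>

/* Simplified sketch of waitany-style semantics: inactive requests are
 * skipped, and only when *every* request is inactive does the call
 * report "undefined" instead of an index. */
#define SKETCH_UNDEFINED (-32766)

struct request_sketch {
    int active;    /* cleared once completion has been processed */
    int complete;  /* set by a (hypothetical) progress engine     */
};

/* Returns the index of a completed request, -1 if active requests exist
 * but none has completed yet (a real implementation would keep waiting),
 * or SKETCH_UNDEFINED if no request is active at all. */
static int waitany_sketch(size_t count, struct request_sketch *reqs)
{
    int any_active = 0;
    for (size_t i = 0; i < count; i++) {
        if (!reqs[i].active) {
            continue;
        }
        any_active = 1;
        if (reqs[i].complete) {
            reqs[i].active = 0;   /* mark inactive after completion */
            return (int)i;
        }
    }
    return any_active ? -1 : SKETCH_UNDEFINED;
}
```

In this model, if two threads wait on the same requests, one thread can mark the last active request inactive while the other then observes "all inactive" and gets the undefined result.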

Re: [OMPI devel] [EXTERNAL] [patch] One-sided communication with derived datatype fails on sparc64

2012-01-12 Thread George Bosilca
The problem is correctly identified and solved. I already pushed the patch to the trunk. I will create the CMR for both 1.5 and 1.4. Kudos to the Fujitsu team; that was a tricky one to find. Thanks for your contributions! george. On Jan 12, 2012, at 10:39 , Barrett, Brian W wrote: > George -

Re: [OMPI devel] RFC: Support Cross Memory Attach in sm btl

2012-01-13 Thread George Bosilca
Chris, If the syscalls are there, we should clearly take advantage of them. The patch looks good, I vote for it! george. On Jan 12, 2012, at 04:34 , Christopher Yeoh wrote: > Hi Brad, > > WHAT: Adds Cross Memory Attach support to the sm btl > > WHY: For faster intranode communication > >

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25742

2012-01-19 Thread George Bosilca
This is a critical change, with a significant impact on the code base. Basically, by moving the binding later in the code, after the modex has completed, all memory allocated before (which is all memory allocated during the registration of all OMPI modules) will end up on the wrong NUMA node

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25762

2012-01-21 Thread George Bosilca
How about, instead of all these patches (r25758, r25762 and r25763), we just set both LD_LIBRARY_PATH and DYLD_LIBRARY_PATH everywhere? One will be ignored on Unix, the other on Darwin. Another benefit would be a significantly cleaner... george. On Jan 21, 2012, at 18:48 , r...@osl.
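The "set both variables everywhere" proposal can be sketched with POSIX setenv. The helper names are hypothetical, and the sketch relies on the premise stated in the thread that each platform's loader ignores the variable meant for the other.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prepend dir to the colon-separated list in environment variable var.
 * Returns 0 on success, -1 on failure (empty dir, truncation, setenv). */
static int prepend_path_var(const char *var, const char *dir)
{
    const char *old = getenv(var);
    char buf[4096];
    int n;

    if (dir == NULL || strlen(dir) == 0) {
        return -1;
    }
    if (old != NULL && old[0] != '\0') {
        n = snprintf(buf, sizeof(buf), "%s:%s", dir, old);
    } else {
        n = snprintf(buf, sizeof(buf), "%s", dir);
    }
    if (n < 0 || (size_t)n >= sizeof(buf)) {
        return -1;   /* result would have been truncated */
    }
    return setenv(var, buf, 1) == 0 ? 0 : -1;
}

/* Set both variables unconditionally, with no per-platform branching. */
static int prepend_library_dir(const char *dir)
{
    if (prepend_path_var("LD_LIBRARY_PATH", dir) != 0) {
        return -1;
    }
    return prepend_path_var("DYLD_LIBRARY_PATH", dir);
}
```

Note that changing these variables in a running process affects only the children it launches afterward, which is the launcher case discussed here.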

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25762

2012-01-23 Thread George Bosilca
> Doesn't strike me as all that complicated. > > On Jan 21, 2012, at 7:26 PM, George Bosilca wrote: > >> How about instead of all the patches (r25758, r25762 and r25763) we just set >> both LD_LIBRARY_PATH and DYLD_LIBRARY_PATH everywhere? One will get ignored &g

Re: [OMPI devel] [PATCH] MPI_FILE_SEEK_SHARED is wrong in Fortran

2012-01-25 Thread George Bosilca
Thanks Yuki, All the submitted patches have been pushed into the trunk (https://svn.open-mpi.org/trac/ompi/changeset/25781) and are pending in the queue for 1.4 and 1.5. Thanks, george. On Jan 25, 2012, at 05:25 , Y.MATSUMOTO wrote: > Dear All, > > Next is about "MPI_FILE_SEEK_SHARED"

Re: [OMPI devel] RFC: Java MPI bindings

2012-02-07 Thread George Bosilca
This doesn't sound like a very good idea, despite significant support from a lot of institutions. There is no standardization effort in the targeted community, and championing broader support in the Java world was not one of our main targets. OMPI does not include the Boost bindings, despit

Re: [OMPI devel] thread safety of the self btl

2012-02-08 Thread George Bosilca
The self BTL is different from any other BTL. Any memcpy operation done by this BTL is automatically protected by the matching logic, and therefore does not require extra thread-safety protection. A mutex in the self BTL is mostly a copy/paste mistake. george. On Feb 8, 2012, at 17:57 ,

Re: [OMPI devel] RFC: Java MPI bindings

2012-02-08 Thread George Bosilca
be done on a branch and the RFC can be >>>>>> reissued when there is both >>>>>> a) a standard to which the bindings can claim to conform >>>>>> b) an implementation which has been shown to be stable >>>>>> >>>>>> -Paul

Re: [OMPI devel] thread safety of the self btl

2012-02-08 Thread George Bosilca
t can succeed only for one thread). george. On Feb 8, 2012, at 21:38 , Christopher Yeoh wrote: > On Wed, 8 Feb 2012 20:58:52 -0500 > George Bosilca wrote: > >> The self BTL is different from any other BTL. Any memcpy operation >> done by this BTL is automatically protected be
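The parenthetical above refers to an operation that can succeed for only one thread. That property can be illustrated with C11 compare-and-swap; try_claim is a hypothetical helper, not the self BTL or the matching code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* When several threads race to flip the same flag from 0 to 1, the CAS
 * succeeds for exactly one of them, so the winner can do the follow-up
 * work (e.g., copying matched data) without an extra mutex. */
static bool try_claim(atomic_int *flag)
{
    int expected = 0;
    return atomic_compare_exchange_strong(flag, &expected, 1);
}
```

Every caller after the first sees the flag already set to 1 and gets false back, regardless of how the threads are scheduled.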

Re: [OMPI devel] btl/gm build broken on trunk

2012-02-16 Thread George Bosilca
Thanks Paul. Fixed in r25946. george. On Feb 16, 2012, at 21:47 , Paul H. Hargrove wrote: > I just tried to build yesterday's ompi trunk tarball (1.7a1r25937) with the > Intel compilers. > Sorry if this was fixed in the past 23 hours or so. > > > My system has GM-2.1.30 installed, and icc w

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

2012-03-01 Thread George Bosilca
Good catch!!! That's indeed quite a nasty bug. If it fixes the IB issues, it justifies a 1.4.6 release. Thanks, george. On Mar 1, 2012, at 10:56 , Nathan Hjelm wrote: > Found a pretty nasty frag leak (and a minor one) in ob1 (see commit below). > If this fix addresses some hangs we are se

Re: [OMPI devel] poor btl sm latency

2012-03-02 Thread George Bosilca
Please run "ompi_info --param btl sm" in your environment. The lazy_free setting directs the SM BTL internals not to release the memory fragments used for communication until the lazy limit is reached. The default value was deemed reasonable a while back when the number of default fragments was l

Re: [OMPI devel] [PATCH]Incorrect algorithm choice using coll_tuned_dynamic_rules_filename (over 2GiB message)

2012-03-02 Thread George Bosilca
Yuki, I applied your patch and added your copyright in the corresponding files (r26080). I will make a CMR for 1.4 and 1.5. However, as you might have noticed, we're trying to close 1.4 and move forward. Thanks, george. On Mar 1, 2012, at 02:33 , Y.MATSUMOTO wrote: > Dear All, >

Re: [OMPI devel] [PATCH]Incorrect algorithm choice using coll_tuned_dynamic_rules_filename (over 2GiB message)

2012-03-02 Thread George Bosilca
Yuki, r26084 should fix the issue with the dynamic rules file in the trunk. Thanks for reporting it. george. On Mar 1, 2012, at 02:33 , Y.MATSUMOTO wrote: > But, we found problem when over 2GiB message is written in rulefile as > "message size". > (over 2GiB message cannot read correctly.
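The over-2GiB reading problem comes down to parsing a size into a type that is too narrow. A minimal sketch, assuming the size appears as a plain decimal field; parse_msg_size is hypothetical and is not the coll tuned rules-file parser.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a message size without truncating values above 2 GiB: an int
 * (atoi, "%d") cannot hold 3000000000, but strtoull into a 64-bit
 * type can. Returns 0 on success, -1 on malformed input or overflow. */
static int parse_msg_size(const char *text, uint64_t *out)
{
    const char *p = text;
    char *end = NULL;
    unsigned long long v;

    /* strtoull silently negates a leading '-', so reject it up front. */
    while (*p == ' ' || *p == '\t') {
        p++;
    }
    if (*p == '-') {
        return -1;
    }

    errno = 0;
    v = strtoull(p, &end, 10);
    if (errno != 0 || end == p || (*end != '\0' && *end != '\n')) {
        return -1;   /* overflow, no digits, or trailing garbage */
    }
    *out = (uint64_t)v;
    return 0;
}
```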

Re: [OMPI devel] Replacing poll()

2012-03-03 Thread George Bosilca
On Mar 3, 2012, at 18:18 , Alex Margolin wrote: > I've figured that what I really need is to write my own BTL component, rather > then trying to manipulate the existing TCP one. I've started writing it using > the 1.5.5rc3 tarball and some pdfs from 2006 I found on the website (anything > else

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
Yuki, I pushed a fix for this issue in the trunk (r26097). However, I disagree with you on some of the topics below. On Mar 5, 2012, at 04:02 , Y.MATSUMOTO wrote: > Dear All, > > Next feedback is about "collective communications". > > Collective communication may be abend when it use over 2Gi

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
I was afraid of all those little intermediate steps. I asked a compiler expert and apparently reversing the order (i.e., starting with the ptrdiff_t variable) will not solve anything. The only portable way to solve this is to cast every single operand, to prevent __any__ compiler from hurting us.
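The "cast every single operand" rule can be shown on a typical offset computation. byte_offset and its parameters are illustrative, not the actual collective code. With int count and int extent, count * extent is computed in int and overflows (undefined behavior) past INT_MAX, even if the result is later stored in a ptrdiff_t.

```c
#include <stddef.h>

/* Convert every operand to ptrdiff_t before the arithmetic so that
 * each intermediate step is computed in the wide type. */
static ptrdiff_t byte_offset(int count, int extent, int lb)
{
    /* Unsafe form, shown only for contrast:
     *   ptrdiff_t off = count * extent + lb;   (int arithmetic first) */
    return (ptrdiff_t)count * (ptrdiff_t)extent + (ptrdiff_t)lb;
}
```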

Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread George Bosilca
I gave it a try (r26103). It was messy, and I hope I got it right. Let's soak it for a few days with our nightly testing to see how it behaves. george. On Mar 5, 2012, at 16:37 , N.M. Maclaren wrote: > On Mar 5 2012, George Bosilca wrote: >> >> I was afraid about all those

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-06 Thread George Bosilca
I didn't check the code thoroughly, but OMPI_ERR_OUT_OF_RESOURCES is not a fatal error. If the registration returns out of resources, the BTL will return OUT_OF_RESOURCE (for example, via mca_btl_openib_prepare_src). At the upper level, the PML (in the mca_pml_ob1_send_request_start function) in
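Treating "out of resources" as transient can be sketched as follows: the upper layer parks the request on a pending list for a later retry instead of failing. Every name here (the SK_* codes, the list, the lower_* stubs) is hypothetical and is not an ob1 or openib symbol.

```c
#include <stddef.h>

#define SK_SUCCESS               0
#define SK_ERR_OUT_OF_RESOURCE (-2)
#define SK_ERR_FATAL          (-99)

struct send_req_sketch {
    int id;
    struct send_req_sketch *next;
};

struct pending_list_sketch {
    struct send_req_sketch *head, *tail;
    size_t len;
};

static void pending_push(struct pending_list_sketch *l, struct send_req_sketch *r)
{
    r->next = NULL;
    if (l->tail) l->tail->next = r; else l->head = r;
    l->tail = r;
    l->len++;
}

/* start_fn models the lower layer (e.g., a BTL prepare call). */
static int start_send(struct send_req_sketch *r,
                      int (*start_fn)(struct send_req_sketch *),
                      struct pending_list_sketch *pending)
{
    int rc = start_fn(r);
    if (rc == SK_ERR_OUT_OF_RESOURCE) {
        pending_push(pending, r);   /* retry later, e.g. from progress */
        return SK_SUCCESS;
    }
    return rc;                      /* success or a genuine error */
}

/* Stand-ins for lower-layer behavior. */
static int lower_ok(struct send_req_sketch *r)     { (void)r; return SK_SUCCESS; }
static int lower_busy(struct send_req_sketch *r)   { (void)r; return SK_ERR_OUT_OF_RESOURCE; }
static int lower_broken(struct send_req_sketch *r) { (void)r; return SK_ERR_FATAL; }
```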

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26118

2012-03-08 Thread George Bosilca
Josh, Open MPI already has a similar function in the communicator code, one that is not exposed to the upper layer. I think that reusing the code in ompi_comm_compare (the second part, which compares groups) is a proven, sound approach. Moreover, now that we have an ompi_group_compare function, you should use
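The group-comparison rule from the MPI standard (IDENT, SIMILAR, UNEQUAL) can be sketched independently of Open MPI. Processes are modeled as unique int ids within a group, and the constants and function are hypothetical; this is not ompi_group_compare or ompi_comm_compare.

```c
#include <stddef.h>

/* IDENT: same processes in the same order; SIMILAR: same processes in
 * a different order; UNEQUAL: anything else. */
enum { SK_IDENT = 0, SK_SIMILAR = 1, SK_UNEQUAL = 2 };

static int group_compare_sketch(const int *a, size_t na, const int *b, size_t nb)
{
    if (na != nb) {
        return SK_UNEQUAL;
    }

    int same_order = 1;
    for (size_t i = 0; i < na; i++) {
        if (a[i] != b[i]) { same_order = 0; break; }
    }
    if (same_order) {
        return SK_IDENT;
    }

    /* O(n^2) membership check: with unique ids and equal sizes, every
     * member of a appearing in b means both hold the same set. */
    for (size_t i = 0; i < na; i++) {
        int found = 0;
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) { found = 1; break; }
        }
        if (!found) {
            return SK_UNEQUAL;
        }
    }
    return SK_SIMILAR;
}
```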

Re: [OMPI devel] MCA BTL Fragment lists

2012-03-09 Thread George Bosilca
On Mar 9, 2012, at 08:38 , Alex Margolin wrote: > Hi, > > I'm implementing a new BTL component, and > > 1. I read the TCP code and ran into the three fragment lists: > >/* free list of fragment descriptors */ >ompi_free_list_t tcp_frag_eager; >ompi_free_list_t tcp_frag_max; >om

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread George Bosilca
this is in the BTL registration code (openib_reg_mr), not in >> the directly-invoked-by-the-PML code. So it's the mpool's fault -- not the >> PML's fault. >> >> >> >> On Mar 6, 2012, at 10:05 AM, George Bosilca wrote: >> >>> I din'

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread George Bosilca
On Mar 9, 2012, at 14:23 , Nathan Hjelm wrote: > BTW, can anyone tell me why each mpool defines mca_mpool_base_resources_t > instead of defining mca_mpool_blah_resources_t. The current design makes it > impossible to support more than one mpool in a btl. I can delete a bunch of > code if I can

Re: [OMPI devel] Replacing poll()

2012-03-19 Thread George Bosilca
; frag->hdr.base.tag; >reg->cbfunc(&frag->btl->super, frag->hdr.base.tag, > &frag->base, reg->cbdata); >} > This calls a callback function, which I assume notifies the upper layer of a > message, but this is only for MCA_BTL

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26172

2012-03-21 Thread George Bosilca
Josh, I don't agree that these changes are required. In the current standard (2.2), MPI_ERR_PENDING is only allowed to be returned by MPI_WAITALL, in some very specific conditions. Here is the snippet from the MPI standard clarifying this behavior. > When one or more of the communications comp
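The standard's rule for MPI_WAITALL is that, when it returns MPI_ERR_IN_STATUS, each status's error field holds MPI_SUCCESS if that request completed, its own error code if it failed, or MPI_ERR_PENDING if it neither failed nor completed. A sketch of that per-status assignment on simulated requests, assuming hypothetical stand-in codes and a `fake_request` type (no MPI library involved):

```c
#include <stddef.h>

/* Hypothetical stand-ins for MPI_SUCCESS, an arbitrary error,
 * MPI_ERR_IN_STATUS and MPI_ERR_PENDING. */
enum { ERR_SUCCESS = 0, ERR_OTHER = 1, ERR_IN_STATUS = 2, ERR_PENDING = 3 };

typedef struct {
    int completed;  /* 1 once the operation has finished */
    int error;      /* its error code if it finished */
} fake_request;

/* Snapshot of the request states at the point the wait gives up. */
static int waitall_status(const fake_request *reqs, size_t n, int *status_err) {
    int any_failed = 0;
    for (size_t i = 0; i < n; i++)
        if (reqs[i].completed && reqs[i].error != ERR_SUCCESS)
            any_failed = 1;

    if (!any_failed) {
        /* No failures: a real MPI_WAITALL returns only after every request
         * has completed, so the per-status error fields are not needed. */
        return ERR_SUCCESS;
    }
    for (size_t i = 0; i < n; i++) {
        if (!reqs[i].completed)
            status_err[i] = ERR_PENDING;    /* neither failed nor completed */
        else
            status_err[i] = reqs[i].error;  /* ERR_SUCCESS or the failure */
    }
    return ERR_IN_STATUS;
}
```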

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26172

2012-03-21 Thread George Bosilca
equest_default_wait_all was > incorrect. Would you care to elaborate as to what specifically is > incorrect? > > -- Josh > > On Wed, Mar 21, 2012 at 2:17 PM, George Bosilca wrote: >> Josh, >> >> I don't agree that these changes are required. In the current

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26172

2012-03-22 Thread George Bosilca
is supposed to achieve. Moreover, I guess your patch is indeed correct if the MPICH test you were using pass. > Do we both agree that before this patch Open MPI did not support > MPI_ERR_PENDING? Who dares claim the opposite? george. > > -- Josh > > > On Wed, Mar

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26172

2012-03-22 Thread George Bosilca
ng in the default_wait_all of MPI_ERR_PENDING is >> incorrect as well." >> I'm glad that you no longer think the patch is "incorrect", but just >> not implemented as well as it could be. >> >> Thanks, >> Josh >> >> On Thu

Re: [OMPI devel] RFC: change default for tuned alltoallv to pairwise

2012-03-22 Thread George Bosilca
On Mar 21, 2012, at 12:14 , Nathan Hjelm wrote: > What: Change coll tuned default to pairwise exchange > > Why: The linear algorithm does not scale to any reasonable number of PEs > > When: Timeout in 2 days (Fri) > > Is there any reason the default should not be changed? Nathan, I can see w
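A sketch of the peer schedule behind a pairwise-exchange alltoall(v), assuming the common formulation where at step k each rank sends to (rank + k) mod p and receives from (rank - k) mod p. `pair_step` and `pairwise_step` are hypothetical names, not the coll tuned code. Each rank has exactly one send peer and one receive peer per step, and over p steps it exchanges with every rank once.

```c
/* Send and receive peers for one step of a pairwise exchange. */
typedef struct {
    int send_to;
    int recv_from;
} pair_step;

/* Requires 0 <= rank < size and 0 <= step < size. */
static pair_step pairwise_step(int rank, int size, int step) {
    pair_step s;
    s.send_to = (rank + step) % size;
    s.recv_from = (rank - step + size) % size;  /* + size keeps it non-negative */
    return s;
}
```

The schedule is consistent by construction: if rank r sends to t at step k, then t computes r as its receive peer at the same step.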

Re: [OMPI devel] algorithm selection in open mpi

2012-04-03 Thread George Bosilca
Roswan, There are simpler solutions to achieve this. We have a built-in mechanism to select a specific collective implementation. Here is what you have to add in your .openmpi/mca-params.conf (or pass as MCA arguments on the command line): coll_tuned_use_dynamic_rules = 1 coll_tuned_bcast_algorithm

Re: [OMPI devel] algorithm selection in open mpi

2012-04-03 Thread George Bosilca
nd of first collective On Apr 3, 2012, at 09:01 , Pavel Mezentsev wrote: > Is there a way to specify collective depending on the size of the message and > number of processes? > > Regards, > Pavel Mezentsev > > 2012/4/3 George Bosilca > Roswan, > > There are simpler

Re: [OMPI devel] Using opal_convertor_t for In-place send buffers in a BTL component

2012-04-05 Thread George Bosilca
Alex, This is indeed quite strange. You're receiving an error about truncated data during a barrier. The MPI_Barrier is the only MPI function that has a synchronization meaning, and does not move data around, so I can hardly see how this can generate a truncation. You should put a breakpoint i

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26243

2012-04-10 Thread George Bosilca
Adachi, You're right indeed, I should have multiplied the displacement by the extent of the datatype. Thanks for catching this! Commit r26259 is supposed to fix this. george. On Apr 9, 2012, at 01:57 , ADACHI Tomoya wrote: > Hi George, > > This fix seems insufficient for multibyte datatype
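The fix amounts to converting a displacement counted in elements into a byte offset by multiplying by the datatype's extent. A sketch, assuming hypothetical helpers `elem_addr` and `gather_indexed` (not the code touched by r26259):

```c
#include <stddef.h>
#include <string.h>

/* Address of element number `disp` in a buffer whose elements are `extent`
 * bytes apart.  `(char *)base + disp` alone is only correct for a 1-byte
 * extent, so the missing multiplication goes unnoticed for byte types. */
static void *elem_addr(void *base, ptrdiff_t disp, size_t extent) {
    return (char *)base + disp * (ptrdiff_t)extent;
}

/* Hypothetical helper: gather the elements at the given displacements
 * (counted in elements, not bytes) into a contiguous output buffer. */
static void gather_indexed(void *dst, void *src, const ptrdiff_t *disps,
                           size_t n, size_t extent) {
    for (size_t i = 0; i < n; i++)
        memcpy((char *)dst + i * extent, elem_addr(src, disps[i], extent),
               extent);
}
```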

Re: [OMPI devel] RFC: opal_cache_line_size

2012-04-23 Thread George Bosilca
No strong opinion. However, the comment about the initial value of opal_cache_line_size is wrong (opal/runtime/opal.h), as it states that the default value is -1 while it is 128. george. On Apr 23, 2012, at 16:21 , Jeffrey Squyres wrote: > No one replied to this RFC. Does anyone have an opi

Re: [OMPI devel] RFC: opal_cache_line_size

2012-04-23 Thread George Bosilca
I guess in the end it is a trade-off between performance and space. What we wanted with this code is to avoid false sharing of cache lines between our data (the internal header we add in front of elements in lists) and the content of the data (whatever is coming from the upper layer). If the header is
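A sketch of the space-for-performance trade-off, assuming a 128-byte cache line (the default value mentioned in this RFC thread) and a hypothetical list-item layout; `padded_header` and `list_item` are illustrative, not the OPAL structures. Aligning the header to a full line (C11 `_Alignas`) keeps list bookkeeping and the user payload on different cache lines, at the cost of padding every element up to the line size.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size. */
#define CACHE_LINE 128

/* Hypothetical list-item header.  The alignment on the first member makes
 * the whole struct line-aligned, so its size is rounded up to CACHE_LINE:
 * about 20 bytes of fields, the rest is padding. */
typedef struct {
    _Alignas(CACHE_LINE) void *next;
    void *prev;
    int32_t refcount;
} padded_header;

/* The payload starts on its own cache line, so updates to next/prev/
 * refcount never invalidate the line holding the user data. */
typedef struct {
    padded_header hdr;
    char payload[64];
} list_item;
```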

Re: [OMPI devel] How to debug segv

2012-04-25 Thread George Bosilca
Alex, You got the banner of the FT benchmark, so I guess at least the rank 0 successfully completed the MPI_Init call. This is a hint that you should investigate more into the point-to-point logic of your mosix BTL. george. On Apr 25, 2012, at 09:30 , Alex Margolin wrote: > NAS Parallel Ben
