Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-23 Thread George Bosilca
y require the > notion of a jobid and rank-within-that-job. If the current ones don't, I > assure you that at least one off-trunk one definitely does > > Some of the MTL's, of course, definitely rely on those fields. > > > On Jul 23, 2014, at 7:15 PM, George Bosilca <bo

Re: [OMPI devel] RFC: Bump minimum sm pool size to 128K from 64K

2014-07-26 Thread George Bosilca
We are talking MB, not KB, aren't we? George. On Thu, Jul 24, 2014 at 2:57 PM, Rolf vandeVaart wrote: > WHAT: Bump up the minimum sm pool size to 128K from 64K. > WHY: When running OSU benchmark on 2 nodes and utilizing a larger > btl_smcuda_max_send_size, we can run

Re: [OMPI devel] trunk compilation errors in jenkins

2014-07-26 Thread George Bosilca
All, I take advantage of this thread to clarify what is missing to have a perfectly MPI agnostic BTL interface. Some of these issues are pretty straightforward (getting rid of RTE and OMPI vestiges), some others will require some thinking from their developers in order to cope with a not

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread George Bosilca
This means you are trying to initialize things too early. Most of the information made available in opal/util/proc.h is only available once the RTE has been set up, i.e. only after the call to rte_init. Thus, the BTL can only use it after the init call... George. On Mon, Jul 28, 2014 at 1:01 PM,

Re: [OMPI devel] TCP btl seq

2014-07-28 Thread George Bosilca
So do we want to sequence the BTL interfaces between jobs or only between local processes on the same job? I'm also fine with removing this option if it is not in use. George. On Mon, Jul 28, 2014 at 1:09 PM, Ralph Castain wrote: > > On Jul 28, 2014, at 10:02 AM, Jeff

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread George Bosilca
do_component_setup() > >> > >> Yuck. > >> > >> Is there a better way? > >> > >> Crazy idea: should we add more hooks during the init / setup sequence? > E.g., a BTL component_init_after_rte_has_been_initialized() that is > guaranteed to be called before any module functions are

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread George Bosilca
such information. Patience ... George. On Mon, Jul 28, 2014 at 1:38 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > Well, I'm slightly confused as the BTL are initialized outside opal_init. > There must be a specific call to mca_base_framework_open for the BTL, and > curr

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread George Bosilca
This has been clear from day one: everything that relies on RML for setup will need to be rewritten. This is not only SM; it is also related to IB. Meanwhile, one must build with dlopen enabled in order to get access to these calls. George. On Mon, Jul 28, 2014 at 4:02 PM, Nathan Hjelm

Re: [OMPI devel] OMPI_XXX defines in opal_config.h.in question

2014-07-29 Thread George Bosilca
Good catch. As you already have the warnings, please go ahead and fix them. Thanks, George. On Tue, Jul 29, 2014 at 3:58 PM, Pritchard Jr., Howard wrote: > Hi Folks, > > > > So I’m trying to get my pmix project back in order after making > > the big mistake of

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
rint name2 >> >> $2 = (const orte_process_name_t *) 0xbaf76c >> >> (gdb) print *name2 >> >> $3 = {jobid = 2452946945, vpid = 1} >> >> (gdb) >> >> >> >> >> >> >> >> >-Original Message- >&g

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
issue across the code base now, so we'll have to troll and fix it. I was > doing the minimal change required to fix the trunk in the meantime. > > On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > > Yes. opal_process_name_t has basically no meaning by

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread George Bosilca
gs the "new" way, that's all > > On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > > No, this is not going to be an issue if the opal_identifier_t is used > correctly (aka only via the exposed accessors). > > George. > > >

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-30 Thread George Bosilca
Why do you want to add new versions? This will lead to having two, almost identical, sets of atomics that are conceptually equivalent but different in terms of code. And we will have to maintain both! I did a similar change in a fork of OPAL in another project but instead of adding another
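A minimal sketch of the general idea behind this RFC, assuming C11 atomics rather than the OPAL atomics themselves: a compare-and-swap that returns the old value is enough to derive the boolean form, which is one reason a second parallel set of primitives may be avoidable. The names `cas_fetch_32` and `cas_bool_32` are hypothetical and do not come from the patch under discussion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper (not the OPAL API): compare-and-swap that returns
 * the value found at *addr instead of a success flag. */
static inline int32_t cas_fetch_32(_Atomic int32_t *addr,
                                   int32_t expected, int32_t desired)
{
    /* On failure, atomic_compare_exchange_strong writes the current value
     * of *addr into `expected`. On success `expected` is left untouched,
     * and it already equals the old value. Either way it holds the value
     * that was observed at *addr. */
    atomic_compare_exchange_strong(addr, &expected, desired);
    return expected;
}

/* The boolean flavor can be derived from the old-value flavor. */
static inline int cas_bool_32(_Atomic int32_t *addr,
                              int32_t expected, int32_t desired)
{
    return cas_fetch_32(addr, expected, desired) == expected;
}

/* Small self-check exercising both outcomes. */
static void cas_demo(void)
{
    _Atomic int32_t v = 5;
    int32_t old;

    old = cas_fetch_32(&v, 5, 7);      /* matches: swap happens */
    assert(old == 5);
    assert(atomic_load(&v) == 7);

    old = cas_fetch_32(&v, 5, 9);      /* mismatch: no swap */
    assert(old == 7);
    assert(atomic_load(&v) == 7);
}
```

Calling `cas_demo()` from a `main` is enough to exercise the sketch. The old-value form also saves a reload in retry loops, since a failed attempt already reports what the location contained.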

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-30 Thread George Bosilca
On Jul 30, 2014, at 18:00 , Jeff Squyres (jsquyres) wrote: > WHAT: Should we make the job size (i.e., initial number of procs) available > in OPAL? > > WHY: At least 2 BTLs are using this info (*more below) > > WHERE: usnic and ugni > > TIMEOUT: there's already been some

Re: [OMPI devel] mca_PROJECT_FRAMEWORK_COMPONENT_symbol vs. mca_FRAMEWORK_COMPONENT_symbol

2014-07-30 Thread George Bosilca
I can also picture an environment where different projects can supply components that would technically belong to a framework from another project. Let me take an example. Imagine we decide to keep the RML-based connection setup for SM, something that is not currently possible in the OPAL layer. In

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-30 Thread George Bosilca
On Jul 30, 2014, at 20:37 , Ralph Castain <r...@open-mpi.org> wrote: > > On Jul 30, 2014, at 5:25 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > >> >> On Jul 30, 2014, at 18:00 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> >> wrote:

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread George Bosilca
What is your definition of “global job size”? George. On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard wrote: > Hi Folks, > > I think given the way we want to use the btl's in lower levels like opal, > it is pretty disgusting for a btl to need to figure out on its own

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread George Bosilca
I definitely think you misunderstood the scope of this RFC. The information that is so important to you to configure the mailbox size is available to you when you need it. This information is made available by the PML through the call to add_procs, which comes with all the procs in the

Re: [OMPI devel] Further questions about BTL OPAL move...

2014-07-31 Thread George Bosilca
On Jul 31, 2014, at 18:26 , Jeff Squyres (jsquyres) wrote: > George -- > > Got 2 questions for ya: > > 1. I see some orte_* specific symbols/functions in ompi_mpi_init.c. Was that > intentional? Shouldn’t that stuff be in the RTE framework, or some such? Good catch.

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread George Bosilca
Hjelm <hje...@lanl.gov> wrote: > > That is what I would prefer. I was trying to not disturb things too > much :). Please bring the changes over! > > -Nathan > > On Wed, Jul 30, 2014 at 03:18:44PM -0400, George Bosilca wrote: >> Why do you want to add new

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-31 Thread George Bosilca
...@lbl.gov> wrote: > >> >> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca <bosi...@icl.utk.edu> >> wrote: >> >>> Paul, I know you have a pretty diverse range computers. Can you try to >>> compile and run a “make check” with the following patch? >

Re: [OMPI devel] trunk link failure on Solaris-10/SPARC

2014-08-01 Thread George Bosilca
A missing include. Should be fixed by r32388. Thanks, George. On Thu, Jul 31, 2014 at 11:15 PM, Paul Hargrove wrote: > > $ INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c' > ld.so.1: ring_c: fatal: relocation error: file >

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread George Bosilca
i.org> > <r...@open-mpi.org> wrote: > > > FWIW: we had Siegmar try that and it didn't solve the problem. Paul? > > > On Jul 31, 2014, at 8:28 PM, svn-commit-mai...@open-mpi.org wrote: > > > Author: bosilca (George Bosilca) > Date: 2014-07-31 23:28

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread George Bosilca
lied cleanly to trunk except the portion > changing opal/include/opal/sys/osx/atomic.h, which does not exist. > > -Paul > > > On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca <bosi...@icl.utk.edu> > wrote: > >> Awesome, thanks Paul. When the results will be in we wi

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread George Bosilca
rogeneous cluster > - for this case, the fix is to have a different version of the > OPAL_PROCESS_NAME_xTOy > on little endian arch if heterogeneous mode is supported. > > > > does that make sense ? > > Cheers, > > Gilles > > > On 2014/07/3

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread George Bosilca
This commit brings two things. One is the renaming suggested by Gilles. The second one is forcing the ORTE process down on the OPAL. This doesn't fit the current design of the BTL move. The current design assumes that the local OPAL process is part of the local OMPI process. George. PS: If it

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread George Bosilca
Castain <r...@open-mpi.org> wrote: > > On Aug 1, 2014, at 8:27 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > > This commit brings two things. One is the renaming suggested by Gilles. > The second one is forcing the ORTE process down on the OPAL. This doesn't > fit the

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread George Bosilca
ove that assert? > > > On Aug 1, 2014, at 9:30 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > > I missed the fact that the app doesn't force it. But if this is indeed the > case then it is extremely weird that you are seeing someone else releasing > your proc. > > Regarding

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread George Bosilca
Another version of the atomic patch. Paul has tested it on a bunch of platforms. At this point we have confirmation from all architectures except SPARC (v8+ and v9). George. atomics.patch Description: Binary data On Jul 31, 2014, at 19:13 , George Bosilca <bosi...@icl.utk.edu>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
on an heterogeneous cluster, > this is now fixed in r32425. > > i am not convinced i chose the most elegant way to achieve the desired > result ... > could you please double check this commit ? > > Thanks, > > Gilles > > On 2014/08/02 0:14, George Bosilca wrote: > >

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
t around this problem. I missed that he did it by > location instead of named fields - perhaps we should do that instead? > As soon as we impose the ORTE naming scheme at the OPAL level (aka. the notion of jobid and vpid) this approach will become possible. George. > > > On A

Re: [OMPI devel] canceling buffered send request with pml/cm

2014-08-05 Thread George Bosilca
Yossi, I think you raised an interesting corner-case, and a possible bug in the MTL implementation. As the request is marked as complete by the CM/PML the cancel should never succeed. As the CM/PML is forcing the completion on all bsend requests, it should also enforce that all completed

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread George Bosilca
org> wrote: > > On Aug 5, 2014, at 10:23 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > > On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain <r...@open-mpi.org> wrote: > >> Hmmm...wouldn't that then require that you know (a) the other side is >> little

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-05 Thread George Bosilca
, Paul Hargrove <phhargr...@lbl.gov> wrote: I have confirmed that George's latest version works on both SPARC ABIs. ARMv7 and three MIPS ABIs still pending... -Paul On Fri, Aug 1, 2014 at 9:40 AM, George Bosilca <bosi...@icl.utk.edu> wrote: Another version of the atomic patch. Paul has tested i

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread George Bosilca
are of jobid and vpid but this is a bit > more heavyweight imho. > > i'll try this today and make sure it works. > > any thoughts ? > > Cheers, > > Gilles > > > On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain <r...@open-mpi.org> > <r...@open-mpi.org>

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread George Bosilca
I have an extremely vague recollection about a similar issue in the datatype engine: on the SPARC architecture, 64-bit integers must be aligned on a 64-bit boundary or you get a bus error. Takahiro, you can confirm this by printing the value of data when the signal is raised. George. On Fri,

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-08 Thread George Bosilca
Paul's tests identified a small issue with the previous patch (a real corner-case for ARM v5). The patch below fixes all known issues. Btw, there is still room for volunteers for the .asm work. George. On Tue, Aug 5, 2014 at 2:23 PM, George Bosilca <bosi...@icl.utk.edu>

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread George Bosilca
This is a gigantic patch for an almost trivial issue. The current problem is purely related to the fact that in a single location (nidmap.c) the orte_process_name_t (which is a structure of 2 integers) is supposed to be aligned based on the uint64_t requirements. Bad assumption! Looking at the
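The alignment problem described here can be sketched in a few lines of C. This is only an illustration under stated assumptions: `name_pair_t` is a hypothetical stand-in for a two-integer process name, not the actual `orte_process_name_t` definition, and the helpers are not the fix that went into nidmap.c. A struct of two 32-bit integers only guarantees 32-bit alignment, so reading it through a `uint64_t *` can fault on strict-alignment machines such as SPARC; copying the bytes with `memcpy` avoids the assumption.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a process name made of two 32-bit integers. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_pair_t;

/* Safe conversion: copy the bytes into a properly aligned uint64_t
 * instead of dereferencing (uint64_t *)name, which would assume 64-bit
 * alignment that the struct does not guarantee. */
static uint64_t name_to_u64(const name_pair_t *name)
{
    uint64_t out;
    memcpy(&out, name, sizeof(out));
    return out;
}

/* Inverse conversion, again going through memcpy. */
static void u64_to_name(uint64_t in, name_pair_t *name)
{
    memcpy(name, &in, sizeof(in));
}

/* Small self-check: same size as uint64_t, weaker alignment requirement,
 * and a lossless round trip through the byte copy. */
static void alignment_demo(void)
{
    static_assert(sizeof(name_pair_t) == sizeof(uint64_t),
                  "two 32-bit fields should pack into 64 bits");
    assert(alignof(name_pair_t) == alignof(uint32_t));

    name_pair_t in = { 2452946945u, 1u };
    name_pair_t out = { 0u, 0u };
    u64_to_name(name_to_u64(&in), &out);
    assert(out.jobid == in.jobid && out.vpid == in.vpid);
}
```

The jobid and vpid values are the ones printed in the gdb session quoted earlier in this listing; any values would do.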

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread George Bosilca
store pointer points to the store function > (db_hash.c:178) > and proc is only used id memcpy at line 194, so 64 bits alignment is not > required. > (and comment is explicit : /* to protect alignment, copy the data across > */ > > that might sounds pedantic, but are

Re: [OMPI devel] circular library dependence prevents static link on Solaris-10/SPARC

2014-08-08 Thread George Bosilca
r32467 should fix the problem. George. On Fri, Aug 8, 2014 at 1:20 PM, Jeff Squyres (jsquyres) wrote: > That'll do it... > > George: can you fix? > > > On Aug 8, 2014, at 1:11 PM, Ralph Castain wrote: > > > I think it might be getting pulled in from

Re: [OMPI devel] ORTE headers in OPAL source

2014-08-08 Thread George Bosilca
These are harmless. They are only used when FT is enabled which should rarely be the case. George. On Fri, Aug 8, 2014 at 4:36 PM, Jeff Squyres (jsquyres) wrote: > Here's a few ORTE headers in OPAL source -- can respective owners clean > these up? Thanks. > > - >

Re: [OMPI devel] ORTE headers in OPAL source

2014-08-11 Thread George Bosilca
quy...@cisco.com> wrote: > I think you're making a joke, right...? > > I see direct calls to ORTE sstore functionality in all three. > > > > > On Aug 8, 2014, at 5:42 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > > > These are harmless. They are only

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-11 Thread George Bosilca
It is not that I care, but it was one of our supported platforms and we don't usually drop support for anything without a proper RFC. George. On Mon, Aug 11, 2014 at 12:09 PM, Dave Goodell (dgoodell) < dgood...@cisco.com> wrote: > On Aug 7, 2014, at 11:37 PM, George Bosi

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-11 Thread George Bosilca
Dave, We all understand your concerns. However, the current issue has nothing to do with Nathan, the code for supporting ARMv5 is already in the patch I submitted and that Paul validated. What Nathan said he might take a look at is a different method for generating assembly code, one that only

Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread George Bosilca
There are many differences between the trunk and 1.8 regarding the TCP BTL. The major one I remember is that the TCP BTL in the trunk reports errors to the upper level via the callbacks attached to fragments, while the 1.8 TCP BTL doesn't. So, I guess that once a connection to a particular

Re: [OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread George Bosilca
this in the 1.8 ... On Wed, Aug 13, 2014 at 3:33 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com > wrote: > On Aug 13, 2014, at 12:52 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > > > There are many differences between the trunk and 1.8 regarding the TCP > BTL.

Re: [OMPI devel] [1.8.2rc4] build failure with --enable-osx-builtin-atomics

2014-08-14 Thread George Bosilca
The atomic.h file should also be trimmed of the SPARC relic. George. Index: opal/include/opal/sys/atomic.h === --- opal/include/opal/sys/atomic.h (revision 32531) +++ opal/include/opal/sys/atomic.h (working copy) @@ -162,8

Re: [OMPI devel] RFC: add opal/threads/spinlock.h

2014-08-14 Thread George Bosilca
SHARED is only supported when the pthread library does support spinlock, while in all other cases it falls back to using atomic locks. Providing support only for a small fraction of environments without reporting errors or providing any alternative on other systems is difficult to accept. I

Re: [OMPI devel] RFC: BTL Interface Change (2 of 5)

2014-08-18 Thread George Bosilca
Nathan, Indeed the original design allowed for multiple usages of the same descriptor, not concurrent as the text in the btl.h indicates but consecutive. The MCA_BTL_FLAGS_RDMA_MATCHED flag is a weirdness needed for Portal, and I am not sure it is currently in use anywhere in the code base. My

Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()

2014-08-26 Thread George Bosilca
The MPI standard clearly states (in 8.7.1 Allowing User Functions at Process Termination) that the mechanism you describe is only allowed on MPI_COMM_SELF. The most relevant part starts at line 14. George. On Tue, Aug 26, 2014 at 11:20 AM, Lisandro Dalcin wrote: > Another

Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()

2014-08-26 Thread George Bosilca
alc...@gmail.com> wrote: > On 26 August 2014 21:29, George Bosilca <bosi...@icl.utk.edu> wrote: > > The MPI standard clearly states (in 8.7.1 Allowing User Functions at > Process > > Termination) that the mechanism you describe is only allowed on > > MPI_COMM_SELF. The

Re: [OMPI devel] Comm_split_type(COMM_SELF, MPI_UNDEFINED, ...)

2014-08-27 Thread George Bosilca
The proposed patch has several issues, all of them detailed on the ticket. A correct patch as well as a broader tester are provided. George. On Tue, Aug 26, 2014 at 8:21 PM, Jeff Squyres (jsquyres) wrote: > Good catch. > > I filed

Re: [OMPI devel] Envelope of HINDEXED_BLOCK

2014-08-27 Thread George Bosilca
Lisandro, Thanks for the tester. I pushed a fix in the trunk (r32613) and I requested a CMR for the 1.8.3. George. On Tue, Aug 26, 2014 at 6:53 AM, Lisandro Dalcin wrote: > I've just installed 1.8.2, something is still wrong with > HINDEXED_BLOCK datatypes. > > Please

Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()

2014-08-27 Thread George Bosilca
t 2014 23:59, George Bosilca <bosi...@icl.utk.edu> wrote: > > Lisandro, > > > > You rely on a feature clearly prohibited by the MPI standard. Please read > > the entire section I pinpointed you to (8.7.1). > > > > There are 2 key sentences in the section. &g

Re: [OMPI devel] clang alignment warnings

2014-09-10 Thread George Bosilca
It complains about 0x2b1b1ed9 being misaligned, which seems like a valid complaint. What is the dst value when this triggers? What is var->mbv_storage? George. On Thu, Sep 11, 2014 at 5:29 AM, Jeff Squyres (jsquyres) wrote: > On Sep 10, 2014, at 4:23 PM, Jeff Squyres

Re: [OMPI devel] race condition in oob/tcp

2014-09-19 Thread George Bosilca
Or copy the handshake protocol design of the TCP BTL... George. On Fri, Sep 19, 2014 at 6:23 PM, Ralph Castain wrote: > You know, I'm almost beginning to dread opening my email in the morning > for fear of seeing another "race condition" subject line! :-) > > I think the

Re: [OMPI devel] Different behaviour with MPI_IN_PLACE in MPI_Reduce_scatter() and MPI_Ireduce_scatter()

2014-09-28 Thread George Bosilca
, Sep 28, 2014 at 6:29 AM, Lisandro Dalcin <dalc...@gmail.com> wrote: > On 22 April 2014 03:02, George Bosilca <bosi...@icl.utk.edu> wrote: > > Btw, the proposed validator was incorrect the first printf instead of > > > > printf(“[%d] rbuf[%d]=%2d expected:

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-10-01 Thread George Bosilca
I see no change in the topo interface in any of the patches attached to this thread. Is there any other patch related to this discussion? George. > On Oct 1, 2014, at 14:52, Jeff Squyres (jsquyres) wrote: > >> On Oct 1, 2014, at 6:48 AM, Gilles Gouaillardet >>

Re: [OMPI devel] opal components subject to removal for 1.9 release

2014-10-03 Thread George Bosilca
On Oct 3, 2014, at 17:06 , Howard wrote: > Hello OMPI folks, > > As part of the code cleanup for release 1.9/2.0, there are several > opal components that are on the radar for possible removal. > > These include: > > mca/btl/template (not clear anyone is maintaining

Re: [OMPI devel] RFC: calloc instead of malloc in opal_obj_new()

2014-10-03 Thread George Bosilca
It’s a tough call. This proposal will create significant differences between the debug and fast builds. As the entire objects will be set to zero this might reduce bugs in the debug build, bugs that will be horribly difficult to track in any non-debug builds. Moreover, if the structures are

Re: [OMPI devel] Open MPI Developers F2F Q1 2015 (poll closes on Friday, 7th of November)

2014-11-05 Thread George Bosilca
Even to US attendees Atlanta might seem more appealing, as it is one hop away from most locations and it has reasonable weather forecast for January/February (not as good as Dallas I concede). George. On Wed, Nov 5, 2014 at 1:18 PM, Jeff Squyres (jsquyres) wrote: > SHORT

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread George Bosilca
I pushed a slightly better patch for the TCP BTL (54ddb0aece0892dcdb1a1293a3bd3902b5f3acdc). The correct scheme would be to OBJ_RETAIN the proc once it is attached to the btl_proc and release it upon destruction of the btl_proc. However, for some obscure reason this doesn't quite work, as the
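The retain-on-attach, release-on-destruction scheme mentioned here can be sketched with a hand-rolled reference count. This is an illustrative model only: `proc_t`, `btl_proc_t` and the helper functions are hypothetical, and the real code would use the OPAL object system (`OBJ_RETAIN` / `OBJ_RELEASE`) rather than these stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted process descriptor. */
typedef struct {
    int refcount;
} proc_t;

/* Hypothetical per-BTL wrapper that keeps a pointer to the proc. */
typedef struct {
    proc_t *proc;
} btl_proc_t;

static proc_t *proc_new(void)
{
    proc_t *p = malloc(sizeof(*p));
    if (p != NULL) {
        p->refcount = 1;            /* the creator owns one reference */
    }
    return p;
}

static void proc_retain(proc_t *p)
{
    p->refcount++;
}

/* Drops one reference; returns 1 if the proc was freed, 0 otherwise. */
static int proc_release(proc_t *p)
{
    if (--p->refcount == 0) {
        free(p);
        return 1;
    }
    return 0;
}

/* Attaching takes a reference, so the proc outlives its other owners. */
static btl_proc_t *btl_proc_attach(proc_t *p)
{
    btl_proc_t *b = malloc(sizeof(*b));
    if (b == NULL) {
        return NULL;
    }
    proc_retain(p);
    b->proc = p;
    return b;
}

/* Destruction gives the reference back; returns 1 if the proc was freed. */
static int btl_proc_destroy(btl_proc_t *b)
{
    int freed = proc_release(b->proc);
    free(b);
    return freed;
}

/* Small self-check of the ownership sequence. */
static void retain_release_demo(void)
{
    proc_t *p = proc_new();
    assert(p != NULL);

    btl_proc_t *b = btl_proc_attach(p);
    assert(b != NULL);
    assert(p->refcount == 2);

    int freed = proc_release(p);    /* creator drops its reference */
    assert(freed == 0);             /* still held by the btl_proc */
    assert(b->proc->refcount == 1);

    freed = btl_proc_destroy(b);    /* last reference goes away */
    assert(freed == 1);
}
```

With this pairing, someone else releasing the proc cannot free it while a btl_proc still points at it, which is the property the message is after.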

Re: [OMPI devel] 1.8.3 and PSM errors

2014-11-11 Thread George Bosilca
> On Nov 11, 2014, at 17:13 , Jeff Squyres (jsquyres) > wrote: > >> More particularly, it looks like add_procs is being called a second time >> during MPI_Intercomm_create and being passed a process that is already >> connected (passed into the first add_procs call). Is

Re: [OMPI devel] question to OMPI_DECLSPEC

2014-11-26 Thread George Bosilca
Edgar, The restriction you are facing doesn't come from Open MPI, but instead it comes from the default behavior of how dlopen loads the .so files. As we do not manually force the RTLD_GLOBAL flag the scope of our modules is local, which means that the symbols defined in this library are not made

Re: [OMPI devel] Problem on MPI_Type_create_resized and multiple BTL modules

2014-11-29 Thread George Bosilca
Takahiro, Sorry for the delay in answering. Thanks for the bug report and the patch. I applied you patch, and added some tougher tests to make sure we catch similar issues in the future. Thanks, George. On Mon, Sep 29, 2014 at 8:56 PM, Kawashima, Takahiro < t-kawash...@jp.fujitsu.com> wrote:

Re: [OMPI devel] RFC: update opal lifo class and add fifo class

2014-12-02 Thread George Bosilca
The FIFO implementation doesn't look right to me. I don't have time to look at it right now, but just looking at the push it will not correctly succeed if two threads are pushing items at the same time. A FIFO is a very sensitive algorithm, and should be treated accordingly. Moreover, there is no

Re: [OMPI devel] knem support in sm btl

2014-12-04 Thread George Bosilca
You can't use the PML error reporting mechanism in this particular instance, it is too early in the setup process (in the BTL component init function) and the PML has not setup the error callback yet. This function is called during the MPI_Init, at a time where most of the Open MPI infrastructure

Re: [OMPI devel] knem support in sm btl

2014-12-04 Thread George Bosilca
t 12:35 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> > wrote: > > > > Oh, good catch -- thanks. > > > > I wouldn't call abort -- that will dump core. Just show_help() and > exit(nonzero), I guess. > > > > > > On Dec 4, 2014, at 3:31 PM, Ge

[OMPI devel] autogen broken

2014-12-09 Thread George Bosilca
After updating to the latest master (3a14c8e), I started having issues with the VPATH build on Mac OS X. The autogen.pl and configure succeeded but when make is invoked I got the following error: Making all in opal Making all in include /Applications/Xcode.app/Contents/Developer/usr/bin/make

Re: [OMPI devel] autogen broken

2014-12-09 Thread George Bosilca
y bug-report on this (and the work-around) here: > http://www.open-mpi.org/community/lists/devel/2014/11/16371.php > > 2014-12-09 7:57 GMT+01:00 George Bosilca <bosi...@icl.utk.edu>: > >> After updating to the latest master (3a14c8e), I start having issues with >> the VPATH build

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-11 Thread George Bosilca
The overall design in OMPI was that no OMPI module should be allowed to decide if threads are on (thus it should not rely on the value returned by opal_using_threads during its initialization stage). Instead, they should respect the level of thread support requested as an argument during the

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-12 Thread George Bosilca
question is for Pascal at Bull: why do you feel > this earlier setting is required? > This might allow to see if using functions that require protection, such as opal_lifo_push, will work by default or one should use directly their atomic version? George. > > > On

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-12 Thread George Bosilca
t I found. George. > > Cheers, > > Gilles > > On 2014/12/12 10:30, Ralph Castain wrote: > > Just to help me understand: I don’t think this change actually changed any > behavior. However, it certainly *allows* a different behavior. Isn’t that > true? > >

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-13 Thread George Bosilca
it_thread(). > > I saw also that opal_using_threads() exists and was used by other BTLs. > > > > Maybe the solution is to find the way to set enable_mpi_threads to 0 when > MPI_Init() is called. > > > > > > *De :* devel [mailto:devel-boun...@open-mpi.org] *De la part de* Ge

Re: [OMPI devel] [1.8.4rc5] preliminary results

2014-12-18 Thread George Bosilca
I also noticed a drastic increase in the number of linking warnings. This is on 64-bit SciLinux Carbon (6.6) using the Intel compilers 14.0.0 20130728. I ran some tests and everything seems to work just fine, so this might not be such a deal breaker. George. libtool: install: warning:

Re: [OMPI devel] FT code (again)

2014-12-19 Thread George Bosilca
A opal_pmix.fence seems like a perfect replacement. George. On Fri, Dec 19, 2014 at 10:26 AM, Adrian Reber wrote: > Again I am trying to get the FT code working. This time I am unsure how > to resolve the code changes from this commit: > > commit

Re: [OMPI devel] libfabric, config.h and hwloc

2014-12-20 Thread George Bosilca
The trunk is broken: libfabric/libfabric/include/fi.h:50:25: error: stdatomic.h: No such file or directory George. On Fri, Dec 19, 2014 at 2:03 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Jeff, > > the include path is $top_srcdir/opal/mca/event/libevent2021/libevent >

Re: [OMPI devel] libfabric, config.h and hwloc

2014-12-20 Thread George Bosilca
; > Jeff Squyres > > jsquy...@cisco.com > > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > -- > *From:* devel [devel-boun...@open-mpi.org] on behalf of George Bosilca [ > bosi...@icl.utk.ed

Re: [OMPI devel] RFC: remove --disable-smp-locks

2015-01-06 Thread George Bosilca
Successive alteration of the build system made this option less relevant and especially less meaningful. However, while removing it sounds like a desirable cleanup, we have to keep in mind that this will enable all locks and all memory barriers even in cases where they are not necessary (via

Re: [OMPI devel] RFC: remove --disable-smp-locks

2015-01-06 Thread George Bosilca
so turns on the lock prefix for the atomic operations, forcing them to always be atomic. I am not sure that this has no unexpected side-effects on the code. George. > > > On Jan 6, 2015, at 4:12 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > > > Successive alteration

Re: [OMPI devel] test/class/opal_fifo failure on ppc64

2015-01-08 Thread George Bosilca
Why do you need the memory write barrier inside the loop ? George. On Thu, Jan 8, 2015 at 11:16 AM, Nathan Hjelm wrote: > > Fixed on master. I forgot a write memory barrier in the 64-bit version > of opal_fifo_pop_atomic. > > -Nathan > > On Thu, Jan 08, 2015 at 02:29:05PM

[OMPI devel] #327

2015-01-09 Thread George Bosilca
I have some comments about this ticket and the corresponding patch. Honestly, the patch lacks most of the things we have talked about during our last developers meeting. However, my main concern in this particular email is about the SIGNAL flag. 1. The fact that currently there is little

Re: [OMPI devel] Another Open MPI <-> PSM question (MPI_Isend()/MPI_Cancel())

2015-01-15 Thread George Bosilca
eturn > > OMPI_SUCCESS. MPI_Cancel() still fails. > > > > Looking at the PSM code it seems it can directly call exit(-1) and thus > > terminating and never returning to Open MPI. I do not see any debug > > output from Open MPI after "Cannot cancel send requests" f

[OMPI devel] Failures

2015-01-15 Thread George Bosilca
Today's trunk compiled with icc fails to complete the check on 2 tests: opal_lifo and opal_tree. For opal_tree the output is: OPAL dss:unpack: got type 9 when expecting type 3 Failure : failed tree deserialization size compare SUPPORT: OMPI Test failed: opal_tree_t (1 of 12 failed) and

Re: [OMPI devel] Failures

2015-01-16 Thread George Bosilca
opal_tree_item_deserialize_fn_t > deserialize, > - char *curr_delim, > + volatile char *curr_delim, > int depth) > { > int idx = 1, rc; > > On 2015/01/16 8:57, George Bosilca wrote:

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread George Bosilca
The extent should not be part of the decision; what matters is the amount of data to be pushed on the wire, and not its span in memory. George. On Mon, Jan 19, 2015 at 12:17 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Adrian, > > i just fixed this in the master > ( >

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread George Bosilca
Btw, MPI_Type_hvector(2, 1, 0, MPI_INT, ); is just a weird datatype. Because the stride is 0, this datatype has a memory layout that includes the same int twice. I'm not sure this was indeed intended... George. On Mon, Jan 19, 2015 at 12:17 AM, Gilles Gouaillardet <
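To make the size-versus-extent point in this thread concrete, here is a minimal sketch in plain C (no MPI calls) that computes both quantities for an hvector-style layout. The helper names `hvector_size` and `hvector_extent` are hypothetical and not part of any MPI API. The sketch assumes count >= 1 and blocklength >= 1, a 4-byte element, and no alignment padding, so it only approximates what an MPI library would report.

```c
#include <stddef.h>

/* Hypothetical helper: bytes of actual data described by
 * count blocks of blocklength elements each (what goes on the wire). */
size_t hvector_size(int count, int blocklength, size_t elem_size)
{
    return (size_t)count * (size_t)blocklength * elem_size;
}

/* Hypothetical helper: span in memory (upper bound minus lower bound)
 * of count blocks placed stride_bytes apart.
 * Assumes count >= 1, blocklength >= 1, and no alignment padding. */
long hvector_extent(int count, int blocklength, long stride_bytes,
                    size_t elem_size)
{
    long last_block = (long)(count - 1) * stride_bytes; /* start of last block */
    long lb = last_block < 0 ? last_block : 0;          /* lowest block start  */
    long hi = last_block > 0 ? last_block : 0;          /* highest block start */
    long ub = hi + (long)blocklength * (long)elem_size; /* end of that block   */
    return ub - lb;
}
```

Under these assumptions, `hvector_size(2, 1, 4)` gives 8 while `hvector_extent(2, 1, 0, 4)` gives 4: with a stride of 0 the same int is described twice, so the data to send is larger than the span in memory. A decision based on the extent would therefore underestimate the amount of data.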

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-20 Thread George Bosilca
t; > Gilles > > On 2015/01/21 3:00, Jeff Squyres (jsquyres) wrote: > > George is right -- Gilles: was this the correct solution? > > > > Put differently: the extent of the 20K vector created below is 4 (bytes). > > > > > > > >> On Jan 19, 2015, at 2:39 A

Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-21 Thread George Bosilca
of datatype entries, so the cost was prohibitive. George. On Wed, Jan 21, 2015 at 9:43 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com > wrote: > On Jan 20, 2015, at 10:10 PM, George Bosilca <bosi...@icl.utk.edu> wrote: > > > > Receiving with such a datatype

[OMPI devel] One sided tests

2015-01-21 Thread George Bosilca
Current trunk compiled with any compiler (gcc or icc) fails the one sided tests from mpi_test_suite. It deadlocks in a fetch. George.

Re: [OMPI devel] One sided tests

2015-01-22 Thread George Bosilca
a tentative fix is available at https://github.com/open-mpi/ompi/pull/355 > > i asked Nathan to review it before it lands into the master > > Cheers, > > Gilles > > > On 2015/01/22 7:08, George Bosilca wrote: > > Current trunk compiled with any compiler (gcc or icc)

[OMPI devel] Coll ML issues

2015-01-23 Thread George Bosilca
During some experiments we have identified several major issues with coll ML with a very recent version of Open MPI master (22ab638 Jan 20 13:21:44). Based on the description below I consider these issues as major drawbacks that require immediate action (or disabling coll ML by default in all

Re: [OMPI devel] Master warnings

2015-01-27 Thread George Bosilca
I took care of the TCP warnings. George. On Tue, Jan 27, 2015 at 7:20 AM, Ralph Castain wrote: > btl_tcp_frag.c: In function 'mca_btl_tcp_frag_dump': > > btl_tcp_frag.c:99: warning: comparison between signed and unsigned > > btl_tcp_frag.c:104: warning: comparison

Re: [OMPI devel] omni-release Github comment bot

2015-02-05 Thread George Bosilca
The RM should not be expected to read and accept the code itself, but his role should be limited to accepting the idea behind the patch and making sure it is compatible with the rules in place. As such, removing the RM-approval mark is not yielding any benefits. Moreover, based on the ideas

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-06 Thread George Bosilca
My feeling is that the current patch hides the symptoms without addressing the real issue. As a side note: The compiler incriminated in this thread works perfectly for 128 bits atomic operations in other projects where I use atomic LIFO & FIFO (but not the one from OMPI as I already raised my

Re: [OMPI devel] RoCE plus QDR IB tunable parameters

2015-02-06 Thread George Bosilca
Dave, Based on your ompi_info.all, the following bandwidths are reported on your system: MCA btl: parameter "btl_openib_bandwidth" (current value: "4", data source: default, level: 5 tuner/detail, type: unsigned) Approximate maximum bandwidth of

Re: [OMPI devel] OMPI devel] Master hangs in opal_fifo test

2015-02-06 Thread George Bosilca
nt FIFOs with CAS2, and even after peer reviews some of them turned out to be incorrect. What I am saying is that we are quick to blame these failures on the icc compiler, while we have no formal proof that the FIFO algorithm in Open MPI is correct. George. > > Cheers, > > Gilles >

Re: [OMPI devel] OMPI devel] OMPI devel] Master hangs in opal_fifo test

2015-02-09 Thread George Bosilca
ks with recent icc, i would not go "all in" with this ... > > And as you pointed, even if the problem does come from the compiler, that > does not mean ompi algo are necessarily correct. > > Cheers, > > Gilles > > George Bosilca <bosi...@icl.utk.edu>

Re: [OMPI devel] OMPI devel] RoCE plus QDR IB tunable parameters

2015-02-10 Thread George Bosilca
I did alter these to 40960 and 10240 as someone else >> suggested to me. The attached graph shows the base red line, along with >> the manual balanced blue line and auto balanced green line (0's for both). >> This shift lower suggests to me that the higher TCP latency

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread George Bosilca
Seriously? George. On Thu, Feb 12, 2015 at 1:00 PM, Nathan Hjelm wrote: > > I think I see the issue. Looks like there is a missing memory barrier > after the head consistency code. I will add one and see if that fixes > your problem. > > BTW, I can't reproduce the issue on
