Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
, 2014 at 01:05:50PM -0700, Ralph Castain wrote: > I'm not sure the sm actually relies on the RML any more - I thought we had > removed that dependency, though the file may not have been deleted. > > On Jul 28, 2014, at 1:02 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > > >

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
>, Let me see if I can resolve that one. -Nathan On Mon, Jul 28, 2014 at 02:14:36PM -0600, Nathan Hjelm wrote: > > Looks like you are correct. The function that calls the rml code is > mca_common_sm_init which is no longer called by anything (other than > mca_common_sm_init_group..

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
go back up in ompi/common/sm...? (since only > ompi/coll/sm uses it) > > > On Jul 28, 2014, at 4:34 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > > > > > Damn, spoke too soon. coll/sm uses it: > > > > ./ompi/mca/coll/sm/coll_sm_module.c:

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
Ok, got --disable-dlopen working again. I removed the code in question and changed how coll/sm shares the segment data. -Nathan On Mon, Jul 28, 2014 at 02:41:37PM -0600, Nathan Hjelm wrote: > > Or pull it into coll/sm. Though I think we can do better here since > point-to-point mess

Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-29 Thread Nathan Hjelm
The problem is the code in question does not check the return code of MPI_T_cvar_handle_alloc. We are returning an error and they still try to use the handle (which is stale). Uncomment this section of the code: //if (MPI_T_ERR_INVALID_INDEX == err)// { NOTE TZI: This variable

[OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-29 Thread Nathan Hjelm
What: Add new versions of opal_atomic_cmpset_* that return the old value rather than whether they succeeded. Why: I plan on adding support for network atomics to the BTL interface. The compare-and-swap function will return the old value from the target memory location. In order to implement a similar
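The two return conventions contrasted in this RFC can be sketched with C11 atomics. This is an illustration only: cas_fetch_old_32 and cmpset_32 are hypothetical names chosen for the example, not the opal_atomic_* functions proposed in the thread, and the use of <stdatomic.h> is an assumption about the toolchain.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper (not the OPAL API): compare-and-swap that returns
 * the value found at addr instead of a success flag. */
static inline int32_t cas_fetch_old_32(_Atomic int32_t *addr,
                                       int32_t expected, int32_t desired)
{
    int32_t old = expected;
    /* On failure the current value of *addr is written into old.
     * On success old still equals expected, which was the old value. */
    atomic_compare_exchange_strong(addr, &old, desired);
    return old;
}

/* The success-flag convention can be rebuilt on top of the old-value
 * convention, but not the other way around. */
static inline int cmpset_32(_Atomic int32_t *addr,
                            int32_t expected, int32_t desired)
{
    return cas_fetch_old_32(addr, expected, desired) == expected;
}
```

A caller that needs the previous contents (as a network compare-and-swap would report them) uses the first form; a caller that only needs to know whether the swap happened compares the result with the expected value.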

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32346 - trunk/opal/mca/btl/openib

2014-07-29 Thread Nathan Hjelm
Josh, you can not free from a memory location that has been registered with the MCA variable system. It will likely SEGV when the component is unloaded. You should call mca_base_var_get_value to get the source of the value. The following should do it: vari = mca_base_var_find ("opal", "btl",

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32346 - trunk/opal/mca/btl/openib

2014-07-29 Thread Nathan Hjelm
(0 == strcmp(default_qps, >> - mca_btl_openib_component.receive_queues)) ? >> -BTL_OPENIB_RQ_SOURCE_DEFAULT : BTL_OPENIB_RQ_SOURCE_MCA; > >Josh > >On Tue, Jul 29, 2014 at 5:53 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > > Josh, you can

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32346 - trunk/opal/mca/btl/openib

2014-07-29 Thread Nathan Hjelm
On Tue, Jul 29, 2014 at 04:12:18PM -0600, Nathan Hjelm wrote: > > Yeah. Though it would be best to just check the source when you need to > see if it should come from the ini. Then if we need to set the value > from the ini either use mca_base_var_set_value or be sure to strdup whe

Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-30 Thread Nathan Hjelm
This is odd. The variable in question is registered by the MCA itself. I will take a look and see if I can determine why it isn't being deregistered correctly when the rest of the component's parameters are. -Nathan On Wed, Jul 30, 2014 at 08:17:15AM +0900, KAWASHIMA Takahiro wrote: > Nathan, >

Re: [OMPI devel] MPI_T SEGV on DSO

2014-07-30 Thread Nathan Hjelm
Yup, just noticed that. All component variables should be registered with mca_base_component_var_register but the versions were registered with the generic register function. The code in question is the oldest part of the MCA rewrite so it probably was missed when the component variable register

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-07-30 Thread Nathan Hjelm
returning the old value. I can bring the code over. > George. > >On Tue, Jul 29, 2014 at 5:29 PM, Paul Hargrove <phhargr...@lbl.gov> wrote: > > On Tue, Jul 29, 2014 at 2:10 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > >Is there a reason why t

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
+2^1000 This information is absolutely necessary at this point. If someone has a better solution they can provide it as an alternative RFC. Until then this is how it should be done... Otherwise we lose uGNI support on the trunk. Because we ARE NOT going to remove the mailbox size

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
add in a hack to probe the apps placement info file and > > scale the smsg blocks by number of nodes rather than number of ranks. > > > > Howard > > > > > > -Original Message- > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
that in >order to allow the BTLs to have access to some possible number of >processes between the call to btl_open and the call to btl_all_proc (in >other words during btl_init). > > George. > >PS: here is the patch that fixes all issues in ugni. >

Re: [OMPI devel] RFC: job size info in OPAL

2014-07-31 Thread Nathan Hjelm
des hints for how to allocate resources it needs to provide its > >> functionality? > >> > >> I'll see if there's something clever that can be done in ugni for now. > >> I can always add in a hack to probe the apps placement info file and > >> scale the smsg bl

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-11 Thread Nathan Hjelm
Which brings us back to Dave's question. Is there a list of supported architectures? I don't want to bother with DEC Alpha if we no longer support it. BTW, so far I have converted: AMD64, IA32, ARM. Working on IA64 now. -Nathan On Mon, Aug 11, 2014 at 01:57:21PM -0400, George Bosilca wrote: >

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32555 - trunk/opal/mca/btl/scif

2014-08-20 Thread Nathan Hjelm
Really? That means PGI 2013 is NOT C99 compliant! Figures. -Nathan On Tue, Aug 19, 2014 at 10:48:48PM -0400, svn-commit-mai...@open-mpi.org wrote: > Author: ggouaillardet (Gilles Gouaillardet) > Date: 2014-08-19 22:48:47 EDT (Tue, 19 Aug 2014) > New Revision: 32555 > URL:

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-08-26 Thread Nathan Hjelm
Good catch. I will take a look and see how best to fix this. -Nathan On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote: > I finally managed to track down some issues in mpi4py's test suite > using Open MPI 1.8+. The code below should be enough to reproduce the > problem. Run it

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-29 Thread Nathan Hjelm
topology information available at that point as you >described it ? > >the attached patch does this, could you please review it ? > >Cheers, > >Gilles > >On 2014/09/26 2:50, Nathan Hjelm wrote: > > On Tue, Aug 26, 2014 at 07:03:24PM +03

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-30 Thread Nathan Hjelm
supports >graph and dist graph. > >Cheers, > >Gilles > > On 2014/09/30 0:05, Nathan Hjelm wrote: > > An equivalent change would need to be made for graph and dist graph as > well. That will take a little more work. Also, I was avoiding changing >

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-10-01 Thread Nathan Hjelm
k and > >> carried forward to the v1.9 series (since it introduces an ABI break for > >> the topo framework, which we try not to do in the middle of a release > >> series). > >> > >> On Sep 30, 2014, at 10:40 AM, Nathan Hjelm <hje...@lanl.gov

Re: [OMPI devel] Issue with MPI_Put in version 1.8.3

2014-10-07 Thread Nathan Hjelm
Thanks for the reproducer. I am taking a look now. BTW, MPI_Win_allocate is preferred over MPI_Win_create. It allows the MPI implementation more opportunities to optimize. -Nathan On Tue, Oct 07, 2014 at 01:09:33PM +0200, Berk Hess wrote: >Hi, > >I am implementing RMA in the Gromacs

Re: [OMPI devel] Issue with MPI_Put in version 1.8.3

2014-10-07 Thread Nathan Hjelm
Should be fixed on trunk now. There were a couple of minor issues in the PSCW path. CMR'd to 1.8. -Nathan On Tue, Oct 07, 2014 at 01:09:33PM +0200, Berk Hess wrote: >Hi, > >I am implementing RMA in the Gromacs molecular simulation package and ran >into an issue while using a

Re: [OMPI devel] osu_mbw_mr error

2014-11-03 Thread Nathan Hjelm
I see the problem. The openib btl does not properly handle the following call sequence (this is an openib btl bug IMHO): btl_sendi (..., ); btl_free (..., descriptor); The bug is in the message coalescing code and it looks like extra logic needs to be added to the openib btl's btl_free function

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
I have run into the issue as well. I will open a pull request for 1.8.4 as part of a patch fixing the coalescing issues. -Nathan On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote: > On 11/4/2014 2:09 PM, Steve Wise wrote: > >Hi, > > > >I'm running ompi top-o-tree from github and seeing

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
> goto good; > } > > > On 11/4/2014 3:08 PM, Nathan Hjelm wrote: > >I have run into the issue as well. I will open a pull request for 1.8.4 > >as part of a patch fixing the coalescing issues. > > > >-Nathan > > > >

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
Wise wrote: >Ok, sounds like I should let you continue the good work! :) When do you >plan to merge this into ompi proper? > >On 11/4/2014 3:58 PM, Nathan Hjelm wrote: > > That certainly addresses part of the problem. I am working on a complete > revamp of t

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
) When do you > > plan to merge this into ompi proper? > > > > > > On 11/4/2014 3:58 PM, Nathan Hjelm wrote: > >> That certainly addresses part of the problem. I am working on a complete > >> revamp of the btl RDMA interface. I

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
is one > particular fix in 1.8.4. > > > On Nov 4, 2014, at 5:24 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > > > Going to put the RFC out today with a timeout of about 2 weeks. This > > will give me some time to talk with other Open MPI developers > > face

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-05 Thread Nathan Hjelm
he target, so goes through the >SM BTL, and the other initiator is off host, so goes through the network >BTL. > >Josh >On Tue, Nov 4, 2014 at 6:46 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > > What: Completely revamp the BTL RDMA interface (btl_put,

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
All atomics must be done through not just "the same btl" but the same btl >MODULE, since atomics from two IB HCAs, for instance, are not necessarily >coherent. So, how is the "best" one to be selected? > >-Paul [Sent from my phone] > >On Nov 5, 2014

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
yer as well? >On Thu, Nov 6, 2014 at 5:23 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > > At the moment I select the lowest latency BTL that can reach all of the > ranks in the communicator used to create the window. I can add code to > round-robin windows

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
rd >2014-11-06 12:08 GMT-07:00 Nathan Hjelm <hje...@lanl.gov>: > > I haven't looked at that yet. Would be great to get the new osc component > working over both btls and mtls. I know portals supports atomics but I > don't know whether psm does. > >

Re: [OMPI devel] osu_mbw_mr error

2014-11-06 Thread Nathan Hjelm
On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote: >Nathan, >Has this bug always been present in OpenIB or is this a recent addition? >If this is regression, I would also be inclined to say that this is a The bug is as old as the message coalescing feature in the openib btl.

Re: [OMPI devel] osu_mbw_mr error

2014-11-06 Thread Nathan Hjelm
On Thu, Nov 06, 2014 at 04:29:44PM -0500, Joshua Ladd wrote: >On Thursday, November 6, 2014, Nathan Hjelm <hje...@lanl.gov> wrote: > > On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote: > >Nathan, > >Has this bug al

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
: >MXM supports atomics. > >On Thursday, November 6, 2014, Nathan Hjelm <hje...@lanl.gov> wrote: > > I haven't looked at that yet. Would be great to get the new osc component > working over both btls and mtls. I know portals supports atomics but I >

Re: [OMPI devel] RFC: revamp btl rdma interface

2014-11-06 Thread Nathan Hjelm
Looks like put and get functions should be added if possible. The MTL layer looks like it is designed for two-sided only with no intention of supporting one-sided. -Nathan On Thu, Nov 06, 2014 at 03:21:32PM -0700, Nathan Hjelm wrote: > > Great! We should probably try to figure out how t

Re: [OMPI devel] oshmem: put does not work with btl/vader if knem is enabled

2014-11-12 Thread Nathan Hjelm
On Wed, Nov 12, 2014 at 07:56:08PM +0900, Gilles Gouaillardet wrote: > Folks, > > I found (at least) two issues with oshmem put if btl/vader is used with > knem enabled : > > $ oshrun -np 2 --mca btl vader,self ./oshmem_max_reduction >

Re: [OMPI devel] oshmem: put does not work with btl/vader if knem is enabled

2014-11-12 Thread Nathan Hjelm
On Wed, Nov 12, 2014 at 07:56:08PM +0900, Gilles Gouaillardet wrote: > Folks, > > I found (at least) two issues with oshmem put if btl/vader is used with > knem enabled : > > $ oshrun -np 2 --mca btl vader,self ./oshmem_max_reduction >

Re: [OMPI devel] RFC: update opal lifo class and add fifo class

2014-12-02 Thread Nathan Hjelm
On Tue, Dec 02, 2014 at 05:54:04PM -0500, George Bosilca wrote: >The FIFO implementation doesn't look right to me. I don't have time to >look at it right now, but just looking at the push it will not correctly >succeed if two threads are pushing items in same time. >A FIFO is a

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-09 Thread Nathan Hjelm
Ralph, I corrected this as part of the thread multiple pull request in 1.8. https://github.com/rhc54/ompi-release/commit/52823d592c3759c53ed63ed1f63fe200d2491220#diff-3673b21a7f42dc0665ea4470b3171df1R510 -Nathan On Tue, Dec 09, 2014 at 12:31:55AM -0800, Ralph Castain wrote: >Hi Pascal >

Re: [OMPI devel] Patch proposed: opal_set_using_threads(true) in ompi/runtime/ompi_mpi_init.c is called to late

2014-12-09 Thread Nathan Hjelm
Just hadn't gotten around to it yet :). Still working on free list and lifo stuff. -Nathan On Tue, Dec 09, 2014 at 07:56:04AM -0800, Ralph Castain wrote: > Kewl - I wonder why it wasn’t fixed in trunk then? > > > > On Dec 9, 2014, at 7:52 AM, Nathan Hjelm <hje.

Re: [OMPI devel] opal_lifo/opal_fifo fail with make distcheck

2014-12-10 Thread Nathan Hjelm
The failure was due to the use of opal_init() in the tests. I thought it was ok to use because it is used by other tests (which turned out to be disabled) but that isn't the case. opal_init_util() has to be used instead. I pushed a fix to master last night. -Nathan On Tue, Dec 09, 2014 at

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Nathan Hjelm
On Thu, Dec 11, 2014 at 07:37:17AM -0800, Howard Pritchard wrote: >Hi Folks, >I'm trying to use mtt on a cluster where it looks like the only functional >compiler that >1) can build open mpi master >2) can also build the ibm test suite >may be pgi. Can't compile right now,

Re: [OMPI devel] [1.8.4rc2] build broken by default on SGI UV

2014-12-12 Thread Nathan Hjelm
Hmm, I thought we already cleaned that up in 1.8. I will take a look today. BTW, can you send me the sn/xpmem.h file from your machine? I might have an idea what is going wrong. Can't seem to find the link to SGI's tarball on their oss site. -Nathan On Thu, Dec 11, 2014 at 06:53:00PM -0800,

Re: [OMPI devel] OpenIB has some borked code

2014-12-12 Thread Nathan Hjelm
Yeah, that code is completely wrong. I have a fix in my btl modifications branch. https://github.com/hjelmn/ompi/commit/38e961193074d382983d000e68adb721aaf3df7d -Nathan On Fri, Dec 12, 2014 at 08:26:34AM -0800, Ralph Castain wrote: >Hey folks >I've been looking into this warning: >

Re: [OMPI devel] Trunk warnings

2014-12-12 Thread Nathan Hjelm
The osc warnings will go away after the btl modifications are applied. I made significant changes to the component. -Nathan On Fri, Dec 12, 2014 at 09:49:47AM -0800, Ralph Castain wrote: >While building optimized on Linux: >bcol_ptpcoll_allreduce.c: In function >

Re: [OMPI devel] OpenIB has some borked code

2014-12-12 Thread Nathan Hjelm
roblem is contained in its own commit. >Howard >2014-12-12 9:38 GMT-07:00 Nathan Hjelm <hje...@lanl.gov>: > > Yeah, that code is completely wrong. I have a fix in my btl > modifications branch. > > > https://github.com/hjelmn/ompi/commit/38e9

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-509-g38d6627

2014-12-15 Thread Nathan Hjelm
- > > https://github.com/open-mpi/ompi/commit/38d66272c51fd531181d9dc282a7260f40270f64 > > > > commit 38d66272c51fd531181d9dc282a7260f40270f64 > > Author: Nathan Hjelm <hje...@lanl.gov> > > Date: Fri Dec 12 09:09:01

[OMPI devel] RFC: remove --disable-smp-locks

2015-01-06 Thread Nathan Hjelm
What: Remove the --disable-smp-locks configure option from master. Why: Use of this option produces incorrect results/undefined behavior when any shared memory BTL is in use. Since BTL usage is enabled even when using cm for point-to-point this option can never be safely used. When: Thurs, Jan

Re: [OMPI devel] pthreads (was: Re: RFC: remove --disable-smp-locks)

2015-01-07 Thread Nathan Hjelm
Yes, we decided some time back that pthreads is a minimum requirement for Open MPI. -Nathan On Wed, Jan 07, 2015 at 04:26:01PM +, Jeff Squyres (jsquyres) wrote: > On Jan 7, 2015, at 11:22 AM, Gilles Gouaillardet > wrote: > > > Valid options are : > >

Re: [OMPI devel] RFC: remove --disable-smp-locks

2015-01-07 Thread Nathan Hjelm
eff Squyres (jsquyres) > <jsquy...@cisco.com> wrote: > > > > > > +1 > > > > > > On Jan 6, 2015, at 11:55 AM, Howard Pritchard <hpprit...@gmail.com> > wrote: > > > > > >> I agree. Please r

Re: [OMPI devel] test/class/opal_fifo failure on ppc64

2015-01-08 Thread Nathan Hjelm
I will take a look if you can give me ssh access. -Nathan On Thu, Jan 08, 2015 at 02:29:05PM +0100, Adrian Reber wrote: > I am trying to build OMPI git master on ppc64 (PPC970MP) and > test/class/opal_fifo fails during make check most of the time. > > [adrian@bimini class]$ ./opal_fifo >

Re: [OMPI devel] test/class/opal_fifo failure on ppc64

2015-01-08 Thread Nathan Hjelm
Fixed on master. I forgot a write memory barrier in the 64-bit version of opal_fifo_pop_atomic. -Nathan On Thu, Jan 08, 2015 at 02:29:05PM +0100, Adrian Reber wrote: > I am trying to build OMPI git master on ppc64 (PPC970MP) and > test/class/opal_fifo fails during make check most of the time. >
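Why a missing write barrier shows up on a weakly ordered CPU such as ppc64 can be illustrated with a generic publish/consume pair in C11. This is a sketch only and is not the opal_fifo code: item_t, slot, publish and consume are hypothetical names, and C11 fences stand in for whatever barrier primitive the real implementation uses.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical item type for the illustration. */
typedef struct {
    int payload;
} item_t;

/* Shared slot through which one thread hands an item to another. */
static _Atomic(item_t *) slot = NULL;

/* Writer: fill in the item, then make it visible. Without the release
 * fence a weakly ordered CPU may let the pointer store become visible
 * before the payload store. */
static void publish(item_t *item, int value)
{
    item->payload = value;
    atomic_thread_fence(memory_order_release);   /* write memory barrier */
    atomic_store_explicit(&slot, item, memory_order_relaxed);
}

/* Reader: pick up the pointer, then order the payload reads after it. */
static item_t *consume(void)
{
    item_t *item = atomic_load_explicit(&slot, memory_order_relaxed);
    if (NULL != item) {
        atomic_thread_fence(memory_order_acquire);   /* read memory barrier */
    }
    return item;
}
```

On x86 the stronger hardware ordering tends to hide this class of bug, which is consistent with the failure being reported on ppc64 first.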

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread Nathan Hjelm
I think I see the issue. Looks like there is a missing memory barrier after the head consistency code. I will add one and see if that fixes your problem. BTW, I can't reproduce the issue on any of my systems :-/. -Nathan On Thu, Feb 12, 2015 at 02:07:08AM -0800, Paul Hargrove wrote: >Just

Re: [OMPI devel] opal_fifo SEGV from master

2015-02-12 Thread Nathan Hjelm
could see for a possible inconsistency. It might not make any difference. If that is the case I will dig deeper. -Nathan On Thu, Feb 12, 2015 at 03:48:25PM -0500, George Bosilca wrote: >Seriously? > George. >On Thu, Feb 12, 2015 at 1:00 PM, Nathan Hjelm <hje...@lan

[OMPI devel] RFC: merge opal_free_list_t and ompi_free_list_t

2015-02-19 Thread Nathan Hjelm
What: Merge the opal_free_list_t and ompi_free_list_t implementations and add explicit interfaces for single and multi-threaded use. Why: Historically these two lists were different due to ompi_free_list_t dependencies in ompi (mpool). Those dependencies have since been moved to opal so it is

Re: [OMPI devel] Free list warning

2015-02-26 Thread Nathan Hjelm
Is it just complaining that the inline functions are deprecated or is some code still using ompi_free_list_t? If it is the former I will go ahead and remove the dummy implementation. -Nathan On Wed, Feb 25, 2015 at 09:00:26PM -0800, Ralph Castain wrote: >It looks like everything in

Re: [OMPI devel] Free list warning

2015-02-26 Thread Nathan Hjelm
using it > > > On Feb 26, 2015, at 8:07 AM, Nathan Hjelm <hje...@lanl.gov> wrote: > > > > > > Is it just complaining that the inline functions are deprecated or some > > code still using ompi_free_list_t? If it is the former I will go ahead > > and remove the dum

Re: [OMPI devel] Master warning on oob:ud w/ PGI

2015-03-11 Thread Nathan Hjelm
Fixed. -Nathan On Thu, Feb 26, 2015 at 04:57:22PM -0800, Paul Hargrove wrote: >The warning below comes from pgi-14.7 on the latest master tarball (output >from "make V=1"). >-Paul >libtool: compile: pgcc -DHAVE_CONFIG_H -I. > >

Re: [OMPI devel] BML changes

2015-03-11 Thread Nathan Hjelm
Definitely a side-effect though it could be beneficial in some cases as the RDMA engine in the HCA may be faster than using memcpy (larger than a certain size). I don't know how to best fix this as I need all RDMA capable BTLs to be listed for RMA. I thought about adding another list to track BTLs

Re: [OMPI devel] BML changes

2015-03-11 Thread Nathan Hjelm
the DMA engine of the >NIC is not such a good idea. >Howard >2015-03-11 10:57 GMT-06:00 Nathan Hjelm <hje...@lanl.gov>: > > Definitely a side-effect though it could be beneficial in some cases as > the RDMA engine in the HCA may be faster than using memcpy

Re: [OMPI devel] hangs/crashes with openmpi-1.8.4-99-20150228

2015-03-16 Thread Nathan Hjelm
As mentioned on the ticket I cannot reproduce this on master (same version of vader) with netcdf master, hdf 1.8.15, and gcc 4.8.2. The test runs to completion with both ompio and romio with both xpmem and no single copy support. This could be a romio bug as the version in 1.8.4 lags behind

Re: [OMPI devel] hangs/crashes with openmpi-1.8.4-99-20150228

2015-03-25 Thread Nathan Hjelm
Boulder/CoRA Office FAX: 303-415-9702 > 3380 Mitchell Lane or...@nwra.com > Boulder, CO 80301 http://www.nwra.com > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://ww

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch asm_fix created. dev-1370-g26f96c0

2015-03-25 Thread Nathan Hjelm
On Wed, Mar 25, 2015 at 08:59:31PM +, Dave Goodell (dgoodell) wrote: > On Mar 25, 2015, at 3:02 PM, git...@crest.iu.edu wrote: > > > +static inline int32_t opal_atomic_swap_32( volatile int32_t *addr, > > + int32_t newval) > > +{ > > +int32_t oldval; >

[OMPI devel] Opal atomics question

2015-03-26 Thread Nathan Hjelm
I am working on cleaning up the atomics in opal and I noticed something odd. We define opal_atomic_sub_32 and opal_atomic_sub_64 yet only use opal_atomic_sub_32 once: ./opal/runtime/opal_progress.c:val = opal_atomic_sub_32(_event_users, 1); This could easily be changed to: val =

Re: [OMPI devel] Opal atomics question

2015-03-26 Thread Nathan Hjelm
are known to be wrong and most compilers support in-line assembly. -Nathan On Thu, Mar 26, 2015 at 09:22:39AM -0600, Nathan Hjelm wrote: > > I am working on cleaning up the atomics in opal and I noticed something > odd. We define opal_atomic_sub_32 and opal_atomic_sub_64 yet

Re: [OMPI devel] Opal atomics question

2015-03-26 Thread Nathan Hjelm
parcv8+, sparcv9, ia64 and mips in release candidates. >That isn't the same as *using* any of those platforms in production. >I just mean to say that the implementations are known to pass "make >check". >-Paul >On Thu, Mar 26, 2015 at 8:48 AM, Natha

Re: [OMPI devel] hangs/crashes with openmpi-1.8.4-99-20150228

2015-03-30 Thread Nathan Hjelm
Ok, I will take a look today and see if I can determine why vader is hanging in 32-bit builds. -Nathan On Fri, Mar 27, 2015 at 11:26:36AM -0600, Orion Poplawski wrote: > On 03/25/2015 01:46 PM, Nathan Hjelm wrote: > > > > Can you please retest both make check and vader with the

[OMPI devel] .ompi_info dependency files

2015-04-07 Thread Nathan Hjelm
I am working on rewriting some of the MCA component open code to delay dlclose until opal_util_finalize () and I ran into something interesting. Open MPI supports component dependency files ending in .ompi_info. These files can be used to describe dependencies between mca components. This feature

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1488-g40b7643

2015-04-13 Thread Nathan Hjelm
That is going to be unreachable code. The outstanding lock can only take on the value lock_nocheck, lock_exclusive, lock_shared, or lock_none. All of which are checked. The correct fix would be to change it to a switch so a warning will be printed if any other valid lock values are added.
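The suggestion above relies on a compiler feature: GCC and Clang warn (-Wswitch, part of -Wall) when a switch over an enum has no default label and does not handle every enumerator. A minimal sketch, assuming a hypothetical lock_state_t enum and lock_state_name function rather than the actual osc code:

```c
/* Hypothetical lock states mirroring the four values named in the message. */
typedef enum {
    LOCK_NONE,
    LOCK_NOCHECK,
    LOCK_EXCLUSIVE,
    LOCK_SHARED
} lock_state_t;

static const char *lock_state_name(lock_state_t state)
{
    /* No default label on purpose: if a new enumerator is added to
     * lock_state_t and not handled here, -Wswitch reports it at
     * compile time instead of the value silently falling through. */
    switch (state) {
    case LOCK_NONE:      return "none";
    case LOCK_NOCHECK:   return "nocheck";
    case LOCK_EXCLUSIVE: return "exclusive";
    case LOCK_SHARED:    return "shared";
    }

    /* Only reached for values outside the enumeration. */
    return "invalid";
}
```

An if/else chain over the same values gives the compiler nothing to check, which is why a trailing else branch ends up as unreachable code rather than as a useful diagnostic.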

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch revert-520-valgrind_cleanness created. dev-1504-g7a8a4a0

2015-04-15 Thread Nathan Hjelm
Looks like they just accidentally created a branch on the github repo. I can confirm that they indeed did not revert the commit but only fixed the relevant issue. -Nathan On Wed, Apr 15, 2015 at 07:36:16AM -0700, Ralph Castain wrote: >Sare you going to restore the rest of it? Or are

Re: [OMPI devel] Common symbols warning

2015-04-15 Thread Nathan Hjelm
On Tue, Apr 14, 2015 at 08:14:09PM -0700, Ralph Castain wrote: >Dave committed this earlier today, and here is the first error report: >WARNING! Common symbols found: > comm_request.o: 0068 C ompi_comm_request_mutex > comm_request.o:

Re: [OMPI devel] Common symbols warning

2015-04-15 Thread Nathan Hjelm
Ah, oh well. It's my code so I went ahead and committed that fix. -Nathan On Wed, Apr 15, 2015 at 09:02:48AM -0700, Ralph Castain wrote: >FWIW: Gilles has a pending PR that fixes all of these > https://github.com/open-mpi/ompi/pull/530 > > On Apr 15, 2015, at 8:55 AM,

Re: [OMPI devel] 1.8.5rc1 and OOB on Cray XC30

2015-04-16 Thread Nathan Hjelm
Take a look at contrib/platform/lanl/cray_xe6/optimized-lustre.conf. There are a couple of MCA variables that need to be set in order to enable mpirun on Cray systems. -Nathan On Thu, Apr 16, 2015 at 04:29:21PM -0400, Aurélien Bouteiller wrote: > > >- Improved support for Cray >

Re: [OMPI devel] Master appears broken on the Mac

2015-04-20 Thread Nathan Hjelm
Shoot. That would be my configure changes. Looks like I should rename that temporary variable or push/pop it. Will get you a fix soon. -Nathan On Mon, Apr 20, 2015 at 01:57:45PM -0700, Ralph Castain wrote: >Hit this error with current HEAD: > >checking if threads have different pids

Re: [OMPI devel] noticing odd message

2015-04-20 Thread Nathan Hjelm
Tracking it down now. Probably a typo in a component initialization. -Nathan On Mon, Apr 20, 2015 at 04:34:10PM -0600, Howard Pritchard wrote: >Hi Folks, >Working on master, I'm getting an odd message: >malloc debug: Request for 1 zeroed elements of size 0 (mca_base_var.c, >170)

Re: [OMPI devel] noticing odd message

2015-04-20 Thread Nathan Hjelm
Fixed in 359a282e7d31a8a7af3a69ead518ff328862b801. mca_base_var does not currently allow a component to be registered with NULL for both the framework and component. -Nathan On Mon, Apr 20, 2015 at 04:34:10PM -0600, Howard Pritchard wrote: >Hi Folks, >Working on master, I'm getting an odd

Re: [OMPI devel] Fwd: OpenIB module initialisation causes segmentation fault when locked memory limit too low

2015-04-22 Thread Nathan Hjelm
Umm, why are you cleaning up this way? The allocated resources *should* be freed by the udcm_module_finalize call. If there is a bug in that path it should be fixed there NOT by adding a bunch of gotos (ick). I will take a look now and apply the appropriate fix. -Nathan On Wed, Apr 22, 2015 at

Re: [OMPI devel] Fwd: OpenIB module initialisation causes segmentation fault when locked memory limit too low

2015-04-22 Thread Nathan Hjelm
I see the problem. I thought I fixed this a while ago but apparently not. The various OBJ_CONSTRUCT lines should be at the top of udcm_module_init to ensure that they are always called. Fixing. -Nathan On Wed, Apr 22, 2015 at 01:13:08PM -0600, Nathan Hjelm wrote: > > I will commit t
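The pattern described here (construct every member before anything can fail, so a single finalize path is always safe) can be sketched in plain C. Everything below is hypothetical: obj_t, module_t, module_init and module_finalize only stand in for the OBJ_CONSTRUCT calls and the udcm module, whose real layout is not shown in the thread.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for an object that must be constructed before it is destructed. */
typedef struct {
    bool constructed;
} obj_t;

static void obj_construct(obj_t *o) { o->constructed = true; }
static void obj_destruct(obj_t *o)  { o->constructed = false; }

/* Hypothetical module with two objects and one allocated resource. */
typedef struct {
    obj_t lock;
    obj_t pending;
    void *buffer;
} module_t;

static int module_init(module_t *m, bool simulate_failure)
{
    /* Construct everything first, before any step that can fail. */
    obj_construct(&m->lock);
    obj_construct(&m->pending);
    m->buffer = NULL;

    /* An early error (for example a resource limit that is too low)
     * can now return directly: finalize will only see valid state. */
    if (simulate_failure) {
        return -1;
    }

    m->buffer = malloc(64);
    return (NULL != m->buffer) ? 0 : -1;
}

/* Single cleanup path, safe after both successful and failed init. */
static void module_finalize(module_t *m)
{
    free(m->buffer);   /* free(NULL) is a no-op */
    m->buffer = NULL;
    obj_destruct(&m->pending);
    obj_destruct(&m->lock);
}
```

With this ordering the caller can always pair a failed init with finalize, so no per-error-path cleanup labels are needed.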

Re: [OMPI devel] Fwd: OpenIB module initialisation causes segmentation fault when locked memory limit too low

2015-04-22 Thread Nathan Hjelm
Agreed. goto's just make me grumpy. -Nathan On Wed, Apr 22, 2015 at 01:17:11PM -0600, Howard Pritchard wrote: >Hi Rafael, >I give you an A+ for effort. We always appreciate patches. >Howard >2015-04-22 12:43 GMT-06:00 Nathan Hjelm <hje...@lanl.gov>: >

Re: [OMPI devel] Fwd: OpenIB module initialisation causes segmentation fault when locked memory limit too low

2015-04-22 Thread Nathan Hjelm
PR https://github.com/open-mpi/ompi-release/pull/250. Raphaël, can you please confirm this fixes your issue. -Nathan On Wed, Apr 22, 2015 at 04:55:57PM +0200, Raphaël Fouassier wrote: > We are experiencing a bug in OpenMPI in 1.8.4 which happens also on > master: if locked memory limits are too

Re: [OMPI devel] MCA params

2015-05-11 Thread Nathan Hjelm
Hmm, that shouldn't be happening. Will take a look now. -Nathan On Mon, May 11, 2015 at 04:51:43PM -0400, George Bosilca wrote: >I was looking to preconnect all MPI processes to remove some weird >behaviors. As I did not remember the full name I hope to get that from >the

Re: [OMPI devel] Hang in IMB-RMA?

2015-05-12 Thread Nathan Hjelm
Thanks for the report. Can you try with master and see if the issue is fixed there? -Nathan On Tue, May 12, 2015 at 04:38:01PM +, Friedley, Andrew wrote: > Hi, > > I've run into a problem with the IMB-RMA exchange_get test. At this point I > suspect it's an issue in Open MPI or the test

Re: [OMPI devel] Hang in IMB-RMA?

2015-05-12 Thread Nathan Hjelm
ght to do that. Yes, the issue seems to be fixed on master > -- no hangs on PSM, openib, or tcp. > > Andrew > > > -Original Message- > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan > > Hjelm > > Sent: Tuesday, May 12, 2015 9:44 A

Re: [OMPI devel] RFC: standardize verbosity values

2015-06-08 Thread Nathan Hjelm
7B%7D,'cvml','gil...@rist.or.jp');>> wrote: > > > > > > Nathan, > > > > > > i think it is a good idea to use names vs numeric values for verbosity. > > > > > > what about using "a la" log4c verbosity names ? > > >

Re: [OMPI devel] RFC: standardize verbosity values

2015-06-08 Thread Nathan Hjelm
Yes. -Nathan On Mon, Jun 08, 2015 at 09:17:17AM -0700, Ralph Castain wrote: > So how is the user going to specify these? -mca oob_base_verbose debug? > > > On Jun 8, 2015, at 9:11 AM, Nathan Hjelm <hje...@lanl.gov> wrote: > > > > > > That would work.

Re: [OMPI devel] error in test/threads/opal_condition.c

2015-07-01 Thread Nathan Hjelm
PGI no longer surprises me with how bad it is. The lines in question look ok to me. We can fix this (and remove the common symbols) by removing the initializers and making the variables static. I will go ahead and do this. -Nathan On Wed, Jul 01, 2015 at 05:41:59AM -0700, Paul Hargrove wrote: >

Re: [OMPI devel] opal_lifo hangs on ppc in master

2015-07-01 Thread Nathan Hjelm
Paul, can you send me the config.log for the ppc build? -Nathan On Wed, Jul 01, 2015 at 09:33:53AM -0700, Paul Hargrove wrote: >Testing last night's master tarball with "make check" I find that >opal_lifo *hangs* on every ppc/linux system I try, including both gcc and >xlc, both 32-

Re: [OMPI devel] Open MPI 1.8.6 memory leak

2015-07-01 Thread Nathan Hjelm
Don't see the leak on master with OS X using the leaks command. Will see what valgrind finds on linux. -Nathan On Wed, Jul 01, 2015 at 08:48:57PM +, Rolf vandeVaart wrote: >There have been two reports on the user list about memory leaks. I have >reproduced this leak with LAMMPS.

[OMPI devel] RFC: kill alpha asm support

2015-07-14 Thread Nathan Hjelm
I would like to kill the alpha assembly support in Open MPI in 2.x and master. alpha processors have not been available since 2007. Anyone still interested in alpha support can use the gcc sync atomics or stick with 1.10 or earlier. Any objections? -Nathan

Re: [OMPI devel] RFC: kill alpha asm support

2015-07-14 Thread Nathan Hjelm
). It is not in widespread use so its existence should not save the alpha asm. -Nathan On Tue, Jul 14, 2015 at 01:29:28PM -0600, Nathan Hjelm wrote: > > I would like to kill the alpha assembly support in Open MPI in 2.x and > master. alpha processors have not been available since 2007. Anyone > stil

Re: [OMPI devel] C standard compatibility

2015-07-30 Thread Nathan Hjelm
On Thu, Jul 30, 2015 at 12:41:33PM +, Jeff Squyres (jsquyres) wrote: > We only recently started allowing the use of C99 in the code base (i.e., we > put AC_PROG_CC_C99 in configure.ac). > > There's no *requirement* to use C99 throughout the code, but we generally do > the following kinds of

Re: [OMPI devel] [OMPI users] open mpi 1.8.6. MPI_T

2015-08-17 Thread Nathan Hjelm
That is interesting. Let me look at the logic and see if I can determine what is going wrong. It could be a naming issue, i.e. opal_btl_vader_flags vs btl_vader_flags. Both are valid names for the same variable but the search may only be succeeding for one. Should be simple enough to fix if

Re: [OMPI devel] [OMPI users] open mpi 1.8.6. MPI_T

2015-08-17 Thread Nathan Hjelm
I see the problem. The second argument of MPI_T_pvar_get_index is not the binding. It is the variable class. Change it to: err = MPI_T_pvar_get_index(name, varClass, _idx); and it works as expected. -Nathan On Fri, Aug 14, 2015 at 03:08:42PM -0400, George Bosilca wrote: >Another issue,

Re: [OMPI devel] 1.10.0rc3 build failure Solaris/x86 + gcc

2015-08-20 Thread Nathan Hjelm
I see the problem. Both Ralph and I missed an error in the cherry-pick. For add_32 in the ia32 atomics we were checking for OPAL_GCC_INLINE_ASSEMBLY instead of OMPI_GCC_INLINE_ASSEMBLY. -Nathan On Thu, Aug 20, 2015 at 03:01:35PM +, Jeff Squyres (jsquyres) wrote: > Paul -- > > I see that

Re: [OMPI devel] 1.10.0rc3 build failure Solaris/x86 + gcc

2015-08-20 Thread Nathan Hjelm
f Squyres (jsquyres) ><jsquy...@cisco.com> wrote: > > (the fix has been merged in to v1.8 and v1.10 branches) > > On Aug 20, 2015, at 12:18 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > > > > > > I see the problem. Both R

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Nathan Hjelm
+1 On Mon, Aug 24, 2015 at 07:08:02PM +, Jeff Squyres (jsquyres) wrote: > FWIW, we have had verbal agreement in the past that the v1.8 series was the > last one to contain MX support. I think it would be fine for all MX-related > components to disappear from v1.10. > > Don't forget that

Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm
The reproducer is working for me with master on OS X 10.10. Some changes to ompi_comm_set went in yesterday. Are you on the latest hash? -Nathan On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote: > something is borked right now on master in the management of inter vs. intra >
