Re: [OMPI devel] How to progress MPI_Recv using custom BTL for NIC under development

2022-08-03 Thread Nathan Hjelm via devel
Kind of sounds to me like they are using the wrong proc when receiving. Here is an example of what a modex receive should look like: https://github.com/open-mpi/ompi/blob/main/opal/mca/btl/ugni/btl_ugni_endpoint.c#L44 -Nathan > On Aug 3, 2022, at 11:29 AM, "Jeff Squyres (jsquyres) via devel" wrote: > Gl

Re: [OMPI devel] C style rules / reformatting

2021-05-18 Thread Nathan Hjelm via devel
It really is a shame that this could not go forward. There are really three end goals in mind: 1) Consistency. We all have different coding styles, and following a common coding style is more and more considered a best practice. The number of projects using clang-format grows continuously. I find it

Re: [OMPI devel] Intel OPA and Open MPI

2019-04-24 Thread Nathan Hjelm via devel
for that at this time? > >> -Original Message- >> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of >> Nathan Hjelm via devel >> Sent: Friday, April 12, 2019 11:19 AM >> To: Open MPI Developers >> Cc: Nathan Hjelm ; Castain, Ralph H >> ; Y

Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-16 Thread Nathan Hjelm via devel
d but it'll use more resources. > > On Mon, Apr 15, 2019 at 9:00 AM Nathan Hjelm via devel > wrote: > If you do that it may run out of resources and deadlock or crash. I recommend > either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) > enab

Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Nathan Hjelm via devel
If you do that it may run out of resources and deadlock or crash. I recommend either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) enable coll/sync (which essentially does 1). Honestly, 2 is probably the easiest option and depending on how large you run may not be any slowe

Re: [OMPI devel] Intel OPA and Open MPI

2019-04-12 Thread Nathan Hjelm via devel
That is accurate. We expect to support OPA with the btl/ofi component. It should give much better performance than osc/pt2pt + mtl/ofi. What would be good for you to do on your end is verify everything works as expected and that the performance is on par for what you expect. -Nathan > On Apr 1

Re: [OMPI devel] IBM CI re-enabled.

2018-10-18 Thread Nathan Hjelm via devel
Appears to be broken. It's failing and simply saying: Testing in progress.. -Nathan On Oct 18, 2018, at 11:34 AM, Geoffrey Paulsen wrote:   I've re-enabled IBM CI for PRs.   ___ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/

Re: [OMPI devel] Bug on branch v2.x since october 3

2018-10-17 Thread Nathan Hjelm via devel
Ah yes, 18f23724a broke things so we had to fix the fix. Didn't apply it to the v2.x branch. Will open a PR to bring it over. -Nathan On Oct 17, 2018, at 11:28 AM, Eric Chamberland wrote: Hi, since commit 18f23724a, our nightly base test is broken on v2.x branch. Strangely, on branch v3.x

Re: [OMPI devel] Patcher on MacOS

2018-09-28 Thread Nathan Hjelm via devel
Nope. We just never bothered to disable it on osx. I think Jeff was working on a patch. -Nathan > On Sep 28, 2018, at 3:21 PM, Barrett, Brian via devel > wrote: > > Is there any practical reason to have the memory patcher component enabled > for MacOS? As far as I know, we don’t have any t

Re: [OMPI devel] Test mail

2018-08-28 Thread Nathan Hjelm via devel
no Sent from my iPhone > On Aug 27, 2018, at 8:51 AM, Jeff Squyres (jsquyres) via devel > wrote: > > Will this get through? > > -- > Jeff Squyres > jsquy...@cisco.com > > ___ > devel mailing list > devel@lists.open-mpi.org > https://lists.open-mpi

Re: [OMPI devel] openmpi 3.1.x examples

2018-07-17 Thread Nathan Hjelm via devel
> On Jul 16, 2018, at 11:18 PM, Marco Atzeri wrote: > >> Am 16.07.2018 um 23:05 schrieb Jeff Squyres (jsquyres) via devel: >>> On Jul 13, 2018, at 4:35 PM, Marco Atzeri wrote: >>> For one. The C++ bindings are no longer part of the standard and they are not built by default in v3.1

Re: [OMPI devel] openmpi 3.1.x examples

2018-07-13 Thread Nathan Hjelm via devel
For one. The C++ bindings are no longer part of the standard and they are not built by default in v3.1.x. They will be removed entirely in Open MPI v5.0.0. Not sure why the Fortran one is not building. -Nathan > On Jul 13, 2018, at 2:02 PM, Marco Atzeri wrote: > > Hi, > may be I am missing s

Re: [OMPI devel] Odd warning in OMPI v3.0.x

2018-07-06 Thread Nathan Hjelm via devel
Looks like a bug to me. The second argument should be a value in v3.x.x. -Nathan > On Jul 6, 2018, at 4:00 PM, r...@open-mpi.org wrote: > > I’m seeing this when building the v3.0.x branch: > > runtime/ompi_mpi_init.c:395:49: warning: passing argument 2 of > ‘opal_atomic_cmpset_32’ makes intege

Re: [OMPI devel] Master warnings?

2018-06-02 Thread Nathan Hjelm
Should have it fixed today or tomorrow. Guess I didn't have a sufficiently old gcc to catch this during testing. -Nathan > On Jun 2, 2018, at 1:09 AM, gil...@rist.or.jp wrote: > > Hi Ralph, > > > > see my last comment in https://github.com/open-mpi/ompi/pull/5210 > > > > long story shor

[OMPI devel] RFC: Add an option to disable interfaces removed in MPI-3.0 and make it default

2018-05-02 Thread Nathan Hjelm
I put together a pull request that does the following: 1) Make all MPI-3.0 obsoleted interfaces conditionally built. They will still show up in mpi.h (though I can remove them from there with some configury magic) but will either be #if 0'd out or marked with __attribute__((__error__)). The latter

Re: [OMPI devel] Time to remove the openib btl? (Was: Users/Eager RDMA causing slow osu_bibw with 3.0.0)

2018-04-05 Thread Nathan Hjelm
ox's publicly-stated support positions). Is it time to deprecate / print warning messages / remove the openib BTL? Begin forwarded message: From: Nathan Hjelm Subject: Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0 Date: April 5, 2018 at 12:48:08 PM EDT To: Open MPI Users

Re: [OMPI devel] Openmpi 3.0.0/3.0.1rc3 MPI_Win_post failing

2018-03-09 Thread Nathan Hjelm
Fixed in master and in the 3.0.x branch. Try the nightly tarball. > On Mar 9, 2018, at 10:01 AM, Alan Wild wrote: > > I’ve been running the OSU micro benchmarks ( > http://mvapich.cse.ohio-state.edu/benchmarks/ ) on my various MPI > installations. One test that has been consistently faili

Re: [OMPI devel] [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe

2018-02-26 Thread Nathan Hjelm
All MPI implementations have support for using CMA to transfer data between local processes. The performance is fairly good (not as good as XPMEM) but the interface limits what we can do with remote process memory (no atomics). I have not heard about this new proposal. What is the benefit of
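For context, CMA here means the Linux process_vm_readv/process_vm_writev syscalls. A self-contained sketch (reading the calling process's own memory so it runs standalone; a real transport would pass a local peer's PID instead):

```c
#define _GNU_SOURCE
#include <sys/uio.h>
#include <unistd.h>

/* Copy len bytes from address src in process pid into dst via CMA.
 * Returns the number of bytes copied, or -1 on error.  For the demo
 * we target our own pid; MPI libraries target a local peer's pid. */
static ssize_t cma_read(pid_t pid, void *dst, const void *src, size_t len)
{
    struct iovec local  = { .iov_base = dst,         .iov_len = len };
    struct iovec remote = { .iov_base = (void *)src, .iov_len = len };
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

This is also where the no-atomics limitation above comes from: CMA only copies bytes between address spaces, while XPMEM maps the remote pages locally so ordinary CPU atomics work on them.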

Re: [OMPI devel] btl/vader: osu_bibw hangs when the number of execution loops is increased

2017-12-05 Thread Nathan Hjelm
Should be fixed by PR #4569 (https://github.com/open-mpi/ompi/pull/4569). Please test and let me know. -Nathan > On Dec 1, 2017, at 7:37 AM, DERBEY, NADIA wrote: > Hi, > > Our validation team detected a hang when running osu_bibw > micro-benchmarks from the OMB 5.3 suite on openmpi 2.0.2

Re: [OMPI devel] Abstraction violation!

2017-06-22 Thread Nathan Hjelm
I have a fix I am working on. Will open a PR tomorrow morning. -Nathan > On Jun 22, 2017, at 6:11 PM, r...@open-mpi.org wrote: > > Here’s something even weirder. You cannot build that file unless mpi.h > already exists, which it won’t until you build the MPI layer. So apparently > what is happ

Re: [OMPI devel] Master MTT results

2017-06-01 Thread Nathan Hjelm
A quick glance makes me think this might be related to the info changes. I am taking a look now. -Nathan On Jun 01, 2017, at 09:35 AM, "r...@open-mpi.org" wrote: Hey folks I scanned the nightly MTT results from last night on master, and the RTE looks pretty solid. However, there are a LOT o

Re: [OMPI devel] Fwd: about ialltoallw

2017-05-08 Thread Nathan Hjelm
Probably a bug. Can you open an issue on github? -Nathan > On May 8, 2017, at 2:16 PM, Dahai Guo wrote: > > > Hi, > > The attached test code pass with MPICH well, but has problems with OpenMPI. > > There are three tests in the code, the first passes, the second one hangs, > and the third o

Re: [OMPI devel] count = -1 for reduce

2017-05-04 Thread Nathan Hjelm
By default MPI errors are fatal and abort. The error message says it all: *** An error occurred in MPI_Reduce *** reported by process [3645440001,0] *** on communicator MPI_COMM_WORLD *** MPI_ERR_COUNT: invalid count argument *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort

Re: [OMPI devel] Stack Overflow with osc_pt2pt _frag_alloc

2017-03-23 Thread Nathan Hjelm
3226 > > Sounds like Nathan is going to fix it shortly. > > >> On Mar 23, 2017, at 10:39 AM, Nathan Hjelm wrote: >> >> Looks like an uncaught merge error. The call should have an _ before it. >> Will fix. >> >> On Mar 23, 2017, at 6:33 AM, Clement F

Re: [OMPI devel] Stack Overflow with osc_pt2pt _frag_alloc

2017-03-23 Thread Nathan Hjelm
Looks like an uncaught merge error. The call should have an _ before it. Will fix. > On Mar 23, 2017, at 6:33 AM, Clement FOYER wrote: > > Hi everyone, > > While testing localy on my computer code using one-sided operations, the > selected module is the pt2pt OSC component, which always crash

Re: [OMPI devel] MPI_Win_lock semantic

2016-11-21 Thread Nathan Hjelm
l) > > > Cheers, > > > Gilles > > > On 11/22/2016 12:29 PM, Nathan Hjelm wrote: >> MPI_Win_lock does not have to be blocking. In osc/rdma it is blocking in >> most cases but not others (lock all with on-demand is non-blocking) but in >> osc/pt2pt

Re: [OMPI devel] MPI_Win_lock semantic

2016-11-21 Thread Nathan Hjelm
lling process to the target rank on the specified window", > which can be read as being a noop if no pending operations exists. > > George. > > > > On Mon, Nov 21, 2016 at 8:29 PM, Nathan Hjelm wrote: > MPI_Win_lock does not have to be blocking. In osc/rdma it is b

Re: [OMPI devel] MPI_Win_lock semantic

2016-11-21 Thread Nathan Hjelm
MPI_Win_lock does not have to be blocking. In osc/rdma it is blocking in most cases but not others (lock all with on-demand is non-blocking) but in osc/pt2pt it is almost always non-blocking (it has to be blocking for proc self). If you really want to ensure the lock is acquired you can call MPI

Re: [OMPI devel] Deadlock in sync_wait_mt(): Proposed patch

2016-09-21 Thread Nathan Hjelm
Yeah, that looks like a bug to me. We need to keep the check before the lock but otherwise this is fine and should be fixed in 2.0.2. -Nathan > On Sep 21, 2016, at 3:16 AM, DEVEZE, PASCAL wrote: > > I encountered a deadlock in sync_wait_mt(). > > After investigations, it appears that a first

Re: [OMPI devel] openmpi-2.0.0 - problems with ppc64, PGI and atomics

2016-09-07 Thread Nathan Hjelm
ly MTT, so we can monitor progress on this support. It might > not make it into tonight's testing, but should be tomorrow. I might also try > to add it to our Jenkins testing too. > > On Wed, Sep 7, 2016 at 7:36 PM, Nathan Hjelm wrote: > Thanks for reporting this! Glad the pr

Re: [OMPI devel] openmpi-2.0.0 - problems with ppc64, PGI and atomics

2016-09-07 Thread Nathan Hjelm
Thanks for reporting this! Glad the problem is fixed. We will get this into 2.0.2. -Nathan > On Sep 7, 2016, at 9:39 AM, Vallee, Geoffroy R. wrote: > > I just tried the fix and i can confirm that it fixes the problem. :) > > Thanks!!! > >> On Sep 2, 2016, at 6:18 AM, Jeff Squyres (jsquyres)

Re: [OMPI devel] Hanging tests

2016-09-07 Thread Nathan Hjelm
Posted a possible fix to the intercomm hang. See https://github.com/open-mpi/ompi/pull/2061 -Nathan > On Sep 7, 2016, at 6:53 AM, Nathan Hjelm wrote: > > Looking at the code now. This code was more or less directly translated from > the blocking version. I wouldn’t be surprised

Re: [OMPI devel] Hanging tests

2016-09-07 Thread Nathan Hjelm
Looking at the code now. This code was more or less directly translated from the blocking version. I wouldn’t be surprised if there is an error that I didn’t catch with MTT on my laptop. That said, there is an old comment about not using bcast to avoid a possible deadlock. Since the collective

Re: [OMPI devel] FYI: soon to lose IA64 access

2016-08-30 Thread Nathan Hjelm
This might be the last straw in IA64 support. If we can’t even test it anymore it might *finally* be time to kill the asm. If someone wants to use IA64 they can use the builtin atomic support. -Nathan > On Aug 30, 2016, at 4:42 PM, Paul Hargrove wrote: > > I don't recall the details of the la

Re: [OMPI devel] C89 support

2016-08-30 Thread Nathan Hjelm
The best way to put this is that his compiler defaults to --std=gnu89. That gives him about 90% of what we require from C99 but has weirdness like __restrict. The real solution is the list of functions that are called out on link and spot fixing with the gnu_inline attribute if -fgnu89-inline does n
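A concrete instance of the gnu89-vs-C99 gap described here: gnu89 only accepts the `__restrict` spelling, so standard C99 code like the following fails to parse under --std=gnu89:

```c
#include <stddef.h>

/* C99: 'restrict' promises out and in do not alias, enabling better
 * optimization.  Under --std=gnu89 this spelling is a syntax error;
 * the GNU extension spelling is '__restrict'. */
static void scale_add(size_t n, double *restrict out,
                      const double *restrict in, double k)
{
    for (size_t i = 0; i < n; ++i)   /* loop-scoped i is also C99-only */
        out[i] += k * in[i];
}
```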

Re: [OMPI devel] C89 support

2016-08-29 Thread Nathan Hjelm
Looking at the bug in google cache (https://webcache.googleusercontent.com/search?q=cache:p2WZm7Vlt2gJ:https://llvm.org/bugs/show_bug.cgi%3Fid%3D5960+&cd=1&hl=en&ct=clnk&gl=us) then isn’t the answer to just use -fgnu89-inline on this platform? Does that not solve the linking issue? From what I c

Re: [OMPI devel] C89 support

2016-08-29 Thread Nathan Hjelm
FYI, C99 has been required since late 2012. Going through the commits there is no way Open MPI could possibly compile with --std=c89 or --std=gnu89. Older compilers require we add --std=c99 so we can not remove the configure check. commit aebd1ea43237741bd29878604b742b14cc87d68b Author: Nathan

Re: [OMPI devel] C89 support

2016-08-27 Thread Nathan Hjelm
patch is simple and non-performance impacting. > > Original Message > From: Nathan Hjelm > Sent: Saturday, August 27, 2016 20:23 > To: Open MPI Developers > Reply To: Open MPI Developers > Subject: Re: [OMPI devel] C89 support > > Considering gcc more or less had full C99 supp

Re: [OMPI devel] C89 support

2016-08-27 Thread Nathan Hjelm
Considering gcc more or less had full C99 support in 3.1 (2002) and SLES10 dates back to 2004 I find this surprising. Clang's goal from the beginning was full C99 support. Checking back it looks like llvm 1.0 (2003) had C99 support. What version of clang/llvm are you using? -Nathan > On Aug 27,

Re: [OMPI devel] Upgrading our (openSUSE) Open MPI version

2016-08-25 Thread Nathan Hjelm
__malloc_initialize_hook got “poisoned” in a newer release of glibc. We disabled use of that symbol in 2.0.x. It might be worth adding to 1.10.4 as well. -Nathan > On Aug 25, 2016, at 8:09 AM, Karol Mroz wrote: > > __malloc_initialize_hook ___ deve

Re: [OMPI devel] Coll/sync component missing???

2016-08-19 Thread Nathan Hjelm
> On Aug 19, 2016, at 4:24 PM, r...@open-mpi.org wrote: > > Hi folks > > I had a question arise regarding a problem being seen by an OMPI user - has > to do with the old bugaboo I originally dealt with back in my LANL days. The > problem is with an app that repeatedly hammers on a collective, a

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-09 Thread Nathan Hjelm
>>> [manage:25442] Signal: Segmentation fault (11) >>> [manage:25442] Signal code: Address not mapped (1) >>> [manage:25442] Failing at address: 0x8

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-08 Thread Nathan Hjelm
base_btl_array_get_size(&bml_endpoint->btl_rdma); > + int num_eager_btls = mca_bml_base_btl_array_get_size (&bml_endpoint->btl_eager); > double weight_total = 0; > int num_btls_used = 0; > @@ -57,6

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-08 Thread Nathan Hjelm
0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.] > [manage.cluster:27444] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-08 Thread Nathan Hjelm
> double weight_total = 0; > + int rdma_count = 0; > - for (i = 0; i < num_btls && i < mca_pml_ob1.max_rdma_per_request; i++) { > - rdma_btls[i].bml_btl = mca_bml_base_btl_array_get_next (&bml_endpoint->btl_rdma); > - rdma_btls[i].btl

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-08 Thread Nathan Hjelm
tls[rdma_count++].btl_reg = NULL; + + weight_total += bml_btl->btl_weight; } - mca_pml_ob1_calc_weighted_length(rdma_btls, i, size, weight_total); + mca_pml_ob1_calc_weighted_length (rdma_btls, rdma_count, size, weight_total); - return i; + return rdma_count; } > On Aug 7, 2016, at 6:51 PM,

Re: [OMPI devel] openmpi2.0.0 - bad performance - ALIGNMENT

2016-08-08 Thread Nathan Hjelm
On Aug 08, 2016, at 05:17 AM, Paul Kapinos wrote: Dear Open MPI developers, there is already a thread about 'sm BTL performace of the openmpi-2.0.0' https://www.open-mpi.org/community/lists/devel/2016/07/19288.php and we also see 30% bandwidth loss, on communication *via InfiniBand*. And we also have a

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-08 Thread Nathan Hjelm
We have established that the bug does not occur unless an RDMA network is being used (openib, ugni, etc). The fix has been identified and will be included in the 2.0.1 release. -Nathan > On Aug 8, 2016, at 3:20 AM, Christoph Niethammer wrote: > > Hello Howard, > > > If I use tcp I get sligh

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-07 Thread Nathan Hjelm
> mca_pml_ob1.max_rdma_per_request; i++) { > + mca_bml_base_btl_t *bml_btl = mca_bml_base_btl_array_get_next (&bml_endpoint->btl_rdma); > + bool ignore = true; > + for (int i = 0 ; i < num_eager_bt

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-07 Thread Nathan Hjelm
h (rdma_btls, rdma_count, size, weight_total); -return i; +return rdma_count; } > On Aug 7, 2016, at 6:51 PM, Nathan Hjelm wrote: > > Looks like the put path probably needs a similar patch. Will send another > patch soon. > >> On Aug 7, 2016, at 6:01 PM, tmish.

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-07 Thread Nathan Hjelm
= 0 ; i < num_eager_btls ; ++i) { > + mca_bml_base_btl_t *eager_btl = mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i); > + if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) { > + ignore = false; > +

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-05 Thread Nathan Hjelm
tinue; +} if (btl->btl_register_mem) { /* do not use the RDMA protocol with this btl if 1) leave pinned is disabled, -Nathan > On Aug 5, 2016, at 8:44 AM, Nathan Hjelm wrote: > > Nope. We are not going to change the flags as this will disable the blt for &g

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-05 Thread Nathan Hjelm
Nope. We are not going to change the flags as this will disable the btl for one-sided. Not sure what is going on here as the openib btl should be 1) not used for pt2pt, and 2) polled infrequently. The btl debug log suggests both of these are the case. Not sure what is going on yet. -Nathan > O

Re: [OMPI devel] MCA Component Development: Function Pointers

2016-07-29 Thread Nathan Hjelm
Look at ompi/mpi/c/start.c. Just add a check there that redirects to the coll component for coll requests. You will probably need to expand the coll interface in ompi/mca/coll/coll.h to include a start function. > On Jul 29, 2016, at 10:20 AM, Bradley Morgan wrote: > > > Hello Gilles, > > Th

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-07-26 Thread Nathan Hjelm
sm is deprecated in 2.0.0 and will likely be removed in favor of vader in 2.1.0. This issue is probably this known issue: https://github.com/open-mpi/ompi-release/pull/1250 Please apply those commits and see if it fixes the issue for you. -Nathan > On Jul 26, 2016, at 6:17 PM, tmish...@jcity.m

Re: [OMPI devel] OpenMPI 2.0 and Petsc 3.7.2

2016-07-25 Thread Nathan Hjelm
It looks to me like double free on both send and receive requests. The receive free is an extra OBJ_RELEASE of MPI_DOUBLE which was not malloced (invalid free). The send free is an assert failure in OBJ_RELEASE of an OBJ_NEW() object (invalid magic). I plan to look at it in the next couple of da

Re: [OMPI devel] RFC: remove --disable-smp-locks

2016-07-12 Thread Nathan Hjelm
> On Jul 12, 2016, at 12:01 AM, Sreenidhi Bharathkar Ramesh > wrote: > > [ query regarding an old thread ] > > Hi, > > It looks like "--disable-smp-locks" is still available as an option. > > 1. Will this be continued or deprecated ? It was completely discontinued. The problem with the opti

Re: [OMPI devel] BML/R2 error

2016-07-03 Thread Nathan Hjelm
It's correct as written. The && takes precedence over the || and the statement gets evaluated in the order I intended. I will add the parentheses to quiet the warning when I get a chance. > On Jul 3, 2016, at 9:01 AM, Ralph Castain wrote: > > I agree with the compiler - I can’t figure out exactl
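The precedence rule in question is easy to check: `&&` binds tighter than `||`, so `a || b && c` parses as `a || (b && c)`. A tiny C illustration of why the compiler still asks for parentheses:

```c
/* && binds tighter than ||, so these two functions are equivalent;
 * the parentheses only silence the readability warning that gcc and
 * clang emit (-Wparentheses) for the first form. */
static int implicit_prec(int a, int b, int c)
{
    return a || b && c;
}

static int explicit_prec(int a, int b, int c)
{
    return a || (b && c);
}
```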

Re: [OMPI devel] 2.0.0rc3 MPI_Comm_split_type()

2016-06-16 Thread Nathan Hjelm
https://github.com/open-mpi/ompi/pull/1788 On Jun 16, 2016, at 05:16 PM, Nathan Hjelm wrote: Not sure why this happened but it is indeed a regression. Will submit a fix now. -Nathan On Jun 16, 2016, at 02:19 PM, Lisandro Dalcin wrote: Could you please check/confirm you are supporting passing

Re: [OMPI devel] 2.0.0rc3 MPI_Comm_split_type()

2016-06-16 Thread Nathan Hjelm
Not sure why this happened but it is indeed a regression. Will submit a fix now. -Nathan On Jun 16, 2016, at 02:19 PM, Lisandro Dalcin wrote: Could you please check/confirm you are supporting passing split_type=MPI_UNDEFINED to MPI_Comm_split_type() ? IIRC, this is a regression from 2.0.0rc2. $ ca

Re: [OMPI devel] Open MPI v2.0.0rc3 now available

2016-06-15 Thread Nathan Hjelm
Will track this one down as well tomorrow. -Nathan > On Jun 15, 2016, at 7:13 PM, Paul Hargrove wrote: > > With a PPC64/Fedora20/gcc-4.8.3 system configuring for "-m32": > > configure --prefix=[...] --enable-debug \ > CFLAGS=-m32 --with-wrapper-cflags=-m32 \ > CXXFLAGS=-m32 --w

Re: [OMPI devel] [2.0.0rc3] ppc64el/XLC crash in intercept_mmap()

2016-06-15 Thread Nathan Hjelm
Ok, that was working. Our PPC64 system is back up and I will finally be able to fix it tomorrow. -Nathan > On Jun 15, 2016, at 7:35 PM, Paul Hargrove wrote: > > Also seen now on a big-endian Power7 with XLC-13.1 > > -Paul > > On Wed, Jun 15, 2016 at 6:20 PM, Paul Hargrove wrote: > On a litt

Re: [OMPI devel] BTL flags

2016-06-02 Thread Nathan Hjelm
Oops, my compiler didn’t catch that. Will fix that now. > On Jun 2, 2016, at 7:07 PM, George Bosilca wrote: > > Nathan, > > I see a lot of [for once valid] complaints from clang regarding the last UGNI > related commit. More precisely the MCA_BTL_ATOMIC_SUPPORTS_FLOAT value is too > large with

Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-02 Thread Nathan Hjelm
The osc hang is fixed by a PR to fix bugs in start in cm and ob1. See #1729. -Nathan > On Jun 2, 2016, at 5:17 AM, Gilles Gouaillardet > wrote: > > fwiw, > > the onsided/c_fence_lock test from the ibm test suite hangs > > (mpirun -np 2 ./c_fence_lock) > > i ran a git bisect and it incrimina

Re: [OMPI devel] contributing to Open MPI

2016-06-02 Thread Nathan Hjelm
release branches on a rolling basis. You're too > late for the 2.0.x series, but the door is open for the v2.1.x series (and > beyond, of course). > > > > > On May 30, 2016, at 11:15 AM, Nathan Hjelm wrote: > > > > I should clarify. The PR adds support fo

Re: [OMPI devel] contributing to Open MPI

2016-05-30 Thread Nathan Hjelm
I should clarify. The PR adds support for ARM64 atomics and CMA when the linux headers are not installed. It does not update the timer code and still needs some testing. -Nathan > On May 30, 2016, at 8:37 AM, Nathan Hjelm wrote: > > We already have a PR open to add ARM64 support. Pl

Re: [OMPI devel] contributing to Open MPI

2016-05-30 Thread Nathan Hjelm
We already have a PR open to add ARM64 support. Please test https://github.com/open-mpi/ompi/pull/1634 and let me know if it works for you. Additional contributions are greatly appreciated! -Nathan > On May 30, 2016, at 4:32 AM, Sreenidhi Bharathkar Ramesh > wrote: > > Hello, > > We may be

Re: [OMPI devel] One-sided failures on master

2016-05-26 Thread Nathan Hjelm
Only thing I can think of is the request stuff. It was working last time I tested George’s branch. I will take a look at MTT tomorrow. -Nathan > On May 26, 2016, at 8:24 PM, Ralph Castain wrote: > > I’m seeing a lot of onesided hangs on master when trying to run an MTT scan > on it tonight -

Re: [OMPI devel] Process connectivity map

2016-05-16 Thread Nathan Hjelm
add_procs is always called at least once. This is how we set up shared memory communication. It will then be invoked on-demand for non-local peers with the reachability argument set to NULL (because the bitmask doesn't provide any benefit when adding only 1 peer). -Nathan On Tue, May 17, 2016 at

Re: [OMPI devel] Question about 'progress function'

2016-05-06 Thread Nathan Hjelm
The return code of your progress function should be related to the activity (send, recv, put, get, etc completion) on your network. The return is not really used right now but may be meaningful in the future. Your BTL signals progress through two mechanisms: 1) Send completion is indicated by e

Re: [OMPI devel] [2.0.0rc2] xlc build failure (inline asm)

2016-05-04 Thread Nathan Hjelm
Go ahead, I don't have access to xlc so I couldn't verify myself. I don't fully understand why the last : can be omitted when there are no clobbers. -Nathan On Wed, May 04, 2016 at 01:34:48PM -0500, Josh Hursey wrote: >Did someone pick this up to merge into master & v2.x? >I can confirm

Re: [OMPI devel] [2.0.0rc2] Illegal instruction on Pentium III

2016-05-02 Thread Nathan Hjelm
Should be fixed by https://github.com/open-mpi/ompi/pull/1618 Thanks for catching this. -Nathan On Mon, May 02, 2016 at 02:30:19PM -0700, Paul Hargrove wrote: >I have an Pentium III Linux system which fails "make check" with: >make[3]: Entering directory > > `/home/phargrov/OMPI/ope

Re: [OMPI devel] [2.0.0rc2] openib btl build failure

2016-05-02 Thread Nathan Hjelm
Fixed by https://github.com/open-mpi/ompi/pull/1619 Thanks for catching this. -Nathan On Mon, May 02, 2016 at 01:57:07PM -0700, Paul Hargrove wrote: >I have an x86-64/Linux system with a fairly standard install of Scientific >Linux 6.3 (a RHEL clone like CentOS). >However, it appear

Re: [OMPI devel] [2.0.0rc2] Linux/MIPS64 failures

2016-05-02 Thread Nathan Hjelm
Looks like patcher/linux needs some work on 32-bit systems. I will try to get this fixed in the next day or two. -Nathan On Mon, May 02, 2016 at 03:36:35PM -0700, Paul Hargrove wrote: >New since the last time I did testing, I now have access to Linux MIPS64 >(Cavium Octeon II) systems. >

Re: [OMPI devel] [2.0.0rc2] Illegal instruction on Pentium III

2016-05-02 Thread Nathan Hjelm
Thanks Paul, I will see what I can do to fix this one. -Nathan On Mon, May 02, 2016 at 02:30:19PM -0700, Paul Hargrove wrote: >I have an Pentium III Linux system which fails "make check" with: >make[3]: Entering directory > > `/home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-1

Re: [OMPI devel] seg fault when using yalla, XRC, and yalla

2016-04-21 Thread Nathan Hjelm
at a time, and the only pml that >could use btl's was ob1. If that is the case, how can the openib btl run >at the same time as cm and yalla? > >Also, what is UD? > >Thanks, > David > >On 04/21/2016 09:25 AM, Nathan Hjelm wrote: > > T

Re: [OMPI devel] seg fault when using yalla, XRC, and yalla

2016-04-21 Thread Nathan Hjelm
The openib btl should be able to run alongside cm/mxm or yalla. If I have time this weekend I will get on the mustang and see what the problem is. The best answer is to change the openmpi-mca-params.conf in the install to have pml = ob1. I have seen little to no benefit with using MXM on mustang.
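For reference, the recommended workaround is a one-line MCA parameter; a sketch of the file form (the path depends on the install prefix):

```ini
# <install-prefix>/etc/openmpi-mca-params.conf
pml = ob1
```

The same selection can be made for a single run with `mpirun --mca pml ob1 ...` without editing the file.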

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-3792-g92290b9

2016-04-07 Thread Nathan Hjelm
Hah, just caught that as well. Commented on the commit on github. Definitely looks wrong. -Nathan On Thu, Apr 07, 2016 at 05:43:17PM +, Dave Goodell (dgoodell) wrote: > [inline] > > On Apr 7, 2016, at 12:53 PM, git...@crest.iu.edu wrote: > > > > This is an automated email from the git hook

Re: [OMPI devel] query on atomic operations

2016-04-01 Thread Nathan Hjelm
This is done to provide the functionality when the compiler doesn't support inline asm. I do not know how testing is done with the atomics in opal/asm/base, so it's possible some of them are incorrect. -Nathan On Fri, Apr 01, 2016 at 02:39:39PM +0530, Sreenidhi Bharathkar Ramesh wrote: >

Re: [OMPI devel] mca_btl__prepare_dst

2016-03-18 Thread Nathan Hjelm
The prepare_dst function was a bottleneck to providing fast one-sided support using network RDMA. As the function was only used in the RDMA path it was removed in favor of btl_register_mem + a more complete put/get interface. You can look at the way the various btls moved the functionality. The si

Re: [OMPI devel] Network atomic operations

2016-03-04 Thread Nathan Hjelm
etti situation. > >Thank you >Durga >Life is complex. It has real and imaginary parts. >On Fri, Mar 4, 2016 at 11:06 AM, Nathan Hjelm wrote: > > On Thu, Mar 03, 2016 at 05:26:45PM -0500, dpchoudh . wrote: > >Hello all > > > >

Re: [OMPI devel] Network atomic operations

2016-03-04 Thread Nathan Hjelm
On Thu, Mar 03, 2016 at 05:26:45PM -0500, dpchoudh . wrote: >Hello all > >Here is a 101 level question: > >OpenMPI supports many transports, out of the box, and can be extended to >support those which it does not. Some of these transports, such as >infiniband, provide hardwar

Re: [OMPI devel] MTT setup updated to gcc-6.0 (pre)

2016-03-01 Thread Nathan Hjelm
I will add to how crazy this is. The C standard has been very careful to not break existing code. For example the C99 boolean is _Bool not bool because C reserves _[A-Z]* for its own use. This means a valid C89 program is a valid C99 and C11 program. It looks like this is not true in C++. -Nathan
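The reserved-identifier point can be demonstrated directly: `_Bool` is the C99 keyword, and `<stdbool.h>` merely layers the familiar spellings on top of it, so pre-C99 code that used `bool` as an ordinary identifier keeps compiling:

```c
#include <stdbool.h>

/* C99's native boolean is _Bool; stdbool.h defines bool/true/false as
 * macros over it.  Anything stored in a _Bool collapses to 0 or 1. */
static int demo_bool(void)
{
    _Bool raw  = 42;     /* any nonzero value normalizes to 1 */
    bool  nice = true;   /* 'bool' is a macro expanding to _Bool */
    return (int)raw + (int)nice;
}
```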

Re: [OMPI devel] Fwd: [OMPI users] shared memory under fortran, bug?

2016-02-02 Thread Nathan Hjelm
Hmm, I think you are correct. There may be instances where two different local processes may use the same CID for different communicators. It should be sufficient to add the PID of the current process to the filename to ensure it is unique. -Nathan On Tue, Feb 02, 2016 at 09:33:29PM +0900, Gille
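A sketch of the suggested fix (all names hypothetical): append the PID so two local processes that happen to reuse a CID still get distinct backing files:

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build a shared-memory backing-file name that is unique even when
 * two local processes use the same communicator CID, by mixing the
 * calling process's PID into the name. */
static int shmem_file_name(char *buf, size_t len, unsigned cid, pid_t pid)
{
    return snprintf(buf, len, "/tmp/ompi-shmem-cid%u-pid%ld",
                    cid, (long)pid);
}
```

In real code the pid argument would simply be `getpid()`.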

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Nathan Hjelm
orms with this idea. > There are environments, for example, that use PBSPro for one part of >the system (e.g., IO nodes), but something else for the compute >section. >> >> Personally, I'd rather follow Howard's suggestion. >

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Nathan Hjelm
On Mon, Jan 25, 2016 at 05:55:20PM +, Jeff Squyres (jsquyres) wrote: > Hmm. I'm of split mind here. > > I can see what Howard is saying here -- adding complexity is usually a bad > thing. > > But we have gotten these problem reports multiple times over the years: > someone *thinking* that

Re: [OMPI devel] vader and mmap_shmem module cleanup problem

2015-12-15 Thread Nathan Hjelm
Looks like there is a missing conditional in mca_btl_vader_component_close(). Will add it and PR to 1.10 and 2.x. -Nathan On Tue, Dec 15, 2015 at 11:18:11AM +0100, Justin Cinkelj wrote: > I'm trying to port Open MPI to OS with threads instead of processes. > Currently, during MPI_Finalize, I get

Re: [OMPI devel] ompi_win_create hangs on a non uniform cluster

2015-11-24 Thread Nathan Hjelm
This happens because we do not currently have a way to detect connectivity without allocating ompi_proc_t's for every rank in the window. I added the osc_rdma_btls MCA variable to act as a short-circuit that avoids the costly connectivity lookup. By default the value is ugni,openib. You can set it
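The variable described above is set like any other MCA parameter; for example (the value shown is just the stated default made explicit, and the application name is illustrative):

```shell
# short-circuit the costly connectivity lookup for these BTLs
mpirun --mca osc_rdma_btls ugni,openib ./my_rma_app
```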

Re: [OMPI devel] Bad performance (20% bandwidth loss) when compiling with GCC 5.2 instead of 4.x

2015-10-14 Thread Nathan Hjelm
I think this is from a known issue. Try applying this and run again: https://github.com/open-mpi/ompi/commit/952d01db70eab4cbe11ff4557434acaa928685a4.patch -Nathan On Wed, Oct 14, 2015 at 06:33:07PM +0200, Paul Kapinos wrote: > Dear Open MPI developer, > > We're puzzled by reproducible perform

Re: [OMPI devel] 16 byte real in Fortran

2015-10-14 Thread Nathan Hjelm
On Wed, Oct 14, 2015 at 02:40:00PM +0100, Vladimír Fuka wrote: > Hello, > > I have a problem with using the quadruple (128bit) or extended > (80bit) precision reals in Fortran. I did my tests with gfortran-4.8.5 > and OpenMPI-1.7.2 (preinstalled OpenSuSE 13.2), but others confirmed > this beha

Re: [OMPI devel] problems compiling ompi master

2015-09-22 Thread Nathan Hjelm
Hah, oops. Typo in the coverity fixes. Fixing now. -Nathan On Tue, Sep 22, 2015 at 10:24:29AM -0600, Howard Pritchard wrote: >Hi Folks, >Is anyone seeing a problem compiling ompi today? >This is what I'm getting > CC osc_pt2pt_passive_target.lo >In file included from .

Re: [OMPI devel] --enable-spare-groups build broken

2015-09-17 Thread Nathan Hjelm
No, it was not. Will fix. -Nathan On Wed, Sep 16, 2015 at 07:26:58PM -0700, Ralph Castain wrote: >Yes - Nathan made some changes related to the add_procs code. I doubt that >configure option was checked... >On Wed, Sep 16, 2015 at 7:13 PM, Jeff Squyres (jsquyres) > wrote: > >

Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Nathan Hjelm
Not sure. I give a +1 for blowing them away. We can bring them back later if needed. -Nathan On Wed, Sep 16, 2015 at 01:19:24PM -0400, George Bosilca wrote: >As they don't even compile why are we keeping them around? > George. >On Wed, Sep 16, 2015 at 12:05 PM, Natha

Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm
cesses. > > Thanks > Edgar > > On 9/16/2015 10:42 AM, Nathan Hjelm wrote: > > > >The reproducer is working for me with master on OS X 10.10. Some changes > >to ompi_comm_set went in yesterday. Are you on the latest hash? > > > >-Nathan > > > &g

Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Nathan Hjelm
iboffload and bfo are opal ignored by default. Neither exists in the release branch. -Nathan On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote: >While looking into a possible fix for this problem we should also cleanup >in the trunk the leftover from the OMPI_FREE_LIST. >

Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm
I just realized my branch is behind master. Updating now and will retest. -Nathan On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote: > yes, I did fresh pull this morning, for me it deadlocks reliably for 2 and > more processes. > > Thanks > Edgar > > On 9/16/

Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm
The reproducer is working for me with master on OS X 10.10. Some changes to ompi_comm_set went in yesterday. Are you on the latest hash? -Nathan On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote: > something is borked right now on master in the management of inter vs. intra > communica

Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Nathan Hjelm
The formatting of the code got all messed up. Please send a diff and I will take a look. ompi free list no longer exists in master or the next release branch but the change may be worthwhile for the opal free list code. -Nathan On Wed, Sep 16, 2015 at 04:03:44PM +0300, Алексей Рыжих wrote: >

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Nathan Hjelm
+1 On Mon, Aug 24, 2015 at 07:08:02PM +, Jeff Squyres (jsquyres) wrote: > FWIW, we have had verbal agreement in the past that the v1.8 series was the > last one to contain MX support. I think it would be fine for all MX-related > components to disappear from v1.10. > > Don't forget that M

Re: [OMPI devel] 1.10.0rc3 build failure Solaris/x86 + gcc

2015-08-20 Thread Nathan Hjelm
f Squyres (jsquyres) > wrote: > > (the fix has been merged in to v1.8 and v1.10 branches) > > On Aug 20, 2015, at 12:18 PM, Nathan Hjelm wrote: > > > > > > I see the problem. Both Ralph and I missed an error in the > > che