Re: [OMPI devel] How to progress MPI_Recv using custom BTL for NIC under development

2022-08-03 Thread Nathan Hjelm via devel

Kind of sounds to me like they are using the wrong proc when receiving. Here is
an example of what a modex receive should look like:

https://github.com/open-mpi/ompi/blob/main/opal/mca/btl/ugni/btl_ugni_endpoint.c#L44

-Nathan

On Aug 3, 2022, at 11:29 AM, "Jeff Squyres (jsquyres) via devel" wrote:

Glad you solved the first issue!

With respect to debugging, if you don't have a parallel debugger, you can do
something like this:
https://www.open-mpi.org/faq/?category=debugging#serial-debuggers

If you haven't done so already, I highly suggest configuring Open MPI with
"CFLAGS=-g -O0".

As for the modex, it does actually use TCP under the covers, but that shouldn't
matter to you: the main point is that the BTL is not used for exchanging modex
information. Hence, whatever your BTL module puts into the modex and gets out of
the modex should happen asynchronously without involving the BTL.

--
Jeff Squyres
jsquyres@cisco.com

From: devel on behalf of Michele Martinelli via devel
Sent: Wednesday, August 3, 2022 12:49 PM
To: de...@lists.open-mpi.org
Cc: Michele Martinelli
Subject: Re: [OMPI devel] How to progress MPI_Recv using custom BTL for NIC under development

Thank you for the answer. Actually I think I solved that problem some days ago:
basically (if I correctly understand) MPI "adds", in some sense, a header to the
data sent (please correct me if I'm wrong), which is then used by ob1 to match
the data that arrived with the MPI_Recv posted by the user. The problem was then
a poorly reconstructed header on the receiving side.

Unfortunately my happiness didn't last long, because I have already found
another problem: it seems that the peers are not actually exchanging the correct
information via the modex protocol (I'm not sure which kind of network
connection they are using in that phase), receiving "local" data instead of the
remote ones. I have just started debugging this; maybe I could open a new thread
specific to it.

Michele

Il 03/08/22 15:43, Jeff Squyres (jsquyres) ha scritto:

Sorry for the huge delay in replies -- it's summer / vacation season, and I
think we (as a community) are a little behind in answering some of these
emails. :-(

It's been quite a while since I have been in the depths of BTL internals; I'm
afraid I don't remember the details offhand.

When I was writing the usnic BTL, I know I found it useful to attach a debugger
on the sending and/or receiving side processes, and actually step through both
my BTL code and the OB1 PML code to see what was happening. I frequently found
that either my BTL wasn't correctly accounting for network conditions, or it
wasn't passing information up to OB1 that it expected (e.g., it passed the wrong
length, or the wrong ID number, or ...something else). You can actually follow
what happens in OB1 when your BTL invokes the cbfunc -- does it find a
corresponding MPI_Request, and does it mark it complete? Or does it put your
incoming fragment as an unexpected message for some reason, and put it on the
unexpected queue? Look for that kind of stuff.

--
Jeff Squyres
jsquyres@cisco.com

From: devel on behalf of Michele Martinelli via devel
Sent: Saturday, July 23, 2022 9:04 AM
To: de...@lists.open-mpi.org
Cc: Michele Martinelli
Subject: [OMPI devel] How to progress MPI_Recv using custom BTL for NIC under development

Hi,

I'm trying to develop a BTL for a custom NIC. I studied the btl.h file to
understand the flow of calls that are expected to be implemented in my
component. I'm using a simple test (which works like a charm with the TCP BTL)
to test my development; the code is a simple MPI_Send + MPI_Recv:

    MPI_Init(NULL, NULL);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    int ping_pong_count = 1;
    int partner_rank = (world_rank + 1) % 2;
    printf("MY RANK: %d PARTNER: %d\n", world_rank, partner_rank);
    if (world_rank == 0) {
        ping_pong_count++;
        MPI_Send(&ping_pong_count, 1, MPI_INT, partner_rank, 0, MPI_COMM_WORLD);
        printf("%d sent and incremented ping_pong_count %d to %d\n",
               world_rank, ping_pong_count, partner_rank);
    } else {
        MPI_Recv(&ping_pong_count, 1, MPI_INT, partner_rank, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("%d received ping_pong_count %d from %d\n",
               world_rank, ping_pong_count, partner_rank);
    }
    MPI_Finalize();

I see that in my component's BTL code the functions called during the
"MPI_Send" phase are:

 1. mca_btl_mycomp_add_procs
 2. mca_btl_mycomp_prepare_src
 3. mca_btl_mycomp_send (where I set the return to 1, so the send phase should
    be finished)

I then see the print inside the test:

    0 sent and incremented ping_pong_count 2 to 1

and this should conclude the MPI_Send phase. Then I implemented in the
btl_mycomp_component_progress function a call to:

    mca_btl_active_message_callback_t *reg = mca_btl_base_active_message_trigger + tag;
    reg->cbfunc(_btl->super, );

I saw the same code in all the other BTLs and I thought this was enough to
"unlock" the MPI_Recv "polling".
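
Picking up Nathan's pointer at the top of this thread: the modex receive has to
be keyed on the *remote* peer's process name, not the local proc. A rough sketch
of the pattern (from memory, so check the ugni endpoint code linked above for
the authoritative version; the component, proc, and output names here are
placeholders, and the header that provides OPAL_MODEX_RECV varies by branch):

    /* Illustrative sketch: fetch the modex blob that the *peer* published. */
    static int mca_btl_mybtl_get_peer_modex(opal_proc_t *peer_proc,
                                            void **modex_out, size_t *size_out)
    {
        int rc;

        /* Keying the lookup on the local proc instead of peer_proc is the
         * classic mistake that returns your own ("local") modex data. */
        OPAL_MODEX_RECV(rc, &mca_btl_mybtl_component.super.btl_version,
                        &peer_proc->proc_name, modex_out, size_out);
        return rc;
    }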

Re: [OMPI devel] C style rules / reformatting

2021-05-18 Thread Nathan Hjelm via devel

It really is a shame that this could not go forward. There are really three end 
goals in mind:

1) Consistency. We all have different coding styles and following a common 
coding style is more and more considered a best practice. The number of 
projects using clang-format grows continuously. I find it mildly annoying when 
someone changes the coding style in code I maintain because I end up having to 
fix it for consistency.

2) clang-tidy. This is the ultimate end goal. clang-tidy can find real problems 
in the code and help to reduce debugging time down the road. It has a huge 
community behind it and is constantly improving. Think of it like lint on speed.

3) Correctness. clang-format doesn't do much of this on its own but sorting the 
include lines found real errors in header dependencies. The fixing of 
indentation can also help to expose real coding issues during development that 
may be hidden by poor indentation (this has been a problem in Open MPI in the 
past). Other tools can do this part, but we lose clang-tidy.

All in all, the formatting was not really that bad beyond a few corner cases. 
Usually when I see one of these, I rearrange the code to make it look better. One 
example (which Open MPI should recommend) is the trailing comma in 
initializers. It makes clang-format output much cleaner.
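
As a concrete illustration of the trailing-comma point (an editorial sketch, not
code from the PRs; recent clang-format versions treat a trailing comma in a
braced initializer as a request to keep one entry per line):

    /* Without a trailing comma, clang-format is free to pack the initializer
     * onto as few lines as fit, and it reflows the whole list when an entry
     * is added. */
    static const char *packed_names[] = {"self", "sm", "tcp", "ugni"};

    /* With a trailing comma, clang-format keeps one entry per line, so adding
     * or removing an entry produces a one-line diff. */
    static const char *exploded_names[] = {
        "self",
        "sm",
        "tcp",
        "ugni",
    };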

Anyway, I have said my piece and will continue to clang-format my code whenever 
I modify it :).

-Nathan

On May 17, 2021 at 2:01 PM, "Jeff Squyres (jsquyres) via devel" 
 wrote:

FYI: It was decided last week that we will abandon the current effort to 
reformat master / v5.0.x according to style rules.

SHORT VERSION

We have already reformatted opal/ and tests/. But the two PRs for reformatting 
ompi/ brought up a whole bunch of issues that do not seem resolvable via 
clang-format. As such, we're just walking away: we're not going to revert the 
reformatting that was done to opal/ and tests/ on master and v5.0.x, but we're 
just going to close the ompi/ reformatting PRs without merging.

Many thanks to Nathan who invested a lot of time in this; I'm sorry it didn't 
fully work out. :-(

MORE DETAIL

It turns out that clang-format actually parses the C code into internal language primitives and 
then re-renders the code according to all the style choices that you configure. Meaning: you have 
to make decisions about every single style choice (e.g., whether to put "&&" at 
the beginning or end of the line, when expressions span multiple lines).
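
For example, here is the same condition rendered under the two settings of
clang-format's BreakBeforeBinaryOperators option (an illustrative sketch; the
variable and function names are made up):

    /* BreakBeforeBinaryOperators: None -- the operator stays at the end of the line. */
    if (NULL != frag && frag->length > 0 &&
        frag->tag == expected_tag) {
        deliver_fragment(frag);
    }

    /* BreakBeforeBinaryOperators: All -- the operator moves to the start of the next line. */
    if (NULL != frag && frag->length > 0
        && frag->tag == expected_tag) {
        deliver_fragment(frag);
    }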

This is absolutely not what we want to do. 
https://github.com/open-mpi/ompi/wiki/CodingStyle is intentionally very "light 
touch": it only specifies a bare minimum of style rules -- the small number of 
things that we could all agree on. Everything outside of those rules is not regulated.

Clang-format simply doesn't work that way: you have to make a decision for 
every single style choice.

So we'll close https://github.com/open-mpi/ompi/pull/8816 and 
https://github.com/open-mpi/ompi/pull/8923 without merging them.

If someone would like to find a better tool that can:

a) re-format the ompi/ and oshmem/ trees according to our "light touch" rules
b) fail a CI test when a PR introduces a delta that results in code breaking the 
"light touch" rules

Then great: let's have that conversation. But clang-format is not going to work 
for us, unfortunately. :-(

--
Jeff Squyres
jsquy...@cisco.com





Re: [OMPI devel] Intel OPA and Open MPI

2019-04-24 Thread Nathan Hjelm via devel
I think so. The one who is most familiar with btl/ofi is Arm (CCed in this 
email).

-Nathan

> On Apr 24, 2019, at 9:41 AM, Heinz, Michael William 
>  wrote:
> 
> So, 
> 
> Would it be worthwhile for us to start doing test builds now? Is the code 
> ready for that at this time?
> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of
>> Nathan Hjelm via devel
>> Sent: Friday, April 12, 2019 11:19 AM
>> To: Open MPI Developers 
>> Cc: Nathan Hjelm ; Castain, Ralph H
>> ; Yates, Brandon 
>> Subject: Re: [OMPI devel] Intel OPA and Open MPI
>> 
>> That is accurate. We expect to support OPA with the btl/ofi component. It
>> should give much better performance than osc/pt2pt + mtl/ofi. What would
>> be good for you to do on your end is verify everything works as expected
>> and that the performance is on par for what you expect.
>> 
>> -Nathan
>> 
>>> On Apr 12, 2019, at 9:11 AM, Heinz, Michael William
>>  wrote:
>>> 
>>> Hey guys,
>>> 
>>> So, I’ve watched the videos, dug through the release notes, and
>> participated in a few of the weekly meetings and I’m feeling a little more
>> comfortable about being a part of Open MPI - and I’m looking forward to it.
>>> 
>>> But I find myself needing to look for some direction for my participation
>> over the next few months.
>>> 
>>> First - a little background. Historically, I’ve been involved with IB/OPA
>> development for 17+ years now, but for the past decade or so I’ve been
>> entirely focused on fabric management rather than application-level stuff.
>> (Heck, if you ever wanted to complain about why OPA management
>> datagrams are different from IB MADs, feel free to point the finger at me,
>> I’m happy to explain why the new ones are better… ;-) ) However, it was only
>> recently that the FM team were given the additional responsibility for
>> maintaining / participating in our MPI efforts with very little opportunity 
>> for a
>> transfer of information with the prior team.
>>> 
>>> So, while I’m looking forward to this new role I’m feeling a bit
>> overwhelmed - not least of which because I will be unavailable for about 8
>> weeks this summer…
>>> 
>>> In particular, I found an issue in our internal tracking systems that says 
>>> (and
>> I may have mentioned this before…)
>>> 
>>> OMPI v5.0.0 will remove osc/pt2pt component that is the only component
>> that MTLs use (PSM2 and OFI). OMPI v5.0.0 is planned to be released during
>> summer 2019 (no concrete dates).  https://github.com/open-
>> mpi/ompi/wiki/5.0.x-FeatureList. The implication is that none of the MTLs
>> used for Omni-Path will support running one sided MPI APIs (RMA).
>>> 
>>> Is this still accurate? The current feature list says:
>>> 
>>> If osc/rdma supports all possible scenarios (e.g., all BTLs support the RDMA
>> methods osc/rdma needs), this should allow us to remove osc/pt2pt (i.e.,
>> 100% migrated to osc/rdma).
>>> 
>>> If this is accurate, I’m going to need help from the other maintainers to
>> understand the reason this is being done, the scope of this effort and where
>> we need to focus our attention. To deal with the lack of coverage over the
>> summer, I’ve asked a co-worker, Brandon Yates to start sitting in on the
>> weekly meetings with me.
>>> 
>>> Again, I’m looking forward to both the opportunity of working with an open
>> source team, and the chance to focus on the users of our software instead of
>> just the management of the fabric - I’m just struggling at the moment to get
>> a handle on this potential deadline.
>>> 
>>> ---
>>> Mike Heinz
>>> Networking Fabric Software Engineer
>>> Intel Corporation
>>> 
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/devel
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-16 Thread Nathan Hjelm via devel
What Ralph said. You just blow memory on a queue that is not recovered in the 
current implementation.

Also, moving to Allreduce will resolve the issue, as every call is then 
effectively also a barrier. I have found with some benchmarks and collective 
implementations that it can be faster than reduce anyway, which is why it might be 
worth trying.
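
In code, the two suggestions look roughly like this (an illustrative sketch only;
the buffers, count, and iteration total are placeholders):

    /* Option 1: keep the plain MPI_Reduce loop, but add a barrier every 100
     * iterations so non-root ranks cannot queue up an unbounded number of
     * pending reductions at the root. */
    for (int i = 0; i < niters; ++i) {
        MPI_Reduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (0 == (i + 1) % 100) {
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    /* Option 2: switch to MPI_Allreduce.  Every rank has to receive the result
     * before it can continue, so no rank can run far ahead of the others. */
    for (int i = 0; i < niters; ++i) {
        MPI_Allreduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }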

-Nathan

> On Apr 15, 2019, at 2:33 PM, Saliya Ekanayake  wrote:
> 
> Thank you, Nathan. Could you elaborate a bit on what happens internally? From 
> your answer it seems, the program will still produce the correct output at 
> the end but it'll use more resources. 
> 
> On Mon, Apr 15, 2019 at 9:00 AM Nathan Hjelm via devel 
>  wrote:
> If you do that it may run out of resources and deadlock or crash. I recommend 
> either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) 
> enable coll/sync (which essentially does 1). Honestly, 2 is probably the 
> easiest option and depending on how large you run may not be any slower than 
> 1 or 3.
> 
> -Nathan
> 
> > On Apr 15, 2019, at 9:53 AM, Saliya Ekanayake  wrote:
> > 
> > Hi Devs,
> > 
> > When doing MPI_Reduce in a loop (collecting on Rank 0), is it the correct 
> > understanding that ranks other than root (0 in this case) will pass the 
> > collective as soon as their data is written to MPI buffers without waiting 
> > for all of them to be received at the root?
> > 
> > If that's the case then what would happen (semantically) if we execute 
> > MPI_Reduce in a loop without a barrier allowing non-root ranks to hit the 
> > collective multiple times while the root will be processing an earlier 
> > reduce? For example, the root can be in the first reduce invocation, while 
> > another rank is in the second reduce invocation.
> > 
> > Thank you,
> > Saliya
> > 
> > -- 
> > Saliya Ekanayake, Ph.D
> > Postdoctoral Scholar
> > Performance and Algorithms Research (PAR) Group
> > Lawrence Berkeley National Laboratory
> > Phone: 510-486-5772
> > 
> > ___
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
> 
> 
> -- 
> Saliya Ekanayake, Ph.D
> Postdoctoral Scholar
> Performance and Algorithms Research (PAR) Group
> Lawrence Berkeley National Laboratory
> Phone: 510-486-5772
> 

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Nathan Hjelm via devel
If you do that it may run out of resources and deadlock or crash. I recommend 
either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) 
enable coll/sync (which essentially does 1). Honestly, 2 is probably the 
easiest option and depending on how large you run may not be any slower than 1 
or 3.

-Nathan

> On Apr 15, 2019, at 9:53 AM, Saliya Ekanayake  wrote:
> 
> Hi Devs,
> 
> When doing MPI_Reduce in a loop (collecting on Rank 0), is it the correct 
> understanding that ranks other than root (0 in this case) will pass the 
> collective as soon as their data is written to MPI buffers without waiting 
> for all of them to be received at the root?
> 
> If that's the case then what would happen (semantically) if we execute 
> MPI_Reduce in a loop without a barrier allowing non-root ranks to hit the 
> collective multiple times while the root will be processing an earlier 
> reduce? For example, the root can be in the first reduce invocation, while 
> another rank is in the second reduce invocation.
> 
> Thank you,
> Saliya
> 
> -- 
> Saliya Ekanayake, Ph.D
> Postdoctoral Scholar
> Performance and Algorithms Research (PAR) Group
> Lawrence Berkeley National Laboratory
> Phone: 510-486-5772
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Intel OPA and Open MPI

2019-04-12 Thread Nathan Hjelm via devel
That is accurate. We expect to support OPA with the btl/ofi component. It 
should give much better performance than osc/pt2pt + mtl/ofi. What would be 
good for you to do on your end is verify everything works as expected and that 
the performance is on par for what you expect.

-Nathan

> On Apr 12, 2019, at 9:11 AM, Heinz, Michael William 
>  wrote:
> 
> Hey guys,
>  
> So, I’ve watched the videos, dug through the release notes, and participated 
> in a few of the weekly meetings and I’m feeling a little more comfortable 
> about being a part of Open MPI - and I’m looking forward to it.
>  
> But I find myself needing to look for some direction for my participation 
> over the next few months.
>  
> First - a little background. Historically, I’ve been involved with IB/OPA 
> development for 17+ years now, but for the past decade or so I’ve been 
> entirely focused on fabric management rather than application-level stuff. 
> (Heck, if you ever wanted to complain about why OPA management datagrams are 
> different from IB MADs, feel free to point the finger at me, I’m happy to 
> explain why the new ones are better… ;-) ) However, it was only recently that 
> the FM team were given the additional responsibility for maintaining / 
> participating in our MPI efforts with very little opportunity for a transfer 
> of information with the prior team.
>  
> So, while I’m looking forward to this new role I’m feeling a bit overwhelmed 
> - not least of which because I will be unavailable for about 8 weeks this 
> summer…
>  
> In particular, I found an issue in our internal tracking systems that says 
> (and I may have mentioned this before…)
>  
> OMPI v5.0.0 will remove osc/pt2pt component that is the only component that 
> MTLs use (PSM2 and OFI). OMPI v5.0.0 is planned to be released during summer 
> 2019 (no concrete dates).  
> https://github.com/open-mpi/ompi/wiki/5.0.x-FeatureList. The implication is 
> that none of the MTLs used for Omni-Path will support running one sided MPI 
> APIs (RMA).
>  
> Is this still accurate? The current feature list says:
>  
> If osc/rdma supports all possible scenarios (e.g., all BTLs support the RDMA 
> methods osc/rdma needs), this should allow us to remove osc/pt2pt (i.e., 100% 
> migrated to osc/rdma).
>  
> If this is accurate, I’m going to need help from the other maintainers to 
> understand the reason this is being done, the scope of this effort and where 
> we need to focus our attention. To deal with the lack of coverage over the 
> summer, I’ve asked a co-worker, Brandon Yates to start sitting in on the 
> weekly meetings with me.
>  
> Again, I’m looking forward to both the opportunity of working with an open 
> source team, and the chance to focus on the users of our software instead of 
> just the management of the fabric - I’m just struggling at the moment to get 
> a handle on this potential deadline.
>  
> ---
> Mike Heinz
> Networking Fabric Software Engineer
> Intel Corporation
>  
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] IBM CI re-enabled.

2018-10-18 Thread Nathan Hjelm via devel

Appears to be broken. It's failing and simply saying:

Testing in progress..

-Nathan

On Oct 18, 2018, at 11:34 AM, Geoffrey Paulsen  wrote:

 
I've re-enabled IBM CI for PRs.
 

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Bug on branch v2.x since october 3

2018-10-17 Thread Nathan Hjelm via devel


Ah yes, 18f23724a broke things so we had to fix the fix. Didn't apply it to the 
v2.x branch. Will open a PR to bring it over.

-Nathan

On Oct 17, 2018, at 11:28 AM, Eric Chamberland 
 wrote:

Hi,

since commit 18f23724a, our nightly base test is broken on v2.x branch.

Strangely, on branch v3.x, it broke the same day with 2fd9510b4b44, but 
was repaired some days after (can't tell exactly, but at most it was 
fixed with fa3d92981a).


I get segmentation faults or deadlocks in many cases.

Could this be related with issue 5842 ?
(https://github.com/open-mpi/ompi/issues/5842)

Here is an example of backtrace for a deadlock:

#4 
#5 0x7f9dc9151d17 in sched_yield () from /lib64/libc.so.6
#6 0x7f9dccee in opal_progress () at runtime/opal_progress.c:243
#7 0x7f9dbe53cf78 in ompi_request_wait_completion (req=0x46ea000) 
at ../../../../ompi/request/request.h:392
#8 0x7f9dbe53e162 in mca_pml_ob1_recv (addr=0x7f9dd64a6b30 
long, long, PAType*, std::__debug::vectorstd::allocator >&)::slValeurs>, count=3, 
datatype=0x7f9dca61e2c0 , src=1, tag=32767, 
comm=0x7f9dca62a840 , status=0x7ffcf4f08170) at 
pml_ob1_irecv.c:129
#9 0x7f9dca35f3c4 in PMPI_Recv (buf=0x7f9dd64a6b30 
long, long, PAType*, std::__debug::vectorstd::allocator >&)::slValeurs>, count=3, 
type=0x7f9dca61e2c0 , source=1, tag=32767, 
comm=0x7f9dca62a840 , status=0x7ffcf4f08170) at 
precv.c:77
#10 0x7f9dd6261d06 in assertionValeursIdentiquesSurTousLesProcessus 
(pComm=0x7f9dca62a840 , pRang=0, pNbProcessus=2, 
pValeurs=0x7f9dd5a94da0 girefSynchroniseGroupeProcessusModeDebugImpl(PAGroupeProcessus 
const&, char const*, int)::slDonnees>, pRequetes=std::__debug::vector of 
length 1, capacity 1 = {...}) at 
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/src/commun/Parallele/mpi_giref.cc:332


And some informations about configuration:

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2018.10.17.02h16m02s_config.log

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2018.10.17.02h16m02s_ompi_info_all.txt

Thanks,

Eric
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Patcher on MacOS

2018-09-28 Thread Nathan Hjelm via devel
Nope.  We just never bothered to disable it on osx. I think Jeff was working on 
a patch.

-Nathan

> On Sep 28, 2018, at 3:21 PM, Barrett, Brian via devel 
>  wrote:
> 
> Is there any practical reason to have the memory patcher component enabled 
> for MacOS?  As far as I know, we don’t have any transports which require 
> memory hooks on MacOS, and with the recent deprecation of the syscall 
> interface, it emits a couple of warnings.  It would be nice to crush said 
> warnings and the easiest way would be to not build the component.
> 
> Thoughts?
> 
> Brian
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Test mail

2018-08-28 Thread Nathan Hjelm via devel
no

Sent from my iPhone

> On Aug 27, 2018, at 8:51 AM, Jeff Squyres (jsquyres) via devel 
>  wrote:
> 
> Will this get through?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] openmpi 3.1.x examples

2018-07-17 Thread Nathan Hjelm via devel


> On Jul 16, 2018, at 11:18 PM, Marco Atzeri  wrote:
> 
>> Am 16.07.2018 um 23:05 schrieb Jeff Squyres (jsquyres) via devel:
>>> On Jul 13, 2018, at 4:35 PM, Marco Atzeri  wrote:
>>> 
 For one. The C++ bindings are no longer part of the standard and they are 
 not built by default in v3.1x. They will be removed entirely in Open MPI 
 v5.0.0.
>> Hey Marco -- you should probably join our packagers mailing list:
>> https://lists.open-mpi.org/mailman/listinfo/ompi-packagers
>> Low volume, but intended exactly for packagers like you.  It's fairly 
>> recent; we realized we needed to keep in better communication with our 
>> downstream packagers.
> 
> noted thanks.
> 
>> (+ompi-packagers to the CC)
>> As Nathan mentioned, we stopped building the MPI C++ bindings by default in 
>> Open MPI 3.0.  You can choose to build them with the configure 
>> --enable-mpi-cxx.
> 
> I was aware, as I am not building it anymore, however
> probably we should exclude the C++ from default examples.

Good point. I will fix that today and PR the fix to v3.0.x and v3.1.x. 


> Regards
> Merco
> 
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] openmpi 3.1.x examples

2018-07-13 Thread Nathan Hjelm via devel
For one, the C++ bindings are no longer part of the standard and they are not 
built by default in v3.1.x. They will be removed entirely in Open MPI v5.0.0. 

Not sure why the Fortran one is not building. 

-Nathan

> On Jul 13, 2018, at 2:02 PM, Marco Atzeri  wrote:
> 
> Hi,
> may be I am missing something obvious, but are the
> examples still actual
> 
>  C:   hello_c.c
>  C++: hello_cxx.cc
>  Fortran mpif.h:  hello_mpifh.f
>  Fortran use mpi: hello_usempi.f90
>  Fortran use mpi_f08: hello_usempif08.f90
>  Java:Hello.java
>  C shmem.h:   hello_oshmem_c.c
>  Fortran shmem.fh:hello_oshmemfh.f90
> 
> 
> $ make hello_cxx
> mpic++ -g  hello_cxx.cc  -o hello_cxx
> hello_cxx.cc: In function ‘int main(int, char**)’:
> hello_cxx.cc:25:5: error: ‘MPI’ has not been declared
> 
> $ make -i
> ...
> mpifort -g  hello_usempi.f90  -o hello_usempi
> hello_usempi.f90:14:8:
> 
> use mpi
>1
> Fatal Error: Can't open module file ‘mpi.mod’ for reading at (1): No such 
> file or directory
> 
> The second could be a different problem
> 
> $ ls /usr/lib/mpi.mod
> /usr/lib/mpi.mod
> 
> 
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Odd warning in OMPI v3.0.x

2018-07-06 Thread Nathan Hjelm via devel
Looks like a bug to me. The second argument should be a value in v3.x.x.
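
For reference, the prototype quoted in the warning takes the comparison value by
value in v3.x, so the corrected call would look something like the sketch below
(the variable names and values here are purely illustrative):

    /* int opal_atomic_cmpset_32(volatile int32_t *addr, int32_t oldval, int32_t newval)
     * per the prototype quoted above: pass the expected old value itself, not a
     * pointer to it. */
    volatile int32_t state = 0;
    int32_t expected = 0;   /* illustrative expected value */
    int32_t desired  = 1;   /* illustrative new value */

    if (!opal_atomic_cmpset_32(&state, expected, desired)) {
        /* the compare-and-set lost the race: 'state' held something other than 'expected' */
    }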

-Nathan

> On Jul 6, 2018, at 4:00 PM, r...@open-mpi.org wrote:
> 
> I’m seeing this when building the v3.0.x branch:
> 
> runtime/ompi_mpi_init.c:395:49: warning: passing argument 2 of 
> ‘opal_atomic_cmpset_32’ makes integer from pointer without a cast 
> [-Wint-conversion]
>  if (!opal_atomic_cmpset_32(_mpi_state, , desired)) {
>  ^
> In file included from ../opal/include/opal/sys/atomic.h:159:0,
>  from ../opal/threads/thread_usage.h:30,
>  from ../opal/class/opal_object.h:126,
>  from ../opal/class/opal_list.h:73,
>  from runtime/ompi_mpi_init.c:43:
> ../opal/include/opal/sys/x86_64/atomic.h:85:19: note: expected ‘int32_t {aka 
> int}’ but argument is of type ‘int32_t * {aka int *}’
>  static inline int opal_atomic_cmpset_32( volatile int32_t *addr,
>^
> 
> 
> I have a feeling this isn’t correct - yes?
> Ralph
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel



signature.asc
Description: Message signed with OpenPGP
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Master warnings?

2018-06-02 Thread Nathan Hjelm
Should have it fixed today or tomorrow. Guess I didn't have a sufficiently old 
gcc to catch this during testing.

-Nathan

> On Jun 2, 2018, at 1:09 AM, gil...@rist.or.jp wrote:
> 
> Hi Ralph,
> 
>  
> 
> see my last comment in https://github.com/open-mpi/ompi/pull/5210
> 
>  
> 
> long story short, this is just a warning you can ignore.
> 
> If you are running on a CentOS 7 box
> 
> with the default GNU compiler, you can
> 
> opal_cv___attribute__error=0 configure ...
> 
> in order to get rid of these.
> 
>  
> 
> Cheers,
> 
>  
> 
> Gilles
> 
> - Original Message -
> 
> Geez guys - what happened?
>  
> In file included from monitoring_prof.c:47:0:
> ../../../../ompi/include/mpi.h:423:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Comm_errhandler_fn was removed in 
> MPI-3.0; use MPI_Comm_errhandler_function instead");
>   ^
> ../../../../ompi/include/mpi.h:425:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_File_errhandler_fn was removed in 
> MPI-3.0; use MPI_File_errhandler_function instead");
>   ^
> ../../../../ompi/include/mpi.h:427:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Win_errhandler_fn was removed in 
> MPI-3.0; use MPI_Win_errhandler_function instead");
>   ^
> ../../../../ompi/include/mpi.h:429:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Handler_function was removed in MPI-3.0; 
> use MPI_Win_errhandler_function instead");
>   ^
> ../../../../ompi/include/mpi.h:1042:29:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>  OMPI_DECLSPEC extern struct ompi_predefined_datatype_t ompi_mpi_lb 
> __mpi_interface_removed__("MPI_LB was removed in MPI-3.0");
> ^~
> ../../../../ompi/include/mpi.h:1043:29:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>  OMPI_DECLSPEC extern struct ompi_predefined_datatype_t ompi_mpi_ub 
> __mpi_interface_removed__("MPI_UB was removed in MPI-3.0");
> ^~
> In file included from monitoring_test.c:65:0:
> ../../ompi/include/mpi.h:423:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Comm_errhandler_fn was removed in 
> MPI-3.0; use MPI_Comm_errhandler_function instead");
>   ^
> ../../ompi/include/mpi.h:425:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_File_errhandler_fn was removed in 
> MPI-3.0; use MPI_File_errhandler_function instead");
>   ^
> ../../ompi/include/mpi.h:427:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Win_errhandler_fn was removed in 
> MPI-3.0; use MPI_Win_errhandler_function instead");
>   ^
> ../../ompi/include/mpi.h:429:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Handler_function was removed in MPI-3.0; 
> use MPI_Win_errhandler_function instead");
>   ^
> ../../ompi/include/mpi.h:1042:29:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>  OMPI_DECLSPEC extern struct ompi_predefined_datatype_t ompi_mpi_lb 
> __mpi_interface_removed__("MPI_LB was removed in MPI-3.0");
> ^~
> ../../ompi/include/mpi.h:1043:29:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>  OMPI_DECLSPEC extern struct ompi_predefined_datatype_t ompi_mpi_ub 
> __mpi_interface_removed__("MPI_UB was removed in MPI-3.0");
> ^~
> In file included from check_monitoring.c:21:0:
> ../../ompi/include/mpi.h:423:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Comm_errhandler_fn was removed in 
> MPI-3.0; use MPI_Comm_errhandler_function instead");
>   ^
> ../../ompi/include/mpi.h:425:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_File_errhandler_fn was removed in 
> MPI-3.0; use MPI_File_errhandler_function instead");
>   ^
> ../../ompi/include/mpi.h:427:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Win_errhandler_fn was removed in 
> MPI-3.0; use MPI_Win_errhandler_function instead");
>   ^
> ../../ompi/include/mpi.h:429:9:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>   __mpi_interface_removed__("MPI_Handler_function was removed in MPI-3.0; 
> use MPI_Win_errhandler_function instead");
>   ^
> ../../ompi/include/mpi.h:1042:29:warning: ‘__error__’ attribute ignored 
> [-Wattributes]
>  

[OMPI devel] RFC: Add an option to disable interfaces removed in MPI-3.0 and make it default

2018-05-02 Thread Nathan Hjelm

I put together a pull request that does the following:

1) Make all MPI-3.0 obsoleted interfaces conditionally built. They will
still show up in mpi.h (though I can remove them from there with some
configury magic) but will either be #if 0'd out or marked with
__attribute__((__error__)). The latter is only available with gcc and
should give a helpful error message if the offending
functions/interfaces are used (a sketch of the idea follows below).

2) Add an option (--enable-mpi1-compat) to control building the
obsolete APIs. The new default is --disable-mpi1-compat. The goal is to
make this option available for the entirety of Open MPI v4.x.x and
remove it and the associated obsolete code in Open MPI v5.0.0.
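
As a rough sketch of the mechanism described in (1) and (2), using MPI_Address
(removed in MPI-3.0 in favor of MPI_Get_address) as the example; the guard macro
name below is hypothetical and the actual logic in the PR may differ:

    /* Hypothetical sketch of how a removed MPI-1 symbol could be handled in mpi.h. */
    #if defined(OMPI_ENABLE_MPI1_COMPAT) && OMPI_ENABLE_MPI1_COMPAT
        /* --enable-mpi1-compat: keep the old prototype available. */
        int MPI_Address(void *location, MPI_Aint *address);
    #elif defined(__GNUC__)
        /* Default: still declared, but any use fails to compile with a helpful hint. */
        int MPI_Address(void *location, MPI_Aint *address)
            __attribute__((__error__("MPI_Address was removed in MPI-3.0; use MPI_Get_address")));
    #else
        /* Compilers without the error attribute simply do not get the prototype. */
    #endif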


How does the community feel about this change? Most of the obsolete
functions are trivial to swap out. The only exception is the LB/UB
markers, but I intend to add examples of how to modernize them to the FAQ.

Why make this change?

1) We are releasing a new major version. This change can only happen at
a major version.

2) These functions were deprecated along the way in MPI-2.x. They were
all removed in MPI-3.0 (2012). I think 6 years with a deprecated
warning is long enough. I would prefer to axe them now but having an
option to re-enable them for a major release series is a good
compromise option.


The PR can be found @ https://github.com/open-mpi/ompi/pull/5127


Open MPI is MPI-3.0 clean now. Most of MTT is as well. I intend to work
through MTT to remove any tests using obsolete functions. The one
exception to this will be the IBM test suite, which will keep tests that
conditionally check the obsolete functions if they exist (for the
benefit of v3.x.x and v4.x.x).

-Nathan
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Time to remove the openib btl? (Was: Users/Eager RDMA causing slow osu_bibw with 3.0.0)

2018-04-05 Thread Nathan Hjelm

As soon as the uct btl is in place I have no use for the openib btl myself. It 
is a pain to maintain and I do not have the time to mess with it. From an OSC 
perspective the uct btl offers much better performance with none of the 
headache.

-Nathan

On Apr 05, 2018, at 11:39 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
wrote:

Below is an email exchange from the users mailing list.

I'm moving this over to devel to talk among the developer community.

Multiple times recently on the users list, we've told people with problems with 
the openib BTL that they should be using UCX (per Mellanox's publicly-stated 
support positions).

Is it time to deprecate / print warning messages / remove the openib BTL?



Begin forwarded message:
From: Nathan Hjelm <hje...@me.com>
Subject: Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0
Date: April 5, 2018 at 12:48:08 PM EDT
To: Open MPI Users <us...@lists.open-mpi.org>
Cc: Open MPI Users <us...@lists.open-mpi.org>
Reply-To: Open MPI Users <us...@lists.open-mpi.org>


Honestly, this is a configuration issue with the openib btl. There is no reason to keep 
eager RDMA enabled, nor is there a reason to pipeline RDMA. I haven't found an app where 
either of these "features" helps you with InfiniBand. You have the right idea 
with the parameter changes, but Howard is correct: for Mellanox the future is UCX, not 
verbs. I would try it and see if it works for you, but if it doesn't I would set those two 
parameters in your /etc/openmpi-mca-params.conf and run like that.

-Nathan

On Apr 05, 2018, at 01:18 AM, Ben Menadue <ben.mena...@nci.org.au> wrote:

Hi,

Another interesting point. I noticed that the last two message sizes tested 
(2MB and 4MB) are lower than expected for both osu_bw and osu_bibw. Increasing 
the minimum size to use the RDMA pipeline to above these sizes brings those two 
data-points up to scratch for both benchmarks:

3.0.0, osu_bw, no rdma for large messages


mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -map-by ppr:1:node -np 2 
-H r6,r7 ./osu_bw -m 2097152:4194304

# OSU MPI Bi-Directional Bandwidth Test v5.4.0
# Size      Bandwidth (MB/s)
2097152              6133.22
4194304              6054.06

3.0.0, osu_bibw, eager rdma disabled, no rdma for large messages


mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -mca 
btl_openib_use_eager_rdma 0 -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw -m 
2097152:4194304

# OSU MPI Bi-Directional Bandwidth Test v5.4.0
# Size      Bandwidth (MB/s)
2097152             11397.85
4194304             11389.64

This makes me think something odd is going on in the RDMA pipeline.

Cheers,
Ben



On 5 Apr 2018, at 5:03 pm, Ben Menadue <ben.mena...@nci.org.au> wrote:
Hi,

We’ve just been running some OSU benchmarks with OpenMPI 3.0.0 and noticed that 
osu_bibw gives nowhere near the bandwidth I’d expect (this is on FDR IB). 
However, osu_bw is fine.

If I disable eager RDMA, then osu_bibw gives the expected numbers. Similarly, 
if I increase the number of eager RDMA buffers, it gives the expected results.

OpenMPI 1.10.7 gives consistent, reasonable numbers with default settings, but 
they’re not as good as 3.0.0 (when tuned) for large buffers. The same option 
changes produce no different in the performance for 1.10.7.

I was wondering if anyone else has noticed anything similar, and if this is 
unexpected, if anyone has a suggestion on how to investigate further?

Thanks,
Ben


Here’s are the numbers:

3.0.0, osu_bw, default settings


mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bw

# OSU MPI Bandwidth Test v5.4.0
# Size      Bandwidth (MB/s)
1                       1.13
2                       2.29
4                       4.63
8                       9.21
16                     18.18
32                     36.46
64                     69.95
128                   128.55
256                   250.74
512                   451.54
1024                  829.44
2048                 1475.87
4096                 2119.99
8192                 3452.37
16384                2866.51
32768                4048.17
65536                5030.54
131072               5573.81
262144               5861.61
524288               6015.15
1048576              6099.46
2097152               989.82
4194304               989.81

3.0.0, osu_bibw, default settings


mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw

# OSU MPI Bi-Directional Bandwidth Test v5.4.0
# Size      Bandwidth (MB/s)
1                       0.00
2                       0.01
4                       0.01
8                       0.02
16                      0.04
32                      0.09
64                      0.16
128                   135.30
256                   265.35
512                   499.92
1024                  949.22
2048                 1440.27
4096                 1960.09
8192                 3166.97
16384                 127.62
32768                 165.12
65536                 31

Re: [OMPI devel] Openmpi 3.0.0/3.0.1rc3 MPI_Win_post failing

2018-03-09 Thread Nathan Hjelm
Fixed in master and in the 3.0.x branch. Try the nightly tarball. 

> On Mar 9, 2018, at 10:01 AM, Alan Wild  wrote:
> 
> I’ve been running the OSU micro benchmarks  ( 
> http://mvapich.cse.ohio-state.edu/benchmarks/ ). on my various MPI 
> installations.  One test that has been consistently failing is osu_put_bibw 
> when compiled with either openmpi 3.0.0 or openmpi 3.0.1rc3 when these builds 
> have also linked in the Mellanox mxm, hcoll, and SHaRP libraries AND when 
> running this two rank test across two nodes communicating with EDR Infiniband.
> 
> Fortunately this failure was true for both optimized and debug builds of 
> openmpi.
> 
> Stepping into the code with Allinea DDT I think I found the issue...
> 
> MPI_Win_post is ultimately calling ompi_osc_rdma_post_atomic() and on line 
> 245 there’s an if statement that reads:
> 
> if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
> return OMPI_ERR_OUT_OF_RESOURCE;
> }
> 
> (Sorry can’t easily cut and paste the code... my work PC can’t get to my 
> personal email so I have to post this from an iPad).
> 
> Anyway, if you look at the preceding ~16 lines of code... “ret” is never 
> initialized or assigned to in any way... (as far as I can tell).  I’m not 
> completely familiar with the all the macros used, but it doesn’t appear that 
> any of them are assigning to “ret”.  Surprised this isn’t causing more chaos.
> 
> If I’m “right”.. is the right thing just to initialize ret to OMPI_SUCCESS or 
> perhaps should this condition just come out?
> 
> Thoughts?
> 
> -Alan
> a...@madllama.net
> -- 
> a...@madllama.net http://humbleville.blogspot.com
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] [PATCH v5 0/4] vm: add a syscall to map a process memory into a pipe

2018-02-26 Thread Nathan Hjelm
All MPI implementations have support for using CMA to transfer data between 
local processes. The performance is fairly good (not as good as XPMEM) but the 
interface limits what we can do with remote process memory (no atomics). I 
have not heard about this new proposal. What is the benefit of the proposed 
calls over the existing calls?
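
For readers unfamiliar with the existing interface being compared against, CMA
here means the process_vm_readv/process_vm_writev syscalls. A minimal sketch of
the read side (illustrative only, error handling omitted):

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/uio.h>   /* process_vm_readv */

    /* Copy 'len' bytes from 'remote_addr' in process 'pid' into 'local_buf'. */
    static ssize_t cma_read(pid_t pid, void *local_buf, void *remote_addr, size_t len)
    {
        struct iovec local  = { .iov_base = local_buf,   .iov_len = len };
        struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

        /* One local iovec, one remote iovec; the flags argument must be 0. */
        return process_vm_readv(pid, &local, 1, &remote, 1, 0);
    }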

-Nathan

> On Feb 26, 2018, at 2:02 AM, Pavel Emelyanov  wrote:
> 
> On 02/21/2018 03:44 AM, Andrew Morton wrote:
>> On Tue,  9 Jan 2018 08:30:49 +0200 Mike Rapoport  
>> wrote:
>> 
>>> This patches introduces new process_vmsplice system call that combines
>>> functionality of process_vm_read and vmsplice.
>> 
>> All seems fairly strightforward.  The big question is: do we know that
>> people will actually use this, and get sufficient value from it to
>> justify its addition?
> 
> Yes, that's what bothers us a lot too :) I've tried to start with finding out 
> if anyone
> used the sys_read/write_process_vm() calls, but failed :( Does anybody know 
> how popular
> these syscalls are? If its users operate on big amount of memory, they could 
> benefit from
> the proposed splice extension.
> 
> -- Pavel
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel



signature.asc
Description: Message signed with OpenPGP
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] btl/vader: osu_bibw hangs when the number of execution loops is increased

2017-12-05 Thread Nathan Hjelm
Should be fixed by PR #4569 (https://github.com/open-mpi/ompi/pull/4569). 
Please test and let me know.

-Nathan

> On Dec 1, 2017, at 7:37 AM, DERBEY, NADIA  wrote:
> 
> Hi,
> 
> Our validation team detected a hang when running osu_bibw 
> micro-benchmarks from the OMB 5.3 suite on openmpi 2.0.2 (note that the 
> same hang appears with openmpi-3.0).
> This hang occurs when calling osu_bibw on a single node (vader btl) with 
> the options "-x 100 -i 1000".
> The -x option changes the warmup loop size.
> The -i option changes the measured loop size.
> 
> For each exchanged message size, osu_bibw loops doing the following 
> sequence on both ranks (a rough code sketch follows the list):
>. posts 64 non-blocking sends
>. posts 64 non-blocking receives
>. waits for all the send requests to complete
>. waits for all the receive requests to complete
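> 
> A rough sketch of that inner loop, for reference (illustrative only: the
> buffers, message size, peer rank, and loop bounds are placeholders, and the
> real benchmark differs in details):
> 
>     enum { WINDOW = 64 };
>     MPI_Request send_req[WINDOW], recv_req[WINDOW];
> 
>     for (int iter = 0; iter < skip + loop; ++iter) {
>         for (int w = 0; w < WINDOW; ++w)
>             MPI_Isend(sbuf, size, MPI_CHAR, peer, 100, MPI_COMM_WORLD, &send_req[w]);
>         for (int w = 0; w < WINDOW; ++w)
>             MPI_Irecv(rbuf, size, MPI_CHAR, peer, 100, MPI_COMM_WORLD, &recv_req[w]);
>         MPI_Waitall(WINDOW, send_req, MPI_STATUSES_IGNORE);
>         MPI_Waitall(WINDOW, recv_req, MPI_STATUSES_IGNORE);
>     }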
> 
> The loop size is the sum of
>. options.skip (warm up phase that can be changed with the -x option)
>. options.loop (actually measured loop that can be changed with the 
> -i option).
> 
> The default values are the following:
> 
> +==+==+==+
> | message size | skip | loop |
> |==+==+==|
> |<= 8K |   10 |  100 |
> |>  8K |2 |   20 |
> +==+==+==+
> 
> As said above, the test hangs when moving to more aggressive loop 
> values: 100 for skip and 1000 for loop.
> 
> mca_btl_vader_frag_alloc() calls opal_free_list_get() to get a fragment 
> from the appropriate free list.
> If there are no free fragments anymore, opal_free_list_get() calls 
> opal_free_list_grow() which in turn calls mca_btl_vader_frag_init() 
> (initialization routine for the vader btl fragements).
> This routine checks if there is enough space left in the mapped memory 
> segment for the wanted fragment size (current offset + fragment size 
> shoudl be <= segment size), and it makes opal_free_list_grow fail if the 
> shared memory segment is exhausted.
> 
> As soon as we begin exhausting memory, the 2 ranks get unsynchronized 
> and the test rapidly hangs. To avoid this hang, I found 2 possible 
> solutions:
> 
> 1) change the vader btl segment size: I have set it to 4GB - in order to 
> be able to do this, I had to change the type parameter in the parameter 
> registrations to MCA_BASE_VAR_TYPE_SIZE_T.
> 
> 2) change the call to opal_free_list_get() by a call to 
> opal_free_list_wait() in mca_btl_vader_frag_alloc(). This also makes the 
> micro-benchmark run to the end.
> 
> So my question is: what would be the best approach (#1 or #2)? and the 
> question behind this is: what is the reason for favoring 
> opal_free_list_get() over opal_free_list_wait()?
> 
> Thanks
> 
> -- 
> Nadia Derbey - B1-387
> HPC R - MPI
> Tel: +33 4 76 29 77 62
> nadia.der...@atos.net
> 1 Rue de Provence BP 208
> 38130 Echirolles Cedex, France
> www.atos.com
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Abstraction violation!

2017-06-22 Thread Nathan Hjelm
I have a fix I am working on. Will open a PR tomorrow morning.

-Nathan

> On Jun 22, 2017, at 6:11 PM, r...@open-mpi.org wrote:
> 
> Here’s something even weirder. You cannot build that file unless mpi.h 
> already exists, which it won’t until you build the MPI layer. So apparently 
> what is happening is that we somehow pick up a pre-existing version of mpi.h 
> and use that to build the file?
> 
> Checking around, I find that all my available machines have an mpi.h 
> somewhere in the default path because we always install _something_. I wonder 
> if our master would fail in a distro that didn’t have an MPI installed...
> 
>> On Jun 22, 2017, at 5:02 PM, r...@open-mpi.org wrote:
>> 
>> It apparently did come in that way. We just never test -no-ompi and so it 
>> wasn’t discovered until a downstream project tried to update. Then...boom.
>> 
>> 
>>> On Jun 22, 2017, at 4:07 PM, Barrett, Brian via devel 
>>>  wrote:
>>> 
>>> I’m confused; looking at history, there’s never been a time when 
>>> opal/util/info.c hasn’t included mpi.h.  That seems odd, but so does info 
>>> being in opal.
>>> 
>>> Brian
>>> 
 On Jun 22, 2017, at 3:46 PM, r...@open-mpi.org wrote:
 
 I don’t understand what someone was thinking, but you CANNOT #include 
 “mpi.h” in opal/util/info.c. It has broken pretty much every downstream 
 project.
 
 Please fix this!
 Ralph
 
 ___
 devel mailing list
 devel@lists.open-mpi.org
 https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>> 
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>> 
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] Master MTT results

2017-06-01 Thread Nathan Hjelm

A quick glance makes me think this might be related to the info changes. I am 
taking a look now.

-Nathan

On Jun 01, 2017, at 09:35 AM, "r...@open-mpi.org"  wrote:

Hey folks

I scanned the nightly MTT results from last night on master, and the RTE looks 
pretty solid. However, there are a LOT of onesided segfaults occurring, and I 
know that will eat up people’s disk space.

Just wanted to ensure folks were aware of the problem
Ralph

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] Fwd: about ialltoallw

2017-05-08 Thread Nathan Hjelm
Probably a bug. Can you open an issue on github?

-Nathan

> On May 8, 2017, at 2:16 PM, Dahai Guo  wrote:
> 
> 
> Hi, 
> 
> The attached test code passes with MPICH, but has problems with Open MPI.
> 
> There are three tests in the code, the first passes, the second one hangs, 
> and the third one results in seg. fault and core dump.
> 
> The hanging seemed to be caused by the handle in the function 
> ompi_coll_libnbc_ialltoallw in mca/coll/libnbc/nbc_ialltoallw.c, where it is 
> not set correctly for the request, based on the input parameters.
> 
> The seg fault is caused by ompi_datatype_type_size(sendtypes[me], 
> _size); //sendtypes[me] = NULL.
> 
> any suggestion?
> 
> Dahai
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] count = -1 for reduce

2017-05-04 Thread Nathan Hjelm

By default MPI errors are fatal and abort. The error message says it all:

*** An error occurred in MPI_Reduce
*** reported by process [3645440001,0]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_COUNT: invalid count argument
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)

If you want different behavior you have to change the default error handler on 
the communicator using MPI_Comm_set_errhandler. You can set it to 
MPI_ERRORS_RETURN and check the error code or you can create your own function. 
See MPI 3.1 Chapter 8.

-Nathan

On May 04, 2017, at 02:58 PM, Dahai Guo  wrote:

Hi,

Using Open MPI 2.1, the following code resulted in a core dump, although only a 
simple error message was expected. Any idea what is wrong? It seemed related to 
the errhandler somewhere.


D.G.


 *** An error occurred in MPI_Reduce
 *** reported by process [3645440001,0]
 *** on communicator MPI_COMM_WORLD
 *** MPI_ERR_COUNT: invalid count argument
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***    and potentially your MPI job)  
..

[1,1]:1000151c-1000151e rw-p  00:00 0 
[1,1]:1000151e-10001525 rw-p  00:00 0 
[1,1]:10001525-10001527 rw-p  00:00 0 
[1,1]:10001527-1000152e rw-p  00:00 0 
[1,1]:1000152e-10001530 rw-p  00:00 0 
[1,1]:10001530-10001551 rw-p  00:00 0 
[1,1]:10001551-10001553 rw-p  00:00 0 
[1,1]:10001553-10001574 rw-p  00:00 0 
[1,1]:10001574-10001576 rw-p  00:00 0 
[1,1]:10001576-10001597 rw-p  00:00 0 
[1,1]:10001597-10001599 rw-p  00:00 0 
[1,1]:10001599-100015ba rw-p  00:00 0 
[1,1]:100015ba-100015bc rw-p  00:00 0 
[1,1]:100015bc-100015dd rw-p  00:00 0 
[1,1]:100015dd-100015df rw-p  00:00 0 
[1,1]:100015df-10001600 rw-p  00:00 0 
[1,1]:10001600-10001602 rw-p  00:00 0 
[1,1]:10001602-10001623 rw-p  00:00 0 
[1,1]:10001623-10001625 rw-p  00:00 0 
[1,1]:10001625-10001646 rw-p  00:00 0 
[1,1]:10001646-10001647 rw-p  00:00 0 
[1,1]:3fffd463-3fffd46c rw-p  00:00 0   
   [stack] 
-- 

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int r[1], s[1];

    MPI_Init(&argc, &argv);

    s[0] = 1;
    r[0] = -1;
    MPI_Reduce(s, r, -1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    printf("%d\n", r[0]);
    MPI_Finalize();
}

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] Stack Overflow with osc_pt2pt _frag_alloc

2017-03-23 Thread Nathan Hjelm
https://github.com/open-mpi/ompi/pull/3229


> On Mar 23, 2017, at 8:45 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> 
> Clement -- I think we ran into the same error; I filed a bug last night (this 
> affects master and v3.x):
> 
>https://github.com/open-mpi/ompi/issues/3226
> 
> Sounds like Nathan is going to fix it shortly.
> 
> 
>> On Mar 23, 2017, at 10:39 AM, Nathan Hjelm <hje...@me.com> wrote:
>> 
>> Looks like an uncaught merge error. The call should have an _ before it. 
>> Will fix.
>> 
>> On Mar 23, 2017, at 6:33 AM, Clement FOYER <clement.fo...@gmail.com> wrote:
>> 
>>> Hi everyone,
>>> 
>>> While testing localy on my computer code using one-sided operations, the 
>>> selected module is the pt2pt OSC component, which always crashes with a 
>>> segmentation fault when the stack overflows.
>>> 
>>> From what I've seen, it occurs when ompi_osc_pt2pt_frag_alloc is called. 
>>> This function recursively call itself as first statement, without modifying 
>>> any argument. This function is defined in 
>>> ompi/mca/osc/pt2pt/osc_pt2pt_frag.h line 169.
>>> 
>>> Is it the normal behaviour? How is it supposed to stop from recursively 
>>> call itself?
>>> 
>>> Cheers,
>>> 
>>> Clement FOYER
>>> 
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] Stack Overflow with osc_pt2pt _frag_alloc

2017-03-23 Thread Nathan Hjelm
Looks like an uncaught merge error. The call should have an _ before it. Will 
fix.

> On Mar 23, 2017, at 6:33 AM, Clement FOYER  wrote:
> 
> Hi everyone,
> 
> While testing locally on my computer, code using one-sided operations selects 
> the pt2pt OSC component, which always crashes with a segmentation fault when 
> the stack overflows.
> 
> From what I've seen, it occurs when ompi_osc_pt2pt_frag_alloc is called. This 
> function recursively calls itself as its first statement, without modifying any 
> argument. This function is defined in ompi/mca/osc/pt2pt/osc_pt2pt_frag.h, 
> line 169.
> 
> Is this the normal behaviour? How is it supposed to stop recursively calling 
> itself?
> 
> Cheers,
> 
> Clement FOYER
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] MPI_Win_lock semantic

2016-11-21 Thread Nathan Hjelm
To be safe I would call MPI_Get and then MPI_Win_flush. The lock will always be 
acquired before the MPI_Win_flush call returns, as long as the get is more than 0 
bytes; we always short-circuit 0-byte operations in both osc/rdma and osc/pt2pt.
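
In code, that suggestion looks roughly like the sketch below (illustrative only;
'win', 'target', and the probe buffer are placeholders):

    /* Force acquisition of the lock on 'target' before signalling another rank.
     * The 1-element MPI_Get gives MPI_Win_flush something to complete, and flush
     * cannot return before the lock is actually held at the target. */
    double probe;

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
    MPI_Get(&probe, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
    MPI_Win_flush(target, win);   /* lock is guaranteed to be held once this returns */

    /* ... safe to tell other ranks they may now contend for the same lock ... */

    MPI_Win_unlock(target, win);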

-Nathan

> On Nov 21, 2016, at 8:54 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> 
> Thanks Nathan,
> 
> 
> any thoughts about my modified version of the test ?
> 
> do i need to MPI_Win_flush() after the first MPI_Get() in order to ensure the 
> lock was acquired ?
> 
> (and hence the program will either success or hang, but never fail)
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> On 11/22/2016 12:29 PM, Nathan Hjelm wrote:
>> MPI_Win_lock does not have to be blocking. In osc/rdma it is blocking in 
>> most cases but not others (lock all with on-demand is non-blocking) but in 
>> osc/pt2pt it is almost always non-blocking (it has to be blocking for proc 
>> self). If you really want to ensure the lock is acquired you can call 
>> MPI_Win_flush. I think this should work even if you have not started any RMA 
>> operations inside the epoch.
>> 
>> -Nathan
>> 
>>> On Nov 21, 2016, at 7:53 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>> 
>>> Nathan,
>>> 
>>> 
>>> we briefly discussed the test_lock1 test from the onesided test suite using 
>>> osc/pt2pt
>>> 
>>> https://github.com/open-mpi/ompi-tests/blob/master/onesided/test_lock1.c#L57-L70
>>> 
>>> 
>>> task 0 does
>>> 
>>> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);
>>> 
>>> MPI_Send(...,dest=2,...)
>>> 
>>> 
>>> and task 2 does
>>> 
>>> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);
>>> 
>>> MPI_Recv(...,source=0,...)
>>> 
>>> 
>>> hoping to guarantee task 0 will acquire the lock first.
>>> 
>>> 
>>> once in a while, the test fails when task 2 acquires the lock first
>>> 
>>> /* MPI_Win_lock() only sends a lock request, and return without owning the 
>>> lock */
>>> 
>>> so if task 1 is running on a loaded server, and even if task 2 requests the 
>>> lock *after* task 0,
>>> 
>>> lock request from task 2 can be processed first, and hence task 2 is not 
>>> guaranteed to acquire the lock *before* task 0.
>>> 
>>> 
>>> can you please confirm MPI_Win_lock() behaves as it is supposed to ?
>>> 
>>> if yes, is there a way for task 0 to block until it acquires the lock ?
>>> 
>>> 
>>> i modified the test, and inserted in task 0 a MPI_Get of 1 MPI_Double 
>>> *before* MPI_Send.
>>> 
>>> see my patch below (note i increased the message length)
>>> 
>>> 
>>> my expectation is that the test would either success (e.g. task 0 gets the 
>>> lock first) or hang
>>> 
>>> (if task 1 gets the lock first)
>>> 
>>> 
>>> 
>>> surprisingly, the test never hangs (so far ...) but once in a while, it 
>>> fails (!), which makes me very confused
>>> 
>>> 
>>> Any thoughts ?
>>> 
>>> 
>>> Cheers,
>>> 
>>> 
>>> Gilles
>>> 
>>> 
>>> 
>>> diff --git a/onesided/test_lock1.c b/onesided/test_lock1.c
>>> index c549093..9fa3f8d 100644
>>> --- a/onesided/test_lock1.c
>>> +++ b/onesided/test_lock1.c
>>> @@ -20,7 +20,7 @@ int
>>> test_lock1(void)
>>> {
>>> double *a = NULL;
>>> -size_t len = 10;
>>> +size_t len = 100;
>>> MPI_Win win;
>>> int i;
>>> 
>>> @@ -56,6 +56,7 @@ test_lock1(void)
>>>  */
>>> if (me == 0) {
>>>MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
>>> +   MPI_Get(a,1,MPI_DOUBLE,1,0,1,MPI_DOUBLE,win);
>>> MPI_Send(NULL, 0, MPI_BYTE, 2, 1001, MPI_COMM_WORLD);
>>>MPI_Get(a,len,MPI_DOUBLE,1,0,len,MPI_DOUBLE,win);
>>> MPI_Win_unlock(1, win);
>>> @@ -76,6 +77,7 @@ test_lock1(void)
>>> /* make sure 0 got the data from 1 */
>>>for (i = 0; i < len; i++) {
>>>if (a[i] != (double)(10*1+i)) {
>>> +if (0 == nfail) fprintf(stderr, "at index %d, expected %lf 
>>> but got %lf\n", i, (double)10*1+i, a[i]);
>>>nfail++;
>>>}
>>>}
>>> 
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>> 
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] MPI_Win_lock semantic

2016-11-21 Thread Nathan Hjelm
True. The lock itself is not an RMA operation so I am not sure if it is 
supposed to complete with flush. He may have to initiate an RMA operation to 
get the semantics desired.

I do think osc/pt2pt will still flush and wait for the lock message. The 
semantics are not wrong but could be more than an implementation has to provide.

-Nathan

> On Nov 21, 2016, at 8:47 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
> Why is MPI_Win_flush required to ensure the lock is acquired ? According to 
> the standard MPI_Win_flush "completes all outstanding RMA operations 
> initiated by the calling process to the target rank on the specified window", 
> which can be read as being a noop if no pending operations exists.
> 
>   George.
> 
> 
> 
> On Mon, Nov 21, 2016 at 8:29 PM, Nathan Hjelm <hje...@me.com> wrote:
> MPI_Win_lock does not have to be blocking. In osc/rdma it is blocking in most 
> cases but not others (lock all with on-demand is non-blocking) but in 
> osc/pt2pt it is almost always non-blocking (it has to be blocking for proc 
> self). If you really want to ensure the lock is acquired you can call 
> MPI_Win_flush. I think this should work even if you have not started any RMA 
> operations inside the epoch.
> 
> -Nathan
> 
> > On Nov 21, 2016, at 7:53 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> >
> > Nathan,
> >
> >
> > we briefly discussed the test_lock1 test from the onesided test suite using 
> > osc/pt2pt
> >
> > https://github.com/open-mpi/ompi-tests/blob/master/onesided/test_lock1.c#L57-L70
> >
> >
> > task 0 does
> >
> > MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);
> >
> > MPI_Send(...,dest=2,...)
> >
> >
> > and task 2 does
> >
> > MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);
> >
> > MPI_Recv(...,source=0,...)
> >
> >
> > hoping to guarantee task 0 will acquire the lock first.
> >
> >
> > once in a while, the test fails when task 2 acquires the lock first
> >
> > /* MPI_Win_lock() only sends a lock request, and return without owning the 
> > lock */
> >
> > so if task 1 is running on a loaded server, and even if task 2 requests the 
> > lock *after* task 0,
> >
> > lock request from task 2 can be processed first, and hence task 2 is not 
> > guaranteed to acquire the lock *before* task 0.
> >
> >
> > can you please confirm MPI_Win_lock() behaves as it is supposed to ?
> >
> > if yes, is there a way for task 0 to block until it acquires the lock ?
> >
> >
> > i modified the test, and inserted in task 0 a MPI_Get of 1 MPI_Double 
> > *before* MPI_Send.
> >
> > see my patch below (note i increased the message length)
> >
> >
> > my expectation is that the test would either success (e.g. task 0 gets the 
> > lock first) or hang
> >
> > (if task 1 gets the lock first)
> >
> >
> >
> > surprisingly, the test never hangs (so far ...) but once in a while, it 
> > fails (!), which makes me very confused
> >
> >
> > Any thoughts ?
> >
> >
> > Cheers,
> >
> >
> > Gilles
> >
> >
> >
> > diff --git a/onesided/test_lock1.c b/onesided/test_lock1.c
> > index c549093..9fa3f8d 100644
> > --- a/onesided/test_lock1.c
> > +++ b/onesided/test_lock1.c
> > @@ -20,7 +20,7 @@ int
> > test_lock1(void)
> > {
> > double *a = NULL;
> > -size_t len = 10;
> > +size_t len = 100;
> > MPI_Win win;
> > int i;
> >
> > @@ -56,6 +56,7 @@ test_lock1(void)
> >  */
> > if (me == 0) {
> >MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
> > +   MPI_Get(a,1,MPI_DOUBLE,1,0,1,MPI_DOUBLE,win);
> > MPI_Send(NULL, 0, MPI_BYTE, 2, 1001, MPI_COMM_WORLD);
> >MPI_Get(a,len,MPI_DOUBLE,1,0,len,MPI_DOUBLE,win);
> > MPI_Win_unlock(1, win);
> > @@ -76,6 +77,7 @@ test_lock1(void)
> > /* make sure 0 got the data from 1 */
> >for (i = 0; i < len; i++) {
> >if (a[i] != (double)(10*1+i)) {
> > +if (0 == nfail) fprintf(stderr, "at index %d, expected %lf 
> > but got %lf\n", i, (double)10*1+i, a[i]);
> >nfail++;
> >}
> >}
> >
> > ___
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] MPI_Win_lock semantic

2016-11-21 Thread Nathan Hjelm
MPI_Win_lock does not have to be blocking. In osc/rdma it is blocking in most 
cases but not others (lock all with on-demand is non-blocking) but in osc/pt2pt 
it is almost always non-blocking (it has to be blocking for proc self). If you 
really want to ensure the lock is acquired you can call MPI_Win_flush. I think 
this should work even if you have not started any RMA operations inside the 
epoch.

-Nathan

> On Nov 21, 2016, at 7:53 PM, Gilles Gouaillardet  wrote:
> 
> Nathan,
> 
> 
> we briefly discussed the test_lock1 test from the onesided test suite using 
> osc/pt2pt
> 
> https://github.com/open-mpi/ompi-tests/blob/master/onesided/test_lock1.c#L57-L70
> 
> 
> task 0 does
> 
> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);
> 
> MPI_Send(...,dest=2,...)
> 
> 
> and task 2 does
> 
> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);
> 
> MPI_Recv(...,source=0,...)
> 
> 
> hoping to guarantee task 0 will acquire the lock first.
> 
> 
> once in a while, the test fails when task 2 acquires the lock first
> 
> /* MPI_Win_lock() only sends a lock request, and return without owning the 
> lock */
> 
> so if task 1 is running on a loaded server, and even if task 2 requests the 
> lock *after* task 0,
> 
> lock request from task 2 can be processed first, and hence task 2 is not 
> guaranteed to acquire the lock *before* task 0.
> 
> 
> can you please confirm MPI_Win_lock() behaves as it is supposed to ?
> 
> if yes, is there a way for task 0 to block until it acquires the lock ?
> 
> 
> i modified the test, and inserted in task 0 a MPI_Get of 1 MPI_Double 
> *before* MPI_Send.
> 
> see my patch below (note i increased the message length)
> 
> 
> my expectation is that the test would either success (e.g. task 0 gets the 
> lock first) or hang
> 
> (if task 1 gets the lock first)
> 
> 
> 
> surprisingly, the test never hangs (so far ...) but once in a while, it fails 
> (!), which makes me very confused
> 
> 
> Any thoughts ?
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> 
> diff --git a/onesided/test_lock1.c b/onesided/test_lock1.c
> index c549093..9fa3f8d 100644
> --- a/onesided/test_lock1.c
> +++ b/onesided/test_lock1.c
> @@ -20,7 +20,7 @@ int
> test_lock1(void)
> {
> double *a = NULL;
> -size_t len = 10;
> +size_t len = 100;
> MPI_Win win;
> int i;
> 
> @@ -56,6 +56,7 @@ test_lock1(void)
>  */
> if (me == 0) {
>MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
> +   MPI_Get(a,1,MPI_DOUBLE,1,0,1,MPI_DOUBLE,win);
> MPI_Send(NULL, 0, MPI_BYTE, 2, 1001, MPI_COMM_WORLD);
>MPI_Get(a,len,MPI_DOUBLE,1,0,len,MPI_DOUBLE,win);
> MPI_Win_unlock(1, win);
> @@ -76,6 +77,7 @@ test_lock1(void)
> /* make sure 0 got the data from 1 */
>for (i = 0; i < len; i++) {
>if (a[i] != (double)(10*1+i)) {
> +if (0 == nfail) fprintf(stderr, "at index %d, expected %lf 
> but got %lf\n", i, (double)10*1+i, a[i]);
>nfail++;
>}
>}
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] Deadlock in sync_wait_mt(): Proposed patch

2016-09-21 Thread Nathan Hjelm
Yeah, that looks like a bug to me. We need to keep the check before the lock 
but otherwise this is fine and should be fixed in 2.0.2.
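
Putting the two pieces together, the agreed-on shape is a double-checked wait: keep the cheap test outside the lock for the common already-complete case, and repeat it under sync->lock so wait_sync_update() cannot signal between the test and the wait. A simplified sketch (the real sync_wait_mt() also links the object onto the pending list, and the condition-variable field name is assumed here):

    int sync_wait_mt (ompi_wait_sync_t *sync)
    {
        /* fast path: request already complete, no locking needed */
        if (sync->count <= 0) {
            return (0 == sync->status) ? OPAL_SUCCESS : OPAL_ERROR;
        }

        pthread_mutex_lock (&sync->lock);

        /* re-check under the same lock the signaller takes */
        if (sync->count <= 0) {
            pthread_mutex_unlock (&sync->lock);
            return (0 == sync->status) ? OPAL_SUCCESS : OPAL_ERROR;
        }

        /* ... insert on the pending-synchronization list as before ... */
        while (sync->count > 0) {
            pthread_cond_wait (&sync->condition, &sync->lock);  /* field name assumed */
        }

        pthread_mutex_unlock (&sync->lock);
        return (0 == sync->status) ? OPAL_SUCCESS : OPAL_ERROR;
    }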

-Nathan

> On Sep 21, 2016, at 3:16 AM, DEVEZE, PASCAL  wrote:
> 
> I encountered a deadlock in sync_wait_mt().
>  
> After investigations, it appears that a first thread executing 
> wait_sync_update() decrements sync->count just after a second thread in 
> sync_wait_mt() made the test :
>  
> if(sync->count <= 0)
> return (0 == sync->status) ? OPAL_SUCCESS : OPAL_ERROR;
>  
> After that, there is a narrow window in which the first thread may call 
> pthread_cond_signal() before the second thread calls pthread_cond_wait().
>  
> If I protect this test by the sync->lock, this window is closed and the 
> problem does not reproduce.
>  
> To easily reproduce the problem, just add a call to usleep(100) before the call
> to pthread_mutex_lock(&sync->lock);
>  
> So my proposed patch is:
>  
> diff --git a/opal/threads/wait_sync.c b/opal/threads/wait_sync.c
> index c9b9137..2f90965 100644
> --- a/opal/threads/wait_sync.c
> +++ b/opal/threads/wait_sync.c
> @@ -25,12 +25,14 @@ static ompi_wait_sync_t* wait_sync_list = NULL;
>  
> int sync_wait_mt(ompi_wait_sync_t *sync)
> {
> -if(sync->count <= 0)
> -return (0 == sync->status) ? OPAL_SUCCESS : OPAL_ERROR;
> -
>  /* lock so nobody can signal us during the list updating */
>  pthread_mutex_lock(&sync->lock);
>  
> +if(sync->count <= 0) {
> +pthread_mutex_unlock(&sync->lock);
> +return (0 == sync->status) ? OPAL_SUCCESS : OPAL_ERROR;
> +}
> +
>  /* Insert sync on the list of pending synchronization constructs */
>  OPAL_THREAD_LOCK(&wait_sync_lock);
>  if( NULL == wait_sync_list ) {
>  
> For performance reasons, it is also possible to keep the first test before the
> lock, so that if the request has already completed we do not spend time taking
> and releasing the lock.
>  
>  
>  
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel


Re: [OMPI devel] openmpi-2.0.0 - problems with ppc64, PGI and atomics

2016-09-07 Thread Nathan Hjelm
Great! Thank you Josh! If everything looks good over the next couple of days I 
will open up the 2.0.2 PR for this.

-Nathan

> On Sep 7, 2016, at 7:22 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
> 
> I just gained access to the PGI 16.7 compiler for ppc64le. I'm going to add 
> it to our nightly MTT, so we can monitor progress on this support. It might 
> not make it into tonight's testing, but should be tomorrow. I might also try 
> to add it to our Jenkins testing too.
> 
> On Wed, Sep 7, 2016 at 7:36 PM, Nathan Hjelm <hje...@me.com> wrote:
> Thanks for reporting this! Glad the problem is fixed. We will get this into 
> 2.0.2.
> 
> -Nathan
> 
> > On Sep 7, 2016, at 9:39 AM, Vallee, Geoffroy R. <valle...@ornl.gov> wrote:
> >
> > I just tried the fix and i can confirm that it fixes the problem. :)
> >
> > Thanks!!!
> >
> >> On Sep 2, 2016, at 6:18 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> >> wrote:
> >>
> >> Issue filed at https://github.com/open-mpi/ompi/issues/2044.
> >>
> >> I asked Nathan and Sylvain to have a look.
> >>
> >>
> >>> On Sep 1, 2016, at 9:20 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >>>
> >>> I failed to get PGI 16.x working at all (licence issue, I think).
> >>> So, I can neither confirm nor refute Geoffroy's reported problems.
> >>>
> >>> -Paul
> >>>
> >>> On Thu, Sep 1, 2016 at 6:15 PM, Vallee, Geoffroy R. <valle...@ornl.gov> 
> >>> wrote:
> >>> Interesting, I am having the problem with both 16.5 and 16.7.
> >>>
> >>> My 2 cents,
> >>>
> >>>> On Sep 1, 2016, at 8:25 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >>>>
> >>>> FWIW I have not seen problems when testing the 2.0.1rc2 w/ PGI versions 
> >>>> 12.10, 13.9, 14.3 or 15.9.
> >>>>
> >>>> I am going to test 2.0.2.rc3 ASAP and try to get PGI 16.4 coverage added 
> >>>> in
> >>>>
> >>>> -Paul
> >>>>
> >>>> On Thu, Sep 1, 2016 at 12:48 PM, Jeff Squyres (jsquyres) 
> >>>> <jsquy...@cisco.com> wrote:
> >>>> Please send all the information on the build support page and open an 
> >>>> issue at github.  Thanks.
> >>>>
> >>>>
> >>>>> On Sep 1, 2016, at 3:41 PM, Vallee, Geoffroy R. <valle...@ornl.gov> 
> >>>>> wrote:
> >>>>>
> >>>>> This is indeed a little better but still creating a problem:
> >>>>>
> >>>>> CCLD opal_wrapper
> >>>>> ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function 
> >>>>> `_opal_progress_unregister':
> >>>>> /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:459:
> >>>>>  undefined reference to `opal_atomic_swap_64'
> >>>>> ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function 
> >>>>> `_opal_progress_register':
> >>>>> /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:398:
> >>>>>  undefined reference to `opal_atomic_swap_64'
> >>>>> make[2]: *** [opal_wrapper] Error 2
> >>>>> make[2]: Leaving directory 
> >>>>> `/autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/tools/wrappers'
> >>>>> make[1]: *** [all-recursive] Error 1
> >>>>> make[1]: Leaving directory 
> >>>>> `/autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal'
> >>>>> make: *** [all-recursive] Error 1
> >>>>>
> >>>>> $ nm libopen-pal.a  | grep atomic
> >>>>>   U opal_atomic_cmpset_64
> >>>>> 0ab0 t opal_atomic_cmpset_ptr
> >>>>>   U opal_atomic_wmb
> >>>>> 0950 t opal_lifo_push_atomic
> >>>>>   U opal_atomic_cmpset_acq_32
> >>>>> 03d0 t opal_atomic_lock
> >>>>> 0450 t opal_atomic_unlock
> >>>>>   U opal_atomic_wmb
> >>>>>   U opal_atomic_ll_64
> >>>>>   U opal_atomic_sc_64
> >>>>>   U opal_atomic_wmb
> >>>>> 1010 t opal_lifo_pop_atomic
> >>>>>   U opal_atomic_cmpset_acq_32
> >&

Re: [OMPI devel] openmpi-2.0.0 - problems with ppc64, PGI and atomics

2016-09-07 Thread Nathan Hjelm
Thanks for reporting this! Glad the problem is fixed. We will get this into 
2.0.2.

-Nathan

> On Sep 7, 2016, at 9:39 AM, Vallee, Geoffroy R.  wrote:
> 
> I just tried the fix and i can confirm that it fixes the problem. :)
> 
> Thanks!!!
> 
>> On Sep 2, 2016, at 6:18 AM, Jeff Squyres (jsquyres)  
>> wrote:
>> 
>> Issue filed at https://github.com/open-mpi/ompi/issues/2044.
>> 
>> I asked Nathan and Sylvain to have a look.
>> 
>> 
>>> On Sep 1, 2016, at 9:20 PM, Paul Hargrove  wrote:
>>> 
>>> I failed to get PGI 16.x working at all (licence issue, I think).
>>> So, I can neither confirm nor refute Geoffroy's reported problems.
>>> 
>>> -Paul
>>> 
>>> On Thu, Sep 1, 2016 at 6:15 PM, Vallee, Geoffroy R.  
>>> wrote:
>>> Interesting, I am having the problem with both 16.5 and 16.7.
>>> 
>>> My 2 cents,
>>> 
 On Sep 1, 2016, at 8:25 PM, Paul Hargrove  wrote:
 
 FWIW I have not seen problems when testing the 2.0.1rc2 w/ PGI versions 
 12.10, 13.9, 14.3 or 15.9.
 
 I am going to test 2.0.2.rc3 ASAP and try to get PGI 16.4 coverage added in
 
 -Paul
 
 On Thu, Sep 1, 2016 at 12:48 PM, Jeff Squyres (jsquyres) 
  wrote:
 Please send all the information on the build support page and open an 
 issue at github.  Thanks.
 
 
> On Sep 1, 2016, at 3:41 PM, Vallee, Geoffroy R.  wrote:
> 
> This is indeed a little better but still creating a problem:
> 
> CCLD opal_wrapper
> ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function 
> `_opal_progress_unregister':
> /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:459:
>  undefined reference to `opal_atomic_swap_64'
> ../../../opal/.libs/libopen-pal.a(opal_progress.o): In function 
> `_opal_progress_register':
> /autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/runtime/opal_progress.c:398:
>  undefined reference to `opal_atomic_swap_64'
> make[2]: *** [opal_wrapper] Error 2
> make[2]: Leaving directory 
> `/autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal/tools/wrappers'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory 
> `/autofs/nccs-svm1_sw/gvh/src/openmpi-2.0.1rc2/opal'
> make: *** [all-recursive] Error 1
> 
> $ nm libopen-pal.a  | grep atomic
>   U opal_atomic_cmpset_64
> 0ab0 t opal_atomic_cmpset_ptr
>   U opal_atomic_wmb
> 0950 t opal_lifo_push_atomic
>   U opal_atomic_cmpset_acq_32
> 03d0 t opal_atomic_lock
> 0450 t opal_atomic_unlock
>   U opal_atomic_wmb
>   U opal_atomic_ll_64
>   U opal_atomic_sc_64
>   U opal_atomic_wmb
> 1010 t opal_lifo_pop_atomic
>   U opal_atomic_cmpset_acq_32
> 04b0 t opal_atomic_init
> 04e0 t opal_atomic_lock
>   U opal_atomic_mb
> 0560 t opal_atomic_unlock
>   U opal_atomic_wmb
>   U opal_atomic_add_32
>   U opal_atomic_cmpset_acq_32
> 0820 t opal_atomic_init
> 0850 t opal_atomic_lock
>   U opal_atomic_sub_32
>   U opal_atomic_swap_64
> 08d0 t opal_atomic_unlock
>   U opal_atomic_wmb
> 0130 t opal_atomic_init
> atomic-asm.o:
> 0138 T opal_atomic_add_32
> 0018 T opal_atomic_cmpset_32
> 00c4 T opal_atomic_cmpset_64
> 003c T opal_atomic_cmpset_acq_32
> 00e8 T opal_atomic_cmpset_acq_64
> 0070 T opal_atomic_cmpset_rel_32
> 0110 T opal_atomic_cmpset_rel_64
>  T opal_atomic_mb
> 0008 T opal_atomic_rmb
> 0150 T opal_atomic_sub_32
> 0010 T opal_atomic_wmb
> 2280 t mca_base_pvar_is_atomic
>   U opal_atomic_ll_64
>   U opal_atomic_sc_64
>   U opal_atomic_wmb
> 0900 t opal_lifo_pop_atomic
> 
>> On Sep 1, 2016, at 3:16 PM, Jeff Squyres (jsquyres)  
>> wrote:
>> 
>> Can you try the latest v2.0.1 nightly snapshot tarball?
>> 
>> 
>>> On Sep 1, 2016, at 2:56 PM, Vallee, Geoffroy R.  
>>> wrote:
>>> 
>>> Hello,
>>> 
>>> I get the following problem when we compile OpenMPI-2.0.0 (it seems to 
>>> be specific to 2.x; the problem did not appear with 1.10.x) with PGI:
>>> 
>>> CCLD opal_wrapper
>>> ../../../opal/.libs/libopen-pal.so: undefined reference to 
>>> `opal_atomic_sc_64'
>>> ../../../opal/.libs/libopen-pal.so: 

Re: [OMPI devel] Hanging tests

2016-09-07 Thread Nathan Hjelm
Posted a possible fix to the intercomm hang. See 
https://github.com/open-mpi/ompi/pull/2061

-Nathan


> On Sep 7, 2016, at 6:53 AM, Nathan Hjelm <hje...@me.com> wrote:
> 
> Looking at the code now. This code was more or less directly translated from 
> the blocking version. I wouldn’t be surprised if there is an error that I 
> didn’t catch with MTT on my laptop.
> 
> That said, there is an old comment about not using bcast to avoid a possible 
> deadlock. Since the collective is now non-blocking that is no longer a 
> problem. The simple answer is to use ibcast instead of iallgather. Will work 
> on that fix now.
> 
> -Nathan
> 
>> On Sep 7, 2016, at 3:02 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>> 
>> Thanks guys,
>> 
>> 
>> so i was finally able to reproduce the bug on my (oversubscribed) VM with 
>> tcp.
>> 
>> 
>> MPI_Intercomm_merge (indirectly) incorrectly invokes iallgatherv.
>> 1,main (MPI_Issend_rtoa_c.c:196)
>> 1,  MPITEST_get_communicator (libmpitest.c:3544)
>> 1,PMPI_Intercomm_merge (pintercomm_merge.c:131)
>> 1,  ompi_comm_activate (comm_cid.c:514)
>> 1,ompi_request_wait_completion (request.h:397)
>> 1,  opal_progress (opal_progress.c:221)
>> 1,ompi_comm_request_progress (comm_request.c:132)
>> 1,  ompi_comm_allreduce_inter_leader_reduce (comm_cid.c:699)
>> 1,ompi_comm_allreduce_inter_allgather (comm_cid.c:723)
>> 1,  ompi_coll_libnbc_iallgatherv_inter 
>> (nbc_iallgatherv.c:173)
>> 
>> 
>> global tasks 0 and 1 are both root task 0 of an intercomm on groups A and B
>> 
>> they both invoke iallgatherv with scount=1, but context->rcounts[0]=0 (it 
>> should be 1)
>> per the man page
>> "The type signature associated with sendcount, sendtype, at process j must 
>> be equal to the type signature associated with recvcounts[j], recvtype at 
>> any other process."
>> 
>> so if the initial intention was not to gather only on roots, then this is 
>> not possible with iallgatherv
>> 
>> what happens then is that iallgatherv isend data (scount>0), but no matching 
>> irecv is posted (rcounts[0]==0)
>> then the intercomm is destroyed.
>> and then the message is received later by opal_progress on a communicator 
>> that do not exist (any more)
>> this message is hence stored by pml/ob1 in the 
>> non_existing_communicator_pending list
>> /* btw, can someone kindly explain me the rationale for this ?
>> is there any valid case in which a message can be received on a communicator 
>> that does not exist yet ?
>> if the only valid case is the communicator does not exist any more, should 
>> the message be simply discarded ? */
>> 
>> much later in the test, a new communicator is created with the same cid than 
>> the intercomm, and a hang can occur
>> i can only suspect the message in the non_existing_communicator_pending list 
>> causes that.
>> 
>> 
>> bottom line, i think the root cause is a bad invocation of iallgatherv.
>> Nathan, could you please have a look ?
>> 
>> 
>> fwiw, during my investigations, i was able to get rid of the hang by *not* 
>> recycling CIDs
>> with the patch below.
>> 
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> diff --git a/ompi/communicator/comm_init.c b/ompi/communicator/comm_init.c
>> index f453ca1..7195aa2 100644
>> --- a/ompi/communicator/comm_init.c
>> +++ b/ompi/communicator/comm_init.c
>> @@ -297,7 +297,7 @@ int ompi_comm_finalize(void)
>> max = opal_pointer_array_get_size(&ompi_mpi_communicators);
>> for ( i=3; i<max; i++ ) {
>> comm = (ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);
>> -if ( NULL != comm ) {
>> +if ( NULL != comm && (ompi_communicator_t *)0x1 != comm) {
>> /* Communicator has not been freed before finalize */
>> OBJ_RELEASE(comm);
>> comm=(ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);
>> @@ -435,7 +435,7 @@ static void ompi_comm_destruct(ompi_communicator_t* comm)
>>  NULL != opal_pointer_array_get_item(&ompi_mpi_communicators,
>>  comm->c_contextid)) {
>> opal_pointer_array_set_item ( &ompi_mpi_communicators,
>> -  comm->c_contextid, NULL);
>> +  comm->c_contextid, (void *)0x1);
>> }
>> 
>>

Re: [OMPI devel] Hanging tests

2016-09-07 Thread Nathan Hjelm
Looking at the code now. This code was more or less directly translated from 
the blocking version. I wouldn’t be surprised if there is an error that I 
didn’t catch with MTT on my laptop.

That said, there is an old comment about not using bcast to avoid a possible 
deadlock. Since the collective is now non-blocking that is no longer a problem. 
The simple answer is to use ibcast instead of iallgather. Will work on that fix 
now.
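
For reference, the MPI_Iallgatherv type-signature rule quoted below can be illustrated with a small standalone program (a toy example, not the comm_cid.c code): every rank's sendcount must match what the other ranks list in recvcounts[] for that rank, otherwise an isend is posted with no matching receive, which is the situation described for the two roots.

    #include <mpi.h>
    #include <stdlib.h>

    int main (int argc, char *argv[])
    {
        int rank, size;
        MPI_Request req;

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        MPI_Comm_size (MPI_COMM_WORLD, &size);

        int sendval = rank;
        int *recvbuf    = malloc (size * sizeof (int));
        int *recvcounts = malloc (size * sizeof (int));
        int *displs     = malloc (size * sizeof (int));

        for (int i = 0 ; i < size ; ++i) {
            recvcounts[i] = 1;   /* must equal the sendcount (1) used by rank i */
            displs[i]     = i;
        }
        /* setting recvcounts[0] = 0 while rank 0 still sends one element would
         * reproduce the mismatch described in this thread */

        MPI_Iallgatherv (&sendval, 1, MPI_INT, recvbuf, recvcounts, displs,
                         MPI_INT, MPI_COMM_WORLD, &req);
        MPI_Wait (&req, MPI_STATUS_IGNORE);

        free (recvbuf); free (recvcounts); free (displs);
        MPI_Finalize ();
        return 0;
    }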

-Nathan

> On Sep 7, 2016, at 3:02 AM, Gilles Gouaillardet  wrote:
> 
> Thanks guys,
> 
> 
> so i was finally able to reproduce the bug on my (oversubscribed) VM with tcp.
> 
> 
> MPI_Intercomm_merge (indirectly) incorrectly invokes iallgatherv.
> 1,main (MPI_Issend_rtoa_c.c:196)
> 1,  MPITEST_get_communicator (libmpitest.c:3544)
> 1,PMPI_Intercomm_merge (pintercomm_merge.c:131)
> 1,  ompi_comm_activate (comm_cid.c:514)
> 1,ompi_request_wait_completion (request.h:397)
> 1,  opal_progress (opal_progress.c:221)
> 1,ompi_comm_request_progress (comm_request.c:132)
> 1,  ompi_comm_allreduce_inter_leader_reduce (comm_cid.c:699)
> 1,ompi_comm_allreduce_inter_allgather (comm_cid.c:723)
> 1,  ompi_coll_libnbc_iallgatherv_inter (nbc_iallgatherv.c:173)
> 
> 
> global tasks 0 and 1 are both root task 0 of an intercomm on groups A and B
> 
> they both invoke iallgatherv with scount=1, but context->rcounts[0]=0 (it 
> should be 1)
> per the man page
> "The type signature associated with sendcount, sendtype, at process j must be 
> equal to the type signature associated with recvcounts[j], recvtype at any 
> other process."
> 
> so if the initial intention was not to gather only on roots, then this is not 
> possible with iallgatherv
> 
> what happens then is that iallgatherv isend data (scount>0), but no matching 
> irecv is posted (rcounts[0]==0)
> then the intercomm is destroyed.
> and then the message is received later by opal_progress on a communicator 
> that do not exist (any more)
> this message is hence stored by pml/ob1 in the 
> non_existing_communicator_pending list
> /* btw, can someone kindly explain me the rationale for this ?
> is there any valid case in which a message can be received on a communicator 
> that does not exist yet ?
> if the only valid case is the communicator does not exist any more, should 
> the message be simply discarded ? */
> 
> much later in the test, a new communicator is created with the same cid than 
> the intercomm, and a hang can occur
> i can only suspect the message in the non_existing_communicator_pending list 
> causes that.
> 
> 
> bottom line, i think the root cause is a bad invocation of iallgatherv.
> Nathan, could you please have a look ?
> 
> 
> fwiw, during my investigations, i was able to get rid of the hang by *not* 
> recycling CIDs
> with the patch below.
> 
> 
> Cheers,
> 
> Gilles
> 
> diff --git a/ompi/communicator/comm_init.c b/ompi/communicator/comm_init.c
> index f453ca1..7195aa2 100644
> --- a/ompi/communicator/comm_init.c
> +++ b/ompi/communicator/comm_init.c
> @@ -297,7 +297,7 @@ int ompi_comm_finalize(void)
>  max = opal_pointer_array_get_size(&ompi_mpi_communicators);
>  for ( i=3; i<max; i++ ) {
>  comm = (ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);
> -if ( NULL != comm ) {
> +if ( NULL != comm && (ompi_communicator_t *)0x1 != comm) {
>  /* Communicator has not been freed before finalize */
>  OBJ_RELEASE(comm);
>  comm=(ompi_communicator_t *)opal_pointer_array_get_item(&ompi_mpi_communicators, i);
> @@ -435,7 +435,7 @@ static void ompi_comm_destruct(ompi_communicator_t* comm)
>   NULL != opal_pointer_array_get_item(&ompi_mpi_communicators,
>   comm->c_contextid)) {
>  opal_pointer_array_set_item ( &ompi_mpi_communicators,
> -  comm->c_contextid, NULL);
> +  comm->c_contextid, (void *)0x1);
>  }
>  
>  /* reset the ompi_comm_f_to_c_table entry */
> diff --git a/ompi/mca/pml/ob1/pml_ob1_recvfrag.c 
> b/ompi/mca/pml/ob1/pml_ob1_recvfrag.c
> index 5f3f8fd..1d0f881 100644
> --- a/ompi/mca/pml/ob1/pml_ob1_recvfrag.c
> +++ b/ompi/mca/pml/ob1/pml_ob1_recvfrag.c
> @@ -128,7 +128,7 @@ void 
> mca_pml_ob1_recv_frag_callback_match(mca_btl_base_module_t* btl,
>  
>  /* communicator pointer */
>  comm_ptr = ompi_comm_lookup(hdr->hdr_ctx);
> -if(OPAL_UNLIKELY(NULL == comm_ptr)) {
> +if(OPAL_UNLIKELY(NULL == comm_ptr || (ompi_communicator_t *)0x1 == 
> comm_ptr)) {
>  /* This is a special case. A message for a not yet existing
>   * communicator can happens. Instead of doing a matching we
>   * will temporarily add it the a pending queue in the PML.
> 
> On 9/7/2016 2:28 AM, George Bosilca wrote:
>> I can make MPI_Issend_rtoa deadlock with vader and sm.
>> 
>>   George.
>> 
>> 

Re: [OMPI devel] FYI: soon to lose IA64 access

2016-08-30 Thread Nathan Hjelm
This might be the last straw in IA64 support. If we can’t even test it anymore 
it might *finally* be time to kill the asm. If someone wants to use IA64 they 
can use the builtin atomic support.
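
For anyone wondering what "the builtin atomic support" means in practice, the GCC/clang __atomic builtins cover what the hand-written assembly provides. A sketch of illustrative wrappers (not the actual opal_atomic_* implementations):

    #include <stdint.h>
    #include <stdbool.h>

    static inline bool my_atomic_cmpset_64 (volatile int64_t *addr,
                                            int64_t oldval, int64_t newval)
    {
        /* strong compare-and-swap with sequentially consistent ordering */
        return __atomic_compare_exchange_n (addr, &oldval, newval, false,
                                            __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }

    static inline int64_t my_atomic_swap_64 (volatile int64_t *addr, int64_t newval)
    {
        return __atomic_exchange_n (addr, newval, __ATOMIC_SEQ_CST);
    }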

-Nathan

> On Aug 30, 2016, at 4:42 PM, Paul Hargrove  wrote:
> 
> I don't recall the details of the last discussion over which CPU 
> architectures would be dropped effective when.
> However, apparently IA64 support is still present in both 2.0.1rc2 and master
> 
> I suspect that I am currently the only member of this community with the 
> ability to test IA64.
> So, I thought I should let you know that the owners of the 4-CPU IA64-based 
> Altix I've been using are planning to retire it.
> No date has been set that I know of, but "soon".
> 
> If any of you are interested in receiving a donation of a 2U rack-mount Altix 
> system, I can make the necessary introductions.
> I have already declined the offer.
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] C89 support

2016-08-30 Thread Nathan Hjelm

The best way to put this is his compiler defaults to --std=gnu89. That gives 
him about 90% of what we require from C99 but has weirdness like __restrict. 
The real solution is the list of functions that are called out on link and spot 
fixing with the gnu_inline attribute if -fgnu89-inline does not work.
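
For context on why the linker complains at all, here is a toy illustration of the gnu89 vs C99 inline difference and the per-function gnu_inline spot fix mentioned here (this is not Open MPI or glibc code):

    /* gnu89:  "extern inline" = the body is for inlining only; this
     *         translation unit never emits an out-of-line definition.
     * C99:    "extern inline" (as a definition) = this translation unit
     *         provides the one external definition.
     * A header written for gnu89 semantics therefore produces either
     * multiple-definition errors or unresolved calls ("undefined
     * reference") when compiled as strict C99. */

    /* spot fix for a single function: force gnu89 inline semantics even in
     * C99 mode (GCC/clang extension) */
    __attribute__((__gnu_inline__))
    extern inline int add_one (int x)
    {
        return x + 1;
    }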

-Nathan

On Aug 30, 2016, at 09:23 AM, "r...@open-mpi.org"  wrote:

Chris

At the risk of being annoying, it would really help me if you could answer my 
question: is Gilles correct in his feeling that we are looking at a scenario 
where you can support 90% of C99 (e.g., C99-style comments, named structure 
fields), and only the things modified in this PR are required?

I’m asking because these changes are minor and okay, but going back thru the 
code to revise all the comments and other C99isms would be a rather large task.


On Aug 30, 2016, at 7:06 AM, C Bergström  wrote:

On Tue, Aug 30, 2016 at 9:20 PM, Jeff Squyres (jsquyres)
 wrote:
On Aug 29, 2016, at 11:42 PM, C Bergström  wrote:

Paul - Is this your typical post? I can't tell if you're trying to be
rude or it's accidental.

I believe that multiple people on this thread are reacting to the 
passive-aggressive tones and negative connotations in charged replies.

Total bullshit - If any of my replies were "charged", passive
aggressive or otherwise that was not my intention. Anyone who I
thought has replied rudely, I have called out directly and I don't
mince words.

I'm not interested to spend 50 replies on 3 small patches. If you guys
don't care about platform X, Foo compiler or older standards I respect
that. My 1st email started with what I consider a humble tone. My
patches are public and I've given all the details I have time for.

Last try


I'd like to see:

1. The specific link error that we're talking about.

As posted before - the error is *exactly* the same as in the public
clang bug report.

(Thanks to Nathan)
https://webcache.googleusercontent.com/search?q=cache:p2WZm7Vlt2gJ:https://llvm.org/bugs/show_bug.cgi%3Fid%3D5960+=1=en=clnk=us


2. All the information listed on https://www.open-mpi.org/community/help/ for 
compile/build problems.

I'm not going to shift threw this wall of text to try to figure out
what you feel is missing. (Now my tone is "curt" if I have to be
clear)


3. More complete patches for fixing the issues. Specifically, the 3 provided 
patches fix certain issues in some parts of the code base, but the same issues 
occur in other places in the code base. As such, the provided patches are not 
complete.

The patches against 1.x are complete. If you want to test and fix 2.x
branch or git master feel free to pull my patches when I push them to
our github.

You can verify the patches with clang and SLES10. In the near future
it's likely I'll even post prebuilt binaries of clang which could be
used for easier validation. There's also of course the nightly EKOPath
builds that are available.. etc etc

In parting - I will test LDFLAGS|CFLAGS=“-fgnu89-inline” and if it
does indeed fix the issue without side effects I'll let you guys know.
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] C89 support

2016-08-29 Thread Nathan Hjelm
Looking at the bug in google cache 
(https://webcache.googleusercontent.com/search?q=cache:p2WZm7Vlt2gJ:https://llvm.org/bugs/show_bug.cgi%3Fid%3D5960+=1=en=clnk=us)
 then isn’t the answer to just use -fgnu89-inline on this platform? Does that 
not solve the linking issue? From what I can tell gcc activates this hack by 
default if it detects a bad glibc version.

Would have been helpful to have this bug be mentioned from the get go. We can 
try to workaround the inline issue but we needed to know that was what was 
going on. Can you try with ./configure CFLAGS=“-fgnu89-inline” and see if it 
works? If not can you send the link failure?

-Nathan

> On Aug 29, 2016, at 9:42 PM, C Bergström  wrote:
> 
> On Tue, Aug 30, 2016 at 5:49 AM, Paul Hargrove  wrote:
>> 
>> On Mon, Aug 29, 2016 at 8:32 AM, C Bergström 
>> wrote:
>> [...snip...]
>>> 
>>> Based on the latest response - it seems that we'll just fork OMPI and
>>> maintain those patches on top. I'll advise our customers not to use
>>> OMPI and document why.
>>> 
>>> Thanks again
>>> ___
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>> 
>> 
>> 
>> Though I participate on this list, I am not one of the Open MPI developers,
>> and do not pretend to speak for them.
>> 
>> So, speaking only for myself, I already recommend that users of any recent
>> Open MPI avoid compiling it using the PathScale compilers.
>> My own testing shows that both ekopath-5.0.5 and ekopath-6.0.527 experience
>> Internal Compiler Errors or SEGVs when building Open MPI, and at least one
>> other package I care about (GASNet).
>> So I think you can understand why I find it ironic that PathScale should
>> request that the Open MPI sources revert to C89 to support PathScale
>> compilers for an EOL distro.
> 
> Paul - Is this your typical post? I can't tell if you're trying to be
> rude or it's accidental.
> 
> Moving your complaint to more technical points
> #0 As stated before this issue is not exclusive to PathScale, but
> inherited from clang and root caused by glibc.
> 
> A forum post with a similar complaint/question
> http://clang-developers.42468.n3.nabble.com/minimum-glibc-on-Linux-needed-to-work-with-clang-in-c99-mode-td2093917.html
> 
> clang bugzilla is currently limited access, but when back to public
> you can get more details here
> https://llvm.org/bugs/show_bug.cgi?id=5960
> 
> 
> Again thanks for hijacking the thread, but in regards to your issue
> #1 Have you tested a newer version? (You appear to be more than a year
> off in versions and not on anything officially supported)
> 
> #2 Have you ever filed a support request with us?
> 
> #3 You should realize that we're in the process of trying to setup
> versions of OpenMPI that are validated and 100% tested. (Thus trying
> to avoid problems like this going forward)
> 
> I have no problem taking a hit on a bug or some issue, but I would
> hope that anyone an ironic sense of humor would fact check before
> complaining publicly.
> 
> My motivation isn't driven by some deficiency with our c99 support,
> but an older platform. If I tried to build this ${_} on SLES11 it
> wouldn't be a problem.
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] C89 support

2016-08-27 Thread Nathan Hjelm
We do not depend on any C99 specific behavior out of libc that I know of. We 
depend on the types (stdint.h) and syntax (sub-object naming, variadic macros, 
etc). A little surprised there are any linking failures with Open MPI even with 
an ancient glibc.
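
The C99 pieces in question (stdint.h types, designated sub-object initializers, variadic macros) fit in a few lines; a small illustration, not Open MPI code:

    #include <stdint.h>
    #include <stdio.h>

    struct settings { int verbose; uint32_t flags; };

    /* designated ("sub-object") initializer */
    static struct settings defaults = { .verbose = 1, .flags = 0 };

    /* C99 variadic macro */
    #define LOG(...) fprintf (stderr, __VA_ARGS__)

    int main (void)
    {
        LOG ("verbose=%d flags=%u\n", defaults.verbose, (unsigned) defaults.flags);
        return 0;
    }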

If the patch is simple please send it to us and we will take a look. If it 
doesn’t disrupt anything we will consider it.

-Nathan

> On Aug 27, 2016, at 7:41 AM, cbergst...@pathscale.com wrote:
> 
> It's well documented that the version of glibc that goes with SLES10 is not 
> c99. As well as that gcc's claimed c99 is not in fact conformant. Newer glibc 
> fixed this but SLES10 is stuck. I can provide exact documentation links if 
> necessary. 
> 
> Clang and any real c99 compiler fails at link time.
> 
> This effects all versions of clang or us up to svn trunk.
> 
> The patch is simple and non-performance impacting.
> 
>   Original Message  
> From: Nathan Hjelm
> Sent: Saturday, August 27, 2016 20:23
> To: Open MPI Developers
> Reply To: Open MPI Developers
> Subject: Re: [OMPI devel] C89 support
> 
> Considering gcc more or less had full C99 support in 3.1 (2002) and SLES10 
> dates back to 2004 I find this surprising. Clangs goal from the beginning was 
> full C99 support. Checking back it looks like llvm 1.0 (2003) had C99 
> support. What version of clang/llvm are you using?
> 
> -Nathan
> 
>> On Aug 27, 2016, at 6:38 AM, C Bergström <cbergst...@pathscale.com> wrote:
>> 
>> I realize a number of changes have been made to make the codebase C99.
>> As I'm setting up more testing platforms, I found that this caused
>> Clang (and us) to be broken on SLES10. While I realize that platform
>> is quite *old*, it is still used in production at more than one sight
>> which we support. If there isn't a strong feeling against it, would
>> you guys accept a patch to get this building again..
>> 
>> Thanks
>> ___
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] C89 support

2016-08-27 Thread Nathan Hjelm
Considering gcc more or less had full C99 support in 3.1 (2002) and SLES10 
dates back to 2004 I find this surprising. Clangs goal from the beginning was 
full C99 support. Checking back it looks like llvm 1.0 (2003) had C99 support. 
What version of clang/llvm are you using?

-Nathan

> On Aug 27, 2016, at 6:38 AM, C Bergström  wrote:
> 
> I realize a number of changes have been made to make the codebase C99.
> As I'm setting up more testing platforms, I found that this caused
> Clang (and us) to be broken on SLES10. While I realize that platform
> is quite *old*, it is still used in production at more than one sight
> which we support. If there isn't a strong feeling against it, would
> you guys accept a patch to get this building again..
> 
> Thanks
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] Upgrading our (openSUSE) Open MPI version

2016-08-25 Thread Nathan Hjelm
__malloc_initialize_hook got “poisoned” in a newer release of glibc. We 
disabled use of that symbol in 2.0.x. It might be worth adding to 1.10.4 as 
well.

-Nathan

> On Aug 25, 2016, at 8:09 AM, Karol Mroz  wrote:
> 
> __malloc_initialize_hook

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] Coll/sync component missing???

2016-08-19 Thread Nathan Hjelm
> On Aug 19, 2016, at 4:24 PM, r...@open-mpi.org wrote:
> 
> Hi folks
> 
> I had a question arise regarding a problem being seen by an OMPI user - has 
> to do with the old bugaboo I originally dealt with back in my LANL days. The 
> problem is with an app that repeatedly hammers on a collective, and gets 
> overwhelmed by unexpected messages when one of the procs falls behind.

I did some investigation on roadrunner several years ago and determined that 
the user code issue coll/sync was attempting to fix was due to a bug in 
ob1/cksum (really can’t remember). coll/sync was simply masking a live-lock 
problem. I committed a workaround for the bug in r26575 
(https://github.com/open-mpi/ompi/commit/59e529cf1dfe986e40d14ec4d2a2e5ef0cea5e35)
 and tested it with the user code. After this change the user code ran fine 
without coll/sync. Since lanl no longer had any users of coll/sync we stopped 
supporting it.

> I solved this back then by introducing the “sync” component in ompi/mca/coll, 
> which injected a barrier operation every N collectives. You could even “tune” 
> it by doing the injection for only specific collectives.
> 
> However, I can no longer find that component in the code base - I find it in 
> the 1.6 series, but someone removed it during the 1.7 series.
> 
> Can someone tell me why this was done??? Is there any reason not to bring it 
> back? It solves a very real, not uncommon, problem.
> Ralph

This was discussed during one (or several) tel-cons years ago. We agreed to 
kill it and bring it back if there is 1) a use case, and 2) someone is willing 
to support it. See 
https://github.com/open-mpi/ompi/commit/5451ee46bd6fcdec002b333474dec919475d2d62
 .

Can you link the user email?

-Nathan
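
As a user-level illustration of the approach described above (a hypothetical wrapper in application code, not the coll/sync component itself): inject a barrier every N calls of a hot collective so a lagging rank is not buried in unexpected messages.

    #include <mpi.h>

    #define SYNC_EVERY 100   /* hypothetical tuning knob */

    static int throttled_allreduce (const void *sendbuf, void *recvbuf, int count,
                                    MPI_Datatype dtype, MPI_Op op, MPI_Comm comm)
    {
        static int calls = 0;

        if (++calls % SYNC_EVERY == 0) {
            MPI_Barrier (comm);   /* let slow ranks catch up before continuing */
        }
        return MPI_Allreduce (sendbuf, recvbuf, count, dtype, op, comm);
    }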
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-09 Thread Nathan Hjelm
>>>>>>>> Hi, here is the gdb output for additional information:
>>>>>>>> 
>>>>>>>> (It might be inexact, because I built openmpi-2.0.0 without debug
>>>>> option)
>>>>>>>> 
>>>>>>>> Core was generated by `osu_bw'.
>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>> #0 0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1
>>>>>>>> (gdb) where
>>>>>>>> #0 0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1
>>>>>>>> #1 0x0031d9008934 in _Unwind_Backtrace ()
>>>>> from /lib64/libgcc_s.so.1
>>>>>>>> #2 0x0037ab8e5ee8 in backtrace () from /lib64/libc.so.6
>>>>>>>> #3 0x2ad882bd4345 in opal_backtrace_print ()
>>>>>>>> at ./backtrace_execinfo.c:47
>>>>>>>> #4 0x2ad882bd1180 in show_stackframe () at ./stacktrace.c:331
>>>>>>>> #5 
>>>>>>>> #6 mca_pml_ob1_recv_request_schedule_once ()
>>>>> at ./pml_ob1_recvreq.c:983
>>>>>>>> #7 0x2aaab412f47a in mca_pml_ob1_recv_request_progress_rndv ()
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
> from /home/mishima/opt/mpi/openmpi-2.0.0-pgi16.5/lib/openmpi/mca_pml_ob1.so
>>>>>>>> #8 0x2aaab412c645 in mca_pml_ob1_recv_frag_match ()
>>>>>>>> at ./pml_ob1_recvfrag.c:715
>>>>>>>> #9 0x2aaab412bba6 in mca_pml_ob1_recv_frag_callback_rndv ()
>>>>>>>> at ./pml_ob1_recvfrag.c:267
>>>>>>>> #10 0x2f2748d3 in mca_btl_vader_poll_handle_frag ()
>>>>>>>> at ./btl_vader_component.c:589
>>>>>>>> #11 0x2f274b9a in mca_btl_vader_component_progress ()
>>>>>>>> at ./btl_vader_component.c:231
>>>>>>>> #12 0x2ad882b916fc in opal_progress () at
>>>>> runtime/opal_progress.c:224
>>>>>>>> #13 0x2ad8820a9aa5 in ompi_request_default_wait_all () at
>>>>>>>> request/req_wait.c:77
>>>>>>>> #14 0x2ad8820f10dd in PMPI_Waitall () at ./pwaitall.c:76
>>>>>>>> #15 0x00401108 in main () at ./osu_bw.c:144
>>>>>>>> 
>>>>>>>> Tetsuya
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 2016/08/08 12:34:57, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
>>>>>>>> Hi, it caused segfault as below:
>>>>>>>> [manage.cluster:25436] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
>>>>>>>> [manage.cluster:25436] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
>>>>>>>> # OSU MPI Bandwidth Test v3.1.1
>>>>>>>> # Size      Bandwidth (MB/s)
>>>>>>>> 1           2.23
>>>>>>>> 2           4.51
>>>>>>>> 4           8.99
>>>>>>>> 8           17.83
>>>>>>>> 16          35.18
>>>>>>>> 32          69.66
>>>>>>>> 64          109.84
>>>>>>>> 128         179.65
>>>>>>>> 256         303.52
>>>>>>>> 512         532.81
>>>>>>>> 1024        911.74
>>>>>>>> 2048        1605.29
>>>>>>>> 4096        1598.73
>>>>>>>> 8192        2135.94
>>>>>>>> 16384       2468.98
>>>>>>>> 32768       2818.37
>>>>>>>> 65536       3658.83
>>>>>>>> 131072      4200.50
>>>>>>>> 262144      4545.01
>>>>>>>> 524288      4757.84
>>>>>>>> 1048576     4831.75
>>>>>>>> [manage:25442] *** Process received signal ***
>>>>>>>> [manage:25442] Signal: Segmentation fault (11)
>>>>>>>> [manage:25442] Signal code: Address not mapped (1)
>>>>>>>> [manage:25442] Failing at address: 0x8
>>>>>>>> --
>>>>>>>> mpirun noticed that process rank 1 with PID 0 on node manage exited on signal 11 (Segmentation fault).
>>>>>>>> 
>>>>>>> 

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-08 Thread Nathan Hjelm
gt;>> Let me know if this works:>> diff --git
>>>>>> a/ompi/mca/pml/ob1/pml_ob1_rdma.cb/ompi/mca/pml/ob1/pml_ob1_rdma.c>
>>> index
>>>>> 888e126..a3ec6f8 100644> --- a/ompi/mca/pml/ob1/pml_ob1_rdma.c> +++
>>>>> b/ompi/mca/pml/ob1/pml_ob1_rdma.c> @@ -42,6 +42,7 @@
>>>>>> size_t mca_pml_ob1_rdma_btls(> mca_pml_ob1_com_btl_t* rdma_btls)> {>
>>> int
>>>>> num_btls = mca_bml_base_btl_array_get_size(_endpoint->btl_rdma);
>>>>>>> + int num_eager_btls = mca_bml_base_btl_array_get_size
>>> (_endpoint->
>>>>> btl_eager);> double weight_total = 0;> int num_btls_used = 0;>> @@
>>> -57,6
>>>>> +58,21 @@ size_t mca_pml_ob1_rdma_btls(>
>>>>>> (bml_endpoint->btl_rdma_index + n) % num_btls);>
>>>>> mca_btl_base_registration_handle_t *reg_handle = NULL;>
>>>>> mca_btl_base_module_t *btl = bml_btl->btl;> + bool ignore = true;> +>
>>> + /*
>>>>> do not use rdma
>>>>>> btls that are not in the eager list. thisis
>>>>>> necessary to avoid using> + * btls that exist on the endpoint only
> to
>>>>> support RMA. */> + for (int i = 0 ; i < num_eager_btls ; ++i) {> +
>>>>> mca_bml_base_btl_t *eager_btl
>>>>>> =mca_bml_base_btl_array_get_index (_endpoint->btl_eager, i);> +
> if
>>>>> (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {> + ignore =
>>> false;> +
>>>>> break;> + }> + }> +> + if (ignore) {> + continue;> +
>>>>>> }>> if (btl->btl_register_mem) {> /* do not use the RDMA protocol
> with
>>>>> this btl if 1) leave pinned isdisabled,> @@ -99,18 +115,34 @@ size_t
>>>>> mca_pml_ob1_rdma_pipeline_btls( mca_bml_base_endpoint_t*
>>>>>> bml_endpoint,> size_t size,> mca_pml_ob1_com_btl_t* rdma_btls )> {>
> -
>>> int
>>>>> i, num_btls = mca_bml_base_btl_array_get_size(_endpoint->
>>> btl_rdma);> +
>>>>> int num_btls = mca_bml_base_btl_array_get_size
>>>>>> (_endpoint->btl_rdma);> + int num_eager_btls =
>>>>> mca_bml_base_btl_array_get_size(_endpoint->btl_eager);> double
>>>>> weight_total = 0;> + int rdma_count = 0;>> - for(i = 0; i < num_btls
> &&
>>> i <
>>>>> 
>>>>>> mca_pml_ob1.max_rdma_per_request; i+
>>>>>> +) {> - rdma_btls[i].bml_btl => - mca_bml_base_btl_array_get_next
>>>>> (_endpoint->btl_rdma);> - rdma_btls[i].btl_reg = NULL;> + for(int
> i
>>> =
>>>>> 0; i < num_btls && i <mca_pml_ob1.max_rdma_per_request;
>>>>>> i++) {> + mca_bml_base_btl_t *bml_btl =
>>> mca_bml_base_btl_array_get_next
>>>>> (_endpoint->btl_rdma);> + bool ignore = true;> +> + for (int i =
>>> 0 ; i
>>>>> < num_eager_btls ; ++i) {> + mca_bml_base_btl_t
>>>>>> *eager_btl =mca_bml_base_btl_array_get_index (_endpoint->
>>> btl_eager,
>>>>> i);> + if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {> +
>>> ignore =
>>>>> false;> + break;> + }> + }>> - weight_total +=
>>>>>> rdma_btls[i].bml_btl->btl_weight;> + if (ignore) {> + continue;>
> + }>
>>> +>
>>>>> + rdma_btls[rdma_count].bml_btl = bml_btl;> + rdma_btls[rdma_count+
>>>>> +].btl_reg = NULL;> +> + weight_total +=
>>>>>> bml_btl->btl_weight;> }>> - mca_pml_ob1_calc_weighted_length
>>> (rdma_btls,
>>>>> i, size,weight_total);
>>>>>>> + mca_pml_ob1_calc_weighted_length (rdma_btls, rdma_count,
>>>>> size,weight_total);>> - return i;> + return rdma_count;> }>>>>> > On
>>> Aug 7,
>>>>> 2016, at 6:51 PM, Nathan Hjelm <hje...@me.com> wrote:> >> >
>>>>>> Looks like the put path probably needs a similar patch. Will
>>> sendanother
>>>>> patch soon.> >> >> On Aug 7, 2016, at 6:01 PM,
>>> tmish...@jcity.maeda.co.jp
>>>>> wrote:> >>> >> Hi,> >>> >> I applied the patch to
>>>>>> the file "pml_ob1_rdma.c" and ran osu_bwagain.
>>>>>>>>> Then, I still see the bad performance for larger size
&g

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-08-08 Thread Nathan Hjelm
> 0]]:[B/B/B/B/B/B][./././././.]
> # OSU MPI Bandwidth Test v3.1.1
> # Size      Bandwidth (MB/s)
> 1           2.23
> 2           4.51
> 4           8.99
> 8           17.83
> 16          35.18
> 32          69.66
> 64          109.84
> 128         179.65
> 256         303.52
> 512         532.81
> 1024        911.74
> 2048        1605.29
> 4096        1598.73
> 8192        2135.94
> 16384       2468.98
> 32768       2818.37
> 65536       3658.83
> 131072      4200.50
> 262144      4545.01
> 524288      4757.84
> 1048576     4831.75
> [manage:25442] *** Process received signal ***
> [manage:25442] Signal: Segmentation fault (11)
> [manage:25442] Signal code: Address not mapped (1)
> [manage:25442] Failing at address: 0x8
> --
> mpirun noticed that process rank 1 with PID 0 on node manage exited on signal 11 (Segmentation fault).
> --
> 
> Tetsuya Mishima
> 
> On 2016/08/08 10:12:05, "devel" wrote in "Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0":
>> This patch also modifies the put path. Let me know if this works:
>> 
>> diff --git a/ompi/mca/pml/ob1/pml_ob1_rdma.c b/ompi/mca/pml/ob1/pml_ob1_rdma.c
>> index 888e126..a3ec6f8 100644
>> --- a/ompi/mca/pml/ob1/pml_ob1_rdma.c
>> +++ b/ompi/mca/pml/ob1/pml_ob1_rdma.c
>> @@ -42,6 +42,7 @@ size_t mca_pml_ob1_rdma_btls(
>>      mca_pml_ob1_com_btl_t* rdma_btls)
>>  {
>>      int num_btls = mca_bml_base_btl_array_get_size(&bml_endpoint->btl_rdma);
>> +    int num_eager_btls = mca_bml_base_btl_array_get_size(&bml_endpoint->btl_eager);
>>      double weight_total = 0;
>>      int num_btls_used = 0;
>> 
>> @@ -57,6 +58,21 @@ size_t mca_pml_ob1_rdma_btls(
>>              (bml_endpoint->btl_rdma_index + n) % num_btls);
>>          mca_btl_base_registration_handle_t *reg_handle = NULL;
>>          mca_btl_base_module_t *btl = bml_btl->btl;
>> +        bool ignore = true;
>> +
>> +        /* do not use rdma btls that are not in the eager list. this is
>> +         * necessary to avoid using btls that exist on the endpoint only to
>> +         * support RMA. */
>> +        for (int i = 0 ; i < num_eager_btls ; ++i) {
>> +            mca_bml_base_btl_t *eager_btl = mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i);
>> +            if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {
>> +                ignore = false;
>> +                break;
>> +            }
>> +        }
>> +
>> +        if (ignore) {
>> +            continue;
>> +        }
>> 
>>          if (btl->btl_register_mem) {
>>              /* do not use the RDMA protocol with this btl if 1) leave pinned is disabled,
>> @@ -99,18 +115,34 @@ size_t mca_pml_ob1_rdma_pipeline_btls( mca_bml_base_endpoint_t* bml_endpoint,
>>      size_t size,
>>      mca_pml_ob1_com_btl_t* rdma_btls )
>>  {
>> -    int i, num_btls = mca_bml_base_btl_array_get_size(&bml_endpoint->btl_rdma);
>> +    int num_btls = mca_bml_base_btl_array_get_size(&bml_endpoint->btl_rdma);
>> +    int num_eager_btls = mca_bml_base_btl_array_get_size(&bml_endpoint->btl_eager);
>>      double weight_total = 0;
>> +    int rdma_count = 0;
>> 
>> -    for(i = 0; i < num_btls && i < mca_pml_ob1.max_rdma_per_request; i++) {
>> -        rdma_btls[i].bml_btl =
>> -            mca_bml_base_btl_array_get_next(&bml_endpoint->btl_rdma);
>> -        rdma_btls[i].btl_reg = NULL;
>> +    for(int i = 0; i < num_btls && i < mca_pml_ob1.max_rdma_per_request; i++) {
>> +        mca_bml_base_btl_t *bml_btl = mca_bml_base_btl_array_get_next (&bml_endpoint->btl_rdma);
>> +        bool ignore = true;
>> +
>> +        for (int i = 0 ; i < num_eager_btls ; ++i) {
>> +            mca_bml_base_btl_t *eager_btl = mca_bml_base_btl_array_get_index (&bml_endpoint->btl_eager, i);
>> +            if (eager_btl->btl_endpoint == bml_btl->btl_endpoint) {
>> +                ignore = false;
>> +                break;
>> +            }
>> +        }
>> 
>> -        weight_total += rdma_btls[i].bml_btl->btl_weight;
>> +        if (ignore) {
>> +            continue;
>> +        }
>> +
>> +        rdma_btls[rdma_count].bml_btl = bml_btl;
>> +        rdma_btls[rdma_count++].btl_reg = NULL;
>> +
>> +        weight_total += bml_btl->btl_weight;
>>      }
>> 
>> -    mca_pml_ob1_calc_weighted_length (rdma_btls, i, size, weight_total);
>> +    mca_pml_ob1_calc_weighted_length (rdma_btls, rdma_count, size, weight_total);
>> 
>> -    return i;
>> +    return rdma_count;
>>  }
>> 
>>> On Aug 7, 2016, at 6:51 PM, Nathan Hjelm <hje...@me.com> wrote:
>>> 
>>> Looks like the put path probably needs a similar patch. Will send another patch soon.
>>> 
>>>> On Aug 7, 2016, at 6:01 PM, tmish...@jcity.maeda.co.jp wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I applied the patch to the file "pml_ob1_rdma.c" and ran osu_bw again.
>>>> Then, I still see the bad performance for larger size (>=2097152).
>>>> 
>>>> [mishima@manage OMB-

Re: [OMPI devel] sm BTL performace of the openmpi-2.0.0

2016-07-26 Thread Nathan Hjelm
sm is deprecated in 2.0.0 and will likely be removed in favor of vader in 2.1.0.

This issue is probably this known issue: 
https://github.com/open-mpi/ompi-release/pull/1250

Please apply those commits and see if it fixes the issue for you.

-Nathan

> On Jul 26, 2016, at 6:17 PM, tmish...@jcity.maeda.co.jp wrote:
> 
> Hi Gilles,
> 
> Thanks. I ran again with --mca pml ob1 but I've got the same results as
> below:
> 
> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -bind-to
> core -report-bindings osu_bw
> [manage.cluster:18142] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> [B/././././.][./././././.]
> [manage.cluster:18142] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> [./B/./././.][./././././.]
> # OSU MPI Bandwidth Test v3.1.1
> # SizeBandwidth (MB/s)
> 1 1.48
> 2 3.07
> 4 6.26
> 812.53
> 16   24.33
> 32   49.03
> 64   83.46
> 128 132.60
> 256 234.96
> 512 420.86
> 1024842.37
> 2048   1231.65
> 4096264.67
> 8192472.16
> 16384   740.42
> 32768  1030.39
> 65536  1191.16
> 131072 1269.45
> 262144 1238.33
> 524288 1247.97
> 10485761257.96
> 20971521274.74
> 41943041280.94
> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl
> self,sm -bind-to core -report-bindings osu_b
> w
> [manage.cluster:18204] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> [B/././././.][./././././.]
> [manage.cluster:18204] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> [./B/./././.][./././././.]
> # OSU MPI Bandwidth Test v3.1.1
> # SizeBandwidth (MB/s)
> 1 0.52
> 2 1.05
> 4 2.08
> 8 4.18
> 168.21
> 32   16.65
> 64   32.60
> 128  66.70
> 256 132.45
> 512 269.27
> 1024504.63
> 2048819.76
> 4096874.54
> 8192   1447.11
> 16384  2263.28
> 32768  3236.85
> 65536  3567.34
> 131072 3555.17
> 262144 3455.76
> 524288 3441.80
> 10485763505.30
> 20971523534.01
> 41943043546.94
> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl
> self,sm,openib -bind-to core -report-binding
> s osu_bw
> [manage.cluster:18218] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> [B/././././.][./././././.]
> [manage.cluster:18218] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> [./B/./././.][./././././.]
> # OSU MPI Bandwidth Test v3.1.1
> # SizeBandwidth (MB/s)
> 1 0.51
> 2 1.03
> 4 2.05
> 8 4.07
> 168.14
> 32   16.32
> 64   32.98
> 128  63.70
> 256 126.66
> 512 252.61
> 1024480.22
> 2048810.54
> 4096290.61
> 8192512.49
> 16384   764.60
> 32768  1036.81
> 65536  1182.81
> 131072 1264.48
> 262144 1235.82
> 524288 1246.70
> 10485761254.66
> 20971521274.64
> 41943041280.65
> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl
> self,openib -bind-to core -report-bindings osu_bw
> [manage.cluster:18276] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> [B/././././.][./././././.]
> [manage.cluster:18276] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> [./B/./././.][./././././.]
> # OSU MPI Bandwidth Test v3.1.1
> # Size      Bandwidth (MB/s)
> 1                 0.54
> 2                 1.08
> 4                 2.18
> 8                 4.33
> 16                8.69
> 32               17.39
> 64               34.34
> 128              66.28
> 256             130.36
> 512             241.81
> 1024            429.86
> 2048            553.44
> 4096            707.14
> 8192            879.60
> 16384           763.02
> 32768          1042.89
> 65536          1185.45
> 131072         1267.56
> 262144         1227.41
> 524288         1244.61
> 1048576        1255.66
> 2097152        1273.55
> 

Re: [OMPI devel] OpenMPI 2.0 and Petsc 3.7.2

2016-07-25 Thread Nathan Hjelm
It looks to me like a double free on both the send and receive requests. The receive 
free is an extra OBJ_RELEASE of MPI_DOUBLE, which was not malloced (invalid 
free). The send free is an assert failure in OBJ_RELEASE of an OBJ_NEW() object 
(invalid magic). I plan to look at it in the next couple of days. Let me know 
if you figure it out before I get to it.

-Nathan

> On Jul 25, 2016, at 8:38 PM, Gilles Gouaillardet  wrote:
> 
> Eric,
> 
> where can your test case be downloaded ? how many nodes and tasks do you need 
> to reproduce the bug ?
> 
> fwiw, currently there are two Open MPI repositories
> - https://github.com/open-mpi/ompi
>  there is only one branch and it is the 'master' branch; today, this can be seen 
> as Open MPI 3.0 pre-alpha
> - https://github.com/open-mpi/ompi-release
>  the default branch is 'v2.x'; today, this can be seen as Open MPI 2.0.1 
> pre-alpha
> 
> Cheers,
> 
> Gilles
> 
> On 7/26/2016 3:33 AM, Eric Chamberland wrote:
>> Hi,
>> 
>> has someone tried OpenMPI 2.0 with Petsc 3.7.2?
>> 
>> I am having some errors with petsc, maybe someone have them too?
>> 
>> Here are the configure logs for PETSc:
>> 
>> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_configure.log
>>  
>> 
>> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_RDict.log
>>  
>> 
>> And for OpenMPI:
>> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_config.log
>>  
>> 
>> (in fact, I am testing the ompi-release branch, a sort of petsc-master 
>> branch, since I need the commit 9ba6678156).
>> 
>> For a set of parallel tests, I have 104 that works on 124 total tests.
>> 
>> And the typical error:
>> *** Error in 
>> `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.dev':
>>  free(): invalid pointer:
>> === Backtrace: =
>> /lib64/libc.so.6(+0x7277f)[0x7f80eb11677f]
>> /lib64/libc.so.6(+0x78026)[0x7f80eb11c026]
>> /lib64/libc.so.6(+0x78d53)[0x7f80eb11cd53]
>> /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f80ea8f9d60] 
>> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f80df0ea628]
>> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f80df0eac50]
>> /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f80eb7029dd]
>> /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f80eb702ad6] 
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f80f2fa6c6d]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f80f2fa1c45]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0xa9d0f5)[0x7f80f35960f5]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f80f35c2588]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x10bf0f4)[0x7f80f3bb80f4]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f80f3a79fd9]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f80f3d1a334]
>>  
>> 
>> a similar one:
>> *** Error in 
>> `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProbFluideIncompressible.dev':
>>  free(): invalid pointer: 0x7f382a7c5bc0 ***
>> === Backtrace: =
>> /lib64/libc.so.6(+0x7277f)[0x7f3829f1c77f]
>> /lib64/libc.so.6(+0x78026)[0x7f3829f22026]
>> /lib64/libc.so.6(+0x78d53)[0x7f3829f22d53]
>> /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f38296ffd60] 
>> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f381deab628]
>> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f381deabc50]
>> /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f382a5089dd]
>> /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f382a508ad6] 
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f3831dacc6d]
>>  
>> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f3831da7c45]
>>  
>> 

Re: [OMPI devel] RFC: remove --disable-smp-locks

2016-07-12 Thread Nathan Hjelm

> On Jul 12, 2016, at 12:01 AM, Sreenidhi Bharathkar Ramesh 
>  wrote:
> 
> [ query regarding an old thread ]
> 
> Hi,
> 
> It looks like "--disable-smp-locks" is still available as an option.
> 
> 1. Will this be continued or deprecated ?

It was completely discontinued. The problem with the option is it made the 
opal_atomic_* functions non-atomic (on x86 at least). That is fine if there is 
only a single core in use on a node but caused problems with shared memory 
communication. The shared memory transports absolutely need the atomics to be 
atomic to work correctly. This RFC came up because users were trying to use the 
option and were running into issues. Made sense to just kill it.

> 2. Under what circumstances would "--disable-smp-locks" be useful ?
> In our experiments on ARM64 platform, it was seen that OSU Micro collective
> benchmarks actually degraded when "--disable-smp-locks" was used.  Hence,
> asking.

The optimization was likely meant for MPI_THREAD_SINGLE. The problem is we 
already optimize that case with the OPAL_THREAD_* macros which use atomics only 
if threads are in use. Others may be able to explain the intent.

-Nathan
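
A minimal standalone sketch of the OPAL_THREAD_* pattern described above: the update only goes through a real hardware atomic when threads are in use, which is why a separate --disable-smp-locks knob buys nothing for MPI_THREAD_SINGLE. The names below are illustrative stand-ins, not the exact Open MPI symbols.

    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed helpers: a flag reporting whether multiple threads are active,
     * and a true hardware atomic add (e.g. lock xadd on x86). */
    extern bool    using_threads(void);
    extern int32_t atomic_add_32(volatile int32_t *addr, int32_t delta);

    /* Single-threaded runs take the plain store; threaded runs pay for the atomic. */
    static inline int32_t thread_add_32(volatile int32_t *addr, int32_t delta)
    {
        if (using_threads()) {
            return atomic_add_32(addr, delta);
        }
        *addr += delta;
        return *addr;
    }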



Re: [OMPI devel] BML/R2 error

2016-07-03 Thread Nathan Hjelm
It's correct as written. The && takes precedence over the ||, and the statement 
gets evaluated in the order I intended. I will add the parentheses to quiet the 
warning when I get a chance.
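
A small self-contained illustration of that precedence point (not the actual bml_r2.c condition): && binds tighter than ||, so the added parentheses only silence -Wparentheses without changing the result.

    #include <assert.h>

    /* The two forms evaluate identically for every combination of inputs. */
    static int implicit_form(int in_use, int flags, int fallback)
    {
        return in_use && flags || fallback;    /* what the warning points at */
    }

    static int explicit_form(int in_use, int flags, int fallback)
    {
        return (in_use && flags) || fallback;  /* same logic, warning-free */
    }

    int main(void)
    {
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                for (int c = 0; c < 2; c++)
                    assert(implicit_form(a, b, c) == explicit_form(a, b, c));
        return 0;
    }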

> On Jul 3, 2016, at 9:01 AM, Ralph Castain  wrote:
> 
> I agree with the compiler - I can’t figure out exactly what was meant here 
> either:
> 
> bml_r2.c: In function ‘mca_bml_r2_endpoint_add_btl’:
> bml_r2.c:271:21: warning: suggest parentheses around ‘&&’ within ‘||’ 
> [-Wparentheses]
> if ((btl_in_use && (btl_flags & MCA_BTL_FLAGS_RDMA) ||
>  ~~~^~~
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/07/19150.php


Re: [OMPI devel] 2.0.0rc3 MPI_Comm_split_type()

2016-06-16 Thread Nathan Hjelm

https://github.com/open-mpi/ompi/pull/1788

On Jun 16, 2016, at 05:16 PM, Nathan Hjelm <hje...@me.com> wrote:

Not sure why that happened, but it is indeed a regression. Will submit a fix now.

-Nathan

On Jun 16, 2016, at 02:19 PM, Lisandro Dalcin <dalc...@gmail.com> wrote:

Could you please check/confirm you are supporting passing
split_type=MPI_UNDEFINED to MPI_Comm_split_type() ? IIRC, this is a
regression from 2.0.0rc2.

$ cat test-comm-split-type.py
from mpi4py import MPI
subcomm = MPI.COMM_WORLD.Split_type(MPI.UNDEFINED)
assert subcomm == MPI.COMM_NULL

$ mpiexec -n 1 python test-comm-split-type.py
Traceback (most recent call last):
File "test-comm-split-type.py", line 2, in 
subcomm = MPI.COMM_WORLD.Split_type(MPI.UNDEFINED)
File "MPI/Comm.pyx", line 214, in mpi4py.MPI.Comm.Split_type
(src/mpi4py.MPI.c:95252)
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
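
For reference, a C equivalent of the mpi4py reproducer above (a sketch added here, not code from the thread): the MPI standard requires MPI_COMM_NULL to come back when split_type is MPI_UNDEFINED, so the MPI_ERR_ARG shown in the traceback is the regression.

    #include <assert.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm sub;
        MPI_Init(&argc, &argv);
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_UNDEFINED, 0, MPI_INFO_NULL, &sub);
        assert(MPI_COMM_NULL == sub);   /* 2.0.0rc3 raised MPI_ERR_ARG here instead */
        MPI_Finalize();
        return 0;
    }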


--
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459
___
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/06/19123.php
___
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/06/19124.php

Re: [OMPI devel] 2.0.0rc3 MPI_Comm_split_type()

2016-06-16 Thread Nathan Hjelm

Not sure why that happened, but it is indeed a regression. Will submit a fix now.

-Nathan

On Jun 16, 2016, at 02:19 PM, Lisandro Dalcin  wrote:

Could you please check/confirm you are supporting passing
split_type=MPI_UNDEFINED to MPI_Comm_split_type() ? IIRC, this is a
regression from 2.0.0rc2.

$ cat test-comm-split-type.py
from mpi4py import MPI
subcomm = MPI.COMM_WORLD.Split_type(MPI.UNDEFINED)
assert subcomm == MPI.COMM_NULL

$ mpiexec -n 1 python test-comm-split-type.py
Traceback (most recent call last):
File "test-comm-split-type.py", line 2, in 
subcomm = MPI.COMM_WORLD.Split_type(MPI.UNDEFINED)
File "MPI/Comm.pyx", line 214, in mpi4py.MPI.Comm.Split_type
(src/mpi4py.MPI.c:95252)
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind


--
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459
___
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/06/19123.php


Re: [OMPI devel] Open MPI v2.0.0rc3 now available

2016-06-16 Thread Nathan Hjelm
Will track this one down as well tomorrow.

-Nathan

> On Jun 15, 2016, at 7:13 PM, Paul Hargrove  wrote:
> 
> With a PPC64/Fedora20/gcc-4.8.3 system configuring for "-m32":
> 
> configure --prefix=[...] --enable-debug \
> CFLAGS=-m32 --with-wrapper-cflags=-m32 \
> CXXFLAGS=-m32 --with-wrapper-cxxflags=-m32 \
> FCFLAGS=-m32 --with-wrapper-fcflags=-m32 --disable-mpi-fortran
> 
> Build fails with
> /bin/sh ../../../libtool  --tag=CC   --mode=link gcc -std=gnu99  -m32 -g 
> -finline-functions -fno-strict-aliasing -pthread   -o opal_wrapper 
> opal_wrapper.o ../../../opal/libopen-pal.la -lrt -lm -lutil
> libtool: link: gcc -std=gnu99 -m32 -g -finline-functions -fno-strict-aliasing 
> -pthread -o .libs/opal_wrapper opal_wrapper.o  
> ../../../opal/.libs/libopen-pal.so -ldl -lrt -lm -lutil -pthread -Wl,-rpath 
> -Wl,/home/phargrov/OMPI/openmpi-2.0.0rc3-linux-ppc32-gcc/INST/lib
> ../../../opal/.libs/libopen-pal.so: undefined reference to `OPAL_THREAD_ADD64'
> collect2: error: ld returned 1 exit status
> 
> -Paul
> 
> On Wed, Jun 15, 2016 at 3:19 PM, Howard Pritchard  wrote:
> We are now feature complete for v2.0.0 and would appreciate testing by 
> developers and end users before we finalize the v2.0.0 release.  In that 
> light, v2.0.0rc3 is now available:
> 
> https://www.open-mpi.org/software/ompi/v2.x/
> 
> Here are the changes since 2.0.0rc2:
> 
> - The MPI C++ bindings -- which were removed from the MPI standard in
>v3.0 -- are no longer built by default and will be removed in some
>future version of Open MPI.  Use the --enable-mpi-cxx-bindings
>configure option to build the deprecated/removed MPI C++ bindings.
> 
> --> NOTE: this is not new, actually -- but we just added it to the NEWS file.
> 
> - In environments where mpirun cannot automatically determine the
>   number of slots available (e.g., when using a hostfile that does not
>   specify "slots", or when using --host without specifying a ":N"
>   suffix to hostnames), mpirun now requires the use of "-np N" to
>   specify how many MPI processes to launch.
> 
> - Many updates and fixes to the revamped memory hooks infrastructure
> 
> - Various configure-related compatibility updates for newer versions
>of libibverbs and OFED.
> 
> - Properly detect Intel TrueScale and OmniPath devices in the ACTIVE
>   state.  Thanks to Durga Choudhury for reporting the issue.
> 
> - Fix MPI_IREDUCE_SCATTER_BLOCK for a one-process communicator. Thanks
>to Lisandro Dalcin for reporting.
> 
> - Fix detection and use of Solaris Studio 12.5 (beta) compilers.
>Thanks to Paul Hargrove for reporting and debugging.
> 
> - Allow NULL arrays when creating empty MPI datatypes.
> 
> - Miscellaneous minor bug fixes in the hcoll component.
> 
> - Miscellaneous minor bug fixes in the ugni component.
> 
> - Fix various small memory leaks.
> 
> - Notable new MCA parameters:
> 
>-  opal_progress_lp_call_ration: Control how often low-priority
>   callbacks are made during Open MPI's main progress loop.
> 
> - Disable backtrace support by default in the PSM/PSM2 libraries to
>   prevent unintentional conflicting behavior.
> 
> Thanks,
> 
> Howard
> 
> --
> 
> Howard Pritchard
> HPC-DES
> Los Alamos National Laboratory
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/06/19103.php
> 
> 
> 
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/06/19104.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI devel] [2.0.0rc3] ppc64el/XLC crash in intercept_mmap()

2016-06-16 Thread Nathan Hjelm
Ok, that was working. Our PPC64 system is back up and I will finally be able to 
fix it tomorrow.

-Nathan

> On Jun 15, 2016, at 7:35 PM, Paul Hargrove  wrote:
> 
> Also seen now on a big-endian Power7 with XLC-13.1
> 
> -Paul
> 
> On Wed, Jun 15, 2016 at 6:20 PM, Paul Hargrove  wrote:
> On a little-endian Power8 with XLC-13.1.2 I see a crash not seen with 
> gcc-4.9.2:
> 
> make[4]: Entering directory 
> '/home/phargrov/OMPI/openmpi-2.0.0rc3-linux-ppc64el-xlc/BLD/ompi/debuggers'
> PASS: predefined_gap_test
> PASS: predefined_pad_test
> /home/phargrov/OMPI/openmpi-2.0.0rc3-linux-ppc64el-xlc/openmpi-2.0.0rc3/config/test-driver:
>  line 107: 69310 Segmentation fault  (core dumped) "$@" > $log_file 2>&1
> FAIL: dlopen_test
> 
> gdb shows:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x in ?? ()
> Missing separate debuginfos, use: debuginfo-install 
> bzip2-libs-1.0.6-14.fc21.ppc64le elfutils-libelf-0.163-1.fc21.ppc64le 
> elfutils-libs-0.163-1.fc21.ppc64le glibc-2.20-7.fc21.ppc64le 
> libgcc-4.9.2-1.fc21.ppc64le systemd-libs-216-17.fc21.ppc64le 
> xz-libs-5.1.2-14alpha.fc21.ppc64le zlib-1.2.8-7.fc21.ppc64le
> (gdb) where
> #0  0x in ?? ()
> #1  0x3fff82f8e8d8 in intercept_munmap (start=0x3fff82e5, 
> length=65536)
> at 
> /home/phargrov/OMPI/openmpi-2.0.0rc3-linux-ppc64el-xlc/openmpi-2.0.0rc3/opal/mca/memory/patcher/memory_patcher_component.c:155
> #2  0x3fff82b1bc80 in __GI__IO_setb () from /lib64/libc.so.6
> #3  0x3fff82b19528 in __GI__IO_file_close_it () from /lib64/libc.so.6
> #4  0x3fff82b07f74 in fclose@@GLIBC_2.17 () from /lib64/libc.so.6
> #5  0x1f7c in do_test ()
> at 
> /home/phargrov/OMPI/openmpi-2.0.0rc3-linux-ppc64el-xlc/openmpi-2.0.0rc3/ompi/debuggers/dlopen_test.c:97
> #6  0x100010e0 in main (argc=1, argv=0x37505048)
> at 
> /home/phargrov/OMPI/openmpi-2.0.0rc3-linux-ppc64el-xlc/openmpi-2.0.0rc3/ompi/debuggers/dlopen_test.c:135
> 
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 
> 
> 
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/06/19107.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI devel] BTL flags

2016-06-02 Thread Nathan Hjelm
Oops, my compiler didn’t catch that. Will fix that now.

> On Jun 2, 2016, at 7:07 PM, George Bosilca  wrote:
> 
> Nathan,
> 
> I see a lot of [for once valid] complains from clang regarding the last UGNI 
> related commit. More precisely the MCA_BTL_ATOMIC_SUPPORTS_FLOAT value is too 
> large with respect to the fact that ISO C restricts a enum to int.
> 
> Can we pack the enums ?
> 
> George.
> 
> 
> ../../../../../ompi/opal/mca/btl/btl.h:326:5: warning: ISO C restricts 
> enumerator values to range of 'int'
>   (2147483648 is too large) [-Wpedantic]
> MCA_BTL_ATOMIC_SUPPORTS_FLOAT  = 0x80000000,
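
A short standalone illustration of the constraint clang is enforcing (illustrative names, not the Open MPI enum): ISO C requires each enumerator to be representable as an int, and 0x80000000 is one past INT_MAX, so a flag that needs the top bit usually has to live outside the enum, for example as an unsigned macro.

    enum example_flags {
        EXAMPLE_FLAG_LOW  = 0x00000001,
        EXAMPLE_FLAG_HIGH = 0x40000000      /* still representable as int */
        /* EXAMPLE_FLAG_TOP = 0x80000000 would draw the -Wpedantic warning */
    };

    #define EXAMPLE_FLAG_TOP 0x80000000u    /* one common workaround */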
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/06/19064.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI devel] Seldom deadlock in mpirun

2016-06-02 Thread Nathan Hjelm
The osc hang is fixed by a PR to fix bugs in start in cm and ob1. See #1729.

-Nathan

> On Jun 2, 2016, at 5:17 AM, Gilles Gouaillardet 
>  wrote:
> 
> fwiw,
> 
> the onsided/c_fence_lock test from the ibm test suite hangs
> 
> (mpirun -np 2 ./c_fence_lock)
> 
> i ran a git bisect and it incriminates commit 
> b90c83840f472de3219b87cd7e1a364eec5c5a29
> 
> commit b90c83840f472de3219b87cd7e1a364eec5c5a29
> Author: bosilca 
> Date:   Tue May 24 18:20:51 2016 -0500
> 
> Refactor the request completion (#1422)
> 
> * Remodel the request.
> Added the wait sync primitive and integrate it into the PML and MTL
> infrastructure. The multi-threaded requests are now significantly
> less heavy and less noisy (only the threads associated with completed
> requests are signaled).
> 
> * Fix the condition to release the request.
> 
> 
> 
> 
> I also noted a warning is emitted when running only one task
> 
> ./c_fence_lock
> 
> but I did not git bisect, so that might not be related
> 
> Cheers,
> 
> 
> 
> Gilles
> 
> 
>> On Thursday, June 2, 2016, Ralph Castain  wrote:
>> Yes, please! I’d like to know what mpirun thinks is happening - if you like, 
>> just set the --timeout N --report-state-on-timeout flags and tell me what 
>> comes out
>> 
>>> On Jun 1, 2016, at 7:57 PM, George Bosilca  wrote:
>>> 
>>> I don't think it matters. I was running the IBM collective and pt2pt tests, 
>>> but each time it deadlocked was in a different test. If you are interested 
>>> in some particular values, I would be happy to attach a debugger next time 
>>> it happens.
>>> 
>>>   George.
>>> 
>>> 
 On Wed, Jun 1, 2016 at 10:47 PM, Ralph Castain  wrote:
 What kind of apps are they? Or does it matter what you are running?
 
 
 > On Jun 1, 2016, at 7:37 PM, George Bosilca  wrote:
 >
 > I have a seldom-occurring deadlock on an OS X laptop (if I use more than 
 > 2 processes). It is coming up once every 200 runs or so.
 >
 > Here is what I could gather from my experiments: All the MPI processes 
 > seem to have correctly completed (I get all the expected output and the 
 > MPI processes are in a waiting state), but somehow the mpirun does not 
 > detect their completion. As a result, mpirun never returns.
 >
 >   George.
 >
 > ___
 > devel mailing list
 > de...@open-mpi.org
 > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
 > Searchable archives: 
 > http://www.open-mpi.org/community/lists/devel/2016/06/19054.php
 
 ___
 devel mailing list
 de...@open-mpi.org
 Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
 Link to this post: 
 http://www.open-mpi.org/community/lists/devel/2016/06/19054.php
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2016/06/19055.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/06/19059.php


Re: [OMPI devel] contributing to Open MPI

2016-06-02 Thread Nathan Hjelm
The branch will be merged once we know make check succeeds on arm64 with 
--disable-builtin-atomics. Just waiting on that testing.

-Nathan

> On Jun 1, 2016, at 10:22 PM, Sreenidhi Bharathkar Ramesh 
> <sreenidhi-bharathkar.ram...@broadcom.com> wrote:
> 
> Thank you for the response.
> 
> Nathan,
> Sure, we will try out your branch and let you know.  Any idea when it is 
> likely to be pulled into master ?
> I presume that our changes, if any, will need to be on top of these changes.
> 
> Jeff,
> 
> A few clarifications please:
> 
> > a) run MTT regularly and submit the results to the community database
> 
> Is MTT required along with initial proposed branch ?   Also, what regularity 
> of MTT is required after submission ?
> 
> I apologize if these questions sound basic.  We are just getting started, 
> hence asking.
> 
> Thanks!
> - Sreenidhi.
> 
> On Tue, May 31, 2016 at 7:33 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> Let me address the questions Nathan didn't:
> 
> - testing: if you're adding support for a platform that we don't test, it's 
> likely that the code will grow stale and eventually be removed.  The best way 
> to make sure a platform stays supported is to a) run MTT regularly and submit 
> the results to the community database, and b) make sure that there are tests 
> that are exercising the code paths you're adding for your platform.
> 
> - submissions are added to release branches on a rolling basis.  You're too 
> late for the 2.0.x series, but the door is open for the v2.1.x series (and 
> beyond, of course).
> 
> 
> 
> > On May 30, 2016, at 11:15 AM, Nathan Hjelm <hje...@me.com> wrote:
> >
> > I should clarify. The PR adds support for ARM64 atomics and CMA when the 
> > linux headers are not installed. It does not update the timer code and 
> > still needs some testing.
> >
> > -Nathan
> >
> >> On May 30, 2016, at 8:37 AM, Nathan Hjelm <hje...@me.com> wrote:
> >>
> >> We already have a PR open to add ARM64 support. Please test 
> >> https://github.com/open-mpi/ompi/pull/1634 and let me know if it works for 
> >> you. Additional contributions are greatly appreciated!
> >>
> >> -Nathan
> >>
> >> On May 30, 2016, at 4:32 AM, Sreenidhi Bharathkar Ramesh 
> >> <sreenidhi-bharathkar.ram...@broadcom.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> We may be in a position to contribute to Open MPI, initially by adding 
> >>> ARM64 support.  Specifically, atomics and Timer support.
> >>>
> >>> I have already gone through:
> >>> https://www.open-mpi.org/community/contribute/
> >>> https://www.open-mpi.org/faq/?category=contributing
> >>>
> >>> Please let me know:
> >>>
> >>> 1. baseline for the patch
> >>> 2. test logs and results that are expected
> >>> 3. any cutoffs from release timeline perspective
> >>>
> >>> Thanks,
> >>> - Sreenidhi.
> >>> ___
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> Link to this post: 
> >>> http://www.open-mpi.org/community/lists/devel/2016/05/19048.php
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2016/05/19050.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/05/19051.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/06/19057.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI devel] contributing to Open MPI

2016-05-30 Thread Nathan Hjelm
I should clarify. The PR adds support for ARM64 atomics and CMA when the linux 
headers are not installed. It does not update the timer code and still needs 
some testing.

-Nathan

> On May 30, 2016, at 8:37 AM, Nathan Hjelm <hje...@me.com> wrote:
> 
> We already have a PR open to add ARM64 support. Please test 
> https://github.com/open-mpi/ompi/pull/1634 and let me know if it works for 
> you. Additional contributions are greatly appreciated!
> 
> -Nathan
> 
> On May 30, 2016, at 4:32 AM, Sreenidhi Bharathkar Ramesh 
> <sreenidhi-bharathkar.ram...@broadcom.com> wrote:
> 
>> Hello,
>> 
>> We may be in a position to contribute to Open MPI, initially by adding ARM64 
>> support.  Specifically, atomics and Timer support.
>> 
>> I have already gone through:
>> https://www.open-mpi.org/community/contribute/
>> https://www.open-mpi.org/faq/?category=contributing
>> 
>> Please let me know:
>> 
>> 1. baseline for the patch
>> 2. test logs and results that are expected
>> 3. any cutoffs from release timeline perspective
>> 
>> Thanks,
>> - Sreenidhi.
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2016/05/19048.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI devel] contributing to Open MPI

2016-05-30 Thread Nathan Hjelm
We already have a PR open to add ARM64 support. Please test 
https://github.com/open-mpi/ompi/pull/1634 and let me know if it works for you. 
Additional contributions are greatly appreciated!

-Nathan

> On May 30, 2016, at 4:32 AM, Sreenidhi Bharathkar Ramesh 
>  wrote:
> 
> Hello,
> 
> We may be in a position to contribute to Open MPI, initially by adding ARM64 
> support.  Specifically, atomics and Timer support.
> 
> I have already gone through:
> https://www.open-mpi.org/community/contribute/
> https://www.open-mpi.org/faq/?category=contributing
> 
> Please let me know:
> 
> 1. baseline for the patch
> 2. test logs and results that are expected
> 3. any cutoffs from release timeline perspective
> 
> Thanks,
> - Sreenidhi. 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/05/19048.php


Re: [OMPI devel] One-sided failures on master

2016-05-27 Thread Nathan Hjelm
Only thing I can think of is the request stuff. It was working last time I 
tested George’s branch. I will take a look at MTT tomorrow.

-Nathan

> On May 26, 2016, at 8:24 PM, Ralph Castain  wrote:
> 
> I’m seeing a lot of onesided hangs on master when trying to run an MTT scan 
> on it tonight - did something go in that might be having trouble?
> 
> Ralph
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/05/19042.php



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [OMPI devel] Process connectivity map

2016-05-16 Thread Nathan Hjelm

add_procs is always called at least once. This is how we set up shared
memory communication. It will then be invoked on-demand for non-local
peers with the reachability argument set to NULL (because the bitmask
doesn't provide any benefit when adding only 1 peer).

-Nathan
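
A rough sketch of that add_procs contract, with placeholder types (the mynic_* names and the simplified signature are assumptions for illustration, not the real BTL API): on the bulk add the BTL fills in the reachability bitmap, while in the on-demand single-peer case the bitmap argument is NULL and only the endpoint setup matters.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { bool *bits; size_t size; } reach_map_t;      /* stand-in */
    typedef struct { int peer; } mynic_endpoint_t;                /* stand-in */

    extern bool              mynic_can_reach(int peer);           /* assumed  */
    extern mynic_endpoint_t *mynic_endpoint_new(int peer);        /* assumed  */

    static int mynic_add_procs(size_t nprocs, const int *peers,
                               mynic_endpoint_t **endpoints,
                               reach_map_t *reachable /* NULL when on-demand */)
    {
        for (size_t i = 0; i < nprocs; ++i) {
            if (!mynic_can_reach(peers[i])) {
                endpoints[i] = NULL;                /* let another BTL take it */
                continue;
            }
            endpoints[i] = mynic_endpoint_new(peers[i]);
            if (NULL != reachable) {
                reachable->bits[i] = true;          /* report this peer as reachable */
            }
        }
        return 0;
    }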

On Tue, May 17, 2016 at 12:00:38AM +0900, Gilles Gouaillardet wrote:
>Jeff,
>this is not what I observed
>(tcp btl, 2 to 4 nodes with one task per node, cutoff=0)
>the add_procs of the tcp btl is invoked once with the 4 tasks.
>I checked the sources and found cutoff only controls if the modex is
>invoked once for all at init, or on demand.
>Cheers,
>Gilles
> 
>On Monday, May 16, 2016, Jeff Squyres (jsquyres) 
>wrote:
> 
>  We changed the way BTL add_procs is invoked on master and v2.x for
>  scalability reasons.
> 
>  In short: add_procs is only invoked the first time you talk to a given
>  peer.  The cutoff switch is an override to that -- if the sizeof
>  COMM_WORLD is less than the cutoff, we revert to the old behavior of
>  calling add_procs for all procs.
> 
>  As for why one BTL would be chosen over another, be sure to look at not
>  only the priority of the component/module, but also the exclusivity
>  level.  In short, only BTLs with the same exclusivity level will be
>  considered (e.g., this is how we exclude TCP when using HPC-class
>  networks), and then the BTL modules with the highest priority will be
>  used for a given peer.
> 
>  > On May 16, 2016, at 7:19 AM, Gilles Gouaillardet
>   wrote:
>  >
>  > it seems I misunderstood some things ...
>  >
>  > add_procs is always invoked, regardless the cutoff value.
>  > cutoff is used to retrieve processes info via the modex "on demand" vs
>  at init time.
>  >
>  > Please someone correct me and/or elaborate if needed
>  >
>  > Cheers,
>  >
>  > Gilles
>  >
>  > On Monday, May 16, 2016, Gilles Gouaillardet 
>  wrote:
>  > i cannot reproduce this behavior.
>  >
>  > note mca_btl_tcp_add_procs is invoked once per tcp component (e.g.
>  once per physical NIC)
>  >
>  > so you might want to explicitly select one nic
>  >
>  > mpirun --mca btl_tcp_if_include xxx ...
>  >
>  > my printf output are the same and regardless the mpi_add_procs_cutoff
>  value
>  >
>  >
>  > Cheers,
>  >
>  >
>  > Gilles
>  > On 5/16/2016 12:22 AM, dpchoudh . wrote:
>  >> Sorry, I accidentally pressed 'Send' before I was done writing the
>  last mail. What I wanted to ask was what is the parameter
>  mpi_add_procs_cutoff and why adding it seems to make a difference in the
>  code path but not in the end result of the program? How would it help me
>  debug my problem?
>  >>
>  >> Thank you
>  >> Durga
>  >>
>  >> The surgeon general advises you to eat right, exercise regularly and
>  quit ageing.
>  >>
>  >> On Sun, May 15, 2016 at 11:17 AM, dpchoudh . 
>  wrote:
>  >> Hello Gilles
>  >>
>  >> Setting -mca mpi_add_procs_cutoff 1024 indeed makes a difference to
>  the output, as follows:
>  >>
>  >> With -mca mpi_add_procs_cutoff 1024:
>  >> reachable = 0x1
>  >> (Note that add_procs was called once and the value of 'reachable is
>  correct')
>  >>
>  >> Without -mca mpi_add_procs_cutoff 1024
>  >> reachable = 0x0
>  >> reachable = NULL
>  >> reachable = NULL
>  >> (Note that add_procs() was caklled three times and the value of
>  'reachable' seems wrong.
>  >>
>  >> The program does run correctly in either case. The program listing is
>  as below (note that I have removed output from the program itself in the
>  above reporting.)
>  >>
>  >> The code that prints 'reachable' is as follows:
>  >>
>  >> if (reachable == NULL)
>  >> printf("reachable = NULL\n");
>  >> else
>  >> {
>  >> int i;
>  >> printf("reachable = ");
>  >> for (i = 0; i < reachable->array_size; i++)
>  >> printf("\t0x%llu", reachable->bitmap[i]);
>  >> printf("\n\n");
>  >> }
>  >> return OPAL_SUCCESS;
>  >>
>  >> And the code for the test program is as follows:
>  >>
>  >> #include <mpi.h>
>  >> #include <stdio.h>
>  >> #include <stdlib.h>
>  >> #include <string.h>
>  >>
>  >> int main(int argc, char *argv[])
>  >> {
>  >> int world_size, world_rank, name_len;
>  >> char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>  >>
>  >> MPI_Init(&argc, &argv);
>  >> MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>  >> MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>  >> MPI_Get_processor_name(hostname, &name_len);
>  >> printf("Hello world from processor %s, rank %d out of %d
>  

Re: [OMPI devel] Question about 'progress function'

2016-05-06 Thread Nathan Hjelm

The return code of your progress function should be related to the
activity (send, recv, put, get, etc completion) on your network. The
return is not really used right now but may be meaningful in the
future.

Your BTL signals progress through two mechanisms:

 1) Send completion is indicated by either your btl_send() function
 returning 1 (this indicates no calls to btl_progress() are needed and
 that the user buffer is no longer needed), your btl_sendi() function
 returning OPAL_SUCCESS, or you calling the send fragment's callback
 function. btl_send() is the minimum function needed but btl_sendi() can
 provide a faster path to putting a fragment on a network.

 2) Receive completion is indicated by calling a callback associated
 with a fragment's tag. This tag is supplied to btl_send() and
 btl_sendi() is usually sent with the fragment data (usually inline with
 the data). A typical progress function polls the network and on finding
 an incoming fragment, extracts the btl tag and calls the associated
 calback.

It is usually helpful to look at how other btl's work but you can also
find quite a bit of information in opal/mca/btl/btl.h.

-Nathan
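
A bare-bones sketch of the receive side of that contract, with hypothetical mynic_* names standing in for the real structures: the progress function polls the NIC, pulls the btl tag out of each arriving fragment, invokes the callback registered for that tag, and returns how much activity it found.

    #include <stddef.h>

    /* Placeholder types -- illustration only, not the real BTL interface. */
    typedef void (*recv_cb_t)(const void *data, size_t length);

    typedef struct mynic_frag {
        unsigned char tag;         /* btl tag carried inline with the data */
        const void   *data;
        size_t        length;
    } mynic_frag_t;

    extern mynic_frag_t *mynic_poll_recv(void);                      /* assumed NIC poll */
    extern recv_cb_t     mynic_callback_for_tag(unsigned char tag);  /* assumed lookup   */

    static int mynic_component_progress(void)
    {
        int completions = 0;
        mynic_frag_t *frag;

        /* Receive completion: hand each incoming fragment to the callback the
         * upper layer (e.g. ob1) registered for its tag. */
        while (NULL != (frag = mynic_poll_recv())) {
            recv_cb_t cb = mynic_callback_for_tag(frag->tag);
            cb(frag->data, frag->length);
            completions++;
        }
        return completions;        /* activity count, per the note on return codes */
    }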

On Fri, May 06, 2016 at 12:01:05AM -0400, dpchoudh . wrote:
>George
> 
>Thanks for your help. But what should the progress function return, so
>that the event is signalled? Right now I am returning a 1 when data has
>been transmitted and 0 otherwise, but that does not seem to work. Also,
>please keep in mind that the transport I am working on supports unreliable
>datagrams only, so there is no ack from the recipient to wait for.
> 
>Thanks again
>Durga
>The surgeon general advises you to eat right, exercise regularly and quit
>ageing.
>On Thu, May 5, 2016 at 11:33 PM, George Bosilca 
>wrote:
> 
>  Durga,
>  TCP doesn't need a specialized progress function because we are tied
>  directly with libevent. In your case you should provide a BTL progress
>  function, function that will be called at the end of libevent base loop
>  regularly.
>George.
>  On Thu, May 5, 2016 at 11:30 PM, dpchoudh .  wrote:
> 
>Hi all
> 
>Apologies for a 101 level question again, but here it is:
> 
>A new BTL layer I am implementing hangs in MPI_Send(). Please keep in
>mind that at this stage, I am simply desperate to make MPI data move
>through this fabric in any way possible, so I have thrown all good
>programming practice out of the window and in the process might have
>added bugs.
> 
>The test code basically has a single call to MPI_Send() with 8 bytes
>of data, the smallest amount the HCA can DMA. I have a very simple
>mca_btl_component_progress() method that returns 0 if called before
>mca_btl_endpoint_send() and returns 1 if called after. I use a static
>variable to keep track whether endpoint_send() has been called.
> 
>With this, the MPI process hangs with the following stack:
> 
>(gdb) bt
>#0  0x7f7518c60b7d in poll () from /lib64/libc.so.6
>#1  0x7f75183e79f6 in poll_dispatch (base=0x19cf480,
>tv=0x7f75177efe80) at poll.c:165
>#2  0x7f75183df690 in opal_libevent2022_event_base_loop
>(base=0x19cf480, flags=1) at event.c:1630
>#3  0x7f75183613d4 in progress_engine (obj=0x19cedd8) at
>runtime/opal_progress_threads.c:105
>#4  0x7f7518f3ddf5 in start_thread () from /lib64/libpthread.so.0
>#5  0x7f7518c6b1ad in clone () from /lib64/libc.so.6
> 
>I am using code from master branch for this work.
> 
>Obviously I am not doing the progress handling right, and I don't even
>understand how it should work, as the TCP btl does not even provide a
>component progress function.
> 
>Any relevant pointer on how this should be done is highly appreciated.
> 
>Thanks
>Durga
> 
>The surgeon general advises you to eat right, exercise regularly and
>quit ageing.
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post:
>http://www.open-mpi.org/community/lists/devel/2016/05/18919.php
> 
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>  Link to this post:
>  http://www.open-mpi.org/community/lists/devel/2016/05/18920.php

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/05/18922.php



pgpItDUCsku5A.pgp
Description: PGP 

Re: [OMPI devel] [2.0.0rc2] xlc build failure (inline asm)

2016-05-04 Thread Nathan Hjelm

Go ahead, I don't have access to xlc so I couldn't verify myself. I
don't fully understand why the last : can be omitted when there are no
clobbers.

-Nathan

On Wed, May 04, 2016 at 01:34:48PM -0500, Josh Hursey wrote:
>Did someone pick this up to merge into master & v2.x?
>I can confirm that Paul's patch fixes the issue for XL compilers. I didn't
>see a PR for it, but can file one if no one has yet.
>On Mon, May 2, 2016 at 6:55 PM, Paul Hargrove  wrote:
> 
>  It appears that xlc's support for gcc-style inline asm does not allow an
>  empty clobbers list.
>  The failure I see is
>  libtool: compile:  xlc -DHAVE_CONFIG_H -I.
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/opal/asm
>  -I../../opal/include -I../../ompi/include -I../../oshmem/include
>  -I../../opal/mca/hwloc/hwloc1112/hwloc/include/private/autogen
>  -I../../opal/mca/hwloc/hwloc1112/hwloc/include/hwloc/autogen
>  -I../../ompi/mpiext/cuda/c
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2
>  -I../..
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/opal/include
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/orte/include
>  -I../../orte/include
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/ompi/include
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/oshmem/include
>  -D_REENTRANT
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/opal/mca/hwloc/hwloc1112/hwloc/include
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/BLD/opal/mca/hwloc/hwloc1112/hwloc/include
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/opal/mca/event/libevent2022/libevent
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/opal/mca/event/libevent2022/libevent/include
>  
> -I/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/BLD/opal/mca/event/libevent2022/libevent/include
>  -q64 -g -c
>  
> /home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/opal/asm/asm.c
>  -Wp,-qmakedep=gcc,-MF.deps/asm.TPlo  -qpic -DPIC -o .libs/asm.o
>  
> "/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/opal/include/opal/sys/powerpc/atomic.h",
>  line 150.27: 1506-276 (S) Syntax error: possible missing string literal?
>  
> "/home/hargrove/SCRATCH/OMPI/openmpi-2.0.0rc2-linux-ppc64-xlc-12.1/openmpi-2.0.0rc2/opal/include/opal/sys/powerpc/atomic.h",
>  line 239.27: 1506-276 (S) Syntax error: possible missing string literal?
>  make[2]: *** [asm.lo] Error 1
>  The code corresponding to the first error message is
> 
>   143  static inline int32_t opal_atomic_ll_32 (volatile int32_t
>*addr)
>   144  {
>   145 int32_t ret;
>   146
>   147 __asm__ __volatile__ ("lwarx   %0, 0, %1  \n\t"
>   148   : "=&r" (ret)
>   149   : "r" (addr)
>   150   :);
>   151 return ret;
>   152  }
> 
>  And the second error is the identical line as it appears in
>  opal_atomic_ll_64().
>  The following patch to remove the "trailing" colons was sufficient to
>  fix this problem.
>  --- openmpi-2.0.0rc2/opal/include/opal/sys/powerpc/atomic.h~  
>   2016-05-02 23:37:13.597782000 +
>  +++ openmpi-2.0.0rc2/opal/include/opal/sys/powerpc/atomic.h
>  2016-05-02 23:36:11.615404378 +
>  @@ -147,7 +147,7 @@
>  __asm__ __volatile__ ("lwarx   %0, 0, %1  \n\t"
>: "=&r" (ret)
>: "r" (addr)
>  - :);
>  +  );
>  return ret;
>   }
>  @@ -236,7 +236,7 @@
>  __asm__ __volatile__ ("ldarx   %0, 0, %1  \n\t"
>: "=&r" (ret)
>: "r" (addr)
>  - :);
>  +  );
>  return ret;
>   }
>  -Paul
>  --
>  Paul H. Hargrove  phhargr...@lbl.gov
>  Computer Languages & Systems Software (CLaSS) Group
>  Computer Science Department   Tel: +1-510-495-2352
>  Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>  Link to this post:
>  

Re: [OMPI devel] [2.0.0rc2] Illegal instruction on Pentium III

2016-05-02 Thread Nathan Hjelm

Should be fixed by https://github.com/open-mpi/ompi/pull/1618

Thanks for catching this.

-Nathan

On Mon, May 02, 2016 at 02:30:19PM -0700, Paul Hargrove wrote:
>I have an Pentium III Linux system which fails "make check" with:
>make[3]: Entering directory
>
> `/home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-10.2/BLD/ompi/debuggers'
>make[4]: Entering directory
>
> `/home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-10.2/BLD/ompi/debuggers'
>PASS: predefined_gap_test
>PASS: predefined_pad_test
>
> /home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-10.2/openmpi-2.0.0rc2/config/test-driver:
>line 107: 24448 Illegal instruction "$@" >$log_file 2>&1
>FAIL: dlopen_test
>
> 
>Testsuite summary for Open MPI 2.0.0rc2
>
> 
># TOTAL: 3
># PASS:  2
># SKIP:  0
># XFAIL: 0
># FAIL:  1
># XPASS: 0
># ERROR: 0
>
> 
>See ompi/debuggers/test-suite.log
>Please report to http://www.open-mpi.org/community/help/
>
> 
>Examining a core file with gdb:
> 
>  Core was generated by `.libs/dlopen_test'.
>  Program terminated with signal 4, Illegal instruction.
>  #0  0xb7d1caf9 in flush_and_invalidate_cache (a=3082248320)
>  at
>  
> /home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-10.2/openmpi-2.0.0rc2/opal/mca/patcher/base/patcher_base_patch.c:84
>  84  __asm__ volatile("mfence;clflush %0;mfence" : :"m"
>  (*(char*)a));
> 
>I am fairly confident that the problem is the "clflush" instruction which
>is not supported by most (all?) Intel processors prior to the introduction
>of SSE2.
>-Paul
>--
>Paul H. Hargrove  phhargr...@lbl.gov
>Computer Languages & Systems Software (CLaSS) Group
>Computer Science Department   Tel: +1-510-495-2352
>Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/05/18877.php
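
As background to the report quoted above, a hedged sketch of a runtime guard for clflush (an illustration only, not necessarily what PR 1618 does): CPUID leaf 1 reports clflush support in EDX bit 19, so the flush path can fall back to something else on pre-SSE2 CPUs such as that Pentium III.

    #include <stdbool.h>
    #include <cpuid.h>                    /* GCC/Clang __get_cpuid() wrapper */

    static bool cpu_has_clflush(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            return false;                 /* CPUID leaf 1 not available */
        }
        return (edx >> 19) & 1u;          /* CLFSH feature bit */
    }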



pgpf113jjGajG.pgp
Description: PGP signature


Re: [OMPI devel] [2.0.0rc2] openib btl build failure

2016-05-02 Thread Nathan Hjelm

Fixed by https://github.com/open-mpi/ompi/pull/1619

Thanks for catching this.

-Nathan

On Mon, May 02, 2016 at 01:57:07PM -0700, Paul Hargrove wrote:
>I have an x86-64/Linux system with a fairly standard install of Scientific
>Linux 6.3 (a RHEL clone like CentOS).
>However, it appears from the error messages (at the bottom of this email)
>that the OFED install differs in some way from OMPI's expectations.
>It appears the OFED was not installed via RPMs, leaving me not knowing how
>to determine a version number.
>Please let me know what additional information is required to resolve
>this, and to whom to send it.
>FWIW: This is on NERSC's Babbage testbed, to which Nathan and Howard may
>already have access (or can request it).
>-Paul
>libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I.
>
> -I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/opal/mca/btl/openib
>-I../../../../opal/include -I../../../../ompi/include
>-I../../../../oshmem/include
>-I../../../../opal/mca/hwloc/hwloc1112/hwloc/include/private/autogen
>-I../../../../opal/mca/hwloc/hwloc1112/hwloc/include/hwloc/autogen
>-I../../../../ompi/mpiext/cuda/c -I/usr/include/infiniband
>-I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2
>-I../../../..
>-I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/opal/include
>-I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/orte/include
>-I../../../../orte/include
>-I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/ompi/include
>
> -I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/oshmem/include
>
> -I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/opal/mca/hwloc/hwloc1112/hwloc/include
>
> -I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/BLD/opal/mca/hwloc/hwloc1112/hwloc/include
>
> -I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/opal/mca/event/libevent2022/libevent
>
> -I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/opal/mca/event/libevent2022/libevent/include
>
> -I/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/BLD/opal/mca/event/libevent2022/libevent/include
>-g -finline-functions -fno-strict-aliasing -pthread -MT
>btl_openib_component.lo -MD -MP -MF .deps/btl_openib_component.Tpo -c
>
> /tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/opal/mca/btl/openib/btl_openib_component.c
> -fPIC -DPIC -o .libs/btl_openib_component.o
>
> /tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/opal/mca/btl/openib/btl_openib_component.c:
>In function 'init_one_port':
>
> /tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/openmpi-2.0.0rc2/opal/mca/btl/openib/btl_openib_component.c:785:
>error: 'struct ibv_exp_device_attr' has no member named 'ext_atom'
>make[2]: *** [btl_openib_component.lo] Error 1
>make[2]: Leaving directory
>`/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/BLD/opal/mca/btl/openib'
>make[1]: *** [all-recursive] Error 1
>make[1]: Leaving directory
>`/tmp/hargrove/OMPI/openmpi-2.0.0rc2-babbage/BLD/opal'
>make: *** [all-recursive] Error 1
>--
>Paul H. Hargrove  phhargr...@lbl.gov
>Computer Languages & Systems Software (CLaSS) Group
>Computer Science Department   Tel: +1-510-495-2352
>Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/05/18875.php



pgpDkjre9VB8g.pgp
Description: PGP signature


Re: [OMPI devel] [2.0.0rc2] Linux/MIPS64 failures

2016-05-02 Thread Nathan Hjelm

Looks like patcher/linux needs some work on 32-bit systems. I will try
to get this fixed in the next day or two.

-Nathan

On Mon, May 02, 2016 at 03:36:35PM -0700, Paul Hargrove wrote:
>New since the last time I did testing, I now have access to Linux MIPS64
>(Cavium Octeon II) systems.
>They are infinitely better than the QEMU-emulated MIPS I have used for
>previous testing.
>I have access to both big-endian and little-endian, and am setup to test
>the three main ABIs (32, n32 and 64).
>For the little-endian system all three ABIs pass build, pass "make check"
>and run a sampling of the examples.
>However, for the big-endian system only the "-mabi=64" case passes.
>With big-endian and "-mabi=32" I passed "make check" but saw a SEGV from
>ring_c:
> 
>  $ mpirun -mca btl sm,self -np 2 examples/ring_c'
>  [erpro8-fsf1:05119] *** Process received signal ***
>  [erpro8-fsf1:05119] Signal: Segmentation fault (11)
>  [erpro8-fsf1:05119] Signal code: Address not mapped (1)
>  [erpro8-fsf1:05119] Failing at address: 0x401c7c0
>  [erpro8-fsf1:05119] *** End of error message ***
>  Segmentation fault (core dumped)
> 
>With big-endian and "-mabi=n32" I fail the dl_open test (the first
>non-trivial test) in "make check":
> 
>  make[4]: Entering directory
>  
> `/home/phargrov/OMPI/openmpi-2.0.0rc2-linux-mips64-n32/BLD/ompi/debuggers'
>  PASS: predefined_gap_test
>  PASS: predefined_pad_test
>  
> /home/phargrov/OMPI/openmpi-2.0.0rc2-linux-mips64-n32/openmpi-2.0.0rc2/config/test-driver:
>  line 107: 14795 Segmentation fault  "$@" > $log_file 2>&1
>  FAIL: dlopen_test
>  
> 
>  Testsuite summary for Open MPI 2.0.0rc2
>  
> 
>  # TOTAL: 3
>  # PASS:  2
>  # SKIP:  0
>  # XFAIL: 0
>  # FAIL:  1
>  # XPASS: 0
>  # ERROR: 0
>  
> 
>  See ompi/debuggers/test-suite.log
>  Please report to http://www.open-mpi.org/community/help/
>  
> 
>  make[4]: *** [test-suite.log] Error 1
> 
>Or, when run manually:
> 
>  $ ./ompi/debuggers/dlopen_test
>  [erpro8-fsf1:05134] *** Process received signal ***
>  [erpro8-fsf1:05134] Signal: Segmentation fault (11)
>  [erpro8-fsf1:05134] Signal code: Address not mapped (1)
>  [erpro8-fsf1:05134] Failing at address: 0x1100154c
>  [erpro8-fsf1:05134] *** End of error message ***
>  Segmentation fault (core dumped)
> 
>Unfortunately, gdb is not working properly on the core files generated by
>either failure.
>In addition to the endian differences between the two MIPS64 systems, the
>little-endian one is running a newer distro (Debian jessie vs wheezy).
>So, I cannot conclusively state that the endianness is the root cause.
>-Paul
>--
>Paul H. Hargrove  phhargr...@lbl.gov
>Computer Languages & Systems Software (CLaSS) Group
>Computer Science Department   Tel: +1-510-495-2352
>Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/05/18882.php



pgpu3rzX25m78.pgp
Description: PGP signature


Re: [OMPI devel] [2.0.0rc2] Illegal instruction on Pentium III

2016-05-02 Thread Nathan Hjelm

Thanks Paul, I will see what I can do to fix this one.

-Nathan

On Mon, May 02, 2016 at 02:30:19PM -0700, Paul Hargrove wrote:
>I have an Pentium III Linux system which fails "make check" with:
>make[3]: Entering directory
>
> `/home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-10.2/BLD/ompi/debuggers'
>make[4]: Entering directory
>
> `/home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-10.2/BLD/ompi/debuggers'
>PASS: predefined_gap_test
>PASS: predefined_pad_test
>
> /home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-10.2/openmpi-2.0.0rc2/config/test-driver:
>line 107: 24448 Illegal instruction "$@" >$log_file 2>&1
>FAIL: dlopen_test
>
> 
>Testsuite summary for Open MPI 2.0.0rc2
>
> 
># TOTAL: 3
># PASS:  2
># SKIP:  0
># XFAIL: 0
># FAIL:  1
># XPASS: 0
># ERROR: 0
>
> 
>See ompi/debuggers/test-suite.log
>Please report to http://www.open-mpi.org/community/help/
>
> 
>Examining a core file with gdb:
> 
>  Core was generated by `.libs/dlopen_test'.
>  Program terminated with signal 4, Illegal instruction.
>  #0  0xb7d1caf9 in flush_and_invalidate_cache (a=3082248320)
>  at
>  
> /home/phargrov/OMPI/openmpi-2.0.0rc2-linux-x86-OpenSuSE-10.2/openmpi-2.0.0rc2/opal/mca/patcher/base/patcher_base_patch.c:84
>  84  __asm__ volatile("mfence;clflush %0;mfence" : :"m"
>  (*(char*)a));
> 
>I am fairly confident that the problem is the "clflush" instruction which
>is not supported by most (all?) Intel processors prior to the introduction
>of SSE2.
>-Paul
>--
>Paul H. Hargrove  phhargr...@lbl.gov
>Computer Languages & Systems Software (CLaSS) Group
>Computer Science Department   Tel: +1-510-495-2352
>Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/05/18877.php



pgpy3FLaHNssX.pgp
Description: PGP signature


Re: [OMPI devel] seg fault when using yalla, XRC, and yalla

2016-04-21 Thread Nathan Hjelm

In 1.10.x it is possible for the BTLs to be in use by either ob1 or an
oshmem component. In 2.x one-sided components can also use BTLs. The MTL
interface does not provide support for accessing hardware atomics and
RDMA. As for UD, it stands for Unreliable Datagram. Its usage gets
better message rates for small messages but really hurts bandwidth. Our
applications are bandwidth bound, not message rate bound, so we should
be using XRC, not UD.

-Nathan

On Thu, Apr 21, 2016 at 09:33:06AM -0600, David Shrader wrote:
>Hey Nathan,
> 
>I thought only 1 pml could be loaded at a time, and the only pml that
>could use btl's was ob1. If that is the case, how can the openib btl run
>at the same time as cm and yalla?
> 
>Also, what is UD?
> 
>Thanks,
>David
> 
>On 04/21/2016 09:25 AM, Nathan Hjelm wrote:
> 
>  The openib btl should be able to run alongside cm/mxm or yalla. If I
>  have time this weekend I will get on the mustang and see what the
>  problem is. The best answer is to change the openmpi-mca-params.conf in
>  the install to have pml = ob1. I have seen little to no benefit with
>  using MXM on mustang. In fact, the default configuration (which uses UD)
>  gets terrible bandwidth.
> 
>  -Nathan
> 
>  On Thu, Apr 21, 2016 at 01:48:46PM +0300, Alina Sklarevich wrote:
> 
> David, thanks for the info you provided.
> I will try to dig in further to see what might be causing this issue.
> In the meantime, maybe Nathan can please comment about the openib btl
> behavior here?
> Thanks,
> Alina.
> On Wed, Apr 20, 2016 at 8:01 PM, David Shrader <dshra...@lanl.gov> wrote:
> 
>   Hello Alina,
> 
>   Thank you for the information about how the pml components work. I knew
>   that the other components were being opened and ultimately closed in
>   favor of yalla, but I didn't realize that initial open would cause a
>   persistent change in the ompi runtime.
> 
>   Here's the information you requested about the ib network:
> 
>   - MOFED version:
>   We are using the Open Fabrics Software as bundled by RedHat, and my ib
>   network folks say we're running something close to v1.5.4
>   - ibv_devinfo:
>   [dshrader@mu0001 examples]$ ibv_devinfo
>   hca_id: mlx4_0
>   transport:  InfiniBand (0)
>   fw_ver: 2.9.1000
>   node_guid:  0025:90ff:ff16:78d8
>   sys_image_guid: 0025:90ff:ff16:78db
>   vendor_id:  0x02c9
>   vendor_part_id: 26428
>   hw_ver: 0xB0
>   board_id:   SM_212101000
>   phys_port_cnt:  1
>   port:   1
>   state:  PORT_ACTIVE (4)
>   max_mtu:4096 (5)
>   active_mtu: 4096 (5)
>   sm_lid: 250
>   port_lid:   366
>   port_lmc:   0x00
>   link_layer: InfiniBand
> 
>   I still get the seg fault when specifying the hca:
> 
>   $> mpirun -n 1 -mca btl_openib_receive_queues
>   X,4096,1024:X,12288,512:X,65536,512 -mca btl_openib_if_include mlx4_0
>   ./hello_c.x
>   Hello, world, I am 0 of 1, (Open MPI v1.10.2, package: Open MPI
>   dshra...@mu-fey.lanl.gov Distribution, ident: 1.10.2, repo rev:
>   v1.10.1-145-g799148f, Jan 21, 2016, 135)
>   
> --
>   mpirun noticed that process rank 0 with PID 10045 on node mu0001 exited
>   on signal 11 (Segmentation fault).
>   
> --
> 
>   I don't know if this helps, but the first time I tried the command I
>   mistyped the hca name. This got me a warning, but no seg fault:
> 
>   $> mpirun -n 1 -mca btl_openib_receive_queues
>   X,4096,1024:X,12288,512:X,65536,512 -mca btl_openib_if_include ml4_0
>   ./hello_c.x
>   
> --
>   WARNING: One or more nonexistent OpenFabrics devices/ports were
>   specified:
> 
> Host: mu0001
> MCA parameter:mca_btl_if_include
> Nonexistent entities: ml4_0
> 
>   These entities will be ignored.  You can disable this 

Re: [OMPI devel] seg fault when using yalla, XRC, and yalla

2016-04-21 Thread Nathan Hjelm

The openib btl should be able to run alongside cm/mxm or yalla. If I
have time this weekend I will get on the mustang and see what the
problem is. The best answer is to change the openmpi-mca-params.conf in
the install to have pml = ob1. I have seen little to no benefit with
using MXM on mustang. In fact, the default configuration (which uses UD)
gets terrible bandwidth.

-Nathan

On Thu, Apr 21, 2016 at 01:48:46PM +0300, Alina Sklarevich wrote:
>David, thanks for the info you provided.
>I will try to dig in further to see what might be causing this issue.
>In the meantime, maybe Nathan can please comment about the openib btl
>behavior here?
>Thanks,
>Alina.
>On Wed, Apr 20, 2016 at 8:01 PM, David Shrader  wrote:
> 
>  Hello Alina,
> 
>  Thank you for the information about how the pml components work. I knew
>  that the other components were being opened and ultimately closed in
>  favor of yalla, but I didn't realize that initial open would cause a
>  persistent change in the ompi runtime.
> 
>  Here's the information you requested about the ib network:
> 
>  - MOFED version:
>  We are using the Open Fabrics Software as bundled by RedHat, and my ib
>  network folks say we're running something close to v1.5.4
>  - ibv_devinfo:
>  [dshrader@mu0001 examples]$ ibv_devinfo
>  hca_id: mlx4_0
>  transport:  InfiniBand (0)
>  fw_ver: 2.9.1000
>  node_guid:  0025:90ff:ff16:78d8
>  sys_image_guid: 0025:90ff:ff16:78db
>  vendor_id:  0x02c9
>  vendor_part_id: 26428
>  hw_ver: 0xB0
>  board_id:   SM_212101000
>  phys_port_cnt:  1
>  port:   1
>  state:  PORT_ACTIVE (4)
>  max_mtu:4096 (5)
>  active_mtu: 4096 (5)
>  sm_lid: 250
>  port_lid:   366
>  port_lmc:   0x00
>  link_layer: InfiniBand
> 
>  I still get the seg fault when specifying the hca:
> 
>  $> mpirun -n 1 -mca btl_openib_receive_queues
>  X,4096,1024:X,12288,512:X,65536,512 -mca btl_openib_if_include mlx4_0
>  ./hello_c.x
>  Hello, world, I am 0 of 1, (Open MPI v1.10.2, package: Open MPI
>  dshra...@mu-fey.lanl.gov Distribution, ident: 1.10.2, repo rev:
>  v1.10.1-145-g799148f, Jan 21, 2016, 135)
>  
> --
>  mpirun noticed that process rank 0 with PID 10045 on node mu0001 exited
>  on signal 11 (Segmentation fault).
>  
> --
> 
>  I don't know if this helps, but the first time I tried the command I
>  mistyped the hca name. This got me a warning, but no seg fault:
> 
>  $> mpirun -n 1 -mca btl_openib_receive_queues
>  X,4096,1024:X,12288,512:X,65536,512 -mca btl_openib_if_include ml4_0
>  ./hello_c.x
>  
> --
>  WARNING: One or more nonexistent OpenFabrics devices/ports were
>  specified:
> 
>Host: mu0001
>MCA parameter:mca_btl_if_include
>Nonexistent entities: ml4_0
> 
>  These entities will be ignored.  You can disable this warning by
>  setting the btl_openib_warn_nonexistent_if MCA parameter to 0.
>  
> --
>  Hello, world, I am 0 of 1, (Open MPI v1.10.2, package: Open MPI
>  dshra...@mu-fey.lanl.gov Distribution, ident: 1.10.2, repo rev:
>  v1.10.1-145-g799148f, Jan 21, 2016, 135)
> 
>  So, telling the openib btl to use the actual hca didn't get the seg
>  fault to go away, but giving it a dummy value did.
> 
>  Thanks,
>  David
> 
>  On 04/20/2016 08:13 AM, Alina Sklarevich wrote:
> 
>Hi David,
>I was able to reproduce the issue you reported. 
>When the command line doesn't specify the components to use, ompi will
>try to load/open all the ones available (and close them in the end)
>and then choose the components according to their priority and whether
>or not they were opened successfully.
>This means that even if pml yalla was the one running, other
>components were opened and closed as well.
>The parameter you are using - btl_openib_receive_queues, doesn't have
>an effect on pml yalla. It only affects the openib btl which is used
>

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-3792-g92290b9

2016-04-07 Thread Nathan Hjelm

Hah, just caught that as well. Commented on the commit on
github. Definitely looks wrong.

-Nathan

On Thu, Apr 07, 2016 at 05:43:17PM +, Dave Goodell (dgoodell) wrote:
> [inline]
> 
> On Apr 7, 2016, at 12:53 PM, git...@crest.iu.edu wrote:
> > 
> > This is an automated email from the git hooks/post-receive script. It was
> > generated because a ref change was pushed to the repository containing
> > the project "open-mpi/ompi".
> > 
> > The branch, master has been updated
> >   via  92290b94e0584271d6459a6ab5923a04125e23be (commit)
> >  from  7cdf50533cf940258072f70231a4a456fa73d2f8 (commit)
> > 
> > Those revisions listed above that are new to this repository have
> > not appeared on any other notification email; so we list those
> > revisions in full, below.
> > 
> > - Log -
> > https://github.com/open-mpi/ompi/commit/92290b94e0584271d6459a6ab5923a04125e23be
> > 
> > commit 92290b94e0584271d6459a6ab5923a04125e23be
> > Author: Thananon Patinyasakdikul 
> > Date:   Wed Apr 6 14:26:04 2016 -0400
> > 
> >Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN)
> > 
> > diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c 
> > b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > index 9d1d402..a912bc3 100644
> > --- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > +++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
> > @@ -106,7 +106,7 @@ static int mca_pml_ob1_recv_request_cancel(struct 
> > ompi_request_t* ompi_request,
> > /* The rest should be protected behind the match logic lock */
> > OB1_MATCHING_LOCK(_comm->matching_lock);
> > if( true == request->req_match_received ) { /* way to late to cancel 
> > this one */
> > -OPAL_THREAD_UNLOCK(_comm->matching_lock);
> > +OB1_MATCHING_LOCK(_comm->matching_lock);
> 
> I've only taken a cursory look, but this looks very wrong to me.  Shouldn't 
> you be using the "OB1_MATCHING_UNLOCK" macro instead?  Doubly locking the 
> lock will almost certainly lead to sadness.
> 
> > assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not 
> > matched isn't it */
> > return OMPI_SUCCESS;
> > }
> > diff --git a/opal/mca/btl/tcp/btl_tcp.h b/opal/mca/btl/tcp/btl_tcp.h
> > index f2c8917..7e9d726 100644
> > --- a/opal/mca/btl/tcp/btl_tcp.h
> > +++ b/opal/mca/btl/tcp/btl_tcp.h
> > @@ -96,7 +96,7 @@ extern int mca_btl_tcp_progress_thread_trigger;
> > do {\
> > if(0 < mca_btl_tcp_progress_thread_trigger) {   \
> > opal_event_t* _event = (opal_event_t*)(event);  
> > \
> > -opal_fd_write( mca_btl_tcp_pipe_to_progress[1], 
> > sizeof(opal_event_t*), \
> > +(void) opal_fd_write( mca_btl_tcp_pipe_to_progress[1], 
> > sizeof(opal_event_t*), \
> 
> Seems better to capture the return value and at least put an assert() on it 
> if it fails, though admittedly things are very screwed up if you overrun the 
> pipe.
> 
> -Dave
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/04/18739.php


pgpN0PoPvU3nD.pgp
Description: PGP signature


Re: [OMPI devel] mca_btl__prepare_dst

2016-03-18 Thread Nathan Hjelm

The prepare_dst function was a bottleneck to providing fast one-sided
support using network RDMA. As the function was only used in the RDMA
path it was removed in favor of btl_register_mem + a more complete
put/get interface. You can look at the way the various btls moved the
functionality. The simplest example is probably btl/ugni. I didn't do
any signifigant restructuring when rewriting that btl.

On a Gemini network before the BTL change the best 1-byte MPI_Put
latency I could achieve was ~ 1.2 us. With the new interface it is
closer to ~1.0 us.

-Nathan

On Fri, Mar 18, 2016 at 02:19:09AM -0400, dpchoudh . wrote:
>Hello developers
> 
>It looks like in the trunk, the routine mca_btl__prepare_dst is no
>longer being implemented, at least in TCP and openib BTLs. Up until
>1.10.2, it does exist.
> 
>Is it a new MPI-3 related thing? What is the reason behind this?
> 
>Thanks
>Durga
>Life is complex. It has real and imaginary parts.

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/03/18712.php



pgpSN3xU1DX6T.pgp
Description: PGP signature


Re: [OMPI devel] Network atomic operations

2016-03-04 Thread Nathan Hjelm

On Thu, Mar 03, 2016 at 05:26:45PM -0500, dpchoudh . wrote:
>Hello all
> 
>Here is a 101 level question:
> 
>OpenMPI supports many transports, out of the box, and can be extended to
>support those which it does not. Some of these transports, such as
>infiniband, provide hardware atomic operations on remote memory, whereas
>others, such as iWARP, do not.
> 
>My question is: how (and where in the code base) does openMPI use this
>feature, on those hardware that support it? What is the penalty, in terms
>of additional code, runtime performance and all other considerations, on a
>hardware that does not support it?

Network atomics are used for oshmem (see Mike's email) and MPI RMA. For
RMA they are exposed through the BTL 3.0 interface on the v2.x branch
and master. So far we have only really implemented compare-and-swap,
atomic add, and atomic fetch-and-add. Compare-and-swap and fetch-and-add
are required by our optimized RMA component (ompi/mca/osc/rdma).

-Nathan



pgpEIqAC77Pxc.pgp
Description: PGP signature


Re: [OMPI devel] Fwd: [OMPI users] shared memory under fortran, bug?

2016-02-02 Thread Nathan Hjelm

Hmm, I think you are correct. There may be instances where two different
local processes may use the same CID for different communicators. It
should be sufficient to add the PID of the current process to the
filename to ensure it is unique.

-Nathan

On Tue, Feb 02, 2016 at 09:33:29PM +0900, Gilles Gouaillardet wrote:
>Nathan,
>the sm osc component uses communicator CID to name the file that will be
>used to create shared memory segments.
>if I understand and correctly, two different communicators coming from the
>same MPI_Comm_split might share the same CID, so CID (alone) cannot be
>used to generate a unique per communicator file name
>Makes sense ?
>Cheers,
>Gilles
> 
>-- Forwarded message --
>From: Peter Wind 
>Date: Tuesday, February 2, 2016
>Subject: [OMPI users] shared memory under fortran, bug?
>To: us...@open-mpi.org
> 
>Enclosed is a short (< 100 lines) fortran code example that uses shared
>memory.
>It seems to me it behaves wrongly if openmpi is used.
>Compiled with SGI/mpt , it gives the right result.
> 
>To fail, the code must be run on a single node.
>It creates two groups of 2 processes each. Within each group memory is
>shared.
>The error is that the two groups get the same memory allocated, but they
>should not.
> 
>Tested with openmpi 1.8.4, 1.8.5, 1.10.2 and gfortran, intel 13.0, intel
>14.0
>all fail.
> 
>The call:
>   call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL,
>comm_group, cp1, win, ierr)
> 
>Should allocate memory only within the group. But when the other group
>allocates memory, the pointers from the two groups point to the same
>address in memory.
> 
>Could you please confirm that this is the wrong behaviour?
> 
>Best regards,
>Peter Wind

> program shmem_mpi
> 
>!
>! in this example two groups are created, within each group memory is 
> shared.
>! Still the other group get allocated the same adress space, which it 
> shouldn't.
>!
>! Run with 4 processes, mpirun -np 4 a.out
> 
> 
>use mpi
> 
>use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer
> 
>implicit none
> !   include 'mpif.h'
> 
>integer, parameter :: nsize = 100
>integer, pointer   :: array(:)
>integer:: num_procs
>integer:: ierr
>integer:: irank, irank_group
>integer:: win
>integer:: comm = MPI_COMM_WORLD
>integer:: disp_unit
>type(c_ptr):: cp1
>type(c_ptr):: cp2
>integer:: comm_group
> 
>integer(MPI_ADDRESS_KIND) :: win_size
>integer(MPI_ADDRESS_KIND) :: segment_size
> 
>call MPI_Init(ierr)
>call MPI_Comm_size(comm, num_procs, ierr)
>call MPI_Comm_rank(comm, irank, ierr)
> 
>disp_unit = sizeof(1)
>call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)
>call MPI_Comm_rank(comm_group, irank_group, ierr)
> !   print *, 'irank=', irank, ' group rank=', irank_group
> 
>if (irank_group == 0) then
>   win_size = nsize*disp_unit
>else
>   win_size = 0
>endif
> 
>call MPI_Win_allocate_shared(win_size, disp_unit, MPI_INFO_NULL, 
> comm_group, cp1, win, ierr)
>call MPI_Win_fence(0, win, ierr)
> 
>call MPI_Win_shared_query(win, 0, segment_size, disp_unit, cp2, ierr)
> 
>call MPI_Win_fence(0, win, ierr)
>CALL MPI_BARRIER(comm, ierr)! allocations finished
> !   print *, 'irank=', irank, ' size ', segment_size
> 
>call c_f_pointer(cp2, array, [nsize])
> 
>array(1)=0;array(2)=0
>CALL MPI_BARRIER(comm, ierr)!
> 77 format(4(A,I3))
>if(irank   if (irank_group == 0)array(1)=11
>   CALL MPI_BARRIER(comm, ierr)
>   print 77, 'Group 0, rank', irank, ':  array ', array(1), ' ',array(2)
>   CALL MPI_BARRIER(comm, ierr)!Group 1 not yet start writing
>   CALL MPI_BARRIER(comm, ierr)!Group 1 finished writing
>   print 77, 'Group 0, rank', irank, ':  array ', array(1),' ',array(2) 
>   if(array(1)==11.and.array(2)==0)then
>  print *,irank,' correct result'
>   else
>  print *,irank,' wrong result'
>   endif
>else
>   CALL MPI_BARRIER(comm, ierr)
>   CALL MPI_BARRIER(comm, ierr)!Group 0 finished writing
>   print 77, 'Group 1, rank', irank, ':  array ', array(1),' ',array(2)
>   if (irank_group == 0)array(2)=22
>   CALL MPI_BARRIER(comm, ierr)
>   print 77, 'Group 1, rank', irank, ':  array ', array(1),' ',array(2)
>   if(array(1)==0.and.array(2)==22)then
>  print *,irank,' correct result'
>   else
>  print *,irank,' wrong result'
>   endif
>endif
> 
>call MPI_Finalize(ierr)
> 
> end program

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to 

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Nathan Hjelm

Another thing that might be useful is at the end of configure print out
a list of each framework with a list of components and some build info
(static vs dynamic, etc). Something like:

plm:
  alps (dynamic)
  rsh (dynamic)
  tm (dynamic)

-Nathan

On Mon, Jan 25, 2016 at 01:46:44PM -0800, Ralph Castain wrote:
>That makes sense, Paul - what if we output effectively the ompi_info
>summary of what was built at the end of the make install procedure? Then
>you would have immediate feedback on the result.
>On Mon, Jan 25, 2016 at 1:27 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
>  As one who builds other people's software frequently, I have my own
>  opinions here.
>  Above all else, is that there is no one "right" answer, but that
>  consistency with in a product is best.
>  So (within reason) the same things that work to configure module A and B
>  should work with C and D as well.
>  To use an analogy from (human) languages, I dislike "irregular verbs".
>  The proposal to report (at run time) the existence of TM support on the
>  system (but lacking in ORTE), doesn't "feel" consistent with existing
>  practice.
>  In GASNet we *do* report at runtime if a high-speed network is present
>  and you are not using it.
>  For instance we warn if the headers were missing at configure time but
>  we can see the /dev entry at runtime.
>  However, we do that uniformly across all the networks and have done this
>  for years.
>  So, it is a *consistent* practice in that project.
>  Keep It Simple Stupid is also an important one.
>  So, I agree with those who think the proposal to catch this at runtime
>  is an unnecessary complication.
>  I think improving the FAQ a good idea
>  I do, however, I can think of one thing that might help the "I thought I
>  had configured X" problem Jeff mentions.
>  What about a summary output at the end of configure or make?
>  Right now I sometimes use something like the following:
>$ grep 'bindings\.\.\. yes' configure.out
>$ grep -e 'component .* can compile\.\.\. yes' configure.log
>  This lets me see what is going to be built.
>  Outputing something like this a the end of configure might encourage
>  admins to check for their feature X before typing "make"
>  The existing configury goop can easily be modified to keep a list of
>  configured components and language bindings.
>  However, another alternative is probably easier to implement:
>  The last step of "make install" could print a message like
>NOTICE: Your installation is complete.
>NOTICE: You can run ompi_info to verify that all expected components
>  and language bindings have been built.
>  -Paul
>  On Mon, Jan 25, 2016 at 11:13 AM, Jeff Squyres (jsquyres)
>  <jsquy...@cisco.com> wrote:
> 
>Haters gotta hate.  ;-)
> 
>Kidding aside, ok, you make valid points.  So -- no tm "addition".  We
>just have to rely on people using functionality like "--with-tm" in
>the configure line to force/ensure that tm (or whatever feature) will
>actually get built.
> 
>> On Jan 25, 2016, at 1:31 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> I think we would be opening a real can of worms with this idea.
>There are environments, for example, that use PBSPro for one part of
>the system (e.g., IO nodes), but something else for the compute
>section.
>>
>> Personally, I'd rather follow Howard's suggestion.
>>
>> On Mon, Jan 25, 2016 at 10:21 AM, Nathan Hjelm <hje...@lanl.gov>
>wrote:
>> On Mon, Jan 25, 2016 at 05:55:20PM +, Jeff Squyres (jsquyres)
>wrote:
>> > Hmm.  I'm of split mind here.
>> >
>> > I can see what Howard is saying here -- adding complexity is
>usually a bad thing.
>> >
>> > But we have gotten these problem reports multiple times over the
>years: someone *thinking* that they have built with launcher support X
>(e.g., TM, LSF), but then figuring out later that things aren't
>running as expected, and after a bunch of work, figure out that it's
>because they didn't build with support X.
>> >
>> > Gilles idea actually sounds interesting -- if the tm module detect
>some of the sentinel PBS/TM env variables, emit a show_help() if we
>don't have full TM support compiled in.  This wou

Re: [OMPI devel] tm-less tm module

2016-01-25 Thread Nathan Hjelm
On Mon, Jan 25, 2016 at 05:55:20PM +, Jeff Squyres (jsquyres) wrote:
> Hmm.  I'm of split mind here.
> 
> I can see what Howard is saying here -- adding complexity is usually a bad 
> thing.
> 
> But we have gotten these problem reports multiple times over the years: 
> someone *thinking* that they have built with launcher support X (e.g., TM, 
> LSF), but then figuring out later that things aren't running as expected, and 
> after a bunch of work, figure out that it's because they didn't build with 
> support X.
> 
> Gilles idea actually sounds interesting -- if the tm module detect some of 
> the sentinel PBS/TM env variables, emit a show_help() if we don't have full 
> TM support compiled in.  This would actually save some users a bunch of time 
> and frustration.
> 
> --> Keep in mind that the SLRUM launcher is different, because it's all 
> CLI-based (not API-based) and therefore we always build it (because we don't 
> have to find headers and libraries).
> 
> FWIW, we do have precedent of having extra MCA params for users to turn off 
> warnings that they don't want to see.
> 
> I guess the question here is: is there a valid use case for running in 
> PBS/Torque and *not* wanting to use the TM launcher?

Once case comes to mind. In the case of Cray systems that unfortunately
run Moab/Toque we can launch using either alps or torque (Howard correct
me if I am wrong). When Sam and I originally wrote the XE support we
used alps instead of torque. I am not entirely sure what we do now.

-Nathan


pgplimhF5bE4i.pgp
Description: PGP signature


Re: [OMPI devel] vader and mmap_shmem module cleanup problem

2015-12-15 Thread Nathan Hjelm

Looks like there is a missing conditional in
mca_btl_vader_component_close(). Will add it and PR to 1.10 and 2.x.

-Nathan

On Tue, Dec 15, 2015 at 11:18:11AM +0100, Justin Cinkelj wrote:
> I'm trying to port Open MPI to OS with threads instead of processes.
> Currently, during MPI_Finalize, I get attempt to call munmap first with
> address of 0x20c0 and later 0x20c8.
> 
> mca_btl_vader_component_close():
> munmap (mca_btl_vader_component.my_segment,
> mca_btl_vader_component.segment_size)
> 
> mca_btl_vader_component_init():
> if(MCA_BTL_VADER_XPMEM != mca_btl_vader_component.single_copy_mechanism) {
>   opal_shmem_segment_create (>seg_ds, sm_file,
> component->segment_size);
>   component->my_segment = opal_shmem_segment_attach (>seg_ds);
> } else {
>   mmap (NULL, component->segment_size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS
> | MAP_SHARED, -1, 0);
> }
> 
> But opal_shmem_segment_attach (from mmap module) ends with:
> /* update returned base pointer with an offset that hides our stuff */
> return (ds_buf->seg_base_addr + sizeof(opal_shmem_seg_hdr_t));
> 
> So mca_btl_vader_component_close() should in that case call
> opal_shmem_segment_dettach() instead of munmap.
> Or actually, as at that point shmem_mmap module cleanup code is already
> done, vader could/should just skip cleanup part?
> 
> Maybe I should ask first how does that setup/cleanup work on normal Linux
> system?
> Is mmap called twice, and vader and shmem_mmap module each uses different
> address (so vader munmap is indeed required in that case)?
> 
> Second question.
> With two threads in one process, I got attempt to
> opal_shmem_segment_dettach() and munmap() on same mmap-ed address, from both
> threads. I 'fixed' that by replacing "ds_buf->seg_cpid = getpid()" with
> gettid(), and then each thread munmap-s only address allocated by itself. Is
> that correct? Or is it possible, that the second thread might still try to
> access data at that address?
> 
> BR Justin
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/12/18417.php


pgp5DCt6Gs0Js.pgp
Description: PGP signature


Re: [OMPI devel] ompi_win_create hangs on a non uniform cluster

2015-11-24 Thread Nathan Hjelm

This happens because we do not currently have a way to detect
connectivity without allocating ompi_proc_t's for every rank in the
window. I added the osc_rdma_btls MCA variable to act as a short-circuit
that avoids the costly connectivity lookup. By default the value is
ugni,openib. You can set it to the empty string to force it to check
connectivity.

This will be in 2.x once the mlx5 fix is in. I can update the check to
do an allreduce to ensure all processes in the window select the same
btl. I do not, however, want to change the default value of
osc_rdma_btls since it is there to ensure performance and reduce the
memory footprint on heterogenous clusters.

-Nathan

On Sun, Nov 15, 2015 at 10:34:45AM +0900, Gilles Gouaillardet wrote:
>Howard,
>there is no rdma osc component in v2.x, so I doubt the issue occurs here.
>I will double check this anyway on Monday
>Cheers,
>Gilles
> 
>On Sunday, November 15, 2015, Howard  wrote:
> 
>  Hi Gilles
> 
>  Could you check whether you also see this problem with v2.x?
> 
>  Thanks,
> 
>  Howard
> 
>  Von meinem iPhone gesendet
> 
>  > Am 10.11.2015 um 19:57 schrieb Gilles Gouaillardet
>  :
>  >
>  > Nathan,
>  >
>  > a simple MPI_Win_create test hangs on my non uniform cluster
>  (ibm/onesided/c_create)
>  >
>  > one node has an IB card but not the other one.
>  > the node with the IB card select the rdma osc module, but the other
>  node select the pt2pt module.
>  > and then it hangs because both ends do no try to initialize the same
>  module
>  >
>  > if i understand correctly, the rdma osc component is selected if at
>  least a rdma capable btl is initialized,
>  > imho, the logic should be :
>  > the rdma osc component is selected for a given communicator if all the
>  btls involved in this communicator
>  > (maybe except the self btl) are rdma capable.
>  >
>  > can you please have a look at this ?
>  >
>  > Cheers,
>  >
>  > Gilles
>  > ___
>  > devel mailing list
>  > de...@open-mpi.org
>  > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>  > Link to this post:
>  http://www.open-mpi.org/community/lists/devel/2015/11/18356.php
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>  Link to this post:
>  http://www.open-mpi.org/community/lists/devel/2015/11/18370.php

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/11/18371.php



pgpDriWPbNRTc.pgp
Description: PGP signature


Re: [OMPI devel] Bad performance (20% bandwidth loss) when compiling with GCC 5.2 instead of 4.x

2015-10-14 Thread Nathan Hjelm

I think this is from a known issue. Try applying this and run again:

https://github.com/open-mpi/ompi/commit/952d01db70eab4cbe11ff4557434acaa928685a4.patch

-Nathan

On Wed, Oct 14, 2015 at 06:33:07PM +0200, Paul Kapinos wrote:
> Dear Open MPI developer,
> 
> We're puzzled by reproducible performance (bandwidth) penalty observed when
> comparing measurements via InfibiBand between two nodes, OpenMPI/1.10.0
> compiled with *GCC/5.2* instead of GCC 4.8 and Intel compiler.
> 
> Take a look at the attached picture of two measurements of NetPIPE
> http://bitspjoule.org/netpipe/ benchmark done with one MPI rank per node,
> communicating via QDR InfiniBand (y axis: Mbps, y axis: sample number)
> 
> Up to sample 64 (8195 bytes message size) the achieved performance is
> virtually the same; from sample 65 (12285 bytes, *less* than 12k) the
> version of GCC compiled using GCC 5.2 suffer form 20%+ penalty in bandwidth.
> 
> The result is reproducible and independent from nodes and ever linux
> distribution (both Scientific Linux 6 and CentOS 7 have the same results).
> Both C and Fortran benchmarks offer the very same behaviour so it is *not*
> an f08 issue.
> 
> The acchieved bandwidth is definitely IB-range (gigabytes per second), the
> communication is running via InfinfiBand in all cases (no failback to IP,
> huh).
> 
> The compile line is the same; the output of ompi_info --all and --params is
> the very same (cf. attachments) up to added support for fortran-08 in /5.2
> version.
> 
> We know about existence of 'eager_limit' parameter, which is *not* changed
> and is 12288 in both versions (this is *less* that the first distinguishing
> sample).
> 
> Again, for us the *only* difference is usage of other (new) GCC release.
> 
> Any idea about this 20%+ bandwidth loss?
> 
> Best
> 
> Paul Kapinos
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, IT Center
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915


>  MCA btl: parameter "btl_openib_verbose" (current value: 
> "false", data source: default, level: 9 dev/all, type: bool)
>   Output some verbose OpenIB BTL information (0 = no 
> output, nonzero = output)
>   Valid values: 0: f|false|disabled, 1: t|true|enabled
>  MCA btl: parameter "btl_openib_warn_no_device_params_found" 
> (current value: "true", data source: default, level: 9 dev/all, type: bool, 
> synonyms: btl_openib_warn_no_hca_params_found)
>   Warn when no device-specific parameters are found 
> in the INI file specified by the btl_openib_device_param_files MCA parameter 
> (0 = do not warn; any other value = warn)
>   Valid values: 0: f|false|disabled, 1: t|true|enabled
>  MCA btl: parameter "btl_openib_warn_no_hca_params_found" 
> (current value: "true", data source: default, level: 9 dev/all, type: bool, 
> deprecated, synonym of: btl_openib_warn_no_device_params_found)
>   Warn when no device-specific parameters are found 
> in the INI file specified by the btl_openib_device_param_files MCA parameter 
> (0 = do not warn; any other value = warn)
>   Valid values: 0: f|false|disabled, 1: t|true|enabled
>  MCA btl: parameter "btl_openib_warn_default_gid_prefix" 
> (current value: "true", data source: default, level: 9 dev/all, type: bool)
>   Warn when there is more than one active ports and 
> at least one of them connected to the network with only default GID prefix 
> configured (0 = do not warn; any other value = warn)
>   Valid values: 0: f|false|disabled, 1: t|true|enabled
>  MCA btl: parameter "btl_openib_warn_nonexistent_if" (current 
> value: "true", data source: default, level: 9 dev/all, type: bool)
>   Warn if non-existent devices and/or ports are 
> specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not warn; 
> any other value = warn)
>   Valid values: 0: f|false|disabled, 1: t|true|enabled
>  MCA btl: parameter "btl_openib_abort_not_enough_reg_mem" 
> (current value: "false", data source: default, level: 9 dev/all, type: bool)
>   If there is not enough registered memory available 
> on the system for Open MPI to function properly, Open MPI will issue a 
> warning.  If this MCA parameter is set to true, then Open MPI will also abort 
> all MPI jobs (0 = warn, but do not abort; any other value = warn and abort)
>   Valid values: 0: f|false|disabled, 1: t|true|enabled
>  MCA btl: parameter "btl_openib_poll_cq_batch" (current 
> value: "256", data source: default, level: 9 dev/all, type: unsigned)
>   Retrieve up to poll_cq_batch completions from CQ
>  MCA btl: parameter 

Re: [OMPI devel] 16 byte real in Fortran

2015-10-14 Thread Nathan Hjelm

On Wed, Oct 14, 2015 at 02:40:00PM +0100, Vladimír Fuka wrote:
> Hello,
> 
>   I have a problem with using the  quadruple (128bit) or extended
> (80bit) precision reals in Fortran. I did my tests with gfortran-4.8.5
> and OpenMPI-1.7.2 (preinstalled OpenSuSE 13.2), but others confirmed
> this behaviour for more recent versions at
> http://stackoverflow.com/questions/33109040/strange-result-of-mpi-allreduce-for-16-byte-real?noredirect=1#comment54060649_33109040
> .
> 
>   When I try to use REAL*16 variables (or equivalent kind-based
> definition) and MPI_REAL16 the reductions don't give correct results
> (see the link for the exact code). I was pointed to this issue ticket
> https://github.com/open-mpi/ompi/issues/63.

As that ticket notes if REAL*16 <> long double Open MPI should be
disabling redutions on MPI_REAL16. I can take a look and see if I can
determine why that is not working as expected.

> Is there a correct way how to use the extended or quadruple precision
> in OpenMPI? My intended usage is mainly checking if differences seen
> numerical computations are getting smaller with increasing precision
> and can therefore be attributed to rounding errors. If not they could
> be a sign of a bug.

Take a look at the following article:

http://dl.acm.org/citation.cfm?id=1988419=553203244=11814269

You may be able to use the method described to get the enhanced
precision you need.

-Nathan
HPC-5, LANL


pgp3p5D1g27uS.pgp
Description: PGP signature


Re: [OMPI devel] problems compiling ompi master

2015-09-22 Thread Nathan Hjelm

Hah, opps. Typo in the coverity fixes. Fixing now.

-Nathan

On Tue, Sep 22, 2015 at 10:24:29AM -0600, Howard Pritchard wrote:
>Hi Folks,
>Is anyone seeing a problem compiling ompi today?
>This is what I'm getting
>  CC   osc_pt2pt_passive_target.lo
>In file included from ../../../../opal/include/opal_config.h:2802:0,
> from ../../../../ompi/include/ompi_config.h:29,
> from osc_pt2pt_active_target.c:24:
>osc_pt2pt_active_target.c: In function 'ompi_osc_pt2pt_get_peers':
>osc_pt2pt_active_target.c:84:35: error: 'ompi_osc_rdma_peer_t' undeclared
>(first use in this function)
> peers = calloc (size, sizeof (ompi_osc_rdma_peer_t *));
>   ^
>../../../../opal/include/opal_config_bottom.h:323:61: note: in definition
>of macro 'calloc'
> #define calloc(nmembers, size) opal_calloc((nmembers), (size),
>__FILE__, __LINE__)
> ^
>osc_pt2pt_active_target.c:84:35: note: each undeclared identifier is
>reported only once for each function it appears in
> peers = calloc (size, sizeof (ompi_osc_rdma_peer_t *));
>   ^
>../../../../opal/include/opal_config_bottom.h:323:61: note: in definition
>of macro 'calloc'
> #define calloc(nmembers, size) opal_calloc((nmembers), (size),
>__FILE__, __LINE__)
> ^
>osc_pt2pt_active_target.c:84:57: error: expected expression before ')'
>token
> peers = calloc (size, sizeof (ompi_osc_rdma_peer_t *));
> ^
>../../../../opal/include/opal_config_bottom.h:323:61: note: in definition
>of macro 'calloc'
> #define calloc(nmembers, size) opal_calloc((nmembers), (size),
>__FILE__, __LINE__)
> ^
>Howard

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18098.php



pgpCZh_9o9Rfz.pgp
Description: PGP signature


Re: [OMPI devel] --enable-spare-groups build broken

2015-09-17 Thread Nathan Hjelm

No, it was not. Will fix.

-Nathan

On Wed, Sep 16, 2015 at 07:26:58PM -0700, Ralph Castain wrote:
>Yes - Nathan made some changes related to the add_procs code. I doubt that
>configure option was checked...
>On Wed, Sep 16, 2015 at 7:13 PM, Jeff Squyres (jsquyres)
> wrote:
> 
>  Did something change in the group structure in the last 24-48 hours?
> 
>  --enable-spare-groups groups are currently broken:
> 
>  
>  make[2]: Entering directory `/home/jsquyres/git/ompi/ompi/debuggers'
>CC   libdebuggers_la-ompi_debuggers.lo
>  In file included from ../../ompi/communicator/communicator.h:38:0,
>   from ../../ompi/mca/pml/base/pml_base_request.h:32,
>   from ompi_debuggers.c:67:
>  ../../ompi/group/group.h: In function `ompi_group_get_proc_ptr':
>  ../../ompi/group/group.h:366:52: error: `peer_id' undeclared (first use
>  in this function)
>   return ompi_group_dense_lookup (group, peer_id, allocate);
>  ^
>  ../../ompi/group/group.h:366:52: note: each undeclared identifier is
>  reported only once for each function it appears in
>  -
> 
>  Can someone have a look?
> 
>  Thanks.
>  --
>  Jeff Squyres
>  jsquy...@cisco.com
>  For corporate legal information go to:
>  http://www.cisco.com/web/about/doing_business/legal/cri/
> 
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>  Link to this post:
>  http://www.open-mpi.org/community/lists/devel/2015/09/18056.php

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18057.php



pgpUgIJf38XsO.pgp
Description: PGP signature


Re: [OMPI devel] The issue with OMPI_FREE_LIST_GET_MT()

2015-09-16 Thread Nathan Hjelm

Not sure. I give a +1 for blowing them away. We can bring them back
later if needed.

-Nathan

On Wed, Sep 16, 2015 at 01:19:24PM -0400, George Bosilca wrote:
>As they don't even compile why are we keeping them around?
>  George.
>On Wed, Sep 16, 2015 at 12:05 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> 
>  iboffload and bfo are opal ignored by default. Neither exists in the
>  release branch.
> 
>  -Nathan
>  On Wed, Sep 16, 2015 at 12:02:29PM -0400, George Bosilca wrote:
>  >While looking into a possible fix for this problem we should also
>  cleanup
>  >in the trunk the leftover from the OMPI_FREE_LIST.
>  >$find . -name "*.[ch]" -exec grep -Hn OMPI_FREE_LIST_GET_MT {} +
>  >./opal/mca/btl/usnic/btl_usnic_compat.h:161:
>  > OMPI_FREE_LIST_GET_MT(list, (item))
>  >./ompi/mca/pml/bfo/pml_bfo_recvreq.h:89:
>  >OMPI_FREE_LIST_GET_MT(_pml_base_recv_requests, item); 
>  \
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:149:
>  > OMPI_FREE_LIST_GET_MT(>tasks_free, item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_task.h:206:
>  > OMPI_FREE_LIST_GET_MT(task_list, item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:107:
>  > OMPI_FREE_LIST_GET_MT(>frags_free[qp_index], item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:146:
>  > OMPI_FREE_LIST_GET_MT(>frags_free[qp_index], item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.c:208:
>  > OMPI_FREE_LIST_GET_MT(>device->frags_free[qp_index],
>  item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_qp_info.c:156:
>  > OMPI_FREE_LIST_GET_MT(>frags_free[qp_index], item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_collfrag.h:130:
>  > OMPI_FREE_LIST_GET_MT(>collfrags_free, item);
>  >./ompi/mca/bcol/iboffload/bcol_iboffload_frag.h:115:
>  > OMPI_FREE_LIST_GET_MT(>ml_frags_free, item);
>  >I wonder how these are even compiling ...
>  >  George.
>  >On Wed, Sep 16, 2015 at 11:59 AM, George Bosilca
>  <bosi...@icl.utk.edu>
>  >wrote:
>  >
>  >  Alexey,
>  >  This is not necessarily the fix for all cases. Most of the
>  internal uses
>  >  of the free_list can easily accommodate to the fact that no more
>  >  elements are available. Based on your description of the problem
>  I would
>  >  assume you encounter this problem once the
>  >  MCA_PML_OB1_RECV_REQUEST_ALLOC is called. In this particular case
>  the
>  >  problem is that fact that we call OMPI_FREE_LIST_GET_MT and that
>  the
>  >  upper level is unable to correctly deal with the case where the
>  returned
>  >  item is NULL. In this particular case the real fix is to use the
>  >  blocking version of the free_list accessor (similar to the case
>  for
>  >  send) OMPI_FREE_LIST_WAIT_MT.
>  >  It is also possible that I misunderstood your problem. IF the
>  solution
>  >  above doesn't work can you describe exactly where the NULL return
>  of the
>  >  OMPI_FREE_LIST_GET_MT is creating an issue?
>  >  George.
>  >  On Wed, Sep 16, 2015 at 9:03 AM, Aleksej Ryzhih
>  >  <avryzh...@compcenter.org> wrote:
>  >
>  >Hi all,
>  >
>  >We experimented with MPI+OpenMP hybrid application
>  >(MPI_THREAD_MULTIPLE support level)  where several threads
>  submits a
>  >lot of MPI_Irecv() requests simultaneously and encountered an
>  >intermittent bug OMPI_ERR_TEMP_OUT_OF_RESOURCE after
>  >MCA_PML_OB1_RECV_REQUEST_ALLOC()  because 
>  OMPI_FREE_LIST_GET_MT()
>  > returned NULL.  Investigating this bug we found that sometimes
>  the
>  >thread calling ompi_free_list_grow()  don't have any free items
>  in
>  >LIFO list at exit because other threads  retrieved  all new
>  items at
>  >opal_atomic_lifo_pop()
>  >
>  >So we suggest to change OMPI_FREE_LIST_GET_MT() as below:
>  >
>  >
>  >
>  >#define OMPI_FREE_LIST_GET_MT(fl,
>  >item) 
>\
> 

Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm

I see the problem. Before my changes ompi_comm_dup signalled that the
communicator was not an inter-communicator by setting remote_size to
0. The remote size is now from the remote group if one was supplied
(which is the case with intra-communicators) so ompi_comm_dup needs to
make sure NULL is passed for the remote_group when duplicating
intra-communicators.

I opened a PR. Once jenkins finishes I will merge it onto master.

-Nathan

On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> yes, I did fresh pull this morning, for me it deadlocks reliably for 2 and
> more processes.
> 
> Thanks
> Edgar
> 
> On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> >
> >The reproducer is working for me with master on OX 10.10. Some changes
> >to ompi_comm_set went in yesterday. Are you on the latest hash?
> >
> >-Nathan
> >
> >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> >>something is borked right now on master in the management of inter vs. intra
> >>communicators. It looks like intra communicators are wrongly selecting the
> >>inter coll module thinking that it is an inter communicator, and we have
> >>hangs because of that. I attach a small replicator, where a bcast of a
> >>duplicate of MPI_COMM_WORLD hangs, because the inter collective module is
> >>being selected.
> >>
> >>Thanks
> >>Edgar
> >
> >>#include 
> >>#include "mpi.h"
> >>
> >>int main( int argc, char *argv[] )
> >>{
> >>   MPI_Comm comm1;
> >>   int root=0;
> >>   int rank2, size2, global_buf=1;
> >>   int rank, size;
> >>
> >>   MPI_Init ( ,  );
> >>
> >>   MPI_Comm_rank ( MPI_COMM_WORLD,  );
> >>   MPI_Comm_size ( MPI_COMM_WORLD,  );
> >>
> >>/* Setting up a new communicator */
> >>   MPI_Comm_dup ( MPI_COMM_WORLD,  );
> >>
> >>   MPI_Comm_size ( comm1,  );
> >>   MPI_Comm_rank ( comm1,  );
> >>
> >>
> >>   MPI_Bcast ( _buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> >>   if ( rank == root ) {
> >>   printf("Bcast on MPI_COMM_WORLD finished\n");
> >>   }
> >>   MPI_Bcast ( _buf, 1, MPI_INT, root, comm1 );
> >>   if ( rank == root ) {
> >>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> >>   }
> >>
> >>   MPI_Comm_free (  );
> >>
> >>   MPI_Finalize ();
> >>   return ( 0 );
> >>}
> >
> >>___
> >>devel mailing list
> >>de...@open-mpi.org
> >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>Link to this post: 
> >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
> >
> >
> >
> >___
> >devel mailing list
> >de...@open-mpi.org
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >Link to this post: 
> >http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
> >
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php


pgpmMsCoitOgp.pgp
Description: PGP signature


Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm

I just realized my branch is behind master. Updating now and will retest.

-Nathan

On Wed, Sep 16, 2015 at 10:43:45AM -0500, Edgar Gabriel wrote:
> yes, I did fresh pull this morning, for me it deadlocks reliably for 2 and
> more processes.
> 
> Thanks
> Edgar
> 
> On 9/16/2015 10:42 AM, Nathan Hjelm wrote:
> >
> >The reproducer is working for me with master on OX 10.10. Some changes
> >to ompi_comm_set went in yesterday. Are you on the latest hash?
> >
> >-Nathan
> >
> >On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> >>something is borked right now on master in the management of inter vs. intra
> >>communicators. It looks like intra communicators are wrongly selecting the
> >>inter coll module thinking that it is an inter communicator, and we have
> >>hangs because of that. I attach a small replicator, where a bcast of a
> >>duplicate of MPI_COMM_WORLD hangs, because the inter collective module is
> >>being selected.
> >>
> >>Thanks
> >>Edgar
> >
> >>#include 
> >>#include "mpi.h"
> >>
> >>int main( int argc, char *argv[] )
> >>{
> >>   MPI_Comm comm1;
> >>   int root=0;
> >>   int rank2, size2, global_buf=1;
> >>   int rank, size;
> >>
> >>   MPI_Init ( ,  );
> >>
> >>   MPI_Comm_rank ( MPI_COMM_WORLD,  );
> >>   MPI_Comm_size ( MPI_COMM_WORLD,  );
> >>
> >>/* Setting up a new communicator */
> >>   MPI_Comm_dup ( MPI_COMM_WORLD,  );
> >>
> >>   MPI_Comm_size ( comm1,  );
> >>   MPI_Comm_rank ( comm1,  );
> >>
> >>
> >>   MPI_Bcast ( _buf, 1, MPI_INT, root, MPI_COMM_WORLD );
> >>   if ( rank == root ) {
> >>   printf("Bcast on MPI_COMM_WORLD finished\n");
> >>   }
> >>   MPI_Bcast ( _buf, 1, MPI_INT, root, comm1 );
> >>   if ( rank == root ) {
> >>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
> >>   }
> >>
> >>   MPI_Comm_free (  );
> >>
> >>   MPI_Finalize ();
> >>   return ( 0 );
> >>}
> >
> >>___
> >>devel mailing list
> >>de...@open-mpi.org
> >>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>Link to this post: 
> >>http://www.open-mpi.org/community/lists/devel/2015/09/18040.php
> >
> >
> >
> >___
> >devel mailing list
> >de...@open-mpi.org
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >Link to this post: 
> >http://www.open-mpi.org/community/lists/devel/2015/09/18042.php
> >
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18043.php


pgpYHFc1iNhFV.pgp
Description: PGP signature


Re: [OMPI devel] inter vs. intra communicator problem on master

2015-09-16 Thread Nathan Hjelm

The reproducer is working for me with master on OX 10.10. Some changes
to ompi_comm_set went in yesterday. Are you on the latest hash?

-Nathan

On Wed, Sep 16, 2015 at 08:49:59AM -0500, Edgar Gabriel wrote:
> something is borked right now on master in the management of inter vs. intra
> communicators. It looks like intra communicators are wrongly selecting the
> inter coll module thinking that it is an inter communicator, and we have
> hangs because of that. I attach a small replicator, where a bcast of a
> duplicate of MPI_COMM_WORLD hangs, because the inter collective module is
> being selected.
> 
> Thanks
> Edgar

> #include 
> #include "mpi.h"
> 
> int main( int argc, char *argv[] )
> {
>   MPI_Comm comm1;
>   int root=0;
>   int rank2, size2, global_buf=1;
>   int rank, size;
> 
>   MPI_Init ( ,  );
> 
>   MPI_Comm_rank ( MPI_COMM_WORLD,  );
>   MPI_Comm_size ( MPI_COMM_WORLD,  );
> 
> /* Setting up a new communicator */
>   MPI_Comm_dup ( MPI_COMM_WORLD,  );
> 
>   MPI_Comm_size ( comm1,  );
>   MPI_Comm_rank ( comm1,  );
> 
> 
>   MPI_Bcast ( _buf, 1, MPI_INT, root, MPI_COMM_WORLD );
>   if ( rank == root ) {
>   printf("Bcast on MPI_COMM_WORLD finished\n");
>   }
>   MPI_Bcast ( _buf, 1, MPI_INT, root, comm1 );
>   if ( rank == root ) {
>   printf("Bcast on duplicate of MPI_COMM_WORLD finished\n");
>   }
> 
>   MPI_Comm_free (  );
> 
>   MPI_Finalize ();
>   return ( 0 );
> }

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/09/18040.php



pgp5P6XiYmwtY.pgp
Description: PGP signature


Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Nathan Hjelm

+1

On Mon, Aug 24, 2015 at 07:08:02PM +, Jeff Squyres (jsquyres) wrote:
> FWIW, we have had verbal agreement in the past that the v1.8 series was the 
> last one to contain MX support.  I think it would be fine for all MX-related 
> components to disappear from v1.10.
> 
> Don't forget that Myricom as an HPC company no longer exists.
> 
> 
> > On Aug 24, 2015, at 2:34 PM, Paul Hargrove  wrote:
> > 
> > 
> > On Mon, Aug 24, 2015 at 10:52 AM, Paul Hargrove  wrote:
> > Thus if this newly reported problem is (as I am going to guess) in 
> > config/ompi_check_mx.m4 then it may go unfixed.
> > You say you and I are the only ones to care, and I think we both care for 
> > reasons related to software quality rather than any desire to use MX.
> > 
> > I looked to see where the -rpath options are coming from.
> > I am 95% certain that libtool is constructing them from the 
> > network-specific .la files (such as libfabric.la).
> > That is also the reason why libfabric gets linked by full path instead of a 
> > "-l" option.
> > 
> > So, my conclusions:
> > 
> > 1. Since there is no libmyriexpress.la, one should either
> >1a.  add the MX libdir to LD_LIBRARY_PATH
> >1b.  use the wrapper-ldflags family of configure arguments to add an 
> > rpath
> > 
> > 2. There is *probably* no Open MPI bug here assuming the authors of MX 
> > support assumed "1a".
> > 
> > In support of these conclusion, the following is quoted from the MX 
> > installation instructions:
> > For Linux, FreeBSD and Solaris, add the MX library directory to the
> > system library search path. Otherwise, individual users will have to
> > either manage their LD_LIBRARY_PATH(_64) environment variable or 
> > link
> > their program with an "-rpath/-R" option for the dynamic linker to
> > locate the MX shared library.
> > 
> > So, I am actually wondering if Ralph's changes yesterday to "fix" 
> > $(WRAPPER_EXTRA_LDFLAGS) might have been unnecessary.
> > Instead, I think *removing* those [testname]_LDFLAGS lines may be the 
> > correct solution - they were *empty* until rc6.
> > 
> > IMHO:  dropping MX support in 1.10 is probably wise given the lack of 
> > vendor support .
> > 
> > -Paul
> > 
> > -- 
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2015/08/17827.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/08/17829.php


pgpjlU8rw13qI.pgp
Description: PGP signature


Re: [OMPI devel] 1.10.0rc3 build failure Solaris/x86 + gcc

2015-08-20 Thread Nathan Hjelm

No problem. I should have caught it in my post-cherry-pick tests. I
forgot to test with -m32.

-Nathan

On Thu, Aug 20, 2015 at 11:37:17AM -0700, Paul Hargrove wrote:
>Excellent.  Sorry I let this escape into the 1.8.8 release.
>-Paul
>On Thu, Aug 20, 2015 at 10:29 AM, Jeff Squyres (jsquyres)
><jsquy...@cisco.com> wrote:
> 
>  (the fix has been merged in to v1.8 and v1.10 branches)
>  > On Aug 20, 2015, at 12:18 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>  >
>  >
>  > I see the problem. Both Ralph and I missed an error in the
>  > cherry-pick. For add_32 in the ia32 atomics we were checking for
>  > OPAL_GCC_INLINE_ASSEMBLY instead of OMPI_GCC_INLINE_ASSEMBLY.
>  >
>  > -Nathan
>  >
>  > On Thu, Aug 20, 2015 at 03:01:35PM +, Jeff Squyres (jsquyres)
>  wrote:
>  >> Paul --
>  >>
>  >> I see that there was an ASM change in 1.8.8.  At first look, it seems
>  harmless / shouldn't have caused this kind of problem.
>  >>
>  >> Nathan is checking into it...
>  >>
>  >>
>  >>
>  >>> On Aug 14, 2015, at 9:52 PM, Paul Hargrove <phhargr...@lbl.gov>
>  wrote:
>  >>>
>  >>> I have a systems running Solaris 11.1 on x86-64 hardware and 11.2 in
>  an x86-64 VM.
>  >>> To the extent I have tested the results are the same on both,
>  despite gcc-4.5.2 vs 4.8.2
>  >>>
>  >>> I have normally tested only the Sun/Oracle Studio compilers on these
>  systems.
>  >>> However, today I gave the vendor-provided gcc, g++ and gfortran in
>  /usr/bin a try.
>  >>> So I configured the OpenMPI 1.10.0rc3 tarball with NO arguments to
>  configure.
>  >>>
>  >>> When doing so I see tons of warnings like:
>  >>>
>  >>> ../../../../openmpi-1.10.0rc3/opal/include/opal/sys/atomic.h:393:9:
>  warning: `opal_atomic_add_32' used but never defined
>  >>> ../../../../openmpi-1.10.0rc3/opal/include/opal/sys/atomic.h:401:9:
>  warning: `opal_atomic_sub_32' used but never defined
>  >>>
>  >>> and an eventual link failure to match:
>  >>>
>  >>>  CCLD libopen-pal.la
>  >>> Text relocation remains referenced
>  >>>against symbol  offset  in file
>  >>> opal_atomic_add_32  0x1e4 
>   runtime/.libs/opal_progress.o
>  >>> opal_atomic_sub_32  0x234 
>   runtime/.libs/opal_progress.o
>  >>> ld: fatal: relocations remain against allocatable but non-writable
>  sections
>  >>> collect2: ld returned 1 exit status
>  >>>
>  >>>
>  >>>
>  >>> Here is the possibly-relevant portion of the configure output:
>  >>>
>  >>> checking if gcc -std=gnu99 supports GCC inline assembly... yes
>  >>> checking if gcc -std=gnu99 supports DEC inline assembly... no
>  >>> checking if gcc -std=gnu99 supports XLC inline assembly... no
>  >>> checking for assembly format...
>  default-.text-.globl-:--.L-@-1-0-1-1-0
>  >>> checking for assembly architecture... IA32
>  >>> checking for builtin atomics... BUILTIN_NO
>  >>> checking for perl... perl
>  >>> checking for pre-built assembly file... yes
>  (atomic-ia32-linux-nongas.s)
>  >>> checking for atomic assembly filename... atomic-ia32-linux-nongas.s
>  >>>
>  >>>
>  >>> The same problem is present in Open MPI 1.8.8, but 1.8.7 builds just
>  fine.
>  >>>
>  >>> Note that on Solaris the default ABI is ILP32 (e.g. default to -m32
>  rather than -m64).
>  >>> There are no problems with LP64 builds ("-m64" in *FLAGS and the
>  wrapper flags).
>  >>> There are also no problems with either ILP32 or LP64 and the Studio
>  compilers.
>  >>> Only gcc with (default) 32-bit target experiences this failure.
>  >>>
>  >>> -Paul
>  >>>
>  >>> --
>  >>> Paul H. Hargrove  phhargr...@lbl.gov
>  >>> Computer Languages & Systems Software (CLaSS) Group
>  >>> Computer Science Department   Tel: +1-510-495-

Re: [OMPI devel] 1.10.0rc3 build failure Solaris/x86 + gcc

2015-08-20 Thread Nathan Hjelm

I see the problem. Both Ralph and I missed an error in the
cherry-pick. For add_32 in the ia32 atomics we were checking for
OPAL_GCC_INLINE_ASSEMBLY instead of OMPI_GCC_INLINE_ASSEMBLY.

-Nathan

On Thu, Aug 20, 2015 at 03:01:35PM +, Jeff Squyres (jsquyres) wrote:
> Paul --
> 
> I see that there was an ASM change in 1.8.8.  At first look, it seems 
> harmless / shouldn't have caused this kind of problem.
> 
> Nathan is checking into it...
> 
> 
> 
> > On Aug 14, 2015, at 9:52 PM, Paul Hargrove  wrote:
> > 
> > I have a systems running Solaris 11.1 on x86-64 hardware and 11.2 in an 
> > x86-64 VM.
> > To the extent I have tested the results are the same on both, despite 
> > gcc-4.5.2 vs 4.8.2
> > 
> > I have normally tested only the Sun/Oracle Studio compilers on these 
> > systems.
> > However, today I gave the vendor-provided gcc, g++ and gfortran in /usr/bin 
> > a try.
> > So I configured the OpenMPI 1.10.0rc3 tarball with NO arguments to 
> > configure.
> > 
> > When doing so I see tons of warnings like:
> > 
> > ../../../../openmpi-1.10.0rc3/opal/include/opal/sys/atomic.h:393:9: 
> > warning: `opal_atomic_add_32' used but never defined
> > ../../../../openmpi-1.10.0rc3/opal/include/opal/sys/atomic.h:401:9: 
> > warning: `opal_atomic_sub_32' used but never defined
> > 
> > and an eventual link failure to match:
> > 
> >   CCLD libopen-pal.la
> > Text relocation remains referenced
> > against symbol  offset  in file
> > opal_atomic_add_32  0x1e4   
> > runtime/.libs/opal_progress.o
> > opal_atomic_sub_32  0x234   
> > runtime/.libs/opal_progress.o
> > ld: fatal: relocations remain against allocatable but non-writable sections
> > collect2: ld returned 1 exit status
> > 
> > 
> > 
> > Here is the possibly-relevant portion of the configure output:
> > 
> > checking if gcc -std=gnu99 supports GCC inline assembly... yes
> > checking if gcc -std=gnu99 supports DEC inline assembly... no
> > checking if gcc -std=gnu99 supports XLC inline assembly... no
> > checking for assembly format... default-.text-.globl-:--.L-@-1-0-1-1-0
> > checking for assembly architecture... IA32
> > checking for builtin atomics... BUILTIN_NO
> > checking for perl... perl
> > checking for pre-built assembly file... yes (atomic-ia32-linux-nongas.s)
> > checking for atomic assembly filename... atomic-ia32-linux-nongas.s
> > 
> > 
> > The same problem is present in Open MPI 1.8.8, but 1.8.7 builds just fine.
> > 
> > Note that on Solaris the default ABI is ILP32 (e.g. default to -m32 rather 
> > than -m64).
> > There are no problems with LP64 builds ("-m64" in *FLAGS and the wrapper 
> > flags).
> > There are also no problems with either ILP32 or LP64 and the Studio 
> > compilers.
> > Only gcc with (default) 32-bit target experiences this failure.
> > 
> > -Paul
> > 
> > -- 
> > Paul H. Hargrove  phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department   Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2015/08/17750.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/08/17766.php


pgpBjYKaUKMEM.pgp
Description: PGP signature


Re: [OMPI devel] [OMPI users] open mpi 1.8.6. MPI_T

2015-08-17 Thread Nathan Hjelm

I see the problem. The second argument of MPI_T_pvar_get_index is not
the binding. It is the variable class. Change it to:

err = MPI_T_pvar_get_index(name, varClass, _idx);

and it works as expected.

-Nathan

On Fri, Aug 14, 2015 at 03:08:42PM -0400, George Bosilca wrote:
>Another issue, maybe a little bit more unsettling.
>If I iterate over the existing pvars, and for each after retrieving their
>name I use the name to search for the associated index I get an error. A
>short example is below.
>  George.
>err = MPI_T_pvar_get_num();
>if(err) MPI_Abort(MPI_COMM_WORLD, 0);
>printf("%d MPI Performance Variables\n", numPvar);
>for(i = 0; i < numPvar; i++) {
>  nameLen = sizeof(name);
>  descLen = sizeof(desc);
>  err = MPI_T_pvar_get_info(i, name, , ,
>, , , desc,
>, , ,
>, );
>  if( (MPI_SUCCESS != err) && (MPI_T_ERR_INVALID_INDEX != err) ) {
>printf("Failed to read Pvar %d/%d\n", i, numPvar);
>MPI_Abort(MPI_COMM_WORLD, 0);
>  }
> 
>
> printf("\t%s\tClass-%d\tBinding-%d\tReadonly-%s\tContinous-%s\tAtomic-%s\t%s\n",
> name, varClass, binding, isReadonly ? "T" : "F",
> isContinous ? "T" : "F", isAtomic ? "T" : "F", desc);
>  err = MPI_T_pvar_get_index(name, binding, &pvar_idx);
>  if (err != MPI_SUCCESS) {
>printf("cannot find %s pvar\n", name);
>MPI_Abort(MPI_COMM_WORLD, 0);
>  }
>  if( pvar_idx != i )
>printf("This is weird (%d != %d)!\n", pvar_idx, i);
>}
>On Fri, Aug 14, 2015 at 2:36 PM, George Bosilca 
>wrote:
> 
>  For this particular test I used the current master (022a9d8).
>  I reread the MPI_T chapter and [as usual] there might be something that
>  cautions the current behavior (aka. returning MPI_T_ERR_INVALID_INDEX
>  for an index smaller than the number of cvars returned
>  by MPI_T_cvar_get_num). This is indicated by the example 14.4, page 576.
>  If I exclude this return code from the list of errors, then things are
>  working as expected.
>  What is the community feeling? Should we return the exact number of
>  available cvars, or is an upper bound a valid value?
>George.
>  On Fri, Aug 14, 2015 at 2:21 PM, Jeff Squyres (jsquyres)
>   wrote:
> 
>George: what OMPI version did you test?
>> On Aug 14, 2015, at 2:14 PM, George Bosilca 
>wrote:
>>
>> This user email requires special attention, as it highlighted some
>issues with our MPI_T variables.
>>
>> I wrote a short application to list all pvar and cvar available.
>Unexpectedly, listing the cvars leads to a lot of failures, 138 over
>1035 cvars. If a cvar is broken I would have expected (based on the
>reading of the MPI_T chapter) not to be able to iterate over them
>instead of getting an error. The tester is attached.
>>
>>   George.
>>
>>
>> -- Forwarded message --
>> From: Khalid Hasanov 
>> Date: Fri, Aug 14, 2015 at 11:14 AM
>> Subject: [OMPI users] open mpi 1.8.6. MPI_T
>> To: Open MPI Users 
>>
>>
>> Hello,
>>
>> I am trying to use MPI_T interface to set coll_tuned_bcast_algorithm
>mca parameter during run time, however I was not successful to do
>that.
>>
>> I wonder if is it currently supported in Open MPI.
>>
>> I had the same problem with setting btl_self_eager_limit parameter.
>>
>> The code I am using attached below.
>>
>>
>> Thanks.
>>
>>
>> --
>> Best Regards,
>> Khalid
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>http://www.open-mpi.org/community/lists/users/2015/08/27470.php
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>http://www.open-mpi.org/community/lists/devel/2015/08/17744.php
> 
>--
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
> 
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: 

Re: [OMPI devel] [OMPI users] open mpi 1.8.6. MPI_T

2015-08-17 Thread Nathan Hjelm

That is interesting. Let me look at the logic and see if I can determine
what is going wrong.

It could be a naming issue, i.e. opal_btl_vader_flags vs
btl_vader_flags. Both are valid names for the same variable but the
search may only be succeeding for one. Should be simple enough to
fix if that is the case.

-Nathan
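
If the naming hypothesis is right, it is easy to check from user code by
probing the same variable under both names and seeing which lookup succeeds.
A hedged sketch (the two names are taken from the example above; any
registered control-variable name could be substituted):

    #include <mpi.h>
    #include <stdio.h>

    /* look up a control variable by name and report the result */
    static void probe(const char *name)
    {
        int idx;
        if (MPI_SUCCESS == MPI_T_cvar_get_index(name, &idx)) {
            printf("%-24s -> index %d\n", name, idx);
        } else {
            printf("%-24s -> not found\n", name);
        }
    }

    int main(void)
    {
        int provided;
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
        probe("opal_btl_vader_flags");
        probe("btl_vader_flags");
        MPI_T_finalize();
        return 0;
    }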

On Fri, Aug 14, 2015 at 03:08:42PM -0400, George Bosilca wrote:
>Another issue, maybe a little bit more unsettling.
>If I iterate over the existing pvars, and for each after retrieving their
>name I use the name to search for the associated index I get an error. A
>short example is below.
>  George.
>err = MPI_T_pvar_get_num(&numPvar);
>if(err) MPI_Abort(MPI_COMM_WORLD, 0);
>printf("%d MPI Performance Variables\n", numPvar);
>for(i = 0; i < numPvar; i++) {
>  nameLen = sizeof(name);
>  descLen = sizeof(desc);
>  err = MPI_T_pvar_get_info(i, name, &nameLen, &verbosity,
>&varClass, &datatype, &enumtype, desc,
>&descLen, &binding, &isReadonly,
>&isContinous, &isAtomic);
>  if( (MPI_SUCCESS != err) && (MPI_T_ERR_INVALID_INDEX != err) ) {
>printf("Failed to read Pvar %d/%d\n", i, numPvar);
>MPI_Abort(MPI_COMM_WORLD, 0);
>  }
> 
>
> printf("\t%s\tClass-%d\tBinding-%d\tReadonly-%s\tContinous-%s\tAtomic-%s\t%s\n",
> name, varClass, binding, isReadonly ? "T" : "F",
> isContinous ? "T" : "F", isAtomic ? "T" : "F", desc);
>  err = MPI_T_pvar_get_index(name, binding, &pvar_idx);
>  if (err != MPI_SUCCESS) {
>printf("cannot find %s pvar\n", name);
>MPI_Abort(MPI_COMM_WORLD, 0);
>  }
>  if( pvar_idx != i )
>printf("This is weird (%d != %d)!\n", pvar_idx, i);
>}
>On Fri, Aug 14, 2015 at 2:36 PM, George Bosilca 
>wrote:
> 
>  For this particular test I used the current master (022a9d8).
>  I reread the MPI_T chapter and [as usual] there might be something that
>  cautions the current behavior (aka. returning MPI_T_ERR_INVALID_INDEX
>  for an index smaller than the number of cvars returned
>  by MPI_T_cvar_get_num). This is indicated by the example 14.4, page 576.
>  If I exclude this return code from the list of errors, then things are
>  working as expected.
>  What is the community feeling? Should we return the exact number of
>  available cvars, or is an upper bound a valid value?
>George.
>  On Fri, Aug 14, 2015 at 2:21 PM, Jeff Squyres (jsquyres)
>   wrote:
> 
>George: what OMPI version did you test?
>> On Aug 14, 2015, at 2:14 PM, George Bosilca 
>wrote:
>>
>> This user email requires special attention, as it highlighted some
>issues with our MPI_T variables.
>>
>> I wrote a short application to list all pvar and cvar available.
>Unexpectedly, listing the cvars leads to a lot of failures, 138 over
>1035 cvars. If a cvar is broken I would have expected (based on the
>reading of the MPI_T chapter) not to be able to iterate over them
>instead of getting an error. The tester is attached.
>>
>>   George.
>>
>>
>> -- Forwarded message --
>> From: Khalid Hasanov 
>> Date: Fri, Aug 14, 2015 at 11:14 AM
>> Subject: [OMPI users] open mpi 1.8.6. MPI_T
>> To: Open MPI Users 
>>
>>
>> Hello,
>>
>> I am trying to use MPI_T interface to set coll_tuned_bcast_algorithm
>mca parameter during run time, however I was not successful to do
>that.
>>
>> I wonder if is it currently supported in Open MPI.
>>
>> I had the same problem with setting btl_self_eager_limit parameter.
>>
>> The code I am using attached below.
>>
>>
>> Thanks.
>>
>>
>> --
>> Best Regards,
>> Khalid
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>http://www.open-mpi.org/community/lists/users/2015/08/27470.php
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>http://www.open-mpi.org/community/lists/devel/2015/08/17744.php
> 
>--
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
> 
>

Re: [OMPI devel] C standard compatibility

2015-07-30 Thread Nathan Hjelm
On Thu, Jul 30, 2015 at 12:41:33PM +, Jeff Squyres (jsquyres) wrote:
> We only recently started allowing the use of C99 in the code base (i.e., we 
> put AC_PROG_CC_C99 in configure.ac).
> 
> There's no *requirement* to use C99 throughout the code, but we generally do 
> the following kinds of things:
> 
> * restrict (as you noted)
> * variable declarations in the middle of blocks / loops (as you noted)
> * struct member initialization

Though there is no requirement, we do strongly recommend the use of
designated initializers for structure initialization. It may save us from
future headaches as structures evolve.

-Nathan
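
A quick illustration of the difference, with a made-up structure (not an
actual OMPI type):

    struct example_module {
        int         version;
        const char *name;
        int       (*init_fn)(void);
    };

    /* positional: silently breaks if the member order ever changes */
    static struct example_module a = { 3, "example", NULL };

    /* designated: each member is named, unlisted members are zeroed */
    static struct example_module b = {
        .version = 3,
        .name    = "example",
        .init_fn = NULL,
    };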




Re: [OMPI devel] RFC: kill alpha asm support

2015-07-14 Thread Nathan Hjelm

That last sentence got mucked up somehow. Should read

Anyone still interested in alpha support can use the gcc sync atomics or
stick with 1.10 and earlier.


Also, it looks like a Chinese company makes an alpha derivative called
ShenWei (https://en.wikipedia.org/wiki/ShenWei). It is not in widespread
use so its existence should not save the alpha asm.

-Nathan
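
For anyone who does still care about alpha, the replacement path is
straightforward. A rough sketch of what the gcc sync builtins provide
(illustrative wrapper names only, not the real opal_atomic_* definitions):

    #include <stdint.h>

    /* atomically add delta and return the new value */
    static inline int32_t my_atomic_add_32(volatile int32_t *addr, int32_t delta)
    {
        return __sync_add_and_fetch(addr, delta);
    }

    /* atomically subtract delta and return the new value */
    static inline int32_t my_atomic_sub_32(volatile int32_t *addr, int32_t delta)
    {
        return __sync_sub_and_fetch(addr, delta);
    }

    /* compare-and-swap: returns nonzero if the swap happened */
    static inline int my_atomic_cmpset_32(volatile int32_t *addr,
                                          int32_t oldval, int32_t newval)
    {
        return __sync_bool_compare_and_swap(addr, oldval, newval);
    }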

On Tue, Jul 14, 2015 at 01:29:28PM -0600, Nathan Hjelm wrote:
> 
> I would like to kill the alpha assembly support in Open MPI in 2.x and
> master. alpha processors have not been available since 2007. Anyone
> still interested in alpha support can use the gcc sync atomics are stick
> with 1.10 or earlier?
> 
> Any objections?
> 
> -Nathan



> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/07/17640.php





[OMPI devel] RFC: kill alpha asm support

2015-07-14 Thread Nathan Hjelm

I would like to kill the alpha assembly support in Open MPI in 2.x and
master. alpha processors have not been available since 2007. Anyone
still interested in alpha support can use the gcc sync atomics are stick
with 1.10 or earlier?

Any objections?

-Nathan




Re: [OMPI devel] Open MPI 1.8.6 memory leak

2015-07-01 Thread Nathan Hjelm

Don't see the leak on master with OS X using the leaks command. Will see
what valgrind finds on linux.

-Nathan

On Wed, Jul 01, 2015 at 08:48:57PM +, Rolf vandeVaart wrote:
>There have been two reports on the user list about memory leaks.  I have
>reproduced this leak with LAMMPS.  Note that this has nothing to do with
>CUDA-aware features.  The steps that Stefan has provided make it easy to
>reproduce.
> 
> 
> 
>Here are some more specific steps to reproduce derived from Stefan.
> 
> 
> 
>1. clone LAMMPS (git clone git://git.lammps.org/lammps-ro.git lammps)
>2. cd src/, compile with openMPI 1.8.6.  To do this, set your path to Open
>MPI and type "make mpi"
>3. run the example listed in lammps/examples/melt. To do this, first copy
>"lmp_mpi" from the src directory into the melt directory.  Then you need
>to modify the in.melt file so that it will run for a while.  Change
>"run 25" to "run25"
> 
>4. you can run by mpirun -np 2 lmp_mpi < in.melt
> 
> 
> 
>For reference, here is the memory consumption of both 1.8.5 and 1.8.6.  1.8.5
>stays very stable, whereas 1.8.6 almost triples after 6 minutes of running.
> 
> 
> 
>Open MPI 1.8.5
> 
> 
> 
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 59.0  0.0 329672 14584 pts/16   Rl   16:24   0:00
>./lmp_mpi_185_nocuda
>3234126908 60.0  0.0 329672 14676 pts/16   Rl   16:24   0:00
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 98.3  0.0 329672 14932 pts/16   Rl   16:24   0:30
>./lmp_mpi_185_nocuda
>3234126908 98.5  0.0 329672 14932 pts/16   Rl   16:24   0:30
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 98.9  0.0 329672 14960 pts/16   Rl   16:24   1:00
>./lmp_mpi_185_nocuda
>3234126908 99.1  0.0 329672 14952 pts/16   Rl   16:24   1:00
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.1  0.0 329672 14960 pts/16   Rl   16:24   1:30
>./lmp_mpi_185_nocuda
>3234126908 99.3  0.0 329672 14952 pts/16   Rl   16:24   1:30
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.2  0.0 329672 14960 pts/16   Rl   16:24   2:00
>./lmp_mpi_185_nocuda
>3234126908 99.4  0.0 329672 14952 pts/16   Rl   16:24   2:00
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.3  0.0 329672 14960 pts/16   Rl   16:24   2:30
>./lmp_mpi_185_nocuda
>3234126908 99.5  0.0 329672 14952 pts/16   Rl   16:24   2:30
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.4  0.0 329672 14960 pts/16   Rl   16:24   2:59
>./lmp_mpi_185_nocuda
>3234126908 99.5  0.0 329672 14952 pts/16   Rl   16:24   3:00
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.4  0.0 329672 14960 pts/16   Rl   16:24   3:29
>./lmp_mpi_185_nocuda
>3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   3:30
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.4  0.0 329672 14960 pts/16   Rl   16:24   3:59
>./lmp_mpi_185_nocuda
>3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   4:00
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.4  0.0 329672 14960 pts/16   Rl   16:24   4:29
>./lmp_mpi_185_nocuda
>3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   4:30
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.5  0.0 329672 14960 pts/16   Rl   16:24   4:59
>./lmp_mpi_185_nocuda
>3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:00
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.5  0.0 329672 14960 pts/16   Rl   16:24   5:29
>./lmp_mpi_185_nocuda
>3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:29
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126907 99.5  0.0 329672 14960 pts/16   Rl   16:24   5:59
>./lmp_mpi_185_nocuda
>3234126908 99.6  0.0 329672 14956 pts/16   Rl   16:24   5:59
>./lmp_mpi_185_nocuda
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> 
> 
> 
>Open MPI 1.8.6
> 
> 
> 
>USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>3234126755  0.0  0.0 330288 

Re: [OMPI devel] opal_lifo hangs on ppc in master

2015-07-01 Thread Nathan Hjelm

Paul, can you send me the config.log for the ppc build?

-Nathan

On Wed, Jul 01, 2015 at 09:33:53AM -0700, Paul Hargrove wrote:
>Testing last night's master tarball with "make check" I find that
>opal_lifo *hangs* on every ppc/linux system I try, including both gcc and
>xlc, both 32- and 64-bit CPUs and even a little-endian POWER8.
>Attaching gdb to a hung process yields:
>(gdb) thread apply all bt full
>Thread 1 (Thread 0xfff9e4f8f30 (LWP 32858)):
>#0  0x10001778 in check_lifo_consistency (lifo=0xfffde0100b8,
>expected_count=100)
>at
>
> /home/hargrove/SCRATCH/OMPI/openmpi-master-linux-ppc64-xlc-12.1/openmpi-dev-2014-gc8730b5/test/class/opal_lifo.c:73
>item = 0x1003d6a8190
>count = 215728334
>__func__ = "check_lifo_consistency"
>#1  0x100022e4 in main (argc=1, argv=0xfffde010688)
>at
>
> /home/hargrove/SCRATCH/OMPI/openmpi-master-linux-ppc64-xlc-12.1/openmpi-dev-2014-gc8730b5/test/class/opal_lifo.c:171
>threads = {17590531453408, 17590520967648, 17590510481888,
>1759046128, 17590489510368, 
>  17590479024608, 17590468538848, 17590458053088}
>item = 0x1003d6a83c0
>prev = 0xfffde0100f0
>item2 = 0x1003d6a2cf0
>start = {tv_sec = 1435767655, tv_usec = 746823}
>stop = {tv_sec = 1435767667, tv_usec = 450808}
>total = {tv_sec = 11, tv_usec = 703985}
>lifo = {super = {obj_magic_id = 16046253926196952813, obj_class =
>0xfff9e431fb0, 
>obj_reference_count = 1, 
>cls_init_file_name = 0x10003524
>
> "/home/hargrove/SCRATCH/OMPI/openmpi-master-linux-ppc64-xlc-12.1/openmpi-dev-2014-gc8730b5/test/class/opal_lifo.c",
>cls_init_lineno = 96}, opal_lifo_head = {data = {
>  counter = 0, item = 0x1003d6a8190}}, opal_lifo_ghost =
>{super = {
>  obj_magic_id = 16046253926196952813, obj_class =
>0xfff9e431e18, obj_reference_count = 1, 
>  cls_init_file_name = 0xfff9e36f528
>
> "/home/hargrove/SCRATCH/OMPI/openmpi-master-linux-ppc64-xlc-12.1/openmpi-dev-2014-gc8730b5/opal/class/opal_lifo.c",
>cls_init_lineno = 27}, 
>opal_list_next = 0xfffde0100f0, opal_list_prev = 0x0,
>item_free = 1, 
>opal_list_item_refcount = 0, opal_list_item_belong_to = 0x0}}
>success = false
>timing = 1.462998124999e-06
>rc = 0
>__func__ = "main"
>-Paul
>--
>Paul H. Hargrove  phhargr...@lbl.gov
>Computer Languages & Systems Software (CLaSS) Group
>Computer Science Department   Tel: +1-510-495-2352
>Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/07/17587.php





Re: [OMPI devel] error in test/threads/opal_condition.c

2015-07-01 Thread Nathan Hjelm

PGI no longer surprises me with how bad it is. The lines in question look
ok to me. We can fix this (and remove the common symbols) by removing
the initializers and making the variables static. I will go ahead and do
this.

-Nathan
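
Concretely, the fix described above for test/threads/opal_condition.c would
look something like this (file-scope statics are zero-initialized, so no
initializer is needed; includes follow the paths used elsewhere in the tree):

    #include "opal/threads/mutex.h"
    #include "opal/threads/condition.h"

    /* was:  opal_mutex_t mutex = {};  (and likewise for the conditions) */
    static opal_mutex_t     mutex;
    static opal_condition_t thr1_cond;
    static opal_condition_t thr2_cond;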

On Wed, Jul 01, 2015 at 05:41:59AM -0700, Paul Hargrove wrote:
>I find that PGI version 9, 10, 11, 12, 13 and 14 all fail "make check"
>with last night's master tarball.  All except 9 fail with pretty much the
>same message:
>  CC   opal_condition.o
>PGC-S-0155-Empty initializer not supported
> 
> (/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2014-gc8730b5/test/threads/opal_condition.c:
>48)
>PGC-S-0155-Empty initializer not supported
> 
> (/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2014-gc8730b5/test/threads/opal_condition.c:
>49)
>PGC-S-0155-Empty initializer not supported
> 
> (/scratch/scratchdirs/hargrove/OMPI/openmpi-master-linux-x86_64-pgi-14.7/openmpi-dev-2014-gc8730b5/test/threads/opal_condition.c:
>50)
>PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
>make[3]: *** [opal_condition.o] Error 2
>Those lines are:
>opal_mutex_t mutex = {};
>opal_condition_t thr1_cond = {};
>opal_condition_t thr2_cond = {};
>I have no clue why pgi won't accept that when every other compiler does.
>Tests were on NERSC's Hopper and Carver, where Howard should be able to
>reproduce.
>-Paul
>--
>Paul H. Hargrove  phhargr...@lbl.gov
>Computer Languages & Systems Software (CLaSS) Group
>Computer Science Department   Tel: +1-510-495-2352
>Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/07/17580.php





Re: [OMPI devel] RFC: standardize verbosity values

2015-06-08 Thread Nathan Hjelm

Yes.

-Nathan

On Mon, Jun 08, 2015 at 09:17:17AM -0700, Ralph Castain wrote:
> So how is the user going to specify these? -mca oob_base_verbose debug?
> 
> > On Jun 8, 2015, at 9:11 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
> > 
> > 
> > That would work. The standard verbosity levels could be one of those
> > values but allow any number in the interval [0,100] or any of none,
> > error, warn, info, debug, and trace. The standard levels could be
> > defined as:
> > 
> > enum {
> >MCA_BASE_VERBOSE_NONE  = 0,
> >MCA_BASE_VERBOSE_ERROR = 1,
> >MCA_BASE_VERBOSE_WARN  = 10,
> >MCA_BASE_VERBOSE_INFO  = 20,
> >MCA_BASE_VERBOSE_DEBUG = 40,
> >MCA_BASE_VERBOSE_TRACE = 60,
> >MCA_BASE_VERBOSE_MAX   = 100,
> > };
> > 
> > static mca_base_var_enum_value_t verbose_values[] = {
> >{"none",   MCA_BASE_VERBOSE_NONE},
> >{"error",  MCA_BASE_VERBOSE_ERROR},
> >{"warn",   MCA_BASE_VERBOSE_WARN},
> >{"info",   MCA_BASE_VERBOSE_INFO},
> >{"debug",  MCA_BASE_VERBOSE_DEBUG},
> >{"trace",  MCA_BASE_VERBOSE_TRACE},
> >{NULL, -1}
> > };
> > 
> > -Nathan
> > 
> > On Tue, Jun 09, 2015 at 12:42:05AM +0900, KAWASHIMA Takahiro wrote:
> >>> static const char* const priorities[] = {
> >>>"ERROR",
> >>>"WARN",
> >>>"INFO",
> >>>"DEBUG",
> >>>"TRACE"
> >>> };
> >> 
> >> +1
> >> 
> >> I usually use these levels.
> >> 
> >> Typical usage:
> >> 
> >> ERROR:
> >>  Print an error message on returning a value other than
> >>  OMPI_SUCCESS (and OMPI_ERR_TEMP_OUT_OF_RESOURCE etc.).
> >> 
> >> WARN:
> >>  This does not indicate an error. But users/developers should
> >>  be aware on debugging/tuning. For example, network-level
> >>  timeout, hardware queue full, buggy code.
> >>  Often used with OMPI_ERR_TEMP_OUT_OF_RESOURCE.
> >> 
> >> INFO:
> >>  Information that may be useful for users and developers.
> >>  Not so verbose. Output only on initialization or
> >>  object creation etc.
> >> 
> >> DEBUG:
> >>  Information that is useful only for developers.
> >>  Not so verbose. Output once per MPI routine call.
> >> 
> >> TRACE:
> >>  Information that is useful only for developers.
> >>  Verbose. Output more than once per MPI routine call.
> >> 
> >> Regards,
> >> KAWASHIMA Takahiro
> >> 
> >>> so what about :
> >>> 
> >>> static const char* const priorities[] = {
> >>>"ERROR",
> >>>"WARN",
> >>>"INFO",
> >>>"DEBUG",
> >>>"TRACE"
> >>> };
> >>> 
> >>> and merge debug and trace if there should be only 4
> >>> 
> >>> Cheers,
> >>> 
> >>> Gilles
> >>> 
> >>> 
> >>> On Monday, June 8, 2015, Ralph Castain <r...@open-mpi.org> wrote:
> >>> 
> >>>> Could we maybe narrow it down some? If we are going to do it, let’s not
> >>>> make the mistake of the MCA param system and create so many levels. 
> >>>> Nobody
> >>>> can figure out the right gradation as it is just too fine grained.
> >>>> 
> >>>> I think Nathan’s proposal is the max that makes sense.
> >>>> 
> >>>> I’d also like to see us apply the same logic to the MCA param system.
> >>>> Let’s just define ~4 named levels and get rid of the fine grained 
> >>>> numbering.
> >>>> 
> >>>> 
> >>>> On Jun 8, 2015, at 2:04 AM, Gilles Gouaillardet <gil...@rist.or.jp
> >>>> <javascript:_e(%7B%7D,'cvml','gil...@rist.or.jp');>> wrote:
> >>>> 
> >>>> Nathan,
> >>>> 
> >>>> i think it is a good idea to use names vs numeric values for verbosity.
> >>>> 
> >>>> what about using "a la" log4c verbosity names ?
> >>>> http://sourceforge.net/projects/log4c/
> >>>> 
> >>>> static const char* const priorities[] = {
> >>>>"FATAL",
> >>>>"ALERT",
> >>&

Re: [OMPI devel] RFC: standardize verbosity values

2015-06-08 Thread Nathan Hjelm

That would work. The standard verbosity levels could be one of those
values but allow any number in the interval [0,100] or any of none,
error, warn, info, debug, and trace. The standard levels could be
defined as:

enum {
MCA_BASE_VERBOSE_NONE  = 0,
MCA_BASE_VERBOSE_ERROR = 1,
MCA_BASE_VERBOSE_WARN  = 10,
MCA_BASE_VERBOSE_INFO  = 20,
MCA_BASE_VERBOSE_DEBUG = 40,
MCA_BASE_VERBOSE_TRACE = 60,
MCA_BASE_VERBOSE_MAX   = 100,
};

static mca_base_var_enum_value_t verbose_values[] = {
{"none",   MCA_BASE_VERBOSE_NONE},
{"error",  MCA_BASE_VERBOSE_ERROR},
{"warn",   MCA_BASE_VERBOSE_WARN},
{"info",   MCA_BASE_VERBOSE_INFO},
{"debug",  MCA_BASE_VERBOSE_DEBUG},
{"trace",  MCA_BASE_VERBOSE_TRACE},
{NULL, -1}
};

-Nathan
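
To make the proposed behavior concrete, here is a hedged, plain-C sketch of
how such an enumerator would resolve a user-supplied setting -- either one of
the named levels or any integer in [0,100]. The real implementation would of
course go through the MCA var enum machinery, not a hand-rolled parser:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <strings.h>

    struct level { const char *name; int value; };

    static const struct level levels[] = {
        { "none", 0 }, { "error", 1 }, { "warn", 10 },
        { "info", 20 }, { "debug", 40 }, { "trace", 60 }, { NULL, -1 }
    };

    /* returns the verbosity value, or -1 if the input is not valid */
    static int parse_verbosity(const char *arg)
    {
        char *end;
        long v;
        for (int i = 0; levels[i].name; ++i) {
            if (0 == strcasecmp(arg, levels[i].name)) {
                return levels[i].value;
            }
        }
        v = strtol(arg, &end, 10);
        if ('\0' == *end && v >= 0 && v <= 100) {
            return (int) v;
        }
        return -1;
    }

    int main(int argc, char **argv)
    {
        printf("%d\n", parse_verbosity(argc > 1 ? argv[1] : "debug"));
        return 0;
    }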

On Tue, Jun 09, 2015 at 12:42:05AM +0900, KAWASHIMA Takahiro wrote:
> > static const char* const priorities[] = {
> > "ERROR",
> > "WARN",
> > "INFO",
> > "DEBUG",
> > "TRACE"
> > };
> 
> +1
> 
> I usually use these levels.
> 
> Typical usage:
> 
> ERROR:
>   Print an error message on returning a value other than
>   OMPI_SUCCESS (and OMPI_ERR_TEMP_OUT_OF_RESOURCE etc.).
> 
> WARN:
>   This does not indicate an error. But users/developers should
>   be aware on debugging/tuning. For example, network-level
>   timeout, hardware queue full, buggy code.
>   Often used with OMPI_ERR_TEMP_OUT_OF_RESOURCE.
> 
> INFO:
>   Information that may be useful for users and developers.
>   Not so verbose. Output only on initialization or
>   object creation etc.
> 
> DEBUG:
>   Information that is useful only for developers.
>   Not so verbose. Output once per MPI routine call.
> 
> TRACE:
>   Information that is useful only for developers.
>   Verbose. Output more than once per MPI routine call.
> 
> Regards,
> KAWASHIMA Takahiro
> 
> > so what about :
> > 
> > static const char* const priorities[] = {
> > "ERROR",
> > "WARN",
> > "INFO",
> > "DEBUG",
> > "TRACE"
> > };
> > 
> > and merge debug and trace if there should be only 4
> > 
> > Cheers,
> > 
> > Gilles
> > 
> > 
> > On Monday, June 8, 2015, Ralph Castain <r...@open-mpi.org> wrote:
> > 
> > > Could we maybe narrow it down some? If we are going to do it, let’s not
> > > make the mistake of the MCA param system and create so many levels. Nobody
> > > can figure out the right gradation as it is just too fine grained.
> > >
> > > I think Nathan’s proposal is the max that makes sense.
> > >
> > > I’d also like to see us apply the same logic to the MCA param system.
> > > Let’s just define ~4 named levels and get rid of the fine grained 
> > > numbering.
> > >
> > >
> > > On Jun 8, 2015, at 2:04 AM, Gilles Gouaillardet <gil...@rist.or.jp
> > > <javascript:_e(%7B%7D,'cvml','gil...@rist.or.jp');>> wrote:
> > >
> > >  Nathan,
> > >
> > > i think it is a good idea to use names vs numeric values for verbosity.
> > >
> > > what about using "a la" log4c verbosity names ?
> > > http://sourceforge.net/projects/log4c/
> > >
> > > static const char* const priorities[] = {
> > > "FATAL",
> > > "ALERT",
> > > "CRIT",
> > > "ERROR",
> > > "WARN",
> > > "NOTICE",
> > > "INFO",
> > > "DEBUG",
> > > "TRACE",
> > > "NOTSET",
> > > "UNKNOWN"
> > > };
> > >
> > > Cheers,
> > >
> > > Gilles
> > >
> > > On 5/30/2015 1:32 AM, Nathan Hjelm wrote:
> > >
> > > At the moment we have a loosely enforced standard for verbosity
> > > values. In general frameworks accept anything in the range 0 - 100 with
> > > few exceptions. I am thinking about adding an enumerator for verbosities
> > > that will accept values in this range and certain named constants which
> > > will match with specific verbosity levels. One possible set: none - 0,
> > > low - 25, med - 50, high - 75, max - 100. I am open to any set of named
> > > verbosities.
> > >
> > > Thoughts?
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/06/17475.php




Re: [OMPI devel] Hang in IMB-RMA?

2015-05-12 Thread Nathan Hjelm

Thanks! I will look at osc/rdma in 1.8 and see about patching the
bug. The RMA code in master and 1.8 has diverged significantly but it
shouldn't be too difficult to fix.

-Nathan
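
For readers who want to exercise the code path in the backtraces below
without IMB, here is a minimal passive-target sketch (not the IMB-RMA source)
that drives MPI_Win_flush the same way:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, peer;
        double *base, remote = 0.0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        peer = (rank + 1) % size;

        MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                         MPI_COMM_WORLD, &base, &win);

        /* publish our value in the window */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
        *base = (double) rank;
        MPI_Win_unlock(rank, win);
        MPI_Barrier(MPI_COMM_WORLD);

        /* passive-target read from the neighbour, completed with a flush */
        MPI_Win_lock(MPI_LOCK_SHARED, peer, 0, win);
        MPI_Get(&remote, 1, MPI_DOUBLE, peer, 0, 1, MPI_DOUBLE, win);
        MPI_Win_flush(peer, win);   /* completes the get at the origin */
        MPI_Win_unlock(peer, win);

        printf("rank %d got %.0f from %d\n", rank, remote, peer);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }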

On Tue, May 12, 2015 at 06:50:41PM +, Friedley, Andrew wrote:
> Hi Nathan,
> 
> I should have thought to do that.  Yes, the issue seems to be fixed on master 
> -- no hangs on PSM, openib, or tcp.
> 
> Andrew
> 
> > -Original Message-
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan
> > Hjelm
> > Sent: Tuesday, May 12, 2015 9:44 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] Hang in IMB-RMA?
> > 
> > 
> > Thanks for the report. Can you try with master and see if the issue is fixed
> > there?
> > 
> > -Nathan
> > 
> > On Tue, May 12, 2015 at 04:38:01PM +, Friedley, Andrew wrote:
> > > Hi,
> > >
> > > I've run into a problem with the IMB-RMA exchange_get test.  At this point
> > I suspect it's an issue in Open MPI or the test itself.  Could someone take 
> > a
> > look?
> > >
> > > I'm running Open MPI 1.8.5 and IMB 4.0.2.  MVAPICH2 is able to run all of
> > IMB-RMA successfully.
> > >
> > >  mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA
> > >
> > > Eventually hangs at the end of exchange_get (after 4mb is reported)
> > running the np=2 pass.  IMB runs every np power of 2 up to and including the
> > np given on the command line.  So, with mpirun -np 4 above, IMB runs each
> > of its tests with np=2 and then with np=4.
> > >
> > > If I run just the exchange_get test, the same thing happens:
> > >
> > >  mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA exchange_get
> > >
> > > If I run either of the above commands with -np 2, IMB-RMA successfully
> > runs to completion.
> > >
> > > I have reproduced with tcp, verbs, and PSM -- does not appear to be
> > transport specific.  MVAPICH2 2.0 works.
> > >
> > > Below are bracktraces from two of the four ranks.  The other two ranks
> > each have a backtrace similar to these two.
> > >
> > > Thanks!
> > >
> > > Andrew
> > >
> > > #0  0x7fca39a4c0c7 in sched_yield () from /lib64/libc.so.6
> > > #1  0x7fca393ef2fb in opal_progress () at
> > > runtime/opal_progress.c:197
> > > #2  0x7fca33cd21f5 in opal_condition_wait (m=0x247fc70, c=0x247fcd8)
> > > at ../../../../opal/threads/condition.h:78
> > > #3  ompi_osc_rdma_flush_lock (module=module@entry=0x247fb50,
> > lock=0x2481a20,
> > > target=target@entry=3) at
> > > osc_rdma_passive_target.c:530
> > > #4  0x7fca33cd43bd in ompi_osc_rdma_flush (target=3,
> > win=0x2482150)
> > > at osc_rdma_passive_target.c:578
> > > #5  0x7fca39fe5654 in PMPI_Win_flush (rank=3, win=0x2482150)
> > > at pwin_flush.c:58
> > > #6  0x0040aec5 in IMB_rma_exchange_get ()
> > > #7  0x00406a35 in IMB_warm_up ()
> > > #8  0x004023bd in main ()
> > >
> > > #0  0x7f1c81890bdd in poll () from /lib64/libc.so.6
> > > #1  0x7f1c81271c86 in poll_dispatch (base=0x1be8350,
> > tv=0x7fff4c323480)
> > > at poll.c:165
> > > #2  0x7f1c81269aa4 in opal_libevent2021_event_base_loop
> > (base=0x1be8350,
> > > flags=2) at event.c:1633
> > > #3  0x7f1c812232e8 in opal_progress () at
> > > runtime/opal_progress.c:169
> > > #4  0x7f1c7b9641f5 in opal_condition_wait (m=0x1ccf4a0, c=0x1ccf508)
> > > at ../../../../opal/threads/condition.h:78
> > > #5  ompi_osc_rdma_flush_lock (module=module@entry=0x1ccf380,
> > lock=0x23287f0,
> > > target=target@entry=0) at
> > > osc_rdma_passive_target.c:530
> > > #6  0x7f1c7b9663bd in ompi_osc_rdma_flush (target=0,
> > win=0x2317d00)
> > > at osc_rdma_passive_target.c:578
> > > #7  0x7f1c81e19654 in PMPI_Win_flush (rank=0, win=0x2317d00)
> > > at pwin_flush.c:58
> > > #8  0x0040aec5 in IMB_rma_exchange_get ()
> > > #9  0x00406a35 in IMB_warm_up ()
> > > #10 0x004023bd in main ()
> > > ___
> > > devel mailing list
> > > de...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > Link to this post:
> > > http://www.open-mpi.org/community/lists/devel/2015/05/17396.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/05/17398.php




Re: [OMPI devel] Hang in IMB-RMA?

2015-05-12 Thread Nathan Hjelm

Thanks for the report. Can you try with master and see if the issue is
fixed there?

-Nathan

On Tue, May 12, 2015 at 04:38:01PM +, Friedley, Andrew wrote:
> Hi,
> 
> I've run into a problem with the IMB-RMA exchange_get test.  At this point I 
> suspect it's an issue in Open MPI or the test itself.  Could someone take a 
> look?
> 
> I'm running Open MPI 1.8.5 and IMB 4.0.2.  MVAPICH2 is able to run all of 
> IMB-RMA successfully.
> 
>  mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA
> 
> Eventually hangs at the end of exchange_get (after 4mb is reported) running 
> the np=2 pass.  IMB runs every np power of 2 up to and including the np given 
> on the command line.  So, with mpirun -np 4 above, IMB runs each of its tests 
> with np=2 and then with np=4.
> 
> If I run just the exchange_get test, the same thing happens:
> 
>  mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA exchange_get
> 
> If I run either of the above commands with -np 2, IMB-RMA successfully runs 
> to completion.
> 
> I have reproduced with tcp, verbs, and PSM -- does not appear to be transport 
> specific.  MVAPICH2 2.0 works.
> 
> Below are bracktraces from two of the four ranks.  The other two ranks each 
> have a backtrace similar to these two.
> 
> Thanks!
> 
> Andrew
> 
> #0  0x7fca39a4c0c7 in sched_yield () from /lib64/libc.so.6
> #1  0x7fca393ef2fb in opal_progress () at runtime/opal_progress.c:197
> #2  0x7fca33cd21f5 in opal_condition_wait (m=0x247fc70, c=0x247fcd8)
> at ../../../../opal/threads/condition.h:78
> #3  ompi_osc_rdma_flush_lock (module=module@entry=0x247fb50, lock=0x2481a20,
> target=target@entry=3) at osc_rdma_passive_target.c:530
> #4  0x7fca33cd43bd in ompi_osc_rdma_flush (target=3, win=0x2482150)
> at osc_rdma_passive_target.c:578
> #5  0x7fca39fe5654 in PMPI_Win_flush (rank=3, win=0x2482150)
> at pwin_flush.c:58
> #6  0x0040aec5 in IMB_rma_exchange_get ()
> #7  0x00406a35 in IMB_warm_up ()
> #8  0x004023bd in main ()
> 
> #0  0x7f1c81890bdd in poll () from /lib64/libc.so.6
> #1  0x7f1c81271c86 in poll_dispatch (base=0x1be8350, tv=0x7fff4c323480)
> at poll.c:165
> #2  0x7f1c81269aa4 in opal_libevent2021_event_base_loop (base=0x1be8350,
> flags=2) at event.c:1633
> #3  0x7f1c812232e8 in opal_progress () at runtime/opal_progress.c:169
> #4  0x7f1c7b9641f5 in opal_condition_wait (m=0x1ccf4a0, c=0x1ccf508)
> at ../../../../opal/threads/condition.h:78
> #5  ompi_osc_rdma_flush_lock (module=module@entry=0x1ccf380, lock=0x23287f0,
> target=target@entry=0) at osc_rdma_passive_target.c:530
> #6  0x7f1c7b9663bd in ompi_osc_rdma_flush (target=0, win=0x2317d00)
> at osc_rdma_passive_target.c:578
> #7  0x7f1c81e19654 in PMPI_Win_flush (rank=0, win=0x2317d00)
> at pwin_flush.c:58
> #8  0x0040aec5 in IMB_rma_exchange_get ()
> #9  0x00406a35 in IMB_warm_up ()
> #10 0x004023bd in main ()
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/05/17396.php




Re: [OMPI devel] MCA params

2015-05-11 Thread Nathan Hjelm
Hmm, that shouldn't be happening. Will take a look now.

-Nathan

On Mon, May 11, 2015 at 04:51:43PM -0400, George Bosilca wrote:
>I was looking to preconnect all MPI processes to remove some weird
>behaviors. As I did not remembered the full name I hope to get that from
>the ompi_info.
>$ ompi_info -a -l 9  | grep preco
> MCA mpi: parameter "mpi_preconnect_all" (current value:
>"false", data source: default, level: 9 dev/all, type: bool, deprecated,
>synonym of: mpi_preconnect_mpi)
>so mpi_preconnect_all is a deprecated synonym to a non-existing parameter.
>So be it!
>I then added "mpi_preconnect_mpi = true" to my
>$HOME/.openmpi/mca-params.conf file.
>ompi_info -a -l 9  | grep preco
> MCA mpi: parameter "mpi_preconnect_all" (current value:
>"true", data source: file ((null)), level: 9 dev/all, type: bool,
>deprecated, synonym of: mpi_preconnect_mpi)
>While the change has been detected, the MCA system somehow got confused
>about where this change is coming from (the source file is (null)). Not very
>user friendly.
>  George.

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/05/17393.php





Re: [OMPI devel] Fwd: OpenIB module initialisation causes segmentation fault when locked memory limit too low

2015-04-22 Thread Nathan Hjelm

PR https://github.com/open-mpi/ompi-release/pull/250. Raphaël, can you
please confirm this fixes your issue.

-Nathan

On Wed, Apr 22, 2015 at 04:55:57PM +0200, Raphaël Fouassier wrote:
> We are experiencing a bug in OpenMPI in 1.8.4 which happens also on
> master: if locked memory limits are too low, a segfault happens
> in openib/udcm because some memory is not correctly deallocated.
> 
> To reproduce it, modify /etc/security/limits.conf with:
> * soft memlock 64
> * hard memlock 64
> and launch with mpirun (not in a slurm allocation).
> 
> 
> I propose 2 patches for 1.8.4 and master (because of the btl move to
> opal) which:
> - free all allocated ressources
> - print the limits error
> 

> diff --git a/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c 
> b/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c
> index 19753a9..b74 100644
> --- a/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c
> +++ b/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c
> @@ -5,6 +5,7 @@
>   * Copyright (c) 2009  IBM Corporation.  All rights reserved.
>   * Copyright (c) 2011-2014 Los Alamos National Security, LLC.  All rights
>   * reserved.
> + * Copyright (c) 2015  Bull SAS. All rights reserved.
>   *
>   * $COPYRIGHT$
>   * 
> @@ -460,6 +461,8 @@ static int udcm_component_query(mca_btl_openib_module_t 
> *btl,
>  
>  rc = udcm_module_init (m, btl);
>  if (OMPI_SUCCESS != rc) {
> +free(m);
> +m = NULL;
>  break;
>  }
>  
> @@ -536,9 +539,19 @@ static int udcm_endpoint_finalize(struct 
> mca_btl_base_endpoint_t *lcl_ep)
>  return OMPI_SUCCESS;
>  }
>  
> +static void *udcm_unmonitor(int fd, int flags, void *context)
> +{
> +volatile int *barrier = (volatile int *)context;
> +
> +*barrier = 1;
> +
> +return NULL;
> +}
> +
>  static int udcm_module_init (udcm_module_t *m, mca_btl_openib_module_t *btl)
>  {
>  int rc = OMPI_ERR_NOT_SUPPORTED;
> +volatile int barrier = 0;
>  
>  BTL_VERBOSE(("created cpc module %p for btl %p",
>   (void*)m, (void*)btl));
> @@ -549,7 +562,7 @@ static int udcm_module_init (udcm_module_t *m, 
> mca_btl_openib_module_t *btl)
>  m->cm_channel = ibv_create_comp_channel (btl->device->ib_dev_context);
>  if (NULL == m->cm_channel) {
>  BTL_VERBOSE(("error creating ud completion channel"));
> -return OMPI_ERR_NOT_SUPPORTED;
> +goto out;
>  }
>  
>  /* Create completion queues */
> @@ -558,29 +571,33 @@ static int udcm_module_init (udcm_module_t *m, 
> mca_btl_openib_module_t *btl)
> m->cm_channel, 0);
>  if (NULL == m->cm_recv_cq) {
>  BTL_VERBOSE(("error creating ud recv completion queue"));
> -return OMPI_ERR_NOT_SUPPORTED;
> +mca_btl_openib_show_init_error(__FILE__, __LINE__, "ibv_create_cq",
> +   
> ibv_get_device_name(btl->device->ib_dev));
> +goto out1;
>  }
>  
>  m->cm_send_cq = ibv_create_cq (btl->device->ib_dev_context,
> UDCM_SEND_CQ_SIZE, NULL, NULL, 0);
>  if (NULL == m->cm_send_cq) {
>  BTL_VERBOSE(("error creating ud send completion queue"));
> -return OMPI_ERR_NOT_SUPPORTED;
> +mca_btl_openib_show_init_error(__FILE__, __LINE__, "ibv_create_cq",
> +   
> ibv_get_device_name(btl->device->ib_dev));
> +goto out2;
>  }
>  
>  if (0 != (rc = udcm_module_allocate_buffers (m))) {
>  BTL_VERBOSE(("error allocating cm buffers"));
> -return rc;
> +goto out3;
>  }
>  
>  if (0 != (rc = udcm_module_create_listen_qp (m))) {
>  BTL_VERBOSE(("error creating UD QP"));
> -return rc;
> +goto out4;
>  }
>  
>  if (0 != (rc = udcm_module_post_all_recvs (m))) {
>  BTL_VERBOSE(("error posting receives"));
> -return rc;
> +goto out5;
>  }
>  
>  /* UD CM initialized properly.  So fill in the rest of the CPC
> @@ -633,12 +650,41 @@ static int udcm_module_init (udcm_module_t *m, 
> mca_btl_openib_module_t *btl)
>  /* Finally, request CQ notification */
>  if (0 != ibv_req_notify_cq (m->cm_recv_cq, 0)) {
>  BTL_VERBOSE(("error requesting recv completions"));
> -return OMPI_ERROR;
> +rc = OMPI_ERROR;
> +goto out6;
>  }
>  
>  /* Ready to use */
>  
>  return OMPI_SUCCESS;
> +
> +out6:
> +OBJ_DESTRUCT(&m->cm_timeout_lock);
> +OBJ_DESTRUCT(&m->flying_messages);
> +OBJ_DESTRUCT(&m->cm_recv_msg_queue_lock);
> +OBJ_DESTRUCT(&m->cm_recv_msg_queue);
> +OBJ_DESTRUCT(&m->cm_send_lock);
> +OBJ_DESTRUCT(&m->cm_lock);
> +
> +m->channel_monitored = false;
> +
> +ompi_btl_openib_fd_unmonitor(m->cm_channel->fd,
> + udcm_unmonitor, (void *)&barrier);
> +while (0 == barrier) {
> 

Re: [OMPI devel] Fwd: OpenIB module initialisation causes segmentation fault when locked memory limit too low

2015-04-22 Thread Nathan Hjelm

Agreed. goto's just make me grumpy.

-Nathan
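
For context on the cleanup style being discussed in the quoted thread below,
here is a generic sketch (hypothetical names, not the actual udcm code) of
funneling init failures through a finalize routine that only releases what
was actually set up, instead of a chain of gotos in the init path:

    #include <stdlib.h>

    typedef struct module { void *channel; void *cq; } module_t;

    /* releases whatever was allocated; safe on a partially-initialized module */
    static void module_finalize(module_t *m)
    {
        free(m->cq);      m->cq      = NULL;
        free(m->channel); m->channel = NULL;
    }

    static int module_init(module_t *m)
    {
        m->channel = calloc(1, 64);
        m->cq      = m->channel ? calloc(1, 64) : NULL;
        if (NULL == m->channel || NULL == m->cq) {
            module_finalize(m);   /* one cleanup path handles every failure */
            return -1;
        }
        return 0;
    }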

On Wed, Apr 22, 2015 at 01:17:11PM -0600, Howard Pritchard wrote:
>Hi Rafael,
>I give you an A+ for effort.   We always appreciate patches.
>Howard
>2015-04-22 12:43 GMT-06:00 Nathan Hjelm <hje...@lanl.gov>:
> 
>  Umm, why are you cleaning up this way. The allocated resources *should*
>  be freed by the udcm_module_finalize call. If there is a bug in that
>  path it should be fixed there NOT by adding a bunch of gotos (ick).
> 
>  I will take a look now and apply the appropriate fix.
> 
>  -Nathan
> 
>  On Wed, Apr 22, 2015 at 04:55:57PM +0200, Raphael Fouassier wrote:
>  > We are experiencing a bug in OpenMPI in 1.8.4 which happens also on
>  > master: if locked memory limits are too low, a segfault happens
>  > in openib/udcm because some memory is not correctly deallocated.
>  >
>  > To reproduce it, modify /etc/security/limits.conf with:
>  > * soft memlock 64
>  > * hard memlock 64
>  > and launch with mpirun (not in a slurm allocation).
>  >
>  >
>  > I propose 2 patches for 1.8.4 and master (because of the btl move to
>  > opal) which:
>  > - free all allocated ressources
>  > - print the limits error
>  >
> 
>  > diff --git a/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c
>  b/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c
>  > index 19753a9..b74 100644
>  > --- a/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c
>  > +++ b/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c
>  > @@ -5,6 +5,7 @@
>  >   * Copyright (c) 2009  IBM Corporation.  All rights reserved.
>  >   * Copyright (c) 2011-2014 Los Alamos National Security, LLC.  All
>  rights
>  >   * reserved.
>  > + * Copyright (c) 2015  Bull SAS. All rights reserved.
>  >   *
>  >   * $COPYRIGHT$
>  >   *
>  > @@ -460,6 +461,8 @@ static int
>  udcm_component_query(mca_btl_openib_module_t *btl,
>  >
>  >  rc = udcm_module_init (m, btl);
>  >  if (OMPI_SUCCESS != rc) {
>  > +free(m);
>  > +m = NULL;
>  >  break;
>  >  }
>  >
>  > @@ -536,9 +539,19 @@ static int udcm_endpoint_finalize(struct
>  mca_btl_base_endpoint_t *lcl_ep)
>  >  return OMPI_SUCCESS;
>  >  }
>  >
>  > +static void *udcm_unmonitor(int fd, int flags, void *context)
>  > +{
>  > +volatile int *barrier = (volatile int *)context;
>  > +
>  > +*barrier = 1;
>  > +
>  > +return NULL;
>  > +}
>  > +
>  >  static int udcm_module_init (udcm_module_t *m,
>  mca_btl_openib_module_t *btl)
>  >  {
>  >  int rc = OMPI_ERR_NOT_SUPPORTED;
>  > +volatile int barrier = 0;
>  >
>  >  BTL_VERBOSE(("created cpc module %p for btl %p",
>  >   (void*)m, (void*)btl));
>  > @@ -549,7 +562,7 @@ static int udcm_module_init (udcm_module_t *m,
>  mca_btl_openib_module_t *btl)
>  >  m->cm_channel = ibv_create_comp_channel
>  (btl->device->ib_dev_context);
>  >  if (NULL == m->cm_channel) {
>  >  BTL_VERBOSE(("error creating ud completion channel"));
>  > -return OMPI_ERR_NOT_SUPPORTED;
>  > +goto out;
>  >  }
>  >
>  >  /* Create completion queues */
>  > @@ -558,29 +571,33 @@ static int udcm_module_init (udcm_module_t *m,
>  mca_btl_openib_module_t *btl)
>  > m->cm_channel, 0);
>  >  if (NULL == m->cm_recv_cq) {
>  >  BTL_VERBOSE(("error creating ud recv completion queue"));
>  > -return OMPI_ERR_NOT_SUPPORTED;
>  > +mca_btl_openib_show_init_error(__FILE__, __LINE__,
>  "ibv_create_cq",
>  > + 
>   ibv_get_device_name(btl->device->ib_dev));
>  > +goto out1;
>  >  }
>  >
>  >  m->cm_send_cq = ibv_create_cq (btl->device->ib_dev_context,
>  > UDCM_SEND_CQ_SIZE, NULL, NULL, 0);
>  >  if (NULL == m->cm_send_cq) {
>  >  BTL_VERBOSE(("error creating ud send completion queue"));
>  >
