Re: [OMPI devel] Intel OPA and Open MPI

2019-04-24 Thread Nathan Hjelm via devel
I think so. The one who is most familiar with btl/ofi is Arm (CCed in this 
email).

-Nathan

> On Apr 24, 2019, at 9:41 AM, Heinz, Michael William 
>  wrote:
> 
> So, 
> 
> Would it be worthwhile for us to start doing test builds now? Is the code 
> ready for that at this time?
> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of
>> Nathan Hjelm via devel
>> Sent: Friday, April 12, 2019 11:19 AM
>> To: Open MPI Developers 
>> Cc: Nathan Hjelm ; Castain, Ralph H
>> ; Yates, Brandon 
>> Subject: Re: [OMPI devel] Intel OPA and Open MPI
>> 
>> That is accurate. We expect to support OPA with the btl/ofi component. It
>> should give much better performance than osc/pt2pt + mtl/ofi. What would
>> be good for you to do on your end is to verify that everything works as
>> expected and that the performance is on par with what you expect.
>> 
>> -Nathan
>> 
>>> On Apr 12, 2019, at 9:11 AM, Heinz, Michael William
>>  wrote:
>>> 
>>> Hey guys,
>>> 
>>> So, I’ve watched the videos, dug through the release notes, and
>> participated in a few of the weekly meetings and I’m feeling a little more
>> comfortable about being a part of Open MPI - and I’m looking forward to it.
>>> 
>>> But I find myself needing to look for some direction for my participation
>> over the next few months.
>>> 
>>> First - a little background. Historically, I’ve been involved with IB/OPA
>> development for 17+ years now, but for the past decade or so I’ve been
>> entirely focused on fabric management rather than application-level stuff.
>> (Heck, if you ever wanted to complain about why OPA management
>> datagrams are different from IB MADs, feel free to point the finger at me;
>> I’m happy to explain why the new ones are better… ;-) ) However, it was only
>> recently that the FM team was given the additional responsibility for
>> maintaining / participating in our MPI efforts, with very little opportunity
>> for a transfer of information from the prior team.
>>> 
>>> So, while I’m looking forward to this new role I’m feeling a bit
>> overwhelmed - not least because I will be unavailable for about 8
>> weeks this summer…
>>> 
>>> In particular, I found an issue in our internal tracking systems that says
>>> (and I may have mentioned this before…)
>>> 
>>> OMPI v5.0.0 will remove the osc/pt2pt component, which is the only osc
>> component the MTLs (PSM2 and OFI) can use. OMPI v5.0.0 is planned to be
>> released during summer 2019 (no concrete dates).
>> https://github.com/open-mpi/ompi/wiki/5.0.x-FeatureList. The implication is
>> that none of the MTLs used for Omni-Path will support the one-sided MPI
>> APIs (RMA).
>>> 
>>> Is this still accurate? The current feature list says:
>>> 
>>> If osc/rdma supports all possible scenarios (e.g., all BTLs support the RDMA
>> methods osc/rdma needs), this should allow us to remove osc/pt2pt (i.e.,
>> 100% migrated to osc/rdma).
>>> 
>>> If this is accurate, I’m going to need help from the other maintainers to
>> understand the reason this is being done, the scope of this effort, and where
>> we need to focus our attention. To deal with the lack of coverage over the
>> summer, I’ve asked a co-worker, Brandon Yates, to start sitting in on the
>> weekly meetings with me.
>>> 
>>> Again, I’m looking forward to both the opportunity of working with an open
>> source team, and the chance to focus on the users of our software instead of
>> just the management of the fabric - I’m just struggling at the moment to get
>> a handle on this potential deadline.
>>> 
>>> ---
>>> Mike Heinz
>>> Networking Fabric Software Engineer
>>> Intel Corporation
>>> 

Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-16 Thread Nathan Hjelm via devel
What Ralph said. You just blow memory on an internal queue, and that memory is
not recovered in the current implementation.

Also, moving to Allreduce will resolve the issue, since every call is then
effectively also a barrier. With some benchmarks and collective
implementations I have found it can be faster than Reduce anyway, which is why
it might be worth trying.
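A minimal sketch of that substitution (hypothetical names; the point is that
MPI_Allreduce cannot complete at any rank until every rank has entered it, so
no rank can run ahead and queue up unmatched reductions):

#include <mpi.h>

/* Sketch: the rooted MPI_Reduce in the loop is replaced by MPI_Allreduce,
 * which throttles the loop the way a periodic barrier would. */
void reduce_loop(int num_iterations)
{
    for (int i = 0; i < num_iterations; ++i) {
        double local = (double) i;   /* stand-in for real per-rank work */
        double global;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
    }
}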

-Nathan

> On Apr 15, 2019, at 2:33 PM, Saliya Ekanayake  wrote:
> 
> Thank you, Nathan. Could you elaborate a bit on what happens internally? From 
> your answer it seems the program will still produce the correct output at 
> the end, but it'll use more resources. 
> 
> On Mon, Apr 15, 2019 at 9:00 AM Nathan Hjelm via devel 
>  wrote:
> If you do that it may run out of resources and deadlock or crash. I recommend 
> either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) 
> enabling coll/sync (which essentially does 1). Honestly, 2 is probably the 
> easiest option and, depending on how large you run, may not be any slower 
> than 1 or 3.
> 
> -Nathan
> 
> > On Apr 15, 2019, at 9:53 AM, Saliya Ekanayake  wrote:
> > 
> > Hi Devs,
> > 
> > When doing MPI_Reduce in a loop (collecting on Rank 0), is it the correct 
> > understanding that ranks other than root (0 in this case) will pass the 
> > collective as soon as their data is written to MPI buffers without waiting 
> > for all of them to be received at the root?
> > 
> > If that's the case, then what would happen (semantically) if we execute 
> > MPI_Reduce in a loop without a barrier, allowing non-root ranks to hit the 
> > collective multiple times while the root is still processing an earlier 
> > reduce? For example, the root can be in the first reduce invocation while 
> > another rank is in the second reduce invocation.
> > 
> > Thank you,
> > Saliya
> > 
> > -- 
> > Saliya Ekanayake, Ph.D
> > Postdoctoral Scholar
> > Performance and Algorithms Research (PAR) Group
> > Lawrence Berkeley National Laboratory
> > Phone: 510-486-5772
> > 
> 
> 
> -- 
> Saliya Ekanayake, Ph.D
> Postdoctoral Scholar
> Performance and Algorithms Research (PAR) Group
> Lawrence Berkeley National Laboratory
> Phone: 510-486-5772
> 



Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Nathan Hjelm via devel
If you do that it may run out of resources and deadlock or crash. I recommend 
either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) 
enabling coll/sync (which essentially does 1). Honestly, 2 is probably the 
easiest option and, depending on how large you run, may not be any slower than 
1 or 3.
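A minimal sketch of option 1 (hypothetical names; the interval of 100 is
arbitrary):

#include <mpi.h>

/* Sketch of option 1: a rooted reduce loop with a periodic barrier so that
 * non-root ranks cannot run arbitrarily far ahead of the root. */
void reduce_loop(int num_iterations)
{
    for (int i = 0; i < num_iterations; ++i) {
        double local = (double) i;   /* stand-in for real per-rank work */
        double global;               /* result only significant at root */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   0 /* root */, MPI_COMM_WORLD);
        if ((i + 1) % 100 == 0)
            MPI_Barrier(MPI_COMM_WORLD);
    }
}

For option 3, coll/sync injects such barriers automatically with no source
change; its MCA parameters can be listed with "ompi_info --param coll sync"
(the exact parameter names are worth verifying there rather than taken from
memory).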

-Nathan

> On Apr 15, 2019, at 9:53 AM, Saliya Ekanayake  wrote:
> 
> Hi Devs,
> 
> When doing MPI_Reduce in a loop (collecting on Rank 0), is it the correct 
> understanding that ranks other than root (0 in this case) will pass the 
> collective as soon as their data is written to MPI buffers without waiting 
> for all of them to be received at the root?
> 
> If that's the case, then what would happen (semantically) if we execute 
> MPI_Reduce in a loop without a barrier, allowing non-root ranks to hit the 
> collective multiple times while the root is still processing an earlier 
> reduce? For example, the root can be in the first reduce invocation while 
> another rank is in the second reduce invocation.
> 
> Thank you,
> Saliya
> 
> -- 
> Saliya Ekanayake, Ph.D
> Postdoctoral Scholar
> Performance and Algorithms Research (PAR) Group
> Lawrence Berkeley National Laboratory
> Phone: 510-486-5772
> 


Re: [OMPI devel] Intel OPA and Open MPI

2019-04-12 Thread Nathan Hjelm via devel
That is accurate. We expect to support OPA with the btl/ofi component. It 
should give much better performance than osc/pt2pt + mtl/ofi. What would be 
good for you to do on your end is to verify that everything works as expected 
and that the performance is on par with what you expect.
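A hedged sketch of what such a verification run might look like, using
standard mpirun MCA selection switches (the test program name is only a
placeholder):

# Force one-sided communication onto osc/rdma over btl/ofi; selecting
# pml/ob1 keeps the OFI MTL out of the picture for this test.
mpirun -n 2 --mca pml ob1 --mca btl self,ofi --mca osc rdma ./my_rma_test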

-Nathan

> On Apr 12, 2019, at 9:11 AM, Heinz, Michael William 
>  wrote:
> 
> Hey guys,
>  
> So, I’ve watched the videos, dug through the release notes, and participated 
> in a few of the weekly meetings and I’m feeling a little more comfortable 
> about being a part of Open MPI - and I’m looking forward to it.
>  
> But I find myself needing to look for some direction for my participation 
> over the next few months.
>  
> First - a little background. Historically, I’ve been involved with IB/OPA 
> development for 17+ years now, but for the past decade or so I’ve been 
> entirely focused on fabric management rather than application-level stuff. 
> (Heck, if you ever wanted to complain about why OPA management datagrams are 
> different from IB MADs, feel free to point the finger at me; I’m happy to 
> explain why the new ones are better… ;-) ) However, it was only recently that 
> the FM team was given the additional responsibility for maintaining / 
> participating in our MPI efforts, with very little opportunity for a transfer 
> of information from the prior team.
>  
> So, while I’m looking forward to this new role I’m feeling a bit overwhelmed 
> - not least because I will be unavailable for about 8 weeks this 
> summer…
>  
> In particular, I found an issue in our internal tracking systems that says 
> (and I may have mentioned this before…)
>  
> OMPI v5.0.0 will remove the osc/pt2pt component, which is the only osc 
> component the MTLs (PSM2 and OFI) can use. OMPI v5.0.0 is planned to be 
> released during summer 2019 (no concrete dates).  
> https://github.com/open-mpi/ompi/wiki/5.0.x-FeatureList. The implication is 
> that none of the MTLs used for Omni-Path will support the one-sided MPI 
> APIs (RMA).
>  
> Is this still accurate? The current feature list says:
>  
> If osc/rdma supports all possible scenarios (e.g., all BTLs support the RDMA 
> methods osc/rdma needs), this should allow us to remove osc/pt2pt (i.e., 100% 
> migrated to osc/rdma).
>  
> If this is accurate, I’m going to need help from the other maintainers to 
> understand the reason this is being done, the scope of this effort, and where 
> we need to focus our attention. To deal with the lack of coverage over the 
> summer, I’ve asked a co-worker, Brandon Yates, to start sitting in on the 
> weekly meetings with me.
>  
> Again, I’m looking forward to both the opportunity of working with an open 
> source team, and the chance to focus on the users of our software instead of 
> just the management of the fabric - I’m just struggling at the moment to get 
> a handle on this potential deadline.
>  
> ---
> Mike Heinz
> Networking Fabric Software Engineer
> Intel Corporation
>  

Re: [OMPI devel] IBM CI re-enabled.

2018-10-18 Thread Nathan Hjelm via devel

It appears to be broken. It's failing and simply saying:

Testing in progress..

-Nathan

On Oct 18, 2018, at 11:34 AM, Geoffrey Paulsen  wrote:

 
I've re-enabled IBM CI for PRs.
 


Re: [OMPI devel] Bug on branch v2.x since october 3

2018-10-17 Thread Nathan Hjelm via devel


Ah yes, 18f23724a broke things, so we had to fix the fix. That fix didn't get 
applied to the v2.x branch; I will open a PR to bring it over.

-Nathan

On Oct 17, 2018, at 11:28 AM, Eric Chamberland 
 wrote:

Hi,

Since commit 18f23724a, our nightly base test has been broken on the v2.x branch.

Strangely, the v3.x branch broke the same day with 2fd9510b4b44, but was 
repaired some days later (I can't tell exactly when, but it was fixed by 
fa3d92981a at the latest).


I get segmentation faults or deadlocks in many cases.

Could this be related to issue 5842?
(https://github.com/open-mpi/ompi/issues/5842)

Here is an example of backtrace for a deadlock:

#4  <signal handler called>
#5 0x7f9dc9151d17 in sched_yield () from /lib64/libc.so.6
#6 0x7f9dccee in opal_progress () at runtime/opal_progress.c:243
#7 0x7f9dbe53cf78 in ompi_request_wait_completion (req=0x46ea000) 
at ../../../../ompi/request/request.h:392
#8 0x7f9dbe53e162 in mca_pml_ob1_recv (addr=0x7f9dd64a6b30 
long, long, PAType*, std::__debug::vectorstd::allocator >&)::slValeurs>, count=3, 
datatype=0x7f9dca61e2c0 , src=1, tag=32767, 
comm=0x7f9dca62a840 , status=0x7ffcf4f08170) at 
pml_ob1_irecv.c:129
#9 0x7f9dca35f3c4 in PMPI_Recv (buf=0x7f9dd64a6b30 
long, long, PAType*, std::__debug::vectorstd::allocator >&)::slValeurs>, count=3, 
type=0x7f9dca61e2c0 , source=1, tag=32767, 
comm=0x7f9dca62a840 , status=0x7ffcf4f08170) at 
precv.c:77
#10 0x7f9dd6261d06 in assertionValeursIdentiquesSurTousLesProcessus 
(pComm=0x7f9dca62a840 , pRang=0, pNbProcessus=2, 
pValeurs=0x7f9dd5a94da0 girefSynchroniseGroupeProcessusModeDebugImpl(PAGroupeProcessus 
const&, char const*, int)::slDonnees>, pRequetes=std::__debug::vector of 
length 1, capacity 1 = {...}) at 
/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/src/commun/Parallele/mpi_giref.cc:332


And some information about the configuration:

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2018.10.17.02h16m02s_config.log

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2018.10.17.02h16m02s_ompi_info_all.txt

Thanks,

Eric

Re: [OMPI devel] Patcher on MacOS

2018-09-28 Thread Nathan Hjelm via devel
Nope. We just never bothered to disable it on OS X. I think Jeff was working on 
a patch.
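Until then, a build can simply leave the component out. A minimal sketch
using the --enable-mca-no-build configure option (framework-component
syntax; this assumes the component is memory/patcher):

# Sketch: build without the opal memory/patcher component on macOS.
./configure --enable-mca-no-build=memory-patcher ...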

-Nathan

> On Sep 28, 2018, at 3:21 PM, Barrett, Brian via devel 
>  wrote:
> 
> Is there any practical reason to have the memory patcher component enabled 
> for MacOS?  As far as I know, we don’t have any transports which require 
> memory hooks on MacOS, and with the recent deprecation of the syscall 
> interface, it emits a couple of warnings.  It would be nice to crush said 
> warnings and the easiest way would be to not build the component.
> 
> Thoughts?
> 
> Brian

Re: [OMPI devel] Test mail

2018-08-28 Thread Nathan Hjelm via devel
no

Sent from my iPhone

> On Aug 27, 2018, at 8:51 AM, Jeff Squyres (jsquyres) via devel 
>  wrote:
> 
> Will this get through?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 


Re: [OMPI devel] openmpi 3.1.x examples

2018-07-17 Thread Nathan Hjelm via devel


> On Jul 16, 2018, at 11:18 PM, Marco Atzeri  wrote:
> 
>> On 16.07.2018 at 23:05, Jeff Squyres (jsquyres) via devel wrote:
>>> On Jul 13, 2018, at 4:35 PM, Marco Atzeri  wrote:
>>> 
>>>> For one. The C++ bindings are no longer part of the standard and they are 
>>>> not built by default in v3.1.x. They will be removed entirely in Open MPI 
>>>> v5.0.0.
>> Hey Marco -- you should probably join our packagers mailing list:
>> https://lists.open-mpi.org/mailman/listinfo/ompi-packagers
>> Low volume, but intended exactly for packagers like you.  It's fairly 
>> recent; we realized we needed to keep in better communication with our 
>> downstream packagers.
> 
> Noted, thanks.
> 
>> (+ompi-packagers to the CC)
>> As Nathan mentioned, we stopped building the MPI C++ bindings by default in 
>> Open MPI 3.0.  You can choose to build them with the --enable-mpi-cxx 
>> configure option.
> 
> I was aware, as I am not building them anymore; however,
> we should probably exclude the C++ one from the default examples.

Good point. I will fix that today and PR the fix to v3.0.x and v3.1.x. 
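For anyone who does want the C++ example in the meantime, a sketch (assuming
a from-source build): the bindings must be enabled at configure time, after
which the example compiles:

./configure --enable-mpi-cxx && make && make install
mpic++ -g hello_cxx.cc -o hello_cxx   # the MPI namespace is now declared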


> Regards
> Marco
> 

Re: [OMPI devel] openmpi 3.1.x examples

2018-07-13 Thread Nathan Hjelm via devel
For one. The C++ bindings are no longer part of the standard and they are not 
built by default in v3.1.x. They will be removed entirely in Open MPI v5.0.0. 

Not sure why the Fortran one is not building. 
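One possible cause, sketched on the assumption (from the report below) that
mpi.mod really is installed in /usr/lib: the compiler may not be searching
that directory for Fortran module files, in which case pointing it there
helps:

# Hypothetical workaround: add the module directory to the search path.
mpifort -g -I/usr/lib hello_usempi.f90 -o hello_usempi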

-Nathan

> On Jul 13, 2018, at 2:02 PM, Marco Atzeri  wrote:
> 
> Hi,
> maybe I am missing something obvious, but are the
> examples still current?
> 
>  C:   hello_c.c
>  C++: hello_cxx.cc
>  Fortran mpif.h:  hello_mpifh.f
>  Fortran use mpi: hello_usempi.f90
>  Fortran use mpi_f08: hello_usempif08.f90
>  Java:Hello.java
>  C shmem.h:   hello_oshmem_c.c
>  Fortran shmem.fh:hello_oshmemfh.f90
> 
> 
> $ make hello_cxx
> mpic++ -g  hello_cxx.cc  -o hello_cxx
> hello_cxx.cc: In function ‘int main(int, char**)’:
> hello_cxx.cc:25:5: error: ‘MPI’ has not been declared
> 
> $ make -i
> ...
> mpifort -g  hello_usempi.f90  -o hello_usempi
> hello_usempi.f90:14:8:
> 
> use mpi
>1
> Fatal Error: Can't open module file ‘mpi.mod’ for reading at (1): No such 
> file or directory
> 
> The second could be a different problem
> 
> $ ls /usr/lib/mpi.mod
> /usr/lib/mpi.mod
> 
> 

Re: [OMPI devel] Odd warning in OMPI v3.0.x

2018-07-06 Thread Nathan Hjelm via devel
Looks like a bug to me. The second argument should be a value in v3.x.x.
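A sketch of the corrected call (variable names are inferred from the warning
below, so treat them as assumptions):

/* v3.x prototype, per the compiler note:
 *   opal_atomic_cmpset_32(volatile int32_t *addr, int32_t oldval, int32_t newval)
 * so the caller must pass the expected value itself, not its address: */
int32_t expected = 0;  /* hypothetical expected state value */
if (!opal_atomic_cmpset_32(&ompi_mpi_state, expected, desired)) {
    /* lost the race: another thread updated the state first */
}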

-Nathan

> On Jul 6, 2018, at 4:00 PM, r...@open-mpi.org wrote:
> 
> I’m seeing this when building the v3.0.x branch:
> 
> runtime/ompi_mpi_init.c:395:49: warning: passing argument 2 of 
> ‘opal_atomic_cmpset_32’ makes integer from pointer without a cast 
> [-Wint-conversion]
>  if (!opal_atomic_cmpset_32(&ompi_mpi_state, &expected, desired)) {
>  ^
> In file included from ../opal/include/opal/sys/atomic.h:159:0,
>  from ../opal/threads/thread_usage.h:30,
>  from ../opal/class/opal_object.h:126,
>  from ../opal/class/opal_list.h:73,
>  from runtime/ompi_mpi_init.c:43:
> ../opal/include/opal/sys/x86_64/atomic.h:85:19: note: expected ‘int32_t {aka 
> int}’ but argument is of type ‘int32_t * {aka int *}’
>  static inline int opal_atomic_cmpset_32( volatile int32_t *addr,
>^
> 
> 
> I have a feeling this isn’t correct - yes?
> Ralph
> 