Re: [OMPI devel] RoCE plus QDR IB tunable parameters

2015-02-06 Thread Gilles Gouaillardet
Dave, These settings tell ompi to use native InfiniBand on the IB QDR port and TCP/IP on the other port. According to the FAQ, RoCE is implemented in the openib btl: http://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce Did you use --mca btl_openib_cpc_include rdmacm in your first tests?

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-06 Thread Gilles Gouaillardet
George, I cannot access parsec: HTTP error 403 :-( I understand your point of view. Back to the opal_lifo test, and if I remember correctly, it hangs in the non-multithreaded part: the very first pop loops forever, since the CAS always fails even though the values being compared are in fact equal. Though there

[OMPI devel] Shutdown-time crash via oob:ud

2015-02-06 Thread Paul Hargrove
With last night's master tarball (openmpi-dev-845-ga3275aa) on a Linux/x86-64 system, I am seeing a crash (below) from ring_c run on a login node. Other than CC/CXX/FC settings I've configured with only --prefix=... --enable-debug --with-tm=... This is occurring with at least the GNU, Intel, Path

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-06 Thread George Bosilca
On Fri, Feb 6, 2015 at 8:54 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> George,
>
> Can you point me to another project that uses 128-bit atomics?
>
http://icl.cs.utk.edu/parsec/. It heavily uses lock-free structures, and the 128-bit atomics are the safest and fastest wa

[OMPI devel] ess:alps build failure with PGI

2015-02-06 Thread Paul Hargrove
The following in orte/mca/ess/alps/Makefile.am assumes a GNU (or GNU-like) compiler:
    mca_ess_alps_la_CPPFLAGS = $(ess_alps_CPPFLAGS) -fno-ident
If building with PGI, the result is
    pgcc-Error-Unknown switch: -fno-ident
when compiling orte/mca/ess/alps/ess_alps_component.c This is last night's

[OMPI devel] opal_fifo SEGV from master

2015-02-06 Thread Paul Hargrove
Yes, this time I really mean "fifo", not "lifo". ;-) With last night's master tarball (Open MPI dev-845-ga3275aa) configured with only --prefix and --enable-debug, on a Linux/x86-64 system running Debian Wheezy with compiler = "gcc (Debian 4.7.2-5) 4.7.2". Failure from "make check": /home/phargrov/OMP

Re: [OMPI devel] Master build broken libfabrics + PGI

2015-02-06 Thread Paul Hargrove
With a newer master tarball I still see PGI + libfabrics failing, but with different errors this time. Relevant output from "make V=1" appears below. Though the build below was with pgi-10.9, I see the same problem with other PGI compiler versions (at least 11.9 as well) on the same system (and w

Re: [OMPI devel] RoCE plus QDR IB tunable parameters

2015-02-06 Thread Dave Turner
George, I can check with my guys on Monday but I think the bandwidth parameters are the defaults. I did alter these to 40960 and 10240 as someone else suggested to me. The attached graph shows the base red line, along with the manual balanced blue line and auto balanced green line (0's for

Re: [OMPI devel] RoCE plus QDR IB tunable parameters

2015-02-06 Thread George Bosilca
Dave, Based on your ompi_info.all, the following bandwidths are reported on your system: MCA btl: parameter "btl_openib_bandwidth" (current value: "4", data source: default, level: 5 tuner/detail, type: unsigned) Approximate maximum bandwidth of interconnec

[OMPI devel] RoCE plus QDR IB tunable parameters

2015-02-06 Thread Dave Turner
We have nodes in our HPC system that have 2 NICs, one being QDR IB and the second being a slower 10 Gbps card configured for both RoCE and TCP. Aggregate bandwidth tests with 20 cores on one node yelling at 20 cores on a second node (attached roce.ib.aggregate.pdf) show that without tuning t

[OMPI devel] PMIx support in ORTE

2015-02-06 Thread Ralph Castain
Hi folks, Just a heads-up that I will start working this weekend on shifting the current pmix server in ORTE to a new framework so we can support both our internal pmix support and users of the soon-to-be-released external PMIx client library. Please let me know if you are, or plan to, be

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-06 Thread Gilles Gouaillardet
George, Can you point me to another project that uses 128-bit atomics? In my tests, I noticed that the volatile keyword is (one of) the triggers of the compiler bug. At this stage, I could not see anything wrong in ompi, plus this is working fine with recent gcc and icc, so I concluded this i

Re: [OMPI devel] Master hangs in opal_fifo test

2015-02-06 Thread George Bosilca
My feeling is that the current patch hides the symptoms without addressing the real issue. As a side note: the compiler incriminated in this thread works perfectly for 128-bit atomic operations in other projects where I use atomic LIFO & FIFO (but not the one from OMPI as I already raised my conc

Re: [OMPI devel] Master assert failure on Linux/PPC64

2015-02-06 Thread Nysal Jan K A
It seems the ompi_free_list_init() in libnbc_open() failed for some reason. That would explain why mca_coll_libnbc_component.active_requests is not initialized, and hence the crash in libnbc_close(). This might help, but still doesn't explain why the free list initialization failed: diff --git a/ompi/m