Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-01 Thread Paul Hargrove
Trunk now works for me using Tetsuya's test code with both PGI-14.1 and 14.4 -Paul On Thu, Jul 31, 2014 at 6:17 PM, Jeff Squyres (jsquyres) wrote: > Many thanks guys, this thread was most helpful in finding the fix. > > Paul H. nailed 80% of it on the head in the post where he identified the >

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Paul Hargrove
In general I am only set up to build from tarballs, not svn. However, I can (and will) apply this change manually w/o difficulty. I will report back when I've had a chance to try that. I already have many builds in-flight to test George's atomics patch and am in danger of confusing myself if I am

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Gilles Gouaillardet
Paul, As Ralph pointed out, this issue was reported last month on the user mailing list. #include did not help: http://www.open-mpi.org/community/lists/users/2014/07/24883.php I will see if I can reproduce and fix this issue on a solaris10 (but x86) VM. BTW, are you using the GNU compiler? Cheer

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread George Bosilca
I am afraid the suggestion on the mailing list only addressed half of the problem. Indeed, alloca is used in two files (isend and irecv) while the suggested patch only fixed the one in isend. George. On Fri, Aug 1, 2014 at 12:12 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wr

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Paul Hargrove
Gilles, This test was using the Solaris Studio Compilers version 12.3. /usr/bin/gcc on this system is "gccfss" which Open MPI does NOT support. There is also a gcc-3.3.2 in /usr/local/bin and gcc-3.4.3 in /usr/sfw/bin Neither includes usable fortran compilers, which is why the Studio compilers a

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Gilles Gouaillardet
Paul, George just made a good point: you should test with his patch first. If it still does not work, could you try to mix GNU and Sun compilers? configure ... CC=/usr/sfw/bin/gcc CXX=/usr/sfw/bin/g++ FC= Cheers, Gilles On 2014/08/01 13:19, Paul Hargrove wrote: > Gilles, > > This test was usin

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Paul Hargrove
George's patch worked for me. Now of course since this is a big-endian system things are still busted on trunk, but ring_c is now hung instead of failing at load time. -Paul On Thu, Jul 31, 2014 at 9:30 PM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Paul, > > George just ma

[OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Ralph Castain
Usual place - this is a last-chance check, so please hit it. Main change from rc2 is the repairs to the Fortran binding config logic http://www.open-mpi.org/software/ompi/v1.8/ Ralph

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread George Bosilca
In case someone else wants to play with the new atomics here is the most up-to-date patch. George. On Thu, Jul 31, 2014 at 10:26 PM, Paul Hargrove wrote: > George: > > Have a failure with your patch applied on PPC64/Linux and gcc-4.4.6: > > Making all in asm > make[2]: Entering directory > `
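
For readers unfamiliar with the RFC's subject, here is a minimal sketch of a compare-and-swap that returns the old value instead of a success flag. It uses C11 <stdatomic.h> rather than the per-architecture assembly in George's patch, and the names cas_fetch_32 and fetch_add_via_cas are hypothetical, not Open MPI symbols.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper, not part of Open MPI: compare-and-swap that returns
 * the value observed at *addr. The swap happened iff the result == expected. */
static inline uint32_t cas_fetch_32(_Atomic uint32_t *addr,
                                    uint32_t expected, uint32_t desired)
{
    /* On failure, C11 writes the value actually found into 'expected';
     * on success 'expected' already holds the old value. */
    atomic_compare_exchange_strong(addr, &expected, desired);
    return expected;
}

/* Example use: fetch-and-add built as a CAS retry loop. Returning the old
 * value lets the loop retry without issuing a separate reload. */
static inline uint32_t fetch_add_via_cas(_Atomic uint32_t *addr, uint32_t delta)
{
    uint32_t old = atomic_load(addr);
    for (;;) {
        uint32_t seen = cas_fetch_32(addr, old, old + delta);
        if (seen == old) {
            return old;      /* swap succeeded */
        }
        old = seen;          /* someone else changed it; retry with new value */
    }
}
```

A caller detects success by comparing the returned value with the expected one, which is the property that distinguishes this form from a boolean CAS.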

[OMPI devel] [1.8.2rc3] Build failure on FreeBSD (missing header)

2014-08-01 Thread Paul Hargrove
/home/phargrov/OMPI/openmpi-1.8.2rc3-freebsd10-amd64/openmpi-1.8.2rc3/orte/mca/ess/base/ess_base_std_app.c:412:36: error: use of undeclared identifier 'S_IRUSR' fd = open(myfile, O_CREAT, S_IRUSR); ^ To fix this it was sufficient to add the following 3 li
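
The three lines Paul added are cut off in this archive snippet, so the following is only a sketch of the general kind of fix, not the actual patch. POSIX specifies that S_IRUSR is defined in <sys/stat.h>, so a file that uses it should include that header explicitly instead of relying on another header pulling it in. The helper name create_owner_readable is hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>   /* S_IRUSR and the other permission bits (POSIX) */
#include <fcntl.h>      /* open(), O_CREAT, O_RDONLY */
#include <unistd.h>     /* close() */

/* Hypothetical helper: create 'path' if it does not exist, readable by its
 * owner only. Returns 0 on success and -1 on failure. */
static int create_owner_readable(const char *path)
{
    int fd = open(path, O_CREAT | O_RDONLY, S_IRUSR);
    if (fd < 0) {
        return -1;
    }
    return close(fd);
}
```

Some platforms expose S_IRUSR indirectly through <fcntl.h>, which may be why the omission went unnoticed until a FreeBSD build.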

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul and Ralph, for what it's worth: a) I faced the very same issue on my (slow) qemu emulated ppc64 VM b) I was able to run very basic programs when passing --mca coll ^ml to mpirun Cheers, Gilles On 2014/08/01 12:30, Ralph Castain wrote: > Yes, I fear this will require some effort to cha

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Gilles's findings are consistent with mine which showed the SEGVs to be in the coll/ml code. I've built with --enable-debug and so below is a backtrace (well, two actually) that might be helpful. Unfortunately the output of the two ranks did get slightly entangled. -Paul $ ../INST/bin/mpirun -mca

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Hmm, maybe this has nothing to do with big-endian. Below is a backtrace from ring_c on an IA64 platform (definitely little-endian) that looks very similar to me. It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems. So, I wonder if that might be related. -Paul $ mpirun -mca

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul, you are absolutely right! In ompi/mca/coll/ml/coll_ml_lmngr.c at line 53, cm->lmngr_alignment is hard coded to 4096. As a proof of concept, I hard coded it to 65536 and now coll/ml works just fine. I will now write a patch that uses sysconf(_SC_PAGESIZE) instead. Cheers, Gilles On 2014/08
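
To illustrate why a hard-coded 4096 breaks on the 64K-page systems mentioned in this thread, here is a small sketch with a hypothetical helper is_aligned (not taken from coll/ml). An address that is a multiple of 4096 is not necessarily a multiple of 65536.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when addr is a multiple of 'alignment'.
 * 'alignment' must be a power of two. */
static int is_aligned(uintptr_t addr, size_t alignment)
{
    return (addr & ((uintptr_t)alignment - 1)) == 0;
}
```

For example, 0x11000 (17 * 4096) passes the check for 4096 but fails it for 65536, so memory laid out with the smaller alignment can land on an address that the kernel does not treat as a page boundary.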

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Paul Hargrove
Note that the Solaris unresolved alloca problem George fixed in r32388 is still present in 1.8.2rc3. I have manually confirmed that the same patch resolves the problem in 1.8.2rc3. -Paul On Thu, Jul 31, 2014 at 9:44 PM, Ralph Castain wrote: > Usual place - this is a last-chance check, so pleas

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
Gilles, At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following: #ifdef HAVE_GETPAGESIZE pagesize = getpagesize(); #else pagesize = 4096; #endif While other places in the code use sysconf(), but not always consistently. And on some systems _SC_PAGESIZE is spelled
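
One possible way to centralize the page-size query Paul describes is sketched below. The function name portable_page_size is hypothetical, HAVE_GETPAGESIZE is assumed to come from configure as in the osc/sm snippet above, and the 4096 fallback simply mirrors that snippet. This is not the code that went into r32393.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: query the page size once, trying the spellings
 * mentioned in the thread, and fall back to 4096 only as a last resort. */
static size_t portable_page_size(void)
{
    long sz = -1;
#if defined(_SC_PAGESIZE)
    sz = sysconf(_SC_PAGESIZE);
#elif defined(_SC_PAGE_SIZE)
    sz = sysconf(_SC_PAGE_SIZE);
#elif defined(HAVE_GETPAGESIZE)
    sz = getpagesize();
#endif
    /* sysconf returns -1 on error; keep the historical default in that case. */
    return (sz > 0) ? (size_t)sz : 4096;
}
```

The preprocessor tests assume the _SC_* names are visible as macros, which is an assumption about the platform headers. A configure check would be more robust on a system where they are plain enum constants.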

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul, I just committed r32393 (and made a CMR for v1.8). Can you please give it a try? In the meantime, I received your email ... sysconf is called directly (i.e. not #ifdef protected) in several other places: $ grep -R sysconf . | grep -v svn | grep -v sysconfdir | grep -v autom4te |grep PA

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Paul Hargrove
On Fri, Aug 1, 2014 at 1:19 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Paul, > > I just committed r32393 (and made a CMR for v1.8) > > can you please give it a try ? > I am not equipped to build from svn on most of my test platforms. However, I applied your one-line change

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Mike Dubman
Also, latest commit into openib (origin/v1.8 https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something: *11:45:01* + timeout -s SIGSEGV 3m /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,openib /scrap/jenkins/workspace/OMPI-ven

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread Kawashima, Takahiro
George, I compiled trunk with your patch for SPARCV9/Linux/GCC. I see following warning/errors. In file included from opal/include/opal/sys/atomic.h:175, from opal/asm/asm.c:21: opal/include/opal/sys/sparcv9/atomic.

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Gilles Gouaillardet
George and Ralph, I am very confused whether there is an issue or not. Anyway, today Paul and I ran basic tests on big endian machines and did not face any issue related to big endianness. So I did my homework, dug into the code, and basically, opal_process_name_t is used as an orte_process

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Gilles Gouaillardet
One last point: in orte_process_name_t, jobid and vpid have type orte_jobid_t and orte_vpid_t, which really are uint32_t. In orte/util/proc.c, the function pointers opal_process_name_vpid and opal_process_name_jobid return an int32_t. Should it be a uint32_t instead? /* and then _process_name_jo
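
The signedness question can be shown with a short sketch. The typedef example_vpid_t and both accessors are hypothetical stand-ins, not the Open MPI definitions. Converting an out-of-range uint32_t to int32_t is implementation-defined in C, and the comments describe the wraparound behavior of common compilers such as GCC and Clang.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit unsigned process identifier. */
typedef uint32_t example_vpid_t;

/* Accessor with a signed return type: values above INT32_MAX come back
 * negative on compilers that wrap on conversion. */
static int32_t vpid_as_signed(example_vpid_t v)
{
    return (int32_t)v;
}

/* Accessor whose return type matches the stored type: no value changes. */
static uint32_t vpid_as_unsigned(example_vpid_t v)
{
    return v;
}
```

With the signed accessor, a large identifier compares as less than zero and differs from the stored value once both are widened to 64 bits, which is the kind of mismatch the question above is about.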

Re: [OMPI devel] [1.8.2rc3] Build failure on FreeBSD (missing header)

2014-08-01 Thread Ralph Castain
Thanks Paul - added and cmr'd On Jul 31, 2014, at 11:23 PM, Paul Hargrove wrote: > > /home/phargrov/OMPI/openmpi-1.8.2rc3-freebsd10-amd64/openmpi-1.8.2rc3/orte/mca/ess/base/ess_base_std_app.c:412:36: > error: use of undeclared identifier 'S_IRUSR' > fd = open(myfile, O_CREAT, S_IRUSR)

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Ralph Castain
Okay, I fixed those two and will release rc4 once the coll/ml fix has been reviewed. Thanks On Aug 1, 2014, at 2:46 AM, Mike Dubman wrote: > Also, latest commit into openib (origin/v1.8 > https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something: > > 11:45:01 + timeout -s SIGSEGV 3m

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Ralph Castain
Hi Gilles I'm not sure if we have a problem or not - we'll have to wait and see, I guess. So far, I'm not seeing any problems on x86 archs, but that's to be expected and I don't have access to anything else. I fixed the issues you noted plus a few others I found. I imagine we'll discover more

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Ralph Castain
Oh, should point out: I didn't deal with the potential btl/tcp issue you noted - I defer that to George On Aug 1, 2014, at 7:56 AM, Ralph Castain wrote: > Hi Gilles > > I'm not sure if we have a problem or not - we'll have to wait and see, I > guess. So far, I'm not seeing any problems on x8

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-01 Thread Pritchard Jr., Howard
Hi Jeff, Finally got info yesterday about where the newer PGI compilers are hiding out at LANL. I'll check this out today. Howard -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres) Sent: Tuesday, July 29, 2014 5:24 PM To: Open MPI D

Re: [OMPI devel] Further questions about BTL OPAL move...

2014-08-01 Thread Ralph Castain
Closing the loop on this: I think I misunderstood George's comment. Looking thru the code, the MPI layer is indeed using the RTE names when defining things, so there is no inter-layer confusion. ORTE has no need to "own" the structure itself (except for non-MPI processes), so the existing code i

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread George Bosilca
Gilles, The design of the BTL move was to let the opal_process_name_t be agnostic to what is stored inside, and all accesses should be done through the provided accessors. Thus, big endian or little endian doesn’t make a difference, as long as everything goes through the accessors. I’m skeptic

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-01 Thread Pritchard Jr., Howard
Sorry, finally got through all this ompi email and see this problem was fixed. -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Pritchard Jr., Howard Sent: Friday, August 01, 2014 8:59 AM To: Open MPI Developers Subject: Re: [OMPI devel] openmpi-1.8.2rc2 and

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread George Bosilca
This commit brings two things. One is the renaming suggested by Gilles. The second one is forcing the ORTE process down on the OPAL. This doesn't fit the current design of the BTL move. The current design assumes that the local OPAL process is part of the local OMPI process. George. PS: If it d

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread Ralph Castain
On Aug 1, 2014, at 8:27 AM, George Bosilca wrote: > This commit brings two things. One is the renaming suggested by Gilles. The > second one is forcing the ORTE process down on the OPAL. This doesn't fit the > current design of the BTL move. The current design assumes that the local > OPAL pr

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread George Bosilca
I missed the fact that the app doesn't force it. But if this is indeed the case then it is extremely weird that you are seeing someone else releasing your proc. Regarding the destruction of the proc, the OPAL layer only does it in a single place, when the local proc is set (opal_proc_local_set). Moreo

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread Ralph Castain
I found the problem - the issue is that assert on the convertor. MPI apps are setting that convertor, but not non-MPI apps, and so the field is NULL. Can we remove that assert? On Aug 1, 2014, at 9:30 AM, George Bosilca wrote: > I missed the fact that the app doesn't force it. But if this is

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32401 - trunk/opal/util

2014-08-01 Thread Ralph Castain
Thanks George! On Aug 1, 2014, at 9:36 AM, svn-commit-mai...@open-mpi.org wrote: > Author: bosilca (George Bosilca) > Date: 2014-08-01 12:36:23 EDT (Fri, 01 Aug 2014) > New Revision: 32401 > URL: https://svn.open-mpi.org/trac/ompi/changeset/32401 > > Log: > No more assert in the proc destructor.

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32398 - in trunk: ompi/mca/bcol/basesmuma ompi/mca/coll/hierarch ompi/mca/coll/sm ompi/mca/dpm/orte ompi/mca/pml/bfo ompi/mca/rte/orte ompi/proc ompi/runtime

2014-08-01 Thread George Bosilca
I guess (r32401). George. On Fri, Aug 1, 2014 at 12:32 PM, Ralph Castain wrote: > I found the problem - the issue is that assert on the convertor. MPI apps > are setting that convertor, but not non-MPI apps, and so the field is NULL. > Can we remove that assert? > > > On Aug 1, 2014, at 9:3

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread George Bosilca
Another version of the atomic patch. Paul has tested it on a bunch of platforms. At this point we have confirmation from all architectures except SPARC (v8+ and v9). George. atomics.patch Description: Binary data On Jul 31, 2014, at 19:13 , George Bosilca wrote: > All, > > Here is the p

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread Paul Hargrove
MIPS32, MIPS64 and ARMv7 tests are also pending. -Paul On Fri, Aug 1, 2014 at 9:40 AM, George Bosilca wrote: > Another version of the atomic patch. Paul has tested it on a bunch of > platforms. At this point we have confirmation from all architectures except > SPARC (v8+ and v9). > > George.

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Paul Hargrove
Regarding review of the coll/ml fix: While the fix Gilles worked out overnight proved sufficient on Solaris/SPARC, Linux/PPC64 and Linux/IA64, I had two concerns: 1) As I already voiced on the list, I am concerned with the portability of _SC_PAGESIZE vs _SC_PAGE_SIZE (vs get_pagesize()). 2) Thou

Re: [OMPI devel] RFC: add atomic compare-and-swap that returns old value

2014-08-01 Thread Paul Hargrove
I have confirmed that George's latest version works on both SPARC ABIs. ARMv7 and three MIPS ABIs still pending... -Paul On Fri, Aug 1, 2014 at 9:40 AM, George Bosilca wrote: > Another version of the atomic patch. Paul has tested it on a bunch of > platforms. At this point we have confirmatio

[OMPI devel] [1.8.2rc3] build failure on OpenBSD (libevent)

2014-08-01 Thread Paul Hargrove
I am seeing the following on OpenBSD/amd64 with "make V=1": Making all in tools/wrappers /bin/sh ../../../libtool --tag=CC --mode=link gcc -std=gnu99 -g -finline-functions -fno-strict-aliasing -pthread -export-dynamic -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lutil -lm

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Gilles Gouaillardet
Paul, about the second point: mmap is called with the MAP_FIXED flag. Before the fix, the required address was not aligned on a page size and hence mmap failed. The mmap failure was immediately handled, but for some reason I did not fully investigate yet, this failure was not correctly propagated