Ralph,
*IF* I have new enough autotools to autogen on my FreeBSD platform, I'll
take a shot tonight from the svn trunk. Otherwise I'll need a tarball to
point my testing script at.
-Paul
On Wed, Jan 8, 2014 at 8:10 PM, Ralph Castain wrote:
> Actually, as I look at it, the logic escapes me an
Requested verbose output below.
-Paul
-bash-4.2$ mpirun -mca ess_base_verbose 10 -np 1 examples/ring_c
[pcp-j-17:02150] mca: base: components_register: registering ess components
[pcp-j-17:02150] mca: base: components_register: found loaded component env
[pcp-j-17:02150] mca: base: components_regi
No joy running autogen.pl on my FreeBSD system.
So, I'll try to remember to run on Thu night's trunk tarball.
-Paul
On Wed, Jan 8, 2014 at 9:01 PM, Paul Hargrove wrote:
> Ralph,
>
> *IF* I have new enough autotools to autogen on my FreeBSD platform, I'll
> take a shot tonight from the svn trun
Ralph,
When rebuilding with --enable-debug and the original gcc-4.0.0 the SEGV
returns.
So, the ompi-1.4 in the LD_LIBRARY_PATH was NOT the cause.
Below is a backtrace from gdb which includes line numbers.
The SEGV is in strlen() which suggests a string which lacks
null-termination.
The initial
Ralph,
Nevermind my previous emails about autotools, tarballs, etc.
I just did "svn di -r30171:30172" on one system and applied with "patch".
The result is that the build is able to get past opal/util/path.c
When the rest of the test completes, I'll report the results.
I am still concerned by the
With Ralph's fix for opal/util/path.c, I can build the trunk on
FreeBSD-9/x86-64.
However, building the examples fails with:
$ cp -r $SRCDIR/examples .
$ cd examples
$ make
mpicc -g hello_c.c -o hello_c
mpicc -g ring_c.c -o ring_c
mpicc -g connectivity_c.c -o connectivity_c
shmemcc -g -o he
Building the trunk on FreeBSD-9/x86-64, and using gmake to work around the
non-portable examples/Makefile, I *still* encounter issues with shmemfort
when running "gmake" in the examples subdirectory:
$ gmake
mpicc -g hello_c.c -o hello_c
mpicc -g ring_c.c -o ring_c
mpicc -g connectivi
On 1/9/2014 5:10 AM, Ralph Castain wrote:
Actually, as I look at it, the logic escapes me anyway. Basically, you
only have two options - use the vfs struct for Sun, and use fs struct
for everything else. I'm not aware of any other choice, and indeed the
list of all the systems for the latter
Yup, the 1.7.4rc2r30168 tarball appears to have resolved this.
-Paul
On Wed, Jan 8, 2014 at 4:08 PM, Ralph Castain wrote:
> Yeah - it only today was approved for move into 1.7.4 :-)
>
> Hopefully will make it into tonight's tarball
>
>
> On Jan 8, 2014, at 3:58 PM, Paul Hargrove wrote:
>
> On
Some minor misc warnings from the current 1.7.4rc tarball:
On both FreeBSD and NetBSD, the symbol CACHE_LINE_SIZE used
in ompi/mca/bcol/basesmuma/ appears to clash with a system-defined symbol.
FreeBSD-9/x86-64:
CC bcol_basesmuma_component.lo
In file included from
/home/phargrov/OMPI/openm
The following three warnings occur on 32-bit targets, and can each be
avoided by adding an intermediate cast to uintptr_t or intptr_t:
-bash-4.2$ grep -B2 'different size' make.log
CC fcoll_dynamic_file_read_all.lo
/home/phargrov/OMPI/openmpi-1.7-latest-netbsd6-i386/openmpi-1.7.4rc2r30168/
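A minimal sketch of the intermediate cast suggested at the top of this message. The 'different size' warnings are presumably the compiler complaining about a direct cast between a pointer and an integer of another width; going through uintptr_t first makes the width change explicit. The helper names are hypothetical and are not taken from the Open MPI sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a pointer in a 64-bit field. */
uint64_t ptr_to_u64(const void *p)
{
    /* pointer -> uintptr_t (pointer-sized) -> uint64_t (explicit widening) */
    return (uint64_t)(uintptr_t)p;
}

/* Hypothetical helper: recover a pointer from a 64-bit field. */
void *u64_to_ptr(uint64_t v)
{
    /* On a 32-bit target the value must fit in a pointer-sized integer. */
    assert(v <= UINTPTR_MAX);
    /* uint64_t -> uintptr_t (explicit narrowing on 32-bit) -> pointer */
    return (void *)(uintptr_t)v;
}
```

On a 64-bit target both casts are no-ops in practice; on a 32-bit target they spell out the conversion that the direct cast left implicit.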
And FWIW, all of those warnings occur on a Linux/x86 host with InfiniBand.
So, despite the platforms shown in my previous email the problem is not
related to use of NetBSD or Solaris.
/home/pcp1/phargrov/OMPI/openmpi-1.7-latest-linux-x86/openmpi-1.7.4rc2r30168/ompi/mca/btl/openib/btl_openib.c:272:
Mike / Devendar --
I'd like to understand the use case for this a bit more, from the perspective
of: is there more infrastructure that we need to provide in the coll framework?
Can you explain what you're trying to do, and when you need your cleanup
action(s) to run?
I ask because this seems
I see the issue - there are no "cores" on this topology, only "pu's", so
"bind-to core" is going to fail even though binding is supported. Will adjust.
Thanks!
On Jan 8, 2014, at 9:06 PM, Paul Hargrove wrote:
> Requested verbose output below.
> -Paul
>
> -bash-4.2$ mpirun -mca ess_base_verbos
Shoot. My bad there. Looks like the enumerator sentinel is missing. Will fix
now.
-Nathan
On Wed, Jan 08, 2014 at 09:27:46PM -0800, Paul Hargrove wrote:
>Ralph,
>When rebuilding with --enable-debug and the original gcc-4.0.0 the SEGV
>returns.
>So, the ompi-1.4 in the LD_LIBRARY_
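To illustrate why a missing sentinel can surface as a SEGV in strlen(), here is a minimal sketch of a sentinel-terminated table. The struct, table, and function names are hypothetical and do not reflect the actual Open MPI enumerator type; the assumption is only that the table is walked until a terminating entry is found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value/name pair; not the real Open MPI type. */
struct enum_entry {
    int value;
    const char *name;   /* NULL marks the end of the table */
};

/* Example table. The final {0, NULL} row is the sentinel. */
const struct enum_entry example_levels[] = {
    {0, "none"},
    {1, "basic"},
    {2, "verbose"},
    {0, NULL}
};

/* Return the length of the longest name, walking until the sentinel.
 * If the sentinel row were missing, the loop would read past the end
 * of the array and pass whatever bytes follow to strlen(). */
size_t longest_name(const struct enum_entry *table)
{
    size_t best = 0;
    assert(table != NULL);
    for (const struct enum_entry *e = table; e->name != NULL; ++e) {
        size_t len = strlen(e->name);
        if (len > best) {
            best = len;
        }
    }
    return best;
}
```

Whether such an overrun crashes depends on what happens to sit after the array in memory, so a bug of this kind may show up under one compiler or set of build flags and not another.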
I fully concur - just limited by my available time to fix it. Jeff has
volunteered to step in, though.
On Jan 8, 2014, at 11:44 PM, marco atzeri wrote:
> On 1/9/2014 5:10 AM, Ralph Castain wrote:
>> Actually, as I look at it, the logic escapes me anyway. Basically, you
>> only have two opt
On 1/7/2014 2:54 PM, George Bosilca wrote:
Can you try with the latest trunk please?
Also if things are not going well with the trunk please provide the
opal_config.h file.
Thanks,
George.
same failures
opal_config.h for openmpi-1.9a1r30128-1 attached.
Sorry for delay but buil
+Valentine
Jeff,
Hcoll uses the PML as an "OOB" to bootstrap itself. When a communicator is
destroyed, by the time we destroy the hcoll module, the communicator context is
no longer valid and any pending operations that rely on its existence will
fail. In particular, we have a non-blocking syn
Continuing with the CR code I now get a crash which can be easily reproduced
using orte/test/system/orte_barrier.c
I get:
orte_barrier: ../../../../../opal/class/opal_list.h:547: _opal_list_append:
Assertion `0 == item->opal_list_item_refcount' failed.
[dcbz:05085] *** Process received signal **
Should now be fixed in trunk (silently fall back to not binding if cores not
found) - scheduled for 1.7.4. If you could test the next trunk tarball, that
would help as I can't actually test it on my machines
On Jan 9, 2014, at 6:25 AM, Ralph Castain wrote:
> I see the issue - there are no "co
On Jan 9, 2014, at 11:00 AM, Joshua Ladd wrote:
> Hcoll uses the PML as an "OOB" to bootstrap itself. When a communicator is
> destroyed, by the time we destroy the hcoll module, the communicator context
> is no longer valid and any pending operations that rely on its existence will
> fail. In
Paul --
Could you send the configure stdout and config.log for your 32 bit build on
Solaris?
We have a test in the usnic BTL that is supposed to ensure that it only builds
on Linux, but given some of the output you've sent, it looks like it is also
building on Solaris... (which means our confi
See inline...
-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Thursday, January 09, 2014 11:53 AM
To: Open MPI Developers
Cc: Devendar Bureddy; valentin (valentin.pet...@itseez.com)
(valentin.pet...@itseez.com); Mike Dubman
Su
This approach is a completely legit way of destroying pending operations on a
module tied to a communicator.
A communicator is not destroyed when MPI_Comm_free is called but when all the
pending operations using the communicator are done (this is the only moment
when one can safely go and releas
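The lifetime rule described above (freeing drops one reference, and actual destruction waits for the pending operations) can be sketched with a plain reference count. This is a toy model under that assumption; the struct and function names are hypothetical and are not Open MPI's real communicator code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a communicator. */
struct toy_comm {
    int refcount;     /* one for the user, plus one per pending operation */
    bool destroyed;   /* set when the last reference is dropped */
};

void toy_comm_init(struct toy_comm *c)
{
    c->refcount = 1;          /* the user's own reference */
    c->destroyed = false;
}

/* Called when an operation starts using the communicator. */
void toy_comm_retain(struct toy_comm *c)
{
    assert(!c->destroyed);
    c->refcount++;
}

/* Called by the user's "free" and by each operation on completion. */
void toy_comm_release(struct toy_comm *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        c->destroyed = true;  /* only now is it safe to release resources */
    }
}
```

In this model a module's cleanup would run at the point where `destroyed` becomes true, not at the user's free call.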
Fixed https://svn.open-mpi.org/trac/ompi/ticket/4076.
On Jan 9, 2014, at 1:14 AM, Paul Hargrove wrote:
> With Ralph's fix for opal/util/path.c, I can build the trunk on
> FreeBSD-9/x86-64.
> However, building the examples fails with:
>
> $ cp -r $SRCDIR/examples .
> $ cd examples
> $ make
> mp
Ralph,
Thanks for fielding all these issues I've been finding.
I will plan to run tonight's trunk tarball through all of the systems where
I've seen any issues.
-Paul
On Thu, Jan 9, 2014 at 8:40 AM, Ralph Castain wrote:
> Should now be fixed in trunk (silently fall back to not binding if core
Many thanks, Paul!
On Jan 9, 2014, at 3:07 PM, Paul Hargrove wrote:
> Ralph,
>
> Thanks for fielding all these issues I've been finding.
> I will plan to run tonight's trunk tarball through all of the systems where
> I've seen any issues.
>
> -Paul
>
>
> On Thu, Jan 9, 2014 at 8:40 AM, Ralp
Jeff,
I actually noted this in the unofficial tarball you rolled for me in
December:
On Thu, Dec 19, 2013 at 6:35 PM, Paul Hargrove wrote:
[...]
> I'm not sure why one is trying to build the usnic btl on Solaris at all.
> Perhaps just because the OFED stack is present?
>
[...]
I assumed your la
On Jan 9, 2014, at 3:27 PM, Paul Hargrove wrote:
> I actually noted this in the unofficial tarball you rolled for me in December:
> On Thu, Dec 19, 2013 at 6:35 PM, Paul Hargrove wrote:
> [...]
> I'm not sure why one is trying to build the usnic btl on Solaris at all.
> Perhaps just because the
Fixed on trunk in https://svn.open-mpi.org/trac/ompi/changeset/30198.
I can't test on all the kinds of systems Paul/Marco have, though -- we'll have
to see what happens when he tries.
On Jan 9, 2014, at 10:36 AM, Ralph Castain wrote:
> I fully concur - just limited by my available time to fix
Paul --
Can you send the config.log file from this? It has more details in it that
will be useful (e.g., *why* ibv_open_device wasn't found in -libverbs).
I wonder if the issue has something to do with our handling of the legacy
--with-openib switch...? (it's been deprecated on the trunk in fa
Jeff,
The changes as described in the commit message make good sense to me except
for one thing:
In the 1.7 branch there is still a defined(__WINDOWS__) case for which
opal_path_nfs() is currently a no-op. So, I fear that if CMR'ed blindly
both the configure-time and build-time checks to ensure
Jeff,
The requested config.log was attached
as openmpi-trunk-solaris11-x64-ib-gcc452-config.log.bz2 in my recent
response to the usnic-on-solaris thread:
http://www.open-mpi.org/community/lists/devel/2014/01/13637.php
It looks like there were 2 successful probes for ibv_open_device() before
the f
The windows reference is stale on that branch - I'll remove it when applying
the cmr. We no longer support native Windows, and never did on the 1.7 series.
On Jan 9, 2014, at 2:08 PM, Paul Hargrove wrote:
> Jeff,
>
> The changes as described in the commit message make good sense to me except
For my CR work this can probably be ignored. I think I was looking at the
wrong place.
On Thu, Jan 09, 2014 at 05:28:01PM +0100, Adrian Reber wrote:
> Continuing with the CR code I now get a crash which can be easily reproduced
> using orte/test/system/orte_barrier.c
>
> I get:
>
> orte_barrier: ..
On Thu, Jan 9, 2014 at 2:08 PM, Paul Hargrove wrote:
> However, only my Solaris (10/SPARC and 11/x86-64) systems have NFS-mounted
> filesystems. So, I don't have any means to ensure that the "newly active"
> code performs correctly on the BSD systems. In other words,
> opal_path_nfs() might con
Not sure I grok - are you saying you believe the assert is bogus? We haven't
seen it elsewhere, but perhaps this is happening only with c/r config and
running?
I'm happy to take a look if you can provide more specifics as to how it can be
made to happen
On Jan 9, 2014, at 2:46 PM, Adrian Reber
The README in the latest 1.7.4rc lists support for:
- OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and
Absoft compilers (*)
Should 10.8 (Mountain Lion) be listed?
What about 10.9 (Mavericks)?
I can test 10.5 through 10.8 but haven't been doing so assuming that is
covered by th
As I noted in another email, 1.7.4's README claims support for Mac OSX
versions 10.5 through 10.7. So, I just now tried (but failed) to build on
10.5 (Leopard):
*** Assembler
checking dependency style of gcc -std=gnu99... gcc3
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -p
On Jan 9, 2014, at 5:51 PM, Paul Hargrove wrote:
> Nevermind that - I just recalled that test/util/opal_path_nfs.c is
> Linux-specific. So, there won't be any testing coverage of the new code on
> any of my Solaris or BSD systems. Nor will Mac OSX get any "real" testing.
>
> Does anybody hav
On Thu, Jan 9, 2014 at 4:15 PM, Jeff Squyres (jsquyres)
wrote:
> For example, if you build and run the test like this:
>
> make check
> ./opal_path_nfs . Makefile ~
>
> It'll report if ., Makefile, and ~ are on network filesystems (i.e., the
> result of sending each of ".", "Makefile", and "your_h
My attempts to build the current 1.7.4rc tarball with versions 8.0 and 9.0
of the Portland Group compilers fail miserably on compilation of
mpi-f08-types.F90.
I am sort of surprised by the attempt to build Fortran 2008 support w/ such
old compilers.
I think that fact itself is the real bug here,
I dunno if we really go back that far, Paul - I doubt anyone has tested on
anything less than 10.8, frankly. Might be better if we update to not make
claims that far back.
Were you able to build/run on 10.7?
On Jan 9, 2014, at 3:25 PM, Paul Hargrove wrote:
> As I noted in another email, 1.7.4
Yeah, I can add those as we regularly run on them - see other note about
earlier versions
On Jan 9, 2014, at 3:00 PM, Paul Hargrove wrote:
> The README in the latest 1.7.4rc lists support for:
>
> - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and
> Absoft compilers (*)
>
>
Ralph,
I can build fine on 10.7 (the system I am typing on now), and on 10.6 too.
I have no strong opinion on fix-vs-document, but as Jeff knows quite well
if you say you support it I am going to try to make it break :).
-Paul
On Thu, Jan 9, 2014 at 4:46 PM, Ralph Castain wrote:
> I dunno if
Trying to run on the front-end of one of our production Linux systems I see
the following:
$ mpirun -mca btl sm,self -np 2 examples/ring_c
[cvrsvc01:17692] [[42051,1],0] ORTE_ERROR_LOG: Data for specified key not
found in file
/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.7-latest-linux-x86_64
I wonder if the reason PGI V10.x does not use mpi_f08 bindings is that old PGI
compiler version number parsing error. (Unless, of course, if PGI V11.x or
V12.x do use mpi_f08 bindings.) In that old (autoconf?) bug, decisions were
made about features supported on PGI compilers by parsing the ve
Larry,
I didn't try pgi-11, but pgi-12.8 *does* have F08 support detected by
OpenMPI:
$ openmpi-1.7-latest-linux-x86_64-pgi-12.8/INST/bin/ompi_info --all | grep
-i f08
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
limitations in the pgf90 comp
I believe the following means that the compiler has determined that the two
named variables DO NOT actually get initialized to NULL as written.
However, it looks like their initialization is not required, as each is
set before it is read.
CC btl_usnic_component.lo
/scratch/scratchdirs/har
The README in the current 1.7.4rc tarball still claims support for uDAPL
and Quadrics Elan. Unless I am mistaken, those were both removed.
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawr
The README in the current 1.7.4rc tarball lists support for "Portals" and
documents --with-portals{,-config,-libs} configure arguments.
However, unless I am mistaken mtl:portals is gone and mtl:portals4 has
different configure arguments.
-Paul
--
Paul H. Hargrove phharg
Is anybody still testing MX and PSM?
They are both still present in ompi/mca/mtl/
I have access to a system w/ QLogic HCAs w/ PSM and have verified that I
can:
$ mpirun -mca btl sm,self -np 2 -host n15,n16 -mca mtl psm examples/ring_c
I have an x86 (32-bit) system w/ MX headers and libs that I ha
I've now seen this same failure mode on another Linux system.
I forgot to mention before that the job is hung after issuing the error
message.
Singleton runs fail in the same manner.
Both are front-end machines and perhaps that is related to this failure;
for instance expecting an allocation becau
Jeff,
Sorry, but the new opal/util/path.c in the trunk tarball (1.9a1r30215) fails
to build on NetBSD:
CC path.lo
/home/phargrov/OMPI/openmpi-trunk-netbsd6-i386/openmpi-1.9a1r30215/opal/util/path.c:
In function 'opal_path_nfs':
/home/phargrov/OMPI/openmpi-trunk-netbsd6-i386/openmpi-1.9a1r3
With the new opal/util/path.c I get farther building the trunk on OpenBSD
but hit a new failure:
Making all in mca/memheap
CC base/memheap_base_frame.lo
CC base/memheap_base_select.lo
CC base/memheap_base_alloc.lo
/home/phargrov/OMPI/openmpi-trunk-openbsd5-i386/openmpi-1.9a
Jeff,
No joy on Solaris-10 either:
CC path.lo
"/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u3-v8plus/openmpi-1.9a1r30215/opal/util/path.c",
line
478: prototype mismatch: 2 args passed, 4 expected
"/home/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u3-v8plus/openmpi-1.9a1
On Thu, Jan 9, 2014 at 7:15 PM, Paul Hargrove wrote:
> My Solaris-11 build stopped again on the failure to find ibv_open_device().
> I am re-running w/o --enable-openib now.
>
It finished while I was typing the previous message.
The Solaris-11 build failed in the same way as Solaris-10.
-Paul
The following might be helpful:
http://stackoverflow.com/questions/1653163/difference-between-statvfs-and-statfs-system-calls
It seems to indicate that even if one does find a statfs() function, there
are multiple os-dependent versions and it should therefore be avoided.
Since statvfs() is defin
Same issue for NetBSD, too.
-Paul
On Thu, Jan 9, 2014 at 7:09 PM, Paul Hargrove wrote:
> With the new opal/util/path.c I get farther building the trunk on OpenBSD
> but hit a new failure:
>
> Making all in mca/memheap
> CC base/memheap_base_frame.lo
> CC base/memheap_base_selec
From your ompi_info output, it looks like this is a slurm system - yes?
Wouldn't really matter anyway as we run fine on a head node without an
allocation, but worth clarifying.
What the message is indicating is a failure of the modex - we are missing an
expected piece of data. I don't see anyth
So far as I know, yes - still being tested and used. Glad to hear you could
validate the QLogic stuff - I don't know about Myrinet, but imagine someone
will shout if it has an issue
On Jan 9, 2014, at 5:52 PM, Paul Hargrove wrote:
> Is anybody still testing MX and PSM?
> They are both still
Ralph,
The problem has occurred with two builds (both PGI-based) on head nodes of
two clusters managed by TORQUE, not by SLURM. Somehow configure on the
first picked up SLURM headers and libs, but not TM. While the second
picked up the TM headers and libs.
I'll try a gcc-based build on one of t