Use --enable-debug on your configure line. This will add in some debugging
code to OMPI, and it'll compile everything with -g so that you can get stack
traces.
Beware that the extra debugging junk makes OMPI slightly slower; don't do any
benchmarking with this install, etc.
On Sep 28, 2011,
I tried 1.4.4rc4, same problem. Where do I get a debugging version?
On 9/28/11 8:32 AM, Jeff Squyres wrote:
Agreed that the original program had the char*[20]/char[20] bug, but his segv
is occurring before trying to use that array. So it's a bug - but he just
hadn't hit it yet. :-)
I'd stil
Jeff,
I've tried it now adding --without-libnuma. Actually that did NOT fix the
problem, so I can send you the full output from configure if you want, to
understand why this "hwloc" function is trying to use a function which appears
to be unavailable. The answers to some of your questions:
I am wondering what the proper way is to stop an mpirun process and the child
processes it created. I tried sending SIGTERM, but it does not respond to it.
Which signal should I be sending?
Thanks
Xin
On 28.09.2011 at 18:09, Brice Goglin wrote:
On 28/09/2011 17:55, Blosch, Edwin L wrote:
I am getting some undefined references in building OpenMPI 1.5.4
and I would like to know how to work around it.
The errors look like this:
/scratch1/bloscel/builds/release/openmpi-intel/lib/
On 09/27/2011 05:30 PM, Jeff Squyres wrote:
> On Sep 27, 2011, at 5:03 PM, Prentice Bisbal wrote:
>
>> To clarify, is IP/Ethernet required, or will IPoIB be used if it's
>> configured on the nodes? Would this make a difference.
>
> IPoIB is fine, although I've heard concerns about its stability a
On 28/09/2011 17:55, Blosch, Edwin L wrote:
>
> I am getting some undefined references in building OpenMPI 1.5.4 and I
> would like to know how to work around it.
>
>
>
> The errors look like this:
>
>
>
> /scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
> In fu
Hi,
Interestingly, the errors are gone after I removed "-g" from the app
compile options.
I tested again on the fresh Ubuntu 11.10 install: both 1.4.3 and 1.5.4
compile fine, but with the same error.
Also I tried hard to find any 32-bit object or library and failed.
They all are 64-bit.
- D.
20
Yowza; that sounds like a configury bug. :-(
What line were you using to configure Open MPI? Do you have libnuma installed?
If so, do you have the .h and .so files? Do you have the .a file?
Can you send the last few lines of output from a failed "make V=1" in that
tree? (it'll show us the
I am getting some undefined references in building OpenMPI 1.5.4 and I would
like to know how to work around it.
The errors look like this:
/scratch1/bloscel/builds/release/openmpi-intel/lib/libmpi.a(topology-linux.o):
In function `hwloc_linux_alloc_membind':
topology-linux.c:(.text+0x1da): und
Hello All,
I have just rebuilt openmpi-1.4-3 on our cluster, and I see this error:
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environme
Agreed that the original program had the char*[20]/char[20] bug, but his segv
is occurring before trying to use that array. So it's a bug - but he just
hadn't hit it yet. :-)
I'd still like to see a debugging version so that we can get a real stack
trace, and/or try the latest 1.4.4 RC (poste
Hello Reuti,
> defining 12 slots and request the machines exclusive is not an option?
I would like to. Unfortunately the system has been in production (for 2 years
now) and many scripts depend on this setup.
>
> The only way to get it working otherwise is to unset $JOB_ID and so
> on, so that Open MPI
Hi Rob,
thanks for your comments. I understand that it's most probably not worth
the effort to find the actual reason.
Because I have to deal with very large files I preferred using
"std::numeric_limits::max()" rather than a hard-coded value
to split the read in case an IO request exceeds this am
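
For readers following the idea of splitting a large read, here is a minimal sketch of the chunking arithmetic only. It is not the poster's code: the helper name `split_request` is hypothetical, and the choice of `int` as the limit type is an assumption (the type argument is missing from the quoted `std::numeric_limits::max()`; `int` is assumed here because classic MPI I/O calls take an `int` count).

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper, not taken from the thread: split a request of
// `total` bytes into (offset, length) pieces, each at most `max_chunk`.
// The default limit assumes the count type is int.
std::vector<std::pair<std::size_t, std::size_t>>
split_request(std::size_t total,
              std::size_t max_chunk =
                  static_cast<std::size_t>(std::numeric_limits<int>::max()))
{
    std::vector<std::pair<std::size_t, std::size_t>> pieces;
    if (max_chunk == 0) {
        return pieces;  // a zero-sized chunk limit cannot make progress
    }
    for (std::size_t offset = 0; offset < total;) {
        // Take as much as allowed, but never more than what remains.
        const std::size_t len = std::min(max_chunk, total - offset);
        pieces.emplace_back(offset, len);
        offset += len;
    }
    return pieces;
}
```

Each (offset, length) pair could then drive one read call whose count fits in an `int`; how that maps onto the actual I/O routine used in the original program is not shown in the thread.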