Yeah, I think we've concluded that this is just a bug in the compiler and not
something wrong in OMPI itself. Sadly, compilers (just like all software) also
have bugs.
I'd just use the upgraded version as they apparently fixed the problem.
On Jun 3, 2014, at 4:43 AM, Alain Miniussi wrote:
>
Please note that I had the problem with 13.1.0 but not with 13.1.1.
On 28/05/2014 00:47, Ralph Castain wrote:
On May 27, 2014, at 3:32 PM, Alain Miniussi wrote:
Unfortunately, the debug library works like a charm (which makes the
uninitialized variable issue more likely).
Indeed - sounds
On Wed, May 28, 2014 at 12:32:35AM +0200, Alain Miniussi wrote:
> Unfortunately, the debug library works like a charm (which makes the
> uninitialized variable issue more likely).
>
> Still, the stack trace points to mca_btl_openib_add_procs in
> ompi/mca/btl/openib/btl_openib.c and there is only on
On May 27, 2014, at 3:32 PM, Alain Miniussi wrote:
> Unfortunately, the debug library works like a charm (which makes the
> uninitialized variable issue more likely).
Indeed - sounds like there is some optimization occurring that triggers the
problem.
>
> Still, the stack trace points to mca_
Unfortunately, the debug library works like a charm (which makes the
uninitialized variable issue more likely).
Still, the stack trace points to mca_btl_openib_add_procs in
ompi/mca/btl/openib/btl_openib.c and there is only one division in that
function (although not floating point) at the end:
Ah, good. On the setup that fails, could you use gdb to find the line number
where it is dividing by zero? It could be an uninitialized variable that gcc
inits one way and icc inits another.
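
To make the suspected failure mode concrete: in C, an integer division whose divisor happens to be zero (for instance because it was read from an uninitialized variable) is undefined behavior, and on x86 Linux it typically traps and is reported as "Floating point exception" even though no floating-point math is involved. The following is only a minimal sketch of a defensive check, not code from Open MPI; the helper name `checked_div` is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Open MPI): performs an integer
 * division only when it is well defined. Returns true and stores the
 * result in *quotient on success, false otherwise.
 */
bool checked_div(int numerator, int divisor, int *quotient)
{
    assert(quotient != NULL);

    /* A zero divisor is the case that raises SIGFPE on x86. */
    if (divisor == 0) {
        return false;
    }

    /* INT_MIN / -1 overflows and traps the same way on x86. */
    if (numerator == INT_MIN && divisor == -1) {
        return false;
    }

    *quotient = numerator / divisor;
    return true;
}
```

A caller would test the return value and log the offending operands instead of dividing directly, which turns a crash into a diagnosable error path.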
On May 27, 2014, at 4:49 AM, Alain Miniussi wrote:
> So it's working with a gcc compiled openmpi:
>
>
So it's working with a gcc compiled openmpi:
[alainm@gurney mpi]$ /softs/openmpi-1.8.1-gnu447/bin/mpicc --showme
gcc -I/softs/openmpi-1.8.1-gnu447/include -pthread -Wl,-rpath
-Wl,/softs/openmpi-1.8.1-gnu447/lib -Wl,--enable-new-dtags
-L/softs/openmpi-1.8.1-gnu447/lib -lmpi
(reverse-i-search)`m
Hi Gus,
Yes I did, with the same result on each process. Actually the problem
was spotted on a real code although I just posted the minimal version.
Alain
On 26/05/2014 17:14, Gustavo Correa wrote:
Hi Alain
Have you tried this?
mpiexec -np 2 ./a.out
Note: mpicc to compile, mpiexec to exec
If you wouldn't mind, yes - let's see if it is a problem with icc. We know some
versions have bugs, though this may not be the issue here
On May 26, 2014, at 7:39 AM, Alain Miniussi wrote:
>
> Hi,
>
> Did that too, with the same result:
>
> [alainm@tagir mpi]$ mpirun -n 1 ./a.out
> [tagir:05
Hi Alain
Have you tried this?
mpiexec -np 2 ./a.out
Note: mpicc to compile, mpiexec to execute.
I hope this helps,
Gus Correa
On May 26, 2014, at 9:59 AM, Alain Miniussi wrote:
>
> Hi,
>
> I have a failure with the following minimalistic testcase:
> $: more ./test.c
> #include "mpi.h"
>
>
Hi,
Did that too, with the same result:
[alainm@tagir mpi]$ mpirun -n 1 ./a.out
[tagir:05123] *** Process received signal ***
[tagir:05123] Signal: Floating point exception (8)
[tagir:05123] Signal code: Integer divide-by-zero (1)
[tagir:05123] Failing at address: 0x2adb507b3d9f
[tagir:05123] [
Strange - I note that you are running these as singletons. Can you try running
it under mpirun?
mpirun -n 1 ./a.out
just to see if it is the singleton that is causing the problem, or something in
the openib btl itself.
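
For readers unfamiliar with the "Signal: Floating point exception (8)" and "Signal code: Integer divide-by-zero (1)" lines in the trace: on Linux these numbers correspond to `SIGFPE` and the POSIX `si_code` value `FPE_INTDIV`. The sketch below shows one way a program can make the same distinction with `sigaction`; it is an illustration only and is not how Open MPI produces its report. The function names `is_integer_divide_by_zero` and `install_sigfpe_reporter` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* True when the signal/code pair denotes an integer divide-by-zero. */
bool is_integer_divide_by_zero(int signo, int si_code)
{
    return signo == SIGFPE && si_code == FPE_INTDIV;
}

/* Handler: uses only async-signal-safe calls (write, _exit). */
static void on_sigfpe(int signo, siginfo_t *info, void *context)
{
    static const char intdiv[] = "SIGFPE: integer divide-by-zero\n";
    static const char other[]  = "SIGFPE: other arithmetic fault\n";
    ssize_t written;

    (void)context;
    if (is_integer_divide_by_zero(signo, info->si_code)) {
        written = write(STDERR_FILENO, intdiv, sizeof intdiv - 1);
    } else {
        written = write(STDERR_FILENO, other, sizeof other - 1);
    }
    (void)written;
    _exit(1); /* returning would re-execute the faulting instruction */
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_sigfpe_reporter(void)
{
    struct sigaction sa;

    sa.sa_sigaction = on_sigfpe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGFPE, &sa, NULL);
}
```

Calling `install_sigfpe_reporter()` at the start of `main` would print the classification before exiting; the faulting address seen in the trace is available to such a handler as `info->si_addr`.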
On May 26, 2014, at 6:59 AM, Alain Miniussi wrote:
>
> Hi,
>
> I have
Hi,
I have a failure with the following minimalistic testcase:
$: more ./test.c
#include "mpi.h"
int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
$: mpicc -v
icc version 13.1.1 (gcc version 4.4.7 compatibility)
$: mpicc ./test.c
$: ./a.out
[tagir