Mahmood,

-march=bdver1

should be OK on your nodes.
In the cc1 command line that gcc -v prints, I was expecting to see
-march=xxx, but it is missing (your gcc might be a bit too old for that).
Note that you have to recompile all your libs (openblas and friends) with
-march=bdver1 as well.

I guess your gdb is also a bit too old to support all operations on a core file
(FWIW, I am able to do that on RHEL7).

First, I recommend you find the smallest number of nodes needed
to reproduce the issue.
Ideally, you would also confirm the app works fine when it runs
exclusively on the frontend.

If you do not have a parallel debugger, then you have to debug
your parallel app manually.

I usually update my main app like this:

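#include <poll.h>
#include <stdio.h>
#include <unistd.h>

volatile int _dbg=1;  /* volatile, so the loop sees the value gdb writes */

MPI_Init(...);
printf("gdb --pid=%d\n", (int)getpid());
fflush(stdout);  /* make sure the command shows up before we block */
while (_dbg) poll(NULL, 0, 1);  /* sleep 1 ms, then check _dbg again */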

Rebuild and run.
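
If you prefer not to rebuild every time you switch debugging on and off, the same wait loop can be wrapped in a small helper that only blocks when an environment variable is set. This is only a sketch: the names dbg_wait, dbg_wait_requested, dbg_wait_for_gdb and the WAIT_FOR_GDB variable are made up for illustration and are not part of Open MPI.

```c
/* Sketch: opt-in "wait for gdb" helper. All names here are hypothetical. */
#define _POSIX_C_SOURCE 200112L
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* volatile, so the compiler re-reads the flag after gdb changes it */
static volatile int dbg_wait = 1;

/* Return 1 if WAIT_FOR_GDB (an assumed variable name) is set to a
 * non-empty value other than "0", and 0 otherwise. */
static int dbg_wait_requested(void)
{
    const char *v = getenv("WAIT_FOR_GDB");
    return v != NULL && v[0] != '\0' && !(v[0] == '0' && v[1] == '\0');
}

/* Call right after MPI_Init(). Prints the gdb command for this task,
 * then loops until a debugger sets dbg_wait to 0. */
static void dbg_wait_for_gdb(void)
{
    char host[256] = "unknown";

    if (!dbg_wait_requested())
        return;
    /* pass one byte less than the buffer size so host stays terminated */
    gethostname(host, sizeof(host) - 1);
    printf("%s: gdb --pid=%d\n", host, (int)getpid());
    fflush(stdout);
    while (dbg_wait)
        poll(NULL, 0, 1);   /* sleep 1 ms, then check the flag again */
}
```

With this in place you would call dbg_wait_for_gdb() right after MPI_Init(), export WAIT_FOR_GDB=1 to the tasks (mpirun -x WAIT_FOR_GDB=1 does that), and in gdb use "set var dbg_wait=0" in place of the _dbg variable.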

Then log into the compute nodes and run the gdb command that was
displayed previously.
For each of your MPI tasks (in a different terminal each), you usually
have to type the following, picking the frame that bt shows for your main app:
bt
frame 1
set var _dbg=0
c

Then wait for the crash.

Hopefully, you will then be able to run
disas
info proc mappings
x/100x $rip
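
Once disas has shown you the faulting instruction, one way to check whether a given node supports the matching extension is to look for its name in the "flags" line of /proc/cpuinfo, on both the frontend and the compute node. Here is a rough sketch of that check in C. The helper names has_flag and read_cpu_flags are made up for illustration, and it assumes the Linux x86 /proc/cpuinfo layout.

```c
/* Sketch: look for an ISA flag in the "flags" line of /proc/cpuinfo.
 * Helper names are hypothetical; Linux x86 layout is assumed. */
#include <stdio.h>
#include <string.h>

/* Return 1 if flag appears as a whole whitespace-separated word in line. */
static int has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p += 1;   /* partial match (e.g. "fma" inside "fma4"), keep looking */
    }
    return 0;
}

/* Copy the first "flags" line of /proc/cpuinfo into buf.
 * Returns 0 on success, -1 if the file or the line is not found.
 * The line can be long, so give it a generous buffer (e.g. 8192 bytes). */
static int read_cpu_flags(char *buf, size_t len)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    int rc = -1;

    if (f == NULL)
        return -1;
    while (fgets(buf, (int)len, f) != NULL) {
        if (strncmp(buf, "flags", 5) == 0) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

For example, after read_cpu_flags(buf, sizeof(buf)) succeeds, has_flag(buf, "fma4") or has_flag(buf, "avx") tells you whether that node advertises the extension. A flag that is present on the frontend but missing on the compute node is a good suspect.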

Cheers,

Gilles


On Fri, Sep 16, 2016 at 2:54 AM, Mahmood Naderan <mahmood...@gmail.com> wrote:
> The differences are very very minor
>
> root@cluster:tpar# echo | gcc -v -E - 2>&1 | grep cc1
>  /usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1 -E -quiet -v -
> -mtune=generic
>
> [root@compute-0-1 ~]# echo | gcc -v -E - 2>&1 | grep cc1
>  /usr/libexec/gcc/x86_64-redhat-linux/4.4.6/cc1 -E -quiet -v -
> -mtune=generic
>
>
> Even I tried to compile the program with -march=amdfam10. Something like
> these
>
> /export/apps/siesta/openmpi-2.0.0/bin/mpifort -c -g -Os -march=amdfam10
> `FoX/FoX-config --fcflags`  -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT
> -DTRANSIESTA    /export/apps/siesta/siesta-4.0/Src/pspltm1.F
>
> But got the same error.
>
> /proc/cpuinfo on the frontend shows (family 21, model 2) and on the compute
> node it shows (family 21, model 1).
>
>
>
>>That being said, my best bet is you compile on a compute node ...
> gcc is there on the computes, but the NFS permission is another issue. It
> seems that nodes are not able to write on /share (the one which is shared
> between frontend and computes).
>
>
>
> An important question is that, how can I find out what is the name of the
> illegal instruction. Then, I hope to find the document that points which
> instruction set (avx, sse4, ...) contains that instruction.
>
> Is there any option in mpirun to turn on the verbosity to see more
> information?
>
> Regards,
> Mahmood
>
>
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users