Re: [OMPI users] Still "illegal instruction"

> Thx for sharing, quite interesting. But does this mean that there is no
> working command line flag for gcc to switch this off (like -march=bdver1,
> which Gilles mentioned) or to tell me what it thinks it should compile for?

Well, that didn't work. Maybe I messed something up, since I recompiled the programs multiple times with different configs and options. I will try one more time.

Regards,
Mahmood

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Re: [OMPI users] Still "illegal instruction"

> On 22.09.2016 at 17:20, Mahmood Naderan wrote:
>
> Although this problem is not related to OMPI *at all*, I think it is good to
> tell the others what was going on. Finally, I caught the illegal instruction :)
>
> Briefly, I built the serial version of Siesta on the frontend and ran it
> directly on the compute node. Fortunately, "x/i $pc" from GDB showed that the
> illegal instruction was an FMA3 instruction. More detail is available at
> https://gcc.gnu.org/ml/gcc-help/2016-09/msg00084.html
>
> According to Wikipedia,
>
> • FMA4 is supported in AMD processors starting with the Bulldozer
>   architecture. FMA4 was realized in hardware before FMA3.
> • FMA3 is supported in AMD processors starting with the Piledriver
>   architecture and Intel starting with Haswell processors and Broadwell
>   processors since 2014.
>
> Therefore, the frontend (Piledriver) inserts an FMA3 instruction while the
> compute node (Bulldozer) doesn't recognize it.

Thx for sharing, quite interesting. But does this mean that there is no working command line flag for gcc to switch this off (like -march=bdver1, which Gilles mentioned) or to tell me what it thinks it should compile for?

For pgcc there is -show and I can spot the target it discovered in the USETPVAL= line.

-- Reuti

> The solution was (as others stated) building Siesta on the compute node. I
> have to say that I tested all related programs (OMPI, Scalapack, OpenBLAS)
> sequentially on the compute node in order to find which one generates the
> illegal instruction.
>
> Anyway... thanks a lot for your comments. Hope this helps others in the
> future.
>
> Regards,
> Mahmood
Re: [OMPI users] Still "illegal instruction"

Dear Gilles,

It seems that using GDB with MPI is a bit tricky. I read the FAQ about that. Please see the post at https://gcc.gnu.org/ml/gcc-help/2016-09/msg00078.html

> i guess your gdb is also a bit too old to support all operations on a core file
> (fwiw, i am able to do that on RHEL7)

This is Rocks-6 and GDB is 7.2. It seems that it doesn't support the "info proc mapping" command.

I will try your suggestion by modifying the code. Meanwhile, do you have any comment about that post (the link above)?

Regards,
Mahmood
Re: [OMPI users] Still "illegal instruction"

OK Gilles, let me try that. I will troubleshoot with the gcc mailing list and will come back later.

Regards,
Mahmood
Re: [OMPI users] Still "illegal instruction"

Mahmood,

note you have to compile the source file that contains the snippet with '-g -O0', and link with '-g -O0' also.

there was a typo in the gdb command, please read "frame 1" instead of "frame #1"

Cheers,

Gilles
Re: [OMPI users] Still "illegal instruction"

Mahmood,

-march=bdver1 should be ok on your nodes.

from the gcc command line, i was expecting -march=xxx, but it is missing (your gcc might be a bit too old for that)
note you have to recompile all your libs (openblas and friends) with -march=bdver1

i guess your gdb is also a bit too old to support all operations on a core file
(fwiw, i am able to do that on RHEL7)

at first, i recommend you find the smallest number of nodes necessary to reproduce the issue.
ideally, you would confirm the app is working fine by running it exclusively on the frontend.

if you do not have a parallel debugger, then you have to manually parallel debug your app.

i usually update my main app like this

int _dbg=1;

MPI_Init(...);
printf("gdb --pid=%d\n", getpid());
while (_dbg) poll(NULL, 0, 1);

rebuild and run.

then log into the compute nodes, and run the gdb command that was displayed previously
you usually have to (for all your MPI tasks, in different terminals)

bt
frame #1
set _dbg=0
c

and wait for a crash

hopefully, you will be able to run

disas
info proc mapping
x /100x $rip

Cheers,

Gilles
Re: [OMPI users] Still "illegal instruction"

I don't think there is anything OpenMPI can do for you here. The issue is clearly in how you are compiling your application.

To start, you can try to compile without the --march=generic and use something as generic as possible (i.e. only SSE2). Then if this doesn't work for your app, do the same for any 3rd party library.

Cheers,

2016-09-15 19:01 GMT+01:00 Mahmood Naderan:
> Excuse me, which is most suitable for me to find the name of the illegal
> instruction?
>
> --verbose
> --debug-level
> --debug-daemons
> --debug-daemons-file
>
> Regards,
> Mahmood

--
Information System Engineer, Ph.D.
Blog: http://blog.audio-tk.com/
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Re: [OMPI users] Still "illegal instruction"

On 15.09.2016 at 19:54, Mahmood Naderan wrote:

> /proc/cpuinfo on the frontend shows (family 21, model 2) and on the compute
> node it shows (family 21, model 1).

Just out of curiosity: what is the model name of them?

> > That being said, my best bet is you compile on a compute node ...
> gcc is there on the computes, but the NFS permission is another issue. It
> seems that nodes are not able to write on /share (the one which is shared
> between frontend and computes).

Would it work to compile with a shared target and copy it to /shared on the frontend?

-- Reuti
Re: [OMPI users] Still "illegal instruction"

Excuse me, which is most suitable for me to find the name of the illegal instruction?

--verbose
--debug-level
--debug-daemons
--debug-daemons-file

Regards,
Mahmood
Re: [OMPI users] Still "illegal instruction"

The differences are very minor:

root@cluster:tpar# echo | gcc -v -E - 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-redhat-linux/4.4.7/cc1 -E -quiet -v - -mtune=generic

[root@compute-0-1 ~]# echo | gcc -v -E - 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-redhat-linux/4.4.6/cc1 -E -quiet -v - -mtune=generic

I even tried to compile the program with -march=amdfam10. Something like this:

/export/apps/siesta/openmpi-2.0.0/bin/mpifort -c -g -Os -march=amdfam10 `FoX/FoX-config --fcflags` -DMPI -DFC_HAVE_FLUSH -DFC_HAVE_ABORT -DTRANSIESTA /export/apps/siesta/siesta-4.0/Src/pspltm1.F

But I got the same error.

/proc/cpuinfo on the frontend shows (family 21, model 2) and on the compute node it shows (family 21, model 1).

> That being said, my best bet is you compile on a compute node ...

gcc is there on the computes, but the NFS permission is another issue. It seems that nodes are not able to write on /share (the one which is shared between frontend and computes).

An important question is how I can find out the name of the illegal instruction. Then I hope to find the document that says which instruction set (avx, sse4, ...) contains that instruction.

Is there any option in mpirun to turn on the verbosity to see more information?

Regards,
Mahmood
Re: [OMPI users] Still "illegal instruction"

if gcc is installed on your compute node, you can run

echo | gcc -v -E - 2>&1 | grep cc1

and look for the -march=xxx parameter
/* you might want to compare that with your frontend */

And/or you can run

grep family /proc/cpuinfo

on your compute node

Then

man gcc

on your front end node

From my gcc, -march=bdver1 for Family 15h, -march=barcelona for family 10h

That being said, my best bet is you compile on a compute node ...

Cheers,

Gilles
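/proc/cpuinfo reports the CPU family in decimal, while the -march hints above use AMD's hex family names, so the "family 21" reported for both nodes in this thread corresponds to Family 15h. A minimal shell sketch of that conversion follows. The function names amd_family_hex and march_for_family are made up for illustration, and the mapping covers only the two -march values quoted in the message above.

```shell
# Convert the decimal "cpu family" value from /proc/cpuinfo
# into AMD's hex family name, e.g. 21 -> 15h.
amd_family_hex() {
    printf '%Xh\n' "$1"
}

# Suggest a gcc -march value for a decimal family number.
# Only the two mappings quoted in the thread are covered;
# anything else is reported as "unknown".
march_for_family() {
    case "$(amd_family_hex "$1")" in
        15h) echo bdver1 ;;
        10h) echo barcelona ;;
        *)   echo unknown ;;
    esac
}

# Possible use on a node (reads the first "cpu family" line):
#   march_for_family "$(awk -F: '/^cpu family/ {print $2+0; exit}' /proc/cpuinfo)"
```

Whether the installed gcc actually accepts the suggested value depends on its version, so it is worth checking against "man gcc" on the build host.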
Re: [OMPI users] Still "illegal instruction"

Although the CPUs are nearly the same, the CPU flags are different.
I noticed that the frontend has fma, f16c, tch, tce, tbm and bmi1 while the compute nodes don't have them.

I guess that since the programs were compiled on the frontend (6380), the optimization phase emits some special instructions which aren't available on the compute nodes (6282).

Maybe this is not really related to OMPI, but does anybody know which compiler flags are related to these special instructions?

> Ok, you can try this under gdb
> info proc mapping
> info registers
> x /100x $rip
> x /100x $eip

The process is dead, so some commands are invalid.

Program terminated with signal 4, Illegal instruction.
#0 0x008da76e in ?? ()
(gdb) info proc mapping
No /proc directory: '/proc/5383'
(gdb) info registers
rax     0x0             0
rbx     0x448f980       71891328
rcx     0x7fff52810b00  140734577576704
rdx     0x448f980       71891328
rsi     0x448f980       71891328
rdi     0x8             8
rbp     0x448f980       0x448f980
rsp     0x7fff52810ae8  0x7fff52810ae8
r8      0x1             1
r9      0x9c0           2496
r10     0x44af480       72021120
r11     0x44b1b80       72031104
r12     0x8             8
r13     0x8             8
r14     0x9             9
r15     0x13880         80000
rip     0x8da76e        0x8da76e
eflags  0x10246         [ PF ZF IF RF ]
cs      0x33            51
ss      0x2b            43
ds      0x0             0
es      0x0             0
fs      0x0             0
gs      0x0             0
(gdb) x /100x $rip
0x8da76e: Cannot access memory at address 0x8da76e
(gdb) x /100x $eip
Value can't be converted to integer.
(gdb)

Regards,
Mahmood
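The flag comparison described above can be done mechanically rather than by eye. The following is a minimal shell sketch, assuming a copy of /proc/cpuinfo from each machine has been saved to a file; the function name missing_flags and the file names are hypothetical.

```shell
# Print every CPU flag that appears in the first cpuinfo dump
# but not in the second, one flag per line.
# Usage: missing_flags frontend.cpuinfo compute.cpuinfo
missing_flags() {
    # Take the text after the colon on the first "flags" line of each file.
    front=$(awk -F: '/^flags/ {print $2; exit}' "$1")
    node=$(awk -F: '/^flags/ {print $2; exit}' "$2")
    for f in $front; do
        # Pad with spaces so that "fma" does not match inside "fma4".
        case " $node " in
            *" $f "*) ;;
            *) printf '%s\n' "$f" ;;
        esac
    done
}
```

One way to obtain the second file, assuming ssh access to the node, would be "ssh compute-0-1 cat /proc/cpuinfo > compute.cpuinfo". Any flag the function prints names a feature that the build host has and the compute node lacks, so code tuned for the build host may fail there.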
Re: [OMPI users] Still "illegal instruction"

Ok, you can try this under gdb

info proc mapping
info registers
x /100x $rip
x /100x $eip

I remember you are running on AMD cpus, which is why Intel-only instructions must be avoided

Cheers,

Gilles
Re: [OMPI users] Still "illegal instruction"

disas command fails.

Program terminated with signal 4, Illegal instruction.
#0 0x008da76e in ?? ()
(gdb) bt
#0 0x008da76e in ?? ()
#1 0x008da970 in ?? ()
#2 0x00bfe9f8 in ?? ()
#3 0x in ?? ()
(gdb) disas
No function contains program counter for selected frame.

> Btw, did you run some simple applications with openmpi 2.0.0 ?
> We do have bits of assembly code, and even if i do not believe they are
> specific to intel cpus, i might be wrong and that could be the root cause.

I didn't run the tests. But I am pretty sure that OpenMPI is working because other applications (not siesta) have no problem.
Please note that the CPUs are AMD. Frontend is Opteron 6380 and the compute nodes are 6282SE.

> Also, did you run
> make check
> After you built openmpi ?

All are OK. Please see below.

Testsuite summary for Open MPI 2.0.0
# TOTAL: 2
# PASS: 2
# SKIP: 0
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0

Regards,
Mahmood
Re: [OMPI users] Still "illegal instruction"

--core=... is the right syntax, sorry about that

No need to recompile with -g, binary is good enough here

Then you need to run
disas
in gdb, to disassemble the instruction at 0x08da76e

And then, still in gdb
info maps
or
show maps
To find out the library this instruction is coming from

OpenBLAS is fine, my question is if you compiled it by yourself, and on the same platform

Btw, did you run some simple applications with openmpi 2.0.0 ?
We do have bits of assembly code, and even if i do not believe they are specific to intel cpus, i might be wrong and that could be the root cause.

Also, did you run
make check
After you built openmpi ?

Cheers,

Gilles
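The "?? ()" frames and the "Cannot access memory" errors in this thread are most likely a consequence of giving gdb only the core file: the program's code usually lives in the executable, not in the core, so gdb has nothing to disassemble. The following is a minimal shell sketch of a non-interactive lookup that passes both files and runs the "x/i $pc" command that eventually identified the instruction. The function name crash_insn and the GDB override variable are hypothetical; -batch and -ex are standard gdb options.

```shell
# Print the instruction at the crash address recorded in a core file.
# Usage: crash_insn /path/to/executable /path/to/core
# GDB may be set to use a different debugger binary (assumption for
# illustration; it defaults to plain "gdb").
crash_insn() {
    # -batch: run the given commands and exit without prompting.
    # -ex:    execute one gdb command; x/i $pc disassembles the
    #         single instruction at the saved program counter.
    "${GDB:-gdb}" -batch -ex 'x/i $pc' "$1" "$2"
}
```

For the run discussed here the call would look like "crash_insn /share/apps/siesta/siesta-4.0/tpar/transiesta core.5383", executed on a machine that can read both files.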
Re: [OMPI users] Still "illegal instruction"

> gdb --pid=core.5383

Are you sure about the syntax?
PID must be a running process. I see --core which seems to be relevant here.

Both OpenMPI and Siesta were compiled with O flags. This is not appropriate for gdb. Should I compile both of them with debug symbols?

> Btw, did you compile lapack and friends by yourself ?

I use Scalapack, which needs BLAS. I use OpenBLAS instead of netlib's BLAS.

$ gdb --core=core.5383
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/e1/ddc85f7caa9f2571545a58479d64ba676217dd
[New Thread 5383]
[New Thread 5416]
[New Thread 5401]
[New Thread 5388]
[New Thread 5407]
[New Thread 5406]
[New Thread 5418]
[New Thread 5393]
[New Thread 5391]
[New Thread 5387]
[New Thread 5405]
[New Thread 5389]
[New Thread 5408]
[New Thread 5417]
[New Thread 5394]
[New Thread 5506]
[New Thread 5404]
[New Thread 5392]
[New Thread 5410]
[New Thread 5411]
[New Thread 5395]
[New Thread 5409]
[New Thread 5403]
[New Thread 5414]
[New Thread 5396]
[New Thread 5412]
[New Thread 5419]
[New Thread 5413]
[New Thread 5509]
[New Thread 5415]
[New Thread 5397]
[New Thread 5420]
[New Thread 5398]
[New Thread 5399]
Core was generated by `/share/apps/siesta/siesta-4.0/tpar/transiesta'.
Program terminated with signal 4, Illegal instruction.
#0 0x008da76e in ?? ()
(gdb) bt
#0 0x008da76e in ?? ()
#1 0x008da970 in ?? ()
#2 0x00bfe9f8 in ?? ()
#3 0x in ?? ()
(gdb)

Regards,
Mahmood
Re: [OMPI users] Still "illegal instruction"

Mahmood,

You can
gdb --pid=core.5383
And then
bt
And then
disas
And "scroll" until the current instruction
Iirc, there is a star at the beginning of this line

You can also try
show maps
Or
info maps
(I cannot remember the syntax...)

Btw, did you compile lapack and friends by yourself ?
[OMPI users] Still "illegal instruction"

Hi,

After upgrading OpenMPI (from 1.6.5 to 2.0.0) and my program (from 3.2 to 4.0), the parallel run still aborts with the "Illegal instruction" error in the middle of the run.

I wonder why this happens and how I can debug more. How can I find out whether this error is related to the program itself, MPI or system libraries?

Gilles gave a suggestion about using ulimit to create a core file (https://mail-archive.com/users@lists.open-mpi.org/msg29919.html). Please see the following:

mahmood@cluster:tran$ cat sc.sh
#!/bin/bash
ulimit -c unlimited
exec /share/apps/siesta/siesta-4.0/tpar/transiesta < trans-cc.fdf
mahmood@cluster:tran$ cat hosts.txt
compute-0-1
mahmood@cluster:tran$ hostname
cluster
mahmood@cluster:tran$ #/share/apps/siesta/openmpi-2.0.0/bin/mpirun -hostfile hosts.txt -np 15 sc.sh

--
mpirun noticed that process rank 0 with PID 5383 on node compute-0-1 exited on signal 4 (Illegal instruction).
--

Now I see a file core.5383
It is a very huge file (1290018816 bytes)!!!
How can I process that?

Regards,
Mahmood