[Wien] Problems with mpi for Wien12.1
I compiled fftw3 using the Intel suite as well. The relevant line from config.log reads:

./configure CC=icc F77=ifort MPICC=mpiicc --prefix=/opt/local --enable-mpi --enable-threads --prefix=/opt/local/fftw3

I note that the configure script only asks for an MPI C compiler (I used the Intel one), not an MPI Fortran compiler. The compiled code (mpi-bench) does work fine with the Intel mpirun. After commenting out the call to the W2kinit subroutine and recompiling lapw0 (via the siteconfig script), I attempted to run run_lapw in both serial and parallel form, as you can see below. The serial form worked fine.

Paul

matstud at ursa:~/WienDisk/Fons/GaAs> run_lapw
LAPW0 END
LAPW1 END
LAPW2 END
CORE END
MIXER END
ec cc and fc_conv 1 1 1
stop

matstud at ursa:~/WienDisk/Fons/GaAs> run_lapw -p
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
stop error

On Aug 28, 2012, at 9:16 AM, Laurence Marks wrote:
One suggestion: comment out the line towards the top of lapw0.F, "call W2kinit". You should get a more human-readable error message. As an addendum, was fftw3 compiled with mpiifort? I assume from your email that it was; just checking. N.B., there is a small chance that this will hang your computer.

___
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

Dr. Paul Fons
Senior Research Scientist, Functional Nano-phase-change Research Team
Nanoelectronics Research Institute
National Institute of Advanced Industrial Science and Technology (METI)
AIST Central 4, Higashi 1-1-1, Tsukuba, Ibaraki 305-8568, JAPAN
tel. +81-298-61-5636  fax. +81-298-61-2939
email: paul-fons at aist.go.jp
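As a side note on the configure line quoted above: autoconf honors only the last --prefix, so the earlier /opt/local prefix is redundant, and FFTW's MPI layer is built from C sources, which is why configure asks for MPICC and no MPI Fortran wrapper. A cleaned-up sketch of the same build (paths as in the post; this is illustrative, not a tested recipe):

```shell
# Sketch: rebuild fftw3 against Intel MPI. Only the final --prefix
# takes effect, so the duplicate /opt/local one is dropped here.
./configure CC=icc F77=ifort MPICC=mpiicc \
            --enable-mpi --enable-threads \
            --prefix=/opt/local/fftw3
make
make check     # runs fftw's own self-tests
make install
```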
[Wien] Problems with mpi for Wien12.1
Dear Prof. Blaha,

I was under the impression that I had replied promptly to your initial question; I apologize for the delay. I have been using the MPI compiler of the Intel MPI (4.0.3) suite, namely mpiifort. Here are the results of the which operation and the underlying version of the Fortran compiler. Thank you for your help.

matstud at ursa:~/Wien2K> which mpiifort
/opt/intel/impi/4.0.3.008/intel64/bin/mpiifort
matstud at ursa:~/Wien2K> mpiifort --version
ifort (IFORT) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

Below is a short sequence from a recompile of lapw1 using siteconfig. I note that mpiifort is being used.

touch .parallel
make PARALLEL='-DParallel' TYPE='REAL' TYPE_COMMENT='\!_REAL' \
  ./lapw1_mpi FORT=mpiifort FFLAGS=' -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback '-DParallel''
make[1]: Entering directory `/home/matstud/Wien2K_12_1/SRC_lapw1'
modules.F: REAL version extracted
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c modules_tmp_.F
mv modules_tmp_.o modules.o
rm modules_tmp_.F
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c abc.f
atpar.F: REAL version extracted
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c atpar_tmp_.F
mv atpar_tmp_.o atpar.o
rm atpar_tmp_.F
calkpt.F: REAL version extracted
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c calkpt_tmp_.F
mv calkpt_tmp_.o calkpt.o
rm calkpt_tmp_.F
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c cbcomb.f
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c coors.f
dscgst.F: REAL version extracted
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c dscgst_tmp_.F
mv dscgst_tmp_.o dscgst.o
rm dscgst_tmp_.F

And the final linking step:

mv W2kinit_tmp_.o W2kinit.o
rm W2kinit_tmp_.F
mpiifort -o ./lapw1c_mpi abc.o atpar.o bandv1.o calkpt.o cbcomb.o coors.o cputim.o dblr2k.o dgeqrl.o dgewy.o dgewyg.o dlbrfg.o dsbein1.o dscgst.o dstebz2.o dsyevx2.o dsyr2m.o dsyrb4.o dsyrb5l.o dsyrdt4.o dsywyv.o dsyxev4.o dvbes1.o eisps.o errclr.o errflg.o forfhs.o gaunt1.o gaunt2.o gbass.o gtfnam.o hamilt.o hns.o horb.o inikpt.o inilpw.o lapw1.o latgen.o lmsort.o locdef.o lohns.o lopw.o matmm.o modules.o nn.o outerr.o outwinb.o prtkpt.o prtres.o pzheevx16.o rdswar.o rint13.o rotate.o rotdef.o seclit.o seclr4.o seclr5.o select.o service.o setkpt.o setwar.o sphbes.o stern.o SymmRot.o tapewf.o ustphx.o vectf.o warpin.o wfpnt.o wfpnt1.o ylm.o zhcgst.o zheevx2.o zher2m.o jacdavblock.o make_albl.o global2local.o par_syrk.o my_dsygst.o refblas_dtrsm.o seclit_par.o pdsyevx17.o pdstebz17.o pdgetri_my.o pzgetri_my.o W2kutils.o W2kinit.o -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback -L/opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64 -pthread -L/opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64 /opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64/libmkl_blas95_lp64.a /opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread
make[1]: Leaving directory `/home/matstud/Wien2K_12_1/SRC_lapw1'
Copying programs: SRC_lapw1/lapw1 SRC_lapw1/lapw1c SRC_lapw1/lapw1_mpi SRC_lapw1/lapw1c_mpi
done.
Compile time errors (if any) were:

On Aug 24, 2012, at 11:59 PM, Peter Blaha wrote:
To make this comment more clear: you did not tell us which command you are using for MPF (the parallel compiler). It is not always mpif90 (as that could use some other compiler or MPI); it could be mpiifort or something else. Then check with which mpif90
[Wien] Problems with mpi for Wien12.1
Hmmm. I was hoping for something human-readable, like a traceback showing where it died. Please check both the lapw0.error file and case.dayfile to see if they give anything useful. Also, what are the last few lines of case.output?

You may get somewhere by running the mpirun command by hand; I have seen this help. If you understand csh, add an "echo $tt" at the relevant location in lapw0para. If not, you can change the first line of lapw1para to "-xf" rather than just "-f". Then do "x lapw0 -p" again. You will get a hundred or so lines of output, one of which, towards the end, will be something like "mpirun -np 12 ...". Then paste this line by itself into a terminal. Maybe then something human-readable will emerge.

Unfortunately debugging MPI is not trivial, and a SIGSEGV can also be non-trivial, as the error may not appear at the right place, making life more fun. Do you have TotalView or a similar MPI debugger available? You can get a demo version of TotalView free for, I believe, 30 days.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody else has thought" -- Albert Szent-Gyorgyi

On Aug 28, 2012 10:09 PM, Paul Fons paul-fons at aist.go.jp wrote:
I compiled fftw3 using the Intel suite as well.
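The "-f to -xf" change suggested above is a one-character edit to the script's csh shebang (-x makes csh echo each command as it runs, exposing the generated mpirun line). A sketch of the edit, performed here on a throwaway stand-in file with hypothetical contents; the real lapw0para/lapw1para live in $WIENROOT and should be backed up first:

```shell
# Demonstrate the shebang edit on a stand-in copy (safe to run anywhere).
tmp=$(mktemp)
printf '#!/bin/csh -f\nset tt = "mpirun -np 12"\n' > "$tmp"   # stand-in script
sed -i '1s/-f$/-xf/' "$tmp"            # enable csh command tracing (-x)
firstline=$(head -n 1 "$tmp")
echo "$firstline"                      # -> #!/bin/csh -xf
rm -f "$tmp"
```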
[Wien] Problems with mpi for Wien12.1
N.b., I meant lapw0 everywhere, as I believe you said that is where the problem is. If it is in lapw1, then change everything to lapw1 in my email.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody else has thought" -- Albert Szent-Gyorgyi

On Aug 28, 2012 10:34 PM, Laurence Marks L-marks at northwestern.edu wrote:
Hmmm. I was hoping for something human-readable, like a traceback showing where it died.

On Aug 28, 2012 10:09 PM, Paul Fons paul-fons at aist.go.jp wrote:
I compiled fftw3 using the Intel suite as well.
[Wien] Problems with mpi for Wien12.1
One suggestion: comment out the line towards the top of lapw0.F:

call W2kinit

You should get a more human-readable error message. As an addendum, was fftw3 compiled with mpiifort? I assume from your email that it was; just checking. N.B., there is a small chance that this will hang your computer.
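The suggested change is a one-line edit to lapw0.F (a leading "!" starts a comment in free-form Fortran). A sketch performed on a stand-in file with hypothetical contents, since the real source is $WIENROOT/SRC_lapw0/lapw0.F and must be recompiled through siteconfig afterwards:

```shell
# Comment out the W2kinit call, demonstrated on a stand-in file.
tmp=$(mktemp)
printf '      call W2kinit\n      stop\n' > "$tmp"            # stand-in source
sed -i 's/^\( *\)call W2kinit/\1! call W2kinit/' "$tmp"       # "!" = Fortran comment
patched=$(head -n 1 "$tmp")
echo "$patched"
rm -f "$tmp"
```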
[Wien] Problems with mpi for Wien12.1
Greetings all,

I have compiled Wien2K 12.1 under OpenSuse 11.4 (and OpenSuse 12.1) with the latest Intel compilers, with identical MPI launch problems on both, and I am hoping for some suggestions as to where to look to fix things. Note that the serial and k-point parallel versions of the code run fine (I have optimized GaAs a lot in my troubleshooting!).

Environment: I am using the latest Intel ifort, icc, and impi libraries for Linux.

matstud at pyxis:~/Wien2K> ifort --version
ifort (IFORT) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
matstud at pyxis:~/Wien2K> mpirun --version
Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
Copyright (C) 2003-2011, Intel Corporation. All rights reserved.
matstud at pyxis:~/Wien2K> icc --version
icc (ICC) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

My OPTIONS file from siteconfig_lapw:

current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread
current:RP_LIBS:-L$(MKLROOT)/lib/intel64 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 $(R_LIBS)
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

The code compiles and links without error. It runs fine in serial mode and in k-point parallel mode, e.g. with a .machines file of

1:localhost
1:localhost
1:localhost
granularity:1
extrafine:1

This runs fine. When I attempt to run an MPI job with 12 processes (on a 12-core machine), I crash and burn (see below) with a SIGSEGV error and instructions to contact the developers. The linking options were derived from Intel's MKL link advisor (the version on the Intel site). I should add that the mpi-bench in fftw3 works fine under the Intel MPI, as do commands like hostname and even abinit, so it would appear that the Intel MPI environment itself is fine. I have wasted a lot of time trying to figure out how to fix this before writing to the list, but at this point I feel like a monkey at a keyboard attempting to duplicate Shakespeare, if you know what I mean. Thanks in advance for any heads-up you can offer.

.machines:

lapw0:localhost:12
1:localhost:12
granularity:1
extrafine:1

stop error
error: command /home/matstud/Wien2K/lapw0para -c lapw0.def failed
0.029u 0.046s 0:00.93 6.4% 0+0k 0+176io 0pf+0w
Child id 2 SIGSEGV, contact developers
Child id 8 SIGSEGV, contact developers
Child id 7 SIGSEGV, contact developers
Child id 11 SIGSEGV, contact developers
Child id 10 SIGSEGV, contact developers
Child id 9 SIGSEGV, contact developers
Child id 6 SIGSEGV, contact developers
Child id 5 SIGSEGV, contact developers
Child id 4 SIGSEGV, contact developers
Child id 3 SIGSEGV, contact developers
Child id 1 SIGSEGV, contact developers
Child id 0 SIGSEGV, contact developers
.machine0 : 12 processors
lapw0 -p (09:04:45) starting parallel lapw0 at Fri Aug 24 09:04:45 JST 2012
cycle 1 (Fri Aug 24 09:04:45 JST 2012) (40/99 to go)
start (Fri Aug 24 09:04:45 JST 2012) with lapw0 (40/99 to go)
using WIEN2k_12.1 (Release 22/7/2012) in /home/matstud/Wien2K on pyxis with PID 15375
Calculating GaAs in /usr/local/share/Wien2K/Fons/GaAs
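Since the thread below narrows the cause to mixed MPI stacks, this kind of failure can often be caught up front by confirming that all the Intel MPI tools named above resolve from one consistent location. A minimal sketch (the tool list is an assumption taken from this thread; extend it as needed):

```shell
# Check that each expected MPI tool is in PATH and report where from.
missing=0
for tool in mpiifort mpiicc mpirun; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool -> $(command -v "$tool")"   # should all share one prefix
    else
        echo "$tool: not found in PATH"
        missing=$((missing + 1))
    fi
done
echo "missing tools: $missing"
```

If the reported prefixes differ (e.g. one tool from /usr/bin and the rest from /opt/intel), the launcher and the wrappers are from different stacks.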
[Wien] Problems with mpi for Wien12.1
Hard to say. What is in $WIENROOT/parallel_options? MPI_REMOTE should be 0!

Otherwise, run lapw0_mpi by hand:

mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def

(or including .machinefile .machine0)

On 24.08.2012 02:24, Paul Fons wrote:
Greetings all, I have compiled Wien2K 12.1 under OpenSuse 11.4 (and OpenSuse 12.1) with the latest Intel compilers, with identical MPI launch problems, and I am hoping for some suggestions as to where to look to fix things.

--
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
+43-1-5880115671
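The MPI_REMOTE check asked for above can be scripted. In this sketch the here-document is a stand-in for the real $WIENROOT/parallel_options (its stand-in contents mirror the file quoted later in this thread), so the snippet runs anywhere:

```shell
# Extract the MPI_REMOTE value from a parallel_options-style file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
EOF
mpi_remote=$(awk '$1 == "setenv" && $2 == "MPI_REMOTE" {print $3}' "$tmp")
echo "MPI_REMOTE=$mpi_remote"      # should be 0 for this setup
rm -f "$tmp"
```

On a real installation, point the awk command at $WIENROOT/parallel_options instead of the temp file.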
[Wien] Problems with mpi for Wien12.1
Dear Prof. Blaha,

Thank you for your earlier email. Running the command manually gives the following output (for a GaAs structure that works fine in serial or k-point parallel form). I am still not sure what to try next. Any suggestions?

matstud at ursa:~/WienDisk/Fons/GaAs> mpirun -np 4 ${WIENROOT}/lapw0_mpi lapw0.def
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
Child id 0 SIGSEGV, contact developers
Child id 1 SIGSEGV, contact developers
Child id 3 SIGSEGV, contact developers
Child id 2 SIGSEGV, contact developers
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

The MPI compilation options from siteconfig are as follows (the settings are from the Intel MKL link advisor plus the fftw3 library):

Current settings:
RP RP_LIB(SCALAPACK+PBLAS): -L$(MKLROOT)/lib/intel64 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 $(R_LIBS)
FP FPOPT(par.comp.options): -I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
MP MPIRUN commando: mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

The file parallel_options now reads:

setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

I changed MPI_REMOTE to 0 as suggested (I was not sure this applied to the Intel MPI environment, as the siteconfig prompt only mentioned mpich2). As I mentioned, the mpirun command itself seems to work fine. For example, the fftw3 benchmark program gives, with 24 processes:

mpirun -np 24 ./mpi-bench 1024x1024
Problem: 1024x1024, setup: 126.32 ms, time: 15.98 ms, ``mflops'': 6562.2

On Aug 24, 2012, at 3:05 PM, Peter Blaha wrote:
Hard to say. What is in $WIENROOT/parallel_options? MPI_REMOTE should be 0! Otherwise, run lapw0_mpi by hand: mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def (or including .machinefile .machine0)

On 24.08.2012 02:24, Paul Fons wrote:
Greetings all, I have compiled Wien2K 12.1 under OpenSuse 11.4 (and OpenSuse 12.1) with the latest Intel compilers, with identical MPI launch problems, and I am hoping for some suggestions as to where to look to fix things.
[Wien] Problems with mpi for Wien12.1
In my experience the SIGSEV normally comes from mixing different flavors of mpif90 and mpirun. Openmpi, mpich2 and Intels mpi all need different versions of blacs. You can also have problems if you choose the wrong model for integers in the linking advisor page. I would check using ldd that lapw0_mpi is linked to the right version, and that the default versions are correct (e.g. which mpirun). Often you can minimize problems by using static linking for mpi. N.B. The contact developers message is a relic of when some code was added for fault handlers and to eliminate issues with limits that used to be pervasive. It should probably be removed. --- Professor Laurence Marks Department of Materials Science and Engineering Northwestern University www.numis.northwestern.edu 1-847-491-3996 Research is to see what everybody else has seen, and to think what nobody else has thought Albert Szent-Gyorgi On Aug 24, 2012 8:22 AM, Paul Fons paul-fons at aist.go.jp wrote: Dear Prof. Blaha, Thank you for your earlier email. Running the command manually gives the following output (for a GaAs structure that works fine in serial or k-point parallel form). I am still not sure what to try next. Any suggestions? 
matstud at ursa:~/WienDisk/Fons/GaAs mpirun -np 4 ${WIENROOT}/lapw0_mpi lapw0.def w2k_dispatch_signal(): received: Segmentation fault w2k_dispatch_signal(): received: Segmentation fault w2k_dispatch_signal(): received: Segmentation fault w2k_dispatch_signal(): received: Segmentation fault Child id 0 SIGSEGV, contact developers Child id 1 SIGSEGV, contact developers Child id 3 SIGSEGV, contact developers Child id 2 SIGSEGV, contact developers application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1) The MPI compilation options from siteconfig are as follows: (the settings are from the Intel MKL link advisor plus the fftw3 library) Current settings: RP RP_LIB(SCALAPACK+PBLAS): -L$(MKLROOT)/lib/intel64 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 $(R_LIBS) FP FPOPT(par.comp.options): -I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback MP MPIRUN commando: mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_ The file parallel_options now reads setenv USE_REMOTE 1 setenv MPI_REMOTE 0 setenv WIEN_GRANULARITY 1 setenv WIEN_MPIRUN mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_ I changed the MPI_REMOTE to 0 as suggested (I was not sure this applied to the Intel MPI environment as the siteconfig prompt only mentioned mich2. As I mentioned the mpirun command seems to work fine. 
For example, the fftw3 benchmark program gives, with 24 processes:

mpirun -np 24 ./mpi-bench 1024x1024
Problem: 1024x1024, setup: 126.32 ms, time: 15.98 ms, ``mflops'': 6562.2

On Aug 24, 2012, at 3:05 PM, Peter Blaha wrote:

Hard to say. What is in $WIENROOT/parallel_options ? MPI_REMOTE should be 0 !
Otherwise run lapw0_mpi by hand:
mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def
(or including .machinefile .machine0)

Am 24.08.2012 02:24, schrieb Paul Fons:

Greetings all,
I have compiled WIEN2k 12.1 under OpenSuse 11.4 (and OpenSuse 12.1) and the latest Intel compilers, with identical mpi launch problems in both cases, and I am hoping for some suggestions as to where to look to fix things. Note that the serial and k-point parallel versions of the code run fine (I have optimized GaAs a lot in my troubleshooting!).

Environment: I am using the latest Intel ifort, icc, and MPI libraries for Linux.

matstud at pyxis:~/Wien2K ifort --version
ifort (IFORT) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

matstud at pyxis:~/Wien2K mpirun --version
Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
Copyright (C) 2003-2011, Intel Corporation.  All rights reserved.

matstud at pyxis:~/Wien2K icc --version
icc (ICC) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

My OPTIONS files from siteconfig_lapw:

current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack95_lp64
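A quick way to audit an environment like the one above is to check where each toolchain component actually resolves from, since a stray system MPI earlier in PATH would not show up in the `--version` output alone. A minimal sketch:

```shell
# List where each toolchain component resolves from; on a consistent
# installation all of them should live under the same Intel directory tree.
for tool in ifort icc mpiifort mpirun; do
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%-8s -> %s\n' "$tool" "$(command -v "$tool")"
    else
        printf '%-8s not found in PATH\n' "$tool"
    fi
done
```

If mpirun resolves outside the Intel tree while the binaries were linked against libmkl_blacs_intelmpi, the observed startup segfaults are the expected symptom.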
[Wien] Problems with mpi for Wien12.1
To make this comment more clear: you did not tell us which command you are using for MPF (the parallel compiler). It is not always mpif90 (which could wrap some other compiler or MPI); it could be mpiifort or something else. Then check with "which mpif90" whether it points to the proper directory/version of MPI.

Am 24.08.2012 15:35, schrieb Laurence Marks:
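Identifying what the MPF wrapper actually invokes can be sketched as below. The `-show`/`-showme` behavior is the wrappers' own documented option (MPICH-derived wrappers, including Intel MPI's, use `-show`; OpenMPI uses `-showme`); everything else here is a generic diagnostic, not a WIEN2k command.

```shell
# Which wrapper compilers are visible, and from which MPI installation?
which mpif90 mpiifort 2>/dev/null || true

# Print the underlying compile line without compiling anything:
# MPICH-family wrappers (incl. Intel MPI) accept -show, OpenMPI -showme.
mpif90 -show 2>/dev/null || mpif90 -showme 2>/dev/null \
    || echo "mpif90 not available in this shell"
```

The printed compile line reveals both the real compiler (ifort vs. gfortran) and the MPI include/lib paths, which is exactly the mismatch Blaha is asking about.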