Dear Prof. Blaha,

Thank you for your earlier email. Running the command manually gives the following output (for a GaAs structure that runs fine in serial or k-point parallel mode). I am still not sure what to try next. Any suggestions?
matstud@ursa:~/WienDisk/Fons/GaAs> mpirun -np 4 ${WIENROOT}/lapw0_mpi lapw0.def
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
Child id 0 SIGSEGV, contact developers
Child id 1 SIGSEGV, contact developers
Child id 3 SIGSEGV, contact developers
Child id 2 SIGSEGV, contact developers
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

The MPI compilation options from siteconfig are as follows (the settings come from the Intel MKL link advisor plus the fftw3 library):

Current settings:
  RP  RP_LIB(SCALAPACK+PBLAS): -L$(MKLROOT)/lib/intel64 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 $(R_LIBS)
  FP  FPOPT(par.comp.options): -I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
  MP  MPIRUN command: mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

The file parallel_options now reads:

setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

I changed MPI_REMOTE to 0 as suggested (I was not sure this applied to the Intel MPI environment, as the siteconfig prompt only mentioned mpich2). As I mentioned, the mpirun command itself seems to work fine. For example, the fftw3 benchmark program with 24 processes gives:

mpirun -np 24 ./mpi-bench 1024x1024
Problem: 1024x1024, setup: 126.32 ms, time: 15.98 ms, ``mflops'': 6562.2
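In case it helps to narrow down which layer is at fault, below is the kind of minimal distributed-FFT test I can build against the same fftw3 installation. This is only a sketch: the /opt/local/fftw3 paths, the file name, and the mpiicc wrapper are my assumptions based on the link line above, and the 1024x1024 size simply mirrors the mpi-bench run.

/* fftw_mpi_check.c -- minimal distributed-FFT sanity test (sketch).
 * Exercises only MPI + libfftw3_mpi, with no WIEN2k code in between. */
#include <mpi.h>
#include <fftw3-mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    const ptrdiff_t N0 = 1024, N1 = 1024;   /* same size as the mpi-bench run */
    ptrdiff_t local_n0, local_0_start;
    ptrdiff_t alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
                                                   &local_n0, &local_0_start);
    fftw_complex *data = fftw_alloc_complex(alloc_local);

    fftw_plan plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
                                          FFTW_FORWARD, FFTW_ESTIMATE);

    for (ptrdiff_t i = 0; i < local_n0 * N1; ++i) {  /* fill the local slab */
        data[i][0] = 1.0;
        data[i][1] = 0.0;
    }
    fftw_execute(plan);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("distributed 2-D FFT finished without incident\n");

    fftw_destroy_plan(plan);
    fftw_free(data);
    MPI_Finalize();
    return 0;
}

compiled and launched along the lines of:

mpiicc -I/opt/local/fftw3/include fftw_mpi_check.c -L/opt/local/fftw3/lib -lfftw3_mpi -lfftw3 -lm -o fftw_mpi_check
mpirun -np 4 ./fftw_mpi_check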
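Along the same lines, since RP_LIB selects -lmkl_blacs_intelmpi_lp64, a short check of whether the MKL BLACS layer even initializes under this mpirun might separate a BLACS/MPI mismatch from a WIEN2k problem. Again just a sketch: the extern declaration and link line are my reading of the MKL documentation, and mpiicc is assumed.

/* blacs_check.c -- does the MKL BLACS layer come up under this MPI? (sketch) */
#include <mpi.h>
#include <stdio.h>

/* C interface to BLACS as shipped in MKL; declared here since MKL
   provides no header for it */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int me, np;
    /* a BLACS library built against a different MPI than the one doing
       the launching tends to die right here rather than return values */
    Cblacs_pinfo(&me, &np);
    printf("BLACS sees process %d of %d\n", me, np);
    MPI_Finalize();
    return 0;
}

mpiicc blacs_check.c -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -o blacs_check
mpirun -np 4 ./blacs_check

My understanding is that a BLACS/MPI mismatch produces exactly this kind of SIGSEGV at startup, which is why I am singling that library out.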
On Aug 24, 2012, at 3:05 PM, Peter Blaha wrote:

> Hard to say.
>
> What is in $WIENROOT/parallel_options ?
> MPI_REMOTE should be 0 !
>
> Otherwise run lapw0_mpi by "hand":
>
> mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def   (or including .machinefile .machine0)
>
> On 24.08.2012 02:24, Paul Fons wrote:
>> Greetings all,
>> I have compiled Wien2K 12.1 under OpenSuse 11.4 (and OpenSuse 12.1) with
>> the latest Intel compilers, hitting identical mpi launch problems on both,
>> and I am hoping for suggestions as to where to look to fix things. Note
>> that the serial and k-point parallel versions of the code run fine (I
>> have optimized GaAs a lot in my troubleshooting!).
>>
>> Environment:
>>
>> I am using the latest Intel ifort, icc, and impi libraries for Linux.
>>
>> matstud@pyxis:~/Wien2K> ifort --version
>> ifort (IFORT) 12.1.5 20120612
>> Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
>>
>> matstud@pyxis:~/Wien2K> mpirun --version
>> Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
>> Copyright (C) 2003-2011, Intel Corporation. All rights reserved.
>>
>> matstud@pyxis:~/Wien2K> icc --version
>> icc (ICC) 12.1.5 20120612
>> Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
>>
>> My OPTIONS file from siteconfig_lapw:
>>
>> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>> current:FPOPT:-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
>> current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
>> current:DPARALLEL:'-DParallel'
>> current:R_LIBS:-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread
>> current:RP_LIBS:-L$(MKLROOT)/lib/intel64 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 $(R_LIBS)
>> current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>>
>> The code compiles and links without error. It runs fine in serial mode
>> and in k-point parallel mode, e.g. with a .machines file of
>>
>> 1:localhost
>> 1:localhost
>> 1:localhost
>> granularity:1
>> extrafine:1
>>
>> This runs fine. When I attempt an mpi run with 12 processes (on a
>> 12-core machine), I crash and burn (see below) with a SIGSEGV error and
>> instructions to contact the developers.
>>
>> The linking options were derived from Intel's MKL link advisor (the
>> version on the Intel site). I should add that the mpi-bench in fftw3
>> works fine using the Intel MPI, as do commands like hostname and even
>> abinit, so it would appear that the Intel MPI environment itself is
>> fine. I have spent a lot of time trying to figure out how to fix this
>> before writing to the list, but at this point I feel like a monkey at a
>> keyboard attempting to duplicate Shakespeare -- if you know what I mean.
>> Thanks in advance for any heads-up you can offer.
>>
>> .machines:
>>
>> lapw0:localhost:12
>> 1:localhost:12
>> granularity:1
>> extrafine:1
>>
>>> stop error
>>
>> error: command /home/matstud/Wien2K/lapw0para -c lapw0.def failed
>> 0.029u 0.046s 0:00.93 6.4% 0+0k 0+176io 0pf+0w
>> Child id 2 SIGSEGV, contact developers
>> Child id 8 SIGSEGV, contact developers
>> Child id 7 SIGSEGV, contact developers
>> Child id 11 SIGSEGV, contact developers
>> Child id 10 SIGSEGV, contact developers
>> Child id 9 SIGSEGV, contact developers
>> Child id 6 SIGSEGV, contact developers
>> Child id 5 SIGSEGV, contact developers
>> Child id 4 SIGSEGV, contact developers
>> Child id 3 SIGSEGV, contact developers
>> Child id 1 SIGSEGV, contact developers
>> Child id 0 SIGSEGV, contact developers
>> --------  .machine0 : 12 processors
>>> lapw0 -p (09:04:45) starting parallel lapw0 at Fri Aug 24 09:04:45 JST 2012
>>
>> cycle 1 (Fri Aug 24 09:04:45 JST 2012) (40/99 to go)
>>
>> start (Fri Aug 24 09:04:45 JST 2012) with lapw0 (40/99 to go)
>>
>> using WIEN2k_12.1 (Release 22/7/2012) in /home/matstud/Wien2K
>> on pyxis with PID 15375
>> Calculating GaAs in /usr/local/share/Wien2K/Fons/GaAs
>
> --
> Peter Blaha
> Inst. Materials Chemistry
> TU Vienna
> Getreidemarkt 9
> A-1060 Vienna
> Austria
> +43-1-5880115671
Dr. Paul Fons
Senior Research Scientist
Functional Nano-phase-change Research Team
Nanoelectronics Research Institute
National Institute for Advanced Industrial Science & Technology
METI AIST Central 4, Higashi 1-1-1
Tsukuba, Ibaraki JAPAN 305-8568
tel.   +81-298-61-5636
fax.   +81-298-61-2939
email: paul-fons at aist.go.jp
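P.S. For completeness, when I say the launcher itself behaves, I mean that even a trivial rank/host check of the sort below runs cleanly under the same mpirun (a sketch; mpiicc is assumed as the Intel MPI C wrapper):

/* mpi_hello.c -- prints rank, size, and host; exercises only the launcher,
 * no MKL or fftw3 involved */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("rank %d of %d on %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}

mpiicc mpi_hello.c -o mpi_hello
mpirun -np 12 ./mpi_hello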