[Wien] Problems with mpi for Wien12.1

2012-08-29 Thread Paul Fons
I compiled fftw3 using the Intel suite as well.  The appropriate line from 
config.log reads

./configure CC=icc F77=ifort MPICC=mpiicc --prefix=/opt/local --enable-mpi 
--enable-threads --prefix=/opt/local/fftw3

I note that the configuration only asks for an MPI C compiler (I used the 
Intel one, mpiicc) and not a Fortran compiler.  The compiled code (mpi-bench) 
does work fine with the Intel mpirun.
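
A quick sanity check on the fftw3 install can be sketched as follows. This is
only a sketch, assuming a POSIX shell; has_fftw_mpi_libs is a hypothetical
helper, not part of fftw3 or WIEN2k. It only confirms that the serial and MPI
libraries named on the WIEN2k link line (-lfftw3_mpi -lfftw3) exist under a
given prefix. It says nothing about which compiler built them.

```shell
#!/bin/sh
# has_fftw_mpi_libs PREFIX: succeed if PREFIX/lib holds both the serial
# and the MPI FFTW3 libraries (static .a or shared .so).
# Hypothetical helper for illustration only.
has_fftw_mpi_libs() {
    for lib in libfftw3 libfftw3_mpi; do
        if [ ! -e "$1/lib/$lib.a" ] && [ ! -e "$1/lib/$lib.so" ]; then
            echo "missing: $1/lib/$lib.{a,so}" >&2
            return 1
        fi
    done
    return 0
}

# Example (prefix taken from the link line in this thread):
#   has_fftw_mpi_libs /opt/local/fftw3 && echo "fftw3 MPI libraries found"
```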


After commenting out the call to W2kinit and recompiling lapw0 (via the 
siteconfig script), I ran run_lapw in both serial and parallel forms, as you 
can see below.  The serial run worked fine; the parallel run still died with 
segmentation faults.

Paul

matstud at ursa:~/WienDisk/Fons/GaAs run_lapw
 LAPW0 END
 LAPW1 END
 LAPW2 END
 CORE  END
 MIXER END
ec cc and fc_conv 1 1 1

   stop


matstud at ursa:~/WienDisk/Fons/GaAs run_lapw -p
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred

   stop error




Dr. Paul Fons
Senior Research Scientist
Functional Nano-phase-change Research Team
Nanoelectronics Research Institute
National Institute for Advanced Industrial Science  Technology
METI

AIST Central 4, Higashi 1-1-1
Tsukuba, Ibaraki JAPAN 305-8568

tel. +81-298-61-5636
fax. +81-298-61-2939

email: paul-fons at aist.go.jp



[Wien] Problems with mpi for Wien12.1

2012-08-28 Thread Paul Fons
Dear Prof. Blaha,
I was under the impression that I had replied promptly to your initial 
question; I apologize for the delay.  I have been using the MPI compiler 
wrapper of the Intel MPI (4.0.3) suite, namely mpiifort.  Here is the output 
of which, followed by the version of the underlying Fortran compiler.  Thank 
you for your help.

matstud at ursa:~/Wien2K which mpiifort
/opt/intel/impi/4.0.3.008/intel64/bin/mpiifort
matstud at ursa:~/Wien2K mpiifort --version
ifort (IFORT) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

Below find a short sequence from a recompile of lapw1 using siteconfig.  I note 
that mpiifort is being used.

touch .parallel
make PARALLEL='-DParallel' TYPE='REAL' TYPE_COMMENT='\!_REAL' \
  ./lapw1_mpi FORT=mpiifort FFLAGS=' 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div 
-pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback '-DParallel''
make[1]: Entering directory `/home/matstud/Wien2K_12_1/SRC_lapw1'
modules.F: REAL version extracted
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div 
-pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c modules_tmp_.F
mv modules_tmp_.o modules.o
rm modules_tmp_.F
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div 
-pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c abc.f
atpar.F: REAL version extracted
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div 
-pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c atpar_tmp_.F
mv atpar_tmp_.o atpar.o
rm atpar_tmp_.F
calkpt.F: REAL version extracted
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div 
-pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c calkpt_tmp_.F
mv calkpt_tmp_.o calkpt.o
rm calkpt_tmp_.F
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div 
-pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c cbcomb.f
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div 
-pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c coors.f
dscgst.F: REAL version extracted
mpiifort -I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include/intel64/lp64 
-I/opt/intel/composer_xe_2011_sp1.11.339/mkl/include -FR -mp1 -w -prec_div 
-pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback -DParallel -c dscgst_tmp_.F
mv dscgst_tmp_.o dscgst.o
rm dscgst_tmp_.F

and the final linking step

mv W2kinit_tmp_.o W2kinit.o
rm W2kinit_tmp_.F
mpiifort  -o ./lapw1c_mpi abc.o atpar.o bandv1.o calkpt.o cbcomb.o coors.o 
cputim.o dblr2k.o dgeqrl.o dgewy.o dgewyg.o dlbrfg.o dsbein1.o dscgst.o 
dstebz2.o dsyevx2.o dsyr2m.o dsyrb4.o dsyrb5l.o dsyrdt4.o dsywyv.o dsyxev4.o 
dvbes1.o eisps.o errclr.o errflg.o forfhs.o gaunt1.o gaunt2.o gbass.o gtfnam.o 
hamilt.o hns.o horb.o inikpt.o inilpw.o lapw1.o latgen.o lmsort.o locdef.o 
lohns.o lopw.o matmm.o modules.o nn.o outerr.o outwinb.o prtkpt.o prtres.o 
pzheevx16.o rdswar.o rint13.o rotate.o rotdef.o seclit.o seclr4.o seclr5.o 
select.o service.o setkpt.o setwar.o sphbes.o stern.o SymmRot.o tapewf.o 
ustphx.o vectf.o warpin.o wfpnt.o wfpnt1.o ylm.o zhcgst.o zheevx2.o zher2m.o 
jacdavblock.o make_albl.o global2local.o par_syrk.o my_dsygst.o refblas_dtrsm.o 
seclit_par.o pdsyevx17.o pdstebz17.o pdgetri_my.o pzgetri_my.o W2kutils.o 
W2kinit.o -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback 
-L/opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64 -pthread 
-L/opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64 
/opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64/libmkl_blas95_lp64.a 
/opt/intel/composer_xe_2011_sp1.11.339/mkl/lib/intel64/libmkl_lapack95_lp64.a 
-lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread 
-lmkl_core -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm 
-L/opt/local/fftw3/lib/ -lfftw3_mpi -lfftw3 -lmkl_lapack95_lp64 
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread 
make[1]: Leaving directory `/home/matstud/Wien2K_12_1/SRC_lapw1'
Copying programs
  SRC_lapw1/lapw1
  SRC_lapw1/lapw1c
  SRC_lapw1/lapw1_mpi
  SRC_lapw1/lapw1c_mpi

done.

Compile time errors (if any) were:



On Aug 24, 2012, at 11:59 PM, Peter Blaha wrote:

 To make this comment more clear:
 
 You did not tell us which command you are using for MPF (parallel compiler). 
 It is not always mpif90 (as this could use some other compiler or mpi)
 it could be   mpiifort or something else.
 
 Then check with  which mpif90   

[Wien] Problems with mpi for Wien12.1

2012-08-28 Thread Laurence Marks
Hmmm. I was hoping for something human readable like a traceback showing
where it died. Please check both the lapw0.error files and case.dayfile to
see if they gave anything useful. Also, what are the last few lines of
case.output?

You may get somewhere by running the mpirun command by hand; I have seen
this help. If you understand csh, add an echo $tt at the relevant location
in lapw0para.

If not, you can change the first line of lapw1para to -xf rather than just
-f. Then do x lapw0 -p again. You will get a hundred or so lines of
output, one of which towards the end will be something like

mpirun -np 12 ...

Then paste this line by itself in a terminal. Maybe then something human
readable will emerge.
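
Picking the mpirun line out of the trace can be sketched like this. It is a
minimal sketch, assuming a POSIX shell and that the trace output has been
saved to a file (trace.log is a hypothetical name). find_mpirun is an
illustrative helper, not a WIEN2k command; it just filters the saved output
for the expanded mpirun call.

```shell
#!/bin/sh
# find_mpirun: read a saved csh trace on stdin and print every line that
# contains an expanded "mpirun -np ..." command.
# Hypothetical helper for illustration only.
find_mpirun() {
    grep -E '(^|[[:space:]])mpirun[[:space:]]+-np[[:space:]]'
}

# Example, with the para script's first line changed to -xf:
#   x lapw0 -p 2>&1 | tee trace.log     (sh syntax; in csh use |& instead)
#   find_mpirun < trace.log | tail -n 1
```

The last matching line is the one to paste into a terminal by itself.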

Unfortunately debugging MPI is not trivial, and a SIGSEGV can also be
nontrivial, as the error may not appear at the right place, making life more
fun.

Do you have TotalView or a similar MPI debugger available? You can get a
demo version of TotalView free for, I believe, 30 days.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
Research is to see what everybody else has seen, and to think what nobody
else has thought
Albert Szent-Gyorgi


[Wien] Problems with mpi for Wien12.1

2012-08-28 Thread Laurence Marks
N.b., I meant lapw0 everywhere as I believe you said that is where the
problem is. If it is in lapw1, then change everything to lapw1 in my email.



[Wien] Problems with mpi for Wien12.1

2012-08-27 Thread Laurence Marks
One suggestion: comment out the line towards the top of lapw0.F

 call W2kinit

You should get a more human readable error message.

As an addendum, was fftw3 compiled with mpiifort? I assume from your email
that it was, just checking.

N.B., there is a small chance that this will hang your computer.
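
The edit itself can be sketched as below. This is only a sketch, assuming a
POSIX shell with awk, and assuming the source is compiled as free-form
Fortran (the -FR flag in the posted FOPT), so that a leading "!" starts a
comment. comment_out_call is a hypothetical helper, and the location of
lapw0.F inside the WIEN2k tree is not given in this thread.

```shell
#!/bin/sh
# comment_out_call NAME FILE: print FILE with every line of the form
# "call NAME" (any case, any indentation) turned into a "!" comment.
# The original file is left untouched.
# Hypothetical helper for illustration only.
comment_out_call() {
    awk -v name="$1" '
        BEGIN { pat = "^[ \t]*call[ \t]+" tolower(name) "([ \t(!]|$)" }
        tolower($0) ~ pat { print "!" $0; next }
        { print }
    ' "$2"
}

# Example:
#   comment_out_call W2kinit lapw0.F > lapw0.F.new
#   diff lapw0.F lapw0.F.new      # inspect before replacing the original
```

After replacing the file, lapw0 has to be recompiled (for instance through
siteconfig) for the change to take effect.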


[Wien] Problems with mpi for Wien12.1

2012-08-24 Thread Paul Fons
Greetings all,

  I have compiled WIEN2k 12.1 under openSUSE 11.4 (and openSUSE 12.1) with the 
latest Intel compilers and see identical MPI launch problems on both.  I am 
hoping for some suggestions as to where to look to fix things.  Note that the 
serial and k-point parallel versions of the code run fine (I have optimized 
GaAs a lot in my troubleshooting!).

Environment.

I am using the latest Intel ifort, icc, and impi (Intel MPI) for Linux.

matstud at pyxis:~/Wien2K ifort --version
ifort (IFORT) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

matstud at pyxis:~/Wien2K mpirun --version
Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
Copyright (C) 2003-2011, Intel Corporation. All rights reserved.

matstud at pyxis:~/Wien2K icc --version
icc (ICC) 12.1.5 20120612
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.


My OPTIONS files from /siteconfig_lapw

current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR -mp1 
-w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread 
-lmkl_core -openmp -lpthread
current:RP_LIBS:-L$(MKLROOT)/lib/intel64 
$(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a 
$(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 
-lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
-lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ 
-lfftw3_mpi -lfftw3 $(R_LIBS)
current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_




The code compiles and links without error.  It runs fine in serial mode and in 
k-point parallel mode, e.g.

.machines with

1:localhost
1:localhost
1:localhost
granularity:1
extrafine:1

This runs fine.  When I attempt to run an MPI job with 12 processes (on a 
12-core machine), I crash and burn (see below) with a SIGSEGV error and 
instructions to contact the developers.

The linking options were derived from Intel's MKL link advisor (the version on 
the Intel site).  I should add that the mpi-bench in fftw3 works fine using the 
Intel MPI, as do commands like hostname or even abinit, so it would appear 
that the Intel MPI environment itself is fine.  I have wasted a lot of time 
trying to figure out how to fix this before writing to the list, but at this 
point, I feel like a monkey at a keyboard attempting to duplicate Shakespeare 
-- if you know what I mean.  Thanks in advance for any heads up that you can 
offer.



.machines

lapw0:localhost:12
1:localhost:12
granularity:1
extrafine:1
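
For reference, the two .machines layouts shown in this message can be
generated as sketched below. This is a minimal sketch, assuming a POSIX
shell; both functions are hypothetical helpers, and they reproduce only the
lines posted here (no other .machines keywords).

```shell
#!/bin/sh
# write_machines_mpi HOST NPROC: print a .machines file that runs lapw0
# and one k-point job as a single MPI job with NPROC processes on HOST.
write_machines_mpi() {
    printf 'lapw0:%s:%s\n' "$1" "$2"
    printf '1:%s:%s\n' "$1" "$2"
    printf 'granularity:1\nextrafine:1\n'
}

# write_machines_kpoint HOST NJOBS: print a .machines file that runs
# NJOBS independent k-point parallel jobs on HOST, one process each.
write_machines_kpoint() {
    i=0
    while [ "$i" -lt "$2" ]; do
        printf '1:%s\n' "$1"
        i=$((i + 1))
    done
    printf 'granularity:1\nextrafine:1\n'
}

# Example:
#   write_machines_kpoint localhost 3 > .machines   # the working case
#   write_machines_mpi localhost 12 > .machines     # the failing MPI case
```

Switching between the two files is a quick way to confirm that only the MPI
path fails.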

   stop error

error: command   /home/matstud/Wien2K/lapw0para -c lapw0.def   failed
0.029u 0.046s 0:00.93 6.4%  0+0k 0+176io 0pf+0w
 Child id   2 SIGSEGV, contact developers
 Child id   8 SIGSEGV, contact developers
 Child id   7 SIGSEGV, contact developers
 Child id  11 SIGSEGV, contact developers
 Child id  10 SIGSEGV, contact developers
 Child id   9 SIGSEGV, contact developers
 Child id   6 SIGSEGV, contact developers
 Child id   5 SIGSEGV, contact developers
 Child id   4 SIGSEGV, contact developers
 Child id   3 SIGSEGV, contact developers
 Child id   1 SIGSEGV, contact developers
 Child id   0 SIGSEGV, contact developers
 .machine0 : 12 processors
   lapw0 -p(09:04:45) starting parallel lapw0 at Fri Aug 24 09:04:45 JST 
 2012

cycle 1 (Fri Aug 24 09:04:45 JST 2012)  (40/99 to go)

start   (Fri Aug 24 09:04:45 JST 2012) with lapw0 (40/99 to go)


using WIEN2k_12.1 (Release 22/7/2012) in /home/matstud/Wien2K
on pyxis with PID 15375
Calculating GaAs in /usr/local/share/Wien2K/Fons/GaAs



[Wien] Problems with mpi for Wien12.1

2012-08-24 Thread Peter Blaha
Hard to say.

What is in $WIENROOT/parallel_options ?
MPI_REMOTE should be 0 !

Otherwise run lapw0_mpi by hand:

mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def   (or including  -machinefile 
.machine0)
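
The parallel_options check can be sketched as follows. It is only a sketch,
assuming a POSIX shell with awk and that parallel_options consists of csh
"setenv NAME value" lines. get_setenv is a hypothetical helper, not part of
WIEN2k.

```shell
#!/bin/sh
# get_setenv VAR FILE: print the value assigned to VAR by a csh
# "setenv VAR value" line in FILE (the last such line wins).
# Prints an empty line if VAR is never set.
# Hypothetical helper for illustration only.
get_setenv() {
    awk -v var="$1" '
        $1 == "setenv" && $2 == var {
            val = $0
            sub(/^[ \t]*setenv[ \t]+[^ \t]+[ \t]*/, "", val)
        }
        END { print val }
    ' "$2"
}

# Example:
#   get_setenv MPI_REMOTE "$WIENROOT/parallel_options"   # should print 0
#   get_setenv WIEN_MPIRUN "$WIENROOT/parallel_options"
```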



-- 
Peter Blaha
Inst.Materials Chemistry
TU Vienna
Getreidemarkt 9
A-1060 Vienna
Austria
+43-1-5880115671


[Wien] Problems with mpi for Wien12.1

2012-08-24 Thread Paul Fons
Dear Prof. Blaha,
Thank you for your earlier email.  Running the command manually gives the 
following output (for a GaAs structure that works fine in serial or k-point 
parallel form).  I am still not sure what to try next.  Any suggestions?
 

matstud at ursa:~/WienDisk/Fons/GaAs mpirun -np 4 ${WIENROOT}/lapw0_mpi 
lapw0.def
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
 Child id   0 SIGSEGV, contact developers
 Child id   1 SIGSEGV, contact developers
 Child id   3 SIGSEGV, contact developers
 Child id   2 SIGSEGV, contact developers
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)


The MPI compilation options from siteconfig are as follows: (the settings are 
from the Intel MKL link advisor plus the fftw3 library)

 Current settings:
 RP  RP_LIB(SCALAPACK+PBLAS): -L$(MKLROOT)/lib/intel64 
$(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a 
$(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64 
-lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
-lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ 
-lfftw3_mpi -lfftw3 $(R_LIBS)
 FP  FPOPT(par.comp.options): -I$(MKLROOT)/include/intel64/lp64 
-I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 
-traceback
 MP  MPIRUN commando: mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_

The file parallel_options now reads
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_


I changed MPI_REMOTE to 0 as suggested (I was not sure this applied to the 
Intel MPI environment, as the siteconfig prompt only mentioned mpich2).

As I mentioned, the mpirun command itself seems to work fine.  For example, 
the fftw3 benchmark program run with 24 processes gives

mpirun -np 24 ./mpi-bench 1024x1024
Problem: 1024x1024, setup: 126.32 ms, time: 15.98 ms, ``mflops'': 6562.2




[Wien] Problems with mpi for Wien12.1

2012-08-24 Thread Laurence Marks
In my experience the SIGSEGV normally comes from mixing different flavors of
mpif90 and mpirun. Open MPI, MPICH2 and Intel MPI all need different
versions of BLACS. You can also have problems if you choose the wrong integer
model (32-bit LP64 versus 64-bit ILP64) on the link advisor page. I would
check using ldd that lapw0_mpi is linked to the right version, and that the
default versions are correct (e.g. which mpirun). Often you can minimize
problems by using static linking for MPI.
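
The ldd and which checks can be sketched like this. It is a minimal sketch,
assuming a POSIX shell on Linux. mpi_libs and report_mpi_linkage are
hypothetical helpers, and the library-name patterns are a guess at what is
worth looking for, not an exhaustive list.

```shell
#!/bin/sh
# mpi_libs: filter `ldd` output on stdin down to the MPI-related shared
# libraries (MPI runtime, BLACS, ScaLAPACK, FFTW).
mpi_libs() {
    grep -E -i 'libmpi|blacs|scalapack|fftw'
}

# report_mpi_linkage BINARY: show which mpirun comes first in PATH and
# which MPI-related shared libraries BINARY resolves to at run time.
# Hypothetical helper for illustration only.
report_mpi_linkage() {
    echo "mpirun: $(command -v mpirun || echo 'not found in PATH')"
    ldd "$1" | mpi_libs
}

# Example:
#   report_mpi_linkage "$WIENROOT/lapw0_mpi"
```

Libraries that were linked statically will not appear in the ldd output, so
an empty list for, say, fftw is not by itself a sign of trouble. What matters
is that every MPI library listed comes from the same installation as the
mpirun that is reported.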

N.B. The contact developers message is a relic of when some code was
added for fault handlers and to eliminate issues with limits that used to
be pervasive. It should probably be removed.

 On Aug 24, 2012 8:22 AM, Paul Fons paul-fons at aist.go.jp wrote:

  Dear Prof. Blaha,
 Thank you for your earlier email.  Running the command manually gives the
 following output (for a GaAs structure that works fine in serial or k-point
 parallel form).  I am still not sure what to try next.  Any suggestions?


  matstud at ursa:~/WienDisk/Fons/GaAs mpirun -np 4 ${WIENROOT}/lapw0_mpi
 lapw0.def
 w2k_dispatch_signal(): received: Segmentation fault
 w2k_dispatch_signal(): received: Segmentation fault
 w2k_dispatch_signal(): received: Segmentation fault
 w2k_dispatch_signal(): received: Segmentation fault
  Child id   0 SIGSEGV, contact developers
  Child id   1 SIGSEGV, contact developers
  Child id   3 SIGSEGV, contact developers
  Child id   2 SIGSEGV, contact developers
 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)


  The MPI compilation options from siteconfig are as follows: (the
 settings are from the Intel MKL link advisor plus the fftw3 library)

   Current settings:
  RP  RP_LIB(SCALAPACK+PBLAS): -L$(MKLROOT)/lib/intel64
 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a
 $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64
 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
 -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/
 -lfftw3_mpi -lfftw3 $(R_LIBS)
  FP  FPOPT(par.comp.options): -I$(MKLROOT)/include/intel64/lp64
 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML
 -DFFTW3 -traceback
  MP  MPIRUN commando: mpirun -np _NP_ -machinefile _HOSTS_
 _EXEC_
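
One quick sanity check on a link line like the RP_LIB setting above is whether every MKL component uses the same integer model. The sketch below is only an illustration: the function name integer_models is hypothetical, and it looks at nothing but the lp64/ilp64 substrings in the tokens it is given.

```shell
# integer_models: read a link line on stdin and report which MKL integer
# models (lp64 / ilp64) its tokens mention. Hypothetical helper, not a
# WIEN2k or MKL tool.
integer_models() {
    tr -s ' \t' '\n\n' | awk '
        /ilp64/ { ilp = 1; next }   # 64-bit integer interface
        /lp64/  { lp = 1 }          # 32-bit integer interface
        END {
            if (lp && ilp) print "mixed: lp64 and ilp64"
            else if (ilp)  print "ilp64"
            else if (lp)   print "lp64"
            else           print "none found"
        }'
}

# Example with three of the libraries from the RP_LIB line:
echo '-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_blacs_intelmpi_lp64' | integer_models
# prints: lp64
```

A "mixed" result would point to the integer-model problem; a consistent result only rules that one cause out.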

  The file parallel_options now reads
  setenv USE_REMOTE 1
 setenv MPI_REMOTE 0
 setenv WIEN_GRANULARITY 1
 setenv WIEN_MPIRUN mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
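
To see what command a template like WIEN_MPIRUN turns into, the placeholders can be filled in by hand. The sketch below is an assumption about how the substitution works, not the actual code of the WIEN2k parallel scripts; the function name expand_mpirun and the example values (4 processes, .machine0) are illustrative.

```shell
# expand_mpirun: fill the _NP_, _HOSTS_ and _EXEC_ placeholders of an
# mpirun template. Illustrative only; not taken from the WIEN2k scripts.
#   $1 = template, $2 = process count, $3 = machine file, $4 = command
expand_mpirun() {
    printf '%s\n' "$1" |
        sed -e "s|_NP_|$2|" -e "s|_HOSTS_|$3|" -e "s|_EXEC_|$4|"
}

expand_mpirun 'mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_' \
    4 .machine0 '$WIENROOT/lapw0_mpi lapw0.def'
# prints: mpirun -np 4 -machinefile .machine0 $WIENROOT/lapw0_mpi lapw0.def
```

Running the expanded command by hand in the case directory is a way to reproduce the failure outside run_lapw.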


  I changed MPI_REMOTE to 0 as suggested (I was not sure this applied
 to the Intel MPI environment, as the siteconfig prompt only mentioned mpich2).

  As I mentioned, the mpirun command itself seems to work fine.  For example,
 the fftw3 benchmark program run with 24 processes gives

  mpirun -np 24 ./mpi-bench 1024x1024
 Problem: 1024x1024, setup: 126.32 ms, time: 15.98 ms, ``mflops'': 6562.2



  On Aug 24, 2012, at 3:05 PM, Peter Blaha wrote:

  Hard to say.

 What is in $WIENROOT/parallel_options ?
 MPI_REMOTE should be 0 !

 Otherwise run lapw0_mpi by hand:

 mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def   (or including  .machinefile
 .machine0)


 On 24.08.2012 02:24, Paul Fons wrote:

 Greetings all,
   I have compiled Wien2K 12.1 under OpenSuse 11.4 (and OpenSuse 12.1)
 and the latest Intel compilers with identical mpi launch problems and I
 am hoping for some suggestions as to where to look to fix things.  Note
 that the serial and k-point parallel versions of the code run fine (I
 have optimized GaAs a lot in my troubleshooting!).

  Environment.

  I am using the latest Intel ifort, icc, and impi libraries for Linux.

  matstud at pyxis:~/Wien2K ifort --version
 ifort (IFORT) 12.1.5 20120612
 Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

  matstud at pyxis:~/Wien2K mpirun --version
 Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
 Copyright (C) 2003-2011, Intel Corporation. All rights reserved.

  matstud at pyxis:~/Wien2K icc --version
 icc (ICC) 12.1.5 20120612
 Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

  My OPTIONS files from /siteconfig_lapw

  current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
 current:FPOPT:-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -FR
 -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 -traceback
 current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
 current:DPARALLEL:'-DParallel'
 current:R_LIBS:-lmkl_lapack95_lp64 

[Wien] Problems with mpi for Wien12.1

2012-08-24 Thread Peter Blaha
To make this comment clearer:

You did not tell us which command you are using for MPF (the parallel
compiler). It is not always mpif90 (which could invoke some other compiler
or MPI); it could be mpiifort or something else.

Then check with "which mpif90" (or whichever wrapper you use) whether it
points to the proper directory/version of MPI.
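
A small sketch of that check follows. The function names are hypothetical. It relies on the -show option, which MPICH-derived wrappers (including Intel MPI's mpiifort) accept to print the underlying compile command without running it; Open MPI wrappers use --showme instead, so the third column stays empty for them.

```shell
# backend_compiler: given the output of `<wrapper> -show` on stdin,
# print the first word, which is the back-end compiler being invoked.
backend_compiler() {
    awk 'NR == 1 { print $1 }'
}

# report_wrappers: for each wrapper name given, print its location on PATH
# and its back-end compiler, or note that it is not on PATH.
report_wrappers() {
    for w in "$@"; do
        if command -v "$w" >/dev/null 2>&1; then
            printf '%s\t%s\t%s\n' "$w" "$(command -v "$w")" \
                "$("$w" -show 2>/dev/null | backend_compiler)"
        else
            printf '%s\tnot on PATH\n' "$w"
        fi
    done
}

# Intended use:
#   report_wrappers mpif90 mpiifort
```

The wrapper named in MPF should live in the same MPI installation as the mpirun used at run time, and its back-end compiler should be the one used for the rest of the build (ifort here).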

On 24.08.2012 15:35, Laurence Marks wrote:
 In my experience the SIGSEGV normally comes from mixing different flavors of 
 mpif90 and mpirun. Open MPI, MPICH2 and Intel MPI all need different versions 
 of BLACS. You can also
 have problems if you choose the wrong integer model (LP64 versus ILP64) on the 
 link advisor page. I would check with ldd that lapw0_mpi is linked against the 
 right version, and that the default
 versions are correct (e.g. which mpirun). Often you can minimize problems by 
 using static linking for MPI.

 N.B. The "contact developers" message is a relic of when some code was added 
 for fault handlers and to eliminate issues with limits that used to be 
 pervasive. It should probably
 be removed.

 ---
 Professor Laurence Marks
 Department of Materials Science and Engineering
 Northwestern University
 www.numis.northwestern.edu 1-847-491-3996
 Research is to see what everybody else has seen, and to think what nobody 
 else has thought
 Albert Szent-Gyorgi

 On Aug 24, 2012 8:22 AM, Paul Fons paul-fons at aist.go.jp wrote:

 Dear Prof. Blaha,
 Thank you for your earlier email.  Running the command manually gives the 
 following output (for a GaAs structure that works fine in serial or k-point 
 parallel form).  I am
 still not sure what to try next.  Any suggestions?

 matstud at ursa:~/WienDisk/Fons/GaAs mpirun -np 4 ${WIENROOT}/lapw0_mpi lapw0.def
 w2k_dispatch_signal(): received: Segmentation fault
 w2k_dispatch_signal(): received: Segmentation fault
 w2k_dispatch_signal(): received: Segmentation fault
 w2k_dispatch_signal(): received: Segmentation fault
   Child id   0 SIGSEGV, contact developers
   Child id   1 SIGSEGV, contact developers
   Child id   3 SIGSEGV, contact developers
   Child id   2 SIGSEGV, contact developers
 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
 application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
 APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)


 The MPI compilation options from siteconfig are as follows: (the settings 
 are from the Intel MKL link advisor plus the fftw3 library)

   Current settings:
   RP  RP_LIB(SCALAPACK+PBLAS): -L$(MKLROOT)/lib/intel64 
 $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a 
 $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -lmkl_scalapack_lp64
 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
 -lmkl_blacs_intelmpi_lp64 -openmp -lpthread -lm -L/opt/local/fftw3/lib/ 
 -lfftw3_mpi -lfftw3 $(R_LIBS)
   FP  FPOPT(par.comp.options): -I$(MKLROOT)/include/intel64/lp64 
 -I$(MKLROOT)/include -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -DFFTW3 
 -traceback
   MP  MPIRUN commando: mpirun -np _NP_ -machinefile _HOSTS_ 
 _EXEC_

 The file parallel_options now reads
 setenv USE_REMOTE 1
 setenv MPI_REMOTE 0
 setenv WIEN_GRANULARITY 1
 setenv WIEN_MPIRUN mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_


 I changed MPI_REMOTE to 0 as suggested (I was not sure this applied 
 to the Intel MPI environment, as the siteconfig prompt only mentioned mpich2).

 As I mentioned, the mpirun command itself seems to work fine.  For example, 
 the fftw3 benchmark program run with 24 processes gives

 mpirun -np 24 ./mpi-bench 1024x1024
 Problem: 1024x1024, setup: 126.32 ms, time: 15.98 ms, ``mflops'': 6562.2



 On Aug 24, 2012, at 3:05 PM, Peter Blaha wrote:

 Hard to say.

 What is in $WIENROOT/parallel_options ?
 MPI_REMOTE should be 0 !

 Otherwise run lapw0_mpi by hand:

 mpirun -np 4 $WIENROOT/lapw0_mpi lapw0.def   (or including  .machinefile 
 .machine0)


 On 24.08.2012 02:24, Paul Fons wrote:
 Greetings all,
   I have compiled Wien2K 12.1 under OpenSuse 11.4 (and OpenSuse 12.1)
 and the latest Intel compilers with identical mpi launch problems and I
 am hoping for some suggestions as to where to look to fix things.  Note
 that the serial and k-point parallel versions of the code run fine (I
 have optimized GaAs a lot in my troubleshooting!).

 Environment.

 I am using the latest Intel ifort, icc, and impi libraries for Linux.

 matstud at pyxis:~/Wien2K ifort --version
 ifort (IFORT) 12.1.5 20120612
 Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

 matstud at pyxis:~/Wien2K mpirun --version
 Intel(R) MPI Library for