Hi Maxim,

Thanks for your reply! We tried

MPIRUN=mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_

but the problem persists. The only difference is that stdout changes to "MPI: invalid option -hostfile".
Thanks,
Wei

On Oct 31, 2010, at 10:40 PM, Maxim Rakitin wrote:

> Hi,
>
> It looks like Intel's mpirun doesn't have a '-machinefile' option. Instead
> it has a '-hostfile' option (from here:
> http://downloadmirror.intel.com/18462/eng/nes_release_notes.txt).
>
> Try 'mpirun -h' for information about the options and apply the
> appropriate one.
>
> Best regards,
> Maxim Rakitin
> email: rms85 at physics.susu.ac.ru
> web: http://www.susu.ac.ru
>
> 01.11.2010 4:56, Wei Xie wrote:
>>
>> Dear all WIEN2k community members:
>>
>> We encountered a problem when running in parallel (k-point, MPI or
>> both): the calculations crashed at LAPW2. Note that we had no problem
>> running in serial. We have tried to diagnose the problem, recompiled the
>> code with different options, and tested different cases and parameters
>> based on similar problems reported on the mailing list, but the problem
>> persists. So we are writing here in the hope that someone can offer us
>> some suggestions. We have attached the related files below for your
>> reference. Your replies are appreciated in advance!
>>
>> This is a TiC example running in both k-point and MPI parallel on two nodes,
>> r1i0n0 and r1i0n1 (8 cores/node):
>>
>> 1. stdout (abridged)
>> MPI: invalid option -machinefile
>> real 0m0.004s
>> user 0m0.000s
>> sys 0m0.000s
>> ...
>> MPI: invalid option -machinefile
>> real 0m0.003s
>> user 0m0.000s
>> sys 0m0.004s
>> TiC.scf1up_1: No such file or directory.
>>
>> LAPW2 - Error. Check file lapw2.error
>> cp: cannot stat `.in.tmp': No such file or directory
>> rm: cannot remove `.in.tmp': No such file or directory
>> rm: cannot remove `.in.tmp1': No such file or directory
>>
>> 2. TiC.dayfile (abridged)
>> ...
>> start (Sun Oct 31 16:25:06 MDT 2010) with lapw0 (40/99 to go)
>> cycle 1 (Sun Oct 31 16:25:06 MDT 2010) (40/99 to go)
>>
>> > lapw0 -p (16:25:06) starting parallel lapw0 at Sun Oct 31 16:25:07 MDT 2010
>> -------- .machine0 : 16 processors
>> invalid "local" arg: -machinefile
>>
>> 0.436u 0.412s 0:04.63 18.1% 0+0k 2600+0io 1pf+0w
>> > lapw1 -up -p (16:25:12) starting parallel lapw1 at Sun Oct 31 16:25:12 MDT 2010
>> -> starting parallel LAPW1 jobs at Sun Oct 31 16:25:12 MDT 2010
>> running LAPW1 in parallel mode (using .machines)
>> 2 number_of_parallel_jobs
>> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
>> r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1)
>> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1)
>> Summary of lapw1para:
>> r1i0n0 k=0 user=0 wallclock=0
>> r1i0n1 k=0 user=0 wallclock=0
>> ...
>> 0.116u 0.316s 0:10.48 4.0% 0+0k 0+0io 0pf+0w
>> > lapw2 -up -p (16:25:34) running LAPW2 in parallel mode
>> ** LAPW2 crashed!
>> 0.032u 0.104s 0:01.13 11.5% 0+0k 82304+0io 8pf+0w
>> error: command /home/xiew/WIEN2k_10/lapw2para -up uplapw2.def failed
>>
>> 3. uplapw2.error
>> Error in LAPW2
>> 'LAPW2' - can't open unit: 18
>> 'LAPW2' - filename: TiC.vspup
>> 'LAPW2' - status: old form: formatted
>> ** testerror: Error in Parallel LAPW2
>>
>> 4. .machines
>> #
>> 1:r1i0n0:8
>> 1:r1i0n1:8
>> lapw0:r1i0n0:8 r1i0n1:8
>> granularity:1
>> extrafine:1
>>
>> 5. compilers, MPI and options
>> Intel Compilers and MKL 11.1.046
>> Intel MPI 3.2.0.011
>>
>> current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>> current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>> current:LDFLAGS:$(FOPT) -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
>> current:DPARALLEL:'-DParallel'
>> current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
>> current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -lmkl_scalapack_lp64 /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp -lpthread -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi -lfftw $(R_LIBS)
>> current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
>>
>> Best regards,
>> Wei Xie
>> Computational Materials Group
>> University of Wisconsin-Madison
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien